Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-23 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 24, 2021 at 7:25 AM Kong, Lingling via Gcc-patches
 wrote:
>
> Hi,
>
> vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with 
> -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
> And cleared before conversion, updated  movhi_internal and 
> ix86_can_change_mode_class.

Please fix the above commit message.

>
> OK for master?
>
> gcc/ChangeLog:
>
> PR target/102811
> * config/i386/i386.c (ix86_can_change_mode_class): SSE2 can load 
> 16bit data
> to sse register via pinsrw.

Allow 16bit data in XMM register for SSE2 targets.

> * config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for f16c.

... for TARGET_F16C.

> (extendhfdf2): Split extendhf2 into separate extendhfsf2, 
> extendhfdf2.
> extendhfdf only for target_avx512fp16.

Restrict extendhfdf for TARGET_AVX512FP16 only.

> (*extendhf2):rename extendhf2.

Rename from extendhf2.

> (truncsfhf2): Likewise.
> (truncdfhf2): Likewise.
> (*trunc2): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR target/102811
> * gcc.target/i386/pr90773-21.c: Optimized movhi_internal,
> optimize vmovd + movw to vpextrw.

Also allow pextrw.

> * gcc.target/i386/pr90773-23.c: Ditto.
> * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.

Otherwise LGTM.

BTW: When playing with my patch, I introduced (define_insn
"*vec_set_0" ...) to optimize scalar load to a vector. Does
ix86_expand_vector_set work OK without this pattern?

Thanks,
Uros.
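
For readers following the extendhfsf2 expander in the patch, here is a minimal software sketch of the sequence it emits when AVX512FP16 is not available: zero a vector, insert the half value at element 0, convert the low four lanes, and take lane 0 of the result. The function names half_bits_to_float and extend_hf_to_sf are hypothetical and exist only for this illustration. The bit decoding follows IEEE 754 binary16 and is a model of the per-lane conversion, not GCC or hardware code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "binary32 float expected");

// Hypothetical helper: decode one IEEE 754 binary16 bit pattern into a float.
float half_bits_to_float(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  std::uint32_t man = h & 0x3FFu;
  std::uint32_t bits;
  if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (man << 13);           // infinity or NaN
  } else if (exp != 0) {
    bits = sign | ((exp + 112u) << 23) | (man << 13);  // normal: rebias 15 -> 127
  } else if (man == 0) {
    bits = sign;                                       // signed zero
  } else {
    // Subnormal half: shift the mantissa up until the implicit bit appears.
    std::uint32_t shift = 0;
    while ((man & 0x400u) == 0) {
      man <<= 1;
      ++shift;
    }
    assert(shift >= 1 && shift <= 10);
    bits = sign | ((113u - shift) << 23) | ((man & 0x3FFu) << 13);
  }
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical model of the non-AVX512FP16 path of the expander.
float extend_hf_to_sf(std::uint16_t h) {
  std::array<std::uint16_t, 8> tmp{};  // zeroed vector, like CONST0_RTX (V8HFmode)
  tmp[0] = h;                          // insert the scalar at element 0
  std::array<float, 4> res{};
  for (std::size_t i = 0; i < res.size(); ++i)
    res[i] = half_bits_to_float(tmp[i]);  // four-lane convert, as vcvtph2ps does
  return res[0];                          // low lane is the SFmode result
}
```

Zeroing the vector first matches the "cleared before conversion" note in the commit message: the upper lanes are converted as well, so they should hold defined values.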

> ---
>  gcc/config/i386/i386.c|  5 +-
>  gcc/config/i386/i386.md   | 74 +--
>  .../i386/avx512vl-vcvtps2ph-pr102811.c| 11 +++
>  gcc/testsuite/gcc.target/i386/pr90773-21.c|  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-23.c|  2 +-
>  5 files changed, 83 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index e94efdf39fb..4b813533961 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19485,9 +19485,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
>  disallow a change to these modes, reload will assume it's ok to
>  drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>  the vec_dupv4hi pattern.
> -NB: AVX512FP16 supports vmovw which can load 16bit data to sse
> -register.  */
> -  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_AVX512FP16 ? 2 : 4;
> +NB: SSE2 can load 16bit data to sse register via pinsrw.  */
> +  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
>if (GET_MODE_SIZE (from) < mov_size)
> return false;
>  }
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 6eb9de81921..6ee264f1151 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2525,6 +2525,16 @@
>  case TYPE_SSEMOV:
>return ix86_output_ssemov (insn, operands);
>
> +case TYPE_SSELOG:
> +  if (SSE_REG_P (operands[0]))
> +   return MEM_P (operands[1])
> + ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> + : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> +  else
> +   return MEM_P (operands[1])
> + ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> + : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> +
>  case TYPE_MSKLOG:
>if (operands[1] == const0_rtx)
> return "kxorw\t%0, %0, %0";
> @@ -2540,13 +2550,17 @@
>  }
>  }
>[(set (attr "isa")
> -   (cond [(eq_attr "alternative" "9,10,11,12,13")
> - (const_string "avx512fp16")
> +   (cond [(eq_attr "alternative" "9,10,11,12")
> + (const_string "sse2")
> +  (eq_attr "alternative" "13")
> + (const_string "sse4")
>]
>(const_string "*")))
> (set (attr "type")
>   (cond [(eq_attr "alternative" "9,10,11,12,13")
> - (const_string "ssemov")
> + (if_then_else (match_test "TARGET_AVX512FP16")
> +   (const_string "ssemov")
> +   (const_string "sselog"))
> (eq_attr "alternative" "4,5,6,7")
>   (const_string "mskmov")
> (eq_attr "alternative" "8")
> @@ -4574,8 +4588,32 @@
>emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
>  })
>
> -(define_insn "extendhf2"
> -  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
> +(define_expand "extendhfsf2"
> +  [(set (match_operand:SF 0 "register_operand")
> +   (float_extend:SF
> + (match_operand:HF 1 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL"
> +{
> +  if (!TARGET_AVX512FP16)
> +{
> +  rtx res = gen_reg_rtx (V4SFmode);
> +  rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
> +
> +  

Re: [PATCH] Loop unswitching: support gswitch statements.

2021-11-23 Thread Richard Biener via Gcc-patches
On Tue, Nov 23, 2021 at 4:20 PM Martin Liška  wrote:
>
> On 11/23/21 14:58, Richard Biener wrote:
> > On Mon, Nov 22, 2021 at 4:07 PM Martin Liška  wrote:
> >>
> >> On 11/19/21 11:00, Richard Biener wrote:
> >>> On Tue, Nov 16, 2021 at 3:40 PM Martin Liška  wrote:
> >>>>
> >>>> On 11/11/21 08:15, Richard Biener wrote:
> >>>>> So I'd try to do no functional change first, improving the costing and
> >>>>> setting up the transform to simply pick up the stmts to "fold" as
> >>>>> discovered
> >>>>> during analysis (as I hinted you possibly can use gimple_uid to mark
> >>>>> the stmts that simplify, IIRC gimple_uid is preserved during copying.
> >>>>> gimple_uid would also scale better than gimple_plf in case we do
> >>>>> the analysis for all candidates at once).
> >>>>
> >>>> Thinking about the analysis. Am I correct that we want to properly
> >>>> calculate
> >>>> loop size for true and false edge of a potential gcond before
> >>>> actually unswitching?
> >>>
> >>> Yes.
> >>>
> >>>> We can do that by finding a first gcond candidate, evaluating (symbolic +
> >>>> irange approach)
> >>>> all other gconds in the loop body and using BB_REACHABLE discovery,
> >>>> similarly to what we do now
> >>>> at lines 378-446. Then tree_num_loop_insns can be adjusted for only
> >>>> these reachable blocks.
> >>>> Having that, we can calculate # of insns that will live in true/false
> >>>> loops.
> >>>
> >>> So whatever we do here we should record as "this control stmt folds to
> >>> {true,false}" (or {true,unknown},
> >>> or in future, "this control stmt will lead to edge {e,unknown}"),
> >>> recording the simplification
> >>> on the true/false loop version in a way we can apply it after the 
> >>> transform.
> >>>
> >>>> Then we can call tree_unswitch_loop and make the gcond folding as we do
> >>>> in the versioned loops.
> >>>>
> >>>> Is it a step in a good direction? Having that we can then extend it to
> >>>> gswitch statements.
> >>>
> >>> One issue I see is that BB_REACHABLE is there only once but you could use
> >>> auto_bb_flag reachable_true, reachable_false to distinguish the
> >>> true/false loop version
> >>> copies.
> >>>
> >>> So yes, I think that sounds reasonable.  At the point we want to
> >>> evaluate different
> >>> (first) unswitching opportunities against each other storing this only
> >>> as BB flag is
> >>> likely to hit limits.  When we want to evaluate multiple levels of
> >>> unswitching before
> >>> doing any transforms even more so (if there are 3 opportunities there'd be
> >>> many cases to be considered when going to level 3 ;)).  I _think_ that a 
> >>> sparse
> >>> lattice of stmt UID -> edge might do the trick if we change 
> >>> tree_num_loop_insns
> >>> to do a DFS walk from the loop header, ignoring not taken edges by
> >>> consulting the
> >>> lattice.  Or, for speed reason, pre-compute tree_num_loop_insns for each 
> >>> BB
> >>> so we just have to sum a different set of BBs rather than walking all
> >>> stmts again.
> >>>
> >>> That said, the second step would definitely be to choose the "best" 
> >>> opportunity
> >>> on the current level.
> >>>
> >>> Richard.
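
The costing idea described above (a sparse lattice from control-statement UID to the known outcome, consulted during a DFS walk from the loop header over per-block sizes computed in advance) can be sketched as follows. This is a simplified model under stated assumptions, not the patch's code: Block, Outcome and reachable_insns are hypothetical names, each block has at most a true and a false successor, and an unconditional block uses only its true successor.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

enum class Outcome { Unknown, True, False };

// Hypothetical, simplified basic block of a loop body.
struct Block {
  unsigned insns;   // size of this block, computed once up front
  int true_succ;    // successor on the true (or only) edge, -1 if none
  int false_succ;   // successor on the false edge, -1 if none
  int cond_uid;     // UID of the controlling statement, -1 if unconditional
};

// Sum the sizes of the blocks reachable from HEADER, skipping edges that the
// lattice records as not taken for this loop version.
unsigned reachable_insns(const std::vector<Block>& blocks, int header,
                         const std::map<int, Outcome>& lattice) {
  assert(header >= 0 && static_cast<std::size_t>(header) < blocks.size());
  std::vector<bool> seen(blocks.size(), false);
  std::vector<int> stack{header};
  unsigned total = 0;
  while (!stack.empty()) {
    const int b = stack.back();
    stack.pop_back();
    if (b < 0 || seen[static_cast<std::size_t>(b)])
      continue;
    seen[static_cast<std::size_t>(b)] = true;
    const Block& blk = blocks[static_cast<std::size_t>(b)];
    total += blk.insns;
    Outcome o = Outcome::Unknown;
    const auto it = lattice.find(blk.cond_uid);
    if (it != lattice.end())
      o = it->second;
    if (o != Outcome::False)
      stack.push_back(blk.true_succ);   // true edge may be taken
    if (o != Outcome::True)
      stack.push_back(blk.false_succ);  // false edge may be taken
  }
  return total;
}
```

Calling this once with the condition fixed to True and once with it fixed to False gives the estimated sizes of the two loop versions without walking the statements again.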
> >>>
> >>>> Cheers,
> >>>> Martin
> >>
> >> Hello.
> >>
> >> I'm sending a new version where I changed:
> >> 1) all unswitch_predicates are found for a loop
> >> 2) context sensitive costing happens based on an unswitch_predicate and BB 
> >> reachability
> >>  is implemented
> >> 3) folding happens in recursive invocation once we decide to unswitch
> >> 4) the patch folds both symbolic gcond predicates and irange provided by 
> >> ranger
> >> 5) debug counter was added
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. 
> >> Plus, I tested it
> >> on SPEC2006 and SPEC2017 with -size=ref.
> >
> > Meh, diff made a mess out of this ;)  Random comments, I'm walking
> > myself the optimizations
> > flow.
>
> Sure.
>
> >
> > tree_unswitch_single_loop:
> >
> > +  unswitch_predicate *predicate = NULL;
> > +  if (num > param_max_unswitch_level)
> > +{
> > +  if (dump_file
> > + && (dump_flags & TDF_DETAILS))
> > +   fprintf (dump_file, ";; Not unswitching anymore, hit max level\n");
> > +  goto exit;
> > +}
> >
> > this looks like we can do this check before find_all_unswitching_predicates?
>
> Makes sense.
>
> >
> > +  for (auto pred: candidates)
> > +{
> > +  unsigned cost
> > +   = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred);
> > ...
> >
> > so this searches for the first candidate that fits in
> > param_max_unswitch_insns, it doesn't
> > yet try to find the cheapest / best one.  Please add a comment to say
> > that.  After we
> > found one candidate we apply unswitching to such one candidate (and throw 
> > the
> > others away).  I guess that's OK - it's what the old code did - what
> > you did for this
> > intermediate step is actually gather all unswitching predicates
> > upfront.  Hopefully
> > we'll be able to share some of the work 

Re: Ping: [PATCH v7 2/2] Don't move cold code out of loop by checking bb count

2021-11-23 Thread Richard Biener via Gcc-patches
On Wed, Nov 24, 2021 at 6:15 AM Xionghu Luo  wrote:
>
> Gentle ping and is this patch still suitable for stage 3?  Thanks.

It's on my list to look at, yes.  I'm worried about accuracy of profile counts
though which is why I keep pushing it back - I see the various profile count
fixes are still pending.

I'm thinking of requiring reliable_p () on the counts (but that will restrict
this to FDO).

Anyway, still on my list to look at - sorry for the delays.

Richard.

>
> [PATCH v7 2/2] Don't move cold code out of loop by checking bb count
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583911.html
>
>
>
> On 2021/11/10 11:08, Xionghu Luo via Gcc-patches wrote:
> >
> >
> > On 2021/11/4 21:00, Richard Biener wrote:
> >> On Wed, Nov 3, 2021 at 2:29 PM Xionghu Luo  wrote:
> >>>
> >>>
> >>>> +  while (outmost_loop != loop)
> >>>> +    {
> >>>> +      if (bb_colder_than_loop_preheader (loop_preheader_edge (outmost_loop)->src,
> >>>> +                                         loop_preheader_edge (cold_loop)->src))
> >>>> +        cold_loop = outmost_loop;
> >>>> +      outmost_loop = superloop_at_depth (loop, loop_depth (outmost_loop) + 1);
> >>>> +    }
> >>>>
> >>>> could be instead written as
> >>>>
> >>>>   coldest_loop = coldest_outermost_loop[loop->num];
> >>>>   if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
> >>>>     return outermost_loop;
> >>>>   return coldest_loop;
> >>>>
> >>>> ?  And in the usual case coldest_outermost_loop[L] would be the loop
> >>>> tree root.
> >>>> It should be possible to compute such cache in a DFS walk of the loop
> >>>> tree
> >>>> (the loop iterator by default visits in such order).
> >>>
> >>>
> >>> Thanks.  Updated the patch with your suggestion.  Not sure whether it 
> >>> strictly
> >>> conforms to your comments.  Though the patch passed all my added 
> >>> tests(coverage not enough),
> >>> I am still a bit worried if pre-computed coldest_loop is outside of 
> >>> outermost_loop, but
> >>> outermost_loop is not the COLDEST LOOP, i.e. (outer->inner)
> >>>
> >>>  [loop tree root, coldest_loop, outermost_loop,..., second_coldest_loop, 
> >>> ..., loop],
> >>>
> >>> then function find_coldest_out_loop will return a loop NOT accord with our
> >>> expectation, that should return second_coldest_loop instead of 
> >>> outermost_loop?
> >> Hmm, interesting - yes.  I guess the common case will be that the 
> >> pre-computed
> >> outermost loop will be the loop at depth 1 since outer loops tend to
> >> be colder than
> >> inner loops?  That would then defeat the whole exercise.
> >
> > It is not easy to construct such cases, But finally I got below results,
> >
> > 1) many cases inner loop is hotter than outer loop, for example:
> >
> > loop 1's coldest_outermost_loop is 1, colder_than_inner_loop is NULL
> > loop 2's coldest_outermost_loop is 1, colder_than_inner_loop is 1
> > loop 3's coldest_outermost_loop is 1, colder_than_inner_loop is 2
> > loop 4's coldest_outermost_loop is 1, colder_than_inner_loop is 2
> >
> >
> > 2) But there are also cases inner loop is colder than outer loop, like:
> >
> > loop 1's coldest outermost loop is 1, colder_than_inner_loop is NULL
> > loop 2's coldest outermost loop is 2, colder_than_inner_loop is NULL
> > loop 3's coldest outermost loop is 3, colder_than_inner_loop is NULL
> >
> >
> >>
> >> To optimize the common case but not avoiding iteration in the cases we care
> >> about we could instead cache the next outermost loop that is _not_ colder
> >> than loop.  So for your [ ... ] example above we'd have
> >> hotter_than_inner_loop[loop] == outer (second_coldest_loop), where the
> >> candidate would then be 'second_coldest_loop' and we'd then iterate
> >> to hotter_than_inner_loop[hotter_than_inner_loop[loop]] to find the next
> >> cold candidate we can compare against?  For the common case we'd
> >> have hotter_than_inner_loop[loop] == NULL (no such loop) and we then
> >> simply pick 'outermost_loop'.
> >
> > Thanks.  It was difficult to understand, but finally I got to know what you
> > want to express :)
> >
> > We should cache the next loop that is *colder* than loop instead of '_not_ 
> > colder
> > than loop', and 'hotter_than_inner_loop' should be 'colder_than_inner_loop',
> > then it makes sense if the coldest loop is outside of outermost loop, 
> > continue to
> > find a colder loop between outermost loop and current loop in
> > colder_than_inner_loop[loop->num]?  Hope I understood you correctly...
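
The caching idea under discussion can be illustrated with a small model. This is a sketch under stated assumptions and is not the code in the v7 patch: Loop, fill_coldest_outermost and get_coldest_out_loop are hypothetical, loops are stored so that a parent precedes its children, "colder" means a smaller preheader count, and when the cached coldest loop lies outside outermost the sketch falls back to a plain walk from the loop up to outermost instead of consulting a second cache such as colder_than_inner_loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified loop-tree node.
struct Loop {
  int parent;            // index of the enclosing loop, -1 for an outermost loop
  int depth;             // 1 for an outermost loop
  long preheader_count;  // profile count of the preheader block
};

// coldest[i] is the coldest loop on the path from the outermost loop down to
// loop i (inclusive).  Ties favour the outer loop.
std::vector<int> fill_coldest_outermost(const std::vector<Loop>& loops) {
  std::vector<int> coldest(loops.size());
  for (std::size_t i = 0; i < loops.size(); ++i) {
    int best = static_cast<int>(i);
    const int p = loops[i].parent;
    if (p >= 0) {
      assert(static_cast<std::size_t>(p) < i);  // parents come first
      const int pc = coldest[static_cast<std::size_t>(p)];
      if (loops[static_cast<std::size_t>(pc)].preheader_count
          <= loops[i].preheader_count)
        best = pc;
    }
    coldest[i] = best;
  }
  return coldest;
}

// Coldest loop between OUTERMOST and LOOP (both inclusive); OUTERMOST must
// enclose LOOP or be LOOP itself.
int get_coldest_out_loop(const std::vector<Loop>& loops,
                         const std::vector<int>& coldest,
                         int outermost, int loop) {
  const int cached = coldest[static_cast<std::size_t>(loop)];
  if (loops[static_cast<std::size_t>(cached)].depth
      >= loops[static_cast<std::size_t>(outermost)].depth)
    return cached;  // cached answer lies within the permitted range
  // Cached loop is outside OUTERMOST: scan the remaining candidates.
  int best = loop;
  int l = loop;
  while (l != outermost) {
    l = loops[static_cast<std::size_t>(l)].parent;
    assert(l >= 0 && "outermost must enclose loop");
    if (loops[static_cast<std::size_t>(l)].preheader_count
        <= loops[static_cast<std::size_t>(best)].preheader_count)
      best = l;
  }
  return best;
}
```

With a nest whose preheader counts are 10, 100, 50 and 500 from outer to inner, the cache names the outermost loop for every level. Restricting the search to start at the second loop then makes the fallback pick the third loop (count 50), which corresponds to the second_coldest_loop case raised in the thread.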
> >
> >>
> >> One comment on the patch itself below.
> >>
> >
> > The loop in fill_cold_out_loop is also removed in the updated v7 patch.
> >
> >
> >
> > [PATCH v7 2/2] Don't move cold code out of loop by checking bb count
> >
> > From: Xiong Hu Luo 
> >
> > v7 changes:
> > 1. Refine get_coldest_out_loop to replace loop with checking
> > pre-computed coldest_outermost_loop and colder_than_inner_loop.
> > 2. Add function fill_cold_out_loop, compute coldest_outermost_loop and
> > 

[PATCH][wwwdocs] Update section on enormous source files in htdocs/projects/beginner.html

2021-11-23 Thread Eric Gallager via Gcc-patches
On Tue, Nov 23, 2021 at 6:27 PM Eric Gallager  wrote:
>
> On Fri, Nov 19, 2021 at 8:14 AM Eric Gallager  wrote:
> >
> > On Fri, Nov 19, 2021 at 1:48 AM Gerald Pfeifer  wrote:
> > >
> > > Cool, thank you!
> > >
> > > Please feel free to commit patches like this without asking for
> > > approval (though I'm happy to review and approve).
> > >
> > > Gerald
> > >
> >
> > OK thanks; committed as dbaebcd
>
> I've also committed one to remove the section on traditional C now:
> https://gcc.gnu.org/git/?p=gcc-wwwdocs.git;a=commitdiff;h=ca83c13ad6bf0d351220dafa36264ebc7a6b7816

This next patch does more than just removing old stuff: it adds an
extra sentence to describe a shell command used to generate a list, so
to verify that I've got the shell command right, I'm asking for a
review.

Thanks,
Eric Gallager


patch-beginner-projects-02.diff
Description: Binary data


RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-23 Thread Liu, Hongtao via Gcc-patches



>-----Original Message-----
>From: Kong, Lingling 
>Sent: Wednesday, November 24, 2021 2:25 PM
>To: Liu, Hongtao ; gcc-patches@gcc.gnu.org
>Cc: Kong, Lingling 
>Subject: RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert
>_Float16 to SFmode with -mf16c [PR 102811]
>
>Hi,
>
>vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with
>-mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
>And cleared before conversion, updated  movhi_internal and
>ix86_can_change_mode_class.
>
>OK for master?
>
>gcc/ChangeLog:
>
>   PR target/102811
>   * config/i386/i386.c (ix86_can_change_mode_class): SSE2 can load
>16bit data
>   to sse register via pinsrw.
>   * config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for f16c.
>   (extendhfdf2): Split extendhf2 into separate extendhfsf2,
>extendhfdf2.
>   extendhfdf only for target_avx512fp16.
>   (*extendhf2):rename extendhf2.
>   (truncsfhf2): Likewise.
>   (truncdfhf2): Likewise.
>   (*trunc2): Likewise.
>
>gcc/testsuite/ChangeLog:
>
>   PR target/102811
>   * gcc.target/i386/pr90773-21.c: Optimized movhi_internal,
>   optimize vmovd + movw to vpextrw.
>   * gcc.target/i386/pr90773-23.c: Ditto.
>   * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
>---
> gcc/config/i386/i386.c|  5 +-
> gcc/config/i386/i386.md   | 74 +--
> .../i386/avx512vl-vcvtps2ph-pr102811.c| 11 +++
> gcc/testsuite/gcc.target/i386/pr90773-21.c|  2 +-
> gcc/testsuite/gcc.target/i386/pr90773-23.c|  2 +-
> 5 files changed, 83 insertions(+), 11 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
>
>diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>index e94efdf39fb..4b813533961 100644
>--- a/gcc/config/i386/i386.c
>+++ b/gcc/config/i386/i386.c
>@@ -19485,9 +19485,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
>disallow a change to these modes, reload will assume it's ok to
>drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>the vec_dupv4hi pattern.
>-   NB: AVX512FP16 supports vmovw which can load 16bit data to sse
>-   register.  */
>-  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_AVX512FP16 ? 2 : 4;
>+   NB: SSE2 can load 16bit data to sse register via pinsrw.  */
>+  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
>   if (GET_MODE_SIZE (from) < mov_size)
>   return false;
> }
>diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
>index 6eb9de81921..6ee264f1151 100644
>--- a/gcc/config/i386/i386.md
>+++ b/gcc/config/i386/i386.md
>@@ -2525,6 +2525,16 @@
> case TYPE_SSEMOV:
>   return ix86_output_ssemov (insn, operands);
>
>+case TYPE_SSELOG:
>+  if (SSE_REG_P (operands[0]))
>+  return MEM_P (operands[1])
>+? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
>+: "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
>+  else
>+  return MEM_P (operands[1])
>+? "pextrw\t{$0, %1, %0|%0, %1, 0}"
>+: "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
>+
> case TYPE_MSKLOG:
>   if (operands[1] == const0_rtx)
>   return "kxorw\t%0, %0, %0";
>@@ -2540,13 +2550,17 @@
> }
> }
>   [(set (attr "isa")
>-  (cond [(eq_attr "alternative" "9,10,11,12,13")
>-(const_string "avx512fp16")
>+  (cond [(eq_attr "alternative" "9,10,11,12")
>+(const_string "sse2")
>+ (eq_attr "alternative" "13")
>+(const_string "sse4")
>  ]
>  (const_string "*")))
>(set (attr "type")
>  (cond [(eq_attr "alternative" "9,10,11,12,13")
>-(const_string "ssemov")
>+(if_then_else (match_test "TARGET_AVX512FP16")
>+  (const_string "ssemov")
>+  (const_string "sselog"))
>   (eq_attr "alternative" "4,5,6,7")
> (const_string "mskmov")
>   (eq_attr "alternative" "8")
>@@ -4574,8 +4588,32 @@
>   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
> })
>
>-(define_insn "extendhf2"
>-  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
>+(define_expand "extendhfsf2"
>+  [(set (match_operand:SF 0 "register_operand")
>+  (float_extend:SF
>+(match_operand:HF 1 "nonimmediate_operand")))]
>+  "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL"
>+{
>+  if (!TARGET_AVX512FP16)
>+{
>+  rtx res = gen_reg_rtx (V4SFmode);
>+  rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
>+
>+  ix86_expand_vector_set (false, tmp, operands[1], 0);
>+  emit_insn (gen_vcvtph2ps (res, gen_lowpart (V8HImode, tmp)));
>+  emit_move_insn (operands[0], gen_lowpart (SFmode, res));
>+  DONE;
>+}
>+})
>+
>+(define_expand "extendhfdf2"
>+  [(set (match_operand:DF 0 "register_operand")
>+  (float_extend:DF
>+(match_operand:HF 1 

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-23 Thread Kong, Lingling via Gcc-patches
Hi,

vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with 
-mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
And cleared before conversion, updated  movhi_internal and 
ix86_can_change_mode_class.

OK for master?

gcc/ChangeLog:

PR target/102811
* config/i386/i386.c (ix86_can_change_mode_class): SSE2 can load 16bit 
data
to sse register via pinsrw.
* config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for f16c.
(extendhfdf2): Split extendhf2 into separate extendhfsf2, 
extendhfdf2.
extendhfdf only for target_avx512fp16.
(*extendhf2):rename extendhf2.
(truncsfhf2): Likewise.
(truncdfhf2): Likewise.
(*trunc2): Likewise.

gcc/testsuite/ChangeLog:

PR target/102811
* gcc.target/i386/pr90773-21.c: Optimized movhi_internal,
optimize vmovd + movw to vpextrw.
* gcc.target/i386/pr90773-23.c: Ditto.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
---
 gcc/config/i386/i386.c|  5 +-
 gcc/config/i386/i386.md   | 74 +--
 .../i386/avx512vl-vcvtps2ph-pr102811.c| 11 +++
 gcc/testsuite/gcc.target/i386/pr90773-21.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-23.c|  2 +-
 5 files changed, 83 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e94efdf39fb..4b813533961 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19485,9 +19485,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
 disallow a change to these modes, reload will assume it's ok to
 drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
 the vec_dupv4hi pattern.
-NB: AVX512FP16 supports vmovw which can load 16bit data to sse
-register.  */
-  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_AVX512FP16 ? 2 : 4;
+NB: SSE2 can load 16bit data to sse register via pinsrw.  */
+  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
   if (GET_MODE_SIZE (from) < mov_size)
return false;
 }
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6eb9de81921..6ee264f1151 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2525,6 +2525,16 @@
 case TYPE_SSEMOV:
   return ix86_output_ssemov (insn, operands);
 
+case TYPE_SSELOG:
+  if (SSE_REG_P (operands[0]))
+   return MEM_P (operands[1])
+ ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
+ : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+  else
+   return MEM_P (operands[1])
+ ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
+ : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+
 case TYPE_MSKLOG:
   if (operands[1] == const0_rtx)
return "kxorw\t%0, %0, %0";
@@ -2540,13 +2550,17 @@
 }
 }
   [(set (attr "isa")
-   (cond [(eq_attr "alternative" "9,10,11,12,13")
- (const_string "avx512fp16")
+   (cond [(eq_attr "alternative" "9,10,11,12")
+ (const_string "sse2")
+  (eq_attr "alternative" "13")
+ (const_string "sse4")
   ]
   (const_string "*")))
(set (attr "type")
  (cond [(eq_attr "alternative" "9,10,11,12,13")
- (const_string "ssemov")
+ (if_then_else (match_test "TARGET_AVX512FP16")
+   (const_string "ssemov")
+   (const_string "sselog"))
(eq_attr "alternative" "4,5,6,7")
  (const_string "mskmov")
(eq_attr "alternative" "8")
@@ -4574,8 +4588,32 @@
   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
 })
 
-(define_insn "extendhf2"
-  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
+(define_expand "extendhfsf2"
+  [(set (match_operand:SF 0 "register_operand")
+   (float_extend:SF
+ (match_operand:HF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL"
+{
+  if (!TARGET_AVX512FP16)
+{
+  rtx res = gen_reg_rtx (V4SFmode);
+  rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
+
+  ix86_expand_vector_set (false, tmp, operands[1], 0);
+  emit_insn (gen_vcvtph2ps (res, gen_lowpart (V8HImode, tmp)));
+  emit_move_insn (operands[0], gen_lowpart (SFmode, res));
+  DONE;
+}
+})
+
+(define_expand "extendhfdf2"
+  [(set (match_operand:DF 0 "register_operand")
+   (float_extend:DF
+ (match_operand:HF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
+(define_insn "*extendhf2"
+  [(set (match_operand:MODEF 0 "register_operand" "=v")
 (float_extend:MODEF
  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512FP16"
@@ -4766,7 +4804,31 @@
 
 ;; Conversion from {SF,DF}mode to HFmode.
 
-(define_insn "trunchf2"
+(define_expand "truncsfhf2"
+  

Ping: [PATCH v7 2/2] Don't move cold code out of loop by checking bb count

2021-11-23 Thread Xionghu Luo via Gcc-patches
Gentle ping and is this patch still suitable for stage 3?  Thanks.


[PATCH v7 2/2] Don't move cold code out of loop by checking bb count

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583911.html



On 2021/11/10 11:08, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/11/4 21:00, Richard Biener wrote:
>> On Wed, Nov 3, 2021 at 2:29 PM Xionghu Luo  wrote:
>>>
>>>
>>>> +  while (outmost_loop != loop)
>>>> +    {
>>>> +      if (bb_colder_than_loop_preheader (loop_preheader_edge (outmost_loop)->src,
>>>> +                                         loop_preheader_edge (cold_loop)->src))
>>>> +        cold_loop = outmost_loop;
>>>> +      outmost_loop = superloop_at_depth (loop, loop_depth (outmost_loop) + 1);
>>>> +    }
>>>>
>>>> could be instead written as
>>>>
>>>>   coldest_loop = coldest_outermost_loop[loop->num];
>>>>   if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
>>>>     return outermost_loop;
>>>>   return coldest_loop;
>>>>
>>>> ?  And in the usual case coldest_outermost_loop[L] would be the loop tree
>>>> root.
>>>> It should be possible to compute such cache in a DFS walk of the loop tree
>>>> (the loop iterator by default visits in such order).
>>>
>>>
>>> Thanks.  Updated the patch with your suggestion.  Not sure whether it 
>>> strictly
>>> conforms to your comments.  Though the patch passed all my added 
>>> tests(coverage not enough),
>>> I am still a bit worried if pre-computed coldest_loop is outside of 
>>> outermost_loop, but
>>> outermost_loop is not the COLDEST LOOP, i.e. (outer->inner)
>>>
>>>  [loop tree root, coldest_loop, outermost_loop,..., second_coldest_loop, 
>>> ..., loop],
>>>
>>> then function find_coldest_out_loop will return a loop NOT accord with our
>>> expectation, that should return second_coldest_loop instead of 
>>> outermost_loop?
>> Hmm, interesting - yes.  I guess the common case will be that the 
>> pre-computed
>> outermost loop will be the loop at depth 1 since outer loops tend to
>> be colder than
>> inner loops?  That would then defeat the whole exercise.
> 
> It is not easy to construct such cases, But finally I got below results,
> 
> 1) many cases inner loop is hotter than outer loop, for example:
> 
> loop 1's coldest_outermost_loop is 1, colder_than_inner_loop is NULL
> loop 2's coldest_outermost_loop is 1, colder_than_inner_loop is 1
> loop 3's coldest_outermost_loop is 1, colder_than_inner_loop is 2
> loop 4's coldest_outermost_loop is 1, colder_than_inner_loop is 2
> 
> 
> 2) But there are also cases inner loop is colder than outer loop, like:
> 
> loop 1's coldest outermost loop is 1, colder_than_inner_loop is NULL
> loop 2's coldest outermost loop is 2, colder_than_inner_loop is NULL
> loop 3's coldest outermost loop is 3, colder_than_inner_loop is NULL
> 
> 
>>
>> To optimize the common case but not avoiding iteration in the cases we care
>> about we could instead cache the next outermost loop that is _not_ colder
>> than loop.  So for your [ ... ] example above we'd have
>> hotter_than_inner_loop[loop] == outer (second_coldest_loop), where the
>> candidate would then be 'second_coldest_loop' and we'd then iterate
>> to hotter_than_inner_loop[hotter_than_inner_loop[loop]] to find the next
>> cold candidate we can compare against?  For the common case we'd
>> have hotter_than_inner_loop[loop] == NULL (no such loop) and we then
>> simply pick 'outermost_loop'.
> 
> Thanks.  It was difficult to understand, but finally I got to know what you
> want to express :)
> 
> We should cache the next loop that is *colder* than loop instead of '_not_ 
> colder
> than loop', and 'hotter_than_inner_loop' should be 'colder_than_inner_loop',
> then it makes sense if the coldest loop is outside of outermost loop, 
> continue to
> find a colder loop between outermost loop and current loop in
> colder_than_inner_loop[loop->num]?  Hope I understood you correctly...
> 
>>
>> One comment on the patch itself below.
>>
> 
> The loop in fill_cold_out_loop is also removed in the updated v7 patch.
> 
> 
> 
> [PATCH v7 2/2] Don't move cold code out of loop by checking bb count
> 
> From: Xiong Hu Luo 
> 
> v7 changes:
> 1. Refine get_coldest_out_loop to replace loop with checking
> pre-computed coldest_outermost_loop and colder_than_inner_loop.
> 2. Add function fill_cold_out_loop, compute coldest_outermost_loop and
> colder_than_inner_loop recursively without loop.
> 
> v6 changes:
> 1. Add function fill_coldest_out_loop to pre compute the coldest
> outermost loop for each loop.
> 2. Rename find_coldest_out_loop to get_coldest_out_loop.
> 3. Add testcase ssa-lim-22.c to differentiate with ssa-lim-19.c.
> 
> v5 changes:
> 1. Refine comments for new functions.
> 2. Use basic_block instead of count in bb_colder_than_loop_preheader
> to align with function name.
> 3. Refine with simpler implementation for get_coldest_out_loop and
> ref_in_loop_hot_body::operator for better understanding.
> 
> v4 changes:
> 1. Sort out profile_count 

Re: [PATCH v3 1/4] Fix loop split incorrect count and probability

2021-11-23 Thread Xionghu Luo via Gcc-patches
Gentle ping, thanks.

[PATCH v3] Fix loop split incorrect count and probability

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583626.html


On 2021/11/8 14:09, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/10/27 15:44, Jan Hubicka wrote:
>>> On Wed, 27 Oct 2021, Jan Hubicka wrote:
>>>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>   * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>>>   (do_split_loop_on_cond): Likewise.
>>>>> ---
>>>>>  gcc/tree-ssa-loop-split.c | 25 -
>>>>>  1 file changed, 16 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>>> index 3f6ad046623..d30782888f3 100644
>>>>> --- a/gcc/tree-ssa-loop-split.c
>>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>>> @@ -575,7 +575,11 @@ split_loop (class loop *loop1)
>>>>>   stmts2);
>>>>>   tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>>>>>   if (!initial_true)
>>>>> -   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
>>>>> +   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>> +
>>>>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>>> +? EDGE_SUCC (bbs[i], 0)
>>>>> +: EDGE_SUCC (bbs[i], 1);
>>>>>  
>>>>>   /* Now version the loop, placing loop2 after loop1 connecting
>>>>>  them, and fix up SSA form for that.  */
>>>>> @@ -583,10 +587,10 @@ split_loop (class loop *loop1)
>>>>>   basic_block cond_bb;
>>>>>  
>>>>>   class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>>> -profile_probability::always (),
>>>>> -profile_probability::always (),
>>>>> -profile_probability::always (),
>>>>> -profile_probability::always (),
>>>>> +true_edge->probability,
>>>>> +true_edge->probability.invert (),
>>>>> +true_edge->probability,
>>>>> +true_edge->probability.invert (),
>>>>>  true);

>>>> As discussed yesterday, for loop of form
>>>>
>>>> for (...)
>>>>   if (cond)
>>>>     cond = something();
>>>>   else
>>>>     something2
>>>>
>>>> Split as
>>>
>>> Note that you are missing to conditionalize loop1 execution
>>> on 'cond' (not sure if that makes a difference).
>> You are right - forgot to mention that.
>>
>> Entry conditional makes no difference on scaling stmts inside loop but
>> affects its header and expected trip count. We however need to set up
>> probability of this conditional (and preheader count if it exists).
>> There is no general way to read the probability of this initial
>> conditional from cfg profile.  So I guess we are stuck with guessing
>> some arbitrary value. I guess common case is that cond is true first
>> iteration though and often we can easily see that from PHI node
>> initializing the test variable.
>>
>> Other thing that changes is expected number of iterations of the split
>> loops, so we may want to update the exit conditional probability
>> accordingly...
>>
> Sorry for the late reply.  The below updated patch mainly solves the issues
> you pointed out:
>   - profile count proportion for both original loop and copied loop
> without dropping down the true branch's count;
>   - probability update in the two loops and between the two loops;
>   - number of iterations update/check for split_loop.
> 
> 
> [PATCH v3] Fix loop split incorrect count and probability
> 
> In tree-ssa-loop-split.c, split_loop and split_loop_on_cond perform two
> kinds of split.  split_loop only works for a single loop and inserts an
> edge at the exit when splitting, while split_loop_on_cond is not limited
> to a single loop and inserts an edge at the latch when splitting.  Both
> split behaviors should update the loop count and probability.  For
> split_loop, the loop split condition is moved in front of loop1 and loop2,
> whereas split_loop_on_cond moves the condition between loop1 and loop2.
> This patch does:
> 1) profile count proportion for both original loop and copied loop
> without dropping down the true branch's count;
> 2) probability update in and between the two loops;
> 3) number of iterations update for split_loop.
> 
> Regression tested pass, OK for master?
> 
> Changes diff for split_loop and split_loop_on_cond cases:
> 
> 1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
> ...
> [local count: 118111600]:
>if (beg_5(D) < end_8(D))
>  goto ; [89.00%]
>else
>  goto ; [11.00%]
> 
> [local count: 105119324]:
>if (beg2_6(D) < c_9(D))
> -goto ; [100.00%]
> +goto ; [33.00%]
>else
> -goto ; [100.00%]
> +goto ; [67.00%]
> 
> -   [local count: 105119324]:
> +   [local count: 34689377]:
>_25 = beg_5(D) + 1;
>_26 = end_8(D) - 
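
As a rough check of the numbers in the dump above, the new block count can be reproduced by scaling the incoming count by the branch probability.  The sketch below uses whole percentages and hypothetical helper names (scale_count, split_counts); GCC's real profile_count and profile_probability types use their own fixed-point representation, so this only approximates the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a GCC function): scale a block count by a
// probability given in whole percent, rounding to the nearest integer.
inline std::int64_t scale_count(std::int64_t count, int percent) {
  return (count * percent + 50) / 100;
}

// Counts reaching the two successors of a conditional whose true branch
// is taken with probability true_percent.
struct SplitCounts {
  std::int64_t true_count;
  std::int64_t false_count;
};

inline SplitCounts split_counts(std::int64_t count, int true_percent) {
  return {scale_count(count, true_percent),
          scale_count(count, 100 - true_percent)};
}
```

With a count of 105119324 and a 33% true probability, scale_count gives 34689377, which matches the "[local count: 34689377]" line in the patched dump; the remaining 67% accounts for the rest of the original count.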

[PATCH v2] Fix incorrect loop exit edge probability [PR103270]

2021-11-23 Thread Xionghu Luo via Gcc-patches
On 2021/11/23 17:50, Jan Hubicka wrote:
>> On Tue, Nov 23, 2021 at 6:52 AM Xionghu Luo  wrote:
>>>
>>> r12-4526 cancelled jump thread path rotates loop. It exposes a issue in
>>> profile-estimate when predict_extra_loop_exits, outer loop's exit edge
>>> is marked as inner loop's extra loop exit and set with incorrect
>>> prediction, then a hot inner loop will become cold loop finally through
>>> optimizations, this patch ignores the EDGE_DFS_BACK edge when searching
>>> extra exit edges to avoid unexpected predict_edge.
>>
>> Not sure how outer vs. inner loop exit correlates with EDGE_DFS_BACK,
>> I have expected a check based on which loop is exited by the edge instead?
>> A backedge should never be an exit, no?
>>
>> Note that the profile pass does not yet mark backedges so EDGE_DFS_BACK
>> settings are unreliable.
> 
> So we have two nested loops and an exit which goes from inner loop and
> exits both loops.  While processing outer loop we set pretty high exit
> probability that is not good for inner loop?

No, the edge belongs to the outer loop only.  Can an exit edge belong to
two different loops at the same time?
Exit edges are iterated with LI_FROM_INNERMOST in predict_loops; if an edge
already has a prediction (queried via edge_predicted_by_p), maybe_predict_edge
returns early and does not set it again.

The CFG is shown in the Bugzilla comment linked below (the ASCII diagram did
not survive in this archive).  In it, the edge pair ending in 6->7 is set to
(33%,67%) by l3 unexpectedly.

FYI: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103270#c5

> 
> I guess we could just check if exit edge source basic block has same
> loop depth as the loop we are processing?
> 


Thanks for the suggestion, it works.  The loop checks already exist in
predict_paths_for_bb, so we just need to pass down the loop argument.
Updated as v2 patch.


v2-0001-Fix-incorrect-loop-exit-edge-probability-PR103270.patch

r12-4526 cancelled jump threading paths that rotate loops.  This exposes an
issue in profile-estimate: in predict_extra_loop_exits, the outer loop's exit
edge is marked as the inner loop's extra loop exit and given an incorrect
prediction, so a hot inner loop eventually becomes a cold loop through later
optimizations.  This patch adds a loop check when searching for extra exit
edges to avoid the unexpected predict_edge from predict_paths_for_bb.

Regression tested pass on P8 & x86, OK for master?

gcc/ChangeLog:

PR middle-end/103270
* predict.c (predict_extra_loop_exits): Add loop parameter.
(predict_loops): Call with loop argument.

gcc/testsuite/ChangeLog:

PR middle-end/103270
* gcc.dg/pr103270.c: New test.
---
 gcc/predict.c   | 10 ++
 gcc/testsuite/gcc.dg/pr103270.c | 19 +++
 2 files changed, 25 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr103270.c

diff --git a/gcc/predict.c b/gcc/predict.c
index 68b11135680..082782ec4e9 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -1859,7 +1859,7 @@ predict_iv_comparison (class loop *loop, basic_block bb,
exits to predict them using PRED_LOOP_EXTRA_EXIT.  */
 
 static void
-predict_extra_loop_exits (edge exit_edge)
+predict_extra_loop_exits (class loop *loop, edge exit_edge)
 {
   unsigned i;
   bool check_value_one;
@@ -1912,12 +1912,14 @@ predict_extra_loop_exits (edge exit_edge)
continue;
   if (EDGE_COUNT (e->src->succs) != 1)
{
- predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
+ predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
+loop);
  continue;
}
 
   FOR_EACH_EDGE (e1, ei, e->src->preds)
-   predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
+   predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
+  loop);
 }
 }
 
@@ -2009,7 +2011,7 @@ predict_loops (void)
 ex->src->index, ex->dest->index);
  continue;
}
- predict_extra_loop_exits (ex);
+ predict_extra_loop_exits (loop, ex);
 
  if (number_of_iterations_exit (loop, ex, &niter_desc, false, false))
niter = niter_desc.niter;
diff --git a/gcc/testsuite/gcc.dg/pr103270.c b/gcc/testsuite/gcc.dg/pr103270.c
new file mode 100644
index 000..819310e360e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr103270.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
+
+void test(int a, int* i)
+{
+  for (; a < 5; ++a)
+{
+  int b = 0;
+  int c = 0;
+  for (; b != -11; b--)
+   for (int d = 0; d ==0; d++)
+ {
+   *i += c & a;
+   c = b;
+ }
+}
+}
+
+/* { dg-final { scan-tree-dump-not "extra loop exit heuristics of edge\[^:\]*:" "profile_estimate"} } */
-- 
2.25.1
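
The idea behind the v2 patch, only predicting an edge on behalf of the loop that actually owns it, can be sketched with toy data structures.  Everything below (Loop, Block, Edge and the helper names) is a hypothetical stand-in and not GCC's cfgloop API; it only illustrates why an edge whose source block belongs to the outer loop should be skipped while the inner loop is being processed.

```cpp
#include <cassert>

// Toy stand-ins for loop and CFG types; all names here are hypothetical.
struct Loop {
  const Loop *outer;  // enclosing loop, or nullptr at the outermost level
};

struct Block {
  const Loop *loop_father;  // innermost loop containing this block
};

struct Edge {
  const Block *src;
  const Block *dest;
};

// True if bb lies in loop or in any loop nested inside it.
inline bool block_inside_loop(const Loop *loop, const Block *bb) {
  for (const Loop *l = bb->loop_father; l != nullptr; l = l->outer)
    if (l == loop)
      return true;
  return false;
}

// Apply an "extra exit" style prediction only when the edge's source block
// belongs directly to the loop being processed, so that an edge owned by
// the outer loop is left alone while the inner loop is handled.
inline bool should_predict_for_loop(const Loop *loop, const Edge &e) {
  return e.src->loop_father == loop;
}
```

In this model an edge leaving a block of the outer loop is rejected when the inner loop is processed and accepted when the outer loop is processed, which is the behavior the patch description asks for.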






Re: [PATCH v2] c++: Fix missing NSDMI diagnostic in C++98 [PR103347]

2021-11-23 Thread Jason Merrill via Gcc-patches

On 11/23/21 17:06, Marek Polacek wrote:

On Tue, Nov 23, 2021 at 02:42:12PM -0500, Jason Merrill wrote:

On 11/22/21 17:17, Marek Polacek wrote:

Here the problem is that we aren't detecting a NSDMI in C++98:

struct A {
void *x = NULL;
};

because maybe_warn_cpp0x uses input_location and that happens to point
to NULL which comes from a system header.  Jakub suggested changing the
location to the '=', thereby avoiding the system header problem.  To
that end, I've added a new location_t member into cp_declarator.  This
member is used when this declarator is part of an init-declarator.  The
rest of the changes is obvious.  I've also taken the liberty of adding
loc_or_input_loc, since I want to avoid checking for UNKNOWN_LOCATION.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/103347

gcc/cp/ChangeLog:

* cp-tree.h (struct cp_declarator): Add a location_t member.
(maybe_warn_cpp0x): Add a location_t parameter with a default argument.
(loc_or_input_loc): New.
* decl.c (grokdeclarator): Use loc_or_input_loc.  Pass init_loc down
to maybe_warn_cpp0x.
* error.c (maybe_warn_cpp0x): Add a location_t parameter.  Use it.
* parser.c (make_declarator): Initialize init_loc.
(cp_parser_member_declaration): Set init_loc.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-warn1.C: New test.
* g++.dg/cpp0x/nsdmi-warn1.h: New file.
---
   gcc/cp/cp-tree.h | 16 +---
   gcc/cp/decl.c| 22 +---
   gcc/cp/error.c   | 32 
   gcc/cp/parser.c  |  2 ++
   gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.C | 10 
   gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.h |  2 ++
   6 files changed, 55 insertions(+), 29 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.C
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.h

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3f56cb90d14..2037082b0c7 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6231,9 +6231,11 @@ struct cp_declarator {
 /* If this declarator is parenthesized, this the open-paren.  It is
UNKNOWN_LOCATION when not parenthesized.  */
 location_t parenthesized;
-
-  location_t id_loc; /* Currently only set for cdk_id, cdk_decomp and
-   cdk_function. */
+  /* Currently only set for cdk_id, cdk_decomp and cdk_function.  */
+  location_t id_loc;
+  /* If this declarator is part of an init-declarator, the location of the
+ initializer.  */


Currently this comment is inaccurate because we don't set it for all
init-declarators.  That should be pretty trivial to do, even if we don't use
the location yet in other contexts.


The following patch sets ->init_loc in a few more spots.  I've looked
at every cp_parser_declarator call and if it's followed by a =/{, I
set ->init_loc.  Pedantically, it's also an init-declarator if the
declarator is followed by a requires-clause, but I've not looked for
those cases.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
Here the problem is that we aren't detecting a NSDMI in C++98:

struct A {
   void *x = NULL;
};

because maybe_warn_cpp0x uses input_location and that happens to point
to NULL which comes from a system header.  Jakub suggested changing the
location to the '=', thereby avoiding the system header problem.  To
that end, I've added a new location_t member into cp_declarator.  This
member is used when this declarator is part of an init-declarator.  The
rest of the changes is obvious.  I've also taken the liberty of adding
loc_or_input_loc, since I want to avoid checking for UNKNOWN_LOCATION.

PR c++/103347

gcc/cp/ChangeLog:

* cp-tree.h (struct cp_declarator): Add a location_t member.
(maybe_warn_cpp0x): Add a location_t parameter with a default argument.
(loc_or_input_loc): New.
* decl.c (grokdeclarator): Use loc_or_input_loc.  Pass init_loc down
to maybe_warn_cpp0x.
* error.c (maybe_warn_cpp0x): Add a location_t parameter.  Use it.
* parser.c (make_declarator): Initialize init_loc.
(cp_parser_member_declaration): Set init_loc.
(cp_parser_condition): Likewise.
(cp_parser_init_declarator): Likewise.
(cp_parser_parameter_declaration): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-warn1.C: New test.
* g++.dg/cpp0x/nsdmi-warn1.h: New file.
---
  gcc/cp/cp-tree.h | 16 +---
  gcc/cp/decl.c| 22 +---
  gcc/cp/error.c   | 32 
  gcc/cp/parser.c  |  8 ++
  gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.C | 10 
  gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.h |  2 ++
  6 files changed, 61 insertions(+), 29 deletions(-)
  create 

[r12-5483 Regression] FAIL: c-c++-common/attr-retain-9.c -Wc++-compat (test for excess errors) on Linux/x86_64

2021-11-23 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

30ba058f77eedfaf7a0582f5d42aff949710bce4 is the first bad commit
commit 30ba058f77eedfaf7a0582f5d42aff949710bce4
Author: Martin Sebor 
Date:   Tue Nov 23 15:30:29 2021 -0700

Implement -Winfinite-recursion [PR88232].

caused

FAIL: c-c++-common/attr-retain-5.c  -std=gnu++14 (test for excess errors)
FAIL: c-c++-common/attr-retain-5.c  -std=gnu++17 (test for excess errors)
FAIL: c-c++-common/attr-retain-5.c  -std=gnu++2a (test for excess errors)
FAIL: c-c++-common/attr-retain-5.c  -std=gnu++98 (test for excess errors)
FAIL: c-c++-common/attr-retain-5.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/attr-retain-6.c  -std=gnu++14 (test for excess errors)
FAIL: c-c++-common/attr-retain-6.c  -std=gnu++17 (test for excess errors)
FAIL: c-c++-common/attr-retain-6.c  -std=gnu++2a (test for excess errors)
FAIL: c-c++-common/attr-retain-6.c  -std=gnu++98 (test for excess errors)
FAIL: c-c++-common/attr-retain-6.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/attr-retain-9.c  -std=gnu++14 (test for excess errors)
FAIL: c-c++-common/attr-retain-9.c  -std=gnu++17 (test for excess errors)
FAIL: c-c++-common/attr-retain-9.c  -std=gnu++2a (test for excess errors)
FAIL: c-c++-common/attr-retain-9.c  -std=gnu++98 (test for excess errors)
FAIL: c-c++-common/attr-retain-9.c  -Wc++-compat  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5483/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-5.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-5.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-5.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-5.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-6.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-6.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-6.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-6.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-9.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-9.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-9.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=c-c++-common/attr-retain-9.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH] c++: Implement C++23 P2128R6 - Multidimensional subscript operator [PR102611]

2021-11-23 Thread Jason Merrill via Gcc-patches

On 10/14/21 04:26, Jakub Jelinek wrote:

Hi!

The following patch implements the C++23 Multidimensional subscript operator
P2128R6 paper.
As C++20 and older only allow a single expression in between []s (albeit
for C++20 with a deprecation warning if it is a comma expression) and even
in C++23 and for the coming years I think the vast majority of subscript
expressions will still have a single expression and even in C++23 it is
quite special, as e.g. the builtin operator requires exactly one
assignment expression, the patch attempts to optimize for that case and
if possible not to slow down that common case (or use more memory for it).
So, already during parsing it differentiates between that (uses a single
index_exp tree in that case) and the new cases (zero or two+ expressions
in the list), for which it sets index_exp to NULL_TREE and uses a
releasing_vec instead similarly to how e.g. finish_call_expr uses it.

In call.c it introduces new functions build_op_subscript{,_1} which are
something in between build_new_op{,_1} and build_op_call{,_1}.
The former requires fixed number of arguments (and the patch still uses
it for the common case of subscript with exactly one index expression),
the latter handles variable number of arguments but is too CALL_EXPR specific
and handles various cases that are unnecessary for the subscript.
Right now the subscript for 0 or 2+ expressions doesn't need to deal with
builtin candidates and so is quite simple.

As discussed in the paper, for backwards compatibility, if for 2+ index
expressions build_op_subscript fails (called with tf_none) and the
expressions together form a valid comma expression (again checked with
tf_none), it is used that C++20-ish way with a pedwarn about it, but if
even that fails, build_op_subscript is called again with standard complain
flags to diagnose it in the new way.  And similarly for the builtin case.

The -Wcomma-subscript warning used to be enabled by default unless
-Wno-deprecated.  Since the C/C++98..20 behavior is no longer deprecated,
but ill-formed or changed meaning, it is now for C++23 enabled by
default regardless of -Wno-deprecated and controls the pedwarn (but not the
errors emitted if something wasn't valid before and isn't valid in C++23
either).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-10-14  Jakub Jelinek  

PR c++/102611
gcc/
* doc/invoke.texi (-Wcomma-subscript): Document that for
-std=c++20 the option isn't enabled by default with -Wno-deprecated
but for -std=c++23 it is.
gcc/c-family/
* c-opts.c (c_common_post_options): Enable -Wcomma-subscript by
default for C++23 regardless of warn_deprecated.
* c-cppbuiltin.c (c_cpp_builtins): Predefine
__cpp_multidimensional_subscript=202110L for C++23.
gcc/cp/
* cp-tree.h (build_op_subscript): Implement P2128R6
- Multidimensional subscript operator.  Declare.
(grok_array_decl): Remove bool argument, add vec<tree, va_gc> **
and tsubst_flags_t arguments.
(build_min_non_dep_op_overload): Declare another overload.
* parser.c (cp_parser_postfix_expression): Mention C++23 syntax
in function comment.  For C++23 parse zero or more than one
initializer clauses in expression list, adjust grok_array_decl
caller.
(cp_parser_builtin_offsetof): Adjust grok_array_decl caller.
* decl.c (grok_op_properties): For C++23 don't check number
of arguments of operator[].
* decl2.c (grok_array_decl): Remove decltype_p argument, add
index_exp_list and complain arguments.  If index_exp is NULL,
handle *index_exp_list as the subscript expression list.
* tree.c (build_min_non_dep_op_overload): New overload.
* call.c (build_op_subscript_1, build_op_subscript): New
functions.
* pt.c (tsubst_copy_and_build) <case ARRAY_REF>: If second
operand is magic CALL_EXPR with ovl_op_identifier (ARRAY_REF)
as CALL_EXPR_FN, tsubst CALL_EXPR arguments including expanding
pack expressions in it and call grok_array_decl instead of
build_x_array_ref.
* semantics.c (handle_omp_array_sections_1): Adjust grok_array_decl
caller.
gcc/testsuite/
* g++.dg/cpp2a/comma1.C: Expect different diagnostics for C++23.
* g++.dg/cpp2a/comma3.C: Likewise.
* g++.dg/cpp2a/comma4.C: Expect diagnostics for C++23.
* g++.dg/cpp2a/comma5.C: Expect different diagnostics for C++23.
* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_multidimensional_subscript
predefined macro.
* g++.dg/cpp23/subscript1.C: New test.
* g++.dg/cpp23/subscript2.C: New test.
* g++.dg/cpp23/subscript3.C: New test.
* g++.dg/cpp23/subscript4.C: New test.
* g++.dg/cpp23/subscript5.C: New test.
* g++.dg/cpp23/subscript6.C: New test.

--- gcc/doc/invoke.texi.jj  2021-10-12 09:08:25.781088065 +0200
+++ gcc/doc/invoke.texi 2021-10-13 
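
For readers unfamiliar with P2128R6, here is a minimal sketch of what the feature allows at the source level.  The Grid type and the demo function are hypothetical illustrations, not taken from the patch or its tests; the two-argument operator[] is guarded by the __cpp_multidimensional_subscript macro mentioned in the ChangeLog so the example still compiles in older language modes, where it falls back to a named accessor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 2-D container used only for illustration.
template <std::size_t Rows, std::size_t Cols>
struct Grid {
  std::array<int, Rows * Cols> cells{};

#if defined(__cpp_multidimensional_subscript) \
    && __cpp_multidimensional_subscript >= 202110L
  // C++23: operator[] may take zero or more than one argument.
  int &operator[](std::size_t r, std::size_t c) { return cells[r * Cols + c]; }
#endif

  // Named accessor that works in every language mode.
  int &at(std::size_t r, std::size_t c) { return cells[r * Cols + c]; }
};

// Stores a value and reads it back, using g[r, c] where available.
inline int demo() {
  Grid<2, 3> g;
  g.at(1, 2) = 42;
#if defined(__cpp_multidimensional_subscript) \
    && __cpp_multidimensional_subscript >= 202110L
  return g[1, 2];  // calls operator[](1, 2), not a comma expression
#else
  return g.at(1, 2);
#endif
}
```

In C++20 and earlier, g[1, 2] on a type with a single-argument operator[] would have been parsed as a comma expression, which is the backwards-compatibility case the patch description handles with a pedwarn.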

Re: [PATCH v2 0/2] RISC-V: add gcc support for Scalar Cryptography v1.0.0-rc6

2021-11-23 Thread Palmer Dabbelt

[Changing to Jim's new address]

On Mon, 22 Nov 2021 00:19:08 PST (-0800), s...@isrc.iscas.ac.cn wrote:
> From: SiYu Wu 
>
> This patch adds gcc backend support for the RISC-V Scalar Cryptography
> Extension (k-ext), including machine description, builtin defines and
> testcases for each k-ext subset.
>
> A note about Zbkx: Zbkx should be implemented in bitmanip's Zbp, but
> since Zbp is not included in the bitmanip spec v1.0, and crypto's v1.0
> release will be earlier than bitmanip's next release, for now we are
> implementing it here.
>
> Version logs:
>
> v2: As Kito mentions, this patch now only includes the arch string related
> stuff; the builtins and md changes are not included, waiting for the
> builtins and intrinsics to be added to the spec. Also removed the
> unnecessary patches and added ChangeLogs.

I don't think there's anything wrong with what's here, but IMO we should
hold off on merging until GCC does something with these extensions.

IIUC all this enables is passing "-march=*Zk*" instead of
"-Wa,-march=*Zk*", and while that is useful I'm worried it'll just make
more of a headache for users who lose a simple way to detect the
intrinsics.  IMO forcing users to pass -Wa properly encodes the "GCC
doesn't support these, but binutils does" scenario pretty sanely, and
users doing things at this level of complexity should be used to that
already because it happens somewhat frequently.

I'm not sure if I'm missing some use case for this, though.

> SiYu Wu (2):
>   RISC-V: Add option defines for Scalar Cryptography
>   RISC-V: Add implied defines of Zk, Zkn and Zks
>
>  gcc/common/config/riscv/riscv-common.c | 38 +-
>  gcc/config/riscv/arch-canonicalize | 16 ++-
>  gcc/config/riscv/riscv-opts.h  | 22 +++
>  gcc/config/riscv/riscv.opt |  3 ++
>  4 files changed, 77 insertions(+), 2 deletions(-)


Re: libstdc++: Make atomic::wait() const [PR102994]

2021-11-23 Thread Thomas Rodgers via Gcc-patches
const qualification was also missing in the free functions for
wait/wait_explicit/notify_one/notify_all. Revised patch attached.

On Tue, Nov 9, 2021 at 11:40 AM Jonathan Wakely  wrote:

> On Tue, 9 Nov 2021 at 18:09, Thomas Rodgers wrote:
>
>> Revised patch attached.
>>
>
> OK for trunk and gcc-11, thanks.
>
>
>
>> On Fri, Nov 5, 2021 at 4:46 PM Jonathan Wakely 
>> wrote:
>>
>>> On Fri, 5 Nov 2021 at 21:51, Jonathan Wakely via Libstdc++
>>>  wrote:
>>> >
>>> > OK, thanks.
>>>
>>> Actually, we should really have a test to verify it can be called on a
>>> const object. Please add something when you commit, it can be dumb and
>>> simple, it just needs to verify that it can be called.
>>>
>>>
>>> >
>>> >
>>> > On Fri, 5 Nov 2021 at 21:46, Thomas Rodgers via Libstdc++ <
>>> > libstd...@gcc.gnu.org> wrote:
>>> >
>>> > >
>>> > >
>>>
>>>
From 337c147b5bb0265522d5aac4beefb3dec1ebe026 Mon Sep 17 00:00:00 2001
From: Thomas Rodgers 
Date: Tue, 9 Nov 2021 09:42:49 -0800
Subject: [PATCH] libstdc++: Make atomic::wait() const [PR102994]

This was an oversight in the original commit adding wait/notify
to atomic.

libstdc++-v3/ChangeLog:

	PR libstdc++/102994
	* include/bits/atomic_base.h (__atomic_base<_PTp*>::wait()):
	Add const qualifier.
	* include/std/atomic (atomic<_Tp*>::wait(), atomic_wait(),
	atomic_wait_explicit(), atomic_notify_one(), atomic_notify_all()):
	Likewise.
	* testsuite/29_atomics/atomic/wait_notify/102994.cc:
	New test.
---
 libstdc++-v3/include/bits/atomic_base.h   |  2 +-
 libstdc++-v3/include/std/atomic   |  8 
 .../29_atomics/atomic/wait_notify/102994.cc   | 19 +++
 3 files changed, 24 insertions(+), 5 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/102994.cc

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 9e18aadadaf..a104adc1a10 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -893,7 +893,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cpp_lib_atomic_wait
   _GLIBCXX_ALWAYS_INLINE void
   wait(__pointer_type __old,
-	   memory_order __m = memory_order_seq_cst) noexcept
+	   memory_order __m = memory_order_seq_cst) const noexcept
   {
 	std::__atomic_wait_address_v(&_M_p, __old,
  [__m, this]
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 936dd50ba1c..9b827b425dc 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -646,9 +646,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__cmpexch_failure_order(__m));
   }
 
-#if __cpp_lib_atomic_wait 
+#if __cpp_lib_atomic_wait
 void
-wait(__pointer_type __old, memory_order __m = memory_order_seq_cst) noexcept
+wait(__pointer_type __old, memory_order __m = memory_order_seq_cst) const noexcept
 { _M_b.wait(__old, __m); }
 
 // TODO add const volatile overload
@@ -1434,12 +1434,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _Tp>
 inline void
-atomic_notify_one(atomic<_Tp>* __a) noexcept
+atomic_notify_one(const atomic<_Tp>* __a) noexcept
 { __a->notify_one(); }
 
   template<typename _Tp>
 inline void
-atomic_notify_all(atomic<_Tp>* __a) noexcept
+atomic_notify_all(const atomic<_Tp>* __a) noexcept
 { __a->notify_all(); }
 #endif // __cpp_lib_atomic_wait
 
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/102994.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/102994.cc
new file mode 100644
index 000..28c3d66f451
--- /dev/null
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/102994.cc
@@ -0,0 +1,19 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+// { dg-require-gthreads "" }
+
+#include <atomic>
+
+void
+test1(const std::atomic<char*> &a, char*p)
+{
+  a.wait(p);
+}
+
+void
+test2(const std::atomic<int>* a, int v)
+{
+  std::atomic_wait(a, v);
+  std::atomic_notify_one(a);
+  std::atomic_notify_all(a);
+}
-- 
2.31.1
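
A small usage sketch of what the const qualification permits: code that only holds a const reference to an atomic can still wait on it.  The helper name wait_until_changed is hypothetical, and the busy-wait fallback is only there so the example compiles where the library does not define __cpp_lib_atomic_wait; it is not part of the patch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: block until flag no longer holds old_value, then
// return the value observed.  Takes the atomic by const reference.
inline int wait_until_changed(const std::atomic<int> &flag, int old_value) {
#if defined(__cpp_lib_atomic_wait)
  flag.wait(old_value);  // wait() is const-qualified, so this is allowed
#else
  while (flag.load() == old_value) {
    // Fallback busy-wait for libraries without atomic wait support.
  }
#endif
  return flag.load();
}
```

If the atomic already holds a value different from old_value, the call returns without blocking, so a caller can use it safely as a "wait if still unchanged" primitive.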



[committed][wwwdocs] Remove section on traditional C from htdocs/projects/beginner.html

2021-11-23 Thread Eric Gallager via Gcc-patches
On Fri, Nov 19, 2021 at 8:14 AM Eric Gallager  wrote:
>
> On Fri, Nov 19, 2021 at 1:48 AM Gerald Pfeifer  wrote:
> >
> > Cool, thank you!
> >
> > Please feel free to commit patches like this without asking for
> > approval (though I'm happy to review and approve).
> >
> > Gerald
> >
>
> OK thanks; committed as dbaebcd

I've also committed one to remove the section on traditional C now:
https://gcc.gnu.org/git/?p=gcc-wwwdocs.git;a=commitdiff;h=ca83c13ad6bf0d351220dafa36264ebc7a6b7816


Re: [PATCH v3] c-format: Add -Wformat-int-precision option [PR80060]

2021-11-23 Thread Joseph Myers
On Tue, 23 Nov 2021, Daniil Stas via Gcc-patches wrote:

> On Mon, 22 Nov 2021 20:35:03 +
> Joseph Myers  wrote:
> 
> > On Sun, 21 Nov 2021, Daniil Stas via Gcc-patches wrote:
> > 
> > > This option is enabled by default when -Wformat option is enabled. A
> > > user can specify -Wno-format-int-precision to disable emitting
> > > warnings when passing an argument of an incompatible integer type to
> > > a 'd', 'i', 'o', 'u', 'x', or 'X' conversion specifier when it has
> > > the same precision as the expected type.  
> > 
> > I'd expect this to apply to 'b' and 'B' as well (affects commit
> > message, ChangeLog entry, option help string, documentation).
> > 
> 
> Hi Joseph,
> 
> I can't find any description of these specifiers anywhere. And looks

They're new specifiers in C23.  See the most recent working draft 
.

> like gcc doesn't recognize them when I try to compile a sample program

GCC should recognize them (i.e., not warn about them with -Wformat) if you 
have commit bd6f2c63168e89bede631daf8b673eab16acc747 (12 October).

> with them (I just get %B printed when I run the program).

If you want runtime support for those specifiers in printf, you'll need a 
libc implementation with support for them.  In glibc that means commit 
309548bec3b89022bbc81a372ec3e9240211d799 (10 November) or later, for 
example.

-- 
Joseph S. Myers
jos...@codesourcery.com
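
For reference, the C23 %b conversion prints an unsigned value in binary notation.  Where the C library does not yet support it at run time, the same digits can be produced by hand; the helper below (to_binary, a hypothetical name) is a minimal sketch of that and is not part of GCC or glibc.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: format value as the binary digits that a %b
// conversion without flags or width would be expected to print.
inline std::string to_binary(unsigned value) {
  if (value == 0)
    return "0";
  std::string digits;
  while (value != 0) {
    // Prepend the lowest bit, then shift it out.
    digits.insert(digits.begin(), static_cast<char>('0' + (value & 1u)));
    value >>= 1;
  }
  return digits;
}
```

For example, to_binary(5) yields "101".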


Re: [PATCH] handle member references in -Waddress [PR96507]

2021-11-23 Thread Martin Sebor via Gcc-patches

On 11/23/21 12:59 PM, Jason Merrill wrote:
> On 11/22/21 18:21, Marek Polacek wrote:
>> On Mon, Nov 22, 2021 at 04:00:56PM -0700, Martin Sebor via Gcc-patches wrote:
>>> While going through old -Waddress bug reports to close after
>>> the recent improvements to the warning I came across PR 96507
>>> that points out that member references aren't handled.  Since
>>> testing the address of a reference for equality to null is
>>> in general diagnosed, this seems like an oversight worth fixing.
>>> Attached is a change to the C++ front end to diagnose member
>>> references as well.
>>>
>>> Tested on x86_64-linux.
>>>
>>> Martin
>>>
>>> Issue -Waddress also for reference members [PR96507].
>>>
>>> Resolves:
>>> PR c++/96507 - missing -Waddress for member references
>>>
>>> gcc/cp/ChangeLog:
>>>
>>> PR c++/96507
>>> * typeck.c (warn_for_null_address): Handle reference members.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR c++/96507
>>> * g++.dg/warn/Waddress-8.C: New test.
>>>
>>> diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
>>> index 58919aaf13e..694c53eef8a 100644
>>> --- a/gcc/cp/typeck.c
>>> +++ b/gcc/cp/typeck.c
>>> @@ -4676,15 +4676,21 @@ warn_for_null_address (location_t location, tree op, tsubst_flags_t complain)
>>>   "addition %qE and NULL", cop);
>>>     return;
>>>   }
>>> -  else if (CONVERT_EXPR_P (op)
>>> -  && TYPE_REF_P (TREE_TYPE (TREE_OPERAND (op, 0))))
>>> +  else if (CONVERT_EXPR_P (op))
>>>   {
>>> -  STRIP_NOPS (op);
>>> +  tree op0 = TREE_OPERAND (op, 0);
>>> +  if (TYPE_REF_P (TREE_TYPE (op0)))
>>> +    {
>>
>> Isn't this just REFERENCE_REF_P?
>
> No, there's no INDIRECT_REF here.
>
> Martin, I think you don't need to change the test to two levels since
> you don't use the op0 variable again; I think these two lines:
>
>>> +  if (TREE_CODE (op) == COMPONENT_REF)
>>> +    op = TREE_OPERAND (op, 1);
>
> are all the change you need for this fix.  OK that way.

True.  I put it back the way it was and committed it in r12-5484.

Martin

> Jason
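
To illustrate the kind of code PR 96507 is about: taking the address of a reference member yields the address of the object it refers to, so comparing that address against null can never be true.  The Holder type and the helper below are hypothetical and not taken from the patch's Waddress-8.C test; the diagnosed comparison is shown only in a comment so that the example itself compiles without warnings.

```cpp
#include <cassert>

// Hypothetical type with a reference member.
struct Holder {
  int &ref;
};

// The address of a reference member is the address of its referent.
inline bool refers_to(const Holder &h, const int &target) {
  // A test such as `&h.ref != nullptr` is always true; that is the
  // pattern the patch is described as making -Waddress diagnose.
  return &h.ref == &target;
}
```

Since the referent always has a valid address in well-formed code, a null check on &h.ref usually signals a logic error, which is why extending the warning to member references is considered worthwhile in the thread.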





[PATCH v2] c++: Fix missing NSDMI diagnostic in C++98 [PR103347]

2021-11-23 Thread Marek Polacek via Gcc-patches
On Tue, Nov 23, 2021 at 02:42:12PM -0500, Jason Merrill wrote:
> On 11/22/21 17:17, Marek Polacek wrote:
> > Here the problem is that we aren't detecting a NSDMI in C++98:
> > 
> > struct A {
> >void *x = NULL;
> > };
> > 
> > because maybe_warn_cpp0x uses input_location and that happens to point
> > to NULL which comes from a system header.  Jakub suggested changing the
> > location to the '=', thereby avoiding the system header problem.  To
> > that end, I've added a new location_t member into cp_declarator.  This
> > member is used when this declarator is part of an init-declarator.  The
> > rest of the changes is obvious.  I've also taken the liberty of adding
> > loc_or_input_loc, since I want to avoid checking for UNKNOWN_LOCATION.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > PR c++/103347
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (struct cp_declarator): Add a location_t member.
> > (maybe_warn_cpp0x): Add a location_t parameter with a default argument.
> > (loc_or_input_loc): New.
> > * decl.c (grokdeclarator): Use loc_or_input_loc.  Pass init_loc down
> > to maybe_warn_cpp0x.
> > * error.c (maybe_warn_cpp0x): Add a location_t parameter.  Use it.
> > * parser.c (make_declarator): Initialize init_loc.
> > (cp_parser_member_declaration): Set init_loc.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp0x/nsdmi-warn1.C: New test.
> > * g++.dg/cpp0x/nsdmi-warn1.h: New file.
> > ---
> >   gcc/cp/cp-tree.h | 16 +---
> >   gcc/cp/decl.c| 22 +---
> >   gcc/cp/error.c   | 32 
> >   gcc/cp/parser.c  |  2 ++
> >   gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.C | 10 
> >   gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.h |  2 ++
> >   6 files changed, 55 insertions(+), 29 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.h
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index 3f56cb90d14..2037082b0c7 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -6231,9 +6231,11 @@ struct cp_declarator {
> > /* If this declarator is parenthesized, this the open-paren.  It is
> >UNKNOWN_LOCATION when not parenthesized.  */
> > location_t parenthesized;
> > -
> > -  location_t id_loc; /* Currently only set for cdk_id, cdk_decomp and
> > -   cdk_function. */
> > +  /* Currently only set for cdk_id, cdk_decomp and cdk_function.  */
> > +  location_t id_loc;
> > +  /* If this declarator is part of an init-declarator, the location of the
> > + initializer.  */
> 
> Currently this comment is inaccurate because we don't set it for all
> init-declarators.  That should be pretty trivial to do, even if we don't use
> the location yet in other contexts.

The following patch sets ->init_loc in a few more spots.  I've looked
at every cp_parser_declarator call and if it's followed by a =/{, I
set ->init_loc.  Pedantically, it's also an init-declarator if the
declarator is followed by a requires-clause, but I've not looked for
those cases.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here the problem is that we aren't detecting a NSDMI in C++98:

struct A {
  void *x = NULL;
};

because maybe_warn_cpp0x uses input_location and that happens to point
to NULL which comes from a system header.  Jakub suggested changing the
location to the '=', thereby avoiding the system header problem.  To
that end, I've added a new location_t member into cp_declarator.  This
member is used when this declarator is part of an init-declarator.  The
rest of the changes are obvious.  I've also taken the liberty of adding
loc_or_input_loc, since I want to avoid checking for UNKNOWN_LOCATION.
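
For illustration, the intent of the new helper can be sketched as below.
This is a minimal sketch, not the patch itself: the location_t typedef,
the UNKNOWN_LOCATION value and the input_location global are simplified
stand-ins for GCC's real declarations.

```cpp
#include <cassert>

// Stand-ins for GCC's types and globals, for illustration only.
typedef unsigned int location_t;
const location_t UNKNOWN_LOCATION = 0;
location_t input_location = 42;   // pretend "current" location

// Return LOC if it is known, otherwise fall back to input_location,
// so callers need not test for UNKNOWN_LOCATION themselves.
inline location_t
loc_or_input_loc (location_t loc)
{
  return loc == UNKNOWN_LOCATION ? input_location : loc;
}
```

With this shape, a caller such as maybe_warn_cpp0x can be handed
init_loc unconditionally, whether or not the parser managed to set it.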

PR c++/103347

gcc/cp/ChangeLog:

* cp-tree.h (struct cp_declarator): Add a location_t member.
(maybe_warn_cpp0x): Add a location_t parameter with a default argument.
(loc_or_input_loc): New.
* decl.c (grokdeclarator): Use loc_or_input_loc.  Pass init_loc down
to maybe_warn_cpp0x.
* error.c (maybe_warn_cpp0x): Add a location_t parameter.  Use it.
* parser.c (make_declarator): Initialize init_loc.
(cp_parser_member_declaration): Set init_loc.
(cp_parser_condition): Likewise.
(cp_parser_init_declarator): Likewise.
(cp_parser_parameter_declaration): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-warn1.C: New test.
* g++.dg/cpp0x/nsdmi-warn1.h: New file.
---
 gcc/cp/cp-tree.h | 16 +---
 gcc/cp/decl.c| 22 +---
 gcc/cp/error.c   | 32 
 gcc/cp/parser.c  |  8 ++
 gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.C | 10 

[wwwdocs] Update C++ DR table

2021-11-23 Thread Marek Polacek via Gcc-patches
This patch updates the C++ DR table.  Several older DRs are now in the
standard, and we have a few new ones.

Pushed.

---
 htdocs/projects/cxx-dr-status.html | 232 -
 1 file changed, 158 insertions(+), 74 deletions(-)

diff --git a/htdocs/projects/cxx-dr-status.html 
b/htdocs/projects/cxx-dr-status.html
index 8f750892..e8002b27 100644
--- a/htdocs/projects/cxx-dr-status.html
+++ b/htdocs/projects/cxx-dr-status.html
@@ -15,7 +15,7 @@
 
   This table tracks the implementation status of C++ defect reports in GCC.
   It is based on C++ Standard Core Language Issue Table of Contents, Revision
-  104 (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html;>here).
+  106 (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html;>here).
 
   
 
@@ -8768,11 +8768,11 @@
   -
   
 
-
+
   https://wg21.link/cwg1249;>1249
-  drafting
+  DR
   Cv-qualification of nested lambda capture
-  -
+  ?
   
 
 
@@ -10304,11 +10304,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg1468;>1468
-  drafting
+  CD5
   typeid, overload resolution, and implicit lambda 
capture
-  -
+  ?
   
 
 
@@ -11922,11 +11922,11 @@
   -
   
 
-
+
   https://wg21.link/cwg1699;>1699
-  drafting
+  extension
   Does befriending a class befriend its friends?
-  -
+  No
   
 
 
@@ -12097,11 +12097,11 @@
   -
   
 
-
+
   https://wg21.link/cwg1724;>1724
-  drafting
+  DR
   Unclear rules for deduction failure
-  -
+  ?
   
 
 
@@ -12160,11 +12160,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg1733;>1733
-  drafting
+  DR
   Return type and value for operator= with 
ref-qualifier
-  -
+  ?
   
 
 
@@ -12552,11 +12552,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg1789;>1789
-  drafting
+  review
   Array reference vs array decay in overload resolution
-  -
+  ?
   
 
 
@@ -12650,11 +12650,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg1803;>1803
-  drafting
+  CD5
   opaque-enum-declaration as member-declaration
-  -
+  ?
   
 
 
@@ -14876,11 +14876,11 @@
   No
   https://gcc.gnu.org/PR91081;>PR91081
 
-
+
   https://wg21.link/cwg2121;>2121
-  accepted
+  WP
   More flexible lambda syntax
-  -
+  ?
   
 
 
@@ -16809,9 +16809,9 @@
   ?
   
 
-
+
   https://wg21.link/cwg2397;>2397
-  drafting
+  DRWP
   auto specifier for pointers and references to 
arrays
   12
   https://gcc.gnu.org/PR100975;>PR100975
@@ -17166,11 +17166,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg2448;>2448
-  ready
+  DRWP
   Cv-qualification of arithmetic types and deprecation of volatile
-  -
+  ?
   
 
 
@@ -17215,11 +17215,11 @@
   N/A
   
 
-
+
   https://wg21.link/cwg2455;>2455
-  drafting
+  accepted
   Concatenation of string literals vs translation phases 5 and 6
-  -
+  ?
   
 
 
@@ -17236,11 +17236,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg2458;>2458
-  ready
+  DRWP
   Value category of expressions denoting non-static member 
functions
-  -
+  ?
   
 
 
@@ -17285,18 +17285,18 @@
   -
   
 
-
+
   https://wg21.link/cwg2465;>2465
-  ready
+  DRWP
   Coroutine parameters passed to a promise constructor
-  -
+  ?
   
 
-
+
   https://wg21.link/cwg2466;>2466
-  drafting
+  DRWP
   co_await should be a single evaluation
-  -
+  ?
   
 
 
@@ -17322,14 +17322,14 @@
 
 
   https://wg21.link/cwg2470;>2470
-  DR
+  DRWP
   Multiple array objects providing storage for one object
   ?
   
 
 
   https://wg21.link/cwg2471;>2471
-  open
+  drafting
   Nested class template argument deduction
   -
   
@@ -17348,11 +17348,11 @@
   -
   
 
-
+
   https://wg21.link/cwg2474;>2474
-  drafting
+  DRWP
   Cv-qualification and deletion
-  -
+  ?
   
 
 
@@ -17364,16 +17364,16 @@
 
 
   https://wg21.link/cwg2476;>2476
-  ready
+  drafting
   placeholder-type-specifiers and function declarators
   -
   
 
-
+
   https://wg21.link/cwg2477;>2477
-  ready
+  DRWP
   Defaulted vs deleted copy constructors/assignment operators
-  -
+  ?
   
 
 
@@ -17383,32 +17383,32 @@
   -
   
 
-
+
   https://wg21.link/cwg2479;>2479
-  open
+  DRWP
   Missing specifications for consteval and 
constinit
   Yes
 

[committed] libstdc++: Add another testcase for std::unique_ptr printer [PR103086]

2021-11-23 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.


libstdc++-v3/ChangeLog:

PR libstdc++/103086
* testsuite/libstdc++-prettyprinters/cxx11.cc: Check unique_ptr
with non-empty pointer and non-empty deleter.
---
 .../testsuite/libstdc++-prettyprinters/cxx11.cc   | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc 
b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc
index 637246b3c12..fd50e8b028a 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc
@@ -151,6 +151,17 @@ main()
   std::unique_ptr& rempty_ptr = empty_ptr;
 // { dg-final { note-test rempty_ptr {std::unique_ptr = {get() = {}}} } }
 
+  struct Deleter_pr103086
+  {
+int deleter_member = -1;
+void operator()(int*) const noexcept { }
+  };
+
+  std::unique_ptr uniq_ptr;
+// { dg-final { note-test uniq_ptr {std::unique_ptr = {get() = 0x0}} } }
+  std::unique_ptr& runiq_ptr = uniq_ptr;
+// { dg-final { note-test runiq_ptr {std::unique_ptr = {get() = 0x0}} } }
+
   ExTuple tpl(6,7);
 // { dg-final { note-test tpl {std::tuple containing = {[1] = 6, [2] = 7}} } }
   ExTuple  = tpl;
-- 
2.31.1



[committed] libstdc++: Add effective-target for std::allocator implementation

2021-11-23 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.



This allows tests to be skipped if the std::allocator implementation is
not __gnu_cxx::new_allocator.

The 20_util/allocator/overaligned.cc test requires either C++17 or
new_allocator, otherwise we can't guarantee to return overaligned
memory.

libstdc++-v3/ChangeLog:

* testsuite/18_support/50594.cc: Check effective target.
* testsuite/20_util/allocator/1.cc: Likewise.
* testsuite/20_util/allocator/overaligned.cc: Likewise.
* testsuite/23_containers/unordered_map/96088.cc: Likewise.
* testsuite/23_containers/unordered_multimap/96088.cc: Likewise.
* testsuite/23_containers/unordered_multiset/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/96088.cc: Likewise.
* testsuite/ext/throw_allocator/check_delete.cc: Likewise.
* testsuite/ext/throw_allocator/check_new.cc: Likewise.
* testsuite/lib/libstdc++.exp 
(check_effective_target_std_allocator_new):
Define new proc.
---
 libstdc++-v3/testsuite/18_support/50594.cc| 1 +
 libstdc++-v3/testsuite/20_util/allocator/1.cc | 7 +++
 libstdc++-v3/testsuite/20_util/allocator/overaligned.cc   | 2 +-
 .../testsuite/23_containers/unordered_map/96088.cc| 1 +
 .../testsuite/23_containers/unordered_multimap/96088.cc   | 1 +
 .../testsuite/23_containers/unordered_multiset/96088.cc   | 1 +
 .../testsuite/23_containers/unordered_set/96088.cc| 1 +
 .../testsuite/ext/throw_allocator/check_delete.cc | 1 +
 libstdc++-v3/testsuite/ext/throw_allocator/check_new.cc   | 1 +
 libstdc++-v3/testsuite/lib/libstdc++.exp  | 8 
 10 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/testsuite/18_support/50594.cc 
b/libstdc++-v3/testsuite/18_support/50594.cc
index a18e8278081..c15e704debe 100644
--- a/libstdc++-v3/testsuite/18_support/50594.cc
+++ b/libstdc++-v3/testsuite/18_support/50594.cc
@@ -1,5 +1,6 @@
 // { dg-options "-fwhole-program" }
 // { dg-additional-options "-static-libstdc++" { target *-*-mingw* } }
+// { dg-require-effective-target std_allocator_new }
 // { dg-xfail-run-if "AIX operator new" { powerpc-ibm-aix* } }
 
 // Copyright (C) 2011-2021 Free Software Foundation, Inc.
diff --git a/libstdc++-v3/testsuite/20_util/allocator/1.cc 
b/libstdc++-v3/testsuite/20_util/allocator/1.cc
index ebcd6c28c5f..79e223c13c2 100644
--- a/libstdc++-v3/testsuite/20_util/allocator/1.cc
+++ b/libstdc++-v3/testsuite/20_util/allocator/1.cc
@@ -17,6 +17,8 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
+// { dg-require-effective-target std_allocator_new }
+
 // 20.4.1.1 allocator members
 
 #include 
@@ -35,7 +37,7 @@ struct gnu { };
 bool check_new = false;
 bool check_delete = false;
 
-void* 
+void*
 operator new(std::size_t n) THROW(std::bad_alloc)
 {
   check_new = true;
@@ -59,9 +61,6 @@ void test01()
 {
   std::allocator obj;
 
-  // NB: These should work for various size allocation and
-  // deallocations.  Currently, they only work as expected for sizes >
-  // _MAX_BYTES as defined in stl_alloc.h, which happes to be 128. 
   gnu* pobj = obj.allocate(256);
   VERIFY( check_new );
 
diff --git a/libstdc++-v3/testsuite/20_util/allocator/overaligned.cc 
b/libstdc++-v3/testsuite/20_util/allocator/overaligned.cc
index fd03d62b238..8c90fcc0e92 100644
--- a/libstdc++-v3/testsuite/20_util/allocator/overaligned.cc
+++ b/libstdc++-v3/testsuite/20_util/allocator/overaligned.cc
@@ -16,7 +16,7 @@
 // .
 
 // { dg-options "-faligned-new" }
-// { dg-do run { target c++11 } }
+// { dg-do run { target { c++11 && { c++17 || std_allocator_new } } } }
 // { dg-require-cstdint "" }
 
 #include 
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc 
b/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc
index 83ca1c0afd6..27c499ed348 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++17 } }
+// { dg-require-effective-target std_allocator_new }
 
 // Copyright (C) 2021 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc 
b/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc
index de7f009dadc..eaadd08e7ca 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++17 } }
+// { dg-require-effective-target std_allocator_new }
 
 // Copyright (C) 2021 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc 
b/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc
index b9bbf63b863..aa137ec9302 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc
+++ 

Re: [PATCH 1/2] add -Wuse-after-free

2021-11-23 Thread Martin Sebor via Gcc-patches

On 11/22/21 6:32 PM, Jeff Law wrote:



On 11/1/2021 4:17 PM, Martin Sebor via Gcc-patches wrote:

Patch 1 in the series detects a small subset of uses of pointers
made indeterminate by calls to deallocation functions like free
or C++ operator delete.  To control the conditions the warnings
are issued under the new -Wuse-after-free= option provides three
levels.  At the lowest level the warning triggers only for
unconditional uses of freed pointers and doesn't warn for uses
in equality expressions.  Level 2 warns also for some conditional
uses, and level 3 also for uses in equality expressions.

I debated whether to make level 2 or 3 the default included in
-Wall.  I decided on 3 for two reasons: 1) to raise awareness
of both the problem and GCC's new ability to detect it: using
a pointer after it's been freed, even only in principle, by
a successful call to realloc, is undefined, and 2) because
it's trivial to lower the level either globally, or locally
by suppressing the warning around such misuses.
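
As an illustration of the realloc case, here is a minimal sketch of one
way to find out whether a block moved without inspecting the old pointer
after the call.  The function name grow_buffer and its interface are
made up for this example; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Grow *BUF to NEW_SIZE bytes (assumed nonzero) and report whether the
// block moved.  The old address is captured as an integer *before*
// realloc, so no pointer that realloc may have invalidated is inspected
// afterwards.  Returns false on allocation failure, leaving *BUF as is.
bool
grow_buffer (char **buf, std::size_t new_size, bool *moved)
{
  std::uintptr_t old_addr = reinterpret_cast<std::uintptr_t> (*buf);
  void *p = std::realloc (*buf, new_size);
  if (!p)
    return false;
  // By contrast, writing `if (p != *buf)` here would compare the
  // possibly freed pointer, the kind of equality use described above
  // for level 3.
  *moved = reinterpret_cast<std::uintptr_t> (p) != old_addr;
  *buf = static_cast<char *> (p);
  return true;
}
```

Whether the integer comparison is the right fix for a given code base
is a judgment call; lowering the warning level or a local suppression,
as mentioned above, are the alternatives.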

I've tested the patch on x86_64-linux and by building Glibc
and Binutils/GDB.  It triggers a number of times in each, all
due to comparing invalidated pointers for equality (i.e., level
3).  I have suppressed these in GCC (libiberty) by a #pragma,
and will see how the Glibc folks want to deal with theirs (I
track them in BZ #28521).

The tests contain a number of xfails due to limitations I'm
aware of.  I marked them pr?? until the patch is approved.
I will open bugs for them before committing if I don't resolve
them in a followup.

Martin

gcc-63272-1.diff

Add -Wuse-after-free.

gcc/c-family/ChangeLog

* c.opt (-Wuse-after-free): New options.

gcc/ChangeLog:

* diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle
OPT_Wreturn_local_addr and OPT_Wuse_after_free_.
* diagnostic-spec.h (NW_DANGLING): New enumerator.
* doc/invoke.texi (-Wuse-after-free): Document new option.
* gimple-ssa-warn-access.cc (pass_waccess::check_call): Rename...
(pass_waccess::check_call_access): ...to this.
(pass_waccess::check): Rename...
(pass_waccess::check_block): ...to this.
(pass_waccess::check_pointer_uses): New function.
(pass_waccess::gimple_call_return_arg): New function.
(pass_waccess::warn_invalid_pointer): New function.
(pass_waccess::check_builtin): Handle free and realloc.
(gimple_use_after_inval_p): New function.
(get_realloc_lhs): New function.
(maybe_warn_mismatched_realloc): New function.
(pointers_related_p): New function.
(pass_waccess::check_call): Call check_pointer_uses.
(pass_waccess::execute): Compute and free dominance info.

libcpp/ChangeLog:

* files.c (_cpp_find_file): Substitute a valid pointer for
an invalid one to avoid -Wuse-after-free.

libiberty/ChangeLog:

* regex.c: Suppress -Wuse-after-free.

gcc/testsuite/ChangeLog:

* gcc.dg/Wmismatched-dealloc-2.c: Avoid -Wuse-after-free.
* gcc.dg/Wmismatched-dealloc-3.c: Same.
* gcc.dg/attr-alloc_size-6.c: Disable -Wuse-after-free.
* gcc.dg/attr-alloc_size-7.c: Same.
* c-c++-common/Wuse-after-free-2.c: New test.
* c-c++-common/Wuse-after-free-3.c: New test.
* c-c++-common/Wuse-after-free-4.c: New test.
* c-c++-common/Wuse-after-free-5.c: New test.
* c-c++-common/Wuse-after-free-6.c: New test.
* c-c++-common/Wuse-after-free-7.c: New test.
* c-c++-common/Wuse-after-free.c: New test.
* g++.dg/warn/Wdangling-pointer.C: New test.
* g++.dg/warn/Wmismatched-dealloc-3.C: New test.
* g++.dg/warn/Wuse-after-free.C: New test.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 63fc27a1487..2065402a2b9 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc

@@ -3397,33 +3417,460 @@ pass_waccess::maybe_check_dealloc_call (gcall *call)
  }
  }
  
+/* Return true if either USE_STMT's basic block (that of a pointer's use)

+   is dominated by INVAL_STMT's (that of a pointer's invalidating statement,
+   which is either a clobber or a deallocation call), or if they're in
+   the same block, USE_STMT follows INVAL_STMT.  */
+
+static bool
+gimple_use_after_inval_p (gimple *inval_stmt, gimple *use_stmt,
+ bool last_block = false)
+{
+  tree clobvar =
+gimple_clobber_p (inval_stmt) ? gimple_assign_lhs (inval_stmt) : NULL_TREE;
+
+  basic_block inval_bb = gimple_bb (inval_stmt);
+  basic_block use_bb = gimple_bb (use_stmt);
+
+  if (inval_bb != use_bb)
+{
+  if (dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb))
+   return true;
+
+  if (!clobvar || !last_block)
+   return false;
+
+  auto gsi = gsi_for_stmt (use_stmt);
+
+  auto_bitmap visited;
+
+  /* A use statement in the last basic block in a function or one that
+falls through to it is after any other 

Re: [PATCH] PR fortran/103392 - [9/10/11/12 Regression] ICE in simplify_bound, at fortran/simplify.c:4273

2021-11-23 Thread Mikael Morin

On 23/11/2021 at 21:46, Harald Anlauf via Fortran wrote:

Dear all,

in simplify_bound we did hit an assert when trying to simplify
LBOUND/UBOUND for arrays with allocatable or pointer attribute.

We cannot do that.  Terminate simplification in that situation.

Regtested on x86_64-pc-linux-gnu.  OK for mainline/affected branches?


OK. Thanks.


[PATCH] PR fortran/103392 - [9/10/11/12 Regression] ICE in simplify_bound, at fortran/simplify.c:4273

2021-11-23 Thread Harald Anlauf via Gcc-patches
Dear all,

in simplify_bound we did hit an assert when trying to simplify
LBOUND/UBOUND for arrays with allocatable or pointer attribute.

We cannot do that.  Terminate simplification in that situation.

Regtested on x86_64-pc-linux-gnu.  OK for mainline/affected branches?

Thanks,
Harald

From 82c5d7ab299ad4bce98b53cc9bba223c29b34e66 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 23 Nov 2021 21:39:36 +0100
Subject: [PATCH] Fortran: do not attempt simplification of [LU]BOUND for
 pointer/allocatable

gcc/fortran/ChangeLog:

	PR fortran/103392
	* simplify.c (simplify_bound): Do not try to simplify
	LBOUND/UBOUND for arrays with POINTER or ALLOCATABLE attribute.

gcc/testsuite/ChangeLog:

	PR fortran/103392
	* gfortran.dg/bound_simplification_7.f90: New test.
---
 gcc/fortran/simplify.c |  6 ++
 .../gfortran.dg/bound_simplification_7.f90 | 18 ++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/bound_simplification_7.f90

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index 6a6b3fbd037..c9e13b59da9 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -4266,6 +4266,12 @@ simplify_bound (gfc_expr *array, gfc_expr *dim, gfc_expr *kind, int upper)
 	 || (as->type == AS_ASSUMED_SHAPE && upper)))
 return NULL;

+  /* 'array' shall not be an unallocated allocatable variable or a pointer that
+ is not associated.  */
+  if (array->expr_type == EXPR_VARIABLE
+  && (gfc_expr_attr (array).allocatable || gfc_expr_attr (array).pointer))
+return NULL;
+
   gcc_assert (!as
 	  || (as->type != AS_DEFERRED
 		  && array->expr_type == EXPR_VARIABLE
diff --git a/gcc/testsuite/gfortran.dg/bound_simplification_7.f90 b/gcc/testsuite/gfortran.dg/bound_simplification_7.f90
new file mode 100644
index 000..3efecdff769
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/bound_simplification_7.f90
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! PR fortran/103392 - ICE in simplify_bound
+
+program p
+  integer, allocatable :: a(1:1) ! { dg-error "deferred shape or assumed rank" }
+  integer :: b(1) = lbound(a)! { dg-error "does not reduce" }
+  integer :: c(1) = ubound(a)! { dg-error "does not reduce" }
+end
+
+subroutine s(x, y)
+  type t
+ integer :: i(3)
+  end type t
+  type(t), pointer :: x(:)
+  type(t), allocatable :: y(:)
+  integer, parameter   :: m(1) = ubound (x(1)% i)
+  integer  :: n(1) = ubound (y(1)% i)
+end subroutine s
--
2.26.2



Re: [PATCH, v2] c++: Diagnose taking address of an immediate member function [PR102753]

2021-11-23 Thread Jason Merrill via Gcc-patches

On 10/29/21 11:24, Jakub Jelinek wrote:

On Tue, Oct 26, 2021 at 04:58:11PM -0400, Jason Merrill wrote:

I'm afraid I don't have a good idea where to move that diagnostic to though,
it would need to be done somewhere where we are certain we aren't in a
subexpression of immediate invocation.  Given statement expressions, even
diagnostics after parsing whole statements might not be good enough, e.g.
void
qux ()
{
static_assert (bar (({ constexpr auto a = 1; foo; })) == 42);
}


I suppose (a wrapper for) fold_build_cleanup_point_expr would be a possible
place to check, since that's called for full-expressions.


I've played a little bit with this (tried to do it at cp_fold time), but
there are problems with that.
cp_fold of course isn't a good spot for this because it can be called from
fold_for_warn and at that point we don't know if we are inside of immediate
invocation's argument or not, or it can be called even inside of consteval
fn bodies etc.


How about checking in cp_fold_r instead of cp_fold?


So, let's suppose we do a separate cp_walk_tree just for
this if cxx_dialect >= cxx20 e.g. from cp_fold_function and
cp_fully_fold_init or some other useful spot, like in the patch below
we avoid walking into THEN_CLAUSE of IF_STMT_CONSTEVAL_P IF_STMTs.
And if this would be done before cp_fold_function's cp_fold_r walk,
we'd also need calls to source_location_current_p as an exception.
The major problem is the location used for the error_at,
e.g. the ADDR_EXPRs pretty much never EXPR_HAS_LOCATION and PTRMEM_CST
doesn't even have location, so while we would report diagnostics, it would
be always
cc1plus: error: taking address of an immediate function ‘consteval int S::foo() 
const’
etc.


I've checked in a patch to give PTRMEM_CST a location wrapper; perhaps 
that will be helpful.



I guess one option is to report it even later, during gimplification where
gimplify_expr etc. track input_location, but what to do with static
initializers?
Another option would be to have a walk_tree_1 variant that would be updating
input_location similarly to how gimplify_expr does that, i.e.
   saved_location = input_location;
   if (save_expr != error_mark_node
   && EXPR_HAS_LOCATION (*expr_p))
 input_location = EXPR_LOCATION (*expr_p);
...
   input_location = saved_location;
but probably using RAII because walk_tree_1 has a lot of returns in it.


iloc_sentinel seems relevant.
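
A minimal sketch of the RAII idea being discussed, so that every return
path restores the saved location.  The class name location_guard, the
location_t typedef and the input_location global here are stand-ins for
illustration and are not GCC's actual declarations.

```cpp
#include <cassert>

typedef unsigned int location_t;   // stand-in, not GCC's real type
location_t input_location = 1;     // stand-in for the GCC global

// Hypothetical guard: set input_location for the current scope and
// restore the previous value on every exit path.
class location_guard
{
  location_t saved;
public:
  explicit location_guard (location_t loc) : saved (input_location)
  { input_location = loc; }
  ~location_guard () { input_location = saved; }
  location_guard (const location_guard &) = delete;
  location_guard &operator= (const location_guard &) = delete;
};

// A walker with more than one return; the guard covers all of them.
location_t
visit (location_t expr_loc, bool bail_out_early)
{
  location_guard guard (expr_loc);
  if (bail_out_early)
    return input_location;   // value is read before the guard is destroyed
  return input_location;
}
```

The point of the guard is that a function with many returns needs no
explicit "input_location = saved_location" before each of them.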


And turn walk_tree_1 into a template instantiated twice, once as walk_tree_1
without the input_location handling in it and once with it under some
different name?


Maybe just add the handling to walk_tree_1?


Or do we have some other expression walker that does update input_location
as it goes?

--- gcc/cp/typeck.c.jj  2021-10-27 09:03:07.555043491 +0200
+++ gcc/cp/typeck.c 2021-10-29 15:59:57.871449304 +0200
@@ -6773,16 +6773,6 @@ cp_build_addr_expr_1 (tree arg, bool str
return error_mark_node;
  }
  
-	if (TREE_CODE (t) == FUNCTION_DECL

-   && DECL_IMMEDIATE_FUNCTION_P (t)
-   && !in_immediate_context ())
- {
-   if (complain & tf_error)
- error_at (loc, "taking address of an immediate function %qD",
-   t);
-   return error_mark_node;
- }
-
type = build_ptrmem_type (context_for_name_lookup (t),
  TREE_TYPE (t));
t = make_ptrmem_cst (type, t);
@@ -6809,15 +6799,6 @@ cp_build_addr_expr_1 (tree arg, bool str
  {
tree stripped_arg = tree_strip_any_location_wrapper (arg);
if (TREE_CODE (stripped_arg) == FUNCTION_DECL
- && DECL_IMMEDIATE_FUNCTION_P (stripped_arg)
- && !in_immediate_context ())
-   {
- if (complain & tf_error)
-   error_at (loc, "taking address of an immediate function %qD",
- stripped_arg);
- return error_mark_node;
-   }
-  if (TREE_CODE (stripped_arg) == FUNCTION_DECL
  && !mark_used (stripped_arg, complain) && !(complain & tf_error))
return error_mark_node;
val = build_address (arg);
--- gcc/cp/cp-gimplify.c.jj 2021-09-18 09:47:08.409573816 +0200
+++ gcc/cp/cp-gimplify.c2021-10-29 16:48:42.308261319 +0200
@@ -902,6 +902,17 @@ cp_fold_r (tree *stmt_p, int *walk_subtr
}
cp_walk_tree (_FOR_PRE_BODY (stmt), cp_fold_r, data, NULL);
*walk_subtrees = 0;
+  return NULL;
+}
+
+  if (code == IF_STMT && IF_STMT_CONSTEVAL_P (stmt))
+{
+  /* Don't walk THEN_CLAUSE (stmt) for consteval if.  IF_COND is always
+boolean_false_node.  */
+  cp_walk_tree (_CLAUSE (stmt), cp_fold_r, data, NULL);
+  cp_walk_tree (_SCOPE (stmt), cp_fold_r, data, NULL);
+  *walk_subtrees = 0;
+  return NULL;
  }
  
return NULL;

@@ -1418,9 +1429,9 @@ cp_genericize_r (tree *stmt_p, int *walk
}
  
if (tree fndecl = cp_get_callee_fndecl_nofold (stmt))

-   if (DECL_IMMEDIATE_FUNCTION_P (fndecl))

Re: [PATCH] libcpp: Use [[likely]] conditionally

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/23/2021 1:34 PM, Christophe Lyon wrote:



On Tue, Nov 23, 2021 at 4:41 PM Jeff Law via Gcc-patches 
 wrote:




On 11/23/2021 8:26 AM, Christophe LYON via Gcc-patches wrote:
> Hi!
>
> On 23/11/2021 01:26, Jeff Law via Gcc-patches wrote:
>>
>>
>> On 11/22/2021 10:22 AM, Marek Polacek via Gcc-patches wrote:
>>> Let's hide [[likely]] behind a macro, to suppress warnings if the
>>> compiler doesn't support it.
>>>
>>> Co-authored-by: Jonathan Wakely 
>>>
>>> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>>>
>>> PR preprocessor/103355
>>>
>>> libcpp/ChangeLog:
>>>
>>> * lex.c: Use ATTR_LIKELY instead of [[likely]].
>>> * system.h (ATTR_LIKELY): Define.
>> OK
>> jeff
>
>
> This patch breaks the build when the host compiler is gcc-4.8.5,
> because __has_cpp_attribute is not defined.
Sigh.  I'd like to move to a more recent prereq if we could.


I don't know why we have such an old dependency indeed.
I am not requesting it, I just happen to have an old enough host
compiler so that I can check/complain when we accidentally
break the dependency :-)
Probably the enterprise distros.  I suspect we'll be able to roll 
forward in 2-3 years...


Jeff


Re: [PATCH] libcpp: Use [[likely]] conditionally

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 23, 2021 at 09:34:04PM +0100, Christophe Lyon via Gcc-patches wrote:
> > > This patch breaks the build when the host compiler is gcc-4.8.5,
> > > because __has_cpp_attribute is not defined.
> > Sigh.  I'd like to move to a more recent prereq if we could.
> >
> 
> I don't know why we have such an old dependency indeed.
> I am not requesting it, I just happen to have an old enough host
> compiler so that I can check/complain when we accidentally
> break the dependency :-)

4.8.5 is still widely used and is the first release that supports C++11
well enough to be usable.
I think __has_cpp_attribute was only added in C++20; before that it was
in SD-6, but I believe even that postdates C++11.
So provided we want to support C++11 (and IMHO we should, we can't afford to
be like Rust that can't build with a few days old compiler), we need to be
prepared that __has_cpp_attribute won't be defined.

Jakub
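
A minimal sketch of the kind of guard this implies: test for the
feature-test macro itself before using it, and fall back to an empty
ATTR_LIKELY otherwise.  The ATTR_LIKELY name comes from the ChangeLog
above; the exact form below and the count_non_newlines function are
assumptions for illustration, not the committed code.

```cpp
#include <cassert>
#include <cstddef>

// Only query the attribute when the feature-test macro exists; older
// compilers such as GCC 4.8 do not define __has_cpp_attribute at all.
#ifdef __has_cpp_attribute
# if __has_cpp_attribute (likely)
#  define ATTR_LIKELY [[likely]]
# endif
#endif
#ifndef ATTR_LIKELY
# define ATTR_LIKELY   /* expands to nothing */
#endif

// Hypothetical user of the macro: count characters that are not newlines.
std::size_t
count_non_newlines (const char *s)
{
  std::size_t n = 0;
  for (; *s; ++s)
    if (*s != '\n') ATTR_LIKELY
      ++n;
  return n;
}
```

The nested #ifdef/#if is deliberate: a single
"#if defined (__has_cpp_attribute) && __has_cpp_attribute (likely)"
can still fail to parse on a preprocessor that does not know the macro.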



Re: [PATCH] libcpp: Use [[likely]] conditionally

2021-11-23 Thread Christophe Lyon via Gcc-patches
On Tue, Nov 23, 2021 at 4:41 PM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
>
> On 11/23/2021 8:26 AM, Christophe LYON via Gcc-patches wrote:
> > Hi!
> >
> > On 23/11/2021 01:26, Jeff Law via Gcc-patches wrote:
> >>
> >>
> >> On 11/22/2021 10:22 AM, Marek Polacek via Gcc-patches wrote:
> >>> Let's hide [[likely]] behind a macro, to suppress warnings if the
> >>> compiler doesn't support it.
> >>>
> >>> Co-authored-by: Jonathan Wakely 
> >>>
> >>> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >>>
> >>> PR preprocessor/103355
> >>>
> >>> libcpp/ChangeLog:
> >>>
> >>> * lex.c: Use ATTR_LIKELY instead of [[likely]].
> >>> * system.h (ATTR_LIKELY): Define.
> >> OK
> >> jeff
> >
> >
> > This patch breaks the build when the host compiler is gcc-4.8.5,
> > because __has_cpp_attribute is not defined.
> Sigh.  I'd like to move to a more recent prereq if we could.
>

I don't know why we have such an old dependency indeed.
I am not requesting it, I just happen to have an old enough host
compiler so that I can check/complain when we accidentally
break the dependency :-)

Christophe



>
>
> >
> > Is this small patch OK with a proper ChangeLog?
> Yes.  Sorry about the breakage.
> jeff
>
>
>


Re: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/23/2021 1:08 PM, Navid Rahimi wrote:

In gimple your primary goal should be to reduce the number of
expressions that are evaluated.  This patch does the opposite.

That is actually a really good point in my opinion. I am hesitant about this 
patch and wanted to hear gcc-patch opinion about this. Doing something like 
this in IR level is a little bit counter intuitive to me. I will take a look at 
LLVM in my spare time to see where they are transferring that pattern and what 
was the rationale behind it.
It could easily be looked at as a target expansion issue, i.e., there are two 
equivalent forms for the full expression and the desired form varies 
based on some property of the target.  The idea we've kicked around, but 
not implemented, would be to allow target-specific match.pd patterns to 
drive rewriting expressions at the gimple->rtl border.


Jeff



Re: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-11-23 Thread Navid Rahimi via Gcc-patches
> In gimple your primary goal should be to reduce the number of
> expressions that are evaluated.  This patch does the opposite.

That is actually a really good point in my opinion. I am hesitant about this 
patch and wanted to hear the gcc-patches list's opinion on it. Doing something 
like this at the IR level is a little counterintuitive to me. I will take a 
look at LLVM in my spare time to see where they transform that pattern and what 
the rationale behind it was.

Best wishes,
Navid.


From: Jeff Law 
Sent: Tuesday, November 23, 2021 12:02
To: Navid Rahimi; Navid Rahimi via Gcc-patches
Subject: Re: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean 
comparison simplification



On 11/23/2021 12:42 PM, Navid Rahimi wrote:
> In case of x86_64. This is the code:
>
> src_1(bool, bool):
>  cmp dil, sil
>  setbal
>  ret
>
> tgt_1(bool, bool):
>  xor edi, 1
>  mov eax, edi
>  and eax, esi
>  ret
>
>
> Let's look at the latency of src_1:
> cmp: latency of 1 (page 663, table C-17).
> setb: latency of 2. The Intel instruction manual does not report a latency 
> for setb, but the closest instruction, setbe, has a latency of 2.
>
> But for tgt_1:
> xor: latency 1.
> mov: latency 1. (But it seems x86_64 does optimize this instruction and 
> basically it is latency 0 in this case.  In Zero-Latency MOV Instructions 
> section they explain it [1].)
> and: latency 1.
>
> So even if setb has a latency of 1, the two sequences are equal. But if it 
> has a latency of 2, tgt_1 should win by one cycle.
>
> 1) 
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
But these are target issues you've raised -- those should be handled in
the RTL pipeline and are not a significant concern for gimple.

In gimple your primary goal should be to reduce the number of
expressions that are evaluated.  This patch does the opposite.

jeff



Re: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/23/2021 12:55 PM, Navid Rahimi wrote:

Did you test Ada with this patch as that is where the "odd" boolean
types show up?

No, I haven't tested Ada yet, since this is still a work in progress [WIP]. 
Quick question: to prevent applying this optimization to those odd Boolean 
types in Ada, should there be a check for whether the type is a canonical 
boolean type or signed/unsigned? That should prevent messing with the odd 
Boolean types in Ada.
IIRC, you should check the type's precision.  There should be examples
you can find in one or more of the gimple optimizers.


jeff



Re: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/23/2021 12:42 PM, Navid Rahimi wrote:

In case of x86_64. This is the code:

src_1(bool, bool):
 cmp dil, sil
 setb al
 ret

tgt_1(bool, bool):
 xor edi, 1
 mov eax, edi
 and eax, esi
 ret


Let's look at the latency of the src_1:
cmp: latency of 1: (page 663, table C-17)
setb: latency of 2. They don't report setb latency in intel instruction manual. 
But the closest instruction to this setbe does have latency of 2.

But for tgt_1:
xor: latency 1.
mov: latency 1. (But it seems x86_64 does optimize this instruction and 
basically it is latency 0 in this case.  In Zero-Latency MOV Instructions 
section they explain it [1].)
and: latency 1.

So even if you consider setb as latency of 1 it is equal. But if it is latency 
of 2, it should be a 1 latency win.

1) 
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
But these are target issues you've raised -- those should be handled in 
the RTL pipeline and are not a significant concern for gimple.


In gimple your primary goal should be to reduce the number of 
expressions that are evaluated.  This patch does the opposite.


jeff



Re: [PATCH] handle member references in -Waddress [PR96507]

2021-11-23 Thread Jason Merrill via Gcc-patches

On 11/22/21 18:21, Marek Polacek wrote:

On Mon, Nov 22, 2021 at 04:00:56PM -0700, Martin Sebor via Gcc-patches wrote:

While going through old -Waddress bug reports to close after
the recent improvements to the warning I came across PR 96507
that points out that member references aren't handled.  Since
testing the address of a reference for equality to null is
in general diagnosed, this seems like an oversight worth fixing.
  Attached is a change to the C++ front end to diagnose member
references as well.

Tested on x86_64-linux.

Martin



Issue -Waddress also for reference members [PR96507].

Resolves:
PR c++/96507 - missing -Waddress for member references

gcc/cp/ChangeLog:

PR c++/96507
* typeck.c (warn_for_null_address): Handle reference members.

gcc/testsuite/ChangeLog:

PR c++/96507
* g++.dg/warn/Waddress-8.C: New test.

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 58919aaf13e..694c53eef8a 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -4676,15 +4676,21 @@ warn_for_null_address (location_t location, tree op, 
tsubst_flags_t complain)
"addition %qE and NULL", cop);
return;
  }
-  else if (CONVERT_EXPR_P (op)
-  && TYPE_REF_P (TREE_TYPE (TREE_OPERAND (op, 0))))
+  else if (CONVERT_EXPR_P (op))
  {
-  STRIP_NOPS (op);
+  tree op0 = TREE_OPERAND (op, 0);
+  if (TYPE_REF_P (TREE_TYPE (op0)))
+   {


Isn't this just REFERENCE_REF_P?


No, there's no INDIRECT_REF here.

Martin, I think you don't need to change the test to two levels since 
you don't use the op0 variable again; I think these two lines:



+ if (TREE_CODE (op) == COMPONENT_REF)
+   op = TREE_OPERAND (op, 1);


are all the change you need for this fix.  OK that way.

Jason



Re: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-11-23 Thread Navid Rahimi via Gcc-patches
> Did you test Ada with this patch as that is where the "odd" boolean
> types show up?
No, I haven't tested Ada yet, since the patch is still a work in progress [WIP].
Quick question: to prevent applying this optimization to those odd Boolean types
in Ada, should there be a check for whether the type is a canonical boolean type
or signed/unsigned? That should prevent messing with the odd Boolean types in Ada.

Best wishes,
Navid.


From: Andrew Pinski 
Sent: Tuesday, November 23, 2021 11:33
To: Jeff Law
Cc: Navid Rahimi; Navid Rahimi via Gcc-patches
Subject: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean 
comparison simplification

On Tue, Nov 23, 2021 at 11:15 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 11/23/2021 11:34 AM, Navid Rahimi via Gcc-patches wrote:
> > Hi GCC community,
> >
> > I wanted you to take a quick look at this patch to solve this bug [1]. This is
> > the code example for the optimization [2], which includes a link to a
> > proof of each optimization.
> >
> > I think it should be possible to use a simpler approach than what Andrew has
> > used here [3].
> >
> > P.S. Tested and verified on Linux x86_64.
> >
> > 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808
> > 2) https://compiler-explorer.com/z/Gc448eE3z
> > 3) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808#c1
> Don't those match.pd patterns make things worse?  We're taking a single
> expression evaluation (the conditional) and turning it into two logicals
> AFAICT.
>
> For the !x expression, obviously if x is a  constant, then we can
> compute that at compile time and we're going from a single conditional
> to a single logical which is probably a win, but that's not the case
> with this patch AFAICT.

One thing is you could use ! to see if bit_not simplifies down to a
constant which is what I did in the bug report.  But it might be more
useful to use the ^ flag (which I added in a different patch) which
says the bit_xor is removed then accept it.

Note (bit_not @0) is wrong, it should be (bit_xor @0 { booleantrue; }
) as there are boolean types which are signed and/or > 1 precision
which is what I had in my patch.
Did you test Ada with this patch as that is where the "odd" boolean
types show up?

Thanks,
Andrew Pinski


>
> jeff


Re: [PATCH] coroutines: Handle initial awaiters with non-void returns [PR 100127].

2021-11-23 Thread Jason Merrill via Gcc-patches

On 11/19/21 12:40, Iain Sandoe wrote:

On 18 Nov 2021, at 23:42, Iain Sandoe  wrote:

On 18 Nov 2021, at 22:13, Jason Merrill via Gcc-patches 
 wrote:

On 11/5/21 11:46, Iain Sandoe wrote:

The way in which a C++20 coroutine is specified discards any value



  tree aw_r = TREE_VEC_ELT (vec, 2);
+ if (!VOID_TYPE_P (TREE_TYPE (aw_r)))
+   aw_r = build1 (CONVERT_EXPR, void_type_node, aw_r);


Is there a reason not to use convert_to_void?


no, just me still learning APIs… I’ll do a revised version and check it.


So I’m testing this replacement:

  aw_r = convert_to_void (aw_r, ICV_CAST, tf_warning_or_error);


Why ICV_CAST?  I'd think ICV_STATEMENT, so that we get [[nodiscard]] 
warnings.


OK with that change.
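
For readers less familiar with the distinction, here is a rough source-level analogy, not the coroutine code itself: an explicit cast to void marks a discard as intentional, while a plain expression statement is where a [[nodiscard]] diagnostic would normally be expected. The function names below are hypothetical and only serve the illustration.

```cpp
#include <cassert>

// Hypothetical function whose result should not be silently dropped.
[[nodiscard]] int resume_value_like() { return 42; }

// Discards the result through an explicit cast to void. Compilers treat
// this as an intentional discard, so no [[nodiscard]] warning is expected.
void discard_with_cast() { (void)resume_value_like(); }

// Uses the result, which never triggers a [[nodiscard]] diagnostic.
int use_result() {
    int v = resume_value_like();
    assert(v == 42);
    return v;
}
```

Writing `resume_value_like();` as a bare statement is the case that should draw the warning, which is the behavior the ICV_STATEMENT suggestion is meant to preserve.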

Jason



Re: [PATCH] c++: redundant explicit 'this' capture in C++17 [PR100493]

2021-11-23 Thread Jason Merrill via Gcc-patches

On 11/19/21 14:25, Patrick Palka wrote:

As described in detail in the PR, in C++20 implicitly capturing 'this'
via the '=' capture default is deprecated, but in C++17 explicitly
capturing 'this' alongside a '=' capture default is ill-formed.  This
means it's impossible to write a C++17 lambda that captures 'this' and
that also has a '=' capture default in a forward-compatible way with
C++20:

   [=] { this; }  // #1 deprecated in C++20, OK in C++17
 // GCC issues a -Wdeprecated warning in C++20 mode
   [=, this] { }  // #2 ill-formed in C++17, OK in C++20
 // GCC issues an unconditional warning in C++17 mode

This patch resolves this dilemma by downgrading the warning for #2 into
a -pedantic one.  In passing, move it into the -Wc++20-extensions class
of warnings and mention that the construct in question is a C++20 one.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?


OK.


PR c++/100493

gcc/cp/ChangeLog:

* parser.c (cp_parser_lambda_introducer): In C++17, don't
diagnose a redundant 'this' capture alongside a by-copy
capture default unless -pedantic.  Move the diagnostic into
-Wc++20-extensions and improve the wording.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/lambda-this1.C: Adjust expected diagnostics.
* g++.dg/cpp1z/lambda-this8.C: New test.
* g++.dg/cpp2a/lambda-this3.C: Compile with -pedantic in C++17
to continue to diagnose redundant 'this' captures.
---
  gcc/cp/parser.c   |  8 +---
  gcc/testsuite/g++.dg/cpp1z/lambda-this1.C |  8 
  gcc/testsuite/g++.dg/cpp1z/lambda-this8.C | 10 ++
  gcc/testsuite/g++.dg/cpp2a/lambda-this3.C |  2 +-
  4 files changed, 20 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/lambda-this8.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 65f0f112011..30790006ac9 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -11120,10 +11120,12 @@ cp_parser_lambda_introducer (cp_parser* parser, tree 
lambda_expr)
if (cp_lexer_next_token_is_keyword (parser->lexer, RID_THIS))
{
  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
- if (cxx_dialect < cxx20
+ if (cxx_dialect < cxx20 && pedantic
  && LAMBDA_EXPR_DEFAULT_CAPTURE_MODE (lambda_expr) == CPLD_COPY)
-   pedwarn (loc, 0, "explicit by-copy capture of %<this%> redundant "
-"with by-copy capture default");
+   pedwarn (loc, OPT_Wc__20_extensions,
+"explicit by-copy capture of %<this%> "
+"with by-copy capture default only available with "
+"%<-std=c++20%> or %<-std=gnu++20%>");
  cp_lexer_consume_token (parser->lexer);
  if (LAMBDA_EXPR_THIS_CAPTURE (lambda_expr))
pedwarn (input_location, 0,
diff --git a/gcc/testsuite/g++.dg/cpp1z/lambda-this1.C 
b/gcc/testsuite/g++.dg/cpp1z/lambda-this1.C
index b13ff8b9fc6..e12330a8291 100644
--- a/gcc/testsuite/g++.dg/cpp1z/lambda-this1.C
+++ b/gcc/testsuite/g++.dg/cpp1z/lambda-this1.C
@@ -18,7 +18,7 @@ struct A {
  auto i = [=] { return a; };   // { dg-warning "implicit capture" 
"" { target c++2a } }
  auto j = [&] { return a; };
  // P0409R2 - C++2A lambda capture [=, this]
-auto k = [=, this] { return a; };// { dg-error "explicit by-copy capture of 'this' 
redundant with by-copy capture default" "" { target c++17_down } }
+auto k = [=, this] { return a; };// { dg-error "explicit by-copy capture of 'this' with 
by-copy capture default only available with" "" { target c++17_down } }
  auto l = [&, this] { return a; };
  auto m = [=, *this] { return a; };// { dg-error "'*this' capture only available 
with" "" { target c++14_down } }
  auto n = [&, *this] { return a; };// { dg-error "'*this' capture only available 
with" "" { target c++14_down } }
@@ -27,12 +27,12 @@ struct A {
// { dg-error "'*this' capture only available 
with" "" { target c++14_down } .-1 }
  auto q = [=, this, *this] { return a; };// { dg-error "already captured 
'this'" }
// { dg-error "'*this' capture only available 
with" "" { target c++14_down } .-1 }
-   // { dg-error "explicit by-copy capture of 'this' 
redundant with by-copy capture default" "" { target c++17_down } .-2 }
+   // { dg-error "explicit by-copy capture of 'this' 
with by-copy capture default only available with" "" { target c++17_down } .-2 }
  auto r = [=, this, this] { return a; };// { dg-error "already captured 
'this'" }
-  // { dg-error "explicit by-copy capture of 'this' 
redundant with by-copy capture default" "" { target c++17_down } .-1 }
+  // { 

Re: [PATCH] c++: -Wuninitialized for mem-inits and empty classes [PR19808]

2021-11-23 Thread Jason Merrill via Gcc-patches

On 11/19/21 16:57, Marek Polacek wrote:

This fixes a bogus -Wuninitialized warning: there's nothing to initialize
in empty classes, so don't add them into our uninitialized set.
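
A minimal sketch of the situation, assuming hypothetical class names: an empty member has no storage to initialize, so calling a member function on it from a later member's initializer reads nothing uninitialized. Note that std::is_empty is only a rough stand-in here; the patch itself uses GCC's internal is_really_empty_class, which is not the same predicate.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical empty class: no non-static data members.
struct EmptyAllocator {
    int *allocate(int) { static int slot = 0; return &slot; }
};

// Same interface, but with a data member, so the class is not empty.
struct StatefulAllocator {
    int a;
    int *allocate(int) { static int slot = 0; return &slot; }
};

struct Block {
    // m_alloc is declared before m_ptr and has no data members, so using
    // it in m_ptr's initializer touches no uninitialized storage.
    Block() : m_ptr(m_alloc.allocate(0)) {}
    EmptyAllocator m_alloc;
    int *m_ptr;
};

static_assert(std::is_empty<EmptyAllocator>::value, "no data members");
static_assert(!std::is_empty<StatefulAllocator>::value, "has a data member");

// Constructs a Block and checks that the pointer was set by the initializer.
bool block_pointer_is_set() {
    Block b;
    assert(b.m_ptr != nullptr);
    return b.m_ptr != nullptr;
}
```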

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


PR c++/19808

gcc/cp/ChangeLog:

* init.c (emit_mem_initializers): Don't add is_really_empty_class
members into uninitialized.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuninitialized-28.C: Make a class nonempty.
* g++.dg/warn/Wuninitialized-29.C: Likewise.
* g++.dg/warn/Wuninitialized-31.C: New test.
---
  gcc/cp/init.c |  3 +-
  gcc/testsuite/g++.dg/warn/Wuninitialized-28.C |  1 +
  gcc/testsuite/g++.dg/warn/Wuninitialized-29.C |  1 +
  gcc/testsuite/g++.dg/warn/Wuninitialized-31.C | 73 +++
  4 files changed, 77 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wuninitialized-31.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 975f2eda29d..2a4512e462a 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -1470,7 +1470,8 @@ emit_mem_initializers (tree mem_inits)
  for (tree f = next_initializable_field (TYPE_FIELDS (current_class_type));
 f != NULL_TREE;
 f = next_initializable_field (DECL_CHAIN (f)))
-  if (!DECL_ARTIFICIAL (f))
+  if (!DECL_ARTIFICIAL (f)
+ && !is_really_empty_class (TREE_TYPE (f), /*ignore_vptr*/false))
uninitialized.add (f);
  
if (mem_inits

diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-28.C 
b/gcc/testsuite/g++.dg/warn/Wuninitialized-28.C
index 7dbbf8719ec..816249c2b9c 100644
--- a/gcc/testsuite/g++.dg/warn/Wuninitialized-28.C
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-28.C
@@ -47,6 +47,7 @@ struct F {
  };
  
  struct bar {

+  int a;
bar() {}
bar(bar&) {}
  };
diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-29.C 
b/gcc/testsuite/g++.dg/warn/Wuninitialized-29.C
index bc742997441..da81abf07c9 100644
--- a/gcc/testsuite/g++.dg/warn/Wuninitialized-29.C
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-29.C
@@ -47,6 +47,7 @@ struct F {
  };
  
  struct bar {

+  int a;
bar() {}
bar(bar&) {}
  };
diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-31.C 
b/gcc/testsuite/g++.dg/warn/Wuninitialized-31.C
new file mode 100644
index 000..e22b150db46
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-31.C
@@ -0,0 +1,73 @@
+// PR c++/19808
+// { dg-do compile }
+// { dg-options "-Wuninitialized" }
+
+class AllocatorWithCleanup {
+public:
+  int *allocate(int);
+};
+class SecBlock {
+  SecBlock() : m_ptr(m_alloc.allocate(0)) {} // { dg-bogus "uninitialized" }
+  AllocatorWithCleanup m_alloc;
+  int *m_ptr;
+};
+
+struct A {
+  int *allocate(int);
+};
+
+struct B {
+  int : 0;
+  int *allocate(int);
+};
+
+struct C : B {
+};
+
+struct D {
+  char arr[0];
+  int *allocate(int);
+};
+
+struct E { };
+
+struct F {
+  E arr[10];
+  int *allocate(int);
+};
+
+struct G {
+  E e;
+  int *allocate(int);
+};
+
+struct H {
+  virtual void foo ();
+  int *allocate(int);
+};
+
+template<typename T>
+struct X {
+  X() : m_ptr(t.allocate(0)) {} // { dg-bogus "uninitialized" }
+  T t;
+  int *m_ptr;
+};
+
+struct V {
+  int a;
+  int *allocate(int);
+};
+
+struct Z {
+  Z() : m_ptr(v.allocate(0)) {} // { dg-warning "uninitialized" }
+  V v;
+  int *m_ptr;
+};
+
+X x1;
+X x2;
+X x3;
+X x4;
+X x5;
+X x6;
+X x7;

base-commit: fc6c6f64ecff376902e7e1ef295f2d8518407ab5





Re: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-11-23 Thread Navid Rahimi via Gcc-patches
In case of x86_64. This is the code:

src_1(bool, bool):
cmp dil, sil
setb al
ret

tgt_1(bool, bool):
xor edi, 1
mov eax, edi
and eax, esi
ret


Let's look at the latency of the src_1:
cmp: latency of 1: (page 663, table C-17)
setb: latency of 2. They don't report setb latency in intel instruction manual. 
But the closest instruction to this setbe does have latency of 2.

But for tgt_1:
xor: latency 1.
mov: latency 1. (But it seems x86_64 does optimize this instruction and 
basically it is latency 0 in this case.  In Zero-Latency MOV Instructions 
section they explain it [1].)
and: latency 1.

So even if you consider setb as latency of 1 it is equal. But if it is latency 
of 2, it should be a 1 latency win.

1) 
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
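
For reference, the two forms being compared can be sketched at the source level as below. The function bodies are inferred from the assembly listings above (cmp/setb for the comparison, xor/and for the rewrite) and are an assumption, not copied from the patch; the check simply confirms both agree on all four inputs.

```cpp
#include <cassert>

// src_1: the comparison form, a single relational expression.
bool src_1(bool a, bool b) { return a < b; }

// tgt_1: the rewritten form, one negation followed by one AND.
bool tgt_1(bool a, bool b) { return (!a) & b; }

// Exhaustively checks that both forms agree for every pair of inputs.
void check_forms_agree() {
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            assert(src_1(a != 0, b != 0) == tgt_1(a != 0, b != 0));
}
```

The equivalence is not in question; what the sketch makes visible is that the rewrite replaces one expression with two.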

Best wishes,
Navid.


From: Jeff Law 
Sent: Tuesday, November 23, 2021 11:14
To: Navid Rahimi; Navid Rahimi via Gcc-patches
Subject: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean 
comparison simplification



On 11/23/2021 11:34 AM, Navid Rahimi via Gcc-patches wrote:
> Hi GCC community,
>
> I wanted you to take a quick look at this patch to solve this bug [1]. This is
> the code example for the optimization [2], which includes a link to a proof
> of each optimization.
>
> I think it should be possible to use a simpler approach than what Andrew has
> used here [3].
>
> P.S. Tested and verified on Linux x86_64.
>
> 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808
> 2) https://compiler-explorer.com/z/Gc448eE3z
> 3) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808#c1
Don't those match.pd patterns make things worse?  We're taking a single
expression evaluation (the conditional) and turning it into two logicals
AFAICT.

For the !x expression, obviously if x is a  constant, then we can
compute that at compile time and we're going from a single conditional
to a single logical which is probably a win, but that's not the case
with this patch AFAICT.

jeff


Re: [PATCH] c++: Fix missing NSDMI diagnostic in C++98 [PR103347]

2021-11-23 Thread Jason Merrill via Gcc-patches

On 11/22/21 17:17, Marek Polacek wrote:

Here the problem is that we aren't detecting a NSDMI in C++98:

struct A {
   void *x = NULL;
};

because maybe_warn_cpp0x uses input_location and that happens to point
to NULL which comes from a system header.  Jakub suggested changing the
location to the '=', thereby avoiding the system header problem.  To
that end, I've added a new location_t member into cp_declarator.  This
member is used when this declarator is part of an init-declarator.  The
rest of the changes is obvious.  I've also taken the liberty of adding
loc_or_input_loc, since I want to avoid checking for UNKNOWN_LOCATION.
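
The fallback idea can be sketched in isolation as follows. The types and the global here are simplified stand-ins, not GCC's real definitions, and report_at is a hypothetical function that only mirrors the shape of a diagnostic routine with a defaulted location argument.

```cpp
#include <cassert>

// Simplified stand-ins for GCC's location machinery (assumptions).
using location_t = unsigned int;
constexpr location_t UNKNOWN_LOCATION = 0;
location_t input_location = 100;  // hypothetical "current" location

// Same shape as the helper in the patch: fall back to input_location
// when the caller has no better location to offer.
inline location_t loc_or_input_loc(location_t loc) {
    return loc == UNKNOWN_LOCATION ? input_location : loc;
}

// Hypothetical diagnostic entry point with a defaulted location argument.
// It returns the location it would report at.
location_t report_at(location_t loc = input_location) {
    return loc_or_input_loc(loc);
}

// Exercises the three cases: explicit, unknown, and defaulted location.
void check_fallback() {
    assert(report_at(42) == 42);
    assert(report_at(UNKNOWN_LOCATION) == input_location);
    assert(report_at() == input_location);
}
```

With this shape, existing callers that pass no location keep their old behavior, and a caller that knows the position of the '=' can pass it explicitly.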

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/103347

gcc/cp/ChangeLog:

* cp-tree.h (struct cp_declarator): Add a location_t member.
(maybe_warn_cpp0x): Add a location_t parameter with a default argument.
(loc_or_input_loc): New.
* decl.c (grokdeclarator): Use loc_or_input_loc.  Pass init_loc down
to maybe_warn_cpp0x.
* error.c (maybe_warn_cpp0x): Add a location_t parameter.  Use it.
* parser.c (make_declarator): Initialize init_loc.
(cp_parser_member_declaration): Set init_loc.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-warn1.C: New test.
* g++.dg/cpp0x/nsdmi-warn1.h: New file.
---
  gcc/cp/cp-tree.h | 16 +---
  gcc/cp/decl.c| 22 +---
  gcc/cp/error.c   | 32 
  gcc/cp/parser.c  |  2 ++
  gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.C | 10 
  gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.h |  2 ++
  6 files changed, 55 insertions(+), 29 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-warn1.h

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3f56cb90d14..2037082b0c7 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6231,9 +6231,11 @@ struct cp_declarator {
/* If this declarator is parenthesized, this the open-paren.  It is
   UNKNOWN_LOCATION when not parenthesized.  */
location_t parenthesized;
-
-  location_t id_loc; /* Currently only set for cdk_id, cdk_decomp and
-   cdk_function. */
+  /* Currently only set for cdk_id, cdk_decomp and cdk_function.  */
+  location_t id_loc;
+  /* If this declarator is part of an init-declarator, the location of the
+ initializer.  */


Currently this comment is inaccurate because we don't set it for all 
init-declarators.  That should be pretty trivial to do, even if we don't 
use the location yet in other contexts.



+  location_t init_loc;
/* GNU Attributes that apply to this declarator.  If the declarator
   is a pointer or a reference, these attribute apply to the type
   pointed to.  */
@@ -6878,7 +6880,8 @@ extern const char *lang_decl_dwarf_name   (tree, 
int, bool);
  extern const char *language_to_string (enum languages);
  extern const char *class_key_or_enum_as_string(tree);
  extern void maybe_warn_variadic_templates   (void);
-extern void maybe_warn_cpp0x   (cpp0x_warn_str str);
+extern void maybe_warn_cpp0x   (cpp0x_warn_str str,
+location_t = input_location);
  extern bool pedwarn_cxx98   (location_t, int, const char 
*, ...) ATTRIBUTE_GCC_DIAG(3,4);
  extern location_t location_of   (tree);
  extern void qualified_name_lookup_error   (tree, tree, tree,
@@ -7996,6 +7999,11 @@ extern bool decl_in_std_namespace_p   (tree);
  extern void require_complete_eh_spec_types(tree, tree);
  extern void cxx_incomplete_type_diagnostic(location_t, const_tree,
 const_tree, diagnostic_t);
+inline location_t
+loc_or_input_loc (location_t loc)
+{
+  return loc == UNKNOWN_LOCATION ? input_location : loc;
+}
  
  inline location_t

  cp_expr_loc_or_loc (const_tree t, location_t or_loc)
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 9f68d1a5590..ae0e0bae9cc 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -11522,14 +11522,18 @@ grokdeclarator (const cp_declarator *declarator,
if (initialized == SD_DEFAULTED || initialized == SD_DELETED)
  funcdef_flag = true;
  
-  location_t typespec_loc = smallest_type_location (type_quals,

-   declspecs->locations);
-  if (typespec_loc == UNKNOWN_LOCATION)
-typespec_loc = input_location;
-
-  location_t id_loc = declarator ? declarator->id_loc : input_location;
-  if (id_loc == UNKNOWN_LOCATION)
-id_loc = input_location;
+  location_t typespec_loc = loc_or_input_loc (smallest_type_location
+ (type_quals,
+  declspecs->locations));
+  location_t id_loc;

Committed: [PATCH v2] fixincludes: don't abort() on access failure [PR103306]

2021-11-23 Thread Xi Ruoyao via Gcc-patches
Committed as r12-5477.

On Tue, 2021-11-23 at 23:39 +0800, Xi Ruoyao via Gcc-patches wrote:
> [v2: format fix]
> 
> Some distros may ship dangling symlinks in include directories, which
> triggers the access failure.  Skip such a file and continue to the next
> header instead of panicking.
>
> Restore the old behavior from before r12-5234, but without resurrecting
> the problematic getcwd() call, by using the environment variable "INPUT"
> exported by fixinc.sh.
> 
> Tested on x86_64-linux-gnu, with a dangling symlink intentionally
> injected into /usr/include.
> 
> fixincludes/
> 
> PR bootstrap/103306
> * fixincl.c (process): Don't call abort().
> ---
>  fixincludes/fixincl.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
> index a17b65866c3..92909baf85f 100644
> --- a/fixincludes/fixincl.c
> +++ b/fixincludes/fixincl.c
> @@ -1352,10 +1352,19 @@ process (void)
>  
>    if (access (pz_curr_file, R_OK) != 0)
>  {
> -  /* Some really strange error happened.  */
> -  fprintf (stderr, "Cannot access %s: %s\n", pz_curr_file,
> +  /* It may happens if for e. g. the distro ships some broken symlinks
> +    in /usr/include.  */
> +
> +  /* "INPUT" is exported in fixinc.sh, which is the pwd where fixincl
> +    runs.  It's used instead of getcwd to avoid allocating a buffer
> +    with unknown length.  */
> +  const char *cwd = getenv ("INPUT");
> +  if (!cwd)
> +   cwd = "the working directory";
> +
> +  fprintf (stderr, "Cannot access %s from %s: %s\n", pz_curr_file, cwd,
>    xstrerror (errno));
> -  abort ();
> +  return;
>  }
>  
>    pz_curr_data = load_file (pz_curr_file);
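
The behavior described in the quoted patch can be sketched as a standalone function. This is a simplified illustration for POSIX systems, not the fixincl.c code: can_process is a hypothetical name, and std::strerror stands in for libiberty's xstrerror.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Returns true if the file is readable. On failure, prints a message and
// returns false so the caller can move on to the next header instead of
// aborting the whole run.
bool can_process(const char *path) {
    if (access(path, R_OK) != 0) {
        int saved_errno = errno;  // keep the reason before any other call
        // "INPUT" is expected to name the directory the tool runs in.
        const char *cwd = std::getenv("INPUT");
        if (!cwd)
            cwd = "the working directory";
        std::fprintf(stderr, "Cannot access %s from %s: %s\n", path, cwd,
                     std::strerror(saved_errno));
        return false;
    }
    return true;
}

// A path that should not exist on any reasonable system.
void check_missing_file_is_skipped() {
    assert(!can_process("/nonexistent-dir-for-sketch/dangling.h"));
}
```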

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-11-23 Thread Andrew Pinski via Gcc-patches
On Tue, Nov 23, 2021 at 11:15 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 11/23/2021 11:34 AM, Navid Rahimi via Gcc-patches wrote:
> > Hi GCC community,
> >
> > I wanted you to take a quick look at this patch to solve this bug [1]. This is
> > the code example for the optimization [2], which includes a link to a
> > proof of each optimization.
> >
> > I think it should be possible to use a simpler approach than what Andrew has
> > used here [3].
> >
> > P.S. Tested and verified on Linux x86_64.
> >
> > 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808
> > 2) https://compiler-explorer.com/z/Gc448eE3z
> > 3) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808#c1
> Don't those match.pd patterns make things worse?  We're taking a single
> expression evaluation (the conditional) and turning it into two logicals
> AFAICT.
>
> For the !x expression, obviously if x is a  constant, then we can
> compute that at compile time and we're going from a single conditional
> to a single logical which is probably a win, but that's not the case
> with this patch AFAICT.

One thing is you could use ! to see if bit_not simplifies down to a
constant which is what I did in the bug report.  But it might be more
useful to use the ^ flag (which I added in a different patch) which
says the bit_xor is removed then accept it.

Note (bit_not @0) is wrong, it should be (bit_xor @0 { booleantrue; }
) as there are boolean types which are signed and/or > 1 precision
which is what I had in my patch.
Did you test Ada with this patch as that is where the "odd" boolean
types show up?
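
To illustrate why the two negations differ once a boolean is wider than one bit, here is a small sketch. It models a boolean with 8 bits of precision using a plain unsigned byte; wide_bool is a hypothetical stand-in and not how GCC represents such types internally.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a boolean stored with 8 bits of precision.
using wide_bool = std::uint8_t;

// Bitwise NOT flips every bit: 1 becomes 0xFE, which is still nonzero.
wide_bool negate_with_not(wide_bool x) { return static_cast<wide_bool>(~x); }

// XOR with the "true" value flips only the low bit: 1 becomes 0, 0 becomes 1.
wide_bool negate_with_xor(wide_bool x) { return static_cast<wide_bool>(x ^ 1u); }

// Shows that only the XOR form maps true to false for this representation.
void check_negations() {
    assert(negate_with_not(1) == 0xFE);  // not a valid "false"
    assert(negate_with_xor(1) == 0);
    assert(negate_with_xor(0) == 1);
}
```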

Thanks,
Andrew Pinski


>
> jeff


[pushed] c++: Add static in g++.dg/warn/Waddress-5.C

2021-11-23 Thread Marek Polacek via Gcc-patches
While reviewing some other changes I noticed that this test talks
about 'sf' being static, but it wasn't actually marked as such.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Waddress-5.C: Make sf static.
---
 gcc/testsuite/g++.dg/warn/Waddress-5.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/warn/Waddress-5.C 
b/gcc/testsuite/g++.dg/warn/Waddress-5.C
index b1ad38a8112..b1287b2fac3 100644
--- a/gcc/testsuite/g++.dg/warn/Waddress-5.C
+++ b/gcc/testsuite/g++.dg/warn/Waddress-5.C
@@ -12,7 +12,7 @@ struct A
   virtual void vf ();
   virtual void pvf () = 0;
 
-  void sf ();
+  static void sf ();
 
   int *p;
   int a[2];

base-commit: 3363022ed810a2797c47867890547c8f73163257
-- 
2.33.1



[wwwdocs] Document new C++ features in GCC 12

2021-11-23 Thread Marek Polacek via Gcc-patches
I've reviewed all the C++ patches that have gone into GCC 12, and
documented the ones that seemed most interesting/relevant to our
users.

Additionally, I've also added links to the proposals/PRs/git commits
so that it's easier to find out more.

I've also updated our C++ DR table.

Validates, pushed.

---
 htdocs/gcc-12/changes.html | 129 +
 htdocs/projects/cxx-dr-status.html |  10 +--
 2 files changed, 119 insertions(+), 20 deletions(-)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 89cbbdd8..49be40fd 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -143,13 +143,19 @@ a work-in-progress.
 C family
 
   Support for __builtin_shufflevector compatible with
-  the clang language extension was added.
-  
+  the clang language extension was added.
+  Support for attribute unavailable was added.
+  A new built-in function, __builtin_assoc_barrier, was added.
+  It can be used to inhibit re-association of floating-point
+  expressions.
   New warnings:
 
   -Wbidi-chars warns about potentially misleading UTF-8
-   bidirectional control characters.  The default is 
-Wbidi-chars=unpaired.
-  
+   bidirectional control characters.  The default is
+   -Wbidi-chars=unpaired
+   (https://gcc.gnu.org/PR103026;>PR103026)
+  -Warray-compare warns about comparisons between two 
operands of
+ array type (https://gcc.gnu.org/PR97573;>PR97573)
 
   
   Enhancements to existing warnings:
@@ -159,7 +165,8 @@ a work-in-progress.
-Wno-attributes=ns:: to suppress warnings about unknown 
scoped
attributes (in C++11 and C2X).  Similarly,
#pragma GCC diagnostic ignored_attributes "vendor::attr" 
can
-   be used to achieve the same effect.
+   be used to achieve the same effect
+   (https://gcc.gnu.org/PR101940;>PR101940)
 
   
 
@@ -168,23 +175,115 @@ a work-in-progress.
 
   Several C++23 features have been implemented:
 
-  P1938R3, if consteval
-  P0849R8, auto(x): decay-copy in the language
-  P2242R3, Non-literal variables (and labels and gotos) in constexpr 
functions
-  P2334R1, Support for preprocessing directives elifdef 
and
- elifndef
-  P2360R0, Extend init-statement to allow 
alias-declaration
-  DR 2397, auto specifier for pointers and references to 
arrays
+  https://wg21.link/p1938;>P1938R3, if 
consteval
+ (https://gcc.gnu.org/PR100974;>PR100974)
+  https://wg21.link/p0849;>P0849R8, auto(x):
+ decay-copy in the language
+ (https://gcc.gnu.org/PR103049;>PR103049)
+  https://wg21.link/p2242;>P2242R3, Non-literal variables 
(and
+ labels and gotos) in constexpr functions
+ (https://gcc.gnu.org/PR102612;>PR102612)
+  https://wg21.link/p2334;>P2334R1, Support for 
preprocessing
+ directives elifdef and elifndef
+ (https://gcc.gnu.org/PR102616;>PR102616)
+  https://wg21.link/p2360;>P2360R0, Extend 
init-statement
+ to allow alias-declaration
+ (https://gcc.gnu.org/PR102617;>PR102617)
+  
+  https://wg21.link/cwg2397;>DR 2397, auto 
specifier
+ for pointers and references to arrays
+ (https://gcc.gnu.org/PR100975;>PR100975)
 
   
   Several C++ Defect Reports have been resolved, e.g.:
 
-  DR 1227, Mixing immediate and non-immediate contexts in deduction 
failure
-  DR 2397, auto specifier for pointers and references to 
arrays
+  https://wg21.link/cwg960;>DR 960, Covariant functions 
and
+ lvalue/rvalue references
+  https://wg21.link/cwg1227;>DR 1227, Mixing immediate and
+ non-immediate contexts in deduction failure
+  https://wg21.link/cwg1315;>DR 1315, Restrictions on 
non-type
+ template arguments in partial specializations
+  https://wg21.link/cwg2082;>DR 2082, Referring to 
parameters
+ in unevaluated operands of default arguments
+  https://wg21.link/cwg2351;>DR 2351, 
void{}
+  https://wg21.link/cwg2374;>DR 2374, Overly permissive
+ specification of enum direct-list-initialization
+  https://wg21.link/cwg2397;>DR 2397, auto 
specifier
+ for pointers and references to arrays
+  https://wg21.link/cwg2446;>DR 2446, Questionable 
type-dependency
+ of concept-ids
 
   
+  New command-line option -fimplicit-constexpr can be used to
+  make inline functions implicitly constexpr
+  (https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=87c2080b;>git)
+  Deduction guides can be declared at class scope
+  (https://gcc.gnu.org/PR79501;>PR79501)
   -Wuninitialized warns about using uninitialized variables in
-  member initializer lists
+  member initializer lists (https://gcc.gnu.org/PR19808;>PR19808)
+  
+  -Wint-in-bool-context is now disabled when instantiating
+  a template (https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=3a2b12bc;>git)
+  Stricter 

Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/23/2021 11:34 AM, Navid Rahimi via Gcc-patches wrote:

Hi GCC community,

I wanted you to take a quick look at this patch to solve this bug [1]. This is the 
code example for the optimization [2], which includes a link to a proof of 
each optimization.

I think it should be possible to use a simpler approach than what Andrew has used 
here [3].

P.S. Tested and verified on Linux x86_64.

1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808
2) https://compiler-explorer.com/z/Gc448eE3z
3) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808#c1

Don't those match.pd patterns make things worse?  We're taking a single 
expression evaluation (the conditional) and turning it into two logicals 
AFAICT.


For the !x expression, obviously if x is a constant, then we can 
compute that at compile time and we're going from a single conditional 
to a single logical which is probably a win, but that's not the case 
with this patch AFAICT.
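
To make the trade-off concrete, here is a minimal sketch. It assumes the patterns under discussion rewrite comparisons of boolean operands into logical operations; the attached patch is not reproduced here, so the two rewrites shown and all function names are illustrative assumptions rather than the patch's actual patterns.

```cpp
#include <cassert>

// Comparison forms: a single relational expression on two bools.
bool lt_cmp (bool a, bool b) { return a < b; }
bool ge_cmp (bool a, bool b) { return a >= b; }

// Logical forms: a negation plus an AND/OR, i.e. two logical operations.
bool lt_logic (bool a, bool b) { return !a & b; }
bool ge_logic (bool a, bool b) { return a | !b; }

// Exhaustively check that both forms agree for every pair of bool inputs.
void check_forms_agree ()
{
  for (int a = 0; a <= 1; ++a)
    for (int b = 0; b <= 1; ++b)
      {
        assert (lt_cmp (a, b) == lt_logic (a, b));
        assert (ge_cmp (a, b) == ge_logic (a, b));
      }
}
```

The two forms are equivalent, so the open question is only whether one comparison or two logical operations is cheaper on a given target.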


jeff


[PATCH, committed] rs6000: Fix test_mffsl.c effective target check

2021-11-23 Thread Bill Schmidt via Gcc-patches
Hi!

Paul Clarke pointed out to me that I had wrongly used a compile-time check
instead of a run-time check in this executable test.  This patch fixes
that.  I also fixed a typo in a string that caught my eye.

Tested on powerpc64le-linux-gnu, committed as obvious.

Thanks!
Bill


2021-11-23  Bill Schmidt  

gcc/testsuite/
* gcc.target/powerpc/test_mffsl.c: Change effective target to
a run-time check.  Fix a typo in a debug print statement.
---
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c 
b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
index 28c2b91988e..f1f960c51c7 100644
--- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
+++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target { powerpc*-*-* } } } */
 /* { dg-options "-O2 -std=c99 -mcpu=power9" } */
-/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target p9vector_hw } */
 
 #ifdef DEBUG
 #include <stdio.h>
@@ -28,7 +28,7 @@ int main ()
   if (mffs_val.ll != mffsl_val.ll)
 {
 #ifdef DEBUG
-  printf("ERROR, __builtin_mffsl() returned 0x%llx, not the expecected 
value 0x%llx\n",
+  printf("ERROR, __builtin_mffsl() returned 0x%llx, not the expected value 
0x%llx\n",
 mffsl_val.ll, mffs_val.ll);
 #else
   abort();
-- 
2.25.1




Re: [PATCH v2] implement -Winfinite-recursion [PR88232]

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/11/2021 2:46 PM, Martin Sebor via Gcc-patches wrote:

Attached is a v2 of the solution I posted earlier this week
with a few tweaks made after a more careful consideration of
the problem and possible false negatives and positives.

1) It avoids warning for [apparently infinitely recursive] calls
   in noreturn functions where the recursion may be prevented by
   a call to a noreturn function.
2) It avoids warning for calls where the recursion may be prevented
   by a call to a longjmp or siglongjmp.
3) It warns for recursive calls to built-ins in definitions of
   the corresponding library functions (e.g., for a call to
   __builtin_malloc in malloc).
4) It warns for calls to C++ functions even if they call other
   functions that might throw and so break out of the infinite
   recursion.  (E.g., operator new.)  This is the same as Clang.
5) It doesn't warn for calls to C++ functions with the throw
   expression.
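
A minimal sketch of the basic distinction the warning draws, using hypothetical function names. It only illustrates an unconditional self-call versus a guarded one, and makes no claim about exactly which cases a given GCC build diagnoses.

```cpp
#include <cassert>

// Calls itself on every path through the function: the kind of call
// the warning is meant to flag.  It is only compiled here, never invoked.
int count_forever (int n)
{
  return count_forever (n + 1);
}

// Recursion guarded by a base case, so it is not infinitely recursive.
int count_down (int n)
{
  assert (n >= 0);
  if (n == 0)
    return 0;
  return 1 + count_down (n - 1);
}
```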

Besides these changes to the warning itself, I've also improved
the code a bit by making the workhorse function a member of
the pass so recursive calls don't need to pass as many arguments
to itself.

Retested on x86_64-linux and by building Glibc and Binutils/GDB.

A possible enhancement is to warn for calls to calloc, malloc,
or realloc from within the definition of one of the other two
functions.  That might be a mistake made in code that tries
naively to replace the allocator with its own implementation.

On 11/9/21 9:28 PM, Martin Sebor wrote:

The attached patch adds support to the middle end for detecting
infinitely recursive calls.  The warning is controlled by the new
-Winfinite-recursion option.  The option name is the same as
Clang's.

I scheduled the warning pass to run after early inlining to
detect mutually recursive calls but before tail recursion which
turns some recursive calls into infinite loops and so makes
the two indistinguishable.

The warning detects a superset of problems detected by Clang
(based on its tests).  It detects the problem in PR88232
(the feature request) as well as the one in PR 87742,
an unrelated problem report that was root-caused to a bug due
to infinite recursion.

This initial version doesn't attempt to deal with superimposed
symbols, so those might trigger false positives.  I'm not sure
that's something to worry about.

The tests are very light, but those for the exceptional cases
are exceedingly superficial, so it's possible those might harbor
some false positives and negatives.

Tested on x86_64-linux.

Martin




gcc-88232.diff

Implement -Winfinite-recursion [PR88232].

Resolves:
PR middle-end/88232 - Please implement -Winfinite-recursion

gcc/ChangeLog:

PR middle-end/88232
* Makefile.in (OBJS): Add gimple-warn-recursion.o.
* common.opt: Add -Winfinite-recursion.
* doc/invoke.texi (-Winfinite-recursion): Document.
* passes.def (pass_warn_recursion): Schedule a new pass.
* tree-pass.h (make_pass_warn_recursion): Declare.
* gimple-warn-recursion.c: New file.

gcc/c-family/ChangeLog:

PR middle-end/88232
* c.opt: Add -Winfinite-recursion.

gcc/testsuite/ChangeLog:

PR middle-end/88232
* c-c++-common/attr-used-5.c: Suppress valid warning.
* c-c++-common/attr-used-6.c: Same.
* c-c++-common/attr-used-9.c: Same.
* g++.dg/warn/Winfinite-recursion-2.C: New test.
* g++.dg/warn/Winfinite-recursion-3.C: New test.
* g++.dg/warn/Winfinite-recursion.C: New test.
* gcc.dg/Winfinite-recursion-2.c: New test.
* gcc.dg/Winfinite-recursion.c: New test.

This is OK.  While there may be other improvements that could be made, 
this looks like a reasonable warning as-is and can be extended/refined 
as needed.


jeff


Re: [PATCH] PR middle-end/103059: reload: Also accept ASHIFT with indexed addressing

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/10/2021 9:41 AM, Maciej W. Rozycki wrote:

  It's actually hunk #2 that fixes this specific ICE.  The other two are
just a consequence: #3 just being a commutative variant of the same case
and #1 from observing that the rtx may now have changed if an ASHIFT too.


Are we getting into find_reloads_address_1 in any case where the RTL is not an
address inside a MEM?

  I've had a GDB session left open with the problematic source, so it was
merely a case of a rerun and grabbing some data.  So with a breakpoint set
at reload.c:5565, conditionalised on (code0 == ASHIFT || code1 == ASHIFT),
we get exactly this, as with my change description:

Breakpoint 52, find_reloads_address_1 (mode=E_DImode, as=0 '\000', 
x=0x7fffedbaf7b0, context=0, outer_code=MEM, index_code=SCRATCH, 
loc=0x761a82f0, opnum=1, type=RELOAD_FOR_INPUT, ind_levels=1, 
insn=0x7fffefc1c9c0) at .../gcc/reload.c:5565
5565      if (code0 == MULT || code0 == SIGN_EXTEND || code0 == TRUNCATE
(gdb) print code0
$12958 = ASHIFT
(gdb) print code1
$12959 = PLUS
(gdb) print outer_code
$12960 = MEM
(gdb) pr insn
(insn 2051 2050 2052 180 (set (reg/f:SI 0 %r0 [555])
 (plus:SI (ashift:SI (reg/v:SI 154 [ n_ctrs ])
 (const_int 3 [0x3]))
 (plus:SI (reg/v/f:SI 9 %r9 [orig:176 fn_buffer ] [176])
 (const_int 24 [0x18] ".../libgcc/libgcov-driver.c":172:40 
614 {movaddrdi}
  (nil))
(gdb) pr x
(plus:SI (ashift:SI (reg/v:SI 154 [ n_ctrs ])
 (const_int 3 [0x3]))
 (plus:SI (reg/v/f:SI 9 %r9 [orig:176 fn_buffer ] [176])
 (const_int 24 [0x18])))
(gdb) bt
#0  find_reloads_address_1 (mode=E_DImode, as=0 '\000', x=0x7fffedbaf7b0, 
context=0, outer_code=MEM, index_code=SCRATCH, loc=0x761a82f0, opnum=1, 
type=RELOAD_FOR_INPUT, ind_levels=1, insn=0x7fffefc1c9c0) at 
.../gcc/reload.c:5565
#1  0x111ecd18 in find_reloads_address (mode=E_DImode, memrefloc=0x0, 
ad=0x7fffedbaf7b0, loc=0x761a82f0, opnum=1, type=RELOAD_FOR_INPUT, 
ind_levels=1, insn=0x7fffefc1c9c0) at .../gcc/reload.c:5264
#2  0x111e2fbc in find_reloads (insn=0x7fffefc1c9c0, replace=1, ind_levels=1, 
live_known=1, reload_reg_p=0x12ec7770 ) at 
.../gcc/reload.c:2843
#3  0x112060f4 in reload_as_needed (live_known=1) at 
.../gcc/reload1.c:4522
#4  0x111f9008 in reload (first=0x75dd3c28, global=1) at 
.../gcc/reload1.c:1047
#5  0x10f1458c in do_reload () at .../gcc/ira.c:5944
#6  0x10f14d54 in (anonymous namespace)::pass_reload::execute 
(this=0x12f21d20) at .../gcc/ira.c:6118
#7  0x1112472c in execute_one_pass (pass=0x12f21d20) at 
.../gcc/passes.c:2567
#8  0x11124bc4 in execute_pass_list_1 (pass=0x12f21d20) at 
.../gcc/passes.c:2656
#9  0x11124c0c in execute_pass_list_1 (pass=0x12f20b80) at 
.../gcc/passes.c:2657
#10 0x11124cac in execute_pass_list (fn=0x75dc4b00, 
pass=0x12f1c900) at .../gcc/passes.c:2667
#11 0x109b64f4 in cgraph_node::expand (this=0x75d65a50) at 
.../gcc/cgraphunit.c:1828
#12 0x109b6eac in expand_all_functions () at .../gcc/cgraphunit.c:1992
#13 0x109b7eb8 in symbol_table::compile (this=0x75c4) at 
.../gcc/cgraphunit.c:2356
#14 0x109b8638 in symbol_table::finalize_compilation_unit 
(this=0x75c4) at .../gcc/cgraphunit.c:2537
#15 0x112c4418 in compile_file () at .../gcc/toplev.c:477
#16 0x112c8f60 in do_compile (no_backend=false) at .../gcc/toplev.c:2154
#17 0x112c95d4 in toplev::main (this=0x7fffe944, argc=76, 
argv=0x7fffed78) at .../gcc/toplev.c:2306
#18 0x1245a7b8 in main (argc=76, argv=0x7fffed78) at 
.../gcc/main.c:39
(gdb)

-- so `find_reloads_address' is called from reload.c:2843, which is the
call site in code quoted at the top, for an address associated with the
`p' constraint, and then it goes down to `find_reloads_address_1', which
cannot recognise the rtx and therefore leaves it unchanged.

  Here OUTER_CODE is indeed MEM, but it's merely hardcoded by the caller
at reload.c:5264 irrespective of actual insn/rtx:

   return find_reloads_address_1 (mode, as, ad, 0, MEM, SCRATCH, loc,
 opnum, type, ind_levels, insn);

(I note that `find_reloads_address' does that in several places throughout
and I haven't investigated how legitimate it is, but my gut feeling is at
least in the case concerned it's merely a placeholder, because for a plain
address reference it would have to be nil really.)

  Let me know if it clears your concerns and whether there's anything else
you want me to retrieve from that GDB session.

Thanks for the clarifications.  I never would have guessed that we could 
get into that code in the way you've described, but being reload nothing 
should be terribly surprising.


All my concerns have been addressed.  This is fine for the trunk. Thanks 
for your patience & explanations.


Jeff


[PATCH] rs6000: Clarify overloaded builtin diagnostic

2021-11-23 Thread Bill Schmidt via Gcc-patches
Hi!

When a built-in function required by an overloaded function name is not
currently enabled, the diagnostic message is not as clear as it should be.
Saying that one built-in "requires" another is somewhat misleading.  It is
better to explicitly state that the overloaded builtin is implemented by the
missing builtin, so the user knows that the previous error message for the
implementing builtin is because of the overload relationship.

This patch adjusts the informational diagnostic for both the original support
and the new builtin support.  This doesn't affect the test suite, since we
don't test for "note" diagnostics anywhere.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
this okay for trunk?

Thanks!
Bill


2021-11-23  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Clarify diagnostic.
(altivec_resolve_new_overloaded_builtin): Likewise.
---
 gcc/config/rs6000/rs6000-c.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index d08bdfec3ae..5eeac9d4c06 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -1946,7 +1946,8 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
   non-overloaded function has already been issued.  Add
   clarification of the previous message.  */
rich_location richloc (line_table, input_location);
-   inform (&richloc, "builtin %qs requires builtin %qs",
+   inform (&richloc,
+   "overloaded builtin %qs is implemented by builtin %qs",
name, internal_name);
  }
else
@@ -2992,7 +2993,8 @@ altivec_resolve_new_overloaded_builtin (location_t loc, 
tree fndecl,
   non-overloaded function has already been issued.  Add
   clarification of the previous message.  */
rich_location richloc (line_table, input_location);
-   inform (&richloc, "builtin %qs requires builtin %qs",
+   inform (&richloc,
+   "overloaded builtin %qs is implemented by builtin %qs",
name, internal_name);
  }
else
-- 
2.27.0




[PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-11-23 Thread Navid Rahimi via Gcc-patches
Hi GCC community,

I wanted you to take a quick look at this patch to solve this bug [1]. This is the 
code example for the optimization [2], which includes a link to a proof of 
each optimization.

I think it should be possible to use a simpler approach than what Andrew has used 
here [3].

P.S. Tested and verified on Linux x86_64.

1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808 
2) https://compiler-explorer.com/z/Gc448eE3z
3) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808#c1

Best wishes,
Navid.

0001-PR-tree-optimization-101808.patch
Description: 0001-PR-tree-optimization-101808.patch


Re: [PATCH] Enhance optimize_atomic_bit_test_and to handle truncation.

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/16/2021 10:20 PM, liuhongt via Gcc-patches wrote:

r12-5102-gfb161782545224f5 improves integer bit test on
__atomic_fetch_[or|and]_* returns only for nop_convert, i.e.

transform

   mask_5 = 1 << bit_4(D);
   mask.0_1 = (unsigned int) mask_5;
   _2 = __atomic_fetch_or_4 (a_7(D), mask.0_1, 0);
   t1_9 = (int) _2;
   t2_10 = mask_5 & t1_9;

to

   mask_5 = 1 << n_4(D);
   mask.1_1 = (unsigned int) mask_5;
   _11 = .ATOMIC_BIT_TEST_AND_SET (_a_1_4, n_4(D), 0);
   _8 = (int) _11;

This patch extends the original patch to handle truncation,
i.e.

transform

   long int mask;
   mask_8 = 1 << n_7(D);
   mask.0_1 = (long unsigned int) mask_8;
   _2 = __sync_fetch_and_or_8 (_a_2_3, mask.0_1);
   _3 = (unsigned int) _2;
   _4 = (unsigned int) mask_8;
   _5 = _3 & _4;
   _6 = (int) _5;

to

   long int mask;
   mask_8 = 1 << n_7(D);
   mask.0_1 = (long unsigned int) mask_8;
   _14 = .ATOMIC_BIT_TEST_AND_SET (_a_2_3, n_7(D), 0);
   _5 = (unsigned int) _14;
   _6 = (int) _5;
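
For reference, a sketch of source code that could give rise to GIMPLE of the truncating shape above. It assumes a 64-bit `long`; the function and parameter names are illustrative and are not taken from the new tests.

```cpp
#include <cassert>

// Atomically set bit N of *WORD and return a nonzero value if the bit
// was already set.  The fetched value is truncated to unsigned int
// before the mask test, mirroring the (unsigned int) casts in the GIMPLE.
// Only meaningful for N below 31 because of that truncation.
int test_and_set_bit_truncated (long *word, int n)
{
  assert (n >= 0 && n < 31);
  long mask = 1L << n;
  unsigned int old = (unsigned int) __sync_fetch_and_or (word, mask);
  return (int) (old & (unsigned int) mask);
}
```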

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?

2021-11-17  Hongtao Liu  
H.J. Lu  

gcc/ChangeLog:

PR tree-optimization/103194
* match.pd (gimple_nop_atomic_bit_test_and_p): Extended to
match truncation.
* tree-ssa-ccp.c (gimple_nop_convert): Declare.
(optimize_atomic_bit_test_and): Enhance
optimize_atomic_bit_test_and to handle truncation.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103194-2.c: New test.
* gcc.target/i386/pr103194-3.c: New test.
* gcc.target/i386/pr103194-4.c: New test.
* gcc.target/i386/pr103194-5.c: New test.
* gcc.target/i386/pr103194.c: New test.

OK
jeff



Re: [PATCH 3/4] libgcc: Split FDE search code from PT_GNU_EH_FRAME lookup

2021-11-23 Thread Florian Weimer via Gcc-patches
* Jakub Jelinek:

> On Wed, Nov 03, 2021 at 05:28:48PM +0100, Florian Weimer wrote:
>> @@ -383,12 +376,34 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info 
>> *info, size_t size, void *ptr)
>>  # endif
>>  #endif
>>  
>> -  _Unwind_Ptr dbase = unw_eh_callback_data_dbase (data);
>> +  return 1;
>> +}
>> +
>> +/* Result type of find_fde_tail below.  */
>> +struct find_fde_tail_result
>> +{
>> +  const fde *entry;
>> +  void *func;
>> +};
>> +
>> +/* Find the FDE for the program counter PC, in a previously located
>> +   PT_GNU_EH_FRAME data region.  */
>> +static struct find_fde_tail_result
>> +find_fde_tail (_Unwind_Ptr pc,
>> +   const struct unw_eh_frame_hdr *hdr,
>> +   _Unwind_Ptr dbase)
>
> I think returning a struct like find_fde_tail_result can work nicely
> on certain targets, but on many others the psABI forces such returns through
> stack etc.
> Wouldn't it be better to return const fde * instead of
> struct find_fde_tail_result, pass in struct dwarf_eh_bases *bases
> as another argument to find_fde_tail, just return NULL on the failure
> cases and return some fde pointer and set bases->func on success?

I've refactored it further in the version below.  I introduced the
struct to consolidate the *bases struct update among the success cases,
but I think it's okay to do this inline.  I didn't introduce a separate
function for that.
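
The interface shape discussed above, a pointer return plus an out-parameter that is written only on success, can be sketched as follows. All types and names are hypothetical stand-ins, not the libgcc definitions, and the linear search is only there to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for an FDE and for the bases record.
struct toy_fde { unsigned long pc_begin; unsigned long pc_range; };
struct toy_bases { unsigned long func; };

// Return the entry covering PC, or nullptr if there is none.  *BASES is
// written only on success, so no separate result struct is needed.
const toy_fde *
toy_find_fde (const toy_fde *table, std::size_t count, unsigned long pc,
              toy_bases *bases)
{
  assert (bases != nullptr);
  for (std::size_t i = 0; i < count; ++i)
    // Unsigned subtraction wraps for pc < pc_begin, so one compare suffices.
    if (pc - table[i].pc_begin < table[i].pc_range)
      {
        bases->func = table[i].pc_begin;
        return &table[i];
      }
  return nullptr;
}
```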

I tested this in isolation on x86-64 and i386, with no apparent
regressions.

I think the changes are compatible with the fourth patch (which I still
have to rebase on top of that).

Thanks,
Florian

8<--8<
This allows switching to a different implementation for
PT_GNU_EH_FRAME lookup in a subsequent commit.

This moves some of the PT_GNU_EH_FRAME parsing out of the glibc loader
lock that is implied by dl_iterate_phdr.  However, the FDE is already
parsed outside the lock before this change, so this does not introduce
additional crashes in case of a concurrent dlclose.

libunwind/ChangeLog

* unwind-dw2-fde-dip.c (struct unw_eh_callback_data): Add hdr.
Remove func, ret.
(find_fde_tail): New function.  Split from
_Unwind_IteratePhdrCallback. Move the result initialization
from _Unwind_Find_FDE.
(_Unwind_Find_FDE): Updated to call find_fde_tail.

---
 libgcc/unwind-dw2-fde-dip.c | 92 -
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 3f302826d2d..fbb0fbdebb9 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -113,8 +113,7 @@ struct unw_eh_callback_data
 #if NEED_DBASE_MEMBER
   void *dbase;
 #endif
-  void *func;
-  const fde *ret;
+  const struct unw_eh_frame_hdr *hdr;
   int check_cache;
 };
 
@@ -197,10 +196,6 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 #else
   _Unwind_Ptr load_base;
 #endif
-  const unsigned char *p;
-  const struct unw_eh_frame_hdr *hdr;
-  _Unwind_Ptr eh_frame;
-  struct object ob;
   _Unwind_Ptr pc_low = 0, pc_high = 0;
 
   struct ext_dl_phdr_info
@@ -348,10 +343,8 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 return 0;
 
   /* Read .eh_frame_hdr header.  */
-  hdr = (const struct unw_eh_frame_hdr *)
+  data->hdr = (const struct unw_eh_frame_hdr *)
 __RELOC_POINTER (p_eh_frame_hdr->p_vaddr, load_base);
-  if (hdr->version != 1)
-return 1;
 
 #ifdef CRT_GET_RFIB_DATA
 # if defined __i386__ || defined __nios2__
@@ -383,12 +376,30 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 # endif
 #endif
 
-  _Unwind_Ptr dbase = unw_eh_callback_data_dbase (data);
+  return 1;
+}
+
+/* Find the FDE for the program counter PC, in a previously located
+   PT_GNU_EH_FRAME data region.  *BASES is updated if an FDE to return is
+   found.  */
+
+static const fde *
+find_fde_tail (_Unwind_Ptr pc,
+  const struct unw_eh_frame_hdr *hdr,
+  _Unwind_Ptr dbase,
+  struct dwarf_eh_bases *bases)
+{
+  const unsigned char *p = (const unsigned char *) (hdr + 1);
+  _Unwind_Ptr eh_frame;
+  struct object ob;
+
+  if (hdr->version != 1)
+return NULL;
+
   p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
base_from_cb_data (hdr->eh_frame_ptr_enc,
   dbase),
-   (const unsigned char *) (hdr + 1),
-   &eh_frame);
+   p, &eh_frame);
 
   /* We require here specific table encoding to speed things up.
  Also, DW_EH_PE_datarel here means using PT_GNU_EH_FRAME start
@@ -404,7 +415,7 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
p, _count);
   /* Shouldn't happen.  */
   if 

[PATCH 2/2] PR tree-optimization/103231 - Directly resolve range_of_stmt dependencies.

2021-11-23 Thread Andrew MacLeod via Gcc-patches

This is the second patch in the series.

Ranger uses its own API to recursively satisfy dependencies. When 
range_of_stmt is called on _1482 = _1154 + _1177;  it picks up the 
ranges of _1154 and _1177 from its cache. If those statements have not 
been seen yet, it recursively calls range_of_stmt on each one to resolve 
the answer.  Each main API call can trigger up to 5 other calls to get 
to the next API point:


   gimple_ranger::fold_range_internal (...)
   gimple_ranger::range_of_stmt (_1154,...)
   gimple_ranger::range_of_expr (_1154,)
   fold_using_range::range_of_range_op (..)
   fold_using_range::fold_stmt (...)
   gimple_ranger::fold_range_internal (...)
   gimple_ranger::range_of_stmt (_1482,...)

For a normal forward walk, values tend to already be in the cache, but 
when we try to answer a range_on_edge question on a back edge, it can 
trigger a very long series of queries.  I spent some time analyzing 
these patterns, and found that regardless of which API entry point was 
used, ultimately range_of_stmt is invoked in a predictable order to 
initiate the cache values.


This patch implements a dependency resolver which range_of_stmt 
uses when it is called on something which does not have a cache entry 
yet (thus the disambiguation of the temporal failure vs. lack of cache 
entry in the previous patch).


This looks at each operand, and if that operand does not have a cache 
entry, pushes it on a stack.  Names are popped from the stack and 
fold_using_range() is invoked once all the operands have been 
resolved.  When we do get to call fold_using_range::fold_stmt(), we are 
sure the operands are cached and the value will simply be calculated.  
This is ultimately the exact series of events that would have happened 
had the main API been used... except we don't involve the call stack 
anymore for each one.
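
A toy sketch of the idea, not the ranger code itself. It assumes each "name" is defined by a statement with at most two operands and that the dependency graph is acyclic; the types and the "sum of operands" folding rule are invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an SSA name.  Operand index -1 means "no operand".
// A node evaluates to LEAF plus the values of its operands.
struct toy_node { int op1; int op2; long leaf; };

struct toy_resolver
{
  std::vector<toy_node> nodes;
  std::vector<bool> cached;
  std::vector<long> value;
  std::vector<int> stack;   // explicit worklist instead of the call stack

  explicit toy_resolver (const std::vector<toy_node> &n)
    : nodes (n), cached (n.size (), false), value (n.size (), 0) {}

  // Fold NAME; every operand must already be cached.
  void fold (int name)
  {
    const toy_node &n = nodes[name];
    long v = n.leaf;
    if (n.op1 >= 0) { assert (cached[n.op1]); v += value[n.op1]; }
    if (n.op2 >= 0) { assert (cached[n.op2]); v += value[n.op2]; }
    value[name] = v;
    cached[name] = true;
  }

  // Resolve NAME by pushing uncached operands and folding a name only
  // once nothing it depends on is missing from the cache.
  long range_of (int name)
  {
    if (!cached[name])
      stack.push_back (name);
    while (!stack.empty ())
      {
        int top = stack.back ();
        if (cached[top]) { stack.pop_back (); continue; }
        const toy_node &n = nodes[top];
        bool pushed = false;
        if (n.op1 >= 0 && !cached[n.op1]) { stack.push_back (n.op1); pushed = true; }
        if (n.op2 >= 0 && !cached[n.op2]) { stack.push_back (n.op2); pushed = true; }
        if (pushed)
          continue;
        stack.pop_back ();
        fold (top);
      }
    return value[name];
  }
};
```

Because the worklist lives on the heap, a dependency chain hundreds of thousands of names long is resolved without deepening the call stack.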


Well, mostly :-).  For this fix, we only do this with operands of stmts 
which have a range-ops handler, meaning we do not use the API for 
anything range-ops understands.  We will still use the main API for 
resolving PHIs and other statements as they are encountered.  We could 
do this for PHIs as well, but for the most part it was the chains of 
stmts within a block that were causing the vast majority of the issue.  
If we later discover large chains of PHIs are causing issues as well, 
then I can easily add them to this as well.  I avoided them this time 
because there is extra overhead involved in traversing all the PHI 
arguments extra times.  Sticking with range-ops limits us to 2 operands 
to check, and the overhead is very minimal.


I have tested this with PHIs as well and we could just include them 
upfront. The overhead is more than doubled, but the increased compile 
time of a VRP pass is still under 1%.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

Andrew



From 28d1fea6e6c0c0368dbc04e895aaa0a6b47c19da Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 22 Nov 2021 14:39:41 -0500
Subject: [PATCH 3/3] Directly resolve range_of_stmt dependencies.

All ranger API entries eventually call range_of_stmt to ensure there is an
initial global value to work with.  This can cause very deep call chains when
satisfied via the normal API.  Instead, push any dependencies onto a stack
and evaluate them in a depth first manner, mirroring what would have happened
via the normal API calls.

	PR tree-optimization/103231
	gcc/
	* gimple-range.cc (gimple_ranger::gimple_ranger): Create stmt stack.
	(gimple_ranger::~gimple_ranger): Delete stmt stack.
	(gimple_ranger::range_of_stmt): Process dependencies if they have no
	global cache entry.
	(gimple_ranger::prefill_name): New.
	(gimple_ranger::prefill_stmt_dependencies): New.
	* gimple-range.h (class gimple_ranger): Add prototypes.
---
 gcc/gimple-range.cc | 107 +++-
 gcc/gimple-range.h  |   4 ++
 2 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index e3ab3a8bb48..178a470a419 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -46,6 +46,9 @@ gimple_ranger::gimple_ranger () :
   m_oracle = m_cache.oracle ();
   if (dump_file && (param_ranger_debug & RANGER_DEBUG_TRACE))
 tracer.enable_trace ();
+  m_stmt_list.create (0);
+  m_stmt_list.safe_grow (num_ssa_names);
+  m_stmt_list.truncate (0);
 
   // Ensure the not_executable flag is clear everywhere.
   if (flag_checking)
@@ -61,6 +64,11 @@ gimple_ranger::gimple_ranger () :
 }
 }
 
+gimple_ranger::~gimple_ranger ()
+{
+  m_stmt_list.release ();
+}
+
 bool
 gimple_ranger::range_of_expr (irange &r, tree expr, gimple *stmt)
 {
@@ -284,9 +292,10 @@ gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
   else
 {
   bool current;
-  // Check if the stmt has already been processed, and is not stale.
+  // Check if the stmt has already been processed.
   if (m_cache.get_global_range (r, name, current))
 	{
+	  // If it isn't stale, 

[PATCH 1/2] Split return functionality of get_non_stale_global_range.

2021-11-23 Thread Andrew MacLeod via Gcc-patches
This is the first of 2 patches which will reduce the depth of the call 
chain in ranger.


This patch simply splits the functionality of the routine 
get_non_stale_global_range() from a single boolean return to a boolean 
return and a bool reference.


This routine queries the global cache for a value.  If there is no 
value, it queries the legacy global range and sets it to that value.  If 
there was a value, it checks the temporal cache to see if it's current, 
and if it is, returns TRUE plus the range.


If the value is not current, or it was set to the legacy global value, 
then the timestamp is marked as "always current" as it indicates a 
calculation is ongoing, and we don't want to trigger any additional 
temporal faults until the calculation is done. And finally FALSE is 
returned for all these cases.


The second patch in the series wants to disambiguate at the call site 
whether this was a failure due to not being in the global cache, or 
whether it was due to the timestamp being out of date, and take different 
actions for each case.  Details in the following note.
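
A toy sketch of the split interface as described in the prose above: the return value says whether a cache entry existed, and a separate flag says whether it was current. This is not the ranger_cache code. The map-based cache, the `legacy_value` helper and the single `fresh` bit standing in for the temporal cache are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>

// Toy cache entry: a value plus a flag saying whether it is up to date.
struct toy_entry { int value; bool fresh; };

struct toy_cache
{
  std::map<int, toy_entry> entries;

  // Hypothetical stand-in for the legacy global value of NAME.
  static int legacy_value (int name) { return -name; }

  // Return true if NAME already had a cache entry.  VALUE is always set.
  // CURRENT is true only if an existing entry was still fresh.  After
  // the call the cache holds an entry for NAME that is treated as fresh,
  // modelling "always current" while a recalculation is in progress.
  bool get (int name, int &value, bool &current)
  {
    assert (name >= 0);
    auto it = entries.find (name);
    bool had_entry = it != entries.end ();
    if (had_entry)
      {
        value = it->second.value;
        current = it->second.fresh;
        it->second.fresh = true;
      }
    else
      {
        value = legacy_value (name);
        current = false;
        entries[name] = toy_entry{value, true};
      }
    return had_entry;
  }
};
```

With this shape a caller can tell "never computed" (returns false) apart from "computed but stale" (returns true with the flag clear) and react differently to each.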


This has been Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

Andrew

From 310719594aa20e8d012f478ab3208f889b558bac Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 19 Nov 2021 12:59:12 -0500
Subject: [PATCH 2/3] Split return functionality of get_non_stale_global_range.

Get_non_stale_global_range returns true only when there is a cache entry that
is not out of date.  Change it so that it returns true if there was a cache
value, but return the temporal comparison result in an auxiliary flag.

	* gimple-range-cache.cc (ranger_cache::get_global_range): Always
	return a range, return if it came from the cache or not.
	(get_non_stale_global_range): Rename to get_global_range, and return
	the temporal state in a flag.
	* gimple-range-cache.h (get_non_stale_global_range): Rename and adjust.
	* gimple-range.cc (gimple_ranger::range_of_expr): No need to query
	get_global_range.
	(gimple_ranger::range_of_stmt): Adjust for global cache temporal state
	returned in a flag.
---
 gcc/gimple-range-cache.cc | 55 ---
 gcc/gimple-range-cache.h  |  2 +-
 gcc/gimple-range.cc   | 21 ---
 3 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index b347edeb474..fe31e9462aa 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -923,44 +923,45 @@ ranger_cache::dump_bb (FILE *f, basic_block bb)
 }
 
 // Get the global range for NAME, and return in R.  Return false if the
-// global range is not set.
+// global range is not set, and return the legacy global value in R.
 
 bool
 ranger_cache::get_global_range (irange &r, tree name) const
 {
-  return m_globals.get_global_range (r, name);
+  if (m_globals.get_global_range (r, name))
+return true;
+  r = gimple_range_global (name);
+  return false;
 }
 
-// Get the global range for NAME, and return in R if the value is not stale.
-// If the range is set, but is stale, mark it current and return false.
-// If it is not set pick up the legacy global value, mark it current, and
-// return false.
-// Note there is always a value returned in R. The return value indicates
-// whether that value is an up-to-date calculated value or not..
+// Get the global range for NAME, and return in R.  Return false if the
+// global range is not set, and R will contain the legacy global value.
+// CURRENT_P is set to true if the value was in cache and not stale.
+// Otherwise, set CURRENT_P to false and mark as it always current.
+// If the global cache did not have a value, initialize it as well.
+// After this call, the global cache will have a value.
 
 bool
-ranger_cache::get_non_stale_global_range (irange &r, tree name)
+ranger_cache::get_global_range (irange &r, tree name, bool &current_p)
 {
-  if (m_globals.get_global_range (r, name))
-{
-  // Use this value if the range is constant or current.
-  if (r.singleton_p ()
-	  || m_temporal->current_p (name, m_gori.depend1 (name),
-m_gori.depend2 (name)))
-	return true;
-}
+  bool had_global = get_global_range (r, name);
+
+  // If there was a global value, set current flag, otherwise set a value.
+  current_p = false;
+  if (had_global)
+current_p = r.singleton_p ()
+		|| m_temporal->current_p (name, m_gori.depend1 (name),
+	  m_gori.depend2 (name));
   else
-{
-  // Global has never been accessed, so pickup the legacy global value.
-  r = gimple_range_global (name);
-  m_globals.set_global_range (name, r);
-}
-  // After a stale check failure, mark the value as always current until a
-  // new one is set.
-  m_temporal->set_always_current (name);
-  return false;
+m_globals.set_global_range (name, r);
+
+  // If the existing value was not current, mark it as always current.
+  if (!current_p)
+m_temporal->set_always_current (name);
+  return current_p;
 }
-//  Set the 

Re: [PATCH 3/3] elf: Add _dl_find_eh_frame function

2021-11-23 Thread Adhemerval Zanella via Gcc-patches



On 17/11/2021 10:40, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> However the code is somewhat complex and I would like to have some feedback
>> if gcc will be willing to accept this change (I assume it would require
>> this code merge on glibc beforehand).
> 
> There's a long review queue on the GCC side due to the stage1 close.
> It may still be considered for GCC 12.  Jakub has also requested that
> we hold off committing the glibc side until the GCC side is reviewed.
> 
> I'll flesh out the commit message and NEWS entry once we have agreed
> upon the interface.
> 
>>> new file mode 100644
>>> index 00..c7313c122d
>>> --- /dev/null
>>> +++ b/elf/dl-find_eh_frame.c
> 
>>> +/* Data for the main executable.  There is usually a large gap between
>>> +   the main executable and initially loaded shared objects.  Record
>>> +   the main executable separately, to increase the chance that the
>>> +   range for the non-closeable mappings below covers only the shared
>>> +   objects (and not also the gap between main executable and shared
>>> +   objects).  */
>>> +static uintptr_t _dl_eh_main_map_start attribute_relro;
>>> +static struct dl_eh_frame_info _dl_eh_main_info attribute_relro;
>>> +
>>> +/* Data for initally loaded shared objects that cannot be unlaoded.
>>
>> s/initally/initially and s/unlaoded/unloaded.
> 
> Fixed.
> 
>>
>>> +   The mapping base addresses are stored in address order in the
>>> +   _dl_eh_nodelete_mappings_bases array (containing
>>> +   _dl_eh_nodelete_mappings_size elements).  The EH data for a base
>>> +   address is stored in the parallel _dl_eh_nodelete_mappings_infos.
>>> +   These arrays are not modified after initialization.  */
>>> +static uintptr_t _dl_eh_nodelete_mappings_end attribute_relro;
>>> +static size_t _dl_eh_nodelete_mappings_size attribute_relro;
>>> +static uintptr_t *_dl_eh_nodelete_mappings_bases attribute_relro;
>>> +static struct dl_eh_frame_info *_dl_eh_nodelete_mappings_infos
>>> +  attribute_relro;
>>> +
>>> +/* Mappings created by dlopen can go away with dlclose, so a data
>>> +   dynamic data structure with some synchronization is needed.
>>
>> This sounds strange ("a data dynamic data").
> 
> I dropped the first data.
> 
>>
>>> +   Individual segments are similar to the _dl_eh_nodelete_mappings
>>
>> Maybe use _dl_eh_nodelete_mappings_*, because '_dl_eh_nodelete_mappings'
>> itself is not defined anywhere.
> 
> Right.
> 
>>> +   Adding new elements to this data structure is another source of
>>> +   quadratic behavior for dlopen.  If the other causes of quadratic
>>> +   behavior are eliminated, a more complicated data structure will be
>>> +   needed.  */
>>
>> This worries me, especially since we have reports that Python and other dynamic
>> environments use a lot of plugins and generate a lot of dlopen() calls.
>> What kind of performance implications do you foresee here?
> 
> The additional overhead is not disproportionate to the other sources of
> quadratic behavior.  With 1,000 dlopen'ed objects, overall run-time
> seems to be comparable to the strcmp time required for soname matching, for
> example, and is quite difficult to measure.  So we could fix the
> performance regression if we used a hash table for that …
> 
> It's just an undesirable complexity class.  The implementation is not
> actually slow because it's a mostly-linear copy (although a backwards
> one).  Other parts of dlopen involve pointer chasing and are much
> slower.

Right, I agree this probably won't incur performance issues;
I was curious whether you have any numbers about it.

> 
>>> +/* Allocate an empty segment that is at least SIZE large.  PREVIOUS */
>>
>> What does this PREVIOUS refer to?
> 
> Oops, it's now:
> 
> /* Allocate an empty segment that is at least SIZE large.  PREVIOUS
>points to the chain of previously allocated segments and can be
>NULL.  */
> 
>>> +/* Update the version to reflect that an update is happening.  This
>>> +   does not change the bit that controls the active segment chain.
>>> +   Returns the index of the currently active segment chain.  */
>>> +static inline unsigned int
>>> +_dl_eh_mappings_begin_update (void)
>>> +{
>>> +  unsigned int v
>>> += __atomic_wide_counter_fetch_add_relaxed 
>>> (&_dl_eh_loaded_mappings_version,
>>> +   2);
>>
>> Why use an 'unsigned int' for the wide counter here?
> 
> Because …
> 
>>> +  /* Subsequent stores to the TM data must not be reordered before the
>>> + store above with the version update.  */
>>> +  atomic_thread_fence_release ();
>>> +  return v & 1;
>>> +}
> 
> … we only need the lower bit.

Ack, I guess it won't matter to the compiler.
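
The point of the exchange above (bump the version by 2 so that bit 0 keeps selecting the active segment chain) can be sketched in isolation. This is only an illustration that assumes std::atomic in place of glibc's internal __atomic_wide_counter helpers; mappings_version and begin_update are hypothetical names, not the ones in the patch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for _dl_eh_loaded_mappings_version.
static std::atomic<std::uint64_t> mappings_version{0};

// Announce that an update is starting.  Adding 2 leaves bit 0, which
// selects the active segment chain, unchanged.
static unsigned int
begin_update ()
{
  std::uint64_t v = mappings_version.fetch_add (2, std::memory_order_relaxed);
  // Later stores must not be reordered before the version bump.
  std::atomic_thread_fence (std::memory_order_release);
  // Only the low bit is needed, so a narrow return type is enough.
  return v & 1;
}
```

Because only the low bit is returned, the width of the local variable holding the fetched value does not affect the result.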

> 
>>> +  /* Other initially loaded objects.  */
>>> +  if (pc >= *_dl_eh_nodelete_mappings_bases
>>> +  && pc < _dl_eh_nodelete_mappings_end)
>>> +{
>>> +  size_t idx = _dl_eh_find_lower_bound (pc,
>>> +

Re: [PATCH] Loop unswitching: support gswitch statements.

2021-11-23 Thread Martin Liška

On 11/23/21 16:20, Martin Liška wrote:

Sure, so for e.g. case 1 ... 5 we would need to create a new unswitch_predicate
with a 1 <= index && index <= 5 tree predicate (and the corresponding irange
range).
Later, once we unswitch on it, we should use a special unreachable_flag that will
be used for marking dead edges (similar to how we fold gconds to
boolean_{false/true}_node).
Does it make sense?


I have thought about it more and it's not enough. What we really want is to have
an irange
for *each edge* (2 for gconds and multiple for gswitches). Once we select an
unswitch_predicate,
we need to fold_range all these iranges in the true/false loop. Doing that, we
can handle situations like:

if (index < 1)
   do_something1

if (index > 2)
   do_something2

switch (index)
   case 1 ... 2:
 do_something;
...

As seen, once we unswitch on 'index < 1' and 'index > 2', the first
case will be taken in the false_edge
of the 'index > 2' loop unswitching.

Martin
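
The per-edge range idea above can be illustrated with a toy model. This is only a sketch: it uses a plain closed interval instead of GCC's irange, and Range, intersect and edge_dead_p are hypothetical names that do not exist in the unswitching pass.

```cpp
#include <algorithm>
#include <climits>

// Closed integer interval [lo, hi]; empty when lo > hi.
struct Range
{
  long lo;
  long hi;
  bool empty () const { return lo > hi; }
};

// Intersection of two intervals.
static Range
intersect (Range a, Range b)
{
  return Range{std::max (a.lo, b.lo), std::min (a.hi, b.hi)};
}

// An edge is dead in a loop copy when its range cannot overlap the
// range that the chosen unswitching predicates leave for the index.
static bool
edge_dead_p (Range edge, Range known)
{
  return intersect (edge, known).empty ();
}
```

In the loop copy where both 'index < 1' and 'index > 2' are false, the index is known to lie in [1, 2], so the edge for case 1 ... 2 stays live while the true edges of both conditions become dead.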


Re: [PATCH 06/10] tree-object-size: Support dynamic sizes in conditions

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 23, 2021 at 09:30:30PM +0530, Siddhesh Poyarekar wrote:
> On 11/23/21 21:22, Jakub Jelinek wrote:
> > Evaluating __bdos in both passes is undesirable, certainly for the same
> > SSA_NAME, but even for different SSA_NAMEs, if everything is done in a
> > single pass it can easily share temporaries (object sizes for SSA_NAMEs it
> > uses), while if some __bdos is evaluated early and other late, we'll need to
> > hope further optimizations CSE those.
> 
> OK, then treat __bdos like __bos in objsz1, adding MIN/MAX for subobjects
> and full evaluation in objsz2?

Yes.  It is not perfect, but unfortunately it is hard to get perfect
results.  The subobject stuff for __bos has been designed when GCC wasn't
doing such optimizations, later MEM_REF has been introduced and we really
can't prevent all those optimizations just because there could be a
subobject __bos somewhere that cares about that.

Jakub



Re: [PATCH 06/10] tree-object-size: Support dynamic sizes in conditions

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 23, 2021 at 09:08:35PM +0530, Siddhesh Poyarekar wrote:
> On 11/23/21 21:06, Siddhesh Poyarekar wrote:
> > On 11/23/21 20:42, Jakub Jelinek wrote:
> > > On Wed, Nov 10, 2021 at 12:31:32AM +0530, Siddhesh Poyarekar wrote:
> > > > (object_sizes_execute): Don't insert min/max for dynamic sizes.
> > > 
> > > I'm worried about this.
> > > I'd say what we might want to do is in the early pass for __bdos
> > > compute actually __bos (i.e. the static one) and add MIN_EXPR/MAX_EXPR
> > > for the result of the __bdos call from the second pass with the
> > > statically computed value.
> > > 
> > > The reason for the MIN_EXPR/MAX_EXPR stuff is that GIMPLE optimizations
> > > can remove exact ADDR_EXPRs with detailed COMPONENT_REF etc. access paths
> > > in it, so during the late objsz2 pass the subobject modes don't work
> > > reliably anymore.  But the subobject knowledge should be the same between
> > > the static and dynamic evaluation...
> > 
> > So in the dynamic case we almost always end up with the right expression
> > in objsz1, except in cases where late optimizations make available
> > information that wasn't available earlier.  How about putting in a
> > MIN_EXPR/MAX_EXPR if we *fail* to get the subobject size instead?
> 
> Actually if we don't get a dynamic expression it's unlikely that we'll get a
> static size either, so I'm not sure if MIN_EXPR/MAX_EXPR will actually do
> anything useful.

Consider:
struct S { int a; char b[16]; int c; char d[]; };

static int
foo (struct S *s)
{
  return __builtin_object_size (&s->b[2], 1);
}

int
bar (int m)
{
  struct S *s = (struct S *) __builtin_malloc (m);
  int r = foo (s);
  __builtin_free (s);
  return r;
}

In early_objsz, foo isn't inlined, we can statically determine the maximum
bound of 14 but don't really know how large the allocation will actually be.
So, we record that foo returns MIN_EXPR <__builtin_object_size (&s->b[2], 1), 14>
and in the late objsz compute that as MIN_EXPR <-1UL, 14>.
Now, with __builtin_dynamic_object_size, I think it is pretty much the same,
you don't know how large the allocation actually is, so IMHO we want to
record MIN_EXPR <__builtin_dynamic_object_size (&s->b[2], 1), 14> and
at runtime do MIN_EXPR  or so.

It is true that it is just an upper bound, if we do:

static int
baz (struct S *s, int l)
{
  return __builtin_dynamic_object_size (l ? &s->b[3] : &s->b[2], 1);
}

int
qux (int m, int l)
{
  struct S *s = (struct S *) __builtin_malloc (m);
  int r = baz (s, l);
  __builtin_free (s);
  return r;
}

then the statically computed __bos in early_objsz would be still 14 and I
think dynamic needs to punt because it really doesn't know how large the
allocation will be.  In the late objsz it can make
m - 6 - !!l out of it for the __bdos (, 0) from it and combine that
to MIN_EXPR  which still isn't exact, that would be
l ? MIN_EXPR  : MIN_EXPR .
But, at late objsz time, the information that there was &s->b[3] and
&s->b[2] might be gone, consider e.g. in baz above a call
  corge ((char *) s + offsetof (struct S, b) + 2,
 (char *) s + offsetof (struct S, b) + 3);
where SCCVN will CSE the (char *) s + offsetof (struct S, b) + 2
and &s->b[2] etc. expressions because they have the same value.
But __bos ((char *) s + offsetof (struct S, b) + 2, 1) should be
equal to __bos ((char *) s + offsetof (struct S, b) + 2, 0).

Jakub
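
The difference between the subobject mode (1) and the whole-object mode (0) discussed above can be seen on a statically known object. This is a minimal sketch, assuming a compiler that provides __builtin_object_size (GCC or Clang); the struct omits the flexible array member from the example above to keep the sizes fixed, and obj, subobject_size and whole_object_size are hypothetical names.

```cpp
#include <cstddef>

struct S { int a; char b[16]; int c; };

static S obj;

// Mode 1: bytes left in the closest enclosing subobject (the array b).
static std::size_t
subobject_size ()
{
  return __builtin_object_size (&obj.b[2], 1);
}

// Mode 0: bytes left in the whole object obj.
static std::size_t
whole_object_size ()
{
  return __builtin_object_size (&obj.b[2], 0);
}
```

Mode 1 is the bound that depends on the exact &obj.b[2] access path surviving, which is why the early pass records it before later optimizations can rewrite the address into plain pointer arithmetic.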



Re: [PATCH] fixincludes: don't abort() on access failure [PR103306]

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/23/2021 2:31 AM, Xi Ruoyao wrote:

On Mon, 2021-11-22 at 17:37 -0700, Jeff Law wrote:


On 11/18/2021 4:01 AM, Xi Ruoyao via Gcc-patches wrote:

Some distros may ship dangling symlinks in include directories, which
trigger
the access failure.  Skip such a file and continue to the next header instead of
panicking.

Restore to old behavior before r12-5234 but without resurrecting the
problematic getcwd() call, by using the environment variable "INPUT"
exported by fixinc.sh.

Tested on x86_64-linux-gnu, with a dangling symlink in /usr/include.

fixincludes/

 PR bootstrap/103306
 * fixincl.c (process): Don't call abort().
---
   fixincludes/fixincl.c | 15 ---
   1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
index a17b65866c3..81939ee5ffa 100644
--- a/fixincludes/fixincl.c
+++ b/fixincludes/fixincl.c
@@ -1352,10 +1352,19 @@ process (void)
   
     if (access (pz_curr_file, R_OK) != 0)

   {
-  /* Some really strange error happened.  */
-  fprintf (stderr, "Cannot access %s: %s\n", pz_curr_file,
+  /* It may happens if for e. g. the distro ships some broken symlinks
+   * in /usr/include.  */
+
+  /* "INPUT" is exported in fixinc.sh, which is the pwd where fixincl
+   * runs.  It's used instead of getcwd to avoid allocating a buffer
+   * with unknown length.  */

Formatting nits.  We don't use '*' at the start of comment lines. Drop
the '*' like this

    /* blah blah blah
   more text.  */

Strangely contrib/check_GNU_style.sh does not warn about this.
It should.  Though in fairness, that checker is new relative to the 
overall life of the GCC project and obviously not 100% complete. Patches 
are always appreciated :-)







+  const char *cwd = getenv ("INPUT");
+  if (!cwd)
+   cwd = "the working directory";
+
+  fprintf (stderr, "Cannot access %s from %s: %s\n", pz_curr_file, cwd,
    xstrerror (errno));
-  abort ();
+  return;
   }

If INPUT is always exported, why not just print it? i.e., would CWD ever
actually be NULL?

INPUT is set by fixinc.sh.  During the GCC build process, fixincl is
always invoked by fixinc.sh.  However, someone may run the fixincl executable
directly for debugging.

Good point.  With the formatting nit fixed, this is fine for the trunk.

Thanks,
jeff



Re: [PATCH 06/10] tree-object-size: Support dynamic sizes in conditions

2021-11-23 Thread Siddhesh Poyarekar

On 11/23/21 21:22, Jakub Jelinek wrote:

Evaluating __bdos in both passes is undesirable, certainly for the same
SSA_NAME, but even for different SSA_NAMEs, if everything is done in a
single pass it can easily share temporaries (object sizes for SSA_NAMEs it
uses), while if some __bdos is evaluated early and other late, we'll need to
hope further optimizations CSE those.


OK, then treat __bdos like __bos in objsz1, adding MIN/MAX for 
subobjects and full evaluation in objsz2?


Thanks,
Siddhesh


Re: [PATCH 06/10] tree-object-size: Support dynamic sizes in conditions

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 23, 2021 at 09:06:49PM +0530, Siddhesh Poyarekar wrote:
> On 11/23/21 20:42, Jakub Jelinek wrote:
> > On Wed, Nov 10, 2021 at 12:31:32AM +0530, Siddhesh Poyarekar wrote:
> > >   (object_sizes_execute): Don't insert min/max for dynamic sizes.
> > 
> > I'm worried about this.
> > I'd say what we might want to do is in the early pass for __bdos
> > compute actually __bos (i.e. the static one) and add MIN_EXPR/MAX_EXPR
> > for the result of the __bdos call from the second pass with the
> > statically computed value.
> > 
> > The reason for the MIN_EXPR/MAX_EXPR stuff is that GIMPLE optimizations
> > can remove exact ADDR_EXPRs with detailed COMPONENT_REF etc. access paths
> > in it, so during the late objsz2 pass the subobject modes don't work
> > reliably anymore.  But the subobject knowledge should be the same between
> > the static and dynamic evaluation...
> 
> So in the dynamic case we almost always end up with the right expression in
> objsz1, except in cases where late optimizations make available information
> that wasn't available earlier.  How about putting in a MIN_EXPR/MAX_EXPR if
> we *fail* to get the subobject size instead?

I don't think that is the case, perhaps for trivial testcases yes when early
inlining inlines the fortification always_inline functions and everything
appears in a single function.
The primary reason for objsz2 being done later is that it is after inlining,
IPA optimizations and some optimization passes that clean up after those.
But at the same time it is after too many optimizations that could have
broken the exact subobject details.
But very often in objsz1 you'll just see const char *p argument and only
inlining will reveal how that was allocated etc.

Evaluating __bdos in both passes is undesirable, certainly for the same
SSA_NAME, but even for different SSA_NAMEs, if everything is done in a
single pass it can easily share temporaries (object sizes for SSA_NAMEs it
uses), while if some __bdos is evaluated early and other late, we'll need to
hope further optimizations CSE those.

Jakub



Re: [PATCH 2/2][GCC] arm: Declare MVE types internally via pragma

2021-11-23 Thread Murray Steele via Gcc-patches
On 23/11/2021 14:16, Richard Earnshaw wrote:
> 
> 
> On 23/11/2021 09:37, Murray Steele wrote:
>> On 18/11/2021 15:45, Richard Earnshaw wrote:
>>
>>>
>>> This is mostly OK, but can't we reduce the number of tests somewhat? For 
>>> example, I think you can merge type_redef_13.c and type_redef_14.c by 
>>> writing
>>>
>>> /* { dg-do compile } */
>>> /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>>> /* { dg-add-options arm_v8_1m_mve } */
>>>
>>> int uint8x16x4_t; /* { dg-message "note: previous declaration of 
>>> 'uint8x16x4_t'" } */
>>> int uint16x8x2_t; /* { dg-message "note: previous declaration of 
>>> 'uint16x8x2_t'" } */
>>>
>>> #pragma GCC arm "arm_mve_types.h"  /* { dg-error {'uint8x16x4_t' 
>>> redeclared} } */
>>>    /* { dg-error {'uint16x8x2_t' redeclared} {target *-*-*} .-1 } */
>>>
>>> etc.  Note the second dg-error is anchored to the line above it (.-1).
>>>
>>> R.
>>
>> Thanks. I think if we'd like to reduce the number of tests, it would make 
>> the most
>> sense to merge the test cases in the way you've described based on their 
>> implementation
>> and target features. i.e.
>>
>> - type_redef_1.c : covers mve_pred16_t.
>> - type_redef_2.c : covers single-integer-vector types.
>> - type_redef_3.c : covers single-float-vector types.
>> - type_redef_4.c : covers integer-vector-tuple types.
>> - type_redef_5.c : covers float-vector-tuple types.
>>
>> The idea being that the test results for these tests should allow someone to 
>> triangulate
>> the cause of the failure. For example, if tests 4 and 5 fail, it is likely 
>> due to a
>> deficiency in the MVE tuple type implementation, rather than the handling of 
>> target-specific
>> features. More specific test failures can be determined by looking through 
>> log files.
>>
>> Thanks,
>> Murray
>>
> 
> Merged files will still have the same number of tests, and the same possible 
> test names, just from fewer source files.  So I don't think triangulation 
> will be an issue.

Ok. In that case we could merge all source files into one which covers all 
types.

Thanks,
Murray


[PATCH v2] fixincludes: don't abort() on access failure [PR103306]

2021-11-23 Thread Xi Ruoyao via Gcc-patches
[v2: format fix]

Some distros may ship dangling symlinks in include directories, which
trigger the access failure.  Skip such a file and continue to the next header
instead of panicking.

Restore to old behavior before r12-5234 but without resurrecting the
problematic getcwd() call, by using the environment variable "INPUT"
exported by fixinc.sh.

Tested on x86_64-linux-gnu, with a dangling symlink intentionally
injected into /usr/include.

fixincludes/

PR bootstrap/103306
* fixincl.c (process): Don't call abort().
---
 fixincludes/fixincl.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
index a17b65866c3..92909baf85f 100644
--- a/fixincludes/fixincl.c
+++ b/fixincludes/fixincl.c
@@ -1352,10 +1352,19 @@ process (void)
 
   if (access (pz_curr_file, R_OK) != 0)
 {
-  /* Some really strange error happened.  */
-  fprintf (stderr, "Cannot access %s: %s\n", pz_curr_file,
+  /* It may happens if for e. g. the distro ships some broken symlinks
+in /usr/include.  */
+
+  /* "INPUT" is exported in fixinc.sh, which is the pwd where fixincl
+runs.  It's used instead of getcwd to avoid allocating a buffer
+with unknown length.  */
+  const char *cwd = getenv ("INPUT");
+  if (!cwd)
+   cwd = "the working directory";
+
+  fprintf (stderr, "Cannot access %s from %s: %s\n", pz_curr_file, cwd,
   xstrerror (errno));
-  abort ();
+  return;
 }
 
   pz_curr_data = load_file (pz_curr_file);
-- 
2.34.0
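
The behaviour the patch aims for (report the unreadable file, name the directory from the INPUT environment variable when it is set, and carry on) can be sketched as a standalone helper. This is only an illustration for a POSIX system: readable_or_report is a hypothetical name, and the sketch uses strerror where fixincl.c uses xstrerror.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Hypothetical helper, not part of fixincl.c: report an unreadable file
// and tell the caller to skip it instead of aborting.
static bool
readable_or_report (const char *path)
{
  if (access (path, R_OK) == 0)
    return true;

  // Save errno before any other library call can overwrite it.
  int saved_errno = errno;

  // INPUT is only set when fixinc.sh drives the run, so fall back to a
  // generic description when the program is started by hand.
  const char *cwd = std::getenv ("INPUT");
  if (!cwd)
    cwd = "the working directory";

  std::fprintf (stderr, "Cannot access %s from %s: %s\n", path, cwd,
		std::strerror (saved_errno));
  return false;
}
```

A caller would skip the current header when the helper returns false, which mirrors the early return in the patch.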




Re: [PATCH] libcpp: Use [[likely]] conditionally

2021-11-23 Thread Jeff Law via Gcc-patches




On 11/23/2021 8:26 AM, Christophe LYON via Gcc-patches wrote:

Hi!

On 23/11/2021 01:26, Jeff Law via Gcc-patches wrote:



On 11/22/2021 10:22 AM, Marek Polacek via Gcc-patches wrote:

Let's hide [[likely]] behind a macro, to suppress warnings if the
compiler doesn't support it.

Co-authored-by: Jonathan Wakely 

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR preprocessor/103355

libcpp/ChangeLog:

* lex.c: Use ATTR_LIKELY instead of [[likely]].
* system.h (ATTR_LIKELY): Define.

OK
jeff



This patch breaks the build when the host compiler is gcc-4.8.5, 
because __has_cpp_attribute is not defined.

Sigh.  I'd like to move to a more recent prereq if we could.




Is this small patch OK with a proper ChangeLog?

Yes.  Sorry about the breakage.
jeff




Re: [PATCH 06/10] tree-object-size: Support dynamic sizes in conditions

2021-11-23 Thread Siddhesh Poyarekar

On 11/23/21 21:06, Siddhesh Poyarekar wrote:

On 11/23/21 20:42, Jakub Jelinek wrote:

On Wed, Nov 10, 2021 at 12:31:32AM +0530, Siddhesh Poyarekar wrote:

(object_sizes_execute): Don't insert min/max for dynamic sizes.


I'm worried about this.
I'd say what we might want to do is in the early pass for __bdos
compute actually __bos (i.e. the static one) and add MIN_EXPR/MAX_EXPR
for the result of the __bdos call from the second pass with the
statically computed value.

The reason for the MIN_EXPR/MAX_EXPR stuff is that GIMPLE optimizations
can remove exact ADDR_EXPRs with detailed COMPONENT_REF etc. access paths
in it, so during the late objsz2 pass the subobject modes don't work
reliably anymore.  But the subobject knowledge should be the same between
the static and dynamic evaluation...


So in the dynamic case we almost always end up with the right expression 
in objsz1, except in cases where late optimizations make available 
information that wasn't available earlier.  How about putting in a 
MIN_EXPR/MAX_EXPR if we *fail* to get the subobject size instead?


Actually if we don't get a dynamic expression it's unlikely that we'll 
get a static size either, so I'm not sure if MIN_EXPR/MAX_EXPR will 
actually do anything useful.


Siddhesh


Re: [PATCH] libcpp: Use [[likely]] conditionally

2021-11-23 Thread Marek Polacek via Gcc-patches
On Tue, Nov 23, 2021 at 04:26:19PM +0100, Christophe LYON via Gcc-patches wrote:
> Hi!
> 
> On 23/11/2021 01:26, Jeff Law via Gcc-patches wrote:
> > 
> > 
> > On 11/22/2021 10:22 AM, Marek Polacek via Gcc-patches wrote:
> > > Let's hide [[likely]] behind a macro, to suppress warnings if the
> > > compiler doesn't support it.
> > > 
> > > Co-authored-by: Jonathan Wakely 
> > > 
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > > PR preprocessor/103355
> > > 
> > > libcpp/ChangeLog:
> > > 
> > > * lex.c: Use ATTR_LIKELY instead of [[likely]].
> > > * system.h (ATTR_LIKELY): Define.
> > OK
> > jeff
> 
> 
> This patch breaks the build when the host compiler is gcc-4.8.5, because
> __has_cpp_attribute is not defined.
 
Ah, of course.

> Is this small patch OK with a proper ChangeLog?

Yes, please.

> diff --git a/libcpp/system.h b/libcpp/system.h
> index f6fc583ab80..b78ab813d2f 100644
> --- a/libcpp/system.h
> +++ b/libcpp/system.h
> @@ -430,6 +430,8 @@ extern void fancy_abort (const char *, int, const char
> *) ATTRIBUTE_NORETURN;
>  # else
>  #  define ATTR_LIKELY
>  # endif
> +#else
> +# define ATTR_LIKELY
>  #endif
> 
>  /* Poison identifiers we do not want to use.  */
> 
> 
> Thanks,
> 
> 
> Christophe
> 
> 
> 

Marek
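
The fix amounts to giving the outer #ifdef an #else branch, so that ATTR_LIKELY is always defined even when the host compiler does not know __has_cpp_attribute. A minimal sketch of that shape follows; the outer #ifdef and the first #if are assumed from the visible part of the diff rather than copied from system.h, and classify is a hypothetical function that only shows where the macro is placed.

```cpp
// The nested #if must only be evaluated by a preprocessor that
// understands __has_cpp_attribute; older ones take the outer #else.
#ifdef __has_cpp_attribute
# if __has_cpp_attribute(likely)
#  define ATTR_LIKELY [[likely]]
# else
#  define ATTR_LIKELY
# endif
#else
# define ATTR_LIKELY
#endif

// Hypothetical function showing where the macro goes.
static int
classify (int c)
{
  if (c >= 'a' && c <= 'z') ATTR_LIKELY
    return 1;
  return 0;
}
```

When the attribute is unavailable the macro expands to nothing, so the code compiles unchanged and only the branch hint is lost.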



Re: [PATCH 06/10] tree-object-size: Support dynamic sizes in conditions

2021-11-23 Thread Siddhesh Poyarekar

On 11/23/21 20:42, Jakub Jelinek wrote:

On Wed, Nov 10, 2021 at 12:31:32AM +0530, Siddhesh Poyarekar wrote:

(object_sizes_execute): Don't insert min/max for dynamic sizes.


I'm worried about this.
I'd say what we might want to do is in the early pass for __bdos
compute actually __bos (i.e. the static one) and add MIN_EXPR/MAX_EXPR
for the result of the __bdos call from the second pass with the
statically computed value.

The reason for the MIN_EXPR/MAX_EXPR stuff is that GIMPLE optimizations
can remove exact ADDR_EXPRs with detailed COMPONENT_REF etc. access paths
in it, so during the late objsz2 pass the subobject modes don't work
reliably anymore.  But the subobject knowledge should be the same between
the static and dynamic evaluation...


So in the dynamic case we almost always end up with the right expression 
in objsz1, except in cases where late optimizations make available 
information that wasn't available earlier.  How about putting in a 
MIN_EXPR/MAX_EXPR if we *fail* to get the subobject size instead?


Siddhesh


Reduce size of modref_access_tree

2021-11-23 Thread Jan Hubicka via Gcc-patches
Hi,

Modref tree template stores its own copy of param_modref_max_bases, *_max_refs
and *_max_accesses values.  This was done before we had per-function limits and
even back then it was a bit dubious, so this patch removes it.

Bootstrapped/regtested x86_64-linux, will commit it shortly.
Honza

gcc/ChangeLog:

* ipa-modref-tree.h (struct modref_tree): Remove max_bases, max_refs
and max_accesses.
(modref_tree::modref_tree): Remove parameter.
(modref_tree::insert_base): Add max_bases parameter.
(modref_tree::insert): Add max_bases, max_refs, max_accesses
parameters.
(modref_tree::insert): New member function.
(modref_tree::merge): Add max_bases, max_refs, max_accesses
parameters.
(modref_tree::insert): New member function.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
* ipa-modref.c (dump_records): Don't dump max_refs and max_bases.
(dump_lto_records): Likewise.
(modref_summary::finalize): Fix whitespace.
(get_modref_function_summary): Likewise.
(modref_access_analysis::record_access): Update.
(modref_access_analysis::record_access_lto): Update.
(modref_access_analysis::process_fnspec): Update.
(analyze_function): Update.
(modref_summaries::duplicate): Update.
(modref_summaries_lto::duplicate): Update.
(write_modref_records): Update.
(read_modref_records): Update.
(read_section): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
(ipa_merge_modref_summary_after_inlining): Update.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index e23d88d7fc0..0671fa76199 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -874,11 +874,11 @@ test_insert_search_collapse ()
   modref_ref_node *ref_node;
   modref_access_node a = unspecified_modref_access_node;
 
-  modref_tree *t = new modref_tree(1, 2, 2);
+  modref_tree *t = new modref_tree();
   ASSERT_FALSE (t->every_base);
 
   /* Insert into an empty tree.  */
-  t->insert (1, 2, a, false);
+  t->insert (1, 2, 2, 1, 2, a, false);
   ASSERT_NE (t->bases, NULL);
   ASSERT_EQ (t->bases->length (), 1);
   ASSERT_FALSE (t->every_base);
@@ -896,7 +896,7 @@ test_insert_search_collapse ()
   ASSERT_EQ (ref_node->ref, 2);
 
   /* Insert when base exists but ref does not.  */
-  t->insert (1, 3, a, false);
+  t->insert (1, 2, 2, 1, 3, a, false);
   ASSERT_NE (t->bases, NULL);
   ASSERT_EQ (t->bases->length (), 1);
   ASSERT_EQ (t->search (1), base_node);
@@ -909,7 +909,7 @@ test_insert_search_collapse ()
 
   /* Insert when base and ref exist, but access is not dominated by nor
  dominates other accesses.  */
-  t->insert (1, 2, a, false);
+  t->insert (1, 2, 2, 1, 2, a, false);
   ASSERT_EQ (t->bases->length (), 1);
   ASSERT_EQ (t->search (1), base_node);
 
@@ -917,12 +917,12 @@ test_insert_search_collapse ()
   ASSERT_NE (ref_node, NULL);
 
   /* Insert when base and ref exist and access is dominated.  */
-  t->insert (1, 2, a, false);
+  t->insert (1, 2, 2, 1, 2, a, false);
   ASSERT_EQ (t->search (1), base_node);
   ASSERT_EQ (base_node->search (2), ref_node);
 
   /* Insert ref to trigger ref list collapse for base 1.  */
-  t->insert (1, 4, a, false);
+  t->insert (1, 2, 2, 1, 4, a, false);
   ASSERT_EQ (t->search (1), base_node);
   ASSERT_EQ (base_node->refs, NULL);
   ASSERT_EQ (base_node->search (2), NULL);
@@ -930,7 +930,7 @@ test_insert_search_collapse ()
   ASSERT_TRUE (base_node->every_ref);
 
   /* Further inserts to collapsed ref list are ignored.  */
-  t->insert (1, 5, a, false);
+  t->insert (1, 2, 2, 1, 5, a, false);
   ASSERT_EQ (t->search (1), base_node);
   ASSERT_EQ (base_node->refs, NULL);
   ASSERT_EQ (base_node->search (2), NULL);
@@ -938,13 +938,13 @@ test_insert_search_collapse ()
   ASSERT_TRUE (base_node->every_ref);
 
   /* Insert base to trigger base list collapse.  */
-  t->insert (5, 0, a, false);
+  t->insert (1, 2, 2, 5, 0, a, false);
   ASSERT_TRUE (t->every_base);
   ASSERT_EQ (t->bases, NULL);
   ASSERT_EQ (t->search (1), NULL);
 
   /* Further inserts to collapsed base list are ignored.  */
-  t->insert (7, 8, a, false);
+  t->insert (1, 2, 2, 7, 8, a, false);
   ASSERT_TRUE (t->every_base);
   ASSERT_EQ (t->bases, NULL);
   ASSERT_EQ (t->search (1), NULL);
@@ -959,23 +959,23 @@ test_merge ()
   modref_base_node *base_node;
   modref_access_node a = unspecified_modref_access_node;
 
-  t1 = new modref_tree(3, 4, 1);
-  t1->insert (1, 1, a, false);
-  t1->insert (1, 2, a, false);
-  t1->insert (1, 3, a, false);
-  t1->insert (2, 1, a, false);
-  t1->insert (3, 1, a, false);
-
-  t2 = new modref_tree(10, 10, 10);
-  t2->insert (1, 2, a, false);
-  t2->insert (1, 3, a, false);
-  t2->insert (1, 4, a, false);
-  t2->insert (3, 2, a, false);
-  t2->insert (3, 3, a, false);
-  t2->insert (3, 4, a, false);
-  t2->insert (3, 5, a, false);

Re: [PATCH] libcpp: Use [[likely]] conditionally

2021-11-23 Thread Christophe LYON via Gcc-patches

Hi!

On 23/11/2021 01:26, Jeff Law via Gcc-patches wrote:



On 11/22/2021 10:22 AM, Marek Polacek via Gcc-patches wrote:

Let's hide [[likely]] behind a macro, to suppress warnings if the
compiler doesn't support it.

Co-authored-by: Jonathan Wakely 

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR preprocessor/103355

libcpp/ChangeLog:

* lex.c: Use ATTR_LIKELY instead of [[likely]].
* system.h (ATTR_LIKELY): Define.

OK
jeff



This patch breaks the build when the host compiler is gcc-4.8.5, because 
__has_cpp_attribute is not defined.


Is this small patch OK with a proper ChangeLog?


diff --git a/libcpp/system.h b/libcpp/system.h
index f6fc583ab80..b78ab813d2f 100644
--- a/libcpp/system.h
+++ b/libcpp/system.h
@@ -430,6 +430,8 @@ extern void fancy_abort (const char *, int, const 
char *) ATTRIBUTE_NORETURN;

 # else
 #  define ATTR_LIKELY
 # endif
+#else
+# define ATTR_LIKELY
 #endif

 /* Poison identifiers we do not want to use.  */


Thanks,


Christophe





RE: [PATCH]AArch64 Optimize right shift rounding narrowing

2021-11-23 Thread Tamar Christina via Gcc-patches
Adding ML back in. ☹

> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, November 23, 2021 3:17 PM
> To: Tamar Christina 
> Cc: Richard Earnshaw ; nd ;
> Richard Sandiford ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: RE: [PATCH]AArch64 Optimize right shift rounding narrowing
> 
> Ping.
> 
> > -Original Message-
> > From: Gcc-patches  > bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> > Christina via Gcc-patches
> > Sent: Friday, November 12, 2021 12:08 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Earnshaw ; nd ;
> > Richard Sandiford ; Marcus Shawcroft
> > 
> > Subject: [PATCH]AArch64 Optimize right shift rounding narrowing
> >
> > Hi All,
> >
> > This optimizes right shift rounding narrow instructions to rounding
> > add narrow high where one vector is 0 when the shift amount is half
> > that of the original input type.
> >
> > i.e.
> >
> > uint32x4_t foo (uint64x2_t a, uint64x2_t b) {
> >   return vrshrn_high_n_u64 (vrshrn_n_u64 (a, 32), b, 32); }
> >
> > now generates:
> >
> > foo:
> > movi    v3.4s, 0
> > raddhn  v0.2s, v2.2d, v3.2d
> > raddhn2 v0.4s, v2.2d, v3.2d
> >
> > instead of:
> >
> > foo:
> > rshrn   v0.2s, v0.2d, 32
> > rshrn2  v0.4s, v1.2d, 32
> > ret
> >
> > On Arm cores this is an improvement in both latency and throughput.
> > Because a vector zero is needed I created a new method
> > aarch64_gen_shareable_zero that creates zeros using V4SI and then
> > takes a subreg of the zero to the desired type.  This allows CSE to
> > share all the zero constants.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-protos.h (aarch64_gen_shareable_zero):
> > New.
> > * config/aarch64/aarch64-simd.md (aarch64_rshrn,
> > aarch64_rshrn2):
> > * config/aarch64/aarch64.c (aarch64_gen_shareable_zero): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/advsimd-intrinsics/shrn-1.c: New test.
> > * gcc.target/aarch64/advsimd-intrinsics/shrn-2.c: New test.
> > * gcc.target/aarch64/advsimd-intrinsics/shrn-3.c: New test.
> > * gcc.target/aarch64/advsimd-intrinsics/shrn-4.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64-protos.h
> > b/gcc/config/aarch64/aarch64-protos.h
> > index
> >
> f7887d06139f01c1591c4e755538d94e5e608a52..f7f5cae82bc9198e54d0298f25f
> > 7c0f5902d5fb1 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -846,6 +846,7 @@ const char *aarch64_output_move_struct (rtx
> > *operands);  rtx aarch64_return_addr_rtx (void);  rtx
> > aarch64_return_addr (int, rtx);  rtx aarch64_simd_gen_const_vector_dup
> > (machine_mode, HOST_WIDE_INT);
> > +rtx aarch64_gen_shareable_zero (machine_mode);
> >  bool aarch64_simd_mem_operand_p (rtx);  bool
> > aarch64_sve_ld1r_operand_p (rtx);  bool aarch64_sve_ld1rq_operand_p
> > (rtx); diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64- simd.md index
> >
> c71658e2bf52b26bf9fc9fa702dd5446447f4d43..d7f8694add540e32628893a7b7
> > 471c08de6f760f 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1956,20 +1956,32 @@ (define_expand "aarch64_rshrn"
> > (match_operand:SI 2 "aarch64_simd_shift_imm_offset_")]
> >"TARGET_SIMD"
> >{
> > -operands[2] = aarch64_simd_gen_const_vector_dup (mode,
> > -INTVAL (operands[2]));
> > -rtx tmp = gen_reg_rtx (mode);
> > -if (BYTES_BIG_ENDIAN)
> > -  emit_insn (gen_aarch64_rshrn_insn_be (tmp, operands[1],
> > -   operands[2], CONST0_RTX
> > (mode)));
> > +if (INTVAL (operands[2]) == GET_MODE_UNIT_BITSIZE
> > (mode))
> > +  {
> > +   rtx tmp0 = aarch64_gen_shareable_zero (mode);
> > +   emit_insn (gen_aarch64_raddhn (operands[0], operands[1],
> > tmp0));
> > +  }
> >  else
> > -  emit_insn (gen_aarch64_rshrn_insn_le (tmp, operands[1],
> > -   operands[2], CONST0_RTX
> > (mode)));
> > -
> > -/* The intrinsic expects a narrow result, so emit a subreg that will 
> > get
> > -   optimized away as appropriate.  */
> > -emit_move_insn (operands[0], lowpart_subreg (mode,
> > tmp,
> > -mode));
> > +  {
> > +   rtx tmp = gen_reg_rtx (mode);
> > +   operands[2] = aarch64_simd_gen_const_vector_dup
> > (mode,
> > +INTVAL (operands[2]));
> > +   if (BYTES_BIG_ENDIAN)
> > + emit_insn (
> > +   gen_aarch64_rshrn_insn_be (tmp, operands[1],
> > +operands[2],
> > +CONST0_RTX
> > (mode)));
> > +   else
> > + emit_insn (
> > +   gen_aarch64_rshrn_insn_le 

Re: [PATCH] Loop unswitching: support gswitch statements.

2021-11-23 Thread Martin Liška

On 11/23/21 14:58, Richard Biener wrote:

On Mon, Nov 22, 2021 at 4:07 PM Martin Liška  wrote:


On 11/19/21 11:00, Richard Biener wrote:

On Tue, Nov 16, 2021 at 3:40 PM Martin Liška  wrote:


On 11/11/21 08:15, Richard Biener wrote:

So I'd try to do no functional change first, improving the costing and
setting up the transform to simply pick up the stmts to "fold" as discovered
during analysis (as I hinted you possibly can use gimple_uid to mark
the stmts that simplify, IIRC gimple_uid is preserved during copying.
gimple_uid would also scale better than gimple_plf in case we do
the analysis for all candidates at once).


Thinking about the analysis. Am I correct that we want to properly calculate
loop size for true and false edge of a potential gcond before the actual 
unswitching?


Yes.


We can do that by finding a first gcond candidate, evaluate (symbolic + irange 
approach)
all other gcond in the loop body and use BB_REACHABLE discovery. Similarly to 
what we do now
at lines 378-446. Then tree_num_loop_insns can be adjusted for only these 
reachable blocks.
Having that, we can calculate # of insns that will live in true/false loops.


So whatever we do here we should record as "this control stmt folds to
{true,false}" (or {true,unknown},
or in future, "this control stmt will lead to edge {e,unknown}"),
recording the simplification
on the true/false loop version in a way we can apply it after the transform.


Then we can call tree_unswitch_loop and make the gcond folding as we do in the 
versioned loops.

Is it a step in good direction? Having that we can then extend it to gswitch 
statements.


One issue I see is that BB_REACHABLE is there only once but you could use
auto_bb_flag reachable_true, reachable_false to distinguish the
true/false loop version
copies.

So yes, I think that sounds reasonable.  At the point we want to
evaluate different
(first) unswitching opportunities against each other storing this only
as BB flag is
likely to hit limits.  When we want to evaluate multiple levels of
unswitching before
doing any transforms even more so (if there are 3 opportunities there'd be
many cases to be considered when going to level 3 ;)).  I _think_ that a sparse
lattice of stmt UID -> edge might do the trick if we change tree_num_loop_insns
to do a DFS walk from the loop header, ignoring not taken edges by
consulting the
lattice.  Or, for speed reason, pre-compute tree_num_loop_insns for each BB
so we just have to sum a different set of BBs rather than walking all
stmts again.
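
A minimal sketch of the "sum a different set of BBs" idea, not taken from the patch. It assumes a toy CFG where each block carries a precomputed insn count, and it keys the sparse lattice by block index rather than by stmt UID to keep the example short. The names Block, Lattice and reachable_insns are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a basic block: a precomputed insn count
// plus successor indices.  For a block ending in a two-way condition,
// succs[0] is the true edge and succs[1] is the false edge.
struct Block
{
  unsigned insns;
  std::vector<std::size_t> succs;
};

// Sparse lattice: block index -> index into succs of the only edge
// that is taken.  Blocks that are absent have an unknown outcome.
using Lattice = std::map<std::size_t, std::size_t>;

// Sum the insn counts of all blocks reachable from HEADER, following
// only the known edge out of blocks recorded in KNOWN.
unsigned
reachable_insns (const std::vector<Block> &cfg, std::size_t header,
                 const Lattice &known)
{
  assert (header < cfg.size ());
  std::vector<bool> visited (cfg.size (), false);
  std::vector<std::size_t> worklist { header };
  unsigned total = 0;

  while (!worklist.empty ())
    {
      std::size_t bb = worklist.back ();
      worklist.pop_back ();
      if (visited[bb])
        continue;
      visited[bb] = true;
      total += cfg[bb].insns;

      auto it = known.find (bb);
      if (it != known.end ())
        // The condition folds: only one outgoing edge is live.
        worklist.push_back (cfg[bb].succs[it->second]);
      else
        for (std::size_t s : cfg[bb].succs)
          worklist.push_back (s);
    }
  return total;
}
```

With one lattice per loop version (true and false), the same per-block counts can be summed twice without walking the statements again.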

That said, the second step would definitely be to choose the "best" opportunity
on the current level.

Richard.


Cheers,
Martin


Hello.

I'm sending a new version where I changed:
1) all unswitch_predicates are found for a loop
2) context sensitive costing happens based on an unswitch_predicate and BB 
reachability
 is implemented
3) folding happens in recursive invocation once we decide to unswitch
4) the patch folds both symbolic gcond predicates and irange provided by ranger
5) debug counter was added

Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Plus, I 
tested it
on SPEC2006 and SPEC2017 with -size=ref.


Meh, diff made a mess out of this ;)  Random comments, I'm walking
myself the optimizations
flow.


Sure.



tree_unswitch_single_loop:

+  unswitch_predicate *predicate = NULL;
+  if (num > param_max_unswitch_level)
+{
+  if (dump_file
+ && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, ";; Not unswitching anymore, hit max level\n");
+  goto exit;
+}

this looks like we can do this check before find_all_unswitching_predicates?


Makes sense.



+  for (auto pred: candidates)
+{
+  unsigned cost
+   = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred);
...

so this searches for the first candidate that fits in
param_max_unswitch_insns, it doesn't
yet try to find the cheapest / best one.  Please add a comment to say
that.  After we
found one candidate we apply unswitching to such one candidate (and throw the
others away).  I guess that's OK - it's what the old code did - what
you did for this
intermediate step is actually gather all unswitching predicates
upfront.  Hopefully
we'll be able to share some of the work done for the recursive invocations.

+ fprintf (dump_file, ";; Unswitching loop with condition: ");

"on condition"

+ fprintf (dump_file, ";; Not unswitching condition, loop too big "
+  "(%d insns): ", cost);

"cost too big"?  I assume 'cost' is the number of stmts we'll add,
loop-size - true-eliminated - false-eliminated?


I'm going to adjust this.



+exit:
+  for (auto predicate: candidates)
+delete predicate;

Some refactoring should get rid of the goto ...


You don't like it? It seems to me quite logical as one does not have to repeat
a clean up code before each return statement.
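
For comparison, a sketch of one way the cleanup could be done without the label, assuming the candidates can be owned by std::unique_ptr. This is not what the patch does, and GCC's own vec<>/auto_vec types may suggest a different shape. The names predicate_sketch and try_unswitch_sketch are hypothetical, and the live counter only exists to show that nothing leaks on any return path.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for unswitch_predicate.
struct predicate_sketch
{
  static int live;   // number of objects currently alive
  unsigned cost;
  explicit predicate_sketch (unsigned c) : cost (c) { ++live; }
  ~predicate_sketch () { --live; }
};
int predicate_sketch::live = 0;

// Returns the index of the first candidate whose cost fits in
// MAX_INSNS, or -1 if we give up.  Every return path releases the
// candidates because the vector owns them.
int
try_unswitch_sketch (const std::vector<unsigned> &costs, unsigned level,
                     unsigned max_level, unsigned max_insns)
{
  // The level check needs no candidates, so do it first.
  if (level > max_level)
    return -1;

  std::vector<std::unique_ptr<predicate_sketch>> candidates;
  for (unsigned c : costs)
    candidates.push_back (std::make_unique<predicate_sketch> (c));

  for (std::size_t i = 0; i < candidates.size (); ++i)
    if (candidates[i]->cost <= max_insns)
      return static_cast<int> (i);

  return -1;
}
```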



+static unsigned
+evaluate_insns (class loop *loop,  basic_block *bbs,
+   

Re: [PATCH 06/10] tree-object-size: Support dynamic sizes in conditions

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 10, 2021 at 12:31:32AM +0530, Siddhesh Poyarekar wrote:
>   (object_sizes_execute): Don't insert min/max for dynamic sizes.

I'm worried about this.
I'd say what we might want to do is in the early pass for __bdos
compute actually __bos (i.e. the static one) and add MIN_EXPR/MAX_EXPR
for the result of the __bdos call from the second pass with the
statically computed value.

The reason for the MIN_EXPR/MAX_EXPR stuff is that GIMPLE optimizations
can remove exact ADDR_EXPRs with detailed COMPONENT_REF etc. access paths
in it, so during the late objsz2 pass the subobject modes don't work
reliably anymore.  But the subobject knowledge should be the same between
the static and dynamic evaluation...
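
To illustrate the combination (a sketch only, with a made-up helper name combine_sizes): for the maximum modes the tighter bound is the smaller of the two values, for the minimum modes it is the larger one. The only assumption is the documented __builtin_object_size encoding, where bit 1 of the type selects the minimum variant and "unknown" is (size_t) -1 for types 0/1 and 0 for types 2/3.

```cpp
#include <algorithm>
#include <cstddef>

// Bit 1 of the object size type selects the "minimum" variants
// (types 2 and 3), as documented for __builtin_object_size.
constexpr int minimum_bit = 2;

// Combine the statically computed size from the early pass with the
// value the dynamic computation yields later.  MIN for maximum
// estimates, MAX for minimum estimates.
std::size_t
combine_sizes (int object_size_type, std::size_t static_size,
               std::size_t dynamic_size)
{
  return (object_size_type & minimum_bit)
         ? std::max (static_size, dynamic_size)
         : std::min (static_size, dynamic_size);
}
```

An "unknown" result on either side is neutral under this scheme: (size_t) -1 never wins a MIN and 0 never wins a MAX.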

Jakub



Re: [PATCH 04/10] tree-object-size: Single pass dependency loop resolution

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 23, 2021 at 07:14:04PM +0530, Siddhesh Poyarekar wrote:
> > This feels way too risky to me.  I think making some code do something
> > different between (x & OST_DYNAMIC) == 0 and == 1 is just fine,
> > it doesn't have to share everything.  After all, for __bdos we actually
> > emit code to compute it at runtime, while for __bos we don't.
> > So I'd keep e.g. .pass = 0, .pass = 1 and .pass = 2 (in a loop) in
> > compute_builtin_object_size for non-OST_DYNAMIC and only use your new
> > stuff for __bdos.
> > E.g. creating new SSA_NAMEs for __bos doesn't look like a good idea to me.
> > The GCC __bos testsuite is not big enough that it covers everything and
> > even in it we sometimes reexamine 2, 3 or 4 times.
> 
> OK, so addr_object_size does not participate in dependency loops, so I can
> keep its changes intact and simply add an INTEGER_CST check at the end to
> return unknown if the size expression is not constant; we kinda do that for
> object offsets anyway.
> 
> I could then make a new entry point collect_object_sizes_for (renaming the
> current one to collect_static_object_sizes_for and a new one
> collect_dynamic_object_sizes_for, routing based on object_size_type &
> OST_DYNAMIC) which sends the object size collection down different paths.  I
> can reuse the object size vectors and just have different data in them.

I thought you'd in compute_builtin_object_size do something like:
  osi.depths = NULL;
  osi.stack = NULL;
  osi.tos = NULL;

  if (object_size_type & OST_DYNAMIC)
{
  osi.tempsize_objs.create (0);
  // All the new stuff in there
  goto done;
}

  /* First pass: walk UD chains, compute object sizes that
 can be computed.  osi.reexamine bitmap at the end will
 contain what variables were found in dependency cycles
 and therefore need to be reexamined.  */
  osi.pass = 0;
  osi.changed = false;
  // Current code

 done:
  BITMAP_FREE (osi.reexamine);
  BITMAP_FREE (osi.visited);
}
or so.  The new routine you've added used just for OST_DYNAMIC, keep the
old ones like check_for_plus_in_loops{,_1}, and for the likes of
cond_expr_object_size tweak it so that it can be used in both modes.

Jakub



Re: [PATCH 2/2][GCC] arm: Declare MVE types internally via pragma

2021-11-23 Thread Richard Earnshaw via Gcc-patches




On 23/11/2021 09:37, Murray Steele wrote:

On 18/11/2021 15:45, Richard Earnshaw wrote:



This is mostly OK, but can't we reduce the number of tests somewhat? For 
example, I think you can merge type_redef_13.c and type_redef_14.c by writing

/* { dg-do compile } */
/* { dg-require-effective-target arm_v8_1m_mve_ok } */
/* { dg-add-options arm_v8_1m_mve } */

int uint8x16x4_t; /* { dg-message "note: previous declaration of 'uint8x16x4_t'" } */
int uint16x8x2_t; /* { dg-message "note: previous declaration of 'uint16x8x2_t'" } */

#pragma GCC arm "arm_mve_types.h"  /* { dg-error {'uint8x16x4_t' redeclared} } */
   /* { dg-error {'uint16x8x2_t' redeclared} {target *-*-*} .-1 } */

etc.  Note the second dg-error is anchored to the line above it (.-1).

R.


Thanks. I think if we'd like to reduce the number of tests, it would make the 
most
sense to merge the test cases in the way you've described based on their 
implementation
and target features. i.e.

- type_redef_1.c : covers mve_pred16_t.
- type_redef_2.c : covers single-integer-vector types.
- type_redef_3.c : covers single-float-vector types.
- type_redef_4.c : covers integer-vector-tuple types.
- type_redef_5.c : covers float-vector-tuple types.

The idea being that the test results for these tests should allow someone to
triangulate the cause of the failure. For example, if tests 4 and 5 fail, it is
likely due to a deficiency in the MVE tuple type implementation, rather than the
handling of target-specific features. More specific test failures can be
determined by looking through log files.

Thanks,
Murray



Merged files will still have the same number of tests, and the same 
possible test names, just from fewer source files.  So I don't think 
triangulation will be an issue.


Re: [PATCH 05/10] __builtin_dynamic_object_size: Recognize builtin

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 23, 2021 at 07:23:01PM +0530, Siddhesh Poyarekar wrote:
> > What's the advantage of another argument and then merging it with
> > object_size_type over just passing object_size_type which will have
> > all the bits in?
> 
> I kept the size bits as an internal detail, I can define them in
> tree-object-size.h and have builtins.c (and others) use them.

Good idea.

Jakub



Re: [PATCH] Loop unswitching: support gswitch statements.

2021-11-23 Thread Richard Biener via Gcc-patches
On Mon, Nov 22, 2021 at 4:07 PM Martin Liška  wrote:
>
> On 11/19/21 11:00, Richard Biener wrote:
> > On Tue, Nov 16, 2021 at 3:40 PM Martin Liška  wrote:
> >>
> >> On 11/11/21 08:15, Richard Biener wrote:
> >>> So I'd try to do no functional change first, improving the costing and
> >>> setting up the transform to simply pick up the stmts to "fold" as 
> >>> discovered
> >>> during analysis (as I hinted you possibly can use gimple_uid to mark
> >>> the stmts that simplify, IIRC gimple_uid is preserved during copying.
> >>> gimple_uid would also scale better than gimple_plf in case we do
> >>> the analysis for all candidates at once).
> >>
> >> Thinking about the analysis. Am I correct that we want to properly 
> >> calculate
> >> loop size for true and false edge of a potential gcond before the actual 
> >> unswitching?
> >
> > Yes.
> >
> >> We can do that by finding a first gcond candidate, evaluate (symbolic + 
> >> irange approach)
> >> all other gcond in the loop body and use BB_REACHABLE discovery. Similarly 
> >> to what we do now
> >> at lines 378-446. Then tree_num_loop_insns can be adjusted for only these 
> >> reachable blocks.
> >> Having that, we can calculate # of insns that will live in true/false 
> >> loops.
> >
> > So whatever we do here we should record as "this control stmt folds to
> > {true,false}" (or {true,unknown},
> > or in future, "this control stmt will lead to edge {e,unknown}"),
> > recording the simplification
> > on the true/false loop version in a way we can apply it after the transform.
> >
> >> Then we can call tree_unswitch_loop and make the gcond folding as we do in 
> >> the versioned loops.
> >>
> >> Is it a step in good direction? Having that we can then extend it to 
> >> gswitch statements.
> >
> > One issue I see is that BB_REACHABLE is there only once but you could use
> > auto_bb_flag reachable_true, reachable_false to distinguish the
> > true/false loop version
> > copies.
> >
> > So yes, I think that sounds reasonable.  At the point we want to
> > evaluate different
> > (first) unswitching opportunities against each other storing this only
> > as BB flag is
> > likely to hit limits.  When we want to evaluate multiple levels of
> > unswitching before
> > doing any transforms even more so (if there are 3 opportunities there'd be
> > many cases to be considered when going to level 3 ;)).  I _think_ that a 
> > sparse
> > lattice of stmt UID -> edge might do the trick if we change 
> > tree_num_loop_insns
> > to do a DFS walk from the loop header, ignoring not taken edges by
> > consulting the
> > lattice.  Or, for speed reason, pre-compute tree_num_loop_insns for each BB
> > so we just have to sum a different set of BBs rather than walking all
> > stmts again.
> >
> > That said, the second step would definitely be to choose the "best" 
> > opportunity
> > on the current level.
> >
> > Richard.
> >
> >> Cheers,
> >> Martin
>
> Hello.
>
> I'm sending a new version where I changed:
> 1) all unswitch_predicates are found for a loop
> 2) context sensitive costing happens based on an unswitch_predicate and BB 
> reachability
> is implemented
> 3) folding happens in recursive invocation once we decide to unswitch
> 4) the patch folds both symbolic gcond predicates and irange provided by 
> ranger
> 5) debug counter was added
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Plus, 
> I tested it
> on SPEC2006 and SPEC2017 with -size=ref.

Meh, diff made a mess out of this ;)  Random comments, I'm walking
myself the optimizations
flow.

tree_unswitch_single_loop:

+  unswitch_predicate *predicate = NULL;
+  if (num > param_max_unswitch_level)
+{
+  if (dump_file
+ && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, ";; Not unswitching anymore, hit max level\n");
+  goto exit;
+}

this looks like we can do this check before find_all_unswitching_predicates?

+  for (auto pred: candidates)
+{
+  unsigned cost
+   = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred);
...

so this searches for the first candidate that fits in
param_max_unswitch_insns, it doesn't
yet try to find the cheapest / best one.  Please add a comment to say
that.  After we
found one candidate we apply unswitching to such one candidate (and throw the
others away).  I guess that's OK - it's what the old code did - what
you did for this
intermediate step is actually gather all unswitching predicates
upfront.  Hopefully
we'll be able to share some of the work done for the recursive invocations.

+ fprintf (dump_file, ";; Unswitching loop with condition: ");

"on condition"

+ fprintf (dump_file, ";; Not unswitching condition, loop too big "
+  "(%d insns): ", cost);

"cost too big"?  I assume 'cost' is the number of stmts we'll add,
loop-size - true-eliminated - false-eliminated?

+exit:
+  for (auto predicate: candidates)
+delete predicate;

Some refactoring should get rid of the goto ...

Re: [PATCH 05/10] __builtin_dynamic_object_size: Recognize builtin

2021-11-23 Thread Siddhesh Poyarekar

On 11/23/21 18:11, Jakub Jelinek wrote:

On Wed, Nov 10, 2021 at 12:31:31AM +0530, Siddhesh Poyarekar wrote:

Recognize the __builtin_dynamic_object_size builtin and add paths in the
object size path to deal with it, but treat it like
__builtin_object_size for now.  Also add tests to provide the same
testing coverage for the new builtin name.

gcc/ChangeLog:

* builtins.def (BUILT_IN_DYNAMIC_OBJECT_SIZE): New builtin.
* tree-object-size.h (compute_builtin_object_size): Add new
argument dynamic.
* builtins.c (expand_builtin, fold_builtin_2): Handle it.
(fold_builtin_object_size): Handle new builtin and adjust for
change to compute_builtin_object_size.
* tree-object-size.c: Include builtins.h.
(OST_DYNAMIC): New enum value.
(compute_builtin_object_size): Add new argument dynamic.
(addr_object_size): Adjust.
(early_object_sizes_execute_one,
dynamic_object_sizes_execute_one): New functions.
(object_sizes_execute): Rename insert_min_max_p argument to
early. Handle BUILT_IN_DYNAMIC_OBJECT_SIZE and call the new


Two spaces after . instead of just one.


--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -972,6 +972,7 @@ DEF_BUILTIN_STUB (BUILT_IN_STRNCMP_EQ, 
"__builtin_strncmp_eq")
  
  /* Object size checking builtins.  */

  DEF_GCC_BUILTIN  (BUILT_IN_OBJECT_SIZE, "object_size", 
BT_FN_SIZE_CONST_PTR_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN   (BUILT_IN_DYNAMIC_OBJECT_SIZE, 
"dynamic_object_size", BT_FN_SIZE_CONST_PTR_INT, ATTR_NOTHROW_LEAF_LIST)


Are you sure about the omission of CONST_ in there?
If I do:
   size_t a = __builtin_dynamic_object_size (x, 0);
   size_t b = __builtin_dynamic_object_size (x, 0);
I'd expect the compiler to perform it just once.  While it might actually do
it eventually after objsz2 pass lowers it, with the above it won't really do
it.  Perhaps const attribute isn't really safe, the function might need to
read some memory in order to compute the return value, but certainly it will
not store to any memory, so perhaps
ATTR_PURE_NOTHROW_LEAF_LIST ?


Thanks, I'll fix this.
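
A small sketch of the pure vs. const distinction, using a made-up function rather than the builtin itself. The names buffer_sketch, remaining_bytes and twice are hypothetical. The function reads memory through its pointer argument, so "const" would be wrong for it, but "pure" still lets the compiler merge two identical calls when nothing is stored in between.

```cpp
#include <cstddef>

// Hypothetical object whose size is only known at run time.
struct buffer_sketch
{
  std::size_t len;
};

// pure: the result depends on the arguments and on memory that is
// read, and the function has no side effects.
__attribute__ ((pure)) std::size_t
remaining_bytes (const buffer_sketch *b, std::size_t used)
{
  return used < b->len ? b->len - used : 0;
}

// Two identical calls with no intervening store; a compiler is allowed
// to evaluate remaining_bytes once here.
std::size_t
twice (const buffer_sketch *b, std::size_t used)
{
  std::size_t a = remaining_bytes (b, used);
  std::size_t c = remaining_bytes (b, used);
  return a + c;
}
```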




+#define DYNAMIC_OBJECT_SIZE


Why this extra macro?


+#define __builtin_object_size __builtin_dynamic_object_size



  extern char ax[];
+#ifndef DYNAMIC_OBJECT_SIZE


You can #ifndef __builtin_object_size
instead...


I'll fix this too.




@@ -371,7 +373,8 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
  || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME)
{
  compute_builtin_object_size (TREE_OPERAND (pt_var, 0),
-  object_size_type & ~OST_SUBOBJECT, );
+  object_size_type & OST_MINIMUM, ,
+  object_size_type & OST_DYNAMIC);
}
else
{
@@ -835,9 +838,10 @@ resolve_dependency_loops (struct object_size_info *osi)
  
  bool

  compute_builtin_object_size (tree ptr, int object_size_type,
-tree *psize)
+tree *psize, bool dynamic)
  {
gcc_assert (object_size_type >= 0 && object_size_type < OST_END);
+  object_size_type |= dynamic ? OST_DYNAMIC : 0;


What's the advantage of another argument and then merging it with
object_size_type over just passing object_size_type which will have
all the bits in?


I kept the size bits as an internal detail, I can define them in 
tree-object-size.h and have builtins.c (and others) use them.
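
Roughly what exposing the bits could look like, as a sketch. The enumerator names are the ones used in this thread, but the values chosen for OST_DYNAMIC and OST_END are assumptions for illustration, and decode_object_size_type is a made-up helper.

```cpp
// Flag bits carried in object_size_type.  OST_SUBOBJECT and
// OST_MINIMUM follow the documented __builtin_object_size type values;
// the values of OST_DYNAMIC and OST_END are assumed for this sketch.
enum
{
  OST_SUBOBJECT = 1,
  OST_MINIMUM = 2,
  OST_DYNAMIC = 4,
  OST_END = 8
};

struct decoded_ost_sketch
{
  bool subobject;
  bool minimum;
  bool dynamic;
};

// Split a combined object_size_type into its flags.  Returns false if
// the value is outside the valid range.
bool
decode_object_size_type (int object_size_type, decoded_ost_sketch *out)
{
  if (object_size_type < 0 || object_size_type >= OST_END)
    return false;
  out->subobject = (object_size_type & OST_SUBOBJECT) != 0;
  out->minimum = (object_size_type & OST_MINIMUM) != 0;
  out->dynamic = (object_size_type & OST_DYNAMIC) != 0;
  return true;
}
```

With this, a caller passes e.g. 1 | OST_DYNAMIC directly and no separate bool argument is needed.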



+static void
+early_object_sizes_execute_one (gimple_stmt_iterator *i, gimple *call)
+{
+  tree ost = gimple_call_arg (call, 1);
+  tree lhs = gimple_call_lhs (call);
+  gcc_assert (lhs != NULL_TREE);
+
+  if (tree_fits_uhwi_p (ost))
+{
+  unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi (ost);
+  tree ptr = gimple_call_arg (call, 0);
+  if ((object_size_type == 1 || object_size_type == 3)
+ && (TREE_CODE (ptr) == ADDR_EXPR || TREE_CODE (ptr) == SSA_NAME))


I think it would be better to have early exits there to avoid
indenting most of the function too much, because the function doesn't
do anything otherwise.  So:
   if (!tree_fits_uhwi_p (ost))
 return;


OK, I'll fix this.



   unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi (ost);
   tree ptr = gimple_call_arg (call, 0);
   if (object_size_type != 1 && object_size_type != 3)
 return;
   if (TREE_CODE (ptr) != ADDR_EXPR && TREE_CODE (ptr) != SSA_NAME)
 return;

   tree type = ...


+   {
+ tree type = TREE_TYPE (lhs);
+ tree bytes;
+ if (compute_builtin_object_size (ptr, object_size_type, ))
+   {
+ tree tem = make_ssa_name (type);
+ gimple_call_set_lhs (call, tem);
+ enum tree_code code
+   = object_size_type & OST_MINIMUM ? MAX_EXPR : MIN_EXPR;
+ tree cst = fold_convert (type, bytes);
+ gimple *g = 

Re: [PATCH 04/10] tree-object-size: Single pass dependency loop resolution

2021-11-23 Thread Siddhesh Poyarekar

On 11/23/21 17:37, Jakub Jelinek wrote:

On Wed, Nov 10, 2021 at 12:31:30AM +0530, Siddhesh Poyarekar wrote:

Use SSA names as placeholders for self-referencing variables to generate
expressions for object sizes and then reduce those size expressions
to constants instead of repeatedly walking through statements.

This change also makes sure that object sizes for an SSA name are
updated at most twice, once if there is a dependency loop and then the
final time upon computation of object size.  Iteration to deduce the
final size is now done on the size expressions instead of walking
through the object references.

Added test to include a case where __builtin_object_size incorrectly
returned the minimum object size as zero.


This feels way too risky to me.  I think making some code do something
different between (x & OST_DYNAMIC) == 0 and == 1 is just fine,
it doesn't have to share everything.  After all, for __bdos we actually
emit code to compute it at runtime, while for __bos we don't.
So I'd keep e.g. .pass = 0, .pass = 1 and .pass = 2 (in a loop) in
compute_builtin_object_size for non-OST_DYNAMIC and only use your new
stuff for __bdos.
E.g. creating new SSA_NAMEs for __bos doesn't look like a good idea to me.
The GCC __bos testsuite is not big enough that it covers everything and
even in it we sometimes reexamine 2, 3 or 4 times.


OK, so addr_object_size does not participate in dependency loops, so I 
can keep its changes intact and simply add an INTEGER_CST check at the 
end to return unknown if the size expression is not constant; we kinda 
do that for object offsets anyway.


I could then make a new entry point collect_object_sizes_for (renaming 
the current one to collect_static_object_sizes_for and a new one 
collect_dynamic_object_sizes_for, routing based on object_size_type & 
OST_DYNAMIC) which sends the object size collection down different 
paths.  I can reuse the object size vectors and just have different data 
in them.


Would that be reasonable?

Thanks,
Siddhesh


Re: [PATCH] [RFC][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-11-23 Thread Dan Li via Gcc-patches




On 11/23/21 6:51 PM, Szabolcs Nagy wrote:

The 11/23/2021 16:32, Dan Li wrote:

On 11/3/21 8:00 PM, Szabolcs Nagy wrote:

i assume exception handling info has to change for scs to
work (to pop the shadow stack when transferring control),
so either scs must require -fno-exceptions or the eh info
changes must be implemented.

i think the kernel does not require exceptions and does
not depend on the unwinder runtime in libgcc, so this
is optional for the linux kernel use-case.


I recompiled a glibc and gcc runtime library with -ffixed-x18 enabled.
As you said, the scs stack needs to be popped at the same time during
exception handling.

I saw that Clang handles this by adding a
".cfi_escape 0x16, 0x12, 0x02, 0x82, 0x78"
directive (x18 -= 8;) after each emit of scs push[2].
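
For reference, the five escape bytes can be decoded by hand with the DWARF opcode values: 0x16 is DW_CFA_val_expression, 0x12 is register 18, 0x02 is the expression length, 0x82 is DW_OP_breg18 (DW_OP_breg0 is 0x70) and 0x78 is the SLEB128 encoding of -8. The sketch below does the same decoding; it is not libgcc code, it only handles this one instruction shape, and the names val_expr_sketch, read_sleb128 and decode_val_expression are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of decoding one DW_CFA_val_expression whose expression is a
// single DW_OP_breg<N> with a signed offset.
struct val_expr_sketch
{
  unsigned target_reg;   // register whose unwound value is described
  unsigned base_reg;     // N in DW_OP_breg<N>
  std::int64_t offset;   // SLEB128 operand of DW_OP_breg<N>
};

// Decode one signed LEB128 value starting at bytes[pos]; advances pos.
std::int64_t
read_sleb128 (const std::vector<std::uint8_t> &bytes, std::size_t &pos)
{
  std::uint64_t result = 0;
  unsigned shift = 0;
  std::uint8_t byte;
  do
    {
      assert (pos < bytes.size () && shift < 64);
      byte = bytes[pos++];
      result |= static_cast<std::uint64_t> (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  // Sign-extend if bit 6 of the last byte is set.
  if (shift < 64 && (byte & 0x40))
    result |= ~static_cast<std::uint64_t> (0) << shift;
  return static_cast<std::int64_t> (result);
}

// Returns true and fills OUT if BYTES is exactly
// DW_CFA_val_expression <reg> <len> DW_OP_breg<N> <sleb128>.
// Only single-byte register and length fields are handled.
bool
decode_val_expression (const std::vector<std::uint8_t> &bytes,
                       val_expr_sketch *out)
{
  const std::uint8_t dw_cfa_val_expression = 0x16;
  const std::uint8_t dw_op_breg0 = 0x70;
  const std::uint8_t dw_op_breg31 = 0x8f;

  if (bytes.size () < 5 || bytes[0] != dw_cfa_val_expression)
    return false;
  if ((bytes[1] & 0x80) || (bytes[2] & 0x80))
    return false;
  if (bytes[3] < dw_op_breg0 || bytes[3] > dw_op_breg31)
    return false;

  std::size_t pos = 4;
  std::int64_t offset = read_sleb128 (bytes, pos);
  // The expression block must end where its length field says.
  if (pos != 3 + static_cast<std::size_t> (bytes[2])
      || pos != bytes.size ())
    return false;

  out->target_reg = bytes[1];
  out->base_reg = bytes[3] - dw_op_breg0;
  out->offset = offset;
  return true;
}
```

So the rule reads "the caller's x18 is the current x18 minus 8", which is why the unwinder has to know the current hardware value of x18 to evaluate it.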

But this directive has problems when executed in libgcc:
1)context->reg[x] in uw_init_context_1 are all based on cfa, most
   registers have no initial values by default.
2)Address of shadow call stack (x18) cannot(and should not) be calculated
   based on cfa, and I did not yet find a way to assign hardware register
   x18 to context->reg[18].
3)This causes libgcc to crash when parsing .cfi_escape exp because of 0
   address dereference (* x18)
   (execute_stack_op => case DW_OP_breg18: _Unwind_GetGR)
4)uw_install_context_1 does not restore all hardware registers by default
   before eh return, so context->reg[18] can't write directly to hw x18.
   (In clang, __unw_getcontext/__unw_resume will save/restore all hardware
   registers, so this directive works fine in my libunwind test.)

I tried to fix this problem through a patch[3], the exception handling
works fine in my test environment, but I'm not sure if this fix is
appropriate for two reasons:
1)libgcc does not push/pop all registers by default during exception
   handling. Is this change appropriate?
2)The test case may not be able to test this patch, because the test
   environment requires at least one glibc/gcc runtime compiled with
   -ffixed-x18.

Maybe it's better to rely on -fno-exceptions for this patch first? And if
the glibc/gcc runtime also supports SCS later, the problem can be fixed
at the same time.


i did not look at the exception handling in detail (that's
difficult to understand for me too).

to use scs, non-default abi is required anyway, so not
supporting exceptions sounds fine to me. however it should
be documented and ideally enforced (-fexceptions should
be rejected, just like -fno-fixed-x18).

Thanks Szabolcs,

This sounds reasonable to me, and I'll fix it in the next version.


i assume the linux kernel does not require -fexceptions.


AFAIK, -fexceptions is not used in the linux kernel.


PS:
I'm still not familiar enough with exception handling in libgcc/libunwind,
please correct me if there are any mistakes :)

[1] 
https://github.com/llvm/llvm-project/commit/f11eb3ebe77729426e562d7d4d7ebb1d5ff2e7c8
[2] https://reviews.llvm.org/D54609
[3] https://gcc.gnu.org/bugzilla/attachment.cgi?id=51854=diff



Re: [PATCH 03/10] tree-object-size: Use tree instead of HOST_WIDE_INT

2021-11-23 Thread Siddhesh Poyarekar

On 11/23/21 17:28, Jakub Jelinek wrote:

On Mon, Nov 22, 2021 at 01:32:22PM +0100, Jakub Jelinek via Gcc-patches wrote:

On Mon, Nov 22, 2021 at 06:01:08PM +0530, Siddhesh Poyarekar wrote:

On 11/22/21 17:30, Siddhesh Poyarekar wrote:

So I've got patch 10/10, which handles dynamic (and consequently
negative) offsets.  It basically computes a "whole size", which then
gives the extent to which a negative offset is valid, making the
estimates a bit more precise.  I didn't do it for static object sizes
because I didn't have time then, but I could add a patch 11/10 if the
idea sounds OK to you.


... or alternatively, I could bring the whole size idea into this tree
conversion patch so that it handles all kinds of offsets.  That might even
eliminate patch 10/10.  What would you prefer?


Into this patch.


BTW, seems the current behavior is to punt on those "negative" values,
we trigger
   if (offset >= offset_limit)
case for it and return unknown.


The current behaviour is actually inconsistent; for SSA names it punts 
for sizes greater than offset limit and for addr_expr it ends up with 
larger sizes.


Siddhesh


[committed] libstdc++: Fix circular dependency for bitmap_allocator [PR103381]

2021-11-23 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk. Backports needed too.


 includes , and since C++17 that
includes . If std::allocator is defined in terms of
__gnu_cxx::bitmap_allocator then you get a circular reference and
bootstrap fails when compiling src/c++17/*.cc.

libstdc++-v3/ChangeLog:

PR libstdc++/103381
* include/ext/bitmap_allocator.h: Include 
instead of .
---
 libstdc++-v3/include/ext/bitmap_allocator.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/ext/bitmap_allocator.h 
b/libstdc++-v3/include/ext/bitmap_allocator.h
index cc80593764c..0444deb479c 100644
--- a/libstdc++-v3/include/ext/bitmap_allocator.h
+++ b/libstdc++-v3/include/ext/bitmap_allocator.h
@@ -31,7 +31,7 @@
 
 #include  // For std::pair.
 #include  // For __throw_bad_alloc().
-#include  // For greater_equal, and less_equal.
+#include  // For greater_equal, and less_equal.
 #include  // For operator new.
 #include  // _GLIBCXX_DEBUG_ASSERT
 #include 
-- 
2.31.1



Re: [PATCH take 2] ivopts: Improve code generated for very simple loops.

2021-11-23 Thread Richard Biener via Gcc-patches
On Thu, Nov 18, 2021 at 4:18 PM Roger Sayle  wrote:
>
>
> Hi Richard,
> Many thanks for the patch review.
>
> On Tue, Nov 16, 2021 at 12:38 Richard Biener wrote:
> > On Mon, Nov 15, 2021 at 2:04 PM Roger Sayle
> >  wrote:
> > >
> > > This patch tidies up the code that GCC generates for simple loops, by
> > > selecting/generating a simpler loop bound expression in ivopts.
> > > Previously:
> > >
> > > void v1 (unsigned long *in, unsigned long *out, unsigned int n) {
> > >   int i;
> > >   for (i = 0; i < n; i++) {
> > > out[i] = in[i];
> > >   }
> > > }
> > >
> > > on x86_64 generated:
> > > v1: testl   %edx, %edx
> > > je  .L1
> > > movl%edx, %edx
> > > xorl%eax, %eax
> > > .L3:movq(%rdi,%rax,8), %rcx
> > > movq%rcx, (%rsi,%rax,8)
> > > addq$1, %rax
> > > cmpq%rax, %rdx
> > > jne .L3
> > > .L1:ret
> > >
> > > and now instead generates:
> > > v1: testl   %edx, %edx
> > > je  .L1
> > > movl%edx, %edx
> > > xorl%eax, %eax
> > > leaq0(,%rdx,8), %rcx
> > > .L3:movq(%rdi,%rax), %rdx
> > > movq%rdx, (%rsi,%rax)
> > > addq$8, %rax
> > > cmpq%rax, %rcx
> > > jne .L3
> > > .L1:ret
> >
> > Is that actually better?  IIRC the addressing modes are both complex and we
> > now have an extra lea?
>
>
> Technically the induction variable elimination is removing two multiplies by 8
> (or left shifts) from the body of the loop and replacing them with a single
> multiply by 8 prior to the loop, which is exactly the induction variable
> optimization that ivopts is designed to do.  It's true that with x86's complex
> addressing modes these "multiplications" are free, but ivopts is run on all
> targets including those without indexed addressing, and even on x86 there's
> a benefit from shorter instruction encodings.
>
> > For this case I see we generate
> >   _15 = n_10(D) + 4294967295;
> >   _8 = (unsigned long) _15;
> >   _7 = _8 + 1;
> >
> > where n is unsigned int so if we know that n is not zero we can simplify the
> > addition and conveniently the loop header test provides this guarantee.
> > IIRC there were some attempts to enhance match.pd for some cases of such
> > expressions.
>
> Exactly.  The loop optimizers are generating the expression (x-1)+1 and
> assuming that the middle-end will simplify this to x.  Unfortunately, the
> change of modes (designed to prevent overflow) actually makes this
> impossible to simplify, due to concerns about overflow.
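
A small sketch of why the widening conversion blocks the (x-1)+1 fold,
assuming a 32-bit unsigned count widened to 64 bits as in the GIMPLE quoted
above.  The function names are invented for the illustration.  The two
expressions agree for every nonzero n and disagree only at n == 0, which is
exactly the case the loop header test rules out.

```c
#include <stdint.h>

/* What the loop optimizers build: subtract in 32 bits, widen, then add 1.  */
uint64_t niter_plus_one(uint32_t n)
{
    uint32_t niter = n - 1u;      /* wraps to 0xffffffff when n == 0 */
    return (uint64_t)niter + 1u;  /* the addition happens in 64 bits */
}

/* The simplified form, valid only when n is known to be nonzero.  */
uint64_t widened(uint32_t n)
{
    return (uint64_t)n;
}
```

For n == 0 the first function yields 2^32 while the second yields 0, so the
fold is only safe with the extra range information.
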
>
> > +  /* If AFTER_ADJUST is required, the code below generates the equivalent
> > +   * of BASE + NITER * STEP + STEP, when ideally we'd prefer the expression
> > +   * BASE + (NITER + 1) * STEP, especially when NITER is often of the form
> > +   * SSA_NAME - 1.  Unfortunately, guaranteeing that adding 1 to NITER
> > +   * doesn't overflow is tricky, so we peek inside the TREE_NITER_DESC
> > +   * class for common idioms that we know are safe.  */
> >
> > No '* ' each line.
> Doh!  Thanks.  Sometimes I hate vi.
>
> > I wonder if the non-overflowing can be captured by
> > integer_onep (iv->step)
> > && max_stmt_executions (loop, )
>
> Unfortunately, max_stmt_executions is intended to return the wide_int count
> of iterations, either the known constant value or a profile-based estimate,
> while the optimizations I'm proposing work with symbolic values from variable
> iteration counts.  When the iteration count is a constant, fold-const is
> already able to simplify (x-1)+1 to an integer constant even with type
> conversions.
>
> > if we then do (niter + 1) * step instead of niter*step + step would that
> > do the same?
>
> Yes.  I've extended the scope of the patch to now also handle loops of the
> form (for i=beg; i now use (end-beg) as the incremented niter, so the invariant expression now
> becomes (end-beg)*4 instead of the currently generated:  ((end-beg)-1)*4 + 4.
>
> I'm assuming that the niter*step + step is by design (for the general case),
> so I'm only tweaking the (common) corner cases, where it's easy to see that
> it's safe to substitute a simpler expression.  For more general affine
> recurrences in complex loop nests, niter*step + step may be preferred.
>
> > That said - what the change does is actually ensure that we CSE niter + 1
> > with the bound of the simplified exit test?
>
> Not quite, this simply provides a simplified expression for "niter + 1" that
> takes advantage of the implicit range information we have.  You're right
> in theory that ranger/vrp may be able to help simplify (long)((int)x-1)+1L
> more generally.   In this case, I suspect it's just a historical artifact that
> much of the literature on (iv) loop optimizations dates back to Fortran
> and Algol where arrays were 1-based rather than 0-based, so decades
> later gcc 11 on x86_64 sometimes requires an extra inc/dec instruction
> for purely historical 

Re: [PATCH 05/10] __builtin_dynamic_object_size: Recognize builtin

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 10, 2021 at 12:31:31AM +0530, Siddhesh Poyarekar wrote:
> Recognize the __builtin_dynamic_object_size builtin and add paths in the
> object size path to deal with it, but treat it like
> __builtin_object_size for now.  Also add tests to provide the same
> testing coverage for the new builtin name.
> 
> gcc/ChangeLog:
> 
>   * builtins.def (BUILT_IN_DYNAMIC_OBJECT_SIZE): New builtin.
>   * tree-object-size.h (compute_builtin_object_size): Add new
>   argument dynamic.
>   * builtins.c (expand_builtin, fold_builtin_2): Handle it.
>   (fold_builtin_object_size): Handle new builtin and adjust for
>   change to compute_builtin_object_size.
>   * tree-object-size.c: Include builtins.h.
>   (OST_DYNAMIC): New enum value.
>   (compute_builtin_object_size): Add new argument dynamic.
>   (addr_object_size): Adjust.
>   (early_object_sizes_execute_one,
>   dynamic_object_sizes_execute_one): New functions.
>   (object_sizes_execute): Rename insert_min_max_p argument to
>   early. Handle BUILT_IN_DYNAMIC_OBJECT_SIZE and call the new

Two spaces after . instead of just one.

> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -972,6 +972,7 @@ DEF_BUILTIN_STUB (BUILT_IN_STRNCMP_EQ, 
> "__builtin_strncmp_eq")
>  
>  /* Object size checking builtins.  */
>  DEF_GCC_BUILTIN (BUILT_IN_OBJECT_SIZE, "object_size", 
> BT_FN_SIZE_CONST_PTR_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
> +DEF_GCC_BUILTIN (BUILT_IN_DYNAMIC_OBJECT_SIZE, 
> "dynamic_object_size", BT_FN_SIZE_CONST_PTR_INT, ATTR_NOTHROW_LEAF_LIST)

Are you sure about the omission of CONST_ in there?
If I do:
  size_t a = __builtin_dynamic_object_size (x, 0);
  size_t b = __builtin_dynamic_object_size (x, 0);
I'd expect the compiler to perform it just once.  While it might actually do
it eventually after objsz2 pass lowers it, with the above it won't really do
it.  Perhaps const attribute isn't really safe, the function might need to
read some memory in order to compute the return value, but certainly it will
not store to any memory, so perhaps
ATTR_PURE_NOTHROW_LEAF_LIST ?
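
To illustrate the distinction being drawn here, a minimal sketch using the
user-level GCC/Clang function attributes of the same names.  The struct and
function names are hypothetical and unrelated to the builtin itself.  A const
function may depend only on its argument values, while a pure function may
also read memory but must not write it, which is still enough for the
compiler to merge two identical calls with no intervening store.

```c
#include <stddef.h>

struct buffer { size_t used; };

/* const: the result depends only on the argument value, never on memory.  */
__attribute__((const)) size_t round_up8(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* pure: reads memory through the pointer, but has no side effects.  */
__attribute__((pure)) size_t bytes_used(const struct buffer *b)
{
    return b->used;
}

size_t twice_used(const struct buffer *b)
{
    /* Two identical pure calls with no store in between: the compiler is
       allowed to evaluate bytes_used only once.  */
    return bytes_used(b) + bytes_used(b);
}
```
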

> +#define DYNAMIC_OBJECT_SIZE

Why this extra macro?

> +#define __builtin_object_size __builtin_dynamic_object_size

>  extern char ax[];
> +#ifndef DYNAMIC_OBJECT_SIZE

You can #ifndef __builtin_object_size
instead...

> @@ -371,7 +373,8 @@ addr_object_size (struct object_size_info *osi, 
> const_tree ptr,
> || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME)
>   {
> compute_builtin_object_size (TREE_OPERAND (pt_var, 0),
> -object_size_type & ~OST_SUBOBJECT, );
> +object_size_type & OST_MINIMUM, ,
> +object_size_type & OST_DYNAMIC);
>   }
>else
>   {
> @@ -835,9 +838,10 @@ resolve_dependency_loops (struct object_size_info *osi)
>  
>  bool
>  compute_builtin_object_size (tree ptr, int object_size_type,
> -  tree *psize)
> +  tree *psize, bool dynamic)
>  {
>gcc_assert (object_size_type >= 0 && object_size_type < OST_END);
> +  object_size_type |= dynamic ? OST_DYNAMIC : 0;

What's the advantage of another argument and then merging it with
object_size_type over just passing object_size_type which will have
all the bits in?
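
The two interface shapes under discussion, as a toy sketch.  The flag values
below are assumptions chosen for illustration and the function names are
invented; nothing here is taken from tree-object-size.c.  Variant A takes the
extra bool and merges it into the flag word, variant B expects the caller to
pass the complete flag word.

```c
#include <stdbool.h>

/* Hypothetical flag values, for illustration only.  */
enum { OST_SUBOBJECT = 1, OST_MINIMUM = 2, OST_DYNAMIC = 4, OST_END = 8 };

/* Variant A: a separate argument that is merged back into the flags.  */
int effective_type_extra_arg(int object_size_type, bool dynamic)
{
    object_size_type |= dynamic ? OST_DYNAMIC : 0;
    return object_size_type;
}

/* Variant B: the caller already passes every bit in one word.  */
int effective_type_flags(int object_size_type)
{
    return object_size_type;
}
```

Both variants end up with the same flag word; variant B simply avoids
splitting the information and recombining it.
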

> +static void
> +early_object_sizes_execute_one (gimple_stmt_iterator *i, gimple *call)
> +{
> +  tree ost = gimple_call_arg (call, 1);
> +  tree lhs = gimple_call_lhs (call);
> +  gcc_assert (lhs != NULL_TREE);
> +
> +  if (tree_fits_uhwi_p (ost))
> +{
> +  unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi (ost);
> +  tree ptr = gimple_call_arg (call, 0);
> +  if ((object_size_type == 1 || object_size_type == 3)
> +   && (TREE_CODE (ptr) == ADDR_EXPR || TREE_CODE (ptr) == SSA_NAME))

I think it would be better to have early exits there to avoid
indenting most of the function too much, because the function doesn't
do anything otherwise.  So:
  if (!tree_fits_uhwi_p (ost))
    return;

  unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi (ost);
  tree ptr = gimple_call_arg (call, 0);
  if (object_size_type != 1 && object_size_type != 3)
    return;
  if (TREE_CODE (ptr) != ADDR_EXPR && TREE_CODE (ptr) != SSA_NAME)
    return;

  tree type = ...

> + {
> +   tree type = TREE_TYPE (lhs);
> +   tree bytes;
> +   if (compute_builtin_object_size (ptr, object_size_type, ))
> + {
> +   tree tem = make_ssa_name (type);
> +   gimple_call_set_lhs (call, tem);
> +   enum tree_code code
> + = object_size_type & OST_MINIMUM ? MAX_EXPR : MIN_EXPR;
> +   tree cst = fold_convert (type, bytes);
> +   gimple *g = gimple_build_assign (lhs, code, tem, cst);
> +   gsi_insert_after (i, g, GSI_NEW_STMT);
> +   update_stmt (call);
> + }
> + }
> +}

> +/* Attempt to fold one 

Re: [PATCH 04/10] tree-object-size: Single pass dependency loop resolution

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 10, 2021 at 12:31:30AM +0530, Siddhesh Poyarekar wrote:
> Use SSA names as placeholders for self-referencing variables to generate
> expressions for object sizes and then reduce those size expressions
> to constants instead of repeatedly walking through statements.
> 
> This change also makes sure that object sizes for an SSA name are
> updated at most twice, once if there is a dependency loop and then the
> final time upon computation of object size.  Iteration to deduce the
> final size is now done on the size expressions instead of walking
> through the object references.
> 
> Added test to include a case where __builtin_object_size incorrectly
> returned the minimum object size as zero.

This feels way too risky to me.  I think making some code do something
different between (x & OST_DYNAMIC) == 0 and == 1 is just fine,
it doesn't have to share everything.  After all, for __bdos we actually
emit code to compute it at runtime, while for __bos we don't.
So I'd keep e.g. .pass = 0, .pass = 1 and .pass = 2 (in a loop) in
compute_builtin_object_size for non-OST_DYNAMIC and only use your new
stuff for __bdos.
E.g. creating new SSA_NAMEs for __bos doesn't look like a good idea to me.
The GCC __bos testsuite is not big enough that it covers everything and
even in it we sometimes reexamine 2, 3 or 4 times.

> gcc/ChangeLog:
> 
>   * tree-object-size.c (struct object_size_info): Remove pass,
>   changed, depths, stack and tos.  Add tempsize_objs.
>   (OST_TREE_CODE): New macro.
>   (expr_object_size, merge_object_sizes, plus_stmt_object_size,
>   cond_expr_object_size): Return tree and don't pass pointer tree.
>   (object_sizes_set): Return void.  Adjust implementation to hold
>   placeholder SSA names and their values in different slots.
>   (addr_object_size): Adjust for single pass.
>   (reducing_size, estimate_size, resolve_dependency_loops): New
>   functions.
>   (compute_builtin_object_size): Call them.
>   (make_tempsize): New function.
>   (collect_object_sizes_for): Use it.  Update object_sizes at most
>   twice.
>   (check_for_plus_in_loops, check_for_plus_in_loops_1): Remove
>   functions.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/builtin-object-size-1.c (test6): New test for
>   passthrough.
>   * gcc.dg/builtin-object-size-2.c: Likewise.
>   * gcc.dg/builtin-object-size-3.c: Likewise.
>   * gcc.dg/builtin-object-size-4.c: Likewise.

Jakub



Re: [PATCH 03/10] tree-object-size: Use tree instead of HOST_WIDE_INT

2021-11-23 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 22, 2021 at 01:32:22PM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Nov 22, 2021 at 06:01:08PM +0530, Siddhesh Poyarekar wrote:
> > On 11/22/21 17:30, Siddhesh Poyarekar wrote:
> > > So I've got patch 10/10, which handles dynamic (and consequently
> > > negative) offsets.  It basically computes a "whole size", which then
> > > gives the extent to which a negative offset is valid, making the
> > > estimates a bit more precise.  I didn't do it for static object sizes
> > > because I didn't have time then, but I could add a patch 11/10 if the
> > > idea sounds OK to you.
> > 
> > ... or alternatively, I could bring the whole size idea into this tree
> > conversion patch so that it handles all kinds of offsets.  That might even
> > eliminate patch 10/10.  What would you prefer?
> 
> Into this patch.

BTW, seems the current behavior is to punt on those "negative" values,
we trigger
  if (offset >= offset_limit)
case for it and return unknown.
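
A rough model of that punt, assuming offsets are held as unsigned values and
the limit is half of the size_t range.  Both are assumptions made for this
sketch, as are the names remaining_size and SIZE_UNKNOWN.  A negative offset
converted to an unsigned type becomes a very large value, so it trips the
limit check and the size is reported as unknown.

```c
#include <stddef.h>
#include <stdint.h>

#define SIZE_UNKNOWN ((size_t)-1)

/* Assumed limit: half of the size_t range.  */
static const size_t offset_limit = SIZE_MAX / 2;

size_t remaining_size(size_t object_size, size_t offset)
{
    if (offset >= offset_limit)   /* "negative" offsets end up here */
        return SIZE_UNKNOWN;
    return offset <= object_size ? object_size - offset : 0;
}
```
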

Jakub



Re: [PATCH] PR fortran/87851 - [9/10/11/12 Regression] Wrong return type for len_trim

2021-11-23 Thread Mikael Morin

On 22/11/2021 at 21:30, Bernhard Reutner-Fischer wrote:


I'm just wondering aloud if it would be more convenient to have a
unsigned hidden_arg:1 bit in let's say gfc_actual_arglist that denotes
if the argument should be const eval'ed and used before, and, most
importantly not passed to the library. We seem to have more than just
the index intrinsic's kind arg in that boat. And from what I read,
powerpc will eventually want to select quite some kind-specific library
functions soon, depending on how this part is implemented..

Maybe add SPEC_HIDDEN_ARG / SPEC_LIBRARY_SELECTOR additional
gfc_param_spec_type if a separate bit is deemed inappropriate.

Such a hidden_arg/library_selector/non_library_call_arg flag is maybe
better than matching individual functions and strcmp the arg name.


Hello,

I prefer not to go that way if possible:
 - because additional flags have a maintenance cost; it’s an additional 
complexity in the core structures, which impacts the whole compiler; 
it’s additional code to set them up, and maintainers have to understand 
what they are for, where they matter and where they don’t.
 - because the flag would have to be set at some point somewhere, which 
would probably be by matching individual functions and argument names; 
so the result would be the same.


You seem to be mostly concerned by the performance penalty, but I think 
4-character string comparisons at compile time don’t matter in 
practice, as long as there aren’t millions of them.


Regarding the powerpc floating point representation and kind problem, 
let’s see what we need when we really need it. ;-)


Mikael



Re: [PATCH] [RFC][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-11-23 Thread Szabolcs Nagy via Gcc-patches
The 11/23/2021 16:32, Dan Li wrote:
> On 11/3/21 8:00 PM, Szabolcs Nagy wrote:
> > i assume exception handling info has to change for scs to
> > work (to pop the shadow stack when transferring control),
> > so either scs must require -fno-exceptions or the eh info
> > changes must be implemented.
> > 
> > i think the kernel does not require exceptions and does
> > not depend on the unwinder runtime in libgcc, so this
> > is optional for the linux kernel use-case.
> > 
> I recompiled a glibc and gcc runtime library with -ffixed-x18 enabled.
> As you said, the scs stack needs to be popped at the same time during
> exception handling.
> 
> I saw that Clang handles this by adding a
> ".cfi_escape 0x16, 0x12, 0x02, 0x82, 0x78"
> directive (x18 -= 8;) after each emit of scs push[2].
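
For readers unfamiliar with the escape bytes, here is a small sketch that
annotates them with the standard DWARF opcode values and decodes the final
byte as a signed LEB128 offset.  The array and function names are invented
for the example, and the decoder is only meant for short inputs like this
one.

```c
#include <stddef.h>
#include <stdint.h>

/* The five bytes of the directive quoted above.  */
static const uint8_t scs_escape[] = {
    0x16,  /* DW_CFA_val_expression */
    0x12,  /* register number 18 (x18) */
    0x02,  /* length of the expression that follows: 2 bytes */
    0x82,  /* DW_OP_breg18 (DW_OP_breg0 is 0x70, plus 18) */
    0x78   /* SLEB128-encoded offset */
};

/* Decode one signed LEB128 value at P and store the bytes consumed in *LEN.
   Only intended for encodings of a few bytes.  */
int64_t decode_sleb128(const uint8_t *p, size_t *len)
{
    int64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    size_t i = 0;

    do {
        byte = p[i++];
        result |= (int64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    /* Sign-extend if the sign bit of the last group is set.  */
    if (shift < 64 && (byte & 0x40))
        result |= -((int64_t)1 << shift);

    *len = i;
    return result;
}
```

Decoding the last byte gives -8, which matches the "x18 -= 8" reading of the
rule: the unwound value of x18 is the current x18 minus 8.
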
> 
> But this directive has problems when executed in libgcc:
> 1)context->reg[x] in uw_init_context_1 are all based on cfa, most
>   registers have no initial values by default.
> 2)Address of shadow call stack (x18) cannot(and should not) be calculated
>   based on cfa, and I did not yet find a way to assign hardware register
>   x18 to context->reg[18].
> 3)This causes libgcc to crash when parsing .cfi_escape exp because of 0
>   address dereference (* x18)
>   (execute_stack_op => case DW_OP_breg18: _Unwind_GetGR)
> 4)uw_install_context_1 does not restore all hardware registers by default
>   before eh return, so context->reg[18] can't write directly to hw x18.
>   (In clang, __unw_getcontext/__unw_resume will save/restore all hardware
>   registers, so this directive works fine in my libunwind test.)
> 
> I tried to fix this problem through a patch[3], the exception handling
> works fine in my test environment, but I'm not sure if this fix is
> appropriate for two reasons:
> 1)libgcc does not push/pop all registers by default during exception
>   handling. Is this change appropriate?
> 2)The test case may not be able to test this patch, because the test
>   environment requires at least one glibc/gcc runtime compiled with
>   -ffixed-x18.
> 
> Maybe it's better to rely on -fno-exceptions for this patch first?  If
> the glibc/gcc runtime also supports SCS later, the problem can be fixed
> at the same time.

i did not look at the exception handling in detail (that's
difficult to understand for me too).

to use scs, non-default abi is required anyway, so not
supporting exceptions sounds fine to me. however it should
be documented and ideally enforced (-fexceptions should
be rejected, just like -fno-fixed-x18).

i assume the linux kernel does not require -fexceptions.

> 
> PS:
> I'm still not familiar enough with exception handling in libgcc/libunwind,
> please correct me if there are any mistakes :)
> 
> [1] 
> https://github.com/llvm/llvm-project/commit/f11eb3ebe77729426e562d7d4d7ebb1d5ff2e7c8
> [2] https://reviews.llvm.org/D54609
> [3] https://gcc.gnu.org/bugzilla/attachment.cgi?id=51854=diff
> 


[PATCH] tree-optimization/103361 - fix unroll-and-jam direction vector handling

2021-11-23 Thread Richard Biener via Gcc-patches
This properly uses lambda_int instead of truncating the direction
vector to int which leads to false unexpected negative values.
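
A minimal sketch of the failure mode, assuming a 64-bit distance type and a
32-bit int.  The helper names are made up and this is not the GCC code.
Converting an out-of-range value to a signed type is implementation-defined
in C; GCC and Clang reduce it modulo 2^32, so a large positive distance can
come out negative after truncation.

```c
#include <stdint.h>

/* Sign test after squeezing the distance into 32 bits (the old shape).  */
int negative_after_truncation(int64_t dist)
{
    int32_t narrow = (int32_t)dist;  /* implementation-defined if too big */
    return narrow < 0;
}

/* Sign test on the full-width value (the fixed shape).  */
int negative_full_width(int64_t dist)
{
    return dist < 0;
}
```

With a distance such as 3000000000 the full-width test reports a
non-negative value, while the truncated one sees a negative number on the
usual two's complement implementations.
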

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-11-23  Richard Biener  

PR tree-optimization/103361
* gimple-loop-jam.c (adjust_unroll_factor): Use lambda_int
for the dependence distance.
* tree-data-ref.c (print_lambda_vector): Properly print a lambda_int.

* g++.dg/torture/pr103361.C: New testcase.
---
 gcc/gimple-loop-jam.c   |  4 ++--
 gcc/testsuite/g++.dg/torture/pr103361.C | 18 ++
 gcc/tree-data-ref.c |  2 +-
 3 files changed, 21 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr103361.C

diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c
index 666f740f86d..933a4e0e6b0 100644
--- a/gcc/gimple-loop-jam.c
+++ b/gcc/gimple-loop-jam.c
@@ -402,10 +402,10 @@ adjust_unroll_factor (class loop *inner, struct 
data_dependence_relation *ddr,
 a >= N, or b > 0, or b is zero and a > 0.  Otherwise the unroll
 factor needs to be limited so that the first condition holds.
 That may limit the factor down to zero in the worst case.  */
- int dist = dist_v[0];
+ lambda_int dist = dist_v[0];
  if (dist < 0)
gcc_unreachable ();
- else if ((unsigned)dist >= *unroll)
+ else if (dist >= (lambda_int)*unroll)
;
  else if (lambda_vector_zerop (dist_v + 1, DDR_NB_LOOPS (ddr) - 1))
{
diff --git a/gcc/testsuite/g++.dg/torture/pr103361.C 
b/gcc/testsuite/g++.dg/torture/pr103361.C
new file mode 100644
index 000..ec1d6e1bae4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr103361.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-floop-unroll-and-jam" } */
+
+char a, b;
+extern unsigned short c[];
+extern bool d[];
+const unsigned short (const unsigned short , const unsigned short ) {
+  if (g < f)
+return g;
+  return f;
+}
+void k() {
+  for (int h = 0; b; h += 3)
+for (unsigned long i = 0; i < 11104842004558084287ULL;
+ i += -11104842004558084300ULL)
+  for (bool j(e(6, e(6, c[h + i]))); j < (bool)a; j = 7)
+d[7] = 0;
+}
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 46f4ffedb48..9b6ca1a91e5 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -388,7 +388,7 @@ print_lambda_vector (FILE * outfile, lambda_vector vector, 
int n)
   int i;
 
   for (i = 0; i < n; i++)
-fprintf (outfile, "%3d ", (int)vector[i]);
+fprintf (outfile, HOST_WIDE_INT_PRINT_DEC " ", vector[i]);
   fprintf (outfile, "\n");
 }
 
-- 
2.31.1


Re: [PATCH] inliner: Remove unused transform_lang_insert_block hook

2021-11-23 Thread Richard Biener via Gcc-patches
On Tue, 23 Nov 2021, Jakub Jelinek wrote:

> Hi!
> 
> This hook in struct copy_body_data has always been NULL since the merge
> of the tuples branch.  Before that it was briefly used by the C++
> FE during ctor/dtor cloning to chain the remapped blocks, but only
> very briefly; earlier still, transform_lang_insert_block was a bool and
> the call to insert_block was done through a langhook.
> I'd say that for something that hasn't been used since 4.4 there is
> zero chance we'll want to use it again in the near future.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2021-11-23  Jakub Jelinek  
> 
> gcc/
>   * tree-inline.h (struct copy_body_data): Remove
>   transform_lang_insert_block member.
>   * tree-inline.c (remap_block): Don't call
>   id->transform_lang_insert_block.
>   (optimize_inline_calls, copy_gimple_seq_and_replace_locals,
>   tree_function_versioning, maybe_inline_call_in_expr,
>   copy_fn): Don't initialize id.transform_lang_insert_block.
>   * gimplify.c (gimplify_omp_loop): Likewise.
> gcc/c/
>   * c-typeck.c (c_clone_omp_udr): Don't initialize
>   id.transform_lang_insert_block.
> gcc/cp/
>   * semantics.c (clone_omp_udr): Don't initialize
>   id.transform_lang_insert_block.
>   * optimize.c (clone_body): Likewise.
> 
> --- gcc/tree-inline.h.jj  2021-05-04 21:02:24.181799753 +0200
> +++ gcc/tree-inline.h 2021-11-22 13:17:15.330592766 +0100
> @@ -133,9 +133,6 @@ struct copy_body_data
>   and only in that case will actually remap the type.  */
>bool dont_remap_vla_if_no_change;
>  
> -  /* A function to be called when duplicating BLOCK nodes.  */
> -  void (*transform_lang_insert_block) (tree);
> -
>/* Statements that might be possibly folded.  */
>hash_set *statements_to_fold;
>  
> --- gcc/tree-inline.c.jj  2021-11-19 16:39:51.669594099 +0100
> +++ gcc/tree-inline.c 2021-11-22 13:17:30.429377127 +0100
> @@ -823,9 +823,6 @@ remap_block (tree *block, copy_body_data
>   _NONLOCALIZED_VARS (new_block),
>   id);
>  
> -  if (id->transform_lang_insert_block)
> -id->transform_lang_insert_block (new_block);
> -
>/* Remember the remapped block.  */
>insert_decl_map (id, old_block, new_block);
>  }
> @@ -5473,7 +5470,6 @@ optimize_inline_calls (tree fn)
>id.transform_new_cfg = false;
>id.transform_return_to_modify = true;
>id.transform_parameter = true;
> -  id.transform_lang_insert_block = NULL;
>id.statements_to_fold = new hash_set;
>  
>push_gimplify_context ();
> @@ -5857,7 +5853,6 @@ copy_gimple_seq_and_replace_locals (gimp
>id.transform_new_cfg = false;
>id.transform_return_to_modify = false;
>id.transform_parameter = false;
> -  id.transform_lang_insert_block = NULL;
>  
>/* Walk the tree once to find local labels.  */
>memset (, 0, sizeof (wi));
> @@ -6252,7 +6247,6 @@ tree_function_versioning (tree old_decl,
>id.transform_new_cfg = true;
>id.transform_return_to_modify = false;
>id.transform_parameter = false;
> -  id.transform_lang_insert_block = NULL;
>  
>old_entry_block = ENTRY_BLOCK_PTR_FOR_FN
>  (DECL_STRUCT_FUNCTION (old_decl));
> @@ -6541,7 +6535,6 @@ maybe_inline_call_in_expr (tree exp)
>id.transform_new_cfg = false;
>id.transform_return_to_modify = true;
>id.transform_parameter = true;
> -  id.transform_lang_insert_block = NULL;
>  
>/* Make sure not to unshare trees behind the front-end's back
>since front-end specific mechanisms may rely on sharing.  */
> @@ -6613,7 +6606,6 @@ copy_fn (tree fn, tree& parms, tree& res
>id.transform_new_cfg = false;
>id.transform_return_to_modify = false;
>id.transform_parameter = true;
> -  id.transform_lang_insert_block = NULL;
>  
>/* Make sure not to unshare trees behind the front-end's back
>   since front-end specific mechanisms may rely on sharing.  */
> --- gcc/gimplify.c.jj 2021-11-22 16:14:18.365451780 +0100
> +++ gcc/gimplify.c2021-11-22 16:54:31.726082428 +0100
> @@ -13413,7 +13413,6 @@ gimplify_omp_loop (tree *expr_p, gimple_
>   id.transform_call_graph_edges = CB_CGE_DUPLICATE;
>   id.transform_new_cfg = true;
>   id.transform_return_to_modify = false;
> - id.transform_lang_insert_block = NULL;
>   id.eh_lp_nr = 0;
>   walk_tree (_CLAUSE_REDUCTION_INIT (*pc), copy_tree_body_r,
>  , NULL);
> --- gcc/c/c-typeck.c.jj   2021-11-22 10:07:01.305225923 +0100
> +++ gcc/c/c-typeck.c  2021-11-22 13:18:20.678659463 +0100
> @@ -13848,7 +13848,6 @@ c_clone_omp_udr (tree stmt, tree omp_dec
>id.transform_call_graph_edges = CB_CGE_DUPLICATE;
>id.transform_new_cfg = true;
>id.transform_return_to_modify = false;
> -  id.transform_lang_insert_block = NULL;
>id.eh_lp_nr = 0;
>walk_tree (, copy_tree_body_r, , NULL);
>

[PATCH] inliner: Remove unused transform_lang_insert_block hook

2021-11-23 Thread Jakub Jelinek via Gcc-patches
Hi!

This hook in struct copy_body_data has always been NULL since the merge
of the tuples branch.  Before that it was briefly used by the C++
FE during ctor/dtor cloning to chain the remapped blocks, but only
very briefly; earlier still, transform_lang_insert_block was a bool and
the call to insert_block was done through a langhook.
I'd say that for something that hasn't been used since 4.4 there is
zero chance we'll want to use it again in the near future.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-11-23  Jakub Jelinek  

gcc/
* tree-inline.h (struct copy_body_data): Remove
transform_lang_insert_block member.
* tree-inline.c (remap_block): Don't call
id->transform_lang_insert_block.
(optimize_inline_calls, copy_gimple_seq_and_replace_locals,
tree_function_versioning, maybe_inline_call_in_expr,
copy_fn): Don't initialize id.transform_lang_insert_block.
* gimplify.c (gimplify_omp_loop): Likewise.
gcc/c/
* c-typeck.c (c_clone_omp_udr): Don't initialize
id.transform_lang_insert_block.
gcc/cp/
* semantics.c (clone_omp_udr): Don't initialize
id.transform_lang_insert_block.
* optimize.c (clone_body): Likewise.

--- gcc/tree-inline.h.jj2021-05-04 21:02:24.181799753 +0200
+++ gcc/tree-inline.h   2021-11-22 13:17:15.330592766 +0100
@@ -133,9 +133,6 @@ struct copy_body_data
  and only in that case will actually remap the type.  */
   bool dont_remap_vla_if_no_change;
 
-  /* A function to be called when duplicating BLOCK nodes.  */
-  void (*transform_lang_insert_block) (tree);
-
   /* Statements that might be possibly folded.  */
   hash_set *statements_to_fold;
 
--- gcc/tree-inline.c.jj2021-11-19 16:39:51.669594099 +0100
+++ gcc/tree-inline.c   2021-11-22 13:17:30.429377127 +0100
@@ -823,9 +823,6 @@ remap_block (tree *block, copy_body_data
_NONLOCALIZED_VARS (new_block),
id);
 
-  if (id->transform_lang_insert_block)
-id->transform_lang_insert_block (new_block);
-
   /* Remember the remapped block.  */
   insert_decl_map (id, old_block, new_block);
 }
@@ -5473,7 +5470,6 @@ optimize_inline_calls (tree fn)
   id.transform_new_cfg = false;
   id.transform_return_to_modify = true;
   id.transform_parameter = true;
-  id.transform_lang_insert_block = NULL;
   id.statements_to_fold = new hash_set;
 
   push_gimplify_context ();
@@ -5857,7 +5853,6 @@ copy_gimple_seq_and_replace_locals (gimp
   id.transform_new_cfg = false;
   id.transform_return_to_modify = false;
   id.transform_parameter = false;
-  id.transform_lang_insert_block = NULL;
 
   /* Walk the tree once to find local labels.  */
   memset (, 0, sizeof (wi));
@@ -6252,7 +6247,6 @@ tree_function_versioning (tree old_decl,
   id.transform_new_cfg = true;
   id.transform_return_to_modify = false;
   id.transform_parameter = false;
-  id.transform_lang_insert_block = NULL;
 
   old_entry_block = ENTRY_BLOCK_PTR_FOR_FN
 (DECL_STRUCT_FUNCTION (old_decl));
@@ -6541,7 +6535,6 @@ maybe_inline_call_in_expr (tree exp)
   id.transform_new_cfg = false;
   id.transform_return_to_modify = true;
   id.transform_parameter = true;
-  id.transform_lang_insert_block = NULL;
 
   /* Make sure not to unshare trees behind the front-end's back
 since front-end specific mechanisms may rely on sharing.  */
@@ -6613,7 +6606,6 @@ copy_fn (tree fn, tree& parms, tree& res
   id.transform_new_cfg = false;
   id.transform_return_to_modify = false;
   id.transform_parameter = true;
-  id.transform_lang_insert_block = NULL;
 
   /* Make sure not to unshare trees behind the front-end's back
  since front-end specific mechanisms may rely on sharing.  */
--- gcc/gimplify.c.jj   2021-11-22 16:14:18.365451780 +0100
+++ gcc/gimplify.c  2021-11-22 16:54:31.726082428 +0100
@@ -13413,7 +13413,6 @@ gimplify_omp_loop (tree *expr_p, gimple_
id.transform_call_graph_edges = CB_CGE_DUPLICATE;
id.transform_new_cfg = true;
id.transform_return_to_modify = false;
-   id.transform_lang_insert_block = NULL;
id.eh_lp_nr = 0;
walk_tree (_CLAUSE_REDUCTION_INIT (*pc), copy_tree_body_r,
   , NULL);
--- gcc/c/c-typeck.c.jj 2021-11-22 10:07:01.305225923 +0100
+++ gcc/c/c-typeck.c2021-11-22 13:18:20.678659463 +0100
@@ -13848,7 +13848,6 @@ c_clone_omp_udr (tree stmt, tree omp_dec
   id.transform_call_graph_edges = CB_CGE_DUPLICATE;
   id.transform_new_cfg = true;
   id.transform_return_to_modify = false;
-  id.transform_lang_insert_block = NULL;
   id.eh_lp_nr = 0;
   walk_tree (, copy_tree_body_r, , NULL);
   return stmt;
--- gcc/cp/semantics.c.jj   2021-11-19 09:58:37.239716820 +0100
+++ gcc/cp/semantics.c  2021-11-22 13:18:40.604374884 +0100
@@ -6066,7 +6066,6 @@ clone_omp_udr (tree stmt, tree omp_decl1
   

[committed] openmp: Fix up handling of reduction clauses on the loop construct [PR102431]

2021-11-23 Thread Jakub Jelinek via Gcc-patches
Hi!

We were using unshare_expr and walk_tree_without_duplicate replacement
of the placeholder vars.  The OMP_CLAUSE_REDUCTION_{INIT,MERGE} can contain
other trees that need to be duplicated though, e.g. BLOCKs referenced in
BIND_EXPR(s), or local VAR_DECLs.  This patch uses the inliner code to copy
all of that.  There is a slight complication that those local VAR_DECLs or
placeholders don't have DECL_CONTEXT set, they will get that only when
they are gimplified later on, so this patch sets DECL_CONTEXT for those
temporarily and resets it afterwards.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-11-23  Jakub Jelinek  

PR middle-end/102431
* gimplify.c (replace_reduction_placeholders): Remove.
(note_no_context_vars): New function.
(gimplify_omp_loop): For OMP_PARALLEL's BIND_EXPR create a new
BLOCK.  Use copy_tree_body_r with walk_tree instead of unshare_expr
and replace_reduction_placeholders for duplication of
OMP_CLAUSE_REDUCTION_{INIT,MERGE} expressions.  Ensure all mentioned
automatic vars have DECL_CONTEXT set to non-NULL before doing so
and reset it afterwards for those vars and their corresponding
vars.

* c-c++-common/gomp/pr102431.c: New test.
* g++.dg/gomp/pr102431.C: New test.
* gfortran.dg/gomp/pr102431.f90: New test.

--- gcc/gimplify.c.jj   2021-11-17 17:28:51.0 +0100
+++ gcc/gimplify.c  2021-11-22 16:14:18.365451780 +0100
@@ -13128,21 +13128,15 @@ gimplify_omp_for (tree *expr_p, gimple_s
 /* Helper for gimplify_omp_loop, called through walk_tree.  */
 
 static tree
-replace_reduction_placeholders (tree *tp, int *walk_subtrees, void *data)
+note_no_context_vars (tree *tp, int *, void *data)
 {
-  if (DECL_P (*tp))
+  if (VAR_P (*tp)
+  && DECL_CONTEXT (*tp) == NULL_TREE
+  && !is_global_var (*tp))
 {
-  tree *d = (tree *) data;
-  if (*tp == OMP_CLAUSE_REDUCTION_PLACEHOLDER (d[0]))
-   {
- *tp = OMP_CLAUSE_REDUCTION_PLACEHOLDER (d[1]);
- *walk_subtrees = 0;
-   }
-  else if (*tp == OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER (d[0]))
-   {
- *tp = OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER (d[1]);
- *walk_subtrees = 0;
-   }
+  vec *d = (vec *) data;
+  d->safe_push (*tp);
+  DECL_CONTEXT (*tp) = current_function_decl;
 }
   return NULL_TREE;
 }
@@ -13312,7 +13306,8 @@ gimplify_omp_loop (tree *expr_p, gimple_
 {
   if (pass == 2)
{
- tree bind = build3 (BIND_EXPR, void_type_node, NULL, NULL, NULL);
+ tree bind = build3 (BIND_EXPR, void_type_node, NULL, NULL,
+ make_node (BLOCK));
  append_to_statement_list (*expr_p, _EXPR_BODY (bind));
  *expr_p = make_node (OMP_PARALLEL);
  TREE_TYPE (*expr_p) = void_type_node;
@@ -13379,25 +13374,64 @@ gimplify_omp_loop (tree *expr_p, gimple_
*pc = copy_node (c);
OMP_CLAUSE_DECL (*pc) = unshare_expr (OMP_CLAUSE_DECL (c));
TREE_TYPE (*pc) = unshare_expr (TREE_TYPE (c));
-   OMP_CLAUSE_REDUCTION_INIT (*pc)
- = unshare_expr (OMP_CLAUSE_REDUCTION_INIT (c));
-   OMP_CLAUSE_REDUCTION_MERGE (*pc)
- = unshare_expr (OMP_CLAUSE_REDUCTION_MERGE (c));
if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (*pc))
  {
+   auto_vec no_context_vars;
+   int walk_subtrees = 0;
+   note_no_context_vars (_CLAUSE_REDUCTION_PLACEHOLDER (c),
+ _subtrees, _context_vars);
+   if (tree p = OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER (c))
+ note_no_context_vars (, _subtrees, _context_vars);
+   walk_tree_without_duplicates (_CLAUSE_REDUCTION_INIT (c),
+ note_no_context_vars,
+ _context_vars);
+   walk_tree_without_duplicates (_CLAUSE_REDUCTION_MERGE (c),
+ note_no_context_vars,
+ _context_vars);
+
OMP_CLAUSE_REDUCTION_PLACEHOLDER (*pc)
  = copy_node (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c));
if (OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER (*pc))
  OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER (*pc)
= copy_node (OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER (c));
-   tree nc = *pc;
-   tree data[2] = { c, nc };
-   walk_tree_without_duplicates (_CLAUSE_REDUCTION_INIT (nc),
- replace_reduction_placeholders,
- data);
-   walk_tree_without_duplicates (_CLAUSE_REDUCTION_MERGE (nc),
- replace_reduction_placeholders,
- data);
+
+   

Re: [PATCH] Fix incorrect loop exit edge probability [PR103270]

2021-11-23 Thread Jan Hubicka via Gcc-patches
> On Tue, Nov 23, 2021 at 6:52 AM Xionghu Luo  wrote:
> >
> > r12-4526 cancelled jump thread paths that rotate loops. This exposes an issue
> > in profile-estimate: in predict_extra_loop_exits, the outer loop's exit edge
> > is marked as the inner loop's extra loop exit and given an incorrect
> > prediction, so a hot inner loop eventually becomes a cold loop through later
> > optimizations. This patch ignores EDGE_DFS_BACK edges when searching for
> > extra exit edges to avoid the unexpected predict_edge.
> 
> Not sure how outer vs. inner loop exit correlates with EDGE_DFS_BACK;
> I would have expected a check based on which loop the edge exits instead.
> A backedge should never be an exit, no?
> 
> Note that the profile pass does not yet mark backedges so EDGE_DFS_BACK
> settings are unreliable.

So we have two nested loops and an exit edge that leaves the inner loop and
exits both loops.  While processing the outer loop we set a pretty high exit
probability that is not good for the inner loop?

I guess we could just check whether the exit edge's source basic block has the
same loop depth as the loop we are processing?
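
As an illustration of the check suggested above, here is a minimal sketch on a toy model. The types toy_loop, toy_bb and toy_edge and the function name are hypothetical stand-ins, not GCC's real loop or CFG API; they only carry the fields the comparison needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's loop and CFG structures.  The names toy_loop,
   toy_bb and toy_edge are hypothetical and only model the fields needed.  */
struct toy_loop { int depth; };
struct toy_bb { const struct toy_loop *loop_father; };
struct toy_edge { const struct toy_bb *src; };

/* Return true if the source block of exit edge E sits at the same loop
   depth as LOOP, i.e. the edge leaves LOOP from LOOP's own body rather
   than from a more deeply nested loop.  */
bool
exit_source_at_loop_depth_p (const struct toy_edge *e,
                             const struct toy_loop *loop)
{
  assert (e != NULL && e->src != NULL && e->src->loop_father != NULL);
  return e->src->loop_father->depth == loop->depth;
}
```

With an outer loop at depth 1 and an inner loop at depth 2, an edge whose source block belongs to the inner loop fails the test when the outer loop is being processed, so under this sketch it would not be predicted as an extra exit of the outer loop.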

Honza
> 
> Richard.
> 
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/103270
> > * predict.c (predict_extra_loop_exits): Ignore EDGE_DFS_BACK edge.
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/103270
> > * predict.c (predict_extra_loop_exits): New.
> > ---
> >  gcc/predict.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/gcc/predict.c b/gcc/predict.c
> > index 68b11135680..1ae8ccff72c 100644
> > --- a/gcc/predict.c
> > +++ b/gcc/predict.c
> > @@ -1910,6 +1910,10 @@ predict_extra_loop_exits (edge exit_edge)
> > continue;
> >if ((check_value_one ^ integer_onep (val)) == 1)
> > continue;
> > +#if 0
> > +  if (e->flags & EDGE_DFS_BACK)
> > +   continue;
> > +#endif
> >if (EDGE_COUNT (e->src->succs) != 1)
> > {
> >   predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, 
> > NOT_TAKEN);
> > --
> > 2.25.1
> >


Re: Improve byte-wise DSE (modref-dse-[45].c failures)

2021-11-23 Thread Richard Biener via Gcc-patches
On Tue, Nov 23, 2021 at 8:26 AM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> testcases modref-dse-4.c and modref-dse-5.c fail on some targets because they
> depend on store merging.  What really happens is that without store merging
> we produce for kill_me a combined write that is an ao_ref with offset=0,
> size=32 and max_size=96.  We have size != max_size because we do not track
> the info that all 3 writes must happen as a group, and so consider the case
> where only some of them are done.
>
> This disables byte-wise DSE, which checks that size == max_size.  This check
> is completely unnecessary for the store being proved dead or for the load
> being checked not to read live bytes.  It is only necessary for the kill
> store that is used to prove that a given store is dead.
>
> While looking into this I also noticed that we check that everything is byte
> aligned.  This is also unnecessary, and with access merging in modref it may
> more commonly fire on accesses that we could otherwise handle.
>
> This patch fixes both and also changes the interface of normalize_ref, which
> I found confusing since it modifies the ref.  Instead we have get_byte_range,
> which computes the range in bytes (since that is what we need to maintain the
> bitmap) and has an additional parameter specifying whether the store in
> question should be turned into a sub-range or a super-range, depending on
> whether we compute the range for a kill or a load.
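
The sub-range versus super-range distinction can be sketched as follows. This is a simplified model with plain integers, not the patch's get_byte_range (which works on ao_ref and poly_int values); the names byte_range, byte_range_containing and byte_range_contained_in are hypothetical, and non-negative offsets are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BITS_PER_UNIT 8

/* A byte range [start, start + size).  */
struct byte_range { long start; long size; };

/* Smallest byte range containing the bit range
   [bit_offset, bit_offset + bit_size).  This over-approximates, which is
   the safe direction for a load: every byte it may read is included.  */
struct byte_range
byte_range_containing (long bit_offset, long bit_size)
{
  assert (bit_offset >= 0 && bit_size > 0);
  long first = bit_offset / TOY_BITS_PER_UNIT;
  long end = (bit_offset + bit_size + TOY_BITS_PER_UNIT - 1)
             / TOY_BITS_PER_UNIT;
  return (struct byte_range) { first, end - first };
}

/* Largest byte range fully contained in the bit range.  This
   under-approximates, which is the safe direction for a kill: only bytes
   that are certainly overwritten are included.  Returns false when no
   whole byte is covered.  */
bool
byte_range_contained_in (long bit_offset, long bit_size,
                         struct byte_range *out)
{
  assert (bit_offset >= 0 && bit_size > 0 && out != NULL);
  long first = (bit_offset + TOY_BITS_PER_UNIT - 1) / TOY_BITS_PER_UNIT;
  long end = (bit_offset + bit_size) / TOY_BITS_PER_UNIT;
  if (end <= first)
    return false;
  out->start = first;
  out->size = end - first;
  return true;
}
```

For a 24-bit access starting at bit 4, the containing range is bytes 0..3 while the contained range is only bytes 1..2; a 4-bit access at bit 4 covers no whole byte and cannot act as a kill in this model.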
>
> Bootstrapped/regtested x86_64-linux OK?

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2021-11-23  Jan Hubicka  
>
> * tree-ssa-dse.c (valid_ao_ref_for_dse): Rename to ...
> (valid_ao_ref_kill_for_dse): ... this; do not check that boundaries
> are divisible by BITS_PER_UNIT.
> (get_byte_aligned_range_containing_ref): New function.
> (get_byte_aligned_range_contained_in_ref): New function.
> (normalize_ref): Rename to ...
> (get_byte_range): ... this one; handle accesses not aligned to byte
> boundary; return range in bytes rather than updating ao_ref.
> (clear_live_bytes_for_ref): Take write ref by reference; simplify using
> get_byte_access.
> (setup_live_bytes_from_ref): Likewise.
> (clear_bytes_written_by): Update.
> (live_bytes_read): Update.
> (dse_classify_store): Simplify tech before live_bytes_read checks.
>
> gcc/testsuite/ChangeLog:
>
> 2021-11-23  Jan Hubicka  
>
> * gcc.dg/tree-ssa/modref-dse-4.c: Update template.
> * gcc.dg/tree-ssa/modref-dse-5.c: Update template.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c
> index 81aa7dc587c..19e91b00f15 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-dse2-details"  } */
> +/* { dg-options "-O2 -fdump-tree-dse1-details"  } */
>  struct a {int a,b,c;};
>  __attribute__ ((noinline))
>  void
> @@ -23,4 +23,4 @@ set (struct a *a)
>my_pleasure (a);
>a->b=1;
>  }
> -/* { dg-final { scan-tree-dump "Deleted dead store: kill_me" "dse2" } } */
> +/* { dg-final { scan-tree-dump "Deleted dead store: kill_me" "dse1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
> index ad35b70136f..dc2c2892615 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-dse2-details"  } */
> +/* { dg-options "-O2 -fdump-tree-dse1-details"  } */
>  struct a {int a,b,c;};
>  __attribute__ ((noinline))
>  void
> @@ -36,8 +36,7 @@ set (struct a *a)
>  {
>wrap (0, a);
>int ret = wrap2 (0, a);
> -  //int ret = my_pleasure (a);
>a->b=1;
>return ret;
>  }
> -/* { dg-final { scan-tree-dump "Deleted dead store: wrap" "dse2" } } */
> +/* { dg-final { scan-tree-dump "Deleted dead store: wrap" "dse1" } } */
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 9531d892f76..8717d654e5a 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -156,57 +156,137 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
>  }
>
>  /* Given REF from the alias oracle, return TRUE if it is a valid
> -   memory reference for dead store elimination, false otherwise.
> +   kill memory reference for dead store elimination, false otherwise.
>
> In particular, the reference must have a known base, known maximum
> size, start at a byte offset and have a size that is one or more
> bytes.  */
>
>  static bool
> -valid_ao_ref_for_dse (ao_ref *ref)
> +valid_ao_ref_kill_for_dse (ao_ref *ref)
>  {
>return (ao_ref_base (ref)
>   && known_size_p (ref->max_size)
>   && maybe_ne (ref->size, 0)
>   && known_eq (ref->max_size, ref->size)
> - && known_ge (ref->offset, 0)
> - && multiple_p (ref->offset, BITS_PER_UNIT)
> 

Re: [PATCH] Fix incorrect loop exit edge probability [PR103270]

2021-11-23 Thread Richard Biener via Gcc-patches
On Tue, Nov 23, 2021 at 6:52 AM Xionghu Luo  wrote:
>
> r12-4526 cancelled jump thread paths that rotate loops. This exposes an issue
> in profile-estimate: in predict_extra_loop_exits, the outer loop's exit edge
> is marked as the inner loop's extra loop exit and given an incorrect
> prediction, so a hot inner loop eventually becomes a cold loop through later
> optimizations. This patch ignores EDGE_DFS_BACK edges when searching for
> extra exit edges to avoid the unexpected predict_edge.

Not sure how outer vs. inner loop exit correlates with EDGE_DFS_BACK;
I would have expected a check based on which loop the edge exits instead.
A backedge should never be an exit, no?

Note that the profile pass does not yet mark backedges so EDGE_DFS_BACK
settings are unreliable.

Richard.

>
> gcc/ChangeLog:
>
> PR middle-end/103270
> * predict.c (predict_extra_loop_exits): Ignore EDGE_DFS_BACK edge.
>
> gcc/ChangeLog:
>
> PR middle-end/103270
> * predict.c (predict_extra_loop_exits): New.
> ---
>  gcc/predict.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 68b11135680..1ae8ccff72c 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -1910,6 +1910,10 @@ predict_extra_loop_exits (edge exit_edge)
> continue;
>if ((check_value_one ^ integer_onep (val)) == 1)
> continue;
> +#if 0
> +  if (e->flags & EDGE_DFS_BACK)
> +   continue;
> +#endif
>if (EDGE_COUNT (e->src->succs) != 1)
> {
>   predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
> --
> 2.25.1
>


Re: [PATCH 2/2][GCC] arm: Declare MVE types internally via pragma

2021-11-23 Thread Murray Steele via Gcc-patches
On 18/11/2021 15:45, Richard Earnshaw wrote:

> 
> This is mostly OK, but can't we reduce the number of tests somewhat? For 
> example, I think you can merge type_redef_13.c and type_redef_14.c by writing
> 
> /* { dg-do compile } */
> /* { dg-require-effective-target arm_v8_1m_mve_ok } */
> /* { dg-add-options arm_v8_1m_mve } */
> 
> int uint8x16x4_t; /* { dg-message "note: previous declaration of 'uint8x16x4_t'" } */
> int uint16x8x2_t; /* { dg-message "note: previous declaration of 'uint16x8x2_t'" } */
> 
> #pragma GCC arm "arm_mve_types.h"  /* { dg-error {'uint8x16x4_t' redeclared} } */
>   /* { dg-error {'uint16x8x2_t' redeclared} {target *-*-*} .-1 } */
> 
> etc.  Note the second dg-error is anchored to the line above it (.-1).
> 
> R.

Thanks. I think if we'd like to reduce the number of tests, it would make the
most sense to merge the test cases in the way you've described based on their
implementation and target features, i.e.

- type_redef_1.c : covers mve_pred16_t.
- type_redef_2.c : covers single-integer-vector types.
- type_redef_3.c : covers single-float-vector types.
- type_redef_4.c : covers integer-vector-tuple types.
- type_redef_5.c : covers float-vector-tuple types.

The idea is that the results of these tests should allow someone to
triangulate the cause of a failure. For example, if tests 4 and 5 fail, it is
likely due to a deficiency in the MVE tuple type implementation, rather than
the handling of target-specific features. More specific test failures can be
determined by looking through the log files.

Thanks,
Murray


Re: [PATCH v2] Canonicalize &MEM[ssa_n, CST] to ssa_n p+ CST in fold_stmt_1

2021-11-23 Thread Richard Biener via Gcc-patches
On Tue, Nov 23, 2021 at 6:30 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> This is a new version of the patch to fix PR 102216.
> Instead of doing the canonicalization inside forwprop, Richi
> mentioned we should do it inside fold_stmt_1 and that is what
> this patch does.
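
The rewrite described above (the address of a memory reference with an SSA-name base and a constant offset becoming a pointer-plus of that base and offset) can be sketched on a toy expression tree. The toy_tree type, the TOY_* codes and toy_canonicalize_addr are hypothetical and are not GCC's tree or gimple API; the sketch only mirrors the shape of the pattern match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified expression codes.  */
enum toy_code
{
  TOY_SSA_NAME,
  TOY_INTEGER_CST,
  TOY_ADDR_EXPR,
  TOY_MEM_REF,
  TOY_POINTER_PLUS_EXPR
};

struct toy_tree
{
  enum toy_code code;
  struct toy_tree *op0;
  struct toy_tree *op1;
  long value;			/* Only meaningful for TOY_INTEGER_CST.  */
};

/* If EXPR is ADDR_EXPR <MEM_REF <ssa_name, constant>>, rewrite it in place
   to POINTER_PLUS_EXPR <ssa_name, constant> and return true.  Note the
   result has two operands where the original had one.  */
bool
toy_canonicalize_addr (struct toy_tree *expr)
{
  assert (expr != NULL);
  if (expr->code != TOY_ADDR_EXPR)
    return false;

  struct toy_tree *inner = expr->op0;
  if (inner == NULL || inner->code != TOY_MEM_REF
      || inner->op0 == NULL || inner->op0->code != TOY_SSA_NAME
      || inner->op1 == NULL || inner->op1->code != TOY_INTEGER_CST)
    return false;

  expr->code = TOY_POINTER_PLUS_EXPR;
  expr->op0 = inner->op0;
  expr->op1 = inner->op1;
  return true;
}
```

The change in operand count is the point of the patch comment about in-place folding: a single-operand right-hand side is replaced by a two-operand one, which in the real code needs gimple_assign_set_rhs_with_ops rather than a simple operand update.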
>
> PR tree-optimization/102216
>
> gcc/ChangeLog:
>
> * gimple-fold.c (fold_stmt_1): Add canonicalization
> of "&MEM[ssa_n, CST]" to "ssa_n p+ CST", note this
> can only be done if !in_place.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/pr102216-1.C: New test.
> * g++.dg/tree-ssa/pr102216-2.C: New test.
> ---
>  gcc/gimple-fold.c  | 21 ++
>  gcc/testsuite/g++.dg/tree-ssa/pr102216-1.C | 21 ++
>  gcc/testsuite/g++.dg/tree-ssa/pr102216-2.C | 45 ++
>  3 files changed, 87 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr102216-1.C
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr102216-2.C
>
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index ad9703ee471..aab6818c93f 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -6061,6 +6061,27 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, 
> tree (*valueize) (tree))
>   if (REFERENCE_CLASS_P (*lhs)
>   && maybe_canonicalize_mem_ref_addr (lhs))
> changed = true;
> + /* Canonicalize &MEM[ssa_n, CST] to ssa_n p+ CST.
> +This cannot be done in maybe_canonicalize_mem_ref_addr
> +as the gimple now has two operands rather than one.
> +The same reason why this can't be done in
> +maybe_canonicalize_mem_ref_addr is the same reason why
> +this can't be done inplace.  */
> + if (!inplace && TREE_CODE (*rhs) == ADDR_EXPR)
> +   {
> + tree inner = TREE_OPERAND (*rhs, 0);
> + if (TREE_CODE (inner) == MEM_REF
> + && TREE_CODE (TREE_OPERAND (inner, 0)) == SSA_NAME

fold_stmt also works pre-SSA, so instead check for != ADDR_EXPR here

> + && TREE_CODE (TREE_OPERAND (inner, 1)) == INTEGER_CST)
> +   {
> + tree ptr = TREE_OPERAND (inner, 0);
> + tree addon = TREE_OPERAND (inner, 1);
> + addon = fold_convert (sizetype, addon);
> + gimple_assign_set_rhs_with_ops (gsi, POINTER_PLUS_EXPR,
> + ptr, addon);

please update 'stmt' here from gsi

OK with those changes.

Thanks,
Richard.

> + changed = true;
> +   }
> +   }
> }
>else
> {
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr102216-1.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr102216-1.C
> new file mode 100644
> index 000..21f7f6797ff
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr102216-1.C
> @@ -0,0 +1,21 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +void link_error ();
> +void g ()
> +{
> +  const char **language_names;
> +
> +  language_names = new const char *[6];
> +
> +  const char **language_names_p = language_names;
> +
> +  language_names_p++;
> +  language_names_p++;
> +  language_names_p++;
> +
> +  if ( (language_names_p) - (language_names+3) != 0)
> +link_error();
> +  delete[] language_names;
> +}
> +/* We should have removed the link_error on the gimple level as GCC should
> +   be able to tell that language_names_p is the same as language_names+3.  */
> +/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" } } */
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr102216-2.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr102216-2.C
> new file mode 100644
> index 000..8d351a9bad0
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr102216-2.C
> @@ -0,0 +1,45 @@
> +/* { dg-options "-O2 -Wall" } */
> +#include <algorithm>
> +
> +static inline bool
> +compare_cstrings (const char *str1, const char *str2)
> +{
> +  return str1 < str2;
> +}
> +
> +void
> +add_set_language_command ()
> +{
> +  static const char **language_names;
> +
> +  language_names = new const char *[6];
> +
> +  language_names[0] = "auto";
> +  language_names[1] = "local";
> +  language_names[2] = "unknown";
> +
> +  const char **language_names_p = language_names;
> +  /* language_names_p == &language_names[0].  */
> +  language_names_p++;
> +  /* language_names_p == &language_names[1].  */
> +  language_names_p++;
> +  /* language_names_p == &language_names[2].  */
> +  language_names_p++;
> +  /* language_names_p == &language_names[3].  */
> +
> +  const char **sort_begin;
> +
> +  if (0)
> +sort_begin = &language_names[3];
> +  else
> +sort_begin = language_names_p;
> +
> +  language_names[3] = "";
> +  language_names[4] = "";
> +  language_names[5] = NULL;
> +
> +  /* There should be no warning associated with this std::sort as
> + sort_begin != &language_names[5] and GCC should be able to figure
> + that out.  */
> +  std::sort (sort_begin, &language_names[5], 

Re: [PATCH] fixincludes: don't abort() on access failure [PR103306]

2021-11-23 Thread Xi Ruoyao via Gcc-patches
On Mon, 2021-11-22 at 17:37 -0700, Jeff Law wrote:
> 
> 
> On 11/18/2021 4:01 AM, Xi Ruoyao via Gcc-patches wrote:
> > Some distros may ship dangling symlinks in include directories, which
> > trigger the access failure.  Skip such files and continue to the next
> > header instead of aborting.
> > 
> > Restore to old behavior before r12-5234 but without resurrecting the
> > problematic getcwd() call, by using the environment variable "INPUT"
> > exported by fixinc.sh.
> > 
> > Tested on x86_64-linux-gnu, with a dangling symlink in /usr/include.
> > 
> > fixincludes/
> > 
> > PR bootstrap/103306
> > * fixincl.c (process): Don't call abort().
> > ---
> >   fixincludes/fixincl.c | 15 ---
> >   1 file changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
> > index a17b65866c3..81939ee5ffa 100644
> > --- a/fixincludes/fixincl.c
> > +++ b/fixincludes/fixincl.c
> > @@ -1352,10 +1352,19 @@ process (void)
> >   
> >     if (access (pz_curr_file, R_OK) != 0)
> >   {
> > -  /* Some really strange error happened.  */
> > -  fprintf (stderr, "Cannot access %s: %s\n", pz_curr_file,
> > +  /* It may happens if for e. g. the distro ships some broken symlinks
> > +   * in /usr/include.  */
> > +
> > +  /* "INPUT" is exported in fixinc.sh, which is the pwd where fixincl
> > +   * runs.  It's used instead of getcwd to avoid allocating a buffer
> > +   * with unknown length.  */
> Formatting nits.  We don't use '*' at the start of comment lines. Drop
> the '*' like this
> 
>    /* blah blah blah
>   more text.  */

Strangely contrib/check_GNU_style.sh does not warn about this.

> 
> > +  const char *cwd = getenv ("INPUT");
> > +  if (!cwd)
> > +   cwd = "the working directory";
> > +
> > +  fprintf (stderr, "Cannot access %s from %s: %s\n", pz_curr_file, cwd,
> >    xstrerror (errno));
> > -  abort ();
> > +  return;
> >   }
> If INPUT is always exported, why not just print it? ie, would CWD ever
> actually be NULL?

INPUT is set by fixinc.sh.  During the GCC build, fixincl is always
invoked by fixinc.sh.  However, someone may run the fixincl executable
directly for debugging.
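
A standalone sketch of the skip-instead-of-abort behaviour, including the fallback for a missing INPUT variable, might look like the following. The function name check_readable_or_skip is hypothetical, and standard strerror is used here in place of libiberty's xstrerror so the example builds on its own; this is not the fixincl.c code itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return true if PATH is readable.  Otherwise print a diagnostic that
   names the directory taken from the INPUT environment variable (or a
   generic phrase when INPUT is unset, e.g. when run by hand for
   debugging) and return false so the caller can skip the file.  */
bool
check_readable_or_skip (const char *path)
{
  if (access (path, R_OK) == 0)
    return true;

  /* Save errno before any further library calls can change it.  */
  int saved_errno = errno;

  const char *cwd = getenv ("INPUT");
  if (!cwd)
    cwd = "the working directory";

  fprintf (stderr, "Cannot access %s from %s: %s\n", path, cwd,
           strerror (saved_errno));
  return false;
}
```

A caller looping over header names would simply continue with the next file when this returns false, which is the behaviour the patch restores for dangling symlinks.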

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3] c-format: Add -Wformat-int-precision option [PR80060]

2021-11-23 Thread Daniil Stas via Gcc-patches
On Mon, 22 Nov 2021 20:35:03 +
Joseph Myers  wrote:

> On Sun, 21 Nov 2021, Daniil Stas via Gcc-patches wrote:
> 
> > This option is enabled by default when -Wformat option is enabled. A
> > user can specify -Wno-format-int-precision to disable emitting
> > warnings when passing an argument of an incompatible integer type to
> > a 'd', 'i', 'o', 'u', 'x', or 'X' conversion specifier when it has
> > the same precision as the expected type.  
> 
> I'd expect this to apply to 'b' and 'B' as well (affects commit
> message, ChangeLog entry, option help string, documentation).
> 

Hi Joseph,

I can't find any description of these specifiers anywhere. It also looks
like gcc doesn't recognize them when I try to compile a sample program
that uses them (I just get %B printed when I run the program).
Do these specifiers actually exist? Can you point me to the
documentation?

Thanks


Re: [PATCH, rs6000] optimization for vec_reve builtin [PR100868]

2021-11-23 Thread HAO CHEN GUI via Gcc-patches
Thanks for your review. Committed as r12-5463.

On 22/11/2021 上午 10:56, David Edelsohn wrote:
> On Wed, Nov 17, 2021 at 3:28 AM HAO CHEN GUI  wrote:
>> Hi,
>>
>>   This patch optimizes the vec_reve builtin on rs6000. For V2DI and V2DF, it
>> is implemented by xxswapd on all targets. For V16QI, V8HI, V4SI and V4SF, it
>> is implemented by quadword byte reverse plus halfword/word byte reverse when
>> p9_vector is set.
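
Why a quadword byte reverse followed by a per-element byte reverse yields an element reversal can be checked with a scalar model. The helpers below are hypothetical and operate on a plain 16-byte array; they model the effect of xxbrq and of xxbrh/xxbrw, and are not the rs6000 patterns or intrinsics themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all 16 bytes of a quadword (scalar model of a quadword byte
   reverse).  */
void
reverse_quadword_bytes (uint8_t v[16])
{
  for (int i = 0; i < 8; i++)
    {
      uint8_t t = v[i];
      v[i] = v[15 - i];
      v[15 - i] = t;
    }
}

/* Reverse the bytes inside each ELT_SIZE-byte element (scalar model of a
   halfword byte reverse for 2 or a word byte reverse for 4).  */
void
reverse_bytes_in_elements (uint8_t v[16], int elt_size)
{
  assert (elt_size > 0 && 16 % elt_size == 0);
  for (int base = 0; base < 16; base += elt_size)
    for (int i = 0; i < elt_size / 2; i++)
      {
        uint8_t t = v[base + i];
        v[base + i] = v[base + elt_size - 1 - i];
        v[base + elt_size - 1 - i] = t;
      }
}

/* Reverse the order of the elements while keeping each element's own
   byte order: the first step moves element k to position n-1-k but
   mirrors its bytes, and the second step undoes that mirroring.  */
void
reverse_elements (uint8_t v[16], int elt_size)
{
  reverse_quadword_bytes (v);
  if (elt_size > 1)
    reverse_bytes_in_elements (v, elt_size);
}
```

For byte elements the first step alone is sufficient, which matches the description of V16QI needing only the quadword reverse.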
>>
>>   Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
>> okay for trunk? Any recommendations? Thanks a lot.
>>
>> ChangeLog
>> 2021-11-17 Haochen Gui 
>>
>> gcc/
>> * config/rs6000/altivec.md (altivec_vreve2 for VEC_K): Use
>> xxbrq for v16qi, xxbrq + xxbrh for v8hi and xxbrq + xxbrw for v4si
>> or v4sf when p9_vector is set.
>> (altivec_vreve2 for VEC_64): Defined. Implemented by xxswapd.
>>
>> gcc/testsuite/
>> * gcc.target/powerpc/vec_reve_1.c: New test.
>> * gcc.target/powerpc/vec_reve_2.c: Likewise.
> This is okay.
>
> Please don't send a message that contains the patch as both an inline
> message and as an attachment.
>
> Thanks, David


Re: [PATCH] [RFC][PR102768] aarch64: Add compiler support for Shadow Call Stack

2021-11-23 Thread Dan Li via Gcc-patches

Hi Szabolcs,

First of all, apologies for my late reply (since I just had a new baby,
I'm quite busy recently and also because I'm not familiar with C++
exception handling, it takes me some time to learn this part).

On 11/3/21 8:00 PM, Szabolcs Nagy wrote:

The 11/03/2021 00:24, Dan Li wrote:

On 11/2/21 9:04 PM, Szabolcs Nagy wrote:

The 11/02/2021 00:06, Dan Li via Gcc-patches wrote:

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as described in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768


i'm not a gcc maintainer, but i prefer such feature
to be in upstream gcc instead of in a plugin.

it will require update to the documentation:

which should mention that it depends on -ffixed-x18
(probably that should be enforced too) which is an
important abi issue: functions following the normal
pcs can clobber x18 and break scs.


Thanks Szabolcs, I will update the documentation in next version.

It sounds reasonable to enforce -ffixed-x18 with scs, but I see
that clang doesn’t do that. Maybe it is better to be consistent
with clang here?


i mean gcc can issue a diagnostic if -ffixed-x18 is not passed.
(it seems clang rejects scs too without -ffixed-x18)


Oh, yes, you are right. Clang rejects scs without -ffixed-x18[1],
I should add a similar check in next version.

and that there is no unwinder support.


Ok, let me try to add support for this.


i assume exception handling info has to change for scs to
work (to pop the shadow stack when transferring control),
so either scs must require -fno-exceptions or the eh info
changes must be implemented.

i think the kernel does not require exceptions and does
not depend on the unwinder runtime in libgcc, so this
is optional for the linux kernel use-case.


I recompiled a glibc and gcc runtime library with -ffixed-x18 enabled.
As you said, the scs stack needs to be popped at the same time during
exception handling.

I saw that Clang handles this by adding a
".cfi_escape 0x16, 0x12, 0x02, 0x82, 0x78"
directive (x18 -= 8;) after each scs push it emits[2].
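
To show how those five bytes read as "x18 -= 8", here is a small decoding sketch. The opcode values for DW_CFA_val_expression and DW_OP_breg0 are taken from the DWARF standard; the struct scs_cfi and the function names are hypothetical, and only single-byte ULEB128 operands are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DWARF constants used below (values from the DWARF standard).  */
#define DW_CFA_val_expression 0x16
#define DW_OP_breg0 0x70

/* Decode one SLEB128 value from BUF (at most LEN bytes) into *VALUE.
   Return the number of bytes consumed, or 0 if the input is truncated.  */
size_t
decode_sleb128 (const uint8_t *buf, size_t len, int64_t *value)
{
  uint64_t result = 0;
  unsigned shift = 0;
  for (size_t i = 0; i < len && shift < 64; i++)
    {
      uint8_t byte = buf[i];
      result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0)
        {
          /* Sign-extend when the sign bit of the last group is set.  */
          if (shift < 64 && (byte & 0x40))
            result |= ~(uint64_t) 0 << shift;
          *value = (int64_t) result;
          return i + 1;
        }
    }
  return 0;
}

/* Decoded form: the caller's value of REG is BASE_REG + OFFSET.  */
struct scs_cfi { unsigned reg; unsigned base_reg; int64_t offset; };

/* Decode DW_CFA_val_expression <reg> <len> DW_OP_bregN <offset>.
   Return 1 on success, 0 if the bytes do not have that exact shape.  */
int
decode_scs_escape (const uint8_t *b, size_t len, struct scs_cfi *out)
{
  assert (b != NULL && out != NULL);
  if (len < 5 || b[0] != DW_CFA_val_expression)
    return 0;
  if ((b[1] & 0x80) || (b[2] & 0x80))	/* Multi-byte ULEB128 not handled.  */
    return 0;
  size_t expr_len = b[2];
  if (expr_len != len - 3
      || b[3] < DW_OP_breg0 || b[3] > DW_OP_breg0 + 31)
    return 0;
  size_t used = decode_sleb128 (b + 4, len - 4, &out->offset);
  if (used != expr_len - 1)
    return 0;
  out->reg = b[1];
  out->base_reg = b[3] - DW_OP_breg0;
  return 1;
}
```

Applied to the bytes 0x16, 0x12, 0x02, 0x82, 0x78, this yields register 18, base register 18 and offset -8: the unwinder is told that the caller's x18 equals the current x18 minus 8, which is why it needs a usable current value for x18 when it evaluates DW_OP_breg18.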

But this directive has problems when executed in libgcc:
1) The context->reg[x] entries in uw_init_context_1 are all based on the cfa;
   most registers have no initial value by default.
2) The address of the shadow call stack (x18) cannot (and should not) be
   calculated based on the cfa, and I have not yet found a way to assign
   hardware register x18 to context->reg[18].
3) This causes libgcc to crash when parsing the .cfi_escape expression because
   of a 0 address dereference (*x18)
   (execute_stack_op => case DW_OP_breg18: _Unwind_GetGR).
4) uw_install_context_1 does not restore all hardware registers by default
   before eh return, so context->reg[18] can't be written directly to hw x18.
   (In clang, __unw_getcontext/__unw_resume will save/restore all hardware
   registers, so this directive works fine in my libunwind test.)

I tried to fix this problem through a patch[3]. Exception handling
works fine in my test environment, but I'm not sure whether this fix is
appropriate, for two reasons:
1) libgcc does not push/pop all registers by default during exception
   handling. Is this change appropriate?
2) The test case may not be able to test this patch, because the test
   environment requires at least a glibc/gcc runtime compiled with
   -ffixed-x18.

Maybe it's better to rely on -fno-exceptions for this patch first? If
the glibc/gcc runtime also supports SCS later, the problem can be fixed
at the same time.

PS:
I'm still not familiar enough with exception handling in libgcc/libunwind,
please correct me if there are any mistakes :)

[1] https://github.com/llvm/llvm-project/commit/f11eb3ebe77729426e562d7d4d7ebb1d5ff2e7c8
[2] https://reviews.llvm.org/D54609
[3] https://gcc.gnu.org/bugzilla/attachment.cgi?id=51854=diff