Re: [PATCH] i386: Optimize code generation of __mm256_zextsi128_si256(__mm_set1_epi8(-1))

2022-09-22 Thread Hongtao Liu via Gcc-patches
On Fri, Sep 23, 2022 at 11:07 AM Hu, Lin1  wrote:
>
> Hi, Hongtao
>
> I have modified this patch and regtested it on x86_64-pc-linux-gnu.
>
Ok.
> BRs.
> Lin
>
> -Original Message-
> From: Hongtao Liu 
> Sent: Friday, September 23, 2022 9:48 AM
> To: Hu, Lin1 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Optimize code generation of 
> __mm256_zextsi128_si256(__mm_set1_epi8(-1))
>
> On Thu, Sep 22, 2022 at 3:20 PM Hu, Lin1 via Gcc-patches 
>  wrote:
> >
> > Hi all,
> >
> > This patch optimizes code generation for
> > _mm256_zextsi128_si256(_mm_set1_epi8(-1)), reducing the number of
> > instructions required to produce the result.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Lin
> >
> > gcc/ChangeLog:
> >
> > PR target/94962
> > * config/i386/constraints.md (BH): New define_constraint.
> > * config/i386/i386.cc (standard_sse_constant_p): Add return 3/4 
> > when operand matches new predicate.
> > (standard_sse_constant_opcode): Add new alternative branch to 
> > return "vpcmpeqd".
> > * config/i386/predicates.md 
> > (vector_all_ones_zero_extend_half_operand): New define_predicate.
> > (vector_all_ones_zero_extend_quarter_operand): Ditto.
> > * config/i386/sse.md: Add constraint to insn "mov_internal".
> (mov_internal): Add new constraint BH.
> Put the insn name at first.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/94962
> > * gcc.target/i386/avx256-unaligned-load-1.c: Modify test.
> > * gcc.target/i386/avx256-unaligned-store-1.c: Ditto.
> > * gcc.target/i386/avx256-unaligned-store-2.c: Ditto.
> > * gcc.target/i386/avx256-unaligned-store-3.c: Ditto.
> > * gcc.target/i386/pr94962-1.c: New test.
> > * gcc.target/i386/pr94962-2.c: Ditto.
> > * gcc.target/i386/pr94962-3.c: Ditto.
> > * gcc.target/i386/pr94962-4.c: Ditto.
> > ---
> >  gcc/config/i386/constraints.md|  8 +++
> >  gcc/config/i386/i386.cc   | 26 +++-
> >  gcc/config/i386/predicates.md | 49 ++
> >  gcc/config/i386/sse.md|  8 +--
> >  .../gcc.target/i386/avx256-unaligned-load-1.c |  4 +-
> >  .../i386/avx256-unaligned-store-1.c   |  4 +-
> >  .../i386/avx256-unaligned-store-2.c   |  4 +-
> >  .../i386/avx256-unaligned-store-3.c   |  4 +-
> >  gcc/testsuite/gcc.target/i386/pr94962-1.c | 11 
> >  gcc/testsuite/gcc.target/i386/pr94962-2.c | 17 +
> >  gcc/testsuite/gcc.target/i386/pr94962-3.c | 64 +++
> >  gcc/testsuite/gcc.target/i386/pr94962-4.c | 49 ++
> >  12 files changed, 235 insertions(+), 13 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-4.c
> >
> > diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> > index 7361687632f..95b2b142d41 100644
> > --- a/gcc/config/i386/constraints.md
> > +++ b/gcc/config/i386/constraints.md
> > @@ -168,6 +168,9 @@
> >  ;;  z  Constant call address operand.
> >  ;;  C  Integer SSE constant with all bits set operand.
> >  ;;  F  Floating-point SSE constant with all bits set operand.
> > +;;  H  Integer SSE constant that is 128/256bit all ones
> > +;; and zero-extend to 256/512bit, or 128bit all ones
> > +;; and zero-extend to 512bit.
> >  ;;  M  x86-64 memory operand.
> >
> >  (define_constraint "Bf"
> > @@ -233,6 +236,11 @@
> >(and (match_test "TARGET_SSE")
> > (match_operand 0 "float_vector_all_ones_operand")))
> >
> > +(define_constraint "BH"
> > +  "@internal integer constant with last half/quarter bits set operand."
> > +  (ior (match_operand 0 "vector_all_ones_zero_extend_half_operand")
> > +   (match_operand 0 "vector_all_ones_zero_extend_quarter_operand")))
> > +
> >  ;; NB: Similar to 'm', but don't use define_memory_constraint on x86-64
> >  ;; to prevent LRA from converting the operand to the form '(mem (reg X))'
> >  ;; where X is a base register.
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index dadf453d6c0..ca799da5d7e 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -5186,7 +5186,8 @@ standard_80387_constant_rtx (int idx)
> >XFmode);
> >  }
> >
> > -/* Return 1 if X is all bits 0 and 2 if X is all bits 1
> > +/* Return 1 if X is all bits 0, 2 if X is all bits 1
> > +   and 3 if X is all bits 1 with zero extend
> > in supported SSE/AVX vector mode.  */
> >
> >  int
> > @@ -5234,6 +5235,10 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
> > }
> >  }
> >
> > +  if (vector_all_ones_zero_extend_half_operand (x, mode)
> > +  || 

RE: [PATCH] i386: Optimize code generation of __mm256_zextsi128_si256(__mm_set1_epi8(-1))

2022-09-22 Thread Hu, Lin1 via Gcc-patches
Hi, Hongtao

I have modified this patch and regtested it on x86_64-pc-linux-gnu.

BRs.
Lin

-Original Message-
From: Hongtao Liu  
Sent: Friday, September 23, 2022 9:48 AM
To: Hu, Lin1 
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
Subject: Re: [PATCH] i386: Optimize code generation of 
__mm256_zextsi128_si256(__mm_set1_epi8(-1))

On Thu, Sep 22, 2022 at 3:20 PM Hu, Lin1 via Gcc-patches 
 wrote:
>
> Hi all,
>
> This patch optimizes code generation for
> _mm256_zextsi128_si256(_mm_set1_epi8(-1)), reducing the number of
> instructions required to produce the result.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> PR target/94962
> * config/i386/constraints.md (BH): New define_constraint.
> * config/i386/i386.cc (standard_sse_constant_p): Add return 3/4 when 
> operand matches new predicate.
> (standard_sse_constant_opcode): Add new alternative branch to return 
> "vpcmpeqd".
> * config/i386/predicates.md 
> (vector_all_ones_zero_extend_half_operand): New define_predicate.
> (vector_all_ones_zero_extend_quarter_operand): Ditto.
> * config/i386/sse.md: Add constraint to insn "mov_internal".
(mov_internal): Add new constraint BH.
Put the insn name at first.
>
> gcc/testsuite/ChangeLog:
>
> PR target/94962
> * gcc.target/i386/avx256-unaligned-load-1.c: Modify test.
> * gcc.target/i386/avx256-unaligned-store-1.c: Ditto.
> * gcc.target/i386/avx256-unaligned-store-2.c: Ditto.
> * gcc.target/i386/avx256-unaligned-store-3.c: Ditto.
> * gcc.target/i386/pr94962-1.c: New test.
> * gcc.target/i386/pr94962-2.c: Ditto.
> * gcc.target/i386/pr94962-3.c: Ditto.
> * gcc.target/i386/pr94962-4.c: Ditto.
> ---
>  gcc/config/i386/constraints.md|  8 +++
>  gcc/config/i386/i386.cc   | 26 +++-
>  gcc/config/i386/predicates.md | 49 ++
>  gcc/config/i386/sse.md|  8 +--
>  .../gcc.target/i386/avx256-unaligned-load-1.c |  4 +-
>  .../i386/avx256-unaligned-store-1.c   |  4 +-
>  .../i386/avx256-unaligned-store-2.c   |  4 +-
>  .../i386/avx256-unaligned-store-3.c   |  4 +-
>  gcc/testsuite/gcc.target/i386/pr94962-1.c | 11 
>  gcc/testsuite/gcc.target/i386/pr94962-2.c | 17 +
>  gcc/testsuite/gcc.target/i386/pr94962-3.c | 64 +++
>  gcc/testsuite/gcc.target/i386/pr94962-4.c | 49 ++
>  12 files changed, 235 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-4.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 7361687632f..95b2b142d41 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -168,6 +168,9 @@
>  ;;  z  Constant call address operand.
>  ;;  C  Integer SSE constant with all bits set operand.
>  ;;  F  Floating-point SSE constant with all bits set operand.
> +;;  H  Integer SSE constant that is 128/256bit all ones
> +;; and zero-extend to 256/512bit, or 128bit all ones
> +;; and zero-extend to 512bit.
>  ;;  M  x86-64 memory operand.
>
>  (define_constraint "Bf"
> @@ -233,6 +236,11 @@
>(and (match_test "TARGET_SSE")
> (match_operand 0 "float_vector_all_ones_operand")))
>
> +(define_constraint "BH"
> +  "@internal integer constant with last half/quarter bits set operand."
> +  (ior (match_operand 0 "vector_all_ones_zero_extend_half_operand")
> +   (match_operand 0 "vector_all_ones_zero_extend_quarter_operand")))
> +
>  ;; NB: Similar to 'm', but don't use define_memory_constraint on x86-64
>  ;; to prevent LRA from converting the operand to the form '(mem (reg X))'
>  ;; where X is a base register.
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index dadf453d6c0..ca799da5d7e 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -5186,7 +5186,8 @@ standard_80387_constant_rtx (int idx)
>XFmode);
>  }
>
> -/* Return 1 if X is all bits 0 and 2 if X is all bits 1
> +/* Return 1 if X is all bits 0, 2 if X is all bits 1
> +   and 3 if X is all bits 1 with zero extend
> in supported SSE/AVX vector mode.  */
>
>  int
> @@ -5234,6 +5235,10 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
> }
>  }
>
> +  if (vector_all_ones_zero_extend_half_operand (x, mode)
> +  || vector_all_ones_zero_extend_quarter_operand (x, mode))
> +return 3;
> +
>return 0;
>  }
>
> @@ -5341,6 +5346,25 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
>   gcc_unreachable ();
> }
> }
> +  else if (vector_all_ones_zero_extend_half_operand 

Re: [PATCH V3] rs6000: cannot_force_const_mem for HIGH code rtx[PR106460]

2022-09-22 Thread Jiufu Guo via Gcc-patches
Hi,

Gentle ping:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601190.html

BR,
Jeff (Jiufu)

Jiufu Guo  writes:

> Hi,
>
> As reported in PR106460, the rtx 'high:DI (symbol_ref:DI ("var_48")' is
> forced into the constant pool, which causes an ICE.  This rtx represents
> only part of an address, so it cannot be placed in a .rodata section.
>
> This patch updates rs6000_cannot_force_const_mem to return true for any rtx
> with HIGH code, because such an rtx denotes part of an address and is not
> valid in the constant pool.
>
> Below are some examples:
> (high:DI (const:DI (plus:DI (symbol_ref:DI ("xx") (const_int 12 [0xc])
> (high:DI (symbol_ref:DI ("var_1")..)))
>
> This patch updates the previous patch, adds a test case that ICEs
> without the patch, and associates the change with a PR.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597712.html
> This patch also updates the commit message of the previous patch V2.
>
> I would appreciate another review of this patch.
>
> Bootstrapped and regtested on ppc64 and ppc64le.
> Is this ok for trunk?
>
> BR,
> Jeff(Jiufu)
>
>   PR target/106460
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem): Return true
>   for HIGH code rtx.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr106460.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc |  7 +--
>  gcc/testsuite/gcc.target/powerpc/pr106460.c | 11 +++
>  2 files changed, 16 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106460.c
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 2f3146e56f8..04e3a393147 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9643,8 +9643,11 @@ rs6000_init_stack_protect_guard (void)
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -  && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  /* If GET_CODE (x) is HIGH, the 'X' represents the high part of a symbol_ref.
> + It indicates partial address,  which can not be put into a constant pool.
> + e.g.  (high:DI (unspec:DI [(symbol_ref/u:DI ("*.LC0")..)
> + (high:DI (symbol_ref:DI ("var")..)).  */
> +  if (GET_CODE (x) == HIGH)
>  return true;
>  
>/* A TLS symbol in the TOC cannot contain a sum.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106460.c b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> new file mode 100644
> index 000..dfaffcb6e28
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> @@ -0,0 +1,11 @@
> +/* { dg-options "-O1 -mdejagnu-cpu=power10" } */
> +
> +/* (high:DI (symbol_ref:DI ("var_48")..))) should not cause ICE. */
> +extern short var_48;
> +void
> +foo (double *r)
> +{
> +  if (var_48)
> +*r = 1234.5678;
> +}
> +


Re: [PATCH] i386: Optimize code generation of __mm256_zextsi128_si256(__mm_set1_epi8(-1))

2022-09-22 Thread Hongtao Liu via Gcc-patches
On Thu, Sep 22, 2022 at 3:20 PM Hu, Lin1 via Gcc-patches
 wrote:
>
> Hi all,
>
> This patch optimizes code generation for
> _mm256_zextsi128_si256(_mm_set1_epi8(-1)), reducing the number of
> instructions required to produce the result.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> PR target/94962
> * config/i386/constraints.md (BH): New define_constraint.
> * config/i386/i386.cc (standard_sse_constant_p): Add return 3/4 when 
> operand matches new predicate.
> (standard_sse_constant_opcode): Add new alternative branch to return 
> "vpcmpeqd".
> * config/i386/predicates.md 
> (vector_all_ones_zero_extend_half_operand): New define_predicate.
> (vector_all_ones_zero_extend_quarter_operand): Ditto.
> * config/i386/sse.md: Add constraint to insn "mov_internal".
(mov_internal): Add new constraint BH.
Put the insn name at first.
>
> gcc/testsuite/ChangeLog:
>
> PR target/94962
> * gcc.target/i386/avx256-unaligned-load-1.c: Modify test.
> * gcc.target/i386/avx256-unaligned-store-1.c: Ditto.
> * gcc.target/i386/avx256-unaligned-store-2.c: Ditto.
> * gcc.target/i386/avx256-unaligned-store-3.c: Ditto.
> * gcc.target/i386/pr94962-1.c: New test.
> * gcc.target/i386/pr94962-2.c: Ditto.
> * gcc.target/i386/pr94962-3.c: Ditto.
> * gcc.target/i386/pr94962-4.c: Ditto.
> ---
>  gcc/config/i386/constraints.md|  8 +++
>  gcc/config/i386/i386.cc   | 26 +++-
>  gcc/config/i386/predicates.md | 49 ++
>  gcc/config/i386/sse.md|  8 +--
>  .../gcc.target/i386/avx256-unaligned-load-1.c |  4 +-
>  .../i386/avx256-unaligned-store-1.c   |  4 +-
>  .../i386/avx256-unaligned-store-2.c   |  4 +-
>  .../i386/avx256-unaligned-store-3.c   |  4 +-
>  gcc/testsuite/gcc.target/i386/pr94962-1.c | 11 
>  gcc/testsuite/gcc.target/i386/pr94962-2.c | 17 +
>  gcc/testsuite/gcc.target/i386/pr94962-3.c | 64 +++
>  gcc/testsuite/gcc.target/i386/pr94962-4.c | 49 ++
>  12 files changed, 235 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-4.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 7361687632f..95b2b142d41 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -168,6 +168,9 @@
>  ;;  z  Constant call address operand.
>  ;;  C  Integer SSE constant with all bits set operand.
>  ;;  F  Floating-point SSE constant with all bits set operand.
> +;;  H  Integer SSE constant that is 128/256bit all ones
> +;; and zero-extend to 256/512bit, or 128bit all ones
> +;; and zero-extend to 512bit.
>  ;;  M  x86-64 memory operand.
>
>  (define_constraint "Bf"
> @@ -233,6 +236,11 @@
>(and (match_test "TARGET_SSE")
> (match_operand 0 "float_vector_all_ones_operand")))
>
> +(define_constraint "BH"
> +  "@internal integer constant with last half/quarter bits set operand."
> +  (ior (match_operand 0 "vector_all_ones_zero_extend_half_operand")
> +   (match_operand 0 "vector_all_ones_zero_extend_quarter_operand")))
> +
>  ;; NB: Similar to 'm', but don't use define_memory_constraint on x86-64
>  ;; to prevent LRA from converting the operand to the form '(mem (reg X))'
>  ;; where X is a base register.
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index dadf453d6c0..ca799da5d7e 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -5186,7 +5186,8 @@ standard_80387_constant_rtx (int idx)
>XFmode);
>  }
>
> -/* Return 1 if X is all bits 0 and 2 if X is all bits 1
> +/* Return 1 if X is all bits 0, 2 if X is all bits 1
> +   and 3 if X is all bits 1 with zero extend
> in supported SSE/AVX vector mode.  */
>
>  int
> @@ -5234,6 +5235,10 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
> }
>  }
>
> +  if (vector_all_ones_zero_extend_half_operand (x, mode)
> +  || vector_all_ones_zero_extend_quarter_operand (x, mode))
> +return 3;
> +
>return 0;
>  }
>
> @@ -5341,6 +5346,25 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
>   gcc_unreachable ();
> }
> }
> +  else if (vector_all_ones_zero_extend_half_operand (x, mode))
> +{
> +  if (GET_MODE_SIZE (mode) == 64)
> +   {
> + gcc_assert (TARGET_AVX512F);
> + return "vpcmpeqd \t %t0, %t0, %t0";
> +   }
> +  else if (GET_MODE_SIZE (mode) == 32)
> +   {
> + gcc_assert (TARGET_AVX);
> + return "vpcmpeqd \t %x0, %x0, %x0";
> +   }
> +  
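
For readers following the thread, here is a minimal, portable sketch of the constant this patch targets: a 256-bit value whose low 128 bits are all ones and whose high 128 bits are zero. It models the vectors as plain byte arrays instead of using the real intrinsics, so every name below (Vec128, Vec256, model_set1_epi8, model_zextsi128_si256, all_ones_zero_extend_half) is hypothetical and not part of the patch or of GCC.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte-array stand-ins for 128-bit and 256-bit vectors (index 0 is the
// least significant byte).  These are illustration-only types.
using Vec128 = std::array<std::uint8_t, 16>;
using Vec256 = std::array<std::uint8_t, 32>;

// Models _mm_set1_epi8 (v): every byte of the 128-bit vector equals v.
Vec128 model_set1_epi8(std::int8_t v) {
    Vec128 r{};
    r.fill(static_cast<std::uint8_t>(v));
    return r;
}

// Models _mm256_zextsi128_si256 (x): the low 128 bits are x and the
// high 128 bits are zero.
Vec256 model_zextsi128_si256(const Vec128& lo) {
    Vec256 r{};  // value-initialized, so all bytes start as zero
    for (std::size_t i = 0; i < lo.size(); ++i)
        r[i] = lo[i];
    return r;
}

// Checks for the shape "all ones in the low half, zero in the high half".
bool all_ones_zero_extend_half(const Vec256& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        const std::uint8_t want = (i < v.size() / 2) ? 0xFF : 0x00;
        if (v[i] != want)
            return false;
    }
    return true;
}
```

Calling all_ones_zero_extend_half (model_zextsi128_si256 (model_set1_epi8 (-1))) yields true in this model. The quoted hunk emits a single vpcmpeqd on the narrower register for such a constant; that works because VEX-encoded 128-bit instructions zero the upper bits of the destination register.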

Re: [RFC PATCH] __trunc{tf, xf, df, sf, hf}bf2, __truncbfhf2 and __extendbfsf2

2022-09-22 Thread Hongtao Liu via Gcc-patches
On Thu, Sep 22, 2022 at 11:56 PM Jakub Jelinek  wrote:
>
> On Tue, Sep 20, 2022 at 10:51:18AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Sep 20, 2022 at 11:35:07AM +0800, Hongtao Liu wrote:
> > > > The question is (mainly for aarch64, arm and x86 backend maintainers) if we
> > > > shouldn't support it, in the PR there is a partial patch to do so, but
> > > > the big question is if it should be supported as the __bf16 type those
> > > > 3 targets use with u6__bf16 mangling and remove those *_invalid_* cases
> > > > and add conversions to/from at least SFmode but probably also DFmode, TFmode
> > > > and XFmode on x86 and implement arithmetics on those through conversion to
> > > > SFmode, performing arithmetics there and conversion back.
> > > > Conversion from BFmode to SFmode is easy, left shift by 16 and ought to be
> > > > implemented inline, SFmode -> BFmode conversion is harder,
> > > > I think it is roughly:
> > > I'm not sure if there should be any floating point exceptions for
> > > BFmode operation.
> > > For x86, there's no floating point exceptions for AVX512_BF16 related
> > > instructions
> >
> > As long as __bf16 is just an extension, supporting or not supporting
> > exceptions on sNaNs is just fine I think, but I'm afraid it is different
> > for std::bfloat16_t.  If we claim we support it (define that type
> > in <stdfloat>, predefine __STD_BFLOAT16_TYPE__), then it needs to follow
> > ISO/IEC/IEEE 60559, and I'm afraid that means also exceptions and the like.
> > While the IEEE spec doesn't cover the exact bfloat16 format, C++ talks about
> > a format with these and these number of bits here and there that behaves
> > like in IEEE otherwise.
> > Whether we support std::bfloat16_t at all is our choice, if we do support
> > it, whether we support it with __bf16 underlying type or come up with
> > something different, it is up to us, and with -ffast-math/-Ofast etc.
> > we can certainly use hw instructions for it which don't raise exceptions.
> >
> > At least that is my limited understanding of it...
>
> I've been playing with this a little bit and here is a soft-fp version of
> IMHO everything we need for proper bfloat16 support.
> In particular, I think we need all the truncating conversions from other
> floating formats that a target with BFmode floating point support (currently
> arm, aarch64 and x86) has, truncating conversion from BFmode to HFmode
> (seems GCC when precision is the same considers conversions truncating)
> and an extension from BFmode to SFmode.  Extensions from BFmode to
> SF/DF/XF/TFmode are IMHO best implemented inside of GCC by performing
> BFmode to SFmode conversion first and then converting SFmode to those
> other formats, other arithmetics on BFmode should be implemented simply
> by widening to SFmode, doing arithmetics there and then converting back.
> The BF to SFmode extension can be also implemented simply by shifting
> the VCEd value up by 16 bits and VCEing the result if flags say
> sNaNs don't need to be handled, or IMHO if we use the extended result
> in some arithmetic operation that will handle the sNaN signaling +
> conversion into qNaN, similarly for SFmode to BFmode conversions
> we can use hw instructions if available and we don't care about sNaNs.
>
> The C FE has the advantage that it has excess precision support, there
> we should arrange for BFmode to be always promoted to SFmode excess
> precision, but C++ FE doesn't.
>
> Also, question to ARM/AArch64/x86 maintainers is if it is ok to
> add conversion and arithmetic support to the __bf16 type, or if
> that type should keep to be useless and there should be another
> type (some keyword or just float __attribute__((__mode__ (__BF__))))
> that we'd have that support for.  Whatever type we'd use as
> std::bfloat16_t should mangle as DFb16_ rather than u6__bf16 that
> __bf16 currently mangles to though.
>
> Thoughts on this?
x86 is OK with adding conversion and arithmetic support, and with mangling it as DFb16_.
>
> And for Joseph, sure, the libgcc/soft-fp/ part should probably go
> into glibc first and be copied from there afterwards.
>
> Perhaps the __truncbfhf2 could be dropped and we could just on
> the compiler side emit shift left by 16 before calling __truncsfhf2.
>
> --- libgcc/soft-fp/brain.h.jj   2022-09-22 15:28:04.865171729 +0200
> +++ libgcc/soft-fp/brain.h  2022-09-22 15:35:11.970374554 +0200
> @@ -0,0 +1,172 @@
> +/* Software floating-point emulation.
> +   Definitions for Brain Floating Point format (bfloat16).
> +   Copyright (C) 1997-2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser 
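
As a rough illustration of the conversions discussed above (not the soft-fp code from this patch), the BFmode to SFmode extension can be sketched as a 16-bit left shift of the raw bits. The narrowing direction below uses round-to-nearest-even with NaN quieting, which is one common technique and an assumption here. The sketch assumes float is IEEE binary32, raises no floating-point exceptions, and uses hypothetical function names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == 4 && std::numeric_limits<float>::is_iec559,
              "this sketch assumes float is IEEE binary32");

// Extension: a bfloat16 value is the upper 16 bits of a binary32 value,
// so widening is a shift left by 16 (sNaN signaling is ignored here).
float bf16_bits_to_float(std::uint16_t b) {
    const std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Narrowing with round-to-nearest-even.  NaNs are returned as quiet NaNs;
// no exception flags are raised, unlike a full IEEE-conforming version.
std::uint16_t float_to_bf16_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u)  // NaN: keep sign, set quiet bit
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
    // Add 0x7fff plus the lowest kept bit so that ties round to even.
    const std::uint32_t rounding = 0x7fffu + ((u >> 16) & 1u);
    return static_cast<std::uint16_t>((u + rounding) >> 16);
}
```

For example, float_to_bf16_bits (1.0f) gives 0x3f80 and bf16_bits_to_float (0x3f80) gives 1.0f. Arithmetic on bfloat16 values would then follow the scheme described in the message: widen to float, operate there, and narrow the result.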

Re: [committed 2/2] libstdc++: Implement constexpr std::bitset for C++23 (P2417R2)

2022-09-22 Thread Jonathan Wakely via Gcc-patches
On Thu, 22 Sept 2022 at 15:26, Jonathan Wakely via Libstdc++
 wrote:
>
> Tested x86_64-linux. Pushed to trunk.
>
> -- >8 --
>
> Also add _GLIBCXX_HOSTED checks to simplify making <bitset>
> freestanding in the near future.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/bitset (bitset): Add constexpr for C++23. Guard
> members using std::string with _GLIBCXX_HOSTED.
> * include/std/version (__cpp_lib_constexpr_bitset): Define.
> * testsuite/20_util/bitset/access/constexpr.cc: New test.
> * testsuite/20_util/bitset/cons/constexpr_c++23.cc: New test.
> * testsuite/20_util/bitset/count/constexpr.cc: New test.
> * testsuite/20_util/bitset/ext/constexpr.cc: New test.
> * testsuite/20_util/bitset/operations/constexpr_c++23.cc: New test.
> * testsuite/20_util/bitset/version.cc: New test.

The new tests fail with -D_GLIBCXX_DEBUG because I didn't update <debug/bitset>.

I'll do that tomorrow.



Re: [PATCH] testsuite: Sanitize fails for SP FPU on Arm

2022-09-22 Thread Joseph Myers
On Thu, 22 Sep 2022, Torbjörn SVENSSON via Gcc-patches wrote:

> This patch stops reporting fails for Arm targets with single
> precision floating point unit for types wider than 32 bits (the width
> of float on arm-none-eabi).
> 
> As reported in PR102017, fenv is reported as supported in recent
> versions of newlib. At the same time, for some Arm targets, the
> implementation in libgcc does not support exceptions and thus, the
> test fails with a call to abort().

It's definitely wrong to have this sort of Arm-specific conditional in 
architecture-independent tests.  Tests requiring floating-point exceptions 
support should have an appropriate dg-require-effective-target; if that 
dg-require-effective-target wrongly passes in certain configurations, fix 
it (or e.g. add a new check_effective_target_fenv_exceptions_double to 
verify that exceptions work for double, as opposed to the present 
check_effective_target_fenv_exceptions which checks whether exceptions 
work for float, and then adjust tests requiring exceptions for double to 
use the new effective-target).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Avoid depending on destructor order

2022-09-22 Thread Jason Merrill via Gcc-patches

On 9/19/22 12:20, Thomas Neumann wrote:

In some scenarios (e.g., when mixing gcc and clang code), it can
happen that frames are deregistered after the lookup structure
has already been destroyed. That in itself would be fine, but
it triggers an assert in __deregister_frame_info_bases that
expects to find the frame.

To avoid that, we now remember that the btree has already been
destroyed and disable the assert in that case.


OK.


libgcc/ChangeLog:

 * unwind-dw2-fde.c (release_registered_frames): Remember
 when the btree has been destroyed.
 (__deregister_frame_info_bases) Disable the assert when
 shutting down.
---
  libgcc/unwind-dw2-fde.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 919abfe0664..d237179f4ea 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -48,6 +48,7 @@ typedef __UINTPTR_TYPE__ uintptr_type;
  #include "unwind-dw2-btree.h"

  static struct btree registered_frames;
+static bool in_shutdown;

  static void
  release_registered_frames (void) __attribute__ ((destructor (110)));
@@ -57,6 +58,7 @@ release_registered_frames (void)
    /* Release the b-tree and all frames. Frame releases that happen later are
     * silently ignored */
    btree_destroy (&registered_frames);
+  in_shutdown = true;
  }

  static void
@@ -282,7 +284,7 @@ __deregister_frame_info_bases (const void *begin)
    __gthread_mutex_unlock (&object_mutex);
  #endif

-  gcc_assert (ob);
+  gcc_assert (in_shutdown || ob);
    return (void *) ob;
  }
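
The idea of the patch can be sketched in isolation as follows. This is a simplified model, assuming a std::set in place of libgcc's b-tree and plain functions in place of the destructor; all names inside the sketch namespace are hypothetical.

```cpp
#include <cassert>
#include <set>

namespace sketch {

// Stand-in for the b-tree of registered frames.
std::set<const void*> registered;
// Set once the lookup structure has been torn down.
bool in_shutdown = false;

void register_frame(const void* begin) { registered.insert(begin); }

// Models the destructor: drop the lookup structure, then remember that
// shutdown has started so later deregistrations are tolerated.
void release_registered_frames() {
    registered.clear();
    in_shutdown = true;
}

// Returns the entry if it was found, or nullptr otherwise.  A missing
// entry is treated as a bug only while the structure is still alive.
const void* deregister_frame(const void* begin) {
    const void* ob = nullptr;
    const auto it = registered.find(begin);
    if (it != registered.end()) {
        ob = *it;
        registered.erase(it);
    }
    assert(in_shutdown || ob);
    return ob;
}

}  // namespace sketch
```

Before shutdown, deregistering an unknown frame still trips the assert, so genuine bookkeeping errors remain visible. After release_registered_frames has run, a late deregistration returns nullptr instead of aborting.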





Re: [PATCH] c++: ICE-on-invalid with designated initializer [PR106983]

2022-09-22 Thread Jason Merrill via Gcc-patches

On 9/20/22 17:05, Marek Polacek wrote:

We ICE in the code added in r12-7117: type_build_dtor_call gets
the error_mark_node because the type of 'prev' wasn't declared.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


PR c++/106983

gcc/cp/ChangeLog:

* typeck2.cc (split_nonconstant_init_1): Check TYPE_P.

gcc/testsuite/ChangeLog:

* g++.dg/other/error36.C: New test.
---
  gcc/cp/typeck2.cc|  2 +-
  gcc/testsuite/g++.dg/other/error36.C | 13 +
  2 files changed, 14 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/other/error36.C

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 688e9c15326..75fd0e2a9bf 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -597,7 +597,7 @@ split_nonconstant_init_1 (tree dest, tree init, bool last,
if (prev == field_index)
  break;
tree ptype = TREE_TYPE (prev);
-   if (type_build_dtor_call (ptype))
+   if (TYPE_P (ptype) && type_build_dtor_call (ptype))
  {
tree pcref = build3 (COMPONENT_REF, ptype, dest, prev,
 NULL_TREE);
diff --git a/gcc/testsuite/g++.dg/other/error36.C b/gcc/testsuite/g++.dg/other/error36.C
new file mode 100644
index 000..556287816fd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/other/error36.C
@@ -0,0 +1,13 @@
+// PR c++/106983
+// { dg-do compile { target c++20 } }
+
+typedef unsigned long long A;
+typedef union
+{
+  struct B s; // { dg-error "incomplete" }
+  A a;
+} U;
+void f (A x, unsigned int b)
+{
+  const U y = {.a = x};
+}

base-commit: be60aa5b608b5f09fadfeff852a46589ac311a42




Re: [PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-22 Thread Jason Merrill via Gcc-patches

On 9/22/22 09:39, Marek Polacek wrote:

To improve compile times, the C++ library could use compiler built-ins
rather than implementing std::is_convertible (and _nothrow) as class
templates.  This patch adds the built-ins.  We already have
__is_constructible and __is_assignable, and the nothrow forms of those.

Microsoft (and clang, for compatibility) also provide an alias called
__is_convertible_to.  I did not add it, but it would be trivial to do
so.
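
For context, here is a small sketch of what the convertibility traits report, using the standard library form. The struct names are hypothetical, and the block that uses the new built-in is guarded so that it is only compiled where the compiler reports support for __is_convertible.

```cpp
#include <cassert>
#include <type_traits>

struct From {};
struct Implicit { Implicit(From) {} };           // implicit, may throw
struct Explicit { explicit Explicit(From) {} };  // explicit only
struct NoThrow  { NoThrow(From) noexcept {} };   // implicit, noexcept

// is_convertible asks whether an implicit conversion exists.
constexpr bool implicit_ok = std::is_convertible<From, Implicit>::value;
constexpr bool explicit_ok = std::is_convertible<From, Explicit>::value;
constexpr bool arith_ok    = std::is_convertible<int, double>::value;
constexpr bool ptr_ok      = std::is_convertible<int*, int>::value;

static_assert(implicit_ok, "implicit constructor allows the conversion");
static_assert(!explicit_ok, "explicit constructors are not considered");
static_assert(arith_ok, "int converts implicitly to double");
static_assert(!ptr_ok, "a pointer does not convert to int");

#ifdef __cpp_lib_is_nothrow_convertible
// C++20 only: the nothrow form also requires a non-throwing conversion.
static_assert(std::is_nothrow_convertible<From, NoThrow>::value, "");
static_assert(!std::is_nothrow_convertible<From, Implicit>::value, "");
#endif

#if defined(__has_builtin)
#  if __has_builtin(__is_convertible)
// Only compiled where the built-in is reported as available.
static_assert(__is_convertible(From, Implicit), "");
static_assert(!__is_convertible(From, Explicit), "");
#  endif
#endif
```

A library can define its trait directly in terms of the built-in, which avoids instantiating helper class templates for each query; that is the compile-time saving the message refers to.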

I noticed that our __is_assignable doesn't implement the "Access checks
are performed as if from a context unrelated to either type" requirement,
therefore std::is_assignable / __is_assignable give two different results
here:

   class S {
     operator int();
     friend void g(); // #1
   };

   void
   g ()
   {
     // #1 doesn't matter
     static_assert(std::is_assignable<int&, S>::value, "");
     static_assert(__is_assignable(int&, S), "");
   }

This is not a problem if __is_assignable is not meant to be used by
the users.


That's fine, it's not.


This patch doesn't make libstdc++ use the new built-ins, but I had to
rename a class otherwise its name would clash with the new built-in.


Sigh, that's going to be a hassle when comparing compiler versions on 
preprocessed code.



Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/106784

gcc/c-family/ChangeLog:

* c-common.cc (c_common_reswords): Add __is_convertible and
__is_nothrow_convertible.
* c-common.h (enum rid): Add RID_IS_CONVERTIBLE and
RID_IS_NOTHROW_CONVERTIBLE.

gcc/cp/ChangeLog:

* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_CONVERTIBLE
and CPTK_IS_NOTHROW_CONVERTIBLE.
* cp-objcp-common.cc (names_builtin_p): Handle RID_IS_CONVERTIBLE
and RID_IS_NOTHROW_CONVERTIBLE.
* cp-tree.h (enum cp_trait_kind): Add CPTK_IS_CONVERTIBLE and
CPTK_IS_NOTHROW_CONVERTIBLE.
(is_convertible): Declare.
(is_nothrow_convertible): Likewise.
* cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
CPTK_IS_CONVERTIBLE and CPTK_IS_NOTHROW_CONVERTIBLE.
* method.cc (is_convertible): New.
(is_nothrow_convertible): Likewise.
* parser.cc (cp_parser_primary_expression): Handle RID_IS_CONVERTIBLE
and RID_IS_NOTHROW_CONVERTIBLE.
(cp_parser_trait_expr): Likewise.
* semantics.cc (trait_expr_value): Handle CPTK_IS_CONVERTIBLE and
CPTK_IS_NOTHROW_CONVERTIBLE.
(finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:

* include/std/type_traits: Rename __is_nothrow_convertible to
__is_nothrow_convertible_lib.
* testsuite/20_util/is_nothrow_convertible/value_ext.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Enhance to test __is_convertible and
__is_nothrow_convertible.
* g++.dg/ext/is_convertible1.C: New test.
* g++.dg/ext/is_convertible2.C: New test.
* g++.dg/ext/is_nothrow_convertible1.C: New test.
* g++.dg/ext/is_nothrow_convertible2.C: New test.
---
  gcc/c-family/c-common.cc  |   2 +
  gcc/c-family/c-common.h   |   1 +
  gcc/cp/constraint.cc  |   6 +
  gcc/cp/cp-objcp-common.cc |   2 +
  gcc/cp/cp-tree.h  |   4 +
  gcc/cp/cxx-pretty-print.cc|   6 +
  gcc/cp/method.cc  |  31 ++
  gcc/cp/parser.cc  |  10 +
  gcc/cp/semantics.cc   |   8 +
  gcc/testsuite/g++.dg/ext/has-builtin-1.C  |   6 +
  gcc/testsuite/g++.dg/ext/is_convertible1.C| 269 +
  gcc/testsuite/g++.dg/ext/is_convertible2.C|  46 +++
  .../g++.dg/ext/is_nothrow_convertible1.C  | 270 ++
  .../g++.dg/ext/is_nothrow_convertible2.C  |  19 ++
  libstdc++-v3/include/std/type_traits  |   4 +-
  .../is_nothrow_convertible/value_ext.cc   |   4 +-
  16 files changed, 684 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/is_convertible1.C
  create mode 100644 gcc/testsuite/g++.dg/ext/is_convertible2.C
  create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_convertible1.C
  create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_convertible2.C

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index c0f15f4cab1..dce3045c9f2 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -541,6 +541,8 @@ const struct c_common_resword c_common_reswords[] =
{ "__is_constructible", RID_IS_CONSTRUCTIBLE, D_CXXONLY },
{ "__is_nothrow_assignable", RID_IS_NOTHROW_ASSIGNABLE, D_CXXONLY },
{ "__is_nothrow_constructible", RID_IS_NOTHROW_CONSTRUCTIBLE, D_CXXONLY },
+  { "__is_convertible", RID_IS_CONVERTIBLE, D_CXXONLY },
+  { "__is_nothrow_convertible", RID_IS_NOTHROW_CONVERTIBLE, D_CXXONLY },
{ "__reference_constructs_from_temporary", 

Re: [PATCH] rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]

2022-09-22 Thread Segher Boessenkool
Hi!

On Thu, Sep 22, 2022 at 09:41:42AM +0800, Kewen.Lin wrote:
>   * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
>   condition for adding REG_CFA_DEF_CFA reg note with
>   frame_pointer_needed_indeed.

> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4956,7 +4956,7 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>a REG_CFA_DEF_CFA note, but that's OK;  A duplicate is
>discarded by dwarf2cfi.cc/dwarf2out.cc, and in any case would
>be harmless if emitted.  */
> -  if (frame_pointer_needed)
> +  if (frame_pointer_needed_indeed)
>   {
> insn = get_last_insn ();

I thought about adding an assert here, but the very next insn gives a
clear enough message anyway, so it would be just noise :-)

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96072.c 
> b/gcc/testsuite/gcc.target/powerpc/pr96072.c
> new file mode 100644
> index 000..23d1cc74ffd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96072.c
> @@ -0,0 +1,14 @@
> +/* { dg-options "-O1" } */
> +
> +/* Verify there is no ICE on 32 bit environment.  */

/* This used to ICE with the SYSV ABI (PR96072).  */

Please use -O2 if that works here.

Okay for trunk.  Thank you!


Segher


Re: [PATCH] rs6000: Fix condition of define_expand vec_shr_ [PR100645]

2022-09-22 Thread Segher Boessenkool
Hi!

Heh, I first thought I had mistyped the PR #, but it is this one after
all :-)

On Thu, Sep 22, 2022 at 09:41:34AM +0800, Kewen.Lin wrote:
> PR100645 exposes one latent bug in define_expand vec_shr_
> that the current condition TARGET_ALTIVEC is too loose.  The
> mode iterator VEC_L contains a few modes, they are not always
> supported as vector mode, VECTOR_UNIT_ALTIVEC_OR_VSX_P should
> be used like some other VEC_L usages.

> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -1475,7 +1475,7 @@ (define_expand "vec_shr_"
>[(match_operand:VEC_L 0 "vlogical_operand")
> (match_operand:VEC_L 1 "vlogical_operand")
> (match_operand:QI 2 "reg_or_short_operand")]
> -  "TARGET_ALTIVEC"
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100645.c
> @@ -0,0 +1,13 @@
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-mdejagnu-cpu=power6 -maltivec" } */

This is a strange choice: we normally do not enable VMX on p6.  Just use
p7 instead?  There is no need for altivec_ok in any case, the -mcpu=
guarantees it is satisfied.

> +/* It's to verify no ICE here.  */

"This used to ICE."?

Please commit this now, looks good.  Thanks!


Segher


Re: [PATCH] gcc/config/t-i386: add build dependencies on i386-builtin-types.inc

2022-09-22 Thread Sergei Trofimovich via Gcc-patches
On Fri, 16 Sept 2022 at 19:49, Sergei Trofimovich  wrote:
>
> From: Sergei Trofimovich 
>
> i386-builtin-types.inc is included indirectly via i386-builtins.h
> into 4 files: i386.cc i386-builtins.cc i386-expand.cc i386-features.cc
>
> Only i386.cc dependency was present in gcc/config/t-i386 makefile.
>
> As a result parallel builds occasionally fail as:
>
> g++ ... -o i386-builtins.o ... 
> ../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc
> In file included from 
> ../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc:92:
> ../../gcc-13-20220911/gcc/config/i386/i386-builtins.h:25:10:
>  fatal error: i386-builtin-types.inc: No such file or directory
>25 | #include "i386-builtin-types.inc"
>   |  ^~~~
> compilation terminated.
> make[3]: *** [../../gcc-13-20220911/gcc/config/i386/t-i386:54: 
> i386-builtins.o]
>   Error 1 shuffle=1663349189
>
> gcc/
> * config/i386/t-i386: Add build-time dependencies against
> i386-builtin-types.inc to i386-builtins.o, i386-expand.o,
> i386-features.o.
> ---
>  gcc/config/i386/t-i386 | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/config/i386/t-i386 b/gcc/config/i386/t-i386
> index 4e2a0efc615..ffdbbdfe8ce 100644
> --- a/gcc/config/i386/t-i386
> +++ b/gcc/config/i386/t-i386
> @@ -62,7 +62,12 @@ i386-features.o: $(srcdir)/config/i386/i386-features.cc
> $(COMPILE) $<
> $(POSTCOMPILE)
>
> +# i386-builtin-types.inc is included into i386-builtins.h.
> +# Below are direct users of i386-builtins.h:
>  i386.o: i386-builtin-types.inc
> +i386-builtins.o: i386-builtin-types.inc
> +i386-expand.o: i386-builtin-types.inc
> +i386-features.o: i386-builtin-types.inc
>
>  i386-builtin-types.inc: s-i386-bt ; @true
>  s-i386-bt: $(srcdir)/config/i386/i386-builtin-types.awk \
> --
> 2.37.2
>

Is it a reasonable approach? Maybe gcc has an equivalent of automake's
BUILT_SOURCES to avoid explicit tracking of such dependencies?

-- 
Sergei


Re: [PATCH] frange: dump hex values when dumping FP numbers.

2022-09-22 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 22, 2022 at 06:49:10PM +0200, Aldy Hernandez wrote:
> It has been suggested that if we start bumping numbers by an ULP when
> calculating open ranges (for example the numbers less than 3.0) that
> dumping these will become increasingly harder to read, and instead we
> should opt for the hex representation.  I still find the floating
> point representation easier to read for most numbers, but perhaps we
> could have both?
> 
> With this patch this is the representation for [15.0, 20.0]:
> 
>  [frange] float [1.5e+1 (0x0.fp+4), 2.0e+1 (0x0.ap+5)]
> 
> Would you find this useful, or should we stick to the hex
> representation only (or something altogether different)?

I think dumping both is the way to go, but real_to_hexadecimal doesn't
do anything useful with decimal floats, so that part should be
guarded on !DECIMAL_FLOAT_TYPE_P (type).

Why do you build a tree + dump_generic_node for decimal instead of
real_to_decimal_for_mode ?
The former I think calls:
char string[100];
real_to_decimal (string, , sizeof (string), 0, 1);
so perhaps:
  char s[100];
  real_to_decimal_for_mode (s, , sizeof (string), 0, 1, TYPE_MODE (type));
  pp_string (pp, "%s", s);
  if (!DECIMAL_FLOAT_TYPE_P (type))
{
  real_to_hexadecimal (s, , sizeof (s), 0, 1);
  pp_printf (pp, " (%s)", s);
}
?

Jakub
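
For readers outside GCC, the "dump both, but guard the hex part" idea can be sketched standalone.  This is only an illustration under stated assumptions: format_bound and its want_hex flag are hypothetical names, plain printf formatting stands in for real_to_decimal_for_mode and real_to_hexadecimal, and want_hex plays the role of the !DECIMAL_FLOAT_TYPE_P (type) check.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not GCC's pretty-printer API): format one range
// bound as decimal, optionally followed by its hex-float form in
// parentheses.  WANT_HEX stands in for the decimal-float guard.
std::string format_bound (double v, bool want_hex)
{
  char dec[64];
  int n = std::snprintf (dec, sizeof dec, "%g", v);
  assert (n > 0 && n < (int) sizeof dec);
  std::string out = dec;

  if (want_hex)
    {
      char hex[64];
      // "%a" prints the exact binary value as a hex float.
      n = std::snprintf (hex, sizeof hex, "%a", v);
      assert (n > 0 && n < (int) sizeof hex);
      out += " (";
      out += hex;
      out += ")";
    }
  return out;
}
```

Calling format_bound (15.0, true) yields the decimal text followed by a parenthesised hex form; the exact hex digits depend on the C library and will not match the normalisation real_to_hexadecimal uses.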



Re: [PATCH] frange: dump hex values when dumping FP numbers.

2022-09-22 Thread Jeff Law via Gcc-patches



On 9/22/22 10:49, Aldy Hernandez via Gcc-patches wrote:

It has been suggested that if we start bumping numbers by an ULP when
calculating open ranges (for example the numbers less than 3.0) that
dumping these will become increasingly harder to read, and instead we
should opt for the hex representation.  I still find the floating
point representation easier to read for most numbers, but perhaps we
could have both?

With this patch this is the representation for [15.0, 20.0]:

  [frange] float [1.5e+1 (0x0.fp+4), 2.0e+1 (0x0.ap+5)]

Would you find this useful, or should we stick to the hex
representation only (or something altogether different)?

Tested on x86-64 Linux.

gcc/ChangeLog:

* value-range-pretty-print.cc (vrange_printer::print_real_value): New.
(vrange_printer::visit): Call print_real_value.
* value-range-pretty-print.h: New print_real_value.


The big advantage of the hex representation is you can feed that back 
into the compiler trivially and be confident the bit pattern hasn't 
changed.   I've found it invaluable when doing deep FP analysis.



jeff
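
The round-trip property mentioned above can be checked outside the compiler.  A minimal sketch, assuming IEEE doubles and a C99-conforming C library; hex_round_trips and one_ulp_below are hypothetical helper names, not anything from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// The largest double strictly below V, i.e. V bumped down by one ULP,
// as one might do for an open range such as "less than 3.0".
double one_ulp_below (double v)
{
  return std::nextafter (v, -INFINITY);
}

// Return true if V survives a trip through its hex-float text form
// with every bit intact (not meaningful for NaNs).
bool hex_round_trips (double v)
{
  char buf[64];
  int n = std::snprintf (buf, sizeof buf, "%a", v);
  assert (n > 0 && n < (int) sizeof buf);
  double back = std::strtod (buf, nullptr);
  return std::memcmp (&v, &back, sizeof v) == 0;
}
```

A value such as one_ulp_below (3.0) is awkward to read in decimal, but its hex text can be pasted back into a test case without any rounding concerns.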




Re: [PATCH] frange: dump hex values when dumping FP numbers.

2022-09-22 Thread Toon Moene

If it's not too cumbersome, I suggest dumping both.

In my neck-of-the-woods (meteorology) I have seen this done just to 
ensure that algorithms that are supposed to be bit-reproducible actually 
are - and that it can be checked visually.


Kind regards,
Toon.

On 9/22/22 18:49, Aldy Hernandez via Gcc-patches wrote:


It has been suggested that if we start bumping numbers by an ULP when
calculating open ranges (for example the numbers less than 3.0) that
dumping these will become increasingly harder to read, and instead we
should opt for the hex representation.  I still find the floating
point representation easier to read for most numbers, but perhaps we
could have both?

With this patch this is the representation for [15.0, 20.0]:

  [frange] float [1.5e+1 (0x0.fp+4), 2.0e+1 (0x0.ap+5)]

Would you find this useful, or should we stick to the hex
representation only (or something altogether different)?

Tested on x86-64 Linux.

gcc/ChangeLog:

* value-range-pretty-print.cc (vrange_printer::print_real_value): New.
(vrange_printer::visit): Call print_real_value.
* value-range-pretty-print.h: New print_real_value.
---
  gcc/value-range-pretty-print.cc | 16 
  gcc/value-range-pretty-print.h  |  1 +
  2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/value-range-pretty-print.cc b/gcc/value-range-pretty-print.cc
index eb7442229ba..51be037c254 100644
--- a/gcc/value-range-pretty-print.cc
+++ b/gcc/value-range-pretty-print.cc
@@ -117,6 +117,16 @@ vrange_printer::print_irange_bitmasks (const irange ) const
pp_string (pp, buf);
  }
  
+void

+vrange_printer::print_real_value (tree type, const REAL_VALUE_TYPE ) const
+{
+  char s[60];
+  tree t = build_real (type, r);
+  dump_generic_node (pp, t, 0, TDF_NONE, false);
+  real_to_hexadecimal (s, , sizeof (s), 0, 1);
+  pp_printf (pp, " (%s)", s);
+}
+
  // Print an frange.
  
  void

@@ -141,11 +151,9 @@ vrange_printer::visit (const frange ) const
bool has_endpoints = !r.known_isnan ();
if (has_endpoints)
  {
-  dump_generic_node (pp,
-build_real (type, r.lower_bound ()), 0, TDF_NONE, 
false);
+  print_real_value (type, r.lower_bound ());
pp_string (pp, ", ");
-  dump_generic_node (pp,
-build_real (type, r.upper_bound ()), 0, TDF_NONE, 
false);
+  print_real_value (type, r.upper_bound ());
  }
pp_character (pp, ']');
print_frange_nan (r);
diff --git a/gcc/value-range-pretty-print.h b/gcc/value-range-pretty-print.h
index 20c26598fe7..a9ae5a7b4cc 100644
--- a/gcc/value-range-pretty-print.h
+++ b/gcc/value-range-pretty-print.h
@@ -32,6 +32,7 @@ private:
void print_irange_bound (const wide_int , tree type) const;
void print_irange_bitmasks (const irange &) const;
void print_frange_nan (const frange &) const;
+  void print_real_value (tree type, const REAL_VALUE_TYPE ) const;
  
pretty_printer *pp;

  };


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands



Re: [PATCH] c++ modules: ICE with class NTTP argument [PR100616]

2022-09-22 Thread Nathan Sidwell via Gcc-patches

On 9/22/22 14:25, Patrick Palka wrote:


index 80467c19254..722b64793ed 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -18235,9 +18235,11 @@ maybe_register_incomplete_var (tree var)
{
  /* When the outermost open class is complete we can resolve any
 pointers-to-members.  */
- tree context = outermost_open_class ();
- incomplete_var iv = {var, context};
- vec_safe_push (incomplete_vars, iv);
+ if (tree context = outermost_open_class ())
+   {
+ incomplete_var iv = {var, context};
+ vec_safe_push (incomplete_vars, iv);
+   }


My immediate thought here is eek!  During stream-in, the 
outermost_open_class could be anything -- to do with the context that 
wanted to look up the thing being streamed in, right?  So I think the above 
change is just papering over a problem in this case.


Not sure how to approach this.

nathan

--
Nathan Sidwell



[PATCH 17/17] Convert CFN_BUILT_IN_PARITY to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches


Also, as the last builtin remaining, remove the builtin 
infrastructure routines from fold_using_range.



Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From 5608e410914ebb7c8cc9fa50afc8ada3b22cbf2c Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 19:30:46 -0400
Subject: [PATCH 17/17] Convert CFN_BUILT_IN_PARITY to range-ops.

Also, as the last builtin remaining, remove the builtin infrastructure
routines from fold_using_range.

	* gimple-range-fold.cc (range_of_range_op): Handle no operands.
	(range_of_call): Do not check for builtins.
	(fold_using_range::range_of_builtin_call): Delete.
	(fold_using_range::range_of_builtin_int_call): Delete.
	* gimple-range-fold.h: Adjust prototypes.
	* gimple-range-op.cc (class cfn_parity): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc | 60 
 gcc/gimple-range-fold.h  |  4 ---
 gcc/gimple-range-op.cc   | 19 +
 3 files changed, 31 insertions(+), 52 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 5e8a13e7337..c381ef94087 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -534,6 +534,16 @@ fold_using_range::range_of_range_op (vrange ,
   tree lhs = handler.lhs ();
   tree op1 = handler.operand1 ();
   tree op2 = handler.operand2 ();
+
+  // Certain types of builtin functions may have no arguments.
+  if (!op1)
+{
+  Value_Range r1 (type);
+  if (!handler.fold_range (r, type, r1, r1))
+	r.set_varying (type);
+  return true;
+}
+
   Value_Range range1 (TREE_TYPE (op1));
   Value_Range range2 (op2 ? TREE_TYPE (op2) : TREE_TYPE (op1));
 
@@ -823,7 +833,7 @@ fold_using_range::range_of_phi (vrange , gphi *phi, fur_source )
 // If a range cannot be calculated, return false.
 
 bool
-fold_using_range::range_of_call (vrange , gcall *call, fur_source )
+fold_using_range::range_of_call (vrange , gcall *call, fur_source &)
 {
   tree type = gimple_range_type (call);
   if (!type)
@@ -832,9 +842,7 @@ fold_using_range::range_of_call (vrange , gcall *call, fur_source )
   tree lhs = gimple_call_lhs (call);
   bool strict_overflow_p;
 
-  if (range_of_builtin_call (r, call, src))
-;
-  else if (gimple_stmt_nonnegative_warnv_p (call, _overflow_p))
+  if (gimple_stmt_nonnegative_warnv_p (call, _overflow_p))
 r.set_nonnegative (type);
   else if (gimple_call_nonnull_result_p (call)
 	   || gimple_call_nonnull_arg (call))
@@ -852,50 +860,6 @@ fold_using_range::range_of_call (vrange , gcall *call, fur_source )
   return true;
 }
 
-// For a builtin in CALL, return a range in R if known and return
-// TRUE.  Otherwise return FALSE.
-
-bool
-fold_using_range::range_of_builtin_call (vrange , gcall *call,
-	 fur_source )
-{
-  combined_fn func = gimple_call_combined_fn (call);
-  if (func == CFN_LAST)
-return false;
-
-  tree type = gimple_range_type (call);
-  gcc_checking_assert (type);
-
-  if (irange::supports_p (type))
-return range_of_builtin_int_call (as_a  (r), call, src);
-
-  return false;
-}
-
-bool
-fold_using_range::range_of_builtin_int_call (irange , gcall *call,
-	 fur_source &)
-{
-  combined_fn func = gimple_call_combined_fn (call);
-  if (func == CFN_LAST)
-return false;
-
-  tree type = gimple_range_type (call);
-  scalar_int_mode mode;
-
-  switch (func)
-{
-CASE_CFN_PARITY:
-  r.set (build_zero_cst (type), build_one_cst (type));
-  return true;
-
-default:
-  break;
-}
-  return false;
-}
-
-
 // Calculate a range for COND_EXPR statement S and return it in R.
 // If a range cannot be calculated, return false.
 
diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index ce18c66b8e7..d1ed2bca80f 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -165,10 +165,6 @@ protected:
   bool range_of_call (vrange , gcall *call, fur_source );
   bool range_of_cond_expr (vrange , gassign* cond, fur_source );
   bool range_of_address (irange , gimple *s, fur_source );
-  bool range_of_builtin_call (vrange , gcall *call, fur_source );
-  bool range_of_builtin_int_call (irange , gcall *call, fur_source );
-  void range_of_builtin_ubsan_call (irange , gcall *call, tree_code code,
-fur_source );
   bool range_of_phi (vrange , gphi *phi, fur_source );
   void range_of_ssa_name_with_loop_info (vrange &, tree, class loop *, gphi *,
 	 fur_source );
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 76295466e65..d7c6dfa933d 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -663,6 +663,20 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);
 
+
+// Implement range operator for CFN_BUILT_IN_
+class cfn_parity : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  virtual bool fold_range (irange , tree type, const irange &,
+			   const irange &, relation_kind) const
+  {

[PATCH 16/17] Convert CFN_BUILT_IN_GOACC_DIM_* to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From e7f035f66aa25e0537a0e3a76d43c71fe9531724 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 19:19:30 -0400
Subject: [PATCH 16/17] Convert CFN_BUILT_IN_GOACC_DIM_* to range-ops.

	* gimple-range-fold.cc (range_of_builtin_int_call): Remove case
	for CFN_GOACC_DIM_*.
	* gimple-range-op.cc (class cfn_goacc_dim): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc | 19 
 gcc/gimple-range-op.cc   | 47 
 2 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index d22fb0e9352..5e8a13e7337 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -889,25 +889,6 @@ fold_using_range::range_of_builtin_int_call (irange , gcall *call,
   r.set (build_zero_cst (type), build_one_cst (type));
   return true;
 
-case CFN_GOACC_DIM_SIZE:
-case CFN_GOACC_DIM_POS:
-  // Optimizing these two internal functions helps the loop
-  // optimizer eliminate outer comparisons.  Size is [1,N]
-  // and pos is [0,N-1].
-  {
-	bool is_pos = func == CFN_GOACC_DIM_POS;
-	int axis = oacc_get_ifn_dim_arg (call);
-	int size = oacc_get_fn_dim_size (current_function_decl, axis);
-	if (!size)
-	  // If it's dynamic, the backend might know a hardware limitation.
-	  size = targetm.goacc.dim_limit (axis);
-
-	r.set (build_int_cst (type, is_pos ? 0 : 1),
-	   size
-	   ? build_int_cst (type, size - is_pos) : vrp_val_max (type));
-	return true;
-  }
-
 default:
   break;
 }
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index f9161b5820f..76295466e65 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -635,6 +635,34 @@ public:
   }
 } op_cfn_strlen;
 
+
+// Implement range operator for CFN_BUILT_IN_GOACC_DIM
+class cfn_goacc_dim : public range_operator
+{
+public:
+  cfn_goacc_dim (bool is_pos) { m_is_pos = is_pos; }
+  using range_operator::fold_range;
+  virtual bool fold_range (irange , tree type, const irange ,
+			   const irange &, relation_kind) const
+  {
+tree axis_tree;
+if (!lh.singleton_p (_tree))
+  return false;
+HOST_WIDE_INT axis = TREE_INT_CST_LOW (axis_tree);
+int size = oacc_get_fn_dim_size (current_function_decl, axis);
+if (!size)
+  // If it's dynamic, the backend might know a hardware limitation.
+  size = targetm.goacc.dim_limit (axis);
+
+r.set (build_int_cst (type, m_is_pos ? 0 : 1),
+	   size
+	   ? build_int_cst (type, size - m_is_pos) : vrp_val_max (type));
+return true;
+  }
+private:
+  bool m_is_pos;
+} op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);
+
 // Set up a gimple_range_op_handler for any built in function which can be
 // supported via range-ops.
 
@@ -748,6 +776,25 @@ gimple_range_op_handler::maybe_builtin_call ()
 	break;
   }
 
+// Optimizing these two internal functions helps the loop
+// optimizer eliminate outer comparisons.  Size is [1,N]
+// and pos is [0,N-1].
+case CFN_GOACC_DIM_SIZE:
+  // This call will ensure all the asserts are triggered.
+  oacc_get_ifn_dim_arg (call);
+  m_op1 = gimple_call_arg (call, 0);
+  m_valid = true;
+  m_int = _cfn_goacc_dim_size;
+  break;
+
+case CFN_GOACC_DIM_POS:
+  // This call will ensure all the asserts are triggered.
+  oacc_get_ifn_dim_arg (call);
+  m_op1 = gimple_call_arg (call, 0);
+  m_valid = true;
+  m_int = _cfn_goacc_dim_pos;
+  break;
+
 default:
   break;
 }
-- 
2.37.3
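
As a reading aid, the bounds that cfn_goacc_dim::fold_range sets can be restated without GCC's tree machinery.  This is a sketch only: int_range and goacc_dim_range are hypothetical names, plain long values stand in for trees, and type_max stands in for vrp_val_max (type).

```cpp
#include <cassert>

// Hypothetical stand-in for an integer range [lo, hi].
struct int_range
{
  long lo;
  long hi;
};

// Mirror the arithmetic in the patch: a dimension size is in [1, N]
// and a position is in [0, N-1].  SIZE == 0 means the size is unknown,
// in which case the upper bound falls back to TYPE_MAX.
int_range goacc_dim_range (bool is_pos, long size, long type_max)
{
  assert (size >= 0);
  long lo = is_pos ? 0 : 1;
  long hi = size ? size - (is_pos ? 1 : 0) : type_max;
  return {lo, hi};
}
```

With a known size of 32, the size query gives [1, 32] and the position query gives [0, 31]; with an unknown size both queries keep their lower bound and take the type maximum as the upper bound.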



[PATCH 15/17] Convert CFN_BUILT_IN_STRLEN to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From c750e675cb77f283ff991682db7740bc5f6d4cf4 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 19:05:03 -0400
Subject: [PATCH 15/17] Convert CFN_BUILT_IN_STRLEN to range-ops.

	* gimple-range-fold.cc (range_of_builtin_int_call): Remove case
	for CFN_BUILT_IN_STRLEN.
	* gimple-range-op.cc (class cfn_strlen): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc | 21 -
 gcc/gimple-range-op.cc   | 37 +
 2 files changed, 37 insertions(+), 21 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index d445270417a..d22fb0e9352 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -908,27 +908,6 @@ fold_using_range::range_of_builtin_int_call (irange , gcall *call,
 	return true;
   }
 
-case CFN_BUILT_IN_STRLEN:
-  if (tree lhs = gimple_call_lhs (call))
-	if (ptrdiff_type_node
-	&& (TYPE_PRECISION (ptrdiff_type_node)
-		== TYPE_PRECISION (TREE_TYPE (lhs
-	  {
-	tree type = TREE_TYPE (lhs);
-	tree max = vrp_val_max (ptrdiff_type_node);
-	wide_int wmax
-	  = wi::to_wide (max, TYPE_PRECISION (TREE_TYPE (max)));
-	tree range_min = build_zero_cst (type);
-	// To account for the terminating NULL, the maximum length
-	// is one less than the maximum array size, which in turn
-	// is one less than PTRDIFF_MAX (or SIZE_MAX where it's
-	// smaller than the former type).
-	// FIXME: Use max_object_size() - 1 here.
-	tree range_max = wide_int_to_tree (type, wmax - 2);
-	r.set (range_min, range_max);
-	return true;
-	  }
-  break;
 default:
   break;
 }
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 09b7dd2add3..f9161b5820f 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -611,6 +611,30 @@ cfn_ubsan op_cfn_ubsan_add (PLUS_EXPR);
 cfn_ubsan op_cfn_ubsan_sub (MINUS_EXPR);
 cfn_ubsan op_cfn_ubsan_mul (MULT_EXPR);
 
+
+// Implement range operator for CFN_BUILT_IN_STRLEN
+class cfn_strlen : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  virtual bool fold_range (irange , tree type, const irange &,
+			   const irange &, relation_kind) const
+  {
+tree max = vrp_val_max (ptrdiff_type_node);
+wide_int wmax
+  = wi::to_wide (max, TYPE_PRECISION (TREE_TYPE (max)));
+tree range_min = build_zero_cst (type);
+// To account for the terminating NULL, the maximum length
+// is one less than the maximum array size, which in turn
+// is one less than PTRDIFF_MAX (or SIZE_MAX where it's
+// smaller than the former type).
+// FIXME: Use max_object_size() - 1 here.
+tree range_max = wide_int_to_tree (type, wmax - 2);
+r.set (range_min, range_max);
+return true;
+  }
+} op_cfn_strlen;
+
 // Set up a gimple_range_op_handler for any built in function which can be
 // supported via range-ops.
 
@@ -711,6 +735,19 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_int = _cfn_ubsan_mul;
   break;
 
+case CFN_BUILT_IN_STRLEN:
+  {
+	tree lhs = gimple_call_lhs (call);
+	if (lhs && ptrdiff_type_node && (TYPE_PRECISION (ptrdiff_type_node)
+	 == TYPE_PRECISION (TREE_TYPE (lhs
+	  {
+	m_op1 = gimple_call_arg (call, 0);
+	m_valid = true;
+	m_int = _cfn_strlen;
+	  }
+	break;
+  }
+
 default:
   break;
 }
-- 
2.37.3
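
The strlen bound used by cfn_strlen::fold_range, [0, PTRDIFF_MAX - 2], can be written out in ordinary C++ for illustration.  The helper names below are hypothetical, and std::uint64_t is assumed to be wide enough to hold PTRDIFF_MAX on the host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Bounds as computed in the patch: the largest array is taken to be
// PTRDIFF_MAX - 1 bytes, and one of those bytes is the terminating NUL,
// so the longest possible string has PTRDIFF_MAX - 2 characters.
std::pair<std::uint64_t, std::uint64_t> strlen_bounds ()
{
  return {0, static_cast<std::uint64_t> (PTRDIFF_MAX) - 2};
}

// Check that the length of S falls inside those bounds.
bool strlen_in_bounds (const char *s)
{
  assert (s != nullptr);
  std::pair<std::uint64_t, std::uint64_t> b = strlen_bounds ();
  std::uint64_t n = std::strlen (s);
  return b.first <= n && n <= b.second;
}
```

The check is trivially true for any real string, which is the point: the range is something the optimizer may assume about every strlen result.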



[PATCH 13/17] Convert CFN_BUILT_IN_CLRSB to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

 Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From f7e62b09300b6935bceaffb4c42f6edab80f52dc Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 18:21:04 -0400
Subject: [PATCH 13/17] Convert CFN_BUILT_IN_CLRSB to range-ops.

	* gimple-range-fold.cc (range_of_builtin_int_call): Remove case
	for CFN_BUILT_IN_CLRSB.
	* gimple-range-op.cc (class cfn_clrsb): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc |  7 ---
 gcc/gimple-range-op.cc   | 23 +++
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 96a138a7a02..1d7d1da7bbe 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -916,8 +916,6 @@ fold_using_range::range_of_builtin_int_call (irange , gcall *call,
 return false;
 
   tree type = gimple_range_type (call);
-  tree arg;
-  int prec;
   scalar_int_mode mode;
 
   switch (func)
@@ -926,11 +924,6 @@ fold_using_range::range_of_builtin_int_call (irange , gcall *call,
   r.set (build_zero_cst (type), build_one_cst (type));
   return true;
 
-CASE_CFN_CLRSB:
-  arg = gimple_call_arg (call, 0);
-  prec = TYPE_PRECISION (TREE_TYPE (arg));
-  r.set (build_int_cst (type, 0), build_int_cst (type, prec - 1));
-  return true;
 case CFN_UBSAN_CHECK_ADD:
   range_of_builtin_ubsan_call (r, call, PLUS_EXPR, src);
   return true;
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 801c2bb235e..bee225431e8 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -559,6 +559,23 @@ cfn_ctz::fold_range (irange , tree type, const irange ,
   return true;
 }
 
+
+// Implement range operator for CFN_BUILT_IN_
+class cfn_clrsb : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  virtual bool fold_range (irange , tree type, const irange ,
+			   const irange &, relation_kind) const
+  {
+if (lh.undefined_p ())
+  return false;
+int prec = TYPE_PRECISION (lh.type ());
+r.set (build_int_cst (type, 0), build_int_cst (type, prec - 1));
+return true;
+  }
+} op_cfn_clrsb;
+
 // Set up a gimple_range_op_handler for any built in function which can be
 // supported via range-ops.
 
@@ -632,6 +649,12 @@ gimple_range_op_handler::maybe_builtin_call ()
 	m_int = _cfn_ctz;
   break;
 
+CASE_CFN_CLRSB:
+  m_op1 = gimple_call_arg (call, 0);
+  m_valid = true;
+  m_int = _cfn_clrsb;
+  break;
+
 default:
   break;
 }
-- 
2.37.3
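
To see why [0, prec - 1] is the right range for clrsb, a portable reference implementation helps.  This is an illustrative sketch, not GCC code: clrsb_ref is a hypothetical name, and it counts redundant sign bits with a plain loop rather than calling the builtin.

```cpp
#include <cassert>
#include <climits>

// Count the leading redundant sign bits of X: the number of bits
// directly below the sign bit that are equal to it.  The result is
// always in [0, prec - 1], where prec is the width of int in bits.
int clrsb_ref (int x)
{
  const int prec = sizeof (int) * CHAR_BIT;
  unsigned u = static_cast<unsigned> (x);
  unsigned sign = u >> (prec - 1);
  int n = 0;
  for (int bit = prec - 2; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
    ++n;
  assert (n >= 0 && n <= prec - 1);
  return n;
}
```

The extremes are reached by INT_MIN, which has no redundant sign bits, and by 0 and -1, where every bit below the sign bit repeats it.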



[PATCH 14/17] Convert CFN_BUILT_IN_UBSAN_CHECK_* to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

 Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From b6f670ff706e35dc51a62db4206cb241dcac4963 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 18:48:05 -0400
Subject: [PATCH 14/17] Convert CFN_BUILT_IN_UBSAN_CHECK_* to range-ops.

	* gimple-range-fold.cc (range_of_builtin_ubsan_call): Delete.
	(range_of_builtin_int_call): Remove cases for
	CFN_BUILT_IN_UBSAN_CHECK.
	* gimple-range-op.cc (class cfn_ubsan): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc | 47 +
 gcc/gimple-range-op.cc   | 56 
 2 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 1d7d1da7bbe..d445270417a 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -852,41 +852,6 @@ fold_using_range::range_of_call (vrange , gcall *call, fur_source )
   return true;
 }
 
-// Return the range of a __builtin_ubsan* in CALL and set it in R.
-// CODE is the type of ubsan call (PLUS_EXPR, MINUS_EXPR or
-// MULT_EXPR).
-
-void
-fold_using_range::range_of_builtin_ubsan_call (irange , gcall *call,
-	   tree_code code, fur_source )
-{
-  gcc_checking_assert (code == PLUS_EXPR || code == MINUS_EXPR
-		   || code == MULT_EXPR);
-  tree type = gimple_range_type (call);
-  range_op_handler op (code, type);
-  gcc_checking_assert (op);
-  int_range_max ir0, ir1;
-  tree arg0 = gimple_call_arg (call, 0);
-  tree arg1 = gimple_call_arg (call, 1);
-  src.get_operand (ir0, arg0);
-  src.get_operand (ir1, arg1);
-  // Check for any relation between arg0 and arg1.
-  relation_kind relation = src.query_relation (arg0, arg1);
-
-  bool saved_flag_wrapv = flag_wrapv;
-  // Pretend the arithmetic is wrapping.  If there is any overflow,
-  // we'll complain, but will actually do wrapping operation.
-  flag_wrapv = 1;
-  op.fold_range (r, type, ir0, ir1, relation);
-  flag_wrapv = saved_flag_wrapv;
-
-  // If for both arguments vrp_valueize returned non-NULL, this should
-  // have been already folded and if not, it wasn't folded because of
-  // overflow.  Avoid removing the UBSAN_CHECK_* calls in that case.
-  if (r.singleton_p ())
-r.set_varying (type);
-}
-
 // For a builtin in CALL, return a range in R if known and return
 // TRUE.  Otherwise return FALSE.
 
@@ -909,7 +874,7 @@ fold_using_range::range_of_builtin_call (vrange , gcall *call,
 
 bool
 fold_using_range::range_of_builtin_int_call (irange , gcall *call,
-	 fur_source )
+	 fur_source &)
 {
   combined_fn func = gimple_call_combined_fn (call);
   if (func == CFN_LAST)
@@ -924,16 +889,6 @@ fold_using_range::range_of_builtin_int_call (irange , gcall *call,
   r.set (build_zero_cst (type), build_one_cst (type));
   return true;
 
-case CFN_UBSAN_CHECK_ADD:
-  range_of_builtin_ubsan_call (r, call, PLUS_EXPR, src);
-  return true;
-case CFN_UBSAN_CHECK_SUB:
-  range_of_builtin_ubsan_call (r, call, MINUS_EXPR, src);
-  return true;
-case CFN_UBSAN_CHECK_MUL:
-  range_of_builtin_ubsan_call (r, call, MULT_EXPR, src);
-  return true;
-
 case CFN_GOACC_DIM_SIZE:
 case CFN_GOACC_DIM_POS:
   // Optimizing these two internal functions helps the loop
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index bee225431e8..09b7dd2add3 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -576,6 +576,41 @@ public:
   }
 } op_cfn_clrsb;
 
+
+// Implement range operator for CFN_BUILT_IN_
+class cfn_ubsan : public range_operator
+{
+public:
+  cfn_ubsan (enum tree_code code) { m_code = code; }
+  using range_operator::fold_range;
+  virtual bool fold_range (irange , tree type, const irange ,
+			   const irange , relation_kind rel) const
+  {
+range_op_handler handler (m_code, type);
+gcc_checking_assert (handler);
+
+bool saved_flag_wrapv = flag_wrapv;
+// Pretend the arithmetic is wrapping.  If there is any overflow,
+// we'll complain, but will actually do wrapping operation.
+flag_wrapv = 1;
+bool result = handler.fold_range (r, type, lh, rh, rel);
+flag_wrapv = saved_flag_wrapv;
+
+// If for both arguments vrp_valueize returned non-NULL, this should
+// have been already folded and if not, it wasn't folded because of
+// overflow.  Avoid removing the UBSAN_CHECK_* calls in that case.
+if (result && r.singleton_p ())
+  r.set_varying (type);
+return result;
+  }
+private:
+  enum tree_code m_code;
+};
+
+cfn_ubsan op_cfn_ubsan_add (PLUS_EXPR);
+cfn_ubsan op_cfn_ubsan_sub (MINUS_EXPR);
+cfn_ubsan op_cfn_ubsan_mul (MULT_EXPR);
+
 // Set up a gimple_range_op_handler for any built in function which can be
 // supported via range-ops.
 
@@ -655,6 +690,27 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_int = _cfn_clrsb;
   break;
 
+case CFN_UBSAN_CHECK_ADD:
+  m_op1 = gimple_call_arg (call, 

[PATCH 12/17] Convert CFN_CTZ builtins to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 55738d8d96bb4f39a72cf5e3739d35b39fc2146a Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 18:19:30 -0400
Subject: [PATCH 12/17] Convert CFN_CTZ builtins to range-ops.

	* gimple-range-fold.cc (range_of_builtin_int_call): Remove case
	for CFN_CTZ.
	* gimple-range-op.cc (class cfn_ctz): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc | 61 +--
 gcc/gimple-range-op.cc   | 79 
 2 files changed, 80 insertions(+), 60 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 63eaa90be96..96a138a7a02 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -917,7 +917,7 @@ fold_using_range::range_of_builtin_int_call (irange , gcall *call,
 
   tree type = gimple_range_type (call);
   tree arg;
-  int mini, maxi, zerov = 0, prec;
+  int prec;
   scalar_int_mode mode;
 
   switch (func)
@@ -926,65 +926,6 @@ fold_using_range::range_of_builtin_int_call (irange , gcall *call,
   r.set (build_zero_cst (type), build_one_cst (type));
   return true;
 
-CASE_CFN_CTZ:
-  // __builtin_ctz* return [0, prec-1], except for when the
-  // argument is 0, but that is undefined behavior.
-  //
-  // For __builtin_ctz* consider argument of 0 always undefined
-  // behavior, for internal fns depending on CTZ_DEFINED_VALUE_AT_ZERO.
-  arg = gimple_call_arg (call, 0);
-  prec = TYPE_PRECISION (TREE_TYPE (arg));
-  mini = 0;
-  maxi = prec - 1;
-  mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-  if (gimple_call_internal_p (call))
-	{
-	  if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
-	  && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
-	{
-	  // Handle only the two common values.
-	  if (zerov == -1)
-		mini = -1;
-	  else if (zerov == prec)
-		maxi = prec;
-	  else
-		// Magic value to give up, unless we can prove arg is non-zero.
-		mini = -2;
-	}
-	}
-  src.get_operand (r, arg);
-  if (!r.undefined_p ())
-	{
-	  // If arg is non-zero, then use [0, prec - 1].
-	  if (!range_includes_zero_p (&r))
-	{
-	  mini = 0;
-	  maxi = prec - 1;
-	}
-	  // If some high bits are known to be zero, we can decrease
-	  // the maximum.
-	  wide_int max = r.upper_bound ();
-	  if (max == 0)
-	{
-	  // Argument is [0, 0].  If CTZ_DEFINED_VALUE_AT_ZERO
-	  // is 2 with value -1 or prec, return [-1, -1] or [prec, prec].
-	  // Otherwise ignore the range.
-	  if (mini == -1)
-		maxi = -1;
-	  else if (maxi == prec)
-		mini = prec;
-	}
-	  // If value at zero is prec and 0 is in the range, we can't lower
-	  // the upper bound.  We could create two separate ranges though,
-	  // [0,floor_log2(max)][prec,prec] though.
-	  else if (maxi != prec)
-	maxi = wi::floor_log2 (max);
-	}
-  if (mini == -2)
-	break;
-  r.set (build_int_cst (type, mini), build_int_cst (type, maxi));
-  return true;
-
 CASE_CFN_CLRSB:
   arg = gimple_call_arg (call, 0);
   prec = TYPE_PRECISION (TREE_TYPE (arg));
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index caba49309f9..801c2bb235e 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -489,6 +489,76 @@ cfn_clz::fold_range (irange &r, tree type, const irange &lh,
   return true;
 }
 
+// Implement range operator for CFN_BUILT_IN_CTZ
+class cfn_ctz : public range_operator
+{
+public:
+  cfn_ctz (bool internal) { m_gimple_call_internal_p = internal; }
+  using range_operator::fold_range;
+  virtual bool fold_range (irange &r, tree type, const irange &lh,
+			   const irange &, relation_kind) const;
+private:
+  bool m_gimple_call_internal_p;
+} op_cfn_ctz (false), op_cfn_ctz_internal (true);
+
+bool
+cfn_ctz::fold_range (irange &r, tree type, const irange &lh,
+		 const irange &, relation_kind) const
+{
+  if (lh.undefined_p ())
+return false;
+  int prec = TYPE_PRECISION (lh.type ());
+  int mini = 0;
+  int maxi = prec - 1;
+  int zerov = 0;
+  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (lh.type ());
+
+  if (m_gimple_call_internal_p)
+{
+  if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
+	  && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+	{
+	  // Handle only the two common values.
+	  if (zerov == -1)
+	mini = -1;
+	  else if (zerov == prec)
+	maxi = prec;
+	  else
+	// Magic value to give up, unless we can prove arg is non-zero.
+	mini = -2;
+	}
+}
+  // If arg is non-zero, then use [0, prec - 1].
+  if (!range_includes_zero_p (&lh))
+{
+  mini = 0;
+  maxi = prec - 1;
+}
+  // If some high bits are known to be zero, we can decrease
+  // the maximum.
+  wide_int max = lh.upper_bound ();
+  if (max == 0)
+{
+  // Argument is [0, 0].  If CTZ_DEFINED_VALUE_AT_ZERO
+  // is 2 with value -1 or prec, return [-1, -1] or 

[PATCH 11/17] Convert CFN_CLZ builtins to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From ae1669a98656cca594fcd2fef6bd2cd7308a361f Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 18:12:25 -0400
Subject: [PATCH 11/17] Convert CFN_CLZ builtins to range-ops.

	* gimple-range-fold.cc (range_of_builtin_int_call): Remove case
	for CFN_CLZ.
	* gimple-range-op.cc (class cfn_clz): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc | 61 -
 gcc/gimple-range-op.cc   | 84 
 2 files changed, 84 insertions(+), 61 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index ca531037e13..63eaa90be96 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -926,67 +926,6 @@ fold_using_range::range_of_builtin_int_call (irange &r, gcall *call,
   r.set (build_zero_cst (type), build_one_cst (type));
   return true;
 
-CASE_CFN_CLZ:
-  // __builtin_c[lt]z* return [0, prec-1], except when the
-  // argument is 0, but that is undefined behavior.
-  //
-  // For __builtin_c[lt]z* consider argument of 0 always undefined
-  // behavior, for internal fns depending on C?Z_DEFINED_VALUE_AT_ZERO.
-  arg = gimple_call_arg (call, 0);
-  prec = TYPE_PRECISION (TREE_TYPE (arg));
-  mini = 0;
-  maxi = prec - 1;
-  mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-  if (gimple_call_internal_p (call))
-	{
-	  if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
-	  && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
-	{
-	  // Only handle the single common value.
-	  if (zerov == prec)
-		maxi = prec;
-	  else
-		// Magic value to give up, unless we can prove arg is non-zero.
-		mini = -2;
-	}
-	}
-
-  src.get_operand (r, arg);
-  // From clz of minimum we can compute result maximum.
-  if (!r.undefined_p ())
-	{
-	  // From clz of minimum we can compute result maximum.
-	  if (wi::gt_p (r.lower_bound (), 0, TYPE_SIGN (r.type ())))
-	{
-	  maxi = prec - 1 - wi::floor_log2 (r.lower_bound ());
-	  if (mini == -2)
-		mini = 0;
-	}
-	  else if (!range_includes_zero_p (&r))
-	{
-	  mini = 0;
-	  maxi = prec - 1;
-	}
-	  if (mini == -2)
-	break;
-	  // From clz of maximum we can compute result minimum.
-	  wide_int max = r.upper_bound ();
-	  int newmini = prec - 1 - wi::floor_log2 (max);
-	  if (max == 0)
-	{
-	  // If CLZ_DEFINED_VALUE_AT_ZERO is 2 with VALUE of prec,
-	  // return [prec, prec], otherwise ignore the range.
-	  if (maxi == prec)
-		mini = prec;
-	}
-	  else
-	mini = newmini;
-	}
-  if (mini == -2)
-	break;
-  r.set (build_int_cst (type, mini), build_int_cst (type, maxi));
-  return true;
-
 CASE_CFN_CTZ:
   // __builtin_ctz* return [0, prec-1], except for when the
   // argument is 0, but that is undefined behavior.
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 84837f8ee43..caba49309f9 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -414,6 +414,81 @@ public:
   }
 } op_cfn_popcount;
 
+// Implement range operator for CFN_BUILT_IN_CLZ
+class cfn_clz : public range_operator
+{
+public:
+  cfn_clz (bool internal) { m_gimple_call_internal_p = internal; }
+  using range_operator::fold_range;
+  virtual bool fold_range (irange &r, tree type, const irange &lh,
+			   const irange &, relation_kind) const;
+private:
+  bool m_gimple_call_internal_p;
+} op_cfn_clz (false), op_cfn_clz_internal (true);
+
+bool
+cfn_clz::fold_range (irange &r, tree type, const irange &lh,
+		 const irange &, relation_kind) const
+{
+  // __builtin_c[lt]z* return [0, prec-1], except when the
+  // argument is 0, but that is undefined behavior.
+  //
+  // For __builtin_c[lt]z* consider argument of 0 always undefined
+  // behavior, for internal fns depending on C?Z_DEFINED_VALUE_AT_ZERO.
+  if (lh.undefined_p ())
+return false;
+  int prec = TYPE_PRECISION (lh.type ());
+  int mini = 0;
+  int maxi = prec - 1;
+  int zerov = 0;
+  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (lh.type ());
+  if (m_gimple_call_internal_p)
+{
+  if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
+	  && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+	{
+	  // Only handle the single common value.
+	  if (zerov == prec)
+	maxi = prec;
+	  else
+	// Magic value to give up, unless we can prove arg is non-zero.
+	mini = -2;
+	}
+}
+
+  // From clz of minimum we can compute result maximum.
+  if (wi::gt_p (lh.lower_bound (), 0, TYPE_SIGN (lh.type ())))
+{
+  maxi = prec - 1 - wi::floor_log2 (lh.lower_bound ());
+  if (mini == -2)
+	mini = 0;
+}
+  else if (!range_includes_zero_p (&lh))
+{
+  mini = 0;
+  maxi = prec - 1;
+}
+  if (mini == -2)
+return false;
+  // From clz of maximum we can compute result minimum.
+  wide_int max = lh.upper_bound ();
+  int 

[PATCH 10/17] Convert CFN_BUILT_FFS and CFN_POPCOUNT to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 5f730c650184d4c8bfad513a9e0e593f87a5bf0c Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 18:07:14 -0400
Subject: [PATCH 10/17] Convert CFN_BUILT_FFS and CFN_POPCOUNT to range-ops.

	* gimple-range-fold.cc (range_of_builtin_int_call): Remove case
	for CFN_FFS and CFN_POPCOUNT.
	* gimple-range-op.cc (class cfn_popcount): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc | 22 --
 gcc/gimple-range-op.cc   | 34 ++
 2 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index af1f83f7409..ca531037e13 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -922,28 +922,6 @@ fold_using_range::range_of_builtin_int_call (irange &r, gcall *call,
 
   switch (func)
 {
-CASE_CFN_FFS:
-CASE_CFN_POPCOUNT:
-  // __builtin_ffs* and __builtin_popcount* return [0, prec].
-  arg = gimple_call_arg (call, 0);
-  prec = TYPE_PRECISION (TREE_TYPE (arg));
-  mini = 0;
-  maxi = prec;
-  src.get_operand (r, arg);
-  // If arg is non-zero, then ffs or popcount are non-zero.
-  if (!range_includes_zero_p (&r))
-	mini = 1;
-  // If some high bits are known to be zero, decrease the maximum.
-  if (!r.undefined_p ())
-	{
-	  if (TYPE_SIGN (r.type ()) == SIGNED)
-	range_cast (r, unsigned_type_for (r.type ()));
-	  wide_int max = r.upper_bound ();
-	  maxi = wi::floor_log2 (max) + 1;
-	}
-  r.set (build_int_cst (type, mini), build_int_cst (type, maxi));
-  return true;
-
 CASE_CFN_PARITY:
   r.set (build_zero_cst (type), build_one_cst (type));
   return true;
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 45384d990ae..84837f8ee43 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -387,6 +387,33 @@ cfn_toupper_tolower::fold_range (irange &r, tree type, const irange &lh,
   return true;
 }
 
+// Implement range operator for CFN_BUILT_IN_FFS and CFN_BUILT_IN_POPCOUNT.
+class cfn_popcount : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  virtual bool fold_range (irange &r, tree type, const irange &lh,
+			   const irange &, relation_kind) const
+  {
+if (lh.undefined_p ())
+  return false;
+// __builtin_ffs* and __builtin_popcount* return [0, prec].
+int prec = TYPE_PRECISION (lh.type ());
+// If arg is non-zero, then ffs or popcount are non-zero.
+int mini = range_includes_zero_p (&lh) ? 0 : 1;
+int maxi = prec;
+
+// If some high bits are known to be zero, decrease the maximum.
+int_range_max tmp = lh;
+if (TYPE_SIGN (tmp.type ()) == SIGNED)
+  range_cast (tmp, unsigned_type_for (tmp.type ()));
+wide_int max = tmp.upper_bound ();
+maxi = wi::floor_log2 (max) + 1;
+r.set (build_int_cst (type, mini), build_int_cst (type, maxi));
+return true;
+  }
+} op_cfn_popcount;
+
 // Set up a gimple_range_op_handler for any built in function which can be
 // supported via range-ops.
 
@@ -435,6 +462,13 @@ gimple_range_op_handler::maybe_builtin_call ()
 	}
   break;
 
+CASE_CFN_FFS:
+CASE_CFN_POPCOUNT:
+  m_op1 = gimple_call_arg (call, 0);
+  m_int = &op_cfn_popcount;
+  m_valid = true;
+  break;
+
 default:
   break;
 }
-- 
2.37.3



[PATCH 09/17] Convert CFN_BUILT_IN_TOUPPER and TOLOWER to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 2f5da730f159de238500c82b0c6ef6c9ab91b1c2 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 17:14:30 -0400
Subject: [PATCH 09/17] Convert CFN_BUILT_IN_TOUPPER and TOLOWER to range-ops.

	* gimple-range-fold.cc (get_letter_range): Move to new class.
	(range_of_builtin_int_call): Remove case for CFN_BUILT_IN_TOUPPER
	and CFN_BUILT_IN_TOLOWER.
	* gimple-range-op.cc (class cfn_toupper_tolower): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc | 66 --
 gcc/gimple-range-op.cc   | 77 
 2 files changed, 77 insertions(+), 66 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 417a925ac9f..af1f83f7409 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -887,28 +887,6 @@ fold_using_range::range_of_builtin_ubsan_call (irange &r, gcall *call,
 r.set_varying (type);
 }
 
-// Return TRUE if we recognize the target character set and return the
-// range for lower case and upper case letters.
-
-static bool
-get_letter_range (tree type, irange &lowers, irange &uppers)
-{
-  // ASCII
-  int a = lang_hooks.to_target_charset ('a');
-  int z = lang_hooks.to_target_charset ('z');
-  int A = lang_hooks.to_target_charset ('A');
-  int Z = lang_hooks.to_target_charset ('Z');
-
-  if ((z - a == 25) && (Z - A == 25))
-{
-  lowers = int_range<2> (build_int_cst (type, a), build_int_cst (type, z));
-  uppers = int_range<2> (build_int_cst (type, A), build_int_cst (type, Z));
-  return true;
-}
-  // Unknown character set.
-  return false;
-}
-
 // For a builtin in CALL, return a range in R if known and return
 // TRUE.  Otherwise return FALSE.
 
@@ -944,50 +922,6 @@ fold_using_range::range_of_builtin_int_call (irange &r, gcall *call,
 
   switch (func)
 {
-case CFN_BUILT_IN_TOUPPER:
-  {
-	arg = gimple_call_arg (call, 0);
-	// If the argument isn't compatible with the LHS, do nothing.
-	if (!range_compatible_p (type, TREE_TYPE (arg)))
-	  return false;
-	if (!src.get_operand (r, arg))
-	  return false;
-
-	int_range<3> lowers;
-	int_range<3> uppers;
-	if (!get_letter_range (type, lowers, uppers))
-	  return false;
-
-	// Return the range passed in without any lower case characters,
-	// but including all the upper case ones.
-	lowers.invert ();
-	r.intersect (lowers);
-	r.union_ (uppers);
-	return true;
-  }
-
- case CFN_BUILT_IN_TOLOWER:
-  {
-	arg = gimple_call_arg (call, 0);
-	// If the argument isn't compatible with the LHS, do nothing.
-	if (!range_compatible_p (type, TREE_TYPE (arg)))
-	  return false;
-	if (!src.get_operand (r, arg))
-	  return false;
-
-	int_range<3> lowers;
-	int_range<3> uppers;
-	if (!get_letter_range (type, lowers, uppers))
-	  return false;
-
-	// Return the range passed in without any upper case characters,
-	// but including all the lower case ones.
-	uppers.invert ();
-	r.intersect (uppers);
-	r.union_ (lowers);
-	return true;
-  }
-
 CASE_CFN_FFS:
 CASE_CFN_POPCOUNT:
   // __builtin_ffs* and __builtin_popcount* return [0, prec].
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index d62dff5f92e..45384d990ae 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -322,6 +322,71 @@ public:
   }
 } op_cfn_signbit;
 
+// Implement range operator for CFN_BUILT_IN_TOUPPER and CFN_BUILT_IN_TOLOWER.
+class cfn_toupper_tolower : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  cfn_toupper_tolower (bool toupper)  { m_toupper = toupper; }
+  virtual bool fold_range (irange &r, tree type, const irange &lh,
+			   const irange &, relation_kind) const;
+private:
+  bool get_letter_range (tree type, irange &lowers, irange &uppers) const;
+  bool m_toupper;
+} op_cfn_toupper (true), op_cfn_tolower (false);
+
+// Return TRUE if we recognize the target character set and return the
+// range for lower case and upper case letters.
+
+bool
+cfn_toupper_tolower::get_letter_range (tree type, irange &lowers,
+   irange &uppers) const
+{
+  // ASCII
+  int a = lang_hooks.to_target_charset ('a');
+  int z = lang_hooks.to_target_charset ('z');
+  int A = lang_hooks.to_target_charset ('A');
+  int Z = lang_hooks.to_target_charset ('Z');
+
+  if ((z - a == 25) && (Z - A == 25))
+{
+  lowers = int_range<2> (build_int_cst (type, a), build_int_cst (type, z));
+  uppers = int_range<2> (build_int_cst (type, A), build_int_cst (type, Z));
+  return true;
+}
+  // Unknown character set.
+  return false;
+}
+
+bool
+cfn_toupper_tolower::fold_range (irange &r, tree type, const irange &lh,
+ const irange &, relation_kind) const
+{
+  int_range<3> lowers;
+  int_range<3> uppers;
+  if (!get_letter_range (type, lowers, uppers))
+return false;
+
+  r = lh;
+  if (m_toupper)
+{
+  // Return the range passed in without any lower case characters,
+  // but including all the 

[PATCH 08/17] Convert CFN_BUILT_IN_SIGNBIT to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew


From eb82b9f68eb8d0cc65a1a022154c8e729860ea59 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 21 Sep 2022 09:29:40 -0400
Subject: [PATCH 08/17] Convert CFN_BUILT_IN_SIGNBIT to range-ops.

	* gimple-range-fold.cc (range_of_builtin_int_call): Remove case
	for CFN_BUILT_IN_SIGNBIT.
	* gimple-range-op.cc (class cfn_signbit): New.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments.
---
 gcc/gimple-range-fold.cc | 20 
 gcc/gimple-range-op.cc   | 27 +++
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 63a1f517d28..417a925ac9f 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -944,26 +944,6 @@ fold_using_range::range_of_builtin_int_call (irange &r, gcall *call,
 
   switch (func)
 {
-case CFN_BUILT_IN_SIGNBIT:
-  {
-	arg = gimple_call_arg (call, 0);
-	frange tmp;
-	if (src.get_operand (tmp, arg))
-	  {
-	bool signbit;
-	if (tmp.signbit_p (signbit))
-	  {
-		if (signbit)
-		  r.set_nonzero (type);
-		else
-		  r.set_zero (type);
-		return true;
-	  }
-	return false;
-	  }
-	break;
-  }
-
 case CFN_BUILT_IN_TOUPPER:
   {
 	arg = gimple_call_arg (call, 0);
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index bcc4c3d778c..d62dff5f92e 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -301,6 +301,27 @@ public:
   }
 } op_cfn_constant_p;
 
+// Implement range operator for CFN_BUILT_IN_SIGNBIT.
+class cfn_signbit : public range_operator_float
+{
+public:
+  using range_operator_float::fold_range;
+  virtual bool fold_range (irange &r, tree type, const frange &lh,
+			   const irange &, relation_kind) const
+  {
+bool signbit;
+if (lh.signbit_p (signbit))
+  {
+	if (signbit)
+	  r.set_nonzero (type);
+	else
+	  r.set_zero (type);
+	return true;
+  }
+   return false;
+  }
+} op_cfn_signbit;
+
 // Set up a gimple_range_op_handler for any built in function which can be
 // supported via range-ops.
 
@@ -331,6 +352,12 @@ gimple_range_op_handler::maybe_builtin_call ()
 	m_valid = false;
   break;
 
+case CFN_BUILT_IN_SIGNBIT:
+  m_op1 = gimple_call_arg (call, 0);
+  m_float = &op_cfn_signbit;
+  m_valid = true;
+  break;
+
 default:
   break;
 }
-- 
2.37.3
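
The idea behind cfn_signbit is that the result is only known when every value in the operand's range has the same sign bit.  The sketch below illustrates that with a plain [lo, hi] pair of doubles instead of an frange.  It is an assumption-laden toy: interval_signbit, fold_signbit and signbit_result are hypothetical names, and NaN handling is reduced to "give up".

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Returns the sign bit shared by every value in [lo, hi], or
// std::nullopt when the interval spans both signs or a bound is NaN.
// Assumes lo <= hi, with -0.0 ordered below +0.0.
std::optional<bool> interval_signbit (double lo, double hi)
{
  if (std::isnan (lo) || std::isnan (hi))
    return std::nullopt;
  bool lo_sign = std::signbit (lo);
  if (lo_sign != std::signbit (hi))
    return std::nullopt;
  return lo_sign;
}

// Toy stand-in for the three outcomes in the patch: r.set_zero,
// r.set_nonzero, or returning false from fold_range.
enum class signbit_result { zero, nonzero, unknown };

signbit_result fold_signbit (double lo, double hi)
{
  std::optional<bool> s = interval_signbit (lo, hi);
  if (!s)
    return signbit_result::unknown;
  return *s ? signbit_result::nonzero : signbit_result::zero;
}
```

With this sketch, [-2.0, -1.0] folds to nonzero, [0.0, 3.0] folds to zero, and [-1.0, 1.0] stays unknown.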



[PATCH 07/17] Add range-ops support for builtin functions.

2022-09-22 Thread Andrew MacLeod via Gcc-patches
Check for builtins that can be a range-op entry, and convert 
CFN_BUILT_IN_CONSTANT_P as a first proof of concept.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From b40b3035879cf695b72010858b9705a344292bdb Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 16:53:37 -0400
Subject: [PATCH 07/17] Add range-ops support for builtin functions.

Convert CFN_BUILT_IN_CONSTANT_P as first POC.

	* gimple-range-fold.cc
	(fold_using_range::range_of_builtin_int_call): Remove case for
	CFN_BUILT_IN_CONSTANT_P.
	* gimple-range-op.cc (gimple_range_op_handler::supported_p):
	Check if a call also creates a range-op object.
	(gimple_range_op_handler): Also check builtin calls.
	(class cfn_constant_float_p): New.  Float CFN_BUILT_IN_CONSTANT_P.
	(class cfn_constant_p): New.  Integral CFN_BUILT_IN_CONSTANT_P.
	(gimple_range_op_handler::maybe_builtin_call): Set arguments and
	handler for supported built-in calls.
	* gimple-range-op.h (maybe_builtin_call): New prototype.
---
 gcc/gimple-range-fold.cc |  17 ---
 gcc/gimple-range-op.cc   | 104 ---
 gcc/gimple-range-op.h|   1 +
 3 files changed, 97 insertions(+), 25 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 42408254c35..63a1f517d28 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -944,23 +944,6 @@ fold_using_range::range_of_builtin_int_call (irange &r, gcall *call,
 
   switch (func)
 {
-case CFN_BUILT_IN_CONSTANT_P:
-  {
-	arg = gimple_call_arg (call, 0);
-	Value_Range tmp (TREE_TYPE (arg));
-	if (src.get_operand (tmp, arg) && tmp.singleton_p ())
-	  {
-	r.set (build_one_cst (type), build_one_cst (type));
-	return true;
-	  }
-	if (cfun->after_inlining)
-	  {
-	r.set_zero (type);
-	return true;
-	  }
-	break;
-  }
-
 case CFN_BUILT_IN_SIGNBIT:
   {
 	arg = gimple_call_arg (call, 0);
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index ab5b389449d..bcc4c3d778c 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -123,7 +123,11 @@ gimple_range_op_handler::supported_p (gimple *s)
 {
   enum tree_code code;
   tree type = get_code_and_type (s, code);
-  return (type && range_op_handler (code, type));
+  if (type && range_op_handler (code, type))
+return true;
+  if (is_a <gcall *> (s) && gimple_range_op_handler (s))
+return true;
+  return false;
 }
 
 // Construct a handler object for statement S.
@@ -133,6 +137,8 @@ gimple_range_op_handler::gimple_range_op_handler (gimple *s)
   enum tree_code code;
   tree type = get_code_and_type (s, code);
   m_stmt = s;
+  m_op1 = NULL_TREE;
+  m_op2 = NULL_TREE;
   if (type)
 set_op_handler (code, type);
 
@@ -142,7 +148,7 @@ gimple_range_op_handler::gimple_range_op_handler (gimple *s)
 	case GIMPLE_COND:
 	  m_op1 = gimple_cond_lhs (m_stmt);
 	  m_op2 = gimple_cond_rhs (m_stmt);
-	  break;
+	  return;
 	case GIMPLE_ASSIGN:
 	  m_op1 = gimple_range_base_of_assignment (m_stmt);
 	  if (m_op1 && TREE_CODE (m_op1) == MEM_REF)
@@ -158,14 +164,15 @@ gimple_range_op_handler::gimple_range_op_handler (gimple *s)
 	}
 	  if (gimple_num_ops (m_stmt) >= 3)
 	m_op2 = gimple_assign_rhs2 (m_stmt);
-	  else
-	m_op2 = NULL_TREE;
-	  break;
+	  return;
 	default:
-	  m_op1 = NULL_TREE;
-	  m_op2 = NULL_TREE;
-	  break;
+	  gcc_unreachable ();
+	  return;
   }
+  // If no range-op table entry handled this stmt, check for other supported
+  // statements.
+  if (is_a <gcall *> (m_stmt))
+maybe_builtin_call ();
 }
 
 // Calculate what we can determine of the range of this unary
@@ -247,3 +254,84 @@ gimple_range_op_handler::calc_op2 (vrange &r, const vrange &lhs_range,
 }
   return op2_range (r, type, lhs_range, op1_range);
 }
+
+// 
+
+// Implement range operator for float CFN_BUILT_IN_CONSTANT_P.
+class cfn_constant_float_p : public range_operator_float
+{
+public:
+  using range_operator_float::fold_range;
+  virtual bool fold_range (irange &r, tree type, const frange &lh,
+			   const irange &, relation_kind) const
+  {
+if (lh.singleton_p ())
+  {
+	r.set (build_one_cst (type), build_one_cst (type));
+	return true;
+  }
+if (cfun->after_inlining)
+  {
+	r.set_zero (type);
+	return true;
+  }
+return false;
+  }
+} op_cfn_constant_float_p;
+
+// Implement range operator for integral CFN_BUILT_IN_CONSTANT_P.
+class cfn_constant_p : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  virtual bool fold_range (irange &r, tree type, const irange &lh,
+			   const irange &, relation_kind) const
+  {
+if (lh.singleton_p ())
+  {
+	r.set (build_one_cst (type), build_one_cst (type));
+	return true;
+  }
+if (cfun->after_inlining)
+  {
+	r.set_zero (type);
+	return true;
+  }
+return false;
+  }
+} op_cfn_constant_p;
+
+// Set up a gimple_range_op_handler for any built in function which can be
+// supported via 

[PATCH 06/17] Always check the return value of fold_range.

2022-09-22 Thread Andrew MacLeod via Gcc-patches
The fold_range routine in range-ops returns FALSE if the operation 
fails.  There are a few places which assume the operation was 
successful.  Fix those.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew


From 2f92f685da2ef9e82ee6262519919180df8f2dd9 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 21 Sep 2022 16:15:02 -0400
Subject: [PATCH 06/17] Always check the return value of fold_range.

The fold_range routine in range-ops returns FALSE if the operation
fails.  There are a few places which assume the operation was
successful.  Fix those.

	* gimple-range-fold.cc (range_of_range_op): Set result to
	VARYING if the call to fold_range fails.
	* tree-data-ref.cc (compute_distributive_range): Ditto.
	* tree-vrp.cc (range_fold_binary_expr): Ditto.
	(range_fold_unary_expr): Ditto.
	* value-query.cc (range_query::get_tree_range): Ditto.
---
 gcc/gimple-range-fold.cc | 6 --
 gcc/tree-data-ref.cc | 6 --
 gcc/tree-vrp.cc  | 6 --
 gcc/value-query.cc   | 6 --
 4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index addf3e7f254..42408254c35 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -544,7 +544,8 @@ fold_using_range::range_of_range_op (vrange &r,
 	  // Fold range, and register any dependency if available.
 	  Value_Range r2 (type);
 	  r2.set_varying (type);
-	  handler.fold_range (r, type, range1, r2);
+	  if (!handler.fold_range (r, type, range1, r2))
+	r.set_varying (type);
 	  if (lhs && gimple_range_ssa_p (op1))
 	{
 	  if (src.gori ())
@@ -567,7 +568,8 @@ fold_using_range::range_of_range_op (vrange &r,
 	  fputc ('\n', dump_file);
 	}
 	  // Fold range, and register any dependency if available.
-	  handler.fold_range (r, type, range1, range2, rel);
+	  if (!handler.fold_range (r, type, range1, range2, rel))
+	r.set_varying (type);
 	  if (irange::supports_p (type))
 	relation_fold_and_or (as_a <irange> (r), s, src);
 	  if (lhs)
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index ff9327f6deb..91bfb619d66 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -594,7 +594,8 @@ compute_distributive_range (tree type, value_range &op0_range,
   if (result_range)
 {
   range_op_handler op (code, type);
-  op.fold_range (*result_range, type, op0_range, op1_range);
+  if (!op.fold_range (*result_range, type, op0_range, op1_range))
+	result_range->set_varying (type);
 }
 
   /* The distributive property guarantees that if TYPE is no narrower
@@ -642,7 +643,8 @@ compute_distributive_range (tree type, value_range &op0_range,
   range_op_handler op (code, ssizetype);
   bool saved_flag_wrapv = flag_wrapv;
   flag_wrapv = 1;
-  op.fold_range (wide_range, ssizetype, op0_range, op1_range);
+  if (!op.fold_range (wide_range, ssizetype, op0_range, op1_range))
+wide_range.set_varying (ssizetype);
   flag_wrapv = saved_flag_wrapv;
   if (wide_range.num_pairs () != 1 || !range_int_cst_p (&wide_range))
 return false;
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index c3030a1b130..93482e5d102 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -1069,7 +1069,8 @@ range_fold_binary_expr (value_range *vr,
 vr1.set_varying (expr_type);
   vr0.normalize_addresses ();
   vr1.normalize_addresses ();
-  op.fold_range (*vr, expr_type, vr0, vr1);
+  if (!op.fold_range (*vr, expr_type, vr0, vr1))
+vr->set_varying (expr_type);
 }
 
 /* Perform a unary operation on a range.  */
@@ -1095,7 +1096,8 @@ range_fold_unary_expr (value_range *vr,
 
   value_range vr0_cst (*vr0);
   vr0_cst.normalize_addresses ();
-  op.fold_range (*vr, expr_type, vr0_cst, value_range (expr_type));
+  if (!op.fold_range (*vr, expr_type, vr0_cst, value_range (expr_type)))
+vr->set_varying (expr_type);
 }
 
 /* If the range of values taken by OP can be inferred after STMT executes,
diff --git a/gcc/value-query.cc b/gcc/value-query.cc
index 0bdd670982b..296784be31d 100644
--- a/gcc/value-query.cc
+++ b/gcc/value-query.cc
@@ -252,7 +252,8 @@ range_query::get_tree_range (vrange &r, tree expr, gimple *stmt)
 	  Value_Range r1 (TREE_TYPE (TREE_OPERAND (expr, 1)));
 	  range_of_expr (r0, TREE_OPERAND (expr, 0), stmt);
 	  range_of_expr (r1, TREE_OPERAND (expr, 1), stmt);
-	  op.fold_range (r, type, r0, r1);
+	  if (!op.fold_range (r, type, r0, r1))
+	r.set_varying (type);
 	}
   else
 	r.set_varying (type);
@@ -268,7 +269,8 @@ range_query::get_tree_range (vrange &r, tree expr, gimple *stmt)
 	  Value_Range r1 (type);
 	  r1.set_varying (type);
 	  range_of_expr (r0, TREE_OPERAND (expr, 0), stmt);
-	  op.fold_range (r, type, r0, r1);
+	  if (!op.fold_range (r, type, r0, r1))
+	r.set_varying (type);
 	}
   else
 	r.set_varying (type);
-- 
2.37.3
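
The pattern applied throughout this patch is "if the fold fails, fall back to VARYING rather than trusting whatever is in the result".  A minimal sketch of that caller-side check follows, using a toy integer range rather than GCC's irange.  The names int_range, fold_add and add_or_varying are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <limits>

// Toy closed interval [lo, hi]; "varying" means the full range of long.
struct int_range
{
  long lo;
  long hi;
  void set_varying ()
  {
    lo = std::numeric_limits<long>::min ();
    hi = std::numeric_limits<long>::max ();
  }
};

// True if a + b would overflow a long.
bool add_overflows (long a, long b)
{
  return (b > 0 && a > std::numeric_limits<long>::max () - b)
	 || (b < 0 && a < std::numeric_limits<long>::min () - b);
}

// Toy fold: returns false, leaving R untouched, when it cannot
// compute a result.  The return value therefore must be checked.
bool fold_add (int_range &r, const int_range &a, const int_range &b)
{
  if (add_overflows (a.lo, b.lo) || add_overflows (a.hi, b.hi))
    return false;
  r = {a.lo + b.lo, a.hi + b.hi};
  return true;
}

// Caller-side pattern from the patch: check the result of the fold
// and set the range to varying if it failed.
int_range add_or_varying (const int_range &a, const int_range &b)
{
  int_range r{0, 0};
  if (!fold_add (r, a, b))
    r.set_varying ();
  return r;
}
```

Without the check, a failed fold would leave the caller holding the placeholder [0, 0], which is a wrong (too narrow) range rather than a conservative one.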



[PATCH 05/17] Add missing float fold_range prototype for floats.

2022-09-22 Thread Andrew MacLeod via Gcc-patches
Unary operations require op2 to be the range of the type of the LHS. 
This is so the type for the LHS can be properly set.  There is a 
missing prototype for this combination.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From be2a25adbdc76a770f7470cc9f47892f7a4139ae Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 20 Sep 2022 12:34:08 -0400
Subject: [PATCH 05/17] Add missing float fold_range prototype for floats.

Unary operations require op2 to be the range of the type of the LHS.
This is so the type for the LHS can be properly set.

	* range-op-float.cc (range_operator_float::fold_range): New base
	  method for "int = float op int".
	* range-op.cc (range_op_handler::fold_range): New case.
	* range-op.h: Update prototypes.
---
 gcc/range-op-float.cc | 10 ++
 gcc/range-op.cc   | 13 ++---
 gcc/range-op.h|  5 +
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 2bd3dc9253f..aa5b7ed073d 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -58,6 +58,16 @@ range_operator_float::fold_range (frange &r ATTRIBUTE_UNUSED,
   return false;
 }
 
+bool
+range_operator_float::fold_range (irange &r ATTRIBUTE_UNUSED,
+  tree type ATTRIBUTE_UNUSED,
+  const frange &lh ATTRIBUTE_UNUSED,
+  const irange &rh ATTRIBUTE_UNUSED,
+  relation_kind rel ATTRIBUTE_UNUSED) const
+{
+  return false;
+}
+
 bool
 range_operator_float::fold_range (irange &r ATTRIBUTE_UNUSED,
   tree type ATTRIBUTE_UNUSED,
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 9ae42b8331f..072ebd32109 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -4208,9 +4208,16 @@ range_op_handler::fold_range (vrange &r, tree type,
 			   as_a <irange> (rh), rel);
 
   if (is_a <irange> (r))
-return m_float->fold_range (as_a <irange> (r), type,
-as_a <frange> (lh),
-as_a <frange> (rh), rel);
+{
+  if (is_a <irange> (rh))
+	return m_float->fold_range (as_a <irange> (r), type,
+as_a <frange> (lh),
+as_a <irange> (rh), rel);
+  else
+	return m_float->fold_range (as_a <irange> (r), type,
+as_a <frange> (lh),
+as_a <frange> (rh), rel);
+}
   return m_float->fold_range (as_a <frange> (r), type,
 			  as_a <frange> (lh),
 			  as_a <frange> (rh), rel);
diff --git a/gcc/range-op.h b/gcc/range-op.h
index b4b5101a9e0..b2f063afb07 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -117,6 +117,11 @@ public:
 			   const frange &lh,
 			   const frange &rh,
 			   relation_kind rel = VREL_VARYING) const;
+  // Unary operations have the range of the LHS as op2.
+  virtual bool fold_range (irange &r, tree type,
+			   const frange &lh,
+			   const irange &rh,
+			   relation_kind rel = VREL_VARYING) const;
   virtual bool fold_range (irange &r, tree type,
 			   const frange &lh,
 			   const frange &rh,
-- 
2.37.3



[PATCH 04/17] Fix calc_op1 for undefined op2_range.

2022-09-22 Thread Andrew MacLeod via Gcc-patches
Unary operations pass the type of operand 1 into op1_range.  If that 
range is undefined, the routine blindly picks the type of operand 
2, which in the case of a unary op, does not exist and traps.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From a7a6649f4e7c459a95dee1600554ad06aaeb1cf6 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 22 Sep 2022 10:27:17 -0400
Subject: [PATCH 04/17] Fix calc_op1 for undefined op2_range.

Unary operations pass the type of operand 1 into op1_range.  If that
range is undefined, the routine blindly picks the type of operand 2,
which in the case of a unary op, does not exist and traps.

	* gimple-range-op.cc (gimple_range_op_handler::calc_op1): Use
	  operand 1 for second range if there is no operand 2.
---
 gcc/gimple-range-op.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index f03125a0fc5..ab5b389449d 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -208,10 +208,14 @@ gimple_range_op_handler::calc_op1 (vrange , const vrange _range,
   // If op2 is undefined, solve as if it is varying.
   if (op2_range.undefined_p ())
 {
-  // This is sometimes invoked on single operand stmts.
   if (gimple_num_ops (m_stmt) < 3)
 	return false;
-  tree op2_type = TREE_TYPE (operand2 ());
+  tree op2_type;
+  // This is sometimes invoked on single operand stmts.
+  if (operand2 ())
+	op2_type = TREE_TYPE (operand2 ());
+  else
+	op2_type = TREE_TYPE (operand1 ());
   Value_Range trange (op2_type);
   trange.set_varying (op2_type);
   return op1_range (r, type, lhs_range, trange);
-- 
2.37.3



[PATCH 03/17] Create gimple_range_op_handler in a new source file.

2022-09-22 Thread Andrew MacLeod via Gcc-patches
Range-ops is meant to be IL independent.  Some gimple processing has
been placed in range-ops, and some is located in gori.  Split it all into
a file and isolate it in a new class gimple_range_op_handler.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 51ce06385bf259a092f830f1a6dcc4b98757919e Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 1 Sep 2022 10:34:55 -0400
Subject: [PATCH 03/17] Create gimple_range_op_handler in a new source file.

Range-ops is meant to be IL independent.  Some gimple processing has
been placed in range-ops, and some is located in gori.  Split it all into
a file and isolate it in a new class gimple_range_op_handler.

	* Makefile.in (OBJS): Add gimple-range-op.o.
	* gimple-range-edge.cc (gimple_outgoing_range_stmt_p): Use
	gimple_range_op_handler.
	* gimple-range-fold.cc (gimple_range_base_of_assignment): Move
	to a method in gimple_range_op_handler.
	(gimple_range_operand1): Ditto.
	(gimple_range_operand2): Ditto.
	(fold_using_range::fold_stmt): Use gimple_range_op_handler.
	(fold_using_range::range_of_range_op): Ditto.
	(fold_using_range::relation_fold_and_or): Ditto.
	(fur_source::register_outgoing_edges): Ditto.
	(gimple_range_ssa_names): Relocate to gimple-range-op.cc.
	* gimple-range-fold.h: Adjust prototypes.
	* gimple-range-gori.cc (gimple_range_calc_op1): Move
	to a method in gimple_range_op_handler.
	(gimple_range_calc_op2): Ditto.
	(gori_compute::compute_operand_range): Use
	gimple_range_op_handler.
	(gori_compute::compute_logical_operands): Ditto.
	(compute_operand1_range): Ditto.
	(gori_compute::compute_operand2_range): Ditto.
	(gori_compute::compute_operand1_and_operand2_range): Ditto.
	* gimple-range-gori.h: Adjust prototypes.
	* gimple-range-op.cc: New.  Supply gimple_range_op_handler methods.
	* gimple-range-op.h: New.  Supply gimple_range_op_handler class.
	* gimple-range.cc (gimple_ranger::prefill_name): Use
	gimple_range_op_handler.
	(gimple_ranger::prefill_stmt_dependencies): Ditto.
	* gimple-range.h: Include gimple-range-op.h.
	* range-op.cc (range_op_handler::range_op_handler): Adjust and
	remove gimple * parameter option.
	* range-op.h: Adjust prototypes.
---
 gcc/Makefile.in  |   1 +
 gcc/gimple-range-edge.cc |   2 +-
 gcc/gimple-range-fold.cc | 153 +---
 gcc/gimple-range-fold.h  |  12 +-
 gcc/gimple-range-gori.cc | 134 ++---
 gcc/gimple-range-gori.h  |  27 ++---
 gcc/gimple-range-op.cc   | 245 +++
 gcc/gimple-range-op.h|  51 
 gcc/gimple-range.cc  |  11 +-
 gcc/gimple-range.h   |   2 +-
 gcc/range-op.cc  |  37 ++
 gcc/range-op.h   |   4 +-
 12 files changed, 386 insertions(+), 293 deletions(-)
 create mode 100644 gcc/gimple-range-op.cc
 create mode 100644 gcc/gimple-range-op.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a4689d52e36..59b67d99441 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
 	gimple-range-fold.o \
 	gimple-range-gori.o \
 	gimple-range-infer.o \
+	gimple-range-op.o \
 	gimple-range-trace.o \
 	gimple-ssa-backprop.o \
 	gimple-ssa-isolate-paths.o \
diff --git a/gcc/gimple-range-edge.cc b/gcc/gimple-range-edge.cc
index 194e8f87a4b..95deadffc55 100644
--- a/gcc/gimple-range-edge.cc
+++ b/gcc/gimple-range-edge.cc
@@ -43,7 +43,7 @@ gimple_outgoing_range_stmt_p (basic_block bb)
   if (!gsi_end_p (gsi))
 {
   gimple *s = gsi_stmt (gsi);
-  if (is_a (s) && range_op_handler (s))
+  if (is_a (s) && gimple_range_op_handler::supported_p (s))
 	return gsi_stmt (gsi);
   if (is_a  (s))
 	return gsi_stmt (gsi);
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index a45fc7ad4c6..addf3e7f254 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -42,7 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "vr-values.h"
 #include "range.h"
 #include "value-query.h"
-#include "range-op.h"
+#include "gimple-range-op.h"
 #include "gimple-range.h"
 // Construct a fur_source, and set the m_query field.
 
@@ -463,73 +463,6 @@ gimple_range_adjustment (vrange , const gimple *stmt)
 }
 }
 
-// Return the base of the RHS of an assignment.
-
-static tree
-gimple_range_base_of_assignment (const gimple *stmt)
-{
-  gcc_checking_assert (gimple_code (stmt) == GIMPLE_ASSIGN);
-  tree op1 = gimple_assign_rhs1 (stmt);
-  if (gimple_assign_rhs_code (stmt) == ADDR_EXPR)
-return get_base_address (TREE_OPERAND (op1, 0));
-  return op1;
-}
-
-// Return the first operand of this statement if it is a valid operand
-// supported by ranges, otherwise return NULL_TREE.  Special case is
-// &(SSA_NAME expr), return the SSA_NAME instead of the ADDR expr.
-
-tree
-gimple_range_operand1 (const gimple *stmt)
-{
-  gcc_checking_assert (range_op_handler (stmt));
-
-  switch (gimple_code (stmt))
-{
-  case GIMPLE_COND:
-	return gimple_cond_lhs (stmt);
-  case GIMPLE_ASSIGN:
-	{
-	  tree base = 

[PATCH 02/17] Adjust range_op_handler to store the handler directly.

2022-09-22 Thread Andrew MacLeod via Gcc-patches
Range_op_handler currently stores a tree code and a type.  It defers 
checking to see if there is a valid handler until asked.


This change checks at constructor time and stores a pointer to the
handler if there is one.
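
As a rough illustration of the change (not the GCC code itself), the
pattern is to resolve the handler once in the constructor and keep the
pointer.  `Op`, `lookup_op` and `Handler` below are hypothetical names
invented for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical operator object; GCC's range_operator is far richer.
struct Op { std::string name; };

// Hypothetical lookup table keyed by an operation code.
inline const Op *lookup_op(int code) {
  static const std::map<int, Op> table = {{1, Op{"plus"}}, {2, Op{"minus"}}};
  auto it = table.find(code);
  return it == table.end() ? nullptr : &it->second;
}

// The lookup happens once, at construction; later queries only test or
// dereference the stored pointer.
class Handler {
public:
  explicit Handler(int code) : m_op(lookup_op(code)) {}
  explicit operator bool() const { return m_op != nullptr; }
  const std::string &name() const {
    assert(m_op);  // callers are expected to test validity first
    return m_op->name;
  }
private:
  const Op *m_op;
};
```

Because the object holds a pointer rather than a (code, type) pair, a
handler that lives outside the lookup table could be stored the same way.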


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 24c473a14d3cbe6fc44997122b532cb9406497cb Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 31 Aug 2022 14:07:13 -0400
Subject: [PATCH 02/17] Adjust range_op_handler to store the handler directly.

Range_op_handler currently stores a tree code and a type.  It defers
checking to see if there is a valid handler until asked.
This change checks at constructor time and stores a pointer to
the handler if there is one.

	* range-op.cc (range_op_handler::set_op_handler): Set new fields.
	(range_op_handler::range_op_handler): Likewise.
	(range_op_handler::operator bool): Remove.
	(range_op_handler::fold_range): Use appropriate handler.
	(range_op_handler::op1_range): Likewise.
	(range_op_handler::op2_range): Likewise.
	(range_op_handler::lhs_op1_relation): Likewise.
	(range_op_handler::lhs_op2_relation): Likewise.
	(range_op_handler::op1_op2_relation): Likewise.
	* range-op.h (class range_op_handler): Store handler pointers.
	(range_op_handler:: operator bool): Inline.
---
 gcc/range-op.cc | 246 +---
 gcc/range-op.h  |   8 +-
 2 files changed, 114 insertions(+), 140 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 806edf1012e..f642b3f26de 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -4159,48 +4159,63 @@ get_float_handler (enum tree_code code, tree)
   return (*floating_tree_table)[code];
 }
 
+void
+range_op_handler::set_op_handler (tree_code code, tree type)
+{
+  if (irange::supports_p (type))
+{
+  m_float = NULL;
+  m_int = get_handler (code, type);
+  m_valid = m_int != NULL;
+}
+  else if (frange::supports_p (type))
+{
+  m_int = NULL;
+  m_float = get_float_handler (code, type);
+  m_valid = m_float != NULL;
+}
+  else
+{
+  m_int = NULL;
+  m_float = NULL;
+  m_valid = false;
+}
+}
+
 range_op_handler::range_op_handler (tree_code code, tree type)
-  : m_code (code), m_type (type)
 {
+  set_op_handler (code, type);
 }
 
 range_op_handler::range_op_handler (const gimple *s)
 {
+  tree_code code = NOP_EXPR;
+  tree type = NULL_TREE;
+
   if (const gassign *ass = dyn_cast (s))
 {
-  m_code = gimple_assign_rhs_code (ass);
+  code = gimple_assign_rhs_code (ass);
   // The LHS of a comparison is always an int, so we must look at
   // the operands.
-  if (TREE_CODE_CLASS (m_code) == tcc_comparison)
-	m_type = TREE_TYPE (gimple_assign_rhs1 (ass));
+  if (TREE_CODE_CLASS (code) == tcc_comparison)
+	type = TREE_TYPE (gimple_assign_rhs1 (ass));
   else
-	m_type = TREE_TYPE (gimple_assign_lhs (ass));
+	type = TREE_TYPE (gimple_assign_lhs (ass));
 }
   else if (const gcond *cond = dyn_cast (s))
 {
-  m_code = gimple_cond_code (cond);
-  m_type = TREE_TYPE (gimple_cond_lhs (cond));
+  code = gimple_cond_code (cond);
+  type = TREE_TYPE (gimple_cond_lhs (cond));
 }
-  else
+
+  if (!type)
 {
-  // A null type means there is no handler for this combination,
-  // but the decision whether there is one or not, is delayed
-  // until operator bool below is queried.
-  m_code = NOP_EXPR;
-  m_type = nullptr;
+  m_int = NULL;
+  m_float = NULL;
+  m_valid = false;
 }
-}
-
-// Return TRUE if there is a handler available for the current
-// combination of tree_code and type.
-
-range_op_handler::operator bool () const
-{
-  if (!m_type)
-return false;
-  if (frange::supports_p (m_type))
-return get_float_handler (m_code, m_type);
-  return get_handler (m_code, m_type);
+  else
+set_op_handler (code, type);
 }
 
 bool
@@ -4209,26 +4224,19 @@ range_op_handler::fold_range (vrange , tree type,
 			  const vrange ,
 			  relation_kind rel) const
 {
-  if (irange::supports_p (m_type))
-{
-  range_operator *op = get_handler (m_code, m_type);
-  return op->fold_range (as_a  (r), type,
-			 as_a  (lh),
-			 as_a  (rh), rel);
-}
-  if (frange::supports_p (m_type))
-{
-  range_operator_float *op = get_float_handler (m_code, m_type);
-  if (is_a  (r))
-	return op->fold_range (as_a  (r), type,
-			   as_a  (lh),
-			   as_a  (rh), rel);
-  return op->fold_range (as_a  (r), type,
-			 as_a  (lh),
-			 as_a  (rh), rel);
-}
-  gcc_unreachable ();
-  return false;
+  gcc_checking_assert (m_valid);
+  if (m_int)
+return m_int->fold_range (as_a  (r), type,
+			   as_a  (lh),
+			   as_a  (rh), rel);
+
+  if (is_a  (r))
+return m_float->fold_range (as_a  (r), type,
+as_a  (lh),
+as_a  (rh), rel);
+  return m_float->fold_range (as_a  (r), type,
+			  as_a  (lh),
+			  as_a  (rh), rel);
 }
 
 bool
@@ -4237,26 +4245,19 @@ 

[PATCH 01/17] Replace another snippet with a call to, gimple_range_ssa_names.

2022-09-22 Thread Andrew MacLeod via Gcc-patches

When the original patch was applied, I missed a spot which could
also be rewritten to use gimple_range_ssa_names.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 3cba5cd6e019182dbff756f621af048d55cdda98 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 31 Aug 2022 17:28:09 -0400
Subject: [PATCH 01/17] Replace another snippet with a call to
 gimple_range_ssa_names.

When the original patch was applied, I missed a spot which could
also be rewritten to use gimple_range_ssa_names.

	* tree-ssa-threadbackward.cc
	  (back_threader::find_paths_to_names): Replace sequence with
	  a call to gimple_range_ssa_names.
---
 gcc/tree-ssa-threadbackward.cc | 20 +++-
 1 file changed, 3 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 9725f50e639..2a8cfa3ee01 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -435,28 +435,14 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
 		}
 	  /* For other local defs process their uses, amending
 		 imports on the way.  */
-	  else if (gassign *ass = dyn_cast  (def_stmt))
+	  else
 		{
 		  tree ssa[3];
-		  if (range_op_handler (ass))
-		{
-		  ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
-		  ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
-		  ssa[2] = NULL_TREE;
-		}
-		  else if (gimple_assign_rhs_code (ass) == COND_EXPR)
-		{
-		  ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
-		  ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
-		  ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
-		}
-		  else
-		continue;
-		  for (unsigned j = 0; j < 3; ++j)
+		  unsigned lim = gimple_range_ssa_names (ssa, 3, def_stmt);
+		  for (unsigned j = 0; j < lim; ++j)
 		{
 		  tree rhs = ssa[j];
 		  if (rhs
-			  && TREE_CODE (rhs) == SSA_NAME
 			  && bitmap_set_bit (m_imports,
 	 SSA_NAME_VERSION (rhs)))
 			{
-- 
2.37.3



[PATCH 00/17] Move builtin functions to range-ops.

2022-09-22 Thread Andrew MacLeod via Gcc-patches
Builtin functions have been handled until now as special cases in
gimple-range-fold.cc. This set of patches makes the changes required to
create a range_operator for those functions.  This allows them to behave
like a normal unary/binary operation throughout the ranger ecosystem.
In particular, it will enable us to make GORI aware of them as we can
now provide op1_range and op2_range routines, as well as registering any
relations as needed.  None of these enhanced functions are provided yet;
this is strictly a conversion.  This enables us to do this for any
operation with 1 or 2 operands.
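
To give a feel for what "a builtin behaving like a unary operation"
means, here is a small standalone sketch.  It does not use GCC's classes:
`URange`, `UnaryOp` and `PopcountOp` are hypothetical, and the bounds
computed are deliberately simple (sound, but not the tightest possible).

```cpp
#include <cassert>

// Hypothetical closed range of unsigned values [lo, hi].
struct URange { unsigned lo, hi; };

// Hypothetical interface for a one-operand range operator.
struct UnaryOp {
  virtual ~UnaryOp() = default;
  virtual URange fold(URange arg) const = 0;
};

// Popcount expressed as a unary operator: given the range of the
// argument, produce a range that contains every possible result.
struct PopcountOp : UnaryOp {
  URange fold(URange arg) const override {
    assert(arg.lo <= arg.hi);
    // No value <= hi can have more set bits than hi has significant bits.
    unsigned max_bits = 0;
    for (unsigned v = arg.hi; v != 0; v >>= 1)
      ++max_bits;
    // A nonzero argument always has at least one bit set.
    unsigned min_bits = arg.lo > 0 ? 1u : 0u;
    return {min_bits, max_bits};
  }
};
```

Once a builtin has this shape, generic code can call `fold` through the
base interface without knowing it is dealing with a function call.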


There are 17 patches, some are bug fixes, some are infrastructure, a 
couple are just missing functionality, but most of them are conversions
of the builtins.  I did each builtin as a separate patch so if a 
regression triggers, we can pinpoint it faster.


Of note:

Patch 2 : Modifies the range_op_handler class to store an integer 
handler and a float handler rather than the old tree-code and type.  By 
looking up the handler immediately and storing the pointer, this opens 
up the possibility of processing handlers which are not in a tree-code 
table.


Patch 3 : Range-ops is supposed to be IL independent, designed to work in
RTL land as well.  A little bit of gimple had crept in, and I needed a 
layer that is gimple aware.  This patch introduces a 
gimple_range_op_handler which inherits from range_op_handler, and acts 
as the connector between the gimple IL and range-ops. Some of that code 
was in range-ops, and a lot more was located in the GORI file.  All 
those bits and pieces have been moved into the new class.


Patch 7 : This patch adjusts gimple_range_op_handler constructor to also 
check if a builtin function call might have a range_operator object 
available, and if so, return that.  This initial conversion also adds 
CFN_BUILT_IN_CONSTANT_P as the first builtin, removing it from the big 
switch in gimple-range-fold.cc.


Patch 8-16 :  Moves SIGNBIT, TOUPPER/LOWER, POPCOUNT, CLZ, CTZ, CLRSB, 
UBSAN*, STRLEN, and GOACC to range-ops.


Patch 17 : Finally, moves CFN_BUILT_IN_PARITY to range-ops, and removes
the builtin-function code checks from range_of_call in gimple-range-fold.cc


These patches all bootstrap on x86_64-pc-linux-gnu with no regressions.
Performance-wise, it all ends up as approximately a wash (VRP a hair
slower, threading a hair faster).


Pushed.

Andrew



[PATCH] c++ modules: ICE with class NTTP argument [PR100616]

2022-09-22 Thread Patrick Palka via Gcc-patches
When streaming in the artificial VAR_DECL synthesized for a class NTTP
argument, we end up crashing from complete_vars because the call to
maybe_register_incomplete_var from add_module_namespace_decl for this
VAR_DECL pushes an unexpected NULL_TREE type onto the incomplete_vars
vector.

This patch fixes this by checking for NULL_TREE before pushing onto
the vector.  This avoids the crash, but I noticed we still appear to
mishandle these artificial VAR_DECLs across translation units: the lookup
from get_template_parm_object for an existing VAR_DECL for the given
class NTTP argument fails to find the streamed-in VAR_DECL from the
other translation unit, so we end up creating a second VAR_DECL, but
that causes specialization equivalency issues in the XFAIL'd part of the
below test.  I'm afraid I don't understand why the lookup fails here
despite having done add_module_namespace_decl during stream-in, but
fixing the ICE seems like a safe and useful step towards enabling class
NTTP arguments used in modules.
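
The shape of the fix, reduced to a standalone sketch: only record the
variable when there is an open class to wait for.  `Context`, `Entry`
and `maybe_register` are hypothetical stand-ins, not the front-end types.

```cpp
#include <cassert>
#include <vector>

struct Context { int id; };                      // stand-in for a class scope
struct Entry { int var; const Context *ctx; };   // stand-in for incomplete_var

// Record VAR only when an open class exists.  Pushing an entry with a
// null context is what later tripped up the consumer of the vector.
inline bool maybe_register(std::vector<Entry> &pending, int var,
                           const Context *open_class) {
  if (!open_class)
    return false;
  pending.push_back({var, open_class});
  assert(pending.back().ctx != nullptr);  // invariant the consumer relies on
  return true;
}
```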

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/100616

gcc/cp/ChangeLog:

* decl.cc (maybe_register_incomplete_var): Check result of
outermost_open_class.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr100616_a.C: New test.
* g++.dg/modules/pr100616_b.C: New test.
---
 gcc/cp/decl.cc|  8 +---
 gcc/testsuite/g++.dg/modules/pr100616_a.C |  8 
 gcc/testsuite/g++.dg/modules/pr100616_b.C | 10 ++
 3 files changed, 23 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr100616_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr100616_b.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 80467c19254..722b64793ed 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -18235,9 +18235,11 @@ maybe_register_incomplete_var (tree var)
{
  /* When the outermost open class is complete we can resolve any
 pointers-to-members.  */
- tree context = outermost_open_class ();
- incomplete_var iv = {var, context};
- vec_safe_push (incomplete_vars, iv);
+ if (tree context = outermost_open_class ())
+   {
+ incomplete_var iv = {var, context};
+ vec_safe_push (incomplete_vars, iv);
+   }
}
 }
 }
diff --git a/gcc/testsuite/g++.dg/modules/pr100616_a.C b/gcc/testsuite/g++.dg/modules/pr100616_a.C
new file mode 100644
index 000..788af2eb533
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr100616_a.C
@@ -0,0 +1,8 @@
+// PR c++/100616
+// { dg-additional-options "-std=c++20 -fmodules-ts" }
+// { dg-module-cmi pr100616 }
+export module pr100616;
+
+template<auto> struct C { };
+struct A { };
+C<A{}> c1;
diff --git a/gcc/testsuite/g++.dg/modules/pr100616_b.C b/gcc/testsuite/g++.dg/modules/pr100616_b.C
new file mode 100644
index 000..8037ceda3ed
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr100616_b.C
@@ -0,0 +1,10 @@
+// PR c++/100616
+// { dg-additional-options "-std=c++20 -fmodules-ts" }
+module pr100616;
+
+C<A{}> c2;
+
+// FIXME: We don't reuse the artificial VAR_DECL for the class NTTP argument A{}
+// from the other translation unit, which causes these types to be different.
+using ty_a = decltype(c1);
+using ty_a = decltype(c2); // { dg-bogus "conflicting" "" { xfail *-*-* } }
-- 
2.38.0.rc0.52.gdda7228a83



[PATCH][_GLIBCXX_DEBUG][_GLIBCXX_INLINE_VERSION] Add missing printers

2022-09-22 Thread François Dumont via Gcc-patches

Hi

    This patch fixes failures in _GLIBCXX_INLINE_VERSION mode when running:

make check-debug RUNTESTFLAGS=prettyprinters.exp

    libstdc++: [_GLIBCXX_INLINE_VERSION] Add gdb pretty print for _GLIBCXX_DEBUG

    In _GLIBCXX_DEBUG mode containers are in std::__debug namespace but not template
    parameters. In _GLIBCXX_INLINE_VERSION mode most types are in std::__8 namespace but
    not std::__debug containers. We need to register specific type printers for this
    combination.

    libstdc++-v3/ChangeLog:

    * python/libstdcxx/v6/printers.py (add_one_template_type_printer): Register
    printer for types in std::__debug namespace with template parameters in std::__8
    namespace.

Ok to commit ?

François
diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 24a6462e496..1e9d0627e9f 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -2036,6 +2036,10 @@ def add_one_template_type_printer(obj, name, defargs):
 printer = TemplateTypePrinter(ns+name, defargs)
 gdb.types.register_type_printer(obj, printer)
 
+# Add type printer for same type in debug namespace:
+printer = TemplateTypePrinter('std::__debug::'+name, defargs)
+gdb.types.register_type_printer(obj, printer)
+
 class FilteringTypePrinter(object):
 r"""
 A type printer that uses typedef names for common template specializations.


[PING] [PATCH] libstdc++: basic_filebuf: don't flush more often than necessary.

2022-09-22 Thread Charles-François Natali via Gcc-patches
On Mon, Sep 5, 2022, 23:51 Charles-Francois Natali 
wrote:

> `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of
> size 1024 and above, seemingly as an optimisation.
>
> This can have a significant performance impact if the overhead of a
> `write` syscall is non-negligible, e.g. on a slow disk, on network
> filesystems, or simply during IO contention because instead of flushing
> every `BUFSIZ` (by default), we can flush every 1024 char.
> The impact is even greater with custom larger buffers, e.g. for network
> filesystems, because the code could issue `write` for example 1000X more
> often than necessary with respect to the buffer size.
> It also introduces a significant discontinuity in performance when
> writing chunks of size 1024 and above.
>
> See this reproducer, which writes a fixed number of chunks to a file
> opened with `O_SYNC` (to replicate high-latency `write`) for varying
> chunk sizes:
>
> ```
> $ cat test_fstream_flush.cpp
> #include <cassert>
> #include <ext/stdio_filebuf.h>
> #include <fcntl.h>
> #include <ostream>
> #include <string>
> #include <vector>
>
> int
> main(int argc, char* argv[])
> {
>   assert(argc == 3);
>
>   const auto* path = argv[1];
>   const auto chunk_size = std::stoul(argv[2]);
>
>   const auto fd =
>     open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666);
>   assert(fd >= 0);
>
>   auto filebuf = __gnu_cxx::stdio_filebuf<char>(fd, std::ios_base::out);
>   auto stream = std::ostream(&filebuf);
>
>   const auto chunk = std::vector<char>(chunk_size);
>
>   for (auto i = 0; i < 1'000; ++i) {
> stream.write(chunk.data(), chunk.size());
>   }
>
>   return 0;
> }
> ```
>
> ```
> $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17
> $ for i in $(seq 1021 1025); do echo -e "\n$i"; time
> /tmp/test_fstream_flush /tmp/foo $i; done
>
> 1021
>
> real0m0.997s
> user0m0.000s
> sys 0m0.038s
>
> 1022
>
> real0m0.939s
> user0m0.005s
> sys 0m0.032s
>
> 1023
>
> real0m0.954s
> user0m0.005s
> sys 0m0.034s
>
> 1024
>
> real0m7.102s
> user0m0.040s
> sys 0m0.192s
>
> 1025
>
> real0m7.204s
> user0m0.025s
> sys 0m0.209s
> ```
>
> See the huge drop in performance at the 1024-boundary.
>
> An `strace` confirms that from size 1024 we effectively defeat
> buffering:
> 1023-sized writes
> ```
> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush
> /tmp/foo 1023 2>&1 | head -n5
> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC,
> 0666) = 3
> writev(3,
> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=8184},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1023}], 2) = 9207
> writev(3,
> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=8184},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1023}], 2) = 9207
> writev(3,
> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=8184},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1023}], 2) = 9207
> writev(3,
> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=8184},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1023}], 2) = 9207
> ```
>
> vs 1024-sized writes
> ```
> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush
> /tmp/foo 1024 2>&1 | head -n5
> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC,
> 0666) = 3
> writev(3, [{iov_base=NULL, iov_len=0},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1024}], 2) = 1024
> writev(3, [{iov_base="", iov_len=0},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1024}], 2) = 1024
> writev(3, [{iov_base="", iov_len=0},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1024}], 2) = 1024
> writev(3, [{iov_base="", iov_len=0},
> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=1024}], 2) = 1024
> ```
>
> Instead, it makes sense to only bypass the buffer if the amount of data
> to be written is larger than the buffer capacity.
>
> Closes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63746
> ---
>  libstdc++-v3/include/bits/fstream.tcc |  9 +--
>  .../27_io/basic_filebuf/sputn/char/63746.cc   | 55 +++
>  2 files changed, 58 insertions(+), 6 deletions(-)
>  create mode 100644
> libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc
>
> diff --git a/libstdc++-v3/include/bits/fstream.tcc
> b/libstdc++-v3/include/bits/fstream.tcc
> index 7ccc887b8..2e9369628 100644
> --- a/libstdc++-v3/include/bits/fstream.tcc
> +++ b/libstdc++-v3/include/bits/fstream.tcc
> @@ -757,23 +757,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  {
>streamsize __ret = 0;
>// Optimization in the always_noconv() case, to be generalized in
> 
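
The rule proposed in the message above can be contrasted with the
behaviour it describes in a few lines.  This is a simplified sketch of
the two decisions only, not the libstdc++ implementation; the function
names and the 8192 capacity used in the usage note are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Behaviour as described in the report: any chunk of 1024 chars or more
// skips the buffer, regardless of how large the buffer is.
inline bool bypass_fixed_threshold(std::size_t n) {
  return n >= 1024;
}

// Proposed behaviour: skip the buffer only when the chunk cannot fit in
// it at all, so the flush frequency follows the buffer capacity.
inline bool bypass_when_exceeds_capacity(std::size_t n, std::size_t capacity) {
  assert(capacity > 0);
  return n > capacity;
}
```

With a hypothetical capacity of 8192, a 1024-char chunk is written
directly under the first rule but buffered under the second, which is
the difference the timings in the message are measuring.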

[PATCH] frange: dump hex values when dumping FP numbers.

2022-09-22 Thread Aldy Hernandez via Gcc-patches
It has been suggested that if we start bumping numbers by an ULP when
calculating open ranges (for example the numbers less than 3.0) that
dumping these will become increasingly harder to read, and instead we
should opt for the hex representation.  I still find the floating
point representation easier to read for most numbers, but perhaps we
could have both?

With this patch this is the representation for [15.0, 20.0]:

 [frange] float [1.5e+1 (0x0.fp+4), 2.0e+1 (0x0.ap+5)]

Would you find this useful, or should we stick to the hex
representation only (or something altogether different)?
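
For readers who want to experiment with the "both representations" idea
outside of GCC, a rough equivalent can be written with the standard
`%a` conversion.  This is only a sketch: `decimal_and_hex` is a
hypothetical helper, and `%a` usually normalizes the mantissa
differently from real_to_hexadecimal, so the hex digits will not match
the dump shown above character for character.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format V as "<decimal> (<hex float>)", mirroring the layout of the
// proposed dump.  The hex part is exact; the decimal part is rounded.
inline std::string decimal_and_hex(double v) {
  char buf[96];
  int n = std::snprintf(buf, sizeof buf, "%.1e (%a)", v, v);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
  return std::string(buf);
}
```

The decimal part stays easy to read, while the hex part makes a
one-ULP difference between two endpoints visible.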

Tested on x86-64 Linux.

gcc/ChangeLog:

* value-range-pretty-print.cc (vrange_printer::print_real_value): New.
(vrange_printer::visit): Call print_real_value.
* value-range-pretty-print.h: New print_real_value.
---
 gcc/value-range-pretty-print.cc | 16 
 gcc/value-range-pretty-print.h  |  1 +
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/value-range-pretty-print.cc b/gcc/value-range-pretty-print.cc
index eb7442229ba..51be037c254 100644
--- a/gcc/value-range-pretty-print.cc
+++ b/gcc/value-range-pretty-print.cc
@@ -117,6 +117,16 @@ vrange_printer::print_irange_bitmasks (const irange &r) const
   pp_string (pp, buf);
 }
 
+void
+vrange_printer::print_real_value (tree type, const REAL_VALUE_TYPE &r) const
+{
+  char s[60];
+  tree t = build_real (type, r);
+  dump_generic_node (pp, t, 0, TDF_NONE, false);
+  real_to_hexadecimal (s, &r, sizeof (s), 0, 1);
+  pp_printf (pp, " (%s)", s);
+}
+
 // Print an frange.
 
 void
@@ -141,11 +151,9 @@ vrange_printer::visit (const frange &r) const
   bool has_endpoints = !r.known_isnan ();
   if (has_endpoints)
 {
-  dump_generic_node (pp,
-build_real (type, r.lower_bound ()), 0, TDF_NONE, false);
+  print_real_value (type, r.lower_bound ());
   pp_string (pp, ", ");
-  dump_generic_node (pp,
-build_real (type, r.upper_bound ()), 0, TDF_NONE, false);
+  print_real_value (type, r.upper_bound ());
 }
   pp_character (pp, ']');
   print_frange_nan (r);
diff --git a/gcc/value-range-pretty-print.h b/gcc/value-range-pretty-print.h
index 20c26598fe7..a9ae5a7b4cc 100644
--- a/gcc/value-range-pretty-print.h
+++ b/gcc/value-range-pretty-print.h
@@ -32,6 +32,7 @@ private:
   void print_irange_bound (const wide_int , tree type) const;
   void print_irange_bitmasks (const irange &) const;
   void print_frange_nan (const frange &) const;
+  void print_real_value (tree type, const REAL_VALUE_TYPE &r) const;
 
   pretty_printer *pp;
 };
-- 
2.37.1



[PATCH] frange: drop endpoints to min/max representable numbers for -ffinite-math-only.

2022-09-22 Thread Aldy Hernandez via Gcc-patches
Similarly to how we drop NANs to UNDEFINED when -ffinite-math-only, I
think we can drop the numbers outside of the min/max representable
numbers to the representable number.

This means the endpoints of VR_VARYING for -ffinite-math-only can now
be the min/max representable, instead of -INF and +INF.

Saturating in the setter means that the upcoming implementation for
binary operators no longer has to worry about doing the right
thing for -ffinite-math-only.  If the range goes outside the limits,
it'll get chopped down.
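
The saturation idea can be tried out on plain floats.  The sketch below
is not the frange code: `saturate_finite` is a hypothetical helper, and
it clamps both endpoints into the finite range, which is slightly
stricter than the two one-sided checks in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>

// Clamp the endpoints of [lo, hi] to the largest-magnitude finite
// values of the type, as one would when infinities cannot occur.
inline std::pair<float, float> saturate_finite(float lo, float hi) {
  assert(lo <= hi);  // also rejects NaN endpoints
  const float min_repr = std::numeric_limits<float>::lowest();
  const float max_repr = std::numeric_limits<float>::max();
  return {std::clamp(lo, min_repr, max_repr),
          std::clamp(hi, min_repr, max_repr)};
}
```

A range that already lies within the finite limits passes through
unchanged, so callers do not need a separate finite-math code path.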

How does this look?

Tested on x86-64 Linux.

gcc/ChangeLog:

* range-op-float.cc (build_le): Use vrp_val_*.
(build_lt): Same.
(build_ge): Same.
(build_gt): Same.
* value-range.cc (frange::set): Chop ranges outside of the
representable numbers for -ffinite-math-only.
(frange::normalize_kind): Use vrp_val*.
(frange::verify_range): Same.
(frange::set_nonnegative): Same.
(range_tests_floats): Remove tests that depend on -INF and +INF.
* value-range.h (real_max_representable): Add prototype.
(real_min_representable): Same.
(vrp_val_max): Set max representable number for
-ffinite-math-only.
(vrp_val_min): Same but for min.
(frange::set_varying): Use vrp_val*.
---
 gcc/range-op-float.cc | 12 +++
 gcc/value-range.cc| 46 ---
 gcc/value-range.h | 30 ++--
 3 files changed, 53 insertions(+), 35 deletions(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 2bd3dc9253f..15ba19c2deb 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -232,7 +232,8 @@ build_le (frange , tree type, const frange )
 {
   gcc_checking_assert (!val.known_isnan ());
 
-  r.set (type, dconstninf, val.upper_bound ());
+  REAL_VALUE_TYPE ninf = *TREE_REAL_CST_PTR (vrp_val_min (type));
+  r.set (type, ninf, val.upper_bound ());
 
   // Add both zeros if there's the possibility of zero equality.
   frange_add_zeros (r, type);
@@ -257,7 +258,8 @@ build_lt (frange , tree type, const frange )
   return false;
 }
   // We only support closed intervals.
-  r.set (type, dconstninf, val.upper_bound ());
+  REAL_VALUE_TYPE ninf = *TREE_REAL_CST_PTR (vrp_val_min (type));
+  r.set (type, ninf, val.upper_bound ());
   return true;
 }
 
@@ -268,7 +270,8 @@ build_ge (frange , tree type, const frange )
 {
   gcc_checking_assert (!val.known_isnan ());
 
-  r.set (type, val.lower_bound (), dconstinf);
+  REAL_VALUE_TYPE inf = *TREE_REAL_CST_PTR (vrp_val_max (type));
+  r.set (type, val.lower_bound (), inf);
 
   // Add both zeros if there's the possibility of zero equality.
   frange_add_zeros (r, type);
@@ -294,7 +297,8 @@ build_gt (frange , tree type, const frange )
 }
 
   // We only support closed intervals.
-  r.set (type, val.lower_bound (), dconstinf);
+  REAL_VALUE_TYPE inf = *TREE_REAL_CST_PTR (vrp_val_max (type));
+  r.set (type, val.lower_bound (), inf);
   return true;
 }
 
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 7e8028eced2..e57d60e1bac 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -338,6 +338,18 @@ frange::set (tree min, tree max, value_range_kind kind)
   m_neg_nan = false;
 }
 
+  // For -ffinite-math-only we can drop ranges outside the
+  // representable numbers to min/max for the type.
+  if (flag_finite_math_only)
+{
+  REAL_VALUE_TYPE min_repr = *TREE_REAL_CST_PTR (vrp_val_min (m_type));
+  REAL_VALUE_TYPE max_repr = *TREE_REAL_CST_PTR (vrp_val_max (m_type));
+  if (real_less (&m_min, &min_repr))
+   m_min = min_repr;
+  if (real_less (&max_repr, &m_max))
+   m_max = max_repr;
+}
+
   // Check for swapped ranges.
   gcc_checking_assert (tree_compare (LE_EXPR, min, max));
 
@@ -371,8 +383,8 @@ bool
 frange::normalize_kind ()
 {
   if (m_kind == VR_RANGE
-  && real_isinf (&m_min, 1)
-  && real_isinf (&m_max, 0))
+  && vrp_val_is_min (build_real (m_type, m_min))
+  && vrp_val_is_max (build_real (m_type, m_max)))
 {
   if (m_pos_nan && m_neg_nan)
{
@@ -385,8 +397,8 @@ frange::normalize_kind ()
   if (!m_pos_nan || !m_neg_nan)
{
  m_kind = VR_RANGE;
- m_min = dconstninf;
- m_max = dconstinf;
+ m_min = *TREE_REAL_CST_PTR (vrp_val_min (m_type));
+ m_max = *TREE_REAL_CST_PTR (vrp_val_max (m_type));
  return true;
}
 }
@@ -706,8 +718,8 @@ frange::verify_range ()
 case VR_VARYING:
   gcc_checking_assert (m_type);
   gcc_checking_assert (m_pos_nan && m_neg_nan);
-  gcc_checking_assert (real_isinf (&m_min, 1));
-  gcc_checking_assert (real_isinf (&m_max, 0));
+  gcc_checking_assert (vrp_val_is_min (build_real (m_type, m_min)));
+  gcc_checking_assert (vrp_val_is_max (build_real (m_type, m_max)));
   return;
 case VR_RANGE:
   gcc_checking_assert (m_type);
@@ -732,7 +744,8 @@ frange::verify_range ()

[PATCH] Add debug functions for REAL_VALUE_TYPE.

2022-09-22 Thread Aldy Hernandez via Gcc-patches
We currently have no way of dumping REAL_VALUE_TYPEs when debugging.

Tested on a gdb session examining the real value 10.0:

(gdb) p min
$9 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, uexp = 4, 
sig = {0, 0, 11529215046068469760}}
(gdb) p debug (min)
0x0.ap+4
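
(As a sanity check of the output above: 0x0.ap+4 is 0xa/16 * 2^4 =
0.625 * 16 = 10.0, which also matches uexp = 4 and the top significand
word 0xa000000000000000 in the raw dump.  A minimal C sketch that parses
such a string back, assuming a C99-conforming strtod; parse_hex_real is
a hypothetical helper and not part of the patch.)

```c
#include <stdlib.h>

/* Hypothetical helper, not part of real.cc: parse a C99 hexadecimal
   floating constant such as the one printed by debug ().  */
static double
parse_hex_real (const char *s)
{
  return strtod (s, NULL);
}
```

Calling parse_hex_real on the string printed in the gdb session should
give back the value that was being examined.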

OK for trunk?

gcc/ChangeLog:

* real.cc (debug): New.
---
 gcc/real.cc | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/real.cc b/gcc/real.cc
index 73bbac645d9..a31b256a47b 100644
--- a/gcc/real.cc
+++ b/gcc/real.cc
@@ -1900,6 +1900,22 @@ real_to_decimal (char *str, const REAL_VALUE_TYPE 
*r_orig, size_t buf_size,
digits, crop_trailing_zeros, VOIDmode);
 }
 
+DEBUG_FUNCTION void
+debug (const REAL_VALUE_TYPE *r)
+{
+  char s[60];
+  real_to_hexadecimal (s, r, sizeof (s), 0, 1);
+  fprintf (stderr, "%s\n", s);
+}
+
+DEBUG_FUNCTION void
+debug (const REAL_VALUE_TYPE &r)
+{
+  char s[60];
+  real_to_hexadecimal (s, &r, sizeof (s), 0, 1);
+  fprintf (stderr, "%s\n", s);
+}
+
 /* Render R as a hexadecimal floating point constant.  Emit DIGITS
significant digits in the result, bounded by BUF_SIZE.  If DIGITS is 0,
choose the maximum for the representation.  If CROP_TRAILING_ZEROS,
-- 
2.37.1



[PATCH] testsuite: Sanitize fails for SP FPU on Arm

2022-09-22 Thread Torbjörn SVENSSON via Gcc-patches
This patch stops reporting fails for Arm targets with single
precision floating point unit for types wider than 32 bits (the width
of float on arm-none-eabi).

As reported in PR102017, fenv is reported as supported in recent
versions of newlib. At the same time, for some Arm targets, the
implementation in libgcc does not support exceptions and thus, the
test fails with a call to abort().
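
To illustrate the inversion, here is a small sketch (not part of the
patch) that folds the repeated guard into a single macro.
EXPECT_FP_EXCEPTIONS and invalid_flag_ok are hypothetical names; only
the __ARM_FP == 4 condition is taken from the patch itself.

```c
/* Hypothetical macro: 1 when operations on the wider types are expected
   to raise FE_INVALID, 0 on Arm with an SP-only FPU (the condition is
   copied from the patch).  */
#if defined(__ARM_FP) && __ARM_FP == 4
# define EXPECT_FP_EXCEPTIONS 0
#else
# define EXPECT_FP_EXCEPTIONS 1
#endif

/* RAISED is the result of fetestexcept (FE_INVALID).  Return nonzero
   when the observed flag state matches the expectation for the target.  */
static int
invalid_flag_ok (int raised)
{
  return (raised != 0) == EXPECT_FP_EXCEPTIONS;
}
```

With such a helper each check would read
"if (!invalid_flag_ok (fetestexcept (FE_INVALID))) abort ();" on every
target, at the cost of hiding the target difference from the reader.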

gcc/testsuite/ChangeLog:

* gcc.dg/c2x-float-7.c: Invert the exception check for Arm
targets with SP FPU.
* gcc.dg/pr95115.c: Likewise.
* gcc.dg/torture/float32x-nan-floath.c: Likewise.
* gcc.dg/torture/float32x-nan.c: Likewise.
* gcc.dg/torture/float64-nan-floath.c: Likewise.
* gcc.dg/torture/float64-nan.c: Likewise.
* gcc.dg/torture/inf-compare-1.c: Likewise.
* gcc.dg/torture/inf-compare-2.c: Likewise.
* gcc.dg/torture/inf-compare-3.c: Likewise.
* gcc.dg/torture/inf-compare-4.c: Likewise.

Co-Authored-By: Yvan ROUX  
Signed-off-by: Torbjörn SVENSSON  
---
 gcc/testsuite/gcc.dg/c2x-float-7.c   | 10 ++
 gcc/testsuite/gcc.dg/pr95115.c   |  5 +
 gcc/testsuite/gcc.dg/torture/floatn-nan-floath.h |  5 +
 gcc/testsuite/gcc.dg/torture/floatn-nan.h| 10 ++
 gcc/testsuite/gcc.dg/torture/inf-compare-1.c |  5 +
 gcc/testsuite/gcc.dg/torture/inf-compare-2.c |  5 +
 gcc/testsuite/gcc.dg/torture/inf-compare-3.c |  5 +
 gcc/testsuite/gcc.dg/torture/inf-compare-4.c |  5 +
 8 files changed, 50 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/c2x-float-7.c 
b/gcc/testsuite/gcc.dg/c2x-float-7.c
index 0c90ff24165..c699e94aff8 100644
--- a/gcc/testsuite/gcc.dg/c2x-float-7.c
+++ b/gcc/testsuite/gcc.dg/c2x-float-7.c
@@ -39,11 +39,21 @@ main (void)
 abort ();
   feclearexcept (FE_ALL_EXCEPT);
   d += d;
+#if defined(__ARM_FP) && __ARM_FP == 4
+  /* Arm with SP FPU does not support exceptions (see pr102017).  */
+  if (fetestexcept (FE_INVALID))
+#else
   if (!fetestexcept (FE_INVALID))
+#endif
 abort ();
   feclearexcept (FE_ALL_EXCEPT);
   ld += ld;
+#if defined(__ARM_FP) && __ARM_FP == 4
+  /* Arm with SP FPU does not support exceptions (see pr102017).  */
+  if (fetestexcept (FE_INVALID))
+#else
   if (!fetestexcept (FE_INVALID))
+#endif
 abort ();
   exit (0);
 }
diff --git a/gcc/testsuite/gcc.dg/pr95115.c b/gcc/testsuite/gcc.dg/pr95115.c
index 46a95dfb698..15bc6854819 100644
--- a/gcc/testsuite/gcc.dg/pr95115.c
+++ b/gcc/testsuite/gcc.dg/pr95115.c
@@ -19,7 +19,12 @@ main (void)
   double r = x ();
   if (!__builtin_isnan (r))
abort ();
+#if defined(__ARM_FP) && __ARM_FP == 4
+  /* Arm with SP FPU does not support exceptions (see pr102017).  */
+  if (fetestexcept (FE_INVALID))
+#else
   if (!fetestexcept (FE_INVALID))
+#endif
abort ();
   exit (0);
 }
diff --git a/gcc/testsuite/gcc.dg/torture/floatn-nan-floath.h 
b/gcc/testsuite/gcc.dg/torture/floatn-nan-floath.h
index 9892fd0cf63..5c9f28d4fdc 100644
--- a/gcc/testsuite/gcc.dg/torture/floatn-nan-floath.h
+++ b/gcc/testsuite/gcc.dg/torture/floatn-nan-floath.h
@@ -30,7 +30,12 @@ main (void)
 {
   volatile TYPE r;
   r = nans_cst + nans_cst;
+#if defined(__ARM_FP) && __ARM_FP == 4 && (EXT || WIDTH > 32)
+  /* Arm with SP FPU does not support exceptions (see pr102017).  */
+  if (fetestexcept (FE_INVALID))
+#else
   if (!fetestexcept (FE_INVALID))
+#endif
 abort ();
   exit (0);
 }
diff --git a/gcc/testsuite/gcc.dg/torture/floatn-nan.h 
b/gcc/testsuite/gcc.dg/torture/floatn-nan.h
index 89d2e2eec34..0abb0668677 100644
--- a/gcc/testsuite/gcc.dg/torture/floatn-nan.h
+++ b/gcc/testsuite/gcc.dg/torture/floatn-nan.h
@@ -30,10 +30,20 @@ main (void)
 {
   volatile TYPE r;
   r = nan_cst + nan_cst;
+#if defined(__ARM_FP) && __ARM_FP == 4 && (EXT || WIDTH > 32)
+  /* Arm with SP FPU does not support exceptions (see pr102017).  */
+  if (!fetestexcept (FE_INVALID))
+#else
   if (fetestexcept (FE_INVALID))
+#endif
 abort ();
   r = nans_cst + nans_cst;
+#if defined(__ARM_FP) && __ARM_FP == 4 && (EXT || WIDTH > 32)
+  /* Arm with SP FPU does not support exceptions (see pr102017).  */
+  if (fetestexcept (FE_INVALID))
+#else
   if (!fetestexcept (FE_INVALID))
+#endif
 abort ();
   exit (0);
 }
diff --git a/gcc/testsuite/gcc.dg/torture/inf-compare-1.c 
b/gcc/testsuite/gcc.dg/torture/inf-compare-1.c
index 70f255e680a..df0e61d9f89 100644
--- a/gcc/testsuite/gcc.dg/torture/inf-compare-1.c
+++ b/gcc/testsuite/gcc.dg/torture/inf-compare-1.c
@@ -16,6 +16,11 @@ int
 main (void)
 {
   i = x > __builtin_inf ();
+#if defined(__ARM_FP) && __ARM_FP == 4
+  /* Arm with SP FPU does not support exceptions (see pr102017).  */
+  if (i != 0 || fetestexcept (FE_INVALID))
+#else
   if (i != 0 || !fetestexcept (FE_INVALID))
+#endif
 abort ();
 }
diff --git a/gcc/testsuite/gcc.dg/torture/inf-compare-2.c 
b/gcc/testsuite/gcc.dg/torture/inf-compare-2.c
index 011f992d5a0..dcb43ccc444 100644
--- 

[RFC PATCH] __trunc{tf,xf,df,sf,hf}bf2, __truncbfhf2 and __extendbfsf2

2022-09-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 20, 2022 at 10:51:18AM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Sep 20, 2022 at 11:35:07AM +0800, Hongtao Liu wrote:
> > > The question is (mainly for aarch64, arm and x86 backend maintainers) if 
> > > we
> > > shouldn't support it, in the PR there is a partial patch to do so, but
> > > the big question is if it should be supported as the __bf16 type those
> > > 3 targets use with u6__bf16 mangling and remove those *_invalid_* cases
> > > and add conversions to/from at least SFmode but probably also DFmode, 
> > > TFmode
> > > and XFmode on x86 and implement arithmetics on those through conversion to
> > > SFmode, performing arithmetics there and conversion back.
> > > Conversion from BFmode to SFmode is easy, left shift by 16 and ought to be
> > > implemented inline, SFmode -> BFmode conversion is harder,
> > > I think it is roughly:
> > I'm not sure if there should be any floating point exceptions for
> > BFmode operation.
> > For x86, there's no floating point exceptions for AVX512_BF16 related
> > instructions
> 
> As long as __bf16 is just an extension, supporting or not supporting
> exceptions on sNaNs is just fine I think, but I'm afraid it is different
> for std::bfloat16_t.  If we claim we support it (define that type
> in <stdfloat>, predefine __STD_BFLOAT16_TYPE__), then it needs to follow
> ISO/IEC/IEEE 60559, and I'm afraid that means also exceptions and the like.
> While the IEEE spec doesn't cover the exact bfloat16 format, C++ talks about
> a format with these and these number of bits here and there that behaves
> like in IEEE otherwise.
> Whether we support std::bfloat16_t at all is our choice, if we do support
> it, whether we support it with __bf16 underlying type or come up with
> something different, it is up to us, and with -ffast-math/-Ofast etc.
> we can certainly use hw instructions for it which don't raise exceptions.
> 
> At least that is my limited understanding of it...

I've been playing with this a little bit and here is a soft-fp version of
IMHO everything we need for proper bfloat16 support.
In particular, I think we need all the truncating conversions from other
floating formats that a target with BFmode floating point support (currently
arm, aarch64 and x86) has, truncating conversion from BFmode to HFmode
(seems GCC when precision is the same considers conversions truncating)
and an extension from BFmode to SFmode.  Extensions from BFmode to
SF/DF/XF/TFmode are IMHO best implemented inside of GCC by performing
BFmode to SFmode conversion first and then converting SFmode to those
other formats, other arithmetics on BFmode should be implemented simply
by widening to SFmode, doing arithmetics there and then converting back.
The BF to SFmode extension can be also implemented simply by shifting
the VCEd value up by 16 bits and VCEing the result if flags say
sNaNs don't need to be handled, or IMHO if we use the extended result
in some arithmetic operation that will handle the sNaN signaling +
conversion into qNaN, similarly for SFmode to BFmode conversions
we can use hw instructions if available and we don't care about sNaNs.
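
For readers who want the bit manipulation spelled out, here is a rough
C sketch of the two "don't care about exceptions" conversions described
above.  It is not the soft-fp code from the patch: the function names
are made up, round-to-nearest-even is an assumption about the desired
rounding, and no floating point exception is ever raised, so sNaNs are
quieted silently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bfloat16 bit pattern to float by placing
   it in the upper 16 bits of an IEEE single.  */
static float
bf16_to_float (uint16_t b)
{
  uint32_t bits = (uint32_t) b << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Hypothetical helper: narrow a float to a bfloat16 bit pattern using
   round-to-nearest-even on the 16 discarded bits.  */
static uint16_t
float_to_bf16 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);

  /* NaN: truncate and force the quiet bit so the payload cannot
     collapse to an infinity.  */
  if ((bits & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((bits >> 16) | 0x0040u);

  /* Add 0x7fff plus the lowest kept bit, then drop the low half.  */
  uint32_t bias = 0x7fffu + ((bits >> 16) & 1u);
  return (uint16_t) ((bits + bias) >> 16);
}
```

The widening direction is exact, which is why doing arithmetic in SFmode
and narrowing afterwards only rounds once per operation.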

The C FE has the advantage that it has excess precision support, there
we should arrange for BFmode to be always promoted to SFmode excess
precision, but C++ FE doesn't.

Also, question to ARM/AArch64/x86 maintainers is if it is ok to
add conversion and arithmetic support to the __bf16 type, or if
that type should stay useless and there should be another
type (some keyword or just float __attribute__((__mode__ (__BF__))))
that we'd have that support for.  Whatever type we'd use as
std::bfloat16_t should mangle as DFb16_ rather than u6__bf16 that
__bf16 currently mangles to though.

Thoughts on this?

And for Joseph, sure, the libgcc/soft-fp/ part should probably go
into glibc first and be copied from there afterwards.

Perhaps the __truncbfhf2 could be dropped and we could just on
the compiler side emit shift left by 16 before calling __truncsfhf2.

--- libgcc/soft-fp/brain.h.jj   2022-09-22 15:28:04.865171729 +0200
+++ libgcc/soft-fp/brain.h  2022-09-22 15:35:11.970374554 +0200
@@ -0,0 +1,172 @@
+/* Software floating-point emulation.
+   Definitions for Brain Floating Point format (bfloat16).
+   Copyright (C) 1997-2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License 

Re: TYPE_{MIN/MAX}_VALUE for floats?

2022-09-22 Thread Aldy Hernandez via Gcc-patches
On Thu, Sep 22, 2022 at 5:22 PM Jakub Jelinek  wrote:
>
> On Thu, Sep 22, 2022 at 05:02:19PM +0200, Aldy Hernandez wrote:
> > It has always irritated me that we don't have TYPE_MIN_VALUE and
> > TYPE_MAX_VALUE for floats (and for pointers for that matter).  This
> > means, we have to recalculate it ad nauseam in vrp_val_min and
> > vrp_val_max.
> >
> > I know we have dconstinf and dconstninf for floats, which we can just
> > wrap around a TREE_REAL_CST, but it still seems like we should be more
> > consistent here.  If we know the endpoint for a type, we should cache
> > it in it.
>
> This looks problematic.
> While for !MODE_HAS_INFINITIES there are clear values, otherwise
> the flag_finite_math_only flag has Optimization keyword, so it can change
> between different functions, while a type is a global entity that can be
> used by both __attribute__((optimize ("Ofast"))) and standard floating point
> functions.

Oh...it can have different values in different functions?  Yeah,
that's not gonna work.  Oh well, thanks.

Aldy



Merge from trunk to gccgo branch

2022-09-22 Thread Ian Lance Taylor via Gcc-patches
I've merged trunk revision f35be1268c996d993ab0b4ff329734d467474445 to
the gccgo branch.

Ian


Re: [PATCH] tree-object-size: Support strndup and strdup

2022-09-22 Thread Siddhesh Poyarekar

On 2022-09-22 09:02, Jakub Jelinek wrote:

On Mon, Aug 15, 2022 at 03:23:11PM -0400, Siddhesh Poyarekar wrote:

--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -495,6 +495,18 @@ decl_init_size (tree decl, bool min)
return size;
  }
  
+/* Get the outermost object that PTR may point into.  */

+
+static tree
+get_whole_object (const_tree ptr)
+{
+  tree pt_var = TREE_OPERAND (ptr, 0);
+  while (handled_component_p (pt_var))
+pt_var = TREE_OPERAND (pt_var, 0);
+
+  return pt_var;
+}


Not sure why you want a new function for this.
This is essentially get_base_address (TREE_OPERAND (ptr, 0)).


Oh, so can addr_object_size be simplified to use get_base_address too?


  /* Compute __builtin_object_size for PTR, which is a ADDR_EXPR.
 OBJECT_SIZE_TYPE is the second argument from __builtin_object_size.
 If unknown, return size_unknown (object_size_type).  */
+  if (!size_valid_p (sz, object_size_type)
+   || size_unknown_p (sz, object_size_type))
+{
+  tree wholesrc = NULL_TREE;
+  if (TREE_CODE (src) == ADDR_EXPR)
+   wholesrc = get_whole_object (src);
+
+  if (!(object_size_type & OST_MINIMUM)
+ || (wholesrc && TREE_CODE (wholesrc) == STRING_CST))


Is this safe?  I mean get_whole_object will also skip ARRAY_REFs with
variable indexes etc. and the STRING_CST could have embedded '\0's
in it.
Even if c_strlen (src, 1) is constant, I don't see what you can assume
for object size of strndup ("abcd\0efgh", n); for minimum, except 1.
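
To make the embedded-NUL concern concrete, here is a small sketch of the
exact allocation size strndup uses, given the length up to the first
NUL.  strndup_alloc_size is a hypothetical helper for illustration, not
something from tree-object-size.cc.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes strndup (src, n) allocates when
   SRC has SRC_LEN characters before its first NUL.  The result is
   never less than 1 because of the terminating NUL.  */
static size_t
strndup_alloc_size (size_t src_len, size_t n)
{
  return (src_len < n ? src_len : n) + 1;
}
```

For the string above the length up to the first NUL is 4, so the
allocation is 5 bytes for any n >= 4, regardless of the characters that
follow the embedded NUL in the STRING_CST.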


Can't we assume MIN(5, n) for STRING_CST?

For ARRAY_REFs, it may end up being MIN(array_size, n) and not account 
for the NUL termination but I was thinking of that as being a better 
option than bailing out.  Should we try harder here and return, e.g. 
strlen or some equivalent?



But on the other side, 1 is a safe minimum for OST_MINIMUM of both
strdup and strndup if you don't find anything more specific (exact strlen
for strndup) because the terminating '\0' will be always there.


OK, I can return size_one_node as the final return value for OST_MINIMUM 
if we don't find a suitable expression.



Other than that you'd need to consider INTEGER_CST second strndup argument
or ranges of the second argument etc.
E.g. maximum for OST_DYNAMIC could be for strndup (src, n)
MIN (__bdos (src, ?), n + 1).


Yeah, that's what I return in the end:

  return fold_build2 (MIN_EXPR, sizetype,
 fold_build2 (PLUS_EXPR, sizetype, size_one_node,n),
 sz);

where sz is __bdos(src)
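
In plain C the same upper bound can be sketched as below.
strndup_max_size is a hypothetical name, and the comparison is arranged
so that n + 1 cannot wrap around, which is an assumption about what a
careful implementation would want rather than something taken from the
patch.

```c
#include <stddef.h>

/* Hypothetical helper: MIN (src_size, n + 1), where SRC_SIZE plays the
   role of __bdos (src).  Written so that n + 1 never overflows.  */
static size_t
strndup_max_size (size_t src_size, size_t n)
{
  if (n >= src_size)
    return src_size;
  return n + 1;
}
```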




@@ -2113,7 +2177,7 @@ const pass_data pass_data_object_sizes =
PROP_objsz, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
-  0, /* todo_flags_finish */
+  TODO_update_ssa_no_phi, /* todo_flags_finish */
  };
  
  class pass_object_sizes : public gimple_opt_pass

@@ -2153,7 +2217,7 @@ const pass_data pass_data_early_object_sizes =
0, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
-  0, /* todo_flags_finish */
+  TODO_update_ssa_no_phi, /* todo_flags_finish */
  };


This is quite expensive.  Do you really need full ssa update, or just
TODO_update_ssa_only_virtuals would be enough (is it for the missing
vuse on the strlen call if you emit it)?
In any case, would be better not to do that always, but only if you
really need it (emitted the strlen call somewhere; e.g. if __bdos is
never used, only __bos, it is certainly not needed), todo flags
can be both in todo_flags_finish and in return value from execute method.


Thanks, I'll find a cheaper way to do this.

Thanks,
Sid


Re: TYPE_{MIN/MAX}_VALUE for floats?

2022-09-22 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 22, 2022 at 05:02:19PM +0200, Aldy Hernandez wrote:
> It has always irritated me that we don't have TYPE_MIN_VALUE and
> TYPE_MAX_VALUE for floats (and for pointers for that matter).  This
> means, we have to recalculate it ad nauseam in vrp_val_min and
> vrp_val_max.
> 
> I know we have dconstinf and dconstninf for floats, which we can just
> wrap around a TREE_REAL_CST, but it still seems like we should be more
> consistent here.  If we know the endpoint for a type, we should cache
> it in it.

This looks problematic.
While for !MODE_HAS_INFINITIES there are clear values, otherwise
the flag_finite_math_only flag has Optimization keyword, so it can change
between different functions, while a type is a global entity that can be
used by both __attribute__((optimize ("Ofast"))) and standard floating point
functions.
In some sense it is similar to TYPE_MODE which for vectors needs to be
actually a function call that decides based on the current function.
But then, having it in TYPE_*_VALUE doesn't have the benefits you want from
it...
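
A small illustration of the problem, as a sketch only: both functions
below operate on the very same global type double, yet they are compiled
under different finite-math settings.  The function names are made up;
the optimize attribute is the GCC-specific one mentioned above.

```c
/* Compiled as if with -Ofast, so -ffinite-math-only is in effect here
   and the compiler may assume X is never an infinity.  */
__attribute__((optimize ("Ofast")))
int
fast_is_inf (double x)
{
  return __builtin_isinf (x) != 0;
}

/* Compiled with the default options of the translation unit, where
   infinities have to be honored.  */
int
strict_is_inf (double x)
{
  return __builtin_isinf (x) != 0;
}
```

A single cached TYPE_MIN_VALUE/TYPE_MAX_VALUE pair on double could only
be right for one of these two functions.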

Jakub



TYPE_{MIN/MAX}_VALUE for floats?

2022-09-22 Thread Aldy Hernandez via Gcc-patches
It has always irritated me that we don't have TYPE_MIN_VALUE and
TYPE_MAX_VALUE for floats (and for pointers for that matter).  This
means, we have to recalculate it ad nauseam in vrp_val_min and
vrp_val_max.

I know we have dconstinf and dconstninf for floats, which we can just
wrap around a TREE_REAL_CST, but it still seems like we should be more
consistent here.  If we know the endpoint for a type, we should cache
it in it.

Furthermore, just the way we're chopping off NANs in the frange::set()
routine, we should be able to chop off things outside the min/max
representable range, at least for -ffinite-math-only.  For example,
the endpoints to VR_VARYING for a float in -ffinite-math-only should
be real_{min/max}_representable(), which REAL_VALUE_TYPE already
provides.  I am testing a patch to do this, but am unhappy that we
have to recalculate things.
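
Roughly, the chopping would look like the following standalone sketch,
which uses plain doubles instead of REAL_VALUE_TYPE.  struct drange and
clamp_to_finite are hypothetical names for illustration, and the flag is
passed explicitly rather than read from flag_finite_math_only.

```c
#include <float.h>
#include <math.h>

/* Hypothetical closed interval [lo, hi] of doubles.  */
struct drange
{
  double lo, hi;
};

/* When FINITE_MATH_ONLY is nonzero, pull endpoints that lie outside the
   representable finite range back to -DBL_MAX / DBL_MAX.  */
static struct drange
clamp_to_finite (struct drange r, int finite_math_only)
{
  if (finite_math_only)
    {
      if (r.lo < -DBL_MAX)
        r.lo = -DBL_MAX;
      if (r.hi > DBL_MAX)
        r.hi = DBL_MAX;
    }
  return r;
}
```

With the flag set, a varying range [-Inf, +Inf] becomes
[-DBL_MAX, DBL_MAX]; without it the endpoints are left alone.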

Is there a reason we can't store these in the type?

I tried the naive attached approach, but I quickly ran into LTO woes:

FAIL: gcc.c-torture/execute/ieee/20001122-1.c compilation,  -O2 -flto
-fno-use-linker-plugin -flto-partition=none
 (internal compiler error: 'verify_type' failed)

$ ./xgcc -B./ a.c -O2 -flto -w
lto1: error: type variant differs by TYPE_MAX_VALUE

So I clearly don't know what I'm doing.

Would folks be ok with filling TYPE_MIN_VALUE and friends for floats,
and if so, could someone give me a hand here?  What am I missing?

Thanks.
Aldy

p.s. Now that we're onto this subject, in the distant future, I'd
actually like to store a vrange in the tree type.  I mean, they are
first class citizens in the SSA name now, and we have a typeless way
of storing ranges in GC space.  Anywho, that's for the future, cause I
like the pain... just wanted to gauge the temperature on that one as
well.
diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index 88923c4136b..98f268d9f5a 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "debug.h"
 #include "calls.h"
+#include "real.h"
 
 /* Data type for the expressions representing sizes of data types.
It is the first integer type laid out.  */
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 4165cbd7c3b..7a1fc6c4888 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -7620,6 +7620,31 @@ build_offset_type (tree basetype, tree type)
   return t;
 }
 
+/* Create a floating point type with PRECISION.  */
+
+tree
+build_float_type (unsigned precision)
+{
+  tree type = make_node (REAL_TYPE);
+  TYPE_PRECISION (type) = precision;
+  layout_type (type);
+
+  if (flag_finite_math_only)
+{
+  REAL_VALUE_TYPE min, max;
+  real_min_representable (&min, type);
+  real_max_representable (&max, type);
+  TYPE_MIN_VALUE (type) = build_real (type, min);
+  TYPE_MAX_VALUE (type) = build_real (type, max);
+}
+  else
+{
+  TYPE_MIN_VALUE (type) = build_real (type, dconstninf);
+  TYPE_MAX_VALUE (type) = build_real (type, dconstinf);
+}
+  return type;
+}
+
 /* Create a complex type whose components are COMPONENT_TYPE.
 
If NAMED is true, the type is given a TYPE_NAME.  We do not always
@@ -9427,17 +9452,9 @@ build_common_tree_nodes (bool signed_char)
 
   pointer_sized_int_node = build_nonstandard_integer_type (POINTER_SIZE, 1);
 
-  float_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (float_type_node) = FLOAT_TYPE_SIZE;
-  layout_type (float_type_node);
-
-  double_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (double_type_node) = DOUBLE_TYPE_SIZE;
-  layout_type (double_type_node);
-
-  long_double_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (long_double_type_node) = LONG_DOUBLE_TYPE_SIZE;
-  layout_type (long_double_type_node);
+  float_type_node = build_float_type (FLOAT_TYPE_SIZE);
+  double_type_node = build_float_type (DOUBLE_TYPE_SIZE);
+  long_double_type_node = build_float_type (LONG_DOUBLE_TYPE_SIZE);
 
   for (i = 0; i < NUM_FLOATN_NX_TYPES; i++)
 {
diff --git a/gcc/tree.h b/gcc/tree.h
index 266e24a0563..b83fac17f1a 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4729,6 +4729,7 @@ extern tree build_varargs_function_type_array (tree, int, tree *);
 extern tree build_method_type_directly (tree, tree, tree);
 extern tree build_method_type (tree, tree);
 extern tree build_offset_type (tree, tree);
+extern tree build_float_type (unsigned);
 extern tree build_complex_type (tree, bool named = false);
 extern tree array_type_nelts (const_tree);
 


[committed] libiberty: Refer to Bugzilla in README

2022-09-22 Thread Jonathan Wakely via Gcc-patches
Approved by Richi on IRC. Pushed to trunk.

-- >8 --

We want bugs reported to Bugzilla, not emailed to gcc-bugs.

libiberty/ChangeLog:

* README: Replace gcc-bugs email address with Bugzilla URL.
---
 libiberty/README | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libiberty/README b/libiberty/README
index 9f1cc979e49..e7ffb17c192 100644
--- a/libiberty/README
+++ b/libiberty/README
@@ -15,7 +15,7 @@ The library must be configured from the top source directory. 
 Don't
 try to run configure in this directory.  Follow the configuration
 instructions in ../README.
 
-Please report bugs to "gcc-b...@gcc.gnu.org" and send fixes to
+Please report bugs to https://gcc.gnu.org/bugzilla/ and send fixes to
 "gcc-patches@gcc.gnu.org".  Thank you.
 
 ADDING A NEW FILE
-- 
2.37.3



[committed 2/2] libstdc++: Implement constexpr std::bitset for C++23 (P2417R2)

2022-09-22 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Also add _GLIBCXX_HOSTED checks to simplify making <bitset>
freestanding in the near future.

libstdc++-v3/ChangeLog:

* include/std/bitset (bitset): Add constexpr for C++23. Guard
members using std::string with _GLIBCXX_HOSTED.
* include/std/version (__cpp_lib_constexpr_bitset): Define.
* testsuite/20_util/bitset/access/constexpr.cc: New test.
* testsuite/20_util/bitset/cons/constexpr_c++23.cc: New test.
* testsuite/20_util/bitset/count/constexpr.cc: New test.
* testsuite/20_util/bitset/ext/constexpr.cc: New test.
* testsuite/20_util/bitset/operations/constexpr_c++23.cc: New test.
* testsuite/20_util/bitset/version.cc: New test.
---
 libstdc++-v3/include/std/bitset   | 244 --
 libstdc++-v3/include/std/version  |   1 +
 .../20_util/bitset/access/constexpr.cc|  55 
 .../20_util/bitset/cons/constexpr_c++23.cc|  53 
 .../20_util/bitset/count/constexpr.cc |  93 +++
 .../testsuite/20_util/bitset/ext/constexpr.cc |  32 +++
 .../bitset/operations/constexpr_c++23.cc  |  31 +++
 .../testsuite/20_util/bitset/version.cc   |  10 +
 8 files changed, 440 insertions(+), 79 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/bitset/access/constexpr.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/bitset/cons/constexpr_c++23.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/bitset/count/constexpr.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/bitset/ext/constexpr.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/bitset/operations/constexpr_c++23.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/bitset/version.cc

diff --git a/libstdc++-v3/include/std/bitset b/libstdc++-v3/include/std/bitset
index 438c2f7efe9..0c84f15fda0 100644
--- a/libstdc++-v3/include/std/bitset
+++ b/libstdc++-v3/include/std/bitset
@@ -44,14 +44,15 @@
 
 #pragma GCC system_header
 
-#include 
 #include// For invalid_argument, out_of_range,
 // overflow_error
-#include 
-#include 
-
-#if __cplusplus >= 201103L
-# include 
+#if _GLIBCXX_HOSTED
+# include 
+# include 
+# include 
+# if __cplusplus >= 201103L
+#  include 
+# endif
 #endif
 
 #define _GLIBCXX_BITSET_BITS_PER_WORD  (__CHAR_BIT__ * __SIZEOF_LONG__)
@@ -65,6 +66,10 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
+#if __cplusplus > 202002L && _GLIBCXX_HOSTED
+# define __cpp_lib_constexpr_bitset 202202L
+#endif
+
   /**
*  Base class, general case.  It is a class invariant that _Nw will be
*  nonnegative.
@@ -111,7 +116,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _S_maskbit(size_t __pos) _GLIBCXX_NOEXCEPT
   { return (static_cast<_WordT>(1)) << _S_whichbit(__pos); }
 
-  _WordT&
+  _GLIBCXX14_CONSTEXPR _WordT&
   _M_getword(size_t __pos) _GLIBCXX_NOEXCEPT
   { return _M_w[_S_whichword(__pos)]; }
 
@@ -120,12 +125,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   { return _M_w[_S_whichword(__pos)]; }
 
 #if __cplusplus >= 201103L
-  const _WordT*
+  constexpr const _WordT*
   _M_getdata() const noexcept
   { return _M_w; }
 #endif
 
-  _WordT&
+  _GLIBCXX23_CONSTEXPR _WordT&
   _M_hiword() _GLIBCXX_NOEXCEPT
   { return _M_w[_Nw - 1]; }
 
@@ -133,52 +138,61 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _M_hiword() const _GLIBCXX_NOEXCEPT
   { return _M_w[_Nw - 1]; }
 
-  void
+  _GLIBCXX23_CONSTEXPR void
   _M_do_and(const _Base_bitset<_Nw>& __x) _GLIBCXX_NOEXCEPT
   {
for (size_t __i = 0; __i < _Nw; __i++)
  _M_w[__i] &= __x._M_w[__i];
   }
 
-  void
+  _GLIBCXX14_CONSTEXPR void
   _M_do_or(const _Base_bitset<_Nw>& __x) _GLIBCXX_NOEXCEPT
   {
for (size_t __i = 0; __i < _Nw; __i++)
  _M_w[__i] |= __x._M_w[__i];
   }
 
-  void
+  _GLIBCXX14_CONSTEXPR void
   _M_do_xor(const _Base_bitset<_Nw>& __x) _GLIBCXX_NOEXCEPT
   {
for (size_t __i = 0; __i < _Nw; __i++)
  _M_w[__i] ^= __x._M_w[__i];
   }
 
-  void
+  _GLIBCXX14_CONSTEXPR void
   _M_do_left_shift(size_t __shift) _GLIBCXX_NOEXCEPT;
 
-  void
+  _GLIBCXX14_CONSTEXPR void
   _M_do_right_shift(size_t __shift) _GLIBCXX_NOEXCEPT;
 
-  void
+  _GLIBCXX14_CONSTEXPR void
   _M_do_flip() _GLIBCXX_NOEXCEPT
   {
for (size_t __i = 0; __i < _Nw; __i++)
  _M_w[__i] = ~_M_w[__i];
   }
 
-  void
+  _GLIBCXX14_CONSTEXPR void
   _M_do_set() _GLIBCXX_NOEXCEPT
   {
for (size_t __i = 0; __i < _Nw; __i++)
  _M_w[__i] = ~static_cast<_WordT>(0);
   }
 
-  void
+  _GLIBCXX14_CONSTEXPR void
   _M_do_reset() _GLIBCXX_NOEXCEPT
-  { __builtin_memset(_M_w, 0, _Nw * sizeof(_WordT)); }
+  {
+   if (__builtin_is_constant_evaluated())
+ {
+   for (_WordT& __w : _M_w)
+

[committed 1/2] libstdc++: Rearrange tests for <bitset>

2022-09-22 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

In C++03 std::bitset was in the Container clause, but since C++11 it has
been in the Utilities clause. This moves the tests to the 20_util
directory, where most people probably expect to find them.

Also create 'access', 'observers', and 'io' subdirectories and group
some tests under there, rather than having one directory per function
name, and only a single test in that directory.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/bitset/18604.cc: Moved to...
* testsuite/20_util/bitset/18604.cc: ...here.
* testsuite/23_containers/bitset/45713.cc: Moved to...
* testsuite/20_util/bitset/45713.cc: ...here.
* testsuite/23_containers/bitset/to_string/dr396.cc: Moved to...
* testsuite/20_util/bitset/access/dr396.cc: ...here.
* testsuite/23_containers/bitset/to_string/1.cc: Moved to...
* testsuite/20_util/bitset/access/to_string.cc: ...here.
* testsuite/23_containers/bitset/to_ullong/1.cc: Moved to...
* testsuite/20_util/bitset/access/to_ullong.cc: ...here.
* testsuite/23_containers/bitset/to_ulong/1.cc: Moved to...
* testsuite/20_util/bitset/access/to_ulong.cc: ...here.
* testsuite/23_containers/bitset/cons/1.cc: Moved to...
* testsuite/20_util/bitset/cons/1.cc: ...here.
* testsuite/23_containers/bitset/cons/16020.cc: Moved to...
* testsuite/20_util/bitset/cons/16020.cc: ...here.
* testsuite/23_containers/bitset/cons/2.cc: Moved to...
* testsuite/20_util/bitset/cons/2.cc: ...here.
* testsuite/23_containers/bitset/cons/3.cc: Moved to...
* testsuite/20_util/bitset/cons/3.cc: ...here.
* testsuite/23_containers/bitset/cons/38244.cc: Moved to...
* testsuite/20_util/bitset/cons/38244.cc: ...here.
* testsuite/23_containers/bitset/cons/50268.cc: Moved to...
* testsuite/20_util/bitset/cons/50268.cc: ...here.
* testsuite/23_containers/bitset/cons/6282.cc: Moved to...
* testsuite/20_util/bitset/cons/6282.cc: ...here.
* testsuite/23_containers/bitset/cons/constexpr.cc: Moved to...
* testsuite/20_util/bitset/cons/constexpr.cc: ...here.
* testsuite/23_containers/bitset/cons/dr1325-1.cc: Moved to...
* testsuite/20_util/bitset/cons/dr1325-1.cc: ...here.
* testsuite/23_containers/bitset/cons/dr1325-2.cc: Moved to...
* testsuite/20_util/bitset/cons/dr1325-2.cc: ...here.
* testsuite/23_containers/bitset/cons/dr396.cc: Moved to...
* testsuite/20_util/bitset/cons/dr396.cc: ...here.
* testsuite/23_containers/bitset/debug/invalidation/1.cc: Moved to...
* testsuite/20_util/bitset/debug/invalidation/1.cc: ...here.
* testsuite/23_containers/bitset/ext/15361.cc: Moved to...
* testsuite/20_util/bitset/ext/15361.cc: ...here.
* testsuite/23_containers/bitset/hash/1.cc: Moved to...
* testsuite/20_util/bitset/hash/1.cc: ...here.
* testsuite/23_containers/bitset/input/1.cc: Moved to...
* testsuite/20_util/bitset/io/input.cc: ...here.
* testsuite/23_containers/bitset/count/6124.cc: Moved to...
* testsuite/20_util/bitset/observers/6124.cc: ...here.
* testsuite/23_containers/bitset/all/1.cc: Moved to...
* testsuite/20_util/bitset/observers/all.cc: ...here.
* testsuite/23_containers/bitset/test/1.cc: Moved to...
* testsuite/20_util/bitset/observers/test.cc: ...here.
* testsuite/23_containers/bitset/operations/1.cc: Moved to...
* testsuite/20_util/bitset/operations/1.cc: ...here.
* testsuite/23_containers/bitset/operations/13838.cc: Moved to...
* testsuite/20_util/bitset/operations/13838.cc: ...here.
* testsuite/23_containers/bitset/operations/2.cc: Moved to...
* testsuite/20_util/bitset/operations/2.cc: ...here.
* testsuite/23_containers/bitset/operations/96303.cc: Moved to...
* testsuite/20_util/bitset/operations/96303.cc: ...here.
* testsuite/23_containers/bitset/operations/constexpr-2.cc: Moved to...
* testsuite/20_util/bitset/operations/constexpr-2.cc: ...here.
* testsuite/23_containers/bitset/operations/constexpr.cc: Moved to...
* testsuite/20_util/bitset/operations/constexpr.cc: ...here.
* testsuite/23_containers/bitset/requirements/constexpr_functions.cc: 
Moved to...
* testsuite/20_util/bitset/requirements/constexpr_functions.cc: ...here.
* 
testsuite/23_containers/bitset/requirements/explicit_instantiation/1.cc: Moved 
to...
* testsuite/20_util/bitset/requirements/explicit_instantiation/1.cc: 
...here.
* 
testsuite/23_containers/bitset/requirements/explicit_instantiation/1_c++0x.cc: 
Moved to...
* 
testsuite/20_util/bitset/requirements/explicit_instantiation/1_c++0x.cc: 
...here.
* testsuite/23_containers/headers/bitset/synopsis.cc: Moved to...
* 

Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-22 Thread Segher Boessenkool
Hi!

On Thu, Sep 22, 2022 at 10:28:23AM +0800, Kewen.Lin wrote:
> on 2022/9/22 05:56, Segher Boessenkool wrote:
> > On Fri, Jun 24, 2022 at 10:02:19AM +0800, HAO CHEN GUI wrote:
> > In the other direction I am worried that the unspecs will degrade
> > performance (relative to smin/smax) when -ffast-math *is* active (and
> > this new builtin code and pattern doesn't blow up).
> 
> For fmin/fmax it would be fine, since they are transformed to {MAX,MIN}
> EXPR in middle end, and yes, it can degrade for the bifs, although IMHO
> the previous expansion to smin/smax contradicts with the bif names (users
> expect to map them to xs{min,max}dp than others).

But builtins *never* say to generate any particular instruction.  They
say to generate code that implements certain functionality.  For many
builtins this does of course boil down to specific instructions, but
even then it could be optimised away completely or replaced with
something more specific if things can be folded or such.

> > I still think we should get RTL codes for this, to have access to proper
> > floating point min/max semantics always and everywhere.  "fmin" and
> > "fmax" seem to be good names :-)
> 
> It would be good, especially if we have observed some uses of these bifs
> and further opportunities around them.  :)

Currently we only have smin/smax for float, and those are not valid for
NaNs, or when the sign of zeros is relevant.  On the other hand the
semantics of fmin/fmax are settled and in most standards nowadays.  So
it is time we did this I would say :-)


Segher
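
The NaN point above can be illustrated with a minimal C++ sketch.  It is
not GCC code: naive_min is a hypothetical comparison-based minimum used
only as a stand-in for "min without NaN guarantees", and std::fmin is the
standard library function.  It assumes a build without -ffast-math.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical comparison-based minimum.  Fine for ordinary values, but
// the result depends on operand order as soon as a NaN is involved.
double naive_min(double a, double b) { return a < b ? a : b; }

// True if std::fmin returns the non-NaN operand whichever side the NaN is on.
bool fmin_ignores_nan() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  return std::fmin(nan, 1.0) == 1.0 && std::fmin(1.0, nan) == 1.0;
}

// True if naive_min gives NaN for (1.0, NaN) but 1.0 for (NaN, 1.0).
bool naive_min_is_order_dependent() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  return std::isnan(naive_min(1.0, nan)) && naive_min(nan, 1.0) == 1.0;
}

// Small self-check tying the two observations together.
void check_min_examples() {
  assert(fmin_ignores_nan());
  assert(naive_min_is_order_dependent());
}
```

The sketch deliberately says nothing about signed zeros, since which
zero std::fmin returns for (-0.0, 0.0) is left to the implementation.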


[PATCH] opts: fix --help=common with '\t' description

2022-09-22 Thread Martin Liška
Fixes -flto-compression option:

-  -flto-compression-level= Use z Use zlib/zstd compression level 
 for IL.
+  -flto-compression-level=<0,19> Use zlib/zstd compression level  for 
IL.

Ready for master?

Thanks,
Martin

---
 gcc/common.opt | 2 +-
 gcc/opts.cc| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 06ef768ab78..296d6f194bf 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2106,7 +2106,7 @@ Specify the algorithm to partition symbols and vars at 
linktime.
 ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.
 flto-compression-level=
 Common Joined RejectNegative UInteger Var(flag_lto_compression_level) Init(-1) 
IntegerRange(0, 19)
--flto-compression-level=   Use zlib/zstd compression level 
 for IL.
+Use zlib/zstd compression level  for IL.
 
 flto-odr-type-merging
 Common Ignore
diff --git a/gcc/opts.cc b/gcc/opts.cc
index e058aaf3697..eb5db01de17 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -1801,7 +1801,7 @@ print_filtered_help (unsigned int include_flags,
  help = new_help;
}
 
-  if (option->range_max != -1)
+  if (option->range_max != -1 && tab == NULL)
{
  char b[128];
  snprintf (b, sizeof (b), "<%d,%d>", option->range_min,
-- 
2.37.3



Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-22 Thread Segher Boessenkool
Hi!

On Thu, Sep 22, 2022 at 05:59:07PM +0800, HAO CHEN GUI wrote:
> >> I still think we should get RTL codes for this, to have access to proper
> >> floating point min/max semantics always and everywhere.  "fmin" and
> >> "fmax" seem to be good names :-)
> > 
> > It would be good, especially if we have observed some uses of these bifs
> > and further opportunities around them.  :)
> > 
> Shall we submit a PR to add fmin/fmax to RTL codes?

Yes, please do.

If we have fmin/fmax RTL codes that describe the standard semantics,
we can generate code for that with -ffast-math as well, since the
code generated is optimal in either case; it's just the *generic*
optimisations that fall behind.


Segher


[PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-22 Thread Marek Polacek via Gcc-patches
To improve compile times, the C++ library could use compiler built-ins
rather than implementing std::is_convertible (and _nothrow) as class
templates.  This patch adds the built-ins.  We already have
__is_constructible and __is_assignable, and the nothrow forms of those.

Microsoft (and clang, for compatibility) also provide an alias called
__is_convertible_to.  I did not add it, but it would be trivial to do
so.
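
As a rough illustration of how user code could pick between the library
trait and the new built-in, here is a minimal sketch.  The types From, To
and ExplicitTo, the macro SKETCH_HAVE_IS_CONVERTIBLE and the helper
is_convertible_sketch are all hypothetical names; the sketch only assumes
that __has_builtin reports __is_convertible when the compiler provides it,
and it is not how libstdc++ is implemented.

```cpp
#include <cassert>
#include <type_traits>

struct From {};
struct To { To(const From &) {} };                           // implicit conversion
struct ExplicitTo { explicit ExplicitTo(const From &) {} };  // explicit only

// Detect the built-in without assuming the compiler has it.
#ifdef __has_builtin
# if __has_builtin(__is_convertible)
#  define SKETCH_HAVE_IS_CONVERTIBLE 1
# endif
#endif
#ifndef SKETCH_HAVE_IS_CONVERTIBLE
# define SKETCH_HAVE_IS_CONVERTIBLE 0
#endif

// Uses the built-in when available, otherwise the class template.
template <typename F, typename T>
constexpr bool is_convertible_sketch() {
#if SKETCH_HAVE_IS_CONVERTIBLE
  return __is_convertible(F, T);
#else
  return std::is_convertible<F, T>::value;
#endif
}

static_assert(is_convertible_sketch<From, To>(),
              "implicit converting constructor");
static_assert(!is_convertible_sketch<From, ExplicitTo>(),
              "explicit constructor is not an implicit conversion");
```

Either branch should give the same answers; the built-in only avoids
instantiating a class template for each query.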

I noticed that our __is_assignable doesn't implement the "Access checks
are performed as if from a context unrelated to either type" requirement,
therefore std::is_assignable / __is_assignable give two different results
here:

  class S {
operator int();
friend void g(); // #1
  };

  void
  g ()
  {
// #1 doesn't matter
static_assert(std::is_assignable<int&, S>::value, "");
static_assert(__is_assignable(int&, S), "");
  }

This is not a problem if __is_assignable is not meant to be used by
the users.

This patch doesn't make libstdc++ use the new built-ins, but I had to
rename a class, since otherwise its name would clash with the new built-in.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/106784

gcc/c-family/ChangeLog:

* c-common.cc (c_common_reswords): Add __is_convertible and
__is_nothrow_convertible.
* c-common.h (enum rid): Add RID_IS_CONVERTIBLE and
RID_IS_NOTHROW_CONVERTIBLE.

gcc/cp/ChangeLog:

* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_CONVERTIBLE
and CPTK_IS_NOTHROW_CONVERTIBLE.
* cp-objcp-common.cc (names_builtin_p): Handle RID_IS_CONVERTIBLE and
RID_IS_NOTHROW_CONVERTIBLE.
* cp-tree.h (enum cp_trait_kind): Add CPTK_IS_CONVERTIBLE and
CPTK_IS_NOTHROW_CONVERTIBLE.
(is_convertible): Declare.
(is_nothrow_convertible): Likewise.
* cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
CPTK_IS_CONVERTIBLE and CPTK_IS_NOTHROW_CONVERTIBLE.
* method.cc (is_convertible): New.
(is_nothrow_convertible): Likewise.
* parser.cc (cp_parser_primary_expression): Handle RID_IS_CONVERTIBLE
and RID_IS_NOTHROW_CONVERTIBLE.
(cp_parser_trait_expr): Likewise.
* semantics.cc (trait_expr_value): Handle CPTK_IS_CONVERTIBLE and
CPTK_IS_NOTHROW_CONVERTIBLE.
(finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:

* include/std/type_traits: Rename __is_nothrow_convertible to
__is_nothrow_convertible_lib.
* testsuite/20_util/is_nothrow_convertible/value_ext.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Enhance to test __is_convertible and
__is_nothrow_convertible.
* g++.dg/ext/is_convertible1.C: New test.
* g++.dg/ext/is_convertible2.C: New test.
* g++.dg/ext/is_nothrow_convertible1.C: New test.
* g++.dg/ext/is_nothrow_convertible2.C: New test.
---
 gcc/c-family/c-common.cc  |   2 +
 gcc/c-family/c-common.h   |   1 +
 gcc/cp/constraint.cc  |   6 +
 gcc/cp/cp-objcp-common.cc |   2 +
 gcc/cp/cp-tree.h  |   4 +
 gcc/cp/cxx-pretty-print.cc|   6 +
 gcc/cp/method.cc  |  31 ++
 gcc/cp/parser.cc  |  10 +
 gcc/cp/semantics.cc   |   8 +
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |   6 +
 gcc/testsuite/g++.dg/ext/is_convertible1.C| 269 +
 gcc/testsuite/g++.dg/ext/is_convertible2.C|  46 +++
 .../g++.dg/ext/is_nothrow_convertible1.C  | 270 ++
 .../g++.dg/ext/is_nothrow_convertible2.C  |  19 ++
 libstdc++-v3/include/std/type_traits  |   4 +-
 .../is_nothrow_convertible/value_ext.cc   |   4 +-
 16 files changed, 684 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_convertible1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_convertible2.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_convertible1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_convertible2.C

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index c0f15f4cab1..dce3045c9f2 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -541,6 +541,8 @@ const struct c_common_resword c_common_reswords[] =
   { "__is_constructible", RID_IS_CONSTRUCTIBLE, D_CXXONLY },
   { "__is_nothrow_assignable", RID_IS_NOTHROW_ASSIGNABLE, D_CXXONLY },
   { "__is_nothrow_constructible", RID_IS_NOTHROW_CONSTRUCTIBLE, D_CXXONLY },
+  { "__is_convertible", RID_IS_CONVERTIBLE, D_CXXONLY },
+  { "__is_nothrow_convertible", RID_IS_NOTHROW_CONVERTIBLE, D_CXXONLY },
   { "__reference_constructs_from_temporary", RID_REF_CONSTRUCTS_FROM_TEMPORARY,
D_CXXONLY },
   { "__reference_converts_from_temporary", RID_REF_CONVERTS_FROM_TEMPORARY,
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h

libgo patch committed: Add cgo.Incomplete

2022-09-22 Thread Ian Lance Taylor via Gcc-patches
This libgo patch changes the cgo command to use runtime/cgo.Incomplete
instead of //go:notinheap, and to define the new type in the
runtime/cgo package.  This ports https://go.dev/cl/421879 to libgo.
This is a quick port to update libgo to work with the version of cgo
in gc mainline.  A more complete port will follow, changing the gc
version of cmd/cgo to choose an approach based on feature testing the
gccgo in use.  Bootstrapped and tested on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
c69b87678d3c4e9b995b8ccb51fb38c75a134323
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index dce38e727a7..f7a7985287d 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-6543b7fc6da533eb976b37649a925e7fd5a521fa
+42efec8c126cf3787bc7c89d9c7f224eff7c5a21
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/cmd/cgo/gcc.go b/libgo/go/cmd/cgo/gcc.go
index 02391495bbe..e786aeaafa9 100644
--- a/libgo/go/cmd/cgo/gcc.go
+++ b/libgo/go/cmd/cgo/gcc.go
@@ -132,12 +132,11 @@ func (p *Package) addToFlag(flag string, args []string) {
 //
 // For example, the following string:
 //
-// `a b:"c d" 'e''f'  "g\""`
+// `a b:"c d" 'e''f'  "g\""`
 //
 // Would be parsed as:
 //
-// []string{"a", "b:c d", "ef", `g"`}
-//
+// []string{"a", "b:c d", "ef", `g"`}
 func splitQuoted(s string) (r []string, err error) {
var args []string
arg := make([]rune, len(s))
@@ -1156,13 +1155,19 @@ func (p *Package) mangle(f *File, arg *ast.Expr, 
addPosition bool) (ast.Expr, bo
 
 // checkIndex checks whether arg has the form [i], possibly inside
 // type conversions. If so, then in the general case it writes
-//_cgoIndexNN := a
-//_cgoNN := [i] // with type conversions, if any
+//
+// _cgoIndexNN := a
+// _cgoNN := [i] // with type conversions, if any
+//
 // to sb, and writes
-//_cgoCheckPointer(_cgoNN, _cgoIndexNN)
+//
+// _cgoCheckPointer(_cgoNN, _cgoIndexNN)
+//
 // to sbCheck, and returns true. If a is a simple variable or field reference,
 // it writes
-//_cgoIndexNN := 
+//
+// _cgoIndexNN := 
+//
 // and dereferences the uses of _cgoIndexNN. Taking the address avoids
 // making a copy of an array.
 //
@@ -1210,10 +1215,14 @@ func (p *Package) checkIndex(sb, sbCheck *bytes.Buffer, 
arg ast.Expr, i int) boo
 
 // checkAddr checks whether arg has the form , possibly inside type
 // conversions. If so, it writes
-//_cgoBaseNN := 
-//_cgoNN := _cgoBaseNN // with type conversions, if any
+//
+// _cgoBaseNN := 
+// _cgoNN := _cgoBaseNN // with type conversions, if any
+//
 // to sb, and writes
-//_cgoCheckPointer(_cgoBaseNN, true)
+//
+// _cgoCheckPointer(_cgoBaseNN, true)
+//
 // to sbCheck, and returns true. This tells _cgoCheckPointer to check
 // just the contents of the pointer being passed, not any other part
 // of the memory allocation. This is run after checkIndex, which looks
@@ -2131,8 +2140,8 @@ type typeConv struct {
// Type names X for which there exists an XGetTypeID function with type 
func() CFTypeID.
getTypeIDs map[string]bool
 
-   // badStructs contains C structs that should be marked NotInHeap.
-   notInHeapStructs map[string]bool
+   // incompleteStructs contains C structs that should be marked 
Incomplete.
+   incompleteStructs map[string]bool
 
// Predeclared types.
bool   ast.Expr
@@ -2145,7 +2154,6 @@ type typeConv struct {
string ast.Expr
goVoid ast.Expr // _Ctype_void, denotes 
C's void
goVoidPtr  ast.Expr // unsafe.Pointer or 
*byte
-   goVoidPtrNoHeapast.Expr // 
*_Ctype_void_notinheap, like goVoidPtr but marked NotInHeap
 
ptrSize int64
intSize int64
@@ -2169,7 +2177,7 @@ func (c *typeConv) Init(ptrSize, intSize int64) {
c.m = make(map[string]*Type)
c.ptrs = make(map[string][]*Type)
c.getTypeIDs = make(map[string]bool)
-   c.notInHeapStructs = make(map[string]bool)
+   c.incompleteStructs = make(map[string]bool)
c.bool = c.Ident("bool")
c.byte = c.Ident("byte")
c.int8 = c.Ident("int8")
@@ -2188,7 +2196,6 @@ func (c *typeConv) Init(ptrSize, intSize int64) {
c.void = c.Ident("void")
c.string = c.Ident("string")
c.goVoid = c.Ident("_Ctype_void")
-   c.goVoidPtrNoHeap = c.Ident("*_Ctype_void_notinheap")
 
// Normally cgo translates void* to unsafe.Pointer,
// but for historical reasons -godefs uses *byte instead.
@@ -2531,19 +2538,13 @@ func (c *typeConv) loadType(dtype dwarf.Type, pos 
token.Pos, parent string) *Typ
// other than try to determine a Go representation.
tt := *t
tt.C = {"%s %s", 

Re: [PATCH v3 08/11] OpenMP/OpenACC: Rework clause expansion and nested struct handling

2022-09-22 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 19, 2022 at 08:40:34PM +0100, Julian Brown wrote:
> On Wed, 14 Sep 2022 15:24:12 +0200
> Jakub Jelinek  wrote:
> 
> > On Tue, Sep 13, 2022 at 02:03:18PM -0700, Julian Brown wrote:
> > > This patch is an extension and rewrite/rethink of the following two
> > > patches:
> > > 
> > >   "OpenMP/OpenACC: Add inspector class to unify mapped address
> > > analysis"
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591977.html
> > > 
> > >   "OpenMP: Handle reference-typed struct members"
> > >   https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591978.html
> 
> Here's a new version with some review comments addressed, rebased and
> with some adjustments to tests etc., necessary because of tweaks to
> earlier patches.
> 
> Re-tested with offloading to NVPTX. OK?

Ok, thanks.

Jakub



[PATCH] tree-optimization/102801 - testcase for uninit diagnostic

2022-09-22 Thread Richard Biener via Gcc-patches
The following testcase is fixed in GCC 12+

Tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/102801
gcc/testsuite/
* g++.dg/warn/Wuninitialized-33.C: New testcase.
---
 gcc/testsuite/g++.dg/warn/Wuninitialized-33.C | 55 +++
 1 file changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wuninitialized-33.C

diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-33.C 
b/gcc/testsuite/g++.dg/warn/Wuninitialized-33.C
new file mode 100644
index 000..1bb0639ee30
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-33.C
@@ -0,0 +1,55 @@
+// PR102801
+// { dg-do compile }
+// { dg-require-effective-target c++17 }
+// { dg-options "-O2 -Wall" }
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+class C {
+bool b{}; // { dg-bogus "uninitialized" }
+
+struct Shared {};
+using SharedPtr = std::shared_ptr;
+
+SharedPtr shared;
+
+public:
+C() = delete;
+C(bool bIn) : b(bIn) {}
+~C();
+int someMethod() const;
+};
+
+using OptC = std::optional;
+
+class C2 {
+OptC c;
+public:
+C2() = default;
+C2(const C ) : c(cIn) {}
+~C2();
+void operator()() const;
+void swap(C2 ) { std::swap(c, o.c); }
+};
+
+
+template 
+class Q {
+std::vector queue;
+public:
+void Add(std::vector ) {
+for (T & item : items) {
+queue.push_back(T());
+item.swap(queue.back());
+}
+}
+void Exec();
+};
+
+extern void foo(Q & q, std::vector );
+void foo(Q & q, std::vector ) { q.Add(items); q.Exec(); }
-- 
2.35.3


Re: [PATCH v3 06/11] OpenMP: Pointers and member mappings

2022-09-22 Thread Jakub Jelinek via Gcc-patches
On Sun, Sep 18, 2022 at 08:19:29PM +0100, Julian Brown wrote:
> @@ -2609,6 +2672,9 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
> gfc_omp_clauses *clauses,
>if (clauses == NULL)
>  return NULL_TREE;
>  
> +  hash_map sym_rooted_nl;

Isn't hash_map ctor pretty costly (allocates memory etc.)?
And gfc_trans_omp_clauses is called for all OpenMP constructs, in many
cases they are never going to have any map clauses or even if they do,
they might not trigger this code.

> +  bool built_sym_hash = false;

So, I think usually we don't construct such hash_maps right away,
but have just a pointer to the hash map initialized to NULL (then you
don't need built_sym_hash next to it) and you simply new the hash_map
when it is needed the first time and delete it at the end (which does
nothing if it is NULL).

Jakub
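
The lazy-allocation pattern suggested above might look roughly like the
following sketch.  It uses std::unordered_map as a stand-in for GCC's
hash_map, and ClauseState, table and process_clauses are hypothetical
names, so this is an illustration of the idea rather than the actual
gfc_trans_omp_clauses change.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for per-call state: the table is a null pointer until needed,
// so calls that never see a "map" clause pay no construction cost.
struct ClauseState {
  std::unordered_map<std::string, int> *sym_rooted = nullptr;

  ClauseState() = default;
  ClauseState(const ClauseState &) = delete;
  ClauseState &operator=(const ClauseState &) = delete;
  ~ClauseState() { delete sym_rooted; }  // deleting a null pointer is a no-op

  // Allocate the table on first use only.
  std::unordered_map<std::string, int> &table() {
    if (!sym_rooted)
      sym_rooted = new std::unordered_map<std::string, int>();
    return *sym_rooted;
  }

  // The pointer itself replaces a separate "built" flag.
  bool built() const { return sym_rooted != nullptr; }
};

// Returns whether the table had to be built for this list of clauses.
bool process_clauses(const std::vector<std::string> &clauses) {
  ClauseState st;
  for (const auto &c : clauses)
    if (c == "map")
      ++st.table()[c];
  return st.built();
}
```

With this shape, process_clauses({"private"}) never allocates, while a
list containing "map" allocates exactly once.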



[PATCH][DOCS] changes: mentioned ignore -gz=zlib-gnu option

2022-09-22 Thread Martin Liška
---
 htdocs/gcc-13/changes.html | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index a7d88038..0e895110 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -35,6 +35,10 @@ a work-in-progress.
   -gstabs and -gxcoff options) has been removed.
   (This means the dbx debugger is no longer
   supported, either.)
+Legacy debug info compression option -gz=zlib-gnu was 
removed
+  and the option is ignored right now.  If you really want to use the 
compression algorithm,
+  use the corresponding -Wl,--compress-debug-sections=zlib-gnu
+  and -Wa,--compress-debug-sections=zlib-gnu options.
 
 
 
-- 
2.37.3



Re: [PATCH] tree-object-size: Support strndup and strdup

2022-09-22 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 15, 2022 at 03:23:11PM -0400, Siddhesh Poyarekar wrote:
> --- a/gcc/tree-object-size.cc
> +++ b/gcc/tree-object-size.cc
> @@ -495,6 +495,18 @@ decl_init_size (tree decl, bool min)
>return size;
>  }
>  
> +/* Get the outermost object that PTR may point into.  */
> +
> +static tree
> +get_whole_object (const_tree ptr)
> +{
> +  tree pt_var = TREE_OPERAND (ptr, 0);
> +  while (handled_component_p (pt_var))
> +pt_var = TREE_OPERAND (pt_var, 0);
> +
> +  return pt_var;
> +}

Not sure why you want a new function for this.
This is essentially get_base_address (TREE_OPERAND (ptr, 0)).

>  /* Compute __builtin_object_size for PTR, which is a ADDR_EXPR.
> OBJECT_SIZE_TYPE is the second argument from __builtin_object_size.
> If unknown, return size_unknown (object_size_type).  */
> +  if (!size_valid_p (sz, object_size_type)
> +   || size_unknown_p (sz, object_size_type))
> +{
> +  tree wholesrc = NULL_TREE;
> +  if (TREE_CODE (src) == ADDR_EXPR)
> + wholesrc = get_whole_object (src);
> +
> +  if (!(object_size_type & OST_MINIMUM)
> +   || (wholesrc && TREE_CODE (wholesrc) == STRING_CST))

Is this safe?  I mean get_whole_object will also skip ARRAY_REFs with
variable indexes etc. and the STRING_CST could have embedded '\0's
in it.
Even if c_strlen (src, 1) is constant, I don't see what you can assume
for object size of strndup ("abcd\0efgh", n); for minimum, except 1.
But on the other side, 1 is a safe minimum for OST_MINIMUM of both
strdup and strndup if you don't find anything more specific (exact strlen
for strndup) because the terminating '\0' will be always there.
Other than that you'd need to consider INTEGER_CST second strndup argument
or ranges of the second argument etc.
E.g. maximum for OST_DYNAMIC could be for strndup (src, n)
MIN (__bdos (src, ?), n + 1).
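
To make the bounds above concrete, here is a small sketch.  The helper
names are hypothetical, the object size of src is passed in as a plain
number instead of coming from __bdos, and the upper bound assumes src is
NUL-terminated within its object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Bytes strndup (src, n) allocates: the characters before the first '\0'
// (at most n of them) plus the terminating '\0'.
std::size_t strndup_alloc_size(const char *src, std::size_t n) {
  std::size_t len = 0;
  while (len < n && src[len] != '\0')
    ++len;
  return len + 1;
}

// Safe minimum when nothing more specific is known: the terminator alone.
constexpr std::size_t strndup_min_size = 1;

// Upper bound from the text: MIN (object size of src, n + 1).
// Assumes src is NUL-terminated within its object.
std::size_t strndup_max_size(std::size_t src_object_size, std::size_t n) {
  if (n == std::numeric_limits<std::size_t>::max())
    return src_object_size;  // n + 1 would wrap around
  return std::min(src_object_size, n + 1);
}
```

For the literal "abcd\0efgh" (an object of 10 bytes) and n = 8, the
allocation is 5 bytes, which lies between the minimum of 1 and the
upper bound MIN (10, 9) = 9.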

> @@ -2113,7 +2177,7 @@ const pass_data pass_data_object_sizes =
>PROP_objsz, /* properties_provided */
>0, /* properties_destroyed */
>0, /* todo_flags_start */
> -  0, /* todo_flags_finish */
> +  TODO_update_ssa_no_phi, /* todo_flags_finish */
>  };
>  
>  class pass_object_sizes : public gimple_opt_pass
> @@ -2153,7 +2217,7 @@ const pass_data pass_data_early_object_sizes =
>0, /* properties_provided */
>0, /* properties_destroyed */
>0, /* todo_flags_start */
> -  0, /* todo_flags_finish */
> +  TODO_update_ssa_no_phi, /* todo_flags_finish */
>  };

This is quite expensive.  Do you really need a full ssa update, or would
just TODO_update_ssa_only_virtuals be enough (is it for the missing
vuse on the strlen call if you emit it)?
In any case, it would be better not to do that always, but only if you
really need it (emitted the strlen call somewhere; e.g. if __bdos is
never used, only __bos, it is certainly not needed); todo flags
can be both in todo_flags_finish and in the return value from the execute
method.

Jakub
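
The "only if you really need it" suggestion can be sketched as follows.
All names and flag values here are hypothetical stand-ins, not GCC's real
TODO_* definitions; the point is only that the execute step returns the
extra flag when it actually emitted a call.

```cpp
#include <cassert>

// Hypothetical stand-ins for todo flag bits; the values are made up.
constexpr unsigned SKETCH_TODO_NONE = 0;
constexpr unsigned SKETCH_TODO_UPDATE_SSA_ONLY_VIRTUALS = 1u << 0;

// What a pass run might record while processing a function.
struct PassRun {
  int strlen_calls_emitted = 0;
};

// Mirrors the shape of an execute method: request the SSA update
// only when a strlen call was emitted during this run.
unsigned execute_sketch(const PassRun &run) {
  return run.strlen_calls_emitted > 0
             ? SKETCH_TODO_UPDATE_SSA_ONLY_VIRTUALS
             : SKETCH_TODO_NONE;
}
```

A run that only handled __bos-style queries would then return no flags
and skip the update entirely.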



[PATCH] support -gz=zstd for both linker and assembler

2022-09-22 Thread Martin Liška
Hi.

Tested with Fangrui's patch set sent to binutils ML and mold linker.

$ gcc -g -gz=zstd a.c --save-temps --verbose 2>&1 | grep debug-sections
 /home/marxin/Programming/binutils/objdir/gas/as-new -v --gdwarf-5 
--compress-debug-sections=zstd --64 -o a.o a.s
 /home/marxin/bin/gcc/libexec/gcc/x86_64-pc-linux-gnu/13.0.0/collect2 -plugin 
/home/marxin/bin/gcc/libexec/gcc/x86_64-pc-linux-gnu/13.0.0/liblto_plugin.so 
-plugin-opt=/home/marxin/bin/gcc/libexec/gcc/x86_64-pc-linux-gnu/13.0.0/lto-wrapper
 -plugin-opt=-fresolution=a.res -plugin-opt=-pass-through=-lgcc 
-plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc 
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s 
--eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 
--compress-debug-sections=zstd /lib/../lib64/crt1.o /lib/../lib64/crti.o 
/home/marxin/bin/gcc/lib64/gcc/x86_64-pc-linux-gnu/13.0.0/crtbegin.o 
-L/home/marxin/bin/gcc/lib64/gcc/x86_64-pc-linux-gnu/13.0.0 
-L/home/marxin/bin/gcc/lib64/gcc/x86_64-pc-linux-gnu/13.0.0/../../../../lib64 
-L/lib/../lib64 -L/usr/lib/../lib64 
-L/home/marxin/bin/gcc/lib64/gcc/x86_64-pc-linux-gnu/13.0.0/../../.. a.o -lgcc 
--as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed 
/home/marxin/bin/gcc/lib64/gcc/x86_64-pc-linux-gnu/13.0.0/crtend.o 
/lib/../lib64/crtn.o

$ gdb a.out
...
BFD: /home/marxin/Programming/testcases/a.out: unable to initialize decompress 
status for section .debug_abbrev
BFD: /home/marxin/Programming/testcases/a.out: unable to initialize decompress 
status for section .debug_abbrev
"/home/marxin/Programming/testcases/a.out": not in executable format: file 
format not recognized

So it's really compressed with zstd. I'm going to write a ChangeLog entry for 
zlib-gnu once this gets merged as well.

Ready to be installed?
Thanks,
Martin

PR driver/106897

gcc/ChangeLog:

* common.opt: Add -gz=zstd value.
* configure.ac: Detect --compress-debug-sections=zstd
for both linker and assembler.
* configure: Regenerate.
* gcc.cc (LINK_COMPRESS_DEBUG_SPEC): Handle -gz=zstd.
(ASM_COMPRESS_DEBUG_SPEC): Likewise.
---
 gcc/common.opt   |  5 -
 gcc/configure| 11 +--
 gcc/configure.ac | 11 +--
 gcc/gcc.cc   | 15 +++
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 06ef768ab78..68370db816b 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3419,7 +3419,10 @@ EnumValue
 Enum(compressed_debug_sections) String(zlib) Value(1)
 
 EnumValue
-Enum(compressed_debug_sections) String(zlib-gnu) Value(2)
+Enum(compressed_debug_sections) String(zstd) Value(2)
+
+EnumValue
+Enum(compressed_debug_sections) String(zlib-gnu) Value(3)
 
 gz
 Common Driver
diff --git a/gcc/configure b/gcc/configure
index 70a013e9a30..ce4e1859e1f 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -29727,13 +29727,16 @@ else
if $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s 2>&1 | 
grep -i warning > /dev/null
then
  gcc_cv_as_compress_debug=0
-   # Since binutils 2.26, gas supports --compress-debug-sections=zlib,
-   # defaulting to the ELF gABI format.
elif $gcc_cv_as --compress-debug-sections=zlib -o conftest.o conftest.s > 
/dev/null 2>&1
then
  gcc_cv_as_compress_debug=1
  gcc_cv_as_compress_debug_option="--compress-debug-sections"
  gcc_cv_as_no_compress_debug_option="--nocompress-debug-sections"
+ # Since binutils 2.40, gas supports --compress-debug-sections=zstd.
+ if $gcc_cv_as --compress-debug-sections=zstd -o conftest.o conftest.s > 
/dev/null 2>&1
+ then
+   gcc_cv_as_compress_debug=2
+ fi
else
  gcc_cv_as_compress_debug=0
fi
@@ -30251,6 +30254,10 @@ $as_echo_n "checking linker for compressed debug 
sections... " >&6; }
 if $gcc_cv_ld --help 2>&1 | grep -- '--compress-debug-sections.*\' 
> /dev/null; then
 gcc_cv_ld_compress_debug=1
 gcc_cv_ld_compress_debug_option="--compress-debug-sections"
+# Detect zstd debug section compression support
+if $gcc_cv_ld --help 2>&1 | grep -- '--compress-debug-sections.*\' 
> /dev/null; then
+  gcc_cv_ld_compress_debug=2
+fi
 else
   case "${target}" in
 *-*-solaris2*)
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 96e10d7c194..b6bafa8b7d6 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5732,13 +5732,16 @@ gcc_GAS_CHECK_FEATURE([compressed debug sections],
if $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s 2>&1 | 
grep -i warning > /dev/null
then
  gcc_cv_as_compress_debug=0
-   # Since binutils 2.26, gas supports --compress-debug-sections=zlib,
-   # defaulting to the ELF gABI format.
elif $gcc_cv_as --compress-debug-sections=zlib -o conftest.o conftest.s > 
/dev/null 2>&1
then
  gcc_cv_as_compress_debug=1
  gcc_cv_as_compress_debug_option="--compress-debug-sections"
  

Re: PING^2: [PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-09-22 Thread Richard Biener via Gcc-patches
On Thu, Sep 22, 2022 at 2:34 PM Richard Biener
 wrote:
>
> On Wed, Sep 7, 2022 at 9:30 AM Di Zhao OS  
> wrote:
> >
> > Gentle ping again.
>
> So I got the chance to review the change again while travelling to GNU
> Cauldron 2022.
>
> There's quite some factoring / moving of stuff in the patch.  I've
> already pushed to trunk a change that factors out
> can_track_predicate_on_edge (your vn_tracking_edge); factoring out
> is_vn_valid_at_bb also looks useful, so I'll follow up with a similar
> change.
>
> I'm going to attach a commented quoted patch (because that's what I
> produced while travelling).  An overall comment (also in that
> attachment) would be
> "why are equivalences recorded in the expression hash table at all?  Are
> they not predicated values of an SSA name and thus should be a
> vn_pval chain in vn_ssa_aux itself?
>
> conditional equivalences are expensive to handle (so are the existing
> predicated values which I do not like too much and which, frankly, I've
> probably designed too generally - ATM we only ever insert predicated
> values 'true' and 'false' which could be used to simplify a lot of logic
> but would break this patch?)
>
> At some point I wanted to see whether we can use ranger relations
> for all of this.
>
> Then, for "true" equivalence tracking it might be interesting to explore
> "path value numbering", aka allow revisiting code from the equivalence
> op defs to the equivalence producing edge(s) with the equivalence fully
> reflected in the value number.  The interesting thing might be that we
> can track whether there's any equivalence on the side and based on
> use heuristic decide whether that's going to pay off.  It might be also
> possible to re-use this to improve jump threading costing.  If we'd be
> able to "fork" the VN state we could re-run from the later definition
> of an equivalence to the point it is established.
>
> So, overall I wasn't able to get at what this patch will catch and what
> it will not catch - that is, to what extent equivalences affect
> previously and future recorded expressions.  Plus the implementation
> feels like it is bolted on in the wrong place.
>
> As I'm not happy with my predicated values implementation either I'm
> of course a bit biased here (note the implementation was mostly added
> to avoid regressions with respect to the previous VN implementation
> and I should probably make it less general and more optimized - but
> as said, using ranger might be an option here).
>
> You have one testcase, ssa-fre-102.c, that seems to require VN
> with equivalences, the others should be caught by ranger's relation
> handling, no?"
>
> I've looked into using ranger for what the existing predicated value
> handling does, plus catch more cases transparently.  I'm not sure
> ranger's equivalence handling is a good fit, so to handle those an
> approach like yours might be necessary.  Note I'm not really happy
> about the patch as-is (nor am I happy about what I implemented
> with predicated values - sorry for that).  I'm not even sure equivalences
> can be handled "nicely" :/

Meh, I said I would attach something.  Here it is.

Richard.


review
Description: Binary data


Re: [PATCH v2] remove -gz=zlib-gnu option value

2022-09-22 Thread Richard Biener via Gcc-patches
On Thu, Sep 22, 2022 at 2:26 PM Martin Liška  wrote:
>
> Hi.
>
> I have a better version of the patch where section compression detection is 
> based
> on ld --help, rather than a particular binutils version.
> That's much easier and all ld, ld.bfd and mold use the very same option.
>
> Ready to be installed?

OK.  Can you document the change in changes.html?

> Thanks,
> Martin


Re: PING^2: [PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-09-22 Thread Richard Biener via Gcc-patches
On Wed, Sep 7, 2022 at 9:30 AM Di Zhao OS  wrote:
>
> Gentle ping again.

So I got the chance to review the change again while travelling to GNU
Cauldron 2022.

There's quite some factoring / moving of stuff in the patch.  I've
already pushed to trunk a change that factors out
can_track_predicate_on_edge (your vn_tracking_edge); factoring out
is_vn_valid_at_bb also looks useful, so I'll follow up with a similar
change.

I'm going to attach a commented quoted patch (because that's what I
produced while travelling).  An overall comment (also in that
attachment) would be

"why are equivalences recorded in the expression hash table at all?  Are
they not predicated values of an SSA name and thus should be a
vn_pval chain in vn_ssa_aux itself?

conditional equivalences are expensive to handle (so are the existing
predicated values which I do not like too much and which, frankly, I've
probably designed too generally - ATM we only ever insert predicated
values 'true' and 'false' which could be used to simplify a lot of logic
but would break this patch?)

At some point I wanted to see whether we can use ranger relations
for all of this.

Then, for "true" equivalence tracking it might be interesting to explore
"path value numbering", aka allow revisiting code from the equivalence
op defs to the equivalence producing edge(s) with the equivalence fully
reflected in the value number.  The interesting thing might be that we
can track whether there's any equivalence on the side and based on
use heuristic decide whether that's going to pay off.  It might be also
possible to re-use this to improve jump threading costing.  If we'd be
able to "fork" the VN state we could re-run from the later definition
of an equivalence to the point it is established.

So, overall I wasn't able to get at what this patch will catch and what
it will not catch - that is, to what extent equivalences affect
previously and future recorded expressions.  Plus the implementation
feels like it is bolted on in the wrong place.

As I'm not happy with my predicated values implementation either I'm
of course a bit biased here (note the implementation was mostly added
to avoid regressions with respect to the previous VN implementation
and I should probably make it less general and more optimized - but
as said, using ranger might be an option here).

You have one testcase, ssa-fre-102.c, that seems to require VN
with equivalences, the others should be caught by ranger's relation
handling, no?"

I've looked into using ranger for what the existing predicated value
handling does, plus catch more cases transparently.  I'm not sure
ranger's equivalence handling is a good fit, so to handle those an
approach like yours might be necessary.  Note I'm not really happy
about the patch as-is (nor am I happy about what I implemented
with predicated values - sorry for that).  I'm not even sure equivalences
can be handled "nicely" :/
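
To make the kind of conditional equivalence under discussion concrete,
here is a minimal sketch.  The function name and the shape of the code
are hypothetical; this is not ssa-fre-102.c or any other testcase from
the patch.  Inside the guarded region a == b holds, so a + c and b + c
have the same value there and the inner comparison is always true.
Whether VN with equivalences or ranger's relation handling is the right
place to discover that is exactly the open question above.

```cpp
// Hypothetical illustration only; not a testcase from the patch.
int guarded_compare (int a, int b, int c)
{
  if (a == b)
    {
      // On this path a and b are equivalent, so x and y are equal.
      int x = a + c;
      int y = b + c;
      return x == y;   // always 1 here; could in principle be folded
    }
  return -1;           // no equivalence is established on this path
}
```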

Thanks,
Richard.

> Thanks,
> Di Zhao
>
> > -Original Message-
> > From: Di Zhao OS
> > Sent: Tuesday, July 12, 2022 2:08 AM
> > To: 'gcc-patches@gcc.gnu.org' 
> > Cc: 'Richard Biener' 
> > Subject: PING: [PATCH v5] tree-optimization/101186 - extend FRE with
> > "equivalence map" for condition prediction
> >
> > Updated the patch in the attachment, so it can apply.
> >
> > Thanks,
> > Di Zhao
> >
> > > -Original Message-
> > > From: Di Zhao OS
> > > Sent: Sunday, May 29, 2022 11:59 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener 
> > > Subject: [PATCH v5] tree-optimization/101186 - extend FRE with 
> > > "equivalence
> > > map" for condition prediction
> > >
> > > Hi, attached is a new version of the patch. The changes are:
> > > - Skip using temporary equivalences for floating-point values, because
> > > folding expressions can generate incorrect values. For example,
> > > operations on 0.0 and -0.0 may have different results.
> > > - Avoid inserting duplicated back-refs from value-number to predicates.
> > > - Disable fre in testsuite/g++.dg/pr83541.C .
> > >
> > > Summary of the previous versions:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587346.html
> > >
> > > Is the patch still considered?
> > >
> > > Thanks,
> > > Di Zhao
> > >
> > > ---
> > >
> > > Extend FRE with temporary equivalences.
> > >
> > > 2022-05-29  Di Zhao  
> > >
> > > gcc/ChangeLog:
> > > PR tree-optimization/101186
> > > * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a
> > > predicate already).
> > > (dominated_by_p_w_unex): Moved upward.
> > > (vn_nary_op_get_predicated_value): Moved upward.
> > > (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
> > > (lookup_equiv_head): Lookup the "equivalence head" of given node.
> > > (lookup_equiv_heads): Lookup the "equivalence head"s of given 
> > > nodes.
> > > (vn_tracking_edge): Extracted utility function.
> > > (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence
> > 

[PATCH v2] remove -gz=zlib-gnu option value

2022-09-22 Thread Martin Liška
Hi.

I have a better version of the patch where section compression detection
is based on ld --help, rather than a particular binutils version.
That's much easier and all ld, ld.bfd and mold use the very same option.

Ready to be installed?
Thanks,
Martin

From d2314c942c5c19a5fd5d6b2d45750d863636873c Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 22 Sep 2022 13:04:57 +0200
Subject: [PATCH] remove -gz=zlib-gnu option value

The option value is legacy and probably not used at all,
thus ignore it.

gcc/ChangeLog:

	* configure: Regenerate.
	* configure.ac: Simplify to gcc_cv_ld_compress_debug={0,1}
	and gcc_cv_as_compress_debug={0,1}.
	* doc/invoke.texi: Document the removal.
	* gcc.cc (LINK_COMPRESS_DEBUG_SPEC): Simplify and ignore
	  zlib-gnu.
	(ASM_COMPRESS_DEBUG_SPEC): Likewise.

Co-Authored-By: Fangrui Song 
---
 gcc/configure   | 49 +
 gcc/configure.ac| 49 +
 gcc/doc/invoke.texi | 11 +-
 gcc/gcc.cc  | 26 +---
 4 files changed, 20 insertions(+), 115 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 817d765568e..70a013e9a30 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -29727,16 +29727,9 @@ else
if $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s 2>&1 | grep -i warning > /dev/null
then
  gcc_cv_as_compress_debug=0
-   # Since binutils 2.26, gas supports --compress-debug-sections=type,
+   # Since binutils 2.26, gas supports --compress-debug-sections=zlib,
# defaulting to the ELF gABI format.
-   elif $gcc_cv_as --compress-debug-sections=zlib-gnu -o conftest.o conftest.s > /dev/null 2>&1
-   then
- gcc_cv_as_compress_debug=2
- gcc_cv_as_compress_debug_option="--compress-debug-sections"
- gcc_cv_as_no_compress_debug_option="--nocompress-debug-sections"
-   # Before binutils 2.26, gas only supported --compress-debug-options and
-   # emitted the traditional GNU format.
-   elif $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s > /dev/null 2>&1
+   elif $gcc_cv_as --compress-debug-sections=zlib -o conftest.o conftest.s > /dev/null 2>&1
then
  gcc_cv_as_compress_debug=1
  gcc_cv_as_compress_debug_option="--compress-debug-sections"
@@ -30254,48 +30247,16 @@ $as_echo "$gcc_cv_ld_eh_gc_sections_bug" >&6; }
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking linker for compressed debug sections" >&5
 $as_echo_n "checking linker for compressed debug sections... " >&6; }
-# gold/gld support compressed debug sections since binutils 2.19/2.21
-# In binutils 2.26, gld gained support for the ELF gABI format.
-if test $in_tree_ld = yes ; then
-  gcc_cv_ld_compress_debug=0
-  if test $ld_is_mold = yes; then
-gcc_cv_ld_compress_debug=3
-gcc_cv_ld_compress_debug_option="--compress-debug-sections"
-  elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 19 -o "$gcc_cv_gld_major_version" -gt 2 \
- && test $in_tree_ld_is_elf = yes && test $ld_is_gold = yes; then
-gcc_cv_ld_compress_debug=2
-gcc_cv_ld_compress_debug_option="--compress-debug-sections"
-  elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 26 -o "$gcc_cv_gld_major_version" -gt 2 \
- && test $in_tree_ld_is_elf = yes && test $ld_is_gold = no; then
-gcc_cv_ld_compress_debug=3
-gcc_cv_ld_compress_debug_option="--compress-debug-sections"
-  elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 21 -o "$gcc_cv_gld_major_version" -gt 2 \
- && test $in_tree_ld_is_elf = yes; then
+# GNU ld/gold support --compressed-debug-sections=zlib since binutils 2.26.
+if $gcc_cv_ld --help 2>&1 | grep -- '--compress-debug-sections.*\' > /dev/null; then
 gcc_cv_ld_compress_debug=1
-  fi
-elif echo "$ld_ver" | grep GNU > /dev/null; then
-  if test $ld_is_mold = yes; then
-gcc_cv_ld_compress_debug=3
-gcc_cv_ld_compress_debug_option="--compress-debug-sections"
-  elif test "$ld_vers_major" -lt 2 \
- || test "$ld_vers_major" -eq 2 -a "$ld_vers_minor" -lt 21; then
-gcc_cv_ld_compress_debug=0
-  elif test "$ld_vers_major" -eq 2 -a "$ld_vers_minor" -lt 26; then
-gcc_cv_ld_compress_debug=1
-  else
-gcc_cv_ld_compress_debug=3
-gcc_cv_ld_compress_debug_option="--compress-debug-sections"
-  fi
-  if test $ld_is_gold = yes; then
-gcc_cv_ld_compress_debug=2
 gcc_cv_ld_compress_debug_option="--compress-debug-sections"
-  fi
 else
   case "${target}" in
 *-*-solaris2*)
   # Introduced in Solaris 11.2.
   if $gcc_cv_ld --help 2>&1 | grep -- '-z compress-sections' > /dev/null; then
-gcc_cv_ld_compress_debug=3
+gcc_cv_ld_compress_debug=1
 gcc_cv_ld_compress_debug_option="-z compress-sections"
   else
 gcc_cv_ld_compress_debug=0
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 59f205a1781..96e10d7c194 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5732,16 +5732,9 @@ 

Re: [PATCH v3] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-09-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 21, 2022 at 11:12:15AM -0700, Noah Goldstein wrote:
> This patch allows for strchr(x, c) to be replaced with memchr(x, c,
> strlen(x) + 1) if strlen(x) has already been computed earlier in the
> tree.
> 
> Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> 
> Since memchr doesn't need to re-find the null terminator it is faster
> than strchr.
> 
> bootstrapped and tested on x86_64-linux.
> 
>   PR tree-optimization/95821
> 
> gcc/
> 
>   * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
>   memchr instead of strchr if strlen already computed.
> 
> gcc/testsuite/
> 
>   * c-c++-common/pr95821-1.c: New test.
>   * c-c++-common/pr95821-2.c: New test.
>   * c-c++-common/pr95821-3.c: New test.
>   * c-c++-common/pr95821-4.c: New test.
>   * c-c++-common/pr95821-5.c: New test.
>   * c-c++-common/pr95821-6.c: New test.
>   * c-c++-common/pr95821-7.c: New test.
>   * c-c++-common/pr95821-8.c: New test.

Sorry for the delay.

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr95821-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler "memchr" } } */

Please don't scan the assembler output; whether memchr will expand
to a call or be expanded inline etc. is not known.
Better use "-O2 -fdump-tree-optimized" in dg-options
and scan the optimized dump for "memchr \\\(".
Ditto for other tests.
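
As a rough sketch of that suggestion: the function name
find_after_strlen is made up for illustration, and the directives are
an assumption about how an adjusted test could look, not the final
testcase from the patch.

```cpp
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

#include <string.h>

/* Hypothetical test body: strlen (s) is computed first, so the
   strlen pass has a known length when it reaches strchr.  */
const char *
find_after_strlen (const char *s, int c, size_t *len_out)
{
  *len_out = strlen (s);
  return strchr (s, c);
}

/* Scan the GIMPLE dump rather than the assembler output.  */
/* { dg-final { scan-tree-dump "memchr \\\(" "optimized" } } */
```

Scanning the tree dump keeps the check independent of whether the
target later expands memchr inline or emits a library call.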

> @@ -2452,32 +2459,96 @@ strlen_pass::handle_builtin_strchr ()
> fprintf (dump_file, "Optimizing: ");
> print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>   }
> -   if (si != NULL && si->endptr != NULL_TREE)
> +   /* Three potential optimizations assume t=strlen (s) has already been
> +  computed:
> + 1. strchr (s, chr) where chr is known to be zero -> t

-> s + t
rather than
-> t
actually.

> + 2. strchr (s, chr) where chr is known not to be zero ->
> +memchr (s, chr, t)
> + 3. strchr (s, chr) where chr is not known to be zero or

nor instead of or?

> +non-zero -> memchr (s, chr, t + 1).  */
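
For reference, the three rewrites listed in that comment (with case 1
yielding s + t) can be checked against plain strchr with a small
standalone program.  This is only a sketch of the source-level
equivalences; the helper names are hypothetical and nothing here
models what the strlen pass does internally.

```cpp
#include <cstring>

// In all three helpers, t is assumed to be strlen (s), computed earlier.

// Case 1: chr is known to be zero, so the result is the terminator.
const char *strchr_zero (const char *s, std::size_t t)
{
  return s + t;
}

// Case 2: chr is known to be non-zero, so the terminator can never
// match and only the first t bytes need searching.
const char *strchr_nonzero (const char *s, int chr, std::size_t t)
{
  return static_cast<const char *> (std::memchr (s, chr, t));
}

// Case 3: nothing is known about chr, so the terminator must be
// included in the searched length.
const char *strchr_unknown (const char *s, int chr, std::size_t t)
{
  return static_cast<const char *> (std::memchr (s, chr, t + 1));
}
```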
> +   if (!is_strchr_zerop)
>   {
> -   rhs = unshare_expr (si->endptr);
> -   if (!useless_type_conversion_p (TREE_TYPE (lhs),
> -   TREE_TYPE (rhs)))
> - rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
> +   /* If its not strchr (s, zerop) then try and convert to
> +  memchr since strlen has already been computed.  */
> +   tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> +
> +   /* Only need to check length strlen (s) + 1 if chr may be zero.
> +  Otherwise the last chr (which is known to be zero) can never
> +  be a match.  */
> +   bool chr_nonzero = false;
> +   if (TREE_CODE (chr) == INTEGER_CST
> +   && integer_nonzerop (fold_convert (char_type_node, chr)))
> + chr_nonzero = true;
> +   else if (TREE_CODE (chr) == SSA_NAME
> +&& CHAR_TYPE_SIZE < INT_TYPE_SIZE)
> + {
> +   value_range r;
> +   /* Try to determine using ranges if (char) chr must
> +  be always 0.  That is true e.g. if all the subranges

must be always non-zero ?

> +  have the INT_TYPE_SIZE - CHAR_TYPE_SIZE bits
> +  the same on lower and upper bounds.  */

That is actually not enough, see below.

> +   if (get_range_query (cfun)->range_of_expr (r, chr, stmt)
> +   && r.kind () == VR_RANGE)
> + {
> +   wide_int mask
> +   = wi::mask (CHAR_TYPE_SIZE, true, INT_TYPE_SIZE);

Wrong indentation, = should be 2 columns left of wide_int.

> +   for (unsigned i = 0; i < r.num_pairs (); ++i)
> + if ((r.lower_bound (i) & mask)
> + != (r.upper_bound (i) & mask))
> +   {
> + chr_nonzero = false;
> + break;
> +   }

This else if actually can't do what it intends to, because
chr_nonzero is initialized to false at the start and in the loop you
also just set it to false, so it is always false.
You need to add chr_nonzero = true; before the for loop above.
With that, all the above test proves is that there is no range like
[15, 257] where it would include 256 in the middle of the range or
at the end.  But the above doesn't clear chr_nonzero on ranges like
[0, 32] or [256, 511] where (char) chr can still be zero.
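
A small standalone model makes the gap visible.  It is only a sketch
under stated assumptions: plain 32-bit unsigned values stand in for
wide_int, CHAR_TYPE_SIZE is taken to be 8, lo <= hi is assumed, and
both function names are hypothetical.

```cpp
#include <cstdint>

// All bits above the low 8 (assumed CHAR_TYPE_SIZE) bits.
constexpr std::uint32_t high_mask = ~std::uint32_t (0) << 8;

// Models the per-subrange test from the patch: true if the lower and
// upper bound agree in every bit above the char-sized part.
bool high_bits_agree (std::uint32_t lo, std::uint32_t hi)
{
  return (lo & high_mask) == (hi & high_mask);
}

// Ground truth by enumeration: does some value in [lo, hi] have all
// low 8 bits clear, i.e. truncate to (char) 0?
bool range_contains_char_zero (std::uint32_t lo, std::uint32_t hi)
{
  for (std::uint32_t v = lo; ; ++v)
    {
      if ((v & ~high_mask) == 0)
        return true;
      if (v == hi)
        return false;
    }
}
```

With these, [15, 257] fails high_bits_agree as intended, but [0, 32]
and [256, 511] pass it even though range_contains_char_zero is true
for both.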
So, the test should be:
if ((r.lower_bound (i) & mask)
!= (r.upper_bound (i) & mask)
|| (r.lower_bound (i) & ~mask) == 0)
or so, that will rule out also the above ranges and if one just has ranges
like:
[1, 32] U [48, 56] U [257, 

Re: [PATCH] c++ modules: partial variable template specializations [PR106826]

2022-09-22 Thread Nathan Sidwell via Gcc-patches

On 9/21/22 12:16, Patrick Palka wrote:

With partial variable template specializations, it looks like we
stream the VAR_DECL (i.e. the DECL_TEMPLATE_RESULT of the corresponding
TEMPLATE_DECL) since process_partial_specialization adds it to the
specializations table, but end up never streaming the corresponding
TEMPLATE_DECL itself that appears only in the primary template's
DECL_TEMPLATE_SPECIALIZATIONS list, which leads to the list being
incomplete on stream-in.

The modules machinery already has special logic for streaming partial
specializations of class templates; this patch generalizes it to handle
those of variable templates as well.


looks good, I didn't realize template vars had partial specializations.
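
For readers who have not met them either, a minimal standalone example
of a partial specialization of a variable template (C++14 or later).
The name pointer_depth is made up for illustration and is unrelated to
the new partial-2_a.C / partial-2_b.C tests.

```cpp
// Primary variable template: non-pointer types have depth 0.
template <typename T>
constexpr int pointer_depth = 0;

// Partial specialization for pointer types: one more level than
// the pointee.
template <typename T>
constexpr int pointer_depth<T *> = 1 + pointer_depth<T>;

static_assert (pointer_depth<int> == 0, "primary template");
static_assert (pointer_depth<int **> == 2,
               "partial specialization applied twice");
```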



Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

PR c++/106826

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_value): Use get_template_info in
the MK_partial case.
(trees_out::key_mergeable): Likewise.
(trees_in::key_mergeable): Likewise.
(has_definition): Consider DECL_INITIAL of a partial variable
template specialization.
(depset::hash::make_dependency): Introduce a dependency of
partial variable template specializations too.

gcc/testsuite/ChangeLog:

* g++.dg/modules/partial-2_a.C: New test.
* g++.dg/modules/partial-2_b.C: New test.
---
  gcc/cp/module.cc   | 32 +---
  gcc/testsuite/g++.dg/modules/partial-2_a.C | 43 ++
  gcc/testsuite/g++.dg/modules/partial-2_b.C | 21 +++
  3 files changed, 82 insertions(+), 14 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/partial-2_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/partial-2_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9a9ef4e3332..334bde99b0f 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -7789,8 +7789,9 @@ trees_out::decl_value (tree decl, depset *dep)
}
  else
{
- tree_node (CLASSTYPE_TI_TEMPLATE (TREE_TYPE (inner)));
- tree_node (CLASSTYPE_TI_ARGS (TREE_TYPE (inner)));
+ tree ti = get_template_info (inner);
+ tree_node (TI_TEMPLATE (ti));
+ tree_node (TI_ARGS (ti));
}
}
tree_node (get_constraints (decl));
@@ -10626,8 +10627,9 @@ trees_out::key_mergeable (int tag, merge_kind mk, tree decl, tree inner,
case MK_partial:
  {
key.constraints = get_constraints (inner);
-   key.ret = CLASSTYPE_TI_TEMPLATE (TREE_TYPE (inner));
-   key.args = CLASSTYPE_TI_ARGS (TREE_TYPE (inner));
+   tree ti = get_template_info (inner);
+   key.ret = TI_TEMPLATE (ti);
+   key.args = TI_ARGS (ti);
  }
  break;
}
@@ -10866,8 +10868,8 @@ trees_in::key_mergeable (int tag, merge_kind mk, tree decl, tree inner,
   spec; spec = TREE_CHAIN (spec))
{
  tree tmpl = TREE_VALUE (spec);
- if (template_args_equal (key.args,
-  CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl)))
+ tree ti = get_template_info (tmpl);
+ if (template_args_equal (key.args, TI_ARGS (ti))
  && cp_tree_equal (key.constraints,
get_constraints
(DECL_TEMPLATE_RESULT (tmpl
@@ -11381,8 +11383,7 @@ has_definition (tree decl)
  
  case VAR_DECL:

if (DECL_LANG_SPECIFIC (decl)
- && DECL_TEMPLATE_INFO (decl)
- && DECL_USE_TEMPLATE (decl) < 2)
+ && DECL_TEMPLATE_INFO (decl))
return DECL_INITIAL (decl);
else
{
@@ -12498,11 +12499,14 @@ depset::hash::make_dependency (tree decl, entity_kind ek)
  
if (!dep)

  {
-  if (DECL_IMPLICIT_TYPEDEF_P (decl)
- /* ... not an enum, for instance.  */
- && RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
- && TYPE_LANG_SPECIFIC (TREE_TYPE (decl))
- && CLASSTYPE_USE_TEMPLATE (TREE_TYPE (decl)) == 2)
+  if ((DECL_IMPLICIT_TYPEDEF_P (decl)
+  /* ... not an enum, for instance.  */
+  && RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
+  && TYPE_LANG_SPECIFIC (TREE_TYPE (decl))
+  && CLASSTYPE_USE_TEMPLATE (TREE_TYPE (decl)) == 2)
+ || (VAR_P (decl)
+ && DECL_LANG_SPECIFIC (decl)
+ && DECL_USE_TEMPLATE (decl) == 2))
{
  /* A partial or explicit specialization. Partial
 specializations might not be in the hash table, because
@@ -12515,7 +12519,7 @@ depset::hash::make_dependency (tree decl, entity_kind ek)
 dep_hash, and then convert the dep we just found into a
 redirect.  */
  
-	  tree ti = TYPE_TEMPLATE_INFO (TREE_TYPE (decl));

+ tree ti = get_template_info (decl);
  tree tmpl = TI_TEMPLATE (ti);
  tree partial = NULL_TREE;
  for (tree 

Re: [PATCH] c: fix uninitialized c_expr::m_decimal [PR106830]

2022-09-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 06, 2022 at 09:20:47PM -0400, David Malcolm via Gcc-patches wrote:
> I added c_expr::m_decimal in r13-2386-gbedfca647a9e9c1a as part of the
> implementation of -Wxor-used-as-pow, but I missed various places where
> the field needed to be initialized.
> 
> Fixed thusly (based on searching for places that assign to
> the original_code field).
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> Thanks
> Dave
> 
> 
> gcc/c-family/ChangeLog:
>   PR c/106830
>   * c-warn.cc (check_for_xor_used_as_pow): Don't try checking
>   values that don't fit in uhwi.
> 
> gcc/c/ChangeLog:
>   PR c/106830
>   * c-parser.cc (c_parser_initelt): Initialize m_decimal.
>   (c_parser_cast_expression): Likewise.
>   (c_parser_alignof_expression): Likewise.
>   (c_parser_postfix_expression_after_paren_type): Likewise.
>   (c_parser_postfix_expression_after_primary): Likewise.
>   (c_parser_expression): Likewise.
>   (c_parser_omp_variable_list): Likewise.
>   (c_parser_transaction_expression): Likewise.
>   * c-tree.h (c_expr::set_error): Likewise.
>   * c-typeck.cc (c_expr_sizeof_expr): Likewise.
>   (parser_build_unary_op): Likewise.
>   (parser_build_binary_op): Likewise.
>   (digest_init): Likewise.
>   (pop_init_level): Likewise.
>   * gimple-parser.cc (c_parser_gimple_call_internal): Likewise.
> 
> gcc/testsuite/ChangeLog:
>   PR c/106830
>   * gcc.dg/Wxor-used-as-pow-pr106830.c: New test.

Ok, thanks.

Jakub



[PATCH] remove -gz=zlib-gnu option value

2022-09-22 Thread Martin Liška
On 9/21/22 11:35, Richard Biener wrote:
> On Wed, Sep 21, 2022 at 9:49 AM Martin Liška  wrote:
>>
>> On 9/21/22 09:36, Richard Biener wrote:
>>> If it's all configure time what's the point in
>>> "deprecating" it?
>>
>> Note it's one of our options -gz where 'zlib-gnu' is one of the possible 
>> option values.
> 
> I see.  Not sure if deprecating is really necessary, you need to keep
> recognizing
> zlib-gnu as no-op anyway.  So I'd just go ahead and remove support for it.

Hi.

I'm sending a patch that makes it a no-op and further simplifies the
configure.ac detection.

Tested with both ld.bfd and mold:

$ ./xgcc -B. ~/Programming/testcases/a.c -c -gz=zlib --save-temps --verbose 
2>&1 | grep =zlib | grep -v COLLECT_GCC_OPTIONS
 ./cc1 -fpreprocessed a.i -quiet -dumpbase a.c -dumpbase-ext .c -mtune=generic 
-march=x86-64 -gz=zlib -version -o a.s
 ./as -v --compress-debug-sections=zlib --64 -o a.o a.s

$ ./xgcc -B. ~/Programming/testcases/a.c -c -gz=zlib-gnu --save-temps --verbose 
2>&1 | grep =zlib | grep -v COLLECT_GCC_OPTIONS
 ./cc1 -fpreprocessed a.i -quiet -dumpbase a.c -dumpbase-ext .c -mtune=generic 
-march=x86-64 -gz=zlib-gnu -version -o a.s

Ready after it finishes tests?
Thanks,
Martin

> 
>> Martin
From 979ab57b853cee002d29d1ac9199021a1866e4fb Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 22 Sep 2022 13:04:57 +0200
Subject: [PATCH] remove -gz=zlib-gnu option value

The option value is legacy and probably not used at all,
thus ignore it.

gcc/ChangeLog:

	* configure: Regenerate.
	* configure.ac: Simplify to gcc_cv_ld_compress_debug={0,1}
	and gcc_cv_as_compress_debug={0,1}.
	* doc/invoke.texi: Document the removal.
	* gcc.cc (LINK_COMPRESS_DEBUG_SPEC): Simplify and ignore
	  zlib-gnu.
	(ASM_COMPRESS_DEBUG_SPEC): Likewise.

Co-Authored-By: Fangrui Song 
---
 gcc/configure   | 39 +--
 gcc/configure.ac| 39 +--
 gcc/doc/invoke.texi | 11 +--
 gcc/gcc.cc  | 26 +-
 4 files changed, 28 insertions(+), 87 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 817d765568e..3cea801d5a7 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -29727,16 +29727,9 @@ else
if $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s 2>&1 | grep -i warning > /dev/null
then
  gcc_cv_as_compress_debug=0
-   # Since binutils 2.26, gas supports --compress-debug-sections=type,
+   # Since binutils 2.26, gas supports --compress-debug-sections=zlib,
# defaulting to the ELF gABI format.
-   elif $gcc_cv_as --compress-debug-sections=zlib-gnu -o conftest.o conftest.s > /dev/null 2>&1
-   then
- gcc_cv_as_compress_debug=2
- gcc_cv_as_compress_debug_option="--compress-debug-sections"
- gcc_cv_as_no_compress_debug_option="--nocompress-debug-sections"
-   # Before binutils 2.26, gas only supported --compress-debug-options and
-   # emitted the traditional GNU format.
-   elif $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s > /dev/null 2>&1
+   elif $gcc_cv_as --compress-debug-sections=zlib -o conftest.o conftest.s > /dev/null 2>&1
then
  gcc_cv_as_compress_debug=1
  gcc_cv_as_compress_debug_option="--compress-debug-sections"
@@ -30254,40 +30247,26 @@ $as_echo "$gcc_cv_ld_eh_gc_sections_bug" >&6; }
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking linker for compressed debug sections" >&5
 $as_echo_n "checking linker for compressed debug sections... " >&6; }
-# gold/gld support compressed debug sections since binutils 2.19/2.21
-# In binutils 2.26, gld gained support for the ELF gABI format.
+# GNU ld/gold support --compressed-debug-sections=zlib since binutils 2.26.
 if test $in_tree_ld = yes ; then
   gcc_cv_ld_compress_debug=0
   if test $ld_is_mold = yes; then
-gcc_cv_ld_compress_debug=3
-gcc_cv_ld_compress_debug_option="--compress-debug-sections"
-  elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 19 -o "$gcc_cv_gld_major_version" -gt 2 \
- && test $in_tree_ld_is_elf = yes && test $ld_is_gold = yes; then
-gcc_cv_ld_compress_debug=2
+gcc_cv_ld_compress_debug=1
 gcc_cv_ld_compress_debug_option="--compress-debug-sections"
   elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 26 -o "$gcc_cv_gld_major_version" -gt 2 \
  && test $in_tree_ld_is_elf = yes && test $ld_is_gold = no; then
-gcc_cv_ld_compress_debug=3
-gcc_cv_ld_compress_debug_option="--compress-debug-sections"
-  elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 21 -o "$gcc_cv_gld_major_version" -gt 2 \
- && test $in_tree_ld_is_elf = yes; then
 gcc_cv_ld_compress_debug=1
+gcc_cv_ld_compress_debug_option="--compress-debug-sections"
   fi
 elif echo "$ld_ver" | grep GNU > /dev/null; then
   if test $ld_is_mold = yes; then
-gcc_cv_ld_compress_debug=3
+gcc_cv_ld_compress_debug=1
 gcc_cv_ld_compress_debug_option="--compress-debug-sections"
   elif 

[PATCH] tree-optimization/106922 - missed FRE/PRE

2022-09-22 Thread Richard Biener via Gcc-patches
The following enhances the store-with-same-value trick in
vn_reference_lookup_3 by not only looking for

  a = val;
  *ptr = val;
  .. = a;

but also

  *ptr = val;
  other = x;
  .. = a;

where the earlier store is more than one hop away.  It does this
by queueing the actual value to compare until after the walk but
as a disadvantage only allows a single such skipped store from a
constant value.

Unfortunately we cannot handle defs from non-constants this way
since we're prone to pick up values from the past loop iteration
and we have no good way to identify values that are
invariant in the currently iterated cycle.  That's why we keep
the single-hop lookup for those cases.  gcc.dg/tree-ssa/pr87126.c
would be a testcase that's un-XFAILed when we'd handle those
as well.
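
At the source level, the kind of code this is about can be sketched as
follows.  The variable and function names are hypothetical and this is
not one of the testcases; it only shows why a possibly aliasing store
of the same constant leaves the later load's value unchanged, and it
makes no claim about what FRE emits for this exact function.

```cpp
// Hypothetical globals for illustration only.
int   slot;
float unrelated;

int load_after_same_value_stores (int *ptr)
{
  slot = 0;          // a = val
  *ptr = 0;          // *ptr = val: may alias slot, but stores the same value
  unrelated = 1.0f;  // another store between the store and the load
  return slot;       // 0 whether or not ptr == &slot
}
```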

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106922
* tree-ssa-sccvn.cc (vn_walk_cb_data::same_val): New member.
(vn_walk_cb_data::finish): Perform delayed verification of
a skipped may-alias.
(vn_reference_lookup_pieces): Likewise.
(vn_reference_lookup): Likewise.
(vn_reference_lookup_3): When skipping stores of the same
value also handle constant stores that are more than a
single VDEF away by delaying the verification.

* gcc.dg/tree-ssa/ssa-fre-100.c: New testcase.
* g++.dg/tree-ssa/pr106922.C: Adjust.
---
 gcc/testsuite/g++.dg/tree-ssa/pr106922.C|  3 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c | 25 ++
 gcc/tree-ssa-sccvn.cc   | 97 ++---
 3 files changed, 93 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
index faf379b0361..14fa061de20 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
@@ -87,5 +87,4 @@ void testfunctionfoo() {
   }
 }
 
-// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } }
-// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } }
+// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
new file mode 100644
index 000..ead76548f3d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+float bar, baz;
+void foo (int *p, int n)
+{
+  *p = 0;
+  do
+{
+  bar = 1.;
+  /* When iterating we should have optimistically value-numbered
+*p to zero, on the second iteration we have to prove the
+store below does not affect the value of this load though.
+We can compare the stored value against the value from the
+previous iteration instead relying on a non-walking lookup.  */
+  if (*p)
+{
+  baz = 2.;
+  *p = 0;
+}
+}
+  while (--n);
+}
+
+/* { dg-final { scan-tree-dump-not "baz" "fre1" } } */
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 85a7698f694..9c12a8e4f03 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -1803,7 +1803,8 @@ struct vn_walk_cb_data
   vn_lookup_kind vn_walk_kind_, bool tbaa_p_, tree mask_,
   bool redundant_store_removal_p_)
 : vr (vr_), last_vuse_ptr (last_vuse_ptr_), last_vuse (NULL_TREE),
-  mask (mask_), masked_result (NULL_TREE), vn_walk_kind (vn_walk_kind_),
+  mask (mask_), masked_result (NULL_TREE), same_val (NULL_TREE),
+  vn_walk_kind (vn_walk_kind_),
   tbaa_p (tbaa_p_), redundant_store_removal_p (redundant_store_removal_p_),
   saved_operands (vNULL), first_set (-2), first_base_set (-2),
   known_ranges (NULL)
@@ -1864,6 +1865,7 @@ struct vn_walk_cb_data
   tree last_vuse;
   tree mask;
   tree masked_result;
+  tree same_val;
   vn_lookup_kind vn_walk_kind;
   bool tbaa_p;
   bool redundant_store_removal_p;
@@ -1902,6 +1904,8 @@ vn_walk_cb_data::finish (alias_set_type set, alias_set_type base_set, tree val)
   masked_result = val;
   return (void *) -1;
 }
+  if (same_val && !operand_equal_p (val, same_val))
+return (void *) -1;
   vec<vn_reference_op_s> &operands
 = saved_operands.exists () ? saved_operands : vr->operands;
   return vn_reference_lookup_or_insert_for_pieces (last_vuse, set, base_set,
@@ -2675,36 +2679,57 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
 and return the found value.  */
   if (is_gimple_reg_type (TREE_TYPE (lhs))
  && types_compatible_p (TREE_TYPE (lhs), vr->type)
- && (ref->ref || data->orig_ref.ref))
-   {
- tree *saved_last_vuse_ptr = data->last_vuse_ptr;
- /* Do not update last_vuse_ptr in vn_reference_lookup_2.  */
- data->last_vuse_ptr = NULL;
- tree 

Re: [PATCH] xtensa: gcc: implement MI thunk generation for call0 ABI

2022-09-22 Thread Max Filippov via Gcc-patches
On Tue, Sep 13, 2022 at 2:58 PM Max Filippov  wrote:
>
> Suwa-san, could you please take a look?
>
> This change fixes the following testsuite failures when building for
> call0 ABI:
>
> g++.dg/ipa/pr60640-4.C
> g++.dg/ipa/pr83549.C
> g++.dg/ipa/pr83667.C
> g++.dg/torture/pr81812.C
>
> gcc/
> * config/xtensa/xtensa.cc (xtensa_can_output_mi_thunk)
> (xtensa_output_mi_thunk): New functions.
> (TARGET_ASM_CAN_OUTPUT_MI_THUNK)
> (TARGET_ASM_OUTPUT_MI_THUNK): New macro definitions.
> (xtensa_prepare_expand_call): Use fixed register a8 as temporary
> when called with reload_completed set to 1.
> ---
>  gcc/config/xtensa/xtensa.cc | 116 +++-
>  1 file changed, 115 insertions(+), 1 deletion(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


[COMMITTED] xtensa: gcc: enable section anchors support

2022-09-22 Thread Max Filippov via Gcc-patches
gcc/
* config/xtensa/xtensa.cc (TARGET_MAX_ANCHOR_OFFSET): New
definition.
---
 gcc/config/xtensa/xtensa.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index e5abd356a745..828c7642b7cb 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -366,6 +366,9 @@ static rtx xtensa_delegitimize_address (rtx);
 #undef TARGET_ASM_OUTPUT_MI_THUNK
 #define TARGET_ASM_OUTPUT_MI_THUNK xtensa_output_mi_thunk
 
+#undef TARGET_MAX_ANCHOR_OFFSET
+#define TARGET_MAX_ANCHOR_OFFSET 1020
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 
-- 
2.30.2



[PATCH] tree-optimization/99407 - DSE with data-ref analysis

2022-09-22 Thread Richard Biener via Gcc-patches
The following resolves the issue that DSE cannot handle references
with variable offsets well when identifying possible uses of a store.
Instead of just relying on ref_maybe_used_by_stmt_p we use data-ref
analysis, making sure to perform that at most once per stmt.  The
new mode is only exercised by the DSE pass before loop optimization
as specified by a new pass parameter and when expensive optimizations
are enabled, so it's disabled below -O2.
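
To illustrate the kind of reference with a variable offset meant here,
a hypothetical function (not taken from the patch or from
vect-tsvc-s243.c): the first store is dead only if the intervening
load of a[i + 1] can be shown not to read a[i].  Whether the existing
alias oracle or the new data-ref path decides this particular case is
not something this message states.

```cpp
// Hypothetical example; names are made up for illustration.
void shift_down (int *a, int i, int v)
{
  a[i] = v;             // first store to a[i]
  int next = a[i + 1];  // reads the neighbouring element, never a[i]
  a[i] = next;          // overwrites a[i] before anything has read it
}
```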

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/99407
* tree-ssa-dse.cc (dse_stmt_to_dr_map): New global.
(dse_classify_store): Use data-ref analysis to disambiguate more uses.
(pass_dse::use_dr_analysis_p): New pass parameter.
(pass_dse::set_pass_param): Implement.
(pass_dse::execute): Allocate and deallocate dse_stmt_to_dr_map.

* gcc.dg/vect/tsvc/vect-tsvc-s243.c: Remove XFAIL.
---
 gcc/passes.def|  2 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-s243.c |  2 +-
 gcc/tree-ssa-dse.cc   | 51 ++-
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 6bb92efacd4..939ec3e29c8 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -263,7 +263,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
-  NEXT_PASS (pass_dse);
+  NEXT_PASS (pass_dse, true /* use DR analysis */);
   NEXT_PASS (pass_dce);
   /* Pass group that runs when 1) enabled, 2) there are loops
 in the function.  Make sure to run pass_fix_loops before
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s243.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s243.c
index 93618213c74..6eb0240da40 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s243.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s243.c
@@ -38,4 +38,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 34cfd1a8802..2411ac711de 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -18,6 +18,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_MEMORY
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -45,6 +46,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-modref.h"
 #include "target.h"
 #include "tree-ssa-loop-niter.h"
+#include "cfgloop.h"
+#include "tree-data-ref.h"
 
 /* This file implements dead store elimination.
 
@@ -937,6 +940,10 @@ contains_phi_arg (gphi *phi, tree arg)
   return false;
 }
 
+/* Hash map of the memory use in a GIMPLE assignment to its
+   data reference.  If NULL data-ref analysis isn't used.  */
+static hash_map<gimple *, data_reference_p> *dse_stmt_to_dr_map;
+
 /* A helper of dse_optimize_stmt.
Given a GIMPLE_ASSIGN in STMT that writes to REF, classify it
according to downstream uses and defs.  Sets *BY_CLOBBER_P to true
@@ -951,6 +958,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
   gimple *temp;
   int cnt = 0;
   auto_bitmap visited;
+  std::unique_ptr<data_reference, void(*)(data_reference_p)>
+dra (nullptr, free_data_ref);
 
   if (by_clobber_p)
 *by_clobber_p = true;
@@ -1019,6 +1028,28 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
  /* If the statement is a use the store is not dead.  */
  else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
{
+ if (dse_stmt_to_dr_map
+ && ref->ref
+ && is_gimple_assign (use_stmt))
+   {
+ if (!dra)
+   dra.reset (create_data_ref (NULL, NULL, ref->ref, stmt,
+   false, false));
+ bool existed_p;
+ data_reference_p &drb
+   = dse_stmt_to_dr_map->get_or_insert (use_stmt, &existed_p);
+ if (!existed_p)
+   drb = create_data_ref (NULL, NULL,
+  gimple_assign_rhs1 (use_stmt),
+  use_stmt, false, false);
+ if (!dr_may_alias_p (dra.get (), drb, NULL))
+   {
+ if (gimple_vdef (use_stmt))
+   defs.safe_push (use_stmt);
+ continue;
+   }
+   }
+
  /* Handle common cases where we can easily build an ao_ref
 structure for USE_STMT and in doing so we find that the
 references hit non-live bytes and thus can be ignored.
@@ -1535,14 +1566,21 @@ class pass_dse : public gimple_opt_pass
 {
 public:
   pass_dse (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_dse, ctxt)
+: gimple_opt_pass (pass_data_dse, 

Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-22 Thread HAO CHEN GUI via Gcc-patches
Hi Kewen & Segher,

Thanks so much for your review comments.

On 22/9/2022 10:28 AM, Kewen.Lin wrote:
> on 2022/9/22 05:56, Segher Boessenkool wrote:
>> Hi!
>>
>> On Fri, Jun 24, 2022 at 10:02:19AM +0800, HAO CHEN GUI wrote:
>>>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
>>> of smin/max. So the builtins always generate xs[min/max]dp on all
>>> platforms.
>>
>> But how does this not blow up with -ffast-math?
> 
> Indeed.  Since it guards with "TARGET_VSX && !flag_finite_math_only",
> the bifs seem to cause ICE at -ffast-math.
> 
> Haochen, could you double check it?
I tested it with "-ffast-math". The fmin/fmax functions are converted to
MIN/MAX_EXPR in the gimple lower pass, but the built-ins are not, and they
hit the ICE. I had assumed the built-ins were folded to MIN/MAX_EXPR when
fast-math is set, as the vec_ versions are. In fact they're not. Sorry for that.

I made a patch to fold these two built-ins to MIN/MAX_EXPR when fast-math
is set. Then the built-ins are converted to MIN/MAX_EXPR and expanded to
smin/max.
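
As background for why the two expansions are not interchangeable, here is a
minimal sketch in plain C++ (not rs6000 code). The helper names select_min and
ieee_min are hypothetical. It only illustrates the standard behaviour of
fmin versus a compare-and-select minimum when one operand is a NaN, which is
the case that -ffast-math (via -ffinite-math-only) assumes away.

```cpp
#include <cmath>

// Compare-and-select minimum, the kind of code a MIN_EXPR can become
// when NaNs are assumed not to occur.  With a NaN operand the result
// depends on the operand order, because every comparison with NaN is false.
inline double select_min (double a, double b)
{
  return a < b ? a : b;
}

// C99/IEEE-style fmin: if exactly one operand is a NaN, the other
// (numeric) operand is returned, regardless of operand order.
inline double ieee_min (double a, double b)
{
  return std::fmin (a, b);
}
```

With default floating-point flags, ieee_min returns the numeric operand for
either operand order, while select_min returns whichever operand happens to be
second. Under finite-math-only the two are allowed to coincide.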

Thanks for pointing out the problem!

> 
>>
>> In the other direction I am worried that the unspecs will degrade
>> performance (relative to smin/smax) when -ffast-math *is* active (and
>> this new builtin code and pattern doesn't blow up).
> 
> For fmin/fmax it would be fine, since they are transformed to {MAX,MIN}
> EXPR in middle end, and yes, it can degrade for the bifs, although IMHO
> the previous expansion to smin/smax contradicts with the bif names (users
> expect to map them to xs{min,max}dp than others).
> 
>>
>> I still think we should get RTL codes for this, to have access to proper
>> floating point min/max semantics always and everywhere.  "fmin" and
>> "fmax" seem to be good names :-)
> 
> It would be good, especially if we have observed some uses of these bifs
> and further opportunities around them.  :)
> 
Shall we submit a PR to add fmin/fmax to RTL codes?

> BR,
> Kewen


[PATCH] c++, c: Implement C++23 P1774R8 - Portable assumptions [PR106654]

2022-09-22 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch implements C++23 P1774R8 - Portable assumptions
paper, by introducing support for [[assume (cond)]]; attribute for C++.
In addition to that the patch adds [[gnu::assume (cond)]]; and
__attribute__((assume (cond))); support to both C and C++.
As described in C++23, the attribute argument is conditional-expression
rather than the usual assignment-expression for attribute arguments,
the condition is contextually converted to bool (for C truthvalue conversion
is done on it) and is never evaluated at runtime.
For C++ constant expression evaluation, I only check the simplest conditions
for undefined behavior, because otherwise I'd need to undo changes to
*ctx->global which happened during the evaluation (but I believe the spec
allows that and we can further improve later).
The patch uses a new internal function, .ASSUME, to hold the condition
in the FEs.  At gimplification time, if the condition is simple/without
side-effects, it is gimplified as if (cond) ; else __builtin_unreachable ();
and otherwise for now dropped on the floor.  The intent is to incrementally
outline the conditions into separate artificial functions and use
.ASSUME further to tell the ranger and perhaps other optimization passes
about the assumptions, as detailed in the PR.
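
A minimal usage sketch of the attribute described above, assuming a compiler
that either advertises the attribute through __has_cpp_attribute or provides
__builtin_unreachable. The ASSUME macro and the scale_down function are
hypothetical names for illustration and are not part of the patch.

```cpp
// Hypothetical portability macro: prefer the C++23 attribute where the
// compiler advertises it, otherwise fall back to the GNU builtin form,
// otherwise expand to nothing useful.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(assume)
#    define ASSUME(cond) [[assume (cond)]]
#  endif
#endif
#if !defined(ASSUME) && (defined(__GNUC__) || defined(__clang__))
// Note: unlike the attribute, this fallback does evaluate COND.
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable (); } while (0)
#endif
#ifndef ASSUME
#  define ASSUME(cond) ((void) 0)
#endif

// The caller promises that D is positive; the optimizer may rely on it.
inline int scale_down (int x, int d)
{
  ASSUME (d > 0);
  return x / d;
}
```

Calling scale_down with a non-positive d is undefined behaviour in both forms,
so the condition should only state facts the caller really guarantees.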

When implementing it, I found that assume entry hasn't been added to
https://eel.is/c++draft/cpp.cond#6
Jonathan said he'll file a NB comment about it, this patch assumes it
has been added into the table as 202207L when the paper has been voted in.

With the attributes for both C/C++, I'd say we don't need to add
__builtin_assume with similar purpose, especially when __builtin_assume
in LLVM is just weird.  It is strange for side-effects in function call's
argument not to be evaluated, and LLVM in that case (annoyingly) warns
and ignores the side-effects (but doesn't do then anything with it),
if there are no side-effects, it will work like our
if (!cond) __builtin_unreachable ();

During bootstrap/regtest, I've discovered a problem with the way we
handle scoped attributes.  For declaration or type attributes for attributes
we don't know anything about we just don't add them to the declarations or
types, so later in the FEs and middle-end it is fine to use lookup_attribute
etc. which just check the attribute name and not namespace because
non-standard non-GNU attributes just won't show there.  But in the case of
attributes on statements, nothing has filtered out the unknown attributes,
so with my earlier patch e.g. c-c++-common/Wno-attributes-6.c test failed
because it uses:
[[vendor::assume(1 + 1 == 2)]];
with -Wno-attributes=vendor::assume and lookup_attribute ("assume", )
finds such attribute and handled it that way.
So, for those cases, this patch introduces lookup_attribute and
remove_attribute overloads which specify also the namespace.
I think the fallthrough, hot, cold, likely, unlikely attribute handling
will need to use the new APIs too, so that we don't handle
msft::fallthrough attribute as something we'd know.
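
To illustrate the problem with name-only lookup, here is a toy model. It is a
sketch only: the attr struct and both lookup functions are hypothetical and do
not reflect the real attribs.h interfaces or GCC's tree representation.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an attribute attached to a statement.
struct attr
{
  std::string ns;    // "" for a standard attribute, else "gnu", "vendor", ...
  std::string name;
};

// Name-only lookup: ignores the namespace, so vendor::assume matches too.
inline const attr *
lookup_by_name (const std::vector<attr> &attrs, const std::string &name)
{
  for (const attr &a : attrs)
    if (a.name == name)
      return &a;
  return nullptr;
}

// Namespace-aware lookup: both the namespace and the name must match.
inline const attr *
lookup_by_ns_and_name (const std::vector<attr> &attrs,
                       const std::string &ns, const std::string &name)
{
  for (const attr &a : attrs)
    if (a.ns == ns && a.name == name)
      return &a;
  return nullptr;
}
```

With a statement carrying only vendor::assume, the name-only lookup for
"assume" finds it, while the namespace-aware lookup for the standard or gnu
namespace does not.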

Earlier version (without the attribs.{h,cc} changes and the 3 argument
lookup_attribute/remove_attribute uses instead of 2) has been successfully
bootstrapped/regtested on x86_64-linux and i686-linux with the
FAIL: c-c++-common/Wno-attributes-6.c  -Wc++-compat  (test for excess errors)
regression, ok for trunk if this passes full bootstrap/regtest again
(note, I've of course checked it already with
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-gcc check-g++ \
  RUNTESTFLAGS="dg.exp='feat* attr-assume* Wno-attrib*'"
)?

2022-09-22  Jakub Jelinek  

PR c++/106654
gcc/
* internal-fn.def (ASSUME): New internal function.
* internal-fn.h (expand_ASSUME): Declare.
* internal-fn.cc (expand_ASSUME): Define.
* gimplify.cc (gimplify_call_expr): Gimplify IFN_ASSUME.
* fold-const.h (simple_operand_p_2): Declare.
* fold-const.cc (simple_operand_p_2): Remove forward declaration.
No longer static.  Adjust function comment and fix a typo in it.
(simple_operand_p): Adjust function comment.
* attribs.h (remove_attribute): Declare overload with additional
attr_ns argument.
(private_lookup_attribute): Declare overload with additional
attr_ns and attr_ns_len arguments.
(lookup_attribute): New overload with additional attr_ns argument.
* attribs.cc (remove_attribute): New overload with additional
attr_ns argument.
(private_lookup_attribute): New overload with additional
attr_ns and attr_ns_len arguments.
* doc/extend.texi: Document assume attribute.  Move fallthrough
attribute example to its section.
gcc/c-family/
* c-attribs.cc (handle_assume_attribute): New function.
(c_common_attribute_table): Add entry for assume attribute.
* c-lex.cc (c_common_has_attribute): Handle
__have_cpp_attribute (assume).
gcc/c/
* c-parser.cc (handle_assume_attribute): 

Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread Richard Biener via Gcc-patches
On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:

> Hi, Richard. I tried your suggestion which is applying your code and PR106019.
> It works for me now. Thank you so much.
> 
> I will apply your suggestion on RVV GCC12.2 downstream (Because it has not 
> been supported on upstream).
> 
> I have another question:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99409 
> It seems that this issue occurs because GCC miss scalar-expansion 
> optimization.
> 
> I read the book 《Compiler Challenges for High-Performance Architectures》.
> There is a chapter: Chapter 5.3 Scalar Expansion.
> Is it a good idea to implement a new pass in GCC following the scalar 
> expansion algorithm this book provided?
> Or you have another better option to fix this issue ? Thanks.

Since this is about vectorization the more canonical place to perform
this is during if-conversion where we create a loop copy with transforms
applied that help vectorization.

It would be also nice to have a look at LLVM to see how they tackle
this specific case (they seem to manage with registers and shuffling)

Richard.

> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 16:48
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches
> Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
>  
> > Does your local code exclude my codes?
> > I am using GCC12.2. When I delete all my codes and apply your codes only.
> > It fails to delete redundant stores and no auto-vectorization of my RVV 
> > GCC in this test. 
> > I am not sure whether I am on the same page with you.
>  
> I applied my patch to GCC master where it handles the testcase
> from the PR in the first 042t.dse1 pass.  I have not applied your
> patch.  The patch needs an amendment to pass bootstrap,
>  
>   if (is_gimple_assign (use_stmt))
>  
> needs to be
>  
>   if (ref->ref && is_gimple_assign (use_stmt))
>  
> testing then also reveals
>  
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c -flto -ffat-lto-objects  
> scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c scan-tree-dump vect "vectorized 1 
> loops"
>  
> I guess that's expected.  Indeed when applying the patch to the
> GCC 12 branch the case isn't optimized.  I think it's probably
> the PR106019 fix missing, aka r13-1203-g038b077689bb53
>  
> Richard.
>  
> > 
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-09-22 16:01
> > To: juzhe.zh...@rivai.ai
> > Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
> >  
> > > I tried this solution you gave:
> > > >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> > > >>   {
> > > >>   if (is_gimple_assign (use_stmt))
> > > >> {
> > > >>   data_reference_p dra, drb;
> > > >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > > >> false, false);
> > > >>   drb = create_data_ref (NULL, NULL,
> > > >> gimple_assign_rhs1 (use_stmt),
> > > >> use_stmt, false, false);
> > > >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > > >>   free_data_ref (dra);
> > > >>   free_data_ref (drb);
> > > >>   if (!alias_p)
> > > >> {
> > > >>   if (gimple_vdef (use_stmt))
> > > >> defs.safe_push (use_stmt);
> > > >>   continue;
> > > >> }
> > > >> }
> > > 
> > > It still fails to delete the redundant store. The reason is when checking 
> > > the redundant store.
> > > it didn't match the condition: ref_maybe_used_by_stmt_p (use_stmt, ref).
> >  
> > It does for me:
> >  
> >   Deleted dead store: a[i_18] = _5;
> >  
> > ...
> >  
> >:
> >   _1 = b[i_18];
> >   _2 = c[i_18];
> >   _3 = d[i_18];
> >   _4 = _2 * _3;
> >   _5 = _1 + _4;
> >   _8 = e[i_18];
> >   _9 = _3 * _8;
> >   _10 = _5 + _9;
> >   b[i_18] = _10;
> >   _12 = i_18 + 1;
> >   _13 = a[_12];
> >   _15 = _3 * _13;
> >   _16 = _10 + _15;
> >   a[i_18] = _16;
> >  
> > the other relevant function is stmt_kills_ref_p, that one does
> > handle a[i_18] vs. a[i_18] just fine.
> >  
> > > Maybe we should first figure why it doesn't satisfy this situation?
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2022-09-22 15:44
> > > To: Ju-Zhe Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > > On Thu, 22 Sep 2022, Richard Biener wrote:
> > >  
> > > > On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
> > > > 
> > > > > From: Ju-Zhe Zhong 
> > > > > 
> > > > > This patch fix issue: PR 99407
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > > > 
> > > > > The enhancement implementation is simple:
> > > > > 1.Search gimple statement in program reverse order.
> > > > > 2.Queue the store statement which may be possible kill the def
> > > > >   of previous store statement.
> > > > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > > > >   any def.
> > > > >   For example:
> > > > 

Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread juzhe.zh...@rivai.ai
Hi, Richard. I tried your suggestion, applying your code together with the PR106019 fix.
It works for me now. Thank you so much.

I will apply your suggestion to the RVV GCC 12.2 downstream (because it is not yet 
supported upstream).

I have another question:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99409 
It seems that this issue occurs because GCC is missing a scalar-expansion optimization.

I read the book 《Compiler Challenges for High-Performance Architectures》.
There is a chapter on this: Chapter 5.3, Scalar Expansion.
Would it be a good idea to implement a new pass in GCC following the scalar expansion 
algorithm given in this book?
Or do you have a better option to fix this issue? Thanks.
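
For readers unfamiliar with the term, the following is a minimal sketch of the
textbook scalar-expansion idea on a generic element-swap loop. It is not the
loop from the PR and not a proposed GCC implementation. The function names are
hypothetical, and both functions assume a and b have the same size.

```cpp
#include <cstddef>
#include <vector>

// Before: the single scalar t is written and read in every iteration,
// so all iterations share one storage location.
inline void swap_with_scalar (std::vector<double> &a, std::vector<double> &b)
{
  double t = 0.0;
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      t = a[i];
      a[i] = b[i];
      b[i] = t;
    }
}

// After scalar expansion: t becomes an array with one element per
// iteration, which removes the dependences carried through the scalar.
inline void swap_with_expanded_scalar (std::vector<double> &a,
                                       std::vector<double> &b)
{
  std::vector<double> t (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      t[i] = a[i];
      a[i] = b[i];
      b[i] = t[i];
    }
}
```

Both versions compute the same result; the expanded form trades extra storage
for iterations that no longer share the temporary.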


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2022-09-22 16:48
To: juzhe.zh...@rivai.ai
CC: gcc-patches
Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
 
> Does your local code exclude my codes?
> I am using GCC12.2. When I delete all my codes and apply your codes only.
> It fails to delete redundant stores and no auto-vectorization of my RVV GCC 
> in this test. 
> I am not sure whether I am on the same page with you.
 
I applied my patch to GCC master where it handles the testcase
from the PR in the first 042t.dse1 pass.  I have not applied your
patch.  The patch needs an amendment to pass bootstrap,
 
  if (is_gimple_assign (use_stmt))
 
needs to be
 
  if (ref->ref && is_gimple_assign (use_stmt))
 
testing then also reveals
 
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c -flto -ffat-lto-objects  
scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c scan-tree-dump vect "vectorized 1 
loops"
 
I guess that's expected.  Indeed when applying the patch to the
GCC 12 branch the case isn't optimized.  I think it's probably
the PR106019 fix missing, aka r13-1203-g038b077689bb53
 
Richard.
 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 16:01
> To: juzhe.zh...@rivai.ai
> Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
>  
> > I tried this solution you gave:
> > >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> > >>   {
> > >>   if (is_gimple_assign (use_stmt))
> > >> {
> > >>   data_reference_p dra, drb;
> > >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > >> false, false);
> > >>   drb = create_data_ref (NULL, NULL,
> > >> gimple_assign_rhs1 (use_stmt),
> > >> use_stmt, false, false);
> > >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > >>   free_data_ref (dra);
> > >>   free_data_ref (drb);
> > >>   if (!alias_p)
> > >> {
> > >>   if (gimple_vdef (use_stmt))
> > >> defs.safe_push (use_stmt);
> > >>   continue;
> > >> }
> > >> }
> > 
> > It still fails to delete the redundant store. The reason is when checking 
> > the redundant store.
> > it didn't match the condition: ref_maybe_used_by_stmt_p (use_stmt, ref).
>  
> It does for me:
>  
>   Deleted dead store: a[i_18] = _5;
>  
> ...
>  
>:
>   _1 = b[i_18];
>   _2 = c[i_18];
>   _3 = d[i_18];
>   _4 = _2 * _3;
>   _5 = _1 + _4;
>   _8 = e[i_18];
>   _9 = _3 * _8;
>   _10 = _5 + _9;
>   b[i_18] = _10;
>   _12 = i_18 + 1;
>   _13 = a[_12];
>   _15 = _3 * _13;
>   _16 = _10 + _15;
>   a[i_18] = _16;
>  
> the other relevant function is stmt_kills_ref_p, that one does
> handle a[i_18] vs. a[i_18] just fine.
>  
> > Maybe we should first figure why it doesn't satisfy this situation?
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-09-22 15:44
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > On Thu, 22 Sep 2022, Richard Biener wrote:
> >  
> > > On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
> > > 
> > > > From: Ju-Zhe Zhong 
> > > > 
> > > > This patch fix issue: PR 99407
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > > 
> > > > The enhancement implementation is simple:
> > > > 1.Search gimple statement in program reverse order.
> > > > 2.Queue the store statement which may be possible kill the def
> > > >   of previous store statement.
> > > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > > >   any def.
> > > >   For example:
> > > > a[i_18] = _5;
> > > > ...
> > > > foo ();
> > > > a[i_18] = _7;
> > > > 
> > > > >   a[i_18] = _7 is queued at the beginning and will be removed
> > > >   in dse_def_ref_analysis.
> > > > 4.Remove the store if the def is confirmed to be killed.
> > > 
> > > But we already do the very same thing in dse_classify_store, I fail
> > > to see why we need to have an alternate implementation?  It also
> > > seems to be quadratic in the size of a basic-block?
> > > 
> > > The issue with dse_classify_store is that it relies on
> > > ref_maybe_used_by_stmt_p but that doesn't handle
> > > 
> > >  a[i] = ..;
> > >  .. = a[i+1];
> > > 
> > > but when 

Re: [OG12][PATCH] OpenMP: Fix ICE with OMP metadirectives

2022-09-22 Thread Tobias Burnus

Hello Paul-Antoine, hi all,

On 21.09.22 23:18, Paul-Antoine Arras wrote:


Here is a patch that fixes an ICE in gfortran triggered by an invalid end 
statement at the end of an OMP metadirective:


Remark for other readers of this email: This only applies to OG12 as mainline
does not have the following patches:
---
Patch set from December: 
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/thread.html#586599
Reviews in May (multiple, e.g. 1/7 is at 
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595762.html )
Fortran follow-up patches: 
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590368.html
---


I played a bit with the patch. I think it looks okayish.

It seems to handle the code in question correctly. I worried about some bits,
which turned out to be unfounded. However, I found some related issues that
look similar (but are unaffected by the patch).

I do not quickly see whether your patch should handle them as well or whether
that's a completely separate code location which requires a completely separate
patch. – If the latter, the patch LGTM, otherwise, it would be great if it could
handle the other issues as well.


First, can you include in your patch also:

--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -2520 +2520 @@ gfc_ascii_statement (gfc_statement st)
-  p = "!OMP END METADIRECTIVE";
+  p = "!$OMP END METADIRECTIVE";


The following first two examples are about "omp begin/end metadirective" - while
"your" ICE was about the non-delimited "omp metadirective".

The following program gives an ICE - and I believe it is valid code.
When replacing 'nothing' by 'parallel', it is instead rejected
("Unexpected !OMP END METADIRECTIVE statement".)

! ice-on-valid-code, rejects-valid -- this is bad!
subroutine test2
logical :: UseDevice
!$OMP begin metadirective &
!$OMP   when ( user = { condition ( UseDevice ) } &
!$OMP : nothing ) &
!$OMP   default ( parallel )
  block
 call bar()
  end block
  !block
  !   call foo()
  !end block
!$omp end metadirective
end

I wonder whether it is also related to strictly nested blocks,
but, in any case, (strictly/loosely structured) blocks do not
apply here ('begin/end metadirective' has association 'delimited',
not 'block'). – Thus, I also tried with two 'block' constructs to check
that this is accepted as well.



Likewise, the following code is mishandled in an odd way – but only
if all when/default use the same delimited directive:

! diagnostic, accepts-invalid -- not ideal but neither ICE nor rejecting valid code
!$OMP begin metadirective &
!$OMP   when ( user = { condition ( UseDevice ) } &
!$OMP : parallel ) &
!$OMP   default ( parallel )
   call bar()
!!$omp end parallel  ! (1)
!!$omp end metadirective  ! (2)
end

Uncommenting (2): it is accepted (and it should be)
Uncommenting (1): This is accepted - but shouldn't. There is an
"end metadirective" missing – that is required.
Uncommenting (1) and (2): Line (1) is accepted, but then (2) is rejected.

Note: This only happens if all directives in when/default are the same
such that the 'end parallel' works for all of them.


I also tried the non-delimited '!$omp metadirective' (i.e. no begin...end),
but that seems to work fine. I still wonder whether it should be added as
another testcase (three tests, could be in the same files), just to make sure.

The following handles "end parallel" if there is only "parallel" in when/default;
however, I think all variants of the following are valid
(but bad style - for a (non-loop-associated) block-associated directive,
using begin/end makes more sense than dumping an explicit end directive.)


! OK - add as three test cases (?)
program test
logical :: UseDevice
!$OMP metadirective &
!$OMP   when ( user = { condition ( UseDevice ) } &
!$OMP : parallel ) &
!$OMP   default ( parallel )
block
   ! ...
end block
!$omp end parallel  ! Accepted, but only if all cases have 'parallel'
end program

The "end parallel" is optional here as there is a strictly structured block
(the "block ... end block"); without "end parallel" or without the
"block" / "end block" (→ loosely structured block, then "end parallel" is 
required),
it also works. (Hence, three testcases.)

* * *


To the patch - one important comment to "ChangeLog(.omp)" and otherwise only
some personal comments.


Subject: [PATCH] OpenMP: Fix ICE with OMP metadirectives
...
Also add a new test to check this behaviour.

(Personally, I would remove the last line: I always would expect a testcase
if feasible and 4 lines later there is also "New test.". But it also does
not harm. – Especially when just browsing through the comments ("git log"),
I am happy if the log is short. However, when studying a commit in more detail,
I am happy about understanding what/why a patch did something. )

(Personally, I would also add 'Fortran' to the subject line if it is only related
to Fortran; first, it makes it more likely to get reviewed by Fortran maintainers
and 

Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread juzhe.zh...@rivai.ai
OK. Thank you so much for fixing this for me. Would you mind pushing your 
optimization upstream? 
I will abandon my code. Thank you so much.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2022-09-22 16:48
To: juzhe.zh...@rivai.ai
CC: gcc-patches
Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
 
> Does your local code exclude my codes?
> I am using GCC12.2. When I delete all my codes and apply your codes only.
> It fails to delete redundant stores and no auto-vectorization of my RVV GCC 
> in this test. 
> I am not sure whether I am on the same page with you.
 
I applied my patch to GCC master where it handles the testcase
from the PR in the first 042t.dse1 pass.  I have not applied your
patch.  The patch needs an amendment to pass bootstrap,
 
  if (is_gimple_assign (use_stmt))
 
needs to be
 
  if (ref->ref && is_gimple_assign (use_stmt))
 
testing then also reveals
 
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c -flto -ffat-lto-objects  
scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c scan-tree-dump vect "vectorized 1 
loops"
 
I guess that's expected.  Indeed when applying the patch to the
GCC 12 branch the case isn't optimized.  I think it's probably
the PR106019 fix missing, aka r13-1203-g038b077689bb53
 
Richard.
 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 16:01
> To: juzhe.zh...@rivai.ai
> Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
>  
> > I tried this solution you gave:
> > >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> > >>   {
> > >>   if (is_gimple_assign (use_stmt))
> > >> {
> > >>   data_reference_p dra, drb;
> > >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > >> false, false);
> > >>   drb = create_data_ref (NULL, NULL,
> > >> gimple_assign_rhs1 (use_stmt),
> > >> use_stmt, false, false);
> > >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > >>   free_data_ref (dra);
> > >>   free_data_ref (drb);
> > >>   if (!alias_p)
> > >> {
> > >>   if (gimple_vdef (use_stmt))
> > >> defs.safe_push (use_stmt);
> > >>   continue;
> > >> }
> > >> }
> > 
> > It still fails to delete the redundant store. The reason is when checking 
> > the redundant store.
> > it didn't match the condition: ref_maybe_used_by_stmt_p (use_stmt, ref).
>  
> It does for me:
>  
>   Deleted dead store: a[i_18] = _5;
>  
> ...
>  
>:
>   _1 = b[i_18];
>   _2 = c[i_18];
>   _3 = d[i_18];
>   _4 = _2 * _3;
>   _5 = _1 + _4;
>   _8 = e[i_18];
>   _9 = _3 * _8;
>   _10 = _5 + _9;
>   b[i_18] = _10;
>   _12 = i_18 + 1;
>   _13 = a[_12];
>   _15 = _3 * _13;
>   _16 = _10 + _15;
>   a[i_18] = _16;
>  
> the other relevant function is stmt_kills_ref_p, that one does
> handle a[i_18] vs. a[i_18] just fine.
>  
> > Maybe we should first figure why it doesn't satisfy this situation?
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-09-22 15:44
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > On Thu, 22 Sep 2022, Richard Biener wrote:
> >  
> > > On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
> > > 
> > > > From: Ju-Zhe Zhong 
> > > > 
> > > > This patch fix issue: PR 99407
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > > 
> > > > The enhancement implementation is simple:
> > > > 1.Search gimple statement in program reverse order.
> > > > 2.Queue the store statement which may be possible kill the def
> > > >   of previous store statement.
> > > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > > >   any def.
> > > >   For example:
> > > > a[i_18] = _5;
> > > > ...
> > > > foo ();
> > > > a[i_18] = _7;
> > > > 
> > > > >   a[i_18] = _7 is queued at the beginning and will be removed
> > > >   in dse_def_ref_analysis.
> > > > 4.Remove the store if the def is confirmed to be killed.
> > > 
> > > But we already do the very same thing in dse_classify_store, I fail
> > > to see why we need to have an alternate implementation?  It also
> > > seems to be quadratic in the size of a basic-block?
> > > 
> > > The issue with dse_classify_store is that it relies on
> > > ref_maybe_used_by_stmt_p but that doesn't handle
> > > 
> > >  a[i] = ..;
> > >  .. = a[i+1];
> > > 
> > > but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> > > up, asserting may-aliasing.  We do have infrastructure to catch
> > > such cases with data reference analysis.  If we want to catch
> > > these cases we should use that instead.  Given we have a
> > > DSE/DCE pass pair right before loop optimizations we could even
> > > move those inside of the loop pipeline and perform this more
> > > expensive checks conditional on loop/scev availability.
> >  
> > Oh, and when doing non-loop aware analysis we don't need SCEV.  The
> > 

Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread Richard Biener via Gcc-patches
On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:

> Does your local code exclude my codes?
> I am using GCC12.2. When I delete all my codes and apply your codes only.
> It fails to delete redundant stores and no auto-vectorization of my RVV GCC 
> in this test. 
> I am not sure whether I am on the same page with you.

I applied my patch to GCC master where it handles the testcase
from the PR in the first 042t.dse1 pass.  I have not applied your
patch.  The patch needs an amendment to pass bootstrap,

  if (is_gimple_assign (use_stmt))

needs to be

  if (ref->ref && is_gimple_assign (use_stmt))

testing then also reveals

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c -flto -ffat-lto-objects  
scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c scan-tree-dump vect "vectorized 1 
loops"

I guess that's expected.  Indeed when applying the patch to the
GCC 12 branch the case isn't optimized.  I think it's probably
the PR106019 fix missing, aka r13-1203-g038b077689bb53
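
For reference, a minimal source-level sketch of the loop shape behind the
GIMPLE dump quoted further down in this thread, written with std::vector
instead of the testcase's global arrays. The function names are hypothetical.
The second function shows what the loop looks like once the first store to
a[i] is removed: the only intervening read of a is a[i + 1], which does not
overlap a[i], and that is the fact data-reference analysis is used to prove.

```cpp
#include <cstddef>
#include <vector>

// Original shape: a[i] is stored twice per iteration.
inline void kernel_with_dead_store (std::vector<double> &a,
                                    std::vector<double> &b,
                                    const std::vector<double> &c,
                                    const std::vector<double> &d,
                                    const std::vector<double> &e)
{
  for (std::size_t i = 0; i + 1 < a.size (); ++i)
    {
      a[i] = b[i] + c[i] * d[i];        // first store to a[i]
      b[i] = a[i] + d[i] * e[i];
      a[i] = b[i] + a[i + 1] * d[i];    // reads a[i + 1], overwrites a[i]
    }
}

// Same computation with the first store to a[i] replaced by a temporary.
inline void kernel_after_dse (std::vector<double> &a,
                              std::vector<double> &b,
                              const std::vector<double> &c,
                              const std::vector<double> &d,
                              const std::vector<double> &e)
{
  for (std::size_t i = 0; i + 1 < a.size (); ++i)
    {
      const double t = b[i] + c[i] * d[i];
      b[i] = t + d[i] * e[i];
      a[i] = b[i] + a[i + 1] * d[i];
    }
}
```

Both functions assume that all five vectors are distinct objects of equal
size, mirroring the separate arrays in the dump.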

Richard.

> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 16:01
> To: juzhe.zh...@rivai.ai
> Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
>  
> > I tried this solution you gave:
> > >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> > >>   {
> > >>   if (is_gimple_assign (use_stmt))
> > >> {
> > >>   data_reference_p dra, drb;
> > >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > >> false, false);
> > >>   drb = create_data_ref (NULL, NULL,
> > >> gimple_assign_rhs1 (use_stmt),
> > >> use_stmt, false, false);
> > >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > >>   free_data_ref (dra);
> > >>   free_data_ref (drb);
> > >>   if (!alias_p)
> > >> {
> > >>   if (gimple_vdef (use_stmt))
> > >> defs.safe_push (use_stmt);
> > >>   continue;
> > >> }
> > >> }
> > 
> > It still fails to delete the redundant store. The reason is when checking 
> > the redundant store.
> > it didn't match the condition: ref_maybe_used_by_stmt_p (use_stmt, ref).
>  
> It does for me:
>  
>   Deleted dead store: a[i_18] = _5;
>  
> ...
>  
>:
>   _1 = b[i_18];
>   _2 = c[i_18];
>   _3 = d[i_18];
>   _4 = _2 * _3;
>   _5 = _1 + _4;
>   _8 = e[i_18];
>   _9 = _3 * _8;
>   _10 = _5 + _9;
>   b[i_18] = _10;
>   _12 = i_18 + 1;
>   _13 = a[_12];
>   _15 = _3 * _13;
>   _16 = _10 + _15;
>   a[i_18] = _16;
>  
> the other relevant function is stmt_kills_ref_p, that one does
> handle a[i_18] vs. a[i_18] just fine.
>  
> > Maybe we should first figure why it doesn't satisfy this situation?
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-09-22 15:44
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > On Thu, 22 Sep 2022, Richard Biener wrote:
> >  
> > > On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
> > > 
> > > > From: Ju-Zhe Zhong 
> > > > 
> > > > This patch fix issue: PR 99407
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > > 
> > > > The enhancement implementation is simple:
> > > > 1.Search gimple statement in program reverse order.
> > > > 2.Queue the store statement which may be possible kill the def
> > > >   of previous store statement.
> > > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > > >   any def.
> > > >   For example:
> > > > a[i_18] = _5;
> > > > ...
> > > > foo ();
> > > > a[i_18] = _7;
> > > > 
> > > > >   a[i_18] = _7 is queued at the beginning and will be removed
> > > >   in dse_def_ref_analysis.
> > > > 4.Remove the store if the def is confirmed to be killed.
> > > 
> > > But we already do the very same thing in dse_classify_store, I fail
> > > to see why we need to have an alternate implementation?  It also
> > > seems to be quadratic in the size of a basic-block?
> > > 
> > > The issue with dse_classify_store is that it relies on
> > > ref_maybe_used_by_stmt_p but that doesn't handle
> > > 
> > >  a[i] = ..;
> > >  .. = a[i+1];
> > > 
> > > but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> > > up, asserting may-aliasing.  We do have infrastructure to catch
> > > such cases with data reference analysis.  If we want to catch
> > > these cases we should use that instead.  Given we have a
> > > DSE/DCE pass pair right before loop optimizations we could even
> > > move those inside of the loop pipeline and perform this more
> > > expensive checks conditional on loop/scev availability.
> >  
> > Oh, and when doing non-loop aware analysis we don't need SCEV.  The
> > following optimizes the testcase but as said I don't think we want
> > to perform this for each of the DSE passes since it can be somewhat
> > expensive, at least without doing more caching (we could keep a
> > stmt -> data-ref hash-map and compute data-refs at most once for each
> > statement, that would make it more 

Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread juzhe.zh...@rivai.ai
Does your local tree exclude my changes?
I am using GCC 12.2. When I remove all of my changes and apply only your patch,
it fails to delete the redundant stores and my RVV GCC does not auto-vectorize
this test.
I am not sure whether we are on the same page.




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2022-09-22 16:01
To: juzhe.zh...@rivai.ai
Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
 
> I tried this solution you gave:
> >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> >>   {
> >>   if (is_gimple_assign (use_stmt))
> >> {
> >>   data_reference_p dra, drb;
> >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> >> false, false);
> >>   drb = create_data_ref (NULL, NULL,
> >> gimple_assign_rhs1 (use_stmt),
> >> use_stmt, false, false);
> >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> >>   free_data_ref (dra);
> >>   free_data_ref (drb);
> >>   if (!alias_p)
> >> {
> >>   if (gimple_vdef (use_stmt))
> >> defs.safe_push (use_stmt);
> >>   continue;
> >> }
> >> }
> 
> It still fails to delete the redundant store. The reason is that, when
> checking the redundant store, it didn't match the condition:
> ref_maybe_used_by_stmt_p (use_stmt, ref).
 
It does for me:
 
  Deleted dead store: a[i_18] = _5;
 
...
 
   :
  _1 = b[i_18];
  _2 = c[i_18];
  _3 = d[i_18];
  _4 = _2 * _3;
  _5 = _1 + _4;
  _8 = e[i_18];
  _9 = _3 * _8;
  _10 = _5 + _9;
  b[i_18] = _10;
  _12 = i_18 + 1;
  _13 = a[_12];
  _15 = _3 * _13;
  _16 = _10 + _15;
  a[i_18] = _16;
 
The other relevant function is stmt_kills_ref_p; that one
handles a[i_18] vs. a[i_18] just fine.
 
> Maybe we should first figure why it doesn't satisfy this situation?
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 15:44
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, Richard Biener wrote:
>  
> > On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
> > 
> > > From: Ju-Zhe Zhong 
> > > 
> > > This patch fix issue: PR 99407
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > 
> > > The enhancement implementation is simple:
> > > 1.Search gimple statement in program reverse order.
> > > 2.Queue the store statement which may be possible kill the def
> > >   of previous store statement.
> > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > >   any def.
> > >   For example:
> > > a[i_18] = _5;
> > > ...
> > > foo ();
> > > a[i_18] = _7;
> > > 
> > >   a[i_18] = _7 is queued at the begining and will be removed
> > >   in dse_def_ref_analysis.
> > > 4.Remove the store if the def is confirmed to be killed.
> > 
> > But we already do the very same thing in dse_classify_store, I fail
> > to see why we need to have an alternate implementation?  It also
> > seems to be quadratic in the size of a basic-block?
> > 
> > The issue with dse_classify_store is that it relies on
> > ref_maybe_used_by_stmt_p but that doesn't handle
> > 
> >  a[i] = ..;
> >  .. = a[i+1];
> > 
> > but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> > up, asserting may-aliasing.  We do have infrastructure to catch
> > such cases with data reference analysis.  If we want to catch
> > these cases we should use that instead.  Given we have a
> > DSE/DCE pass pair right before loop optimizations we could even
> > move those inside of the loop pipeline and perform this more
> > expensive checks conditional on loop/scev availability.
>  
> Oh, and when doing non-loop aware analysis we don't need SCEV.  The
> following optimizes the testcase but as said I don't think we want
> to perform this for each of the DSE passes since it can be somewhat
> expensive, at least without doing more caching (we could keep a
> stmt -> data-ref hash-map and compute data-refs at most once for each
> statement, that would make it more acceptable).
>  
> Richard.
>  
> From 515b213e9d06c2bd36160e66728f57e48095bb84 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Thu, 22 Sep 2022 09:40:40 +0200
> Subject: [PATCH] tree-optimization/99407 - DSE with data-ref analysis
> To: gcc-patches@gcc.gnu.org
>  
> * tree-ssa-dse.c (dse_classify_store): Use data-ref analysis
> to disambiguate more uses.
> ---
> gcc/tree-ssa-dse.cc | 21 +
> 1 file changed, 21 insertions(+)
>  
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 34cfd1a8802..340a54f4105 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
> #include "ipa-modref.h"
> #include "target.h"
> #include "tree-ssa-loop-niter.h"
> +#include "cfgloop.h"
> +#include "tree-data-ref.h"
> /* This file implements dead store elimination.
> @@ -1019,6 +1021,25 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
>   /* If the statement is a use the store is not dead.  */
>   else if 

Re: [PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread Richard Biener via Gcc-patches
On Thu, 22 Sep 2022, Richard Biener wrote:

> On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
> 
> > From: Ju-Zhe Zhong 
> > 
> > This patch fix issue: PR 99407
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > 
> > The enhancement implementation is simple:
> > 1.Search gimple statement in program reverse order.
> > 2.Queue the store statement which may be possible kill the def
> >   of previous store statement.
> > 3.Perform dse_def_ref_analysis to remove stores will not kill
> >   any def.
> >   For example:
> > a[i_18] = _5;
> > ...
> > foo ();
> > a[i_18] = _7;
> > 
> >   a[i_18] = _7 is queued at the begining and will be removed
> >   in dse_def_ref_analysis.
> > 4.Remove the store if the def is confirmed to be killed.
> 
> But we already do the very same thing in dse_classify_store, I fail
> to see why we need to have an alternate implementation?  It also
> seems to be quadratic in the size of a basic-block?
> 
> The issue with dse_classify_store is that it relies on
> ref_maybe_used_by_stmt_p but that doesn't handle
> 
>  a[i] = ..;
>  .. = a[i+1];
> 
> but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> up, asserting may-aliasing.  We do have infrastructure to catch
> such cases with data reference analysis.  If we want to catch
> these cases we should use that instead.  Given we have a
> DSE/DCE pass pair right before loop optimizations we could even
> move those inside of the loop pipeline and perform this more
> expensive checks conditional on loop/scev availability.

Oh, and when doing non-loop aware analysis we don't need SCEV.  The
following optimizes the testcase but as said I don't think we want
to perform this for each of the DSE passes since it can be somewhat
expensive, at least without doing more caching (we could keep a
stmt -> data-ref hash-map and compute data-refs at most once for each
statement, that would make it more acceptable).
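
(Illustrative sketch only, not part of the patch below.)  The caching idea
mentioned above could look roughly like the following standalone C model.
All names here (toy_data_ref, toy_dr_cache, toy_analyze, toy_get_data_ref)
are hypothetical stand-ins, not GCC APIs; the assumption is simply that
every statement has a small integer uid and that the analysis result can be
stored per uid so it is computed at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_STMTS 64

/* Hypothetical stand-in for an analysed data reference.  */
struct toy_data_ref
{
  int stmt_uid;
  long offset;
};

/* Cache mapping a statement uid to its data reference, filled lazily.  */
struct toy_dr_cache
{
  struct toy_data_ref refs[TOY_MAX_STMTS];
  bool valid[TOY_MAX_STMTS];
  unsigned computed;	/* How many times the expensive analysis ran.  */
};

/* Stand-in for the expensive analysis step.  */
static struct toy_data_ref
toy_analyze (int stmt_uid)
{
  struct toy_data_ref dr = { stmt_uid, stmt_uid * 4L };
  return dr;
}

/* Return the cached data reference for STMT_UID, computing it on first
   use.  Return NULL if STMT_UID is out of range.  */
const struct toy_data_ref *
toy_get_data_ref (struct toy_dr_cache *cache, int stmt_uid)
{
  if (stmt_uid < 0 || stmt_uid >= TOY_MAX_STMTS)
    return NULL;
  if (!cache->valid[stmt_uid])
    {
      cache->refs[stmt_uid] = toy_analyze (stmt_uid);
      cache->valid[stmt_uid] = true;
      cache->computed++;
    }
  return &cache->refs[stmt_uid];
}

/* Usage example: two lookups of the same statement analyse it once.  */
void
toy_cache_selfcheck (void)
{
  struct toy_dr_cache cache = { 0 };
  const struct toy_data_ref *first = toy_get_data_ref (&cache, 3);
  const struct toy_data_ref *second = toy_get_data_ref (&cache, 3);
  assert (first == second);
  assert (cache.computed == 1);
}
```

In the real pass the key would be the statement itself and the cache would
have to be invalidated when statements are removed; the sketch ignores both.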

Richard.

From 515b213e9d06c2bd36160e66728f57e48095bb84 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 22 Sep 2022 09:40:40 +0200
Subject: [PATCH] tree-optimization/99407 - DSE with data-ref analysis
To: gcc-patches@gcc.gnu.org

* tree-ssa-dse.cc (dse_classify_store): Use data-ref analysis
to disambiguate more uses.
---
 gcc/tree-ssa-dse.cc | 21 +
 1 file changed, 21 insertions(+)

diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 34cfd1a8802..340a54f4105 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-modref.h"
 #include "target.h"
 #include "tree-ssa-loop-niter.h"
+#include "cfgloop.h"
+#include "tree-data-ref.h"
 
 /* This file implements dead store elimination.
 
@@ -1019,6 +1021,25 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
  /* If the statement is a use the store is not dead.  */
  else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
{
+ if (is_gimple_assign (use_stmt))
+   {
+ data_reference_p dra, drb;
+ dra = create_data_ref (NULL, NULL, ref->ref, stmt,
+false, false);
+ drb = create_data_ref (NULL, NULL,
+gimple_assign_rhs1 (use_stmt),
+use_stmt, false, false);
+ bool alias_p = dr_may_alias_p (dra, drb, NULL);
+ free_data_ref (dra);
+ free_data_ref (drb);
+ if (!alias_p)
+   {
+ if (gimple_vdef (use_stmt))
+   defs.safe_push (use_stmt);
+ continue;
+   }
+   }
+
  /* Handle common cases where we can easily build an ao_ref
 structure for USE_STMT and in doing so we find that the
 references hit non-live bytes and thus can be ignored.
-- 
2.35.3



Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread juzhe.zh...@rivai.ai
OK. You mean we should check why it fails in ref_maybe_used_by_stmt_p
instead of doing the data-ref analysis outside dse_classify_store?


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2022-09-22 15:32
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> This patch fix issue: PR 99407
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> 
> The enhancement implementation is simple:
> 1.Search gimple statement in program reverse order.
> 2.Queue the store statement which may be possible kill the def
>   of previous store statement.
> 3.Perform dse_def_ref_analysis to remove stores will not kill
>   any def.
>   For example:
> a[i_18] = _5;
> ...
> foo ();
> a[i_18] = _7;
> 
>   a[i_18] = _7 is queued at the begining and will be removed
>   in dse_def_ref_analysis.
> 4.Remove the store if the def is confirmed to be killed.
 
But we already do the very same thing in dse_classify_store, I fail
to see why we need to have an alternate implementation?  It also
seems to be quadratic in the size of a basic-block?
 
The issue with dse_classify_store is that it relies on
ref_maybe_used_by_stmt_p but that doesn't handle
 
a[i] = ..;
.. = a[i+1];
 
but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
up, asserting may-aliasing.  We do have infrastructure to catch
such cases with data reference analysis.  If we want to catch
these cases we should use that instead.  Given we have a
DSE/DCE pass pair right before loop optimizations we could even
move those inside of the loop pipeline and perform this more
expensive checks conditional on loop/scev availability.
 
Richard.
 
> I have fully tested it in RISC-V foundation downstream port (RVV):
> https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next
> 
> Are you willing to review this patch and test it in ARM/x86?
> 
> gcc/ChangeLog:
> 
> * tree-ssa-dse.cc (dse_search_def_stores): New function.
> (dse_can_def_ref_p): Ditto.
> (dse_def_ref_analysis): Add a new argument.
> (dse_optimize_stmt): Pass through stores_queue.
> (pass_dse::execute): Add dse_def_ref_analysis and stores_queue.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/tree-ssa/pr99407.c: New test.
> 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr99407.c |  30 
>  gcc/tree-ssa-dse.cc | 209 +++-
>  2 files changed, 236 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> new file mode 100644
> index 000..57cea77da7c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1-details" } */
> +typedef float real_t;
> +
> +#define iterations 10
> +#define LEN_1D 32000
> +#define LEN_2D 256
> +real_t flat_2d_array[LEN_2D*LEN_2D];
> +
> +real_t x[LEN_1D];
> +
> +real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
> +bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];
> +
> +int indx[LEN_1D];
> +
> +real_t* __restrict__ xx;
> +real_t* yy;
> +real_t s243(void)
> +{
> +  for (int nl = 0; nl < iterations; nl++) {
> +for (int i = 0; i < LEN_1D-1; i++) {
> +a[i] = b[i] + c[i  ] * d[i];
> +b[i] = a[i] + d[i  ] * e[i];
> +a[i] = b[i] + a[i+1] * d[i];
> +}
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump "Deleted dead store" "dse1" } } */
> \ No newline at end of file
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 34cfd1a8802..a8ca3672da2 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -1332,6 +1332,186 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap 
> live_bytes)
>return true;
>  }
>  
> +/* Search the stores_queue to see whether there is a store has a same vdef
> +   as the stmt.  */
> +
> +static bool
> +dse_search_def_stores (function *fun, auto_vec _queue,
> +gimple *stmt)
> +{
> +  /* Consider the following sequcence:
> +a[i_18] = _5;
> +_8 = e[i_18];
> +_9 = _3 * _8;
> +_10 = _5 + _9;
> +b[i_18] = _10;
> +_12 = i_18 + 1;
> +_13 = a[_12];
> +_15 = _3 * _13;
> +_16 = _10 + _15;
> +a[i_18] = _16
> +
> +We should be able to remove a[i_18] = _5.  */
> +  for (unsigned int i = 0; i < stores_queue.length (); ++i)
> +{
> +  if (!stores_queue[i])
> + continue;
> +  tree lhs1 = gimple_assign_lhs (stores_queue[i]);
> +  tree lhs2 = gimple_assign_lhs (stmt);
> +
> +  if (TREE_CODE (lhs1) != TREE_CODE (lhs2))
> + continue;
> +  if (operand_equal_p (gimple_assign_lhs (stores_queue[i]),
> +gimple_assign_lhs (stmt), OEP_ADDRESS_OF))
> + {
> +   /* No matter it can be eliminated or not, remove it
> +  in the worklist.  */
> +   stores_queue[i] = NULL;
> 

Re: [PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread Richard Biener via Gcc-patches
On Thu, 22 Sep 2022, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> This patch fix issue: PR 99407
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> 
> The enhancement implementation is simple:
> 1.Search gimple statement in program reverse order.
> 2.Queue the store statement which may be possible kill the def
>   of previous store statement.
> 3.Perform dse_def_ref_analysis to remove stores will not kill
>   any def.
>   For example:
> a[i_18] = _5;
> ...
> foo ();
> a[i_18] = _7;
> 
>   a[i_18] = _7 is queued at the begining and will be removed
>   in dse_def_ref_analysis.
> 4.Remove the store if the def is confirmed to be killed.

But we already do the very same thing in dse_classify_store, I fail
to see why we need to have an alternate implementation?  It also
seems to be quadratic in the size of a basic-block?

The issue with dse_classify_store is that it relies on
ref_maybe_used_by_stmt_p, and that doesn't handle

 a[i] = ..;
 .. = a[i+1];

When it sees a[_1] vs. a[_2] (two variable offsets), it gives
up and assumes the two may alias.  We do have infrastructure to catch
such cases with data reference analysis.  If we want to catch
these cases we should use that instead.  Given we have a
DSE/DCE pass pair right before loop optimizations we could even
move those inside of the loop pipeline and perform these more
expensive checks conditional on loop/scev availability.
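
(Illustrative sketch only.)  To make the a[i] vs. a[i+1] case concrete,
here is a minimal standalone C model of the disambiguation.  It is not GCC
code and does not use the data reference machinery; the struct and function
names are hypothetical.  It assumes an access can be described as a named
array, a symbolic index and a constant offset, and it stays conservative
whenever the two symbolic indices differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of an access base[index + offset]; this is not a
   GCC data structure.  */
struct toy_access
{
  const char *base;	/* Array name, e.g. "a".  */
  const char *index;	/* Symbolic index name, e.g. "i_18".  */
  long offset;		/* Constant added to the index.  */
};

/* Return true if X and Y may refer to the same element.  Anything that
   cannot be proven distinct is treated as possibly aliasing.  */
bool
toy_may_alias_p (const struct toy_access *x, const struct toy_access *y)
{
  /* Distinct named arrays never overlap in this toy model.  */
  if (strcmp (x->base, y->base) != 0)
    return false;
  /* Same base and same symbolic index: only the constants decide.  */
  if (strcmp (x->index, y->index) == 0)
    return x->offset == y->offset;
  /* Two unrelated variable indices: give up and assume aliasing.  */
  return true;
}

/* Usage example mirroring the case discussed above.  */
void
toy_alias_selfcheck (void)
{
  struct toy_access store = { "a", "i_18", 0 };	/* a[i_18] = ..  */
  struct toy_access load = { "a", "i_18", 1 };	/* .. = a[i_18 + 1]  */
  struct toy_access other = { "a", "j_7", 0 };	/* .. = a[j_7]  */

  assert (!toy_may_alias_p (&store, &load));
  assert (toy_may_alias_p (&store, &other));
}
```

The point of the sketch is only that knowing the relation between the two
indices (here, a shared symbolic part) is what allows the read of a[i+1]
to be ignored when classifying the store to a[i].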

Richard.

> I have fully tested it in RISC-V foundation downstream port (RVV):
> https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next
> 
> Are you willing to review this patch and test it in ARM/x86?
> 
> gcc/ChangeLog:
> 
> * tree-ssa-dse.cc (dse_search_def_stores): New function.
> (dse_can_def_ref_p): Ditto.
> (dse_def_ref_analysis): Add a new argument.
> (dse_optimize_stmt): Pass through stores_queue.
> (pass_dse::execute): Add dse_def_ref_analysis and stores_queue.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/tree-ssa/pr99407.c: New test.
> 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr99407.c |  30 
>  gcc/tree-ssa-dse.cc | 209 +++-
>  2 files changed, 236 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> new file mode 100644
> index 000..57cea77da7c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1-details" } */
> +typedef float real_t;
> +
> +#define iterations 10
> +#define LEN_1D 32000
> +#define LEN_2D 256
> +real_t flat_2d_array[LEN_2D*LEN_2D];
> +
> +real_t x[LEN_1D];
> +
> +real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
> +bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];
> +
> +int indx[LEN_1D];
> +
> +real_t* __restrict__ xx;
> +real_t* yy;
> +real_t s243(void)
> +{
> +  for (int nl = 0; nl < iterations; nl++) {
> +for (int i = 0; i < LEN_1D-1; i++) {
> +a[i] = b[i] + c[i  ] * d[i];
> +b[i] = a[i] + d[i  ] * e[i];
> +a[i] = b[i] + a[i+1] * d[i];
> +}
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump "Deleted dead store" "dse1" } } */
> \ No newline at end of file
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 34cfd1a8802..a8ca3672da2 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -1332,6 +1332,186 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap 
> live_bytes)
>return true;
>  }
>  
> +/* Search the stores_queue to see whether there is a store has a same vdef
> +   as the stmt.  */
> +
> +static bool
> +dse_search_def_stores (function *fun, auto_vec _queue,
> +gimple *stmt)
> +{
> +  /* Consider the following sequcence:
> +a[i_18] = _5;
> +_8 = e[i_18];
> +_9 = _3 * _8;
> +_10 = _5 + _9;
> +b[i_18] = _10;
> +_12 = i_18 + 1;
> +_13 = a[_12];
> +_15 = _3 * _13;
> +_16 = _10 + _15;
> +a[i_18] = _16
> +
> +We should be able to remove a[i_18] = _5.  */
> +  for (unsigned int i = 0; i < stores_queue.length (); ++i)
> +{
> +  if (!stores_queue[i])
> + continue;
> +  tree lhs1 = gimple_assign_lhs (stores_queue[i]);
> +  tree lhs2 = gimple_assign_lhs (stmt);
> +
> +  if (TREE_CODE (lhs1) != TREE_CODE (lhs2))
> + continue;
> +  if (operand_equal_p (gimple_assign_lhs (stores_queue[i]),
> +gimple_assign_lhs (stmt), OEP_ADDRESS_OF))
> + {
> +   /* No matter it can be eliminated or not, remove it
> +  in the worklist.  */
> +   stores_queue[i] = NULL;
> +   if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
> +   && !is_ctrl_altering_stmt (stmt)
> +   && (!stmt_could_throw_p (fun, stmt)
> +   || fun->can_delete_dead_exceptions))
> + return true;
> + }
> +  

[PATCH] i386: Optimize code generation of __mm256_zextsi128_si256(__mm_set1_epi8(-1))

2022-09-22 Thread Hu, Lin1 via Gcc-patches
Hi all,

This patch optimizes code generation for
__mm256_zextsi128_si256(__mm_set1_epi8(-1)), reducing the number of instructions
required to produce the final result.

Regtested on x86_64-pc-linux-gnu. Ok for trunk?

BRs,
Lin

gcc/ChangeLog:

PR target/94962
* config/i386/constraints.md (BH): New define_constraint.
* config/i386/i386.cc (standard_sse_constant_p): Add return 3/4 when 
operand matches new predicate.
(standard_sse_constant_opcode): Add new alternative branch to return 
"vpcmpeqd".
* config/i386/predicates.md (vector_all_ones_zero_extend_half_operand): 
New define_predicate.
(vector_all_ones_zero_extend_quarter_operand): Ditto.
* config/i386/sse.md: Add constraint to insn "mov_internal".

gcc/testsuite/ChangeLog:

PR target/94962
* gcc.target/i386/avx256-unaligned-load-1.c: Modify test.
* gcc.target/i386/avx256-unaligned-store-1.c: Ditto.
* gcc.target/i386/avx256-unaligned-store-2.c: Ditto.
* gcc.target/i386/avx256-unaligned-store-3.c: Ditto.
* gcc.target/i386/pr94962-1.c: New test.
* gcc.target/i386/pr94962-2.c: Ditto.
* gcc.target/i386/pr94962-3.c: Ditto.
* gcc.target/i386/pr94962-4.c: Ditto.
---
 gcc/config/i386/constraints.md|  8 +++
 gcc/config/i386/i386.cc   | 26 +++-
 gcc/config/i386/predicates.md | 49 ++
 gcc/config/i386/sse.md|  8 +--
 .../gcc.target/i386/avx256-unaligned-load-1.c |  4 +-
 .../i386/avx256-unaligned-store-1.c   |  4 +-
 .../i386/avx256-unaligned-store-2.c   |  4 +-
 .../i386/avx256-unaligned-store-3.c   |  4 +-
 gcc/testsuite/gcc.target/i386/pr94962-1.c | 11 
 gcc/testsuite/gcc.target/i386/pr94962-2.c | 17 +
 gcc/testsuite/gcc.target/i386/pr94962-3.c | 64 +++
 gcc/testsuite/gcc.target/i386/pr94962-4.c | 49 ++
 12 files changed, 235 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94962-4.c

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 7361687632f..95b2b142d41 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -168,6 +168,9 @@
 ;;  z  Constant call address operand.
 ;;  C  Integer SSE constant with all bits set operand.
 ;;  F  Floating-point SSE constant with all bits set operand.
+;;  H  Integer SSE constant that is 128/256bit all ones
+;; and zero-extend to 256/512bit, or 128bit all ones
+;; and zero-extend to 512bit.
 ;;  M  x86-64 memory operand.
 
 (define_constraint "Bf"
@@ -233,6 +236,11 @@
   (and (match_test "TARGET_SSE")
(match_operand 0 "float_vector_all_ones_operand")))
 
+(define_constraint "BH"
+  "@internal integer constant with last half/quarter bits set operand."
+  (ior (match_operand 0 "vector_all_ones_zero_extend_half_operand")
+   (match_operand 0 "vector_all_ones_zero_extend_quarter_operand")))
+
 ;; NB: Similar to 'm', but don't use define_memory_constraint on x86-64
 ;; to prevent LRA from converting the operand to the form '(mem (reg X))'
 ;; where X is a base register.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index dadf453d6c0..ca799da5d7e 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5186,7 +5186,8 @@ standard_80387_constant_rtx (int idx)
   XFmode);
 }
 
-/* Return 1 if X is all bits 0 and 2 if X is all bits 1
+/* Return 1 if X is all bits 0, 2 if X is all bits 1
+   and 3 if X is all bits 1 with zero extend
in supported SSE/AVX vector mode.  */
 
 int
@@ -5234,6 +5235,10 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
}
 }
 
+  if (vector_all_ones_zero_extend_half_operand (x, mode)
+  || vector_all_ones_zero_extend_quarter_operand (x, mode))
+return 3;
+
   return 0;
 }
 
@@ -5341,6 +5346,25 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx 
*operands)
  gcc_unreachable ();
}
}
+  else if (vector_all_ones_zero_extend_half_operand (x, mode))
+{
+  if (GET_MODE_SIZE (mode) == 64)
+   {
+ gcc_assert (TARGET_AVX512F);
+ return "vpcmpeqd \t %t0, %t0, %t0";
+   }
+  else if (GET_MODE_SIZE (mode) == 32)
+   {
+ gcc_assert (TARGET_AVX);
+ return "vpcmpeqd \t %x0, %x0, %x0";
+   }
+  gcc_unreachable ();
+}
+  else if (vector_all_ones_zero_extend_quarter_operand (x, mode))
+{
+  gcc_assert (TARGET_AVX512F);
+  return "vpcmpeqd \t %x0, %x0, %x0";
+}
 
   gcc_unreachable ();
 }
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 4f16bb748b5..655eabf793b 100644
--- a/gcc/config/i386/predicates.md
+++ 

[PATCH] Some VN TLC

2022-09-22 Thread Richard Biener via Gcc-patches
The following was prompted by review of the patch introducing
equivalences to VN.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-ssa-sccvn.cc (can_track_predicate_on_edge): New
function split out from ...
(vn_nary_op_insert_pieces_predicated): ... here.
---
 gcc/tree-ssa-sccvn.cc | 43 +++
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 74b8d8d18ef..85a7698f694 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -4460,28 +4460,39 @@ vn_nary_op_insert_pieces (unsigned int length, enum 
tree_code code,
   return vn_nary_op_insert_into (vno1, valid_info->nary);
 }
 
+/* Return whether we can track a predicate valid when PRED_E is executed.  */
+
+static bool
+can_track_predicate_on_edge (edge pred_e)
+{
+  /* ???  As we are currently recording a basic-block index in
+ vn_pval.valid_dominated_by_p and using dominance for the
+ validity check we cannot track predicates on all edges.  */
+  if (single_pred_p (pred_e->dest))
+return true;
+  /* Never record for backedges.  */
+  if (pred_e->flags & EDGE_DFS_BACK)
+return false;
+  /* When there's more than one predecessor we cannot track
+ predicate validity based on the destination block.  The
+ exception is when all other incoming edges are backedges.  */
+  edge_iterator ei;
+  edge e;
+  int cnt = 0;
+  FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
+if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
+  cnt++;
+  return cnt == 1;
+}
+
 static vn_nary_op_t
 vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 tree type, tree *ops,
 tree result, unsigned int value_id,
 edge pred_e)
 {
-  /* ???  Currently tracking BBs.  */
-  if (! single_pred_p (pred_e->dest))
-{
-  /* Never record for backedges.  */
-  if (pred_e->flags & EDGE_DFS_BACK)
-   return NULL;
-  edge_iterator ei;
-  edge e;
-  int cnt = 0;
-  /* Ignore backedges.  */
-  FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
-   if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
- cnt++;
-  if (cnt != 1)
-   return NULL;
-}
+  if (!can_track_predicate_on_edge (pred_e))
+return NULL;
   if (dump_file && (dump_flags & TDF_DETAILS)
   /* ???  Fix dumping, but currently we only get comparisons.  */
   && TREE_CODE_CLASS (code) == tcc_comparison)
-- 
2.35.3


[PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch fixes PR 99407:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407

The enhancement is simple:
1. Walk the gimple statements in reverse program order.
2. Queue each store statement that may kill the def of an
   earlier store statement.
3. Run dse_def_ref_analysis to drop from the queue the stores
   that will not kill any def.
   For example:
     a[i_18] = _5;
     ...
     foo ();
     a[i_18] = _7;

   a[i_18] = _7 is queued at the beginning and will be removed
   from the queue in dse_def_ref_analysis.
4. Remove the store if the def is confirmed to be killed.
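
As a rough illustration of the four steps above, here is a standalone C
model of the reverse walk over one basic block.  It is a simplified sketch,
not the code in this patch: the types and names (toy_stmt, toy_dse_block)
are hypothetical, memory locations are plain integer ids that are assumed
to be already known distinct (so a[i] and a[i+1] get different ids), and
any call is assumed to possibly read every location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOCS 16

enum toy_kind { TOY_STORE, TOY_LOAD, TOY_CALL };

/* Hypothetical statement model; LOC is ignored for calls.  */
struct toy_stmt
{
  enum toy_kind kind;
  int loc;	/* Abstract memory location id.  */
  bool dead;	/* Set by toy_dse_block.  */
};

static bool
toy_valid_loc (int loc)
{
  return loc >= 0 && loc < TOY_MAX_LOCS;
}

/* Walk STMTS[0..N-1] in reverse program order.  A store is marked dead
   when a later store to the same location is still queued, i.e. no load
   of that location and no call was seen in between.  Return the number
   of dead stores.  */
size_t
toy_dse_block (struct toy_stmt *stmts, size_t n)
{
  bool queued[TOY_MAX_LOCS] = { false };
  size_t n_dead = 0;

  for (size_t i = n; i-- > 0; )
    {
      struct toy_stmt *s = &stmts[i];
      s->dead = false;
      switch (s->kind)
	{
	case TOY_CALL:
	  /* A call may read anything: drop every queued store.  */
	  for (int l = 0; l < TOY_MAX_LOCS; l++)
	    queued[l] = false;
	  break;
	case TOY_LOAD:
	  /* The location is used, so a queued store no longer kills
	     earlier stores to it.  */
	  if (toy_valid_loc (s->loc))
	    queued[s->loc] = false;
	  break;
	case TOY_STORE:
	  if (!toy_valid_loc (s->loc))
	    break;
	  if (queued[s->loc])
	    {
	      s->dead = true;
	      n_dead++;
	    }
	  else
	    queued[s->loc] = true;
	  break;
	}
    }
  return n_dead;
}

/* Usage example: store a[i]; load e[i]; store b[i]; load a[i+1];
   store a[i], with a[i] = 0, b[i] = 1, a[i+1] = 2, e[i] = 3.  */
void
toy_dse_selfcheck (void)
{
  struct toy_stmt bb[] = {
    { TOY_STORE, 0, false }, { TOY_LOAD, 3, false },
    { TOY_STORE, 1, false }, { TOY_LOAD, 2, false },
    { TOY_STORE, 0, false }
  };
  assert (toy_dse_block (bb, 5) == 1);
  assert (bb[0].dead);
}
```

With a call between the two stores to the same location, as in the foo ()
example above, the queued store is dropped and nothing is deleted.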

I have fully tested it on the RISC-V foundation downstream port (RVV):
https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next

Are you willing to review this patch and test it on ARM/x86?

gcc/ChangeLog:

* tree-ssa-dse.cc (dse_search_def_stores): New function.
(dse_can_def_ref_p): Ditto.
(dse_def_ref_analysis): Add a new argument.
(dse_optimize_stmt): Pass through stores_queue.
(pass_dse::execute): Add dse_def_ref_analysis and stores_queue.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr99407.c: New test.

---
 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c |  30 
 gcc/tree-ssa-dse.cc | 209 +++-
 2 files changed, 236 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
new file mode 100644
index 000..57cea77da7c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1-details" } */
+typedef float real_t;
+
+#define iterations 10
+#define LEN_1D 32000
+#define LEN_2D 256
+real_t flat_2d_array[LEN_2D*LEN_2D];
+
+real_t x[LEN_1D];
+
+real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
+bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];
+
+int indx[LEN_1D];
+
+real_t* __restrict__ xx;
+real_t* yy;
+real_t s243(void)
+{
+  for (int nl = 0; nl < iterations; nl++) {
+for (int i = 0; i < LEN_1D-1; i++) {
+a[i] = b[i] + c[i  ] * d[i];
+b[i] = a[i] + d[i  ] * e[i];
+a[i] = b[i] + a[i+1] * d[i];
+}
+  }
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store" "dse1" } } */
\ No newline at end of file
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 34cfd1a8802..a8ca3672da2 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -1332,6 +1332,186 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap 
live_bytes)
   return true;
 }
 
+/* Search the stores_queue to see whether there is a store that has the
+   same vdef as the stmt.  */
+
+static bool
+dse_search_def_stores (function *fun, auto_vec<gimple *> &stores_queue,
+  gimple *stmt)
+{
+  /* Consider the following sequence:
+a[i_18] = _5;
+_8 = e[i_18];
+_9 = _3 * _8;
+_10 = _5 + _9;
+b[i_18] = _10;
+_12 = i_18 + 1;
+_13 = a[_12];
+_15 = _3 * _13;
+_16 = _10 + _15;
+a[i_18] = _16
+
+We should be able to remove a[i_18] = _5.  */
+  for (unsigned int i = 0; i < stores_queue.length (); ++i)
+{
+  if (!stores_queue[i])
+   continue;
+  tree lhs1 = gimple_assign_lhs (stores_queue[i]);
+  tree lhs2 = gimple_assign_lhs (stmt);
+
+  if (TREE_CODE (lhs1) != TREE_CODE (lhs2))
+   continue;
+  if (operand_equal_p (gimple_assign_lhs (stores_queue[i]),
+  gimple_assign_lhs (stmt), OEP_ADDRESS_OF))
+   {
+ /* Whether or not it can be eliminated, remove it
+from the worklist.  */
+ stores_queue[i] = NULL;
+ if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
+ && !is_ctrl_altering_stmt (stmt)
+ && (!stmt_could_throw_p (fun, stmt)
+ || fun->can_delete_dead_exceptions))
+   return true;
+   }
+}
+
+  return false;
+}
+
+/* Return true if the TREE_CODE of the mem op is allowed to do dse
+   according to def-ref analysis.  */
+
+static bool
+dse_can_def_ref_p (gimple *stmt)
+{
+  /*TODO: For now, we only support dse according to
+def-ref analysis for ARRAY_REF.  */
+  return TREE_CODE (gimple_assign_lhs (stmt)) == ARRAY_REF;
+}
+
+/* Perform def-ref analysis on all the stores of stores_queue worklist.
+   Since dse is running on reverse program order walk, the stores in
+   stores_queue are always after stmt, clear the store in the stores_queue
+   if the address of store lhs is changed or the lhs of store is used
+   in stmt.  */
+
+static void
+dse_def_ref_analysis (gimple *stmt, auto_vec<gimple *> &stores_queue)
+{
+  for (unsigned int i = 0; i < stores_queue.length (); ++i)
+{
+  if (!stores_queue[i])
+   continue;
+
+  /* If we meet a statement that is neither a call nor an assignment,
+   * we disable the possible dse.  */
+  if (gimple_code (stmt) != GIMPLE_CALL
+ && gimple_code (stmt) != GIMPLE_ASSIGN)
+   

[PATCH] DSE: Enhance dse with def-ref analysis

2022-09-22 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch fixes PR 99407:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407

The enhancement is simple:
1. Walk the gimple statements in reverse program order.
2. Queue each store statement that may kill the def of an
   earlier store statement.
3. Run dse_def_ref_analysis to drop from the queue the stores
   that will not kill any def.
   For example:
     a[i_18] = _5;
     ...
     foo ();
     a[i_18] = _7;

   a[i_18] = _7 is queued at the beginning and will be removed
   from the queue in dse_def_ref_analysis.
4. Remove the store if the def is confirmed to be killed.

I have fully tested it on the RISC-V foundation downstream port (RVV):
https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next

Could you review this patch and test it on ARM/x86?

gcc/ChangeLog:

* tree-ssa-dse.cc (dse_search_def_stores):
(dse_can_def_ref_p):
(dse_def_ref_analysis):
(dse_optimize_stmt):
(pass_dse::execute):

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr99407.c: New test.

---
 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c |  30 
 gcc/tree-ssa-dse.cc | 209 +++-
 2 files changed, 236 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
new file mode 100644
index 000..57cea77da7c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1-details" } */
+typedef float real_t;
+
+#define iterations 10
+#define LEN_1D 32000
+#define LEN_2D 256
+real_t flat_2d_array[LEN_2D*LEN_2D];
+
+real_t x[LEN_1D];
+
+real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
+bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];
+
+int indx[LEN_1D];
+
+real_t* __restrict__ xx;
+real_t* yy;
+real_t s243(void)
+{
+  for (int nl = 0; nl < iterations; nl++) {
+for (int i = 0; i < LEN_1D-1; i++) {
+a[i] = b[i] + c[i  ] * d[i];
+b[i] = a[i] + d[i  ] * e[i];
+a[i] = b[i] + a[i+1] * d[i];
+}
+  }
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store" "dse1" } } */
\ No newline at end of file
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 34cfd1a8802..a8ca3672da2 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -1332,6 +1332,186 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap 
live_bytes)
   return true;
 }
 
+/* Search the stores_queue to see whether there is a store that has the
+   same vdef as the stmt.  */
+
+static bool
+dse_search_def_stores (function *fun, auto_vec<gimple *> &stores_queue,
+  gimple *stmt)
+{
+  /* Consider the following sequence:
+a[i_18] = _5;
+_8 = e[i_18];
+_9 = _3 * _8;
+_10 = _5 + _9;
+b[i_18] = _10;
+_12 = i_18 + 1;
+_13 = a[_12];
+_15 = _3 * _13;
+_16 = _10 + _15;
+a[i_18] = _16
+
+We should be able to remove a[i_18] = _5.  */
+  for (unsigned int i = 0; i < stores_queue.length (); ++i)
+{
+  if (!stores_queue[i])
+   continue;
+  tree lhs1 = gimple_assign_lhs (stores_queue[i]);
+  tree lhs2 = gimple_assign_lhs (stmt);
+
+  if (TREE_CODE (lhs1) != TREE_CODE (lhs2))
+   continue;
+  if (operand_equal_p (gimple_assign_lhs (stores_queue[i]),
+  gimple_assign_lhs (stmt), OEP_ADDRESS_OF))
+   {
+ /* No matter it can be eliminated or not, remove it
+in the worklist.  */
+ stores_queue[i] = NULL;
+ if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
+ && !is_ctrl_altering_stmt (stmt)
+ && (!stmt_could_throw_p (fun, stmt)
+ || fun->can_delete_dead_exceptions))
+   return true;
+   }
+}
+
+  return false;
+}
+
+/* Return true if the TREE_CODE of the mem op is allowed to do dse
+   according to def-ref analysis.  */
+
+static bool
+dse_can_def_ref_p (gimple *stmt)
+{
+  /*TODO: For now, we only support dse according to
+def-ref analysis for ARRAY_REF.  */
+  return TREE_CODE (gimple_assign_lhs (stmt)) == ARRAY_REF;
+}
+
+/* Perform def-ref analysis on all the stores of stores_queue worklist.
+   Since dse is running on reverse program order walk, the stores in
+   stores_queue are always after stmt, clear the store in the stores_queue
+   if the address of store lhs is changed or the lhs of store is used
+   in stmt.  */
+
+static void
+dse_def_ref_analysis (gimple *stmt, auto_vec<gimple *> &stores_queue)
+{
+  for (unsigned int i = 0; i < stores_queue.length (); ++i)
+{
+  if (!stores_queue[i])
+   continue;
+
+  /* If we meet non-call or non-assign statement, we disable the possible
+   * dse.  */
+  if (gimple_code (stmt) != GIMPLE_CALL
+ && gimple_code (stmt) != GIMPLE_ASSIGN)
+   {
+ stores_queue[i] = NULL;
+ continue;
+   }
+
+  tree lhs = gimple_get_lhs 

Re: [PATCH][RFH] Wire ranger into FRE

2022-09-22 Thread Richard Biener via Gcc-patches
On Wed, 21 Sep 2022, Andrew MacLeod wrote:

> 
> On 9/21/22 06:13, Richard Biener wrote:
> > On Mon, 19 Sep 2022, Andrew MacLeod wrote:
> >
> >
> >> It looks like you created a fur_source to manually adjust PHIs within the
> >> fold_stmt query to ignore edges that are not marked executable.
> > Yes, and use the current values from the VN lattice when looking at
> > statement operands.
> 
> yes, that is exactly how it's intended to be used.
> 
> 
> >
> >> That would then just leave you with the stale cache state to deal with?  
> >> And
> >> if we can resolve that, would all just work?  at least in theory?
> > In theory, yes.  Besides that, the use-def walking of the cache is not
> > wired up with fur_*
> 
> Well, yes. hmm, you want to set cache values based on the VN lattice as well.
> yes. OK, let me do a bit of cache explanation since I haven't done that yet.
> It does not need a fur_source of any kind, and I'll explain why.
> 
> The cache has 2 primary functions..
>   1) maintain the global definition table (used to decide if a name has been
> processed). This is local and not the one the rest of GCC uses.   and
>   2) maintain the range-on-entry cache and resolve queries to that efficiently.
> 
> The cache does not actually create any NEW information.  This is one of its
> key features in preventing any kind of cascading cyclic updates.  All it does
> is propagate existing information from the definition table, with values
> extracted from the global value table.  So your example is not good for this,
> as there isn't much in the cache for it.  so lets tweak it and add another
> block. example:
> 
> n_2 = 1
>   i_4 = 0
>   val_5 = 0
> :
>   # i_1 = PHI 
>   #val_2 = PHI 
>   val_6 = val_2 + 1;
>   i_7 = i_1 + 1
>   if (i_7 > 22)
>  goto 
>   else
>  goto 
> 
>   if (i_7 < n_3)
>     goto ;
>   else
>     goto ;
> 
>   _8 = val_6
>   return _8
> 
> For the sake of simplicity, let's also assume bb2 and bb3 have been looked at and
> all the ssa-names defined in those blocks have an entry in ranger's definition
> table.
> 
> Moving to  if we ask for the range of "if (i_7< n_3) to be evaluated, it
> checks that i_7 and n_3 have been evaluated before it proceeds.  Both have
> entries, which means the next task is to get their values at this location. 
> range_of_expr is called on each one, and as they are not defined in this
> block, ranger asks the cache for the value of i_7 on entry to bb7. (likewise
> when it gets an answer back, it will do so for n_3 as well)
> 
> The cache walks back the dominators until it finds either:
>   a) the block with the definition of i_7, or
>   b) a block which has an on-entry cache value for i_7 already set.
> During its walk, it tags any block which has i_7 in the export list, meaning an
> outgoing edge from that block may change the value of i_7.
> 
> There are additional complexities, but the fundamental operation is to now
> take the value it saw from a) or b) as the starting value, and supply that to
> GORI at every intervening outgoing edge i_7 was exported from. Whenever the
> value changes along the way, we write a cache update at the end of the edge to
> facilitate future queries.  At the end, the answer has been calculated and is
> stored as the on-entry value for this block.
> 
> So returning to the example, assume i_7 was set to VARYING in bb3, GORI would
> apply !(i_7 > 22) to the value, and we would end up in  with a
> range-on-entry of [0, 21] and it would be stored in bb7.
> 
> In your example, if you have disabled that back edge, you would have a value
> of [1,1] for i_7.  GORI would not have changed that value since its already <
> 22, and we would store [1,1] as the range-on-entry to 
> 
> Likewise, we do something similar for n_3.  The point is, the cache has not
> gone and created any new information.  Its *only* purpose is to propagate known
> values thru the CFG, adjusting them for any outgoing edges that are
> encountered.  It uses a temporal marking in an attempt to identify when a
> global value has been changed, meaning it may need to go and repopulate
> something, but the bottom line is that it never contains anything beyond
> "reductions" in the ranges of values in the global table.  And it only ever
> works on one name at a time.
> 
> The bottom line: Ranger effectively only ever changes values via the global
> table, and the cache simply propagates those values around, adjusting them
> with GORI as appropriate.
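
To make the quoted description concrete, here is a toy model of the range-on-entry walk. It is only a sketch and not ranger code: the toy_* names and the data layout are invented, GORI is reduced to a fixed interval constraint per block, and every block on the walk is assumed to be entered only through the edge from its immediate dominator. The cache only ever intersects the starting value with edge constraints, so it can narrow it but never invent a new one.

```c
#include <limits.h>
#include <stdbool.h>

#define TOY_NO_BLOCK (-1)
#define TOY_MAX_BLOCKS 16   /* assumed upper bound on the dominator depth */

struct toy_range { int lo, hi; };   /* closed interval [lo, hi] */

struct toy_block {
  int idom;                 /* immediate dominator, TOY_NO_BLOCK for entry */
  bool defines;             /* the queried name is defined in this block */
  bool exports;             /* the edge leaving this block refines the name */
  struct toy_range edge;    /* constraint implied by that edge */
  bool has_entry;           /* on-entry cache slot is filled */
  struct toy_range entry;   /* cached range on entry */
};

static struct toy_range
toy_intersect (struct toy_range a, struct toy_range b)
{
  struct toy_range r;
  r.lo = a.lo > b.lo ? a.lo : b.lo;
  r.hi = a.hi < b.hi ? a.hi : b.hi;
  return r;
}

/* Range of the name on entry to block QUERY.  GLOBAL is the value from
   the definition table; QUERY is assumed to be strictly dominated by
   the defining block.  */
static struct toy_range
toy_range_on_entry (struct toy_block *bb, int query, struct toy_range global)
{
  int path[TOY_MAX_BLOCKS];
  int n = 0;
  int b = query;
  struct toy_range val = global;

  /* Walk up the dominators until a cached block or the definition.  */
  for (;;)
    {
      if (bb[b].has_entry)
        {
          val = bb[b].entry;
          break;
        }
      path[n++] = b;
      b = bb[b].idom;
      if (b == TOY_NO_BLOCK || bb[b].defines)
        break;   /* start from the global value */
    }

  /* Walk back down, refining on each exporting edge and filling the
     on-entry cache along the way.  */
  while (n > 0)
    {
      if (b != TOY_NO_BLOCK && bb[b].exports)
        val = toy_intersect (val, bb[b].edge);
      b = path[--n];
      bb[b].entry = val;
      bb[b].has_entry = true;
    }

  return val;
}
```

With a VARYING global value and an edge constraint of "at most 22" on the defining block, the query block ends up with everything up to 22; with a global value of [1,1] the same edge changes nothing and [1,1] is stored.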

Hmm, but when it reaches the definition of _7 then it ends up calling
range_of_stmt on it which then recursively processes operands and the
result is stored into the "global table"?  For the definition of _7
fur_* isn't asked for the operands so valueization would not happen.
path-ranger seems to override the def walking (range_of_expr) and thus
could transparently valueize.

OK, so you say there's two things we need to invalidate - first the
range-on-entry cache and second the "global table" entries?  Note that
value-numbering doesn't need to 

RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-09-22 Thread Liu, Hongtao via Gcc-patches


> -Original Message-
> From: Jiang, Haochen 
> Sent: Thursday, September 22, 2022 2:23 PM
> To: Uros Bizjak 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: RE: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> Hi all,
> 
> I would like to backport this patch to the GCC 12 release branch. Machines whose
> default GCC is 12.x are always running newer kernels, so if the patch is not
> backported, the AMX tests will always fail there.
> 
> Ok for backport?
Ok.
> 
> BRs,
> Haochen
> 
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Tuesday, June 21, 2022 10:53 PM
> > To: Jiang, Haochen 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > kernels
> >
> > On Tue, Jun 21, 2022 at 9:41 AM Jiang, Haochen
> > 
> > wrote:
> > >
> > > > -Original Message-
> > > > From: Uros Bizjak 
> > > > Sent: Tuesday, June 21, 2022 3:06 PM
> > > > To: Jiang, Haochen 
> > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > > kernels
> > > >
> > > > On Tue, Jun 21, 2022 at 4:23 AM Jiang, Haochen
> > > > 
> > > > wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Uros Bizjak 
> > > > > > Sent: Monday, June 20, 2022 10:54 PM
> > > > > > To: Jiang, Haochen 
> > > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > > 
> > > > > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for
> > > > > > latest kernels
> > > > > >
> > > > > > On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang
> > > > > > 
> > > > > > wrote:
> > > > > > >
> > > > > > > From: "Jiang, Haochen" 
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > We need syscall to enable AMX for kernels>=5.4. It is
> > > > > > > missing in current amx tests, which will cause test fail.
> > > > > >
> > > > > > So this new code is only valid for linux & co?
> > > > >
> > > > > Thanks for reminding me for that, I only test on linux since the
> > > > > header file is
> > > > only in linux.
> > > > >
> > > > > Just updated a patch wrapping with a macro not to change the
> > > > > behavior on
> > > > windows.
> > > >
> > > > I think you want __linux__ there, not __unix__.
> > >
> > > Fixed with __linux__.
> >
> > OK.
> >
> > Thanks,
> > Uros.
> >
> > >
> > > Thx,
> > > Haochen
> > >
> > > >
> > > > Uros.
> > > >
> > > > >
> > > > > Regtested on x86_64-pc-linux-gnu.
> > > > >
> > > > > Thx,
> > > > > Haochen
> > > > > >
> > > > > > Uros.
> > > > > >
> > > > > > >
> > > > > > > This patch aims to add them to fix this bug.
> > > > > > >
> > > > > > > BRs,
> > > > > > > Haochen
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > > > > > > New function to check if AMX is usable and enable AMX.
> > > > > > > (main): Run test if AMX is usable.
> > > > > > > ---
> > > > > > >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > > > > > > +++
> > > > > > >  1 file changed, 24 insertions(+)
> > > > > > >
> > > > > > > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > > index 434b0e59703..92ed8669304 100644
> > > > > > > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > > @@ -4,11 +4,22 @@
> > > > > > >  #include 
> > > > > > >  #include 
> > > > > > >  #include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > >  #ifdef DEBUG
> > > > > > >  #include 
> > > > > > >  #endif
> > > > > > >  #include "cpuid.h"
> > > > > > >
> > > > > > > +#define XFEATURE_XTILECFG  17
> > > > > > > +#define XFEATURE_XTILEDATA 18
> > > > > > > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > > > > > > +#define XFEATURE_MASK_XTILEDATA(1 <<
> XFEATURE_XTILEDATA)
> > > > > > > +#define XFEATURE_MASK_XTILE(XFEATURE_MASK_XTILECFG |
> > > > > > XFEATURE_MASK_XTILEDATA)
> > > > > > > +
> > > > > > > +#define ARCH_GET_XCOMP_PERM0x1022
> > > > > > > +#define ARCH_REQ_XCOMP_PERM0x1023
> > > > > > > +
> > > > > > >  /* TODO: The tmm emulation is temporary for current
> > > > > > > AMX implementation with no tmm regclass, should
> > > > > > > be changed in the future. */ @@ -44,6 +55,18 @@ typedef
> > > > > > > struct __tile
> > > > > > >  /* Stride (colum width in byte) used for tileload/store */
> > > > > > > #define _STRIDE 64
> > > > > > >
> > > > > > > +/* We need syscall to use amx functions */ int
> > > > > > > +request_perm_xtile_data() {
> > > > > > > +  unsigned long bitmask;
> > > > > > > +
> > > > > > > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
> > > > > > XFEATURE_XTILEDATA) ||
> > > > > > > > +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
> > > > > > > +return 0;
> > > > > > > +
> > > > > > > +  return (bitmask & 

RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-09-22 Thread Jiang, Haochen via Gcc-patches
Hi all,

I would like to backport this patch to the GCC 12 release branch. Machines whose
default GCC is 12.x are always running newer kernels, so if the patch is not
backported, the AMX tests will always fail there.

Ok for backport?

BRs,
Haochen

> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, June 21, 2022 10:53 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> On Tue, Jun 21, 2022 at 9:41 AM Jiang, Haochen 
> wrote:
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Tuesday, June 21, 2022 3:06 PM
> > > To: Jiang, Haochen 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > kernels
> > >
> > > On Tue, Jun 21, 2022 at 4:23 AM Jiang, Haochen
> > > 
> > > wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Uros Bizjak 
> > > > > Sent: Monday, June 20, 2022 10:54 PM
> > > > > To: Jiang, Haochen 
> > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > 
> > > > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > > > kernels
> > > > >
> > > > > On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang
> > > > > 
> > > > > wrote:
> > > > > >
> > > > > > From: "Jiang, Haochen" 
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We need syscall to enable AMX for kernels>=5.4. It is missing
> > > > > > in current amx tests, which will cause test fail.
> > > > >
> > > > > So this new code is only valid for linux & co?
> > > >
> > > > Thanks for reminding me for that, I only test on linux since the
> > > > header file is
> > > only in linux.
> > > >
> > > > Just updated a patch wrapping with a macro not to change the
> > > > behavior on
> > > windows.
> > >
> > > I think you want __linux__ there, not __unix__.
> >
> > Fixed with __linux__.
> 
> OK.
> 
> Thanks,
> Uros.
> 
> >
> > Thx,
> > Haochen
> >
> > >
> > > Uros.
> > >
> > > >
> > > > Regtested on x86_64-pc-linux-gnu.
> > > >
> > > > Thx,
> > > > Haochen
> > > > >
> > > > > Uros.
> > > > >
> > > > > >
> > > > > > This patch aims to add them to fix this bug.
> > > > > >
> > > > > > BRs,
> > > > > > Haochen
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > > > > > New function to check if AMX is usable and enable AMX.
> > > > > > (main): Run test if AMX is usable.
> > > > > > ---
> > > > > >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > > > > > +++
> > > > > >  1 file changed, 24 insertions(+)
> > > > > >
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > index 434b0e59703..92ed8669304 100644
> > > > > > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > @@ -4,11 +4,22 @@
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > > +#include 
> > > > > > +#include 
> > > > > >  #ifdef DEBUG
> > > > > >  #include 
> > > > > >  #endif
> > > > > >  #include "cpuid.h"
> > > > > >
> > > > > > +#define XFEATURE_XTILECFG  17
> > > > > > +#define XFEATURE_XTILEDATA 18
> > > > > > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > > > > > +#define XFEATURE_MASK_XTILEDATA(1 << XFEATURE_XTILEDATA)
> > > > > > +#define XFEATURE_MASK_XTILE(XFEATURE_MASK_XTILECFG |
> > > > > XFEATURE_MASK_XTILEDATA)
> > > > > > +
> > > > > > +#define ARCH_GET_XCOMP_PERM0x1022
> > > > > > +#define ARCH_REQ_XCOMP_PERM0x1023
> > > > > > +
> > > > > >  /* TODO: The tmm emulation is temporary for current
> > > > > > AMX implementation with no tmm regclass, should
> > > > > > be changed in the future. */ @@ -44,6 +55,18 @@ typedef
> > > > > > struct __tile
> > > > > >  /* Stride (colum width in byte) used for tileload/store */
> > > > > > #define _STRIDE 64
> > > > > >
> > > > > > +/* We need syscall to use amx functions */ int
> > > > > > +request_perm_xtile_data() {
> > > > > > +  unsigned long bitmask;
> > > > > > +
> > > > > > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
> > > > > XFEATURE_XTILEDATA) ||
> > > > > > > +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
> > > > > > +return 0;
> > > > > > +
> > > > > > +  return (bitmask & XFEATURE_MASK_XTILE) != 0; }
> > > > > > +
> > > > > >  /* Initialize tile config by setting all tmm size to 16x64 */
> > > > > > void init_tile_config (__tilecfg_u *dst)  { @@ -186,6 +209,7
> > > > > > @@ main () #ifdef AMX_BF16
> > > > > >&& __builtin_cpu_supports ("amx-bf16")  #endif
> > > > > > +  && request_perm_xtile_data ()
> > > > > >)
> > > > > >  {
> > > > > >DO_TEST ();
> > > > > > --
> > > > > > 2.18.2
> > > > > >


Re: [PATCH] [x86] Fix typo in floorv2sf2, should be register_operand for op1, not vector_operand.

2022-09-22 Thread Uros Bizjak via Gcc-patches
On Thu, Sep 22, 2022 at 3:18 AM liuhongt via Gcc-patches
 wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Verify 526.blend_r can be rebuilt with the fix.
>
> Ok for trunk?

This patch is OK as obvious.

Thanks,
Uros.

> gcc/ChangeLog:
>
> PR target/106994
> * config/i386/mmx.md (floorv2sf2): Fix typo, use
> register_operand instead of vector_operand for operands[1].
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr106994.c: New test.
> ---
>  gcc/config/i386/mmx.md   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr106994.c | 24 
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106994.c
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 222a041de58..c359e2dd6de 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1676,7 +1676,7 @@ (define_expand "lceilv2sfv2si2"
>  (define_expand "floorv2sf2"
>[(set (match_operand:V2SF 0 "register_operand")
> (unspec:V2SF
> - [(match_operand:V2SF 1 "vector_operand")
> + [(match_operand:V2SF 1 "register_operand")
>(match_dup 2)]
>   UNSPEC_ROUND))]
>"TARGET_SSE4_1 && !flag_trapping_math
> diff --git a/gcc/testsuite/gcc.target/i386/pr106994.c 
> b/gcc/testsuite/gcc.target/i386/pr106994.c
> new file mode 100644
> index 000..0803311dc75
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106994.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=skylake -Ofast" } */
> +
> +typedef struct {
> +  float ymin, ymax;
> +} rctf;
> +
> +rctf view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked;
> +float view2d_map_cur_using_maskUI_view2d_view_ortho_yofs;
> +
> +void BLI_rctf_translate();
> +void glLoadIdentity();
> +
> +void
> +view2d_map_cur_using_maskUI_view2d_view_ortho() {
> +  BLI_rctf_translate(&view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked);
> +  view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymin =
> +  
> __builtin_floor(view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymin) 
> -
> +  view2d_map_cur_using_maskUI_view2d_view_ortho_yofs;
> +  view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymax =
> +  
> __builtin_floor(view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymax) 
> -
> +  view2d_map_cur_using_maskUI_view2d_view_ortho_yofs;
> +  glLoadIdentity();
> +}
> --
> 2.27.0
>