Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated

2023-12-05 Thread Richard Biener
On Wed, Dec 6, 2023 at 3:33 AM Jiang, Haochen  wrote:
>
> > -Original Message-
> > From: Jiang, Haochen
> > Sent: Friday, December 1, 2023 4:51 PM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > ubiz...@gmail.com
> > Subject: RE: [PATCH] i386: Mark Xeon Phi ISAs as deprecated
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, December 1, 2023 4:37 PM
> > > To: Jiang, Haochen 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > > ubiz...@gmail.com
> > > Subject: Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated
> > >
> > > On Fri, Dec 1, 2023 at 8:34 AM Jiang, Haochen 
> > > wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: Friday, December 1, 2023 3:04 PM
> > > > > To: Jiang, Haochen 
> > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > > > > ubiz...@gmail.com
> > > > > Subject: Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated
> > > > >
> > > > > On Fri, Dec 1, 2023 at 3:22 AM Haochen Jiang wrote:
> > > > > >
> > > > > > Since the Knights Landing and Knights Mill microarchitectures are EOL, we
> > > > > > would like to remove their support in GCC 15. In GCC 14, we will first
> > > > > > emit a warning for their usage.
> > > > >
> > > > > I think it's better to keep supporting -mtune/arch=knl without diagnostics
> > > >
> > > > I see, it could be a choice and might be better. But if we take this,
> > > > how we should define -mtune=knl remains a question.
> > >
> > > I'd say mapping it to a "close" micro-architecture makes most sense, but
> > > we could also simply keep the tuning entry for knl?
> >
> > Actually, I have written a test patch for the removal. One issue might be
> > that there is something KNL-specific in the VZEROUPPER tuning, which is
> > also reflected in PR82990.
> >
> > /* X86_TUNE_EMIT_VZEROUPPER: This enables vzeroupper instruction insertion
> >    before a transfer of control flow out of the function.  */
> > DEF_TUNE (X86_TUNE_EMIT_VZEROUPPER, "emit_vzeroupper", ~m_KNL)
> >
> > If we choose to keep them, this behavior will change.
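To make the VZEROUPPER point concrete: the `DEF_TUNE` entry enables a tuning feature through a processor bitmask, and `~m_KNL` means "every processor except KNL". The following is a minimal C sketch of that idea; the processor enum, mask macros and helper function are hypothetical stand-ins, not GCC's actual i386 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical processor indices; GCC's real list lives in the i386 backend. */
enum processor { PROC_GENERIC, PROC_SKYLAKE_AVX512, PROC_KNL, PROC_COUNT };

#define M_PROC(p) ((uint64_t) 1 << (p))
#define M_KNL M_PROC (PROC_KNL)

/* Analogue of DEF_TUNE (X86_TUNE_EMIT_VZEROUPPER, "emit_vzeroupper", ~m_KNL):
   the feature is on for every processor except KNL.  */
static const uint64_t tune_emit_vzeroupper_mask = ~M_KNL;

/* Returns true if vzeroupper insertion is enabled when tuning for TUNE.  */
static bool
tune_emit_vzeroupper (enum processor tune)
{
  return (tune_emit_vzeroupper_mask & M_PROC (tune)) != 0;
}
```

If `-mtune=knl` were kept but remapped to another processor (say the hypothetical `PROC_SKYLAKE_AVX512` here), `tune_emit_vzeroupper` would start returning true for it, which is the behavior change described above.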
>
> Hi Richard,
>
> After thinking it over again, I suppose we should still remove the arch/tune
> options here to avoid misleading behavior, since something will always
> change.
>
> What is your concern about removing? Do you have anything that relies on the
> tune and arch?

We usually promise backwards compatibility with respect to accepted options
which is why we have things like

ftree-vect-loop-version
Common Ignore
Does nothing. Preserved for backward compatibility.

the backend errors on unknown march/tune and that would be a regression
for build systems using that (even if that's indeed very unlikely).  That's why
I suggested to make it still do something (doing "nothing", aka keeping generic
is probably worse than dropping).  I guess having -march=knl behave differently
is also bad so I guess there's not a good solution for that.

So - just to have made the above point, I'm fine with what x86 maintainers
decide here.

Richard.

> Thx,
> Haochen
>
> >
> > >
> > > > > but simply not enable the ISAs we don't support.  The better question is
> > > > > what to do about KNL-specific intrinsics headers / intrinsics?  Will we
> > > > > simply remove those?
> > > >
> > > > If there is no objection, the intrinsics are planned to be removed in GCC 15.
> > > > As far as we know, almost nobody is using them with the latest GCC, and
> > > > there were no complaints when they were removed from ICC/ICX.
> > >
> > > I see.  Replacing the header contents with #error "XYZ is no longer supported"
> > > might be nicer.  OTOH x86intrin.h should simply no longer include them.
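A sketch of what such a stub header could look like. The header name `avx512erintrin.h`, its include guard and the exact message are used here only as an example; this is an assumption about the eventual GCC 15 contents, not the committed change. It is a deliberately non-compiling fragment:

```c
/* Hypothetical GCC 15 contents of a removed Xeon Phi intrinsics header,
   e.g. avx512erintrin.h.  A translation unit that still includes it
   directly stops with a clear message instead of silently losing the
   declarations.  */
#ifndef _AVX512ERINTRIN_H_INCLUDED
#define _AVX512ERINTRIN_H_INCLUDED

#error "AVX512ER is no longer supported"

#endif /* _AVX512ERINTRIN_H_INCLUDED */
```

Paired with the umbrella header simply no longer including it, code that never names the header directly would be unaffected.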
> >
> > That is nicer. I will take that in GCC 15 patch.
> >
> > Thx,
> > Haochen
> >
> > >
> > > Richard.
> > >
> > > > Thx,
> > > > Haochen
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > * config/i386/driver-i386.cc (host_detect_local_cpu):
> > > > > > Do not append "-mno-" for Xeon Phi ISAs.
> > > > > > * config/i386/i386-options.cc (ix86_option_override_internal):
> > > > > > Emit a warning for KNL/KNM targets.
> > > > > > * config/i386/i386.opt: Emit a warning for Xeon Phi ISAs.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > * g++.dg/other/i386-2.C: Adjust testcases.
> > > > > > * g++.dg/other/i386-3.C: Ditto.
> > > > > > * g++.dg/pr80481.C: Ditto.
> > > > > > * gcc.dg/pr71279.c: Ditto.
> > > > > > * gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: Ditto.
> > > > > > * gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
> > > > > > * gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
> > > > > > * gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
> > > > > > * 

Re: [gcc15] nested functions in C

2023-12-05 Thread Richard Biener
On Tue, Dec 5, 2023 at 10:16 PM Martin Uecker  wrote:
>
> Am Dienstag, dem 05.12.2023 um 21:08 + schrieb Joseph Myers:
> > On Mon, 4 Dec 2023, Martin Uecker wrote:
> >
> > > > The key feature of lambdas (which failed to make it into C23) for this
> > > > purpose is that you can't convert them to function pointers, which
> > > > eliminates any need for trampolines.
> > >
> > > And also makes them useful only for template-like macro programming,
> > > but not much else. So my understanding was that this needs to be
> > > addressed at some point.
> >
> > Where "addressed" probably means some kind of callable object that stores
> > more than just a function pointer in order to be able to encapsulate both
> > the code address of a lambda and the context it needs to receive
> > implicitly.  So still not needing trampolines.
>
> Yes, a wide function pointer type similar to C++'s std::function.
>
> This would also be a way to eliminate the need for trampolines
> for GCC's nested function.

And conversion to ordinary function pointer types would be still
possible by using (on heap) trampolines then and would offer
backward compatibility.  I wonder how much implementation work
it would be to add the wide function pointer types (please hide
details in the C frontend).
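To make the idea concrete, here is a minimal C sketch of a wide function pointer written by hand as a plain struct: a code address plus the context it needs, passed explicitly at each call. The type and function names are illustrative only; they do not correspond to any proposed C syntax or to GCC internals. Converting such a value to an ordinary function pointer would still need a trampoline, as noted above.

```c
#include <stddef.h>

/* Hypothetical "wide" function pointer: the code address plus the context
   (static chain) it needs.  Because the context travels with the pointer,
   calling it requires no executable trampoline.  */
typedef struct
{
  int (*code) (void *ctx, int arg);
  void *ctx;
} wide_fn_int;

static int
call_wide (wide_fn_int f, int arg)
{
  return f.code (f.ctx, arg);
}

/* What a nested function capturing a local 'base' might lower to.  */
struct add_base_ctx { int base; };

static int
add_base_impl (void *ctx, int arg)
{
  return ((struct add_base_ctx *) ctx)->base + arg;
}

/* Apply F to each element of V in place.  */
static void
map_ints (int *v, size_t n, wide_fn_int f)
{
  for (size_t i = 0; i < n; i++)
    v[i] = call_wide (f, v[i]);
}
```

With `struct add_base_ctx c = { 10 };` and `wide_fn_int f = { add_base_impl, &c };`, `call_wide (f, 5)` yields 15, with the captured value reached through `ctx` rather than through a stack trampoline.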

Richard.

> Martin
> >
>


Re: [PATCH] lower-bitint: Fix arithmetics followed by extension by many bits [PR112809]

2023-12-05 Thread Richard Biener
On Tue, 5 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> A zero or sign extension from result of some upwards_2limb operation
> is implemented in lower_mergeable_stmt as an extra loop which fills in
> the extra bits with 0s or 1s.
> If the delta of extended vs. unextended bit count is small, the code
> doesn't use a loop and emits up to a couple of stores to constant indexes,
> but if the delta is large, it uses
> cnt = (bo_bit != 0) + 1 + (rem != 0);
> statements.  bo_bit is non-zero for bit-field loads and is done in that
> case as straight line, the unconditional 1 in there is for a loop which
> handles most of the limbs in the delta and finally (rem != 0) is for the
> case when the extended precision is not a multiple of limb_prec and is
> again done in straight line code (after the loop).
> The testcase ICEs because the decision what idx to use was incorrect
> for kind == bitint_prec_huge (i.e. when the precision delta is very large)
> and rem == 0 (i.e. the extended precision is multiple of limb_prec).
> In that case cnt is either 1 (if bo_bit == 0) or 2, and idx should
> be either first size_int (start) and then the result of create_loop (for
> bo_bit != 0), or just the result of create_loop; but by mistake the last
> case was size_int (end), which means that when the precision is a multiple
> of limb_prec, the code stored above the precision (which ICEs) and also
> did not emit the needed loop.
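The case analysis above can be modelled in a few lines of standalone C. This is a simplified sketch covering only the kind == bitint_prec_huge path. The enum and function names are hypothetical, and the `fixed` flag switches between the pre-patch condition (`i == cnt - 1`) and the patched one (`i == cnt - 1 && rem != 0`).

```c
/* Which index source lower_mergeable_stmt would pick for step I
   (simplified model of the kind == bitint_prec_huge case only).  */
enum idx_kind
{
  IDX_STRAIGHT_START, /* size_int (start + i): bit-field head, straight line */
  IDX_LOOP,           /* create_loop (...): bulk of the extension limbs */
  IDX_STRAIGHT_END,   /* size_int (end): partial tail limb, straight line */
  IDX_OTHER           /* not reached for valid inputs in this model */
};

/* cnt = (bo_bit != 0) + 1 + (rem != 0), as in the description.  */
static int
ext_cnt (int bo_bit, int rem)
{
  return (bo_bit != 0) + 1 + (rem != 0);
}

static enum idx_kind
choose_idx_huge (int i, int cnt, int bo_bit, int rem, int fixed)
{
  if (i == 0 && bo_bit != 0)
    return IDX_STRAIGHT_START;
  else if (i == cnt - 1 && (!fixed || rem != 0))
    return IDX_STRAIGHT_END;
  else if (i == (bo_bit != 0))
    return IDX_LOOP;
  return IDX_OTHER;
}
```

With bo_bit == 0 and rem == 0, cnt is 1, and the old condition picks `IDX_STRAIGHT_END` for the only step, so no loop is emitted. The patched condition picks `IDX_LOOP`.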
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
> for trunk?

OK.

> 2023-12-05  Jakub Jelinek  
> 
>   PR tree-optimization/112809
>   * gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt): For
>   separate_ext in kind == bitint_prec_huge mode if rem == 0, create for
>   i == cnt - 1 the loop rather than using size_int (end).
> 
>   * gcc.dg/bitint-48.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2023-12-05 09:48:14.0 +0100
> +++ gcc/gimple-lower-bitint.cc 2023-12-05 18:55:58.996323144 +0100
> @@ -2624,7 +2624,7 @@ bitint_large_huge::lower_mergeable_stmt
>   {
> if (kind == bitint_prec_large || (i == 0 && bo_bit != 0))
>   idx = size_int (start + i);
> -   else if (i == cnt - 1)
> +   else if (i == cnt - 1 && (rem != 0))
>   idx = size_int (end);
> else if (i == (bo_bit != 0))
>   idx = create_loop (size_int (start + i), _next);
> --- gcc/testsuite/gcc.dg/bitint-48.c.jj   2023-12-05 19:00:19.593664966 +0100
> +++ gcc/testsuite/gcc.dg/bitint-48.c  2023-12-05 19:00:14.599735086 +0100
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/112809 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-O2" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 512
> +_BitInt (512) a;
> +_BitInt (256) b;
> +_BitInt (256) c;
> +
> +int
> +foo (void)
> +{
> +  return a == (b | c);
> +}
> +
> +void
> +bar (void)
> +{
> +  a /= b - 2;
> +}
> +#else
> +int i;
> +#endif
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] RISC-V: xtheadmemidx: Document inline asm issue with memory constraint

2023-12-05 Thread Kito Cheng
LGTM

On Tue, Dec 5, 2023 at 11:16 PM Christoph Müllner
 wrote:
>
> The XTheadMemIdx support relies on the fact that memory operands that
> can be expressed by XTheadMemIdx instructions, will only appear as
> operands of such instructions.  For internal instruction generation
> this is guaranteed by the implementation.  However, in case of inline
> assembly, this guarantee is not given and we cannot differentiate
> these two cases when printing the operand:
>
>   asm volatile ("sd %1,%0" : "=m"(*tmp) : "r"(val));
>   asm volatile ("th.srd %1,%0" : "=m"(*tmp) : "r"(val));
>
> If XTheadMemIdx is enabled, then the address will be printed as if an
> XTheadMemIdx instruction is emitted, which is obviously wrong in the
> first case.
>
> There might be solutions to handle this (e.g. using TARGET_MEM_CONSTRAINT
> or extending the mnemonics to accept the standard operands for
> XTheadMemIdx instructions), but let's document this behavior for now
> as a known issue by adding xfail tests until we have an acceptable fix.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/xtheadmemidx-inline-asm-1.c: New test.
>
> Reported-by: Jin Ma 
> Signed-off-by: Christoph Müllner 
> ---
>  .../riscv/xtheadmemidx-inline-asm-1.c | 26 +++
>  1 file changed, 26 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmemidx-inline-asm-1.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmemidx-inline-asm-1.c b/gcc/testsuite/gcc.target/riscv/xtheadmemidx-inline-asm-1.c
> new file mode 100644
> index 000..da52433feb7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadmemidx-inline-asm-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
> +/* { dg-options "-march=rv64gc_xtheadmemidx" } */
> +
> +/* XTheadMemIdx support is implemented such that reg+reg addressing mode
> +   loads/stores are preferred over standard loads/stores.
> +   If this order is changed using inline assembly, the result will be invalid
> +   instructions.  This test serves the purpose of documenting this
> +   limitation until a solution is available.  */
> +
> +void foo (void *p, unsigned long off, unsigned long val)
> +{
> +  unsigned long *tmp = (unsigned long*)(p + off);
> +  asm volatile ("sd\t%1,%0" : "=m"(*tmp) : "r"(val));
> +}
> +
> +void bar (void *p, unsigned long off, unsigned long val)
> +{
> +  unsigned long *tmp = (unsigned long*)(p + off);
> +  asm volatile ("th.srd\t%1,%0" : "=m"(*tmp) : "r"(val));
> +}
> +
> +/* { dg-final { scan-assembler "sd\t\[a-z\]\[0-9\]+,0\\(\[a-z\]\[0-9\]+\\)" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "sd\t\[a-z\]\[0-9\]+,\[a-z\]\[0-9\]+,\[a-z\]\[0-9\]+,0" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler "th\.srd\t\[a-z\]\[0-9\]+,\[a-z\]\[0-9\]+,\[a-z\]\[0-9\]+,0" } } */
> +/* { dg-final { scan-assembler-not "th\.srd\t\[a-z\]\[0-9\]+,0\\(\[a-z\]\[0-9\]+\\)" } } */
> --
> 2.43.0
>


[PATCH v3 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-12-05 Thread Jiahao Xu
When both the -mrecip and -mfrecipe options are enabled, use approximate reciprocal
instructions and approximate reciprocal square root instructions with additional
Newton-Raphson steps to implement single-precision floating-point division, square
root and reciprocal square root operations, for better performance.
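For reference, the standard Newton-Raphson refinements for a reciprocal estimate x0 ≈ 1/b and a reciprocal square-root estimate y0 ≈ 1/sqrt(a) are x1 = x0·(2 − b·x0) and y1 = y0·(1.5 − 0.5·a·y0²). The quotient a/b and sqrt(a) then follow as a·x1 and a·y1. The scalar C sketch below applies one step of each. The crude seed values stand in for the hardware frecipe/frsqrte results. How many steps the patch actually emits, and whether it uses fused multiply-add, is not shown here.

```c
#include <math.h>

/* One Newton-Raphson step for 1/b: x1 = x0 * (2 - b * x0).
   Each step roughly doubles the number of correct bits.  */
static float
refine_recip (float b, float x0)
{
  return x0 * (2.0f - b * x0);
}

/* One Newton-Raphson step for 1/sqrt(a): y1 = y0 * (1.5 - 0.5 * a * y0 * y0).  */
static float
refine_rsqrt (float a, float y0)
{
  return y0 * (1.5f - 0.5f * a * y0 * y0);
}

/* a / b computed as a * (refined estimate of 1/b).  */
static float
approx_div (float a, float b, float recip_estimate)
{
  return a * refine_recip (b, recip_estimate);
}

/* sqrt(a) computed as a * (refined estimate of 1/sqrt(a)).
   Zero and infinite inputs are not handled in this sketch.  */
static float
approx_sqrt (float a, float rsqrt_estimate)
{
  return a * refine_rsqrt (a, rsqrt_estimate);
}

/* Relative error helper, used to check how much a step helps.  */
static float
rel_error (float approx, float exact)
{
  return fabsf (approx - exact) / fabsf (exact);
}
```

Starting from a 10% error (0.3 as an estimate of 1/3), one step brings the relative error down to about 1%, which is why one or two steps after a hardware estimate can be cheaper than a full divide.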

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in (recip_mask): New variable.
(-mrecip, -mrecip=): New options.
* config/loongarch/lasx.md (div3): New expander.
(*div3): Rename.
(sqrt2): New expander.
(*sqrt2): Rename.
(rsqrt2): New expander.
* config/loongarch/loongarch-protos.h (loongarch_emit_swrsqrtsf): New prototype.
(loongarch_emit_swdivsf): Ditto.
* config/loongarch/loongarch.cc (loongarch_option_override_internal):
Set recip_mask for -mrecip and -mrecip= options.
(loongarch_emit_swrsqrtsf): New function.
(loongarch_emit_swdivsf): Ditto.
* config/loongarch/loongarch.h (RECIP_MASK_NONE, RECIP_MASK_DIV, RECIP_MASK_SQRT,
RECIP_MASK_RSQRT, RECIP_MASK_VEC_DIV, RECIP_MASK_VEC_SQRT, RECIP_MASK_VEC_RSQRT,
RECIP_MASK_ALL): New bitmasks.
(TARGET_RECIP_DIV, TARGET_RECIP_SQRT, TARGET_RECIP_RSQRT, TARGET_RECIP_VEC_DIV,
TARGET_RECIP_VEC_SQRT, TARGET_RECIP_VEC_RSQRT): New tests.
* config/loongarch/loongarch.md (sqrt2): New expander.
(*sqrt2): Rename.
(rsqrt2): New expander.
* config/loongarch/loongarch.opt (recip_mask): New variable.
(-mrecip, -mrecip=): New options.
* config/loongarch/lsx.md (div3): New expander.
(*div3): Rename.
(sqrt2): New expander.
(*sqrt2): Rename.
(rsqrt2): New expander.
* config/loongarch/predicates.md (reg_or_vecotr_1_operand): New predicate.
* doc/invoke.texi (LoongArch Options): Document new options.
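The ChangeLog names the RECIP_MASK_* bitmasks and TARGET_RECIP_* tests but does not show their definitions. Below is a minimal sketch of how such a scheme typically fits together. The bit values, the assumption that plain -mrecip enables every bit, and the helper function are illustrative assumptions, not the patch's actual code. The real TARGET_RECIP_* macros may also check further conditions such as -mfrecipe.

```c
#include <stdbool.h>

/* Hypothetical bit assignments; the patch's real values are not shown.  */
enum
{
  RECIP_MASK_NONE      = 0x00,
  RECIP_MASK_DIV       = 0x01,
  RECIP_MASK_SQRT      = 0x02,
  RECIP_MASK_RSQRT     = 0x04,
  RECIP_MASK_VEC_DIV   = 0x08,
  RECIP_MASK_VEC_SQRT  = 0x10,
  RECIP_MASK_VEC_RSQRT = 0x20,
  RECIP_MASK_ALL = RECIP_MASK_DIV | RECIP_MASK_SQRT | RECIP_MASK_RSQRT
                   | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT
                   | RECIP_MASK_VEC_RSQRT
};

/* Stand-in for the recip_mask TargetVariable (initialized to 0).  */
static unsigned int recip_mask = RECIP_MASK_NONE;

/* Each TARGET_RECIP_* test reduces to a bit check on recip_mask.  */
#define TARGET_RECIP_DIV     ((recip_mask & RECIP_MASK_DIV) != 0)
#define TARGET_RECIP_VEC_DIV ((recip_mask & RECIP_MASK_VEC_DIV) != 0)

/* Hypothetical handling of plain -mrecip: enable every estimate.  */
static void
handle_mrecip (void)
{
  recip_mask = RECIP_MASK_ALL;
}
```

Keeping one mask with one bit per operation lets a `-mrecip=` list toggle individual operations, while each expander tests only the bit it cares about.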

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/divf.c: New test.
* gcc.target/loongarch/recip-divf.c: New test.
* gcc.target/loongarch/recip-sqrtf.c: New test.
* gcc.target/loongarch/sqrtf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-divf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip-divf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-sqrtf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-divf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip-divf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-sqrtf.c: New test.

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 483b185b059..c3848d02fd3 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -23,6 +23,9 @@ config/loongarch/loongarch-opts.h
 HeaderInclude
 config/loongarch/loongarch-str.h
 
+TargetVariable
+unsigned int recip_mask = 0
+
 ; ISA related options
 ;; Base ISA
 Enum
@@ -194,6 +197,14 @@ mexplicit-relocs
 Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET)
 Use %reloc() assembly operators (for backward compatibility).
 
+mrecip
+Target RejectNegative Var(loongarch_recip)
+Generate approximate reciprocal divide and square root for better throughput.
+
+mrecip=
+Target RejectNegative Joined Var(loongarch_recip_name)
+Control generation of reciprocal estimates.
+
 ; The code model option names for -mcmodel.
 Enum
 Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e4310c4523d..f6f2feedbb3 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1194,7 +1194,25 @@ (define_insn "mul3"
   [(set_attr "type" "simd_fmul")
(set_attr "mode" "")])
 
-(define_insn "div3"
+(define_expand "div3"
+  [(set (match_operand:FLASX 0 "register_operand")
+(div:FLASX (match_operand:FLASX 1 "reg_or_vecotr_1_operand")
+  (match_operand:FLASX 2 "register_operand")))]
+  "ISA_HAS_LASX"
+{
+  if (mode == V8SFmode
+&& TARGET_RECIP_VEC_DIV
+&& optimize_insn_for_speed_p ()
+&& flag_finite_math_only && !flag_trapping_math
+&& flag_unsafe_math_optimizations)
+  {
+loongarch_emit_swdivsf (operands[0], operands[1],
+   operands[2], V8SFmode);
+DONE;
+  }
+})
+
+(define_insn "*div3"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
(div:FLASX (match_operand:FLASX 1 "register_operand" "f")
   (match_operand:FLASX 2 "register_operand" "f")))]
@@ -1223,7 +1241,23 @@ (define_insn "fnma4"
   [(set_attr "type" "simd_fmadd")
(set_attr "mode" "")])
 
-(define_insn "sqrt2"
+(define_expand "sqrt2"
+  [(set (match_operand:FLASX 0 

[PATCH v3 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.

2023-12-05 Thread Jiahao Xu
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with the standard
pattern name. Define the function use_rsqrt_p to decide when to use the rsqrt optab.

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to ..
(rsqrt2): .. this.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vfrsqrt_d): Redefine to standard pattern name.
(CODE_FOR_lsx_vfrsqrt_s): Ditto.
(CODE_FOR_lasx_xvfrsqrt_d): Ditto.
(CODE_FOR_lasx_xvfrsqrt_s): Ditto.
* config/loongarch/loongarch.cc (use_rsqrt_p): New function.
(loongarch_optab_supported_p): Ditto.
(TARGET_OPTAB_SUPPORTED_P): New hook.
* config/loongarch/loongarch.md (*rsqrta): Remove.
(*rsqrt2): New insn pattern.
(*rsqrtb): Remove.
* config/loongarch/lsx.md (lsx_vfrsqrt_): Renamed to ..
(rsqrt2): .. this.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-rsqrt.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-rsqrt.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index f6e5208a6f1..c8edc1bfd76 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1646,10 +1646,10 @@ (define_insn "lasx_xvfrecipe_"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvfrsqrt_"
+(define_insn "rsqrt2"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
-   (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
- UNSPEC_LASX_XVFRSQRT))]
+(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
+ UNSPEC_LASX_XVFRSQRT))]
   "ISA_HAS_LASX"
   "xvfrsqrt.\t%u0,%u1"
   [(set_attr "type" "simd_fdiv")
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index 507fc953c72..ba8686d4ceb 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -500,6 +500,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lsx_vssrlrn_bu_h CODE_FOR_lsx_vssrlrn_u_bu_h
 #define CODE_FOR_lsx_vssrlrn_hu_w CODE_FOR_lsx_vssrlrn_u_hu_w
 #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d
+#define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2
+#define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2
 
 /* LoongArch ASX define CODE_FOR_lasx_mxxx */
 #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3
@@ -776,6 +778,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lasx_xvsat_hu CODE_FOR_lasx_xvsat_u_hu
 #define CODE_FOR_lasx_xvsat_wu CODE_FOR_lasx_xvsat_u_wu
 #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du
+#define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2
+#define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2
 
 static const struct loongarch_builtin_description loongarch_builtins[] = {
 #define LARCH_MOVFCSR2GR 0
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 57a20bec8a4..96a4b846f2d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -11487,6 +11487,30 @@ loongarch_builtin_support_vector_misalignment (machine_mode mode,
  is_packed);
 }
 
+static bool
+use_rsqrt_p (void)
+{
+  return (flag_finite_math_only
+ && !flag_trapping_math
+ && flag_unsafe_math_optimizations);
+}
+
+/* Implement the TARGET_OPTAB_SUPPORTED_P hook.  */
+
+static bool
+loongarch_optab_supported_p (int op, machine_mode, machine_mode,
+optimization_type opt_type)
+{
+  switch (op)
+{
+case rsqrt_optab:
+  return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p ();
+
+default:
+  return true;
+}
+}
+
 /* If -fverbose-asm, dump some info for debugging.  */
 static void
 loongarch_asm_code_end (void)
@@ -11625,6 +11649,9 @@ loongarch_asm_code_end (void)
 #undef TARGET_FUNCTION_ARG_BOUNDARY
 #define TARGET_FUNCTION_ARG_BOUNDARY loongarch_function_arg_boundary
 
+#undef TARGET_OPTAB_SUPPORTED_P
+#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p
+
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P loongarch_vector_mode_supported_p
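use_rsqrt_p and the new hook allow the rsqrt optab only when optimizing for speed under finite-math-only, non-trapping, unsafe-math settings. That is the same flag combination the div expander in patch 4/5 checks. The standalone sketch below models that decision. GCC's `flag_*` globals and `optimization_type` are replaced by a hypothetical struct and enum so the logic can run outside the compiler.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's flag_* globals.  */
struct math_flags
{
  bool finite_math_only;          /* -ffinite-math-only */
  bool trapping_math;             /* -ftrapping-math (on by default) */
  bool unsafe_math_optimizations; /* -funsafe-math-optimizations */
};

/* Hypothetical stand-in for optimization_type.  */
enum opt_type { OPT_FOR_SPEED, OPT_FOR_SIZE };

/* Mirrors use_rsqrt_p.  */
static bool
use_rsqrt (const struct math_flags *f)
{
  return f->finite_math_only && !f->trapping_math
         && f->unsafe_math_optimizations;
}

/* Mirrors the rsqrt_optab case of loongarch_optab_supported_p.  */
static bool
rsqrt_optab_supported (const struct math_flags *f, enum opt_type t)
{
  return t == OPT_FOR_SPEED && use_rsqrt (f);
}
```

Under -ffast-math all three conditions hold, since it implies -ffinite-math-only and -funsafe-math-optimizations, and the latter turns off -ftrapping-math. With default flags the optab is reported as unsupported.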
 
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 07beede8892..fd154b02e48 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -60,6 +60,7 @@ (define_c_enum "unspec" [
   UNSPEC_TIE
 
   ;; RSQRT
+  UNSPEC_RSQRT
   UNSPEC_RSQRTE
 
   ;; RECIP
@@ -1134,25 +1135,14 @@ (define_insn "sqrt2"
(set_attr "mode" "")
(set_attr "insn_count" "1")])
 
-(define_insn "*rsqrta"
+(define_insn "*rsqrt2"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
-   (div:ANYF (match_operand:ANYF 1 "const_1_operand" "")
- (sqrt:ANYF (match_operand:ANYF 2 "register_operand" "f"]
-  "flag_unsafe_math_optimizations"
-  "frsqrt.\t%0,%2"
-  

[PATCH v3 5/5] LoongArch: Vectorized loop unrolling is disabled for divf/sqrtf/rsqrtf when -mrecip is enabled.

2023-12-05 Thread Jiahao Xu
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number
of generated instructions is close to or exceeds the maximum number of instructions that
LoongArch can issue per cycle, so vectorized loop unrolling is not performed on them.
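The cost-model change amounts to recording a flag while statements are costed and consulting it when an unroll factor is suggested. Below is a heavily simplified C model. The statement classification, parameter names and the default factor are hypothetical. Only the rule "return 1 when m_has_avg or m_has_recip is set" comes from the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified statement kinds seen by the vector cost model.  */
enum stmt_kind { STMT_OTHER, STMT_SF_DIV, STMT_SQRTF_CALL, STMT_AVG };

struct vec_costs
{
  bool has_avg;
  bool has_recip;
};

/* Analogue of add_stmt_cost: note when a statement will be expanded into an
   approximate-reciprocal sequence.  RECIP_VEC_DIV stands for
   TARGET_RECIP_VEC_DIV; RECIP_VEC_SQRT for TARGET_RECIP_VEC_SQRT or
   TARGET_RECIP_VEC_RSQRT in the patch.  */
static void
note_stmt (struct vec_costs *c, enum stmt_kind k, bool recip_vec_div,
           bool recip_vec_sqrt)
{
  if (k == STMT_AVG)
    c->has_avg = true;
  else if (k == STMT_SF_DIV && recip_vec_div)
    c->has_recip = true;
  else if (k == STMT_SQRTF_CALL && recip_vec_sqrt)
    c->has_recip = true;
}

/* Analogue of determine_suggested_unroll_factor.  */
static unsigned
suggested_unroll (const struct vec_costs *c, unsigned default_uf)
{
  if (c->has_avg || c->has_recip)
    return 1;
  return default_uf;
}
```

A single float division in a loop is enough to suppress unrolling once the vector-division estimate is enabled. Without -mrecip the same loop keeps whatever factor the rest of the cost model suggests.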

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_vector_costs::determine_suggested_unroll_factor):
If m_has_recip is true, return 1 as the unroll factor.
(loongarch_vector_costs::add_stmt_cost): Detect the use of approximate
instruction sequence.

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 2c06edcff92..0ca60e15ced 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3974,7 +3974,9 @@ protected:
   /* Reduction factor for suggesting unroll factor.  */
   unsigned m_reduc_factor = 0;
   /* True if the loop contains an average operation. */
-  bool m_has_avg =false;
+  bool m_has_avg = false;
+  /* True if the loop uses approximation instruction sequence.  */
+  bool m_has_recip = false;
 };
 
 /* Implement TARGET_VECTORIZE_CREATE_COSTS.  */
@@ -4021,7 +4023,7 @@ loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
 
-  if (m_has_avg)
+  if (m_has_avg || m_has_recip)
 return 1;
 
   /* Don't unroll if it's specified explicitly not to be unrolled.  */
@@ -4081,6 +4083,36 @@ loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
}
 }
 
+  combined_fn cfn;
+  if (kind == vector_stmt
+  && stmt_info
+  && stmt_info->stmt)
+{
+  /* Detect the use of approximate instruction sequence.  */
+  if ((TARGET_RECIP_VEC_SQRT || TARGET_RECIP_VEC_RSQRT)
+ && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST)
+   switch (cfn)
+ {
+ case CFN_BUILT_IN_SQRTF:
+   m_has_recip = true;
+ default:
+   break;
+ }
+  else if (TARGET_RECIP_VEC_DIV
+  && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
+   {
+ machine_mode mode = TYPE_MODE (vectype);
+ switch (gimple_assign_rhs_code (stmt_info->stmt))
+   {
+   case RDIV_EXPR:
+ if (GET_MODE_INNER (mode) == SFmode)
+   m_has_recip = true;
+   default:
+ break;
+   }
+   }
+}
+
   return retval;
 }
 
-- 
2.20.1



[PATCH v3 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.

2023-12-05 Thread Jiahao Xu
Redefine the patterns for [x]vfrecip instructions to use rtx code instead of unspec, and enable
[x]vfrecip instructions to be generated during auto-vectorization.
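Once the reciprocal patterns are expressed as a division of a constant 1.0 vector by the operand, rather than an unspec, a plain element-wise reciprocal loop is the kind of source the vectorizer can match to them. The sketch below shows such a loop. Whether [x]vfrecip is actually emitted depends on the target options (for example -mlsx/-mlasx) and on the compiler's cost decisions, which this example does not establish.

```c
#include <stddef.h>

/* Element-wise reciprocal: out[i] = 1.0f / in[i].  With rtx-code patterns
   for [x]vfrecip, a vectorized version of this loop is a candidate for
   those instructions on LoongArch (not verified here).  */
void
recip_array (float *restrict out, const float *restrict in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = 1.0f / in[i];
}
```

Before this change, the instructions were reachable only through the `__lsx_vfrecip_*`/`__lasx_xvfrecip_*` builtins, because the vectorizer cannot match an unspec to a division.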

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to ..
(recip3): .. this.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrecip_d): Redefine
to new pattern name.
(CODE_FOR_lsx_vfrecip_s): Ditto.
(CODE_FOR_lasx_xvfrecip_d): Ditto.
(CODE_FOR_lasx_xvfrecip_s): Ditto.
(loongarch_expand_builtin_direct): For the vector recip instructions, construct a
temporary parameter const1_vector.
* config/loongarch/lsx.md (lsx_vfrecip_): Renamed to ..
(recip3): .. this.
* config/loongarch/predicates.md (const_vector_1_operand): New predicate.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index c8edc1bfd76..e4310c4523d 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1626,12 +1626,12 @@ (define_insn "lasx_xvfmina_"
   [(set_attr "type" "simd_fminmax")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvfrecip_"
+(define_insn "recip3"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
-   (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
- UNSPEC_LASX_XVFRECIP))]
+   (div:FLASX (match_operand:FLASX 1 "const_vector_1_operand" "")
+ (match_operand:FLASX 2 "register_operand" "f")))]
   "ISA_HAS_LASX"
-  "xvfrecip.\t%u0,%u1"
+  "xvfrecip.\t%u0,%u2"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
 
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index ba8686d4ceb..c77394176db 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -502,6 +502,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d
 #define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2
 #define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2
+#define CODE_FOR_lsx_vfrecip_d CODE_FOR_recipv2df3
+#define CODE_FOR_lsx_vfrecip_s CODE_FOR_recipv4sf3
 
 /* LoongArch ASX define CODE_FOR_lasx_mxxx */
 #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3
@@ -780,6 +782,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du
 #define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2
 #define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2
+#define CODE_FOR_lasx_xvfrecip_d CODE_FOR_recipv4df3
+#define CODE_FOR_lasx_xvfrecip_s CODE_FOR_recipv8sf3
 
 static const struct loongarch_builtin_description loongarch_builtins[] = {
 #define LARCH_MOVFCSR2GR 0
@@ -3024,6 +3028,22 @@ loongarch_expand_builtin_direct (enum insn_code icode, rtx target, tree exp,
   if (has_target_p)
 create_output_operand (&ops[opno++], target, TYPE_MODE (TREE_TYPE (exp)));
 
+  /* For the vector reciprocal instructions, we need to construct a temporary
+ parameter const1_vector.  */
+  switch (icode)
+{
+case CODE_FOR_recipv8sf3:
+case CODE_FOR_recipv4df3:
+case CODE_FOR_recipv4sf3:
+case CODE_FOR_recipv2df3:
+  loongarch_prepare_builtin_arg (&ops[2], exp, 0);
+  create_input_operand (&ops[1], CONST1_RTX (ops[0].mode), ops[0].mode);
+  return loongarch_expand_builtin_insn (icode, 3, ops, has_target_p);
+
+default:
+  break;
+}
+
   /* Map the arguments to the other operands.  */
   gcc_assert (opno + call_expr_nargs (exp)
  == insn_data[icode].n_generator_args);
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index aeae1b1a622..06402e3b353 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1539,12 +1539,12 @@ (define_insn "lsx_vfmina_"
   [(set_attr "type" "simd_fminmax")
(set_attr "mode" "")])
 
-(define_insn "lsx_vfrecip_"
+(define_insn "recip3"
   [(set (match_operand:FLSX 0 "register_operand" "=f")
-   (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
-UNSPEC_LSX_VFRECIP))]
+   (div:FLSX (match_operand:FLSX 1 "const_vector_1_operand" "")
+(match_operand:FLSX 2 "register_operand" "f")))]
   "ISA_HAS_LSX"
-  "vfrecip.\t%w0,%w1"
+  "vfrecip.\t%w0,%w2"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
 
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index d02e846cb12..f7796da10b2 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -227,6 +227,10 @@ (define_predicate "const_1_operand"
   (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST1_RTX (GET_MODE (op))")))
 
+(define_predicate "const_vector_1_operand"
+  (and (match_code "const_vector")
+   (match_test "op == CONST1_RTX (GET_MODE (op))")))
+
 (define_predicate "reg_or_1_operand"
   (ior (match_operand 0 

[PATCH v3 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions.

2023-12-05 Thread Jiahao Xu
This patch adds define_insn/builtins/intrinsics for these instructions, and adds the option
-mfrecipe to control instruction generation.

gcc/ChangeLog:

* config/loongarch/genopts/isa-evolution.in (frecipe): Add.
* config/loongarch/larchintrin.h (__frecipe_s): New intrinsic.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
* config/loongarch/lasx.md (lasx_xvfrecipe_): New insn pattern.
(lasx_xvfrsqrte_): Ditto.
* config/loongarch/lasxintrin.h (__lasx_xvfrecipe_s): New intrinsic.
(__lasx_xvfrecipe_d): Ditto.
(__lasx_xvfrsqrte_s): Ditto.
(__lasx_xvfrsqrte_d): Ditto.
* config/loongarch/loongarch-builtins.cc (AVAIL_ALL): Add predicates.
(LSX_EXT_BUILTIN): New macro.
(LASX_EXT_BUILTIN): Ditto.
* config/loongarch/loongarch-cpucfg-map.h: Regenerate.
* config/loongarch/loongarch-c.cc: Add builtin macro "__loongarch_frecipe".
* config/loongarch/loongarch-def.cc: Regenerate.
* config/loongarch/loongarch-str.h (OPTSTR_FRECIPE): Regenerate.
* config/loongarch/loongarch.cc (loongarch_asm_code_end): Dump status for TARGET_FRECIPE.
* config/loongarch/loongarch.md (loongarch_frecipe_): New insn pattern.
(loongarch_frsqrte_): Ditto.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/lsx.md (lsx_vfrecipe_): New insn pattern.
(lsx_vfrsqrte_): Ditto.
* config/loongarch/lsxintrin.h (__lsx_vfrecipe_s): New intrinsic.
(__lsx_vfrecipe_d): Ditto.
(__lsx_vfrsqrte_s): Ditto.
(__lsx_vfrsqrte_d): Ditto.
* doc/extend.texi: Add documentation for LoongArch new builtins and intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/larch-frecipe-builtin.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c: New test.

diff --git a/gcc/config/loongarch/genopts/isa-evolution.in b/gcc/config/loongarch/genopts/isa-evolution.in
index a6bc3f87f20..11a198b649f 100644
--- a/gcc/config/loongarch/genopts/isa-evolution.in
+++ b/gcc/config/loongarch/genopts/isa-evolution.in
@@ -1,3 +1,4 @@
+2  25  frecipe Support frecipe.{s/d} and frsqrte.{s/d} instructions.
 2  26  div32   Support div.w[u] and mod.w[u] instructions with inputs not sign-extended.
 2  27  lam-bh  Support am{swap/add}[_db].{b/h} instructions.
 2  28  lamcas  Support amcas[_db].{b/h/w/d} instructions.
diff --git a/gcc/config/loongarch/larchintrin.h b/gcc/config/loongarch/larchintrin.h
index e571ed27b37..bb1cda831eb 100644
--- a/gcc/config/loongarch/larchintrin.h
+++ b/gcc/config/loongarch/larchintrin.h
@@ -333,6 +333,44 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2)
 }
 #endif
 
+#ifdef __loongarch_frecipe
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  SF, SF.  */
+extern __inline float
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frecipe_s (float _1)
+{
+  return __builtin_loongarch_frecipe_s ((float) _1);
+}
+
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  DF, DF.  */
+extern __inline double
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frecipe_d (double _1)
+{
+  return __builtin_loongarch_frecipe_d ((double) _1);
+}
+
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  SF, SF.  */
+extern __inline float
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frsqrte_s (float _1)
+{
+  return __builtin_loongarch_frsqrte_s ((float) _1);
+}
+
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  DF, DF.  */
+extern __inline double
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frsqrte_d (double _1)
+{
+  return __builtin_loongarch_frsqrte_d ((double) _1);
+}
+#endif
+
 /* Assembly instruction format:ui15.  */
 /* Data types in instruction templates:  USI.  */
 #define __dbar(/*ui15*/ _1) __builtin_loongarch_dbar ((_1))
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 116b30c0774..f6e5208a6f1 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -40,8 +40,10 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVFCVTL
   UNSPEC_LASX_XVFLOGB
   UNSPEC_LASX_XVFRECIP
+  UNSPEC_LASX_XVFRECIPE
   UNSPEC_LASX_XVFRINT
   UNSPEC_LASX_XVFRSQRT
+  UNSPEC_LASX_XVFRSQRTE
   UNSPEC_LASX_XVFCMP_SAF
   UNSPEC_LASX_XVFCMP_SEQ
   UNSPEC_LASX_XVFCMP_SLE
@@ -1633,6 +1635,17 @@ (define_insn "lasx_xvfrecip_"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
 
+;; Approximate Reciprocal Instructions.
+
+(define_insn "lasx_xvfrecipe_"
+  [(set (match_operand:FLASX 0 "register_operand" "=f")
+(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
+ 

[PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.

2023-12-05 Thread Jiahao Xu
LoongArch V1.1 adds support for approximate instructions, which are used
along with additional Newton-Raphson steps to implement single-precision
floating-point division, square root and reciprocal square root operations
for better throughput.
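For readers unfamiliar with the technique, here is a minimal C sketch of one Newton-Raphson refinement step applied to a reciprocal or reciprocal-square-root estimate. The estimate is passed in by the caller as a stand-in for what frecipe.s / frsqrte.s would return; the helper names are hypothetical, and the sequence the patches actually emit may differ (for example by using fused multiply-add or a different number of steps).

```c
#include <assert.h>

/* One Newton-Raphson step for 1/d: x1 = x0 * (2 - d * x0).
   If x0 = (1/d) * (1 + e), then x1 = (1/d) * (1 - e*e), so the
   relative error is roughly squared.  */
static float
recip_nr_step (float d, float x0)
{
  return x0 * (2.0f - d * x0);
}

/* One Newton-Raphson step for 1/sqrt(a):
   y1 = y0 * (1.5 - 0.5 * a * y0 * y0).  */
static float
rsqrt_nr_step (float a, float y0)
{
  return y0 * (1.5f - 0.5f * a * y0 * y0);
}

/* a / b computed as a * refined(1/b).  EST_RECIP_B stands in for the
   output of an approximate reciprocal instruction; the caller supplies it.  */
static float
div_via_recip (float a, float b, float est_recip_b)
{
  assert (b != 0.0f);
  return a * recip_nr_step (b, est_recip_b);
}

/* sqrt(a) computed as a * refined(1/sqrt(a)), for a > 0.  EST_RSQRT_A
   stands in for the output of an approximate rsqrt instruction.  */
static float
sqrt_via_rsqrt (float a, float est_rsqrt_a)
{
  assert (a > 0.0f);
  return a * rsqrt_nr_step (a, est_rsqrt_a);
}
```

With an initial estimate that is 1% off (an arbitrary figure, not the instructions' documented accuracy), one call to recip_nr_step brings 1/3 to within about 0.01% relative error, because the error is roughly squared by each step.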

The patches are modifications made based on the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html

Jiahao Xu (5):
  LoongArch: Add support for LoongArch V1.1 approximate instructions.
  LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.
  LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
  LoongArch: New options -mrecip and -mrecip= with ffast-math.
  LoongArch: Vectorized loop unrolling is disabled for divf/sqrtf/rsqrtf when -mrecip is enabled.

 gcc/config/loongarch/genopts/isa-evolution.in |   1 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  11 +
 gcc/config/loongarch/larchintrin.h|  38 +++
 gcc/config/loongarch/lasx.md  |  89 ++-
 gcc/config/loongarch/lasxintrin.h |  34 +++
 gcc/config/loongarch/loongarch-builtins.cc|  66 +
 gcc/config/loongarch/loongarch-c.cc   |   3 +
 gcc/config/loongarch/loongarch-cpucfg-map.h   |   1 +
 gcc/config/loongarch/loongarch-def.cc |   3 +-
 gcc/config/loongarch/loongarch-protos.h   |   2 +
 gcc/config/loongarch/loongarch-str.h  |   1 +
 gcc/config/loongarch/loongarch.cc | 252 +-
 gcc/config/loongarch/loongarch.h  |  18 ++
 gcc/config/loongarch/loongarch.md | 104 ++--
 gcc/config/loongarch/loongarch.opt|  15 ++
 gcc/config/loongarch/lsx.md   |  89 ++-
 gcc/config/loongarch/lsxintrin.h  |  34 +++
 gcc/config/loongarch/predicates.md|   8 +
 gcc/doc/extend.texi   |  35 +++
 gcc/doc/invoke.texi   |  54 
 gcc/testsuite/gcc.target/loongarch/divf.c |  10 +
 .../loongarch/larch-frecipe-builtin.c |  28 ++
 .../gcc.target/loongarch/recip-divf.c |   9 +
 .../gcc.target/loongarch/recip-sqrtf.c|  23 ++
 gcc/testsuite/gcc.target/loongarch/sqrtf.c|  24 ++
 .../loongarch/vector/lasx/lasx-divf.c |  13 +
 .../vector/lasx/lasx-frecipe-builtin.c|  30 +++
 .../loongarch/vector/lasx/lasx-recip-divf.c   |  12 +
 .../loongarch/vector/lasx/lasx-recip-sqrtf.c  |  28 ++
 .../loongarch/vector/lasx/lasx-recip.c|  24 ++
 .../loongarch/vector/lasx/lasx-rsqrt.c|  26 ++
 .../loongarch/vector/lasx/lasx-sqrtf.c|  29 ++
 .../loongarch/vector/lsx/lsx-divf.c   |  13 +
 .../vector/lsx/lsx-frecipe-builtin.c  |  30 +++
 .../loongarch/vector/lsx/lsx-recip-divf.c |  12 +
 .../loongarch/vector/lsx/lsx-recip-sqrtf.c|  28 ++
 .../loongarch/vector/lsx/lsx-recip.c  |  24 ++
 .../loongarch/vector/lsx/lsx-rsqrt.c  |  26 ++
 .../loongarch/vector/lsx/lsx-sqrtf.c  |  29 ++
 39 files changed, 1234 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c

-- 
2.20.1



Re: [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled

2023-12-05 Thread Uros Bizjak
On Wed, Dec 6, 2023 at 2:31 AM Hongyu Wang  wrote:
>
> Uros Bizjak  wrote on Tue, Dec 5, 2023 at 18:46:
>
> >
> > On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang  wrote:
> > >
> > > Under APX NDD, the previous TImode allocation has an issue: it was
> > > originally allocated using contiguous pairs, like rax:rdi, rdi:rdx.
> > >
> > > This causes issues for all TImode NDD patterns. For NDD we do not
> > > assume that arithmetic operations like add have a dependency between dest
> > > and src1, so the write to the 1st highpart rdi will be overridden by the 2nd
> > > lowpart rdi if the 2nd lowpart rdi has a different src as input; the write
> > > to the 1st highpart rdi is then lost, causing miscompilation.
> > >
> > > To resolve this, under TARGET_APX_NDD we only allow registers with an even
> > > regno to be allocated with TImode, so that TImode registers are allocated
> > > in non-overlapping pairs.
> >
> > Perhaps you could use earlyclobber with __doubleword instructions:
> >
> > (define_insn_and_split "*add3_doubleword"
> >   [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
> > (plus:
> >   (match_operand: 1 "nonimmediate_operand" "%0,0")
> >   (match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
> >(clobber (reg:CC FLAGS_REG))]
> >
> > For the above pattern, you can add earlyclobbered  output
> > alternative that guarantees that output won't be allocated to any of
> > the input registers.
> >
>
> Yes, it does resolve the dest/src overlapping issue we met, thanks!
> I tried it and there were no failures in the GCC testsuite or SPEC. I suppose
> RA can handle the case of different src1/src2 correctly.

Yes, and when a memory input operand is used in doubleword patterns, you
need earlyclobber anyway; otherwise nothing prevents the compiler from
clobbering address registers. When the address registers are dead, the
compiler can (and will) allocate the output register to the same regno as
an address register.
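A toy C model of the hazard (hypothetical: registers are an array and a doubleword value occupies registers R and R+1; this is not how GCC represents it). A split doubleword add writes the low half first, so if the output pair starts at an input's high register, the high-half computation reads an already clobbered value. An earlyclobber output forbids any overlap with the inputs, and allocating pairs only at even regnos makes any two pairs either identical or disjoint.

```c
#include <assert.h>
#include <stdint.h>

enum { NREGS = 8 };

/* Split doubleword add, performed as two word-sized steps in order:
   low = lo1 + lo2 (producing a carry), then high = hi1 + hi2 + carry.
   DST, SRC1 and SRC2 name the low register of each pair; the high
   half lives in the next register.  */
static void
add_doubleword_split (uint64_t regs[NREGS], int dst, int src1, int src2)
{
  assert (dst >= 0 && dst + 1 < NREGS);
  assert (src1 >= 0 && src1 + 1 < NREGS);
  assert (src2 >= 0 && src2 + 1 < NREGS);

  uint64_t lo = regs[src1] + regs[src2];
  uint64_t carry = lo < regs[src1];   /* Unsigned wrap-around means carry.  */

  regs[dst] = lo;                     /* First write: may clobber an input.  */
  /* If dst == src1 + 1, regs[src1 + 1] now holds LO, not the high input.  */
  regs[dst + 1] = regs[src1 + 1] + regs[src2 + 1] + carry;
}
```

With src1 in r1:r2 and the output in r4:r5 the result is correct; moving the output to r2:r3 makes the low-half write destroy src1's high half before it is read.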

Uros,

> Will update in V3 patches with the changes of get_attr_isa (insn) == ISA_APX_NDD


Re: Re: [PATCH] RISC-V: Remove useless modes

2023-12-05 Thread Li Xu
Got it.
Committed, thanks juzhe and kito.



xu...@eswincomputing.com
 
From: Kito Cheng
Date: 2023-12-06 14:45
To: Li Xu
CC: gcc-patches; palmer; juzhe.zhong
Subject: Re: [PATCH] RISC-V: Remove useless modes
You could add [NFC] to the title of this kind of patch to declare that it is a
cleanup or refactoring patch that doesn't change any functionality or feature;
that would make it easier for reviewers. Anyway, LGTM as well.
 
On Wed, Dec 6, 2023 at 12:50 PM Li Xu  wrote:
>
> From: xuli 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.md: Remove.
> ---
>  gcc/config/riscv/riscv.md | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index a98918dfd43..0db659acfbe 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -235,7 +235,6 @@
>RVVM1x7DF,RVVM1x6DF,RVVM1x5DF,RVVM2x4DF,
>RVVM1x4DF,RVVM2x3DF,RVVM1x3DF,RVVM4x2DF,
>RVVM2x2DF,RVVM1x2DF,
> -  VNx2x1DF,VNx3x1DF,VNx4x1DF,VNx5x1DF,VNx6x1DF,VNx7x1DF,VNx8x1DF,
>V1QI,V2QI,V4QI,V8QI,V16QI,V32QI,V64QI,V128QI,V256QI,V512QI,V1024QI,V2048QI,V4096QI,
>V1HI,V2HI,V4HI,V8HI,V16HI,V32HI,V64HI,V128HI,V256HI,V512HI,V1024HI,V2048HI,
>V1SI,V2SI,V4SI,V8SI,V16SI,V32SI,V64SI,V128SI,V256SI,V512SI,V1024SI,
> --
> 2.17.1
>


Re: [PATCH 3/4] RISC-V: Add crypto vector machine descriptions

2023-12-05 Thread juzhe.zh...@rivai.ai
Do vector crypto instructions demand RATIO?

If not, add them to:

;; It is valid for instruction that require sew/lmul ratio.
(define_attr "ratio" ""
  (cond [(eq_attr "type" "vimov,vfmov,vldux,vldox,vstux,vstox,\
vialu,vshift,vicmp,vimul,vidiv,vsalu,\
vext,viwalu,viwmul,vicalu,vnshift,\
vimuladd,vimerge,vaalu,vsmul,vsshift,\
vnclip,viminmax,viwmuladd,vmffs,vmsfs,\
vmiota,vmidx,vfalu,vfmul,vfminmax,vfdiv,\
vfwalu,vfwmul,vfsqrt,vfrecp,vfsgnj,vfcmp,\
vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,\
vfncvtftof,vfmuladd,vfwmuladd,vfclass,vired,\
viwred,vfredu,vfredo,vfwredu,vfwredo,vimovvx,\
vimovxv,vfmovvf,vfmovfv,vslideup,vslidedown,\
vislide1up,vislide1down,vfslide1up,vfslide1down,\
vgather,vcompress,vlsegdux,vlsegdox,vssegtux,vssegtox")
 (const_int INVALID_ATTRIBUTE)


+(define_insn "@pred_vandn"
+  [(set (match_operand:VI 0 "register_operand"   "=vd,vd")

It seems none of the vector crypto instructions are allowed to use v0? Why not use vr?

+   (set_attr "mode" "")])
use  is enough.

+(define_insn "@pred_vwsll_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand"   "=")
+(if_then_else:VWEXTI
+  (unspec:
+[(match_operand: 1 "vector_mask_operand" "vmWc1")
+ (match_operand 5 "vector_length_operand""   rK")
+ (match_operand 6 "const_int_operand""   i")
+ (match_operand 7 "const_int_operand""   i")
+ (match_operand 8 "const_int_operand""   i")
+ (reg:SI VL_REGNUM)
+ (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+  (ashift:VWEXTI
+(zero_extend:VWEXTI
+ (match_operand: 3 "register_operand"  "vr"))
+(match_operand: 4 "pmode_reg_or_uimm5_operand" "rK"))
+  (match_operand:VWEXTI 2 "vector_merge_operand" "0vu")))]
+  "TARGET_ZVBB"
+  "vwsll.v%o4\t%0,%3,%4%p1"
+  [(set_attr "type" "vwsll")
+   (set_attr "mode" "")])

It seems we can leverage EEW widening overlap?

See RVV ISA:

 ;; According to RVV ISA:
 ;; The destination EEW is greater than the source EEW, the source EMUL is at least 1,
 ;; and the overlap is in the highest-numbered part of the destination register group
 ;; (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not).
 ;; So the source operand should have LMUL >= 1.

Reference patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638869.html 
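The quoted overlap rule can be written as a small C predicate. This is a hypothetical sketch for illustration, not GCC code: register groups are given as a first register number plus a register count, and a fractional source EMUL is passed as a one-register group with src_emul_ge1 set to false.

```c
#include <assert.h>
#include <stdbool.h>

/* True if register groups [a, a + a_len) and [b, b + b_len) intersect.  */
static bool
groups_overlap (int a, int a_len, int b, int b_len)
{
  return a < b + b_len && b < a + a_len;
}

/* Widening case (destination EEW > source EEW): overlap is legal only if
   the source EMUL is at least 1 and the source group occupies the
   highest-numbered part of the destination group.  */
static bool
widening_overlap_ok (int dest, int dest_len, int src, int src_len,
                     bool src_emul_ge1)
{
  assert (dest_len >= 1 && src_len >= 1 && src_len <= dest_len);
  if (!groups_overlap (dest, dest_len, src, src_len))
    return true;
  return src_emul_ge1 && src + src_len == dest + dest_len;
}
```

For the LMUL=8 vzext.vf4 example the destination is v0-v7 and the source EMUL is 2, so widening_overlap_ok (0, 8, 6, 2, true) holds, while sources starting at v0, v2 or v4 are rejected.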

Currently, I don't have a solution to support highest-numbered overlap for vv
instructions. Keeping them early-clobber for now is OK.



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2023-12-06 10:45
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; zhusonghe; panciyan; Feng Wang
Subject: [PATCH 3/4] RISC-V: Add crypto vector machine descriptions
This patch adds the crypto machine descriptions (vector-crypto.md) and
some new iterators used by the crypto vector extension.
 
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
 
gcc/ChangeLog:
 
* config/riscv/iterators.md: Add rotate insn name.
* config/riscv/riscv.md: Add new insns name for crypto vector.
* config/riscv/vector-iterators.md: Add new iterators for crypto vector.
* config/riscv/vector.md: Add the corresponding attr for crypto vector.
* config/riscv/vector-crypto.md: New file. The machine descriptions for
crypto vector.
---
gcc/config/riscv/iterators.md|   4 +-
gcc/config/riscv/riscv.md|  33 +-
gcc/config/riscv/vector-crypto.md| 500 +++
gcc/config/riscv/vector-iterators.md |  41 +++
gcc/config/riscv/vector.md   |  49 ++-
5 files changed, 607 insertions(+), 20 deletions(-)
create mode 100755 gcc/config/riscv/vector-crypto.md
 
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index ecf033f2fa7..f332fba7031 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -304,7 +304,9 @@
(umax "maxu")
(clz "clz")
(ctz "ctz")
- (popcount "cpop")])
+ (popcount "cpop")
+ (rotate "rol")
+ (rotatert "ror")])
;; ---
;; Int Iterators.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 935eeb7fd8e..a887f3cd412 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -428,6 +428,34 @@
;; vcompressvector compress instruction
;; vmov whole vector register move
;; vector   unknown vector instruction
+;; 17. Crypto Vector instructions
+;; vandncrypto vector bitwise and-not instructions
+;; vbrevcrypto vector reverse bits in elements instructions
+;; vbrev8   crypto vector reverse bits in bytes instructions
+;; vrev8crypto vector reverse bytes instructions
+;; vclz crypto vector count leading Zeros instructions
+;; vctz crypto vector count trailing 

Re: [PATCH 2/4] RISC-V: Add crypto vector builtin function.

2023-12-05 Thread juzhe.zh...@rivai.ai
+if (!((strcmp (instance.base_name, "vghsh") == 0
+  || strcmp (instance.base_name, "vgmul") == 0
+  || strcmp (instance.base_name, "vaesz") == 0
+  || strcmp (instance.base_name, "vsha2ms") == 0
+  || strcmp (instance.base_name, "vsha2ch") == 0
+  || strcmp (instance.base_name, "vsha2cl") == 0
+  || strcmp (instance.base_name, "vsm3me") == 0)
+  && overloaded_p))
+  b.append_name (operand_suffixes[instance.op_info->op]);

Split them into another shape, so that you don't need to use strcmp.
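A minimal, hypothetical C sketch of the suggestion: record the naming rule as a property of the shape, so the name builder checks a flag instead of comparing base names with strcmp. The struct, shape names and suffixes below are illustrative only and do not mirror GCC's function_shape classes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a builtin shape: whether an overloaded name
   omits the operand suffix is decided by the shape, not by the base name.  */
struct shape
{
  bool overloaded_omits_op_suffix;
};

/* Shape whose overloaded names keep the operand suffix.  */
static const struct shape crypto_vv_shape = { false };
/* Separate shape for vghsh, vgmul, vaesz, vsha2ms, vsha2ch, vsha2cl, vsm3me.  */
static const struct shape crypto_vv_no_op_type_shape = { true };

/* Write BASE into BUF (size LEN), followed by "_" OP_SUFFIX unless the
   shape omits the suffix for overloaded names.  Returns snprintf's result.  */
static int
build_name (char *buf, size_t len, const struct shape *s, const char *base,
            const char *op_suffix, bool overloaded_p)
{
  assert (buf != NULL && len > 0 && s != NULL);
  if (overloaded_p && s->overloaded_omits_op_suffix)
    return snprintf (buf, len, "%s", base);
  return snprintf (buf, len, "%s_%s", base, op_suffix);
}
```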




juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2023-12-06 10:45
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; zhusonghe; panciyan; Feng Wang
Subject: [PATCH 2/4] RISC-V: Add crypto vector builtin function.
This patch adds the intrinsic functions of crypto vector based on the
intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
 
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto.
(class b_reverse):Ditto.
(class vwsll):Ditto.
(class clmul):Ditto.
(class vg_nhab):  Ditto.
(class crypto_vv):Ditto.
(class crypto_vi):Ditto.
(class vaeskf2_vsm3c):Ditto.
(class vsm3me):Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(DEF_VECTOR_CRYPTO_FUNCTION): New MACRO define of crypto vector.
(registered_function::overloaded_hash): Processing size_t uimm for C overloaded 
func.
(handle_pragma_vector): Add registration for crypto vector.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
* config/riscv/riscv-vector-builtins.h (struct crypto_function_group_info):
Add new struct definition for crypto vector.
* config/riscv/t-riscv: Add building dependency files.
* config/riscv/riscv-vector-crypto-builtins-avail.h:
New file to control enable.
* config/riscv/riscv-vector-crypto-builtins-functions.def:
New file. Definition of crypto vector.
* config/riscv/riscv-vector-crypto-builtins-types.def:
New file. New type definition for crypto vector.
---
.../riscv/riscv-vector-builtins-bases.cc  | 259 +-
.../riscv/riscv-vector-builtins-bases.h   |  28 ++
.../riscv/riscv-vector-builtins-shapes.cc |  66 -
.../riscv/riscv-vector-builtins-shapes.h  |   4 +
gcc/config/riscv/riscv-vector-builtins.cc | 152 +-
gcc/config/riscv/riscv-vector-builtins.def|   1 +
gcc/config/riscv/riscv-vector-builtins.h  |   8 +
.../riscv-vector-crypto-builtins-avail.h  |  25 ++
...riscv-vector-crypto-builtins-functions.def |  78 ++
.../riscv-vector-crypto-builtins-types.def|  21 ++
gcc/config/riscv/t-riscv  |   2 +
11 files changed, 641 insertions(+), 3 deletions(-)
create mode 100755 gcc/config/riscv/riscv-vector-crypto-builtins-avail.h
create mode 100755 gcc/config/riscv/riscv-vector-crypto-builtins-functions.def
create mode 100755 gcc/config/riscv/riscv-vector-crypto-builtins-types.def
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..6d52230e9ba 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2127,6 +2127,207 @@ public:
   }
};
+/* Below implements are vector crypto */
+/* Implements vandn.[vv,vx] */
+class vandn : public function_base
+{
+public:
+  rtx expand (function_expander ) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
+  case OP_TYPE_vx:
+return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
+  default:
+gcc_unreachable ();
+  }
+  }
+};
+
+/* Implements vrol/vror/clz/ctz.  */
+template
+class bitmanip : public function_base
+{
+public:
+  bool apply_tail_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool apply_mask_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool has_merge_operand_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  
+  rtx expand (function_expander ) const override
+  {
+switch (e.op_info->op)
+{
+  case OP_TYPE_v:
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_v (CODE, e.vector_mode ()));
+  case OP_TYPE_vx:
+  

Re: [PATCH] RISC-V: Remove useless modes

2023-12-05 Thread Kito Cheng
You could add [NFC] to the title of this kind of patch to declare that it is a
cleanup or refactoring patch that doesn't change any functionality or feature;
that would make it easier for reviewers. Anyway, LGTM as well.

On Wed, Dec 6, 2023 at 12:50 PM Li Xu  wrote:
>
> From: xuli 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.md: Remove.
> ---
>  gcc/config/riscv/riscv.md | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index a98918dfd43..0db659acfbe 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -235,7 +235,6 @@
>RVVM1x7DF,RVVM1x6DF,RVVM1x5DF,RVVM2x4DF,
>RVVM1x4DF,RVVM2x3DF,RVVM1x3DF,RVVM4x2DF,
>RVVM2x2DF,RVVM1x2DF,
> -  VNx2x1DF,VNx3x1DF,VNx4x1DF,VNx5x1DF,VNx6x1DF,VNx7x1DF,VNx8x1DF,
>V1QI,V2QI,V4QI,V8QI,V16QI,V32QI,V64QI,V128QI,V256QI,V512QI,V1024QI,V2048QI,V4096QI,
>V1HI,V2HI,V4HI,V8HI,V16HI,V32HI,V64HI,V128HI,V256HI,V512HI,V1024HI,V2048HI,
>V1SI,V2SI,V4SI,V8SI,V16SI,V32SI,V64SI,V128SI,V256SI,V512SI,V1024SI,
> --
> 2.17.1
>


[PATCH v2] LoongArch: Fix eh_return epilogue for normal returns

2023-12-05 Thread Yang Yujie
On LoongArch, the registers $r4 - $r7 (EH_RETURN_DATA_REGNO) will be saved
and restored in the function prologue and epilogue if the given function calls
__builtin_eh_return.  This causes the return value to be overwritten on normal
return paths and breaks a rare case of libgcc's _Unwind_RaiseException.
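A small C model of the rule the patch implements, under stated assumptions: the style values follow the patch comment (0 "epilogue", 1 "sibcall_epilogue", 2 "eh_return"), $r4-$r7 are bits 4-7 of the saved-register mask, and the data registers are skipped whenever the function calls __builtin_eh_return and the epilogue is not the eh_return one. The code that passes this flag from loongarch_expand_epilogue is not shown in this excerpt, so that last point is an assumption; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Epilogue styles as numbered in the patch comment (names hypothetical).  */
enum epilogue_style
{
  NORMAL_RETURN = 0,     /* "epilogue" */
  SIBCALL_RETURN = 1,    /* "sibcall_epilogue" */
  EXCEPTION_RETURN = 2   /* "eh_return" */
};

/* $r4-$r7 (EH_RETURN_DATA_REGNO 0..3) as bits 4..7 of a GP register mask.  */
enum { EH_DATA_FIRST = 4, EH_DATA_COUNT = 4 };

/* Given the mask of GP registers saved in the prologue, return the mask of
   registers the epilogue restores.  */
static uint32_t
restored_regs_mask (uint32_t saved_mask, bool calls_eh_return,
                    enum epilogue_style style)
{
  assert ((int) style >= 0 && (int) style <= 2);
  uint32_t eh_mask = ((1u << EH_DATA_COUNT) - 1u) << EH_DATA_FIRST;
  /* Restoring $r4-$r7 on a non-eh_return path would clobber the return
     value, so they are only reloaded for the eh_return epilogue.  */
  bool skip_eh_data_regs_p = calls_eh_return && style != EXCEPTION_RETURN;
  return skip_eh_data_regs_p ? (saved_mask & ~eh_mask) : saved_mask;
}
```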

gcc/ChangeLog:

* config/loongarch/loongarch.cc: Do not restore the saved eh_return
data registers ($r4-$r7) for a normal return of a function that calls
__builtin_eh_return elsewhere.
* config/loongarch/loongarch-protos.h: Same.
* config/loongarch/loongarch.md: Same.

gcc/testsuite/ChangeLog:

* gcc.target/eh_return-normal-return.c: New test.
---
 gcc/config/loongarch/loongarch-protos.h   |  2 +-
 gcc/config/loongarch/loongarch.cc | 41 ---
 gcc/config/loongarch/loongarch.md | 18 +++-
 .../gcc.target/eh_return-normal-return.c  | 31 ++
 4 files changed, 75 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/eh_return-normal-return.c

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index cb8fc36b086..af20b5d7132 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -60,7 +60,7 @@ enum loongarch_symbol_type {
 extern rtx loongarch_emit_move (rtx, rtx);
 extern HOST_WIDE_INT loongarch_initial_elimination_offset (int, int);
 extern void loongarch_expand_prologue (void);
-extern void loongarch_expand_epilogue (bool);
+extern void loongarch_expand_epilogue (int);
 extern bool loongarch_can_use_return_insn (void);
 
 extern bool loongarch_symbolic_constant_p (rtx, enum loongarch_symbol_type *);
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 3545e66a10e..9c0e0dd1b73 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1015,20 +1015,30 @@ loongarch_save_restore_reg (machine_mode mode, int regno, HOST_WIDE_INT offset,
 
 static void
 loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
- loongarch_save_restore_fn fn)
+ loongarch_save_restore_fn fn,
+ bool skip_eh_data_regs_p)
 {
   HOST_WIDE_INT offset;
 
   /* Save the link register and s-registers.  */
   offset = cfun->machine->frame.gp_sp_offset - sp_offset;
   for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
-if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
-  {
-   if (!cfun->machine->reg_is_wrapped_separately[regno])
- loongarch_save_restore_reg (word_mode, regno, offset, fn);
+{
+  /* Special care needs to be taken for $r4-$r7 (EH_RETURN_DATA_REGNO)
+when returning normally from a function that calls __builtin_eh_return.
+In this case, these registers are saved but should not be restored,
+or the return value may be clobbered.  */
 
-   offset -= UNITS_PER_WORD;
-  }
+  if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
+   {
+ if (!(cfun->machine->reg_is_wrapped_separately[regno]
+   || (skip_eh_data_regs_p
+   && GP_ARG_FIRST <= regno && regno < GP_ARG_FIRST + 4)))
+   loongarch_save_restore_reg (word_mode, regno, offset, fn);
+
+ offset -= UNITS_PER_WORD;
+   }
+}
 
   /* This loop must iterate over the same space as its companion in
  loongarch_compute_frame_info.  */
@@ -1297,7 +1307,7 @@ loongarch_expand_prologue (void)
GEN_INT (-step1));
   RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
   size -= step1;
-  loongarch_for_each_saved_reg (size, loongarch_save_reg);
+  loongarch_for_each_saved_reg (size, loongarch_save_reg, false);
 }
 
   /* Set up the frame pointer, if we're using one.  */
@@ -1382,11 +1392,11 @@ loongarch_can_use_return_insn (void)
   return reload_completed && cfun->machine->frame.total_size == 0;
 }
 
-/* Expand an "epilogue" or "sibcall_epilogue" pattern; SIBCALL_P
-   says which.  */
+/* Expand function epilogue for the following insn patterns:
+   "epilogue" (style == 0) / "sibcall_epilogue" (1) / "eh_return" (2).  */
 
 void
-loongarch_expand_epilogue (bool sibcall_p)
+loongarch_expand_epilogue (int style)
 {
   /* Split the frame into two.  STEP1 is the amount of stack we should
  deallocate before restoring the registers.  STEP2 is the amount we
@@ -1403,7 +1413,8 @@ loongarch_expand_epilogue (bool sibcall_p)
   bool need_barrier_p
 = (get_frame_size () + cfun->machine->frame.arg_pointer_offset) != 0;
 
-  if (!sibcall_p && loongarch_can_use_return_insn ())
+  /* Handle simple returns.  */
+  if (style == 0 && loongarch_can_use_return_insn ())
 {
   emit_jump_insn (gen_return ());
   return;
@@ -1479,7 +1490,8 @@ loongarch_expand_epilogue (bool sibcall_p)
 
   /* Restore the registers.  */
   

Re: [PATCH] RISC-V: Remove useless modes

2023-12-05 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-12-06 12:49
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Remove useless modes
From: xuli 
 
gcc/ChangeLog:
 
* config/riscv/riscv.md: Remove.
---
gcc/config/riscv/riscv.md | 1 -
1 file changed, 1 deletion(-)
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index a98918dfd43..0db659acfbe 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -235,7 +235,6 @@
   RVVM1x7DF,RVVM1x6DF,RVVM1x5DF,RVVM2x4DF,
   RVVM1x4DF,RVVM2x3DF,RVVM1x3DF,RVVM4x2DF,
   RVVM2x2DF,RVVM1x2DF,
-  VNx2x1DF,VNx3x1DF,VNx4x1DF,VNx5x1DF,VNx6x1DF,VNx7x1DF,VNx8x1DF,
   V1QI,V2QI,V4QI,V8QI,V16QI,V32QI,V64QI,V128QI,V256QI,V512QI,V1024QI,V2048QI,V4096QI,
   V1HI,V2HI,V4HI,V8HI,V16HI,V32HI,V64HI,V128HI,V256HI,V512HI,V1024HI,V2048HI,
   V1SI,V2SI,V4SI,V8SI,V16SI,V32SI,V64SI,V128SI,V256SI,V512SI,V1024SI,
-- 
2.17.1
 
 


Re: [PATCH] analyzer: deal with -fshort-enums

2023-12-05 Thread Alexandre Oliva
On Nov 22, 2023, Alexandre Oliva  wrote:

> Ah, nice, that's a great idea, I wish I'd thought of that!  Will do.

Sorry it took me so long, here it is.  I added two tests, so that,
regardless of the defaults, we get both circumstances tested, without
repetition.

Regstrapped on x86_64-linux-gnu.  Also tested on arm-eabi.  Ok to install?


analyzer: deal with -fshort-enums

On platforms that enable -fshort-enums by default, various switch-enum
analyzer tests fail, because apply_constraints_for_gswitch doesn't
expect the integral promotion type cast.  I've arranged for the code
to cope with those casts.


for  gcc/analyzer/ChangeLog

* region-model.cc (has_nondefault_cases_for_all_enum_values_p): Take
enumeral type as a parameter.
(region_model::apply_constraints_for_gswitch): Cope with
integral promotion type casts.

for  gcc/testsuite/ChangeLog

* gcc.dg/analyzer/switch-short-enum-1.c: New.
* gcc.dg/analyzer/switch-no-short-enum-1.c: New.
---
 gcc/analyzer/region-model.cc   |   27 +++-
 .../gcc.dg/analyzer/switch-no-short-enum-1.c   |  141 
 .../gcc.dg/analyzer/switch-short-enum-1.c  |  140 
 3 files changed, 304 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/switch-no-short-enum-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/switch-short-enum-1.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 2157ad2578b85..6a7a8bc9f4884 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5387,10 +5387,10 @@ has_nondefault_case_for_value_p (const gswitch *switch_stmt, tree int_cst)
has nondefault cases handling all values in the enum.  */
 
 static bool
-has_nondefault_cases_for_all_enum_values_p (const gswitch *switch_stmt)
+has_nondefault_cases_for_all_enum_values_p (const gswitch *switch_stmt,
+   tree type)
 {
   gcc_assert (switch_stmt);
-  tree type = TREE_TYPE (gimple_switch_index (switch_stmt));
   gcc_assert (TREE_CODE (type) == ENUMERAL_TYPE);
 
   for (tree enum_val_iter = TYPE_VALUES (type);
@@ -5426,6 +5426,23 @@ apply_constraints_for_gswitch (const switch_cfg_superedge ,
 {
   tree index  = gimple_switch_index (switch_stmt);
   const svalue *index_sval = get_rvalue (index, ctxt);
+  bool check_index_type = true;
+
+  /* With -fshort-enums, there may be a type cast.  */
+  if (ctxt && index_sval->get_kind () == SK_UNARYOP
+  && TREE_CODE (index_sval->get_type ()) == INTEGER_TYPE)
+{
+  const unaryop_svalue *unaryop = as_a  (index_sval);
+  if (unaryop->get_op () == NOP_EXPR
+ && is_a  (unaryop->get_arg ()))
+   if (const initial_svalue *initvalop = (as_a 
+  (unaryop->get_arg (
+ if (TREE_CODE (initvalop->get_type ()) == ENUMERAL_TYPE)
+   {
+ index_sval = initvalop;
+ check_index_type = false;
+   }
+}
 
   /* If we're switching based on an enum type, assume that the user is only
  working with values from the enum.  Hence if this is an
@@ -5437,12 +5454,14 @@ apply_constraints_for_gswitch (const switch_cfg_superedge ,
   ctxt
   /* Must be an enum value.  */
   && index_sval->get_type ()
-  && TREE_CODE (TREE_TYPE (index)) == ENUMERAL_TYPE
+  && (!check_index_type
+ || TREE_CODE (TREE_TYPE (index)) == ENUMERAL_TYPE)
   && TREE_CODE (index_sval->get_type ()) == ENUMERAL_TYPE
   /* If we have a constant, then we can check it directly.  */
   && index_sval->get_kind () != SK_CONSTANT
   && edge.implicitly_created_default_p ()
-  && has_nondefault_cases_for_all_enum_values_p (switch_stmt)
+  && has_nondefault_cases_for_all_enum_values_p (switch_stmt,
+index_sval->get_type ())
   /* Don't do this if there's a chance that the index is
 attacker-controlled.  */
   && !ctxt->possibly_tainted_p (index_sval))
diff --git a/gcc/testsuite/gcc.dg/analyzer/switch-no-short-enum-1.c b/gcc/testsuite/gcc.dg/analyzer/switch-no-short-enum-1.c
new file mode 100644
index 0..98f6d91f97481
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/switch-no-short-enum-1.c
@@ -0,0 +1,141 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-short-enums" } */
+/* { dg-skip-if "default" { ! short_enums } } */
+
+#include "analyzer-decls.h"
+
+/* Verify the handling of "switch (enum_value)".  */
+
+enum e
+{
+ E_VAL0,
+ E_VAL1,
+ E_VAL2
+};
+
+/* Verify that we assume that "switch (enum)" doesn't follow implicit
+   "default" if all enum values have cases  */
+
+int test_all_values_covered_implicit_default_1 (enum e x)
+{
+  switch (x)
+{
+case E_VAL0:
+  return 1066;
+case E_VAL1:
+  return 1776;
+case E_VAL2:
+  return 1945;
+}
+  __analyzer_dump_path (); /* { dg-bogus "path" } */
+}
+

Re: [PATCH RFA (libstdc++)] c++: partial ordering of object parameter [PR53499]

2023-12-05 Thread waffl3x



On Tuesday, December 5th, 2023 at 9:36 PM, Jason Merrill  
wrote:


> 
> 
> On 12/5/23 23:23, waffl3x wrote:
> 
> > Does CWG2834 affect this weird edge case?
> 
> 
> 2834 affects all partial ordering with explicit object member functions;

Both in relation to each other, and to iobj and static member functions?

> currently the working draft says that they get an additional fake object
> parameter, which is clearly wrong.

Yeah, that's really weird. I was under the impression that's how static
member functions worked, I didn't realize it was also how it's
specified for xobj member functions. I still find it weird for static
member functions. I guess I'll have to study template partial ordering,
what it is, how it's specified and whatnot. I think I understand it
intuitively but not at a language law level.

> > I couldn't quite grasp the
> > standardese so I'm not really sure. These are a few cases from a test
> > that I finalized last night. I ran this by jwakely and he agreed that
> > the behavior as shown is correct by the standard. I'll also add that
> > this is also the current behavior of my patch.
> > 
> > template <typename T> concept Constrain = true;
> > 
> > inline constexpr int iobj_fn = 5;
> > inline constexpr int xobj_fn = 10;
> > 
> > struct S {
> > int f(Constrain auto) { return iobj_fn; };
> > int f(this S&&, auto) { return xobj_fn; };
> > 
> > int g(auto) { return iobj_fn; };
> > int g(this S&&, Constrain auto) { return xobj_fn; };
> > };
> > int main() {
> > S s{};
> > s.f (0) == iobj_fn;
> 
> 
> Yes, the xobj fn isn't viable because it takes an rvalue ref.
> 
> > static_cast(s).f (0) == iobj_fn;
> 
> 
> Yes, the functions look the same to partial ordering, so we compare
> constraints and the iobj fn is more constrained.
> 
> > s.g (0) == iobj_fn;
> 
> 
> Yes, the xobj fn isn't viable.
> 
> > static_cast(s).g (0) == xobj_fn;
> 
> 
> Yes, the xobj fn is more constrained.
> 
> Jason

It's funny to see you effortlessly agree with what took me a number of
hours of pondering.

So just to confirm, you're also saying the changes proposed by CWG2834
will not change the behavior of this example?

Alex


Re: [PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-12-05 Thread Alexandre Oliva
On Dec  5, 2023, Alexandre Oliva  wrote:

> Maybe we should narrow it down to targets in which weak undefined
> symbols are available with the expected semantics, and where the symbol
> is known to have ever been defined in libc.  On it...

This patch reintroduces the weak symbol reference only on GNU systems,
where they're most likely to be useful.  If other systems could benefit,
we can always add them later.

> Or maybe a weak definition (or weak alias to a definition) in that file
> would enable us to test whether the weak definition was preempted

Uhh...  'cept libc wouldn't preempt from libstdc++; the opposite would
occur, but that doesn't help.


Regstrapped on x86_64-linux-gnu, also tested with
ac_cv_func___cxa_thread_atexit_impl=no.  Ok to (re)install?


libsupc++: try cxa_thread_atexit_impl at runtime

g++.dg/tls/thread_local-order2.C fails when the toolchain is built for
a platform that lacks __cxa_thread_atexit_impl, even if the program is
built and run using that toolchain on a (later) platform that offers
__cxa_thread_atexit_impl.

This patch adds runtime testing for __cxa_thread_atexit_impl on select
platforms (GNU variants, for starters) that support weak symbols.
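A minimal C sketch of the runtime-detection idea, assuming an ELF/GNU target where an undefined weak reference resolves to a null address. The name optional_hook is hypothetical (it is not a libc symbol): the weak declaration lets the program link even when nothing defines it, and the null test selects the fallback at run time, the same shape as the "if (__cxa_thread_atexit_impl) return ...;" check in the patch.

```c
#include <assert.h>

/* Hypothetical optional entry point; nothing in this program defines it.
   Declaring it weak lets the link succeed on ELF/GNU targets, where the
   unresolved reference is given address zero.  */
extern int optional_hook (int value) __attribute__ ((weak));

/* Local fallback used when no definition was linked in.  */
static int
fallback (int value)
{
  return -value;
}

/* Prefer the provided implementation if it exists, else fall back.  */
static int
call_hook_or_fallback (int value)
{
  if (optional_hook)
    return optional_hook (value);
  return fallback (value);
}
```

If some other object or a later libc supplies optional_hook, the same binary calls it without being rebuilt; on targets without these weak-undefined semantics the link can fail instead.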


for  libstdc++-v3/ChangeLog

* config/os/gnu-linux/os_defines.h
(_GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL): Define.
* libsupc++/atexit_thread.cc [__GXX_WEAK__ &&
_GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL]
(__cxa_thread_atexit): Add dynamic detection of
__cxa_thread_atexit_impl.
---
 libstdc++-v3/config/os/gnu-linux/os_defines.h |5 +
 libstdc++-v3/libsupc++/atexit_thread.cc   |   23 ++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h 
b/libstdc++-v3/config/os/gnu-linux/os_defines.h
index 87317031fcd71..a2e4baec069d5 100644
--- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
+++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
@@ -60,6 +60,11 @@
 # define _GLIBCXX_HAVE_FLOAT128_MATH 1
 #endif
 
+// Enable __cxa_thread_atexit to rely on a (presumably libc-provided)
+// __cxa_thread_atexit_impl, if it happens to be defined, even if
+// configure couldn't find it during the build.
+#define _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL 1
+
 #ifdef __linux__
 // The following libpthread properties only apply to Linux, not GNU/Hurd.
 
diff --git a/libstdc++-v3/libsupc++/atexit_thread.cc 
b/libstdc++-v3/libsupc++/atexit_thread.cc
index 9346d50f5dafe..aa4ed5312bfe3 100644
--- a/libstdc++-v3/libsupc++/atexit_thread.cc
+++ b/libstdc++-v3/libsupc++/atexit_thread.cc
@@ -138,11 +138,32 @@ namespace {
   }
 }
 
+#if __GXX_WEAK__ && _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL
+extern "C"
+int __attribute__ ((__weak__))
+__cxa_thread_atexit_impl (void (_GLIBCXX_CDTOR_CALLABI *func) (void *),
+ void *arg, void *d);
+#endif
+
+// ??? We can't make it an ifunc, can we?
 extern "C" int
 __cxxabiv1::__cxa_thread_atexit (void (_GLIBCXX_CDTOR_CALLABI *dtor)(void *),
-void *obj, void */*dso_handle*/)
+void *obj, void *dso_handle)
   _GLIBCXX_NOTHROW
 {
+#if __GXX_WEAK__ && _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL
+  if (__cxa_thread_atexit_impl)
+// Rely on a (presumably libc-provided) __cxa_thread_atexit_impl,
+// if it happens to be defined, even if configure couldn't find it
+// during the build.  _GLIBCXX_MAY_HAVE___CXA_THREAD_ATEXIT_IMPL
+// may be defined e.g. in os_defines.h on platforms where some
+// versions of libc have a __cxa_thread_atexit_impl definition,
+// but whose earlier versions didn't.  This enables programs built
+// by toolchains compatible with earlier libc versions to still
+// benefit from a libc-provided __cxa_thread_atexit_impl.
+return __cxa_thread_atexit_impl (dtor, obj, dso_handle);
+#endif
+
   // Do this initialization once.
   if (__gthread_active_p ())
 {


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH V3 3/3] split complicate constant to memory

2023-12-05 Thread Jiufu Guo
Hi,

Sometimes a complicated constant takes 3 (or more) instructions to
build. Generally speaking, that is not as fast as loading it from the
constant pool (see the discussions in PR63281):
* "ld" is one instruction.  If we count the "address/toc"
  adjustment, it is 2 instructions (the high part of the address
  computation may further be optimized into a nop by the linker).
  And "pld" may need fewer cycles.
* In testing (SPEC2017), setting the threshold to "> 2" (instead
  of "> 3") could give better and more stable runtimes.

On SPEC2017, among the visible performance changes, 500.perlbench_r
shows a runtime improvement of about 1.8% (-O2, P10) with the patch.
The slowdowns seen on other benchmarks were investigated and are not
caused by this patch.
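To make the "> 2" threshold concrete, here is a C model built on the recursive estimate that the pre-patch num_insns_constant_gpr used for 64-bit values. The base cases (one li for a 16-bit signed value, one lis when the low 16 bits are zero and the value fits in 32 signed bits) are assumptions, and the function names are hypothetical; this is a simplified sketch, not GCC's actual cost logic.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of VALUE (like sext_hwi (value, 32)).  */
static int64_t
sext32 (int64_t value)
{
  return (int64_t) (((uint64_t) value & 0xffffffffULL) ^ 0x80000000ULL)
         - (int64_t) 0x80000000LL;
}

/* Model of the old 64-bit estimate.  Assumes ">>" on negative values is
   an arithmetic shift, as it is with GCC and Clang.  */
static int
model_num_insns (int64_t value)
{
  if (value >= -32768 && value <= 32767)
    return 1;                                   /* li */
  if ((value & 0xffff) == 0 && (value >> 31 == 0 || value >> 31 == -1))
    return 1;                                   /* lis */

  int64_t low = sext32 (value);
  int64_t high = value >> 31;

  if (high == 0 || high == -1)
    return 2;                                   /* lis + ori */

  high >>= 1;
  if (low == 0 || low == high)
    return model_num_insns (high) + 1;
  else if (high == 0)
    return model_num_insns (low) + 1;
  else
    return model_num_insns (high) + model_num_insns (low) + 1;
}

/* The patch's policy: use the constant pool when building the value
   inline would take more than 2 instructions.  */
static int
prefer_constant_pool (int64_t value)
{
  return model_num_insns (value) > 2;
}
```

For the two constants in parall_5insn_const.c the model returns 5, which agrees with that test's "2 lis + 2 ori + 1 rldimi" comment, so both would go to the pool under this policy.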

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636566.html
this version is refreshed against the latest code.

Bootstrap & regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

PR target/63281

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
complicated constants to memory.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const_anchors.c: Update to test final-rtl. 
* gcc.target/powerpc/parall_5insn_const.c: Update to keep original test
point.
* gcc.target/powerpc/pr106550.c: Likewise.
* gcc.target/powerpc/pr106550_1.c: Likewise.
* gcc.target/powerpc/pr87870.c: Update according to latest behavior.
* gcc.target/powerpc/pr93012.c: Likewise.

---
 gcc/config/rs6000/rs6000.cc | 16 
 .../gcc.target/powerpc/const_anchors.c  |  5 ++---
 .../gcc.target/powerpc/parall_5insn_const.c | 14 --
 gcc/testsuite/gcc.target/powerpc/pr106550.c | 17 +++--
 gcc/testsuite/gcc.target/powerpc/pr106550_1.c   | 15 +--
 gcc/testsuite/gcc.target/powerpc/pr87870.c  |  5 -
 gcc/testsuite/gcc.target/powerpc/pr93012.c  |  5 -
 7 files changed, 66 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 2e074a21a05..e44a6da91ae 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10271,6 +10271,22 @@ rs6000_emit_set_const (rtx dest, rtx source)
  c = sext_hwi (c, 32);
  emit_move_insn (lo, GEN_INT (c));
}
+
+  /* If it can be loaded from the constant pool and that is profitable.  */
+  else if (base_reg_operand (dest, mode)
+  && num_insns_constant (source, mode) > 2)
+   {
+ rtx sym = force_const_mem (mode, source);
+ if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
+ && use_toc_relative_ref (XEXP (sym, 0), mode))
+   {
+ rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
+ sym = gen_const_mem (mode, toc);
+ set_mem_alias_set (sym, get_TOC_alias_set ());
+   }
+
+ emit_insn (gen_rtx_SET (dest, sym));
+   }
   else
rs6000_emit_set_long_const (dest, c);
   break;
diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
index 542e2674b12..188744165f2 100644
--- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
+++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target has_arch_ppc64 } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fdump-rtl-final" } */
 
 #define C1 0x2351847027482577ULL
 #define C2 0x2351847027482578ULL
@@ -16,5 +16,4 @@ void __attribute__ ((noinline)) foo1 (long long *a, long long 
b)
   if (b)
 *a++ = C2;
 }
-
-/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
+/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c 
b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
index e3a9a7264cf..df0690b90be 100644
--- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
+++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
@@ -9,8 +9,18 @@
 void __attribute__ ((noinline)) foo (unsigned long long *a)
 {
   /* 2 lis + 2 ori + 1 rldimi for each constant.  */
-  *a++ = 0x800aabcdc167fa16ULL;
-  *a++ = 0x7543a876867f616ULL;
+  {
+register long long d asm("r0") = 0x800aabcdc167fa16ULL;
+long long n;
+asm("mr %0, %1" : "=r"(n) : "r"(d));
+*a++ = n;
+  }
+  {
+register long long d asm("r0") = 0x7543a876867f616ULL;
+long long n;
+asm("mr %0, %1" : "=r"(n) : "r"(d));
+*a++ = n;
+  }
 }
 
 long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL};
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c 
b/gcc/testsuite/gcc.target/powerpc/pr106550.c
index 74e395331ab..5eca2b2f701 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr106550.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
@@ -1,12 +1,25 @@
 /* PR 

[PATCH V3 2/3] Using pli for constant splitting

2023-12-05 Thread Jiufu Guo
Hi,

When building a constant such as r120=0x, which does not fit 'li' or
'lis', 'pli' is used via 'emit_move_insn'.

However, for a complicated constant such as 0xULL, when
'rs6000_emit_set_long_const' splits the constant recursively, it fails
to use 'pli' to build the half-part constant 0x.

'rs6000_emit_set_long_const' can be updated to use 'pli' to build half
of the constant when necessary.  For example, for 0xULL,
"pli 3,1717986918; rldimi 3,3,32,0" can be used.

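As a sanity check of the quoted two-instruction sequence, here is a small C model of both instructions, written from the Power ISA semantics as I understand them; all function names are hypothetical. pli is modeled as loading a sign-extended 34-bit immediate, and "rldimi rt,rs,32,0" as rotating rs left by 32 and inserting it under a mask of the high 32 bits. fits_signed_34bit is meant to capture what SIGNED_INTEGER_34BIT_P tests for.

```c
#include <assert.h>
#include <stdint.h>

/* Does C fit a signed 34-bit immediate, as pli requires?  */
static int
fits_signed_34bit (int64_t c)
{
  return c >= -((int64_t) 1 << 33) && c <= ((int64_t) 1 << 33) - 1;
}

/* Model of "pli rt,imm": the immediate is sign-extended to 64 bits.  */
static uint64_t
model_pli (int64_t imm)
{
  assert (fits_signed_34bit (imm));
  return (uint64_t) imm;
}

/* Rotate X left by SH bits (0 <= SH < 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned sh)
{
  return sh ? (x << sh) | (x >> (64 - sh)) : x;
}

/* Model of "rldimi rt,rs,sh,mb": insert rotl64 (rs, sh) into rt under the
   mask of IBM bits mb..63-sh (bit 0 is the MSB).  Assumes mb <= 63 - sh,
   i.e. a mask that does not wrap around.  */
static uint64_t
model_rldimi (uint64_t rt, uint64_t rs, unsigned sh, unsigned mb)
{
  uint64_t mask = (~0ULL >> mb) & (~0ULL << sh);
  return (rotl64 (rs, sh) & mask) | (rt & ~mask);
}
```

Running the model on "pli 3,1717986918; rldimi 3,3,32,0" yields 0x6666666666666666: the 32-bit value 1717986918 (0x66666666) replicated into both halves of r3.
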
Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636567.html
this version is refreshed and adds a new testcase.

Bootstrap passes on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
pli for 34-bit constants.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const_split_pli.c: New test.

---
 gcc/config/rs6000/rs6000.cc| 7 +++
 gcc/testsuite/gcc.target/powerpc/const_split_pli.c | 9 +
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const_split_pli.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index dbdc72dce5d..2e074a21a05 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10509,6 +10509,13 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c, 
int *num_insns)
   GEN_INT (0x)));
   };
 
+  if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (c))
+{
+  /* li/lis/pli */
+  count_or_emit_insn (dest, GEN_INT (c));
+  return;
+}
+
   if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && !(ud1 & 0x8000)))
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/const_split_pli.c 
b/gcc/testsuite/gcc.target/powerpc/const_split_pli.c
new file mode 100644
index 000..626c93084aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const_split_pli.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target power10_ok } */
+
+unsigned long long msk66() { return 0xULL; }
+
+/* { dg-final { scan-assembler-times {\mpli\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mli\M} } } */
+/* { dg-final { scan-assembler-not {\mlis\M} } } */
-- 
2.25.1



[PATCH V3 1/3]rs6000: update num_insns_constant for 2 insns

2023-12-05 Thread Jiufu Guo
Hi,

Trunk GCC can now build more constants with two instructions,
e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic", so
num_insns_constant should be updated as well.

Function "rs6000_emit_set_long_const" is used to build complicated
constants, and "num_insns_constant_gpr" is used to compute how
many instructions are needed to build the constant, so these
two functions should be kept in sync.

The idea of this patch is to reuse "rs6000_emit_set_long_const" to
compute/record the number of instructions (when only counting, no
instructions are emitted).

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636565.html
this version makes "rs6000_emit_set_long_const" use a condition to
select either "computing the insn count" or "emitting the insn", and
keeps both paths together so they do not get out of sync in the future.
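The count-or-emit design can be sketched in plain C. Everything here is hypothetical: insn_seq stands in for GCC's insn stream, and set_long_const uses a deliberately simplified li / lis+ori / five-insn split rather than the real rs6000 logic. The point it illustrates is that the counting query runs through the same code path as emission, so the two cannot disagree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GCC's insn stream: only mnemonics are kept.  */
struct insn_seq
{
  const char *insns[8];
  int len;
};

/* Record INSN in SEQ or, when NUM_INSNS is non-null, only count it.  */
static void
count_or_emit (struct insn_seq *seq, int *num_insns, const char *insn)
{
  if (num_insns)
    (*num_insns)++;
  else
    {
      assert (seq->len < 8);
      seq->insns[seq->len++] = insn;
    }
}

/* Simplified model (NOT the real rs6000 algorithm): 16-bit signed values
   take one li, 32-bit signed values take lis plus ori when the low half
   is non-zero, and anything wider takes lis/ori/sldi/oris/ori.  */
static void
set_long_const (struct insn_seq *seq, int64_t c, int *num_insns)
{
  if (c >= INT16_MIN && c <= INT16_MAX)
    {
      count_or_emit (seq, num_insns, "li");
      return;
    }
  if (c >= INT32_MIN && c <= INT32_MAX)
    {
      count_or_emit (seq, num_insns, "lis");
      if (c & 0xffff)
        count_or_emit (seq, num_insns, "ori");
      return;
    }
  count_or_emit (seq, num_insns, "lis");
  count_or_emit (seq, num_insns, "ori");
  count_or_emit (seq, num_insns, "sldi");
  count_or_emit (seq, num_insns, "oris");
  count_or_emit (seq, num_insns, "ori");
}

/* Counting mode: no sequence is passed, like the NULL dest in the patch.  */
static int
num_insns_constant (int64_t c)
{
  int n = 0;
  set_long_const (NULL, c, &n);
  return n;
}
```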

Bootstrap & regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add new
parameter to record number of instructions to build the constant.
(num_insns_constant_gpr): Call rs6000_emit_set_long_const to compute
num_insn.

---
 gcc/config/rs6000/rs6000.cc | 272 ++--
 1 file changed, 137 insertions(+), 135 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3dfd79c4c43..dbdc72dce5d 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1115,7 +1115,7 @@ static tree rs6000_handle_longcall_attribute (tree *, 
tree, tree, int, bool *);
 static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
-static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
+static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, int * = nullptr);
 static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t,
@@ -6054,21 +6054,9 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
 
   else if (TARGET_POWERPC64)
 {
-  HOST_WIDE_INT low = sext_hwi (value, 32);
-  HOST_WIDE_INT high = value >> 31;
-
-  if (high == 0 || high == -1)
-   return 2;
-
-  high >>= 1;
-
-  if (low == 0 || low == high)
-   return num_insns_constant_gpr (high) + 1;
-  else if (high == 0)
-   return num_insns_constant_gpr (low) + 1;
-  else
-   return (num_insns_constant_gpr (high)
-   + num_insns_constant_gpr (low) + 1);
+  int num_insns = 0;
+  rs6000_emit_set_long_const (NULL, value, _insns);
+  return num_insns;
 }
 
   else
@@ -10494,14 +10482,13 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
*shift, HOST_WIDE_INT *mask)
 
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
-   lis, ori and shl instructions.  */
+   lis, ori and shl instructions.  If NUM_INSNS is not NULL, then
+   only increase *NUM_INSNS as the number of insns, and do not output
+   real insns.  */
 
 static void
-rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
+rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c, int *num_insns)
 {
-  rtx temp;
-  int shift;
-  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0x;
@@ -10509,168 +10496,183 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
c)
   ud3 = (c >> 32) & 0x;
   ud4 = (c >> 48) & 0x;
 
-  if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
-  || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
-emit_move_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
+  /* This lambda is used to emit one insn or just increase the insn count.
+ When counting the insn number, no need to emit the insn.  Here, two
+ kinds of insns are needed: move and rldimi. */
+  auto count_or_emit_insn = [_insns] (rtx dest, rtx op1, rtx op2 = NULL) {
+if (num_insns)
+  (*num_insns)++;
+else if (!op2)
+  emit_move_insn (dest, op1);
+else
+  emit_insn (gen_rotldi3_insert_3 (dest, op1, GEN_INT (32), op2,
+  GEN_INT (0x)));
+  };
 
-  else if ((ud4 == 0x && ud3 == 0x && (ud2 & 0x8000))
-  || (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000)))
+  if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
+  || (ud4 == 0 && ud3 == 0 && ud2 == 0 && !(ud1 & 0x8000)))
 {
-  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  /* li */
+  count_or_emit_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
+  return;
+}
+
+  rtx temp = num_insns ? nullptr
+  : can_create_pseudo_p () ? gen_reg_rtx 

Re: [PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-12-05 Thread Alexandre Oliva
On Dec  5, 2023, David Edelsohn  wrote:

> The error is:
> ld: 0711-317 ERROR: Undefined symbol: __cxa_thread_atexit_impl
> from the new, weak reference.

Thanks.

> Also, earlier in atexit_thread.cc, there is another definition protected by

> _GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL

> not utilized by the new reference.

*nod*, the one I recently added covers a different situation, in which
the _impl symbol is not found in libc at build time.



Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-05 Thread Michael Meissner
On Wed, Dec 06, 2023 at 10:22:57AM +0800, Kewen.Lin wrote:
> I'd expect you use UNSPEC_MMA_EXTRACT to extract V16QI from the result of
> lxvp, the current define_insn_and_split "*vsx_disassemble_pair" should be
> able to take care of it further (eg: reg and regoff).
> 
> BR,
> Kewen

With Peter's subreg patch, UNSPEC_MMA_EXTRACT would produce two moves with
SUBREGs:

For an FMA-type loop such as:

union vector_hack2 {
  vector  unsigned char vuc[2];
  vector double v[2];
};

static void
use_mma_ld_st_normal_no_unroll (double * __restrict__ r,
const double * __restrict__ a,
const double * __restrict__ b,
size_t num)
{
  __vector_pair * __restrict__ v_r = ( __vector_pair * __restrict__) r;
  const __vector_pair * __restrict__ v_a = (const __vector_pair * __restrict__) a;
  const __vector_pair * __restrict__ v_b = (const __vector_pair * __restrict__) b;
  size_t num_vector = num / (2 * (sizeof (vector double) / sizeof (double)));
  size_t num_scalar = num % (2 * (sizeof (vector double) / sizeof (double)));
  size_t i;
  union vector_hack2 a_union;
  union vector_hack2 b_union;
  union vector_hack2 r_union;
  vector double a_hi, a_lo;
  vector double b_hi, b_lo;
  vector double r_hi, r_lo;
  union vector_hack result_hi, result_lo;

#pragma GCC unroll 0
  for (i = 0; i < num_vector; i++)
{
  __builtin_vsx_disassemble_pair (_union.vuc, _a[i]);
  __builtin_vsx_disassemble_pair (_union.vuc, _b[i]);
  __builtin_vsx_disassemble_pair (_union.vuc, _r[i]);

  a_hi = a_union.v[0];
  b_hi = b_union.v[0];
  r_hi = r_union.v[0];

  a_lo = a_union.v[1];
  b_lo = b_union.v[1];
  r_lo = r_union.v[1];

  result_hi.v = (a_hi * b_hi) + r_hi;
  result_lo.v = (a_lo * b_lo) + r_lo;

  __builtin_vsx_build_pair (_r[i], result_hi.vuc, result_lo.vuc);
}

  if (num_scalar)
{
  r += num_vector * (2 * (sizeof (vector double) / sizeof (double)));
  a += num_vector * (2 * (sizeof (vector double) / sizeof (double)));
  b += num_vector * (2 * (sizeof (vector double) / sizeof (double)));

#pragma GCC unroll 0
  for (i = 0; i < num_scalar; i++)
 r[i] += (a[i] * b[i]);
}

  return;
}

Peter's code would produce the following in the inner loop:

(insn 16 15 19 4 (set (reg:OO 133 [ _43 ])
(mem:OO (plus:DI (reg/v/f:DI 150 [ a ])
(reg:DI 143 [ ivtmp.1088 ])) [6 MEM[(__vector_pair *)a_30(D) + 
ivtmp.1088_88 * 1]+0 S32 A128])) "p10-fma.h":3285:1 2181 {*movoo}
 (nil))
(insn 19 16 22 4 (set (reg:OO 136 [ _48 ])
(mem:OO (plus:DI (reg/v/f:DI 151 [ b ])
(reg:DI 143 [ ivtmp.1088 ])) [6 MEM[(__vector_pair *)b_31(D) + 
ivtmp.1088_88 * 1]+0 S32 A128])) "p10-fma.h":3285:1 2181 {*movoo}
 (nil))
(insn 22 19 25 4 (set (reg:OO 139 [ _53 ])
(mem:OO (plus:DI (reg/v/f:DI 149 [ r ])
(reg:DI 143 [ ivtmp.1088 ])) [6 MEM[(__vector_pair *)r_29(D) + 
ivtmp.1088_88 * 1]+0 S32 A128])) "p10-fma.h":3285:1 2181 {*movoo}
 (nil))
(insn 25 22 26 4 (set (reg:V2DF 117 [ _6 ])
(fma:V2DF (subreg:V2DF (reg:OO 136 [ _48 ]) 16)
(subreg:V2DF (reg:OO 133 [ _43 ]) 16)
(subreg:V2DF (reg:OO 139 [ _53 ]) 16))) "p10-fma.h":3319:35 1265 
{*vsx_fmav2df4}
 (nil))
(insn 26 25 27 4 (set (reg:V2DF 118 [ _8 ])
(fma:V2DF (subreg:V2DF (reg:OO 136 [ _48 ]) 0)
(subreg:V2DF (reg:OO 133 [ _43 ]) 0)
(subreg:V2DF (reg:OO 139 [ _53 ]) 0))) "p10-fma.h":3320:35 1265 
{*vsx_fmav2df4}
 (expr_list:REG_DEAD (reg:OO 139 [ _53 ])
(expr_list:REG_DEAD (reg:OO 136 [ _48 ])
(expr_list:REG_DEAD (reg:OO 133 [ _43 ])
(nil)
(insn 27 26 28 4 (set (reg:OO 142 [ _59 ])
(unspec:OO [
(subreg:V16QI (reg:V2DF 117 [ _6 ]) 0)
(subreg:V16QI (reg:V2DF 118 [ _8 ]) 0)
] UNSPEC_VSX_ASSEMBLE)) 2183 {*vsx_assemble_pair}
 (expr_list:REG_DEAD (reg:V2DF 118 [ _8 ])
(expr_list:REG_DEAD (reg:V2DF 117 [ _6 ])
(nil

In theory, you could also get rid of the UNSPEC_VSX_ASSEMBLE by using SUBREGs.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] RISC-V: Remove useless modes

2023-12-05 Thread Li Xu
From: xuli 

gcc/ChangeLog:

* config/riscv/riscv.md: Remove unused VNx2x1DF...VNx8x1DF modes.
---
 gcc/config/riscv/riscv.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index a98918dfd43..0db659acfbe 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -235,7 +235,6 @@
   RVVM1x7DF,RVVM1x6DF,RVVM1x5DF,RVVM2x4DF,
   RVVM1x4DF,RVVM2x3DF,RVVM1x3DF,RVVM4x2DF,
   RVVM2x2DF,RVVM1x2DF,
-  VNx2x1DF,VNx3x1DF,VNx4x1DF,VNx5x1DF,VNx6x1DF,VNx7x1DF,VNx8x1DF,
   
V1QI,V2QI,V4QI,V8QI,V16QI,V32QI,V64QI,V128QI,V256QI,V512QI,V1024QI,V2048QI,V4096QI,
   V1HI,V2HI,V4HI,V8HI,V16HI,V32HI,V64HI,V128HI,V256HI,V512HI,V1024HI,V2048HI,
   V1SI,V2SI,V4SI,V8SI,V16SI,V32SI,V64SI,V128SI,V256SI,V512SI,V1024SI,
-- 
2.17.1



RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-05 Thread Tamar Christina
> > > +
> > > +  tree truth_type = truth_type_for (vectype_op);  machine_mode mode =
> > > + TYPE_MODE (truth_type);  int ncopies;
> > > +
> 
> more line break issues ... (also below, check yourself)
> 
> shouldn't STMT_VINFO_VECTYPE already match truth_type here?  If not
> it looks to be set wrongly (or shouldn't be set at all)
> 

Fixed. I now leverage the existing vect_recog_bool_pattern to update the types
if needed, and determine the initial type in vect_get_vector_types_for_stmt.

> > > +  if (slp_node)
> > > +ncopies = 1;
> > > +  else
> > > +ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > > +
> > > +  vec_loop_masks *masks = _VINFO_MASKS (loop_vinfo);  bool
> > > + masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > +
> 
> what about with_len?

Should be easy to add, but I don't know how it works.

> 
> > > +  /* Analyze only.  */
> > > +  if (!vec_stmt)
> > > +{
> > > +  if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > + {
> > > +   if (dump_enabled_p ())
> > > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +"can't vectorize early exit because the "
> > > +"target doesn't support flag setting vector "
> > > +"comparisons.\n");
> > > +   return false;
> > > + }
> > > +
> > > +  if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> 
> Why NE_EXPR?  This looks wrong.  Or vectype_op is wrong if you're
> emitting
> 
>  mask = op0 CMP op1;
>  if (mask != 0)
> 
> I think you need to check for CMP, not NE_EXPR.

Well, CMP is checked by vectorizable_comparison_1, but I realized this
check wasn't checking what I wanted, and the cbranch requirements
already cover it, so I removed it.

> 
> > > + {
> > > +   if (dump_enabled_p ())
> > > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +"can't vectorize early exit because the "
> > > +"target does not support boolean vector "
> > > +"comparisons for type %T.\n", truth_type);
> > > +   return false;
> > > + }
> > > +
> > > +  if (ncopies > 1
> > > +   && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > + {
> > > +   if (dump_enabled_p ())
> > > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +"can't vectorize early exit because the "
> > > +"target does not support boolean vector OR for "
> > > +"type %T.\n", truth_type);
> > > +   return false;
> > > + }
> > > +
> > > +  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, 
> > > code, gsi,
> > > +   vec_stmt, slp_node, cost_vec))
> > > + return false;
> 
> I suppose vectorizable_comparison_1 will check this again, so the above
> is redundant?
> 

The IOR? No, vectorizable_comparison_1 doesn't reduce, so it may not check
it, depending on the condition.

> > > +  /* Determine if we need to reduce the final value.  */
> > > +  if (stmts.length () > 1)
> > > +{
> > > +  /* We build the reductions in a way to maintain as much 
> > > parallelism as
> > > +  possible.  */
> > > +  auto_vec workset (stmts.length ());
> > > +  workset.splice (stmts);
> > > +  while (workset.length () > 1)
> > > + {
> > > +   new_temp = make_temp_ssa_name (truth_type, NULL,
> > > "vexit_reduc");
> > > +   tree arg0 = workset.pop ();
> > > +   tree arg1 = workset.pop ();
> > > +   new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0,
> > > arg1);
> > > +   vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > +_gsi);
> > > +   if (slp_node)
> > > + slp_node->push_vec_def (new_stmt);
> > > +   else
> > > + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > > +   workset.quick_insert (0, new_temp);
> 
> Reduction epilogue handling has similar code to reduce a set of vectors
> to a single one with an operation.  I think we want to share that code.
> 

I've taken a look, but that code isn't suitable here since the two have
different constraints.  I don't require an in-order reduction, since for the
comparison all we care about is whether any lane has a bit set.  This means:

1. we can reduce using a fast operation like IOR.
2. we can reduce with as much parallelism as possible.

The comparison is now on the critical path of the loop, unlike live
reductions, which are always at the end, so using the live reduction code
resulted in a slowdown since it creates a longer dependency chain.
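A small C model of why the worklist shape matters (names hypothetical; plain 64-bit words stand in for mask vectors). or_reduce_balanced mimics the quoted loop: pop two entries, OR them, insert the result at the front, while tracking how many ORs deep each entry is. or_reduce_linear is a left-to-right fold for comparison. Both return the same value, but for 8 inputs the balanced form is 3 ORs deep versus 7.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_WORDS 64

/* Worklist reduction: pop the last two entries, OR them, and insert the
   result at index 0.  *DEPTH receives the length of the OR chain.  */
static uint64_t
or_reduce_balanced (const uint64_t *in, int n, int *depth)
{
  uint64_t val[MAX_WORDS];
  int d[MAX_WORDS];

  assert (n > 0 && n <= MAX_WORDS);
  for (int i = 0; i < n; i++)
    {
      val[i] = in[i];
      d[i] = 0;
    }

  while (n > 1)
    {
      uint64_t a = val[n - 1], b = val[n - 2];
      int da = d[n - 1], db = d[n - 2];
      n -= 2;
      /* Shift the remaining entries up by one to free slot 0.  */
      memmove (val + 1, val, n * sizeof val[0]);
      memmove (d + 1, d, n * sizeof d[0]);
      val[0] = a | b;
      d[0] = (da > db ? da : db) + 1;
      n++;
    }

  *depth = d[0];
  return val[0];
}

/* Sequential fold: every OR depends on the previous one.  */
static uint64_t
or_reduce_linear (const uint64_t *in, int n, int *depth)
{
  assert (n > 0);
  uint64_t acc = in[0];
  for (int i = 1; i < n; i++)
    acc |= in[i];
  *depth = n - 1;
  return acc;
}
```

Because OR is associative and commutative and only "any bit set" matters, the order of combination can be chosen freely, which is what lets the balanced tree replace the in-order chain.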

> > > + }
> > > +}
> > > +  else
> > > +new_temp = stmts[0];
> > > +
> > > +  gcc_assert (new_temp);
> > > +
> > > +  tree cond = new_temp;
> > > +  if (masked_loop_p)
> > > +{
> > > +  tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> > > truth_type, 0);
> > > +  cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > 

Re: [PATCH RFA (libstdc++)] c++: partial ordering of object parameter [PR53499]

2023-12-05 Thread Jason Merrill

On 12/5/23 23:23, waffl3x wrote:

> Does CWG2834 affect this weird edge case?

2834 affects all partial ordering with explicit object member functions;
currently the working draft says that they get an additional fake object
parameter, which is clearly wrong.

> I couldn't quite grasp the
> standardese so I'm not really sure. These are a few cases from a test
> that I finalized last night. I ran this by jwakely and he agreed that
> the behavior as shown is correct by the standard. I'll also add that
> this is also the current behavior of my patch.
>
> template concept Constrain = true;
>
> inline constexpr int iobj_fn = 5;
> inline constexpr int xobj_fn = 10;
>
> struct S {
>    int f(Constrain auto) { return iobj_fn; };
>    int f(this S&&, auto) { return xobj_fn; };
>
>    int g(auto) { return iobj_fn; };
>    int g(this S&&, Constrain auto) { return xobj_fn; };
> };
> int main() {
>    S s{};
>    s.f (0)   == iobj_fn;

Yes, the xobj fn isn't viable because it takes an rvalue ref.

>    static_cast(s).f (0) == iobj_fn;

Yes, the functions look the same to partial ordering, so we compare
constraints and the iobj fn is more constrained.

>    s.g (0)   == iobj_fn;

Yes, the xobj fn isn't viable.

>    static_cast(s).g (0) == xobj_fn;


Yes, the xobj fn is more constrained.

Jason



Re: [PATCH RFA (libstdc++)] c++: partial ordering of object parameter [PR53499]

2023-12-05 Thread waffl3x
Does CWG2834 affect this weird edge case? I couldn't quite grasp the
standardese so I'm not really sure. These are a few cases from a test
that I finalized last night. I ran this by jwakely and he agreed that
the behavior as shown is correct by the standard. I'll also add that
this is the current behavior of my patch.

template concept Constrain = true;

inline constexpr int iobj_fn = 5;
inline constexpr int xobj_fn = 10;

struct S {
  int f(Constrain auto) { return iobj_fn; };
  int f(this S&&, auto) { return xobj_fn; };

  int g(auto) { return iobj_fn; };
  int g(this S&&, Constrain auto) { return xobj_fn; };
};
int main() {
  S s{};
  s.f (0)   == iobj_fn;
  static_cast(s).f (0) == iobj_fn;

  s.g (0)   == iobj_fn;
  static_cast(s).g (0) == xobj_fn;
}


On Tuesday, December 5th, 2023 at 7:21 PM, Jason Merrill wrote:


> 
> 
> Tested x86_64-pc-linux-gnu. Are the library test changes OK? A reduced
> example of the issue is at https://godbolt.org/z/cPxrcnKjG
> 
> -- 8< --
> 
> Looks like we implemented option 1 (skip the object parameter) for CWG532
> before the issue was resolved, and never updated to the final resolution of
> option 2 (model it as a reference). More recently CWG2445 extended this
> handling to static member functions; I think that's wrong, and have
> opened CWG2834 to address that and how explicit object member functions
> interact with it.
> 
> The FIXME comments are to guide how the explicit object member function
> support should change the uses of DECL_NONSTATIC_MEMBER_FUNCTION_P.
> 
> The library testsuite changes are to make partial ordering work again
> between the generic operator- in the testcase and
> _Pointer_adapter::operator-.
> 
> DR 532
> PR c++/53499
> 
> gcc/cp/ChangeLog:
> 
> * pt.cc (more_specialized_fn): Fix object parameter handling.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.dg/template/partial-order4.C: New test.
> * g++.dg/template/spec26.C: Adjust for CWG532.
> 
> libstdc++-v3/ChangeLog:
> 
> * testsuite/23_containers/vector/ext_pointer/types/1.cc
> * testsuite/23_containers/vector/ext_pointer/types/2.cc
> (N::operator-): Make less specialized.
> ---
> gcc/cp/pt.cc | 68 ++-
> .../g++.dg/template/partial-order4.C | 17 +
> gcc/testsuite/g++.dg/template/spec26.C | 10 +--
> .../vector/ext_pointer/types/1.cc | 4 +-
> .../vector/ext_pointer/types/2.cc | 4 +-
> 5 files changed, 78 insertions(+), 25 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/template/partial-order4.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 924a20973b4..4b2af4f7aca 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -25218,27 +25218,61 @@ more_specialized_fn (tree pat1, tree pat2, int len)
> bool lose1 = false;
> bool lose2 = false;
> 
> - /* Remove the this parameter from non-static member functions. If
> - one is a non-static member function and the other is not a static
> - member function, remove the first parameter from that function
> - also. This situation occurs for operator functions where we
> - locate both a member function (with this pointer) and non-member
> - operator (with explicit first operand). */
> - if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1))
> + /* C++17 [temp.func.order]/3 (CWG532)
> +
> + If only one of the function templates M is a non-static member of some
> + class A, M is considered to have a new first parameter inserted in its
> + function parameter list. Given cv as the cv-qualifiers of M (if any), the
> + new parameter is of type "rvalue reference to cv A" if the optional
> + ref-qualifier of M is && or if M has no ref-qualifier and the first
> + parameter of the other template has rvalue reference type. Otherwise, the
> + new parameter is of type "lvalue reference to cv A". */
> +
> + if (DECL_STATIC_FUNCTION_P (decl1) || DECL_STATIC_FUNCTION_P (decl2))
> {
> - len--; / LEN is the number of significant arguments for DECL1 /
> - args1 = TREE_CHAIN (args1);
> - if (!DECL_STATIC_FUNCTION_P (decl2))
> - args2 = TREE_CHAIN (args2);
> - }
> - else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2))
> - {
> - args2 = TREE_CHAIN (args2);
> - if (!DECL_STATIC_FUNCTION_P (decl1))
> + / Note C++20 DR2445 extended the above to static member functions, but
> + I think think the old G++ behavior of just skipping the object
> + parameter when comparing to a static member function was better, so
> + let's stick with that for now. This is CWG2834. --jason 2023-12 /
> + if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1)) / FIXME or explicit /
> {
> - len--;
> + len--; / LEN is the number of significant arguments for DECL1 /
> args1 = TREE_CHAIN (args1);
> }
> + else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2)) / FIXME or explicit /
> + args2 = TREE_CHAIN (args2);
> + }
> + else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1) / FIXME implicit only /
> + && DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2))
> + {
> + / Note DR2445 also (IMO wrongly) removed the "only one" above, which
> + would break e.g. 

RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits

2023-12-05 Thread Tamar Christina
> > > is the exit edge you are looking for without iterating over all loop 
> > > exits.
> > >
> > > > +   gimple *tmp_vec_stmt = vec_stmt;
> > > > +   tree tmp_vec_lhs = vec_lhs;
> > > > +   tree tmp_bitstart = bitstart;
> > > > +   /* For early exit where the exit is not in the BB that leads
> > > > +  to the latch then we're restarting the iteration in the
> > > > +  scalar loop.  So get the first live value.  */
> > > > +   restart_loop = restart_loop || exit_e != main_e;
> > > > +   if (restart_loop)
> > > > + {
> > > > +   tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > > +   tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> > > > +   tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> > >
> > > Hmm, that gets you the value after the first iteration, not the one before which
> > > would be the last value of the preceding vector iteration?
> > > (but we don't keep those, we'd need a PHI)
> >
> > I don't fully follow.  The comment on top of this hunk under if (loop_vinfo) states
> > that lhs should be pointing to a PHI.
> >
> > When I inspect the statement I see
> >
> > i_14 = PHI 
> >
> > so i_14 is the value at the start of the current iteration.  If we're coming from the
> > header 0, otherwise i_11 which is the value of the previous iteration?
> >
> > The peeling code explicitly leaves i_14 in the merge block and not i_11 for this
> > exact reason.
> > So I'm confused, my understanding is that we're already *at* the right PHI.
> >
> > Is it perhaps that you thought we put i_11 here for the early exits? In which case
> > yes, I'd agree that that would be wrong, and there we would have had to look at
> > the defs, but i_11 is the def.
> >
> > I already kept this in mind and leveraged peeling to make this part easier.
> > i_11 is used in the main exit and i_14 in the early one.
> 
> I think the important detail is that this code is only executed for
> vect_induction_defs which are indeed PHIs and so we're sure the
> value live is before any modification so fine to feed as initial
> value for the PHI in the epilog.
> 
> Maybe we can assert the def type here?

We can't assert because until cfg cleanup the dead value is still seen and still
vectorized.  That said I've added a guard here.  We vectorize the non-induction
value as normal now and if it's ever used it'll fail.

> 
> > >
> > > Why again do we need (non-induction) live values from the vector loop to the
> > > epilogue loop again?
> >
> > They can appear as the result value of the main exit.
> >
> > e.g. in testcase (vect-early-break_17.c)
> >
> > #define N 1024
> > unsigned vect_a[N];
> > unsigned vect_b[N];
> >
> > unsigned test4(unsigned x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >vect_b[i] = x + i;
> >if (vect_a[i] > x)
> >  return vect_a[i];
> >vect_a[i] = x;
> >ret = vect_a[i] + vect_b[i];
> >  }
> >  return ret;
> > }
> >
> > The only situation in which they can appear on an early-break exit is when
> > the main exit is not the latch-connected exit.
> >
> > However in these cases they are unused, and only there because
> > normally you would have exited (i.e. there was a return) but the
> > vector loop needs to start over so we ignore it.
> >
> > These happen in testcase vect-early-break_74.c and
> > vect-early-break_78.c
> 
> Hmm, so in that case their value is incorrect (but doesn't matter,
> we ignore it)?
> 

Correct, they're placed there due to exit redirection, but in these inverted
testcases where we've peeled the vector iteration you can't ever skip the
epilogue.  So they are guaranteed not to be used.
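For reference, the two exits of test4 from vect-early-break_17.c (quoted above) can be exercised with a small scalar harness. This is a sketch written for illustration, not part of the testsuite; run_test4 and the input values are hypothetical choices. One call takes the main exit and returns the live value ret, the other leaves through the early exit at i == 3:

```cpp
#include <cassert>
#include <cstring>

#define N 1024
unsigned vect_a[N];
unsigned vect_b[N];

// Scalar version of test4 from vect-early-break_17.c.
unsigned test4(unsigned x)
{
  unsigned ret = 0;
  for (int i = 0; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        return vect_a[i];          // early exit
      vect_a[i] = x;
      ret = vect_a[i] + vect_b[i]; // live value used by the main exit
    }
  return ret;                      // main (IV) exit
}

// Hypothetical driver: clear vect_a, optionally plant one large element
// (plant_at < 0 means "none"), then run test4.
unsigned run_test4(unsigned x, int plant_at, unsigned plant_val)
{
  assert(plant_at < N);
  std::memset(vect_a, 0, sizeof vect_a);
  if (plant_at >= 0)
    vect_a[plant_at] = plant_val;
  return test4(x);
}
```

With these inputs, run_test4 (5, -1, 0) returns 1033 (main exit: 5 + (5 + 1023) from the last iteration), while run_test4 (5, 3, 100) returns 100 (early exit).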

> > > > +   gimple_stmt_iterator exit_gsi;
> > > > +   tree new_tree
> > > > + = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > > > +  exit_e, vectype, ncopies,
> > > > +  slp_node, bitsize,
> > > > +  tmp_bitstart, tmp_vec_lhs,
> > > > +  lhs_type, restart_loop,
> > > > +  &exit_gsi);
> > > > +
> > > > +   /* Use the empty block on the exit to materialize the new stmts
> > > > +  so we can use update the PHI here.  */
> > > > +   if (gimple_phi_num_args (use_stmt) == 1)
> > > > + {
> > > > +   auto gsi = gsi_for_stmt (use_stmt);
> > > > +   remove_phi_node (&gsi, false);
> > > > +   tree lhs_phi = gimple_phi_result (use_stmt);
> > > > +   gimple *copy = gimple_build_assign (lhs_phi, 
> > > > 

RE: [PATCH 10/21]middle-end: implement relevancy analysis support for control flow

2023-12-05 Thread Tamar Christina
> > > +   && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != cond)
> > > + *relevant = vect_used_in_scope;
> 
> but why not simply mark all gconds as vect_used_in_scope?
> 

That breaks outer-loop vectorization, since doing so would pull the inner loop's
exit into scope for the outer loop.  We also can't force the loop's main IV exit
to be in scope, since it will be replaced by the vectorizer.

I've updated the code to remove the quadratic lookup.

> > > +}
> > >
> > >/* changing memory.  */
> > >if (gimple_code (stmt_info->stmt) != GIMPLE_PHI) @@ -374,6 +379,11 @@
> > > vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > >   *relevant = vect_used_in_scope;
> > >}
> > >
> > > +  auto_vec exits = get_loop_exit_edges (loop);  auto_bitmap
> > > + exit_bbs;  for (edge exit : exits)
> 
> is it your mail client messing patches up?  missing line-break
> again.
> 

Yeah, seems it was, hopefully fixed now.

> > > +bitmap_set_bit (exit_bbs, exit->dest->index);
> > > +
> 
> you don't seem to use the bitmap?
> 
> > >/* uses outside the loop.  */
> > >FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > > SSA_OP_DEF)
> > >  {
> > > @@ -392,7 +402,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > > loop_vec_info loop_vinfo,
> > > /* We expect all such uses to be in the loop exit phis
> > >(because of loop closed form)   */
> > > gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > -   gcc_assert (bb == single_exit (loop)->dest);
> > >
> > >*live_p = true;
> > >   }
> > > @@ -793,6 +802,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info
> > > loop_vinfo, bool *fatal)
> > >   return res;
> > >   }
> > >   }
> > > + }
> > > +   else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > + {
> > > +   enum tree_code rhs_code = gimple_cond_code (cond);
> > > +   gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > +   opt_result res
> > > + = process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > +loop_vinfo, relevant, , false);
> > > +   if (!res)
> > > + return res;
> > > +   res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > + loop_vinfo, relevant, , false);
> > > +   if (!res)
> > > + return res;
> > >  }
> 
> I guess we're missing an
> 
>   else
> gcc_unreachable ();
> 
> to catch not handled stmt kinds (do we have gcond patterns yet?)
> 
> > > else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > >   {
> > > @@ -13043,11 +13066,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > >node_instance, cost_vec);
> > >if (!res)
> > >   return res;
> > > -   }
> > > +}
> > > +
> > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > +STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> 
> I think it should rather be vect_condition_def.  It's also not
> this functions business to set STMT_VINFO_DEF_TYPE.  If we ever
> get to handle not if-converted code (or BB vectorization of that)
> then a gcond would define the mask stmts are under.
> 

Hmm, sure.  I've had to set it in multiple other places, but I've moved it
away from here.  The main ones are set during dataflow analysis when
we determine which statements need to be moved.

> > >switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > >  {
> > >case vect_internal_def:
> > > +  case vect_early_exit_def:
> > >  break;
> > >
> > >case vect_reduction_def:
> > > @@ -13080,6 +13107,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > >  {
> > >gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > >gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > +   || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > || (call && gimple_call_lhs (call) == NULL_TREE));
> > >*need_to_vectorize = true;
> > >  }
> > > @@ -13835,6 +13863,14 @@ vect_is_simple_use (vec_info *vinfo,
> > > stmt_vec_info stmt, slp_tree slp_node,
> > > else
> > >   *op = gimple_op (ass, operand + 1);
> > >   }
> > > +  else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
> > > + {
> > > +   gimple_match_op m_op;
> > > +   if (!gimple_extract_op (cond, &m_op))
> > > + return false;
> > > +   gcc_assert (m_op.code.is_tree_code ());
> > > +   *op = m_op.ops[operand];
> > > + }
> 
> Please do not use gimple_extract_op, use
> 
>   *op = gimple_op (cond, operand);
> 
> > >else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
> > >   *op = gimple_call_arg (call, operand);
> > >else
> > > @@ -14445,6 +14481,8 @@ vect_get_vector_types_for_stmt (vec_info
> > > *vinfo, stmt_vec_info stmt_info,
> > >*nunits_vectype_out = NULL_TREE;
> > >
> > >if (gimple_get_lhs (stmt) == NULL_TREE
> > > +  /* Allow vector conditionals through here.  */
> > > +  && !is_ctrl_stmt (stmt)
> 
> !is_a <gcond *> (stmt)
> 
> > 

[PATCH]middle-end: Fix peeled vect loop IV values.

2023-12-05 Thread Tamar Christina
Hi All,

While waiting for reviews I found this case where both loop exits need to go to
the epilogue loop, but there was an IV-related variable that was used in the
scalar iteration as well.

vect_update_ivs_after_vectorizer then blew the value away and replaced it with
the value it would have had if the loop took the normal exit.

For these cases where we've peeled a vector iteration, we should skip
vect_update_ivs_after_vectorizer since all exits are "alternate" exits.

For this to be correct, we have peeling put the right LCSSA variables in place
so that vectorizable_live_operation takes care of it.

This is triggered by the new testcases 79 and 80 in the early break testsuite,
and I'll merge this commit into the main one.
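As a rough scalar model of why the start-of-iteration value matters (hypothetical code, not GCC internals; first_greater and the chunked structure are assumptions), consider a loop processed in chunks of VF elements where any lane taking the early exit sends the whole chunk back to the scalar epilogue. The epilogue must resume from the induction value at the start of that chunk (the role i_14 plays in the thread above), not from the updated value:

```cpp
#include <cassert>

// Hypothetical model of an early-break loop "vectorized" with factor vf.
// Returns the index of the first element greater than x, or -1.
int first_greater(const unsigned *a, int n, unsigned x, int vf)
{
  assert(vf > 0);
  int i = 0;                         // value at the start of each chunk
  for (; i + vf <= n; i += vf)       // "vector" loop
    {
      bool any = false;
      for (int lane = 0; lane < vf; ++lane)
        any = any || a[i + lane] > x;
      if (any)
        break;                       // leave with i still at the chunk start
    }
  // Scalar epilogue: restarts from the start-of-chunk value.  Resuming
  // from i + vf (the updated value) would skip the matching element.
  for (; i < n; ++i)
    if (a[i] > x)
      return i;
  return -1;
}
```

With a = {0, 0, 0, 0, 0, 9, 0, 0}, n = 8, x = 5 and vf = 4, the second chunk triggers the exit and the epilogue, restarting from i = 4, returns 5; resuming from i = 8 would instead return -1.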

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Put right LCSSA var for peeled vect loops.
(vect_do_peeling): Skip vect_update_ivs_after_vectorizer.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 7d48502e2e46240553509dfa6d75fcab7fea36d3..bfdbeb7faaba29aad51c0561dace680c96759484 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1668,6 +1668,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
   edge loop_entry = single_succ_edge (new_preheader);
   if (flow_loops)
{
+ bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
  /* Link through the main exit first.  */
  for (auto gsi_from = gsi_start_phis (loop->header),
   gsi_to = gsi_start_phis (new_loop->header);
@@ -1692,11 +1693,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
  continue;
}
}
+ /* If we have multiple exits and the vector loop is peeled then we
+need to use the value at start of loop.  */
+ if (peeled_iters)
+   {
+ tree tmp_arg = gimple_phi_result (from_phi);
+ if (!new_phi_args.get (tmp_arg))
+   new_arg = tmp_arg;
+   }
 
  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
  gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
 
- /* Main loop exit should use the final iter value.  */
+ /* Otherwise, main loop exit should use the final iter value.  */
  SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
 
  adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
@@ -3394,9 +3403,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
update_e = single_succ_edge (e->dest);
 
-  /* Update the main exit.  */
-  vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
-   update_e);
+  /* If we have a peeled vector iteration, all exits are the same, so
+the main exit needs to be treated the same as the alternative exits:
+we leave their updates to vectorizable_live_operation.  */
+  if (!LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
+   vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
+ update_e);
 
   if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
{





[PATCH]middle-end: correct loop bounds for early breaks and peeled vector loops

2023-12-05 Thread Tamar Christina
Hi All,

While waiting for reviews I've continued to run more tests.
This one was found when running on 32-bit systems.

While we calculate the right latch count for the epilog,
the vectorizer overrides SCEV and so unrolling goes wrong.

This updates the bounds for the case where we've peeled a
vector iteration.
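The adjustment itself is simple arithmetic. A sketch with plain integers standing in for GCC's poly_int values (epilog_latch_bound is a hypothetical name): when a vector iteration has been peeled and VF is a compile-time constant, the epilogue bound grows by VF before the usual -1 conversion to latch iterations:

```cpp
#include <cassert>

// Hypothetical helper mirroring the arithmetic added to vect_do_peeling.
// 'bound' is the epilogue's scalar iteration bound; 'vf' is only used
// when it is a compile-time constant (i.e. not VLA).
long epilog_latch_bound(long bound, bool vect_peeled, bool vf_is_constant,
                        long vf)
{
  assert(bound != 0);
  if (vect_peeled && vf_is_constant)
    bound += vf;     // the epilogue may re-run the peeled vector iteration
  return bound - 1;  // -1 converts loop iterations to latch iterations
}
```

For example, a scalar bound of 4 with VF = 4 records 7 latch iterations for a peeled vector loop, and 3 otherwise (including the VLA case, where VF is not constant).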

The testcase in the early break testsuite has been adjusted to test for this,
and I'll merge this commit into the main one.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_do_peeling): Adjust bounds.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 4edde4443ecd98775972f39b3fe839255db12b04..7d48502e2e46240553509dfa6d75fcab7fea36d3 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3457,6 +3457,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   if (bound_scalar.is_constant ())
{
  gcc_assert (bound != 0);
+ /* Adjust the upper bound by the extra peeled vector iteration if we
+are an epilogue of a peeled vect loop and not VLA.  For VLA the
+loop bounds are unknown.  */
+ if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)
+ && vf.is_constant ())
+   bound += vf.to_constant ();
  /* -1 to convert loop iterations to latch iterations.  */
  record_niter_bound (epilog, bound - 1, false, true);
  scale_loop_profile (epilog, profile_probability::always (),





RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break

2023-12-05 Thread Tamar Christina
ping

> -Original Message-
> From: Tamar Christina 
> Sent: Monday, November 27, 2023 10:48 PM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> Subject: RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break
> 
> Ping
> 
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Monday, November 6, 2023 7:41 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> > Subject: [PATCH 13/21]middle-end: Update loop form analysis to support
> > early break
> >
> > Hi All,
> >
> > This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the
> > other patches are self contained.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> > (vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > (vect_transform_loop): Use it.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> > 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb
> > 991f07cd6052491d0 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost
> > (loop_vec_info loop_vinfo)
> >loop_vinfo->scalar_costs->finish_cost (nullptr);  }
> >
> > -
> >  /* Function vect_analyze_loop_form.
> >
> > Verify that certain CFG restrictions hold, including:
> > - the loop has a pre-header
> > -   - the loop has a single entry and exit
> > +   - the loop has a single entry
> > +   - nested loops can have only a single exit.
> > - the loop exit condition is simple enough
> > - the number of iterations can be analyzed, i.e, a countable loop.  The
> >   niter could be analyzed under some assumptions.  */ @@ -1841,10
> > +1841,14 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> >"not vectorized: latch block not empty.\n");
> >
> >/* Make sure the exit is not abnormal.  */
> > -  if (exit_e->flags & EDGE_ABNORMAL)
> > -return opt_result::failure_at (vect_location,
> > -  "not vectorized:"
> > -  " abnormal loop exit edge.\n");
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +  for (edge e : exits)
> > +{
> > +  if (e->flags & EDGE_ABNORMAL)
> > +   return opt_result::failure_at (vect_location,
> > +  "not vectorized:"
> > +  " abnormal loop exit edge.\n");
> > +}
> >
> >info->conds
> >  = vect_get_loop_niters (loop, exit_e, >assumptions, @@ -1920,6
> > +1924,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared
> > *shared,
> >
> >LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> >
> > +  /* Check to see if we're vectorizing multiple exits.  */
> > + LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > += !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > +
> >if (info->inner_loop_cond)
> >  {
> >stmt_vec_info inner_loop_cond_info @@ -11577,7 +11585,7 @@
> > vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >/* Make sure there exists a single-predecessor exit bb.  Do this before
> >   versioning.   */
> >edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > -  if (! single_pred_p (e->dest))
> > +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS
> > + (loop_vinfo))
> >  {
> >split_loop_exit_edge (e, true);
> >if (dump_enabled_p ())
> >
> >
> >
> >
> > --


Re: [PATCH 1/4] RISC-V: Add crypto vector implied ISA info.

2023-12-05 Thread Tsukasa OI
On 2023/12/06 11:45, Feng Wang wrote:
> Because the crypto vector extensions depend on the Vector extension,
> the "v" info is added to the implied ISA info of each corresponding
> crypto vector extension.

Hi Feng,

It's true that the vector crypto extensions are based on the vector
extension, but that *does not* mean they require the full 'V'
extension.  The vector crypto extensions also take embedded
processors with VLEN < 128 into account.

Quoting the documentation:

> The Zvknhb and Zvbc Vector Crypto Extensions --and accordingly the composite extensions Zvkn
> and Zvks-- require a Zve64x base, or application ("V") base Vector Extension.
> 
> All of the other Vector Crypto Extensions can be built on any embedded (Zve*) or application ("V")
> base Vector Extension.

So the correct dependencies to add are as follows:

> +  {"zvbb",  "zvkb"},
> +  {"zvbc",   "zve64x"},
> +  {"zvkb",   "zve32x"},
> +  {"zvkg",   "zve32x"},
> +  {"zvkned", "zve32x"},
> +  {"zvknha", "zve32x"},
> +  {"zvknhb", "zve64x"},
> +  {"zvksed", "zve32x"},
> +  {"zvksh",  "zve32x"},

Note that 'V' indirectly depends on both 'Zve32x' and 'Zve64x' so this
would be fine to represent "or application ('V')" part quoted above.
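To check that the proposed table gives the intended result, here is a sketch of transitive expansion over those pairs, assuming implied extensions are themselves expanded recursively. It uses a plain vector of string pairs rather than GCC's riscv_implied_info_t, and the zve64x -> zve32x entry is added on the assumption (per the V specification) that Zve64x includes Zve32x:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Implication pairs as proposed above, plus zve64x -> zve32x.
static const std::vector<std::pair<std::string, std::string>> implied = {
  {"zvbb", "zvkb"},     {"zvbc", "zve64x"},   {"zvkb", "zve32x"},
  {"zvkg", "zve32x"},   {"zvkned", "zve32x"}, {"zvknha", "zve32x"},
  {"zvknhb", "zve64x"}, {"zvksed", "zve32x"}, {"zvksh", "zve32x"},
  {"zve64x", "zve32x"},
};

// Worklist expansion: keep adding implied extensions until nothing new
// appears.  The result includes the starting extension itself.
std::set<std::string> expand_implied(const std::string &ext)
{
  std::set<std::string> result{ext};
  std::vector<std::string> work{ext};
  while (!work.empty())
    {
      std::string cur = work.back();
      work.pop_back();
      for (const auto &p : implied)
        if (p.first == cur && result.insert(p.second).second)
          work.push_back(p.second);
    }
  return result;
}
```

With this table, zvbb expands to {zvbb, zvkb, zve32x} and zvknhb to {zvknhb, zve64x, zve32x}; neither pulls in the full 'v'.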

Also, consider adding those dependencies to the Python script
gcc/config/riscv/arch-canonicalize.

Thanks,
Tsukasa




> 
> gcc/ChangeLog:
> 
>   * common/config/riscv/riscv-common.cc: Add "v" into implied ISA info.
> ---
>  gcc/common/config/riscv/riscv-common.cc | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
> index 6c210412515..dbb42ca2f1e 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -120,6 +120,15 @@ static const riscv_implied_info_t riscv_implied_info[] =
>{"zvksc", "zvbc"},
>{"zvksg", "zvks"},
>{"zvksg", "zvkg"},
> +  {"zvbb",  "zvkb"},
> +  {"zvbc", "v"},
> +  {"zvkb", "v"},
> +  {"zvkg", "v"},
> +  {"zvkned",   "v"},
> +  {"zvknha",   "v"},
> +  {"zvknhb",   "v"},
> +  {"zvksed",   "v"},
> +  {"zvksh","v"},
>  
>{"zfh", "zfhmin"},
>{"zfhmin", "f"},


[PATCH 3/4] RISC-V: Add crypto vector machine descriptions

2023-12-05 Thread Feng Wang
This patch adds the crypto machine descriptions (vector-crypto.md) and
some new iterators used by the crypto vector extensions.
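As a reviewing aid, the per-element semantics of a few of the Zvkb operations added here (vandn, vrol/vror, vbrev8, vrev8) can be modelled in scalar C++ for SEW = 32. This is a sketch based on the descriptions in the vector crypto specification, not the RTL in vector-crypto.md, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// vandn: vd[i] = vs2[i] & ~vs1[i]
std::uint32_t andn32(std::uint32_t vs2, std::uint32_t vs1) { return vs2 & ~vs1; }

// vrol / vror: the rotate amount is taken modulo SEW.
std::uint32_t rol32(std::uint32_t x, std::uint32_t amt)
{
  amt &= 31;
  return amt ? (x << amt) | (x >> (32 - amt)) : x;
}

std::uint32_t ror32(std::uint32_t x, std::uint32_t amt)
{
  amt &= 31;
  return amt ? (x >> amt) | (x << (32 - amt)) : x;
}

// vbrev8: reverse the bits within each byte of the element.
std::uint32_t brev8_32(std::uint32_t x)
{
  std::uint32_t r = 0;
  for (int byte = 0; byte < 4; ++byte)
    {
      std::uint32_t b = (x >> (8 * byte)) & 0xff, rev = 0;
      for (int bit = 0; bit < 8; ++bit)
        if (b & (1u << bit))
          rev |= 1u << (7 - bit);
      r |= rev << (8 * byte);
    }
  return r;
}

// vrev8: reverse the order of the bytes in the element.
std::uint32_t rev8_32(std::uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0xff00) | ((x << 8) & 0xff0000) | (x << 24);
}
```

For example, rev8_32 (0x11223344) gives 0x44332211 and brev8_32 (0x00000102) gives 0x00008040.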

Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 

gcc/ChangeLog:

* config/riscv/iterators.md: Add rotate insn name.
* config/riscv/riscv.md: Add new insns name for crypto vector.
* config/riscv/vector-iterators.md: Add new iterators for crypto vector.
* config/riscv/vector.md: Add the corresponding attr for crypto vector.
* config/riscv/vector-crypto.md: New file. The machine descriptions for
crypto vector.
---
 gcc/config/riscv/iterators.md|   4 +-
 gcc/config/riscv/riscv.md|  33 +-
 gcc/config/riscv/vector-crypto.md| 500 +++
 gcc/config/riscv/vector-iterators.md |  41 +++
 gcc/config/riscv/vector.md   |  49 ++-
 5 files changed, 607 insertions(+), 20 deletions(-)
 create mode 100755 gcc/config/riscv/vector-crypto.md

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index ecf033f2fa7..f332fba7031 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -304,7 +304,9 @@
 (umax "maxu")
 (clz "clz")
 (ctz "ctz")
-(popcount "cpop")])
+(popcount "cpop")
+(rotate "rol")
+(rotatert "ror")])
 
 ;; ---
 ;; Int Iterators.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 935eeb7fd8e..a887f3cd412 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -428,6 +428,34 @@
 ;; vcompress    vector compress instruction
 ;; vmov         whole vector register move
 ;; vector       unknown vector instruction
+;; 17. Crypto Vector instructions
+;; vandn        crypto vector bitwise and-not instructions
+;; vbrev        crypto vector reverse bits in elements instructions
+;; vbrev8       crypto vector reverse bits in bytes instructions
+;; vrev8        crypto vector reverse bytes instructions
+;; vclz         crypto vector count leading zeros instructions
+;; vctz         crypto vector count trailing zeros instructions
+;; vrol         crypto vector rotate left instructions
+;; vror         crypto vector rotate right instructions
+;; vwsll        crypto vector widening shift left logical instructions
+;; vclmul       crypto vector carry-less multiply - return low half instructions
+;; vclmulh      crypto vector carry-less multiply - return high half instructions
+;; vghsh        crypto vector add-multiply over GHASH Galois-Field instructions
+;; vgmul        crypto vector multiply over GHASH Galois-Field instructions
+;; vaesef       crypto vector AES final-round encryption instructions
+;; vaesem       crypto vector AES middle-round encryption instructions
+;; vaesdf       crypto vector AES final-round decryption instructions
+;; vaesdm       crypto vector AES middle-round decryption instructions
+;; vaeskf1      crypto vector AES-128 Forward KeySchedule generation instructions
+;; vaeskf2      crypto vector AES-256 Forward KeySchedule generation instructions
+;; vaesz        crypto vector AES round zero encryption/decryption instructions
+;; vsha2ms      crypto vector SHA-2 message schedule instructions
+;; vsha2ch      crypto vector SHA-2 two rounds of compression instructions
+;; vsha2cl      crypto vector SHA-2 two rounds of compression instructions
+;; vsm4k        crypto vector SM4 KeyExpansion instructions
+;; vsm4r        crypto vector SM4 Rounds instructions
+;; vsm3me       crypto vector SM3 Message Expansion instructions
+;; vsm3c        crypto vector SM3 Compression instructions
 (define_attr "type"
   "unknown,branch,jump,jalr,ret,call,load,fpload,store,fpstore,
mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,
@@ -447,7 +475,9 @@
vired,viwred,vfredu,vfredo,vfwredu,vfwredo,
vmalu,vmpop,vmffs,vmsfs,vmiota,vmidx,vimovvx,vimovxv,vfmovvf,vfmovfv,
vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,
-   vgather,vcompress,vmov,vector"
+   vgather,vcompress,vmov,vector,vandn,vbrev,vbrev8,vrev8,vclz,vctz,vcpop,vrol,vror,vwsll,
+   vclmul,vclmulh,vghsh,vgmul,vaesef,vaesem,vaesdf,vaesdm,vaeskf1,vaeskf2,vaesz,
+   vsha2ms,vsha2ch,vsha2cl,vsm4k,vsm4r,vsm3me,vsm3c"
   (cond [(eq_attr "got" "load") (const_string "load")
 
 ;; If a doubleword move uses these expensive instructions,
@@ -3747,6 +3777,7 @@
 (include "thead.md")
 (include "generic-ooo.md")
 (include "vector.md")
+(include "vector-crypto.md")
 (include "zicond.md")
 (include "zc.md")
 (include "corev.md")
diff --git a/gcc/config/riscv/vector-crypto.md b/gcc/config/riscv/vector-crypto.md
new file mode 100755
index 000..a40ecef4342
--- /dev/null
+++ b/gcc/config/riscv/vector-crypto.md
@@ -0,0 

[PATCH 2/4] RISC-V: Add crypto vector builtin function.

2023-12-05 Thread Feng Wang
This patch adds the intrinsic functions for the crypto vector extensions,
based on the intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).

Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto.
(class b_reverse): Ditto.
(class vwsll): Ditto.
(class clmul): Ditto.
(class vg_nhab): Ditto.
(class crypto_vv): Ditto.
(class crypto_vi): Ditto.
(class vaeskf2_vsm3c): Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(DEF_VECTOR_CRYPTO_FUNCTION): New MACRO define of crypto vector.
(registered_function::overloaded_hash): Processing size_t uimm for C 
overloaded func.
(handle_pragma_vector): Add registration for crypto vector.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
* config/riscv/riscv-vector-builtins.h (struct crypto_function_group_info):
Add new struct definition for crypto vector.
* config/riscv/t-riscv: Add building dependency files.
* config/riscv/riscv-vector-crypto-builtins-avail.h:
New file to control enable.
* config/riscv/riscv-vector-crypto-builtins-functions.def:
New file. Definition of crypto vector.
* config/riscv/riscv-vector-crypto-builtins-types.def:
New file. New type definition for crypto vector.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 259 +-
 .../riscv/riscv-vector-builtins-bases.h   |  28 ++
 .../riscv/riscv-vector-builtins-shapes.cc |  66 -
 .../riscv/riscv-vector-builtins-shapes.h  |   4 +
 gcc/config/riscv/riscv-vector-builtins.cc | 152 +-
 gcc/config/riscv/riscv-vector-builtins.def|   1 +
 gcc/config/riscv/riscv-vector-builtins.h  |   8 +
 .../riscv-vector-crypto-builtins-avail.h  |  25 ++
 ...riscv-vector-crypto-builtins-functions.def |  78 ++
 .../riscv-vector-crypto-builtins-types.def|  21 ++
 gcc/config/riscv/t-riscv  |   2 +
 11 files changed, 641 insertions(+), 3 deletions(-)
 create mode 100755 gcc/config/riscv/riscv-vector-crypto-builtins-avail.h
 create mode 100755 gcc/config/riscv/riscv-vector-crypto-builtins-functions.def
 create mode 100755 gcc/config/riscv/riscv-vector-crypto-builtins-types.def

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..6d52230e9ba 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2127,6 +2127,207 @@ public:
   }
 };
 
+/* Below implements are vector crypto */
+/* Implements vandn.[vv,vx] */
+class vandn : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
+  case OP_TYPE_vx:
+return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
+  default:
+gcc_unreachable ();
+  }
+  }
+};
+
+/* Implements vrol/vror/clz/ctz.  */
+template<rtx_code CODE>
+class bitmanip : public function_base
+{
+public:
+  bool apply_tail_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool apply_mask_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool has_merge_operand_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+{
+  case OP_TYPE_v:
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_v (CODE, e.vector_mode ()));
+  case OP_TYPE_vx:
+return e.use_exact_insn (code_for_pred_v_scalar (CODE, e.vector_mode ()));
+  default:
+gcc_unreachable ();
+}
+  }
+};
+
+/* Implements vbrev/vbrev8/vrev8.  */
+template
+class b_reverse : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+  return 

[PATCH 1/4] RISC-V: Add crypto vector implied ISA info.

2023-12-05 Thread Feng Wang
Because the crypto vector extensions depend on the Vector extension,
"v" is added to the implied ISA info of each corresponding crypto
vector extension.
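To illustrate what the implied-ISA table encodes, the sketch below expands a set of extension names through {extension, implied extension} pairs until nothing new is added. The table is a small subset of the entries in the patch. The names implied_pair, implied_table and expand_implied are hypothetical; this is not how riscv-common.cc is structured internally.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical mirror of a few riscv_implied_info entries: {ext, implied}.
struct implied_pair {
  const char *ext;
  const char *implied;
};

static const implied_pair implied_table[] = {
  {"zvksc", "zvbc"},
  {"zvbb", "zvkb"},
  {"zvbc", "v"},
  {"zvkb", "v"},
  {"zvkned", "v"},
};

// Repeatedly add implied extensions until the set stops growing, so that
// chains such as zvksc -> zvbc -> v are followed transitively.
std::set<std::string> expand_implied(std::set<std::string> exts) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const implied_pair &p : implied_table)
      if (exts.count(p.ext) != 0 && exts.insert(std::string(p.implied)).second)
        changed = true;
  }
  return exts;
}
```

With the new entries, requesting only "zvbb" also enables "zvkb" and "v".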

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add "v" into implied ISA info.
---
 gcc/common/config/riscv/riscv-common.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 6c210412515..dbb42ca2f1e 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -120,6 +120,15 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvksc", "zvbc"},
   {"zvksg", "zvks"},
   {"zvksg", "zvkg"},
+  {"zvbb",  "zvkb"},
+  {"zvbc", "v"},
+  {"zvkb", "v"},
+  {"zvkg", "v"},
+  {"zvkned",   "v"},
+  {"zvknha",   "v"},
+  {"zvknhb",   "v"},
+  {"zvksed",   "v"},
+  {"zvksh","v"},
 
   {"zfh", "zfhmin"},
   {"zfhmin", "f"},
-- 
2.17.1



[PATCH] testsuite: Adjust for the new permerror -Wincompatible-pointer-types

2023-12-05 Thread Yang Yujie
r14-6037 turned -Wincompatible-pointer-types into a permerror,
which causes the following test to fail.

gcc/testsuite/ChangeLog:

* gcc.dg/fixed-point/composite-type.c: Replace dg-warning with dg-error.
---
 .../gcc.dg/fixed-point/composite-type.c   | 64 +--
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/fixed-point/composite-type.c b/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
index 59351ff09b3..f91e480bcbf 100644
--- a/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
+++ b/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
@@ -68,39 +68,39 @@ FIXED_POINT_COMPOSITE_DECL(_Sat unsigned long long _Accum, Sullk);  /* { dg-erro
 
 int main()
 {
-  FIXED_POINT_COMPOSITE_TEST(short _Fract, sf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Fract, f);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(long _Fract, lf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(long long _Fract, llf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(unsigned short _Fract, usf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(unsigned _Fract, uf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(unsigned long _Fract, ulf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(unsigned long long _Fract, ullf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Sat short _Fract, Ssf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Sat _Fract, Sf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Sat long _Fract, Slf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Sat long long _Fract, Sllf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Sat unsigned short _Fract, Susf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Sat unsigned _Fract, Suf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Sat unsigned long _Fract, Sulf);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Sat unsigned long long _Fract, Sullf);  /* { dg-warning "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(short _Fract, sf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(_Fract, f);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(long _Fract, lf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(long long _Fract, llf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(unsigned short _Fract, usf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(unsigned _Fract, uf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(unsigned long _Fract, ulf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(unsigned long long _Fract, ullf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(_Sat short _Fract, Ssf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(_Sat _Fract, Sf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(_Sat long _Fract, Slf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(_Sat long long _Fract, Sllf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(_Sat unsigned short _Fract, Susf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(_Sat unsigned _Fract, Suf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(_Sat unsigned long _Fract, Sulf);  /* { dg-error "incompatible pointer type" } */
+  FIXED_POINT_COMPOSITE_TEST(_Sat unsigned long long _Fract, Sullf);  /* { dg-error "incompatible pointer type" } */
 
-  FIXED_POINT_COMPOSITE_TEST(short _Accum, sk);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Accum, k);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(long _Accum, lk);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(long long _Accum, llk);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(unsigned short _Accum, usk);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(unsigned _Accum, uk);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(unsigned long _Accum, ulk);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(unsigned long long _Accum, ullk);  /* { dg-warning "incompatible pointer type" } */
-  FIXED_POINT_COMPOSITE_TEST(_Sat 

Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2023-12-05 Thread Alexandre Oliva
On Nov 19, 2023, Alexandre Oliva  wrote:

> On arm-eabi targets, in c23 stdarg execution tests that pass arguments to
> (...) functions (without any named argument), the caller passes
> everything on the stack, but the callee expects arguments in
> registers.

Ping?  This slightly modified patch only adds comments to
aapcs_layout_arg compared with the original one.

The commit message doesn't name explicitly the fixed testsuite
failures.  Here they are:

FAIL: gcc.dg/c23-stdarg-4.c execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O0  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O1  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O3 -g  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -Os  execution test

Tested on arm-eabi.  Ok to install?


arm: fix c23 0-named-args caller-side stdarg

On arm-eabi targets, in c23 stdarg execution tests that pass arguments to
(...) functions (without any named argument), the caller passes
everything on the stack, but the callee expects arguments in
registers.  My reading of the AAPCS32 suggests that the callee is
correct, so I've arranged for the caller to pass the first arguments
in registers to TYPE_NO_NAMED_ARGS_STDARG_P-typed functions.

The implementation issue in calls.cc is that n_named_args is initially
set to zero in expand_call, so the test argpos < n_named_args yields
false for all arguments, and aapcs_layout_arg takes !named as meaning
stack.
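The register-versus-stack decision described above can be modelled with a toy version of AAPCS core-register allocation. The first four word-sized arguments go in r0-r3 and the rest go on the stack. An argument that is not named goes straight to the stack unless the new aapcs_pretend_named flag is set. Everything here (loc, cum_args, layout_arg, layout_call) is hypothetical. Doubleword alignment, VFP candidates and argument splitting are deliberately left out.

```cpp
#include <cassert>
#include <vector>

enum class loc { r0, r1, r2, r3, stack };

struct cum_args {
  int ncrn = 0;                // next core register number (r0 == 0)
  bool pretend_named = false;  // stands in for aapcs_pretend_named
};

loc layout_arg(cum_args &cum, bool named) {
  // Models the patched early return in aapcs_layout_arg: an unnamed
  // argument goes to the stack unless pretend_named is set.
  if (!named && !cum.pretend_named)
    return loc::stack;
  if (cum.ncrn < 4)
    return static_cast<loc>(cum.ncrn++);
  return loc::stack;
}

// Lays out NARGS word-sized arguments; the first N_NAMED count as named,
// like the argpos < n_named_args test in expand_call.
std::vector<loc> layout_call(int nargs, int n_named, bool pretend_named) {
  cum_args cum;
  cum.pretend_named = pretend_named;
  std::vector<loc> result;
  for (int argpos = 0; argpos < nargs; ++argpos)
    result.push_back(layout_arg(cum, argpos < n_named));
  return result;
}
```

A c23 `void f(...)` call has n_named_args == 0. In that case layout_call(5, 0, false) puts every argument on the stack, which is the old caller behaviour. layout_call(5, 0, true) uses r0-r3 first, matching what the callee expects.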

But there's a catch there: on targets in which neither
strict_argument_naming nor !pretend_outgoing_varargs_named holds,
n_named_args is bumped up to num_actuals, which covers stdarg
arguments in pre-c23 cases, but not TYPE_NO_NAMED_ARGS_STDARG_P calls.

I'm hesitant to modify the generic ABI-affecting code, so I'm going
for a more surgical fix for ARM AAPCS only.  I suspect we might want
yet another targetm predicate to enable the n_named_args overriding
block to disregard TYPE_NO_NAMED_ARGS_STDARG_P, and allow all actuals
to be passed as if named.


for  gcc/ChangeLog

* config/arm/arm.h (CUMULATIVE_ARGS): Add aapcs_pretend_named.
* config/arm/arm.cc (arm_init_cumulative_args): Set it for
aapcs no-named-args stdarg functions.
(aapcs_layout_arg): Ignore named if aapcs_pretend_named.
---
 gcc/config/arm/arm.cc |9 +++--
 gcc/config/arm/arm.h  |1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 6e3e2e8fb1bfb..4a350bd8c8f47 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -7019,8 +7019,11 @@ aapcs_layout_arg (CUMULATIVE_ARGS *pcum, machine_mode mode,
   pcum->aapcs_arg_processed = true;
 
   /* Special case: if named is false then we are handling an incoming
- anonymous argument which is on the stack.  */
-  if (!named)
+ anonymous argument which is on the stack, unless
+ aapcs_pretend_named, in which case we're dealing with a
+ TYPE_NO_NAMED_ARGS_STDARG_P call and, even if args are !named, we
+ ought to use available registers first.  */
+  if (!named && !pcum->aapcs_pretend_named)
 return;
 
   /* Is this a potential co-processor register candidate?  */
@@ -7141,6 +7144,8 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
   pcum->aapcs_arg_processed = false;
   pcum->aapcs_cprc_slot = -1;
   pcum->can_split = true;
+  pcum->aapcs_pretend_named = (fntype
+  && TYPE_NO_NAMED_ARGS_STDARG_P (fntype));
 
   if (pcum->pcs_variant != ARM_PCS_AAPCS)
{
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index a9c2752c0ea5e..65d2d567686d3 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1702,6 +1702,7 @@ typedef struct
   unsigned aapcs_vfp_reg_alloc;
   int aapcs_vfp_rcount;
   MACHMODE aapcs_vfp_rmode;
+  bool aapcs_pretend_named; /* Set for TYPE_NO_NAMED_ARGS_STDARG_P.  */
 } CUMULATIVE_ARGS;
 #endif
 


-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-05 Thread Kewen.Lin
on 2023/12/6 02:01, Ajit Agarwal wrote:
> Hello Kewen:
> 
> 
> On 05/12/23 7:13 pm, Ajit Agarwal wrote:
>> Hello Kewen:
>>
>> On 04/12/23 7:31 am, Kewen.Lin wrote:
>>> Hi Ajit,
>>>
>>> on 2023/12/1 17:10, Ajit Agarwal wrote:
 Hello Kewen:

 On 24/11/23 3:01 pm, Kewen.Lin wrote:
> Hi Ajit,
>
> Don't forget to CC David (CC-ed) :), some comments are inlined below.
>
> on 2023/10/8 03:04, Ajit Agarwal wrote:
>> Hello All:
>>
>> This patch adds a new pass to replace vector loads (lxv) from contiguous
>> addresses with the MMA instruction lxvp.
>
> IMHO the current binding of lxvp (and lxvpx, stxvp{x,}) to MMA looks wrong: they
> only require Power10 and VSX, and these instructions should perform well without
> MMA support.  So a patch separating their support from MMA should go first.
>

 I will make the changes for Power10 and VSX.

>> This patch addresses one regression failure on the ARM architecture.
>
> Could you explain this?  I don't see any test case for this.

 I submitted v1 of the patch and there were regression failures reported by
 Linaro.  I have fixed them in v2.
>>>
>>> OK, thanks for clarifying.  So some unexpected changes on generic code in v1
>>> caused the failure exposed on arm.
>>>

  
> Besides, it seems a bad idea to put this pass after reload?  Once register
> allocation finishes, this pairing has to be restricted by the reg No. (I didn't
> see any checking on the reg No. relationship for pairing btw.)
>

 Adding it before the reload pass deletes one of the lxv and replaces it with lxvp.
 This fails in the reload pass while freeing reg_eqivs as ira populates them and then
>>>
>>> I can't find reg_eqivs, I guessed you meant reg_equivs and moved this pass right before
>>> pass_reload (between pass_ira and pass_reload)?  IMHO it's unexpected as those two passes
>>> are closely correlated.  I was expecting to put it somewhere before ira.
>>
>> Yes they are tied together and moving before reload will not work.
>>
>>>
 The vecload pass deletes some insns, and because those insns are already
 deleted when the reload pass frees them, the reload pass segfaults.

 Moving the vecload pass before ira will not make register pairs for lxvp
 in ira, and that will be a problem.
>>>
>>> Could you elaborate the obstacle for moving such pass before pass_ira?
>>>
>>> Basing on the status quo, the lxvp is bundled with OOmode, then I'd expect
>>> we can generate OOmode move (load) and use the components with unspec (or
>>> subreg with Peter's patch) to replace all the previous use places, it looks
>>> doable to me.
>>
>> Moving it before the ira passes, we delete the offset lxv, generate lxvp and replace all
>> the uses, which is what I am doing.  But the registers ira assigns for the offset lxvp
>> are not a register pair; ira picks arbitrary registers, and hence we cannot generate lxvp.
>>
>> For example one lxv is generated with register 32 and other pair is generated
>> with register 45 by ira if we move it before ira passes.
> 
> It generates the following.
>   lxvp %vs32,0(%r4)
> xvf32ger 0,%vs34,%vs32
> xvf32gerpp 0,%vs34,%vs45

What do the RTL insns for these insns look like?

I'd expect you use UNSPEC_MMA_EXTRACT to extract V16QI from the result of lxvp,
the current define_insn_and_split "*vsx_disassemble_pair" should be able to take
care of it further (eg: reg and regoff).

BR,
Kewen

> xxmfacc 0
> stxvp %vs2,0(%r3)
> stxvp %vs0,32(%r3)
> blr
> 
> 
> Instead of vs33 ira generates vs45 if we move before pass_ira.
> 
> Thanks & Regards
> Ajit
> 
>  
>> Thanks & Regards
>> Ajit
>>>
>>

 Placing it after the reload pass is the only solution I see, as ira and reload
 make the register pairs, and the vecload pass can then easily generate
 lxvp.

 Please suggest.
  
> Looking forward to the comments from Segher/David/Peter/Mike etc.
>>>
>>> Still looking forward. :)
>>>
>>> BR,
>>> Kewen
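Kewen's point about the register numbers can be stated as a predicate. The sketch below checks whether two 16-byte lxv loads could be combined into one lxvp after register allocation. It requires the same base register, adjacent 16-byte slots, and destinations that form an even/odd VSX register pair. The types and names are hypothetical. The sketch assumes the non-prefixed DQ-form lxvp, whose displacement must be a multiple of 16. It also assumes the lower-address load targets the even register; which half actually lands in which register depends on endianness and is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one 16-byte vector load (lxv).
struct vec_load {
  int base_reg;         // GPR holding the base address
  std::int64_t offset;  // displacement in bytes
  int vsx_reg;          // destination VSX register number (0..63)
};

// True if LO and HI could be fused into a single lxvp (sketch only).
bool lxvp_candidate(const vec_load &lo, const vec_load &hi) {
  assert(lo.vsx_reg >= 0 && lo.vsx_reg < 64 && hi.vsx_reg >= 0 && hi.vsx_reg < 64);
  return lo.base_reg == hi.base_reg
         && lo.offset % 16 == 0            // DQ-form displacement
         && hi.offset == lo.offset + 16    // adjacent 16-byte slots
         && lo.vsx_reg % 2 == 0            // even register starts the pair
         && hi.vsx_reg == lo.vsx_reg + 1;  // odd register completes it
}
```

With the registers from Ajit's example, vs32 paired with vs33 qualifies, while vs32 paired with vs45 (what ira produced) does not.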



[PATCH RFA (libstdc++)] c++: partial ordering of object parameter [PR53499]

2023-12-05 Thread Jason Merrill
Tested x86_64-pc-linux-gnu.  Are the library test changes OK?  A reduced
example of the issue is at https://godbolt.org/z/cPxrcnKjG

-- 8< --

Looks like we implemented option 1 (skip the object parameter) for CWG532
before the issue was resolved, and never updated to the final resolution of
option 2 (model it as a reference).  More recently CWG2445 extended this
handling to static member functions; I think that's wrong, and have
opened CWG2834 to address that and how explicit object member functions
interact with it.

The FIXME comments are to guide how the explicit object member function
support should change the uses of DECL_NONSTATIC_MEMBER_FUNCTION_P.

The library testsuite changes are to make partial ordering work again
between the generic operator- in the testcase and
_Pointer_adapter::operator-.
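The C++17 [temp.func.order]/3 rule quoted in the new comment in more_specialized_fn can be restated as a small function. It computes the type of the inserted object parameter when only one template is a non-static member. This models the standard wording for illustration only, with types represented as strings. It is not how pt.cc builds the parameter, and the static-member handling discussed under CWG2834 is not modelled.

```cpp
#include <cassert>
#include <string>

enum class ref_qual { none, lvalue, rvalue };

// CV_CLASS is "cv A" spelled out, e.g. "A" or "const A".  The new first
// parameter is an rvalue reference if M is &&-qualified, or if M has no
// ref-qualifier and the other template's first parameter is an rvalue
// reference; otherwise it is an lvalue reference.
std::string inserted_object_param(const std::string &cv_class, ref_qual m_rqual,
                                  bool other_first_param_is_rvalue_ref) {
  bool rvalue = m_rqual == ref_qual::rvalue
                || (m_rqual == ref_qual::none && other_first_param_is_rvalue_ref);
  return cv_class + (rvalue ? "&&" : "&");
}
```

For example, an unqualified const member compared against a non-member whose first parameter is T& gets a new first parameter of type "const A&".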

DR 532
PR c++/53499

gcc/cp/ChangeLog:

* pt.cc (more_specialized_fn): Fix object parameter handling.

gcc/testsuite/ChangeLog:

* g++.dg/template/partial-order4.C: New test.
* g++.dg/template/spec26.C: Adjust for CWG532.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/vector/ext_pointer/types/1.cc
* testsuite/23_containers/vector/ext_pointer/types/2.cc
(N::operator-): Make less specialized.
---
 gcc/cp/pt.cc  | 68 ++-
 .../g++.dg/template/partial-order4.C  | 17 +
 gcc/testsuite/g++.dg/template/spec26.C| 10 +--
 .../vector/ext_pointer/types/1.cc |  4 +-
 .../vector/ext_pointer/types/2.cc |  4 +-
 5 files changed, 78 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/partial-order4.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 924a20973b4..4b2af4f7aca 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -25218,27 +25218,61 @@ more_specialized_fn (tree pat1, tree pat2, int len)
   bool lose1 = false;
   bool lose2 = false;
 
-  /* Remove the this parameter from non-static member functions.  If
- one is a non-static member function and the other is not a static
- member function, remove the first parameter from that function
- also.  This situation occurs for operator functions where we
- locate both a member function (with this pointer) and non-member
- operator (with explicit first operand).  */
-  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1))
+  /* C++17 [temp.func.order]/3 (CWG532)
+
+ If only one of the function templates M is a non-static member of some
+ class A, M is considered to have a new first parameter inserted in its
+ function parameter list. Given cv as the cv-qualifiers of M (if any), the
+ new parameter is of type "rvalue reference to cv A" if the optional
+ ref-qualifier of M is && or if M has no ref-qualifier and the first
+ parameter of the other template has rvalue reference type. Otherwise, the
+ new parameter is of type "lvalue reference to cv A".  */
+
+  if (DECL_STATIC_FUNCTION_P (decl1) || DECL_STATIC_FUNCTION_P (decl2))
 {
-  len--; /* LEN is the number of significant arguments for DECL1 */
-  args1 = TREE_CHAIN (args1);
-  if (!DECL_STATIC_FUNCTION_P (decl2))
-   args2 = TREE_CHAIN (args2);
-}
-  else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2))
-{
-  args2 = TREE_CHAIN (args2);
-  if (!DECL_STATIC_FUNCTION_P (decl1))
+  /* Note C++20 DR2445 extended the above to static member functions, but
+     I think the old G++ behavior of just skipping the object
+     parameter when comparing to a static member function was better, so
+     let's stick with that for now.  This is CWG2834.  --jason 2023-12 */
+  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1)) /* FIXME or explicit */
{
- len--;
+ len--; /* LEN is the number of significant arguments for DECL1 */
  args1 = TREE_CHAIN (args1);
}
+  else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2)) /* FIXME or explicit */
+   args2 = TREE_CHAIN (args2);
+}
+  else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1) /* FIXME implicit only */
+  && DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2))
+{
+  /* Note DR2445 also (IMO wrongly) removed the "only one" above, which
+would break e.g.  cpp1y/lambda-generic-variadic5.C.  */
+  len--;
+  args1 = TREE_CHAIN (args1);
+  args2 = TREE_CHAIN (args2);
+}
+  else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (decl1) /* FIXME implicit only */
+  || DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2))
+{
+  /* The other is a non-member or explicit object member function;
+rewrite the implicit object parameter to a reference.  */
+  tree ns = DECL_NONSTATIC_MEMBER_FUNCTION_P (decl2) ? decl2 : decl1;
+  tree &nsargs = ns == decl2 ? args2 : args1;
+  tree obtype = TREE_TYPE (TREE_VALUE (nsargs));
+
+  nsargs = TREE_CHAIN (nsargs);
+
+  cp_ref_qualifier rqual = type_memfn_rqual (TREE_TYPE (ns));
+  if (rqual == 

Re: [PATCH] Don't vectorize when vector stmts are only vec_contruct and stores

2023-12-05 Thread Hongtao Liu
On Mon, Dec 4, 2023 at 10:10 PM Richard Biener
 wrote:
>
> On Mon, Dec 4, 2023 at 6:32 AM liuhongt  wrote:
> >
> > i.e. for cases like the following:
> >a[0] = b1;
> >a[1] = b2;
> >..
> >a[n] = bn;
> >
> > There are extra dependences when constructing the vector, but not for
> > the scalar stores.  According to experiments, vectorizing is generally worse.
> >
> > The patch adds a cut-off heuristic for when the vector stmts are just
> > vec_construct and vector store.  It improves SPEC2017 a little bit.
> >
> > BenchMarks          Ratio
> > 500.perlbench_r      2.60%
> > 502.gcc_r            0.30%
> > 505.mcf_r            0.40%
> > 520.omnetpp_r       -1.00%
> > 523.xalancbmk_r      0.90%
> > 525.x264_r           0.00%
> > 531.deepsjeng_r      0.30%
> > 541.leela_r          0.90%
> > 548.exchange2_r      3.20%
> > 557.xz_r             1.40%
> > 503.bwaves_r         0.00%
> > 507.cactuBSSN_r      0.00%
> > 508.namd_r           0.30%
> > 510.parest_r         0.00%
> > 511.povray_r         0.20%
> > 519.lbm_r           SAME BIN
> > 521.wrf_r           -0.30%
> > 526.blender_r       -1.20%
> > 527.cam4_r          -0.20%
> > 538.imagick_r        4.00%
> > 544.nab_r            0.40%
> > 549.fotonik3d_r      0.00%
> > 554.roms_r           0.00%
> > Geomean-int          0.90%
> > Geomean-fp           0.30%
> > Geomean-all          0.50%
> >
> > And
> > Regressed testcases:
> >
> > gcc.target/i386/part-vect-absneghf.c
> > gcc.target/i386/part-vect-copysignhf.c
> > gcc.target/i386/part-vect-xorsignhf.c
> >
> > Regressed under -m32 since it generates 2 vector
> > .ABS/NEG/XORSIGN/COPYSIGN vs the original 1 64-bit vec_construct.  The
> > original testcases are used to test vectorization capability for
> > .ABS/NEG/XORSIGN/COPYSIGN, so just restrict the testcases to TARGET_64BIT.
> >
> > gcc.target/i386/pr111023-2.c
> > gcc.target/i386/pr111023.c
> > Regressed under -m32
> >
> > testcase as below
> >
> > void
> > v8hi_v8qi (v8hi *dst, v16qi src)
> > {
> >   short tem[8];
> >   tem[0] = src[0];
> >   tem[1] = src[1];
> >   tem[2] = src[2];
> >   tem[3] = src[3];
> >   tem[4] = src[4];
> >   tem[5] = src[5];
> >   tem[6] = src[6];
> >   tem[7] = src[7];
> >   dst[0] = *(v8hi *) tem;
> > }
> >
> > Under a 64-bit target, the vectorizer realizes it's just a permutation of
> > the original src vector, but under -m32, the vectorizer relies on
> > vec_construct for vectorization.  I think optimization for this case
> > under a 32-bit target may not matter much, so just add
> > -fno-vect-cost-model.
> >
> > gcc.target/i386/pr91446.c: This testcase guards the cost model of
> > vector stores, not vectorization capability, so just adjust the testcase.
> >
> > gcc.target/i386/pr108938-3.c: This testcase relies on vec_construct to
> > optimize for bswap; as with other such optimizations, the vectorizer cannot
> > account for optimizations that happen after it.  So the current solution is
> > to add -fno-vect-cost-model to the testcase.
> >
> > costmodel-pr104582-1.c
> > costmodel-pr104582-2.c
> > costmodel-pr104582-4.c
> >
> > Failed since it's now not vectorized; looking at the PR, that's exactly
> > what's wanted, so adjust the testcases to scan-tree-dump-not.
> >
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
>
> So the original motivation to not more aggressively prune
> store-from-CTOR vectorization in the vectorizer itself is that
> the vector store is possibly better for STLF (larger stores are
> good, larger loads eventually problematic).

That's exactly what I was worried about, but I didn't observe any STLF
stall in SPEC2017; I'll try with more benchmarks.
On the other hand, the cost model is not suitable for solving this
problem; at best it only circumvents part of it.
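The cut-off in the quoted patch gives up on a store group when every vector statement it would produce is either a vec_construct or a vector store. That predicate can be sketched as follows. The enum and function names are hypothetical and do not correspond to the i386 cost-model hooks. Richard's alternative, checking the SLP node for a vect_external_def child, is not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of the vector statements for one store group.
enum class vstmt_kind { vec_construct, vector_store, vector_load, vector_op };

// True when the scalars would merely be packed into a vector and stored,
// i.e. the case the heuristic refuses to vectorize.
bool only_construct_and_store(const std::vector<vstmt_kind> &stmts) {
  if (stmts.empty())
    return false;
  for (vstmt_kind kind : stmts)
    if (kind != vstmt_kind::vec_construct && kind != vstmt_kind::vector_store)
      return false;
  return true;
}
```

A group like a[0] = b1; ... a[n] = bn; maps to {vec_construct, vector_store} and is rejected. A group that also contains a vector load or arithmetic is still left to the normal cost comparison.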

>
> I'd also expect the costs to play out to not make those profitable.
>
> OTOH, if you have a series of 'double' stores you can convert to
> a series of V2DF stores you _may_ be faster if this reduces
> pressure on the store unit.  Esp. V2DF is cheap to construct
> with one movhpd.

>
> So I don't think we want to try to pattern match it this way?
>
> In fact the SLP vectorization cases could all arrive with an
> SLP node specified (vectorizable_store would have to be
> changed here), which means you could check for an
> vect_external_def child instead?
>
> But as said, I would hope that we can arrive at a better way
> assessing the CONSTRUCTOR cost.  IMHO one big issue
> is that load and store cost are comparatively high compared
> to simple stmt ops so it's very hard to offset saving many
> stores with "ops".  That's because we generally think of
> 'cost' to model latency but as you say stores don't really
> have latency - we only have store bandwidth of the store

Yes.

> unit and of course issue width (but that's true for other ops
> as well).  I wonder what happens if we set both scalar and
> vector store cost to zero?  Or maybe one (to count one
> issue slot)?

I tried to reduce the cost of the scalar store, but it regressed in 

[PATCH v8] Introduce attribute sym_alias

2023-12-05 Thread Alexandre Oliva


Here's an improved version that fixes some cases of making static local
names visible through sym_alias, detection of symbol name clashes when
sym_alias is registered before a clashing definition ("sym name"
attributes are now introduced to enable sym_alias-created declarations
to be identified), and aliases to typeinfo sym_alias names, with tests
adjusted and extended to match.

I've retained comments for the desired create_alias calls that didn't
work, instead of create_same_body_alias for functions, and for the
create_extra_name_alias calls I used to issue for variables, mainly to
draw attention to the fact that some of these calls, found undesirable
in earlier iterations, are still there, hoping that we can keep them
this way rather than work out some other way to introduce this feature.

Regstrapped on x86_64-linux-gnu, also tested on arm-eabi.  Ok to
install?


Introduce attribute sym_alias

This patch introduces an attribute to add extra asm names (aliases)
for a decl when its definition is output.  The main goal is to ease
interfacing C++ with Ada, as C++ mangled names have to be named, and
in some cases (e.g. when using stdint.h typedefs in function
arguments) the symbol names may vary across platforms.
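Here is a minimal usage sketch, assuming the spelling __attribute__((sym_alias("name"))) used by this patch series. The macro checks __has_attribute, so the file still compiles with compilers that do not provide the attribute; in that case no alias is emitted. The namespace demo, the function scale and the alias demo_scale are hypothetical. The int64_t parameter shows the motivating case. With the Itanium ABI, the mangled name is _ZN4demo5scaleEl where int64_t is long, and _ZN4demo5scaleEx where it is long long. The alias name stays the same on both.

```cpp
#include <cassert>
#include <cstdint>

// Apply sym_alias only when the compiler reports support for it.
#if defined(__has_attribute)
#  if __has_attribute(sym_alias)
#    define SYM_ALIAS(name) __attribute__((sym_alias(name)))
#  endif
#endif
#ifndef SYM_ALIAS
#  define SYM_ALIAS(name)
#endif

namespace demo {
// With the attribute, "demo_scale" becomes an extra asm name for this
// function, which an Ada binding could import without knowing the
// platform-dependent mangled name.
SYM_ALIAS("demo_scale")
std::int64_t scale(std::int64_t x) {
  return 2 * x;
}
}  // namespace demo
```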

The attribute is usable in C and C++, presumably in all C-family
languages.  It can be attached to global variables and functions, and
also to local static variables.  In C++, it can also be attached to
class types, namespace-scoped variables and functions, static data
members, member functions, explicit instantiations and specializations
of template functions, members and classes.

When applied to constructors or destructors, additional sym aliases
with _Base and _Del suffixes are defined for variants other than
complete-object ones.  This changes the assumption that clones always
carry the same attributes as their abstract declarations, so there is
now a function to adjust them.

C++ also had a bug in which attributes from local extern declarations
failed to be propagated to a preexisting corresponding
namespace-scoped decl.  I've fixed that, and adjusted acc tests that
distinguished between C and C++ in this regard.

Applying the attribute to class types is only valid in C++, and the
effect is to attach the alias to the RTTI object associated with the
class type.


for  gcc/ChangeLog

* attribs.cc: Include cgraph.h.
(decl_attributes): Allow late introduction of sym_alias in
types.
(create_sym_alias_decl, create_sym_alias_decls): New.
* attribs.h: Declare them.
(FOR_EACH_SYM_ALIAS): New macro.
* cgraph.cc (cgraph_node::create): Create sym_alias decls.
* varpool.cc (varpool_node::get_create): Create sym_alias
decls.
* cgraph.h (symtab_node::remap_sym_alias_target): New.
* symtab.cc (symtab_node::remap_sym_alias_target): Define.
(symbol_table::insert_to_assembler_name_hash): Check for
symbol name clashes.
(symtab_node::noninterposable_alias): Drop sym_alias
attributes.
* cgraphunit.cc (cgraph_node::analyze): Create alias_target
node if needed.
(analyze_functions): Fixup visibility of implicit alias only
after its node is analyzed.
* doc/extend.texi (sym_alias): Document for variables,
functions and types.

for  gcc/ada/ChangeLog

* doc/gnat_rm/interfacing_to_other_languages.rst: Mention
attribute sym_alias to give RTTI symbols mnemonic names.
* doc/gnat_ugn/the_gnat_compilation_model.rst: Mention
aliases.  Fix incorrect ref to C1 ctor variant.

for  gcc/c-family/ChangeLog

* c-ada-spec.cc (pp_asm_name): Use first sym_alias if
available.
* c-attribs.cc (handle_sym_alias_attribute): New.
(c_common_attribute_table): Add sym_alias.
(handle_copy_attribute): Do not copy sym_alias attribute.

for  gcc/c/ChangeLog

* c-decl.cc (duplicate_decls): Remap sym_alias target.
(finish_decl): Create varpool_node for local static
variables.

for  gcc/cp/ChangeLog

* class.cc (adjust_clone_attributes): New.
(copy_fndecl_with_name, build_clone): Call it.
* cp-tree.h (adjust_clone_attributes): Declare.
(update_sym_alias_interface): Declare.
(update_tinfo_sym_alias): Declare.
* decl.cc (duplicate_decls): Remap sym_alias target.
Adjust clone attributes.
(grokfndecl): Tentatively create sym_alias decls after
adding attributes in e.g. a template member function explicit
instantiation.
* decl2.cc (cplus_decl_attributes): Update tinfo sym_alias.
(copy_interface, update_sym_alias_interface): New.
(determine_visibility): Update sym_alias interface.
(tentative_decl_linkage, import_export_decl): Likewise.
* name-lookup.cc: Include target.h and cgraph.h.
(push_local_extern_decl_alias): Merge attributes with
namespace-scoped decl, and 

[PATCH] c-family: Fix ICE with large column number after restoring a PCH [PR105608]

2023-12-05 Thread Lewis Hyatt
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105608

There are two related issues here really, a regression since GCC 11 where we
can ICE after restoring a PCH, and a deeper issue with bogus locations
assigned to macros that were defined prior to restoring a PCH.  This patch
fixes the ICE regression with a simple change, and I think it's appropriate
for GCC 14 as well as backport to 11, 12, 13. The bad locations (wrong, but
not generally causing an ICE, and mostly affecting only the output of
-Wunused-macros) are not as problematic, and will be harder to fix. I could
take a stab at that for GCC 15. In the meantime the patch adds XFAILed
tests for the wrong locations (as well as passing tests for the regression
fix). Does it look OK please? Bootstrap + regtest all languages on x86-64
Linux. Thanks!

-Lewis

-- >8 --

Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH.  However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.

This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
The new testcase line-map-3.C contains XFAILed examples where the locations
are wrong.

This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long.  This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.
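The ordering problem can be shown with a toy model. The line-map list below rejects an LC_RENAME while no map (and therefore no LC_ENTER) has been recorded since the reset, mirroring the assertion described above. toy_line_maps and restore_pch are hypothetical. They only mirror the order of operations, not libcpp's data structures.

```cpp
#include <cassert>
#include <vector>

enum class lc_reason { enter, rename };

struct toy_line_maps {
  std::vector<lc_reason> maps;

  // Returns false where the real line table would hit its assertion:
  // an LC_RENAME arriving before any LC_ENTER.
  bool add(lc_reason reason) {
    if (reason == lc_reason::rename && maps.empty())
      return false;
    maps.push_back(reason);
    return true;
  }
};

// Simulates the end of restoring a PCH.  The table starts empty because
// the restore resets it, and redefining the saved macros is assumed to
// need an LC_RENAME (e.g. for a sufficiently long line).
bool restore_pch(bool enter_before_macros) {
  toy_line_maps table;
  bool ok = true;
  if (enter_before_macros)
    ok = table.add(lc_reason::enter) && ok;
  ok = table.add(lc_reason::rename) && ok;  // redefine the saved macros
  if (!enter_before_macros)
    ok = table.add(lc_reason::enter) && ok;
  return ok;
}
```

restore_pch(false) models the old order and fails. restore_pch(true) models the patched order and succeeds.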

gcc/c-family/ChangeLog:

PR preprocessor/105608
* c-pch.cc (c_common_read_pch): Start a new line map before asking
libcpp to restore macros defined prior to reading the PCH, instead
of afterward.

gcc/testsuite/ChangeLog:

PR preprocessor/105608
* g++.dg/pch/line-map-1.C: New test.
* g++.dg/pch/line-map-1.Hs: New test.
* g++.dg/pch/line-map-2.C: New test.
* g++.dg/pch/line-map-2.Hs: New test.
* g++.dg/pch/line-map-3.C: New test.
* g++.dg/pch/line-map-3.Hs: New test.
---
 gcc/c-family/c-pch.cc  |  5 ++---
 gcc/testsuite/g++.dg/pch/line-map-1.C  |  4 
 gcc/testsuite/g++.dg/pch/line-map-1.Hs |  1 +
 gcc/testsuite/g++.dg/pch/line-map-2.C  |  6 ++
 gcc/testsuite/g++.dg/pch/line-map-2.Hs |  1 +
 gcc/testsuite/g++.dg/pch/line-map-3.C  | 23 +++
 gcc/testsuite/g++.dg/pch/line-map-3.Hs |  1 +
 7 files changed, 38 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pch/line-map-1.C
 create mode 100644 gcc/testsuite/g++.dg/pch/line-map-1.Hs
 create mode 100644 gcc/testsuite/g++.dg/pch/line-map-2.C
 create mode 100644 gcc/testsuite/g++.dg/pch/line-map-2.Hs
 create mode 100644 gcc/testsuite/g++.dg/pch/line-map-3.C
 create mode 100644 gcc/testsuite/g++.dg/pch/line-map-3.Hs

diff --git a/gcc/c-family/c-pch.cc b/gcc/c-family/c-pch.cc
index 2f014fca210..9ee6f179002 100644
--- a/gcc/c-family/c-pch.cc
+++ b/gcc/c-family/c-pch.cc
@@ -342,6 +342,8 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
   gt_pch_restore (f);
   cpp_set_line_map (pfile, line_table);
   rebuild_location_adhoc_htab (line_table);
+  line_table->trace_includes = saved_trace_includes;
+  linemap_add (line_table, LC_ENTER, 0, saved_loc.file, saved_loc.line);
 
   timevar_push (TV_PCH_CPP_RESTORE);
   if (cpp_read_state (pfile, name, f, smd) != 0)
@@ -355,9 +357,6 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
 
   fclose (f);
 
-  line_table->trace_includes = saved_trace_includes;
-  linemap_add (line_table, LC_ENTER, 0, saved_loc.file, saved_loc.line);
-
   /* Give the front end a chance to take action after a PCH file has
  been loaded.  */
   if (lang_post_pch_load)
diff --git a/gcc/testsuite/g++.dg/pch/line-map-1.C b/gcc/testsuite/g++.dg/pch/line-map-1.C
new file mode 100644
index 000..9d1ac6d1683
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/line-map-1.C
@@ -0,0 +1,4 @@
+/* PR preprocessor/105608 */
+/* { dg-do compile } */
+#define MACRO_ON_A_LONG_LINE "this line is long enough that it forces the line table to create an LC_RENAME map, which formerly triggered an ICE after PCH restore"
+#include "line-map-1.H"
diff --git a/gcc/testsuite/g++.dg/pch/line-map-1.Hs b/gcc/testsuite/g++.dg/pch/line-map-1.Hs
new file mode 100644
index 000..3b6178bfae0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/line-map-1.Hs
@@ -0,0 +1 @@
+/* This space intentionally 

Re: [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled

2023-12-05 Thread Hongyu Wang
Uros Bizjak wrote on Tue, Dec 5, 2023 at 18:46:

>
> On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang  wrote:
> >
> > Under APX NDD, the existing TImode allocation has a problem: TImode
> > values were allocated to consecutive register pairs, such as rax:rdi
> > or rdi:rdx, so two pairs can share a register.
> >
> > This causes problems for all TImode NDD patterns. For NDD we do not
> > assume that arithmetic operations like add have a dependency between
> > dest and src1, so the write to the 1st highpart rdi is overwritten by
> > the 2nd lowpart rdi if the 2nd lowpart rdi has a different src as
> > input. The write to the 1st highpart rdi is then lost, causing a
> > miscompilation.
> >
> > To resolve this, under TARGET_APX_NDD we only allow registers with an
> > even regno to be allocated in TImode, so TImode values are allocated
> > to non-overlapping pairs.
>
> Perhaps you could use earlyclobber with __doubleword instructions:
>
> (define_insn_and_split "*add3_doubleword"
>   [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
> (plus:
>   (match_operand: 1 "nonimmediate_operand" "%0,0")
>   (match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
>(clobber (reg:CC FLAGS_REG))]
>
> For the above pattern, you can add earlyclobbered  output
> alternative that guarantees that output won't be allocated to any of
> the input registers.
>

Yes, it does resolve the dest/src overlap issue we hit, thanks!
I tried it and saw no failures in the GCC testsuite or SPEC. I assume
RA handles the case of different src1/src2 correctly.

I will update the V3 patches with the get_attr_isa (insn) == ISA_APX_NDD changes.
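To make the overlap hazard concrete, here is a toy C model (not GCC code): a 128-bit add is performed as a low-half add followed by a high-half add with carry, on a simulated register file. The register indices and the helpers `pairs_overlap`, `add_doubleword` and `add_is_correct` are hypothetical. When the destination pair partially overlaps a source pair, the low-half write clobbers a register that the high-half add still reads. A destination disjoint from both inputs, which is what an earlyclobber (`&`) output alternative guarantees, gives the right sum, and so does a destination fully tied to src1 as in the non-NDD `"0"` alternative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: indices into REGS stand in for hard registers.  */
enum { NREGS = 8 };
static uint64_t regs[NREGS];

/* A 128-bit (TImode-like) value held in a (lo, hi) register pair.  */
struct pair { int lo, hi; };

/* True if any register of A is also a register of B.  An earlyclobber
   output guarantees this is false for the output against every input.  */
static bool pairs_overlap (struct pair a, struct pair b)
{
  return a.lo == b.lo || a.lo == b.hi || a.hi == b.lo || a.hi == b.hi;
}

/* dest = src1 + src2 as two word-sized adds: low half first, then the
   high half plus carry, like a split doubleword add.  */
static void add_doubleword (struct pair dest, struct pair src1,
                            struct pair src2)
{
  uint64_t lo = regs[src1.lo] + regs[src2.lo];
  uint64_t carry = lo < regs[src1.lo];   /* unsigned wrap means carry out */
  regs[dest.lo] = lo;                    /* may clobber src1.hi or src2.hi */
  regs[dest.hi] = regs[src1.hi] + regs[src2.hi] + carry;
}

/* Load src1 = hi1:lo1 and src2 = hi2:lo2, run the add, and report whether
   DEST holds the mathematically correct 128-bit sum.  */
static bool add_is_correct (struct pair dest, struct pair src1,
                            struct pair src2, uint64_t hi1, uint64_t lo1,
                            uint64_t hi2, uint64_t lo2)
{
  regs[src1.lo] = lo1;
  regs[src1.hi] = hi1;
  regs[src2.lo] = lo2;
  regs[src2.hi] = hi2;

  uint64_t want_lo = lo1 + lo2;
  uint64_t want_hi = hi1 + hi2 + (want_lo < lo1);

  add_doubleword (dest, src1, src2);
  return regs[dest.lo] == want_lo && regs[dest.hi] == want_hi;
}
```

With src1 in registers 0:1 and a destination of 1:2, the low-half write lands in register 1 before the high-half add reads it, so the result is wrong; moving the destination to 2:3 fixes it.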


Re: [PATCH] i386: Move vzeroupper pass from after reload pass to after postreload_cse [PR112760]

2023-12-05 Thread Hongtao Liu
On Wed, Dec 6, 2023 at 6:23 AM Jakub Jelinek  wrote:
>
> Hi!
>
> Regardless of the outcome of the REG_UNUSED discussions, I think
> it is a good idea to move the vzeroupper pass one pass later.
> As can be seen in the multiple PRs and as postreload.cc documents,
> reload/LRA is known to create dead statements quite often, which
> is the reason why we have postreload_cse pass at all.
> Doing vzeroupper pass before such cleanup means the pass including
> df_analyze for it needs to process more instructions than needed
> and because mode switching adds note problem, also higher chance of
> having stale REG_UNUSED notes.
> And, I really don't see why vzeroupper can't wait until those cleanups
> are done.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
LGTM.
>
> 2023-12-05  Jakub Jelinek  
>
> PR rtl-optimization/112760
> * config/i386/i386-passes.def (pass_insert_vzeroupper): Insert
> after pass_postreload_cse rather than pass_reload.
> * config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
> Adjust comment for it.
>
> * gcc.dg/pr112760.c: New test.
>
> --- gcc/config/i386/i386-passes.def.jj  2023-01-16 11:52:15.960735877 +0100
> +++ gcc/config/i386/i386-passes.def 2023-12-05 19:15:01.748279329 +0100
> @@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.
> REPLACE_PASS (PASS, INSTANCE, TGT_PASS)
>   */
>
> -  INSERT_PASS_AFTER (pass_reload, 1, pass_insert_vzeroupper);
> +  INSERT_PASS_AFTER (pass_postreload_cse, 1, pass_insert_vzeroupper);
>INSERT_PASS_AFTER (pass_combine, 1, pass_stv, false /* timode_p */);
>/* Run the 64-bit STV pass before the CSE pass so that CONST0_RTX and
>   CONSTM1_RTX generated by the STV pass can be CSEed.  */
> --- gcc/config/i386/i386-features.cc.jj 2023-11-02 07:49:15.029894060 +0100
> +++ gcc/config/i386/i386-features.cc2023-12-05 19:15:48.658620698 +0100
> @@ -2627,10 +2627,11 @@ convert_scalars_to_vector (bool timode_p
>  static unsigned int
>  rest_of_handle_insert_vzeroupper (void)
>  {
> -  /* vzeroupper instructions are inserted immediately after reload to
> - account for possible spills from 256bit or 512bit registers.  The pass
> - reuses mode switching infrastructure by re-running mode insertion
> - pass, so disable entities that have already been processed.  */
> +  /* vzeroupper instructions are inserted immediately after reload and
> + postreload_cse to clean up after it a little bit to account for possible
> + spills from 256bit or 512bit registers.  The pass reuses mode switching
> + infrastructure by re-running mode insertion pass, so disable entities
> + that have already been processed.  */
>for (int i = 0; i < MAX_386_ENTITIES; i++)
>  ix86_optimize_mode_switching[i] = 0;
>
> --- gcc/testsuite/gcc.dg/pr112760.c.jj  2023-12-01 13:46:57.444746529 +0100
> +++ gcc/testsuite/gcc.dg/pr112760.c 2023-12-01 13:46:36.729036971 +0100
> @@ -0,0 +1,22 @@
> +/* PR rtl-optimization/112760 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-dce -fno-guess-branch-probability --param=max-cse-insns=0" } */
> +/* { dg-additional-options "-m8bit-idiv -mavx" { target i?86-*-* x86_64-*-* } } */
> +
> +unsigned g;
> +
> +__attribute__((__noipa__)) unsigned short
> +foo (unsigned short a, unsigned short b)
> +{
> +  unsigned short x = __builtin_add_overflow_p (a, g, (unsigned short) 0);
> +  g -= g / b;
> +  return x;
> +}
> +
> +int
> +main ()
> +{
> +  unsigned short x = foo (40, 6);
> +  if (x != 0)
> +    __builtin_abort ();
> +}
>
> Jakub
>


-- 
BR,
Hongtao


Re: [PATCH] RISC-V: Add vec_init expander for masks [PR112854].

2023-12-05 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-05 23:13
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add vec_init expander for masks [PR112854].
Hi,
 
PR112854 shows a problem on rv32 with zvl1024b.  During the course of
expand_constructor we try to overlay/subreg a 64-element mask by a
scalar (Pmode) register.  This works for zvl512b and its maximum of
32 elements but fails for rv32 and 64 elements.
 
To circumvent this, this patch adds a vec_init expander for vector masks
by initializing a QImode vector and comparing that against 0.  This
also ensures we don't do element initialization of masks.
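The approach can be pictured with a scalar C model (purely illustrative, not the RVV expander): first materialize one byte per element, then derive each mask bit by comparing that byte against zero. The helper names `mask_from_qi_init` and `mask_fits_in_scalar` are hypothetical, and the model keeps the mask in a `uint64_t` only for demonstration; the second helper just shows why a 64-element mask cannot be overlaid on a single 32-bit rv32 scalar register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1: materialize one byte per element (the QImode vector).
   Step 2: mask bit I is set iff byte I != 0 (a compare against 0).  */
static uint64_t mask_from_qi_init (const int8_t *elems, size_t n)
{
  int8_t qi[64];
  uint64_t mask = 0;

  assert (n <= 64);
  for (size_t i = 0; i < n; i++)
    qi[i] = elems[i];
  for (size_t i = 0; i < n; i++)
    if (qi[i] != 0)
      mask |= (uint64_t) 1 << i;
  return mask;
}

/* Whether an NUNITS-element mask, one bit per element, fits in a single
   XLEN-bit scalar register.  */
static int mask_fits_in_scalar (size_t nunits, unsigned xlen)
{
  return nunits <= xlen;
}
```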
 
Regards
Robin
 
gcc/ChangeLog:
 
PR target/112854
 
* config/riscv/autovec.md (vec_initqi): New expander.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr112854.c: New test.
---
gcc/config/riscv/autovec.md  | 16 
.../gcc.target/riscv/rvv/autovec/pr112854.c  | 12 
2 files changed, 28 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112854.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3c4d68367f0..65ab76b3e0c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -394,6 +394,22 @@ (define_expand "vec_init"
   }
)
+;; Provide a vec_init for mask registers by initializing
+;; a QImode vector and comparing it against 0.
+(define_expand "vec_initqi"
+  [(match_operand:VB 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_VECTOR"
+  {
+    machine_mode qimode = riscv_vector::get_vector_mode
+      (QImode, GET_MODE_NUNITS (mode)).require ();
+    rtx tmp = gen_reg_rtx (qimode);
+    riscv_vector::expand_vec_init (tmp, operands[1]);
+    riscv_vector::expand_vec_cmp (operands[0], NE, tmp, CONST0_RTX (qimode));
+    DONE;
+  }
+)
+
;; Slide an RVV vector left and insert a scalar into element 0.
(define_expand "vec_shl_insert_"
   [(match_operand:VI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112854.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112854.c
new file mode 100644
index 000..8f7f13f9dc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112854.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv_zvl1024b -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax" } */
+
+short a, b;
+void c(int d) {
+  for (; a; a--) {
+    b = 0;
+    for (; b <= 8; b++)
+      if (d)
+        break;
+  }
+}
-- 
2.43.0
 
 


Re: [PATCH v5] Introduce strub: machine-independent stack scrubbing

2023-12-05 Thread Alexandre Oliva
On Dec  5, 2023, Alexandre Oliva  wrote:

> I intend to install this as part of the monster patch upthread.

I tweaked it a little further, so that exceptions don't mess with the
pattern counts, and extended the same anti-vrp measure to the other
strub-const tests, even though they weren't affected.

I also had to tweak strub-ptrfn2.c, because of the recent
warning-to-error changes.

Finally, the ChangeLog checker noticed that this entry was no longer
applicable:

>   * multiple_target.cc (pass_target_clone::gate): Test seen_error.

I'd duplicated exactly the fix for ipa/107897, and didn't realize it had
been fixed independently.

I'm reposting only the parts of the final patch pertaining to the
modified test files below, to spare you all yet another copy of the
monster patch.  The whole thing is r14-6201.  I've also refreshed my
strub repo with the trunk commit (same commit id), with the extra
patchlets for broader strub testing on top of it.

diff --git a/gcc/testsuite/c-c++-common/torture/strub-const1.c b/gcc/testsuite/c-c++-common/torture/strub-const1.c
new file mode 100644
index 0..5e956cb1a9b6b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/strub-const1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=strict -fdump-ipa-strub" } */
+
+/* Check that, along with a strub const function call, we issue an asm
+   statement to make sure the watermark passed to it is held in memory before
+   the call, and another to make sure it is not assumed to be unchanged.  f
+   should not be inlined into g, but if it were too simple it might be folded
+   by interprocedural value-range propagation.  */
+
+extern int __attribute__ ((__strub__ ("callable"),
+  __const__, __nothrow__)) c ();
+
+int __attribute__ ((__strub__, __const__))
+f () {
+  return c ();
+}
+
+int
+g () {
+  return f ();
+}
+
+/* { dg-final { scan-ipa-dump-times "__asm__" 2 "strub" } } */
diff --git a/gcc/testsuite/c-c++-common/torture/strub-const2.c b/gcc/testsuite/c-c++-common/torture/strub-const2.c
new file mode 100644
index 0..73d650292dfbf
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/strub-const2.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=strict -fdump-ipa-strub" } */
+
+/* Check that, along with a strub implicitly-const function call, we issue an
+   asm statement to make sure the watermark passed to it is held in memory
+   before the call, and another to make sure it is not assumed to be
+   unchanged.  */
+
+extern int __attribute__ ((__strub__ ("callable"),
+  __const__, __nothrow__)) c ();
+
+int __attribute__ ((__strub__))
+#if ! __OPTIMIZE__
+__attribute__ ((__const__))
+#endif
+f () {
+  return c ();
+}
+
+int
+g () {
+  return f ();
+}
+
+/* { dg-final { scan-ipa-dump-times "__asm__" 2 "strub" } } */
diff --git a/gcc/testsuite/c-c++-common/torture/strub-const3.c b/gcc/testsuite/c-c++-common/torture/strub-const3.c
new file mode 100644
index 0..2584f1f974a58
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/strub-const3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=strict -fdump-ipa-strub" } */
+
+/* Check that, along with a strub const wrapping call, we issue an asm statement
+   to make sure the watermark passed to it is held in memory before the call,
+   and another to make sure it is not assumed to be unchanged.  */
+
+extern int __attribute__ ((__strub__ ("callable"),
+  __const__, __nothrow__)) c ();
+
+int __attribute__ ((__strub__ ("internal"), __const__))
+f () {
+  return c ();
+}
+
+/* { dg-final { scan-ipa-dump-times "__asm__" 2 "strub" } } */
diff --git a/gcc/testsuite/c-c++-common/torture/strub-const4.c b/gcc/testsuite/c-c++-common/torture/strub-const4.c
new file mode 100644
index 0..d819f54ec0230
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/strub-const4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=strict -fdump-ipa-strub" } */
+
+/* Check that, along with a strub implicitly-const wrapping call, we issue an
+   asm statement to make sure the watermark passed to it is held in memory
+   before the call, and another to make sure it is not assumed to be
+   unchanged.  */
+
+extern int __attribute__ ((__strub__ ("callable"),
+  __const__, __nothrow__)) c ();
+
+int __attribute__ ((__strub__ ("internal")))
+#if ! __OPTIMIZE__
+__attribute__ ((__const__))
+#endif
+f () {
+  return c ();
+}
+
+/* { dg-final { scan-ipa-dump-times "__asm__" 2 "strub" } } */
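The two asm statements these tests count follow a general GCC inline-asm idiom, sketched below under the assumption of GCC-style extended asm. This illustrates the technique only; it is not the code the strub pass emits, and `callee`, `call_with_watermark` and the value 42 are hypothetical. An empty `asm` with an `"m"` input forces the value to be in memory at that point, and an empty `asm` with a `"+m"` operand makes the compiler assume the memory may have changed, so it cannot reuse a cached value afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a callee that receives the watermark address.  */
static int __attribute__ ((noinline)) callee (const uintptr_t *wm)
{
  return *wm != 0;
}

static int call_with_watermark (void)
{
  uintptr_t watermark = 42;  /* hypothetical initial value */

  /* Empty asm with a memory input: WATERMARK's current value must be
     held in memory at this point, before the call.  */
  __asm__ ("" : : "m" (watermark));

  int r = callee (&watermark);

  /* Empty asm with a read-write memory operand: the compiler must assume
     WATERMARK may have changed, so it cannot rely on the old value.  */
  __asm__ ("" : "+m" (watermark));

  return r && watermark == 42;
}
```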

[...]

diff --git a/gcc/testsuite/c-c++-common/torture/strub-ptrfn2.c b/gcc/testsuite/c-c++-common/torture/strub-ptrfn2.c
new file mode 100644
index 0..ef634d351265f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/strub-ptrfn2.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=relaxed -Wpedantic" } */
+
+/* C++ does not warn about the partial 

Re: [PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-12-05 Thread Alexandre Oliva
Hello, David,

On Dec  5, 2023, David Edelsohn  wrote:

> This patch broke bootstrap on AIX.  The stage1 compiler is not able to
> build a program and stage2 configure fails.

Thanks for the report, and sorry about the breakage.

If the patch makes any difference, this suggests that __GXX_WEAK__ is
defined on AIX, but that we can't rely on a weak undefined symbol for
this purpose.  Back to the drawing board...  I'm reverting this for now.

Maybe we should narrow it down to targets in which weak undefined
symbols are available with the expected semantics, and where the symbol
is known to have ever been defined in libc.  On it...

Or maybe a weak definition (or weak alias to a definition) in that file
would enable us to test whether the weak definition was preempted, and
use it if so.  Or even move the fallback definition into the weak
symbol.
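Both options can be sketched in C, assuming an ELF target and GCC's `weak` attribute (the breakage report shows AIX does not give weak undefined references these semantics). The names `optional_impl`, `fallback_hook` and `dispatch` are hypothetical stand-ins, not the libsupc++ symbols. A weak undefined reference resolves to a null address when nothing defines it, so it can be tested at run time; a weak definition provides a fallback that a strong definition in another object would preempt.

```c
#include <assert.h>
#include <stddef.h>

/* Option 1 (hypothetical name): weak undefined reference.  If nothing
   defines optional_impl, its address resolves to null at link time.  */
extern int optional_impl (int) __attribute__ ((weak));

/* Option 2 (hypothetical name): weak definition.  A strong definition of
   fallback_hook in another object would preempt this one.  */
int __attribute__ ((weak)) fallback_hook (int x)
{
  return -x;  /* fallback behavior */
}

static int dispatch (int x)
{
  if (optional_impl != NULL)  /* true only if a definition was linked in */
    return optional_impl (x);
  return fallback_hook (x);
}
```

In this single translation unit nothing defines `optional_impl`, so `dispatch` takes the fallback path.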

Thanks again,

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH] RISC-V: Remove xfail from ssa-fre-3.c testcase

2023-12-05 Thread Edwin Lu
Ran the test case at 122e7b4f9d0c2d54d865272463a1d812002d0a5c, where the xfail
was introduced. The test passed at that hash and has continued to pass since
then, so remove the xfail.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-fre-3.c: Remove xfail.

Signed-off-by: Edwin Lu 
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
index 224dd4f72ef..b2924837a22 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
@@ -18,4 +18,4 @@ foo (int a, int b)
   return aa + bb;
 }
 
-/* { dg-final { scan-tree-dump "Replaced \\\(int\\\) aa_.*with a_" "fre1" { xfail { riscv*-*-* && lp64 } } } } */
+/* { dg-final { scan-tree-dump "Replaced \\\(int\\\) aa_.*with a_" "fre1" } } */
-- 
2.34.1



Re: [PATCH] libstdc++: implement std::generator

2023-12-05 Thread Jonathan Wakely
On Sat, 18 Nov 2023 at 19:50, Arsen Arsenović  wrote:
>
> libstdc++-v3/ChangeLog:
>
> * include/Makefile.am: Install std/generator, bits/elements_of.h
> as freestanding.
> * include/Makefile.in: Regenerate.
> * include/bits/version.def: Add __cpp_lib_generator.
> * include/bits/version.h: Regenerate.
> * include/precompiled/stdc++.h: Include .
> * include/std/ranges: Include bits/elements_of.h
> * include/bits/elements_of.h: New file.
> * include/std/generator: New file.
> * testsuite/24_iterators/range_generators/01.cc: New test.
> * testsuite/24_iterators/range_generators/02.cc: New test.
> * testsuite/24_iterators/range_generators/copy.cc: New test.
> * testsuite/24_iterators/range_generators/except.cc: New test.
> * testsuite/24_iterators/range_generators/synopsis.cc: New test.
> * testsuite/24_iterators/range_generators/subrange.cc: New test.
> ---
> Evening,
>
> This is an implementation of  from C++23.  It should be
> feature-complete, though it doesn't have all the tests that it ought to
> and is missing a few tweaks.
>
> Posting to get reviews in the meanwhile, in case something obvious was
> missed.
>
> Have a lovely night :-)
>
>  libstdc++-v3/include/Makefile.am  |   2 +
>  libstdc++-v3/include/Makefile.in  |   2 +
>  libstdc++-v3/include/bits/elements_of.h   |  72 ++
>  libstdc++-v3/include/bits/version.def |   9 +
>  libstdc++-v3/include/bits/version.h   |  11 +
>  libstdc++-v3/include/precompiled/stdc++.h |   1 +
>  libstdc++-v3/include/std/generator| 820 ++
>  libstdc++-v3/include/std/ranges   |   4 +
>  .../24_iterators/range_generators/01.cc   |  55 ++
>  .../24_iterators/range_generators/02.cc   | 219 +
>  .../24_iterators/range_generators/copy.cc |  97 +++
>  .../24_iterators/range_generators/except.cc   |  97 +++
>  .../24_iterators/range_generators/subrange.cc |  45 +
>  .../24_iterators/range_generators/synopsis.cc |  38 +
>  14 files changed, 1472 insertions(+)
>  create mode 100644 libstdc++-v3/include/bits/elements_of.h
>  create mode 100644 libstdc++-v3/include/std/generator
>  create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/01.cc
>  create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/02.cc
>  create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/copy.cc
>  create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/except.cc
>  create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/subrange.cc
>  create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/synopsis.cc
>
> diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
> index 17d9d9cec313..0b764f2b8a9e 100644
> --- a/libstdc++-v3/include/Makefile.am
> +++ b/libstdc++-v3/include/Makefile.am
> @@ -35,6 +35,7 @@ std_freestanding = \
> ${std_srcdir}/coroutine \
> ${std_srcdir}/expected \
> ${std_srcdir}/functional \
> +   ${std_srcdir}/generator \
> ${std_srcdir}/iterator \
> ${std_srcdir}/limits \
> ${std_srcdir}/memory \
> @@ -122,6 +123,7 @@ bits_freestanding = \
> ${bits_srcdir}/concept_check.h \
> ${bits_srcdir}/char_traits.h \
> ${bits_srcdir}/cpp_type_traits.h \
> +   ${bits_srcdir}/elements_of.h \
> ${bits_srcdir}/enable_special_members.h \
> ${bits_srcdir}/functexcept.h \
> ${bits_srcdir}/functional_hash.h \
> diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
> index f038af709cc4..7f1a6592942e 100644
> --- a/libstdc++-v3/include/Makefile.in
> +++ b/libstdc++-v3/include/Makefile.in
> @@ -393,6 +393,7 @@ std_freestanding = \
> ${std_srcdir}/coroutine \
> ${std_srcdir}/expected \
> ${std_srcdir}/functional \
> +   ${std_srcdir}/generator \
> ${std_srcdir}/iterator \
> ${std_srcdir}/limits \
> ${std_srcdir}/memory \
> @@ -477,6 +478,7 @@ bits_freestanding = \
> ${bits_srcdir}/concept_check.h \
> ${bits_srcdir}/char_traits.h \
> ${bits_srcdir}/cpp_type_traits.h \
> +   ${bits_srcdir}/elements_of.h \
> ${bits_srcdir}/enable_special_members.h \
> ${bits_srcdir}/functexcept.h \
> ${bits_srcdir}/functional_hash.h \
> diff --git a/libstdc++-v3/include/bits/elements_of.h b/libstdc++-v3/include/bits/elements_of.h
> new file mode 100644
> index ..663e15a94aa7
> --- /dev/null
> +++ b/libstdc++-v3/include/bits/elements_of.h
> @@ -0,0 +1,72 @@
> +// Tag type for yielding ranges rather than values in   -*- C++ -*-
> +
> +// Copyright (C) 2023 Free Software Foundation, Inc.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the

Re: [[PATCH][GCC13] 0/2] Fix combined tree build of GCC 13 with Binutils 2.41

2023-12-05 Thread Indu Bhagat

On 12/5/23 13:45, Jakub Jelinek wrote:
> On Tue, Dec 05, 2023 at 01:36:30PM -0800, Indu Bhagat wrote:
> >
> > To resolve the issue of combined Binutils (2.41) + GCC (13) failing to
> > install (https://sourceware.org/bugzilla/show_bug.cgi?id=31108), we will
> > need some backports.  This specific issue is with using --enable-shared
> > in the combined tree build; it arises due to missing install-*
> > dependencies in the top-level makefiles.
> >
> > I think it makes sense to bring both of the following two commits (from
> > the trunk) to the GCC13 branch:
> >
> > commit eff0e7a4ae31d1e4e64ae37bbc10d073d8579255
> > Author: Indu Bhagat
> > Date:   Wed Jan 18 23:17:49 2023 -0800
> >     toplevel: Makefile.def: add install-strip dependency on libsframe
> >
> > commit dab58c93634bef06fd289f49109b5c370cd5c380
> > Author: Indu Bhagat
> > Date:   Tue Nov 15 15:07:04 2022 -0800
> >     bfd: linker: merge .sframe sections
> >
> > This patch set cherry-picks the above two commits to the GCC13 branch.
> > The patches apply cleanly with no conflicts.
>
> Won't this break building gcc 13 with in-tree older binutils which don't have
> libsframe at all?  I think binutils 2.39 and older don't have it.

I tested with binutils-2_39-branch and releases/gcc-13 as well (with
--enable-shared --disable-bootstrap). It builds and installs fine.


Indu



Re: [PATCH] libstdc++: Add workaround to std::ranges::subrange [PR111948]

2023-12-05 Thread Jonathan Wakely
On Thu, 30 Nov 2023 at 15:52, Jonathan Wakely  wrote:
>
> I think I'll push this to work around the compiler bug. We can revert it
> later if the front end gets fixed.
>
> Tested x86_64-linux. Needed on trunk and gcc-13.

Pushed to trunk now.

>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/111948
> * include/bits/ranges_util.h (subrange): Add constructor to
> _Size to avoid setting member in constructor.
> * testsuite/std/ranges/subrange/111948.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_util.h   | 21 ---
>  .../testsuite/std/ranges/subrange/111948.cc   |  8 +++
>  2 files changed, 21 insertions(+), 8 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/ranges/subrange/111948.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_util.h b/libstdc++-v3/include/bits/ranges_util.h
> index ab6c69c57d0..185e46ec7a9 100644
> --- a/libstdc++-v3/include/bits/ranges_util.h
> +++ b/libstdc++-v3/include/bits/ranges_util.h
> @@ -267,13 +267,21 @@ namespace ranges
>using __size_type
> = __detail::__make_unsigned_like_t>;
>
> -  template
> +  template
> struct _Size
> -   { };
> +   {
> + [[__gnu__::__always_inline__]]
> + constexpr _Size(_Tp = {}) { }
> +   };
>
>template
> struct _Size<_Tp, true>
> -   { _Tp _M_size; };
> +   {
> + [[__gnu__::__always_inline__]]
> + constexpr _Size(_Tp __s = {}) : _M_size(__s) { }
> +
> + _Tp _M_size;
> +   };
>
>[[no_unique_address]] _Size<__size_type> _M_size = {};
>
> @@ -294,11 +302,8 @@ namespace ranges
>noexcept(is_nothrow_constructible_v<_It, decltype(__i)>
>&& is_nothrow_constructible_v<_Sent, _Sent&>)
> requires (_Kind == subrange_kind::sized)
> -  : _M_begin(std::move(__i)), _M_end(__s)
> -  {
> -   if constexpr (_S_store_size)
> - _M_size._M_size = __n;
> -  }
> +  : _M_begin(std::move(__i)), _M_end(__s), _M_size(__n)
> +  { }
>
>template<__detail::__different_from _Rng>
> requires borrowed_range<_Rng>
> diff --git a/libstdc++-v3/testsuite/std/ranges/subrange/111948.cc b/libstdc++-v3/testsuite/std/ranges/subrange/111948.cc
> new file mode 100644
> index 000..dcc64b56def
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/std/ranges/subrange/111948.cc
> @@ -0,0 +1,8 @@
> +// { dg-do compile { target c++20 } }
> +
> +#include 
> +
> +// Bug libstdc++/111948 - subrange modifies a const size object
> +
> +constexpr auto r = std::ranges::subrange(std::views::iota(0), 5);
> +static_assert(std::ranges::distance(r));
> --
> 2.43.0
>
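The shape of the workaround can be sketched outside libstdc++: the possibly-empty size holder gets a constructor in both its storing and non-storing forms, so the enclosing class can initialize it in the mem-initializer list instead of assigning to a member in the constructor body. `SizeHolder` and `IntSpan` are hypothetical names, and this is a simplified illustration, not `ranges::subrange` itself.

```cpp
#include <cassert>
#include <cstddef>

// Non-storing form (hypothetical): empty, but still constructible from a
// size, so the enclosing class can initialize it the same way in both cases.
template<typename T, bool Store>
struct SizeHolder
{
  constexpr SizeHolder(T = {}) { }
};

// Storing form: the size is set by the constructor, never assigned later.
template<typename T>
struct SizeHolder<T, true>
{
  constexpr SizeHolder(T s = {}) : size(s) { }
  T size;
};

template<bool StoreSize>
struct IntSpan
{
  const int* first;
  const int* last;
  [[no_unique_address]] SizeHolder<std::size_t, StoreSize> n = {};

  // N is initialized in the mem-initializer list rather than by writing
  // to one of its members in the constructor body.
  constexpr IntSpan(const int* f, const int* l, std::size_t sz)
    : first(f), last(l), n(sz) { }
};

constexpr int data[] = {1, 2, 3, 4, 5};
constexpr IntSpan<true> sized(data, data + 5, 5);
constexpr IntSpan<false> unsized(data, data + 5, 5);
static_assert(sized.n.size == 5);
```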



Re: [PATCH] libstdc++: Implement LGW 4016 for std::ranges::to

2023-12-05 Thread Jonathan Wakely
On Thu, 30 Nov 2023 at 15:53, Jonathan Wakely wrote:
>
> Before pushing I'll fix the summary to say "LWG" instead of "LGW" (the
> airport code for London Gatwick!)

Pushed to trunk now.

>
> On Thu, 30 Nov 2023 at 15:51, Jonathan Wakely wrote:
> >
> > This hasn't been finally approved by LWG yet, but everybody seems to be
> > in favour of it. I think I'll push this soon.
> >
> > Tested x86_64-linux.
> >
> > -- >8 --
> >
> > This implements the proposed resolution of LWG 4016, so that
> > std::ranges::to does not use std::back_inserter and std::inserter.
> > Instead it inserts at the back of the container directly, using
> > the first supported one of emplace_back, push_back, emplace, and insert.
> >
> > Using emplace avoids creating a temporary that has to be moved into the
> > container, for cases where the source range and the destination
> > container do not have the same value type.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/ranges (__detail::__container_insertable): Remove.
> > (__detail::__container_inserter): Remove.
> > (ranges::to): Use emplace_back or emplace, as per LWG 4016.
> > * testsuite/std/ranges/conv/1.cc (Cont4, test_2_1_4): Check for
> > use of emplace_back and emplace.
> > ---
> >  libstdc++-v3/include/std/ranges |  50 +++
> >  libstdc++-v3/testsuite/std/ranges/conv/1.cc | 149 
> >  2 files changed, 144 insertions(+), 55 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> > index 9d4c2e01c4d..afd0a38e0cf 100644
> > --- a/libstdc++-v3/include/std/ranges
> > +++ b/libstdc++-v3/include/std/ranges
> > @@ -9229,26 +9229,6 @@ namespace __detail
> > { __c.max_size() } -> same_as;
> >};
> >
> > -  template
> > -constexpr bool __container_insertable
> > -  = requires(_Container& __c, _Ref&& __ref) {
> > -   typename _Container::value_type;
> > -   requires (
> > - requires { __c.push_back(std::forward<_Ref>(__ref)); }
> > - || requires { __c.insert(__c.end(), std::forward<_Ref>(__ref)); }
> > -   );
> > -  };
> > -
> > -  template
> > -constexpr auto
> > -__container_inserter(_Container& __c)
> > -{
> > -  if constexpr (requires { __c.push_back(std::declval<_Ref>()); })
> > -   return std::back_inserter(__c);
> > -  else
> > -   return std::inserter(__c, __c.end());
> > -}
> > -
> >template
> >  constexpr bool __toable = requires {
> >requires (!input_range<_Cont>
> > @@ -9301,17 +9281,33 @@ namespace __detail
> >  std::forward<_Args>(__args)...);
> >   else
> > {
> > - using __detail::__container_insertable;
> > - using __detail::__reservable_container;
> >   using _RefT = range_reference_t<_Rg>;
> >   static_assert(constructible_from<_Cont, _Args...>);
> > - static_assert(__container_insertable<_Cont, _RefT>);
> >   _Cont __c(std::forward<_Args>(__args)...);
> > - if constexpr (sized_range<_Rg> && __reservable_container<_Cont>)
> > + if constexpr (sized_range<_Rg>
> > + && __detail::__reservable_container<_Cont>)
> > 
> > __c.reserve(static_cast>(ranges::size(__r)));
> > - auto __ins = __detail::__container_inserter<_RefT>(__c);
> > - for (auto&& __e : __r)
> > -   *__ins++ = std::forward(__e);
> > + // _GLIBCXX_RESOLVE_LIB_DEFECTS
> > + // 4016. container-insertable checks do not match what
> > + // container-inserter does
> > + auto __it = ranges::begin(__r);
> > + const auto __sent = ranges::end(__r);
> > + while (__it != __sent)
> > +   {
> > + if constexpr (requires { __c.emplace_back(*__it); })
> > +   __c.emplace_back(*__it);
> > + else if constexpr (requires { __c.push_back(*__it); })
> > +   __c.push_back(*__it);
> > + else
> > +   {
> > + auto __end = __c.end();
> > + if constexpr (requires { __c.emplace(__end, *__it); })
> > +   __end = __c.emplace(__end, *__it);
> > + else
> > +   __end = __c.insert(__end, *__it);
> > +   }
> > + ++__it;
> > +   }
> >   return __c;
> > }
> > }
> > diff --git a/libstdc++-v3/testsuite/std/ranges/conv/1.cc b/libstdc++-v3/testsuite/std/ranges/conv/1.cc
> > index 4b6814b1add..b5f861dedb3 100644
> > --- a/libstdc++-v3/testsuite/std/ranges/conv/1.cc
> > +++ b/libstdc++-v3/testsuite/std/ranges/conv/1.cc
> > @@ -203,33 +203,51 @@ test_2_1_3()
> >VERIFY( c2.c.get_allocator() == Alloc(78) );
> >  }
> >
> > -template
> > +enum AppendKind 

[committed] libstdc++: Redefine __glibcxx_assert to work in C++23 constexpr

2023-12-05 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. Might be worth backporting too,
but it can wait.

-- >8 --

The changes in r14-5979 to support unknown references in constant
expressions caused some test regressions. The way that __glibcxx_assert
is defined for constant evaluation no longer works when
_GLIBCXX_ASSERTIONS is defined.

This change simplifies __glibcxx_assert so that there is only one check,
rather than a constexpr one and a conditionally-enabled runtime one. The
constexpr one does not need to use __builtin_unreachable to cause a
compilation failure, because __glibcxx_assert_fail is not usable in
constant expressions, so that will cause a failure too.

As well as fixing the regressions, this makes the code for the
assertions shorter and simpler, so should be quicker to compile, and
might inline better too.
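A minimal sketch of the single-check pattern, using hypothetical names (`my_assert_fail`, `MY_ASSERT`, `checked_at`) rather than the real `c++config` macros: the failure handler is an ordinary non-`constexpr` function, so reaching it during constant evaluation is itself an error, while at run time it is just a call. No separate `__builtin_unreachable`-based constexpr check is needed. This assumes GCC or Clang for `__builtin_expect`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical handler, deliberately not constexpr: if constant evaluation
// reaches this call, the enclosing constant expression is ill-formed.
[[noreturn]] inline void
my_assert_fail(const char* file, int line, const char* cond) noexcept
{
  std::fprintf(stderr, "%s:%d: assertion '%s' failed\n", file, line, cond);
  std::abort();
}

// One check that serves both constant evaluation and run time.
#define MY_ASSERT(cond)                                  \
  do {                                                   \
    if (__builtin_expect(!bool(cond), false))            \
      my_assert_fail(__FILE__, __LINE__, #cond);         \
  } while (false)

constexpr int
checked_at(const int* p, int n, int i)
{
  MY_ASSERT(i >= 0 && i < n);
  return p[i];
}

constexpr int table[] = {10, 20, 30};

// Accepted: the failure branch is not evaluated.
static_assert(checked_at(table, 3, 1) == 20);

// Would be rejected, because evaluation reaches my_assert_fail:
// static_assert(checked_at(table, 3, 5) == 0);
```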

libstdc++-v3/ChangeLog:

* include/bits/c++config (__glibcxx_assert_fail): Declare even
when assertions are not enabled.
(__glibcxx_constexpr_assert): Remove macro.
(__glibcxx_assert_impl): Remove macro.
(_GLIBCXX_ASSERT_FAIL): New macro.
(_GLIBCXX_DO_ASSERT): New macro.
(__glibcxx_assert): Simplify to a single definition that works
at runtime and during constant evaluation.
* testsuite/21_strings/basic_string_view/element_access/char/back_constexpr_neg.cc:
Adjust expected errors.
* testsuite/21_strings/basic_string_view/element_access/char/constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/char/front_constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/back_constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/front_constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/modifiers/remove_prefix/debug.cc:
Likewise.
* testsuite/21_strings/basic_string_view/modifiers/remove_suffix/debug.cc:
Likewise.
* testsuite/23_containers/span/back_neg.cc: Likewise.
* testsuite/23_containers/span/front_neg.cc: Likewise.
* testsuite/23_containers/span/index_op_neg.cc: Likewise.
* testsuite/26_numerics/lcm/105844.cc: Likewise.
---
 libstdc++-v3/include/bits/c++config   | 47 +--
 .../element_access/char/back_constexpr_neg.cc |  3 +-
 .../element_access/char/constexpr_neg.cc  |  3 +-
 .../char/front_constexpr_neg.cc   |  3 +-
 .../wchar_t/back_constexpr_neg.cc |  3 +-
 .../element_access/wchar_t/constexpr_neg.cc   |  3 +-
 .../wchar_t/front_constexpr_neg.cc|  3 +-
 .../modifiers/remove_prefix/debug.cc  |  2 +-
 .../modifiers/remove_suffix/debug.cc  |  2 +-
 .../testsuite/23_containers/span/back_neg.cc  |  4 +-
 .../testsuite/23_containers/span/front_neg.cc |  4 +-
 .../23_containers/span/index_op_neg.cc|  4 +-
 .../testsuite/26_numerics/lcm/105844.cc   |  2 +-
 13 files changed, 36 insertions(+), 47 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 410c136e1b1..284d24d933f 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -577,46 +577,41 @@ namespace std
 #undef _GLIBCXX_VERBOSE_ASSERT
 
 // Assert.
-#if defined(_GLIBCXX_ASSERTIONS) \
-  || defined(_GLIBCXX_PARALLEL) || defined(_GLIBCXX_PARALLEL_ASSERTIONS)
-# ifdef _GLIBCXX_VERBOSE_ASSERT
+#ifdef _GLIBCXX_VERBOSE_ASSERT
 namespace std
 {
 #pragma GCC visibility push(default)
-  // Avoid the use of assert, because we're trying to keep the 
-  // include out of the mix.
+  // Don't use  because this should be unaffected by NDEBUG.
   extern "C++" _GLIBCXX_NORETURN
   void
-  __glibcxx_assert_fail(const char* __file, int __line,
-   const char* __function, const char* __condition)
+  __glibcxx_assert_fail /* Called when a precondition violation is detected. */
+(const char* __file, int __line, const char* __function,
+ const char* __condition)
   _GLIBCXX_NOEXCEPT;
 #pragma GCC visibility pop
 }
-#define __glibcxx_assert_impl(_Condition)  \
-  if (__builtin_expect(!bool(_Condition), false))  \
-  {\
-__glibcxx_constexpr_assert(false); \
-std::__glibcxx_assert_fail(__FILE__, __LINE__, __PRETTY_FUNCTION__, \
-  #_Condition);\
-  }
-# else // ! VERBOSE_ASSERT
-# define __glibcxx_assert_impl(_Condition) \
-  if (__builtin_expect(!bool(_Condition), false))  \
-  {\
-__glibcxx_constexpr_assert(false); \
-

Re: [PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-12-05 Thread Andrew Pinski
On Tue, Dec 5, 2023 at 3:15 PM David Edelsohn  wrote:
>
> The error is:
>
> ld: 0711-317 ERROR: Undefined symbol: __cxa_thread_atexit_impl
>
>
> from the new, weak reference.

By the way this seems like the same issue on nvptx too. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112858 which has a
similar analysis as below too.

Thanks,
Andrew

>
>
> Also, earlier in atexit_thread.cc, there is another definition protected by
>
>
> _GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL
>
>
> not utilized by the new reference.
>
>
> Thanks, David
>
>
> On Tue, Dec 5, 2023 at 11:10 AM David Edelsohn  wrote:
>>
>> Alex,
>>
>> This patch broke bootstrap on AIX.  The stage1 compiler is not able to build 
>> a program and stage2 configure fails.
>>
>> Thanks, David
>>


Re: [PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-12-05 Thread David Edelsohn
The error is:

ld: 0711-317 ERROR: Undefined symbol: __cxa_thread_atexit_impl


from the new, weak reference.


Also, earlier in atexit_thread.cc, there is another definition protected by


_GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL


not utilized by the new reference.


Thanks, David

On Tue, Dec 5, 2023 at 11:10 AM David Edelsohn  wrote:

> Alex,
>
> This patch broke bootstrap on AIX.  The stage1 compiler is not able to
> build a program and stage2 configure fails.
>
> Thanks, David
>
>
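For context, the pattern at issue is a weak reference tested for null at run time. It looks roughly like the sketch below, under these assumptions:

- The hook name is hypothetical.
- This is not the libsupc++ code.
- On ELF targets an unresolved weak reference normally resolves to a null address.

As the AIX and nvptx reports in this thread show, not every target's linker accepts such a reference.

```cpp
#include <cassert>

// Hypothetical optional runtime hook, declared weak.  If no definition is
// linked in, an ELF toolchain resolves the reference to a null address.
extern "C" int hypothetical_optional_hook(int) __attribute__((weak));

// Use the hook only if it was resolved at link/load time; otherwise report
// that the caller must fall back to its own implementation.
bool try_optional_hook(int arg, int* result)
{
  if (hypothetical_optional_hook == nullptr)
    return false;
  *result = hypothetical_optional_hook(arg);
  return true;
}
```

On a linker that rejects undefined weak symbols, this fails at link time with an error like the one quoted above, before the null check can ever run.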


Re: Modula-2: Support '-isysroot [...]'

2023-12-05 Thread Gaius Mulley
Thomas Schwinge  writes:

> Hi!
>
> OK to push the attached "Modula-2: Support '-isysroot [...]'"?
>
> This greatly improves test results for the cross configurations I've
> tested, but I don't know if any real handling needs to be implemented, or
> this should be done differently altogether?
>
>
> Regards
>  Thomas
>
>
> -
> Siemens Electronic Design Automation GmbH; Address: Arnulfstraße
> 201, 80634 München; limited liability company (GmbH);
> Managing Directors: Thomas Heurung, Frank Thürauf; Registered office:
> München; Register court München, HRB 106955
>
>>From 0bd30fd25138497df5320e5f63fd04e1b5756cc5 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Tue, 5 Dec 2023 09:54:54 +0100
> Subject: [PATCH] Modula-2: Support '-isysroot [...]'
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> In GCC cross configurations (tested '--target=amdgcn-amdhsa' and
> '--target=nvptx-none') with a sysroot configured, the 'gm2' driver invocations
> are passed '--sysroot=[...]', which is translated into '-isysroot [...]' for
> the 'cc1gm2' compiler invocation.  'cc1gm2', however, complains about the latter:
>
> cc1gm2: warning: command-line option ‘-isysroot [...]’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2
>
> ..., and therefore a ton of FAILs.
>
> Reproducer (also for non-cross, native configurations):
>
> $ build-gcc/gcc/gm2 -Bbuild-gcc/gcc -v --sysroot=/tmp -x modula-2 /dev/null
> [...]
>  build-gcc/gcc/cc1gm2 [...] -isysroot [...]/tmp [...]
> cc1gm2: warning: command-line option ‘-isysroot /tmp’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2
> [...]
>
>   gcc/m2/
>   * lang.opt (-isysroot): New.
> ---
>  gcc/m2/lang.opt | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/m2/lang.opt b/gcc/m2/lang.opt
> index 24f3c6594b9..a60c03e69d4 100644
> --- a/gcc/m2/lang.opt
> +++ b/gcc/m2/lang.opt
> @@ -405,6 +405,10 @@ iquote
>  Modula-2
>  ; Documented in c.opt
>  
> +isysroot
> +Modula-2
> +; Documented in c.opt
> +
>  isystem
>  Modula-2
>  ; Documented in c.opt

Hi Thomas,

Yes indeed, and many thanks for the fix!  gm2-lang.cc anticipates
OPT_isysroot (albeit it does nothing with it yet).

regards,
Gaius


Re: [PATCH] libiberty: Fix build with GCC < 7

2023-12-05 Thread Ian Lance Taylor
On Tue, Dec 5, 2023 at 2:06 PM Jakub Jelinek  wrote:
>
> Ok for trunk (both gcc and binutils)?
>
> 2023-12-05  Jakub Jelinek  
>
> * configure.ac (HAVE_X86_SHA1_HW_SUPPORT): Verify __get_cpuid and
> __get_cpuid_count are not implicitly declared.
> * configure: Regenerated.

This is fine.  Thanks.

Ian


[PATCH] i386: Move vzeroupper pass from after reload pass to after postreload_cse [PR112760]

2023-12-05 Thread Jakub Jelinek
Hi!

Regardless of the outcome of the REG_UNUSED discussions, I think
it is a good idea to move the vzeroupper pass one pass later.
As can be seen in the multiple PRs and as postreload.cc documents,
reload/LRA is known to create dead statements quite often, which
is the reason why we have postreload_cse pass at all.
Running the vzeroupper pass before such cleanup means the pass, including
its df_analyze call, has to process more instructions than needed, and,
because mode switching adds the note problem, there is also a higher chance
of stale REG_UNUSED notes.
And, I really don't see why vzeroupper can't wait until those cleanups
are done.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-05  Jakub Jelinek  

PR rtl-optimization/112760
* config/i386/i386-passes.def (pass_insert_vzeroupper): Insert
after pass_postreload_cse rather than pass_reload.
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Adjust comment for it.

* gcc.dg/pr112760.c: New test.

--- gcc/config/i386/i386-passes.def.jj  2023-01-16 11:52:15.960735877 +0100
+++ gcc/config/i386/i386-passes.def 2023-12-05 19:15:01.748279329 +0100
@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.
REPLACE_PASS (PASS, INSTANCE, TGT_PASS)
  */
 
-  INSERT_PASS_AFTER (pass_reload, 1, pass_insert_vzeroupper);
+  INSERT_PASS_AFTER (pass_postreload_cse, 1, pass_insert_vzeroupper);
   INSERT_PASS_AFTER (pass_combine, 1, pass_stv, false /* timode_p */);
   /* Run the 64-bit STV pass before the CSE pass so that CONST0_RTX and
  CONSTM1_RTX generated by the STV pass can be CSEed.  */
--- gcc/config/i386/i386-features.cc.jj 2023-11-02 07:49:15.029894060 +0100
+++ gcc/config/i386/i386-features.cc2023-12-05 19:15:48.658620698 +0100
@@ -2627,10 +2627,11 @@ convert_scalars_to_vector (bool timode_p
 static unsigned int
 rest_of_handle_insert_vzeroupper (void)
 {
-  /* vzeroupper instructions are inserted immediately after reload to
- account for possible spills from 256bit or 512bit registers.  The pass
- reuses mode switching infrastructure by re-running mode insertion
- pass, so disable entities that have already been processed.  */
+  /* vzeroupper instructions are inserted immediately after reload and
+ postreload_cse to clean up after it a little bit to account for possible
+ spills from 256bit or 512bit registers.  The pass reuses mode switching
+ infrastructure by re-running mode insertion pass, so disable entities
+ that have already been processed.  */
   for (int i = 0; i < MAX_386_ENTITIES; i++)
 ix86_optimize_mode_switching[i] = 0;
 
--- gcc/testsuite/gcc.dg/pr112760.c.jj  2023-12-01 13:46:57.444746529 +0100
+++ gcc/testsuite/gcc.dg/pr112760.c 2023-12-01 13:46:36.729036971 +0100
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/112760 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-dce -fno-guess-branch-probability --param=max-cse-insns=0" } */
+/* { dg-additional-options "-m8bit-idiv -mavx" { target i?86-*-* x86_64-*-* } } */
+
+unsigned g;
+
+__attribute__((__noipa__)) unsigned short
+foo (unsigned short a, unsigned short b)
+{
+  unsigned short x = __builtin_add_overflow_p (a, g, (unsigned short) 0);
+  g -= g / b;
+  return x;
+}
+
+int
+main ()
+{
+  unsigned short x = foo (40, 6);
+  if (x != 0)
+__builtin_abort ();
+}

Jakub



[PATCH] lower-bitint: Fix arithmetics followed by extension by many bits [PR112809]

2023-12-05 Thread Jakub Jelinek
Hi!

A zero or sign extension from result of some upwards_2limb operation
is implemented in lower_mergeable_stmt as an extra loop which fills in
the extra bits with 0s or 1s.
If the delta of extended vs. unextended bit count is small, the code
doesn't use a loop and emits up to a couple of stores to constant indexes,
but if the delta is large, it uses
  cnt = (bo_bit != 0) + 1 + (rem != 0);
statements.  bo_bit is non-zero for bit-field loads and is done in that
case as straight line, the unconditional 1 in there is for a loop which
handles most of the limbs in the delta and finally (rem != 0) is for the
case when the extended precision is not a multiple of limb_prec and is
again done in straight line code (after the loop).
The testcase ICEs because the decision which idx to use was incorrect
for kind == bitint_prec_huge (i.e. when the precision delta is very large)
and rem == 0 (i.e. the extended precision is a multiple of limb_prec).
In that case cnt is either 1 (if bo_bit == 0) or 2.  idx should be either
size_int (start) followed by the result of create_loop (for bo_bit != 0),
or just the result of create_loop.  By mistake, the last case used
size_int (end) instead.  When the precision is a multiple of limb_prec,
this stores above the precision, which ICEs, and it also fails to emit
the loop that is needed.
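A small model of the selection logic described above, as it looks after the fix, may make the three cases easier to follow. It is a hypothetical sketch, not gimple-lower-bitint.cc:

- It covers only the kind == bitint_prec_huge path and the branches shown in the hunk.
- start and end are just illustrative limb indexes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of idx selection for the cnt pieces of a separate
// extension when kind == bitint_prec_huge.
enum class IdxKind { Constant, Loop };

struct Piece {
  IdxKind kind;
  unsigned limb;  // constant limb index, or the loop's starting limb
};

std::vector<Piece> plan_huge_extension(bool bo_bit, bool rem,
                                       unsigned start, unsigned end)
{
  const unsigned first_loop = bo_bit ? 1u : 0u;              // (bo_bit != 0)
  const unsigned cnt = first_loop + 1u + (rem ? 1u : 0u);
  std::vector<Piece> pieces;
  for (unsigned i = 0; i < cnt; ++i) {
    if (i == 0 && bo_bit)
      pieces.push_back({IdxKind::Constant, start + i});      // bit-field part
    else if (i == cnt - 1 && rem)                            // the fix: rem != 0 only
      pieces.push_back({IdxKind::Constant, end});            // partial last limb
    else if (i == first_loop)
      pieces.push_back({IdxKind::Loop, start + i});          // bulk of the limbs
  }
  return pieces;
}
```

With rem == 0 and bo_bit == 0, cnt is 1 and the only piece is the loop. Before the fix, that piece would have been the constant index end.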

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk?

2023-12-05  Jakub Jelinek  

PR tree-optimization/112809
* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt): For
separate_ext in kind == bitint_prec_huge mode if rem == 0, create for
i == cnt - 1 the loop rather than using size_int (end).

* gcc.dg/bitint-48.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2023-12-05 09:48:14.0 +0100
+++ gcc/gimple-lower-bitint.cc  2023-12-05 18:55:58.996323144 +0100
@@ -2624,7 +2624,7 @@ bitint_large_huge::lower_mergeable_stmt
{
  if (kind == bitint_prec_large || (i == 0 && bo_bit != 0))
idx = size_int (start + i);
- else if (i == cnt - 1)
+ else if (i == cnt - 1 && (rem != 0))
idx = size_int (end);
  else if (i == (bo_bit != 0))
idx = create_loop (size_int (start + i), _next);
--- gcc/testsuite/gcc.dg/bitint-48.c.jj 2023-12-05 19:00:19.593664966 +0100
+++ gcc/testsuite/gcc.dg/bitint-48.c2023-12-05 19:00:14.599735086 +0100
@@ -0,0 +1,23 @@
+/* PR tree-optimization/112809 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-O2" } */
+
+#if __BITINT_MAXWIDTH__ >= 512
+_BitInt (512) a;
+_BitInt (256) b;
+_BitInt (256) c;
+
+int
+foo (void)
+{
+  return a == (b | c);
+}
+
+void
+bar (void)
+{
+  a /= b - 2;
+}
+#else
+int i;
+#endif

Jakub



Re: [PATCH] btf: avoid wrong DATASEC entries for extern vars [PR112849]

2023-12-05 Thread David Faust



On 12/5/23 13:28, Indu Bhagat wrote:
> On 12/4/23 15:47, David Faust wrote:
>> The process of creating BTF_KIND_DATASEC records involves iterating
>> through variable declarations, determining which section they will be
>> placed in, and creating an entry in the appropriate DATASEC record
>> accordingly.
>>
>> For variables without e.g. an explicit __attribute__((section)), we use
>> categorize_decl_for_section () to identify the appropriate named section
>> and corresponding BTF_KIND_DATASEC record.
>>
>> This was incorrectly being done for 'extern' variable declarations as
>> well as non-extern ones, which meant that extern variable declarations
>> could result in BTF_KIND_DATASEC entries claiming the variable is
>> allocated in some section such as '.bss' without any knowledge whether
>> that is actually true. That resulted in errors building the Linux kernel
>> BPF selftests.
>>
>> This patch corrects btf_collect_datasec () to avoid assuming a section
>> for extern variables, and only emit BTF_KIND_DATASEC entries for them if
>> they have a known section.
>>
>> Bootstrapped + tested on x86_64-linux-gnu.
>> Tested on x86_64-linux-gnu host for bpf-unknown-none.
>>
> 
> One comment below.
> 
> LGTM, otherwise.
> Thanks
> 
>> gcc/
>>  PR debug/112849
>>  * btfout.cc (btf_collect_datasec): Avoid incorrectly creating an
>>  entry in a BTF_KIND_DATASEC record for extern variable decls without
>>  a known section.
>>
>> gcc/testsuite/
>>  PR debug/112849
>>  * gcc.dg/debug/btf/btf-datasec-3.c: New test.
>> ---
>>   gcc/btfout.cc | 10 ++-
>>   .../gcc.dg/debug/btf/btf-datasec-3.c  | 27 +++
>>   2 files changed, 36 insertions(+), 1 deletion(-)
>>   create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c
>>
>> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
>> index a5e0d640e19..db4f1084f85 100644
>> --- a/gcc/btfout.cc
>> +++ b/gcc/btfout.cc
>> @@ -486,7 +486,15 @@ btf_collect_datasec (ctf_container_ref ctfc)
>>   
>> /* Mark extern variables.  */
>> if (DECL_EXTERNAL (node->decl))
>> -dvd->dvd_visibility = BTF_VAR_GLOBAL_EXTERN;
>> +{
>> +  dvd->dvd_visibility = BTF_VAR_GLOBAL_EXTERN;
>> +
>> +  /* PR112849: avoid assuming a section for extern decls without
>> + an explicit section, which would result in incorrectly
>> + emitting a BTF_KIND_DATASEC entry for them.  */
>> +  if (node->get_section () == NULL)
>> +continue;
>> +}
>>   
>> const char *section_name = get_section_name (node);
>> if (section_name == NULL)
>> diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c
>> new file mode 100644
>> index 000..3c1c7a28c2a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c
>> @@ -0,0 +1,27 @@
>> +/* PR debug/112849
>> +   Test that we do not incorrectly create BTF_KIND_DATASEC entries for
>> +   extern decls with no known section.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O0 -gbtf -dA" } */
>> +
>> +extern int VERSION __attribute__((section (".version")));
>> +
>> +extern int test_bss1;
>> +extern int test_data1;
>> +
>> +int test_bss2;
>> +int test_data2 = 2;
>> +
>> +int
>> +foo (void)
>> +{
>> +  test_bss2 = VERSION;
>> +  return test_bss1 + test_data1 + test_data2;
>> +}
>> +
>> +/* There should only be a DATASEC entries for VERSION out of the extern decls.  */
> 
> The statement is unclear as is. Perhaps you wanted to say "There should 
> only be 3 DATASEC entries; including one for VERSION even though it is 
> extern decl" ?

Thanks. I reworded parts of that once or twice and ended up with a garbled mess.

Changed to:

/* There should be 3 DATASEC entries total.  Of the extern decls, only VERSION
   has a known section; entries are not created for the other two.  */

and pushed.

> 
>> +/* { dg-final { scan-assembler-times "bts_type" 3 } } */
>> +/* { dg-final { scan-assembler-times "bts_type: \\(BTF_KIND_VAR 'test_data2'\\)" 1 } } */
>> +/* { dg-final { scan-assembler-times "bts_type: \\(BTF_KIND_VAR 'test_bss2'\\)" 1 } } */
>> +/* { dg-final { scan-assembler-times "bts_type: \\(BTF_KIND_VAR 'VERSION'\\)" 1 } } */
> 


[PATCH] libiberty: Fix build with GCC < 7

2023-12-05 Thread Jakub Jelinek
Hi!

Tobias reported on IRC that the linker fails to build with GCC 4.8.5.
In configure I've tried to use everything actually used in the sha1.c
x86 hw implementation, but unfortunately I forgot about implicit function
declarations.  GCC before 7 did have  header and bit_SHA define
and __get_cpuid function defined inline, but it didn't define
__get_cpuid_count, which compiled fine (and the configure test is
intentionally compile time only) due to implicit function declaration,
but then failed to link when linking the linker, because
__get_cpuid_count wasn't defined anywhere.

The following patch fixes that by using what autoconf uses in AC_CHECK_DECL
to make sure the functions are declared.

Bootstrapped/regtested in GCC on x86_64-linux and i686-linux with GCC 12 as
system compiler (HAVE_X86_SHA1_HW_SUPPORT is defined there) and tested by
Tobias with GCC 4.8.5 (it isn't defined there anymore).

Ok for trunk (both gcc and binutils)?

2023-12-05  Jakub Jelinek  

* configure.ac (HAVE_X86_SHA1_HW_SUPPORT): Verify __get_cpuid and
__get_cpuid_count are not implicitly declared.
* configure: Regenerated.

--- libiberty/configure.ac.jj   2023-12-01 08:10:44.877293904 +0100
+++ libiberty/configure.ac  2023-12-05 16:09:49.506323449 +0100
@@ -771,6 +771,8 @@ void foo (__m128i *buf, unsigned int e,
 int bar (void)
 {
   unsigned int eax, ebx, ecx, edx;
+  (void) __get_cpuid;
+  (void) __get_cpuid_count;
   if (__get_cpuid_count (7, 0, , , , )
   && (ebx & bit_SHA) != 0
   && __get_cpuid (1, , , , )
--- libiberty/configure.jj  2023-12-01 08:10:44.876293919 +0100
+++ libiberty/configure 2023-12-05 16:10:06.415083621 +0100
@@ -7667,6 +7667,8 @@ void foo (__m128i *buf, unsigned int e,
 int bar (void)
 {
   unsigned int eax, ebx, ecx, edx;
+  (void) __get_cpuid;
+  (void) __get_cpuid_count;
   if (__get_cpuid_count (7, 0, , , , )
   && (ebx & bit_SHA) != 0
   && __get_cpuid (1, , , , )

Jakub



Re: [[PATCH][GCC13] 0/2] Fix combined tree build of GCC 13 with Binutils 2.41

2023-12-05 Thread Jakub Jelinek
On Tue, Dec 05, 2023 at 01:36:30PM -0800, Indu Bhagat wrote:
> To resolve the issue of combined Binutils (2.41) + GCC (13) failing to
> install (https://sourceware.org/bugzilla/show_bug.cgi?id=31108), we will
> need some backports.  This specific issue is with using --enable-shared
> in the combined tree build; it arises due to missing install-*
> dependencies in the top-level makefiles.
> 
> I think it makes sense to bring both of the following two commits (from
> the trunk) to the GCC13 branch:
> 
>   commit eff0e7a4ae31d1e4e64ae37bbc10d073d8579255
>   Author: Indu Bhagat 
>   Date:   Wed Jan 18 23:17:49 2023 -0800
>   toplevel: Makefile.def: add install-strip dependency on libsframe
> 
> 
>   commit dab58c93634bef06fd289f49109b5c370cd5c380
>   Author: Indu Bhagat 
>   Date:   Tue Nov 15 15:07:04 2022 -0800
>   bfd: linker: merge .sframe sections
> 
> This patch set cherry-picks the above two commits to GCC13 branch.  The
> patches apply cleanly with no conflicts.

Won't this break building gcc 13 with in-tree older binutils which don't have
libsframe at all?  I think binutils 2.39 and older don't have it.

Jakub



[[PATCH][GCC13] 2/2] toplevel: Makefile.def: add install-strip dependency on libsframe

2023-12-05 Thread Indu Bhagat
As noted in PR libsframe/30014 - FTBFS: install-strip fails because
bfdlib relinks and fails to find libsframe, the install time
dependencies of libbfd need to be updated.

ChangeLog:

* Makefile.def: Reflect that libsframe needs to be installed before
libbfd.  Reorder a bit to better track libsframe dependencies.
* Makefile.in: Regenerate.

(cherry picked from commit eff0e7a4ae31d1e4e64ae37bbc10d073d8579255)
---
 Makefile.def | 5 -
 Makefile.in  | 3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index 41512475042..0c107cae128 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -490,7 +490,6 @@ dependencies = { module=install-binutils; on=install-opcodes; };
 dependencies = { module=install-strip-binutils; on=install-strip-opcodes; };
 
 // Likewise for ld, libctf, and bfd.
-dependencies = { module=install-bfd; on=install-libsframe; };
 dependencies = { module=install-libctf; on=install-bfd; };
 dependencies = { module=install-ld; on=install-bfd; };
 dependencies = { module=install-ld; on=install-libctf; };
@@ -498,6 +497,10 @@ dependencies = { module=install-strip-libctf; on=install-strip-bfd; };
 dependencies = { module=install-strip-ld; on=install-strip-bfd; };
 dependencies = { module=install-strip-ld; on=install-strip-libctf; };
 
+// libbfd depends on libsframe
+dependencies = { module=install-bfd; on=install-libsframe; };
+dependencies = { module=install-strip-bfd; on=install-strip-libsframe; };
+
 // libopcodes depends on libbfd
 dependencies = { module=configure-opcodes; on=configure-bfd; hard=true; };
 dependencies = { module=install-opcodes; on=install-bfd; };
diff --git a/Makefile.in b/Makefile.in
index 076a48944b8..c1a607ac564 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -65991,13 +65991,14 @@ all-stageautoprofile-binutils: maybe-all-stageautoprofile-libsframe
 all-stageautofeedback-binutils: maybe-all-stageautofeedback-libsframe
 install-binutils: maybe-install-opcodes
 install-strip-binutils: maybe-install-strip-opcodes
-install-bfd: maybe-install-libsframe
 install-libctf: maybe-install-bfd
 install-ld: maybe-install-bfd
 install-ld: maybe-install-libctf
 install-strip-libctf: maybe-install-strip-bfd
 install-strip-ld: maybe-install-strip-bfd
 install-strip-ld: maybe-install-strip-libctf
+install-bfd: maybe-install-libsframe
+install-strip-bfd: maybe-install-strip-libsframe
 configure-opcodes: configure-bfd
 configure-stage1-opcodes: configure-stage1-bfd
 configure-stage2-opcodes: configure-stage2-bfd
-- 
2.41.0



[[PATCH][GCC13] 1/2] bfd: linker: merge .sframe sections

2023-12-05 Thread Indu Bhagat
The linker merges all the input .sframe sections.  When merging, the
linker verifies that all the input .sframe sections have the same
abi/arch.

The linker uses libsframe library to perform key actions on the
.sframe sections - decode, read, and create output data.  This
implies buildsystem changes to make and install libsframe before
libbfd.
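To see why these install-time edges matter, the sketch below orders a few of the install targets named in Makefile.def with a topological sort (Kahn's algorithm). It is an illustration only, under two assumptions:

- The real ordering is carried out by make from the generated Makefile.in prerequisites.
- The edge list is a hand-picked subset.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One Makefile.def-style edge: {module, on} means "module" runs after "on".
using Dep = std::pair<std::string, std::string>;

// Kahn's algorithm; ties are broken alphabetically so the result is stable.
std::vector<std::string> install_order(const std::vector<Dep>& deps)
{
  std::map<std::string, std::set<std::string>> users;  // on -> modules waiting on it
  std::map<std::string, int> pending;                  // module -> unmet prerequisites
  for (const auto& [module, on] : deps) {
    pending.emplace(on, 0);                            // make sure "on" is a node
    if (users[on].insert(module).second)
      ++pending[module];                               // count each edge once
  }
  std::set<std::string> ready;
  for (const auto& [name, count] : pending)
    if (count == 0)
      ready.insert(name);
  std::vector<std::string> order;
  while (!ready.empty()) {
    std::string next = *ready.begin();
    ready.erase(ready.begin());
    order.push_back(next);
    for (const auto& m : users[next])
      if (--pending[m] == 0)
        ready.insert(m);
  }
  return order;  // shorter than the number of modules if there is a cycle
}
```

Fed the install-* edges from the two patches, install-libsframe comes out first and install-bfd second. That is the constraint the cherry-picks add.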

The linker places the output .sframe section in a new segment of its
own: PT_GNU_SFRAME.  A new segment is not added, however, if the
generated .sframe section is empty.

When a section is discarded from the final link, the corresponding
entries in the .sframe section for those functions are also deleted.

The linker sorts the SFrame FDEs on start address by default and sets
the SFRAME_F_FDE_SORTED flag in the .sframe section.

This patch also adds support for generation of SFrame unwind
information for the .plt* sections on x86_64.  SFrame unwind info is
generated for IBT enabled PLT, lazy/non-lazy PLT.

The existing linker option --no-ld-generated-unwind-info has been
adapted to include the control of whether .sframe unwind information
will be generated for the linker generated sections like PLT.

Changes to the linker script have been made as necessary.

ChangeLog:

* Makefile.def: Add install dependency on libsframe for libbfd.
* Makefile.in: Regenerated.

(cherry picked from commit dab58c93634bef06fd289f49109b5c370cd5c380)
---
 Makefile.def |  4 
 Makefile.in  | 11 +++
 2 files changed, 15 insertions(+)

diff --git a/Makefile.def b/Makefile.def
index 35e994eb77e..41512475042 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -457,11 +457,14 @@ dependencies = { module=all-gdbsupport; on=all-gnulib; };
 dependencies = { module=all-gdbsupport; on=all-intl; };
 
 // Host modules specific to binutils.
+// build libsframe before bfd for encoder/decoder support for linking
+// SFrame sections
 dependencies = { module=configure-bfd; on=configure-libiberty; hard=true; };
 dependencies = { module=configure-bfd; on=configure-intl; };
 dependencies = { module=all-bfd; on=all-libiberty; };
 dependencies = { module=all-bfd; on=all-intl; };
 dependencies = { module=all-bfd; on=all-zlib; };
+dependencies = { module=all-bfd; on=all-libsframe; };
 dependencies = { module=configure-opcodes; on=configure-libiberty; hard=true; };
 dependencies = { module=all-opcodes; on=all-libiberty; };
 
@@ -487,6 +490,7 @@ dependencies = { module=install-binutils; on=install-opcodes; };
 dependencies = { module=install-strip-binutils; on=install-strip-opcodes; };
 
 // Likewise for ld, libctf, and bfd.
+dependencies = { module=install-bfd; on=install-libsframe; };
 dependencies = { module=install-libctf; on=install-bfd; };
 dependencies = { module=install-ld; on=install-bfd; };
 dependencies = { module=install-ld; on=install-libctf; };
diff --git a/Makefile.in b/Makefile.in
index 06a9398e172..076a48944b8 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -65849,6 +65849,16 @@ all-stagetrain-bfd: maybe-all-stagetrain-zlib
 all-stagefeedback-bfd: maybe-all-stagefeedback-zlib
 all-stageautoprofile-bfd: maybe-all-stageautoprofile-zlib
 all-stageautofeedback-bfd: maybe-all-stageautofeedback-zlib
+all-bfd: maybe-all-libsframe
+all-stage1-bfd: maybe-all-stage1-libsframe
+all-stage2-bfd: maybe-all-stage2-libsframe
+all-stage3-bfd: maybe-all-stage3-libsframe
+all-stage4-bfd: maybe-all-stage4-libsframe
+all-stageprofile-bfd: maybe-all-stageprofile-libsframe
+all-stagetrain-bfd: maybe-all-stagetrain-libsframe
+all-stagefeedback-bfd: maybe-all-stagefeedback-libsframe
+all-stageautoprofile-bfd: maybe-all-stageautoprofile-libsframe
+all-stageautofeedback-bfd: maybe-all-stageautofeedback-libsframe
 configure-opcodes: configure-libiberty
 configure-stage1-opcodes: configure-stage1-libiberty
 configure-stage2-opcodes: configure-stage2-libiberty
@@ -65981,6 +65991,7 @@ all-stageautoprofile-binutils: maybe-all-stageautoprofile-libsframe
 all-stageautofeedback-binutils: maybe-all-stageautofeedback-libsframe
 install-binutils: maybe-install-opcodes
 install-strip-binutils: maybe-install-strip-opcodes
+install-bfd: maybe-install-libsframe
 install-libctf: maybe-install-bfd
 install-ld: maybe-install-bfd
 install-ld: maybe-install-libctf
-- 
2.41.0



[[PATCH][GCC13] 0/2] Fix combined tree build of GCC 13 with Binutils 2.41

2023-12-05 Thread Indu Bhagat
Hello,

To resolve the issue of combined Binutils (2.41) + GCC (13) failing to
install (https://sourceware.org/bugzilla/show_bug.cgi?id=31108), we will
need some backports.  This specific issue is with using --enable-shared
in the combined tree build; it arises due to missing install-*
dependencies in the top-level makefiles.

I think it makes sense to bring both of the following two commits (from
the trunk) to the GCC13 branch:

commit eff0e7a4ae31d1e4e64ae37bbc10d073d8579255
Author: Indu Bhagat 
Date:   Wed Jan 18 23:17:49 2023 -0800
toplevel: Makefile.def: add install-strip dependency on libsframe


commit dab58c93634bef06fd289f49109b5c370cd5c380
Author: Indu Bhagat 
Date:   Tue Nov 15 15:07:04 2022 -0800
bfd: linker: merge .sframe sections

This patch set cherry-picks the above two commits to GCC13 branch.  The
patches apply cleanly with no conflicts.

---
Testing notes:
 - Combined tree with GCC 13 (releases/gcc-13 branch) with binutils 2.41
   (binutils-2_41-release-point branch) with "--enable-shared
   --disable-bootstrap" builds and installs.
 - Bootstrapped and regression tested releases/gcc-13 branch (make
   check-gcc in a NOT combined tree build).
---

Thanks,
Indu Bhagat (2):
  bfd: linker: merge .sframe sections
  toplevel: Makefile.def: add install-strip dependency on libsframe

 Makefile.def |  7 +++
 Makefile.in  | 12 
 2 files changed, 19 insertions(+)

-- 
2.41.0



Re: [PATCH] btf: avoid wrong DATASEC entries for extern vars [PR112849]

2023-12-05 Thread Indu Bhagat

On 12/4/23 15:47, David Faust wrote:

The process of creating BTF_KIND_DATASEC records involves iterating
through variable declarations, determining which section they will be
placed in, and creating an entry in the appropriate DATASEC record
accordingly.

For variables without e.g. an explicit __attribute__((section)), we use
categorize_decl_for_section () to identify the appropriate named section
and corresponding BTF_KIND_DATASEC record.

This was incorrectly being done for 'extern' variable declarations as
well as non-extern ones, which meant that extern variable declarations
could result in BTF_KIND_DATASEC entries claiming the variable is
allocated in some section such as '.bss' without any knowledge whether
that is actually true. That resulted in errors building the Linux kernel
BPF selftests.

This patch corrects btf_collect_datasec () to avoid assuming a section
for extern variables, and only emit BTF_KIND_DATASEC entries for them if
they have a known section.
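The filtering rule the patch introduces can be modelled as follows. This is a simplified, hypothetical sketch rather than btfout.cc:

- VarDecl, guess_section and collect_datasec are invented names.
- guess_section only distinguishes .data from .bss, as a stand-in for categorize_decl_for_section ().

Fed the declarations from btf-datasec-3.c, it yields the three entries the test expects.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a variable declaration.
struct VarDecl {
  std::string name;
  bool is_extern;                               // DECL_EXTERNAL
  std::optional<std::string> explicit_section;  // __attribute__((section))
  bool initialized;
};

// Stand-in for categorize_decl_for_section (): an assumption, .data/.bss only.
std::string guess_section(const VarDecl& v)
{
  return v.initialized ? ".data" : ".bss";
}

struct DatasecEntry {
  std::string section;
  std::string var;
};

std::vector<DatasecEntry> collect_datasec(const std::vector<VarDecl>& vars)
{
  std::vector<DatasecEntry> out;
  for (const auto& v : vars) {
    // PR112849 rule: an extern decl without an explicit section gets no
    // DATASEC entry, since we cannot know where it is really allocated.
    if (v.is_extern && !v.explicit_section)
      continue;
    out.push_back({v.explicit_section ? *v.explicit_section : guess_section(v),
                   v.name});
  }
  return out;
}
```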

Bootstrapped + tested on x86_64-linux-gnu.
Tested on x86_64-linux-gnu host for bpf-unknown-none.



One comment below.

LGTM, otherwise.
Thanks


gcc/
PR debug/112849
* btfout.cc (btf_collect_datasec): Avoid incorrectly creating an
entry in a BTF_KIND_DATASEC record for extern variable decls without
a known section.

gcc/testsuite/
PR debug/112849
* gcc.dg/debug/btf/btf-datasec-3.c: New test.
---
  gcc/btfout.cc | 10 ++-
  .../gcc.dg/debug/btf/btf-datasec-3.c  | 27 +++
  2 files changed, 36 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index a5e0d640e19..db4f1084f85 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -486,7 +486,15 @@ btf_collect_datasec (ctf_container_ref ctfc)
  
/* Mark extern variables.  */

if (DECL_EXTERNAL (node->decl))
-   dvd->dvd_visibility = BTF_VAR_GLOBAL_EXTERN;
+   {
+ dvd->dvd_visibility = BTF_VAR_GLOBAL_EXTERN;
+
+ /* PR112849: avoid assuming a section for extern decls without
+an explicit section, which would result in incorrectly
+emitting a BTF_KIND_DATASEC entry for them.  */
+ if (node->get_section () == NULL)
+   continue;
+   }
  
const char *section_name = get_section_name (node);

if (section_name == NULL)
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c
new file mode 100644
index 000..3c1c7a28c2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c
@@ -0,0 +1,27 @@
+/* PR debug/112849
+   Test that we do not incorrectly create BTF_KIND_DATASEC entries for
+   extern decls with no known section.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+extern int VERSION __attribute__((section (".version")));
+
+extern int test_bss1;
+extern int test_data1;
+
+int test_bss2;
+int test_data2 = 2;
+
+int
+foo (void)
+{
+  test_bss2 = VERSION;
+  return test_bss1 + test_data1 + test_data2;
+}
+
+/* There should only be a DATASEC entries for VERSION out of the extern decls.  */


The statement is unclear as is. Perhaps you wanted to say "There should 
only be 3 DATASEC entries; including one for VERSION even though it is 
extern decl" ?



+/* { dg-final { scan-assembler-times "bts_type" 3 } } */
+/* { dg-final { scan-assembler-times "bts_type: \\(BTF_KIND_VAR 'test_data2'\\)" 1 } } */
+/* { dg-final { scan-assembler-times "bts_type: \\(BTF_KIND_VAR 'test_bss2'\\)" 1 } } */
+/* { dg-final { scan-assembler-times "bts_type: \\(BTF_KIND_VAR 'VERSION'\\)" 1 } } */




Re: [gcc15] nested functions in C

2023-12-05 Thread Martin Uecker
On Tuesday, 05.12.2023 at 21:08 + Joseph Myers wrote:
> On Mon, 4 Dec 2023, Martin Uecker wrote:
> 
> > > The key feature of lambdas (which failed to make it into C23) for this 
> > > purpose is that you can't convert them to function pointers, which 
> > > eliminates any need for trampolines.
> > 
> > And also makes them useful only for template-like macro programming,
> > but not much else. So my understanding was that this needs to be 
> > addressed at some point. 
> 
> Where "addressed" probably means some kind of callable object that stores 
> more than just a function pointer in order to be able to encapsulate both 
> the code address of a lambda and the context it needs to receive 
> implicitly.  So still not needing trampolines.

Yes, a wide function pointer type similar to C++'s std::function.

This would also be a way to eliminate the need for trampolines
for GCC's nested function.
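To make the idea concrete, here is a minimal C sketch of such a wide function pointer, assuming a hand-written `struct closure` that pairs a code pointer with an environment pointer. The names are hypothetical and no particular language proposal is implied; the point is that the context travels with the pointer, so nothing executable has to be built on the stack as a trampoline would require.

```c
#include <assert.h>

/* Hypothetical wide function pointer: code address plus context.  */
struct closure
{
  int (*fn) (void *env, int arg);
  void *env;
};

static int
closure_call (struct closure c, int arg)
{
  return c.fn (c.env, arg);
}

/* What a nested function capturing a local 'base' might lower to:
   the captured variables live in an environment struct.  */
struct add_env
{
  int base;
};

static int
add_base (void *env, int arg)
{
  const struct add_env *e = env;
  return e->base + arg;
}

/* A higher-order function that takes the wide pointer where it would
   otherwise take a plain 'int (*) (int)'.  */
static int
apply_twice (struct closure c, int x)
{
  return closure_call (c, closure_call (c, x));
}
```

The cost is that every callee must accept the two-word type, which is why this only helps once such a type exists in the language.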

Martin
> 



Re: [gcc15] nested functions in C

2023-12-05 Thread Joseph Myers
On Mon, 4 Dec 2023, Martin Uecker wrote:

> > The key feature of lambdas (which failed to make it into C23) for this 
> > purpose is that you can't convert them to function pointers, which 
> > eliminates any need for trampolines.
> 
> And also makes them useful only for template-like macro programming,
> but not much else. So my understanding was that this needs to be 
> addressed at some point. 

Where "addressed" probably means some kind of callable object that stores 
more than just a function pointer in order to be able to encapsulate both 
the code address of a lambda and the context it needs to receive 
implicitly.  So still not needing trampolines.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] c++: fix ICE with sizeof in a template [PR112869]

2023-12-05 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

  min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
(int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

  min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)
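The pitfall can be reproduced with a toy pre-order walker in the style of `walk_tree`, where one callback does two jobs, loosely as `cp_fold_r` does when it calls `cp_fold_immediate_r` before folding. Everything below (node kinds, the `skip_for_immediate` flag, `fold_cb`) is hypothetical; it only shows that clearing `*walk_subtrees` for the first job also hides the subtree from the second, so a nested sizeof is never folded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature expression tree; not GCC's 'tree'.  */
enum kind { K_CALL, K_VAR, K_SIZEOF, K_CONST };

struct node
{
  enum kind kind;
  int skip_for_immediate;   /* the immediate check may stop here */
  struct node *kids[2];
};

typedef void (*walk_fn) (struct node *, int *walk_subtrees, void *data);

/* Pre-order walk: the callback may clear *WALK_SUBTREES to stop the
   walk from descending into N's children.  */
static void
walk (struct node *n, walk_fn fn, void *data)
{
  if (n == NULL)
    return;
  int walk_subtrees = 1;
  fn (n, &walk_subtrees, data);
  if (walk_subtrees)
    for (int i = 0; i < 2; i++)
      walk (n->kids[i], fn, data);
}

/* One callback doing two jobs.  DATA points to an int selecting the
   old behaviour (clear *WALK_SUBTREES) or the patched one.  */
static void
fold_cb (struct node *n, int *walk_subtrees, void *data)
{
  int old_behaviour = *(int *) data;

  /* Job 1: the immediate check has nothing to do below N...  */
  if (n->skip_for_immediate && old_behaviour)
    *walk_subtrees = 0;   /* ...but this also hides N's kids from job 2.  */

  /* Job 2: folding, e.g. replacing sizeof with its constant value.  */
  if (n->kind == K_SIZEOF)
    n->kind = K_CONST;
}

static int
contains_sizeof (const struct node *n)
{
  if (n == NULL)
    return 0;
  return n->kind == K_SIZEOF
         || contains_sizeof (n->kids[0]) || contains_sizeof (n->kids[1]);
}
```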

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
for unevaluated operands.

gcc/testsuite/ChangeLog:

* g++.dg/template/sizeof18.C: New test.
---
 gcc/cp/cp-gimplify.cc| 8 ++--
 gcc/testsuite/g++.dg/template/sizeof18.C | 8 
 2 files changed, 10 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/sizeof18.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 5abb91bbdd3..46c3eb91853 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1177,13 +1177,9 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_)
   ? tf_error : tf_none);
   const tree_code code = TREE_CODE (stmt);
 
-  /* No need to look into types or unevaluated operands.
- NB: This affects cp_fold_r as well.  */
+  /* No need to look into types or unevaluated operands.  */
   if (TYPE_P (stmt) || unevaluated_p (code) || in_immediate_context ())
-{
-  *walk_subtrees = 0;
-  return NULL_TREE;
-}
+return NULL_TREE;
 
   tree decl = NULL_TREE;
   bool call_p = false;
diff --git a/gcc/testsuite/g++.dg/template/sizeof18.C b/gcc/testsuite/g++.dg/template/sizeof18.C
new file mode 100644
index 000..afba9946258
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/sizeof18.C
@@ -0,0 +1,8 @@
+// PR c++/112869
+// { dg-do compile }
+
+void min(long, long);
+template  void Binaryread(int &, T, unsigned long);
+template <> void Binaryread(int &, float, unsigned long bytecount) {
+  min(bytecount, sizeof(int));
+}

base-commit: 9c3a880feecf81c310b4ade210fbd7004c9aece7
-- 
2.43.0



aarch64: Fix +nopredres, +nols64 and +nomops

2023-12-05 Thread Andrew Carlotti
For native cpu feature detection, certain features have no entry in
/proc/cpuinfo, so have to be assumed to be present whenever the detected
cpu is supposed to support that feature.

However, the logic for this was mistakenly implemented by excluding
these features from part of aarch64_get_extension_string_for_isa_flags.
This function is also used elsewhere when canonicalising explicit
feature sets, which may require removing features that are normally
implied by the specified architecture version.

This change reenables generation of +nopredres, +nols64 and +nomops
during canonicalisation, by relocating the misplaced native cpu
detection logic.
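A simplified sketch of both halves of the change, with hypothetical flag values and a cut-down extension table (the real data comes from aarch64-option-extensions.def). Canonicalisation emits a "+no" suffix for every implied-but-removed feature regardless of whether it can be detected natively, and native detection separately adds the features that cpuinfo cannot report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only.  */
typedef uint64_t feature_flags;
#define FL_CRC     ((feature_flags) 1 << 0)
#define FL_PREDRES ((feature_flags) 1 << 1)
#define FL_LS64    ((feature_flags) 1 << 2)
#define FL_MOPS    ((feature_flags) 1 << 3)

struct extension
{
  const char *name;
  feature_flags flag;
  int native_detect_p;    /* has a /proc/cpuinfo "Features" entry */
};

static const struct extension all_extensions[] = {
  { "crc",     FL_CRC,     1 },
  { "predres", FL_PREDRES, 0 },
  { "ls64",    FL_LS64,    0 },
  { "mops",    FL_MOPS,    0 },
};

/* Canonicalisation: append "+no<name>" for every feature implied by the
   architecture (CURRENT) but removed by the user (ISA).  Also testing
   native_detect_p here, as the old code did, would drop +nopredres,
   +nols64 and +nomops.  */
static void
append_removals (char *out, size_t outsz, feature_flags current,
                 feature_flags isa)
{
  size_t n = sizeof all_extensions / sizeof all_extensions[0];
  for (size_t i = 0; i < n; i++)
    if (all_extensions[i].flag & current & ~isa)
      {
        strncat (out, "+no", outsz - strlen (out) - 1);
        strncat (out, all_extensions[i].name, outsz - strlen (out) - 1);
      }
}

/* Native detection: features with no cpuinfo entry (UNCHECKED) are
   assumed present if the detected core enables them by default.  */
static feature_flags
native_flags (feature_flags detected, feature_flags unchecked,
              feature_flags default_flags)
{
  return detected | (unchecked & default_flags);
}
```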

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_extension_string_for_isa_flags): Remove filtering
of features without native detection.
* config/aarch64/driver-aarch64.cc (host_detect_local_cpu):
Explicitly add expected features that lack cpuinfo detection.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_29.c: New test.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index ee2ea7eae105d19ec906ef8d25d3a237fbeac4b4..37e60d6083e290b18b1f4c6274123b0a58de5476 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -357,8 +357,7 @@ aarch64_get_extension_string_for_isa_flags
   }
 
   for (auto  : all_extensions)
-if (opt.native_detect_p
-   && (opt.flag_canonical != AARCH64_FL_CRYPTO)
+if ((opt.flag_canonical != AARCH64_FL_CRYPTO)
&& (opt.flag_canonical & current_flags & ~isa_flags))
   {
current_flags &= ~opt.flags_off;
diff --git a/gcc/config/aarch64/driver-aarch64.cc b/gcc/config/aarch64/driver-aarch64.cc
index 8e318892b10aa2288421fad418844744a2f5a3b4..470c19b650f1ae953918eaeddbf0f768c12a99d9 100644
--- a/gcc/config/aarch64/driver-aarch64.cc
+++ b/gcc/config/aarch64/driver-aarch64.cc
@@ -262,6 +262,7 @@ host_detect_local_cpu (int argc, const char **argv)
   unsigned int n_variants = 0;
   bool processed_exts = false;
   aarch64_feature_flags extension_flags = 0;
+  aarch64_feature_flags unchecked_extension_flags = 0;
   aarch64_feature_flags default_flags = 0;
   std::string buf;
   size_t sep_pos = -1;
@@ -348,7 +349,10 @@ host_detect_local_cpu (int argc, const char **argv)
  /* If the feature contains no HWCAPS string then ignore it for the
 auto detection.  */
  if (val.empty ())
-   continue;
+   {
+ unchecked_extension_flags |= aarch64_extensions[i].flag;
+ continue;
+   }
 
  bool enabled = true;
 
@@ -447,6 +451,13 @@ host_detect_local_cpu (int argc, const char **argv)
   if (tune)
 return res;
 
+  if (!processed_exts)
+goto not_found;
+
+  /* Add any features that should be present, but can't be verified using
+ the /proc/cpuinfo "Features" list.  */
+  extension_flags |= unchecked_extension_flags & default_flags;
+
   {
 std::string extension
   = aarch64_get_extension_string_for_isa_flags (extension_flags,
diff --git a/gcc/testsuite/gcc.target/aarch64/options_set_29.c b/gcc/testsuite/gcc.target/aarch64/options_set_29.c
new file mode 100644
index ..01bb73c02e232bdfeca5f16dad3fa2a6484843d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/options_set_29.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv9.3-a+nopredres+nols64+nomops" } */
+
+int main ()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\.arch armv9\.3\-a\+crc\+nopredres\+nols64\+nomops\n} 1 } } */
+
+/* Checking if enabling default features drops the superfluous bits.   */


aarch64: Fix +nocrypto handling

2023-12-05 Thread Andrew Carlotti
Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with
checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead.  The value of the
AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is
retained because removing it would make processing the data in
option-extensions.def significantly more complex.
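The conditions listed in the patch's comment for emitting "+nocrypto", and the replacement for the old TARGET_CRYPTO test, can be sketched as bitmask predicates. The flag values and function names here are hypothetical; only the logic follows the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only.  */
typedef uint64_t feature_flags;
#define FL_SIMD ((feature_flags) 1 << 0)
#define FL_AES  ((feature_flags) 1 << 1)
#define FL_SHA2 ((feature_flags) 1 << 2)
#define FL_SM4  ((feature_flags) 1 << 3)

/* Spell the removal as "+nocrypto" instead of "+noaes+nosha2" only if
   both AES and SHA2 are being removed (implied by CURRENT, absent from
   ISA), SIMD is still enabled, and SM4 is not enabled.  */
static int
use_nocrypto (feature_flags current, feature_flags isa)
{
  const feature_flags crypto = FL_AES | FL_SHA2;
  return (crypto & current & ~isa) == crypto
         && (current & FL_SIMD)
         && !(current & FL_SM4);
}

/* Replacement for the old AARCH64_FL_CRYPTO test: __ARM_FEATURE_CRYPTO
   is defined only when both constituent features are present.  */
static int
arm_feature_crypto (feature_flags isa)
{
  return (isa & FL_AES) && (isa & FL_SHA2);
}
```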

Ok for master?

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_extension_string_for_isa_flags): Fix generation of
the "+nocrypto" extension.
* config/aarch64/aarch64.h (AARCH64_ISA_CRYPTO): Remove.
(TARGET_CRYPTO): Remove.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Don't use TARGET_CRYPTO.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_27.c: New test.
* gcc.target/aarch64/options_set_28.c: New test.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 20bc4e1291bba9b73798398fea659f1154afa205..6d12454143cd64ebaafa7f5e6c23869ee0bfa543 100644
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -310,6 +310,7 @@ aarch64_get_extension_string_for_isa_flags
  But in order to make the output more readable, it seems better
  to add the strings in definition order.  */
   aarch64_feature_flags added = 0;
+  auto flags_crypto = AARCH64_FL_AES | AARCH64_FL_SHA2;
   for (unsigned int i = ARRAY_SIZE (all_extensions); i-- > 0; )
 {
   auto  = all_extensions[i];
@@ -319,7 +320,7 @@ aarch64_get_extension_string_for_isa_flags
 per-feature crypto flags.  */
   auto flags = opt.flag_canonical;
   if (flags == AARCH64_FL_CRYPTO)
-   flags = AARCH64_FL_AES | AARCH64_FL_SHA2;
+   flags = flags_crypto;
 
   if ((flags & isa_flags & (explicit_flags | ~current_flags)) == flags)
{
@@ -337,9 +338,27 @@ aarch64_get_extension_string_for_isa_flags
   /* Remove the features in current_flags & ~isa_flags.  If the feature does
  not have an HWCAPs then it shouldn't be taken into account for feature
  detection because one way or another we can't tell if it's available
- or not.  */
+ or not.
+
+ As a special case, emit "+nocrypto" instead of "+noaes+nosha2", in order
+ to support assemblers that predate the separate per-feature crypto flags.
+ Only use "+nocrypto" when "simd" is enabled (to avoid redundant feature
+ removal), and when "sm4" is not already enabled (to avoid depending on
+ whether "+nocrypto" also disables "sm4").  */
+  for (auto  : all_extensions)
+if ((opt.flag_canonical == AARCH64_FL_CRYPTO)
+   && ((flags_crypto & current_flags & ~isa_flags) == flags_crypto)
+   && (current_flags & AARCH64_FL_SIMD)
+   && !(current_flags & AARCH64_FL_SM4))
+  {
+   current_flags &= ~opt.flags_off;
+   outstr += "+no";
+   outstr += opt.name;
+  }
+
   for (auto  : all_extensions)
 if (opt.native_detect_p
+   && (opt.flag_canonical != AARCH64_FL_CRYPTO)
&& (opt.flag_canonical & current_flags & ~isa_flags))
   {
current_flags &= ~opt.flags_off;
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index ab8844f6049dc95b97648b651bfcd3a4ccd3ca0b..4f9ee01d52f3ac42f95edbb030bdb2d09fc36d16 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -140,7 +140,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_ILP32, "_ILP32", pfile);
   aarch64_def_or_undef (TARGET_ILP32, "__ILP32__", pfile);
 
-  aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
+  aarch64_def_or_undef (TARGET_AES && TARGET_SHA2, "__ARM_FEATURE_CRYPTO", pfile);
   aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
   aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile);
   cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS");
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1ac298926ce1606a87bcdcaf691f182ca416d600..d3613a0a42b7b6d2c4452739841b133014909a39 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -177,10 +177,13 @@ enum class aarch64_feature : unsigned char {
 
 #endif
 
-/* Macros to test ISA flags.  */
+/* Macros to test ISA flags.
+
+   There is intentionally no macro for AARCH64_FL_CRYPTO, since this flag bit
+   is not always set when its constituent features are present.
+   Check (TARGET_AES && TARGET_SHA2) instead.  */
 
 #define AARCH64_ISA_CRC(aarch64_isa_flags & AARCH64_FL_CRC)
-#define AARCH64_ISA_CRYPTO (aarch64_isa_flags & AARCH64_FL_CRYPTO)
 #define AARCH64_ISA_FP (aarch64_isa_flags & AARCH64_FL_FP)
 #define AARCH64_ISA_SIMD   (aarch64_isa_flags & AARCH64_FL_SIMD)
 #define AARCH64_ISA_LSE   (aarch64_isa_flags & AARCH64_FL_LSE)
@@ -223,9 +226,6 @@ enum class aarch64_feature : unsigned char {
 #define 

aarch64 testsuite: Check entire .arch string

2023-12-05 Thread Andrew Carlotti
Add a terminating newline to various tests, and add missing
extensions to some test strings.
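Why the trailing `\n` matters: without it, the pattern only has to match a prefix of the `.arch` line, so extra extensions after it go unnoticed. The sketch below uses POSIX extended regexes via `<regex.h>` rather than the Tcl regexes DejaGnu actually uses, but the anchoring effect of a literal newline should be the same.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN (POSIX extended syntax) matches anywhere in TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  */
static int
matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

With assembler output `"\t.arch armv8-a+nosimd+crc\n"`, the unterminated pattern still matches, while the newline-terminated one correctly reports the unexpected `+crc`.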

Obvious change, so I'll push it once my other option handling patches are
approved (if no one objects).

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_18.c: Add \+nopauth\n
* gcc.target/aarch64/options_set_7.c: Add \+crc\n
* gcc.target/aarch64/options_set_8.c: Add \+crc\+nodotprod\n
* gcc.target/aarch64/cpunative/native_cpu_0.c: Add \n
* gcc.target/aarch64/cpunative/native_cpu_1.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_2.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_3.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_4.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_5.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_8.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_9.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_10.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_11.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_12.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_14.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_15.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/options_set_1.c: Ditto.
* gcc.target/aarch64/options_set_2.c: Ditto.
* gcc.target/aarch64/options_set_3.c: Ditto.
* gcc.target/aarch64/options_set_4.c: Ditto.
* gcc.target/aarch64/options_set_5.c: Ditto.
* gcc.target/aarch64/options_set_6.c: Ditto.
* gcc.target/aarch64/options_set_9.c: Ditto.
* gcc.target/aarch64/options_set_11.c: Ditto.
* gcc.target/aarch64/options_set_12.c: Ditto.
* gcc.target/aarch64/options_set_13.c: Ditto.
* gcc.target/aarch64/options_set_14.c: Ditto.
* gcc.target/aarch64/options_set_15.c: Ditto.
* gcc.target/aarch64/options_set_16.c: Ditto.
* gcc.target/aarch64/options_set_17.c: Ditto.
* gcc.target/aarch64/options_set_18.c: Ditto.
* gcc.target/aarch64/options_set_19.c: Ditto.
* gcc.target/aarch64/options_set_20.c: Ditto.
* gcc.target/aarch64/options_set_21.c: Ditto.
* gcc.target/aarch64/options_set_22.c: Ditto.
* gcc.target/aarch64/options_set_23.c: Ditto.
* gcc.target/aarch64/options_set_24.c: Ditto.
* gcc.target/aarch64/options_set_25.c: Ditto.
* gcc.target/aarch64/options_set_26.c: Ditto.


diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
index 8499f87c39b173491a89626af56f4e193b1d12b5..fb5a7a18ad1a2d09ac4b231150a1bd9e72d6fab6 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\n} } } */
 
 /* Test a normal looking procinfo.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
index 2cf0e89994b1cc0dc9fac67f4dc431c003498048..cb50e3b73057994432cc3ed15e3d5b57c7a3cb7b 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+nosimd} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+nosimd\n} } } */
 
 /* Test one where fp is on by default so turn off simd.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
index ddb06b8227576807fe068b76dabed91a0223e4fa..6a524bad371c55fc32698ff0994f4ad431be49ca 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+nofp} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+nofp\n} } } */
 
 /* Test one with no entry in feature list.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
index 96b9ca434ebbf007ddaa45d55a8c2b8e7a19a715..644f4792275bdd32a9f84241f0c329b046cbd909 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { 

[PATCH] aarch64: Add missing driver-aarch64 dependencies

2023-12-05 Thread Andrew Carlotti
Ok for master?

gcc/ChangeLog:

* config/aarch64/x-aarch64: Add missing dependencies.


diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64
index 3cf701a0a01ab00eaaafdfad14bd90ebbb1d498f..6fd638faaab7cb5bb2309d36d6dea2adf1fb8d32 100644
--- a/gcc/config/aarch64/x-aarch64
+++ b/gcc/config/aarch64/x-aarch64
@@ -1,3 +1,7 @@
 driver-aarch64.o: $(srcdir)/config/aarch64/driver-aarch64.cc \
-  $(CONFIG_H) $(SYSTEM_H)
+  $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(CORETYPES_H) \
+  $(srcdir)/config/aarch64/aarch64-protos.h \
+  $(srcdir)/config/aarch64/aarch64-feature-deps.h \
+  $(srcdir)/config/aarch64/aarch64-cores.def \
+  $(srcdir)/config/aarch64/aarch64-arches.def
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<


[PATCH] build: unbreak bootstrap on uclinux targets [PR112762]

2023-12-05 Thread Marek Polacek
Tested with .../configure --target=c6x-uclinux [...] && make all-gcc,
ok for trunk?

-- >8 --
Currently, cross-compiling with --target=c6x-uclinux (and several other
targets) fails due to:

../../src/gcc/config/linux.h:221:45: error: 'linux_fortify_source_default_level' was not declared in this scope
 #define TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL linux_fortify_source_default_level

In the PR Andrew mentions that another fix would be in config.gcc,
but really, here I meant to use the target hook for glibc only, not
uclibc.  This trivial patch fixes the build problem.  It means that
-fhardened with uclibc will use -D_FORTIFY_SOURCE=2 and not =3.
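The shape of the fix can be sketched with plain preprocessor conditionals. All `SKETCH_*` names and `sketch_linux_fortify_level` below are hypothetical stand-ins for the real config/linux.h macros and hook; the point is only that a macro referring to a glibc-only function has to be defined inside the same conditional that provides the function, so other C libraries fall back to level 2.

```c
#include <assert.h>

/* Hypothetical switch: 1 when building for glibc, 0 otherwise
   (e.g. uclibc).  */
#define SKETCH_LIBC_IS_GLIBC 0

#if SKETCH_LIBC_IS_GLIBC
/* Stand-in for the glibc-only hook; the real one may answer 3.  */
static int sketch_linux_fortify_level (void) { return 3; }
/* Defined in the same block, so it never names an undeclared function.  */
# define SKETCH_FORTIFY_DEFAULT_LEVEL sketch_linux_fortify_level ()
#else
/* Other C libraries keep the generic default.  */
# define SKETCH_FORTIFY_DEFAULT_LEVEL 2
#endif

static int
hardened_fortify_level (void)
{
  return SKETCH_FORTIFY_DEFAULT_LEVEL;
}
```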

PR target/112762

gcc/ChangeLog:

* config/linux.h: Redefine TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL for
glibc only.
---
 gcc/config/linux.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/linux.h b/gcc/config/linux.h
index 79b6537dcf1..73f39d3c603 100644
--- a/gcc/config/linux.h
+++ b/gcc/config/linux.h
@@ -215,7 +215,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # undef TARGET_LIBM_FUNCTION_MAX_ERROR
 # define TARGET_LIBM_FUNCTION_MAX_ERROR linux_libm_function_max_error
 
-#endif
-
 #undef TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL
 #define TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL linux_fortify_source_default_level
+
+#endif

base-commit: 9c3a880feecf81c310b4ade210fbd7004c9aece7
-- 
2.43.0



Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-05 Thread Ajit Agarwal
Hello Kewen:


On 05/12/23 7:13 pm, Ajit Agarwal wrote:
> Hello Kewen:
> 
> On 04/12/23 7:31 am, Kewen.Lin wrote:
>> Hi Ajit,
>>
>> on 2023/12/1 17:10, Ajit Agarwal wrote:
>>> Hello Kewen:
>>>
>>> On 24/11/23 3:01 pm, Kewen.Lin wrote:
 Hi Ajit,

 Don't forget to CC David (CC-ed) :), some comments are inlined below.

 on 2023/10/8 03:04, Ajit Agarwal wrote:
> Hello All:
>
> This patch adds a new pass to replace contiguous-address vector loads (lxv)
> with the MMA instruction lxvp.

 IMHO the current binding of lxvp (and lxvpx, stxvp{x,}) to MMA looks wrong: they
 only require Power10 and VSX, and these instructions should perform well without
 MMA support. So a patch separating their support from MMA should go first.

>>>
>>> I will make the changes for Power10 and VSX.
>>>
> This patch addresses one regression failure on the ARM architecture.

 Could you explain this?  I don't see any test case for this.
>>>
>>> I submitted v1 of the patch and there were regression failures for
>>> Linaro. I have fixed them in V2.
>>
>> OK, thanks for clarifying.  So some unexpected changes on generic code in v1
>> caused the failure exposed on arm.
>>
>>>
>>>  
 Besides, isn't it a bad idea to put this pass after reload? Once register
 allocation finishes, this pairing has to be restricted by the reg No. (I didn't
 see any checking on the reg No. relationship for pairing btw.)

>>>
>>> Adding it before the reload pass deletes one of the lxv and replaces it with
>>> lxvp. This fails in the reload pass while freeing reg_eqivs, as ira populates
>>> them and then
>>
>> I can't find reg_eqivs, I guessed you meant reg_equivs and moved this pass
>> right before pass_reload (between pass_ira and pass_reload)?  IMHO it's
>> unexpected as those two passes are closely correlated.  I was expecting to
>> put it somewhere before ira.
> 
> Yes they are tied together and moving before reload will not work.
> 
>>
>>> the vecload pass deletes some of the insns; since those insns are already
>>> deleted by the vecload pass, the reload pass segfaults while freeing them.
>>>
>>> Moving the vecload pass before ira means ira will not allocate register
>>> pairs for lxvp, and that will be a problem.
>>
>> Could you elaborate the obstacle for moving such pass before pass_ira?
>>
>> Basing on the status quo, the lxvp is bundled with OOmode, then I'd expect
>> we can generate OOmode move (load) and use the components with unspec (or
>> subreg with Peter's patch) to replace all the previous use places, it looks
>> doable to me.
> 
> If we move it before the ira passes, we delete the offset lxv, generate lxvp
> and replace all the uses; that is what I am doing. But the registers ira
> assigns for the offset lxvp are not a register pair (it picks an arbitrary
> register), and hence we cannot generate lxvp.
> 
> For example, if we move it before the ira passes, ira assigns register 32 to
> one lxv and register 45 to the other half of the pair.

It generates the following.
lxvp %vs32,0(%r4)
xvf32ger 0,%vs34,%vs32
xvf32gerpp 0,%vs34,%vs45
xxmfacc 0
stxvp %vs2,0(%r3)
stxvp %vs0,32(%r3)
blr


Instead of vs33 ira generates vs45 if we move before pass_ira.
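The constraint under discussion can be written as a small predicate. This is a hypothetical sketch, not code from the patch: it assumes two 16-byte lxv loads from the same base can become one lxvp only when they are adjacent in memory and target an even/odd VSR pair n, n+1, and it ignores which half lands in which register (that may depend on endianness). With the allocation shown above (vs32 and vs45) the check fails.

```c
#include <assert.h>

/* Hypothetical description of one lxv: target VSX register number and
   displacement from a base register shared by both loads.  */
struct lxv
{
  int vsr;     /* 0..63 */
  long disp;   /* byte offset from the shared base */
};

/* Return nonzero if LO and HI could be fused into a single lxvp under
   the assumptions stated above.  */
static int
can_fuse_to_lxvp (struct lxv lo, struct lxv hi)
{
  return hi.disp == lo.disp + 16     /* adjacent 16-byte halves */
         && lo.vsr % 2 == 0          /* pair must start on an even VSR */
         && hi.vsr == lo.vsr + 1;    /* and use the next register */
}
```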

Thanks & Regards
Ajit

 
> Thanks & Regards
> Ajit
>>
> 
>>>
>>> Running it after the reload pass is the only solution I see, as ira and
>>> reload make the register pairs, which makes generating lxvp in the vecload
>>> pass easier.
>>>
>>> Please suggest.
>>>  
 Looking forward to the comments from Segher/David/Peter/Mike etc.
>>
>> Still looking forward. :)
>>
>> BR,
>> Kewen


Re: [PATCH] libgfortran: Fix -Wincompatible-pointer-types errors

2023-12-05 Thread Richard Earnshaw




On 05/12/2023 10:59, Jakub Jelinek wrote:

On Tue, Dec 05, 2023 at 10:57:50AM +, Richard Earnshaw wrote:

On 05/12/2023 10:51, Jakub Jelinek wrote:

On Tue, Dec 05, 2023 at 10:47:34AM +, Richard Earnshaw wrote:

The following patch makes libgfortran build on i686-linux after hacking up
--- kinds.h.xx  2023-12-05 00:23:00.133365064 +0100
+++ kinds.h 2023-12-05 11:19:24.409679808 +0100
@@ -10,8 +10,8 @@ typedef GFC_INTEGER_2 GFC_LOGICAL_2;
#define HAVE_GFC_LOGICAL_2
#define HAVE_GFC_INTEGER_2
-typedef int32_t GFC_INTEGER_4;
-typedef uint32_t GFC_UINTEGER_4;
+typedef long GFC_INTEGER_4;
+typedef unsigned long GFC_UINTEGER_4;


That doesn't look right for a 64-bit processor.  Presumably 4 means 4 bytes,


i686-linux is an ILP32 target, which I chose exactly because I regularly build
it, had a tree with it around and because unlike 64-bit targets there are 2
standard 32-bit signed integer types.  Though, normally int32_t there is
int rather than long int and so the errors only appeared after this hack.



My point is that on aarch64/x86_64 etc, this will make GFC_INTEGER_4 a
64-bit type, whereas previously it was 32-bit.


Sure.  The above patch is a hack for a generated header.  I'm not proposing
that as a change, just explaining how I've verified the actual patch on
i686-linux with such a hack.
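The distinction relied on here can be shown with C11 `_Generic`, which selects on the exact type rather than its size. This is a standalone sketch, not part of the patch: on ILP32 targets such as i686-linux, `int` and `long` are both 32 bits wide, yet they remain distinct types, so `int *` and `long *` are incompatible pointer types. That is why typedefing GFC_INTEGER_4 to `long` exposes mismatches that stay hidden while int32_t happens to be `int`.

```c
#include <assert.h>
#include <string.h>

/* Name the exact type of an expression; size plays no role.  */
#define TYPE_NAME(x) \
  _Generic ((x), int: "int", long: "long", default: "other")

/* Same for pointers: 'int *' and 'long *' are never the same type.  */
#define PTR_TYPE_NAME(p) \
  _Generic ((p), int *: "int *", long *: "long *", default: "other")
```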

Jakub



Ah, I understand now.

I've successfully built arm and aarch64 cross toolchains with this patch 
(newlib).  So LGTM, thanks.


R.


Re: [Patch] tsystem.h: Declare calloc/realloc #ifdef inhibit_libc

2023-12-05 Thread Jakub Jelinek
On Tue, Dec 05, 2023 at 06:29:10PM +0100, Tobias Burnus wrote:
> Crossref: there are -Wbuiltin-declaration-mismatch warnings in libgcc/emutls.c,
> cf. https://gcc.gnu.org/PR109289
> 
> I decided to leave this to Thomas and Florian and just fix the build issue with
> the attached patch. That build issue was also mentioned in PR libgcc/109289.
> 
> An alternative would be __builtin, but as the other #defines were pre-existing,
> 
> OK for mainline?
> 
> Tobias
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955

> tsystem.h: Declare calloc/realloc #ifdef inhibit_libc
> 
> Declare calloc and realloc if they are not already #defined and inhibit_libc
> is defined.  Those are used by libgcc/emutls.c.
> 
> gcc/ChangeLog:
> 
>   * tsystem.h (calloc, realloc): Declare when inhibit_libc.

Ok, thanks.

Jakub



[Patch] tsystem.h: Declare calloc/realloc #ifdef inhibit_libc

2023-12-05 Thread Tobias Burnus

Crossref: there are -Wbuiltin-declaration-mismatch warnings in libgcc/emutls.c,
cf. https://gcc.gnu.org/PR109289

I decided to leave this to Thomas and Florian and just fix the build issue with
the attached patch. That build issue was also mentioned in PR libgcc/109289.

An alternative would be __builtin, but as the other #defines were pre-existing,
I went for the tsystem.h version.

OK for mainline?

Tobias
tsystem.h: Declare calloc/realloc #ifdef inhibit_libc

Declare calloc and realloc if they are not already #defined and inhibit_libc
is defined.  Those are used by libgcc/emutls.c.

gcc/ChangeLog:

	* tsystem.h (calloc, realloc): Declare when inhibit_libc.

diff --git a/gcc/tsystem.h b/gcc/tsystem.h
index 081c73345cd..c49ff578cb7 100644
--- a/gcc/tsystem.h
+++ b/gcc/tsystem.h
@@ -47,12 +47,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #ifdef inhibit_libc
 
 #ifndef malloc
 extern void *malloc (size_t);
 #endif
 
+#ifndef calloc
+extern void *calloc(size_t, size_t);
+#endif
+
+#ifndef realloc
+extern void *realloc(void *, size_t);
+#endif
+
 #ifndef free
 extern void free (void *);
 #endif
 
 #ifndef atexit
 extern int atexit (void (*)(void));


Re: [PATCH] c++, v2: Fix parsing [[]][[]];

2023-12-05 Thread Marek Polacek
On Tue, Dec 05, 2023 at 06:00:31PM +0100, Jakub Jelinek wrote:
> On Tue, Dec 05, 2023 at 09:45:32AM -0500, Marek Polacek wrote:
> > > When working on the previous patch I put [[]] [[]] asm (""); into a
> > > testcase, but was surprised it wasn't parsed.
> > 
> > By wasn't parsed you mean we gave an error, right?  I only see an error
> > with block-scope [[]] [[]];.
> 
> Yeah.
> The reason why [[]][[]]; works at namespace scope is that if
>   else if (cp_lexer_nth_token_is (parser->lexer,
>                                   cp_parser_skip_std_attribute_spec_seq (parser,
>                                                                          1),
>                                   CPP_SEMICOLON))
> which is the case here then even if after parsing the attributes next token
> isn't CPP_SEMICOLON (the case here without the patch), it will just return
> and another cp_parser_declaration will parse another [[]], that time also
> with CPP_SEMICOLON.
> 
> > It seems marginally better to me to use void_list_node so that we don't
> > need a new parm, like what we do when parsing parameters: ()/(void)/(...),
> > but I should let others decide.
> 
> Here is a modified version of the patch which does it like that.

Thanks, this looks good to me.
 
> 2023-12-05  Jakub Jelinek  
> 
>   * parser.cc (cp_parser_std_attribute_spec): Return void_list_node
>   rather than NULL_TREE if token is neither CPP_OPEN_SQUARE nor
>   RID_ALIGNAS CPP_KEYWORD.
>   (cp_parser_std_attribute_spec_seq): For attr_spec == void_list_node
>   break, for attr_spec == NULL_TREE continue.
> 
>   * g++.dg/cpp0x/gen-attrs-79.C: New test.
> 
> --- gcc/cp/parser.cc.jj   2023-12-05 16:18:32.224909370 +0100
> +++ gcc/cp/parser.cc  2023-12-05 17:07:34.690170639 +0100
> @@ -30244,7 +30244,11 @@ void cp_parser_late_contract_condition (
>   [ [ assert :  contract-mode [opt] : conditional-expression ] ]
>   [ [ pre :  contract-mode [opt] : conditional-expression ] ]
>   [ [ post :  contract-mode [opt] identifier [opt] :
> -  conditional-expression ] ]  */
> +  conditional-expression ] ]
> +
> +   Return void_list_node if the current token doesn't start an
> +   attribute-specifier to differentiate from NULL_TREE returned e.g.
> +   for [ [ ] ].  */
>  
>  static tree
>  cp_parser_std_attribute_spec (cp_parser *parser)
> @@ -30324,7 +30328,7 @@ cp_parser_std_attribute_spec (cp_parser
>  
>if (token->type != CPP_KEYWORD
> || token->keyword != RID_ALIGNAS)
> - return NULL_TREE;
> + return void_list_node;
>  
>cp_lexer_consume_token (parser->lexer);
>maybe_warn_cpp0x (CPP0X_ATTRIBUTES);
> @@ -30397,8 +30401,12 @@ cp_parser_std_attribute_spec_seq (cp_par
>while (true)
>  {
>tree attr_spec = cp_parser_std_attribute_spec (parser);
> -  if (attr_spec == NULL_TREE)
> +  if (attr_spec == void_list_node)
>   break;
> +  /* Accept [[]][[]]; for which cp_parser_std_attribute_spec
> +  returns NULL_TREE as there are no attributes.  */
> +  if (attr_spec == NULL_TREE)
> + continue;
>if (attr_spec == error_mark_node)
>   return error_mark_node;
>  
> --- gcc/testsuite/g++.dg/cpp0x/gen-attrs-79.C.jj  2023-12-05 17:04:14.235988879 +0100
> +++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-79.C 2023-12-05 17:04:14.235988879 +0100
> @@ -0,0 +1,9 @@
> +// { dg-do compile { target c++11 } }
> +
> +[[]] [[]];
> +
> +[[]] [[]] void
> +foo ()
> +{
> +  [[]] [[]];
> +}

Marek
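The sentinel technique used in the patch can be sketched outside GCC. Below, a hypothetical toy parser reads one character per attribute-specifier ('E' for an empty `[[]]`, 'A' for one with an attribute, anything else ends the sequence). A dedicated `NOT_A_SPEC` pointer plays the role of void_list_node, so that NULL can keep meaning "valid but empty", as NULL_TREE does for `[[]]`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parse result; GCC uses 'tree' values instead.  */
struct attr_list
{
  int count;
};

static struct attr_list not_a_spec_node;    /* role of void_list_node */
#define NOT_A_SPEC (&not_a_spec_node)
static struct attr_list one_attr = { 1 };

/* Parse one specifier at *P.  Returns NOT_A_SPEC if *P does not start
   one, NULL for an empty specifier, or the parsed attributes.  */
static struct attr_list *
parse_attr_spec (const char **p)
{
  switch (**p)
    {
    case 'E': ++*p; return NULL;        /* "[[]]": valid but empty */
    case 'A': ++*p; return &one_attr;   /* "[[x]]" */
    default:  return NOT_A_SPEC;        /* not an attribute-specifier */
    }
}

/* Mirror of the patched loop: stop only on NOT_A_SPEC and keep going
   after an empty specifier, so "[[]] [[]];" is consumed entirely.  */
static int
parse_attr_spec_seq (const char **p)
{
  int total = 0;
  for (;;)
    {
      struct attr_list *spec = parse_attr_spec (p);
      if (spec == NOT_A_SPEC)
        break;
      if (spec == NULL)
        continue;
      total += spec->count;
    }
  return total;
}
```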



Re: [PATCH v3 1/3] libgomp, nvptx: low-latency memory allocator

2023-12-05 Thread Tobias Burnus

On 05.12.23 16:39, Andrew Stubbs wrote:

Hence, mentioning in this section in addition that
omp_low_lat_mem_space  is honored on devices
seems to be the better location.


How about this?


LGTM – Thanks!

Tobias


--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -3012,9 +3012,9 @@ value.
 @item omp_const_mem_alloc   @tab omp_const_mem_space
 @item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
 @item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
-@item omp_cgroup_mem_alloc  @tab --
-@item omp_pteam_mem_alloc   @tab --
-@item omp_thread_mem_alloc  @tab --
+@item omp_cgroup_mem_alloc  @tab omp_low_lat_mem_space (implementation defined)
+@item omp_pteam_mem_alloc   @tab omp_low_lat_mem_space (implementation defined)
+@item omp_thread_mem_alloc  @tab omp_low_lat_mem_space (implementation defined)
 @end multitable

 The predefined allocators use the default values for the traits,
@@ -3060,7 +3060,7 @@ OMP_ALLOCATOR=omp_low_lat_mem_space:pinned=true,partition=nearest

 @item @emph{See also}:
 @ref{Memory allocation}, @ref{omp_get_default_allocator},
-@ref{omp_set_default_allocator}
+@ref{omp_set_default_allocator}, @ref{Offload-Target Specifics}

 @item @emph{Reference}:
 @uref{https://www.openmp.org/, OpenMP specification v5.0}, Section 6.21
@@ -5710,7 +5710,8 @@ For the memory spaces, the following applies:
 @itemize
 @item @code{omp_default_mem_space} is supported
 @item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
-@item @code{omp_low_lat_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_low_lat_mem_space} is only available on supported devices,
+  and maps to @code{omp_default_mem_space} otherwise.
 @item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
   unless the memkind library is available
 @item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
@@ -5766,6 +5767,9 @@ Additional notes regarding the traits:
 @item The @code{sync_hint} trait has no effect.
 @end itemize

+See also:
+@ref{Offload-Target Specifics}
+
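A minimal usage sketch of the behaviour documented above, assuming a libgomp that provides the OpenMP 5.0 allocator routines omp_alloc and omp_free. Built with -fopenmp, it allocates a scratch buffer from omp_low_lat_mem_alloc inside a target region: on a device that honors omp_low_lat_mem_space the buffer may come from low-latency memory, and elsewhere libgomp maps the space to omp_default_mem_space. The function name sum_with_scratch is hypothetical, and the malloc fallback exists only so the sketch also builds without -fopenmp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef _OPENMP
#include <omp.h>
#endif

// Sums 0..n-1 through a scratch buffer; returns -1 if allocation fails.
int sum_with_scratch(int n) {
  int result = -1;
  const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(int);
#ifdef _OPENMP
  // Runs on the default device if one is available, otherwise on the host.
#pragma omp target map(from : result)
  {
    // On a device that honors omp_low_lat_mem_space this may be served from
    // low-latency memory; elsewhere the space maps to omp_default_mem_space.
    int *buf = static_cast<int *>(omp_alloc(bytes, omp_low_lat_mem_alloc));
    if (buf) {
      int sum = 0;
      for (int i = 0; i < n; ++i) buf[i] = i;
      for (int i = 0; i < n; ++i) sum += buf[i];
      omp_free(buf, omp_low_lat_mem_alloc);
      result = sum;
    }
  }
#else
  // Built without -fopenmp: use the ordinary heap instead.
  int *buf = static_cast<int *>(std::malloc(bytes));
  if (buf) {
    int sum = 0;
    for (int i = 0; i < n; ++i) buf[i] = i;
    for (int i = 0; i < n; ++i) sum += buf[i];
    std::free(buf);
    result = sum;
  }
#endif
  return result;
}
```

Because the code only relies on the documented fallback, the same source is expected to produce the same result whether or not the device actually provides low-latency memory.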

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH RFC] c++: mangle function template constraints

2023-12-05 Thread Jonathan Wakely
On Wed, 22 Nov 2023 at 14:50, Jonathan Wakely  wrote:
>
> On Mon, 20 Nov 2023 at 02:56, Jason Merrill wrote:
> >
> > Tested x86_64-pc-linux-gnu.  Are the library bits OK?  Any comments before I
> > push this?
>
> The library parts are OK.
>
> The variable template is_trivially_copyable_v just uses
> __is_trivially_copyable so should be just as efficient, and the change
> to  is fine.
>
> The variable template is_trivially_destructible_v instantiates the
> is_trivially_destructible type trait, which instantiates
> __is_destructible_safe and __is_destructible_impl, which is probably
> why we used the built-in directly in . But that's an
> acceptable overhead to avoid using the built-in in a mangled context,
> and it would be good to optimize the variable template anyway, as a
> separate change.

This actually causes a regression:

FAIL: 20_util/variant/87619.cc  -std=gnu++20 (test for excess errors)
FAIL: 20_util/variant/87619.cc  -std=gnu++23 (test for excess errors)
FAIL: 20_util/variant/87619.cc  -std=gnu++26 (test for excess errors)

It's OK for C++17 because the changed code is only used for C++20 and later.

That test instantiates a very large variant to check that we don't hit
our template instantiation depth limit. Using the variable template
(which uses the class template) instead of the built-in causes it to
fail now.

So optimizing the variable template is now a priority.



[PATCH] c++, v2: Fix parsing [[]][[]];

2023-12-05 Thread Jakub Jelinek
On Tue, Dec 05, 2023 at 09:45:32AM -0500, Marek Polacek wrote:
> > When working on the previous patch I put [[]] [[]] asm (""); into a
> > testcase, but was surprised it wasn't parsed.
> 
> By wasn't parsed you mean we gave an error, right?  I only see an error
> with block-scope [[]] [[]];.

Yeah.
The reason why [[]][[]]; works at namespace scope is that if
  else if (cp_lexer_nth_token_is (parser->lexer,
  cp_parser_skip_std_attribute_spec_seq (parser,
 1),
  CPP_SEMICOLON))
which is the case here, then even if, after parsing the attributes, the next
token isn't CPP_SEMICOLON (as happens here without the patch), it just
returns, and another cp_parser_declaration call parses the second [[]], this
time followed by CPP_SEMICOLON.
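The distinction the patch relies on can be illustrated outside GCC with a small, hypothetical model (not GCC code): parse_spec returns a dedicated sentinel, standing in for void_list_node, when the next token does not start an attribute-specifier; nullptr, standing in for NULL_TREE, for an empty [[]]; and a pointer to the attribute name otherwise. The vector-of-strings token stream and all names here are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for GCC's tree values: the address of this object
// plays the role of void_list_node ("no attribute-specifier here"), while
// nullptr plays the role of NULL_TREE ("a specifier with no attributes").
static const std::string not_a_spec_marker;
static const std::string *const NOT_A_SPEC = &not_a_spec_marker;

// Parses one "[ [ name? ] ]" specifier starting at toks[pos].  Assumes
// well-formed input and at most one attribute name per specifier.
static const std::string *parse_spec(const std::vector<std::string> &toks,
                                     std::size_t &pos) {
  if (pos + 1 >= toks.size() || toks[pos] != "[" || toks[pos + 1] != "[")
    return NOT_A_SPEC;              // next token does not start a specifier
  pos += 2;                         // consume "[ ["
  const std::string *attr = nullptr;
  if (pos < toks.size() && toks[pos] != "]")
    attr = &toks[pos++];            // the single attribute name
  pos += 2;                         // consume "] ]"
  return attr;
}

// Returns the number of attributes in the sequence and leaves pos at the
// first token after it.
static int parse_spec_seq(const std::vector<std::string> &toks,
                          std::size_t &pos) {
  int count = 0;
  while (true) {
    const std::string *spec = parse_spec(toks, pos);
    if (spec == NOT_A_SPEC)
      break;                        // end of the attribute-specifier-seq
    if (spec == nullptr)
      continue;                     // empty [[]]: look for further specifiers
    ++count;
  }
  return count;
}
```

If the nullptr case used break instead of continue, parsing [[]] [[]]; would stop after the first [[]] and leave pos at the second "[", which mirrors the block-scope failure the patch fixes.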

> It seems marginally better to me to use void_list_node so that we don't
> need a new parm, like what we do when parsing parameters: ()/(void)/(...),
> but I should let others decide.

Here is a modified version of the patch which does it like that.

2023-12-05  Jakub Jelinek  

* parser.cc (cp_parser_std_attribute_spec): Return void_list_node
rather than NULL_TREE if token is neither CPP_OPEN_SQUARE nor
RID_ALIGNAS CPP_KEYWORD.
(cp_parser_std_attribute_spec_seq): For attr_spec == void_list_node
break, for attr_spec == NULL_TREE continue.

* g++.dg/cpp0x/gen-attrs-79.C: New test.

--- gcc/cp/parser.cc.jj 2023-12-05 16:18:32.224909370 +0100
+++ gcc/cp/parser.cc 2023-12-05 17:07:34.690170639 +0100
@@ -30244,7 +30244,11 @@ void cp_parser_late_contract_condition (
  [ [ assert :  contract-mode [opt] : conditional-expression ] ]
  [ [ pre :  contract-mode [opt] : conditional-expression ] ]
  [ [ post :  contract-mode [opt] identifier [opt] :
-conditional-expression ] ]  */
+conditional-expression ] ]
+
+   Return void_list_node if the current token doesn't start an
+   attribute-specifier to differentiate from NULL_TREE returned e.g.
+   for [ [ ] ].  */
 
 static tree
 cp_parser_std_attribute_spec (cp_parser *parser)
@@ -30324,7 +30328,7 @@ cp_parser_std_attribute_spec (cp_parser
 
   if (token->type != CPP_KEYWORD
  || token->keyword != RID_ALIGNAS)
-   return NULL_TREE;
+   return void_list_node;
 
   cp_lexer_consume_token (parser->lexer);
   maybe_warn_cpp0x (CPP0X_ATTRIBUTES);
@@ -30397,8 +30401,12 @@ cp_parser_std_attribute_spec_seq (cp_par
   while (true)
 {
   tree attr_spec = cp_parser_std_attribute_spec (parser);
-  if (attr_spec == NULL_TREE)
+  if (attr_spec == void_list_node)
break;
+  /* Accept [[]][[]]; for which cp_parser_std_attribute_spec
+returns NULL_TREE as there are no attributes.  */
+  if (attr_spec == NULL_TREE)
+   continue;
   if (attr_spec == error_mark_node)
return error_mark_node;
 
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-79.C.jj 2023-12-05 17:04:14.235988879 +0100
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-79.C 2023-12-05 17:04:14.235988879 +0100
@@ -0,0 +1,9 @@
+// { dg-do compile { target c++11 } }
+
+[[]] [[]];
+
+[[]] [[]] void
+foo ()
+{
+  [[]] [[]];
+}


Jakub



[committed] libstdc++: Disable std::formatter::set_debug_format [PR112832]

2023-12-05 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. Will backport to gcc-13 too.

-- >8 --

All set_debug_format member functions should be guarded by the
__cpp_lib_format_ranges macro (which is not defined yet).
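A minimal sketch of the pattern in isolation, assuming a hypothetical macro MY_HAS_FORMAT_RANGES in place of __cpp_lib_format_ranges: the member is declared only when the macro is nonzero, and generic code probes for it before calling it. The new test does the probing with a C++20 requires-expression; this sketch swaps in the C++17 std::void_t detection idiom so it also builds in C++17 mode. All type and function names are illustrative.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for __cpp_lib_format_ranges.  It is left undefined
// here to model the current state, in which the real macro is not defined.
// #define MY_HAS_FORMAT_RANGES 1

struct guarded_formatter {
  int type = 0;
#if MY_HAS_FORMAT_RANGES
  constexpr void set_debug_format() noexcept { type = 1; }
#endif
};

// For contrast: a formatter that always declares the member.
struct unguarded_formatter {
  int type = 0;
  constexpr void set_debug_format() noexcept { type = 1; }
};

// C++17 detection idiom: true if F has a callable set_debug_format().
template <typename F, typename = void>
struct has_set_debug_format : std::false_type {};

template <typename F>
struct has_set_debug_format<
    F, std::void_t<decltype(std::declval<F &>().set_debug_format())>>
    : std::true_type {};

// Calls set_debug_format() only when it exists; reports whether it did.
template <typename F>
constexpr bool try_debug_format() {
  F f{};
  if constexpr (has_set_debug_format<F>::value) {
    f.set_debug_format();
    return f.type == 1;
  } else {
    return false;
  }
}
```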

libstdc++-v3/ChangeLog:

PR libstdc++/112832
* include/std/format (formatter::set_debug_format): Ensure this
member is defined conditionally for all specializations.
* testsuite/std/format/formatter/112832.cc: New test.
---
 libstdc++-v3/include/std/format   |  8 +
 .../testsuite/std/format/formatter/112832.cc  | 29 +++
 2 files changed, 37 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/std/format/formatter/112832.cc

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 58cd310db4d..01f0a58392a 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1815,9 +1815,11 @@ namespace __format
return _M_f.format(__u, __fc);
}
 
+#if __cpp_lib_format_ranges
   constexpr void
   set_debug_format() noexcept
   { _M_f._M_spec._M_type = __format::_Pres_esc; }
+#endif
 
 private:
   __format::__formatter_int _M_f;
@@ -1843,7 +1845,9 @@ namespace __format
format(_CharT* __u, basic_format_context<_Out, _CharT>& __fc) const
{ return _M_f.format(__u, __fc); }
 
+#if __cpp_lib_format_ranges
   constexpr void set_debug_format() noexcept { _M_f.set_debug_format(); }
+#endif
 
 private:
   __format::__formatter_str<_CharT> _M_f;
@@ -1866,7 +1870,9 @@ namespace __format
   basic_format_context<_Out, _CharT>& __fc) const
{ return _M_f.format(__u, __fc); }
 
+#if __cpp_lib_format_ranges
   constexpr void set_debug_format() noexcept { _M_f.set_debug_format(); }
+#endif
 
 private:
   __format::__formatter_str<_CharT> _M_f;
@@ -1888,7 +1894,9 @@ namespace __format
   basic_format_context<_Out, _CharT>& __fc) const
{ return _M_f.format({__u, _Nm}, __fc); }
 
+#if __cpp_lib_format_ranges
   constexpr void set_debug_format() noexcept { _M_f.set_debug_format(); }
+#endif
 
 private:
   __format::__formatter_str<_CharT> _M_f;
diff --git a/libstdc++-v3/testsuite/std/format/formatter/112832.cc b/libstdc++-v3/testsuite/std/format/formatter/112832.cc
new file mode 100644
index 000..9aa2095a73d
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/format/formatter/112832.cc
@@ -0,0 +1,29 @@
+// { dg-do compile { target c++20 } }
+
+#include 
+
+template()[0])>>
+constexpr bool
+test_pr112832()
+{
+  std::formatter f;
+  if constexpr (requires{ f.set_debug_format(); })
+f.set_debug_format();
+  return true;
+}
+
+int main()
+{
+  static_assert(test_pr112832());
+  static_assert(test_pr112832());
+  static_assert(test_pr112832());
+  static_assert(test_pr112832());
+#ifdef _GLIBCXX_USE_WCHAR_T
+  static_assert(test_pr112832());
+  static_assert(test_pr112832());
+  static_assert(test_pr112832());
+  static_assert(test_pr112832());
+  static_assert(test_pr112832());
+#endif
+}
-- 
2.43.0



Re: [PATCH v3] RISC-V: Implement TLS Descriptors.

2023-12-05 Thread Tatsuyuki Ishi
> On Nov 21, 2023, at 15:59, Fangrui Song  wrote:
> 
> On Mon, Nov 20, 2023 at 6:20 AM Tatsuyuki Ishi wrote:
>> 
>> This implements TLS Descriptors (TLSDESC) as specified in [1].
>> 
>> The 4-instruction sequence is implemented as a single RTX insn for
>> simplicity, but this can be revisited later if instruction scheduling or
>> more flexible RA is desired.
>> 
>> The default remains the traditional TLS model, but this can be configured
>> with --with-tls={trad,desc}. The choice can be revisited once toolchain
>> and libc support ships.
>> 
>> [1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373.
>> 
>> gcc/Changelog:
>>* config/riscv/riscv.opt: Add -mtls-dialect to configure TLS flavor.
>>* config.gcc: Add --with-tls configuration option to change the
>>default TLS flavor.
>>* config/riscv/riscv.h: Add TARGET_TLSDESC determined from
>>-mtls-dialect and with_tls defaults.
>>* config/riscv/riscv-opts.h: Define enum riscv_tls_type for the
>>two TLS flavors.
>>* config/riscv/riscv-protos.h: Define SYMBOL_TLSDESC symbol type.
>>* config/riscv/riscv.md: Add instruction sequence for TLSDESC.
>>* config/riscv/riscv.cc (riscv_symbol_insns): Add instruction
>>sequence length data for TLSDESC.
>>(riscv_legitimize_tls_address): Add lowering of TLSDESC.
>>* doc/install.texi: Document --with-tls for RISC-V.
>>* doc/invoke.texi: Document --mtls-dialect for RISC-V.
> 
> Nit: One dash for --mtls-dialect.
> 
>> ---
>> No regression in gcc tests for rv64gc, tested alongside the binutils and
>> glibc implementation. Tested with --with-tls=desc.
>> 
>> v2: Add with_tls configuration option, and a few readability improvements.
>>Added Changelog.
>> v3: Add documentation per Kito's suggestion.
>>Fix minor issues pointed out by Kito and Jeff.
>>Thanks Kito Cheng and Jeff Law for review.
>> 
>> I've considered gating this behind a GAS feature test, but it seems
>> nontrivial, especially for restricting the variants available at runtime.
>> Since TLS descriptors are not selected by default, I've decided to leave it
>> ungated.
>> 
>> In other news, I have made some progress on the binutils side, and I'll try
>> to update the GAS / ld patch set, with relaxation included, by the end of
>> this month.
> 
> Thanks for the update.  I understand the complexity of adding a runtime
> test when the feature also requires binutils and rtld support.
> I hope that we can add a test checking the assembly under
> gcc/testsuite/gcc.target/riscv/tls; otherwise, since this is a non-default
> configuration, it may be difficult to figure out when it breaks.
> (glibc/elf/tst-* will need a runtime test, but GCC needs to have its own.)
> 
>> gcc/config.gcc  | 15 ++-
>> gcc/config/riscv/riscv-opts.h   |  6 ++
>> gcc/config/riscv/riscv-protos.h |  5 +++--
>> gcc/config/riscv/riscv.cc   | 24 
>> gcc/config/riscv/riscv.h|  9 +++--
>> gcc/config/riscv/riscv.md   | 21 -
>> gcc/config/riscv/riscv.opt  | 14 ++
>> gcc/doc/install.texi|  3 +++
>> gcc/doc/invoke.texi | 13 -
>> 9 files changed, 99 insertions(+), 11 deletions(-)
>> 
>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>> index 415e0e1ebc5..2c1a7179b02 100644
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -2434,6 +2434,7 @@ riscv*-*-linux*)
>># Force .init_array support.  The configure script cannot always
>># automatically detect that GAS supports it, yet we require it.
>>gcc_cv_initfini_array=yes
>> +   with_tls=${with_tls:-trad}
>>;;
>> riscv*-*-elf* | riscv*-*-rtems*)
>>tm_file="elfos.h newlib-stdint.h ${tm_file} riscv/elf.h"
>> @@ -2476,6 +2477,7 @@ riscv*-*-freebsd*)
>># Force .init_array support.  The configure script cannot always
>># automatically detect that GAS supports it, yet we require it.
>>gcc_cv_initfini_array=yes
>> +   with_tls=${with_tls:-trad}
>>;;
>> 
>> loongarch*-*-linux*)
>> @@ -4566,7 +4568,7 @@ case "${target}" in
>>;;
>> 
>>riscv*-*-*)
>> -   supported_defaults="abi arch tune riscv_attribute isa_spec"
>> +   supported_defaults="abi arch tune riscv_attribute isa_spec tls"
>> 
>>case "${target}" in
>>riscv-* | riscv32*) xlen=32 ;;
>> @@ -4694,6 +4696,17 @@ case "${target}" in
>>;;
>>esac
>>fi
>> +   # Handle --with-tls.
>> +   case "$with_tls" in
>> +   "" \
>> +   | trad | desc)
>> +   # OK
>> +   ;;
>> +   *)
>> +   echo "Unknown TLS method used in --with-tls=$with_tls" 1>&2
>> +   exit 1
>> +

Re: [PATCH] c++, v2: Further #pragma GCC unroll C++ fix [PR112795]

2023-12-05 Thread Jason Merrill

On 12/5/23 11:03, Jakub Jelinek wrote:

On Tue, Dec 05, 2023 at 10:07:19AM -0500, Jason Merrill wrote:

Please.  Maybe check_pragma_unroll? check_unroll_factor?


So like this (assuming it passes bootstrap/regtest, so far passed just
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ 
RUNTESTFLAGS="dg.exp='unroll*'"
)?


OK.


2023-12-05  Jakub Jelinek  

PR c++/112795
* cp-tree.h (cp_check_pragma_unroll): Declare.
* semantics.cc (cp_check_pragma_unroll): New function.
* parser.cc (cp_parser_pragma_unroll): Use cp_check_pragma_unroll.
* pt.cc (tsubst_expr) : Likewise.
(tsubst_stmt) : Likewise.

* g++.dg/ext/unroll-2.C: Use { target c++11 } instead of dg-skip-if for
-std=gnu++98.
* g++.dg/ext/unroll-3.C: Likewise.
* g++.dg/ext/unroll-7.C: New test.
* g++.dg/ext/unroll-8.C: New test.

--- gcc/cp/cp-tree.h.jj 2023-12-05 09:06:06.140878013 +0100
+++ gcc/cp/cp-tree.h 2023-12-05 16:21:05.564736203 +0100
@@ -7918,6 +7918,7 @@ extern tree most_general_lambda   (tree)
  extern tree finish_omp_target (location_t, tree, tree, bool);
  extern void finish_omp_target_clauses (location_t, tree, tree *);
  extern void maybe_warn_unparenthesized_assignment (tree, tsubst_flags_t);
+extern tree cp_check_pragma_unroll (location_t, tree);
  
  /* in tree.cc */
  extern int cp_tree_operand_length (const_tree);
--- gcc/cp/semantics.cc.jj  2023-12-04 08:59:06.888357091 +0100
+++ gcc/cp/semantics.cc 2023-12-05 16:56:03.718410332 +0100
@@ -13016,4 +13016,33 @@ cp_build_bit_cast (location_t loc, tree
return ret;
  }
  
+/* Diagnose invalid #pragma GCC unroll argument and adjust
+   it if needed.  */
+
+tree
+cp_check_pragma_unroll (location_t loc, tree unroll)
+{
+  HOST_WIDE_INT lunroll = 0;
+  if (type_dependent_expression_p (unroll))
+;
+  else if (!INTEGRAL_TYPE_P (TREE_TYPE (unroll))
+  || (!value_dependent_expression_p (unroll)
+  && (!tree_fits_shwi_p (unroll)
+  || (lunroll = tree_to_shwi (unroll)) < 0
+  || lunroll >= USHRT_MAX)))
+{
+  error_at (loc, "%<#pragma GCC unroll%> requires an"
+   " assignment-expression that evaluates to a non-negative"
+   " integral constant less than %u", USHRT_MAX);
+  unroll = integer_one_node;
+}
+  else if (TREE_CODE (unroll) == INTEGER_CST)
+{
+  unroll = fold_convert (integer_type_node, unroll);
+  if (integer_zerop (unroll))
+   unroll = integer_one_node;
+}
+  return unroll;
+}
+
  #include "gt-cp-semantics.h"
--- gcc/cp/parser.cc.jj 2023-12-05 09:05:37.533281014 +0100
+++ gcc/cp/parser.cc 2023-12-05 16:18:32.224909370 +0100
@@ -50243,27 +50243,7 @@ cp_parser_pragma_unroll (cp_parser *pars
  {
location_t location = cp_lexer_peek_token (parser->lexer)->location;
tree unroll = cp_parser_constant_expression (parser);
-  unroll = fold_non_dependent_expr (unroll);
-  HOST_WIDE_INT lunroll = 0;
-  if (type_dependent_expression_p (unroll))
-;
-  else if (!INTEGRAL_TYPE_P (TREE_TYPE (unroll))
-  || (!value_dependent_expression_p (unroll)
-  && (!tree_fits_shwi_p (unroll)
-  || (lunroll = tree_to_shwi (unroll)) < 0
-  || lunroll >= USHRT_MAX)))
-{
-  error_at (location, "%<#pragma GCC unroll%> requires an"
-   " assignment-expression that evaluates to a non-negative"
-   " integral constant less than %u", USHRT_MAX);
-  unroll = NULL_TREE;
-}
-  else if (TREE_CODE (unroll) == INTEGER_CST)
-{
-  unroll = fold_convert (integer_type_node, unroll);
-  if (integer_zerop (unroll))
-   unroll = integer_one_node;
-}
+  unroll = cp_check_pragma_unroll (location, fold_non_dependent_expr (unroll));
cp_parser_skip_to_pragma_eol (parser, pragma_tok);
return unroll;
  }
--- gcc/cp/pt.cc.jj 2023-12-05 09:06:06.175877520 +0100
+++ gcc/cp/pt.cc 2023-12-05 16:48:05.641109116 +0100
@@ -18407,22 +18407,24 @@ tsubst_stmt (tree t, tree args, tsubst_f
complain, in_decl, decomp);
  }
  
+	tree unroll = RECUR (RANGE_FOR_UNROLL (t));
+   if (unroll)
+ unroll
+   = cp_check_pragma_unroll (EXPR_LOCATION (RANGE_FOR_UNROLL (t)),
+ unroll);
if (processing_template_decl)
  {
RANGE_FOR_IVDEP (stmt) = RANGE_FOR_IVDEP (t);
-   RANGE_FOR_UNROLL (stmt) = RANGE_FOR_UNROLL (t);
+   RANGE_FOR_UNROLL (stmt) = unroll;
RANGE_FOR_NOVECTOR (stmt) = RANGE_FOR_NOVECTOR (t);
finish_range_for_decl (stmt, decl, expr);
if (decomp && decl != error_mark_node)
  cp_finish_decomp (decl, decomp);
  }
else
- {
-   tree unroll = RECUR (RANGE_FOR_UNROLL (t));
-   stmt = cp_convert_range_for 

[PATCH] remove qmtest-related Makefile targets

2023-12-05 Thread Eric Gallager
On GitHub, Joseph Myers (@jsm28 there) says in MentorEmbedded/qmtest#1
that the qmtest-related targets should have been removed long ago. This
patch does so.

Ref:
https://github.com/MentorEmbedded/qmtest/issues/1

gcc/ChangeLog:

* Makefile.in: Remove qmtest-related targets.
---
 gcc/Makefile.in | 53 -
 1 file changed, 53 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e154f7c0055..f80aa09ef74 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3733,7 +3733,6 @@ distclean: clean lang.distclean
-rm -f testsuite/*.log testsuite/*.sum
-cd testsuite && rm -f x *.x *.x? *.exe *.rpo *.o *.s *.S *.cc
-cd testsuite && rm -f *.out *.gcov *$(coverageexts)
-   -rm -rf ${QMTEST_DIR} stamp-qmtest
-rm -f .gdbinit configargs.h
-rm -f gcov.pod
 # Delete po/*.gmo only if we are not building in the source directory.
@@ -4415,58 +4414,6 @@ check-parallel-% : site.exp
  fi ; \
fi )
 
-# QMTest targets
-
-# The path to qmtest.
-QMTEST_PATH=qmtest
-
-# The flags to pass to qmtest.
-QMTESTFLAGS=
-
-# The flags to pass to "qmtest run".
-QMTESTRUNFLAGS=-f none --result-stream dejagnu_stream.DejaGNUStream
-
-# The command to use to invoke qmtest.
-QMTEST=${QMTEST_PATH} ${QMTESTFLAGS}
-
-# The tests (or suites) to run.
-QMTEST_GPP_TESTS=g++
-
-# The subdirectory of the OBJDIR that will be used to store the QMTest
-# test database configuration and that will be used for temporary
-# scratch space during QMTest's execution.
-QMTEST_DIR=qmtestsuite
-
-# Create the QMTest database configuration.
-${QMTEST_DIR} stamp-qmtest:
-   ${QMTEST} -D ${QMTEST_DIR} create-tdb \
-   -c gcc_database.GCCDatabase \
-   -a srcdir=`cd ${srcdir}/testsuite && ${PWD_COMMAND}` && \
-   $(STAMP) stamp-qmtest
-
-# Create the QMTest context file.
-${QMTEST_DIR}/context: stamp-qmtest
-   rm -f $@
-   echo "CompilerTable.languages=c cplusplus" >> $@
-   echo "CompilerTable.c_kind=GCC" >> $@
-   echo "CompilerTable.c_path=${objdir}/xgcc" >> $@
-   echo "CompilerTable.c_options=-B${objdir}/" >> $@
-   echo "CompilerTable.cplusplus_kind=GCC" >> $@
-   echo "CompilerTable.cplusplus_path=${objdir}/xg++" >> $@
-   echo "CompilerTable.cplusplus_options=-B${objdir}/" >> $@
-   echo "DejaGNUTest.target=${target_noncanonical}" >> $@
-
-# Run the G++ testsuite using QMTest.
-qmtest-g++: ${QMTEST_DIR}/context
-   cd ${QMTEST_DIR} && ${QMTEST} run ${QMTESTRUNFLAGS} -C context \
-  -o g++.qmr ${QMTEST_GPP_TESTS}
-
-# Use the QMTest GUI.
-qmtest-gui: ${QMTEST_DIR}/context
-   cd ${QMTEST_DIR} && ${QMTEST} gui -C context
-
-.PHONY: qmtest-g++
-
 # Run Paranoia on real.cc.
 
 paranoia.o: $(srcdir)/../contrib/paranoia.cc $(CONFIG_H) $(SYSTEM_H) $(TREE_H)
-- 
2.32.0 (Apple Git-132)



Re: [PATCH] libstdc++: Add test for LWG Issue 3897

2023-12-05 Thread Jonathan Wakely
On Tue, 5 Dec 2023 at 15:57, Will Hawkins wrote:
>
> On Tue, Dec 5, 2023 at 10:46 AM Jonathan Wakely  wrote:
> >
> > On Mon, 4 Dec 2023 at 16:42, Will Hawkins wrote:
> > >
> > > Hello!
> > >
> > > Thank you, as always, for the great work that you do on libstdc++. The
> > > inout_ptr implementation properly handles the issue raised in LWG 3897
> > > but it seems like having an explicit test might be a good idea.
> >
> > Thanks, Will, we should definitely have a test for this.
> >
> > I've tweaked it a bit to avoid leaking the pointer (in case anybody
> > runs the tests under valgrind or ASan) and to add your new test to the
>
> Of course ... how could I forget to delete the pointer? I'm a goofball.

:-)

> > existing file (to avoid the overhead of a separate test just for this
> > one check).
>
> Makes perfect sense. I wasn't sure how you typically handle that. I
> will know for the future.

In principle it's better to have one test file per thing we want to
check ... but libstdc++ has a lot of tests, and every one of them
includes the bits/stdc++.h precompiled header which includes the
entire library. And the way dejagnu works, every test runs multiple
compilations, because it preprocesses or compiles various helper files
to check the test conditions. And since I run every test about 20
times (with various combinations of options) it all adds up.



Re: [PATCH v5] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-05 Thread Richard Sandiford
Manos Anagnostakis  writes:
> This is an RTL pass that detects store forwarding from stores to larger loads (load pairs).
>
> This optimization is SPEC2017-driven and was found to be beneficial for some
> benchmarks through testing on ampere1/ampere1a machines.
>
> For example, it can transform cases like
>
> str  d5, [sp, #320]
> fmul d5, d31, d29
> ldp  d31, d17, [sp, #312] # Large load from small store
>
> to
>
> str  d5, [sp, #320]
> fmul d5, d31, d29
> ldr  d31, [sp, #312]
> ldr  d17, [sp, #320]
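To make the condition in this example concrete, here is a small, hypothetical C++ model (not the pass's code): each access is an offset from sp plus a size in bytes, and the check asks whether a store covers only part of a later, wider load. The real pass works on RTL, tracks addresses with cselib and is bounded by the aarch64-store-forwarding-threshold param; none of that is modelled here, and partial overlaps that extend past the load are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a memory access: a byte offset from a shared base
// register (sp in the example) and an access size in bytes.
struct Access {
  std::int64_t offset;
  std::int64_t size;
};

// Returns true if the store writes a strict subset of the bytes the load
// reads: the store lies entirely inside the load and is smaller than it.
constexpr bool store_partially_feeds_load(Access store, Access load) {
  bool inside = store.offset >= load.offset &&
                store.offset + store.size <= load.offset + load.size;
  return inside && store.size < load.size;
}

// str d5, [sp, #320] followed by ldp d31, d17, [sp, #312] (16 bytes).
constexpr Access kStore{320, 8};
constexpr Access kLoadPair{312, 16};
static_assert(store_partially_feeds_load(kStore, kLoadPair),
              "the load pair reads the stored bytes plus 8 more");

// After the split, ldr d31, [sp, #312] does not touch the stored bytes and
// ldr d17, [sp, #320] reads exactly those bytes.
static_assert(!store_partially_feeds_load(kStore, Access{312, 8}),
              "disjoint from the store");
static_assert(!store_partially_feeds_load(kStore, Access{320, 8}),
              "exact match with the store");
```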
>
> Currently, the pass is disabled by default on all architectures and enabled
> by a target-specific option.
>
> If deemed beneficial enough for a default, it will be enabled on
> ampere1/ampere1a, or other architectures as well, without needing to be
> turned on by this option.
>
> Bootstrapped and regtested on aarch64-linux.
>
> gcc/ChangeLog:
>
> * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
> * config/aarch64/aarch64-protos.h (make_pass_avoid_store_forwarding): Declare.
> * config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
>   (aarch64-store-forwarding-threshold): New param.
> * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> * doc/invoke.texi: Document new option and new param.
> * config/aarch64/aarch64-store-forwarding.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
>
> Signed-off-by: Manos Anagnostakis 
> Co-Authored-By: Manolis Tsamis 
> Co-Authored-By: Philipp Tomsich 
> ---
> Changes in v5:
>   - Remove unnecessary cselib_lookup on load_mem_addr.
>   - Fix warning with store_info by renaming to str_info.

OK, thanks!  And thanks for your patience with the reviews.

Richard

>  gcc/config.gcc|   1 +
>  gcc/config/aarch64/aarch64-passes.def |   1 +
>  gcc/config/aarch64/aarch64-protos.h   |   1 +
>  .../aarch64/aarch64-store-forwarding.cc   | 319 ++
>  gcc/config/aarch64/aarch64.opt|   9 +
>  gcc/config/aarch64/t-aarch64  |  10 +
>  gcc/doc/invoke.texi   |  11 +-
>  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
>  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
>  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
>  10 files changed, 450 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 6450448f2f0..7c48429eb82 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -350,6 +350,7 @@ aarch64*-*-*)
>   cxx_target_objs="aarch64-c.o"
>   d_target_objs="aarch64-d.o"
>   extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
> + extra_objs="${extra_objs} aarch64-store-forwarding.o"
>   target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc \$(srcdir)/config/aarch64/aarch64-sve-builtins.h \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
>   target_has_targetm_common=yes
>   ;;
> diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
> index 662a13fd5e6..94ced0aebf6 100644
> --- a/gcc/config/aarch64/aarch64-passes.def
> +++ b/gcc/config/aarch64/aarch64-passes.def
> @@ -24,3 +24,4 @@ INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstat
>  INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
>  INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
>  INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
> +INSERT_PASS_AFTER (pass_peephole2, 1, pass_avoid_store_forwarding);
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 25a9103f0e7..bd4b34d9af1 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1063,6 +1063,7 @@ rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
>  rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
>  rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
>  rtl_opt_pass *make_pass_switch_pstate_sm (gcc::context *ctxt);
> +rtl_opt_pass *make_pass_avoid_store_forwarding (gcc::context 

Re: [PATCH v2 06/11] aarch64: Fix up aarch64_print_operand xzr/wzr case

2023-12-05 Thread Richard Sandiford
Alex Coplan  writes:
> Hi,
>
> This is a v2 of:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637612.html
>
> v1 was approved as-is, but this version pulls out the test into a helper
> function which is used by later patches in the series.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> -- >8 --
>
> This adjusts aarch64_print_operand to recognize zero rtxes in modes other than
> VOIDmode.  This allows us to use xzr/wzr for zero vectors, for example.
>
> We extract the test into a helper function, aarch64_const_zero_rtx_p, since this
> predicate is needed by later patches.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-protos.h (aarch64_const_zero_rtx_p): New.
>   * config/aarch64/aarch64.cc (aarch64_const_zero_rtx_p): New.
>   Use it ...
>   (aarch64_print_operand): ... here.  Recognize CONST0_RTXes in
>   modes other than VOIDmode.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index d2718cc87b3..27fc6ccf098 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -773,6 +773,7 @@ bool aarch64_expand_cpymem (rtx *);
>  bool aarch64_expand_setmem (rtx *);
>  bool aarch64_float_const_zero_rtx_p (rtx);
>  bool aarch64_float_const_rtx_p (rtx);
> +bool aarch64_const_zero_rtx_p (rtx);
>  bool aarch64_function_arg_regno_p (unsigned);
>  bool aarch64_fusion_enabled_p (enum aarch64_fusion_pairs);
>  bool aarch64_gen_cpymemqi (rtx *);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index fca64daf2a0..a35c6bbe335 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -9095,6 +9095,15 @@ aarch64_float_const_zero_rtx_p (rtx x)
>return real_equal (CONST_DOUBLE_REAL_VALUE (x), );
>  }
>  
> +/* Return true if X is any kind of constant zero rtx.  */
> +
> +bool
> +aarch64_const_zero_rtx_p (rtx x)
> +{
> +  return x == CONST0_RTX (GET_MODE (x))
> +|| (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x));
> +}
> +

Think this is easier to read if formatted as:

  return (x == CONST0_RTX (GET_MODE (x))
  || (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x)));

OK with that change (or as-is if you prefer).  Thanks for splitting
the function out.

Richard

>  /* Return TRUE if rtx X is immediate constant that fits in a single
> MOVI immediate operation.  */
>  bool
> @@ -9977,8 +9986,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
>  
>  case 'w':
>  case 'x':
> -  if (x == const0_rtx
> -   || (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x)))
> +  if (aarch64_const_zero_rtx_p (x))
>   {
> asm_fprintf (f, "%czr", code);
> break;


Re: [PATCH 1/1] RISC-V: Add support for XCVsimd extension in CV32E40P

2023-12-05 Thread Kito Cheng
There are a few formatting issues, but I think this is generally OK for
intrinsic-only support. One question before I formally say LGTM for this
patch: are you interested in making it able to generate code from GNU
vectors and for auto-vectorization as well? If so, I can spend more time
reviewing that and give more comments.


[PATCH] c++, v2: Further #pragma GCC unroll C++ fix [PR112795]

2023-12-05 Thread Jakub Jelinek
On Tue, Dec 05, 2023 at 10:07:19AM -0500, Jason Merrill wrote:
> Please.  Maybe check_pragma_unroll? check_unroll_factor?

So like this (assuming it passes bootstrap/regtest, so far passed just
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ 
RUNTESTFLAGS="dg.exp='unroll*'"
)?

2023-12-05  Jakub Jelinek  

PR c++/112795
* cp-tree.h (cp_check_pragma_unroll): Declare.
* semantics.cc (cp_check_pragma_unroll): New function.
* parser.cc (cp_parser_pragma_unroll): Use cp_check_pragma_unroll.
* pt.cc (tsubst_expr) : Likewise.
(tsubst_stmt) : Likewise.

* g++.dg/ext/unroll-2.C: Use { target c++11 } instead of dg-skip-if for
-std=gnu++98.
* g++.dg/ext/unroll-3.C: Likewise.
* g++.dg/ext/unroll-7.C: New test.
* g++.dg/ext/unroll-8.C: New test.
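Outside GCC's tree machinery, the range checks that the new cp_check_pragma_unroll helper applies to a constant argument can be sketched as follows. The name check_unroll_factor (one of the names floated in review) and the use of std::optional are illustrative assumptions; this is not GCC code, and it ignores the non-integral-type and dependent-expression cases the real function handles.

```cpp
#include <cassert>
#include <climits>
#include <optional>

// Hypothetical model of the checks cp_check_pragma_unroll applies once the
// argument has folded to an integer constant.  std::nullopt marks inputs
// that GCC diagnoses; the patched code then substitutes 1 (previously
// NULL_TREE in the parser) so that compilation can continue.
std::optional<unsigned> check_unroll_factor(long long value) {
  if (value < 0 || value >= USHRT_MAX)
    return std::nullopt;             // "... less than USHRT_MAX" error
  if (value == 0)
    return 1u;                       // 0 is canonicalized to 1 (no unrolling)
  return static_cast<unsigned>(value);
}
```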

--- gcc/cp/cp-tree.h.jj 2023-12-05 09:06:06.140878013 +0100
+++ gcc/cp/cp-tree.h 2023-12-05 16:21:05.564736203 +0100
@@ -7918,6 +7918,7 @@ extern tree most_general_lambda   (tree)
 extern tree finish_omp_target  (location_t, tree, tree, bool);
 extern void finish_omp_target_clauses  (location_t, tree, tree *);
 extern void maybe_warn_unparenthesized_assignment (tree, tsubst_flags_t);
+extern tree cp_check_pragma_unroll (location_t, tree);
 
 /* in tree.cc */
 extern int cp_tree_operand_length  (const_tree);
--- gcc/cp/semantics.cc.jj  2023-12-04 08:59:06.888357091 +0100
+++ gcc/cp/semantics.cc 2023-12-05 16:56:03.718410332 +0100
@@ -13016,4 +13016,33 @@ cp_build_bit_cast (location_t loc, tree
   return ret;
 }
 
+/* Diagnose invalid #pragma GCC unroll argument and adjust
+   it if needed.  */
+
+tree
+cp_check_pragma_unroll (location_t loc, tree unroll)
+{
+  HOST_WIDE_INT lunroll = 0;
+  if (type_dependent_expression_p (unroll))
+;
+  else if (!INTEGRAL_TYPE_P (TREE_TYPE (unroll))
+  || (!value_dependent_expression_p (unroll)
+  && (!tree_fits_shwi_p (unroll)
+  || (lunroll = tree_to_shwi (unroll)) < 0
+  || lunroll >= USHRT_MAX)))
+{
+  error_at (loc, "%<#pragma GCC unroll%> requires an"
+   " assignment-expression that evaluates to a non-negative"
+   " integral constant less than %u", USHRT_MAX);
+  unroll = integer_one_node;
+}
+  else if (TREE_CODE (unroll) == INTEGER_CST)
+{
+  unroll = fold_convert (integer_type_node, unroll);
+  if (integer_zerop (unroll))
+   unroll = integer_one_node;
+}
+  return unroll;
+}
+
 #include "gt-cp-semantics.h"
--- gcc/cp/parser.cc.jj 2023-12-05 09:05:37.533281014 +0100
+++ gcc/cp/parser.cc 2023-12-05 16:18:32.224909370 +0100
@@ -50243,27 +50243,7 @@ cp_parser_pragma_unroll (cp_parser *pars
 {
   location_t location = cp_lexer_peek_token (parser->lexer)->location;
   tree unroll = cp_parser_constant_expression (parser);
-  unroll = fold_non_dependent_expr (unroll);
-  HOST_WIDE_INT lunroll = 0;
-  if (type_dependent_expression_p (unroll))
-;
-  else if (!INTEGRAL_TYPE_P (TREE_TYPE (unroll))
-  || (!value_dependent_expression_p (unroll)
-  && (!tree_fits_shwi_p (unroll)
-  || (lunroll = tree_to_shwi (unroll)) < 0
-  || lunroll >= USHRT_MAX)))
-{
-  error_at (location, "%<#pragma GCC unroll%> requires an"
-   " assignment-expression that evaluates to a non-negative"
-   " integral constant less than %u", USHRT_MAX);
-  unroll = NULL_TREE;
-}
-  else if (TREE_CODE (unroll) == INTEGER_CST)
-{
-  unroll = fold_convert (integer_type_node, unroll);
-  if (integer_zerop (unroll))
-   unroll = integer_one_node;
-}
+  unroll = cp_check_pragma_unroll (location, fold_non_dependent_expr (unroll));
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
   return unroll;
 }
--- gcc/cp/pt.cc.jj 2023-12-05 09:06:06.175877520 +0100
+++ gcc/cp/pt.cc	2023-12-05 16:48:05.641109116 +0100
@@ -18407,22 +18407,24 @@ tsubst_stmt (tree t, tree args, tsubst_f
complain, in_decl, decomp);
  }
 
+   tree unroll = RECUR (RANGE_FOR_UNROLL (t));
+   if (unroll)
+ unroll
+   = cp_check_pragma_unroll (EXPR_LOCATION (RANGE_FOR_UNROLL (t)),
+ unroll);
if (processing_template_decl)
  {
RANGE_FOR_IVDEP (stmt) = RANGE_FOR_IVDEP (t);
-   RANGE_FOR_UNROLL (stmt) = RANGE_FOR_UNROLL (t);
+   RANGE_FOR_UNROLL (stmt) = unroll;
RANGE_FOR_NOVECTOR (stmt) = RANGE_FOR_NOVECTOR (t);
finish_range_for_decl (stmt, decl, expr);
if (decomp && decl != error_mark_node)
  cp_finish_decomp (decl, decomp);
  }
else
- {
-   tree unroll = RECUR (RANGE_FOR_UNROLL (t));
-   stmt = cp_convert_range_for (stmt, decl, expr, decomp,
-
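
The value rules that cp_check_pragma_unroll applies to a non-dependent integer constant argument can be modelled in isolation. The following is a minimal standalone sketch, not GCC code: `check_unroll_factor` and `UnrollCheck` are hypothetical names, the type- and value-dependence checks on trees are left out, and the argument is assumed to already be an integer constant.

```cpp
#include <cassert>
#include <climits>

// Hypothetical result type: the adjusted unroll factor, plus whether an
// error would have been emitted for the requested value.
struct UnrollCheck
{
  long long factor;
  bool diagnosed;
};

// Sketch of the value rules in cp_check_pragma_unroll for an integer
// constant argument: values outside [0, USHRT_MAX) are diagnosed and
// replaced by 1, and 0 is canonicalized to 1 (both mean "do not unroll").
inline UnrollCheck
check_unroll_factor (long long requested)
{
  if (requested < 0 || requested >= USHRT_MAX)
    return { 1, true };
  if (requested == 0)
    return { 1, false };
  return { requested, false };
}
```

Note that in the diff the new helper substitutes integer_one_node after the error, where the old parser code set the result to NULL_TREE.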

Re: [PATCH] libsupc++: try cxa_thread_atexit_impl at runtime

2023-12-05 Thread David Edelsohn
Alex,

This patch broke bootstrap on AIX.  The stage1 compiler is not able to
build a program and stage2 configure fails.

Thanks, David


Re: [PATCH] c++: Implement C++ DR 2262 - Attributes for asm-definition [PR110734]

2023-12-05 Thread Jason Merrill

On 12/5/23 02:40, Jakub Jelinek wrote:

Hi!

It seems that in 2017 attribute-specifier-seq[opt] was added to
asm-declaration and the change was voted in as a DR.

The following patch implements it by parsing the attributes and warning
about them.

I found one attribute parsing bug I'll send a fix for momentarily.

And there is another thing I wonder about: with -Wno-attributes= we are
supposed to ignore the attributes altogether, but we are actually still
warning about them when we emit these generic warnings about ignoring
all attributes which appertain to this and that (perhaps with some
exceptions we first remove from the attribute chain), like:
void foo () { [[foo::bar]]; }
with -Wattributes -Wno-attributes=foo::bar
Shouldn't we call some helper function in cases like this and warn
not when std_attrs (or whatever the attribute chain variable is called)
is non-NULL, but only if it is non-NULL and contains at least one
non-attribute_ignored_p attribute?


Sounds good.


cp_parser_declaration at least tries:
   if (std_attrs != NULL_TREE && !attribute_ignored_p (std_attrs))
 warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
 OPT_Wattributes, "attribute ignored");
but attribute_ignored_p here checks only the first attribute rather than
the whole chain, so it will incorrectly not warn if an ignored attribute
is followed by a non-ignored one.


I agree.


Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2023-12-05  Jakub Jelinek  

PR c++/110734
* parser.cc (cp_parser_block_declaration): Implement C++ DR 2262
- Attributes for asm-definition.  Call cp_parser_asm_definition
even if RID_ASM token is only seen after sequence of standard
attributes.
(cp_parser_asm_definition): Parse standard attributes before
RID_ASM token and warn for them with -Wattributes.

* g++.dg/DRs/dr2262.C: New test.
* g++.dg/cpp0x/gen-attrs-76.C (foo, bar): Don't expect errors
on attributes on asm definitions.
* g++.dg/gomp/attrs-11.C: Remove 2 expected errors.

--- gcc/cp/parser.cc.jj 2023-12-04 08:59:06.871357329 +0100
+++ gcc/cp/parser.cc	2023-12-04 20:23:53.225009856 +0100
@@ -15398,7 +15398,6 @@ cp_parser_block_declaration (cp_parser *
/* Peek at the next token to figure out which kind of declaration is
   present.  */
cp_token *token1 = cp_lexer_peek_token (parser->lexer);
-  size_t attr_idx;
  
/* If the next keyword is `asm', we have an asm-definition.  */
if (token1->keyword == RID_ASM)
@@ -15452,22 +15451,36 @@ cp_parser_block_declaration (cp_parser *
/* If the next token is `static_assert' we have a static assertion.  */
else if (token1->keyword == RID_STATIC_ASSERT)
  cp_parser_static_assert (parser, /*member_p=*/false);
-  /* If the next tokens after attributes is `using namespace', then we have
- a using-directive.  */
-  else if ((attr_idx = cp_parser_skip_std_attribute_spec_seq (parser, 1)) != 1
-  && cp_lexer_nth_token_is_keyword (parser->lexer, attr_idx,
-RID_USING)
-  && cp_lexer_nth_token_is_keyword (parser->lexer, attr_idx + 1,
-RID_NAMESPACE))
+  else
  {
-  if (statement_p)
-   cp_parser_commit_to_tentative_parse (parser);
-  cp_parser_using_directive (parser);
+  size_t attr_idx = cp_parser_skip_std_attribute_spec_seq (parser, 1);
+  cp_token *after_attr = NULL;
+  if (attr_idx != 1)
+   after_attr = cp_lexer_peek_nth_token (parser->lexer, attr_idx);
+  /* If the next tokens after attributes is `using namespace', then we have
+a using-directive.  */
+  if (after_attr
+ && after_attr->keyword == RID_USING
+ && cp_lexer_nth_token_is_keyword (parser->lexer, attr_idx + 1,
+   RID_NAMESPACE))
+   {
+ if (statement_p)
+   cp_parser_commit_to_tentative_parse (parser);
+ cp_parser_using_directive (parser);
+   }
+  /* If the next token after attributes is `asm', then we have
+an asm-definition.  */
+  else if (after_attr && after_attr->keyword == RID_ASM)
+   {
+ if (statement_p)
+   cp_parser_commit_to_tentative_parse (parser);
+ cp_parser_asm_definition (parser);
+   }
+  /* Anything else must be a simple-declaration.  */
+  else
+   cp_parser_simple_declaration (parser, !statement_p,
+ /*maybe_range_for_decl*/NULL);
  }
-  /* Anything else must be a simple-declaration.  */
-  else
-cp_parser_simple_declaration (parser, !statement_p,
- /*maybe_range_for_decl*/NULL);
  }
  
  /* Parse a simple-declaration.

@@ -22424,6 +22437,7 @@ cp_parser_asm_definition (cp_parser* par
bool invalid_inputs_p = false;
bool invalid_outputs_p = false;
required_token missing = 
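
Jakub's point that attribute_ignored_p looks only at the head of the attribute chain can be illustrated with a small standalone model. This is a sketch under stated assumptions, not GCC's tree API: the `Attr` list, `is_ignored`, `head_only_warns` and `any_nonignored_attr` are hypothetical, and an attribute counts as "ignored" when its name matches a single -Wno-attributes= entry.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified attribute chain: a singly linked list of
// "namespace::name" entries (standing in for a TREE_CHAIN'd list).
struct Attr
{
  std::string name;
  const Attr *next;
};

// Hypothetical stand-in for the -Wno-attributes= check on one attribute.
static bool
is_ignored (const Attr *a, const std::string &ignored)
{
  return a->name == ignored;
}

// Mirrors the current behaviour: only the first attribute is examined.
static bool
head_only_warns (const Attr *chain, const std::string &ignored)
{
  return chain != nullptr && !is_ignored (chain, ignored);
}

// Proposed behaviour: warn if any attribute in the chain is not ignored.
static bool
any_nonignored_attr (const Attr *chain, const std::string &ignored)
{
  for (const Attr *a = chain; a != nullptr; a = a->next)
    if (!is_ignored (a, ignored))
      return true;
  return false;
}
```

With `[[foo::bar, gnu::unused]]` and -Wno-attributes=foo::bar, the head-only check stays silent even though gnu::unused is not ignored, while the chain walk reports it.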

Re: [PATCH] libstdc++: Add test for LWG Issue 3897

2023-12-05 Thread Will Hawkins
On Tue, Dec 5, 2023 at 10:46 AM Jonathan Wakely  wrote:
>
> On Mon, 4 Dec 2023 at 16:42, Will Hawkins wrote:
> >
> > Hello!
> >
> > Thank you, as always, for the great work that you do on libstdc++. The
> > inout_ptr implementation properly handles the issue raised in LWG 3897
> > but it seems like having an explicit test might be a good idea.
>
> Thanks, Will, we should definitely have a test for this.
>
> I've tweaked it a bit to avoid leaking the pointer (in case anybody
> runs the tests under valgrind or ASan) and to add your new test to the

Of course ... how could I forget to delete the pointer? I'm a goofball.

> existing file (to avoid the overhead of a separate test just for this
> one check).

Makes perfect sense. I wasn't sure how you typically handle that. I
will know for the future.

>
> See attached ...

Thank you for the feedback! I look forward to the next time I can help!
Will


Re: [PATCH] libstdc++: Add test for LWG Issue 3897

2023-12-05 Thread Jonathan Wakely
On Mon, 4 Dec 2023 at 16:42, Will Hawkins wrote:
>
> Hello!
>
> Thank you, as always, for the great work that you do on libstdc++. The
> inout_ptr implementation properly handles the issue raised in LWG 3897
> but it seems like having an explicit test might be a good idea.

Thanks, Will, we should definitely have a test for this.

I've tweaked it a bit to avoid leaking the pointer (in case anybody
runs the tests under valgrind or ASan) and to add your new test to the
existing file (to avoid the overhead of a separate test just for this
one check).

See attached ...
commit c02f3696fdb07d1a06c1aa7b035be9a20d65b803
Author: Will Hawkins 
Date:   Mon Dec 4 20:59:44 2023

libstdc++: Add test for LWG Issue 3897

Add a test to verify that the implementation of inout_ptr is not
vulnerable to LWG Issue 3897.

libstdc++-v3/ChangeLog:

* testsuite/20_util/smartptr.adapt/inout_ptr/2.cc: Add check
for LWG Issue 3897.

Co-authored-by: Jonathan Wakely 

diff --git a/libstdc++-v3/testsuite/20_util/smartptr.adapt/inout_ptr/2.cc b/libstdc++-v3/testsuite/20_util/smartptr.adapt/inout_ptr/2.cc
index ca6076209c2..b4a2d95227a 100644
--- a/libstdc++-v3/testsuite/20_util/smartptr.adapt/inout_ptr/2.cc
+++ b/libstdc++-v3/testsuite/20_util/smartptr.adapt/inout_ptr/2.cc
@@ -96,7 +96,22 @@ test_unique_ptr()
   VERIFY( upbd->id == 2 );
 }
 
+void
+test_lwg3897()
+{
+  // Verify that the implementation handles LWG Issue 3897
+  auto nuller = [](int** p) {
+    delete *p;
+    *p = nullptr;
+  };
+  int* i = new int{5};
+  nuller(std::inout_ptr(i));
+
+  VERIFY( i == nullptr );
+}
+
 int main()
 {
   test_unique_ptr();
+  test_lwg3897();
 }
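
The property the new test checks, namely that a null pointer written by the callee reaches the caller's raw pointer, can be shown with a hand-rolled adaptor. This is a hypothetical sketch for illustration only, not libstdc++'s std::inout_ptr: `raw_inout_ptr` is an invented name, it handles only raw pointers, and it writes back unconditionally in its destructor.

```cpp
#include <cassert>

// Hypothetical minimal inout adaptor for a raw T* (NOT std::inout_ptr).
// It hands the callee a T** to a temporary copy and, when destroyed at the
// end of the full-expression, stores the callee's result back.
template <typename T>
class raw_inout_ptr
{
public:
  explicit raw_inout_ptr (T *&target) : target_ (target), tmp_ (target) {}

  // Write back unconditionally, including when the callee stored nullptr.
  ~raw_inout_ptr () { target_ = tmp_; }

  raw_inout_ptr (const raw_inout_ptr &) = delete;
  raw_inout_ptr &operator= (const raw_inout_ptr &) = delete;

  // Implicit conversion so the adaptor can be passed where T** is expected.
  operator T ** () { return &tmp_; }

private:
  T *&target_;
  T *tmp_;
};
```

A variant that skipped the write-back whenever the new value is null would leave the caller's pointer dangling after a "free and null" callee, which is the failure mode the test above guards against.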


Re: [PATCH v3 1/3] libgomp, nvptx: low-latency memory allocator

2023-12-05 Thread Andrew Stubbs

On 04/12/2023 16:04, Tobias Burnus wrote:

On 03.12.23 01:32, Andrew Stubbs wrote:

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU devices, via the omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe, and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the low-latency
allocator will not work with the PTX 3.1 multilib.

For now, the omp_low_lat_mem_alloc allocator also works, but that will change
when I implement the access traits.


...

LGTM; however, I wonder about the following:


diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e5fe7af76af..39d0749e7b3 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -3012,11 +3012,14 @@ value.
  @item omp_const_mem_alloc   @tab omp_const_mem_space
  @item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
  @item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
-@item omp_cgroup_mem_alloc  @tab --
-@item omp_pteam_mem_alloc   @tab --
-@item omp_thread_mem_alloc  @tab --
+@item omp_cgroup_mem_alloc  @tab omp_low_lat_mem_space (implementation defined)
+@item omp_pteam_mem_alloc   @tab omp_low_lat_mem_space (implementation defined)
+@item omp_thread_mem_alloc  @tab omp_low_lat_mem_space (implementation defined)

  @end multitable

+The @code{omp_low_lat_mem_space} is only available on supported devices.
+See @ref{Offload-Target Specifics}.
+


Whether it would be clearer not to have this wording here, for the
OMP_ALLOCATOR env var, i.e.

https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fALLOCATOR.html

but just a simple cross-reference like:

--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -3061,5 +3061,5 @@ OMP_ALLOCATOR=omp_low_lat_mem_space:pinned=true,partition=nearest

  @item @emph{See also}:
  @ref{Memory allocation}, @ref{omp_get_default_allocator},
-@ref{omp_set_default_allocator}
+@ref{omp_set_default_allocator}, @ref{Offload-Target Specifics}

  @item @emph{Reference}:


And add your wording to:
   https://gcc.gnu.org/onlinedocs/libgomp/Memory-allocation.html

As this section mentions that "omp_low_lat_mem_space maps to
omp_default_mem_space" in general, mentioning there in addition that
omp_low_lat_mem_space is honored on devices seems to be the better
location.


How about this?

--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -3012,9 +3012,9 @@ value.
 @item omp_const_mem_alloc   @tab omp_const_mem_space
 @item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
 @item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
-@item omp_cgroup_mem_alloc  @tab --
-@item omp_pteam_mem_alloc   @tab --
-@item omp_thread_mem_alloc  @tab --
+@item omp_cgroup_mem_alloc  @tab omp_low_lat_mem_space (implementation defined)
+@item omp_pteam_mem_alloc   @tab omp_low_lat_mem_space (implementation defined)
+@item omp_thread_mem_alloc  @tab omp_low_lat_mem_space (implementation defined)

 @end multitable

 The predefined allocators use the default values for the traits,
@@ -3060,7 +3060,7 @@ OMP_ALLOCATOR=omp_low_lat_mem_space:pinned=true,partition=nearest


 @item @emph{See also}:
 @ref{Memory allocation}, @ref{omp_get_default_allocator},
-@ref{omp_set_default_allocator}
+@ref{omp_set_default_allocator}, @ref{Offload-Target Specifics}

 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.21
@@ -5710,7 +5710,8 @@ For the memory spaces, the following applies:
 @itemize
 @item @code{omp_default_mem_space} is supported
 @item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
-@item @code{omp_low_lat_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_low_lat_mem_space} is only available on supported devices,
+  and maps to @code{omp_default_mem_space} otherwise.
 @item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
   unless the memkind library is available
 @item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
@@ -5766,6 +5767,9 @@ Additional notes regarding the traits:
 @item The @code{sync_hint} trait has no effect.
 @end itemize

+See also:
+@ref{Offload-Target Specifics}
+
 @c -
 @c Offload-Target Specifics
 @c -



Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 
80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: 
Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; 
Registergericht München, HRB 106955






Re: [PATCH] tree-optimization/112843 - update_stmt doing wrong things

2023-12-05 Thread Andrew MacLeod

On 12/5/23 03:27, Richard Biener wrote:

The following removes range_query::update_stmt and its single
invocation from update_stmt_operands.  That function is not
supposed to look beyond the raw stmt contents of the passed
stmt since there's no guarantee about the rest of the IL.

I've successfully bootstrapped & tested the update_stmt_operands
hunk, now testing removal of the actual routine as well.  The
testcase that was added when introducing range_query::update_stmt
still passes.

OK to remove the implementation?  I don't see any way around
removing the call though.


I'm OK with removing it.  Now that we are enabling ranger during a lot of IL
updating, it probably doesn't make sense for the few cases it used to
help with, and may well be dangerous.


The testcase in question that was added appears to be threaded now, which
it wasn't before.  If a similar situation occurs and we need some sort
of updating, I'll just mark the SSA name on the LHS as out-of-date, and
then it'll get lazily updated if need be.


Thanks.

Andrew
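
Andrew's alternative, marking the LHS SSA name out-of-date and letting it be recomputed lazily, is a general invalidate-then-recompute pattern. The sketch below is a hypothetical, self-contained model of that pattern, not ranger's cache API: `LazyCache` is an invented class, integer ids stand in for SSA names, and a caller-supplied function stands in for range evaluation.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical lazily invalidated cache: entries are flagged stale instead
// of being recomputed eagerly, and recomputed on the next lookup.
class LazyCache
{
public:
  explicit LazyCache (std::function<int (int)> compute)
    : compute_ (std::move (compute)) {}

  // Cheap invalidation: only flag the entry; nothing is recomputed here.
  void mark_stale (int id)
  {
    auto it = entries_.find (id);
    if (it != entries_.end ())
      it->second.stale = true;
  }

  // Return the cached value, recomputing it if missing or stale.
  int get (int id)
  {
    auto it = entries_.find (id);
    if (it != entries_.end () && !it->second.stale)
      return it->second.value;
    int value = compute_ (id);
    ++recomputations_;
    entries_[id] = Entry { value, false };
    return value;
  }

  int recomputations () const { return recomputations_; }

private:
  struct Entry
  {
    int value;
    bool stale;
  };
  std::function<int (int)> compute_;
  std::unordered_map<int, Entry> entries_;
  int recomputations_ = 0;
};
```

Marking an entry stale costs a single flag write; the recomputation is deferred until, and only if, the value is queried again, so no update has to run while the rest of the IL may be in an inconsistent state.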


Thanks,
Richard.

PR tree-optimization/112843
* tree-ssa-operands.cc (update_stmt_operands): Do not call
update_stmt from ranger.
* value-query.h (range_query::update_stmt): Remove.
* gimple-range.h (gimple_ranger::update_stmt): Likewise.
* gimple-range.cc (gimple_ranger::update_stmt): Likewise.
---
  gcc/gimple-range.cc  | 34 --
  gcc/gimple-range.h   |  1 -
  gcc/tree-ssa-operands.cc |  3 ---
  gcc/value-query.h|  3 ---
  4 files changed, 41 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 5e9bb397a20..84d2c7516e6 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -544,40 +544,6 @@ gimple_ranger::register_transitive_inferred_ranges (basic_block bb)
  }
  }
  
-// When a statement S has changed since the result was cached, re-evaluate
-// and update the global cache.
-
-void
-gimple_ranger::update_stmt (gimple *s)
-{
-  tree lhs = gimple_get_lhs (s);
-  if (!lhs || !gimple_range_ssa_p (lhs))
-return;
-  Value_Range r (TREE_TYPE (lhs));
-  // Only update if it already had a value.
-  if (m_cache.get_global_range (r, lhs))
-{
-  // Re-calculate a new value using just cache values.
-  Value_Range tmp (TREE_TYPE (lhs));
-  fold_using_range f;
-  fur_stmt src (s, &m_cache);
-  f.fold_stmt (tmp, s, src, lhs);
-
-  // Combine the new value with the old value to check for a change.
-  if (r.intersect (tmp))
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   {
- print_generic_expr (dump_file, lhs, TDF_SLIM);
- fprintf (dump_file, " : global value re-evaluated to ");
- r.dump (dump_file);
- fputc ('\n', dump_file);
-   }
- m_cache.set_global_range (lhs, r);
-   }
-}
-}
-
  // This routine will export whatever global ranges are known to GCC
  // SSA_RANGE_NAME_INFO and SSA_NAME_PTR_INFO fields.
  
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 5807a2b80e5..6b0835c4ca1 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -52,7 +52,6 @@ public:
virtual bool range_of_stmt (vrange , gimple *, tree name = NULL) override;
virtual bool range_of_expr (vrange , tree name, gimple * = NULL) override;
virtual bool range_on_edge (vrange , edge e, tree name) override;
-  virtual void update_stmt (gimple *) override;
void range_on_entry (vrange , basic_block bb, tree name);
void range_on_exit (vrange , basic_block bb, tree name);
void export_global_ranges ();
diff --git a/gcc/tree-ssa-operands.cc b/gcc/tree-ssa-operands.cc
index 57e393ae164..b0516a00d64 100644
--- a/gcc/tree-ssa-operands.cc
+++ b/gcc/tree-ssa-operands.cc
@@ -30,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
  #include "stmt.h"
  #include "print-tree.h"
  #include "dumpfile.h"
-#include "value-query.h"
  
  
  /* This file contains the code required to manage the operands cache of the
@@ -1146,8 +1145,6 @@ update_stmt_operands (struct function *fn, gimple *stmt)
gcc_assert (gimple_modified_p (stmt));
operands_scanner (fn, stmt).build_ssa_operands ();
gimple_set_modified (stmt, false);
-  // Inform the active range query an update has happened.
-  get_range_query (fn)->update_stmt (stmt);
  
timevar_pop (TV_TREE_OPS);
  }
diff --git a/gcc/value-query.h b/gcc/value-query.h
index 429446b32eb..0a6f18b03f6 100644
--- a/gcc/value-query.h
+++ b/gcc/value-query.h
@@ -71,9 +71,6 @@ public:
virtual bool range_on_edge (vrange , edge, tree expr);
virtual bool range_of_stmt (vrange , gimple *, tree name = NULL);
  
-  // When the IL in a stmt is changed, call this for better results.
-  virtual void update_stmt (gimple *) { }
-
// Query if there is any relation between SSA1 and SSA2.
relation_kind query_relation (gimple *s, tree ssa1, tree ssa2,
bool get_range 
