Re: [PATCH] x86: Support vector __bf16 type.

2022-08-16 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 16, 2022 at 3:50 PM Kong, Lingling via Gcc-patches
 wrote:
>
> Hi,
>
> This patch adds support for vector init/broadcast/set/extract for the __bf16 type.
> The __bf16 type is a storage type.
>
> OK for master?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle vector
> BFmode.
> (ix86_expand_vector_init_duplicate): Support vector BFmode.
> (ix86_expand_vector_init_one_nonzero): Ditto.
> (ix86_expand_vector_init_one_var): Ditto.
> (ix86_expand_vector_init_concat): Ditto.
> (ix86_expand_vector_init_interleave): Ditto.
> (ix86_expand_vector_init_general): Ditto.
> (ix86_expand_vector_init): Ditto.
> (ix86_expand_vector_set_var): Ditto.
> (ix86_expand_vector_set): Ditto.
> (ix86_expand_vector_extract): Ditto.
> * config/i386/i386.cc (classify_argument): Add BF vector modes.
> (function_arg_64): Ditto.
> (ix86_gimplify_va_arg): Ditto.
> (ix86_get_ssemov): Ditto.
> * config/i386/i386.h (VALID_AVX256_REG_MODE): Add BF vector modes.
> (VALID_AVX512F_REG_MODE): Ditto.
> (host_detect_local_cpu): Ditto.
> (VALID_SSE2_REG_MODE): Ditto.
> * config/i386/i386.md: Add BF vector modes.
> (MODE_SIZE): Ditto.
> (ssemodesuffix): Add bf suffix for BF vector modes.
> (ssevecmode): Ditto.
> * config/i386/sse.md (VMOVE): Adjust for BF vector modes.
> (VI12HFBF_AVX512VL): Ditto.
> (V_256_512): Ditto.
> (VF_AVX512HFBF16): Ditto.
> (VF_AVX512BWHFBF16): Ditto.
> (VIHFBF): Ditto.
> (avx512): Ditto.
> (VIHFBF_256): Ditto.
> (VIHFBF_AVX512BW): Ditto.
> (VI2F_256_512): Ditto.
> (V8_128): Ditto.
> (V16_256): Ditto.
> (V32_512): Ditto.
> (sseinsnmode): Ditto.
> (sseconstm1): Ditto.
> (sseintmodesuffix): New mode_attr.
> (avx512fmaskmode): Ditto.
> (avx512fmaskmodelower): Ditto.
> (ssedoublevecmode): Ditto.
> (ssehalfvecmode): Ditto.
> (ssehalfvecmodelower): Ditto.
> (ssescalarmode): Add vector BFmode mapping.
> (ssescalarmodelower): Ditto.
> (ssexmmmode): Ditto.
> (ternlogsuffix): Ditto.
> (ssescalarsize): Ditto.
> (sseintprefix): Ditto.
> (i128): Ditto.
> (xtg_mode): Ditto.
> (bcstscalarsuff): Ditto.
> (_blendm): New define_insn for BFmode.
> (_store_mask): Ditto.
> (vcond_mask_): Ditto.
> (vec_set_0): New define_insn for BF vector set.
> (V8BFH_128): New mode_iterator for BFmode.
> (avx512fp16_mov): Ditto.
> (vec_set): New define_insn for BF vector set.
> (@vec_extract_hi_): Ditto.
> (@vec_extract_lo_): Ditto.
> (vec_set_hi_): Ditto.
> (vec_set_lo_): Ditto.
> (*vec_extract_0): New define_insn_and_split for BF
> vector extract.
> (*vec_extract): New define_insn.
> (VEC_EXTRACT_MODE): Add BF vector modes.
> (PINSR_MODE): Add V8BF.
> (sse2p4_1): Ditto.
> (pinsr_evex_isa): Ditto.
> (_pinsr): Adjust to support
> insert for V8BFmode.
> (pbroadcast_evex_isa): Add BF vector modes.
> (AVX2_VEC_DUP_MODE): Ditto.
> (VEC_INIT_MODE): Ditto.
> (VEC_INIT_HALF_MODE): Ditto.
> (avx2_pbroadcast): Adjust to support BF vector mode
> broadcast.
> (avx2_pbroadcast_1): Ditto.
> (_vec_dup_1): Ditto.
> (_vec_dup_gpr):
> Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/vect-bfloat16-1.C: New test.
> * gcc.target/i386/vect-bfloat16-1.c: New test.
> * gcc.target/i386/vect-bfloat16-2a.c: New test.
> * gcc.target/i386/vect-bfloat16-2b.c: New test.
> * gcc.target/i386/vect-bfloat16-typecheck_1.c: New test.
> * gcc.target/i386/vect-bfloat16-typecheck_2.c: New test.
> ---
>  gcc/config/i386/i386-expand.cc| 129 +++--
>  gcc/config/i386/i386.cc   |  16 +-
>  gcc/config/i386/i386.h|  12 +-
>  gcc/config/i386/i386.md   |   9 +-
>  gcc/config/i386/sse.md| 211 --
>  .../g++.target/i386/vect-bfloat16-1.C |  13 +
>  .../gcc.target/i386/vect-bfloat16-1.c |  30 ++
>  .../gcc.target/i386/vect-bfloat16-2a.c| 121 
>  .../gcc.target/i386/vect-bfloat16-2b.c|  22 ++
>  .../i386/vect-bfloat16-typecheck_1.c  | 258 ++
>  .../i386/vect-bfloat16-typecheck_2.c  | 248 +
>  11 files changed, 950 insertions(+), 119 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/vect-bfloat16-1.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c
>  

Re: [PATCH V4] rs6000: Optimize cmp on rotated 16bits constant

2022-08-16 Thread Jiufu Guo via Gcc-patches


Gentle ping:
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598769.html

BR,
Jeff(Jiufu)

Jiufu Guo  writes:

> Hi,
>
> When checking eq/ne against a constant that has only 16 significant bits
> under rotation, the check can be optimized to compare the rotated data.
> This avoids the cost of building the constant.
>
> As the example in PR103743:
> For "in == 0x8000000000000000LL", this patch generates:
> rotldi %r3,%r3,16
> cmpldi %cr0,%r3,32768
> instead:
> li %r9,-1
> rldicr %r9,%r9,0,0
> cmpd %cr0,%r3,%r9
>
> This is a new patch to optimize compare (eq/ne) on a rotatable constant.
> It is more straightforward than the previous patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595576.html.
>
> This patch passes bootstrap and regtest on ppc64 and ppc64le.
> Is it ok for trunk?  Thanks for comments!
>
> BR,
> Jeff(Jiufu)
>
>
>   PR target/103743
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000-protos.h (rotate_from_leading_zeros_const):
>   New decl.
>   * config/rs6000/rs6000.cc (rotate_from_leading_zeros_const): New
>   define for checking simply rotated constant.
>   * config/rs6000/rs6000.md (*rotate_on_cmpdi): New
> 	define_insn_and_split to optimize comparing on a rotated constant.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr103743.c: New test.
>   * gcc.target/powerpc/pr103743_1.c: New test.
>
> ---
>  gcc/config/rs6000/rs6000-protos.h |  1 +
>  gcc/config/rs6000/rs6000.cc   | 29 ++
>  gcc/config/rs6000/rs6000.md   | 54 ++-
>  gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++
>  gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++
>  5 files changed, 230 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 3ea01023609..bc7f6ff64ef 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -35,6 +35,7 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
>  extern int vspltis_shifted (rtx);
>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
> +extern int rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT, int);
>  extern int num_insns_constant (rtx, machine_mode);
>  extern int small_data_operand (rtx, machine_mode);
>  extern bool mem_operand_gpr (rtx, machine_mode);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 3ff16b8ae04..9c0996603d0 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14861,6 +14861,35 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>  return reverse_condition (code);
>  }
>  
> +/* Check if C can be rotated from an immediate which contains at least
> +   CLZ leading zeros.
> +
> +   Return the number of bits by which C can be rotated from the immediate.
> +   Return -1 if C cannot be rotated from such an immediate.  */
> +
> +int
> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
> +{
> +  /* case. 0..0xxx: already at least clz zeros.  */
> +  int lz = clz_hwi (c);
> +  if (lz >= clz)
> +return 0;
> +
> +  /* case a. 0..0xxx0..0: at least clz zeros.  */
> +  int tz = ctz_hwi (c);
> +  if (lz + tz >= clz)
> +return tz;
> +
> +  /* xx0..0xx: rotate enough bits firstly, then check case a.  */
> +  const int rot_bits = HOST_BITS_PER_WIDE_INT - clz + 1;
> +  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
> +  tz = ctz_hwi (rc);
> +  if (clz_hwi (rc) + tz >= clz)
> +return tz + rot_bits;
> +
> +  return -1;
> +}
> +
>  /* Generate a compare for CODE.  Return a brand-new rtx that
> represents the result of the compare.  */
>  
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 1367a2cb779..603781b29aa 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7720,6 +7720,59 @@ (define_insn "*movsi_from_df"
>"xscvdpsp %x0,%x1"
>[(set_attr "type" "fp")])
>  
> +
> +(define_code_iterator eqne [eq ne])
> +
> +;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
> +(define_insn_and_split "*rotate_on_cmpdi"
> +  [(set (pc)
> + (if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
> +  (match_operand:DI 2 "const_int_operand" "n"))
> +  (label_ref (match_operand 0 ""))
> +(pc)))]
> +  "TARGET_POWERPC64 && !reload_completed && can_create_pseudo_p ()
> +   && num_insns_constant (operands[2], DImode) > 1
> +   && (rotate_from_leading_zeros_const (~UINTVAL (operands[2]), 49) > 0
> +   || rotate_from_leading_zeros_const (UINTVAL (operands[2]), 48) > 0)"
> +   "#"
> +   "&& 1"
> +  [(pc)]
> +{
> +  rtx cnd = XEXP (SET_SRC (single_set (curr_insn)), 

Re: [PATCH] rs6000: avoid ineffective replacement of splitters

2022-08-16 Thread Jiufu Guo via Gcc-patches
Hi,

"Kewen.Lin"  writes:

> Hi Jeff,
>
> on 2022/8/12 14:39, Jiufu Guo via Gcc-patches wrote:
>> Hi,
>> 
>> As a comment in
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599556.html
>> 
>> Those splitters call rs6000_emit_set_const directly, and the replacements
>> are never used.  Using (pc) would be less misleading.
>
> Since the replacements are never used, IMHO this subject doesn't
> quite meet the change.  How about "fix misleading new patterns
> of splitters"?
Thanks for your helpful suggestion!

BR,
Jeff(Jiufu)
>
>> 
>> This patch passes bootstrap on ppc64 BE and LE.
>> Is this ok for trunk?
>
> This patch is OK w/ or w/o subject tweaked.  Thanks!
>
> BR,
> Kewen
>
>> 
>> BR,
>> Jeff(Jiufu)
>> 
>> gcc/ChangeLog:
>> 
>>  * config/rs6000/rs6000.md: (constant splitters): Use "(pc)" as the
>>  replacements.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.md | 12 +++-
>>  1 file changed, 3 insertions(+), 9 deletions(-)
>> 
>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>> index 1367a2cb779..7fadbeef1aa 100644
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7727,11 +7727,7 @@ (define_split
>>[(set (match_operand:SI 0 "gpc_reg_operand")
>>  (match_operand:SI 1 "const_int_operand"))]
>>"num_insns_constant (operands[1], SImode) > 1"
>> -  [(set (match_dup 0)
>> -(match_dup 2))
>> -   (set (match_dup 0)
>> -(ior:SI (match_dup 0)
>> -(match_dup 3)))]
>> +  [(pc)]
>>  {
>>if (rs6000_emit_set_const (operands[0], operands[1]))
>>  DONE;
>> @@ -9662,8 +9658,7 @@ (define_split
>>[(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
>>  (match_operand:DI 1 "const_int_operand"))]
>>"TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
>> -  [(set (match_dup 0) (match_dup 2))
>> -   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))]
>> +  [(pc)]
>>  {
>>if (rs6000_emit_set_const (operands[0], operands[1]))
>>  DONE;
>> @@ -9675,8 +9670,7 @@ (define_split
>>[(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
>>  (match_operand:DI 1 "const_scalar_int_operand"))]
>>"TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
>> -  [(set (match_dup 0) (match_dup 2))
>> -   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))]
>> +  [(pc)]
>>  {
>>if (rs6000_emit_set_const (operands[0], operands[1]))
>>  DONE;


Re: [RFC]rs6000: split complicated constant to memory

2022-08-16 Thread Jiufu Guo via Gcc-patches


Hi,

Segher Boessenkool  writes:

> Hi!
>
> On Mon, Aug 15, 2022 at 01:25:19PM +0800, Jiufu Guo wrote:
>> This patch tries to put the constant into constant pool if building the
>> constant requires 3 or more instructions.
>> 
>> But there is a concern: I'm wondering if this patch is really profitable.
>> 
>> Because, as I tested: 1. for a simple case, if the instructions cannot
>> run in parallel, loading the constant from memory may be faster; but
>> 2. if some instructions can run in parallel, loading the constant from
>> memory is not a win compared with building the constant.  See the
>> examples below.
>> 
>> For f1.c and f3.c, 'loading' the constant would be acceptable in terms
>> of runtime; for f2.c and f4.c, 'loading' the constant is visibly slower.
>> 
>> For real-world cases, both kinds of code sequences exist.
>> 
>> So, I'm not sure if we need to push this patch.
>> 
>> Each function below was run a lot of times (10) to check the runtime.
>> f1.c:
>> long foo (long *arg, long*, long *)
>> {
>>   *arg = 0x1234567800000000;
>> }
>> asm building constant:
>>  lis 10,0x1234
>>  ori 10,10,0x5678
>>  sldi 10,10,32
>> vs.  asm loading
>>  addis 10,2,.LC0@toc@ha
>>  ld 10,.LC0@toc@l(10)
>
> This is just a load insn, unless this is the only thing needing the TOC.
> You can use crtl->uses_const_pool as an approximation here, to figure
> out if we have that case?

Thanks for pointing this out!
crtl->uses_const_pool is set to 1 in force_const_mem.
create_TOC_reference would be called after force_const_mem.
One concern: there may be cases where crtl->uses_const_pool is not
cleared to zero after the related symbols are optimized out.

>
>> The runtimes of 'building' and 'loading' are similar: sometimes
>> 'building' is faster, sometimes 'loading' is faster, and the difference
>> is slight.
>
> When there is only one constant, sure.  But that isn't the expensive
> case we need to avoid :-)
Yes.  If there are other instructions around, the scheduler could
arrange the 'building' instructions to run in parallel with other
instructions.  If we emit the 'building' instructions in the split1 pass
(before sched1), these 'building constant' instructions are more likely
to be scheduled well.  Then the 'building' form may not be bad.

>
>>  addis 9,2,.LC2@toc@ha
>>  ld 7,.LC0@toc@l(7)
>>  ld 10,.LC1@toc@l(10)
>>  ld 9,.LC2@toc@l(9)
>> For this case, 'loading' is always slower than 'building' (>15%).
>
> Only if there is nothing else to do, and only in cases where code size
> does not matter (i.e. microbenchmarks).
Yes, 'loading' may save code size slightly.
>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
>> @@ -0,0 +1,11 @@
>> +/* PR target/63281 */
>> +/* { dg-do compile { target lp64 } } */
>> +/* { dg-options "-O2 -std=c99" } */
>
> Why std=c99 btw?  The default is c17.  Is there something we need to
> disable here?
Oh, this option is not required.  Thanks!

BR,
Jeff(Jiufu)

>
>
> Segher


Re: [COMMITTED] Abstract interesting ssa-names from GORI.

2022-08-16 Thread Andrew MacLeod via Gcc-patches


On 8/16/22 21:16, Andrew MacLeod wrote:


On 8/16/22 04:21, Aldy Hernandez wrote:
On Thu, Aug 11, 2022 at 1:42 PM Richard Biener  
wrote:


@@ -599,6 +592,30 @@ path_range_query::compute_imports (bitmap imports, const vec )

 worklist.safe_push (arg);
 }
 }
+  else if (gassign *ass = dyn_cast <gassign *> (def_stmt))
+   {
+ tree ssa[3];
+ if (range_op_handler (ass))
+   {
+ ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
+ ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
+ ssa[2] = NULL_TREE;
+   }
+ else if (gimple_assign_rhs_code (ass) == COND_EXPR)
+   {
+ ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
+ ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
+ ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
+   }
+ else
+   continue;
+ for (unsigned j = 0; j < 3; ++j)
+   {
+ tree rhs = ssa[j];
+ if (rhs && add_to_imports (rhs, imports))
+   worklist.safe_push (rhs);
+   }
+   }

We seem to have 3 copies of this code now: this one, the
threadbackward one, and the original one.

Could we abstract this somehow?

Aldy



This particular code sequence processing range-ops and COND_EXPR is 
becoming more common, so I've abstracted it into a routine.


Basically, pass it a vector and the stmt, and it will fill the first X
elements with ssa-names from the stmt.  It only deals with range-ops
and COND_EXPR for now, and it requires that you pass it enough elements
(3) so that it doesn't have to check whether it's overflowing the
bounds.  It returns the number of names it put in the vector.


This patch changes GORI to use the new routine.  Bootstrapped on 
x86_64-pc-linux-gnu with no regressions.  Pushed.



Andrew



This patch converts the code sequence you complained about to use the
new routine.  I checked to make sure it doesn't affect the number of
threads.  It should bootstrap and pass regression tests, but I have run
out of time today.  Give it a go, and if there was another place you
saw this, change it there too.  I didn't find another place.


Andrew
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index c99d77dd340..4e0d6bdcf3f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -558,25 +558,11 @@ path_range_query::compute_exit_dependencies (bitmap dependencies,
   else if (gassign *ass = dyn_cast <gassign *> (def_stmt))
 	{
 	  tree ssa[3];
-	  if (range_op_handler (ass))
+	  unsigned count = gimple_range_ssa_names (ssa, 3, ass);
+	  for (unsigned j = 0; j < count; ++j)
 	{
-	  ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
-	  ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
-	  ssa[2] = NULL_TREE;
-	}
-	  else if (gimple_assign_rhs_code (ass) == COND_EXPR)
-	{
-	  ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
-	  ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
-	  ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
-	}
-	  else
-	continue;
-	  for (unsigned j = 0; j < 3; ++j)
-	{
-	  tree rhs = ssa[j];
-	  if (rhs && add_to_exit_dependencies (rhs, dependencies))
-		worklist.safe_push (rhs);
+	  if (add_to_exit_dependencies (ssa[j], dependencies))
+		worklist.safe_push (ssa[j]);
 	}
 	}
 }


[COMMITTED] Abstract interesting ssa-names from GORI.

2022-08-16 Thread Andrew MacLeod via Gcc-patches


On 8/16/22 04:21, Aldy Hernandez wrote:

On Thu, Aug 11, 2022 at 1:42 PM Richard Biener  wrote:


@@ -599,6 +592,30 @@ path_range_query::compute_imports (bitmap imports, const vec )
 worklist.safe_push (arg);
 }
 }
+  else if (gassign *ass = dyn_cast <gassign *> (def_stmt))
+   {
+ tree ssa[3];
+ if (range_op_handler (ass))
+   {
+ ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
+ ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
+ ssa[2] = NULL_TREE;
+   }
+ else if (gimple_assign_rhs_code (ass) == COND_EXPR)
+   {
+ ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
+ ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
+ ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
+   }
+ else
+   continue;
+ for (unsigned j = 0; j < 3; ++j)
+   {
+ tree rhs = ssa[j];
+ if (rhs && add_to_imports (rhs, imports))
+   worklist.safe_push (rhs);
+   }
+   }

We seem to have 3 copies of this code now: this one, the
threadbackward one, and the original one.

Could we abstract this somehow?

Aldy



This particular code sequence processing range-ops and COND_EXPR is 
becoming more common, so I've abstracted it into a routine.


Basically, pass it a vector and the stmt, and it will fill the first X
elements with ssa-names from the stmt.  It only deals with range-ops and
COND_EXPR for now, and it requires that you pass it enough elements (3)
so that it doesn't have to check whether it's overflowing the bounds.
It returns the number of names it put in the vector.


This patch changes GORI to use the new routine.  Bootstrapped on 
x86_64-pc-linux-gnu with no regressions.  Pushed.



Andrew
commit 80f78716c2c7ce1b7f96077c35c1dd474a2086a2
Author: Andrew MacLeod 
Date:   Tue Aug 16 13:18:37 2022 -0400

Abstract interesting ssa-names from GORI.

Provide a routine to pick out the ssa-names from interesting statements.

* gimple-range-fold.cc (gimple_range_ssa_names): New.
* gimple-range-fold.h (gimple_range_ssa_names): New prototype.
* gimple-range-gori.cc (range_def_chain::get_def_chain): Move
  code to new routine.

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 689d8279627..b0b22106320 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -1580,3 +1580,36 @@ fur_source::register_outgoing_edges (gcond *s, irange _range, edge e0, edge
 	}
 }
 }
+
+// Given stmt S, fill VEC, up to VEC_SIZE elements, with relevant ssa-names
+// on the statement.  For efficiency, it is an error to not pass in enough
+// elements for the vector.  Return the number of ssa-names.
+
+unsigned
+gimple_range_ssa_names (tree *vec, unsigned vec_size, gimple *stmt)
+{
+  tree ssa;
+  int count = 0;
+
+  if (range_op_handler (stmt))
+    {
+      gcc_checking_assert (vec_size >= 2);
+      if ((ssa = gimple_range_ssa_p (gimple_range_operand1 (stmt))))
+	vec[count++] = ssa;
+      if ((ssa = gimple_range_ssa_p (gimple_range_operand2 (stmt))))
+	vec[count++] = ssa;
+    }
+  else if (is_a <gassign *> (stmt)
+	   && gimple_assign_rhs_code (stmt) == COND_EXPR)
+    {
+      gcc_checking_assert (vec_size >= 3);
+      gassign *st = as_a <gassign *> (stmt);
+      if ((ssa = gimple_range_ssa_p (gimple_assign_rhs1 (st))))
+	vec[count++] = ssa;
+      if ((ssa = gimple_range_ssa_p (gimple_assign_rhs2 (st))))
+	vec[count++] = ssa;
+      if ((ssa = gimple_range_ssa_p (gimple_assign_rhs3 (st))))
+	vec[count++] = ssa;
+    }
+  return count;
+}
diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index c2f381dffec..f2eab720213 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -96,6 +96,14 @@ range_compatible_p (tree type1, tree type2)
 	  && TYPE_SIGN (type1) == TYPE_SIGN (type2));
 }
 
+extern tree gimple_range_operand1 (const gimple *s);
+extern tree gimple_range_operand2 (const gimple *s);
+
+// Given stmt S, fill VEC, up to VEC_SIZE elements, with relevant ssa-names
+// on the statement.  For efficiency, it is an error to not pass in enough
+// elements for the vector.  Return the number of ssa-names.
+
+unsigned gimple_range_ssa_names (tree *vec, unsigned vec_size, gimple *stmt);
 
 // Source of all operands for fold_using_range and gori_compute.
 // It abstracts out the source of an operand so it can come from a stmt or
@@ -150,9 +158,6 @@ protected:
   relation_oracle *m_oracle;
 };
 
-extern tree gimple_range_operand1 (const gimple *s);
-extern tree gimple_range_operand2 (const gimple *s);
-
 // This class uses ranges to fold a gimple statement producinf a range for
 // the LHS.  The source of all operands is supplied via the fur_source class
 // which provides a range_query as well as a source location and any other
diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc

[PATCH] d: Fix #error You must define PREFERRED_DEBUGGING_TYPE if DWARF is not supported (PR105659)

2022-08-16 Thread Iain Buclaw via Gcc-patches
Hi,

Because targetdm contains hooks pertaining to both the target platform
and cpu, it tries to pull in both platform and cpu headers via tm_d.h in
the source file where TARGETDM_INITIALIZER is used.

Since 12.0, this has caused problems when there is no platform (*-elf),
resulting in default-d.cc failing to build due to triggering a
PREFERRED_DEBUGGING_TYPE #error.

This patch removes the CPU-specific hooks from targetdm, documenting
them instead as target macros.  Also removing the generation of tm_d.h
as its role is redundant.

I also notice that Rust maintainers initially copied what I did in
devel/rust/master, but ended up reverting back to using macros to get at
target OS and CPU information as well, possibly because they ran into
the same problems as reported in PR105659.

I'm not sure whether calling these hooks via function-like macros is
really desirable, I do recall early on during the review process of the
D front-end that putting target-specific language features behind a
targetdm hook was the preferred/encouraged way to expose these things.

One alternative perhaps would be to break out CPU-specific hooks in
targetdm into a separate targetdm_cpu hook vector.  This would mean
there would be no need to include tm_p.h anywhere in D-specific target
sources (only tm.h where needed), and all D-specific prototypes in
$cpu_type-protos.h can be removed.  Though tm_d.h would still be
redundant, so either way it gets the chop.

OK? Thoughts?  I don't expect this to go in for 12.2, but backporting
some time before 12.3 would be nice.

Bootstrapped and regression tested on x86_64-linux-gnu, and checked that
it indeed fixes the referenced PR by building an aarch64-rtems cross.

Regards,
Iain.

---
PR d/105659

gcc/ChangeLog:

* Makefile.in (tm_d_file_list): Remove.
(tm_d_include_list): Remove.
(TM_D_H): Remove.
(tm_d.h): Remove.
(cs-tm_d.h): Remove.
(generated_files): Remove TM_D_H.
* config.gcc (tm_d_file): Remove.
* config/darwin-d.cc: Include memmodel.h and tm_p.h instead of tm_d.h.
* config/default-d.cc: Remove includes of memmodel.h and tm_d.h.
* config/dragonfly-d.cc: Include tm_p.h instead of tm_d.h.
* configure: Regenerate.
* configure.ac (tm_d_file): Remove.
(tm_d_file_list): Remove substitution.
(tm_d_include_list): Remove substitution.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_D_CPU_VERSIONS): Document hook as being a
function-like macro.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.

gcc/d/ChangeLog:

* d-builtins.cc: Include memmodel.h and tm_p.h.
(d_init_versions): Call TARGET_D_CPU_VERSIONS via macro.
* d-target.cc (Target::_init): Call TARGET_D_REGISTER_CPU_TARGET_INFO
via macro.
* d-target.def (d_cpu_versions): Remove hook.
(d_register_cpu_target_info): Remove hook.
---
 gcc/Makefile.in   | 11 +--
 gcc/config.gcc|  7 ---
 gcc/config/darwin-d.cc|  3 ++-
 gcc/config/default-d.cc   |  9 +++--
 gcc/config/dragonfly-d.cc |  2 +-
 gcc/configure | 32 
 gcc/configure.ac  | 18 --
 gcc/d/d-builtins.cc   |  6 +-
 gcc/d/d-target.cc |  4 +++-
 gcc/d/d-target.def| 22 --
 gcc/doc/tm.texi   | 22 --
 gcc/doc/tm.texi.in| 18 --
 12 files changed, 55 insertions(+), 99 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 203f0a15187..12d9b5a3be4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -571,8 +571,6 @@ tm_include_list=@tm_include_list@
 tm_defines=@tm_defines@
 tm_p_file_list=@tm_p_file_list@
 tm_p_include_list=@tm_p_include_list@
-tm_d_file_list=@tm_d_file_list@
-tm_d_include_list=@tm_d_include_list@
 build_xm_file_list=@build_xm_file_list@
 build_xm_include_list=@build_xm_include_list@
 build_xm_defines=@build_xm_defines@
@@ -865,7 +863,6 @@ BCONFIG_H = bconfig.h $(build_xm_file_list)
 CONFIG_H  = config.h  $(host_xm_file_list)
 TCONFIG_H = tconfig.h $(xm_file_list)
 TM_P_H= tm_p.h$(tm_p_file_list)
-TM_D_H= tm_d.h$(tm_d_file_list)
 GTM_H = tm.h  $(tm_file_list) insn-constants.h
 TM_H  = $(GTM_H) insn-flags.h $(OPTIONS_H)
 
@@ -1937,7 +1934,6 @@ bconfig.h: cs-bconfig.h ; @true
 tconfig.h: cs-tconfig.h ; @true
 tm.h: cs-tm.h ; @true
 tm_p.h: cs-tm_p.h ; @true
-tm_d.h: cs-tm_d.h ; @true
 
 cs-config.h: Makefile
TARGET_CPU_DEFAULT="" \
@@ -1964,11 +1960,6 @@ cs-tm_p.h: Makefile
HEADERS="$(tm_p_include_list)" DEFINES="" \
$(SHELL) $(srcdir)/mkconfig.sh tm_p.h
 
-cs-tm_d.h: Makefile
-   TARGET_CPU_DEFAULT="" \
-   HEADERS="$(tm_d_include_list)" DEFINES="" \
-   $(SHELL) $(srcdir)/mkconfig.sh tm_d.h
-
 # Don't automatically run autoconf, since configure.ac might be accidentally
 # newer than configure.  Also, this 

Re: [PATCH] c++: Extend -Wpessimizing-move to other contexts

2022-08-16 Thread Marek Polacek via Gcc-patches
On Tue, Aug 16, 2022 at 03:23:18PM -0400, Jason Merrill wrote:
> On 8/2/22 16:04, Marek Polacek wrote:
> > In my recent patch which enhanced -Wpessimizing-move so that it warns
> > about class prvalues too I said that I'd like to extend it so that it
> > warns in more contexts where a std::move can prevent copy elision, such
> > as:
> > 
> >T t = std::move(T());
> >T t(std::move(T()));
> >T t{std::move(T())};
> >T t = {std::move(T())};
> >void foo (T);
> >foo (std::move(T()));
> > 
> > This patch does that by adding two maybe_warn_pessimizing_move calls.
> > These must happen before we've converted the initializers otherwise the
> > std::move will be buried in a TARGET_EXPR.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > PR c++/106276
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (build_over_call): Call maybe_warn_pessimizing_move.
> > * cp-tree.h (maybe_warn_pessimizing_move): Declare.
> > * decl.cc (build_aggr_init_full_exprs): Call
> > maybe_warn_pessimizing_move.
> > * typeck.cc (maybe_warn_pessimizing_move): Handle TREE_LIST and
> > CONSTRUCTOR.  Add a bool parameter and use it.  Adjust a diagnostic
> > message.
> > (check_return_expr): Adjust the call to maybe_warn_pessimizing_move.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp0x/Wpessimizing-move7.C: Add dg-warning.
> > * g++.dg/cpp0x/Wpessimizing-move8.C: New test.
> > ---
> >   gcc/cp/call.cc|  5 +-
> >   gcc/cp/cp-tree.h  |  1 +
> >   gcc/cp/decl.cc|  3 +-
> >   gcc/cp/typeck.cc  | 58 -
> >   .../g++.dg/cpp0x/Wpessimizing-move7.C | 16 ++---
> >   .../g++.dg/cpp0x/Wpessimizing-move8.C | 65 +++
> >   6 files changed, 120 insertions(+), 28 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wpessimizing-move8.C
> > 
> > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > index 01a7be10077..370137ebd6d 100644
> > --- a/gcc/cp/call.cc
> > +++ b/gcc/cp/call.cc
> > @@ -9627,10 +9627,13 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
> > if (!conversion_warning)
> > arg_complain &= ~tf_warning;
> > +  if (arg_complain & tf_warning)
> > +   maybe_warn_pessimizing_move (arg, type, /*return_p*/false);
> > +
> > val = convert_like_with_context (conv, arg, fn, i - is_method,
> >arg_complain);
> > val = convert_for_arg_passing (type, val, arg_complain);
> > -   
> > +
> > if (val == error_mark_node)
> >   return error_mark_node;
> > else
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index 3278b4114bd..5a8af22b509 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -8101,6 +8101,7 @@ extern tree finish_right_unary_fold_expr (tree, 
> > int);
> >   extern tree finish_binary_fold_expr  (tree, tree, int);
> >   extern tree treat_lvalue_as_rvalue_p   (tree, bool);
> >   extern bool decl_in_std_namespace_p(tree);
> > +extern void maybe_warn_pessimizing_move (tree, tree, bool);
> >   /* in typeck2.cc */
> >   extern void require_complete_eh_spec_types(tree, tree);
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index 70ad681467e..dc6853a7de1 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -7220,9 +7220,10 @@ check_array_initializer (tree decl, tree type, tree init)
> >   static tree
> >   build_aggr_init_full_exprs (tree decl, tree init, int flags)
> > -
> >   {
> > gcc_assert (stmts_are_full_exprs_p ());
> > +  if (init)
> > +maybe_warn_pessimizing_move (init, TREE_TYPE (decl), 
> > /*return_p*/false);
> 
> This is a surprising place to add this.  Why here rather than in
> build_aggr_init or check_initializer?

IIRC it just felt appropriate since we only want to invoke maybe_warn_ on the
full expression, not any subexpressions -- we're looking to see if the
outermost expr is a std::move.  Also, we want to warn for all types, not just
classes.

But I can move the call into some place in check_initializer if you prefer.

Marek



Re: [PATCH] xtensa: Prevent emitting integer additions of constant zero

2022-08-16 Thread Max Filippov via Gcc-patches
Hi Suwa-san,

On Tue, Aug 16, 2022 at 5:42 AM Takayuki 'January June' Suwa
 wrote:
>
> In a few cases, obviously omittable add instructions can be emitted via
> invoking gen_addsi3.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md (addsi3_internal): Rename from "addsi3".
> (addsi3): New define_expand in order to reject integer additions of
> constant zero.
> ---
>  gcc/config/xtensa/xtensa.md | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)

with this change a bunch of tests fail to build with the following error:
  undefined reference to `__addsi3'

E.g. gcc.c-torture/execute/2519-1.c
or gcc.c-torture/execute/20070919-1.c

-- 
Thanks.
-- Max


Re: [PATCH] c++: Extend -Wpessimizing-move to other contexts

2022-08-16 Thread Jason Merrill via Gcc-patches

On 8/2/22 16:04, Marek Polacek wrote:

In my recent patch, which enhanced -Wpessimizing-move so that it warns
about class prvalues too, I said that I'd like to extend it so that it
warns in more contexts where a std::move can prevent copy elision, such
as:

   T t = std::move(T());
   T t(std::move(T()));
   T t{std::move(T())};
   T t = {std::move(T())};
   void foo (T);
   foo (std::move(T()));

This patch does that by adding two maybe_warn_pessimizing_move calls.
These must happen before we've converted the initializers otherwise the
std::move will be buried in a TARGET_EXPR.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/106276

gcc/cp/ChangeLog:

* call.cc (build_over_call): Call maybe_warn_pessimizing_move.
* cp-tree.h (maybe_warn_pessimizing_move): Declare.
* decl.cc (build_aggr_init_full_exprs): Call
maybe_warn_pessimizing_move.
* typeck.cc (maybe_warn_pessimizing_move): Handle TREE_LIST and
CONSTRUCTOR.  Add a bool parameter and use it.  Adjust a diagnostic
message.
(check_return_expr): Adjust the call to maybe_warn_pessimizing_move.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/Wpessimizing-move7.C: Add dg-warning.
* g++.dg/cpp0x/Wpessimizing-move8.C: New test.
---
  gcc/cp/call.cc|  5 +-
  gcc/cp/cp-tree.h  |  1 +
  gcc/cp/decl.cc|  3 +-
  gcc/cp/typeck.cc  | 58 -
  .../g++.dg/cpp0x/Wpessimizing-move7.C | 16 ++---
  .../g++.dg/cpp0x/Wpessimizing-move8.C | 65 +++
  6 files changed, 120 insertions(+), 28 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wpessimizing-move8.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 01a7be10077..370137ebd6d 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -9627,10 +9627,13 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
if (!conversion_warning)
arg_complain &= ~tf_warning;
  
+  if (arg_complain & tf_warning)

+   maybe_warn_pessimizing_move (arg, type, /*return_p*/false);
+
val = convert_like_with_context (conv, arg, fn, i - is_method,
   arg_complain);
val = convert_for_arg_passing (type, val, arg_complain);
-   
+
if (val == error_mark_node)
  return error_mark_node;
else
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3278b4114bd..5a8af22b509 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8101,6 +8101,7 @@ extern tree finish_right_unary_fold_expr (tree, int);
  extern tree finish_binary_fold_expr  (tree, tree, int);
  extern tree treat_lvalue_as_rvalue_p   (tree, bool);
  extern bool decl_in_std_namespace_p(tree);
+extern void maybe_warn_pessimizing_move (tree, tree, bool);
  
  /* in typeck2.cc */

  extern void require_complete_eh_spec_types(tree, tree);
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 70ad681467e..dc6853a7de1 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -7220,9 +7220,10 @@ check_array_initializer (tree decl, tree type, tree init)
  
  static tree

  build_aggr_init_full_exprs (tree decl, tree init, int flags)
-
  {
gcc_assert (stmts_are_full_exprs_p ());
+  if (init)
+maybe_warn_pessimizing_move (init, TREE_TYPE (decl), /*return_p*/false);


This is a surprising place to add this.  Why here rather than in 
build_aggr_init or check_initializer?



return build_aggr_init (decl, init, flags, tf_warning_or_error);
  }
  
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc

index 9500c4e2fe8..2650beb780e 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -10368,17 +10368,17 @@ treat_lvalue_as_rvalue_p (tree expr, bool return_p)
  }
  }
  
-/* Warn about wrong usage of std::move in a return statement.  RETVAL

-   is the expression we are returning; FUNCTYPE is the type the function
-   is declared to return.  */
+/* Warn about dubious usage of std::move (in a return statement, if RETURN_P
+   is true).  EXPR is the std::move expression; TYPE is the type of the object
+   being initialized.  */
  
-static void

-maybe_warn_pessimizing_move (tree retval, tree functype)
+void
+maybe_warn_pessimizing_move (tree expr, tree type, bool return_p)
  {
if (!(warn_pessimizing_move || warn_redundant_move))
  return;
  
-  location_t loc = cp_expr_loc_or_input_loc (retval);

+  const location_t loc = cp_expr_loc_or_input_loc (expr);
  
/* C++98 doesn't know move.  */

if (cxx_dialect < cxx11)
@@ -10390,14 +10390,32 @@ maybe_warn_pessimizing_move (tree retval, tree functype)
  return;
  
/* This is only interesting for class types.  */

-  if (!CLASS_TYPE_P (functype))
+  if (!CLASS_TYPE_P (type))
  return;
  
+  /* A a = std::move (A());  */

+  if (TREE_CODE (expr) == TREE_LIST)
+{
+  if (list_length (expr) == 

[PATCH] bug in emergency cxa pool free()

2022-08-16 Thread Keef Aragon
This probably has never actually affected anyone in practice. The normal
ABI implementation just uses malloc and only falls back to the pool on
malloc failure. But if that happens repeatedly, the freelist gets out of
address order, which violates some of the invariants of the freelist (as
well as the comments that follow the bug). The bug is just a reversed
comparison when traversing the freelist in the case where the pointer
being returned to the pool lies after the existing freelist.
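The invariant at stake can be sketched in isolation (the names here are illustrative, not the libstdc++ ones): the pool keeps free blocks linked in address order so adjacent blocks can later be merged, and a reversed comparison in the insertion walk links a block that lies past the end of the list out of order:

```cpp
#include <cassert>

// A free-list entry; the real pool also tracks a size for merging.
struct entry { entry *next; };

// Insert E keeping the list sorted by ascending address.  Flipping the
// '<' below to '>' -- the analogue of the reversed comparison in
// pool::free -- breaks the ordering for blocks past the list tail.
void free_insert(entry **head, entry *e) {
  entry **fe = head;
  while (*fe && reinterpret_cast<char *>(*fe) < reinterpret_cast<char *>(e))
    fe = &(*fe)->next;
  e->next = *fe;
  *fe = e;
}

// Check the ascending-address invariant the merging logic relies on.
bool list_sorted(entry *head) {
  for (entry *p = head; p && p->next; p = p->next)
    if (reinterpret_cast<char *>(p) >= reinterpret_cast<char *>(p->next))
      return false;
  return true;
}
```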

I'm not sure what to do as far as the test suite is concerned. It's a
private part of the implementation of the exception handling ABI and it can
only ever be triggered if malloc fails (repeatedly). So it seems like
reproducing it from the external interface will require hooking malloc to
forcibly return NULL.

But I'm a newb on these lists, so will obediently do as instructed.
diff --git a/libstdc++-v3/ChangeLog-2022 b/libstdc++-v3/ChangeLog-2022
new file mode 100644
index 000..8057de58539
--- /dev/null
+++ b/libstdc++-v3/ChangeLog-2022
@@ -0,0 +1,4 @@
+2022-08-16  Keef Aragon  
+
+* libsupc++/eh_alloc.cc (pool::free): Fix reversed comparison when
+traversing the free list.
+
diff --git a/libstdc++-v3/libsupc++/eh_alloc.cc b/libstdc++-v3/libsupc++/eh_alloc.cc
index c85b9aed40b..cad2750e3b9 100644
--- a/libstdc++-v3/libsupc++/eh_alloc.cc
+++ b/libstdc++-v3/libsupc++/eh_alloc.cc
@@ -225,7 +225,7 @@ namespace
 	  for (fe = &first_free_entry;
 	   (*fe)->next
 	   && (reinterpret_cast <char *> ((*fe)->next)
-		   > reinterpret_cast <char *> (e) + sz);
+		   < reinterpret_cast <char *> (e) + sz);
 	   fe = &(*fe)->next)
 	;
 	  // If we can merge the next block into us do so and continue


Re: [PATCH 0/5] IEEE 128-bit built-in overload support.

2022-08-16 Thread Segher Boessenkool
Hi!

On Tue, Aug 16, 2022 at 08:07:48PM +0200, Jakub Jelinek wrote:
> On Thu, Aug 11, 2022 at 08:44:17PM +, Joseph Myers wrote:
> > On Thu, 11 Aug 2022, Michael Meissner via Gcc-patches wrote:
> > > In looking at it, I now believe that the type for _Float128 and __float128
> > > should always be the same within the compiler.  Whether we would continue 
> > > to
> > > use the same type for long double and _Float128/__float128 remains to be 
> > > seen.
> > 
> > long double and _Float128 must always be different types; that's how it's 
> > defined in C23.
> 
> And when we implement C++23 P1467R9, if std::float128_t will be
> _Float128 under the hood, then long double and _Float128 have to remain
> distinct types and mangle differently, long double (and __float128 if
> long double is IEEE quad and __float128 exists?) need to mangle the way
> they currently do and _Float128 should mangle as DF128_.
> <builtin-type> ::= DF <number> _ # ISO/IEC TS 18661 binary floating point type 
> _FloatN (N bits)

So should we make std::floatNN_t be the same as _FloatNN, and mangled
as DF<N>_ ?  And __ieee128 (and long double implemented as that) the
same as we already have.

> Wonder how shall we mangle the underlying type of std::bfloat16_t though.

That should get some cross-platform mangling?  Power shouldn't go its
own way here :-)

> I assume e.g. for libstdc++ implementation purposes we need to have
> __ibm128 and __float128 types mangling as long double mangles when the
> -mabi={ibm,ieee}longdouble option is used, because otherwise it would be
> really hard to implement it.

If at all possible it should be the same as we have already: otherwise
it will be at least five years before anything works again (for users).

This agrees with what you propose afaics, but let's make this explicit?
It helps us sleep at night :-)


Segher


[PATCH][_GLIBCXX_ASSERTIONS] Activate __glibcxx_requires_string/__glibcxx_requires_string_len

2022-08-16 Thread François Dumont via Gcc-patches
Following my remark about tests XFAIL-ing when running 'make check', I'd 
like to propose this patch to avoid that situation.


    libstdc++: [_GLIBCXX_ASSERTIONS] Activate basic _GLIBCXX_DEBUG_PEDANTIC checks


    Activate __glibcxx_requires_string/__glibcxx_requires_string_len in basic _GLIBCXX_ASSERTIONS
    mode, which then honors _GLIBCXX_DEBUG_PEDANTIC for this purpose.


    Thanks to this change, add _GLIBCXX_ASSERTIONS to some tests that otherwise XFAIL on a
    segmentation fault rather than on a proper __glibcxx_assert call.

    libstdc++-v3/ChangeLog:

    * include/debug/debug.h (__glibcxx_requires_string, 
__glibcxx_requires_string_len): Move

    definitions...
    * include/debug/assertions.h: ... here. Definition now 
depends on _GLIBCXX_ASSERTIONS rather

    than _GLIBCXX_DEBUG.
    * 
testsuite/21_strings/basic_string/operations/ends_with/char_neg.cc: Add 
_GLIBCXX_ASSERTIONS

    define.
    * 
testsuite/21_strings/basic_string/operations/ends_with/wchar_t_neg.cc: 
Likewise.
    * 
testsuite/21_strings/basic_string/operations/starts_with/char_neg.cc: 
Likewise.
    * 
testsuite/21_strings/basic_string/operations/starts_with/wchar_t_neg.cc: 
Likewise.


Tested under Linux x86_64.

Ok to commit ?

François
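A hedged sketch of what the activated checks buy (this assumes the patch below is applied; on an unpatched libstdc++ the defines are simply harmless): with _GLIBCXX_ASSERTIONS plus _GLIBCXX_DEBUG_PEDANTIC defined before any standard header, a null C-string argument trips __glibcxx_requires_string with a diagnostic abort instead of a raw segmentation fault:

```cpp
// Both macros must precede the first standard-library include.
#define _GLIBCXX_DEBUG_PEDANTIC
#define _GLIBCXX_ASSERTIONS
#include <string>

bool demo() {
  std::string s("hello");
  // Valid arguments pass through the checks unchanged...
  bool ok = s.compare("hello") == 0;
  // ...while something like
  //   s.compare(static_cast<const char *>(nullptr));
  // would now fail a __glibcxx_assert instead of dereferencing null.
  return ok;
}
```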
diff --git a/libstdc++-v3/include/debug/assertions.h b/libstdc++-v3/include/debug/assertions.h
index 57c0ab2c3cf..fcc910c7396 100644
--- a/libstdc++-v3/include/debug/assertions.h
+++ b/libstdc++-v3/include/debug/assertions.h
@@ -43,6 +43,8 @@
 # define __glibcxx_requires_non_empty_range(_First,_Last)
 # define __glibcxx_requires_nonempty()
 # define __glibcxx_requires_subscript(_N)
+# define __glibcxx_requires_string(_String)
+# define __glibcxx_requires_string_len(_String,_Len)
 #else
 
 // Verify that [_First, _Last) forms a non-empty iterator range.
@@ -53,6 +55,22 @@
 // Verify that the container is nonempty
 # define __glibcxx_requires_nonempty()		\
   __glibcxx_assert(!this->empty())
+# ifdef _GLIBCXX_DEBUG_PEDANTIC
+#  if __cplusplus < 201103L
+#   define __glibcxx_requires_string(_String)	\
+  __glibcxx_assert(_String != 0)
+#   define __glibcxx_requires_string_len(_String,_Len)	\
+  __glibcxx_assert(_String != 0 || _Len == 0)
+#  else
+#   define __glibcxx_requires_string(_String)	\
+  __glibcxx_assert(_String != nullptr)
+#   define __glibcxx_requires_string_len(_String,_Len)	\
+  __glibcxx_assert(_String != nullptr || _Len == 0)
+#  endif // C++11
+# else
+#  define __glibcxx_requires_string(_String)
+#  define __glibcxx_requires_string_len(_String,_Len)
+# endif // _GLIBCXX_DEBUG_PEDANTIC
 #endif
 
 #ifdef _GLIBCXX_DEBUG
diff --git a/libstdc++-v3/include/debug/debug.h b/libstdc++-v3/include/debug/debug.h
index f4233760426..5593b4fe92c 100644
--- a/libstdc++-v3/include/debug/debug.h
+++ b/libstdc++-v3/include/debug/debug.h
@@ -78,8 +78,6 @@ namespace __gnu_debug
 # define __glibcxx_requires_partitioned_upper_pred(_First,_Last,_Value,_Pred)
 # define __glibcxx_requires_heap(_First,_Last)
 # define __glibcxx_requires_heap_pred(_First,_Last,_Pred)
-# define __glibcxx_requires_string(_String)
-# define __glibcxx_requires_string_len(_String,_Len)
 # define __glibcxx_requires_irreflexive(_First,_Last)
 # define __glibcxx_requires_irreflexive2(_First,_Last)
 # define __glibcxx_requires_irreflexive_pred(_First,_Last,_Pred)
@@ -118,17 +116,6 @@ namespace __gnu_debug
   __glibcxx_check_heap(_First,_Last)
 # define __glibcxx_requires_heap_pred(_First,_Last,_Pred)	\
   __glibcxx_check_heap_pred(_First,_Last,_Pred)
-# if __cplusplus < 201103L
-#  define __glibcxx_requires_string(_String)	\
-  _GLIBCXX_DEBUG_PEDASSERT(_String != 0)
-#  define __glibcxx_requires_string_len(_String,_Len)	\
-  _GLIBCXX_DEBUG_PEDASSERT(_String != 0 || _Len == 0)
-# else
-#  define __glibcxx_requires_string(_String)	\
-  _GLIBCXX_DEBUG_PEDASSERT(_String != nullptr)
-#  define __glibcxx_requires_string_len(_String,_Len)	\
-  _GLIBCXX_DEBUG_PEDASSERT(_String != nullptr || _Len == 0)
-# endif
 # define __glibcxx_requires_irreflexive(_First,_Last)	\
   __glibcxx_check_irreflexive(_First,_Last)
 # define __glibcxx_requires_irreflexive2(_First,_Last)	\
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/operations/ends_with/char_neg.cc b/libstdc++-v3/testsuite/21_strings/basic_string/operations/ends_with/char_neg.cc
index 7a7b8dd077d..6080ddc0555 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/operations/ends_with/char_neg.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/operations/ends_with/char_neg.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 //
-// { dg-options "-std=gnu++2a -O0" }
+// { dg-options "-std=gnu++2a -D_GLIBCXX_ASSERTIONS" }
 // { dg-do run { target c++2a xfail *-*-* } }
 
 #define _GLIBCXX_DEBUG_PEDANTIC
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/operations/ends_with/wchar_t_neg.cc 

Re: [PATCH 0/5] IEEE 128-bit built-in overload support.

2022-08-16 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 11, 2022 at 08:44:17PM +, Joseph Myers wrote:
> On Thu, 11 Aug 2022, Michael Meissner via Gcc-patches wrote:
> 
> > In looking at it, I now believe that the type for _Float128 and __float128
> > should always be the same within the compiler.  Whether we would continue to
> > use the same type for long double and _Float128/__float128 remains to be 
> > seen.
> 
> long double and _Float128 must always be different types; that's how it's 
> defined in C23.

And when we implement C++23 P1467R9, if std::float128_t will be
_Float128 under the hood, then long double and _Float128 have to remain
distinct types and mangle differently, long double (and __float128 if
long double is IEEE quad and __float128 exists?) need to mangle the way
they currently do and _Float128 should mangle as DF128_.
<builtin-type> ::= DF <number> _ # ISO/IEC TS 18661 binary floating point type 
_FloatN (N bits)
Wonder how shall we mangle the underlying type of std::bfloat16_t though.

I assume e.g. for libstdc++ implementation purposes we need to have
__ibm128 and __float128 types mangling as long double mangles when the
-mabi={ibm,ieee}longdouble option is used, because otherwise it would be
really hard to implement it.

Jakub



[wwwdocs] Add C++23 papers approved by WG21 at the July plenary

2022-08-16 Thread Marek Polacek via Gcc-patches
We have a lot of new papers to implement.  I've also opened PRs for them.

Pushed.

commit c6f8ab1adad76d2b43f6cbdacd84202131ae1e5c
Author: Marek Polacek 
Date:   Tue Aug 16 13:39:34 2022 -0400

cxx-status: Add C++23 papers approved by WG21 at the July plenary

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index b243b846..ef4722c0 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -139,10 +139,14 @@
 
 
 
-   Change scope of lambda trailing-return-type 
+   Change scope of lambda trailing-return-type 
https://wg21.link/p2036r3;>P2036R3
-   https://gcc.gnu.org/PR102610;>No
-   
+   https://gcc.gnu.org/PR102610;>No
+   
+
+
+
+   https://wg21.link/p2579r0;>P2579R0
 
 
 
@@ -231,6 +235,118 @@
13

 
+
+
+   The Equality Operator You Are Looking For 
+   https://wg21.link/p2468r2;>P2468R2
+   https://gcc.gnu.org/PR106644;>No
+   
+
+
+
+   De-deprecating volatile compound operations 
+   https://wg21.link/p2327r1;>P2327R1
+   13
+   
+
+
+
+   Support for #warning 
+   https://wg21.link/p2437r1;>P2437R1
+   https://gcc.gnu.org/PR106646;>Yes?
+   
+
+
+
+   Remove non-encodable wide character literals and multicharacter 
wide character literals 
+   https://wg21.link/p2362r3;>P2362R3
+   https://gcc.gnu.org/PR106647;>No
+   
+
+
+
+   Delimited escape sequences 
+   https://wg21.link/p2290r3;>P2290R3
+   https://gcc.gnu.org/PR106645;>No
+   
+
+
+
+   Named universal character escapes 
+   https://wg21.link/p2071r2;>P2071R2
+   https://gcc.gnu.org/PR106648;>No
+   
+
+
+
+   Relaxing some constexpr restrictions 
+   https://wg21.link/p2448r2;>P2448R2
+   https://gcc.gnu.org/PR106649;>No
+   
+
+
+
+   Using unknown references in constant expressions 
+   https://wg21.link/p2280r4;>P2280R4
+   https://gcc.gnu.org/PR106650;>No
+   
+
+
+
+   static operator() 
+   https://wg21.link/p1169r4;>P1169R4
+   https://gcc.gnu.org/PR106651;>No
+   
+
+
+
+   Extended floating-point types and standard names 
+   https://wg21.link/p1467r9;>P1467R9
+   https://gcc.gnu.org/PR106652;>No
+   
+
+
+
+   Class template argument deduction from inherited constructors 
+   https://wg21.link/p2582r1;>P2582R1
+   https://gcc.gnu.org/PR106653;>No
+   
+
+
+
+   Portable assumptions 
+   https://wg21.link/p1774r8;>P1774R8
+   https://gcc.gnu.org/PR106654;>No
+   
+
+
+
+   Support for UTF-8 as a portable source file encoding 
+   https://wg21.link/p2295r6;>P2295R6
+   https://gcc.gnu.org/PR106655;>No
+   
+
+
+
+   char8_t Compatibility and Portability Fix 
+   https://wg21.link/p2513r3;>P2513R3
+   https://gcc.gnu.org/PR106656;>No
+   
+
+
+
+   Relax requirements on wchar_t to match existing 
practices 
+   https://wg21.link/p2460r2;>P2460R2
+   https://gcc.gnu.org/PR106657;>No
+   
+
+
+
+   Explicit lifetime management 
+   https://wg21.link/p2590r2;>P2590R2
+   https://gcc.gnu.org/PR106658;>No
+   
+
 

OpenMP patch ping

2022-08-16 Thread Tobias Burnus

I would like to ping the following OpenMP patches.

First two non-pings but just RFC:

- "Restore 'GOMP_offload_unregister_ver' functionality"
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597918.html
  * QUESTION: See 'assert' question in email exchange
   (linked email message + emails before/after it in the thread)

- [Patch] OpenMP: Fix folding with simd's linear clause [PR106492]
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599308.html
  * Already committed as obvious to mainline + GCC 12
  * Question: Close PR or also backport to GCC 11? Other patch comments?


Now the patch pings – lightly ordered by complexity:

* [PATCH, OpenMP, C++] Allow classes with static members to be mappable
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598918.html
  * Modified as requested in the patch review

* [Patch] OpenMP: Fix var replacement with 'simd' and linear-step vars 
[PR106548]
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599832.html

* [Patch] Fortran: OpenMP fix declare simd inside modules and absent linear 
step [PR106566]
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599829.html

* [patch] libgomp/splay-tree.h: Fix splay_tree_prefix handling
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598694.html

* [Patch] OpenMP requires: Fix diagnostic filename corner case
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598697.html

* [PATCH] openmp: fix max_vf setting for amdgcn offloading
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598265.html

(Side note: we should at some point find a way to improve target-specific
handling; similar to the are-exceptions-supported issue of PR101544 but
there are more.)

* [PATCH] OpenMP, libgomp: Environment variable syntax extension.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598770.html
  * Post-review Revised patch

* [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and 
omp_{gs}et_teams_thread_limit on offload devices
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599278.html
  * Revised patch; saw a light review before

* [Patch] OpenMP: Support reverse offload (middle end part)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html
(Does permanent host fallback for target regions; I have a WIP patch
for AMDGCN and nvptx)

* [PATCH 0/3] OpenMP SIMD routines
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599487.html

* [PATCH 0/5] [gfortran] Support for allocate directive (OpenMP 5.0)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588367.html
(already older - but I missed to ping it.)

* [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
  * Unified-Shared Memory & Pinned Memory

Depending on those:

* [PATCH] OpenMP, libgomp: Handle unified shared memory in 
omp_target_is_accessible.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594187.html

* [PATCH, OpenMP, Fortran] requires unified_shared_memory 1/2: adjust 
libgfortran memory allocators
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599703.html
  (Fortran part, required for ...)
* [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM 
allocators into libgfortran
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599704.html

And finally:

* [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599332.html

Tobias

PS: I hope the list above and the one below is somewhat complete...

 * * *

PPS: Tracking patches pending (re)submissions, just that it is not
forgotten on our side:

* [PATCH, OpenMP, v4] Implement uses_allocators clause for target regions
  https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596587.html
  * Needs to be revised according to review comments

* Fortran allocatable components handling (needs to be slit into separate 
pieces and submitted
  separately)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593704.html

* [PATCH 00/16] OpenMP: lvalues in "map" clauses and struct handling rework
  https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586600.html
  and January + February updates (search for metadirective)
+ patch review end of May, e.g.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595762.html (+ 4 more 
emails)

* [PATCH 00/16] OpenMP: lvalues in "map" clauses and struct handling rework
  https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585439.html
  + [PATCH v2 00/11] OpenMP 5.0: C & C++ "declare mapper" support (plus struct 
rework, etc.)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591973.html

* [PATCH 00/40] OpenACC "kernels" Improvements
  https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586901.html
  (4 simple patches are in by now; could be reviewed, in principle)


Re: Ping^2: 2 libcpp patches

2022-08-16 Thread Joseph Myers
On Tue, 16 Aug 2022, Lewis Hyatt via Gcc-patches wrote:

> For the first patch, I think it is a worthwhile goal to fix all the
> places where libcpp fails to support UTF-8 correctly, and this is one
> of two remaining ones that I'm aware of. I can fix the other case
> (handling of #pragma push_macro) once this one is in place.
> 
> The second patch is about libcpp not allowing raw strings containing
> newlines in preprocessor directives, which is a nearly decade-old
> glitch that I think is also worth addressing.

As these are both about C++ features, C++ maintainers might be best placed 
to look at them.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-08-16 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Tue, 9 Aug 2022 at 18:42, Richard Biener  
> wrote:
>>
>> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
>>  wrote:
>> >
>> > On Mon, 8 Aug 2022 at 14:27, Richard Biener wrote:
>> > >
>> > >
>> > >   /* If result vector has greater length than input vector,
>> > > + then allow permuting two vectors as long as:
>> > > + a) sel.nelts_per_pattern == 1
>> > > + b) sel.npatterns == len of input vector.
>> > > + The intent is to permute input vectors, and
>> > > + dup the elements in resulting vector to target vector length.  */
>> > > +
>> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
>> > > +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0
>> > > +{
>> > > +  nelts = sel.encoding ().npatterns ();
>> > > +  if (sel.encoding ().nelts_per_pattern () != 1
>> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE 
>> > > (arg0)
>> > > +   return NULL_TREE;
>> > > +}
>> > >
>> > > so the only case you add is non-VLA to VLA and there
>> > > explicitely only the case of a period that's same as the
>> > > element count in the input vectors.
>> > >
>> > >
>> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
>> > > node, int spc, dump_flags_t flags,
>> > > pp_space (pp);
>> > >   }
>> > >   }
>> > > +   if (VECTOR_TYPE_P (TREE_TYPE (node))
>> > > +   && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
>> > > + pp_string (pp, ", ... ");
>> > > pp_right_brace (pp);
>> > >
>> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
>> > Well, it got created for the following case after folding:
>> > svint32_t f2(int a, int b, int c, int d)
>> > {
>> >   int32x4_t v = {a, b, c, d};
>> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
>> > }
>> >
>> > The svld1rq_s32 call gets folded to:
>> > v = {a, b, c, d}
>> > lhs = VEC_PERM_EXPR
>> >
>> > fold_vec_perm then folds the above VEC_PERM_EXPR to
>> > VLA constructor, since elements in v (in_elts) are not constant, and
>> > need_ctor is thus true:
>> > lhs = {a, b, c, d, ...}
>> > I added "..." to make it more explicit that it's a VLA constructor.
>>
>> But I doubt we do anything reasonable with such a beast?  Do we?
>> I suppose it's like a vec_duplicate if you view it as V1TImode
>> but do we actually make sure to do this duplication?
> I am not sure. As mentioned above, the current code-gen for VLA
> constructor looks pretty bad.
> Should we avoid folding VLA constructors for now ?

VLA constructors aren't really a thing.  At least, the only VLA vector
you could represent with current CONSTRUCTOR nodes is a fixed-length
sequence at the start of an otherwise zero vector.  I'm not sure
we even use that though (perhaps we do and I've forgotten).

> I guess these are 2 different issues:
> (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> (b) Extending fold_vec_perm to handle vectors with differing lengths.
>
> For (a), I think the issue with using:
> res_type = gimple_assign_lhs (stmt)
> in previous patch, was that op2's type will change to match tgt_units,
> if we go thru
> (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> and may thus not be same as len(lhs_type) anymore, and hit the assert
> in fold_vec_perm.
>
> IIUC, for lhs = VEC_PERM_EXPR, we now have the
> following semantics:
> (1) Element types for lhs, rhs1 and rhs2 should be the same.
> (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).

Yeah.

> The attached patch changes res_type from TREE_TYPE (arg0) to following:
> res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> TYPE_VECTOR_SUBPARTS (op2))
> so it has same element type as arg0 (and arg1) and len of op2.
> Does that look reasonable ?
>
> If we need a cast from res_type to lhs_type, then both would be fixed
> width vectors
> with len(lhs_type) being a multiple of len(res_type).
> IIUC, we don't support casting from VLA vector to/from fixed width vector,

Yes, that's not supported as a cast.  If the compiler knows the
length of the "VLA" vector then it's not VLA.  If it doesn't
know the length of the VLA vector then the sizes could be different
(preventing VIEW_CONVERT_EXPR) and the number of elements could be
different (preventing pointwise CONVERT_EXPRs).

> or from VLA vector of one type to VLA vector of other type ?

That's supported though.  They work just like VLS vectors: if the sizes
are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
are the same then we can do pointwise conversions (e.g. element-by-element
extensions, truncations, conversions to float, conversions to integer, etc).

> Currently, if op2 is VLA, and we enter the branch:
> (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> then I think it will bail out because op2_units will not be a compile
> time constant,
> and constant_multiple_p (op2_units, tgt_units, ) would 

[PATCH][pushed] docs: remove link to www.bullfreeware.com from install

2022-08-16 Thread Martin Liška
As mentioned at https://gcc.gnu.org/PR106637#c2, the site discontinued
providing binaries.

PR target/106637

gcc/ChangeLog:

* doc/install.texi: Remove link to www.bullfreeware.com
---
 gcc/doc/install.texi | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 460da3a0fd5..b3bebdf125b 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3474,10 +3474,6 @@ contact their makers.
 @item
 AIX:
 @itemize
-@item
-@uref{http://www.bullfreeware.com,,Bull's Open Source Software Archive for
-for AIX 6 and AIX 7};
-
 @item
 @uref{http://www.perzl.org/aix/,,AIX Open Source Packages (AIX5L AIX 6.1
 AIX 7.1)}.
-- 
2.37.1



Re: [PATCH 0/2] RISC-V: Support _Float16 type and implement zfh and zfhmin extension

2022-08-16 Thread Kito Cheng via Gcc-patches
This patch set has been committed to trunk.

On Wed, Aug 10, 2022 at 11:44 PM Kito Cheng  wrote:
>
> This patch set implements Zfh and Zfhmin, adds soft-float for _Float16, and 
> enables _Float16 type in C++ mode.
>
> Zfh and Zfhmin are extensions for IEEE half precision, both are ratified in 
> Jan. 2022[1]
>
> v2 Changes:
> Fix mangling for C++ mode to fit the RISC-V psABI spec.
>
>
> [1] 
> https://github.com/riscv/riscv-isa-manual/commit/b35a54079e0da11740ce5b1e6db999d1d5172768
>
>
>


[wwwdocs] Update C++ DR table from Core Language Issue TOC, Revision 109

2022-08-16 Thread Marek Polacek via Gcc-patches
A lot of updates this time.

Pushed.

commit 0a423169f0abf14b765493d7b11b790d847494e8
Author: Marek Polacek 
Date:   Tue Aug 16 11:32:24 2022 -0400

cxx-dr-status: Update from C++ Core Language Issue TOC, Revision 109

diff --git a/htdocs/projects/cxx-dr-status.html 
b/htdocs/projects/cxx-dr-status.html
index 501fa501..39e6a6e3 100644
--- a/htdocs/projects/cxx-dr-status.html
+++ b/htdocs/projects/cxx-dr-status.html
@@ -15,7 +15,7 @@
 
   This table tracks the implementation status of C++ defect reports in GCC.
   It is based on C++ Standard Core Language Issue Table of Contents, Revision
-  106 (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html;>here).
+  109 (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html;>here).
 
   
 
@@ -1212,7 +1212,7 @@
 
 
   https://wg21.link/cwg170;>170
-  drafting
+  open
   Pointer-to-member conversions
   -
   
@@ -3211,7 +3211,7 @@
 
 
   https://wg21.link/cwg455;>455
-  drafting
+  open
   Partial ordering and non-deduced arguments
   -
   
@@ -3358,7 +3358,7 @@
 
 
   https://wg21.link/cwg476;>476
-  extension
+  CD5
   Determining the buffer size for placement new
   ?
   
@@ -4070,9 +4070,9 @@
   ?
   
 
-
+
   https://wg21.link/cwg578;>578
-  open
+  review
   Phase 1 replacement of characters with 
universal-character-names
   -
   
@@ -4835,7 +4835,7 @@
 
 
   https://wg21.link/cwg687;>687
-  extension
+  NAD
   template keyword with unqualified-ids
   ?
   
@@ -4847,9 +4847,9 @@
   ?
   
 
-
+
   https://wg21.link/cwg689;>689
-  open
+  CD5
   Maximum values of signed and unsigned integers
   -
   
@@ -5122,7 +5122,7 @@
 
 
   https://wg21.link/cwg728;>728
-  extension
+  NAD
   Restrictions on local classes
   ?
   
@@ -6437,9 +6437,9 @@
   ?
   
 
-
+
   https://wg21.link/cwg916;>916
-  open
+  concepts
   Does a reference type have a destructor?
   -
   
@@ -6635,7 +6635,7 @@
 
 
   https://wg21.link/cwg944;>944
-  extension
+  NAD
   reinterpret_cast for all types with the same size and 
alignment
   ?
   
@@ -8798,7 +8798,7 @@
 
 
   https://wg21.link/cwg1253;>1253
-  drafting
+  open
   Generic non-template members
   -
   
@@ -8812,7 +8812,7 @@
 
 
   https://wg21.link/cwg1255;>1255
-  drafting
+  open
   Definition problems with constexpr functions
   -
   
@@ -8831,9 +8831,9 @@
   -
   
 
-
+
   https://wg21.link/cwg1258;>1258
-  drafting
+  CD5
   "Instantiation context" differs from dependent lookup rules
   -
   
@@ -9085,7 +9085,7 @@
 
 
   https://wg21.link/cwg1294;>1294
-  drafting
+  open
   Side effects in dynamic/static initialization
   -
   
@@ -9309,7 +9309,7 @@
 
 
   https://wg21.link/cwg1326;>1326
-  extension
+  dup
   Deducing an array bound from an initializer-list
   ?
   
@@ -9372,7 +9372,7 @@
 
 
   https://wg21.link/cwg1335;>1335
-  drafting
+  open
   Stringizing, extended characters, and universal-character-names
   -
   
@@ -9419,9 +9419,9 @@
   ?
   
 
-
+
   https://wg21.link/cwg1342;>1342
-  drafting
+  DRWP
   Order of initialization with multiple declarators
   -
   
@@ -9778,7 +9778,7 @@
 
 
   https://wg21.link/cwg1393;>1393
-  extension
+  C++17
   Pack expansions in using-declarations
   ?
   
@@ -9799,7 +9799,7 @@
 
 
   https://wg21.link/cwg1396;>1396
-  drafting
+  open
   Deferred instantiation and checking of non-static data member 
initializers
   -
   
@@ -9846,9 +9846,9 @@
   ?
   
 
-
+
   https://wg21.link/cwg1403;>1403
-  open
+  review
   Universal-character-names in comments
   -
   
@@ -10037,7 +10037,7 @@
 
 
   https://wg21.link/cwg1430;>1430
-  drafting
+  open
   Pack expansion into fixed alias template parameter list
   -
   https://gcc.gnu.org/PR66834;>PR66834,
@@ -10054,7 +10054,7 @@
 
 
   https://wg21.link/cwg1432;>1432
-  drafting
+  open
   Newly-ambiguous variadic template expansions
   -
   
@@ -10313,7 +10313,7 @@
 
 
   https://wg21.link/cwg1469;>1469
-  extension
+  CD5
   Omitted bound in array new-expression
   ?
   
@@ -10957,7 +10957,7 @@
 
 
   https://wg21.link/cwg1561;>1561
-  extension
+  CD4
   Aggregates with empty base classes
   ?
   
@@ -11413,7 +11413,7 @@
 
 
   https://wg21.link/cwg1626;>1626
-  drafting
+  open
   constexpr member 

[Patch] OpenMP: Fix var replacement with 'simd' and linear-step vars [PR106548]

2022-08-16 Thread Tobias Burnus

The testcase is just a copy of linear-1 with 'omp ... for' replaced by 'omp ... 
for simd',
matching what the PR report referred to.

The problem occurs for 'omp ... for simd linear(i : step)' when 'step' is a variable
and an omp_fn... is generated: in this case, the original variable is used (in the
reduced example of the PR, the PARM_DECL of 'f') instead of the replacement.

OK for mainline? Thoughts on backporting (and for which versions)?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP: Fix var replacement with 'simd' and linear-step vars [PR106548]

gcc/ChangeLog:

	PR middle-end/106548
	* omp-low.cc (lower_rec_input_clauses): Use build_outer_var_ref
	for 'simd' linear-step values that are variable.

libgomp/ChangeLog:

	PR middle-end/106548
	* testsuite/libgomp.c/linear-2.c: New test.

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 3c4b8593c8b..d6d6ff372a1 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6188,6 +6188,10 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  && gimple_omp_for_combined_into_p (ctx->stmt))
 		{
 		  tree t = OMP_CLAUSE_LINEAR_STEP (c);
+		  if (VAR_P (t)
+			  || TREE_CODE (t) == PARM_DECL
+			  || TREE_CODE (t) == RESULT_DECL)
+			t = build_outer_var_ref (t, ctx);
 		  tree stept = TREE_TYPE (t);
 		  tree ct = omp_find_clause (clauses,
 		 OMP_CLAUSE__LOOPTEMP_);
diff --git a/libgomp/testsuite/libgomp.c/linear-2.c b/libgomp/testsuite/libgomp.c/linear-2.c
new file mode 100644
index 000..fee6fbc276d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/linear-2.c
@@ -0,0 +1,251 @@
+/* PR middle-end/106548.  */
+int a[256];
+
+__attribute__((noinline, noclone)) int
+f1 (int i)
+{
+  #pragma omp parallel for simd linear (i: 4)
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) short int
+f2 (short int i, char k)
+{
+  #pragma omp parallel for simd linear (i: k + 1)
+  for (long j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) long long int
+f3 (long long int i, long long int k)
+{
+  #pragma omp parallel for simd linear (i: k)
+  for (short j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) int
+f4 (int i)
+{
+  #pragma omp parallel for simd linear (i: 4) schedule(static, 3)
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) short int
+f5 (short int i, char k)
+{
+  #pragma omp parallel for simd linear (i: k + 1) schedule(static, 5)
+  for (long j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) long long int
+f6 (long long int i, long long int k)
+{
+  #pragma omp parallel for simd linear (i: k) schedule(static, 7)
+  for (short j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) int
+f7 (int i)
+{
+  #pragma omp parallel for simd linear (i: 4) schedule(dynamic, 3)
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) short int
+f8 (short int i, char k)
+{
+  #pragma omp parallel for simd linear (i: k + 1) schedule(dynamic, 5)
+  for (long j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) long long int
+f9 (long long int i, long long int k)
+{
+  #pragma omp parallel for simd linear (i: k) schedule(dynamic, 7)
+  for (short j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) int
+f10 (int i, long step)
+{
+  #pragma omp parallel for simd linear (i: 4)
+  for (int j = 16; j < 112; j += step)
+{
+  a[i] = j / 2 + 8;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) short int
+f11 (short int i, char k, char step)
+{
+  #pragma omp parallel for simd linear (i: k + 1)
+  for (long j = 16; j < 112; j += step)
+{
+  a[i] = j / 2 + 8;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) long long int
+f12 (long long int i, long long int k, int step)
+{
+  #pragma omp parallel for simd linear (i: k)
+  for (short j = 16; j < 112; j += step)
+{
+  a[i] = j / 2 + 8;
+  i += 4;
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) int
+f13 (int i, long long int step)
+{
+  #pragma omp parallel for simd linear (i: 4) schedule(static, 3)
+  for (int j = 16; j < 112; j += step)
+{
+  a[i] = j / 2 + 8;
+  i += 4;
+}
+  return i;
+}
+

[PING^6] nvptx: Allow '--with-arch' to override the default '-misa' (was: nvptx multilib setup)

2022-08-16 Thread Thomas Schwinge
Hi Tom!

Ping.


Grüße
 Thomas


On 2022-08-06T21:20:38+0200, I wrote:
> Hi Tom!
>
> Ping.
>
>
> Grüße
>  Thomas
>
>
> On 2022-07-27T17:48:58+0200, I wrote:
>> Hi Tom!
>>
>> Ping.
>>
>>
>> Grüße
>>  Thomas
>>
>>
>> On 2022-07-20T14:46:03+0200, I wrote:
>>> Hi Tom!
>>>
>>> Ping.
>>>
>>>
>>> Grüße
>>>  Thomas
>>>
>>>
>>> On 2022-07-13T10:42:44+0200, I wrote:
 Hi Tom!

 Ping.


 Grüße
  Thomas


 On 2022-07-05T16:59:23+0200, I wrote:
> Hi Tom!
>
> Ping.
>
>
> Grüße
>  Thomas
>
>
> On 2022-06-15T23:18:10+0200, I wrote:
>> Hi Tom!
>>
>> On 2022-05-13T16:20:14+0200, I wrote:
>>> On 2022-02-04T13:09:29+0100, Tom de Vries via Gcc  
>>> wrote:
 On 2/4/22 08:21, Thomas Schwinge wrote:
> On 2022-02-03T13:35:55+, "vries at gcc dot gnu.org via Gcc-bugs" 
>  wrote:
>> I've tested this using (recommended) driver 470.94 on boards:
>>>
>> while iterating over dimensions { -mptx=3.1 , -mptx=6.3 } x { 
>> GOMP_NVPTX_JIT=-O0,  }.
>
> Do you use separate (nvptx-none offload target only?) builds for
> different '-mptx' variants (likewise: '-misa'), or have you hacked up 
> the
> multilib configuration?

 Neither, I'm using --target_board=unix/foffload= for that.
>>>
>>> ACK, I see.  So these flags then only affect GCC/nvptx code generation
>>> for the actual user code (here: GCC libgomp test cases), but for the
>>> GCC/nvptx target libraries (such as: libc, libm, libgfortran, libgomp --
>>> the latter especially relevant for OpenMP), it uses PTX code from one of
>>> the two "pre-compiled" GCC/nvptx multilibs: default or '-mptx=3.1'.
>>>
>>> Meaning, one can't just use such a flag for "completely building code"
>>> for a specific configuration.  Random example,
>>> '-foffload-options=nvptx-none=-march=sm_75': as GCC/nvptx target
>>> libraries aren't being built for '-march=sm_75' multilib,
>>> '-foffload-options=nvptx-none=-march=sm_75' uses the default multilib,
>>> which isn't '-march=sm_75'.
>>>
>>>
   ('gcc/config/nvptx/t-nvptx:MULTILIB_OPTIONS'
> etc., I suppose?)  Should we add a few representative configurations 
> to
> be built by default?  And/or, should we have a way to 'configure' per
> user needs (I suppose: '--with-multilib-list=[...]', as supported for 
> a
> few other targets?)?  (I see there's also a new
> '--with-multilib-generator=[...]', haven't looked in detail.)  No 
> matter
> which way: again, combinatorial explosion is a problem, of course...

 As far as I know, the gcc build doesn't finish when switching default 
 to
 higher than sm_35, so there's little point to go to a multilib setup at
 this point.  But once we fix that, we could reconsider, otherwise,
 things are likely to regress again.
>>>
>>> As far as I remember, several issues have been fixed.  Still waiting for
>>> Roger's "middle-end: Support ABIs that pass FP values as wider integers"
>>> or something similar, but that PR104489 issue is being worked around by
>>> "Limit HFmode support to mexperimental", if I got that right.
>>>
>>> Now I'm not suggesting we should now enable all or any random GCC/nvptx
>>> multilibs, to get all these variants of GCC/nvptx target libraries 
>>> built;
>>> especially also given that GCC/nvptx code generation currently doesn't
>>> make too much use of the new capabilities.
>>>
>>> However, we do have a specific request that a customer would like to be
>>> able to change at GCC 'configure' time the GCC/nvptx default multilib
>>> (including that being used for building corresponding GCC/nvptx target
>>> libraries).
>>>
>>> Per 'gcc/doc/install.texi', I do see that some GCC targets allow for
>>> GCC 'configure'-time '--with-multilib-list=[...]', or
>>> '--with-multilib-generator=[...]', and I suppose we could be doing
>>> something similar?  But before starting implementing, I'd like your
>>> input, as you'll be the one to approve in the end.  And/or, maybe you've
>>> already made up your own ideas about that?
>>
>> So, instead of "random GCC/nvptx multilib configuration" (last
>> paragraph), I've come up with a way to implement our customer's request
>> (second last paragraph): 'configure' GCC/nvptx '--with-arch=sm_70'.
>>
>> I think I've implemented this in a way so that "random GCC/nvptx multilib
>> configuration" may eventually be implemented on top of that.  For easy
>> review/testing I've split my changes into three commits, see attached
>> "nvptx: Make default '-misa=sm_30' explicit",
>> "nvptx: Introduce dummy multilib option for default '-misa=sm_30'",
>> "nvptx: Allow 
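For context, the 'configure'-time selection being requested can be sketched as follows (the directory layout and the options other than '--with-arch' are illustrative; only '--with-arch=sm_70' is taken from the discussion above):

```shell
# Hypothetical sketch: build the nvptx offload compiler with a
# non-default ISA, so the nvptx target libraries (libc, libm,
# libgomp, ...) are also built for that ISA instead of the default.
mkdir build-nvptx && cd build-nvptx
../gcc/configure \
    --target=nvptx-none \
    --enable-as-accelerator-for=x86_64-pc-linux-gnu \
    --with-arch=sm_70    # changes the default multilib as well

# By contrast, a per-invocation flag like the following only affects
# user code; without a matching multilib, the target libraries still
# come from the default multilib:
#   gcc -fopenmp -foffload-options=nvptx-none=-march=sm_75 test.c
```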

[PING^7] nvptx: forward '-v' command-line option to assembler, linker

2022-08-16 Thread Thomas Schwinge
Hi Tom!

Ping.


Grüße
 Thomas


On 2022-08-06T21:20:23+0200, I wrote:
> Hi Tom!
>
> Ping.
>
>
> Grüße
>  Thomas
>
>
> On 2022-07-27T17:48:46+0200, I wrote:
>> Hi Tom!
>>
>> Ping.
>>
>>
>> Grüße
>>  Thomas
>>
>>
>> On 2022-07-20T14:44:36+0200, I wrote:
>>> Hi Tom!
>>>
>>> Ping.
>>>
>>>
>>> Grüße
>>>  Thomas
>>>
>>>
>>> On 2022-07-13T10:41:23+0200, I wrote:
 Hi Tom!

 Ping.


 Grüße
  Thomas


 On 2022-07-05T16:58:54+0200, I wrote:
> Hi Tom!
>
> Ping.
>
>
> Grüße
>  Thomas
>
>
> On 2022-06-07T17:41:16+0200, I wrote:
>> Hi!
>>
>> On 2022-05-30T09:06:21+0200, Tobias Burnus  
>> wrote:
>>> On 29.05.22 22:49, Thomas Schwinge wrote:
 Not sure if that's what you had in mind, but what do you think about 
 the
 attached "nvptx: forward '-v' command-line option to assembler, 
 linker"?
 OK to push to GCC master branch (after merging
 
 "Put '-v' verbose output onto stderr instead of stdout")?
>>>
>>> I was mainly thinking of some way to have it available — which
>>> '-foffload-options=-Wa,-v' already permits on the GCC side. (Once the
>>> nvptx-tools patch actually makes use of the '-v'.)
>>
>> (Merged a week ago.)
>>
>>> If I understand your patch correctly, this patch now causes 'gcc -v' to
>>> imply 'gcc -v -Wa,-v'. I think that's okay, since 'gcc -v' already
>>> outputs a lot of lines and those lines can be helpful to understand what
>>> happens and what not.
>>
>> ACK.
>>
>>> Tom, your thoughts on this?
>>
>> Ping.
>>
>>
>> Grüße
>>  Thomas


>From 17c35607d4927299b0c4bd19dd6fd205c85c4a4b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 29 May 2022 22:31:43 +0200
Subject: [PATCH] nvptx: forward '-v' command-line option to assembler, linker

For example, for offloading compilation with '-save-temps -v', before vs. after
word-diff then looks like:

[...]
 [...]/build-gcc-offload-nvptx-none/gcc/as {+-v -v+} -o ./a.xnvptx-none.mkoffload.o ./a.xnvptx-none.mkoffload.s
{+Verifying sm_30 code with sm_35 code generation.+}
{+ ptxas -c -o /dev/null ./a.xnvptx-none.mkoffload.o --gpu-name sm_35 -O0+}
[...]
 [...]/build-gcc-offload-nvptx-none/gcc/collect2 {+-v -v+} -o ./a.xnvptx-none.mkoffload [...] @./a.xnvptx-none.mkoffload.args.1 -lgomp -lgcc -lc -lgcc
{+collect2 version 12.0.1 20220428 (experimental)+}
{+[...]/build-gcc-offload-nvptx-none/gcc/collect-ld -v -v -o ./a.xnvptx-none.mkoffload [...] ./a.xnvptx-none.mkoffload.o -lgomp -lgcc -lc -lgcc+}
{+Linking ./a.xnvptx-none.mkoffload.o as 0+}
{+trying lib libc.a+}
{+trying lib libgcc.a+}
{+trying lib libgomp.a+}
{+Resolving abort+}
{+Resolving acc_on_device+}
{+Linking libgomp.a::oacc-init.o/ as 1+}
{+Linking libc.a::lib_a-abort.o/   as 2+}
[...]

(This depends on 
"Put '-v' verbose output onto stderr instead of stdout".)

	gcc/
	* config/nvptx/nvptx.h (ASM_SPEC, LINK_SPEC): Define.
---
 gcc/config/nvptx/nvptx.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index ed72c253191..b184f1d0150 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -27,6 +27,13 @@
 
 /* Run-time Target.  */
 
+/* Assembler supports '-v' option; handle similar to
+   '../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
+#define ASM_SPEC "%{v}"
+
+/* Linker supports '-v' option.  */
+#define LINK_SPEC "%{v}"
+
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
-- 
2.25.1



Re: [Patch] Fortran: OpenMP fix declare simd inside modules and absent linear step [PR106566]

2022-08-16 Thread Tobias Burnus

Fixed subject line: "absent linear" should be "absent linear step";
i.e. with "step" added: "Fortran: OpenMP fix declare simd inside modules and absent
linear step [PR106566]"

I have also decided to move the 'step = 1' setting to openmp.cc, where it was
already set for the old pre-OpenMP 5.2 syntax.

I also added a pre-OpenMP-5.2-syntax example.

 * * *

For GCC 12 (and GCC 11), only the '%s' fix and the third, newly added example
apply; for the 5.1 syntax, 'step' was already set.

OK? And thoughts regarding the backports (none? Only 12? Or 11+12?)?

Tobias
Fortran: OpenMP fix declare simd inside modules and absent linear step [PR106566]

gcc/fortran/ChangeLog:

	PR fortran/106566
	* openmp.cc (gfc_match_omp_clauses): Fix setting linear-step value
	to 1 when not specified.
	(gfc_match_omp_declare_simd): Accept module procedures.

gcc/testsuite/ChangeLog:

	PR fortran/106566
	* gfortran.dg/gomp/declare-simd-4.f90: New test.
	* gfortran.dg/gomp/declare-simd-5.f90: New test.
	* gfortran.dg/gomp/declare-simd-6.f90: New test.

 gcc/fortran/openmp.cc | 10 +++--
 gcc/testsuite/gfortran.dg/gomp/declare-simd-4.f90 | 42 +++
 gcc/testsuite/gfortran.dg/gomp/declare-simd-5.f90 | 49 +++
 gcc/testsuite/gfortran.dg/gomp/declare-simd-6.f90 | 42 +++
 4 files changed, 140 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index a7eb6c3e8f4..594907714ff 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -2480,7 +2480,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		  goto error;
 		}
 		}
-	  else
+	  if (step == NULL)
 		{
 		  step = gfc_get_constant_expr (BT_INTEGER,
 		gfc_default_integer_kind,
@@ -4213,9 +4213,13 @@ gfc_match_omp_declare_simd (void)
   gfc_omp_declare_simd *ods;
   bool needs_space = false;
 
-  switch (gfc_match (" ( %s ) ", _name))
+  switch (gfc_match (" ( "))
 {
-case MATCH_YES: break;
+case MATCH_YES:
+  if (gfc_match_symbol (_name, /* host assoc = */ true) != MATCH_YES
+	  || gfc_match (" ) ") != MATCH_YES)
+	return MATCH_ERROR;
+  break;
 case MATCH_NO: proc_name = NULL; needs_space = true; break;
 case MATCH_ERROR: return MATCH_ERROR;
 }
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-simd-4.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-simd-4.f90
new file mode 100644
index 000..44132525963
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-simd-4.f90
@@ -0,0 +1,42 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-gimple" }
+!
+! PR fortran/106566
+!
+! { dg-final { scan-tree-dump-times "__attribute__\\(\\(omp declare simd \\(linear\\(0:ref,step\\(4\\)\\) simdlen\\(8\\)\\)\\)\\)" 2 "gimple" } }
+! { dg-final { scan-tree-dump-times "__attribute__\\(\\(omp declare simd \\(linear\\(0:ref,step\\(8\\)\\) simdlen\\(8\\)\\)\\)\\)" 2 "gimple" } }
+
+subroutine add_one2(p)
+  implicit none
+  !$omp declare simd(add_one2) linear(p: ref) simdlen(8)
+  integer :: p
+
+  p = p + 1
+end subroutine
+
+subroutine linear_add_one2(p)
+  implicit none
+  !$omp declare simd(linear_add_one2) linear(p: ref, step(2)) simdlen(8)
+  integer :: p
+
+  p = p + 1
+end subroutine
+
+module m
+   integer, parameter :: NN = 1023
+   integer :: a(NN)
+contains
+  subroutine module_add_one2(q)
+implicit none
+!$omp declare simd(module_add_one2) linear(q: ref) simdlen(8)
+integer :: q
+q = q + 1
+  end subroutine
+
+  subroutine linear_add_one2(q)
+implicit none
+!$omp declare simd(linear_add_one2) linear(q: ref, step(2)) simdlen(8)
+integer :: q
+q = q + 1
+  end subroutine
+end module
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-simd-5.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-simd-5.f90
new file mode 100644
index 000..f5880f50090
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-simd-5.f90
@@ -0,0 +1,49 @@
+! { dg-do compile }
+!
+! PR fortran/106566
+!
+
+subroutine add_one2(p)
+  implicit none
+  procedure(add_one2) :: ext1
+  !$omp declare simd(ext1) linear(p: ref) simdlen(8)  ! { dg-error "OMP DECLARE SIMD should refer to containing procedure 'add_one2'" }
+  integer :: p
+
+  p = p + 1
+end subroutine
+
+subroutine linear_add_one2(p)
+  implicit none
+  procedure(linear_add_one2) :: ext2
+  !$omp declare simd(ext2) linear(p: ref, step(2)) simdlen(8)  ! { dg-error "OMP DECLARE SIMD should refer to containing procedure 'linear_add_one2'" }
+  integer :: p
+
+  p = p + 1
+end subroutine
+
+module m
+   integer, parameter :: NN = 1023
+   integer :: a(NN)
+contains
+  subroutine some_proc(r)
+integer :: r
+  end subroutine
+  subroutine 

Re: [PATCH] soft-fp: Update soft-fp from glibc

2022-08-16 Thread Kito Cheng via Gcc-patches
Hi Joseph:

I saw that other soft-fp updates also asked for approval on the
list; anyway, I know that now :)

Thanks!

On Tue, Aug 16, 2022 at 10:18 PM Joseph Myers  wrote:
>
> On Tue, 16 Aug 2022, Kito Cheng wrote:
>
> > ping
>
> Under our write access policies, "Importing files maintained outside the
> tree from their official versions." does not require review or approval.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com


[PATCH] fortran: Add -static-libquadmath support [PR46539]

2022-08-16 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch is a revival of the
https://gcc.gnu.org/legacy-ml/gcc-patches/2014-10/msg00771.html
patch.  While trunk configured against recent glibc and with linker
--as-needed support doesn't really need to link against -lquadmath
anymore, there are still other targets where libquadmath is still in
use.
As has been discussed, making -static-libgfortran imply statically
linking both libgfortran and libquadmath is undesirable because of
the significant licensing differences between the 2 libraries.
Compared to the 2014 patch, this one doesn't handle the -lquadmath
addition in the driver, which to me looks incorrect; instead, libgfortran
configure determines where in libgfortran.spec -lquadmath should
be present (if at all) and what it should be wrapped with: it
analyzes gfortran -### -static-libgfortran stderr and based on
that figures out what gcc/configure.ac determined.

So far only lightly tested on x86_64-linux (I will bootstrap/regtest
it there tonight), but I unfortunately don't have a way to test it,
e.g., on Darwin.

Thoughts on this?
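A hedged usage sketch of the resulting option pair (assuming a gfortran built with this patch applied; file names are illustrative):

```shell
# Statically link both runtimes.  The options stay separate because
# libgfortran and libquadmath carry significantly different licenses,
# so users must opt in to each explicitly.
gfortran -static-libgfortran -static-libquadmath main.f90 -o main

# With -static-libgfortran alone, -lquadmath (where still needed)
# remains dynamically linked.
gfortran -static-libgfortran main.f90 -o main
```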

2022-08-16  Francois-Xavier Coudert  
Jakub Jelinek  

PR fortran/46539
gcc/
* common.opt (static-libquadmath): New option.
* gcc.c (driver_handle_option): Always accept -static-libquadmath.
* config/darwin.h (LINK_SPEC): Handle -static-libquadmath.
gcc/fortran/
* lang.opt (static-libquadmath): New option.
* invoke.texi (-static-libquadmath): Document it.
* options.cc (gfc_handle_option): Error out if -static-libquadmath
is passed but we do not support it.
libgfortran/
* acinclude.m4 (LIBQUADSPEC): From $FC -static-libgfortran -###
output determine -Bstatic/-Bdynamic, -bstatic/-bdynamic,
-aarchive_shared/-adefault linker support or Darwin remapping
of -lgfortran to libgfortran.a%s and use that around or instead
of -lquadmath in LIBQUADSPEC.
* configure: Regenerated.

--- gcc/common.opt.jj   2022-06-27 11:18:02.050066582 +0200
+++ gcc/common.opt  2022-08-16 14:51:04.611673800 +0200
@@ -3601,6 +3601,10 @@ static-libphobos
 Driver
 ; Documented for D, but always accepted by driver.
 
+static-libquadmath
+Driver
+; Documented for Fortran, but always accepted by driver.
+
 static-libstdc++
 Driver
 
--- gcc/gcc.cc.jj   2022-08-11 09:57:24.765334380 +0200
+++ gcc/gcc.cc  2022-08-16 14:57:54.708327024 +0200
@@ -4585,12 +4585,14 @@ driver_handle_option (struct gcc_options
 case OPT_static_libgcc:
 case OPT_shared_libgcc:
 case OPT_static_libgfortran:
+case OPT_static_libquadmath:
 case OPT_static_libphobos:
 case OPT_static_libstdc__:
   /* These are always valid, since gcc.cc itself understands the
 first two, gfortranspec.cc understands -static-libgfortran,
-d-spec.cc understands -static-libphobos, and g++spec.cc
-understands -static-libstdc++ */
+d-spec.cc understands -static-libphobos, g++spec.cc
+understands -static-libstdc++ and libgfortran.spec handles
+-static-libquadmath.  */
   validated = true;
   break;
 
--- gcc/config/darwin.h.jj  2022-08-16 14:51:14.529544492 +0200
+++ gcc/config/darwin.h 2022-08-16 14:53:54.402460097 +0200
@@ -443,6 +443,7 @@ extern GTY(()) int darwin_ms_struct;
  %:replace-outfile(-lobjc libobjc-gnu.a%s); \
 :%:replace-outfile(-lobjc -lobjc-gnu )}}\
%{static|static-libgcc|static-libgfortran:%:replace-outfile(-lgfortran 
libgfortran.a%s)}\
+   %{static|static-libgcc|static-libgfortran:%:replace-outfile(-lquadmath 
libquadmath.a%s)}\
%{static|static-libgcc|static-libphobos:%:replace-outfile(-lgphobos 
libgphobos.a%s)}\

%{static|static-libgcc|static-libstdc++|static-libgfortran:%:replace-outfile(-lgomp
 libgomp.a%s)}\
%{static|static-libgcc|static-libstdc++:%:replace-outfile(-lstdc++ 
libstdc++.a%s)}\
--- gcc/fortran/lang.opt.jj 2022-02-04 14:36:55.050604670 +0100
+++ gcc/fortran/lang.opt2022-08-16 14:52:52.459267705 +0200
@@ -863,6 +863,10 @@ static-libgfortran
 Fortran
 Statically link the GNU Fortran helper library (libgfortran).
 
+static-libquadmath
+Fortran
+Statically link the GCC Quad-Precision Math Library (libquadmath).
+
 std=f2003
 Fortran
 Conform to the ISO Fortran 2003 standard.
--- gcc/fortran/options.cc.jj   2022-01-18 11:58:59.568982256 +0100
+++ gcc/fortran/options.cc  2022-08-16 14:56:22.807525218 +0200
@@ -692,6 +692,13 @@ gfc_handle_option (size_t scode, const c
 #endif
   break;
 
+case OPT_static_libquadmath:
+#ifndef HAVE_LD_STATIC_DYNAMIC
+  gfc_fatal_error ("%<-static-libquadmath%> is not supported in this "
+  "configuration");
+#endif
+  break;
+
 case OPT_fintrinsic_modules_path:
 case OPT_fintrinsic_modules_path_:
 
--- gcc/fortran/invoke.texi.jj  2022-05-09 09:09:20.312473272 +0200
+++ gcc/fortran/invoke.texi 2022-08-16 16:12:47.638203577 +0200
@@ -170,7 +170,7 @@ and warnings}.
 
 

Re: [PATCH] Support threading of just the exit edge

2022-08-16 Thread Andrew MacLeod via Gcc-patches



On 8/16/22 05:18, Richard Biener wrote:

On Mon, 15 Aug 2022, Aldy Hernandez wrote:


On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod  wrote:

heh. or just


+  int_range<2> r;
+  if (!fold_range (r, const_cast  (cond_stmt))
+  || !r.singleton_p ())


if you do not provide a range_query to any of the fold_using_range code,
it defaults to:

fur_source::fur_source (range_query *q)
{
if (q)
  m_query = q;
else if (cfun)
  m_query = get_range_query (cfun);
else
  m_query = get_global_range_query ();
m_gori = NULL;
}


Sweet.  Even better!

So when I do the following incremental change ontop of the posted
patch then I see that the path query is able to simplify more
"single BB paths" than the global range folding.

diff --git a/gcc/tree-ssa-threadbackward.cc
b/gcc/tree-ssa-threadbackward.cc
index 669098e4ec3..777e778037f 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const
vec ,
  {
int_range_max r;
  
+  int_range<2> rf;

+  if (path.length () == 1)
+{
+  fold_range (rf, cond);
+}
+
m_solver->compute_ranges (path, m_imports);
m_solver->range_of_stmt (r, cond);
  
@@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const

vec ,
  
if (r == true_range || r == false_range)

  {
+  if (path.length () == 1)
+   gcc_assert  (r == rf);
edge e_true, e_false;
basic_block bb = gimple_bb (cond);
extract_true_false_edges_from_block (bb, _true, _false);

Even doing the following (not sure what the difference, and in
particular the expense, is compared to the path range query) results in missed
simplifications (checking my set of cc1 files).

diff --git a/gcc/tree-ssa-threadbackward.cc
b/gcc/tree-ssa-threadbackward.cc
index 669098e4ec3..1d43a179d08 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -99,6 +99,7 @@ private:
  
back_threader_registry m_registry;

back_threader_profitability m_profit;
+  gimple_ranger *m_ranger;
path_range_query *m_solver;
  
// Current path being analyzed.

@@ -146,12 +147,14 @@ back_threader::back_threader (function *fun,
unsigned flags, bool first)
// The path solver needs EDGE_DFS_BACK in resolving mode.
if (flags & BT_RESOLVE)
  mark_dfs_back_edges ();
-  m_solver = new path_range_query (flags & BT_RESOLVE);
+  m_ranger = new gimple_ranger;
+  m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger);
  }
  
  back_threader::~back_threader ()

  {
delete m_solver;
+  delete m_ranger;
  
loop_optimizer_finalize ();

  }
@@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const
vec ,
  {
int_range_max r;
  
+  int_range<2> rf;

+  if (path.length () == 1)
+{
+  fold_range (rf, cond, m_ranger);
+}
+
m_solver->compute_ranges (path, m_imports);
m_solver->range_of_stmt (r, cond);
  
@@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const

vec ,
  
if (r == true_range || r == false_range)

  {
+  if (path.length () == 1)
+   gcc_assert  (r == rf);
edge e_true, e_false;
basic_block bb = gimple_bb (cond);
extract_true_false_edges_from_block (bb, _true, _false);

one example is

 [local count: 14414059]:
_128 = node_177(D)->typed.type;
pretmp_413 = MEM[(const union tree_node *)_128].base.code;
_431 = pretmp_413 + 65519;
if (_128 == 0B)
   goto ; [18.09%]
else
   goto ; [81.91%]

where m_imports for the path is just _128 and the range computed is
false while the ranger query returns VARYING.  But
path_range_query::range_defined_in_block does

   if (bb && POINTER_TYPE_P (TREE_TYPE (name)))
 m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb);
This is the coarse grained "side effect applies somewhere in the block" 
mechanism.  There is no understanding of where in the block it happens.


which adjusts the range to ~[0, 0], probably because of the
dereference in the following stmt.

Why does fold_range not do this when folding the exit test?  Is there
a way to make it do so?  It looks like the only routine using this
in gimple-range.cc is range_on_edge and there it's used for e->src
after calling range_on_exit for e->src (why's it not done in
range_on_exit?).  A testcase for this is


fold_range doesn't do this because it is simply another statement.  It 
makes no attempt to understand the context in which you are folding 
something; you could be folding that stmt from a different location (i.e., 
recomputing).  If your context is that you are looking for the range 
after the last statement has been executed, then one needs to check 
whether there are any side effects.


ranger uses it for range_on_edge (), because it knows all the 
statements in the block have been executed, and it's safe to apply 
anything seen in the block.  It does this right after range_on_exit () is 
called internally.


Once upon a time, it was integrated with range-on-exit, but it turned 

Re: [PATCH] soft-fp: Update soft-fp from glibc

2022-08-16 Thread Joseph Myers
On Tue, 16 Aug 2022, Kito Cheng wrote:

> ping

Under our write access policies, "Importing files maintained outside the 
tree from their official versions." does not require review or approval.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Avoid further recomputations in path_range_query once path is finalized.

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, Aug 16, 2022 at 3:59 PM Aldy Hernandez  wrote:
>
> [Richi, I'm trying to make things more obvious for others working on the
> code base.  What do you think?]
>
> This makes a few things explicit to avoid misuse.  First, we add a
> flag to differentiate between a path whose dependency ranges are being
> computed, and a path that is finalized and no further calculations
> should be done.  I've also enhanced the comments to document what
> queries are supported.  Finally, I added some asserts to make sure
> range_of_expr is only used on the final conditional once the path has
> been finalized.
>
> Unfortunately, the forward threader builds dummy statements it passes
> to the path solver.  Most of these don't have a BB associated with
> them.  I've cleared the BB field for the one that still had one, to
> make the asserts above work.
>
> No difference in thread counts over my cc1 .ii files.
>
> gcc/ChangeLog:
>
> * gimple-range-path.cc (path_range_query::path_range_query):
> Initialized m_path_finalized.
> (path_range_query::internal_range_of_expr): Avoid further
> recomputations once path is finalized.
> Take basic_block instead of stmt as argument.
> (path_range_query::range_of_expr): Document.  Add asserts.
> (path_range_query::compute_ranges): Set m_path_finalized.
> * gimple-range-path.h (path_range_query::internal_range_of_expr):
> Replace statement argument with basic block.
> (class path_range_query): Add m_path_finalized.
> * tree-ssa-threadedge.cc
> (jump_threader::simplify_control_stmt_condition): Clear BB field
> in dummy_switch.
> ---
>  gcc/gimple-range-path.cc   | 50 ++
>  gcc/gimple-range-path.h|  7 +-
>  gcc/tree-ssa-threadedge.cc |  1 +
>  3 files changed, 52 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> index c99d77dd340..a8412dd090b 100644
> --- a/gcc/gimple-range-path.cc
> +++ b/gcc/gimple-range-path.cc
> @@ -40,7 +40,8 @@ path_range_query::path_range_query (bool resolve, 
> gimple_ranger *ranger)
>: m_cache (new ssa_global_cache),
>  m_has_cache_entry (BITMAP_ALLOC (NULL)),
>  m_resolve (resolve),
> -m_alloced_ranger (!ranger)
> +m_alloced_ranger (!ranger),
> +m_path_finalized (false)
>  {
>if (m_alloced_ranger)
>  m_ranger = new gimple_ranger;
> @@ -159,7 +160,7 @@ path_range_query::range_on_path_entry (vrange , tree 
> name)
>  // Return the range of NAME at the end of the path being analyzed.
>
>  bool
> -path_range_query::internal_range_of_expr (vrange , tree name, gimple *stmt)
> +path_range_query::internal_range_of_expr (vrange , tree name, basic_block 
> bb)
>  {
>if (!r.supports_type_p (TREE_TYPE (name)))
>  return false;
> @@ -174,8 +175,16 @@ path_range_query::internal_range_of_expr (vrange , 
> tree name, gimple *stmt)
>return true;
>  }
>
> -  if (stmt
> -  && range_defined_in_block (r, name, gimple_bb (stmt)))
> +  // Avoid further recomputations once the path has been finalized.
> +  if (m_path_finalized)
> +{
> +  gimple_range_global (r, name);

I suppose we can't assert here instead?

> +  return true;
> +}
> +
> +  // We're being called as part of the calculation of ranges for exit
> +  // dependencies.  Set the cache as we traverse the path top-down.
> +  if (bb && range_defined_in_block (r, name, bb))
>  {
>if (TREE_CODE (name) == SSA_NAME)
> {
> @@ -192,10 +201,37 @@ path_range_query::internal_range_of_expr (vrange , 
> tree name, gimple *stmt)
>return true;
>  }
>
> +// This, as well as range_of_stmt, are the main entry points for
> +// making queries about a path.
> +//
> +// Once the ranges for the exit dependencies have been pre-calculated,
> +// the path is considered finalized, and the only valid query is
> +// asking the range of a NAME at the end of the path.
> +//
> +// Note that this method can also be called internally before the path
> +// is finalized, as part of the path traversal pre-calculating the
> +// ranges for exit dependencies.  In this case, it may be called on
> +// statements that are not the final conditional as described above.
> +
>  bool
>  path_range_query::range_of_expr (vrange , tree name, gimple *stmt)
>  {
> -  if (internal_range_of_expr (r, name, stmt))
> +  basic_block bb = stmt ? gimple_bb (stmt) : NULL;
> +
> +  // Once a path is finalized, the only valid queries are of the final
> +  // statement in the exit block.

why?  shouldn't we support queries for all dependent ranges (what
the user specified as imports argument to compute_ranges)?

If we want to restrict things this way shouldn't we simply not expose
this function but instead just path_range_query::range_of_exit,
or even what the threader uses as find_taken_edge?

The loop header copying case would also work with this,
it's interested in the known outgoing edge of the 

[Patch] Fortran: OpenMP fix declare simd inside modules and absent linear [PR106566]

2022-08-16 Thread Tobias Burnus

This patch fixes two issues – the first was reported to me by email but it
also shows up in the official OpenMP examples (see PR).

Namely: Inside a module, 'gfc_match(" ( %s )")' fails as the symbol is already
host associated. (The symbol is the current procedure name.)

Solution: Match the symbol by passing host_assoc = true (permit host
association) to the match function, instead of the 'false' that '%s' implies.

Afterwards, it failed when folding a NULL_TREE linear step.  Solution:
initialize the step to 1 in that case.

OK for mainline?

 * * *

I have not checked GCC < mainline. The current example is OpenMP 5.2 only
and only supported since June 7, 2022 in C/C++ and July 4 for Fortran.
However, I assume the same issue also affects GCC < 13 with a tailored
testcase. - If there is the sentiment to fix it also for older GCC,
I can come up with modified testcases and a GCC 12 (and GCC 11?) patch.
Thoughts?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Fortran: OpenMP fix declare simd inside modules and absent linear [PR106566]

gcc/fortran/ChangeLog:

	PR fortran/106566
	* openmp.cc (gfc_match_omp_declare_simd): Accept module procedures.
	* trans-openmp.cc (gfc_trans_omp_clauses): Fix declare simd without
	linear-step value.

gcc/testsuite/ChangeLog:

	PR fortran/106566
	* gfortran.dg/gomp/declare-simd-4.f90: New test.
	* gfortran.dg/gomp/declare-simd-5.f90: New test.

 gcc/fortran/openmp.cc |  8 +++-
 gcc/fortran/trans-openmp.cc   |  2 +
 gcc/testsuite/gfortran.dg/gomp/declare-simd-4.f90 | 42 +++
 gcc/testsuite/gfortran.dg/gomp/declare-simd-5.f90 | 49 +++
 4 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index a7eb6c3e8f4..e430f4c49dd 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -4213,9 +4213,13 @@ gfc_match_omp_declare_simd (void)
   gfc_omp_declare_simd *ods;
   bool needs_space = false;
 
-  switch (gfc_match (" ( %s ) ", _name))
+  switch (gfc_match (" ( "))
 {
-case MATCH_YES: break;
+case MATCH_YES:
+  if (gfc_match_symbol (_name, /* host assoc = */ true) != MATCH_YES
+	  || gfc_match (" ) ") != MATCH_YES)
+	return MATCH_ERROR;
+  break;
 case MATCH_NO: proc_name = NULL; needs_space = true; break;
 case MATCH_ERROR: return MATCH_ERROR;
 }
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index de27ed52c02..22e6dd254c7 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2798,6 +2798,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 			  }
 			else
 			  {
+			if (last_step == NULL_TREE)
+			  last_step = size_one_node;
 			if (kind == OMP_CLAUSE_LINEAR_REF)
 			  {
 tree type;
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-simd-4.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-simd-4.f90
new file mode 100644
index 000..44132525963
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-simd-4.f90
@@ -0,0 +1,42 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-gimple" }
+!
+! PR fortran/106566
+!
+! { dg-final { scan-tree-dump-times "__attribute__\\(\\(omp declare simd \\(linear\\(0:ref,step\\(4\\)\\) simdlen\\(8\\)\\)\\)\\)" 2 "gimple" } }
+! { dg-final { scan-tree-dump-times "__attribute__\\(\\(omp declare simd \\(linear\\(0:ref,step\\(8\\)\\) simdlen\\(8\\)\\)\\)\\)" 2 "gimple" } }
+
+subroutine add_one2(p)
+  implicit none
+  !$omp declare simd(add_one2) linear(p: ref) simdlen(8)
+  integer :: p
+
+  p = p + 1
+end subroutine
+
+subroutine linear_add_one2(p)
+  implicit none
+  !$omp declare simd(linear_add_one2) linear(p: ref, step(2)) simdlen(8)
+  integer :: p
+
+  p = p + 1
+end subroutine
+
+module m
+   integer, parameter :: NN = 1023
+   integer :: a(NN)
+contains
+  subroutine module_add_one2(q)
+implicit none
+!$omp declare simd(module_add_one2) linear(q: ref) simdlen(8)
+integer :: q
+q = q + 1
+  end subroutine
+
+  subroutine linear_add_one2(q)
+implicit none
+!$omp declare simd(linear_add_one2) linear(q: ref, step(2)) simdlen(8)
+integer :: q
+q = q + 1
+  end subroutine
+end module
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-simd-5.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-simd-5.f90
new file mode 100644
index 000..f5880f50090
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-simd-5.f90
@@ -0,0 +1,49 @@
+! { dg-do compile }
+!
+! PR fortran/106566
+!
+
+subroutine add_one2(p)
+  implicit none
+  procedure(add_one2) :: ext1
+  !$omp declare simd(ext1) linear(p: ref) simdlen(8)  ! { dg-error "OMP DECLARE SIMD should refer to containing procedure 'add_one2'" }
+  integer :: p
+
+  p = p + 1
+end subroutine
+
+subroutine 

[PATCH] Refactor back_threader_profitability

2022-08-16 Thread Richard Biener via Gcc-patches
The following refactors profitable_path_p in the backward threader,
splitting out parts that can be computed once the exit block is known,
parts that contiguously update and that can be checked allowing
for the path to be later identified as FSM with larger limits,
possibly_profitable_path_p, and final checks done when the whole
path is known, profitable_path_p.

I've removed the back_threader_profitability instance from the
back_threader class and instead instantiate it once per path
discovery.  I've kept the size compute non-incremental to simplify
the patch and not worry about unwinding.

There are key changes to previous behavior - namely we apply
param_max_jump_thread_duplication_stmts early only when
we know the path cannot become an FSM one (multiway + thread through
latch), but make sure to elide the path query when we didn't
yet discover that but are over this limit.  Similarly the
speed limit is now used even when we did not yet discover a
hot BB on the path.  Basically the idea is to only stop path
discovery when we know the path will never become profitable
but avoid the expensive path range query when we know it's
currently not.

I've done a few cleanups, merging functions, on the way.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Statistics show an overall slight increase in threading but
looking at different files there's noise up and down.  That's
somewhat expected since we now are applying the "more correct"
limits in the end.  Unless I made big mistakes of course.

The next thing cost-wise would be to raise the backwards
threading limit to the limit of DOM so we don't get
artificial high counts for that.

OK?

Thanks,
Richard.

* tree-ssa-threadbackward.cc
(back_threader_profitability): Split profitable_path_p
into possibly_profitable_path_p and itself, keep state
as new members.
(back_threader::m_profit): Remove.
(back_threader::find_paths): Likewise.
(back_threader::maybe_register_path): Take profitability
instance as parameter.
(back_threader::find_paths_to_names): Likewise.  Use
possibly_profitable_path_p and avoid the path range query
when the path is currently too large.
(back_threader::find_paths): Fold into ...
(back_threader::maybe_thread_block): ... this.
(get_gimple_control_stmt): Remove.
(back_threader_profitability::possibly_profitable_path_p):
Split out from profitable_path_p, do early profitability
checks.
(back_threader_profitability::profitable_path_p): Do the final
profitability check after the taken edge has been determined.
---
 gcc/tree-ssa-threadbackward.cc | 350 ++---
 1 file changed, 192 insertions(+), 158 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 1c362839fab..077a767e73d 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -61,15 +61,33 @@ public:
 class back_threader_profitability
 {
 public:
-  back_threader_profitability (bool speed_p)
-: m_speed_p (speed_p)
-  { }
-  bool profitable_path_p (const vec &, tree name, edge taken,
- bool *irreducible_loop = NULL);
+  back_threader_profitability (bool speed_p, gimple *stmt);
+  bool possibly_profitable_path_p (const vec &, tree, bool *);
+  bool profitable_path_p (const vec &,
+ edge taken, bool *irreducible_loop);
 private:
   const bool m_speed_p;
+  int m_exit_jump_benefit;
+  bool m_threaded_multiway_branch;
+  // The following are computed by possibly_profitable_path_p
+  bool m_threaded_through_latch;
+  bool m_multiway_branch_in_path;
+  bool m_contains_hot_bb;
+  int m_n_insns;
 };
 
+back_threader_profitability::back_threader_profitability (bool speed_p,
+ gimple *last)
+  : m_speed_p (speed_p)
+{
+  m_threaded_multiway_branch = (gimple_code (last) == GIMPLE_SWITCH
+   || gimple_code (last) == GIMPLE_GOTO);
+  // The forward threader has estimate_threading_killed_stmts, in
+  // particular it estimates further DCE from eliminating the exit
+  // control stmt.
+  m_exit_jump_benefit = estimate_num_insns (last, _size_weights);
+}
+
 // Back threader flags.
 #define BT_NONE 0
 // Generate fast code at the expense of code size.
@@ -86,11 +104,11 @@ public:
   unsigned thread_blocks ();
 private:
   void maybe_thread_block (basic_block bb);
-  void find_paths (basic_block bb, tree name);
   bool debug_counter ();
-  edge maybe_register_path ();
+  edge maybe_register_path (back_threader_profitability &);
   void maybe_register_path_dump (edge taken_edge);
-  void find_paths_to_names (basic_block bb, bitmap imports, unsigned);
+  void find_paths_to_names (basic_block bb, bitmap imports, unsigned,
+   back_threader_profitability &);
   edge find_taken_edge (const vec );
   edge 

[PATCH] Avoid further recomputations in path_range_query once path is finalized.

2022-08-16 Thread Aldy Hernandez via Gcc-patches
[Richi, I'm trying to make things more obvious for others working on the
code base.  What do you think?]

This makes a few things explicit to avoid misuse.  First, we add a
flag to differentiate between a path whose dependency ranges are being
computed, and a path that is finalized and no further calculations
should be done.  I've also enhanced the comments to document what
queries are supported.  Finally, I added some asserts to make sure
range_of_expr is only used on the final conditional once the path has
been finalized.

Unfortunately, the forward threader builds dummy statements it passes
to the path solver.  Most of these don't have a BB associated with
them.  I've cleared the BB field for the one that still had one, to
make the asserts above work.

No difference in thread counts over my cc1 .ii files.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::path_range_query):
Initialize m_path_finalized.
(path_range_query::internal_range_of_expr): Avoid further
recomputations once path is finalized.
Take basic_block instead of stmt as argument.
(path_range_query::range_of_expr): Document.  Add asserts.
(path_range_query::compute_ranges): Set m_path_finalized.
* gimple-range-path.h (path_range_query::internal_range_of_expr):
Replace statement argument with basic block.
(class path_range_query): Add m_path_finalized.
* tree-ssa-threadedge.cc
(jump_threader::simplify_control_stmt_condition): Clear BB field
in dummy_switch.
---
 gcc/gimple-range-path.cc   | 50 ++
 gcc/gimple-range-path.h|  7 +-
 gcc/tree-ssa-threadedge.cc |  1 +
 3 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index c99d77dd340..a8412dd090b 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -40,7 +40,8 @@ path_range_query::path_range_query (bool resolve, 
gimple_ranger *ranger)
   : m_cache (new ssa_global_cache),
 m_has_cache_entry (BITMAP_ALLOC (NULL)),
 m_resolve (resolve),
-m_alloced_ranger (!ranger)
+m_alloced_ranger (!ranger),
+m_path_finalized (false)
 {
   if (m_alloced_ranger)
 m_ranger = new gimple_ranger;
@@ -159,7 +160,7 @@ path_range_query::range_on_path_entry (vrange , tree name)
 // Return the range of NAME at the end of the path being analyzed.
 
 bool
-path_range_query::internal_range_of_expr (vrange , tree name, gimple *stmt)
+path_range_query::internal_range_of_expr (vrange , tree name, basic_block bb)
 {
   if (!r.supports_type_p (TREE_TYPE (name)))
 return false;
@@ -174,8 +175,16 @@ path_range_query::internal_range_of_expr (vrange , tree 
name, gimple *stmt)
   return true;
 }
 
-  if (stmt
-  && range_defined_in_block (r, name, gimple_bb (stmt)))
+  // Avoid further recomputations once the path has been finalized.
+  if (m_path_finalized)
+{
+  gimple_range_global (r, name);
+  return true;
+}
+
+  // We're being called as part of the calculation of ranges for exit
+  // dependencies.  Set the cache as we traverse the path top-down.
+  if (bb && range_defined_in_block (r, name, bb))
 {
   if (TREE_CODE (name) == SSA_NAME)
{
@@ -192,10 +201,37 @@ path_range_query::internal_range_of_expr (vrange , tree 
name, gimple *stmt)
   return true;
 }
 
+// This, as well as range_of_stmt, are the main entry points for
+// making queries about a path.
+//
+// Once the ranges for the exit dependencies have been pre-calculated,
+// the path is considered finalized, and the only valid query is
+// asking the range of a NAME at the end of the path.
+//
+// Note that this method can also be called internally before the path
+// is finalized, as part of the path traversal pre-calculating the
+// ranges for exit dependencies.  In this case, it may be called on
+// statements that are not the final conditional as described above.
+
 bool
 path_range_query::range_of_expr (vrange , tree name, gimple *stmt)
 {
-  if (internal_range_of_expr (r, name, stmt))
+  basic_block bb = stmt ? gimple_bb (stmt) : NULL;
+
+  // Once a path is finalized, the only valid queries are of the final
+  // statement in the exit block.
+  if (flag_checking && m_path_finalized && stmt)
+{
+  // The forward threader may query dummy statements without a
+  // basic block.
+  if (bb)
+   {
+ gcc_assert (stmt == last_stmt (bb));
+ gcc_assert (bb == exit_bb ());
+   }
+}
+
+  if (internal_range_of_expr (r, name, bb))
 {
   if (r.undefined_p ())
m_undefined_path = true;
@@ -608,6 +644,7 @@ path_range_query::compute_ranges (const vec 
,
 
   set_path (path);
   m_undefined_path = false;
+  m_path_finalized = false;
 
   if (dependencies)
 bitmap_copy (m_exit_dependencies, dependencies);
@@ -642,6 +679,7 @@ path_range_query::compute_ranges (const vec 
,
 
   move_next ();
 }
+  

[PATCH] Fix bogus -Wstringop-overflow warning in Ada

2022-08-16 Thread Eric Botcazou via Gcc-patches
Hi,

the following bogus warning:

In function 'lto26',
inlined from 'main' at /home/eric/gnat/bugs/V721-018/b~lto26.adb:237:7:
lto26.adb:11:13: warning: writing 1 byte into a region of size 0 [-Wstringop-
overflow=]
   11 | Set (R, (7, 0, 84, Stream_Element (I), 0, 0, 0), 1);
  | ^
lto26.adb: In function 'main':
lto26.adb:11:50: note: at offset -9223372036854775808 into destination object 
'A.0' of size 7
   11 | Set (R, (7, 0, 84, Stream_Element (I), 0, 0, 0), 1);
  |  ^

comes from a discrepancy between get_offset_range, which uses a signed type, 
and handle_array_ref, which uses an unsigned one, to do offset computations.

Tested on x86-64/Linux, OK for the mainline?


2022-08-16  Eric Botcazou  

* pointer-query.cc (handle_array_ref): Fix handling of low bound.


2022-08-16  Eric Botcazou  

* gnat.dg/lto26.adb: New test.
* gnat.dg/lto26_pkg1.ads, gnat.dg/lto26_pkg1.adb: New helper.
* gnat.dg/lto26_pkg2.ads, gnat.dg/lto26_pkg2.adb: Likewise.

-- 
Eric Botcazou

diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index ae561731216..0f0100233c1 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc
@@ -1796,14 +1796,19 @@ handle_array_ref (tree aref, gimple *stmt, bool addr, int ostype,
   orng[0] = -orng[1] - 1;
 }
 
-  /* Convert the array index range determined above to a byte
- offset.  */
+  /* Convert the array index range determined above to a byte offset.  */
   tree lowbnd = array_ref_low_bound (aref);
-  if (!integer_zerop (lowbnd) && tree_fits_uhwi_p (lowbnd))
-{
-  /* Adjust the index by the low bound of the array domain
-	 (normally zero but 1 in Fortran).  */
-  unsigned HOST_WIDE_INT lb = tree_to_uhwi (lowbnd);
+  if (TREE_CODE (lowbnd) == INTEGER_CST && !integer_zerop (lowbnd))
+{
+  /* Adjust the index by the low bound of the array domain (0 in C/C++,
+	 1 in Fortran and anything in Ada) by applying the same processing
+	 as in get_offset_range.  */
+  const wide_int wlb = wi::to_wide (lowbnd);
+  signop sgn = SIGNED;
+  if (TYPE_UNSIGNED (TREE_TYPE (lowbnd))
+	  && wlb.get_precision () < TYPE_PRECISION (sizetype))
+	sgn = UNSIGNED;
+  const offset_int lb = offset_int::from (wlb, sgn);
   orng[0] -= lb;
   orng[1] -= lb;
 }
with Lto26_Pkg2; use Lto26_Pkg2;

package body Lto26_Pkg1 is

  procedure Set (R : Rec; A : Stream_Element_Array; C :Unsigned_8) is
procedure My_Build is new Build;
  begin
 My_Build (A, C);
  end;

end Lto26_Pkg1;
with Ada.Finalization;
with Ada.Streams;  use Ada.Streams;
with Interfaces; use Interfaces;

package Lto26_Pkg1 is

  type Rec is new Ada.Finalization.Limited_Controlled with null record;

  procedure Set (R : Rec; A : Stream_Element_Array; C :Unsigned_8);

end Lto26_Pkg1;
with Ada.Streams; use Ada.Streams;
with Interfaces; use Interfaces;

package Lto26_Pkg2 is

  generic
  procedure Build (A : Stream_Element_Array; C : Unsigned_8);

end Lto26_Pkg2;
package body Lto26_Pkg2 is

  procedure Build (A : Stream_Element_Array; C : Unsigned_8) is
Start  : Stream_Element_Offset := A'First;
Next   : Stream_Element_Offset;
Length : Unsigned_8;
  begin
for I in 1 .. C loop
  Length := Unsigned_8 (A (A'First));
  Next   := Start + Stream_Element_Offset (Length);
  Start  := Next;
end loop;
  end;

end Lto26_Pkg2;
-- { dg-do run }
-- { dg-options "-O2 -flto" { target lto } }

with Ada.Streams; use Ada.Streams;
with Lto26_Pkg1; use Lto26_Pkg1;

procedure Lto26 is
  R : Rec;
begin
  for I in 1 .. 10 loop
Set (R, (7, 0, 84, Stream_Element (I), 0, 0, 0), 1);
  end loop;
end;


Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 16, 2022 at 3:48 PM Andrew MacLeod  wrote:
>
>
> On 8/16/22 06:25, Aldy Hernandez wrote:
> > On Mon, Aug 15, 2022 at 11:53 AM Richard Biener  wrote:
> >> The remaining issue I have with the path_range_query is that
> >> we re-use the same instance in the back threader but the
> >> class doesn't provide any way to "restart", aka give m_path
> >> a lifetime.  The "start a new path" API seems to essentially
> >> be compute_ranges (), but there's no convenient way to end.
> >> It might be more appropriate to re-instantiate the path_range_query,
> >> though that comes at a cost.  Or abstract an actual query, like
> >> adding a
> > Yes, compute_ranges() is the way to start a new path.  It resets exit
> > dependencies, the path, relations, etc.  I think it would be clearer
> > to name it set_path (or reset_path if we want to share nomenclature
> > with the path_oracle).
> >
> > Instantiating a new path_range_query per path is fine, as long as you
> > allocate the ranger it uses yourself, instead of letting
> > path_range_query allocate it.  Instantiating a new ranger does have a
> > cost, and it's best to let path_range_query re-use a ranger from path
> > to path.  This is why path_range_query is (class) global in the
> > backwards threader.  Andrew mentioned last year making the ranger
> > start-up 0-cost, but it still leaves the internal caching the ranger
> > will do from path to path (well, the stuff outside the current path,
> > cause the stuff inside the path is irrelevant since it'll get
> > recalculated).
>
> Yes, you will want to have one instance of ranger regardless... just
> pass it to whatever/however many other instances you want to build paths
> from.
>
> Ranger itself is primarily to provide range-on-entry to the path.
> Trying to use it for values within the path would bring in values
> outside the path as it doesn't understand you have selected on certain
> edges along the way.
>
> The GORI engine within ranger can be utilized within the path because
> GORI never looks outside the basic block being asked about, other than
> through the range-query that is provided to it.  So it's perfectly safe to
> use within the path.
>
> As both GORI and ranger cache things and share the def chains, it's far
> more efficient to have a global instance that is just utilized.  Even a
> zero-cost start-up would incur costs as it recalculates the same things
> over and over.

I forgot about the def chains.  That should be fine however, since we
use the gori from within the ranger that got passed down.

Aldy



Re: Where in C++ module streaming to handle a new bitfield added in "tree_decl_common"

2022-08-16 Thread Qing Zhao via Gcc-patches


> On Aug 16, 2022, at 8:37 AM, Richard Biener  
> wrote:
> 
> On Tue, Aug 16, 2022 at 2:16 PM Nathan Sidwell  wrote:
>> 
>> On 8/15/22 10:03, Richard Biener wrote:
>>> On Mon, Aug 15, 2022 at 3:29 PM Nathan Sidwell via Gcc-patches
>>>  wrote:
 
 On 8/2/22 10:44, Qing Zhao wrote:
> Hi, Nathan,
> 
> I am adding a new bitfield “decl_not_flexarray” in “tree_decl_common”  
> (gcc/tree-core.h) for the new gcc feature -fstrict-flex-arrays.
> 
> 
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index ea9f281f1cc..458c6e6ceea 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1813,7 +1813,10 @@ struct GTY(()) tree_decl_common {
>   TYPE_WARN_IF_NOT_ALIGN.  */
>unsigned int warn_if_not_align : 6;
> 
> -  /* 14 bits unused.  */
> +  /* In FIELD_DECL, this is DECL_NOT_FLEXARRAY.  */
> +  unsigned int decl_not_flexarray : 1;
 
 Is it possible to invert the meaning here -- set the flag if it /IS/ a
 flexible array? negated flags can be confusing, and I see your patch
 sets it to '!is_flexible_array (...)' anyway?
>>> 
>>> The issue is it's consumed by the middle-end but set by a single (or two)
>>> frontends and the conservative setting is having the bit not set.  That 
>>> works
>>> nicely together with touching just the frontends that want stricter behavior
>>> than currently ...
>> 
>> Makes sense, but is the comment incomplete?  I'm guessing this flag is
>> for FIELD_DECLs /of array type/, and not just any old FIELD_DECL?  After
>> all a field of type int is not a flexible array, but presumably doesn't
>> need this flag setting?
> 
> Yes, the docs should be more complete in tree.h on the actual 
> DECL_NOT_FLEXARRAY
> definition.

Okay, will add more comments in tree.h to make the DECL_NOT_FLEXARRAY more 
complete.

thanks.

Qing
> 
> Richard.
> 
>> nathan
>> 
>> --
>> Nathan Sidwell



Ping^2: 2 libcpp patches

2022-08-16 Thread Lewis Hyatt via Gcc-patches
On Wed, Jul 20, 2022 at 8:56 PM Lewis Hyatt  wrote:
>
> Hello-
>
> May I please ping these two preprocessor patches?
>
> For PR103902:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html
>
> For PR55971:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596820.html
>
> Thanks!

Hello-

I would very much appreciate feedback on these two patches, please.

For the first patch, I think it is a worthwhile goal to fix all the
places where libcpp fails to support UTF-8 correctly, and this is one
of two remaining ones that I'm aware of. I can fix the other case
(handling of #pragma push_macro) once this one is in place.

The second patch is about libcpp not allowing raw strings containing
newlines in preprocessor directives, which is a nearly decade-old
glitch that I think is also worth addressing.

Thanks!

-Lewis


Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Andrew MacLeod via Gcc-patches



On 8/16/22 06:25, Aldy Hernandez wrote:

On Mon, Aug 15, 2022 at 11:53 AM Richard Biener  wrote:

The remaining issue I have with the path_range_query is that
we re-use the same instance in the back threader but the
class doesn't provide any way to "restart", aka give m_path
a lifetime.  The "start a new path" API seems to essentially
be compute_ranges (), but there's no convenient way to end.
It might be more appropriate to re-instantiate the path_range_query,
though that comes at a cost.  Or abstract an actual query, like
adding a

Yes, compute_ranges() is the way to start a new path.  It resets exit
dependencies, the path, relations, etc.  I think it would be clearer
to name it set_path (or reset_path if we want to share nomenclature
with the path_oracle).

Instantiating a new path_range_query per path is fine, as long as you
allocate the ranger it uses yourself, instead of letting
path_range_query allocate it.  Instantiating a new ranger does have a
cost, and it's best to let path_range_query re-use a ranger from path
to path.  This is why path_range_query is (class) global in the
backwards threader.  Andrew mentioned last year making the ranger
start-up 0-cost, but it still leaves the internal caching the ranger
will do from path to path (well, the stuff outside the current path,
cause the stuff inside the path is irrelevant since it'll get
recalculated).


Yes, you will want to have one instance of ranger regardless... just 
pass it to whatever/however many other instances you want to build paths 
from.


Ranger itself is primarily to provide range-on-entry to the path.  
Trying to use it for values within the path would bring in values 
outside the path as it doesn't understand you have selected on certain 
edges along the way.


The GORI engine within ranger can be utilized within the path because 
GORI never looks outside the basic block being asked about, other than 
through the range-query that is provided to it.  So it's perfectly safe to 
use within the path.


As both GORI and ranger cache things and share the def chains, it's far 
more efficient to have a global instance that is just utilized.  Even a 
zero-cost start-up would incur costs as it recalculates the same things 
over and over.



Andrew



Re: Where in C++ module streaming to handle a new bitfield added in "tree_decl_common"

2022-08-16 Thread Qing Zhao via Gcc-patches


> On Aug 15, 2022, at 9:28 AM, Nathan Sidwell  wrote:
> 
> On 8/2/22 10:44, Qing Zhao wrote:
>> Hi, Nathan,
>> I am adding a new bitfield “decl_not_flexarray” in “tree_decl_common”  
>> (gcc/tree-core.h) for the new gcc feature -fstrict-flex-arrays.
>> 
>> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
>> index ea9f281f1cc..458c6e6ceea 100644
>> --- a/gcc/tree-core.h
>> +++ b/gcc/tree-core.h
>> @@ -1813,7 +1813,10 @@ struct GTY(()) tree_decl_common {
>>  TYPE_WARN_IF_NOT_ALIGN.  */
>>   unsigned int warn_if_not_align : 6;
>> -  /* 14 bits unused.  */
>> +  /* In FIELD_DECL, this is DECL_NOT_FLEXARRAY.  */
>> +  unsigned int decl_not_flexarray : 1;
> 
> Is it possible to invert the meaning here -- set the flag if it /IS/ a 
> flexible array? negated flags can be confusing, and I see your patch sets it 
> to '!is_flexible_array (...)' anyway?
> 
>> +
>> +  /* 13 bits unused.  */
>>   /* UID for points-to sets, stable over copying from inlining.  */
>>   unsigned int pt_uid;
>> 
>> (Please refer to the following for details:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598556.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598965.html
> 
> 
> 
>> )
>> Richard mentioned the following:
>> "I've not seen it so you are probably missing it - the bit has to be
>> streamed in tree-streamer-{in,out}.cc to be usable from LTO.  Possibly
>> C++ module streaming also needs to handle it.”
>> I have figured out that where to add the handling of the bit in 
>> “tree-streamer-{in, out}.cc,
>> However, it’s quite difficult for me to locate where should I add the 
>> handling of this new bit in
>> C++ module streaming,  could you please help me on this?
> 
> 
> add it in to trees_{in,out}::core_bools.  You could elide streaming for 
> non-FIELD_DECL decls.

Got it. Thanks a lot.

Qing
> 
> Hope that helps.
> 
> nathan
> 
> 
> 
>> Thanks a lot for your help.
>> Qing
> 
> 
> -- 
> Nathan Sidwell



Re: [PATCH] Support threading of just the exit edge

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, 16 Aug 2022, Richard Biener wrote:

> On Tue, 16 Aug 2022, Aldy Hernandez wrote:
> 
> > On Tue, Aug 16, 2022 at 11:18 AM Richard Biener  wrote:
> > >
> > > On Mon, 15 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod  
> > > > wrote:
> > > > >
> > > > > heh. or just
> > > > >
> > > > >
> > > > > +  int_range<2> r;
> > > > > +  if (!fold_range (r, const_cast  (cond_stmt))
> > > > > +  || !r.singleton_p ())
> > > > >
> > > > >
> > > > > if you do not provide a range_query to any of the fold_using_range 
> > > > > code,
> > > > > it defaults to:
> > > > >
> > > > > fur_source::fur_source (range_query *q)
> > > > > {
> > > > >if (q)
> > > > >  m_query = q;
> > > > >else if (cfun)
> > > > >  m_query = get_range_query (cfun);
> > > > >else
> > > > >  m_query = get_global_range_query ();
> > > > >m_gori = NULL;
> > > > > }
> > > > >
> > > >
> > > > Sweet.  Even better!
> > >
> > > So when I do the following incremental change ontop of the posted
> > > patch then I see that the path query is able to simplify more
> > > "single BB paths" than the global range folding.
> > >
> > > diff --git a/gcc/tree-ssa-threadbackward.cc
> > > b/gcc/tree-ssa-threadbackward.cc
> > > index 669098e4ec3..777e778037f 100644
> > > --- a/gcc/tree-ssa-threadbackward.cc
> > > +++ b/gcc/tree-ssa-threadbackward.cc
> > > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const
> > > vec<basic_block> &path,
> > >  {
> > >int_range_max r;
> > >
> > > +  int_range<2> rf;
> > > +  if (path.length () == 1)
> > > +{
> > > +  fold_range (rf, cond);
> > > +}
> > > +
> > >m_solver->compute_ranges (path, m_imports);
> > >m_solver->range_of_stmt (r, cond);
> > >
> > > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const
> > > vec<basic_block> &path,
> > >
> > >if (r == true_range || r == false_range)
> > >  {
> > > +  if (path.length () == 1)
> > > +   gcc_assert  (r == rf);
> > >edge e_true, e_false;
> > >basic_block bb = gimple_bb (cond);
> > >extract_true_false_edges_from_block (bb, &e_true, &e_false);
> > >
> > > Even doing the following (not sure what's the difference and in
> > > particular expense over the path range query) results in missed
> > > simplifications (checking my set of cc1files).
> > >
> > > diff --git a/gcc/tree-ssa-threadbackward.cc
> > > b/gcc/tree-ssa-threadbackward.cc
> > > index 669098e4ec3..1d43a179d08 100644
> > > --- a/gcc/tree-ssa-threadbackward.cc
> > > +++ b/gcc/tree-ssa-threadbackward.cc
> > > @@ -99,6 +99,7 @@ private:
> > >
> > >back_threader_registry m_registry;
> > >back_threader_profitability m_profit;
> > > +  gimple_ranger *m_ranger;
> > >path_range_query *m_solver;
> > >
> > >// Current path being analyzed.
> > > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun,
> > > unsigned flags, bool first)
> > >// The path solver needs EDGE_DFS_BACK in resolving mode.
> > >if (flags & BT_RESOLVE)
> > >  mark_dfs_back_edges ();
> > > -  m_solver = new path_range_query (flags & BT_RESOLVE);
> > > +  m_ranger = new gimple_ranger;
> > > +  m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger);
> > >  }
> > 
> > Passing an allocated ranger here results in fewer simplifications than
> > letting path_range_query allocate its own?  That's not right.  Or do
> > you mean that using fold_range() with the m_ranger causes ICEs with
> > your patch (due to the non-null processing described below)?
> 
> Yes, I've needed a ranger to use fold_range (..., m_ranger) which
> I thought might do more than not passing one.
> 
> > >
> > >  back_threader::~back_threader ()
> > >  {
> > >delete m_solver;
> > > +  delete m_ranger;
> > >
> > >loop_optimizer_finalize ();
> > >  }
> > > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const
> > > vec<basic_block> &path,
> > >  {
> > >int_range_max r;
> > >
> > > +  int_range<2> rf;
> > > +  if (path.length () == 1)
> > > +{
> > > +  fold_range (rf, cond, m_ranger);
> > > +}
> > > +
> > >m_solver->compute_ranges (path, m_imports);
> > >m_solver->range_of_stmt (r, cond);
> > >
> > > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const
> > > vec<basic_block> &path,
> > >
> > >if (r == true_range || r == false_range)
> > >  {
> > > +  if (path.length () == 1)
> > > +   gcc_assert  (r == rf);
> > >edge e_true, e_false;
> > >basic_block bb = gimple_bb (cond);
> > >extract_true_false_edges_from_block (bb, &e_true, &e_false);
> > >
> > > one example is
> > >
> > >  [local count: 14414059]:
> > > _128 = node_177(D)->typed.type;
> > > pretmp_413 = MEM[(const union tree_node *)_128].base.code;
> > > _431 = pretmp_413 + 65519;
> > > if (_128 == 0B)
> > >   goto ; [18.09%]
> > > else
> > >   goto ; [81.91%]
> > >
> > > where m_imports for the path is just _128 and the range computed is
> > > false while the ranger query returns VARYING.  But
> > > 

Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Andrew MacLeod via Gcc-patches



On 8/16/22 05:28, Richard Biener wrote:

On Tue, 16 Aug 2022, Aldy Hernandez wrote:


On Tue, Aug 16, 2022 at 11:08 AM Aldy Hernandez  wrote:

On Tue, Aug 16, 2022 at 10:32 AM Richard Biener  wrote:

On Tue, 16 Aug 2022, Aldy Hernandez wrote:


On Thu, Aug 11, 2022 at 1:42 PM Richard Biener  wrote:


@@ -599,6 +592,30 @@ path_range_query::compute_imports (bitmap imports, const 
vec<basic_block> &path)
 worklist.safe_push (arg);
 }
 }
+  else if (gassign *ass = dyn_cast <gassign *> (def_stmt))
+   {
+ tree ssa[3];
+ if (range_op_handler (ass))
+   {
+ ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
+ ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
+ ssa[2] = NULL_TREE;
+   }
+ else if (gimple_assign_rhs_code (ass) == COND_EXPR)
+   {
+ ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
+ ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
+ ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
+   }
+ else
+   continue;
+ for (unsigned j = 0; j < 3; ++j)
+   {
+ tree rhs = ssa[j];
+ if (rhs && add_to_imports (rhs, imports))
+   worklist.safe_push (rhs);
+   }
+   }

We seem to have 3 copies of this copy now: this one, the
threadbackward one, and the original one.

Could we abstract this somehow?

I've thought about this but didn't find any good solution since the
use of the operands is always a bit different.  But I was wondering
why/if the COND_EXPR special-casing is necessary, that is, why
don't we have a range_op_handler for it and if we don't why
do we care about it?


Simply that range-ops is defined as always having one or two use operands;
everything is streamlined for those common cases.  There is no support
in the API for a third field.  All non-conforming stmts are "unique" and
thus require some level of specialization.  I am not aware of any other
three-operand stmt we would care about, so at the time it seemed easier to
leave COND_EXPR as a specialization, in theory localized.


As for the abstraction, it's easy enough to provide a routine which will
take a vector and fill it with the ssa-names we might care about on the
stmt; then each caller can do its own little specialized bit.  I never
expected that sequence to become commonplace :-)



I think it's because we don't have a range-op handler for COND_EXPR,
opting to handle the relational operators instead in range-ops.  We
have similar code in the folder:

   if (range_op_handler (s))
 res = range_of_range_op (r, s, src);
   else if (is_a <gphi *> (s))
 res = range_of_phi (r, as_a <gphi *> (s), src);
   else if (is_a <gcall *> (s))
 res = range_of_call (r, as_a <gcall *> (s), src);
   else if (is_a <gassign *> (s) && gimple_assign_rhs_code (s) == COND_EXPR)
 res = range_of_cond_expr (r, as_a <gassign *> (s), src);

Andrew, do you have any suggestions here?

Hmmm, so thinking about this, perhaps special-casing it is the way to go?

It looks like so.  Though a range_op_handler could, for
_1 = _2 ? _3 : _4; derive a range for _3 from _1 if _2 is
known true?

The fold_using_range class does this for the forward direction, and my 
intention (I just haven't gotten to it yet) was to add similar 
specialization in GORI rather than expanding range-ops just for the one 
stmt, in much the same way that range_of_cond_expr works.


It should be localized to gori_compute::compute_operand_range (), where:

  range_op_handler handler (stmt);
  if (!handler)
    return false;

before returning, we check if it's a COND_EXPR and, if so, call a 
specialized compute routine to look at the various operands and invoke 
any continued operand ranges as appropriate.  It should be quite trivial.


  I'll take a look at adding COND_EXPR to GORI, and provide that 
routine somewhere, probably as a function in 
gimple-range-fold.h so it can potentially apply to any stmt.


Andrew



Re: [PATCH] fortran: Expand ieee_arithmetic module's ieee_value inline [PR106579]

2022-08-16 Thread FX via Gcc-patches
Hi,

>> Why looping over fields? The class type is a simple type with only one 
>> member (and it should be an integer, we can assert that).
> 
> I wanted to make sure it has exactly one field.
> The ieee_arithmetic.F90 module in libgfortran surely does that, but I've
> been worrying about some user overriding that module with something
> different.

In the Fortran world it would be a rare and crazy thing to do, but I see.  Could 
you add a comment (pointing to the type definition in libgfortran)?


> The libgfortran version had default: label:
>switch (type) \
>{ \
>  case IEEE_SIGNALING_NAN: \
>return __builtin_nans ## SUFFIX (""); \
>  case IEEE_QUIET_NAN: \
>return __builtin_nan ## SUFFIX (""); \
>  case IEEE_NEGATIVE_INF: \
>return - __builtin_inf ## SUFFIX (); \
>  case IEEE_NEGATIVE_NORMAL: \
>return -42; \
>  case IEEE_NEGATIVE_DENORMAL: \
>return -(GFC_REAL_ ## TYPE ## _TINY) / 2; \
>  case IEEE_NEGATIVE_ZERO: \
>return -(GFC_REAL_ ## TYPE) 0; \
>  case IEEE_POSITIVE_ZERO: \
>return 0; \
>  case IEEE_POSITIVE_DENORMAL: \
>return (GFC_REAL_ ## TYPE ## _TINY) / 2; \
>  case IEEE_POSITIVE_NORMAL: \
>return 42; \
>  case IEEE_POSITIVE_INF: \
>return __builtin_inf ## SUFFIX (); \
>  default: \
>return 0; \
>} \
> and I've tried to translate that into what it generates.

OK, that makes sense. But:

> There is at least the IEEE_OTHER_VALUE (aka 0) value
> that isn't covered in the switch, but it is just an integer
> under the hood, so it could have any other value.

I think I originally included the default for IEEE_OTHER_VALUE, but the 
standard says IEEE_OTHER_VALUE as an argument to IEEE_VALUE is not allowed. If 
removing the default would make the generated code shorter, I suggest we do it; 
otherwise let’s keep it.

With those two caveats, the patch is OK. We shouldn’t touch the library code 
for now, but when the patch is committed we can add the removal of IEEE_VALUE 
and IEEE_CLASS from the library to this list: 
https://gcc.gnu.org/wiki/LibgfortranAbiCleanup

FX

[PATCH] xtensa: Prevent emitting integer additions of constant zero

2022-08-16 Thread Takayuki 'January June' Suwa via Gcc-patches
In a few cases, obviously omittable add instructions can be emitted by
invoking gen_addsi3.

gcc/ChangeLog:

* config/xtensa/xtensa.md (addsi3_internal): Rename from "addsi3".
(addsi3): New define_expand in order to reject integer additions of
constant zero.
---
 gcc/config/xtensa/xtensa.md | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 9eeb73915..c132c1626 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -156,7 +156,19 @@
 
 ;; Addition.
 
-(define_insn "addsi3"
+(define_expand "addsi3"
+  [(set (match_operand:SI 0 "register_operand")
+   (plus:SI (match_operand:SI 1 "register_operand")
+(match_operand:SI 2 "add_operand")))]
+  ""
+{
+  if (! CONST_INT_P (operands[2]) || INTVAL (operands[2]) != 0)
+emit_insn (gen_addsi3_internal (operands[0],
+   operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "addsi3_internal"
   [(set (match_operand:SI 0 "register_operand" "=D,D,a,a,a")
(plus:SI (match_operand:SI 1 "register_operand" "%d,d,r,r,r")
 (match_operand:SI 2 "add_operand" "d,O,r,J,N")))]
-- 
2.20.1


Re: Where in C++ module streaming to handle a new bitfield added in "tree_decl_common"

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, Aug 16, 2022 at 2:16 PM Nathan Sidwell  wrote:
>
> On 8/15/22 10:03, Richard Biener wrote:
> > On Mon, Aug 15, 2022 at 3:29 PM Nathan Sidwell via Gcc-patches
> >  wrote:
> >>
> >> On 8/2/22 10:44, Qing Zhao wrote:
> >>> Hi, Nathan,
> >>>
> >>> I am adding a new bitfield “decl_not_flexarray” in “tree_decl_common”  
> >>> (gcc/tree-core.h) for the new gcc feature -fstrict-flex-arrays.
> >>>
> >>> 
> >>> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> >>> index ea9f281f1cc..458c6e6ceea 100644
> >>> --- a/gcc/tree-core.h
> >>> +++ b/gcc/tree-core.h
> >>> @@ -1813,7 +1813,10 @@ struct GTY(()) tree_decl_common {
> >>>TYPE_WARN_IF_NOT_ALIGN.  */
> >>> unsigned int warn_if_not_align : 6;
> >>>
> >>> -  /* 14 bits unused.  */
> >>> +  /* In FIELD_DECL, this is DECL_NOT_FLEXARRAY.  */
> >>> +  unsigned int decl_not_flexarray : 1;
> >>
> >> Is it possible to invert the meaning here -- set the flag if it /IS/ a
> >> flexible array? negated flags can be confusing, and I see your patch
> >> sets it to '!is_flexible_array (...)' anyway?
> >
> > The issue is it's consumed by the middle-end but set by a single (or two)
> > frontends and the conservative setting is having the bit not set.  That 
> > works
> > nicely together with touching just the frontends that want stricter behavior
> > than currently ...
>
> Makes sense, but is the comment incomplete?  I'm guessing this flag is
> for FIELD_DECLs /of array type/, and not just any old FIELD_DECL?  After
> all a field of type int is not a flexible array, but presumably doesn't
> need this flag setting?

Yes, the docs should be more complete in tree.h on the actual DECL_NOT_FLEXARRAY
definition.

Richard.

> nathan
>
> --
> Nathan Sidwell


Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 16, 2022, 14:26 Richard Biener  wrote:

> On Tue, 16 Aug 2022, Aldy Hernandez wrote:
>
> > On Tue, Aug 16, 2022 at 1:38 PM Richard Biener 
> wrote:
> > >
> > > On Tue, 16 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > On Mon, Aug 15, 2022 at 11:53 AM Richard Biener 
> wrote:
> > > > >
> > > > > The remaining issue I have with the path_range_query is that
> > > > > we re-use the same instance in the back threader but the
> > > > > class doesn't provide any way to "restart", aka give m_path
> > > > > a lifetime.  The "start a new path" API seems to essentially
> > > > > be compute_ranges (), but there's no convenient way to end.
> > > > > It might be more appropriate to re-instantiate the
> path_range_query,
> > > > > though that comes at a cost.  Or abstract an actual query, like
> > > > > adding a
> > > >
> > > > Yes, compute_ranges() is the way to start a new path.  It resets exit
> > > > dependencies, the path, relations, etc.  I think it would be clearer
> > > > to name it set_path (or reset_path if we want to share nomenclature
> > > > with the path_oracle).
> > > >
> > > > Instantiating a new path_range_query per path is fine, as long as you
> > > > allocate the ranger it uses yourself, instead of letting
> > > > path_range_query allocate it.  Instantiating a new ranger does have a
> > > > cost, and it's best to let path_range_query re-use a ranger from path
> > > > to path.  This is why path_range_query is (class) global in the
> > > > backwards threader.  Andrew mentioned last year making the ranger
> > > > start-up 0-cost, but it still leaves the internal caching the ranger
> > > > will do from path to path (well, the stuff outside the current path,
> > > > because the stuff inside the path is irrelevant since it'll get
> > > > recalculated).
> > > >
> > > > However, why can't you use compute_ranges (or whatever we rename it
> to ;-))??
> > >
> > > I've added
> > >
> > >auto_bb_flag m_on_path;
> > >
> > > to the path query and at set_path time set m_on_path on each BB so
> > > the m_path->contains () linear walks go away.  But I need to clear
> > > the flag for which I would need something like finish_path (),
> > > doing it just at the point we deallocate the path query object
> > > or when we set the next path via compute_ranges doesn't look right
> > > (and in fact it doesn't work out-of-the-box without adjusting the
> > > lifetime of the path query object).
> > >
> > > So a more incremental thing would be to add such finish_path ()
> > > or to make the whole path query object single-shot, thus remove
> > > compute_ranges and instead use the CTOR for this.
> > >
> > > Probably not too important (for short paths).
> >
> > On a high level, I wonder if this matters since we don't allow long
> > paths for other performance reasons you've already tackled.  But OTOH,
> > I've always been a little uncomfortable with contains_p linear search,
> > so if you think this makes a difference, go right ahead :).
> >
> > I'm fine with either the finish_path() or the single-shot thing you
> > speak of.  Although making path query immutable makes things cleaner
> > in the long run.  I like it!  My guess is that the non-ranger
> > instantiation penalty would be minimal.  I'd even remove the default
> > (auto-allocated) ranger from path_range_query, to make it obvious that
> > you need to manage that yourself and avoid folks shooting themselves
> > in the foot.
>
> We currently have
>
> path_range_query::path_range_query (bool resolve, gimple_ranger *ranger)
>   : m_cache (new ssa_global_cache),
> m_has_cache_entry (BITMAP_ALLOC (NULL)),
> m_resolve (resolve),
> m_alloced_ranger (!ranger)
> {
>   if (m_alloced_ranger)
> m_ranger = new gimple_ranger;
>   else
> m_ranger = ranger;
>
>   m_oracle = new path_oracle (m_ranger->oracle ());
>
>   if (m_resolve && flag_checking)
> verify_marked_backedges (cfun);
> }
>
> so at least verify_marked_backedges will explode, I suppose we
> want to hoist that somehow ...
>

Good point.


> then we allocate the path_oracle - that one does have a
> reset_path () function at least.  It's allocation looks
> quite harmless, but we should only need it when m_resolve?
>

Yes.


> > Wanna have a go at it?  If you'd rather not, I can work on it.
>
> If you have cycles go ahead - I'm fiddling with other parts of
> the threader right now.
>

Sure, I'll take a stab at it. Thanks for the other stuff you're doing on
the threader.

Aldy


> Richard.
>
>


Re: [PATCH] c++: Extend -Wpessimizing-move to other contexts

2022-08-16 Thread Marek Polacek via Gcc-patches
Ping.  (The other std::move patches depend on this one.)

(can_do_rvo_p is renamed to can_elide_copy_prvalue_p in the PR90428 patch.)

On Tue, Aug 02, 2022 at 07:04:47PM -0400, Marek Polacek via Gcc-patches wrote:
> In my recent patch which enhanced -Wpessimizing-move so that it warns
> about class prvalues too I said that I'd like to extend it so that it
> warns in more contexts where a std::move can prevent copy elision, such
> as:
> 
>   T t = std::move(T());
>   T t(std::move(T()));
>   T t{std::move(T())};
>   T t = {std::move(T())};
>   void foo (T);
>   foo (std::move(T()));
> 
> This patch does that by adding two maybe_warn_pessimizing_move calls.
> These must happen before we've converted the initializers otherwise the
> std::move will be buried in a TARGET_EXPR.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
>   PR c++/106276
> 
> gcc/cp/ChangeLog:
> 
>   * call.cc (build_over_call): Call maybe_warn_pessimizing_move.
>   * cp-tree.h (maybe_warn_pessimizing_move): Declare.
>   * decl.cc (build_aggr_init_full_exprs): Call
>   maybe_warn_pessimizing_move.
>   * typeck.cc (maybe_warn_pessimizing_move): Handle TREE_LIST and
>   CONSTRUCTOR.  Add a bool parameter and use it.  Adjust a diagnostic
>   message.
>   (check_return_expr): Adjust the call to maybe_warn_pessimizing_move.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/Wpessimizing-move7.C: Add dg-warning.
>   * g++.dg/cpp0x/Wpessimizing-move8.C: New test.
> ---
>  gcc/cp/call.cc|  5 +-
>  gcc/cp/cp-tree.h  |  1 +
>  gcc/cp/decl.cc|  3 +-
>  gcc/cp/typeck.cc  | 58 -
>  .../g++.dg/cpp0x/Wpessimizing-move7.C | 16 ++---
>  .../g++.dg/cpp0x/Wpessimizing-move8.C | 65 +++
>  6 files changed, 120 insertions(+), 28 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wpessimizing-move8.C
> 
> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> index 01a7be10077..370137ebd6d 100644
> --- a/gcc/cp/call.cc
> +++ b/gcc/cp/call.cc
> @@ -9627,10 +9627,13 @@ build_over_call (struct z_candidate *cand, int flags, 
> tsubst_flags_t complain)
>if (!conversion_warning)
>   arg_complain &= ~tf_warning;
>  
> +  if (arg_complain & tf_warning)
> + maybe_warn_pessimizing_move (arg, type, /*return_p*/false);
> +
>val = convert_like_with_context (conv, arg, fn, i - is_method,
>  arg_complain);
>val = convert_for_arg_passing (type, val, arg_complain);
> - 
> +
>if (val == error_mark_node)
>  return error_mark_node;
>else
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 3278b4114bd..5a8af22b509 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -8101,6 +8101,7 @@ extern tree finish_right_unary_fold_expr (tree, 
> int);
>  extern tree finish_binary_fold_expr  (tree, tree, int);
>  extern tree treat_lvalue_as_rvalue_p  (tree, bool);
>  extern bool decl_in_std_namespace_p   (tree);
> +extern void maybe_warn_pessimizing_move   (tree, tree, bool);
>  
>  /* in typeck2.cc */
>  extern void require_complete_eh_spec_types   (tree, tree);
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index 70ad681467e..dc6853a7de1 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -7220,9 +7220,10 @@ check_array_initializer (tree decl, tree type, tree 
> init)
>  
>  static tree
>  build_aggr_init_full_exprs (tree decl, tree init, int flags)
> - 
>  {
>gcc_assert (stmts_are_full_exprs_p ());
> +  if (init)
> +maybe_warn_pessimizing_move (init, TREE_TYPE (decl), /*return_p*/false);
>return build_aggr_init (decl, init, flags, tf_warning_or_error);
>  }
>  
> diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
> index 9500c4e2fe8..2650beb780e 100644
> --- a/gcc/cp/typeck.cc
> +++ b/gcc/cp/typeck.cc
> @@ -10368,17 +10368,17 @@ treat_lvalue_as_rvalue_p (tree expr, bool return_p)
>  }
>  }
>  
> -/* Warn about wrong usage of std::move in a return statement.  RETVAL
> -   is the expression we are returning; FUNCTYPE is the type the function
> -   is declared to return.  */
> +/* Warn about dubious usage of std::move (in a return statement, if RETURN_P
> +   is true).  EXPR is the std::move expression; TYPE is the type of the 
> object
> +   being initialized.  */
>  
> -static void
> -maybe_warn_pessimizing_move (tree retval, tree functype)
> +void
> +maybe_warn_pessimizing_move (tree expr, tree type, bool return_p)
>  {
>if (!(warn_pessimizing_move || warn_redundant_move))
>  return;
>  
> -  location_t loc = cp_expr_loc_or_input_loc (retval);
> +  const location_t loc = cp_expr_loc_or_input_loc (expr);
>  
>/* C++98 doesn't know move.  */
>if (cxx_dialect < cxx11)
> @@ -10390,14 +10390,32 @@ maybe_warn_pessimizing_move (tree retval, tree 
> functype)
>  return;
>  
>

Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, 16 Aug 2022, Aldy Hernandez wrote:

> On Tue, Aug 16, 2022 at 1:38 PM Richard Biener  wrote:
> >
> > On Tue, 16 Aug 2022, Aldy Hernandez wrote:
> >
> > > On Mon, Aug 15, 2022 at 11:53 AM Richard Biener  wrote:
> > > >
> > > > The remaining issue I have with the path_range_query is that
> > > > we re-use the same instance in the back threader but the
> > > > class doesn't provide any way to "restart", aka give m_path
> > > > a lifetime.  The "start a new path" API seems to essentially
> > > > be compute_ranges (), but there's no convenient way to end.
> > > > It might be more appropriate to re-instantiate the path_range_query,
> > > > though that comes at a cost.  Or abstract an actual query, like
> > > > adding a
> > >
> > > Yes, compute_ranges() is the way to start a new path.  It resets exit
> > > dependencies, the path, relations, etc.  I think it would be clearer
> > > to name it set_path (or reset_path if we want to share nomenclature
> > > with the path_oracle).
> > >
> > > Instantiating a new path_range_query per path is fine, as long as you
> > > allocate the ranger it uses yourself, instead of letting
> > > path_range_query allocate it.  Instantiating a new ranger does have a
> > > cost, and it's best to let path_range_query re-use a ranger from path
> > > to path.  This is why path_range_query is (class) global in the
> > > backwards threader.  Andrew mentioned last year making the ranger
> > > start-up 0-cost, but it still leaves the internal caching the ranger
> > > will do from path to path (well, the stuff outside the current path,
> > > cause the stuff inside the path is irrelevant since it'll get
> > > recalculated).
> > >
> > > However, why can't you use compute_ranges (or whatever we rename it to 
> > > ;-))??
> >
> > I've added
> >
> >auto_bb_flag m_on_path;
> >
> > to the path query and at set_path time set m_on_path on each BB so
> > the m_path->contains () linear walks go away.  But I need to clear
> > the flag for which I would need something like finish_path (),
> > doing it just at the point we deallocate the path query object
> > or when we set the next path via compute_ranges doesn't look right
> > (and in fact it doesn't work out-of-the-box without adjusting the
> > lifetime of the path query object).
> >
> > So a more incremental thing would be to add such finish_path ()
> > or to make the whole path query object single-shot, thus remove
> > compute_ranges and instead use the CTOR for this.
> >
> > Probably not too important (for short paths).
> 
> On a high level, I wonder if this matters since we don't allow long
> paths for other performance reasons you've already tackled.  But OTOH,
> I've always been a little uncomfortable with contains_p linear search,
> so if you think this makes a difference, go right ahead :).
> 
> I'm fine with either the finish_path() or the single-shot thing you
> speak of.  Although making path query immutable makes things cleaner
> in the long run.  I like it!  My guess is that the non-ranger
> instantiation penalty would be minimal.  I'd even remove the default
> (auto-allocated) ranger from path_range_query, to make it obvious that
> you need to manage that yourself and avoid folks shooting themselves
> in the foot.

We currently have

path_range_query::path_range_query (bool resolve, gimple_ranger *ranger)
  : m_cache (new ssa_global_cache),
m_has_cache_entry (BITMAP_ALLOC (NULL)),
m_resolve (resolve),
m_alloced_ranger (!ranger)
{
  if (m_alloced_ranger)
m_ranger = new gimple_ranger;
  else
m_ranger = ranger;

  m_oracle = new path_oracle (m_ranger->oracle ());

  if (m_resolve && flag_checking)
verify_marked_backedges (cfun);
}

so at least verify_marked_backedges will explode, I suppose we
want to hoist that somehow ...

then we allocate the path_oracle - that one does have a
reset_path () function at least.  Its allocation looks
quite harmless, but we should only need it when m_resolve?

> Wanna have a go at it?  If you'd rather not, I can work on it.

If you have cycles go ahead - I'm fiddling with other parts of
the threader right now.

Richard.


Re: [PATCH] s390: Implement vec_set with vec_merge and, vec_duplicate.

2022-08-16 Thread Andreas Krebbel via Gcc-patches
On 8/12/22 16:48, Robin Dapp wrote:
> Hi,
> 
> similar to other backends this patch implements vec_set via
> vec_merge and vec_duplicate instead of an unspec.  This opens up
> more possibilities to combine instructions.
> 
> Bootstrapped and regtested. No regressions.
> 
> Is it OK?
> 
> Regards
>  Robin
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.md: Implement vec_set with vec_merge and
>   vec_duplicate.
>   * config/s390/vector.md: Likewise.
>   * config/s390/vx-builtins.md: Likewise.
>   * config/s390/s390.cc (s390_expand_vec_init): Emit new pattern.
>   (print_operand_address): New output modifier.
>   (print_operand): New output modifier.

The way you handle the element selector doesn't look right to me.  It appears
to be an index if it is a CONST_INT and a bitmask otherwise.  I don't think it
is legal to change operand semantics like this depending on the operand type.
This would break e.g. if LRA decided to load the immediate index into a
register.

Couldn't you make the shift part of the RTX instead and have the parameter 
always as an index?

Bye,

Andreas

> ---
> 
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index c86b26933d7a..ff89fb83360a 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -7073,11 +7073,10 @@ s390_expand_vec_init (rtx target, rtx vals)
>if (!general_operand (elem, GET_MODE (elem)))
>   elem = force_reg (inner_mode, elem);
> 
> -  emit_insn (gen_rtx_SET (target,
> -   gen_rtx_UNSPEC (mode,
> -   gen_rtvec (3, elem,
> -  GEN_INT (i), target),
> -   UNSPEC_VEC_SET)));
> +  emit_insn
> + (gen_rtx_SET
> +  (target, gen_rtx_VEC_MERGE
> +   (mode, gen_rtx_VEC_DUPLICATE (mode, elem), target, GEN_INT (1 << i)));
>  }
>  }
> 
> @@ -8057,6 +8056,8 @@ print_operand_address (FILE *file, rtx addr)
>  'S': print S-type memory reference (base+displacement).
>  'Y': print address style operand without index (e.g. shift count or
> setmem
>operand).
> +'P': print address-style operand without index but with the offset as
> +  if it were specified by a 'p' format flag.
> 
>  'b': print integer X as if it's an unsigned byte.
>  'c': print integer X as if it's an signed byte.
> @@ -8068,6 +8069,7 @@ print_operand_address (FILE *file, rtx addr)
>  'k': print the first nonzero SImode part of X.
>  'm': print the first SImode part unequal to -1 of X.
>  'o': print integer X as if it's an unsigned 32bit word.
> +'p': print N such that 2^N == X (X must be a power of 2 and const int).
>  's': "start" of contiguous bitmask X in either DImode or vector
> inner mode.
>  't': CONST_INT: "start" of contiguous bitmask X in SImode.
>CONST_VECTOR: Generate a bitmask for vgbm instruction.
> @@ -8237,6 +8239,16 @@ print_operand (FILE *file, rtx x, int code)
>print_shift_count_operand (file, x);
>return;
> 
> +case 'P':
> +  if (CONST_INT_P (x))
> + {
> +   ival = exact_log2 (INTVAL (x));
> +   fprintf (file, HOST_WIDE_INT_PRINT_DEC, ival);
> + }
> +  else
> + print_shift_count_operand (file, x);
> +  return;
> +
>  case 'K':
>/* Append @PLT to both local and non-local symbols in order to
> support
>Linux Kernel livepatching: patches contain individual functions and
> @@ -8321,6 +8333,9 @@ print_operand (FILE *file, rtx x, int code)
>   case 'o':
> ival &= 0x;
> break;
> + case 'p':
> +   ival = exact_log2 (INTVAL (x));
> +   break;
>   case 'e': case 'f':
>   case 's': case 't':
> {
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index f37d8fd33a15..a82db4c624fa 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -183,7 +183,6 @@ (define_c_enum "unspec" [
> UNSPEC_VEC_GFMSUM_128
> UNSPEC_VEC_GFMSUM_ACCUM
> UNSPEC_VEC_GFMSUM_ACCUM_128
> -   UNSPEC_VEC_SET
> 
> UNSPEC_VEC_VSUMG
> UNSPEC_VEC_VSUMQ
> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> index c50451a8326c..bde3a39db3d4 100644
> --- a/gcc/config/s390/vector.md
> +++ b/gcc/config/s390/vector.md
> (define_insn "mov<mode>"
>  ; vec_set is supposed to *modify* an existing vector so operand 0 is
>  ; duplicated as input operand.
> +  [(set (match_operand:V  0 "register_operand" "")
> + (vec_merge:V
> +   (vec_duplicate:V
> + (match_operand:<non_vec> 1 "general_operand" ""))

[PATCH] Stop backwards thread discovery when leaving a loop

2022-08-16 Thread Richard Biener via Gcc-patches
The backward threader copier cannot deal with the situation of
copying blocks belonging to different loops and will reject those
paths late.  The following uses this to prune path discovery,
saving on compile-time.  Note the off-loop block is still considered
as entry edge origin.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I've split this out from the 'Support threading of just the exit edge'
patch under discussion - it's really an independent change.

Pushed.

* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
Do not walk further if we are leaving the current loop.
---
 gcc/tree-ssa-threadbackward.cc | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index b886027fccf..1c362839fab 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -355,6 +355,12 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
  || maybe_register_path ()))
 ;
 
+  // The backwards thread copier cannot copy blocks that do not belong
+  // to the same loop, so when the new source of the path entry no
+  // longer belongs to it we don't need to search further.
+  else if (m_path[0]->loop_father != bb->loop_father)
+;
+
   // Continue looking for ways to extend the path but limit the
   // search space along a branch
   else if ((overall_paths = overall_paths * EDGE_COUNT (bb->preds))
-- 
2.35.3


Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 16, 2022 at 1:38 PM Richard Biener  wrote:
>
> On Tue, 16 Aug 2022, Aldy Hernandez wrote:
>
> > On Mon, Aug 15, 2022 at 11:53 AM Richard Biener  wrote:
> > >
> > > The remaining issue I have with the path_range_query is that
> > > we re-use the same instance in the back threader but the
> > > class doesn't provide any way to "restart", aka give m_path
> > > a lifetime.  The "start a new path" API seems to essentially
> > > be compute_ranges (), but there's no convenient way to end.
> > > It might be more appropriate to re-instantiate the path_range_query,
> > > though that comes at a cost.  Or abstract an actual query, like
> > > adding a
> >
> > Yes, compute_ranges() is the way to start a new path.  It resets exit
> > dependencies, the path, relations, etc.  I think it would be clearer
> > to name it set_path (or reset_path if we want to share nomenclature
> > with the path_oracle).
> >
> > Instantiating a new path_range_query per path is fine, as long as you
> > allocate the ranger it uses yourself, instead of letting
> > path_range_query allocate it.  Instantiating a new ranger does have a
> > cost, and it's best to let path_range_query re-use a ranger from path
> > to path.  This is why path_range_query is (class) global in the
> > backwards threader.  Andrew mentioned last year making the ranger
> > start-up 0-cost, but it still leaves the internal caching the ranger
> > will do from path to path (well, the stuff outside the current path,
> > cause the stuff inside the path is irrelevant since it'll get
> > recalculated).
> >
> > However, why can't you use compute_ranges (or whatever we rename it to 
> > ;-))??
>
> I've added
>
>auto_bb_flag m_on_path;
>
> to the path query and at set_path time set m_on_path on each BB so
> the m_path->contains () linear walks go away.  But I need to clear
> the flag for which I would need something like finish_path (),
> doing it just at the point we deallocate the path query object
> or when we set the next path via compute_ranges doesn't look right
> (and in fact it doesn't work out-of-the-box without adjusting the
> lifetime of the path query object).
>
> So a more incremental thing would be to add such finish_path ()
> or to make the whole path query object single-shot, thus remove
> compute_ranges and instead use the CTOR for this.
>
> Probably not too important (for short paths).

On a high level, I wonder if this matters since we don't allow long
paths for other performance reasons you've already tackled.  But OTOH,
I've always been a little uncomfortable with contains_p linear search,
so if you think this makes a difference, go right ahead :).

I'm fine with either the finish_path() or the single-shot thing you
speak of.  Although making path query inmutable makes things cleaner
in the long run.  I like it!  My guess is that the non-ranger
instantiation penalty would be minimal.  I'd even remove the default
(auto-allocated) ranger from path_range_query, to make it obvious that
you need to manage that yourself and avoid folks shooting themselves
in the foot.

Wanna have a go at it?  If you'd rather not, I can work on it.

Aldy



Re: Where in C++ module streaming to handle a new bitfield added in "tree_decl_common"

2022-08-16 Thread Nathan Sidwell via Gcc-patches

On 8/15/22 10:03, Richard Biener wrote:

On Mon, Aug 15, 2022 at 3:29 PM Nathan Sidwell via Gcc-patches
 wrote:


On 8/2/22 10:44, Qing Zhao wrote:

Hi, Nathan,

I am adding a new bitfield “decl_not_flexarray” in “tree_decl_common”  
(gcc/tree-core.h) for the new gcc feature -fstrict-flex-arrays.


diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index ea9f281f1cc..458c6e6ceea 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1813,7 +1813,10 @@ struct GTY(()) tree_decl_common {
   TYPE_WARN_IF_NOT_ALIGN.  */
unsigned int warn_if_not_align : 6;

-  /* 14 bits unused.  */
+  /* In FIELD_DECL, this is DECL_NOT_FLEXARRAY.  */
+  unsigned int decl_not_flexarray : 1;


Is it possible to invert the meaning here -- set the flag if it /IS/ a
flexible array? negated flags can be confusing, and I see your patch
sets it to '!is_flexible_array (...)' anyway?


The issue is it's consumed by the middle-end but set by a single (or two)
frontends and the conservative setting is having the bit not set.  That works
nicely together with touching just the frontends that want stricter behavior
than currently ...


Makes sense, but is the comment incomplete?  I'm guessing this flag is 
for FIELD_DECLs /of array type/, and not just any old FIELD_DECL?  After 
all a field of type int is not a flexible array, but presumably doesn't 
need this flag setting?


nathan

--
Nathan Sidwell


Re: [PATCH] Implement __builtin_issignaling

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, 16 Aug 2022, Jakub Jelinek wrote:

> On Tue, Aug 16, 2022 at 11:41:06AM +, Richard Biener wrote:
> > Can you also amend the extend.texi documentation?  I think the
> > behavior will be special enough to worth mentioning it (I don't see
> > any of -ffinite-math-only effect on isnan/isinf mentioned though).
> 
> Like this?

Yes.

Thanks,
Richard.

> --- gcc/doc/extend.texi.jj    2022-08-16 13:23:04.227103773 +0200
> +++ gcc/doc/extend.texi   2022-08-16 13:56:01.250769807 +0200
> @@ -13557,6 +13557,8 @@ In the same fashion, GCC provides @code{
>  @code{isinf_sign}, @code{isnormal} and @code{signbit} built-ins used with
>  @code{__builtin_} prefixed.  The @code{isinf} and @code{isnan}
>  built-in functions appear both with and without the @code{__builtin_} prefix.
> +With @code{-ffinite-math-only} option the @code{isinf} and @code{isnan}
> +built-in functions will always return 0.
>  
>  GCC provides built-in versions of the ISO C99 floating-point rounding and
>  exceptions handling functions @code{fegetround}, @code{feclearexcept} and
> @@ -14496,6 +14498,12 @@ Note while the parameter list is an
>  ellipsis, this function only accepts exactly one floating-point
>  argument.  GCC treats this parameter as type-generic, which means it
>  does not do default promotion from float to double.
> +This built-in function can work even without the non-default
> +@code{-fsignaling-nans} option, although if a signaling NaN is computed,
> +stored or passed as argument to some function other than this built-in
> +in the current translation unit, it is safer to use @code{-fsignaling-nans}.
> +With @code{-ffinite-math-only} option this built-in function will always
> +return 0.
>  @end deftypefn
>  
>  @deftypefn {Built-in Function} int __builtin_ffs (int x)
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] driver: fix environ corruption after putenv() [PR106624]

2022-08-16 Thread Martin Liška
On 8/16/22 13:48, Sergei Trofimovich wrote:
> From: Sergei Trofimovich 
> 
> The bug appeared after r13-2010-g1270ccda70ca09 "Factor out
> jobserver_active_p" slightly changed `putenv()` use from allocating
> to non-allocating:
> 
> -xputenv (concat ("MAKEFLAGS=", dup, NULL));
> +xputenv (jinfo.skipped_makeflags.c_str ());
> 
> `xputenv()` (and `putenv()`) don't copy strings and only store the
> pointer in the `environ` global table. As a result `environ` got
> corrupted as soon as `jinfo.skipped_makeflags` store got deallocated.
> 
> This started causing bootstrap crashes in `execv()` calls:
> 
> xgcc: fatal error: cannot execute '/build/build/./prev-gcc/collect2': 
> execv: Bad address
> 
> The change restores memory allocation for `xputenv()` argument.

Thanks for the patch.

I think it's an obvious fix, please install it.

Martin

> 
> gcc/
> 
>   PR driver/106624
>   * gcc.cc (driver::detect_jobserver): Allocate storage for xputenv()
>   argument using xstrdup().
> ---
>  gcc/gcc.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index cac11c1a117..75ca0ece1a4 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -9182,7 +9182,7 @@ driver::detect_jobserver () const
>  {
>jobserver_info jinfo;
>if (!jinfo.is_active && !jinfo.skipped_makeflags.empty ())
> -xputenv (jinfo.skipped_makeflags.c_str ());
> +xputenv (xstrdup (jinfo.skipped_makeflags.c_str ()));
>  }
>  
>  /* Determine what the exit code of the driver should be.  */



Re: [PATCH] Implement __builtin_issignaling

2022-08-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 16, 2022 at 11:41:06AM +, Richard Biener wrote:
> Can you also amend the extend.texi documentation?  I think the
> behavior will be special enough to worth mentioning it (I don't see
> any of -ffinite-math-only effect on isnan/isinf mentioned though).

Like this?

--- gcc/doc/extend.texi.jj  2022-08-16 13:23:04.227103773 +0200
+++ gcc/doc/extend.texi 2022-08-16 13:56:01.250769807 +0200
@@ -13557,6 +13557,8 @@ In the same fashion, GCC provides @code{
 @code{isinf_sign}, @code{isnormal} and @code{signbit} built-ins used with
 @code{__builtin_} prefixed.  The @code{isinf} and @code{isnan}
 built-in functions appear both with and without the @code{__builtin_} prefix.
+With @code{-ffinite-math-only} option the @code{isinf} and @code{isnan}
+built-in functions will always return 0.
 
 GCC provides built-in versions of the ISO C99 floating-point rounding and
 exceptions handling functions @code{fegetround}, @code{feclearexcept} and
@@ -14496,6 +14498,12 @@ Note while the parameter list is an
 ellipsis, this function only accepts exactly one floating-point
 argument.  GCC treats this parameter as type-generic, which means it
 does not do default promotion from float to double.
+This built-in function can work even without the non-default
+@code{-fsignaling-nans} option, although if a signaling NaN is computed,
+stored or passed as argument to some function other than this built-in
+in the current translation unit, it is safer to use @code{-fsignaling-nans}.
+With @code{-ffinite-math-only} option this built-in function will always
+return 0.
 @end deftypefn
 
 @deftypefn {Built-in Function} int __builtin_ffs (int x)

Jakub



[PATCH] driver: fix environ corruption after putenv() [PR106624]

2022-08-16 Thread Sergei Trofimovich via Gcc-patches
From: Sergei Trofimovich 

The bug appeared after r13-2010-g1270ccda70ca09 "Factor out
jobserver_active_p" slightly changed `putenv()` use from allocating
to non-allocating:

-xputenv (concat ("MAKEFLAGS=", dup, NULL));
+xputenv (jinfo.skipped_makeflags.c_str ());

`xputenv()` (and `putenv()`) don't copy strings and only store the
pointer in the `environ` global table. As a result `environ` got
corrupted as soon as `jinfo.skipped_makeflags` store got deallocated.

This started causing bootstrap crashes in `execv()` calls:

xgcc: fatal error: cannot execute '/build/build/./prev-gcc/collect2': 
execv: Bad address

The change restores memory allocation for `xputenv()` argument.

gcc/

PR driver/106624
* gcc.cc (driver::detect_jobserver): Allocate storage for xputenv()
argument using xstrdup().
---
 gcc/gcc.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index cac11c1a117..75ca0ece1a4 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -9182,7 +9182,7 @@ driver::detect_jobserver () const
 {
   jobserver_info jinfo;
   if (!jinfo.is_active && !jinfo.skipped_makeflags.empty ())
-xputenv (jinfo.skipped_makeflags.c_str ());
+xputenv (xstrdup (jinfo.skipped_makeflags.c_str ()));
 }
 
 /* Determine what the exit code of the driver should be.  */
-- 
2.37.1



Re: [PATCH] s390: Implement vec_extract via vec_select.

2022-08-16 Thread Andreas Krebbel via Gcc-patches
On 8/12/22 16:19, Robin Dapp wrote:
> Hi,
> 
> vec_select can handle dynamic/runtime masks nowadays.  Therefore we can
> get rid of the UNSPEC_VEC_EXTRACT that was preventing further
> optimizations like combining instructions with vec_extract patterns.
> 
> Bootstrapped and regtested. No regressions.
> 
> Is it OK?
> 
> Regards
>  Robin
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.md: Remove UNSPEC_VEC_EXTRACT.
>   * config/s390/vector.md: Rewrite patterns to use vec_select.
>   * config/s390/vx-builtins.md (vec_scatter_element_SI):
>   Likewise.

Ok. Thanks!

Andreas


Re: [PATCH] Support threading of just the exit edge

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 16, 2022 at 1:32 PM Richard Biener  wrote:
>
> On Tue, 16 Aug 2022, Aldy Hernandez wrote:
>
> > On Tue, Aug 16, 2022 at 11:18 AM Richard Biener  wrote:
> > >
> > > On Mon, 15 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod  
> > > > wrote:
> > > > >
> > > > > heh. or just
> > > > >
> > > > >
> > > > > +  int_range<2> r;
> > > > > +  if (!fold_range (r, const_cast  (cond_stmt))
> > > > > +  || !r.singleton_p ())
> > > > >
> > > > >
> > > > > if you do not provide a range_query to any of the fold_using_range 
> > > > > code,
> > > > > it defaults to:
> > > > >
> > > > > fur_source::fur_source (range_query *q)
> > > > > {
> > > > >if (q)
> > > > >  m_query = q;
> > > > >else if (cfun)
> > > > >  m_query = get_range_query (cfun);
> > > > >else
> > > > >  m_query = get_global_range_query ();
> > > > >m_gori = NULL;
> > > > > }
> > > > >
> > > >
> > > > Sweet.  Even better!
> > >
> > > So when I do the following incremental change ontop of the posted
> > > patch then I see that the path query is able to simplify more
> > > "single BB paths" than the global range folding.
> > >
> > > diff --git a/gcc/tree-ssa-threadbackward.cc
> > > b/gcc/tree-ssa-threadbackward.cc
> > > index 669098e4ec3..777e778037f 100644
> > > --- a/gcc/tree-ssa-threadbackward.cc
> > > +++ b/gcc/tree-ssa-threadbackward.cc
> > > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const
> > > vec<basic_block> &path,
> > >  {
> > >int_range_max r;
> > >
> > > +  int_range<2> rf;
> > > +  if (path.length () == 1)
> > > +{
> > > +  fold_range (rf, cond);
> > > +}
> > > +
> > >m_solver->compute_ranges (path, m_imports);
> > >m_solver->range_of_stmt (r, cond);
> > >
> > > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const
> > > vec<basic_block> &path,
> > >
> > >if (r == true_range || r == false_range)
> > >  {
> > > +  if (path.length () == 1)
> > > +   gcc_assert  (r == rf);
> > >edge e_true, e_false;
> > >basic_block bb = gimple_bb (cond);
> > >extract_true_false_edges_from_block (bb, &e_true, &e_false);
> > >
> > > Even doing the following (not sure what's the difference and in
> > > particular expense over the path range query) results in missed
> > > simplifications (checking my set of cc1files).
> > >
> > > diff --git a/gcc/tree-ssa-threadbackward.cc
> > > b/gcc/tree-ssa-threadbackward.cc
> > > index 669098e4ec3..1d43a179d08 100644
> > > --- a/gcc/tree-ssa-threadbackward.cc
> > > +++ b/gcc/tree-ssa-threadbackward.cc
> > > @@ -99,6 +99,7 @@ private:
> > >
> > >back_threader_registry m_registry;
> > >back_threader_profitability m_profit;
> > > +  gimple_ranger *m_ranger;
> > >path_range_query *m_solver;
> > >
> > >// Current path being analyzed.
> > > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun,
> > > unsigned flags, bool first)
> > >// The path solver needs EDGE_DFS_BACK in resolving mode.
> > >if (flags & BT_RESOLVE)
> > >  mark_dfs_back_edges ();
> > > -  m_solver = new path_range_query (flags & BT_RESOLVE);
> > > +  m_ranger = new gimple_ranger;
> > > +  m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger);
> > >  }
> >
> > Passing an allocated ranger here results in less simplifications over
> > letting path_range_query allocate its own?  That's not right.  Or do
> > you mean that using fold_range() with the m_ranger causes ICEs with
> > your patch (due to the non-null processing described below)?
>
> Yes, I've needed a ranger to use fold_range (..., m_ranger) which
> I thought might do more than not passing one.

Yeah.  If you don't pass it a ranger, it'll use global ranges (i.e.
SSA_NAME_RANGE_INFO).

More specifically, it will first try to get the ranger you have
enabled in your pass with enable_ranger().  If that's not available,
then it will use global ranges.

So you could also do:

m_ranger = enable_ranger (fun);

and then in the destructor:

disable_ranger (m_fun);
m_ranger = NULL; // for good measure.

Then you could use fold_range() without any parameters and it will
DTRT.  This is what I had in mind when I shared my proof of concept
for tree-cfg's version of find_taken_edge(bb).  If you have enabled a
ranger, it'll use that, otherwise it'll use global SSA_NAME_RANGE_INFO
ranges.


>
> > >
> > >  back_threader::~back_threader ()
> > >  {
> > >delete m_solver;
> > > +  delete m_ranger;
> > >
> > >loop_optimizer_finalize ();
> > >  }
> > > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const
> > > vec<basic_block> &path,
> > >  {
> > >int_range_max r;
> > >
> > > +  int_range<2> rf;
> > > +  if (path.length () == 1)
> > > +{
> > > +  fold_range (rf, cond, m_ranger);
> > > +}
> > > +
> > >m_solver->compute_ranges (path, m_imports);
> > >m_solver->range_of_stmt (r, cond);
> > >
> > > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const
> > > vec<basic_block> &path,
> > >
> > >if (r == true_range || r == false_range)

Re: [PATCH] Implement __builtin_issignaling

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, 16 Aug 2022, Jakub Jelinek wrote:

> On Mon, Aug 15, 2022 at 03:06:16PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Mon, Aug 15, 2022 at 12:07:38PM +, Richard Biener wrote:
> > > Ah, I misread
> > > 
> > > +static rtx
> > > +expand_builtin_issignaling (tree exp, rtx target)
> > > +{
> > > +  if (!validate_arglist (exp, REAL_TYPE, VOID_TYPE))
> > > +return NULL_RTX;
> > > +
> > > +  tree arg = CALL_EXPR_ARG (exp, 0);
> > > +  scalar_float_mode fmode = SCALAR_FLOAT_TYPE_MODE (TREE_TYPE (arg));
> > > +  const struct real_format *fmt = REAL_MODE_FORMAT (fmode);
> > > +
> > > +  /* Expand the argument yielding a RTX expression. */
> > > +  rtx temp = expand_normal (arg);
> > > +
> > > +  /* If mode doesn't support NaN, always return 0.  */
> > > +  if (!HONOR_NANS (fmode))
> > > +{
> > > +  emit_move_insn (target, const0_rtx);
> > > +  return target;
> > 
> > I think I can expand on the comment why HONOR_NANS instead of HONOR_SNANS
> > and also add comment to the folding case.
> 
> So what about like in this incremental patch:
> 
> --- gcc/builtins.cc.jj    2022-08-16 13:23:04.220103861 +0200
> +++ gcc/builtins.cc   2022-08-16 13:32:03.411257574 +0200
> @@ -2765,7 +2765,13 @@ expand_builtin_issignaling (tree exp, rt
>/* Expand the argument yielding a RTX expression. */
>rtx temp = expand_normal (arg);
>  
> -  /* If mode doesn't support NaN, always return 0.  */
> +  /* If mode doesn't support NaN, always return 0.
> + Don't use !HONOR_SNANS (fmode) here, so there is some possibility of
> + __builtin_issignaling working without -fsignaling-nans.  Especially
> + when -fno-signaling-nans is the default.
> + On the other side, MODE_HAS_NANS (fmode) is unnecessary, with
> + -ffinite-math-only even __builtin_isnan or __builtin_fpclassify
> + fold to 0 or non-NaN/Inf classification.  */
>if (!HONOR_NANS (fmode))
>  {
>emit_move_insn (target, const0_rtx);
> @@ -9259,6 +9265,12 @@ fold_builtin_classify (location_t loc, t
>return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
>  
>  case BUILT_IN_ISSIGNALING:
> +  /* Folding to true for REAL_CST is done in fold_const_call_ss.
> +  Don't use tree_expr_signaling_nan_p (arg) -> integer_one_node
> +  and !tree_expr_maybe_signaling_nan_p (arg) -> integer_zero_node
> +  here, so there is some possibility of __builtin_issignaling working
> +  without -fsignaling-nans.  Especially when -fno-signaling-nans is
> +  the default.  */
>if (!tree_expr_maybe_nan_p (arg))
>   return omit_one_operand_loc (loc, type, integer_zero_node, arg);
>return NULL_TREE;

Can you also amend the extend.texi documentation?  I think the
behavior will be special enough to worth mentioning it (I don't see
any of -ffinite-math-only effect on isnan/isinf mentioned though).

I'm OK with the rest of the patch if Joseph doesn't have comments
on the actual issignaling lowerings (which I didn't review for
correctness due to lack of knowledge).

> > > > That seems like a glibc bug/weird feature in the __MATH_TG macro
> > > > or _Generic.
> > > > When compiled with C++ it is rejected.
> > > 
> > > So what about __builtin_issignaling then?  Do we want to silently
> > > ignore errors there?
> > 
> > I think we should just restrict it to the scalar floating point types.
> > After all, other typegeneric builtins that are or can be used similarly
> > do the same thing.
> 
> Note, that is what the patch does currently (rejecting _Complex
> {float,double,long double} etc. arguments).

I see.

Richard.


Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, 16 Aug 2022, Aldy Hernandez wrote:

> On Mon, Aug 15, 2022 at 11:53 AM Richard Biener  wrote:
> >
> > The remaining issue I have with the path_range_query is that
> > we re-use the same instance in the back threader but the
> > class doesn't provide any way to "restart", aka give m_path
> > a lifetime.  The "start a new path" API seems to essentially
> > be compute_ranges (), but there's no convenient way to end.
> > It might be more appropriate to re-instantiate the path_range_query,
> > though that comes at a cost.  Or abstract an actual query, like
> > adding a
> 
> Yes, compute_ranges() is the way to start a new path.  It resets exit
> dependencies, the path, relations, etc.  I think it would be clearer
> to name it set_path (or reset_path if we want to share nomenclature
> with the path_oracle).
> 
> Instantiating a new path_range_query per path is fine, as long as you
> allocate the ranger it uses yourself, instead of letting
> path_range_query allocate it.  Instantiating a new ranger does have a
> cost, and it's best to let path_range_query re-use a ranger from path
> to path.  This is why path_range_query is (class) global in the
> backwards threader.  Andrew mentioned last year making the ranger
> start-up 0-cost, but it still leaves the internal caching the ranger
> will do from path to path (well, the stuff outside the current path,
> cause the stuff inside the path is irrelevant since it'll get
> recalculated).
> 
> However, why can't you use compute_ranges (or whatever we rename it to ;-))??

I've added

   auto_bb_flag m_on_path;

to the path query and at set_path time set m_on_path on each BB so
the m_path->contains () linear walks go away.  But I need to clear
the flag for which I would need something like finish_path (),
doing it just at the point we deallocate the path query object
or when we set the next path via compute_ranges doesn't look right
(and in fact it doesn't work out-of-the-box without adjusting the
lifetime of the path query object).

So a more incremental thing would be to add such finish_path ()
or to make the whole path query object single-shot, thus remove
compute_ranges and instead use the CTOR for this.

Probably not too important (for short paths).

Richard.

> Aldy
> 
> >
> >   query start (const vec<basic_block> &);
> >
> > and make range_of_* and friends members of a new 'query' class
> > instantiated by path_range_query.  I ran into this when trying
> > to axe the linear array walks for the .contains() query on the
> > path where I need a convenient way to "clenanup" after a path
> > query is done.
> >
> > Richard.
> >


Re: [PATCH] Implement __builtin_issignaling

2022-08-16 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 15, 2022 at 03:06:16PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Aug 15, 2022 at 12:07:38PM +, Richard Biener wrote:
> > Ah, I misread
> > 
> > +static rtx
> > +expand_builtin_issignaling (tree exp, rtx target)
> > +{
> > +  if (!validate_arglist (exp, REAL_TYPE, VOID_TYPE))
> > +return NULL_RTX;
> > +
> > +  tree arg = CALL_EXPR_ARG (exp, 0);
> > +  scalar_float_mode fmode = SCALAR_FLOAT_TYPE_MODE (TREE_TYPE (arg));
> > +  const struct real_format *fmt = REAL_MODE_FORMAT (fmode);
> > +
> > +  /* Expand the argument yielding a RTX expression. */
> > +  rtx temp = expand_normal (arg);
> > +
> > +  /* If mode doesn't support NaN, always return 0.  */
> > +  if (!HONOR_NANS (fmode))
> > +{
> > +  emit_move_insn (target, const0_rtx);
> > +  return target;
> 
> I think I can expand on the comment why HONOR_NANS instead of HONOR_SNANS
> and also add comment to the folding case.

So what about like in this incremental patch:

--- gcc/builtins.cc.jj  2022-08-16 13:23:04.220103861 +0200
+++ gcc/builtins.cc 2022-08-16 13:32:03.411257574 +0200
@@ -2765,7 +2765,13 @@ expand_builtin_issignaling (tree exp, rt
   /* Expand the argument yielding a RTX expression. */
   rtx temp = expand_normal (arg);
 
-  /* If mode doesn't support NaN, always return 0.  */
+  /* If mode doesn't support NaN, always return 0.
+ Don't use !HONOR_SNANS (fmode) here, so there is some possibility of
+ __builtin_issignaling working without -fsignaling-nans.  Especially
+ when -fno-signaling-nans is the default.
+ On the other side, MODE_HAS_NANS (fmode) is unnecessary, with
+ -ffinite-math-only even __builtin_isnan or __builtin_fpclassify
+ fold to 0 or non-NaN/Inf classification.  */
   if (!HONOR_NANS (fmode))
 {
   emit_move_insn (target, const0_rtx);
@@ -9259,6 +9265,12 @@ fold_builtin_classify (location_t loc, t
   return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
 
 case BUILT_IN_ISSIGNALING:
+  /* Folding to true for REAL_CST is done in fold_const_call_ss.
+Don't use tree_expr_signaling_nan_p (arg) -> integer_one_node
+and !tree_expr_maybe_signaling_nan_p (arg) -> integer_zero_node
+here, so there is some possibility of __builtin_issignaling working
+without -fsignaling-nans.  Especially when -fno-signaling-nans is
+the default.  */
   if (!tree_expr_maybe_nan_p (arg))
return omit_one_operand_loc (loc, type, integer_zero_node, arg);
   return NULL_TREE;

> > > That seems like a glibc bug/weird feature in the __MATH_TG macro
> > > or _Generic.
> > > When compiled with C++ it is rejected.
> > 
> > So what about __builtin_issignaling then?  Do we want to silently
> > ignore errors there?
> 
> I think we should just restrict it to the scalar floating point types.
> After all, other typegeneric builtins that are or can be used similarly
> do the same thing.

Note, that is what the patch does currently (rejecting _Complex
{float,double,long double} etc. arguments).

Jakub



[committed] d: Update DIP links in gdc documentation to point at upstream repository

2022-08-16 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes the broken DIP links in the GDC documentation.

The wiki links probably worked at some point in the distant past, but
now the official location of tracking all D Improvement Proposals is on
the upstream dlang/DIPs GitHub repository.

Regtested, committed to mainline, and backported to the releases/gcc-10,
releases/gcc-11, and releases/gcc-12 branches.

Regards,
Iain.

---
PR d/106638

gcc/d/ChangeLog:

* gdc.texi: Update DIP links to point at upstream dlang/DIPs
repository.
---
 gcc/d/gdc.texi | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/d/gdc.texi b/gcc/d/gdc.texi
index 2be3154bf86..2bff627d863 100644
--- a/gcc/d/gdc.texi
+++ b/gcc/d/gdc.texi
@@ -326,14 +326,17 @@ values are supported:
 @item all
 Turns on all upcoming D language features.
 @item dip1000
-Implements @uref{https://wiki.dlang.org/DIP1000} (Scoped pointers).
+Implements @uref{https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1000.md}
+(Scoped pointers).
 @item dip1008
-Implements @uref{https://wiki.dlang.org/DIP1008} (Allow exceptions in
-@code{@@nogc} code).
+Implements @uref{https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1008.md}
+(Allow exceptions in @code{@@nogc} code).
 @item dip1021
-Implements @uref{https://wiki.dlang.org/DIP1021} (Mutable function arguments).
+Implements @uref{https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md}
+(Mutable function arguments).
 @item dip25
-Implements @uref{https://wiki.dlang.org/DIP25} (Sealed references).
+Implements @uref{https://github.com/dlang/DIPs/blob/master/DIPs/archive/DIP25.md}
+(Sealed references).
 @item dtorfields
 Turns on generation for destructing fields of partially constructed objects.
 @item fieldwise
@@ -383,7 +386,8 @@ are supported:
 @item all
 Turns off all revertable D language features.
 @item dip25
-Reverts @uref{https://wiki.dlang.org/DIP25} (Sealed references).
+Reverts @uref{https://github.com/dlang/DIPs/blob/master/DIPs/archive/DIP25.md}
+(Sealed references).
 @item dtorfields
 Turns off generation for destructing fields of partially constructed objects.
 @item markdown
-- 
2.34.1



Re: [PATCH] Support threading of just the exit edge

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, 16 Aug 2022, Aldy Hernandez wrote:

> On Tue, Aug 16, 2022 at 11:18 AM Richard Biener  wrote:
> >
> > On Mon, 15 Aug 2022, Aldy Hernandez wrote:
> >
> > > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod  
> > > wrote:
> > > >
> > > > heh. or just
> > > >
> > > >
> > > > +  int_range<2> r;
> > > > +  if (!fold_range (r, const_cast  (cond_stmt))
> > > > +  || !r.singleton_p ())
> > > >
> > > >
> > > > if you do not provide a range_query to any of the fold_using_range code,
> > > > it defaults to:
> > > >
> > > > fur_source::fur_source (range_query *q)
> > > > {
> > > >if (q)
> > > >  m_query = q;
> > > >else if (cfun)
> > > >  m_query = get_range_query (cfun);
> > > >else
> > > >  m_query = get_global_range_query ();
> > > >m_gori = NULL;
> > > > }
> > > >
> > >
> > > Sweet.  Even better!
> >
> > So when I do the following incremental change ontop of the posted
> > patch then I see that the path query is able to simplify more
> > "single BB paths" than the global range folding.
> >
> > diff --git a/gcc/tree-ssa-threadbackward.cc
> > b/gcc/tree-ssa-threadbackward.cc
> > index 669098e4ec3..777e778037f 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const
> > vec<basic_block> &path,
> >  {
> >int_range_max r;
> >
> > +  int_range<2> rf;
> > +  if (path.length () == 1)
> > +{
> > +  fold_range (rf, cond);
> > +}
> > +
> >m_solver->compute_ranges (path, m_imports);
> >m_solver->range_of_stmt (r, cond);
> >
> > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const
> > vec<basic_block> &path,
> >
> >if (r == true_range || r == false_range)
> >  {
> > +  if (path.length () == 1)
> > +   gcc_assert  (r == rf);
> >edge e_true, e_false;
> >basic_block bb = gimple_bb (cond);
> >extract_true_false_edges_from_block (bb, &e_true, &e_false);
> >
> > Even doing the following (not sure what's the difference and in
> > particular expense over the path range query) results in missed
> > simplifications (checking my set of cc1files).
> >
> > diff --git a/gcc/tree-ssa-threadbackward.cc
> > b/gcc/tree-ssa-threadbackward.cc
> > index 669098e4ec3..1d43a179d08 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -99,6 +99,7 @@ private:
> >
> >back_threader_registry m_registry;
> >back_threader_profitability m_profit;
> > +  gimple_ranger *m_ranger;
> >path_range_query *m_solver;
> >
> >// Current path being analyzed.
> > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun,
> > unsigned flags, bool first)
> >// The path solver needs EDGE_DFS_BACK in resolving mode.
> >if (flags & BT_RESOLVE)
> >  mark_dfs_back_edges ();
> > -  m_solver = new path_range_query (flags & BT_RESOLVE);
> > +  m_ranger = new gimple_ranger;
> > +  m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger);
> >  }
> 
> Passing an allocated ranger here results in fewer simplifications than
> letting path_range_query allocate its own?  That's not right.  Or do
> you mean that using fold_range() with the m_ranger causes ICEs with
> your patch (due to the non-null processing described below)?

Yes, I've needed a ranger to use fold_range (..., m_ranger) which
I thought might do more than not passing one.

> >
> >  back_threader::~back_threader ()
> >  {
> >delete m_solver;
> > +  delete m_ranger;
> >
> >loop_optimizer_finalize ();
> >  }
> > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const
> > vec<basic_block> &path,
> >  {
> >int_range_max r;
> >
> > +  int_range<2> rf;
> > +  if (path.length () == 1)
> > +{
> > +  fold_range (rf, cond, m_ranger);
> > +}
> > +
> >m_solver->compute_ranges (path, m_imports);
> >m_solver->range_of_stmt (r, cond);
> >
> > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const
> > vec<basic_block> &path,
> >
> >if (r == true_range || r == false_range)
> >  {
> > +  if (path.length () == 1)
> > +   gcc_assert  (r == rf);
> >edge e_true, e_false;
> >basic_block bb = gimple_bb (cond);
> >extract_true_false_edges_from_block (bb, &e_true, &e_false);
> >
> > one example is
> >
> >  [local count: 14414059]:
> > _128 = node_177(D)->typed.type;
> > pretmp_413 = MEM[(const union tree_node *)_128].base.code;
> > _431 = pretmp_413 + 65519;
> > if (_128 == 0B)
> >   goto ; [18.09%]
> > else
> >   goto ; [81.91%]
> >
> > where m_imports for the path is just _128 and the range computed is
> > false while the ranger query returns VARYING.  But
> > path_range_query::range_defined_in_block does
> >
> >   if (bb && POINTER_TYPE_P (TREE_TYPE (name)))
> > m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb);
> >
> > which adjusts the range to ~[0, 0], probably because of the
> > dereference in the following stmt.
> >
> > Why does fold_range not do this when folding the exit test?  Is there
> > a way to make it do so?  It looks 

Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Mon, Aug 15, 2022 at 11:53 AM Richard Biener  wrote:
>
> The remaining issue I have with the path_range_query is that
> we re-use the same instance in the back threader but the
> class doesn't provide any way to "restart", aka give m_path
> a lifetime.  The "start a new path" API seems to essentially
> be compute_ranges (), but there's no convenient way to end.
> It might be more appropriate to re-instantiate the path_range_query,
> though that comes at a cost.  Or abstract an actual query, like
> adding a

Yes, compute_ranges() is the way to start a new path.  It resets exit
dependencies, the path, relations, etc.  I think it would be clearer
to name it set_path (or reset_path if we want to share nomenclature
with the path_oracle).

Instantiating a new path_range_query per path is fine, as long as you
allocate the ranger it uses yourself, instead of letting
path_range_query allocate it.  Instantiating a new ranger does have a
cost, and it's best to let path_range_query re-use a ranger from path
to path.  This is why path_range_query is (class) global in the
backwards threader.  Andrew mentioned last year making the ranger
start-up 0-cost, but it still leaves the internal caching the ranger
will do from path to path (well, the stuff outside the current path,
because the stuff inside the path is irrelevant since it'll get
recalculated).

However, why can't you use compute_ranges (or whatever we rename it to ;-))??

Aldy

>
>   query start (const vec<basic_block> &);
>
> and make range_of_* and friends members of a new 'query' class
> instantiated by path_range_query.  I ran into this when trying
> to axe the linear array walks for the .contains() query on the
> path where I need a convenient way to "clenanup" after a path
> query is done.
>
> Richard.
>
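(Editorial note: the "query with a lifetime" API proposed above could look roughly like the toy sketch below. Every name is hypothetical; this is not the actual ranger/path_range_query API, only an illustration of giving the path explicit start/end semantics.)

```cpp
#include <cassert>
#include <vector>

// Toy model of the proposed per-path 'query' object: constructing one is
// the analogue of compute_ranges()/"start", and its destructor is the
// missing "end of path" hook, so path state cannot leak between paths.
class toy_path_solver
{
  std::vector<int> m_path;   // stand-in for vec<basic_block>
  bool m_active = false;

public:
  class query
  {
    toy_path_solver &m_solver;
  public:
    query (toy_path_solver &s, const std::vector<int> &path)
      : m_solver (s)
    {
      m_solver.m_path = path;
      m_solver.m_active = true;
    }
    ~query ()
    {
      m_solver.m_path.clear ();
      m_solver.m_active = false;
    }
    // The .contains() query mentioned above, valid only while alive.
    bool contains (int bb) const
    {
      for (int b : m_solver.m_path)
        if (b == bb)
          return true;
      return false;
    }
  };

  bool active () const { return m_active; }
};
```

The solver (and any ranger it owns) stays allocated across paths, while each path gets an explicitly scoped query.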



Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)

2022-08-16 Thread Andre Vieira (lists) via Gcc-patches

Hi,

New version of the patch attached, but haven't recreated the ChangeLog 
yet, just waiting to see if this is what you had in mind. See also some 
replies to your comments in-line below:


On 09/08/2022 15:34, Richard Biener wrote:


@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool
aggressive_if_conv)
auto_vec<edge> critical_edges;

/* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
  return false;

body = get_loop_body (loop);

this doesn't appear in the ChangeLog nor is it clear to me why it's
needed?  Likewise

So both these and...


-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+{
+  /* Save BB->aux around loop_version as that uses the same field.
*/
+  save_length = loop->inner ? loop->inner->num_nodes :
loop->num_nodes;
+  saved_preds = XALLOCAVEC (void *, save_length);
+  for (unsigned i = 0; i < save_length; i++)
+   saved_preds[i] = ifc_bbs[i]->aux;
+}

is that just premature optimization?


.. these changes are to make sure we can still use the loop versioning 
code even for cases where there are bitfields to lower but no ifcvts 
(i.e. num of BBs <= 2).
I wasn't sure about the loop->inner condition, but in the small examples I
tried it seemed to work; that is, loop_version seems to be able to handle
nested loops.


The single_exit condition is still required for both, because the code 
to create the loop versions depends on it. It does look like I missed 
this in the ChangeLog...



+  /* BITSTART and BITEND describe the region we can safely load from
inside the
+ structure.  BITPOS is the bit position of the value inside the
+ representative that we will end up loading OFFSET bytes from the
start
+ of the struct.  BEST_MODE is the mode describing the optimal size of
the
+ representative chunk we load.  If this is a write we will store the
same
+ sized representative back, after we have changed the appropriate
bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);

I think you need to give up when get_bit_range sets bitstart = bitend to
zero

+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+TYPE_ALIGN (TREE_TYPE (struct_expr)),
+INT_MAX, false, &best_mode))

+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+ NULL_TREE, rep_type);
+  /* Load from the start of 'offset + bitpos % alignment'.  */
+  uint64_t extra_offset = bitpos.to_constant ();

you shouldn't build a new FIELD_DECL.  Either you use
DECL_BIT_FIELD_REPRESENTATIVE directly or you use a
BIT_FIELD_REF accessing the "representative".
DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain
a variable field offset, you can also subset that with an
intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is
too large for your taste.

I'm not sure all the offset calculation you do is correct, but
since you shouldn't invent a new FIELD_DECL it probably needs
to change anyway ...
I can use the DECL_BIT_FIELD_REPRESENTATIVE, but I'll still need some 
offset calculation/extraction. It's easier to explain with an example:


In vect-bitfield-read-3.c the struct:
typedef struct {
    int  c;
    int  b;
    bool a : 1;
} struct_t;

and field access 'vect_false[i].a' or 'vect_true[i].a' will lead to a 
DECL_BIT_FIELD_REPRESENTATIVE of TYPE_SIZE of 8 (and TYPE_PRECISION is 
also 8 as expected). However, the DECL_FIELD_OFFSET of either the 
original field decl, the actual bitfield member, or the 
DECL_BIT_FIELD_REPRESENTATIVE is 0 and the DECL_FIELD_BIT_OFFSET is 64. 
These will lead to the correct load:

_1 = vect_false[i].D;

D here, being the representative, is an 8-bit load from vect_false[i] +
64 bits. So all good there. However, when we construct BIT_FIELD_REF we
can't simply use DECL_FIELD_BIT_OFFSET (field_decl) as the 
BIT_FIELD_REF's bitpos.  During `verify_gimple` it checks that bitpos + 
bitsize < TYPE_SIZE (TREE_TYPE (load)) where BIT_FIELD_REF (load, 
bitsize, bitpos).


So instead I change bitpos such that:
align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
bitpos -= bitpos.to_constant () / align_of_representative * 
align_of_representative;


I've now rewritten this to:
poly_int64 q, r;
if (can_div_trunc_p (bitpos, align_of_representative, &q, &r))
  bitpos = r;

It makes it slightly clearer, also because I no longer need the changes 
to the original tree offset as I'm just using D for the load.
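(Editorial illustration: the adjustment described above can be modeled standalone. The sketch below is not the vectorizer code; it only shows splitting the struct-relative bit position 64 into the representative's byte offset plus a bit position inside the 8-bit representative, assuming the common little-endian bitfield layout of x86_64/AArch64.)

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Mirrors the struct from vect-bitfield-read-3.c: 'a' sits at bit 64 of
// the struct, but its representative is only 8 bits wide, so the
// BIT_FIELD_REF position must be taken modulo the representative size.
struct struct_t
{
  int c;
  int b;
  bool a : 1;
};

// Split a struct-relative BITPOS into the representative's byte offset
// and the bit position inside the representative (REP_BITS wide).
inline void
split_bitpos (unsigned bitpos, unsigned rep_bits,
              unsigned *byte_off, unsigned *rep_bitpos)
{
  *byte_off = bitpos / rep_bits * (rep_bits / 8);
  *rep_bitpos = bitpos % rep_bits;  // the "bitpos = r" adjustment
}

// Emulate "_1 = vect_false[i].D" (the 8-bit representative load)
// followed by a BIT_FIELD_REF-style extraction of the low bit.
inline bool
load_bitfield_a (const struct_t *p)
{
  unsigned byte_off, rep_bitpos;
  split_bitpos (64, 8, &byte_off, &rep_bitpos);
  uint8_t rep;
  std::memcpy (&rep, reinterpret_cast<const uint8_t *> (p) + byte_off, 1);
  return (rep >> rep_bitpos) & 1;
}
```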

Note that for optimization it will be important that all
accesses to the bitfield members of the same bitfield use the
same underlying area (CSE and store-forwarding will thank 

Re: [PATCH] Support threading of just the exit edge

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 16, 2022 at 11:18 AM Richard Biener  wrote:
>
> On Mon, 15 Aug 2022, Aldy Hernandez wrote:
>
> > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod  wrote:
> > >
> > > heh. or just
> > >
> > >
> > > +  int_range<2> r;
> > > +  if (!fold_range (r, const_cast <gcond *> (cond_stmt))
> > > +  || !r.singleton_p ())
> > >
> > >
> > > if you do not provide a range_query to any of the fold_using_range code,
> > > it defaults to:
> > >
> > > fur_source::fur_source (range_query *q)
> > > {
> > >if (q)
> > >  m_query = q;
> > >else if (cfun)
> > >  m_query = get_range_query (cfun);
> > >else
> > >  m_query = get_global_range_query ();
> > >m_gori = NULL;
> > > }
> > >
> >
> > Sweet.  Even better!
>
> So when I do the following incremental change on top of the posted
> patch then I see that the path query is able to simplify more
> "single BB paths" than the global range folding.
>
> diff --git a/gcc/tree-ssa-threadbackward.cc
> b/gcc/tree-ssa-threadbackward.cc
> index 669098e4ec3..777e778037f 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const
> vec<basic_block> &path,
>  {
>int_range_max r;
>
> +  int_range<2> rf;
> +  if (path.length () == 1)
> +{
> +  fold_range (rf, cond);
> +}
> +
>m_solver->compute_ranges (path, m_imports);
>m_solver->range_of_stmt (r, cond);
>
> @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const
> vec<basic_block> &path,
>
>if (r == true_range || r == false_range)
>  {
> +  if (path.length () == 1)
> +   gcc_assert  (r == rf);
>edge e_true, e_false;
>basic_block bb = gimple_bb (cond);
>extract_true_false_edges_from_block (bb, &e_true, &e_false);
>
> Even doing the following (not sure what's the difference and in
> particular expense over the path range query) results in missed
> simplifications (checking my set of cc1files).
>
> diff --git a/gcc/tree-ssa-threadbackward.cc
> b/gcc/tree-ssa-threadbackward.cc
> index 669098e4ec3..1d43a179d08 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -99,6 +99,7 @@ private:
>
>back_threader_registry m_registry;
>back_threader_profitability m_profit;
> +  gimple_ranger *m_ranger;
>path_range_query *m_solver;
>
>// Current path being analyzed.
> @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun,
> unsigned flags, bool first)
>// The path solver needs EDGE_DFS_BACK in resolving mode.
>if (flags & BT_RESOLVE)
>  mark_dfs_back_edges ();
> -  m_solver = new path_range_query (flags & BT_RESOLVE);
> +  m_ranger = new gimple_ranger;
> +  m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger);
>  }

Passing an allocated ranger here results in fewer simplifications than
letting path_range_query allocate its own?  That's not right.  Or do
you mean that using fold_range() with the m_ranger causes ICEs with
your patch (due to the non-null processing described below)?

>
>  back_threader::~back_threader ()
>  {
>delete m_solver;
> +  delete m_ranger;
>
>loop_optimizer_finalize ();
>  }
> @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const
> vec<basic_block> &path,
>  {
>int_range_max r;
>
> +  int_range<2> rf;
> +  if (path.length () == 1)
> +{
> +  fold_range (rf, cond, m_ranger);
> +}
> +
>m_solver->compute_ranges (path, m_imports);
>m_solver->range_of_stmt (r, cond);
>
> @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const
> vec<basic_block> &path,
>
>if (r == true_range || r == false_range)
>  {
> +  if (path.length () == 1)
> +   gcc_assert  (r == rf);
>edge e_true, e_false;
>basic_block bb = gimple_bb (cond);
>extract_true_false_edges_from_block (bb, &e_true, &e_false);
>
> one example is
>
>  [local count: 14414059]:
> _128 = node_177(D)->typed.type;
> pretmp_413 = MEM[(const union tree_node *)_128].base.code;
> _431 = pretmp_413 + 65519;
> if (_128 == 0B)
>   goto ; [18.09%]
> else
>   goto ; [81.91%]
>
> where m_imports for the path is just _128 and the range computed is
> false while the ranger query returns VARYING.  But
> path_range_query::range_defined_in_block does
>
>   if (bb && POINTER_TYPE_P (TREE_TYPE (name)))
> m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb);
>
> which adjusts the range to ~[0, 0], probably because of the
> dereference in the following stmt.
>
> Why does fold_range not do this when folding the exit test?  Is there
> a way to make it do so?  It looks like the only routine using this
> in gimple-range.cc is range_on_edge and there it's used for e->src
> after calling range_on_exit for e->src (why's it not done in
> range_on_exit?).  A testcase for this is

Andrew's gonna have to answer this one, because I'm just a user of the
infer_range infrastructure.  But yes, you're right... fold_range
doesn't seem to take into account side-effects such as non-null.

Aldy

>
> int foo (int **p, int i)
> {
>   int *q = *p;
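(Editorial illustration: the non-null side effect under discussion can be seen standalone in plain C-style code. This mirrors Richard's foo() testcase, not any ranger internals.)

```cpp
#include <cassert>

// Once *q executes, q is known non-null (the ~[0, 0] adjustment quoted
// above), so a query aware of that inference folds the later 'if (q)'
// to true and the -1 branch becomes unreachable.
inline int
foo (int **p, int i)
{
  int *q = *p;
  int res = *q + i;  // dereference: q != NULL may be assumed from here on
  if (q)             // foldable to true given the inference
    return res;
  return -1;
}
```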

[PATCH][pushed] VR: add more virtual dtors

2022-08-16 Thread Martin Liška
Likewise pushed as obvious.

Martin

Add 2 virtual destructors in order to address:

gcc/alloc-pool.h:522:5: warning: destructor called on non-final 
'value_range_equiv' that has virtual functions but non-virtual destructor 
[-Wdelete-non-abstract-non-virtual-dtor]
gcc/ggc.h:166:3: warning: destructor called on non-final 'int_range<1>' that 
has virtual functions but non-virtual destructor 
[-Wdelete-non-abstract-non-virtual-dtor]

gcc/ChangeLog:

* value-range-equiv.h (class value_range_equiv): Add virtual
  destructor.
* value-range.h: Likewise.
---
 gcc/value-range-equiv.h | 3 +++
 gcc/value-range.h   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/gcc/value-range-equiv.h b/gcc/value-range-equiv.h
index ad8c640b15b..1a8014df834 100644
--- a/gcc/value-range-equiv.h
+++ b/gcc/value-range-equiv.h
@@ -37,6 +37,9 @@ class GTY((user)) value_range_equiv : public value_range
   /* Shallow-copies equiv bitmap.  */
   value_range_equiv& operator=(const value_range_equiv &) /* = delete */;
 
+  /* Virtual destructor.  */
+  virtual ~value_range_equiv () = default;
+
   /* Move equiv bitmap from source range.  */
   void move (value_range_equiv *);
 
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 856947d23dd..f0075d0fb1a 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -228,6 +228,7 @@ public:
   int_range (tree type);
   int_range (const int_range &);
   int_range (const irange &);
+  virtual ~int_range () = default;
   int_range& operator= (const int_range &);
 private:
   template <unsigned X> friend void gt_ggc_mx (int_range<X> *);
-- 
2.37.1
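(Editorial note: the situation -Wdelete-non-abstract-non-virtual-dtor guards against can be reproduced in isolation; the class names below are hypothetical stand-ins for value_range/value_range_equiv.)

```cpp
#include <cassert>

// Minimal reproduction of the warning's concern: deleting a polymorphic
// object through a base pointer is only well defined when the base
// destructor is virtual.
static int g_destroyed = 0;

struct toy_range
{
  virtual void dump () {}
  virtual ~toy_range () = default;  // the fix: a virtual destructor
};

struct toy_range_equiv : toy_range
{
  ~toy_range_equiv () override { ++g_destroyed; }
};

inline void
destroy_via_base (toy_range *p)
{
  // Dispatches to ~toy_range_equiv only because ~toy_range is virtual.
  delete p;
}
```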



[PATCH][pushed] VR: mitigate -Wfinal-dtor-non-final-class clang warnings

2022-08-16 Thread Martin Liška
Pushed as obvious.

Martin

Fixes:

gcc/value-range-storage.h:129:40: warning: class with destructor marked 'final' 
cannot be inherited from [-Wfinal-dtor-non-final-class]
gcc/value-range-storage.h:146:36: warning: class with destructor marked 'final' 
cannot be inherited from [-Wfinal-dtor-non-final-class]

gcc/ChangeLog:

* value-range-storage.h (class obstack_vrange_allocator): Mark
  the class as final.
(class ggc_vrange_allocator): Likewise.
---
 gcc/value-range-storage.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/value-range-storage.h b/gcc/value-range-storage.h
index 3fac5ea2f86..9cd6b9f7bec 100644
--- a/gcc/value-range-storage.h
+++ b/gcc/value-range-storage.h
@@ -119,7 +119,7 @@ class GTY (()) frange_storage_slot
   frange_props m_props;
 };
 
-class obstack_vrange_allocator : public vrange_allocator
-class obstack_vrange_allocator : public vrange_allocator
+class obstack_vrange_allocator final : public vrange_allocator
 {
 public:
   obstack_vrange_allocator ()
@@ -139,7 +139,7 @@ private:
   obstack m_obstack;
 };
 
-class ggc_vrange_allocator : public vrange_allocator
-class ggc_vrange_allocator : public vrange_allocator
+class ggc_vrange_allocator final : public vrange_allocator
 {
 public:
   ggc_vrange_allocator () { }
-- 
2.37.1
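(Editorial note: a minimal model of the fix. When nothing may derive from a class, marking the whole class final, rather than only its destructor, expresses the intent and avoids -Wfinal-dtor-non-final-class. Names below are hypothetical stand-ins for the vrange allocator classes.)

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Abstract allocator interface plus one final, non-derivable concrete
// allocator, modeled loosely on the obstack-backed variant.
struct toy_allocator
{
  virtual void *alloc (std::size_t size) = 0;
  virtual ~toy_allocator () = default;
};

class toy_obstack_allocator final : public toy_allocator
{
  char m_buf[64];
  std::size_t m_used = 0;

public:
  void *alloc (std::size_t size) override
  {
    // Bump-pointer allocation out of a fixed buffer, obstack style.
    if (m_used + size > sizeof (m_buf))
      return nullptr;
    void *p = m_buf + m_used;
    m_used += size;
    return p;
  }
};
```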



Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, 16 Aug 2022, Aldy Hernandez wrote:

> On Mon, Aug 15, 2022 at 11:53 AM Richard Biener  wrote:
> >
> > On Thu, 11 Aug 2022, Aldy Hernandez wrote:
> >
> > > On Thu, Aug 11, 2022 at 3:59 PM Andrew MacLeod  
> > > wrote:
> > > >
> > > >
> > > > On 8/11/22 07:42, Richard Biener wrote:
> > > > > This avoids going BBs outside of the path when adding def chains
> > > > > to the set of imports.  It also syncs the code with
> > > > > range_def_chain::get_def_chain to not miss out on some imports
> > > > > this function would identify.
> > > > >
> > > > > Bootstrap / regtest pending on x86_64-unknown-linux-gnu.
> > > > >
> > > > > The question still stands on what the path_range_query::compute_ranges
> > > > > actually needs in its m_imports - at least I don't easily see how
> > > > > the range-folds will use the path range cache or be path sensitive
> > > > > at all.
> > > >
> > > > All the range folding code is in gimple_range_fold.{h,cc}, and it's
> > > > driven by the mystical FUR_source classes.  fur_source stands for
> > > > Fold_Using_Range source, and it's basically just an API class which all
> > > > the folding routines use to make queries.  It is used by all the fold
> > > > routines to ask any questions about valueizing relations, ssa names,
> > > > etc., but it abstracts the actual source of the information.  It's the
> > > > distillation of previous incarnations where I used to pass an edge, a
> > > > stmt and other stuff to each routine that it might need, and decided to
> > > > abstract since it was very unwieldy.  The base class requires only a
> > > > range_query which is then used for all queries.
> > >
> > > Note that not only is ranger and path_query a range_query, so is
> > > vr_values from legacy land.  It all shares the same API.  And the
> > > simplify_using_ranges class takes a range_query, so it can work with
> > > legacy or ranger, or even (untested) the path_query class.
> > >
> > > >
> > > > Then I derive fur_stmt which is instantiated additionally with the stmt
> > > > you wish to fold at, and it will perform queries using that stmt as the
> > > > context source..   Any requests for ranges/relations/etc will occur as
> > > > if that stmt location is the source.  If folding a particular stmt, you
> > > > use that stmt as the fur_stmt source.  This is also how I do
> > > > recalculations..  when we see
> > > > bb4:
> > > >a_32 = f_16 + 10
> > > > <...>
> > > > bb88:
> > > >if (f_16 < 20)
> > > >   b_8 = a_32 + 8
> > > > and there is sufficient reason to think that a_32 would have a different
> > > > value, we can invoke a re-fold of a_32's definition stmt at the use
> > > > point in b_8..  using that stmt as the fur_source. Ranger will take into
> > > > account the range of f_16 being [0,19] at that spot, and recalculate
> > > > a_32 as [10,29].  Its expensive to do this at every use point, so we
> > > > only do it if we think there is a good reason at this point.
> > > >
> > > > The point is that the fur_source mechanism is how we provide a context,
> > > > and that class takes care of the details of what the source actually
> > > > is.
> > > >
> > > > There are other fur_sources.. fur_edge allows all the same questions to
> > > > be answered, but using an edge as the source. Meaning we can calculate
> > > > an arbitrary stmt/expressions as if it occurs on an edge.
> > > >
> > > > There are also a couple of specialized fur_sources.. there is an
> > > > internal one in ranger which communicates some other information called
> > > > fur_depend which acts like range_of_stmt, but with additional
> > > > functionality to register dependencies in GORI as they are seen.
> > >
> > > This is a really good explanation.  I think you should save it and
> > > included it in the documentation when you/we get around to writing it
> > > ;-).
> > >
> > > >
> > > > Aldy overloads the fur_depend class (called jt_fur_source -- I'm not sure
> > > > of the origin of the name) to work with the values in the path_query
> > > > class.   You will note that the path_range_query class inherits from a
> > > > range_query, so it supports all the range_of_expr, range_of_stmt, and
> > > > range_on_edge aspect of rangers API.
> > >
> > > The name comes from "jump thread" fur_source.  I should probably
> > > rename that to path_fur_source.  Note that even though the full
> > > range_query API is available in path_range_query, only range_of_expr
> > > and range_of_stmt are supported (or tested).  As I mention in the
> > > comment for the class:
> > >
> > > // This class is a basic block path solver.  Given a set of BBs
> > > // indicating a path through the CFG, range_of_expr and range_of_stmt
> > > // will calculate the range of an SSA or STMT as if the BBs in the
> > > // path would have been executed in order.
> > >
> > > So using range_on_edge would probably give unexpected results, using
> > > stuff in the cache as it would appear at the end of the path, or some
> > > such.  We could definitely 
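(Editorial sketch: Andrew's description of the fur_source abstraction above can be distilled into a toy model. Nothing below is ranger code; it only illustrates "same fold routine, different context source", with all names invented.)

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Fold routines talk to one API object and never care whether the
// answers are global or contextual to a statement/edge.
struct toy_query
{
  std::map<std::string, int> vals;  // name -> known value (toy "range")
  int value_of (const std::string &name) { return vals.at (name); }
};

// Base source: answers straight from the query it was built with.
class toy_fur_source
{
protected:
  toy_query *m_query;

public:
  explicit toy_fur_source (toy_query *q) : m_query (q) {}
  virtual ~toy_fur_source () = default;
  virtual int get_operand (const std::string &name)
  {
    return m_query->value_of (name);
  }
};

// Edge source: same question, refined by edge-specific knowledge,
// e.g. the true edge of "if (f_16 < 20)" clamping f_16.
class toy_fur_edge : public toy_fur_source
{
  std::map<std::string, int> m_edge_vals;

public:
  toy_fur_edge (toy_query *q, std::map<std::string, int> edge_vals)
    : toy_fur_source (q), m_edge_vals (std::move (edge_vals)) {}
  int get_operand (const std::string &name) override
  {
    auto it = m_edge_vals.find (name);
    if (it != m_edge_vals.end ())
      return it->second;
    return toy_fur_source::get_operand (name);
  }
};

// A fold routine only sees the abstract source: recomputing
// "a_32 = f_16 + 10" through a different source yields a
// context-sensitive answer, as in the a_32/b_8 example above.
inline int
fold_add_10 (const std::string &op, toy_fur_source &src)
{
  return src.get_operand (op) + 10;
}
```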

Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, 16 Aug 2022, Aldy Hernandez wrote:

> On Tue, Aug 16, 2022 at 11:08 AM Aldy Hernandez  wrote:
> >
> > On Tue, Aug 16, 2022 at 10:32 AM Richard Biener  wrote:
> > >
> > > On Tue, 16 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > On Thu, Aug 11, 2022 at 1:42 PM Richard Biener  
> > > > wrote:
> > > >
> > > > > @@ -599,6 +592,30 @@ path_range_query::compute_imports (bitmap 
> > > > > imports, const vec<basic_block> &path)
> > > > > worklist.safe_push (arg);
> > > > > }
> > > > > }
> > > > > +  else if (gassign *ass = dyn_cast  (def_stmt))
> > > > > +   {
> > > > > + tree ssa[3];
> > > > > + if (range_op_handler (ass))
> > > > > +   {
> > > > > + ssa[0] = gimple_range_ssa_p (gimple_range_operand1 
> > > > > (ass));
> > > > > + ssa[1] = gimple_range_ssa_p (gimple_range_operand2 
> > > > > (ass));
> > > > > + ssa[2] = NULL_TREE;
> > > > > +   }
> > > > > + else if (gimple_assign_rhs_code (ass) == COND_EXPR)
> > > > > +   {
> > > > > + ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
> > > > > + ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
> > > > > + ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
> > > > > +   }
> > > > > + else
> > > > > +   continue;
> > > > > + for (unsigned j = 0; j < 3; ++j)
> > > > > +   {
> > > > > + tree rhs = ssa[j];
> > > > > + if (rhs && add_to_imports (rhs, imports))
> > > > > +   worklist.safe_push (rhs);
> > > > > +   }
> > > > > +   }
> > > >
> > > > We seem to have 3 copies of this code now: this one, the
> > > > threadbackward one, and the original one.
> > > >
> > > > Could we abstract this somehow?
> > >
> > > I've thought about this but didn't find any good solution since the
> > > use of the operands is always a bit different.  But I was wondering
> > > why/if the COND_EXPR special-casing is necessary, that is, why
> > > don't we have a range_op_handler for it and if we don't why
> > > do we care about it?
> >
> > I think it's because we don't have a range-op handler for COND_EXPR,
> > opting to handle the relational operators instead in range-ops.  We
> > have similar code in the folder:
> >
> >   if (range_op_handler (s))
> > res = range_of_range_op (r, s, src);
> >   else if (is_a <gphi *> (s))
> > res = range_of_phi (r, as_a <gphi *> (s), src);
> >   else if (is_a <gcall *> (s))
> > res = range_of_call (r, as_a <gcall *> (s), src);
> >   else if (is_a <gassign *> (s) && gimple_assign_rhs_code (s) == COND_EXPR)
> > res = range_of_cond_expr (r, as_a <gassign *> (s), src);
> >
> > Andrew, do you have any suggestions here?
> 
> Hmmm, so thinking about this, perhaps special casing it is the way to go ??

It looks like so.  Though a range_op_handler could, for
_1 = _2 ? _3 : _4; derive a range for _3 from _1 if _2 is
known true?
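(Editorial sketch of the idea just raised: a toy interval model of `_1 = _2 ? _3 : _4` with a forward fold plus the backward derivation suggested above, where a known-true `_2` lets `_1`'s range refine the true arm. This is not the irange/range-op API.)

```cpp
#include <algorithm>
#include <cassert>

// A trivial closed interval standing in for an integer range.
struct toy_range
{
  int lo, hi;
  bool operator== (const toy_range &o) const
  { return lo == o.lo && hi == o.hi; }
};

enum toy_bool { B_FALSE, B_TRUE, B_VARYING };

// Forward: range of _1 from _2 (cond), _3 (true arm) and _4 (false arm).
inline toy_range
fold_cond_expr (toy_bool cond, toy_range op2, toy_range op3)
{
  if (cond == B_TRUE)
    return op2;
  if (cond == B_FALSE)
    return op3;
  // Unknown condition: union of the arms.
  return { std::min (op2.lo, op3.lo), std::max (op2.hi, op3.hi) };
}

// Backward: if _2 is known true then _1 == _3 on this path, so _1's
// range can be intersected into the true arm's range.
inline toy_range
derive_true_arm (toy_bool cond, toy_range lhs, toy_range arm_old)
{
  if (cond != B_TRUE)
    return arm_old;  // nothing new can be said about the arm
  return { std::max (lhs.lo, arm_old.lo), std::min (lhs.hi, arm_old.hi) };
}
```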


[PATCH] middle-end/106630 - avoid ping-pong between extract_muldiv and match.pd

2022-08-16 Thread Richard Biener via Gcc-patches
The following avoids ping-pong between the match.pd pattern changing
(sizetype) ((a_9 + 1) * 48) to (sizetype)(a_9 + 1) * 48 and
extract_muldiv performing the reverse transform by restricting the
match.pd pattern to narrowing conversions as the comment indicates.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR middle-end/106630
* match.pd ((T)(x * CST) -> (T)x * CST): Restrict to
narrowing conversions.

* gcc.dg/torture/pr106630.c: New testcase.
---
 gcc/match.pd|  2 +-
 gcc/testsuite/gcc.dg/torture/pr106630.c | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr106630.c

diff --git a/gcc/match.pd b/gcc/match.pd
index e32bda64e64..07d0a61fc3a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1917,7 +1917,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
  (if (INTEGRAL_TYPE_P (type)
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
-  && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))
   (mult (convert @1) (convert @2))))
 
 /* (X << C) != 0 can be simplified to X, when C is zero_one_valued_p.
diff --git a/gcc/testsuite/gcc.dg/torture/pr106630.c 
b/gcc/testsuite/gcc.dg/torture/pr106630.c
new file mode 100644
index 000..d608b9151be
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr106630.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+
+short d, e;
+int f;
+extern short g[][24];
+char c;
+void h() {
+  char a = 6;
+  c = a;
+  for (unsigned long a = (d || e) - 1; a < c; a += f)
+for (signed b = 0; b < 24; b++)
+  g[a][b] = 4;
+}
-- 
2.35.3
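(Editorial check: the identity the restricted pattern relies on, `(T)(b * C) == (T)b * (T)C` for a zero-or-one value `b` and a narrowing conversion to `T`, can be verified exhaustively in a standalone program; `int8_t`/`int32_t` stand in for the GIMPLE types.)

```cpp
#include <cassert>
#include <cstdint>

// Restricting the rewrite to genuinely narrowing conversions keeps
// extract_muldiv from performing the reverse transform, which is what
// caused the ping-pong in PR middle-end/106630.
inline int8_t
fold_before (int32_t b, int32_t c)
{
  // (T)(b * C) with T narrower than the computation type.
  return static_cast<int8_t> (b * c);
}

inline int8_t
fold_after (int32_t b, int32_t c)
{
  // (T)b * (T)C, the form the match.pd pattern produces.
  return static_cast<int8_t> (static_cast<int8_t> (b)
                              * static_cast<int8_t> (c));
}
```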


Re: [PATCH] Support threading of just the exit edge

2022-08-16 Thread Richard Biener via Gcc-patches
On Mon, 15 Aug 2022, Aldy Hernandez wrote:

> On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod  wrote:
> >
> > heh. or just
> >
> >
> > +  int_range<2> r;
> > +  if (!fold_range (r, const_cast <gcond *> (cond_stmt))
> > +  || !r.singleton_p ())
> >
> >
> > if you do not provide a range_query to any of the fold_using_range code,
> > it defaults to:
> >
> > fur_source::fur_source (range_query *q)
> > {
> >if (q)
> >  m_query = q;
> >else if (cfun)
> >  m_query = get_range_query (cfun);
> >else
> >  m_query = get_global_range_query ();
> >m_gori = NULL;
> > }
> >
> 
> Sweet.  Even better!

So when I do the following incremental change on top of the posted
patch then I see that the path query is able to simplify more
"single BB paths" than the global range folding.

diff --git a/gcc/tree-ssa-threadbackward.cc 
b/gcc/tree-ssa-threadbackward.cc
index 669098e4ec3..777e778037f 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const 
vec<basic_block> &path,
 {
   int_range_max r;
 
+  int_range<2> rf;
+  if (path.length () == 1)
+{
+  fold_range (rf, cond);
+}
+
   m_solver->compute_ranges (path, m_imports);
   m_solver->range_of_stmt (r, cond);
 
@@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const 
vec<basic_block> &path,
 
   if (r == true_range || r == false_range)
 {
+  if (path.length () == 1)
+   gcc_assert  (r == rf);
   edge e_true, e_false;
   basic_block bb = gimple_bb (cond);
   extract_true_false_edges_from_block (bb, &e_true, &e_false);

Even doing the following (not sure what's the difference and in
particular expense over the path range query) results in missed
simplifications (checking my set of cc1files).

diff --git a/gcc/tree-ssa-threadbackward.cc 
b/gcc/tree-ssa-threadbackward.cc
index 669098e4ec3..1d43a179d08 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -99,6 +99,7 @@ private:
 
   back_threader_registry m_registry;
   back_threader_profitability m_profit;
+  gimple_ranger *m_ranger;
   path_range_query *m_solver;
 
   // Current path being analyzed.
@@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, 
unsigned flags, bool first)
   // The path solver needs EDGE_DFS_BACK in resolving mode.
   if (flags & BT_RESOLVE)
 mark_dfs_back_edges ();
-  m_solver = new path_range_query (flags & BT_RESOLVE);
+  m_ranger = new gimple_ranger;
+  m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger);
 }
 
 back_threader::~back_threader ()
 {
   delete m_solver;
+  delete m_ranger;
 
   loop_optimizer_finalize ();
 }
@@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const 
vec<basic_block> &path,
 {
   int_range_max r;
 
+  int_range<2> rf;
+  if (path.length () == 1)
+{
+  fold_range (rf, cond, m_ranger);
+}
+
   m_solver->compute_ranges (path, m_imports);
   m_solver->range_of_stmt (r, cond);
 
@@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const 
vec<basic_block> &path,
 
   if (r == true_range || r == false_range)
 {
+  if (path.length () == 1)
+   gcc_assert  (r == rf);
   edge e_true, e_false;
   basic_block bb = gimple_bb (cond);
   extract_true_false_edges_from_block (bb, &e_true, &e_false);

one example is

 [local count: 14414059]:
_128 = node_177(D)->typed.type;
pretmp_413 = MEM[(const union tree_node *)_128].base.code;
_431 = pretmp_413 + 65519;
if (_128 == 0B)
  goto ; [18.09%]
else
  goto ; [81.91%]

where m_imports for the path is just _128 and the range computed is
false while the ranger query returns VARYING.  But
path_range_query::range_defined_in_block does

  if (bb && POINTER_TYPE_P (TREE_TYPE (name)))
m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb);

which adjusts the range to ~[0, 0], probably because of the
dereference in the following stmt.

Why does fold_range not do this when folding the exit test?  Is there
a way to make it do so?  It looks like the only routine using this
in gimple-range.cc is range_on_edge and there it's used for e->src
after calling range_on_exit for e->src (why's it not done in
range_on_exit?).  A testcase for this is

int foo (int **p, int i)
{
  int *q = *p;
  int res = *q + i;
  if (q)
return res;
  return -1;
}

which we "thread" with the path, and which with the above change ICEs
because fold_range doesn't get that if (q) is always true.  Without the
patch ethread doesn't want to duplicate the block (it's too large)
but threadfull will if you disable evrp (if you remove the increment
by 'i' it again won't since nothing interesting prevails and it
won't go to BB 0 and fails to pick up a thread of length > 1):

Checking profitability of path (backwards):  bb:2 (6 insns) bb:0
  Control statement insns: 2
  Overall: 4 insns
  [1] Registering jump thread: (0, 2) incoming edge;  (2, 3) nocopy;
path: 0->2->3 SUCCESS
Removing basic block 2
;; basic block 2, loop depth 0
;;  pred:
_1 = *p_6(D);
_2 = (long unsigned int) n_7(D);
_3 = _2 * 4;
q_8 = _1 + _3;
res_9 

Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 16, 2022 at 11:08 AM Aldy Hernandez  wrote:
>
> On Tue, Aug 16, 2022 at 10:32 AM Richard Biener  wrote:
> >
> > On Tue, 16 Aug 2022, Aldy Hernandez wrote:
> >
> > > On Thu, Aug 11, 2022 at 1:42 PM Richard Biener  wrote:
> > >
> > > > @@ -599,6 +592,30 @@ path_range_query::compute_imports (bitmap imports, 
> > > > const vec<basic_block> &path)
> > > > worklist.safe_push (arg);
> > > > }
> > > > }
> > > > +  else if (gassign *ass = dyn_cast <gassign *> (def_stmt))
> > > > +   {
> > > > + tree ssa[3];
> > > > + if (range_op_handler (ass))
> > > > +   {
> > > > + ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
> > > > + ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
> > > > + ssa[2] = NULL_TREE;
> > > > +   }
> > > > + else if (gimple_assign_rhs_code (ass) == COND_EXPR)
> > > > +   {
> > > > + ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
> > > > + ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
> > > > + ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
> > > > +   }
> > > > + else
> > > > +   continue;
> > > > + for (unsigned j = 0; j < 3; ++j)
> > > > +   {
> > > > + tree rhs = ssa[j];
> > > > + if (rhs && add_to_imports (rhs, imports))
> > > > +   worklist.safe_push (rhs);
> > > > +   }
> > > > +   }
> > >
> > > We seem to have 3 copies of this copy now: this one, the
> > > threadbackward one, and the original one.
> > >
> > > Could we abstract this somehow?
> >
> > I've thought about this but didn't find any good solution since the
> > use of the operands is always a bit different.  But I was wondering
> > why/if the COND_EXPR special-casing is necessary, that is, why
> > don't we have a range_op_handler for it and if we don't why
> > do we care about it?
>
> I think it's because we don't have a range-op handler for COND_EXPR,
> opting to handle the relational operators instead in range-ops.  We
> have similar code in the folder:
>
>   if (range_op_handler (s))
>     res = range_of_range_op (r, s, src);
>   else if (is_a <gphi *> (s))
>     res = range_of_phi (r, as_a <gphi *> (s), src);
>   else if (is_a <gcall *> (s))
>     res = range_of_call (r, as_a <gcall *> (s), src);
>   else if (is_a <gassign *> (s) && gimple_assign_rhs_code (s) == COND_EXPR)
>     res = range_of_cond_expr (r, as_a <gassign *> (s), src);
>
> Andrew, do you have any suggestions here?

Hmmm, so thinking about this, perhaps special-casing it is the way to go?



Re: [PATCH][pushed] VR: add missing override keywords

2022-08-16 Thread Aldy Hernandez via Gcc-patches
Thanks.

On Tue, Aug 16, 2022 at 11:07 AM Martin Liška  wrote:
>
> Pushing as it follows the same pattern as:
>
>   virtual void set (tree, tree, value_range_kind = VR_RANGE) override;
>
> Martin
>
> Address:
>
> gcc/value-range-equiv.h:57:8: warning: 'set_undefined' overrides a member 
> function but is not marked 'override' [-Winconsistent-missing-override]
> gcc/value-range-equiv.h:58:8: warning: 'set_varying' overrides a member 
> function but is not marked 'override' [-Winconsistent-missing-override]
>
> gcc/ChangeLog:
>
> * value-range-equiv.h (class value_range_equiv):
> ---
>  gcc/value-range-equiv.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/value-range-equiv.h b/gcc/value-range-equiv.h
> index 0a52d1372a1..ad8c640b15b 100644
> --- a/gcc/value-range-equiv.h
> +++ b/gcc/value-range-equiv.h
> @@ -54,8 +54,8 @@ class GTY((user)) value_range_equiv : public value_range
>bool equal_p (const value_range_equiv &, bool ignore_equivs) const;
>
>/* Types of value ranges.  */
> -  void set_undefined ();
> -  void set_varying (tree);
> +  void set_undefined () override;
> +  void set_varying (tree) override;
>
>/* Equivalence bitmap methods.  */
>bitmap equiv () const { return m_equiv; }
> --
> 2.37.1
>



Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Tue, Aug 16, 2022 at 10:32 AM Richard Biener  wrote:
>
> On Tue, 16 Aug 2022, Aldy Hernandez wrote:
>
> > On Thu, Aug 11, 2022 at 1:42 PM Richard Biener  wrote:
> >
> > > @@ -599,6 +592,30 @@ path_range_query::compute_imports (bitmap imports, 
> > > const vec<basic_block> &path)
> > > worklist.safe_push (arg);
> > > }
> > > }
> > > +  else if (gassign *ass = dyn_cast <gassign *> (def_stmt))
> > > +   {
> > > + tree ssa[3];
> > > + if (range_op_handler (ass))
> > > +   {
> > > + ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
> > > + ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
> > > + ssa[2] = NULL_TREE;
> > > +   }
> > > + else if (gimple_assign_rhs_code (ass) == COND_EXPR)
> > > +   {
> > > + ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
> > > + ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
> > > + ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
> > > +   }
> > > + else
> > > +   continue;
> > > + for (unsigned j = 0; j < 3; ++j)
> > > +   {
> > > + tree rhs = ssa[j];
> > > + if (rhs && add_to_imports (rhs, imports))
> > > +   worklist.safe_push (rhs);
> > > +   }
> > > +   }
> >
> > We seem to have 3 copies of this copy now: this one, the
> > threadbackward one, and the original one.
> >
> > Could we abstract this somehow?
>
> I've thought about this but didn't find any good solution since the
> use of the operands is always a bit different.  But I was wondering
> why/if the COND_EXPR special-casing is necessary, that is, why
> don't we have a range_op_handler for it and if we don't why
> do we care about it?

I think it's because we don't have a range-op handler for COND_EXPR,
opting to handle the relational operators instead in range-ops.  We
have similar code in the folder:

  if (range_op_handler (s))
    res = range_of_range_op (r, s, src);
  else if (is_a <gphi *> (s))
    res = range_of_phi (r, as_a <gphi *> (s), src);
  else if (is_a <gcall *> (s))
    res = range_of_call (r, as_a <gcall *> (s), src);
  else if (is_a <gassign *> (s) && gimple_assign_rhs_code (s) == COND_EXPR)
    res = range_of_cond_expr (r, as_a <gassign *> (s), src);

Andrew, do you have any suggestions here?

Aldy



Re: [PATCH] LoongArch: Provide fmin/fmax RTL pattern

2022-08-16 Thread Lulu Cheng

Looks good to me.


On 2022/8/16 at 4:08 PM, Xi Ruoyao wrote:

A simple optimization.  Ok for trunk?

-- >8 --

We already had smin/smax RTL pattern using fmin/fmax instruction.  But
for smin/smax, it's unspecified what will happen if either operand is
NaN.  So we would generate calls to libc fmin/fmax functions with
-fno-finite-math-only (the default for all optimization levels except
-Ofast).

But, LoongArch fmin/fmax instruction is IEEE-754-2008 conformant so we
can also use the instruction for fmin/fmax pattern and avoid the library
function call.

gcc/ChangeLog:

* config/loongarch/loongarch.md (fmax<mode>3): New RTL pattern.
(fmin<mode>3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/fmax-fmin.c: New test.
---
  gcc/config/loongarch/loongarch.md | 18 +++
  .../gcc.target/loongarch/fmax-fmin.c  | 30 +++
  2 files changed, 48 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/fmax-fmin.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 6b6df22a5f1..8e8868de9f5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1023,6 +1023,24 @@ (define_insn "smin<mode>3"
[(set_attr "type" "fmove")
 (set_attr "mode" "<MODE>")])
  
+(define_insn "fmax<mode>3"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+   (smax:ANYF (match_operand:ANYF 1 "register_operand" "f")
+  (match_operand:ANYF 2 "register_operand" "f")))]
+  ""
+  "fmax.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmove")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fmin<mode>3"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+   (smin:ANYF (match_operand:ANYF 1 "register_operand" "f")
+  (match_operand:ANYF 2 "register_operand" "f")))]
+  ""
+  "fmin.<fmt>\t%0,%1,%2"
+  [(set_attr "type" "fmove")
+   (set_attr "mode" "<MODE>")])
+
  (define_insn "smaxa<mode>3"
[(set (match_operand:ANYF 0 "register_operand" "=f")
(if_then_else:ANYF
diff --git a/gcc/testsuite/gcc.target/loongarch/fmax-fmin.c 
b/gcc/testsuite/gcc.target/loongarch/fmax-fmin.c
new file mode 100644
index 000..92cf8a1501d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/fmax-fmin.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mdouble-float -fno-finite-math-only" } */
+/* { dg-final { scan-assembler "fmin\\.s" } } */
+/* { dg-final { scan-assembler "fmin\\.d" } } */
+/* { dg-final { scan-assembler "fmax\\.s" } } */
+/* { dg-final { scan-assembler "fmax\\.d" } } */
+
+double
+_fmax(double a, double b)
+{
+  return __builtin_fmax(a, b);
+}
+
+float
+_fmaxf(float a, float b)
+{
+  return __builtin_fmaxf(a, b);
+}
+
+double
+_fmin(double a, double b)
+{
+  return __builtin_fmin(a, b);
+}
+
+float
+_fminf(float a, float b)
+{
+  return __builtin_fminf(a, b);
+}




[PATCH][pushed] VR: add missing override keywords

2022-08-16 Thread Martin Liška
Pushing as it follows the same pattern as:

  virtual void set (tree, tree, value_range_kind = VR_RANGE) override;

Martin

Address:

gcc/value-range-equiv.h:57:8: warning: 'set_undefined' overrides a member 
function but is not marked 'override' [-Winconsistent-missing-override]
gcc/value-range-equiv.h:58:8: warning: 'set_varying' overrides a member 
function but is not marked 'override' [-Winconsistent-missing-override]

gcc/ChangeLog:

* value-range-equiv.h (class value_range_equiv):
---
 gcc/value-range-equiv.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/value-range-equiv.h b/gcc/value-range-equiv.h
index 0a52d1372a1..ad8c640b15b 100644
--- a/gcc/value-range-equiv.h
+++ b/gcc/value-range-equiv.h
@@ -54,8 +54,8 @@ class GTY((user)) value_range_equiv : public value_range
   bool equal_p (const value_range_equiv &, bool ignore_equivs) const;
 
   /* Types of value ranges.  */
-  void set_undefined ();
-  void set_varying (tree);
+  void set_undefined () override;
+  void set_varying (tree) override;
 
   /* Equivalence bitmap methods.  */
   bitmap equiv () const { return m_equiv; }
-- 
2.37.1



Re: [x86 PATCH] PR target/106577: force_reg may clobber operands during split.

2022-08-16 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Aug 16, 2022 at 10:14 AM Richard Sandiford
>  wrote:
>>
>> Richard Biener via Gcc-patches  writes:
>> > On Fri, Aug 12, 2022 at 10:41 PM Roger Sayle  
>> > wrote:
>> >>
>> >>
>> >> This patch fixes PR target/106577 which is a recent ICE on valid 
>> >> regression
>> >> caused by my introduction of a *testti_doubleword pre-reload splitter in
>> >> i386.md.  During the split pass before reload, this converts the virtual
>> >> *testti_doubleword into an *andti3_doubleword and *cmpti_doubleword,
>> >> checking that any immediate operand is a valid 
>> >> "x86_64_hilo_general_operand"
>> >> and placing it into a TImode register using force_reg if it isn't.
>> >>
>> >> The unexpected behaviour (that caught me out) is that calling force_reg
>> >> may occasionally clobber the contents of the global operands array, or
>> >> more accurately recog_data.operand[0], which means that by the time
>> >> split_XXX calls gen_split_YYY the replacement insn's operands have been
>> >> corrupted.
>> >>
>> >> It's difficult to tell who (if anyone) is at fault.  The re-entrant
>> >> stack trace (for the attached PR) looks like:
>> >>
>> >> gen_split_203 (*testti_doubleword) calls
>> >> force_reg calls
>> >> emit_move_insn calls
>> >> emit_move_insn_1 calls
>> >> gen_movti calls
>> >> ix86_expand_move calls
>> >> ix86_convert_const_wide_int_to_broadcast calls
>> >> ix86_vector_duplicate_value calls
>> >> recog_memoized calls
>> >> recog.
>> >>
>> >> By far the simplest and possibly correct fix is rather than attempt
>> >> to push and pop recog_data, to simply (in pre-reload splits) save a
>> >> copy of any operands that will be needed after force_reg, and use
>> >> these copies afterwards.  Many pre-reload splitters avoid this issue
>> >> using "[(clobber (const_int 0))]" and so avoid gen_split_YYY functions,
>> >> but in our case we still need to save a copy of operands[0] (even if we
>> >> call emit_insn or expand_* ourselves), so we might as well continue to
>> >> use the conveniently generated gen_split.
>> >>
>> >> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> >> and make -k check, both with and without --target_board=unix{-m32},
>> >> with no new failures. Ok for mainline?
>> >
>> > While this obviously fixes the issue seen, I wonder whether there's
>> > more of recog_data that might be used after control flow returns
>> > to recog_memoized and thus the fix would be there, not in any
>> > backend pattern triggering the issue like this?
>> >
>> > The "easiest" fix would maybe to add a in_recog flag and
>> > simply return FAIL from recog when recursing.  Not sure what
>> > the effect on this particular pattern would be though?
>> >
>> > The better(?) fix might be to push/pop recog_data in 'recog', but
>> > of course given that recog_data is currently a global, leakage
>> > in intermediate code can still happen.
>> >
>> > That said - does anybody know of similar fixes for this issue in other
>> > backends patterns?
>>
>> I don't think it's valid for a simple query function like
>> ix86_vector_duplicate_value to clobber global state.  Doing that
>> could cause problems in other situations, not just splits.
>>
>> Ideally, it would be good to wean insn-recog.cc:recog off global state.
>> The only parts of recog_data it uses (if I didn't miss something)
>> are recog_data.operands and recog_data.insn (but only to nullify
>> it for recog_memoized, which wouldn't be necessary if recog didn't
>> clobber recog_data.operands).  But I guess some .md expand/insn
>> conditions probably rely on the operands array being in recog_data,
>> so that might not be easy.
>>
>> IMO the correct low-effort fix is to save and restore recog_data
>> in ix86_vector_duplicate_value.  It's a relatively big copy,
>> but the current code is pretty wasteful anyway (allocating at
>> least a new SET and INSN for every query).  Compared to the
>> overhead of doing that, a copy to and from the stack shouldn't
>> be too bad.
>
> I see.  I wonder if we should at least add some public API for
> save/restore of recog_data so the many places don't need to
> invent their own version and they are easier to find later.

Plain assignment should work.  The structure isn't very fancy ;-)

> Maybe some RAII
>
> {
>   push_recog_data saved ();
>
> }
>
> ?

Maybe.  But if we're going to spend effort on something, moving away
from the global state seems better IMO.

> Shall we armor recog () for recursive invocation by adding a
> ->in_recog member to recog_data?

Yeah, we could do that, but it wouldn't catch the current bug.

Thanks,
Richard


[PATCH][pushed] analyzer: add more final override keywords

2022-08-16 Thread Martin Liška
Pushed as obvious.

Martin

gcc/analyzer/ChangeLog:

* region-model.cc: Fix -Winconsistent-missing-override clang
  warning.
* region.h: Likewise.
---
 gcc/analyzer/region-model.cc | 4 ++--
 gcc/analyzer/region.h| 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index b05b7097c00..b5bc3efda32 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3425,7 +3425,7 @@ public:
 result_set.add (sval);
   }
 
-  void visit_unaryop_svalue (const unaryop_svalue *sval)
+  void visit_unaryop_svalue (const unaryop_svalue *sval) final override
   {
 const svalue *arg = sval->get_arg ();
 if (result_set.contains (arg))
@@ -3449,7 +3449,7 @@ public:
   }
   }
 
-  void visit_repeated_svalue (const repeated_svalue *sval)
+  void visit_repeated_svalue (const repeated_svalue *sval) final override
   {
 sval->get_inner_svalue ()->accept (this);
 if (result_set.contains (sval->get_inner_svalue ()))
diff --git a/gcc/analyzer/region.h b/gcc/analyzer/region.h
index 20dffc7f577..d37584b7285 100644
--- a/gcc/analyzer/region.h
+++ b/gcc/analyzer/region.h
@@ -919,7 +919,8 @@ public:
   const svalue *get_byte_offset () const { return m_byte_offset; }
 
   bool get_relative_concrete_offset (bit_offset_t *out) const final override;
-  const svalue * get_byte_size_sval (region_model_manager *mgr) const;
+  const svalue * get_byte_size_sval (region_model_manager *mgr)
+    const final override;
 
 
 private:
-- 
2.37.1



[PATCH][pushed] i386: add 'final' and 'override' to scalar_chain

2022-08-16 Thread Martin Liška
In c3ed9e0d6e96d8697e4bab994f8acbc5506240ee, David added some
"final override" and since that there are 2 new warnings that
need the same treatment:

gcc/config/i386/i386-features.h:186:8: warning: 'convert_op' overrides a member 
function but is not marked 'override' [-Winconsistent-missing-override]
gcc/config/i386/i386-features.h:186:8: warning: 'convert_op' overrides a member 
function but is not marked 'override' [-Winconsistent-missing-override]
gcc/config/i386/i386-features.h:199:8: warning: 'convert_op' overrides a member 
function but is not marked 'override' [-Winconsistent-missing-override]
gcc/config/i386/i386-features.h:199:8: warning: 'convert_op' overrides a member 
function but is not marked 'override' [-Winconsistent-missing-override]

gcc/ChangeLog:

* config/i386/i386-features.h (class general_scalar_chain): Add
  final override for a method.
(class timode_scalar_chain): Likewise.
---
 gcc/config/i386/i386-features.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386-features.h b/gcc/config/i386/i386-features.h
index 3d88a88e014..f898e67a108 100644
--- a/gcc/config/i386/i386-features.h
+++ b/gcc/config/i386/i386-features.h
@@ -183,7 +183,7 @@ class general_scalar_chain : public scalar_chain
 
  private:
   void convert_insn (rtx_insn *insn) final override;
-  void convert_op (rtx *op, rtx_insn *insn);
+  void convert_op (rtx *op, rtx_insn *insn) final override;
   int vector_const_cost (rtx exp);
 };
 
@@ -196,7 +196,7 @@ class timode_scalar_chain : public scalar_chain
  private:
   void fix_debug_reg_uses (rtx reg);
   void convert_insn (rtx_insn *insn) final override;
-  void convert_op (rtx *op, rtx_insn *insn);
+  void convert_op (rtx *op, rtx_insn *insn) final override;
 };
 
 } // anon namespace
-- 
2.37.1



Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Mon, Aug 15, 2022 at 11:53 AM Richard Biener  wrote:
>
> On Thu, 11 Aug 2022, Aldy Hernandez wrote:
>
> > On Thu, Aug 11, 2022 at 3:59 PM Andrew MacLeod  wrote:
> > >
> > >
> > > On 8/11/22 07:42, Richard Biener wrote:
> > > > This avoids going BBs outside of the path when adding def chains
> > > > to the set of imports.  It also syncs the code with
> > > > range_def_chain::get_def_chain to not miss out on some imports
> > > > this function would identify.
> > > >
> > > > Bootstrap / regtest pending on x86_64-unknown-linux-gnu.
> > > >
> > > > The question still stands on what the path_range_query::compute_ranges
> > > > actually needs in its m_imports - at least I don't easily see how
> > > > the range-folds will use the path range cache or be path sensitive
> > > > at all.
> > >
> > > All the range folding code is in gimple-range-fold.{h,cc}, and it's
> > > driven by the mystical FUR_source classes.  fur_source stands for
> > > Fold_Using_Range source, and it's basically just an API class which all
> > > the folding routines use to make queries.  It is used by all the fold
> > > routines to ask any questions about valueizing relations, SSA names,
> > > etc., but abstracts the actual source of the information.  It's the
> > > distillation of previous incarnations where I used to pass an edge, a
> > > stmt, and other stuff to each routine that it might need, and decided to
> > > abstract since it was very unwieldy.  The base class requires only a
> > > range_query which is then used for all queries.
> >
> > Note that not only is ranger and path_query a range_query, so is
> > vr_values from legacy land.  It all shares the same API.  And the
> > simplify_using_ranges class takes a range_query, so it can work with
> > legacy or ranger, or even (untested) the path_query class.
> >
> > >
> > > Then I derive fur_stmt which is instantiated additionally with the stmt
> > > you wish to fold at, and it will perform queries using that stmt as the
> > > context source..   Any requests for ranges/relations/etc will occur as
> > > if that stmt location is the source.  If folding a particular stmt, you
> > > use that stmt as the fur_stmt source.  This is also how I do
> > > recalculations..  when we see
> > > bb4:
> > >a_32 = f_16 + 10
> > > <...>
> > > bb88:
> > >if (f_16 < 20)
> > >   b_8 = a_32 + 8
> > > and there is sufficient reason to think that a_32 would have a different
> > > value, we can invoke a re-fold of a_32's definition stmt at the use
> > > point in b_8, using that stmt as the fur_source.  Ranger will take into
> > > account the range of f_16 being [0,19] at that spot, and recalculate
> > > a_32 as [10,29].  Its expensive to do this at every use point, so we
> > > only do it if we think there is a good reason at this point.
> > >
> > > The point is that the fur_source mechanism is how we provide a context,
> > > and that class takes care of the details of what the source actually is.
> > >
> > > There are other fur_sources.. fur_edge allows all the same questions to
> > > be answered, but using an edge as the source. Meaning we can calculate
> > > an arbitrary stmt/expressions as if it occurs on an edge.
> > >
> > > There are also a couple of specialized fur_sources.. there is an
> > > internal one in ranger which communicates some other information called
> > > fur_depend which acts like range_of_stmt, but with additional
> > > functionality to register dependencies in GORI as they are seen.
> >
> > This is a really good explanation.  I think you should save it and
> > included it in the documentation when you/we get around to writing it
> > ;-).
> >
> > >
> > > Aldy overloads the fur_depend class (called jt_fur_source -- I'm not sure
> > > of the origin of the name) to work with the values in the path_query
> > > class.   You will note that the path_range_query class inherits from a
> > > range_query, so it supports all the range_of_expr, range_of_stmt, and
> > > range_on_edge aspect of rangers API.
> >
> > The name comes from "jump thread" fur_source.  I should probably
> > rename that to path_fur_source.  Note that even though the full
> > range_query API is available in path_range_query, only range_of_expr
> > and range_of_stmt are supported (or tested).  As I mention in the
> > comment for the class:
> >
> > // This class is a basic block path solver.  Given a set of BBs
> > // indicating a path through the CFG, range_of_expr and range_of_stmt
> > // will calculate the range of an SSA or STMT as if the BBs in the
> > // path would have been executed in order.
> >
> > So using range_on_edge would probably give unexpected results, using
> > stuff in the cache as it would appear at the end of the path, or some
> > such.  We could definitely harden this class and make it work solidly
> > across the entire API, but we've had no uses so far for anything but
> > range_of_expr and range_of_stmt-- and even those are only supported
> > for a range as it would appear at the end of the path.  

Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, 16 Aug 2022, Aldy Hernandez wrote:

> On Thu, Aug 11, 2022 at 1:42 PM Richard Biener  wrote:
> 
> > @@ -599,6 +592,30 @@ path_range_query::compute_imports (bitmap imports, 
> > const vec<basic_block> &path)
> > worklist.safe_push (arg);
> > }
> > }
> > +  else if (gassign *ass = dyn_cast <gassign *> (def_stmt))
> > +   {
> > + tree ssa[3];
> > + if (range_op_handler (ass))
> > +   {
> > + ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
> > + ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
> > + ssa[2] = NULL_TREE;
> > +   }
> > + else if (gimple_assign_rhs_code (ass) == COND_EXPR)
> > +   {
> > + ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
> > + ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
> > + ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
> > +   }
> > + else
> > +   continue;
> > + for (unsigned j = 0; j < 3; ++j)
> > +   {
> > + tree rhs = ssa[j];
> > + if (rhs && add_to_imports (rhs, imports))
> > +   worklist.safe_push (rhs);
> > +   }
> > +   }
> 
> We seem to have 3 copies of this copy now: this one, the
> threadbackward one, and the original one.
> 
> Could we abstract this somehow?

I've thought about this but didn't find any good solution since the
use of the operands is always a bit different.  But I was wondering
why/if the COND_EXPR special-casing is necessary, that is, why
don't we have a range_op_handler for it and if we don't why
do we care about it?

Richard.


Re: [x86 PATCH] PR target/106577: force_reg may clobber operands during split.

2022-08-16 Thread Richard Biener via Gcc-patches
On Tue, Aug 16, 2022 at 10:14 AM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Fri, Aug 12, 2022 at 10:41 PM Roger Sayle  
> > wrote:
> >>
> >>
> >> This patch fixes PR target/106577 which is a recent ICE on valid regression
> >> caused by my introduction of a *testti_doubleword pre-reload splitter in
> >> i386.md.  During the split pass before reload, this converts the virtual
> >> *testti_doubleword into an *andti3_doubleword and *cmpti_doubleword,
> >> checking that any immediate operand is a valid 
> >> "x86_64_hilo_general_operand"
> >> and placing it into a TImode register using force_reg if it isn't.
> >>
> >> The unexpected behaviour (that caught me out) is that calling force_reg
> >> may occasionally clobber the contents of the global operands array, or
> >> more accurately recog_data.operand[0], which means that by the time
> >> split_XXX calls gen_split_YYY the replacement insn's operands have been
> >> corrupted.
> >>
> >> It's difficult to tell who (if anyone) is at fault.  The re-entrant
> >> stack trace (for the attached PR) looks like:
> >>
> >> gen_split_203 (*testti_doubleword) calls
> >> force_reg calls
> >> emit_move_insn calls
> >> emit_move_insn_1 calls
> >> gen_movti calls
> >> ix86_expand_move calls
> >> ix86_convert_const_wide_int_to_broadcast calls
> >> ix86_vector_duplicate_value calls
> >> recog_memoized calls
> >> recog.
> >>
> >> By far the simplest and possibly correct fix is rather than attempt
> >> to push and pop recog_data, to simply (in pre-reload splits) save a
> >> copy of any operands that will be needed after force_reg, and use
> >> these copies afterwards.  Many pre-reload splitters avoid this issue
> >> using "[(clobber (const_int 0))]" and so avoid gen_split_YYY functions,
> >> but in our case we still need to save a copy of operands[0] (even if we
> >> call emit_insn or expand_* ourselves), so we might as well continue to
> >> use the conveniently generated gen_split.
> >>
> >> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> >> and make -k check, both with and without --target_board=unix{-m32},
> >> with no new failures. Ok for mainline?
> >
> > While this obviously fixes the issue seen, I wonder whether there's
> > more of recog_data that might be used after control flow returns
> > to recog_memoized and thus the fix would be there, not in any
> > backend pattern triggering the issue like this?
> >
> > The "easiest" fix would maybe to add a in_recog flag and
> > simply return FAIL from recog when recursing.  Not sure what
> > the effect on this particular pattern would be though?
> >
> > The better(?) fix might be to push/pop recog_data in 'recog', but
> > of course given that recog_data is currently a global, leakage
> > in intermediate code can still happen.
> >
> > That said - does anybody know of similar fixes for this issue in other
> > backends patterns?
>
> I don't think it's valid for a simple query function like
> ix86_vector_duplicate_value to clobber global state.  Doing that
> could cause problems in other situations, not just splits.
>
> Ideally, it would be good to wean insn-recog.cc:recog off global state.
> The only parts of recog_data it uses (if I didn't miss something)
> are recog_data.operands and recog_data.insn (but only to nullify
> it for recog_memoized, which wouldn't be necessary if recog didn't
> clobber recog_data.operands).  But I guess some .md expand/insn
> conditions probably rely on the operands array being in recog_data,
> so that might not be easy.
>
> IMO the correct low-effort fix is to save and restore recog_data
> in ix86_vector_duplicate_value.  It's a relatively big copy,
> but the current code is pretty wasteful anyway (allocating at
> least a new SET and INSN for every query).  Compared to the
> overhead of doing that, a copy to and from the stack shouldn't
> be too bad.

I see.  I wonder if we should at least add some public API for
save/restore of recog_data so the many places don't need to
invent their own version and they are easier to find later.
Maybe some RAII

{
  push_recog_data saved ();

}

?  Shall we armor recog () for recursive invocation by adding a
->in_recog member to recog_data?

>
> Thanks,
> Richard


Re: [committed] doc: Update link to "Memory Model" paper

2022-08-16 Thread Martin Liška
On 3/28/21 23:38, Gerald Pfeifer wrote:
> The original link redirected, alas the new location gives a 404 "Not
> Found". Luckily I found what looks like a more stable location.

Hi.

The newly selected location gives 403, which is not a friendly return code:

wget 
https://www.researchgate.net/publication/221430855_A_Memory_Model_for_Static_Analysis_of_C_Programs
--2022-08-16 10:24:09--  
https://www.researchgate.net/publication/221430855_A_Memory_Model_for_Static_Analysis_of_C_Programs
Resolving www.researchgate.net (www.researchgate.net)... 104.17.32.105, 
104.17.33.105, 2606:4700::6811:2069, ...
Connecting to www.researchgate.net (www.researchgate.net)|104.17.32.105|:443... 
connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-08-16 10:24:09 ERROR 403: Forbidden.

While the original URL works for me (even though it leads to China):

wget http://lcs.ios.ac.cn/~xzx/memmodel.pdf
--2022-08-16 10:24:41--  http://lcs.ios.ac.cn/~xzx/memmodel.pdf
Resolving lcs.ios.ac.cn (lcs.ios.ac.cn)... 124.16.137.50
Connecting to lcs.ios.ac.cn (lcs.ios.ac.cn)|124.16.137.50|:80... connected.
HTTP request sent, awaiting response... 200 OK

Cheers,
Martin

> 
> Pushed.
> 
> Gerald
> 
> 
> 
> commit d15db0c5f5d81e9057df07c9568ee81873860a44
> Author: Gerald Pfeifer 
> Date:   Sun Mar 28 23:34:35 2021 +0200
> 
> doc: Update link to "Memory Model" paper
> 
> gcc/ChangeLog:
> * doc/analyzer.texi (Analyzer Internals): Update link to
> "A Memory Model for Static Analysis of C Programs".
> 
> diff --git a/gcc/doc/analyzer.texi b/gcc/doc/analyzer.texi
> index 3f7bcf3c115..26808ff5d22 100644
> --- a/gcc/doc/analyzer.texi
> +++ b/gcc/doc/analyzer.texi
> @@ -245,7 +245,7 @@ Merging can be disabled via 
> @option{-fno-analyzer-state-merge}.
>  
>  Part of the state stored at a @code{exploded_node} is a @code{region_model}.
>  This is an implementation of the region-based ternary model described in
> -@url{http://lcs.ios.ac.cn/~xzx/memmodel.pdf,
> +@url{https://www.researchgate.net/publication/221430855_A_Memory_Model_for_Static_Analysis_of_C_Programs,
>  "A Memory Model for Static Analysis of C Programs"}
>  (Zhongxing Xu, Ted Kremenek, and Jian Zhang).
>  



Re: [PATCH] Tame path_range_query::compute_imports

2022-08-16 Thread Aldy Hernandez via Gcc-patches
On Thu, Aug 11, 2022 at 1:42 PM Richard Biener  wrote:

> @@ -599,6 +592,30 @@ path_range_query::compute_imports (bitmap imports, const 
> vec<basic_block> &path)
> worklist.safe_push (arg);
> }
> }
> +  else if (gassign *ass = dyn_cast <gassign *> (def_stmt))
> +   {
> + tree ssa[3];
> + if (range_op_handler (ass))
> +   {
> + ssa[0] = gimple_range_ssa_p (gimple_range_operand1 (ass));
> + ssa[1] = gimple_range_ssa_p (gimple_range_operand2 (ass));
> + ssa[2] = NULL_TREE;
> +   }
> + else if (gimple_assign_rhs_code (ass) == COND_EXPR)
> +   {
> + ssa[0] = gimple_range_ssa_p (gimple_assign_rhs1 (ass));
> + ssa[1] = gimple_range_ssa_p (gimple_assign_rhs2 (ass));
> + ssa[2] = gimple_range_ssa_p (gimple_assign_rhs3 (ass));
> +   }
> + else
> +   continue;
> + for (unsigned j = 0; j < 3; ++j)
> +   {
> + tree rhs = ssa[j];
> + if (rhs && add_to_imports (rhs, imports))
> +   worklist.safe_push (rhs);
> +   }
> +   }

We seem to have 3 copies of this code now: this one, the
threadbackward one, and the original one.

Could we abstract this somehow?

Aldy



Re: [x86 PATCH] PR target/106577: force_reg may clobber operands during split.

2022-08-16 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Fri, Aug 12, 2022 at 10:41 PM Roger Sayle  
> wrote:
>>
>>
>> This patch fixes PR target/106577, which is a recent ICE-on-valid regression
>> caused by my introduction of a *testti_doubleword pre-reload splitter in
>> i386.md.  During the split pass before reload, this converts the virtual
>> *testti_doubleword into an *andti3_doubleword and *cmpti_doubleword,
>> checking that any immediate operand is a valid "x86_64_hilo_general_operand"
>> and placing it into a TImode register using force_reg if it isn't.
>>
>> The unexpected behaviour (that caught me out) is that calling force_reg
>> may occasionally clobber the contents of the global operands array, or
>> more accurately recog_data.operand[0], which means that by the time
>> split_XXX calls gen_split_YYY the replacement insn's operands have been
>> corrupted.
>>
>> It's difficult to tell who (if anyone) is at fault.  The re-entrant
>> stack trace (for the attached PR) looks like:
>>
>> gen_split_203 (*testti_doubleword) calls
>> force_reg calls
>> emit_move_insn calls
>> emit_move_insn_1 calls
>> gen_movti calls
>> ix86_expand_move calls
>> ix86_convert_const_wide_int_to_broadcast calls
>> ix86_vector_duplicate_value calls
>> recog_memoized calls
>> recog.
>>
>> By far the simplest and possibly correct fix is rather than attempt
>> to push and pop recog_data, to simply (in pre-reload splits) save a
>> copy of any operands that will be needed after force_reg, and use
>> these copies afterwards.  Many pre-reload splitters avoid this issue
>> using "[(clobber (const_int 0))]" and so avoid gen_split_YYY functions,
>> but in our case we still need to save a copy of operands[0] (even if we
>> call emit_insn or expand_* ourselves), so we might as well continue to
>> use the conveniently generated gen_split.
>>
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> and make -k check, both with and without --target_board=unix{-m32},
>> with no new failures. Ok for mainline?
>
> While this obviously fixes the issue seen, I wonder whether there's
> more of recog_data that might be used after control flow returns
> to recog_memoized and thus the fix would be there, not in any
> backend pattern triggering the issue like this?
>
> The "easiest" fix would maybe to add a in_recog flag and
> simply return FAIL from recog when recursing.  Not sure what
> the effect on this particular pattern would be though?
>
> The better(?) fix might be to push/pop recog_data in 'recog', but
> of course given that recog_data is currently a global, leakage
> in intermediate code can still happen.
>
> That said - does anybody know of similar fixes for this issue in other
> backends patterns?

I don't think it's valid for a simple query function like
ix86_vector_duplicate_value to clobber global state.  Doing that
could cause problems in other situations, not just splits.

Ideally, it would be good to wean insn-recog.cc:recog off global state.
The only parts of recog_data it uses (if I didn't miss something)
are recog_data.operands and recog_data.insn (but only to nullify
it for recog_memoized, which wouldn't be necessary if recog didn't
clobber recog_data.operands).  But I guess some .md expand/insn
conditions probably rely on the operands array being in recog_data,
so that might not be easy.

IMO the correct low-effort fix is to save and restore recog_data
in ix86_vector_duplicate_value.  It's a relatively big copy,
but the current code is pretty wasteful anyway (allocating at
least a new SET and INSN for every query).  Compared to the
overhead of doing that, a copy to and from the stack shouldn't
be too bad.

Thanks,
Richard


[PATCH] LoongArch: Provide fmin/fmax RTL pattern

2022-08-16 Thread Xi Ruoyao via Gcc-patches
A simple optimization.  Ok for trunk?

-- >8 --

We already have smin/smax RTL patterns using the fmin/fmax instructions.
But for smin/smax, it's unspecified what will happen if either operand
is NaN.  So we would generate calls to the libc fmin/fmax functions with
-fno-finite-math-only (the default for all optimization levels except
-Ofast).

But the LoongArch fmin/fmax instructions are IEEE-754-2008 conformant,
so we can also use them for the fmin/fmax patterns and avoid the
library function call.

gcc/ChangeLog:

* config/loongarch/loongarch.md (fmax<mode>3): New RTL pattern.
(fmin<mode>3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/fmax-fmin.c: New test.
---
 gcc/config/loongarch/loongarch.md | 18 +++
 .../gcc.target/loongarch/fmax-fmin.c  | 30 +++
 2 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/fmax-fmin.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 6b6df22a5f1..8e8868de9f5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1023,6 +1023,24 @@ (define_insn "smin<mode>3"
   [(set_attr "type" "fmove")
   (set_attr "mode" "<MODE>")])
 
+(define_insn "fmax3"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+   (smax:ANYF (match_operand:ANYF 1 "register_operand" "f")
+  (match_operand:ANYF 2 "register_operand" "f")))]
+  ""
+  "fmax.\t%0,%1,%2"
+  [(set_attr "type" "fmove")
+   (set_attr "mode" "")])
+
+(define_insn "fmin3"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+   (smin:ANYF (match_operand:ANYF 1 "register_operand" "f")
+  (match_operand:ANYF 2 "register_operand" "f")))]
+  ""
+  "fmin.\t%0,%1,%2"
+  [(set_attr "type" "fmove")
+   (set_attr "mode" "")])
+
 (define_insn "smaxa3"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
(if_then_else:ANYF
diff --git a/gcc/testsuite/gcc.target/loongarch/fmax-fmin.c b/gcc/testsuite/gcc.target/loongarch/fmax-fmin.c
new file mode 100644
index 000..92cf8a1501d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/fmax-fmin.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mdouble-float -fno-finite-math-only" } */
+/* { dg-final { scan-assembler "fmin\\.s" } } */
+/* { dg-final { scan-assembler "fmin\\.d" } } */
+/* { dg-final { scan-assembler "fmax\\.s" } } */
+/* { dg-final { scan-assembler "fmax\\.d" } } */
+
+double
+_fmax(double a, double b)
+{
+  return __builtin_fmax(a, b);
+}
+
+float
+_fmaxf(float a, float b)
+{
+  return __builtin_fmaxf(a, b);
+}
+
+double
+_fmin(double a, double b)
+{
+  return __builtin_fmin(a, b);
+}
+
+float
+_fminf(float a, float b)
+{
+  return __builtin_fminf(a, b);
+}
-- 
2.37.2




[PATCH][pushed] docs: fix link destination

2022-08-16 Thread Martin Liška
Pushed as obvious.

Martin

gcc/fortran/ChangeLog:

* gfortran.texi: Fix link destination to a valid URL.
---
 gcc/fortran/gfortran.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index d34e0b5e8f9..59d673bfc03 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -246,7 +246,7 @@ including OpenMP and OpenACC support for parallel 
programming.
 The GNU Fortran compiler passes the
 @uref{http://www.fortran-2000.com/ArnaudRecipes/fcvs21_f95.html,
 NIST Fortran 77 Test Suite}, and produces acceptable results on the
-@uref{https://www.netlib.org/lapack/faq.html#1.21, LAPACK Test Suite}.
+@uref{https://www.netlib.org/lapack/faq.html, LAPACK Test Suite}.
 It also provides respectable performance on
 the @uref{https://polyhedron.com/?page_id=175,
 Polyhedron Fortran compiler benchmarks} and the
@@ -441,7 +441,7 @@ found in the following sections of the documentation.
 
 Additionally, the GNU Fortran compilers supports the OpenMP specification
 (version 4.5 and partial support of the features of the 5.0 version,
-@url{https://openmp.org/@/openmp-specifications/}).
+@url{https://openmp.org/@/specifications/}).
 There also is support for the OpenACC specification (targeting
 version 2.6, @uref{https://www.openacc.org/}).  See
 @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
@@ -1806,7 +1806,7 @@ It consists of a set of compiler directives, library 
routines,
 and environment variables that influence run-time behavior.
 
 GNU Fortran strives to be compatible to the
-@uref{https://openmp.org/wp/openmp-specifications/,
+@uref{https://openmp.org/specifications/,
 OpenMP Application Program Interface v4.5}.
 
 To enable the processing of the OpenMP directive @code{!$omp} in
-- 
2.37.1



Re: [09/23] Add a cut-down version of std::span (array_slice)

2022-08-16 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, Aug 10, 2022 at 6:04 PM Martin Jambor  wrote:
>>
>> Hello,
>>
>> I have one more question/comment about array_slice.  Ever since I
>> started to use it...
>>
>> On Fri, Nov 13 2020, Richard Sandiford via Gcc-patches wrote:
>> > A later patch wants to be able to pass around subarray views of an
>> > existing array.  The standard class to do that is std::span, but it's
>> > a C++20 thing.  This patch just adds a cut-down version of it.
>> >
>> > The intention is just to provide what's currently needed.
>> >
>> > gcc/
>> >   * vec.h (array_slice): New class.
>> > ---
>> >  gcc/vec.h | 120 ++
>> >  1 file changed, 120 insertions(+)
>> >
>> > diff --git a/gcc/vec.h b/gcc/vec.h
>> > index f02beddc975..7768de9f518 100644
>> > --- a/gcc/vec.h
>> > +++ b/gcc/vec.h
>> > @@ -2128,6 +2128,126 @@ release_vec_vec (vec<vec<T> > &vec)
>> >vec.release ();
>> >  }
>> >
>> > +// Provide a subset of the std::span functionality.  (We can't use std::span
>> > +// itself because it's a C++20 feature.)
>> > +//
>> > +// In addition, provide an invalid value that is distinct from all valid
>> > +// sequences (including the empty sequence).  This can be used to return
>> > +// failure without having to use std::optional.
>> > +//
>> > +// There is no operator bool because it would be ambiguous whether it is
>> > +// testing for a valid value or an empty sequence.
>> > +template<typename T>
>> > +class array_slice
>> > +{
>> > +  template<typename OtherT> friend class array_slice;
>> > +
>> > +public:
>> > +  using value_type = T;
>> > +  using iterator = T *;
>> > +  using const_iterator = const T *;
>> > +
>> > +  array_slice () : m_base (nullptr), m_size (0) {}
>> > +
>> > +  template<typename OtherT>
>> > +  array_slice (array_slice<OtherT> other)
>> > +: m_base (other.m_base), m_size (other.m_size) {}
>> > +
>> > +  array_slice (iterator base, unsigned int size)
>> > +: m_base (base), m_size (size) {}
>> > +
>> > +  template<size_t N>
>> > +  array_slice (T (&array)[N]) : m_base (array), m_size (N) {}
>> > +
>> > +  template<typename OtherT>
>> > +  array_slice (const vec<OtherT> &v)
>> > +: m_base (v.address ()), m_size (v.length ()) {}
>> > +
>> > +  iterator begin () { return m_base; }
>> > +  iterator end () { return m_base + m_size; }
>> > +
>> > +  const_iterator begin () const { return m_base; }
>> > +  const_iterator end () const { return m_base + m_size; }
>> > +
>> > +  value_type &front ();
>> > +  value_type &back ();
>> > +  value_type &operator[] (unsigned int i);
>> > +
>> > +  const value_type &front () const;
>> > +  const value_type &back () const;
>> > +  const value_type &operator[] (unsigned int i) const;
>> > +
>> > +  size_t size () const { return m_size; }
>>
>> ...this has been a constant source of compile errors, because vectors
>> have length () and this is size ().
>>
>> I understand that the motivation was consistency with std::span, but do
>> we really want to add another inconsistency with ourselves?
>>
>> Given that array_slice is not that much used yet, I believe we can still
>> change to be consistent with vectors.  I personally think we should but
>> at the very least, if we keep it as it is, I'd like us to do so
>> deliberately.
>
> We could alternatively add length in addition to size (and maybe size to
> vec<> if std::vector has size but not length) with a comment deprecating
> the "non-standard" variant?

Yeah, I'd prefer to do the latter: add vec::size as a synonym of
vec::length, and deprecate length.  Doing anything else seems like
it's going to increase the inconsistency rather than decrease it.
E.g. we already have uses of (hopefully) uncontroversial standard
containers like std::array (my fault).

(FWIW, I keep tripping up in the opposite direction: expecting
size to be available in vec, like for standard containers.)

Thanks,
Richard


Re: [PATCH] soft-fp: Update soft-fp from glibc

2022-08-16 Thread Kito Cheng via Gcc-patches
ping

On Wed, Aug 10, 2022 at 10:23 PM Kito Cheng  wrote:
>
> This patch updates all of soft-fp from glibc.  Most changes are
> copyright-year updates, removal of "Contributed by" lines, and license
> URL updates; the only other changes add conversion functions between
> IEEE half and 32-bit/64-bit integers, which are required by RISC-V
> _Float16 support.
>
> libgcc/ChangeLog:
>
> * soft-fp/fixhfdi.c: New.
> * soft-fp/fixhfsi.c: Likewise.
> * soft-fp/fixunshfdi.c: Likewise.
> * soft-fp/fixunshfsi.c: Likewise.
> * soft-fp/floatdihf.c: Likewise.
> * soft-fp/floatsihf.c: Likewise.
> * soft-fp/floatundihf.c: Likewise.
> * soft-fp/floatunsihf.c: Likewise.
> * soft-fp/adddf3.c: Updating copyright years, removing "Contributed by"
> lines and update URL for license.
> * soft-fp/addsf3.c: Likewise.
> * soft-fp/addtf3.c: Likewise.
> * soft-fp/divdf3.c: Likewise.
> * soft-fp/divsf3.c: Likewise.
> * soft-fp/divtf3.c: Likewise.
> * soft-fp/double.h: Likewise.
> * soft-fp/eqdf2.c: Likewise.
> * soft-fp/eqhf2.c: Likewise.
> * soft-fp/eqsf2.c: Likewise.
> * soft-fp/eqtf2.c: Likewise.
> * soft-fp/extenddftf2.c: Likewise.
> * soft-fp/extended.h: Likewise.
> * soft-fp/extendhfdf2.c: Likewise.
> * soft-fp/extendhfsf2.c: Likewise.
> * soft-fp/extendhftf2.c: Likewise.
> * soft-fp/extendhfxf2.c: Likewise.
> * soft-fp/extendsfdf2.c: Likewise.
> * soft-fp/extendsftf2.c: Likewise.
> * soft-fp/extendxftf2.c: Likewise.
> * soft-fp/fixdfdi.c: Likewise.
> * soft-fp/fixdfsi.c: Likewise.
> * soft-fp/fixdfti.c: Likewise.
> * soft-fp/fixhfti.c: Likewise.
> * soft-fp/fixsfdi.c: Likewise.
> * soft-fp/fixsfsi.c: Likewise.
> * soft-fp/fixsfti.c: Likewise.
> * soft-fp/fixtfdi.c: Likewise.
> * soft-fp/fixtfsi.c: Likewise.
> * soft-fp/fixtfti.c: Likewise.
> * soft-fp/fixunsdfdi.c: Likewise.
> * soft-fp/fixunsdfsi.c: Likewise.
> * soft-fp/fixunsdfti.c: Likewise.
> * soft-fp/fixunshfti.c: Likewise.
> * soft-fp/fixunssfdi.c: Likewise.
> * soft-fp/fixunssfsi.c: Likewise.
> * soft-fp/fixunssfti.c: Likewise.
> * soft-fp/fixunstfdi.c: Likewise.
> * soft-fp/fixunstfsi.c: Likewise.
> * soft-fp/fixunstfti.c: Likewise.
> * soft-fp/floatdidf.c: Likewise.
> * soft-fp/floatdisf.c: Likewise.
> * soft-fp/floatditf.c: Likewise.
> * soft-fp/floatsidf.c: Likewise.
> * soft-fp/floatsisf.c: Likewise.
> * soft-fp/floatsitf.c: Likewise.
> * soft-fp/floattidf.c: Likewise.
> * soft-fp/floattihf.c: Likewise.
> * soft-fp/floattisf.c: Likewise.
> * soft-fp/floattitf.c: Likewise.
> * soft-fp/floatundidf.c: Likewise.
> * soft-fp/floatundisf.c: Likewise.
> * soft-fp/floatunditf.c: Likewise.
> * soft-fp/floatunsidf.c: Likewise.
> * soft-fp/floatunsisf.c: Likewise.
> * soft-fp/floatunsitf.c: Likewise.
> * soft-fp/floatuntidf.c: Likewise.
> * soft-fp/floatuntihf.c: Likewise.
> * soft-fp/floatuntisf.c: Likewise.
> * soft-fp/floatuntitf.c: Likewise.
> * soft-fp/gedf2.c: Likewise.
> * soft-fp/gesf2.c: Likewise.
> * soft-fp/getf2.c: Likewise.
> * soft-fp/half.h: Likewise.
> * soft-fp/ledf2.c: Likewise.
> * soft-fp/lesf2.c: Likewise.
> * soft-fp/letf2.c: Likewise.
> * soft-fp/muldf3.c: Likewise.
> * soft-fp/mulsf3.c: Likewise.
> * soft-fp/multf3.c: Likewise.
> * soft-fp/negdf2.c: Likewise.
> * soft-fp/negsf2.c: Likewise.
> * soft-fp/negtf2.c: Likewise.
> * soft-fp/op-1.h: Likewise.
> * soft-fp/op-2.h: Likewise.
> * soft-fp/op-4.h: Likewise.
> * soft-fp/op-8.h: Likewise.
> * soft-fp/op-common.h: Likewise.
> * soft-fp/quad.h: Likewise.
> * soft-fp/single.h: Likewise.
> * soft-fp/soft-fp.h: Likewise.
> * soft-fp/subdf3.c: Likewise.
> * soft-fp/subsf3.c: Likewise.
> * soft-fp/subtf3.c: Likewise.
> * soft-fp/truncdfhf2.c: Likewise.
> * soft-fp/truncdfsf2.c: Likewise.
> * soft-fp/truncsfhf2.c: Likewise.
> * soft-fp/trunctfdf2.c: Likewise.
> * soft-fp/trunctfhf2.c: Likewise.
> * soft-fp/trunctfsf2.c: Likewise.
> * soft-fp/trunctfxf2.c: Likewise.
> * soft-fp/truncxfhf2.c: Likewise.
> * soft-fp/unorddf2.c: Likewise.
> * soft-fp/unordsf2.c: Likewise.
> ---
>  libgcc/soft-fp/adddf3.c  |  6 ++---
>  libgcc/soft-fp/addsf3.c  |  6 ++---
>  libgcc/soft-fp/addtf3.c  |  6 ++---
>  libgcc/soft-fp/divdf3.c  |  6 ++---
>  libgcc/soft-fp/divsf3.c

Re: [RFA configure parts] aarch64: Make cc1 handle --with options

2022-08-16 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw  writes:
> On 05/08/2022 14:53, Richard Sandiford via Gcc-patches wrote:
>> Richard Earnshaw  writes:
>>> On 13/06/2022 15:33, Richard Sandiford via Gcc-patches wrote:
 On aarch64, --with-arch, --with-cpu and --with-tune only have an
 effect on the driver, so “./xgcc -B./ -O3” can give significantly
 different results from “./cc1 -O3”.  --with-arch did have a limited
 effect on ./cc1 in previous releases, although it didn't work
 entirely correctly.

 Being of a lazy persuasion, I've got used to ./cc1 selecting SVE for
 --with-arch=armv8.2-a+sve without having to supply an explicit -march,
 so this patch makes ./cc1 emulate the relevant OPTION_DEFAULT_SPECS.
 It relies on Wilco's earlier clean-ups.

 The patch makes config.gcc define WITH_FOO_STRING macros for each
 supported --with-foo option.  This could be done only in aarch64-
 specific code, but I thought it could be useful on other targets
 too (and can be safely ignored otherwise).  There didn't seem to
 be any existing and potentially clashing uses of macros with this
 style of name.

 Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK for the configure
 bits?

 Richard


 gcc/
* config.gcc: Define WITH_FOO_STRING macros for each supported
--with-foo option.
* config/aarch64/aarch64.cc (aarch64_override_options): Emulate
OPTION_DEFAULT_SPECS.
* config/aarch64/aarch64.h (OPTION_DEFAULT_SPECS): Reference the above.
 ---
gcc/config.gcc| 14 ++
gcc/config/aarch64/aarch64.cc |  8 
gcc/config/aarch64/aarch64.h  |  5 -
3 files changed, 26 insertions(+), 1 deletion(-)

 diff --git a/gcc/config.gcc b/gcc/config.gcc
 index cdbefb5b4f5..e039230431c 100644
 --- a/gcc/config.gcc
 +++ b/gcc/config.gcc
 @@ -5865,6 +5865,20 @@ else
configure_default_options="{ ${t} }"
fi

 +for option in $supported_defaults
 +do
 +  lc_option=`echo $option | sed s/-/_/g`
 +  uc_option=`echo $lc_option | tr a-z A-Z`
 +  eval "val=\$with_$lc_option"
 +  if test -n "$val"
 +  then
 +  val="\\\"$val\\\""
 +  else
 +  val=nullptr
 +  fi
 +  tm_defines="$tm_defines WITH_${uc_option}_STRING=$val"
 +done
>>>
>>> This bit would really be best reviewed by a non-arm maintainer.  It
>>> generally looks OK.  My only comment would be why define anything if the
>>> corresponding --with-foo was not specified.  Then you can use #ifdef to
>>> test if the user specified a default.
>> 
>> Yeah, could do it that way instead, but:
>> 
 diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
 index d21e041eccb..0bc700b81ad 100644
 --- a/gcc/config/aarch64/aarch64.cc
 +++ b/gcc/config/aarch64/aarch64.cc
 @@ -18109,6 +18109,14 @@ aarch64_override_options (void)
  if (aarch64_branch_protection_string)
aarch64_validate_mbranch_protection 
 (aarch64_branch_protection_string);

 +  /* Emulate OPTION_DEFAULT_SPECS.  */
 +  if (!aarch64_arch_string && !aarch64_cpu_string)
 +aarch64_arch_string = WITH_ARCH_STRING;
 +  if (!aarch64_arch_string && !aarch64_cpu_string)
 +aarch64_cpu_string = WITH_CPU_STRING;
 +  if (!aarch64_cpu_string && !aarch64_tune_string)
 +aarch64_tune_string = WITH_TUNE_STRING;
>> 
>> (without the preprocessor stuff) IMO reads better.  If a preprocessor
>> is/isn't present test turns out to be useful, perhaps we should add
>> macros like HAVE_WITH_TUNE/WITH_TUNE_PRESENT/... too?  I guess it
>> should only be done when something needs it though.
>
> It's relatively easy to add
>
> #ifndef WITH_TUNE_STRING
> #define WITH_TUNE_STRING (nullptr)
> #endif
>
> in a header, but much harder to go the other way.  The case I was 
> thinking of was something like:
>
> #if !defined(WITH_ARCH_STRING) && !defined(WITH_CPU_STRING)
> #define WITH_ARCH_STRING ""
> #endif
>
> which saves having to have yet another level of fallback if nothing has 
> been specified, but this is next to impossible if the macros are 
> unconditionally defined.

Right, but I was suggesting to have both:

WITH_TUNE_STRING: always available, as above
HAVE_WITH_TUNE: for preprocessor conditions (if something needs it in future)

So the C++ code could stay as above (A):

  /* Emulate OPTION_DEFAULT_SPECS.  */
  if (!aarch64_arch_string && !aarch64_cpu_string)
aarch64_arch_string = WITH_ARCH_STRING;
  if (!aarch64_arch_string && !aarch64_cpu_string)
aarch64_cpu_string = WITH_CPU_STRING;
  if (!aarch64_cpu_string && !aarch64_tune_string)
aarch64_tune_string = WITH_TUNE_STRING;

rather than have to become:

#ifdef WITH_ARCH_STRING
  if (!aarch64_arch_string && !aarch64_cpu_string)
aarch64_arch_string = WITH_ARCH_STRING;
#endif
#ifdef WITH_CPU_STRING
  if 

[PATCH] x86: Support vector __bf16 type.

2022-08-16 Thread Kong, Lingling via Gcc-patches
Hi,

The patch is support vector init/broadcast/set/extract for __bf16 type.
The __bf16 type is a storage type.

OK for master?

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle vector
BFmode.
(ix86_expand_vector_init_duplicate): Support vector BFmode.
(ix86_expand_vector_init_one_nonzero): Ditto.
(ix86_expand_vector_init_one_var): Ditto.
(ix86_expand_vector_init_concat): Ditto.
(ix86_expand_vector_init_interleave): Ditto.
(ix86_expand_vector_init_general): Ditto.
(ix86_expand_vector_init): Ditto.
(ix86_expand_vector_set_var): Ditto.
(ix86_expand_vector_set): Ditto.
(ix86_expand_vector_extract): Ditto.
* config/i386/i386.cc (classify_argument): Add BF vector modes.
(function_arg_64): Ditto.
(ix86_gimplify_va_arg): Ditto.
(ix86_get_ssemov): Ditto.
* config/i386/i386.h (VALID_AVX256_REG_MODE): Add BF vector modes.
(VALID_AVX512F_REG_MODE): Ditto.
(host_detect_local_cpu): Ditto.
(VALID_SSE2_REG_MODE): Ditto.
* config/i386/i386.md: Add BF vector modes.
(MODE_SIZE): Ditto.
(ssemodesuffix): Add bf suffix for BF vector modes.
(ssevecmode): Ditto.
* config/i386/sse.md (VMOVE): Adjust for BF vector modes.
(VI12HFBF_AVX512VL): Ditto.
(V_256_512): Ditto.
(VF_AVX512HFBF16): Ditto.
(VF_AVX512BWHFBF16): Ditto.
(VIHFBF): Ditto.
(avx512): Ditto.
(VIHFBF_256): Ditto.
(VIHFBF_AVX512BW): Ditto.
(VI2F_256_512): Ditto.
(V8_128): Ditto.
(V16_256): Ditto.
(V32_512): Ditto.
(sseinsnmode): Ditto.
(sseconstm1): Ditto.
(sseintmodesuffix): New mode_attr.
(avx512fmaskmode): Ditto.
(avx512fmaskmodelower): Ditto.
(ssedoublevecmode): Ditto.
(ssehalfvecmode): Ditto.
(ssehalfvecmodelower): Ditto.
(ssescalarmode): Add vector BFmode mapping.
(ssescalarmodelower): Ditto.
(ssexmmmode): Ditto.
(ternlogsuffix): Ditto.
(ssescalarsize): Ditto.
(sseintprefix): Ditto.
(i128): Ditto.
(xtg_mode): Ditto.
(bcstscalarsuff): Ditto.
(_blendm): New define_insn for BFmode.
(_store_mask): Ditto.
(vcond_mask_): Ditto.
(vec_set_0): New define_insn for BF vector set.
(V8BFH_128): New mode_iterator for BFmode.
(avx512fp16_mov): Ditto.
(vec_set): New define_insn for BF vector set.
(@vec_extract_hi_): Ditto.
(@vec_extract_lo_): Ditto.
(vec_set_hi_): Ditto.
(vec_set_lo_): Ditto.
(*vec_extract_0): New define_insn_and_split for BF
vector extract.
(*vec_extract): New define_insn.
(VEC_EXTRACT_MODE): Add BF vector modes.
(PINSR_MODE): Add V8BF.
(sse2p4_1): Ditto.
(pinsr_evex_isa): Ditto.
(_pinsr): Adjust to support
insert for V8BFmode.
(pbroadcast_evex_isa): Add BF vector modes.
(AVX2_VEC_DUP_MODE): Ditto.
(VEC_INIT_MODE): Ditto.
(VEC_INIT_HALF_MODE): Ditto.
(avx2_pbroadcast): Adjust to support BF vector mode
broadcast.
(avx2_pbroadcast_1): Ditto.
(_vec_dup_1): Ditto.
(_vec_dup_gpr):
Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/vect-bfloat16-1.C: New test.
* gcc.target/i386/vect-bfloat16-1.c: New test.
* gcc.target/i386/vect-bfloat16-2a.c: New test.
* gcc.target/i386/vect-bfloat16-2b.c: New test.
* gcc.target/i386/vect-bfloat16-typecheck_1.c: New test.
* gcc.target/i386/vect-bfloat16-typecheck_2.c: New test.
---
 gcc/config/i386/i386-expand.cc| 129 +++--
 gcc/config/i386/i386.cc   |  16 +-
 gcc/config/i386/i386.h|  12 +-
 gcc/config/i386/i386.md   |   9 +-
 gcc/config/i386/sse.md| 211 --
 .../g++.target/i386/vect-bfloat16-1.C |  13 +
 .../gcc.target/i386/vect-bfloat16-1.c |  30 ++
 .../gcc.target/i386/vect-bfloat16-2a.c| 121 
 .../gcc.target/i386/vect-bfloat16-2b.c|  22 ++
 .../i386/vect-bfloat16-typecheck_1.c  | 258 ++
 .../i386/vect-bfloat16-typecheck_2.c  | 248 +
 11 files changed, 950 insertions(+), 119 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/vect-bfloat16-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 

Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2022-08-16 Thread Kewen.Lin via Gcc-patches
Hi Xionghu,

Thanks for the updated version of patch, some comments are inlined.

on 2022/8/11 14:15, Xionghu Luo wrote:
> 
> 
> On 2022/8/11 01:07, Segher Boessenkool wrote:
>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>>> On 2022/8/9 11:01, Kewen.Lin wrote:
 I have some concerns about those changed "altivec_*_direct" patterns;
 IMHO the suffix "_direct" normally indicates that the define_insn maps
 directly to the corresponding hw insn.  With this change, for example,
 altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, which looks
 misleading.  Maybe we can add corresponding _direct_le and _direct_be
 versions, both mapped into the same insn but with different RTL
 patterns.  Looking forward to Segher's and David's suggestions.
>>>
>>> Thanks!  Do you mean same RTL patterns with different hw insn?
>>
>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
>> instruction, never a vmrglb instead.  Misleading names are an expensive
>> problem.
>>
>>
> 
> Thanks.  Then on LE platforms, if a user calls altivec_vmrghw, it will
> be expanded to RTL (vec_select (vec_concat R0 R1) (0 4 1 5)), and
> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
> For BE it is just straightforward, which seems clearer :-).  OK for master?
> 
> 
> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS 
> [PR106069]
> 
> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern to be and le with same RTL but different insn.
> 
> The native RTL expression for vec_mrghw should be the same for BE and
> LE, as it is register- and endian-independent.  So both BE and LE need
> to generate exactly the same RTL with index [0 4 1 5] when expanding
> vec_mrghw with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>    (subreg:V4SI (reg:V16QI 139) 0)
>    (subreg:V4SI (reg:V16QI 140) 0))
>    [const_int 0 4 1 5]))
> 
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
> 
> =>
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
> 
> The endianness check is needed only once, at final ASM generation.
> The resulting ASM is better because the nested vec_select is
> simplified to a simple scalar load.
> 
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64}
> Linux(Thanks to Kewen).
> 
> gcc/ChangeLog:
> 
> PR target/106069
> * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
> (altivec_vmrghb_direct_be): New pattern for BE.
> (altivec_vmrglb_direct_le): New pattern for LE.
> (altivec_vmrghh_direct): Remove.
> (altivec_vmrghh_direct_be): New pattern for BE.
> (altivec_vmrglh_direct_le): New pattern for LE.
> (altivec_vmrghw_direct_<mode>): Remove.
> (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
> (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
> (altivec_vmrglb_direct): Remove.
> (altivec_vmrglb_direct_be): New pattern for BE.
> (altivec_vmrghb_direct_le): New pattern for LE.
> (altivec_vmrglh_direct): Remove.
> (altivec_vmrglh_direct_be): New pattern for BE.
> (altivec_vmrghh_direct_le): New pattern for LE.
> (altivec_vmrglw_direct_<mode>): Remove.
> (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
> (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
> * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
> Adjust.
> * config/rs6000/vsx.md: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> PR target/106069
> * g++.target/powerpc/pr106069.C: New test.
> 
> Signed-off-by: Xionghu Luo 
> ---
>  gcc/config/rs6000/altivec.md    | 223 ++--
>  gcc/config/rs6000/rs6000.cc |  36 ++--
>  gcc/config/rs6000/vsx.md    |  24 +--
>  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++
>  4 files changed, 305 insertions(+), 98 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..78245f470e9 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -    : gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (16, 

Re: [RFC]rs6000: split complicated constant to memory

2022-08-16 Thread Jiufu Guo via Gcc-patches
Jiufu Guo  writes:

> Hi,
>
> Richard Biener  writes:
>
>> On Mon, Aug 15, 2022 at 7:26 AM Jiufu Guo via Gcc-patches
>>  wrote:
>>>
>>> Hi,
>>>
>>> This patch tries to put the constant into the constant pool if building
>>> the constant requires 3 or more instructions.
>>>
>>> But there is a concern: I'm wondering if this patch is really profitable.
>>>
>>> Because, as I tested: 1. for a simple case, if the instructions cannot
>>> run in parallel, loading the constant from memory may be faster; but
>>> 2. if some instructions can run in parallel, loading the constant from
>>> memory is not a win compared with building it.  See the examples below.
>>>
>>> For f1.c and f3.c, 'loading' the constant would be acceptable in runtime
>>> terms; for f2.c and f4.c, 'loading' the constant is visibly slower.
>>>
>>> For real-world cases, both kinds of code sequences exist.
>>>
>>> So, I'm not sure if we need to push this patch.
>>>
>>> Run the below functions a lot of times (10) to check the runtime.
>>> f1.c:
>>> long foo (long *arg, long*, long *)
>>> {
>>>   *arg = 0x12345678;
>>> }
>>> asm building constant:
>>> lis 10,0x1234
>>> ori 10,10,0x5678
>>> sldi 10,10,32
>>> vs.  asm loading
>>> addis 10,2,.LC0@toc@ha
>>> ld 10,.LC0@toc@l(10)
>>> The runtimes of 'building' and 'loading' are similar: sometimes 'building'
>>> is faster; sometimes 'loading' is faster.  And the difference is slight.
>>
>> I wonder if it is possible to decide this during scheduling - chose the
>> variant that, when the result is needed, is cheaper?  Post-RA might
>> be a bit difficult (I see the load from memory needs the TOC, but then
>> when the TOC is not available we could just always emit the build form),
>> and pre-reload precision might be not good enough to make this worth
>> the experiment?
> Thanks a lot for your comments!
>
> Yes, post-RA may not handle all cases.
> If there is no TOC available, we are not able to load the constant
> through the TOC.  As Segher pointed out, crtl->uses_const_pool may be an
> approximate way to check this.
> The sched2 pass can optimize some cases (e.g. f2.c and f4.c), but in
> other cases it may not distribute those 'building' instructions.
>
> So, maybe we could add a peephole after sched2: if the five instructions
> building the constant are still consecutive, replace them with a 'load'
> (after checking that the TOC is available).
> But I'm not sure whether it is worthwhile.

Oh, checking the object files (from GCC bootstrap and SPEC), it is rare
for the five instructions to be consecutive.  Often 1 (or 2) of the insns
are distributed elsewhere while the other 4 (or 3) remain consecutive.
So a peephole may not be very helpful.

BR,
Jeff(Jiufu)

>
>>
>> Of course the scheduler might lack on the technical side as well.
>
>
> BR,
> Jeff(Jiufu)
>
>>
>>>
>>> f2.c
>>> long foo (long *arg, long *arg2, long *arg3)
>>> {
>>>   *arg = 0x1234567800000000;
>>>   *arg2 = 0x7965234700000000;
>>>   *arg3 = 0x4689123700000000;
>>> }
>>> asm building constant:
>>> lis 7,0x1234
>>> lis 10,0x7965
>>> lis 9,0x4689
>>> ori 7,7,0x5678
>>> ori 10,10,0x2347
>>> ori 9,9,0x1237
>>> sldi 7,7,32
>>> sldi 10,10,32
>>> sldi 9,9,32
>>> vs. loading
>>> addis 7,2,.LC0@toc@ha
>>> addis 10,2,.LC1@toc@ha
>>> addis 9,2,.LC2@toc@ha
>>> ld 7,.LC0@toc@l(7)
>>> ld 10,.LC1@toc@l(10)
>>> ld 9,.LC2@toc@l(9)
>>> For this case, 'loading' is always slower than 'building' (>15%).
>>>
>>> f3.c
>>> long foo (long *arg, long *, long *)
>>> {
>>>   *arg = 384307168202282325;
>>> }
>>> lis 10,0x555
>>> ori 10,10,0x5555
>>> sldi 10,10,32
>>> oris 10,10,0x5555
>>> ori 10,10,0x5555
>>> For this case, 'building' (through 5 instructions) is slower, and 'loading'
>>> is faster by ~5%;
>>>
>>> f4.c
>>> long foo (long *arg, long *arg2, long *arg3)
>>> {
>>>   *arg = 384307168202282325;
>>>   *arg2 = -6148914691236517205;
>>>   *arg3 = 768614336404564651;
>>> }
>>> lis 7,0x555
>>> lis 10,0xaaaa
>>> lis 9,0xaaa
>>> ori 7,7,0x5555
>>> ori 10,10,0xaaaa
>>> ori 9,9,0xaaaa
>>> sldi 7,7,32
>>> sldi 10,10,32
>>> sldi 9,9,32
>>> oris 7,7,0x5555
>>> oris 10,10,0xaaaa
>>> oris 9,9,0xaaaa
>>> ori 7,7,0x5555
>>> ori 10,10,0xaaab
>>> ori 9,9,0xaaab
>>> For this case, since the 'building' instructions run in parallel, 'loading'
>>> is slower: ~8%.  On p10, 'loading' (through 'pld') is also slower, by >4%.
>>>
>>>
>>> BR,
>>> Jeff(Jiufu)
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000.cc| 14 ++
>>>  gcc/testsuite/gcc.target/powerpc/pr63281.c | 11 +++
>>>  2 files changed, 25 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 4b727d2a500..3798e11bdbc 100644
>>> --- 

[PATCH][pushed] jobserver: fix fifo mode by opening pipe in proper mode

2022-08-16 Thread Martin Liška
The current jobserver_info relies on non-blocking FDs,
thus open the pipe in such a mode.

Tested locally for GCC LTO bootstrap that was stuck before the revision.

I'm going to push the change.

Martin

gcc/ChangeLog:

* opts-common.cc (jobserver_info::connect): Open fifo
in non-blocking mode.
---
 gcc/opts-common.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 5d79f174a38..4dec9f94447 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -2064,7 +2064,7 @@ void
 jobserver_info::connect ()
 {
   if (!pipe_path.empty ())
-pipefd = open (pipe_path.c_str (), O_RDWR);
+pipefd = open (pipe_path.c_str (), O_RDWR | O_NONBLOCK);
 }
 
 void
-- 
2.37.1