date:20210921

Re: [PATCH v2] ipa-inline: Add target info into fn summary [PR102059]

2021-09-21 Thread Kewen.Lin via Gcc-patches

on 2021/9/21 下午5:39, Richard Biener wrote:
> On Tue, Sep 21, 2021 at 11:31 AM Martin Jambor  wrote:
>>
>> Hi,
>>
>> On Tue, Sep 21 2021, Kewen.Lin wrote:
>>> on 2021/9/17 下午7:26, Martin Jambor wrote:
 On Fri, Sep 17 2021, Kewen.Lin wrote:
>> [...]
>
> Sorry that I failed to use 16 bit-fields for this, I figured out that
> the bit-fields can not be address-taken or passed as non-const reference.
> The gentype also failed to recognize uint16_t if I used uint16_t directly
> in ipa-fnsummary.h.  Finally I used unsigned int instead.
>

 well, you could have used:

   unsigned int target_info : 16;

 for the field (and uint16_t when passed to hooks).

 But I am not sure if it is that crucial.

>>>
>>> I may miss something, specifically I tried with:
>>>
>>> 1)
>>>
>>>   unsigned int target_info : 16;
>>>   unsigned inlinable : 1;
>>>   ...
>>>
>>>   update_ipa_fn_target_info (uint16_t &, const gimple *)
>>
>> Yeah, you would have to copy the bit-field into a temporary, pass
>> reference to that in the hook and then copy it back.  At least that is
>> what I meant but since we apparently want unsigned int everywhere, it
>> does not matter.

Ah, I misunderstood you didn't want this way since it seems inefficient.
Will realize it next time since it looks like a tradeoff.  :)

> 
> Or use a by-value interface:
> 
>  uint16_t update_ipa_fn_target_info (uint16_t in, const gimple *stmt);
> 
> with the function returning the (changed) set.

Yeah, I considered this like:

  uint16_t update_ipa_fn_target_info (const uint16_t, const gimple*, bool&)

but thought it might look weird to others at the first glances and gave up
then.  :(

BR,
Kewen

Re: [PATCH] rs6000: Parameterize some const values for density test

2021-09-21 Thread Kewen.Lin via Gcc-patches

on 2021/9/21 下午8:03, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 21, 2021 at 01:47:19PM +0800, Kewen.Lin wrote:
>> on 2021/9/18 上午6:26, Segher Boessenkool wrote:
 +  if (data->nloads > (unsigned int) rs6000_density_load_num_threshold
 +&& load_pct > (unsigned int) rs6000_density_load_pct_threshold)
>>>
>>> Those variables are unsigned int already.  Don't cast please.
>>
>> Unfortunately this is required by bootstrapping.  The UInteger for the
>> param definition is really confusing, in the underlying implementation
>> it's still "signed".  If you grep "(unsigned) param", you can see a few
>> examples.  I guess the "UInteger" is mainly for the param value range
>> checking.
> 
> Huh, I see.  Is that a bug?  It certainly is surprising!  Please open a
> PR if you think it could/should be improved, put me on Cc:?
> 

I guessed it's not a bug, "UInteger" is more for the opt/param value range
checking, but could be improved.  PR102440 filed as you suggested.  :)

 +-param=rs6000-density-pct-threshold=
 +Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) 
 Init(85) IntegerRange(0, 99) Param
>>>
>>> So make this and all other percentages (0, 100) please.
>>
>> I thought 99 is enough for the RHS in ">". just realized it's more clear
>> with 100.  Will fix!
> 
> 99 will work fine, but it's not the best choice for the user, who will
> expect that a percentage can be anything from 0% to 100%.
> 
 +When costing for loop vectorization, we probably need to penalize the 
 loop body cost if the existing cost model may not adequately reflect 
 delays from unavailable vector resources.  We collect the cost for 
 vectorized statements and non-vectorized statements separately, check the 
 proportion of vec_cost to total cost of vec_cost and non vec_cost, and 
 penalize only if the proportion exceeds the threshold specified by this 
 parameter.  The default value is 85.
>>>
>>> It would be good if we can use line breaks in the source code for things
>>> like this, but I don't think we can.  This message is mainly used for
>>> "--help=param", and it is good there to have as short messages as you
>>> can.  But given the nature of params you need quite a few words often,
>>> and you do not want to say so little that things are no clear, either.
>>>
>>> So, dunno :-)
>>
>> I did some testings, the line breaks writing can still survive in the
>> "--help=param" show, the lines are concatenated with " ".  Although
>> there seems no this kind of writing practices, I am guessing you want
>> me to do line breaks for their descriptions?  If so, I will make them
>> short as the above "Target Undocumented..." line.  Or do you want it
>> to align source code ColumnLimit 80 (for these cases, it would look
>> shorter)?
> 
> It would help if was more readable in the surce code, one line of close
> to 500 columns is not very manageable :-)
> 
> But the thing that matters is what it will look like in the --help=
> output (and/or the manual).
> 

OK, I've used ColumnLimit 80 for that.  The outputs in --help= before/after
the line breaks look the same (smoother than what I can expect).  :)

Committed in r12-3767, thanks!

BR,
Kewen

[PATCH] [i386] Adjust testcase.

2021-09-21 Thread liuhongt via Gcc-patches

Pushed to trunk.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr92658-avx512f.c: Refine testcase.
* gcc.target/i386/pr92658-avx512vl.c: Adjust scan-assembler,
only v2di->v2qi truncate is not supported, v4di->v4qi should
be supported.
---
 gcc/testsuite/gcc.target/i386/pr92658-avx512f.c  | 6 +-
 gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c | 3 ++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c 
b/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c
index e26b06ecd5a..9afb195541a 100644
--- a/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c
+++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512f.c
@@ -48,6 +48,10 @@ truncqb (v8qi * dst, v8di * __restrict src)
   tem[1] = (*src)[1];
   tem[2] = (*src)[2];
   tem[3] = (*src)[3];
+  tem[4] = (*src)[4];
+  tem[5] = (*src)[5];
+  tem[6] = (*src)[6];
+  tem[7] = (*src)[7];
   dst[0] = *(v8qi *) tem;
 }
 
@@ -100,7 +104,7 @@ truncdb (v16qi * dst, v16si * __restrict src)
 
 /* { dg-final { scan-assembler-times "vpmovqd" 1 } } */
 /* { dg-final { scan-assembler-times "vpmovqw" 1 } } */
-/* { dg-final { scan-assembler-times "vpmovqb" 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "vpmovqb" 1 } } */
 /* { dg-final { scan-assembler-times "vpmovdw" 1 } } */
 /* { dg-final { scan-assembler-times "vpmovdb" 1 } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c 
b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
index 7ff9c19ee36..ae6959e4a20 100644
--- a/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
+++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
@@ -123,6 +123,7 @@ truncdb_128 (v16qi * dst, v4si * __restrict src)
 
 /* { dg-final { scan-assembler-times "vpmovqd" 2 } } */
 /* { dg-final { scan-assembler-times "vpmovqw" 2 } } */
-/* { dg-final { scan-assembler-times "vpmovqb" 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "vpmovqb\[ \t]*%ymm" 1 } }  */
+/* { dg-final { scan-assembler-times "vpmovqb\[ \t]*%xmm" 1 { xfail *-*-* } } 
} */
 /* { dg-final { scan-assembler-times "vpmovdw" 2 } } */
 /* { dg-final { scan-assembler-times "vpmovdb" 2 } } */
-- 
2.27.0

Re: [PATCH] ipa-fnsummary: Remove inconsistent bp_pack_value

2021-09-21 Thread Kewen.Lin via Gcc-patches

on 2021/9/21 下午2:16, Richard Biener wrote:
> On Tue, Sep 21, 2021 at 4:09 AM Kewen.Lin  wrote:
>>
>> Hi Richi,
>>
>> Thanks for the review!
>>
>> on 2021/9/17 下午6:04, Richard Biener wrote:
>>> On Fri, Sep 17, 2021 at 12:03 PM Richard Biener
>>>  wrote:

 On Fri, Sep 17, 2021 at 11:43 AM Kewen.Lin  wrote:
>
> Hi,
>
> When changing target_info with bitfield, I happened to find this
> inconsistent streaming in and out.  We have the streaming in:
>
>   bp_pack_value (, info->inlinable, 1);
>   bp_pack_value (, false, 1);
>   bp_pack_value (, info->fp_expressions, 1);
>
> while the streaming out:
>
>   info->inlinable = bp_unpack_value (, 1);
>   info->fp_expressions = bp_unpack_value (, 1)
>
> The cleanup of Cilk Plus support seemed to miss to remove the bit
> streaming out but change with streaming false.
>
> By hacking fp_expression_p to return true always, I can see it
> reads the wrong fp_expressions value (false) out in wpa dumping.
>
> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
>
> Is it ok for trunk?

 OK for trunk and all affected branches (note we need to bump the
 LTO minor version there).  The issue comes from the removal
 of cilk+ in r8-4956 which removed the bp_unpack but replaced
 the bp_pack ...

 It's a correctness issue as we'll read fp_expressions as always 'false'
>>>
>>> Btw, on branches we could also simply unpack a dummy bit to avoid
>>> changing the format.
>>>
>>
>> Committed in r12-3721.  Thanks!
>>
>> As suggested, the patch for branches is listed below.
>>
>> Is ok for branches 9, 10 and 11 after some trunk burn in time?
> 
> It's OK for branches without waiting, maybe you can do a LTO bootstrap
> on the branches for extra safety (just in case we're triggering some hidden
> issues due to the fix).

Thanks!  LTO bootstrapped on the branches, separately committed via
r11-9024, r10-10146 and r9-9740.

BR,
Kewen

> 
> Thanks,
> Richard.
> 
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> * ipa-fnsummary.c (inline_read_section): Unpack a dummy bit
>> to keep consistent with the side of streaming out.
>>
>> ---
>> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
>> index 18bbae145b9..bf635c1f78a 100644
>> --- a/gcc/ipa-fnsummary.c
>> +++ b/gcc/ipa-fnsummary.c
>> @@ -4403,13 +4403,20 @@ inline_read_section (struct lto_file_decl_data 
>> *file_data, const char *data,
>>bp = streamer_read_bitpack ();
>>if (info)
>> {
>> -  info->inlinable = bp_unpack_value (, 1);
>> -  info->fp_expressions = bp_unpack_value (, 1);
>> + info->inlinable = bp_unpack_value (, 1);
>> + /* On the side of streaming out, there is still one bit
>> +streamed out between inlinable and fp_expressions bits,
>> +which was used for cilk+ before but now always false.
>> +To remove the bit packing need to bump LTO minor version,
>> +so unpack a dummy bit here to keep consistent instead.  */
>> + bp_unpack_value (, 1);
>> + info->fp_expressions = bp_unpack_value (, 1);
>> }
>>else
>> {
>> -  bp_unpack_value (, 1);
>> -  bp_unpack_value (, 1);
>> + bp_unpack_value (, 1);
>> + bp_unpack_value (, 1);
>> + bp_unpack_value (, 1);
>> }
>>
>>count2 = streamer_read_uhwi ();
>>

[PATCH] Support 64bit fma/fms/fnma/fnms under avx512vl.

2021-09-21 Thread liuhongt via Gcc-patches

Hi:
  fma/fms/fnma/fnmsv2sf4 are defined only under (TARGET_FMA || TARGET_FMA4).
The patch extend the expanders to TARGET_AVX512VL.

  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
  Ok for trunk?

gcc/ChangeLog:

* config/i386/mmx.md (fmav2sf4): Extend to AVX512 fma.
(fmsv2sf4): Ditto.
(fnmav2sf4): Ditto.
(fnmsv2sf4): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-pr95046.c: New test.
---
 gcc/config/i386/i386.md   |  4 +++-
 gcc/config/i386/mmx.md| 20 +++
 .../gcc.target/i386/avx512vl-pr95046.c| 10 ++
 3 files changed, 25 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr95046.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 188f431510a..c41fdd516c5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -832,7 +832,7 @@ (define_attr "isa" 
"base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
x64_avx,x64_avx512bw,x64_avx512dq,
sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
-   avx512bw,noavx512bw,avx512dq,noavx512dq,
+   avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16"
   (const_string "base"))
 
@@ -874,6 +874,8 @@ (define_attr "enabled" ""
 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 (eq_attr "isa" "fma4") (symbol_ref "TARGET_FMA4")
 (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
+(eq_attr "isa" "fma_or_avx512vl")
+  (symbol_ref "TARGET_FMA || TARGET_AVX512VL")
 (eq_attr "isa" "avx512f") (symbol_ref "TARGET_AVX512F")
 (eq_attr "isa" "noavx512f") (symbol_ref "!TARGET_AVX512F")
 (eq_attr "isa" "avx512bw") (symbol_ref "TARGET_AVX512BW")
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 2d3b63f0834..b0093778fc6 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1019,12 +1019,13 @@ (define_insn "fmav2sf4"
  (match_operand:V2SF 1 "register_operand" "%0,v,x")
  (match_operand:V2SF 2 "register_operand" "v,v,x")
  (match_operand:V2SF 3 "register_operand" "v,0,x")))]
-  "(TARGET_FMA || TARGET_FMA4) && TARGET_MMX_WITH_SSE"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
+   && TARGET_MMX_WITH_SSE"
   "@
vfmadd132ps\t{%2, %3, %0|%0, %3, %2}
vfmadd231ps\t{%2, %1, %0|%0, %1, %2}
vfmaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "fma,fma,fma4")
+  [(set_attr "isa" "fma_or_avx512vl,fma_or_avx512vl,fma4")
(set_attr "type" "ssemuladd")
(set_attr "mode" "V4SF")])
 
@@ -1035,12 +1036,13 @@ (define_insn "fmsv2sf4"
  (match_operand:V2SF   2 "register_operand" "v,v,x")
  (neg:V2SF
(match_operand:V2SF 3 "register_operand" "v,0,x"]
-  "(TARGET_FMA || TARGET_FMA4) && TARGET_MMX_WITH_SSE"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
+   && TARGET_MMX_WITH_SSE"
   "@
vfmsub132ps\t{%2, %3, %0|%0, %3, %2}
vfmsub231ps\t{%2, %1, %0|%0, %1, %2}
vfmsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "fma,fma,fma4")
+  [(set_attr "isa" "fma_or_avx512vl,fma_or_avx512vl,fma4")
(set_attr "type" "ssemuladd")
(set_attr "mode" "V4SF")])
 
@@ -1051,12 +1053,13 @@ (define_insn "fnmav2sf4"
(match_operand:V2SF 1 "register_operand" "%0,v,x"))
  (match_operand:V2SF   2 "register_operand" "v,v,x")
  (match_operand:V2SF   3 "register_operand" "v,0,x")))]
-  "(TARGET_FMA || TARGET_FMA4) && TARGET_MMX_WITH_SSE"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
+   && TARGET_MMX_WITH_SSE"
   "@
vfnmadd132ps\t{%2, %3, %0|%0, %3, %2}
vfnmadd231ps\t{%2, %1, %0|%0, %1, %2}
vfnmaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "fma,fma,fma4")
+  [(set_attr "isa" "fma_or_avx512vl,fma_or_avx512vl,fma4")
(set_attr "type" "ssemuladd")
(set_attr "mode" "V4SF")])
 
@@ -1068,12 +1071,13 @@ (define_insn "fnmsv2sf4"
  (match_operand:V2SF   2 "register_operand" "v,v,x")
  (neg:V2SF
(match_operand:V2SF 3 "register_operand" "v,0,x"]
-  "(TARGET_FMA || TARGET_FMA4) && TARGET_MMX_WITH_SSE"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
+   && TARGET_MMX_WITH_SSE"
   "@
vfnmsub132ps\t{%2, %3, %0|%0, %3, %2}
vfnmsub231ps\t{%2, %1, %0|%0, %1, %2}
vfnmsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "fma,fma,fma4")
+  [(set_attr "isa" "fma_or_avx512vl,fma_or_avx512vl,fma4")
(set_attr "type" "ssemuladd")
(set_attr "mode" "V4SF")])
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr95046.c 
b/gcc/testsuite/gcc.target/i386/avx512vl-pr95046.c
new file mode 100644
index 000..02204d0e3c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr95046.c
@@ -0,0

Re: [PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph

2021-09-21 Thread Hongtao Liu via Gcc-patches

I'm going to check in 7 patches.

[PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph
[PATCH 50/62] AVX512FP16: Add testcases for
vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph.
[PATCH 51/62] AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
[PATCH 52/62] AVX512FP16: Add testcases for
vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
[PATCH 53/62] AVX512FP16: Add expander for sqrthf2.
[PATCH 54/62] AVX512FP16: Add expander for ceil/floor/trunc/roundeven.
[PATCH 55/62] AVX512FP16: Add expander for cstorehf4.

bootstrapped and regtest on x86_64-pc-linux-gnu{-m32,}
Newly added runtime tests passed on sde{-m32,}.

On Thu, Jul 1, 2021 at 2:18 PM liuhongt  wrote:
>
> gcc/ChangeLog:
>
> * config/i386/avx512fp16intrin.h (_mm512_fcmadd_pch):
> New intrinsic.
> (_mm512_mask_fcmadd_pch): Likewise.
> (_mm512_mask3_fcmadd_pch): Likewise.
> (_mm512_maskz_fcmadd_pch): Likewise.
> (_mm512_fmadd_pch): Likewise.
> (_mm512_mask_fmadd_pch): Likewise.
> (_mm512_mask3_fmadd_pch): Likewise.
> (_mm512_maskz_fmadd_pch): Likewise.
> (_mm512_fcmadd_round_pch): Likewise.
> (_mm512_mask_fcmadd_round_pch): Likewise.
> (_mm512_mask3_fcmadd_round_pch): Likewise.
> (_mm512_maskz_fcmadd_round_pch): Likewise.
> (_mm512_fmadd_round_pch): Likewise.
> (_mm512_mask_fmadd_round_pch): Likewise.
> (_mm512_mask3_fmadd_round_pch): Likewise.
> (_mm512_maskz_fmadd_round_pch): Likewise.
> (_mm512_fcmul_pch): Likewise.
> (_mm512_mask_fcmul_pch): Likewise.
> (_mm512_maskz_fcmul_pch): Likewise.
> (_mm512_fmul_pch): Likewise.
> (_mm512_mask_fmul_pch): Likewise.
> (_mm512_maskz_fmul_pch): Likewise.
> (_mm512_fcmul_round_pch): Likewise.
> (_mm512_mask_fcmul_round_pch): Likewise.
> (_mm512_maskz_fcmul_round_pch): Likewise.
> (_mm512_fmul_round_pch): Likewise.
> (_mm512_mask_fmul_round_pch): Likewise.
> (_mm512_maskz_fmul_round_pch): Likewise.
> * config/i386/avx512fp16vlintrin.h (_mm_fmadd_pch):
> New intrinsic.
> (_mm_mask_fmadd_pch): Likewise.
> (_mm_mask3_fmadd_pch): Likewise.
> (_mm_maskz_fmadd_pch): Likewise.
> (_mm256_fmadd_pch): Likewise.
> (_mm256_mask_fmadd_pch): Likewise.
> (_mm256_mask3_fmadd_pch): Likewise.
> (_mm256_maskz_fmadd_pch): Likewise.
> (_mm_fcmadd_pch): Likewise.
> (_mm_mask_fcmadd_pch): Likewise.
> (_mm_mask3_fcmadd_pch): Likewise.
> (_mm_maskz_fcmadd_pch): Likewise.
> (_mm256_fcmadd_pch): Likewise.
> (_mm256_mask_fcmadd_pch): Likewise.
> (_mm256_mask3_fcmadd_pch): Likewise.
> (_mm256_maskz_fcmadd_pch): Likewise.
> (_mm_fmul_pch): Likewise.
> (_mm_mask_fmul_pch): Likewise.
> (_mm_maskz_fmul_pch): Likewise.
> (_mm256_fmul_pch): Likewise.
> (_mm256_mask_fmul_pch): Likewise.
> (_mm256_maskz_fmul_pch): Likewise.
> (_mm_fcmul_pch): Likewise.
> (_mm_mask_fcmul_pch): Likewise.
> (_mm_maskz_fcmul_pch): Likewise.
> (_mm256_fcmul_pch): Likewise.
> (_mm256_mask_fcmul_pch): Likewise.
> (_mm256_maskz_fcmul_pch): Likewise.
> * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF_V8HF_V8HF,
> V8HF_FTYPE_V16HF_V16HF_V16HF, V16HF_FTYPE_V16HF_V16HF_V16HF_UQI,
> V32HF_FTYPE_V32HF_V32HF_V32HF_INT,
> V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT): Add new builtin types.
> * config/i386/i386-builtin.def: Add new builtins.
> * config/i386/i386-expand.c: Handle new builtin types.
> * config/i386/subst.md (SUBST_CV): New.
> (maskc_name): Ditto.
> (maskc_operand3): Ditto.
> (maskc): Ditto.
> (sdc_maskz_name): Ditto.
> (sdc_mask_op4): Ditto.
> (sdc_mask_op5): Ditto.
> (sdc_mask_mode512bit_condition): Ditto.
> (sdc): Ditto.
> (round_maskc_operand3): Ditto.
> (round_sdc_mask_operand4): Ditto.
> (round_maskc_op3): Ditto.
> (round_sdc_mask_op4): Ditto.
> (round_saeonly_sdc_mask_operand5): Ditto.
> * config/i386/sse.md (unspec): Add complex fma unspecs.
> (avx512fmaskcmode): New.
> (UNSPEC_COMPLEX_F_C_MA): Ditto.
> (UNSPEC_COMPLEX_F_C_MUL): Ditto.
> (complexopname): Ditto.
> (_fmaddc__maskz): New expander.
> (_fcmaddc__maskz): Ditto.
> (fma__): New
> define insn.
> (___mask): Ditto.
> (__): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx-1.c: Add test for new builtins.
> * gcc.target/i386/sse-13.c: Ditto.
> * gcc.target/i386/sse-23.c: Ditto.
> * gcc.target/i386/sse-14.c: Add test for new intrinsics.
> * gcc.target/i386/sse-22.c: Ditto.
> ---
>  gcc/config/i386/avx512fp16intrin.h | 386 +

Re: [PATCH v6] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-21 Thread Jason Merrill via Gcc-patches


On 9/21/21 21:19, Barrett Adair wrote:

This revision adds description and ChangeLog entries.


Applied, thanks!  I wrapped some too-long lines in the commit message 
and tweaked the whitespace in the code slightly.


Jason

Re: [PATCH] x86: Clean up gcc.target/i386/auto-init-* tests

2021-09-21 Thread H.J. Lu via Gcc-patches

On Mon, Sep 20, 2021 at 8:19 AM H.J. Lu  wrote:
>
> On Fri, Sep 17, 2021 at 10:32 AM Qing Zhao via Gcc-patches
>  wrote:
> >
> >
> >
> > > On Sep 17, 2021, at 11:59 AM, Jakub Jelinek  wrote:
> > >
> > > On Fri, Sep 17, 2021 at 04:55:22PM +, Qing Zhao wrote:
> > >> This is the patch to fix gcc.target/i386/auto-init-* tests.
> > >>
> > >> I have tested the change at X86_64-linux with
> > >>
> > >> make check-gcc 
> > >> RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512,-m64/-fstack-protector-all,-m64/-fstack-clash-protection,-m32/-mno-sse,-m32/-mtune=bonnell,-m32/-march=bonnell,-m32/-fstack-protector-all/-fstack-clash-protection\}
> > >>  i386.exp=auto-init*’
> > >>
> > >> make check-gcc 
> > >> RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512/-fPIC,-m64/-fstack-protector-all/-fPIC,-m64/-fstack-clash-protection/-fPIC,-m32/-mno-sse/-fPIC,-m32/-mtune=bonnell/-fPIC,-m32/-march=bonnell/-fPIC,-m32/-fstack-protector-all/-fstack-clash-protection/-fPIC\}
> > >>  i386.exp=auto-init*’
> > >>
> > >> Everything works fine.
> > >>
> > >> Okay for commit?
> > >
> > > LGTM.
> >
> > Thank you.
> >
> > I will commit the change soon.
> >
> > For the aarch64 tests, do you have a suggestion on what the option 
> > combination I should test?
> >
>
> Here is the followup patch to clean up these tests:
>
> 1. Replace ia32 with { ! lp64 } to enable ILP32 tests for -mx32.
> 2. Replace lp64 with { ! ia32 } to enable x86-64 ISA tests for -mx32.
> 3. For auto-init-3.c, add -msse and -mfpmath=387 for ia32.
>
> Any comments?

I will check it in tomorrow.

-- 
H.J.

Re: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC

2021-09-21 Thread Fāng-ruì Sòng via Gcc-patches

On Tue, Sep 21, 2021 at 6:57 PM H.J. Lu  wrote:
>
> On Tue, Sep 21, 2021 at 9:16 AM Uros Bizjak  wrote:
> >
> > On Mon, Sep 20, 2021 at 8:20 PM Fāng-ruì Sòng via Gcc-patches
> >  wrote:
> > >
> > > PING^5 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > >
> > > On Sat, Sep 4, 2021 at 12:11 PM Fāng-ruì Sòng  wrote:
> > > >
> > > > PING^4 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > > >
> > > > One major design goal of PIE was to avoid copy relocations.
> > > > The original patch for GCC 5 caused problems for many years.
> > > >
> > > > On Wed, Aug 18, 2021 at 11:54 PM Fāng-ruì Sòng  
> > > > wrote:
> > > >>
> > > >> PING^3 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > > >>
> > > >> On Fri, Jun 4, 2021 at 3:04 PM Fāng-ruì Sòng  
> > > >> wrote:
> > > >> >
> > > >> > PING^2 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > > >> >
> > > >> > On Mon, May 24, 2021 at 9:43 AM Fāng-ruì Sòng  
> > > >> > wrote:
> > > >> > >
> > > >> > > Ping https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > > >> > >
> > > >> > > On Tue, May 11, 2021 at 8:29 PM Fangrui Song  
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > This was introduced in 2014-12 to use local binding for external 
> > > >> > > > symbols
> > > >> > > > for -fPIE. Now that we have H.J. Lu's GOTPCRELX for years which 
> > > >> > > > mostly
> > > >> > > > nullify the benefit of HAVE_LD_PIE_COPYRELOC, 
> > > >> > > > HAVE_LD_PIE_COPYRELOC
> > > >> > > > should retire now.
> > > >> > > >
> > > >> > > > One design goal of -fPIE was to avoid copy relocations.
> > > >> > > > HAVE_LD_PIE_COPYRELOC has deviated from the goal.  With this 
> > > >> > > > change, the
> > > >> > > > -fPIE behavior of x86-64 will be closer to x86-32 and other 
> > > >> > > > targets.
> > > >> > > >
> > > >> > > > ---
> > > >> > > >
> > > >> > > > See https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html for 
> > > >> > > > a list
> > > >> > > > of fixed and unfixed (e.g. gold incompatibility with protected
> > > >> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=19823) issues.
> > > >> > > >
> > > >> > > > If you prefer a longer write-up, see
> > > >> > > > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
> > > >> > > > ---
> > > >> > > >  gcc/config.in |  6 ---
> > > >> > > >  gcc/config/i386/i386.c| 11 +---
> > > >> > > >  gcc/configure | 52 
> > > >> > > > ---
> > > >> > > >  gcc/configure.ac  | 48 
> > > >> > > > -
> > > >> > > >  gcc/doc/sourcebuild.texi  |  3 --
> > > >> > > >  .../gcc.target/i386/pie-copyrelocs-1.c| 14 -
> > > >> > > >  .../gcc.target/i386/pie-copyrelocs-2.c| 14 -
> > > >> > > >  .../gcc.target/i386/pie-copyrelocs-3.c| 14 -
> > > >> > > >  .../gcc.target/i386/pie-copyrelocs-4.c| 17 --
> > > >> > > >  gcc/testsuite/lib/target-supports.exp | 47 
> > > >> > > > -
> > > >> > > >  10 files changed, 2 insertions(+), 224 deletions(-)
> > > >> > > >  delete mode 100644 
> > > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c
> > > >> > > >  delete mode 100644 
> > > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c
> > > >> > > >  delete mode 100644 
> > > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c
> > > >> > > >  delete mode 100644 
> > > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-4.c
> >
> > From x86 maintainer's PoV, the implementation is trivially correct,
> > but I have no idea about functionality. HJ, can you please review the
> > functionality and post your opinion on the patch to move it forward?
> >
> > Thanks,
> > Uros.
>
> I prefer to leave it alone and apply this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576736.html
>
> instead.  I am working to add a nodirect_extern_access attribute based
> on feedback at LPC 2021.

I think -fpie should be fixed as soon as possible.

"Add -f[no-]direct-extern-access" says "-fdirect-extern-access is the default."
IMHO this is not a good choice for -fpie.
As the description of this patch says, one of the design goals of
-fpie is to avoid copy relocations.

> In executable and shared library, bind symbols with the STV_PROTECTED 
> visibility locally

As I have repeated many times (also Clang's behavior), STV_PROTECTED
visibility symbol should be bound locally regardless of
-fno-direct-extern-access.

I think it is fair to say all of Michael Matz, Alan Modra, and I think
adding so many behaviors under -fno-direct-extern-access is
over-engineering (well, because I don't think
-fno-direct-extern-access can be selected as the default behavior any
time soon).

https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#summary

Re: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC

2021-09-21 Thread H.J. Lu via Gcc-patches

On Tue, Sep 21, 2021 at 9:16 AM Uros Bizjak  wrote:
>
> On Mon, Sep 20, 2021 at 8:20 PM Fāng-ruì Sòng via Gcc-patches
>  wrote:
> >
> > PING^5 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> >
> > On Sat, Sep 4, 2021 at 12:11 PM Fāng-ruì Sòng  wrote:
> > >
> > > PING^4 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > >
> > > One major design goal of PIE was to avoid copy relocations.
> > > The original patch for GCC 5 caused problems for many years.
> > >
> > > On Wed, Aug 18, 2021 at 11:54 PM Fāng-ruì Sòng  wrote:
> > >>
> > >> PING^3 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > >>
> > >> On Fri, Jun 4, 2021 at 3:04 PM Fāng-ruì Sòng  wrote:
> > >> >
> > >> > PING^2 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > >> >
> > >> > On Mon, May 24, 2021 at 9:43 AM Fāng-ruì Sòng  
> > >> > wrote:
> > >> > >
> > >> > > Ping https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> > >> > >
> > >> > > On Tue, May 11, 2021 at 8:29 PM Fangrui Song  
> > >> > > wrote:
> > >> > > >
> > >> > > > This was introduced in 2014-12 to use local binding for external 
> > >> > > > symbols
> > >> > > > for -fPIE. Now that we have H.J. Lu's GOTPCRELX for years which 
> > >> > > > mostly
> > >> > > > nullify the benefit of HAVE_LD_PIE_COPYRELOC, HAVE_LD_PIE_COPYRELOC
> > >> > > > should retire now.
> > >> > > >
> > >> > > > One design goal of -fPIE was to avoid copy relocations.
> > >> > > > HAVE_LD_PIE_COPYRELOC has deviated from the goal.  With this 
> > >> > > > change, the
> > >> > > > -fPIE behavior of x86-64 will be closer to x86-32 and other 
> > >> > > > targets.
> > >> > > >
> > >> > > > ---
> > >> > > >
> > >> > > > See https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html for a 
> > >> > > > list
> > >> > > > of fixed and unfixed (e.g. gold incompatibility with protected
> > >> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=19823) issues.
> > >> > > >
> > >> > > > If you prefer a longer write-up, see
> > >> > > > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
> > >> > > > ---
> > >> > > >  gcc/config.in |  6 ---
> > >> > > >  gcc/config/i386/i386.c| 11 +---
> > >> > > >  gcc/configure | 52 
> > >> > > > ---
> > >> > > >  gcc/configure.ac  | 48 
> > >> > > > -
> > >> > > >  gcc/doc/sourcebuild.texi  |  3 --
> > >> > > >  .../gcc.target/i386/pie-copyrelocs-1.c| 14 -
> > >> > > >  .../gcc.target/i386/pie-copyrelocs-2.c| 14 -
> > >> > > >  .../gcc.target/i386/pie-copyrelocs-3.c| 14 -
> > >> > > >  .../gcc.target/i386/pie-copyrelocs-4.c| 17 --
> > >> > > >  gcc/testsuite/lib/target-supports.exp | 47 
> > >> > > > -
> > >> > > >  10 files changed, 2 insertions(+), 224 deletions(-)
> > >> > > >  delete mode 100644 
> > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c
> > >> > > >  delete mode 100644 
> > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c
> > >> > > >  delete mode 100644 
> > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c
> > >> > > >  delete mode 100644 
> > >> > > > gcc/testsuite/gcc.target/i386/pie-copyrelocs-4.c
>
> From x86 maintainer's PoV, the implementation is trivially correct,
> but I have no idea about functionality. HJ, can you please review the
> functionality and post your opinion on the patch to move it forward?
>
> Thanks,
> Uros.

I prefer to leave it alone and apply this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576736.html

instead.  I am working to add a nodirect_extern_access attribute based
on feedback at LPC 2021.

-- 
H.J.

Re: [PATCH] Allow different vector types for stmt groups

2021-09-21 Thread Hongtao Liu via Gcc-patches

On Tue, Sep 21, 2021 at 10:55 PM H.J. Lu  wrote:
>
> On Mon, Sep 20, 2021 at 5:15 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > This allows vectorization (in practice non-loop vectorization) to
> > have a stmt participate in different vector type vectorizations.
> > It allows us to remove vect_update_shared_vectype and replace it
> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > vect_analyze_stmt and vect_transform_stmt.
> >
> > For data-ref the situation is a bit more complicated since we
> > analyze alignment info with a specific vector type in mind which
> > doesn't play well when that changes.
> >
> > So the bulk of the change is passing down the actual vector type
> > used for a vectorized access to the various accessors of alignment
> > info, first and foremost dr_misalignment but also aligned_access_p,
> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > vect_supportable_dr_alignment.  I took the liberty to replace
> > ALL_CAPS macro accessors with the lower-case function invocations.
> >
> > The actual changes to the behavior are in dr_misalignment which now
> > is the place factoring in the negative step adjustment as well as
> > handling alignment queries for a vector type with bigger alignment
> > requirements than what we can (or have) analyze(d).
> >
> > vect_slp_analyze_node_alignment makes use of this and upon receiving
> > a vector type with a bigger alingment desire re-analyzes the DR
> > with respect to it but keeps an older more precise result if possible.
> > In this context it might be possible to do the analysis just once
> > but instead of analyzing with respect to a specific desired alignment
> > look for the biggest alignment we can compute a not unknown alignment.
> >
> > The ChangeLog includes the functional changes but not the bulk due
> > to the alignment accessor API changes - I hope that's something good.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> > CPU 2017 in progress (for stats and correctness).
> >
> > Any comments?
> >
> > Thanks,
> > Richard.
> >
> > 2021-09-17  Richard Biener  
> >
> > PR tree-optimization/97351
> > PR tree-optimization/97352
> > PR tree-optimization/82426
> > * tree-vectorizer.h (dr_misalignment): Add vector type
> > argument.
> > (aligned_access_p): Likewise.
> > (known_alignment_for_access_p): Likewise.
> > (vect_supportable_dr_alignment): Likewise.
> > (vect_known_alignment_in_bytes): Likewise.  Refactor.
> > (DR_MISALIGNMENT): Remove.
> > (vect_update_shared_vectype): Likewise.
> > * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
> > a vector type with larger alignment requirement and apply
> > the negative step adjustment here.
> > (vect_calculate_target_alignment): Remove.
> > (vect_compute_data_ref_alignment): Get explicit vector type
> > argument, do not apply a negative step alignment adjustment
> > here.
> > (vect_slp_analyze_node_alignment): Re-analyze alignment
> > when we re-visit the DR with a bigger desired alignment but
> > keep more precise results from smaller alignments.
> > * tree-vect-slp.c (vect_update_shared_vectype): Remove.
> > (vect_slp_analyze_node_operations_1): Do not update the
> > shared vector type on stmts.
> > * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
> > vector type of an SLP node to the representative stmt-info.
> > (vect_transform_stmt): Likewise.
> >
> > * gcc.target/i386/vect-pr82426.c: New testcase.
> > * gcc.target/i386/vect-pr97352.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/vect-pr82426.c |  32 +++
> >  gcc/testsuite/gcc.target/i386/vect-pr97352.c |  22 ++
> >  gcc/tree-vect-data-refs.c| 217 ++-
> >  gcc/tree-vect-slp.c  |  59 -
> >  gcc/tree-vect-stmts.c|  77 ---
> >  gcc/tree-vectorizer.h|  32 ++-
> >  6 files changed, 231 insertions(+), 208 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr82426.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr97352.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-pr82426.c 
> > b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > new file mode 100644
> > index 000..741a1d14d36
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > @@ -0,0 +1,32 @@
> > +/* i?86 does not have V2SF, x32 does though.  */
> > +/* { dg-do compile { target { lp64 || x32 } } } */
>
> It should be target { ! ia32 }
>
> > +/* ???  With AVX512 we only realize one FMA opportunity.  */
>
> Hongtao, is AVX512 missing 64-bit vector support??
>
(define_insn "fmav2sf4"
  [(set (match_operand:V2SF 0 "register_operand" "=v,v,x")
(fma:V2SF
  (match_operand:V2SF 1

Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-21 Thread Hongtao Liu via Gcc-patches

On Mon, Sep 20, 2021 at 4:13 AM Martin Sebor  wrote:
>
> On 9/16/21 3:03 AM, Hongtao Liu via Gcc-patches wrote:
> > On Thu, Sep 16, 2021 at 4:23 PM Richard Biener via Gcc-patches
> >  wrote:
> >>
> >> On Thu, 16 Sep 2021, liuhongt wrote:
> >>
> >>> Ping
> >>> rebased on latest trunk.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>* common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
> >>>* doc/invoke.texi (Options That Control Optimization): Update
> >>>documents.
> >>>* opts.c (default_options_table): Enable auto-vectorization at
> >>>O2 with very-cheap cost model.
> >>>(finish_options): Use cheap cost model for
> >>>explicit -ftree{,-loop}-vectorize.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>* c-c++-common/Wstringop-overflow-2.c: Adjust testcase.
> >>>* g++.dg/tree-ssa/pr81408.C: Ditto.
> >>>* g++.dg/warn/Wuninitialized-13.C: Ditto.
> >>>* gcc.dg/Warray-bounds-51.c: Ditto.
> >>>* gcc.dg/Warray-parameter-3.c: Ditto.
> >>>* gcc.dg/Wstringop-overflow-13.c: Ditto.
> >>>* gcc.dg/Wstringop-overflow-14.c: Ditto.
> >>>* gcc.dg/Wstringop-overflow-21.c: Ditto.
> >>>* gcc.dg/Wstringop-overflow-68.c: Ditto.
> >>>* gcc.dg/gomp/pr46032-2.c: Ditto.
> >>>* gcc.dg/gomp/pr46032-3.c: Ditto.
> >>>* gcc.dg/gomp/simd-2.c: Ditto.
> >>>* gcc.dg/gomp/simd-3.c: Ditto.
> >>>* gcc.dg/graphite/fuse-1.c: Ditto.
> >>>* gcc.dg/pr67089-6.c: Ditto.
> >>>* gcc.dg/pr82929-2.c: Ditto.
> >>>* gcc.dg/pr82929.c: Ditto.
> >>>* gcc.dg/store_merging_1.c: Ditto.
> >>>* gcc.dg/store_merging_11.c: Ditto.
> >>>* gcc.dg/store_merging_15.c: Ditto.
> >>>* gcc.dg/store_merging_16.c: Ditto.
> >>>* gcc.dg/store_merging_19.c: Ditto.
> >>>* gcc.dg/store_merging_24.c: Ditto.
> >>>* gcc.dg/store_merging_25.c: Ditto.
> >>>* gcc.dg/store_merging_28.c: Ditto.
> >>>* gcc.dg/store_merging_30.c: Ditto.
> >>>* gcc.dg/store_merging_5.c: Ditto.
> >>>* gcc.dg/store_merging_7.c: Ditto.
> >>>* gcc.dg/store_merging_8.c: Ditto.
> >>>* gcc.dg/strlenopt-85.c: Ditto.
> >>>* gcc.dg/tree-ssa/dump-6.c: Ditto.
> >>>* gcc.dg/tree-ssa/pr19210-1.c: Ditto.
> >>>* gcc.dg/tree-ssa/pr47059.c: Ditto.
> >>>* gcc.dg/tree-ssa/pr86017.c: Ditto.
> >>>* gcc.dg/tree-ssa/pr91482.c: Ditto.
> >>>* gcc.dg/tree-ssa/predcom-1.c: Ditto.
> >>>* gcc.dg/tree-ssa/predcom-dse-3.c: Ditto.
> >>>* gcc.dg/tree-ssa/prefetch-3.c: Ditto.
> >>>* gcc.dg/tree-ssa/prefetch-6.c: Ditto.
> >>>* gcc.dg/tree-ssa/prefetch-8.c: Ditto.
> >>>* gcc.dg/tree-ssa/prefetch-9.c: Ditto.
> >>>* gcc.dg/tree-ssa/ssa-dse-18.c: Ditto.
> >>>* gcc.dg/tree-ssa/ssa-dse-19.c: Ditto.
> >>>* gcc.dg/uninit-40.c: Ditto.
> >>>* gcc.dg/unroll-7.c: Ditto.
> >>>* gcc.misc-tests/help.exp: Ditto.
> >>>* gcc.target/i386/avx512vpopcntdqvl-vpopcntd-1.c: Ditto.
> >>>* gcc.target/i386/pr22141.c: Ditto.
> >>>* gcc.target/i386/pr34012.c: Ditto.
> >>>* gcc.target/i386/pr49781-1.c: Ditto.
> >>>* gcc.target/i386/pr95798-1.c: Ditto.
> >>>* gcc.target/i386/pr95798-2.c: Ditto.
> >>>* gfortran.dg/pr77498.f: Ditto.
> >>> ---
> >>>   gcc/common.opt |  2 +-
> >>>   gcc/doc/invoke.texi|  8 +---
> >>>   gcc/opts.c | 18 +++---
> >>>   .../c-c++-common/Wstringop-overflow-2.c|  2 +-
> >>>   gcc/testsuite/g++.dg/tree-ssa/pr81408.C|  2 +-
> >>>   gcc/testsuite/g++.dg/warn/Wuninitialized-13.C  |  2 +-
> >>>   gcc/testsuite/gcc.dg/Warray-bounds-51.c|  2 +-
> >>>   gcc/testsuite/gcc.dg/Warray-parameter-3.c  |  2 +-
> >>>   gcc/testsuite/gcc.dg/Wstringop-overflow-13.c   |  2 +-
> >>>   gcc/testsuite/gcc.dg/Wstringop-overflow-14.c   |  2 +-
> >>>   gcc/testsuite/gcc.dg/Wstringop-overflow-21.c   |  2 +-
> >>>   gcc/testsuite/gcc.dg/Wstringop-overflow-68.c   |  2 +-
> >>>   gcc/testsuite/gcc.dg/gomp/pr46032-2.c  |  2 +-
> >>>   gcc/testsuite/gcc.dg/gomp/pr46032-3.c  |  2 +-
> >>>   gcc/testsuite/gcc.dg/gomp/simd-2.c |  2 +-
> >>>   gcc/testsuite/gcc.dg/gomp/simd-3.c |  2 +-
> >>>   gcc/testsuite/gcc.dg/graphite/fuse-1.c |  2 +-
> >>>   gcc/testsuite/gcc.dg/pr67089-6.c   |  2 +-
> >>>   gcc/testsuite/gcc.dg/pr82929-2.c   |  2 +-
> >>>   gcc/testsuite/gcc.dg/pr82929.c |  2 +-
> >>>   gcc/testsuite/gcc.dg/store_merging_1.c |  2 +-
> >>>   gcc/testsuite/gcc.dg/store_merging_11.c|  2 +-
> >>>   gcc/testsuite/gcc.dg/store_merging_15.c|  2 +-
> >>>   gcc/testsuite/gcc.dg/store_merging_16.c|  2 +-
> >>>   gcc/testsuite/gcc.dg/store_merging_19.c|  2 +-

[PATCH] libiberty: prevent null dereferencing on dlang_type

2021-09-21 Thread Luís Ferreira

This patch prevents dereferencing a null reference on a crafted
malformed magled name, often causing SIGSEGV to be raised.

Signed-off-by: Luís Ferreira 
---
 libiberty/d-demangle.c  | 2 +-
 libiberty/testsuite/d-demangle-expected | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index a2152cc65518..469398261994 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -875,7 +875,7 @@ dlang_type (string *decl, const char *mangled,
struct dlang_info *info)
   szmods = string_length ();
 
   /* Back referenced function type.  */
-  if (*mangled == 'Q')
+  if (mangled && *mangled == 'Q')
mangled = dlang_type_backref (decl, mangled, info, 1);
   else
mangled = dlang_function_type (decl, mangled, info);
diff --git a/libiberty/testsuite/d-demangle-expected
b/libiberty/testsuite/d-demangle-expected
index c35185c3e1e3..799f4724b72e 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -991,11 +991,14 @@ _D88
 _D5__T1aZv
 _D5__T1aZv
 #
---format=dlang
 _D00
 _D00
 #
 --format=dlang
+_D01_D
+_D01_D
+#
+--format=dlang
 _D9223372036854775817
 _D9223372036854775817
 #



signature.asc
Description: This is a digitally signed message part

[PATCH v6] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-21 Thread Barrett Adair via Gcc-patches

This revision adds description and ChangeLog entries.

Thanks,
Barrett
From d256074a17f16ce5a90653e1d907a8dbc105aa43 Mon Sep 17 00:00:00 2001
From: Barrett Adair 
Date: Wed, 15 Sep 2021 15:26:22 -0500
Subject: [PATCH] c++: fix template instantiation comparison in redeclarations

This change fixes a primordial c++11 frontend defect where function template
redeclarations with trailing return types that used dependent
sizeof/alignof/noexcept expressions in template value arguments failed to
compare as equivalent to the identical primary template declaration. By forcing
structural AST comparison of the template arguments, we no longer require
TYPE_CANONICAL to match in this case. The new canon-type-{15..18}.C tests failed
with all prior GCC versions, where the redeclarations were incorrectly reported
as ambiguous overloads. The new dependent-name{15,16}.C tests are regression
tests for sneaky problems encountered during development of this fix. Note that
this fix does not address the use of parm objects' constexpr members as template
arguments within a declaration (a superficially similar longstanding defect).

gcc/cp/ChangeLog:

	* pt.c (find_parm_usage_r): New walk_tree callback to find func parms.
	(any_template_arguments_need_structural_equality_p): New special case.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/constexpr-52830.C: Remove unwanted dg-ice.
	* g++.dg/template/canon-type-15.C: New test.
	* g++.dg/template/canon-type-16.C: New test.
	* g++.dg/template/canon-type-17.C: New test.
	* g++.dg/template/canon-type-18.C: New test.
	* g++.dg/template/dependent-name15.C: New regression test.
	* g++.dg/template/dependent-name16.C: New regression test.
---
 gcc/cp/pt.c   | 20 +++
 gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C  |  1 -
 gcc/testsuite/g++.dg/template/canon-type-15.C |  7 +++
 gcc/testsuite/g++.dg/template/canon-type-16.C |  6 ++
 gcc/testsuite/g++.dg/template/canon-type-17.C |  5 +
 gcc/testsuite/g++.dg/template/canon-type-18.C |  6 ++
 .../g++.dg/template/dependent-name15.C| 18 +
 .../g++.dg/template/dependent-name16.C| 14 +
 8 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-15.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-16.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-17.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-18.C
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-name15.C
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-name16.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4d42899f28d..5ba771db678 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -27813,6 +27813,19 @@ dependent_template_arg_p (tree arg)
 return value_dependent_expression_p (arg);
 }
 
+/* Identify any expressions that use function parms.  */
+static tree
+find_parm_usage_r (tree *tp, int *walk_subtrees, void*)
+{
+  tree t = *tp;
+  if (TREE_CODE (t) == PARM_DECL)
+{
+  *walk_subtrees = 0;
+  return t;
+}
+  return NULL_TREE;
+}
+
 /* Returns true if ARGS (a collection of template arguments) contains
any types that require structural equality testing.  */
 
@@ -27857,6 +27870,13 @@ any_template_arguments_need_structural_equality_p (tree args)
 	  else if (!TYPE_P (arg) && TREE_TYPE (arg)
 		   && TYPE_STRUCTURAL_EQUALITY_P (TREE_TYPE (arg)))
 		return true;
+	  /* Checking current_function_decl because this structural
+	 comparison is only necessary for redeclaration.  */
+	  else if (!current_function_decl
+		   && dependent_template_arg_p (arg)
+		   && (cp_walk_tree_without_duplicates
+		   (, find_parm_usage_r, NULL)))
+		return true;
 	}
 	}
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
index eae0d8c377b..d6057f13497 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
@@ -1,7 +1,6 @@
 // PR c++/52830
 // { dg-do compile { target c++11 } }
 // { dg-additional-options "-fchecking" }
-// { dg-ice "comptypes" }
 
 template struct eif { typedef void type; };
 template<>   struct eif {};
diff --git a/gcc/testsuite/g++.dg/template/canon-type-15.C b/gcc/testsuite/g++.dg/template/canon-type-15.C
new file mode 100644
index 000..b001b5c841d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-15.C
@@ -0,0 +1,7 @@
+// { dg-do compile { target c++11 } }
+template struct size_c{ static constexpr unsigned value = u; };
+namespace g {
+template auto return_size(T t) -> size_c;
+template auto return_size(T t) -> size_c;
+}
+static_assert(decltype(g::return_size('a'))::value == 1u, "");
diff --git a/gcc/testsuite/g++.dg/template/canon-type-16.C b/gcc/testsuite/g++.dg/template/canon-type-16.C
new file mode 100644
index 000..99361cbac30
--- /dev/null
+++

[PATCH] libiberty: prevent buffer overflow when decoding user input

2021-09-21 Thread Luís Ferreira

Currently a stack/heap overflow may happen if a crafted mangle is
maliciously used to cause denial of service, such as intentional
crashes
by accessing a reserved memory space.

Signed-off-by: Luís Ferreira 
---
 libiberty/d-demangle.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index a2152cc65518..7ded3e2a2563 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -381,7 +381,7 @@ dlang_symbol_backref (string *decl, const char
*mangled,
 
   /* Must point to a simple identifier.  */
   backref = dlang_number (backref, );
-  if (backref == NULL)
+  if (backref == NULL || strlen(backref) < len)
 return NULL;
 
   backref = dlang_lname (decl, backref, len);




signature.asc
Description: This is a digitally signed message part

Re: [PATCH] c++: fix wrong fixit hints for misspelled typedef [PR77565]

2021-09-21 Thread Michel Morin via Gcc-patches

On Tue, Sep 21, 2021 at 5:24 AM Jason Merrill  wrote:
>
> On 9/17/21 13:31, Michel Morin wrote:
> > On Fri, Sep 17, 2021 at 3:23 AM Jason Merrill  wrote:
> >>
> >> On 9/16/21 11:50, Michel Morin wrote:
> >>> On Thu, Sep 16, 2021 at 5:44 AM Jason Merrill  wrote:
> 
>  On 9/14/21 04:29, Michel Morin via Gcc-patches wrote:
> > On Tue, Sep 14, 2021 at 7:14 AM David Malcolm  
> > wrote:
> >>
> >> On Tue, 2021-09-14 at 03:35 +0900, Michel Morin via Gcc-patches wrote:
> >>> Hi,
> >>>
> >>> PR77565 reports that, with the code `typdef int Int;`, GCC emits
> >>> "did you mean 'typeof'?" instead of "did you mean 'typedef'?".
> >>>
> >>> This happens because the typo corrector determines that `typeof` is a
> >>> candidate for suggestion (through
> >>> `cp_keyword_starts_decl_specifier_p`),
> >>> but `typedef` is not.
> >>>
> >>> This patch fixes the issue by adding `typedef` as a candidate. The
> >>> patch
> >>> additionally adds the `inline` specifier and cv-specifiers as a
> >>> candidate.
> >>> Here is a patch (tests `make check-gcc` pass on darwin):
> >>
> >> Thanks for this patch (and for reporting the bug in the first place).
> >>
> >> I notice that, as well as being used for fix-it hints by
> >> lookup_name_fuzzy (indirectly via suggest_rid_p),
> >> cp_keyword_starts_decl_specifier_p is also used by
> >> cp_lexer_next_token_is_decl_specifier_keyword, which is used by
> >> cp_parser_lambda_declarator_opt and cp_parser_constructor_declarator_p.
> >
> > Ah, you're right! Thank you for pointing this out.
> > I failed to grep those functions somehow.
> >
> > One thing that confuses me is that cp_keyword_starts_decl_specifier_p
> > misses many keywords that can start decl-specifiers (e.g.
> > typedef/inline/cv-qual and friend/explicit/virtual).
> > So let's wait C++ frontend maintainers ;)
> 
>  That is strange.  Let's add all the rest of them as well.
> >>>
> >>> Done. Thanks for your help!
> >>>
> >>> One more thing — cp_keyword_starts_decl_specifier_p includes RID_ATTRIBUTE
> >>> (from the beginning; see https://gcc.gnu.org/PR28261 ), but attributes are
> >>> not decl-specifiers. Would it be reasonable to remove this?
> >>
> >> It looks like the place that PR28261 used
> >> cp_lexer_next_token_is_decl_specifier_keyword specifically exempts
> >> attributes:
> >>
> >>>&& (!cp_lexer_next_token_is_decl_specifier_keyword 
> >>> (parser->lexer)
> >>>/* GNU attributes can actually appear both at the start of
> >>>   a parameter and parenthesized declarator.
> >>>   S (__attribute__((unused)) int);
> >>>   is a constructor, but
> >>>   S (__attribute__((unused)) foo) (int);
> >>>   is a function declaration.  */
> >>>|| (cp_parser_allow_gnu_extensions_p (parser)
> >>>&& cp_next_tokens_can_be_gnu_attribute_p (parser)))
> >>
> >> So yes, let's remove RID_ATTRIBUTE and the || clause there.  I'd keep
> >> the comment, but move it to go with the test for C++11 attributes below.
> >
> > Done. No regressions introduced.
> >
> >>> One more thing — cp_keyword_starts_decl_specifier_p includes RID_ATTRIBUTE
> >>> (from the beginning; see https://gcc.gnu.org/PR28261 ), but attributes are
> >>> not decl-specifiers.
> >
> > Oh, this is wrong. I thought that, since C++11 attributes are not a
> > decl-specifier, neither are GNU attributes. But the comment just before
> > cp_parser_decl_specifier_seq says that GNU attributes are considered as a
> > decl-specifier. So I'm not confident about the removal of RID_ATTRIBUTE in
> > cp_keyword_starts_decl_specifier_p...
>
> GNU attributes can appear in lots of places, and the only two callers of
> cp_parser_next_token_is_decl_specifier_keyword don't want to treat
> attributes accordingly.

Makes sense.

> Let's go with both your patches, and also
> remove the consequently-unnecessary attributes check in
> cp_parser_lambda_declarator_opt:
>
> >   if (cp_lexer_next_token_is_decl_specifier_keyword (parser->lexer)
> >   && !cp_next_tokens_can_be_gnu_attribute_p (parser))
>
> OK with that change.

Updated and rebased the patch. No regressions on x86_64-apple-darwin.

Thank you for your help!


>
> > I've split the patch into two. The first one is for adding missing keywords 
> > to
> > fix PR77565 and the second one is for removing the "attribute" keyword.
> > Here is the second patch (if this is not applied, that's no problem ;) )
> >
> > ==
> > c++: adjust the handling of RID_ATTRIBUTE.
> >
> > gcc/cp/ChangeLog:
> >
> > * parser.c (cp_keyword_starts_decl_specifier_p): Do not
> > handle RID_ATTRIBUTE.
> > (cp_parser_constructor_declarator_p): Remove now-redundant
> > checks.
> >
> > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> > index

Re: [PATCH] warn for more impossible null pointer tests [PR102103]

2021-09-21 Thread Martin Sebor via Gcc-patches


On 9/21/21 3:40 PM, Jason Merrill wrote:

On 9/17/21 12:02, Martin Sebor wrote:

On 9/8/21 2:06 PM, Jason Merrill wrote:

On 9/2/21 7:53 PM, Martin Sebor wrote:
@@ -4622,28 +4622,94 @@ warn_for_null_address (location_t location, 
tree op, tsubst_flags_t complain)

    if (!warn_address
    || (complain & tf_warning) == 0
    || c_inhibit_evaluation_warnings != 0
-  || warning_suppressed_p (op, OPT_Waddress))
+  || warning_suppressed_p (op, OPT_Waddress)
+  || processing_template_decl != 0)


Completely suppressing this warning in templates seems like a 
regression;  I'd think we could recognize many relevant cases before 
instantiation.  You just can't assume that ADDR_EXPR has the default 
meaning if it has unknown type (i.e. because op0 is type-dependent).


I added the suppression to keep g++.dg/warn/pr101219.C from failing
but in hindsight I should have questioned the reasoning behind
the "no warning emitted here (no instantiation)" comment in the test.

I agree that it would be helpful to diagnose the type-independent
subset of the problem even in uninstantiated templates.  Current
trunk doesn't (it never has), but with my patch and the suppression
above removed it does.  I've updated the tests to expect it.

Please see the attached revision.

Martin

PS There are still more opportunities to issue -Waddress in templates
that this patch doesn't handle, e.g.,:

   template  bool f (T *p) { return  == 0; }

Handling this will take more surgery.

PPS It seems that most other warnings (and even some errors, like
-Wnarrowing) are suppressed in uninstantiated templates as well,
even for non-dependent expressions.  In the few test cases I looked
at Clang does better.  It sounds like you'd like to see improvements
in this direction not just for -Waddress but in general.  Just for
the avoidance of doubt, can you confirm that for future reference?


Yes, in general it's better to diagnose sooner.


+  if (TREE_CODE (cop) == NON_LVALUE_EXPR)
+    /* Unwrap the expression for C++ 98.  */
+    cop = TREE_OPERAND (cop, 0);


What does this have to do with C++98?


The code is needed to avoid failures in C++ 98 in the test below
where COP is a NON_LVALUE_EXPR which isn't handled below otherwise.
I didn't investigate why that happens (it works fine if f() is
an ordinary member function).

  void f (bool);

  void g ()
  {
struct A { virtual void vf (); };

f (::vf);   // missing -Waddress in C++ 98 mode
  }




+  if (TREE_CODE (cop) == PTRMEM_CST)
+    {
+  /* The address of a nonstatic data member is never null.  */
+  warning_at (location, OPT_Waddress,
+  "the address %qE will never be NULL",


Capitalizing NULL when talking about pointers-to-members seems a bit 
odd, but I guess it's fine.


I agree.  My personal preference is for lowercase null (in all
languages) since that's the technical term for it.  I used NULL
here only to conform to the existing style.  I'm willing to
change all these warnings to either use null or to some form
that doesn't mention null (there are two in use, although if
I had my druthers I'd choose some other phrasing altogether).
Let me know if you would support such a change.



The C++ changes are OK.


Jeff, should I take your previous "Generally OK" as an approval
for the rest of the patch as well?  (It has not changed in v2.)
I have just submitted a Glibc patch to suppress the new instances
there.

Martin

Re: [PATCH] c++: concept-ids and value-dependence [PR102412]

2021-09-21 Thread Jason Merrill via Gcc-patches


On 9/21/21 09:30, Patrick Palka wrote:

  case TEMPLATE_ID_EXPR:
-  return concept_definition_p (TREE_OPERAND (expression, 0));
+  return concept_definition_p (TREE_OPERAND (expression, 0))
+   && any_dependent_template_arguments_p (TREE_OPERAND (expression, 1));


Hmm, do we even need to check concept_definition_p?  Even if other 
template-ids don't get here, if they did they would also be dependent if 
they had dependent template arguments.


OK either way.

Jason

Re: [PATCH v5] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-21 Thread Jason Merrill via Gcc-patches


On 9/21/21 19:07, Barrett Adair wrote:
Thanks for the feedback. I addressed the concerns raised in the previous 
patch.


I ran tests with this, after rebasing:

../gcc/configure --prefix=$MYPREFIX --enable-languages=c++ 
--enable-bootstrap --enable-checking=yes,extra --disable-multilib && 
make -j6 && make check


I have the "expected" failures:
* guality
* xtreme-header-4_b.C
* 20050826-2.c scan-tree-dump-not dom2 "Invalid sum"

Additionally, libstdc++-v3/testsuite/30_threads/jthread/95989.cc results 
in a target segfault, but this also happens when manually building with 
Ubuntu's g++-11 and the specified flags. It looks like this test 
predates the 11.1 release, so I doubt this crash is related to the patch.


Other than that, no FAIL or XPASS.



Subject: [PATCH] Fix template instantiation comparison in redeclarations

---
 gcc/cp/pt.c   | 20 +++


Thanks.  Your patch still needs a description (maybe condensed from the 
intro to the first version of the patch) and ChangeLog entries.  If you 
run contrib/gcc-git-customization.sh it will define an alias that you 
can use to add skeleton ChangeLog entries to the commit with


git gcc-commit-mklog --amend

Sorry I didn't catch this before.

Jason

[PATCH v5] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-21 Thread Barrett Adair via Gcc-patches

Thanks for the feedback. I addressed the concerns raised in the previous
patch.

I ran tests with this, after rebasing:

../gcc/configure --prefix=$MYPREFIX --enable-languages=c++
--enable-bootstrap --enable-checking=yes,extra --disable-multilib && make
-j6 && make check

I have the "expected" failures:
* guality
* xtreme-header-4_b.C
* 20050826-2.c scan-tree-dump-not dom2 "Invalid sum"

Additionally, libstdc++-v3/testsuite/30_threads/jthread/95989.cc results in
a target segfault, but this also happens when manually building with
Ubuntu's g++-11 and the specified flags. It looks like this test predates
the 11.1 release, so I doubt this crash is related to the patch.

Other than that, no FAIL or XPASS.
From 22e3d2eb76d1d3178bdd80f67b8be7aea8912d0c Mon Sep 17 00:00:00 2001
From: Barrett Adair 
Date: Wed, 15 Sep 2021 15:26:22 -0500
Subject: [PATCH] Fix template instantiation comparison in redeclarations

---
 gcc/cp/pt.c   | 20 +++
 gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C  |  1 -
 gcc/testsuite/g++.dg/template/canon-type-15.C |  7 +++
 gcc/testsuite/g++.dg/template/canon-type-16.C |  6 ++
 gcc/testsuite/g++.dg/template/canon-type-17.C |  5 +
 gcc/testsuite/g++.dg/template/canon-type-18.C |  6 ++
 .../g++.dg/template/dependent-name15.C| 18 +
 .../g++.dg/template/dependent-name16.C| 14 +
 8 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-15.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-16.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-17.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-18.C
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-name15.C
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-name16.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4d42899f28d..5ba771db678 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -27813,6 +27813,19 @@ dependent_template_arg_p (tree arg)
 return value_dependent_expression_p (arg);
 }
 
+/* Identify any expressions that use function parms.  */
+static tree
+find_parm_usage_r (tree *tp, int *walk_subtrees, void*)
+{
+  tree t = *tp;
+  if (TREE_CODE (t) == PARM_DECL)
+{
+  *walk_subtrees = 0;
+  return t;
+}
+  return NULL_TREE;
+}
+
 /* Returns true if ARGS (a collection of template arguments) contains
any types that require structural equality testing.  */
 
@@ -27857,6 +27870,13 @@ any_template_arguments_need_structural_equality_p (tree args)
 	  else if (!TYPE_P (arg) && TREE_TYPE (arg)
 		   && TYPE_STRUCTURAL_EQUALITY_P (TREE_TYPE (arg)))
 		return true;
+	  /* Checking current_function_decl because this structural
+	 comparison is only necessary for redeclaration.  */
+	  else if (!current_function_decl
+		   && dependent_template_arg_p (arg)
+		   && (cp_walk_tree_without_duplicates
+		   (, find_parm_usage_r, NULL)))
+		return true;
 	}
 	}
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
index eae0d8c377b..d6057f13497 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
@@ -1,7 +1,6 @@
 // PR c++/52830
 // { dg-do compile { target c++11 } }
 // { dg-additional-options "-fchecking" }
-// { dg-ice "comptypes" }
 
 template struct eif { typedef void type; };
 template<>   struct eif {};
diff --git a/gcc/testsuite/g++.dg/template/canon-type-15.C b/gcc/testsuite/g++.dg/template/canon-type-15.C
new file mode 100644
index 000..b001b5c841d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-15.C
@@ -0,0 +1,7 @@
+// { dg-do compile { target c++11 } }
+template struct size_c{ static constexpr unsigned value = u; };
+namespace g {
+template auto return_size(T t) -> size_c;
+template auto return_size(T t) -> size_c;
+}
+static_assert(decltype(g::return_size('a'))::value == 1u, "");
diff --git a/gcc/testsuite/g++.dg/template/canon-type-16.C b/gcc/testsuite/g++.dg/template/canon-type-16.C
new file mode 100644
index 000..99361cbac30
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-16.C
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++11 } }
+template struct bool_c{ static constexpr bool value = u; };
+template auto noexcepty(T t) -> bool_c;
+template auto noexcepty(T t) -> bool_c;
+struct foo { void operator()() noexcept; };
+static_assert(decltype(noexcepty(foo{}))::value, "");
diff --git a/gcc/testsuite/g++.dg/template/canon-type-17.C b/gcc/testsuite/g++.dg/template/canon-type-17.C
new file mode 100644
index 000..0555c8d0a42
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-17.C
@@ -0,0 +1,5 @@
+// { dg-do compile { target c++11 } }
+template struct size_c{ static constexpr unsigned value = u; };
+template auto return_size(T... t) -> size_c;
+template auto return_size(T... t) -> size_c;

[PATCH v2 3/3] reassoc: Test rank biasing

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches

Add both positive and negative tests.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/reassoc-46.c: New test.
* gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests.
* gcc.dg/tree-ssa/reassoc-47.c: New test.
* gcc.dg/tree-ssa/reassoc-48.c: New test.
* gcc.dg/tree-ssa/reassoc-49.c: New test.
* gcc.dg/tree-ssa/reassoc-50.c: New test.
* gcc.dg/tree-ssa/reassoc-51.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |  7 +
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 
 7 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
new file mode 100644
index 000..69e02bc4d4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include "reassoc-46.h"
+
+/* Check that the loop accumulator is added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ 
_\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
new file mode 100644
index 000..e60b490ea0d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
@@ -0,0 +1,33 @@
+#define M 1024
+unsigned int arr1[M];
+unsigned int arr2[M];
+volatile unsigned int sink;
+
+unsigned int
+test (void)
+{
+  unsigned int sum = 0;
+  for (int i = 0; i < M; i++)
+{
+#ifdef MODIFY
+  /* Modify the loop accumulator using a chain of operations - this should
+ not affect its rank biasing.  */
+  sum |= 1;
+  sum ^= 2;
+#endif
+#ifdef STORE
+  /* Save the loop accumulator into a global variable - this should not
+ affect its rank biasing.  */
+  sink = sum;
+#endif
+#ifdef USE
+  /* Add a tricky use of the loop accumulator - this should prevent its
+ rank biasing.  */
+  i = (i + sum) % M;
+#endif
+  /* Use addends with different ranks.  */
+  sum += arr1[i];
+  sum += arr2[((i ^ 1) + 1) % M];
+}
+  return sum;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
new file mode 100644
index 000..84b51ccddb0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define MODIFY
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is saved into a global variable, it's
+   still added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ 
_\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
new file mode 100644
index 000..53ae8820281
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is modified using a chain of operations
+   other than addition, its new value is still added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ 
_\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
new file mode 100644
index 000..a6941d5ac2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define MODIFY
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is both modified using a chain of
+   operations other than addition and stored into a global variable, its new
+   value is still added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ 
_\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
new file mode 100644
index 000..68cd308c4f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2

[PATCH v2 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches

PR tree-optimization/49749 introduced code that shortens dependency
chains containing loop accumulators by placing them last on operand
lists of associative operations.

456.hmmer benchmark on s390 could benefit from this, however, the code
that needs it modifies loop accumulator before using it, and since only
so-called loop-carried phis are are treated as loop accumulators, the
code in the present form doesn't really help.   According to Bill
Schmidt - the original author - such a conservative approach was chosen
so as to avoid unnecessarily swapping operands, which might cause
unpredictable effects.  However, giving special treatment to forms of
loop accumulators is acceptable.

The definition of loop-carried phi is: it's a single-use phi, which is
used in the same innermost loop it's defined in, at least one argument
of which is defined in the same innermost loop as the phi itself.
Given this, it seems natural to treat single uses of such phis as phis
themselves.

gcc/ChangeLog:

* tree-ssa-reassoc.c (biased_names): New global.
(propagate_bias_p): New function.
(loop_carried_phi): Remove.
(propagate_rank): Propagate bias along single uses.
(get_rank): Update biased_names when needed.
---
 gcc/tree-ssa-reassoc.c | 97 --
 1 file changed, 64 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 420c14e8cf5..2f7a8882aac 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -211,6 +211,10 @@ static int64_t *bb_rank;
 /* Operand->rank hashtable.  */
 static hash_map *operand_rank;
 
+/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be
+   biased.  */
+static auto_bitmap biased_names;
+
 /* Vector of SSA_NAMEs on which after reassociate_bb is done with
all basic blocks the CFG should be adjusted - basic blocks
split right after that SSA_NAME's definition statement and before
@@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi)
the rank difference between two blocks.  */
 #define PHI_LOOP_BIAS (1 << 15)
 
+/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's
+   operands to the STMT's left-hand side.  The goal is to preserve bias in code
+   like this:
+
+ x_1 = phi(x_0, x_2)
+ a = x_1 | 1
+ b = a ^ 2
+ .MEM = b
+ c = b + d
+ x_2 = c + e
+
+   That is, we need to preserve bias along single-use chains originating from
+   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be
+   uses, because only they participate in rank propagation.  */
+static bool
+propagate_bias_p (gimple *stmt)
+{
+  use_operand_p use;
+  imm_use_iterator use_iter;
+  gimple *single_use_stmt = NULL;
+
+  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
+{
+  gimple *current_use_stmt = USE_STMT (use);
+
+  if (is_gimple_assign (current_use_stmt)
+ && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME)
+   {
+ if (single_use_stmt != NULL)
+   return false;
+ single_use_stmt = current_use_stmt;
+   }
+}
+
+  if (single_use_stmt == NULL)
+return false;
+
+  if (gimple_bb (stmt)->loop_father
+  != gimple_bb (single_use_stmt)->loop_father)
+return false;
+
+  return true;
+}
+
 /* Rank assigned to a phi statement.  If STMT is a loop-carried phi of
an innermost loop, and the phi has only a single use which is inside
the loop, then the rank is the block rank of the loop latch plus an
@@ -313,46 +361,23 @@ phi_rank (gimple *stmt)
   return bb_rank[bb->index];
 }
 
-/* If EXP is an SSA_NAME defined by a PHI statement that represents a
-   loop-carried dependence of an innermost loop, return TRUE; else
-   return FALSE.  */
-static bool
-loop_carried_phi (tree exp)
-{
-  gimple *phi_stmt;
-  int64_t block_rank;
-
-  if (TREE_CODE (exp) != SSA_NAME
-  || SSA_NAME_IS_DEFAULT_DEF (exp))
-return false;
-
-  phi_stmt = SSA_NAME_DEF_STMT (exp);
-
-  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
-return false;
-
-  /* Non-loop-carried phis have block rank.  Loop-carried phis have
- an additional bias added in.  If this phi doesn't have block rank,
- it's biased and should not be propagated.  */
-  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
-
-  if (phi_rank (phi_stmt) != block_rank)
-return true;
-
-  return false;
-}
-
 /* Return the maximum of RANK and the rank that should be propagated
from expression OP.  For most operands, this is just the rank of OP.
For loop-carried phis, the value is zero to avoid undoing the bias
in favor of the phi.  */
 static int64_t
-propagate_rank (int64_t rank, tree op)
+propagate_rank (int64_t rank, tree op, gimple *stmt, bool *bias_p)
 {
   int64_t op_rank;
 
-  if (loop_carried_phi (op))
-return rank;
+  if (TREE_CODE (op) == SSA_NAME
+  && bitmap_bit_p (biased_names, SSA_NAME_VERSION (op)))
+{
+

[PATCH v2 1/3] reassoc: Do not bias loop-carried PHIs early

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches

Biasing loop-carried PHIs during the 1st reassociation pass interferes
with reduction chains and does not bring measurable benefits, so do it
only during the 2nd reassociation pass.

gcc/ChangeLog:

* passes.def (pass_reassoc): Rename parameter to early_p.
* tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
New variable.
(phi_rank): Don't bias loop-carried phi ranks
before vectorization pass.
(execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
(pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
initializer.
(pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p
value.
(pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
execute_reassoc.
(pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
---
 gcc/passes.def |  4 ++--
 gcc/tree-ssa-reassoc.c | 16 ++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index d7a1f8c97a6..c5f915d04c6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -242,7 +242,7 @@ along with GCC; see the file COPYING3.  If not see
   /* Identify paths that should never be executed in a conforming
 program and isolate those paths.  */
   NEXT_PASS (pass_isolate_erroneous_paths);
-  NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, true /* early_p */);
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt, false /* early_p */);
@@ -325,7 +325,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_lower_switch);
   NEXT_PASS (pass_cse_reciprocals);
-  NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, false /* early_p */);
   NEXT_PASS (pass_strength_reduction);
   NEXT_PASS (pass_split_paths);
   NEXT_PASS (pass_tracer);
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 8498cfc7aa8..420c14e8cf5 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -180,6 +180,10 @@ along with GCC; see the file COPYING3.  If not see
point 3a in the pass header comment.  */
 static bool reassoc_insert_powi_p;
 
+/* Enable biasing ranks of loop accumulators.  We don't want this before
+   vectorization, since it interferes with reduction chains.  */
+static bool reassoc_bias_loop_carried_phi_ranks_p;
+
 /* Statistics */
 static struct
 {
@@ -269,6 +273,9 @@ phi_rank (gimple *stmt)
   use_operand_p use;
   gimple *use_stmt;
 
+  if (!reassoc_bias_loop_carried_phi_ranks_p)
+return bb_rank[bb->index];
+
   /* We only care about real loops (those with a latch).  */
   if (!father->latch)
 return bb_rank[bb->index];
@@ -6940,9 +6947,10 @@ fini_reassoc (void)
optimization of a gimple conditional.  Otherwise returns zero.  */
 
 static unsigned int
-execute_reassoc (bool insert_powi_p)
+execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p)
 {
   reassoc_insert_powi_p = insert_powi_p;
+  reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p;
 
   init_reassoc ();
 
@@ -6983,15 +6991,19 @@ public:
 {
   gcc_assert (n == 0);
   insert_powi_p = param;
+  bias_loop_carried_phi_ranks_p = !param;
 }
   virtual bool gate (function *) { return flag_tree_reassoc != 0; }
   virtual unsigned int execute (function *)
-{ return execute_reassoc (insert_powi_p); }
+  {
+return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p);
+  }
 
  private:
   /* Enable insertion of __builtin_powi calls during execute_reassoc.  See
  point 3a in the pass header comment.  */
   bool insert_powi_p;
+  bool bias_loop_carried_phi_ranks_p;
 }; // class pass_reassoc
 
 } // anon namespace
-- 
2.31.1

[PATCH v2 0/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches

This is an update to my very old patch with the review comments
addressed.  Bootstrapped and regtested x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548785.html
Changes in v2:
* Disable PHI biasing in the early pass instance in a separate patch.
* Replace s390-specific tests with the generic tree-ssa ones.
* Replace the fragile (op_rank & PHI_LOOP_BIAS) test with auto_bitmap
  biased_names.  The review suggestion was to rather check whether op
  is defined by a loop-carried phi, but this would allow detecting only
  single assingments, and not assignment chains.  Another alternative
  that would make the check less fragile was to use saturating addition
  in order to prevent overflows into the PHI_LOOP_BIAS bit, but
  auto_bitmap of SSA_NAMEs allows graceful processing of large basic
  blocks, and its memory overhead looks acceptable.
* Restructure the code to make it a bit more readable.  The overall
  logic is the same as in v1.  I considered implementing an idea from
  [1], more specifically, detecting single-use chains in
  is_phi_for_stmt() so that swap_ops_for_binary_stmt() shifts the
  corresponding operand towards the end.  These two functions actually
  seem to serve a very related purpose.  However, for single-use chain
  detection we would still need to recursively traverse
  SSA_NAME_DEF_STMTs of operands, which propagate_rank() and friends
  already do.  So this would not have resulted in a significant code
  simplification.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549149.html

Ilya Leoshkevich (3):
  reassoc: Do not bias loop-carried PHIs early
  reassoc: Propagate PHI_LOOP_BIAS along single uses
  reassoc: Test rank biasing

 gcc/passes.def |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |   7 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h |  33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c |  11 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c |  10 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c |  11 ++
 gcc/tree-ssa-reassoc.c | 113 ++---
 9 files changed, 170 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

-- 
2.31.1

Re: Merge from trunk to gccgo branch

2021-09-21 Thread Ian Lance Taylor via Gcc-patches

I merged trunk revision 09e18d113b3c3dae896ac1a8ad1e0087adbb153b to
the gccgo branch.

Ian

[PATCH] rs6000: Add psabi diagnostic for C++ zero-width bit field ABI change (PR102024)

2021-09-21 Thread Bill Schmidt via Gcc-patches


Hi!

Previously zero-width bit fields were removed from structs, so that otherwise
homogeneous aggregates were treated as such and passed in FPRs and VSRs.
This was incorrect behavior per the ELFv2 ABI.  Now that these fields are no
longer being removed, we generate the correct parameter passing code.  Alert
the unwary user in the rare cases where this behavior changes.

As noted in the PR, once the GCC 12 Changes page has text describing this issue,
we can update the diagnostic message to reference that URL.  I'll handle that
in a follow-up patch.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
Is this okay for trunk?

Thanks!
Bill


2021-09-21  Bill Schmidt  

gcc/
PR target/102024
* config/rs6000/rs6000-call.c (rs6000_aggregate_candidate): Detect
zero-width bit fields and return indicator.
(rs6000_discover_homogeneous_aggregate): Diagnose when the
presence of a zero-width bit field changes parameter passing in
GCC 12.

gcc/testsuite/
PR target/102024
* g++.target/powerpc/pr102024.C: New.
---
 gcc/config/rs6000/rs6000-call.c | 39 ++---
 gcc/testsuite/g++.target/powerpc/pr102024.C | 23 
 2 files changed, 57 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr102024.C

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 7d485480225..c02b202b0cd 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -6227,7 +6227,7 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 
 static int

 rs6000_aggregate_candidate (const_tree type, machine_mode *modep,
-   int *empty_base_seen)
+   int *empty_base_seen, int *zero_width_bf_seen)
 {
   machine_mode mode;
   HOST_WIDE_INT size;
@@ -6298,7 +6298,8 @@ rs6000_aggregate_candidate (const_tree type, machine_mode 
*modep,
  return -1;
 
 	count = rs6000_aggregate_candidate (TREE_TYPE (type), modep,

-   empty_base_seen);
+   empty_base_seen,
+   zero_width_bf_seen);
if (count == -1
|| !index
|| !TYPE_MAX_VALUE (index)
@@ -6336,6 +6337,12 @@ rs6000_aggregate_candidate (const_tree type, 
machine_mode *modep,
if (TREE_CODE (field) != FIELD_DECL)
  continue;
 
+	if (DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field))

+ {
+   *zero_width_bf_seen = 1;
+   continue;
+ }
+
if (DECL_FIELD_ABI_IGNORED (field))
  {
if (lookup_attribute ("no_unique_address",
@@ -6347,7 +6354,8 @@ rs6000_aggregate_candidate (const_tree type, machine_mode 
*modep,
  }
 
 	sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep,

-   empty_base_seen);
+   empty_base_seen,
+   zero_width_bf_seen);
if (sub_count < 0)
  return -1;
count += sub_count;
@@ -6381,7 +6389,8 @@ rs6000_aggregate_candidate (const_tree type, machine_mode 
*modep,
  continue;
 
 	sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep,

-   empty_base_seen);
+   empty_base_seen,
+   zero_width_bf_seen);
if (sub_count < 0)
  return -1;
count = count > sub_count ? count : sub_count;
@@ -6423,8 +6432,10 @@ rs6000_discover_homogeneous_aggregate (machine_mode 
mode, const_tree type,
 {
   machine_mode field_mode = VOIDmode;
   int empty_base_seen = 0;
+  int zero_width_bf_seen = 0;
   int field_count = rs6000_aggregate_candidate (type, _mode,
-   _base_seen);
+   _base_seen,
+   _width_bf_seen);
 
   if (field_count > 0)

{
@@ -6460,6 +6471,24 @@ rs6000_discover_homogeneous_aggregate (machine_mode 
mode, const_tree type,
  last_reported_type_uid = uid;
}
}
+ if (zero_width_bf_seen && warn_psabi)
+   {
+ static unsigned last_reported_type_uid;
+ unsigned uid = TYPE_UID (TYPE_MAIN_VARIANT (type));
+ if (uid != last_reported_type_uid)
+   {
+ inform (input_location,
+ "parameter passing for an argument containing "
+ "zero-width bit fields but that is otherwise a "
+

[PATCH] Objective-C: fix class_ro layout for non-LP64

2021-09-21 Thread Matt Jacobson via Gcc-patches

Fix class_ro layout for non-LP64.  On LP64, the requisite padding is added at a
lower level.  For non-LP64, this fixes binary compatibility with clang-built
classes/runtimes.

Tested by examining the generated assembly for a class_ro in both cases (and in 
the case of clang), for both x86_64 (64-bit pointers) and AVR (16-bit pointers).
Tested by running a program on AVR with a GCC-built class using a clang-built 
Objective-C runtime.  Tested by running a program on x86_64/Darwin with a GCC-
built class and the clang-built system Objective-C runtime.

Patch also available at:


I don't have commit access, so if this patch is suitable, I'd need someone else 
to commit it for me.  Thanks.


gcc/objc/ChangeLog:

2021-09-21  Matt Jacobson  

* objc-next-runtime-abi-02.c (struct class_ro_t): Remove explicit 
alignment 
padding.
(build_v2_class_templates): Remove explicit alignment padding.
(build_v2_class_ro_t_initializer): Adjust initializer.


diff --git a/gcc/objc/objc-next-runtime-abi-02.c 
b/gcc/objc/objc-next-runtime-abi-02.c
index 42645e22316..c3af369ff0d 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -632,9 +632,7 @@ struct class_ro_t
 uint32_t const flags;
 uint32_t const instanceStart;
 uint32_t const instanceSize;
-#ifdef __LP64__
-uint32_t const reserved;
-#endif
+// [32 bits of reserved space here on LP64 platforms]
 const uint8_t * const ivarLayout;
 const char *const name;
 const struct method_list_t * const baseMethods;
@@ -677,11 +675,6 @@ build_v2_class_templates (void)
   /* uint32_t const instanceSize; */
   add_field_decl (integer_type_node, "instanceSize", );
 
-  /* This ABI is currently only used on m64 NeXT.  We always
- explicitly declare the alignment padding.  */
-  /* uint32_t const reserved; */
-  add_field_decl (integer_type_node, "reserved", );
-
   /* const uint8_t * const ivarLayout; */
   cnst_strg_type = build_pointer_type (unsigned_char_type_node);
   add_field_decl (cnst_strg_type, "ivarLayout", );
@@ -3225,12 +3218,6 @@ build_v2_class_ro_t_initializer (tree type, tree name,
   CONSTRUCTOR_APPEND_ELT (initlist, NULL_TREE,
  build_int_cst (integer_type_node, instanceSize));
 
-  /* This ABI is currently only used on m64 NeXT.  We always
- explicitly declare the alignment padding.  */
-  /* reserved, pads alignment.  */
-  CONSTRUCTOR_APPEND_ELT (initlist, NULL_TREE,
-   build_int_cst (integer_type_node, 0));
-
   /* ivarLayout */
   unsigned_char_star = build_pointer_type (unsigned_char_type_node);
   if (ivarLayout)

Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Marc Glisse


On Tue, 21 Sep 2021, Jason Merrill via Gcc-patches wrote:


On Tue, Sep 21, 2021 at 4:30 PM Joseph Myers 
wrote:


On Tue, 21 Sep 2021, Jakub Jelinek via Gcc-patches wrote:


On Tue, Sep 21, 2021 at 02:15:59PM +0100, Roger Sayle wrote:

Can you double check?  Integer division by zero is undefined, but

isn't floating point

division by zero defined by the appropriate IEEE standards?


https://eel.is/c++draft/expr.mul#4 doesn't make the division by zero
behavior conditional on integral types.
C has similar wording.


The position for C is that Annex F semantics take precedence over all the
ways in which floating-point arithmetic has undefined behavior in the main
body of the standard.  So floating-point overflow and division by zero are
fully defined in the presence of Annex F support, while out-of-range
conversions from floating point to integer types produce an unspecified
value (not necessarily the same unspecified value for different executions
of the conversion in the abstract machine - as discussed in bug 93806, GCC
can get that wrong and act as if a single execution of such a conversion
in the abstract machine produces more than one result).

In C, as specified in Annex F, initializers for floating-point objects
with static or thread storage duration are evaluated with exceptions
discarded and the default rounding mode in effect; 7.0 / 0.0 is a fully
valid initializer for such an object to initialize it to positive
infinity.  As I understand it, the question for this thread is whether C++
constexpr should have a similar rule to C static initializers (which ought
to apply to 1.0 / 3.0, raising inexact, just as much as to 7.0 / 0.0).



The C rules seem to be

F.8.2 Translation
During translation the IEC 60559 default modes are in effect:
— The rounding direction mode is rounding to nearest.
— The rounding precision mode (if supported) is set so that results are
not shortened.
— Trapping or stopping (if supported) is disabled on all floating-point
exceptions.
Recommended practice:
The implementation should produce a diagnostic message for each
translation-time floating-point exception, other than “inexact”; the
implementation should then proceed with the translation of the program.

I think following the same rules for C++ would be appropriate in a


I agree that looking at the C standard is more interesting, C++ is very 
bad at specifying anything float related.



diagnosing context: warn and continue.  In a template argument deduction
(SFINAE) context, where errors become silent substitution failures, it's
probably better to treat them as non-constant.


I am trying to imagine a sfinae example affected by whether 1./0. is 
constant. Does that mean A<0.,__builtin_inf()> would fail to use the 
specialization in


templatestruct A{};
templatestruct A{};

? I don't like that, I believe it should use the specialization. With 
ieee754, 1./0. is perfectly well defined as +inf, the only question is 
whether it should also set a flag at runtime, which is not relevant in a 
manifestly consteval context (fetestexcept, etc are not constexpr, that 
should be enough to catch mistakes). If some user wants to forbid 
FE_DIVBYZERO, FE_INEXACT, FE_INVALID, FE_OVERFLOW or FE_UNDERFLOW in 
compile-time operations, that looks like it could be part of a separate 
compiler flag or pragma, like C's FENV_ROUND can affect the rounding mode 
in static initializers (of course C++ templates make pragmas less 
convenient).


--
Marc Glisse

Re: [PATCH] warn for more impossible null pointer tests [PR102103]

2021-09-21 Thread Jason Merrill via Gcc-patches


On 9/17/21 12:02, Martin Sebor wrote:

On 9/8/21 2:06 PM, Jason Merrill wrote:

On 9/2/21 7:53 PM, Martin Sebor wrote:
@@ -4622,28 +4622,94 @@ warn_for_null_address (location_t location, 
tree op, tsubst_flags_t complain)

    if (!warn_address
    || (complain & tf_warning) == 0
    || c_inhibit_evaluation_warnings != 0
-  || warning_suppressed_p (op, OPT_Waddress))
+  || warning_suppressed_p (op, OPT_Waddress)
+  || processing_template_decl != 0)


Completely suppressing this warning in templates seems like a 
regression;  I'd think we could recognize many relevant cases before 
instantiation.  You just can't assume that ADDR_EXPR has the default 
meaning if it has unknown type (i.e. because op0 is type-dependent).


I added the suppression to keep g++.dg/warn/pr101219.C from failing
but in hindsight I should have questioned the reasoning behind
the "no warning emitted here (no instantiation)" comment in the test.

I agree that it would be helpful to diagnose the type-independent
subset of the problem even in uninstantiated templates.  Current
trunk doesn't (it never has), but with my patch and the suppression
above removed it does.  I've updated the tests to expect it.

Please see the attached revision.

Martin

PS There are still more opportunities to issue -Waddress in templates
that this patch doesn't handle, e.g.,:

   template  bool f (T *p) { return  == 0; }

Handling this will take more surgery.

PPS It seems that most other warnings (and even some errors, like
-Wnarrowing) are suppressed in uninstantiated templates as well,
even for non-dependent expressions.  In the few test cases I looked
at Clang does better.  It sounds like you'd like to see improvements
in this direction not just for -Waddress but in general.  Just for
the avoidance of doubt, can you confirm that for future reference?


Yes, in general it's better to diagnose sooner.


+  if (TREE_CODE (cop) == NON_LVALUE_EXPR)
+/* Unwrap the expression for C++ 98.  */
+cop = TREE_OPERAND (cop, 0);


What does this have to do with C++98?


+  if (TREE_CODE (cop) == PTRMEM_CST)
+{
+  /* The address of a nonstatic data member is never null.  */
+  warning_at (location, OPT_Waddress,
+ "the address %qE will never be NULL",


Capitalizing NULL when talking about pointers-to-members seems a bit 
odd, but I guess it's fine.


The C++ changes are OK.

Jason

libgo patch committed: set runtime.GOROOT at build time

2021-09-21 Thread Ian Lance Taylor via Gcc-patches

This patch changes libto to set runtime.GOROOT value at build time.
In Go 1.17 the gc toolchain changed to set runtime.GOROOT in cmd/link
(previously it was runtime/internal/sys.GOROOT).  This patch does the
same in libgo.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian

gotools:
* Makefile.am (check-runtime): Add goroot.go to --extrafiles.
* Makefile.in: Regenerate.
9a5c82b7c1cd935c4a35d6dd532a2aa3356d6785
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index e2abd5fc4b7..edfbe46d8f4 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-850235e4b974b9c5c2d7a1f9860583bd07f2a45c
+e3bfc0889237a5bb8aa7ae30e1cff14f90a5f941
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gotools/Makefile.am b/gotools/Makefile.am
index 6576fe77b85..199899b9ef0 100644
--- a/gotools/Makefile.am
+++ b/gotools/Makefile.am
@@ -245,14 +245,14 @@ check-runtime: go$(EXEEXT) $(noinst_PROGRAMS) check-head 
check-gccgo check-gcc
export LD_LIBRARY_PATH; \
GOARCH=`$(abs_builddir)/go$(EXEEXT) env GOARCH`; \
GOOS=`$(abs_builddir)/go$(EXEEXT) env GOOS`; \
-   files=`$(SHELL) $(libgosrcdir)/../match.sh --goarch=$${GOARCH} 
--goos=$${GOOS} --srcdir=$(libgosrcdir)/runtime 
--extrafiles="$(libgodir)/runtime_linknames.go $(libgodir)/runtime_sysinfo.go 
$(libgodir)/sigtab.go" --tag=libffi`; \
+   files=`$(SHELL) $(libgosrcdir)/../match.sh --goarch=$${GOARCH} 
--goos=$${GOOS} --srcdir=$(libgosrcdir)/runtime 
--extrafiles="$(libgodir)/runtime_linknames.go $(libgodir)/runtime_sysinfo.go 
$(libgodir)/sigtab.go $(libgodir)/goroot.go" --tag=libffi`; \
echo "$(ECHO_ENV) GC='$(abs_builddir)/check-gccgo 
-fgo-compiling-runtime' GOARCH=$${GOARCH} GOOS=$${GOOS} $(SHELL) 
$(libgosrcdir)/../testsuite/gotest --goarch=$${GOARCH} --goos=$${GOOS} 
--basedir=$(libgosrcdir)/.. --srcdir=$(libgosrcdir)/runtime --pkgpath=runtime 
--pkgfiles='$${files}' $(GOTESTFLAGS) -test.timeout=$(GOTOOLS_TEST_TIMEOUT)s 
-test.v" > runtime-testlog
$(CHECK_ENV) \
GC="$${GCCGO} -fgo-compiling-runtime"; \
export GC; \
GOARCH=`$(abs_builddir)/go$(EXEEXT) env GOARCH`; \
GOOS=`$(abs_builddir)/go$(EXEEXT) env GOOS`; \
-   files=`$(SHELL) $(libgosrcdir)/../match.sh --goarch=$${GOARCH} 
--goos=$${GOOS} --srcdir=$(libgosrcdir)/runtime 
--extrafiles="$(libgodir)/runtime_linknames.go $(libgodir)/runtime_sysinfo.go 
$(libgodir)/sigtab.go" --tag=libffi`; \
+   files=`$(SHELL) $(libgosrcdir)/../match.sh --goarch=$${GOARCH} 
--goos=$${GOOS} --srcdir=$(libgosrcdir)/runtime 
--extrafiles="$(libgodir)/runtime_linknames.go $(libgodir)/runtime_sysinfo.go 
$(libgodir)/sigtab.go $(libgodir)/goroot.go" --tag=libffi`; \
$(SHELL) $(libgosrcdir)/../testsuite/gotest --goarch=$${GOARCH} 
--goos=$${GOOS} --basedir=$(libgosrcdir)/.. --srcdir=$(libgosrcdir)/runtime 
--pkgpath=runtime --pkgfiles="$${files}" $(GOTESTFLAGS) 
-test.timeout=$(GOTOOLS_TEST_TIMEOUT)s -test.v >> runtime-testlog 2>&1 || echo 
"--- $${fl}: go test runtime (0.00s)" >> runtime-testlog
grep '^--- ' runtime-testlog | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' | 
sort -k 2
 
diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index 92fedcf6eb8..5c377a30df9 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -545,6 +545,14 @@ s-gcpu: Makefile
$(SHELL) $(srcdir)/mvifdiff.sh gcpugen.go.tmp gcpugen.go
$(STAMP) $@
 
+goroot.go: s-goroot; @true
+s-goroot: Makefile
+   rm -f goroot.go.tmp
+   echo "package runtime" > goroot.go.tmp
+   echo 'var defaultGOROOT = `$(prefix)`' >> goroot.go.tmp
+   $(SHELL) $(srcdir)/mvifdiff.sh goroot.go.tmp goroot.go
+   $(STAMP) $@
+
 buildcfg.go: s-buildcfg; @true
 s-buildcfg: Makefile
rm -f buildcfg.go.tmp
@@ -1005,7 +1013,8 @@ math_lo_GOCFLAGS = $(MATH_FLAG)
 math_check_GOCFLAGS = $(MATH_FLAG)
 
 # Add generated files to the runtime package.
-extra_go_files_runtime = runtime_linknames.go runtime_sysinfo.go sigtab.go
+extra_go_files_runtime = \
+   runtime_linknames.go runtime_sysinfo.go sigtab.go goroot.go
 runtime.lo.dep: $(extra_go_files_runtime)
 
 # Add generated files to the syscall package.
diff --git a/libgo/go/runtime/extern.go b/libgo/go/runtime/extern.go
index 0d7f3577913..6bd612fcf32 100644
--- a/libgo/go/runtime/extern.go
+++ b/libgo/go/runtime/extern.go
@@ -211,8 +211,6 @@ func Caller(skip int) (pc uintptr, file string, line int, 
ok bool)
 // program counter adjustment.
 func Callers(skip int, pc []uintptr) int
 
-var defaultGOROOT string // set by cmd/link
-
 // GOROOT returns the root of the Go tree. It uses the
 // GOROOT environment variable, if set at process start,
 // or else the root used during the Go build.

Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Jason Merrill via Gcc-patches

On Tue, Sep 21, 2021 at 4:30 PM Joseph Myers 
wrote:

> On Tue, 21 Sep 2021, Jakub Jelinek via Gcc-patches wrote:
>
> > On Tue, Sep 21, 2021 at 02:15:59PM +0100, Roger Sayle wrote:
> > > Can you double check?  Integer division by zero is undefined, but
> isn't floating point
> > > division by zero defined by the appropriate IEEE standards?
> >
> > https://eel.is/c++draft/expr.mul#4 doesn't make the division by zero
> > behavior conditional on integral types.
> > C has similar wording.
>
> The position for C is that Annex F semantics take precedence over all the
> ways in which floating-point arithmetic has undefined behavior in the main
> body of the standard.  So floating-point overflow and division by zero are
> fully defined in the presence of Annex F support, while out-of-range
> conversions from floating point to integer types produce an unspecified
> value (not necessarily the same unspecified value for different executions
> of the conversion in the abstract machine - as discussed in bug 93806, GCC
> can get that wrong and act as if a single execution of such a conversion
> in the abstract machine produces more than one result).
>
> In C, as specified in Annex F, initializers for floating-point objects
> with static or thread storage duration are evaluated with exceptions
> discarded and the default rounding mode in effect; 7.0 / 0.0 is a fully
> valid initializer for such an object to initialize it to positive
> infinity.  As I understand it, the question for this thread is whether C++
> constexpr should have a similar rule to C static initializers (which ought
> to apply to 1.0 / 3.0, raising inexact, just as much as to 7.0 / 0.0).
>

The C rules seem to be

F.8.2 Translation
During translation the IEC 60559 default modes are in effect:
 — The rounding direction mode is rounding to nearest.
 — The rounding precision mode (if supported) is set so that results are
not shortened.
 — Trapping or stopping (if supported) is disabled on all floating-point
exceptions.
Recommended practice:
The implementation should produce a diagnostic message for each
translation-time floating-point exception, other than “inexact”; the
implementation should then proceed with the translation of the program.

I think following the same rules for C++ would be appropriate in a
diagnosing context: warn and continue.  In a template argument deduction
(SFINAE) context, where errors become silent substitution failures, it's
probably better to treat them as non-constant.

Jason

Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Joseph Myers

On Tue, 21 Sep 2021, Jakub Jelinek via Gcc-patches wrote:

> On Tue, Sep 21, 2021 at 02:15:59PM +0100, Roger Sayle wrote:
> > Can you double check?  Integer division by zero is undefined, but isn't 
> > floating point
> > division by zero defined by the appropriate IEEE standards?
> 
> https://eel.is/c++draft/expr.mul#4 doesn't make the division by zero
> behavior conditional on integral types.
> C has similar wording.

The position for C is that Annex F semantics take precedence over all the 
ways in which floating-point arithmetic has undefined behavior in the main 
body of the standard.  So floating-point overflow and division by zero are 
fully defined in the presence of Annex F support, while out-of-range 
conversions from floating point to integer types produce an unspecified 
value (not necessarily the same unspecified value for different executions 
of the conversion in the abstract machine - as discussed in bug 93806, GCC 
can get that wrong and act as if a single execution of such a conversion 
in the abstract machine produces more than one result).

In C, as specified in Annex F, initializers for floating-point objects 
with static or thread storage duration are evaluated with exceptions 
discarded and the default rounding mode in effect; 7.0 / 0.0 is a fully 
valid initializer for such an object to initialize it to positive 
infinity.  As I understand it, the question for this thread is whether C++ 
constexpr should have a similar rule to C static initializers (which ought 
to apply to 1.0 / 3.0, raising inexact, just as much as to 7.0 / 0.0).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH V3 0/6] Initial support for AVX512FP16

2021-09-21 Thread Iain Sandoe

Hello Joseph,

> On 21 Sep 2021, at 21:11, Joseph Myers  wrote:
> 
> On Fri, 3 Sep 2021, Iain Sandoe wrote:
> 
>> given that:
>> 
>> a) this fixes Darwin x86-64 bootstrap which has been broken for more than 24h
>> b) the patch is now Darwin-local.
> 
> Actually, it's not Darwin-local.  It uses __MACH__, which is also defined 
> for Hurd.  And because sfp-machine.h gets included in files that aren't 
> specific to HFmode (and so aren't built with explicit -msse2), the build 
> for i686-gnu fails with:
> 
> In file included from 
> /scratch/jmyers/glibc-bot/src/gcc/libgcc/config/i386/sfp-exceptions.c:25:
> /scratch/jmyers/glibc-bot/src/gcc/libgcc/config/i386/sfp-machine.h:83:1: 
> error: unable to emulate 'HF'
>   83 | typedef float alias_HFtype __attribute__ ((mode (HF)));
>  | ^~~
> 
> I think some conditional that is genuinely Darwin-specific should be used, 
> so that Hurd keeps using normal ELF aliases and doesn't get these HFmode 
> references in sfp-machine.h at all.

Sorry about this, (I usually use __APPLE__ as the flag, but the __MACH__ was 
already there
in this case, I think).

I’ll fix this by s/__MACH__/__APPLE__/ 
since in this case we definitely mean Mach-O rather than anything to do with 
the micro-kernel...
(tomorrow now).

Note, that I have a suspicion that there are (maybe a small number of) other 
places in the code where __MACH__ has been taken to mean mach-o is in use.

thanks
Iain

Re: [PATCH V3 0/6] Initial support for AVX512FP16

2021-09-21 Thread Joseph Myers

On Fri, 3 Sep 2021, Iain Sandoe wrote:

> given that:
> 
> a) this fixes Darwin x86-64 bootstrap which has been broken for more than 24h
> b) the patch is now Darwin-local.

Actually, it's not Darwin-local.  It uses __MACH__, which is also defined 
for Hurd.  And because sfp-machine.h gets included in files that aren't 
specific to HFmode (and so aren't built with explicit -msse2), the build 
for i686-gnu fails with:

In file included from 
/scratch/jmyers/glibc-bot/src/gcc/libgcc/config/i386/sfp-exceptions.c:25:
/scratch/jmyers/glibc-bot/src/gcc/libgcc/config/i386/sfp-machine.h:83:1: error: 
unable to emulate 'HF'
   83 | typedef float alias_HFtype __attribute__ ((mode (HF)));
  | ^~~

I think some conditional that is genuinely Darwin-specific should be used, 
so that Hurd keeps using normal ELF aliases and doesn't get these HFmode 
references in sfp-machine.h at all.

-- 
Joseph S. Myers
jos...@codesourcery.com

[Patch] Fortran: Improve -Wmissing-include-dirs warnings [PR55534]

2021-09-21 Thread Tobias Burnus


While the previous patch fixed -Wno-missing-include-dirs and sorted
out some inconsistencies with libcpp warnings, it had two issues:

* Some superfluous warnings were printed, e.g. for
gfortran nonexisting/file.f90
  there was a warning about include path "nonexisting" not existing
  and twice the error that the "nonexisting/file.f90" could not be
  read.

* At least as invoked when build GCC or when running the GCC testsuite,
  the passed -B -isystem etc. arguments lead to proper but pointless
  diagnostic about 'finclude' or other directories not being found,
  causing excess-error FAILS and -Werror build fails.

While the latter could be fixed by adding -Wno-missing-include-dirs,
it still felt like the wrong approach.

While the testsuite does run for me, others reported that they do
see missing-include-dirs warnings. Instead of adding a bunch of
-Wno-missing-include-dirs to the test config, I now only warn for
-I and -J by default (similar to previous state) and only do a full
warnings when the user requested passes the -Wmissing-include-dirs
explicitly. The Fortran behavior is now also properly documented
in the manual.

In order to handle the silencing of the diagnostic and to avoid
double output via the Fortran code and libcpp (or rather: gcc/incpath.c),
I had to add some not that clean and obvious diagnostic flags.
I hope they still make sense and are somewhat readable.

OK? Comments?

Tobias

PS: There is also some inconsistency whether fprintf stderr and
gfc_error is used. All calls in load_file could be fatal errors
as all exist with an error - and similar issues get different
error messages for no good reason. I have not tried to solve
this issue – but can if deemed reasonable as follow-up patch.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Fortran: Improve -Wmissing-include-dirs warnings [PR55534]

It turned out that enabling the -Wmissing-include-dirs for libcpp did output
too many warnings – at least as run with -B and similar options during the
GCC build and warning for internal include dirs like finclude, unlikely of
relevance to for a real-world user.
This patch now only warns for -I and -J by default but permits to get the
full warnings including libcpp ones with -Wmissing-include-dirs. It
additionally documents this in the manual.

With that change, the -Wno-missing-include-dirs could be removed
from libgfortran's configure and libgomp's testsuite always cflags.
This reverts those bits of the previous
commit r12-3722-g417ea5c02cef7f000e66d1af22b066c2c1cda047

Additionally, it turned out that all call to load_file called exit
explicitly - except for the main file via gfc_init -> gfc_new_file. The
latter also output a file not existing fatal error, such that two errors
where printed. Now exit is called in line with the other users of
load_file.

Finally, when compileing with "nonexisting/file.f90", first a warning that
"nonexisting" does not exist as include path was printed before the file
not found error was printed. Now the directory in which the physical file
is located is added silently, relying on the file-not-found diagnostic for
those.

	PR fortran/55534
gcc/ChangeLog:

	* doc/invoke.texi (-Wno-missing-include-dirs.): Document Fortran
	behavior.

gcc/fortran/ChangeLog:

	* cpp.c (gfc_cpp_register_include_paths, gfc_cpp_post_options):
	Add new bool verbose_missing_dir_warn argument.
	* cpp.h (gfc_cpp_post_options): Update prototype.
	* f95-lang.c (gfc_init): Remove duplicated file-not found diag.
	* gfortran.h (gfc_check_include_dirs): Takes bool
	verbose_missing_dir_warn arg.
	(gfc_new_file): Returns now void.
	* options.c (gfc_post_options): Update to warn for -I and -J,
	only, by default but for all when user requested.
	* scanner.c (gfc_do_check_include_dir):
	(gfc_do_check_include_dirs, gfc_check_include_dirs): Take bool
	verbose warn arg and update to avoid printing the same message
	twice or never.
	(load_file): Fix indent.
	(gfc_new_file): Return void and exit when load_file failed
	as all other load_file users do.

libgfortran/ChangeLog:

	* configure.ac (AM_FCFLAGS): Revert r12-3722 by removing
	-Wno-missing-include-dirs.
	* configure: Regenerate.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/fortran.exp (ALWAYS_CFLAGS): Revert
	r12-3722 by removing -Wno-missing-include-dirs.
	* testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/include_14.f90: Add -J testcase and update dg-output.
	* gfortran.dg/include_15.f90: Likewise.
	* gfortran.dg/include_16.f90: Likewise.
	* gfortran.dg/include_17.f90: Likewise.
	* gfortran.dg/include_18.f90: Likewise.
	* gfortran.dg/include_19.f90: Likewise.

 gcc/doc/invoke.texi|  6 +++--
 gcc/fortran/cpp.c

[PATCH][testsuite][aarch64]: Fix gcc.target/aarch64/auto-init-* tests.

2021-09-21 Thread Qing Zhao via Gcc-patches

Hi, 

This is the patch to fix gcc.target/aarch64/auto-init-* tests.

I have tested the change on aarch64-linux with 

make check-gcc 
RUNTESTFLAGS='--target_board=unix\{-mabi=lp64,-mabi=ilp32,-mabi=lp64/-fstack-clash-protection/-fstack-protector-all,-mabi=ilp32/-fstack-clash-protection/-fstack-protector-all,-mabi=lp64/-march=armv8-a,-mabi=ilp32/-march=armv8.2-a,-mabi=lp64/-march=armv8.4-a,-mabi=ilp32/-march=armv8.6-a,-mabi=lp64/-march=armv8-r\}
 aarch64.exp=auto-init*'

Everything works fine.

Okay for commit?

Thanks.

Qing

==



>From c46888eed5621df842178a85adf7e221c7e00b48 Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Tue, 21 Sep 2021 12:05:32 -0700
Subject: [PATCH] testsuite: Fix gcc.target/aarch64/auto-init-* tests.

Add -fno-stack-protector for two testing cases and also different
pattern match for lp64 and ilp32 for the other two cases.

gcc/testsuite/ChangeLog:

2021-09-21  qing zhao  

* gcc.target/aarch64/auto-init-1.c: Add -fno-stack-protector.
* gcc.target/aarch64/auto-init-7.c: Likewise.
* gcc.target/aarch64/auto-init-2.c: Different pattern match for
lp64 and ilp32.
* gcc.target/aarch64/auto-init-padding-5.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/auto-init-1.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/auto-init-2.c | 3 ++-
 gcc/testsuite/gcc.target/aarch64/auto-init-7.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c | 3 ++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c 
b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
index 0fa4708..a38d91b 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
@@ -1,6 +1,6 @@
 /* Verify zero initialization for integer and pointer type automatic 
variables.  */
 /* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand" } */
+/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand 
-fno-stack-protector" } */
 
 #ifndef __cplusplus
 # define bool _Bool
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-2.c 
b/gcc/testsuite/gcc.target/aarch64/auto-init-2.c
index 2c54e6d..136dbf6 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-2.c
@@ -32,4 +32,5 @@ void foo()
 /* { dg-final { scan-rtl-dump-times "0xfe\\\]" 1 "expand" } } */
 /* { dg-final { scan-rtl-dump-times "0xfefe" 1 "expand" } } */
 /* { dg-final { scan-rtl-dump-times "0xfefefefe" 2 "expand" } } */
-/* { dg-final { scan-rtl-dump-times "0xfefefefefefefefe" 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "0xfefefefefefefefe" 2 "expand" { target 
lp64 } } } */
+/* { dg-final { scan-rtl-dump-times "0xfefefefefefefefe" 1 "expand" { target 
ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-7.c 
b/gcc/testsuite/gcc.target/aarch64/auto-init-7.c
index ac27fbe..fde6e56 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-7.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-7.c
@@ -1,6 +1,6 @@
 /* Verify zero initialization for array, union, and structure type automatic 
variables.  */
 /* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand" } */
+/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand 
-fno-stack-protector" } */
 
 struct S
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c 
b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c
index 3c45a6c..7991367 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-5.c
@@ -17,6 +17,7 @@ int foo ()
   return var.four;
 }
 
-/* { dg-final { scan-assembler-times "stp\txzr, xzr," 2 } } */
+/* { dg-final { scan-assembler-times "stp\txzr, xzr," 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "stp\txzr, xzr," 1 { target ilp32 } } } */
 
 
-- 
1.9.1

[r12-3722 Regression] FAIL: gfortran.dg/write_to_null.F90 -Os (test for excess errors) on Linux/x86_64

2021-09-21 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

417ea5c02cef7f000e66d1af22b066c2c1cda047 is the first bad commit
commit 417ea5c02cef7f000e66d1af22b066c2c1cda047
Author: Tobias Burnus 
Date:   Tue Sep 21 08:27:00 2021 +0200

Fortran: Fix -Wno-missing-include-dirs handling [PR55534]

caused

FAIL: gfortran.dg/achar_6.F90   -O0  (test for excess errors)
FAIL: gfortran.dg/achar_6.F90   -O1  (test for excess errors)
FAIL: gfortran.dg/achar_6.F90   -O2  (test for excess errors)
FAIL: gfortran.dg/achar_6.F90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/achar_6.F90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/achar_6.F90   -Os  (test for excess errors)
FAIL: gfortran.dg/array_assignment_1.F90   -O0  (test for excess errors)
FAIL: gfortran.dg/array_assignment_1.F90   -O1  (test for excess errors)
FAIL: gfortran.dg/array_assignment_1.F90   -O2  (test for excess errors)
FAIL: gfortran.dg/array_assignment_1.F90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
FAIL: gfortran.dg/array_assignment_1.F90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/array_assignment_1.F90   -Os  (test for excess errors)
FAIL: gfortran.dg/associate_1.f03   -O0  (test for excess errors)
FAIL: gfortran.dg/associate_1.f03   -O1  (test for excess errors)
FAIL: gfortran.dg/associate_1.f03   -O2  (test for excess errors)
FAIL: gfortran.dg/associate_1.f03   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/associate_1.f03   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/associate_1.f03   -Os  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_1.F90   -O0  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_1.F90   -O1  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_1.F90   -O2  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_1.F90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
FAIL: gfortran.dg/bit_comparison_1.F90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_1.F90   -Os  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_2.F90   -O0  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_2.F90   -O1  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_2.F90   -O2  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_2.F90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
FAIL: gfortran.dg/bit_comparison_2.F90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/bit_comparison_2.F90   -Os  (test for excess errors)
FAIL: gfortran.dg/bom_UTF-8_F.F90   -O  (test for excess errors)
FAIL: gfortran.dg/dec_math.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/dec_math.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/dec_math.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/dec_math.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/dec_math.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/dec_math.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/dev_null.F90   -O0  (test for excess errors)
FAIL: gfortran.dg/dev_null.F90   -O1  (test for excess errors)
FAIL: gfortran.dg/dev_null.F90   -O2  (test for excess errors)
FAIL: gfortran.dg/dev_null.F90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/dev_null.F90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/dev_null.F90   -Os  (test for excess errors)
FAIL: gfortran.dg/diagnostic-format-json-1.F90   -O   dg-regexp 28 not found: 
""locations": [[{}, ]*]"
FAIL: gfortran.dg/diagnostic-format-json-1.F90   -O   dg-regexp 29 not found: 
""children": [[][]]"
FAIL: gfortran.dg/diagnostic-format-json-1.F90   -O  (test for excess errors)
FAIL: gfortran.dg/diagnostic-format-json-2.F90   -O   dg-regexp 30 not found: 
""locations": [[{}, ]*]"
FAIL: gfortran.dg/diagnostic-format-json-2.F90   -O   dg-regexp 31 not found: 
""children": [[][]]"
FAIL: gfortran.dg/diagnostic-format-json-2.F90   -O  (test for excess errors)
FAIL: gfortran.dg/diagnostic-format-json-3.F90   -O   dg-regexp 30 not found: 
""locations": [[{}, ]*]"
FAIL: gfortran.dg/diagnostic-format-json-3.F90   -O   dg-regexp 31 not found: 
""children": [[][]]"
FAIL: gfortran.dg/diagnostic-format-json-3.F90   -O  (test for excess errors)
FAIL: gfortran.dg/do_3.F90   -O0  (test for excess errors)
FAIL: gfortran.dg/do_3.F90   -O1  (test for excess errors)
FAIL: gfortran.dg/do_3.F90   -O2  (test for excess errors)
FAIL: gfortran.dg/do_3.F90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/do_3.F90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/do_3.F90   -Os  (test for excess errors)

Re: [HELP Needed!][PATCH] testsuite: Fix gcc.target/aarch64/auto-init-* tests.

2021-09-21 Thread Qing Zhao via Gcc-patches

By using a newer version of dejagnu on aarch64,  the options in the testing 
cases are appended AFTER the options in the RUNTESTFLAGS.
As a result, my patch to the aarch64/auto-init-* tests passed without issue.

Thanks a lot for your help.

Qing

> On Sep 20, 2021, at 9:36 AM, Richard Earnshaw  
> wrote:
> 
> 
> 
> On 20/09/2021 14:55, Qing Zhao wrote:
>>> On Sep 20, 2021, at 8:18 AM, Richard Earnshaw 
>>>  wrote:
>>> 
>>> 
>>> 
>>> On 20/09/2021 13:47, Qing Zhao wrote:
> On Sep 20, 2021, at 5:43 AM, Richard Earnshaw 
>  wrote:
> 
> 
> 
> On 17/09/2021 20:48, Qing Zhao via Gcc-patches wrote:
>> Hi,
>> There are much less issues with aarch64/auto-init-* test cases.
>> Different -march values (from ‘armv8-a’, ‘armv8.1-a’, till ‘armv8.6-a’, 
>> ‘armv8-r’) do not change the pattern match.
>> Only
>> 1. -mabi=ilp32/lp64 impact two of the testing cases “auto-init-2.c” and 
>> “auto-init-padding-5.c”.
>> 2. -fstack-clash-protection impact two of the testing cases 
>> “auto-init-1.c” and “auto-init-7.c”.
>> Naturally the patch for this set is:
>> A. Adjust the patterns for ilp32 or lp64 in “auto-init-2.c” and 
>> “auto-init-padding-5.c”;
>> B. Add -fno-stack-protector to “auto-init-1.c” and “auto-init-7.c”
>> The above A fixed issue 1, however, the above B did not fix issue 2.
>> If I fixed “auto-init-1.c” as:
>> diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c 
>> b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
>> index 0fa4708..a38d91b 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
>> @@ -1,6 +1,6 @@
>>  /* Verify zero initialization for integer and pointer type automatic 
>> variables.  */
>>  /* { dg-do compile } */
>> -/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand" } */
>> +/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand 
>> -fno-stack-protector" } */
>>#ifndef __cplusplus
>>  # define bool _Bool
>> So, I took a look at the log file of the testing, and found that, If I 
>> tested it as:
>>  make check-gcc 
>> RUNTESTFLAGS='--target_board=unix\{-mabi=lp64/-fstack-clash-protection/-fstack-protector-all\}
>>  aarch64.exp=auto-init*’
>> In the log file, I got:
>> /home/qinzhao/Work/GCC/build_git/gcc/xgcc 
>> -B/home/qinzhao/Work/GCC/build_git/gcc/ 
>> /home/qinzhao/Work/GCC/latest_gcc_git/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
>>   -fdiagnostics-plain-output  -ftrivial-auto-var-init=zero 
>> -fdump-rtl-expand -fno-stack-protector -ffat-lto-objects -S  -mabi=lp64 
>> -fstack-clash-protection -fstack-protector-all  -o auto-init-1.s
>> From it, we can see, that the options that were passed through 
>> RUNTESTFLAGS “mabi-lp64 -fstack-clash-protection -fstack-protector-all” 
>> were appended AFTER the options inside the testing case through 
>> “dg-options”. As a result, the option “-fno-stack-protector” did not 
>> have any impact at all.
>> What’s the expected behavior for the order of these options? Should 
>> options through RUNTESTFLAGS be appended BEFORE or AFTER the options 
>> through testing cases?
>> For X86, the options through RUNTESTFLAGS are added BEFORE the options 
>> through testing cases. Therefore adding “-fno-stack-protector” has the 
>> expected result.
> 
> Can you check that you are using the same version of dejagnu on both 
> platforms?
 Stupid question:  how to check the version of it?
>>> 
>>> $ runtest --version
>> Thanks.
>> On aarch64:
>> qinzhao@gcc116:~/Work/GCC/build_git$ runtest --version
>> Expect version is5.45
>> Tcl version is   8.6
>> Framework version is 1.5
> 
> Ok, so this is dejagnu 1.5 (older versions of dejagnu reported the 'framework 
> version') ...
> 
>> On X86:
>> [opc@qinzhao-ol8u3-x86 gcc]$ runtest --version
>> WARNING: Couldn't find the global config file.
>> DejaGnu version  1.6.1
>> Expect version   5.45.4
>> Tcl version  8.6
> 
> While this is dejagnu 1.6.1
> 
> IIRC there were some changes to the way various options were combined at one 
> point and I suspect this may be the source of your problem.  Can you update 
> your dejagnu on your aarch64 system to be the same as that on x86?
> 
> R.
> 
>> So, is there anything wrong with the installation of runtest on X86?
>> Qing
>>> 
>>> R.
>>> 
 Qing
> 
> R.
> 
>> Is this a bug in aarch64 testing suite?
>> Thanks.
>> Qing

Re: [PATCH 2/7] Do not query SCEV in range_of_phi unless dominators are available.

2021-09-21 Thread Aldy Hernandez via Gcc-patches





On 9/21/21 7:05 PM, Andrew MacLeod wrote:

On 9/21/21 12:53 PM, Aldy Hernandez wrote:

SCEV won't work without dominators and we can get called without
dominators from debug_ranger.

Another option would be to rename scev_initialized_p to something like
scev_available_p and move the check there.  For now, this will do.

Committed.

gcc/ChangeLog:

* gimple-range-fold.cc (fold_using_range::range_of_phi): Check
dom_info_available_p.
---
  gcc/gimple-range-fold.cc | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 997d02dd4b9..4dbf4188ec2 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -781,7 +781,9 @@ fold_using_range::range_of_phi (irange , gphi 
*phi, fur_source )

  }
    // If SCEV is available, query if this PHI has any knonwn values.
-  if (scev_initialized_p () && !POINTER_TYPE_P (TREE_TYPE (phi_def)))
+  if (dom_info_available_p (CDI_DOMINATORS)
+  && scev_initialized_p ()
+  && !POINTER_TYPE_P (TREE_TYPE (phi_def)))
  {
    value_range loop_range;
    class loop *l = loop_containing_stmt (phi);


Im confused.. if scev doesn't work without dominators, how is 
scev_initialized_p() true if there are no dominators? Are we 
initializing it somewhere without dominators?  Maybe there should be a 
check in the scev init routine?  seems like something else is amok.


As I mentioned, this can happen from debug_ranger(), which is a 
debugging construct, and I've been known to call it without dominators 
:).  And yes, I agree we could move it to scev_initialized_p.


Aldy




void
scev_initialize (void)
{
   gcc_assert (! scev_initialized_p ());

   scalar_evolution_info = hash_table::create_ggc (100);

   for (auto loop : loops_list (cfun, 0))
     loop->nb_iterations = NULL_TREE;
}

/* Return true if SCEV is initialized.  */

bool
scev_initialized_p (void)
{
   return scalar_evolution_info != NULL;
}

Re: [PATCH 2/7] Do not query SCEV in range_of_phi unless dominators are available.

2021-09-21 Thread Andrew MacLeod via Gcc-patches


On 9/21/21 12:53 PM, Aldy Hernandez wrote:

SCEV won't work without dominators and we can get called without
dominators from debug_ranger.

Another option would be to rename scev_initialized_p to something like
scev_available_p and move the check there.  For now, this will do.

Committed.

gcc/ChangeLog:

* gimple-range-fold.cc (fold_using_range::range_of_phi): Check
dom_info_available_p.
---
  gcc/gimple-range-fold.cc | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 997d02dd4b9..4dbf4188ec2 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -781,7 +781,9 @@ fold_using_range::range_of_phi (irange , gphi *phi, 
fur_source )
  }
  
// If SCEV is available, query if this PHI has any knonwn values.

-  if (scev_initialized_p () && !POINTER_TYPE_P (TREE_TYPE (phi_def)))
+  if (dom_info_available_p (CDI_DOMINATORS)
+  && scev_initialized_p ()
+  && !POINTER_TYPE_P (TREE_TYPE (phi_def)))
  {
value_range loop_range;
class loop *l = loop_containing_stmt (phi);


Im confused.. if scev doesn't work without dominators, how is 
scev_initialized_p() true if there are no dominators? Are we 
initializing it somewhere without dominators?  Maybe there should be a 
check in the scev init routine?  seems like something else is amok.



void
scev_initialize (void)
{
  gcc_assert (! scev_initialized_p ());

  scalar_evolution_info = hash_table::create_ggc (100);

  for (auto loop : loops_list (cfun, 0))
    loop->nb_iterations = NULL_TREE;
}

/* Return true if SCEV is initialized.  */

bool
scev_initialized_p (void)
{
  return scalar_evolution_info != NULL;
}

RE: [PATCH 2/5]AArch64 sve: combine nested if predicates

2021-09-21 Thread Tamar Christina via Gcc-patches

Hi honored reviewer,

Thanks for the feedback, I hereby submit the new patch:

> > Note: This patch series is working incrementally towards generating the
> most
> >   efficient code for this and other loops in small steps.
> 
> It looks like this could be done in the vectoriser via an extension of the
> scalar_cond_masked_set mechanism.  We have:
> 
>   mask__54.13_59 = vect_a_15.9_55 > vect_b_17.12_58;
>   vec_mask_and_60 = loop_mask_32 & mask__54.13_59;
>   …
>   mask__30.17_67 = vect_a_15.9_55 > vect_cst__66;
>   mask__29.18_68 = mask__54.13_59 & mask__30.17_67;
>   vec_mask_and_69 = loop_mask_32 & mask__29.18_68;
> 
> When vectorising mask__29.18_68, we could test whether each side of the
> "&" is already in scalar_cond_masked_set and AND in the loop mask if so, like
> we do in vectorizable_condition.  We could then separately record that the &
> result includes the loop mask.

When never a mask is being generated from an BIT_AND we mask the operands of
the and instead and then just AND the result.

This allows us to be able to CSE the masks and generate the right combination.
However because re-assoc will try to re-order the masks in the & we have to now
perform a small local CSE on the vectorized loop is vectorization is successful.

Note: This patch series is working incrementally towards generating the most
  efficient code for this and other loops in small steps.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu and no 
issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vectorizer.c (vectorize_loops): Do local CSE through RPVN upon
successful vectorization.
* tree-vect-stmts.c (prepare_load_store_mask): When combining two masks
mask the operands instead of the combined operation.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pred-combine-and.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-combine-and.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pred-combine-and.c
new file mode 100644
index 
..d395b7f84bb15b588493611df5a47549726ac24a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-combine-and.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O3 --save-temps" } */
+
+void f5(float * restrict z0, float * restrict z1, float *restrict x, float * 
restrict y, float c, int n)
+{
+for (int i = 0; i < n; i++) {
+float a = x[i];
+float b = y[i];
+if (a > b) {
+z0[i] = a + b;
+if (a > c) {
+z1[i] = a - b;
+}
+}
+}
+}
+
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-9]+\.s, p[0-9]+/z, 
z[0-9]+\.s, z[0-9]+\.s} 2 } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 
4e0b2adf1dc2404bc345af30cfeb9c819084894e..717a25f46aa72534eebeb382c92b9145d7d44d04
 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1799,6 +1799,19 @@ prepare_load_store_mask (tree mask_type, tree loop_mask, 
tree vec_mask,
 return vec_mask;
 
   gcc_assert (TREE_TYPE (loop_mask) == mask_type);
+
+  /* Check if the mask is a combination of two different masks.  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (vec_mask);
+  if (is_gimple_assign (def_stmt)
+  && gimple_assign_rhs_code (def_stmt) == BIT_AND_EXPR)
+{
+  tree lhs1 = gimple_assign_rhs1 (def_stmt);
+  tree lhs2 = gimple_assign_rhs2 (def_stmt);
+
+  vec_mask = prepare_load_store_mask (mask_type, loop_mask, lhs1, gsi);
+  loop_mask = prepare_load_store_mask (mask_type, loop_mask, lhs2, gsi);
+}
+
   tree and_res = make_temp_ssa_name (mask_type, NULL, "vec_mask_and");
   gimple *and_stmt = gimple_build_assign (and_res, BIT_AND_EXPR,
  vec_mask, loop_mask);
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 
3aa3e2a678328baccc4869fe2c6546e700b92255..84bcd146af7c4dedf6acdd7317954010ad3f281b
 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -81,7 +81,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "opt-problem.h"
 #include "internal-fn.h"
-
+#include "tree-ssa-sccvn.h"
 
 /* Loop or bb location, with hotness information.  */
 dump_user_location_t vect_location;
@@ -1321,6 +1321,27 @@ vectorize_loops (void)
 ???  Also while we try hard to update loop-closed SSA form we fail
 to properly do this in some corner-cases (see PR56286).  */
   rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa_only_virtuals);
+
+  for (i = 1; i < number_of_loops (cfun); i++)
+   {
+ loop = get_loop (cfun, i);
+ if (!loop || !single_exit (loop))
+   continue;
+
+ bitmap exit_bbs;
+ /* Perform local CSE, this esp. helps because we emit code for
+predicates that need to be shared for optimal predicate usage.
+However reassoc

[PATCH 5/7] path solver: Remove useless code.

2021-09-21 Thread Aldy Hernandez via Gcc-patches

Committed.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::range_defined_in_block):
Remove useless code.
---
 gcc/gimple-range-path.cc | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index b3040c9bc00..b86a76fa74b 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -246,10 +246,6 @@ path_range_query::range_defined_in_block (irange , tree 
name, basic_block bb)
   fprintf (dump_file, "\n");
 }
 
-  // We may have an artificial statement not in the IL.
-  if (!bb && r.varying_p ())
-return false;
-
   return true;
 }
 
-- 
2.31.1

[PATCH 2/7] Do not query SCEV in range_of_phi unless dominators are available.

2021-09-21 Thread Aldy Hernandez via Gcc-patches

SCEV won't work without dominators and we can get called without
dominators from debug_ranger.

Another option would be to rename scev_initialized_p to something like
scev_available_p and move the check there.  For now, this will do.

Committed.

gcc/ChangeLog:

* gimple-range-fold.cc (fold_using_range::range_of_phi): Check
dom_info_available_p.
---
 gcc/gimple-range-fold.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 997d02dd4b9..4dbf4188ec2 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -781,7 +781,9 @@ fold_using_range::range_of_phi (irange , gphi *phi, 
fur_source )
 }
 
   // If SCEV is available, query if this PHI has any knonwn values.
-  if (scev_initialized_p () && !POINTER_TYPE_P (TREE_TYPE (phi_def)))
+  if (dom_info_available_p (CDI_DOMINATORS)
+  && scev_initialized_p ()
+  && !POINTER_TYPE_P (TREE_TYPE (phi_def)))
 {
   value_range loop_range;
   class loop *l = loop_containing_stmt (phi);
-- 
2.31.1

[PATCH 1/7] Allocate non_null_ref tables at creation.

2021-09-21 Thread Aldy Hernandez via Gcc-patches

Preallocating the space is slightly cheaper than calling
safe_grow_cleared.

Committed.

gcc/ChangeLog:

* gimple-range-cache.cc (non_null_ref::non_null_ref): Use create
and quick_grow_cleared instead of safe_grow_cleared.
---
 gcc/gimple-range-cache.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index fbf0f95eef9..b856b212169 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -37,8 +37,8 @@ along with GCC; see the file COPYING3.  If not see
 
 non_null_ref::non_null_ref ()
 {
-  m_nn.create (0);
-  m_nn.safe_grow_cleared (num_ssa_names);
+  m_nn.create (num_ssa_names);
+  m_nn.quick_grow_cleared (num_ssa_names);
   bitmap_obstack_initialize (_bitmaps);
 }
 
-- 
2.31.1

[PATCH 4/7] path solver: Add relation support.

2021-09-21 Thread Aldy Hernandez via Gcc-patches

This patch adds relational support to the path solver.  It uses a
path_oracle that keeps track of relations within a path which are
augmented by relations on entry to the path.  With it, range_of_stmt,
range_of_expr, and friends can give relation aware answers.

Committed.

gcc/ChangeLog:

* gimple-range-fold.h (class fur_source): Make oracle protected.
* gimple-range-path.cc (path_range_query::path_range_query): Add
resolve argument.  Initialize oracle.
(path_range_query::~path_range_query): Delete oracle.
(path_range_query::range_of_stmt): Adapt to use relations.
(path_range_query::precompute_ranges): Pre-compute relations.
(class jt_fur_source): New
(jt_fur_source::jt_fur_source): New.
(jt_fur_source::register_relation): New.
(jt_fur_source::query_relation): New.
(path_range_query::precompute_relations): New.
(path_range_query::precompute_phi_relations): New.
* gimple-range-path.h (path_range_query): Add resolve argument.
Add oracle, precompute_relations, precompute_phi_relations.
* tree-ssa-threadbackward.c (back_threader::back_threader): Pass
resolve argument to solver.
---
 gcc/gimple-range-fold.h   |   2 +-
 gcc/gimple-range-path.cc  | 204 +++---
 gcc/gimple-range-path.h   |  15 ++-
 gcc/tree-ssa-threadbackward.c |   2 +-
 4 files changed, 200 insertions(+), 23 deletions(-)

diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index d62d29b7133..bc0874b5f31 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -160,7 +160,7 @@ public:
  tree op2) OVERRIDE;
   virtual void register_relation (edge e, relation_kind k, tree op1,
  tree op2) OVERRIDE;
-private:
+protected:
   relation_oracle *m_oracle;
 };
 
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 10b018b5211..b3040c9bc00 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -30,11 +30,13 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pretty-print.h"
 #include "gimple-range-path.h"
 #include "ssa.h"
+#include "tree-cfg.h"
+#include "gimple-iterator.h"
 
 // Internal construct to help facilitate debugging of solver.
 #define DEBUG_SOLVER (0 && dump_file)
 
-path_range_query::path_range_query (gimple_ranger )
+path_range_query::path_range_query (gimple_ranger , bool resolve)
   : m_ranger (ranger)
 {
   if (DEBUG_SOLVER)
@@ -43,12 +45,15 @@ path_range_query::path_range_query (gimple_ranger )
   m_cache = new ssa_global_cache;
   m_has_cache_entry = BITMAP_ALLOC (NULL);
   m_path = NULL;
+  m_resolve = resolve;
+  m_oracle = new path_oracle (ranger.oracle ());
 }
 
 path_range_query::~path_range_query ()
 {
   BITMAP_FREE (m_has_cache_entry);
   delete m_cache;
+  delete m_oracle;
 }
 
 // Mark cache entry for NAME as unused.
@@ -161,22 +166,6 @@ path_range_query::unreachable_path_p ()
   return m_undefined_path;
 }
 
-// Return the range of STMT at the end of the path being analyzed.
-
-bool
-path_range_query::range_of_stmt (irange , gimple *stmt, tree)
-{
-  tree type = gimple_range_type (stmt);
-
-  if (!irange::supports_type_p (type))
-return false;
-
-  if (!fold_range (r, stmt, this))
-r.set_varying (type);
-
-  return true;
-}
-
 // Initialize the current path to PATH.  The current block is set to
 // the entry block to the path.
 //
@@ -371,6 +360,12 @@ path_range_query::precompute_ranges (const 
vec ,
   m_imports = imports;
   m_undefined_path = false;
 
+  if (m_resolve)
+{
+  m_oracle->reset_path ();
+  precompute_relations (path);
+}
+
   if (DEBUG_SOLVER)
 {
   fprintf (dump_file, "\npath_range_query: precompute_ranges for path: ");
@@ -400,3 +395,178 @@ path_range_query::precompute_ranges (const 
vec ,
   if (DEBUG_SOLVER)
 dump (dump_file);
 }
+
+// A folding aid used to register and query relations along a path.
+// When queried, it returns relations as they would appear on exit to
+// the path.
+//
+// Relations are registered on entry so the path_oracle knows which
+// block to query the root oracle at when a relation lies outside the
+// path.  However, when queried we return the relation on exit to the
+// path, since the root_oracle ignores the registered.
+
+class jt_fur_source : public fur_depend
+{
+public:
+  jt_fur_source (gimple *s, path_range_query *, gori_compute *,
+const vec &);
+  relation_kind query_relation (tree op1, tree op2) override;
+  void register_relation (gimple *, relation_kind, tree op1, tree op2) 
override;
+  void register_relation (edge, relation_kind, tree op1, tree op2) override;
+private:
+  basic_block m_entry;
+};
+
+jt_fur_source::jt_fur_source (gimple *s,
+ path_range_query *query,
+ gori_compute *gori,
+ const vec )
+  : fur_depend (s, gori,

[PATCH 3/7] Move postfold_gcond_edges into fur_source.

2021-09-21 Thread Aldy Hernandez via Gcc-patches

The code registering outgoing edges from a cond is living in
fold_using_range, which makes it difficult to be called from other
places.  Also, it refuses to register relations on the outgoing
destinations that have more than one predecessor.  This latter issue is
a problem because we would like to register outgoing edges along a path
in the path solver (regardless of single_pred_p).

Committed.

gcc/ChangeLog:

* gimple-range-fold.cc (fold_using_range::range_of_range_op):
Rename postfold_gcond_edges to register_outgoing_edges and
adapt.
(fold_using_range::postfold_gcond_edges): Rename...
(fur_source::register_outgoing_edges): ...to this.
* gimple-range-fold.h (postfold_gcond_edges): Rename to
register_outgoing_edges and move to fur_source.
---
 gcc/gimple-range-fold.cc | 44 
 gcc/gimple-range-fold.h  |  2 +-
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 4dbf4188ec2..921e8e3388f 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -651,7 +651,17 @@ fold_using_range::range_of_range_op (irange , gimple *s, 
fur_source )
}
}
  else if (is_a (s))
-   postfold_gcond_edges (as_a (s), r, src);
+   {
+ basic_block bb = gimple_bb (s);
+ edge e0 = EDGE_SUCC (bb, 0);
+ edge e1 = EDGE_SUCC (bb, 1);
+
+ if (!single_pred_p (e0->dest))
+   e0 = NULL;
+ if (!single_pred_p (e1->dest))
+   e1 = NULL;
+ src.register_outgoing_edges (as_a (s), r, e0, e1);
+   }
}
   else
r.set_varying (type);
@@ -1353,8 +1363,7 @@ fold_using_range::relation_fold_and_or (irange& 
lhs_range, gimple *s,
 // Register any outgoing edge relations from a conditional branch.
 
 void
-fold_using_range::postfold_gcond_edges (gcond *s, irange& lhs_range,
-   fur_source )
+fur_source::register_outgoing_edges (gcond *s, irange _range, edge e0, 
edge e1)
 {
   int_range_max r;
   int_range<2> e0_range, e1_range;
@@ -1366,10 +1375,7 @@ fold_using_range::postfold_gcond_edges (gcond *s, 
irange& lhs_range,
   if (!bb)
 return;
 
-  edge e0 = EDGE_SUCC (bb, 0);
-  if (!single_pred_p (e0->dest))
-e0 = NULL;
-  else
+  if (e0)
 {
   // If this edge is never taken, ignore it.
   gcond_edge_range (e0_range, e0);
@@ -1379,10 +1385,7 @@ fold_using_range::postfold_gcond_edges (gcond *s, 
irange& lhs_range,
 }
 
 
-  edge e1 = EDGE_SUCC (bb, 1);
-  if (!single_pred_p (e1->dest))
-e1 = NULL;
-  else
+  if (e1)
 {
   // If this edge is never taken, ignore it.
   gcond_edge_range (e1_range, e1);
@@ -1391,7 +1394,6 @@ fold_using_range::postfold_gcond_edges (gcond *s, irange& 
lhs_range,
e1 = NULL;
 }
 
-  // At least one edge needs to be single pred.
   if (!e0 && !e1)
 return;
 
@@ -1407,27 +1409,25 @@ fold_using_range::postfold_gcond_edges (gcond *s, 
irange& lhs_range,
{
  relation_kind relation = handler->op1_op2_relation (e0_range);
  if (relation != VREL_NONE)
-   src.register_relation (e0, relation, ssa1, ssa2);
+   register_relation (e0, relation, ssa1, ssa2);
}
   if (e1)
{
  relation_kind relation = handler->op1_op2_relation (e1_range);
  if (relation != VREL_NONE)
-   src.register_relation (e1, relation, ssa1, ssa2);
+   register_relation (e1, relation, ssa1, ssa2);
}
 }
 
   // Outgoing relations of GORI exports require a gori engine.
-  if (!src.gori ())
+  if (!gori ())
 return;
 
-  range_query *q = src.query ();
   // Now look for other relations in the exports.  This will find stmts
   // leading to the condition such as:
   // c_2 = a_4 < b_7
   // if (c_2)
-
-  FOR_EACH_GORI_EXPORT_NAME (*(src.gori ()), bb, name)
+  FOR_EACH_GORI_EXPORT_NAME (*(gori ()), bb, name)
 {
   if (TREE_CODE (TREE_TYPE (name)) != BOOLEAN_TYPE)
continue;
@@ -1439,19 +1439,19 @@ fold_using_range::postfold_gcond_edges (gcond *s, 
irange& lhs_range,
   tree ssa2 = gimple_range_ssa_p (gimple_range_operand2 (stmt));
   if (ssa1 && ssa2)
{
- if (e0 && src.gori ()->outgoing_edge_range_p (r, e0, name, *q)
+ if (e0 && gori ()->outgoing_edge_range_p (r, e0, name, *m_query)
  && r.singleton_p ())
{
  relation_kind relation = handler->op1_op2_relation (r);
  if (relation != VREL_NONE)
-   src.register_relation (e0, relation, ssa1, ssa2);
+   register_relation (e0, relation, ssa1, ssa2);
}
- if (e1 && src.gori ()->outgoing_edge_range_p (r, e1, name, *q)
+ if (e1 && gori ()->outgoing_edge_range_p (r, e1, name, *m_query)
  && r.singleton_p ())
{

[PATCH 6/7] path solver: Add related SSAs to solvable set.

2021-09-21 Thread Aldy Hernandez via Gcc-patches

The path solver takes an initial set of SSA names which are deemed
interesting.  These are then solved along the path.  Adding any copies
of said SSA names to the list of interesting names yields significantly
better results.  This patch adds said copies to the already provided
list.

Currently this code is guarded by "m_resolve", which is the more
expensive mode, but it would be reasonable to make it available always,
especially since adding more imports usually has minimal impact on the
processing time.  I will investigate and make it universally available
if this is indeed the case.

Committed.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::add_to_imports): New.
(path_range_query::add_copies_to_imports): New.
(path_range_query::precompute_ranges): Call
add_copies_to_imports.
* gimple-range-path.h (class path_range_query): Add prototypes
for add_copies_to_imports and add_to_imports.
---
 gcc/gimple-range-path.cc | 80 +++-
 gcc/gimple-range-path.h  |  4 +-
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index b86a76fa74b..a8ead3da4dc 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -343,6 +343,71 @@ path_range_query::adjust_for_non_null_uses (basic_block bb)
 }
 }
 
+// If NAME is a supported SSA_NAME, add it the bitmap in IMPORTS.
+
+bool
+path_range_query::add_to_imports (tree name, bitmap imports)
+{
+  if (TREE_CODE (name) == SSA_NAME
+  && irange::supports_type_p (TREE_TYPE (name)))
+return bitmap_set_bit (imports, SSA_NAME_VERSION (name));
+  return false;
+}
+
+// Add the copies of any SSA names in IMPORTS to IMPORTS.
+//
+// These are hints for the solver.  Adding more elements (within
+// reason) doesn't slow us down, because we don't solve anything that
+// doesn't appear in the path.  On the other hand, not having enough
+// imports will limit what we can solve.
+
+void
+path_range_query::add_copies_to_imports ()
+{
+  auto_vec worklist (bitmap_count_bits (m_imports));
+  bitmap_iterator bi;
+  unsigned i;
+
+  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
+{
+  tree name = ssa_name (i);
+  worklist.quick_push (name);
+}
+
+  while (!worklist.is_empty ())
+{
+  tree name = worklist.pop ();
+  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+
+  if (is_gimple_assign (def_stmt))
+   {
+ // ?? Adding assignment copies doesn't get us much.  At the
+ // time of writing, we got 63 more threaded paths across the
+ // .ii files from a bootstrap.
+ add_to_imports (gimple_assign_rhs1 (def_stmt), m_imports);
+ tree rhs = gimple_assign_rhs2 (def_stmt);
+ if (rhs && add_to_imports (rhs, m_imports))
+   worklist.safe_push (rhs);
+ rhs = gimple_assign_rhs3 (def_stmt);
+ if (rhs && add_to_imports (rhs, m_imports))
+   worklist.safe_push (rhs);
+   }
+  else if (gphi *phi = dyn_cast  (def_stmt))
+   {
+ for (size_t i = 0; i < gimple_phi_num_args (phi); ++i)
+   {
+ edge e = gimple_phi_arg_edge (phi, i);
+ tree arg = gimple_phi_arg (phi, i)->def;
+
+ if (TREE_CODE (arg) == SSA_NAME
+ && m_path->contains (e->src)
+ && bitmap_set_bit (m_imports, SSA_NAME_VERSION (arg)))
+   worklist.safe_push (arg);
+   }
+   }
+}
+}
+
 // Precompute the ranges for IMPORTS along PATH.
 //
 // IMPORTS are the set of SSA names, any of which could potentially
@@ -353,11 +418,12 @@ path_range_query::precompute_ranges (const 
vec ,
 const bitmap_head *imports)
 {
   set_path (path);
-  m_imports = imports;
+  bitmap_copy (m_imports, imports);
   m_undefined_path = false;
 
   if (m_resolve)
 {
+  add_copies_to_imports ();
   m_oracle->reset_path ();
   precompute_relations (path);
 }
@@ -379,6 +445,18 @@ path_range_query::precompute_ranges (const 
vec ,
 {
   basic_block bb = curr_bb ();
 
+  if (m_resolve)
+   {
+ gori_compute  = m_ranger.gori ();
+ tree name;
+
+ // Exported booleans along the path, may help conditionals.
+ // Add them as interesting imports.
+ FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
+   if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
+ bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
+   }
+
   precompute_ranges_in_block (bb);
   adjust_for_non_null_uses (bb);
 
diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
index d12fd7743ca..f23cce18391 100644
--- a/gcc/gimple-range-path.h
+++ b/gcc/gimple-range-path.h
@@ -61,6 +61,8 @@ private:
   void ssa_range_in_phi (irange , gphi *phi);
   void precompute_relations (const vec &);
   void precompute_phi_relations (basic_block bb, basic_block prev);
+  void add_copies_to_imports

[PATCH 7/7] path solver: Use ranger to solve unknowns.

2021-09-21 Thread Aldy Hernandez via Gcc-patches

The default behavior for the path solver is to resort to VARYING when
the range for an unknown SSA is outside the given path.  This is both
cheap and fast, but fails to get a significant amount of ranges that
traditionally the DOM and VRP threaders could get.

This patch uses the ranger to resolve any unknown names upon entry to
the path.  It also uses equivalences to improve ranges.

Committed.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::defined_outside_path):
New.
(path_range_query::range_on_path_entry): New.
(path_range_query::internal_range_of_expr): Resolve unknowns
with ranger.
(path_range_query::improve_range_with_equivs): New.
(path_range_query::ssa_range_in_phi): Resolve unknowns with
ranger.
* gimple-range-path.h (class path_range_query): Add
defined_outside_path, range_on_path_entry, and
improve_range_with_equivs.
---
 gcc/gimple-range-path.cc | 89 ++--
 gcc/gimple-range-path.h  |  3 ++
 2 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a8ead3da4dc..e65c7996bb7 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -121,6 +121,34 @@ path_range_query::debug ()
   dump (stderr);
 }
 
+// Return TRUE if NAME is defined outside the current path.
+
+bool
+path_range_query::defined_outside_path (tree name)
+{
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  basic_block bb = gimple_bb (def);
+
+  return !bb || !m_path->contains (bb);
+}
+
+// Return the range of NAME on entry to the path.
+
+void
+path_range_query::range_on_path_entry (irange , tree name)
+{
+  int_range_max tmp;
+  basic_block entry = entry_bb ();
+  r.set_undefined ();
+  for (unsigned i = 0; i < EDGE_COUNT (entry->preds); ++i)
+{
+  edge e = EDGE_PRED (entry, i);
+  if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
+ && m_ranger.range_on_edge (tmp, e, name))
+   r.union_ (tmp);
+}
+}
+
 // Return the range of NAME at the end of the path being analyzed.
 
 bool
@@ -132,6 +160,16 @@ path_range_query::internal_range_of_expr (irange , tree 
name, gimple *stmt)
   if (get_cache (r, name))
 return true;
 
+  if (m_resolve && defined_outside_path (name))
+{
+  range_on_path_entry (r, name);
+
+  if (r.varying_p ())
+   improve_range_with_equivs (r, name);
+
+  set_cache (r, name);
+  return true;
+}
 
   basic_block bb = stmt ? gimple_bb (stmt) : exit_bb ();
   if (stmt && range_defined_in_block (r, name, bb))
@@ -139,6 +177,9 @@ path_range_query::internal_range_of_expr (irange , tree 
name, gimple *stmt)
   if (TREE_CODE (name) == SSA_NAME)
r.intersect (gimple_range_global (name));
 
+  if (m_resolve && r.varying_p ())
+   improve_range_with_equivs (r, name);
+
   set_cache (r, name);
   return true;
 }
@@ -160,6 +201,33 @@ path_range_query::range_of_expr (irange , tree name, 
gimple *stmt)
   return false;
 }
 
+// Improve the range of NAME with the range of any of its equivalences.
+
+void
+path_range_query::improve_range_with_equivs (irange , tree name)
+{
+  if (TREE_CODE (name) != SSA_NAME)
+return;
+
+  basic_block entry = entry_bb ();
+  relation_oracle *oracle = m_ranger.oracle ();
+
+  if (const bitmap_head *equivs = oracle->equiv_set (name, entry))
+{
+  int_range_max t;
+  bitmap_iterator bi;
+  unsigned i;
+
+  EXECUTE_IF_SET_IN_BITMAP (equivs, 0, i, bi)
+   if (i != SSA_NAME_VERSION (name) && r.varying_p ())
+ {
+   tree equiv = ssa_name (i);
+   range_on_path_entry (t, equiv);
+   r.intersect (t);
+ }
+}
+}
+
 bool
 path_range_query::unreachable_path_p ()
 {
@@ -189,10 +257,11 @@ path_range_query::ssa_range_in_phi (irange , gphi *phi)
   tree name = gimple_phi_result (phi);
   basic_block bb = gimple_bb (phi);
 
-  // We experimented with querying ranger's range_on_entry here, but
-  // the performance penalty was too high, for hardly any improvements.
   if (at_entry ())
 {
+  if (m_resolve && m_ranger.range_of_expr (r, name, phi))
+   return;
+
   // Try fold just in case we can resolve simple things like PHI <5(99), 
6(88)>.
   if (!fold_range (r, phi, this))
r.set_varying (TREE_TYPE (name));
@@ -210,8 +279,20 @@ path_range_query::ssa_range_in_phi (irange , gphi *phi)
tree arg = gimple_phi_arg_def (phi, i);
 
if (!get_cache (r, arg))
- r.set_varying (TREE_TYPE (name));
-
+ {
+   if (m_resolve)
+ {
+   int_range_max tmp;
+   // Using both the range on entry to the path, and the
+   // range on this edge yields significantly better
+   // results.
+   range_on_path_entry (r, arg);
+   m_ranger.range_on_edge (tmp, e_in, arg);
+   r.intersect (tmp);
+   return;
+ }

[PATCH 0/7] Add ability to resolve unknowns to path solver.

2021-09-21 Thread Aldy Hernandez via Gcc-patches

The default behavior for the path solver is to resort to VARYING when
the range for a requested SSA is outside the given path.  This is both
cheap and fast, but fails to get a significant amount of ranges that
traditionally the DOM and VRP threaders got.

This patchset improves the path solver such that any names outside the
given path are resolved with the ranger.  It also adds the ability to
resolve relations ocurring both within and without the path.

This functionality is turned off by default.  It will be used by the
hybrid VRP threader replacement, as well as by the backward threader
when it becomes the only threader in town.

This entire patchset has been bootstrapped and tested on x86-64 Linux,
both alone, and along with upcoming patches to the threader and VRP.

Committing to trunk.
Aldy

Aldy Hernandez (7):
  Allocate non_null_ref tables at creation.
  Do not query SCEV in range_of_phi unless dominators are available.
  Move postfold_gcond_edges into fur_source.
  path solver: Add relation support.
  path solver: Remove useless code.
  path solver: Add related SSAs to solvable set.
  path solver: Use ranger to solve unknowns.

 gcc/gimple-range-cache.cc |   4 +-
 gcc/gimple-range-fold.cc  |  48 ++---
 gcc/gimple-range-fold.h   |   4 +-
 gcc/gimple-range-path.cc  | 375 +++---
 gcc/gimple-range-path.h   |  20 +-
 gcc/tree-ssa-threadbackward.c |   2 +-
 6 files changed, 396 insertions(+), 57 deletions(-)

-- 
2.31.1

Re: [PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-21 Thread Andre Vieira (lists) via Gcc-patches


Hi Richi,

Thanks for the review, see below some questions.

On 21/09/2021 13:30, Richard Biener wrote:

On Fri, 17 Sep 2021, Andre Vieira (lists) wrote:


Hi all,

This patch adds the ability to define a target hook to unroll the main
vectorized loop. It also introduces --param's vect-unroll and
vect-unroll-reductions to control this through a command-line. I found this
useful to experiment and believe can help when tuning, so I decided to leave
it in.
We only unroll the main loop and have disabled unrolling epilogues for now. We
also do not support unrolling of any loop that has a negative step and we do
not support unrolling a loop with any reduction other than a
TREE_CODE_REDUCTION.

Bootstrapped and regression tested on aarch64-linux-gnu as part of the series.

I wonder why we want to change the vector modes used for the epilogue,
we're either making it more likely to need to fall through to the
scalar epilogue or require another vectorized epilogue.
I don't quite understand what you mean by change the vector modes for 
the epilogue. I don't think we do.

If you are referring to:
      /* If we are unrolling, try all VECTOR_MODES for the epilogue.  */
      if (loop_vinfo->par_unrolling_factor > 1)
        {
      next_vector_mode = vector_modes[0];
      mode_i = 1;

      if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                 "* Re-trying analysis with vector mode"
                 " %s for epilogue with partial vectors.\n",
                 GET_MODE_NAME (next_vector_mode));
      continue;
        }

That just forces trying the vector modes we've tried before. Though I 
might need to revisit this now I think about it. I'm afraid it might be 
possible for this to generate an epilogue with a vf that is not lower 
than that of the main loop, but I'd need to think about this again.


Either way I don't think this changes the vector modes used for the 
epilogue. But maybe I'm just missing your point here.

That said, for simplicity I'd only change the VF of the main loop.

There I wonder why you need to change vect_update_vf_for_slp or
vect_determine_partial_vectors_and_peeling and why it's not enough
to adjust the VF in a single place, I'd do that here:

   /* This is the point where we can re-start analysis with SLP forced off.
*/
start_over:

   /* Now the vectorization factor is final.  */
   poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   gcc_assert (known_ne (vectorization_factor, 0U));

>  call vect_update_vf_for_unroll ()
I can move it there, it would indeed remove the need for the change to 
vect_update_vf_for_slp, the change to 
vect_determine_partial_vectors_and_peeling would still be required I 
think. It is meant to disable using partial vectors in an unrolled loop.

note there's also loop->unroll (from #pragma GCC unroll) which we
could include in what you look at in vect_unroll_value.

I don't like add_stmt_cost_for_unroll - how should a target go
and decide based on what it is fed?  You could as well feed it
the scalar body or the vinfo so it can get a shot at all
the vectorizers meta data - but feeding it individual stmt_infos
does not add any meaningful abstraction and thus what's the
point?
I am still working on tuning our backend hook, but the way it works is 
it estimates how many load, store and general ops are required for the 
vectorized loop based on these.

I _think_ what would make some sense is when we actually cost
the vector body (with the not unrolled VF) ask the target
"well, how about unrolling this?" because there it has the
chance to look at the actual vector stmts produced (in "cost form").
And if the target answers "yeah - go ahead and try x4" we signal
that to the iteration and have "mode N with x4 unroll" validated and
costed.

So instead of a new target hook amend the finish_cost hook to
produce a suggested unroll value and cost both the unrolled and
not unrolled body.

Sorry for steering in a different direction ;)
The reason we decided to do this early and not after cost is because 
'vect_prune_runtime_alias_test_list' and 
'vect_enhance_data_refs_alignment' require the VF and if you suddenly 
raise that the alias analysis could become invalid.


An initial implementation did do it later for that very reason that we 
could reuse the cost calculations and AArch64 already computed these 
'ops' after Richard Sandiford's patches.

But yeah ... the above kinda led me to rewrite it this way.



Thanks,
Richard.




gcc/ChangeLog:

     * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR
     and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
     * doc/tm.texi.in: Add entries for target hooks above.
     * params.opt: Add vect-unroll and vect-unroll-reductions
parameters.
     * target.def: Define hooks TARGET_VECTORIZE_UNROLL_FACTOR
     and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
     * targhooks.c (default_add_stmt_cost_for_unroll): New.

Re: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC

2021-09-21 Thread Uros Bizjak via Gcc-patches

On Mon, Sep 20, 2021 at 8:20 PM Fāng-ruì Sòng via Gcc-patches
 wrote:
>
> PING^5 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
>
> On Sat, Sep 4, 2021 at 12:11 PM Fāng-ruì Sòng  wrote:
> >
> > PING^4 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> >
> > One major design goal of PIE was to avoid copy relocations.
> > The original patch for GCC 5 caused problems for many years.
> >
> > On Wed, Aug 18, 2021 at 11:54 PM Fāng-ruì Sòng  wrote:
> >>
> >> PING^3 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> >>
> >> On Fri, Jun 4, 2021 at 3:04 PM Fāng-ruì Sòng  wrote:
> >> >
> >> > PING^2 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> >> >
> >> > On Mon, May 24, 2021 at 9:43 AM Fāng-ruì Sòng  wrote:
> >> > >
> >> > > Ping https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> >> > >
> >> > > On Tue, May 11, 2021 at 8:29 PM Fangrui Song  
> >> > > wrote:
> >> > > >
> >> > > > This was introduced in 2014-12 to use local binding for external 
> >> > > > symbols
> >> > > > for -fPIE. Now that we have H.J. Lu's GOTPCRELX for years which 
> >> > > > mostly
> >> > > > nullify the benefit of HAVE_LD_PIE_COPYRELOC, HAVE_LD_PIE_COPYRELOC
> >> > > > should retire now.
> >> > > >
> >> > > > One design goal of -fPIE was to avoid copy relocations.
> >> > > > HAVE_LD_PIE_COPYRELOC has deviated from the goal.  With this change, 
> >> > > > the
> >> > > > -fPIE behavior of x86-64 will be closer to x86-32 and other targets.
> >> > > >
> >> > > > ---
> >> > > >
> >> > > > See https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html for a 
> >> > > > list
> >> > > > of fixed and unfixed (e.g. gold incompatibility with protected
> >> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=19823) issues.
> >> > > >
> >> > > > If you prefer a longer write-up, see
> >> > > > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
> >> > > > ---
> >> > > >  gcc/config.in |  6 ---
> >> > > >  gcc/config/i386/i386.c| 11 +---
> >> > > >  gcc/configure | 52 
> >> > > > ---
> >> > > >  gcc/configure.ac  | 48 -
> >> > > >  gcc/doc/sourcebuild.texi  |  3 --
> >> > > >  .../gcc.target/i386/pie-copyrelocs-1.c| 14 -
> >> > > >  .../gcc.target/i386/pie-copyrelocs-2.c| 14 -
> >> > > >  .../gcc.target/i386/pie-copyrelocs-3.c| 14 -
> >> > > >  .../gcc.target/i386/pie-copyrelocs-4.c| 17 --
> >> > > >  gcc/testsuite/lib/target-supports.exp | 47 -
> >> > > >  10 files changed, 2 insertions(+), 224 deletions(-)
> >> > > >  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c
> >> > > >  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c
> >> > > >  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c
> >> > > >  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-4.c

>From x86 maintainer's PoV, the implementation is trivially correct,
but I have no idea about functionality. HJ, can you please review the
functionality and post your opinion on the patch to move it forward?

Thanks,
Uros.

[Ada] Add some comments in init.c about the lynx178 signal handler

2021-09-21 Thread Pierre-Marie de Rodat

__gnat_error_handler for LynxOS in init.c only accepts a "sig" argument.
Add some comments to explain why, and how to fix it with a change to a
kernel build parameter.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* init.c (__gnat_error_handler) [LynxOS]: Add a comment about
missing optional args.diff --git a/gcc/ada/init.c b/gcc/ada/init.c
--- a/gcc/ada/init.c
+++ b/gcc/ada/init.c
@@ -661,6 +661,28 @@ __gnat_install_handler (void)
 #include 
 #include 
 
+/* SA_SIGINFO is not supported by default on LynxOS, so all we have
+   available here is the "sig" argument. On newer LynxOS versions it's
+   possible to support SA_SIGINFO by setting a kernel configuration macro.
+
+   To wit:
+
+   #define NONPOSIX_SA_HANDLER_PROTO (0)
+
+   This macro must be set to 1 in either sys/bsp./uparam.h
+   or in the associated uparam.h customization file sys/bsp./xparam.h
+   (uparam.h includes xparam.h for customization)
+
+   The NONPOSIX_SA_HANDLER_PROTO macro makes it possible to provide
+   signal-catching function with 'info' and 'context' input parameters
+   even if SA_SIGINFO flag is not set or it is set for a non-realtime signal.
+
+   It also allows signal-catching function to update thread context even
+   if SA_UPDATECTX flag is not set.
+
+   This would be useful, but relying on that would transmit the requirement
+   to users to configure that feature as well, which is undesirable.  */
+
 static void
 __gnat_error_handler (int sig)
 {

[Ada] Rename "optional" node subtypes that allow Empty

2021-09-21 Thread Pierre-Marie de Rodat

This patch renames the new Opt_... subtypes in Sinfo.Nodes and
Einfo.Entities to end with the suffix "_Id" for homogeneity with other
subtypes of Node_Id and Entity_Id.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gen_il-gen.adb (Put_Opt_Subtype): Add suffix.diff --git a/gcc/ada/gen_il-gen.adb b/gcc/ada/gen_il-gen.adb
--- a/gcc/ada/gen_il-gen.adb
+++ b/gcc/ada/gen_il-gen.adb
@@ -1503,7 +1503,7 @@ package body Gen_IL.Gen is
  procedure Put_Opt_Subtype (T : Node_Or_Entity_Type) is
  begin
 if Type_Table (T).Parent /= No_Type then
-   Put (S, "subtype Opt_" & Image (T) & " is" & LF);
+   Put (S, "subtype Opt_" & Id_Image (T) & " is" & LF);
Increase_Indent (S, 2);
Put (S, Id_Image (Root));
 
@@ -1513,8 +1513,8 @@ package body Gen_IL.Gen is
if Enable_Assertions then
   Put (S, " with Predicate =>" & LF);
   Increase_Indent (S, 2);
-  Put (S, "Opt_" & Image (T) & " = Empty or else" & LF);
-  Put (S, "Opt_" & Image (T) & " in " & Id_Image (T));
+  Put (S, "Opt_" & Id_Image (T) & " = Empty or else" & LF);
+  Put (S, "Opt_" & Id_Image (T) & " in " & Id_Image (T));
   Decrease_Indent (S, 2);
end if;

[Ada] Spurious dynamic accessibility check on allocator

2021-09-21 Thread Pierre-Marie de Rodat

This patch corrects an issue in the compiler whereby an anonymous access
class-wide type allocator with default initialization has spuriously
generated dynamic accessibility checks associated with the construct -
leading to spurious runtime accessibility failures.

Additionally, this patch corrects level miscalculations for protected
components.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Accessibility_Level): Remove spurious special
case for protected type components.
* exp_ch4.adb (Generate_Accessibility_Check): Use general
Accessibility_Level instead of the low-level function
Type_Access_Level.diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -767,8 +767,7 @@ package body Exp_Ch4 is
 Cond :=
   Make_Op_Gt (Loc,
 Left_Opnd  => Cond,
-Right_Opnd =>
-  Make_Integer_Literal (Loc, Type_Access_Level (PtrT)));
+Right_Opnd => Accessibility_Level (N, Dynamic_Level));
 
 --  Due to the complexity and side effects of the check, utilize an
 --  if statement instead of the regular Program_Error circuitry.


diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -728,17 +728,6 @@ package body Sem_Util is
return Make_Level_Literal
 (Typ_Access_Level (Etype (E)));
 
---  When E is a component of the current instance of a
---  protected type, we assume the level to be deeper than that of
---  the type itself.
-
-elsif not Is_Overloadable (E)
-  and then Ekind (Scope (E)) = E_Protected_Type
-  and then Comes_From_Source (Scope (E))
-then
-   return Make_Level_Literal
-(Scope_Depth (Enclosing_Dynamic_Scope (E)) + 1);
-
 --  Check if E is an expansion-generated renaming of an iterator
 --  by examining Related_Expression. If so, determine the
 --  accessibility level based on the original expression.

[Ada] SCOs: generate 'P' decisions for [Type_]Invariant pragmas

2021-09-21 Thread Pierre-Marie de Rodat

Those pragmas should be dealt with in the same way as their equivalent
aspects.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* par_sco.adb (Traverse_One): Add support for pragma Invariant /
Type_Invariant.diff --git a/gcc/ada/par_sco.adb b/gcc/ada/par_sco.adb
--- a/gcc/ada/par_sco.adb
+++ b/gcc/ada/par_sco.adb
@@ -2248,6 +2248,8 @@ package body Par_SCO is
 | Name_Loop_Invariant
 | Name_Postcondition
 | Name_Precondition
+| Name_Type_Invariant
+| Name_Invariant
  =>
 --  For Assert/Check/Precondition/Postcondition, we
 --  must generate a P entry for the decision. Note
@@ -2256,7 +2258,10 @@ package body Par_SCO is
 --  on when we output the decision line in Put_SCOs,
 --  depending on setting by Set_SCO_Pragma_Enabled.
 
-if Nam = Name_Check then
+if Nam = Name_Check
+   or else Nam = Name_Type_Invariant
+   or else Nam = Name_Invariant
+then
Next (Arg);
 end if;
 
@@ -2285,8 +2290,7 @@ package body Par_SCO is
  --  never disabled.
 
  --  Should generate P decisions (not X) for assertion
- --  related pragmas: [Type_]Invariant,
- --  [{Static,Dynamic}_]Predicate???
+ --  related pragmas: [{Static,Dynamic}_]Predicate???
 
  when others =>
 Process_Decisions_Defer (N, 'X');

[Ada] Add "optional" node subtypes that allow Empty

2021-09-21 Thread Pierre-Marie de Rodat

This patch adds new Opt_... subtypes to Sinfo.Nodes and Einfo.Entities.
The predicates say "Opt_N_Declaration = Empty" rather than "No
(Opt_N_Declaration)" because No is not visible. It can't be made visible
with "with Atree;", because that would introduce cycles. It could be
made visible by moving it to Types, but that causes a minor earthquake
(changes in compiler, codepeer, and spark), so we're leaving No where it
is.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gen_il-gen.adb (Put_Opt_Subtype): Print out subtypes of the
form:
subtype Opt_N_Declaration is
Node_Id with Predicate =>
Opt_N_Declaration = Empty or else
Opt_N_Declaration in N_Declaration_Id;
One for each node or entity type, with the predicate allowing
Empty.
* atree.adb (Parent, Set_Parent): Remove unnecessary "Atree.".diff --git a/gcc/ada/atree.adb b/gcc/ada/atree.adb
--- a/gcc/ada/atree.adb
+++ b/gcc/ada/atree.adb
@@ -1828,7 +1828,7 @@ package body Atree is
 
function Parent (N : Node_Or_Entity_Id) return Node_Or_Entity_Id is
begin
-  pragma Assert (Atree.Present (N));
+  pragma Assert (Present (N));
 
   if Is_List_Member (N) then
  return Parent (List_Containing (N));
@@ -2151,7 +2151,7 @@ package body Atree is
 
procedure Set_Parent (N : Node_Or_Entity_Id; Val : Node_Or_Entity_Id) is
begin
-  pragma Assert (Atree.Present (N));
+  pragma Assert (Present (N));
   pragma Assert (not In_List (N));
   Set_Link (N, Union_Id (Val));
end Set_Parent;


diff --git a/gcc/ada/gen_il-gen.adb b/gcc/ada/gen_il-gen.adb
--- a/gcc/ada/gen_il-gen.adb
+++ b/gcc/ada/gen_il-gen.adb
@@ -1405,6 +1405,10 @@ package body Gen_IL.Gen is
  --  Print out a subtype (of type Node_Id or Entity_Id) for a given
  --  nonroot abstract type.
 
+ procedure Put_Opt_Subtype (T : Node_Or_Entity_Type);
+ --  Print out an "optional" subtype; that is, one that allows
+ --  Empty. Their names start with "Opt_".
+
  procedure Put_Enum_Type is
 procedure Put_Enum_Lit (T : Node_Or_Entity_Type);
 --  Print out one enumeration literal in the declaration of
@@ -1496,6 +1500,29 @@ package body Gen_IL.Gen is
 end if;
  end Put_Id_Subtype;
 
+ procedure Put_Opt_Subtype (T : Node_Or_Entity_Type) is
+ begin
+if Type_Table (T).Parent /= No_Type then
+   Put (S, "subtype Opt_" & Image (T) & " is" & LF);
+   Increase_Indent (S, 2);
+   Put (S, Id_Image (Root));
+
+   --  Assert that the Opt_XXX subtype is empty or in the XXX
+   --  subtype.
+
+   if Enable_Assertions then
+  Put (S, " with Predicate =>" & LF);
+  Increase_Indent (S, 2);
+  Put (S, "Opt_" & Image (T) & " = Empty or else" & LF);
+  Put (S, "Opt_" & Image (T) & " in " & Id_Image (T));
+  Decrease_Indent (S, 2);
+   end if;
+
+   Put (S, ";" & LF);
+   Decrease_Indent (S, 2);
+end if;
+ end Put_Opt_Subtype;
+
   begin -- Put_Type_And_Subtypes
  Put_Enum_Type;
 
@@ -1544,7 +1571,20 @@ package body Gen_IL.Gen is
 end if;
  end loop;
 
- Put (S, "subtype Flag is Boolean;" & LF & LF);
+ Put (S, LF & "--  Optional subtypes of " & Id_Image (Root) & "." &
+  " These allow Empty." & LF & LF);
+
+ Iterate_Types (Root, Pre => Put_Opt_Subtype'Access);
+
+ Put (S, LF & "--  Optional union types:" & LF & LF);
+
+ for T in First_Abstract (Root) .. Last_Abstract (Root) loop
+if Type_Table (T) /= null and then Type_Table (T).Is_Union then
+   Put_Opt_Subtype (T);
+end if;
+ end loop;
+
+ Put (S, LF & "subtype Flag is Boolean;" & LF & LF);
   end Put_Type_And_Subtypes;
 
   function Low_Level_Getter_Name (T : Type_Enum) return String is

[Ada] bindgen: support additional features on targets suppressing the standard lib

2021-09-21 Thread Pierre-Marie de Rodat

For targets that suppress the standard library, the binder can now set
the default stack size and enable stack checking when GCC stack limit
are used.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* bindgen.adb (Gen_Adainit): For targets that suppress the
standard library: set the default stack size global variable if
a value is provided via the -d switch, and generate a call to
__gnat_initialize_stack_limit if stack checking using stack
limits is enabled.diff --git a/gcc/ada/bindgen.adb b/gcc/ada/bindgen.adb
--- a/gcc/ada/bindgen.adb
+++ b/gcc/ada/bindgen.adb
@@ -588,6 +588,27 @@ package body Bindgen is
 WBI ("");
  end if;
 
+ --  Import the default stack object if a size has been provided to the
+ --  binder.
+
+ if Opt.Default_Stack_Size /= Opt.No_Stack_Size then
+WBI ("  Default_Stack_Size : Integer;");
+WBI ("  pragma Import (C, Default_Stack_Size, " &
+ """__gl_default_stack_size"");");
+ end if;
+
+ --  Initialize stack limit variable of the environment task if the
+ --  stack check method is stack limit and stack check is enabled.
+
+ if Stack_Check_Limits_On_Target
+   and then (Stack_Check_Default_On_Target or Stack_Check_Switch_Set)
+ then
+WBI ("");
+WBI ("  procedure Initialize_Stack_Limit;");
+WBI ("  pragma Import (C, Initialize_Stack_Limit, " &
+ """__gnat_initialize_stack_limit"");");
+ end if;
+
  if System_Secondary_Stack_Package_In_Closure then
 --  System.Secondary_Stack is in the closure of the program
 --  because the program uses the secondary stack or the restricted
@@ -619,6 +640,15 @@ package body Bindgen is
 
  WBI ("   begin");
 
+ --  Set the default stack size if provided to the binder
+
+ if Opt.Default_Stack_Size /= Opt.No_Stack_Size then
+Set_String ("  Default_Stack_Size := ");
+Set_Int (Default_Stack_Size);
+Set_String (";");
+Write_Statement_Buffer;
+ end if;
+
  if Main_Priority /= No_Main_Priority then
 Set_String ("  Main_Priority := ");
 Set_Int(Main_Priority);
@@ -643,6 +673,7 @@ package body Bindgen is
  end if;
 
  if Main_Priority = No_Main_Priority
+   and then Opt.Default_Stack_Size = Opt.No_Stack_Size
and then Main_CPU = No_Main_CPU
and then not System_Tasking_Restricted_Stages_Used
  then

[Ada] Fix regression in ACATS bdd2006 and bdd2007

2021-09-21 Thread Pierre-Marie de Rodat

This fix is not strictly necessary to pass these ACATS tests, but this
improves the error message, and avoids updating expected outputs.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.adb (Stream_Size): Print message about allowed stream
sizes even if other error were already found. This avoids
falling into the 'else', which prints "Stream_Size cannot be
given for...", which is misleading -- the Size COULD be given if
it were correct.diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -7824,12 +7824,17 @@ package body Sem_Ch13 is
 if Duplicate_Clause then
null;
 
-elsif Is_Elementary_Type (U_Ent) and then Present (Size) then
-   if Size /= System_Storage_Unit
- and then Size /= System_Storage_Unit * 2
- and then Size /= System_Storage_Unit * 3
- and then Size /= System_Storage_Unit * 4
- and then Size /= System_Storage_Unit * 8
+elsif Is_Elementary_Type (U_Ent) then
+   --  Size will be empty if we already detected an error
+   --  (e.g. Expr is of the wrong type); we might as well
+   --  give the useful hint below even in that case.
+
+   if No (Size) or else
+ (Size /= System_Storage_Unit
+  and then Size /= System_Storage_Unit * 2
+  and then Size /= System_Storage_Unit * 3
+  and then Size /= System_Storage_Unit * 4
+  and then Size /= System_Storage_Unit * 8)
then
   Error_Msg_N
 ("stream size for elementary type must be 8, 16, 24, " &

[Ada] Set related expression for external DISCR symbols in Build_Temporary

2021-09-21 Thread Pierre-Marie de Rodat

This is required for CodePeer to use a better name for a variable, or a
constant created by GNAT.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_util.adb (Build_Temporary): In case of an external DISCR
symbol, set the related expression for CodePeer so that a more
comprehensible message can be emitted to the user.diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -11656,6 +11656,7 @@ package body Exp_Util is
   is
  Temp_Id  : Entity_Id;
  Temp_Nam : Name_Id;
+ Should_Set_Related_Expression : Boolean := False;
 
   begin
  --  The context requires an external symbol : expression is
@@ -11675,6 +11676,12 @@ package body Exp_Util is
 
 else
pragma Assert (Discr_Number > 0);
+
+   --  We don't have any intelligible way of printing T_DISCR in
+   --  CodePeer. Thus, set a related expression in this case.
+
+   Should_Set_Related_Expression := True;
+
--  Use fully qualified name to avoid ambiguities.
 
Temp_Nam :=
@@ -11684,6 +11691,10 @@ package body Exp_Util is
 
 Temp_Id := Make_Defining_Identifier (Loc, Temp_Nam);
 
+if Should_Set_Related_Expression then
+   Set_Related_Expression (Temp_Id, Related_Nod);
+end if;
+
  --  Otherwise generate an internal temporary
 
  else

[Ada] Crash on build of Initialization procedure for derived container

2021-09-21 Thread Pierre-Marie de Rodat

This patch fixes a compiler abort on the construction of the
initialization procedure for a private type completed by a derived
container type whose element type is another container with controlled
components with trivial initializations,

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch7.adb (Make_Init_Call): Add guard to protect against a
missing initialization procedure for a type.diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -9555,8 +9555,11 @@ package body Exp_Ch7 is
 
   --  If initialization procedure for an array of controlled objects is
   --  trivial, do not generate a useless call to it.
+  --  The initialization procedure may be missing altogether in the case
+  --  of a derived container whose components have trivial initialization.
 
-  if (Is_Array_Type (Utyp) and then Is_Trivial_Subprogram (Proc))
+  if No (Proc)
+or else (Is_Array_Type (Utyp) and then Is_Trivial_Subprogram (Proc))
 or else
   (not Comes_From_Source (Proc)
 and then Present (Alias (Proc))

[Ada] Cleanup old VxWorks in Makefile.rtl

2021-09-21 Thread Pierre-Marie de Rodat

The sections titled "PowerPC and e500v2 VxWorks 653" and "VxWorksae /
VxWorks 653 for x86 (vxsim)" in Makefile.rtl are removed since they are
no longer used. Also remove the relevant packages.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* Makefile.rtl: Remove unused VxWorks sections.
* libgnarl/s-vxwext__noints.adb: Remove.
* libgnarl/s-vxwext__vthreads.ads: Remove.
* libgnat/a-elchha__vxworks-ppc-full.adb: Remove.
* libgnat/s-osprim__vxworks.adb: Remove.
* libgnat/s-osvers__vxworks-653.ads: Remove.
* libgnat/system-vxworks-e500-vthread.ads: Remove.
* libgnat/system-vxworks-ppc-vthread.ads: Remove.
* libgnat/system-vxworks-x86-vthread.ads: Remove.

patch.diff.gz
Description: application/gzip

[Ada] Add assertions to Uintp (UI_Is_In_Int_Range)

2021-09-21 Thread Pierre-Marie de Rodat

This completes the previous change that added assertions to Uintp.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* uintp.ads, uintp.adb (UI_Is_In_Int_Range): Change the type of
the formal parameter to Valid_Uint. Remove code that preserved
the previous behavior, and replace it with an assertion. The
previous behavior is no longer needed given the recent change to
gigi.
(No, Present): Add comment.diff --git a/gcc/ada/uintp.adb b/gcc/ada/uintp.adb
--- a/gcc/ada/uintp.adb
+++ b/gcc/ada/uintp.adb
@@ -1693,16 +1693,15 @@ package body Uintp is
-- UI_Is_In_Int_Range --
-
 
-   function UI_Is_In_Int_Range (Input : Uint) return Boolean is
+   function UI_Is_In_Int_Range (Input : Valid_Uint) return Boolean is
+  pragma Assert (Present (Input));
+  --  Assertion is here in case we're called from C++ code, which does
+  --  not check the predicates.
begin
   --  Make sure we don't get called before Initialize
 
   pragma Assert (Uint_Int_First /= Uint_0);
 
-  if No (Input) then -- Preserve old behavior
- return True;
-  end if;
-
   if Direct (Input) then
  return True;
   else


diff --git a/gcc/ada/uintp.ads b/gcc/ada/uintp.ads
--- a/gcc/ada/uintp.ads
+++ b/gcc/ada/uintp.ads
@@ -90,6 +90,10 @@ package Uintp is
Uint_Minus_127 : constant Uint;
Uint_Minus_128 : constant Uint;
 
+   --  Functions for detecting No_Uint. Note that clients of this package
+   --  cannot use "=" and "/=" to compare with No_Uint; they must use No
+   --  and Present instead.
+
function No (X : Uint) return Boolean is (X = No_Uint);
--  Note that this is using the predefined "=", not the "=" declared below,
--  which would blow up on No_Uint.
@@ -169,10 +173,9 @@ package Uintp is
pragma Inline (UI_Gt);
--  Compares integer values for greater than
 
-   function UI_Is_In_Int_Range (Input : Uint) return Boolean;
+   function UI_Is_In_Int_Range (Input : Valid_Uint) return Boolean;
pragma Inline (UI_Is_In_Int_Range);
--  Determines if universal integer is in Int range.
-   --  Input should probably be of type Valid_Uint.
 
function UI_Le (Left : Valid_Uint; Right : Valid_Uint) return Boolean;
function UI_Le (Left : Int;  Right : Valid_Uint) return Boolean;

[Ada] Remove if_expression

2021-09-21 Thread Pierre-Marie de Rodat

Replace an if_expression with an if_statement, because codepeer is
tripping over the if_expression.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_eval.adb (Fold_Shift): Replace an if_expression with an
if_statement.diff --git a/gcc/ada/sem_eval.adb b/gcc/ada/sem_eval.adb
--- a/gcc/ada/sem_eval.adb
+++ b/gcc/ada/sem_eval.adb
@@ -5063,12 +5063,20 @@ package body Sem_Eval is
--  result is always positive, even if the original operand was
--  negative.
 
-   Fold_Uint
- (N,
-  (Expr_Value (Left) +
- (if Expr_Value (Left) >= Uint_0 then Uint_0 else Modulus))
-  / (Uint_2 ** Expr_Value (Right)),
-  Static => Static);
+   declare
+  M : Unat;
+   begin
+  if Expr_Value (Left) >= Uint_0 then
+ M := Uint_0;
+  else
+ M := Modulus;
+  end if;
+
+  Fold_Uint
+(N,
+ (Expr_Value (Left) + M) / (Uint_2 ** Expr_Value (Right)),
+ Static => Static);
+   end;
 end if;
  elsif Op = N_Op_Shift_Right_Arithmetic then
 Check_Elab_Call;

[Ada] Add assertions to Uintp

2021-09-21 Thread Pierre-Marie de Rodat

Add appropriate assertions to the operations in Uintp.  Most operations
disallow No_Uint. Division disallows Uint_0 on the right, and so on.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* uintp.ads, uintp.adb: Add assertions.
(Ubool, Opt_Ubool): New "boolean" subtypes.
(UI_Is_In_Int_Range): The parameter should probably be
Valid_Uint, but we don't change that for now, because it causes
failures in gigi.
* sem_util.ads, sem_util.adb (Is_True, Is_False,
Static_Boolean): Use Opt_Ubool subtype.  Document the fact that
Is_True (No_Uint) = True.  Implement Is_False in terms of
Is_True.  We considered changing Static_Boolean to return Uint_1
in case of error, but that doesn't fit in well with
Static_Integer.
(Has_Compatible_Alignment_Internal): Deal with cases where Offs
is No_Uint. Change one "and" to "and then" to ensure we don't
pass No_Uint to ">", which would violate the new assertions.
* exp_util.adb, freeze.adb, sem_ch13.adb: Avoid violating new
assertions in Uintp.

patch.diff.gz
Description: application/gzip

[Ada] Small optimization to DWARF 5 mode in System.Dwarf_Line

2021-09-21 Thread Pierre-Marie de Rodat

There is no need to fetch every string from the .debug_line_str section.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-dwalin.adb (To_File_Name): Fetch only the last string
from the .debug_line_str section.
(Symbolic_Address.Set_Result): Likewise.diff --git a/gcc/ada/libgnat/s-dwalin.adb b/gcc/ada/libgnat/s-dwalin.adb
--- a/gcc/ada/libgnat/s-dwalin.adb
+++ b/gcc/ada/libgnat/s-dwalin.adb
@@ -957,8 +957,10 @@ package body System.Dwarf_Lines is
 
  when DW_FORM_line_strp =>
 Read_Section_Offset (C.Lines, Off, C.Header.Is64);
-Seek (C.Line_Str, Off);
-Read_C_String (C.Line_Str, Buf);
+if J = File then
+   Seek (C.Line_Str, Off);
+   Read_C_String (C.Line_Str, Buf);
+end if;
 
  when others =>
 raise Dwarf_Error with "DWARF form not implemented";
@@ -1674,8 +1676,10 @@ package body System.Dwarf_Lines is
 
 when DW_FORM_line_strp =>
Read_Section_Offset (C.Lines, Off, C.Header.Is64);
-   Seek (C.Line_Str, Off);
-   File_Name := Read_C_String (C.Line_Str);
+   if J = Match.File then
+  Seek (C.Line_Str, Off);
+  File_Name := Read_C_String (C.Line_Str);
+   end if;
 
 when others =>
raise Dwarf_Error with "DWARF form not implemented";
@@ -1718,8 +1722,10 @@ package body System.Dwarf_Lines is
 
 when DW_FORM_line_strp =>
Read_Section_Offset (C.Lines, Off, C.Header.Is64);
-   Seek (C.Line_Str, Off);
-   Dir_Name := Read_C_String (C.Line_Str);
+   if J = Dir_Idx then
+  Seek (C.Line_Str, Off);
+  Dir_Name := Read_C_String (C.Line_Str);
+   end if;
 
 when others =>
raise Dwarf_Error with "DWARF form not implemented";

[Ada] Follow-up tweaks to System.Dwarf_Line

2021-09-21 Thread Pierre-Marie de Rodat

This fixes a couple of thinkos in the previous change.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-dwalin.adb (Skip_Form): Fix cases of DW_FORM_addrx
and DW_FORM_implicit_const.  Replace Constraint_Error with
Dwarf_Error.diff --git a/gcc/ada/libgnat/s-dwalin.adb b/gcc/ada/libgnat/s-dwalin.adb
--- a/gcc/ada/libgnat/s-dwalin.adb
+++ b/gcc/ada/libgnat/s-dwalin.adb
@@ -1114,8 +1114,6 @@ package body System.Dwarf_Lines is
   case Form is
  when DW_FORM_addr =>
 Skip := Offset (Ptr_Sz);
- when DW_FORM_addrx =>
-Skip := Offset (uint32'(Read_LEB128 (S)));
  when DW_FORM_block1 =>
 Skip := Offset (uint8'(Read (S)));
  when DW_FORM_block2 =>
@@ -1161,11 +1159,12 @@ package body System.Dwarf_Lines is
 begin
return;
 end;
- when DW_FORM_udata
-| DW_FORM_ref_udata
+ when DW_FORM_addrx
 | DW_FORM_loclistx
+| DW_FORM_ref_udata
 | DW_FORM_rnglistx
 | DW_FORM_strx
+| DW_FORM_udata
=>
 declare
Val : constant uint32 := Read_LEB128 (S);
@@ -1173,7 +1172,7 @@ package body System.Dwarf_Lines is
 begin
return;
 end;
- when DW_FORM_flag_present =>
+ when DW_FORM_flag_present | DW_FORM_implicit_const =>
 return;
  when DW_FORM_ref_addr
 | DW_FORM_sec_offset
@@ -1187,10 +1186,10 @@ package body System.Dwarf_Lines is
null;
 end loop;
 return;
- when DW_FORM_implicit_const | DW_FORM_indirect =>
-raise Constraint_Error;
+ when DW_FORM_indirect =>
+raise Dwarf_Error with "DW_FORM_indirect not implemented";
  when others =>
-raise Constraint_Error;
+raise Dwarf_Error with "DWARF form not implemented";
   end case;
 
   Seek (S, Tell (S) + Skip);

[Ada] exp_pakd.adb: work around spurious Codepeer warnings

2021-09-21 Thread Pierre-Marie de Rodat

Codepeer erroneously emits a warning for this if expression. Replacing
it with a statement is enough to make the problem disappear.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_pakd.adb (Expand_Packed_Not): Replace expression with
statement.diff --git a/gcc/ada/exp_pakd.adb b/gcc/ada/exp_pakd.adb
--- a/gcc/ada/exp_pakd.adb
+++ b/gcc/ada/exp_pakd.adb
@@ -2002,7 +2002,11 @@ package body Exp_Pakd is
   --  actual subtype of the operand. Preserve old behavior in case size is
   --  not set.
 
-  Size := (if Known_RM_Size (PAT) then RM_Size (PAT) else Uint_0);
+  if Known_RM_Size (PAT) then
+ Size := RM_Size (PAT);
+  else
+ Size := Uint_0;
+  end if;
   Lit := Make_Integer_Literal (Loc, 2 ** Size - 1);
   Set_Print_In_Hex (Lit);

[Ada] rtems: add 128bit support for aarch64

2021-09-21 Thread Pierre-Marie de Rodat

Add 128BITS integer support for aarch64-rtems6.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* Makefile.rtl (aarch64-rtems*): Add GNATRTL_128BIT_PAIRS to
the LIBGNAT_TARGET_PAIRS list and also GNATRTL_128BIT_OBJS to
the EXTRA_GNATRTL_NONTASKING_OBJS list.diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2196,6 +2196,11 @@ ifeq ($(strip $(filter-out rtems%,$(target_os))),)
 EH_MECHANISM=-gcc
   endif
 
+  ifeq ($(strip $(filter-out aarch64%,$(target_cpu))),)
+LIBGNAT_TARGET_PAIRS += $(GNATRTL_128BIT_PAIRS)
+EXTRA_GNATRTL_NONTASKING_OBJS += $(GNATRTL_128BIT_OBJS)
+  endif
+
   ifeq ($(strip $(filter-out aarch64% riscv%,$(target_cpu))),)
 LIBGNAT_TARGET_PAIRS += a-nallfl.ads

[Ada] Fix ignored dynamic predicates specified through "predicate" aspect

2021-09-21 Thread Pierre-Marie de Rodat

Before this patch, GNAT would ignore dynamic predicates specified
through the "predicate" pragma when attempting to evaluate expressions.
This would result in incorrect behavior in cases like the following:

   subtype SS is String (1 .. 4) with Predicate => SS (2) = 'e';
   pragma Assert ("" in SS);

Where the assert would not fail.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_eval.adb (Is_Static_Subtype): Take predicates created
through "predicate" pragma into account.diff --git a/gcc/ada/sem_eval.adb b/gcc/ada/sem_eval.adb
--- a/gcc/ada/sem_eval.adb
+++ b/gcc/ada/sem_eval.adb
@@ -5741,6 +5741,8 @@ package body Sem_Eval is
   elsif Has_Dynamic_Predicate_Aspect (Typ)
 or else (Is_Derived_Type (Typ)
   and then Has_Aspect (Typ, Aspect_Dynamic_Predicate))
+or else (Has_Aspect (Typ, Aspect_Predicate)
+  and then not Has_Static_Predicate (Typ))
   then
  return False;

[Ada] Presence of abstract operator function causes resolution problems

2021-09-21 Thread Pierre-Marie de Rodat

The declaration of an abstract function with an operator designator can
result in removing a nonhomographic user-defined operator as a possible
interpretation in an overloaded expression, leading to an error about
mismatched types.  The condition for marking an interpretation as being
a predefined operator that should be hidden by an abstract operator
function was incomplete, and only checked that the result was numeric,
without checking that the interpretation was actually for an operator.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch4.adb (Remove_Abstract_Operations): Add condition to
test for an E_Operator as part of criteria for setting
Abstract_Op on interpretations involving predefined operators.diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -8029,6 +8029,7 @@ package body Sem_Ch4 is
while Present (It.Nam) loop
   if Is_Numeric_Type (It.Typ)
 and then Scope (It.Typ) = Standard_Standard
+and then Ekind (It.Nam) = E_Operator
   then
  Set_Abstract_Op (I, Abstract_Op);
   end if;

[Ada] Interface behaves differently from abstract tagged null

2021-09-21 Thread Pierre-Marie de Rodat

When the result expression of a simple-return-statement is a type
conversion, and the tag of the expression differs from the tag of the
specific nonlimited return type, the frontend silently skips ensuring
that the tag of the returned object is that of the result type.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch6.adb (Expand_Simple_Function_Return): For explicit
dereference of type conversion, enable code that ensures that
the tag of the result is that of the result type.diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -7437,6 +7437,10 @@ package body Exp_Ch6 is
 and then not Is_Class_Wide_Type (Utyp)
 and then (Nkind (Exp) in
   N_Type_Conversion | N_Unchecked_Type_Conversion
+or else (Nkind (Exp) = N_Explicit_Dereference
+   and then Nkind (Prefix (Exp)) in
+  N_Type_Conversion |
+  N_Unchecked_Type_Conversion)
 or else (Is_Entity_Name (Exp)
and then Is_Formal (Entity (Exp
   then

[Ada] Clean up uses of Esize and RM_Size

2021-09-21 Thread Pierre-Marie de Rodat

This patch updates calls to Esize and RM_Size so they will work with the
new representation of "unknown" (i.e.  "not yet set").  The old
representation is "Uint_0".  The new one will be "initial zero bits".

The new representation is not yet installed; we are still using Uint_0.
A future change will fix that.

In some cases, we have to explicitly set the size to Uint_0 in order to
preserve the old behavior.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* einfo-utils.adb: Add support (currently disabled) for using
"initial zero" instead of "Uint_0" to represent "unknown".  Call
Known_ functions, instead of evilly duplicating their code
inline.
* fe.h (No_Uint_To_0): New function to convert No_Uint to
Uint_0, in order to preserve existing behavior.
(Copy_Esize, Copy_RM_Size): New imports from Einfo.Utils.
* cstand.adb: Set size fields of Standard_Debug_Renaming_Type
and Standard_Exception_Type.
* checks.adb, exp_attr.adb, exp_ch3.adb, exp_ch5.adb,
exp_ch6.adb, exp_pakd.adb, exp_util.adb, freeze.adb, itypes.adb,
layout.adb, repinfo.adb, sem_attr.adb, sem_ch12.adb,
sem_ch13.adb, sem_ch13.ads, sem_ch3.adb, sem_ch7.adb,
sem_util.adb: Protect calls with Known_..., use Copy_...  Remove
assumption that Uint_0 represents "unknown".
* types.ads (Nonzero_Int): New subtype.
* gcc-interface/decl.c, gcc-interface/trans.c: Protect calls
with Known_... and use Copy_...  as appropriate, to avoid
blowing up in unknown cases. Similarly, call No_Uint_To_0 to
preserve existing behavior.

patch.diff.gz
Description: application/gzip

[Ada] Enforce legality rule for Predicate_Failure aspect specifications

2021-09-21 Thread Pierre-Marie de Rodat

If a Predicate_Failure aspect is specified for a type or subtype, Ada
requires that either the Static_Predicate aspect or the
Dynamic_Predicate aspect must also be specified for that same type or
subtype. [The GNAT-defined Predicate aspect can also be used to meet
this requirement.] The point is that an aspect inherited from some other
source does not meet this requirment.  Add enforcement of this legality
rule.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.adb (Analyze_Aspect_Specifications): Add a new nested
function, Directly_Specified, and then use it in the
implementation of the required check.diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -1884,6 +1884,11 @@ package body Sem_Ch13 is
 --  expression is allowed. Includes checking that the expression
 --  does not raise Constraint_Error.
 
+function Directly_Specified
+  (Id : Entity_Id; A : Aspect_Id) return Boolean;
+--  Returns True if the given aspect is directly (as opposed to
+--  via any form of inheritance) specified for the given entity.
+
 function Make_Aitem_Pragma
   (Pragma_Argument_Associations : List_Id;
Pragma_Name  : Name_Id) return Node_Id;
@@ -2777,6 +2782,18 @@ package body Sem_Ch13 is
end if;
 end Check_Expr_Is_OK_Static_Expression;
 
+
+-- Directly_Specified --
+
+
+function Directly_Specified
+  (Id : Entity_Id; A : Aspect_Id) return Boolean
+is
+   Aspect_Spec : constant Node_Id := Find_Aspect (Id, A);
+begin
+   return Present (Aspect_Spec) and then Entity (Aspect_Spec) = Id;
+end Directly_Specified;
+
 ---
 -- Make_Aitem_Pragma --
 ---
@@ -3342,6 +3359,15 @@ package body Sem_Ch13 is
("Predicate_Failure requires previous predicate" &
 " specification", Aspect);
  goto Continue;
+
+  elsif not (Directly_Specified (E, Aspect_Dynamic_Predicate)
+or else Directly_Specified (E, Aspect_Static_Predicate)
+or else Directly_Specified (E, Aspect_Predicate))
+  then
+ Error_Msg_N
+   ("Predicate_Failure requires accompanying" &
+" noninherited predicate specification", Aspect);
+ goto Continue;
   end if;
 
   --  Construct the pragma

[Ada] Refactor sort procedures of doubly linked list containers

2021-09-21 Thread Pierre-Marie de Rodat

In earlier work, a performance problem was addressed by rewriting
Ada.Containers.Doubly_Linked_Lists.Generic_Sorting in a-cdlili.adb.  It
turned out that the very-slow-in-some-cases Sort algorithm formerly used
there was duplicated in 4 other units: the Bounded, Formal, Indefinite,
and Restricted versions. With this change, we use the better sorting
algorithm in all 5 cases while reducing code duplication.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-costso.ads, libgnat/a-costso.adb: A new library
unit, Ada.Containers.Stable_Sorting, which exports a pair of
generics (one within the other) which are instantiated by each
of the 5 doubly-linked list container generics to implement
their respective Sort procedures. We use a pair of generics,
rather than a single generic, in order to further reduce code
duplication. The outer generic takes a formal private Node_Ref
type representing a reference to a linked list element. For some
instances, the corresponding actual parameter will be an access
type; for others, it will be the index type for an array.
* Makefile.rtl: Include new Ada.Containers.Stable_Sorting unit.
* libgnat/a-cbdlli.adb, libgnat/a-cdlili.adb,
libgnat/a-cfdlli.adb, libgnat/a-cidlli.adb, libgnat/a-crdlli.adb
(Sort): Replace existing Sort implementation with a call to an
instance of
Ada.Containers.Stable_Sorting.Doubly_Linked_List_Sort. Declare
the (trivial) actual parameters needed to declare that instance.
* libgnat/a-cfdlli.ads: Fix a bug encountered during testing in
the postcondition for M_Elements_Sorted. With a partial
ordering, it is possible for all three of (X < Y), (Y < X),
and (X = Y) to be simultaneously false, so that case needs to
handled correctly.

patch.diff.gz
Description: application/gzip

[Ada] Update comment for Error_Msg_Internal

2021-09-21 Thread Pierre-Marie de Rodat

When Error_Msg_Internal parameters Sptr and Optr were renamed to Span
and Opan, its comment has not been updated.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* errout.adb (Error_Msg_Internal): Fix references to Sptr and
Optr in comment; fix grammar of "low-level" where it is used as
an adjective.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -106,15 +106,15 @@ package body Errout is
   Opan : Source_Span;
   Msg_Cont : Boolean;
   Node : Node_Id);
-   --  This is the low level routine used to post messages after dealing with
+   --  This is the low-level routine used to post messages after dealing with
--  the issue of messages placed on instantiations (which get broken up
-   --  into separate calls in Error_Msg). Sptr is the location on which the
+   --  into separate calls in Error_Msg). Span is the location on which the
--  flag will be placed in the output. In the case where the flag is on
--  the template, this points directly to the template, not to one of the
-   --  instantiation copies of the template. Optr is the original location
+   --  instantiation copies of the template. Opan is the original location
--  used to flag the error, and this may indeed point to an instantiation
-   --  copy. So typically we can see Optr pointing to the template location
-   --  in an instantiation copy when Sptr points to the source location of
+   --  copy. So typically we can see Opan pointing to the template location
+   --  in an instantiation copy when Span points to the source location of
--  the actual instantiation (i.e the line with the new). Msg_Cont is
--  set true if this is a continuation message. Node is the relevant
--  Node_Id for this message, to be used to compute the enclosing entity if

[Ada] Exception raised on empty file in GNATprove mode

2021-09-21 Thread Pierre-Marie de Rodat

Adapt computation of indexes in buffer for outputting error messages to
avoid an index out-of-bound exception on an empty file in GNATprove
mode.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* errout.adb (Get_Line_End): Do not allow the result to go past
the end of the buffer.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -2473,7 +2473,8 @@ package body Errout is
  function Get_Line_End
(Buf : Source_Buffer_Ptr;
 Loc : Source_Ptr) return Source_Ptr;
- --  Get the source location for the end of the line in Buf for Loc
+ --  Get the source location for the end of the line in Buf for Loc. If
+ --  Loc is past the end of Buf already, return Buf'Last.
 
  function Get_Line_Start
(Buf : Source_Buffer_Ptr;
@@ -2515,9 +2516,9 @@ package body Errout is
(Buf : Source_Buffer_Ptr;
 Loc : Source_Ptr) return Source_Ptr
  is
-Cur_Loc : Source_Ptr := Loc;
+Cur_Loc : Source_Ptr := Source_Ptr'Min (Loc, Buf'Last);
  begin
-while Cur_Loc <= Buf'Last
+while Cur_Loc < Buf'Last
   and then Buf (Cur_Loc) /= ASCII.LF
 loop
Cur_Loc := Cur_Loc + 1;

[Ada] Refine patch for spurious link error involving discriminated types

2021-09-21 Thread Pierre-Marie de Rodat

This patch handles properly the case of a Component_Definition appearing
in a Component_Declaration.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch3.adb (Process_Discriminant_Expressions): If the
constraint is for a Component_Definition that appears in a
Component_Declaration, the entity to be used to create the
potentially global symbol is the Defining_Identifier of the
Component_Declaration.diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -10502,13 +10502,30 @@ package body Sem_Ch3 is
if Expander_Active
  and then Comes_From_Source (Def)
  and then not Is_Subprogram (Current_Scope)
- and then Nkind (Parent (Def)) in
-   N_Object_Declaration | N_Component_Declaration
then
-  Force_Evaluation (
-Discr_Expr (J),
-Related_Id => Defining_Identifier (Parent (Def)),
-Discr_Number => J);
+  declare
+ Id : Entity_Id := Empty;
+  begin
+ if Nkind (Parent (Def)) = N_Object_Declaration then
+Id := Defining_Identifier (Parent (Def));
+
+ elsif Nkind (Parent (Def)) = N_Component_Definition
+   and then
+ Nkind (Parent (Parent (Def)))
+= N_Component_Declaration
+ then
+Id := Defining_Identifier (Parent (Parent (Def)));
+ end if;
+
+ if Present (Id) then
+Force_Evaluation (
+  Discr_Expr (J),
+  Related_Id => Id,
+  Discr_Number => J);
+ else
+Force_Evaluation (Discr_Expr (J));
+ end if;
+  end;
else
   Force_Evaluation (Discr_Expr (J));
end if;

[Ada] Remove "with GNAT.OS_Lib;" from libgnat/a-stbufi.ads

2021-09-21 Thread Pierre-Marie de Rodat

...and replace with System.OS_Lib, because we don't want things under
Ada to depend on GNAT.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-stbufi.ads, libgnat/a-stbufi.adb: Change all
occurrences of GNAT.OS_Lib to System.OS_Lib.diff --git a/gcc/ada/libgnat/a-stbufi.adb b/gcc/ada/libgnat/a-stbufi.adb
--- a/gcc/ada/libgnat/a-stbufi.adb
+++ b/gcc/ada/libgnat/a-stbufi.adb
@@ -45,7 +45,7 @@ package body Ada.Strings.Text_Buffers.Files is
end Put_UTF_8_Implementation;
 
function Create_From_FD
- (FD  : GNAT.OS_Lib.File_Descriptor;
+ (FD  : System.OS_Lib.File_Descriptor;
   Close_Upon_Finalization : Boolean := True) return File_Buffer
is
   use OS;


diff --git a/gcc/ada/libgnat/a-stbufi.ads b/gcc/ada/libgnat/a-stbufi.ads
--- a/gcc/ada/libgnat/a-stbufi.ads
+++ b/gcc/ada/libgnat/a-stbufi.ads
@@ -30,7 +30,7 @@
 --
 
 with Ada.Finalization;
-with GNAT.OS_Lib;
+with System.OS_Lib;
 
 package Ada.Strings.Text_Buffers.Files is
 
@@ -38,7 +38,7 @@ package Ada.Strings.Text_Buffers.Files is
--  Output written to a File_Buffer is written to the associated file.
 
function Create_From_FD
- (FD  : GNAT.OS_Lib.File_Descriptor;
+ (FD  : System.OS_Lib.File_Descriptor;
   Close_Upon_Finalization : Boolean := True)
  return File_Buffer;
--  file closed upon finalization if specified
@@ -47,9 +47,11 @@ package Ada.Strings.Text_Buffers.Files is
--  file closed upon finalization
 
function Create_Standard_Output_Buffer return File_Buffer is
- (Create_From_FD (GNAT.OS_Lib.Standout, Close_Upon_Finalization => False));
+ (Create_From_FD (System.OS_Lib.Standout,
+  Close_Upon_Finalization => False));
function Create_Standard_Error_Buffer return File_Buffer is
- (Create_From_FD (GNAT.OS_Lib.Standerr, Close_Upon_Finalization => False));
+ (Create_From_FD (System.OS_Lib.Standerr,
+  Close_Upon_Finalization => False));
 
 private
 
@@ -60,7 +62,7 @@ private
 
package Mapping is new Output_Mapping (Put_UTF_8_Implementation);
 
-   package OS renames GNAT.OS_Lib;
+   package OS renames System.OS_Lib;
 
type Self_Ref (Self : not null access File_Buffer)
  is new Finalization.Limited_Controlled with null record;

Re: [PATCH] Allow different vector types for stmt groups

2021-09-21 Thread H.J. Lu via Gcc-patches

On Mon, Sep 20, 2021 at 5:15 AM Richard Biener via Gcc-patches
 wrote:
>
> This allows vectorization (in practice non-loop vectorization) to
> have a stmt participate in different vector type vectorizations.
> It allows us to remove vect_update_shared_vectype and replace it
> by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> vect_analyze_stmt and vect_transform_stmt.
>
> For data-ref the situation is a bit more complicated since we
> analyze alignment info with a specific vector type in mind which
> doesn't play well when that changes.
>
> So the bulk of the change is passing down the actual vector type
> used for a vectorized access to the various accessors of alignment
> info, first and foremost dr_misalignment but also aligned_access_p,
> known_alignment_for_access_p, vect_known_alignment_in_bytes and
> vect_supportable_dr_alignment.  I took the liberty to replace
> ALL_CAPS macro accessors with the lower-case function invocations.
>
> The actual changes to the behavior are in dr_misalignment which now
> is the place factoring in the negative step adjustment as well as
> handling alignment queries for a vector type with bigger alignment
> requirements than what we can (or have) analyze(d).
>
> vect_slp_analyze_node_alignment makes use of this and upon receiving
> a vector type with a bigger alingment desire re-analyzes the DR
> with respect to it but keeps an older more precise result if possible.
> In this context it might be possible to do the analysis just once
> but instead of analyzing with respect to a specific desired alignment
> look for the biggest alignment we can compute a not unknown alignment.
>
> The ChangeLog includes the functional changes but not the bulk due
> to the alignment accessor API changes - I hope that's something good.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> CPU 2017 in progress (for stats and correctness).
>
> Any comments?
>
> Thanks,
> Richard.
>
> 2021-09-17  Richard Biener  
>
> PR tree-optimization/97351
> PR tree-optimization/97352
> PR tree-optimization/82426
> * tree-vectorizer.h (dr_misalignment): Add vector type
> argument.
> (aligned_access_p): Likewise.
> (known_alignment_for_access_p): Likewise.
> (vect_supportable_dr_alignment): Likewise.
> (vect_known_alignment_in_bytes): Likewise.  Refactor.
> (DR_MISALIGNMENT): Remove.
> (vect_update_shared_vectype): Likewise.
> * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
> a vector type with larger alignment requirement and apply
> the negative step adjustment here.
> (vect_calculate_target_alignment): Remove.
> (vect_compute_data_ref_alignment): Get explicit vector type
> argument, do not apply a negative step alignment adjustment
> here.
> (vect_slp_analyze_node_alignment): Re-analyze alignment
> when we re-visit the DR with a bigger desired alignment but
> keep more precise results from smaller alignments.
> * tree-vect-slp.c (vect_update_shared_vectype): Remove.
> (vect_slp_analyze_node_operations_1): Do not update the
> shared vector type on stmts.
> * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
> vector type of an SLP node to the representative stmt-info.
> (vect_transform_stmt): Likewise.
>
> * gcc.target/i386/vect-pr82426.c: New testcase.
> * gcc.target/i386/vect-pr97352.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/vect-pr82426.c |  32 +++
>  gcc/testsuite/gcc.target/i386/vect-pr97352.c |  22 ++
>  gcc/tree-vect-data-refs.c| 217 ++-
>  gcc/tree-vect-slp.c  |  59 -
>  gcc/tree-vect-stmts.c|  77 ---
>  gcc/tree-vectorizer.h|  32 ++-
>  6 files changed, 231 insertions(+), 208 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr82426.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr97352.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-pr82426.c 
> b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> new file mode 100644
> index 000..741a1d14d36
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> @@ -0,0 +1,32 @@
> +/* i?86 does not have V2SF, x32 does though.  */
> +/* { dg-do compile { target { lp64 || x32 } } } */

It should be target { ! ia32 }

> +/* ???  With AVX512 we only realize one FMA opportunity.  */

Hongtao, is AVX512 missing 64-bit vector support??

> +/* { dg-options "-O3 -mavx -mfma -mno-avx512f" } */
> +
> +struct Matrix
> +{
> +  float m11;
> +  float m12;
> +  float m21;
> +  float m22;
> +  float dx;
> +  float dy;
> +};
> +
> +struct Matrix multiply(const struct Matrix *a, const struct Matrix *b)
> +{
> +  struct Matrix out;
> +  out.m11 = a->m11*b->m11 + a->m12*b->m21;
> +  out.m12 = a->m11*b->m12 +

[PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-21 Thread Richard Biener via Gcc-patches

This makes defaults.h choose DWARF2_DEBUG if PREFERRED_DEBUGGING_TYPE
is not specified by the target and errors out if DWARF DWARF is not supported.

It also makes us warn when STABS is enabled but not the preferred
debugging type and removes the corresponding diagnostic from the Ada frontend.
The warnings are pruned from the testsuite output via prune_gcc_output.

The following target configurations now explicitely default to STABS:
 pdp11-*-*   pdp11 is a.out, dwarf support is difficult
 hppa[12]*-*-hpux10*  does not support DWARF
 hppa[12]*-*-hpux11*  likewise
note that the hppa configs have been deprecated.

Targets with DWARF support will now see
> ./cc1 -quiet t.c -gstabs
t.c: warning: STABS debugging information is obsolete and not supported anymore

that is, -gstabs will still generate STABS but use will be diagnosed
on targets where DWARF is available.

I have built all targets from contrib/config-list.mk to make sure we
don't run into the #error and the following makes the STABS usage
explicit for pdp11 and hppa with SOM.

This completes the series of deprecating STABS for GCC 12.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

2021-09-21  Richard Biener  

gcc/
* defaults.h (PREFERRED_DEBUGGING_TYPE): Choose DWARF2_DEBUG
when not set.
* toplev.c (process_options): Warn when STABS debugging is
enabled but not the preferred format.
* config/pa/som.h (PREFERRED_DEBUGGING_TYPE): Define to
DBX_DEBUG.
* config/pdp11/pdp11.h (PREFERRED_DEBUGGING_TYPE): Likewise.

gcc/ada/
* gcc-interface/misc.c (gnat_post_options): Do not warn
about DBX_DEBUG use here.

gcc/testsuite/
* lib/prune.exp: Prune STABS obsoletion message.
---
 gcc/ada/gcc-interface/misc.c |  6 --
 gcc/config/pa/som.h  |  4 
 gcc/config/pdp11/pdp11.h |  3 +++
 gcc/defaults.h   | 29 +
 gcc/testsuite/lib/prune.exp  |  3 +++
 gcc/toplev.c |  5 +
 6 files changed, 20 insertions(+), 30 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 96199bd4b63..87a4c8662cb 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -274,12 +274,6 @@ gnat_post_options (const char **pfilename ATTRIBUTE_UNUSED)
   if (!global_options_set.x_flag_diagnostics_show_caret)
 global_dc->show_caret = false;
 
-  /* Warn only if STABS is not the default: we don't want to emit a warning if
- the user did not use a -gstabs option.  */
-  if (PREFERRED_DEBUGGING_TYPE != DBX_DEBUG && write_symbols == DBX_DEBUG)
-warning (0, "STABS debugging information for Ada is obsolete and not "
-   "supported anymore");
-
   /* Copy global settings to local versions.  */
   gnat_encodings = global_options.x_gnat_encodings;
   optimize = global_options.x_optimize;
diff --git a/gcc/config/pa/som.h b/gcc/config/pa/som.h
index 05cc315b9f9..36a1122b132 100644
--- a/gcc/config/pa/som.h
+++ b/gcc/config/pa/som.h
@@ -21,6 +21,10 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_SOM
 #define TARGET_SOM 1
 
+/* With SOM we can only do STABS.  */
+#undef PREFERRED_DEBUGGING_TYPE
+#define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
+
 /* We do not use BINCL stabs in SOM.
??? If it does not hurt, we probably should to avoid useless divergence
from other embedded stabs implementations.  */
diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
index 9bc5e089f49..91a8ce70751 100644
--- a/gcc/config/pdp11/pdp11.h
+++ b/gcc/config/pdp11/pdp11.h
@@ -54,6 +54,9 @@ along with GCC; see the file COPYING3.  If not see
 
 #define DBX_DEBUGGING_INFO
 
+#undef PREFERRED_DEBUGGING_TYPE
+#define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
+
 #define TARGET_40_PLUS (TARGET_40 || TARGET_45)
 #define TARGET_10  (! TARGET_40_PLUS)
 
diff --git a/gcc/defaults.h b/gcc/defaults.h
index ba79a8e48ed..d7f2546f2cc 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -900,33 +900,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #define DEFAULT_GDB_EXTENSIONS 1
 #endif
 
-/* If more than one debugging type is supported, you must define
-   PREFERRED_DEBUGGING_TYPE to choose the default.  */
-
-#if 1 < (defined (DBX_DEBUGGING_INFO) \
- + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) \
- + defined (VMS_DEBUGGING_INFO))
+/* Default to DWARF2_DEBUGGING_INFO.  Legacy targets can choose different
+   by defining PREFERRED_DEBUGGING_TYPE.  */
 #ifndef PREFERRED_DEBUGGING_TYPE
-#error You must define PREFERRED_DEBUGGING_TYPE
-#endif /* no PREFERRED_DEBUGGING_TYPE */
-
-/* If only one debugging format is supported, define PREFERRED_DEBUGGING_TYPE
-   here so other code needn't care.  */
-#elif defined DBX_DEBUGGING_INFO
-#define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
-
-#elif defined DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
+#if defined

Re: [PATCH] darwin: support aarch64-darwin host

2021-09-21 Thread Yuta Saito via Gcc-patches

Hello Iain,

Thank you for taking a look at my patch.

>
> Hello Yuta
>
> thanks for your patch and interest.
>
> > On 17 Sep 2021, at 16:26, Yuta Saito via Gcc-patches 
> >  wrote:
>
> > Currently, building gcc for aarch64-darwin host fails due to missing
> > host_hooks definition.
> >
> > This patch adds host_hooks definition for aarch64-darwin.
> > aarch64-darwin is not supported as a target yet, but this allows using
> > gcc cross-compiler on aarch64-darwin.
>
> 1. The development prototype for aarch64-darwin is here:
>https://github.com/iains/gcc-darwin-arm64
>
> we need to phase the work into master with the approval of the Arm
> maintainers - so I would recommend that if you have fixes or improvements
> in the short-term, to make pull requests against the development branch.
>
> 2.
> The patch you have presented is identical in the host-only content to
> the existing development one:
>
> https://github.com/iains/gcc-darwin-arm64/commit/2190f7bda7bc0e6d0b74c7bd41c97510a685b06b
>
> So, that aspect has already been handled in the development.
>

Oh, I didn't know that development repository.

> > I confirmed linking gcc-cross succeed on aarch64 darwin.
>
> Patches need more testing than that - specifically, that they do not
> regress other targets (unlikely, in this case, but it is good form to
> test at least that aarch64-linux-gnu is unaffected).
>
> 
>
> There are two other small changes I’d like to check before enabling host 
> support
> 1/
> https://github.com/iains/gcc-darwin-arm64/commit/98c8f79929db1bf29ac52f748137b08c21976483
> might be needed for cross tools too.
>
> 2/
> We need to ensure that PCH is defaulted to “off” and that there is a proper 
> warning if
> the user tries to configure it “on” see:
> https://github.com/iains/gcc-darwin-arm64/issues/2
> (which we do not plan to fix in the short term).
>

Thank you for your advice. I'll apply these patches.

> 
>
> There is clearly interest in building cross-compilers on aarch64-darwin, so I 
> will try
> to phase the host support sooner rather than later,
>
> thanks again for the patch,
> Iain
>

I hope it will reach the upstream in the near future.
Thanks again for working on the darwin-arm64 support.

Yuta

RE: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Roger Sayle

Thanks.  It's a minefield.  I'm regretting using a division by zero example
(as it unintentionally drags in the whole undefined behaviour thing).

How about my recently added g++.dg/pr88173-1.C, is the following valid,
i.e. const, if ordered comparisons against NaNs should raise an exception
with (the default) -ftrapping-math?  Or only with -fno-trapping-math?

constexpr bool my_false1 = __builtin_nan("") < 1.0;
constexpr bool my_false2 = __builtin_nans("") < 1.0;

I think Richard Beiner was right; some trapping math is acceptable and
other trapping math is unacceptable, with (currently) no distinction
(in the middle-end) between the two.

Perhaps the g++ front-end relies on/is abusing flag_trapping_math
as a proxy to ensure that division by zero is undefined (is unfolded).

Cheers,
Roger
--

-Original Message-
From: Jakub Jelinek  
Sent: 21 September 2021 14:22
To: Roger Sayle ; Jason Merrill
; Jonathan Wakely 
Cc: 'Xi Ruoyao' ; 'GCC Patches'

Subject: Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

On Tue, Sep 21, 2021 at 02:15:59PM +0100, Roger Sayle wrote:
> Can you double check?  Integer division by zero is undefined, but 
> isn't floating point division by zero defined by the appropriate IEEE
standards?

https://eel.is/c++draft/expr.mul#4 doesn't make the division by zero
behavior conditional on integral types.
C has similar wording.

Jakub

Re: [COMMITTED] Use EDGE_EXECUTABLE in ranger and return UNDEFINED for those edges.

2021-09-21 Thread Andrew MacLeod via Gcc-patches


On 9/21/21 9:32 AM, Richard Biener wrote:

On Tue, Sep 21, 2021 at 2:57 PM Andrew MacLeod  wrote:

On 9/21/21 2:14 AM, Richard Biener wrote:

On Tue, Sep 21, 2021 at 8:09 AM Richard Biener
 wrote:

On Tue, Sep 21, 2021 at 12:01 AM Andrew MacLeod via Gcc-patches
 wrote:

The patch sets the EXECUTABLE property on edges like VRP does, and then
removes that flag when an edge is determined to be un-executable.

This information is then used to return UNDEFINED for any requests on
un-executable edges, and to register equivalencies if all executable
edges of a PHI node are the same SSA_NAME.

This catches up a number of the cases VRP gets that ranger was missing,
and reduces the EVRP discrepancies to almost 0.

On a side note,  is there any interest/value in reversing the meaning of
that flag?  It seems to me that we could assume edges are EXECUTABLE by
default, then set a NON_EXECUTABLE flag when a pass determines the edge
cannot be executed.  This would rpevent a number fo passes from having
to loop through all the edges and set the EXECUTABLE property...   It
just seems backwards to me.

The flag is simply not kept up-to-date and it's the passes responsibility to
make use of it (aka install a default state upon entry).

To me not having EDGE_EXECUTABLE set on entry is more natural
for optimistic propagation passes, but yes, if you do on-demand greedy
processing then you need a conservative default.  But then how do you
denote a 'VARYING' (executable) state that may not drop back to 'CONSTANT"
(not executable)?  For optimistic propagation EDGE_EXECUTABLE set is
simply the varying state and since we never clear it again there's no chance
of oscillation.

Different model, we dont have a lattice whereby we track state and move
form one to another.. we just track currently best known values for
everything and recalculate them when the old values are stale.   We move
the edge to unexecutable when those values allow us to rewrite a branch
such that an edge can no longer be taken. everything else is executable.
Any values on an unexecutable edge are then considered UNDEFINED when
combined with other values..


Btw, I fail to see how the patch makes ranger assure a sane initial state of
EDGE_EXECUTABLE (or make its use conditional).  Is the code you patched
not also used on-demand?

THe constructor for a ranger makes everything executable to start.
Calls the same routine VRP does.

   gimple_ranger::gimple_ranger () : tracer ("")
   {
@@ -41,6 +42,7 @@ gimple_ranger::gimple_ranger () : tracer ("")
 m_oracle = m_cache.oracle ();
 if (dump_file && (param_evrp_mode & EVRP_MODE_TRACE))
   tracer.enable_trace ();
+  set_all_edges_as_executable (cfun);
   }

Ah, I see.  I had the impression that with ranger we can now
do a cheap query everywhere on the range of an SSA name.  But then
the above is O(CFG size)...


One of the reasons I'd like to see it persistent :-)  We could 
alternatively add another new one, something like EDGE_NEVER_EXECUTED 
which is cleared by default when created and only ranger/other 
interested passes utilize it and it is kept persistent.   Just seems 
more appropriate to "fix" the current flag. I took a quick look at that, 
but it seemed like one or more of the propagation passes may use the 
flag for other nefarious purposes. It would require fixing everyone to 
maintain the value properly.


 Queries are still "cheap", but there are varying amounts of lookups 
and allocations that are done.  If the lack of a persistent EXECUTABLE 
edge flag continues, I may make some further tweaks and make it 
sensitive to whether EXECUTABLE is to be looked at or not and perhaps 
only have the VRPs initiate that.  I prefer avoiding different modes 
when possible tho.


Currently most/all uses of ranger are instantiated and used for the 
duration of a pass, so the O(cfg) is pretty minimal with all the CFG 
traversing and caching required.





I guess I'm confusing something - but yes, clearly in a ranger VRP "pass"
that all sounds OK.

Richard.

Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Xi Ruoyao via Gcc-patches

On Tue, 2021-09-21 at 21:38 +0800, Xi Ruoyao via Gcc-patches wrote:

> BTW the "correct" way to get a NaN in C++ seems:
> 
> #include 
> constexpr double my_nan = std::numeric_limits::quiet_NaN();

Sorry, we were discussing inf, not NaN...  Then

constexpr double my_inf = std::numeric_limits::infinity();

If the C++ impl supports ISO/IEC 60559 (an alias of IEEE 754), it will
produce a positive infinity. Otherwise the compilation will fail.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Xi Ruoyao via Gcc-patches

On Tue, 2021-09-21 at 14:15 +0100, Roger Sayle wrote:
> 
> Can you double check?  Integer division by zero is undefined, but isn't
> floating point
> division by zero defined by the appropriate IEEE standards?

It's an undefined behavior in C++ standard.  Even if we assume C++ *was*
IEEE-754 conforming (C++ standard is not saying "C++ is IEEE-754
conforming"!), it would be still not "something can be evaluated in
compile time".  IEEE 754 says the default behavior (it can be altered
with some way, for example using feenableexcept() on systems with Glibc)
of floating div-by-zero with both operands finite is "The divideByZero
exception shall be signaled", and:

"The operations of this standard, when exceptional, can as a side effect
raise some of the following status flags: inexact, underflow, overflow,
divideByZero, and invalid operation."

And this is a side effect in real-life: those status flags are normally
implemented as some bits in the floating point status register of the
CPU and can be queried, for example ISO C defined fegetexceptflag() to
do this.

So it's a side effect anyway.  You can't evaluate a side-effect during
compilation.

BTW the "correct" way to get a NaN in C++ seems:

#include 
constexpr double my_nan = std::numeric_limits::quiet_NaN();

If the C++ impl supports ISO/IEC 60559 (an alias of IEEE 754), it will
produce a quiet NaN which can be used in constexpr (you can use
signal_NaN() instead of quiet_NaN() if you want a signaling NaN). 
Otherwise the compilation will fail.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Richard Biener via Gcc-patches

On Tue, Sep 21, 2021 at 3:07 PM Xi Ruoyao via Gcc-patches
 wrote:
>
> On Tue, 2021-09-21 at 12:41 +0100, Roger Sayle wrote:
> >
> > I was wondering if I may ask the C++ language experts for their
> > opinion
> > on whether (potential) floating point exceptions/traps can be ignored
> > in constant expressions; this is related to PR c++/96862.  I think my
> > question boils down to whether (or not) the following is valid C++:
> >
> > constexpr float my_inf = 7.0 / 0.0;
>
> It's not.  C++ disallows constexpr from invoking undefined behaviors in
> Clauses 4 through 19, while division by zero is an undefined behavior in
> Clause 8.
>
> > [This is currently an error with "-O2", but OK with "-O2 -ffast-
> > math"!]
>
> > There's a long history of g++'s semantics being accidentally tied to
> > the middle-end's constant folding, such that the current status quo
> > is that some middle-end bugs can't be fixed without breaking C++,
> > and vice versa.  I'm hoping that the patch below (following Jakub's
> > lead with rounding math) might be a next step to improving things,
> > provided that my understanding of the desired/correct behaviour of
> > the C++ front-end is correct.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with a "make
> > bootstrap"
> > and "make -k check" with no new failures after tweaking two checks in
> > g++.dg/ubsan/pr63956.C.
>
> > With this change the middle-end can become more
> > strict about respecting flag_trapping_math without affecting C++'s
> > behavior.  Ideally, what the front-end considers valid should be
> > independent of whether the user specified -fno-trapping-math (or
> > -ffast-math) to the middle-end.
>
> I think we can allow a constexpr to contain floating div-by-zero with -
> fno-trapping-math, if we consider -fno-trapping-math make the behavior
> "no longer undefined" but I'm not sure.

Btw, -ftrapping-math also makes inexact operations possibly trapping
so I'm not sure if any combination of -ftrapping-math and -ffinite-math-only
can capture what C++ requires here.

> However -ffast-math also enables -ffinite-math-only, which makes div-by-
> zero absolutely undefined.  So to me the expected behavior is:
>
> g++ t.cc   -> Compile error
> g++ t.cc -ffast-math   -> Compile error
> g++ t.cc -fno-trapping-math-> Ok
> g++ t.cc -fno-trapping-math -ffinite-math-only -> Compile Error
>
> > Thoughts?  Ok for mainline?
>
> Based on the reasoning above I think it's not OK.  But anyway I'm not a
> maintainer.
>
> > 2021-09-21  Roger Sayle  
> >
> > gcc/cp/ChangeLog
> > * constexpr.c (cxx_eval_outermost_const_expr): Temporarily
> > disable
> > the middle-end from honoring floating point exceptions/traps
> > while
> > folding "manifestly constant" expressions.
> >
> > gcc/testsuite/ChangeLog
> > * g++.dg/ubsan/pr63956.C: Update to (always) allow floating
> > point
> > division in constexpr (if both operands are constexpr).
> >
> > Roger
> > --
> >
>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University

Re: [COMMITTED] Use EDGE_EXECUTABLE in ranger and return UNDEFINED for those edges.

2021-09-21 Thread Richard Biener via Gcc-patches

On Tue, Sep 21, 2021 at 2:57 PM Andrew MacLeod  wrote:
>
> On 9/21/21 2:14 AM, Richard Biener wrote:
> > On Tue, Sep 21, 2021 at 8:09 AM Richard Biener
> >  wrote:
> >> On Tue, Sep 21, 2021 at 12:01 AM Andrew MacLeod via Gcc-patches
> >>  wrote:
> >>>
> >>> The patch sets the EXECUTABLE property on edges like VRP does, and then
> >>> removes that flag when an edge is determined to be un-executable.
> >>>
> >>> This information is then used to return UNDEFINED for any requests on
> >>> un-executable edges, and to register equivalencies if all executable
> >>> edges of a PHI node are the same SSA_NAME.
> >>>
> >>> This catches up a number of the cases VRP gets that ranger was missing,
> >>> and reduces the EVRP discrepancies to almost 0.
> >>>
> >>> On a side note,  is there any interest/value in reversing the meaning of
> >>> that flag?  It seems to me that we could assume edges are EXECUTABLE by
> >>> default, then set a NON_EXECUTABLE flag when a pass determines the edge
> >>> cannot be executed.  This would rpevent a number fo passes from having
> >>> to loop through all the edges and set the EXECUTABLE property...   It
> >>> just seems backwards to me.
> >> The flag is simply not kept up-to-date and it's the passes responsibility 
> >> to
> >> make use of it (aka install a default state upon entry).
> >>
> >> To me not having EDGE_EXECUTABLE set on entry is more natural
> >> for optimistic propagation passes, but yes, if you do on-demand greedy
> >> processing then you need a conservative default.  But then how do you
> >> denote a 'VARYING' (executable) state that may not drop back to 'CONSTANT"
> >> (not executable)?  For optimistic propagation EDGE_EXECUTABLE set is
> >> simply the varying state and since we never clear it again there's no 
> >> chance
> >> of oscillation.
> Different model, we dont have a lattice whereby we track state and move
> form one to another.. we just track currently best known values for
> everything and recalculate them when the old values are stale.   We move
> the edge to unexecutable when those values allow us to rewrite a branch
> such that an edge can no longer be taken. everything else is executable.
>Any values on an unexecutable edge are then considered UNDEFINED when
> combined with other values..
>
> > Btw, I fail to see how the patch makes ranger assure a sane initial state of
> > EDGE_EXECUTABLE (or make its use conditional).  Is the code you patched
> > not also used on-demand?
>
> THe constructor for a ranger makes everything executable to start.
> Calls the same routine VRP does.
>
>   gimple_ranger::gimple_ranger () : tracer ("")
>   {
> @@ -41,6 +42,7 @@ gimple_ranger::gimple_ranger () : tracer ("")
> m_oracle = m_cache.oracle ();
> if (dump_file && (param_evrp_mode & EVRP_MODE_TRACE))
>   tracer.enable_trace ();
> +  set_all_edges_as_executable (cfun);
>   }

Ah, I see.  I had the impression that with ranger we can now
do a cheap query everywhere on the range of an SSA name.  But then
the above is O(CFG size)...

I guess I'm confusing something - but yes, clearly in a ranger VRP "pass"
that all sounds OK.

Richard.

> and EVRP also goes to this same effort at the start:
>
> evrp_range_analyzer::evrp_range_analyzer (bool update_global_ranges)
>: stack (10), m_update_global_ranges (update_global_ranges)
> {
>edge e;
>edge_iterator ei;
>basic_block bb;
>FOR_EACH_BB_FN (bb, cfun)
>  {
>bb->flags &= ~BB_VISITED;
>FOR_EACH_EDGE (e, ei, bb->preds)
>  e->flags |= EDGE_EXECUTABLE;
>  }
> }
>
> Ranger isn't doing anything different than the ciurrent VRP's do, so I
> don't see any distinguishing difference between optimistic and
> pessimistic.  We can guarantee that we will only clear the EXECUTABLE
> flag if the edge is 100% guaranteed to be not executable.  Perhaps other
> passes/approaches aren't able to provide such guarantees in which case
> having a persistent flag wouldnt work very well..
>
> > That is, are you sure you're not running into wrong-code issues if you
> > for example in passes.c:execute_one_pass right before passes
> > initialize EDGE_EXECUTABLE to a random state on each edge of the function?
> >
> > Richard.
>
> We dont really have multiple instances running, and I'm not sure that
> should be encouraged either.  Worst case is that a new instantiation
> would "undo" some of the edges that may have been marked as unexecutable
>
> We'd only run into problems if we try to use a ranger from some pass
> that is using EXECUTABLE in a different way.. ie,. to not mean
> EXECUTABLE.   but that seems even more problematic to me, and those uses
> ought to be fixed.
>
>
>
> >> Richard.
> >>
> >>> Anyway, bootstrapped on x86_64-pc-linux-gnu with no regressions. pushed.
> >>>
> >>> Andrew
> >>>
>

[PATCH] c++: concept-ids and value-dependence [PR102412]

2021-09-21 Thread Patrick Palka via Gcc-patches

Here since uses_template_parms returns true for all concept-ids, when a
concept-id is used as a default template argument then during deduction
the default argument is considered dependent even after substitution,
which leads to deduction failure (from type_unification_real).

This patch fixes this by implementing the proposed resolution of CWG 2446
which says a concept-id is dependent only if its arguments are.

Bootstrapped and regtestedon x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps 11?  Also tested on cmcstl2 and range-v3.

DR 2446
PR c++/102412

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_constant_expression)
: Check value_dependent_expression_p
instead of processing_template_decl.
* pt.c (value_dependent_expression_p) :
Return true only if any_dependent_template_arguments_p.
(instantiation_dependent_r) : Remove this case.
: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-nondep2.C: New test.
* g++.dg/cpp2a/concepts-nondep3.C: New test.
---
 gcc/cp/constexpr.c|  2 +-
 gcc/cp/pt.c   | 15 ++---
 gcc/testsuite/g++.dg/cpp2a/concepts-nondep2.C | 21 +++
 gcc/testsuite/g++.dg/cpp2a/concepts-nondep3.C |  9 
 4 files changed, 33 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nondep2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nondep3.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 8a5dd067bcb..18d9d117a48 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -7117,7 +7117,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
break;
  }
 
-   if (!processing_template_decl
+   if (!value_dependent_expression_p (t)
&& !uid_sensitive_constexpr_evaluation_p ())
  r = evaluate_concept_check (t);
else
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 22c74da7935..17386286269 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -27191,7 +27191,8 @@ value_dependent_expression_p (tree expression)
   }
 
 case TEMPLATE_ID_EXPR:
-  return concept_definition_p (TREE_OPERAND (expression, 0));
+  return concept_definition_p (TREE_OPERAND (expression, 0))
+   && any_dependent_template_arguments_p (TREE_OPERAND (expression, 1));
 
 case CONSTRUCTOR:
   {
@@ -27598,18 +27599,6 @@ instantiation_dependent_r (tree *tp, int 
*walk_subtrees,
 case REQUIRES_EXPR:
   return *tp;
 
-case CALL_EXPR:
-  /* Treat concept checks as dependent. */
-  if (concept_check_p (*tp))
-return *tp;
-  break;
-
-case TEMPLATE_ID_EXPR:
-  /* Treat concept checks as dependent.  */
-  if (concept_check_p (*tp))
-   return *tp;
-  break;
-
 case CONSTRUCTOR:
   if (CONSTRUCTOR_IS_DEPENDENT (*tp))
return *tp;
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-nondep2.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-nondep2.C
new file mode 100644
index 000..e9867f825a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-nondep2.C
@@ -0,0 +1,21 @@
+// PR c++/102412
+// { dg-do link { target c++20 } }
+
+template concept C = __is_same(T, U);
+
+template> void f();
+template<> void f() { }
+template<> void f() { }
+
+template> void g();
+template<> void g() { }
+
+template> void h();
+template<> void h() { }
+
+int main() {
+  f();
+  f();
+  g();
+  h();
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-nondep3.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-nondep3.C
new file mode 100644
index 000..ec2d60a43b3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-nondep3.C
@@ -0,0 +1,9 @@
+// DR 2446
+// { dg-do compile { target c++20 } }
+
+template  concept C = true;
+template  struct A;
+template <> struct A { using type = bool; };
+
+template 
+void f(A)>::type); // OK: no 'typename' needed
-- 
2.33.0.363.g4c719308ce

Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Jakub Jelinek via Gcc-patches

On Tue, Sep 21, 2021 at 02:15:59PM +0100, Roger Sayle wrote:
> Can you double check?  Integer division by zero is undefined, but isn't 
> floating point
> division by zero defined by the appropriate IEEE standards?

https://eel.is/c++draft/expr.mul#4 doesn't make the division by zero
behavior conditional on integral types.
C has similar wording.

Jakub

RE: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Roger Sayle



Can you double check?  Integer division by zero is undefined, but isn't 
floating point
division by zero defined by the appropriate IEEE standards?

Roger
--

-Original Message-
From: Xi Ruoyao  
Sent: 21 September 2021 14:07
To: Roger Sayle ; 'GCC Patches' 

Subject: Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

On Tue, 2021-09-21 at 12:41 +0100, Roger Sayle wrote:
> 
> I was wondering if I may ask the C++ language experts for their 
> opinion on whether (potential) floating point exceptions/traps can be 
> ignored in constant expressions; this is related to PR c++/96862.  I 
> think my question boils down to whether (or not) the following is 
> valid C++:
> 
> constexpr float my_inf = 7.0 / 0.0;

It's not.  C++ disallows constexpr from invoking undefined behaviors in Clauses 
4 through 19, while division by zero is an undefined behavior in Clause 8.

> [This is currently an error with "-O2", but OK with "-O2 -ffast- 
> math"!]

> There's a long history of g++'s semantics being accidentally tied to 
> the middle-end's constant folding, such that the current status quo is 
> that some middle-end bugs can't be fixed without breaking C++, and 
> vice versa.  I'm hoping that the patch below (following Jakub's lead 
> with rounding math) might be a next step to improving things, provided 
> that my understanding of the desired/correct behaviour of the C++ 
> front-end is correct.
> 
> This patch has been tested on x86_64-pc-linux-gnu with a "make 
> bootstrap"
> and "make -k check" with no new failures after tweaking two checks in
> g++.dg/ubsan/pr63956.C.

> With this change the middle-end can become more strict about 
> respecting flag_trapping_math without affecting C++'s behavior.  
> Ideally, what the front-end considers valid should be independent of 
> whether the user specified -fno-trapping-math (or
> -ffast-math) to the middle-end.

I think we can allow a constexpr to contain floating div-by-zero with - 
fno-trapping-math, if we consider -fno-trapping-math make the behavior "no 
longer undefined" but I'm not sure.

However -ffast-math also enables -ffinite-math-only, which makes div-by- zero 
absolutely undefined.  So to me the expected behavior is:

g++ t.cc   -> Compile error
g++ t.cc -ffast-math   -> Compile error
g++ t.cc -fno-trapping-math-> Ok
g++ t.cc -fno-trapping-math -ffinite-math-only -> Compile Error

> Thoughts?  Ok for mainline?

Based on the reasoning above I think it's not OK.  But anyway I'm not a 
maintainer.

> 2021-09-21  Roger Sayle  
> 
> gcc/cp/ChangeLog
> * constexpr.c (cxx_eval_outermost_const_expr): Temporarily 
> disable
> the middle-end from honoring floating point exceptions/traps 
> while
> folding "manifestly constant" expressions.
> 
> gcc/testsuite/ChangeLog
> * g++.dg/ubsan/pr63956.C: Update to (always) allow floating 
> point
> division in constexpr (if both operands are constexpr).
> 
> Roger
> --
> 

--
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [RFC/PATCH] C++ constexpr vs. floating point exceptions.

2021-09-21 Thread Xi Ruoyao via Gcc-patches

On Tue, 2021-09-21 at 12:41 +0100, Roger Sayle wrote:
> 
> I was wondering if I may ask the C++ language experts for their
> opinion
> on whether (potential) floating point exceptions/traps can be ignored
> in constant expressions; this is related to PR c++/96862.  I think my
> question boils down to whether (or not) the following is valid C++:
> 
> constexpr float my_inf = 7.0 / 0.0;

It's not.  C++ disallows constexpr from invoking undefined behaviors in
Clauses 4 through 19, while division by zero is an undefined behavior in
Clause 8.

> [This is currently an error with "-O2", but OK with "-O2 -ffast-
> math"!]

> There's a long history of g++'s semantics being accidentally tied to
> the middle-end's constant folding, such that the current status quo
> is that some middle-end bugs can't be fixed without breaking C++,
> and vice versa.  I'm hoping that the patch below (following Jakub's
> lead with rounding math) might be a next step to improving things,
> provided that my understanding of the desired/correct behaviour of
> the C++ front-end is correct.
> 
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap"
> and "make -k check" with no new failures after tweaking two checks in
> g++.dg/ubsan/pr63956.C.

> With this change the middle-end can become more
> strict about respecting flag_trapping_math without affecting C++'s
> behavior.  Ideally, what the front-end considers valid should be
> independent of whether the user specified -fno-trapping-math (or
> -ffast-math) to the middle-end.

I think we can allow a constexpr to contain floating div-by-zero with -
fno-trapping-math, if we consider -fno-trapping-math make the behavior
"no longer undefined" but I'm not sure.

However -ffast-math also enables -ffinite-math-only, which makes div-by-
zero absolutely undefined.  So to me the expected behavior is:

g++ t.cc   -> Compile error
g++ t.cc -ffast-math   -> Compile error
g++ t.cc -fno-trapping-math-> Ok
g++ t.cc -fno-trapping-math -ffinite-math-only -> Compile Error

> Thoughts?  Ok for mainline?

Based on the reasoning above I think it's not OK.  But anyway I'm not a
maintainer.

> 2021-09-21  Roger Sayle  
> 
> gcc/cp/ChangeLog
> * constexpr.c (cxx_eval_outermost_const_expr): Temporarily
> disable
> the middle-end from honoring floating point exceptions/traps
> while
> folding "manifestly constant" expressions.
> 
> gcc/testsuite/ChangeLog
> * g++.dg/ubsan/pr63956.C: Update to (always) allow floating
> point
> division in constexpr (if both operands are constexpr).
> 
> Roger
> --
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [COMMITTED] Use EDGE_EXECUTABLE in ranger and return UNDEFINED for those edges.

2021-09-21 Thread Andrew MacLeod via Gcc-patches


On 9/21/21 2:14 AM, Richard Biener wrote:

On Tue, Sep 21, 2021 at 8:09 AM Richard Biener
 wrote:

On Tue, Sep 21, 2021 at 12:01 AM Andrew MacLeod via Gcc-patches
 wrote:


The patch sets the EXECUTABLE property on edges like VRP does, and then
removes that flag when an edge is determined to be un-executable.

This information is then used to return UNDEFINED for any requests on
un-executable edges, and to register equivalencies if all executable
edges of a PHI node are the same SSA_NAME.

This catches up a number of the cases VRP gets that ranger was missing,
and reduces the EVRP discrepancies to almost 0.

On a side note,  is there any interest/value in reversing the meaning of
that flag?  It seems to me that we could assume edges are EXECUTABLE by
default, then set a NON_EXECUTABLE flag when a pass determines the edge
cannot be executed.  This would rpevent a number fo passes from having
to loop through all the edges and set the EXECUTABLE property...   It
just seems backwards to me.

The flag is simply not kept up-to-date and it's the passes responsibility to
make use of it (aka install a default state upon entry).

To me not having EDGE_EXECUTABLE set on entry is more natural
for optimistic propagation passes, but yes, if you do on-demand greedy
processing then you need a conservative default.  But then how do you
denote a 'VARYING' (executable) state that may not drop back to 'CONSTANT"
(not executable)?  For optimistic propagation EDGE_EXECUTABLE set is
simply the varying state and since we never clear it again there's no chance
of oscillation.
Different model, we dont have a lattice whereby we track state and move 
form one to another.. we just track currently best known values for 
everything and recalculate them when the old values are stale.   We move 
the edge to unexecutable when those values allow us to rewrite a branch 
such that an edge can no longer be taken. everything else is executable. 
  Any values on an unexecutable edge are then considered UNDEFINED when 
combined with other values..



Btw, I fail to see how the patch makes ranger assure a sane initial state of
EDGE_EXECUTABLE (or make its use conditional).  Is the code you patched
not also used on-demand?


THe constructor for a ranger makes everything executable to start.  
Calls the same routine VRP does.


 gimple_ranger::gimple_ranger () : tracer ("")
 {
@@ -41,6 +42,7 @@ gimple_ranger::gimple_ranger () : tracer ("")
   m_oracle = m_cache.oracle ();
   if (dump_file && (param_evrp_mode & EVRP_MODE_TRACE))
 tracer.enable_trace ();
+  set_all_edges_as_executable (cfun);
 }

and EVRP also goes to this same effort at the start:

evrp_range_analyzer::evrp_range_analyzer (bool update_global_ranges)
  : stack (10), m_update_global_ranges (update_global_ranges)
{
  edge e;
  edge_iterator ei;
  basic_block bb;
  FOR_EACH_BB_FN (bb, cfun)
    {
  bb->flags &= ~BB_VISITED;
  FOR_EACH_EDGE (e, ei, bb->preds)
    e->flags |= EDGE_EXECUTABLE;
    }
}

Ranger isn't doing anything different than the ciurrent VRP's do, so I 
don't see any distinguishing difference between optimistic and 
pessimistic.  We can guarantee that we will only clear the EXECUTABLE 
flag if the edge is 100% guaranteed to be not executable.  Perhaps other 
passes/approaches aren't able to provide such guarantees in which case 
having a persistent flag wouldnt work very well..



That is, are you sure you're not running into wrong-code issues if you
for example in passes.c:execute_one_pass right before passes
initialize EDGE_EXECUTABLE to a random state on each edge of the function?

Richard.


We dont really have multiple instances running, and I'm not sure that 
should be encouraged either.  Worst case is that a new instantiation 
would "undo" some of the edges that may have been marked as unexecutable


We'd only run into problems if we try to use a ranger from some pass 
that is using EXECUTABLE in a different way.. ie,. to not mean 
EXECUTABLE.   but that seems even more problematic to me, and those uses 
ought to be fixed.





Richard.


Anyway, bootstrapped on x86_64-pc-linux-gnu with no regressions. pushed.

Andrew

RE: [PATCH] Simplify paradoxical subreg extensions of TRUNCATE

2021-09-21 Thread Roger Sayle

That define_insn is making my eyes bleed!  I think that's the most convincing
argument I've ever read on gcc-patches, and I can see now what Segher is so
opposed to.  My new goal is to add sh_mul and uh_mul as RTL expression
codes for highpart multiplication, so that that pattern will go away [and
never ever been seen again].

Fortunately, the proposed patch is part of the solution, rather than part of
the problem.  simplify_subreg is used to find a sensible alternate form for
a hypothetical SUBREG before it's created (and normally it'll only be created
if it's valid c.f. validate_subreg in simplify_gen_subreg).  This function is 
called
over 30 million times during stage2/stage3 of a bootstrap on x86_64, with
the most common "SUBREG" operand RTX codes being:

16160294 reg
5627157 const_int
3278692 mem
 960839 lshiftrt
 902685 plus
 690246 and
 430131 subreg
 319407 zero_extract
 319145 minus
 255043 value
 232959 ashift
 225468 mult
 217819 xor
 207645 clobber
 207454 ashiftrt
 100213 ior

Adding a simplify_subreg for mult, replacing them with a highpart rtx should
avoid (reduce) combine's Frankenstein patterns appearing in machine 
descriptions.
Presumably, the reason that this define_insn starts with a "*" even for a 
recognized
named pattern, is that if expand ever attempted emitting this instruction, the
internal sanity checks for a sane/valid SUBREG would fail/ICE.

p.s. You're right that there's a temptation to add ugly patterns to a backend,
instead of fixing their simplification in the middle-end; but only when backend
patches get reviewed faster/easier than middle-end patches 

Awesome (counter-)example!

Roger
--

-Original Message-
From: Richard Sandiford  
Sent: 21 September 2021 13:02
To: Richard Biener via Gcc-patches 
Cc: Segher Boessenkool ; Richard Biener 
; Roger Sayle 
Subject: Re: [PATCH] Simplify paradoxical subreg extensions of TRUNCATE

[Using this is a convenient place to reply to the thread as a whole]

Richard Biener via Gcc-patches  writes:
> On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool 
>  wrote:
>>
>> On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote:
>> > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as 
>> > (truncate:HI (reg:SI)), and closely related variants.
>>
>> Subregs of other than regs are undefined in RTL.  You will first have 
>> to define this (in documentation as well as in other code that 
>> handles subregs).  I doubt this is possible to do, subreg have so 
>> many overloaded meanings already.
>
> I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM is 
> equal to
>
>   (set (reg:MODE2) (xyz:MODE2 ..))
>   (subreg:MODE1 (reg:MODE2) ...)
>
> with 'reg' being a pseudo reg is the (only?) sensible way of defining it.

Agreed.  And I think that's already the de facto definition (and has been for a 
long time).  Subreg as an operation has to have defined semantics for all the 
cases that simplify_subreg handles, otherwise we have GIGO and a lot of the 
function could be deleted.  We can (and do) choose to prevent some of those 
operations becoming actual rtxes, but even there, the de facto rules are quite 
broad.  E.g.:

- Like you said later, simplify_gen_subreg is opt-out rather than opt-in
  in terms of the subreg rtxes that it's prepared to create.

- Even rs6000.md has:

(define_insn "*mul3_highpart"
  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(subreg:GPR
  (mult: (any_extend:
  (match_operand:GPR 1 "gpc_reg_operand" "r"))
(any_extend:
  (match_operand:GPR 2 "gpc_reg_operand" "r")))
 0))]
  "WORDS_BIG_ENDIAN && !(mode == SImode && TARGET_POWERPC64)"
  "mulh %0,%1,%2"
  [(set_attr "type" "mul")
   (set_attr "size" "")])

  Many other ports have similar patterns.

The problem with “combine can generate invalid rtl but backends should reject 
it” is that, generally, people write combine patterns by looking at what 
combine _wants_ to generate and then writing .md patterns to match that.  In 
other words, combine in practice defines the (de facto) correct rtl 
representation of a combined sequence.

Given:

Trying 10 -> 15:
   10: r29:QI=trunc(r32:SI)
  REG_DEAD r32:SI
   15: r38:HI=r29:QI#0
  REG_DEAD r29:QI
Failed to match this instruction:
(set (reg:HI 38)
(subreg:HI (truncate:QI (reg:SI 32)) 0))

I'm sure there's a temptation to add an .md pattern that matches the subreg. :-)

Thanks,
Richard

> That would make the simplification apply when substituting the pseudos 
> definition inside the subreg which maybe is what this targets?
>
> Richard.
>
>>
>>
>> Segher

Re: [Patch] Fortran: Add gfc_simple_for_loop aux function

2021-09-21 Thread Thomas Koenig via Gcc-patches


Hi Tobias,


This patch adds a useful auxiliary function to generate a loop.


Looks good to me (and there are probably quite a few places where this
could be useful).

Best regards

Thomas

Re: [PATCH] Simplify paradoxical subreg extensions of TRUNCATE

2021-09-21 Thread Segher Boessenkool

On Tue, Sep 21, 2021 at 08:24:04AM +0200, Richard Biener wrote:
> On Tue, Sep 21, 2021 at 6:18 AM Jeff Law via Gcc-patches
>  wrote:
> > If we were catching the scenario which led to the creation of (subreg
> > (truncate)) in combine and instead of creating (subreg (truncate)) we
> > instead created the simplified, correct form would that ease your concerns?
> 
> Sorry to chime in,

No, please do!

> but if (subreg (truncate ...)) is invalid it would be OK for
> simplify_gen_subreg () to
>  1) return NULL_RTX

Yes, that is valid, but the simplify-rtx routines (as well as pretty
much everything dealing with RTL) doesn't have to check it is valid RTL,
as a matter of principle: it can just assume anything passed around *is*
valid.  Anything simplify-rtx gets as input should be valid RTL, and the
only thing it should be concerned with is *generating* only valid RTL as
well :-)

>  2) return valid RTX, in this case the suggested canoncalized form
> I agree that simplify_gen_subreg should not return (subreg (truncate ..)).
> 
> So I'm not understanding if you are suggesting that even calling
> simplify_gen_subreg on a (truncate ...) is invalid?

What would the semantics be?  A subreg has a reg as operand, and its
semantics are different for pseudos and for hard regs!

We could define the semantics of calling simplify_gen_subreg on
non-regs, but that will make the interface only more confused imo,
much harder to use correctly.

> Now, the patch changes simplify_subreg where it's not clear whether
> the subreg already exists (it should not) or whether we are about to
> construct it, seeing if there's a valid RTX that expresses the semantic
> as if the op were pushed to a pseudo (aka we're combining on the fly).
> 
> It's a bit like fold_convert (double_type_node, integer_one_node)
> second-guessing and doing a FLOAT_EXPR when folding
> the implicit CONVERT_EXPR.

Yeah :-/

Segher

Re: [PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-21 Thread Richard Biener via Gcc-patches

On Fri, 17 Sep 2021, Andre Vieira (lists) wrote:

> Hi all,
> 
> This patch adds the ability to define a target hook to unroll the main
> vectorized loop. It also introduces --param's vect-unroll and
> vect-unroll-reductions to control this through a command-line. I found this
> useful to experiment and believe can help when tuning, so I decided to leave
> it in.
> We only unroll the main loop and have disabled unrolling epilogues for now. We
> also do not support unrolling of any loop that has a negative step and we do
> not support unrolling a loop with any reduction other than a
> TREE_CODE_REDUCTION.
> 
> Bootstrapped and regression tested on aarch64-linux-gnu as part of the series.

I wonder why we want to change the vector modes used for the epilogue,
we're either making it more likely to need to fall through to the
scalar epilogue or require another vectorized epilogue.

That said, for simplicity I'd only change the VF of the main loop.

There I wonder why you need to change vect_update_vf_for_slp or
vect_determine_partial_vectors_and_peeling and why it's not enough
to adjust the VF in a single place, I'd do that here:

  /* This is the point where we can re-start analysis with SLP forced off.  
*/
start_over:

  /* Now the vectorization factor is final.  */
  poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
  gcc_assert (known_ne (vectorization_factor, 0U));

>  call vect_update_vf_for_unroll ()

note there's also loop->unroll (from #pragma GCC unroll) which we
could include in what you look at in vect_unroll_value.

I don't like add_stmt_cost_for_unroll - how should a target go
and decide based on what it is fed?  You could as well feed it
the scalar body or the vinfo so it can get a shot at all
the vectorizers meta data - but feeding it individual stmt_infos
does not add any meaningful abstraction and thus what's the
point?

I _think_ what would make some sense is when we actually cost
the vector body (with the not unrolled VF) ask the target
"well, how about unrolling this?" because there it has the
chance to look at the actual vector stmts produced (in "cost form").
And if the target answers "yeah - go ahead and try x4" we signal
that to the iteration and have "mode N with x4 unroll" validated and
costed.

So instead of a new target hook amend the finish_cost hook to
produce a suggested unroll value and cost both the unrolled and
not unrolled body.

Sorry for steering in a different direction ;)

Thanks,
Richard.

> gcc/ChangeLog:
> 
>     * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR
>     and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
>     * doc/tm.texi.in: Add entries for target hooks above.
>     * params.opt: Add vect-unroll and vect-unroll-reductions 
> parameters.
>     * target.def: Define hooks TARGET_VECTORIZE_UNROLL_FACTOR
>     and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
>     * targhooks.c (default_add_stmt_cost_for_unroll): New.
>     (default_unroll_factor): Likewise.
>     * targhooks.h (default_add_stmt_cost_for_unroll): Likewise.
>     (default_unroll_factor): Likewise.
>     * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>     par_unrolling_factor.
>     (vect_update_vf_for_slp): Use unrolling factor to update 
> vectorization
>     factor.
>     (vect_determine_partial_vectors_and_peeling): Account for 
> unrolling.
>     (vect_determine_unroll_factor): Determine how much to unroll
> vectorized
>     main loop.
>     (vect_analyze_loop_2): Call vect_determine_unroll_factor.
>     (vect_analyze_loop): Allow for epilogue vectorization when 
> unrolling
>     and rewalk vector_mode array for the epilogues.
>     (vectorizable_reduction): Disable single_defuse_cycle when 
> unrolling.
>     * tree-vectorizer.h (vect_unroll_value): Declare par_unrolling_factor
>     as a member of loop_vec_info.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[Patch] Fortran: Fix assumed-size to assumed-rank passing [PR94070]

2021-09-21 Thread Tobias Burnus


This patch requires the previously mentioned simple-loop-gen patch,
which also still needs to be reviewed:
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579576.html

For the xfailed part of the new testcase, the updated array descriptor
is needed, but I think leaving it as xfailed for now - and reviewing
this patch first makes more sense.

size(a,dim=i) of an array is simply:
  a.dim[i].ubound - a.dim[i].lbound + 1
except that the result is always >= 0, i.e. a(5:-10) has size 0.

Assumed-size arrays like  as(5, -3:*)  can be passed to assumed-rank
arrays – but, obviously, the upper bound is unknown. Without BIND(C),
the standard is quiet how those get transported but:
  ubound(as,dim=2) == size(as,dim=2) == -1
However, for
  ..., allocatable :: c(:,:)
  allocate (c(5,-4:-1))
the size(c,dim=2) is surely 4 and not -1. Thus, when passing it
to a
   subroutine foo(x)
 ..., allocatable :: x(..)
it should also come out as size(x, dim=2) == 4.

To make the distinction, the allocatable/pointer attribute of the
dummy can be used – as an assumed-size array cannot be allocatable.

That's what is used in trans-intrinsic.c/trans-array.c – and the
main reason I started to generate inline code for the array size.
(Given that it permits optimizations and is a trivial code, I
also think that it makes sense in general.)

But even when doing so, it still did not work properly as when
calling
  call foo(d)
the bounds where not always reset such that the caller could still
receive ubound(d,dim=last_dim) == -1 - in the case it just happened
to be -1, be it for a zero-sized array or because the lbounds just
happend to be -1 or smaller. That's taken care of in trans-expr.c.

OK for mainline?

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Fortran: Fix assumed-size to assumed-rank passing [PR94070]

This code inlines the size0 and size1 libgfortran calls, the former is still
used by libgfortan itself (and by old code). Besides permitting more
optimizations, it also permits to handle assumed-rank dummies better: If the
dummy argument is a nonpointer/nonallocatable, an assumed-size actual arg is
repesented by having ubound == -1 for the last dimension. However, for
allocatable/pointers, this value can also exist. Hence, the dummy arg attr
has to be honored.

For that reason, when calling an assumed-rank procedure with nonpointer,
nonallocatable dummy arguments, the bounds have to be updated to avoid
the case ubound == -1 for the last dimension.

	PR fortran/94070

gcc/fortran/ChangeLog:

	* trans-array.c (gfc_tree_array_size): New function to
	find size inline (whole array or one dimension).
	(array_parameter_size): Use it, take stmt_block as arg.
	(gfc_conv_array_parameter): Update call.
	* trans-array.h (gfc_tree_array_size): Add prototype.
	* trans-expr.c (gfc_conv_procedure_call): Update
	bounds of pointer/allocatable actual args to nonallocatable/nonpointer
	dummies to be one based.
	* trans-intrinsic.c (gfc_conv_intrinsic_shape): Fix case for
	assumed rank with allocatable/pointer dummy.
	(gfc_conv_intrinsic_size): Update to use inline function.

libgfortran/ChangeLog:

	* intrinsics/size.c (size0, size1): Comment that now not
	used by newer compiler code.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Update
	expected dg-note output.

gcc/testsuite/ChangeLog:

	* gfortran.dg/c-interop/cf-out-descriptor-6.f90: Remove xfail.
	* gfortran.dg/c-interop/size.f90: Remove xfail.
	* gfortran.dg/intrinsic_size_3.f90:
	* gfortran.dg/transpose_optimization_2.f90:
	* gfortran.dg/assumed_rank_22.f90: New test.
	* gfortran.dg/assumed_rank_22_aux.c: New test.

 gcc/fortran/trans-array.c  | 165 
 gcc/fortran/trans-array.h  |   2 +
 gcc/fortran/trans-expr.c   |  43 +-
 gcc/fortran/trans-intrinsic.c  | 119 ++-
 gcc/testsuite/gfortran.dg/assumed_rank_22.f90  | 167 +
 gcc/testsuite/gfortran.dg/assumed_rank_22_aux.c|  68 +
 .../gfortran.dg/c-interop/cf-out-descriptor-6.f90  |   2 +-
 gcc/testsuite/gfortran.dg/c-interop/size.f90   |   2 +-
 gcc/testsuite/gfortran.dg/intrinsic_size_3.f90 |   2 +-
 .../gfortran.dg/transpose_optimization_2.f90   |   2 +-
 libgfortran/intrinsics/size.c  |   4 +
 .../libgomp.oacc-fortran/privatized-ref-2.f90  |  13 +-
 12 files changed, 476 insertions(+), 113 deletions(-)

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 0d013defdbb..b8061f37772 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7901,31 +7901,143 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
   gfc_cleanup_loop ();
 }
 
+
+/*

Re: [PATCH] Simplify paradoxical subreg extensions of TRUNCATE

2021-09-21 Thread Segher Boessenkool

On Mon, Sep 20, 2021 at 10:18:15PM -0600, Jeff Law wrote:
> On 9/20/2021 6:23 PM, Segher Boessenkool wrote:
> >There is no such thing as "earlier than simplify-rtx", that is the
> >point.  simplify-rtx is not a pass: it is like a library that is used
> >from all over the RTL routines.
> I'm referring to earlier in the call chain, not an earlier pass. Sorry I 
> wasn't clear about that.

Ah okay, I see.

> If we were catching the scenario which led to the creation of (subreg 
> (truncate)) in combine and instead of creating (subreg (truncate)) we 
> instead created the simplified, correct form would that ease your concerns?

Yes please.  And combine has access to the original instructions, so can
make sure to only ever create something with the same semantics.

Thanks,


Segher

Re: [PATCH] rs6000: Parameterize some const values for density test

2021-09-21 Thread Segher Boessenkool

Hi!

On Tue, Sep 21, 2021 at 01:47:19PM +0800, Kewen.Lin wrote:
> on 2021/9/18 上午6:26, Segher Boessenkool wrote:
> >> +  if (data->nloads > (unsigned int) rs6000_density_load_num_threshold
> >> +&& load_pct > (unsigned int) rs6000_density_load_pct_threshold)
> > 
> > Those variables are unsigned int already.  Don't cast please.
> 
> Unfortunately this is required by bootstrapping.  The UInteger for the
> param definition is really confusing, in the underlying implementation
> it's still "signed".  If you grep "(unsigned) param", you can see a few
> examples.  I guess the "UInteger" is mainly for the param value range
> checking.

Huh, I see.  Is that a bug?  It certainly is surprising!  Please open a
PR if you think it could/should be improved, put me on Cc:?

> >> +-param=rs6000-density-pct-threshold=
> >> +Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) 
> >> Init(85) IntegerRange(0, 99) Param
> > 
> > So make this and all other percentages (0, 100) please.
> 
> I thought 99 is enough for the RHS in ">". just realized it's more clear
> with 100.  Will fix!

99 will work fine, but it's not the best choice for the user, who will
expect that a percentage can be anything from 0% to 100%.

> >> +When costing for loop vectorization, we probably need to penalize the 
> >> loop body cost if the existing cost model may not adequately reflect 
> >> delays from unavailable vector resources.  We collect the cost for 
> >> vectorized statements and non-vectorized statements separately, check the 
> >> proportion of vec_cost to total cost of vec_cost and non vec_cost, and 
> >> penalize only if the proportion exceeds the threshold specified by this 
> >> parameter.  The default value is 85.
> > 
> > It would be good if we can use line breaks in the source code for things
> > like this, but I don't think we can.  This message is mainly used for
> > "--help=param", and it is good there to have as short messages as you
> > can.  But given the nature of params you need quite a few words often,
> > and you do not want to say so little that things are no clear, either.
> > 
> > So, dunno :-)
> 
> I did some testings, the line breaks writing can still survive in the
> "--help=param" show, the lines are concatenated with " ".  Although
> there seems no this kind of writing practices, I am guessing you want
> me to do line breaks for their descriptions?  If so, I will make them
> short as the above "Target Undocumented..." line.  Or do you want it
> to align source code ColumnLimit 80 (for these cases, it would look
> shorter)?

It would help if was more readable in the surce code, one line of close
to 500 columns is not very manageable :-)

But the thing that matters is what it will look like in the --help=
output (and/or the manual).


Segher

Re: [PATCH] Simplify paradoxical subreg extensions of TRUNCATE

2021-09-21 Thread Richard Sandiford via Gcc-patches

[Using this is a convenient place to reply to the thread as a whole]

Richard Biener via Gcc-patches  writes:
> On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool
>  wrote:
>>
>> On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote:
>> > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as
>> > (truncate:HI (reg:SI)), and closely related variants.
>>
>> Subregs of other than regs are undefined in RTL.  You will first have to
>> define this (in documentation as well as in other code that handles
>> subregs).  I doubt this is possible to do, subreg have so many
>> overloaded meanings already.
>
> I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM
> is equal to
>
>   (set (reg:MODE2) (xyz:MODE2 ..))
>   (subreg:MODE1 (reg:MODE2) ...)
>
> with 'reg' being a pseudo reg is the (only?) sensible way of defining it.

Agreed.  And I think that's already the de facto definition (and has been
for a long time).  Subreg as an operation has to have defined semantics
for all the cases that simplify_subreg handles, otherwise we have GIGO
and a lot of the function could be deleted.  We can (and do) choose
to prevent some of those operations becoming actual rtxes, but even there,
the de facto rules are quite broad.  E.g.:

- Like you said later, simplify_gen_subreg is opt-out rather than opt-in
  in terms of the subreg rtxes that it's prepared to create.

- Even rs6000.md has:

(define_insn "*mul3_highpart"
  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(subreg:GPR
  (mult: (any_extend:
  (match_operand:GPR 1 "gpc_reg_operand" "r"))
(any_extend:
  (match_operand:GPR 2 "gpc_reg_operand" "r")))
 0))]
  "WORDS_BIG_ENDIAN && !(mode == SImode && TARGET_POWERPC64)"
  "mulh %0,%1,%2"
  [(set_attr "type" "mul")
   (set_attr "size" "")])

  Many other ports have similar patterns.

The problem with “combine can generate invalid rtl but backends
should reject it” is that, generally, people write combine patterns
by looking at what combine _wants_ to generate and then writing
.md patterns to match that.  In other words, combine in practice
defines the (de facto) correct rtl representation of a combined
sequence.

Given:

Trying 10 -> 15:
   10: r29:QI=trunc(r32:SI)
  REG_DEAD r32:SI
   15: r38:HI=r29:QI#0
  REG_DEAD r29:QI
Failed to match this instruction:
(set (reg:HI 38)
(subreg:HI (truncate:QI (reg:SI 32)) 0))

I'm sure there's a temptation to add an .md pattern that matches
the subreg. :-)

Thanks,
Richard

> That would make the simplification apply when substituting the pseudos
> definition inside the subreg which maybe is what this targets?
>
> Richard.
>
>>
>>
>> Segher

1 2 >

1 - 100 of 114 matches

Mail list logo