Re: [PATCH 24/62] AVX512FP16: Add vmovw/vmovsh.

2021-09-15 Thread Hongtao Liu via Gcc-patches
I'm going to check in these 6 patches:

[PATCH 24/62] AVX512FP16: Add vmovw/vmovsh.
[PATCH 25/62] AVX512FP16: Add testcase for vmovsh/vmovw.
[PATCH 26/62] AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq
[PATCH 27/62] AVX512FP16: Add testcase for vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq.
[PATCH 28/62] AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph
[PATCH 29/62] AVX512FP16: Add testcase for vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph.

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
  Newly added runtime testcase passed on SPR.

On Thu, Jul 1, 2021 at 2:17 PM liuhongt  wrote:
>
> gcc/ChangeLog:
>
> * config/i386/avx512fp16intrin.h (_mm_cvtsi16_si128):
> New intrinsic.
> (_mm_cvtsi128_si16): Likewise.
> (_mm_mask_load_sh): Likewise.
> (_mm_maskz_load_sh): Likewise.
> (_mm_mask_store_sh): Likewise.
> (_mm_move_sh): Likewise.
> (_mm_mask_move_sh): Likewise.
> (_mm_maskz_move_sh): Likewise.
> * config/i386/i386-builtin-types.def: Add corresponding builtin types.
> * config/i386/i386-builtin.def: Add corresponding new builtins.
> * config/i386/i386-expand.c
> (ix86_expand_special_args_builtin): Handle new builtin types.
> (ix86_expand_vector_init_one_nonzero): Adjust for FP16 target.
> * config/i386/sse.md (VI2F): New mode iterator.
> (vec_set_0): Use new mode iterator.
> (avx512f_mov_mask): Adjust for HF vector mode.
> (avx512f_store_mask): Ditto.
> ---
>  gcc/config/i386/avx512fp16intrin.h | 59 ++
>  gcc/config/i386/i386-builtin-types.def |  3 ++
>  gcc/config/i386/i386-builtin.def   |  5 +++
>  gcc/config/i386/i386-expand.c  | 11 +
>  gcc/config/i386/sse.md | 33 +++---
>  5 files changed, 95 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index 2fbfc140c44..cdf6646c8c6 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -2453,6 +2453,65 @@ _mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A,
>
>  #endif /* __OPTIMIZE__ */
>
> +/* Intrinsics vmovw.  */
> +extern __inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsi16_si128 (short __A)
> +{
> +  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A);
> +}
> +
> +extern __inline short
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsi128_si16 (__m128i __A)
> +{
> +  return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0);
> +}
> +
> +/* Intrinsics vmovsh.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C)
> +{
> +  return __builtin_ia32_loadsh_mask (__C, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B)
> +{
> +  return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A);
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C)
> +{
> +  __builtin_ia32_storesh_mask (__A,  __C, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_move_sh (__m128h __A, __m128h  __B)
> +{
> +  __A[0] = __B[0];
> +  return __A;
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h  __C, __m128h __D)
> +{
> +  return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_move_sh (__mmask8 __A, __m128h  __B, __m128h __C)
> +{
> +  return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A);
> +}
> +
>  #ifdef __DISABLE_AVX512FP16__
>  #undef __DISABLE_AVX512FP16__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 79e7edf13e5..6cf3e354c78 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -134,6 +134,7 @@ DEF_POINTER_TYPE (PCVOID, VOID, CONST)
>  DEF_POINTER_TYPE (PVOID, VOID)
>  DEF_POINTER_TYPE (PDOUBLE, DOUBLE)
>  DEF_POINTER_TYPE (PFLOAT, FLOAT)
> +DEF_POINTER_TYPE (PCFLOAT16, FLOAT16, CONST)
>  DEF_POINTER_TYPE (PSHORT, SHORT)
>  DEF_POINTER_TYPE (PUSHORT, USHORT)
>  DEF_POINTER_TYPE (PINT, INT)
> @@ -1308,6 +1309,8 @@ DEF_FUNCTION_TYPE (QI, V8HF, INT, UQI)
>  DEF_FUNCTION_TYPE (HI, V16HF, INT, UHI)
>  DEF_FUNCTION_TYPE (SI, V32HF, INT, USI)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
> 
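The lane-level behavior of the new vmovw/vmovsh intrinsics can be modeled in portable C, which may help when reading the builtin argument orders above. This is a minimal sketch, not GCC code: it assumes a 128-bit vector is represented as eight 16-bit lanes holding raw bit patterns (so no _Float16 support is needed), and the type v8x16 and all model_* functions are hypothetical names. The semantics follow the intrinsic bodies in the patch (for example, _mm_move_sh copies only element 0 of __B into __A) and, for the masked load, the usual AVX-512 behavior of zeroing the upper elements; only bit 0 of the mask matters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector as eight 16-bit lanes.
   Lane 0 is the lowest element.  */
typedef struct { uint16_t lane[8]; } v8x16;

/* Models _mm_cvtsi16_si128 (vmovw): lane 0 = a, upper lanes zeroed.  */
static v8x16 model_cvtsi16_si128 (int16_t a)
{
  v8x16 r;
  memset (&r, 0, sizeof r);
  r.lane[0] = (uint16_t) a;
  return r;
}

/* Models _mm_cvtsi128_si16: extract lane 0 (the conversion back to a
   negative int16_t relies on GCC's modulo behavior).  */
static int16_t model_cvtsi128_si16 (v8x16 a)
{
  return (int16_t) a.lane[0];
}

/* Models _mm_move_sh (a, b): lane 0 from b, lanes 1..7 from a.  */
static v8x16 model_move_sh (v8x16 a, v8x16 b)
{
  a.lane[0] = b.lane[0];
  return a;
}

/* Models _mm_mask_move_sh (src, k, a, b): lane 0 is b[0] if bit 0 of k
   is set, otherwise src[0]; lanes 1..7 come from a.  */
static v8x16 model_mask_move_sh (v8x16 src, uint8_t k, v8x16 a, v8x16 b)
{
  a.lane[0] = (k & 1) ? b.lane[0] : src.lane[0];
  return a;
}

/* Models _mm_maskz_move_sh (k, a, b): as above with an all-zero src.  */
static v8x16 model_maskz_move_sh (uint8_t k, v8x16 a, v8x16 b)
{
  a.lane[0] = (k & 1) ? b.lane[0] : 0;
  return a;
}

/* Models _mm_mask_load_sh (src, k, p): lane 0 is *p if bit 0 of k is
   set, otherwise src[0]; lanes 1..7 are zeroed.  */
static v8x16 model_mask_load_sh (v8x16 src, uint8_t k, const uint16_t *p)
{
  v8x16 r;
  memset (&r, 0, sizeof r);
  if (k & 1)
    {
      assert (p != NULL);  /* the model only dereferences p when bit 0 is set */
      r.lane[0] = *p;
    }
  else
    r.lane[0] = src.lane[0];
  return r;
}
```

For example, model_mask_move_sh (src, 0, a, b) keeps element 0 of src and the upper elements of a, which is what __builtin_ia32_vmovsh_mask (__C, __D, __A, __B) is expected to produce for a zero mask.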

[PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-15 Thread liuhongt via Gcc-patches
Ping.
Rebased on latest trunk.

gcc/ChangeLog:

* common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
* doc/invoke.texi (Options That Control Optimization): Update
documentation.
* opts.c (default_options_table): Enable auto-vectorization at
O2 with very-cheap cost model.
(finish_options): Use cheap cost model for
explicit -ftree{,-loop}-vectorize.

gcc/testsuite/ChangeLog:

* c-c++-common/Wstringop-overflow-2.c: Adjust testcase.
* g++.dg/tree-ssa/pr81408.C: Ditto.
* g++.dg/warn/Wuninitialized-13.C: Ditto.
* gcc.dg/Warray-bounds-51.c: Ditto.
* gcc.dg/Warray-parameter-3.c: Ditto.
* gcc.dg/Wstringop-overflow-13.c: Ditto.
* gcc.dg/Wstringop-overflow-14.c: Ditto.
* gcc.dg/Wstringop-overflow-21.c: Ditto.
* gcc.dg/Wstringop-overflow-68.c: Ditto.
* gcc.dg/gomp/pr46032-2.c: Ditto.
* gcc.dg/gomp/pr46032-3.c: Ditto.
* gcc.dg/gomp/simd-2.c: Ditto.
* gcc.dg/gomp/simd-3.c: Ditto.
* gcc.dg/graphite/fuse-1.c: Ditto.
* gcc.dg/pr67089-6.c: Ditto.
* gcc.dg/pr82929-2.c: Ditto.
* gcc.dg/pr82929.c: Ditto.
* gcc.dg/store_merging_1.c: Ditto.
* gcc.dg/store_merging_11.c: Ditto.
* gcc.dg/store_merging_15.c: Ditto.
* gcc.dg/store_merging_16.c: Ditto.
* gcc.dg/store_merging_19.c: Ditto.
* gcc.dg/store_merging_24.c: Ditto.
* gcc.dg/store_merging_25.c: Ditto.
* gcc.dg/store_merging_28.c: Ditto.
* gcc.dg/store_merging_30.c: Ditto.
* gcc.dg/store_merging_5.c: Ditto.
* gcc.dg/store_merging_7.c: Ditto.
* gcc.dg/store_merging_8.c: Ditto.
* gcc.dg/strlenopt-85.c: Ditto.
* gcc.dg/tree-ssa/dump-6.c: Ditto.
* gcc.dg/tree-ssa/pr19210-1.c: Ditto.
* gcc.dg/tree-ssa/pr47059.c: Ditto.
* gcc.dg/tree-ssa/pr86017.c: Ditto.
* gcc.dg/tree-ssa/pr91482.c: Ditto.
* gcc.dg/tree-ssa/predcom-1.c: Ditto.
* gcc.dg/tree-ssa/predcom-dse-3.c: Ditto.
* gcc.dg/tree-ssa/prefetch-3.c: Ditto.
* gcc.dg/tree-ssa/prefetch-6.c: Ditto.
* gcc.dg/tree-ssa/prefetch-8.c: Ditto.
* gcc.dg/tree-ssa/prefetch-9.c: Ditto.
* gcc.dg/tree-ssa/ssa-dse-18.c: Ditto.
* gcc.dg/tree-ssa/ssa-dse-19.c: Ditto.
* gcc.dg/uninit-40.c: Ditto.
* gcc.dg/unroll-7.c: Ditto.
* gcc.misc-tests/help.exp: Ditto.
* gcc.target/i386/avx512vpopcntdqvl-vpopcntd-1.c: Ditto.
* gcc.target/i386/pr22141.c: Ditto.
* gcc.target/i386/pr34012.c: Ditto.
* gcc.target/i386/pr49781-1.c: Ditto.
* gcc.target/i386/pr95798-1.c: Ditto.
* gcc.target/i386/pr95798-2.c: Ditto.
* gfortran.dg/pr77498.f: Ditto.
---
 gcc/common.opt |  2 +-
 gcc/doc/invoke.texi|  8 +---
 gcc/opts.c | 18 +++---
 .../c-c++-common/Wstringop-overflow-2.c|  2 +-
 gcc/testsuite/g++.dg/tree-ssa/pr81408.C|  2 +-
 gcc/testsuite/g++.dg/warn/Wuninitialized-13.C  |  2 +-
 gcc/testsuite/gcc.dg/Warray-bounds-51.c|  2 +-
 gcc/testsuite/gcc.dg/Warray-parameter-3.c  |  2 +-
 gcc/testsuite/gcc.dg/Wstringop-overflow-13.c   |  2 +-
 gcc/testsuite/gcc.dg/Wstringop-overflow-14.c   |  2 +-
 gcc/testsuite/gcc.dg/Wstringop-overflow-21.c   |  2 +-
 gcc/testsuite/gcc.dg/Wstringop-overflow-68.c   |  2 +-
 gcc/testsuite/gcc.dg/gomp/pr46032-2.c  |  2 +-
 gcc/testsuite/gcc.dg/gomp/pr46032-3.c  |  2 +-
 gcc/testsuite/gcc.dg/gomp/simd-2.c |  2 +-
 gcc/testsuite/gcc.dg/gomp/simd-3.c |  2 +-
 gcc/testsuite/gcc.dg/graphite/fuse-1.c |  2 +-
 gcc/testsuite/gcc.dg/pr67089-6.c   |  2 +-
 gcc/testsuite/gcc.dg/pr82929-2.c   |  2 +-
 gcc/testsuite/gcc.dg/pr82929.c |  2 +-
 gcc/testsuite/gcc.dg/store_merging_1.c |  2 +-
 gcc/testsuite/gcc.dg/store_merging_11.c|  2 +-
 gcc/testsuite/gcc.dg/store_merging_15.c|  2 +-
 gcc/testsuite/gcc.dg/store_merging_16.c|  2 +-
 gcc/testsuite/gcc.dg/store_merging_19.c|  2 +-
 gcc/testsuite/gcc.dg/store_merging_24.c|  2 +-
 gcc/testsuite/gcc.dg/store_merging_25.c|  2 +-
 gcc/testsuite/gcc.dg/store_merging_28.c|  2 +-
 gcc/testsuite/gcc.dg/store_merging_30.c|  2 +-
 gcc/testsuite/gcc.dg/store_merging_5.c |  2 +-
 gcc/testsuite/gcc.dg/store_merging_7.c |  2 +-
 gcc/testsuite/gcc.dg/store_merging_8.c |  2 +-
 gcc/testsuite/gcc.dg/strlenopt-85.c|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/dump-6.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr19210-1.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr47059.c|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr86017.c|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr91482.c|  2 +-
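As an illustration of what the very-cheap cost model is intended to accept: the GCC manual describes it as only allowing vectorization when the vector code would entirely replace the scalar code, for example when the scalar iteration count is known to be a multiple of the vectorization factor. The loop below is a hypothetical example (not one of the adjusted testcases): it has a constant trip count of 1024 and restrict-qualified pointers, so neither an epilogue loop nor a runtime alias check should be needed. Whether it is actually vectorized at plain -O2 depends on the GCC version and target; -fopt-info-vec reports the decision.

```c
#include <assert.h>
#include <stddef.h>

/* Constant trip count, a multiple of any likely vectorization factor.  */
#define N 1024

/* restrict promises that a, b and c do not overlap, so the compiler does
   not have to emit a runtime alias check before using vector code.  */
void add_arrays (int *restrict a, const int *restrict b,
                 const int *restrict c)
{
  for (size_t i = 0; i < N; i++)
    a[i] = b[i] + c[i];
}
```

Compiling the same file once with gcc -O2 -fopt-info-vec and once with -O2 -fno-tree-vectorize is one way to compare the two behaviors.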
 

Re: [PATCH 2/2] RISC-V: Implement TARGET_COMPUTE_MULTILIB

2021-09-15 Thread Kito Cheng via Gcc-patches
> I find the other_cond support a bit confusing.  Is this for -mcmodel
> perhaps?  Why not just say that if so?

I suppose there might be multilib options other than -march,
-mabi and -mcmodel,
so I kept the flexibility here.

> riscv_multi_lib_info_t::parse
> Calls riscv_subset_list::parse twice when path == ".", the call inside
> the if looks unnecessary.

Thanks, good catch!

> It isn't clear how the loop with the comment "ignore march and mabi option
> in cond string" can work.  It looks like it computes other_cond, but
> assumes that there is at most one other_cond, and that it is always at the
> end of the list since otherwise the length won't be computed correctly.
> But it doesn't check these constraints.  Do you have examples showing how
> this works?
>   And maybe a little better commentary explaining what this loop does to
> make it easier to understand.  It doesn't mention that it computes
> other_cond for instance.

Honestly, I also spent some time remembering what they do... so
I'm rewriting that to make it easier to understand instead of copying
from gcc.c where possible.
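One way to address the concern about the other_cond loop is to collect every non-march/mabi option, however many there are and wherever they appear, instead of assuming a single trailing one. A minimal sketch under loud assumptions: the condition string is taken to be space-separated options such as "march=rv64gc mabi=lp64d mcmodel=medany", and collect_other_conds is a hypothetical helper, not the function in the patch or in gcc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy every option in COND other than march=... and
   mabi=... into OUT (space-separated), handling any number of such options
   in any position.  Returns the number of options collected, or -1 if OUT
   (of size OUT_SIZE) is too small.  */
static int collect_other_conds (const char *cond, char *out, size_t out_size)
{
  size_t used = 0;
  int count = 0;

  assert (out_size > 0);
  out[0] = '\0';
  while (*cond != '\0')
    {
      /* Skip separators, then measure the next option.  */
      while (*cond == ' ')
        cond++;
      size_t len = strcspn (cond, " ");
      if (len == 0)
        break;

      int is_march = len >= 6 && strncmp (cond, "march=", 6) == 0;
      int is_mabi = len >= 5 && strncmp (cond, "mabi=", 5) == 0;
      if (!is_march && !is_mabi)
        {
          /* Room for an optional separator, the option and the NUL.  */
          size_t need = len + (count > 0 ? 1 : 0);
          if (used + need + 1 > out_size)
            return -1;
          if (count > 0)
            out[used++] = ' ';
          memcpy (out + used, cond, len);
          used += len;
          out[used] = '\0';
          count++;
        }
      cond += len;
    }
  return count;
}
```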


Re: GNU Tools @ LPC 2021: Program is published

2021-09-15 Thread Gerald Pfeifer
On Wed, 15 Sep 2021, Thomas Schwinge wrote:
>> The program for the GNU Tools Track at Linux Plumbers Conference is
>> published:
>>
>>   https://linuxplumbersconf.org/event/11/sessions/109/
> This may qualify "as obvious", but I'd better get what I change on
> our front page to the Internet reviewed ;-) -- OK to push to wwwdocs master branch
> the attached "GNU Tools @ Linux Plumbers Conference 2021"?

Yes, and thank you for thinking of this!

(Maybe just say "held online" or "through videoconference".)

Gerald


Re: [PATCH v2] ipa-inline: Add target info into fn summary [PR102059]

2021-09-15 Thread Kewen.Lin via Gcc-patches
Hi Martin,

Thanks for the review comments!

On 2021/9/15 8:51 PM, Martin Jambor wrote:
> Hi,
> 
> since this is inlining-related, I would somewhat prefer Honza to have a
> look too, but I have the following comments:
> 
> On Wed, Sep 08 2021, Kewen.Lin wrote:
>>
> 
> [...]
> 
>> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
>> index 78399b0b9bb..300b8da4507 100644
>> --- a/gcc/ipa-fnsummary.h
>> +++ b/gcc/ipa-fnsummary.h
>> @@ -193,6 +194,9 @@ public:
>>vec *loop_strides;
>>/* Parameters tested by builtin_constant_p.  */
>>vec GTY((skip)) builtin_constant_p_parms;
>> +  /* Like fp_expressions, but it's to hold some target specific information,
>> + such as some target specific isa flags.  */
>> +  auto_vec GTY((skip)) target_info;
>>/* Estimated growth for inlining all copies of the function before start
>>   of small functions inlining.
>>   This value will get out of date as the callers are duplicated, but
> 
> Segher already wrote in the first thread that a vector of HOST_WIDE_INTs
> is overkill and I agree.  So at least make the new field just a
> HOST_WIDE_INT or, better yet, an unsigned int.  But I would even go
> further and make target_info only a 16-bit bit-field, place it after the
> other bit-fields in class ipa_fn_summary and pass it to the hooks as
> uint16_t.  Unless you have plans which require more space, I think we
> should be conservative here.
> 

OK, yeah, the consideration was mainly for the scenario where a target
has a number of bits to care about.  I just realized that, to avoid
inefficient bitwise operations when mapping target info bits to isa_flag
bits, the target can rearrange the sparse bits in isa_flag, so it's not
a big deal.
Thanks for re-raising this!  I'll use a 16-bit bit-field in v3 as you
suggested.  If you don't mind, I will put it before the existing
bit-fields to get good alignment.
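A minimal sketch of the layout being discussed, with hypothetical names (the real ipa_fn_summary has many more members, and the real hook signatures may differ): a 16-bit unsigned bit-field placed before the existing one-bit flags, exchanged with a target hook as uint16_t. Two consequences are worth noting: a bit-field has no address, so a hook that updates the value needs a temporary copy; and assigning a wider value keeps only the low 16 bits, so a target has to map the ISA flags it cares about into those bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for ipa_fn_summary: a 16-bit
   target_info bit-field placed before the existing one-bit flags.  */
struct fn_summary_sketch
{
  unsigned int target_info : 16;
  unsigned int inlinable : 1;
  unsigned int fp_expressions : 1;
};

/* Hypothetical target hook taking the info as uint16_t: it just ORs in
   a bit standing for some target feature (e.g. an ISA flag).  */
static void update_target_info (uint16_t *info, uint16_t feature_bit)
{
  *info |= feature_bit;
}

/* A bit-field has no address, so copy it into a uint16_t, let the hook
   update the copy, then store it back.  */
static void record_feature (struct fn_summary_sketch *s, uint16_t feature_bit)
{
  uint16_t info = (uint16_t) s->target_info;
  update_target_info (&info, feature_bit);
  s->target_info = info;
}
```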

> I am also not sure if I agree that the field should not be streamed for
> offloading, but since we do not have an offloading compiler needing them
> I guess for now that is OK. But it should be documented in the comment
> describing the field that it is not streamed to offloading compilers.
> 

Good point, will add it in v3.

> [...]
> 
> 
>> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
>> index 2470937460f..72091b6193f 100644
>> --- a/gcc/ipa-fnsummary.c
>> +++ b/gcc/ipa-fnsummary.c
>> @@ -2608,6 +2617,7 @@ analyze_function_body (struct cgraph_node *node, bool early)
>>info->conds = NULL;
>>info->size_time_table.release ();
>>info->call_size_time_table.release ();
>> +  info->target_info.release();
>>  
>>/* When optimizing and analyzing for IPA inliner, initialize loop optimizer
>>   so we can produce proper inline hints.
>> @@ -2659,6 +2669,12 @@ analyze_function_body (struct cgraph_node *node, bool early)
>> bb_predicate,
>> bb_predicate);
>>  
>> +  /* Only look for target information for inlinable functions.  */
>> +  bool scan_for_target_info =
>> +info->inlinable
>> +&& targetm.target_option.need_ipa_fn_target_info (node->decl,
>> +  info->target_info);
>> +
>>if (fbi.info)
>>  compute_bb_predicates (, node, info, params_summary);
>>const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>> @@ -2876,6 +2892,10 @@ analyze_function_body (struct cgraph_node *node, bool early)
>>if (dump_file)
>>  fprintf (dump_file, "   fp_expression set\n");
>>  }
>> +  if (scan_for_target_info)
>> +scan_for_target_info =
>> +  targetm.target_option.update_ipa_fn_target_info
>> +  (info->target_info, stmt);
>>  }
> 
> Practically it probably does not matter, but why is this in the "if
> (this_time || this_size)" block?  Although I can see that setting
> fp_expression is also done that way... but it seems like copying a
> mistake to me.

Yeah, I felt target info scanning is similar to fp_expression scanning,
so I just followed the same approach.  If I read it right, the case
!(this_time || this_size) means the STMT isn't weighted as any RTL
insn from either the time or the size perspective, so guarding on it
seems to avoid unnecessary scanning.  I assumed that target built-in
functions (bifs) and inline asm would not be evaluated as zero cost, so
it seems safe so far for the HTM usage.

Are you worried about some special STMT which is weighted as zero but
still needs to be checked for target info in the long term?
If so, I'll move it out in v3.
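The control flow being discussed can be modeled in a few lines of C. Everything here is hypothetical (statements reduced to a size, a time and a feature flag; a toy hook that stops the scan once a feature is found); it only mirrors the shape of the quoted hunk, where the hook's return value is stored back into scan_for_target_info and the call sits inside the this_time || this_size check. The second test case shows the situation Martin raises: a feature used only by a zero-cost statement is never seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced statement: its estimated size and time, and
   whether it uses a target feature of interest (e.g. an HTM builtin).  */
struct stmt_sketch
{
  int size;
  int time;
  bool uses_feature;
};

/* Toy hook: record the feature and return false to stop scanning,
   since nothing more can be learned afterwards.  */
static bool update_info (unsigned int *info, const struct stmt_sketch *stmt)
{
  if (stmt->uses_feature)
    {
      *info |= 1u;
      return false;
    }
  return true;
}

/* Mirrors the quoted hunk: only statements with non-zero size or time
   reach the hook.  Returns the number of hook calls made.  */
static int scan_body (const struct stmt_sketch *stmts, size_t n,
                      unsigned int *info)
{
  bool scan_for_target_info = true;
  int calls = 0;
  for (size_t i = 0; i < n; i++)
    {
      int this_size = stmts[i].size;
      int this_time = stmts[i].time;
      if ((this_time || this_size) && scan_for_target_info)
        {
          calls++;
          scan_for_target_info = update_info (info, &stmts[i]);
        }
    }
  return calls;
}
```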
> 
> All that said, the overall approach seems correct to me.
> 

Thanks again.
BR,
Kewen


Re: [PATCH 4/4] [PATCH 4/4] x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY

2021-09-15 Thread H.J. Lu via Gcc-patches
On Wed, Sep 15, 2021 at 4:54 PM Cui, Lili  wrote:
>
>
>
> > -Original Message-
> > From: H.J. Lu 
> > Sent: Wednesday, September 15, 2021 10:14 PM
> > To: Cui, Lili 
> > Cc: Uros Bizjak; GCC Patches; Liu, Hongtao
> > Subject: Re: [PATCH 4/4] [PATCH 4/4] x86: Add
> > TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY
> >
> > There is no need to add [PATCH N/4] in the first line of the git commit
> > message.  "git format-patch" or "git send-email" will add them 
> > automatically.
> >
> Thanks for the reminder, I didn't notice it before.
>
> > On Wed, Sep 15, 2021 at 1:10 AM  wrote:
> > >
> > > From: "H.J. Lu" 
> > >
> > > 1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> > > TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters.
> > > 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> > > TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP splitters.
> > > 3. Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and
> > > TARGET_SSE_PARTIAL_REG_DEPENDENCY when handling avx_partial_xmm_update
> > > attribute.  Don't convert AVX partial XMM register update if there is
> > > no partial SSE register dependency for SSE conversion.
> > >
> > > gcc/
> > >
> > > * config/i386/i386-features.c (remove_partial_avx_dependency):
> > > Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY
> > > and TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY before generating
> > > vxorps.
> > > * config/i386/i386.h (TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY):
> > > New.
> > > (TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
> > > * config/i386/i386.md (SSE FP to FP splitters): Replace
> > > TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> > > TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY.
> > > (SSE INT to FP splitter): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY
> > > with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY.
> > > * config/i386/x86-tune.def
> > > (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New.
> > > (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
> > >
> > > gcc/testsuite/
> > >
> > > * gcc.target/i386/avx-covert-1.c: New file.
> > > * gcc.target/i386/avx-fp-covert-1.c: Likewise.
> > > * gcc.target/i386/avx-int-covert-1.c: Likewise.
> > > * gcc.target/i386/sse-covert-1.c: Likewise.
> > > * gcc.target/i386/sse-fp-covert-1.c: Likewise.
> > > * gcc.target/i386/sse-int-covert-1.c: Likewise.
> > > ---
> > >  gcc/config/i386/i386-features.c   |  6 --
> > >  gcc/config/i386/i386.h|  4 
> > >  gcc/config/i386/i386.md   |  9 ++---
> > >  gcc/config/i386/x86-tune.def  | 15 +++
> > >  gcc/testsuite/gcc.target/i386/avx-covert-1.c  | 19 +++
> > >  .../gcc.target/i386/avx-fp-covert-1.c | 15 +++
> > >  .../gcc.target/i386/avx-int-covert-1.c| 14 ++
> > >  gcc/testsuite/gcc.target/i386/sse-covert-1.c  | 19 +++
> > >  .../gcc.target/i386/sse-fp-covert-1.c | 15 +++
> > >  .../gcc.target/i386/sse-int-covert-1.c| 14 ++
> > >  10 files changed, 125 insertions(+), 5 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx-covert-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx-int-covert-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/sse-covert-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/sse-int-covert-1.c
> > >
> > > diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> > > index ae5ea02a002..91bfa06d4bf 100644
> > > --- a/gcc/config/i386/i386-features.c
> > > +++ b/gcc/config/i386/i386-features.c
> > > @@ -2218,14 +2218,16 @@ remove_partial_avx_dependency (void)
> > >   machine_mode dest_mode = GET_MODE (dest);
> > >   machine_mode src_mode;
> > >
> > > - if (TARGET_USE_VECTOR_FP_CONVERTS)
> > > + if (TARGET_USE_VECTOR_FP_CONVERTS
> > > + || !TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY)
> > > {
> > >   src_mode = GET_MODE (XEXP (src, 0));
> > >   if (src_mode == E_SFmode || src_mode == E_DFmode)
> > > continue;
> > > }
> > >
> > > - if (TARGET_USE_VECTOR_CONVERTS)
> > > + if (TARGET_USE_VECTOR_CONVERTS
> > > + || !TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY)
> > > {
> > >   src_mode = GET_MODE (XEXP (src, 0));
> > >   if (src_mode == E_SImode || src_mode == E_DImode)
> > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > > index
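The decision made by the two modified checks in remove_partial_avx_dependency can be summarized as a small predicate. A sketch under stated assumptions: the tuning macros are modeled as fields of a hypothetical struct, only the source modes from the quoted hunk are considered, and the second branch is assumed to end in continue like the first (the quoted hunk is cut off there). Returning true means the insn is skipped, so no vxorps is inserted for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant tuning macros.  */
struct tune_sketch
{
  bool use_vector_fp_converts;             /* TARGET_USE_VECTOR_FP_CONVERTS */
  bool use_vector_converts;                /* TARGET_USE_VECTOR_CONVERTS */
  bool partial_reg_fp_converts_dependency; /* ..._FP_CONVERTS_DEPENDENCY */
  bool partial_reg_converts_dependency;    /* ..._CONVERTS_DEPENDENCY */
};

enum src_mode_sketch { MODE_SF, MODE_DF, MODE_SI, MODE_DI };

/* Returns true if the two checks in the quoted hunk skip the insn.  */
static bool skip_partial_avx_fix (const struct tune_sketch *t,
                                  enum src_mode_sketch src_mode)
{
  /* FP to FP conversions (SFmode/DFmode source).  */
  if (t->use_vector_fp_converts || !t->partial_reg_fp_converts_dependency)
    {
      if (src_mode == MODE_SF || src_mode == MODE_DF)
        return true;
    }

  /* INT to FP conversions (SImode/DImode source); assumed to continue
     like the branch above.  */
  if (t->use_vector_converts || !t->partial_reg_converts_dependency)
    {
      if (src_mode == MODE_SI || src_mode == MODE_DI)
        return true;
    }

  return false;
}
```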

[PATCH] rs6000: Modify the way for extra penalized cost

2021-09-15 Thread Kewen.Lin via Gcc-patches
Hi,

This patch follows the discussion here[1], where Segher pointed
out that the existing way of guarding the extra penalized cost for
strided/elementwise loads with a magic bound doesn't scale.

Using nunits * stmt_cost can give a much exaggerated penalized
cost; for example, for V16QI on P8 it's 16 * 20 = 320, which is
why we need a bound.  To make it scale, this patch no longer uses
nunits * stmt_cost, but it still keeps nunits since there really
are nunits scalar loads.  Instead, it uses a cost adjusted from
stmt_cost: since the current stmt_cost already partly accounts for
nunits, we can stabilize the cost for big nunits and retain the
cost for small nunits.  After some experiments, this patch computes
the adjusted cost as:

stmt_cost / (log2(nunits) * log2(nunits))

For V16QI, the adjusted cost would be 1 and the total penalized
cost 16, which isn't exaggerated.  For V2DI, the adjusted cost
would be 2 and the total penalized cost 4, which is the same as
before.  BTW, I tried using a single log2(nunits), but the
penalized cost was still too big and couldn't fix the degraded
benchmark blender_r.
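The numbers above can be reproduced with a small C model of the old and new penalties. This sketch assumes the stmt_cost values quoted in the mail (20 for V16QI on Power8, and 2 for V2DI as implied by the V2DI figures); exact_log2_sketch is a hypothetical stand-in for GCC's exact_log2, and the asserts mirror the two gcc_assert calls in the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's exact_log2: log2 (x) if x is a power
   of two, otherwise -1.  */
static int exact_log2_sketch (unsigned int x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while (x > 1)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Old scheme: nunits * stmt_cost, capped at 12.  */
static unsigned int old_extra_cost (unsigned int nunits, unsigned int stmt_cost)
{
  const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
  unsigned int extra_cost = nunits * stmt_cost;
  return extra_cost > MAX_PENALIZED_COST_FOR_CTOR
         ? MAX_PENALIZED_COST_FOR_CTOR : extra_cost;
}

/* New scheme: nunits * (stmt_cost / log2 (nunits)^2), using integer
   division as the patch does.  */
static unsigned int new_extra_cost (unsigned int nunits, unsigned int stmt_cost)
{
  int nunits_log2 = exact_log2_sketch (nunits);
  assert (nunits_log2 > 0);
  unsigned int nunits_sq = (unsigned int) (nunits_log2 * nunits_log2);
  unsigned int adjusted_cost = stmt_cost / nunits_sq;
  assert (adjusted_cost > 0);
  return nunits * adjusted_cost;
}
```

With these inputs, old_extra_cost (16, 20) is 12 (capped) and new_extra_cost (16, 20) is 16, while both schemes give 4 for V2DI; a single log2 divisor would give 16 * (20 / 4) = 80 for V16QI. Because the division is integral, any stmt_cost below log2(nunits)^2 (below 16 for V16QI) yields an adjusted cost of 0, so the second assertion would fire for such costs.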

Separate SPEC2017 evaluations on Power8, Power9 and Power10
with the option sets O2-vect and Ofast-unroll showed that this
change is neutral (that is, it has the same effect as before).

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
the way to compute extra penalized cost.

---
 gcc/config/rs6000/rs6000.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4ab23b0ab33..e08b94c0447 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5454,17 +5454,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
{
  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
  unsigned int nunits = vect_nunits_for_cost (vectype);
- unsigned int extra_cost = nunits * stmt_cost;
- /* As function rs6000_builtin_vectorization_cost shows, we have
-priced much on V16QI/V8HI vector construction as their units,
-if we penalize them with nunits * stmt_cost, it can result in
-an unreliable body cost, eg: for V16QI on Power8, stmt_cost
-is 20 and nunits is 16, the extra cost is 320 which looks
-much exaggerated.  So let's use one maximum bound for the
-extra penalized cost for vector construction here.  */
- const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
- if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
-   extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
+ /* As function rs6000_builtin_vectorization_cost shows, we
+have priced much on V16QI/V8HI vector construction by
+considering their units, if we penalize them with nunits
+* stmt_cost here, it can result in an unreliable body cost,
+eg: for V16QI on Power8, stmt_cost is 20 and nunits is 16,
+the penalty will be 320 which looks much exaggerated.  But
+there are actually nunits scalar loads, so we try to adopt
+one reasonable penalized cost for each load rather than
+stmt_cost.  Here, with stmt_cost dividing by log2(nunits)^2,
+we can still retain the necessary penalty for small nunits
+meanwhile stabilize the penalty for big nunits.  */
+ int nunits_log2 = exact_log2 (nunits);
+ gcc_assert (nunits_log2 > 0);
+ unsigned int nunits_sq = nunits_log2 * nunits_log2;
+ unsigned int adjusted_cost = stmt_cost / nunits_sq;
+ gcc_assert (adjusted_cost > 0);
+ unsigned int extra_cost = nunits * adjusted_cost;
  data->extra_ctor_cost += extra_cost;
}
 }
--
2.25.1


RE: [PATCH 4/4] [PATCH 4/4] x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY

2021-09-15 Thread Cui, Lili via Gcc-patches


> -Original Message-
> From: H.J. Lu 
> Sent: Wednesday, September 15, 2021 10:14 PM
> To: Cui, Lili 
> Cc: Uros Bizjak; GCC Patches; Liu, Hongtao
> Subject: Re: [PATCH 4/4] [PATCH 4/4] x86: Add
> TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY
> 
> There is no need to add [PATCH N/4] in the first line of the git commit
> message.  "git format-patch" or "git send-email" will add them automatically.
> 
Thanks for the reminder, I didn't notice it before.

> On Wed, Sep 15, 2021 at 1:10 AM  wrote:
> >
> > From: "H.J. Lu" 
> >
> > 1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> > TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters.
> > 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> > TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP splitters.
> > 3. Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and
> > TARGET_SSE_PARTIAL_REG_DEPENDENCY when handling avx_partial_xmm_update
> > attribute.  Don't convert AVX partial XMM register update if there is
> > no partial SSE register dependency for SSE conversion.
> >
> > gcc/
> >
> > * config/i386/i386-features.c (remove_partial_avx_dependency):
> > Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY
> > and TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY before generating
> > vxorps.
> > * config/i386/i386.h (TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY):
> > New.
> > (TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
> > * config/i386/i386.md (SSE FP to FP splitters): Replace
> > TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> > TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY.
> > (SSE INT to FP splitter): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY
> > with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY.
> > * config/i386/x86-tune.def
> > (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New.
> > (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
> >
> > gcc/testsuite/
> >
> > * gcc.target/i386/avx-covert-1.c: New file.
> > * gcc.target/i386/avx-fp-covert-1.c: Likewise.
> > * gcc.target/i386/avx-int-covert-1.c: Likewise.
> > * gcc.target/i386/sse-covert-1.c: Likewise.
> > * gcc.target/i386/sse-fp-covert-1.c: Likewise.
> > * gcc.target/i386/sse-int-covert-1.c: Likewise.
> > ---
> >  gcc/config/i386/i386-features.c   |  6 --
> >  gcc/config/i386/i386.h|  4 
> >  gcc/config/i386/i386.md   |  9 ++---
> >  gcc/config/i386/x86-tune.def  | 15 +++
> >  gcc/testsuite/gcc.target/i386/avx-covert-1.c  | 19 +++
> >  .../gcc.target/i386/avx-fp-covert-1.c | 15 +++
> >  .../gcc.target/i386/avx-int-covert-1.c| 14 ++
> >  gcc/testsuite/gcc.target/i386/sse-covert-1.c  | 19 +++
> >  .../gcc.target/i386/sse-fp-covert-1.c | 15 +++
> >  .../gcc.target/i386/sse-int-covert-1.c| 14 ++
> >  10 files changed, 125 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx-covert-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx-int-covert-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse-covert-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse-int-covert-1.c
> >
> > diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> > index ae5ea02a002..91bfa06d4bf 100644
> > --- a/gcc/config/i386/i386-features.c
> > +++ b/gcc/config/i386/i386-features.c
> > @@ -2218,14 +2218,16 @@ remove_partial_avx_dependency (void)
> >   machine_mode dest_mode = GET_MODE (dest);
> >   machine_mode src_mode;
> >
> > - if (TARGET_USE_VECTOR_FP_CONVERTS)
> > + if (TARGET_USE_VECTOR_FP_CONVERTS
> > + || !TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY)
> > {
> >   src_mode = GET_MODE (XEXP (src, 0));
> >   if (src_mode == E_SFmode || src_mode == E_DFmode)
> > continue;
> > }
> >
> > - if (TARGET_USE_VECTOR_CONVERTS)
> > + if (TARGET_USE_VECTOR_CONVERTS
> > + || !TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY)
> > {
> >   src_mode = GET_MODE (XEXP (src, 0));
> >   if (src_mode == E_SImode || src_mode == E_DImode)
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index e76bb55c080..ec60b89753e 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -334,6 +334,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
> > ix86_tune_features[X86_TUNE_PARTIAL_REG_DEPENDENCY]
> >  #define 

Re: [PATCH] c++: fix cases of core1001/1322 by not dropping cv-qualifier of function parameter of type of typename or decltype[PR101402,PR102033,PR102034,PR102039,PR102044]

2021-09-15 Thread Jason Merrill via Gcc-patches

On 8/31/21 09:55, nick huang via Gcc-patches wrote:

These bugs are considered duplicates of PR51851, which has been suspended
since 2012; the issue is known as "core1001/1322". Given this background,
it deserves a long explanation.

Many people believe the root cause of this family of bugs is related to
how and when the array type is converted to a pointer type while the
function signature is calculated. This is true, but we need to go into
detail to understand the exact reason.

There is a pattern in these bugs (PR101402, PR102033, PR102034, PR102039). In
the template function declaration, the function parameter consists of a
"const" followed by a typename-type which is actually an array type. According
to the standard, the function signature is calculated by dropping so-called
"top-level cv-qualifiers". As a result, the template specialization complains
that no matching declaration can be found, because the specialization has the
const while the template function declaration does not, since it was dropped
as described. Obviously the template function declaration should NOT drop the
const. But why? Let's first review the procedure in the standard.
(https://timsong-cpp.github.io/cppwp/dcl.fct#5.sentence-3)

"After determining the type of each parameter, any parameter of type “array of T”
or of function type T is adjusted to be “pointer to T”. After producing the list
of parameter types, any top-level cv-qualifiers modifying a parameter type are
deleted when forming the function type."

Please note that deleting top-level cv-qualifiers happens in the last stage,
after the array type is converted to a pointer type. More importantly, there
are two conditions:
a) Each type must be able to be determined.
b) The cv-qualifier must be top-level.
Let's analyze whether these two conditions can be met, one by one.
1) The keyword "typename" indicates that, inside a template, a dependent name
is involved (https://timsong-cpp.github.io/cppwp/n4659/temp.res#2), for which
name lookup can be postponed until template instantiation. Clearly the type of
a dependent name cannot be determined without name lookup. So we can NOT
proceed to the next step until the concrete template argument type is
determined during specialization.
2) After “array of T” is converted to “pointer to T”, the cv-qualifiers are no
longer top-level! Unfortunately, the standard has no definition of
"top-level". Mr. Dan Saks's articles (https://www.dansaks.com/articles.shtml)
are a tremendous help! In particular, this wonderful paper
(https://www.dansaks.com/articles/2000-02%20Top-Level%20cv-Qualifiers%20in%20Function%20Parameters.pdf)
discusses this topic in detail. In one short sentence: the "const" before an
array type is NOT a top-level cv-qualifier and should NOT be dropped.

So, understanding the root cause makes the fix very clear: Let's NOT drop
cv-qualifier for typename-type inside template. Leave this task for template
substitution later when template specialization locks template argument types.

Similarly inside template, "decltype" may also include dependent name and
the best strategy for parser is to preserve all original declaration and
postpone the task till template substitution.

Here is an interesting observation to share. Originally my fix tried to use
the function "resolve_typename_type" to check whether the "typename-type" is
indeed an array type, so as to decide whether the const should be dropped. It
works for PR101402 and PR102033 (with a small fix to that function), but cannot
handle PR102034 or PR102039. PR102039 in particular is impossible, because it
depends on the template argument. This made me realize that the parser should
not do any work it cannot complete with 100% success. All of it can wait.

Finally, I want to acknowledge the other effort to tackle core issues 1001/1322,
in PR92010. It takes a different approach, which this fix cannot replace:
rebuilding the template function signature during template substitution.
After all, this fix can only deal with dependent types starting with "typename"
or "decltype", which is not the case in PR92010.


Unfortunately, your patch breaks

template 
struct A
{
  void f(T);
};

template 
void A::f(const T)
{ }

which is certainly questionable code, but is currently also accepted by 
clang and EDG compilers.


Why doesn't the PR92010 fix address these testcases as well?


gcc/cp/ChangeLog:

2021-08-30  qingzhe huang  

* decl.c (grokparms):

gcc/testsuite/ChangeLog:

2021-08-30  qingzhe huang  

* g++.dg/parse/pr101402.C: New test.
* g++.dg/parse/pr102033.C: New test.
* g++.dg/parse/pr102034.C: New test.
* g++.dg/parse/pr102039.C: New test.
* g++.dg/parse/pr102044.C: New test.


diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index e0c603aaab6..940c43ce707 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -14384,7 +14384,16 @@ grokparms (tree parmlist, tree *parms)
  
  	  /* Top-level qualifiers on the parameters are

 ignored for function types.  */
- type = 

[PATCH] doc: improve -fsanitize=undefined description

2021-09-15 Thread Diane Meirowitz via Gcc-patches

doc: improve -fsanitize=undefined description

gcc/ChangeLog:
* doc/invoke.texi: Add link to UndefinedBehaviorSanitizer documentation,
mention UBSAN_OPTIONS, similar to what is done for AddressSanitizer.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 78cfc100ac2..f022885edf8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15200,7 +15200,8 @@ The option cannot be combined with @option{-fsanitize=thread}.
@opindex fsanitize=undefined
Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector.
Various computations are instrumented to detect undefined behavior
-at runtime.  Current suboptions are:
+at runtime.  See @uref{https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html} for more details.  The run-time behavior can be influenced using the
+@env{UBSAN_OPTIONS} environment variable.  Current suboptions are:

@table @gcctabopt




Re: [PATCH] C++: add type checking for static local vector variable in template

2021-09-15 Thread Jason Merrill via Gcc-patches

On 9/6/21 08:10, wangpc via Gcc-patches wrote:

This patch adds type checking for static local vector variables in
C++ templates.  Both AArch64 SVE and RISCV RVV vector types are sizeless,
and they all have this issue.

2021-08-06  wangpc  

gcc/cp/ChangeLog

 * pt.c (tsubst_decl): Add type checking.

gcc/testsuite/ChangeLog

 * g++.target/aarch64/sve/static-var-in-template.C: New test.
---
  gcc/cp/pt.c|  8 +++-
  .../aarch64/sve/static-var-in-template.C   | 18 ++
  2 files changed, 25 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f0aa626ab723..988f4cb1e73f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -14731,7 +14731,13 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 even if its underlying type is not.  */
  TYPE_DEPENDENT_P_VALID (TREE_TYPE (r)) = false;
  }
-
+/* We should verify a static local variable's type,
+since sizeless vector types do not have a fixed size.  */
+if (TREE_STATIC (t)
+  &&!verify_type_context (input_location, TCTX_STATIC_STORAGE, type))


It seems that this was missed before because we checked for this in
start_decl, which isn't called for template instantiation.
Would it work to move the verify_type_context code from start_decl to 
cp_finish_decl, near the other call to verify_type_context, instead of 
doing anything here?



+{
+  RETURN (error_mark_node);
+}
layout_decl (r, 0);
}
break;
diff --git a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
new file mode 100644
index ..26d397ca565d
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+
+#include 
+
+template 
+void f()
+{
+int i = 0;
+static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+f<2>();
+return 0;
+}
+
+/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */





[PATCH] Fix PR 67102: Add libstdc++ dependency to libffi

2021-09-15 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The error message is obvious: -funconfigured-libstdc++-v3 is used
on the g++ command line.  So we just add the dependency.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

ChangeLog:

* Makefile.def: Have configure-target-libffi depend on
all-target-libstdc++-v3.
* Makefile.in: Regenerate.
---
 Makefile.def | 1 +
 Makefile.in  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/Makefile.def b/Makefile.def
index de3e0052106..90316364d01 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -592,6 +592,7 @@ dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
 dependencies = { module=all-target-fastjar; on=all-target-zlib; };
 dependencies = { module=configure-target-libgo; on=configure-target-libffi; };
 dependencies = { module=configure-target-libgo; on=all-target-libstdc++-v3; };
+dependencies = { module=configure-target-libffi; on=all-target-libstdc++-v3; };
 dependencies = { module=all-target-libgo; on=all-target-libbacktrace; };
 dependencies = { module=all-target-libgo; on=all-target-libffi; };
 dependencies = { module=all-target-libgo; on=all-target-libatomic; };
diff --git a/Makefile.in b/Makefile.in
index 61af99dc75a..81b26c7177e 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -61261,6 +61261,7 @@ all-bison: maybe-all-intl
 all-flex: maybe-all-intl
 all-m4: maybe-all-intl
 configure-target-libgo: maybe-all-target-libstdc++-v3
+configure-target-libffi: maybe-all-target-libstdc++-v3
 configure-target-liboffloadmic: maybe-configure-target-libgomp
 all-target-liboffloadmic: maybe-all-target-libgomp
 configure-target-newlib: maybe-all-binutils
-- 
2.17.1



Re: [PATCH] c++: fix wrong fixit hints for misspelled typedef [PR77565]

2021-09-15 Thread Jason Merrill via Gcc-patches

On 9/14/21 04:29, Michel Morin via Gcc-patches wrote:

On Tue, Sep 14, 2021 at 7:14 AM David Malcolm  wrote:


On Tue, 2021-09-14 at 03:35 +0900, Michel Morin via Gcc-patches wrote:

Hi,

PR77565 reports that, with the code `typdef int Int;`, GCC emits
"did you mean 'typeof'?" instead of "did you mean 'typedef'?".

This happens because the typo corrector determines that `typeof` is a
candidate for suggestion (through `cp_keyword_starts_decl_specifier_p`),
but `typedef` is not.
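The sketch below shows why the candidate list matters. It picks the nearest keyword by plain Levenshtein distance. It is not GCC's spell-checking code, which has its own distance function and cut-off rules, and the helper names are hypothetical. `typdef` is two edits away from `typeof` but only one from `typedef`, so `typeof` wins only while `typedef` is missing from the candidates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic Levenshtein distance: insert, delete and substitute each cost 1.
std::size_t edit_distance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = j;  // distance from the empty prefix of a
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
    }
    std::swap(prev, cur);  // prev now holds row i
  }
  return prev[b.size()];
}

// Hypothetical suggester: closest candidate wins, earlier ones win ties.
std::string best_suggestion(const std::string& word,
                            const std::vector<std::string>& candidates) {
  std::string best;
  std::size_t best_dist = static_cast<std::size_t>(-1);
  for (const std::string& c : candidates) {
    std::size_t d = edit_distance(word, c);
    if (d < best_dist) {
      best_dist = d;
      best = c;
    }
  }
  return best;
}
```

A real front end also needs a cut-off, so that a far-away keyword is not suggested for an unrelated identifier. This sketch omits one.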

This patch fixes the issue by adding `typedef` as a candidate. The patch
additionally adds the `inline` specifier and cv-qualifiers as candidates.
Here is a patch (tests `make check-gcc` pass on darwin):


Thanks for this patch (and for reporting the bug in the first place).

I notice that, as well as being used for fix-it hints by
lookup_name_fuzzy (indirectly via suggest_rid_p),
cp_keyword_starts_decl_specifier_p is also used by
cp_lexer_next_token_is_decl_specifier_keyword, which is used by
cp_parser_lambda_declarator_opt and cp_parser_constructor_declarator_p.


Ah, you're right! Thank you for pointing this out.
I failed to grep those functions somehow.

One thing that confuses me is that cp_keyword_starts_decl_specifier_p
misses many keywords that can start decl-specifiers (e.g.
typedef/inline/cv-qual and friend/explicit/virtual).
So let's wait for the C++ frontend maintainers ;)


That is strange.  Let's add all the rest of them as well.


So I'm not sure if this fix is exactly correct - hopefully one of the
C++ frontend maintainers can chime in.  If
cp_keyword_starts_decl_specifier_p isn't quite the right place for
this, the fix could probably go in suggest_rid_p instead, which *is*
specific to spelling corrections.

Hope this is constructive; thanks again for the patch
Dave






c++: add typo corrections for typedef/inline/cv-qual [PR77565]

PR c++/77565

gcc/cp/ChangeLog:

* parser.c (cp_keyword_starts_decl_specifier_p): Handle
typedef/inline specifiers and cv-qualifiers.

gcc/testsuite/ChangeLog:

* g++.dg/spellcheck-typenames.C: Add tests for decl-specs.

--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -1051,6 +1051,12 @@ cp_keyword_starts_decl_specifier_p (enum rid keyword)
  case RID_FLOAT:
  case RID_DOUBLE:
  case RID_VOID:
+  /* CV qualifiers.  */
+case RID_CONST:
+case RID_VOLATILE:
+  /* typedef/inline specifiers.  */
+case RID_TYPEDEF:
+case RID_INLINE:
/* GNU extensions.  */
  case RID_ATTRIBUTE:
  case RID_TYPEOF:
--- a/gcc/testsuite/g++.dg/spellcheck-typenames.C
+++ b/gcc/testsuite/g++.dg/spellcheck-typenames.C
@@ -76,3 +76,38 @@ singed char ch; // { dg-error "1: 'singed' does not name a type; did you mean 's
   ^~
   signed
 { dg-end-multiline-output "" } */
+
+typdef int my_int; // { dg-error "1: 'typdef' does not name a type; did you mean 'typedef'?" }
+/* { dg-begin-multiline-output "" }
+ typdef int my_int;
+ ^~
+ typedef
+   { dg-end-multiline-output "" } */
+
+inlien int inline_func(); // { dg-error "1: 'inlien' does not name a type; did you mean 'inline'?" }
+/* { dg-begin-multiline-output "" }
+ inlien int inline_func();
+ ^~
+ inline
+   { dg-end-multiline-output "" } */
+
+coonst int ci = 0; // { dg-error "1: 'coonst' does not name a type; did you mean 'const'?" }
+/* { dg-begin-multiline-output "" }
+ coonst int ci = 0;
+ ^~
+ const
+   { dg-end-multiline-output "" } */
+
+voltil int vi; // { dg-error "1: 'voltil' does not name a type; did you mean 'volatile'?" }
+/* { dg-begin-multiline-output "" }
+ voltil int vi;
+ ^~
+ volatile
+   { dg-end-multiline-output "" } */
+
+statik int si; // { dg-error "1: 'statik' does not name a type; did you mean 'static'?" }
+/* { dg-begin-multiline-output "" }
+ statik int si;
+ ^~
+ static
+   { dg-end-multiline-output "" } */


--
Regards,
Michel









[pushed] c++: add parsing_function_declarator predicate

2021-09-15 Thread Jason Merrill via Gcc-patches
While looking at PR96184 I noticed that we were recognizing the situation of
parsing a function declarator based on current_binding_level, and that we
ought to make that a predicate function.  This patch is just refactoring,
but I just suggested using it in a review of another patch.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* cp-tree.h (parsing_function_declarator): Declare.
* name-lookup.c (set_decl_context_in_fn): Use it.
* parser.c (cp_parser_direct_declarator): Use it.
(parsing_function_declarator): New.
---
 gcc/cp/cp-tree.h |  1 +
 gcc/cp/name-lookup.c |  7 ++-
 gcc/cp/parser.c  | 13 -
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 060d1a0a3db..e5f632afba4 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7136,6 +7136,7 @@ extern void cp_convert_omp_range_for (tree &, vec *, tree &,
  tree &, tree &, tree &, tree &, tree &);
 extern void cp_finish_omp_range_for (tree, tree);
 extern bool parsing_nsdmi (void);
+extern bool parsing_function_declarator ();
 extern bool parsing_default_capturing_generic_lambda_in_template (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
 extern location_t defparse_location (tree);
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 8e9c61e1ee8..ddee8b390f9 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -3363,12 +3363,9 @@ set_decl_context_in_fn (tree ctx, tree decl)
 
   if (!DECL_CONTEXT (decl)
   /* When parsing the parameter list of a function declarator,
-don't set DECL_CONTEXT to an enclosing function.  When we
-push the PARM_DECLs in order to process the function body,
-current_binding_level->this_entity will be set.  */
+don't set DECL_CONTEXT to an enclosing function.  */
   && !(TREE_CODE (decl) == PARM_DECL
-  && current_binding_level->kind == sk_function_parms
-  && current_binding_level->this_entity == NULL))
+  && parsing_function_declarator ()))
 DECL_CONTEXT (decl) = ctx;
 }
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 7a0b6234350..8d60f40706b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -23107,7 +23107,7 @@ cp_parser_direct_declarator (cp_parser* parser,
  else if (!cp_parser_uncommitted_to_tentative_parse_p (parser))
/* Let compute_array_index_type diagnose this.  */;
  else if (!parser->in_function_body
-  || current_binding_level->kind == sk_function_parms)
+  || parsing_function_declarator ())
{
  /* Normally, the array bound must be an integral constant
 expression.  However, as an extension, we allow VLAs
@@ -23831,6 +23831,17 @@ parsing_nsdmi (void)
   return false;
 }
 
+/* True if we're parsing a function declarator.  */
+
+bool
+parsing_function_declarator ()
+{
+  /* this_entity is NULL for a function parameter scope while parsing the
+ declarator; it is set when parsing the body of the function.  */
+  return (current_binding_level->kind == sk_function_parms
+ && !current_binding_level->this_entity);
+}
+
 /* Parse a late-specified return type, if any.  This is not a separate
non-terminal, but part of a function declarator, which looks like
 

base-commit: 4320a4b717dcccddf230d0b944bfc5a7ae282508
-- 
2.27.0



Re: [PATCH][v2] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-15 Thread Koning, Paul via Gcc-patches



> On Sep 13, 2021, at 3:31 AM, Richard Biener  wrote:
> 
> This makes defaults.h choose DWARF2_DEBUG if PREFERRED_DEBUGGING_TYPE
> is not specified by the target and NO_DEBUG if DWARF is not supported.

As I'm looking at questions about old debug formats, it brings up the question 
of old object formats.  I don't remember what the status of a.out is.  Is that 
considered deprecated?  Still current?  Of course most targets use elf, but is 
there an expectation to move away from a.out the way there is an expectation to 
move away from STABS?

Is this actually a binutils rather than a gcc question?

paul



Re: [PATCH] c++: Fix handling of decls with flexible array members initialized with side-effects [PR88578]

2021-09-15 Thread Jason Merrill via Gcc-patches

On 9/15/21 14:59, Jakub Jelinek wrote:

On Tue, Sep 14, 2021 at 10:50:32AM -0400, Jason Merrill wrote:

Note, if the flexible array member is initialized only with non-constant
initializers, we have a worse bug that this patch doesn't solve, the
splitting of initializers into constant and dynamic initialization removes
the initializer and we don't have just wrong DECL_*SIZE, but nothing is
emitted when emitting those vars into assembly either and so the dynamic
initialization clobbers other vars that may overlap the variable.
I think we need keep an empty CONSTRUCTOR elt in DECL_INITIAL for the
flexible array member in that case.


Makes sense.


So, the following patch fixes that.

The typeck2.c change makes sure we keep those CONSTRUCTORs around (although
they should be empty, because all their elts had side effects or were
non-constant if they were removed earlier), and the varasm.c change is to avoid
ICEs on those, as well as ICEs on other flex array members that had some
initializers without side effects, but not on the last array element.

The code was already asserting that the (index of the last elt in the
CONSTRUCTOR + 1) times elt size is equal to TYPE_SIZE_UNIT of the local->val
type. That is true for C flex arrays, or for C++ if they don't have any
side effects or the last elt doesn't have side effects. This patch changes
that to an assertion that the TYPE_SIZE_UNIT is greater than or equal to the
offset of the end of the last element in the CONSTRUCTOR, and uses TYPE_SIZE_UNIT
(int_size_in_bytes) in the code later on.
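The size arithmetic behind the relaxed assertion can be modelled with a small, hypothetical sketch. It is plain C++, not the varasm.c code. The size derived from the CONSTRUCTOR is (index of the last kept element + 1) times the element size. The type size covers every element the initializer originally declared. Once trailing elements with side effects are moved to dynamic initialization, the first size can be smaller than the second, but never larger.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the CONSTRUCTOR-based size: only the indices that
// survived in the static initializer count.
std::size_t size_from_constructor(const std::vector<std::size_t>& kept_indices,
                                  std::size_t elt_size) {
  if (kept_indices.empty())
    return 0;  // every element was split out into dynamic initialization
  std::size_t last = 0;
  for (std::size_t i : kept_indices)
    if (i > last)
      last = i;
  return (last + 1) * elt_size;
}

// Hypothetical model of the type-based size: all declared elements.
std::size_t size_from_type(std::size_t nelts, std::size_t elt_size) {
  return nelts * elt_size;
}
```

As an illustration, assume 4-byte elements and a 4-element flexible array whose elements 1 and 3 have side effects. The kept indices are {0, 2}, which gives 12 bytes from the CONSTRUCTOR versus 16 from the type. The old equality check fails, but `>=` holds. Emitting the type size also reserves room for the dynamically initialized tail.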

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-09-15  Jakub Jelinek  

PR c++/88578
PR c++/102295
gcc/
* varasm.c (output_constructor_regular_field): Instead of assertion
that array_size_for_constructor result is equal to size of
TREE_TYPE (local->val) in bytes, assert that the type size is greater
or equal to array_size_for_constructor result and use type size as
fieldsize.
gcc/cp/
* typeck2.c (split_nonconstant_init_1): Don't throw away empty
initializers of flexible array members if they have non-zero type
size.
gcc/testsuite/
* g++.dg/ext/flexary39.C: New test.
* g++.dg/ext/flexary40.C: New test.

--- gcc/varasm.c.jj 2021-05-10 12:22:30.437451816 +0200
+++ gcc/varasm.c 2021-09-15 13:19:02.841554574 +0200
@@ -5531,14 +5531,15 @@ output_constructor_regular_field (oc_loc
  && (!TYPE_DOMAIN (TREE_TYPE (local->field))
  || !TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (local->field)
{
- fieldsize = array_size_for_constructor (local->val);
+ unsigned HOST_WIDE_INT fldsize
+   = array_size_for_constructor (local->val);
+ fieldsize = int_size_in_bytes (TREE_TYPE (local->val));
+ gcc_checking_assert (fieldsize >= fldsize);


This assert needs a comment about when they can be unequal.  OK with 
that change.



  /* Given a non-empty initialization, this field had better
 be last.  Given a flexible array member, the next field
 on the chain is a TYPE_DECL of the enclosing struct.  */
  const_tree next = DECL_CHAIN (local->field);
  gcc_assert (!fieldsize || !next || TREE_CODE (next) != FIELD_DECL);
- tree size = TYPE_SIZE_UNIT (TREE_TYPE (local->val));
- gcc_checking_assert (compare_tree_int (size, fieldsize) == 0);
}
else
fieldsize = tree_to_uhwi (DECL_SIZE_UNIT (local->field));
--- gcc/cp/typeck2.c.jj 2021-09-01 21:30:30.520306823 +0200
+++ gcc/cp/typeck2.c 2021-09-15 14:15:59.049388381 +0200
@@ -524,7 +524,20 @@ split_nonconstant_init_1 (tree dest, tre
sub = build3 (COMPONENT_REF, inner_type, dest, field_index,
  NULL_TREE);
  
-	  if (!split_nonconstant_init_1 (sub, value, true))

+ if (!split_nonconstant_init_1 (sub, value, true)
+ /* For flexible array member with initializer we
+can't remove the initializer, because only the
+initializer determines how many elements the
+flexible array member has.  */
+ || (!array_type_p
+ && TREE_CODE (inner_type) == ARRAY_TYPE
+ && TYPE_DOMAIN (inner_type) == NULL
+ && TREE_CODE (TREE_TYPE (value)) == ARRAY_TYPE
+ && COMPLETE_TYPE_P (TREE_TYPE (value))
+ && !integer_zerop (TYPE_SIZE (TREE_TYPE (value)))
+ && idx == CONSTRUCTOR_NELTS (init) - 1
+ && TYPE_HAS_TRIVIAL_DESTRUCTOR
+   (strip_array_types (inner_type
complete_p = false;
  else
{
--- gcc/testsuite/g++.dg/ext/flexary39.C.jj 2021-09-15 14:01:33.811320756 +0200
+++ gcc/testsuite/g++.dg/ext/flexary39.C 2021-09-15 14:04:05.962221748

Re: [PATCH] coroutines: Small cleanups to await_statement_walker [NFC].

2021-09-15 Thread Jason Merrill via Gcc-patches

On 9/15/21 14:32, Iain Sandoe wrote:

Hi Jason,


On 15 Sep 2021, at 18:32, Jason Merrill  wrote:

On 9/14/21 11:36, Iain Sandoe wrote:

Hi
Some small code cleanups that allow us to have just one place that
we handle a statement with await expression(s) embedded.  Also we
can reduce the work done to figure out whether a statement contains
any such expressions.
tested on x86_64,powerpc64le-linux x86_64-darwin
OK for master?
thanks
Iain
-
There is no need to make a MODIFY_EXPR for any of the condition
vars that we synthesize.
Expansion of co_return can be carried out independently of any
co_awaits that might be contained which simplifies this.
Where we are rewriting statements to handle await expression
logic, there is no need to carry out any analysis - we just need
to detect the presence of any co_await.
Signed-off-by: Iain Sandoe 
gcc/cp/ChangeLog:
* coroutines.cc (await_statement_walker): Code cleanups.
---
  gcc/cp/coroutines.cc | 121 ---
  1 file changed, 56 insertions(+), 65 deletions(-)
diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index d2cc2e73c89..27556723b71 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3412,16 +3412,11 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
return NULL_TREE;
  }
  -  /* We have something to be handled as a single statement.  */
-  bool has_cleanup_wrapper = TREE_CODE (*stmt) == CLEANUP_POINT_EXPR;
-  hash_set visited;
-  awpts->saw_awaits = 0;
-  hash_set truth_aoif_to_expand;
-  awpts->truth_aoif_to_expand = _aoif_to_expand;
-  awpts->needs_truth_if_exp = false;
-  awpts->has_awaiter_init = false;
+  /* We have something to be handled as a single statement.  We have to handle
+ a few statements specially where await statements have to be moved out of
+ constructs.  */
tree expr = *stmt;
-  if (has_cleanup_wrapper)
+  if (TREE_CODE (*stmt) == CLEANUP_POINT_EXPR)
  expr = TREE_OPERAND (expr, 0);
STRIP_NOPS (expr);
  @@ -3437,6 +3432,8 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
   transforms can be implemented.  */
case IF_STMT:
  {
+   tree *await_ptr;
+   hash_set visited;
/* Transform 'if (cond with awaits) then stmt1 else stmt2' into
   bool cond = cond with awaits.
   if (cond) then stmt1 else stmt2.  */
@@ -3444,10 +3441,8 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
/* We treat the condition as if it was a stand-alone statement,
   to see if there are any await expressions which will be analyzed
   and registered.  */
-   if ((res = cp_walk_tree (_COND (if_stmt),
-   analyze_expression_awaits, d, )))
- return res;
-   if (!awpts->saw_awaits)
+   if (!(cp_walk_tree (_COND (if_stmt),
+ find_any_await, _ptr, )))
  return NULL_TREE; /* Nothing special to do here.  */
gcc_checking_assert (!awpts->bind_stack->is_empty());
@@ -3463,7 +3458,7 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
/* We want to initialize the new variable with the expression
   that contains the await(s) and potentially also needs to
   have truth_if expressions expanded.  */
-   tree new_s = build2_loc (sloc, MODIFY_EXPR, boolean_type_node,
+   tree new_s = build2_loc (sloc, INIT_EXPR, boolean_type_node,
 newvar, cond_inner);
finish_expr_stmt (new_s);
IF_COND (if_stmt) = newvar;
@@ -3477,25 +3472,25 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
  break;
case FOR_STMT:
  {
+   tree *await_ptr;
+   hash_set visited;
/* for loops only need special treatment if the condition or the
   iteration expression contain a co_await.  */
tree for_stmt = *stmt;
/* Sanity check.  */
-   if ((res = cp_walk_tree (_INIT_STMT (for_stmt),
-   analyze_expression_awaits, d, )))
- return res;
-   gcc_checking_assert (!awpts->saw_awaits);
-
-   if ((res = cp_walk_tree (_COND (for_stmt),
-   analyze_expression_awaits, d, )))
- return res;
-   bool for_cond_await = awpts->saw_awaits != 0;
-   unsigned save_awaits = awpts->saw_awaits;
-
-   if ((res = cp_walk_tree (_EXPR (for_stmt),
-   analyze_expression_awaits, d, )))
- return res;
-   bool for_expr_await = awpts->saw_awaits > save_awaits;
+   gcc_checking_assert
+ (!(cp_walk_tree (_INIT_STMT (for_stmt), find_any_await,
+  _ptr, )));


What's the rationale for this assert?  [expr.await] seems to say explicitly 
that an await can appear in the initializer of an init-statement.


Indeed (and we 

Re: [PATCH v3] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-15 Thread Jason Merrill via Gcc-patches

On 9/14/21 00:31, Barrett Adair wrote:
I reworked the fix today based on feedback from Jason and Jakub (thank 
you), and the subject line is now outdated. I added another test for a 
closely related bug that's also fixed here (dependent-expr11.C -- this 
one would even fail without the second declaration). All the new tests 
in the patch succeed with the change (only two of them succeed with 
trunk). On my box, the bootstrap succeeds, the g++ test suite passes 
(matching today's posted results anyway), and the libstdc++ test suite 
is looking good but is still running after a long time. I'll leave the 
full "make check" running overnight.


Some potentially controversial changes here:

1. Adding a new bool member to cp_parser. I'd like to avoid this, any tips?
2. Relaxing an assert in tsubst_copy. This change feels correct to me, 
but maybe I'm missing something.
3. Pushing a function scope in PARM_DECL case in tsubst_copy_and_build 
to make process_outer_var_ref happy for trailing return types. I don't 
yet fully appreciate the consequences of these changes, so this needs 
some eyes.


These all are to support dependent-expr11.C, right?  This seems like a 
separate issue, that should be a separate patch.


And I don't think there's anything special about a trailing return type.
I am surprised to discover that I don't see anything prohibiting that
use, but I similarly don't see anything prohibiting


template auto bar(T t, bool_c) -> bool_c;

or even

template  using FP = void (*)(T t, int (*)[t()]);

So I guess the "use of parameter outside function body" code in 
finish_id_expression_1 is obsolete with constexpr; removing that should 
address #1.


One way to approach #2 might be to

  begin_scope (sk_function_parms, NULL_TREE);

in tsubst_function_type, so that parsing_function_declarator (which I'm 
about to check in) is true, and change the assert to also check that.


Maybe that will also help with #3.  Really, outer_var_p should be false 
for t, so we shouldn't ever get to process_outer_var_ref.


-

OK, now for the part of the patch that corresponds to the subject line:

4. Traversing each template arg's tree in 
any_template_arguments_need_structural_equality_p to identify dependent 
expressions in trailing return types. This could probably be done 
better. I check current_function_decl here as an optimization (since 
it's NULL in the only place that "needs" this), but that seems brittle. 


I think that optimization makes sense; within a function we shouldn't 
need structural comparison, only for comparing two template declarations.


Also, the new find_dependent_parm_decl_r callback implementation may 
have the unintended consequence of forcing structural comparison on 
member function trailing return types that depend on class template 
parameters. I think I really only want to force structural comparison 
for "arg tree has a dependent parm decl and we're in a trailing return 
type" -- is there a better way to do this?


I don't think whether the parm is dependent is important: the case we 
want to catch is if the argument as a whole is dependent, and contains a 
mention of a parameter.



Also note that I found another related bug which I have not yet solved:

template
struct foo {
   constexpr operator int() { return i; }
};
void bar() {
   [](auto i) -> foo {
     return {};
   }(foo<1>{});
}

With the attached patch, failure occurs at invocation, while trunk fails 
to parse the return type. This seems like a step in the right direction, 
but we should consider whether such an incomplete fix introduces more 
issues than it solves (e.g. unfriendlier error messages, or perhaps 
something more sinister).


This would also be related to the separate change under 1-3 above.

Jason



Re: [External] Re: [PATCH] libstdc++: Optimize 'to_string' with numeric_limits instead of __to_chars_len

2021-09-15 Thread Jonathan Wakely via Gcc-patches
N.B. Please CC *both* the libstdc++ list and the gcc-patches list, as
per https://gcc.gnu.org/lists.html

On Wed, 15 Sept 2021 at 14:02, 刘可 wrote:
>
> Thank you for your review, and I apologize for my mistake. I have updated and tested it!

Hmm, it doesn't work though. How did you test it?

For to_string(int) the string will be padded with '-' characters, i.e.
std::to_string(1) returns "1-" and to_string(-1) returns
"-1-" and to_string(100) returns "100" !
diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index b61fe05efcf..e1fd42aea1a 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3716,41 +3716,82 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   // DR 1261. Insufficent overloads for to_string / to_wstring
 
+  namespace __detail
+  {
+template
+  inline unsigned
+  __to_string_len(_Tp __val) noexcept
+  {
+#if _GLIBCXX_USE_CXX11_ABI
+   // Any 32-bit integer value fits in the 15-byte SSO buffer,
+   // so don't bother counting how many chars are needed.
+   if _GLIBCXX17_CONSTEXPR (sizeof(_Tp) * __CHAR_BIT__  <= 32)
+ return 9; // std::numeric_limits::digits10
+   else
+#endif
+   return __detail::__to_chars_len(__val);
+  }
+
+inline void
+__to_string_trim(string& __s) noexcept
+{
+#if _GLIBCXX_USE_CXX11_ABI
+  ???
+#endif
+}
+  }
+
   inline string
   to_string(int __val)
+#if _GLIBCXX_USE_CXX11_ABI
+  noexcept
+#endif
   {
 const bool __neg = __val < 0;
 const unsigned __uval = __neg ? (unsigned)~__val + 1u : __val;
-const auto __len = __detail::__to_chars_len(__uval);
+const auto __len = __detail::__to_string_len(__uval);
 string __str(__neg + __len, '-');
 __detail::__to_chars_10_impl(&__str[__neg], __len, __uval);
+__detail::__to_string_trim(__str);
 return __str;
   }
 
   inline string
   to_string(unsigned __val)
+#if _GLIBCXX_USE_CXX11_ABI
+  noexcept
+#endif
   {
-string __str(__detail::__to_chars_len(__val), '\0');
+string __str(__detail::__to_string_len(__val), '\0');
 __detail::__to_chars_10_impl(&__str[0], __str.size(), __val);
+__detail::__to_string_trim(__str);
 return __str;
   }
 
   inline string
   to_string(long __val)
+#if _GLIBCXX_USE_CXX11_ABI && __SIZEOF_LONG__ == __SIZEOF_INT__
+  noexcept
+#endif
   {
 const bool __neg = __val < 0;
 const unsigned long __uval = __neg ? (unsigned long)~__val + 1ul : __val;
-const auto __len = __detail::__to_chars_len(__uval);
+const auto __len = __detail::__to_string_len(__uval);
 string __str(__neg + __len, '-');
 __detail::__to_chars_10_impl(&__str[__neg], __len, __uval);
+__detail::__to_string_trim(__str);
 return __str;
   }
 
   inline string
   to_string(unsigned long __val)
+#if _GLIBCXX_USE_CXX11_ABI && __SIZEOF_LONG__ == __SIZEOF_INT__
+  noexcept
+#endif
   {
-string __str(__detail::__to_chars_len(__val), '\0');
+string __str(__detail::__to_string_len(__val), '\0');
 __detail::__to_chars_10_impl(&__str[0], __str.size(), __val);
+__detail::__to_string_trim(__str);
 return __str;
   }
 
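The padding comes from sizing the string before the digits are written. The following is one way to avoid both the pre-count and the trim step. It assumes an `unsigned` of at least 32 bits and uses a made-up name rather than anything in libstdc++. The digits are written right-to-left into a local buffer sized from `std::numeric_limits<unsigned>::digits10`, and the string is constructed from the used tail only. `digits10` is 9 for a 32-bit `unsigned`, while `UINT_MAX` has 10 digits, hence one extra digit slot plus one for the sign.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper, not the libstdc++ implementation.
std::string to_string_sketch(int value) {
  // digits10 + 1 digits always suffice for unsigned; one more slot for '-'.
  char buf[std::numeric_limits<unsigned>::digits10 + 2];
  char* const end = buf + sizeof buf;
  char* p = end;
  const bool neg = value < 0;
  // Same negation trick as the patch: well-defined even for INT_MIN.
  unsigned u = neg ? ~static_cast<unsigned>(value) + 1u
                   : static_cast<unsigned>(value);
  do {
    *--p = static_cast<char>('0' + u % 10);  // least significant digit first
    u /= 10;
  } while (u != 0);
  if (neg)
    *--p = '-';
  // Only the characters actually produced are copied, so nothing to trim.
  return std::string(p, end);
}
```

The trade-off is one copy from the stack buffer into the string, in exchange for never over- or under-sizing it.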


[PING^2] [PATCH] configure, jit: Allow for 'make check-gcc-jit'.

2021-09-15 Thread Iain Sandoe
Hi folks,

> On 27 Aug 2021, at 14:00, Iain Sandoe  wrote:
> 
> +Jeff
> 
> (it’s probably borderline obvious - but in the top level Makefile .. so)
> 
>> On 17 Aug 2021, at 21:53, David Malcolm  wrote:
>> 
>> On Tue, 2021-08-17 at 19:59 +0100, Iain Sandoe wrote:
>>> Hi,
>>> 
>>> For those of us who habitually build Ada, it’s convenient to 
>>> have a way of running individual test suites without invoking
>>> the acats tests…
>>> 
>>> being able to do “make check-gcc-jit” from the top level is very
>>> useful when debugging jit testsuite issues.
>>> 
>>> one can do "cd gcc ; make check-jit" - but this doesn’t seem 100%
>>> identical since the invocations from the top level set the host
>>> exports first.
>>> 
>>> … the patch itself is trivial / obvious - I am just curious as to
>>> whether there was a reason for omitting it so far?
>> 
>> Probably just a mistake on my part; Makefile glue is not my strongest
>> skill.
>> 
>>> 
>>> If not, 
>>> 
>>> OK for master?
>> 
>> Sounds OK to me - but then again, Makefile glue is not my strongest
>> skill, so not sure if I'm qualified to approve this.
>> 
>>> 
>>> thanks
>>> Iain
>>> 
>>> 
>>> 
>>> 
>>> This is a convenience feature that allows the user to
>>> do "make check-gcc-jit" at the top level of the build
>>> to check that facility in isolation from others.
>>> 
>>> Signed-off-by: Iain Sandoe 
>>> 
>>> ChangeLog:
>>> 
>>>* Makefile.def: Add a jit check target for the jit
>>>language.
>>>* Makefile.in: Regenerate.
>>> ---
>>> Makefile.def | 1 +
>>> Makefile.in  | 8 
>>> 2 files changed, 9 insertions(+)
>>> 
>>> diff --git a/Makefile.def b/Makefile.def
>>> index fbfdb6fee08..7cbeca5b181 100644
>>> --- a/Makefile.def
>>> +++ b/Makefile.def
>>> @@ -654,6 +654,7 @@ languages = { language=go;  gcc-check-target=check-go;
>>>lib-check-target=check-gotools; };
>>> languages = { language=d;  gcc-check-target=check-d;
>>>lib-check-target=check-target-libphobos; };
>>> +languages = { language=jit;gcc-check-target=check-jit; };
>>> 
>>> // Toplevel bootstrap
>>> bootstrap_stage = { id=1 ; };
> 



Re: [PATCH][OBVIOUS] rs6000: fix symtab_node::get == NULL issue

2021-09-15 Thread Martin Liška

On 9/15/21 20:40, David Edelsohn wrote:

> This needs an additional adjustment.  The encoding decoration needs to
> be applied if the decl isn't an alias.  That means both a null summary
> *OR* the decl is not explicitly an alias.


Oh, sorry, I made a stupid thinko.

Please install the patch.
Martin



> I'm proposing the following:
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index d0830a95027..ad81dfb316d 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -21728,8 +21728,8 @@ rs6000_xcoff_encode_section_info (tree decl, rtx rtl, int first)
> if (decl
> && DECL_P (decl)
> && VAR_OR_FUNCTION_DECL_P (decl)
> -  && symtab_node::get (decl) != NULL
> -  && symtab_node::get (decl)->alias == 0
> +  && (symtab_node::get (decl) == NULL
> + || symtab_node::get (decl)->alias == 0)
> && symname[strlen (symname) - 1] != ']')
>   {
> const char *smclass = NULL;
>
>
> On Wed, Sep 15, 2021 at 11:21 AM Martin Liška  wrote:
>>
>>
>> Hello.
>>
>> The patch is approved by David and fixes the issue described in the PR.
>>
>> Martin
>>
>>  PR target/102349
>>
>> gcc/ChangeLog:
>>
>>  * config/rs6000/rs6000.c (rs6000_xcoff_encode_section_info):
>>  Check that we have a symbol summary for a symbol.
>> ---
>>    gcc/config/rs6000/rs6000.c | 1 +
>>    1 file changed, 1 insertion(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index b0ec8108007..d0830a95027 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -21728,6 +21728,7 @@ rs6000_xcoff_encode_section_info (tree decl, rtx rtl, int first)
>>  if (decl
>>  && DECL_P (decl)
>>  && VAR_OR_FUNCTION_DECL_P (decl)
>> +  && symtab_node::get (decl) != NULL
>>  && symtab_node::get (decl)->alias == 0
>>  && symname[strlen (symname) - 1] != ']')
>>    {
>> --
>> 2.33.0





Re: [PATCH][RFC] pru: Named address space for R30/R31 I/O access

2021-09-15 Thread Dimitar Dimitrov
On Wed, Sep 15, 2021 at 11:12:18AM +0200, Richard Biener wrote:
> On Tue, Sep 14, 2021 at 11:13 PM Dimitar Dimitrov  wrote:
> >
> > Hi,
> >
> > I'm sending this patch to get feedback for a new PRU CPU port feature.
> > My intention is to push it to master by end of September, so that it gets
> > included in GCC 12.
> >
> > The PRU architecture provides single-cycle access to GPIO pins via
> > special designated CPU registers - R30 and R31. These two registers can
> > of course be accessed in C code using inline assembly, but that can be
> > intimidating to users.
> >
> > The TI proprietary compiler [1] can expose these I/O registers as global
> > volatile registers:
> >   volatile register unsigned int __R31;
> >
> > Consequently, accessing them in user programs is as straightforward as
> > using a regular global variable:
> >   __R31 |= (1 << 2);
> >
> > Unfortunately, global volatile registers are not supported by GCC [2].
> 
> Yes, a "register" write or read does not follow volatile semantics, so
> exposing those as registers isn't supported (I consider the GPIO regs
> similar to MSRs on other CPUs?).

Yes, they are a lot like MSRs.

> 
> > I decided to implement convenient access to __R30 and __R31 using a new
> > named address space:
> >   extern volatile __regio_symbol unsigned int __R30;
> >
> > Unlike global registers, volatile global memory variables are well
> > supported in GCC.  Memory writes and reads to the __regio_symbol address
> > space are converted to writes and reads to R30 and R31 CPU registers.
> > The declared variable name determines which of the two registers it is
> > representing.
> 
> I think that's reasonable.  I do wonder whether it's possible to prevent
> taking the address of __R30 though - otherwise I guess the backend
> will crash or do weird things on such code?

I believe I have handled those cases, and suitable error messages are
emitted by the compiler.  See the negative test cases added in
regio-as-pointer*.c and regio-decl*.c.

Thanks,
Dimitar

> 
> > With an ifdef for the __R30/__R31 declarations, user programs can now
> > be source-compatible with both TI and GCC toolchains.
> >
> > [1] https://www.ti.com/lit/ug/spruhv7c/spruhv7c.pdf , "Global Register Variables"
> > [2] https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02241.html
> >
> > gcc/ChangeLog:
> >
> > * config/pru/constraints.md (Rrio): New constraint.
> > * config/pru/predicates.md (regio_operand): New predicate.
> > * config/pru/pru-pragma.c (pru_register_pragmas): Register
> > the __regio_symbol address space.
> > * config/pru/pru-protos.h (pru_symref2ioregno): Declaration.
> > * config/pru/pru.c (pru_symref2ioregno): New helper function.
> > (pru_legitimate_address_p): Remove.
> > (pru_addr_space_legitimate_address_p): Use the address space
> > aware hook variant.
> > (pru_nongeneric_pointer_addrspace): New helper function.
> > (pru_insert_attributes): New function to validate __regio_symbol
> > usage.
> > (TARGET_INSERT_ATTRIBUTES): New macro.
> > (TARGET_LEGITIMATE_ADDRESS_P): Remove.
> > (TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P): New macro.
> > * config/pru/pru.h (enum reg_class): Add REGIO_REGS class.
> > * config/pru/pru.md (*regio_readsi): New pattern to read I/O
> > registers.
> > (*regio_nozext_writesi): New pattern to write to I/O registers.
> > (*regio_zext_write_r30): Ditto.
> > * doc/extend.texi: Document the new PRU Named Address Space.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/pru/regio-as-pointer.c: New negative test.
> > * gcc.target/pru/regio-as-pointer2.c: New negative test.
> > * gcc.target/pru/regio-decl-2.c: New negative test.
> > * gcc.target/pru/regio-decl-3.c: New negative test.
> > * gcc.target/pru/regio-decl-4.c: New negative test.
> > * gcc.target/pru/regio-decl.c: New negative test.
> > * gcc.target/pru/regio-di.c: New negative test.
> > * gcc.target/pru/regio-hi.c: New negative test.
> > * gcc.target/pru/regio-qi.c: New negative test.
> > * gcc.target/pru/regio.c: New test.
> > * gcc.target/pru/regio.h: New helper header.
> >
> > Signed-off-by: Dimitar Dimitrov 
> > ---
> >  gcc/config/pru/constraints.md |   5 +
> >  gcc/config/pru/predicates.md  |  19 +++
> >  gcc/config/pru/pru-pragma.c   |   2 +
> >  gcc/config/pru/pru-protos.h   |   3 +
> >  gcc/config/pru/pru.c  | 155 +-
> >  gcc/config/pru/pru.h  |   5 +
> >  gcc/config/pru/pru.md | 102 +++-
> >  gcc/doc/extend.texi   |  19 ++-
> >  .../gcc.target/pru/regio-as-pointer.c |  11 ++
> >  .../gcc.target/pru/regio-as-pointer2.c|  11 ++
> >  

[PATCH] c++: Fix handling of decls with flexible array members initialized with side-effects [PR88578]

2021-09-15 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 14, 2021 at 10:50:32AM -0400, Jason Merrill wrote:
> > Note, if the flexible array member is initialized only with non-constant
> > initializers, we have a worse bug that this patch doesn't solve, the
> > splitting of initializers into constant and dynamic initialization removes
> > the initializer and we don't have just wrong DECL_*SIZE, but nothing is
> > emitted when emitting those vars into assembly either and so the dynamic
> > initialization clobbers other vars that may overlap the variable.
> > I think we need to keep an empty CONSTRUCTOR elt in DECL_INITIAL for the
> > flexible array member in that case.
> 
> Makes sense.

So, the following patch fixes that.

The typeck2.c change makes sure we keep those CONSTRUCTORs around (although
they should be empty, because if they were removed earlier, all their elts
had side-effects or were non-constant), and the varasm.c change avoids ICEs
on those, as well as ICEs on other flex array members that had some
initializers without side-effects, but not on the last array element (i.e.
the last element's initializer had side-effects).

The code was already asserting that (the index of the last elt in the
CONSTRUCTOR + 1) times the elt size is equal to TYPE_SIZE_UNIT of the
local->val type.  That is true for C flex arrays, and for C++ ones if they
don't have any side-effects or the last elt doesn't have side-effects.  This
patch changes that to an assertion that TYPE_SIZE_UNIT is greater than or
equal to the offset of the end of the last element in the CONSTRUCTOR, and
uses TYPE_SIZE_UNIT (int_size_in_bytes) in the code later on.
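To make the size bookkeeping concrete, here is a small constexpr model of the old and new checks. It is a hypothetical sketch, not GCC code: `ctor_bytes` stands for array_size_for_constructor as described above ((index of the last element present in the CONSTRUCTOR + 1) times the element size), `type_bytes` stands for int_size_in_bytes of the initializer's type, and the initializer `{ 1, 2, f () }`, with `f ()` having side-effects, is made up for illustration.

```cpp
#include <cstddef>

// Hypothetical model, not GCC code.
// Bytes implied by the CONSTRUCTOR: (index of last element present + 1) * element size.
constexpr std::size_t ctor_bytes(std::size_t last_index, std::size_t elt_size) {
  return (last_index + 1) * elt_size;
}

// Bytes implied by the initializer's array type with nelts elements.
constexpr std::size_t type_bytes(std::size_t nelts, std::size_t elt_size) {
  return nelts * elt_size;
}

// Old assertion: both sizes must match exactly.
constexpr bool old_check(std::size_t type_sz, std::size_t ctor_sz) {
  return type_sz == ctor_sz;
}

// New assertion: the type only has to cover the CONSTRUCTOR; the type size
// is then used as the field size.
constexpr bool new_check(std::size_t type_sz, std::size_t ctor_sz) {
  return type_sz >= ctor_sz;
}

// int a[] = { 1, 2, f () };   // f () has side-effects
// The constant CONSTRUCTOR keeps only indices 0 and 1, while the array
// type still has three elements.
constexpr std::size_t kTypeSize = type_bytes(3, sizeof(int));
constexpr std::size_t kCtorSize = ctor_bytes(1, sizeof(int));

static_assert(!old_check(kTypeSize, kCtorSize), "old equality check fails");
static_assert(new_check(kTypeSize, kCtorSize), "new check holds");
```

When the last element has no side-effects, (2 + 1) * sizeof (int) equals the type size and both checks agree; they only diverge when trailing elements were split out of the constant CONSTRUCTOR.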

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-09-15  Jakub Jelinek  

PR c++/88578
PR c++/102295
gcc/
* varasm.c (output_constructor_regular_field): Instead of assertion
that array_size_for_constructor result is equal to size of
TREE_TYPE (local->val) in bytes, assert that the type size is greater
or equal to array_size_for_constructor result and use type size as
fieldsize.
gcc/cp/
* typeck2.c (split_nonconstant_init_1): Don't throw away empty
initializers of flexible array members if they have non-zero type
size.
gcc/testsuite/
* g++.dg/ext/flexary39.C: New test.
* g++.dg/ext/flexary40.C: New test.

--- gcc/varasm.c.jj 2021-05-10 12:22:30.437451816 +0200
+++ gcc/varasm.c 2021-09-15 13:19:02.841554574 +0200
@@ -5531,14 +5531,15 @@ output_constructor_regular_field (oc_loc
  && (!TYPE_DOMAIN (TREE_TYPE (local->field))
  || !TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (local->field)
{
- fieldsize = array_size_for_constructor (local->val);
+ unsigned HOST_WIDE_INT fldsize
+   = array_size_for_constructor (local->val);
+ fieldsize = int_size_in_bytes (TREE_TYPE (local->val));
+ gcc_checking_assert (fieldsize >= fldsize);
  /* Given a non-empty initialization, this field had better
 be last.  Given a flexible array member, the next field
 on the chain is a TYPE_DECL of the enclosing struct.  */
  const_tree next = DECL_CHAIN (local->field);
  gcc_assert (!fieldsize || !next || TREE_CODE (next) != FIELD_DECL);
- tree size = TYPE_SIZE_UNIT (TREE_TYPE (local->val));
- gcc_checking_assert (compare_tree_int (size, fieldsize) == 0);
}
   else
fieldsize = tree_to_uhwi (DECL_SIZE_UNIT (local->field));
--- gcc/cp/typeck2.c.jj 2021-09-01 21:30:30.520306823 +0200
+++ gcc/cp/typeck2.c 2021-09-15 14:15:59.049388381 +0200
@@ -524,7 +524,20 @@ split_nonconstant_init_1 (tree dest, tre
sub = build3 (COMPONENT_REF, inner_type, dest, field_index,
  NULL_TREE);
 
- if (!split_nonconstant_init_1 (sub, value, true))
+ if (!split_nonconstant_init_1 (sub, value, true)
+ /* For flexible array member with initializer we
+can't remove the initializer, because only the
+initializer determines how many elements the
+flexible array member has.  */
+ || (!array_type_p
+ && TREE_CODE (inner_type) == ARRAY_TYPE
+ && TYPE_DOMAIN (inner_type) == NULL
+ && TREE_CODE (TREE_TYPE (value)) == ARRAY_TYPE
+ && COMPLETE_TYPE_P (TREE_TYPE (value))
+ && !integer_zerop (TYPE_SIZE (TREE_TYPE (value)))
+ && idx == CONSTRUCTOR_NELTS (init) - 1
+ && TYPE_HAS_TRIVIAL_DESTRUCTOR
+   (strip_array_types (inner_type
complete_p = false;
  else
{
--- gcc/testsuite/g++.dg/ext/flexary39.C.jj 2021-09-15 14:01:33.811320756 +0200
+++ gcc/testsuite/g++.dg/ext/flexary39.C 2021-09-15 14:04:05.962221748 +0200
@@ -0,0 +1,65 @@
+// PR c++/88578
+// { dg-do run }
+// { dg-options -Wno-pedantic }

Re: [PATCH][OBVIOUS] rs6000: fix symtab_node::get == NULL issue

2021-09-15 Thread David Edelsohn via Gcc-patches
This needs an additional adjustment.  The encoding decoration needs to
be applied if the decl isn't an alias.  That means both a null summary
*OR* the decl is not explicitly an alias.

I'm proposing the following:

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d0830a95027..ad81dfb316d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21728,8 +21728,8 @@ rs6000_xcoff_encode_section_info (tree decl, rtx rtl, int first)
   if (decl
   && DECL_P (decl)
   && VAR_OR_FUNCTION_DECL_P (decl)
-  && symtab_node::get (decl) != NULL
-  && symtab_node::get (decl)->alias == 0
+  && (symtab_node::get (decl) == NULL
+ || symtab_node::get (decl)->alias == 0)
   && symname[strlen (symname) - 1] != ']')
 {
   const char *smclass = NULL;
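The difference between the two conditions comes down to the null case. A minimal C++11 sketch, assuming a pointer `node` stands in for the result of `symtab_node::get (decl)`; `SymNode`, `needs_decoration_first_fix` and `needs_decoration` are hypothetical names, and the other clauses of the real condition (DECL_P, the trailing ']' check, and so on) are left out:

```cpp
// Hypothetical stand-in for the node returned by symtab_node::get (decl);
// only the alias flag matters for this check.
struct SymNode {
  bool alias;
};

// First fix: decorate only when a node exists and is not an alias,
// so a decl without a symbol-table node is never decorated.
constexpr bool needs_decoration_first_fix(const SymNode *node) {
  return node != nullptr && !node->alias;
}

// Proposed condition: decorate when there is no node, or when the node
// is not explicitly an alias.
constexpr bool needs_decoration(const SymNode *node) {
  return node == nullptr || !node->alias;
}
```

Only a null node distinguishes the two: the first fix skips the encoding for it, while the proposed condition applies it. Passing the looked-up node in once is just a choice for the sketch; the proposed diff calls symtab_node::get (decl) twice.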


On Wed, Sep 15, 2021 at 11:21 AM Martin Liška  wrote:
>
> Hello.
>
> The patch is approved by David and fixes the issue described in the PR.
>
> Martin
>
> PR target/102349
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (rs6000_xcoff_encode_section_info):
> Check that we have a symbol summary for a symbol.
> ---
>   gcc/config/rs6000/rs6000.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index b0ec8108007..d0830a95027 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -21728,6 +21728,7 @@ rs6000_xcoff_encode_section_info (tree decl, rtx rtl, int first)
> if (decl
> && DECL_P (decl)
> && VAR_OR_FUNCTION_DECL_P (decl)
> +  && symtab_node::get (decl) != NULL
> && symtab_node::get (decl)->alias == 0
> && symname[strlen (symname) - 1] != ']')
>   {
> --
> 2.33.0
>


Re: [PATCH] coroutines: Small cleanups to await_statement_walker [NFC].

2021-09-15 Thread Iain Sandoe
Hi Jason,

> On 15 Sep 2021, at 18:32, Jason Merrill  wrote:
> 
> On 9/14/21 11:36, Iain Sandoe wrote:
>> Hi
>> Some small code cleanups that allow us to have just one place that
>> we handle a statement with await expression(s) embedded.  Also we
>> can reduce the work done to figure out whether a statement contains
>> any such expressions.
>> tested on x86_64,powerpc64le-linux x86_64-darwin
>> OK for master?
>> thanks
>> Iain
>> -
>> There is no need to make a MODIFY_EXPR for any of the condition
>> vars that we synthesize.
>> Expansion of co_return can be carried out independently of any
>> co_awaits that might be contained which simplifies this.
>> Where we are rewriting statements to handle await expression
>> logic, there is no need to carry out any analysis - we just need
>> to detect the presence of any co_await.
>> Signed-off-by: Iain Sandoe 
>> gcc/cp/ChangeLog:
>>  * coroutines.cc (await_statement_walker): Code cleanups.
>> ---
>>  gcc/cp/coroutines.cc | 121 ---
>>  1 file changed, 56 insertions(+), 65 deletions(-)
>> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
>> index d2cc2e73c89..27556723b71 100644
>> --- a/gcc/cp/coroutines.cc
>> +++ b/gcc/cp/coroutines.cc
>> @@ -3412,16 +3412,11 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
>>return NULL_TREE;
>>  }
>>  -  /* We have something to be handled as a single statement.  */
>> -  bool has_cleanup_wrapper = TREE_CODE (*stmt) == CLEANUP_POINT_EXPR;
>> -  hash_set<tree> visited;
>> -  awpts->saw_awaits = 0;
>> -  hash_set<tree> truth_aoif_to_expand;
>> -  awpts->truth_aoif_to_expand = &truth_aoif_to_expand;
>> -  awpts->needs_truth_if_exp = false;
>> -  awpts->has_awaiter_init = false;
>> +  /* We have something to be handled as a single statement.  We have to handle
>> + a few statements specially where await statements have to be moved out of
>> + constructs.  */
>>tree expr = *stmt;
>> -  if (has_cleanup_wrapper)
>> +  if (TREE_CODE (*stmt) == CLEANUP_POINT_EXPR)
>>  expr = TREE_OPERAND (expr, 0);
>>STRIP_NOPS (expr);
>>  @@ -3437,6 +3432,8 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
>> transforms can be implemented.  */
>>  case IF_STMT:
>>{
>> +tree *await_ptr;
>> +hash_set<tree> visited;
>>  /* Transform 'if (cond with awaits) then stmt1 else stmt2' into
>> bool cond = cond with awaits.
>> if (cond) then stmt1 else stmt2.  */
>> @@ -3444,10 +3441,8 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
>>  /* We treat the condition as if it was a stand-alone statement,
>> to see if there are any await expressions which will be analyzed
>> and registered.  */
>> -if ((res = cp_walk_tree (&IF_COND (if_stmt),
>> -analyze_expression_awaits, d, &visited)))
>> -  return res;
>> -if (!awpts->saw_awaits)
>> +if (!(cp_walk_tree (&IF_COND (if_stmt),
>> +  find_any_await, &await_ptr, &visited)))
>>return NULL_TREE; /* Nothing special to do here.  */
>>  gcc_checking_assert (!awpts->bind_stack->is_empty());
>> @@ -3463,7 +3458,7 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
>>  /* We want to initialize the new variable with the expression
>> that contains the await(s) and potentially also needs to
>> have truth_if expressions expanded.  */
>> -tree new_s = build2_loc (sloc, MODIFY_EXPR, boolean_type_node,
>> +tree new_s = build2_loc (sloc, INIT_EXPR, boolean_type_node,
>>   newvar, cond_inner);
>>  finish_expr_stmt (new_s);
>>  IF_COND (if_stmt) = newvar;
>> @@ -3477,25 +3472,25 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
>>break;
>>  case FOR_STMT:
>>{
>> +tree *await_ptr;
>> +hash_set<tree> visited;
>>  /* for loops only need special treatment if the condition or the
>> iteration expression contain a co_await.  */
>>  tree for_stmt = *stmt;
>>  /* Sanity check.  */
>> -if ((res = cp_walk_tree (&FOR_INIT_STMT (for_stmt),
>> -analyze_expression_awaits, d, &visited)))
>> -  return res;
>> -gcc_checking_assert (!awpts->saw_awaits);
>> -
>> -if ((res = cp_walk_tree (&FOR_COND (for_stmt),
>> -analyze_expression_awaits, d, &visited)))
>> -  return res;
>> -bool for_cond_await = awpts->saw_awaits != 0;
>> -unsigned save_awaits = awpts->saw_awaits;
>> -
>> -if ((res = cp_walk_tree (&FOR_EXPR (for_stmt),
>> -analyze_expression_awaits, d, &visited)))
>> -  return res;
>> -bool for_expr_await = awpts->saw_awaits > save_awaits;
>> +gcc_checking_assert
>> +  (!(cp_walk_tree (&FOR_INIT_STMT (for_stmt), find_any_await,
>> +   &await_ptr, &visited)));
> 
> What's the rationale for 

Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-15 Thread Qing Zhao via Gcc-patches


> On Sep 11, 2021, at 3:03 AM, Jakub Jelinek  wrote:
> 
> Note, the gcc.dg/i386/auto-init* tests fail also, just don't have time to
> deal with that right now, just try
> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
> i386.exp=auto-init*'

It’s strange that the above testing on my local x86 machine with the latest GCC
had fewer failures than the following:

[opc@qinzhao-ol8u3-x86 build-boot]$ make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*' &> log &
[1] 3885164
[opc@qinzhao-ol8u3-x86 build-boot]$ 
[1]+  Done  make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*' &> log
[opc@qinzhao-ol8u3-x86 build-boot]$ egrep FAIL gcc/testsuite/gcc/gcc.sum
FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfefefefe" 2
FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfefefefefefefefe" 3
FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times pxor\t\\%xmm0, \\%xmm0 3
FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfefefefe" 1
FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "\\[0xfefefefefefefefe\\]" 1
FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfffe\\]\\) repeated x16" 1
FAIL: gcc.target/i386/auto-init-5.c scan-assembler-times \\.long\t0 14
FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler movl\t\\$16,
FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler rep stosq
FAIL: gcc.target/i386/auto-init-padding-7.c scan-assembler-times movq\t\\$0, 2
FAIL: gcc.target/i386/auto-init-padding-8.c scan-assembler-times movq\t\\$0, 2
FAIL: gcc.target/i386/auto-init-padding-9.c scan-assembler rep stosq

I am wondering whether the default value of the “-march” option might differ
between platforms? (If I add -march=cascadelake, I get more failures.)

I have a patch for the above FAILs.

Please take a look and let me know your comments:

thanks.

Qing


From deb44a929ee27b097cc2351c4a4d7644bee68277 Mon Sep 17 00:00:00 2001
From: Qing Zhao 
Date: Wed, 15 Sep 2021 17:22:07 +
Subject: [PATCH] fix i386 testing cases failure for m32

---
 gcc/testsuite/gcc.target/i386/auto-init-2.c | 6 --
 gcc/testsuite/gcc.target/i386/auto-init-3.c | 3 ++-
 gcc/testsuite/gcc.target/i386/auto-init-4.c | 8 +---
 gcc/testsuite/gcc.target/i386/auto-init-5.c | 5 +++--
 gcc/testsuite/gcc.target/i386/auto-init-padding-3.c | 6 --
 gcc/testsuite/gcc.target/i386/auto-init-padding-7.c | 5 +++--
 gcc/testsuite/gcc.target/i386/auto-init-padding-8.c | 7 +++
 gcc/testsuite/gcc.target/i386/auto-init-padding-9.c | 5 -
 8 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/auto-init-2.c b/gcc/testsuite/gcc.target/i386/auto-init-2.c
index e76fc2565168..ea12b29baefd 100644
--- a/gcc/testsuite/gcc.target/i386/auto-init-2.c
+++ b/gcc/testsuite/gcc.target/i386/auto-init-2.c
@@ -31,6 +31,8 @@ void foo()
 
 /* { dg-final { scan-rtl-dump-times "0xfffe" 2 "expand" } } */
 /* { dg-final { scan-rtl-dump-times "0xfefe" 1 "expand" } } */
-/* { dg-final { scan-rtl-dump-times "0xfefefefe" 2 "expand" } } */
-/* { dg-final { scan-rtl-dump-times "0xfefefefefefefefe" 3 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "0xfefefefe" 2 "expand" { target lp64 } } } */
+/* { dg-final { scan-rtl-dump-times "0xfefefefefefefefe" 3 "expand" { target lp64 } } } */
+/* { dg-final { scan-rtl-dump-times "0xfefefefe" 4 "expand" { target ia32 } } } */
+/* { dg-final { scan-rtl-dump-times "0xfefefefefefefefe" 1 "expand" { target ia32 } } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/auto-init-3.c b/gcc/testsuite/gcc.target/i386/auto-init-3.c
index 300ddfb34f11..8c6326384054 100644
--- a/gcc/testsuite/gcc.target/i386/auto-init-3.c
+++ b/gcc/testsuite/gcc.target/i386/auto-init-3.c
@@ -14,4 +14,5 @@ long double foo()
   return result;
 }
 
-/* { dg-final { scan-assembler-times "pxor\t\\\%xmm0, \\\%xmm0" 3 } } */
+/* { dg-final { scan-assembler-times "pxor\t\\\%xmm0, \\\%xmm0" 3  { target lp64 } } } */
+/* { dg-final { scan-assembler-times "fldz" 3  { target ia32} } } */
diff --git a/gcc/testsuite/gcc.target/i386/auto-init-4.c b/gcc/testsuite/gcc.target/i386/auto-init-4.c
index abe0b7e46a07..62102c7db946 100644
--- a/gcc/testsuite/gcc.target/i386/auto-init-4.c
+++ b/gcc/testsuite/gcc.target/i386/auto-init-4.c
@@ -14,7 +14,9 @@ long double foo()
   return result;
 }
 
-/* { dg-final { scan-rtl-dump-times "0xfefefefe" 1 "expand" } } */
-/* { dg-final { scan-rtl-dump-times "\\\[0xfefefefefefefefe\\\]" 1 "expand" } } */
-/* { dg-final { scan-rtl-dump-times "0xfffe\\\]\\\) repeated x16" 1 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "0xfefefefe" 1 "expand" { target lp64 } } } */
+/* { dg-final { scan-rtl-dump-times "\\\[0xfefefefefefefefe\\\]" 1 "expand" { 

Re: [PATCH] c++: shortcut bad convs during overload resolution, part 2 [PR101904]

2021-09-15 Thread Jason Merrill via Gcc-patches

On 9/12/21 15:45, Patrick Palka wrote:

The r12-3346 patch makes us avoid computing excess argument conversions
during overload resolution, but only when it turns out there's a
strictly viable candidate in the overload set.  If there is no such
candidate then we still need to compute more conversions than strictly
necessary because subsequent conversions after the first bad conversion
can turn a non-strictly viable candidate into an unviable one, and that
affects the outcome of overload resolution and the behavior of its
callers (in light of -fpermissive).

But at least in a SFINAE context, the distinction between a non-strictly
viable and an unviable candidate shouldn't matter all that much since
performing a bad conversion is always an error (even with -fpermissive),
and so forming a call to a non-strictly viable candidate will end up
being a SFINAE error anyway, just like in the unviable case.  Hence a
non-strictly viable candidate is effectively unviable (in a SFINAE
context), and we don't really need to distinguish between the two kinds.
We can take advantage of this observation to avoid computing excess
argument conversions even when there's no strictly viable candidate in
the overload set.
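A minimal C++11 sketch of the SFINAE situation described above, loosely modelled on the conv2.C test; `take` and `can_take` are hypothetical names. Because converting `const int *` to `int *` is ill-formed in the unevaluated SFINAE context, the partial specialization is discarded and the primary template is used, regardless of whether the candidate would have been classified as non-strictly viable or unviable.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical function; only declared, since it is used in unevaluated
// contexts.
void take(int *, int);

// Primary template: assume the call take(T, int) is not valid.
template <class T, class = void>
struct can_take : std::false_type {};

// Selected only if take(T, int) is a well-formed expression; a bad
// conversion makes substitution fail and this specialization drops out.
template <class T>
struct can_take<T, decltype(take(std::declval<T>(), 42), void())>
    : std::true_type {};
```

Per the reasoning above, -fpermissive should not change the outcome here, since performing a bad conversion in a SFINAE context is always an error.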

This patch implements this idea.  We usually detect a SFINAE context by
looking for the absence of the tf_error flag, but that's not specific
enough: we can also get here from build_user_type_conversion with
tf_error cleared, and there the distinction between a non-strictly
viable candidate and an unviable candidate still matters (it determines
whether a user-defined conversion is bad or just doesn't exist).  So this
patch sets and checks for the tf_conv flag to detect this situation too,
which avoids regressing conv2.C below.

Unlike the previous patch, this one does change the outcome of overload
resolution, but it should do so only in a way that preserves backwards
compatibility with -fpermissive.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?


OK.


PR c++/101904

gcc/cp/ChangeLog:

* call.c (build_user_type_conversion_1): Add tf_conv to complain.
(add_candidates): When in a SFINAE context, instead of adding a
candidate to bad_fns just mark it unviable.

gcc/testsuite/ChangeLog:

* g++.dg/ext/conv2.C: New test.
* g++.dg/template/conv17.C: Augment test.
---
  gcc/cp/call.c  | 17 +++--
  gcc/testsuite/g++.dg/ext/conv2.C   | 13 +
  gcc/testsuite/g++.dg/template/conv17.C |  7 +++
  3 files changed, 35 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/conv2.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index b6011c1a282..ab0d118e34e 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4175,6 +4175,9 @@ build_user_type_conversion_1 (tree totype, tree expr, int flags,
flags |= LOOKUP_NO_CONVERSION;
if (BRACE_ENCLOSED_INITIALIZER_P (expr))
  flags |= LOOKUP_NO_NARROWING;
+  /* Prevent add_candidates from treating a non-strictly viable candidate
+ as an unviable one.  */
+  complain |= tf_conv;
  
/* It's OK to bind a temporary for converting constructor arguments, but

   not in converting the return value of a conversion operator.  */
@@ -6232,8 +6235,18 @@ add_candidates (tree fns, tree first_arg, const vec<tree, va_gc> *args,
 stopped at the first bad conversion).  Add the function to BAD_FNS
 to fully reconsider later if we don't find any strictly viable
 candidates.  */
- bad_fns = lookup_add (fn, bad_fns);
- *candidates = (*candidates)->next;
+ if (complain & (tf_error | tf_conv))
+   {
+ bad_fns = lookup_add (fn, bad_fns);
+ *candidates = (*candidates)->next;
+   }
+ else
+   /* But if we're in a SFINAE context, just mark this candidate as
+  unviable outright and avoid potentially reconsidering it.
+  This is safe to do since performing a bad conversion is always
+  erroneous in a SFINAE context (even with -fpermissive), so a
+  non-strictly viable candidate is effectively unviable anyway.  */
+   cand->viable = 0;
}
  }
if (which == non_templates && !seen_perfect)
diff --git a/gcc/testsuite/g++.dg/ext/conv2.C b/gcc/testsuite/g++.dg/ext/conv2.C
new file mode 100644
index 000..baf2a43b2ae
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/conv2.C
@@ -0,0 +1,13 @@
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fpermissive" }
+
+struct A {
+  A(int*, int);
+};
+
+void f(A);
+
+int main() {
+  const int n = 0;
+  f({&n, 42}); // { dg-warning "invalid conversion from 'const int\\*' to 'int\\*'" }
+}
diff --git a/gcc/testsuite/g++.dg/template/conv17.C b/gcc/testsuite/g++.dg/template/conv17.C
index ba012c9d1fa..e40da8f1f24 100644
--- a/gcc/testsuite/g++.dg/template/conv17.C
+++ b/gcc/testsuite/g++.dg/template/conv17.C
@@ -53,4 +53,11 @@ 

Re: [PATCH] coroutines: Small cleanups to await_statement_walker [NFC].

2021-09-15 Thread Jason Merrill via Gcc-patches

On 9/14/21 11:36, Iain Sandoe wrote:

Hi

Some small code cleanups that allow us to have just one place that
we handle a statement with await expression(s) embedded.  Also we
can reduce the work done to figure out whether a statement contains
any such expressions.

tested on x86_64,powerpc64le-linux x86_64-darwin
OK for master?
thanks
Iain

-

There is no need to make a MODIFY_EXPR for any of the condition
vars that we synthesize.

Expansion of co_return can be carried out independently of any
co_awaits that might be contained which simplifies this.

Where we are rewriting statements to handle await expression
logic, there is no need to carry out any analysis - we just need
to detect the presence of any co_await.

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (await_statement_walker): Code cleanups.
---
  gcc/cp/coroutines.cc | 121 ---
  1 file changed, 56 insertions(+), 65 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index d2cc2e73c89..27556723b71 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3412,16 +3412,11 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
return NULL_TREE;
  }
  
-  /* We have something to be handled as a single statement.  */

-  bool has_cleanup_wrapper = TREE_CODE (*stmt) == CLEANUP_POINT_EXPR;
-  hash_set<tree> visited;
-  awpts->saw_awaits = 0;
-  hash_set<tree> truth_aoif_to_expand;
-  awpts->truth_aoif_to_expand = &truth_aoif_to_expand;
-  awpts->needs_truth_if_exp = false;
-  awpts->has_awaiter_init = false;
+  /* We have something to be handled as a single statement.  We have to handle
+ a few statements specially where await statements have to be moved out of
+ constructs.  */
tree expr = *stmt;
-  if (has_cleanup_wrapper)
+  if (TREE_CODE (*stmt) == CLEANUP_POINT_EXPR)
  expr = TREE_OPERAND (expr, 0);
STRIP_NOPS (expr);
  
@@ -3437,6 +3432,8 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)

   transforms can be implemented.  */
case IF_STMT:
  {
+   tree *await_ptr;
+   hash_set<tree> visited;
/* Transform 'if (cond with awaits) then stmt1 else stmt2' into
   bool cond = cond with awaits.
   if (cond) then stmt1 else stmt2.  */
@@ -3444,10 +3441,8 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
/* We treat the condition as if it was a stand-alone statement,
   to see if there are any await expressions which will be analyzed
   and registered.  */
-   if ((res = cp_walk_tree (&IF_COND (if_stmt),
-   analyze_expression_awaits, d, &visited)))
- return res;
-   if (!awpts->saw_awaits)
+   if (!(cp_walk_tree (&IF_COND (if_stmt),
+ find_any_await, &await_ptr, &visited)))
  return NULL_TREE; /* Nothing special to do here.  */
  
  	gcc_checking_assert (!awpts->bind_stack->is_empty());

@@ -3463,7 +3458,7 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
/* We want to initialize the new variable with the expression
   that contains the await(s) and potentially also needs to
   have truth_if expressions expanded.  */
-   tree new_s = build2_loc (sloc, MODIFY_EXPR, boolean_type_node,
+   tree new_s = build2_loc (sloc, INIT_EXPR, boolean_type_node,
 newvar, cond_inner);
finish_expr_stmt (new_s);
IF_COND (if_stmt) = newvar;
@@ -3477,25 +3472,25 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
  break;
case FOR_STMT:
  {
+   tree *await_ptr;
+   hash_set<tree> visited;
/* for loops only need special treatment if the condition or the
   iteration expression contain a co_await.  */
tree for_stmt = *stmt;
/* Sanity check.  */
-   if ((res = cp_walk_tree (&FOR_INIT_STMT (for_stmt),
-   analyze_expression_awaits, d, &visited)))
- return res;
-   gcc_checking_assert (!awpts->saw_awaits);
-
-   if ((res = cp_walk_tree (&FOR_COND (for_stmt),
-   analyze_expression_awaits, d, &visited)))
- return res;
-   bool for_cond_await = awpts->saw_awaits != 0;
-   unsigned save_awaits = awpts->saw_awaits;
-
-   if ((res = cp_walk_tree (&FOR_EXPR (for_stmt),
-   analyze_expression_awaits, d, &visited)))
- return res;
-   bool for_expr_await = awpts->saw_awaits > save_awaits;
+   gcc_checking_assert
+ (!(cp_walk_tree (&FOR_INIT_STMT (for_stmt), find_any_await,
+  &await_ptr, &visited)));


What's the rationale for this assert?  [expr.await] seems to say 
explicitly that an await can appear in the initializer of an init-statement.



+   visited.empty ();
+   bool for_cond_await
+ = cp_walk_tree (&FOR_COND

Re: [PATCH][v2] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-15 Thread Koning, Paul via Gcc-patches



> On Sep 15, 2021, at 11:55 AM, John David Anglin  wrote:
> 
> On 2021-09-15 10:06 a.m., Richard Biener wrote:
>>> Is there a simple way to enable -gstabs in build?
>> Currently not.  If we're retaining more than pdp11 with a non-DWARF
>> config I'm considering to allow STABS by default for those without
>> diagnostics for GCC 12.
>> 
>> With GCC 13 we'll definitely either remove the configurations or
>> leave the target without any support for debug info.
> I tend to think targets without any support for debug information should be
> removed.  There is some time before GCC 13.  This provides a chance for the
> target to implement DWARF support.

I suppose.  But for pdp11 at least, DWARF and ELF are both somewhat unnatural 
and anachronistic.  PDP11 unixes use much older debug formats, and DEC 
operating systems are more primitive still (no debug symbols at all, of any 
kind).  So for that case at least, supporting the target but without debug 
symbols would not be a crazy option.

Of course, it would be neat to be able to debug PDP-11 code with GDB...

paul




Re: [FYI] zero-call-used-regs attr for ada

2021-09-15 Thread Alexandre Oliva
On Sep 15, 2021, Alexandre Oliva  wrote:

> Regstrapped on x86_64-linux-gnu.  Patch pre-approved by Olivier Hainque.

Uhh, actually, Olivier had only seen and approved these changes:

> for  gcc/ada/ChangeLog

>   * gcc-interface/utils.c: Include opts.h.
>   (handle_zero_call_used_regs_attribute): New.
>   (gnat_internal_attribute_table): Add zero_call_used_regs.

The trivial testcase went in without anyone's approval.  Oops.

> for  gcc/testsuite/ChangeLog

>   * gnat.dg/zcur_attr.adb, gnat.dg/zcur_attr.ads: New.

Sorry about the slip.  It's trivial enough that I suppose it can fit the
"obviously correct" rule.  Please let me know in case you disagree.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] testsuite: Make sure double-precision is supported in g++.dg/eh/arm-vfp-unwind.C

2021-09-15 Thread Richard Earnshaw via Gcc-patches




On 15/09/2021 17:13, Christophe Lyon via Gcc-patches wrote:
> On Wed, Sep 15, 2021 at 2:49 PM Richard Earnshaw via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
> > On 15/09/2021 13:26, Christophe LYON via Gcc-patches wrote:
> > >
> > > On 15/09/2021 13:02, Richard Earnshaw wrote:
> > > >
> > > > On 26/08/2021 16:53, Christophe Lyon via Gcc-patches wrote:
> > > > > g++.dg/eh/arm-vfp-unwind.C uses an asm statement relying on
> > > > > double-precision FPU support, but does not make sure it is actually
> > > > > supported by the target.
> > > > > Check (__ARM_FP & 8) to ensure this.
> > > > >
> > > > > 2021-08-26  Christophe Lyon
> > > > >
> > > > >  gcc/testsuite/
> > > > >  * g++.dg/eh/arm-vfp-unwind.C: Check __ARM_FP.
> > > > > ---
> > > > >    gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C | 2 +-
> > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> > > > > b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> > > > > index 62263c0c3b0..90d20081d78 100644
> > > > > --- a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> > > > > +++ b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> > > > > @@ -3,7 +3,7 @@
> > > > >  /* Test to catch off-by-one errors in arm/pr-support.c.  */
> > > > >    -#if defined (__VFP_FP__) && !defined (__SOFTFP__)
> > > > > +#if defined (__VFP_FP__) && !defined (__SOFTFP__) && (__ARM_FP & 8)
> > > > >  #include
> > > > >    #include
> > > >
> > > > Wouldn't it be better to have an alternate to the asm for the case
> > > > where we only have single-precision float?  Something like (untested):
> > > >
> > > > static void donkey ()
> > > > {
> > > > #if __ARM_FP & 8
> > > >    asm volatile ("fcpyd d9, %P0" : : "w" (1.2345) : "d9");
> > > > #else
> > > >    asm volatile ("fcpys s18, %P0" : : "w" (1.2345f) : "s18");
> > > > #endif
> > > >    throw 1;
> > > > }
> > >
> > > I tried similar things but they failed on some testing configurations.
> > >
> > > Let me try your version, I'll let you know if there is any fallout.
> >
> > Of course, the asm syntax should be converted to the new 'unified
> > syntax' form ie vmov.f{32,64}.
>
> The problem is that %P expects a double-precision register.
> It seems there's nothing to print a single-precision register, or rather %p
> (small p)
> rejects s18 too.

I said it was untested :)

You want something like

#if __ARM_FP & 8
asm volatile ("vmov.f64 d9, %P0" : : "w" (1.2345) : "d9");
#else
asm volatile ("vmov.f32 s18, %0" : : "t" (1.2345f) : "s18");
#endif

(there's no need for a modifier on the single-precision register name).




R.



Christophe




R.
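For readers less familiar with the ACLE macro, the `#if __ARM_FP & 8` split in the suggestion above can be modelled in plain C. ACLE defines `__ARM_FP` as a bit mask of hardware floating-point support (0x2 half, 0x4 single, 0x8 double precision). The register choice also matters: s18 is the low half of d9, and both lie in the callee-saved VFP range (s16-s31, i.e. d8-d15), so either branch should still exercise the unwinder's VFP register restore. The helper `vfp_clobber_reg` is hypothetical; it only mirrors the preprocessor logic and is not part of the testcase.

```c
#include <stddef.h>

/* ACLE __ARM_FP bits: 0x2 = half, 0x4 = single, 0x8 = double precision.  */
enum { ARM_FP_HALF = 0x2, ARM_FP_SINGLE = 0x4, ARM_FP_DOUBLE = 0x8 };

/* Hypothetical helper mirroring the "#if __ARM_FP & 8 ... #else" split:
   return the VFP register the test would clobber for a given __ARM_FP
   value, or NULL when there is no usable hardware FP and the asm must
   be skipped altogether.  */
static const char *
vfp_clobber_reg (int arm_fp)
{
  if (arm_fp & ARM_FP_DOUBLE)
    return "d9";   /* vmov.f64 d9, %P0 with a "w" operand.  */
  if (arm_fp & ARM_FP_SINGLE)
    return "s18";  /* vmov.f32 s18, %0 with a "t" operand.  */
  return NULL;     /* Half precision alone is not enough.  */
}
```

With this model, a value of 0xe (double, single and half) selects d9, 0x6 (single and half only) selects s18, and a half-precision-only value selects nothing.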




Re: [PATCH] c++: default ctor that's also a list ctor [PR102050]

2021-09-15 Thread Jason Merrill via Gcc-patches

On 9/14/21 15:16, Patrick Palka wrote:

In grok_special_member_properties we need to set TYPE_HAS_COPY_CTOR,
TYPE_HAS_DEFAULT_CONSTRUCTOR and TYPE_HAS_LIST_CTOR independently
from each other because a single constructor can be both a default and
list constructor (as in the first testcase), or both a default and copy
constructor (as in the second testcase).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/102050

gcc/cp/ChangeLog:

* decl.c (grok_special_member_properties): Set
TYPE_HAS_COPY_CTOR, TYPE_HAS_DEFAULT_CONSTRUCTOR
and TYPE_HAS_LIST_CTOR independently from each other.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist125.C: New test.
* g++.dg/cpp0x/initlist126.C: New test.
---
  gcc/cp/decl.c|  6 --
  gcc/testsuite/g++.dg/cpp0x/initlist125.C | 10 ++
  gcc/testsuite/g++.dg/cpp0x/initlist126.C | 17 +
  3 files changed, 31 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist125.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist126.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1a2925b4108..76e4e6e8a26 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -14843,9 +14843,11 @@ grok_special_member_properties (tree decl)
  if (ctor > 1)
TYPE_HAS_CONST_COPY_CTOR (class_type) = 1;
}
-  else if (sufficient_parms_p (FUNCTION_FIRST_USER_PARMTYPE (decl)))
+
+  if (sufficient_parms_p (FUNCTION_FIRST_USER_PARMTYPE (decl)))
TYPE_HAS_DEFAULT_CONSTRUCTOR (class_type) = 1;
-  else if (is_list_ctor (decl))
+
+  if (is_list_ctor (decl))
TYPE_HAS_LIST_CTOR (class_type) = 1;
  
if (DECL_DECLARED_CONSTEXPR_P (decl)

diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist125.C b/gcc/testsuite/g++.dg/cpp0x/initlist125.C
new file mode 100644
index 000..08ae3741c67
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist125.C
@@ -0,0 +1,10 @@
+// PR c++/102050
+// { dg-do compile { target c++11 } }
+
+#include 
+
+struct A { A(std::initializer_list = {}); };
+
+A x{0};
+A y{1, 2, 3};
+A z;
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist126.C b/gcc/testsuite/g++.dg/cpp0x/initlist126.C
new file mode 100644
index 000..0a8fb998be6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist126.C
@@ -0,0 +1,17 @@
+// PR c++/102050
+// { dg-do compile { target c++11 } }
+
+#include 
+
+extern struct A a;
+
+struct A {
+  A(const A& = a);
+  A(std::initializer_list) = delete;
+};
+
+void f(A);
+
+int main() {
+  f({}); // { dg-bogus "deleted" }
+}
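The effect of replacing the else-if chain in grok_special_member_properties can be illustrated with a small C model. The structs and function names are hypothetical stand-ins, not GCC's tree representation; each `ctor_props` field stands for what the corresponding check (the copy-constructor test, `sufficient_parms_p`, `is_list_ctor`) would report for one constructor.

```c
#include <stdbool.h>

/* Hypothetical model of the three per-class flags.  */
struct ctor_flags { bool has_copy_ctor, has_default_ctor, has_list_ctor; };

/* Hypothetical summary of one constructor's properties.  */
struct ctor_props { bool is_copy, all_parms_defaulted, is_list; };

/* Old shape: an else-if chain, so at most one flag is set per ctor.  */
static void
record_ctor_chained (struct ctor_flags *f, struct ctor_props c)
{
  if (c.is_copy)
    f->has_copy_ctor = true;
  else if (c.all_parms_defaulted)
    f->has_default_ctor = true;
  else if (c.is_list)
    f->has_list_ctor = true;
}

/* New shape: independent tests, so one ctor can set several flags.  */
static void
record_ctor_independent (struct ctor_flags *f, struct ctor_props c)
{
  if (c.is_copy)
    f->has_copy_ctor = true;
  if (c.all_parms_defaulted)
    f->has_default_ctor = true;
  if (c.is_list)
    f->has_list_ctor = true;
}
```

For the constructor in initlist125.C, which takes a defaulted std::initializer_list parameter, both `all_parms_defaulted` and `is_list` are true: the chained version records only the default constructor, while the independent version records both. The constructor in initlist126.C, `A(const A& = a)`, is likewise both a copy and a default constructor.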





[FYI] zero-call-used-regs attr for ada

2021-09-15 Thread Alexandre Oliva


Make the zero_call_used_regs attribute usable as a Machine_Attribute
pragma.

Regstrapped on x86_64-linux-gnu.  Patch pre-approved by Olivier Hainque.


for  gcc/ada/ChangeLog

* gcc-interface/utils.c: Include opts.h.
(handle_zero_call_used_regs_attribute): New.
(gnat_internal_attribute_table): Add zero_call_used_regs.

for  gcc/testsuite/ChangeLog

* gnat.dg/zcur_attr.adb, gnat.dg/zcur_attr.ads: New.
---
 gcc/ada/gcc-interface/utils.c   |   59 +++
 gcc/testsuite/gnat.dg/zcur_attr.adb |8 +
 gcc/testsuite/gnat.dg/zcur_attr.ads |4 ++
 3 files changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gnat.dg/zcur_attr.adb
 create mode 100644 gcc/testsuite/gnat.dg/zcur_attr.ads

diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index 4190855b76394..be3f107926d7c 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -38,6 +38,7 @@
 #include "attribs.h"
 #include "varasm.h"
 #include "toplev.h"
+#include "opts.h"
 #include "output.h"
 #include "debug.h"
 #include "convert.h"
@@ -109,6 +110,8 @@ static tree handle_target_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
 static tree handle_vector_size_attribute (tree *, tree, tree, int, bool *);
 static tree handle_vector_type_attribute (tree *, tree, tree, int, bool *);
+static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
+ bool *);
 
 static const struct attribute_spec::exclusions attr_cold_hot_exclusions[] =
 {
@@ -191,6 +194,9 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "may_alias",0, 0,  false, true,  false, false,
 NULL, NULL },
 
+  { "zero_call_used_regs", 1, 1, true, false, false, false,
+handle_zero_call_used_regs_attribute, NULL },
+
   /* ??? format and format_arg are heavy and not supported, which actually
  prevents support for stdio builtins, which we however declare as part
  of the common builtins.def contents.  */
@@ -6987,6 +6993,59 @@ handle_vector_type_attribute (tree *node, tree name, tree ARG_UNUSED (args),
   return NULL_TREE;
 }
 
+/* Handle a "zero_call_used_regs" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
+ int ARG_UNUSED (flags),
+ bool *no_add_attrs)
+{
+  tree decl = *node;
+  tree id = TREE_VALUE (args);
+
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute applies only to functions", name);
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+
+  /* pragma Machine_Attribute turns string arguments into identifiers.
+ Reverse it.  */
+  if (TREE_CODE (id) == IDENTIFIER_NODE)
+id = TREE_VALUE (args) = build_string
+  (IDENTIFIER_LENGTH (id), IDENTIFIER_POINTER (id));
+
+  if (TREE_CODE (id) != STRING_CST)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE argument not a string", name);
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+
+  bool found = false;
+  for (unsigned int i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
+if (strcmp (TREE_STRING_POINTER (id),
+   zero_call_used_regs_opts[i].name) == 0)
+  {
+   found = true;
+   break;
+  }
+
+  if (!found)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "unrecognized %qE attribute argument %qs",
+   name, TREE_STRING_POINTER (id));
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
 /* --- *
  *  BUILTIN FUNCTIONS  *
  * --- */
diff --git a/gcc/testsuite/gnat.dg/zcur_attr.adb b/gcc/testsuite/gnat.dg/zcur_attr.adb
new file mode 100644
index 0..5d15f5e9d7324
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/zcur_attr.adb
@@ -0,0 +1,8 @@
+--  { dg-do compile }
+--  { dg-options "-fdump-tree-optimized" }
+
+package body ZCUR_Attr is
+   function F return Integer is (0);
+end ZCUR_Attr;
+
+--  { dg-final { scan-tree-dump "zero_call_used_regs \[(\]\"all\"\[)\]" "optimized" } }
diff --git a/gcc/testsuite/gnat.dg/zcur_attr.ads b/gcc/testsuite/gnat.dg/zcur_attr.ads
new file mode 100644
index 0..b756cc838b8df
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/zcur_attr.ads
@@ -0,0 +1,4 @@
+package ZCUR_Attr is
+   function F return Integer;
+   pragma Machine_Attribute (F, "zero_call_used_regs", "all");
+end ZCUR_Attr;
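The validation loop in handle_zero_call_used_regs_attribute is a linear scan of a NULL-terminated option table. A minimal C sketch of the same pattern follows; the table contents are a hypothetical subset ("skip", "used", "all") standing in for GCC's `zero_call_used_regs_opts`, which lists more choices.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for zero_call_used_regs_opts: a table of
   accepted names terminated by a NULL sentinel (subset only).  */
struct zcur_opt { const char *name; };
static const struct zcur_opt zcur_opts[] = {
  { "skip" }, { "used" }, { "all" }, { NULL }
};

/* Same shape as the loop in the handler: walk the table until the
   sentinel and report whether ARG matches one of the names exactly.  */
static bool
zcur_arg_valid (const char *arg)
{
  for (size_t i = 0; zcur_opts[i].name != NULL; ++i)
    if (strcmp (arg, zcur_opts[i].name) == 0)
      return true;
  return false;
}
```

With this table, "all" (the argument used in zcur_attr.ads) is accepted, while a misspelled or differently-cased argument is rejected; that is the point at which the real handler reports the "unrecognized ... attribute argument" error.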



Re: [pushed] c++: don't warn about internal interference sizes

2021-09-15 Thread Christophe Lyon via Gcc-patches
On Wed, Sep 15, 2021 at 5:39 PM Jason Merrill via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Wed, Sep 15, 2021 at 11:37 AM Jeff Law  wrote:
>
> >
> >
> > On 9/15/2021 9:31 AM, Jason Merrill via Gcc-patches wrote:
> > > Most any compilation on ARM/AArch64 was warning because the default L1 cache
> > > line size of 32B was smaller than the default
> > > std::hardware_constructive_interference_size of 64B.  This is mostly due to
> > > inaccurate --param l1-cache-line-size, but it's not helpful to complain to a
> > > user that didn't set the values.
> > >
> > > gcc/cp/ChangeLog:
> > >
> > >   * decl.c (cxx_init_decl_processing): Only warn about odd
> > >   interference sizes if they were specified with --param.
> > I wonder if that'll fix the arm-linux build failures that started
> > showing up recently:
> >
> > armeb-linux-gnueabi:
> >
> > : error: '--param constructive-interference-size=64' is
> > greater than '--param l1-cache-line-size=32' [-Werror=interference-size]
> >
> > I expect the other arm- linux configurations would show it as well, but
> > they're only run once a week in my tester and I don't think they've been
> > run since the recent changes in this space.
> >
>
> Yes, that is exactly the purpose of this change.
>
>
I can confirm that those targets build again, thanks.


> Jason
>


Re: [PATCH] testsuite: Make sure double-precision is supported in g++.dg/eh/arm-vfp-unwind.C

2021-09-15 Thread Christophe Lyon via Gcc-patches
On Wed, Sep 15, 2021 at 2:49 PM Richard Earnshaw via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
>
> On 15/09/2021 13:26, Christophe LYON via Gcc-patches wrote:
> >
> > On 15/09/2021 13:02, Richard Earnshaw wrote:
> >>
> >>
> >> On 26/08/2021 16:53, Christophe Lyon via Gcc-patches wrote:
> >>> g++.dg/eh/arm-vfp-unwind.C uses an asm statement relying on
> >>> double-precision FPU support, but does not make sure it is actually
> >>> supported by the target.
> >>> Check (__ARM_FP & 8) to ensure this.
> >>>
> >>> 2021-08-26  Christophe Lyon  
> >>>
> >>> gcc/testsuite/
> >>> * g++.dg/eh/arm-vfp-unwind.C: Check __ARM_FP.
> >>> ---
> >>>   gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C | 2 +-
> >>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> >>> b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> >>> index 62263c0c3b0..90d20081d78 100644
> >>> --- a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> >>> +++ b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> >>> @@ -3,7 +3,7 @@
> >>> /* Test to catch off-by-one errors in arm/pr-support.c.  */
> >>>   -#if defined (__VFP_FP__) && !defined (__SOFTFP__)
> >>> +#if defined (__VFP_FP__) && !defined (__SOFTFP__) && (__ARM_FP & 8)
> >>> #include 
> >>>   #include 
> >>>
> >>
> >> Wouldn't it be better to have an alternate to the asm for the case
> >> where we only have single-precision float?  Something like (untested):
> >>
> >> static void donkey ()
> >> {
> >> #if __ARM_FP & 8
> >>   asm volatile ("fcpyd d9, %P0" : : "w" (1.2345) : "d9");
> >> #else
> >>   asm volatile ("fcpys s18, %P0" : : "w" (1.2345f) : "s18");
> >> #endif
> >>   throw 1;
> >> }
> >
> >
> > I tried similar things but they failed on some testing configurations.
> >
> > Let me try your version, I'll let you know if there is any fallout.
>
> Of course, the asm syntax should be converted to the new 'unified
> syntax' form ie vmov.f{32,64}.
>
>
The problem is that %P expects a double-precision register.
It seems there's nothing to print a single-precision register, or rather %p
(small p)
rejects s18 too.



> R.
>
> >
> > Christophe
> >
> >
> >>
> >> R.
>


Re: [PATCH][v2] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-15 Thread John David Anglin
On 2021-09-15 10:06 a.m., Richard Biener wrote:
>> Is there a simple way to enable -gstabs in build?
> Currently not.  If we're retaining more than pdp11 with a non-DWARF
> config I'm considering to allow STABS by default for those without
> diagnostics for GCC 12.
>
> With GCC 13 we'll definitely either remove the configurations or
> leave the target without any support for debug info.
I tend to think targets without any support for debug information should be
removed.  There is some time before GCC 13.  This provides a chance for the
target to implement DWARF support.

Dave

-- 
John David Anglin  dave.ang...@bell.net



[PATCH][pushed] i386: port vxworks to TARGET_CPU_P macro

2021-09-15 Thread Martin Liška

Hi.

In g:f23881fcf081a6edd538d6d54fa0068d716973d7 I replaced TARGET_* macros
with TARGET_CPU_P.

Pushed as obvious.
Martin

PR target/102351

gcc/ChangeLog:

* config/i386/vxworks.h: Use new macro TARGET_CPU_P.
---
 gcc/config/i386/vxworks.h | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/vxworks.h b/gcc/config/i386/vxworks.h
index ebda7d9d26a..0676cb4cead 100644
--- a/gcc/config/i386/vxworks.h
+++ b/gcc/config/i386/vxworks.h
@@ -73,37 +73,37 @@ along with GCC; see the file COPYING3.  If not see
   VXWORKS_OS_CPP_BUILTINS ();  \
   if (TARGET_64BIT)\
VX_CPUDEF (X86_64); \
-  else if (TARGET_PENTIUM4)\
+  else if (TARGET_CPU_P (PENTIUM4))\
{   \
  VX_CPUDEF (PENTIUM4); \
  VX_CPUVDEF (PENTIUM4);\
}   \
-  else if (TARGET_CORE2)   \
+  else if (TARGET_CPU_P (CORE2))   \
VX_CPUDEF (CORE2);  \
-  else if (TARGET_NEHALEM) \
+  else if (TARGET_CPU_P (NEHALEM)) \
VX_CPUDEF (NEHALEM);\
-  else if (TARGET_SANDYBRIDGE) \
+  else if (TARGET_CPU_P (SANDYBRIDGE)) \
VX_CPUDEF (SANDYBRIDGE);\
-  else if (TARGET_HASWELL) \
+  else if (TARGET_CPU_P (HASWELL)) \
VX_CPUDEF (HASWELL);\
-  else if (TARGET_SILVERMONT)  \
+  else if (TARGET_CPU_P (SILVERMONT))  \
VX_CPUDEF (SILVERMONT); \
-  else if (TARGET_SKYLAKE || TARGET_SKYLAKE_AVX512) \
+  else if (TARGET_CPU_P (SKYLAKE) || TARGET_CPU_P (SKYLAKE_AVX512)) \
VX_CPUDEF (SKYLAKE);\
-  else if (TARGET_GOLDMONT)\
+  else if (TARGET_CPU_P (GOLDMONT))\
VX_CPUDEF (GOLDMONT);   \
   else if (TARGET_VXWORKS7)\
VX_CPUDEF (PENTIUM4);   \
-  else if (TARGET_386) \
+  else if (TARGET_CPU_P (I386))\
VX_CPUDEF (I80386); \
-  else if (TARGET_486) \
+  else if (TARGET_CPU_P (I486))\
VX_CPUDEF (I80486); \
-  else if (TARGET_PENTIUM) \
+  else if (TARGET_CPU_P (PENTIUM)) \
{   \
  VX_CPUDEF (PENTIUM);  \
  VX_CPUVDEF (PENTIUM); \
}   \
-  else if (TARGET_PENTIUMPRO)  \
+  else if (TARGET_CPU_P (PENTIUMPRO))  \
{   \
  VX_CPUDEF (PENTIUM2); \
  VX_CPUVDEF (PENTIUMPRO);  \
--
2.33.0
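The conversion relies on token pasting: one function-like macro replaces a family of per-CPU macros. The sketch below assumes a definition of the general form `(tune == PROCESSOR_ ## CPU)`; the enum, the `tune` variable and the `CPU_P` name are hypothetical stand-ins, not the actual i386 definitions.

```c
#include <stdbool.h>

/* Hypothetical processor enum and currently selected tuning target.  */
enum processor_type {
  PROCESSOR_I386, PROCESSOR_I486, PROCESSOR_PENTIUM, PROCESSOR_PENTIUM4,
  PROCESSOR_CORE2, PROCESSOR_SKYLAKE, PROCESSOR_SKYLAKE_AVX512
};
static enum processor_type tune = PROCESSOR_I386;

/* The ## operator pastes the argument onto PROCESSOR_, so CPU_P (CORE2)
   expands to (tune == PROCESSOR_CORE2).  The pasted result must name an
   existing enumerator, otherwise compilation fails.  */
#define CPU_P(CPU) (tune == PROCESSOR_ ## CPU)

/* Mirrors the SKYLAKE || SKYLAKE_AVX512 test in vxworks.h.  */
static bool
is_skylake_family (void)
{
  return CPU_P (SKYLAKE) || CPU_P (SKYLAKE_AVX512);
}
```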



Re: [pushed] c++: don't warn about internal interference sizes

2021-09-15 Thread Jason Merrill via Gcc-patches
On Wed, Sep 15, 2021 at 11:37 AM Jeff Law  wrote:

>
>
> On 9/15/2021 9:31 AM, Jason Merrill via Gcc-patches wrote:
> > Most any compilation on ARM/AArch64 was warning because the default L1 cache
> > line size of 32B was smaller than the default
> > std::hardware_constructive_interference_size of 64B.  This is mostly due to
> > inaccurate --param l1-cache-line-size, but it's not helpful to complain to a
> > user that didn't set the values.
> >
> > gcc/cp/ChangeLog:
> >
> >   * decl.c (cxx_init_decl_processing): Only warn about odd
> >   interference sizes if they were specified with --param.
> I wonder if that'll fix the arm-linux build failures that started
> showing up recently:
>
> armeb-linux-gnueabi:
>
> : error: '--param constructive-interference-size=64' is
> greater than '--param l1-cache-line-size=32' [-Werror=interference-size]
>
> I expect the other arm- linux configurations would show it as well, but
> they're only run once a week in my tester and I don't think they've been
> run since the recent changes in this space.
>

Yes, that is exactly the purpose of this change.

Jason


Re: [pushed] c++: don't warn about internal interference sizes

2021-09-15 Thread Jeff Law via Gcc-patches




On 9/15/2021 9:31 AM, Jason Merrill via Gcc-patches wrote:
> Most any compilation on ARM/AArch64 was warning because the default L1 cache
> line size of 32B was smaller than the default
> std::hardware_constructive_interference_size of 64B.  This is mostly due to
> inaccurate --param l1-cache-line-size, but it's not helpful to complain to a
> user that didn't set the values.
>
> gcc/cp/ChangeLog:
>
> * decl.c (cxx_init_decl_processing): Only warn about odd
> interference sizes if they were specified with --param.

I wonder if that'll fix the arm-linux build failures that started
showing up recently:

armeb-linux-gnueabi:

: error: '--param constructive-interference-size=64' is
greater than '--param l1-cache-line-size=32' [-Werror=interference-size]

I expect the other arm- linux configurations would show it as well, but
they're only run once a week in my tester and I don't think they've been
run since the recent changes in this space.


jeff



Re: [PATCH RFC] c++: implement C++17 hardware interference size

2021-09-15 Thread Jason Merrill via Gcc-patches

On 9/15/21 8:31 AM, Martin Liška wrote:
> On 9/14/21 09:56, Christophe LYON via Gcc-patches wrote:
>> So adjustment is needed for both arm and aarch64 targets
>
> Hello.
>
> I noticed the same problem and I've got a patch candidate for it.
>
> What do you think about it?


I've now silenced the warning for internal default values, but they're 
still worth discussion.


For arm, I think it would make sense to use 32 for constructive for the 
generic target, but perhaps we think the older chips aren't worth 
constraining new code for?  In which case, perhaps 
param_l1_cache_line_size should also default to 64 for the generic target.


For aarch64, it seems even more questionable that 
param_l1_cache_line_size is 32 for the generic target; are there any 
aarch64 CPUs with a 32B L1 cache line?


Jason



[pushed] c++: don't warn about internal interference sizes

2021-09-15 Thread Jason Merrill via Gcc-patches
Most any compilation on ARM/AArch64 was warning because the default L1 cache
line size of 32B was smaller than the default
std::hardware_constructive_interference_size of 64B.  This is mostly due to
inaccurate --param l1-cache-line-size, but it's not helpful to complain to a
user that didn't set the values.

gcc/cp/ChangeLog:

* decl.c (cxx_init_decl_processing): Only warn about odd
interference sizes if they were specified with --param.
---
 gcc/cp/decl.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1a2925b4108..9ad9446e262 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4756,7 +4756,7 @@ cxx_init_decl_processing (void)
   /* Check that the hardware interference sizes are at least
  alignof(max_align_t), as required by the standard.  */
   const int max_align = max_align_t_align () / BITS_PER_UNIT;
-  if (param_destruct_interfere_size)
+  if (global_options_set.x_param_destruct_interfere_size)
 {
   if (param_destruct_interfere_size < max_align)
error ("%<--param destructive-interference-size=%d%> is less than "
@@ -4767,11 +4767,13 @@ cxx_init_decl_processing (void)
 "is less than %<--param l1-cache-line-size=%d%>",
 param_destruct_interfere_size, param_l1_cache_line_size);
 }
+  else if (param_destruct_interfere_size)
+/* Assume the internal value is OK.  */;
   else if (param_l1_cache_line_size >= max_align)
 param_destruct_interfere_size = param_l1_cache_line_size;
   /* else leave it unset.  */
 
-  if (param_construct_interfere_size)
+  if (global_options_set.x_param_construct_interfere_size)
 {
   if (param_construct_interfere_size < max_align)
error ("%<--param constructive-interference-size=%d%> is less than "
@@ -4783,6 +4785,8 @@ cxx_init_decl_processing (void)
 "is greater than %<--param l1-cache-line-size=%d%>",
 param_construct_interfere_size, param_l1_cache_line_size);
 }
+  else if (param_construct_interfere_size)
+/* Assume the internal value is OK.  */;
   else if (param_l1_cache_line_size >= max_align)
 param_construct_interfere_size = param_l1_cache_line_size;
 }

base-commit: adddfc85c07143f7c8097a90a83bfb15b8bd52e8
-- 
2.27.0
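The control flow the patch gives each parameter can be summarised in a simplified C model of the constructive-size branch. Everything here is hypothetical: `user_set` stands in for `global_options_set.x_param_construct_interfere_size`, `value` for `param_construct_interfere_size` (0 meaning unset), `l1` for `param_l1_cache_line_size` and `max_align` for `max_align_t_align () / BITS_PER_UNIT`. The hunk does not show every condition guarding the warning, so treat the diagnostic logic as approximate.

```c
#include <stdbool.h>

enum diag { DIAG_NONE, DIAG_ERROR, DIAG_WARNING };

/* Simplified, hypothetical model of the constructive-interference branch
   of cxx_init_decl_processing after the patch.  Returns the size that
   ends up in effect and stores the diagnostic that would be issued.  */
static int
resolve_constructive_size (bool user_set, int value, int l1,
                           int max_align, enum diag *d)
{
  *d = DIAG_NONE;
  if (user_set)
    {
      /* Only values given explicitly with --param are diagnosed.  */
      if (value < max_align)
        *d = DIAG_ERROR;
      else if (value > l1)
        *d = DIAG_WARNING;
      return value;
    }
  if (value != 0)
    return value;   /* Internal default: assume it is OK.  */
  if (l1 >= max_align)
    return l1;      /* Derive the size from the L1 cache line size.  */
  return 0;         /* Leave it unset.  */
}
```

Under this model, the ARM/AArch64 situation from the thread (internal default 64, L1 line size 32, not set by the user) resolves to 64 with no diagnostic, whereas passing --param constructive-interference-size=64 explicitly would still warn.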



[PATCH][OBVIOUS] rs6000: fix symtab_node::get == NULL issue

2021-09-15 Thread Martin Liška

Hello.

The patch is approved by David and fixes the issue described in the PR.

Martin

PR target/102349

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_xcoff_encode_section_info):
Check that we have a symbol summary for a symbol.
---
 gcc/config/rs6000/rs6000.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b0ec8108007..d0830a95027 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21728,6 +21728,7 @@ rs6000_xcoff_encode_section_info (tree decl, rtx rtl, int first)
   if (decl
   && DECL_P (decl)
   && VAR_OR_FUNCTION_DECL_P (decl)
+  && symtab_node::get (decl) != NULL
   && symtab_node::get (decl)->alias == 0
   && symname[strlen (symname) - 1] != ']')
 {
--
2.33.0
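The fix relies on `&&` short-circuiting: the `->alias` access is only evaluated once the lookup is known to be non-NULL. Below is a C sketch of the same guard, with a hypothetical node type and lookup function standing in for `symtab_node::get`, written in the variant that performs the lookup once and reuses the result rather than calling it twice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbol-table entry; only the field the guard reads.  */
struct node { int alias; };

/* Hypothetical lookup that, like symtab_node::get, may return NULL.  */
static struct node *
lookup (struct node *table[], size_t n, size_t key)
{
  return key < n ? table[key] : NULL;
}

/* The NULL test short-circuits, so sn->alias is never read through a
   null pointer.  */
static bool
is_non_alias (struct node *table[], size_t n, size_t key)
{
  struct node *sn = lookup (table, n, key);
  return sn != NULL && sn->alias == 0;
}
```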



Re: GNU Tools @ LPC 2021: Program is published

2021-09-15 Thread Thomas Schwinge
Hi!

On 2021-09-10T09:10:23+0100, Jeremy Bennett  wrote:
> The program for the GNU Tools Track at Linux Plumbers Conference is
> published:
>
>   https://linuxplumbersconf.org/event/11/sessions/109/

Yay!

This may qualify as "obvious", but I'd better have what I change on our
front page to the Internet reviewed ;-) -- OK to push to the wwwdocs master
branch the attached "GNU Tools @ Linux Plumbers Conference 2021"?


Grüße
 Thomas


> A total of 25 talks, lightning talks and BoFs plus our regular Q&A
> session with the GCC steering committee and Glibc, GDB and binutils
> stewards. The sessions run 07:00-11:00 Pacific Time from Monday 20
> September - Thursday 23 September. This means there is no clash with the
> LPC Toolchains and Kernel Microconference on Friday 24 September.
>
> This is a virtual conference hosted using BigBlueButton. It isn't a free
> conference - you'll need to sign up for a ticket (for all of LPC) to
> participate. However the talks should also be live streamed free on YouTube.
>
>   https://linuxplumbersconf.org/event/11/page/112-attend
>
> Speakers should be contacted next week with details of their free
> tickets. Contact me or sarah.c...@embecosm.com if you are a speaker and
> don't get an email.
>
> Best wishes,
>
>
> Jeremy


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 51e2e792d8a66436df126a28e870ac9f38767600 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 15 Sep 2021 17:08:12 +0200
Subject: [PATCH] GNU Tools @ Linux Plumbers Conference 2021

---
 htdocs/index.html | 5 +
 1 file changed, 5 insertions(+)

diff --git a/htdocs/index.html b/htdocs/index.html
index d6b0d959..c7368e26 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -54,6 +54,11 @@ mission statement.
 
 News
 
+
+https://gcc.gnu.org/wiki/linuxplumbers2021;>GNU Tools @ Linux Plumbers Conference 2021
+[2021-09-15]
+Will be held through online videoconference, September 20-24 2021
+
 GCC 11.2 released
 [2021-07-28]
 
-- 
2.25.1



[PATCH][pushed] gcc-changelog: check git commit email address

2021-09-15 Thread Martin Liška

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Check commit email.
* gcc-changelog/test_email.py: Add new test.
* gcc-changelog/test_patches.txt: Likewise.
---
 contrib/gcc-changelog/git_commit.py| 10 ++
 contrib/gcc-changelog/test_email.py|  5 +
 contrib/gcc-changelog/test_patches.txt | 25 +
 3 files changed, 40 insertions(+)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index d1646bdc0cd..03736140fd0 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -326,6 +326,8 @@ class GitCommit:
 if not self.info:
 return
 
+self.check_commit_email()

+
 # Extract PR numbers form the subject line
 # Match either [PR] / (PR) or PR component/
 if self.info.lines and not self.revert_commit:
@@ -803,3 +805,11 @@ class GitCommit:
 print('Errors:')
 for error in self.errors:
 print(error)
+
+def check_commit_email(self):
+# Parse 'Martin Liska  '
+email = self.info.author.split(' ')[-1].strip('<>')
+
+# Verify that all characters are ASCII
+if len(email) != len(email.encode()):
+self.errors.append(Error(f'non-ASCII characters in git commit email address ({email})'))
diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index 319e065ca55..dae7c27c707 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -440,3 +440,8 @@ class TestGccChangelog(unittest.TestCase):
 def test_copyright_years(self):
 email = self.from_patch_glob('copyright-years.patch')
 assert not email.errors
+
+def test_non_ascii_email(self):
+email = self.from_patch_glob('non-ascii-email.patch')
+assert (email.errors[0].message ==
+'non-ASCII characters in git commit email address (jbglaw@ług-owl.de)')
diff --git a/contrib/gcc-changelog/test_patches.txt b/contrib/gcc-changelog/test_patches.txt
index ba516274fc1..98a0d3f1ee0 100644
--- a/contrib/gcc-changelog/test_patches.txt
+++ b/contrib/gcc-changelog/test_patches.txt
@@ -3464,3 +3464,28 @@ index 6f67552d075..32478f070e8 100644
 +
 --
 2.25.1
+
+=== non-ascii-email.patch ===
+From f42e95a830ab48e59389065ce79a013a519646f1 Mon Sep 17 00:00:00 2001
+From: Jan-Benedict Glaw 
+Date: Mon, 13 Sep 2021 12:08:25 +0200
+Subject: [PATCH] Fix multi-statment macro
+
+INIT_CUMULATIVE_ARGS() expands to multiple statements, which will break right
+after an `if` statement. Wrap it into a block.
+
+gcc/ChangeLog:
+
+   * config/alpha/vms.h (INIT_CUMULATIVE_ARGS): Wrap multi-statment
+   define into a block.
+---
+ gcc/config/alpha/vms.h | 10 +++---
+ 1 file changed, 7 insertions(+), 3 deletions(-)
+
+diff --git a/gcc/config/alpha/vms.h b/gcc/config/alpha/vms.h
+index 2a9917cde62..0033b0004b3 100644
+--- a/gcc/config/alpha/vms.h
++++ b/gcc/config/alpha/vms.h
+@@ -0,0 +1,1 @@
++
+--
--
2.33.0
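The new check_commit_email test works because Python's `str.encode ()` defaults to UTF-8, in which every non-ASCII code point takes two or more bytes, so the encoded length exceeds the character count exactly when a non-ASCII character is present. The equivalent byte-level test in C is sketched below; `is_ascii` is a hypothetical helper, not part of contrib/gcc-changelog, and it assumes an ASCII-compatible encoding such as UTF-8.

```c
#include <stdbool.h>

/* A UTF-8 string is pure ASCII exactly when every byte is below 0x80:
   all bytes of a multi-byte UTF-8 sequence have the high bit set.  */
static bool
is_ascii (const char *s)
{
  for (; *s != '\0'; ++s)
    if ((unsigned char) *s >= 0x80)
      return false;
  return true;
}
```

For the address in the new test, the "ł" is encoded as the two bytes 0xC5 0x82, so `is_ascii` rejects it just as the Python length comparison does.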



Re: [PATCH] aix: Add FAT library support for libffi

2021-09-15 Thread H.J. Lu via Gcc-patches
On Wed, Sep 15, 2021 at 6:18 AM David Edelsohn via Gcc-patches
 wrote:
>
> Clement,
>
> GCC libffi cherry-picks / backports patches from upstream, but it does
> not maintain local patches, so we need to find another solution.
>

Please take a look at my libffi patches:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578702.html

I provided a way to maintain local patches, similar to libsanitizer.

> Thanks, David
>
> On Wed, Sep 15, 2021 at 9:14 AM CHIGOT, CLEMENT  
> wrote:
> >
> > Hi David,
> >
> > The problem is that it has no meaning in libffi itself...
> > This patch is specific to gcc because the multilib part is specific
> > to gcc.
> > I'll ask the community but the patch cannot be merged inside
> > libffi.
> >
> > Thanks,
> > Clément
> > 
> > From: David Edelsohn 
> > Sent: Wednesday, September 15, 2021 2:52 PM
> > To: CHIGOT, CLEMENT 
> > Cc: gcc-patches@gcc.gnu.org 
> > Subject: Re: [PATCH] aix: Add FAT library support for libffi
> >
> > Clement,
> >
> > GCC is not the primary repository for libffi.  This patch must be
> > submitted to the libffi project first, not GCC.  If it is accepted in
> > libffi, then you can ask for a backport to GCC.
> >
> > https://github.com/libffi/libffi
> >
> > Thanks, David
> >
> > On Wed, Sep 15, 2021 at 7:20 AM CHIGOT, CLEMENT  
> > wrote:
> > >
> > > Even if GCC64 is able to bootstrap without libffi being a
> > > FAT library on AIX, the tests for "-maix32" are not working
> > > without it.
> > >
> > > libffi/ChangeLog:
> > > 2021-09-10  Clément Chigot  
> > >
> > > * Makefile.am (tmake_file): Build and install AIX-style FAT
> > >   libraries.
> > > * Makefile.in: Regenerate.
> > > * include/Makefile.in: Regenerate.
> > > * man/Makefile.in: Regenerate.
> > > * testsuite/Makefile.in: Regenerate.
> > > * configure.ac (tmake_file): Substitute.
> > > * configure: Regenerate.
> > > * configure.host (powerpc-*-aix*): Define tmake_file.
> > > * src/powerpc/t-aix: New file.
> > >
> > >
> > >
> > >
> > > Clément Chigot
> > > ATOS Bull SAS
> > > 1 rue de Provence - 38432 Échirolles - France
> > >



-- 
H.J.


Re: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-09-15 Thread Segher Boessenkool
Hi!

Please do not send patches as attachments to replies.  Each patch (or
patch series) starts its own thread.  New versions of patches (or patch
series) are new threads.

> From: Xionghu Luo 
> Date: Tue, 27 Apr 2021 01:07:25 -0500
> Subject: [PATCH 1/2] rs6000: Fix wrong code generation for vec_sel [PR94613]

>   * config/rs6000/rs6000-call.c (altivec_expand_vec_sel_builtin):
>   New.

That fits on one line.  Changelogs are 80 chars wide.

>   * config/rs6000/rs6000.c (rs6000_emit_vector_cond_expr): Use
>   bit-wise selection instead of per element.

So "bit-wise" fits on the previous line, too.

> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
> +  && rtx_equal_p (operands[2], operands[3])"

The "&&" should align with "VECTOR.."  (Many times in this).

> +(define_insn "altivec_vsel4"
> +  [(set (match_operand:VM 0 "altivec_register_operand" "=v")
> + (ior:VM
> +  (and:VM
> +   (match_operand:VM 1 "altivec_register_operand" "v")
> +   (match_operand:VM 2 "altivec_register_operand" "v"))
> +  (and:VM
> +   (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
> +   (match_operand:VM 4 "altivec_register_operand" "v"]
> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
> +  && rtx_equal_p (operands[2], operands[3])"
> +  "vsel %0,%4,%1,%3"
>[(set_attr "type" "vecmove")])

I still don't see how rtx_equal_p is correct here.  Either it should be
a match_dup or the constraints can do that.

> +(ior:VEC_L
> + (and:VEC_L (not:VEC_L (match_operand:VEC_L 3 "vlogical_operand"))
> +  (match_operand:VEC_L 1 "vlogical_operand"))

We indent RTL by two chars, not one.  An advantage of that is that wrong
indent like in this last line is more obvious (the "(match.." should
align with the "(not").

> + (and:VEC_L (match_dup 3)
> + (match_operand:VEC_L 2 "vlogical_operand"]

The two "(match.." should align here.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr94613.c
> @@ -0,0 +1,38 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vmx_hw } */
> +/* { dg-options "-O3 -maltivec" } */

Why -O3?  Please just -O2 (if that works).

> From 74cf1fd298e4923c106deaba3192423d48049559 Mon Sep 17 00:00:00 2001
> From: Xionghu Luo 
> Date: Fri, 14 May 2021 01:21:06 -0500
> Subject: [PATCH 2/2] rs6000: Fold xxsel to vsel since they have same semantics

Never send two patches in one mail.  Make a series please.

> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -668,59 +668,67 @@ (define_insn "*altivec_gev4sf"
>[(set_attr "type" "veccmp")])
>  
>  (define_insn "altivec_vsel"
> -  [(set (match_operand:VM 0 "altivec_register_operand" "=v")
> +  [(set (match_operand:VM 0 "altivec_register_operand" "=wa,v")
>   (ior:VM
>(and:VM
> -   (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
> -   (match_operand:VM 1 "altivec_register_operand" "v"))
> +   (not:VM (match_operand:VM 3 "altivec_register_operand" "wa,v"))
> +   (match_operand:VM 1 "altivec_register_operand" "wa,v"))
>(and:VM
> -   (match_operand:VM 2 "altivec_register_operand" "v")
> -   (match_operand:VM 4 "altivec_register_operand" "v"]
> +   (match_operand:VM 2 "altivec_register_operand" "wa,v")
> +   (match_operand:VM 4 "altivec_register_operand" "wa,v"]
>"VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
>&& rtx_equal_p (operands[2], operands[3])"
> -  "vsel %0,%1,%4,%3"
> +  "@
> +  xxsel %x0,%x1,%x4,%x3
> +  vsel %0,%1,%4,%3"

The mnemonics should align with the @.

This ordering makes us prefer xxsel over vsel.  Do we want that?  We
probably do, but it is a change I think?

Do we want to add an "isa" attribute?  Most patterns still don't, but we
probably should wherever we can.

"altivec_register_operand" is wrong.  Just "gpc_reg_operand" I think?


Segher


Re: [PATCH 4/4] [PATCH 4/4] x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY

2021-09-15 Thread H.J. Lu via Gcc-patches
There is no need to add [PATCH N/4] in the first line of the git
commit message.  "git format-patch" or "git send-email" will
add them automatically.

On Wed, Sep 15, 2021 at 1:10 AM  wrote:
>
> From: "H.J. Lu" 
>
> 1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters.
> 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP splitters.
> 3.  Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and
> TARGET_SSE_PARTIAL_REG_DEPENDENCY when handling avx_partial_xmm_update
> attribute.  Don't convert AVX partial XMM register update if there is no
> partial SSE register dependency for SSE conversion.
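To make the partial-register dependency concrete, here is a small sketch (not part of the patch) using the SSE intrinsic _mm_cvtsi32_ss from <xmmintrin.h>, which maps to cvtsi2ss.  The helper names are made up for illustration, and the zeroed source register only approximates the vxorps that remove_partial_avx_dependency emits.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Read lane I of V by spilling the vector to memory.  */
static float lane (__m128 v, int i)
{
  float out[4];
  _mm_storeu_ps (out, v);
  return out[i];
}

/* cvtsi2ss writes only the low float of its result and copies the upper
   three floats from OLD unchanged, so the result depends on whatever OLD
   held before: a partial register dependency.  */
static __m128 convert_merging (__m128 old, int x)
{
  return _mm_cvtsi32_ss (old, x);
}

/* Converting into a freshly zeroed register removes that dependency on
   the previous contents (roughly what the inserted vxorps achieves).  */
static __m128 convert_fresh (int x)
{
  return _mm_cvtsi32_ss (_mm_setzero_ps (), x);
}
```

The merging form keeps lanes 1..3 of the old value, while the fresh form yields zeros there; only the latter can start before the previous writer of the register has finished.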
>
> gcc/
>
> * config/i386/i386-features.c (remove_partial_avx_dependency):
> Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and
> TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY before generating
> vxorps.
> * config/i386/i386.h (TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY):
> New.
> (TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
> * config/i386/i386.md (SSE FP to FP splitters): Replace
> TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY.
> (SSE INT to FP splitter): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY
> with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY.
> * config/i386/x86-tune.def
> (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New.
> (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
>
> gcc/testsuite/
>
> * gcc.target/i386/avx-covert-1.c: New file.
> * gcc.target/i386/avx-fp-covert-1.c: Likewise.
> * gcc.target/i386/avx-int-covert-1.c: Likewise.
> * gcc.target/i386/sse-covert-1.c: Likewise.
> * gcc.target/i386/sse-fp-covert-1.c: Likewise.
> * gcc.target/i386/sse-int-covert-1.c: Likewise.
> ---
>  gcc/config/i386/i386-features.c   |  6 --
>  gcc/config/i386/i386.h|  4 
>  gcc/config/i386/i386.md   |  9 ++---
>  gcc/config/i386/x86-tune.def  | 15 +++
>  gcc/testsuite/gcc.target/i386/avx-covert-1.c  | 19 +++
>  .../gcc.target/i386/avx-fp-covert-1.c | 15 +++
>  .../gcc.target/i386/avx-int-covert-1.c| 14 ++
>  gcc/testsuite/gcc.target/i386/sse-covert-1.c  | 19 +++
>  .../gcc.target/i386/sse-fp-covert-1.c | 15 +++
>  .../gcc.target/i386/sse-int-covert-1.c| 14 ++
>  10 files changed, 125 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx-covert-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx-int-covert-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse-covert-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse-int-covert-1.c
>
> diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> index ae5ea02a002..91bfa06d4bf 100644
> --- a/gcc/config/i386/i386-features.c
> +++ b/gcc/config/i386/i386-features.c
> @@ -2218,14 +2218,16 @@ remove_partial_avx_dependency (void)
>   machine_mode dest_mode = GET_MODE (dest);
>   machine_mode src_mode;
>
> - if (TARGET_USE_VECTOR_FP_CONVERTS)
> + if (TARGET_USE_VECTOR_FP_CONVERTS
> + || !TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY)
> {
>   src_mode = GET_MODE (XEXP (src, 0));
>   if (src_mode == E_SFmode || src_mode == E_DFmode)
> continue;
> }
>
> - if (TARGET_USE_VECTOR_CONVERTS)
> + if (TARGET_USE_VECTOR_CONVERTS
> + || !TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY)
> {
>   src_mode = GET_MODE (XEXP (src, 0));
>   if (src_mode == E_SImode || src_mode == E_DImode)
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index e76bb55c080..ec60b89753e 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -334,6 +334,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
> ix86_tune_features[X86_TUNE_PARTIAL_REG_DEPENDENCY]
>  #define TARGET_SSE_PARTIAL_REG_DEPENDENCY \
> ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY]
> +#define TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY \
> +   ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY]
> +#define TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY \
> +   ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY]
>  #define TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
> ix86_tune_features[X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL]
>  #define TARGET_SSE_UNALIGNED_STORE_OPTIMAL \
> diff --git 

[PATCH][RFC] tree-optimization/65206 - dependence analysis on mixed pointer/array

2021-09-15 Thread Richard Biener via Gcc-patches
This adds the capability to analyze the dependence of mixed
pointer/array accesses.  The example is one where using a masked
load/store creates a pointer-based access while the otherwise
unconditional access is array-based.  Other examples would include
accesses to an array mixed with accesses from inlined helpers
that work on pointers.

The idea is quite simple and old - analyze the data-ref indices
as if the reference was pointer-based.  The following change does
this by changing dr_analyze_indices to work on the indices
sub-structure and storing an alternate indices substructure in
each data reference.  That alternate set of indices is analyzed
lazily by initialize_data_dependence_relation when it fails to
match up the main set of indices of two data references.
initialize_data_dependence_relation is refactored into a head
and a tail worker and changed to work on one of the indices
structures and thus away from using DR_* access macros which
continue to reference the main indices substructure.
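A toy model of the idea, assuming a drastically simplified representation (a base address plus a per-iteration byte step) that is not GCC's actual data_reference layout; struct ptr_ref and its helpers are hypothetical.  An array access a[i] and a pointer access through p = &a[0] + i have different base objects, but once both are viewed as pointer-based they share a base address and step, so the dependence distance becomes computable.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024
double a[N];

/* Hypothetical, simplified data reference: a base address and the number
   of bytes the address advances per loop iteration.  */
struct ptr_ref
{
  const char *base;
  ptrdiff_t step;
};

/* The array-based access a[i], re-analyzed as if it were pointer-based:
   base &a[0], advancing sizeof (double) bytes per iteration.  */
static struct ptr_ref array_ref_as_ptr (void)
{
  struct ptr_ref r = { (const char *) &a[0], (ptrdiff_t) sizeof (double) };
  return r;
}

/* A pointer-based access *(p + i), e.g. what a masked store produces.  */
static struct ptr_ref ptr_ref_of (const double *p)
{
  struct ptr_ref r = { (const char *) p, (ptrdiff_t) sizeof (double) };
  return r;
}

/* Dependence distance of X relative to Y, in iterations.  Return 1 and
   store it in *DIST when it is known, 0 otherwise.  Both bases must point
   into the same object for the pointer subtraction to be valid.  */
static int dep_distance (struct ptr_ref x, struct ptr_ref y, long *dist)
{
  if (x.step != y.step || x.step == 0)
    return 0;
  ptrdiff_t diff = x.base - y.base;
  if (diff % x.step != 0)
    return 0;
  *dist = (long) (diff / x.step);
  return 1;
}
```

In the pr65206 loop, the load of a[i] and the masked store to a[i] would both map to base &a[0] with step 8, giving distance zero, which is what lets the vectorizer avoid versioning for alias.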

There's currently a --param in the patch, which I intend to remove
in the end, to control whether the alternate indices path is used.

The patch currently FAILs gcc.dg/vect/vect-cselim-1.c on x86_64 because
I enabled vect_masked_store but the testcase looks for 'short'
sized element masking which isn't available.  I have to figure
how to deal with this, possibly by not enabling vect_masked_store
but instead making the testcase target specific somehow.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I'm currently gathering some statistics on SPEC CPU 2017 on
x86_64 with AVX2 to assess the effect on loop transforms
by means of -fopt-info-loop (but that takes some time due
to the serial compile).

Any comments on the approach and/or different or better ideas?

Thanks,
Richard.

2021-09-08  Richard Biener  

PR tree-optimization/65206
* tree-data-ref.h (struct data_reference): Add alt_indices,
order it last.
* tree-data-ref.c (dr_analyze_indices): Work on
struct indices and get DR_REF as tree.
(create_data_ref): Adjust.
(initialize_data_dependence_relation): Split into head
and tail.  When the base objects fail to match up try
again with pointer-based analysis of indices.
* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
not compare the lazily computed alternate set of indices.

* gcc.dg/vect/pr65206.c: New testcase.
* lib/target-supports.exp: Amend
check_effective_target_vect_masked_store.
---
 gcc/params.opt|   4 +
 gcc/testsuite/gcc.dg/vect/pr65206.c   |  23 
 gcc/testsuite/lib/target-supports.exp |   3 +-
 gcc/tree-data-ref.c   | 163 +-
 gcc/tree-data-ref.h   |   9 +-
 gcc/tree-vectorizer.c |   3 +-
 6 files changed, 146 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65206.c

diff --git a/gcc/params.opt b/gcc/params.opt
index 658ca028851..74a76954a0a 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1137,4 +1137,8 @@ Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only
 Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 1) Param Optimization
 The maximum factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
 
+-param=data-dep-alt-indices=
+Common Joined UInteger Var(param_data_dep_alt_indices) Init(1) IntegerRange(0, 1) Param Optimization
+Whether to allow the use of alternate indices in dependence analysis.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
new file mode 100644
index 000..17405297172
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target vect_masked_store } */
+/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+
+#define N 1024
+
+double a[N], b[N];
+
+void foo ()
+{
+  for (int i = 0; i < N; ++i)
+if (b[i] < 3.)
+  a[i] += b[i];
+}
+
+/* We get a .MASK_STORE because while the load of a[i] does not trap
+   the store would introduce store data races.  Make sure we still
+   can handle the data dependence with zero distance.  */
+
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8697ceb53c9..74f47de6832 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7656,7 +7656,8 @@ proc check_effective_target_vect_masked_load { } {
 
 proc 

Re: [PATCH][v2] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-15 Thread Richard Biener via Gcc-patches
On Wed, 15 Sep 2021, John David Anglin wrote:

> On 2021-09-15 2:26 a.m., Richard Biener wrote:
> >> I believe the 32-bit SOM target should be deprecated.  I'm the only one 
> >> maintaining it and I had some health issues earlier this year.
> >> The current versions should suffice for several years.
> > Do you think it's worth keeping the 32bit pa hpux targets for another
> > release but guarded with --enable-obsolete or can we remove those
> > configurations right away?
> I would choose --enable-obsolete.  Currently, things more or less work except 
> for modules.

OK, I see.

> >
> > In the current setting configurations that do not support DWARF will
> > get no debug info with -g (with a warning that this happens) and
> > STABS debug info with -gstabs (with a warning about its deprecation).
> > That might not be the final outcome for GCC 12 but it's the minimal
> > change I'm working towards.
> Is there a simple way to enable -gstabs in build?

Currently not.  If we're retaining more than pdp11 with a non-DWARF
config, I'm considering allowing STABS by default for those, without
diagnostics, for GCC 12.

With GCC 13 we'll definitely either remove the configurations or
leave the target without any support for debug info.

Richard.


[PATCH] target/102348 - fix powerpc-lynxos build

2021-09-15 Thread Richard Biener via Gcc-patches
This fixes a similar issue for powerpc-lynxos as fixed for i686-lynxos
already.

Build-tested for powerpc-lynxos.

2021-09-15  Richard Biener  

PR target/102348
* config/rs6000/lynx.h: Remove undef of PREFERRED_DEBUGGING_TYPE
to inherit from elfos.h.
---
 gcc/config/rs6000/lynx.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/rs6000/lynx.h b/gcc/config/rs6000/lynx.h
index 3434c8b3989..0ddb54f213e 100644
--- a/gcc/config/rs6000/lynx.h
+++ b/gcc/config/rs6000/lynx.h
@@ -80,7 +80,6 @@
 
 #undef SIZE_TYPE
 #undef ASM_OUTPUT_ALIGN
-#undef PREFERRED_DEBUGGING_TYPE
 
 /* The file rs6000.c defines TARGET_HAVE_TLS unconditionally to the
value of HAVE_AS_TLS.  HAVE_AS_TLS is true as gas support for TLS
-- 
2.31.1


Re: [PATCH][v2] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-15 Thread John David Anglin
On 2021-09-15 2:26 a.m., Richard Biener wrote:
>> I believe the 32-bit SOM target should be deprecated.  I'm the only one 
>> maintaining it and I had some health issues earlier this year.
>> The current versions should suffice for several years.
> Do you think it's worth keeping the 32bit pa hpux targets for another
> release but guarded with --enable-obsolete or can we remove those
> configurations right away?
I would choose --enable-obsolete.  Currently, things more or less work except 
for modules.
>
> In the current setting configurations that do not support DWARF will
> get no debug info with -g (with a warning that this happens) and
> STABS debug info with -gstabs (with a warning about its deprecation).
> That might not be the final outcome for GCC 12 but it's the minimal
> change I'm working towards.
Is there a simple way to enable -gstabs in build?

Dave

-- 
John David Anglin  dave.ang...@bell.net



Re: [PATCH] aix: Add FAT library support for libffi

2021-09-15 Thread David Edelsohn via Gcc-patches
Clement,

GCC libffi cherry-picks / backports patches from upstream, but it does
not maintain local patches, so we need to find another solution.

Thanks, David

On Wed, Sep 15, 2021 at 9:14 AM CHIGOT, CLEMENT  wrote:
>
> Hi David,
>
> The problem is that it has no meaning in libffi itself...
> This patch is specific to gcc because the multilib part is specific
> to gcc.
> I'll ask the community but the patch cannot be merged inside
> libffi.
>
> Thanks,
> Clément
> 
> From: David Edelsohn 
> Sent: Wednesday, September 15, 2021 2:52 PM
> To: CHIGOT, CLEMENT 
> Cc: gcc-patches@gcc.gnu.org 
> Subject: Re: [PATCH] aix: Add FAT library support for libffi
>
> Clement,
>
> GCC is not the primary repository for libffi.  This patch must be
> submitted to the libffi project first, not GCC.  If it is accepted in
> libffi, then you can ask for a backport to GCC.
>
> https://github.com/libffi/libffi
>
> Thanks, David
>
> On Wed, Sep 15, 2021 at 7:20 AM CHIGOT, CLEMENT  
> wrote:
> >
> > Even if GCC64 is able to bootstrap without libffi being a
> > FAT library on AIX, the tests for "-maix32" are not working
> > without it.
> >
> > libffi/ChangeLog:
> > 2021-09-10  Clément Chigot  
> >
> > * Makefile.am (tmake_file): Build and install AIX-style FAT
> >   libraries.
> > * Makefile.in: Regenerate.
> > * include/Makefile.in: Regenerate.
> > * man/Makefile.in: Regenerate.
> > * testsuite/Makefile.in: Regenerate.
> > * configure (tmake_file): Substitute.
> > * configure.ac: Regenerate.
> > * configure.host (powerpc-*-aix*): Define tmake_file.
> > * src/powerpc/t-aix: New file.
> >
> >
> >
> >
> > Clément Chigot
> > ATOS Bull SAS
> > 1 rue de Provence - 38432 Échirolles - France
> >


Re: PING^1 [PATCH] rs6000: Remove useless toc-fusion option

2021-09-15 Thread David Edelsohn via Gcc-patches
On Wed, Sep 15, 2021 at 4:41 AM Kewen.Lin  wrote:
>
> Hi,
>
> Gentle ping this patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578553.html
>
>
> BR,
> Kewen
>
> on 2021/9/1 下午2:56, Kewen.Lin via Gcc-patches wrote:
> > Hi!
> >
> > Option toc-fusion was originally intended for Power9 TOC fusion,
> > but Power9 ended up not supporting fusion at all, so this patch
> > removes this useless option.
> >
> > Is it ok for trunk?
> >
> > BR,
> > Kewen
> > -
> > gcc/ChangeLog:
> >
> >   * config/rs6000/rs6000.opt (-mtoc-fusion): Remove.
> >

Okay.

Thanks, David


Re: [PATCH] aix: Add FAT library support for libffi

2021-09-15 Thread CHIGOT, CLEMENT via Gcc-patches
Hi David,

The problem is that it has no meaning in libffi itself...
This patch is specific to gcc because the multilib part is specific
to gcc.
I'll ask the community but the patch cannot be merged inside
libffi.

Thanks,
Clément

From: David Edelsohn 
Sent: Wednesday, September 15, 2021 2:52 PM
To: CHIGOT, CLEMENT 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH] aix: Add FAT library support for libffi

Clement,

GCC is not the primary repository for libffi.  This patch must be
submitted to the libffi project first, not GCC.  If it is accepted in
libffi, then you can ask for a backport to GCC.

https://github.com/libffi/libffi

Thanks, David

On Wed, Sep 15, 2021 at 7:20 AM CHIGOT, CLEMENT  wrote:
>
> Even if GCC64 is able to bootstrap without libffi being a
> FAT library on AIX, the tests for "-maix32" are not working
> without it.
>
> libffi/ChangeLog:
> 2021-09-10  Clément Chigot  
>
> * Makefile.am (tmake_file): Build and install AIX-style FAT
>   libraries.
> * Makefile.in: Regenerate.
> * include/Makefile.in: Regenerate.
> * man/Makefile.in: Regenerate.
> * testsuite/Makefile.in: Regenerate.
> * configure (tmake_file): Substitute.
> * configure.ac: Regenerate.
> * configure.host (powerpc-*-aix*): Define tmake_file.
> * src/powerpc/t-aix: New file.
>
>
>
>
> Clément Chigot
> ATOS Bull SAS
> 1 rue de Provence - 38432 Échirolles - France
>


Re: Ping ^ 3: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-09-15 Thread David Edelsohn via Gcc-patches
Hi, Xionghu

Should "altivec_vsel2" .. 3 .. 4 be "*altivec_vsel2", etc.
because they are combiner patterns and never referenced by name?  Only
the first, named pattern is referenced by the builtin code.

Other than that question / suggestion, this patch is okay.  Please
coordinate with Bill and his builtin patches.

Thanks, David

On Wed, Sep 15, 2021 at 3:50 AM Xionghu Luo  wrote:
>
> Ping^3, thanks.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html
>
>
> On 2021/9/6 08:52, Xionghu Luo via Gcc-patches wrote:
> > Ping^2, thanks.
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html
> >
> > On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote:
> >> Gentle ping, thanks.
> >>
> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html
> >>
> >>
> >> On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:
> >>> Hi,
> >>>
> >>> On 2021/5/13 18:49, Segher Boessenkool wrote:
>  Hi!
> 
>  On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:
> > The vsel instruction is a bit-wise select instruction.  Using an
> > IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
> > being generated in the combine pass.  Per-element selection is a
> > subset of bit-wise selection; with this patch the pattern is
> > written using bit operations.  But there are 8 different patterns
> > to define "op0 := (op1 & ~op3) | (op2 & op3)":
> >
> > (~op3&op1) | (op3&op2),
> > (~op3&op1) | (op2&op3),
> > (op3&op2) | (~op3&op1),
> > (op2&op3) | (~op3&op1),
> > (op1&~op3) | (op3&op2),
> > (op1&~op3) | (op2&op3),
> > (op3&op2) | (op1&~op3),
> > (op2&op3) | (op1&~op3),
> >
> > The combine pass will swap (op1&~op3) to (~op3&op1) due to commutative
> > canonicalization, which reduces them to the FIRST 4 patterns, but it
> > won't swap (op2&op3) | (~op3&op1) to (~op3&op1) | (op2&op3), so this
> > patch handles it with two patterns with different NOT op3 positions
> > and checks equality inside them.
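As a quick sanity check of the semantics being modelled (a sketch in plain C on 32-bit words, not AltiVec code; the function names are made up): a bit-wise select takes each bit from op2 where the mask op3 is 1 and from op1 where it is 0, while a per-element select takes whole lanes.  The two agree only when every mask lane is all-zeros or all-ones, which is why per-element selection is a subset of bit-wise selection.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-wise select, op0 := (op1 & ~op3) | (op2 & op3): every bit comes
   from op2 where the mask op3 has a 1 and from op1 where it has a 0.  */
static uint32_t bitsel (uint32_t op1, uint32_t op2, uint32_t op3)
{
  return (op1 & ~op3) | (op2 & op3);
}

/* Per-element select on 8-bit lanes: a lane with any mask bit set is
   taken whole from op2, otherwise whole from op1.  */
static uint32_t lanesel8 (uint32_t op1, uint32_t op2, uint32_t op3)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i += 8)
    {
      uint32_t lane = (uint32_t) 0xffu << i;
      r |= ((op3 & lane) ? op2 : op1) & lane;
    }
  return r;
}
```

With a compare-style mask such as 0xff00ff00 both give 0x22112211 for op1 = 0x11111111 and op2 = 0x22222222; with a mask like 0x0f0f0f0f the bit-wise result is 0x12121212 but the per-element one is 0x22222222, so rewriting one as the other is not valid in general.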
> 
>  Yup, that latter case does not have canonicalisation rules.  Btw, not
>  only combine does this canonicalisation: everything should,
>  non-canonical RTL is invalid RTL (in the instruction stream, you can do
>  everything in temporary code of course, as long as the RTL isn't
>  malformed).
> 
> > -(define_insn "*altivec_vsel"
> > +(define_insn "altivec_vsel"
> > [(set (match_operand:VM 0 "altivec_register_operand" "=v")
> > -(if_then_else:VM
> > - (ne:CC (match_operand:VM 1 "altivec_register_operand" "v")
> > -(match_operand:VM 4 "zero_constant" ""))
> > - (match_operand:VM 2 "altivec_register_operand" "v")
> > - (match_operand:VM 3 "altivec_register_operand" "v")))]
> > -  "VECTOR_MEM_ALTIVEC_P (mode)"
> > -  "vsel %0,%3,%2,%1"
> > +(ior:VM
> > + (and:VM
> > +  (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
> > +  (match_operand:VM 1 "altivec_register_operand" "v"))
> > + (and:VM
> > +  (match_operand:VM 2 "altivec_register_operand" "v")
> > +  (match_operand:VM 4 "altivec_register_operand" "v"]
> > +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
> > +  && (rtx_equal_p (operands[2], operands[3])
> > +  || rtx_equal_p (operands[4], operands[3]))"
> > +  {
> > +if (rtx_equal_p (operands[2], operands[3]))
> > +  return "vsel %0,%1,%4,%3";
> > +else
> > +  return "vsel %0,%1,%2,%3";
> > +  }
> > [(set_attr "type" "vecmove")])
> 
>  That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I
>  think.  So please write this as two patterns (and keep the expand if
>  that helps).
> >>>
> >>> I was a bit concerned that there would be a lot of duplicate code if we
> >>> write two patterns for each vsel: in total 4 similar patterns in
> >>> altivec.md and another 4 in vsx.md, which makes them difficult to
> >>> maintain.  However, I updated it since you prefer this way, and as you
> >>> pointed out, the xxsel in vsx.md could be folded by a later patch.
> >>>
> 
> > +(define_insn "altivec_vsel2"
> 
>  (same here of course).
> 
> >   ;; Fused multiply add.
> > diff --git a/gcc/config/rs6000/rs6000-call.c
> > b/gcc/config/rs6000/rs6000-call.c
> > index f5676255387..d65bdc01055 100644
> > --- a/gcc/config/rs6000/rs6000-call.c
> > +++ b/gcc/config/rs6000/rs6000-call.c
> > @@ -3362,11 +3362,11 @@ const struct altivec_builtin_types
> > altivec_overloaded_builtins[] = {
> >   RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
> > RS6000_BTI_unsigned_V2DI },
> > { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
> >   RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
> > RS6000_BTI_V2DI },
> > -  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
> > +  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS,
> 
>  Are the _uns things still used for 

Re: [PATCH v2 1/2] MIPS: use mips_isa enum instead hardcoded numbers

2021-09-15 Thread Martin Liška

Hello.

I noticed the change likely caused the following failure when building a
cross compiler on x86_64-linux-gnu:

g++  -fno-PIE -c  -DIN_GCC_FRONTEND -DIN_GCC_FRONTEND -g   -DIN_GCC  
-DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -Ic-family 
-I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/c-family 
-I/home/marxin/Programming/gcc/gcc/../include 
-I/home/marxin/Programming/gcc/gcc/../libcpp/include 
-I/home/marxin/Programming/gcc/gcc/../libcody  
-I/home/marxin/Programming/gcc/gcc/../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o c-family/c-cppbuiltin.o 
-MT c-family/c-cppbuiltin.o -MMD -MP -MF c-family/.deps/c-cppbuiltin.TPo 
/home/marxin/Programming/gcc/gcc/c-family/c-cppbuiltin.c

In file included from ./tm.h:26,
                 from /home/marxin/Programming/gcc/gcc/target.h:52,
                 from /home/marxin/Programming/gcc/gcc/c-family/c-cppbuiltin.c:23:
/home/marxin/Programming/gcc/gcc/c-family/c-cppbuiltin.c: In function ‘void c_cpp_builtins(cpp_reader*)’:
/home/marxin/Programming/gcc/gcc/config/mips/netbsd.h:90:28: error: ‘MIPS_ISA_64’ was not declared in this scope; did you mean ‘MIPS_ISA_MIPS64’?
   90 |   else if (mips_isa >= MIPS_ISA_64) \
      |^~~
/home/marxin/Programming/gcc/gcc/c-family/c-cppbuiltin.c:1551:3: note: in expansion of macro ‘TARGET_CPU_CPP_BUILTINS’
 1551 |   TARGET_CPU_CPP_BUILTINS ();
      |   ^~~


It's configured with:
--host=x86_64-pc-linux-gnu --target=mips-netbsd

Thanks,
Martin


Re: [PATCH] aix: Add FAT library support for libffi

2021-09-15 Thread David Edelsohn via Gcc-patches
Clement,

GCC is not the primary repository for libffi.  This patch must be
submitted to the libffi project first, not GCC.  If it is accepted in
libffi, then you can ask for a backport to GCC.

https://github.com/libffi/libffi

Thanks, David

On Wed, Sep 15, 2021 at 7:20 AM CHIGOT, CLEMENT  wrote:
>
> Even if GCC64 is able to bootstrap without libffi being a
> FAT library on AIX, the tests for "-maix32" are not working
> without it.
>
> libffi/ChangeLog:
> 2021-09-10  Clément Chigot  
>
> * Makefile.am (tmake_file): Build and install AIX-style FAT
>   libraries.
> * Makefile.in: Regenerate.
> * include/Makefile.in: Regenerate.
> * man/Makefile.in: Regenerate.
> * testsuite/Makefile.in: Regenerate.
> * configure (tmake_file): Substitute.
> * configure.ac: Regenerate.
> * configure.host (powerpc-*-aix*): Define tmake_file.
> * src/powerpc/t-aix: New file.
>
>
>
>
> Clément Chigot
> ATOS Bull SAS
> 1 rue de Provence - 38432 Échirolles - France
>


Re: [PATCH v2] ipa-inline: Add target info into fn summary [PR102059]

2021-09-15 Thread Martin Jambor
Hi,

since this is inlining-related, I would somewhat prefer Honza to have a
look too, but I have the following comments:

On Wed, Sep 08 2021, Kewen.Lin wrote:
>

[...]

> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
> index 78399b0b9bb..300b8da4507 100644
> --- a/gcc/ipa-fnsummary.h
> +++ b/gcc/ipa-fnsummary.h
> @@ -193,6 +194,9 @@ public:
>vec *loop_strides;
>/* Parameters tested by builtin_constant_p.  */
>vec GTY((skip)) builtin_constant_p_parms;
> +  /* Like fp_expressions, but it's to hold some target specific information,
> + such as some target specific isa flags.  */
> +  auto_vec GTY((skip)) target_info;
>/* Estimated growth for inlining all copies of the function before start
>   of small functions inlining.
>   This value will get out of date as the callers are duplicated, but

Segher already wrote in the first thread that a vector of HOST_WIDE_INTs
is overkill, and I agree.  So at least make the new field just a
HOST_WIDE_INT or better yet, an unsigned int.  But I would even go
further and make target_info only a 16-bit bit-field, place it after the
other bit-fields in class ipa_fn_summary and pass it to the hooks as
uint16_t.  Unless you have plans which require more space, I think we
should be conservative here.
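A rough sketch of the suggested layout, assuming a heavily reduced stand-in for ipa_fn_summary (only two of the existing one-bit flags are shown) and a hypothetical hook that takes the flags as uint16_t; none of these names are the real GCC declarations:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for class ipa_fn_summary.  The
   16-bit field follows the existing one-bit flags, so on typical ABIs it
   can share their storage unit instead of needing a separate vector.  */
struct fn_summary_sketch
{
  unsigned inlinable : 1;
  unsigned fp_expressions : 1;
  unsigned target_info : 16;
};

/* Hypothetical target hook taking the flags by value.  */
static int need_target_info (uint16_t info)
{
  return info != 0;
}

/* Record target-specific flag bit BIT (0..15) in the summary.  */
static void set_target_bit (struct fn_summary_sketch *s, unsigned bit)
{
  s->target_info |= 1u << bit;
}
```

Passing the field by value as uint16_t keeps the hook interface independent of how the summary stores it.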

I am also not sure if I agree that the field should not be streamed for
offloading, but since we do not have an offloading compiler needing them
I guess for now that is OK. But it should be documented in the comment
describing the field that it is not streamed to offloading compilers.

[...]


> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> index 2470937460f..72091b6193f 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -2608,6 +2617,7 @@ analyze_function_body (struct cgraph_node *node, bool early)
>info->conds = NULL;
>info->size_time_table.release ();
>info->call_size_time_table.release ();
> +  info->target_info.release();
>  
>/* When optimizing and analyzing for IPA inliner, initialize loop optimizer
>   so we can produce proper inline hints.
> @@ -2659,6 +2669,12 @@ analyze_function_body (struct cgraph_node *node, bool early)
>  bb_predicate,
>  bb_predicate);
>  
> +  /* Only look for target information for inlinable functions.  */
> +  bool scan_for_target_info =
> +info->inlinable
> +&& targetm.target_option.need_ipa_fn_target_info (node->decl,
> +   info->target_info);
> +
>if (fbi.info)
>  compute_bb_predicates (, node, info, params_summary);
>const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> @@ -2876,6 +2892,10 @@ analyze_function_body (struct cgraph_node *node, bool early)
> if (dump_file)
>   fprintf (dump_file, "   fp_expression set\n");
>   }
> +   if (scan_for_target_info)
> + scan_for_target_info =
> +   targetm.target_option.update_ipa_fn_target_info
> +   (info->target_info, stmt);
>   }

Practically it probably does not matter, but why is this in the "if
(this_time || this_size)" block?  I can see that fp_expression is also
set that way, but that seems like copying a mistake to me.

All that said, the overall approach seems correct to me.

Martin



Re: [PATCH] testsuite: Make sure double-precision is supported in g++.dg/eh/arm-vfp-unwind.C

2021-09-15 Thread Richard Earnshaw via Gcc-patches




On 15/09/2021 13:26, Christophe LYON via Gcc-patches wrote:
>
> On 15/09/2021 13:02, Richard Earnshaw wrote:
>>
>> On 26/08/2021 16:53, Christophe Lyon via Gcc-patches wrote:
>>> g++.dg/eh/arm-vfp-unwind.C uses an asm statement relying on
>>> double-precision FPU support, but does not make sure it is actually
>>> supported by the target.
>>> Check (__ARM_FP & 8) to ensure this.
>>>
>>> 2021-08-26  Christophe Lyon
>>>
>>> gcc/testsuite/
>>> * g++.dg/eh/arm-vfp-unwind.C: Check __ARM_FP.
>>> ---
>>>  gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
>>> index 62263c0c3b0..90d20081d78 100644
>>> --- a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
>>> +++ b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
>>> @@ -3,7 +3,7 @@
>>>  
>>>  /* Test to catch off-by-one errors in arm/pr-support.c.  */
>>>  
>>> -#if defined (__VFP_FP__) && !defined (__SOFTFP__)
>>> +#if defined (__VFP_FP__) && !defined (__SOFTFP__) && (__ARM_FP & 8)
>>>  
>>>  #include 
>>>  #include 
>>
>> Wouldn't it be better to have an alternate to the asm for the case
>> where we only have single-precision float?  Something like (untested):
>>
>> static void donkey ()
>> {
>> #if __ARM_FP & 8
>>   asm volatile ("fcpyd d9, %P0" : : "w" (1.2345) : "d9");
>> #else
>>   asm volatile ("fcpys s18, %P0" : : "w" (1.2345f) : "s18");
>> #endif
>>   throw 1;
>> }
>
> I tried similar things but they failed on some testing configurations.
>
> Let me try your version, I'll let you know if there is any fallout.

Of course, the asm syntax should be converted to the new 'unified
syntax' form, i.e. vmov.f{32,64}.

R.

> Christophe
>
>> R.


Re: [PATCH RFC] c++: implement C++17 hardware interference size

2021-09-15 Thread Martin Liška

On 9/14/21 09:56, Christophe LYON via Gcc-patches wrote:

So adjustment is needed for both arm and aarch64 targets


Hello.

I noticed the same problem and I've got a patch candidate for it.

What do you think about it?
Martin

From 2ecedfe5cc421dd36f5770b04553343ecebd3430 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 15 Sep 2021 14:29:52 +0200
Subject: [PATCH] Fix -Werror=interference-size on ARM and Aarch64.

gcc/ChangeLog:

	* config/aarch64/aarch64.c (aarch64_override_options_internal):
	Use default L1 cache line size from params.
	* config/arm/arm.c (arm_option_override): Likewise.
---
 gcc/config/aarch64/aarch64.c | 2 +-
 gcc/config/arm/arm.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 36519ccc5a5..4eb50bdb7d5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16559,7 +16559,7 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			   256);
   SET_OPTION_IF_UNSET (opts, _options_set,
 			   param_construct_interfere_size,
-			   64);
+			   param_l1_cache_line_size);
 }
 
   if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6c6e77fab66..5d14e1e5e9c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3688,7 +3688,8 @@ arm_option_override (void)
   SET_OPTION_IF_UNSET (_options, _options_set,
 			   param_destruct_interfere_size, 64);
   SET_OPTION_IF_UNSET (_options, _options_set,
-			   param_construct_interfere_size, 64);
+			   param_construct_interfere_size,
+			   param_l1_cache_line_size);
 }
 
   if (current_tune->prefetch.l1_cache_size >= 0)
-- 
2.33.0
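The failure Christophe reported can be reproduced with a small model of the option handling. This is a hypothetical sketch: `params_sketch` stands in for the `--param` storage, `SET_IF_UNSET` mimics `SET_OPTION_IF_UNSET`, and `would_warn` models only the constructive-size versus L1-line-size check from the error message. With a 32-byte L1 line, the old fixed default of 64 trips the check. Taking the default from the L1 line size, as the patch above does, avoids it while still honouring an explicit user setting.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant --param values; 0 means "not set
   by the user".  */
struct params_sketch
{
  int l1_cache_line_size;
  int destructive_interfere_size;
  int construct_interfere_size;
};

/* Mimics SET_OPTION_IF_UNSET: only fill in a value the user left unset.  */
#define SET_IF_UNSET(field, value) \
  do { if ((field) == 0) (field) = (value); } while (0)

/* Target override before the fix: constructive size hard-coded to 64.  */
static void
override_old (struct params_sketch *p)
{
  SET_IF_UNSET (p->destructive_interfere_size, 64);
  SET_IF_UNSET (p->construct_interfere_size, 64);
}

/* Target override after the fix: constructive size follows the L1 line.  */
static void
override_new (struct params_sketch *p)
{
  SET_IF_UNSET (p->destructive_interfere_size, 64);
  SET_IF_UNSET (p->construct_interfere_size, p->l1_cache_line_size);
}

/* Models the check behind "'--param constructive-interference-size=N' is
   greater than '--param l1-cache-line-size=M'".  */
static bool
would_warn (const struct params_sketch *p)
{
  return p->construct_interfere_size > p->l1_cache_line_size;
}
```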



Re: [PATCH] testsuite: Make sure double-precision is supported in g++.dg/eh/arm-vfp-unwind.C

2021-09-15 Thread Christophe LYON via Gcc-patches



On 15/09/2021 13:02, Richard Earnshaw wrote:
>
> On 26/08/2021 16:53, Christophe Lyon via Gcc-patches wrote:
>> g++.dg/eh/arm-vfp-unwind.C uses an asm statement relying on
>> double-precision FPU support, but does not make sure it is actually
>> supported by the target.
>> Check (__ARM_FP & 8) to ensure this.
>>
>> 2021-08-26  Christophe Lyon
>>
>> gcc/testsuite/
>> * g++.dg/eh/arm-vfp-unwind.C: Check __ARM_FP.
>> ---
>>  gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
>> index 62263c0c3b0..90d20081d78 100644
>> --- a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
>> +++ b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
>> @@ -3,7 +3,7 @@
>>  
>>  /* Test to catch off-by-one errors in arm/pr-support.c.  */
>>  
>> -#if defined (__VFP_FP__) && !defined (__SOFTFP__)
>> +#if defined (__VFP_FP__) && !defined (__SOFTFP__) && (__ARM_FP & 8)
>>  
>>  #include 
>>  #include 
>
> Wouldn't it be better to have an alternate to the asm for the case
> where we only have single-precision float?  Something like (untested):
>
> static void donkey ()
> {
> #if __ARM_FP & 8
>   asm volatile ("fcpyd d9, %P0" : : "w" (1.2345) : "d9");
> #else
>   asm volatile ("fcpys s18, %P0" : : "w" (1.2345f) : "s18");
> #endif
>   throw 1;
> }

I tried similar things but they failed on some testing configurations.

Let me try your version, I'll let you know if there is any fallout.

Christophe

> R.


Re: [PATCH RFC] c++: implement C++17 hardware interference size

2021-09-15 Thread Christophe Lyon via Gcc-patches
On Wed, Sep 15, 2021 at 12:25 PM Richard Earnshaw via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
>
> On 14/09/2021 08:56, Christophe LYON via Gcc-patches wrote:
> >
> > On 10/09/2021 15:16, Jason Merrill via Gcc-patches wrote:
> >> OK, time to finish this up.  The main change relative to the last patch
> >> I sent to the list is dropping the -finterference-tune flag and making
> >> that behavior the default.  Any more comments?
> >>
> >> 
> >>
> >> The last missing piece of the C++17 standard library is the hardware
> >> interference size constants.  Much of the delay in implementing these
> >> has been due to uncertainty about what the right values are, and even
> >> whether there is a single constant value that is suitable; the
> >> destructive interference size is intended to be used in structure
> >> layout, so program ABIs will depend on it.
> >>
> >> In principle, both of these values should be the same as the target's L1
> >> cache line size.  When compiling for a generic target that is intended
> >> to support a range of target CPUs with different cache line sizes, the
> >> constructive size should probably be the minimum size, and the
> >> destructive size the maximum, unless you are constrained by ABI
> >> compatibility with previous code.
> >>
> >> From discussion on gcc-patches, I've come to the conclusion that the
> >> solution to the difficulty of choosing stable values is to give up on
> >> it, and instead encourage only uses where ABI stability is unimportant:
> >> in particular, uses where the ABI is shared at most between translation
> >> units built at the same time with the same flags.
> >>
> >> To that end, I've added a warning for any use of the constant value of
> >> std::hardware_destructive_interference_size in a header or module
> >> export.  Appropriate uses within a project can disable the warning.
> >>
> >> A previous iteration of this patch included an -finterference-tune
> >> flag to make the value vary with -mtune; this iteration makes that the
> >> default behavior, which should be appropriate for all reasonable uses
> >> of the variable.  The previous default of "stable-ish" seems to me
> >> likely to have been more of an attractive nuisance; since we can't
> >> promise actual stability, we should instead make proper uses more
> >> convenient.
> >>
> >> JF Bastien's implementation proposal is summarized at
> >> https://github.com/itanium-cxx-abi/cxx-abi/issues/74
> >>
> >> I implement this by adding new --params for the two sizes.  Targets can
> >> override these values in targetm.target_option.override() to support a
> >> range of values for the generic target; otherwise, both will default to
> >> the L1 cache line size.
> >>
> >> 64 bytes still seems correct for all x86.
> >>
> >> I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the
> >> Cortex A9 has a 32-byte cache line, so I'd think 32/64 would make more
> >> sense.
> >
> > While this works for an arm-linux-gnueabihf toolchain configured
> > --with-mode=arm, it fails --with-mode=thumb (also using
> > --with-cpu=cortex-a9):
> >
> > : error: '--param constructive-interference-size=64' is
> > greater than '--param l1-cache-line-size=32' [-Werror=interference-size]
> > cc1plus: all warnings being treated as errors
> > make[4]: *** [Makefile:678: alloc_c.lo] Error 1
> >
> >
> >
> >>
> >> He proposed 64/128 for generic AArch64, but since the A64FX now has a
> >> 256B cache line, I've changed that to 64/256.
> >
> >
> > Similarly, for aarch64 I'm seeing:
> >
> > : error: '--param constructive-interference-size=64' is
> > greater than '--param l1-cache-line-size=32' [-Werror=interference-size]
> > cc1plus: all warnings being treated as errors
> > make[4]: *** [Makefile:678: alloc_c.lo] Error 1
> >
> >
> > So adjustment is needed for both arm and aarch64 targets
> >
> >
>
> FWIW, I'm still in discussion with our architects about the best values
> to use here, but certainly this needs fixing quickly as it seems to be
> breaking hundreds of tests in the C++ testsuite.
>

Indeed. A lot of messages to gcc-testresults are blocked because they are
too large.

Christophe


>
> R.
>
> > Christophe
> >
> >
> >>
> >> With the above choice to reject stability as a goal, getting these
> >> values "right" is now just a matter of what we want the default
> >> optimization to be, and we can feel free to adjust them as CPUs with
> >> different cache lines become more and less common.
> >>
> >> gcc/ChangeLog:
> >>
> >> * params.opt: Add destructive-interference-size and
> >> constructive-interference-size.
> >> * doc/invoke.texi: Document them.
> >> * config/aarch64/aarch64.c (aarch64_override_options_internal):
> >> Set them.
> >> * config/arm/arm.c (arm_option_override): Set them.
> >> * config/i386/i386-options.c (ix86_option_override_internal):
> >> Set them.
> >>
> >> gcc/c-family/ChangeLog:
> >>
> >> * c.opt: Add 

[PATCH] aix: Add FAT library support for libffi

2021-09-15 Thread CHIGOT, CLEMENT via Gcc-patches
Even though GCC64 is able to bootstrap on AIX without libffi being a
FAT library (a single archive holding both 32-bit and 64-bit objects),
the "-maix32" tests do not work without it.

libffi/ChangeLog:
2021-09-10  Clément Chigot  

* Makefile.am (tmake_file): Build and install AIX-style FAT
  libraries.
* Makefile.in: Regenerate.
* include/Makefile.in: Regenerate.
* man/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac (tmake_file): Substitute.
* configure.host (powerpc-*-aix*): Define tmake_file.
* src/powerpc/t-aix: New file.




Clément Chigot
ATOS Bull SAS
1 rue de Provence - 38432 Échirolles - France



0001-aix-Add-FAT-library-support-for-libffi.patch
Description: 0001-aix-Add-FAT-library-support-for-libffi.patch


Re: [PATCH] sparc: Add scheduling information for LEON5

2021-09-15 Thread Daniel Cederman
Thank you for reviewing the patches! I will address your comments and 
push the patches after testing.


Thanks again,
Daniel

On 2021-09-15 12:18, Eric Botcazou wrote:

The LEON5 can often dual issue instructions from the same 64-bit aligned
double word if there are no data dependencies. Add scheduling information
to avoid scheduling unpairable instructions back-to-back.
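A toy model of the pairing constraint described above. It assumes 4-byte instructions, so two instructions share an 8-byte aligned double word only when the first one starts it. The instruction classes, and the rule that two instructions needing the same non-ALU unit are not paired, are simplifications for illustration. They are not the LEON5 pipeline rules encoded in leon5.md.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical instruction classes for this toy model.  */
enum insn_class { IC_ALU, IC_LDST, IC_FPU, IC_MUL };

/* Can the instruction at ADDR_FIRST be issued together with the one
   right after it (ADDR_FIRST + 4)?  */
static bool
can_dual_issue (uint32_t addr_first, enum insn_class first,
                enum insn_class second, bool second_uses_first_result)
{
  /* Both 4-byte instructions must sit in one 8-byte aligned double word,
     i.e. the first must start it.  */
  if ((addr_first & 7u) != 0)
    return false;
  /* A data dependency between the two prevents pairing.  */
  if (second_uses_first_result)
    return false;
  /* Simplification: two instructions needing the same non-ALU unit
     are treated as unpairable.  */
  if (first != IC_ALU && first == second)
    return false;
  return true;
}
```

A scheduler description like the one in this patch can then prefer orders where such unpairable pairs do not land in the same double word.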

gcc/ChangeLog:

 * config/sparc/sparc-opts.h (enum sparc_processor_type): Add LEON5
 * config/sparc/sparc.c (struct processor_costs): Add LEON5 costs
 (leon5_adjust_cost): Increase cost of store with data dependency
 on ALU instruction and FPU anti-dependencies.
 (sparc_option_override): Add LEON5 costs
 (sparc_adjust_cost): Add LEON5 cost adjustments
 * config/sparc/sparc.h: Add LEON5
 * config/sparc/sparc.md: Include LEON5 scheduling information
 * config/sparc/sparc.opt: Add LEON5
 * doc/invoke.texi: Add LEON5
 * config/sparc/leon5.md: New file.

OK for whatever branches you deem relevant, modulo a couple of nits:


+;; Avoid scheduling load/store, FPU, and multiplication instructions back

and multiply instructions


+;; Schedule three instructions between load and dependant instruction.

dependent



Re: [PATCH] testsuite: Make sure double-precision is supported in g++.dg/eh/arm-vfp-unwind.C

2021-09-15 Thread Richard Earnshaw via Gcc-patches




On 26/08/2021 16:53, Christophe Lyon via Gcc-patches wrote:
> g++.dg/eh/arm-vfp-unwind.C uses an asm statement relying on
> double-precision FPU support, but does not make sure it is actually
> supported by the target.
> Check (__ARM_FP & 8) to ensure this.
>
> 2021-08-26  Christophe Lyon
>
> gcc/testsuite/
> * g++.dg/eh/arm-vfp-unwind.C: Check __ARM_FP.
> ---
>  gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> index 62263c0c3b0..90d20081d78 100644
> --- a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> +++ b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> @@ -3,7 +3,7 @@
>  
>  /* Test to catch off-by-one errors in arm/pr-support.c.  */
>  
> -#if defined (__VFP_FP__) && !defined (__SOFTFP__)
> +#if defined (__VFP_FP__) && !defined (__SOFTFP__) && (__ARM_FP & 8)
>  
>  #include 
>  #include 

Wouldn't it be better to have an alternate to the asm for the case where
we only have single-precision float?  Something like (untested):

static void donkey ()
{
#if __ARM_FP & 8
  asm volatile ("fcpyd d9, %P0" : : "w" (1.2345) : "d9");
#else
  asm volatile ("fcpys s18, %P0" : : "w" (1.2345f) : "s18");
#endif
  throw 1;
}

R.
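For reference, the ACLE defines `__ARM_FP` as a bit mask of the floating-point precisions supported in hardware: 0x2 half, 0x4 single and 0x8 double precision. So `(__ARM_FP & 8)` tests for double-precision support. The helper below is a hypothetical illustration (`vfp_clobber_reg` is not in the testsuite). It decodes a given `__ARM_FP` value and picks the register the `donkey ()` asm would clobber in each case.

```c
#include <stddef.h>

/* ACLE __ARM_FP bits: floating-point widths supported in hardware.  */
#define ARM_FP_HALF   0x2
#define ARM_FP_SINGLE 0x4
#define ARM_FP_DOUBLE 0x8

/* Hypothetical helper: for a given __ARM_FP value, return the VFP register
   the test's asm can use (d9 needs double precision, s18 only single
   precision), or NULL when neither is available and the test body would
   be compiled out.  */
static const char *
vfp_clobber_reg (int arm_fp)
{
  if (arm_fp & ARM_FP_DOUBLE)
    return "d9";
  if (arm_fp & ARM_FP_SINGLE)
    return "s18";
  return NULL;
}
```

Inside `#if`, an identifier that is not a defined macro evaluates to 0, so `#if __ARM_FP & 8` is also safe on configurations where `__ARM_FP` is not defined.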


Re: [PATCH 4/4] sparc: Add NOP in stack_protect_setsi if sparc_fix_b2bst enabled

2021-09-15 Thread Eric Botcazou
> gcc/ChangeLog:
> 
> * config/sparc/sparc.md: Add NOP to prevent sensitive sequence for
> B2BST errata workaround.

OK everywhere, but the ChangeLog entry should be:

* config/sparc/sparc.md (stack_protect_set32): Add NOP...

Note that it's stack_protect_set32 on mainline and stack_protect_setsi before.

-- 
Eric Botcazou




Re: [PATCH 3/4] sparc: Prevent atomic instructions in beginning of functions for UT700

2021-09-15 Thread Eric Botcazou
> gcc/ChangeLog:
> 
> * config/sparc/sparc.c (sparc_do_work_around_errata): Do not begin
> functions with atomic instruction in the UT700 errata workaround.

OK everywhere.

-- 
Eric Botcazou




Re: [PATCH 2/4] sparc: Skip all empty assembly statements

2021-09-15 Thread Eric Botcazou
> gcc/ChangeLog:
> 
> * config/sparc/sparc.c (next_active_non_empty_insn): New function
> that returns next active non empty assembly instruction.
> (sparc_do_work_around_errata): Use new function.

OK everywhere, modulo a couple of nits:

> +rtx_insn *
> +next_active_non_empty_insn (rtx_insn *insn)
> +{
> +  insn = next_active_insn (insn);
> +
> +  while (insn
> +  && ((GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)

Superfluous parentheses 

> +  || (GET_CODE (PATTERN (insn)) == ASM_INPUT)

Likewise.

> +  || (USEFUL_INSN_P (insn)
> +  && (asm_noperands (PATTERN (insn))>=0)

Missing spaces around >=

> +{
> +  insn = next_active_insn (insn);
> +}

Superfluous curly braces, even for readability.

-- 
Eric Botcazou




Re: [PATCH RFC] c++: implement C++17 hardware interference size

2021-09-15 Thread Richard Earnshaw via Gcc-patches




On 14/09/2021 08:56, Christophe LYON via Gcc-patches wrote:


On 10/09/2021 15:16, Jason Merrill via Gcc-patches wrote:
OK, time to finish this up.  The main change relative to the last patch I
sent to the list is dropping the -finterference-tune flag and making that
behavior the default.  Any more comments?



The last missing piece of the C++17 standard library is the hardware
interference size constants.  Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.

In principle, both of these values should be the same as the target's L1
cache line size.  When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the
destructive size the maximum, unless you are constrained by ABI
compatibility with previous code.

 From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation
units built at the same time with the same flags.

To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.
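As a C-level illustration, the c-family part of this patch also adds the predefined macros `__GCC_DESTRUCTIVE_SIZE` and `__GCC_CONSTRUCTIVE_SIZE`. The sketch below assumes they are available when compiling C and falls back to 64 otherwise; the struct names are hypothetical. It keeps two independently written counters in separate destructive-size blocks, and checks that a pair of fields read together fits in one constructive-size block. Keeping such a layout inside a single translation unit matches the kind of use the description above recommends.

```c
#include <stddef.h>
#include <stdalign.h>

#ifdef __GCC_DESTRUCTIVE_SIZE
#define DESTRUCTIVE_SIZE __GCC_DESTRUCTIVE_SIZE
#else
#define DESTRUCTIVE_SIZE 64   /* assumed fallback */
#endif

#ifdef __GCC_CONSTRUCTIVE_SIZE
#define CONSTRUCTIVE_SIZE __GCC_CONSTRUCTIVE_SIZE
#else
#define CONSTRUCTIVE_SIZE 64  /* assumed fallback */
#endif

/* Counters written by different threads: give each its own
   destructive-size block to avoid false sharing.  */
struct shared_counters
{
  alignas (DESTRUCTIVE_SIZE) long producer_count;
  alignas (DESTRUCTIVE_SIZE) long consumer_count;
};

/* Fields read together: keep them within one constructive-size block.  */
struct hot_pair
{
  long key;
  long value;
};

_Static_assert (sizeof (struct hot_pair) <= CONSTRUCTIVE_SIZE,
                "hot_pair should fit in one constructive-size block");
```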

A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable.  The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.

JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74

I implement this by adding new --params for the two sizes.  Targets can
override these values in targetm.target_option.override() to support a
range of values for the generic target; otherwise, both will default to
the L1 cache line size.

64 bytes still seems correct for all x86.

I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the
Cortex A9 has a 32-byte cache line, so I'd think 32/64 would make more
sense.


While this works for an arm-linux-gnueabihf toolchain configured 
--with-mode=arm, it fails --with-mode=thumb (also using 
--with-cpu=cortex-a9):


: error: '--param constructive-interference-size=64' is
greater than '--param l1-cache-line-size=32' [-Werror=interference-size]
cc1plus: all warnings being treated as errors
make[4]: *** [Makefile:678: alloc_c.lo] Error 1





He proposed 64/128 for generic AArch64, but since the A64FX now has a
256B cache line, I've changed that to 64/256.



Similarly, for aarch64 I'm seeing:

: error: '--param constructive-interference-size=64' is
greater than '--param l1-cache-line-size=32' [-Werror=interference-size]
cc1plus: all warnings being treated as errors
make[4]: *** [Makefile:678: alloc_c.lo] Error 1


So adjustment is needed for both arm and aarch64 targets




FWIW, I'm still in discussion with our architects about the best values 
to use here, but certainly this needs fixing quickly as it seems to be 
breaking hundreds of tests in the C++ testsuite.


R.


Christophe




With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to
be, and we can feel free to adjust them as CPUs with different cache lines
become more and less common.

gcc/ChangeLog:

* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.

gcc/c-family/ChangeLog:

* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.

gcc/cp/ChangeLog:

* constexpr.c (maybe_warn_about_constant_value):
Complain about std::hardware_destructive_interference_size.
(cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.

libstdc++-v3/ChangeLog:

* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Winterference.h: New file.
* g++.dg/warn/Winterference.C: New test.
* 

Re: [PATCH 1/4] sparc: Treat more instructions as load or store in errata workarounds

2021-09-15 Thread Eric Botcazou
> gcc/ChangeLog:
> 
> * config/sparc/sparc.c (store_insn_p): Add predicate for store
> attributes.
> (load_insn_p): Add predicate for load attributes.
> (sparc_do_work_around_errata): Use new predicates.

OK everywhere on principle, but can we avoid the multiple calls to 
get_attr_type?  See div_sqrt_insn_p and fpop_insn_p for a model.
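One way to follow the div_sqrt_insn_p model is to read the attribute once and switch on it. The sketch below uses hypothetical stand-ins (`insn_sketch`, `get_attr_type_sketch`, and a small subset of `type` values) so that it compiles on its own. The call counter exists only to show that each predicate queries the attribute exactly once.

```c
#include <stdbool.h>

/* Hypothetical subset of the SPARC "type" attribute values.  */
enum attr_type_sketch
{
  TYPE_IALU, TYPE_LOAD, TYPE_SLOAD, TYPE_FPLOAD, TYPE_STORE, TYPE_FPSTORE
};

/* Hypothetical stand-in for an insn: only its type attribute matters.  */
struct insn_sketch
{
  enum attr_type_sketch type;
};

/* Counts attribute lookups so the example can show one call per query.  */
static int get_attr_type_calls;

static enum attr_type_sketch
get_attr_type_sketch (const struct insn_sketch *insn)
{
  get_attr_type_calls++;
  return insn->type;
}

/* Read the attribute once and switch on it, rather than calling
   get_attr_type in each arm of an || chain.  */
static bool
load_insn_p (const struct insn_sketch *insn)
{
  switch (get_attr_type_sketch (insn))
    {
    case TYPE_LOAD:
    case TYPE_SLOAD:
    case TYPE_FPLOAD:
      return true;
    default:
      return false;
    }
}

static bool
store_insn_p (const struct insn_sketch *insn)
{
  switch (get_attr_type_sketch (insn))
    {
    case TYPE_STORE:
    case TYPE_FPSTORE:
      return true;
    default:
      return false;
    }
}
```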

-- 
Eric Botcazou




Re: [PATCH] sparc: Print out bit names for LEON and LEON3 with -mdebug

2021-09-15 Thread Eric Botcazou
> gcc/ChangeLog:
> 
> * config/sparc/sparc.c (dump_target_flag_bits): Print bit names for
> LEON and LEON3.

OK everywhere.

-- 
Eric Botcazou




Re: [PATCH] sparc: Add scheduling information for LEON5

2021-09-15 Thread Eric Botcazou
> The LEON5 can often dual issue instructions from the same 64-bit aligned
> double word if there are no data dependencies. Add scheduling information
> to avoid scheduling unpairable instructions back-to-back.
> 
> gcc/ChangeLog:
> 
> * config/sparc/sparc-opts.h (enum sparc_processor_type): Add LEON5
> * config/sparc/sparc.c (struct processor_costs): Add LEON5 costs
> (leon5_adjust_cost): Increase cost of store with data dependency
> on ALU instruction and FPU anti-dependencies.
> (sparc_option_override): Add LEON5 costs
> (sparc_adjust_cost): Add LEON5 cost adjustments
> * config/sparc/sparc.h: Add LEON5
> * config/sparc/sparc.md: Include LEON5 scheduling information
> * config/sparc/sparc.opt: Add LEON5
> * doc/invoke.texi: Add LEON5
> * config/sparc/leon5.md: New file.

OK for whatever branches you deem relevant, modulo a couple of nits:

> +;; Avoid scheduling load/store, FPU, and multiplication instructions back

and multiply instructions

> +;; Schedule three instructions between load and dependant instruction.

dependent

-- 
Eric Botcazou




Re: [PATCH] Maintain (mis-)alignment info in the first element of a group

2021-09-15 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Wed, 15 Sep 2021, Richard Sandiford wrote:
>> Richard Biener  writes:
>> > On Tue, 14 Sep 2021, Richard Sandiford wrote:
>> >
>> >> Richard Biener via Gcc-patches  writes:
>> >> > This changes us to maintain and compute (mis-)alignment info for
>> >> > the first element of a group only rather than for each DR when
>> >> > doing interleaving and for the earliest, first, or first in the SLP
>> >> > node (or any pair or all three of those) when SLP vectorizing.
>> >> >
>> >> > For this to work out the easiest way I have changed the accessors
>> >> > DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to
>> >> > the first element rather than adjusting all callers.
>> >> > dr_misalignment is moved out-of-line and I'm not too fond of the
>> >> > poly-int dances there (any hints?), but basically we are now
>> >> > adjusting the first elements misalignment based on the DR_INIT
>> >> > difference.
>> >> >
>> >> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>> >> >
>> >> > Richard.
>> >> >
>> >> > 2021-09-13  Richard Biener  
>> >> >
>> >> > * tree-vectorizer.h (dr_misalignment): Move out of line.
>> >> > (dr_target_alignment): New.
>> >> > (DR_TARGET_ALIGNMENT): Wrap dr_target_alignment.
>> >> > (set_dr_target_alignment): New.
>> >> > (SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment.
>> >> > * tree-vect-data-refs.c (dr_misalignment): Compute and
>> >> > return the group members misalignment.
>> >> > (vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT.
>> >> > (vect_analyze_data_refs_alignment): Compute alignment only
>> >> > for the first element of a DR group.
>> >> > (vect_slp_analyze_node_alignment): Likewise.
>> >> > ---
>> >> >  gcc/tree-vect-data-refs.c | 65 ---
>> >> >  gcc/tree-vectorizer.h | 24 ++-
>> >> >  2 files changed, 57 insertions(+), 32 deletions(-)
>> >> >
>> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> >> > index 66e76132d14..b53d6a0b3f1 100644
>> >> > --- a/gcc/tree-vect-data-refs.c
>> >> > +++ b/gcc/tree-vect-data-refs.c
>> >> > @@ -887,6 +887,36 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance)
>> >> >return res;
>> >> >  }
>> >> >  
>> >> > +/* Return the misalignment of DR_INFO.  */
>> >> > +
>> >> > +int
>> >> > +dr_misalignment (dr_vec_info *dr_info)
>> >> > +{
>> >> > +  if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
>> >> > +{
>> >> > +  dr_vec_info *first_dr
>> >> > +   = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
>> >> > +  int misalign = first_dr->misalignment;
>> >> > +  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
>> >> > +  if (misalign == DR_MISALIGNMENT_UNKNOWN)
>> >> > +   return misalign;
>> >> > +  poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
>> >> > +                          - wi::to_poly_offset (DR_INIT (first_dr->dr)));
>> >> > +  poly_int64 mispoly = misalign + diff.to_constant ().to_shwi ();
>> >> > +  bool res = known_misalignment (mispoly,
>> >> > +                                 first_dr->target_alignment.to_constant (),
>> >> > +                                 );
>> >> > +  gcc_assert (res);
>> >> > +  return misalign;
>> >> 
>> >> Yeah, not too keen on the to_constants here.  The one on diff looks
>> >> redundant -- you could just use diff.force_shwi () instead, and
>> >> keep everything poly_int.
>> >>
>> >> For the known_misalignment I think we should use:
>> >> 
>> >>if (!can_div_trunc_p (mispoly, first_dr->target_alignment,
>> >>, ))
>> >>  misalign = DR_MISALIGNMENT_UNKNOWN;
>> >>return misalign;
>> >> 
>> >> There are then no to_constant assumptions.
>> >
>> > OK, note that group analysis does
>> >
>> >   /* Check that the DR_INITs are compile-time constants.  */
>> >   if (TREE_CODE (DR_INIT (dra)) != INTEGER_CST
>> >   || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
>> > break;
>> >
>> >   /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
>> >   HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
>> >   HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
>> >
>> > so I'm confident my variant was "correct", but it still was ugly.
>> 
>> Ah, OK.  In that case I don't mind the original version, but it would be
>> good to have a comment above the to_constant saying where the condition
>> is enforced.
>> 
>> I'm just trying to avoid to_constant calls with no comment to explain
>> them, and with no nearby is_constant call.  Otherwise it could end up
>> a bit like tree_to_uhwi, where sometimes tree_fits_uhwi_p really has
>> been checked earlier (not always obvious where) and sometimes
>> tree_to_uhwi is just used out of 

Re: [PATCH] Maintain (mis-)alignment info in the first element of a group

2021-09-15 Thread Richard Biener via Gcc-patches
On Wed, 15 Sep 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Tue, 14 Sep 2021, Richard Sandiford wrote:
> >
> >> Richard Biener via Gcc-patches  writes:
> >> > This changes us to maintain and compute (mis-)alignment info for
> >> > the first element of a group only rather than for each DR when
> >> > doing interleaving and for the earliest, first, or first in the SLP
> >> > node (or any pair or all three of those) when SLP vectorizing.
> >> >
> >> > For this to work out the easiest way I have changed the accessors
> >> > DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to
> >> > the first element rather than adjusting all callers.
> >> > dr_misalignment is moved out-of-line and I'm not too fond of the
> >> > poly-int dances there (any hints?), but basically we are now
> >> > adjusting the first elements misalignment based on the DR_INIT
> >> > difference.
> >> >
> >> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >> >
> >> > Richard.
> >> >
> >> > 2021-09-13  Richard Biener  
> >> >
> >> >  * tree-vectorizer.h (dr_misalignment): Move out of line.
> >> >  (dr_target_alignment): New.
> >> >  (DR_TARGET_ALIGNMENT): Wrap dr_target_alignment.
> >> >  (set_dr_target_alignment): New.
> >> >  (SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment.
> >> >  * tree-vect-data-refs.c (dr_misalignment): Compute and
> >> >  return the group members misalignment.
> >> >  (vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT.
> >> >  (vect_analyze_data_refs_alignment): Compute alignment only
> >> >  for the first element of a DR group.
> >> >  (vect_slp_analyze_node_alignment): Likewise.
> >> > ---
> >> >  gcc/tree-vect-data-refs.c | 65 ---
> >> >  gcc/tree-vectorizer.h | 24 ++-
> >> >  2 files changed, 57 insertions(+), 32 deletions(-)
> >> >
> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> >> > index 66e76132d14..b53d6a0b3f1 100644
> >> > --- a/gcc/tree-vect-data-refs.c
> >> > +++ b/gcc/tree-vect-data-refs.c
> >> > @@ -887,6 +887,36 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance)
> >> >return res;
> >> >  }
> >> >  
> >> > +/* Return the misalignment of DR_INFO.  */
> >> > +
> >> > +int
> >> > +dr_misalignment (dr_vec_info *dr_info)
> >> > +{
> >> > +  if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> >> > +{
> >> > +  dr_vec_info *first_dr
> >> > += STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> >> > +  int misalign = first_dr->misalignment;
> >> > +  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> >> > +  if (misalign == DR_MISALIGNMENT_UNKNOWN)
> >> > +return misalign;
> >> > +  poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
> >> > +                          - wi::to_poly_offset (DR_INIT (first_dr->dr)));
> >> > +  poly_int64 mispoly = misalign + diff.to_constant ().to_shwi ();
> >> > +  bool res = known_misalignment (mispoly,
> >> > +                                 first_dr->target_alignment.to_constant (),
> >> > +                                 );
> >> > +  gcc_assert (res);
> >> > +  return misalign;
> >> 
> >> Yeah, not too keen on the to_constants here.  The one on diff looks
> >> redundant -- you could just use diff.force_shwi () instead, and
> >> keep everything poly_int.
> >>
> >> For the known_misalignment I think we should use:
> >> 
> >>if (!can_div_trunc_p (mispoly, first_dr->target_alignment,
> >> , ))
> >>  misalign = DR_MISALIGNMENT_UNKNOWN;
> >>return misalign;
> >> 
> >> There are then no to_constant assumptions.
> >
> > OK, note that group analysis does
> >
> >   /* Check that the DR_INITs are compile-time constants.  */
> >   if (TREE_CODE (DR_INIT (dra)) != INTEGER_CST
> >   || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
> > break;
> >
> >   /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
> >   HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
> >   HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
> >
> > so I'm confident my variant was "correct", but it still was ugly.
> 
> Ah, OK.  In that case I don't mind the original version, but it would be
> good to have a comment above the to_constant saying where the condition
> is enforced.
> 
> I'm just trying to avoid to_constant calls with no comment to explain
> them, and with no nearby is_constant call.  Otherwise it could end up
> a bit like tree_to_uhwi, where sometimes tree_fits_uhwi_p really has
> been checked earlier (not always obvious where) and sometimes
> tree_to_uhwi is just used out of hope, to avoid having to think about
> the alternative.
> 
> > There's also the issue that target_alignment is poly_uint64 but
> > misalignment is signed int.
> >
> > Note that can_div_trunc_p seems to require a poly_uint64 

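To make the arithmetic in the exchange above concrete, here is a scalar sketch of the constant-offset case, which the group analysis quoted above guarantees (both DR_INITs are INTEGER_CSTs). It uses plain `int64_t` instead of `poly_int` and assumes a constant target alignment, so it corresponds to the original `to_constant` version rather than the `can_div_trunc_p` suggestion. `toy_member_misalignment` and `TOY_MISALIGN_UNKNOWN` are hypothetical names, not GCC API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for DR_MISALIGNMENT_UNKNOWN.  */
#define TOY_MISALIGN_UNKNOWN (-1)

/* Misalignment in bytes of a grouped access, derived from the first
   element's misalignment FIRST_MISALIGN and the constant byte offsets
   (DR_INITs) FIRST_INIT and MEMBER_INIT.  TARGET_ALIGN is assumed to be a
   compile-time constant, as the to_constant calls in the patch assume.  */
static int
toy_member_misalignment (int first_misalign, int64_t first_init,
                         int64_t member_init, int64_t target_align)
{
  assert (target_align > 0);
  if (first_misalign == TOY_MISALIGN_UNKNOWN)
    return TOY_MISALIGN_UNKNOWN;
  int64_t mis = first_misalign + (member_init - first_init);
  /* C's % can yield a negative result; normalize into [0, target_align).  */
  int64_t rem = mis % target_align;
  if (rem < 0)
    rem += target_align;
  return (int) rem;
}
```

For example, with a 16-byte target alignment, a first element misaligned by 4 and a member 8 bytes further on is misaligned by 12.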
[PATCH] Optimize for V{8,16,32}HFmode vec_set/extract/init.

2021-09-15 Thread liuhongt via Gcc-patches
Hi:
  The optimization is described in the PR.

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  All avx512fp16 runtest cases passed on SPR.

gcc/ChangeLog:

PR target/102327
* config/i386/i386-expand.c
(ix86_expand_vector_init_interleave): Use punpcklwd to pack 2
HFmode values.
(ix86_expand_vector_set): Use pblendw instead of pinsrw.
* config/i386/i386.c (ix86_can_change_mode_class): Adjust for
AVX512FP16 which supports 16bit vector load.
* config/i386/sse.md (avx512bw_interleave_highv32hi):
Rename to ..
(avx512bw_interleave_high): .. this, and
extend to V32HFmode.
(avx2_interleave_highv16hi): Rename to ..
(avx2_interleave_high): .. this, and extend
to V16HFmode.
(vec_interleave_highv8hi): Rename to ..
(vec_interleave_high): .. this, and extend to V8HFmode.
(avx512bw_interleave_lowv32hi): Rename to ..
(avx512bw_interleave_low): .. this, and extend to V32HFmode.
(avx2_interleave_lowv16hi): Rename to ..
(avx2_interleave_low): .. this, and extend to V16HFmode.
(vec_interleave_lowv8hi): Rename to ..
(vec_interleave_low): .. this, and extend to V8HFmode.
(sse4_1_pblendw): Rename to ..
(sse4_1_pblend): .. this, and extend to V8HFmode.
(avx2_pblendph): New define_expand.
(_pinsr): Refactor, use
sseintmodesuffix instead of ssemodesuffix.
(blendsuf): New mode attr.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102327-1.c: New test.
* gcc.target/i386/pr102327-2.c: New test.
* gcc.target/i386/avx512fp16-1c.c: Adjust testcase.
---
 gcc/config/i386/i386-expand.c |  95 ++
 gcc/config/i386/i386.c|   7 +-
 gcc/config/i386/sse.md| 176 --
 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |   6 +-
 gcc/testsuite/gcc.target/i386/pr102327-1.c|  65 +++
 gcc/testsuite/gcc.target/i386/pr102327-2.c|  95 ++
 6 files changed, 343 insertions(+), 101 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102327-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102327-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e117afb16b8..c82b6accf1b 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -14054,7 +14054,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
  tmp1 = gen_reg_rtx (SImode);
  emit_move_insn (tmp1, gen_lowpart (SImode, val));
 
- /* Insert the SImode value as low element of a V4SImode vector. */
+ /* Insert the SImode value as low element of a V4SImode vector.  */
  tmp2 = gen_reg_rtx (V4SImode);
  emit_insn (gen_vec_setv4si_0 (tmp2, CONST0_RTX (V4SImode), tmp1));
  emit_move_insn (dperm.op0, gen_lowpart (mode, tmp2));
@@ -14638,7 +14638,7 @@ ix86_expand_vector_init_interleave (machine_mode mode,
   switch (mode)
 {
 case E_V8HFmode:
-  gen_load_even = gen_vec_setv8hf;
+  gen_load_even = gen_vec_interleave_lowv8hf;
   gen_interleave_first_low = gen_vec_interleave_lowv4si;
   gen_interleave_second_low = gen_vec_interleave_lowv2di;
   inner_mode = HFmode;
@@ -14673,35 +14673,40 @@ ix86_expand_vector_init_interleave (machine_mode mode,
   op = ops [i + i];
   if (inner_mode == HFmode)
{
- /* Convert HFmode to HImode.  */
- op1 = gen_reg_rtx (HImode);
- op1 = gen_rtx_SUBREG (HImode, force_reg (HFmode, op), 0);
- op = gen_reg_rtx (HImode);
- emit_move_insn (op, op1);
+ rtx even, odd;
+ /* Use vpuncklwd to pack 2 HFmode.  */
+ op0 = gen_reg_rtx (V8HFmode);
+ even = lowpart_subreg (V8HFmode, force_reg (HFmode, op), HFmode);
+ odd = lowpart_subreg (V8HFmode,
+   force_reg (HFmode, ops[i + i + 1]),
+   HFmode);
+ emit_insn (gen_load_even (op0, even, odd));
}
+  else
+   {
+ /* Extend the odd elment to SImode using a paradoxical SUBREG.  */
+ op0 = gen_reg_rtx (SImode);
+ emit_move_insn (op0, gen_lowpart (SImode, op));
 
-  /* Extend the odd elment to SImode using a paradoxical SUBREG.  */
-  op0 = gen_reg_rtx (SImode);
-  emit_move_insn (op0, gen_lowpart (SImode, op));
-
-  /* Insert the SImode value as low element of V4SImode vector. */
-  op1 = gen_reg_rtx (V4SImode);
-  op0 = gen_rtx_VEC_MERGE (V4SImode,
-  gen_rtx_VEC_DUPLICATE (V4SImode,
- op0),
-  CONST0_RTX (V4SImode),
-  const1_rtx);
-  emit_insn (gen_rtx_SET (op1, op0));
+ /* Insert the SImode value as low element of V4SImode vector.  */
+ op1 = 

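The `ix86_expand_vector_init_interleave` change packs two HFmode scalars with a single word interleave (punpcklwd) instead of going through a vec_set. The portable C model below treats each half-precision value as its raw 16-bit pattern and shows what an interleave of the low words does: with the even element in lane 0 of one vector and the odd element in lane 0 of the other, the result holds the pair in lanes 0 and 1, ready for the later dword and qword interleaves. `toy_interleave_low_words` is a hypothetical helper, not an intrinsic or a GCC function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of punpcklwd on 8 x 16-bit lanes:
   dst = { a0, b0, a1, b1, a2, b2, a3, b3 }.  */
static void
toy_interleave_low_words (uint16_t dst[8], const uint16_t a[8],
                          const uint16_t b[8])
{
  uint16_t tmp[8];
  for (int i = 0; i < 4; i++)
    {
      tmp[2 * i] = a[i];
      tmp[2 * i + 1] = b[i];
    }
  memcpy (dst, tmp, sizeof tmp);  /* Lets dst alias a or b.  */
}
```

For instance, interleaving `{0x3C00, 0, ...}` (1.0 in binary16) with `{0x4000, 0, ...}` (2.0) yields lanes 0 and 1 equal to 0x3C00 and 0x4000.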
[PATCH 4/4] sparc: Add NOP in stack_protect_setsi if sparc_fix_b2bst enabled

2021-09-15 Thread Daniel Cederman
This is needed to prevent the Store -> (Non-store or load) -> Store
sequence.

gcc/ChangeLog:

* config/sparc/sparc.md (stack_protect_setsi): Add NOP to prevent
the sensitive sequence for the B2BST errata workaround.
---
 gcc/config/sparc/sparc.md | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 24b76e0cacd..3ac074a244d 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -8353,9 +8353,15 @@ visl")
(unspec:SI [(match_operand:SI 1 "memory_operand" "m")] UNSPEC_SP_SET))
(set (match_scratch:SI 2 "=") (const_int 0))]
   "TARGET_ARCH32"
-  "ld\t%1, %2\;st\t%2, %0\;mov\t0, %2"
+{
+  if (sparc_fix_b2bst)
+return "ld\t%1, %2\;st\t%2, %0\;mov\t0, %2\;nop";
+  else
+return "ld\t%1, %2\;st\t%2, %0\;mov\t0, %2";
+}
   [(set_attr "type" "multi")
-   (set_attr "length" "3")])
+   (set (attr "length") (if_then_else (eq_attr "fix_b2bst" "true")
+ (const_int 4) (const_int 3)))])
 
 (define_insn "stack_protect_set64"
   [(set (match_operand:DI 0 "memory_operand" "=m")
-- 
2.25.1
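As a rough illustration of the sequence named in this commit message (a store, one instruction that is not a store, then another store), here is a heavily simplified C model. It is not the real B2BST logic in `sparc_do_work_around_errata`, which also distinguishes double-word stores and other cases; the `insn_kind` enum and `count_b2bst_hazards` are hypothetical. Presumably the NOP is appended unconditionally because `stack_protect_setsi` emits `st` and `mov` inside a single multi-instruction pattern, so the errata pass cannot see the store or what follows it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction classes.  */
enum insn_kind { K_STORE, K_LOAD, K_OTHER, K_NOP };

/* Count windows of the form STORE, non-store, STORE in SEQ[0..N).  */
static size_t
count_b2bst_hazards (const enum insn_kind *seq, size_t n)
{
  size_t hazards = 0;
  for (size_t i = 0; i + 2 < n; i++)
    if (seq[i] == K_STORE && seq[i + 1] != K_STORE && seq[i + 2] == K_STORE)
      hazards++;
  return hazards;
}
```

In this model, `ld; st; mov` followed by a store forms one hazard window, while `ld; st; mov; nop` followed by a store forms none.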



[PATCH 3/4] sparc: Prevent atomic instructions in beginning of functions for UT700

2021-09-15 Thread Daniel Cederman
A call to the function might have a load instruction in the delay slot,
and a load followed by an atomic instruction could cause a deadlock.

gcc/ChangeLog:

* config/sparc/sparc.c (sparc_do_work_around_errata): Do not begin
functions with atomic instruction in the UT700 errata workaround.
---
 gcc/config/sparc/sparc.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index b087c5b3fc8..5177d48793d 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -1106,6 +1106,7 @@ static unsigned int
 sparc_do_work_around_errata (void)
 {
   rtx_insn *insn, *next;
+  bool find_first_useful = true;
 
   /* Force all instructions to be split into their final form.  */
   split_all_insns_noflow ();
@@ -1130,6 +1131,16 @@ sparc_do_work_around_errata (void)
   else
jump = NULL;
 
+  /* Do not begin function with atomic instruction.  */
+  if (sparc_fix_ut700
+ && find_first_useful
+ && USEFUL_INSN_P (insn))
+   {
+ find_first_useful = false;
+ if (atomic_insn_for_leon3_p (insn))
+   emit_insn_before (gen_nop (), insn);
+   }
+
   /* Place a NOP at the branch target of an integer branch if it is a
 floating-point operation or a floating-point branch.  */
   if (sparc_fix_gr712rc
-- 
2.25.1
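The rule this patch adds can be stated as: under the UT700 workaround, if the first useful instruction of a function is atomic, emit a NOP before it, so that a load in the caller's delay slot is never immediately followed by the atomic instruction. A minimal C model over a hypothetical instruction array might look like the following; notes are skipped, loosely mirroring the `USEFUL_INSN_P` check, and `ut700_needs_leading_nop` is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction classes for the model.  */
enum ut700_kind { U_NOTE, U_LOAD, U_ATOMIC, U_OTHER };

/* True if a NOP must precede the first useful instruction of the
   function body SEQ[0..N).  Only the first useful insn matters.  */
static bool
ut700_needs_leading_nop (const enum ut700_kind *seq, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (seq[i] != U_NOTE)
      return seq[i] == U_ATOMIC;
  return false;
}
```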



[PATCH] sparc: Add scheduling information for LEON5

2021-09-15 Thread Daniel Cederman
The LEON5 can often dual issue instructions from the same 64-bit aligned
double word if there are no data dependencies. Add scheduling information
to avoid scheduling unpairable instructions back-to-back.

gcc/ChangeLog:

* config/sparc/sparc-opts.h (enum sparc_processor_type): Add LEON5.
* config/sparc/sparc.c (struct processor_costs): Add LEON5 costs.
(leon5_adjust_cost): Increase cost of store with data dependency
on ALU instruction and FPU anti-dependencies.
(sparc_option_override): Add LEON5 costs.
(sparc_adjust_cost): Add LEON5 cost adjustments.
* config/sparc/sparc.h: Add LEON5.
* config/sparc/sparc.md: Include LEON5 scheduling information.
* config/sparc/sparc.opt: Add LEON5.
* doc/invoke.texi: Add LEON5.
* config/sparc/leon5.md: New file.
---
 gcc/config/sparc/leon5.md | 103 ++
 gcc/config/sparc/sparc-opts.h |   1 +
 gcc/config/sparc/sparc.c  |  84 +++
 gcc/config/sparc/sparc.h  |  36 ++--
 gcc/config/sparc/sparc.md |   2 +
 gcc/config/sparc/sparc.opt|   3 +
 gcc/doc/invoke.texi   |  13 +++--
 7 files changed, 220 insertions(+), 22 deletions(-)
 create mode 100644 gcc/config/sparc/leon5.md

diff --git a/gcc/config/sparc/leon5.md b/gcc/config/sparc/leon5.md
new file mode 100644
index 000..3f72a9f53e0
--- /dev/null
+++ b/gcc/config/sparc/leon5.md
@@ -0,0 +1,103 @@
+;; Scheduling description for LEON5.
+;;   Copyright (C) 2021 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+
+;; The LEON5 can often dual issue instructions from the same 64-bit aligned
+;; double word if there are no data dependencies.
+;;
+;; Avoid scheduling load/store, FPU, and multiplication instructions back to
+;; back, regardless of data dependencies.
+;;
+;; Push comparisons away from the associated branch instruction.
+;;
+;; Avoid scheduling ALU instructions with data dependencies back to back.
+;;
+;; Schedule three instructions between load and dependent instruction.
+
+(define_automaton "leon5")
+
+(define_cpu_unit "leon5_memory" "leon5")
+(define_cpu_unit "leon5_mul" "leon5")
+(define_cpu_unit "grfpu_d" "grfpu")
+(define_cpu_unit "grfpu_s" "grfpu")
+
+(define_insn_reservation "leon5_load" 4
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "load,sload"))
+  "leon5_memory * 2, nothing * 2")
+
+(define_insn_reservation "leon5_fpload" 2
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "fpload"))
+  "leon5_memory * 2 + grfpu_alu * 2")
+
+(define_insn_reservation "leon5_store" 2
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "store"))
+  "leon5_memory * 2")
+
+(define_insn_reservation "leon5_fpstore" 2
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "fpstore"))
+  "leon5_memory * 2 + grfpu_alu * 2")
+
+(define_insn_reservation "leon5_ialu" 2
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "ialu, shift, ialuX"))
+  "nothing * 2")
+
+(define_insn_reservation "leon5_compare" 5
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "compare"))
+  "nothing * 5")
+
+(define_insn_reservation "leon5_imul" 4
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "imul"))
+  "leon5_mul * 2, nothing * 2")
+
+(define_insn_reservation "leon5_idiv" 35
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "idiv"))
+  "nothing * 35")
+
+(define_insn_reservation "leon5_fp_alu" 5
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "fp,fpcmp,fpmul,fpmove"))
+  "grfpu_alu * 2, nothing*3")
+
+(define_insn_reservation "leon5_fp_divs" 17
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "fpdivs"))
+  "grfpu_alu * 2 + grfpu_d*16, nothing")
+
+(define_insn_reservation "leon5_fp_divd" 18
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "fpdivd"))
+  "grfpu_alu * 2 + grfpu_d*17, nothing")
+
+(define_insn_reservation "leon5_fp_sqrts" 25
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "fpsqrts"))
+  "grfpu_alu * 2 + grfpu_s*24, nothing")
+
+(define_insn_reservation "leon5_fp_sqrtd" 26
+  (and (eq_attr "cpu" "leon5")
+  (eq_attr "type" "fpsqrtd"))
+  "grfpu_alu * 2 + grfpu_s*25, nothing")
diff --git a/gcc/config/sparc/sparc-opts.h b/gcc/config/sparc/sparc-opts.h
index 1af556e1156..9299cf6a2ff 100644
--- a/gcc/config/sparc/sparc-opts.h
+++ b/gcc/config/sparc/sparc-opts.h
@@ -31,6 +31,7 @@ enum sparc_processor_type {
   

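The LEON5 commit message and the comments in leon5.md describe when two instructions can dual issue: they must sit in the same 64-bit aligned double word, have no data dependency, and not be two load/store, FPU, or multiply instructions back to back. A toy C predicate for that description follows. The struct, the class enum, and the exact rules (in particular, reading "back to back" as "two of the same class") are simplifying assumptions for illustration, not a model of the hardware or of GCC's DFA scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical coarse functional classes.  */
enum leon5_class { C_ALU, C_MEM, C_FPU, C_MUL };

struct toy_leon5_insn
{
  uint32_t addr;          /* Byte address of the instruction.  */
  enum leon5_class cls;   /* Coarse functional class.  */
  int dest;               /* Destination register, or -1.  */
  int src1, src2;         /* Source registers, or -1.  */
};

/* Rough model of whether A followed by B may dual issue.  */
static bool
toy_can_dual_issue (const struct toy_leon5_insn *a,
                    const struct toy_leon5_insn *b)
{
  /* Both must sit in the same 64-bit aligned double word, A first.  */
  if ((a->addr & ~7u) != (b->addr & ~7u) || b->addr != a->addr + 4)
    return false;
  /* No read-after-write dependency from A to B.  */
  if (a->dest >= 0 && (b->src1 == a->dest || b->src2 == a->dest))
    return false;
  /* Memory, FPU and multiply instructions do not pair with their own kind.  */
  if (a->cls == b->cls && a->cls != C_ALU)
    return false;
  return true;
}
```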
[PATCH 2/4] sparc: Skip all empty assembly statements

2021-09-15 Thread Daniel Cederman
This version detects multiple empty assembly statements in a row and also
detects non-memory barrier empty assembly statements (__asm__("")). It
can be used instead of next_active_insn().

gcc/ChangeLog:

* config/sparc/sparc.c (next_active_non_empty_insn): New function
that returns the next active insn that is not an empty assembly statement.
(sparc_do_work_around_errata): Use new function.
---
 gcc/config/sparc/sparc.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index fa78e0dc739..b087c5b3fc8 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -1082,6 +1082,26 @@ load_insn_p (rtx_insn *insn)
&& GET_CODE (PATTERN (INSN)) != USE \
&& GET_CODE (PATTERN (INSN)) != CLOBBER)
 
+rtx_insn *
+next_active_non_empty_insn (rtx_insn *insn)
+{
+  insn = next_active_insn (insn);
+
+  while (insn
+&& ((GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
+|| (GET_CODE (PATTERN (insn)) == ASM_INPUT)
+|| (USEFUL_INSN_P (insn)
+&& (asm_noperands (PATTERN (insn))>=0)
+&& !strcmp (decode_asm_operands (PATTERN (insn),
+ NULL, NULL, NULL,
+ NULL, NULL), ""))))
+{
+  insn = next_active_insn (insn);
+}
+
+  return insn;
+}
+
 static unsigned int
 sparc_do_work_around_errata (void)
 {
@@ -1139,7 +1159,7 @@ sparc_do_work_around_errata (void)
emit_insn_before (gen_nop (), target);
}
 
- next = next_active_insn (insn);
+ next = next_active_non_empty_insn (insn);
  if (!next)
break;
 
@@ -1242,23 +1262,12 @@ sparc_do_work_around_errata (void)
  rtx_insn *after;
  int i;
 
- next = next_active_insn (insn);
+ next = next_active_non_empty_insn (insn);
  if (!next)
break;
 
  for (after = next, i = 0; i < 2; i++)
{
- /* Skip empty assembly statements.  */
- if ((GET_CODE (PATTERN (after)) == UNSPEC_VOLATILE)
- || (USEFUL_INSN_P (after)
- && (asm_noperands (PATTERN (after))>=0)
- && !strcmp (decode_asm_operands (PATTERN (after),
-  NULL, NULL, NULL,
-  NULL, NULL), "")))
-   after = next_active_insn (after);
- if (!after)
-   break;
-
  /* If the insn is a branch, then it cannot be problematic.  */
  if (!NONJUMP_INSN_P (after)
  || GET_CODE (PATTERN (after)) == SEQUENCE)
@@ -1283,7 +1292,7 @@ sparc_do_work_around_errata (void)
  && (MEM_P (SET_DEST (set)) || mem_ref (SET_SRC (set
break;
 
- after = next_active_insn (after);
+ after = next_active_non_empty_insn (after);
  if (!after)
break;
}
-- 
2.25.1
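For context, these are the kinds of statements the new helper is meant to skip. The function below separates two stores with an extended empty `asm` carrying a "memory" clobber (a compiler-level barrier) and a basic empty `asm ("")`. Neither contributes any machine instruction between the stores, so for the errata checks the stores should be treated as consecutive. This is an illustration only; compiling it for a LEON target with `-mfix-b2bst` and inspecting the assembly is one way to see whether a NOP is inserted.

```c
#include <assert.h>
#include <stdint.h>

/* Two stores separated only by empty asm statements.  The asm statements
   contribute no instructions between the stores.  */
void
store_pair (volatile uint32_t *p, uint32_t a, uint32_t b)
{
  p[0] = a;
  __asm__ __volatile__ ("" ::: "memory");  /* Compiler memory barrier.  */
  __asm__ ("");                            /* Basic empty asm.  */
  p[1] = b;
}
```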



[PATCH 1/4] sparc: Treat more instructions as load or store in errata workarounds

2021-09-15 Thread Daniel Cederman
Check the type attribute of an instruction to determine whether it performs
a store or a load. This more generic approach sees the last instruction
in the GOTdata_op model as a potential load and treats the memory barrier
as a potential store instruction.

gcc/ChangeLog:

* config/sparc/sparc.c (store_insn_p): Add predicate for store
attributes.
(load_insn_p): Add predicate for load attributes.
(sparc_do_work_around_errata): Use new predicates.
---
 gcc/config/sparc/sparc.c | 37 +
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index d5a0ff7d4ea..fa78e0dc739 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -1045,6 +1045,31 @@ atomic_insn_for_leon3_p (rtx_insn *insn)
 }
 }
 
+/* True if INSN is a store instruction.  */
+
+static bool
+store_insn_p (rtx_insn *insn)
+{
+   if (GET_CODE (PATTERN (insn)) != SET)
+return false;
+
+   return (get_attr_type (insn) == TYPE_STORE)
+ || (get_attr_type (insn) == TYPE_FPSTORE);
+}
+
+/* True if INSN is a load instruction.  */
+
+static bool
+load_insn_p (rtx_insn *insn)
+{
+   if (GET_CODE (PATTERN (insn)) != SET)
+return false;
+
+   return (get_attr_type (insn) == TYPE_LOAD)
+ || (get_attr_type (insn) == TYPE_SLOAD)
+ || (get_attr_type (insn) == TYPE_FPLOAD);
+}
+
 /* We use a machine specific pass to enable workarounds for errata.
 
We need to have the (essentially) final form of the insn stream in order
@@ -1105,9 +1130,7 @@ sparc_do_work_around_errata (void)
 instruction at branch target.  */
   if (sparc_fix_ut700
  && NONJUMP_INSN_P (insn)
- && (set = single_set (insn)) != NULL_RTX
- && mem_ref (SET_SRC (set))
- && REG_P (SET_DEST (set)))
+ && load_insn_p (insn))
{
  if (jump && jump_to_label_p (jump))
{
@@ -1212,7 +1235,7 @@ sparc_do_work_around_errata (void)
   if (sparc_fix_b2bst
  && NONJUMP_INSN_P (insn)
  && (set = single_set (insn)) != NULL_RTX
- && MEM_P (SET_DEST (set)))
+ && store_insn_p (insn))
{
  /* Sequence B begins with a double-word store.  */
  bool seq_b = GET_MODE_SIZE (GET_MODE (SET_DEST (set))) == 8;
@@ -1245,8 +1268,7 @@ sparc_do_work_around_errata (void)
  if (seq_b)
{
  /* Add NOP if followed by a store.  */
- if ((set = single_set (after)) != NULL_RTX
- && MEM_P (SET_DEST (set)))
+ if (store_insn_p (after))
insert_nop = true;
 
  /* Otherwise it is ok.  */
@@ -1268,8 +1290,7 @@ sparc_do_work_around_errata (void)
 
  /* Add NOP if third instruction is a store.  */
  if (i == 1
- && (set = single_set (after)) != NULL_RTX
- && MEM_P (SET_DEST (set)))
+ && store_insn_p (after))
insert_nop = true;
}
}
-- 
2.25.1
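The idea of this patch is to classify instructions by their `type` attribute instead of by RTL shape (a `SET` with a MEM source and REG destination, or a MEM destination). A standalone sketch of the two predicates follows, with a hypothetical enum standing in for the generated `attr_type` values and a flag standing in for the `GET_CODE (PATTERN (insn)) == SET` test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the sparc.md "type" attribute values.  */
enum toy_attr_type { T_IALU, T_LOAD, T_SLOAD, T_FPLOAD, T_STORE, T_FPSTORE, T_MULTI };

struct toy_typed_insn
{
  bool is_set;              /* Pattern is a single SET.  */
  enum toy_attr_type type;  /* Value of the "type" attribute.  */
};

/* True if INSN is a store according to its type attribute.  */
static bool
toy_store_insn_p (const struct toy_typed_insn *insn)
{
  if (!insn->is_set)
    return false;
  return insn->type == T_STORE || insn->type == T_FPSTORE;
}

/* True if INSN is a load according to its type attribute.  */
static bool
toy_load_insn_p (const struct toy_typed_insn *insn)
{
  if (!insn->is_set)
    return false;
  return insn->type == T_LOAD || insn->type == T_SLOAD
         || insn->type == T_FPLOAD;
}
```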



[PATCH] sparc: Print out bit names for LEON and LEON3 with -mdebug

2021-09-15 Thread Daniel Cederman
From: Andreas Larsson 

gcc/ChangeLog:

* config/sparc/sparc.c (dump_target_flag_bits): Print bit names for
LEON and LEON3.
---
 gcc/config/sparc/sparc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 06f41d7bb53..d5a0ff7d4ea 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -1596,6 +1596,10 @@ dump_target_flag_bits (const int flags)
 fprintf (stderr, "CBCOND ");
   if (flags & MASK_DEPRECATED_V8_INSNS)
 fprintf (stderr, "DEPRECATED_V8_INSNS ");
+  if (flags & MASK_LEON)
+fprintf (stderr, "LEON ");
+  if (flags & MASK_LEON3)
+fprintf (stderr, "LEON3 ");
   if (flags & MASK_SPARCLET)
 fprintf (stderr, "SPARCLET ");
   if (flags & MASK_SPARCLITE)
-- 
2.25.1
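The change follows the existing pattern in `dump_target_flag_bits`: test each mask bit and print its name. A self-contained sketch of the same idea is shown below. It writes into a buffer instead of `stderr` and uses made-up mask values; the real `MASK_*` constants are generated from sparc.opt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Made-up mask values for illustration only.  */
#define TOY_MASK_LEON   (1 << 0)
#define TOY_MASK_LEON3  (1 << 1)
#define TOY_MASK_V8PLUS (1 << 2)

static const struct { int mask; const char *name; } toy_flag_names[] = {
  { TOY_MASK_LEON, "LEON" },
  { TOY_MASK_LEON3, "LEON3" },
  { TOY_MASK_V8PLUS, "V8PLUS" },
};

/* Write the name of every set bit of FLAGS into BUF, each followed by
   a space, truncating if BUF is too small.  */
static void
toy_dump_flag_bits (int flags, char *buf, size_t size)
{
  assert (size > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < sizeof toy_flag_names / sizeof toy_flag_names[0]; i++)
    if (flags & toy_flag_names[i].mask)
      {
        size_t len = strlen (buf);
        snprintf (buf + len, size - len, "%s ", toy_flag_names[i].name);
      }
}
```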



[PATCH] tree-optimization/102318 - reduction epilogue re-use

2021-09-15 Thread Richard Biener via Gcc-patches
This refines the fix for PR102226 to do the mode conversion
from V2DI to VNx2DI separately from the sign-conversion, retaining
the signedness of the saved accumulator as before the original fix.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-09-15  Richard Biener 

PR tree-optimization/102318
* tree-vect-loop.c (vect_transform_cycle_phi): Revert
previous change and do the mode conversion separately from
the sign conversion.

* gcc.dg/vect/pr102318.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr102318.c | 21 +
 gcc/tree-vect-loop.c | 13 +++--
 2 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr102318.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr102318.c b/gcc/testsuite/gcc.dg/vect/pr102318.c
new file mode 100644
index 000..cc58efacecd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr102318.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+
+void
+vec_slp_int16_t (short int *restrict a, short int *restrict b, int n)
+{
+  short int x0 = b[0];
+  short int x1 = b[1];
+  short int x2 = b[2];
+  short int x3 = b[3];
+  for (int i = 0; i < n; ++i)
+  {
+x0 += a[i * 4];
+x1 += a[i * 4 + 1];
+x2 += a[i * 4 + 2];
+x3 += a[i * 4 + 3];
+  }
+  b[0] = x0;
+  b[1] = x1;
+  b[2] = x2;
+  b[3] = x3;
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c9dcc647d2c..5a5b8da2e77 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7755,11 +7755,20 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
  (reduc_info),
);
}
- if (!useless_type_conversion_p (vectype_out, TREE_TYPE (def)))
-   def = gimple_convert (, vectype_out, def);
+ /* The epilogue loop might use a different vector mode, like
+VNx2DI vs. V2DI.  */
+ if (TYPE_MODE (vectype_out) != TYPE_MODE (TREE_TYPE (def)))
+   {
+ tree reduc_type = build_vector_type_for_mode
+   (TREE_TYPE (TREE_TYPE (def)), TYPE_MODE (vectype_out));
+ def = gimple_convert (, reduc_type, def);
+   }
  /* Adjust the input so we pick up the partially reduced value
 for the skip edge in vect_create_epilog_for_reduction.  */
  accumulator->reduc_input = def;
+ /* And the reduction could be carried out using a different sign.  */
+ if (!useless_type_conversion_p (vectype_out, TREE_TYPE (def)))
+   def = gimple_convert (, vectype_out, def);
  if (loop_vinfo->main_loop_edge)
{
  /* While we'd like to insert on the edge this will split
-- 
2.31.1
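The new testcase is compile-only. If one also wanted to check the semantics of the kernel it exercises, a runtime driver along the following lines could be used. It is a hypothetical addition, not part of the committed test; the expected values are simply the scalar sums of the four interleaved int16 accumulators.

```c
#include <assert.h>

/* Same kernel as gcc.dg/vect/pr102318.c: four interleaved int16
   accumulators, a candidate for an SLP reduction.  */
void
vec_slp_int16_t (short int *restrict a, short int *restrict b, int n)
{
  short int x0 = b[0];
  short int x1 = b[1];
  short int x2 = b[2];
  short int x3 = b[3];
  for (int i = 0; i < n; ++i)
    {
      x0 += a[i * 4];
      x1 += a[i * 4 + 1];
      x2 += a[i * 4 + 2];
      x3 += a[i * 4 + 3];
    }
  b[0] = x0;
  b[1] = x1;
  b[2] = x2;
  b[3] = x3;
}
```

With `a = {1, ..., 8}`, `b = {10, 20, 30, 40}` and `n = 2`, the results are `{16, 28, 40, 52}`.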


Re: [PATCH][RFC] pru: Named address space for R30/R31 I/O access

2021-09-15 Thread Richard Biener via Gcc-patches
On Tue, Sep 14, 2021 at 11:13 PM Dimitar Dimitrov  wrote:
>
> Hi,
>
> I'm sending this patch to get feedback for a new PRU CPU port feature.
> My intention is to push it to master by end of September, so that it gets
> included in GCC 12.
>
> The PRU architecture provides single-cycle access to GPIO pins via
> special designated CPU registers - R30 and R31. These two registers can
> of course be accessed in C code using inline assembly, but that can be
> intimidating to users.
>
> The TI proprietary compiler [1] can expose these I/O registers as global
> volatile registers:
>   volatile register unsigned int __R31;
>
> Consequently, accessing them in user programs is as straightforward as
> using a regular global variable:
>   __R31 |= (1 << 2);
>
> Unfortunately, global volatile registers are not supported by GCC [2].

Yes, a "register" write or read does not follow volatile semantics, so
exposing those as registers isn't supported (I consider the GPIO regs
similar to MSRs on other CPUs?).

> I decided to implement convenient access to __R30 and __R31 using a new
> named address space:
>   extern volatile __regio_symbol unsigned int __R30;
>
> Unlike global registers, volatile global memory variables are well
> supported in GCC.  Memory writes and reads to the __regio_symbol address
> space are converted to writes and reads to R30 and R31 CPU registers.
> The declared variable name determines which of the two registers it is
> representing.

I think that's reasonable.  I do wonder whether it's possible to prevent
taking the address of __R30 though - otherwise I guess the backend
will crash or do weird things on such code?

> With an ifdef for the __R30/__R31 declarations, user programs can now
> be source-compatible with both TI and GCC toolchains.
>
> [1] https://www.ti.com/lit/ug/spruhv7c/spruhv7c.pdf, "Global Register Variables"
> [2] https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02241.html
>
> gcc/ChangeLog:
>
> * config/pru/constraints.md (Rrio): New constraint.
> * config/pru/predicates.md (regio_operand): New predicate.
> * config/pru/pru-pragma.c (pru_register_pragmas): Register
> the __regio_symbol address space.
> * config/pru/pru-protos.h (pru_symref2ioregno): Declaration.
> * config/pru/pru.c (pru_symref2ioregno): New helper function.
> (pru_legitimate_address_p): Remove.
> (pru_addr_space_legitimate_address_p): Use the address space
> aware hook variant.
> (pru_nongeneric_pointer_addrspace): New helper function.
> (pru_insert_attributes): New function to validate __regio_symbol
> usage.
> (TARGET_INSERT_ATTRIBUTES): New macro.
> (TARGET_LEGITIMATE_ADDRESS_P): Remove.
> (TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P): New macro.
> * config/pru/pru.h (enum reg_class): Add REGIO_REGS class.
> * config/pru/pru.md (*regio_readsi): New pattern to read I/O
> registers.
> (*regio_nozext_writesi): New pattern to write to I/O registers.
> (*regio_zext_write_r30): Ditto.
> * doc/extend.texi: Document the new PRU Named Address Space.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/pru/regio-as-pointer.c: New negative test.
> * gcc.target/pru/regio-as-pointer2.c: New negative test.
> * gcc.target/pru/regio-decl-2.c: New negative test.
> * gcc.target/pru/regio-decl-3.c: New negative test.
> * gcc.target/pru/regio-decl-4.c: New negative test.
> * gcc.target/pru/regio-decl.c: New negative test.
> * gcc.target/pru/regio-di.c: New negative test.
> * gcc.target/pru/regio-hi.c: New negative test.
> * gcc.target/pru/regio-qi.c: New negative test.
> * gcc.target/pru/regio.c: New test.
> * gcc.target/pru/regio.h: New helper header.
>
> Signed-off-by: Dimitar Dimitrov 
> ---
>  gcc/config/pru/constraints.md |   5 +
>  gcc/config/pru/predicates.md  |  19 +++
>  gcc/config/pru/pru-pragma.c   |   2 +
>  gcc/config/pru/pru-protos.h   |   3 +
>  gcc/config/pru/pru.c  | 155 +-
>  gcc/config/pru/pru.h  |   5 +
>  gcc/config/pru/pru.md | 102 +++-
>  gcc/doc/extend.texi   |  19 ++-
>  .../gcc.target/pru/regio-as-pointer.c |  11 ++
>  .../gcc.target/pru/regio-as-pointer2.c|  11 ++
>  gcc/testsuite/gcc.target/pru/regio-decl-2.c   |  13 ++
>  gcc/testsuite/gcc.target/pru/regio-decl-3.c   |  19 +++
>  gcc/testsuite/gcc.target/pru/regio-decl-4.c   |  17 ++
>  gcc/testsuite/gcc.target/pru/regio-decl.c |  15 ++
>  gcc/testsuite/gcc.target/pru/regio-di.c   |   9 +
>  gcc/testsuite/gcc.target/pru/regio-hi.c   |   9 +
>  gcc/testsuite/gcc.target/pru/regio-qi.c   |   9 +
>  gcc/testsuite/gcc.target/pru/regio.c  |  58 +++
>  gcc/testsuite/gcc.target/pru/regio.h  

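The RFC mentions using an ifdef so that the same source builds with both toolchains. One possible shape for such a header is sketched below. It assumes the TI compiler is detected through its predefined `__TI_COMPILER_VERSION__` macro and that GCC accepts the proposed `__regio_symbol` address space; the header name and `pru_gpo_set` are hypothetical. It compiles only with a PRU toolchain.

```c
/* pru_io.h: hypothetical portability header for R30/R31 access.  */
#ifndef PRU_IO_H
#define PRU_IO_H

#if defined (__TI_COMPILER_VERSION__)
/* TI compiler: global volatile register variables.  */
volatile register unsigned int __R30;
volatile register unsigned int __R31;
#else
/* GCC with the proposed __regio_symbol named address space.  */
extern volatile __regio_symbol unsigned int __R30;
extern volatile __regio_symbol unsigned int __R31;
#endif

/* Set output bit N; on typical PRU setups R30 drives the GPO pins.  */
static inline void
pru_gpo_set (unsigned int n)
{
  __R30 |= (1u << n);
}

#endif /* PRU_IO_H */
```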
Re: [PATCH] libstdc++-v3: Check for TLS support on mingw

2021-09-15 Thread Jonathan Wakely via Gcc-patches
On Wed, 1 Sept 2021 at 10:52, Jonathan Wakely  wrote:
>
> On Wed, 1 Sept 2021 at 02:44, Jonathan Yong <10wa...@gmail.com> wrote:
> >
> > On 8/31/21 9:02 AM, Jonathan Wakely wrote:
> > > It looks like my questions about this patch never got an answer, and
> > > it never got applied.
> > >
> > > Could somebody say whether TLS is enabled for native *-*-mingw*
> > > builds? If it is, then we definitely need to add GCC_CHECK_TLS to the
> > > cross-compiler config too.
> > >
> > > For a linux-hosted x86_64-w64-mingw32 cross compiler I see TLS is not enabled:
> > >
> > > /* Define to 1 if the target supports thread-local storage. */
> > > /* #undef _GLIBCXX_HAVE_TLS */
> > >
> > >
> > >
> > >
> > > On Mon, 19 Feb 2018 at 08:59, Hugo Beauzée-Luyssen wrote:
> > >>
> > >> libstdc++-v3: Check for TLS support on mingw
> > >>
> > >> 2018-02-16  Hugo Beauzée-Luyssen  
> > >>
> > >>  * crossconfig.m4: Check for TLS support on mingw.
> > >>  * configure: Regenerate.
> > >>
> > >> Index: libstdc++-v3/crossconfig.m4
> > >> ===
> > >> --- libstdc++-v3/crossconfig.m4 (revision 257730)
> > >> +++ libstdc++-v3/crossconfig.m4 (working copy)
> > >> @@ -197,6 +197,7 @@ case "${host}" in
> > >>   GLIBCXX_CHECK_LINKER_FEATURES
> > >>   GLIBCXX_CHECK_MATH_SUPPORT
> > >>   GLIBCXX_CHECK_STDLIB_SUPPORT
> > >> +GCC_CHECK_TLS
> > >>   ;;
> > >> *-netbsd*)
> > >>   SECTION_FLAGS='-ffunction-sections -fdata-sections'
> >
> > According to MSYS2 native from
> > https://mirror.msys2.org/mingw/ucrt64/mingw-w64-ucrt-x86_64-gcc-10.3.0-5-any.pkg.tar.zst:
> >
> > x86_64-w64-mingw32/bits/c++config.h:#define _GLIBCXX_HAVE_TLS 1
> >
> > So yes.
>
> Thanks! I'll test the patch on a cross-compiler and apply it soon then.
>
> (Thanks also to LH for the answer)

I've pushed this to trunk now. Thanks for the patch, Hugo. Sorry it
took so long to process.
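For reference, the kind of program a configure-time TLS check compiles and links looks roughly like the following. This is a sketch in the spirit of `GCC_CHECK_TLS`, not its exact test. It uses GCC's `__thread` keyword; successfully building it for the target indicates compiler and linker support for thread-local storage.

```c
#include <assert.h>

/* Each thread gets its own copy of this counter.  */
static __thread int tls_counter;

/* Increment and return the calling thread's counter.  */
static int
tls_bump (void)
{
  return ++tls_counter;
}
```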


[PATCH] rs6000: Parameterize some const values for density test

2021-09-15 Thread Kewen.Lin via Gcc-patches
Hi,

This patch follows the discussion here[1], where Segher suggested
parameterizing the magic constants used by the density heuristics
to make them easier to tweak if needed.

Since these heuristics are quite internal, I made these parameters
undocumented; they are mainly intended for developers.

The change here should be "No Functional Change".  Nevertheless, I verified
it with SPEC2017 using the O2-vect and Ofast-unroll option sets on Power8;
the result is neutral, as expected.

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/rs6000.opt (rs6000-density-pct-threshold,
rs6000-density-size-threshold, rs6000-density-penalty,
rs6000-density-load-pct-threshold,
rs6000-density-load-num-threshold): New parameters.
* config/rs6000/rs6000.c (rs6000_density_test): Adjust with
corresponding parameters.

---
 gcc/config/rs6000/rs6000.c   | 22 +++---
 gcc/config/rs6000/rs6000.opt | 21 +
 2 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9bc826e3a50..4ab23b0ab33 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5284,9 +5284,6 @@ struct rs6000_cost_data
 static void
 rs6000_density_test (rs6000_cost_data *data)
 {
-  const int DENSITY_PCT_THRESHOLD = 85;
-  const int DENSITY_SIZE_THRESHOLD = 70;
-  const int DENSITY_PENALTY = 10;
   struct loop *loop = data->loop_info;
   basic_block *bbs = get_loop_body (loop);
   int nbbs = loop->num_nodes;
@@ -5322,26 +5319,21 @@ rs6000_density_test (rs6000_cost_data *data)
   free (bbs);
   density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);

-  if (density_pct > DENSITY_PCT_THRESHOLD
-  && vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
+  if (density_pct > rs6000_density_pct_threshold
+  && vec_cost + not_vec_cost > rs6000_density_size_threshold)
 {
-  data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
+  data->cost[vect_body] = vec_cost * (100 + rs6000_density_penalty) / 100;
   if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "density %d%%, cost %d exceeds threshold, penalizing "
-"loop body cost by %d%%\n", density_pct,
-vec_cost + not_vec_cost, DENSITY_PENALTY);
+"loop body cost by %u%%\n", density_pct,
+vec_cost + not_vec_cost, rs6000_density_penalty);
 }

   /* Check whether we need to penalize the body cost to account
  for excess strided or elementwise loads.  */
   if (data->extra_ctor_cost > 0)
 {
-  /* Threshold for load stmts percentage in all vectorized stmts.  */
-  const int DENSITY_LOAD_PCT_THRESHOLD = 45;
-  /* Threshold for total number of load stmts.  */
-  const int DENSITY_LOAD_NUM_THRESHOLD = 20;
-
   gcc_assert (data->nloads <= data->nstmts);
   unsigned int load_pct = (data->nloads * 100) / data->nstmts;

@@ -5355,8 +5347,8 @@ rs6000_density_test (rs6000_cost_data *data)
  the loads.
 One typical case is the innermost loop of the hotspot of SPEC2017
 503.bwaves_r without loop interchange.  */
-  if (data->nloads > DENSITY_LOAD_NUM_THRESHOLD
- && load_pct > DENSITY_LOAD_PCT_THRESHOLD)
+  if (data->nloads > (unsigned int) rs6000_density_load_num_threshold
+ && load_pct > (unsigned int) rs6000_density_load_pct_threshold)
{
  data->cost[vect_body] += data->extra_ctor_cost;
  if (dump_enabled_p ())
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0538db387dc..563983f3269 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -639,3 +639,24 @@ Enable instructions that guard against return-oriented programming attacks.
 mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
+
+-param=rs6000-density-pct-threshold=
+Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 99) Param
+When costing for loop vectorization, we probably need to penalize the loop body cost if the existing cost model may not adequately reflect delays from unavailable vector resources.  We collect the cost for vectorized statements and non-vectorized statements separately, check the proportion of vec_cost to total cost of vec_cost and non vec_cost, and penalize only if the proportion exceeds the threshold specified by this parameter.  The default value is 85.
+
+-param=rs6000-density-size-threshold=
+Target Undocumented Joined UInteger Var(rs6000_density_size_threshold) Init(70) IntegerRange(0, 99) Param
+Like parameter rs6000-density-pct-threshold, we also check the total sum of vec_cost and non vec_cost, and penalize only if

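The logic that these parameters control can be sketched in standalone C as follows. The defaults in the comments come from the constants the patch removes. `struct toy_density_params` and `toy_density_body_cost` are hypothetical; in GCC the values live in the `Var()` globals declared in rs6000.opt, and the zero-total guard is the sketch's own addition.

```c
#include <assert.h>

/* Hypothetical bundle of the new parameters (defaults in parentheses).  */
struct toy_density_params
{
  int pct_threshold;       /* rs6000-density-pct-threshold (85).  */
  int size_threshold;      /* rs6000-density-size-threshold (70).  */
  int penalty;             /* rs6000-density-penalty (10).  */
  int load_pct_threshold;  /* rs6000-density-load-pct-threshold (45).  */
  int load_num_threshold;  /* rs6000-density-load-num-threshold (20).  */
};

/* Return the adjusted vector body cost, following the shape of
   rs6000_density_test: a percentage penalty for dense loops, then an
   extra charge when there are many strided or elementwise loads.  */
static int
toy_density_body_cost (const struct toy_density_params *p, int body_cost,
                       int vec_cost, int not_vec_cost, unsigned nloads,
                       unsigned nstmts, int extra_ctor_cost)
{
  int total = vec_cost + not_vec_cost;
  if (total > 0)  /* Guard added for the sketch.  */
    {
      int density_pct = (vec_cost * 100) / total;
      if (density_pct > p->pct_threshold && total > p->size_threshold)
        body_cost = vec_cost * (100 + p->penalty) / 100;
    }
  if (extra_ctor_cost > 0 && nstmts > 0)
    {
      assert (nloads <= nstmts);
      unsigned load_pct = (nloads * 100) / nstmts;
      if (nloads > (unsigned) p->load_num_threshold
          && load_pct > (unsigned) p->load_pct_threshold)
        body_cost += extra_ctor_cost;
    }
  return body_cost;
}
```

With the defaults, a loop with vec_cost 90 and not_vec_cost 10 (density 90%, total 100) gets its body cost raised to 99.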
Re: [PATCH V2] Set bound/cmp/control for until wrap loop.

2021-09-15 Thread Jiufu Guo via Gcc-patches
Jiufu Guo  writes:

I'd like to gently ping this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578680.html

BR,
Jiufu

> Changes from V1:
> * Add more test cases
> * Add a comment for the exit-condition transform
> * Remove the duplicate setting of niter->control
>
> This patch resets niter->control, niter->bound and niter->cmp in
> number_of_iterations_until_wrap.
>
> Bootstrap and regtest pass on ppc64 and x86, and the test cases
> in the PR pass.  Is this OK for trunk?
>
> Note that in this patch the IV base is still biased by one step.
>
>
> BR.
> Jiufu Guo
>
> gcc/ChangeLog:
>
> 2021-09-02  Jiufu Guo  
>
> PR tree-optimization/102087
> * tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
> Update bound/cmp/control for niter.
>
> gcc/testsuite/ChangeLog:
>
> 2021-09-02  Jiufu Guo  
>
> PR tree-optimization/102087
> * gcc.dg/pr102087.c: New test.
>
> ---
>  gcc/tree-ssa-loop-niter.c   | 16 ++-
>  gcc/testsuite/gcc.dg/pr102087.c | 35 +
>  2 files changed, 50 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr102087.c
>
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index 7af92d1c893..75109407124 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0,
>affine_iv *iv1, class tree_niter_desc *niter)
>  {
>tree niter_type = unsigned_type_for (type);
> -  tree step, num, assumptions, may_be_zero;
> +  tree step, num, assumptions, may_be_zero, span;
>wide_int high, low, max, min;
>  
>may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, iv1->base, iv0->base);
> @@ -1557,6 +1557,20 @@ number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0,
>  
>niter->control.no_overflow = false;
>  
> +  /* Update bound and exit condition as:
> + bound = niter * STEP + (IVbase - STEP).
> + { IVbase - STEP, +, STEP } != bound
> + Here, biasing IVbase by 1 step makes 'bound' be the value before wrap.
> + */
> +  niter->control.base = fold_build2 (MINUS_EXPR, niter_type,
> +  niter->control.base, niter->control.step);
> +  span = fold_build2 (MULT_EXPR, niter_type, niter->niter,
> +   fold_convert (niter_type, niter->control.step));
> +  niter->bound = fold_build2 (PLUS_EXPR, niter_type, span,
> +   fold_convert (niter_type, niter->control.base));
> +  niter->bound = fold_convert (type, niter->bound);
> +  niter->cmp = NE_EXPR;
> +
>return true;
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/pr102087.c b/gcc/testsuite/gcc.dg/pr102087.c
> new file mode 100644
> index 000..fc60cbda066
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr102087.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +unsigned __attribute__ ((noinline))
> +foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
> +{
> +  while (n < ++l)
> +*a++ = *b++ + 1;
> +  return l;
> +}
> +
> +volatile int a[1];
> +unsigned b;
> +int c;
> +
> +int
> +check ()
> +{
> +  int d;
> +  for (; b > 1; b++)
> +for (c = 0; c < 2; c++)
> +  for (d = 0; d < 2; d++)
> + a[0];
> +  return 0;
> +}
> +
> +char **Gif_ClipImage_gfi_0;
> +int Gif_ClipImage_y, Gif_ClipImage_shift;
> +void
> +Gif_ClipImage ()
> +{
> +  for (; Gif_ClipImage_y >= Gif_ClipImage_shift; Gif_ClipImage_y++)
> +Gif_ClipImage_gfi_0[Gif_ClipImage_shift]
> +  = Gif_ClipImage_gfi_0[Gif_ClipImage_y];
> +}

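The new exit test can be checked on a small model. The sketch below assumes an 8-bit unsigned IV purely for readability (the helper name is hypothetical): it starts the IV at IVbase - STEP, computes bound = niter * STEP + (IVbase - STEP) with wrapping arithmetic, and counts how many steps the IV needs to reach bound under the != test. It also assumes niter * STEP fits in the type, which holds when niter counts the iterations before the wrap.

```c
#include <stdint.h>

/* Count the trips of { IVbase - STEP, +, STEP } != bound, where
   bound = niter * STEP + (IVbase - STEP), in 8-bit wrapping arithmetic.
   Precondition (assumed): niter * step < 256.  */
static unsigned
count_until_bound (uint8_t base, uint8_t step, uint8_t niter)
{
  uint8_t biased = (uint8_t) (base - step);          /* IVbase - STEP */
  uint8_t bound = (uint8_t) (niter * step + biased); /* value before wrap */
  unsigned trips = 0;

  for (uint8_t iv = biased; iv != bound; iv = (uint8_t) (iv + step))
    trips++;
  return trips;
}
```

For example, with base 1 and step 3 the biased start wraps to 254, yet after five steps the IV lands exactly on the bound (13), so the loop runs five times.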

[PATCH] Loop unswitching: support gswitch statements.

2021-09-15 Thread Martin Liška

Hello.

The patch extends the loop unswitching pass so that gswitch
statements are supported. The pass now uses ranger which marks
switch edges that are known to be unreachable in a versioned loop.
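As a source-level analogy (the pass itself works on GIMPLE, and the function names here are hypothetical), unswitching a loop on a switch amounts to the following transformation: each case becomes a loop version guarded by a test hoisted out of the loop.

```c
/* Loop with a switch on the loop-invariant ORDER: the dispatch is
   repeated on every iteration.  */
static void
combine_switch (const int *a, const int *b, int *r, int n, int order)
{
  for (int i = 0; i < n; i++)
    switch (order)
      {
      case 0:
        r[i] = -8 * a[i];
        break;
      case 1:
        r[i] = 3 * a[i] - 2 * b[i];
        break;
      default:
        r[i] = a[i] + b[i];
        break;
      }
}

/* The same computation unswitched by hand: the tests on ORDER are
   hoisted out of the loop and each version has a straight-line body.  */
static void
combine_unswitched (const int *a, const int *b, int *r, int n, int order)
{
  if (order == 0)
    for (int i = 0; i < n; i++)
      r[i] = -8 * a[i];
  else if (order == 1)
    for (int i = 0; i < n; i++)
      r[i] = 3 * a[i] - 2 * b[i];
  else
    for (int i = 0; i < n; i++)
      r[i] = a[i] + b[i];
}
```

In each loop version the other cases are unreachable, which is the information the ranger-based cleanup uses to simplify the versioned switch.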

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* tree-cfg.c (gimple_lv_add_condition_to_bb): Support non-gimple
expressions that need to be gimplified.
* tree-ssa-loop-unswitch.c (tree_unswitch_loop): Add new
cond_edge parameter.
(tree_may_unswitch_on): Support gswitch statements.
(clean_up_switches): New function.
(tree_ssa_unswitch_loops): Call clean_up_switches.
(simplify_using_entry_checks): Removed and replaced with ranger.
(tree_unswitch_single_loop): Change assumptions.

gcc/testsuite/ChangeLog:

* gcc.dg/loop-unswitch-6.c: New test.
* gcc.dg/loop-unswitch-7.c: New test.
* gcc.dg/loop-unswitch-8.c: New test.
* gcc.dg/loop-unswitch-9.c: New test.

Co-Authored-By: Richard Biener 
---
 gcc/testsuite/gcc.dg/loop-unswitch-6.c |  56 +
 gcc/testsuite/gcc.dg/loop-unswitch-7.c |  45 
 gcc/testsuite/gcc.dg/loop-unswitch-8.c |  28 +++
 gcc/testsuite/gcc.dg/loop-unswitch-9.c |  34 +++
 gcc/tree-cfg.c |   7 +-
 gcc/tree-ssa-loop-unswitch.c   | 284 ++---
 6 files changed, 374 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-6.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-7.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-8.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-9.c

diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-6.c b/gcc/testsuite/gcc.dg/loop-unswitch-6.c
new file mode 100644
index 000..8a022e0f200
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-6.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details --param=max-unswitch-insns=1000 --param=max-unswitch-level=10" } */
+
+int
+__attribute__((noipa))
+foo(double *a, double *b, double *c, double *d, double *r, int size, int order)
+{
+  for (int i = 0; i < size; i++)
+  {
+double tmp, tmp2;
+
+switch(order)
+{
+  case 0:
+tmp = -8 * a[i];
+tmp2 = 2 * b[i];
+break;
+  case 1:
+tmp = 3 * a[i] -  2 * b[i];
+tmp2 = 5 * b[i] - 2 * c[i];
+break;
+  case 2:
+tmp = 9 * a[i] +  2 * b[i] + c[i];
+tmp2 = 4 * b[i] + 2 * c[i] + 8 * d[i];
+break;
+  case 3:
+tmp = 3 * a[i] +  2 * b[i] - c[i];
+tmp2 = b[i] - 2 * c[i] + 8 * d[i];
+break;
+  default:
+__builtin_unreachable ();
+}
+
+double x = 3 * tmp + d[i] + tmp;
+double y = 3.4f * tmp + d[i] + tmp2;
+r[i] = x + y;
+  }
+
+  return 0;
+}
+
+#define N 16 * 1024
+double aa[N], bb[N], cc[N], dd[N], rr[N];
+
+int main()
+{
+  for (int i = 0; i < 100 * 1000; i++)
+foo (aa, bb, cc, dd, rr, N, i % 4);
+}
+
+
+/* Test that we actually unswitched something.  */
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* == 0" "unswitch" } } */
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* == 1" "unswitch" } } */
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* == 2" "unswitch" } } */
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* == 3" "unswitch" } } */
diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-7.c b/gcc/testsuite/gcc.dg/loop-unswitch-7.c
new file mode 100644
index 000..00f2fcff64b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-7.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details --param=max-unswitch-insns=1000 --param=max-unswitch-level=10" } */
+
+int
+foo(double *a, double *b, double *c, double *d, double *r, int size, int order)
+{
+  for (int i = 0; i < size; i++)
+  {
+double tmp, tmp2;
+
+switch(order)
+{
+  case 5 ... 6:
+  case 9:
+tmp = -8 * a[i];
+tmp2 = 2 * b[i];
+break;
+  case 11:
+tmp = 3 * a[i] -  2 * b[i];
+tmp2 = 5 * b[i] - 2 * c[i];
+break;
+  case 22:
+tmp = 9 * a[i] +  2 * b[i] + c[i];
+tmp2 = 4 * b[i] + 2 * c[i] + 8 * d[i];
+break;
+  case 33:
+tmp = 3 * a[i] +  2 * b[i] - c[i];
+tmp2 = b[i] - 2 * c[i] + 8 * d[i];
+break;
+  default:
+__builtin_unreachable ();
+}
+
+double x = 3 * tmp + d[i] + tmp;
+double y = 3.4f * tmp + d[i] + tmp2;
+r[i] = x + y;
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order_.* >= 5 & order_.* <= 6 | order_.* == 9" "unswitch" } } */
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* == 1" "unswitch" } } */
+/* { dg-final { scan-tree-dump 

PING^1 [PATCH] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-09-15 Thread Kewen.Lin via Gcc-patches
Hi!

Gentle ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html

BR,
Kewen

on 2021/9/1 2:55 PM, Kewen.Lin via Gcc-patches wrote:
> Hi!
> 
> This patch fixes the inconsistent behaviors between non-LTO mode
> and LTO mode.  As Martin pointed out, the function
> rs6000_can_inline_p currently treats the callee as inlinable whenever
> callee_tree is NULL, which is wrong: we should use the command line
> options from target_option_default_node as the default.  It also
> replaces rs6000_isa_flags with the flags from target_option_default_node
> when caller_tree is NULL, since rs6000_isa_flags may have changed
> since initialization.
> 
> It also extends the scope of the check to the case where the callee
> has explicitly set options; for test case pr102059-2.c, inlining
> could previously happen unexpectedly, and that is now fixed.
> 
> As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION
> can be neglected for inlining, so this patch also excludes them when
> the callee has the always_inline attribute.
> 
> Bootstrapped and regtested on powerpc64le-linux-gnu Power9.
> 
> BR,
> Kewen
> -
> gcc/ChangeLog:
> 
>   PR ipa/102059
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
>   target_option_default_node and consider always_inline_safe flags.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR ipa/102059
>   * gcc.target/powerpc/pr102059-1.c: New test.
>   * gcc.target/powerpc/pr102059-2.c: New test.
>   * gcc.target/powerpc/pr102059-3.c: New test.
>   * gcc.target/powerpc/pr102059-4.c: New test.
> 

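The inlining rule described above boils down to a subset test on ISA flags. The following is a simplified sketch, not rs6000_can_inline_p itself: the mask names and bit values are hypothetical, a NULL flag pointer stands in for a function without target options (so DEFAULT_FLAGS, playing the role of target_option_default_node, is used), and only the always_inline exclusion of tuning flags such as MASK_P8_FUSION is modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask bits standing in for rs6000 ISA/tuning flags.  */
#define SKETCH_MASK_VSX       (UINT64_C (1) << 0)
#define SKETCH_MASK_P9_VECTOR (UINT64_C (1) << 1)
#define SKETCH_MASK_P8_FUSION (UINT64_C (1) << 2) /* tuning-only flag */

/* Flags that may differ between caller and an always_inline callee.  */
static const uint64_t always_inline_safe_mask = SKETCH_MASK_P8_FUSION;

static bool
can_inline_sketch (const uint64_t *caller_flags, const uint64_t *callee_flags,
                   uint64_t default_flags, bool callee_always_inline)
{
  /* Without explicit target options, fall back to the defaults
     instead of assuming the callee is always inlinable.  */
  uint64_t caller = caller_flags ? *caller_flags : default_flags;
  uint64_t callee = callee_flags ? *callee_flags : default_flags;

  if (callee_always_inline)
    {
      caller &= ~always_inline_safe_mask;
      callee &= ~always_inline_safe_mask;
    }

  /* Inlinable only if every feature the callee needs is enabled in
     the caller.  */
  return (callee & caller) == callee;
}
```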


PING^1 [PATCH] rs6000: Remove useless toc-fusion option

2021-09-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578553.html


BR,
Kewen

on 2021/9/1 2:56 PM, Kewen.Lin via Gcc-patches wrote:
> Hi!
> 
> Option toc-fusion was previously intended for Power9 TOC fusion,
> but Power9 ended up not supporting fusion at all, so this patch
> removes the now-useless option.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.opt (-mtoc-fusion): Remove.
> 





Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.

2021-09-15 Thread Robin Dapp via Gcc-patches

Hi Richard,


Don't we still need this code (without the REG_DEAD handling) for the
case in which…


+  /* As we are transforming
+if (x > y)
+  {
+a = b;
+c = d;
+  }
+into
+  a = (x > y) ...
+  c = (x > y) ...
+
+we potentially check x > y before every set.
+Even though the check might be removed by subsequent passes, this means
+that we cannot transform
+  if (x > y)
+{
+  x = y;
+  ...
+}
+into
+  x = (x > y) ...
+  ...
+since this would invalidate x and the following to-be-removed checks.
+Therefore we introduce a temporary every time we are about to
+overwrite a variable used in the check.  Costing of a sequence with
+these is going to be inaccurate so only use temporaries when
+needed.  */
+  if (reg_overlap_mentioned_p (target, cond))
+   temp = gen_reg_rtx (GET_MODE (target));


…this code triggers?  I don't see otherwise how later uses of x would
pick up “temp” instead of the original target.  E.g. suppose we had:

 if (x > y)
   {
 x = …;
 z = x; // x does not die here
   }

Without the loop, it looks like z would pick up the old value of x
(used in the comparison) instead of the new one.
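Richard's example can be modeled in C, using the ternary operator as a stand-in for a conditional move and assuming the temporaries are copied back to their targets only after all the selects. The function names are hypothetical. Without rewiring, the read of x in z = x sees the old value; rewiring it to the temporary reproduces the branchy result.

```c
/* Reference semantics:  if (x > y) { x = v; z = x; }  */
static void
branchy (int *x, int y, int v, int *z)
{
  if (*x > y)
    {
      *x = v;
      *z = *x;
    }
}

/* x appears in the condition, so its select writes a temporary, but
   the later read of x is NOT rewired: z picks up the old x.  */
static void
temp_without_rewire (int *x, int y, int v, int *z)
{
  int temp = (*x > y) ? v : *x;
  *z = (*x > y) ? *x : *z;
  *x = temp; /* copy back after all selects */
}

/* Same, but the read of x is rewired to the temporary.  */
static void
temp_with_rewire (int *x, int y, int v, int *z)
{
  int temp = (*x > y) ? v : *x;
  *z = (*x > y) ? temp : *z;
  *x = temp; /* copy back after all selects */
}
```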


Getting back to this now.  I re-added handling of the situation you
mentioned (even though I didn't manage to trigger it myself).


Regards
 Robin
commit 2d909ee93ee1eb0f7474ed57581713367c22ba6c
Author: Robin Dapp 
Date:   Thu Jun 24 16:40:04 2021 +0200

ifcvt: Check if cmovs are needed.

When if-converting multiple SETs and encountering a swap-style idiom

  if (a > b)
{
  tmp = c;   // [1]
  c = d;
  d = tmp;
}

ifcvt should not generate a conditional move for the instruction at
[1].

To achieve that, this patch goes through all relevant SETs, marks
the instructions that need no conditional move and records which
sources have to be rewired.  This makes the cost evaluation more
accurate.

In addition, only generate temporaries if the current cmov is going to
overwrite one of the comparands of the initial compare.

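At the source level, the swap-style idiom and its if-converted form look roughly like this. It is only an analogy with hypothetical names, assuming tmp is dead after the block: the copy into tmp needs no conditional move of its own, and its later read is replaced by the original value of c, so only two selects remain.

```c
/* Reference semantics:  if (a > b) { tmp = c; c = d; d = tmp; }  */
static void
swap_branchy (int a, int b, int *c, int *d)
{
  if (a > b)
    {
      int tmp = *c;
      *c = *d;
      *d = tmp;
    }
}

/* If-converted analogy: no select for tmp = c; the read of tmp in
   d = tmp uses the value c had before the block.  */
static void
swap_selects (int a, int b, int *c, int *d)
{
  int cond = a > b;
  int old_c = *c; /* what the dead copy into tmp would have held */
  *c = cond ? *d : *c;
  *d = cond ? old_c : *d;
}
```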
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 017944f4f79..f1448667732 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -98,6 +98,8 @@ static int dead_or_predicable (basic_block, basic_block, basic_block,
 			   edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
+static void need_cmov_or_rewire (basic_block, hash_set *,
+ hash_map *);
 
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3203,6 +3205,11 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   auto_vec unmodified_insns;
   int count = 0;
 
+  hash_set need_no_cmov;
+  hash_map rewired_src;
+
+  need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
+
   FOR_BB_INSNS (then_bb, insn)
 {
   /* Skip over non-insns.  */
@@ -3213,26 +3220,47 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   gcc_checking_assert (set);
 
   rtx target = SET_DEST (set);
-  rtx temp = gen_reg_rtx (GET_MODE (target));
-  rtx new_val = SET_SRC (set);
+  rtx temp;
+
+  int *ii = rewired_src.get (insn);
+  rtx new_val = ii == NULL ? SET_SRC (set) : temporaries[*ii];
   rtx old_val = target;
 
-  /* If we were supposed to read from an earlier write in this block,
-	 we've changed the register allocation.  Rewire the read.  While
-	 we are looking, also try to catch a swap idiom.  */
-  for (int i = count - 1; i >= 0; --i)
-	if (reg_overlap_mentioned_p (new_val, targets[i]))
-	  {
-	/* Catch a "swap" style idiom.  */
-	if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
-	  /* The write to targets[i] is only live until the read
-		 here.  As the condition codes match, we can propagate
-		 the set to here.  */
-	  new_val = SET_SRC (single_set (unmodified_insns[i]));
-	else
-	  new_val = temporaries[i];
-	break;
-	  }
+  /* As we are transforming
+	 if (x > y)
+	   {
+	 a = b;
+	 c = d;
+	   }
+	 into
+	   a = (x > y) ...
+	   c = (x > y) ...
+
+	 we potentially check x > y before every set.
+	 Even though the check might be removed by subsequent passes, this means
+	 that we cannot transform
+	   if (x > y)
+	 {
+	   x = y;
+	   ...
+	 }
+	 into
+	   x = (x > y) ...
+	   ...
+	 since this would invalidate x and the following to-be-removed checks.
+	 Therefore we introduce a temporary every time we are about to
+	 overwrite a variable used in the check.  Costing of a sequence with
+	 these is going to be inaccurate so only use temporaries when
+	 needed.  */
+  if (reg_overlap_mentioned_p (target, cond))
+	temp = gen_reg_rtx (GET_MODE (target));
+  else
+	temp = target;
+
+  /* We have identified swap-style idioms in check_need_cmovs.  A normal
+	 set will need to be a cmov while the 

Re: [committed] Fortran: Add missing ST_OMP_END_SCOPE handling [PR102313]

2021-09-15 Thread Thomas Schwinge
Hi!

On 2021-09-14T14:25:20+0200, Tobias Burnus  wrote:
> I have created a testcase with all missing ST_OMP_END_* and ST_OACC_END_*;
> I am not quite sure why a different code path is triggered for some, but
> at least here is now a parse check for all.

At least the OpenACC one is explained easily:

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/unexpected-end.f90
> @@ -0,0 +1,23 @@
> +! PR fortran/102313
> +
> +!$acc end ATOMIC  ! { dg-error "Unexpected !.ACC END ATOMIC" }
> +
> +!$acc end DATA  ! { dg-error "Unexpected !.ACC END DATA" }
> +
> +!$acc end HOST DATA  ! { dg-error "Unclassifiable OpenACC directive" }

Pushed to master branch commit 8b69c481fc86e04c6c83f3a49eef2760c175a8f2
"Add OpenACC 'host_data' testing to 'gfortran.dg/goacc/unexpected-end.f90'",
see attached.


Regards
 Thomas


> +
> +!$acc end KERNELS  ! { dg-error "Unexpected !.ACC END KERNELS" }
> +
> +!$acc end KERNELS LOOP  ! { dg-error "Unexpected !.ACC END KERNELS LOOP" }
> +
> +!$acc end LOOP  ! { dg-error "Unexpected !.ACC END LOOP" }
> +
> +!$acc end PARALLEL  ! { dg-error "Unexpected !.ACC END PARALLEL" }
> +
> +!$acc end PARALLEL LOOP  ! { dg-error "Unexpected !.ACC END PARALLEL LOOP" }
> +
> +!$acc end SERIAL  ! { dg-error "Unexpected !.ACC END SERIAL" }
> +
> +!$acc end SERIAL LOOP  ! { dg-error "Unexpected !.ACC END SERIAL LOOP" }
> +
> +end
> diff --git a/gcc/testsuite/gfortran.dg/gomp/unexpected-end.f90 b/gcc/testsuite/gfortran.dg/gomp/unexpected-end.f90
> new file mode 100644
> index 000..d2e8daa3fde
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/gomp/unexpected-end.f90
> @@ -0,0 +1,123 @@
> +! PR fortran/102313
> +
> +!$omp end ATOMIC  ! { dg-error "Unexpected !.OMP END ATOMIC" }
> +
> +!$omp end CRITICAL  ! { dg-error "Unexpected !.OMP END CRITICAL" }
> +
> +!$omp end DISTRIBUTE  ! { dg-error "Unexpected !.OMP END DISTRIBUTE" }
> +
> +!$omp end DISTRIBUTE PARALLEL DO  ! { dg-error "Unexpected !.OMP END DISTRIBUTE PARALLEL DO" }
> +
> +!$omp end DISTRIBUTE PARALLEL DO SIMD  ! { dg-error "Unexpected !.OMP END DISTRIBUTE PARALLEL DO SIMD" }
> +
> +!$omp end DISTRIBUTE SIMD  ! { dg-error "Unexpected !.OMP END DISTRIBUTE SIMD" }
> +
> +!$omp end DO  ! { dg-error "Unexpected !.OMP END DO" }
> +
> +!$omp end DO SIMD  ! { dg-error "Unexpected !.OMP END DO SIMD" }
> +
> +!$omp end LOOP  ! { dg-error "Unclassifiable OpenMP directive" }
> +
> +!$omp parallel loop
> +do i = 1, 5
> +end do
> +!$omp end LOOP  ! { dg-error "Unclassifiable OpenMP directive" }
> +
> +!$omp end MASKED  ! { dg-error "Unexpected !.OMP END MASKED" }
> +
> +!$omp end MASKED TASKLOOP  ! { dg-error "Unexpected !.OMP END MASKED TASKLOOP" }
> +
> +!$omp end MASKED TASKLOOP SIMD  ! { dg-error "Unexpected !.OMP END MASKED TASKLOOP SIMD" }
> +
> +!$omp end MASTER  ! { dg-error "Unexpected !.OMP END MASTER" }
> +
> +!$omp end MASTER TASKLOOP  ! { dg-error "Unexpected !.OMP END MASTER TASKLOOP" }
> +
> +!$omp end MASTER TASKLOOP SIMD  ! { dg-error "Unexpected !.OMP END MASTER TASKLOOP SIMD" }
> +
> +!$omp end ORDERED  ! { dg-error "Unexpected !.OMP END ORDERED" }
> +
> +!$omp end PARALLEL  ! { dg-error "Unexpected !.OMP END PARALLEL" }
> +
> +!$omp end PARALLEL DO  ! { dg-error "Unexpected !.OMP END PARALLEL DO" }
> +
> +!$omp end PARALLEL DO SIMD  ! { dg-error "Unexpected !.OMP END PARALLEL DO SIMD" }
> +
> +!$omp loop
> +!$omp end PARALLEL LOOP  ! { dg-error "Unexpected junk" }
> +
> +!$omp end PARALLEL MASKED  ! { dg-error "Unexpected !.OMP END PARALLEL MASKED" }
> +
> +!$omp end PARALLEL MASKED TASKLOOP  ! { dg-error "Unexpected !.OMP END PARALLEL MASKED TASKLOOP" }
> +
> +!$omp end PARALLEL MASKED TASKLOOP SIMD  ! { dg-error "Unexpected !.OMP END PARALLEL MASKED TASKLOOP SIMD" }
> +
> +!$omp end PARALLEL MASTER  ! { dg-error "Unexpected !.OMP END PARALLEL MASTER" }
> +
> +!$omp end PARALLEL MASTER TASKLOOP  ! { dg-error "Unexpected !.OMP END PARALLEL MASTER TASKLOOP" }
> +
> +!$omp end PARALLEL MASTER TASKLOOP SIMD  ! { dg-error "Unexpected !.OMP END PARALLEL MASTER TASKLOOP SIMD" }
> +
> +!$omp end PARALLEL SECTIONS  ! { dg-error "Unexpected !.OMP END PARALLEL SECTIONS" }
> +
> +!$omp end PARALLEL WORKSHARE  ! { dg-error "Unexpected !.OMP END PARALLEL WORKSHARE" }
> +
> +!$omp end SCOPE  ! { dg-error "Unexpected !.OMP END SCOPE" }
> +
> +!$omp end SECTIONS  ! { dg-error "Unexpected !.OMP END SECTIONS" }
> +
> +!$omp end SIMD  ! { dg-error "Unexpected !.OMP END SIMD" }
> +
> +!$omp end SINGLE  ! { dg-error "Unexpected !.OMP END SINGLE" }
> +
> +!$omp end TARGET  ! { dg-error "Unexpected !.OMP END TARGET" }
> +
> +!$omp end TARGET DATA  ! { dg-error "Unexpected !.OMP END TARGET DATA" }
> +
> +!$omp end TARGET PARALLEL  ! { dg-error "Unexpected !.OMP END TARGET PARALLEL" }
> +
> +!$omp end TARGET PARALLEL DO  ! { dg-error "Unexpected !.OMP END TARGET PARALLEL DO" }
> +
> +!$omp end TARGET PARALLEL DO SIMD  ! { dg-error "Unexpected !.OMP END TARGET PARALLEL DO SIMD" }
> +
> +!$omp end 

aix: adjust installation directories for GCC64

2021-09-15 Thread CHIGOT, CLEMENT via Gcc-patches
As GCC for 64-bit AIX is built with "MULTILIB_MATCHES= .=maix32",
"-print-multi-directory" and similar flags don't return the
correct directory when used with -maix32: "." is returned instead
of "ppc32".
The libgcc installation script needs to be adjusted to bypass this
problem and correctly install 32-bit files in a ppc32 subdirectory.

libgcc/ChangeLog:
2021-09-03  Clément Chigot  

* config/rs6000/t-slibgcc-aix (SHLIB_INSTALL): Replace
"$(slibdir)@shlib_slibdir_qual@" by $(inst_libdir).


Please submit if accepted. 
Thanks, 

Clément

0001-aix-adjust-installation-directories-for-GCC64.patch
Description: 0001-aix-adjust-installation-directories-for-GCC64.patch




[PATCH 3/4] x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "H.J. Lu" 

Check TARGET_USE_VECTOR_FP_CONVERTS or TARGET_USE_VECTOR_CONVERTS when
handling avx_partial_xmm_update attribute.  Don't convert AVX partial
XMM register update if vector packed SSE conversion should be used.
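The new early-exit checks can be summarized as a predicate. This is an illustrative sketch with a hypothetical enum and function name; the two booleans stand in for TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS, and returning true means the insn is still rewritten to use a vxorps-zeroed register.

```c
#include <stdbool.h>

/* Hypothetical classification of the scalar conversion's source mode.  */
enum src_kind { SRC_SF, SRC_DF, SRC_SI, SRC_DI };

static bool
should_break_partial_dependency (enum src_kind src,
                                 bool use_vector_fp_converts,
                                 bool use_vector_converts)
{
  /* FP -> FP conversions are left to the packed-conversion splitters.  */
  if (use_vector_fp_converts && (src == SRC_SF || src == SRC_DF))
    return false;
  /* INT -> FP conversions likewise.  */
  if (use_vector_converts && (src == SRC_SI || src == SRC_DI))
    return false;
  return true;
}
```

This lines up with the new tests: with only one of the two tuning flags enabled, exactly one of the two conversions still gets a vxorps, and with both enabled none does.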

gcc/

PR target/101900
* config/i386/i386-features.c (remove_partial_avx_dependency):
Check TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS
before generating vxorps.

gcc/

PR target/101900
* testsuite/gcc.target/i386/pr101900-1.c: New test.
* testsuite/gcc.target/i386/pr101900-2.c: Likewise.
* testsuite/gcc.target/i386/pr101900-3.c: Likewise.
---
 gcc/config/i386/i386-features.c| 21 ++---
 gcc/testsuite/gcc.target/i386/pr101900-1.c | 18 ++
 gcc/testsuite/gcc.target/i386/pr101900-2.c | 18 ++
 gcc/testsuite/gcc.target/i386/pr101900-3.c | 19 +++
 4 files changed, 73 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-3.c

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index 5a99ea7c046..ae5ea02a002 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -2210,15 +2210,30 @@ remove_partial_avx_dependency (void)
  != AVX_PARTIAL_XMM_UPDATE_TRUE)
continue;
 
- if (!v4sf_const0)
-   v4sf_const0 = gen_reg_rtx (V4SFmode);
-
  /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF,
 SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
 vec_merge with subreg.  */
  rtx src = SET_SRC (set);
  rtx dest = SET_DEST (set);
  machine_mode dest_mode = GET_MODE (dest);
+ machine_mode src_mode;
+
+ if (TARGET_USE_VECTOR_FP_CONVERTS)
+   {
+ src_mode = GET_MODE (XEXP (src, 0));
+ if (src_mode == E_SFmode || src_mode == E_DFmode)
+   continue;
+   }
+
+ if (TARGET_USE_VECTOR_CONVERTS)
+   {
+ src_mode = GET_MODE (XEXP (src, 0));
+ if (src_mode == E_SImode || src_mode == E_DImode)
+   continue;
+   }
+
+ if (!v4sf_const0)
+   v4sf_const0 = gen_reg_rtx (V4SFmode);
 
  rtx zero;
  machine_mode dest_vecmode;
diff --git a/gcc/testsuite/gcc.target/i386/pr101900-1.c b/gcc/testsuite/gcc.target/i386/pr101900-1.c
new file mode 100644
index 000..0a45f8e340a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101900-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=use_vector_fp_converts" } */
+
+extern float f;
+extern double d;
+extern int i;
+
+void
+foo (void)
+{
+  d = f;
+  f = i;
+}
+
+/* { dg-final { scan-assembler "vcvtps2pd" } } */
+/* { dg-final { scan-assembler "vcvtsi2ssl" } } */
+/* { dg-final { scan-assembler-not "vcvtss2sd" } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr101900-2.c b/gcc/testsuite/gcc.target/i386/pr101900-2.c
new file mode 100644
index 000..c8b2d1da5ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101900-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=use_vector_converts" } */
+
+extern float f;
+extern double d;
+extern int i;
+
+void
+foo (void)
+{
+  d = f;
+  f = i;
+}
+
+/* { dg-final { scan-assembler "vcvtss2sd" } } */
+/* { dg-final { scan-assembler "vcvtdq2ps" } } */
+/* { dg-final { scan-assembler-not "vcvtsi2ssl" } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr101900-3.c b/gcc/testsuite/gcc.target/i386/pr101900-3.c
new file mode 100644
index 000..6ee565b5bd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101900-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=use_vector_fp_converts,use_vector_converts" } */
+
+extern float f;
+extern double d;
+extern int i;
+
+void
+foo (void)
+{
+  d = f;
+  f = i;
+}
+
+/* { dg-final { scan-assembler "vcvtps2pd" } } */
+/* { dg-final { scan-assembler "vcvtdq2ps" } } */
+/* { dg-final { scan-assembler-not "vcvtss2sd" } } */
+/* { dg-final { scan-assembler-not "vcvtsi2ssl" } } */
+/* { dg-final { scan-assembler-not "vxorps" } } */
-- 
2.17.1



[PATCH 2/4] x86: Update memcpy/memset inline strategies for -mtune=tremont

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "H.J. Lu" 

Simplify memcpy and memset inline strategies to avoid branches for
-mtune=tremont:

1. Create Tremont cost model from generic cost model.
2. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
   load and store for up to 16 * 16 (256) bytes when the data size is
   fixed and known.
3. Inline only if data size is known to be <= 256.
   a. Use "rep movsb/stosb" with simple code sequence if the data size
  is a constant.
   b. Use loop if data size is not a constant.
4. Use memcpy/memset library function if data size is unknown or > 256.

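Steps 3 and 4 can be modeled as a small selection function. The enum and function names are hypothetical, and the sketch deliberately ignores step 2: for fixed sizes, the by-pieces expansion governed by MOVE_RATIO/CLEAR_RATIO may take effect before any of these strategies is considered.

```c
#include <stdbool.h>
#include <stddef.h>

enum strategy { USE_REP_MOVSB, USE_LOOP, USE_LIBCALL };

/* SIZE_IS_CONSTANT: the byte count is a compile-time constant.
   MAX_SIZE_KNOWN / MAX_SIZE: an upper bound on the byte count.  */
static enum strategy
tremont_memcpy_strategy (bool size_is_constant, bool max_size_known,
                         size_t max_size)
{
  /* Step 4: call the library for unknown or large sizes.  */
  if (!max_size_known || max_size > 256)
    return USE_LIBCALL;
  /* Step 3a/3b: rep movsb for constant sizes, a loop otherwise.  */
  return size_is_constant ? USE_REP_MOVSB : USE_LOOP;
}
```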
* config/i386/i386-options.c (processor_cost_table): Use
tremont_cost for Tremont.
* config/i386/x86-tune-costs.h (tremont_memcpy): New.
(tremont_memset): Likewise.
(tremont_cost): Likewise.
* config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB):
Enable for Tremont.
---
 gcc/config/i386/i386-options.c   |   2 +-
 gcc/config/i386/x86-tune-costs.h | 124 +++
 gcc/config/i386/x86-tune.def |   2 +-
 3 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index c0006b3674b..e7a3bd4aaea 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -724,7 +724,7 @@ static const struct processor_costs *processor_cost_table[] =
   _cost,
   _cost,
   _cost,
-  _cost,
+  _cost,
   _cost,
   _cost,
   _cost,
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index ffe810f2bcb..93644be9cb3 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -2734,6 +2734,130 @@ struct processor_costs slm_cost = {
   "16",/* Func alignment.  */
 };
 
+static stringop_algs tremont_memcpy[2] = {
+  {libcall,
+   {{256, rep_prefix_1_byte, true},
+{256, loop, false},
+{-1, libcall, false}}},
+  {libcall,
+   {{256, rep_prefix_1_byte, true},
+{256, loop, false},
+{-1, libcall, false}}}};
+static stringop_algs tremont_memset[2] = {
+  {libcall,
+   {{256, rep_prefix_1_byte, true},
+{256, loop, false},
+{-1, libcall, false}}},
+  {libcall,
+   {{256, rep_prefix_1_byte, true},
+{256, loop, false},
+{-1, libcall, false}}}};
+static const
+struct processor_costs tremont_cost = {
+  {
+  /* Start of register allocator costs.  integer->integer move cost is 2. */
+  6,/* cost for loading QImode using movzbl */
+  {6, 6, 6},   /* cost of loading integer registers
+  in QImode, HImode and SImode.
+  Relative to reg-reg move (2).  */
+  {6, 6, 6},   /* cost of storing integer registers */
+  4,   /* cost of reg,reg fld/fst */
+  {6, 6, 12},  /* cost of loading fp registers
+  in SFmode, DFmode and XFmode */
+  {6, 6, 12},  /* cost of storing fp registers
+  in SFmode, DFmode and XFmode */
+  2,   /* cost of moving MMX register */
+  {6, 6},  /* cost of loading MMX registers
+  in SImode and DImode */
+  {6, 6},  /* cost of storing MMX registers
+  in SImode and DImode */
+  2, 3, 4, /* cost of moving XMM,YMM,ZMM register */
+  {6, 6, 6, 10, 15},   /* cost of loading SSE registers
+  in 32,64,128,256 and 512-bit */
+  {6, 6, 6, 10, 15},   /* cost of storing SSE registers
+  in 32,64,128,256 and 512-bit */
+  6, 6,/* SSE->integer and integer->SSE moves */
+  6, 6,/* mask->integer and integer->mask moves */
+  {6, 6, 6},   /* cost of loading mask register
+  in QImode, HImode, SImode.  */
+  {6, 6, 6},   /* cost if storing mask register
+  in QImode, HImode, SImode.  */
+  2,   /* cost of moving mask register.  */
+  /* End of register allocator costs.  */
+  },
+
+  COSTS_N_INSNS (1),   /* cost of an add instruction */
+  /* Setting cost to 2 makes our current implementation of synth_mult result in
+ use of unnecessary temporary registers causing regression on several
+ SPECfp benchmarks.  */
+  COSTS_N_INSNS (1) + 1,   /* cost of a lea instruction */
+  COSTS_N_INSNS (1),   /* variable shift costs */
+  COSTS_N_INSNS (1),   /* constant shift costs */
+  

[PATCH 4/4] x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "H.J. Lu" 

1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters.
2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP splitters.
3. Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and
TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY when handling the
avx_partial_xmm_update attribute.  Don't convert an AVX partial XMM
register update if there is no partial SSE register dependency for the
SSE conversion.

gcc/

* config/i386/i386-features.c (remove_partial_avx_dependency):
Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and
TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY before generating
vxorps.
* config/i386/i386.h (TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY):
New.
(TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
* config/i386/i386.md (SSE FP to FP splitters): Replace
TARGET_SSE_PARTIAL_REG_DEPENDENCY with
TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY.
(SSE INT to FP splitter): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY
with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY.
* config/i386/x86-tune.def
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.

gcc/testsuite/

* gcc.target/i386/avx-covert-1.c: New file.
* gcc.target/i386/avx-fp-covert-1.c: Likewise.
* gcc.target/i386/avx-int-covert-1.c: Likewise.
* gcc.target/i386/sse-covert-1.c: Likewise.
* gcc.target/i386/sse-fp-covert-1.c: Likewise.
* gcc.target/i386/sse-int-covert-1.c: Likewise.
---
 gcc/config/i386/i386-features.c   |  6 --
 gcc/config/i386/i386.h|  4 
 gcc/config/i386/i386.md   |  9 ++---
 gcc/config/i386/x86-tune.def  | 15 +++
 gcc/testsuite/gcc.target/i386/avx-covert-1.c  | 19 +++
 .../gcc.target/i386/avx-fp-covert-1.c | 15 +++
 .../gcc.target/i386/avx-int-covert-1.c| 14 ++
 gcc/testsuite/gcc.target/i386/sse-covert-1.c  | 19 +++
 .../gcc.target/i386/sse-fp-covert-1.c | 15 +++
 .../gcc.target/i386/sse-int-covert-1.c| 14 ++
 10 files changed, 125 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-int-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-int-covert-1.c

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index ae5ea02a002..91bfa06d4bf 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -2218,14 +2218,16 @@ remove_partial_avx_dependency (void)
  machine_mode dest_mode = GET_MODE (dest);
  machine_mode src_mode;
 
- if (TARGET_USE_VECTOR_FP_CONVERTS)
+ if (TARGET_USE_VECTOR_FP_CONVERTS
+ || !TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY)
{
  src_mode = GET_MODE (XEXP (src, 0));
  if (src_mode == E_SFmode || src_mode == E_DFmode)
continue;
}
 
- if (TARGET_USE_VECTOR_CONVERTS)
+ if (TARGET_USE_VECTOR_CONVERTS
+ || !TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY)
{
  src_mode = GET_MODE (XEXP (src, 0));
  if (src_mode == E_SImode || src_mode == E_DImode)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e76bb55c080..ec60b89753e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -334,6 +334,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
ix86_tune_features[X86_TUNE_PARTIAL_REG_DEPENDENCY]
 #define TARGET_SSE_PARTIAL_REG_DEPENDENCY \
ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY]
+#define TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY \
+   ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY]
+#define TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY \
+   ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY]
 #define TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
ix86_tune_features[X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL]
 #define TARGET_SSE_UNALIGNED_STORE_OPTIMAL \
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 13f6f57cdcc..c82a9dc1f67 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4535,7 +4535,8 @@
 (float_extend:DF
   (match_operand:SF 1 "nonimmediate_operand")))]
   "!TARGET_AVX
-   && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed
+   && 

[PATCH 1/4] x86: Update -mtune=tremont

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "H.J. Lu" 

Initial -mtune=tremont update

1. Use Haswell scheduling model.
2. Assume that the stack engine allows push instructions to execute in
parallel.
3. Prepare for scheduling pass as -mtune=generic.
4. Use the same issue rate as -mtune=generic.
5. Enable partial_reg_dependency.
6. Disable accumulate_outgoing_args
7. Enable use_leave
8. Enable push_memory
9. Disable four_jump_limit
10. Disable opt_agu
11. Disable avoid_lea_for_addr
12. Disable avoid_mem_opnd_for_cmove
13. Enable misaligned_move_string_pro_epilogues
14. Enable use_cltd
16. Enable avoid_false_dep_for_bmi
17. Enable avoid_mfence
18. Disable expand_abs
19. Enable sse_typeless_stores
20. Enable sse_load0_by_pxor
21. Disable split_mem_opnd_for_fp_converts
22. Disable slow_pshufb
23. Enable partial_reg_dependency
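Each item above toggles a per-processor selector in x86-tune.def. In GCC the resulting booleans live in ix86_tune_features, indexed by X86_TUNE_*, and are read through macros such as TARGET_SSE_PARTIAL_REG_DEPENDENCY. The sketch below is a simplified, hypothetical model of that mechanism: each feature's processor mask is tested against the bit for the -mtune target. The enum values and selector contents are illustrative only, except that they mirror two items from the list: use_leave is enabled for Tremont and slow_pshufb is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified processor enumeration.  */
enum proc { P_GENERIC, P_HASWELL, P_GOLDMONT, P_TREMONT, P_LAST };

/* One bit per processor, in the style of the m_* masks.  */
#define M_(p) (UINT64_C (1) << (p))

/* Hypothetical feature indices and selectors (illustrative values).  */
enum feature { F_USE_LEAVE, F_SLOW_PSHUFB, F_LAST };

static const uint64_t feature_selector[F_LAST] = {
  [F_USE_LEAVE]   = M_ (P_GENERIC) | M_ (P_HASWELL) | M_ (P_TREMONT),
  [F_SLOW_PSHUFB] = M_ (P_GOLDMONT),  /* Tremont deliberately absent.  */
};

/* Fill FEATURES with the per-feature booleans for -mtune=TUNE.  */
static void
init_tune_features (enum proc tune, bool features[F_LAST])
{
  assert (tune < P_LAST);
  uint64_t tune_mask = M_ (tune);
  for (int i = 0; i < F_LAST; i++)
    features[i] = (feature_selector[i] & tune_mask) != 0;
}
```

Under this model, "enabling" an item for Tremont just means adding the Tremont bit to that feature's selector; no code outside the table changes.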

This is the first patch to tune for Tremont.  With all patches applied,
performance impacts on SPEC CPU 2017 are:

500.perlbench_r       1.81%
502.gcc_r             0.57%
505.mcf_r             1.16%
520.omnetpp_r         0.00%
523.xalancbmk_r       0.00%
525.x264_r            4.55%
531.deepsjeng_r       0.00%
541.leela_r           0.39%
548.exchange2_r       1.13%
557.xz_r              0.00%
geomean for intrate   0.95%
503.bwaves_r          0.00%
507.cactuBSSN_r       6.94%
508.namd_r           12.37%
510.parest_r          1.01%
511.povray_r          3.70%
519.lbm_r            36.61%
521.wrf_r             8.79%
526.blender_r         2.91%
527.cam4_r            6.23%
538.imagick_r         0.28%
544.nab_r            21.99%
549.fotonik3d_r       3.63%
554.roms_r           -1.20%
geomean for fprate    7.50%

gcc/ChangeLog

* common/config/i386/i386-common.c: Use Haswell scheduling model
for Tremont.
* config/i386/i386.c (ix86_sched_init_global): Prepare for Tremont
scheduling pass.
* config/i386/x86-tune-sched.c (ix86_issue_rate): Change Tremont
issue rate to 4.
(ix86_adjust_cost): Handle Tremont.
* config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY):
Enable for Tremont.
(X86_TUNE_USE_LEAVE): Likewise.
(X86_TUNE_PUSH_MEMORY): Likewise.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
(X86_TUNE_USE_CLTD): Likewise.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
(X86_TUNE_AVOID_MFENCE): Likewise.
(X86_TUNE_SSE_TYPELESS_STORES): Likewise.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Disable for Tremont.
(X86_TUNE_FOUR_JUMP_LIMIT): Likewise.
(X86_TUNE_OPT_AGU): Likewise.
(X86_TUNE_AVOID_LEA_FOR_ADDR): Likewise.
(X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE): Likewise.
(X86_TUNE_EXPAND_ABS): Likewise.
(X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS): Likewise.
(X86_TUNE_SLOW_PSHUFB): Likewise.
---
 gcc/common/config/i386/i386-common.c |  2 +-
 gcc/config/i386/i386.c   |  1 +
 gcc/config/i386/x86-tune-sched.c |  2 ++
 gcc/config/i386/x86-tune.def | 37 ++--
 4 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 00c65ba15ab..2c9e1ccbc6e 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1935,7 +1935,7 @@ const pta processor_alias_table[] =
 M_CPU_TYPE (INTEL_GOLDMONT), P_PROC_SSE4_2},
   {"goldmont-plus", PROCESSOR_GOLDMONT_PLUS, CPU_GLM, PTA_GOLDMONT_PLUS,
 M_CPU_TYPE (INTEL_GOLDMONT_PLUS), P_PROC_SSE4_2},
-  {"tremont", PROCESSOR_TREMONT, CPU_GLM, PTA_TREMONT,
+  {"tremont", PROCESSOR_TREMONT, CPU_HASWELL, PTA_TREMONT,
 M_CPU_TYPE (INTEL_TREMONT), P_PROC_SSE4_2},
   {"knl", PROCESSOR_KNL, CPU_SLM, PTA_KNL,
 M_CPU_TYPE (INTEL_KNL), P_PROC_AVX512F},
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7b173bc0beb..2927e2884c9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -16976,6 +16976,7 @@ ix86_sched_init_global (FILE *, int, int)
 case PROCESSOR_NEHALEM:
 case PROCESSOR_SANDYBRIDGE:
 case PROCESSOR_HASWELL:
+case PROCESSOR_TREMONT:
 case PROCESSOR_GENERIC:
   /* Do not perform multipass scheduling for pre-reload schedule
  to save compile time.  */
diff --git a/gcc/config/i386/x86-tune-sched.c b/gcc/config/i386/x86-tune-sched.c
index 2e5ee4e..56ada99a450 100644
--- a/gcc/config/i386/x86-tune-sched.c
+++ b/gcc/config/i386/x86-tune-sched.c
@@ -71,6 +71,7 @@ ix86_issue_rate (void)
 case PROCESSOR_NEHALEM:
 case PROCESSOR_SANDYBRIDGE:
 case PROCESSOR_HASWELL:
+case PROCESSOR_TREMONT:
 case PROCESSOR_GENERIC:
   return 4;
 
@@ -429,6 +430,7 @@ ix86_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost,
 case PROCESSOR_NEHALEM:
 case PROCESSOR_SANDYBRIDGE:
 case PROCESSOR_HASWELL:
+case PROCESSOR_TREMONT:
 case 

[PATCH 0/4] Update mtune=tremont

2021-09-15 Thread lili.cui--- via Gcc-patches
From: "Cui,Lili" 

Hi,

I have four patches for Tremont tuning. With all patches applied,
performance impacts on SPEC CPU 2017 are:

500.perlbench_r       1.81%
502.gcc_r             0.57%
505.mcf_r             1.16%
520.omnetpp_r         0.00%
523.xalancbmk_r       0.00%
525.x264_r            4.55%
531.deepsjeng_r       0.00%
541.leela_r           0.39%
548.exchange2_r       1.13%
557.xz_r              0.00%
geomean for intrate   0.95%
503.bwaves_r          0.00%
507.cactuBSSN_r       6.94%
508.namd_r           12.37%
510.parest_r          1.01%
511.povray_r          3.70%
519.lbm_r            36.61%
521.wrf_r             8.79%
526.blender_r         2.91%
527.cam4_r            6.23%
538.imagick_r         0.28%
544.nab_r            21.99%
549.fotonik3d_r       3.63%
554.roms_r           -1.20%
geomean for fprate    7.50%

Bootstrapped and regtested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?

  x86: Update -mtune=tremont
  x86: Update memcpy/memset inline strategies for -mtune=tremont
  x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
  x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY

 gcc/common/config/i386/i386-common.c  |   2 +-
 gcc/config/i386/i386-features.c   |  23 +++-
 gcc/config/i386/i386-options.c|   2 +-
 gcc/config/i386/i386.c|   1 +
 gcc/config/i386/i386.h|   4 +
 gcc/config/i386/i386.md   |   9 +-
 gcc/config/i386/x86-tune-costs.h  | 124 ++
 gcc/config/i386/x86-tune-sched.c  |   2 +
 gcc/config/i386/x86-tune.def  |  52 +---
 gcc/testsuite/gcc.target/i386/avx-covert-1.c  |  19 +++
 .../gcc.target/i386/avx-fp-covert-1.c |  15 +++
 .../gcc.target/i386/avx-int-covert-1.c|  14 ++
 gcc/testsuite/gcc.target/i386/pr101900-1.c|  18 +++
 gcc/testsuite/gcc.target/i386/pr101900-2.c|  18 +++
 gcc/testsuite/gcc.target/i386/pr101900-3.c|  19 +++
 gcc/testsuite/gcc.target/i386/sse-covert-1.c  |  19 +++
 .../gcc.target/i386/sse-fp-covert-1.c |  15 +++
 .../gcc.target/i386/sse-int-covert-1.c|  14 ++
 18 files changed, 344 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-int-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-int-covert-1.c

-- 
2.17.1

Thanks,
Lili.


Re: Ping ^ 3: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-09-15 Thread Xionghu Luo via Gcc-patches

Ping^3, thanks.
 
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html



On 2021/9/6 08:52, Xionghu Luo via Gcc-patches wrote:

Ping^2, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html

On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote:

Gentle ping, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html


On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:

Hi,

On 2021/5/13 18:49, Segher Boessenkool wrote:

Hi!

On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:

The vsel instruction is a bit-wise select instruction.  Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass.  Per-element selection is a
subset of bit-wise selection, so with this patch the pattern is
written using bit operations.  However, there are 8 different patterns
to define "op0 := (op1 & ~op3) | (op2 & op3)":

(~op3) | (op3),
(~op3) | (op2),
(op3) | (~op3),
(op2) | (~op3),
(op1&~op3) | (op3),
(op1&~op3) | (op2),
(op3) | (op1&~op3),
(op2) | (op1&~op3),

The combine pass will swap (op1&~op3) to (~op3) due to commutative
canonicalization, which could reduce them to the FIRST 4 patterns, but it
won't swap (op2) | (~op3) to (~op3) | (op2), so this patch handles that
case with two patterns that differ in the position of NOT op3 and check
operand equality inside them.
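As a scalar illustration of why per-element selection is only a subset of vsel's bit-wise behaviour, the hypothetical helpers below (not GCC code) model a vector as four 8-bit lanes packed into a uint32_t. bitwise_select computes "(op1 & ~op3) | (op2 & op3)" directly. element_select picks whole lanes and is only defined when every mask lane is all-zeros or all-ones. The two agree on such masks, but a mixed mask like 0x0f0f0f0f yields results that no whole-lane selection can produce.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-wise select: each result bit comes from OP2 where the mask OP3
   has a 1 and from OP1 where it has a 0.  */
static uint32_t
bitwise_select (uint32_t op1, uint32_t op2, uint32_t op3)
{
  return (op1 & ~op3) | (op2 & op3);
}

/* Per-element select over four 8-bit lanes; each mask lane must be
   uniformly 0x00 or 0xff.  */
static uint32_t
element_select (uint32_t op1, uint32_t op2, uint32_t op3)
{
  uint32_t result = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      uint32_t lane_bits = UINT32_C (0xff) << (8 * lane);
      uint32_t m = op3 & lane_bits;
      assert (m == 0 || m == lane_bits);  /* uniform lanes only */
      result |= (m ? op2 : op1) & lane_bits;
    }
  return result;
}
```

For example, selecting between 0x11223344 and 0xaabbccdd with mask 0x00ff00ff gives 0x11bb33dd either way, while the mask 0x0f0f0f0f gives 0x1a2b3c4d, mixing nibbles from both inputs within each lane.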


Yup, that latter case does not have canonicalisation rules.  Btw, not
only combine does this canonicalisation: everything should,
non-canonical RTL is invalid RTL (in the instruction stream, you can do
everything in temporary code of course, as long as the RTL isn't
malformed).


-(define_insn "*altivec_vsel"
+(define_insn "altivec_vsel"
    [(set (match_operand:VM 0 "altivec_register_operand" "=v")
-    (if_then_else:VM
- (ne:CC (match_operand:VM 1 "altivec_register_operand" "v")
-    (match_operand:VM 4 "zero_constant" ""))
- (match_operand:VM 2 "altivec_register_operand" "v")
- (match_operand:VM 3 "altivec_register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (mode)"
-  "vsel %0,%3,%2,%1"
+    (ior:VM
+ (and:VM
+  (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
+  (match_operand:VM 1 "altivec_register_operand" "v"))
+ (and:VM
+  (match_operand:VM 2 "altivec_register_operand" "v")
+  (match_operand:VM 4 "altivec_register_operand" "v"]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
+  && (rtx_equal_p (operands[2], operands[3])
+  || rtx_equal_p (operands[4], operands[3]))"
+  {
+    if (rtx_equal_p (operands[2], operands[3]))
+  return "vsel %0,%1,%4,%3";
+    else
+  return "vsel %0,%1,%2,%3";
+  }
    [(set_attr "type" "vecmove")])


That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I
think.  So please write this as two patterns (and keep the expand if
that helps).


I was a bit concerned that writing two patterns for each vsel would
duplicate a lot of code: 4 similar patterns in altivec.md and another 4
in vsx.md would be difficult to maintain. However, I have updated it
since you prefer this approach, and, as you pointed out, the xxsel in
vsx.md could be folded by a later patch.




+(define_insn "altivec_vsel2"


(same here of course).


  ;; Fused multiply add.
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index f5676255387..d65bdc01055 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -3362,11 +3362,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
  RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI },
    { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
  RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
-  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
+  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS,


Are the _uns things still used for anything?  But, let's not change
this until Bill's stuff is in :-)

Why do you want to change this here, btw?  I don't understand.


OK. They are actually "unsigned type" overloaded builtin functions;
changing them or not won't cause a functional issue for now. I will
revert this change in the updated patch.




+  if (target == 0
+  || GET_MODE (target) != tmode
+  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))


No space after ! and other unary operators (except for casts and other
operators you write with alphanumerics, like "sizeof").  I know you
copied this code, but :-)


OK, thanks.



@@ -15608,8 +15606,6 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
  case GEU:
  case LTU:
  case LEU:
-  /* Mark unsigned tests with CCUNSmode.  */
-  cc_mode = CCUNSmode;
    /* Invert condition to avoid compound test if necessary.  */
    if (rcode == GEU || rcode == LEU)


So this is related to the _uns thing.  Could you split off that change?
Probably as an earlier patch (but either works for me).


Not related to the 

Re: [PATCH, rs6000] Optimization for vec_xl_sext

2021-09-15 Thread HAO CHEN GUI via Gcc-patches

Bill,

    Yes, I built GCC with Power10 binutils, and the power10_ok tests now pass.
Thanks again for your kind explanation.

    I finally realized that the line-wrap settings in my Thunderbird were not
taking effect. As a workaround, I had to set a very large line length.

ChangeLog

2021-09-15 Haochen Gui 

gcc/
    * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin):
    Modify the expansion for sign extension. All extensions are done
    within VSX registers.
    * config/rs6000/vsx.md (vsx_sign_extend_si_v2di): Define.

gcc/testsuite/
    * gcc.target/powerpc/p10_vec_xl_sext.c: New test.

patch.diff

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b4e13af4dc6..587e9fa2a2a 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -9779,7 +9779,7 @@ altivec_expand_lxvr_builtin (enum insn_code icode, tree exp, rtx target, bool bl

   if (sign_extend)
 {
-  rtx discratch = gen_reg_rtx (DImode);
+  rtx discratch = gen_reg_rtx (V2DImode);
   rtx tiscratch = gen_reg_rtx (TImode);

   /* Emit the lxvr*x insn.  */
@@ -9788,20 +9788,31 @@ altivec_expand_lxvr_builtin (enum insn_code icode, tree exp, rtx target, bool bl
    return 0;
   emit_insn (pat);

-  /* Emit a sign extension from QI,HI,WI to double (DI).  */
-  rtx scratch = gen_lowpart (smode, tiscratch);
+  /* Emit a sign extension from V16QI,V8HI,V4SI to V2DI.  */
+  rtx temp1, temp2;
   if (icode == CODE_FOR_vsx_lxvrbx)
-   emit_insn (gen_extendqidi2 (discratch, scratch));
+   {
+ temp1  = simplify_gen_subreg (V16QImode, tiscratch, TImode, 0);
+ emit_insn (gen_vsx_sign_extend_qi_v2di (discratch, temp1));
+   }
   else if (icode == CODE_FOR_vsx_lxvrhx)
-   emit_insn (gen_extendhidi2 (discratch, scratch));
+   {
+ temp1  = simplify_gen_subreg (V8HImode, tiscratch, TImode, 0);
+ emit_insn (gen_vsx_sign_extend_hi_v2di (discratch, temp1));
+   }
   else if (icode == CODE_FOR_vsx_lxvrwx)
-   emit_insn (gen_extendsidi2 (discratch, scratch));
-  /*  Assign discratch directly if scratch is already DI.  */
-  if (icode == CODE_FOR_vsx_lxvrdx)
-   discratch = scratch;
+   {
+ temp1  = simplify_gen_subreg (V4SImode, tiscratch, TImode, 0);
+ emit_insn (gen_vsx_sign_extend_si_v2di (discratch, temp1));
+   }
+  else if (icode == CODE_FOR_vsx_lxvrdx)
+   discratch = simplify_gen_subreg (V2DImode, tiscratch, TImode, 0);
+  else
+   gcc_unreachable ();

-  /* Emit the sign extension from DI (double) to TI (quad). */
-  emit_insn (gen_extendditi2 (target, discratch));
+  /* Emit the sign extension from V2DI (double) to TI (quad).  */
+  temp2 = simplify_gen_subreg (TImode, discratch, V2DImode, 0);
+  emit_insn (gen_extendditi2_vector (target, temp2));

   return target;
 }
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bcb92be2f5c..987f21bbc22 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4830,7 +4830,7 @@ (define_insn "vsx_sign_extend_hi_"
   "vextsh2 %0,%1"
   [(set_attr "type" "vecexts")])

-(define_insn "*vsx_sign_extend_si_v2di"
+(define_insn "vsx_sign_extend_si_v2di"
   [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
    (unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
 UNSPEC_VSX_SIGN_EXTEND))]
diff --git a/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c b/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c
new file mode 100644
index 000..78e72ac5425
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include 
+
+vector signed __int128
+foo1 (signed long a, signed char *b)
+{
+  return vec_xl_sext (a, b);
+}
+
+vector signed __int128
+foo2 (signed long a, signed short *b)
+{
+  return vec_xl_sext (a, b);
+}
+
+vector signed __int128
+foo3 (signed long a, signed int *b)
+{
+  return vec_xl_sext (a, b);
+}
+
+vector signed __int128
+foo4 (signed long a, signed long *b)
+{
+  return vec_xl_sext (a, b);
+}
+
+/* { dg-final { scan-assembler-times {\mvextsd2q\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */
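The scan counts above reflect a two-step extension: one element-to-doubleword extend (vextsb2d, vextsh2d or vextsw2d) per narrow case, plus a doubleword-to-quadword extend (vextsd2q) in all four functions. The scalar model below is a hypothetical sketch of that value-level semantics only, not of the vector code. It relies on GCC/Clang's __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

/* Signed char element: QI -> DI (the vextsb2d step), then DI -> TI
   (the vextsd2q step).  */
static __int128
sext_qi_two_step (int8_t x)
{
  int64_t dw = x;
  return (__int128) dw;
}

/* Signed short element: HI -> DI (vextsh2d), then DI -> TI.  */
static __int128
sext_hi_two_step (int16_t x)
{
  int64_t dw = x;
  return (__int128) dw;
}

/* Signed int element: SI -> DI (vextsw2d), then DI -> TI.  */
static __int128
sext_si_two_step (int32_t x)
{
  int64_t dw = x;
  return (__int128) dw;
}
```

Because each step is a sign extension, composing them gives the same value as a direct extension to 128 bits. For a negative element the whole upper doubleword ends up all ones.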


On 10/9/2021 8:18 PM, Bill Schmidt wrote:



On 9/10/21 12:45 AM, HAO CHEN GUI wrote:

Bill,

    Thanks so much for your advice.

    I refined the patch and passed the bootstrap and regression test.
Just one thing: the test case becomes unsupported on P9 if I set "{
dg-require-effective-target power10_ok }". I just want the test case to
be compiled and its assembly checked. Do we need to set "power10_ok"?

Re: [PATCH] Maintain (mis-)alignment info in the first element of a group

2021-09-15 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 14 Sep 2021, Richard Sandiford wrote:
>
>> Richard Biener via Gcc-patches  writes:
>> > This changes us to maintain and compute (mis-)alignment info for
>> > the first element of a group only rather than for each DR when
>> > doing interleaving and for the earliest, first, or first in the SLP
>> > node (or any pair or all three of those) when SLP vectorizing.
>> >
>> > For this to work out the easiest way I have changed the accessors
>> > DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to
>> > the first element rather than adjusting all callers.
>> > dr_misalignment is moved out-of-line and I'm not too fond of the
>> > poly-int dances there (any hints?), but basically we are now
>> > adjusting the first element's misalignment based on the DR_INIT
>> > difference.
>> >
>> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>> >
>> > Richard.
>> >
>> > 2021-09-13  Richard Biener  
>> >
>> >* tree-vectorizer.h (dr_misalignment): Move out of line.
>> >(dr_target_alignment): New.
>> >(DR_TARGET_ALIGNMENT): Wrap dr_target_alignment.
>> >(set_dr_target_alignment): New.
>> >(SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment.
>> >* tree-vect-data-refs.c (dr_misalignment): Compute and
>> >return the group member's misalignment.
>> >(vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT.
>> >(vect_analyze_data_refs_alignment): Compute alignment only
>> >for the first element of a DR group.
>> >(vect_slp_analyze_node_alignment): Likewise.
>> > ---
>> >  gcc/tree-vect-data-refs.c | 65 ---
>> >  gcc/tree-vectorizer.h | 24 ++-
>> >  2 files changed, 57 insertions(+), 32 deletions(-)
>> >
>> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> > index 66e76132d14..b53d6a0b3f1 100644
>> > --- a/gcc/tree-vect-data-refs.c
>> > +++ b/gcc/tree-vect-data-refs.c
>> > @@ -887,6 +887,36 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance)
>> >return res;
>> >  }
>> >  
>> > +/* Return the misalignment of DR_INFO.  */
>> > +
>> > +int
>> > +dr_misalignment (dr_vec_info *dr_info)
>> > +{
>> > +  if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
>> > +{
>> > +  dr_vec_info *first_dr
>> > +  = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
>> > +  int misalign = first_dr->misalignment;
>> > +  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
>> > +  if (misalign == DR_MISALIGNMENT_UNKNOWN)
>> > +  return misalign;
>> > +  poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
>> > +- wi::to_poly_offset (DR_INIT (first_dr->dr)));
>> > +  poly_int64 mispoly = misalign + diff.to_constant ().to_shwi ();
>> > +  bool res = known_misalignment (mispoly,
>> > +   first_dr->target_alignment.to_constant (),
>> > +   );
>> > +  gcc_assert (res);
>> > +  return misalign;
>> 
>> Yeah, not too keen on the to_constants here.  The one on diff looks
>> redundant -- you could just use diff.force_shwi () instead, and
>> keep everything poly_int.
>>
>> For the known_misalignment I think we should use:
>> 
>>if (!can_div_trunc_p (mispoly, first_dr->target_alignment,
>>   , ))
>>  misalign = DR_MISALIGNMENT_UNKNOWN;
>>return misalign;
>> 
>> There are then no to_constant assumptions.
>
> OK, note that group analysis does
>
>   /* Check that the DR_INITs are compile-time constants.  */
>   if (TREE_CODE (DR_INIT (dra)) != INTEGER_CST
>   || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
> break;
>
>   /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
>   HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
>   HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
>
> so I'm confident my variant was "correct", but it still was ugly.

Ah, OK.  In that case I don't mind the original version, but it would be
good to have a comment above the to_constant saying where the condition
is enforced.

I'm just trying to avoid to_constant calls with no comment to explain
them, and with no nearby is_constant call.  Otherwise it could end up
a bit like tree_to_uhwi, where sometimes tree_fits_uhwi_p really has
been checked earlier (not always obvious where) and sometimes
tree_to_uhwi is just used out of hope, to avoid having to think about
the alternative.

> There's also the issue that target_alignment is poly_uint64 but
> misalignment is signed int.
>
> Note that can_div_trunc_p seems to require a poly_uint64 remainder,
> I'm not sure what to do with that, so I used is_constant.

Ah, yeah, forgot about that sorry.  I guess in that case, using
is_constant on first_dr->target_alignment and sticking with
known_misalignment would make sense.
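The arithmetic under discussion is the first element's misalignment, plus the DR_INIT difference, reduced modulo the target alignment. It can be sketched with plain integers for the case where everything is a compile-time constant, which is the case the is_constant suggestion covers. The function and the MISALIGN_UNKNOWN value below are hypothetical stand-ins rather than the vectorizer's API. The result is kept in [0, target_align), matching what known_misalignment yields for a power-of-two alignment.

```c
#include <assert.h>

/* Hypothetical stand-in for DR_MISALIGNMENT_UNKNOWN.  */
#define MISALIGN_UNKNOWN (-1)

/* Derive a group member's misalignment from the first element's
   misalignment and the difference of their DR_INITs, assuming the
   target alignment is a positive compile-time constant.  */
static int
member_misalignment (int first_misalign, long init_member,
                     long init_first, long target_align)
{
  assert (target_align > 0);
  if (first_misalign == MISALIGN_UNKNOWN)
    return MISALIGN_UNKNOWN;

  long mis = first_misalign + (init_member - init_first);
  long r = mis % target_align;
  if (r < 0)
    r += target_align;  /* C's % can be negative; normalize.  */
  return (int) r;
}
```

For instance, with a 16-byte target alignment, a first element misaligned by 4 and a member 8 bytes further on gives 12, while a misalignment of 12 plus 8 wraps to 4.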

> Btw, to what value do we want to align with variable sized 
