Re: [PATCH] c++: CTAD within alias template [PR91911]

2021-12-21 Thread Jason Merrill via Gcc-patches

On 12/21/21 14:08, Patrick Palka wrote:

On Tue, Dec 21, 2021 at 2:03 PM Patrick Palka  wrote:


On Wed, Jun 30, 2021 at 4:23 PM Jason Merrill  wrote:


On 6/30/21 4:18 PM, Patrick Palka wrote:

On Wed, Jun 30, 2021 at 3:51 PM Jason Merrill  wrote:


On 6/30/21 11:58 AM, Patrick Palka wrote:

On Wed, 30 Jun 2021, Patrick Palka wrote:


On Fri, 25 Jun 2021, Jason Merrill wrote:


On 6/25/21 1:11 PM, Patrick Palka wrote:

On Fri, 25 Jun 2021, Jason Merrill wrote:


On 6/24/21 4:45 PM, Patrick Palka wrote:

In the first testcase below, during parsing of the alias template
ConstSpanType, transparency of alias template specializations means we
replace SpanType with SpanType's substituted definition.  But this
substitution lowers the level of the CTAD placeholder for span(T()) from
2 to 1, and so the later instantiation of ConstSpanType
erroneously substitutes this CTAD placeholder with the template argument
at level 1 index 0, i.e. with int, before we get a chance to perform the
CTAD.

In light of this, it seems we should avoid level lowering when
substituting through the type-id of a dependent alias template
specialization.  To that end this patch makes lookup_template_class_1
pass tf_partial to tsubst in this situation.


This makes sense, but what happens if SpanType is a member template, so that
the levels of it and ConstSpanType don't match?  Or the other way around?


If SpanType is a member template of say the class template A (and
thus its level is greater than ConstSpanType):

  template
  struct A {
template
using SpanType = decltype(span(T()));
  };

  template
  using ConstSpanType = span::SpanType::value_type>;

  using type = ConstSpanType;

then this case luckily works even without the patch because
instantiate_class_template now reuses the specialization A::SpanType
that was formed earlier during instantiation of A, where we
substitute only a single level of template arguments, so the level of
the CTAD placeholder inside the defining-type-id of this specialization
dropped from 3 to 2, so still more than the level of ConstSpanType.

This luck is short-lived though, because if we replace
A::SpanType with say A::SpanType then the testcase
breaks again (without the patch) because we no longer can reuse that
specialization, so we instead form it on the spot by substituting two
levels of template arguments (U=int,T=T) into the defining-type-id,
causing the level of the placeholder to drop to 1.  I think the patch
causes its level to remain 3 (though I guess it should really be 2).


For the other way around, if ConstSpanType is a member template of
say the class template B (and thus its level is greater than
SpanType):

  template
  using SpanType = decltype(span(T()));

  template
  struct B {
template
using ConstSpanType = span::value_type>;
  };

  using type = B::ConstSpanType;

then tf_partial doesn't help here at all; we end up substituting 'int'
for the CTAD placeholder...  What it seems we need is to _increase_ the
level of the CTAD placeholder from 2 to 3 during the dependent
substitution..

Hmm, rather than messing with tf_partial, which is apparently only a
partial solution, maybe we should just make tsubst never substitute a
CTAD placeholder -- they should always be resolved from do_class_deduction,
and their level doesn't really matter otherwise.  (But we'd still want
to substitute into the CLASS_PLACEHOLDER_TEMPLATE of the placeholder in
case it's a template template parm.)  Something like:

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 5107bfbf9d1..dead651ed84 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15552,7 +15550,8 @@ tsubst (tree t, tree args, tsubst_flags_t complain,
tree in_decl)
 levels = TMPL_ARGS_DEPTH (args);
 if (level <= levels
-  && TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > 0)
+  && TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > 0
+  && !template_placeholder_p (t))
   {
 arg = TMPL_ARG (args, level, idx);

seems to work better.


Makes sense.


Here's a patch that implements that.  I reckon it's good to have both
workarounds in place because the tf_partial workaround is necessary to
accept class-deduction93a.C below, and the tsubst workaround is
necessary to accept class-deduction-92b.C below.


Whoops, forgot to git-add class-deduction93a.C:

-- >8 --

Subject: [PATCH] c++: CTAD within alias template [PR91911]

In the first testcase below, during parsing of the alias template
ConstSpanType, transparency of alias template specializations means we
replace SpanType with SpanType's substituted definition.  But this
substitution lowers the level of the CTAD placeholder for span{T()} from
2 to 1, and so the later instantiation of ConstSpanType erroneously
substitutes this CTAD placeholder with the template argument at level 1
index 0, i.e. with int, before we get a chance to perform the CTAD.

In light of this, 

[PATCH] i386: Enable intrinsics that convert float and bf16 data to each other.

2021-12-21 Thread Kong, Lingling via Gcc-patches
Hi,


This patch enables intrinsics that convert between float and bf16 data.
Ok for master?
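
For readers unfamiliar with the format, here is a minimal portable sketch of
the bit manipulation behind the scalar conversions. It assumes bf16 is simply
the upper 16 bits of an IEEE-754 binary32 value. The helper names
bf16_bits_to_float and float_to_bf16_bits_truncate are hypothetical and not
part of the patch, and the narrowing direction truncates rather than rounding
to nearest even as the hardware conversion does.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Hypothetical helper (not part of the patch): widen raw bf16 bits to a
// float by placing them in the upper half of a 32-bit word, mirroring the
// "<< 16" used in _mm_cvtsbh_ss.
inline float bf16_bits_to_float(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof out);  // well-defined way to reinterpret bits
  return out;
}

// Hypothetical helper: narrow a float to bf16 bits by dropping the low
// 16 bits.  This truncates; it does NOT round to nearest even.
inline std::uint16_t float_to_bf16_bits_truncate(float value) {
  std::uint32_t wide;
  std::memcpy(&wide, &value, sizeof wide);
  return static_cast<std::uint16_t>(wide >> 16);
}
```

Values whose low 16 mantissa bits are zero, such as 1.0f or -2.0f, survive a
round trip through these helpers unchanged.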

gcc/ChangeLog:

* config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Add new intrinsic.
(_mm512_cvtpbh_ps): Likewise.
(_mm512_maskz_cvtpbh_ps): Likewise.
(_mm512_mask_cvtpbh_ps): Likewise.
* config/i386/avx512bf16vlintrin.h (_mm_cvtness_sbh): Likewise.
(_mm_cvtpbh_ps): Likewise.
(_mm256_cvtpbh_ps): Likewise.
(_mm_maskz_cvtpbh_ps): Likewise.
(_mm256_maskz_cvtpbh_ps): Likewise.
(_mm_mask_cvtpbh_ps): Likewise.
(_mm256_mask_cvtpbh_ps): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: New test.
* gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c: Ditto.
* gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Ditto.
* gcc.target/i386/avx512bf16vl-vcvtpbh2ps-1.c: Ditto.
---
 gcc/config/i386/avx512bf16intrin.h| 36 +++
 gcc/config/i386/avx512bf16vlintrin.h  | 63 +++
 .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c  | 15 +  
.../gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c | 20 ++
 .../i386/avx512bf16vl-cvtness2sbh-1.c | 14 +
 .../i386/avx512bf16vl-vcvtpbh2ps-1.c  | 29 +
 6 files changed, 177 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtpbh2ps-1.c

diff --git a/gcc/config/i386/avx512bf16intrin.h 
b/gcc/config/i386/avx512bf16intrin.h
index 9afc6bd7d2b..6b62dc3e398 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -41,6 +41,16 @@ typedef short __v32bh __attribute__ ((__vector_size__ (64)));
vector types, and their scalar components.  */  typedef short __m512bh 
__attribute__ ((__vector_size__ (64), __may_alias__));
 
+/* Convert One BF16 Data to One Single Float Data.  */
+extern __inline float
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsbh_ss (__bfloat16 __A)
+{
+  union{ float a; unsigned int b;} __tmp;
+  __tmp.b = ((unsigned int)(__A)) << 16;
+  return __tmp.a;
+}
+
 /* vcvtne2ps2bf16 */
 
 extern __inline __m512bh
@@ -110,6 +120,32 @@ _mm512_maskz_dpbf16_ps (__mmask16 __A, __m512 __B, 
__m512bh __C, __m512bh __D)
   return (__m512)__builtin_ia32_dpbf16ps_v16sf_maskz(__B, __C, __D, __A);  }
 
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtpbh_ps (__m256bh __A)
+{
+  return (__m512)_mm512_castsi512_ps ((__m512i)_mm512_slli_epi32 (
+         (__m512i)_mm512_cvtepi16_epi32 ((__m256i)__A), 16));
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtpbh_ps (__mmask16 __U, __m256bh __A)
+{
+  return (__m512)_mm512_castsi512_ps ((__m512i) _mm512_slli_epi32 (
+         (__m512i)_mm512_maskz_cvtepi16_epi32 (
+         (__mmask16)__U, (__m256i)__A), 16));
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtpbh_ps (__m512 __S, __mmask16 __U, __m256bh __A)
+{
+  return (__m512)_mm512_castsi512_ps ((__m512i)(_mm512_mask_slli_epi32 (
+         (__m512i)__S, (__mmask16)__U,
+         (__m512i)_mm512_cvtepi16_epi32 ((__m256i)__A), 16)));
+}
+
 #ifdef __DISABLE_AVX512BF16__
 #undef __DISABLE_AVX512BF16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512bf16vlintrin.h 
b/gcc/config/i386/avx512bf16vlintrin.h
index 6dd396d4008..5e6a6503aa6 100644
--- a/gcc/config/i386/avx512bf16vlintrin.h
+++ b/gcc/config/i386/avx512bf16vlintrin.h
@@ -43,6 +43,7 @@ typedef short __v8bh __attribute__ ((__vector_size__ (16)));  
typedef short __m256bh __attribute__ ((__vector_size__ (32), __may_alias__));  
typedef short __m128bh __attribute__ ((__vector_size__ (16), __may_alias__));
 
+typedef unsigned short __bfloat16;
 /* vcvtne2ps2bf16 */
 
 extern __inline __m256bh
@@ -175,6 +176,68 @@ _mm_maskz_dpbf16_ps (__mmask8 __A, __m128 __B, __m128bh 
__C, __m128bh __D)
   return (__m128)__builtin_ia32_dpbf16ps_v4sf_maskz(__B, __C, __D, __A);  }
 
+extern __inline __bfloat16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtness_sbh (float __A)
+{
+  __v4sf __V = {__A, 0, 0, 0};
+  __v8hi __R = __builtin_ia32_cvtneps2bf16_v4sf_mask ((__v4sf)__V,
+               (__v8hi)_mm_undefined_si128 (), (__mmask8)-1);
+  return __R[0];
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtpbh_ps (__m128bh __A)
+{
+  return (__m128)_mm_castsi128_ps ((__m128i)_mm_slli_epi32 (
+         (__m128i)_mm_cvtepi16_epi32 ((__m128i)__A), 16));
+}
+
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm256_cvtpbh_ps (__m128bh __A) {
+ 

[PATCH] [i386] Add option -mvect-compare-costs

2021-12-21 Thread liuhongt via Gcc-patches
Here is the updated patch.

It also adds the corresponding target attribute; the option is disabled by
default.

gcc/ChangeLog:

* config/i386/i386-options.c (ix86_target_string): Handle
-mvect-compare-costs.
(ix86_valid_target_attribute_inner_p): Support target attribute
vect-compare-costs.
* config/i386/i386.c (ix86_autovectorize_vector_modes): Return
1 when TARGET_X86_VECT_COMPARE_COSTS.
* config/i386/i386.opt: Add option -mvect-compare-costs.
* doc/invoke.texi: Document -mvect-compare-costs.
---
 gcc/config/i386/i386-options.c | 7 ++-
 gcc/config/i386/i386.c | 5 -
 gcc/config/i386/i386.opt   | 5 +
 gcc/doc/invoke.texi| 7 ++-
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 53bd55a12e3..53794b13fc5 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -406,7 +406,8 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2,
   /* Additional flag options.  */
   static struct ix86_target_opts flag2_opts[] =
   {
-{ "-mgeneral-regs-only",   OPTION_MASK_GENERAL_REGS_ONLY }
+{ "-mgeneral-regs-only",   OPTION_MASK_GENERAL_REGS_ONLY },
+{ "-mvect-compare-costs",  OPTION_MASK_X86_VECT_COMPARE_COSTS }
   };
 
   const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (isa2_opts)
@@ -,6 +1112,10 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree 
args, char *p_strings[],
OPT_mgeneral_regs_only,
OPTION_MASK_GENERAL_REGS_ONLY),
 
+IX86_ATTR_IX86_YES ("vect-compare-costs",
+   OPT_mvect_compare_costs,
+   OPTION_MASK_X86_VECT_COMPARE_COSTS),
+
 IX86_ATTR_YES ("relax-cmpxchg-loop",
   OPT_mrelax_cmpxchg_loop,
   MASK_RELAX_CMPXCHG_LOOP),
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ec155826310..fd471c953c4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22833,7 +22833,10 @@ ix86_autovectorize_vector_modes (vector_modes *modes, 
bool all)
   if (TARGET_SSE2)
 modes->safe_push (V4QImode);
 
-  return 0;
+  unsigned int flags = 0;
+  if (TARGET_X86_VECT_COMPARE_COSTS)
+flags |= VECT_COMPARE_COSTS;
+  return flags;
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index e1af3e417b0..80c7a073d07 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1206,3 +1206,8 @@ Support MWAIT and MONITOR built-in functions and code 
generation.
 mavx512fp16
 Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and 
AVX512FP16 built-in functions and code generation.
+
+mvect-compare-costs
+Target Mask(X86_VECT_COMPARE_COSTS) Var(ix86_target_flags) Save
+Tells the loop vectorizer to try all the provided vector lengths and pick the
+one with the lowest cost.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ca621577432..4ee522b3d66 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1428,7 +1428,7 @@ See RS/6000 and PowerPC Options.
 -malign-data=@var{type}  -mstack-protector-guard=@var{guard} @gol
 -mstack-protector-guard-reg=@var{reg} @gol
 -mstack-protector-guard-offset=@var{offset} @gol
--mstack-protector-guard-symbol=@var{symbol} @gol
+-mstack-protector-guard-symbol=@var{symbol} -mvect-compare-costs @gol
 -mgeneral-regs-only  -mcall-ms2sysv-xlogues -mrelax-cmpxchg-loop @gol
 -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
 -mindirect-branch-register -mharden-sls=@var{choice} @gol
@@ -32447,6 +32447,11 @@ Generate code that uses only the general-purpose 
registers.  This
 prevents the compiler from using floating-point, vector, mask and bound
 registers.
 
+@item -mvect-compare-costs
+@opindex mvect-compare-costs
+Tells the loop vectorizer to try all the vector lengths and pick the one
+with the lowest cost.
+
 @item -mrelax-cmpxchg-loop
 @opindex mrelax-cmpxchg-loop
 Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
-- 
2.18.1



Re: [PATCH] PR target/32803: Add -Oz option for improved clang compatibility.

2021-12-21 Thread Eric Gallager via Gcc-patches
On Tue, Dec 14, 2021 at 6:33 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 12/13/2021 5:27 PM, Joseph Myers wrote:
> > This is missing an invoke.texi update for the new option.
> And that update should probably note that -Oz turns on O2.  OK with that
> change.
>
> jeff

A news entry for the new optimization option would also be nice to
have in gcc-12/changes.html (in gcc-wwwdocs), so that people can know
it's available.


Re: [PATCH v3, rs6000] Implement mffscrni pattern

2021-12-21 Thread HAO CHEN GUI via Gcc-patches
Hi Segher,
  Thanks for your advice. Please see my explanation below.

On 22/12/2021 1:05 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Dec 21, 2021 at 04:08:06PM +0800, HAO CHEN GUI wrote:
>>   This patch defines a pattern for mffscrni. If the RN is a constant, it can 
>> call
>> gen_rs6000_mffscrni directly.
> 
> And that isn't more work than just falling through to the general case.
> Okay.
> 
>> The "rs6000-builtin-new.def" defines prototype for builtin arguments.
>> The pattern "rs6000_set_fpscr_rn" is then broken as the mode of its argument 
>> is DI while its
>> corresponding builtin has a const int argument.
> 
> I don't understand that bit.  Do you mean it is a const_int, or do you
> mean it is an "int" as in C source code, i.e. a 32-bit integer?

On trunk, the new test case hits the following ICE.

test_fpscr_rn_builtin.c: In function ‘wrap_set_fpscr_rn’:
test_fpscr_rn_builtin.c:13:3: internal compiler error: in copy_to_mode_reg, at explow.c:652
   13 |   __builtin_set_fpscr_rn (val);
      |   ^~~~
0x1065892f copy_to_mode_reg(machine_mode, rtx_def*)
        /home/guihaoc/gcc/gcc-mainline-base/gcc/explow.c:652
0x11206127 rs6000_expand_new_builtin
        /home/guihaoc/gcc/gcc-mainline-base/gcc/config/rs6000/rs6000-call.c:15974
0x11206127 rs6000_expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
        /home/guihaoc/gcc/gcc-mainline-base/gcc/config/rs6000/rs6000-call.c:14591

val is an int variable, so its mode is SImode, but rs6000_set_fpscr_rn
requires DImode.

  for (int i = 0; i < nargs; i++)
if (!insn_data[icode].operand[i+k].predicate (op[i], mode[i+k]))
  op[i] = copy_to_mode_reg (mode[i+k], op[i]);

So it calls copy_to_mode_reg to copy the operand to DImode, and then fails
the assertion in copy_to_mode_reg.

gcc_assert (GET_MODE (x) == mode || GET_MODE (x) == VOIDmode);

The original test case only tests val as a const int (E_VOIDmode), so it
never hits copy_to_mode_reg.
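
The failure mode can be modelled with a toy sketch. Everything in it is a
simplified stand-in rather than GCC's real data structures: toy_mode,
toy_operand and copy_to_mode_reg_would_assert are hypothetical names that
only mirror the condition in the assertion quoted above.

```cpp
#include <cassert>

// Toy stand-ins for machine modes; not GCC's real machine_mode enum.
enum class toy_mode { VOIDmode, SImode, DImode };

// Toy stand-in for an rtx operand: only its mode matters here.
struct toy_operand {
  toy_mode mode;
};

// Mirrors the quoted check
//   gcc_assert (GET_MODE (x) == mode || GET_MODE (x) == VOIDmode);
// and returns true when that assertion would fail.
inline bool copy_to_mode_reg_would_assert(toy_operand x, toy_mode wanted) {
  return !(x.mode == wanted || x.mode == toy_mode::VOIDmode);
}
```

In this model an SImode variable passed where DImode is wanted trips the
check, while a constant (VOIDmode) does not, which matches why the original
constant-only test never hit the ICE.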

> 
>>  * config/rs6000/rs6000-call.c
>>  (rs6000_expand_set_fpscr_rn_builtin): Not copy argument to a reg if
>>  it's a constant.  The pattern for constant can be recognized now.
> 
> "Do not copy"
> 
>>  * config/rs6000/rs6000.md (rs6000_mffscrni): Defined.
> 
> "Define".
> 
>>  (rs6000_set_fpscr_rn): Change the type of operand[0] form DI to SI.
> 
> "from"
> 
>>  Call gen_rs6000_mffscrni when operand[0] is a const int[0,3].
> 
> "if operands[0] is a const_0_to_3_operand"
> 
>>  * gcc.target/powerpc/mffscrni_p9.c: New testcase for mffscrni.
>>  * gcc.target/powerpc/test_fpscr_rn_builtin.c: Modify the test cases to
>>  test mffscrn and mffscrni separately.
> 
> "Test for mffscrn and mffscrni separately."
> 
> Everything you say in a changelog is "modify this" or "modify that", and
> (almost) all things in gcc/testsuite/ are testcases :-)
> 
>> @@ -6357,7 +6370,8 @@ (define_expand "rs6000_set_fpscr_rn"
>>rtx tmp_di = gen_reg_rtx (DImode);
>>
>>/* Extract new RN mode from operand.  */
>> -  emit_insn (gen_anddi3 (tmp_rn, operands[0], GEN_INT (0x3)));
>> +  rtx op0 = convert_to_mode (DImode, operands[0], false);
>> +  emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3)));
>>
>>/* Insert new RN mode into FSCPR.  */
>>emit_insn (gen_rs6000_mffs (tmp_df));
> 
> It doesn't seem correct to use DImode with -m32, hrm.  Not new of
> course, but I wonder how this worked.

With m32 on BE, it sets the low part to 0 when converting SI to DI.
As it just needs last two bits of high part, I set the signed to false.

rtx op0 = convert_to_mode (DImode, operands[0], false)

(insn 110 109 111 21 (set (subreg:SI (reg:DI 179 [ ll_value ]) 0)
(const_int 0 [0])) "test_fpscr_rn_builtin.c":167:12 556 
{*movsi_internal1}
 (nil))

> 
> Okay for trunk with such changelog fixes.  Thanks!
> 
> 
> Segher


[PATCH] Fix [11/12 regression] error: 'fenv_t' has not been declared in '::' -- canadian compilation fails [PR100017]

2021-12-21 Thread cqwrteur via Gcc-patches
libstdc++ cannot find fenv_t from fenv.h when doing a Canadian cross build.
Fix it by adding -nostdinc++ to RAW_CXX_FOR_TARGET in configure and
configure.ac.

This is a new patch, rebased after today's change to configure.
---
 configure| 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 9c2d7df1bb2..103a59627b0 100755
--- a/configure
+++ b/configure
@@ -17304,7 +17304,7 @@ else
 fi
 
 
-RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET"
+RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET -nostdinc++"
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target ar" >&5
 $as_echo_n "checking where to find the target ar... " >&6; }
diff --git a/configure.ac b/configure.ac
index 68cc5cc31fe..baab3b02e2e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3636,7 +3636,7 @@ ACX_CHECK_INSTALLED_TARGET_TOOL(STRIP_FOR_TARGET, strip)
 ACX_CHECK_INSTALLED_TARGET_TOOL(WINDRES_FOR_TARGET, windres)
 ACX_CHECK_INSTALLED_TARGET_TOOL(WINDMC_FOR_TARGET, windmc)
 
-RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET"
+RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET -nostdinc++"
 
 GCC_TARGET_TOOL(ar, AR_FOR_TARGET, AR, [binutils/ar])
 GCC_TARGET_TOOL(as, AS_FOR_TARGET, AS, [gas/as-new])
-- 
2.25.1



[PATCH] disable aggressive_loop_optimizations until niter ready

2021-12-21 Thread Jiufu Guo via Gcc-patches
Hi,

Normally, estimate_numbers_of_iterations gets/calculates niter first and
then invokes infer_loop_bounds_from_undefined.  In some cases, however,
after a few levels of calls, estimate_numbers_of_iterations is invoked
before niter is ready (e.g. before number_of_latch_executions returns).

e.g. number_of_latch_executions->...follow_ssa_edge_expr-->
  --> estimate_numbers_of_iterations --> infer_loop_bounds_from_undefined.

Since niter has not been computed yet, the call to
infer_loop_bounds_from_undefined may not see the final result.
To keep infer_loop_bounds_from_undefined from being called on interim state
and from generating interim data, we could disable
flag_aggressive_loop_optimizations while niter is being computed.
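
The save/disable/restore pattern described above can be sketched with a small
RAII guard, which restores the flag on every return path without repeating
the assignment. This is only an illustration: the patch itself restores the
flag by hand before each return, and scoped_flag_override and the stand-in
global demo_flag are hypothetical names, not GCC code.

```cpp
#include <cassert>

// Stand-in for a global option flag such as
// flag_aggressive_loop_optimizations (hypothetical, for illustration only).
int demo_flag = 1;

// Hypothetical RAII helper: overrides a flag for the lifetime of the object
// and restores the saved value on destruction.
class scoped_flag_override {
public:
  scoped_flag_override(int &flag, int temporary) : flag_(flag), saved_(flag) {
    flag_ = temporary;
  }
  ~scoped_flag_override() { flag_ = saved_; }
  scoped_flag_override(const scoped_flag_override &) = delete;
  scoped_flag_override &operator=(const scoped_flag_override &) = delete;

private:
  int &flag_;
  int saved_;
};

// Returns the flag value observed while the override is active.
int compute_with_flag_disabled(bool early_exit) {
  scoped_flag_override guard(demo_flag, 0);
  if (early_exit)
    return demo_flag;  // early return: the destructor still restores the flag
  return demo_flag;    // normal return: same restoration path
}
```

With a guard like this, adding a new early return cannot forget the restore,
at the cost of introducing a helper type.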

Bootstrap and regtest pass on ppc64* and x86_64.  Is this ok for trunk?

BR,
Jiufu

gcc/ChangeLog:

* tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions):
Disable/restore flag_aggressive_loop_optimizations.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/scev-16.c: New test.

---
 gcc/tree-ssa-loop-niter.c   | 23 +++
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 20 
 2 files changed, 39 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 06954e437f5..51bb501019e 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2534,18 +2534,31 @@ number_of_iterations_exit_assumptions (class loop 
*loop, edge exit,
   && !POINTER_TYPE_P (type))
 return false;
 
+  /* Until niter is calculated, avoid analyzing interim state.  */
+  int old_aggressive_loop_optimizations = flag_aggressive_loop_optimizations;
+  flag_aggressive_loop_optimizations = 0;
+
   tree iv0_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
   op0, &iv0, safe ? &iv0_niters : NULL, false))
-return number_of_iterations_popcount (loop, exit, code, niter);
+{
+  bool res = number_of_iterations_popcount (loop, exit, code, niter);
+  flag_aggressive_loop_optimizations = old_aggressive_loop_optimizations;
+  return res;
+}
   tree iv1_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
   op1, &iv1, safe ? &iv1_niters : NULL, false))
-return false;
+{
+  flag_aggressive_loop_optimizations = old_aggressive_loop_optimizations;
+  return false;
+}
   /* Give up on complicated case.  */
   if (iv0_niters && iv1_niters)
-return false;
-
+{
+  flag_aggressive_loop_optimizations = old_aggressive_loop_optimizations;
+  return false;
+}
   /* We don't want to see undefined signed overflow warnings while
  computing the number of iterations.  */
   fold_defer_overflow_warnings ();
@@ -2565,6 +2578,7 @@ number_of_iterations_exit_assumptions (class loop *loop, 
edge exit,
  only_exit_p, safe))
 {
   fold_undefer_and_ignore_overflow_warnings ();
+  flag_aggressive_loop_optimizations = old_aggressive_loop_optimizations;
   return false;
 }
 
@@ -2608,6 +2622,7 @@ number_of_iterations_exit_assumptions (class loop *loop, 
edge exit,
   niter->may_be_zero);
 
   fold_undefer_and_ignore_overflow_warnings ();
+  flag_aggressive_loop_optimizations = old_aggressive_loop_optimizations;
 
   /* If NITER has simplified into a constant, update MAX.  */
   if (TREE_CODE (niter->niter) == INTEGER_CST)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c 
b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
new file mode 100644
index 000..708ffab88ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-scev" } */
+
+/* Expect scalar_evolution = {(int) start_7(D), +, 1}_1), instead
+   (int) (short int) {(unsigned short) start_7(D), +, 1}_1 */
+
+int arr[1000];
+
+void
+s2i (short start, short end)
+{
+  int res = 0;
+  for (short i = start; i < end; i++)
+{
+  int lv = i;
+  arr[lv] += lv;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "scalar_evolution = \{\\(int\\) start_\[0-9\]+\\(D\\), \\+, 1\}_1" 1 "ivopts" } } */
-- 
2.17.1



[PATCH] Fix ICE in lsplit when built with -O3 -fno-guess-branch-probability [PR103793]

2021-12-21 Thread Xionghu Luo via Gcc-patches
With -fno-guess-branch-probability, the profile update needs an
initialized_p guard.  Also merge the missed part of r12-6086, which factors
out a function to avoid duplicated code.
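
A toy sketch of why the guard matters, under the assumption that an
uninitialized probability must never be scaled or inverted. The
toy_probability type and scale_if_initialized are hypothetical and far
simpler than GCC's profile_probability; they only illustrate checking
initialized_p () before updating an exit probability. Checking the second
operand as well is an extra assumption of this sketch, not something the
patch does.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for a branch probability that
// may be uninitialized (for example when no guessing pass has run).
class toy_probability {
public:
  toy_probability() : initialized_(false), value_(0.0) {}
  explicit toy_probability(double p) : initialized_(true), value_(p) {}

  bool initialized_p() const { return initialized_; }
  double value() const { return value_; }

private:
  bool initialized_;
  double value_;
};

// Scale EXIT_TO_LATCH by TRUE_EDGE only when both carry real values;
// otherwise return EXIT_TO_LATCH untouched.
inline toy_probability scale_if_initialized(toy_probability exit_to_latch,
                                            toy_probability true_edge) {
  if (!true_edge.initialized_p() || !exit_to_latch.initialized_p())
    return exit_to_latch;
  return toy_probability(exit_to_latch.value() * true_edge.value());
}
```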

gcc/ChangeLog:

PR 103793
* tree-ssa-loop-split.c (fix_loop_bb_probability): New function.
(split_loop): Only update loop1's exit probability if
initialized.
(do_split_loop_on_cond): Call fix_loop_bb_probability.

gcc/testsuite/ChangeLog:

PR 103793
* gcc.dg/pr103793.c: New test.
---
 gcc/tree-ssa-loop-split.c   | 98 +++--
 gcc/testsuite/gcc.dg/pr103793.c | 12 
 2 files changed, 57 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr103793.c

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 33128061aab..bdd20d4e3ed 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -484,6 +484,39 @@ compute_new_first_bound (gimple_seq *stmts, class 
tree_niter_desc *niter,
   return newend;
 }
 
+/* Fix the two loop's bb count after split based on the split edge probability,
+   don't adjust the bbs dominated by true branches of that loop to avoid
+   dropping 1s down.  */
+static void
+fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge true_edge,
+edge false_edge)
+{
+  update_ssa (TODO_update_ssa);
+
+  /* Proportion first loop's bb counts except those dominated by true
+ branch to avoid drop 1s down.  */
+  basic_block *bbs1, *bbs2;
+  bbs1 = get_loop_body (loop1);
+  unsigned j;
+  for (j = 0; j < loop1->num_nodes; j++)
+if (bbs1[j] == loop1->latch
+   || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
+  bbs1[j]->count
+   = bbs1[j]->count.apply_probability (true_edge->probability);
+  free (bbs1);
+
+  /* Proportion second loop's bb counts except those dominated by false
+ branch to avoid drop 1s down.  */
+  basic_block bbi_copy = get_bb_copy (false_edge->dest);
+  bbs2 = get_loop_body (loop2);
+  for (j = 0; j < loop2->num_nodes; j++)
+if (bbs2[j] == loop2->latch
+   || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
+  bbs2[j]->count
+   = bbs2[j]->count.apply_probability (true_edge->probability.invert ());
+  free (bbs2);
+}
+
 /* Checks if LOOP contains an conditional block whose condition
depends on which side in the iteration space it is, and if so
splits the iteration space into two loops.  Returns true if the
@@ -610,37 +643,19 @@ split_loop (class loop *loop1)
tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
 
-   update_ssa (TODO_update_ssa);
-
-   /* Proportion first loop's bb counts except those dominated by true
-  branch to avoid drop 1s down.  */
-   basic_block *bbs1, *bbs2;
-   bbs1 = get_loop_body (loop1);
-   unsigned j;
-   for (j = 0; j < loop1->num_nodes; j++)
- if (bbs1[j] == loop1->latch
- || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
-   bbs1[j]->count
- = bbs1[j]->count.apply_probability (true_edge->probability);
-   free (bbs1);
+   fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
 
/* Fix first loop's exit probability after scaling.  */
-   edge exit_to_latch1 = single_pred_edge (loop1->latch);
-   exit_to_latch1->probability = exit_to_latch1->probability.apply_scale (
- true_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
-   single_exit (loop1)->probability
- = exit_to_latch1->probability.invert ();
-
-   /* Proportion second loop's bb counts except those dominated by false
-  branch to avoid drop 1s down.  */
-   basic_block bbi_copy = get_bb_copy (false_edge->dest);
-   bbs2 = get_loop_body (loop2);
-   for (j = 0; j < loop2->num_nodes; j++)
- if (bbs2[j] == loop2->latch
- || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
-   bbs2[j]->count = bbs2[j]->count.apply_probability (
- true_edge->probability.invert ());
-   free (bbs2);
+   if (true_edge->probability.initialized_p ())
+ {
+   edge exit_to_latch1 = single_pred_edge (loop1->latch);
+   exit_to_latch1->probability
+ = exit_to_latch1->probability.apply_scale (
+   true_edge->probability.to_reg_br_prob_base (),
+   REG_BR_PROB_BASE);
+   single_exit (loop1)->probability
+ = exit_to_latch1->probability.invert ();
+ }
 
/* Finally patch out the two copies of the condition to be always
   true/false (or opposite).  */
@@ -1570,40 +1585,17 @@ do_split_loop_on_cond (struct loop *loop1, edge 
invar_branch)
  between loop1 and loop2.  */
   connect_loop_phis (loop1, loop2, to_loop2);
 
-  update_ssa (TODO_update_ssa);
-
   edge true_edge, false_edge, 

Re: [PATCH] c++: CTAD within alias template [PR91911]

2021-12-21 Thread Patrick Palka via Gcc-patches
On Tue, Dec 21, 2021 at 2:03 PM Patrick Palka  wrote:
>
> On Wed, Jun 30, 2021 at 4:23 PM Jason Merrill  wrote:
> >
> > On 6/30/21 4:18 PM, Patrick Palka wrote:
> > > On Wed, Jun 30, 2021 at 3:51 PM Jason Merrill  wrote:
> > >>
> > >> On 6/30/21 11:58 AM, Patrick Palka wrote:
> > >>> On Wed, 30 Jun 2021, Patrick Palka wrote:
> > >>>
> >  On Fri, 25 Jun 2021, Jason Merrill wrote:
> > 
> > > On 6/25/21 1:11 PM, Patrick Palka wrote:
> > >> On Fri, 25 Jun 2021, Jason Merrill wrote:
> > >>
> > >>> On 6/24/21 4:45 PM, Patrick Palka wrote:
> >  In the first testcase below, during parsing of the alias template
> >  ConstSpanType, transparency of alias template specializations 
> >  means we
> >  replace SpanType with SpanType's substituted definition.  But 
> >  this
> >  substitution lowers the level of the CTAD placeholder for 
> >  span(T()) from
> >  2 to 1, and so the later instantiation of ConstSpanType
> >  erroneously substitutes this CTAD placeholder with the template 
> >  argument
> >  at level 1 index 0, i.e. with int, before we get a chance to 
> >  perform the
> >  CTAD.
> > 
> >  In light of this, it seems we should avoid level lowering when
> >  substituting through the type-id of a dependent alias 
> >  template
> >  specialization.  To that end this patch makes 
> >  lookup_template_class_1
> >  pass tf_partial to tsubst in this situation.
> > >>>
> > >>> This makes sense, but what happens if SpanType is a member 
> > >>> template, so
> > >>> that
> > >>> the levels of it and ConstSpanType don't match?  Or the other way 
> > >>> around?
> > >>
> > >> If SpanType is a member template of say the class template A 
> > >> (and
> > >> thus its level is greater than ConstSpanType):
> > >>
> > >>  template
> > >>  struct A {
> > >>template
> > >>using SpanType = decltype(span(T()));
> > >>  };
> > >>
> > >>  template
> > >>  using ConstSpanType = span > >> A::SpanType::value_type>;
> > >>
> > >>  using type = ConstSpanType;
> > >>
> > >> then this case luckily works even without the patch because
> > >> instantiate_class_template now reuses the specialization 
> > >> A::SpanType
> > >> that was formed earlier during instantiation of A, where we
> > >> substitute only a single level of template arguments, so the level of
> > >> the CTAD placeholder inside the defining-type-id of this 
> > >> specialization
> > >> dropped from 3 to 2, so still more than the level of ConstSpanType.
> > >>
> > >> This luck is short-lived though, because if we replace
> > >> A::SpanType with say A::SpanType then the 
> > >> testcase
> > >> breaks again (without the patch) because we no longer can reuse that
> > >> specialization, so we instead form it on the spot by substituting two
> > >> levels of template arguments (U=int,T=T) into the defining-type-id,
> > >> causing the level of the placeholder to drop to 1.  I think the patch
> > >> causes its level to remain 3 (though I guess it should really be 2).
> > >>
> > >>
> > >> For the other way around, if ConstSpanType is a member template of
> > >> say the class template B (and thus its level is greater than
> > >> SpanType):
> > >>
> > >>  template
> > >>  using SpanType = decltype(span(T()));
> > >>
> > >>  template
> > >>  struct B {
> > >>template
> > >>using ConstSpanType = span > >> SpanType::value_type>;
> > >>  };
> > >>
> > >>  using type = B::ConstSpanType;
> > >>
> > >> then tf_partial doesn't help here at all; we end up substituting 
> > >> 'int'
> > >> for the CTAD placeholder...  What it seems we need is to _increase_ 
> > >> the
> > >> level of the CTAD placeholder from 2 to 3 during the dependent
> > >> substitution..
> > >>
> > >> Hmm, rather than messing with tf_partial, which is apparently only a
> > >> partial solution, maybe we should just make tsubst never substitute a
> > >> CTAD placeholder -- they should always be resolved from 
> > >> do_class_deduction,
> > >> and their level doesn't really matter otherwise.  (But we'd still 
> > >> want
> > >> to substitute into the CLASS_PLACEHOLDER_TEMPLATE of the placeholder 
> > >> in
> > >> case it's a template template parm.)  Something like:
> > >>
> > >> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > >> index 5107bfbf9d1..dead651ed84 100644
> > >> --- a/gcc/cp/pt.c
> > >> +++ b/gcc/cp/pt.c
> > >> @@ -15552,7 +15550,8 @@ tsubst (tree t, tree args, tsubst_flags_t 
> > >> complain,
> > >> tree in_decl)
> > >>  

Re: [PATCH] c++: CTAD within alias template [PR91911]

2021-12-21 Thread Patrick Palka via Gcc-patches
On Wed, Jun 30, 2021 at 4:23 PM Jason Merrill  wrote:
>
> On 6/30/21 4:18 PM, Patrick Palka wrote:
> > On Wed, Jun 30, 2021 at 3:51 PM Jason Merrill  wrote:
> >>
> >> On 6/30/21 11:58 AM, Patrick Palka wrote:
> >>> On Wed, 30 Jun 2021, Patrick Palka wrote:
> >>>
>  On Fri, 25 Jun 2021, Jason Merrill wrote:
> 
> > On 6/25/21 1:11 PM, Patrick Palka wrote:
> >> On Fri, 25 Jun 2021, Jason Merrill wrote:
> >>
> >>> On 6/24/21 4:45 PM, Patrick Palka wrote:
>  In the first testcase below, during parsing of the alias template
>  ConstSpanType, transparency of alias template specializations means 
>  we
>  replace SpanType with SpanType's substituted definition.  But this
>  substitution lowers the level of the CTAD placeholder for span(T()) 
>  from
>  2 to 1, and so the later instantiation of ConstSpanType
>  erroneously substitutes this CTAD placeholder with the template 
>  argument
>  at level 1 index 0, i.e. with int, before we get a chance to perform 
>  the
>  CTAD.
> 
>  In light of this, it seems we should avoid level lowering when
>  substituting through the type-id of a dependent alias 
>  template
>  specialization.  To that end this patch makes lookup_template_class_1
>  pass tf_partial to tsubst in this situation.
> >>>
> >>> This makes sense, but what happens if SpanType is a member template, 
> >>> so
> >>> that
> >>> the levels of it and ConstSpanType don't match?  Or the other way 
> >>> around?
> >>
> >> If SpanType is a member template of say the class template A (and
> >> thus its level is greater than ConstSpanType):
> >>
> >>  template
> >>  struct A {
> >>template
> >>using SpanType = decltype(span(T()));
> >>  };
> >>
> >>  template
> >>  using ConstSpanType = span >> A::SpanType::value_type>;
> >>
> >>  using type = ConstSpanType;
> >>
> >> then this case luckily works even without the patch because
> >> instantiate_class_template now reuses the specialization 
> >> A::SpanType
> >> that was formed earlier during instantiation of A, where we
> >> substitute only a single level of template arguments, so the level of
> >> the CTAD placeholder inside the defining-type-id of this specialization
> >> dropped from 3 to 2, so still more than the level of ConstSpanType.
> >>
> >> This luck is short-lived though, because if we replace
> >> A::SpanType with say A::SpanType then the 
> >> testcase
> >> breaks again (without the patch) because we no longer can reuse that
> >> specialization, so we instead form it on the spot by substituting two
> >> levels of template arguments (U=int,T=T) into the defining-type-id,
> >> causing the level of the placeholder to drop to 1.  I think the patch
> >> causes its level to remain 3 (though I guess it should really be 2).
> >>
> >>
> >> For the other way around, if ConstSpanType is a member template of
> >> say the class template B (and thus its level is greater than
> >> SpanType):
> >>
> >>  template
> >>  using SpanType = decltype(span(T()));
> >>
> >>  template
> >>  struct B {
> >>template
> >>using ConstSpanType = span >> SpanType::value_type>;
> >>  };
> >>
> >>  using type = B::ConstSpanType;
> >>
> >> then tf_partial doesn't help here at all; we end up substituting 'int'
> >> for the CTAD placeholder...  What it seems we need is to _increase_ the
> >> level of the CTAD placeholder from 2 to 3 during the dependent
> >> substitution..
> >>
> >> Hmm, rather than messing with tf_partial, which is apparently only a
> >> partial solution, maybe we should just make tsubst never substitute a
> >> CTAD placeholder -- they should always be resolved from 
> >> do_class_deduction,
> >> and their level doesn't really matter otherwise.  (But we'd still want
> >> to substitute into the CLASS_PLACEHOLDER_TEMPLATE of the placeholder in
> >> case it's a template template parm.)  Something like:
> >>
> >> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> >> index 5107bfbf9d1..dead651ed84 100644
> >> --- a/gcc/cp/pt.c
> >> +++ b/gcc/cp/pt.c
> >> @@ -15552,7 +15550,8 @@ tsubst (tree t, tree args, tsubst_flags_t 
> >> complain,
> >> tree in_decl)
> >> levels = TMPL_ARGS_DEPTH (args);
> >> if (level <= levels
> >> -  && TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > 0)
> >> +  && TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > 0
> >> +  && !template_placeholder_p (t))
> >>   {
> >> arg = TMPL_ARG (args, level, idx);
> >>
> >> 

Re: [PATCH][Hashtable 6/6] PR 68303 small size optimization

2021-12-21 Thread François Dumont via Gcc-patches

On 21/12/21 7:28 am, Daniel Krügler wrote:

Am Di., 21. Dez. 2021 um 07:08 Uhr schrieb François Dumont via
Libstdc++ :

Hi

  Is there a chance for this patch to be integrated for next gcc
release ?

François


No counterargument for the acceptance, but: Shouldn't
__small_size_threshold() be a noexcept function?

- Daniel


Could it improve code generation? If so, I could make it depend on the 
noexcept qualification of 
_Hashtable_hash_traits<>::__small_size_threshold(). But I was hoping the 
compiler would detect all that itself.


Otherwise no, it does not have to be noexcept: it is used to avoid 
hasher invocation in some situations, and the hasher is not constrained 
to be noexcept. At least I do not need to static_assert this.
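
A minimal sketch of what depending on the noexcept qualification could look like. The two traits types and the variable template are hypothetical stand-ins, not the libstdc++ definitions, and the returned value 0 is a placeholder.

```cpp
#include <cstddef>

// Hypothetical traits type whose threshold function is declared noexcept.
struct example_hash_traits {
  static constexpr std::size_t small_size_threshold() noexcept { return 0; }
};

// Hypothetical traits type without the noexcept qualification.
struct throwing_hash_traits {
  static std::size_t small_size_threshold() { return 0; }
};

// The noexcept operator reports the declared exception specification
// without evaluating the call.
template<class Traits>
constexpr bool threshold_is_noexcept =
    noexcept(Traits::small_size_threshold());

static_assert(threshold_is_noexcept<example_hash_traits>);
static_assert(!threshold_is_noexcept<throwing_hash_traits>);
```

A container could branch on such a constant, or static_assert it, if the qualification ever turned out to matter.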





Re: [PATCH v3, rs6000] Implement mffscrni pattern

2021-12-21 Thread Segher Boessenkool
Hi!

On Tue, Dec 21, 2021 at 04:08:06PM +0800, HAO CHEN GUI wrote:
>   This patch defines a pattern for mffscrni. If the RN is a constant, it can 
> call
> gen_rs6000_mffscrni directly.

And that isn't more work than just falling through to the general case.
Okay.

> The "rs6000-builtin-new.def" defines prototype for builtin arguments.
> The pattern "rs6000_set_fpscr_rn" is then broken as the mode of its argument 
> is DI while its
> corresponding builtin has a const int argument.

I don't understand that bit.  Do you mean it is a const_int, or do you
mean it is an "int" as in C source code, i.e. a 32-bit integer?

>   * config/rs6000/rs6000-call.c
>   (rs6000_expand_set_fpscr_rn_builtin): Not copy argument to a reg if
>   it's a constant.  The pattern for constant can be recognized now.

"Do not copy"

>   * config/rs6000/rs6000.md (rs6000_mffscrni): Defined.

"Define".

>   (rs6000_set_fpscr_rn): Change the type of operand[0] form DI to SI.

"from"

>   Call gen_rs6000_mffscrni when operand[0] is a const int[0,3].

"if operands[0] is a const_0_to_3_operand"

>   * gcc.target/powerpc/mffscrni_p9.c: New testcase for mffscrni.
>   * gcc.target/powerpc/test_fpscr_rn_builtin.c: Modify the test cases to
>   test mffscrn and mffscrni separately.

"Test for mffscrn and mffscrni separately."

Everything you say in a changelog is "modify this" or "modify that", and
(almost) all things in gcc/testsuite/ are testcases :-)

> @@ -6357,7 +6370,8 @@ (define_expand "rs6000_set_fpscr_rn"
>rtx tmp_di = gen_reg_rtx (DImode);
> 
>/* Extract new RN mode from operand.  */
> -  emit_insn (gen_anddi3 (tmp_rn, operands[0], GEN_INT (0x3)));
> +  rtx op0 = convert_to_mode (DImode, operands[0], false);
> +  emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3)));
> 
>/* Insert new RN mode into FSCPR.  */
>emit_insn (gen_rs6000_mffs (tmp_df));

It doesn't seem correct to use DImode with -m32, hrm.  Not new of
course, but I wonder how this worked.

Okay for trunk with such changelog fixes.  Thanks!


Segher


[GCC-11][committed] libphobos: Add power*-*-freebsd* as supported target

2021-12-21 Thread Iain Buclaw via Gcc-patches
This patch backports the change in mainline that adds power*-*-freebsd*
as supported targets for libphobos, which soft depends on another change
in mainline that adds FreeBSD_13 support for the bindings.

Regression tested on powerpc64-portbld-freebsd13.0, and committed to the
releases/gcc-11 branch.

Regards,
Iain.

libphobos/ChangeLog:

* configure.tgt: Add power*-*-freebsd* as a supported target.
* libdruntime/core/sys/freebsd/config.d: Define
  __FreeBSD_version for FreeBSD_13 targets.

(cherry picked from commit 0c3fc06c300f5b71f299812c7fcac82b0236e5ac)
---
 libphobos/configure.tgt | 3 +++
 libphobos/libdruntime/core/sys/freebsd/config.d | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/libphobos/configure.tgt b/libphobos/configure.tgt
index 88c027d0c28..0643daeb613 100644
--- a/libphobos/configure.tgt
+++ b/libphobos/configure.tgt
@@ -39,6 +39,9 @@ case "${target}" in
   mips*-*-linux*)
LIBPHOBOS_SUPPORTED=yes
;;
+  power*-*-freebsd*)
+   LIBPHOBOS_SUPPORTED=yes
+   ;;
   power*-*-linux*)
LIBPHOBOS_SUPPORTED=yes
LIBDRUNTIME_ONLY=yes
diff --git a/libphobos/libdruntime/core/sys/freebsd/config.d 
b/libphobos/libdruntime/core/sys/freebsd/config.d
index 4eda066b293..ead941c0e67 100644
--- a/libphobos/libdruntime/core/sys/freebsd/config.d
+++ b/libphobos/libdruntime/core/sys/freebsd/config.d
@@ -13,7 +13,8 @@ public import core.sys.posix.config;
 // __FreeBSD_version numbers are documented in the Porter's Handbook.
 // NOTE: When adding newer versions of FreeBSD, verify all current versioned
 // bindings are still compatible with the release.
- version (FreeBSD_12) enum __FreeBSD_version = 1202000;
+ version (FreeBSD_13) enum __FreeBSD_version = 1300000;
+else version (FreeBSD_12) enum __FreeBSD_version = 1202000;
 else version (FreeBSD_11) enum __FreeBSD_version = 1104000;
 else version (FreeBSD_10) enum __FreeBSD_version = 1004000;
 else version (FreeBSD_9)  enum __FreeBSD_version = 903000;
-- 
2.32.0



vxworks libstdc++ locale

2021-12-21 Thread Rasmus Villemoes via Gcc-patches
Hi

While trying to upgrade our vxworks 5.5 compiler to gcc12, I've hit a
problem when loading the libstdc++ module on target. It manifests as

[00] tShell memPartFree: invalid block 8bf72c in partition 9605dc.
[00] tShell memPartFree: invalid block 8bf38c in partition 9605dc.
[00] tShell memPartFree: invalid block 8bf304 in partition 9605dc.
[00] tShell memPartFree: invalid block 8bf348 in partition 9605dc.
[00] tShell memPartFree: invalid block 8bf23c in partition 9605dc.
[00] tShell memPartFree: invalid block 8bf6c4 in partition 9605dc.
[00] tShell memPartFree: invalid block 8bf794 in partition 9605dc.
[00] tShell memPartFree: invalid block 8bf7a0 in partition 9605dc.
[00] tShell memPartFree: invalid block 8bf7bc in partition 9605dc.

being printed on the console. We didn't use to pass an explicit
--enable-clocale option to configure, but if I add
--enable-clocale=generic , thus reverting to the locale implementation
used for gcc11, the problem goes away.

The vxworks locale seems to be mostly identical to generic, just
differing in CCTYPE_CC. And comparing the .a files, it seems that that
TU ends up defining a constructor
_GLOBAL__sub_I__ZNSt12ctype_bynameIcEC2EPKcj , which calls
_ZNSt8ios_base4InitC1Ev . But then I'm lost.

Any ideas?

Rasmus


[PATCH] x86: Shrink writing 0/-1 to memory using and/or with -Oz.

2021-12-21 Thread Roger Sayle

This is the second part of my fix to PR target/103773 where -Oz shouldn't
use push/pop on x86 to shrink writing small integer constants to memory.
Instead clang uses "andl $0, mem" for writing zero, and "orl $-1, mem"
when writing -1 to memory when using -Oz.  This patch implements this
via peephole2, where we can confirm that it's OK to clobber the flags.
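
A rough sketch of where the saving comes from, assuming the usual x86 encodings: a store of an immediate to memory (opcode C7 /0) carries a full-width immediate, while and/or with a sign-extended 8-bit immediate (opcode 83 /4 and 83 /1) carries one byte. Prefixes, ModRM and displacement are the same in both forms, so only the immediate differs. The function names are illustrative only.

```cpp
#include <cstdint>

// Immediate size in bytes for "mov $imm, mem" (C7 /0): 16 bits for word
// stores, 32 bits for dword and qword stores.
constexpr int mov_imm_bytes(int operand_bits) {
  return operand_bits == 16 ? 2 : 4;
}

// "and $imm8, mem" and "or $imm8, mem" (83 /4, 83 /1) always carry a
// sign-extended 8-bit immediate.
constexpr int alu_imm8_bytes() { return 1; }

// Bytes saved per store by switching from mov to and/or.
constexpr int bytes_saved(int operand_bits) {
  return mov_imm_bytes(operand_bits) - alu_imm8_bytes();
}

// The rewrite preserves the stored value: x & 0 == 0 and x | -1 == -1,
// whatever the old contents of memory were.
constexpr std::int32_t store_zero_via_and(std::int32_t old) { return old & 0; }
constexpr std::int32_t store_m1_via_or(std::int32_t old) { return old | -1; }
```

The and/or forms read memory and set the flags, which is why the peephole only fires when the flags register is dead.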

On the CSiBE benchmark, this reduces total code size from 3664172 bytes
to 3663304 bytes, saving 868 bytes.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and the new testcase checked
both with and without -m32.  Ok for mainline?


2021-12-21  Roger Sayle  

gcc/ChangeLog
* gcc/config/i386/i386.md (define_peephole2): With -Oz use
andl $0,mem instead of movl $0,mem and orl $-1,mem instead of
movl $-1,mem.

gcc/testsuite/ChangeLog
* gcc.target/i386/pr103773-2.c: New test case.


Thanks in advance (and my apologies for the breakage).
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d25453f..d872824 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20940,6 +20942,19 @@
   DONE;
 })
 
+;; Aggressive size optimizations with -Oz
+(define_peephole2
+  [(set (match_operand:SWI248 0 "memory_operand") (const_int 0))]
+  "optimize_size > 1 && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(parallel [(set (match_dup 0) (and:SWI248 (match_dup 0) (const_int 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_peephole2
+  [(set (match_operand:SWI248 0 "memory_operand") (const_int -1))]
+  "optimize_size > 1 && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(parallel [(set (match_dup 0) (ior:SWI248 (match_dup 0) (const_int -1)))
+ (clobber (reg:CC FLAGS_REG))])])
+
 ;; Reload dislikes loading constants directly into class_likely_spilled
 ;; hard registers.  Try to tidy things up here.
 (define_peephole2
diff --git a/gcc/testsuite/gcc.target/i386/pr103773-2.c 
b/gcc/testsuite/gcc.target/i386/pr103773-2.c
new file mode 100644
index 0000000..9dafebd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103773-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-Oz" } */
+short s;
+int i;
+long long l;
+
+void s0() { s = 0; }
+void sm1() { s = -1; }
+void i0() { i = 0; }
+void im1() { i = -1; }
+void l0() { l = 0; }
+void lm1() { l = -1; }
+
+/* { dg-final { scan-assembler-not "\tmov\[wlq\]\t\\\$0," } } */
+/* { dg-final { scan-assembler-not "\tmov\[wlq\]\t\\\$-1," } } */
+/* { dg-final { scan-assembler "\tandw\t\\\$0," } } */
+/* { dg-final { scan-assembler "\torw\t\\\$-1," } } */
+/* { dg-final { scan-assembler "\torl\t\\\$-1," } } */
+


[committed] libphobos: Add power*-*-freebsd* as supported target

2021-12-21 Thread Iain Buclaw via Gcc-patches
Hi,

This patch adds power*-*-freebsd* as supported targets for libphobos.

This has been tested on powerpc64-freebsd13 and powerpc64le-freebsd13,
and used to build dub, along with some D tools from ports.

Regression tested, and committed to mainline.

Regards,
Iain.

---
libphobos/ChangeLog:

* configure.tgt: Add power*-*-freebsd* as a supported target.
---
 libphobos/configure.tgt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libphobos/configure.tgt b/libphobos/configure.tgt
index 88c027d0c28..0643daeb613 100644
--- a/libphobos/configure.tgt
+++ b/libphobos/configure.tgt
@@ -39,6 +39,9 @@ case "${target}" in
   mips*-*-linux*)
LIBPHOBOS_SUPPORTED=yes
;;
+  power*-*-freebsd*)
+   LIBPHOBOS_SUPPORTED=yes
+   ;;
   power*-*-linux*)
LIBPHOBOS_SUPPORTED=yes
LIBDRUNTIME_ONLY=yes
-- 
2.32.0



[GCC-9, 10, 11][committed] libphobos: Fix definition of stat_t for MIPS64 (PR103604)

2021-12-21 Thread Iain Buclaw via Gcc-patches
Hi,

This patch backports a specific change from commit r12-6003 to the
release branches to fix the layout of stat_t on MIPS64 targets.

Bootstrapped and regression tested on mips-unknown-linux, with -mabi=64
and -mabi=n32 multilib configurations.  Committed to releases/gcc-11,
gcc-10, and gcc-9 branches.
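
The patch guards the layout with static asserts on the struct size. The same idea in C++, as a sketch with hypothetical records (these are not the real MIPS64 stat layout): a padding array declared with the wrong element width changes the size, and a compile-time assertion catches it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record with 32-bit padding elements.
struct record_ok {
  std::uint64_t dev;
  std::uint32_t pad[3];
  std::uint32_t mode;
  std::uint64_t size;
};

// Same record with the padding mistakenly declared as 64-bit elements.
struct record_wrong {
  std::uint64_t dev;
  std::uint64_t pad[3];
  std::uint32_t mode;
  std::uint64_t size;
};

// Compile-time layout checks: any drift in field types fails the build.
static_assert(sizeof(record_ok) == 32, "unexpected record size");
static_assert(offsetof(record_ok, size) == 24, "unexpected field offset");
static_assert(sizeof(record_wrong) != sizeof(record_ok),
              "wider padding changes the layout");
```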

Regards
Iain.

---
libphobos/ChangeLog:

PR d/103604
* libdruntime/core/sys/posix/sys/stat.d (struct stat_t): Fix
definition for MIPS64.
---
 .../libdruntime/core/sys/posix/sys/stat.d | 46 +++
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/libphobos/libdruntime/core/sys/posix/sys/stat.d 
b/libphobos/libdruntime/core/sys/posix/sys/stat.d
index 6b4d022b825..7d0b1708e2d 100644
--- a/libphobos/libdruntime/core/sys/posix/sys/stat.d
+++ b/libphobos/libdruntime/core/sys/posix/sys/stat.d
@@ -340,26 +340,23 @@ version (CRuntime_Glibc)
 }
 c_long[14]  st_pad5;
 }
+static if (!__USE_FILE_OFFSET64)
+static assert(stat_t.sizeof == 144);
+else
+static assert(stat_t.sizeof == 160);
 }
 else version (MIPS64)
 {
 struct stat_t
 {
-c_ulong st_dev;
+dev_t   st_dev;
 int[3]  st_pad1;
-static if (!__USE_FILE_OFFSET64)
-{
-ino_t   st_ino;
-}
-else
-{
-c_ulong st_ino;
-}
+ino_t   st_ino;
 mode_t  st_mode;
 nlink_t st_nlink;
 uid_t   st_uid;
 gid_t   st_gid;
-c_ulong st_rdev;
+dev_t   st_rdev;
 static if (!__USE_FILE_OFFSET64)
 {
 uint[2] st_pad2;
@@ -368,8 +365,8 @@ version (CRuntime_Glibc)
 }
 else
 {
-c_long[3]   st_pad2;
-c_long  st_size;
+uint[3] st_pad2;
+off_t   st_size;
 }
 static if (__USE_MISC || __USE_XOPEN2K8)
 {
@@ -394,15 +391,26 @@ version (CRuntime_Glibc)
 }
 blksize_t   st_blksize;
 uintst_pad4;
+blkcnt_tst_blocks;
+int[14] st_pad5;
+}
+version (MIPS_N32)
+{
 static if (!__USE_FILE_OFFSET64)
-{
-blkcnt_tst_blocks;
-}
+static assert(stat_t.sizeof == 160);
 else
-{
-c_long  st_blocks;
-}
-c_long[14]  st_pad5;
+static assert(stat_t.sizeof == 176);
+}
+else version (MIPS_O64)
+{
+static if (!__USE_FILE_OFFSET64)
+static assert(stat_t.sizeof == 160);
+else
+static assert(stat_t.sizeof == 176);
+}
+else
+{
+static assert(stat_t.sizeof == 216);
 }
 }
 else version (PPC)
-- 
2.32.0



Re: [PATCH] PR fortran/103777 - ICE in gfc_simplify_maskl, at fortran/simplify.c:4918

2021-12-21 Thread Mikael Morin

Le 20/12/2021 à 23:05, Harald Anlauf via Fortran a écrit :

Dear all,

we need to check the arguments of the elemental MASKL and MASKR
intrinsics also before simplifying.

Testcase by Gerhard.  The fix is almost obvious, but I'm happy to
get feedback in case there is something I overlooked.  (There is
already a check on scalar arguments to MASKL/MASKR, which however
misses the case of array arguments.)

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Your patch looks reasonable and safe.
However, I find it surprising that it’s actually needed, as 
gfc_check_mask is already the check function associated with maskl and 
maskr in the definition of the symbols.  The simplification function 
should be called only when the associated check function has returned 
successfully, so it shouldn’t be necessary to call it again at 
simplification time.
Looking at the backtrace, it is the do_simplify call at the beginning of 
gfc_intrinsic_func_interface that seems dubious to me, as it comes 
before all the checks further down in the function and it looks redundant 
with the other simplification code after the checks.


So I’m inclined to test whether by any chance removing that call works, 
and if it doesn’t, let’s go with this patch.
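
To illustrate why the ordering matters, here is a sketch of a check-then-simplify pair for a 32-bit MASKL. The function names are hypothetical and this is not the gfortran code; it only assumes that MASKL(I) requires 0 <= I <= BIT_SIZE and sets the leftmost I bits.

```cpp
#include <cstdint>
#include <optional>

// Check step: the argument must lie in [0, 32] for a 32-bit result.
constexpr bool check_mask_arg(int i) { return i >= 0 && i <= 32; }

// Simplification step: only valid for arguments that passed the check,
// since a negative or oversized shift count would be undefined.
constexpr std::uint32_t simplify_maskl32(int i) {
  return i == 0 ? 0u : ~std::uint32_t{0} << (32 - i);
}

// Run the check before folding. On invalid input no value is produced and
// the diagnostic is left to the caller.
constexpr std::optional<std::uint32_t> fold_maskl32(int i) {
  if (!check_mask_arg(i))
    return std::nullopt;
  return simplify_maskl32(i);
}
```

Calling the simplifier ahead of the check, as the early do_simplify call appears to do, would hand it arguments it was never meant to see.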


Mikael


[PATCH][AArch32]: correct usdot-product RTL patterns.

2021-12-21 Thread Tamar Christina via Gcc-patches
Hi All,

There was a bug in the ACLE specification for dot product which has now
been fixed[1].  This means some intrinsics were missing and are added by this
patch.
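
For reference, a scalar sketch of what the mixed-sign dot product computes per 32-bit lane, assuming the ACLE semantics (unsigned 8-bit elements times signed 8-bit elements, accumulated into a signed 32-bit lane). The helper names are illustrative and are not part of arm_neon.h.

```cpp
#include <array>
#include <cstdint>

// One 32-bit lane: four unsigned bytes times four signed bytes, summed
// into the accumulator.
constexpr std::int32_t usdot_lane(std::int32_t acc,
                                  const std::array<std::uint8_t, 4>& a,
                                  const std::array<std::int8_t, 4>& b) {
  for (int i = 0; i < 4; ++i)
    acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  return acc;
}

// For the "laneq" forms the index (0..3) selects one group of four bytes
// from a 16-byte vector; that group is then used for every output lane.
constexpr std::array<std::int8_t, 4>
select_quad(const std::array<std::int8_t, 16>& v, int index) {
  return std::array<std::int8_t, 4>{{v[4 * index], v[4 * index + 1],
                                     v[4 * index + 2], v[4 * index + 3]}};
}
```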

Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues.

Ok for master?

[1] https://github.com/ARM-software/acle/releases/tag/r2021Q3

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/arm_neon.h (vusdotq_s32, vusdot_laneq_s32,
vusdotq_laneq_s32, vsudot_laneq_s32, vsudotq_laneq_s32): New.
* config/arm/arm_neon_builtins.def (usdot): Add V16QI.
(usdot_laneq, sudot_laneq): New.
* config/arm/neon.md (neon_dot_laneq): New.
(neon_dot_lane): Remove unneeded code.

gcc/testsuite/ChangeLog:

* gcc.target/arm/simd/vdot-2-1.c: Add new tests.
* gcc.target/arm/simd/vdot-2-2.c: Likewise and fix output.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
af6ac63dc3b47830d92f199d93153ff510f658e9..2255d600549a2a1e5dbcebc03f7d6a63bab9f5aa
 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18930,6 +18930,13 @@ vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
   return __builtin_neon_usdotv8qi_ssus (__r, __a, __b);
 }
 
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
+{
+  return __builtin_neon_usdotv16qi_ssus (__r, __a, __b);
+}
+
 __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a,
@@ -18962,6 +18969,38 @@ vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a,
   return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdot_laneq_s32 (int32x2_t __r, uint8x8_t __a,
+ int8x16_t __b, const int __index)
+{
+  return __builtin_neon_usdot_laneqv8qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdotq_laneq_s32 (int32x4_t __r, uint8x16_t __a,
+  int8x16_t __b, const int __index)
+{
+  return __builtin_neon_usdot_laneqv16qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudot_laneq_s32 (int32x2_t __r, int8x8_t __a,
+ uint8x16_t __b, const int __index)
+{
+  return __builtin_neon_sudot_laneqv8qi_sssus (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudotq_laneq_s32 (int32x4_t __r, int8x16_t __a,
+  uint8x16_t __b, const int __index)
+{
+  return __builtin_neon_sudot_laneqv16qi_sssus (__r, __a, __b, __index);
+}
+
 #pragma GCC pop_options
 
 #pragma GCC pop_options
diff --git a/gcc/config/arm/arm_neon_builtins.def 
b/gcc/config/arm/arm_neon_builtins.def
index 
f83dd4327c16c0af68f72eb6d9ca8cf21e2e56b5..1c150ed3b650a003b44901b4d160a7d6f595f057
 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -345,9 +345,11 @@ VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
 VAR2 (MAC_LANE, sdot_laneq, v8qi, v16qi)
 VAR2 (UMAC_LANE, udot_laneq, v8qi, v16qi)
 
-VAR1 (USTERNOP, usdot, v8qi)
+VAR2 (USTERNOP, usdot, v8qi, v16qi)
 VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
 VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi)
+VAR2 (USMAC_LANE_QUADTUP, usdot_laneq, v8qi, v16qi)
+VAR2 (SUMAC_LANE_QUADTUP, sudot_laneq, v8qi, v16qi)
 
 VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
 VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
848166311b5f82c5facb66e97c2260a5aba5d302..1707d8e625079b83497a3db44db5e33405bb5fa1
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2977,9 +2977,33 @@ (define_insn "neon_dot_lane"
DOTPROD_I8MM)
  (match_operand:VCVTI 1 "register_operand" "0")))]
   "TARGET_I8MM"
+  "vdot.\\t%0, %2, %P3[%c4]"
+  [(set_attr "type" "neon_dot")]
+)
+
+;; These instructions map to the __builtins for the Dot Product
+;; indexed operations in the v8.6 I8MM extension.
+(define_insn "neon_dot_laneq"
+  [(set (match_operand:VCVTI 0 "register_operand" "=w")
+   (plus:VCVTI
+ (unspec:VCVTI [(match_operand: 2 "register_operand" "w")
+(match_operand:V16QI 3 "register_operand" "t")
+(match_operand:SI 4 "immediate_operand" "i")]
+DOTPROD_I8MM)
+ (match_operand:VCVTI 1 "register_operand" "0")))]
+  "TARGET_I8MM"
   {
-operands[4] = GEN_INT (INTVAL (operands[4]));
-return "vdot.\\t%0, %2, %P3[%c4]";
+int lane = INTVAL (operands[4]);
+if (lane > GET_MODE_NUNITS (V2SImode) - 1)
+  {
+   

[AArch32]: correct dot-product RTL patterns.

2021-12-21 Thread Tamar Christina via Gcc-patches
Hi All,

The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where intrinsics expects them.

The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.

However operand[3] is not expected to be written to so the current pattern is
bogus.

Instead we use the expand to shuffle around the RTL.

The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

This also fixes some issues with big-endian.  Each dot product performs four
8-bit multiplications.  However, unlike AArch64, AArch32 does not use GCC lane
indexing aside from loads/stores.  This means no lane remappings
are done in arm-builtins.c and so none should be done at the instruction side.

There are some other instructions that need inspection, as I think there are
more incorrect ones.

Third, there was a bug in the ACLE specification for dot product which has now been
fixed[1].  This means some intrinsics were missing and are added by this patch.

Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues.

Ok for master? and active branches after some stew?

[1] https://github.com/ARM-software/acle/releases/tag/r2021Q3

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/arm_neon.h (vdot_laneq_u32, vdotq_laneq_u32,
vdot_laneq_s32, vdotq_laneq_s32): New.
* config/arm/arm_neon_builtins.def (sdot_laneq, udot_laneq): New.
* config/arm/neon.md (neon_dot): New.
(dot_prod): Re-order rtl.
(neon_dot_lane): Fix rtl order and endianness.
(neon_dot_laneq): New.

gcc/testsuite/ChangeLog:

* gcc.target/arm/simd/vdot-compile.c: Add new cases.
* gcc.target/arm/simd/vdot-exec.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
3364b37f69dfc33082388246c03149d9ad66a634..af6ac63dc3b47830d92f199d93153ff510f658e9
 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18243,6 +18243,35 @@ vdotq_lane_s32 (int32x4_t __r, int8x16_t __a, int8x8_t 
__b, const int __index)
   return __builtin_neon_sdot_lanev16qi (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline uint32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_laneq_u32 (uint32x2_t __r, uint8x8_t __a, uint8x16_t __b, const int 
__index)
+{
+  return __builtin_neon_udot_laneqv8qi_s (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_laneq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b,
+   const int __index)
+{
+  return __builtin_neon_udot_laneqv16qi_s (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_laneq_s32 (int32x2_t __r, int8x8_t __a, int8x16_t __b, const int __index)
+{
+  return __builtin_neon_sdot_laneqv8qi (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_laneq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b, const int 
__index)
+{
+  return __builtin_neon_sdot_laneqv16qi (__r, __a, __b, __index);
+}
+
 #pragma GCC pop_options
 #endif
 
diff --git a/gcc/config/arm/arm_neon_builtins.def 
b/gcc/config/arm/arm_neon_builtins.def
index 
fafb5c6fc51c16679ead1afda7cccfea8264fd15..f83dd4327c16c0af68f72eb6d9ca8cf21e2e56b5
 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -342,6 +342,8 @@ VAR2 (TERNOP, sdot, v8qi, v16qi)
 VAR2 (UTERNOP, udot, v8qi, v16qi)
 VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
 VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
+VAR2 (MAC_LANE, sdot_laneq, v8qi, v16qi)
+VAR2 (UMAC_LANE, udot_laneq, v8qi, v16qi)
 
 VAR1 (USTERNOP, usdot, v8qi)
 VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
8b0a396947cc8e7345f178b926128d7224fb218a..848166311b5f82c5facb66e97c2260a5aba5d302
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2866,20 +2866,49 @@ (define_expand "cmul3"
 })
 
 
-;; These instructions map to the __builtins for the Dot Product operations.
-(define_insn "neon_dot"
+;; These map to the auto-vectorizer Dot Product optab.
+;; The auto-vectorizer expects a dot product builtin that also does an
+;; accumulation into the provided register.
+;; Given the following pattern
+;;
+;; for (i=0; idot_prod"
   [(set (match_operand:VCVTI 0 "register_operand" "=w")
-   (plus:VCVTI (match_operand:VCVTI 1 "register_operand" "0")
-   (unspec:VCVTI [(match_operand: 2
-   "register_operand" "w")
-  (match_operand: 3
- 

[PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.

2021-12-21 Thread Roger Sayle

My apologies for the inconvenience.  The new support for -Oz using
push/pop for small integer constants on x86_64 is only a win/correct
for loading registers.  Fixed by adding !MEM_P tests in the appropriate
locations.
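
A value-level sketch of the wrong-code case, as I read the testcase: a 4-byte store into an 8-byte object must leave the upper half alone, but a 64-bit pop into memory overwrites all 8 bytes. The names are illustrative, and use_push_pop only mirrors the shape of the guarded condition in the patch.

```cpp
#include <cstdint>

// Intended effect of a 4-byte store into an 8-byte little-endian object:
// only the low 32 bits change.
constexpr std::uint64_t store32(std::uint64_t old, std::int32_t v) {
  return (old & 0xffffffff00000000ull) | static_cast<std::uint32_t>(v);
}

// Effect of "pushq $imm8; popq mem": the sign-extended value replaces
// the whole 8-byte object.
constexpr std::uint64_t pop64(std::uint64_t /*old*/, std::int8_t v) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}

// Shape of the fixed condition: small constant, -Oz, register destination.
constexpr bool use_push_pop(int optimize_size, bool is_const_int,
                            long value, bool dest_is_mem) {
  return optimize_size > 1 && is_const_int
         && value >= -128 && value <= 127 && !dest_is_mem;
}
```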

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2021-12-21  Roger Sayle  

gcc/ChangeLog
PR target/103773
* config/i386/i386.md (*movdi_internal): Only use short
push/pop sequence for register (non-memory) destinations.
(*movsi_internal): Likewise.

gcc/testsuite/ChangeLog
PR target/103773
* gcc.target/i386/pr103773.c: New test case.

Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d25453f..e596f8b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2217,7 +2217,8 @@
  if (optimize_size > 1
  && TARGET_64BIT
  && CONST_INT_P (operands[1])
- && IN_RANGE (INTVAL (operands[1]), -128, 127))
+ && IN_RANGE (INTVAL (operands[1]), -128, 127)
+ && !MEM_P (operands[0]))
return "push{q}\t%1\n\tpop{q}\t%0";
  return "mov{l}\t{%k1, %k0|%k0, %k1}";
}
@@ -2440,7 +2441,8 @@
return "lea{l}\t{%E1, %0|%0, %E1}";
   else if (optimize_size > 1
   && CONST_INT_P (operands[1])
-  && IN_RANGE (INTVAL (operands[1]), -128, 127))
+  && IN_RANGE (INTVAL (operands[1]), -128, 127)
+  && !MEM_P (operands[0]))
{
  if (TARGET_64BIT)
return "push{q}\t%1\n\tpop{q}\t%q0";
diff --git a/gcc/testsuite/gcc.target/i386/pr103773.c 
b/gcc/testsuite/gcc.target/i386/pr103773.c
new file mode 100644
index 0000000..1e4b8ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103773.c
@@ -0,0 +1,12 @@
+/* { dg-do run } */
+/* { dg-options "-Oz" } */
+
+unsigned long long x;
+
+int main (void)
+{
+  __builtin_memset (&x, 0xff, 4);
+  if (x != 0xffffffff)
+__builtin_abort ();
+  return 0;
+}


[PATCH] nvptx: bump default to PTX 4.1

2021-12-21 Thread Andrew Stubbs

On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is necessary 
to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014).


Tobias has pointed out, privately, that the default version is both 
documented and encoded in the -mptx option, so I need to fix that too.


This patch adds -mptx=4.1, sets it as the default, and updates the 
documentation accordingly.


The -mptx=3.1 option is kept for backwards compatibility as an alias for 
4.1. There's no point in actually allowing 3.1 as any program linked 
against libgomp will fail (and that's all offloading programs).
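
The aliasing can be pictured with a small sketch. The enum and parser below are hypothetical and are not the generated option-handling code; they only show "3.1" being accepted and mapped to the 4.1 value.

```cpp
#include <optional>
#include <string_view>

// Hypothetical mirror of the version enum after the patch.
enum class ptx_version { v4_1, v6_3, v7_0 };

// Map an -mptx= argument to a version. "3.1" is still accepted, but only
// as an alias for 4.1.
constexpr std::optional<ptx_version> parse_ptx_version(std::string_view s) {
  if (s == "3.1" || s == "4.1")
    return ptx_version::v4_1;
  if (s == "6.3")
    return ptx_version::v6_3;
  if (s == "7.0")
    return ptx_version::v7_0;
  return std::nullopt;  // unknown version string
}
```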


OK for stage 1?

Andrew
nvptx: bump default to PTX 4.1

gcc/ChangeLog:

* config/nvptx/nvptx-opts.h (ptx_version): Change PTX_VERSION_3_1 to
PTX_VERSION_4_1.
* config/nvptx/nvptx.c (nvptx_file_start): Bump minimum PTX version
to 4.1.
* config/nvptx/nvptx.opt (ptx_version): Add 4.1. Change default.
* doc/invoke.texi: -mptx default is now 4.1.

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index 7b6ecd42fed..c7fcf1c8cd4 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -31,7 +31,7 @@ enum ptx_isa
 
 enum ptx_version
 {
-  PTX_VERSION_3_1,
+  PTX_VERSION_4_1,
   PTX_VERSION_6_3,
   PTX_VERSION_7_0
 };
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index ff44d9fdbef..9bc26d7de0c 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5409,7 +5409,7 @@ nvptx_file_start (void)
   else if (TARGET_PTX_6_3)
 fputs ("\t.version\t6.3\n", asm_out_file);
   else
-fputs ("\t.version\t3.1\n", asm_out_file);
+fputs ("\t.version\t4.1\n", asm_out_file);
   if (TARGET_SM80)
 fputs ("\t.target\tsm_80\n", asm_out_file);
   else if (TARGET_SM75)
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 1d88ef18d04..9e6a6a7fbff 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -79,8 +79,12 @@ Enum
 Name(ptx_version) Type(int)
 Known PTX versions (for use with the -mptx= option):
 
+; Keep 3.1 for backwards compatibility only
 EnumValue
-Enum(ptx_version) String(3.1) Value(PTX_VERSION_3_1)
+Enum(ptx_version) String(3.1) Value(PTX_VERSION_4_1)
+
+EnumValue
+Enum(ptx_version) String(4.1) Value(PTX_VERSION_4_1)
 
 EnumValue
 Enum(ptx_version) String(6.3) Value(PTX_VERSION_6_3)
@@ -89,5 +93,5 @@ EnumValue
 Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
 
 mptx=
-Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_3_1)
+Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_4_1)
 Specify the version of the ptx version to use.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 37836a7d614..92f0da62a2b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27056,8 +27056,8 @@ strings must be lower-case.  Valid ISA strings include @samp{sm_30} and
 @item -mptx=@var{version-string}
 @opindex mptx
 Generate code for given the specified PTX version (e.g.@: @samp{6.3}).
-Valid version strings include @samp{3.1} and @samp{6.3}.  The default PTX
-version is 3.1.
+Valid version strings include @samp{4.1} and @samp{6.3}.  The default PTX
+version is 4.1.
 
 @item -mmainkernel
 @opindex mmainkernel
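
The option handling described in the message above (3.1 accepted as an alias, 4.1 as the new default) can be sketched in plain C as follows. This is only an illustrative model, not GCC's option machinery: the table, `resolve_mptx` and `ptx_version_directive` are hypothetical names, and only the enum values and version strings come from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the enum in nvptx-opts.h after the patch.  */
enum ptx_version
{
  PTX_VERSION_4_1,
  PTX_VERSION_6_3,
  PTX_VERSION_7_0
};

/* Hypothetical table standing in for the EnumValue records in nvptx.opt.  */
struct ptx_version_entry
{
  const char *string;
  enum ptx_version value;
};

static const struct ptx_version_entry ptx_version_table[] = {
  { "3.1", PTX_VERSION_4_1 },  /* Kept for backwards compatibility only.  */
  { "4.1", PTX_VERSION_4_1 },
  { "6.3", PTX_VERSION_6_3 },
  { "7.0", PTX_VERSION_7_0 },
};

/* Hypothetical helper: resolve the argument of -mptx=.  A null ARG models
   the option being absent, which selects the new default.  Returns 0 on
   success and -1 for an unknown version string.  */
int
resolve_mptx (const char *arg, enum ptx_version *out)
{
  size_t n = sizeof ptx_version_table / sizeof ptx_version_table[0];

  if (arg == NULL)
    {
      *out = PTX_VERSION_4_1;
      return 0;
    }
  for (size_t i = 0; i < n; i++)
    if (strcmp (arg, ptx_version_table[i].string) == 0)
      {
        *out = ptx_version_table[i].value;
        return 0;
      }
  return -1;
}

/* Hypothetical helper: the .version directive written for each value.  */
const char *
ptx_version_directive (enum ptx_version version)
{
  switch (version)
    {
    case PTX_VERSION_7_0:
      return "\t.version\t7.0\n";
    case PTX_VERSION_6_3:
      return "\t.version\t6.3\n";
    default:
      return "\t.version\t4.1\n";
    }
}
```

Because "3.1" and "4.1" map to the same enumerator, no code path in this model can emit a 3.1 directive any more, which matches the intent stated in the message.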


Re: [PATCH v3 2/2][GCC] arm: Declare MVE types internally via pragma

2021-12-21 Thread Murray Steele via Gcc-patches
Hi,


I'd like to ping this patch revision [1]. 

Thanks,
Murray

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586476.html

---

On 09/12/2021 15:24, Murray Steele via Gcc-patches wrote:
> Changes from original patch:
> 
> 1. Make mentioned changes to changelog.
> 2. Add namespace-end comments.
> 3. Add #error for when arm-mve-builtins.def is included without
>    defining DEF_MVE_TYPE.
> 4. Make placement of '#undef DEF_MVE_TYPE' consistent.
> 
> ---
> 
> This patch moves the implementation of MVE ACLE types from
> arm_mve_types.h to inside GCC via a new pragma, which replaces the prior
> type definitions. This allows for the types to be used internally for
> intrinsic function definitions.
> 
> Bootstrapped and regression tested on arm-none-linux-gnuabihf, and
> regression tested on arm-eabi -- no issues.
> 
> Thanks,
> Murray
> 
> gcc/ChangeLog:
> 
> * config.gcc: Add arm-mve-builtins.o to extra_objs.
> * config/arm/arm-c.c (arm_pragma_arm): Handle "#pragma GCC arm".
> (arm_register_target_pragmas): Register it.
> * config/arm/arm-protos.h: (arm_mve::arm_handle_mve_types_h): New
> prototype.
> * config/arm/arm_mve_types.h: Replace MVE type definitions with
> new pragma.
> * config/arm/t-arm: (arm-mve-builtins.o): New target rule.
> * config/arm/arm-mve-builtins.cc: New file.
> * config/arm/arm-mve-builtins.def: New file.
> * config/arm/arm-mve-builtins.h: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/arm/mve/mve.exp: Add new subdirectories.
> * gcc.target/arm/mve/general-c/type_redef_1.c: New test.
> * gcc.target/arm/mve/general/double_pragmas_1.c: New test.
> * gcc.target/arm/mve/general/nomve_1.c: New test.


[PATCH] [PATCH] Fix [11/12 regression] error: 'fenv_t' has not been declared in '::' -- canadian compilation fails [PR100017]

2021-12-21 Thread cqwrteur via Gcc-patches
libstdc++ cannot find fenv_t from fenv.h when doing a Canadian cross build.
Fix it by adding the -nostdinc++ flag to RAW_CXX_FOR_TARGET in configure and
configure.ac.

Backported from master
---
 configure    | 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 504f6410274..0c53c986c11 100755
--- a/configure
+++ b/configure
@@ -16478,7 +16478,7 @@ else
 fi
 
 
-RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET"
+RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET -nostdinc++"
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target ar" >&5
 $as_echo_n "checking where to find the target ar... " >&6; }
diff --git a/configure.ac b/configure.ac
index 088e735c5db..ee0953d53ae 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3520,7 +3520,7 @@ ACX_CHECK_INSTALLED_TARGET_TOOL(STRIP_FOR_TARGET, strip)
 ACX_CHECK_INSTALLED_TARGET_TOOL(WINDRES_FOR_TARGET, windres)
 ACX_CHECK_INSTALLED_TARGET_TOOL(WINDMC_FOR_TARGET, windmc)
 
-RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET"
+RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET -nostdinc++"
 
 GCC_TARGET_TOOL(ar, AR_FOR_TARGET, AR, [binutils/ar])
 GCC_TARGET_TOOL(as, AS_FOR_TARGET, AS, [gas/as-new])
-- 
2.25.1



[PATCH] Fix [11/12 regression] error: 'fenv_t' has not been declared in '::' -- canadian compilation fails [PR100017]

2021-12-21 Thread cqwrteur via Gcc-patches
From: expnkx 

libstdc++ cannot find fenv_t from fenv.h when doing a Canadian cross build.
Fix it by adding the -nostdinc++ flag to RAW_CXX_FOR_TARGET in configure and
configure.ac.
---
 configure    | 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 2e2dcd0ab30..f4c746e7733 100755
--- a/configure
+++ b/configure
@@ -17285,7 +17285,7 @@ else
 fi
 
 
-RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET"
+RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET -nostdinc++"
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target ar" >&5
 $as_echo_n "checking where to find the target ar... " >&6; }
diff --git a/configure.ac b/configure.ac
index 68cc5cc31fe..baab3b02e2e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3636,7 +3636,7 @@ ACX_CHECK_INSTALLED_TARGET_TOOL(STRIP_FOR_TARGET, strip)
 ACX_CHECK_INSTALLED_TARGET_TOOL(WINDRES_FOR_TARGET, windres)
 ACX_CHECK_INSTALLED_TARGET_TOOL(WINDMC_FOR_TARGET, windmc)
 
-RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET"
+RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET -nostdinc++"
 
 GCC_TARGET_TOOL(ar, AR_FOR_TARGET, AR, [binutils/ar])
 GCC_TARGET_TOOL(as, AS_FOR_TARGET, AS, [gas/as-new])
-- 
2.25.1



[PATCH, rs6000] Fix ICE on expand bcd__ [PR100736]

2021-12-21 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch fixes the ICE in PR100736. It reverses the comparison condition
when the condition code can be reversed and finite-math-only is set.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2021-12-20 Haochen Gui 

gcc/
* config/rs6000/altivec.md (bcd_test_): Named.
(bcd__): Reverse compare condition if
finite-math-only is set.
* config/rs6000/rs6000-protos.h (rs6000_reverse_compare): Declare.
* config/rs6000/rs6000.c (rs6000_emit_sCOND): Refactored.  Call
rs6000_reverse_compare if the condition code can be reversed.
(rs6000_reverse_compare): Implemented.

gcc/testsuite/
* gcc.target/powerpc/pr100736.h: New.
* gcc.target/powerpc/pr100736.finite.c: New.
* gcc.target/powerpc/pr100736.infinite.c: New.


patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ef432112333..cc40edc5381 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -4412,7 +4412,7 @@ (define_insn "bcd_"
 ;; UNORDERED test on an integer type (like V1TImode) is not defined.  The type
 ;; probably should be one that can go in the VMX (Altivec) registers, so we
 ;; can't use DDmode or DFmode.
-(define_insn "*bcd_test_"
+(define_insn "bcd_test_"
   [(set (reg:CCFP CR6_REGNO)
(compare:CCFP
 (unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")
@@ -4539,6 +4539,18 @@ (define_expand "bcd__"
   "TARGET_P8_VECTOR"
 {
   operands[4] = CONST0_RTX (V2DFmode);
+  emit_insn (gen_bcd_test_ (operands[0], operands[1],
+  operands[2], operands[3],
+  operands[4]));
+  rtx cr6 = gen_rtx_REG (CCFPmode, CR6_REGNO);
+  rtx condition_rtx = gen_rtx_ (SImode, cr6, const0_rtx);
+  if (flag_finite_math_only)
+{
+  condition_rtx = rs6000_reverse_compare (condition_rtx);
+  PUT_MODE (condition_rtx, SImode);
+}
+  emit_insn (gen_rtx_SET (operands[0], condition_rtx));
+  DONE;
 })

 (define_insn "*bcdinvalid_"
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..9b93e26bec2 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -114,6 +114,7 @@ extern enum rtx_code rs6000_reverse_condition (machine_mode,
 extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
 extern rtx rs6000_emit_fp_cror (rtx_code, machine_mode, rtx);
 extern void rs6000_emit_sCOND (machine_mode, rtx[]);
+extern rtx rs6000_reverse_compare (rtx);
 extern void rs6000_emit_cbranch (machine_mode, rtx[]);
 extern char * output_cbranch (rtx, const char *, int, rtx_insn *);
 extern const char * output_probe_stack_range (rtx, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5e129986516..2f3dd311396 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15653,19 +15653,14 @@ rs6000_emit_fp_cror (rtx_code code, machine_mode mode, rtx x)
   return cc;
 }

-void
-rs6000_emit_sCOND (machine_mode mode, rtx operands[])
+rtx
+rs6000_reverse_compare (rtx condition_rtx)
 {
-  rtx condition_rtx = rs6000_generate_compare (operands[1], mode);
   rtx_code cond_code = GET_CODE (condition_rtx);
-
-  if (FLOAT_MODE_P (mode) && HONOR_NANS (mode)
-  && !(FLOAT128_VECTOR_P (mode) && !TARGET_FLOAT128_HW))
-;
-  else if (cond_code == NE
-  || cond_code == GE || cond_code == LE
-  || cond_code == GEU || cond_code == LEU
-  || cond_code == ORDERED || cond_code == UNGE || cond_code == UNLE)
+  if (cond_code == NE
+  || cond_code == GE || cond_code == LE
+  || cond_code == GEU || cond_code == LEU
+  || cond_code == ORDERED || cond_code == UNGE || cond_code == UNLE)
 {
   rtx not_result = gen_reg_rtx (CCEQmode);
   rtx not_op, rev_cond_rtx;
@@ -15679,6 +15674,19 @@ rs6000_emit_sCOND (machine_mode mode, rtx operands[])
   emit_insn (gen_rtx_SET (not_result, not_op));
   condition_rtx = gen_rtx_EQ (VOIDmode, not_result, const0_rtx);
 }
+  return condition_rtx;
+}
+
+void
+rs6000_emit_sCOND (machine_mode mode, rtx operands[])
+{
+  rtx condition_rtx = rs6000_generate_compare (operands[1], mode);
+
+  if (FLOAT_MODE_P (mode) && HONOR_NANS (mode)
+  && !(FLOAT128_VECTOR_P (mode) && !TARGET_FLOAT128_HW))
+  ;
+  else
+condition_rtx = rs6000_reverse_compare (condition_rtx);

   machine_mode op_mode = GET_MODE (XEXP (operands[1], 0));
   if (op_mode == VOIDmode)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100736.finite.c b/gcc/testsuite/gcc.target/powerpc/pr100736.finite.c
new file mode 100644
index 000..43c85d4a2c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100736.finite.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 

Re: [gcc r12-6020] Fixed typo

2021-12-21 Thread Martin Liška

On 12/20/21 19:19, Rainer Orth wrote:

please make config.sub executable again.  This has been lost with the
update.


Hello.

Sure, sorry for the unintended change. Fixed in 
r12-6088-g6fad101f3063d722e3348d07dc93cf737f8709e4.

Martin


[PATCH v3, rs6000] Implement mffscrni pattern

2021-12-21 Thread HAO CHEN GUI via Gcc-patches
Hi,
  I modified the patch according to the reviewers' advice.

  This patch defines a pattern for mffscrni. If the RN is a constant, the
expander can call gen_rs6000_mffscrni directly. "rs6000-builtin-new.def"
defines the prototypes for builtin arguments, so the pattern
"rs6000_set_fpscr_rn" was broken: the mode of its argument is DI while the
corresponding builtin has a const int argument. The patch fixes that too.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2021-12-21 Haochen Gui 

gcc/
* config/rs6000/rs6000-call.c
(rs6000_expand_set_fpscr_rn_builtin): Do not copy the argument to a reg
if it's a constant.  The pattern for a constant can be recognized now.
* config/rs6000/rs6000.md (rs6000_mffscrni): Defined.
(rs6000_set_fpscr_rn): Change the type of operand[0] from DI to SI.
Call gen_rs6000_mffscrni when operand[0] is a const int in [0,3].

gcc/testsuite/
* gcc.target/powerpc/mffscrni_p9.c: New testcase for mffscrni.
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Modify the test cases to
test mffscrn and mffscrni separately.
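
The selection logic described above for the TARGET_P9_MISC path can be modelled in portable C. This is a hedged sketch, not the expander itself: `rn_choice`, `rn_insn` and `choose_rn_insn` are hypothetical names. It assumes, as the diff below indicates, that a constant outside 0..3 is rejected with an error, a constant inside 0..3 selects the immediate form, and a variable uses the register form with only its two least significant bits.

```c
#include <stdbool.h>

/* Hypothetical labels for the three possible outcomes.  */
enum rn_insn
{
  RN_ERROR,     /* Constant argument outside 0..3.  */
  RN_MFFSCRNI,  /* Immediate form, for constants in 0..3.  */
  RN_MFFSCRN    /* Register form, for non-constant arguments.  */
};

struct rn_choice
{
  enum rn_insn insn;
  unsigned rn;  /* Rounding mode bits that end up selected.  */
};

/* Hypothetical model of how the rounding-mode builtin is expanded.
   IS_CONSTANT says whether VALUE is known at compile time.  */
struct rn_choice
choose_rn_insn (bool is_constant, long value)
{
  struct rn_choice choice = { RN_ERROR, 0 };

  if (is_constant)
    {
      if (value >= 0 && value <= 3)
        {
          choice.insn = RN_MFFSCRNI;
          choice.rn = (unsigned) value;
        }
      return choice;
    }

  /* Variable argument: only the two least significant bits matter.  */
  choice.insn = RN_MFFSCRN;
  choice.rn = (unsigned) (value & 3);
  return choice;
}
```

In this model `choose_rn_insn (true, 2)` picks the immediate form, which corresponds to what the new mffscrni_p9.c test expects once `val` has been constant-propagated at -O2.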


patch.diff
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index d9736eaf21c..81261a0f24d 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -9610,13 +9610,15 @@ rs6000_expand_set_fpscr_rn_builtin (enum insn_code icode, tree exp)
  compile time if the argument is a variable.  The least significant two
  bits of the argument, regardless of type, are used to set the rounding
  mode.  All other bits are ignored.  */
-  if (CONST_INT_P (op0) && !const_0_to_3_operand(op0, VOIDmode))
+  if (CONST_INT_P (op0))
 {
-  error ("Argument must be a value between 0 and 3.");
-  return const0_rtx;
+  if (!const_0_to_3_operand (op0, VOIDmode))
+   {
+ error ("Argument must be a value between 0 and 3.");
+ return const0_rtx;
+   }
 }
-
-  if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
+  else if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
 op0 = copy_to_mode_reg (mode0, op0);

   pat = GEN_FCN (icode) (op0);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..452a77f2033 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6315,6 +6315,14 @@ (define_insn "rs6000_mffscrn"
"mffscrn %0,%1"
   [(set_attr "type" "fp")])

+(define_insn "rs6000_mffscrni"
+  [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
+   (unspec_volatile:DF [(match_operand:SI 1 "const_0_to_3_operand" "n")]
+   UNSPECV_MFFSCRN))]
+   "TARGET_P9_MISC"
+   "mffscrni %0,%1"
+  [(set_attr "type" "fp")])
+
 (define_insn "rs6000_mffscdrn"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
(unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSCDRN))
@@ -6324,7 +6332,7 @@ (define_insn "rs6000_mffscdrn"
   [(set_attr "type" "fp")])

 (define_expand "rs6000_set_fpscr_rn"
- [(match_operand:DI 0 "reg_or_cint_operand")]
+ [(match_operand:SI 0 "reg_or_cint_operand")]
   "TARGET_HARD_FLOAT"
 {
   rtx tmp_df = gen_reg_rtx (DFmode);
@@ -6333,9 +6341,14 @@ (define_expand "rs6000_set_fpscr_rn"
  new rounding mode bits from operands[0][62:63] into FPSCR[62:63].  */
   if (TARGET_P9_MISC)
 {
-  rtx src_df = force_reg (DImode, operands[0]);
-  src_df = simplify_gen_subreg (DFmode, src_df, DImode, 0);
-  emit_insn (gen_rs6000_mffscrn (tmp_df, src_df));
+  if (const_0_to_3_operand (operands[0], VOIDmode))
+   emit_insn (gen_rs6000_mffscrni (tmp_df, operands[0]));
+  else
+   {
+ rtx op0 = convert_to_mode (DImode, operands[0], false);
+ rtx src_df = simplify_gen_subreg (DFmode, op0, DImode, 0);
+ emit_insn (gen_rs6000_mffscrn (tmp_df, src_df));
+   }
   DONE;
 }

@@ -6357,7 +6370,8 @@ (define_expand "rs6000_set_fpscr_rn"
   rtx tmp_di = gen_reg_rtx (DImode);

   /* Extract new RN mode from operand.  */
-  emit_insn (gen_anddi3 (tmp_rn, operands[0], GEN_INT (0x3)));
+  rtx op0 = convert_to_mode (DImode, operands[0], false);
+  emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3)));

   /* Insert new RN mode into FSCPR.  */
   emit_insn (gen_rs6000_mffs (tmp_df));
diff --git a/gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c b/gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c
new file mode 100644
index 000..d97c6db8002
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-times {\mmffscrni\M} 1 } } */
+
+void foo ()
+{
+  int val = 2;
+  __builtin_set_fpscr_rn (val);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin.c
index 0d0d3f0f96b..04707ad8a56