[PATCH] arc: Fix for new ifcvt behavior [PR104154]

2022-02-20 Thread Robin Dapp via Gcc-patches
Hi,

I figured I'd just go ahead and post this patch as well since it seems
to have fixed the arc build problems.

It would be nice if someone could bootstrap/regtest if Jeff hasn't
already done so.  I was able to verify that the two testcases attached
to the PR build cleanly but not much more.  Thank you.

Regards
 Robin

--

PR104154

gcc/ChangeLog:

* config/arc/arc.cc (gen_compare_reg):  Return the CC-mode
comparison ifcvt passed us.

---

From fa98a40abd55e3a10653f6a8c5b2414a2025103b Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Mon, 7 Feb 2022 08:39:41 +0100
Subject: [PATCH] arc: Fix for new ifcvt behavior [PR104154]

ifcvt now passes a CC-mode "comparison" to backends.  This patch
simply returns from gen_compare_reg () in that case since nothing
needs to be prepared anymore.

PR104154

gcc/ChangeLog:

* config/arc/arc.cc (gen_compare_reg):  Return the CC-mode
comparison ifcvt passed us.
---
 gcc/config/arc/arc.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 8cc173519ab..5e40ec2c04d 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -2254,6 +2254,12 @@ gen_compare_reg (rtx comparison, machine_mode omode)


   cmode = GET_MODE (x);
+
+  /* If ifcvt passed us a MODE_CC comparison we can
+     just return it.  It should be in the proper form already.   */
+  if (GET_MODE_CLASS (cmode) == MODE_CC)
+    return comparison;
+
   if (cmode == VOIDmode)
     cmode = GET_MODE (y);
   gcc_assert (cmode == SImode || cmode == SFmode || cmode == DFmode);
-- 
2.31.1



Re: [RFC][nvptx] Initialize ptx regs

2022-02-20 Thread Richard Biener via Gcc-patches
On Sun, Feb 20, 2022 at 11:50 PM Tom de Vries via Gcc-patches
 wrote:
>
> Hi,
>
> With nvptx target, driver version 510.47.03 and board GT 1030, we run into:
> ...
> FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test
> FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test
> FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
> ...
> while the test-cases pass with nvptx-none-run -O0.
>
> The problem is that the generated ptx contains a read from an uninitialized
> ptx register, and the driver JIT doesn't handle this well.
>
> For -O2 and -O3, we can get rid of the FAIL using --param
> logical-op-non-short-circuit=0.  But not for -O1.
>
> At -O1, the test-case minimizes to:
> ...
> void __attribute__((noinline, noclone))
> foo (int y) {
>   int c;
>   for (int i = 0; i < y; i++)
>     {
>       int d = i + 1;
>       if (i && d <= c)
>         __builtin_abort ();
>       c = d;
>     }
> }
>
> int main () {
>   foo (2); return 0;
> }
> ...
>
> Note that the test-case does not contain an uninitialized use.  In the first
> iteration, i is 0 and consequently c is not read.  In the second iteration, c
> is read, but by that time it's already initialized by 'c = d' from the first
> iteration.
>
> AFAICT the problem is introduced as follows: the conditional use of c in the
> loop body is translated into an unconditional use of c in the loop header:
> ...
>   # c_1 = PHI 
> ...
> which forwprop1 propagates the 'c_9 = d_7' assignment into:
> ...
>   # c_1 = PHI 
> ...
> which ends up being translated by expand into an unconditional:
> ...
> (insn 13 12 0 (set (reg/v:SI 22 [ c ])
> (reg/v:SI 23 [ d ])) -1
>  (nil))
> ...
> at the start of the loop body, creating an uninitialized read of d on the
> path from loop entry.

Ah, interesting case.  Note that some fixup pass inserted a copy in
the loop header
before coalescing:

;;   basic block 3, loop depth 1
;;pred:   6
;;2
  # c_10 = PHI 
  # i_11 = PHI 
  c_2 = c_10;   <--- this one
  i_8 = i_11;
  d_6 = i_11 + 1;
  if (i_8 != 0)
goto ; [64.00%]
  else
goto ; [36.00%]
;;succ:   4
;;6

;;   basic block 4, loop depth 1
;;pred:   3
  if (d_6 <= c_2)
goto ; [0.00%]
  else
goto ; [100.00%]

We try to coalesce both c_10 with d_6 and i_11 with d_6.  Both have the same
cost, I think, and we succeed with the first, which happens to be the one with
the default-def argument.

I also think that whether we coalesce or not doesn't really matter for the
issue at hand: the copy on entry should be elided anyway.  But the odd inserted
copy should be investigated (it looks unnecessary, and it should be placed
before the single use, not in the header).

> By disabling coalesce_ssa_name, we get the more usual copies on the incoming
> edges.  The copy on the loop entry path still does an uninitialized read, but
> that one's now initialized by init-regs.  The test-case passes, also when
> disabling init-regs, so it's possible that the JIT driver doesn't object to
> this type of uninitialized read.
>
> Now that we characterized the problem to some degree, we need to fix this,
> because either:
> - we're violating an undocumented ptx invariant, and this is a compiler bug,
>   or
> - this is a driver JIT bug and we need to work around it.

So what does the JIT do that ends up breaking things?  Does the
actual hardware ISA have NaTs and trap?

> There are essentially two strategies to address this:
> - stop the compiler from creating uninitialized reads
> - patch up uninitialized reads using additional initialization
>
> The former will probably involve:
> - making some optimizations more conservative in the presence of
>   uninitialized reads, and
> - disabling some other optimizations (where making them more conservative is
>   not possible, or cannot easily be achieved).
> This will probably have a cost penalty for code that does not suffer from
> the original problem.
>
> The latter has the problem that it may paper over uninitialized reads
> in the source code, or indeed over ones that were incorrectly introduced
> by the compiler.  But it has the advantage that it allows for the problem to
> be addressed at a single location.

There are some long-standing bugs in Bugzilla regarding uninitialized uses:
we treat them as invoking undefined behavior, but we also introduce them
ourselves in some places.  We of course can't do both, so I think we do need
to find a way to fix things without introducing too many optimization
regressions.

You've identified the most obvious candidate already - logical-op
short-circuiting.

I guess that the PTX JIT is fine with uninitialized memory, so one issue
is that we can end up turning uninitialized memory into uninitialized
registers (not changing the point of execution, though).  If the JIT
breaks on that, you will need to fix it up in reorg like you do.

> There's an existing pass, init-regs, which implements a form of the latter,
> 

[PATCH] x86: Fix -fsplit-stack feature detection via TARGET_CAN_SPLIT_STACK

2022-02-20 Thread soeren--- via Gcc-patches
From: Sören Tempel 

Since commit c163647ffbc9a20c8feb6e079dbecccfe016c82e, -fsplit-stack
is only supported on glibc targets. However, this original commit
required some fixups. As part of the fixup, the changes to
gnu-user-common.h and gnu.h were partially reverted in commit
60953a23d57b13a672f751bec0c6eefc059eb1ab, thus causing TARGET_CAN_SPLIT_STACK
to be defined for non-glibc targets even though -fsplit-stack is
actually not supported and attempting to use it causes a runtime error.

This causes gcc internal code, such as ./gcc/go/gospec.cc, to not
correctly detect that -fsplit-stack is not supported and thus causes
gccgo to fail compilation on non-glibc targets.

This commit ensures that TARGET_CAN_SPLIT_STACK is set based on the
changes performed in 2c31a8be4a5db11a0a0e97c366dded6362421086, i.e.
the new OPTION_GLIBC_P macro is now used to detect if -fsplit-stack is
supported in the x86 header files.
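
For reviewers, here is a minimal, self-contained sketch of the shape of the
guard described above. The two macro definitions at the top are stand-ins
chosen for illustration only (in GCC they come from configure and the target
headers), and can_split_stack is a hypothetical helper, not an existing GCC
function.

```c
/* Stand-ins for illustration only; these are NOT the real GCC
   definitions.  */
#define HAVE_GAS_CFI_PERSONALITY_DIRECTIVE 1
#define OPTION_GLIBC_P(opts) ((opts) == 1)

/* Same shape as the guard in the patch.  Note that defined() only tests
   whether the macro name exists at this point in the header; it does not
   evaluate OPTION_GLIBC_P for any particular set of options.  */
#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE && defined(OPTION_GLIBC_P)
#define TARGET_CAN_SPLIT_STACK
#endif

/* Hypothetical helper: reports whether the guard above defined
   TARGET_CAN_SPLIT_STACK, which is the kind of check a consumer of the
   macro could perform.  */
int
can_split_stack (void)
{
#ifdef TARGET_CAN_SPLIT_STACK
  return 1;
#else
  return 0;
#endif
}
```

With both stand-ins defined as above, can_split_stack () returns 1; removing
either definition makes it return 0. Whether the real guard excludes a given
libc therefore depends on which headers define OPTION_GLIBC_P, which this
sketch does not attempt to model.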

The proposed changes have been tested on x86_64 Alpine Linux (which uses
musl libc) and fix compilation of gccgo for this target.

Signed-off-by: Sören Tempel 

gcc/ChangeLog:

* config/i386/gnu-user-common.h (defined): Only define
TARGET_CAN_SPLIT_STACK for glibc targets.
* config/i386/gnu.h (defined): Ditto.
---
I hope this is the last fixup commit needed for the original -fsplit-stack
change. Apologies that the integration was a bit messy; I am simply not
deeply familiar with the gcc code base. The change works fine on Alpine
and fixes gccgo compilation, but please review it carefully and make sure
the macro guards for TARGET_CAN_SPLIT_STACK are aligned with the
ix86_supports_split_stack implementation.

Should we also check if TARGET_THREAD_SPLIT_STACK_OFFSET is defined?

 gcc/config/i386/gnu-user-common.h | 5 +++--
 gcc/config/i386/gnu.h | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/gnu-user-common.h b/gcc/config/i386/gnu-user-common.h
index 23b54c5be52..bb745c2edaa 100644
--- a/gcc/config/i386/gnu-user-common.h
+++ b/gcc/config/i386/gnu-user-common.h
@@ -66,7 +66,8 @@ along with GCC; see the file COPYING3.  If not see
 #define STACK_CHECK_STATIC_BUILTIN 1
 
 /* We only build the -fsplit-stack support in libgcc if the
-   assembler has full support for the CFI directives.  */
-#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
+   assembler has full support for the CFI directives.  Also
+   we only support -fsplit-stack on glibc targets.  */
+#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE && defined(OPTION_GLIBC_P)
 #define TARGET_CAN_SPLIT_STACK
 #endif
diff --git a/gcc/config/i386/gnu.h b/gcc/config/i386/gnu.h
index 401e60c9a02..92a29149b2e 100644
--- a/gcc/config/i386/gnu.h
+++ b/gcc/config/i386/gnu.h
@@ -41,8 +41,9 @@ along with GCC.  If not, see .
 #define TARGET_THREAD_SSP_OFFSET        0x14
 
 /* We only build the -fsplit-stack support in libgcc if the
-   assembler has full support for the CFI directives.  */
-#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
+   assembler has full support for the CFI directives.  Also
+   we only support -fsplit-stack on glibc targets.  */
+#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE && defined(OPTION_GLIBC_P)
 #define TARGET_CAN_SPLIT_STACK
 #endif
 /* We steal the last transactional memory word.  */


[PATCH] Implement constant-folding simplifications of reductions.

2022-02-20 Thread Roger Sayle

This patch addresses a code quality regression in GCC 12 by implementing
some constant folding/simplification transformations for REDUC_PLUS_EXPR
in match.pd.  The motivating example is gcc.dg/vect/pr89440.c which with
-O2 -ffast-math (with vectorization now enabled) gets optimized to:

float f (float x)
{
  vector(4) float vect_x_14.11;
  vector(4) float _2;
  float _32;

  _2 = {x_9(D), 0.0, 0.0, 0.0};
  vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 };
  _32 = .REDUC_PLUS (vect_x_14.11_29); [tail call]
  return _32;
}

With these proposed new transformations, we can simplify the
above code even further.

float f (float x)
{
  float _32;
  _32 = x_9(D) + 1.36e+2;
  return _32;
}

[which happens to match what we'd produce with -fno-tree-vectorize,
and with GCC 11].
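
As a sanity check of the arithmetic, the following scalar C sketch models the
two forms shown above. The function names (reduc_plus, f_vectorized, f_folded)
are made up for this illustration and are not GCC internals; reduc_plus is
only a scalar stand-in for .REDUC_PLUS.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of .REDUC_PLUS: add up all lanes of a vector.  */
float
reduc_plus (const float *v, size_t n)
{
  assert (n > 0);
  float sum = v[0];
  for (size_t i = 1; i < n; i++)
    sum += v[i];
  return sum;
}

/* Unsimplified form: build {x, 0, 0, 0}, add the constant vector
   lane by lane, then reduce.  */
float
f_vectorized (float x)
{
  const float cst[4] = { 10.0f, 26.0f, 42.0f, 58.0f };
  float v[4] = { x, 0.0f, 0.0f, 0.0f };
  for (size_t i = 0; i < 4; i++)
    v[i] += cst[i];
  return reduc_plus (v, 4);
}

/* Simplified form: the constant lanes sum to 10 + 26 + 42 + 58 = 136,
   and the only non-zero lane of the constructor is x.  */
float
f_folded (float x)
{
  return x + 136.0f;
}
```

The two functions agree for inputs where every intermediate sum is exactly
representable; in general the reassociation is only valid under -ffast-math,
which is why the motivating test uses that option.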

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-02-21  Roger Sayle  

gcc/ChangeLog
* fold-const.cc (ctor_single_nonzero_element): New function to
return the single non-zero element of a (vector) constructor.
* fold-const.h (ctor_single_nonzero_element): Prototype here.
* match.pd (reduc (constructor@0)): Simplify reductions of a
constructor containing a single non-zero element.
(reduc (@0 op VECTOR_CST) ->  (reduc @0) op CONST): Simplify
reductions of vector operations of the same operator with
constant vector operands.

gcc/testsuite/ChangeLog
* gcc.dg/fold-reduc-1.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 386d573..4283308 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -16792,6 +16792,33 @@ address_compare (tree_code code, tree type, tree op0, tree op1,
   return equal;
 }
 
+/* Return the single non-zero element of a CONSTRUCTOR or NULL_TREE.  */
+tree
+ctor_single_nonzero_element (const_tree t)
+{
+  unsigned HOST_WIDE_INT idx;
+  constructor_elt *ce;
+  tree elt = NULL_TREE;
+
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t));
+      if (gimple_assign_rhs_code (def) == CONSTRUCTOR)
+        t = gimple_assign_rhs1 (def);
+    }
+
+  if (TREE_CODE (t) != CONSTRUCTOR)
+    return NULL_TREE;
+  for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (t), idx, &ce); idx++)
+    if (!integer_zerop (ce->value) && !real_zerop (ce->value))
+      {
+        if (elt)
+          return NULL_TREE;
+        elt = ce->value;
+      }
+  return elt;
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index f217598..b2f0a2f 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -224,6 +224,7 @@ extern const char *c_getstr (tree);
 extern wide_int tree_nonzero_bits (const_tree);
 extern int address_compare (tree_code, tree, tree, tree, tree &, tree &,
poly_int64 &, poly_int64 &, bool);
+extern tree ctor_single_nonzero_element (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
diff --git a/gcc/match.pd b/gcc/match.pd
index d9d8359..047fb50 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7528,6 +7528,20 @@ and,
(BIT_FIELD_REF:elt_type @0 { size; } { pos; })
{ elt; })))
 
+/* Fold reduction of a single nonzero element constructor.  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
+  (simplify (reduc (CONSTRUCTOR@0))
+    (with { tree elt = ctor_single_nonzero_element (@0); }
+      (if (elt)
+        (non_lvalue { elt; })))))
+
+/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST).  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
+            IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
+     op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
+  (simplify (reduc (op @0 VECTOR_CST@1))
+    (op (reduc:type @0) (reduc:type @1))))
+
 (simplify
  (vec_perm @0 @1 VECTOR_CST@2)
  (with
diff --git a/gcc/testsuite/gcc.dg/fold-reduc-1.c b/gcc/testsuite/gcc.dg/fold-reduc-1.c
new file mode 100644
index 000..c8360b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-reduc-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
+float foo (float x)
+{
+  int i;
+  float j;
+  float a = 0;
+  for (i = 0; i < 4; ++i)
+    {
+      for (j = 0; j < 4; ++j)
+        {
+          a += 1;
+          x += a;
+        }
+    }
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-not "REDUC_PLUS" "optimized"} } */


Re: [PATCH] Improved constant folding for scalar evolution.

2022-02-20 Thread Richard Biener via Gcc-patches
On Sun, Feb 20, 2022 at 2:50 PM Roger Sayle  wrote:
>
>
> This patch adds a small (follow-up) optimization to chrec_apply for
> linear chrecs to clean-up the final value expressions sometimes generated
> by GCC's scalar evolution pass.  The transformation of A+(X-1)*A into
> A*X is usually unsafe with respect to overflow (see PR92712), and so
> can't be performed by match.pd (or fold-const).  However, during scalar
> evolution's evaluation of recurrences it's known that X-1 can't be negative
> (in fact X-1 is unsigned even when A is signed), hence this optimization
> can be applied.  Interestingly, this expression does get simplified in
> later passes once the range of X-1 is bounded by VRP, but that occurs
> long after the decision of whether to perform final value replacement,
> which is based on the complexity of this expression.

In principle A + (X-1)*A can always be simplified to (unsigned)A * (unsigned)X,
but at least fold-const's fold_plusminus_mult has

  /* Do not resort to unsigned multiplication because
 we lose the no-overflow property of the expression.  */
  return NULL_TREE;

we might want to heuristically do that anyway if the result is not a
multiplication by a constant (I remember doing the above because of
testsuite regressions).

It might be also interesting to see whether we change back
(int)((unsigned)A * (unsigned)X)
to A * X when we can constrain ranges.

In the specific case of the testcase below we of course only know overflow
doesn't happen because it would be undefined behavior.
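
To make the identity concrete, here is a small C sketch. It assumes nothing
about the chrec_apply implementation; the helper names
(final_value_unsimplified, final_value_simplified) are invented for this
illustration. The point is that in unsigned arithmetic (X-1)*A + A and A*X are
equal modulo 2^N for all inputs, so the rewrite is safe there, while the same
rewrite on signed operands could turn a non-overflowing expression into an
overflowing one.

```c
/* The loop from the motivating test case: adds x to result x times.  */
int
square_loop (int x)
{
  int result = 0;
  for (int i = 0; i < x; ++i)
    result += x;
  return result;
}

/* Final value in the unsimplified shape (X-1)*A + A.  Unsigned
   arithmetic wraps, so this is well defined for every input.  */
unsigned int
final_value_unsimplified (unsigned int a, unsigned int x)
{
  return (x - 1u) * a + a;
}

/* Simplified shape A*X; equal to the above modulo 2^N.  */
unsigned int
final_value_simplified (unsigned int a, unsigned int x)
{
  return a * x;
}
```

For example, square_loop (7) and both helpers called with (7, 7) all give 49,
and the two helpers also agree on inputs that wrap, such as
(0x80000001u, 3u).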

> The motivating test case is the optimization of the loop (from comment
> #7 of PR65855):
>
> int square(int x) {
>   int result = 0;
>   for (int i = 0; i < x; ++i)
> result += x;
>   return result;
> }
>
> which is currently optimized, with final value replacement to:
>
>   final value replacement:
>with expr: (int) ((unsigned int) x_3(D) + 4294967295) * x_3(D) + x_3(D)
>
> but with this patch it first gets simplified further:
>
>   final value replacement:
>with expr: x_3(D) * x_3(D)
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

OK once stage1 opens.

Thanks,
Richard.

>
> 2022-02-20  Roger Sayle  
>
> gcc/ChangeLog
> * tree-chrec.cc (chrec_apply): Attempt to fold the linear chrec
> "{a, +, a} (x-1)" as "a*x", as the number of loop iterations, x-1,
> can't be negative.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/tree-ssa/pr65855-2.c: New test case.
>
>
> Roger
> --
>


Re: [PR103302] skip multi-word pre-move clobber during lra

2022-02-20 Thread Richard Biener via Gcc-patches
On Sat, Feb 19, 2022 at 12:28 AM Alexandre Oliva via Gcc-patches
 wrote:
>
> On Dec 15, 2021, Jeff Law  wrote:
>
> >> * expr.c (emit_move_complex_parts): Skip clobbers during lra.
> > OK for the next cycle.
>
> Thanks, but having looked into PR 104121, I withdraw this patch and also
> the already-installed patch for PR 103302.  As I found out, LRA does
> worse without the clobbers for multi-word moves, not only because the
> clobbers shorten live ranges and enable earlier and better allocations,
> but also because find_reload_regno_insns implicitly, possibly
> unknowingly, relied on the clobbers to avoid the risk of an infinite
> loop.
>
> As noted in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104121#c11 with
> the clobber, a multi-word reload, and the insn the reload applies to, we
> get 4 insns, so find_reload_regno_insns avoids the loop.  Without the
> clobber, a multi-word reload for either input or output makes for n==3,
> so we enter the loop and don't ever exit it: we'll find first_insn
> (input) or second_insn (output), but then we'll loop forever because we
> won't iterate again on {prev,next}_insn, respectively, and the other
> iterator won't find the other word reload.  We advance the other till
> the end, but that's not enough for us to terminate the loop.
>
> With the proposed patch reversal, we no longer hit the problem with the
> v850 testcase in 104121, but I'm concerned we might still get an
> infinite loop on ports whose input or output reloads might emit a pair
> of insns without a clobber.
>
> I see two ways to robustify it.  One is to find a complete reload
> sequence:
>
> diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
> index c1d40ea2a14bd..ff1688917cbce 100644
> --- a/gcc/lra-assigns.cc
> +++ b/gcc/lra-assigns.cc
> @@ -1716,9 +1716,18 @@ find_reload_regno_insns (int regno, rtx_insn * , 
> rtx_insn * )
> start_insn = lra_insn_recog_data[uid]->insn;
>n++;
>  }
> -  /* For reload pseudo we should have at most 3 insns referring for
> - it: input/output reload insns and the original insn.  */
> -  if (n > 3)
> +  /* For reload pseudo we should have at most 3 (sequences of) insns
> + referring for it: input/output reload insn sequences and the
> + original insn.  Each reload insn sequence may amount to multiple
> + insns, but we expect to find each of them contiguous, one before
> + start_insn, one after it.  We know start_insn is between the
> + sequences because it's the lowest-numbered insn, thus the first
> + we'll have found above.  The reload insns, emitted later, will
> + have been assigned higher insn uids.  If this assumption doesn't
> + hold, and there happen to be intervening reload insns for other
> + pseudos, we may end up returning false after searching to the end
> + in the wrong direction.  */
> +  if (n > 1 + 2 * CEIL (lra_reg_info[regno].biggest_mode, UNITS_PER_WORD))
>  return false;
>if (n > 1)
>  {
> @@ -1726,26 +1735,52 @@ find_reload_regno_insns (int regno, rtx_insn * 
> , rtx_insn * )
>  next_insn = NEXT_INSN (start_insn);
>n != 1 && (prev_insn != NULL || next_insn != NULL); )
> {
> - if (prev_insn != NULL && first_insn == NULL)
> + if (prev_insn != NULL)
> {
>   if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
>   INSN_UID (prev_insn)))
> prev_insn = PREV_INSN (prev_insn);
>   else
> {
> - first_insn = prev_insn;
> - n--;
> + /* A reload sequence may have multiple insns, but
> +they must be contiguous.  */
> + do
> +   {
> + first_insn = prev_insn;
> + n--;
> + prev_insn = PREV_INSN (prev_insn);
> +   }
> + while (n > 1 && prev_insn
> +&& bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
> + INSN_UID (prev_insn)));
> + /* After finding first_insn, we don't want to search
> +backward any more, so set prev_insn to NULL so as
> +to not loop indefinitely.  */
> + prev_insn = NULL;
> }
> }
> - if (next_insn != NULL && second_insn == NULL)
> + else if (next_insn != NULL)
> {
>   if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
> INSN_UID (next_insn)))
> next_insn = NEXT_INSN (next_insn);
>   else
> {
> - second_insn = next_insn;
> - n--;
> + /* A reload sequence may have multiple insns, but
> +they must be contiguous.  */
> + do
> +   {
> + second_insn = next_insn;

Re: [PATCH] c: [PR104506] Fix ICE after error due to change of type to error_mark_node

2022-02-20 Thread Richard Biener via Gcc-patches
On Fri, Feb 18, 2022 at 10:40 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> The problem here is we end up with an error_mark_node when calling
> useless_type_conversion_p and that ICEs. STRIP_NOPS/tree_nop_conversion
> has had a check for the inner type being an error_mark_node since 
> g9a6bb3f78c96
> (2000). This just adds the check also to tree_ssa_useless_type_conversion.
> STRIP_USELESS_TYPE_CONVERSION is mostly used inside the gimplifier
> and the places where it is used outside of the gimplifier would not
> be adding too much overhead.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> Thanks,
> Andrew Pinski
>
> PR c/104506
>
> gcc/ChangeLog:
>
> * tree-ssa.cc (tree_ssa_useless_type_conversion):
> Check the inner type before calling useless_type_conversion_p.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr104506-1.c: New test.
> * gcc.dg/pr104506-2.c: New test.
> * gcc.dg/pr104506-3.c: New test.
> ---
>  gcc/testsuite/gcc.dg/pr104506-1.c | 12 
>  gcc/testsuite/gcc.dg/pr104506-2.c | 11 +++
>  gcc/testsuite/gcc.dg/pr104506-3.c | 11 +++
>  gcc/tree-ssa.cc   | 20 +---
>  4 files changed, 47 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr104506-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr104506-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr104506-3.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr104506-1.c b/gcc/testsuite/gcc.dg/pr104506-1.c
> new file mode 100644
> index 000..5eb71911b71
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr104506-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=gnu11" } */
> +/* PR c/104506: we used to ICE after the error of
> +   changing the type.  */
> +
> +void
> +foo (double x)
> +/* { dg-message "note: previous definition" "previous definition" { target *-*-* } .-1 } */
> +{
> +  (void)x;
> +  int x; /* { dg-error "redeclared as different kind of symbol" } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr104506-2.c b/gcc/testsuite/gcc.dg/pr104506-2.c
> new file mode 100644
> index 000..3c3c4f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr104506-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=gnu11" } */
> +/* PR c/104506: we used to ICE after the error of
> +   changing the type.  */
> +void
> +foo (double x)
> +/* { dg-message "note: previous definition" "previous definition" { target *-*-* } .-1 } */
> +{
> +  x;
> +  int x; /* { dg-error "redeclared as different kind of symbol" } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr104506-3.c b/gcc/testsuite/gcc.dg/pr104506-3.c
> new file mode 100644
> index 000..b14deb5cf25
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr104506-3.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* PR c/104506: we used to ICE after the error of
> +   changing the type.  */
> +double x;
> +/* { dg-message "note: previous declaration" "previous declaration" { target *-*-* } .-1 } */
> +void
> +foo (void)
> +{
> +  x;
> +}
> +int x; /* { dg-error "conflicting types" } */
> diff --git a/gcc/tree-ssa.cc b/gcc/tree-ssa.cc
> index 430875ae37a..423dd871d9e 100644
> --- a/gcc/tree-ssa.cc
> +++ b/gcc/tree-ssa.cc
> @@ -1256,18 +1256,24 @@ delete_tree_ssa (struct function *fn)
>  bool
>  tree_ssa_useless_type_conversion (tree expr)
>  {
> +  tree outer_type, inner_type;
> +
>/* If we have an assignment that merely uses a NOP_EXPR to change
>   the top of the RHS to the type of the LHS and the type conversion
>   is "safe", then strip away the type conversion so that we can
>   enter LHS = RHS into the const_and_copies table.  */
> -  if (CONVERT_EXPR_P (expr)
> -  || TREE_CODE (expr) == VIEW_CONVERT_EXPR
> -  || TREE_CODE (expr) == NON_LVALUE_EXPR)
> -return useless_type_conversion_p
> -  (TREE_TYPE (expr),
> -   TREE_TYPE (TREE_OPERAND (expr, 0)));
> +  if (!CONVERT_EXPR_P (expr)
> +  && TREE_CODE (expr) != VIEW_CONVERT_EXPR
> +  && TREE_CODE (expr) != NON_LVALUE_EXPR)
> +return false;
>
> -  return false;
> +  outer_type = TREE_TYPE (expr);
> +  inner_type = TREE_TYPE (TREE_OPERAND (expr, 0));
> +
> +  if (inner_type == error_mark_node)
> +return false;
> +
> +  return useless_type_conversion_p (outer_type, inner_type);
>  }
>
>  /* Strip conversions from EXP according to
> --
> 2.17.1
>


[PATCH v2, rs6000] Disable TImode from Bool expanders [PR100694, PR93123]

2022-02-20 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch disables TImode for the Bool expanders, so that a TI register can
be split into two DI registers during expand. Potential optimizations can then
be implemented after the split. The new test case illustrates this.
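
  As a rough C-level picture of the idea (not the rs6000 implementation), a
128-bit logical operation is just the same operation applied independently to
two 64-bit halves. The struct and function names below are hypothetical, and
the constant-mask case is only an assumed example of the kind of optimization
that becomes visible once the halves are separate.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit (TImode-like) value kept in two
   64-bit (DImode-like) halves.  */
struct ti_pair
{
  uint64_t hi;
  uint64_t lo;
};

/* A 128-bit AND is two independent 64-bit ANDs.  */
struct ti_pair
ti_and (struct ti_pair a, struct ti_pair b)
{
  struct ti_pair r = { a.hi & b.hi, a.lo & b.lo };
  return r;
}

/* Once the halves are separate, each can be simplified on its own.
   Here the mask is known to have an all-zero high half, so the high
   result is zero without reading a.hi at all.  */
struct ti_pair
ti_and_low_mask (struct ti_pair a, uint64_t mask)
{
  struct ti_pair r = { 0, a.lo & mask };
  return r;
}
```

Calling ti_and with a mask whose high half is zero gives the same result as
ti_and_low_mask, but the latter needs only one 64-bit operation.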

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is
this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-02-21 Haochen Gui 

gcc/
PR target/100694
* config/rs6000/rs6000.md (and3): Disable TImode.
(ior3): Likewise.
(xor3): Likewise.
(nor3): Likewise.
(andc3): Likewise.
(eqv3): Likewise.
(nand3): Likewise.
(orc3): Likewise.
(one_cmpl2): Likewise.
(*one_cmplti2): Enable TImode complement for combine and split.

gcc/testsuite/
PR target/100694
* gcc.target/powerpc/pr100694.c: New.
* gcc.target/powerpc/pr92398.c: New.
* gcc.target/powerpc/pr92398.h: Remove.
* gcc.target/powerpc/pr92398.p9-.c: Remove.
* gcc.target/powerpc/pr92398.p9+.c: Remove.


patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6f74075f58d..1b1816d72ec 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6976,21 +6976,21 @@ (define_expand "and3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
(and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
  (match_operand:BOOL_128 2 "vlogical_operand")))]
-  ""
+  "mode != TImode"
   "")

 (define_expand "ior3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
 (ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
  (match_operand:BOOL_128 2 "vlogical_operand")))]
-  ""
+  "mode != TImode"
   "")

 (define_expand "xor3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
 (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
  (match_operand:BOOL_128 2 "vlogical_operand")))]
-  ""
+  "mode != TImode"
   "")

 (define_expand "nor3"
@@ -6998,7 +6998,7 @@ (define_expand "nor3"
(and:BOOL_128
 (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand"))
 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"]
-  ""
+  "mode != TImode"
   "")

 (define_expand "andc3"
@@ -7006,7 +7006,7 @@ (define_expand "andc3"
 (and:BOOL_128
 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))
 (match_operand:BOOL_128 1 "vlogical_operand")))]
-  ""
+  "mode != TImode"
   "")

 ;; Power8 vector logical instructions.
@@ -7015,7 +7015,7 @@ (define_expand "eqv3"
(not:BOOL_128
 (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
   (match_operand:BOOL_128 2 "vlogical_operand"]
-  "mode == TImode || mode == PTImode || TARGET_P8_VECTOR"
+  "mode != TImode && (mode == PTImode || TARGET_P8_VECTOR)"
   "")

 ;; Rewrite nand into canonical form
@@ -7024,7 +7024,7 @@ (define_expand "nand3"
(ior:BOOL_128
 (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand"))
 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"]
-  "mode == TImode || mode == PTImode || TARGET_P8_VECTOR"
+  "mode != TImode && (mode == PTImode || TARGET_P8_VECTOR)"
   "")

 ;; The canonical form is to have the negated element first, so we need to
@@ -7034,7 +7034,7 @@ (define_expand "orc3"
(ior:BOOL_128
 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))
 (match_operand:BOOL_128 1 "vlogical_operand")))]
-  "mode == TImode || mode == PTImode || TARGET_P8_VECTOR"
+  "mode != TImode && (mode == PTImode || TARGET_P8_VECTOR)"
   "")

 ;; 128-bit logical operations insns and split operations
@@ -7291,7 +7291,7 @@ (define_insn_and_split "one_cmpl2"
   [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
(not:BOOL_128
  (match_operand:BOOL_128 1 "vlogical_operand" "")))]
-  ""
+  "mode != TImode"
 {
   if (TARGET_VSX && vsx_register_operand (operands[0], mode))
 return "xxlnor %x0,%x1,%x1";
@@ -7321,6 +7321,39 @@ (define_insn_and_split "one_cmpl2"
 (const_string "8")
 (const_string "16"])

+(define_insn_and_split "*one_cmplti2"
+  [(set (match_operand:TI 0 "vlogical_operand" "=,r,r,wa,v")
+   (not:TI
+ (match_operand:TI 1 "vlogical_operand" "r,0,0,wa,v")))]
+  ""
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], TImode))
+return "xxlnor %x0,%x1,%x1";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], TImode))
+return "vnor %0,%1,%1";
+
+  return "#";
+}
+  "reload_completed && int_reg_operand (operands[0], TImode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, NOT, false, false, false);
+  DONE;
+}
+  [(set (attr "type")
+  (if_then_else
+   (match_test "vsx_register_operand (operands[0], TImode)")
+   (const_string "veclogical")
+   (const_string "integer")))
+   (set (attr "length")
+  (if_then_else
+   

Re: [PATCH v2] x86: Add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO

2022-02-20 Thread Hongtao Liu via Gcc-patches
On Thu, Feb 17, 2022 at 9:56 PM H.J. Lu  wrote:
>
> On Thu, Feb 17, 2022 at 08:51:31AM +0100, Uros Bizjak wrote:
> > On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches
> >  wrote:
> > >
> > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches
> > >  wrote:
> > > >
> > > > > Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge,
> > > > > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX
> > > > > transition penalty.  Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER to
> > > > > generate vzeroupper instruction after loading all-zero YMM/ZMM registers
> > > > > and enable it by default.
> > > > Wouldn't TARGET_READ_ZERO_YMM_ZMM_NONEED_VZEROUPPER sound a bit smoother?
> > > > Because originally we needed to add vzeroupper to all avx<->sse cases,
> > > > now it's a tune to indicate that we don't need to add it in some
> >
> > Perhaps we should go from the other side and use
> > X86_TUNE_OPTIMIZE_AVX_READ for new processors?
> >
>
> Here is the v2 patch to add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO.
>
The patch LGTM in general, but please rebase against
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590541.html
and resend the patch. Also wait a couple of days in case Uros (and others)
have any comments.
>
> H.J.
> ---
> Reading YMM registers with all zero bits needs VZEROUPPER on Sandy Bridge,
> Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX
> transition penalty.  Add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO to
> omit vzeroupper instruction after loading all-zero YMM/ZMM registers.
>
> gcc/
>
> PR target/101456
> * config/i386/i386.cc (ix86_avx_u128_mode_needed): Omit
> vzeroupper after reading all-zero YMM/ZMM registers for
> TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO.
> * config/i386/i386.h (TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO):
> New.
> * config/i386/x86-tune.def
> (X86_TUNE_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO): New.
>
> gcc/testsuite/
>
> PR target/101456
> * gcc.target/i386/pr101456-1.c (dg-options): Add
> -mtune-ctrl=omit_vzeroupper_after_avx_read_zero.
> * gcc.target/i386/pr101456-2.c: Likewise.
> * gcc.target/i386/pr101456-3.c: New test.
> * gcc.target/i386/pr101456-4.c: Likewise.
> ---
>  gcc/config/i386/i386.cc| 51 --
>  gcc/config/i386/i386.h |  2 +
>  gcc/config/i386/x86-tune.def   |  5 +++
>  gcc/testsuite/gcc.target/i386/pr101456-1.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr101456-2.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr101456-3.c | 33 ++
>  gcc/testsuite/gcc.target/i386/pr101456-4.c | 33 ++
>  7 files changed, 103 insertions(+), 25 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101456-4.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index cf246e74e57..60c72ceb72d 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -14502,33 +14502,38 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
>
>subrtx_iterator::array_type array;
>
> -  rtx set = single_set (insn);
> -  if (set)
> +  if (TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO)
>  {
> -  rtx dest = SET_DEST (set);
> -  rtx src = SET_SRC (set);
> -  if (ix86_check_avx_upper_register (dest))
> +  /* Perform this vzeroupper optimization if target doesn't need
> +vzeroupper after reading all-zero YMM/ZMM registers.  */
> +  rtx set = single_set (insn);
> +  if (set)
> {
> - /* This is an YMM/ZMM load.  Return AVX_U128_DIRTY if the
> -source isn't zero.  */
> - if (standard_sse_constant_p (src, GET_MODE (dest)) != 1)
> -   return AVX_U128_DIRTY;
> + rtx dest = SET_DEST (set);
> + rtx src = SET_SRC (set);
> + if (ix86_check_avx_upper_register (dest))
> +   {
> + /* This is an YMM/ZMM load.  Return AVX_U128_DIRTY if the
> +source isn't zero.  */
> + if (standard_sse_constant_p (src, GET_MODE (dest)) != 1)
> +   return AVX_U128_DIRTY;
> + else
> +   return AVX_U128_ANY;
> +   }
>   else
> -   return AVX_U128_ANY;
> -   }
> -  else
> -   {
> - FOR_EACH_SUBRTX (iter, array, src, NONCONST)
> -   if (ix86_check_avx_upper_register (*iter))
> - {
> -   int status = ix86_avx_u128_mode_source (insn, *iter);
> -   if (status == AVX_U128_DIRTY)
> - return status;
> - }
> -   }
> +   {
> + FOR_EACH_SUBRTX (iter, array, src, NONCONST)
> +   if (ix86_check_avx_upper_register (*iter))
> + {
> +   int status = ix86_avx_u128_mode_source (insn, *iter);
> +  

Re: [PATCH 3/3] target/99881 - x86 vector cost of CTOR from integer regs

2022-02-20 Thread Hongtao Liu via Gcc-patches
On Fri, Feb 18, 2022 at 10:01 PM Richard Biener via Gcc-patches
 wrote:
>
> This uses the now passed SLP node to the vectorizer costing hook
> to adjust vector construction costs for the cost of moving an
> integer component from a GPR to a vector register when that's
> required for building a vector from components.  A crucial difference
> here is whether the component is loaded from memory or extracted
> from a vector register as in those cases no intermediate GPR is involved.
>
> The pr99881.c testcase can be Un-XFAILed with this patch, the
> pr91446.c testcase now produces scalar code which looks superior
> to me so I've adjusted it as well.
>
> I'm currently re-bootstrapping and testing on x86_64-unknown-linux-gnu
> after adding the BIT_FIELD_REF vector extracting special casing.
Does the patch handle PR101929?
>
> I suppose we can let autotesters look for SPEC performance fallout.
>
> OK if testing succeeds?
>
> Thanks,
> Richard.
>
> 2022-02-18  Richard Biener  
>
> PR tree-optimization/104582
> PR target/99881
> * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> Cost GPR to vector register moves for integer vector construction.
>
> * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: New.
> * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Likewise.
> * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Likewise.
> * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Likewise.
> * gcc.target/i386/pr99881.c: Un-XFAIL.
> * gcc.target/i386/pr91446.c: Adjust to not expect vectorization.
> ---
>  gcc/config/i386/i386.cc   | 45 ++-
>  .../costmodel/x86_64/costmodel-pr104582-1.c   | 15 +++
>  .../costmodel/x86_64/costmodel-pr104582-2.c   | 13 ++
>  .../costmodel/x86_64/costmodel-pr104582-3.c   | 13 ++
>  .../costmodel/x86_64/costmodel-pr104582-4.c   | 15 +++
>  gcc/testsuite/gcc.target/i386/pr91446.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr99881.c   |  2 +-
>  7 files changed, 102 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 0830dbd7dca..b2bf90576d5 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22997,7 +22997,7 @@ ix86_vectorize_create_costs (vec_info *vinfo, bool 
> costing_for_scalar)
>
>  unsigned
>  ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
> - stmt_vec_info stmt_info, slp_tree,
> + stmt_vec_info stmt_info, slp_tree node,
>   tree vectype, int misalign,
>   vect_cost_model_location where)
>  {
> @@ -23160,6 +23160,49 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
>stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
>  }
> +  else if (kind == vec_construct
> +  && node
> +  && SLP_TREE_DEF_TYPE (node) == vect_external_def
> +  && INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
> +{
> +  stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
> +  unsigned i;
> +  tree op;
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
> +   if (TREE_CODE (op) == SSA_NAME)
> + TREE_VISITED (op) = 0;
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
> +   {
> + if (TREE_CODE (op) != SSA_NAME
> + || TREE_VISITED (op))
> +   continue;
> + TREE_VISITED (op) = 1;
> + gimple *def = SSA_NAME_DEF_STMT (op);
> + tree tem;
> + if (is_gimple_assign (def)
> + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def))
> + && ((tem = gimple_assign_rhs1 (def)), true)
> + && TREE_CODE (tem) == SSA_NAME
> + /* A sign-change expands to nothing.  */
> + && tree_nop_conversion_p (TREE_TYPE (gimple_assign_lhs (def)),
> +   TREE_TYPE (tem)))
> +   def = SSA_NAME_DEF_STMT (tem);
> + /* When the component is loaded from memory we can directly
> +move it to a vector register, otherwise we have to go
> +via a GPR or via vpinsr which involves similar cost.
> +Likewise with a BIT_FIELD_REF extracting from a vector
> +register we can hope to avoid using a GPR.  */
> + if (!is_gimple_assign (def)
> + || (!gimple_assign_load_p (def)
> + 

[PATCH] libgcc: allow building float128 libraries on FreeBSD

2022-02-20 Thread pkubaj
From: Piotr Kubaj 

While FreeBSD currently uses 64-bit long double, there should be no
problem with adding support for float128.

Signed-off-by: Piotr Kubaj 
---
 libgcc/configure| 22 ++
 libgcc/configure.ac | 11 +++
 2 files changed, 33 insertions(+)

diff --git a/libgcc/configure b/libgcc/configure
index 4919a56f518..334d20d1fb1 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -5300,6 +5300,28 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_powerpc_3_1_float128_hw" >&5
 $as_echo "$libgcc_cv_powerpc_3_1_float128_hw" >&6; }
   CFLAGS="$saved_CFLAGS"
+;;
+powerpc*-*-freebsd*)
+  saved_CFLAGS="$CFLAGS"
+  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PowerPC ISA 2.06 to build __float128 libraries" >&5
+$as_echo_n "checking for PowerPC ISA 2.06 to build __float128 libraries... " >&6; }
+if ${libgcc_cv_powerpc_float128+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+vector double dadd (vector double a, vector double b) { return a + b; }
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  libgcc_cv_powerpc_float128=yes
+else
+  libgcc_cv_powerpc_float128=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_powerpc_float128" >&5
+$as_echo "$libgcc_cv_powerpc_float128" >&6; }
 esac
 
 # Collect host-machine-specific information.
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index 13a80b2551b..99ec5d405a4 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -483,6 +483,17 @@ powerpc*-*-linux*)
 [libgcc_cv_powerpc_3_1_float128_hw=yes],
 [libgcc_cv_powerpc_3_1_float128_hw=no])])
   CFLAGS="$saved_CFLAGS"
+;;
+powerpc*-*-freebsd*)
+  saved_CFLAGS="$CFLAGS"
+  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
+  AC_CACHE_CHECK([for PowerPC ISA 2.06 to build __float128 libraries],
+ [libgcc_cv_powerpc_float128],
+ [AC_COMPILE_IFELSE(
+[AC_LANG_SOURCE([vector double dadd (vector double a, vector double b) { return a + b; }])],
+[libgcc_cv_powerpc_float128=yes],
+[libgcc_cv_powerpc_float128=no])])
+  CFLAGS="$saved_CFLAGS"
 esac
 
 # Collect host-machine-specific information.
-- 
2.35.1



[committed] d: Remove handling of deleting GC allocated classes.

2022-02-20 Thread Iain Buclaw via Gcc-patches
Hi,

Now that the `delete' keyword has been removed from the front-end, only
compiler-generated uses of DeleteExp reach the code generator via the
auto-destruction of `scope class' variables.

The run-time library helpers that previously were used to delete GC
class objects can now be removed from the compiler.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* expr.cc (ExprVisitor::visit (DeleteExp *)): Remove handling of
deleting GC allocated classes.
* runtime.def (DELCLASS): Remove.
(DELINTERFACE): Remove.
---
 gcc/d/expr.cc | 24 ++--
 gcc/d/runtime.def |  6 +-
 2 files changed, 7 insertions(+), 23 deletions(-)

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index d5e4df7f563..2a7fb690862 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -1438,28 +1438,16 @@ public:
   {
/* For class object references, if there is a destructor for that class,
   the destructor is called for the object instance.  */
-   libcall_fn libcall;
+   gcc_assert (e->e1->op == EXP::variable);
 
-   if (e->e1->op == EXP::variable)
- {
-   VarDeclaration *v = e->e1->isVarExp ()->var->isVarDeclaration ();
-   if (v && v->onstack)
- {
-   libcall = tb1->isClassHandle ()->isInterfaceDeclaration ()
- ? LIBCALL_CALLINTERFACEFINALIZER : LIBCALL_CALLFINALIZER;
+   VarDeclaration *v = e->e1->isVarExp ()->var->isVarDeclaration ();
+   gcc_assert (v && v->onstack);
 
-   this->result_ = build_libcall (libcall, Type::tvoid, 1, t1);
-   return;
- }
- }
+   libcall_fn libcall = tb1->isClassHandle ()->isInterfaceDeclaration ()
+ ? LIBCALL_CALLINTERFACEFINALIZER : LIBCALL_CALLFINALIZER;
 
-   /* Otherwise, the garbage collector is called to immediately free the
-  memory allocated for the class instance.  */
-   libcall = tb1->isClassHandle ()->isInterfaceDeclaration ()
- ? LIBCALL_DELINTERFACE : LIBCALL_DELCLASS;
-
-   t1 = build_address (t1);
this->result_ = build_libcall (libcall, Type::tvoid, 1, t1);
+   return;
   }
 else
   {
diff --git a/gcc/d/runtime.def b/gcc/d/runtime.def
index acb610f71f0..534f8661b3e 100644
--- a/gcc/d/runtime.def
+++ b/gcc/d/runtime.def
@@ -63,11 +63,7 @@ DEF_D_RUNTIME (ARRAYBOUNDS_INDEXP, "_d_arraybounds_indexp", RT(VOID),
 DEF_D_RUNTIME (NEWCLASS, "_d_newclass", RT(OBJECT), P1(CONST_CLASSINFO), 0)
 DEF_D_RUNTIME (NEWTHROW, "_d_newThrowable", RT(OBJECT), P1(CONST_CLASSINFO), 0)
 
-/* Used when calling delete on a class or interface.  */
-DEF_D_RUNTIME (DELCLASS, "_d_delclass", RT(VOID), P1(VOIDPTR), 0)
-DEF_D_RUNTIME (DELINTERFACE, "_d_delinterface", RT(VOID), P1(VOIDPTR), 0)
-
-/* Same as deleting a class, but used for stack-allocated classes.  */
+/* Used when calling delete on a stack-allocated class or interface.  */
 DEF_D_RUNTIME (CALLFINALIZER, "_d_callfinalizer", RT(VOID), P1(VOIDPTR), 0)
 DEF_D_RUNTIME (CALLINTERFACEFINALIZER, "_d_callinterfacefinalizer", RT(VOID),
   P1(VOIDPTR), 0)
-- 
2.32.0



[RFC][nvptx] Initialize ptx regs

2022-02-20 Thread Tom de Vries via Gcc-patches
Hi,

With nvptx target, driver version 510.47.03 and board GT 1030, we run into:
...
FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
...
while the test-cases pass with nvptx-none-run -O0.

The problem is that the generated ptx contains a read from an uninitialized
ptx register, and the driver JIT doesn't handle this well.

For -O2 and -O3, we can get rid of the FAIL using --param
logical-op-non-short-circuit=0.  But not for -O1.

At -O1, the test-case minimizes to:
...
void __attribute__((noinline, noclone))
foo (int y) {
  int c;
  for (int i = 0; i < y; i++)
{
  int d = i + 1;
  if (i && d <= c)
__builtin_abort ();
  c = d;
}
}

int main () {
  foo (2); return 0;
}
...

Note that the test-case does not contain an uninitialized use.  In the first
iteration, i is 0 and consequently c is not read.  In the second iteration, c
is read, but by that time it's already initialized by 'c = d' from the first
iteration.
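
The claim above can be checked mechanically with a small sketch.  This is not
part of the original test-case: the flag c_init and the function name
count_uninit_reads are hypothetical bookkeeping added only to count reads of c
that would happen before its first write, and an assert stands in for
__builtin_abort ().

```c
#include <assert.h>
#include <stdbool.h>

/* Mirror of the loop in foo (), instrumented to count reads of c that
   occur before c has been written.  c_init is hypothetical bookkeeping
   and not part of the original test-case.  */
int
count_uninit_reads (int y)
{
  int c = 0;             /* Value is irrelevant; c_init tracks validity.  */
  bool c_init = false;
  int uninit_reads = 0;

  for (int i = 0; i < y; i++)
    {
      int d = i + 1;
      /* 'i && d <= c' short-circuits, so c is read only when i != 0.  */
      if (i)
        {
          if (!c_init)
            uninit_reads++;
          assert (!(d <= c));   /* Stands in for __builtin_abort ().  */
        }
      c = d;
      c_init = true;
    }
  return uninit_reads;
}
```

Calling count_uninit_reads (2), the same argument main passes to foo, visits
the read of c only in the second iteration, after 'c = d' has run once.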

AFAICT the problem is introduced as follows: the conditional use of c in the
loop body is translated into an unconditional use of c in the loop header:
...
  # c_1 = PHI 
...
into which forwprop1 propagates the 'c_9 = d_7' assignment:
...
  # c_1 = PHI 
...
which ends up being translated by expand into an unconditional:
...
(insn 13 12 0 (set (reg/v:SI 22 [ c ])
(reg/v:SI 23 [ d ])) -1
 (nil))
...
at the start of the loop body, creating an uninitialized read of d on the
path from loop entry.

By disabling coalesce_ssa_name, we get the more usual copies on the incoming
edges.  The copy on the loop entry path still does an uninitialized read, but
that one's now initialized by init-regs.  The test-case passes, also when
disabling init-regs, so it's possible that the JIT driver doesn't object to
this type of uninitialized read.

Now that we characterized the problem to some degree, we need to fix this,
because either:
- we're violating an undocumented ptx invariant, and this is a compiler bug,
  or
- this is a driver JIT bug and we need to work around it.

There are essentially two strategies to address this:
- stop the compiler from creating uninitialized reads
- patch up uninitialized reads using additional initialization

The former will probably involve:
- making some optimizations more conservative in the presence of
  uninitialized reads, and
- disabling some other optimizations (where making them more conservative is
  not possible, or cannot easily be achieved).
This will probably have a cost penalty for code that does not suffer from
the original problem.

The latter has the problem that it may paper over uninitialized reads
in the source code, or indeed over ones that were incorrectly introduced
by the compiler.  But it has the advantage that it allows for the problem to
be addressed at a single location.

There's an existing pass, init-regs, which implements a form of the latter,
but it doesn't work for this example because it only inserts additional
initialization for uses that have no reaching definition at all.

Fix this by adding initialization of uninitialized ptx regs in reorg.

Control the new functionality using -minit-regs=<0|1|2|3>, meaning:
- 0: disabled.
- 1: add initialization of all regs at the entry bb
- 2: add initialization of uninitialized regs at the entry bb
- 3: add initialization of uninitialized regs close to the use
and defaulting to 3.

Tested on nvptx.

Any comments?

Thanks,
- Tom

[nvptx] Initialize ptx regs

gcc/ChangeLog:

2022-02-17  Tom de Vries  

PR target/104440
* config/nvptx/nvptx.cc (workaround_uninit_method_1)
(workaround_uninit_method_2, workaround_uninit_method_3)
(workaround_uninit): New function.
(nvptx_reorg): Use workaround_uninit.
* config/nvptx/nvptx.opt (minit-regs): New option.

---
 gcc/config/nvptx/nvptx.cc  | 188 +
 gcc/config/nvptx/nvptx.opt |   4 +
 2 files changed, 192 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index ed347cab70e..a37a6c78b41 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5372,6 +5372,190 @@ workaround_barsyncs (void)
 }
 #endif
 
+/* Initialize all declared regs at function entry.
+   Advantage   : Fool-proof.
+   Disadvantage: Potentially creates a lot of long live ranges and adds a lot
+of insns.  */
+
+static void
+workaround_uninit_method_1 (void)
+{
+  rtx_insn *first = get_insns ();
+  rtx_insn *insert_here = NULL;
+
+  for (int ix = LAST_VIRTUAL_REGISTER + 1; ix < max_reg_num (); ix++)
+{
+  rtx reg = regno_reg_rtx[ix];
+
+  /* Skip undeclared registers.  */
+  if (reg == const0_rtx)
+   continue;
+
+  gcc_assert (CONST0_RTX (GET_MODE (reg)));
+
+  start_sequence ();
+  emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
+  rtx_insn 

[committed] d: Merge upstream dmd cb49e99f8, druntime 55528bd1, phobos 1a3e80ec2.

2022-02-20 Thread Iain Buclaw via Gcc-patches
Hi,

This patch merges the D front-end implementation with upstream dmd
cb49e99f8, as well as the D runtime libraries with druntime 55528bd1,
and phobos 1a3e80ec2, synchronizing with the release of 2.099.0-beta1.

D front-end changes:

- Import dmd v2.099.0-beta.1.
- It's now an error to use `alias this' for partial assignment.
- The `delete' keyword has been removed from the language.
- Using `this' and `super' as types has been removed from the
  language; the parser no longer specially handles this wrong code
  with an informative error.

D Runtime changes:

- Import druntime v2.099.0-beta.1.

Phobos changes:

- Import phobos v2.099.0-beta.1.


Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd cb49e99f8.
* dmd/VERSION: Update version to v2.099.0-beta.1.
* decl.cc (layout_class_initializer): Update call to NewExp::create.
* expr.cc (ExprVisitor::visit (DeleteExp *)): Remove handling of
deleting arrays and pointers.
(ExprVisitor::visit (DotVarExp *)): Convert complex types to the
front-end library type representing them.
(ExprVisitor::visit (StringExp *)): Use getCodeUnit instead of charAt
to get the value of each index in a string expression.
* runtime.def (DELMEMORY): Remove.
(DELARRAYT): Remove.
* types.cc (TypeVisitor::visit (TypeEnum *)): Handle anonymous enums.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 55528bd1.
* src/MERGE: Merge upstream phobos 1a3e80ec2.
* testsuite/libphobos.hash/test_hash.d: Update.
* testsuite/libphobos.betterc/test19933.d: New test.
---
 gcc/d/decl.cc |2 +-
 gcc/d/dmd/MERGE   |2 +-
 gcc/d/dmd/VERSION |2 +-
 gcc/d/dmd/apply.d |4 +-
 gcc/d/dmd/canthrow.d  |   12 +-
 gcc/d/dmd/clone.d |   37 +-
 gcc/d/dmd/constfold.d |6 +-
 gcc/d/dmd/cparse.d|   27 +-
 gcc/d/dmd/ctfeexpr.d  |2 +-
 gcc/d/dmd/dcast.d | 4267 -
 gcc/d/dmd/declaration.d   |5 +-
 gcc/d/dmd/declaration.h   |1 -
 gcc/d/dmd/dinterpret.d|  106 +-
 gcc/d/dmd/dmangle.d   |3 +-
 gcc/d/dmd/dmodule.d   |   78 +-
 gcc/d/dmd/dscope.d|2 +-
 gcc/d/dmd/dsymbol.d   |   11 +-
 gcc/d/dmd/dsymbol.h   |2 +
 gcc/d/dmd/dsymbolsem.d|  184 +-
 gcc/d/dmd/dtemplate.d |   52 +-
 gcc/d/dmd/dtoh.d  |   24 +-
 gcc/d/dmd/escape.d|2 +-
 gcc/d/dmd/expression.d|  115 +-
 gcc/d/dmd/expression.h|   17 +-
 gcc/d/dmd/expressionsem.d |  304 +-
 gcc/d/dmd/func.d  |3 +-
 gcc/d/dmd/hdrgen.d|   70 +-
 gcc/d/dmd/iasmgcc.d   |2 +-
 gcc/d/dmd/id.d|4 -
 gcc/d/dmd/importc.d   |   47 +
 gcc/d/dmd/initsem.d   |4 +
 gcc/d/dmd/lexer.d |  444 +-
 gcc/d/dmd/mtype.d |   45 +-
 gcc/d/dmd/nogc.d  |   42 +-
 gcc/d/dmd/opover.d|  342 +-
 gcc/d/dmd/optimize.d  |7 -
 gcc/d/dmd/parse.d |  794 +--
 gcc/d/dmd/printast.d  |   10 +
 gcc/d/dmd/semantic2.d |2 +-
 gcc/d/dmd/semantic3.d |   22 +-
 gcc/d/dmd/statementsem.d  |  206 +-
 gcc/d/dmd/staticassert.d  |5 +
 gcc/d/dmd/staticassert.h  |1 +
 gcc/d/dmd/tokens.d|  120 +-
 gcc/d/dmd/tokens.h|   13 +-
 gcc/d/dmd/transitivevisitor.d |4 -
 gcc/d/dmd/typesem.d   |   80 +-
 gcc/d/expr.cc |   46 +-
 gcc/d/runtime.def |7 -
 gcc/d/types.cc|   14 +-
 gcc/testsuite/gdc.dg/special1.d   |   12 +
 gcc/testsuite/gdc.test/compilable/99bottles.d |  212 +-
 gcc/testsuite/gdc.test/compilable/b18242.d|6 +-
 gcc/testsuite/gdc.test/compilable/b19294.d|   10 +-
 gcc/testsuite/gdc.test/compilable/b20938.d|6 +-
 gcc/testsuite/gdc.test/compilable/b21285.d|   10 +-
 

Re: libgo patch committed: Update to Go1.18rc1 release

2022-02-20 Thread Rainer Orth
Hi Ian,

> This patch updates libgo to the Go1.18rc1 release.  Bootstrapped and
> ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

this broke Solaris bootstrap:

ld: fatal: file runtime/internal/.libs/syscall.o: open failed: No such file or directory
collect2: error: ld returned 1 exit status

Creating a dummy syscall_solaris.go worked around that for now.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] PR fortran/77693 - ICE in rtl_for_decl_init, at dwarf2out.c:17378

2022-02-20 Thread Thomas Koenig via Gcc-patches



Hi Harald,


> Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Looks good to me.  Thanks for the patch!

Best regards

Thomas


New Swedish PO file for 'gcc' (version 12.1-b20220213)

2022-02-20 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-12.1-b20220213.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




*Ping* [PATCH] PR fortran/77693 - ICE in rtl_for_decl_init, at dwarf2out.c:17378

2022-02-20 Thread Harald Anlauf via Gcc-patches

On 09.02.22 at 22:11, Harald Anlauf via Gcc-patches wrote:

Dear all,

as we did not properly check the initialization of pointers in
DATA statements for valid initial data targets, we could either
ICE or generate wrong code.  Testcase based on Gerhard's, BTW.

The attached patch adds a check in gfc_assign_data_value by
calling gfc_check_pointer_assign, as the latter did not get
called otherwise.

Along the way I introduced a new macro IS_POINTER() that
should help to make the code more readable whenever we need
to check the attributes of a symbol to see whether it is a
pointer, CLASS or not.  At least it may save some typing in
the future.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald





Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-20 Thread Iain Sandoe
Hi Folks.

> On 14 Feb 2022, at 16:58, Vladimir Makarov  wrote:
> On 2022-02-14 11:00, Richard Sandiford wrote:

>> Vladimir Makarov via Gcc-patches  writes:
>>> 
>>> Hi, Richard.  The LRA change is mine and I approved it for Iain's patch.
>>> 
>>> I think there is no need for this code and it is misleading.  If
>>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]'
>>> will help for any existing target.  As machine-dependent code for any
>>> target most probably (for ppc64 darwin it is exactly the case) checks
>>> address only in memory, it can wrongly accept wrong address by reloading
>>> it into reg and use it in memory. So these are my arguments for
>>> removing this code from process_address_1.
>> I'm probably making too much of this, but:
>> 
>> I think the code is potentially useful in that existing targets do
>> forbid lo_sum addresses in certain contexts (due to limited offset range)
>> while still wanting lo_sum to be used to load the address.  If we
>> handle the high/lo_sum split in generic code then we have more chance
>> of being able to optimise things.  So it feels like this is setting an
>> unfortunate precedent.
>> 
>> I still don't understand what went wrong before though (the PR trail
>> was a bit too long to process :-)).  Is there a case where
>> (lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
>> Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
>> be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.
>> 
> Sometimes it is hard to draw a line between an RA bug that is a bug in 
> machine-dependent code and one that is a bug in RA itself.
> 
> For this case I would say it is a bug in both parts.
> 
> Low-sum is generated by LRA and it does not know that it should be wrapped by 
> unspec for darwin. Generally speaking we could avoid the change in LRA but it 
> would require to do non-trivial analysis in machine dependent code to find 
> cases when 'reg=low_sum ... mem[reg]' is incorrect code for darwin (PIC) 
> target (and may be some other PIC targets too). Therefore I believe the 
> change in LRA is a good solution even if the change can potentially result in 
> less optimized code for some cases.  Taking your concern into account we 
> could probably improve the patch by introducing a hook (I never liked such 
> solutions as we already have too many hooks directing RA) or better to make 
> the LRA change working only for PIC target. Something like this (it probably 
> needs better recognition of pic target):
> 
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -3616,21 +3616,21 @@ process_address_1 (int nop, bool check_only_p,
>   if (HAVE_lo_sum)
> {
>   /* addr => lo_sum (new_base, addr), case (2) above.  */
>   insn = emit_insn (gen_rtx_SET
> (new_reg,
>  gen_rtx_HIGH (Pmode, copy_rtx (addr))));
>   code = recog_memoized (insn);
>   if (code >= 0)
> {
>   *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
> - if (!valid_address_p (op, &ad, cn))
> + if (!valid_address_p (op, &ad, cn) && !flag_pic)

IMO the PIC aspect of this is possibly misleading
 - the issue is that we have an invalid address, and that such addresses in 
this case need to be legitimised by wrapping them in an UNSPEC. 
- My concern about the generic code was that I would not expect Darwin to be 
the only platform that might need to wrap an invalid address in an unspec  
[TOC, TLS, small data etc. spring to mind].

I need some help understanding the expected pathway through this code that 
could be useful.

we start with an invalid address.

1. we build (set reg (high invalid_address))
 - Darwin was allowing this (and the lo_sum) [everywhere, not just here] on the 
basis that the target legitimizer would be called later to fix it up.  (that is 
why the initial recog passes) - but AFAICT we never call the target’s address 
legitimizer.

 - I am curious about what (other) circumstance there would be where a (high) of 
an invalid address would be useful.

2. …  assuming that we allowed the build of the (high invalid)

 - we now build the lo_sum and check to see if it is valid.

3. if it is _not_ valid, we load it into a reg

  - I am not sure (outside the comment about post-legitimizer use) about 
how an invalid lo_sum can be used in this way.

  - assuming we accept this, we then test to see if the register is a valid 
address (my guess is that test will pass pretty much everywhere, since we 
picked a suitable register in the first place).

^^^ this is mostly for my education - the stuff below is a potential solution 
to leaving lra-constraints unchanged and fixing the Darwin bug….

[ part of me wonders why we do not just call the target’s address legitimizer 
when we have an illegal address ]

——— current WIP:

So .. I have split the Darwin 

[PATCH] PR tree-optimization/83907: Improved memset handling in strlen pass.

2022-02-20 Thread Roger Sayle

This patch implements the missed optimization enhancement PR 83907,
by handling memset with a constant byte value in tree-ssa's strlen
optimization pass.  Effectively, this treats memset(dst,'x',3) as
it would memcpy(dst,"xxx",3).
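
As a rough illustration of that equivalence (a sketch only, not taken from the
patch or its test cases; the function name fill_and_measure and the explicit
terminator store are assumptions made for the example), the two sequences
below leave the same bytes in memory, so a following strlen sees the same
length either way:

```c
#include <assert.h>
#include <string.h>

/* Fill one buffer with memset and another with memcpy from a literal,
   terminate both explicitly, and return the resulting string length.  */
size_t
fill_and_measure (void)
{
  char a[8], b[8];

  memset (a, 'x', 3);      /* The form the patch teaches the pass about.  */
  a[3] = '\0';

  memcpy (b, "xxx", 3);    /* The form the pass already tracked.  */
  b[3] = '\0';

  /* Both buffers hold 'x', 'x', 'x', '\0' in their first four bytes.  */
  assert (memcmp (a, b, 4) == 0);
  return strlen (a);
}
```

Whether the pass actually folds the strlen call in a given case depends on the
conditions checked in handle_builtin_memset; the sketch only shows the
source-level equivalence.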

This patch also includes a tweak to handle_store to address another
missed optimization observed in the related test case pr83907-2.c.
The consecutive byte stores to memory get coalesced into a vector
write of a vector const, but unfortunately tree-ssa-strlen's
handle_store didn't previously handle the (unusual) case where the
stored "string" starts with a zero byte but also contains non-zero
bytes.
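
A minimal sketch of that unusual case, under the assumption that a single
memcpy of a small constant array is a fair stand-in for the coalesced vector
store (the function name and buffer contents are made up for illustration and
are not from pr83907-2.c):

```c
#include <assert.h>
#include <string.h>

/* Overwrite the start of a string with bytes whose first element is
   zero but whose remaining elements are not, then measure it.  */
size_t
len_after_leading_nul_store (void)
{
  char buf[8] = "abcdefg";
  static const char bytes[4] = { '\0', 'x', 'y', 'z' };

  memcpy (buf, bytes, sizeof bytes);   /* One store, first byte zero.  */

  assert (buf[1] == 'x');              /* Non-zero bytes follow the NUL.  */
  return strlen (buf);                 /* The string ends at offset 0.  */
}
```

The store is not all zeros, yet the string length is still fully determined by
the leading zero byte, which is the situation the handle_store tweak is
described as covering.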

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-02-20  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/83907
* tree-ssa-strlen.cc (handle_builtin_memset): Record a strinfo
for memset with a constant char value.
(handle_store): Improved handling of stores with a first byte
of zero, but not storing_all_zeros_p.

gcc/testsuite/ChangeLog
PR tree-optimization/83907
* gcc.dg/tree-ssa/pr83907-1.c: New test case.
* gcc.dg/tree-ssa/pr83907-2.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 7370516..d2db197 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -3810,9 +3810,44 @@ strlen_pass::handle_builtin_memset (bool *zero_write)
 {
   gimple *memset_stmt = gsi_stmt (m_gsi);
   tree ptr = gimple_call_arg (memset_stmt, 0);
+  tree memset_val = gimple_call_arg (memset_stmt, 1);
+  tree memset_size = gimple_call_arg (memset_stmt, 2);
+
   /* Set to the non-constant offset added to PTR.  */
   wide_int offrng[2];
   int idx1 = get_stridx (ptr, memset_stmt, offrng, ptr_qry.rvals);
+  if (idx1 == 0
+  && TREE_CODE (memset_val) == INTEGER_CST
+  && ((TREE_CODE (memset_size) == INTEGER_CST
+  && !integer_zerop (memset_size))
+ || TREE_CODE (memset_size) == SSA_NAME))
+{
+  unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << CHAR_TYPE_SIZE) - 1;
+  bool full_string_p = (wi::to_wide (memset_val) & mask) == 0;
+
+  /* We only handle symbolic lengths when writing non-zero values.  */
+  if (full_string_p && TREE_CODE (memset_size) != INTEGER_CST)
+   return false;
+
+  idx1 = new_stridx (ptr);
+  if (idx1 == 0)
+   return false;
+  tree newlen;
+  if (full_string_p)
+   newlen = build_int_cst (size_type_node, 0);
+  else if (TREE_CODE (memset_size) == INTEGER_CST)
+   newlen = fold_convert (size_type_node, memset_size);
+  else
+   newlen = memset_size;
+
+  strinfo *dsi = new_strinfo (ptr, idx1, newlen, full_string_p);
+  set_strinfo (idx1, dsi);
+  find_equal_ptrs (ptr, idx1);
+  dsi->dont_invalidate = true;
+  dsi->writable = true;
+  return false;
+}
+
   if (idx1 <= 0)
 return false;
   strinfo *si1 = get_strinfo (idx1);
@@ -3825,7 +3860,6 @@ strlen_pass::handle_builtin_memset (bool *zero_write)
   if (!valid_builtin_call (alloc_stmt))
 return false;
   tree alloc_size = gimple_call_arg (alloc_stmt, 0);
-  tree memset_size = gimple_call_arg (memset_stmt, 2);
 
   /* Check for overflow.  */
   maybe_warn_overflow (memset_stmt, false, memset_size, NULL, false, true);
@@ -3841,7 +3875,7 @@ strlen_pass::handle_builtin_memset (bool *zero_write)
 return false;
 
   /* Bail when the call writes a non-zero value.  */
-  if (!integer_zerop (gimple_call_arg (memset_stmt, 1)))
+  if (!integer_zerop (memset_val))
 return false;
 
   /* Let the caller know the memset call cleared the destination.  */
@@ -5093,8 +5127,9 @@ strlen_pass::handle_store (bool *zero_write)
  return false;
}
 
-  if (storing_all_zeros_p
- || storing_nonzero_p
+  if (storing_nonzero_p
+ || storing_all_zeros_p
+ || (full_string_p && lenrange[1] == 0)
  || (offset != 0 && store_before_nul[1] > 0))
{
  /* When STORING_NONZERO_P, we know that the string will start
@@ -5104,8 +5139,9 @@ strlen_pass::handle_store (bool *zero_write)
 of leading non-zero characters and set si->NONZERO_CHARS to
 the result instead.
 
-When STORING_ALL_ZEROS_P, we know that the string is now
-OFFSET characters long.
+When STORING_ALL_ZEROS_P, or the first byte written is zero,
+i.e. FULL_STRING_P && LENRANGE[1] == 0, we know that the
+string is now OFFSET characters long.
 
 Otherwise, we're storing an unknown value at offset OFFSET,
 so need to clip the nonzero_chars to OFFSET.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83907-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83907-1.c
new file mode 100644
index 000..2a6f4f5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83907-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */

[PATCH] Improved constant folding for scalar evolution.

2022-02-20 Thread Roger Sayle

This patch adds a small (follow-up) optimization to chrec_apply for
linear chrecs to clean up the final value expressions sometimes generated
by GCC's scalar evolution pass.  The transformation of A+(X-1)*A into
A*X is usually unsafe with respect to overflow (see PR92712), and so
can't be performed by match.pd (or fold-const).  However, during scalar
evolution's evaluation of recurrences it's known that X-1 can't be negative
(in fact X-1 is unsigned even when A is signed), hence this optimization
can be applied.  Interestingly, this expression does get simplified in
later passes once the range of X-1 is bounded by VRP, but that occurs
long after the decision of whether to perform final value replacement,
which is based on the complexity of this expression.

The motivating test case is the optimization of the loop (from comment
#7 of PR65855):

int square(int x) {
  int result = 0;
  for (int i = 0; i < x; ++i)
    result += x;
  return result;
}

which is currently optimized, with final value replacement to:

  final value replacement:
   with expr: (int) ((unsigned int) x_3(D) + 4294967295) * x_3(D) + x_3(D)

but with this patch it first gets simplified further:

  final value replacement:
   with expr: x_3(D) * x_3(D)
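
As a sanity check of the arithmetic (a minimal sketch, not part of the
patch), the two final value expressions can be compared in plain C.  The
names final_value_old and final_value_new are hypothetical, and unsigned
arithmetic is used throughout so that wraparound is well defined, which
sidesteps the signed overflow concern mentioned above.

```c
#include <limits.h>

/* The loop from PR65855 comment #7, unchanged.  */
static int
square_loop (int x)
{
  int result = 0;
  for (int i = 0; i < x; ++i)
    result += x;
  return result;
}

/* Hypothetical stand-in for the old final value, (x - 1) * x + x.
   Adding UINT_MAX is the same as subtracting 1 modulo 2^N, which
   mirrors the "+ 4294967295" in the dump above.  */
static unsigned int
final_value_old (unsigned int x)
{
  return (x + UINT_MAX) * x + x;
}

/* Hypothetical stand-in for the simplified final value, x * x.  */
static unsigned int
final_value_new (unsigned int x)
{
  return x * x;
}
```

Modulo 2^N the two forms always agree, and for small non-negative x both
match the value computed by the loop.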


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-02-20  Roger Sayle  

gcc/ChangeLog
* tree-chrec.cc (chrec_apply): Attempt to fold the linear chrec
"{a, +, a} (x-1)" as "a*x", as the number of loop iterations, x-1,
can't be negative.

gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/pr65855-2.c: New test case.


Roger
--

diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
index c44cea7..7321fb9 100644
--- a/gcc/tree-chrec.cc
+++ b/gcc/tree-chrec.cc
@@ -612,16 +612,31 @@ chrec_apply (unsigned var,
 case POLYNOMIAL_CHREC:
   if (evolution_function_is_affine_p (chrec))
{
+ tree chrecr = CHREC_RIGHT (chrec);
  if (CHREC_VARIABLE (chrec) != var)
-   return build_polynomial_chrec
+   res = build_polynomial_chrec
  (CHREC_VARIABLE (chrec),
   chrec_apply (var, CHREC_LEFT (chrec), x),
-  chrec_apply (var, CHREC_RIGHT (chrec), x));
+  chrec_apply (var, chrecr, x));
 
  /* "{a, +, b} (x)"  ->  "a + b*x".  */
- x = chrec_convert_rhs (type, x, NULL);
- res = chrec_fold_multiply (TREE_TYPE (x), CHREC_RIGHT (chrec), x);
- res = chrec_fold_plus (type, CHREC_LEFT (chrec), res);
+ else if (operand_equal_p (CHREC_LEFT (chrec), chrecr)
+  && TREE_CODE (x) == PLUS_EXPR
+  && integer_all_onesp (TREE_OPERAND (x, 1)))
+   {
+ /* We know the number of iterations can't be negative.
+So {a, +, a} (x-1) -> "a*x".  */
+ res = build_int_cst (TREE_TYPE (x), 1);
+ res = chrec_fold_plus (TREE_TYPE (x), x, res);
+ res = chrec_convert_rhs (type, res, NULL);
+ res = chrec_fold_multiply (type, chrecr, res);
+   }
+ else
+   {
+ res = chrec_convert_rhs (TREE_TYPE (chrecr), x, NULL);
+ res = chrec_fold_multiply (TREE_TYPE (chrecr), chrecr, res);
+ res = chrec_fold_plus (type, CHREC_LEFT (chrec), res);
+   }
}
   else if (TREE_CODE (x) == INTEGER_CST
   && tree_int_cst_sgn (x) == 1)
@@ -644,7 +659,7 @@ chrec_apply (unsigned var,
 
   if (dump_file && (dump_flags & TDF_SCEV))
 {
-  fprintf (dump_file, "  (varying_loop = %d\n", var);
+  fprintf (dump_file, "  (varying_loop = %d", var);
   fprintf (dump_file, ")\n  (chrec = ");
   print_generic_expr (dump_file, chrec);
   fprintf (dump_file, ")\n  (x = ");
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr65855-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr65855-2.c
new file mode 100644
index 000..d44ef51
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr65855-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sccp" } */
+
+int square(int x) {
+  int result = 0;
+  for (int i = 0; i < x; ++i)
+    result += x;
+  return result;
+}
+
+/* { dg-final { scan-tree-dump " with expr: x_\[0-9\]\\(D\\) \\* x_\[0-9\]\\(D\\)" "sccp" } } */


Re: [PATCH] libgo: include asm/ptrace.h for pt_regs definition on PowerPC

2022-02-20 Thread Andreas Schwab
On Jan 02 2022, soeren--- via Gcc-patches wrote:

>   libgo/runtime/go-signal.c: In function 'getSiginfo':
>   libgo/runtime/go-signal.c:227:63: error: invalid use of undefined type 'struct pt_regs'
>   227 | ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;

Why does that use .regs instead of .uc_regs?

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] libgo: include asm/ptrace.h for pt_regs definition on PowerPC

2022-02-20 Thread Sören Tempel via Gcc-patches
Ping.

Summary: Fix build of libgo on PPC with musl libc and libucontext by
explicitly including the Linux header defining `struct pt_regs` instead of
relying on other libc headers to include it implicitly.

See: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587520.html

If the patch needs to be revised further, please let me know. This patch has
been applied downstream at Alpine Linux (which uses musl libc) for a while; I
have not tested it on other systems.
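
To illustrate the underlying C issue (a minimal sketch with made-up
names, not the actual libc or kernel declarations): a struct that is
only forward-declared can be pointed to, but its members cannot be
accessed until the full definition is visible. Here struct regs_demo
plays the role of pt_regs, struct mcontext_demo the role of mcontext_t,
and the definition of struct regs_demo stands in for what including
asm/ptrace.h provides.

```c
/* Forward declaration only: at this point struct regs_demo is an
   incomplete type, much like pt_regs as seen through ucontext.h.  */
struct regs_demo;

/* A pointer to an incomplete type is fine in a declaration.  */
struct mcontext_demo
{
  struct regs_demo *regs;
};

/* Completing the type.  Without this definition, the member access in
   demo_pc would fail with "invalid use of undefined type".  */
struct regs_demo
{
  unsigned long nip;
};

/* Reads the saved program counter through the pointer.  */
static unsigned long
demo_pc (const struct mcontext_demo *mc)
{
  return mc->regs->nip;
}

/* Hypothetical helper: build a context holding PC and read it back.  */
static unsigned long
demo_roundtrip (unsigned long pc)
{
  struct regs_demo r = { pc };
  struct mcontext_demo mc = { &r };
  return demo_pc (&mc);
}
```

Moving the definition of struct regs_demo below demo_pc reproduces the
same class of error as the one quoted below.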

Greetings,
Sören

Sören Tempel  wrote:
> Both glibc and musl libc declare pt_regs as an incomplete type. This
> type has to be completed by inclusion of another header. On Linux, the
> asm/ptrace.h header file provides this type definition. Without
> including this header file, it is not possible to access the regs member
> of the mcontext_t struct as done in libgo/runtime/go-signal.c. On glibc,
> other headers (e.g. sys/user.h) include asm/ptrace.h but on musl
> asm/ptrace.h is not included by other headers and thus the
> aforementioned files do not compile without an explicit include of
> asm/ptrace.h:
> 
>   libgo/runtime/go-signal.c: In function 'getSiginfo':
>   libgo/runtime/go-signal.c:227:63: error: invalid use of undefined type 'struct pt_regs'
>   227 | ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;
> |
> 
> See also:
> 
> * https://git.musl-libc.org/cgit/musl/commit/?id=c2518a8efb6507f1b41c3b12e03b06f8f2317a1f
> * https://github.com/kaniini/libucontext/issues/36
> 
> Signed-off-by: Sören Tempel 
> 
> ChangeLog:
> 
>   * libgo/runtime/go-signal.c: Include asm/ptrace.h for the
> definition of pt_regs (used by mcontext_t) on PowerPC.
> ---
>  libgo/runtime/go-signal.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
> index d30d1603adc..fc01e04e4a1 100644
> --- a/libgo/runtime/go-signal.c
> +++ b/libgo/runtime/go-signal.c
> @@ -10,6 +10,12 @@
>  #include 
>  #include 
>  
> +// On PowerPC, ucontext.h uses a pt_regs struct as an incomplete
> +// type. This type must be completed by including asm/ptrace.h.
> +#ifdef __PPC__
> +#include <asm/ptrace.h>
> +#endif
> +
>  #include "runtime.h"
>  
>  #ifndef SA_RESTART