Re: [PATCH] Add __builtin_iseqsig()

2022-10-28 Thread Jeff Law via Gcc-patches



On 9/21/22 03:40, FX via Gcc-patches wrote:

ping*2




On 9 Sept 2022, at 19:55, FX wrote:

ping



On 1 Sept 2022, at 23:02, FX wrote:

Attached patch adds __builtin_iseqsig() to the middle-end and C family 
front-ends.
Testing does not currently check whether the signaling part works, because with 
optimisation it actually does not (pre-existing compiler bug: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106805)
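For readers unfamiliar with the builtin: a minimal model of the intended iseqsig semantics (this is an illustrative sketch, not the builtin itself; the flag variable stands in for the FP "invalid" exception the real builtin must raise):

```c
#include <math.h>
#include <stdbool.h>

/* Model of iseqsig semantics: like x == y, but a NaN operand must
   raise the "invalid" exception.  Recorded here with a plain flag
   instead of touching the floating-point environment.  */
static bool invalid_raised;

static int my_iseqsig (double x, double y)
{
  if (isnan (x) || isnan (y))
    invalid_raised = true;   /* where __builtin_iseqsig must signal */
  return x == y;
}
```

The point of the patch (and of PR106805) is that the signaling part must survive optimisation, which the quiet `==` alone does not guarantee.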

Bootstrapped and regtested on x86_64-linux.
OK to commit?

(I’m not very skilled at middle-end hacking, so I’m sure there will be 
modifications to make.)

FX
<0001-Add-__builtin_iseqsig.patch>


Joseph, do you have bits in this space that are going to be landing 
soon, or is your C2X work focused elsewhere?  Are there other C2X 
routines we need to be providing builtins for?



Jeff



Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/20/22 13:01, Andrea Parri wrote:

On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:

 +Andrea, in case he has time to look at the memory model / ABI
 issues.

+Jeff, who was offering to help when the threads got crossed.  I'd punted on
a lot of this in the hope Andrea could help out, as I'm not really a memory
model guy and this is pretty far down the rabbit hole.  Happy to have the
help if you're offering, though, as what's there is likely a pretty big
performance issue for anyone with a reasonable memory system.

Thanks for linking me to the discussion and the remarks, Palmer.  I'm
happy to help (and synchronize with Jeff/the community) as much as
possible in building a better understanding of the 'issues' at stake.

Summarizing here some findings from looking at the currently-implemented
and the proposed [1] mappings:

   - Current mapping is missing synchronization, notably

atomic_compare_exchange_weak_explicit(-, -, -,
  memory_order_release,
  memory_order_relaxed);

 is unable to provide the (required) release ordering guarantees; for
 reference, I've reported a litmus test illustrating it at the bottom
 of this email, cf. c-cmpxchg.

   - [1] addressed the "memory_order_release" problem/bug mentioned above
 (as well as other quirks of the current mapping I won't detail here),
 but it doesn't address other problems present in the current mapping;
 in particular, both mappings translate the following

atomic_compare_exchange_weak_explicit(-, -, -,
  memory_order_acquire,
  memory_order_relaxed);

 to a sequence

lr.w
bne
sc.w.aq

 (without any other synchronization/fences), which contrasts with the
 Unprivileged Spec, Section 10.2 "Load-Reserve / Store-Conditional
 Instructions":

   "Software should not set the 'rl' bit on an LR instruction unless
   the 'aq' bit is also set, nor should software set the 'aq' bit on
   an SC instruction unless the 'rl' bit is also set.  LR.rl and SC.aq
   instructions are not guaranteed to provide any stronger ordering
   than those with both bits clear [...]"
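For reference, the C11 source form of the operation under discussion (a minimal sketch; the whole question in this thread is which LR/SC + fence sequence the RISC-V backend must emit so the requested ordering actually holds):

```c
#include <stdatomic.h>

/* Weak CAS with release ordering on success, relaxed on failure --
   the combination whose current mapping is missing synchronization.  */
static int
cas_release (atomic_int *obj, int expected, int desired)
{
  return atomic_compare_exchange_weak_explicit (obj, &expected, desired,
                                                memory_order_release,
                                                memory_order_relaxed);
}
```

A weak CAS may fail spuriously, so callers typically retry in a loop; the ordering guarantee applies to the successful exchange.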


So it sounds like Christoph's patch is an improvement, but isn't 
complete.  Given the pain in this space, I'd be hesitant to put in an 
incomplete fix, even if it moves things in the right direction, as it 
creates another compatibility headache if we don't get the complete 
solution in place for gcc-13.



Christoph, thoughts on the case Andrea pointed out?


Jeff




Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/24/22 00:04, Aldy Hernandez via Gcc-patches wrote:

PING


I'd be a lot more comfortable if Jakub would chime in here.


Jeff




Re: [PATCH] builtins: Add various __builtin_*f{16,32,64,128,32x,64x,128x} builtins

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/16/22 04:09, Jakub Jelinek wrote:

Hi!

When working on libstdc++ extended float support in , I found that
we need various builtins for the _Float{16,32,64,128,32x,64x,128x} types.
Glibc 2.26 and later provides the underlying libm routines (except for
_Float16 and _Float128x for the time being) and in libstdc++ I think we
need at least the _Float128 builtins on x86_64, i?86, powerpc64le and ia64
(when long double is IEEE quad, we can handle it by using __builtin_*l
instead), because without the builtins the overloads couldn't be constexpr
(say when it would declare the *f128 extern "C" routines itself and call
them).
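To illustrate why builtins matter for constexpr (a sketch using double rather than _Float128, since the principle is the same): a call the compiler can constant-fold may appear in a constant expression, while a call to an extern "C" libm routine may not.

```cpp
// The compiler folds the builtin at compile time, so the initializer
// is a constant expression; a plain extern "C" sqrt declaration could
// not be used this way.
constexpr double root2 = __builtin_sqrt (2.0);
static_assert (root2 > 1.414 && root2 < 1.415, "folded at compile time");
```

The patch extends the same folding machinery to the `*f{16,32,64,128,32x,64x,128x}` variants so the libstdc++ overloads can be constexpr.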

The testcase covers just types of those builtins and their constant
folding, so doesn't need actual libm support.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-10-15  Jakub Jelinek  

* builtin-types.def (BT_FLOAT16_PTR, BT_FLOAT32_PTR, BT_FLOAT64_PTR,
BT_FLOAT128_PTR, BT_FLOAT32X_PTR, BT_FLOAT64X_PTR, BT_FLOAT128X_PTR):
New DEF_PRIMITIVE_TYPE.
(BT_FN_INT_FLOAT16, BT_FN_INT_FLOAT32, BT_FN_INT_FLOAT64,
BT_FN_INT_FLOAT128, BT_FN_INT_FLOAT32X, BT_FN_INT_FLOAT64X,
BT_FN_INT_FLOAT128X, BT_FN_LONG_FLOAT16, BT_FN_LONG_FLOAT32,
BT_FN_LONG_FLOAT64, BT_FN_LONG_FLOAT128, BT_FN_LONG_FLOAT32X,
BT_FN_LONG_FLOAT64X, BT_FN_LONG_FLOAT128X, BT_FN_LONGLONG_FLOAT16,
BT_FN_LONGLONG_FLOAT32, BT_FN_LONGLONG_FLOAT64,
BT_FN_LONGLONG_FLOAT128, BT_FN_LONGLONG_FLOAT32X,
BT_FN_LONGLONG_FLOAT64X, BT_FN_LONGLONG_FLOAT128X): New
DEF_FUNCTION_TYPE_1.
(BT_FN_FLOAT16_FLOAT16_FLOAT16PTR, BT_FN_FLOAT32_FLOAT32_FLOAT32PTR,
BT_FN_FLOAT64_FLOAT64_FLOAT64PTR, BT_FN_FLOAT128_FLOAT128_FLOAT128PTR,
BT_FN_FLOAT32X_FLOAT32X_FLOAT32XPTR,
BT_FN_FLOAT64X_FLOAT64X_FLOAT64XPTR,
BT_FN_FLOAT128X_FLOAT128X_FLOAT128XPTR, BT_FN_FLOAT16_FLOAT16_INT,
BT_FN_FLOAT32_FLOAT32_INT, BT_FN_FLOAT64_FLOAT64_INT,
BT_FN_FLOAT128_FLOAT128_INT, BT_FN_FLOAT32X_FLOAT32X_INT,
BT_FN_FLOAT64X_FLOAT64X_INT, BT_FN_FLOAT128X_FLOAT128X_INT,
BT_FN_FLOAT16_FLOAT16_INTPTR, BT_FN_FLOAT32_FLOAT32_INTPTR,
BT_FN_FLOAT64_FLOAT64_INTPTR, BT_FN_FLOAT128_FLOAT128_INTPTR,
BT_FN_FLOAT32X_FLOAT32X_INTPTR, BT_FN_FLOAT64X_FLOAT64X_INTPTR,
BT_FN_FLOAT128X_FLOAT128X_INTPTR, BT_FN_FLOAT16_FLOAT16_LONG,
BT_FN_FLOAT32_FLOAT32_LONG, BT_FN_FLOAT64_FLOAT64_LONG,
BT_FN_FLOAT128_FLOAT128_LONG, BT_FN_FLOAT32X_FLOAT32X_LONG,
BT_FN_FLOAT64X_FLOAT64X_LONG, BT_FN_FLOAT128X_FLOAT128X_LONG): New
DEF_FUNCTION_TYPE_2.
(BT_FN_FLOAT16_FLOAT16_FLOAT16_INTPTR,
BT_FN_FLOAT32_FLOAT32_FLOAT32_INTPTR,
BT_FN_FLOAT64_FLOAT64_FLOAT64_INTPTR,
BT_FN_FLOAT128_FLOAT128_FLOAT128_INTPTR,
BT_FN_FLOAT32X_FLOAT32X_FLOAT32X_INTPTR,
BT_FN_FLOAT64X_FLOAT64X_FLOAT64X_INTPTR,
BT_FN_FLOAT128X_FLOAT128X_FLOAT128X_INTPTR): New DEF_FUNCTION_TYPE_3.
* builtins.def (ACOSH_TYPE, ATAN2_TYPE, ATANH_TYPE, COSH_TYPE,
FDIM_TYPE, HUGE_VAL_TYPE, HYPOT_TYPE, ILOGB_TYPE, LDEXP_TYPE,
LGAMMA_TYPE, LLRINT_TYPE, LOG10_TYPE, LRINT_TYPE, MODF_TYPE,
NEXTAFTER_TYPE, REMQUO_TYPE, SCALBLN_TYPE, SCALBN_TYPE, SINH_TYPE):
Define and undefine later.
(FMIN_TYPE, SQRT_TYPE): Undefine at a later line.
(INF_TYPE): Define at a later line.
(BUILT_IN_ACOSH, BUILT_IN_ACOS, BUILT_IN_ASINH, BUILT_IN_ASIN,
BUILT_IN_ATAN2, BUILT_IN_ATANH, BUILT_IN_ATAN, BUILT_IN_CBRT,
BUILT_IN_COSH, BUILT_IN_COS, BUILT_IN_ERFC, BUILT_IN_ERF,
BUILT_IN_EXP2, BUILT_IN_EXP, BUILT_IN_EXPM1, BUILT_IN_FDIM,
BUILT_IN_FMOD, BUILT_IN_FREXP, BUILT_IN_HYPOT, BUILT_IN_ILOGB,
BUILT_IN_LDEXP, BUILT_IN_LGAMMA, BUILT_IN_LLRINT, BUILT_IN_LLROUND,
BUILT_IN_LOG10, BUILT_IN_LOG1P, BUILT_IN_LOG2, BUILT_IN_LOGB,
BUILT_IN_LOG, BUILT_IN_LRINT, BUILT_IN_LROUND, BUILT_IN_MODF,
BUILT_IN_NEXTAFTER, BUILT_IN_POW, BUILT_IN_REMAINDER, BUILT_IN_REMQUO,
BUILT_IN_SCALBLN, BUILT_IN_SCALBN, BUILT_IN_SINH, BUILT_IN_SIN,
BUILT_IN_TANH, BUILT_IN_TAN, BUILT_IN_TGAMMA): Add
DEF_EXT_LIB_FLOATN_NX_BUILTINS.
(BUILT_IN_HUGE_VAL): Use HUGE_VAL_TYPE instead of INF_TYPE in
DEF_GCC_FLOATN_NX_BUILTINS.
* fold-const-call.cc (fold_const_call_ss): Add various CASE_CFN_*_FN:
cases when CASE_CFN_* is present.
(fold_const_call_sss): Likewise.
* builtins.cc (mathfn_built_in_2): Use CASE_MATHFN_FLOATN instead of
CASE_MATHFN for various builtins in SEQ_OF_CASE_MATHFN macro.
(builtin_with_linkage_p): Add CASE_FLT_FN_FLOATN_NX for various
builtins next to CASE_FLT_FN.
* fold-const.cc (tree_call_nonnegative_warnv_p): Add CASE_CFN_*_FN:
next to CASE_CFN_*: for various builtins.
* tree-call-cdce.cc (can_test_argument_range): Add
CASE_FLT_FN_FLOATN_NX next to CASE_FLT_FN for various 

Re: [PATCH] improved const shifts for AVR targets

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/15/22 06:08, A. Binzberger wrote:

Re: [PATCH] improved const shifts for AVR targets
On 12.10.22 19:57, Jeff Law wrote:


On 10/4/22 11:06, Alexander Binzberger via Gcc-patches wrote:

Hi,
recently I used some arduino uno for a project and realized some areas
which do not output optimal asm code. Especially around shifts and 
function

calls.
With this as motivation and hacktoberfest I started patching things.
Since patch files do not provide a good overview and I hope for a
"hacktoberfest-accepted" label on the PR on github I also opened it 
there:

https://github.com/gcc-mirror/gcc/pull/73

This patch improves shifts with a const right-hand operand. While 8bit 
and 16bit shifts were mostly fine, 24bit and 32bit shifts were not 
handled well.

Testing
I checked output with a local installation of compiler explorer in 
asm and

a tiny unit test comparing shifts with mul/div by 2.
I however did not write any testcases in gcc for it.

Target
This patch is only targeting atmel avr family of chips.

Changelog
improved const shifts for AVR targets


It would be helpful if you could show the before/after code for the 
cases you're changing.  Extra credit if you include cycles & size 
information for those cases.  That would help someone like me who 
knows GCC well, but isn't particularly well versed in the AVR target 
evaluate the overarching goal of the patch (ie, better code).


about the avr family targets:

* consider every branch instruction = 1/2 cycles

* consider every 2byte/word instruction (besides move word if 
available) = 2 cycles


* consider multiplication (if available) = 2 cycles

* consider every load (besides load immediate "ldi", which is 1 cycle) = 
2 cycles (+1 for prog mem)


* pop and jump mostly = 2 cycles

* call is basically = 2-4 cycles

* ret is about =  4/5 cycles

* consider every instruction (bit/bit-test, most compare, arithmetic, 
logic, some other) = 1 cycle


* division does not exist

or as a summary for this patch: branches and such are 2 cycles, the 
rest is 1 cycle


note that shifts are 1 bit per cycle and the instructions are at least 
mostly byte based.


also note that operations using an immediate only work with the upper 
half of the registers.


All useful, but you should be giving me the summary for the things 
you're changing, not asking me to do it :-)  Presumably you've already 
done the analysis to ensure your changes are an improvement.  I'm asking 
you to provide that analysis for review and archival purposes.



A quick table like


Mode    Shift count    Shift type    original cycles (or size) new 
cycles (or size)



That will make it very clear for me and anyone doing historical work in 
the future what was expected here.  It's OK if the cycle counts aren't 
100% accurate.



Including a testcase would be awesome as well, but isn't strictly required.



a description for the code before my change and what changed:

* shifts on 8bit (beside arithmetic shifts right) were optimized and 
always unrolled (only aligned with the rest of the code without actual 
change)


* arithmetic shift 8bit and 16bit shifts were mostly optimized and 
mostly unrolled - depending on registers and Os (I added the missing 
cases there)


* 24bit and 32bit shifts were basically not optimized at all and never 
unrolled (I added those cases and aligned the optimizer logic with the 
others. They also reuse the other shift code since they may reduce to 
those cases after a move for bigger shifts.)


* out_shift_with_cnt provides a fallback implementation as a loop over 
shifts which may get unrolled: in the case of Os to about inner_len + 3, 
4 or 5, and at other optimization levels, e.g. O2, it gets unrolled if 
the size is smaller than 10; see max_len (basically unchanged)


* did not touch non const cases in this patch but may in a future 
patch for O2 and O3


note that in the case of Os the smaller code is picked, which is the 
loop in at least some cases, but other optimization levels profit a lot.


also note that it is debatable whether Os needs to be that strict with 
size, since the compute overhead of the loop is high: 5 cycles per loop 
iteration, i.e. per bit of shift. A lot more cases could be covered 
with just +1 or +2 more instructions.



about plen:

If plen is NULL the asm code gets returned.

If plen is a pointer, the code counts instructions instead, which I 
guess is used (or could be used) as a rough estimate of cycles as well 
as of byte code size.


Some of the functions named this len. The 24bit functions mainly named 
it plen and used it the way it is now used in all functions. This is 
mostly a readability improvement.


I am not sure how this works together with the optimizer or the rest.

To my understanding however the functions may get called once by the 
optimizer with a length given, then to output code and potentially 
again with a len given over avr_adjust_length to return the size.


I may be wrong about this part but as far as I can tell I did not 
change the way it operates.



size and cycles summary:

The 

Re: [PATCH v4] RISC-V: Libitm add RISC-V support.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/28/22 06:34, Xiongchuan Tan via Gcc-patches wrote:

Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

libitm/ChangeLog:

 * configure.tgt: Add riscv support.
 * config/riscv/asm.h: New file.
 * config/riscv/sjlj.S: New file.
 * config/riscv/target.h: New file.
---
v2: Change HW_CACHELINE_SIZE to 64 (in accordance with the RVA profiles, see
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc)

v3: Ensure the stack is aligned to 16 bytes; make use of Zihintpause in
cpu_relax()

v4: Add a guard for unsupported RV32E

  libitm/config/riscv/asm.h|  58 ++
  libitm/config/riscv/sjlj.S   | 144 +++
  libitm/config/riscv/target.h |  62 +++
  libitm/configure.tgt |   2 +
  4 files changed, 266 insertions(+)
  create mode 100644 libitm/config/riscv/asm.h
  create mode 100644 libitm/config/riscv/sjlj.S
  create mode 100644 libitm/config/riscv/target.h

diff --git a/libitm/config/riscv/asm.h b/libitm/config/riscv/asm.h
new file mode 100644
index 000..8d02117
--- /dev/null
+++ b/libitm/config/riscv/asm.h
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Xiongchuan Tan .
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef _RV_ASM_H
+#define _RV_ASM_H
+
+#ifdef __riscv_e
+#  error "rv32e unsupported"
+#endif


error "rv32e and rv64e unsupported" would probably be a better error 
here.  But it's probably not a big deal.




+#else
+#  define SZ_FPR 0
+#endif


Sneaky way to not allocate space for the FP regs.  ;)

Do you have commit access?  If so, go ahead and commit the change.  Else 
let me know and I can do it for you.



Thanks,



Jeff



Re: Ping^3 [PATCH V2] Add attribute hot judgement for INLINE_HINT_known_hot hint.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/20/22 19:52, Cui, Lili via Gcc-patches wrote:

Hi Honza,

Gentle ping  
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601934.html

gcc/ChangeLog

   * ipa-inline-analysis.cc (do_estimate_edge_time): Add function attribute
   judgement for INLINE_HINT_known_hot hint.

gcc/testsuite/ChangeLog:

   * gcc.dg/ipa/inlinehint-6.c: New test.
---
  gcc/ipa-inline-analysis.cc  | 13 ---
  gcc/testsuite/gcc.dg/ipa/inlinehint-6.c | 47 +
  2 files changed, 56 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/ipa/inlinehint-6.c

diff --git a/gcc/ipa-inline-analysis.cc b/gcc/ipa-inline-analysis.cc
index 1ca685d1b0e..7bd29c36590 100644
--- a/gcc/ipa-inline-analysis.cc
+++ b/gcc/ipa-inline-analysis.cc
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "ipa-utils.h"
  #include "cfgexpand.h"
  #include "gimplify.h"
+#include "attribs.h"
  
  /* Cached node/edge growths.  */

  fast_call_summary *edge_growth_cache = 
NULL;
@@ -249,15 +250,19 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal 
*ret_nonspec_time)
hints = estimates.hints;
  }
  
-  /* When we have profile feedback, we can quite safely identify hot

- edges and for those we disable size limits.  Don't do that when
- probability that caller will call the callee is low however, since it
+  /* When we have profile feedback or function attribute, we can quite safely
+ identify hot edges and for those we disable size limits.  Don't do that
+ when probability that caller will call the callee is low however, since it
   may hurt optimization of the caller's hot path.  */
-  if (edge->count.ipa ().initialized_p () && edge->maybe_hot_p ()
+  if ((edge->count.ipa ().initialized_p () && edge->maybe_hot_p ()
&& (edge->count.ipa () * 2
  > (edge->caller->inlined_to
 ? edge->caller->inlined_to->count.ipa ()
 : edge->caller->count.ipa (
+  || (lookup_attribute ("hot", DECL_ATTRIBUTES (edge->caller->decl))
+ != NULL
+&& lookup_attribute ("hot", DECL_ATTRIBUTES (edge->callee->decl))
+ != NULL))
  hints |= INLINE_HINT_known_hot;


Is the theory here that if the user has marked the caller and callee as 
hot, then we're going to assume an edge between them is hot too?  That's 
not necessarily true; they could both be hot via other call chains.  But 
it's probably a reasonable heuristic in practice.
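The case under discussion, in miniature (the attribute placement is what the patch keys on, not these particular function bodies):

```c
/* Both caller and callee carry the "hot" attribute, so with the patch
   the edge between them gets INLINE_HINT_known_hot even without
   profile feedback.  */
__attribute__ ((hot)) static int
callee (int x)
{
  return x * 2;
}

__attribute__ ((hot)) static int
caller (int x)
{
  return callee (x) + 1;   /* the edge the heuristic treats as hot */
}
```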



OK


jeff




Re: [PATCH] libgcc: Special-case BFD ld unwind table encodings in find_fde_tail

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/17/22 03:06, Florian Weimer via Gcc-patches wrote:

BFD ld (and the other linkers) only produce one encoding of these
values.  It is not necessary to use the general
read_encoded_value_with_base decoding routine.  This avoids the
data-dependent branches in its implementation.

libgcc/

* unwind-dw2-fde-dip.c (find_fde_tail): Special-case encoding
values actually used by BFD ld.


OK.

jeff




Re: [committed] More infrastructure to avoid bogus RTL on H8

2022-10-28 Thread Jeff Law via Gcc-patches


On 10/25/22 13:59, Jan-Benedict Glaw wrote:

Hi Jeff!

On Mon, 2022-10-17 17:47:16 -0600, Jeff Law via Gcc-patches 
 wrote:

--- a/gcc/config/h8300/h8300.cc
+++ b/gcc/config/h8300/h8300.cc
@@ -5531,6 +5531,32 @@ h8300_ok_for_sibcall_p (tree fndecl, tree)
  
return 1;

  }
+
+/* Return TRUE if OP is a PRE_INC or PRE_DEC
+   instruction using REG, FALSE otherwise.  */
+
+bool
+pre_incdec_with_reg (rtx op, int reg)
+{
+  /* OP must be a MEM.  */
+  if (GET_CODE (op) != MEM)
+return false;
+
+  /* The address must be a PRE_INC or PRE_DEC.  */
+  op = XEXP (op, 0);
+  if (GET_CODE (op) != PRE_DEC && GET_CODE (op) != PRE_INC)
+return false;
+
+  /* It must be a register that is being incremented
+ or decremented.  */
+  op = XEXP (op, 0);
+  if (!REG_P (op))
+return false;
+
+  /* Finally, check that the register number matches.  */
+  return REGNO (op) == reg;

This results in a new signed-vs-unsigned warning for me:

[all 2022-10-25 00:41:11] ../../gcc/gcc/config/h8300/h8300.cc: In function 
'bool pre_incdec_with_reg(rtx, int)':
[all 2022-10-25 00:41:11] ../../gcc/gcc/config/h8300/h8300.cc:5557:21: error: 
comparison of integer expressions of different signedness: 'unsigned int' and 
'int' [-Werror=sign-compare]
[all 2022-10-25 00:41:11]  5557 |   return REGNO (op) == reg;


Fixed via the attached patch.  Thanks for pointing it out.


jeff

commit 724d3f926b94672de960dbe88fb699bbdd7fde97
Author: Jeff Law 
Date:   Fri Oct 28 23:33:06 2022 -0400

Fix signed vs unsigned issue in H8 port

gcc/
* config/h8300/h8300.cc (pre_incdec_with_reg): Make reg argument
an unsigned int
* config/h8300/h8300-protos.h (pre_incdec_with_reg): Adjust 
prototype.

diff --git a/gcc/config/h8300/h8300-protos.h b/gcc/config/h8300/h8300-protos.h
index 8c989495c29..77adfaba07b 100644
--- a/gcc/config/h8300/h8300-protos.h
+++ b/gcc/config/h8300/h8300-protos.h
@@ -100,7 +100,7 @@ extern int h8300_initial_elimination_offset (int, int);
 extern int h8300_regs_ok_for_stm (int, rtx[]);
 extern int h8300_hard_regno_rename_ok (unsigned int, unsigned int);
 extern bool h8300_move_ok (rtx, rtx);
-extern bool pre_incdec_with_reg (rtx, int);
+extern bool pre_incdec_with_reg (rtx, unsigned int);
 
 struct cpp_reader;
 extern void h8300_pr_interrupt (struct cpp_reader *);
diff --git a/gcc/config/h8300/h8300.cc b/gcc/config/h8300/h8300.cc
index ce0702edecb..cd7975e2fff 100644
--- a/gcc/config/h8300/h8300.cc
+++ b/gcc/config/h8300/h8300.cc
@@ -5536,7 +5536,7 @@ h8300_ok_for_sibcall_p (tree fndecl, tree)
instruction using REG, FALSE otherwise.  */
 
 bool
-pre_incdec_with_reg (rtx op, int reg)
+pre_incdec_with_reg (rtx op, unsigned int reg)
 {
   /* OP must be a MEM.  */
   if (GET_CODE (op) != MEM)
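The warning in miniature: REGNO() yields an unsigned int, so comparing it against a signed parameter triggers -Wsign-compare, and the fix makes the parameter unsigned to match (a standalone sketch, not the backend code itself):

```c
/* After the fix both operands are unsigned int, so the equality
   comparison is warning-free under -Wsign-compare.  */
static int
regs_match (unsigned int regno, unsigned int reg)
{
  return regno == reg;
}
```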


Re: [PATCH 2/3] Add lto-dump tool.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/28/22 04:14, Thomas Schwinge wrote:

Hi!

This minor clean-up had fallen out of me working on something else in
GCC's options machinery, several months ago:

On 2019-03-12T18:14:04+0100, marxin  wrote:

gcc/lto/ChangeLog:
   * lang.opt: Add new language LTODump and options related
   to LTO dump tool.

As this new "Language" 'LTODump' does not share any options with 'LTO'
proper, it makes sense, in my opinion, to also make that obvious in
'gcc/lto/lang.opt', which your Subversion r270897 (Git
commit 66d62d9f2e6b059be6a018397fba555147133a9a) "Add lto-dump tool"
almost ;-) did:


--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -24,6 +24,9 @@
  Language
  LTO

+Language
+LTODump
+
  Enum
  Name(lto_linker_output) Type(enum lto_linker_output) UnknownError(unknown 
linker output %qs)

@@ -66,6 +69,65 @@ fwpa=
  LTO Driver RejectNegative Joined Var(flag_wpa)
  Whole program analysis (WPA) mode with number of parallel jobs specified.

+
+[LTODump option records]
+
+
  fresolution=
  LTO Joined
  The resolution file.

OK to push the attached
"Better separate 'LTO' vs. 'LTODump' in 'gcc/lto/lang.opt'"?


OK.

jeff




Re: [PATCH v2] RISC-V: Libitm add RISC-V support.

2022-10-28 Thread Jeff Law via Gcc-patches



On 10/27/22 22:23, Xi Ruoyao via Gcc-patches wrote:

On Thu, 2022-10-27 at 17:44 -0700, Palmer Dabbelt wrote:

though I don't have an opinion on whether libitm should be taking ports
to new targets, I'd never even heard of it before.

I asked this question to myself when I reviewed the LoongArch libitm port.
But I remember one maintainer of Deepin (a distro) has complained that
some packages were depending on libitm (and/or libvtv).


I thought libitm had generic code to work if the target didn't provide 
an implementation.  But looking more closely I see that is not the 
case.  So yeah, I guess cobbling together as straightforward an 
implementation as possible makes sense.



Jeff



[committed] libstdc++: Fix dangling reference in filesystem::path::filename()

2022-10-28 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk. Worth backporting too.

-- >8 --

The new -Wdangling-reference warning noticed this.
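The bug pattern in generic terms (a sketch with std::vector standing in for path, whose iterators similarly mediate access to their elements): binding a reference through `*--end()` risks dangling, and the fix mirrors the patch by storing the iterator and dereferencing it while everything is alive.

```cpp
#include <string>
#include <vector>

// Like the patched code: keep the iterator in a named variable
// (auto __last = --end()) and dereference it, instead of binding
// auto& directly to *--end().
std::string
last_part (const std::vector<std::string> &v)
{
  auto last = --v.end ();   // store the iterator, as the patch does
  return *last;             // dereference while the owner is alive
}
```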

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path::filename()): Fix dangling
reference.
---
 libstdc++-v3/include/bits/fs_path.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 6e7b366d104..2fc7dcd98c9 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -1262,9 +1262,9 @@ namespace __detail
   {
if (_M_pathname.back() == preferred_separator)
  return {};
-   auto& __last = *--end();
-   if (__last._M_type() == _Type::_Filename)
- return __last;
+   auto __last = --end();
+   if (__last->_M_type() == _Type::_Filename)
+ return *__last;
   }
 return {};
   }
-- 
2.37.3



Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread H.J. Lu via Gcc-patches
On Fri, Oct 28, 2022 at 2:34 PM Segher Boessenkool
 wrote:
>
> On Wed, Oct 26, 2022 at 11:58:57AM -0700, H.J. Lu via Gcc-patches wrote:
> > In i386.md, neg patterns which set MODE_CC register like
> >
> > (set (reg:CCC FLAGS_REG)
> >  (ne:CCC (match_operand:SWI48 1 "general_reg_operand") (const_int 0)))
> >
> > can lead to errors when operand 1 is a constant value.  If FLAGS_REG in
>
> But it cannot be.  general_reg_operand will not allow that:
> ===
> (define_predicate "general_reg_operand"
>   (and (match_code "reg")
>(match_test "GENERAL_REGNO_P (REGNO (op))")))
> ===
>
> > (set (reg:CCC FLAGS_REG)
> >  (ne:CCC (const_int 2) (const_int 0)))
> >
> > is set to 1, RTX simplifiers may simplify
>

Here is another example:

(define_insn "*neg_ccc_1"
  [(set (reg:CCC FLAGS_REG)
(ne:CCC
  (match_operand:SWI 1 "nonimmediate_operand" "0")
  (const_int 0)))
   (set (match_operand:SWI 0 "nonimmediate_operand" "=m")
(neg:SWI (match_dup 1)))]
  ""
  "neg{}\t%0"
  [(set_attr "type" "negnot")
   (set_attr "mode" "")])

Operand 1 can be a known value.
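What the pattern computes, stated in C rather than RTL (an illustrative sketch, not the backend code): the negated value, plus a carry flag recording whether the input was nonzero, which is the `(ne ... (const_int 0))` part.

```c
/* x86 neg sets CF to (x != 0); the parallel in *neg_ccc_1 expresses
   both the negation and that flag update.  */
struct neg_cc
{
  unsigned value;
  int carry;
};

static struct neg_cc
neg_with_carry (unsigned x)
{
  struct neg_cc r = { 0u - x, x != 0 };
  return r;
}
```

When x is a known constant, folding the `ne` to a constant is exactly what must not be propagated into the MODE_CC register, hence the UNSPEC.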

H.J.


Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Eric Botcazou via Gcc-patches
> You mean in CCV?  That works yes, but only because (or if) the setter
> and getter of the CC reg both use CCV (so never use any other flag at
> the same time; CCV has an empty intersection with all other CC modes).

We're talking about CCC here AFAIK, i.e. the carry, not CCV.

-- 
Eric Botcazou




Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Segher Boessenkool
Hi!

On Fri, Oct 28, 2022 at 10:35:03AM +0200, Eric Botcazou via Gcc-patches wrote:
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (reg:CCC 17 flags) (const_int 0 [0]
> > 
> > as
> > 
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (const_int 1) (const_int 0 [0]
> > 
> > which leads to incorrect results since LTU on MODE_CC register isn't the
> > same as "unsigned less than" in x86 backend.
> 
> That's not specific to the x86 back-end, i.e. it's a generic caveat.

A MODE_CC reg can never be "const_int 1".  That is total garbage.  It
cannot work.  It would mean all of
  (eq (reg:CC) (const_int 0))
  (lt (reg:CC) (const_int 0))
  (gt (reg:CC) (const_int 0))
  (ne (reg:CC) (const_int 0))
  (ge (reg:CC) (const_int 0))
  (le (reg:CC) (const_int 0))
(and more) are simultaneously true.

> > PR target/107172
> > * config/i386/i386.md (UNSPEC_CC_NE): New.
> > Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns.
> 
> FWIW the SPARC back-end uses a COMPARE instead of an UNSPEC here.

You mean in CCV?  That works yes, but only because (or if) the setter
and getter of the CC reg both use CCV (so never use any other flag at
the same time; CCV has an empty intersection with all other CC modes).


Segher


Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Segher Boessenkool
On Wed, Oct 26, 2022 at 11:58:57AM -0700, H.J. Lu via Gcc-patches wrote:
> In i386.md, neg patterns which set MODE_CC register like
> 
> (set (reg:CCC FLAGS_REG)
>  (ne:CCC (match_operand:SWI48 1 "general_reg_operand") (const_int 0)))
> 
> can lead to errors when operand 1 is a constant value.  If FLAGS_REG in

But it cannot be.  general_reg_operand will not allow that:
===
(define_predicate "general_reg_operand"
  (and (match_code "reg")
   (match_test "GENERAL_REGNO_P (REGNO (op))")))
===

> (set (reg:CCC FLAGS_REG)
>  (ne:CCC (const_int 2) (const_int 0)))
> 
> is set to 1, RTX simplifiers may simplify

"is set to 1"?  Do you mean you do something like
  (set (regs FLAGS_REG) (const_int 1))
?  That is invalid RTL, as I've said tens of time in the last few weeks.

> which leads to incorrect results since LTU on MODE_CC register isn't the
> same as "unsigned less than" in x86 backend.

The special notation
  (ltu (reg:CC) (const_int 0))
is not about comparing anything to 0, but simply means "did the
comparison-like thing that set that reg say ltu was true".

> To prevent RTL optimizers
> from setting MODE_CC register to a constant, use UNSPEC_CC_NE to replace
> ne:CCC/ne:CCO when setting FLAGS_REG in neg patterns.

This is an indirect workaround, nothing more.  The unspec will naturally
not be folded to anything else (unless you arrange for that yourself),
there is nothing the generic code knows about the semantics of any
unspec after all.

AFAICS there is no way to express overflow in a CC, but an unspec can
help, sure.  You need to fix the setter side as well though.


Segher


[r13-3540 Regression] FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times vect "loop vectorized" 1 on Linux/x86_64

2022-10-28 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

0607307768b66a90e27c5bc91a247acc938f070e is the first bad commit
commit 0607307768b66a90e27c5bc91a247acc938f070e
Author: Thomas Schwinge 
Date:   Tue Oct 25 13:10:52 2022 +0200

Fix target selector syntax in 'gcc.dg/vect/bb-slp-cond-1.c'

caused

FAIL: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "loop vectorized" 1
FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times vect "loop vectorized" 1

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3540/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-cond-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-cond-1.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


[PATCH] c++: Tweaks for -Wredundant-move [PR107363]

2022-10-28 Thread Marek Polacek via Gcc-patches
Two things here:

1) when we're pointing out that std::move on a constant object is
   redundant, don't say "in return statement" when we aren't in a
   return statement;
2) suppress the warning when the std::move call was dependent, because
   removing the std::move may not be correct for a different
   instantiation of the original template.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/107363

gcc/cp/ChangeLog:

* semantics.cc (finish_call_expr): Suppress OPT_Wpessimizing_move.
* typeck.cc (maybe_warn_pessimizing_move): Check warn_redundant_move
and warning_suppressed_p.  Adjust a message depending on return_p.
(check_return_expr): Don't suppress OPT_Wpessimizing_move here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/Wredundant-move13.C: New test.
---
 gcc/cp/semantics.cc   |  4 ++
 gcc/cp/typeck.cc  | 16 ++---
 .../g++.dg/cpp0x/Wredundant-move13.C  | 61 +++
 3 files changed, 73 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wredundant-move13.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 36aa9c4499f..caaa40fde19 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -2738,6 +2738,10 @@ finish_call_expr (tree fn, vec<tree, va_gc> **args, bool 
disallow_virtual,
  result = build_min_nt_call_vec (orig_fn, *args);
  SET_EXPR_LOCATION (result, cp_expr_loc_or_input_loc (fn));
  KOENIG_LOOKUP_P (result) = koenig_p;
+ /* Disable the std::move warnings since this call was dependent
+(c++/89780, c++/107363).  This also suppresses the
+-Wredundant-move warning.  */
+ suppress_warning (result, OPT_Wpessimizing_move);
  if (is_overloaded_fn (fn))
fn = get_fns (fn);
 
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 2e0fd8fbf17..5f5fb2a212b 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -10885,7 +10885,9 @@ maybe_warn_pessimizing_move (tree expr, tree type, bool 
return_p)
  and where the std::move does nothing if T does not have a T(const T&&)
  constructor, because the argument is const.  It will not use T(T&&)
  because that would mean losing the const.  */
-  else if (TYPE_REF_P (TREE_TYPE (arg))
+  else if (warn_redundant_move
+  && !warning_suppressed_p (expr, OPT_Wredundant_move)
+  && TYPE_REF_P (TREE_TYPE (arg))
   && CP_TYPE_CONST_P (TREE_TYPE (TREE_TYPE (arg))))
 {
   tree rtype = TREE_TYPE (TREE_TYPE (arg));
@@ -10901,8 +10903,11 @@ maybe_warn_pessimizing_move (tree expr, tree type, 
bool return_p)
  return;
  }
   auto_diagnostic_group d;
-  if (warning_at (loc, OPT_Wredundant_move,
- "redundant move in return statement"))
+  if (return_p
+ ? warning_at (loc, OPT_Wredundant_move,
+   "redundant move in return statement")
+ : warning_at (loc, OPT_Wredundant_move,
+   "redundant move in initialization"))
inform (loc, "remove %<std::move%> call");
 }
 }
@@ -11126,11 +11131,6 @@ check_return_expr (tree retval, bool *no_warning)
   /* We don't know if this is an lvalue or rvalue use, but
 either way we can mark it as read.  */
   mark_exp_read (retval);
-  /* Disable our std::move warnings when we're returning
-a dependent expression (c++/89780).  */
-  if (retval && TREE_CODE (retval) == CALL_EXPR)
-   /* This also suppresses -Wredundant-move.  */
-   suppress_warning (retval, OPT_Wpessimizing_move);
   return retval;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/Wredundant-move13.C 
b/gcc/testsuite/g++.dg/cpp0x/Wredundant-move13.C
new file mode 100644
index 000..80e7d80cd02
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/Wredundant-move13.C
@@ -0,0 +1,61 @@
+// PR c++/107363
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wredundant-move" }
+
+// Define std::move.
+namespace std {
+  template<typename _Tp>
+    struct remove_reference
+    { typedef _Tp   type; };
+
+  template<typename _Tp>
+    struct remove_reference<_Tp&>
+    { typedef _Tp   type; };
+
+  template<typename _Tp>
+    struct remove_reference<_Tp&&>
+    { typedef _Tp   type; };
+
+  template<typename _Tp>
+    constexpr typename std::remove_reference<_Tp>::type&&
+    move(_Tp&& __t) noexcept
+    { return static_cast<typename std::remove_reference<_Tp>::type&&>(__t); }
+}
+
+template 
+struct Optional {
+  U ();
+  T release_value() {
+T t = std::move (value ());
+return t;
+  }
+};
+
+struct Foo {};
+void test(Optional o) { o.release_value(); }
+
+struct F {
+  F(const F&);
+  F(F&&) = delete;
+};
+
+struct Z {
+  Z(const Z&) = delete;
+  Z(Z&&) = delete;
+  Z(const Z&&);
+};
+
+const F& constfref();
+const Z& constzref();
+
+void
+g ()
+{
+  // Will call F::F(const F&) w/ and w/o std::move.  So it's redundant.
+  F f = std::move (constfref()); // { dg-warning "redundant move in 
initialization" }
+  (void) f;
+  // Will 

[PATCH] Fortran: ordering of hidden procedure arguments [PR107441]

2022-10-28 Thread Harald Anlauf via Gcc-patches
Dear all,

the passing of procedure arguments in Fortran sometimes requires
ancillary parameters that are "hidden".  Examples are string length
and the presence status of scalar variables with optional+value
attribute.

The gfortran ABI is actually documented:

https://gcc.gnu.org/onlinedocs/gfortran/Argument-passing-conventions.html

The reporter found that there was a discrepancy between the
caller and the callee.  This is corrected by the attached patch.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From b7646403557eca19612c81437f381d4b4dcd51c8 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 28 Oct 2022 21:58:08 +0200
Subject: [PATCH] Fortran: ordering of hidden procedure arguments [PR107441]

gcc/fortran/ChangeLog:

	PR fortran/107441
	* trans-decl.cc (create_function_arglist): Adjust the ordering of
	automatically generated hidden procedure arguments to match the
	documented ABI for gfortran.

gcc/testsuite/ChangeLog:

	PR fortran/107441
	* gfortran.dg/optional_absent_6.f90: New test.
---
 gcc/fortran/trans-decl.cc | 15 +++--
 .../gfortran.dg/optional_absent_6.f90 | 60 +++
 2 files changed, 71 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/optional_absent_6.f90

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 63515b9072a..18842fe2c4b 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -2508,7 +2508,7 @@ create_function_arglist (gfc_symbol * sym)
   tree fndecl;
   gfc_formal_arglist *f;
   tree typelist, hidden_typelist;
-  tree arglist, hidden_arglist;
+  tree arglist, hidden_arglist, optional_arglist, strlen_arglist;
   tree type;
   tree parm;

@@ -2518,6 +2518,7 @@ create_function_arglist (gfc_symbol * sym)
  the new FUNCTION_DECL node.  */
   arglist = NULL_TREE;
   hidden_arglist = NULL_TREE;
+  strlen_arglist = optional_arglist = NULL_TREE;
   typelist = TYPE_ARG_TYPES (TREE_TYPE (fndecl));

   if (sym->attr.entry_master)
@@ -2644,7 +2645,7 @@ create_function_arglist (gfc_symbol * sym)
 	  length = build_decl (input_location,
 			   PARM_DECL, get_identifier (name), len_type);

-	  hidden_arglist = chainon (hidden_arglist, length);
+	  strlen_arglist = chainon (strlen_arglist, length);
 	  DECL_CONTEXT (length) = fndecl;
 	  DECL_ARTIFICIAL (length) = 1;
 	  DECL_ARG_TYPE (length) = len_type;
@@ -2712,7 +2713,7 @@ create_function_arglist (gfc_symbol * sym)
 			PARM_DECL, get_identifier (name),
 			boolean_type_node);

-  hidden_arglist = chainon (hidden_arglist, tmp);
+	  optional_arglist = chainon (optional_arglist, tmp);
   DECL_CONTEXT (tmp) = fndecl;
   DECL_ARTIFICIAL (tmp) = 1;
   DECL_ARG_TYPE (tmp) = boolean_type_node;
@@ -2863,10 +2864,16 @@ create_function_arglist (gfc_symbol * sym)
   typelist = TREE_CHAIN (typelist);
 }

+  /* Add hidden present status for optional+value arguments.  */
+  arglist = chainon (arglist, optional_arglist);
+
   /* Add the hidden string length parameters, unless the procedure
  is bind(C).  */
   if (!sym->attr.is_bind_c)
-arglist = chainon (arglist, hidden_arglist);
+arglist = chainon (arglist, strlen_arglist);
+
+  /* Add hidden extra arguments for the gfortran library.  */
+  arglist = chainon (arglist, hidden_arglist);

   gcc_assert (hidden_typelist == NULL_TREE
   || TREE_VALUE (hidden_typelist) == void_type_node);
diff --git a/gcc/testsuite/gfortran.dg/optional_absent_6.f90 b/gcc/testsuite/gfortran.dg/optional_absent_6.f90
new file mode 100644
index 000..b8abb06980a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/optional_absent_6.f90
@@ -0,0 +1,60 @@
+! { dg-do run }
+! PR fortran/107441
+!
+! Test VALUE + OPTIONAL for integer/real/...
+! in the presence of non-optional character dummies
+
+program bugdemo
+  implicit none
+  character :: s = 'a'
+  integer   :: t
+
+  t = testoptional(s)
+  call test2 (s)
+  call test3 (s)
+  call test4 (w='123',x=42)
+
+contains
+
+  function testoptional (w, x) result(t)
+character, intent(in)  :: w
+integer,   intent(in), value, optional :: x
+integer :: t
+print *, 'present(x) is', present(x)
+t = 0
+if (present (x)) stop 1
+  end function testoptional
+
+  subroutine test2 (w, x)
+character, intent(in)  :: w
+integer,   intent(in), value, optional :: x
+print*, 'present(x) is', present(x)
+if (present (x)) stop 2
+  end subroutine test2
+
+  subroutine test3 (w, x)
+character, intent(in),optional :: w
+integer,   intent(in), value, optional :: x
+print *, 'present(w) is', present(w)
+print *, 'present(x) is', present(x)
+if (.not. present (w)) stop 3
+if (present (x)) stop 4
+  end subroutine test3
+
+  subroutine test4 (r, w, x)
+real, value, optional :: r
+character(*), intent(in),optional :: w
+integer,  value, 

Re: [PATCH] docs: document sanitizers can trigger warnings

2022-10-28 Thread Eric Gallager via Gcc-patches
On Wed, Oct 26, 2022 at 7:09 AM Martin Liška  wrote:
>
> PR sanitizer/107298
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: Document sanitizers can trigger warnings.
> ---
>  gcc/doc/invoke.texi | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 64f77e8367a..1ffbba16a72 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -16460,6 +16460,10 @@ by this option.
>
>  @end table
>
> +Note the enabled sanitizer options tend to increase a false-positive rate
> +of selected warnings, most notably @option{-Wmaybe-uninitialized}.
> +And thus we recommend to disable @option{-Werror}.
> +

I'd recommend rewording the second sentence there as:
"Thus, GCC developers recommend disabling @option{-Werror} when using
sanitizer options."

>  While @option{-ftrapv} causes traps for signed overflows to be emitted,
>  @option{-fsanitize=undefined} gives a diagnostic message.
>  This currently works only for the C family of languages.
> --
> 2.38.0
>


Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Eric Botcazou via Gcc-patches
> COMPARE may also set CC register to a constant when both operands are
> known constants.

No, a COMPARE is never evaluated alone, only the CC user may be evaluated.

-- 
Eric Botcazou




Re: RFC - VRP1 default mode

2022-10-28 Thread Eric Botcazou via Gcc-patches
> I get a clean testsuite run configured and bootstrapped with
> 
> --enable-languages=c,c++,go,fortran,ada,obj-c++,jit --enable-host-shared
> 
> Is there a PR or specific tests in either fortran or ada for those
> improvements? i.e., something specific I should check for? Part of ranger's
> point is to be able to do symbolic relationships without storing the
> symbolic in the range, just picking it up from the IL as needed.

The motivating Ada example for symbolic ranges was gnat.dg/opt40.adb.

-- 
Eric Botcazou




Re: [PATCH] c++: -Wdangling-reference and system headers

2022-10-28 Thread Jason Merrill via Gcc-patches

On 10/27/22 11:39, Marek Polacek wrote:

I got this testcase:

   auto f() -> std::optional<std::string>;
   for (char c : f().value()) { }

which has a dangling reference: std::optional::value returns
a reference to the contained value, but here it's the f() temporary.
We warn, which is great, but only with -Wsystem-headers, because
the function comes from a system header and warning_enabled_at used
in do_warn_dangling_reference checks diagnostic_report_warnings_p,
which in this case returned false so we didn't warn.

Fixed as below.  I could also override dc_warn_system_headers so that
the warning is enabled in system headers always.  With that, I found one
issue in libstdc++:

libstdc++-v3/include/bits/fs_path.h:1265:15: warning: possibly dangling 
reference to a temporary [-Wdangling-reference]
  1265 | auto& __last = *--end();
   |   ^~

which looks like a true positive as well.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

* call.cc (maybe_warn_dangling_reference): Enable the warning in
system headers if the decl isn't in a system header.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference4.C: New test.
---
  gcc/cp/call.cc   |  7 +++
  gcc/testsuite/g++.dg/warn/Wdangling-reference4.C | 14 ++
  2 files changed, 21 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference4.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 951b9fd2a88..c7c7a122045 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13539,6 +13539,13 @@ maybe_warn_dangling_reference (const_tree decl, tree 
init)
  return;
if (!TYPE_REF_P (TREE_TYPE (decl)))
  return;
+  /* Don't suppress the diagnostic just because the call comes from
+ a system header.  If the DECL is not in a system header, or if
+ -Wsystem-headers was provided, warn.  */
+  auto wsh
+= make_temp_override (global_dc->dc_warn_system_headers,
+ (!in_system_header_at (DECL_SOURCE_LOCATION (decl))
+  || global_dc->dc_warn_system_headers));


Hmm, this is OK, but maybe we want a 
warning_enabled_at_ignore_system_header?



if (tree call = do_warn_dangling_reference (init))
  {
auto_diagnostic_group d;
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference4.C 
b/gcc/testsuite/g++.dg/warn/Wdangling-reference4.C
new file mode 100644
index 000..aee7a29019b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference4.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++17 } }
+// { dg-options "-Wdangling-reference" }
+// Check that we warn here even without -Wsystem-headers.
+
+#include <optional>
+#include <string>
+
+auto f() -> std::optional<std::string>;
+
+void
+g ()
+{
+  for (char c : f().value()) { (void) c; } // { dg-warning "dangling 
reference" }
+}

base-commit: f95d3d5de72a1c43e8d529bad3ef59afc3214705




Re: [PATCH] libstdc++: std::to_chars std::{,b}float16_t support

2022-10-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 28, 2022 at 12:52:44PM -0400, Patrick Palka wrote:
> IIRC for hex formatting of denormals I opted to be consistent with how
> glibc printf formats them, instead of outputting the truly shortest
> form.

Note, it isn't just denormals,
1.18cp-4
2.318p-5
4.63p-6
8.c6p-7
463p-10
8c6p-11
also represent the same number, the first is what glibc emits (and
is certainly nicer to read), but some of the others are shorter.

Now, the printf %a/%A documentation says that there must be one hexadecimal
digit before the dot if any and that for normalized numbers it must be
non-zero.
So that rules out the last 2, and allows but doesn't require the denormal
treatment the library does right now.
If we really want the shortest form, we should handle denormals with a
non-zero leading digit too, and in all cases consider which of the 4 shifting
possibilities results in the shortest string (perhaps preferring the smallest
non-zero leading digit among the shortest)?
> > readelf -Ws libstdc++.so.6.0.31 | grep float16_t
> >912: 000ae824   950 FUNCGLOBAL DEFAULT   13 
> > _ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
> >   5767: 000ae4a1   899 FUNCGLOBAL DEFAULT   13 
> > _ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
> >842: 0016d430   106 FUNCLOCAL  DEFAULT   13 
> > _ZN12_GLOBAL__N_113get_ieee_reprINS_23floating_type_float16_tEEENS_6ieee_tIT_EES3_
> >865: 00170980  1613 FUNCLOCAL  DEFAULT   13 
> > _ZSt23__floating_to_chars_hexIN12_GLOBAL__N_123floating_type_float16_tEESt15to_chars_resultPcS3_T_St8optionalIiE.constprop.0.isra.0
> >   7205: 000ae824   950 FUNCGLOBAL DEFAULT   13 
> > _ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format
> >   7985: 000ae4a1   899 FUNCGLOBAL DEFAULT   13 
> > _ZSt20__to_chars_float16_tPcS_fSt12chars_format
> > so 3568 code bytes together or so.
> 
> Ouch, the instantiation of __floating_to_chars_hex for float16 is
> responsible for nearly 50% of the .so size increase

True, but the increase isn't that huge.

Jakub



Re: [PATCH v2 3/3] p1689r5: initial support

2022-10-28 Thread Ben Boeckel via Gcc-patches
On Thu, Oct 27, 2022 at 19:16:44 -0400, Ben Boeckel wrote:
> diff --git a/gcc/testsuite/g++.dg/modules/modules.exp 
> b/gcc/testsuite/g++.dg/modules/modules.exp
> index afb323d0efd..7fe8825144f 100644
> --- a/gcc/testsuite/g++.dg/modules/modules.exp
> +++ b/gcc/testsuite/g++.dg/modules/modules.exp
> @@ -28,6 +28,7 @@
>  # { dg-module-do [link|run] [xfail] [options] } # link [and run]
>  
>  load_lib g++-dg.exp
> +load_lib modules.exp
>  
>  # If a testcase doesn't have special options, use these.
>  global DEFAULT_CXXFLAGS
> @@ -237,6 +238,13 @@ proc cleanup_module_files { files } {
>  }
>  }
>  
> +# delete the specified set of dep files
> +proc cleanup_dep_files { files } {
> +foreach file $files {
> + file_on_host delete $file
> +}
> +}
> +
>  global testdir
>  set testdir $srcdir/$subdir
>  proc srcdir {} {
> @@ -310,6 +318,7 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>   set std_list [module-init $src]
>   foreach std $std_list {
>   set mod_files {}
> + set dep_files {}
>   global module_do
>   set module_do {"compile" "P"}
>   set asm_list {}
> @@ -346,6 +355,8 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>   set mod_files [find $DEFAULT_REPO *.gcm]
>   }
>   cleanup_module_files $mod_files
> +
> + cleanup_dep_files $dep_files
>   }
>  }
>  }

These `cleanup_dep_files` hunks are leftovers from my attempts at
getting the P1689 and flags tests working; they'll be gone in v3.

--Ben


Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-28 Thread Ben Boeckel via Gcc-patches
On Fri, Oct 28, 2022 at 08:59:16 -0400, David Malcolm wrote:
> On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> > This simplifies the interface for other UTF-8 validity detections
> > when a
> > simple "yes" or "no" answer is sufficient.
> > 
> > Signed-off-by: Ben Boeckel 
> > ---
> >  libcpp/ChangeLog  |  6 ++
> >  libcpp/charset.cc | 18 ++
> >  libcpp/internal.h |  2 ++
> >  3 files changed, 26 insertions(+)
> > 
> > diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> > index 4d707277531..4e2c7900ae2 100644
> > --- a/libcpp/ChangeLog
> > +++ b/libcpp/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2022-10-27  Ben Boeckel  
> > +
> > +   * include/charset.cc: Add `_cpp_valid_utf8_str` which
> > determines
> > +   whether a C string is valid UTF-8 or not.
> > +   * include/internal.h: Add prototype for
> > `_cpp_valid_utf8_str`.
> > +
> >  2022-10-27  Ben Boeckel  
> >  
> > * include/charset.cc: Reject encodings of codepoints above
> > 0x10.
> 
> The patch looks good to me, with the same potential caveat that you
> might need to move the ChangeLog entry from the patch "body" to the
> leading blurb, to satisfy:
>   ./contrib/gcc-changelog/git_check_commit.py

Ah, I had missed that. Now fixed locally for patches 1 and 2; will be in
v3 pending some time for further reviews.

Thanks,

--Ben


Re: [PATCH v4] RISC-V: Add support for inlining subword atomic operations

2022-10-28 Thread David Abdurachmanov via Gcc-patches
On Fri, Sep 2, 2022 at 1:09 PM Kito Cheng via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> LGTM with minor comments, it's time to move forward, thanks Patrick and
> Palmer.
>

Ping.

Any plans to finally land this one for GCC 13?

The hope is that this patch would make life significantly easier for
distributions. There are way too many packages failing to build due to
sub-word atomics, which is highly annoying considering that it's not
consistent between package versions. Build times on riscv64 are extremely
long which makes it even more annoying. Would love to see this finally
fixed.


> > +
> > +void
> > +riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
> > +  rtx *not_mask)
> > +{
> > +  /* Align the memory address to a word.  */
> > +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
> > +
> > +  rtx aligned_addr = gen_reg_rtx (Pmode);
> > +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
> > + gen_int_mode (-4, Pmode)));
> > +
> > +  *aligned_mem = change_address (mem, SImode, aligned_addr);
> > +
> > +  /* Calculate the shift amount.  */
> > +  *shift = gen_reg_rtx (SImode);
>
> Already allocated reg_rtx outside, this line could be removed.
>
> > +  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode,
> addr),
> > + gen_int_mode (3, SImode)));
> > +  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
> > +gen_int_mode(3, SImode)));
> > +
> > +  /* Calculate the mask.  */
> > +  int unshifted_mask;
> > +  if (GET_MODE (mem) == QImode)
> > +unshifted_mask = 0xFF;
> > +  else
> > +unshifted_mask = 0xFFFF;
> > +
> > +  rtx mask_reg = gen_reg_rtx (SImode);
>
> Ditto.
>
> > @@ -152,6 +348,128 @@
> >DONE;
> >  })
> >
> > +(define_expand "atomic_compare_and_swap<mode>"
> > +  [(match_operand:SI 0 "register_operand" "");; bool output
> > +   (match_operand:SHORT 1 "register_operand" "") ;; val output
> > +   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
> > +   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
> > +   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
> > +   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
> > +   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
> > +   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
> > +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> > +{
> > +  emit_insn (gen_atomic_cas_value_strong (operands[1],
> operands[2],
> > +   operands[3], operands[4],
> > +   operands[6],
> operands[7]));
> > +
> > +  rtx val = gen_reg_rtx (SImode);
> > +  if (operands[1] != const0_rtx)
> > +emit_insn (gen_rtx_SET (val, gen_rtx_SIGN_EXTEND (SImode,
> operands[1])));
> > +  else
> > +emit_insn (gen_rtx_SET (val, const0_rtx));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>
> > +
> > +  rtx exp = gen_reg_rtx (SImode);
> > +  if (operands[3] != const0_rtx)
> > +emit_insn (gen_rtx_SET (exp, gen_rtx_SIGN_EXTEND (SImode,
> operands[3])));
> > +  else
> > +emit_insn (gen_rtx_SET (exp, const0_rtx));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>
> > +
> > +  rtx compare = val;
> > +  if (exp != const0_rtx)
> > +{
> > +  rtx difference = gen_rtx_MINUS (SImode, val, exp);
> > +  compare = gen_reg_rtx (SImode);
> > +  emit_insn (gen_rtx_SET (compare, difference));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>
> > +}
> > +
> > +  if (word_mode != SImode)
> > +{
> > +  rtx reg = gen_reg_rtx (word_mode);
> > +  emit_insn (gen_rtx_SET (reg, gen_rtx_SIGN_EXTEND (word_mode,
> compare)));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>
>
> > +  compare = reg;
> > +}
> > +
> > +  emit_insn (gen_rtx_SET (operands[0], gen_rtx_EQ (SImode, compare,
> const0_rtx)));
>
> nit: emit_move_insn rather than emit_insn + gen_rtx_SET
>


Re: [PATCH] libstdc++: std::to_chars std::{,b}float16_t support

2022-10-28 Thread Patrick Palka via Gcc-patches
On Thu, 27 Oct 2022, Jakub Jelinek wrote:

> Hi!
> 
> The following patch on top of
> https://gcc.gnu.org/pipermail/libstdc++/2022-October/054849.html
> adds std::{,b}float16_t support for std::to_chars.
> When precision is specified (or for std::bfloat16_t for hex mode even if not),
> I believe we can just use the std::to_chars float (when float is mode
> compatible with std::float32_t) overloads, both formats are proper subsets
> of std::float32_t.
> Unfortunately when precision is not specified and we are supposed to emit
> shortest string, the std::{,b}float16_t strings are usually much shorter.
> E.g. 1.e7p-14f16 shortest fixed representation is
> 0.0001161 and shortest scientific representation is
> 1.161e-04 while 1.e7p-14f32 (same number promoted to std::float32_t)
> 0.00011610985 and
> 1.1610985e-04.
> Similarly for 1.38p-112bf16,
> 0.0235
> 2.35e-34 vs. 1.38p-112f32
> 0.023472271
> 2.3472271e-34
> For std::float16_t there are differences even in the shortest hex, say:
> 0.01p-14 vs. 1p-22
> but only for denormal std::float16_t values (where all std::float16_t
> denormals converted to std::float32_t are normal), __FLT16_MIN__ and
> everything larger in absolute value than that is the same.  Unless
> that is a bug and we should try to discover shorter representations
> even for denormals...

IIRC for hex formatting of denormals I opted to be consistent with how
glibc printf formats them, instead of outputting the truly shortest
form.

I wouldn't be against using the float32 overloads even for shortest hex
formatting of float16.  The output is shorter but equivalent so it
shouldn't cause any problems.

> std::bfloat16_t has the same exponent range as std::float32_t, so all
> std::bfloat16_t denormals are also std::float32_t denormals and thus
> the shortest hex representations are the same.
> 
> As documented, ryu can handle arbitrary IEEE like floating point formats
> (probably not wider than IEEE quad) using the generic_128 handling, but
> ryu is hidden in libstdc++.so.  As only few architectures support
> std::float16_t right now and some of them have special ISA requirements
> for those (e.g. on i?86 one needs -msse2) and std::bfloat16_t is right
> now supported only on x86 (again with -msse2), perhaps with aarch64/arm
> coming next if ARM is interested, but I think it is possible that more
> will be added later, instead of exporting APIs from the library to handle
> directly the std::{,b}float16_t overloads this patch instead exports
> functions which take a float which is a superset of those and expects
> the inline overloads to promote the 16-bit formats to 32-bit, then inside
> of the library it ensures they are printed right.
> With the added [[gnu::cold]] attribute because I think most users
> will primarily use these formats as storage formats and perform arithmetics
> in the excess precision for them and print also as std::float32_t the
> added support doesn't seem to be too large, on x86_64:
> readelf -Ws libstdc++.so.6.0.31 | grep float16_t
>912: 000ae824   950 FUNCGLOBAL DEFAULT   13 
> _ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>   5767: 000ae4a1   899 FUNCGLOBAL DEFAULT   13 
> _ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>842: 0016d430   106 FUNCLOCAL  DEFAULT   13 
> _ZN12_GLOBAL__N_113get_ieee_reprINS_23floating_type_float16_tEEENS_6ieee_tIT_EES3_
>865: 00170980  1613 FUNCLOCAL  DEFAULT   13 
> _ZSt23__floating_to_chars_hexIN12_GLOBAL__N_123floating_type_float16_tEESt15to_chars_resultPcS3_T_St8optionalIiE.constprop.0.isra.0
>   7205: 000ae824   950 FUNCGLOBAL DEFAULT   13 
> _ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format
>   7985: 000ae4a1   899 FUNCGLOBAL DEFAULT   13 
> _ZSt20__to_chars_float16_tPcS_fSt12chars_format
> so 3568 code bytes together or so.

Ouch, the instantiation of __floating_to_chars_hex for float16 is
responsible for nearly 50% of the .so size increase

> 
> Tested with the attached test (which doesn't prove the shortest
> representation, just prints std::{,b}float16_t and std::float32_t
> shortest strings side by side, then tries to verify it can be
> emitted even into the exact sized range and can't be into range
> one smaller than that and tries to read what is printed
> back using from_chars float32_t overload (so there could be
> double rounding, but apparently there is none for the shortest strings).
> The only differences printed are for NaNs, where sNaNs are canonicalized
> to canonical qNaNs and as to_chars doesn't print NaN mantissa, even qNaNs
> other than the canonical one are read back just as the canonical NaN.
> 
> Also attaching what Patrick wrote to generate the pow10_adjustment_tab,
> for std::float16_t only 1.0, 10.0, 100.0, 1000.0 and 10000.0 are powers
> of 10 in the range because __FLT16_MAX__ is 65504.0, and all of the above
> are exactly 

Re: [PATCH v3] RISC-V: Libitm add RISC-V support.

2022-10-28 Thread Palmer Dabbelt

On Fri, 28 Oct 2022 02:37:13 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

I guess we don't really care about RV32E here, but in case you add a
guard for that?

#ifdef __riscv_e
#error "rv32e unsupported"
#endif


Ah, thanks.  There's rv64e now too, but that's just an error message 
problem so probably not a big deal.



On Fri, Oct 28, 2022 at 4:39 PM Xiongchuan Tan via Gcc-patches
 wrote:


Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

libitm/ChangeLog:

* configure.tgt: Add riscv support.
* config/riscv/asm.h: New file.
* config/riscv/sjlj.S: New file.
* config/riscv/target.h: New file.
---
v2: Change HW_CACHELINE_SIZE to 64 (in accordance with the RVA profiles, see
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc)

v3: Ensure the stack is aligned to 16 bytes; make use of Zihintpause in
cpu_relax()

 libitm/config/riscv/asm.h|  54 +
 libitm/config/riscv/sjlj.S   | 144 +++
 libitm/config/riscv/target.h |  62 +++
 libitm/configure.tgt |   2 +
 4 files changed, 262 insertions(+)
 create mode 100644 libitm/config/riscv/asm.h
 create mode 100644 libitm/config/riscv/sjlj.S
 create mode 100644 libitm/config/riscv/target.h

diff --git a/libitm/config/riscv/asm.h b/libitm/config/riscv/asm.h
new file mode 100644
index 000..bb515f2
--- /dev/null
+++ b/libitm/config/riscv/asm.h
@@ -0,0 +1,54 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Xiongchuan Tan .
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef _RV_ASM_H
+#define _RV_ASM_H
+
+#if __riscv_xlen == 64
+#  define GPR_L ld
+#  define GPR_S sd
+#  define SZ_GPR 8
+#  define LEN_GPR 14
+#elif __riscv_xlen == 32
+#  define GPR_L lw
+#  define GPR_S sw
+#  define SZ_GPR 4
+#  define LEN_GPR 16 /* Extra padding to align the stack to 16 bytes */
+#else
+#  error Unsupported XLEN (must be 64-bit or 32-bit).
+#endif
+
+#if defined(__riscv_flen) && __riscv_flen == 64
+#  define FPR_L fld
+#  define FPR_S fsd
+#  define SZ_FPR 8
+#elif defined(__riscv_flen) && __riscv_flen == 32
+#  define FPR_L flw
+#  define FPR_S fsw
+#  define SZ_FPR 4


Check __riscv_flen is not 32 or 64 here, in case we add Q-extension
then we can error out.


diff --git a/libitm/config/riscv/sjlj.S b/libitm/config/riscv/sjlj.S
new file mode 100644
index 000..93f12ec
--- /dev/null
+++ b/libitm/config/riscv/sjlj.S
@@ -0,0 +1,144 @@
+#include "asmcfi.h"
+#include "asm.h"
+
+   .text
+   .align  2
+   .global _ITM_beginTransaction
+   .type   _ITM_beginTransaction, @function
+
+_ITM_beginTransaction:
+   cfi_startproc
+   mv a1, sp
+   addi sp, sp, -(LEN_GPR*SZ_GPR+ 12*SZ_FPR)


This expression appeared 4 times, maybe define a marco ADJ_STACK_SIZE
or something else to hold that?


+   cfi_adjust_cfa_offset(LEN_GPR*SZ_GPR+ 12*SZ_FPR)



diff --git a/libitm/config/riscv/target.h b/libitm/config/riscv/target.h
new file mode 100644
index 000..b8a1665
--- /dev/null
+++ b/libitm/config/riscv/target.h
@@ -0,0 +1,62 @@
+typedef struct gtm_jmpbuf
+  {
+long int pc;
+void *cfa;
+long int s[12]; /* Saved registers, s0 is fp */
+
+#if __riscv_xlen == 32
+/* Ensure that the stack is 16-byte aligned */
+long int padding[2];
+#endif
+
+/* FP saved registers */
+#if defined(__riscv_flen) && __riscv_flen == 64
+double fs[12];
+#elif defined(__riscv_flen) && __riscv_flen == 32
+float fs[12];


Same here, error __riscv_flen if defined but not 64 or 32.


[PATCH 12/15 V3] arm: implement bti injection

2022-10-28 Thread Andrea Corallo via Gcc-patches
Hi all,

please find attached the third iteration of this patch addresing review
comments.

Thanks

  Andrea

>From e3001bd662b84dafeca200b52fc644b7bf81c4af Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 7 Apr 2022 11:51:56 +0200
Subject: [PATCH] [PATCH 12/15] arm: implement bti injection

Hi all,

this patch enables Branch Target Identification Armv8.1-M Mechanism
[1].

This is achieved by using the bti pass made common with Aarch64.

The pass iterates through the instructions and adds the necessary BTI
instructions at the beginning of every function and at every landing
pad targeted by indirect jumps.

Best Regards

  Andrea

[1]


gcc/ChangeLog

2022-04-07  Andrea Corallo  

* config.gcc (arm*-*-*): Add 'aarch-bti-insert.o' object.
* config/arm/arm-protos.h: Update.
* config/arm/arm.cc (aarch_bti_enabled, aarch_bti_j_insn_p)
(aarch_pac_insn_p, aarch_gen_bti_c, aarch_gen_bti_j): New
functions.
* config/arm/arm.md (bti_nop): New insn.
* config/arm/t-arm (PASSES_EXTRA): Add 'arm-passes.def'.
(aarch-bti-insert.o): New target.
* config/arm/unspecs.md (UNSPEC_BTI_NOP): New unspec.
* config/arm/aarch-bti-insert.cc (rest_of_insert_bti): Update
to verify arch compatibility.
* config/arm/arm-passes.def: New file.

gcc/testsuite/ChangeLog

2022-04-07  Andrea Corallo  

* gcc.target/arm/bti-1.c: New testcase.
* gcc.target/arm/bti-2.c: Likewise.
---
 gcc/config.gcc   |  2 +-
 gcc/config/arm/arm-passes.def| 21 ++
 gcc/config/arm/arm-protos.h  |  2 +
 gcc/config/arm/arm.cc| 61 +---
 gcc/config/arm/arm.md|  7 
 gcc/config/arm/t-arm | 10 +
 gcc/config/arm/unspecs.md|  1 +
 gcc/testsuite/gcc.target/arm/bti-1.c | 12 ++
 gcc/testsuite/gcc.target/arm/bti-2.c | 58 ++
 9 files changed, 167 insertions(+), 7 deletions(-)
 create mode 100644 gcc/config/arm/arm-passes.def
 create mode 100644 gcc/testsuite/gcc.target/arm/bti-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/bti-2.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2021bdf9d2f..004e1dfa8d8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -353,7 +353,7 @@ arc*-*-*)
;;
 arm*-*-*)
cpu_type=arm
-   extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o"
+	extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o aarch-bti-insert.o"
	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve_types.h arm_mve.h arm_cde.h"
target_type_format_char='%'
c_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-passes.def b/gcc/config/arm/arm-passes.def
new file mode 100644
index 000..71d6b563640
--- /dev/null
+++ b/gcc/config/arm/arm-passes.def
@@ -0,0 +1,21 @@
+/* Arm-specific passes declarations.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 84764bf27ce..6befb6c4445 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -24,6 +24,8 @@
 
 #include "sbitmap.h"
 
+rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
+
 extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
 extern bool use_simple_return_p (void);
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index fa0f9a61498..26d4c1502f2 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -23374,12 +23374,6 @@ output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
-static bool
-aarch_bti_enabled ()
-{
-  return false;
-}
-
 /* Generate the prologue instructions for entry into an ARM or Thumb-2
function.  */
 void
@@ -32992,6 +32986,61 @@ arm_current_function_pac_enabled_p (void)
   && !crtl->is_leaf));
 }
 
+/* Return TRUE if Branch Target Identification Mechanism is enabled.  */
+bool

[PATCH 10/15 V3] arm: Implement cortex-M return signing address codegen

2022-10-28 Thread Andrea Corallo via Gcc-patches
Hi all,

the third iteration of this patch is attached, addressing review comments.

Thanks

  Andrea

From b42e28be75f374a4e1a5943c8c9002e07dbcc567 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 20 Jan 2022 15:36:23 +0100
Subject: [PATCH] [PATCH 10/15] arm: Implement cortex-M return signing address
 codegen

Hi all,

this patch enables return address signing and verification based on
Armv8.1-M Pointer Authentication [1].

To sign the return address, we use the PAC R12, LR, SP instruction
upon function entry.  This signs LR using SP and stores the result in
R12.  R12 will then be pushed onto the stack.

During function epilogue R12 will be popped and AUT R12, LR, SP will
be used to verify that the content of LR is still valid before return.

Here is an example of a PAC-instrumented function prologue and epilogue:

void foo (void);

int main()
{
  foo ();
  return 0;
}

Compiled with '-march=armv8.1-m.main -mbranch-protection=pac-ret
-mthumb' translates into:

main:
pac ip, lr, sp
push{r3, r7, ip, lr}
add r7, sp, #0
bl  foo
movsr3, #0
mov r0, r3
pop {r3, r7, ip, lr}
aut ip, lr, sp
bx  lr

The patch also takes care of generating a PACBTI instruction in place
of the sequence BTI+PAC when Branch Target Identification is enabled
contextually.

Ex. the previous example compiled with '-march=armv8.1-m.main
-mbranch-protection=pac-ret+bti -mthumb' translates into:

main:
pacbti  ip, lr, sp
push{r3, r7, ip, lr}
add r7, sp, #0
bl  foo
movsr3, #0
mov r0, r3
pop {r3, r7, ip, lr}
aut ip, lr, sp
bx  lr

Following earlier upstream suggestions, a test for varargs has been
added, and '-mtpcs-frame' is deemed incompatible with the return
address signing feature being introduced.

[1] 


gcc/Changelog

2021-11-03  Andrea Corallo  

* config/arm/arm.h (arm_arch8m_main): Declare it.
* config/arm/arm.cc (arm_arch8m_main): Define it.
(arm_option_reconfigure_globals): Set arm_arch8m_main.
(arm_compute_frame_layout, arm_expand_prologue)
(thumb2_expand_return, arm_expand_epilogue)
(arm_conditional_register_usage): Update for pac codegen.
(arm_current_function_pac_enabled_p): New function.
* config/arm/arm.md (pac_ip_lr_sp, pacbti_ip_lr_sp, aut_ip_lr_sp):
Add new patterns.
* config/arm/unspecs.md (UNSPEC_PAC_IP_LR_SP)
(UNSPEC_PACBTI_IP_LR_SP, UNSPEC_AUT_IP_LR_SP): Add unspecs.

gcc/testsuite/Changelog

2021-11-03  Andrea Corallo  

* gcc.target/arm/pac.h : New file.
* gcc.target/arm/pac-1.c : New test case.
* gcc.target/arm/pac-2.c : Likewise.
* gcc.target/arm/pac-3.c : Likewise.
* gcc.target/arm/pac-4.c : Likewise.
* gcc.target/arm/pac-5.c : Likewise.
* gcc.target/arm/pac-6.c : Likewise.
* gcc.target/arm/pac-7.c : Likewise.
* gcc.target/arm/pac-8.c : Likewise.
---
 gcc/config/arm/arm-protos.h  |  1 +
 gcc/config/arm/arm.cc| 77 +++-
 gcc/config/arm/arm.h |  4 ++
 gcc/config/arm/arm.md| 23 +
 gcc/config/arm/unspecs.md|  3 ++
 gcc/testsuite/gcc.target/arm/pac-1.c | 12 +
 gcc/testsuite/gcc.target/arm/pac-2.c | 11 
 gcc/testsuite/gcc.target/arm/pac-3.c | 11 
 gcc/testsuite/gcc.target/arm/pac-4.c | 10 
 gcc/testsuite/gcc.target/arm/pac-5.c | 28 ++
 gcc/testsuite/gcc.target/arm/pac-6.c | 18 +++
 gcc/testsuite/gcc.target/arm/pac-7.c | 32 
 gcc/testsuite/gcc.target/arm/pac-8.c | 34 
 gcc/testsuite/gcc.target/arm/pac.h   | 17 ++
 14 files changed, 268 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-4.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-5.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-6.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-7.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-8.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac.h

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cff7ff1da2a..84764bf27ce 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -379,6 +379,7 @@ extern int vfp3_const_double_for_bits (rtx);
 extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
   rtx);
 extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
+extern bool 

Re: Rust frontend patches v3

2022-10-28 Thread David Malcolm via Gcc-patches
On Fri, 2022-10-28 at 17:20 +0200, Arthur Cohen wrote:
> 
> 
> On 10/28/22 15:06, David Malcolm wrote:
> > On Fri, 2022-10-28 at 13:48 +0200, Arthur Cohen wrote:
> > > Hi David,
> > > 
> > > On 10/26/22 23:15, David Malcolm wrote:
> > > > On Wed, 2022-10-26 at 10:17 +0200,
> > > > arthur.co...@embecosm.com wrote:
> > > > > This is the fixed version of our previous patch set for gccrs -
> > > > > We've addressed the comments raised in our previous emails.

[...snip...]

> > 
> > I'm guessing that almost all of gccrs testing so far has been on
> > relatively small examples, so that even if the GC considers
> > collecting,
> > the memory usage might not have exceeded the threshold for actually
> > doing the mark-and-sweep collection, and so no collection has been
> > happening during your testing.
> > 
> > In case you haven't tried yet, you might want to try adding:
> >    --param=ggc-min-expand=0 --param=ggc-min-heapsize=0
> > which IIRC forces the GC to actually do its mark-and-sweep
> > collection
> > at every potential point where it might collect.
> 
> That's very helpful, thanks a lot. I've run our testsuite with these
> and 
> found no issues, but we might consider adding that to our CI setup to
> make sure.

Great!   Though as noted, for libgccjit it slows the testsuite down
*massively*, so you might want to bear that in mind.  I'm doing it for
libgccjit because libgccjit looks like a "frontend" to the rest of the
GCC codebase, but it's a deeply weird one, and so tends to uncover
weird issues :-/

Dave

> 
> Kindly,
> 
> Arthur
> 
> > I use these params in libgccjit's test suite; it massively slows
> > things
> > down, but it makes any GC misuse crash immediately even on minimal
> > test
> > cases, rather than hiding problems until you have a big (and thus
> > nasty) test case.
> > 
> > Hope this is helpful
> > Dave
> > 
> > 
> > > 
> > > > Hope this is constructive
> > > > Dave
> > > > 
> > > 
> > > Thanks a lot for the input,
> > > 
> > > All the best,
> > > 
> > > Arthur
> > > 
> > > 
> > > 
> > > 
> > 



Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread H.J. Lu via Gcc-patches
On Fri, Oct 28, 2022 at 1:35 AM Eric Botcazou  wrote:
>
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (reg:CCC 17 flags) (const_int 0 [0]
> >
> > as
> >
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (const_int 1) (const_int 0 [0]
> >
> > which leads to incorrect results since LTU on MODE_CC register isn't the
> > same as "unsigned less than" in x86 backend.
>
> That's not specific to the x86 back-end, i.e. it's a generic caveat.
>
> >   PR target/107172
> >   * config/i386/i386.md (UNSPEC_CC_NE): New.
> >   Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns.
>
> FWIW the SPARC back-end uses a COMPARE instead of an UNSPEC here.

COMPARE may also set CC register to a constant when both operands are
known constants.


-- 
H.J.


Re: Rust frontend patches v3

2022-10-28 Thread Arthur Cohen



On 10/28/22 15:06, David Malcolm wrote:

On Fri, 2022-10-28 at 13:48 +0200, Arthur Cohen wrote:

Hi David,

On 10/26/22 23:15, David Malcolm wrote:

On Wed, 2022-10-26 at 10:17 +0200, arthur.co...@embecosm.com wrote:

This is the fixed version of our previous patch set for gccrs -
We've addressed the comments raised in our previous emails.


[...snip...]

(Caveat: I'm not a global reviewer)

Sorry if this is answered in the docs in the patch kit, but a high-
level question: what's the interaction between gccrs and gcc's
garbage
collector?  Are the only GC-managed objects (such as trees) either
(a)
created near the end of the gccrs, or (b) common globals created at
initialization and with GTY roots?


We only create trees at the last point of our compilation pipeline,
before directly writing them to the backend. This then calls a
`write_global_definitions` method, that we ported over directly from
the
Go frontend. Among other things, this method has the role of
preserving
trees from the GC using `go_preserve_from_gc()` (or
`rust_preserve_from_gc()` in our case).

Elsewhere in our pipeline, we never call any garbage-collection
routines
or GC-related functions.


Are there any points where a collection happen within gccrs?  Or is
almost everything stored using
gccrs's own data structures, and are these managed in the regular
(non-
GC) heap?


This is correct. We have an AST representation, implemented using
unique
pointers, which is then lowered to an HIR, also using unique
pointers.


I skimmed the patches and see that gccrs uses e.g. std::vector,
std::unique_ptr, std::map, and std::string; this seems reasonable
to
me, but it got me thinking about memory management strategies.

I see various std::map e.g. in Rust::Compile::Context; so
e.g.
is the GC guaranteed never to collect whilst this is live?


This is a really interesting question, and I hope the answer is yes!
But
I'm unsure as to how to enforce that, as I am not too familiar with
the
GCC GC. I'm hoping someone else will weigh in. As I said, we do not
do
anything particular with the GC during the execution of our
`CompileCrate` visitor, so hopefully it shouldn't run.


I'm guessing that almost all of gccrs testing so far has been on
relatively small examples, so that even if the GC considers collecting,
the memory usage might not have exceeded the threshold for actually
doing the mark-and-sweep collection, and so no collection has been
happening during your testing.

In case you haven't tried yet, you might want to try adding:
   --param=ggc-min-expand=0 --param=ggc-min-heapsize=0
which IIRC forces the GC to actually do its mark-and-sweep collection
at every potential point where it might collect.


That's very helpful, thanks a lot. I've run our testsuite with these and
found no issues, but we might consider adding that to our CI setup to 
make sure.


Kindly,

Arthur


I use these params in libgccjit's test suite; it massively slows things
down, but it makes any GC misuse crash immediately even on minimal test
cases, rather than hiding problems until you have a big (and thus
nasty) test case.

Hope this is helpful
Dave





Hope this is constructive
Dave



Thanks a lot for the input,

All the best,

Arthur










[pushed] c++: apply friend attributes sooner

2022-10-28 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- >8 --

Comparing attributes between declarations of a friend function has been
complicated by pushdecl happening before decl_attributes.  I assumed there
was some complicated reason we weren't calling decl_attributes here, but it
doesn't break anything.

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Call decl_attributes before do_friend.
---
 gcc/cp/decl.cc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index bc085f8fcce..c7f1937ea48 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14206,13 +14206,16 @@ grokdeclarator (const cp_declarator *declarator,
else if (decl && DECL_NAME (decl))
  {
set_originating_module (decl, true);
-   
+
if (initialized)
  /* Kludge: We need funcdef_flag to be true in do_friend for
 in-class defaulted functions, but that breaks grokfndecl.
 So set it here.  */
  funcdef_flag = true;
 
+   cplus_decl_attributes (, *attrlist, 0);
+   *attrlist = NULL_TREE;
+
decl = do_friend (ctype, unqualified_id, decl,
  flags, funcdef_flag);
return decl;

base-commit: 4fe34cdcc80ac225b80670eabc38ac5e31ce8a5a
-- 
2.31.1



Re: [PATCH] libstdc++: Make placeholders inline when inline variables are available

2022-10-28 Thread Jonathan Wakely via Gcc-patches

On 20/10/22 16:58 +0200, Arsen Arsenović wrote:

This slightly lowers the dependency of generated code on libstdc++.so.


Looks good, I'll test and push, thanks.


libstdc++-v3/ChangeLog:

* include/std/functional: Make placeholders inline, if possible.
---
libstdc++-v3/include/std/functional | 66 -
1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index d22acaa3cb8..b396e8dbbdc 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -285,35 +285,43 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   * simplify this with variadic templates, because we're introducing
   * unique names for each.
   */
-extern const _Placeholder<1> _1;
-extern const _Placeholder<2> _2;
-extern const _Placeholder<3> _3;
-extern const _Placeholder<4> _4;
-extern const _Placeholder<5> _5;
-extern const _Placeholder<6> _6;
-extern const _Placeholder<7> _7;
-extern const _Placeholder<8> _8;
-extern const _Placeholder<9> _9;
-extern const _Placeholder<10> _10;
-extern const _Placeholder<11> _11;
-extern const _Placeholder<12> _12;
-extern const _Placeholder<13> _13;
-extern const _Placeholder<14> _14;
-extern const _Placeholder<15> _15;
-extern const _Placeholder<16> _16;
-extern const _Placeholder<17> _17;
-extern const _Placeholder<18> _18;
-extern const _Placeholder<19> _19;
-extern const _Placeholder<20> _20;
-extern const _Placeholder<21> _21;
-extern const _Placeholder<22> _22;
-extern const _Placeholder<23> _23;
-extern const _Placeholder<24> _24;
-extern const _Placeholder<25> _25;
-extern const _Placeholder<26> _26;
-extern const _Placeholder<27> _27;
-extern const _Placeholder<28> _28;
-extern const _Placeholder<29> _29;
+#if __cpp_inline_variables
+#  define _GLIBCXX_PLACEHOLDER inline
+#else
+#  define _GLIBCXX_PLACEHOLDER extern
+#endif
+
+_GLIBCXX_PLACEHOLDER const _Placeholder<1> _1;
+_GLIBCXX_PLACEHOLDER const _Placeholder<2> _2;
+_GLIBCXX_PLACEHOLDER const _Placeholder<3> _3;
+_GLIBCXX_PLACEHOLDER const _Placeholder<4> _4;
+_GLIBCXX_PLACEHOLDER const _Placeholder<5> _5;
+_GLIBCXX_PLACEHOLDER const _Placeholder<6> _6;
+_GLIBCXX_PLACEHOLDER const _Placeholder<7> _7;
+_GLIBCXX_PLACEHOLDER const _Placeholder<8> _8;
+_GLIBCXX_PLACEHOLDER const _Placeholder<9> _9;
+_GLIBCXX_PLACEHOLDER const _Placeholder<10> _10;
+_GLIBCXX_PLACEHOLDER const _Placeholder<11> _11;
+_GLIBCXX_PLACEHOLDER const _Placeholder<12> _12;
+_GLIBCXX_PLACEHOLDER const _Placeholder<13> _13;
+_GLIBCXX_PLACEHOLDER const _Placeholder<14> _14;
+_GLIBCXX_PLACEHOLDER const _Placeholder<15> _15;
+_GLIBCXX_PLACEHOLDER const _Placeholder<16> _16;
+_GLIBCXX_PLACEHOLDER const _Placeholder<17> _17;
+_GLIBCXX_PLACEHOLDER const _Placeholder<18> _18;
+_GLIBCXX_PLACEHOLDER const _Placeholder<19> _19;
+_GLIBCXX_PLACEHOLDER const _Placeholder<20> _20;
+_GLIBCXX_PLACEHOLDER const _Placeholder<21> _21;
+_GLIBCXX_PLACEHOLDER const _Placeholder<22> _22;
+_GLIBCXX_PLACEHOLDER const _Placeholder<23> _23;
+_GLIBCXX_PLACEHOLDER const _Placeholder<24> _24;
+_GLIBCXX_PLACEHOLDER const _Placeholder<25> _25;
+_GLIBCXX_PLACEHOLDER const _Placeholder<26> _26;
+_GLIBCXX_PLACEHOLDER const _Placeholder<27> _27;
+_GLIBCXX_PLACEHOLDER const _Placeholder<28> _28;
+_GLIBCXX_PLACEHOLDER const _Placeholder<29> _29;
+
+#undef _GLIBCXX_PLACEHOLDER
  }

  /**




Re: [PATCH v2] libstdc++: Don't use gstdint.h anymore

2022-10-28 Thread Jonathan Wakely via Gcc-patches

On 20/10/22 16:20 +0200, Arsen Arsenović wrote:

libstdc++-v3/ChangeLog:

* configure.ac: Stop generating gstdint.h.
* src/c++11/compatibility-atomic-c++0x.cc: Stop using gstdint.h.
---


> +using guintptr_t = __UINTPTR_TYPE__;

I think this should be local in the only function that uses it.

Sure.

Tested on x86_64-pc-linux-gnu.



Thanks, I'll test and push this.



libstdc++-v3/configure.ac| 6 --
libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc | 8 
2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 81d914b434a..c5ec976c026 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -440,12 +440,6 @@ GCC_CHECK_UNWIND_GETIPINFO

GCC_LINUX_FUTEX([AC_DEFINE(HAVE_LINUX_FUTEX, 1, [Define if futex syscall is 
available.])])

-if test "$is_hosted" = yes; then
-# TODO: remove this and change src/c++11/compatibility-atomic-c++0x.cc to
-# use  instead of .
-GCC_HEADER_STDINT(include/gstdint.h)
-fi
-
GLIBCXX_ENABLE_SYMVERS([yes])
AC_SUBST(libtool_VERSION)

diff --git a/libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc 
b/libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc
index 5a0c5459088..e21bd76245d 100644
--- a/libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc
+++ b/libstdc++-v3/src/c++11/compatibility-atomic-c++0x.cc
@@ -22,7 +22,6 @@
// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
// .

-#include "gstdint.h"
#include 
#include 

@@ -119,13 +118,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _GLIBCXX_CONST __atomic_flag_base*
  __atomic_flag_for_address(const volatile void* __z) _GLIBCXX_NOTHROW
  {
-uintptr_t __u = reinterpret_cast(__z);
+using guintptr_t = __UINTPTR_TYPE__;
+guintptr_t __u = reinterpret_cast(__z);
__u += (__u >> 2) + (__u << 4);
__u += (__u >> 7) + (__u << 5);
__u += (__u >> 17) + (__u << 13);
-if (sizeof(uintptr_t) > 4)
+if (sizeof(guintptr_t) > 4)
  __u += (__u >> 31);
-__u &= ~((~uintptr_t(0)) << LOGSIZE);
+__u &= ~((~guintptr_t(0)) << LOGSIZE);
return flag_table + __u;
  }





Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-10-28 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 26 Oct 2022 at 21:07, Richard Sandiford
 wrote:
>
> Sorry for the slow response.  I wanted to find some time to think
> about this a bit more.
>
> Prathamesh Kulkarni  writes:
> > On Fri, 30 Sept 2022 at 21:38, Richard Sandiford
> >  wrote:
> >>
> >> Richard Sandiford via Gcc-patches  writes:
> >> > Prathamesh Kulkarni  writes:
> >> >> Sorry to ask a silly question but in which case shall we select 2nd 
> >> >> vector ?
> >> >> For num_poly_int_coeffs == 2,
> >> >> a1 /trunc n1 == (a1 + 0x) / (n1.coeffs[0] + n1.coeffs[1]*x)
> >> >> If a1/trunc n1 succeeds,
> >> >> 0 / n1.coeffs[1] == a1/n1.coeffs[0] == 0.
> >> >> So, a1 has to be < n1.coeffs[0] ?
> >> >
> >> > Remember that a1 is itself a poly_int.  It's not necessarily a constant.
> >> >
> >> > E.g. the TRN1 .D instruction maps to a VEC_PERM_EXPR with the selector:
> >> >
> >> >   { 0, 2 + 2x, 1, 4 + 2x, 2, 6 + 2x, ... }
> >>
> >> Sorry, should have been:
> >>
> >>   { 0, 2 + 2x, 2, 4 + 2x, 4, 6 + 2x, ... }
> > Hi Richard,
> > Thanks for the clarifications, and sorry for late reply.
> > I have attached POC patch that tries to implement the above approach.
> > Passes bootstrap+test on x86_64-linux-gnu and aarch64-linux-gnu for VLS 
> > vectors.
> >
> > For VLA vectors, I have only done limited testing so far.
> > It seems to pass couple of tests written in the patch for
> > nelts_per_pattern == 3,
> > and folds the following svld1rq test:
> > int32x4_t v = {1, 2, 3, 4};
> > return svld1rq_s32 (svptrue_b8 (), [0])
> > into:
> > return {1, 2, 3, 4, ...};
> > I will try to bootstrap+test it on SVE machine to test further for VLA 
> > folding.
> >
> > I have a couple of questions:
> > 1] When mask selects elements from same vector but from different patterns:
> > For eg:
> > arg0 = {1, 11, 2, 12, 3, 13, ...},
> > arg1 = {21, 31, 22, 32, 23, 33, ...},
> > mask = {0, 0, 0, 1, 0, 2, ... },
> > All have npatterns = 2, nelts_per_pattern = 3.
> >
> > With above mask,
> > Pattern {0, ...} selects arg0[0], ie {1, ...}
> > Pattern {0, 1, 2, ...} selects arg0[0], arg0[1], arg0[2], ie {1, 11, 2, ...}
> > While arg0[0] and arg0[2] belong to same pattern, arg0[1] belongs to 
> > different
> > pattern in arg0.
> > The result is:
> > res = {1, 1, 1, 11, 1, 2, ...}
> > In this case, res's 2nd pattern {1, 11, 2, ...} is encoded with:
> > with a0 = 1, a1 = 11, S = -9.
> > Is that expected tho ? It seems to create a new encoding which
> > wasn't present in the input vector. For instance, the next elem in
> > sequence would be -7,
> > which is not present originally in arg0.
>
> Yeah, you're right, sorry.  Going back to:
>
> (2) The explicit encoding can be used to produce a sequence of N*Ex*Px
> elements for any integer N.  This extended sequence can be reencoded
> as having N*Px patterns, with Ex staying the same.
>
> I guess we need to pick an N for the selector such that each new
> selector pattern (each one out of the N*Px patterns) selects from
> the *same pattern* of the same data input.
>
> So if a particular pattern in the selector has a step S, and the data
> input it selects from has Pi patterns, N*S must be a multiple of Pi.
> N must be a multiple of least_common_multiple(S,Pi)/S.
>
> I think that means that the total number of patterns in the result
> (Pr from previous messages) can safely be:
>
>   Ps * least_common_multiple(
> least_common_multiple(S[1], P[input(1)]) / S[1],
> ...
> least_common_multiple(S[Ps], P[input(Ps)]) / S[Ps]
>   )
>
> where:
>
>   Ps = the number of patterns in the selector
>   S[I] = the step for selector pattern I (I being 1-based)
>   input(I) = the data input selected by selector pattern I (I being 1-based)
>   P[I] = the number of patterns in data input I
>
> That's getting quite complicated :-)  If we allow arbitrary P[...]
> and S[...] then it could also get large.  Perhaps we should finally
> give up on the general case and limit this to power-of-2 patterns and
> power-of-2 steps, so that least_common_multiple becomes MAX.  Maybe that
> simplifies other things as well.
>
> What do you think?
Hi Richard,
Thanks for the suggestions. Yeah I suppose we can initially add support for
power-of-2 patterns and power-of-2 steps and try to generalize it in
follow up patches if possible.

Sorry if this sounds like a silly question -- if a pattern in the
selector is going to select the *same pattern from the same input
vector*, then instead of re-encoding the selector to have N * Ps
patterns, would it make sense for the elements in the selector to
denote the pattern number itself rather than the element index, when
the input vectors are VLA?

For eg:
op0 = {1, 2, 3, 4, 1, 2, 3, 5, 1, 2, 3, 6, ...}
op1 = {...}
with npatterns == 4, nelts_per_pattern == 3,
sel = {0, 3} should pick pattern 0 and pattern 3 from op0,
so, res = {1, 4, 1, 5, 1, 6, ...}
Not sure if this is correct tho.

Thanks,
Prathamesh
>
> > I suppose it's fine since if the user defines mask to have pattern {0,
> > 1, 2, ...}
> > they intended result to have pattern with above encoding.
> > 

Re: RFC - VRP1 default mode

2022-10-28 Thread Andrew MacLeod via Gcc-patches



On 10/28/22 10:14, Richard Biener wrote:



Am 28.10.2022 um 15:59 schrieb Andrew MacLeod :



On 10/28/22 09:46, Richard Biener wrote:

On Fri, Oct 28, 2022 at 3:43 PM Andrew MacLeod  wrote:

On 10/28/22 03:17, Richard Biener wrote:

On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:

Figured I would ask what you guys think of making ranger the default for
the VRP1 pass now.

With partial equivalences and the other bits I checked in the past few
weeks I'm not aware of much that the legacy VRP pass gets that ranger
doesn't.  The only exception to that which I am aware of is the trick
played with the unreachable edges to set global ranges, but that is done
in the DOM passes now anyway... so it just happens slightly later in the
optimization cycle.

Note DOM should go away at some point.  Why can this not happen during
ranger driven VRP?

I have been working on that for the last 2 days.  Turns out VRP1 can
remove builtin_unreachable from the
if (X)
  __builtin_unreachable ()

idiom and set the appropriate global ranges, but it has to leave those
with 2 ssa-names:

if (a_1 != b_2)
  __builtin_unreachable()

until the second pass of VRP or we lose the relationship between a_1 and
b_2.  That triggers some failures, specifically a vectorizer failure
because it can't be sure that the start and end point are not the same
without the condition in the IL. Trying to store global relations over
multiple passes would be problematic at this stage of development, so I
don't see a problem with leaving it that way.

Hmm, I don't remember VRP1 doing anything special with the above though?
Did it somehow propagate the (un!)conditional equivalence?

So as I looked at builtin_unreachable(), it was very ad hoc.  That's one of the
roots of that artificial testcase in the PR I opened. Cascading calls were not
being handled in a consistent way: VRP1 removed some, DOM removed some... they
just kind of disappeared at some point, but not consistently.  The PR that Uli
opened, which Aldy fixed, I could make fail again with minor adjustments to the
conditions.  So I worked on a consistent approach.

My guess is the old range stored globally for that case for a_1 was probably 
~[b_2, b_2]  meaning it was carried in the range. Until we have an overall 
global relation tracker, we can't represent that across passes.

The global ranges were never symbolic, this was at most used during VRP itself.



Ah. Just took a closer look at what used to happen.

Legacy VRP1 never removed the unreachable call; it hung around until
threadfull2 ran just before VRP2. The testcase was an artificial
vectorizing test with an infinite loop and an unreachable in the final
block.  Just part of the inconsistent removal :-P


Andrew



[committed] libstdc++: Fix allocator propagation in regex algorithms [PR107376]

2022-10-28 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

The PR points out that we assume the match_results allocator is default
constructible, which might not be true. We also have a related issue with
unwanted propagation from an object that might have an unequal
allocator.

Ideally we use the same allocator type for _State_info::_M_match_queue
but that would be an ABI change now. We should investigate if that can
be done without breaking anything, which might be possible because the
_Executor object is short-lived and never leaks out of the regex_match,
regex_search, and regex_replace algorithms. If we change the mangled
name for _Executor then there would be no ODR violations when mixing old
and new definitions. This commit does not attempt that.

libstdc++-v3/ChangeLog:

PR libstdc++/107376
* include/bits/regex_executor.h (_Executor::_Executor): Use same
allocator for _M_cur_results and _M_results.
* include/bits/regex_executor.tcc (_Executor::_M_main_dispatch):
Prevent possibly incorrect allocator propagating to
_M_cur_results.
* testsuite/28_regex/algorithms/regex_match/107376.cc: New test.
---
 libstdc++-v3/include/bits/regex_executor.h| 17 +++--
 libstdc++-v3/include/bits/regex_executor.tcc  |  3 +-
 .../28_regex/algorithms/regex_match/107376.cc | 76 +++
 3 files changed, 87 insertions(+), 9 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/28_regex/algorithms/regex_match/107376.cc

diff --git a/libstdc++-v3/include/bits/regex_executor.h 
b/libstdc++-v3/include/bits/regex_executor.h
index dc0878ce678..cdafcd5523d 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -71,14 +71,15 @@ namespace __detail
_ResultsVec&__results,
const _RegexT&  __re,
_FlagT  __flags)
-  : _M_begin(__begin),
-  _M_end(__end),
-  _M_re(__re),
-  _M_nfa(*__re._M_automaton),
-  _M_results(__results),
-  _M_rep_count(_M_nfa.size()),
-  _M_states(_M_nfa._M_start(), _M_nfa.size()),
-  _M_flags(__flags)
+  : _M_cur_results(__results.get_allocator()),
+   _M_begin(__begin),
+   _M_end(__end),
+   _M_re(__re),
+   _M_nfa(*__re._M_automaton),
+   _M_results(__results),
+   _M_rep_count(_M_nfa.size()),
+   _M_states(_M_nfa._M_start(), _M_nfa.size()),
+   _M_flags(__flags)
   {
using namespace regex_constants;
if (__flags & match_prev_avail) // ignore not_bol and not_bow
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index b93e958075e..a5885ed34ba 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -124,9 +124,10 @@ namespace __detail
break;
  std::fill_n(_M_states._M_visited_states, _M_nfa.size(), false);
  auto __old_queue = std::move(_M_states._M_match_queue);
+ auto __alloc = _M_cur_results.get_allocator();
  for (auto& __task : __old_queue)
{
- _M_cur_results = std::move(__task.second);
+ _M_cur_results = _ResultsVec(std::move(__task.second), __alloc);
  _M_dfs(__match_mode, __task.first);
}
  if (__match_mode == _Match_mode::_Prefix)
diff --git a/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/107376.cc b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/107376.cc
new file mode 100644
index 000..da4f7ad0a23
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/107376.cc
@@ -0,0 +1,76 @@
+// { dg-do run { target c++11 } }
+#include 
+#include 
+#include 
+
+template<class T>
+struct Alloc
+{
+  using value_type = T;
+  explicit Alloc(int) { }
+  template<class U> Alloc(const Alloc<U>&) { }
+
+  T* allocate(std::size_t n)
+  { return std::allocator<T>().allocate(n); }
+  void deallocate(T* ptr, std::size_t n)
+  { std::allocator<T>().deallocate(ptr, n); }
+
+  bool operator==(const Alloc&) const { return true; }
+  bool operator!=(const Alloc&) const { return false; }
+};
+
+void
+test_non_default_constructible()
+{
+  using sub_match = std::sub_match<const char*>;
+  using alloc_type = Alloc<sub_match>;
+  using match_results = std::match_results<const char*, alloc_type>;
+  match_results res(alloc_type(1));
+
+  std::regex_match("x", res, std::regex(".")); // PR libstdc++/107376
+}
+
+template<class T>
+struct PropAlloc
+{
+  int id;
+
+  using value_type = T;
+  explicit PropAlloc(int id) : id(id) { }
+  template<class U> PropAlloc(const PropAlloc<U>& a) : id(a.id) { }
+
+  using propagate_on_container_move_assignment = std::true_type;
+  using propagate_on_container_copy_assignment = std::true_type;
+
+  PropAlloc select_on_container_copy_construction() const
+  { return PropAlloc(0); }
+
+  T* allocate(std::size_t n)
+  { return std::allocator<T>().allocate(n); }
+  void deallocate(T* ptr, std::size_t n)
+  { std::allocator<T>().deallocate(ptr, n); }
+
+  bool operator==(const 

Re: [PATCH v3] LoongArch: Libvtv add loongarch support.

2022-10-28 Thread chenglulu



在 2022/10/28 17:38, WANG Xuerui 写道:

Hi,

The code change seems good but a few grammatical nits.

Patch subject should be a verb phrase, something like "libvtv: add 
LoongArch support" could be better.


Ok, thank you. I'll make the changes.




On 2022/10/28 16:01, Lulu Cheng wrote:
After several considerations, I decided to set VTV_PAGE_SIZE to 16KB 
under loongarch64.



v1 - > v2:

1. When the macro __loongarch_lp64 is defined, the VTV_PAGE_SIZE is 
set to 64K.
2. In the vtv_malloc.cc file __vtv_malloc_init function, it does not 
check

    whether VTV_PAGE_SIZE is equal to the system page size, if the macro
    __loongarch_lp64 is defined.

v2 -> v3:

Set VTV_PAGE_SIZE to 16KB under loongarch64.



All regression tests of libvtv passed.

 === libvtv Summary ===

# of expected passes    176

-


Are the monologue and changelog supposed to be a part of the actual 
commit? If not, conventionally they should be placed *after* the "---" 
line separating the commit message and diffstat/patch content.




The loongarch64 kernel supports 4KB,16KB, or 64KB pages,
but only 16k pages are currently supported in this code.
This sentence feels a little bit unnatural. I suggest just "The 
LoongArch specification permits page sizes of 4KiB, 16KiB and 64KiB, 
but only 16KiB pages are supported for now".


Co-Authored-By: qijingwen 

include/ChangeLog:

* vtv-change-permission.h (defined):
(VTV_PAGE_SIZE): Set VTV_PAGE_SIZE to 16KB under loongarch64.

"for loongarch64" feels more natural.


What I want to say is that loongarch64 supports different page sizes,
but loongarch32 will be supported later, and loongarch32 only
supports 4KiB pages, so I used loongarch64 here.



libvtv/ChangeLog:

* configure.tgt: Add loongarch support.
---
  include/vtv-change-permission.h | 5 +
  libvtv/configure.tgt    | 3 +++
  2 files changed, 8 insertions(+)

diff --git a/include/vtv-change-permission.h b/include/vtv-change-permission.h

index 70bdad92bca..f61d8b68ef6 100644
--- a/include/vtv-change-permission.h
+++ b/include/vtv-change-permission.h
@@ -48,6 +48,11 @@ extern void __VLTChangePermission (int);
  #else
  #if defined(__sun__) && defined(__svr4__) && defined(__sparc__)
  #define VTV_PAGE_SIZE 8192
+#elif defined(__loongarch_lp64)
+/* The page size can be configured to 4, 16, or 64KB configuring the kernel.

"The page size is configurable by the kernel to be 4, 16 or 64 KiB."
+   However, only 16KB pages are supported here. Please modify this macro if you
+   want to support other page sizes.  */


Are we actually encouraging the users to modify the sources themselves 
if they decide to run with non-16KiB page size? This might not even be 
feasible, as you're essentially telling them to recompile part of the 
toolchain, which they may not want to, or cannot, do.


I think the message you want to convey here is for them to voice their 
need upstream so we can then discuss. In that case, the 2 sentences 
here could be:


"For now, only the default page size of 16KiB is supported. If you 
have a need for other page sizes, please get in touch."
Although I'm not sure if the vague "get in touch" wording is 
appropriate. What do others think?

I think ok, I can't think of a better way to say it.



+#define VTV_PAGE_SIZE 16384
  #else
  #define VTV_PAGE_SIZE 4096
  #endif
diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt
index aa2a3f675b8..6cdd1e97ab1 100644
--- a/libvtv/configure.tgt
+++ b/libvtv/configure.tgt
@@ -50,6 +50,9 @@ case "${target}" in
  ;;
    x86_64-*-darwin[1]* | i?86-*-darwin[1]*)
  ;;
+  loongarch*-*-linux*)
+    VTV_SUPPORTED=yes
+    ;;
    *)
  ;;
  esac


Re: RFC - VRP1 default mode

2022-10-28 Thread Richard Biener via Gcc-patches



> Am 28.10.2022 um 15:59 schrieb Andrew MacLeod :
> 
> 
>> On 10/28/22 09:46, Richard Biener wrote:
>>> On Fri, Oct 28, 2022 at 3:43 PM Andrew MacLeod  wrote:
>>> 
>>> On 10/28/22 03:17, Richard Biener wrote:
 On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:
> Figured I would ask what you guys think of making ranger the default for
> the VRP1 pass now.
> 
> With partial equivalences and the other bits I checked in the past few
> weeks I'm not aware of much that the legacy VRP pass gets that ranger
> doesn't.  The only exception to that which I am aware of is the trick
> played with the unreachable edges to set global ranges, but that is done
> in the DOM passes now anyway... so it just happens slightly later in the
> optimization cycle.
 Note DOM should go away at some point.  Why can this not happen during
 ranger driven VRP?
>>> I have been working on that for the last 2 days.  Turns out VRP1 can
>>> remove builtin_unreachable from the
>>>if (X)
>>>  __builtin_unreachable ()
>>> 
>>> idiom and set the appropriate global ranges, but it has to leave those
>>> with 2 ssa-names:
>>> 
>>>if (a_1 != b_2)
>>>  __builtin_unreachable()
>>> 
>>> until the second pass of VRP or we lose the relationship between a_1 and
>>> b_2.  That triggers some failures.  Specifically a vectorizer fail
>>> because it can't be sure that the start and end point are not the same
>>> without the condition in the IL. Trying to store global relations over
>>> multiple passes would be problematic at this stage of development, so I
>>> don't see a problem with leaving it that way.
>> Hmm, I don't remember VRP1 doing anything special with the above though?
>> Did it somehow propagate the (un!)conditional equivalence?
> 
> So as I looked at builtin_unreachable(), it was very ad hoc.  That's one of
> the roots of that artificial testcase in the PR I opened. Cascading calls were
> not being handled in a consistent way. VRP1 removed some, dom removed some..  
> they just kind of disappeared at some point, but not consistently.  The PR 
> that Uli opened that Aldy fixed, I could make fail again with minor 
> adjustments to the conditions.  So I worked on a consistent approach.
> 
> My guess is the old range stored globally for that case for a_1 was probably 
> ~[b_2, b_2]  meaning it was carried in the range. Until we have an overall 
> global relation tracker, we can't represent that across passes.

The global ranges were never symbolic, this was at most used during VRP itself.

> 
> It appears that leaving those until VRP2 works fine...  testsuite currently 
> running tho ;-)
> 
> Andrew
> 


Re: RFC - VRP1 default mode

2022-10-28 Thread Andrew MacLeod via Gcc-patches



On 10/28/22 09:46, Richard Biener wrote:

On Fri, Oct 28, 2022 at 3:43 PM Andrew MacLeod  wrote:


On 10/28/22 03:17, Richard Biener wrote:

On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:

Figured I would ask what you guys think of making ranger the default for
the VRP1 pass now.

With partial equivalences and the other bits I checked in the past few
weeks I'm not aware of much that the legacy VRP pass gets that ranger
doesn't.  The only exception to that which I am aware of is the trick
played with the unreachable edges to set global ranges, but that is done
in the DOM passes now anyway... so it just happens slightly later in the
optimization cycle.

Note DOM should go away at some point.  Why can this not happen during
ranger driven VRP?

I have been working on that for the last 2 days.  Turns out VRP1 can
remove builtin_unreachable from the
if (X)
  __builtin_unreachable ()

idiom and set the appropriate global ranges, but it has to leave those
with 2 ssa-names:

if (a_1 != b_2)
  __builtin_unreachable()

until the second pass of VRP or we lose the relationship between a_1 and
b_2.  That triggers some failures.  Specifically a vectorizer fail 
because it can't be sure that the start and end point are not the same 
without the condition in the IL. Trying to store global relations over
multiple passes would be problematic at this stage of development, so I
don't see a problem with leaving it that way.

Hmm, I don't remember VRP1 doing anything special with the above though?
Did it somehow propagate the (un!)conditional equivalence?


So as I looked at builtin_unreachable(), it was very ad hoc.  That's one of 
the roots of that artificial testcase in the PR I opened. Cascading 
calls were not being handled in a consistent way. VRP1 removed some, dom 
removed some..  they just kind of disappeared at some point, but not 
consistently.  The PR that Uli opened that Aldy fixed, I could make fail 
again with minor adjustments to the conditions.  So I worked on a 
consistent approach.


My guess is the old range stored globally for that case for a_1 was 
probably ~[b_2, b_2]  meaning it was carried in the range. Until we have 
an overall global relation tracker, we can't represent that across passes.


It appears that leaving those until VRP2 works fine...  testsuite 
currently running tho ;-)


Andrew



Re: vect: Make vect_check_gather_scatter reject offsets that aren't multiples of BITS_PER_UNIT [PR107346]

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, 28 Oct 2022, Andre Vieira (lists) wrote:

> 
> On 24/10/2022 14:29, Richard Biener wrote:
> > On Mon, 24 Oct 2022, Andre Vieira (lists) wrote:
> >
> >> Changing if-convert would merely change this testcase but we could still
> >> trigger using a different structure type, changing the size of Int24 to 32
> >> bits rather than 24:
> >> package Loop_Optimization23_Pkg is
> >>    type Nibble is mod 2**4;
> >>    type Int24  is mod 2**32;  -- Changed this from 24->32
> >>    type StructA is record
> >>      a : Nibble;
> >>      b : Int24;
> >>    end record;
> >>    pragma Pack(StructA);
> >>    type StructB is record
> >>      a : Nibble;
> >>      b : StructA;
> >>    end record;
> >>    pragma Pack(StructB);
> >>    type ArrayOfStructB is array(0..100) of StructB;
> >>    procedure Foo (X : in out ArrayOfStructB);
> >> end Loop_Optimization23_Pkg;
> >>
> >> This would yield a DR_REF (dr): (*x_7(D))[_1].b.b  where the last 'b'
> >> isn't a DECL_BIT_FIELD anymore, but the first one still is and still
> >> has the non-multiple of BITS_PER_UNIT offset. Thus passing the
> >> vect_find_stmt_data_reference check and triggering the
> >> vect_check_gather_scatter failure. So unless we go and make sure we
> >> always set the DECL_BIT_FIELD on all subsequent accesses of a
> >> DECL_BIT_FIELD 'struct' (which is odd enough on its own) then we are
> >> better off catching the issue in vect_check_gather_scatter ?
> > But it's not only an issue with scatter-gather, other load/store handling
> > assumes it can create a pointer to the start of the access and thus
> > requires BITS_PER_UNIT alignment for each of them.  So we need to fail
> > at data-ref analysis somehow.
> >
> > Richard.
> 
> Sorry for the delay on this, had some other things come in between. After our
> IRC discussion I believe we agreed that it would be neater to check this in
> vect_check_gather_scatter as I did in the original patch in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604139.html
> The main reasons being that to check earlier we'd need to walk the DR_REF to
> look for any FIELD_DECL that has DECL_BIT_FIELD set and we decided against
> that.
> 
> Can you confirm the original patch is OK for trunk?

Yes.

Thanks,
Richard.

> Kind regards,
> Andre


Re: RFC - VRP1 default mode

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, Oct 28, 2022 at 3:43 PM Andrew MacLeod  wrote:
>
>
> On 10/28/22 03:17, Richard Biener wrote:
> > On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:
> >> Figured I would ask what you guys think of making ranger the default for
> >> the VRP1 pass now.
> >>
> >> With partial equivalences and the other bits I checked in the past few
> >> weeks I'm not aware of much that the legacy VRP pass gets that ranger
> >> doesn't.  The only exception to that which I am aware of is the trick
> >> played with the unreachable edges to set global ranges, but that is done
> >> in the DOM passes now anyway... so it just happens slightly later in the
> >> optimization cycle.
> > Note DOM should go away at some point.  Why can this not happen during
> > ranger driven VRP?
>
> I have been working on that for the last 2 days.  Turns out VRP1 can
> remove builtin_unreachable from the
>if (X)
>  __builtin_unreachable ()
>
> idiom and set the appropriate global ranges, but it has to leave those
> with 2 ssa-names:
>
>if (a_1 != b_2)
>  __builtin_unreachable()
>
> until the second pass of VRP or we lose the relationship between a_1 and
> b_2.  That triggers some failures.  Specifically a vectorizer fail
> because it can't be sure that the start and end point are not the same
> without the condition in the IL. Trying to store global relations over
> multiple passes would be problematic at this stage of development, so I
> don't see a problem with leaving it that way.

Hmm, I don't remember VRP1 doing anything special with the above though?
Did it somehow propagate the (un!)conditional equivalence?

> builtin_unreachable() calls from switches get removed during the second pass
> of switch-conversion... which I presume remains OK.
>
> Anyway, that's pretty much under control.  Patch probably coming later today.
>
>
>
> >> There is one test case that needs adjustment for
> >> that which was just checking for a mask in DOM2
> >> (gcc.dg/tree-ssa/pr107009.c).   At this point I am not aware of
> >> anything that I'd be concerned about, and the testsuite seems to run
> >> cleanly.
> > Did you enable Ada?  The only feature I don't see implemented is
> > symbolic range handling which boils down to general base + constant offset
> > range endpoints (that's what symbolic ranges allow).  That area was
> > specifically improved to optimize range checks emitted by the Ada frontend
> > but IIRC also applies to fortran -frange-check (not sure about test coverage
> > of that).
> I get a clean testsuite run configured and bootstrapped with
>
> --enable-languages=c,c++,go,fortran,ada,obj-c++,jit --enable-host-shared
>
> Is there a PR or specific tests in either fortran or ada for those
> improvements? I.e., something specific I should check for? Part of ranger's
> point is to be able to do symbolic relationships without storing the
> symbolic in the range, just picking it up from the IL as needed.

I'm defering to Eric here.

Richard.

> Andrew
>
>


Re: vect: Make vect_check_gather_scatter reject offsets that aren't multiples of BITS_PER_UNIT [PR107346]

2022-10-28 Thread Andre Vieira (lists) via Gcc-patches



On 24/10/2022 14:29, Richard Biener wrote:

On Mon, 24 Oct 2022, Andre Vieira (lists) wrote:


Changing if-convert would merely change this testcase but we could still
trigger using a different structure type, changing the size of Int24 to 32
bits rather than 24:
package Loop_Optimization23_Pkg is
   type Nibble is mod 2**4;
   type Int24  is mod 2**32;  -- Changed this from 24->32
   type StructA is record
     a : Nibble;
     b : Int24;
   end record;
   pragma Pack(StructA);
   type StructB is record
     a : Nibble;
     b : StructA;
   end record;
   pragma Pack(StructB);
   type ArrayOfStructB is array(0..100) of StructB;
   procedure Foo (X : in out ArrayOfStructB);
end Loop_Optimization23_Pkg;

This would yield a DR_REF (dr): (*x_7(D))[_1].b.b  where the last 'b' isn't a
DECL_BIT_FIELD anymore, but the first one still is and still has the
non-multiple of BITS_PER_UNIT offset. Thus passing the
vect_find_stmt_data_reference check and triggering the
vect_check_gather_scatter failure. So unless we go and make sure we always set
the DECL_BIT_FIELD on all subsequent accesses of a DECL_BIT_FIELD 'struct'
(which is odd enough on its own) then we are better off catching the issue in
vect_check_gather_scatter ?

But it's not only an issue with scatter-gather, other load/store handling
assumes it can create a pointer to the start of the access and thus
requires BITS_PER_UNIT alignment for each of them.  So we need to fail
at data-ref analysis somehow.

Richard.


Sorry for the delay on this, had some other things come in between. 
After our IRC discussion I believe we agreed that it would be neater to 
check this in vect_check_gather_scatter as I did in the original patch 
in https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604139.html
The main reasons being that to check earlier we'd need to walk the 
DR_REF to look for any FIELD_DECL that has DECL_BIT_FIELD set and we 
decided against that.


Can you confirm the original patch is OK for trunk?

Kind regards,
Andre



Re: RFC - VRP1 default mode

2022-10-28 Thread Andrew MacLeod via Gcc-patches



On 10/28/22 03:17, Richard Biener wrote:

On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:

Figured I would ask what you guys think of making ranger the default for
the VRP1 pass now.

With partial equivalences and the other bits I checked in the past few
weeks I'm not aware of much that the legacy VRP pass gets that ranger
doesn't.  The only exception to that which I am aware of is the trick
played with the unreachable edges to set global ranges, but that is done
in the DOM passes now anyway... so it just happens slightly later in the
optimization cycle.

Note DOM should go away at some point.  Why can this not happen during
ranger driven VRP?


I have been working on that for the last 2 days.  Turns out VRP1 can 
remove builtin_unreachable from the

  if (X)
    __builtin_unreachable ()

idiom and set the appropriate global ranges, but it has to leave those 
with 2 ssa-names:


  if (a_1 != b_2)
    __builtin_unreachable()

until the second pass of VRP or we lose the relationship between a_1 and 
b_2.  That triggers some failures.  Specifically a vectorizer fail 
because it can't be sure that the start and end point are not the same 
without the condition in the IL. Trying to store global relations over 
multiple passes would be problematic at this stage of development, so I 
don't see a problem with leaving it that way.


builtin_unreachable() calls from switches get removed during the second pass 
of switch-conversion... which I presume remains OK.


Anyway, that's pretty much under control.  Patch probably coming later today.




There is one test case that needs adjustment for
that which was just checking for a mask in DOM2
(gcc.dg/tree-ssa/pr107009.c).   At this point I am not aware of 
anything that I'd be concerned about, and the testsuite seems to run 
cleanly.

Did you enable Ada?  The only feature I don't see implemented is
symbolic range handling which boils down to general base + constant offset
range endpoints (that's what symbolic ranges allow).  That area was
specifically improved to optimize range checks emitted by the Ada frontend
but IIRC also applies to fortran -frange-check (not sure about test coverage
of that).

I get a clean testsuite run configured and bootstrapped with

   --enable-languages=c,c++,go,fortran,ada,obj-c++,jit --enable-host-shared

Is there a PR or specific tests in either fortran or ada for those 
improvements? I.e., something specific I should check for? Part of ranger's 
point is to be able to do symbolic relationships without storing the 
symbolic in the range, just picking it up from the IL as needed.


Andrew




Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.

2022-10-28 Thread Dimitrije Milosevic
Hi Richard,

> It's n_invs + 2 * n_cands?

Correct, n_invs + 2 * n_cands, my apologies.

> The comment says we want to prefer eliminating IVs over invariants.  Your 
> patch
> undoes that by weighting invariants the same so it does no longer have
> the effect
> of the comment.

I see how my patch may have confused you.
My concern is the "If we have enough registers." case - if we do have 
enough registers to store both the invariants and induction variables, I think 
the cost 
should be equal to the sum of those. 

I understand that adding another n_cands could be used as a tie-breaker for the 
two 
cases where we do have enough registers and the sum of n_invs and n_cands is 
equal, 
however I think there are two problems with that:
- How often does it happen that we have two cases where we do have enough 
registers,
  n_invs + n_cands sums are equal, and n_cands differ? I think that's pretty 
rare.
- Bumping up the cost by another n_cands may lead to cost for the "If we do have
enough registers." case to be higher than for other cases, which doesn't make 
sense.
I can refer to the test case that I presented in [0] for the second point.
Also worth noting is that the estimate_reg_pressure_cost function (used before 
c18101f) 
follows this:

  /* If we have enough registers, we should use them and not restrict
 the transformations unnecessarily.  */
  if (regs_needed + target_res_regs <= available_regs)
return 0;

As far as preferring to eliminate induction variables if possible, don't we 
already do that,
for example:

  /* If the number of candidates runs out available registers, we penalize
 extra candidate registers using target_spill_cost * 2.  Because it is
 more expensive to spill induction variable than invariant.  */
  else
cost = target_reg_cost [speed] * available_regs
   + target_spill_cost [speed] * (n_cands - available_regs) * 2
   + target_spill_cost [speed] * (regs_needed - n_cands);

To clarify, what my patch did was that it gave every case a base cost of
n_invs + n_cands. This base cost gets bumped up accordingly, for each
one of the cases (by the amount equal to "cost = ..." statement prior to
the return statement in the ivopts_estimate_reg_pressure function).
I agree that my patch isn't clear on my intention, and that it also does
not correspond to the comment. 
What I could do is just return n_new as the cost for the 
"If we do have enough registers." case, but I would love to hear your 
thoughts, if I clarified my intention a little bit.

[0] https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604304.html

Regards,
Dimitrije

From: Richard Biener 
Sent: Friday, October 28, 2022 9:38 AM
To: Dimitrije Milosevic 
Cc: gcc-patches@gcc.gnu.org ; Djordje Todorovic 

Subject: Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating 
register pressure. 
 
On Tue, Oct 25, 2022 at 3:00 PM Dimitrije Milosevic
 wrote:
>
> Hi Richard,
>
> > don't you add n_invs twice now given
> >
> >  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> >  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
> >
> > ?
>
> If you are referring to the "If we have enough registers." case, correct. 
> After c18101f,
> for that case, the returned cost is equal to 2 * n_invs + n_cands.

It's n_invs + 2 * n_cands?  And the comment states the reasoning.

 Before c18101f, for
> that case, the returned cost is equal to n_invs + n_cands. Another solution 
> would be
> to just return n_invs + n_cands if we have enough registers.

The comment says we want to prefer eliminating IVs over invariants.  Your patch
undoes that by weighting invariants the same so it does no longer have
the effect
of the comment.

> Regards,
> Dimitrije
>
>
> From: Richard Biener 
> Sent: Tuesday, October 25, 2022 1:07 PM
> To: Dimitrije Milosevic 
> Cc: gcc-patches@gcc.gnu.org ; Djordje Todorovic 
> 
> Subject: Re: [PATCH 2/2] ivopts: Consider number of invariants when 
> calculating register pressure.
>
> On Fri, Oct 21, 2022 at 3:57 PM Dimitrije Milosevic
>  wrote:
> >
> > From: Dimitrije Milošević 
> >
> > This patch slightly modifies register pressure model function to consider
> > both the number of invariants and the number of candidates, rather than
> > just the number of candidates. This used to be the case before c18101f.
>
> don't you add n_invs twice now given
>
>   unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
>   unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
>
> ?
>
> > gcc/ChangeLog:
> >
> > * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
> >
> > Signed-off-by: Dimitrije Milosevic 
> > ---
> >  gcc/tree-ssa-loop-ivopts.cc | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index d53ba05a4f6..9d0b669d671 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ 

Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.

2022-10-28 Thread Dimitrije Milosevic
Hi Richard,

> But something is wrong then - it shouldn't ever pick a candidate with
> an addressing
> mode that isn't supported?

Test case I presented in [0] only has non-"BASE + OFFSET" candidates. Correct me
if I'm wrong, but the candidate selection algorithm doesn't take into account
which addressing modes are supported by the target?

> So you say that the cost of expressing
> 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> accurately?

I just took it as an example, but yes.

> The function tries to compensate for that, maybe you can point out
> where it goes wrong?
> That is, at the end it adjusts cost and complexity based on what it
> scrapped before, maybe
> that is just a bit incomplete?

I think the cost.cost part is mostly okay, as the costs are just scaled 
(what was lesser/higher before f9f69dd is lesser/higher after f9f69dd).
As far as the adjustments go, I don't think they are complete. 
On the other hand, as complexity is a valid part of address costs, and
it can be used as a tie-breaker, I feel like it should serve a purpose, 
even for targets like Mips which are limited when it comes to 
addressing modes, rather than being equal to 0.

I guess an alternative would be to fully cover cost.cost adjustments, and
leave the complexities to be 0 for non-supported addressing modes.
Currently, they are implemented as follows:

  if (simple_inv)
simple_inv = (aff_inv == NULL
  || aff_combination_const_p (aff_inv)
  || aff_combination_singleton_var_p (aff_inv));
  if (!aff_combination_zero_p (aff_inv))
comp_inv = aff_combination_to_tree (aff_inv);
  if (comp_inv != NULL_TREE)
cost = force_var_cost (data, comp_inv, inv_vars);
  if (ratio != 1 && parts.step == NULL_TREE)
var_cost += mult_by_coeff_cost (ratio, addr_mode, speed);
  if (comp_inv != NULL_TREE && parts.index == NULL_TREE)
var_cost += add_cost (speed, addr_mode);

> Note the original author of this is not available so it would help
> (maybe also yourself) to
> walk through the function with a specific candidate / use where you
> think the complexity
> (or cost) is wrong?

I'd like to refer to [0] where candidate costs didn't get adjusted to 
cover the lack of complexity calculation.

Would love to hear your thoughts.

[0] https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604304.html

Regards,
Dimitrije

From: Richard Biener 
Sent: Friday, October 28, 2022 9:00 AM
To: Dimitrije Milosevic 
Cc: Jeff Law ; gcc-patches@gcc.gnu.org 
; Djordje Todorovic 
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity. 
 
On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
 wrote:
>
> Hi Jeff,
>
> > The part I don't understand is, if you only have BASE+OFF, why does
> > preventing the calculation of more complex addressing modes matter?  ie,
> > what's the point of computing the cost of something like base + off +
> > scaled index when the target can't utilize it?
>
> Well, the complexities of all addressing modes other than BASE + OFFSET are
> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would 
> still
> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> than a candidate with BASE + INDEX, for example, as it has to compensate
> the lack of other addressing modes somehow. If complexities for both of
> those are equal to 0, in cases where complexities decide which candidate is
> to be chosen, a more complex candidate may be picked.

But something is wrong then - it shouldn't ever pick a candidate with
an addressing
mode that isn't supported?  So you say that the cost of expressing
'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
accurately?

The function tries to compensate for that, maybe you can point out
where it goes wrong?
That is, at the end it adjusts cost and complexity based on what it
scrapped before, maybe
that is just a bit incomplete?

Note the original author of this is not available so it would help
(maybe also yourself) to
walk through the function with a specific candidate / use where you
think the complexity
(or cost) is wrong?


> Regards,
> Dimitrije
>
>
> From: Jeff Law 
> Sent: Friday, October 28, 2022 1:02 AM
> To: Dimitrije Milosevic ; 
> gcc-patches@gcc.gnu.org 
> Cc: Djordje Todorovic 
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost 
> complexity.
>
>
> On 10/21/22 07:52, Dimitrije Milosevic wrote:
> > From: Dimitrije Milošević 
> >
> > This patch reverts the computation of address cost complexity
> > to the legacy one. After f9f69dd, complexity is calculated
> > using the valid_mem_ref_p target hook. Architectures like
> > Mips only allow BASE + OFFSET addressing modes, which in turn
> > prevents the calculation of complexity for other addressing
> > modes, resulting in non-optimal candidate selection.
> >
> > gcc/ChangeLog:
> >
> >    * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
> >    to non-static.
> >    * 

[PATCH] tree-optimization/107407 - wrong code with DSE

2022-10-28 Thread Richard Biener via Gcc-patches
So what happens is that we elide processing of this check with

  /* In addition to kills we can remove defs whose only use
 is another def in defs.  That can only ever be PHIs of which
 we track two for simplicity reasons, the first and last in
 {first,last}_phi_def (we fail for multiple PHIs anyways).
 We can also ignore defs that feed only into
 already visited PHIs.  */
  else if (single_imm_use (vdef, &use_p, &use_stmt)
   && (use_stmt == first_phi_def
   || use_stmt == last_phi_def
   || (gimple_code (use_stmt) == GIMPLE_PHI
       && bitmap_bit_p (visited,
SSA_NAME_VERSION
   (PHI_RESULT (use_stmt))))))

where we have the last PHI being the outer loop virtual PHI and the first
PHI being the loop exit PHI of the outer loop and we've already processed
the single immediate use of the outer loop PHI, the inner loop PHI.  But
we still have to perform the above check!

It's easiest to perform the check when we visit the PHI node instead of
delaying it to the actual processing loop.

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

PR tree-optimization/107407
* tree-ssa-dse.cc (dse_classify_store): Perform backedge
varying index check when collecting PHI uses rather than
after optimizing processing of the candidate defs.

* gcc.dg/torture/pr107407.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr107407.c | 26 +
 gcc/tree-ssa-dse.cc | 17 
 2 files changed, 35 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr107407.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr107407.c 
b/gcc/testsuite/gcc.dg/torture/pr107407.c
new file mode 100644
index 000..228fce1e699
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr107407.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+int *a;
+int c[4];
+int d;
+
+static int
+f(char k, int j)
+{
+  for (; k <= 3; k++)
+{
      a = &c[k];
+  for (; d <= 1; d++)
+*a = 3;
+}
+  *a = 0;
+}
+
+int main()
+{
+  int i;
+  f(0, 0);
+  if (c[0] != 3)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index c14e5e43eb3..82976bdbeb0 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -978,14 +978,6 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
 
   if (gimple_code (temp) == GIMPLE_PHI)
{
- /* If we visit this PHI by following a backedge then we have to
-make sure ref->ref only refers to SSA names that are invariant
-with respect to the loop represented by this PHI node.  */
- if (dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
- gimple_bb (temp))
- && !for_each_index (ref->ref ? &ref->ref : &ref->base,
- check_name, gimple_bb (temp)))
-   return DSE_STORE_LIVE;
  defvar = PHI_RESULT (temp);
  bitmap_set_bit (visited, SSA_NAME_VERSION (defvar));
}
@@ -1019,6 +1011,15 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
  if (!bitmap_bit_p (visited,
 SSA_NAME_VERSION (PHI_RESULT (use_stmt
{
+ /* If we visit this PHI by following a backedge then we have
+to make sure ref->ref only refers to SSA names that are
+invariant with respect to the loop represented by this
+PHI node.  */
+ if (dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
+ gimple_bb (use_stmt))
+ && !for_each_index (ref->ref ? &ref->ref : &ref->base,
+ check_name, gimple_bb (use_stmt)))
+   return DSE_STORE_LIVE;
  defs.safe_push (use_stmt);
  if (!first_phi_def)
   first_phi_def = as_a <gphi *> (use_stmt);
-- 
2.35.3


[PATCH] tree-optimization/107447 - avoid hoisting returns-twice calls in LIM

2022-10-28 Thread Richard Biener via Gcc-patches
The following makes sure to not hoist returns-twice calls in LIM
since we have no way to move the abnormal edge associated with it
and we are prone having stray abnormal edges in the IL which will
then cause IL verification failures even when the actual call
does not return twice.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107447
* tree-ssa-loop-im.cc (determine_max_movement): Do not
hoist returns-twice calls.

* gcc.dg/torture/pr107447.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr107447.c | 23 +++
 gcc/tree-ssa-loop-im.cc | 13 +
 2 files changed, 32 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr107447.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr107447.c 
b/gcc/testsuite/gcc.dg/torture/pr107447.c
new file mode 100644
index 000..06f7b7b15ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr107447.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+
+int n;
+
+void
+bar (int, int);
+
+__attribute__ ((noinline, returns_twice)) int
+zero (void)
+{
+  return 0;
+}
+
+void
+foo (void)
+{
+  (void) zero ();
+
+  n = 0;
+
+  for (;;)
+bar (zero (), n);
+}
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 2ea815050d1..2119d4072d3 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -835,10 +835,15 @@ determine_max_movement (gimple *stmt, bool 
must_preserve_exec)
 
   return true;
 }
-  else
-FOR_EACH_SSA_TREE_OPERAND (val, stmt, iter, SSA_OP_USE)
-  if (!add_dependency (val, lim_data, loop, true))
-   return false;
+
+  /* A stmt that receives abnormal edges cannot be hoisted.  */
+  if (is_a <gcall *> (stmt)
+  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE))
+return false;
+
+  FOR_EACH_SSA_TREE_OPERAND (val, stmt, iter, SSA_OP_USE)
+if (!add_dependency (val, lim_data, loop, true))
+  return false;
 
   if (gimple_vuse (stmt))
 {
-- 
2.35.3


Re: Rust frontend patches v3

2022-10-28 Thread David Malcolm via Gcc-patches
On Fri, 2022-10-28 at 13:48 +0200, Arthur Cohen wrote:
> Hi David,
> 
> On 10/26/22 23:15, David Malcolm wrote:
> > On Wed, 2022-10-26 at 10:17 +0200, arthur.co...@embecosm.com wrote:
> > > This is the fixed version of our previous patch set for gccrs -
> > > We've
> > > adressed
> > > the comments raised in our previous emails.
> > 
> > [...snip...]
> > 
> > (Caveat: I'm not a global reviewer)
> > 
> > Sorry if this is answered in the docs in the patch kit, but a high-
> > level question: what's the interaction between gccrs and gcc's
> > garbage
> > collector?  Are the only GC-managed objects (such as trees) either
> > (a)
> > created near the end of the gccrs, or (b) common globals created at
> > initialization and with GTY roots? 
> 
> We only create trees at the last point of our compilation pipeline, 
> before directly writing them to the backend. This then calls a 
> `write_global_definitions` method, that we ported over directly from
> the 
> Go frontend. Among other things, this method has the role of
> preserving 
> trees from the GC using `go_preserve_from_gc()` (or 
> `rust_preserve_from_gc()` in our case).
> 
> Elsewhere in our pipeline, we never call any garbage-collection
> routines 
> or GC-related functions.
> 
> > Are there any points where a collection happen within gccrs?  Or is
> > almost everything stored using
> > gccrs's own data structures, and are these managed in the regular
> > (non-
> > GC) heap?
> 
> This is correct. We have an AST representation, implemented using
> unique 
> pointers, which is then lowered to an HIR, also using unique
> pointers.
> 
> > I skimmed the patches and see that gccrs uses e.g. std::vector,
> > std::unique_ptr, std::map, and std::string; this seems reasonable
> > to
> > me, but it got me thinking about memory management strategies.
> > 
> > I see various std::map e.g. in Rust::Compile::Context; so
> > e.g.
> > is the GC guaranteed never to collect whilst this is live?
> 
> This is a really interesting question, and I hope the answer is yes!
> But 
> I'm unsure as to how to enforce that, as I am not too familiar with
> the 
> GCC GC. I'm hoping someone else will weigh in. As I said, we do not
> do 
> anything particular with the GC during the execution of our 
> `CompileCrate` visitor, so hopefully it shouldn't run.

I'm guessing that almost all of gccrs testing so far has been on
relatively small examples, so that even if the GC considers collecting,
the memory usage might not have exceeded the threshold for actually
doing the mark-and-sweep collection, and so no collection has been
happening during your testing.

In case you haven't tried yet, you might want to try adding:
  --param=ggc-min-expand=0 --param=ggc-min-heapsize=0
which IIRC forces the GC to actually do its mark-and-sweep collection
at every potential point where it might collect.

I use these params in libgccjit's test suite; it massively slows things
down, but it makes any GC misuse crash immediately even on minimal test
cases, rather than hiding problems until you have a big (and thus
nasty) test case.
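For instance (an illustrative command line only — the driver path and
input file below are placeholders, not taken from this thread):

```shell
# Placeholder paths/input: adjust to your build tree and test file.
# These two --param values force a full mark-and-sweep at every
# potential collection point, trading compile time for crashing
# immediately on GC misuse.
GGC_STRESS='--param=ggc-min-expand=0 --param=ggc-min-heapsize=0'
echo "./gcc/xgcc -B./gcc $GGC_STRESS -c test.rs"
```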

Hope this is helpful
Dave


> 
> > Hope this is constructive
> > Dave
> > 
> 
> Thanks a lot for the input,
> 
> All the best,
> 
> Arthur
> 
> 
> 
> 



[PATCH] tree-optimization/107435 - ICE with recurrence vectorization

2022-10-28 Thread Richard Biener via Gcc-patches
This implements the missed conversion from pointer to integer.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107435
* tree-vect-loop.cc (vectorizable_recurr): Convert initial
value to vector component type.

* gcc.dg/torture/pr107435.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr107435.c | 23 +++
 gcc/tree-vect-loop.cc   |  6 ++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr107435.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr107435.c 
b/gcc/testsuite/gcc.dg/torture/pr107435.c
new file mode 100644
index 000..10128969191
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr107435.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */
+
+struct qlist_head {
+  struct qlist_head *next;
+};
+void
+qlist_add (struct qlist_head *new, struct qlist_head *head)
+{
+  struct qlist_head *prev = head;
+  new->next = head->next;
+  prev->next = new;
+}
+struct {
+  struct qlist_head queue_link;
+} free_list, prealloc[64];
+void
+dbpf_open_cache_initialize()
+{
+  int i = 0;
+  for (; i < 64; i++)
+    qlist_add(&prealloc[i].queue_link, &free_list.queue_link);
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d5c2bff80be..aacbb12580c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8469,6 +8469,12 @@ vectorizable_recurr (loop_vec_info loop_vinfo, 
stmt_vec_info stmt_info,
   edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
   basic_block bb = gimple_bb (phi);
   tree preheader = PHI_ARG_DEF_FROM_EDGE (phi, pe);
+  if (!useless_type_conversion_p (TREE_TYPE (vectype), TREE_TYPE (preheader)))
+{
+  gimple_seq stmts = NULL;
+  preheader = gimple_convert (&stmts, TREE_TYPE (vectype), preheader);
+  gsi_insert_seq_on_edge_immediate (pe, stmts);
+}
   tree vec_init = build_vector_from_val (vectype, preheader);
   vec_init = vect_init_vector (loop_vinfo, stmt_info, vec_init, vectype, NULL);
 
-- 
2.35.3


Re: [PATCH] 16/19 modula2 front end: bootstrap and documentation tools

2022-10-28 Thread Gaius Mulley via Gcc-patches
Martin Liška  writes:

> Hello.
>
> I noticed the devel/modula-2 branch contains the following dead links:
>
> - http://www.gccsummit.org/2006
> - http://www.gccsummit.org/2010/speakers.php?types=LIGHTNING
> - http://floppsie.comp.glam.ac.uk/Papers/paper23/gaius-mulley-gnu-m2.pdf
> - http://floppsie.comp.glam.ac.uk/Papers/paper15/mulley-proc.pdf
> - http://floppsie.comp.glam.ac.uk/Papers/paper22/gaius-gcc-cauldron-2016.pdf
>
> Thanks,
> Martin

many thanks - I will remove these,

regards,
Gaius


Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-28 Thread David Malcolm via Gcc-patches
On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> This simplifies the interface for other UTF-8 validity detections
> when a
> simple "yes" or "no" answer is sufficient.
> 
> Signed-off-by: Ben Boeckel 
> ---
>  libcpp/ChangeLog  |  6 ++
>  libcpp/charset.cc | 18 ++
>  libcpp/internal.h |  2 ++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 4d707277531..4e2c7900ae2 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel  
> +
> +   * include/charset.cc: Add `_cpp_valid_utf8_str` which determines
> +   whether a C string is valid UTF-8 or not.
> +   * include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
> +
>  2022-10-27  Ben Boeckel  
>  
> * include/charset.cc: Reject encodings of codepoints above
> 0x10FFFF.

The patch looks good to me, with the same potential caveat that you
might need to move the ChangeLog entry from the patch "body" to the
leading blurb, to satisfy:
  ./contrib/gcc-changelog/git_check_commit.py

Thanks
Dave



Re: [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF

2022-10-28 Thread David Malcolm via Gcc-patches
On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> Unicode does not support such values because they are unrepresentable
> in
> UTF-16.

Wikipedia pointed me to RFC 3629, which was when UTF-8 introduced this
restriction, whereas libcpp was implementing the higher upper limit
from the earlier, superseded RFC 2279.

The patch looks good to me, assuming it bootstraps and passes usual
regression testing, but...
> 
> Signed-off-by: Ben Boeckel 
> ---
>  libcpp/ChangeLog  | 6 ++
>  libcpp/charset.cc | 4 ++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 18d5bcceaf0..4d707277531 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel  
> +
> +   * include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
> +   UTF-16 does not support such codepoints and therefore all Unicode
> +   rejects such values.
> +
>  2022-10-19  Lewis Hyatt  

...AIUI we now put ChangeLog entries in the blurb part of the patch, so
that server-side git scripts add them to the actual ChangeLog file.

Does the patch pass:
  ./contrib/gcc-changelog/git_check_commit.py
?

Thanks
Dave

>  
> * include/cpplib.h (struct cpp_string): Use new
> "string_length" GTY.
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index 12a398e7527..e9da6674b5f 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t
> *inbytesleftp,
>    if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
>  
>    /* Make sure the character is valid.  */
> -  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
> +  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
>  
>    *cp = c;
>    *inbufp = inbuf;
> @@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar
> **inbufp, size_t *inbytesleftp,
>    s += inbuf[bigend ? 2 : 1] << 8;
>    s += inbuf[bigend ? 3 : 0];
>  
> -  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
> +  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
>  return EILSEQ;
>  
>    rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);



[PATCH v4] RISC-V: Libitm add RISC-V support.

2022-10-28 Thread Xiongchuan Tan via Gcc-patches
Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

libitm/ChangeLog:

* configure.tgt: Add riscv support.
* config/riscv/asm.h: New file.
* config/riscv/sjlj.S: New file.
* config/riscv/target.h: New file.
---
v2: Change HW_CACHELINE_SIZE to 64 (in accordance with the RVA profiles, see
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc)

v3: Ensure the stack is aligned to 16 bytes; make use of Zihintpause in
cpu_relax()

v4: Add a guard for unsupported RV32E

 libitm/config/riscv/asm.h|  58 ++
 libitm/config/riscv/sjlj.S   | 144 +++
 libitm/config/riscv/target.h |  62 +++
 libitm/configure.tgt |   2 +
 4 files changed, 266 insertions(+)
 create mode 100644 libitm/config/riscv/asm.h
 create mode 100644 libitm/config/riscv/sjlj.S
 create mode 100644 libitm/config/riscv/target.h

diff --git a/libitm/config/riscv/asm.h b/libitm/config/riscv/asm.h
new file mode 100644
index 000..8d02117
--- /dev/null
+++ b/libitm/config/riscv/asm.h
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Xiongchuan Tan .
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _RV_ASM_H
+#define _RV_ASM_H
+
+#ifdef __riscv_e
+#  error "rv32e unsupported"
+#endif
+
+#if __riscv_xlen == 64
+#  define GPR_L ld
+#  define GPR_S sd
+#  define SZ_GPR 8
+#  define LEN_GPR 14
+#elif __riscv_xlen == 32
+#  define GPR_L lw
+#  define GPR_S sw
+#  define SZ_GPR 4
+#  define LEN_GPR 16 /* Extra padding to align the stack to 16 bytes */
+#else
+#  error Unsupported XLEN (must be 64-bit or 32-bit).
+#endif
+
+#if defined(__riscv_flen) && __riscv_flen == 64
+#  define FPR_L fld
+#  define FPR_S fsd
+#  define SZ_FPR 8
+#elif defined(__riscv_flen) && __riscv_flen == 32
+#  define FPR_L flw
+#  define FPR_S fsw
+#  define SZ_FPR 4
+#else
+#  define SZ_FPR 0
+#endif
+
+#endif  /* _RV_ASM_H */
diff --git a/libitm/config/riscv/sjlj.S b/libitm/config/riscv/sjlj.S
new file mode 100644
index 000..93f12ec
--- /dev/null
+++ b/libitm/config/riscv/sjlj.S
@@ -0,0 +1,144 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Xiongchuan Tan .
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "asmcfi.h"
+#include "asm.h"
+
+   .text
+   .align  2
+   .global _ITM_beginTransaction
+   .type   _ITM_beginTransaction, @function
+
+_ITM_beginTransaction:
+   cfi_startproc
+   mv a1, sp
+   addi sp, sp, -(LEN_GPR*SZ_GPR+12*SZ_FPR)
+   cfi_adjust_cfa_offset(LEN_GPR*SZ_GPR+12*SZ_FPR)
+
+   /* Return Address */
+   GPR_S ra, 0*SZ_GPR(sp)
+   cfi_rel_offset(ra, 0*SZ_GPR)
+
+   /* Caller's sp */
+   GPR_S a1, 1*SZ_GPR(sp)
+
+   /* Caller's s0/fp */
+   GPR_S fp, 2*SZ_GPR(sp)
+   cfi_rel_offset(fp, 2*SZ_GPR)
+
+   /* Callee-saved registers */
+   GPR_S s1, 3*SZ_GPR(sp)
+   GPR_S s2, 4*SZ_GPR(sp)
+   GPR_S s3, 5*SZ_GPR(sp)
+   GPR_S s4, 6*SZ_GPR(sp)
+   GPR_S s5, 

Re: Rust frontend patches v3

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, Oct 28, 2022 at 1:45 PM Arthur Cohen  wrote:
>
> Hi David,
>
> On 10/26/22 23:15, David Malcolm wrote:
> > On Wed, 2022-10-26 at 10:17 +0200, arthur.co...@embecosm.com wrote:
> >> This is the fixed version of our previous patch set for gccrs - We've
> >> adressed
> >> the comments raised in our previous emails.
> >
> > [...snip...]
> >
> > (Caveat: I'm not a global reviewer)
> >
> > Sorry if this is answered in the docs in the patch kit, but a high-
> > level question: what's the interaction between gccrs and gcc's garbage
> > collector?  Are the only GC-managed objects (such as trees) either (a)
> > created near the end of the gccrs, or (b) common globals created at
> > initialization and with GTY roots?
>
> We only create trees at the last point of our compilation pipeline,
> before directly writing them to the backend. This then calls a
> `write_global_definitions` method, that we ported over directly from the
> Go frontend. Among other things, this method has the role of preserving
> trees from the GC using `go_preserve_from_gc()` (or
> `rust_preserve_from_gc()` in our case).
>
> Elsewhere in our pipeline, we never call any garbage-collection routines
> or GC-related functions.
>
> > Are there any points where a collection happen within gccrs?  Or is almost 
> > everything stored using
> > gccrs's own data structures, and are these managed in the regular (non-
> > GC) heap?
>
> This is correct. We have an AST representation, implemented using unique
> pointers, which is then lowered to an HIR, also using unique pointers.
>
> > I skimmed the patches and see that gccrs uses e.g. std::vector,
> > std::unique_ptr, std::map, and std::string; this seems reasonable to
> > me, but it got me thinking about memory management strategies.
> >
> > I see various std::map e.g. in Rust::Compile::Context; so e.g.
> > is the GC guaranteed never to collect whilst this is live?
>
> This is a really interesting question, and I hope the answer is yes! But
> I'm unsure as to how to enforce that, as I am not too familiar with the
> GCC GC. I'm hoping someone else will weigh in. As I said, we do not do
> anything particular with the GC during the execution of our
> `CompileCrate` visitor, so hopefully it shouldn't run.

collection points are explicit, but some might be hidden behind
middle-end APIs, in particular once you call cgraph::finalize_compilation_unit
you should probably expect collection.

Richard.

> > Hope this is constructive
> > Dave
> >
>
> Thanks a lot for the input,
>
> All the best,
>
> Arthur
>
>
>
>


Re: [PATCH] Restore RTL alias analysis for hard frame pointer

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, Oct 28, 2022 at 2:27 PM Richard Biener
 wrote:
>
> On Fri, Oct 28, 2022 at 11:11 AM Eric Botcazou via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > the following change:
> >
> > 2021-07-28  Bin Cheng  
> >
> > * alias.c (init_alias_analysis): Don't skip prologue/epilogue.
> >
> > broke the alias analysis for the hard frame pointer (when it is used as a
> > frame pointer, i.e. when the frame pointer is not eliminated) described in 
> > the
> > large comment at the top of the file, because static_reg_base_value is set 
> > for
> > it and, consequently, new_reg_base_value too.  So when the instruction 
> > saving
> > the stack pointer into the hard frame pointer in the prologue is processed, 
> > it
> > is viewed as a second set of the hard frame pointer and to a different value
> > by record_set, which then resets new_reg_base_value to 0 and the game is 
> > over.
> >
> > This e.g. hampers the performance of the var-tracking RTL pass for 
> > parameters
> > passed on the stack like on x86, leading to regressions when debugging, but
> > code generation is very likely affected too.
> >
> > Bootstrapped/regtested on x86-64/Linux, OK for mainline and 12 branch?
>
> OK for trunk and 12 after a while of burn-in.

Oh, do you have a testcase suitable for the testsuite?

> Thanks,
> Richard.
>
> >
> > 2022-10-28  Eric Botcazou  
> >
> > * alias.cc (init_alias_analysis): Do not record sets to the hard
> > frame pointer if the frame pointer has not been eliminated.
> >
> > --
> > Eric Botcazou


Re: [PATCH] Restore RTL alias analysis for hard frame pointer

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, Oct 28, 2022 at 11:11 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> the following change:
>
> 2021-07-28  Bin Cheng  
>
> * alias.c (init_alias_analysis): Don't skip prologue/epilogue.
>
> broke the alias analysis for the hard frame pointer (when it is used as a
> frame pointer, i.e. when the frame pointer is not eliminated) described in the
> large comment at the top of the file, because static_reg_base_value is set for
> it and, consequently, new_reg_base_value too.  So when the instruction saving
> the stack pointer into the hard frame pointer in the prologue is processed, it
> is viewed as a second set of the hard frame pointer and to a different value
> by record_set, which then resets new_reg_base_value to 0 and the game is over.
>
> This e.g. hampers the performance of the var-tracking RTL pass for parameters
> passed on the stack like on x86, leading to regressions when debugging, but
> code generation is very likely affected too.
>
> Bootstrapped/regtested on x86-64/Linux, OK for mainline and 12 branch?

OK for trunk and 12 after a while of burn-in.

Thanks,
Richard.

>
> 2022-10-28  Eric Botcazou  
>
> * alias.cc (init_alias_analysis): Do not record sets to the hard
> frame pointer if the frame pointer has not been eliminated.
>
> --
> Eric Botcazou


Re: Rust frontend patches v3

2022-10-28 Thread Arthur Cohen

Hi David,

On 10/26/22 23:15, David Malcolm wrote:

> On Wed, 2022-10-26 at 10:17 +0200, arthur.co...@embecosm.com wrote:
> > This is the fixed version of our previous patch set for gccrs - We've
> > addressed the comments raised in our previous emails.

> [...snip...]

> (Caveat: I'm not a global reviewer)

> Sorry if this is answered in the docs in the patch kit, but a high-
> level question: what's the interaction between gccrs and gcc's garbage
> collector?  Are the only GC-managed objects (such as trees) either (a)
> created near the end of the gccrs, or (b) common globals created at
> initialization and with GTY roots?


We only create trees at the last point of our compilation pipeline, 
before directly writing them to the backend. This then calls a 
`write_global_definitions` method, that we ported over directly from the 
Go frontend. Among other things, this method has the role of preserving 
trees from the GC using `go_preserve_from_gc()` (or 
`rust_preserve_from_gc()` in our case).


Elsewhere in our pipeline, we never call any garbage-collection routines 
or GC-related functions.



> Are there any points where a collection happen within gccrs?  Or is
> almost everything stored using gccrs's own data structures, and are
> these managed in the regular (non-GC) heap?


This is correct. We have an AST representation, implemented using unique 
pointers, which is then lowered to an HIR, also using unique pointers.



> I skimmed the patches and see that gccrs uses e.g. std::vector,
> std::unique_ptr, std::map, and std::string; this seems reasonable to
> me, but it got me thinking about memory management strategies.
>
> I see various std::map e.g. in Rust::Compile::Context; so e.g.
> is the GC guaranteed never to collect whilst this is live?


This is a really interesting question, and I hope the answer is yes! But 
I'm unsure as to how to enforce that, as I am not too familiar with the 
GCC GC. I'm hoping someone else will weigh in. As I said, we do not do 
anything particular with the GC during the execution of our 
`CompileCrate` visitor, so hopefully it shouldn't run.



> Hope this is constructive
> Dave



Thanks a lot for the input,

All the best,

Arthur








Re: [PATCH 2/3] Add lto-dump tool.

2022-10-28 Thread Thomas Schwinge
Hi!

This minor clean-up had fallen out of me working on something else in
GCC's options machinery, several months ago:

On 2019-03-12T18:14:04+0100, marxin  wrote:
> gcc/lto/ChangeLog:

>   * lang.opt: Add new language LTODump and options related
>   to LTO dump tool.

As this new "Language" 'LTODump' does not share any options with 'LTO'
proper, it makes sense, in my opinion, to also make that obvious in
'gcc/lto/lang.opt', which your Subversion r270897 (Git
commit 66d62d9f2e6b059be6a018397fba555147133a9a) "Add lto-dump tool"
almost ;-) did:

> --- a/gcc/lto/lang.opt
> +++ b/gcc/lto/lang.opt
> @@ -24,6 +24,9 @@
>  Language
>  LTO
>
> +Language
> +LTODump
> +
>  Enum
>  Name(lto_linker_output) Type(enum lto_linker_output) UnknownError(unknown 
> linker output %qs)
>
> @@ -66,6 +69,65 @@ fwpa=
>  LTO Driver RejectNegative Joined Var(flag_wpa)
>  Whole program analysis (WPA) mode with number of parallel jobs specified.
>
> +
> +[LTODump option records]
> +
> +
>  fresolution=
>  LTO Joined
>  The resolution file.

OK to push the attached
"Better separate 'LTO' vs. 'LTODump' in 'gcc/lto/lang.opt'"?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 7fe1d5b8d39d863285e14fbb186599dcf6bba986 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 1 Apr 2022 19:52:54 +0200
Subject: [PATCH] Better separate 'LTO' vs. 'LTODump' in 'gcc/lto/lang.opt'

Minor clean-up after Subversion r270897 (Git
commit 66d62d9f2e6b059be6a018397fba555147133a9a) "Add lto-dump tool".

No change in generated files.

	gcc/lto/
	* lang.opt: Better separate 'LTO' vs. 'LTODump'.
---
 gcc/lto/lang.opt | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
index 550a50fc188..1ad2967d9cf 100644
--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -24,9 +24,6 @@
 Language
 LTO
 
-Language
-LTODump
-
 Enum
 Name(lto_linker_output) Type(enum lto_linker_output) UnknownError(unknown linker output %qs)
 
@@ -52,7 +49,6 @@ flinker-output=
 LTO Driver Joined RejectNegative Enum(lto_linker_output) Var(flag_lto_linker_output) Init(LTO_LINKER_OUTPUT_UNKNOWN)
 Set linker output type (used internally during LTO optimization).
 
-
 fltrans
 LTO Var(flag_ltrans)
 Run the link-time optimizer in local transformation (LTRANS) mode.
@@ -61,6 +57,10 @@ fltrans-output-list=
 LTO Joined Var(ltrans_output_list)
 Specify a file to which a list of files output by LTRANS is written.
 
+fresolution=
+LTO Joined
+The resolution file.
+
 fwpa
 LTO Driver
 Run the link-time optimizer in whole program analysis (WPA) mode.
@@ -70,6 +70,9 @@ LTO Driver RejectNegative Joined Var(flag_wpa)
 Whole program analysis (WPA) mode with number of parallel jobs specified.
 
 
+Language
+LTODump
+
 list
 LTODump Var(flag_lto_dump_list)
 Call the dump function for variables and function in IL.
@@ -131,8 +134,4 @@ callgraph
 LTODump Var(flag_dump_callgraph)
 Dump the symtab callgraph.
 
-fresolution=
-LTO Joined
-The resolution file.
-
 ; This comment is to ensure we retain the blank line above.
-- 
2.35.1



RE: [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial.

2022-10-28 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Andrew Pinski 
> Sent: Thursday, October 27, 2022 4:15 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; nd ; Richard Sandiford
> ; Marcus Shawcroft
> 
> Subject: Re: [PATCH 2/3]AArch64 Promote function arguments using a
> paradoxical subreg when beneficial.
> 
> On Fri, May 13, 2022 at 10:14 AM Tamar Christina via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > The PROMOTE_MODE always promotes 8 and 16-bit parameters to 32-bits.
> > This promotion is not required for the ABI which states:
> >
> >
> > ```
> > C.9 If the argument is an Integral or Pointer Type, the size of the
> > argument is less than or equal to 8 bytes and the NGRN is less than 8, the
> > argument is copied to the least significant bits in x[NGRN]. The NGRN is
> > incremented by one.
> > The argument has now been allocated.
> >
> > C.16 If the size of the argument is less than 8 bytes then the size of the
> > argument is set to 8 bytes. The effect is as if the argument was
> > copied to the least significant bits of a 64-bit register and the
> > remaining bits filled with unspecified values
> > ```
> >
> > That is, the bits in the registers are unspecified and callees cannot
> > assume any particular status.
> >
> > This means that we can avoid the promotion and still get correct code
> > as the language level promotion rules require values to be extended
> > when the bits are significant.
> >
> > So if we are .e.g OR-ing two 8-bit values no extend is needed as the
> > top bits are irrelevant.  If we are doing e.g. addition, then the top
> > bits *might* be relevant depending on the result type.  But the middle
> > end will always contain the appropriate extend in those cases.
> >
> > The mid-end also has optimizations around this assumption and the
> > AArch64 port actively undoes them.
> >
> > So for instance
> >
> > uint16_t fd (uint8_t xr){
> > return xr + 1;
> > }
> >
> > uint8_t fd2 (uint8_t xr){
> > return xr + 1;
> > }
> >
> > should produce
> >
> > fd: // @fd
> > and w8, w0, #0xff
> > add w0, w8, #1
> > ret
> > fd2:// @fd2
> > add w0, w0, #1
> > ret
> >
> > like clang does instead of
> >
> > fd:
> > and w0, w0, 255
> > add w0, w0, 1
> > ret
> > fd2:
> > and w0, w0, 255
> > add w0, w0, 1
> > ret
> >
> > like we do now.  Removing this forced expansion maintains correctness
> > but fixes issues with various codegen defects.  It also brings us inline 
> > with
> clang.
> >
> > Note that C, C++ and Fortran etc all correctly specify what should
> > happen w.r.t extends and e.g. array indexing, pointer arith etc so we
> > never get incorrect code.
> >
> > There is however a second reason for doing this promotion: RTL efficiency.
> > The promotion stops us from having to promote the values to SI to be
> > able to use them in instructions and then truncating again afterwards.
> >
> > To get both the efficiency and the simpler RTL we can instead promote
> > to a paradoxical subreg.  This patch implements the hook for AArch64
> > and adds an explicit opt-out for values that feed into comparisons.  This is
> done because:
> >
> > 1. our comparisons patterns already allow us to absorb the zero extend
> > 2. The extension allows us to use cbz/cbnz/tbz etc.  In some cases
> > such as
> >
> > int foo (char a, char b)
> > {
> >if (a)
> >  if (b)
> >bar1 ();
> >  else
> >...
> > else
> >  if (b)
> >bar2 ();
> >  else
> >...
> > }
> >
> > by zero extending the value we can avoid having to repeatedly test the
> > value before a branch.  Allowing the zero extend also allows our
> > existing `ands` patterns to work as expected.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > I have to commit this and the last patch together but ease of review I
> > have split them up here. However 209 missed optimization xfails are
> > fixed.
> >
> > No performance difference on SPECCPU 2017 but no failures.
> >
> > Ok for master?
> 
> Did this patch ever get approved? It is a nice improvement that would be good
> to get into GCC 13 before the close of stage 1.

No, it was requested that I make a standalone pass that introduces a new kind
of extension in the mid-end.  Unfortunately, due to constraints on how much
time I can dedicate to that this year, I've had to drop it for GCC 13.

I'll try to pick it up again during GCC 14.

Regards,
Tamar

> 
> Thanks,
> Andrew
> 
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc
> (aarch64_promote_function_args_subreg_p):
> > (TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P): New.
> > * config/aarch64/aarch64.h (PROMOTE_MODE): Expand doc.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/apc-subreg.c: 

Re: [PATCH v3] LoongArch: Libvtv add loongarch support.

2022-10-28 Thread WANG Xuerui

Hi,

The code change seems good but a few grammatical nits.

Patch subject should be a verb phrase, something like "libvtv: add 
LoongArch support" could be better.


On 2022/10/28 16:01, Lulu Cheng wrote:

After some consideration, I decided to set VTV_PAGE_SIZE to 16KB for
loongarch64.


v1 -> v2:

1. When the macro __loongarch_lp64 is defined, the VTV_PAGE_SIZE is set to 64K.
2. In the vtv_malloc.cc file __vtv_malloc_init function, it does not check
whether VTV_PAGE_SIZE is equal to the system page size, if the macro
__loongarch_lp64 is defined.

v2 -> v3:

Set VTV_PAGE_SIZE to 16KB under loongarch64.



All regression tests of libvtv passed.

 === libvtv Summary ===

# of expected passes176

-


Are the version notes and changelog supposed to be a part of the actual 
commit? If not, conventionally they should be placed *after* the "---" 
line separating the commit message and diffstat/patch content.




The loongarch64 kernel supports 4KB,16KB, or 64KB pages,
but only 16k pages are currently supported in this code.
This sentence feels a little bit unnatural. I suggest just "The 
LoongArch specification permits page sizes of 4KiB, 16KiB and 64KiB, but 
only 16KiB pages are supported for now".


Co-Authored-By: qijingwen 

include/ChangeLog:

* vtv-change-permission.h (defined):
(VTV_PAGE_SIZE): Set VTV_PAGE_SIZE to 16KB under loongarch64.

"for loongarch64" feels more natural.


libvtv/ChangeLog:

* configure.tgt: Add loongarch support.
---
  include/vtv-change-permission.h | 5 +
  libvtv/configure.tgt| 3 +++
  2 files changed, 8 insertions(+)

diff --git a/include/vtv-change-permission.h b/include/vtv-change-permission.h
index 70bdad92bca..f61d8b68ef6 100644
--- a/include/vtv-change-permission.h
+++ b/include/vtv-change-permission.h
@@ -48,6 +48,11 @@ extern void __VLTChangePermission (int);
  #else
  #if defined(__sun__) && defined(__svr4__) && defined(__sparc__)
  #define VTV_PAGE_SIZE 8192
+#elif defined(__loongarch_lp64)
+/* The page size can be configured to 4, 16, or 64KB configuring the kernel.

"The page size is configurable by the kernel to be 4, 16 or 64 KiB."

+   However, only 16KB pages are supported here. Please modify this macro if you
+   want to support other page sizes.  */


Are we actually encouraging the users to modify the sources themselves 
if they decide to run with non-16KiB page size? This might not even be 
feasible, as you're essentially telling them to recompile part of the 
toolchain, which they may not want to, or may not be able to, do.


I think the message you want to convey here is for them to voice their 
need upstream so we can then discuss. In that case, the 2 sentences here 
could be:


"For now, only the default page size of 16KiB is supported. If you have 
a need for other page sizes, please get in touch."


Although I'm not sure if the vague "get in touch" wording is 
appropriate. What do others think?



+#define VTV_PAGE_SIZE 16384
  #else
  #define VTV_PAGE_SIZE 4096
  #endif
diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt
index aa2a3f675b8..6cdd1e97ab1 100644
--- a/libvtv/configure.tgt
+++ b/libvtv/configure.tgt
@@ -50,6 +50,9 @@ case "${target}" in
;;
x86_64-*-darwin[1]* | i?86-*-darwin[1]*)
;;
+  loongarch*-*-linux*)
+   VTV_SUPPORTED=yes
+   ;;
*)
;;
  esac


Re: [PATCH v3] RISC-V: Libitm add RISC-V support.

2022-10-28 Thread Kito Cheng via Gcc-patches
I guess we don't really care about RV32E here, but could you add a
guard for that, just in case?

#ifdef __riscv_e
#error "rv32e unsupported"
#endif


On Fri, Oct 28, 2022 at 4:39 PM Xiongchuan Tan via Gcc-patches
 wrote:
>
> Reviewed-by: Palmer Dabbelt 
> Acked-by: Palmer Dabbelt 
>
> libitm/ChangeLog:
>
> * configure.tgt: Add riscv support.
> * config/riscv/asm.h: New file.
> * config/riscv/sjlj.S: New file.
> * config/riscv/target.h: New file.
> ---
> v2: Change HW_CACHELINE_SIZE to 64 (in accordance with the RVA profiles, see
> https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc)
>
> v3: Ensure the stack is aligned to 16 bytes; make use of Zihintpause in
> cpu_relax()
>
>  libitm/config/riscv/asm.h|  54 +
>  libitm/config/riscv/sjlj.S   | 144 +++
>  libitm/config/riscv/target.h |  62 +++
>  libitm/configure.tgt |   2 +
>  4 files changed, 262 insertions(+)
>  create mode 100644 libitm/config/riscv/asm.h
>  create mode 100644 libitm/config/riscv/sjlj.S
>  create mode 100644 libitm/config/riscv/target.h
>
> diff --git a/libitm/config/riscv/asm.h b/libitm/config/riscv/asm.h
> new file mode 100644
> index 000..bb515f2
> --- /dev/null
> +++ b/libitm/config/riscv/asm.h
> @@ -0,0 +1,54 @@
> +/* Copyright (C) 2022 Free Software Foundation, Inc.
> +   Contributed by Xiongchuan Tan .
> +
> +   This file is part of the GNU Transactional Memory Library (libitm).
> +
> +   Libitm is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> +   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +   more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   .  */
> +
> +#ifndef _RV_ASM_H
> +#define _RV_ASM_H
> +
> +#if __riscv_xlen == 64
> +#  define GPR_L ld
> +#  define GPR_S sd
> +#  define SZ_GPR 8
> +#  define LEN_GPR 14
> +#elif __riscv_xlen == 32
> +#  define GPR_L lw
> +#  define GPR_S sw
> +#  define SZ_GPR 4
> +#  define LEN_GPR 16 /* Extra padding to align the stack to 16 bytes */
> +#else
> +#  error Unsupported XLEN (must be 64-bit or 32-bit).
> +#endif
> +
> +#if defined(__riscv_flen) && __riscv_flen == 64
> +#  define FPR_L fld
> +#  define FPR_S fsd
> +#  define SZ_FPR 8
> +#elif defined(__riscv_flen) && __riscv_flen == 32
> +#  define FPR_L flw
> +#  define FPR_S fsw
> +#  define SZ_FPR 4

Add a check here for __riscv_flen being neither 32 nor 64, so that if
we add the Q extension later we can error out.

> diff --git a/libitm/config/riscv/sjlj.S b/libitm/config/riscv/sjlj.S
> new file mode 100644
> index 000..93f12ec
> --- /dev/null
> +++ b/libitm/config/riscv/sjlj.S
> @@ -0,0 +1,144 @@
> +#include "asmcfi.h"
> +#include "asm.h"
> +
> +   .text
> +   .align  2
> +   .global _ITM_beginTransaction
> +   .type   _ITM_beginTransaction, @function
> +
> +_ITM_beginTransaction:
> +   cfi_startproc
> +   mv a1, sp
> +   addi sp, sp, -(LEN_GPR*SZ_GPR+ 12*SZ_FPR)

This expression appears 4 times; maybe define a macro ADJ_STACK_SIZE
or something similar to hold it?

> +   cfi_adjust_cfa_offset(LEN_GPR*SZ_GPR+ 12*SZ_FPR)

> diff --git a/libitm/config/riscv/target.h b/libitm/config/riscv/target.h
> new file mode 100644
> index 000..b8a1665
> --- /dev/null
> +++ b/libitm/config/riscv/target.h
> @@ -0,0 +1,62 @@
> +typedef struct gtm_jmpbuf
> +  {
> +long int pc;
> +void *cfa;
> +long int s[12]; /* Saved registers, s0 is fp */
> +
> +#if __riscv_xlen == 32
> +/* Ensure that the stack is 16-byte aligned */
> +long int padding[2];
> +#endif
> +
> +/* FP saved registers */
> +#if defined(__riscv_flen) && __riscv_flen == 64
> +double fs[12];
> +#elif defined(__riscv_flen) && __riscv_flen == 32
> +float fs[12];

Same here: error out if __riscv_flen is defined but is neither 64 nor 32.


[PATCH] c++: Allow module name to be a single letter on Windows

2022-10-28 Thread Torbjörn SVENSSON via Gcc-patches
On Windows, the ':' character is special, and when the module name is
a single character, like 'A', the flatname would be (for
example) 'A:Foo'. On Windows, 'A:Foo' is treated as an absolute
path by the module loader, so the module file is likely not found.

Without this patch, the test case pr98944_c.C fails with:

In module imported at /src/gcc/testsuite/g++.dg/modules/pr98944_b.C:7:1,
of module A:Foo, imported at /src/gcc/testsuite/g++.dg/modules/pr98944_c.C:7:
A:Internals: error: header module expected, module 'A:Internals' found
A:Internals: error: failed to read compiled module: Bad file data
A:Internals: note: compiled module file is 'gcm.cache/A-Internals.gcm'
In module imported at /src/gcc/testsuite/g++.dg/modules/pr98944_c.C:7:8:
A:Foo: error: failed to read compiled module: Bad import dependency
A:Foo: note: compiled module file is 'gcm.cache/A-Foo.gcm'
A:Foo: fatal error: returning to the gate for a mechanical issue
compilation terminated.

include/ChangeLog:

* filenames.h: Added IS_REAL_ABSOLUTE_PATH macro to check if
path is absolute and not semi-absolute on Windows.

gcc/cp/ChangeLog:

* module.cc: Use IS_REAL_ABSOLUTE_PATH macro.

Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
 gcc/cp/module.cc| 2 +-
 include/filenames.h | 4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9957df510e6..84680e183b7 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13958,7 +13958,7 @@ get_module (tree name, module_state *parent, bool 
partition)
 static module_state *
 get_module (const char *ptr)
 {
-  if (ptr[0] == '.' ? IS_DIR_SEPARATOR (ptr[1]) : IS_ABSOLUTE_PATH (ptr))
+  if (ptr[0] == '.' ? IS_DIR_SEPARATOR (ptr[1]) : IS_REAL_ABSOLUTE_PATH (ptr))
 /* A header name.  */
 return get_module (build_string (strlen (ptr), ptr));
 
diff --git a/include/filenames.h b/include/filenames.h
index 6c72c422edd..d04fccfed64 100644
--- a/include/filenames.h
+++ b/include/filenames.h
@@ -43,6 +43,7 @@ extern "C" {
 #  define HAS_DRIVE_SPEC(f) HAS_DOS_DRIVE_SPEC (f)
 #  define IS_DIR_SEPARATOR(c) IS_DOS_DIR_SEPARATOR (c)
 #  define IS_ABSOLUTE_PATH(f) IS_DOS_ABSOLUTE_PATH (f)
+#  define IS_REAL_ABSOLUTE_PATH(f) IS_DOS_REAL_ABSOLUTE_PATH (f)
 #else /* not DOSish */
 #  if defined(__APPLE__)
 #ifndef HAVE_CASE_INSENSITIVE_FILE_SYSTEM
@@ -52,6 +53,7 @@ extern "C" {
 #  define HAS_DRIVE_SPEC(f) (0)
 #  define IS_DIR_SEPARATOR(c) IS_UNIX_DIR_SEPARATOR (c)
 #  define IS_ABSOLUTE_PATH(f) IS_UNIX_ABSOLUTE_PATH (f)
+#  define IS_REAL_ABSOLUTE_PATH(f) IS_ABSOLUTE_PATH (f)
 #endif
 
 #define IS_DIR_SEPARATOR_1(dos_based, c)   \
@@ -67,6 +69,8 @@ extern "C" {
 
 #define IS_DOS_DIR_SEPARATOR(c) IS_DIR_SEPARATOR_1 (1, c)
 #define IS_DOS_ABSOLUTE_PATH(f) IS_ABSOLUTE_PATH_1 (1, f)
+#define IS_DOS_REAL_ABSOLUTE_PATH(f) \
+  ((f)[0] && (f)[1] == ':' && ((f)[2] == '/' || (f)[2] == '\\'))
 #define HAS_DOS_DRIVE_SPEC(f) HAS_DRIVE_SPEC_1 (1, f)
 
 #define IS_UNIX_DIR_SEPARATOR(c) IS_DIR_SEPARATOR_1 (0, c)
-- 
2.25.1



[PATCH] Restore RTL alias analysis for hard frame pointer

2022-10-28 Thread Eric Botcazou via Gcc-patches
Hi,

the following change:

2021-07-28  Bin Cheng  

* alias.c (init_alias_analysis): Don't skip prologue/epilogue.

broke the alias analysis for the hard frame pointer (when it is used as a 
frame pointer, i.e. when the frame pointer is not eliminated) described in the 
large comment at the top of the file, because static_reg_base_value is set for 
it and, consequently, new_reg_base_value too.  So when the instruction saving 
the stack pointer into the hard frame pointer in the prologue is processed, 
record_set views it as a second set of the hard frame pointer, to a different 
value, and therefore resets new_reg_base_value to 0, and the game is over.

This e.g. hampers the performance of the var-tracking RTL pass for parameters 
passed on the stack like on x86, leading to regressions when debugging, but 
code generation is very likely affected too.

Bootstrapped/regtested on x86-64/Linux, OK for mainline and 12 branch?


2022-10-28  Eric Botcazou  

* alias.cc (init_alias_analysis): Do not record sets to the hard
frame pointer if the frame pointer has not been eliminated.

-- 
Eric Botcazoudiff --git a/gcc/alias.cc b/gcc/alias.cc
index d54feb15268..c62837dd854 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -3369,6 +3369,10 @@ memory_modified_in_insn_p (const_rtx mem, const_rtx insn)
 void
 init_alias_analysis (void)
 {
+  const bool frame_pointer_eliminated
+= reload_completed
+  && !frame_pointer_needed
+  && targetm.can_eliminate (FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM);
   unsigned int maxreg = max_reg_num ();
   int changed, pass;
   int i;
@@ -3446,12 +3450,8 @@ init_alias_analysis (void)
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 	if (static_reg_base_value[i]
 	/* Don't treat the hard frame pointer as special if we
-	   eliminated the frame pointer to the stack pointer instead.  */
-	&& !(i == HARD_FRAME_POINTER_REGNUM
-		 && reload_completed
-		 && !frame_pointer_needed
-		 && targetm.can_eliminate (FRAME_POINTER_REGNUM,
-	   STACK_POINTER_REGNUM)))
+	   eliminated the frame pointer to the stack pointer.  */
+	&& !(i == HARD_FRAME_POINTER_REGNUM && frame_pointer_eliminated))
 	  {
 	new_reg_base_value[i] = static_reg_base_value[i];
 	bitmap_set_bit (reg_seen, i);
@@ -3467,10 +3467,15 @@ init_alias_analysis (void)
 		{
 		  rtx note, set;
 
+		  /* Treat the hard frame pointer as special unless we
+		 eliminated the frame pointer to the stack pointer.  */
+		  if (!frame_pointer_eliminated
+		  && modified_in_p (hard_frame_pointer_rtx, insn))
+		continue;
+
 		  /* If this insn has a noalias note, process it,  Otherwise,
 		 scan for sets.  A simple set will have no side effects
 		 which could change the base value of any other register.  */
-
 		  if (GET_CODE (PATTERN (insn)) == SET
 		  && REG_NOTES (insn) != 0
 		  && find_reg_note (insn, REG_NOALIAS, NULL_RTX))


[committed] openmp: Allow optional comma after directive-specifier in C/C++

2022-10-28 Thread Jakub Jelinek via Gcc-patches
Hi!

Previously we've been allowing that comma only in C++ when in attribute
form (which was the reason why it has been allowed), but 5.1 allows that
even in pragma form in C/C++ (with clarifications in 5.2) and 5.2
also in Fortran (which this patch doesn't implement).

Note, for directives which take an argument (== unnamed clause),
comma is not allowed in between the directive name and the argument,
like the directive-1.c testcase shows.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-10-28  Jakub Jelinek  

gcc/c/
* c-parser.cc (c_parser_omp_all_clauses): Allow optional
comma before the first clause.
(c_parser_omp_allocate, c_parser_omp_atomic, c_parser_omp_depobj,
c_parser_omp_flush, c_parser_omp_scan_loop_body,
c_parser_omp_ordered, c_finish_omp_declare_variant,
c_parser_omp_declare_target, c_parser_omp_declare_reduction,
c_parser_omp_requires, c_parser_omp_error,
c_parser_omp_assumption_clauses): Likewise.
gcc/cp/
* parser.cc (cp_parser_omp_all_clauses): Allow optional comma
before the first clause even in pragma syntax.
(cp_parser_omp_allocate, cp_parser_omp_atomic, cp_parser_omp_depobj,
cp_parser_omp_flush, cp_parser_omp_scan_loop_body,
cp_parser_omp_ordered, cp_parser_omp_assumption_clauses,
cp_finish_omp_declare_variant, cp_parser_omp_declare_target,
cp_parser_omp_declare_reduction_exprs, cp_parser_omp_requires,
cp_parser_omp_error): Likewise.
gcc/testsuite/
* c-c++-common/gomp/directive-1.c: New test.
* c-c++-common/gomp/clauses-6.c: New test.
* c-c++-common/gomp/declare-variant-2.c (f75a): Declare.
(f75): Use f75a as variant instead of f1 and don't expect error.
* g++.dg/gomp/clause-4.C (foo): Don't expect error on comma
before first clause.
* gcc.dg/gomp/clause-2.c (foo): Likewise.

--- gcc/c/c-parser.cc.jj2022-10-14 09:26:50.345509324 +0200
+++ gcc/c/c-parser.cc   2022-10-27 15:11:14.902665656 +0200
@@ -17366,7 +17366,7 @@ c_parser_omp_all_clauses (c_parser *pars
   if (nested && c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
break;
 
-  if (!first)
+  if (!first || nested != 2)
{
  if (c_parser_next_token_is (parser, CPP_COMMA))
c_parser_consume_token (parser);
@@ -18453,6 +18453,9 @@ c_parser_omp_allocate (location_t loc, c
 {
   tree allocator = NULL_TREE;
   tree nl = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_ALLOCATE, 
NULL_TREE);
+  if (c_parser_next_token_is (parser, CPP_COMMA)
+  && c_parser_peek_2nd_token (parser)->type == CPP_NAME)
+c_parser_consume_token (parser);
   if (c_parser_next_token_is (parser, CPP_NAME))
 {
   matching_parens parens;
@@ -18591,7 +18594,6 @@ c_parser_omp_atomic (location_t loc, c_p
   bool structured_block = false;
   bool swapped = false;
   bool non_lvalue_p;
-  bool first = true;
   tree clauses = NULL_TREE;
   bool capture = false;
   bool compare = false;
@@ -18602,13 +18604,10 @@ c_parser_omp_atomic (location_t loc, c_p
 
   while (c_parser_next_token_is_not (parser, CPP_PRAGMA_EOL))
 {
-  if (!first
- && c_parser_next_token_is (parser, CPP_COMMA)
+  if (c_parser_next_token_is (parser, CPP_COMMA)
  && c_parser_peek_2nd_token (parser)->type == CPP_NAME)
c_parser_consume_token (parser);
 
-  first = false;
-
   if (c_parser_next_token_is (parser, CPP_NAME))
{
  const char *p
@@ -19553,6 +19552,8 @@ c_parser_omp_depobj (c_parser *parser)
   parens.skip_until_found_close (parser);
   tree clause = NULL_TREE;
   enum omp_clause_depend_kind kind = OMP_CLAUSE_DEPEND_INVALID;
+  if (c_parser_next_token_is (parser, CPP_COMMA))
+c_parser_consume_token (parser);
   location_t c_loc = c_parser_peek_token (parser)->location;
   if (c_parser_next_token_is (parser, CPP_NAME))
 {
@@ -19629,6 +19630,9 @@ c_parser_omp_flush (c_parser *parser)
   location_t loc = c_parser_peek_token (parser)->location;
   c_parser_consume_pragma (parser);
   enum memmodel mo = MEMMODEL_LAST;
+  if (c_parser_next_token_is (parser, CPP_COMMA)
+  && c_parser_peek_2nd_token (parser)->type == CPP_NAME)
+c_parser_consume_token (parser);
   if (c_parser_next_token_is (parser, CPP_NAME))
 {
   const char *p
@@ -19721,6 +19725,9 @@ c_parser_omp_scan_loop_body (c_parser *p
 
   c_parser_consume_pragma (parser);
 
+  if (c_parser_next_token_is (parser, CPP_COMMA))
+   c_parser_consume_token (parser);
+
   if (c_parser_next_token_is (parser, CPP_NAME))
{
  const char *p
@@ -20504,9 +20511,14 @@ c_parser_omp_ordered (c_parser *parser,
   return false;
 }
 
-  if (c_parser_next_token_is (parser, CPP_NAME))
+  int n = 1;
+  if (c_parser_next_token_is (parser, CPP_COMMA))
+n = 2;
+
+  if (c_parser_peek_nth_token (parser, n)->type == CPP_NAME)
 {
-  const char *p = 

RE: [committed 6/6] amdgcn: vector testsuite tweaks

2022-10-28 Thread Thomas Schwinge
Hi Andrew!

On 2022-10-28T10:38:11+0200, "Stubbs, Andrew"  wrote:
>> -Original Message-
>> Looking into commit r13-3225-gbd9a05594d227cde79a67dc715bd9d82e9c464e9
>> "amdgcn: vector testsuite tweaks" for a moment, I also did wonder about
>> the following changes, because for 'vect_multiple_sizes' (for example,
>> x86_64-pc-linux-gnu) that seems to lose more specific testing;
>> previously: 'scan-tree-dump-times' exactly once, now: 'scan-tree-dump'
>> any number of times.  But I've no clue about that myself, so just
>> mentioning this, in case somebody else has an opinion.  ;-)
>
> When vect_multiple_sizes is true the number of times the pattern appears will 
> be greater than normal.  Most likely the pattern will appear once for each 
> vector size.  In the case of GCN, a pattern that normally appears 4 times now 
> appears 24 times.
>
> The alternative would be to have a whole set of patterns for each 
> configuration of each target that can have the multiple sizes.  That or 
> change the implementation of 'scan-tree-dump-times' to support expressions of 
> some kind, but even then the expressions would get hairy.

I guess my confusion is why this hasn't already been FAILing previously,
for example for x86_64-pc-linux-gnu, which is also
'vect_multiple_sizes'?  Anyway: "I've no clue about that myself".


Regards
 Thomas
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
Munich; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; registered office: Munich; commercial register: 
Munich, HRB 106955


RE: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-28 Thread Kong, Lingling via Gcc-patches
Hi,

Because we switched the avx512bf16 intrinsics to the new type __bf16, we can 
now use m128/256bh for the vector bf16 type instead of m128/256bf16, and the 
builtins for avx512bf16/avxneconvert have been unified.

Thanks,
Lingling

> -Original Message-
> From: Hongtao Liu 
> Sent: Tuesday, October 25, 2022 1:23 PM
> To: Kong, Lingling 
> Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org; Jiang,
> Haochen 
> Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT
> 
> On Mon, Oct 24, 2022 at 2:20 PM Kong, Lingling 
> wrote:
> >
> > > From: Gcc-patches
> > > 
> > > On Behalf Of Hongtao Liu via Gcc-patches
> > > Sent: Monday, October 17, 2022 1:47 PM
> > > To: Jiang, Haochen 
> > > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT
> > >
> > > On Fri, Oct 14, 2022 at 3:58 PM Haochen Jiang via Gcc-patches
> > >  wrote:
> > > >
> > > > From: Kong Lingling 
> > > > +(define_insn "vbcstne2ps_"
> > > > +  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
> > > > +(vec_duplicate:VF1_128_256
> > > > + (unspec:SF
> > > > +  [(match_operand:HI 1 "memory_operand" "m")]
> > > > +  VBCSTNE)))]
> > > > +  "TARGET_AVXNECONVERT"
> > > > +  "vbcstne2ps\t{%1, %0|%0, %1}"
> > > > +  [(set_attr "prefix" "vex")
> > > > +  (set_attr "mode" "")])
> > > Since jakub has support bf16 software emulation, can we rewrite it
> > > with general rtl ir without unspec?
> > > Like (float_extend:SF (match_operand:BF "memory_operand" "m")
> > > > +
> > > > +(define_int_iterator VCVTNEBF16
> > > > +  [UNSPEC_VCVTNEEBF16SF
> > > > +   UNSPEC_VCVTNEOBF16SF])
> > > > +
> > > > +(define_int_attr vcvtnebf16type
> > > > +  [(UNSPEC_VCVTNEEBF16SF "ebf16")
> > > > +   (UNSPEC_VCVTNEOBF16SF "obf16")]) (define_insn
> > > > +"vcvtne2ps_"
> > > > +  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
> > > > +(unspec:VF1_128_256
> > > > +  [(match_operand: 1 "memory_operand" "m")]
> > > > + VCVTNEBF16))]
> > > > +  "TARGET_AVXNECONVERT"
> > > > +  "vcvtne2ps\t{%1, %0|%0, %1}"
> > > > +  [(set_attr "prefix" "vex")
> > > > +   (set_attr "mode" "")])
> > > Similar for this one and all those patterns below.
> >
> > That's great! Thanks for the review!
> > Now rewrite it without unspec and use float_extend for new define_insn.
> Ok.
> >
> > Thanks
> > Lingling
> >
> >
> 
> 
> --
> BR,
> Hongtao


0001-Support-Intel-AVX-NE-CONVERT.patch
Description: 0001-Support-Intel-AVX-NE-CONVERT.patch


OpenACC: Don't gang-privatize artificial variables [PR90115] (was: [PATCH] [og12] OpenACC: Don't gang-privatize artificial variables)

2022-10-28 Thread Thomas Schwinge
Hi!

On 2022-10-18T16:46:07+0200, Thomas Schwinge  wrote:
> On 2022-10-14T13:38:56+, Julian Brown  wrote:
>> This patch prevents compiler-generated artificial variables from being
>> treated as privatization candidates for OpenACC.
>>
>> The rationale is that e.g. "gang-private" variables actually must be
>> shared by each worker and vector spawned within a particular gang, but
>> that sharing is not necessary for any compiler-generated variable (at
>> least at present, but no such need is anticipated either).  Variables on
>> the stack (and machine registers) are already private per-"thread"
>> (gang, worker and/or vector), and that's fine for artificial variables.
>
> OK, that seems fine rationale for this change in behavior.
> No contradicting test case jumped onto me, either.

>> Several tests need their scan output patterns adjusted to compensate.
>
> ACK -- surprisingly few.  (Some minor fine-tuning necessary for GCC
> master branch, as had to be expected; I'm working on that.)

With those changes...

>> --- a/gcc/omp-low.cc
>> +++ b/gcc/omp-low.cc
>> @@ -11400,6 +11400,28 @@ oacc_privatization_candidate_p (const location_t 
>> loc, const tree c,
>>  }
>>  }
>>
>> +  /* If an artificial variable has been added to a bind, e.g.
>> + a compiler-generated temporary structure used by the Fortran 
>> front-end, do
>> + not consider it as a privatization candidate.  Note that variables on
>> + the stack are private per-thread by default: making them "gang-private"
>> + for OpenACC actually means to share a single instance of a variable
>> + amongst all workers and threads spawned within each gang.
>> + At present, no compiler-generated artificial variables require such
>> + sharing semantics, so this is safe.  */
>> +
>> +  if (res && DECL_ARTIFICIAL (decl))
>> +{
>> +  res = false;
>> +
>> +  if (dump_enabled_p ())
>> +{
>> +  oacc_privatization_begin_diagnose_var (l_dump_flags, loc, c, decl);
>> +  dump_printf (l_dump_flags,
>> +   "isn%'t candidate for adjusting OpenACC privatization "
>> +   "level: %s\n", "artificial");
>> +}
>> +}
>
> In the source code comment, you say "added to a bind", and that's indeed
> what I was expecting, too, and thus put in:
>
>if (res && DECL_ARTIFICIAL (decl))
>  {
> +  gcc_checking_assert (block);
> +
>res = false;
>
> ..., but to my surprised, that did fire in one occasion:
>
>> --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> @@ -94,9 +94,7 @@ contains
>>  !$acc parallel copy(array)
>>  !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
>>  ! { dg-note {variable 'i' in 'private' clause isn't candidate for 
>> adjusting OpenACC privatization level: not addressable} "" { target *-*-* } 
>> l_loop$c_loop }
>> -! { dg-note {variable 'array\.[0-9]+' in 'private' clause is candidate 
>> for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop 
>> }
>> -! { dg-note {variable 'array\.[0-9]+' ought to be adjusted for OpenACC 
>> privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
>> -! { dg-note {variable 'array\.[0-9]+' adjusted for OpenACC 
>> privatization level: 'gang'} "" { target { ! { openacc_host_selected || { 
>> openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
>> +! { dg-note {variable 'array\.[0-9]+' in 'private' clause isn't 
>> candidate for adjusting OpenACC privatization level: artificial} "" { target 
>> *-*-* } l_loop$c_loop }
>>  ! { dg-message {sorry, unimplemented: target cannot support alloca} 
>> PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
>>  do i = 1, 10
>>array(i) = 9*i
>
> ... here.  Note "variable 'array\.[0-9]+' in 'private' clause";
> everywhere else we have "declared in block".
>
> As part of your verification, have you already looked into whether the
> new behavior is correct here, or does this one need to continue to be
> "adjusted for OpenACC privatization level: 'gang'"?  If the latter,
> should we check 'if (res && block && DECL_ARTIFICIAL (decl))' instead of
> 'if (res && DECL_ARTIFICIAL (decl))'

..., and that change merged in, I've then pushed to master branch
commit 11e811d8e2f63667f60f73731bb934273f5882b8
"OpenACC: Don't gang-privatize artificial variables [PR90115]", see
attached.

Cherry-picked pushed to releases/gcc-12 branch in
commit 9b116c51a451995f1bae8fdac0748fcf3f06aafe
"OpenACC: Don't gang-privatize artificial variables [PR90115]", see
attached.

I've then also done a merge from releases/gcc-12 branch into
devel/omp/gcc-12 branch, taking care of merge conflicts due to
"fine-tuning necessary for GCC master branch", which shouldn't propagate
into devel/omp/gcc-12 branch.  Pushed to devel/omp/gcc-12 branch
commit 33eae55cd9effd9e0bb0c3659cc5dfc100b6fd4e
"Merge commit 

Re: [PATCH] i386: Enable small loop unrolling for O2

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, Oct 28, 2022 at 10:08 AM Hongyu Wang  wrote:
>
> > Ugh, that's all quite ugly and unmaintainable, no?
> Agreed, I have the same feeling.
>
> > I'm quite sure that if this works it's not by intention.  Doesn't this
> > also disable
> > register renaming and web when the user explicitely specifies 
> > -funroll-loops?
> >
> > Doesn't this change -funroll-loops behavior everywhere, only unrolling small
> > loops?
>
> The ugly part ensures that -funroll-loops would not be affected at all
> by -munroll-only-small-loops.
>
> >
> > I'd like to see a -munroll-only-small-loops addition that doesn't have any 
> > such
> > effects.  Note RTL unrolling could also
> > conditionally enabled on a new -funroll-small-loops which wouldn't enable
> > register renaming or web.
>
> Did you mean something like
>
> index b9e07973dd6..b707d4afb84 100644
> --- a/gcc/loop-init.cc
> +++ b/gcc/loop-init.cc
> @@ -567,7 +567,8 @@ public:
>/* opt_pass methods: */
>bool gate (function *) final override
>  {
> -  return (flag_unroll_loops || flag_unroll_all_loops || 
> cfun->has_unroll);
> +  return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll
> + || flag_unroll_only_small_loops);
>  }
>
> then the backend can turn it on by default in O2?
> I don't know if there is a way to turn on a middle-end pass by
> target-specific flags.

There isn't, it would need to be a target hook.  Currently only i386, rs6000
and s390 have loop_unroll_adjust.  We could enable the pass conditional
on implementing that hook (and optimize >= 2, hopefully the pass only
unrolls loops that are optimized for speed)?

>
> Richard Biener via Gcc-patches  wrote on Friday, October 28, 2022 at
> 15:33:
> >
> > On Wed, Oct 26, 2022 at 7:53 AM Hongyu Wang  wrote:
> > >
> > > Hi,
> > >
> > > Inspired by rs6000 and s390 port changes, this patch
> > > enables loop unrolling for small size loop at O2 by default.
> > > The default behavior is to unroll loop with unknown trip-count and
> > > less than 4 insns by 1 time.
> > >
> > > This improves 548.exchange2 by 3.5% on icelake and 6% on zen3 with
> > > 1.2% codesize increment. For other benchmarks the variations are minor
> > > and overall codesize increased by 0.2%.
> > >
> > > The kernel image size increased by 0.06%, and no impact on eembc.
> > >
> > > Bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > >
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * common/config/i386/i386-common.cc (ix86_optimization_table):
> > > Enable loop unroll and small loop unroll at O2 by default.
> > > * config/i386/i386-options.cc
> > > (ix86_override_options_after_change):
> > > Disable small loop unroll when funroll-loops enabled, reset
> > > cunroll_grow_size when it is not explicitly enabled.
> > > (ix86_option_override_internal): Call
> > > ix86_override_options_after_change instead of calling
> > > ix86_recompute_optlev_based_flags and ix86_default_align
> > > separately.
> > > * config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
> > > factor if -munroll-only-small-loops enabled.
> > > * config/i386/i386.opt: Add -munroll-only-small-loops,
> > > -param=x86-small-unroll-ninsns= for loop insn limit,
> > > -param=x86-small-unroll-factor= for unroll factor.
> > > * doc/invoke.texi: Document -munroll-only-small-loops,
> > > x86-small-unroll-ninsns and x86-small-unroll-factor.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr86270.c: Add -mno-unroll-only-small-loops.
> > > * gcc.target/i386/pr93002.c: Likewise.
> > > ---
> > >  gcc/common/config/i386/i386-common.cc   |  6 
> > >  gcc/config/i386/i386-options.cc | 40 ++---
> > >  gcc/config/i386/i386.cc | 13 
> > >  gcc/config/i386/i386.opt| 13 
> > >  gcc/doc/invoke.texi | 14 +
> > >  gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
> > >  7 files changed, 84 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/common/config/i386/i386-common.cc 
> > > b/gcc/common/config/i386/i386-common.cc
> > > index d6a68dc9b1d..0e580b39d14 100644
> > > --- a/gcc/common/config/i386/i386-common.cc
> > > +++ b/gcc/common/config/i386/i386-common.cc
> > > @@ -1686,6 +1686,12 @@ static const struct default_options 
> > > ix86_option_optimization_table[] =
> > >  /* The STC algorithm produces the smallest code at -Os, for x86.  */
> > >  { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
> > >REORDER_BLOCKS_ALGORITHM_STC },
> > > +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> > > +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 
> > > 1 },
> > > +/* Turns off -frename-registers and -fweb which are enabled by
> > > +   funroll-loops.  */
> > 

[PATCH v3] RISC-V: Libitm add RISC-V support.

2022-10-28 Thread Xiongchuan Tan via Gcc-patches
Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 

libitm/ChangeLog:

* configure.tgt: Add riscv support.
* config/riscv/asm.h: New file.
* config/riscv/sjlj.S: New file.
* config/riscv/target.h: New file.
---
v2: Change HW_CACHELINE_SIZE to 64 (in accordance with the RVA profiles, see
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc)

v3: Ensure the stack is aligned to 16 bytes; make use of Zihintpause in
cpu_relax()

 libitm/config/riscv/asm.h|  54 +
 libitm/config/riscv/sjlj.S   | 144 +++
 libitm/config/riscv/target.h |  62 +++
 libitm/configure.tgt |   2 +
 4 files changed, 262 insertions(+)
 create mode 100644 libitm/config/riscv/asm.h
 create mode 100644 libitm/config/riscv/sjlj.S
 create mode 100644 libitm/config/riscv/target.h

diff --git a/libitm/config/riscv/asm.h b/libitm/config/riscv/asm.h
new file mode 100644
index 000..bb515f2
--- /dev/null
+++ b/libitm/config/riscv/asm.h
@@ -0,0 +1,54 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Xiongchuan Tan .
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef _RV_ASM_H
+#define _RV_ASM_H
+
+#if __riscv_xlen == 64
+#  define GPR_L ld
+#  define GPR_S sd
+#  define SZ_GPR 8
+#  define LEN_GPR 14
+#elif __riscv_xlen == 32
+#  define GPR_L lw
+#  define GPR_S sw
+#  define SZ_GPR 4
+#  define LEN_GPR 16 /* Extra padding to align the stack to 16 bytes */
+#else
+#  error Unsupported XLEN (must be 64-bit or 32-bit).
+#endif
+
+#if defined(__riscv_flen) && __riscv_flen == 64
+#  define FPR_L fld
+#  define FPR_S fsd
+#  define SZ_FPR 8
+#elif defined(__riscv_flen) && __riscv_flen == 32
+#  define FPR_L flw
+#  define FPR_S fsw
+#  define SZ_FPR 4
+#else
+#  define SZ_FPR 0
+#endif
+
+#endif  /* _RV_ASM_H */
diff --git a/libitm/config/riscv/sjlj.S b/libitm/config/riscv/sjlj.S
new file mode 100644
index 000..93f12ec
--- /dev/null
+++ b/libitm/config/riscv/sjlj.S
@@ -0,0 +1,144 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Xiongchuan Tan .
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#include "asmcfi.h"
+#include "asm.h"
+
+   .text
+   .align  2
+   .global _ITM_beginTransaction
+   .type   _ITM_beginTransaction, @function
+
+_ITM_beginTransaction:
+   cfi_startproc
+   mv a1, sp
+   addi sp, sp, -(LEN_GPR*SZ_GPR+12*SZ_FPR)
+   cfi_adjust_cfa_offset(LEN_GPR*SZ_GPR+12*SZ_FPR)
+
+   /* Return Address */
+   GPR_S ra, 0*SZ_GPR(sp)
+   cfi_rel_offset(ra, 0*SZ_GPR)
+
+   /* Caller's sp */
+   GPR_S a1, 1*SZ_GPR(sp)
+
+   /* Caller's s0/fp */
+   GPR_S fp, 2*SZ_GPR(sp)
+   cfi_rel_offset(fp, 2*SZ_GPR)
+
+   /* Callee-saved registers */
+   GPR_S s1, 3*SZ_GPR(sp)
+   GPR_S s2, 4*SZ_GPR(sp)
+   GPR_S s3, 5*SZ_GPR(sp)
+   GPR_S s4, 6*SZ_GPR(sp)
+   GPR_S s5, 7*SZ_GPR(sp)
+   GPR_S s6, 8*SZ_GPR(sp)
+   GPR_S s7, 9*SZ_GPR(sp)
+   GPR_S s8, 10*SZ_GPR(sp)

RE: [committed 6/6] amdgcn: vector testsuite tweaks

2022-10-28 Thread Stubbs, Andrew
> -Original Message-
> Looking into commit r13-3225-gbd9a05594d227cde79a67dc715bd9d82e9c464e9
> "amdgcn: vector testsuite tweaks" for a moment, I also did wonder about
> the following changes, because for 'vect_multiple_sizes' (for example,
> x86_64-pc-linux-gnu) that seems to lose more specific testing;
> previously: 'scan-tree-dump-times' exactly once, now: 'scan-tree-dump'
> any number of times.  But I've no clue about that myself, so just
> mentioning this, in case somebody else has an opinion.  ;-)

When vect_multiple_sizes is true, the number of times the pattern appears will 
be greater than normal.  Most likely the pattern will appear once for each 
vector size.  In the case of GCN, a pattern that normally appears 4 times now 
appears 24 times.

The alternative would be to have a whole set of patterns for each configuration 
of each target that can have the multiple sizes.  That or change the 
implementation of 'scan-tree-dump-times' to support expressions of some kind, 
but even then the expressions would get hairy.

Andrew


Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Eric Botcazou via Gcc-patches
> (set (reg:SI 93)
>  (neg:SI (ltu:SI (reg:CCC 17 flags) (const_int 0 [0]
> 
> as
> 
> (set (reg:SI 93)
>  (neg:SI (ltu:SI (const_int 1) (const_int 0 [0]
> 
> which leads to incorrect results since LTU on a MODE_CC register isn't the
> same as "unsigned less than" in x86 backend.

That's not specific to the x86 back-end, i.e. it's a generic caveat.

>   PR target/107172
>   * config/i386/i386.md (UNSPEC_CC_NE): New.
>   Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns.

FWIW the SPARC back-end uses a COMPARE instead of an UNSPEC here.

-- 
Eric Botcazou




Re: c: tree: target: C2x (...) function prototypes and va_start relaxation

2022-10-28 Thread Richard Biener via Gcc-patches
On Tue, Oct 25, 2022 at 9:42 PM Joseph Myers  wrote:
>
> On Tue, 25 Oct 2022, Richard Biener wrote:
>
> > You are missing to stream the new type flag in tree-streamer-{in,out}.cc
> > and checking for tree merging in lto-common.cc:compare_tree_sccs_1
> >
> > Otherwise looks reasonable.  Can you add a (multi TU) runtime testcase to 
> > the
> > torture exercising the feature so we can see any ABI issues?
>
> I've made those changes.  Those in turn showed up the need for a
> change in fortran/trans-types.cc (to avoid using
> build_varargs_function_type_vec when what was wanted was an
> unprototyped function type rather than a (...) prototype) to avoid a
> failure of gfortran.dg/lto/pr40724.
>
> I've added a two-file testcase to gcc.dg/torture/.  I've also made the
> execution tests cover the case where there are named arguments but the
> last named argument has a declaration that results in undefined
> behavior in previous C standard versions, such as a type changed by
> the default argument promotions.
>
> Here is the revised patch version.

OK.

Thanks,
Richard.

> c: tree: target: C2x (...) function prototypes and va_start relaxation
>
> C2x allows function prototypes to be given as (...), a prototype
> meaning a variable-argument function with no named arguments.  To
> allow such functions to access their arguments, requirements for
> va_start calls are relaxed so it ignores all but its first argument
> (i.e. subsequent arguments, if any, can be arbitrary pp-token
> sequences).
>
> Implement this feature accordingly.  The va_start relaxation in
>  is itself easy: __builtin_va_start already supports a
> second argument of 0 instead of a parameter name, and calls get
> converted internally to the form using 0 for that argument, so
>  just needs changing to use a variadic macro that passes 0
> as the second argument of __builtin_va_start.  (This is done only in
> C2x mode, on the expectation that users of older standard would expect
> unsupported uses of va_start to be diagnosed.)
>
> For the (...) functions, it's necessary to distinguish these from
> unprototyped functions, whereas previously C++ (...) functions and
> unprototyped functions both used NULL TYPE_ARG_TYPES.  A flag is added
> to tree_type_common to mark the (...) functions; as discussed on gcc@,
> doing things this way is likely to be safer for unchanged code in GCC
> than adding a different form of representation in TYPE_ARG_TYPES, or
> adding a flag that instead signals that the function is unprototyped.
>
> There was previously an option
> -fallow-parameterless-variadic-functions to enable support for (...)
> prototypes.  The support was incomplete - it treated the functions as
> unprototyped, and only parsed some declarations, not e.g.
> "int g (int (...));".  This option is changed into a no-op ignored
> option; (...) is always accepted syntactically, with a pedwarn_c11
> call to given required diagnostics when appropriate.  The peculiarity
> of a parameter list with __attribute__ followed by '...' being
> accepted with that option is removed.
>
> Interfaces in tree.cc that create function types are adjusted to set
> this flag as appropriate.  It is of course possible that some existing
> users of the functions to create variable-argument functions actually
> wanted unprototyped functions in the no-named-argument case, rather
> than functions with a (...) prototype; some such cases in c-common.cc
> (for built-in functions and implicit function declarations) turn out
> to need updating for that reason.
>
> I didn't do anything to change how the C++ front end creates (...)
> function types.  It's very likely there are unchanged places in the
> compiler that in fact turn out to need changes to work properly with
> (...) function prototypes.
>
> Target setup_incoming_varargs hooks, where they used the information
> passed about the last named argument, needed updating to avoid using
> that information in the (...) case.  Note that apart from the x86
> changes, I haven't done any testing of those target changes beyond
> building cc1 to check for syntax errors.  It's possible further
> target-specific fixes will be needed; target maintainers should watch
> out for failures of c2x-stdarg-4.c or c2x-stdarg-split-1a.c, the
> execution tests, which would indicate that this feature is not working
> correctly.  Those tests also verify the case where there are named
> arguments but the last named argument has a declaration that results
> in undefined behavior in previous C standard versions, such as a type
> changed by the default argument promotions.
>
> Bootstrapped with no regressions for x86_64-pc-linux-gnu.
>
> gcc/
> * config/aarch64/aarch64.cc (aarch64_setup_incoming_varargs):
> Check TYPE_NO_NAMED_ARGS_STDARG_P.
> * config/alpha/alpha.cc (alpha_setup_incoming_varargs): Likewise.
> * config/arc/arc.cc (arc_setup_incoming_varargs): Likewise.
> * config/arm/arm.cc (arm_setup_incoming_varargs): Likewise.
>  

[PATCH] Adjust gcc.dg/vect/pr100756.c for V8SI and V16SI

2022-10-28 Thread Richard Biener via Gcc-patches
The following adjusts the testcase to require no epilogue also
for larger vectors than V4SI.

Pushed.

* gcc.dg/vect/pr100756.c: Adjust for larger vectors.
---
 gcc/testsuite/gcc.dg/vect/pr100756.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr100756.c 
b/gcc/testsuite/gcc.dg/vect/pr100756.c
index c1362f29ebe..7847f3eb880 100644
--- a/gcc/testsuite/gcc.dg/vect/pr100756.c
+++ b/gcc/testsuite/gcc.dg/vect/pr100756.c
@@ -7,7 +7,7 @@ foo (int * restrict a, int n)
   int i, result = 0;
 
   a = __builtin_assume_aligned (a, __BIGGEST_ALIGNMENT__);
-  for (i = 0; i < n * 4; i++)
+  for (i = 0; i < n * 16; i++)
 result += a[i];
   return result;
 }
-- 
2.35.3


Re: [PATCH] diagnostics: Allow FEs to keep customizations for middle end [PR101551, PR106274]

2022-10-28 Thread Richard Biener via Gcc-patches
On Tue, Oct 25, 2022 at 11:05 PM Lewis Hyatt  wrote:
>
> On Tue, Oct 25, 2022 at 7:35 AM Richard Biener
>  wrote:
> >
> > On Thu, Oct 20, 2022 at 1:09 AM Lewis Hyatt via Gcc-patches
> >  wrote:
> > >
> > > Currently, the ipa-free-lang-data pass resets most of the frontend's
> > > diagnostic customizations, such as the diagnostic_finalizer that prints 
> > > macro
> > > expansion information, which is the subject of the two PRs. In most cases,
> > > however, there is no need to reset these customizations; they still work 
> > > just
> > > fine after the language-specific data has been freed. (Macro tracking
> > > information, for instance, only depends on the line_maps instance and 
> > > does not
> > > use the tree data structures at all.)
> > >
> > > Add an interface whereby frontends can convey which of their 
> > > customizations
> > > should be preserved by ipa-free-lang-data. Only the macro tracking 
> > > behavior is
> > > changed for now.  Subsequent patches will add further configurations for 
> > > each
> > > frontend.
> >
> > One point of the resetting of the hooks is to avoid crashes due to us 
> > zapping
> > many of the lang specific data structures.  If the hooks were more resilient
> > that wouldn't be an issue.
> >
>
> Right. The patch I have for C++ (not sent yet) makes the C++ versions
> of decl_printable_name and the diagnostic starter able to work
> after free_lang_data runs.  I just worry that future changes to the
> C++ hooks would need to preserve this property, which could be error
> prone since issues are not immediately apparent, and most of the
> testsuite does not use -flto.
>
> > Now - as for macro tracking, how difficult is it to replicate that in the
> > default hook implementation?  Basically we have similar issues for
> > late diagnostics of the LTO compile step where only the LTO (aka default)
> > variant of the hooks are present - it would be nice to improve that as well.
> >
>
> It is easy enough to make the default diagnostic finalizer print the
> macro tracking information stored in the global line_table. (It just
> needs to check if the global line_table is set, in which case call
> virt_loc_aware_diagnostic_finalizer()). This would remove the need for
> C-family frontends to override that callback. Fortran would still do
> so, since it does other things in its finalizer. However, this would
> not help with the LTO frontend because the line_table is not part of
> what gets streamed out. Rather the line_table is rebuilt from scratch
> when reading the data back in, but the macro tracking information is
> not available at that time, just the basic location info (filename and
> source location). I am not that familiar with the LTO streaming
> process but I feel like streaming the entire line_table would not mesh
> well with it (especially since multiple of them from different
> translation units would need to be combined back together).
>
> > Note free_lang_data exists to "simplify" the LTO bytecode output - things
> > freed do not need to be output.  Of course the "freeing" logic could be
> > wired into the LTO bytecode output machinery directly - simply do not
> > output what we'd free.  That way all info would prevail for the non-LTO
> > compile and the hooks could continue to work as they do without any
> > LTO streaming done.
> >
>
> Naively (emphasis on the naive, as I don't have any experience with
> this part of GCC), that is how I would have guessed it worked. But I
> understood there are some benefits to freeing the lang data earlier
> (e.g. reduced resource usage), and even a hope to start doing it in
> non-LTO builds as well, so I thought some incremental changes as in
> this patch to make diagnostics better after free_lang_data could
> perhaps be useful. Thanks for taking a look at it!

Yes, the idea was also to free up memory but then that part never
really materialized - the idea was to always run free-lang-data, not
just when later outputting LTO bytecode.  The reason is probably
mainly the diagnostic regressions you observe.

Maybe a better strategy than your patch would be to work towards
that goal but reduce the number of "freeings", instead adjusting the
LTO streamer to properly ignore frontend specific bits where clearing
conflicts with the intent to preserve accurate diagnostics throughout
the compilation.

If you see bits that when not freed would fix some of the observed
issues we can see to replicate the freeing in the LTO output machinery.

Richard.

>
> -Lewis


Re: [PATCH] [PR tree-optimization/107394] Canonicalize global franges as they are read back.

2022-10-28 Thread Aldy Hernandez via Gcc-patches
On Fri, Oct 28, 2022, 08:49 Richard Biener 
wrote:

> On Fri, Oct 28, 2022 at 12:45 AM Jeff Law  wrote:
> >
> >
> > On 10/25/22 15:01, Aldy Hernandez via Gcc-patches wrote:
> > > [Richi/Jakub/FP experts, does this sound like the right solution, or
> am I
> > > missing some subtle IPA/inlining issue?]
> > >
> > > The problem here is that we're inlining a global range with NANs into
> > > a function that has been tagged with __attribute__((optimize
> > > ("-ffinite-math-only"))).  As the global range is copied from
> > > SSA_NAME_RANGE_INFO, its NAN bits are copied, which then cause
> > > frange::verify_range() to fail a sanity check making sure no NANs
> > > creep in when !HONOR_NANS.
> > >
> > > I think what we should do is nuke the NAN bits as we're restoring the
> > > global range.  For that matter, if we use the frange constructor,
> > > everything except that NAN sign will be done automatically, including
> > > dropping INFs to the min/max representable range when appropriate.
> > >
> > >   PR tree-optimization/107394
> > >
> > > gcc/ChangeLog:
> > >
> > >   * value-range-storage.cc (frange_storage_slot::get_frange): Use
> > >   frange constructor.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/tree-ssa/pr107394.c: New test.
> >
> > The other approach would be to disable inlining in this case due to an
> > unsafe attribute mismatch, but we're not currently doing much sanity
> > checking in this space and it might be a huge can of worms.  I'm
> > inclined to ACK, but give Jakub and Richi until Monday to chime in first.
>
> We are actually quite careful in this regard but maybe our reasoning
> is wrong.  We are allowing inlining of -fno-finite-math-only into
> -ffinite-math-only code but not the other way around.
>
> On the actual patch I think that ranges with Inf/NaNs should be always
> treated as "valid", the optimization to trim them with certain options
> is optimization and thus optional.  So IMHO having verify_range ICE
> on NaNs isn't correct?
>

That was my gut feeling as well, but the assert has caught real issues such
as this one. Also, in your example down thread, we would drop the explicit
NAN to UNDEFINED if expressed as a range (as agreed earlier this cycle). So
we won't ICE...since a range with NAN will never get built.

The assert is there to keep NANs from sneaking in. However, if you still
think it's incorrect I'm happy to remove it.


> That said, the patch is in line with what we do elsewhere at the moment,
> so I guess OK.
>

Thanks.
Aldy


> Richard.
>
> >
> > jeff
> >
>
>


Re: [og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks (was: [PATCH] [og12] OpenACC: Don't gang-privatize artificial variables)

2022-10-28 Thread Thomas Schwinge
Hi!

On 2022-10-28T10:11:04+0200, I wrote:
> On 2022-10-18T15:59:24+0100, Julian Brown  wrote:
>> On Tue, 18 Oct 2022 16:46:07 +0200 Thomas Schwinge  
>> wrote:
>>> On 2022-10-14T13:38:56+, Julian Brown  wrote:
>>> ..., but to my surprise, that did fire on one occasion:
>>>
>>> > --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>>> > +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>>> > @@ -94,9 +94,7 @@ contains
>>> >  !$acc parallel copy(array)
>>> >  !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
>>> >  ! { dg-note {variable 'i' in 'private' clause isn't candidate for 
>>> > adjusting OpenACC privatization level: not addressable} "" { target *-*-* 
>>> > } l_loop$c_loop }
>>> > -! { dg-note {variable 'array\.[0-9]+' in 'private' clause is 
>>> > candidate for adjusting OpenACC privatization level} "" { target *-*-* } 
>>> > l_loop$c_loop }
>>> > -! { dg-note {variable 'array\.[0-9]+' ought to be adjusted for 
>>> > OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
>>> > -! { dg-note {variable 'array\.[0-9]+' adjusted for OpenACC 
>>> > privatization level: 'gang'} "" { target { ! { openacc_host_selected || { 
>>> > openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
>>> > +! { dg-note {variable 'array\.[0-9]+' in 'private' clause isn't 
>>> > candidate for adjusting OpenACC privatization level: artificial} "" { 
>>> > target *-*-* } l_loop$c_loop }
>>> >  ! { dg-message {sorry, unimplemented: target cannot support alloca} 
>>> > PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
>>> >  do i = 1, 10
>>> >array(i) = 9*i
>>>
>>> ... here.  Note "variable 'array\.[0-9]+' in 'private' clause";
>>> everywhere else we have "declared in block".
>>>
>>> As part of your verification, have you already looked into whether the
>>> new behavior is correct here, or does this one need to continue to be
>>> "adjusted for OpenACC privatization level: 'gang'"?  If the latter,
>>> should we check 'if (res && block && DECL_ARTIFICIAL (decl))' instead
>>> of 'if (res && DECL_ARTIFICIAL (decl))', or is there some wrong
>>> setting of 'DECL_ARTIFICIAL' -- or are we maybe looking at an
>>> inappropriate 'decl'? (Thinking of commit
>>> r12-7580-g7a5e036b61aa088e6b8564bc9383d37dfbb4801e "[OpenACC
>>> privatization] Analyze 'lookup_decl'-translated DECL [PR90115,
>>> PR102330, PR104774]", for example.)
>>
>> I haven't looked in detail, but it seems to me that the "artificial"
>> flag isn't appropriate for that decl, which is (derived from?) a
>> user-visible symbol. So, I'm not sure what's going on there (and yes
>> the commit you mention looks like it could be relevant, I think?).
>> There are probably subtleties I'm not aware of...
>
> Until we've got that worked out, let's simply restrict the
> 'DECL_ARTIFICIAL' handling to 'block's only; pushed to devel/omp/gcc-12
> commit 9a50d282f03f7f1e1ad00de917143a2a8e0c0ee0
> "[og12] OpenACC: Don't gang-privatize artificial variables: restrict to 
> blocks"

..., see attached now really.

Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 9a50d282f03f7f1e1ad00de917143a2a8e0c0ee0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 18 Oct 2022 16:59:54 +0200
Subject: [PATCH] [og12] OpenACC: Don't gang-privatize artificial variables:
 restrict to blocks

Follow-up to og12 commit d4504346d2a1d6ffecb8b2d8e3e04ab8ea259785
"[og12] OpenACC: Don't gang-privatize artificial variables", to restore
the previous behavior, until we understand what it means for a
'DECL_ARTIFICIAL' to appear in a 'private' clause.

	gcc/
	* omp-low.cc (oacc_privatization_candidate_p) :
	Restrict to 'block's.
	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust.
---
 gcc/omp-low.cc  | 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 002f91d930a..66aa11cd32d 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -11409,7 +11409,7 @@ oacc_privatization_candidate_p (const location_t loc, const tree c,
  At present, no compiler-generated artificial variables require such
  sharing semantics, so this is safe.  */
 
-  if (res && DECL_ARTIFICIAL (decl))
+  if (res && block && DECL_ARTIFICIAL (decl))
 {
   res = false;
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
index 936285e9f69..498ef70b63a 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
@@ -94,7 +94,9 

[og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks (was: [PATCH] [og12] OpenACC: Don't gang-privatize artificial variables)

2022-10-28 Thread Thomas Schwinge
Hi!

On 2022-10-18T15:59:24+0100, Julian Brown  wrote:
> On Tue, 18 Oct 2022 16:46:07 +0200 Thomas Schwinge  
> wrote:
>> On 2022-10-14T13:38:56+, Julian Brown  wrote:
>> ..., but to my surprise, that did fire on one occasion:
>>
>> > --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> > +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> > @@ -94,9 +94,7 @@ contains
>> >  !$acc parallel copy(array)
>> >  !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
>> >  ! { dg-note {variable 'i' in 'private' clause isn't candidate for 
>> > adjusting OpenACC privatization level: not addressable} "" { target *-*-* 
>> > } l_loop$c_loop }
>> > -! { dg-note {variable 'array\.[0-9]+' in 'private' clause is 
>> > candidate for adjusting OpenACC privatization level} "" { target *-*-* } 
>> > l_loop$c_loop }
>> > -! { dg-note {variable 'array\.[0-9]+' ought to be adjusted for 
>> > OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
>> > -! { dg-note {variable 'array\.[0-9]+' adjusted for OpenACC 
>> > privatization level: 'gang'} "" { target { ! { openacc_host_selected || { 
>> > openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
>> > +! { dg-note {variable 'array\.[0-9]+' in 'private' clause isn't 
>> > candidate for adjusting OpenACC privatization level: artificial} "" { 
>> > target *-*-* } l_loop$c_loop }
>> >  ! { dg-message {sorry, unimplemented: target cannot support alloca} 
>> > PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
>> >  do i = 1, 10
>> >array(i) = 9*i
>>
>> ... here.  Note "variable 'array\.[0-9]+' in 'private' clause";
>> everywhere else we have "declared in block".
>>
>> As part of your verification, have you already looked into whether the
>> new behavior is correct here, or does this one need to continue to be
>> "adjusted for OpenACC privatization level: 'gang'"?  If the latter,
>> should we check 'if (res && block && DECL_ARTIFICIAL (decl))' instead
>> of 'if (res && DECL_ARTIFICIAL (decl))', or is there some wrong
>> setting of 'DECL_ARTIFICIAL' -- or are we maybe looking at an
>> inappropriate 'decl'? (Thinking of commit
>> r12-7580-g7a5e036b61aa088e6b8564bc9383d37dfbb4801e "[OpenACC
>> privatization] Analyze 'lookup_decl'-translated DECL [PR90115,
>> PR102330, PR104774]", for example.)
>
> I haven't looked in detail, but it seems to me that the "artificial"
> flag isn't appropriate for that decl, which is (derived from?) a
> user-visible symbol. So, I'm not sure what's going on there (and yes
> the commit you mention looks like it could be relevant, I think?).
> There are probably subtleties I'm not aware of...

Until we've got that worked out, let's simply restrict the
'DECL_ARTIFICIAL' handling to 'block's only; pushed to devel/omp/gcc-12
commit 9a50d282f03f7f1e1ad00de917143a2a8e0c0ee0
"[og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks",
see attached.


Regards,
 Thomas
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
Munich; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: Munich; Commercial Register 
Munich, HRB 106955


Re: [PATCH] i386: Enable small loop unrolling for O2

2022-10-28 Thread Hongyu Wang via Gcc-patches
> Ugh, that's all quite ugly and unmaintainable, no?
Agreed, I have the same feeling.

> I'm quite sure that if this works it's not by intention.  Doesn't this
> also disable
> register renaming and web when the user explicitely specifies -funroll-loops?
>
> Doesn't this change -funroll-loops behavior everywhere, only unrolling small
> loops?

The ugly part ensures that -funroll-loops would not be affected at all
by -munroll-only-small-loops.

>
> I'd like to see a -munroll-only-small-loops addition that doesn't have any 
> such
> effects.  Note RTL unrolling could also be
> conditionally enabled on a new -funroll-small-loops which wouldn't enable
> register renaming or web.

Did you mean something like

index b9e07973dd6..b707d4afb84 100644
--- a/gcc/loop-init.cc
+++ b/gcc/loop-init.cc
@@ -567,7 +567,8 @@ public:
   /* opt_pass methods: */
   bool gate (function *) final override
 {
-  return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll);
+  return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll
+ || flag_unroll_only_small_loops);
 }

then the backend can turn it on by default in O2?
I don't know if there is a way to turn on a middle-end pass via
target-specific flags.
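For what it's worth, the gate change sketched above can be modeled in isolation. The snippet below is a standalone illustration only; the flag names mirror the proposal and are assumptions standing in for GCC's option machinery, not actual GCC symbols:

```cpp
#include <cassert>

// Hypothetical stand-ins for the option variables discussed above; in
// GCC these are generated from the .opt files, not plain globals.
static int flag_unroll_loops = 0;
static int flag_unroll_all_loops = 0;
static int flag_unroll_only_small_loops = 0;
static int fun_has_unroll = 0;

// Mirrors the proposed pass_rtl_unroll_loops::gate change: the pass
// also runs when only the small-loop flag was set (e.g. by a backend
// that enables it by default at -O2).
static bool loop_unroll_gate ()
{
  return flag_unroll_loops || flag_unroll_all_loops || fun_has_unroll
	 || flag_unroll_only_small_loops;
}
```

A target could then turn the small-loop flag on from its option-override code, analogous to the ix86_option_optimization_table entries in the patch, without touching -funroll-loops itself.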

Richard Biener via Gcc-patches wrote on Friday, October 28, 2022 at 15:33:
>
> On Wed, Oct 26, 2022 at 7:53 AM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > Inspired by rs6000 and s390 port changes, this patch
> > enables loop unrolling for small size loop at O2 by default.
> > The default behavior is to unroll loops with unknown trip count and
> > fewer than 4 insns once.
> >
> > This improves 548.exchange2 by 3.5% on icelake and 6% on zen3 with
> > 1.2% codesize increase. For other benchmarks the variations are minor
> > and overall codesize increased by 0.2%.
> >
> > The kernel image size increased by 0.06%, and no impact on eembc.
> >
> > Bootstrapped & regtested on x86_64-pc-linux-gnu.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * common/config/i386/i386-common.cc (ix86_optimization_table):
> > Enable loop unroll and small loop unroll at O2 by default.
> > * config/i386/i386-options.cc
> > (ix86_override_options_after_change):
> > Disable small loop unroll when funroll-loops enabled, reset
> > cunroll_grow_size when it is not explicitly enabled.
> > (ix86_option_override_internal): Call
> > ix86_override_options_after_change instead of calling
> > ix86_recompute_optlev_based_flags and ix86_default_align
> > separately.
> > * config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
> > factor if -munroll-only-small-loops enabled.
> > * config/i386/i386.opt: Add -munroll-only-small-loops,
> > -param=x86-small-unroll-ninsns= for loop insn limit,
> > -param=x86-small-unroll-factor= for unroll factor.
> > * doc/invoke.texi: Document -munroll-only-small-loops,
> > x86-small-unroll-ninsns and x86-small-unroll-factor.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr86270.c: Add -mno-unroll-only-small-loops.
> > * gcc.target/i386/pr93002.c: Likewise.
> > ---
> >  gcc/common/config/i386/i386-common.cc   |  6 
> >  gcc/config/i386/i386-options.cc | 40 ++---
> >  gcc/config/i386/i386.cc | 13 
> >  gcc/config/i386/i386.opt| 13 
> >  gcc/doc/invoke.texi | 14 +
> >  gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
> >  7 files changed, 84 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/common/config/i386/i386-common.cc 
> > b/gcc/common/config/i386/i386-common.cc
> > index d6a68dc9b1d..0e580b39d14 100644
> > --- a/gcc/common/config/i386/i386-common.cc
> > +++ b/gcc/common/config/i386/i386-common.cc
> > @@ -1686,6 +1686,12 @@ static const struct default_options 
> > ix86_option_optimization_table[] =
> >  /* The STC algorithm produces the smallest code at -Os, for x86.  */
> >  { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
> >REORDER_BLOCKS_ALGORITHM_STC },
> > +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> > +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 
> > },
> > +/* Turns off -frename-registers and -fweb which are enabled by
> > +   funroll-loops.  */
> > +{ OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
> > +{ OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
>
> I'm quite sure that if this works it's not by intention.  Doesn't this
> also disable
> register renaming and web when the user explicitely specifies -funroll-loops?
>
> Doesn't this change -funroll-loops behavior everywhere, only unrolling small
> loops?
>
> I'd like to see a -munroll-only-small-loops addition that doesn't have any 
> such
> effects.  Note RTL unrolling could also
> conditionally enabled on a new 

[PATCH v3] LoongArch: Libvtv add loongarch support.

2022-10-28 Thread Lulu Cheng
After further consideration, I decided to set VTV_PAGE_SIZE to 16KB under 
loongarch64.


v1 -> v2:

1. When the macro __loongarch_lp64 is defined, the VTV_PAGE_SIZE is set to 64K.
2. In the __vtv_malloc_init function in vtv_malloc.cc, do not check
   whether VTV_PAGE_SIZE is equal to the system page size if the macro
   __loongarch_lp64 is defined.

v2 -> v3:

Set VTV_PAGE_SIZE to 16KB under loongarch64.



All regression tests of libvtv passed.

=== libvtv Summary ===

# of expected passes 176

-

The loongarch64 kernel supports 4KB, 16KB, or 64KB pages,
but only 16KB pages are currently supported in this code.

Co-Authored-By: qijingwen 

include/ChangeLog:

* vtv-change-permission.h (defined):
(VTV_PAGE_SIZE): Set VTV_PAGE_SIZE to 16KB under loongarch64.

libvtv/ChangeLog:

* configure.tgt: Add loongarch support.
---
 include/vtv-change-permission.h | 5 +
 libvtv/configure.tgt| 3 +++
 2 files changed, 8 insertions(+)

diff --git a/include/vtv-change-permission.h b/include/vtv-change-permission.h
index 70bdad92bca..f61d8b68ef6 100644
--- a/include/vtv-change-permission.h
+++ b/include/vtv-change-permission.h
@@ -48,6 +48,11 @@ extern void __VLTChangePermission (int);
 #else 
 #if defined(__sun__) && defined(__svr4__) && defined(__sparc__)
 #define VTV_PAGE_SIZE 8192
+#elif defined(__loongarch_lp64)
+/* The page size can be configured as 4KB, 16KB, or 64KB when configuring the kernel.
+   However, only 16KB pages are supported here. Please modify this macro if you
+   want to support other page sizes.  */
+#define VTV_PAGE_SIZE 16384
 #else
 #define VTV_PAGE_SIZE 4096
 #endif
diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt
index aa2a3f675b8..6cdd1e97ab1 100644
--- a/libvtv/configure.tgt
+++ b/libvtv/configure.tgt
@@ -50,6 +50,9 @@ case "${target}" in
;;
   x86_64-*-darwin[1]* | i?86-*-darwin[1]*)
;;
+  loongarch*-*-linux*)
+   VTV_SUPPORTED=yes
+   ;;
   *)
;;
 esac
-- 
2.31.1
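As a side note, the consistency check that v2 skipped for loongarch64 (and that v3 makes pass by matching the default 16KB kernel page size) is easy to state on its own. The sketch below is a hedged illustration, not the actual libvtv code:

```cpp
#include <cassert>

// Compile-time page size, as chosen in vtv-change-permission.h for
// loongarch64 by this patch.
constexpr long VTV_PAGE_SIZE = 16384;

// Sketch of the check __vtv_malloc_init performs: the compile-time
// VTV page size must match the kernel's runtime page size, since
// mprotect granularity depends on it.
static bool vtv_page_size_ok (long runtime_page_size)
{
  return runtime_page_size == VTV_PAGE_SIZE;
}
```

At runtime the argument would come from sysconf(_SC_PAGESIZE), so a kernel configured for 4KB or 64KB pages would fail the check, which is why the comment in the header asks users of other page sizes to adjust the macro.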



Re: [PATCH] openmp: fix max_vf setting for amdgcn offloading

2022-10-28 Thread Thomas Schwinge
Hi!

In addition to the technical issues pointed out by Jakub for this og12
commit:

On 2022-07-12T15:16:35+0100, Andrew Stubbs  wrote:
> This patch [...]

> I will commit a backport to OG12 shortly.

> openmp: fix max_vf setting for amdgcn offloading

> --- a/gcc/omp-general.h
> +++ b/gcc/omp-general.h

>  extern poly_uint64 omp_max_vf (void);
>  extern int omp_max_simt_vf (void);
> +extern int omp_max_simd_vf (void);

> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -4646,7 +4646,14 @@ lower_rec_simd_input_clauses (tree new_var, 
> omp_context *ctx,
>  {
>if (known_eq (sctx->max_vf, 0U))
>  {
> -  sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf ();
> +  /* If we are compiling for multiple devices choose the largest VF.  */
> +  sctx->max_vf = omp_max_vf ();
> +  if (omp_maybe_offloaded_ctx (ctx))
> + {
> +   if (sctx->is_simt)
> + sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simt_vf ());
> +   sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simd_vf ());
> + }
>if (maybe_gt (sctx->max_vf, 1U))
>   {
> tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),

... I've additionally run into a bootstrap error, and have now pushed
"Resolve '-Wsign-compare' issue in 
'gcc/omp-low.cc:lower_rec_simd_input_clauses'"
to devel/omp/gcc-12 in commit 4e32d1582a137d5f34248fdd3e93d35a798f5221,
see attached.


Regards,
 Thomas


>From 4e32d1582a137d5f34248fdd3e93d35a798f5221 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 25 Oct 2022 09:45:31 +0200
Subject: [PATCH 1/2] Resolve '-Wsign-compare' issue in
 'gcc/omp-low.cc:lower_rec_simd_input_clauses'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

..., introduced in og12 commit 55722a87dd223149dcd41ca9c8eba16ad5b3eddc
"openmp: fix max_vf setting for amdgcn offloading":

In file included from [...]/source-gcc/gcc/coretypes.h:482,
 from [...]/source-gcc/gcc/omp-low.cc:27:
[...]/source-gcc/gcc/poly-int.h: In instantiation of ‘typename if_nonpoly::type maybe_lt(const Ca&, const poly_int_pod&) [with unsigned int N = 1; Ca = int; Cb = long unsigned int; typename if_nonpoly::type = bool]’:
[...]/source-gcc/gcc/poly-int.h:1510:7:   required from ‘poly_int::type>::type> ordered_max(const poly_int_pod&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename poly_result::type>::type = long unsigned int; typename if_nonpoly::type = int]’
[...]/source-gcc/gcc/omp-low.cc:5180:33:   required from here
[...]/source-gcc/gcc/poly-int.h:1384:12: error: comparison of integer expressions of different signedness: ‘const int’ and ‘const long unsigned int’ [-Werror=sign-compare]
 1384 |   return a < b.coeffs[0];
  |  ~~^~~
[...]/source-gcc/gcc/poly-int.h: In instantiation of ‘typename if_nonpoly::type maybe_lt(const poly_int_pod&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename if_nonpoly::type = bool]’:
[...]/source-gcc/gcc/poly-int.h:1515:2:   required from ‘poly_int::type>::type> ordered_max(const poly_int_pod&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename poly_result::type>::type = long unsigned int; typename if_nonpoly::type = int]’
[...]/source-gcc/gcc/omp-low.cc:5180:33:   required from here
[...]/source-gcc/gcc/poly-int.h:1373:22: error: comparison of integer expressions of different signedness: ‘const long unsigned int’ and ‘const int’ [-Werror=sign-compare]
 1373 |   return a.coeffs[0] < b;
  |  ^~~

	gcc/
	* omp-low.cc (lower_rec_simd_input_clauses): For 'ordered_max',
	cast 'omp_max_simt_vf ()', 'omp_max_simd_vf ()' to 'unsigned'.
---
 gcc/omp-low.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index b5b2681b654..002f91d930a 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -5177,8 +5177,8 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
   if (omp_maybe_offloaded_ctx (ctx))
 	{
 	  if (sctx->is_simt)
-	sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simt_vf ());
-	  sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simd_vf ());
+	sctx->max_vf = ordered_max (sctx->max_vf, (unsigned) omp_max_simt_vf ());
+	  sctx->max_vf = ordered_max (sctx->max_vf, (unsigned) omp_max_simd_vf ());
 	}
   if (maybe_gt (sctx->max_vf, 1U))
 	{
-- 
2.35.1
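The warning itself boils down to a small pattern. The sketch below is an illustration only, not GCC's poly-int code: it shows why mixing an int with an unsigned value in a max-style comparison trips -Wsign-compare, and how the cast applied in the patch sidesteps it (assuming, as here, that the target VF is non-negative):

```cpp
#include <algorithm>
#include <cassert>

// omp_max_simt_vf/omp_max_simd_vf return int, while max_vf is an
// unsigned (poly-)type; comparing the two directly is what triggered
// -Wsign-compare.  Casting the int to the unsigned type first, as the
// patch does, gives both operands the same signedness.  The cast is
// only safe because the VF returned is never negative.
static unsigned long ordered_max_sketch (unsigned long max_vf, int target_vf)
{
  return std::max (max_vf, static_cast<unsigned long> (target_vf));
}
```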

>From 1c5087dfff64c40505bcb81b5069781a44bb0b4d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 28 Oct 2022 09:55:22 +0200
Subject: [PATCH 2/2] Resolve '-Wsign-compare' issue in
 'gcc/omp-low.cc:lower_rec_simd_input_clauses': ChangeLog

... 

Re: [committed 6/6] amdgcn: vector testsuite tweaks

2022-10-28 Thread Thomas Schwinge
Hi!

On 2022-10-11T12:02:08+0100, Andrew Stubbs  wrote:
> The testsuite needs a few tweaks following my patches to add multiple vector
> sizes for amdgcn.

While 'grep'ping for some other GCN thing, this:

> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> @@ -46,5 +46,6 @@ int main ()
>  }
>
>  /* { dg-final { scan-tree-dump {(no need for alias check [^\n]* when VF is 
> 1|no alias between [^\n]* when [^\n]* is outside \(-16, 16\))} "vect" { 
> target vect_element_align } } } */
> -/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target 
> vect_element_align } } } */
> +/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target { 
> vect_element_align && !amdgcn-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "loop vectorized" 2 "vect" { target 
> amdgcn-*-* } } } */

... target selector expression '!amdgcn-*-*' struck me as dubious,
so I checked, and now pushed to master branch
commit 0607307768b66a90e27c5bc91a247acc938f070e
"Fix target selector syntax in 'gcc.dg/vect/bb-slp-cond-1.c'", see attached.

Cherry-picked pushed to devel/omp/gcc-12 branch
commit 5f4d2a15403d7231d7be673a9d633c0b4a22e19c
"Fix target selector syntax in 'gcc.dg/vect/bb-slp-cond-1.c'", see attached.


Looking into commit r13-3225-gbd9a05594d227cde79a67dc715bd9d82e9c464e9
"amdgcn: vector testsuite tweaks" for a moment, I also did wonder about
the following changes, because for 'vect_multiple_sizes' (for example,
x86_64-pc-linux-gnu) that seems to lose more specific testing;
previously: 'scan-tree-dump-times' exactly once, now: 'scan-tree-dump'
any number of times.  But I've no clue about that myself, so just
mentioning this, in case somebody else has an opinion.  ;-)

>   * gcc.dg/vect/no-vfa-vect-depend-2.c: Change expectations for multiple
>   vector sizes.
>   * gcc.dg/vect/pr33953.c: Likewise.
>   * gcc.dg/vect/pr65947-12.c: Likewise.
>   * gcc.dg/vect/pr65947-13.c: Likewise.
>   * gcc.dg/vect/pr80631-2.c: Likewise.
>   * gcc.dg/vect/slp-reduc-4.c: Likewise.
>   * gcc.dg/vect/trapv-vect-reduc-4.c: Likewise.

> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
> @@ -51,4 +51,5 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {xfail { 
> vect_no_align && { ! vect_hw_misalign } } } } } */
> -/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" 
> } } */
> +/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" 
> { target { ! vect_multiple_sizes } } } } */
> +/* { dg-final { scan-tree-dump "dependence distance negative" "vect" { 
> target vect_multiple_sizes } } } */

> --- a/gcc/testsuite/gcc.dg/vect/pr33953.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr33953.c
> @@ -29,6 +29,7 @@ void blockmove_NtoN_blend_noremap32 (const UINT32 *srcdata, 
> int srcwidth,
>  }
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { 
> vect_no_align && { ! vect_hw_misalign } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { ! vect_multiple_sizes } xfail { vect_no_align && { ! 
> vect_hw_misalign } } } } } */
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target 
> vect_multiple_sizes xfail { vect_no_align && { ! vect_hw_misalign } } } } } */

> --- a/gcc/testsuite/gcc.dg/vect/pr65947-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr65947-12.c
> @@ -42,5 +42,6 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
> FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */
> +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
> FOLD_EXTRACT_LAST" 2 "vect" { target { vect_fold_extract_last && { ! 
> vect_multiple_sizes } } } } } */
> +/* { dg-final { scan-tree-dump "optimizing condition reduction with 
> FOLD_EXTRACT_LAST" "vect" { target { vect_fold_extract_last && 
> vect_multiple_sizes } } } } */
>  /* { dg-final { scan-tree-dump-not "condition expression based on integer 
> induction." "vect" } } */

> --- a/gcc/testsuite/gcc.dg/vect/pr65947-13.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr65947-13.c
> @@ -44,4 +44,5 @@ main (void)
>
>  /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
>  /* { dg-final { scan-tree-dump-times "condition expression based on integer 
> induction." 2 "vect" { xfail vect_fold_extract_last } } } */
> -/* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
> FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */
> +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
> FOLD_EXTRACT_LAST" 2 "vect" 

Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.

2022-10-28 Thread Richard Biener via Gcc-patches
On Tue, Oct 25, 2022 at 3:00 PM Dimitrije Milosevic
 wrote:
>
> Hi Richard,
>
> > don't you add n_invs twice now given
> >
> >  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> >  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
> >
> > ?
>
> If you are referring to the "If we have enough registers." case, correct. 
> After c18101f,
> for that case, the returned cost is equal to 2 * n_invs + n_cands.

It's n_invs + 2 * n_cands?  And the comment states the reasoning.

> Before c18101f, for
> that case, the returned cost is equal to n_invs + n_cands. Another solution 
> would be
> to just return n_invs + n_cands if we have enough registers.

The comment says we want to prefer eliminating IVs over invariants.  Your patch
undoes that by weighting invariants the same, so it no longer has
the effect described in the comment.

> Regards,
> Dimitrije
>
>
> From: Richard Biener 
> Sent: Tuesday, October 25, 2022 1:07 PM
> To: Dimitrije Milosevic 
> Cc: gcc-patches@gcc.gnu.org ; Djordje Todorovic 
> 
> Subject: Re: [PATCH 2/2] ivopts: Consider number of invariants when 
> calculating register pressure.
>
> On Fri, Oct 21, 2022 at 3:57 PM Dimitrije Milosevic
>  wrote:
> >
> > From: Dimitrije Milošević 
> >
> > This patch slightly modifies register pressure model function to consider
> > both the number of invariants and the number of candidates, rather than
> > just the number of candidates. This used to be the case before c18101f.
>
> don't you add n_invs twice now given
>
>   unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
>   unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
>
> ?
>
> > gcc/ChangeLog:
> >
> > * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
> >
> > Signed-off-by: Dimitrije Milosevic 
> > ---
> >  gcc/tree-ssa-loop-ivopts.cc | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index d53ba05a4f6..9d0b669d671 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -6409,9 +6409,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data 
> > *data, unsigned n_invs,
> >+ target_spill_cost [speed] * (n_cands - available_regs) * 2
> >+ target_spill_cost [speed] * (regs_needed - n_cands);
> >
> > -  /* Finally, add the number of candidates, so that we prefer eliminating
> > - induction variables if possible.  */
> > -  return cost + n_cands;
> > +  /* Finally, add the number of invariants and the number of candidates,
> > + so that we prefer eliminating induction variables if possible.  */
> > +  return cost + n_invs + n_cands;
> >  }
> >
> >  /* For each size of the induction variable set determine the penalty.  */
> > --
> > 2.25.1
> >


Re: [PATCH] i386: Enable small loop unrolling for O2

2022-10-28 Thread Richard Biener via Gcc-patches
On Wed, Oct 26, 2022 at 7:53 AM Hongyu Wang  wrote:
>
> Hi,
>
> Inspired by rs6000 and s390 port changes, this patch
> enables loop unrolling for small size loop at O2 by default.
> The default behavior is to unroll loops with unknown trip count and
> fewer than 4 insns once.
>
> This improves 548.exchange2 by 3.5% on icelake and 6% on zen3 with
> 1.2% codesize increase. For other benchmarks the variations are minor
> and overall codesize increased by 0.2%.
>
> The kernel image size increased by 0.06%, and no impact on eembc.
>
> Bootstrapped & regtested on x86_64-pc-linux-gnu.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * common/config/i386/i386-common.cc (ix86_optimization_table):
> Enable loop unroll and small loop unroll at O2 by default.
> * config/i386/i386-options.cc
> (ix86_override_options_after_change):
> Disable small loop unroll when funroll-loops enabled, reset
> cunroll_grow_size when it is not explicitly enabled.
> (ix86_option_override_internal): Call
> ix86_override_options_after_change instead of calling
> ix86_recompute_optlev_based_flags and ix86_default_align
> separately.
> * config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
> factor if -munroll-only-small-loops enabled.
> * config/i386/i386.opt: Add -munroll-only-small-loops,
> -param=x86-small-unroll-ninsns= for loop insn limit,
> -param=x86-small-unroll-factor= for unroll factor.
> * doc/invoke.texi: Document -munroll-only-small-loops,
> x86-small-unroll-ninsns and x86-small-unroll-factor.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr86270.c: Add -mno-unroll-only-small-loops.
> * gcc.target/i386/pr93002.c: Likewise.
> ---
>  gcc/common/config/i386/i386-common.cc   |  6 
>  gcc/config/i386/i386-options.cc | 40 ++---
>  gcc/config/i386/i386.cc | 13 
>  gcc/config/i386/i386.opt| 13 
>  gcc/doc/invoke.texi | 14 +
>  gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
>  7 files changed, 84 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index d6a68dc9b1d..0e580b39d14 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1686,6 +1686,12 @@ static const struct default_options 
> ix86_option_optimization_table[] =
>  /* The STC algorithm produces the smallest code at -Os, for x86.  */
>  { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
>REORDER_BLOCKS_ALGORITHM_STC },
> +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
> +/* Turns off -frename-registers and -fweb which are enabled by
> +   funroll-loops.  */
> +{ OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
> +{ OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },

I'm quite sure that if this works it's not by intention.  Doesn't this
also disable
register renaming and web when the user explicitely specifies -funroll-loops?

Doesn't this change -funroll-loops behavior everywhere, only unrolling small
loops?

I'd like to see a -munroll-only-small-loops addition that doesn't have any such
effects.  Note RTL unrolling could also be
conditionally enabled on a new -funroll-small-loops which wouldn't enable
register renaming or web.

>  /* Turn off -fschedule-insns by default.  It tends to make the
> problem with not enough registers even worse.  */
>  { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index acb2291e70f..6ea347c32e1 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1819,8 +1819,43 @@ ix86_recompute_optlev_based_flags (struct gcc_options 
> *opts,
>  void
>  ix86_override_options_after_change (void)
>  {
> +  /* Default align_* from the processor table.  */
>ix86_default_align (_options);
> +
>ix86_recompute_optlev_based_flags (_options, _options_set);
> +
> +  /* Disable unrolling small loops when there's explicit
> + -f{,no}unroll-loop.  */
> +  if ((OPTION_SET_P (flag_unroll_loops))
> + || (OPTION_SET_P (flag_unroll_all_loops)
> +&& flag_unroll_all_loops))
> +{
> +  if (!OPTION_SET_P (ix86_unroll_only_small_loops))
> +   ix86_unroll_only_small_loops = 0;
> +  /* Re-enable -frename-registers and -fweb if funroll-loops
> +enabled.  */
> +  if (!OPTION_SET_P (flag_web))
> +   flag_web = flag_unroll_loops;
> +  if (!OPTION_SET_P (flag_rename_registers))
> +   flag_rename_registers = flag_unroll_loops;
> +  if (!OPTION_SET_P (flag_cunroll_grow_size))
> +   flag_cunroll_grow_size = flag_unroll_loops

Re: RFC - VRP1 default mode

2022-10-28 Thread Richard Biener via Gcc-patches
On Wed, Oct 26, 2022 at 4:24 PM Andrew MacLeod  wrote:
>
> Figured I would ask what you guys think of making ranger the default for
> the VRP1 pass now.
>
> With partial equivalences and the other bits I checked in the past few
> weeks I'm not aware of much that the legacy VRP pass gets that ranger
> doesn't.  The only exception to that which I am aware of is the trick
> played with the unreachable edges to set global ranges, but that is done
> in the DOM passes now anyway... so it just happens slightly later in the
> optimization cycle.

Note DOM should go away at some point.  Why can this not happen during
ranger-driven VRP?

> There is one test case that needs adjustment for
> that which was just checking for a mask in DOM2
> (gcc.dg/tree-ssa/pr107009.c).   At this point I have not aware of
> anything that Id be concerned about, and the testsuite seems to run
> cleanly.

Did you enable Ada?  The only feature I don't see implemented is
symbolic range handling which boils down to general base + constant offset
range endpoints (that's what symbolic ranges allow).  That area was
specifically improved to optimize range checks emitted by the Ada frontend
but IIRC also applies to fortran -frange-check (not sure about test coverage
of that).

> We could change the default now and see if any issues show up, giving us
> a chance to address them. The code base has been well exercised for a
> while so risk should be low.  We could also reduce code size by
> stripping out unneeded code if we so desired.
>
> Or we could leave things as they are for one more cycle.  My preference
> would be to make the switch now and let it play out. Thoughts?
>
> Andrew
>
>
>
>


Re: [PATCH] ix86: Suggest unroll factor for loop vectorization

2022-10-28 Thread Richard Biener via Gcc-patches
On Wed, Oct 26, 2022 at 1:38 PM Cui, Lili  wrote:
>
> Hi Richard,
>
> > +@item x86-vect-unroll-min-ldst-threshold
> > +The vectorizer will check with target information to determine whether
> > +unroll it. This parameter is used to limit the mininum of loads and
> > +stores in the main loop.
> >
> > It's odd to "limit" the minimum number of something.  I think this warrants
> > clarification that for some (unknow to me ;)) reason we think that when we
> > have many loads and (or?) stores it is beneficial to unroll to get even more
> > loads and stores in a single iteration.  Btw, does the parameter limit the
> > number of loads and stores _after_ unrolling or before?
> >
> When the number of loads/stores exceeds the threshold, the loads/stores are
> more likely to conflict with the loop itself in the L1 cache (assuming that
> the addresses of the loads are scattered).
> Unrolling plus software scheduling will move 2 or 4 address-contiguous
> loads/stores closer together, which reduces the cache miss rate.

Ah, nice.  Can we express the default as a function of L1 data cache
size, L1 cache line size and
more importantly, the size of the vector memory access?

Btw, I was looking into making a more meaningful cost modeling for loop
distribution.  Similar reasoning might apply there - try to _reduce_ the
number of memory streams so L1 cache utilization allows re-use of a
cache line in the next [next N] iteration[s]?  OTOH given L1D is quite
large I'd expect the loops affected to be either quite huge or bottlenecked
by load/store bandwidth (there are 1024 L1D cache lines in zen2 for
example) - what's the effective L1D load you are keying off?
Btw, how does L1D allocation on stores play a role here?
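The cache-utilization argument being discussed can be made concrete with a back-of-the-envelope model. Everything below is an assumption for illustration; it is not something the vectorizer actually computes:

```cpp
#include <cassert>

// Cache lines touched per loop iteration by `streams` distinct memory
// streams, each accessing `access_bytes` bytes per iteration, with
// cache lines of `line_bytes` bytes.
static unsigned lines_per_iter (unsigned streams, unsigned access_bytes,
				unsigned line_bytes)
{
  unsigned lines_per_stream = (access_bytes + line_bytes - 1) / line_bytes;
  return streams * lines_per_stream;
}

// Rough check of whether one iteration's working set fits in an L1D
// of `cache_lines` lines, i.e. whether next-iteration reuse of a line
// is plausible before eviction.
static bool fits_in_l1d (unsigned streams, unsigned access_bytes,
			 unsigned line_bytes, unsigned cache_lines)
{
  return lines_per_iter (streams, access_bytes, line_bytes) <= cache_lines;
}
```

With 64-byte lines and full-line vector accesses, even dozens of streams fit comfortably in a 1024-line L1D, which is the point of the question above: the heuristic only matters for loops with very many streams or with reuse distances near the cache capacity.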

> > +@item x86-vect-unroll-max-loop-size
> > +The vectorizer will check with target information to determine whether
> > +unroll it. This threshold is used to limit the max size of loop body after
> > unrolling.
> > +The default value is 200.
> >
> > it should probably say not "size" but "number of instructions".  Note that 
> > 200
> > is quite large given we are talking about vector instructions here which 
> > have
> > larger encodings than scalar instructions.  Optimistically assuming
> > 4 byte encoding (quite optimistic given we're looking at loops with many
> > loads/stores) that would be an 800 byte loop body which would be 25 cache
> > lines.
> > ISTR that at least the loop discovery is limited to a lot smaller cases 
> > (but we
> > are likely not targeting that).  The limit probably still works to fit the 
> > loop
> > body in the u-op caches though.
> >
> Agree with you, it should be "x86-vect-unroll-max-loop-insns". Thanks for the 
> reminder about larger encodings, I checked the skylake uop cache, it can hold 
> 1.5k uOPs, 200 * 2 (1~3 uops/instruction) = 400 uops. I think 200 still works.
>
> > That said, the heuristic made me think "what the heck".  Can we explain in 
> > u-
> > arch terms why the unrolling is beneficial instead of just deferring to SPEC
> > CPU 2017 fotonik?
> >
> Regarding the benefits, as I explained in my first answer, I checked the 5
> hottest functions in 549; they all benefit from it, since it improves the
> cache hit ratio.
>
> Thanks,
> Lili.
>
> > > On Mon, Oct 24, 2022 at 10:46 AM Cui,Lili via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi Hongtao,
> > > >
> > > > This patch introduces function finish_cost and
> > > > determine_suggested_unroll_factor for x86 backend, to make it be
> > > > able to suggest the unroll factor for a given loop being vectorized.
> > > > Referring to aarch64, RS6000 backends and basing on the analysis on
> > > > SPEC2017 performance evaluation results.
> > > >
> > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > >
> > > > OK for trunk?
> > > >
> > > >
> > > >
> > > > With this patch, SPEC2017 performance evaluation results on
> > > > ICX/CLX/ADL/Znver3 are listed below:
> > > >
> > > > For single copy:
> > > >   - ICX: 549.fotonik3d_r +6.2%, the others are neutral
> > > >   - CLX: 549.fotonik3d_r +1.9%, the others are neutral
> > > >   - ADL: 549.fotonik3d_r +4.5%, the others are neutral
> > > >   - Znver3: 549.fotonik3d_r +4.8%, the others are neutral
> > > >
> > > > For multi-copy:
> > > >   - ADL: 549.fotonik3d_r +2.7%, the others are neutral
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/i386.cc (class ix86_vector_costs): Add new members
> > > >  m_nstmts, m_nloads, m_nstores and
> > determine_suggested_unroll_factor.
> > > > (ix86_vector_costs::add_stmt_cost): Update for m_nstores,
> > m_nloads
> > > > and m_nstores.
> > > > (ix86_vector_costs::determine_suggested_unroll_factor): New
> > function.
> > > > (ix86_vector_costs::finish_cost): Ditto.
> > > > * config/i386/i386.opt:(x86-vect-unroll-limit): New parameter.
> > > > (x86-vect-unroll-min-ldst-threshold): Likewise.
> > > > (x86-vect-unroll-max-loop-size): Likewise.
> > > > * doc/invoke.texi: Document 

Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
 wrote:
>
> Hi Jeff,
>
> > THe part I don't understand is, if you only have BASE+OFF, why does
> > preventing the calculation of more complex addressing modes matter?  ie,
> > what's the point of computing the cost of something like base + off +
> > scaled index when the target can't utilize it?
>
> Well, the complexities of all addressing modes other than BASE + OFFSET are
> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would 
> still
> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> than a candidate with BASE + INDEX, for example, as it has to compensate
> for the lack of other addressing modes somehow. If complexities for both of
> those are equal to 0, in cases where complexities decide which candidate is
> to be chosen, a more complex candidate may be picked.

But something is wrong then - it shouldn't ever pick a candidate with
an addressing
mode that isn't supported?  So you say that the cost of expressing
'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
accurately?

The function tries to compensate for that, maybe you can point out
where it goes wrong?
That is, at the end it adjusts cost and complexity based on what it
scrapped before, maybe
that is just a bit incomplete?

Note the original author of this is not available so it would help
(maybe also yourself) to
walk through the function with a specific candidate / use where you
think the complexity
(or cost) is wrong?


> Regards,
> Dimitrije
>
>
> From: Jeff Law 
> Sent: Friday, October 28, 2022 1:02 AM
> To: Dimitrije Milosevic ; 
> gcc-patches@gcc.gnu.org 
> Cc: Djordje Todorovic 
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost 
> complexity.
>
>
> On 10/21/22 07:52, Dimitrije Milosevic wrote:
> > From: Dimitrije Milošević 
> >
> > This patch reverts the computation of address cost complexity
> > to the legacy one. After f9f69dd, complexity is calculated
> > using the valid_mem_ref_p target hook. Architectures like
> > Mips only allow BASE + OFFSET addressing modes, which in turn
> > prevents the calculation of complexity for other addressing
> > modes, resulting in non-optimal candidate selection.
> >
> > gcc/ChangeLog:
> >
> >* tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
> >to non-static.
> >* tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
> >* tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): 
> > Reintroduce.
> >(compute_min_and_max_offset): Likewise.
> >(get_address_cost): Revert
> >complexity calculation.
>
> The part I don't understand is, if you only have BASE+OFF, why does
> preventing the calculation of more complex addressing modes matter?  ie,
> what's the point of computing the cost of something like base + off +
> scaled index when the target can't utilize it?
>
>
> jeff
>


Re: [PATCH] [PR tree-optimization/107394] Canonicalize global franges as they are read back.

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, Oct 28, 2022 at 8:48 AM Richard Biener
 wrote:
>
> On Fri, Oct 28, 2022 at 12:45 AM Jeff Law  wrote:
> >
> >
> > On 10/25/22 15:01, Aldy Hernandez via Gcc-patches wrote:
> > > [Richi/Jakub/FP experts, does this sound like the right solution, or am I
> > > missing some subtle IPA/inlining issue?]
> > >
> > > The problem here is that we're inlining a global range with NANs into
> > > a function that has been tagged with __attribute__((optimize
> > > ("-ffinite-math-only"))).  As the global range is copied from
> > > SSA_NAME_RANGE_INFO, its NAN bits are copied, which then cause
> > > frange::verify_range() to fail a sanity check making sure no NANs
> > > creep in when !HONOR_NANS.
> > >
> > > I think what we should do is nuke the NAN bits as we're restoring the
> > > global range.  For that matter, if we use the frange constructor,
> > > everything except that NAN sign will be done automatically, including
> > > dropping INFs to the min/max representable range when appropriate.
> > >
> > >   PR tree-optimization/107394
> > >
> > > gcc/ChangeLog:
> > >
> > >   * value-range-storage.cc (frange_storage_slot::get_frange): Use
> > >   frange constructor.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/tree-ssa/pr107394.c: New test.
> >
> > The other approach would be to disable inlining in this case due to an
> > unsafe attribute mismatch, but we're not currently doing much sanity
> > checking in this space and it might be a huge can of worms.  I'm
> > inclined to ACK, but give Jakub and Richi until Monday to chime in first.
>
> We are actually quite careful in this regard but maybe our reasoning
> is wrong.  We are allowing inlining of -fno-finite-math-only into
> -ffinite-math-only code but not the other way around.
>
> On the actual patch I think that ranges with Inf/NaNs should be always
> treated as "valid", the optimization to trim them with certain options
> is optimization and thus optional.  So IMHO having verify_range ICE
> on NaNs isn't correct?

Just to make a point here - in functions with -ffinite-math-only in effect

volatile double x = __builtin_nan("");

will still have a literal NaN in the IL and that's not invalid GIMPLE.  You
cannot assume that no NaNs appear with -ffinite-math-only, you just
don't need to specially care about preserving them.

> That said, the patch is in line with what we do elsewhere at the moment,
> so I guess OK.
>
> Richard.
>
> >
> > jeff
> >


Re: [PATCH] [PR tree-optimization/107394] Canonicalize global franges as they are read back.

2022-10-28 Thread Richard Biener via Gcc-patches
On Fri, Oct 28, 2022 at 12:45 AM Jeff Law  wrote:
>
>
> On 10/25/22 15:01, Aldy Hernandez via Gcc-patches wrote:
> > [Richi/Jakub/FP experts, does this sound like the right solution, or am I
> > missing some subtle IPA/inlining issue?]
> >
> > The problem here is that we're inlining a global range with NANs into
> > a function that has been tagged with __attribute__((optimize
> > ("-ffinite-math-only"))).  As the global range is copied from
> > SSA_NAME_RANGE_INFO, its NAN bits are copied, which then cause
> > frange::verify_range() to fail a sanity check making sure no NANs
> > creep in when !HONOR_NANS.
> >
> > I think what we should do is nuke the NAN bits as we're restoring the
> > global range.  For that matter, if we use the frange constructor,
> > everything except that NAN sign will be done automatically, including
> > dropping INFs to the min/max representable range when appropriate.
> >
> >   PR tree-optimization/107394
> >
> > gcc/ChangeLog:
> >
> >   * value-range-storage.cc (frange_storage_slot::get_frange): Use
> >   frange constructor.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/tree-ssa/pr107394.c: New test.
>
> The other approach would be to disable inlining in this case due to an
> unsafe attribute mismatch, but we're not currently doing much sanity
> checking in this space and it might be a huge can of worms.  I'm
> inclined to ACK, but give Jakub and Richi until Monday to chime in first.

We are actually quite careful in this regard but maybe our reasoning
is wrong.  We are allowing inlining of -fno-finite-math-only into
-ffinite-math-only code but not the other way around.

On the actual patch I think that ranges with Inf/NaNs should be always
treated as "valid", the optimization to trim them with certain options
is optimization and thus optional.  So IMHO having verify_range ICE
on NaNs isn't correct?

That said, the patch is in line with what we do elsewhere at the moment,
so I guess OK.

Richard.

>
> jeff
>


Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.

2022-10-28 Thread Dimitrije Milosevic
Hi Jeff,

> The part I don't understand is, if you only have BASE+OFF, why does 
> preventing the calculation of more complex addressing modes matter?  ie, 
> what's the point of computing the cost of something like base + off + 
> scaled index when the target can't utilize it?

Well, the complexities of all addressing modes other than BASE + OFFSET are
equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
than a candidate with BASE + INDEX, for example, as it has to compensate
for the lack of other addressing modes somehow. If complexities for both of
those are equal to 0, in cases where complexities decide which candidate is
to be chosen, a more complex candidate may be picked.

Regards,
Dimitrije


From: Jeff Law 
Sent: Friday, October 28, 2022 1:02 AM
To: Dimitrije Milosevic ; 
gcc-patches@gcc.gnu.org 
Cc: Djordje Todorovic 
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity. 
 

On 10/21/22 07:52, Dimitrije Milosevic wrote:
> From: Dimitrije Milošević 
>
> This patch reverts the computation of address cost complexity
> to the legacy one. After f9f69dd, complexity is calculated
> using the valid_mem_ref_p target hook. Architectures like
> Mips only allow BASE + OFFSET addressing modes, which in turn
> prevents the calculation of complexity for other addressing
> modes, resulting in non-optimal candidate selection.
>
> gcc/ChangeLog:
>
>    * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
>    to non-static.
>    * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
>    * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): 
>Reintroduce.
>    (compute_min_and_max_offset): Likewise.
>    (get_address_cost): Revert
>    complexity calculation.

The part I don't understand is, if you only have BASE+OFF, why does 
preventing the calculation of more complex addressing modes matter?  ie, 
what's the point of computing the cost of something like base + off + 
scaled index when the target can't utilize it?


jeff



Re: [PATCH] i386: using __bf16 for AVX512BF16 intrinsics

2022-10-28 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 28, 2022 at 2:20 PM Kong, Lingling via Gcc-patches
 wrote:
>
> Hi,
>
> Previously we used unsigned short to represent bf16. It's not a good
> representation, and at the time the front end didn't support the bf16 type.
> Now we introduced __bf16 to X86 psABI. So we can switch intrinsics to the new 
> type.
>
> Ok for trunk ?
LGTM, but please don't commit it until next week to leave some time
for others to take a look.
Also please update GCC13 doc for it.
https://gcc.gnu.org/gcc-13/changes.html.
>
> Thanks,
> Lingling
>
> gcc/ChangeLog:
>
> * config/i386/avx512bf16intrin.h (__attribute__): Change short to 
> bf16.
> (_mm_cvtsbh_ss): Ditto.
> (_mm512_cvtne2ps_pbh): Ditto.
> (_mm512_mask_cvtne2ps_pbh): Ditto.
> (_mm512_maskz_cvtne2ps_pbh): Ditto.
> * config/i386/avx512bf16vlintrin.h (__attribute__): Ditto.
> (_mm256_cvtne2ps_pbh): Ditto.
> (_mm256_mask_cvtne2ps_pbh): Ditto.
> (_mm256_maskz_cvtne2ps_pbh): Ditto.
> (_mm_cvtne2ps_pbh): Ditto.
> (_mm_mask_cvtne2ps_pbh): Ditto.
> (_mm_maskz_cvtne2ps_pbh): Ditto.
> (_mm_cvtness_sbh): Ditto.
> * config/i386/i386-builtin-types.def (V8BF): Add new
> DEF_VECTOR_TYPE for BFmode.
> (V16BF): Ditto.
> (V32BF): Ditto.
> * config/i386/i386-builtin.def (BDESC): Fixed builtins.
> * config/i386/i386-expand.cc (ix86_expand_args_builtin): Changed
> avx512bf16 ix86_builtin_func_type included HI to BF.
> * config/i386/immintrin.h: Add SSE2 depend for avx512bf16.
> * config/i386/sse.md (TARGET_AVX512VL): Changed HI vector to BF
> vector.
> (avx512f_cvtneps2bf16_v4sf): New define_expand.
> (*avx512f_cvtneps2bf16_v4sf): New define_insn.
> (avx512f_cvtneps2bf16_v4sf_maskz):Ditto.
> (avx512f_cvtneps2bf16_v4sf_mask): Ditto.
> (avx512f_cvtneps2bf16_v4sf_mask_1): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Add fpmath option.
> * gcc.target/i386/avx512bf16-vdpbf16ps-2.c: Fixed
> scan-assembler.
> * gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Add x/y suffix
> for vcvtneps2bf16.
> * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c: Ditto.
> ---
>  gcc/config/i386/avx512bf16intrin.h|  12 +--
>  gcc/config/i386/avx512bf16vlintrin.h  |  29 ++---
>  gcc/config/i386/i386-builtin-types.def|  51 -
>  gcc/config/i386/i386-builtin.def  |  54 +-
>  gcc/config/i386/i386-expand.cc|  48 -
>  gcc/config/i386/immintrin.h   |   2 +
>  gcc/config/i386/sse.md| 101 ++
>  .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c  |   2 +-
>  .../gcc.target/i386/avx512bf16-vdpbf16ps-2.c  |   2 +-
>  .../i386/avx512bf16vl-cvtness2sbh-1.c |   2 +-
>  .../i386/avx512bf16vl-vcvtneps2bf16-1.c   |  12 +--
>  11 files changed, 189 insertions(+), 126 deletions(-)
>
> diff --git a/gcc/config/i386/avx512bf16intrin.h 
> b/gcc/config/i386/avx512bf16intrin.h
> index b6e9ddad157..ea1d0125b3f 100644
> --- a/gcc/config/i386/avx512bf16intrin.h
> +++ b/gcc/config/i386/avx512bf16intrin.h
> @@ -35,16 +35,16 @@
>  #endif /* __AVX512BF16__ */
>
>  /* Internal data types for implementing the intrinsics.  */
> -typedef short __v32bh __attribute__ ((__vector_size__ (64)));
> +typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64)));
>
>  /* The Intel API is flexible enough that we must allow aliasing with other
> vector types, and their scalar components.  */
> -typedef short __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
> +typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), 
> __may_alias__));
>
>  /* Convert One BF16 Data to One Single Float Data.  */
>  extern __inline float
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_cvtsbh_ss (__bfloat16 __A)
> +_mm_cvtsbh_ss (__bf16 __A)
>  {
>union{ float a; unsigned int b;} __tmp;
>__tmp.b = ((unsigned int)(__A)) << 16;
> @@ -57,21 +57,21 @@ extern __inline __m512bh
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_cvtne2ps_pbh (__m512 __A, __m512 __B)
>  {
> -  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi(__A, __B);
> +  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf(__A, __B);
>  }
>
>  extern __inline __m512bh
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_cvtne2ps_pbh (__m512bh __A, __mmask32 __B, __m512 __C, __m512 
> __D)
>  {
> -  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi_mask(__C, __D, __A, 
> __B);
> +  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf_mask(__C, __D, __A, 
> __B);
>  }
>
>  extern __inline __m512bh
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_maskz_cvtne2ps_pbh (__mmask32 __A, __m512 __B, __m512 __C)
>  {
> -  return 

Re: [PATCH] Convert flag_finite_math_only uses in frange to HONOR_*.

2022-10-28 Thread Aldy Hernandez via Gcc-patches
On Fri, Oct 28, 2022 at 1:00 AM Jeff Law  wrote:
>
>
> On 10/25/22 14:59, Aldy Hernandez via Gcc-patches wrote:
> > [As Richi, and probably Jakub, have mentioned in the past...]
> >
> > As mentioned earlier, we should be using HONOR_* on types rather than
> > flag_finite_math_only.
> >
> > Will commit pending tests.
> >
> > gcc/ChangeLog:
> >
> >   * value-range.cc (frange::set): Use HONOR_*.
> >   (frange::verify_range): Same.
> >   * value-range.h (frange_val_min): Same.
> >   (frange_val_max): Same.
>
> I haven't verified it's this patch, but our friend the vax regression is
> back:

Bah.  I suck.  There was one remaining use of flag_finite_math_only in
the self tests.  Fixed and finally done:

$ grep flag_finite *range*
value-range.cc:  int save_finite_math_only = flag_finite_math_only;
value-range.cc:  flag_finite_math_only = 1;
value-range.cc:  flag_finite_math_only = 0;
value-range.cc:  flag_finite_math_only = save_finite_math_only;

Aldy
From dc55841d9a45a2d93eaedd68841f7514723939d1 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Fri, 28 Oct 2022 08:13:38 +0200
Subject: [PATCH] Change remaining flag_finite_math_only use in value-range.cc.

gcc/ChangeLog:

	* value-range.cc (range_tests_floats): Use HONOR_INFINITIES.
---
 gcc/value-range.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 77e5a2cc299..03b3c4b4a65 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -4031,7 +4031,7 @@ range_tests_floats ()
   r0.intersect (r1);
   ASSERT_TRUE (r0.undefined_p ());
 
-  if (!flag_finite_math_only)
+  if (HONOR_INFINITIES (float_type_node))
 {
   // Make sure [-Inf, -Inf] doesn't get normalized.
   r0 = frange_float ("-Inf", "-Inf");
-- 
2.37.3



[PATCH] i386: using __bf16 for AVX512BF16 intrinsics

2022-10-28 Thread Kong, Lingling via Gcc-patches
Hi,

Previously we used unsigned short to represent bf16. It's not a good
representation, and at the time the front end didn't support the bf16 type.
Now we introduced __bf16 to X86 psABI. So we can switch intrinsics to the new 
type.

Ok for trunk ?

Thanks,
Lingling

gcc/ChangeLog:

* config/i386/avx512bf16intrin.h (__attribute__): Change short to bf16.
(_mm_cvtsbh_ss): Ditto.
(_mm512_cvtne2ps_pbh): Ditto.
(_mm512_mask_cvtne2ps_pbh): Ditto.
(_mm512_maskz_cvtne2ps_pbh): Ditto.
* config/i386/avx512bf16vlintrin.h (__attribute__): Ditto.
(_mm256_cvtne2ps_pbh): Ditto.
(_mm256_mask_cvtne2ps_pbh): Ditto.
(_mm256_maskz_cvtne2ps_pbh): Ditto.
(_mm_cvtne2ps_pbh): Ditto.
(_mm_mask_cvtne2ps_pbh): Ditto.
(_mm_maskz_cvtne2ps_pbh): Ditto.
(_mm_cvtness_sbh): Ditto.
* config/i386/i386-builtin-types.def (V8BF): Add new
DEF_VECTOR_TYPE for BFmode.
(V16BF): Ditto.
(V32BF): Ditto.
* config/i386/i386-builtin.def (BDESC): Fixed builtins.
* config/i386/i386-expand.cc (ix86_expand_args_builtin): Changed
avx512bf16 ix86_builtin_func_type included HI to BF.
* config/i386/immintrin.h: Add SSE2 depend for avx512bf16.
* config/i386/sse.md (TARGET_AVX512VL): Changed HI vector to BF
vector.
(avx512f_cvtneps2bf16_v4sf): New define_expand.
(*avx512f_cvtneps2bf16_v4sf): New define_insn.
(avx512f_cvtneps2bf16_v4sf_maskz):Ditto.
(avx512f_cvtneps2bf16_v4sf_mask): Ditto.
(avx512f_cvtneps2bf16_v4sf_mask_1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Add fpmath option.
* gcc.target/i386/avx512bf16-vdpbf16ps-2.c: Fixed
scan-assembler.
* gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Add x/y suffix
for vcvtneps2bf16.
* gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c: Ditto.
---
 gcc/config/i386/avx512bf16intrin.h|  12 +--
 gcc/config/i386/avx512bf16vlintrin.h  |  29 ++---
 gcc/config/i386/i386-builtin-types.def|  51 -
 gcc/config/i386/i386-builtin.def  |  54 +-
 gcc/config/i386/i386-expand.cc|  48 -
 gcc/config/i386/immintrin.h   |   2 +
 gcc/config/i386/sse.md| 101 ++
 .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c  |   2 +-
 .../gcc.target/i386/avx512bf16-vdpbf16ps-2.c  |   2 +-
 .../i386/avx512bf16vl-cvtness2sbh-1.c |   2 +-
 .../i386/avx512bf16vl-vcvtneps2bf16-1.c   |  12 +--
 11 files changed, 189 insertions(+), 126 deletions(-)

diff --git a/gcc/config/i386/avx512bf16intrin.h 
b/gcc/config/i386/avx512bf16intrin.h
index b6e9ddad157..ea1d0125b3f 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -35,16 +35,16 @@
 #endif /* __AVX512BF16__ */
 
 /* Internal data types for implementing the intrinsics.  */
-typedef short __v32bh __attribute__ ((__vector_size__ (64)));
+typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64)));
 
 /* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components.  */
-typedef short __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
 
 /* Convert One BF16 Data to One Single Float Data.  */
 extern __inline float
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsbh_ss (__bfloat16 __A)
+_mm_cvtsbh_ss (__bf16 __A)
 {
   union{ float a; unsigned int b;} __tmp;
   __tmp.b = ((unsigned int)(__A)) << 16;
@@ -57,21 +57,21 @@ extern __inline __m512bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_cvtne2ps_pbh (__m512 __A, __m512 __B)
 {
-  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi(__A, __B);
+  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf(__A, __B);
 }
 
 extern __inline __m512bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_cvtne2ps_pbh (__m512bh __A, __mmask32 __B, __m512 __C, __m512 __D)
 {
-  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi_mask(__C, __D, __A, __B);
+  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf_mask(__C, __D, __A, __B);
 }
 
 extern __inline __m512bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_cvtne2ps_pbh (__mmask32 __A, __m512 __B, __m512 __C)
 {
-  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi_maskz(__B, __C, __A);
+  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf_maskz(__B, __C, __A);
 }
 
 /* vcvtneps2bf16 */
diff --git a/gcc/config/i386/avx512bf16vlintrin.h 
b/gcc/config/i386/avx512bf16vlintrin.h
index 969335ff358..56c28f14cf6 100644
--- a/gcc/config/i386/avx512bf16vlintrin.h
+++ b/gcc/config/i386/avx512bf16vlintrin.h
@@ -35,57 +35,58 @@
 #endif /* __AVX512BF16__ */
 
 /* Internal data types for 

Re: [PATCH] x86: Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns

2022-10-28 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 28, 2022 at 1:56 PM Hongtao Liu  wrote:
>
> On Thu, Oct 27, 2022 at 2:59 AM H.J. Lu via Gcc-patches
>  wrote:
> >
> > In i386.md, neg patterns which set MODE_CC register like
> >
> > (set (reg:CCC FLAGS_REG)
> >  (ne:CCC (match_operand:SWI48 1 "general_reg_operand") (const_int 0)))
> >
> > can lead to errors when operand 1 is a constant value.  If FLAGS_REG in
> >
> > (set (reg:CCC FLAGS_REG)
> >  (ne:CCC (const_int 2) (const_int 0)))
> >
> > is set to 1, RTX simplifiers may simplify
> >
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (reg:CCC 17 flags) (const_int 0 [0]
> >
> > as
> >
> > (set (reg:SI 93)
> >  (neg:SI (ltu:SI (const_int 1) (const_int 0 [0]
> >
> > which leads to incorrect results since LTU on MODE_CC register isn't the
> > same as "unsigned less than" in x86 backend.  To prevent RTL optimizers
> > from setting MODE_CC register to a constant, use UNSPEC_CC_NE to replace
> > ne:CCC/ne:CCO when setting FLAGS_REG in neg patterns.
> >
> > gcc/
> >
> > PR target/107172
> > * config/i386/i386.md (UNSPEC_CC_NE): New.
> > Replace ne:CCC/ne:CCO with UNSPEC_CC_NE in neg patterns.
> >
> > gcc/testsuite/
> >
> > PR target/107172
> > * gcc.target/i386/pr107172.c: New test.
> > ---
> >  gcc/config/i386/i386.md  | 45 +---
> >  gcc/testsuite/gcc.target/i386/pr107172.c | 26 ++
> >  2 files changed, 51 insertions(+), 20 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr107172.c
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index baf1f1f8fa2..aaa678e7314 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -113,6 +113,7 @@ (define_c_enum "unspec" [
> >UNSPEC_PEEPSIB
> >UNSPEC_INSN_FALSE_DEP
> >UNSPEC_SBB
> > +  UNSPEC_CC_NE
> >
> >;; For SSE/MMX support:
> >UNSPEC_FIX_NOTRUNC
> > @@ -11470,7 +11471,7 @@ (define_insn_and_split "*neg2_doubleword"
> >"&& reload_completed"
> >[(parallel
> >  [(set (reg:CCC FLAGS_REG)
> > - (ne:CCC (match_dup 1) (const_int 0)))
> > + (unspec:CCC [(match_dup 1) (const_int 0)] UNSPEC_CC_NE))
> >   (set (match_dup 0) (neg:DWIH (match_dup 1)))])
> > (parallel
> >  [(set (match_dup 2)
> > @@ -11499,7 +11500,8 @@ (define_peephole2
> > (match_operand:SWI48 1 "nonimmediate_gr_operand"))
> > (parallel
> >  [(set (reg:CCC FLAGS_REG)
> > - (ne:CCC (match_operand:SWI48 2 "general_reg_operand") (const_int 
> > 0)))
> > + (unspec:CCC [(match_operand:SWI48 2 "general_reg_operand")
> > +  (const_int 0)] UNSPEC_CC_NE))
> >   (set (match_dup 2) (neg:SWI48 (match_dup 2)))])
> > (parallel
> >  [(set (match_dup 0)
> > @@ -11517,7 +11519,7 @@ (define_peephole2
> > && !reg_mentioned_p (operands[2], operands[1])"
> >[(parallel
> >  [(set (reg:CCC FLAGS_REG)
> > - (ne:CCC (match_dup 2) (const_int 0)))
> > + (unspec:CCC [(match_dup 2) (const_int 0)] UNSPEC_CC_NE))
> >   (set (match_dup 2) (neg:SWI48 (match_dup 2)))])
> > (parallel
> >  [(set (match_dup 0)
> > @@ -11543,7 +11545,8 @@ (define_peephole2
> >   (clobber (reg:CC FLAGS_REG))])
> > (parallel
> >  [(set (reg:CCC FLAGS_REG)
> > - (ne:CCC (match_operand:SWI48 1 "general_reg_operand") (const_int 
> > 0)))
> > + (unspec:CCC [(match_operand:SWI48 1 "general_reg_operand")
> > +  (const_int 0)] UNSPEC_CC_NE))
> >   (set (match_dup 1) (neg:SWI48 (match_dup 1)))])
> > (parallel
> >  [(set (match_dup 0)
> > @@ -11559,7 +11562,7 @@ (define_peephole2
> >"REGNO (operands[0]) != REGNO (operands[1])"
> >[(parallel
> >  [(set (reg:CCC FLAGS_REG)
> > - (ne:CCC (match_dup 1) (const_int 0)))
> > + (unspec:CCC [(match_dup 1) (const_int 0)] UNSPEC_CC_NE))
> >   (set (match_dup 1) (neg:SWI48 (match_dup 1)))])
> > (parallel
> >  [(set (match_dup 0)
> > @@ -11635,9 +11638,9 @@ (define_insn "*negsi_2_zext"
> >
> >  (define_insn "*neg_ccc_1"
> >[(set (reg:CCC FLAGS_REG)
> > -   (ne:CCC
> > - (match_operand:SWI 1 "nonimmediate_operand" "0")
> > - (const_int 0)))
> > +   (unspec:CCC
> > + [(match_operand:SWI 1 "nonimmediate_operand" "0")
> > +  (const_int 0)] UNSPEC_CC_NE))
> > (set (match_operand:SWI 0 "nonimmediate_operand" "=m")
> > (neg:SWI (match_dup 1)))]
> >""
> > @@ -11647,9 +11650,9 @@ (define_insn "*neg_ccc_1"
> >
> >  (define_insn "*neg_ccc_2"
> >[(set (reg:CCC FLAGS_REG)
> > -   (ne:CCC
> > - (match_operand:SWI 1 "nonimmediate_operand" "0")
> > - (const_int 0)))
> > +   (unspec:CCC
> > + [(match_operand:SWI 1 "nonimmediate_operand" "0")
> > +  (const_int 0)] UNSPEC_CC_NE))
> > (clobber (match_scratch:SWI 0 "="))]
> >""
> >"neg{}\t%0"
> > @@ -11659,8 +11662,8 @@ (define_insn "*neg_ccc_2"
> >