[Patch] libgomp: Handle OpenMP's reverse offloads

2022-12-05 Thread Tobias Burnus

This patch finally handles reverse offload. Due to the prep work,
it essentially only adds content to libgomp/target.c's gomp_target_rev(),
except that it additionally saves the reverse-offload-function table
in gomp_load_image_to_device.

In the comment to "[Patch] libgomp: Add reverse-offload splay tree",
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601368.html ,
it was suggested not to keep track of all the variable mappings and
to reconstruct the mapping from the normal splay tree, which this
patch does.
(Albeit in the very slow walk-everything way.  Given that reverse-offload
target regions likely have only a few map items, and that programs should use
only a few reverse-offload regions and not expect them to be fast, that might
be okay.)
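
To make the "walk everything" idea concrete, here is a minimal, self-contained
sketch (illustrative only; the struct and the linear traversal are simplified
stand-ins for libgomp's splay-tree nodes, not the patch's actual code):

  #include <stdint.h>
  #include <stddef.h>

  struct map_entry
  {
    uint64_t host_start, host_end;  /* Host address range of one mapping.  */
    uint64_t tgt_start;             /* Matching device start address.  */
    struct map_entry *next;         /* Stand-in for visiting every node.  */
  };

  /* Return the host address corresponding to DEVPTR, or 0 if unmapped.  */
  static uint64_t
  rev_lookup (struct map_entry *maps, uint64_t devptr)
  {
    for (struct map_entry *e = maps; e != NULL; e = e->next)
      if (devptr >= e->tgt_start
          && devptr < e->tgt_start + (e->host_end - e->host_start))
        return e->host_start + (devptr - e->tgt_start);
    return 0;
  }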

Specification references:
- For pointer attachment, I assume that the pointer is already fine on
  the host (if it existed on the host before) and does not need to get
  updated.  I think the spec lacks wording for this; cf. OpenMP Spec Issue
  #3424.
- There are plans to permit 'nowait'. I think it wouldn't change anything
  except for not spin-waiting for the result - and (only for shared memory)
  the argument lists (addr, kinds, sizes) would need to be copied to have a
  sufficient lifetime.  (To be implemented in future; cf. OpenMP Spec Pull
  Req. 3423 for Issue 2038.)
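
To sketch what the shared-memory 'nowait' variant might involve (all names
hypothetical; this is not part of the patch): the addr/size/kind arrays live
on the device caller's stack, so an asynchronous host task would first have
to take heap copies, e.g.

  #include <stdlib.h>
  #include <string.h>
  #include <stdint.h>

  struct rev_req
  {
    size_t mapnum;
    uint64_t *addrs, *sizes;
    unsigned short *kinds;
  };

  /* Copy the caller-owned arrays so they outlive the target region;
     error handling elided for brevity.  */
  static struct rev_req *
  copy_rev_args (size_t mapnum, const uint64_t *addrs,
                 const uint64_t *sizes, const unsigned short *kinds)
  {
    struct rev_req *req = malloc (sizeof (*req));
    req->mapnum = mapnum;
    req->addrs = malloc (mapnum * sizeof (*req->addrs));
    req->sizes = malloc (mapnum * sizeof (*req->sizes));
    req->kinds = malloc (mapnum * sizeof (*req->kinds));
    memcpy (req->addrs, addrs, mapnum * sizeof (*req->addrs));
    memcpy (req->sizes, sizes, mapnum * sizeof (*req->sizes));
    memcpy (req->kinds, kinds, mapnum * sizeof (*req->kinds));
    return req;
  }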

 * * *

32bit vs. 64bit: libgomp itself is compiled with both -m32 and -m64; however,
nvptx and gcn require -m64 on the device side and assume that the device
pointers are representable on the host (i.e. all are 64bit).  The new code
tries to be in principle compatible with uint32_t pointers and uses uint64_t
to represent them consistently.  The code should be mostly fine, except that
one called function requires an array of void* and size_t.  Instead of
handling that case, I added some code to permit optimizing away the function
content without offloading - and a run-time assert if it should ever happen
that this function gets called on a 32bit host from the target side.
It is a run-time failure as '#if TARGET_OFFLOAD == ""' does not work (string
comparison is not supported by the C preprocessor, unfortunately).
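
In sketch form (simplified; not the patch's actual code):

  #include <assert.h>
  #include <stdint.h>

  /* All device pointers are marshalled as uint64_t; the one helper that
     needs 'void *' and 'size_t' arrays only works if host pointers are
     64bit.  '#if TARGET_OFFLOAD == ""' cannot express this, hence a
     run-time check instead of a preprocessor one.  */
  static void
  check_reverse_offload_host (void)
  {
    assert (sizeof (void *) == sizeof (uint64_t));
  }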

Comments, suggestions, OK for mainline, ... ?

Tobias

PS:
* As a follow-up, libgomp.texi must be updated.
* For GCN, it currently does not work until stack variables are accessible
  from the host.  (Prep work for this is in newlib + GCC 13.)  Once done, a
  similar one-line change to plugin-gcn.c's GOMP_OFFLOAD_get_num_devices is
  required.

PPS: (Off topic remark to 32bit host)
While a 32bit host with a 32bit device will mostly work, a 32bit host
with a 64bit device becomes interesting, as the 'void *' returned by
omp_target_alloc(...) can't represent a device pointer.  The solution is a
32bit pointer pointing to a 64bit variable, e.g.
  uint64_t *devptr = malloc(sizeof(uint64_t));
  *devptr = internal_device_alloc ();
  return devptr;
with all the fun of translating this correctly with {use,has}_device_ptr etc.
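
Spelled out as a compilable sketch (function names hypothetical, error
handling elided):

  #include <stdint.h>
  #include <stdlib.h>

  extern uint64_t internal_device_alloc (size_t size);
  extern void internal_device_free (uint64_t devaddr);

  /* 'void *' on a 32bit host cannot hold a 64bit device address, so hand
     out a host pointer to heap storage holding the real address.  */
  void *
  my_target_alloc (size_t size)
  {
    uint64_t *devptr = malloc (sizeof (uint64_t));
    *devptr = internal_device_alloc (size);
    return devptr;
  }

  void
  my_target_free (void *ptr)
  {
    uint64_t *devptr = ptr;
    internal_device_free (*devptr);
    free (devptr);
  }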

To actually support this will require some larger changes to libgomp, which I
do not see happening unless a device system with sizeof(void*) > 64 bit shows
up, or some compelling reason to use 32bit on the host appears; but not for
x86-64 or arm64 (or PowerPC).  (There exist 128bit-pointer systems, which use
the upper bits for extra purposes - but for unified-shared-address purposes,
it seems unlikely that accelerator devices head in this direction.)
libgomp: Handle OpenMP's reverse offloads

This commit enables reverse offload for nvptx such that gomp_target_rev
actually gets called.  And it fills in the latter function to do all of
the following: find the host function for the device function pointer,
copy the arguments to the host, process the mapping/firstprivate, call
the host function, copy back the data, and free as needed.

The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or are device variables
not yet available on the host.  Thus, the reverse mapping can do without
refcounts etc.  Note that, inside a target region, the spec disallows
device-affecting constructs other than 'target' with the 'ancestor'
device-modifier, and it also limits the clauses permitted on this
construct.
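
For readers unfamiliar with the feature, a minimal reverse-offload example
might look as follows (illustrative only; not taken from the patch or its
testsuite):

  #include <stdio.h>

  #pragma omp requires reverse_offload

  #pragma omp declare target
  void
  on_device (int i)
  {
    /* Runs on the device; the nested target region below runs the printf
       back on the host - the "reverse offload".  */
    #pragma omp target device (ancestor : 1) map (to : i)
    printf ("on host: i = %d\n", i);
  }
  #pragma omp end declare target

  int
  main (void)
  {
    #pragma omp target
    on_device (42);
    return 0;
  }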

For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay tree is used.
Unfortunately, its data structure requires a full walk of the tree;
additionally, the just-mapped variables are recorded in a separate
data structure, requiring an extra lookup.  While the lookup is slow,
assuming that only a few variables get mapped in each reverse-offload
construct and that reverse offload is the exception and not performance
critical, this seems acceptable.

Re: [aarch64] PR107920 - Fix incorrect handling of virtual operands in svld1rq_impl::fold

2022-12-05 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Tue, 6 Dec 2022 at 00:08, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > Hi,
>> > The following test:
>> >
>> > #include "arm_sve.h"
>> >
>> > svint8_t
>> > test_s8(int8_t *x)
>> > {
>> > >   return svld1rq_s8 (svptrue_b8 (), &x[0]);
>> > }
>> >
>> > ICE's with -march=armv8.2-a+sve -O1 -fno-tree-ccp -fno-tree-forwprop:
>> > during GIMPLE pass: fre
>> > pr107920.c: In function ‘test_s8’:
>> > pr107920.c:7:1: internal compiler error: in execute_todo, at passes.cc:2140
>> > 7 | }
>> >   | ^
>> > 0x7b03d0 execute_todo
>> > ../../gcc/gcc/passes.cc:2140
>> >
>> > because of incorrect handling of virtual operands in svld1rq_impl::fold:
>> >  # VUSE <.MEM>
>> >   _5 = MEM <vector(16) signed char> [(signed char * {ref-all})x_3(D)];
>> >   _4 = VEC_PERM_EXPR <_5, _5, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
>> > 12, 13, 14, 15, ... }>;
>> >   # VUSE <.MEM_2(D)>
>> >   return _4;
>> >
>> > The attached patch tries to fix the issue by building the replacement
>> > statements in gimple_seq, and passing it to gsi_replace_with_seq_vops,
>> > which resolves the ICE, and results in:
>> > <bb 2>:
>> >   # VUSE <.MEM_2(D)>
>> >   _5 = MEM <vector(16) signed char> [(signed char * {ref-all})x_3(D)];
>> >   _4 = VEC_PERM_EXPR <_5, _5, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
>> > 12, 13, 14, 15, ... }>;
>> >   # VUSE <.MEM_2(D)>
>> >   return _4;
>> >
>> > Bootstrapped+tested on aarch64-linux-gnu.
>> > OK to commit ?
>>
>> Looks good, but we also need to deal with the -fnon-call-exceptions
>> point that Andrew made in the PR trail.  An easy way of testing
>> would be to make sure that:
>>
>> #include "arm_sve.h"
>>
>> svint8_t
>> test_s8(int8_t *x)
>> {
>>   try
>> {
>>   return svld1rq_s8 (svptrue_b8 (), &x[0]);
>> }
>>   catch (...)
>> {
>>   return svdup_s8 (1);
>> }
>> }
>>
>> compiled with -fnon-call-exceptions still has a call to __cxa_begin_catch.
>>
>> I don't think it's worth optimising this case.  Let's just add
>> !flag_non_call_exceptions to the test.
>>
>> The patch is missing a changelog btw.
> Thanks for the suggestions. Is the attached patch OK to commit after
> bootstrap+test ?

Please add the test above to g++.target/aarch64/sve, with a scan-assembler
for __cxa_begin_catch.  OK with that change, thanks.

Richard

>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Richard
>>
>> > Thanks,
>> > Prathamesh
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
>> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> > index 6347407555f..f5546a65d22 100644
>> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> > @@ -45,6 +45,7 @@
>> >  #include "aarch64-sve-builtins-base.h"
>> >  #include "aarch64-sve-builtins-functions.h"
>> >  #include "ssa.h"
>> > +#include "gimple-fold.h"
>> >
>> >  using namespace aarch64_sve;
>> >
>> > @@ -1232,7 +1233,9 @@ public:
>> >   tree mem_ref_op = fold_build2 (MEM_REF, access_type, arg1, zero);
>> >   gimple *mem_ref_stmt
>> > = gimple_build_assign (mem_ref_lhs, mem_ref_op);
>> > - gsi_insert_before (f.gsi, mem_ref_stmt, GSI_SAME_STMT);
>> > +
>> > + gimple_seq stmts = NULL;
>> > + gimple_seq_add_stmt_without_update (&stmts, mem_ref_stmt);
>> >
>> >   int source_nelts = TYPE_VECTOR_SUBPARTS (access_type).to_constant ();
>> >   vec_perm_builder sel (lhs_len, source_nelts, 1);
>> > @@ -1245,8 +1248,11 @@ public:
>> >  indices));
>> >   tree mask_type = build_vector_type (ssizetype, lhs_len);
>> >   tree mask = vec_perm_indices_to_tree (mask_type, indices);
>> > - return gimple_build_assign (lhs, VEC_PERM_EXPR,
>> > - mem_ref_lhs, mem_ref_lhs, mask);
>> > + gimple *g2 = gimple_build_assign (lhs, VEC_PERM_EXPR,
>> > +   mem_ref_lhs, mem_ref_lhs, mask);
>> > + gimple_seq_add_stmt_without_update (&stmts, g2);
>> > + gsi_replace_with_seq_vops (f.gsi, stmts);
>> > + return g2;
>> >}
>> >
>> >  return NULL;
>> > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>> > index c2d9c806aee..03cdb2f9f49 100644
>> > --- a/gcc/gimple-fold.cc
>> > +++ b/gcc/gimple-fold.cc
>> > @@ -591,7 +591,7 @@ fold_gimple_assign (gimple_stmt_iterator *si)
>> > If the statement has a lhs the last stmt in the sequence is expected
>> > to assign to that lhs.  */
>> >
>> > -static void
>> > +void
>> >  gsi_replace_with_seq_vops (gimple_stmt_iterator *si_p, gimple_seq stmts)
>> >  {
>> >gimple *stmt = gsi_stmt (*si_p);
>> > diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
>> > index 7d29ee9a9a4..87ed4e56d25 100644
>> > --- a/gcc/gimple-fold.h
>> > +++ b/gcc/gimple-fold.h
>> > @@ -63,6 +63,7 @@ extern bool arith_code_with_undefined_signed_overflow 
>> > (tree_code);
>> >  extern gimple_seq rewrite_to_defined_overflow (gimple *, bool = false);
>> >  extern void replace_call_with_value (gimple_stmt_iterator *, tree);
>> >  extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, 
>> > tree);
>> > +extern void gsi_replace_with_seq_vops (gimple_stmt_iterator *, gimple_seq);

[PATCH] tree-optimization/104165 - bogus -Warray-bounds, add testcase

2022-12-05 Thread Richard Biener via Gcc-patches
The following adds the testcase from the description which was
fixed by r13-2894-gbe4a6551ed37c1.

Tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/104165
* g++.dg/warn/Warray-bounds-pr104165-1.C: New testcase.
---
 .../g++.dg/warn/Warray-bounds-pr104165-1.C| 27 +++
 1 file changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Warray-bounds-pr104165-1.C

diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-pr104165-1.C 
b/gcc/testsuite/g++.dg/warn/Warray-bounds-pr104165-1.C
new file mode 100644
index 000..bc46285eb90
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-pr104165-1.C
@@ -0,0 +1,27 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-options "-O2 -Warray-bounds" }
+
+#include <algorithm>
+
+static int bar(int n, int l)
+{
+  int f[l];
+  int x = 0;
+  int r = n;
+
+  for (; x < l;)
+if (r)
+  x = l;
+else
+  r = 1;
+
+  if (r == 1)
+std::sort(f, f + x, [](int a, int b) { return a > b; });
+  return 1;
+}
+
+int foo(int n)
+{
+  return bar(n, 4);
+}
-- 
2.35.3


Re: [PATCH] range-op-float: Fix up ICE in lower_bound [PR107975]

2022-12-05 Thread Richard Biener via Gcc-patches
On Mon, 5 Dec 2022, Jakub Jelinek wrote:

> Hi!
> 
> According to 
> https://gcc.gnu.org/pipermail/gcc-regression/2022-December/077258.html
> my patch caused some ICEs, e.g. the following testcase ICEs.
> The problem is that the lower_bound and upper_bound methods on a frange assert
> that the range isn't VR_NAN or VR_UNDEFINED.
> All the op1_range/op2_range methods already return early if lhs.undefined_p,
> but the other cases (when lhs is VR_NAN or the other opN is VR_NAN or
> VR_UNDEFINED) aren't.  float_binary_op_range_finish will DTRT for those
> cases already.
> 
> Ok for trunk if this passes bootstrap/regtest?

OK.

Richard.

> 2022-12-05  Jakub Jelinek  
> 
>   PR tree-optimization/107975
>   * range-op-float.cc (foperator_mult::op1_range,
>   foperator_div::op1_range, foperator_div::op2_range): Just
>   return float_binary_op_range_finish result if lhs is known
>   NAN, or the other operand is known NAN or UNDEFINED.
> 
>   * gcc.dg/pr107975.c: New test.
> 
> --- gcc/range-op-float.cc.jj  2022-12-05 11:17:34.900573272 +0100
> +++ gcc/range-op-float.cc 2022-12-05 17:58:38.059754128 +0100
> @@ -2146,6 +2146,8 @@ public:
>  bool ret = rdiv.fold_range (r, type, lhs, op2);
>  if (ret == false)
>return false;
> +if (lhs.known_isnan () || op2.known_isnan () || op2.undefined_p ())
> +  return float_binary_op_range_finish (ret, r, type, lhs);
>  const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
>  const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
>  const REAL_VALUE_TYPE &op2_lb = op2.lower_bound ();
> @@ -2296,6 +2298,8 @@ public:
>  bool ret = fop_mult.fold_range (r, type, lhs, op2);
>  if (!ret)
>return ret;
> +if (lhs.known_isnan () || op2.known_isnan () || op2.undefined_p ())
> +  return float_binary_op_range_finish (ret, r, type, lhs);
>  const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
>  const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
>  const REAL_VALUE_TYPE &op2_lb = op2.lower_bound ();
> @@ -2325,6 +2329,8 @@ public:
>  bool ret = fold_range (r, type, op1, lhs);
>  if (!ret)
>return ret;
> +if (lhs.known_isnan () || op1.known_isnan () || op1.undefined_p ())
> +  return float_binary_op_range_finish (ret, r, type, lhs);
>  const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
>  const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
>  const REAL_VALUE_TYPE &op1_lb = op1.lower_bound ();
> --- gcc/testsuite/gcc.dg/pr107975.c.jj2022-12-05 19:15:56.518851401 
> +0100
> +++ gcc/testsuite/gcc.dg/pr107975.c   2022-12-05 19:15:04.014620281 +0100
> @@ -0,0 +1,15 @@
> +/* PR tree-optimization/107975 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-add-options ieee } */
> +
> +double
> +foo (double x, double y)
> +{
> +  if (x == 42.0)
> +return 1.0;
> +  double r = x * y;
> +  if (!__builtin_isnan (r))
> +__builtin_unreachable ();
> +  return r;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] tree-optimization/107852 - missed optimization with PHIs

2022-12-05 Thread Richard Biener via Gcc-patches
On Mon, 5 Dec 2022, Jan-Benedict Glaw wrote:

> On Tue, 2022-11-29 14:30:22 +0100, Richard Biener via Gcc-patches 
>  wrote:
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> > 
> > PR tree-optimization/107852
> > * tree-ssa-sccvn.cc (visit_phi): Use equivalences recorded
> > as predicated values to elide more redundant PHIs.
> > 
> > * gcc.dg/tree-ssa/ssa-fre-101.c: New testcase.
> 
> This seems to trigger an issue when building the Linux powerpc kernel
> for the skiroot_defconfig:
> 
> [mk all 2022-12-05 19:50:10]   powerpc64-linux-gcc 
> -Wp,-MMD,drivers/dma-buf/.dma-fence-array.o.d -nostdinc 
> -I./arch/powerpc/include -I./arch/powerpc/include/generated  -I./include 
> -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi 
> -I./include/uapi -I./include/generated/uapi -include 
> ./include/linux/compiler-version.h -include ./include/linux/kconfig.h 
> -include ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc 
> -DHAVE_AS_ATHIGH=1 -fmacro-prefix-map=./= -Wall -Wundef 
> -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common 
> -fshort-wchar -fno-PIE -Werror=implicit-function-declaration 
> -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu11 
> -mlittle-endian -m64 -msoft-float -pipe -mtraceback=no -mabi=elfv2 
> -mcmodel=medium -mno-pointers-to-nested-functions -mcpu=power8 -mtune=power10 
> -mno-prefixed -mno-pcrel -mno-altivec -mno-vsx -mno-mma 
> -fno-asynchronous-unwind-tables -mno-string -Wa,-maltivec -Wa,-mpower4 
> -Wa,-many -mno-strict-align -mlittle-endian -mstack-protector-guard=tls 
-mstack-protector-guard-reg=r13 -fno-delete-null-pointer-checks 
-Wno-frame-address -Wno-format-truncation -Wno-format-overflow 
-Wno-address-of-packed-member -Os -fno-allow-store-data-races 
-Wframe-larger-than=2048 -fstack-protector-strong -Wno-main 
-Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-dangling-pointer 
-fomit-frame-pointer -ftrivial-auto-var-init=zero -fno-stack-clash-protection 
-Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wcast-function-type 
-Wno-stringop-truncation -Wno-stringop-overflow -Wno-restrict 
-Wno-maybe-uninitialized -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 
-fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=date-time 
-Werror=incompatible-pointer-types -Werror=designated-init 
-Wno-packed-not-aligned -mstack-protector-guard-offset=2800
-DKBUILD_MODFILE='"drivers/dma-buf/dma-fence-array"' 
-DKBUILD_BASENAME='"dma_fence_array"' -DKBUILD_MODNAME='"dma_fence_array"' 
-D__KBUILD_MODNAME=kmod_dma_fence_array -c -o 
drivers/dma-buf/dma-fence-array.o drivers/dma-buf/dma-fence-array.c  
> [mk all 2022-12-05 19:50:10] drivers/dma-buf/dma-fence-array.c: In function 
> 'dma_fence_array_create':
> [mk all 2022-12-05 19:50:10] drivers/dma-buf/dma-fence-array.c:154:25: error: 
> control flow in the middle of basic block 12
> [mk all 2022-12-05 19:50:10]   154 | struct dma_fence_array 
> *dma_fence_array_create(int num_fences,
> [mk all 2022-12-05 19:50:10]   | 
> ^~
> [mk all 2022-12-05 19:50:10] during GIMPLE pass: ivopts
> [mk all 2022-12-05 19:50:10] drivers/dma-buf/dma-fence-array.c:154:25: 
> internal compiler error: verify_flow_info failed
> [mk all 2022-12-05 19:50:10] 0x19ea876 internal_error(char const*, ...)
> [mk all 2022-12-05 19:50:10]???:0
> [mk all 2022-12-05 19:50:10] 0x94b00e verify_flow_info()
> [mk all 2022-12-05 19:50:10]???:0
> [mk all 2022-12-05 19:50:10] Please submit a full bug report, with 
> preprocessed source (by using -freport-bug).
> [mk all 2022-12-05 19:50:10] Please include the complete backtrace with any 
> bug report.
> [mk all 2022-12-05 19:50:10] See <https://gcc.gnu.org/bugs/> for instructions.
> 
> Maybe you've got an idea, otherwise I'll try to reproduce it manually.
> (That's all automated building.)

I'll note the above ICE is quite a few passes later during IVOPTs so
the change triggered a latent issue.  Wild guessing makes me think
it's some asm goto being mis-handled.

Can you please open a bugreport and provide preprocessed source so
one can reproduce this with a cc1 cross?

Thanks,
Richard.


[PATCH v3, rs6000] Enable have_cbranchcc4 on rs6000

2022-12-05 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch enables "have_cbranchcc4" on rs6000 by defining
a "cbranchcc4" expander.  "have_cbranchcc4" is a flag in ifcvt.cc
indicating whether a branch on CC bits is valid.  With this
flag enabled, some branches can be optimized to conditional
moves.

  The patch relies on the former patch which is under review.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607810.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is this okay for trunk? Any recommendations? Thanks
a lot.

Thanks
Gui Haochen

ChangeLog
2022-12-06  Haochen Gui 

gcc/
* config/rs6000/rs6000.md (cbranchcc4): New expander.

gcc/testsuite
* gcc.target/powerpc/cbranchcc4.c: New.
* gcc.target/powerpc/cbranchcc4-1.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index e9e5cd1e54d..d7ddd96cc70 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -11932,6 +11932,16 @@ (define_expand "cbranch<mode>4"
   DONE;
 })

+(define_expand "cbranchcc4"
+  [(set (pc)
+   (if_then_else (match_operator 0 "branch_comparison_operator"
+   [(match_operand 1 "cc_reg_operand")
+(match_operand 2 "zero_constant")])
+ (label_ref (match_operand 3))
+ (pc)))]
+  ""
+  "")
+
 (define_expand "cstore4_signed"
   [(use (match_operator 1 "signed_comparison_operator"
  [(match_operand:P 2 "gpc_reg_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/cbranchcc4-1.c 
b/gcc/testsuite/gcc.target/powerpc/cbranchcc4-1.c
new file mode 100644
index 000..3c8286bf091
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/cbranchcc4-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* This case should compile successfully after cbranchcc4 is enabled.  It
+   generates a "*cbranch_2insn" insn, which makes the predicate check of
+   cbranchcc4 fail, so a NULL rtx is returned from prepare_cmp_insn.  */
+
+int foo (double d)
+{
+  if (d == 0.0)
+return 0;
+
+  d = ((d) >= 0 ? (d) : -(d));
+
+  if (d < 1.0)
+return 1;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/cbranchcc4.c 
b/gcc/testsuite/gcc.target/powerpc/cbranchcc4.c
new file mode 100644
index 000..528ba1a878d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/cbranchcc4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ce1" } */
+/* { dg-final { scan-rtl-dump "noce_try_store_flag_constants" "ce1" } } */
+
+/* The inner branch should be detected by ifcvt then be converted to a setcc
+   with a plus by noce_try_store_flag_constants.  */
+
+int test (unsigned int a, unsigned int b)
+{
+return (a < b ? 0 : (a > b ? 2 : 1));
+}


Re: [PATCH V2] Use subscalar mode to move struct block for parameter

2022-12-05 Thread Jiufu Guo via Gcc-patches
Hi,

Jeff Law  writes:

> On 11/28/22 20:53, Jiufu Guo wrote:
>
>>>
>>> Right, but the number of registers is target dependent, so I don't see
>>> how using "8" or any number for that matter is correct here.
>> I understand.  And even for the same struct type, how many registers are
>> used to pass a parameter also depends on the size of the parameter and
>> how many leading parameters there are.
>> So, as you said, "8" or any other number is not always accurate.
>>
>> The enhancement in this patch just makes "block move" more friendly for
>> subsequent optimizations (cse/dse/dce...) by moving through sub-modes.
>> So, I just selected an arbitrary number which should not be too large
>> and is tolerable.
>> I also thought of querying the maximum number of registers for
>> param/ret passing from the target, but that may not be very accurate
>> either.
>>
>> Any suggestions are welcome, and thanks!
> OK, so it's just a magic number and (in theory) any number should
> still generate correct code -- the number merely places limits on when
> we'll consider performing this optimization.
Right.
>
> It may be overkill, but you might consider making it a PARAM that can
> be adjusted.

Yes. I also feel the magic number is not perfect. I'm thinking of a way
to mark whether a param is stored to the stack from registers during setup,
and then reference that marker during this optimization. This would be more
accurate and would avoid the magic number.
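
As a self-contained illustration of the check being discussed (the helper
name and the limit parameter are invented here; the patch currently uses the
hard-coded number instead):

  #include <stdbool.h>
  #include <stddef.h>

  /* Use sub-mode moves only when the block is an exact multiple of the
     sub-mode size and does not split into too many pieces.  */
  static bool
  use_submode_moves_p (size_t block_size, size_t mode_size, size_t limit)
  {
    return block_size % mode_size == 0
           && block_size / mode_size <= limit;
  }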

>
>
>>
>> For this patch, only simple cases are handled, like "D.xxx = param_1",
>> where the source and dest of the assignment are both in memory, i.e. the
>> DECL_RTL (of D.xxx/param_xxx) is a MEM_P in BLKmode.
>> And to avoid complications, this patch only handles the case where
>> "(size % mode_size) == 0".
>>
>> If there are any misunderstandings, please point them out, thanks.
>> And thanks for comments!
> How values are justified varies on the PA depending on whether the
> parameter is passed in registers or in memory.  Though thinking more
> about things, I don't think you're changing how the parameter is
> passed. Just how it's subsequently pulled out of memory.
Right!

Thanks a lot for your comments!

BR,
Jeff (Jiufu)
>
> jeff


Re: [aarch64] PR107920 - Fix incorrect handling of virtual operands in svld1rq_impl::fold

2022-12-05 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 6 Dec 2022 at 00:08, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi,
> > The following test:
> >
> > #include "arm_sve.h"
> >
> > svint8_t
> > test_s8(int8_t *x)
> > {
> >   return svld1rq_s8 (svptrue_b8 (), &x[0]);
> > }
> >
> > ICE's with -march=armv8.2-a+sve -O1 -fno-tree-ccp -fno-tree-forwprop:
> > during GIMPLE pass: fre
> > pr107920.c: In function ‘test_s8’:
> > pr107920.c:7:1: internal compiler error: in execute_todo, at passes.cc:2140
> > 7 | }
> >   | ^
> > 0x7b03d0 execute_todo
> > ../../gcc/gcc/passes.cc:2140
> >
> > because of incorrect handling of virtual operands in svld1rq_impl::fold:
> >  # VUSE <.MEM>
> >   _5 = MEM <vector(16) signed char> [(signed char * {ref-all})x_3(D)];
> >   _4 = VEC_PERM_EXPR <_5, _5, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
> > 12, 13, 14, 15, ... }>;
> >   # VUSE <.MEM_2(D)>
> >   return _4;
> >
> > The attached patch tries to fix the issue by building the replacement
> > statements in gimple_seq, and passing it to gsi_replace_with_seq_vops,
> > which resolves the ICE, and results in:
> > <bb 2>:
> >   # VUSE <.MEM_2(D)>
> >   _5 = MEM <vector(16) signed char> [(signed char * {ref-all})x_3(D)];
> >   _4 = VEC_PERM_EXPR <_5, _5, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
> > 12, 13, 14, 15, ... }>;
> >   # VUSE <.MEM_2(D)>
> >   return _4;
> >
> > Bootstrapped+tested on aarch64-linux-gnu.
> > OK to commit ?
>
> Looks good, but we also need to deal with the -fnon-call-exceptions
> point that Andrew made in the PR trail.  An easy way of testing
> would be to make sure that:
>
> #include "arm_sve.h"
>
> svint8_t
> test_s8(int8_t *x)
> {
>   try
> {
>   return svld1rq_s8 (svptrue_b8 (), &x[0]);
> }
>   catch (...)
> {
>   return svdup_s8 (1);
> }
> }
>
> compiled with -fnon-call-exceptions still has a call to __cxa_begin_catch.
>
> I don't think it's worth optimising this case.  Let's just add
> !flag_non_call_exceptions to the test.
>
> The patch is missing a changelog btw.
Thanks for the suggestions. Is the attached patch OK to commit after
bootstrap+test ?

Thanks,
Prathamesh
>
> Thanks,
> Richard
>
> > Thanks,
> > Prathamesh
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 6347407555f..f5546a65d22 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -45,6 +45,7 @@
> >  #include "aarch64-sve-builtins-base.h"
> >  #include "aarch64-sve-builtins-functions.h"
> >  #include "ssa.h"
> > +#include "gimple-fold.h"
> >
> >  using namespace aarch64_sve;
> >
> > @@ -1232,7 +1233,9 @@ public:
> >   tree mem_ref_op = fold_build2 (MEM_REF, access_type, arg1, zero);
> >   gimple *mem_ref_stmt
> > = gimple_build_assign (mem_ref_lhs, mem_ref_op);
> > - gsi_insert_before (f.gsi, mem_ref_stmt, GSI_SAME_STMT);
> > +
> > + gimple_seq stmts = NULL;
> > + gimple_seq_add_stmt_without_update (&stmts, mem_ref_stmt);
> >
> >   int source_nelts = TYPE_VECTOR_SUBPARTS (access_type).to_constant ();
> >   vec_perm_builder sel (lhs_len, source_nelts, 1);
> > @@ -1245,8 +1248,11 @@ public:
> >  indices));
> >   tree mask_type = build_vector_type (ssizetype, lhs_len);
> >   tree mask = vec_perm_indices_to_tree (mask_type, indices);
> > - return gimple_build_assign (lhs, VEC_PERM_EXPR,
> > - mem_ref_lhs, mem_ref_lhs, mask);
> > + gimple *g2 = gimple_build_assign (lhs, VEC_PERM_EXPR,
> > +   mem_ref_lhs, mem_ref_lhs, mask);
> > + gimple_seq_add_stmt_without_update (&stmts, g2);
> > + gsi_replace_with_seq_vops (f.gsi, stmts);
> > + return g2;
> >}
> >
> >  return NULL;
> > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> > index c2d9c806aee..03cdb2f9f49 100644
> > --- a/gcc/gimple-fold.cc
> > +++ b/gcc/gimple-fold.cc
> > @@ -591,7 +591,7 @@ fold_gimple_assign (gimple_stmt_iterator *si)
> > If the statement has a lhs the last stmt in the sequence is expected
> > to assign to that lhs.  */
> >
> > -static void
> > +void
> >  gsi_replace_with_seq_vops (gimple_stmt_iterator *si_p, gimple_seq stmts)
> >  {
> >gimple *stmt = gsi_stmt (*si_p);
> > diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> > index 7d29ee9a9a4..87ed4e56d25 100644
> > --- a/gcc/gimple-fold.h
> > +++ b/gcc/gimple-fold.h
> > @@ -63,6 +63,7 @@ extern bool arith_code_with_undefined_signed_overflow 
> > (tree_code);
> >  extern gimple_seq rewrite_to_defined_overflow (gimple *, bool = false);
> >  extern void replace_call_with_value (gimple_stmt_iterator *, tree);
> >  extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, 
> > tree);
> > +extern void gsi_replace_with_seq_vops (gimple_stmt_iterator *, gimple_seq);
> >
> >  /* gimple_build, functionally matching fold_buildN, outputs stmts
> > int the provided sequence, matching and simplifying them 

Re: [PATCH 3/3] Testcases for move sub blocks on param and ret

2022-12-05 Thread Jiufu Guo via Gcc-patches
Hi Segher,

Thanks for your comments!

Segher Boessenkool  writes:

> Hi!
>
> Some comments on the testcases:
>
> On Tue, Nov 29, 2022 at 09:45:07PM +0800, Jiufu Guo wrote:
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>> @@ -0,0 +1,25 @@
>> +/* PR target/65421 */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +/* { dg-require-effective-target powerpc_elfv2 } */
>> +
>> +typedef struct SA
>> +{
>> +  double a[3];
>> +  long l;
>> +} A;
>> +
>> +/* 2 vec load, 2 vec store.  */
>> +A ret_arg_pt (A *a){return *a;}
>> +
>> +/* 4 std */
>> +A ret_arg (A a) {return a;}
>> +
>> +/* 4 std */
>> +void st_arg (A a, A *p) {*p = a;}
>> +
>> +/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M|\mlvx\M} 2 } } */
>> +/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M|\mstvx\M} 2 } } 
>> */
>> +/* { dg-final { scan-assembler-times {\mstd\M} 8 } } */
>
> You need at least ISA 2.06 (power7) to have {l,st}xvd2x, and elfv2 does
> not guarantee that.  Have you tested on something old as well (say 970)
> to see if the lvx/stvx is generated as expected?  For that you need to
> have AltiVec enabled as well, so dg-require that?

Thanks a lot for pointing this out!
Compiling this case manually without power7, it does not generate these
vector load/store insns.  To check those insns, power7 is needed.

>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
>> @@ -0,0 +1,22 @@
>> +/* PR target/65421 */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +/* { dg-require-effective-target powerpc_elfv2 } */
>> +
>> +typedef struct SA
>> +{
>> +  double a[3];
>> +} A;
>> +
>> +/* 3 lfd */
>> +A ret_arg_pt (A *a){return *a;}
>> +
>> +/* blr */
>> +A ret_arg (A a) {return a;}
>> +
>> +/* 3 stfd */
>> +void st_arg (A a, A *p) {*p = a;}
>> +
>> +/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */
>> +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
>> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */
>
> Comment that last one?  Just something as simple as "count insns" is
> enough :-)
Sure, thanks!

BR,
Jeff (Jiufu)

>
>
> Segher


Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2022-12-05 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 5 Dec 2022 at 16:50, Richard Sandiford
 wrote:
>
> Richard Sandiford via Gcc-patches  writes:
> > Prathamesh Kulkarni  writes:
> >> Hi,
> >> For the following test-case:
> >>
> >> int16x8_t foo(int16_t x, int16_t y)
> >> {
> >>   return (int16x8_t) { x, y, x, y, x, y, x, y };
> >> }
> >>
> >> Code gen at -O3:
> >> foo:
> >> dup v0.8h, w0
> >> ins v0.h[1], w1
> >> ins v0.h[3], w1
> >> ins v0.h[5], w1
> >> ins v0.h[7], w1
> >> ret
> >>
> >> For 16 elements, it results in 8 ins instructions which might not be
> >> optimal perhaps.
> >> I guess, the above code-gen would be equivalent to the following ?
> >> dup v0.8h, w0
> >> dup v1.8h, w1
> >> zip1 v0.8h, v0.8h, v1.8h
> >>
> >> I have attached patch to do the same, if number of elements >= 8,
> >> which should be possibly better compared to current code-gen ?
> >> Patch passes bootstrap+test on aarch64-linux-gnu.
> >> Does the patch look OK ?
> >>
> >> Thanks,
> >> Prathamesh
> >>
> >> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> >> index c91df6f5006..e5dea70e363 100644
> >> --- a/gcc/config/aarch64/aarch64.cc
> >> +++ b/gcc/config/aarch64/aarch64.cc
> >> @@ -22028,6 +22028,39 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >>return;
> >>  }
> >>
> >> +  /* Check for interleaving case.
> >> + For eg if initializer is (int16x8_t) {x, y, x, y, x, y, x, y}.
> >> + Generate following code:
> >> + dup v0.h, x
> >> + dup v1.h, y
> >> + zip1 v0.h, v0.h, v1.h
> >> + for "large enough" initializer.  */
> >> +
> >> +  if (n_elts >= 8)
> >> +{
> >> +  int i;
> >> +  for (i = 2; i < n_elts; i++)
> >> +if (!rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, i % 2)))
> >> +  break;
> >> +
> >> +  if (i == n_elts)
> >> +{
> >> +  machine_mode mode = GET_MODE (target);
> >> +  rtx dest[2];
> >> +
> >> +  for (int i = 0; i < 2; i++)
> >> +{
> >> +  rtx x = copy_to_mode_reg (GET_MODE_INNER (mode), XVECEXP (vals, 
> >> 0, i));
> >
> > Formatting nit: long line.
> >
> >> +  dest[i] = gen_reg_rtx (mode);
> >> +  aarch64_emit_move (dest[i], gen_vec_duplicate (mode, x));
> >> +}
> >
> > This could probably be written:
> >
> > for (int i = 0; i < 2; i++)
> >   {
> > rtx x = expand_vector_broadcast (mode, XVECEXP (vals, 0, i));
> > dest[i] = force_reg (GET_MODE_INNER (mode), x);
>
> Oops, I meant "mode" rather than "GET_MODE_INNER (mode)", sorry.
Thanks, I have pushed the change in
769370f3e2e04823c8a621d8ffa756dd83ebf21e after running
bootstrap+test on aarch64-linux-gnu.

Thanks,
Prathamesh
>
> >   }
> >
> > which avoids forcing constant elements into a register before the 
> > duplication.
> > OK with that change if it works.
> >
> > Thanks,
> > Richard
> >
> >> +
> >> +  rtvec v = gen_rtvec (2, dest[0], dest[1]);
> >> +  emit_set_insn (target, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1));
> >> +  return;
> >> +}
> >> +}
> >> +
> >>enum insn_code icode = optab_handler (vec_set_optab, mode);
> >>gcc_assert (icode != CODE_FOR_nothing);
> >>
> >> diff --git a/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c 
> >> b/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c
> >> new file mode 100644
> >> index 000..ee775048589
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c
> >> @@ -0,0 +1,37 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O3" } */
> >> +/* { dg-final { check-function-bodies "**" "" "" } } */
> >> +
> >> +#include 
> >> +
> >> +/*
> >> +** foo:
> >> +**  ...
> >> +**  dup v[0-9]+\.8h, w[0-9]+
> >> +**  dup v[0-9]+\.8h, w[0-9]+
> >> +**  zip1 v[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h
> >> +**  ...
> >> +**  ret
> >> +*/
> >> +
> >> +int16x8_t foo(int16_t x, int y)
> >> +{
> >> +  int16x8_t v = (int16x8_t) {x, y, x, y, x, y, x, y};
> >> +  return v;
> >> +}
> >> +
> >> +/*
> >> +** foo2:
> >> +**  ...
> >> +**  dup v[0-9]+\.8h, w[0-9]+
> >> +**  movi v[0-9]+\.8h, 0x1
> >> +**  zip1 v[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h
> >> +**  ...
> >> +**  ret
> >> +*/
> >> +
> >> +int16x8_t foo2(int16_t x)
> >> +{
> >> +  int16x8_t v = (int16x8_t) {x, 1, x, 1, x, 1, x, 1};
> >> +  return v;
> >> +}


Re: Re: [PATCH] RISC-V: optimize stack manipulation in save-restore

2022-12-05 Thread Fei Gao
Hi Palmer and all, 

I have split the patches and triggered a new thread.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg297206.html

Could you please review at your convenience?

Thanks & BR, 
Fei

On 2022-12-01 11:07  Fei Gao  wrote:
>
>On 2022-12-01 06:50  Palmer Dabbelt  wrote:
>>
>>On Wed, 30 Nov 2022 00:37:17 PST (-0800), gao...@eswincomputing.com wrote:
>>> The stack space that save-restore reserves is not well accumulated in
>>> stack allocation and deallocation.
>>> This patch allows fewer instructions to be used in stack allocation and
>>> deallocation if save-restore is enabled,
>>> and also gives a much clearer logic for save-restore stack manipulation.
>>>
>>> before patch:
>>> bar:
>>> call t0,__riscv_save_4
>>> addi sp,sp,-64
>>> ...
>>> li  t0,-12288
>>> addi t0,t0,-1968 # optimized out after patch
>>> add sp,sp,t0 # prologue
>>> ...
>>> li  t0,12288 # epilogue
>>> addi t0,t0,2000 # optimized out after patch
>>> add sp,sp,t0
>>> ...
>>> addi sp,sp,32
>>> tail __riscv_restore_4
>>>
>>> after patch:
>>> bar:
>>> call t0,__riscv_save_4
>>> addi sp,sp,-2032
>>> ...
>>> li  t0,-12288
>>> add sp,sp,t0 # prologue
>>> ...
>>> li  t0,12288 # epilogue
>>> add sp,sp,t0
>>> ...
>>> addi sp,sp,2032
>>> tail __riscv_restore_4
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/riscv.cc (riscv_first_stack_step): Add a new
>>> function parameter remaining_size.
>>> (riscv_compute_frame_info): Adapt to the new riscv_first_stack_step
>>> interface.
>>> (riscv_expand_prologue): Consider save-restore in stack allocation.
>>> (riscv_expand_epilogue): Consider save-restore in stack
>>> deallocation.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/riscv/stack_save_restore.c: New test.
>>> ---
>>>  gcc/config/riscv/riscv.cc | 58 ++-
>>>  .../gcc.target/riscv/stack_save_restore.c | 40 +
>>>  2 files changed, 70 insertions(+), 28 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c
>>
>>I guess with the RISC-V backend still being open for things as big as
>>the V port we should probably be taking code like this as well?  I
>>wouldn't be opposed to making an exception for the V code and holding
>>everything else back, though.
>>
>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>> index 05bdba5ab4d..9e92e729a5f 100644
>>> --- a/gcc/config/riscv/riscv.cc
>>> +++ b/gcc/config/riscv/riscv.cc
>>> @@ -4634,7 +4634,7 @@ riscv_save_libcall_count (unsigned mask)
>>> They decrease stack_pointer_rtx but leave frame_pointer_rtx and
>>> hard_frame_pointer_rtx unchanged.  */
>>>
>>> -static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info 
>>> *frame);
>>> +static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info 
>>> *frame, poly_int64 remaining_size);
>>>
>>>  /* Handle stack align for poly_int.  */
>>>  static poly_int64
>>> @@ -4663,7 +4663,7 @@ riscv_compute_frame_info (void)
>>>   save/restore t0.  We check for this before clearing the frame struct. 
>>> */
>>>    if (cfun->machine->interrupt_handler_p)
>>>  {
>>> -  HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>>> +  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, 
>>> frame->total_size);
>>>    if (! POLY_SMALL_OPERAND_P ((frame->total_size - step1)))
>>>  interrupt_save_prologue_temp = true;
>>>  }
>>> @@ -4913,31 +4913,31 @@ riscv_restore_reg (rtx reg, rtx mem)
>>> without adding extra instructions.  */
>>>
>>>  static HOST_WIDE_INT
>>> -riscv_first_stack_step (struct riscv_frame_info *frame)
>>> +riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 
>>> remaining_size)
>>>  {
>>> -  HOST_WIDE_INT frame_total_constant_size;
>>> -  if (!frame->total_size.is_constant ())
>>> -    frame_total_constant_size
>>> -  = riscv_stack_align (frame->total_size.coeffs[0])
>>> -   - riscv_stack_align (frame->total_size.coeffs[1]);
>>> +  HOST_WIDE_INT remaining_const_size;
>>> +  if (!remaining_size.is_constant ())
>>> +    remaining_const_size
>>> +  = riscv_stack_align (remaining_size.coeffs[0])
>>> +   - riscv_stack_align (remaining_size.coeffs[1]);
>>
>>The alignment looks off here, at least in the email.  Worth fixing it up
>>if you're touching the lines anyway.
>
>Sure, i will change RISCV_STACK_ALIGN into riscv_stack_align.
>
>>
>>>    else
>>> -    frame_total_constant_size = frame->total_size.to_constant ();
>>> +    remaining_const_size = remaining_size.to_constant ();
>>>
>>> -  if (SMALL_OPERAND (frame_total_constant_size))
>>> -    return frame_total_constant_size;
>>> +  if (SMALL_OPERAND (remaining_const_size))
>>> +    return remaining_const_size;
>>>
>>>    HOST_WIDE_INT min_first_step =
>>> -    RISCV_STACK_ALIGN ((frame->total_size - 
>>> frame->frame_pointer_offset).to_constant());
>>> +    RISCV_STACK_ALIGN ((remaining_size - 
>>> frame->frame_pointer_offset).to_constant());

Re: [PATCH] range-op-float: Improve binary reverse operations

2022-12-05 Thread Andrew MacLeod via Gcc-patches



On 12/5/22 17:38, Jakub Jelinek wrote:

On Mon, Dec 05, 2022 at 09:54:09PM +0100, Jakub Jelinek via Gcc-patches wrote:

On Mon, Dec 05, 2022 at 03:43:16PM -0500, Andrew MacLeod wrote:

I'd actually prefer to avoid passing the tree code around... we're trying to
avoid that sort of thing even though Aldy temporarily introduced them to
range-ops. He's supposed to remove that next stage 1 :-P   Ideally anything
"special" is locally contained to the specific routine.

Would a bool divide_op2 argument be better (perhaps defaulted to false)?
Inlining float_binary_op_range_finish by hand doesn't seem to be a good
idea, if it needs to be changed, it would need to be changed in multiple
places...


Yeah, that's also fine.

Andrew



In patch form on top of PR107975 patch.

2022-12-05  Jakub Jelinek  

PR tree-optimization/107972
* range-op-float.cc (frange_drop_infs): New function.
(float_binary_op_range_finish): Add DIV_OP2 argument.  If DIV_OP2 is
false and lhs is finite or if DIV_OP2 is true and lhs is non-zero and
not NAN, r must be finite too.
(foperator_div::op2_range): Pass true to DIV_OP2 of
float_binary_op_range_finish.

--- gcc/range-op-float.cc.jj2022-12-05 11:17:34.900573272 +0100
+++ gcc/range-op-float.cc   2022-12-05 16:13:54.414845672 +0100
@@ -330,6 +330,18 @@ frange_drop_ninf (frange &r, tree type)
r.intersect (tmp);
  }
  
+// Crop R to [MIN, MAX] where MAX is the maximum representable number
+// for TYPE and MIN the minimum representable number for TYPE.
+
+static inline void
+frange_drop_infs (frange &r, tree type)
+{
+  REAL_VALUE_TYPE max = real_max_representable (type);
+  REAL_VALUE_TYPE min = real_min_representable (type);
+  frange tmp (type, min, max);
+  r.intersect (tmp);
+}
+
  // If zero is in R, make sure both -0.0 and +0.0 are in the range.
  
  static inline void

@@ -1883,7 +1895,7 @@ foperator_unordered_equal::op1_range (fr
  
  static bool

  float_binary_op_range_finish (bool ret, frange &r, tree type,
- const frange &lhs)
+ const frange &lhs, bool div_op2 = false)
  {
if (!ret)
  return false;
@@ -1904,7 +1916,20 @@ float_binary_op_range_finish (bool ret,
// If lhs isn't NAN, then neither operand could be NAN,
// even if the reverse operation does introduce a maybe_nan.
if (!lhs.maybe_isnan ())
-r.clear_nan ();
+{
+  r.clear_nan ();
+  if (div_op2
+ ? !(real_compare (LE_EXPR, &lhs.lower_bound (), &dconst0)
+ && real_compare (GE_EXPR, &lhs.upper_bound (), &dconst0))
+ : !(real_isinf (&lhs.lower_bound ())
+ || real_isinf (&lhs.upper_bound ())))
+   // For reverse + or - or * or op1 of /, if result is finite, then
+   // r must be finite too, as X + INF or X - INF or X * INF or
+   // INF / X is always +-INF or NAN.  For op2 of /, if result is
+   // non-zero and not NAN, r must be finite, as X / INF is always
+   // 0 or NAN.
+   frange_drop_infs (r, type);
+}
// If lhs is a maybe or known NAN, the operand could be
// NAN.
else
@@ -2330,7 +2355,7 @@ public:
  if (!ret)
return ret;
  if (lhs.known_isnan () || op1.known_isnan () || op1.undefined_p ())
-  return float_binary_op_range_finish (ret, r, type, lhs);
+  return float_binary_op_range_finish (ret, r, type, lhs, true);
  const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
  const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
  const REAL_VALUE_TYPE &op1_lb = op1.lower_bound ();
@@ -2347,7 +2372,7 @@ public:
zero_to_inf_range (lb, ub, signbit_known);
r.set (type, lb, ub);
}
-return float_binary_op_range_finish (ret, r, type, lhs);
+return float_binary_op_range_finish (ret, r, type, lhs, true);
}
  private:
void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,


Jakub





Re: [PATCH] range-op-float: Improve binary reverse operations

2022-12-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Dec 05, 2022 at 09:54:09PM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Dec 05, 2022 at 03:43:16PM -0500, Andrew MacLeod wrote:
> > I'd actually prefer to avoid passing the tree code around... we're trying to
> > avoid that sort of thing even though Aldy temporarily introduced them to
> > range-ops. He's supposed to remove that next stage 1 :-P   Ideally anything
> > "special" is locally contained to the specific routine.
> 
> Would a bool divide_op2 argument be better (perhaps defaulted to false)?
> Inlining float_binary_op_range_finish by hand doesn't seem to be a good
> idea, if it needs to be changed, it would need to be changed in multiple
> places...

In patch form on top of PR107975 patch.

2022-12-05  Jakub Jelinek  

PR tree-optimization/107972
* range-op-float.cc (frange_drop_infs): New function.
(float_binary_op_range_finish): Add DIV_OP2 argument.  If DIV_OP2 is
false and lhs is finite or if DIV_OP2 is true and lhs is non-zero and
not NAN, r must be finite too.
(foperator_div::op2_range): Pass true to DIV_OP2 of
float_binary_op_range_finish.

--- gcc/range-op-float.cc.jj2022-12-05 11:17:34.900573272 +0100
+++ gcc/range-op-float.cc   2022-12-05 16:13:54.414845672 +0100
@@ -330,6 +330,18 @@ frange_drop_ninf (frange &r, tree type)
   r.intersect (tmp);
 }
 
+// Crop R to [MIN, MAX] where MAX is the maximum representable number
+// for TYPE and MIN the minimum representable number for TYPE.
+
+static inline void
+frange_drop_infs (frange &r, tree type)
+{
+  REAL_VALUE_TYPE max = real_max_representable (type);
+  REAL_VALUE_TYPE min = real_min_representable (type);
+  frange tmp (type, min, max);
+  r.intersect (tmp);
+}
+
 // If zero is in R, make sure both -0.0 and +0.0 are in the range.
 
 static inline void
@@ -1883,7 +1895,7 @@ foperator_unordered_equal::op1_range (fr
 
 static bool
 float_binary_op_range_finish (bool ret, frange &r, tree type,
- const frange &lhs)
+ const frange &lhs, bool div_op2 = false)
 {
   if (!ret)
 return false;
@@ -1904,7 +1916,20 @@ float_binary_op_range_finish (bool ret,
   // If lhs isn't NAN, then neither operand could be NAN,
   // even if the reverse operation does introduce a maybe_nan.
   if (!lhs.maybe_isnan ())
-r.clear_nan ();
+{
+  r.clear_nan ();
+  if (div_op2
+ ? !(real_compare (LE_EXPR, &lhs.lower_bound (), &dconst0)
+ && real_compare (GE_EXPR, &lhs.upper_bound (), &dconst0))
+ : !(real_isinf (&lhs.lower_bound ())
+ || real_isinf (&lhs.upper_bound ())))
+   // For reverse + or - or * or op1 of /, if result is finite, then
+   // r must be finite too, as X + INF or X - INF or X * INF or
+   // INF / X is always +-INF or NAN.  For op2 of /, if result is
+   // non-zero and not NAN, r must be finite, as X / INF is always
+   // 0 or NAN.
+   frange_drop_infs (r, type);
+}
   // If lhs is a maybe or known NAN, the operand could be
   // NAN.
   else
@@ -2330,7 +2355,7 @@ public:
 if (!ret)
   return ret;
 if (lhs.known_isnan () || op1.known_isnan () || op1.undefined_p ())
-  return float_binary_op_range_finish (ret, r, type, lhs);
+  return float_binary_op_range_finish (ret, r, type, lhs, true);
 const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
 const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
 const REAL_VALUE_TYPE &op1_lb = op1.lower_bound ();
@@ -2347,7 +2372,7 @@ public:
zero_to_inf_range (lb, ub, signbit_known);
r.set (type, lb, ub);
   }
-return float_binary_op_range_finish (ret, r, type, lhs);
+return float_binary_op_range_finish (ret, r, type, lhs, true);
   }
 private:
  void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,


Jakub



[PATCH] range-op-float: Fix up ICE in lower_bound [PR107975]

2022-12-05 Thread Jakub Jelinek via Gcc-patches
Hi!

According to 
https://gcc.gnu.org/pipermail/gcc-regression/2022-December/077258.html
my patch caused some ICEs, e.g. the following testcase ICEs.
The problem is that the lower_bound and upper_bound methods on a frange assert
that the range isn't VR_NAN or VR_UNDEFINED.
All the op1_range/op2_range methods already return early if lhs.undefined_p,
but the other cases (when lhs is VR_NAN or the other opN is VR_NAN or
VR_UNDEFINED) aren't.  float_binary_op_range_finish will DTRT for those
cases already.

Ok for trunk if this passes bootstrap/regtest?

2022-12-05  Jakub Jelinek  

PR tree-optimization/107975
* range-op-float.cc (foperator_mult::op1_range,
foperator_div::op1_range, foperator_div::op2_range): Just
return float_binary_op_range_finish result if lhs is known
NAN, or the other operand is known NAN or UNDEFINED.

* gcc.dg/pr107975.c: New test.

--- gcc/range-op-float.cc.jj2022-12-05 11:17:34.900573272 +0100
+++ gcc/range-op-float.cc   2022-12-05 17:58:38.059754128 +0100
@@ -2146,6 +2146,8 @@ public:
 bool ret = rdiv.fold_range (r, type, lhs, op2);
 if (ret == false)
   return false;
+if (lhs.known_isnan () || op2.known_isnan () || op2.undefined_p ())
+  return float_binary_op_range_finish (ret, r, type, lhs);
 const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
 const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
 const REAL_VALUE_TYPE &op2_lb = op2.lower_bound ();
@@ -2296,6 +2298,8 @@ public:
 bool ret = fop_mult.fold_range (r, type, lhs, op2);
 if (!ret)
   return ret;
+if (lhs.known_isnan () || op2.known_isnan () || op2.undefined_p ())
+  return float_binary_op_range_finish (ret, r, type, lhs);
 const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
 const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
 const REAL_VALUE_TYPE &op2_lb = op2.lower_bound ();
@@ -2325,6 +2329,8 @@ public:
 bool ret = fold_range (r, type, op1, lhs);
 if (!ret)
   return ret;
+if (lhs.known_isnan () || op1.known_isnan () || op1.undefined_p ())
+  return float_binary_op_range_finish (ret, r, type, lhs);
 const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
 const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
 const REAL_VALUE_TYPE &op1_lb = op1.lower_bound ();
--- gcc/testsuite/gcc.dg/pr107975.c.jj  2022-12-05 19:15:56.518851401 +0100
+++ gcc/testsuite/gcc.dg/pr107975.c 2022-12-05 19:15:04.014620281 +0100
@@ -0,0 +1,15 @@
+/* PR tree-optimization/107975 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-add-options ieee } */
+
+double
+foo (double x, double y)
+{
+  if (x == 42.0)
+return 1.0;
+  double r = x * y;
+  if (!__builtin_isnan (r))
+__builtin_unreachable ();
+  return r;
+}

Jakub



Re: [PATCH] range-op-float: Improve binary reverse operations

2022-12-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Dec 05, 2022 at 03:43:16PM -0500, Andrew MacLeod wrote:
> I'd actually prefer to avoid passing the tree code around... we're trying to
> avoid that sort of thing even though Aldy temporarily introduced them to
> range-ops. He's supposed to remove that next stage 1 :-P   Ideally anything
> "special" is locally contained to the specific routine.

Would a bool divide_op2 argument be better (perhaps defaulted to false)?
Inlining float_binary_op_range_finish by hand doesn't seem to be a good
idea, if it needs to be changed, it would need to be changed in multiple
places...

Jakub



Re: [PATCH] tree-optimization/107852 - missed optimization with PHIs

2022-12-05 Thread Jan-Benedict Glaw
On Tue, 2022-11-29 14:30:22 +0100, Richard Biener via Gcc-patches 
 wrote:
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> 
>   PR tree-optimization/107852
>   * tree-ssa-sccvn.cc (visit_phi): Use equivalences recorded
>   as predicated values to elide more redundant PHIs.
> 
>   * gcc.dg/tree-ssa/ssa-fre-101.c: New testcase.

This seems to trigger an issue when building the Linux powerpc kernel
for the skiroot_defconfig:

[mk all 2022-12-05 19:50:10]   powerpc64-linux-gcc 
-Wp,-MMD,drivers/dma-buf/.dma-fence-array.o.d -nostdinc 
-I./arch/powerpc/include -I./arch/powerpc/include/generated  -I./include 
-I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi 
-I./include/uapi -I./include/generated/uapi -include 
./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include 
./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc 
-DHAVE_AS_ATHIGH=1 -fmacro-prefix-map=./= -Wall -Wundef 
-Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common 
-fshort-wchar -fno-PIE -Werror=implicit-function-declaration 
-Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu11 
-mlittle-endian -m64 -msoft-float -pipe -mtraceback=no -mabi=elfv2 
-mcmodel=medium -mno-pointers-to-nested-functions -mcpu=power8 -mtune=power10 
-mno-prefixed -mno-pcrel -mno-altivec -mno-vsx -mno-mma 
-fno-asynchronous-unwind-tables -mno-string -Wa,-maltivec -Wa,-mpower4 
-Wa,-many -mno-strict-align -mlittle-endian -mstack-protector-guard=tls 
-mstack-protector-guard-reg=r13 -fno-delete-null-pointer-checks 
-Wno-frame-address -Wno-format-truncation -Wno-format-overflow 
-Wno-address-of-packed-member -Os -fno-allow-store-data-races 
-Wframe-larger-than=2048 -fstack-protector-strong -Wno-main 
-Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-dangling-pointer 
-fomit-frame-pointer -ftrivial-auto-var-init=zero -fno-stack-clash-protection 
-Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wcast-function-type 
-Wno-stringop-truncation -Wno-stringop-overflow -Wno-restrict 
-Wno-maybe-uninitialized -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 
-fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=date-time 
-Werror=incompatible-pointer-types -Werror=designated-init 
-Wno-packed-not-aligned -mstack-protector-guard-offset=2800
-DKBUILD_MODFILE='"drivers/dma-buf/dma-fence-array"' 
-DKBUILD_BASENAME='"dma_fence_array"' -DKBUILD_MODNAME='"dma_fence_array"' 
-D__KBUILD_MODNAME=kmod_dma_fence_array -c -o drivers/dma-buf/dma-fence-array.o 
drivers/dma-buf/dma-fence-array.c  
[mk all 2022-12-05 19:50:10] drivers/dma-buf/dma-fence-array.c: In function 
'dma_fence_array_create':
[mk all 2022-12-05 19:50:10] drivers/dma-buf/dma-fence-array.c:154:25: error: 
control flow in the middle of basic block 12
[mk all 2022-12-05 19:50:10]   154 | struct dma_fence_array 
*dma_fence_array_create(int num_fences,
[mk all 2022-12-05 19:50:10]   | 
^~
[mk all 2022-12-05 19:50:10] during GIMPLE pass: ivopts
[mk all 2022-12-05 19:50:10] drivers/dma-buf/dma-fence-array.c:154:25: internal 
compiler error: verify_flow_info failed
[mk all 2022-12-05 19:50:10] 0x19ea876 internal_error(char const*, ...)
[mk all 2022-12-05 19:50:10]???:0
[mk all 2022-12-05 19:50:10] 0x94b00e verify_flow_info()
[mk all 2022-12-05 19:50:10]???:0
[mk all 2022-12-05 19:50:10] Please submit a full bug report, with preprocessed 
source (by using -freport-bug).
[mk all 2022-12-05 19:50:10] Please include the complete backtrace with any bug 
report.
[mk all 2022-12-05 19:50:10] See <https://gcc.gnu.org/bugs/> for instructions.

Maybe you've got an idea, otherwise I'll try to reproduce it manually.
(That's all automated building.)

Thanks,
  Jan-Benedict


Re: [PATCH] testsuite, X86, Darwin: Fix bf16 ABI tests for Mach-O/macOS ABI.

2022-12-05 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 5, 2022 at 10:17 PM Iain Sandoe  wrote:
>
> Hi Uros,
>
> > On 5 Dec 2022, at 21:07, Uros Bizjak via Gcc-patches 
> >  wrote:
> >
> > On Mon, Dec 5, 2022 at 3:54 PM Iain Sandoe  wrote:
> >>
> >> Hi Uros,
> >>
> >>> On 5 Dec 2022, at 10:37, Uros Bizjak via Gcc-patches 
> >>>  wrote:
> >>>
> >>> On Sun, Dec 4, 2022 at 9:30 PM Iain Sandoe  wrote:
> 
> >>
>  gcc/testsuite/ChangeLog:
> 
>    * gcc.target/x86_64/abi/bf16/args.h: Make xmm_regs, x87_regs 
>  extern.
>    * gcc.target/x86_64/abi/bf16/m256bf16/args.h: Likewise.
>    * gcc.target/x86_64/abi/bf16/m512bf16/args.h: Likewise.
>    * gcc.target/x86_64/abi/bf16/asm-support.S: Add Mach-O variant.
>    * gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Likewise.
>    * gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Likewise.
> >>>
> >>> Please note that in other directories asm-support-darwin.s is
> >>> introduced and included via .exp file. Is there a reason a different
> >>> approach is introduced here?
> >>
> >> Since it seems that testcases get added and amended without considering any
> >> sub-target apart from x86_64-linux-gnu (even by very experienced 
> >> contributors),
> >> I was hoping that the Darwin section might prompt folks to remember that 
> >> there
> >> are several other sub-targets.
> >>
> >> However, the main thing is to fix the tests .. so here’s a version using 
> >> separate
> >> files.
> >
> > extern void (*callthis)(void);
> > extern unsigned long long
> > rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
> > -XMM_T xmm_regs[16];
> > -X87_T x87_regs[8];
> > +extern XMM_T xmm_regs[16];
> > +extern X87_T x87_regs[8];
> >
> > Do you still need this change? Existing test files are compiled without 
> > extern.
> >
> +.globl _callthis
> +.zerofill __DATA,__bss,_callthis,8,3
> +.globl _rax
> +.zerofill __DATA,__bss,_rax,8,3
> +.globl _rbx
> +.zerofill __DATA,__bss,_rbx,8,3
> +.globl _rcx
> +.zerofill __DATA,__bss,_rcx,8,3
> +.globl _rdx
> +.zerofill __DATA,__bss,_rdx,8,3
> > ...
> >
> > I wonder if the above approach is better than existing:
> >
> .comm _callthis,8
> .comm _rax,8
> .comm _rbx,8
> .comm _rcx,8
> .comm _rdx,8
> > ...
>
> As noted in the changelog, direct access to common data is not permitted in
> the Darwin ABI [for x86_64, it would need to be _xxx@GOTPCREL(%rip)..];
> that's why these have been moved to bss.

Thanks for the explanation!

The patch is OK.

> > It is strange to have two different approaches for similar tests. If
> > the new approach is better, we should also change existing asm-support
> > files.
>
> could be, I have not checked other case so far (extremely limited time at the 
> moment)
>
> Quite likely, the accesses work in the testcases, despite violating the ABI.

This is never a good sign, it will break sooner or later...

Thanks,
Uros.


Re: [PATCH] testsuite, X86, Darwin: Fix bf16 ABI tests for Mach-O/macOS ABI.

2022-12-05 Thread Iain Sandoe
Hi Uros,

> On 5 Dec 2022, at 21:07, Uros Bizjak via Gcc-patches 
>  wrote:
> 
> On Mon, Dec 5, 2022 at 3:54 PM Iain Sandoe  wrote:
>> 
>> Hi Uros,
>> 
>>> On 5 Dec 2022, at 10:37, Uros Bizjak via Gcc-patches 
>>>  wrote:
>>> 
>>> On Sun, Dec 4, 2022 at 9:30 PM Iain Sandoe  wrote:
 
>> 
 gcc/testsuite/ChangeLog:
 
   * gcc.target/x86_64/abi/bf16/args.h: Make xmm_regs, x87_regs extern.
   * gcc.target/x86_64/abi/bf16/m256bf16/args.h: Likewise.
   * gcc.target/x86_64/abi/bf16/m512bf16/args.h: Likewise.
   * gcc.target/x86_64/abi/bf16/asm-support.S: Add Mach-O variant.
   * gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Likewise.
   * gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Likewise.
>>> 
>>> Please note that in other directories asm-support-darwin.s is
>>> introduced and included via .exp file. Is there a reason a different
>>> approach is introduced here?
>> 
>> Since it seems that testcases get added and amended without considering any
>> sub-target apart from x86_64-linux-gnu (even by very experienced 
>> contributors),
>> I was hoping that the Darwin section might prompt folks to remember that 
>> there
>> are several other sub-targets.
>> 
>> However, the main thing is to fix the tests .. so here’s a version using 
>> separate
>> files.
> 
> extern void (*callthis)(void);
> extern unsigned long long
> rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
> -XMM_T xmm_regs[16];
> -X87_T x87_regs[8];
> +extern XMM_T xmm_regs[16];
> +extern X87_T x87_regs[8];
> 
> Do you still need this change? Existing test files are compiled without 
> extern.
> 
> +.globl _callthis
> +.zerofill __DATA,__bss,_callthis,8,3
> +.globl _rax
> +.zerofill __DATA,__bss,_rax,8,3
> +.globl _rbx
> +.zerofill __DATA,__bss,_rbx,8,3
> +.globl _rcx
> +.zerofill __DATA,__bss,_rcx,8,3
> +.globl _rdx
> +.zerofill __DATA,__bss,_rdx,8,3
> ...
> 
> I wonder if the above approach is better than existing:
> 
>    .comm _callthis,8
>    .comm _rax,8
>    .comm _rbx,8
>    .comm _rcx,8
>    .comm _rdx,8
> ...

As noted in the changelog, direct access to common data is not permitted in the 
Darwin
ABI [for x86_64, it would need to be _xxx@GOTPCREL(%rip)..] that’s why these 
have
been moved to bss.

> It is strange to have two different approaches for similar tests. If
> the new approach is better, we should also change existing asm-support
> files.

could be, I have not checked other case so far (extremely limited time at the 
moment)

Quite likely, the accesses work in the testcases, despite violating the ABI.

Iain

Re: [PATCH] Use cxx11 abi in versioned namespace

2022-12-05 Thread François Dumont via Gcc-patches

I just rebased this patch.

All good apart from the to_chars/from_chars symbols issue.

François


On 11/10/22 19:28, François Dumont wrote:

Hi

    Now that the pretty printer is fixed (once the patch is validated) I'd
like to propose this patch again.


    Note that I am adding a check on the pretty printer with a std::any on
a std::wstring. I did so because of the FIXME in printers.py which deals
with 'std::string' explicitly. It looks like in my case, where there is no
'std::string' but just a 'std::__8::string', we do not need the workaround.


    Once again I am attaching the versioned namespace bump patch as well,
as I think that adopting the cxx11 abi in this mode is a good enough
reason to bump it. If you agree, let me know if I should squash the
commits before pushing.


    libstdc++: [_GLIBCXX_INLINE_VERSION] Use cxx11 abi

    Use cxx11 abi when activating versioned namespace mode.

    libstdcxx-v3/ChangeLog:

    * acinclude.m4 [GLIBCXX_ENABLE_LIBSTDCXX_DUAL_ABI]: 
Default to "new" libstdcxx abi.
    * config/locale/dragonfly/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Define money_base

    members.
    * config/locale/generic/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Likewise.
    * config/locale/gnu/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Likewise.

    * config/locale/gnu/numeric_members.cc
    [!_GLIBCXX_USE_DUAL_ABI](__narrow_multibyte_chars): Define.
    * configure: Regenerate.
    * include/bits/c++config
    [_GLIBCXX_INLINE_VERSION](_GLIBCXX_NAMESPACE_CXX11, 
_GLIBCXX_BEGIN_NAMESPACE_CXX11): Define

    empty.
[_GLIBCXX_INLINE_VERSION](_GLIBCXX_END_NAMESPACE_CXX11, 
_GLIBCXX_DEFAULT_ABI_TAG): Likewise.

    * python/libstdcxx/v6/printers.py
    (StdStringPrinter::__init__): Set self.new_string to True 
when std::__8::basic_string type is

    found.
    * src/Makefile.am 
[ENABLE_SYMVERS_GNU_NAMESPACE](ldbl_alt128_compat_sources): Define empty.

    * src/Makefile.in: Regenerate.
    * src/c++11/Makefile.am (cxx11_abi_sources): Rename into...
    (dual_abi_sources): ...this, new. Also move several 
sources to...

    (sources): ...this.
    (extra_string_inst_sources): Move several sources to...
    (inst_sources): ...this.
    * src/c++11/Makefile.in: Regenerate.
    * src/c++11/cow-fstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-locale_init.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-sstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-stdexcept.cc 
[_GLIBCXX_USE_CXX11_ABI](error_category::_M_message):

    Skip definition.
    [_GLIBCXX_USE_CXX11_ABI]: Skip Transaction Memory TS 
definitions.
    * src/c++11/cow-string-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-string-io-inst.cc 
[_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
    * src/c++11/cow-wstring-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-wstring-io-inst.cc 
[_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
    * src/c++11/cxx11-hash_tr1.cc [!_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cxx11-ios_failure.cc 
[!_GLIBCXX_USE_CXX11_ABI]: Skip definitions.

    [!_GLIBCXX_USE_DUAL_ABI] (__ios_failure): Remove.
    * src/c++11/cxx11-locale-inst.cc: Cleanup, just include 
locale-inst.cc.
    * src/c++11/cxx11-stdexcept.cc [!_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.

    [!_GLIBCXX_USE_DUAL_ABI](__cow_string): Remove.
    * src/c++11/cxx11-wlocale-inst.cc 
[!_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
    * src/c++11/fstream-inst.cc [!_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions

    * src/c++11/locale-inst-numeric.h
[!_GLIBCXX_USE_DUAL_ABI](std::use_facet>, 
std::use_facet>): Instantiate.
[!_GLIBCXX_USE_DUAL_ABI](std::has_facet>, 
std::has_facet>): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](std::num_get<istreambuf_iterator>): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](std::num_put<ostreambuf_iterator>): Instantiate.
    * src/c++11/locale-inst.cc [!_GLIBCXX_USE_DUAL_ABI]: Build 
only when configured

    _GLIBCXX_USE_CXX11_ABI is equal to currently built abi.
    [!_GLIBCXX_USE_DUAL_ABI](__moneypunct_cache): 
Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__moneypunct_cache): 
Instantiate.

    [!_GLIBCXX_USE_DUAL_ABI](__numpunct_cache): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__timepunct): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__timepunct_cache): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](time_put<ostreambuf_iterator>): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](time_put_byname<ostreambuf_iterator>): Instantiate.

[!_GLIBCXX_USE_DUAL_ABI](__ctype_abstract_base): 

Re: [PATCH] tree, c++: optimize walk_tree_1 and cp_walk_subtrees

2022-12-05 Thread Jason Merrill via Gcc-patches

On 12/5/22 06:09, Prathamesh Kulkarni wrote:

On Mon, 5 Dec 2022 at 09:51, Patrick Palka via Gcc-patches
 wrote:


These functions currently repeatedly dereference tp during the subtree
walk, dereferences which the compiler can't CSE because it can't
guarantee that the subtree walking doesn't modify *tp.

But we already implicitly require that TREE_CODE (*tp) remains the same
throughout the subtree walks, so it doesn't seem to be a huge leap to
strengthen that to requiring *tp remains the same.

Hi Patrick,
Just wondering, in that case, if marking tp as const_tree *, instead
of tree *, would perhaps help the compiler
CSE some of the dereferences of *tp?


That wouldn't make a difference; even if *tp is const, the compiler 
can't be sure that a call won't change it.  And const_tree * doesn't 
even make *tp const, it makes **tp const.
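
A minimal illustration of that point, with the tree typedefs abbreviated
(not from the patch):

  typedef struct tree_node *tree;
  typedef const struct tree_node *const_tree;

  void
  walk (const_tree *tp, tree other, void (*cb) (const_tree *))
  {
    *tp = other;  /* Still allowed: *tp itself is not const.  */
    cb (tp);      /* The callback may reassign *tp, so loads of *tp
                     cannot be CSEd across the call.  */
  }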



So this patch manually CSEs the dereferences of *tp.  This means that
the callback function can no longer replace *tp with another tree (of
the same TREE_CODE) when walking one of its subtrees, but that doesn't
sound like a useful feature anyway.  This speeds up the C++ frontend by
about ~1.5% according to my experiments.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk stage3 or perhaps for stage1?


OK for stage 1.


gcc/cp/ChangeLog:

 * tree.cc (cp_walk_subtrees): Avoid repeatedly dereferencing tp.

gcc/ChangeLog:

 * tree.cc (walk_tree_1): Avoid repeatedly dereferencing tp
 and type_p.
---
  gcc/cp/tree.cc | 113 +
  gcc/tree.cc| 103 ++--
  2 files changed, 108 insertions(+), 108 deletions(-)

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 4066b014f6e..ceb0c75f559 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -5441,7 +5441,8 @@ tree
  cp_walk_subtrees (tree *tp, int *walk_subtrees_p, walk_tree_fn func,
   void *data, hash_set *pset)
  {
-  enum tree_code code = TREE_CODE (*tp);
+  tree t = *tp;
+  enum tree_code code = TREE_CODE (t);
tree result;

  #define WALK_SUBTREE(NODE) \
@@ -5452,7 +5453,7 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
walk_tree_fn func,
  }  \
while (0)

-  if (TYPE_P (*tp))
+  if (TYPE_P (t))
  {
/* If *WALK_SUBTREES_P is 1, we're interested in the syntactic form of
  the argument, so don't look through typedefs, but do walk into
@@ -5464,15 +5465,15 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
walk_tree_fn func,

  See find_abi_tags_r for an example of setting *WALK_SUBTREES_P to 2
  when that's the behavior the walk_tree_fn wants.  */
-  if (*walk_subtrees_p == 1 && typedef_variant_p (*tp))
+  if (*walk_subtrees_p == 1 && typedef_variant_p (t))
 {
- if (tree ti = TYPE_ALIAS_TEMPLATE_INFO (*tp))
+ if (tree ti = TYPE_ALIAS_TEMPLATE_INFO (t))
 WALK_SUBTREE (TI_ARGS (ti));
   *walk_subtrees_p = 0;
   return NULL_TREE;
 }

-  if (tree ti = TYPE_TEMPLATE_INFO (*tp))
+  if (tree ti = TYPE_TEMPLATE_INFO (t))
 WALK_SUBTREE (TI_ARGS (ti));
  }

@@ -5482,8 +5483,8 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
walk_tree_fn func,
switch (code)
  {
  case TEMPLATE_TYPE_PARM:
-  if (template_placeholder_p (*tp))
-   WALK_SUBTREE (CLASS_PLACEHOLDER_TEMPLATE (*tp));
+  if (template_placeholder_p (t))
+   WALK_SUBTREE (CLASS_PLACEHOLDER_TEMPLATE (t));
/* Fall through.  */
  case DEFERRED_PARSE:
  case TEMPLATE_TEMPLATE_PARM:
@@ -5497,63 +5498,63 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
walk_tree_fn func,
break;

  case TYPENAME_TYPE:
-  WALK_SUBTREE (TYPE_CONTEXT (*tp));
-  WALK_SUBTREE (TYPENAME_TYPE_FULLNAME (*tp));
+  WALK_SUBTREE (TYPE_CONTEXT (t));
+  WALK_SUBTREE (TYPENAME_TYPE_FULLNAME (t));
*walk_subtrees_p = 0;
break;

  case BASELINK:
-  if (BASELINK_QUALIFIED_P (*tp))
-   WALK_SUBTREE (BINFO_TYPE (BASELINK_ACCESS_BINFO (*tp)));
-  WALK_SUBTREE (BASELINK_FUNCTIONS (*tp));
+  if (BASELINK_QUALIFIED_P (t))
+   WALK_SUBTREE (BINFO_TYPE (BASELINK_ACCESS_BINFO (t)));
+  WALK_SUBTREE (BASELINK_FUNCTIONS (t));
*walk_subtrees_p = 0;
break;

  case PTRMEM_CST:
-  WALK_SUBTREE (TREE_TYPE (*tp));
+  WALK_SUBTREE (TREE_TYPE (t));
*walk_subtrees_p = 0;
break;

  case TREE_LIST:
-  WALK_SUBTREE (TREE_PURPOSE (*tp));
+  WALK_SUBTREE (TREE_PURPOSE (t));
break;

  case OVERLOAD:
-  WALK_SUBTREE (OVL_FUNCTION (*tp));
-  WALK_SUBTREE (OVL_CHAIN (*tp));
+  WALK_SUBTREE (OVL_FUNCTION (t));
+  WALK_SUBTREE (OVL_CHAIN (t));
*walk_subtrees_p = 0;
break;

  case USING_DECL:
-  WALK_SUBTREE (DECL_NAME (*tp));
-  WALK_SUBTREE 

Re: [PATCH] testsuite, X86, Darwin: Fix bf16 ABI tests for Mach-O/macOS ABI.

2022-12-05 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 5, 2022 at 3:54 PM Iain Sandoe  wrote:
>
> Hi Uros,
>
> > On 5 Dec 2022, at 10:37, Uros Bizjak via Gcc-patches 
> >  wrote:
> >
> > On Sun, Dec 4, 2022 at 9:30 PM Iain Sandoe  wrote:
> >>
>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.target/x86_64/abi/bf16/args.h: Make xmm_regs, x87_regs extern.
> >>* gcc.target/x86_64/abi/bf16/m256bf16/args.h: Likewise.
> >>* gcc.target/x86_64/abi/bf16/m512bf16/args.h: Likewise.
> >>* gcc.target/x86_64/abi/bf16/asm-support.S: Add Mach-O variant.
> >>* gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Likewise.
> >>* gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Likewise.
> >
> > Please note that in other directories asm-support-darwin.s is
> > introduced and included via .exp file. Is there a reason a different
> > approach is introduced here?
>
> Since it seems that testcases get added and amended without considering any
> sub-target apart from x86_64-linux-gnu (even by very experienced 
> contributors),
> I was hoping that the Darwin section might prompt folks to remember that there
> are several other sub-targets.
>
> However, the main thing is to fix the tests .. so here’s a version using 
> separate
> files.

 extern void (*callthis)(void);
 extern unsigned long long
rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
-XMM_T xmm_regs[16];
-X87_T x87_regs[8];
+extern XMM_T xmm_regs[16];
+extern X87_T x87_regs[8];

Do you still need this change? Existing test files are compiled without extern.

+.globl _callthis
+.zerofill __DATA,__bss,_callthis,8,3
+.globl _rax
+.zerofill __DATA,__bss,_rax,8,3
+.globl _rbx
+.zerofill __DATA,__bss,_rbx,8,3
+.globl _rcx
+.zerofill __DATA,__bss,_rcx,8,3
+.globl _rdx
+.zerofill __DATA,__bss,_rdx,8,3
...

I wonder if the above approach is better than existing:

    .comm _callthis,8
    .comm _rax,8
    .comm _rbx,8
    .comm _rcx,8
    .comm _rdx,8
...

It is strange to have two different approaches for similar tests. If
the new approach is better, we should also change existing asm-support
files.

Uros.


Re: [PATCH] range-op-float: Improve binary reverse operations

2022-12-05 Thread Andrew MacLeod via Gcc-patches



On 12/5/22 10:33, Jakub Jelinek wrote:

Hi!

On Mon, Dec 05, 2022 at 02:29:36PM +0100, Aldy Hernandez wrote:

So like this for multiplication op1/2_range if it passes bootstrap/regtest?
For division I'll need to go to a drawing board...

Sure, looks good to me.

Ulrich just filed PR107972, so in the light of that PR the following patch
attempts to do that differently.

Is this also ok if it passes bootstrap/regtest?



I'd actually prefer to avoid passing the tree code around... we're trying
to avoid that sort of thing even though Aldy temporarily introduced them
to range-ops.  He's supposed to remove that next stage 1 :-P  Ideally
anything "special" is locally contained to the specific routine.


It looks like the only time we actually do anything different is for
divide op2_range?  The divide op1_range seems to still call
frange_drop_infs () under the same conditions...


In which case, maybe we can just change op2_range to contain all the
special casing.  frange_drop_infs doesn't seem overly complicated, so
just specialize it?  I.e. for divide op2_range, do something like this
instead of the call:


if (!ret)
  return false;
if (r.known_isnan () || lhs.known_isnan () || r.undefined_p ())
  r.set_varying (type);
else if (!lhs.maybe_isnan ())
  {
    r.clear_nan();
    if (!contains_zero_p (lhs_lb, lhs_ub))
      frange_drop_infs (r, type);
  }
else
  r.update_nan ();
return true;


or whatever the sequence precisely works out to.

Andrew



As for testcase, I've tried both attached testcases, but unfortunately it
seems that in neither of the cases we actually figure out that res range
is finite (or for last function non-zero ordered).  So there is further
work needed on that.

2022-12-05  Jakub Jelinek  

PR tree-optimization/107972
* range-op-float.cc (frange_drop_infs): New function.
(float_binary_op_range_finish): Add OP argument.  If OP is not
RDIV_EXPR and lhs is finite, r must be finite too.
(foperator_plus::op1_range, foperator_minus::op1_range,
foperator_minus::op2_range, foperator_mult::op1_range): Pass
operation code to float_binary_op_range_finish.
(foperator_div::op1_range): Pass RDIV_EXPR to
float_binary_op_range_finish.  If lhs is finite, r must be finite
too.
(foperator_div::op2_range): Pass RDIV_EXPR to
float_binary_op_range_finish.  If lhs is not NAN nor zero, r must
be finite.

--- gcc/range-op-float.cc.jj2022-12-05 11:17:34.900573272 +0100
+++ gcc/range-op-float.cc   2022-12-05 16:13:54.414845672 +0100
@@ -330,6 +330,18 @@ frange_drop_ninf (frange &r, tree type)
r.intersect (tmp);
  }
  
+// Crop R to [MIN, MAX] where MAX is the maximum representable number
+// for TYPE and MIN the minimum representable number for TYPE.
+
+static inline void
+frange_drop_infs (frange &r, tree type)
+{
+  REAL_VALUE_TYPE max = real_max_representable (type);
+  REAL_VALUE_TYPE min = real_min_representable (type);
+  frange tmp (type, min, max);
+  r.intersect (tmp);
+}
+
  // If zero is in R, make sure both -0.0 and +0.0 are in the range.
  
  static inline void

@@ -1883,7 +1895,7 @@ foperator_unordered_equal::op1_range (fr
  
  static bool

  float_binary_op_range_finish (bool ret, frange &r, tree type,
-			      const frange &lhs)
+			      const frange &lhs, enum tree_code op)
  {
if (!ret)
  return false;
@@ -1904,7 +1916,16 @@ float_binary_op_range_finish (bool ret,
// If lhs isn't NAN, then neither operand could be NAN,
// even if the reverse operation does introduce a maybe_nan.
if (!lhs.maybe_isnan ())
-r.clear_nan ();
+{
+  r.clear_nan ();
+  if (op != RDIV_EXPR
+	  && !real_isinf (&lhs.lower_bound ())
+	  && !real_isinf (&lhs.upper_bound ())
+   // For reverse + or - or *, if result is finite, then r must
+   // be finite too, as X + INF or X - INF or X * INF is always
+   // +-INF or NAN.  For division handle it in the caller.
+   frange_drop_infs (r, type);
+}
// If lhs is a maybe or known NAN, the operand could be
// NAN.
else
@@ -2020,7 +2041,7 @@ public:
  if (!minus)
return false;
  return float_binary_op_range_finish (minus.fold_range (r, type, lhs, op2),
-r, type, lhs);
+r, type, lhs, PLUS_EXPR);
}
  virtual bool op2_range (frange &r, tree type,
			  const frange &lhs,
@@ -2067,7 +2088,7 @@ public:
return false;
  return float_binary_op_range_finish (fop_plus.fold_range (r, type, lhs,
  op2),
-r, type, lhs);
+r, type, lhs, MINUS_EXPR);
}
  virtual bool op2_range (frange &r, tree type,
			  const frange &lhs,
@@ -2077,7 +2098,7 @@ public:
  if (lhs.undefined_p ())
return false;
  

Re: [PATCH 3/3] Testcases for move sub blocks on param and ret

2022-12-05 Thread Segher Boessenkool
Hi!

Some comments on the testcases:

On Tue, Nov 29, 2022 at 09:45:07PM +0800, Jiufu Guo wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> @@ -0,0 +1,25 @@
> +/* PR target/65421 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target powerpc_elfv2 } */
> +
> +typedef struct SA
> +{
> +  double a[3];
> +  long l;
> +} A;
> +
> +/* 2 vec load, 2 vec store.  */
> +A ret_arg_pt (A *a){return *a;}
> +
> +/* 4 std */
> +A ret_arg (A a) {return a;}
> +
> +/* 4 std */
> +void st_arg (A a, A *p) {*p = a;}
> +
> +/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M|\mlvx\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M|\mstvx\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mstd\M} 8 } } */

You need at least ISA 2.06 (power7) to have {l,st}xvd2x, and elfv2 does
not guarantee that.  Have you tested on something old as well (say 970)
to see if the lvx/stvx is generated as expected?  For that you need to
have AltiVec enabled as well, so dg-require that?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
> @@ -0,0 +1,22 @@
> +/* PR target/65421 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target powerpc_elfv2 } */
> +
> +typedef struct SA
> +{
> +  double a[3];
> +} A;
> +
> +/* 3 lfd */
> +A ret_arg_pt (A *a){return *a;}
> +
> +/* blr */
> +A ret_arg (A a) {return a;}
> +
> +/* 3 stfd */
> +void st_arg (A a, A *p) {*p = a;}
> +
> +/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */

Comment that last one?  Just something as simple as "count insns" is
enough :-)
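
Concretely, something as simple as this would do (illustrative):

/* Count all insns; exactly 9 are expected.  */
/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */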


Segher


Re: [PATCH] libgcc: Fix uninitialized RA signing on AArch64 [PR107678]

2022-12-05 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> A recent change only initializes the regs.how[] during Dwarf unwinding
> which resulted in an uninitialized offset used in return address signing
> and random failures during unwinding.  The fix is to use REG_SAVED_OFFSET
> as the state where the return address signing bit is valid, and if the
> state is REG_UNSAVED, initialize it to 0.
>
> Passes bootstrap & regress, OK for commit?
>
> libgcc/
> PR target/107678
> * unwind-dw2.c (execute_cfa_program): Initialize offset of
> DWARF_REGNUM_AARCH64_RA_STATE if in REG_UNSAVED state.
> * config/aarch64/aarch64-unwind.h (aarch64_frob_update_context):
> Check state is REG_SAVED_OFFSET before using offset for RA state.
>
> ---
>
> diff --git a/libgcc/config/aarch64/aarch64-unwind.h 
> b/libgcc/config/aarch64/aarch64-unwind.h
> index 
> 26db9cbd9e5c526e0c410a4fc6be2bedb7d261cf..597133b3d708a50a366c8bfeff57475f5522b3f6
>  100644
> --- a/libgcc/config/aarch64/aarch64-unwind.h
> +++ b/libgcc/config/aarch64/aarch64-unwind.h
> @@ -71,21 +71,15 @@ aarch64_demangle_return_addr (struct _Unwind_Context 
> *context,
>  }
>  
>  /* Do AArch64 private initialization on CONTEXT based on frame info FS.  Mark
> -   CONTEXT as return address signed if bit 0 of 
> DWARF_REGNUM_AARCH64_RA_STATE is
> -   set.  */
> +   CONTEXT as having a signed return address if DWARF_REGNUM_AARCH64_RA_STATE
> +   is initialized (REG_SAVED_OFFSET state) and the offset has bit 0 set.  */
>  
>  static inline void
>  aarch64_frob_update_context (struct _Unwind_Context *context,
>_Unwind_FrameState *fs)
>  {
> -  const int reg = DWARF_REGNUM_AARCH64_RA_STATE;
> -  int ra_signed;
> -  if (fs->regs.how[reg] == REG_UNSAVED)
> -ra_signed = fs->regs.reg[reg].loc.offset & 0x1;
> -  else
> -ra_signed = _Unwind_GetGR (context, reg) & 0x1;
> -  if (ra_signed)
> -/* The flag is used for re-authenticating EH handler's address.  */
> +  if (fs->regs.how[DWARF_REGNUM_AARCH64_RA_STATE] == REG_SAVED_OFFSET
> +  && (fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset & 1) != 0)
>  context->flags |= RA_SIGNED_BIT;
>else
>  context->flags &= ~RA_SIGNED_BIT;

Hmm, but the point of the original patch was to support code generators
that emit DW_CFA_val_expression instead of DW_CFA_AARCH64_negate_ra_state.
Doesn't this patch undo that?

Also, if I understood correctly, the reason we use REG_UNSAVED is to
ensure that state from one frame isn't carried across to a parent frame,
in cases where the parent frame lacks any signing.  That is, each frame
should start out with a zero bit even if a child frame is unwound while
it has a set bit.

Thanks,
Richard

> diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
> index 
> eaceace20298b9b13344aff9d1fe9ee5f9c7bd73..87f2ae065b67982ce48f74e45523d9c754a7661c
>  100644
> --- a/libgcc/unwind-dw2.c
> +++ b/libgcc/unwind-dw2.c
> @@ -1203,11 +1203,16 @@ execute_cfa_program (const unsigned char *insn_ptr,
>  
>   case DW_CFA_GNU_window_save:
>  #if defined (__aarch64__) && !defined (__ILP32__)
> -   /* This CFA is multiplexed with Sparc.  On AArch64 it's used to toggle
> -  return address signing status.  */
> -   reg = DWARF_REGNUM_AARCH64_RA_STATE;
> -   gcc_assert (fs->regs.how[reg] == REG_UNSAVED);
> -   fs->regs.reg[reg].loc.offset ^= 1;
> +  /* This CFA is multiplexed with Sparc.  On AArch64 it's used to toggle
> + the return address signing status.  It is initialized at the first
> + use and the state is stored in bit 0 of the offset.  */
> +  reg = DWARF_REGNUM_AARCH64_RA_STATE;
> +  if (fs->regs.how[reg] == REG_UNSAVED)
> +{
> +  fs->regs.how[reg] = REG_SAVED_OFFSET;
> +  fs->regs.reg[reg].loc.offset = 0;
> +}
> +  fs->regs.reg[reg].loc.offset ^= 1;
>  #else
> /* ??? Hardcoded for SPARC register window configuration.  */
> if (__LIBGCC_DWARF_FRAME_REGISTERS__ >= 32)


Re: [PATCH 2/2] Corrected pr25521.c target matching.

2022-12-05 Thread Jeff Law via Gcc-patches




On 12/2/22 10:52, Cupertino Miranda via Gcc-patches wrote:

This commit is a follow up of bugzilla #107181.

The commit /a0aafbc/ changed the default implementation of the
SELECT_SECTION hook in order to match clang/llvm behaviour w.r.t the
placement of `const volatile' objects.

However, the following targets use target-specific selection functions
and they choke on the testcase pr25521.c:

  *rx - target sets its const variables as '.section C,"a",@progbits'.
That's presumably a constant section.  We should instead twiddle the 
test to recognize that section.
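
A sketch of such a tweak (the directive and section pattern are guesses,
to be checked against actual rx output):

/* rx places const data in section "C"; accept that too.  */
/* { dg-final { scan-assembler {\.section\s+C,"a"} { target rx-*-* } } } */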




  *powerpc - its 32bit version is eager to allocate globals in .sdata
 sections.

Normally, one would expect the variable to be allocated in .srodata;
however, in the case of powerpc-*-* or powerpc64-*-* (with -m32),
'targetm.have_srodata_section == false' and the code in
categorize_decl_for_section (varasm.cc) forces it to be allocated in .sdata.

   /* If the target uses small data sections, select it.  */
   else if (targetm.in_small_data_p (decl))
     {
       if (ret == SECCAT_BSS)
	 ret = SECCAT_SBSS;
       else if (targetm.have_srodata_section && ret == SECCAT_RODATA)
	 ret = SECCAT_SRODATA;
       else
	 ret = SECCAT_SDATA;
     }
I'd just skip the test for 32bit ppc.  There should be suitable 
effective-target tests you can use.
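
E.g. something along these lines (sketch; the exact selector may need
adjusting):

/* 32-bit powerpc places the object in .sdata, not .srodata.  */
/* { dg-skip-if "small data section" { { powerpc*-*-* } && ilp32 } } */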


jeff


Re: [aarch64] PR107920 - Fix incorrect handling of virtual operands in svld1rq_impl::fold

2022-12-05 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> Hi,
> The following test:
>
> #include "arm_sve.h"
>
> svint8_t
> test_s8(int8_t *x)
> {
>   return svld1rq_s8 (svptrue_b8 (), &x[0]);
> }
>
> ICE's with -march=armv8.2-a+sve -O1 -fno-tree-ccp -fno-tree-forwprop:
> during GIMPLE pass: fre
> pr107920.c: In function ‘test_s8’:
> pr107920.c:7:1: internal compiler error: in execute_todo, at passes.cc:2140
> 7 | }
>   | ^
> 0x7b03d0 execute_todo
> ../../gcc/gcc/passes.cc:2140
>
> because of incorrect handling of virtual operands in svld1rq_impl::fold:
>  # VUSE <.MEM>
>   _5 = MEM  [(signed char * {ref-all})x_3(D)];
>   _5 = MEM <vector(16) signed char> [(signed char * {ref-all})x_3(D)];
> 12, 13, 14, 15, ... }>;
>   # VUSE <.MEM_2(D)>
>   return _4;
>
> The attached patch tries to fix the issue by building the replacement
> statements in gimple_seq, and passing it to gsi_replace_with_seq_vops,
> which resolves the ICE, and results in:
>   <bb 2> :
>   # VUSE <.MEM_2(D)>
>   _5 = MEM <vector(16) signed char> [(signed char * {ref-all})x_3(D)];
>   _4 = VEC_PERM_EXPR <_5, _5, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
> 12, 13, 14, 15, ... }>;
>   # VUSE <.MEM_2(D)>
>   return _4;
>
> Bootstrapped+tested on aarch64-linux-gnu.
> OK to commit ?

Looks good, but we also need to deal with the -fnon-call-exceptions
point that Andrew made in the PR trail.  An easy way of testing
would be to make sure that:

#include "arm_sve.h"

svint8_t
test_s8(int8_t *x)
{
  try
{
  return svld1rq_s8 (svptrue_b8 (), &x[0]);
}
  catch (...)
{
  return svdup_s8 (1);
}
}

compiled with -fnon-call-exceptions still has a call to __cxa_begin_catch.

I don't think it's worth optimising this case.  Let's just add
!flag_non_call_exceptions to the test.

The patch is missing a changelog btw.

Thanks,
Richard

> Thanks,
> Prathamesh
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 6347407555f..f5546a65d22 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -45,6 +45,7 @@
>  #include "aarch64-sve-builtins-base.h"
>  #include "aarch64-sve-builtins-functions.h"
>  #include "ssa.h"
> +#include "gimple-fold.h"
>  
>  using namespace aarch64_sve;
>  
> @@ -1232,7 +1233,9 @@ public:
>   tree mem_ref_op = fold_build2 (MEM_REF, access_type, arg1, zero);
>   gimple *mem_ref_stmt
> = gimple_build_assign (mem_ref_lhs, mem_ref_op);
> - gsi_insert_before (f.gsi, mem_ref_stmt, GSI_SAME_STMT);
> +
> + gimple_seq stmts = NULL;
> + gimple_seq_add_stmt_without_update (&stmts, mem_ref_stmt);
>  
>   int source_nelts = TYPE_VECTOR_SUBPARTS (access_type).to_constant ();
>   vec_perm_builder sel (lhs_len, source_nelts, 1);
> @@ -1245,8 +1248,11 @@ public:
>  indices));
>   tree mask_type = build_vector_type (ssizetype, lhs_len);
>   tree mask = vec_perm_indices_to_tree (mask_type, indices);
> - return gimple_build_assign (lhs, VEC_PERM_EXPR,
> - mem_ref_lhs, mem_ref_lhs, mask);
> + gimple *g2 = gimple_build_assign (lhs, VEC_PERM_EXPR,
> +   mem_ref_lhs, mem_ref_lhs, mask);
> + gimple_seq_add_stmt_without_update (&stmts, g2);
> + gsi_replace_with_seq_vops (f.gsi, stmts);
> + return g2;
>}
>  
>  return NULL;
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index c2d9c806aee..03cdb2f9f49 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -591,7 +591,7 @@ fold_gimple_assign (gimple_stmt_iterator *si)
> If the statement has a lhs the last stmt in the sequence is expected
> to assign to that lhs.  */
>  
> -static void
> +void
>  gsi_replace_with_seq_vops (gimple_stmt_iterator *si_p, gimple_seq stmts)
>  {
>gimple *stmt = gsi_stmt (*si_p);
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 7d29ee9a9a4..87ed4e56d25 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -63,6 +63,7 @@ extern bool arith_code_with_undefined_signed_overflow 
> (tree_code);
>  extern gimple_seq rewrite_to_defined_overflow (gimple *, bool = false);
>  extern void replace_call_with_value (gimple_stmt_iterator *, tree);
>  extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, 
> tree);
> +extern void gsi_replace_with_seq_vops (gimple_stmt_iterator *, gimple_seq);
>  
>  /* gimple_build, functionally matching fold_buildN, outputs stmts
> int the provided sequence, matching and simplifying them on-the-fly.
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr107920.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr107920.c
> new file mode 100644
> index 000..11448ed5e68
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr107920.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fno-tree-ccp -fno-tree-forwprop" } */
> +
> +#include "arm_sve.h"

Re: [PATCH] tree-optimization/107833 - invariant motion of uninit uses

2022-12-05 Thread Jeff Law via Gcc-patches




On 12/2/22 07:30, Richard Biener via Gcc-patches wrote:

The following fixes a wrong-code bug caused by loop invariant motion
hoisting an expression using an uninitialized value outside of its
controlling condition causing IVOPTs to use that to rewrite a defined
value.  PR107839 is a similar case involving a bogus uninit diagnostic.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR tree-optimization/107833
PR tree-optimization/107839
* cfghooks.cc: Include tree.h.
* tree-ssa-loop-im.cc (movement_possibility): Wrap and
make stmts using any ssa_name_maybe_undef_p operand
to preserve execution.
(loop_invariant_motion_in_fun): Call mark_ssa_maybe_undefs
to init maybe-undefined status.
* tree-ssa-loop-ivopts.cc (ssa_name_maybe_undef_p,
ssa_name_set_maybe_undef, ssa_name_any_use_dominates_bb_p,
mark_ssa_maybe_undefs): Move ...
* tree-ssa.cc: ... here.
* tree-ssa.h (ssa_name_any_use_dominates_bb_p,
mark_ssa_maybe_undefs): Declare.
(ssa_name_maybe_undef_p, ssa_name_set_maybe_undef): Define.

* gcc.dg/torture/pr107833.c: New testcase.
* gcc.dg/uninit-pr107839.c: Likewise.

OK.

Jeff



Re: [PATCH 1/2] select .rodata for const volatile variables.

2022-12-05 Thread Jeff Law via Gcc-patches




On 12/2/22 10:52, Cupertino Miranda via Gcc-patches wrote:

Changed target code to select .rodata section for 'const volatile'
defined variables.
This change is in the context of the bugzilla #107181.

gcc/ChangeLog:

v850.c(v850_select_section): Changed function.
I'm not sure this is safe/correct.  ISTM that you need to look at the 
underlying TREE_TYPE to check for const-volatile rather than 
TREE_SIDE_EFFECTS.
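
Roughly this kind of check, presumably (sketch, untested):

  /* Volatility lives on the type, so a 'const volatile' VAR_DECL is
     still read-only data.  */
  bool is_const_volatile = (TREE_READONLY (exp)
			    && TYPE_VOLATILE (TREE_TYPE (exp)));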


Of secondary importance is the ChangeLog.  Just saying "Changed 
function" provides no real information.  Something like this would be 
better:


* config/v850/v850.c (v850_select_section): Put const volatile
objects into read-only sections.


Jeff





---
  gcc/config/v850/v850.cc | 1 -
  1 file changed, 1 deletion(-)

diff --git a/gcc/config/v850/v850.cc b/gcc/config/v850/v850.cc
index c7d432990ab..e66893fede4 100644
--- a/gcc/config/v850/v850.cc
+++ b/gcc/config/v850/v850.cc
@@ -2865,7 +2865,6 @@ v850_select_section (tree exp,
  {
int is_const;
if (!TREE_READONLY (exp)
- || TREE_SIDE_EFFECTS (exp)
  || !DECL_INITIAL (exp)
  || (DECL_INITIAL (exp) != error_mark_node
  && !TREE_CONSTANT (DECL_INITIAL (exp


Re: [PATCH 12/15 V3] arm: implement bti injection

2022-12-05 Thread Richard Earnshaw via Gcc-patches




On 28/10/2022 17:40, Andrea Corallo via Gcc-patches wrote:

Hi all,

please find attached the third iteration of this patch addresing review
comments.

Thanks

   Andrea



@@ -23374,12 +23374,6 @@ output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }

-static bool
-aarch_bti_enabled ()
-{
-  return false;
-}
-
 /* Generate the prologue instructions for entry into an ARM or Thumb-2
function.  */
 void
@@ -32992,6 +32986,61 @@ arm_current_function_pac_enabled_p (void)
   && !crtl->is_leaf));
 }

+/* Return TRUE if Branch Target Identification Mechanism is enabled.  */
+bool
+aarch_bti_enabled (void)
+{
+  return aarch_enable_bti == 1;
+}

See comment in earlier patch about the location of this function moving. 
 Can aarch_enable_bti take values other than 0 and 1?  If not, then 
writing aarch_enable_bti != 0 is slightly more robust, but perhaps this 
should be replaced by a macro anyway, much like a number of other 
predicates used by the backend.


+  return GET_CODE (pat) == UNSPEC_VOLATILE && XINT (pat, 1) == 
UNSPEC_BTI_NOP;


I'm not sure where this crept in, but UNSPEC and UNSPEC_VOLATILE have 
separate enums in the backend, so UNSPEC_BTI_NOP should really be 
VUNSPEC_BTI_NOP and defined in the enum "unspecv".


+aarch_pac_insn_p (rtx x)
+{
+  if (!x || !INSN_P (x))
+return false;
+
+  rtx pat = PATTERN (x);
+
+  if (GET_CODE (pat) == SET)
+{
+  rtx tmp = XEXP (pat, 1);
+  if (tmp
+ && GET_CODE (tmp) == UNSPEC
+ && (XINT (tmp, 1) == UNSPEC_PAC_NOP
+ || XINT (tmp, 1) == UNSPEC_PACBTI_NOP))
+   return true;
+}
+

This will also need updating (see review on earlier patch) because 
PACBTI needs to be unspec_volatile, while PAC doesn't.


+/* The following two functions are for code compatibility with aarch64
+   code, this even if in arm we have only one bti instruction.  */
+

I'd just write
 /* Target specific mapping for aarch_gen_bti_c and aarch_gen_bti_j. 
For Arm, both of these map to a simple BTI instruction.  */



@@ -162,6 +162,7 @@ (define_c_enum "unspec" [
   UNSPEC_PAC_NOP   ; Represents PAC signing LR
   UNSPEC_PACBTI_NOP; Represents PAC signing LR + valid landing pad
   UNSPEC_AUT_NOP   ; Represents PAC verifying LR
+  UNSPEC_BTI_NOP   ; Represent BTI
 ])

BTI is an unspec volatile, so this should be in the "vunspec" enum and 
renamed accordingly (see above).


R.


Re: [PATCH] testsuite: Fix leaks in tree-dynamic-object-size-0.c

2022-12-05 Thread Siddhesh Poyarekar

On 2022-12-05 11:38, Jeff Law wrote:



On 12/5/22 07:28, Siddhesh Poyarekar wrote:

In commit e5cfb9cac1d7aba9a8ea73bfe7922cfaff9d61f3 I introduced tests
for strdup and strndup with leaks.  Fix those leaks.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): Free RES
before returning from function.
We don't generally worry about these kinds of issues in the testsuite. 
My only worry would be compromising the test.  By adding the free calls 
the compiler might match up the allocation and release and potentially 
turn it into an alloca.  I don't think we're likely to do that in this 
case, but it's worth keeping in mind.


Ack, thanks, I'll keep that in mind.

So OK as long as you've verified the test still does what it's supposed 
to do.


I have verified that the test still works correctly and the optimizer 
hasn't done anything funny with the calls, i.e. the str*dup calls and 
free calls are as is.


Thanks,
Sid


Re: [PATCH] Silence some -Wnarrowing errors

2022-12-05 Thread Jeff Law via Gcc-patches




On 12/2/22 00:26, Eric Gallager via Gcc-patches wrote:

I tried turning -Wnarrowing back on earlier this year, but
unfortunately it didn't work due to triggering a bunch of new errors.
This patch silences at least some of them, but there will still be
more left even after applying it. (When compiling with clang,
technically the warning flag is -Wc++11-narrowing, but it's pretty
much the same thing as gcc's -Wnarrowing, albeit with fixit hints,
which I made use of to insert the casts here.)

gcc/ChangeLog:

 * ipa-modref.cc (modref_lattice::add_escape_point): Use a
static_cast to silence -Wnarrowing.
 (modref_eaf_analysis::record_escape_points): Likewise.
 (update_escape_summary_1): Likewise.
 * rtl-ssa/changes.cc (function_info::temp_access_array): Likewise.
 * rtl-ssa/member-fns.inl: Likewise.
 * tree-ssa-structalias.cc (push_fields_onto_fieldstack): Likewise.
 * tree-vect-slp.cc (vect_prologue_cost_for_slp): Likewise.
 * tree-vect-stmts.cc (vect_truncate_gather_scatter_offset): Likewise.
 (vectorizable_operation): Likewise.
It's potentially more invasive change, but I'd lean towards trying to 
fix the API to pass the right types rather than using static_cast.


jeff




Re: [PATCH V2] Use subscalar mode to move struct block for parameter

2022-12-05 Thread Jeff Law via Gcc-patches




On 11/28/22 20:53, Jiufu Guo wrote:



Right, but the number of registers is target dependent, so I don't see
how using "8" or any number of that matter is correct here.

I understand.  And even for the same struct type, how many registers are
used to pass a parameter also depends on the size of the parameter and
how many leading parameters there are.
So, as you said, "8" or any other number is not always accurate.

The enhancement in this patch just makes the "block move" more friendly
to subsequent optimizations (cse/dse/dce...) by moving through sub-modes.
So, I just selected an arbitrary number which is not too large and is
tolerable.
I also thought of querying the maximum number of registers used for
param/ret passing from the target, but that may not be very accurate
either.

Any suggestions are welcome, and thanks!
OK, so it's just a magic number and (in theory) any number should still 
generate correct code -- the number merely places limits on when we'll 
consider performing this optimization.


It may be overkill, but you might consider making it a PARAM that can be 
adjusted.
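
As a sketch, the gate could then read something like this
(param_max_struct_block_move_regs is an invented name, not an existing
param; `bytes' stands for the block size):

  /* Only use sub-mode moves when the block fits in a tunable number
     of word-sized registers.  */
  unsigned int nregs = CEIL (bytes, UNITS_PER_WORD);
  if (nregs > (unsigned int) param_max_struct_block_move_regs)
    return false;	/* Fall back to the plain block move.  */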





For this patch, only simple cases are handled, like "D.xxx = param_1",
where the source and dest of the assignment are both in memory, i.e.
the DECL_RTL (of D.xx/param_xx) is a MEM_P/BLK.
And to avoid complication, this patch only handles the case where
"(size % mode_size) == 0".

If there are any misunderstandings, please point them out, thanks.
And thanks for the comments!
How values are justified varies on the PA depending on whether the 
parameter is passed in registers or in memory.   Though thinking more 
about things, I don't think you're changing how the parameter is passed. 
 Just how it's subsequently pulled out of memory.


jeff


Re: [PATCH] [arm] xfail fp-uint64-convert-double tests

2022-12-05 Thread Jeff Law via Gcc-patches




On 12/2/22 02:34, Alexandre Oliva via Gcc-patches wrote:


The FP emulation on ARM doesn't take rounding modes into account.  The
tests require hard_float, but that only tests for calls when adding
doubles.  There are arm targets that support hardware adds, but that
emulate conversions.

Regstrapped on x86_64-linux-gnu, also tested with crosses to riscv64-elf
and arm-eabi.  Ok to install?

PS: I've pondered changing hard_float instead, but decided that could
impact other tests that rely on its current behavior.  Then I considered
adding a variant of hard_float that tested for conversions, but thought
that it was overkill.


for  gcc/testsuite/ChangeLog

* gcc.dg/torture/fp-uint64-convert-double-1.c: Expect fail on
arm-*-eabi*.
* gcc.dg/torture/fp-uint64-convert-double-2.c: Likewise.

OK
jeff


Re: [PATCH 10/15 V4] arm: Implement cortex-M return signing address codegen

2022-12-05 Thread Richard Earnshaw via Gcc-patches




On 07/11/2022 08:57, Andrea Corallo via Gcc-patches wrote:

Hi all,

please find attached the lastest version of this patch incorporating some
more improvents.  Feel free to ignore V3.

Best Regards

   Andrea



> As part of previous upstream suggestions a test for varargs has been
> added and '-mtpcs-frame' is deemed being incompatible with this return
> signing address feature being introduced.

I don't see any check for the tpcs-frame incompatibility?  What happens 
if a user does combine the options?


gcc/Changelog

2021-11-03  Andrea Corallo  

* config/arm/arm.h (arm_arch8m_main): Declare it.
* config/arm/arm.cc (arm_arch8m_main): Define it.
(arm_option_reconfigure_globals): Set arm_arch8m_main.
(arm_compute_frame_layout, arm_expand_prologue)
(thumb2_expand_return, arm_expand_epilogue)
(arm_conditional_register_usage): Update for pac codegen.
(arm_current_function_pac_enabled_p): New function.
* config/arm/arm.md (pac_ip_lr_sp, pacbti_ip_lr_sp, aut_ip_lr_sp):
Add new patterns.
* config/arm/unspecs.md (UNSPEC_PAC_IP_LR_SP)
(UNSPEC_PACBTI_IP_LR_SP, UNSPEC_AUT_IP_LR_SP): Add unspecs.

You're missing an entry for aarch_bti_enabled () - yes I realize that's 
just a placeholder at present and will be fully defined in patch 12.


+static bool
+aarch_bti_enabled ()
+{
+  return false;
+}
+

No comment on this function (and in patch 12 it moves to a different 
location).  It would be best to have it in the right place at this point 
in time.


+  clobber_ip = (IS_NESTED (func_type)
+		&& (((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
+		     || ((flag_stack_check == STATIC_BUILTIN_STACK_CHECK
+			  || flag_stack_clash_protection)
+			 && !df_regs_ever_live_p (LR_REGNUM)
+			 && arm_r3_live_at_start_p ()))
+		    || (arm_current_function_pac_enabled_p ())));

Redundant parenthesis around arm_current_function_pac_enabled_p () call.

+ gcc_assert(arm_compute_static_chain_stack_bytes() == 4
+ || arm_current_function_pac_enabled_p ());

I wonder if this assert is now really serving a useful purpose.  I'd 
consider removing it.


@@ -27309,7 +27340,7 @@ thumb2_expand_return (bool simple_return)
 to assert it for now to ensure that future code changes do not silently
 change this behavior.  */
   gcc_assert (!IS_CMSE_ENTRY (arm_current_func_type ()));
-  if (num_regs == 1)
+  if (num_regs == 1 && !arm_current_function_pac_enabled_p ())
 {
   rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
   rtx reg = gen_rtx_REG (SImode, PC_REGNUM);
@@ -27324,10 +27355,20 @@ thumb2_expand_return (bool simple_return)
 }
   else
 {
-  saved_regs_mask &= ~ (1 << LR_REGNUM);
-  saved_regs_mask |=   (1 << PC_REGNUM);
-  arm_emit_multi_reg_pop (saved_regs_mask);
-}
+ if (arm_current_function_pac_enabled_p ())
+   {
+ gcc_assert (!(saved_regs_mask & (1 << PC_REGNUM)));
+ arm_emit_multi_reg_pop (saved_regs_mask);
+ emit_insn (gen_aut_nop ());
+ emit_jump_insn (simple_return_rtx);
+   }
+ else
+   {
+ saved_regs_mask &= ~ (1 << LR_REGNUM);
+ saved_regs_mask |=   (1 << PC_REGNUM);
+ arm_emit_multi_reg_pop (saved_regs_mask);
+   }
+   }
 }
   else

The logic for these blocks would, I think, be better expressed as

   if (pac_enabled)
   ...
   else if (num_regs == 1)
 ...  // existing code
   else
 ...  // existing code

Also, I think (out of an abundance of caution) we really need a 
scheduling barrier placed before calls to gen_aut_nop() pattern is 
emitted, to ensure that the scheduler never tries to move this 
instruction away from the position we place it.  Use gen_blockage() for 
that (see TARGET_SCHED_PROLOG).  Alternatively, we could make the 
UNSPEC_PAC_NOP an unspec_volatile, which has the same effect (IIRC) 
without needing an additional insn - if you use this approach, then 
please make sure this is explained in a comment.


+(define_insn "pacbti_nop"
+  [(set (reg:SI IP_REGNUM)
+   (unspec:SI [(reg:SI SP_REGNUM) (reg:SI LR_REGNUM)]
+  UNSPEC_PACBTI_NOP))]
+  "arm_arch8m_main"
+  "pacbti\t%|ip, %|lr, %|sp"
+  [(set_attr "conds" "unconditional")])

The additional side-effect of this being a BTI landing pad means that we 
mustn't move any other instruction before it.  So I think this needs to 
be an unspec_volatile as well.


On the tests, they are OK as they stand, but we lack anything that will 
be tested when suitable hardware is unavailable (all tests are "dg-do 
run").  Can we please have some compile-only tests as well?


R.



Re: [PATCH] testsuite: Fix leaks in tree-dynamic-object-size-0.c

2022-12-05 Thread Jeff Law via Gcc-patches




On 12/5/22 07:28, Siddhesh Poyarekar wrote:

In commit e5cfb9cac1d7aba9a8ea73bfe7922cfaff9d61f3 I introduced tests
for strdup and strndup with leaks.  Fix those leaks.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): Free RES
before returning from function.
We don't generally worry about these kinds of issues in the testsuite. 
My only worry would be compromising the test.  By adding the free calls 
the compiler might match up the allocation and release and potentially 
turn it into an alloca.  I don't think we're likely to do that in this 
case, but it's worth keeping in mind.


So OK as long as you've verified the test still does what it's supposed 
to do.


jeff


[Ping x5] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-12-05 Thread Chung-Lin Tang via Gcc-patches
Ping x5

On 2022/11/22 12:24 上午, Chung-Lin Tang wrote:
> Ping x4
> 
> On 2022/11/8 12:34 AM, Chung-Lin Tang wrote:
>> Ping x3.
>>
>> On 2022/10/31 10:18 PM, Chung-Lin Tang wrote:
>>> Ping x2.
>>>
>>> On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
 Ping.

 On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
> Hi Tom,
> I had a patch submitted earlier, where I reported that the current way of 
> implementing
> barriers in libgomp on nvptx created a quite significant performance drop 
> on some SPEChpc2021
> benchmarks:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
> That previous patch wasn't accepted well (admittedly, it was kind of a 
> hack).
> So in this patch, I tried to (mostly) re-implement team-barriers for 
> NVPTX.
>
> Basically, instead of trying to have the GPU do CPU-with-OS-like things 
> that it isn't suited for,
> barriers are implemented simplistically with bar.* synchronization 
> instructions.
> Tasks are processed after threads have joined, and only if 
> team->task_count != 0
>
> (arguably, there might be a little bit of performance forfeited where 
> earlier arriving threads
> could've been used to process tasks ahead of other threads. But that 
> again falls into requiring
> implementing complex futex-wait/wake like behavior. Really, that kind of 
> tasking is not what target
> offloading is usually used for)
>
> Implementation highlight notes:
> 1. gomp_team_barrier_wake() is now an empty function (threads never 
> "wake" in the usual manner)
> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
> 3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
>
> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
> The main synchronization is done using a 'bar.red' instruction. This 
> reduces across all threads
> the condition (team->task_count != 0), to enable the task processing 
> down below if any thread
> created a task. (this bar.red usage required the need of the second 
> GCC patch in this series)
>
> This patch has been tested on x86_64/powerpc64le with nvptx offloading, 
> using libgomp, ovo, omptests,
> and sollve_vv testsuites, all without regressions. Also verified that the 
> SPEChpc 2021 521.miniswp_t
> and 534.hpgmgfv_t performance regressions that occurred in the GCC12 
> cycle has been restored to
> devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
>
> (also suggest backporting to GCC12 branch, if performance regression can 
> be considered a defect)
>
> Thanks,
> Chung-Lin
>
> libgomp/ChangeLog:
>
> 2022-09-21  Chung-Lin Tang  
>
>   * config/nvptx/bar.c (generation_to_barrier): Remove.
>   (futex_wait,futex_wake,do_spin,do_wait): Remove.
>   (GOMP_WAIT_H): Remove.
>   (#include "../linux/bar.c"): Remove.
>   (gomp_barrier_wait_end): New function.
>   (gomp_barrier_wait): Likewise.
>   (gomp_barrier_wait_last): Likewise.
>   (gomp_team_barrier_wait_end): Likewise.
>   (gomp_team_barrier_wait): Likewise.
>   (gomp_team_barrier_wait_final): Likewise.
>   (gomp_team_barrier_wait_cancel_end): Likewise.
>   (gomp_team_barrier_wait_cancel): Likewise.
>   (gomp_team_barrier_cancel): Likewise.
>   * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>   prototype, add new static inline function.
>>
> 
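
For readers skimming the thread, the core of the scheme described above is
roughly the following (bar_red_or () stands in for the PTX bar.red.or
instruction; all names are invented for illustration, not libgomp API):

  static void
  team_barrier_wait_sketch (struct team_sketch *team)
  {
    /* Arrive at the barrier and OR-reduce "does any thread see
       pending tasks?" across the whole team in one operation.  */
    int any_tasks = bar_red_or (team->task_count != 0);

    /* Tasks are processed only after all threads have joined.  */
    if (any_tasks)
      process_pending_tasks (team);
  }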



[PATCH] middle-end/40635 - SSA update losing PHI arg locations

2022-12-05 Thread Richard Biener via Gcc-patches
The following fixes an issue where SSA update loses PHI argument
locations when updating PHI nodes it didn't create as part of the
SSA update.  For the case where the reaching def is the same as
the current argument we opt to do nothing, and for the case where the
PHI argument already has a location we keep that (that's an indication
the PHI node wasn't created as part of the update SSA process).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR middle-end/40635
* tree-into-ssa.cc (rewrite_update_phi_arguments): Only
update the argument when the reaching definition is different
from the current argument.  Keep an existing argument
location.

* gcc.dg/uninit-pr40635.c: New testcase.
---
 gcc/testsuite/gcc.dg/uninit-pr40635.c | 33 +++
 gcc/tree-into-ssa.cc  | 11 +
 2 files changed, 40 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/uninit-pr40635.c

diff --git a/gcc/testsuite/gcc.dg/uninit-pr40635.c 
b/gcc/testsuite/gcc.dg/uninit-pr40635.c
new file mode 100644
index 000..fab7c3d49d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr40635.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O -Wuninitialized" } */
+
+struct hostent {
+char **h_addr_list;
+};
+struct hostent *gethostbyname(const char*);
+int socket(void);
+int close(int);
+int connect(int, const char*);
+
+int get_tcp_socket(const char *machine)
+{
+  struct hostent *hp;
+  int s42, x;
+  char **addr;
+
+  hp = gethostbyname(machine);
+  x = 0;
+  for (addr = hp->h_addr_list; *addr; addr++)
+{
+  s42 = socket();
+  if (s42 < 0)
+   return -1;
+  x = connect(s42, *addr);
+  if (x == 0)
+   break;
+  close(s42);
+}
+  if (x < 0)
+return -1;
+  return s42;  /* { dg-warning "uninitialized" } */
+}
diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index f21ed2bea3f..9a2417d2b0b 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -2110,7 +2110,6 @@ rewrite_update_phi_arguments (basic_block bb)
 symbol we may find NULL arguments.  That's why we
 take the symbol from the LHS of the PHI node.  */
  reaching_def = get_reaching_def (lhs_sym);
-
}
  else
{
@@ -2122,8 +2121,9 @@ rewrite_update_phi_arguments (basic_block bb)
reaching_def = get_reaching_def (arg);
}
 
-  /* Update the argument if there is a reaching def.  */
- if (reaching_def)
+ /* Update the argument if there is a reaching def different
+from arg.  */
+ if (reaching_def && reaching_def != arg)
{
  location_t locus;
  int arg_i = PHI_ARG_INDEX_FROM_USE (arg_p);
@@ -2133,6 +2133,10 @@ rewrite_update_phi_arguments (basic_block bb)
  /* Virtual operands do not need a location.  */
  if (virtual_operand_p (reaching_def))
locus = UNKNOWN_LOCATION;
+ /* If SSA update didn't insert this PHI the argument
+might have a location already, keep that.  */
+ else if (gimple_phi_arg_has_location (phi, arg_i))
+   locus = gimple_phi_arg_location (phi, arg_i);
  else
{
  gimple *stmt = SSA_NAME_DEF_STMT (reaching_def);
@@ -2150,7 +2154,6 @@ rewrite_update_phi_arguments (basic_block bb)
  gimple_phi_arg_set_location (phi, arg_i, locus);
}
 
-
  if (e->flags & EDGE_ABNORMAL)
SSA_NAME_OCCURS_IN_ABNORMAL_PHI (USE_FROM_PTR (arg_p)) = 1;
}
-- 
2.35.3


[PATCH] range-op-float: Improve binary reverse operations

2022-12-05 Thread Jakub Jelinek via Gcc-patches
Hi!

On Mon, Dec 05, 2022 at 02:29:36PM +0100, Aldy Hernandez wrote:
> > So like this for multiplication op1/2_range if it passes bootstrap/regtest?
> > For division I'll need to go to a drawing board...
> 
> Sure, looks good to me.

Ulrich just filed PR107972, so in the light of that PR the following patch
attempts to do that differently.

Is this also ok if it passes bootstrap/regtest?

As for testcase, I've tried both attached testcases, but unfortunately it
seems that in neither of the cases we actually figure out that res range
is finite (or for last function non-zero ordered).  So there is further
work needed on that.

2022-12-05  Jakub Jelinek  

PR tree-optimization/107972
* range-op-float.cc (frange_drop_infs): New function.
(float_binary_op_range_finish): Add OP argument.  If OP is not
RDIV_EXPR and lhs is finite, r must be finite too.
(foperator_plus::op1_range, foperator_minus::op1_range,
foperator_minus::op2_range, foperator_mult::op1_range): Pass
operation code to float_binary_op_range_finish.
(foperator_div::op1_range): Pass RDIV_EXPR to
float_binary_op_range_finish.  If lhs is finite, r must be finite
too.
(foperator_div::op2_range): Pass RDIV_EXPR to
float_binary_op_range_finish.  If lhs is not NAN nor zero, r must
be finite.

--- gcc/range-op-float.cc.jj2022-12-05 11:17:34.900573272 +0100
+++ gcc/range-op-float.cc   2022-12-05 16:13:54.414845672 +0100
@@ -330,6 +330,18 @@ frange_drop_ninf (frange &r, tree type)
   r.intersect (tmp);
 }
 
+// Crop R to [MIN, MAX] where MAX is the maximum representable number
+// for TYPE and MIN the minimum representable number for TYPE.
+
+static inline void
+frange_drop_infs (frange &r, tree type)
+{
+  REAL_VALUE_TYPE max = real_max_representable (type);
+  REAL_VALUE_TYPE min = real_min_representable (type);
+  frange tmp (type, min, max);
+  r.intersect (tmp);
+}
+
 // If zero is in R, make sure both -0.0 and +0.0 are in the range.
 
 static inline void
@@ -1883,7 +1895,7 @@ foperator_unordered_equal::op1_range (fr
 
 static bool
 float_binary_op_range_finish (bool ret, frange &r, tree type,
-			      const frange &lhs)
+			      const frange &lhs, enum tree_code op)
 {
   if (!ret)
 return false;
@@ -1904,7 +1916,16 @@ float_binary_op_range_finish (bool ret,
   // If lhs isn't NAN, then neither operand could be NAN,
   // even if the reverse operation does introduce a maybe_nan.
   if (!lhs.maybe_isnan ())
-r.clear_nan ();
+{
+  r.clear_nan ();
+  if (op != RDIV_EXPR
+	  && !real_isinf (&lhs.lower_bound ())
+	  && !real_isinf (&lhs.upper_bound ())
+   // For reverse + or - or *, if result is finite, then r must
+   // be finite too, as X + INF or X - INF or X * INF is always
+   // +-INF or NAN.  For division handle it in the caller.
+   frange_drop_infs (r, type);
+}
   // If lhs is a maybe or known NAN, the operand could be
   // NAN.
   else
@@ -2020,7 +2041,7 @@ public:
 if (!minus)
   return false;
 return float_binary_op_range_finish (minus.fold_range (r, type, lhs, op2),
-r, type, lhs);
+r, type, lhs, PLUS_EXPR);
   }
   virtual bool op2_range (frange &r, tree type,
			   const frange &lhs,
@@ -2067,7 +2088,7 @@ public:
   return false;
 return float_binary_op_range_finish (fop_plus.fold_range (r, type, lhs,
  op2),
-r, type, lhs);
+r, type, lhs, MINUS_EXPR);
   }
   virtual bool op2_range (frange &r, tree type,
			   const frange &lhs,
@@ -2077,7 +2098,7 @@ public:
 if (lhs.undefined_p ())
   return false;
 return float_binary_op_range_finish (fold_range (r, type, op1, lhs),
-r, type, lhs);
+r, type, lhs, MINUS_EXPR);
   }
 private:
   void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,
@@ -2166,7 +2187,7 @@ public:
 // or if lhs must be zero and op2 doesn't include zero, it would be
 // UNDEFINED, while rdiv.fold_range computes a zero or singleton INF
 // range.  Those are supersets of UNDEFINED, so let's keep that way.
-return float_binary_op_range_finish (ret, r, type, lhs);
+return float_binary_op_range_finish (ret, r, type, lhs, MULT_EXPR);
   }
   virtual bool op2_range (frange , tree type,
  const frange ,
@@ -2313,7 +2334,12 @@ public:
zero_to_inf_range (lb, ub, signbit_known);
r.set (type, lb, ub);
   }
-return float_binary_op_range_finish (ret, r, type, lhs);
+// If lhs must be finite (can't be +-INF nor NAN), then op1 must be
+// finite too - +-INF / anything is either +-INF or NAN (NAN if op2 is
+// +-INF or NAN).
+if 

Re: [V2][PATCH 1/1] Add a new warning option -Wstrict-flex-arrays.

2022-12-05 Thread Qing Zhao via Gcc-patches



> On Dec 5, 2022, at 10:16 AM, Richard Biener  wrote:
> 
> On Fri, 2 Dec 2022, Qing Zhao wrote:
> 
>> 
>> 
>>> On Dec 2, 2022, at 2:20 AM, Richard Biener  wrote:
>>> 
>>> On Fri, 2 Dec 2022, Richard Biener wrote:
>>> 
 On Thu, 1 Dec 2022, Siddhesh Poyarekar wrote:
 
> On 2022-12-01 11:42, Kees Cook wrote:
>> On Wed, Nov 30, 2022 at 02:25:56PM +, Qing Zhao wrote:
>>> '-Wstrict-flex-arrays'
>>> Warn about improper usage of flexible array members according to
>>> the LEVEL of the 'strict_flex_array (LEVEL)' attribute attached to
>>> the trailing array field of a structure if it's available,
>>> otherwise according to the LEVEL of the option
>>> '-fstrict-flex-arrays=LEVEL'.
>>> 
>>> This option is effective only when LEVEL is bigger than 0.
>>> Otherwise, it will be ignored with a warning.
>>> 
>>> when LEVEL=1, warnings will be issued for a trailing array
>>> reference of a structure that have 2 or more elements if the
>>> trailing array is referenced as a flexible array member.
>>> 
>>> when LEVEL=2, in addition to LEVEL=1, additional warnings will be
>>> issued for a trailing one-element array reference of a structure if
>>> the array is referenced as a flexible array member.
>>> 
>>> when LEVEL=3, in addition to LEVEL=2, additional warnings will be
>>> issued for a trailing zero-length array reference of a structure if
>>> the array is referenced as a flexible array member.
>>> 
>>> At the same time, -Warray-bounds is updated:
>> 
>> Why is there both -Wstrict-flex-arrays and -Warray-bounds? I thought
>> only the latter was going to exist?
 
 Sorry for apparently not being clear - I was requesting 
 -Wstrict-flex-arrays to be dropped and instead adjusting -Warray-bounds
 to adhere to -fstrict-flex-arrays in both =1 and =2 where then =2
 would only add the intermediate pointer results verification.
 
 I think that's reasonable if documented since the default behavior
 with -Wall will not change then unless the -fstrict-flex-arrays
 default is altered.
>>> 
>>> Btw, your patch seems to implement the above plus adds 
>>> -Wstrict-flex-arrays.  It seems it could be split into two, doing
>>> the -Warray-bounds adjustment as first and the -Wstrict-flex-arrays 
>>> addition as second.
>> 
>> Yes, implementation should be very easy to be adjusted to drop the new 
>> -Wstrict-flex-arrays option.
>> But I still feel the new -Wstrict-flex-arrays option is good to add.
> 
> Can you split the patch and re-post?  I'll quickly approve the first
> part and will think harder on the second.

Okay, I will do that.

thanks.

Qing
> 
> Thanks,
> Richard.
> 
>> Qing
>>> We do all seem to agree on the first so it's easy
>>> to go forward with that?
>>> 
>>> Thanks,
>>> Richard.
>> 
>> 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
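
For reference, a minimal sketch (not from the patch itself) of the
trailing-array forms the LEVEL descriptions above distinguish, assuming the
last member is accessed as a flexible array member (e.g. s->a[i] with i past
the declared bound):

struct s1 { int n; int a[4]; };  /* 2+ elements: diagnosed from LEVEL=1 */
struct s2 { int n; int a[1]; };  /* one element: diagnosed from LEVEL=2 */
struct s3 { int n; int a[0]; };  /* zero length: diagnosed from LEVEL=3 */
struct s4 { int n; int a[]; };   /* true C99 flexible array member */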



Re: [V2][PATCH 1/1] Add a new warning option -Wstrict-flex-arrays.

2022-12-05 Thread Richard Biener via Gcc-patches
On Fri, 2 Dec 2022, Qing Zhao wrote:

> 
> 
> > On Dec 2, 2022, at 2:20 AM, Richard Biener  wrote:
> > 
> > On Fri, 2 Dec 2022, Richard Biener wrote:
> > 
> >> On Thu, 1 Dec 2022, Siddhesh Poyarekar wrote:
> >> 
> >>> On 2022-12-01 11:42, Kees Cook wrote:
>  On Wed, Nov 30, 2022 at 02:25:56PM +, Qing Zhao wrote:
> > '-Wstrict-flex-arrays'
> >  Warn about improper usage of flexible array members according to
> >  the LEVEL of the 'strict_flex_array (LEVEL)' attribute attached to
> >  the trailing array field of a structure if it's available,
> >  otherwise according to the LEVEL of the option
> >  '-fstrict-flex-arrays=LEVEL'.
> > 
> >  This option is effective only when LEVEL is bigger than 0.
> >  Otherwise, it will be ignored with a warning.
> > 
> >  when LEVEL=1, warnings will be issued for a trailing array
> >  reference of a structure that has 2 or more elements if the
> >  trailing array is referenced as a flexible array member.
> > 
> >  when LEVEL=2, in addition to LEVEL=1, additional warnings will be
> >  issued for a trailing one-element array reference of a structure if
> >  the array is referenced as a flexible array member.
> > 
> >  when LEVEL=3, in addition to LEVEL=2, additional warnings will be
> >  issued for a trailing zero-length array reference of a structure if
> >  the array is referenced as a flexible array member.
> > 
> > At the same time, -Warray-bounds is updated:
>  
>  Why is there both -Wstrict-flex-arrays and -Warray-bounds? I thought
>  only the latter was going to exist?
> >> 
> >> Sorry for apparently not being clear - I was requesting 
> >> -Wstrict-flex-arrays to be dropped and instead adjusting -Warray-bounds
> >> to adhere to -fstrict-flex-arrays in both =1 and =2 where then =2
> >> would only add the intermediate pointer results verification.
> >> 
> >> I think that's reasonable if documented since the default behavior
> >> with -Wall will not change then unless the -fstrict-flex-arrays
> >> default is altered.
> > 
> > Btw, your patch seems to implement the above plus adds 
> > -Wstrict-flex-arrays.  It seems it could be split into two, doing
> > the -Warray-bounds adjustment as first and the -Wstrict-flex-arrays 
> > addition as second.
> 
> Yes, implementation should be very easy to be adjusted to drop the new 
> -Wstrict-flex-arrays option.
> But I still feel the new -Wstrict-flex-arrays option is good to add.

Can you split the patch and re-post?  I'll quickly approve the first
part and will think harder on the second.

Thanks,
Richard.

> Qing
> >  We do all seem to agree on the first so it's easy
> > to go forward with that?
> > 
> > Thanks,
> > Richard.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] testsuite, X86, Darwin: Fix bf16 ABI tests for Mach-O/macOS ABI.

2022-12-05 Thread Iain Sandoe
Hi Uros,

> On 5 Dec 2022, at 10:37, Uros Bizjak via Gcc-patches 
>  wrote:
> 
> On Sun, Dec 4, 2022 at 9:30 PM Iain Sandoe  wrote:
>> 

>> gcc/testsuite/ChangeLog:
>> 
>>* gcc.target/x86_64/abi/bf16/args.h: Make xmm_regs, x87_regs extern.
>>* gcc.target/x86_64/abi/bf16/m256bf16/args.h: Likewise.
>>* gcc.target/x86_64/abi/bf16/m512bf16/args.h: Likewise.
>>* gcc.target/x86_64/abi/bf16/asm-support.S: Add Mach-O variant.
>>* gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Likewise.
>>* gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Likewise.
> 
> Please note that in other directories asm-support-darwin.s is
> introduced and included via .exp file. Is there a reason a different
> approach is introduced here?

Since it seems that testcases get added and amended without considering any
sub-target apart from x86_64-linux-gnu (even by very experienced contributors),
I was hoping that the Darwin section might prompt folks to remember that there
are several other sub-targets.

However, the main thing is to fix the tests ... so here’s a version using
separate files.

OK?
thanks,
Iain



0001-testsuite-X86-Darwin-Fix-bf16-ABI-tests-for-Mach-O-M.patch
Description: Binary data


Re: [PATCH] tree, c++: optimize walk_tree_1 and cp_walk_subtrees

2022-12-05 Thread Patrick Palka via Gcc-patches
On Mon, 5 Dec 2022, Prathamesh Kulkarni wrote:

> On Mon, 5 Dec 2022 at 09:51, Patrick Palka via Gcc-patches
>  wrote:
> >
> > These functions currently repeatedly dereference tp during the subtree
> > walk, dereferences which the compiler can't CSE because it can't
> > guarantee that the subtree walking doesn't modify *tp.
> >
> > But we already implicitly require that TREE_CODE (*tp) remains the same
> > throughout the subtree walks, so it doesn't seem to be a huge leap to
> > strengthen that to requiring *tp remains the same.
> Hi Patrick,
> Just wondering: in that case, would marking tp as const_tree * instead of
> tree * perhaps help the compiler CSE some of the dereferences of *tp?

IIUC I don't think using const_tree will help optimization-wise since the
main issue is CSEing across calls to the callback fn, and the callback
can still potentially modify *tp through some other, non-const pointer
to it since it's global memory.
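
A minimal standalone sketch (plain C rather than GCC internals) of that
aliasing problem:

int *global_alias;
extern void callback (void);  /* may do: *global_alias = 42;  */

int
walk (const int *p)
{
  int a = *p;   /* first load */
  callback ();  /* may clobber *p through global_alias */
  int b = *p;   /* must be reloaded: const only promises that walk()
                   doesn't write through P, not that the pointee is
                   immutable */
  return a + b;
}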

> 
> Thanks,
> Prathamesh
> >
> > So this patch manually CSEs the dereferences of *tp.  This means that
> > the callback function can no longer replace *tp with another tree (of
> > the same TREE_CODE) when walking one of its subtrees, but that doesn't
> > sound like a useful feature anyway.  This speeds up the C++ frontend by
> > about ~1.5% according to my experiments.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk stage3 or perhaps for stage1?
> >
> > gcc/cp/ChangeLog:
> >
> > * tree.cc (cp_walk_subtrees): Avoid repeatedly dereferencing tp.
> >
> > gcc/ChangeLog:
> >
> > * tree.cc (walk_tree_1): Avoid repeatedly dereferencing tp
> > and type_p.
> > ---
> >  gcc/cp/tree.cc | 113 +
> >  gcc/tree.cc| 103 ++--
> >  2 files changed, 108 insertions(+), 108 deletions(-)
> >
> > diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> > index 4066b014f6e..ceb0c75f559 100644
> > --- a/gcc/cp/tree.cc
> > +++ b/gcc/cp/tree.cc
> > @@ -5441,7 +5441,8 @@ tree
> >  cp_walk_subtrees (tree *tp, int *walk_subtrees_p, walk_tree_fn func,
> >   void *data, hash_set *pset)
> >  {
> > -  enum tree_code code = TREE_CODE (*tp);
> > +  tree t = *tp;
> > +  enum tree_code code = TREE_CODE (t);
> >tree result;
> >
> >  #define WALK_SUBTREE(NODE) \
> > @@ -5452,7 +5453,7 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
> > walk_tree_fn func,
> >  }  \
> >while (0)
> >
> > -  if (TYPE_P (*tp))
> > +  if (TYPE_P (t))
> >  {
> >/* If *WALK_SUBTREES_P is 1, we're interested in the syntactic form 
> > of
> >  the argument, so don't look through typedefs, but do walk into
> > @@ -5464,15 +5465,15 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
> > walk_tree_fn func,
> >
> >  See find_abi_tags_r for an example of setting *WALK_SUBTREES_P to 2
> >  when that's the behavior the walk_tree_fn wants.  */
> > -  if (*walk_subtrees_p == 1 && typedef_variant_p (*tp))
> > +  if (*walk_subtrees_p == 1 && typedef_variant_p (t))
> > {
> > - if (tree ti = TYPE_ALIAS_TEMPLATE_INFO (*tp))
> > + if (tree ti = TYPE_ALIAS_TEMPLATE_INFO (t))
> > WALK_SUBTREE (TI_ARGS (ti));
> >   *walk_subtrees_p = 0;
> >   return NULL_TREE;
> > }
> >
> > -  if (tree ti = TYPE_TEMPLATE_INFO (*tp))
> > +  if (tree ti = TYPE_TEMPLATE_INFO (t))
> > WALK_SUBTREE (TI_ARGS (ti));
> >  }
> >
> > @@ -5482,8 +5483,8 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
> > walk_tree_fn func,
> >switch (code)
> >  {
> >  case TEMPLATE_TYPE_PARM:
> > -  if (template_placeholder_p (*tp))
> > -   WALK_SUBTREE (CLASS_PLACEHOLDER_TEMPLATE (*tp));
> > +  if (template_placeholder_p (t))
> > +   WALK_SUBTREE (CLASS_PLACEHOLDER_TEMPLATE (t));
> >/* Fall through.  */
> >  case DEFERRED_PARSE:
> >  case TEMPLATE_TEMPLATE_PARM:
> > @@ -5497,63 +5498,63 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
> > walk_tree_fn func,
> >break;
> >
> >  case TYPENAME_TYPE:
> > -  WALK_SUBTREE (TYPE_CONTEXT (*tp));
> > -  WALK_SUBTREE (TYPENAME_TYPE_FULLNAME (*tp));
> > +  WALK_SUBTREE (TYPE_CONTEXT (t));
> > +  WALK_SUBTREE (TYPENAME_TYPE_FULLNAME (t));
> >*walk_subtrees_p = 0;
> >break;
> >
> >  case BASELINK:
> > -  if (BASELINK_QUALIFIED_P (*tp))
> > -   WALK_SUBTREE (BINFO_TYPE (BASELINK_ACCESS_BINFO (*tp)));
> > -  WALK_SUBTREE (BASELINK_FUNCTIONS (*tp));
> > +  if (BASELINK_QUALIFIED_P (t))
> > +   WALK_SUBTREE (BINFO_TYPE (BASELINK_ACCESS_BINFO (t)));
> > +  WALK_SUBTREE (BASELINK_FUNCTIONS (t));
> >*walk_subtrees_p = 0;
> >break;
> >
> >  case PTRMEM_CST:
> > -  WALK_SUBTREE (TREE_TYPE (*tp));
> > +  WALK_SUBTREE (TREE_TYPE 

[PATCH] testsuite: Fix leaks in builtin-dynamic-object-size-0.c

2022-12-05 Thread Siddhesh Poyarekar
In commit e5cfb9cac1d7aba9a8ea73bfe7922cfaff9d61f3 I introduced tests
for strdup and strndup that leak memory.  Fix those leaks.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): Free RES
before returning from function.
---
 .../gcc.dg/builtin-dynamic-object-size-0.c| 20 +++
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 4f1606a486b..f9047a037d9 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -486,7 +486,10 @@ __attribute__ ((noinline))
 test_strdup (const char *in)
 {
   char *res = __builtin_strdup (in);
-  return __builtin_dynamic_object_size (res, 0);
+  size_t sz = __builtin_dynamic_object_size (res, 0);
+
+  __builtin_free (res);
+  return sz;
 }
 
 size_t
@@ -494,7 +497,10 @@ __attribute__ ((noinline))
 test_strndup (const char *in, size_t bound)
 {
   char *res = __builtin_strndup (in, bound);
-  return __builtin_dynamic_object_size (res, 0);
+  size_t sz = __builtin_dynamic_object_size (res, 0);
+
+  __builtin_free (res);
+  return sz;
 }
 
 size_t
@@ -502,7 +508,10 @@ __attribute__ ((noinline))
 test_strdup_min (const char *in)
 {
   char *res = __builtin_strdup (in);
-  return __builtin_dynamic_object_size (res, 2);
+  size_t sz = __builtin_dynamic_object_size (res, 2);
+
+  __builtin_free (res);
+  return sz;
 }
 
 size_t
@@ -510,7 +519,10 @@ __attribute__ ((noinline))
 test_strndup_min (const char *in, size_t bound)
 {
   char *res = __builtin_strndup (in, bound);
-  return __builtin_dynamic_object_size (res, 2);
+  size_t sz = __builtin_dynamic_object_size (res, 2);
+
+  __builtin_free (res);
+  return sz;
 }
 
 /* Other tests.  */
-- 
2.38.1



RE: [PING][PATCH 0/15] arm: Enables return address verification and branch target identification on Cortex-M

2022-12-05 Thread Kyrylo Tkachov via Gcc-patches
Hi Andrea,

> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces+kyrylo.tkachov=arm.com@gcc.gnu.org>
> On Behalf Of Andrea Corallo via Gcc-patches
> Sent: Monday, December 5, 2022 2:11 PM
> To: Andrea Corallo via Gcc-patches 
> Cc: Richard Earnshaw ; nd 
> Subject: Re: [PING][PATCH 0/15] arm: Enables return address verification and
> branch target identification on Cortex-M
> 
> Andrea Corallo via Gcc-patches  writes:
> 
> > Hi all,
> >
> > ping^2 for patches 9/15 7/15 11/15 12/15 and 10/15 V2 of this series.
> >
> >   Andrea
> 
> Hello all,
> 
> PING^3 for:
> [PATCH 6/12 V2] arm: Add pointer authentication for stack-unwinding
> runtime
> [PATCH 9/15] arm: Set again stack pointer as CFA reg when popping if
> necessary
> [PATCH 10/15 V4] arm: Implement cortex-M return signing address codegen
> [PATCH 12/15 V3] arm: implement bti injection
> 
> which I believe are still pending for review.

Thanks for pinging these; it helps keep track of the patches needing attention.

> 
> Other option would be to declare the arm backend as unmaintained.

Apologies for the delays in this patch series. For patches such as PACBTI in 
particular it is important to be extra thorough in the review,
as this functionality has important security implications and as it will be 
deployed in deeply embedded M-profile systems
it would be hard to address bugs and vulnerabilities effectively after the 
fact. See, for example the many PAC and BTI bug reports we've had over the 
years for AArch64.
There are many things that can go wrong 

So it's important to get this right the first time.
Thank you for your patience.
Kyrill

> 
> Thanks
> 
>   Andrea


Re: [PING][PATCH 0/15] arm: Enables return address verification and branch target identification on Cortex-M

2022-12-05 Thread Andrea Corallo via Gcc-patches
Andrea Corallo via Gcc-patches  writes:

> Hi all,
>
> ping^2 for patches 9/15 7/15 11/15 12/15 and 10/15 V2 of this series.
>
>   Andrea

Hello all,

PING^3 for:
[PATCH 6/12 V2] arm: Add pointer authentication for stack-unwinding runtime
[PATCH 9/15] arm: Set again stack pointer as CFA reg when popping if necessary
[PATCH 10/15 V4] arm: Implement cortex-M return signing address codegen
[PATCH 12/15 V3] arm: implement bti injection

which I believe are still pending for review.

Other option would be to declare the arm backend as unmaintained.

Thanks

  Andrea


Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-12-05 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi,
>
> I hadn't received any reply so I had implemented various ways to do this 
> (about 8 of them in fact).
>
> The conclusion is that no, we cannot emit one big RTL for the final 
> instruction immediately.
> The reason that all comparisons in the AArch64 backend expand to a separate
> CC compare and a separate test of the operands is ifcvt.
>
> The separate CC compare is needed so ifcvt can produce csel, cset etc. from
> the compares.  Unlike, say, combine, ifcvt cannot do recog on a parallel
> with a clobber.  If we emit the instruction directly, ifcvt will not be able
> to, say, make a csel, because we have no patterns which handle zero_extract
> and compare.  (Unlike combine, ifcvt cannot transform the extract into an
> AND.)
>
> While you could provide various patterns for this (and I did try) you end up 
> with broken patterns
> because you can't add the clobber to the CC register.  If you do, ifcvt recog 
> fails.
>
> i.e.
>
> int
> f1 (int x)
> {
>   if (x & 1)
> return 1;
>   return x;
> }
>
> We lose csel here.
>
> Secondly, the reason the compare with an explicit CC mode is needed is so
> that ifcvt can transform the operation into a version that doesn't require
> the flags to be set.  But it only does so if it knows the explicit usage of
> the CC reg.
>
> For instance 
>
> int
> foo (int a, int b)
> {
>   return ((a & (1 << 25)) ? 5 : 4);
> }
>
> Doesn't require a comparison, the optimal form is:
>
> foo(int, int):
> ubfxx0, x0, 25, 1
> add w0, w0, 4
> ret
>
> and no compare is actually needed.  If you represent the instruction using an 
> ANDS instead of a zero_extract
> then you get close, but you end up with an ands followed by an add, which is 
> a slower operation.
>
> These two reasons are the main reasons why all comparisons in AArch64 expand
> the way they do, so tbranch shouldn't do anything differently here.

Thanks for the (useful) investigation.  Makes sense.

> Additionally the reason for the optab was to pass range information
> to the backend during expansion.

Yeah.  But I think the fundamental reason that AArch64 defines the
optab is still that it has native support for the associated operation
(which is a good thing, an honest reason).  The fact that we split it
apart for if-conversion---in a different form from normal comparisons---
is an implementation detail.  So it still seems like a proper optab,
rather than a crutch to convey tree info.

> In this version, however, I have represented the expansion using an ANDS
> instead.  This avoids the regression at -O0 that the previous version had.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Note that this patch relies on 
> https://patchwork.sourceware.org/project/gcc/patch/y1+4qitmrqhbd...@arm.com/ 
> which has yet to be reviewed but which cleans up extensions so they can be 
> used like this.
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (*tb1): Rename to...
>   (*tb1): ... this.
>   (tbranch_4): New.
>   (zero_extend2,
>   zero_extend2,
>   zero_extend2): Make dynamic calls with @.
>   * config/aarch64/iterators.md(ZEROM, zerom): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/tbz_1.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 4c181a96e555c2a58c59fc991000b2a2fa9bd244..7ee1d01e050004e42cd2d0049f0200da71d918bb
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -946,12 +946,33 @@ (define_insn "*cb1"
> (const_int 1)))]
>  )
>  
> -(define_insn "*tb1"
> +(define_expand "tbranch_4"
>[(set (pc) (if_then_else
> -   (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
> - (const_int 1)
> - (match_operand 1
> -   "aarch64_simd_shift_imm_" "n"))
> +  (EQL (match_operand:ALLI 0 "register_operand")
> +   (match_operand 1 "aarch64_simd_shift_imm_"))
> +  (label_ref (match_operand 2 ""))
> +  (pc)))]
> +  ""
> +{
> +  rtx bitvalue = gen_reg_rtx (mode);
> +  rtx reg = gen_reg_rtx (mode);
> +  if (mode == mode)
> +reg = operands[0];
> +  else
> +emit_insn (gen_zero_extend2 (mode, mode, reg, operands[0]));

I think the last five lines should just be:

  rtx reg = gen_lowpart (mode, operands[0]);

using paradoxical subregs for the QI and HI cases.  Using subregs should
generate better code, since the temporary runs the risk of having the
same value live in two different pseudos at the same time (one pseudo
with the original mode, one pseudo with the extended mode).

OK with that change and without the changes to the zero_extend pattern names.

Thanks,
Richard

> +  rtx val = GEN_INT (1UL << UINTVAL (operands[1]));

[PATCH] tree-optimization/106868 - bogus -Wdangling-pointer diagnostic

2022-12-05 Thread Richard Biener via Gcc-patches
The testcase shows we mishandle the case where there's a pass-through
of a pointer through a function like memcpy.  The following adjusts
handling of this copy case to require a taken address and adjusts
the PHI case similarly.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
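
A rough sketch of the pass-through pattern (an illustrative example; the
committed testcase may differ):

#include <string.h>

int *
f (int *p, const int *q)
{
  /* memcpy returns its first argument, so the result is just P passed
     through; uses of it must not be attributed to the copied-from
     object and diagnosed as dangling.  */
  return memcpy (p, q, sizeof (int));
}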

PR tree-optimization/106868
* gimple-ssa-warn-access.cc (pass_waccess::gimple_call_return_arg_ref):
Inline into single user ...
(pass_waccess::check_dangling_uses): ... here and adjust the
call and the PHI case to require that ref.aref is the address
of the decl.

* gcc.dg/Wdangling-pointer-pr106868.c: New testcase.
---
 gcc/gimple-ssa-warn-access.cc | 52 ++-
 .../gcc.dg/Wdangling-pointer-pr106868.c   | 14 +
 2 files changed, 30 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wdangling-pointer-pr106868.c

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 59a70530600..854e47cf389 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -2127,7 +2127,6 @@ private:
 
   /* Return the argument that a call returns.  */
   tree gimple_call_return_arg (gcall *);
-  tree gimple_call_return_arg_ref (gcall *);
 
   /* Check a call for uses of a dangling pointer arguments.  */
   void check_call_dangling (gcall *);
@@ -4460,24 +4459,6 @@ pass_waccess::gimple_call_return_arg (gcall *call)
   return gimple_call_arg (call, argno);
 }
 
-/* Return the decl referenced by the argument that the call STMT to
-   a built-in function returns (including with an offset) or null if
-   it doesn't.  */
-
-tree
-pass_waccess::gimple_call_return_arg_ref (gcall *call)
-{
-  if (tree arg = gimple_call_return_arg (call))
-{
-  access_ref aref;
-  if (m_ptr_qry.get_ref (arg, call, , 0)
- && DECL_P (aref.ref))
-   return aref.ref;
-}
-
-  return NULL_TREE;
-}
-
 /* Check for and diagnose all uses of the dangling pointer VAR to the auto
object DECL whose lifetime has ended.  OBJREF is true when VAR denotes
an access to a DECL that may have been clobbered.  */
@@ -4646,11 +4627,10 @@ pass_waccess::check_dangling_uses ()
   unsigned i;
   FOR_EACH_SSA_NAME (i, var, m_func)
 {
-  /* For each SSA_NAME pointer VAR find the DECL it points to.
-If the DECL is a clobbered local variable, check to see
+  /* For each SSA_NAME pointer VAR find the object it points to.
+If the object is a clobbered local variable, check to see
 if any of VAR's uses (or those of other pointers derived
 from VAR) happens after the clobber.  If so, warn.  */
-  tree decl = NULL_TREE;
 
   gimple *def_stmt = SSA_NAME_DEF_STMT (var);
   if (is_gimple_assign (def_stmt))
@@ -4660,23 +4640,30 @@ pass_waccess::check_dangling_uses ()
{
  if (!POINTER_TYPE_P (TREE_TYPE (var)))
continue;
- decl = TREE_OPERAND (rhs, 0);
+ check_dangling_uses (var, TREE_OPERAND (rhs, 0));
}
  else
{
  /* For other expressions, check the base DECL to see
 if it's been clobbered, most likely as a result of
 inlining a reference to it.  */
- decl = get_base_address (rhs);
+ tree decl = get_base_address (rhs);
  if (DECL_P (decl))
check_dangling_uses (var, decl, false, true);
- continue;
}
}
   else if (POINTER_TYPE_P (TREE_TYPE (var)))
{
  if (gcall *call = dyn_cast(def_stmt))
-   decl = gimple_call_return_arg_ref (call);
+   {
+ if (tree arg = gimple_call_return_arg (call))
+   {
+ access_ref aref;
+ if (m_ptr_qry.get_ref (arg, call, , 0)
+ && aref.deref < 0)
+   check_dangling_uses (var, aref.ref);
+   }
+   }
  else if (gphi *phi = dyn_cast (def_stmt))
{
  unsigned nargs = gimple_phi_num_args (phi);
@@ -4684,19 +4671,12 @@ pass_waccess::check_dangling_uses ()
{
  access_ref aref;
  tree arg = gimple_phi_arg_def (phi, i);
- if (!m_ptr_qry.get_ref (arg, phi, , 0)
- || (aref.deref == 0
- && POINTER_TYPE_P (TREE_TYPE (aref.ref
-   continue;
- check_dangling_uses (var, aref.ref, true);
+ if (m_ptr_qry.get_ref (arg, phi, , 0)
+ && aref.deref < 0)
+   check_dangling_uses (var, aref.ref, true);
}
- continue;
}
- else
-   continue;
}
-
-  check_dangling_uses (var, decl);
 }
 }
 
diff --git a/gcc/testsuite/gcc.dg/Wdangling-pointer-pr106868.c 
b/gcc/testsuite/gcc.dg/Wdangling-pointer-pr106868.c
new 

Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-12-05 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 24, 2022 at 03:09:02PM +0100, Marcel Vollweiler wrote:
> gcc/ChangeLog:
> 
>   * gimplify.cc (optimize_target_teams): Set initial num_teams_upper
>   to "-2" instead of "1" for non-existing num_teams clause in order to
>   disambiguate from the case of an existing num_teams clause with value 1.
> 
> libgomp/ChangeLog:
> 
>   * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
>   allow processing of device-specific values.
>   (omp_set_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
>   (omp_set_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * icv-device.c (omp_get_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   (omp_set_teams_thread_limit): Likewise.
>   * icv.c (omp_set_teams_thread_limit): Removed.
>   (omp_get_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * libgomp.texi: Updated documentation for nvptx and gcn corresponding
>   to the limitation of the number of teams.
	* plugin/plugin-gcn.c (limit_teams): New helper function that limits
	the number of teams to twice the number of compute units.
>   (parse_target_attributes): Limit the number of teams on gcn offload
>   devices.
>   * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
>   handling.
>   (gomp_load_image_to_device): Added a size check for the ICVs struct
>   variable.
>   (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
>   copy back the ICV values from device to host.
>   (GOMP_target_ext): Update the number of teams and threads in the kernel
>   args also considering device-specific values.
>   * testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
>   of OMP_TEAMS_THREAD_LIMIT from the environment.
>   * testsuite/libgomp.c-c++-common/icv-5.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-6.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-7.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-9.c: New test.
>   * testsuite/libgomp.fortran/icv-5.f90: New test.
>   * testsuite/libgomp.fortran/icv-6.f90: New test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/target-teams-1.c: Adapt expected values for
>   num_teams from "1" to "-2" in cases without num_teams clause.
>   * g++.dg/gomp/target-teams-1.C: Likewise.
>   * gfortran.dg/gomp/defaultmap-4.f90: Likewise.
>   * gfortran.dg/gomp/defaultmap-5.f90: Likewise.
>   * gfortran.dg/gomp/defaultmap-6.f90: Likewise.

Ok, thanks.

Jakub



Re: [PATCH] gcc: Use ld -r when checking for HAVE_LD_RO_RW_SECTION_MIXING

2022-12-05 Thread Joakim Nohlgård
On Mon, Nov 28, 2022 at 5:53 PM Jeff Law  wrote:
>
>
> On 11/28/22 06:59, Joakim Nohlgård wrote:
> > The check for HAVE_LD_RO_RW_SECTION_MIXING fails on targets where ld
> > does not support shared objects, even though the answer to the test
> > should be 'read-write'. One such target is riscv64-unknown-elf. Failing
> > this test results in a libgcc crtbegin.o which has a writable .eh_frame
> > section leading to the default linker scripts placing the .eh_frame
> > section in a writable memory segment, or a linker warning about writable
> > sections in a read-only segment when using ld scripts that place
> > .eh_frame unconditionally in ROM.
> >
> > gcc/ChangeLog:
> >
> >   * configure: Regenerate.
> >   * configure.ac: Use ld -r in the check for 
> > HAVE_LD_RO_RW_SECTION_MIXING
>
> I'm not sure that simply replacing -shared with -r is the right fix
> here.  ISTM that if the -shared test fails, then we can/should try the
> -r variant.  Am I missing something here?
>
I have posted a v2 patch. The new patch tries ld -shared first and
falls back to ld -r if that fails.

I believe the original reason for using ld -shared in the first place
was that it was a convenient way to let the conftest1,2,3 code be as
simple as possible. Using ld without any flags would require a program
entry point (_start) at the very minimum. Using ld -r has the same
effect as the ld -shared link in this case, where we just want to
merge sections with the same name from different input files and check
what section flags were propagated to the output file.
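
As a rough C-level illustration of what the three inputs express (the real
conftests are tiny assembly files; this attribute-based version is my own
sketch):

/* conftest1.c: read-only data in section "myfoosect".  */
const char a __attribute__ ((section ("myfoosect"))) = 1;

/* conftest2.c: writable data in the same-named section.  */
char b __attribute__ ((section ("myfoosect"))) = 2;

/* conftest3.c: more read-only data.  */
const char c __attribute__ ((section ("myfoosect"))) = 3;

Linking the three objects together (with ld -shared or ld -r) and checking
the merged section's flags with objdump -h then tells configure whether the
mixed contents end up read-only or read-write.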


[PATCH v2] gcc: Use ld -r when checking for HAVE_LD_RO_RW_SECTION_MIXING

2022-12-05 Thread Joakim Nohlgård
Fall back to ld -r if ld -shared fails during configure. The check for
HAVE_LD_RO_RW_SECTION_MIXING can fail on targets where ld does not
support shared objects, even though the answer to the test should be
'read-write'. One such target is riscv64-unknown-elf. Failing this test
results in a libgcc crtbegin.o which has a writable .eh_frame section
leading to the default linker scripts placing the .eh_frame section in a
writable memory segment, or a linker warning when using ld scripts that
place .eh_frame unconditionally in ROM.

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Use ld -r in the check for HAVE_LD_RO_RW_SECTION_MIXING

Signed-off-by: Joakim Nohlgård 
---
 gcc/configure.ac | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 7c55bff6cb0..500829ceb1d 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3152,16 +3152,19 @@ elif test x$gcc_cv_as != x -a x$gcc_cv_ld != x -a 
x$gcc_cv_objdump != x ; then
   echo '.byte 0' >> conftest3.s
   if $gcc_cv_as -o conftest1.o conftest1.s > /dev/null 2>&1 \
  && $gcc_cv_as -o conftest2.o conftest2.s > /dev/null 2>&1 \
- && $gcc_cv_as -o conftest3.o conftest3.s > /dev/null 2>&1 \
- && $gcc_cv_ld -shared -o conftest1.so conftest1.o \
-   conftest2.o conftest3.o > /dev/null 2>&1; then
-gcc_cv_ld_ro_rw_mix=`$gcc_cv_objdump -h conftest1.so \
-| sed -e '/myfoosect/!d' -e N`
-if echo "$gcc_cv_ld_ro_rw_mix" | grep CONTENTS > /dev/null; then
-  if echo "$gcc_cv_ld_ro_rw_mix" | grep READONLY > /dev/null; then
-   gcc_cv_ld_ro_rw_mix=read-only
-  else
-   gcc_cv_ld_ro_rw_mix=read-write
+ && $gcc_cv_as -o conftest3.o conftest3.s > /dev/null 2>&1; then
+if $gcc_cv_ld -shared -o conftest1.so conftest1.o \
+   conftest2.o conftest3.o > /dev/null 2>&1 \
+   || $gcc_cv_ld -r -o conftest1.so conftest1.o \
+ conftest2.o conftest3.o > /dev/null 2>&1; then
+  gcc_cv_ld_ro_rw_mix=`$gcc_cv_objdump -h conftest1.so \
+  | sed -e '/myfoosect/!d' -e N`
+  if echo "$gcc_cv_ld_ro_rw_mix" | grep CONTENTS > /dev/null; then
+   if echo "$gcc_cv_ld_ro_rw_mix" | grep READONLY > /dev/null; then
+ gcc_cv_ld_ro_rw_mix=read-only
+   else
+ gcc_cv_ld_ro_rw_mix=read-write
+   fi
   fi
 fi
   fi
-- 
2.38.1



Re: [PATCH][AArch64] Cleanup move immediate code

2022-12-05 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Hi Richard,
>
>> -  scalar_int_mode imode = (mode == HFmode
>> -    ? SImode
>> -    : int_mode_for_mode (mode).require ());
>> +  machine_mode imode = (mode == DFmode) ? DImode : SImode;
>
>> It looks like this might mishandle DDmode, if not now then in the future.
>> Seems safer to use known_eq (GET_MODE_SIZE (mode), 8)
>
> I've changed that, but it does not matter for the narrow modes as the result
> will be identical - only DDmode might get costed incorrectly.
>
>> Sorry for not noticing last time, but: rather than have
>> aarch64_zeroextended_move_imm (which is quite a complicated test),
>> could we just add an extra (default off) parameter to aarch64_move_imm
>> that suppresses the (val >> 32) == 0 test?
>
> That makes things more complicated again - ultimately I'd like to get rid of 
> the
> mode parameter since most callers use a fixed mode, and ones that don't are
> now creating and passing fake modes...

I guess we'll have to agree to disagree on that one.

> I've changed it like aarch64_move_imm and call aarch64_is_movz twice to
> check it is not a 64-bit MOVZ/MOVN.

Unlike with my suggestion, the result of the function is only meaningful
if the caller has checked for a valid move immediate first.  That is,
aarch64_zeroextended_move_imm takes for granted that the immediate is
valid for MOV Xn or MOV Wn, and the function returns false if the
immediate is valid for MOV Xn.  It seems like an odd way round to me,
since it inherently means checking the same thing twice.

Maybe a compromise would be to reverse the sense of the return value
(return true for MOV Xns rather than false) and call the function
something like aarch64_mov_xn_imm.  All callers could cope with that,
and:

(define_constraint "N"
  "A constant that can be used with a 64-bit MOV immediate operation."
  (and (match_code "const_int")
   (match_test "aarch64_move_imm (ival, DImode)")
   (match_test "!aarch64_zeroextended_move_imm (ival)")))

would become simply:

(define_constraint "N"
  "A constant that can be used with a 64-bit MOV immediate operation."
  (and (match_code "const_int")
   (match_test "aarch64_mov_xn_imm (ival)")))

I'm still not too happy with the duplication, since the structure of
aarch64_mov_xn_imm is obviously aarch64_move_imm with some bits taken
out.  If (somehow) one function learns a new trick, that trick would
need to be duplicated in the other function.  But like I say, I think
we'll have to agree to disagree on that.

So the patch is OK with the aarch64_mov_xn_imm change suggested above,
or let me know if you disagree.

Thanks,
Richard

>
> Cheers,
> Wilco
>
> v3: Use aarch64_is_movz, use known_eq
>
> Simplify, refactor and improve various move immediate functions.
> Allow 32-bit MOVI/N as a valid 64-bit immediate which removes special
> cases in aarch64_internal_mov_immediate.  Add new constraint so the movdi
> pattern only needs a single alternative for move immediate.
>
> Passes bootstrap and regress, OK for commit?
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_bitmask_imm): Use unsigned type.
> (aarch64_zeroextended_move_imm): New function.
> (aarch64_move_imm): Refactor, assert mode is SImode or DImode.
> (aarch64_internal_mov_immediate): Assert mode is SImode or DImode.
> Simplify special cases.
> (aarch64_uimm12_shift): Simplify code.
> (aarch64_clamp_to_uimm12_shift): Likewise.
> (aarch64_movw_imm): Rename to aarch64_is_movz.
> (aarch64_float_const_rtx_p): Pass either SImode or DImode to
> aarch64_internal_mov_immediate.
> (aarch64_rtx_costs): Likewise.
> * config/aarch64/aarch64.md (movdi_aarch64): Merge 'N' and 'M'
> constraints into single 'O'.
> (mov_aarch64): Likewise.
> * config/aarch64/aarch64-protos.h (aarch64_move_imm): Use unsigned.
> (aarch64_bitmask_imm): Likewise.
> (aarch64_uimm12_shift): Likewise.
> (aarch64_zeroextended_move_imm): New prototype.
> * config/aarch64/constraints.md: Add 'O' for 32/64-bit immediates,
> limit 'N' to 64-bit only moves.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 4be93c93c26e091f878bc8e4cf06e90888405fb2..8bce6ec7599edcc2e6a1d8006450f35c0ce7f61f
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -756,7 +756,7 @@ void aarch64_post_cfi_startproc (void);
>  poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
>  bool aarch64_address_valid_for_prefetch_p (rtx, bool);
> -bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
> +bool aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm2 

Re: [PATCH] 0/19 modula-2 front end patches overview

2022-12-05 Thread Gaius Mulley via Gcc-patches
Richard Biener  writes:

>>
>> I think it's important to get this (and the rust frontend) into the tree 
>> before
>> Christmas holidays so it gets exposed to the more weird treatment of some
>> of our users (build wise).  This way we can develop either a negative or
>> positive list of host/targets where to disable the new frontends.
>
> So let's go ahead with the Modula-2 merge.  Gaius, can you post a final
> series of the patches and perform the merging please?
>
> Thus - OK to merge to trunk!
>
> Thanks,
> Richard.

Hi Richard,

certainly will do - and thanks!

regards,
Gaius


Re: [PATCH] range-op-float: Improve multiplication reverse operation

2022-12-05 Thread Aldy Hernandez via Gcc-patches




On 12/5/22 12:59, Jakub Jelinek wrote:

On Mon, Dec 05, 2022 at 10:54:41AM +0100, Aldy Hernandez wrote:

Do you mind if I try that incrementally and only if it doesn't make the
code too large/too unreadable?


Sure.  And don't feel obligated to implement it either.  range-ops is a
never ending pit of possible optimizations.  We found that out quite early
in the design.

If you don't get to it, could you at least add a comment?


So like this for multiplication op1/2_range if it passes bootstrap/regtest?
For division I'll need to go back to the drawing board...


Sure, looks good to me.



2022-12-05  Jakub Jelinek  

* range-op-float.cc (zero_to_max_range): New function.
(foperator_mult::op1_range): If lhs can't be INF nor NAN,
op1 can't be INF.

--- gcc/range-op-float.cc.jj	2022-12-05 11:17:34.900573272 +0100
+++ gcc/range-op-float.cc   2022-12-05 12:32:30.286753929 +0100
@@ -2004,6 +2004,29 @@ zero_to_inf_range (REAL_VALUE_TYPE &lb,
  }
  }
  
+// Set [lb, ub] to [-MAX, -0], [-MAX, +MAX] or [+0, +MAX] depending on
+// signbit_known.
+static void
+zero_to_max_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, tree type,
+		   int signbit_known)


My apologies for having to pass around [lb, ub] everywhere.  At least
for rv_fold(), I'd like to rewrite it next year as:


void rv_fold (frange &r, tree type, ...);

...instead of having to pass LB, UB, and maybe_nan as references.  All 
the information in LB, UB, and maybe_nan can fit nicely in an frange. 
This is what we do in irange (wi_fold).  It also sets us up nicely for 
when we have subranges, and we can union the result.


I'll clean it up early next release.
Aldy


+{
+  if (signbit_known > 0)
+{
+  lb = dconst0;
+  ub = real_max_representable (type);
+}
+  else if (signbit_known < 0)
+{
+  lb = real_min_representable (type);
+  ub = real_value_negate (&dconst0);
+}
+  else
+{
+  lb = real_min_representable (type);
+  ub = real_max_representable (type);
+}
+}
+
  class foperator_plus : public range_operator_float
  {
using range_operator_float::op1_range;
@@ -2159,7 +2182,14 @@ public:
// and perhaps if it can be NAN or not.
REAL_VALUE_TYPE lb, ub;
int signbit_known = signbit_known_p (lhs_lb, lhs_ub, op2_lb, op2_ub);
-   zero_to_inf_range (lb, ub, signbit_known);
+   // If lhs can't be +-INF nor NAN, then op1 can't be +-INF -
+   // +-INF * anything is either +-INF or NAN (if op2 is +-0 or NAN).
+   if (!real_isinf (&lhs_lb)
+       && !real_isinf (&lhs_ub)
+   && !lhs.maybe_isnan ())
+ zero_to_max_range (lb, ub, type, signbit_known);
+   else
+ zero_to_inf_range (lb, ub, signbit_known);
r.set (type, lb, ub);
}
  // Otherwise, if op2 is a singleton INF and lhs doesn't include INF,


Jakub
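
As a concrete illustration (my numbers, not from the thread): if
lhs = op1 * op2 is known to lie in [1.0, 2.0], so it is neither +-INF nor
NAN, then op1 itself cannot be +-INF, since +-INF times anything yields
+-INF or NAN; with the sign bit known to be positive, the reverse operation
can therefore narrow op1 to [+0, +MAX] instead of [+0, +INF].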





Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-12-05 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches  writes:
> Tamar Christina via Gcc-patches  writes:
>>> > +/* Check to see if the supplied comparison in PTEST can be performed as a
>>> > +   bit-test-and-branch instead.  VAL must contain the original tree
>>> > +   expression of the non-zero operand which will be used to rewrite the
>>> > +   comparison in PTEST.
>>> > +
>>> > +   Returns TRUE if operation succeeds and returns updated PMODE and
>>> PTEST,
>>> > +   else FALSE.  */
>>> > +
>>> > +enum insn_code
>>> > +static validate_test_and_branch (tree val, rtx *ptest, machine_mode
>>> > +*pmode) {
>>> > +  if (!val || TREE_CODE (val) != SSA_NAME)
>>> > +return CODE_FOR_nothing;
>>> > +
>>> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (val));  rtx test =
>>> > + *ptest;
>>> > +
>>> > +  if (GET_CODE (test) != EQ && GET_CODE (test) != NE)
>>> > +return CODE_FOR_nothing;
>>> > +
>>> > +  /* If the target supports the testbit comparison directly, great.
>>> > + */  auto icode = direct_optab_handler (tbranch_optab, mode);  if
>>> > + (icode == CODE_FOR_nothing)
>>> > +return icode;
>>> > +
>>> > +  if (tree_zero_one_valued_p (val))
>>> > +{
>>> > +  auto pos = BYTES_BIG_ENDIAN ? GET_MODE_BITSIZE (mode) - 1 : 0;
>>> 
>>> Does this work for BYTES_BIG_ENDIAN && !WORDS_BIG_ENDIAN and mode
>>> > word_mode?
>>> 
>>
>> It does now. In this particular case all that matters is the bit ordering,
>> so I've changed it to BITS_BIG_ENDIAN.
>>
>> Also during the review of the AArch64 optab Richard Sandiford wanted me to 
>> split the
>> optabs apart into two.  The reason is that a match_operator still gets the 
>> full RTL.
>>
>> In the case of a tbranch the full RTL has an invalid comparison, so if a 
>> target doesn't implement
>> the hook correctly this would lead to incorrect code.  We've now moved the 
>> operator as part of
>> the name itself to avoid this.
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>  * dojump.cc (do_jump): Pass along value.
>>  (do_jump_by_parts_greater_rtx): Likewise.
>>  (do_jump_by_parts_zero_rtx): Likewise.
>>  (do_jump_by_parts_equality_rtx): Likewise.
>>  (do_compare_rtx_and_jump): Likewise.
>>  (do_compare_and_jump): Likewise.
>>  * dojump.h (do_compare_rtx_and_jump): New.
>>  * optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check.
>>  (validate_test_and_branch): New.
>>  (emit_cmp_and_jump_insns): Optionally take a value, and when value is
>>  supplied then check if it's suitable for tbranch.
>>  * optabs.def (tbranch_eq$a4, tbranch_ne$a4): New.
>>  * doc/md.texi (tbranch_@var{op}@var{mode}4): Document it.
>>  * optabs.h (emit_cmp_and_jump_insns):
>>  * tree.h (tree_zero_one_valued_p): New.
>
> Thanks for doing this.
>
>> --- inline copy of patch ---
>>
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 
>> d0a71ecbb806de3a6564c6ffe973fec5da5c597b..c6c4b13d756de28078a0a779876a00c614246914
>>  100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -6964,6 +6964,14 @@ case, you can and should make operand 1's predicate 
>> reject some operators
>>  in the @samp{cstore@var{mode}4} pattern, or remove the pattern altogether
>>  from the machine description.
>>  
>> +@cindex @code{tbranch_@var{op}@var{mode}4} instruction pattern
>> +@item @samp{tbranch_@var{op}@var{mode}4}
>> +Conditional branch instruction combined with a bit test-and-compare
>> +instruction. Operand 0 is a comparison operator.  Operand 1 is the
>> +operand of the comparison. Operand 2 is the bit position of Operand 1 to 
>> test.
>> +Operand 3 is the @code{code_label} to jump to. @var{op} is one of @var{eq} 
>> or
>> +@var{ne}.
>> +
>
> The documentation still describes the old interface.  Also, there are only 3
> operands now, rather than 4, so the optab name should end with 3.
>
>>  @cindex @code{cbranch@var{mode}4} instruction pattern
>>  @item @samp{cbranch@var{mode}4}
>>  Conditional branch instruction combined with a compare instruction.
>> diff --git a/gcc/dojump.h b/gcc/dojump.h
>> index 
>> e379cceb34bb1765cb575636e4c05b61501fc2cf..d1d79c490c420a805fe48d58740a79c1f25fb839
>>  100644
>> --- a/gcc/dojump.h
>> +++ b/gcc/dojump.h
>> @@ -71,6 +71,10 @@ extern void jumpifnot (tree exp, rtx_code_label *label,
>>  extern void jumpifnot_1 (enum tree_code, tree, tree, rtx_code_label *,
>>   profile_probability);
>>  
>> +extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int, tree,
>> + machine_mode, rtx, rtx_code_label *,
>> + rtx_code_label *, profile_probability);
>> +
>>  extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int,
>>   machine_mode, rtx, rtx_code_label *,
>>   rtx_code_label *, profile_probability);
>> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
>> 

Re: [PATCH v2 1/2] RISC-V: Support _Float16 type.

2022-12-05 Thread Maciej W. Rozycki
Hi Kito,

 I came across this issue while inspecting code and I have been wondering
what the reason was to downgrade the current FMV.X.W and FMV.W.X instructions
to their older FMV.X.S and FMV.S.X variants here:

On Wed, 10 Aug 2022, Kito Cheng wrote:

> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5a0adffb5ce..47e6110767c 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -2308,10 +2310,19 @@ riscv_output_move (rtx dest, rtx src)
>if (dest_code == REG && GP_REG_P (REGNO (dest)))
>  {
>if (src_code == REG && FP_REG_P (REGNO (src)))
> - return dbl_p ? "fmv.x.d\t%0,%1" : "fmv.x.w\t%0,%1";
> + switch (width)
> +   {
> +   case 2:
> + /* Using fmv.x.s + sign-extend to emulate fmv.x.h.  */
> + return "fmv.x.s\t%0,%1;slli\t%0,%0,16;srai\t%0,%0,16";
> +   case 4:
> + return "fmv.x.s\t%0,%1";
> +   case 8:
> + return "fmv.x.d\t%0,%1";
> +   }

and here:

> @@ -2353,18 +2364,24 @@ riscv_output_move (rtx dest, rtx src)
>   return "mv\t%0,%z1";
>  
> if (FP_REG_P (REGNO (dest)))
> - {
> -   if (!dbl_p)
> - return "fmv.w.x\t%0,%z1";
> -   if (TARGET_64BIT)
> - return "fmv.d.x\t%0,%z1";
> -   /* in RV32, we can emulate fmv.d.x %0, x0 using fcvt.d.w */
> -   gcc_assert (src == CONST0_RTX (mode));
> -   return "fcvt.d.w\t%0,x0";
> - }
> + switch (width)
> +   {
> +   case 2:
> + /* High 16 bits should be all-1, otherwise HW will treated
> +as a n-bit canonical NaN, but isn't matter for softfloat.  */
> + return "fmv.s.x\t%0,%1";
> +   case 4:
> + return "fmv.s.x\t%0,%z1";
> +   case 8:
> + if (TARGET_64BIT)
> +   return "fmv.d.x\t%0,%z1";
> + /* in RV32, we can emulate fmv.d.x %0, x0 using fcvt.d.w */

(Incorrect comment formatting here as well.)

> + gcc_assert (src == CONST0_RTX (mode));
> + return "fcvt.d.w\t%0,x0";
> +   }

Was it intentional or just an oversight in review?  If intentional, I'd 
expect such a change to happen on its own rather than sneaked in with a 
large functional update.

  Maciej


Re: [PATCH][AArch64] Cleanup move immediate code

2022-12-05 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

> -  scalar_int_mode imode = (mode == HFmode
> -    ? SImode
> -    : int_mode_for_mode (mode).require ());
> +  machine_mode imode = (mode == DFmode) ? DImode : SImode;

> It looks like this might mishandle DDmode, if not now then in the future.
> Seems safer to use known_eq (GET_MODE_SIZE (mode), 8)

I've changed that, but it does not matter for the narrow modes as the result
will be identical - only DDmode might get costed incorrectly.

> Sorry for not noticing last time, but: rather than have
> aarch64_zeroextended_move_imm (which is quite a complicated test),
> could we just add an extra (default off) parameter to aarch64_move_imm
> that suppresses the (val >> 32) == 0 test?

That makes things more complicated again - ultimately I'd like to get rid of the
mode parameter since most callers use a fixed mode, and ones that don't are
now creating and passing fake modes... I've changed it like aarch64_move_imm
and call aarch64_is_movz twice to check it is not a 64-bit MOVZ/MOVN.

Cheers,
Wilco

v3: Use aarch64_is_movz, use known_eq

Simplify, refactor and improve various move immediate functions.
Allow 32-bit MOVI/N as a valid 64-bit immediate which removes special
cases in aarch64_internal_mov_immediate.  Add new constraint so the movdi
pattern only needs a single alternative for move immediate.

Passes bootstrap and regress, OK for commit?

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_bitmask_imm): Use unsigned type.
(aarch64_zeroextended_move_imm): New function.
(aarch64_move_imm): Refactor, assert mode is SImode or DImode.
(aarch64_internal_mov_immediate): Assert mode is SImode or DImode.
Simplify special cases.
(aarch64_uimm12_shift): Simplify code.
(aarch64_clamp_to_uimm12_shift): Likewise.
(aarch64_movw_imm): Rename to aarch64_is_movz.
(aarch64_float_const_rtx_p): Pass either SImode or DImode to
aarch64_internal_mov_immediate.
(aarch64_rtx_costs): Likewise.
* config/aarch64/aarch64.md (movdi_aarch64): Merge 'N' and 'M'
constraints into single 'O'.
(mov_aarch64): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_move_imm): Use unsigned.
(aarch64_bitmask_imm): Likewise.
(aarch64_uimm12_shift): Likewise.
(aarch64_zeroextended_move_imm): New prototype.
* config/aarch64/constraints.md: Add 'O' for 32/64-bit immediates,
limit 'N' to 64-bit only moves.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
4be93c93c26e091f878bc8e4cf06e90888405fb2..8bce6ec7599edcc2e6a1d8006450f35c0ce7f61f
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -756,7 +756,7 @@ void aarch64_post_cfi_startproc (void);
 poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
 bool aarch64_address_valid_for_prefetch_p (rtx, bool);
-bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
+bool aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode);
 unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
 unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
 bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode 
mode);
@@ -793,7 +793,7 @@ bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, 
unsigned HOST_WIDE_INT,
unsigned HOST_WIDE_INT,
unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
-bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
+bool aarch64_move_imm (unsigned HOST_WIDE_INT, machine_mode);
 machine_mode aarch64_sve_int_mode (machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
 machine_mode aarch64_sve_pred_mode (machine_mode);
@@ -843,8 +843,9 @@ bool aarch64_sve_float_arith_immediate_p (rtx, bool);
 bool aarch64_sve_float_mul_immediate_p (rtx);
 bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
-bool aarch64_uimm12_shift (HOST_WIDE_INT);
+bool aarch64_uimm12_shift (unsigned HOST_WIDE_INT);
 int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
+bool aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
 const char *aarch64_output_casesi (rtx *);
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
a73741800c963ee6605fd2cfa918f4399da4bfdf..00269632eeb52c29ba2011c4c82274968b850d71
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5625,12 +5625,10 @@ aarch64_bitmask_imm (unsigned HOST_WIDE_INT val)
 
 /* Return true if VAL is a valid bitmask immediate for MODE.  */
 bool
-aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
+aarch64_bitmask_imm (unsigned HOST_WIDE_INT 

Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-12-05 Thread Richard Sandiford via Gcc-patches
Tamar Christina via Gcc-patches  writes:
>> > +/* Check to see if the supplied comparison in PTEST can be performed as a
>> > +   bit-test-and-branch instead.  VAL must contain the original tree
>> > +   expression of the non-zero operand which will be used to rewrite the
>> > +   comparison in PTEST.
>> > +
>> > +   Returns TRUE if operation succeeds and returns updated PMODE and
>> PTEST,
>> > +   else FALSE.  */
>> > +
>> > +enum insn_code
>> > +static validate_test_and_branch (tree val, rtx *ptest, machine_mode
>> > +*pmode) {
>> > +  if (!val || TREE_CODE (val) != SSA_NAME)
>> > +return CODE_FOR_nothing;
>> > +
>> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (val));  rtx test =
>> > + *ptest;
>> > +
>> > +  if (GET_CODE (test) != EQ && GET_CODE (test) != NE)
>> > +return CODE_FOR_nothing;
>> > +
>> > +  /* If the target supports the testbit comparison directly, great.
>> > + */  auto icode = direct_optab_handler (tbranch_optab, mode);  if
>> > + (icode == CODE_FOR_nothing)
>> > +return icode;
>> > +
>> > +  if (tree_zero_one_valued_p (val))
>> > +{
>> > +  auto pos = BYTES_BIG_ENDIAN ? GET_MODE_BITSIZE (mode) - 1 : 0;
>> 
>> Does this work for BYTES_BIG_ENDIAN && !WORDS_BIG_ENDIAN and mode
>> > word_mode?
>> 
>
> It does now. In this particular case all that matters is the bit ordering, so
> I've changed it to BITS_BIG_ENDIAN.
>
> Also during the review of the AArch64 optab Richard Sandiford wanted me to 
> split the
> optabs apart into two.  The reason is that a match_operator still gets the 
> full RTL.
>
> In the case of a tbranch the full RTL has an invalid comparison, so if a 
> target doesn't implement
> the hook correctly this would lead to incorrect code.  We've now moved the 
> operator as part of
> the name itself to avoid this.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * dojump.cc (do_jump): Pass along value.
>   (do_jump_by_parts_greater_rtx): Likewise.
>   (do_jump_by_parts_zero_rtx): Likewise.
>   (do_jump_by_parts_equality_rtx): Likewise.
>   (do_compare_rtx_and_jump): Likewise.
>   (do_compare_and_jump): Likewise.
>   * dojump.h (do_compare_rtx_and_jump): New.
>   * optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check.
>   (validate_test_and_branch): New.
>   (emit_cmp_and_jump_insns): Optionally take a value, and when value is
>   supplied then check if it's suitable for tbranch.
>   * optabs.def (tbranch_eq$a4, tbranch_ne$a4): New.
>   * doc/md.texi (tbranch_@var{op}@var{mode}4): Document it.
>   * optabs.h (emit_cmp_and_jump_insns):
>   * tree.h (tree_zero_one_valued_p): New.

Thanks for doing this.

> --- inline copy of patch ---
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> d0a71ecbb806de3a6564c6ffe973fec5da5c597b..c6c4b13d756de28078a0a779876a00c614246914
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6964,6 +6964,14 @@ case, you can and should make operand 1's predicate 
> reject some operators
>  in the @samp{cstore@var{mode}4} pattern, or remove the pattern altogether
>  from the machine description.
>  
> +@cindex @code{tbranch_@var{op}@var{mode}4} instruction pattern
> +@item @samp{tbranch_@var{op}@var{mode}4}
> +Conditional branch instruction combined with a bit test-and-compare
> +instruction. Operand 0 is a comparison operator.  Operand 1 is the
> +operand of the comparison. Operand 2 is the bit position of Operand 1 to 
> test.
> +Operand 3 is the @code{code_label} to jump to. @var{op} is one of @var{eq} or
> +@var{ne}.
> +

The documentation still describes the old interface.  Also, there are only 3
operands now, rather than 4, so the optab name should end with 3.

>  @cindex @code{cbranch@var{mode}4} instruction pattern
>  @item @samp{cbranch@var{mode}4}
>  Conditional branch instruction combined with a compare instruction.
> diff --git a/gcc/dojump.h b/gcc/dojump.h
> index 
> e379cceb34bb1765cb575636e4c05b61501fc2cf..d1d79c490c420a805fe48d58740a79c1f25fb839
>  100644
> --- a/gcc/dojump.h
> +++ b/gcc/dojump.h
> @@ -71,6 +71,10 @@ extern void jumpifnot (tree exp, rtx_code_label *label,
>  extern void jumpifnot_1 (enum tree_code, tree, tree, rtx_code_label *,
>profile_probability);
>  
> +extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int, tree,
> +  machine_mode, rtx, rtx_code_label *,
> +  rtx_code_label *, profile_probability);
> +
>  extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int,
>machine_mode, rtx, rtx_code_label *,
>rtx_code_label *, profile_probability);
> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> index 
> 2af0cd1aca3b6af13d5d8799094ee93f18022296..190324f36f1a31990f8c49bc8c0f45c23da5c31e
>  100644
> --- a/gcc/dojump.cc
> +++ b/gcc/dojump.cc
> @@ 

[PATCH] range-op-float: Improve multiplication reverse operation

2022-12-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Dec 05, 2022 at 10:54:41AM +0100, Aldy Hernandez wrote:
> > Do you mind if I try that incrementally and only if it doesn't make the
> > code too large/too unreadable?
> 
> Sure.  And don't feel obligated to implement it either.  range-ops is a
> never ending pit of possible optimizations.  We found that out quite early
> in the design.
> 
> If you don't get to it, could you at least add a comment?

So like this for multiplication op1/2_range if it passes bootstrap/regtest?
For division I'll need to go to a drawing board...

2022-12-05  Jakub Jelinek  

* range-op-float.cc (zero_to_max_range): New function.
(foperator_mult::op1_range): If lhs can't be INF nor NAN,
op1 can't be INF.

--- gcc/range-op-float.cc.jj	2022-12-05 11:17:34.900573272 +0100
+++ gcc/range-op-float.cc   2022-12-05 12:32:30.286753929 +0100
@@ -2004,6 +2004,29 @@ zero_to_inf_range (REAL_VALUE_TYPE &lb,
 }
 }
 
+// Set [lb, ub] to [-MAX, -0], [-MAX, +MAX] or [+0, +MAX] depending on
+// signbit_known.
+static void
+zero_to_max_range (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, tree type,
+		   int signbit_known)
+{
+  if (signbit_known > 0)
+    {
+      lb = dconst0;
+      ub = real_max_representable (type);
+    }
+  else if (signbit_known < 0)
+    {
+      lb = real_min_representable (type);
+      ub = real_value_negate (&dconst0);
+    }
+  else
+    {
+      lb = real_min_representable (type);
+      ub = real_max_representable (type);
+    }
+}
+
 class foperator_plus : public range_operator_float
 {
   using range_operator_float::op1_range;
@@ -2159,7 +2182,14 @@ public:
	// and perhaps if it can be NAN or not.
	REAL_VALUE_TYPE lb, ub;
	int signbit_known = signbit_known_p (lhs_lb, lhs_ub, op2_lb, op2_ub);
-	zero_to_inf_range (lb, ub, signbit_known);
+	// If lhs can't be +-INF nor NAN, then op1 can't be +-INF -
+	// +-INF * anything is either +-INF or NAN (if op2 is +-0 or NAN).
+	if (!real_isinf (&lhs_lb)
+	    && !real_isinf (&lhs_ub)
+	    && !lhs.maybe_isnan ())
+	  zero_to_max_range (lb, ub, type, signbit_known);
+	else
+	  zero_to_inf_range (lb, ub, signbit_known);
r.set (type, lb, ub);
   }
 // Otherwise, if op2 is a singleton INF and lhs doesn't include INF,


Jakub
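
For the record, the identity the new zero_to_max_range path relies on is
easy to check with a small C program (an illustration, not part of the
patch):

    #include <math.h>
    #include <stdio.h>

    int main (void)
    {
      double inf = INFINITY;
      /* Inf * x is either +-Inf or NaN (when x is +-0 or NaN), so a
         finite, non-NaN product rules out an infinite operand, which is
         what the new op1_range code exploits.  */
      printf ("%g %g %g\n", inf * 2.0, inf * -3.0, inf * 0.0);
      return 0;  /* prints: inf -inf nan */
    }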



Re: [PATCH]AArch64 Fix vector re-interpretation between partial SIMD modes

2022-12-05 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, November 18, 2022 9:30 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH]AArch64 Fix vector re-interpretation between partial
>> SIMD modes
>> 
>> Richard Sandiford via Gcc-patches  writes:
>> > Tamar Christina  writes:
>> >> Hi All,
>> >>
>> >> While writing a patch series I started getting incorrect codegen out
>> >> from VEC_PERM on partial struct types.
>> >>
>> >> It turns out that this was happening because the
>> >> TARGET_CAN_CHANGE_MODE_CLASS implementation has a slight bug in it.
>> >> The hook only checked for SIMD to Partial but never Partial to SIMD.
>> >> This resulted in incorrect subregs being generated from the fallback
>> >> code in VEC_PERM_EXPR expansions.
>> >>
>> >> I have unfortunately not been able to trigger it using a standalone
>> >> testcase as the mid-end optimizes away the permute every time I try
>> >> to describe a permute that would result in the bug.
>> >>
>> >> The patch now rejects any conversion of partial SIMD struct types,
>> >> unless they are both partial structures of the same number of
>> >> registers or one is a SIMD type whose size is less than 8 bytes.
>> >>
>> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >>
>> >> Ok for master? And backport to GCC 12?
>> >>
>> >> Thanks,
>> >> Tamar
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >>   * config/aarch64/aarch64.cc (aarch64_can_change_mode_class):
>> Restrict
>> >>   conversions between partial struct types properly.
>> >>
>> >> --- inline copy of patch --
>> >> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> >> index d3c3650d7d728f56adb65154127dc7b72386c5a7..84dbe2f4ea7d03b424602ed98a34e7824217dc91 100644
>> >> --- a/gcc/config/aarch64/aarch64.cc
>> >> +++ b/gcc/config/aarch64/aarch64.cc
>> >> @@ -26471,9 +26471,10 @@ aarch64_can_change_mode_class (machine_mode from,
>> >>bool from_pred_p = (from_flags & VEC_SVE_PRED);
>> >>bool to_pred_p = (to_flags & VEC_SVE_PRED);
>> >>
>> >> -  bool from_full_advsimd_struct_p = (from_flags == (VEC_ADVSIMD | VEC_STRUCT));
>> >>    bool to_partial_advsimd_struct_p = (to_flags == (VEC_ADVSIMD | VEC_STRUCT
>> >> 						    | VEC_PARTIAL));
>> >> +  bool from_partial_advsimd_struct_p = (from_flags == (VEC_ADVSIMD | VEC_STRUCT
>> >> +						       | VEC_PARTIAL));
>> >>
>> >>    /* Don't allow changes between predicate modes and other modes.
>> >>       Only predicate registers can hold predicate modes and only
>> >> @@ -26496,9 +26497,23 @@ aarch64_can_change_mode_class (machine_mode from,
>> >>      return false;
>> >>
>> >>    /* Don't allow changes between partial and full Advanced SIMD structure
>> >> -     modes.  */
>> >> -  if (from_full_advsimd_struct_p && to_partial_advsimd_struct_p)
>> >> -    return false;
>> >> +     modes unless both are a partial struct with the same number of
>> >> +     registers or the vector bitsizes are the same.  */
>> >> +  if (to_partial_advsimd_struct_p ^ from_partial_advsimd_struct_p)
>> >> +    {
>> >> +      /* If they're both partial structures, allow if they have the same
>> >> +	 number of registers.  */
>> >> +      if (to_partial_advsimd_struct_p == from_partial_advsimd_struct_p)
>> >> +	return known_eq (GET_MODE_SIZE (from), GET_MODE_SIZE (to));
>> >
>> > It looks like the ^ makes this line unreachable.  I guess it should be
>> > a separate top-level condition.
>> >
>> >> +      /* If one is a normal SIMD register, allow only if no larger than
>> >> +	 64-bit.  */
>> >> +      if ((to_flags & VEC_ADVSIMD) == to_flags)
>> >> +	return known_le (GET_MODE_SIZE (to), 8);
>> >> +      else if ((from_flags & VEC_ADVSIMD) == from_flags)
>> >> +	return known_le (GET_MODE_SIZE (from), 8);
>> >> +
>> >> +      return false;
>> >> +    }
>> >
>> > I don't think we need to restrict this to SIMD modes.  A plain DI
>> > would be OK too.  So I think it should just be:
>> >
>> > return (known_le (GET_MODE_SIZE (to), 8)
>> >         || known_le (GET_MODE_SIZE (from), 8));
>> 
>> Looking again, all the other tests return false if they found a definite 
>> problem
>> and fall through to later code otherwise.  I think we should do the same 
>> here.
>
> I've rewritten the conditions.  I needed to allow any conversions as long
> as they're both partial vectors.  There were various intrinsics tests
> that rely on, for instance, being able to take a subreg of a VN2x8QI from
> a VN3x8QI.  I did not allow only the smaller case, since it didn't seem
> logical to block paradoxical subregs of this kind.
>
> Reload seems to be handling them correctly as separate 64-bit registers
> (we have tests for this, it looks like).
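
The kind of intrinsics access described above, taking a two-vector subset
of a three-vector value, looks like this at the source level (an
illustration, not one of the failing tests):

    #include <arm_neon.h>

    /* Returning the first two vectors of an int8x8x3_t; internally this
       is a subreg of one partial Advanced SIMD struct mode (the VN2x8QI
       kind mentioned above) taken from a larger one.  */
    int8x8x2_t first_two (int8x8x3_t v)
    {
      int8x8x2_t r = { { v.val[0], v.val[1] } };
      return r;
    }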
>
> So Bootstrapped Regtested on aarch64-none-linux-gnu 

Re: AArch64: Add UNSPECV_PATCHABLE_AREA [PR98776]

2022-12-05 Thread Richard Sandiford via Gcc-patches
"Pop, Sebastian"  writes:
> Hi,
>
> Currently patchable area is at the wrong place on AArch64.  It is placed
> immediately after function label, before .cfi_startproc.  This patch
> adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and
> modifies aarch64_print_patchable_function_entry to avoid placing
> patchable area before .cfi_startproc.
>
> The patch passed bootstrap and regression test on aarch64-linux.
> Ok to commit to trunk and backport to active release branches?

Looks good, but doesn't the problem described in the PR then still
apply to the BTI emitted by:

  if (cfun->machine->label_is_assembled
  && aarch64_bti_enabled ()
  && !cgraph_node::get (cfun->decl)->only_called_directly_p ())
{
  /* Remove the BTI that follows the patch area and insert a new BTI
 before the patch area right after the function label.  */
  rtx_insn *insn = next_real_nondebug_insn (get_insns ());
  if (insn
  && INSN_P (insn)
  && GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE
  && XINT (PATTERN (insn), 1) == UNSPECV_BTI_C)
delete_insn (insn);
  asm_fprintf (file, "\thint\t34 // bti c\n");
}

?  It seems like the BTI will be before the cfi_startproc and the
patchable entry afterwards.

I guess we should keep the BTI instruction as-is (rather than printing
a .hint) and emit the new UNSPECV_PATCHABLE_AREA after the BTI rather
than before it.

Thanks,
Richard
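
In terms of the code in aarch64_print_patchable_function_entry, the
suggestion would be something like the following, reusing the pa and bb
variables from the patch (a sketch of the suggestion, not a tested change):

    /* Keep the BTI as the first insn after the function label and emit
       the patchable-area pseudo after it rather than deleting it.  */
    rtx_insn *insn = next_real_nondebug_insn (get_insns ());
    if (insn
        && INSN_P (insn)
        && GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE
        && XINT (PATTERN (insn), 1) == UNSPECV_BTI_C)
      emit_insn_after (pa, insn);
    else
      emit_insn_before (pa, BB_HEAD (bb));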

> Thanks,
> Sebastian
>
> gcc/
> PR target/93492
> * config/aarch64/aarch64-protos.h (aarch64_output_patchable_area):
> Declared.
> * config/aarch64/aarch64.cc (aarch64_print_patchable_function_entry):
> Emit an UNSPECV_PATCHABLE_AREA pseudo instruction.
> (aarch64_output_patchable_area): New.
> * config/aarch64/aarch64.md (UNSPECV_PATCHABLE_AREA): New.
> (patchable_area): Define.
>
> gcc/testsuite/
> PR target/93492
> * gcc.target/aarch64/pr98776.c: New.
>
>
> From b9cf87bcdf65f515b38f1851eb95c18aaa180253 Mon Sep 17 00:00:00 2001
> From: Sebastian Pop 
> Date: Wed, 30 Nov 2022 19:45:24 +
> Subject: [PATCH] AArch64: Add UNSPECV_PATCHABLE_AREA [PR98776]
>
> Currently patchable area is at the wrong place on AArch64.  It is placed
> immediately after function label, before .cfi_startproc.  This patch
> adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and
> modifies aarch64_print_patchable_function_entry to avoid placing
> patchable area before .cfi_startproc.
>
> gcc/
>   PR target/93492
>   * config/aarch64/aarch64-protos.h (aarch64_output_patchable_area):
>   Declared.
>   * config/aarch64/aarch64.cc (aarch64_print_patchable_function_entry):
>   Emit an UNSPECV_PATCHABLE_AREA pseudo instruction.
>   (aarch64_output_patchable_area): New.
>   * config/aarch64/aarch64.md (UNSPECV_PATCHABLE_AREA): New.
>   (patchable_area): Define.
>
> gcc/testsuite/
>   PR target/93492
>   * gcc.target/aarch64/pr98776.c: New.
> ---
>  gcc/config/aarch64/aarch64-protos.h|  2 ++
>  gcc/config/aarch64/aarch64.cc  | 24 +-
>  gcc/config/aarch64/aarch64.md  | 14 +
>  gcc/testsuite/gcc.target/aarch64/pr98776.c | 11 ++
>  4 files changed, 50 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr98776.c
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 4be93c93c26..2fba24d947d 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1074,4 +1074,6 @@ const char *aarch64_indirect_call_asm (rtx);
>  extern bool aarch64_harden_sls_retbr_p (void);
>  extern bool aarch64_harden_sls_blr_p (void);
>  
> +extern void aarch64_output_patchable_area (unsigned int, bool);
> +
>  #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e97f3b32f7c..e84b33b958c 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22684,7 +22684,29 @@ aarch64_print_patchable_function_entry (FILE *file,
>asm_fprintf (file, "\thint\t34 // bti c\n");
>  }
>  
> -  default_print_patchable_function_entry (file, patch_area_size, record_p);
> +  if (cfun->machine->label_is_assembled)
> +{
> +  rtx pa = gen_patchable_area (GEN_INT (patch_area_size),
> +GEN_INT (record_p));
> +  basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> +  rtx_insn *insn = emit_insn_before (pa, BB_HEAD (bb));
> +  INSN_ADDRESSES_NEW (insn, -1);
> +}
> +  else
> +{
> +  default_print_patchable_function_entry (file, patch_area_size,
> +   record_p);
> +}
> +}
> +
> +/* Output patchable area.  */
> +
> +void
> +aarch64_output_patchable_area (unsigned int patch_area_size, bool record_p)
> +{
> +  

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2022-12-05 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches  writes:
> Prathamesh Kulkarni  writes:
>> Hi,
>> For the following test-case:
>>
>> int16x8_t foo(int16_t x, int16_t y)
>> {
>>   return (int16x8_t) { x, y, x, y, x, y, x, y };
>> }
>>
>> Code gen at -O3:
>> foo:
>> dup v0.8h, w0
>> ins v0.h[1], w1
>> ins v0.h[3], w1
>> ins v0.h[5], w1
>> ins v0.h[7], w1
>> ret
>>
>> For 16 elements, it results in 8 ins instructions which might not be
>> optimal perhaps.
>> I guess, the above code-gen would be equivalent to the following ?
>> dup v0.8h, w0
>> dup v1.8h, w1
>> zip1 v0.8h, v0.8h, v1.8h
>>
>> I have attached patch to do the same, if number of elements >= 8,
>> which should be possibly better compared to current code-gen ?
>> Patch passes bootstrap+test on aarch64-linux-gnu.
>> Does the patch look OK ?
>>
>> Thanks,
>> Prathamesh
>>
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index c91df6f5006..e5dea70e363 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -22028,6 +22028,39 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>>return;
>>  }
>>  
>> +  /* Check for interleaving case.
>> + For eg if initializer is (int16x8_t) {x, y, x, y, x, y, x, y}.
>> + Generate following code:
>> + dup v0.h, x
>> + dup v1.h, y
>> + zip1 v0.h, v0.h, v1.h
>> + for "large enough" initializer.  */
>> +
>> +  if (n_elts >= 8)
>> +{
>> +  int i;
>> +  for (i = 2; i < n_elts; i++)
>> +if (!rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, i % 2)))
>> +  break;
>> +
>> +  if (i == n_elts)
>> +{
>> +  machine_mode mode = GET_MODE (target);
>> +  rtx dest[2];
>> +
>> +  for (int i = 0; i < 2; i++)
>> +{
>> +  rtx x = copy_to_mode_reg (GET_MODE_INNER (mode), XVECEXP (vals, 0, i));
>
> Formatting nit: long line.
>
>> +  dest[i] = gen_reg_rtx (mode);
>> +  aarch64_emit_move (dest[i], gen_vec_duplicate (mode, x));
>> +}
>
> This could probably be written:
>
> for (int i = 0; i < 2; i++)
>   {
> rtx x = expand_vector_broadcast (mode, XVECEXP (vals, 0, i));
> dest[i] = force_reg (GET_MODE_INNER (mode), x);

Oops, I meant "mode" rather than "GET_MODE_INNER (mode)", sorry.

>   }
>
> which avoids forcing constant elements into a register before the duplication.
> OK with that change if it works.
>
> Thanks,
> Richard
>
>> +
>> +  rtvec v = gen_rtvec (2, dest[0], dest[1]);
>> +  emit_set_insn (target, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1));
>> +  return;
>> +}
>> +}
>> +
>>enum insn_code icode = optab_handler (vec_set_optab, mode);
>>gcc_assert (icode != CODE_FOR_nothing);
>>  
>> diff --git a/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c 
>> b/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c
>> new file mode 100644
>> index 000..ee775048589
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c
>> @@ -0,0 +1,37 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>> +
>> +#include 
>> +
>> +/*
>> +** foo:
>> +**  ...
>> +**  dup v[0-9]+\.8h, w[0-9]+
>> +**  dup v[0-9]+\.8h, w[0-9]+
>> +**  zip1 v[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h
>> +**  ...
>> +**  ret
>> +*/
>> +
>> +int16x8_t foo(int16_t x, int y)
>> +{
>> +  int16x8_t v = (int16x8_t) {x, y, x, y, x, y, x, y}; 
>> +  return v;
>> +}
>> +
>> +/*
>> +** foo2:
>> +**  ...
>> +**  dup v[0-9]+\.8h, w[0-9]+
>> +**  movi v[0-9]+\.8h, 0x1
>> +**  zip1 v[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h
>> +**  ...
>> +**  ret
>> +*/
>> +
>> +int16x8_t foo2(int16_t x) 
>> +{
>> +  int16x8_t v = (int16x8_t) {x, 1, x, 1, x, 1, x, 1}; 
>> +  return v;
>> +}
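
Putting the two review comments together (expand_vector_broadcast, and
dest in "mode" rather than GET_MODE_INNER (mode)), the interleaving part
of the patch would read roughly as follows (a combined sketch, not the
committed version):

    for (int i = 0; i < 2; i++)
      {
        rtx x = expand_vector_broadcast (mode, XVECEXP (vals, 0, i));
        dest[i] = force_reg (mode, x);
      }

    rtvec v = gen_rtvec (2, dest[0], dest[1]);
    emit_set_insn (target, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1));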


Re: [PATCH] tree, c++: optimize walk_tree_1 and cp_walk_subtrees

2022-12-05 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 5 Dec 2022 at 09:51, Patrick Palka via Gcc-patches
 wrote:
>
> These functions currently repeatedly dereference tp during the subtree
> walk, dereferences which the compiler can't CSE because it can't
> guarantee that the subtree walking doesn't modify *tp.
>
> But we already implicitly require that TREE_CODE (*tp) remains the same
> throughout the subtree walks, so it doesn't seem to be a huge leap to
> strengthen that to requiring *tp remains the same.
Hi Patrick,
Just wondering in that case, if marking tp with const_tree *, instead
of tree *, would perhaps help the compiler
for CSEing some of the dereferences to *tp ?

Thanks,
Prathamesh
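
The aliasing problem Patrick describes can be reproduced in miniature in
plain C (an illustration, not part of the patch):

    #include <stdio.h>

    /* The compiler must assume visit may write through tp, so without a
       manual cache every later *tp would be reloaded from memory.  */
    static int visit (int *tp) { return *tp; }

    int walk (int *tp)
    {
      int t = *tp;           /* manual CSE: load *tp exactly once */
      int sum = visit (tp);  /* call that might modify *tp */
      sum += t;              /* reuses the cached value, no reload */
      return sum;
    }

    int main (void)
    {
      int x = 2;
      printf ("%d\n", walk (&x));  /* prints: 4 */
      return 0;
    }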
>
> So this patch manually CSEs the dereferences of *tp.  This means that
> the callback function can no longer replace *tp with another tree (of
> the same TREE_CODE) when walking one of its subtrees, but that doesn't
> sound like a useful feature anyway.  This speeds up the C++ frontend by
> about 1.5% according to my experiments.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk stage3 or perhaps for stage1?
>
> gcc/cp/ChangeLog:
>
> * tree.cc (cp_walk_subtrees): Avoid repeatedly dereferencing tp.
>
> gcc/ChangeLog:
>
> * tree.cc (walk_tree_1): Avoid repeatedly dereferencing tp
> and type_p.
> ---
>  gcc/cp/tree.cc | 113 +
>  gcc/tree.cc| 103 ++--
>  2 files changed, 108 insertions(+), 108 deletions(-)
>
> diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> index 4066b014f6e..ceb0c75f559 100644
> --- a/gcc/cp/tree.cc
> +++ b/gcc/cp/tree.cc
> @@ -5441,7 +5441,8 @@ tree
>  cp_walk_subtrees (tree *tp, int *walk_subtrees_p, walk_tree_fn func,
>   void *data, hash_set *pset)
>  {
> -  enum tree_code code = TREE_CODE (*tp);
> +  tree t = *tp;
> +  enum tree_code code = TREE_CODE (t);
>tree result;
>
>  #define WALK_SUBTREE(NODE) \
> @@ -5452,7 +5453,7 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
> walk_tree_fn func,
>  }  \
>while (0)
>
> -  if (TYPE_P (*tp))
> +  if (TYPE_P (t))
>  {
>/* If *WALK_SUBTREES_P is 1, we're interested in the syntactic form of
>  the argument, so don't look through typedefs, but do walk into
> @@ -5464,15 +5465,15 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
> walk_tree_fn func,
>
>  See find_abi_tags_r for an example of setting *WALK_SUBTREES_P to 2
>  when that's the behavior the walk_tree_fn wants.  */
> -  if (*walk_subtrees_p == 1 && typedef_variant_p (*tp))
> +  if (*walk_subtrees_p == 1 && typedef_variant_p (t))
> {
> - if (tree ti = TYPE_ALIAS_TEMPLATE_INFO (*tp))
> + if (tree ti = TYPE_ALIAS_TEMPLATE_INFO (t))
> WALK_SUBTREE (TI_ARGS (ti));
>   *walk_subtrees_p = 0;
>   return NULL_TREE;
> }
>
> -  if (tree ti = TYPE_TEMPLATE_INFO (*tp))
> +  if (tree ti = TYPE_TEMPLATE_INFO (t))
> WALK_SUBTREE (TI_ARGS (ti));
>  }
>
> @@ -5482,8 +5483,8 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
> walk_tree_fn func,
>switch (code)
>  {
>  case TEMPLATE_TYPE_PARM:
> -  if (template_placeholder_p (*tp))
> -   WALK_SUBTREE (CLASS_PLACEHOLDER_TEMPLATE (*tp));
> +  if (template_placeholder_p (t))
> +   WALK_SUBTREE (CLASS_PLACEHOLDER_TEMPLATE (t));
>/* Fall through.  */
>  case DEFERRED_PARSE:
>  case TEMPLATE_TEMPLATE_PARM:
> @@ -5497,63 +5498,63 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
> walk_tree_fn func,
>break;
>
>  case TYPENAME_TYPE:
> -  WALK_SUBTREE (TYPE_CONTEXT (*tp));
> -  WALK_SUBTREE (TYPENAME_TYPE_FULLNAME (*tp));
> +  WALK_SUBTREE (TYPE_CONTEXT (t));
> +  WALK_SUBTREE (TYPENAME_TYPE_FULLNAME (t));
>*walk_subtrees_p = 0;
>break;
>
>  case BASELINK:
> -  if (BASELINK_QUALIFIED_P (*tp))
> -   WALK_SUBTREE (BINFO_TYPE (BASELINK_ACCESS_BINFO (*tp)));
> -  WALK_SUBTREE (BASELINK_FUNCTIONS (*tp));
> +  if (BASELINK_QUALIFIED_P (t))
> +   WALK_SUBTREE (BINFO_TYPE (BASELINK_ACCESS_BINFO (t)));
> +  WALK_SUBTREE (BASELINK_FUNCTIONS (t));
>*walk_subtrees_p = 0;
>break;
>
>  case PTRMEM_CST:
> -  WALK_SUBTREE (TREE_TYPE (*tp));
> +  WALK_SUBTREE (TREE_TYPE (t));
>*walk_subtrees_p = 0;
>break;
>
>  case TREE_LIST:
> -  WALK_SUBTREE (TREE_PURPOSE (*tp));
> +  WALK_SUBTREE (TREE_PURPOSE (t));
>break;
>
>  case OVERLOAD:
> -  WALK_SUBTREE (OVL_FUNCTION (*tp));
> -  WALK_SUBTREE (OVL_CHAIN (*tp));
> +  WALK_SUBTREE (OVL_FUNCTION (t));
> +  WALK_SUBTREE (OVL_CHAIN (t));
>*walk_subtrees_p = 0;
>break;
>
>  case USING_DECL:
> -  WALK_SUBTREE (DECL_NAME (*tp));
> -  WALK_SUBTREE (USING_DECL_SCOPE (*tp));
> -  

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2022-12-05 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> Hi,
> For the following test-case:
>
> int16x8_t foo(int16_t x, int16_t y)
> {
>   return (int16x8_t) { x, y, x, y, x, y, x, y };
> }
>
> Code gen at -O3:
> foo:
> dup v0.8h, w0
> ins v0.h[1], w1
> ins v0.h[3], w1
> ins v0.h[5], w1
> ins v0.h[7], w1
> ret
>
> For 16 elements, it results in 8 ins instructions which might not be
> optimal perhaps.
> I guess, the above code-gen would be equivalent to the following ?
> dup v0.8h, w0
> dup v1.8h, w1
> zip1 v0.8h, v0.8h, v1.8h
>
> I have attached patch to do the same, if number of elements >= 8,
> which should be possibly better compared to current code-gen ?
> Patch passes bootstrap+test on aarch64-linux-gnu.
> Does the patch look OK ?
>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index c91df6f5006..e5dea70e363 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22028,6 +22028,39 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>return;
>  }
>  
> +  /* Check for interleaving case.
> + For eg if initializer is (int16x8_t) {x, y, x, y, x, y, x, y}.
> + Generate following code:
> + dup v0.h, x
> + dup v1.h, y
> + zip1 v0.h, v0.h, v1.h
> + for "large enough" initializer.  */
> +
> +  if (n_elts >= 8)
> +{
> +  int i;
> +  for (i = 2; i < n_elts; i++)
> + if (!rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, i % 2)))
> +   break;
> +
> +  if (i == n_elts)
> + {
> +   machine_mode mode = GET_MODE (target);
> +   rtx dest[2];
> +
> +   for (int i = 0; i < 2; i++)
> + {
> +   rtx x = copy_to_mode_reg (GET_MODE_INNER (mode), XVECEXP (vals, 0, i));

Formatting nit: long line.

> +   dest[i] = gen_reg_rtx (mode);
> +   aarch64_emit_move (dest[i], gen_vec_duplicate (mode, x));
> + }

This could probably be written:

  for (int i = 0; i < 2; i++)
{
  rtx x = expand_vector_broadcast (mode, XVECEXP (vals, 0, i));
  dest[i] = force_reg (GET_MODE_INNER (mode), x);
}

which avoids forcing constant elements into a register before the duplication.

OK with that change if it works.

Thanks,
Richard

> +
> +   rtvec v = gen_rtvec (2, dest[0], dest[1]);
> +   emit_set_insn (target, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1));
> +   return;
> + }
> +}
> +
>enum insn_code icode = optab_handler (vec_set_optab, mode);
>gcc_assert (icode != CODE_FOR_nothing);
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c 
> b/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c
> new file mode 100644
> index 000..ee775048589
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include 
> +
> +/*
> +** foo:
> +**   ...
> +**   dup v[0-9]+\.8h, w[0-9]+
> +**   dup v[0-9]+\.8h, w[0-9]+
> +**   zip1 v[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h
> +**   ...
> +**   ret
> +*/
> +
> +int16x8_t foo(int16_t x, int y)
> +{
> +  int16x8_t v = (int16x8_t) {x, y, x, y, x, y, x, y}; 
> +  return v;
> +}
> +
> +/*
> +** foo2:
> +**   ...
> +**   dup v[0-9]+\.8h, w[0-9]+
> +**   movi v[0-9]+\.8h, 0x1
> +**   zip1 v[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h
> +**   ...
> +**   ret
> +*/
> +
> +int16x8_t foo2(int16_t x) 
> +{
> +  int16x8_t v = (int16x8_t) {x, 1, x, 1, x, 1, x, 1}; 
> +  return v;
> +}


Re: [PATCH] match.pd: Don't fold nan < x etc. for -ftrapping-math [PR106805]

2022-12-05 Thread Richard Biener via Gcc-patches



> On 05.12.2022 at 11:30, Jakub Jelinek via Gcc-patches wrote:
> 
> Hi!
> 
> As reported in the PR, the following pr106805.c testcase is miscompiled
> with the default -ftrapping-math, because we fold all the comparisons into
> constants and don't raise any exceptions.
> 
> The match.pd pattern handles just simple comparisons; of those,
> EQ/NE are quiet and don't raise exceptions on anything but sNaN, while
> GT/GE/LT/LE are signaling and do raise exceptions even on qNaN.
> 
> fold_relational_const handles this IMHO correctly:
>  /* Handle the cases where either operand is a NaN.  */
>  if (real_isnan (c0) || real_isnan (c1))
>{
>  switch (code)
>{
>case EQ_EXPR:
>case ORDERED_EXPR:
>  result = 0;
>  break;
> 
>case NE_EXPR:
>case UNORDERED_EXPR:
>case UNLT_EXPR:
>case UNLE_EXPR:
>case UNGT_EXPR:
>case UNGE_EXPR:
>case UNEQ_EXPR:
>  result = 1;
>  break;
> 
>case LT_EXPR:
>case LE_EXPR:
>case GT_EXPR:
>case GE_EXPR:
>case LTGT_EXPR:
>  if (flag_trapping_math)
>return NULL_TREE;
>  result = 0;
>  break;
> 
>default:
>  gcc_unreachable ();
>}
> 
>  return constant_boolean_node (result, type);
>}
> by folding the signaling comparisons only if -fno-trapping-math.
> The following patch does the same in match.pd.
> 
> Unfortunately the pr106805.c testcase still fails, but no longer because of
> match.pd, but on the trunk because of the still unresolved ranger problems
> (same issue as for fold-overflow-1.c etc.) and on 12 branch (and presumably
> trunk too) somewhere during expansion the comparisons are also expanded
> into constants (which is ok for -fno-trapping-math, but not ok with that).
> 
> Though, I think the patch is a small step in the direction, so I'd like
> to commit this patch without the gcc.dg/pr106805.c testcase for now.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, apart from the
> +FAIL: gcc.dg/pr106805.c execution test
> nothing else appears.  Ok for trunk without the last testcase?

Ok.

Richard 

> 2022-12-05  Jakub Jelinek  
> 
>PR middle-end/106805
>* match.pd (cmp @0 REAL_CST@1): Don't optimize x cmp NaN
>or NaN cmp x to false/true for cmp >/>=/</<= if -ftrapping-math.
>* c-c++-common/pr57371-4.c: Revert 2021-09-19 changes.
>* c-c++-common/pr57371-5.c: New test.
>* gcc.c-torture/execute/ieee/fp-cmp-6.x: Add -fno-trapping-math.
>* gcc.c-torture/execute/ieee/fp-cmp-9.c: New test. 
>* gcc.c-torture/execute/ieee/fp-cmp-9.x: New file.
>* gcc.dg/pr106805.c: New test.
> 
> --- gcc/match.pd.jj	2022-12-02 10:28:30.000000000 +0100
> +++ gcc/match.pd	2022-12-02 12:50:28.701735269 +0100
> @@ -5146,12 +5146,14 @@ (define_operator_list SYNC_FETCH_AND_AND
> (cmp { build_real (TREE_TYPE (@0), dconst0); } @1))
>/* x != NaN is always true, other ops are always false.  */
>(if (REAL_VALUE_ISNAN (TREE_REAL_CST (@1))
> +&& (cmp == EQ_EXPR || cmp == NE_EXPR || !flag_trapping_math)
>&& !tree_expr_signaling_nan_p (@1)
>&& !tree_expr_maybe_signaling_nan_p (@0))
> { constant_boolean_node (cmp == NE_EXPR, type); })
>/* NaN != y is always true, other ops are always false.  */
>(if (TREE_CODE (@0) == REAL_CST
>&& REAL_VALUE_ISNAN (TREE_REAL_CST (@0))
> +&& (cmp == EQ_EXPR || cmp == NE_EXPR || !flag_trapping_math)
> && !tree_expr_signaling_nan_p (@0)
> && !tree_expr_signaling_nan_p (@1))
> { constant_boolean_node (cmp == NE_EXPR, type); })
> --- gcc/testsuite/c-c++-common/pr57371-4.c.jj	2022-04-28 15:56:07.137787247 +0200
> +++ gcc/testsuite/c-c++-common/pr57371-4.c	2022-12-02 13:34:03.076882811 +0100
> @@ -13,25 +13,25 @@ void nonfinite(unsigned short x) {
>   {
> volatile int nonfinite_1;
> nonfinite_1 = (float) x > QNAN;
> -/* { dg-final { scan-tree-dump "nonfinite_1 = 0" "original" } } */
> +/* { dg-final { scan-tree-dump "nonfinite_1 = \\(float\\)" "original" } 
> } */
>   }
> 
>   {
> volatile int nonfinite_2;
> nonfinite_2 = (float) x >= QNAN;
> -/* { dg-final { scan-tree-dump "nonfinite_2 = 0" "original" } } */
> +/* { dg-final { scan-tree-dump "nonfinite_2 = \\(float\\)" "original" } 
> } */
>   }
> 
>   {
> volatile int nonfinite_3;
> nonfinite_3 = (float) x < QNAN;
> -/* { dg-final { scan-tree-dump "nonfinite_3 = 0" "original" } } */
> +/* { dg-final { scan-tree-dump "nonfinite_3 = \\(float\\)" "original" } 
> } */
>   }
> 
>   {
> volatile int nonfinite_4;
> nonfinite_4 = (float) x <= QNAN;
> -/* { dg-final { scan-tree-dump "nonfinite_4 = 0" "original" } } */
> +/* { dg-final { scan-tree-dump "nonfinite_4 = \\(float\\)" "original" } 
> } */
>   }
> 
>   {
> --- 

Re: [PATCH] testsuite, X86, Darwin: Fix bf16 ABI tests for Mach-O/macOS ABI.

2022-12-05 Thread Uros Bizjak via Gcc-patches
On Sun, Dec 4, 2022 at 9:30 PM Iain Sandoe  wrote:
>
>
>
> > On 4 Dec 2022, at 20:20, Uros Bizjak via Gcc-patches 
> >  wrote:
> >
> > On Sun, Dec 4, 2022 at 12:51 PM Iain Sandoe  wrote:
> >>
> >> This is almost a completely Darwin-local patch, but there is one (repeated)
> >> place where a general change is needed - which is in making xmm_regs and
> >> x87_regs extern in the three copies of args.h (this is consistent with the
> >> other saved vars).  These fails represent most of the current testsuite 
> >> noise
> >> on x86 Darwin.
> >>
> >> tested on x86-64 Darwin and Linux.
> >>
> >> OK for master?
> >> Iain
> >>
> >> -- >8 --
> >>
> >> These tests have failed since introduction since they assume that the
> >> assembler output is ELF and that the ABI targeted supports the addressing.
> >>
> >> For Darwin, Mach-O and ABI we need to make several changes:
> >> 1. Use the __USER_LABEL__PREFIX__
> >> 2. Remove the use of ELF-specific constructs (.size, .type etc.)
> >> 3. We cannot make direct access to common variables in the ABI, so that we
> >>   must move these to BSS.
> >>
> >> Since that set is quite significant, I elected to make a separate source
> >> section for Darwin.  This is introduced by #elif defined(__APPLE__) because
> >> __MACH__ is also used by HURD.
> >>
> >> There are potentially other X86 targets (e.g. XCOFF) that could have yet
> >> more changes, so I added a catchall section that #errors if the object 
> >> format
> >> is neither ELF or Mach-O.
> >>
> >> Signed-off-by: Iain Sandoe 
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.target/x86_64/abi/bf16/args.h:
> >>* gcc.target/x86_64/abi/bf16/asm-support.S:
> >>* gcc.target/x86_64/abi/bf16/m256bf16/args.h:
> >>* gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S:
> >>* gcc.target/x86_64/abi/bf16/m512bf16/args.h:
> >>* gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S:
> >
> > Missing descriptions in ChangeLog entry.
>
> oops, here:
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/x86_64/abi/bf16/args.h: Make xmm_regs, x87_regs extern.
> * gcc.target/x86_64/abi/bf16/m256bf16/args.h: Likewise.
> * gcc.target/x86_64/abi/bf16/m512bf16/args.h: Likewise.
> * gcc.target/x86_64/abi/bf16/asm-support.S: Add Mach-O variant.
> * gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Likewise.
> * gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Likewise.

Please note that in other directories asm-support-darwin.s is
introduced and included via .exp file. Is there a reason a different
approach is introduced here?

Uros.
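
For reference, the guard structure Iain describes reduces to this shape in
each asm-support file (a sketch of the layout, not the actual source):

    #if defined(__ELF__)
      /* ELF path: .type/.size directives and common variables as before.  */
    #elif defined(__APPLE__)
      /* Mach-O path: apply __USER_LABEL_PREFIX__, drop the ELF-only
         directives, and place the saved variables in BSS.  */
    #else
    # error "unsupported object format for this test"
    #endif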


[PATCH] match.pd: Don't fold nan < x etc. for -ftrapping-math [PR106805]

2022-12-05 Thread Jakub Jelinek via Gcc-patches
Hi!

As reported in the PR, the following pr106805.c testcase is miscompiled
with the default -ftrapping-math, because we fold all the comparisons into
constants and don't raise any exceptions.

The match.pd pattern handles just simple comparisons; of those,
EQ/NE are quiet and don't raise exceptions on anything but sNaN, while
GT/GE/LT/LE are signaling and do raise exceptions even on qNaN.
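
The difference is observable with fetestexcept; a minimal illustration
(not one of the new testcases):

    #include <fenv.h>
    #include <stdio.h>

    int main (void)
    {
      volatile float q = __builtin_nanf ("0");  /* quiet NaN */
      volatile float x = 1.0f;
      feclearexcept (FE_ALL_EXCEPT);
      volatile int r = x < q;  /* signaling comparison: raises FE_INVALID
                                  even for a quiet NaN */
      printf ("%d %d\n", r, fetestexcept (FE_INVALID) != 0);
      return 0;  /* prints: 0 1 */
    }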

fold_relational_const handles this IMHO correctly:
  /* Handle the cases where either operand is a NaN.  */
  if (real_isnan (c0) || real_isnan (c1))
{
  switch (code)
{
case EQ_EXPR:
case ORDERED_EXPR:
  result = 0;
  break;
   
case NE_EXPR:
case UNORDERED_EXPR:
case UNLT_EXPR:
case UNLE_EXPR:
case UNGT_EXPR:
case UNGE_EXPR:
case UNEQ_EXPR:
  result = 1;
  break;

case LT_EXPR:
case LE_EXPR:
case GT_EXPR:
case GE_EXPR:
case LTGT_EXPR:
  if (flag_trapping_math)
return NULL_TREE;
  result = 0;
  break;

default:
  gcc_unreachable ();
}

  return constant_boolean_node (result, type);
}
by folding the signaling comparisons only if -fno-trapping-math.
The following patch does the same in match.pd.

Unfortunately the pr106805.c testcase still fails, but no longer because of
match.pd, but on the trunk because of the still unresolved ranger problems
(same issue as for fold-overflow-1.c etc.) and on 12 branch (and presumably
trunk too) somewhere during expansion the comparisons are also expanded
into constants (which is ok for -fno-trapping-math, but not ok with that).

Though, I think the patch is a small step in the direction, so I'd like
to commit this patch without the gcc.dg/pr106805.c testcase for now.

Bootstrapped/regtested on x86_64-linux and i686-linux, apart from the
+FAIL: gcc.dg/pr106805.c execution test
nothing else appears.  Ok for trunk without the last testcase?

2022-12-05  Jakub Jelinek  

PR middle-end/106805
* match.pd (cmp @0 REAL_CST@1): Don't optimize x cmp NaN
or NaN cmp x to false/true for cmp >/>=/</<= if -ftrapping-math.
* c-c++-common/pr57371-4.c: Revert 2021-09-19 changes.
* c-c++-common/pr57371-5.c: New test.
* gcc.c-torture/execute/ieee/fp-cmp-6.x: Add -fno-trapping-math.
* gcc.c-torture/execute/ieee/fp-cmp-9.c: New test.
* gcc.c-torture/execute/ieee/fp-cmp-9.x: New file.
* gcc.dg/pr106805.c: New test.

--- gcc/match.pd.jj	2022-12-02 10:28:30.000000000 +0100
+++ gcc/match.pd	2022-12-02 12:50:28.701735269 +0100
@@ -5146,12 +5146,14 @@ (define_operator_list SYNC_FETCH_AND_AND
 (cmp { build_real (TREE_TYPE (@0), dconst0); } @1))
    /* x != NaN is always true, other ops are always false.  */
    (if (REAL_VALUE_ISNAN (TREE_REAL_CST (@1))
+	&& (cmp == EQ_EXPR || cmp == NE_EXPR || !flag_trapping_math)
	&& !tree_expr_signaling_nan_p (@1)
	&& !tree_expr_maybe_signaling_nan_p (@0))
     { constant_boolean_node (cmp == NE_EXPR, type); })
    /* NaN != y is always true, other ops are always false.  */
    (if (TREE_CODE (@0) == REAL_CST
	&& REAL_VALUE_ISNAN (TREE_REAL_CST (@0))
+	&& (cmp == EQ_EXPR || cmp == NE_EXPR || !flag_trapping_math)
	&& !tree_expr_signaling_nan_p (@0)
	&& !tree_expr_signaling_nan_p (@1))
     { constant_boolean_node (cmp == NE_EXPR, type); })
--- gcc/testsuite/c-c++-common/pr57371-4.c.jj	2022-04-28 15:56:07.137787247 +0200
+++ gcc/testsuite/c-c++-common/pr57371-4.c	2022-12-02 13:34:03.076882811 +0100
@@ -13,25 +13,25 @@ void nonfinite(unsigned short x) {
   {
     volatile int nonfinite_1;
     nonfinite_1 = (float) x > QNAN;
-/* { dg-final { scan-tree-dump "nonfinite_1 = 0" "original" } } */
+/* { dg-final { scan-tree-dump "nonfinite_1 = \\(float\\)" "original" } } 
*/
   }
 
   {
 volatile int nonfinite_2;
 nonfinite_2 = (float) x >= QNAN;
-/* { dg-final { scan-tree-dump "nonfinite_2 = 0" "original" } } */
+/* { dg-final { scan-tree-dump "nonfinite_2 = \\(float\\)" "original" } } 
*/
   }
 
   {
 volatile int nonfinite_3;
 nonfinite_3 = (float) x < QNAN;
-/* { dg-final { scan-tree-dump "nonfinite_3 = 0" "original" } } */
+/* { dg-final { scan-tree-dump "nonfinite_3 = \\(float\\)" "original" } } 
*/
   }
 
   {
 volatile int nonfinite_4;
 nonfinite_4 = (float) x <= QNAN;
-/* { dg-final { scan-tree-dump "nonfinite_4 = 0" "original" } } */
+/* { dg-final { scan-tree-dump "nonfinite_4 = \\(float\\)" "original" } } 
*/
   }
 
   {
--- gcc/testsuite/c-c++-common/pr57371-5.c.jj   2022-12-02 13:34:14.397714696 
+0100
+++ gcc/testsuite/c-c++-common/pr57371-5.c  2022-12-02 13:35:05.915949599 
+0100
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-signaling-nans -fno-trapping-math -fdump-tree-original" } */
+
+/* We can not get rid of comparison in tests below because of
+   pending NaN exceptions.
+
+   TODO: avoid under -fno-trapping-math.  */
+
+#define QNAN __builtin_nanf ("0")
+
+void nonfinite(unsigned short x) {
+  {
+volatile int nonfinite_1;
+nonfinite_1 = (float) x > QNAN;
+/* { dg-final { scan-tree-dump "nonfinite_1 = 0" "original" } } */
+  }
+
+  {
+volatile int nonfinite_2;
+nonfinite_2 = (float) x >= QNAN;
+/* { dg-final { scan-tree-dump "nonfinite_2 = 0" "original" } } */
+  }
+
+  {
+volatile int nonfinite_3;
+nonfinite_3 = (float) x < QNAN;
+/* { dg-final { scan-tree-dump "nonfinite_3 = 0" "original" } } */
+  }
+
+  {
+volatile int nonfinite_4;
+nonfinite_4 = (float) x <= QNAN;
+/* { dg-final { scan-tree-dump "nonfinite_4 = 0" "original" } } */
+  }
+
+  {
+volatile int nonfinite_11;
+nonfinite_11 = (float) x == QNAN;
+/* { dg-final { scan-tree-dump "nonfinite_11 = 0" "original" } } */
+  }
+
+  {
+volatile int nonfinite_12;
+nonfinite_12 = (float) x != QNAN;
+/* { dg-final { scan-tree-dump "nonfinite_12 = 1" "original" } } */
+  }
+}
--- gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-6.x.jj	2020-01-14 20:02:47.175603962 +0100
+++ gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-6.x	2022-12-05 10:44:10.037871848 +0100
@@ -1,3 +1,4 @@
+lappend additional_flags "-fno-trapping-math"
 # The ARM VxWorks kernel uses an external floating-point library in
 # which 

Re: [PATCH] range-op-float: Fix up multiplication and division reverse operation [PR107879]

2022-12-05 Thread Aldy Hernandez via Gcc-patches




On 12/5/22 10:37, Jakub Jelinek wrote:

On Mon, Dec 05, 2022 at 10:20:53AM +0100, Aldy Hernandez wrote:

For the division, [-0., 0.] / VARYING is computed (IMHO correctly)
as [-0., 0.] +-NAN, because 0 / anything but 0 or NAN is still
0 and 0 / 0 is NAN and ditto 0 / NAN.  And then we just
float_binary_op_range_finish, which figures out that because lhs
can't be NAN, neither operand can be NAN.  So, the end range is
[-0., 0.].  But that is not correct for the reverse multiplication.
When the result is 0, if op2 can be zero, then x can be anything
(VARYING), to be precise anything but INF (unless result can be NAN),


Not an objection, just an observation... If we know it can't be INF, could
we drop INF from the range?  We have frange_drop_{inf,ninf} for this.


Do you mind if I try that incrementally and only if it doesn't make the
code too large/too unreadable?


Sure.  And don't feel obligated to implement it either.  range-ops is a 
never ending pit of possible optimizations.  We found that out quite 
early in the design.


If you don't get to it, could you at least add a comment?

Aldy




+   // If both lhs and op2 could be zeros or both could be infinities,
+   // we don't know anything about op1 except maybe for the sign
+   // and perhaps if it can be NAN or not.
+   REAL_VALUE_TYPE lb, ub;
+   int signbit_known = signbit_known_p (lhs_lb, lhs_ub, op2_lb, op2_ub);
+   zero_to_inf_range (lb, ub, signbit_known);
+   r.set (type, lb, ub);


I assume we don't know anything about the sign of the NAN because of all the
weird IEEE rules?


Yes, sign bit of NAN is unknown after binary operation or even in the
reverse case of binary operations.
The sign rule is for non-NAN.

"When neither the inputs nor result are NaN, the sign of a product or quotient 
is the exclusive OR of the
operands' signs; the sign of a sum, or of a difference x–y regarded as a sum 
x+(–y), differs from at most one of
the addends' signs; and the sign of the result of the roundToIntegral 
operations and roundToIntegralExact (see
7.3.1) is the sign of the operand. These rules shall apply even when operands 
or results are zero or
infinite."

Jakub





Re: [PATCH] range-op-float: Fix up multiplication and division reverse operation [PR107879]

2022-12-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Dec 05, 2022 at 10:20:53AM +0100, Aldy Hernandez wrote:
> > For the division, [-0., 0.] / VARYING is computed (IMHO correctly)
> > as [-0., 0.] +-NAN, because 0 / anything but 0 or NAN is still
> > 0 and 0 / 0 is NAN and ditto 0 / NAN.  And then we just
> > float_binary_op_range_finish, which figures out that because lhs
> > can't be NAN, neither operand can be NAN.  So, the end range is
> > [-0., 0.].  But that is not correct for the reverse multiplication.
> > When the result is 0, if op2 can be zero, then x can be anything
> > (VARYING), to be precise anything but INF (unless result can be NAN),
> 
> Not an objection, just an observation... If we know it can't be INF, could
> we drop INF from the range?  We have frange_drop_{inf,ninf} for this.

Do you mind if I try that incrementally and only if it doesn't make the
code too large/too unreadable?

> > +   // If both lhs and op2 could be zeros or both could be infinities,
> > +   // we don't know anything about op1 except maybe for the sign
> > +   // and perhaps if it can be NAN or not.
> > +   REAL_VALUE_TYPE lb, ub;
> > +   int signbit_known = signbit_known_p (lhs_lb, lhs_ub, op2_lb, op2_ub);
> > +   zero_to_inf_range (lb, ub, signbit_known);
> > +   r.set (type, lb, ub);
> 
> I assume we don't know anything about the sign of the NAN because of all the
> weird IEEE rules?

Yes, sign bit of NAN is unknown after binary operation or even in the
reverse case of binary operations.
The sign rule is for non-NAN.

"When neither the inputs nor result are NaN, the sign of a product or quotient 
is the exclusive OR of the
operands' signs; the sign of a sum, or of a difference x–y regarded as a sum 
x+(–y), differs from at most one of
the addends' signs; and the sign of the result of the roundToIntegral 
operations and roundToIntegralExact (see
7.3.1) is the sign of the operand. These rules shall apply even when operands 
or results are zero or
infinite."

Jakub
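
That XOR rule is easy to sanity-check, zeros included (an illustration
only, not part of the patch):

    #include <math.h>
    #include <stdio.h>

    int main (void)
    {
      double a = -0.0, b = 3.0;
      int prod = !!signbit (a * b);              /* product is -0.0 */
      int xored = !!signbit (a) ^ !!signbit (b);
      printf ("%d\n", prod == xored);  /* prints: 1 */
      return 0;
    }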



Re: [PATCH] range-op-float: Fix up multiplication and division reverse operation [PR107879]

2022-12-05 Thread Aldy Hernandez via Gcc-patches




On 11/29/22 10:43, Jakub Jelinek wrote:

Hi!

While for the normal cases it seems to be correct to implement
reverse multiplication (op1_range/op2_range) through division
with float_binary_op_range_finish, reverse division (op1_range)
through multiplication with float_binary_op_range_finish or
(op2_range) through division with float_binary_op_range_finish,
as e.g. following testcase shows for the corner cases it is
incorrect.
Say on the testcase we are doing reverse multiplication, we
have [-0., 0.] range (no NAN) on lhs and VARYING on op1 (or op2).
We implement that through division, because x from
lhs = x * op2
is
x = lhs / op2
For the division, [-0., 0.] / VARYING is computed (IMHO correctly)
as [-0., 0.] +-NAN, because 0 / anything but 0 or NAN is still
0 and 0 / 0 is NAN and ditto 0 / NAN.  And then we just
float_binary_op_range_finish, which figures out that because lhs
can't be NAN, neither operand can be NAN.  So, the end range is
[-0., 0.].  But that is not correct for the reverse multiplication.
When the result is 0, if op2 can be zero, then x can be anything
(VARYING), to be precise anything but INF (unless result can be NAN),


Not an objection, just an observation... If we know it can't be INF, 
could we drop INF from the range?  We have frange_drop_{inf,ninf} for this.



because anything * 0 is 0 (or NAN for INF).  While if op2 must be
non-zero, then x must be 0.  Of course the sign logic
(signbit(x) = signbit(lhs) ^ signbit(op2)) still holds, so it actually
isn't full VARYING if both lhs and op2 have known sign bits.
And going through other corner cases one by one shows other differences
between what we compute for the corresponding forward operation and
what we should compute for the reverse operations.
The following patch is slightly conservative and includes INF
(in case of result including 0 and not NAN) in the ranges or
0 in the ranges (in case of result including INF and not NAN).
The latter is what happens anyway because we flush denormals to 0,
and the former just not to deal with all the corner cases.
So, the end test is that for reverse multiplication and division
op2_range the cases we need to adjust to VARYING or VARYING positive
or VARYING negative are if lhs and op? ranges both contain 0,
or both contain some infinity, while for division op1_range the
corner case is if lhs range contains 0 and op2 range contains INF or vice
versa.  Otherwise I believe ranges from the corresponding operation
are ok, or could be slightly more conservative (e.g. for
reverse multiplication, if op? range is singleton INF and lhs
range doesn't include any INF, then x's range should be UNDEFINED or
known NAN (depending on if lhs can be NAN), while the division computes
[-0., 0.] +-NAN; or similarly if op? range is only 0 and lhs range
doesn't include 0, division would compute +INF +-NAN, or -INF +-NAN,
or (for lack of multipart franges -INF +INF +-NAN just VARYING +-NAN),
while again it is UNDEFINED or known NAN.

Oh, and I found by code inspection wrong condition for the division's
known NAN result, due to thinko it would trigger not just when
both operands are known to be 0 or both are known to be INF, but
when either both are known to be 0, or at least one is known to be INF.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-29  Jakub Jelinek  

PR tree-optimization/107879
* range-op-float.cc (foperator_mult::op1_range): If both
lhs and op2 ranges contain zero or both ranges contain
some infinity, set r range to zero_to_inf_range depending on
signbit_known_p.
(foperator_div::op2_range): Similarly for lhs and op1 ranges.
(foperator_div::op1_range): If lhs range contains zero and op2
range contains some infinity or vice versa, set r range to
zero_to_inf_range depending on signbit_known_p.
(foperator_div::rv_fold): Fix up condition for returning known NAN.

--- gcc/range-op-float.cc.jj	2022-11-18 09:00:44.371322999 +0100
+++ gcc/range-op-float.cc   2022-11-28 19:45:50.347869350 +0100
@@ -2143,8 +2143,30 @@ public:
     range_op_handler rdiv (RDIV_EXPR, type);
     if (!rdiv)
       return false;
-    return float_binary_op_range_finish (rdiv.fold_range (r, type, lhs, op2),
-					 r, type, lhs);
+    bool ret = rdiv.fold_range (r, type, lhs, op2);
+    if (ret == false)
+      return false;
+    const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
+    const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
+    const REAL_VALUE_TYPE &op2_lb = op2.lower_bound ();
+    const REAL_VALUE_TYPE &op2_ub = op2.upper_bound ();
+    if ((contains_zero_p (lhs_lb, lhs_ub) && contains_zero_p (op2_lb, op2_ub))
+	|| ((real_isinf (&lhs_lb) || real_isinf (&lhs_ub))
+	    && (real_isinf (&op2_lb) || real_isinf (&op2_ub))))
+      {
+	// If both lhs and op2 could be zeros or both could be infinities,
+	// we don't know anything about op1 except maybe for the sign
+	//

Re: [PATCH v2] Return a NULL rtx when targets don't support cbranchcc4 or predicate check fails in prepare_cmp_insn

2022-12-05 Thread Richard Biener via Gcc-patches
On Mon, Dec 5, 2022 at 9:43 AM HAO CHEN GUI  wrote:
>
> Hi Richard,
>
On 2022/12/5 15:31, Richard Biener wrote:
> > I wonder if you have a testcase you can add showing this change is
> > worthwhile and
> > fixes a bug?
>
> I want to enable cbranchcc4 on rs6000, but not all CC submodes are
> supported on rs6000, so the predicate check (an assert) fails and it
> hits an ICE.  I drafted two patches: this one is for the generic code,
> and another is for rs6000.  If this one is committed, cbranchcc4 can be
> enabled on rs6000, and then I can create a testcase that makes the
> predicate check fail.  Right now I can't write a testcase for it, as it
> never reaches the failure path.

Fair enough.


> Thanks a lot
> Gui Haochen
>


Re: [PATCH v2] Return a NULL rtx when targets don't support cbranchcc4 or predicate check fails in prepare_cmp_insn

2022-12-05 Thread HAO CHEN GUI via Gcc-patches
Hi Richard,

On 2022/12/5 15:31, Richard Biener wrote:
> I wonder if you have a testcase you can add showing this change is
> worthwhile and
> fixes a bug?

I want to enable cbranchcc4 on rs6000, but not all CC submodes are
supported on rs6000, so the predicate check (an assert) fails and it
hits an ICE.  I drafted two patches: this one is for the generic code,
and another is for rs6000.  If this one is committed, cbranchcc4 can be
enabled on rs6000, and then I can create a testcase that makes the
predicate check fail.  Right now I can't write a testcase for it, as it
never reaches the failure path.

Thanks a lot
Gui Haochen
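
For reference, the shape of the generic change under discussion, inferred
from the subject line since the patch itself is not quoted in this digest
(a sketch, not the actual patch text):

    /* In prepare_cmp_insn, for MODE_CC comparisons: fail softly instead
       of asserting, so callers can fall back when the target's
       cbranchcc4 predicate rejects the comparison.  */
    enum insn_code icode = optab_handler (cbranch_optab, CCmode);
    test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
    if (icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, test))
      {
        *ptest = test;
        return;
      }
    *ptest = NULL_RTX;  /* previously a gcc_assert here */
    return;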



[PATCH] plugins/107964 - install contracts.h

2022-12-05 Thread rguenther--- via Gcc-patches
contracts.h is included by cp-tree.h so needs to be installed for
plugins.

Pushed as obvious.

PR plugins/107964
gcc/cp/
* Make-lang.in (CP_PLUGIN_HEADERS): Install contracts.h.
---
 gcc/cp/Make-lang.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index 75e2f7c7ba3..b3131964602 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -39,7 +39,7 @@ CXX_INSTALL_NAME := $(shell echo c++|sed 
'$(program_transform_name)')
 GXX_INSTALL_NAME := $(shell echo g++|sed '$(program_transform_name)')
 CXX_TARGET_INSTALL_NAME := $(target_noncanonical)-$(shell echo c++|sed 
'$(program_transform_name)')
 GXX_TARGET_INSTALL_NAME := $(target_noncanonical)-$(shell echo g++|sed 
'$(program_transform_name)')
-CP_PLUGIN_HEADERS := cp-tree.h cxx-pretty-print.h name-lookup.h type-utils.h 
operators.def cp-trait.def
+CP_PLUGIN_HEADERS := cp-tree.h cxx-pretty-print.h name-lookup.h type-utils.h 
operators.def cp-trait.def contracts.h
 
 #
 # Define the names for selecting c++ in LANGUAGES.
-- 
2.35.3


[PATCH] tree-optimization/107956 - ICE with NULL call LHS

2022-12-05 Thread rguenther--- via Gcc-patches
The following adds a missing check for a NULL call LHS in the
vector pattern recognizer.

Pushed as obvious.

PR tree-optimization/107956
* tree-vect-patterns.cc (vect_recog_mask_conversion_pattern):
Check for NULL LHS on masked loads.
---
 gcc/tree-vect-patterns.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index f6c34bb3263..d9fdb24ee62 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4964,6 +4964,8 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
   else
{
  lhs = gimple_call_lhs (last_stmt);
+ if (!lhs)
+   return NULL;
  vectype1 = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
}
 
-- 
2.35.3