Re: Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.)

2022-10-17 Thread Aldy Hernandez via Gcc-patches
On Mon, Oct 17, 2022 at 4:47 PM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2022-10-17T15:58:47+0200, Aldy Hernandez  wrote:
> > On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge wrote:
> >> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches wrote:
> >> > When solving 0 = _15 & 1, we calculate _15 as:
> >> >
> >> >   [irange] int [-INF, -2][0, +INF] NONZERO 0xfffe
> >> >
> >> > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
> >> > the above, yielding:
> >> >
> >> >   [0, 1] NONZERO 0x0
> >> >
> >> > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
> >> >
> >> > This is problematic because here we have a bool which is zero, but
> >> > returns false for irange::zero_p, since the latter does not look at
> >> > nonzero bits.  This causes logical_combine to assume the range is
> >> > not-zero, and all hell breaks loose.
> >> >
> >> > I think we should just normalize a nonzero mask of 0 to [0, 0] at
> >> > creation, thus avoiding all this.
> >>
> >> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> >> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
> >> offloading test case:
> >>
> >> UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
> >> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
> >> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> >> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
> >>
> >> Same for C++.
> >>
> >> I'll later send a patch (for the test case!) to fix that up.
> >>
> >> 2. Looking into this, I found that this
> >> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> >> "[PR107195] Set range to zero when nonzero mask is 0" actually enables a
> >> code transformation/optimization that GCC apparently has not been doing
> >> before!  I've tried to capture that in the attached
> >> "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".
> >
> > Nice.
> >
> >> Will you please verify that one?  In its current '#if 1' configuration,
> >> it's all-PASS after commit
> >> r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> >> "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we
> >> get two calls to 'foo', because GCC apparently didn't understand the
> >> relation (optimization opportunity) between 'r *= 2;' and the subsequent
> >> 'if (r & 1)'.
> >
> > Yeah, that looks correct.  We keep better track of nonzero masks.
>
> OK, next observation: this also works for split-up expressions
> 'if ((r & 2) && (r & 1))' (same rationale as for 'if (r & 1)' alone).
> I've added such a variant in my test case.

Unless I'm missing something, your testcase doesn't have a body for
foo[123], so GCC has no way to know what any of those functions did or
what bits are set/unset.
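
For readers without the attachment, the relation quoted above can be sketched as follows. This is a hypothetical reduction, not the actual pr107195 test case: the names `f`, `foo`, and `foo_calls` are made up here, `foo` is given a counting body so the result can be checked at run time, and `unsigned` is used only to sidestep signed overflow in the sketch.

```c
/* Hypothetical reduction of the 'r *= 2; if (r & 1)' relation.
   Not the real test case; foo has a body here purely for counting.  */
static int foo_calls;

static void foo(void)
{
  foo_calls++;
}

static unsigned f(unsigned r)
{
  r *= 2;                    /* bit 0 of r is now known to be clear */
  if (r & 1)                 /* can never be true */
    foo();
  if ((r & 2) && (r & 1))    /* split-up variant: the second operand is never true */
    foo();
  return r;
}
```

Whatever value is passed in, neither call to `foo` can execute, which is the property the dump scan in such a test would presumably check for.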

>
> But: it doesn't work for logically equal 'if (r & 3)'.  I've added such
> an XFAILed variant in my test case.  Do you have guidance what needs to
> be done to make such cases work, too?
>
> >> I've left in the other '#if' variants in case you'd like to experiment
> >> with these, but would otherwise clean that up before pushing.
> >>
> >> Where does one put such a test case?
> >>
> >> Should the file be named 'pr107195' or something else?
> >
> > The aforementioned patch already has:
> >
> > * gcc.dg/tree-ssa/pr107195-1.c: New test.
> > * gcc.dg/tree-ssa/pr107195-2.c: New test.
> >
> > So I would just add a pr107195-3.c test.
>
> But note that unlike yours in 'gcc.dg/tree-ssa/', I had put mine into
> 'c-c++-common/torture/'.  That's so that we get C and C++ testing, and
> all torture testing flag variants.  (... where we do see the optimization
> happen starting at '-O1'.)  Do you think that is excessive, and a single
> 'gcc.dg/tree-ssa/' test case, C only, '-O1' only is sufficient for this?
> (I don't have much experience with test cases in such regions of GCC,
> hence these questions.)

My personal preference is tree-ssa since they are middle end tests.
Also, since we're testing ranger, it primarily runs in DOM, VRP, evrp,
and the backward threader, so no need to run it at multiple
optimization levels.

I suggested DOM, because I know ranger runs within DOM, so if the
transformation is seen at -O1, it's likely to be done there.  Also,
evrp/VRP don't run at -O1, so that's another hint it happened in DOM.
This is a guess though, it could've been CCP setting a nonzero mask,
which then ranger/DOM picked up.
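
As background for the PR107195 fix discussed in this thread, here is a minimal sketch of why a nonzero mask of 0 needs normalizing. It is a toy model only: `toy_range` and its helpers are hypothetical names, not GCC's `irange` API, and the multi-part range from the original report is simplified to its non-negative subrange.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for an integer range plus a mask of possibly-nonzero bits.
   Hypothetical; this is not GCC's irange.  */
struct toy_range {
  int64_t lo, hi;     /* inclusive bounds */
  uint64_t nonzero;   /* bits that may be set in the value */
};

/* Bounds-only zero test, mirroring the described behaviour of a zero_p
   that does not look at nonzero bits.  */
static bool toy_zero_p(struct toy_range r)
{
  return r.lo == 0 && r.hi == 0;
}

/* Intersect bounds and masks without any normalization.  */
static struct toy_range toy_intersect_raw(struct toy_range a, struct toy_range b)
{
  struct toy_range r;
  r.lo = a.lo > b.lo ? a.lo : b.lo;
  r.hi = a.hi < b.hi ? a.hi : b.hi;
  r.nonzero = a.nonzero & b.nonzero;
  return r;
}

/* The proposed normalization: if no bit may be set, the value is 0.  */
static struct toy_range toy_normalize(struct toy_range r)
{
  if (r.nonzero == 0)
    r.lo = r.hi = 0;
  return r;
}
```

Intersecting `[0, 1] NONZERO 0x1` with a range whose mask is `0xfffe` leaves bounds `[0, 1]` and mask 0; the bounds-only test then reports "not zero" until the range is normalized to `[0, 0]`.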

All in 

Re: [PATCH v2] xtensa: Prepare the transition from Reload to LRA

2022-10-17 Thread Max Filippov via Gcc-patches
On Mon, Oct 17, 2022 at 7:57 PM Takayuki 'January June' Suwa
 wrote:
> On 2022/10/16 14:03, Max Filippov wrote:
> > There's also the following runtime failures, but only on call0 configuration:
> >
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O1  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O3 -g  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -Os  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
>
> both assembler outputs with and without this patch are identical on my side

Interesting. In -O1 test I see the following difference that is going to affect
the return value of the corresponding functions:

--- gcc-13-3308-gb4a4c6382b14-call0-le/20010122-1.s  2022-10-17 20:07:32.390363204 -0700
+++ gcc-13-3309-g851636ecd015-call0-le/20010122-1.s  2022-10-17 20:06:36.613785546 -0700
@@ -143,13 +143,10 @@
test2:
   addi    sp, sp, -16
   s32i.n  a0, sp, 12
-   s32i.n  a12, sp, 8
-   mov.n   a12, a0
   l32r    a2, .LC6
   callx0  a2
-   mov.n   a2, a12
+   mov.n   a2, a0
   l32i.n  a0, sp, 12
-   l32i.n  a12, sp, 8
   addi    sp, sp, 16
   ret.n
   .size   test2, .-test2
@@ -161,13 +158,10 @@
test3:
   addi    sp, sp, -16
   s32i.n  a0, sp, 12
-   s32i.n  a12, sp, 8
-   mov.n   a12, a0
   l32r    a2, .LC7
   callx0  a2
-   mov.n   a2, a12
+   mov.n   a2, a0
   l32i.n  a0, sp, 12
-   l32i.n  a12, sp, 8
   addi    sp, sp, 16
   ret.n
   .size   test3, .-test3
@@ -258,14 +252,11 @@
test8:
   addi    sp, sp, -16
   s32i.n  a0, sp, 12
-   s32i.n  a12, sp, 8
-   mov.n   a12, a0
   l32r    a2, .LC12
   callx0  a2
   l32r    a2, .LC13
-   s32i.n  a12, a2, 0
+   s32i.n  a0, a2, 0
   l32i.n  a0, sp, 12
-   l32i.n  a12, sp, 8
   addi    sp, sp, 16
   ret.n
   .size   test8, .-test8
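
One way to read the diff above, offered as an assumption rather than a confirmed diagnosis: under the call0 ABI, callx0 writes its own return address into a0, so copying a0 after the call no longer yields the function's incoming return address, whereas the older code preserved it in a12 across the call. A minimal C sketch of the property involved, assuming the failing test exercises `__builtin_return_address` (a GCC/Clang builtin); the function names are hypothetical and this is not the code of 20010122-1.c:

```c
/* Hypothetical callee; noinline so that a real call is emitted.  */
__attribute__((noinline)) static void clobber(void)
{
  __asm__ volatile("" ::: "memory");
}

/* The function's own return address must be the same before and
   after it makes a call of its own.  */
__attribute__((noinline)) static int return_address_survives_call(void)
{
  void *before = __builtin_return_address(0);
  clobber();
  void *after = __builtin_return_address(0);
  return before == after;
}
```

A correct compiler must make this return 1; reading the value from a register that the call itself overwrites would break it.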

-- 
Thanks.
-- Max


PING [PATCH v5 0/2] IBM zSystems: Improve storing asan frame_pc

2022-10-17 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2022-09-27 at 02:23 +0200, Ilya Leoshkevich wrote:
> Hi,
> 
> This is a resend of v4 with slightly adjusted commit messages:
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
> v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
> v4: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html
> 
> It still survives the bootstrap and the regtest on x86_64-redhat-linux,
> s390x-redhat-linux and ppc64le-redhat-linux.  It also fixes [1].
> 
> I also tried the approach with moving .LASANPC closer to the function
> label and using FUNCTION_BOUNDARY instead of introducing
> CODE_LABEL_BOUNDARY, but the problem there is that it's hard to catch
> the moment where the function label is written.  Architectures can do
> it by calling ASM_OUTPUT_LABEL() or assemble_name() in
> ASM_DECLARE_FUNCTION_NAME(), ASM_OUTPUT_FUNCTION_LABEL() or
> TARGET_ASM_FUNCTION_PROLOGUE().  epiphany_start_function() does that
> twice, but passes the same decl to both calls.  Note that simply
> moving asan_function_start() to final_start_function_1() is not enough,
> since an architecture can write something after the function label.
> This all means that for this approach to work, all the architectures
> need to be adjusted, which looks like an overkill to me.
> 
> Best regards,
> Ilya
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593666.html
> 
> 
> Ilya Leoshkevich (2):
>   asan: specify alignment for LASANPC labels
>   IBM zSystems: Define CODE_LABEL_BOUNDARY
> 
>  gcc/asan.cc    |  1 +
>  gcc/config/s390/s390.h |  3 +++
>  gcc/defaults.h |  5 +
>  gcc/doc/tm.texi    |  4 
>  gcc/doc/tm.texi.in |  4 
>  gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++
>  6 files changed, 32 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
> 



[COMMITTED] PR tree-optimization/107273 - Merge partial relation precisions properly.

2022-10-17 Thread Andrew MacLeod via Gcc-patches
When a partial equivalency record is merged, the existing members are 
updated.  The resulting PE size for each member should be the minimum of 
what it was, and the size of the object it is now based on.  The code 
was simply setting it to the new size, which sometimes overwrote the 
correct result.
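
The difference between overwriting and merging can be illustrated with a toy model. Everything here is hypothetical: precisions are plain bit counts instead of GCC's relation kinds, and `toy_partial`, `toy_merge_overwrite`, and `toy_merge_min` are made-up names, not the `equiv_oracle` interface.

```c
/* Toy record for one member of a partial equivalence group.  */
struct toy_partial {
  int base;   /* hypothetical id of the name this member is based on */
  int bits;   /* precision of the partial equivalence, e.g. 8, 16, 32 */
};

static int toy_min(int a, int b)
{
  return a < b ? a : b;
}

/* Behaviour as described before the fix: take the new precision as-is.  */
static void toy_merge_overwrite(struct toy_partial *m, int n,
                                int new_base, int new_bits)
{
  for (int i = 0; i < n; i++) {
    m[i].base = new_base;
    m[i].bits = new_bits;
  }
}

/* Behaviour as described after the fix: keep the smaller precision.  */
static void toy_merge_min(struct toy_partial *m, int n,
                          int new_base, int new_bits)
{
  for (int i = 0; i < n; i++) {
    m[i].base = new_base;
    m[i].bits = toy_min(m[i].bits, new_bits);
  }
}
```

With members at 8 and 32 bits merged into a 16-bit record, overwriting wrongly widens the 8-bit member to 16 bits, while the minimum keeps it at 8.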


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
commit 0205fbb91be022055c632973caa95e398b33db39
Author: Andrew MacLeod 
Date:   Mon Oct 17 19:00:49 2022 -0400

Merge partial relation precisions properly

When merging 2 groups of PE's, one group was simply being set to the
other instead of properly merging them.

PR tree-optimization/107273
gcc/
* value-relation.cc (equiv_oracle::add_partial_equiv): Merge
instead of copying precision of each member.

gcc/testsuite/
* gcc.dg/tree-ssa/pr107273-1.c: New.
* gcc.dg/tree-ssa/pr107273-2.c: New.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107273-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107273-1.c
new file mode 100644
index 000..db2e2c0da55
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107273-1.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+int printf(const char *, ...);
+int a[1] = {1};
+short b, c = 5500;
+int d;
+long e;
+char f = 1;
+int main() {
+  while (1) {
+    long g = b < 1;
+    e = g;
+    break;
+  }
+  for (; f; f--) {
+    if (e) {
+      d = -(6L | -(c & 1000));
+    }
+    char h = d;
+    if (b)
+      b = 0;
+    if (d < 200)
+      while (1)
+        printf("%d", a[c]);
+    short i = h * 210;
+    c = i;
+  }
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107273-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107273-2.c
new file mode 100644
index 000..337450782d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107273-2.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-Os" } */
+
+int a, d, f;
+char b, g;
+unsigned i;
+int main() {
+  int c = 300, h = 40;
+  char e = 1;
+  for (; a < 1; a++) {
+    c = ~((i - ~c) | e);
+  L1:
+    e = f = c;
+    if (c)
+      if (c > -200)
+        e = g % (1 << h);
+    char k = 0;
+  L2:;
+  }
+  if (b) {
+    if (d)
+      goto L2;
+    if (!b)
+      goto L1;
+  }
+  return 0;
+}
diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index fed8a78723c..178a245f41a 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -380,7 +380,7 @@ equiv_oracle::add_partial_equiv (relation_kind r, tree op1, tree op2)
   EXECUTE_IF_SET_IN_BITMAP (pe1.members, 0, x, bi)
 	{
 	  m_partial[x].ssa_base = op2;
-	  m_partial[x].code = pe2.code;
+	  m_partial[x].code = pe_min (m_partial[x].code, pe2.code);
 	}
   bitmap_set_bit (pe1.members, v2);
   return;


Re: [PATCH] C++ API database

2022-10-17 Thread Jason Merrill via Gcc-patches

On 9/28/22 12:59, Ulrich Drepper via Gcc-patches wrote:

Ping.  Anyone having problems with this?  And the governance of the file?


Hmm, for some reason this didn't show up on my C++ patches filter. 
Please do CC me when pinging C++ patches.



On Mon, Sep 12, 2022 at 1:51 PM Ulrich Drepper  wrote:


After my prior inquiry into the use of python as a build tool for
maintainers didn't produce any negative comments and several active and
even enthusiastic support messages I'm going forward with submitting the
patch.

To repeat the detail, for the generation of the upcoming C++ standard
library module and the hints for missing definitions/declarations in the
std:: namespace we need a list of standard C++ APIs.  The information
needed for the two use cases is different but the actual APIs overlap
almost completely and therefore it would be a bad idea to have the data
separated.

We could opt for a file format that is easy to read in awk and writing the
appropriate scripts to transform the data into the appropriate output
format but this looks ugly, is hard to understand, and a nightmare to
maintain.  On the other hand, writing the code in Python is simple and
clean.


Therefore, Jonathan and I worked on a CSV file which contains the
necessary information and a Python script to create the gperf input file to
generate std-name-hint.h and also, in future, the complete source of the
export interface description for the standard library module.  This mode is
not yet used because the module support isn't ready yet.  The output file
corresponds to the hand-coded version of the export code Jonathan uses
right now.

Note that in both of these cases the generated files are static, they
don't depend on the local configuration and therefore are checked into the
source code repository.  The script only has to run if the generated files
are explicitly removed or, in maintainer mode, if the CSV file has
changed.  For normal compilation from a healthy source code tree the tool
is not needed.


One remaining issue is the responsibility for the CSV file.  The file
needs to live in the directory of the frontend and therefore nominally
changes need to be approved by the frontend maintainers.  The content
entirely consists of information from the standard library, though.  Any
change that doesn't break the build on one machine (i.e., the Python script
doesn't fail) will not cause any problem because the output format of the
script is correct.  Therefore we have been wondering whether the CSV file
should at least have shared ownership between the frontend maintainers and
the libstdc++ maintainers.


That makes sense; the file could say something to that effect.  Or the 
CSV file could live in the library directory, or a third directory.  And 
maybe separate the two generators; it seems like the code shared between 
them is pretty small.



The CSV file contains more hint information than the old hand-coded .gperf
file.  So, an additional effect of this patch is the extension of the hints
that are provided but given that the lookup is now fast this shouldn't have
any negative impact.  The file is not complete, though, this will come over
time and definitely before the module support is done.

I built my complete set of compilers with this patch without problems.

Any comments?


Generally, looks good.

The CSV file could use a header row documenting the fields (as well as 
the documentation in the script).



+# This is the file that depends in the generated header file.


s/in/on/

Jason



Re: [PATCH] Add condition coverage profiling

2022-10-17 Thread Hans-Peter Nilsson
On Wed, 12 Oct 2022, Jørgen Kvalsvik via Gcc-patches wrote:
> This patch adds support in gcc+gcov for modified condition/decision
> coverage (MC/DC) with the -fprofile-conditions flag.

I'd love improvements in this area.

But this is a serious concern:

> gcov --conditions:
> 
> 3:   17:void fn (int a, int b, int c, int d) {
> 3:   18:if ((a && (b || c)) && d)
> condition outcomes covered 3/8
> condition  0 not covered (true false)
> condition  1 not covered (true)
> condition  2 not covered (true)
> condition  3 not covered (true)
> 1:   19:x = 1;
> -:   20:else
> 2:   21:x = 2;
> 3:   22:}

Is this the suggested output from gcov?

Sorry, but this is too hard to read; I can't read this.  What 
does it mean?  What's 0 and what's 1 and which are the 8 
conditions?  (Why not 16 or more; which are redundant?)  Or, to 
wit: at a glance, which parts of (a && (b || c)) && d are actually 
covered?
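
For orientation, the "8" in the quoted output presumably counts two outcomes (true and false) for each of the four basic conditions a, b, c and d. The sketch below merely records which outcomes each condition takes under short-circuit evaluation. It is a deliberately simplified illustration with hypothetical helper names (`note`, `seen`, `outcomes_seen`), and it is not the MC/DC analysis the patch implements, which additionally requires each condition to independently affect the decision.

```c
#include <stdbool.h>

enum { NUM_CONDITIONS = 4 };

/* seen[i][v] becomes true once condition i has evaluated to v
   (index 0 = false, index 1 = true).  */
static bool seen[NUM_CONDITIONS][2];

/* Record the outcome of condition `index` and pass the value through.  */
static bool note(int index, bool value)
{
  seen[index][value ? 1 : 0] = true;
  return value;
}

/* Same decision as in the quoted example, with each condition wrapped.  */
static int fn(bool a, bool b, bool c, bool d)
{
  if ((note(0, a) && (note(1, b) || note(2, c))) && note(3, d))
    return 1;
  return 2;
}

/* Number of (condition, outcome) pairs observed so far, out of 8.  */
static int outcomes_seen(void)
{
  int count = 0;
  for (int i = 0; i < NUM_CONDITIONS; i++)
    count += seen[i][0] + seen[i][1];
  return count;
}
```

A single run with a=true, b=false, c=true, d=false evaluates all four conditions once each, so four of the eight outcomes are observed under this simplified notion of coverage.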

There has got to be a better *intuitively* understandable 
presentation format than this. If you please forgive the errors 
in not matching the partial expressions like in your proposal and 
focus on the presentation format, I'd suggest something like, 
for a one-time run with a=true, b=false, c=true, d=false:

"With:
 3:   18:if ((a && (b || c)) && d)
0:   ^^^
1:  ^
2:^
3: 
4:  ^
5:   ^
condition  0 not covered (false)
condition  1 not covered (true)
condition  2 not covered (false)
condition  3 not covered (false)
condition  4 not covered (true)
condition  5 not covered (false)"
(etc)

Possibly with each partial expression repeated above its 
underscoring for readability, because of the increasing distance 
between the underscoring and referred source.

Actually, a separate indexed table like that isn't the best 
choice either.  Perhaps better quoting the source:

"condition (a && (b || c)) false not covered
condition d false not covered
condition (b || c) false not covered
condition b true not covered
condition c false not covered"

Or, just underscoring instead of quoting the source:
"3:   18:if ((a && (b || c)) && d)

In condition:^^^
false not covered"
(etc)

It is possible I completely misunderstand your proposal, but 
there has to be something from the above to pick.  I'd hate to 
see this go down because of usability problems.  Hope this was 
constructive.

brgds, H-P


[committed][PR target/101697] Fix bogus RTL on the H8

2022-10-17 Thread Jeff Law via Gcc-patches

This patch actually fixes the bogus RTL seen in PR101697.


Basically we continue to use the insn condition to catch most of the 
problem cases related to autoinc addressing modes.  This patch adds 
constraints which can guide reload (and hopefully LRA) away from doing 
blind replacements during register elimination that would ultimately 
result in bogus RTL.  The idea is from Paul K who has done something 
very similar on the pdp11.  I guess it shouldn't be a big surprise that 
the H8 and pdp11 need the same kind of handling given some of the 
similarities in their architectures.
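
A rough sketch of how such paired alternatives can exclude the problem case. It rests on an assumption about the constraint letters that this message does not spell out: that alternative k pairs a destination whose autoinc address uses register k with a source register class that excludes register k. The function names are hypothetical and this models only the pairing idea, not the constraint matching the register allocator actually performs.

```c
#include <stdbool.h>

enum { NUM_REGS = 8 };

/* Alternative `alt` is assumed to accept a destination that autoincs
   register `alt` together with any source register other than `alt`.  */
static bool toy_alternative_matches(int alt, int dest_autoinc_reg, int src_reg)
{
  return dest_autoinc_reg == alt && src_reg != alt;
}

/* An operand pair is acceptable if at least one alternative matches.  */
static bool toy_any_alternative_matches(int dest_autoinc_reg, int src_reg)
{
  for (int alt = 0; alt < NUM_REGS; alt++)
    if (toy_alternative_matches(alt, dest_autoinc_reg, src_reg))
      return true;
  return false;
}
```

Under that assumption, a store through a pre-modified r3 from r5 is accepted, while one whose source is also r3 matches no alternative.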



Anyway, this has been tested in my tester without regressions. In fact, 
it fixes several bugs where the testsuite was tripping over the same 
problem.  Given this issue is covered by the testsuite, I haven't added 
a new test.



Pushed to the trunk.

Jeff
commit 4374c424a60777a7658050f0aeb1dcc9af915647
Author: Jeff Law 
Date:   Mon Oct 17 19:52:18 2022 -0400

Fix bogus RTL on the H8.

This patch actually fixes the bogus RTL seen in PR101697.

Basically we continue to use the insn condition to catch most of the problem
cases related to autoinc addressing modes.  This patch adds constraints which
can guide reload (and hopefully LRA) away from doing blind replacements during
register elimination that would ultimately result in bogus RTL.  The idea is
from Paul K. who has done something very similar on the pdp11.  I guess it
shouldn't be a big surprise that the H8 and pdp11 need the same kind of
handling given some of the similarities in their architectures.

gcc/
PR target/101697
* config/h8300/combiner.md: Replace '<' preincrement constraint with
ZA/Z1..ZH/Z7 combinations.
* config/h8300/movepush.md: Similarly.

diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index 067f26678c1..fd5cf2f4af4 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -1142,8 +1142,8 @@
 ;; Storing a part of HImode to QImode.
 
 (define_insn_and_split ""
-  [(set (match_operand:QI 0 "general_operand_dst" "=rm<")
-   (subreg:QI (lshiftrt:HI (match_operand:HI 1 "register_operand" "r")
+  [(set (match_operand:QI 0 "general_operand_dst" "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zg,Zh")
+   (subreg:QI (lshiftrt:HI (match_operand:HI 1 "register_operand" "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7")
(const_int 8)) 1))]
   ""
   "#"
@@ -1153,8 +1153,8 @@
  (clobber (reg:CC CC_REG))])])
 
 (define_insn ""
-  [(set (match_operand:QI 0 "general_operand_dst" "=rm<")
-   (subreg:QI (lshiftrt:HI (match_operand:HI 1 "register_operand" "r")
+  [(set (match_operand:QI 0 "general_operand_dst" "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg")
+   (subreg:QI (lshiftrt:HI (match_operand:HI 1 "register_operand" "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7")
(const_int 8)) 1))
(clobber (reg:CC CC_REG))]
   ""
@@ -1164,8 +1164,8 @@
 ;; Storing a part of SImode to QImode.
 
 (define_insn_and_split ""
-  [(set (match_operand:QI 0 "general_operand_dst" "=rm<")
-   (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
+  [(set (match_operand:QI 0 "general_operand_dst" "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg")
+   (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7")
(const_int 8)) 3))]
   ""
   "#"
@@ -1175,8 +1175,8 @@
  (clobber (reg:CC CC_REG))])])
 
 (define_insn ""
-  [(set (match_operand:QI 0 "general_operand_dst" "=rm<")
-   (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
+  [(set (match_operand:QI 0 "general_operand_dst" "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg")
+   (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7")
(const_int 8)) 3))
(clobber (reg:CC CC_REG))]
   ""
@@ -1184,10 +1184,10 @@
   [(set_attr "length" "8")])
 
 (define_insn_and_split ""
-  [(set (match_operand:QI 0 "general_operand_dst" "=rm<")
-   (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
+  [(set (match_operand:QI 0 "general_operand_dst" "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg")
+   (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7")
(const_int 16)) 3))
-   (clobber (match_scratch:SI 2 "="))]
+   (clobber (match_scratch:SI 2 "="))]
   ""
   "#"
   "&& reload_completed"
@@ -1197,20 +1197,20 @@
  (clobber (reg:CC CC_REG))])])
 
 (define_insn ""
-  [(set (match_operand:QI 0 "general_operand_dst" "=rm<")
-   (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
+  [(set (match_operand:QI 0 "general_operand_dst" "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg")
+   (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7")
(const_int 16)) 3))
-   (clobber 

[committed] More infrastructure to avoid bogus RTL on H8

2022-10-17 Thread Jeff Law via Gcc-patches
Continuing the work to add constraints to avoid invalid RTL  with 
autoinc addressing modes.  Specifically this patch adds  the memory 
constraints similar to the pdp11.


Pushed to the trunk,

Jeff
commit 19859bd72119708c85cc6976b3547738be6f5b1c
Author: Jeff Law 
Date:   Mon Oct 17 19:42:27 2022 -0400

More infrastructure to avoid bogus RTL on H8.

Continuing the work to add constraints to avoid invalid RTL
with autoinc addressing modes.  Specifically this patch adds
the memory constraints similar to the pdp11.

gcc/

* config/h8300/constraints.md (Za..Zh): New constraints for
autoinc addresses using a specific register.
* config/h8300/h8300.cc (pre_incdec_with_reg): New function.
* config/h8300/h8300-protos.h (pre_incdec_with_reg): Add prototype.

diff --git a/gcc/config/h8300/constraints.md b/gcc/config/h8300/constraints.md
index 6eaffc16975..7e6681c4492 100644
--- a/gcc/config/h8300/constraints.md
+++ b/gcc/config/h8300/constraints.md
@@ -241,3 +241,11 @@
 (define_register_constraint "Z7" "NOT_SP_REGS"
   "@internal")
 
+(define_constraint "Za" "@internal" (match_test "pre_incdec_with_reg (op, 0)"))
+(define_constraint "Zb" "@internal" (match_test "pre_incdec_with_reg (op, 1)"))
+(define_constraint "Zc" "@internal" (match_test "pre_incdec_with_reg (op, 2)"))
+(define_constraint "Zd" "@internal" (match_test "pre_incdec_with_reg (op, 3)"))
+(define_constraint "Ze" "@internal" (match_test "pre_incdec_with_reg (op, 4)"))
+(define_constraint "Zf" "@internal" (match_test "pre_incdec_with_reg (op, 5)"))
+(define_constraint "Zg" "@internal" (match_test "pre_incdec_with_reg (op, 6)"))
+(define_constraint "Zh" "@internal" (match_test "pre_incdec_with_reg (op, 7)"))
diff --git a/gcc/config/h8300/h8300-protos.h b/gcc/config/h8300/h8300-protos.h
index e9d434c0d5a..8c989495c29 100644
--- a/gcc/config/h8300/h8300-protos.h
+++ b/gcc/config/h8300/h8300-protos.h
@@ -100,6 +100,7 @@ extern int h8300_initial_elimination_offset (int, int);
 extern int h8300_regs_ok_for_stm (int, rtx[]);
 extern int h8300_hard_regno_rename_ok (unsigned int, unsigned int);
 extern bool h8300_move_ok (rtx, rtx);
+extern bool pre_incdec_with_reg (rtx, int);
 
 struct cpp_reader;
 extern void h8300_pr_interrupt (struct cpp_reader *);
diff --git a/gcc/config/h8300/h8300.cc b/gcc/config/h8300/h8300.cc
index be3e385c91e..ce0702edecb 100644
--- a/gcc/config/h8300/h8300.cc
+++ b/gcc/config/h8300/h8300.cc
@@ -5531,6 +5531,32 @@ h8300_ok_for_sibcall_p (tree fndecl, tree)
 
   return 1;
 }
+
+/* Return TRUE if OP is a PRE_INC or PRE_DEC
+   instruction using REG, FALSE otherwise.  */
+
+bool
+pre_incdec_with_reg (rtx op, int reg)
+{
+  /* OP must be a MEM.  */
+  if (GET_CODE (op) != MEM)
+return false;
+
+  /* The address must be a PRE_INC or PRE_DEC.  */
+  op = XEXP (op, 0);
+  if (GET_CODE (op) != PRE_DEC && GET_CODE (op) != PRE_INC)
+return false;
+
+  /* It must be a register that is being incremented
+ or decremented.  */
+  op = XEXP (op, 0);
+  if (!REG_P (op))
+return false;
+
+  /* Finally, check that the register number matches.  */
+  return REGNO (op) == reg;
+}
+
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_ATTRIBUTE_TABLE
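
The structure of the new predicate can be tried outside of GCC with a toy expression type. This is only a sketch: `toy_rtx`, `toy_code`, and `toy_pre_incdec_with_reg` are hypothetical stand-ins for GCC's rtx, its codes, and the `GET_CODE`/`XEXP`/`REGNO` accessors, kept just detailed enough to mirror the three checks in `pre_incdec_with_reg`.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_REG, TOY_MEM, TOY_PRE_INC, TOY_PRE_DEC };

/* Toy expression node; not GCC's rtx.  */
struct toy_rtx {
  enum toy_code code;
  const struct toy_rtx *operand;  /* first operand; NULL for TOY_REG */
  int regno;                      /* meaningful only for TOY_REG */
};

/* Mirror of the three checks: a MEM, whose address is a pre-increment
   or pre-decrement, of exactly register `reg`.  */
static bool toy_pre_incdec_with_reg(const struct toy_rtx *op, int reg)
{
  if (op->code != TOY_MEM)
    return false;

  op = op->operand;
  if (op->code != TOY_PRE_INC && op->code != TOY_PRE_DEC)
    return false;

  op = op->operand;
  if (op->code != TOY_REG)
    return false;

  return op->regno == reg;
}
```

A memory reference through a pre-decremented register 3 matches only when asked about register 3; a plain register-indirect memory reference or a bare register never matches.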


[committed] Enable REE for H8

2022-10-17 Thread Jeff Law via Gcc-patches


I was looking at H8 assembly code recently and noticed we had 
unnecessary extensions.  As it turns out we never enabled redundant 
extension elimination on the H8.  This patch fixes that oversight (and 
was the trigger for the failure fixed by the prior patch).



Regression tested along with a bit of other in-progress work. Committing 
to the trunk.



Jeff
commit 566c5f1aaae120d2283103e68ecf1c1a83dd4459
Author: Jeff Law 
Date:   Mon Oct 17 19:28:00 2022 -0400

Enable REE for H8

I was looking at H8 assembly code recently and noticed we had unnecessary
extensions.  As it turns out we never enabled redundant extension elimination
on the H8.  This patch fixes that oversight (and was the trigger for the
failure fixed by the prior patch).

gcc/common

* common/config/h8300/h8300-common.cc (h8300_option_optimization_table):
Enable redundant extension elimination at -O2 and above.

diff --git a/gcc/common/config/h8300/h8300-common.cc b/gcc/common/config/h8300/h8300-common.cc
index bfbda22006b..22e2cfcb7b2 100644
--- a/gcc/common/config/h8300/h8300-common.cc
+++ b/gcc/common/config/h8300/h8300-common.cc
@@ -32,6 +32,8 @@ static const struct default_options h8300_option_optimization_table[] =
and/or variable-cycle branches where (cycle count taken !=
cycle count not taken).  */
 { OPT_LEVELS_ALL, OPT_freorder_blocks, NULL, 0 },
+/* Enable redundant extension instructions removal at -O2 and higher.  */
+{ OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 


[committed] Add missing splitter for H8

2022-10-17 Thread Jeff Law via Gcc-patches
While testing a minor optimization on the H8 my builds failed due to 
failure to split a zero-extended memory load.  That particular pattern 
is a bit special on the H8 in that it's split at assembly time primarily 
to get the length computations correct.  Arguably that alternative 
should go away completely, but I haven't really looked into that.


Anyway, with the final-asm split we obviously need to match a 
define_split somewhere.  But none was ever written after adding CCZN 
optimizations.  So if we had a zero extend of a memory operand and it 
was used to eliminate a compare, then we'd abort at final asm time.



Regression tested (in conjunction with various other in-progress 
patches) on H8 without regressions.



Installed on the trunk.


Jeff
commit 43ee3f64cb519f2675fa1771007d4aa3baba944f
Author: Jeff Law 
Date:   Mon Oct 17 19:19:25 2022 -0400

Add missing splitter for H8

While testing a minor optimization on the H8 my builds failed due to
failure to split a zero-extended memory load.  That particular pattern
is a bit special on the H8 in that it's split at assembly time primarily
to get the length computations correct.  Arguably that alternative should
go away completely, but I haven't really looked into that.

Anyway, with the final-asm split we obviously need to match a define_split
somewhere.  But none was ever written after adding CCZN optimizations.  So
if we had a zero extend of a memory operand and it was used to eliminate
a compare, then we'd abort at final asm time.

Regression tested (in conjunction with various other in-progress patches) on
H8 without regressions.

gcc/
* config/h8300/extensions.md (CCZN setting zero extended load): Add
missing splitter.

diff --git a/gcc/config/h8300/extensions.md b/gcc/config/h8300/extensions.md
index 74647c79cd8..7149dc0ac52 100644
--- a/gcc/config/h8300/extensions.md
+++ b/gcc/config/h8300/extensions.md
@@ -47,6 +47,24 @@
 operands[2] = gen_rtx_REG (QImode, REGNO (operands[0]));
   })
 
+;; Similarly, but setting cczn.
+(define_split
+  [(set (reg:CCZN CC_REG)
+   (compare:CCZN
+ (zero_extend:HI (match_operand:QI 1 "general_operand_src" ""))
+ (const_int 0)))
+   (set (match_operand:HI 0 "register_operand" "")
+(zero_extend:HI (match_dup 1)))]
+  "!REG_P (operands[1]) && reload_completed"
+  [(parallel [(set (match_dup 2) (match_dup 1))
+ (clobber (reg:CC CC_REG))])
+   (parallel [(set (reg:CCZN CC_REG)
+  (compare:CCZN (zero_extend:HI (match_dup 2)) (const_int 0)))
+ (set (match_dup 0) (zero_extend:HI (match_dup 2)))])]
+  {
+operands[2] = gen_rtx_REG (QImode, REGNO (operands[0]));
+  })
+
 (define_insn "*zero_extendqisi2"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
(zero_extend:SI (match_operand:QI 1 "general_operand_src" "0,g>")))]


Re: Announcement: Porting the Docs to Sphinx - 9. November 2022

2022-10-17 Thread Sandra Loosemore

On 10/17/22 07:28, Martin Liška wrote:

Hello.

Based on the very positive feedback I was given at the Cauldron Sphinx
Documentation BoF, I'm planning to migrate the documentation on 9th November.
There are still some minor comments from Sandra when it comes to the PDF
output, but we can address that once the conversion is done.


My main complaint about the PDF is that the blue color used for link 
text is so light it interferes with readability.  Few people are going 
to print the document on paper any more, but I did try printing a sample 
page on a grayscale printer and the blue link text came out so faint 
that it was barely visible at all.  An E-ink reader device would 
probably have similar problems.


I'm generally not a fan of the other colors being used for formatting, 
either.  To me it seems like they all interfere with readability, plus 
in code samples it seems like random things get highlighted in random 
colors, instead of focusing on the thing the example is trying to 
demonstrate.


I've been preferring to use the PDF form of the GNU manuals because it 
is easier to search the whole document that way.  The search feature in 
the new web version doesn't quite cut it; it gives you a list of web 
pages and then you have to do a second browser search within each page 
to find the reference.  So I hope we can continue to support the PDF as 
a canonical format and better tune it for easy readability, instead of 
assuming that most people will only care about the online web version.


-Sandra






Re: [RFC PATCH] libstdc++, v2: Partial library support for std::float{16, 32, 64, 128}_t

2022-10-17 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 17, 2022 at 09:33:02PM +, Joseph Myers wrote:
> > > And I/O etc. support is missing, not sure I'm able to handle that and if it
> > > is e.g. possible to keep that support out of libstdc++.so.6, because what
> > > extended floating point types one has on a particular arch could change over
> > > time (I mean e.g. bfloat16_t support or float16_t support can be added
> > > etc.).
> > 
> > Yes, I think we can add the I/O functions as always_inline because all
> > they're going to do is convert the argument to float, double, or long
> > double and then call the existing overloads. There will be no new
> > virtual functions.
> 
> As with fma, note that doing conversions from strings to floating-point is 
> a case where doing the operation on float and then narrowing is 
> technically incorrect because double rounding can occur (the rounded float 
> result can be half way between two values of the narrower type, without 
> that being the exact mathematical result) but the operation should be 
> correctly rounding.  It's fine to use float or double operations in the 
> other direction (floating-point to strings), of course.

That is true, but for istream and ostream that is what the standard
requires.  This is because there are required facets to be called and
they are available just for float/double/long double and it would be
an ABI change to allow more.
For extended floating point wider than long double it is implementation
defined what happens.
And otherwise, there is
[ Note: When the extended floating-point type has a floating-point
conversion rank that is not equal to the rank of any standard floating-point
type, then double rounding during the conversion can result in inaccurate
results.  from_chars can be used in situations where maximum accuracy is
important.  - end note ]
As for <charconv>, I think we'll need to implement both directions,
though perhaps the float16_t or bfloat16_t to_chars can be partly or fully
implemented using the float to_chars.

Jakub



Re: [RFC PATCH] libstdc++, v2: Partial library support for std::float{16, 32, 64, 128}_t

2022-10-17 Thread Jonathan Wakely via Gcc-patches
On Mon, 17 Oct 2022 at 22:33, Joseph Myers  wrote:
>
> On Mon, 17 Oct 2022, Jonathan Wakely via Gcc-patches wrote:
>
> > > And I/O etc. support is missing, not sure I'm able to handle that and if it
> > > is e.g. possible to keep that support out of libstdc++.so.6, because what
> > > extended floating point types one has on a particular arch could change over
> > > time (I mean e.g. bfloat16_t support or float16_t support can be added
> > > etc.).
> >
> > Yes, I think we can add the I/O functions as always_inline because all
> > they're going to do is convert the argument to float, double, or long
> > double and then call the existing overloads. There will be no new
> > virtual functions.
>
> As with fma, note that doing conversions from strings to floating-point is
> a case where doing the operation on float and then narrowing is
> technically incorrect because double rounding can occur (the rounded float
> result can be half way between two values of the narrower type, without
> that being the exact mathematical result) but the operation should be
> correctly rounding.  It's fine to use float or double operations in the
> other direction (floating-point to strings), of course.

Yes, this is called out in the C++23 draft:

[ Note: When the extended floating-point type has a floating-point
conversion rank that is not equal to the rank of any standard
floating-point type, then double rounding during the conversion can
result in inaccurate results. from_chars can be used in situations
where maximum accuracy is important. - end note ]

The alternative is an ABI break, which we didn't want to force on
implementors. For libstdc++ we're not going to break the ABI, so we're
going to live with the double rounding.
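
To make the double-rounding concern concrete, here is a minimal sketch using double and float as stand-ins for "wider standard type" and "narrower type" (the thread is about float16_t and friends, so this is only an analogy). The helper names are hypothetical, and the result assumes the C library's strtod/strtof are correctly rounding, as glibc's are.

```cpp
#include <cassert>
#include <cfloat>
#include <cstdlib>

// Parse directly into the narrow type: a single rounding step.
inline float parse_direct(const char* s)
{ return std::strtof(s, nullptr); }

// Parse into the wider type first, then narrow: two rounding steps.
inline float parse_via_double(const char* s)
{ return static_cast<float>(std::strtod(s, nullptr)); }

// 1 + 2^-24 is exactly half way between the floats 1 and 1 + 2^-23.
// This literal is that midpoint plus 1e-31, so the mathematically
// correct float result is the upper neighbour, 1 + 2^-23.
static const char just_above_midpoint[]
  = "1.0000000596046447753906250000001";

// The float immediately above 1.0f.
static const float next_after_one = 1.0f + FLT_EPSILON;
```

Parsing the string as double discards the tiny excess and lands exactly on the midpoint; the second rounding then breaks the tie to even and yields 1.0f, whereas parse_direct yields next_after_one.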


Re: [RFC PATCH] libstdc++, v2: Partial library support for std::float{16, 32, 64, 128}_t

2022-10-17 Thread Joseph Myers
On Mon, 17 Oct 2022, Jonathan Wakely via Gcc-patches wrote:

> > And I/O etc. support is missing, not sure I'm able to handle that and if it
> > is e.g. possible to keep that support out of libstdc++.so.6, because what
> > extended floating point types one has on a particular arch could change over
> > time (I mean e.g. bfloat16_t support or float16_t support can be added
> > etc.).
> 
> Yes, I think we can add the I/O functions as always_inline because all
> they're going to do is convert the argument to float, double, or long
> double and then call the existing overloads. There will be no new
> virtual functions.

As with fma, note that doing conversions from strings to floating-point is 
a case where doing the operation on float and then narrowing is 
technically incorrect because double rounding can occur (the rounded float 
result can be half way between two values of the narrower type, without 
that being the exact mathematical result) but the operation should be 
correctly rounding.  It's fine to use float or double operations in the 
other direction (floating-point to strings), of course.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH, committed] Fortran: NULL pointer dereference in gfc_simplify_image_index [PR104330]

2022-10-17 Thread Harald Anlauf via Gcc-patches
Dear all,

I've pushed a very obvious fix for a NULL pointer dereference
on behalf of Steve after regtesting on x86_64-pc-linux-gnu as

https://gcc.gnu.org/g:84807af0ca6dfdb81abb8e925ce32acbcab29868

Thanks,
Harald




[PATCH] libstdc++: Implement ranges::stride_view from P1899R3

2022-10-17 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

* include/std/ranges (stride_view): Define.
(stride_view::_Iterator): Define.
(views::__detail::__can_stride_view): Define.
(views::_Stride, views::stride): Define.
* testsuite/std/ranges/adaptors/stride/1.cc: New test.
---
 libstdc++-v3/include/std/ranges   | 351 ++
 .../testsuite/std/ranges/adaptors/stride/1.cc |  73 
 2 files changed, 424 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/stride/1.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 5857d426a66..d113cf19dc7 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -7566,6 +7566,357 @@ namespace views::__adaptor
 
 inline constexpr _Repeat repeat;
   }
+
+  template<input_range _Vp>
+requires view<_Vp>
+  class stride_view : public view_interface<stride_view<_Vp>>
+  {
+_Vp _M_base;
+range_difference_t<_Vp> _M_stride;
+
+template<bool _Const> using _Base = __detail::__maybe_const_t<_Const, _Vp>;
+
+template<bool _Const>
+struct __iter_cat
+{ };
+
+template<bool _Const>
+  requires forward_range<_Base<_Const>>
+struct __iter_cat<_Const>
+{
+private:
+  static auto
+  _S_iter_cat()
+  {
+   using _Cat = typename iterator_traits<iterator_t<_Base<_Const>>>::iterator_category;
+   if constexpr (derived_from<_Cat, random_access_iterator_tag>)
+ return random_access_iterator_tag{};
+   else
+ return _Cat{};
+  }
+public:
+  using iterator_category = decltype(_S_iter_cat());
+};
+
+template<bool> class _Iterator;
+
+  public:
+constexpr explicit
+stride_view(_Vp __base, range_difference_t<_Vp> __stride)
+: _M_base(std::move(__base)), _M_stride(__stride)
+{ __glibcxx_assert(__stride > 0); }
+
+constexpr _Vp
+base() const& requires copy_constructible<_Vp>
+{ return _M_base; }
+
+constexpr _Vp
+base() &&
+{ return std::move(_M_base); }
+
+constexpr range_difference_t<_Vp>
+stride() const noexcept
+{ return _M_stride; }
+
+constexpr auto
+begin() requires (!__detail::__simple_view<_Vp>)
+{ return _Iterator<false>(this, ranges::begin(_M_base)); }
+
+constexpr auto
+begin() const requires range<const _Vp>
+{ return _Iterator<true>(this, ranges::begin(_M_base)); }
+
+constexpr auto
+end() requires (!__detail::__simple_view<_Vp>)
+{
+  if constexpr (common_range<_Vp> && sized_range<_Vp> && forward_range<_Vp>)
+   {
+ auto __missing = (_M_stride - ranges::distance(_M_base) % _M_stride) % _M_stride;
+ return _Iterator<false>(this, ranges::end(_M_base), __missing);
+   }
+  else if constexpr (common_range<_Vp> && !bidirectional_range<_Vp>)
+   return _Iterator<false>(this, ranges::end(_M_base));
+  else
+return default_sentinel;
+}
+
+constexpr auto
+end() const requires range<const _Vp>
+{
+  if constexpr (common_range<const _Vp> && sized_range<const _Vp>
+   && forward_range<const _Vp>)
+   {
+ auto __missing = (_M_stride - ranges::distance(_M_base) % _M_stride) % _M_stride;
+ return _Iterator<true>(this, ranges::end(_M_base), __missing);
+   }
+  else if constexpr (common_range<const _Vp> && !bidirectional_range<const _Vp>)
+return _Iterator<true>(this, ranges::end(_M_base));
+  else
+return default_sentinel;
+}
+
+constexpr auto
+size() requires sized_range<_Vp>
+{
+  return __detail::__to_unsigned_like
+   (__detail::__div_ceil(ranges::distance(_M_base), _M_stride));
+}
+
+constexpr auto
+size() const requires sized_range<const _Vp>
+{
+  return __detail::__to_unsigned_like
+   (__detail::__div_ceil(ranges::distance(_M_base), _M_stride));
+}
+  };
+
+  template<typename _Range>
+stride_view(_Range&&, range_difference_t<_Range>) -> stride_view<views::all_t<_Range>>;
+
+  template<typename _Vp>
+inline constexpr bool enable_borrowed_range<stride_view<_Vp>>
+  = enable_borrowed_range<_Vp>;
+
+  template<input_range _Vp>
+requires view<_Vp>
+  template<bool _Const>
+  class stride_view<_Vp>::_Iterator : public __iter_cat<_Const>
+  {
+using _Parent = __detail::__maybe_const_t<_Const, stride_view>;
+using _Base = stride_view::_Base<_Const>;
+
+iterator_t<_Base> _M_current = iterator_t<_Base>();
+sentinel_t<_Base> _M_end = sentinel_t<_Base>();
+range_difference_t<_Base> _M_stride = 0;
+range_difference_t<_Base> _M_missing = 0;
+
+constexpr
+_Iterator(_Parent* __parent, iterator_t<_Base> __current,
+ range_difference_t<_Base> __missing = 0)
+: _M_current(std::move(__current)), _M_end(ranges::end(__parent->_M_base)),
+  _M_stride(__parent->_M_stride), _M_missing(__missing)
+{ }
+
+static auto
+_S_iter_concept()
+{
+  if constexpr (random_access_range<_Base>)
+   return random_access_iterator_tag{};
+  else if constexpr (bidirectional_range<_Base>)
+   return bidirectional_iterator_tag{};
+  else if constexpr (forward_range<_Base>)
+   return 

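For readers following the end() and size() arithmetic in the patch above, the two formulas can be checked in isolation. This is only a sketch of the arithmetic on plain integers; the function names are hypothetical and are not part of the patch.

```cpp
#include <cassert>

// Number of elements visited when stepping through n elements with the
// given stride (stride > 0): ceil(n / stride), the same quantity the
// patch computes with __div_ceil.
constexpr long strided_size(long n, long stride)
{ return n / stride + (n % stride != 0 ? 1 : 0); }

// How far a final full stride step would overshoot the end of the
// underlying range.  This mirrors the __missing expression in end().
constexpr long missing_steps(long n, long stride)
{ return (stride - n % stride) % stride; }

// 10 elements, stride 3: positions 0, 3, 6, 9 are visited; the next
// step would land on 12, which is 2 past the end.
static_assert(strided_size(10, 3) == 4, "ceil(10 / 3)");
static_assert(missing_steps(10, 3) == 2, "12 - 10");
```

When the length is an exact multiple of the stride, nothing is missing: missing_steps(9, 3) is 0 and strided_size(9, 3) is 3.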
Re: [COMMITTED 4/4] PR tree-optimization/102540 - propagate partial equivs in the cache.

2022-10-17 Thread H.J. Lu via Gcc-patches
On Thu, Oct 13, 2022 at 8:32 AM Andrew MacLeod via Gcc-patches
 wrote:
>
> Ranger's on-entry cache propagation already evaluates equivalences when
> calculating values. This patch also allows it to work with partial
> equivalences, and if the bit sizes are compatible, make use of those
> ranges as well.
>
> It attempts to be conservative, so should be safe.
>
> This resolves regressions in both PR 102540 and PR 102872.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed
>
> Andrew

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107273

-- 
H.J.


[COMMITTED] Make sure exported range for SSA post-dominates the DEF in set_global_ranges_from_unreachable_edges.

2022-10-17 Thread Aldy Hernandez via Gcc-patches
The problem here is that we're exporting a range for an SSA name that
happens on the other side of a __builtin_unreachable, but the SSA does
not post-dominate the definition point.  This is causing ivcanon to
unroll things incorrectly.

This was a snafu when converting the code from evrp.

PR tree-optimization/107293

gcc/ChangeLog:

* tree-ssa-dom.cc
(dom_opt_dom_walker::set_global_ranges_from_unreachable_edges):
Check that condition post-dominates the definition point.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr107293.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr107293.c | 32 
 gcc/tree-ssa-dom.cc  |  6 -
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107293.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107293.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107293.c
new file mode 100644
index 000..724c31a11e6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107293.c
@@ -0,0 +1,32 @@
+// { dg-do run }
+// { dg-options "-w -Os" }
+
+short a;
+int b[1];
+
+int c(int p) {
+  return (p < 0) ? 0 : 10 + ((p / 100 - 16) / 4);
+}
+
+void f(int n) {
+  while (1) {
+int m = n;
+while ((m ) )
+  m /= 2;
+break;
+  }
+}
+
+void g() {
+  int h = a = 0;
+  for (; h + a <= 0; a++) {
+if (b[c(a - 6)])
+  break;
+f(a);
+  }
+}
+int main() {
+  g();
+  if (a != 1)
+__builtin_abort ();
+}
diff --git a/gcc/tree-ssa-dom.cc b/gcc/tree-ssa-dom.cc
index e6b8dace5e9..c7f095d79fc 100644
--- a/gcc/tree-ssa-dom.cc
+++ b/gcc/tree-ssa-dom.cc
@@ -1367,7 +1367,11 @@ dom_opt_dom_walker::set_global_ranges_from_unreachable_edges (basic_block bb)
   tree name;
   gori_compute &gori = m_ranger->gori ();
   FOR_EACH_GORI_EXPORT_NAME (gori, pred_e->src, name)
-if (all_uses_feed_or_dominated_by_stmt (name, stmt))
+if (all_uses_feed_or_dominated_by_stmt (name, stmt)
+   // The condition must post-dominate the definition point.
+   && (SSA_NAME_IS_DEFAULT_DEF (name)
+   || (gimple_bb (SSA_NAME_DEF_STMT (name))
+   == pred_e->src)))
   {
Value_Range r (TREE_TYPE (name));
 
-- 
2.37.3



Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-10-17 Thread Segher Boessenkool
On Mon, Sep 19, 2022 at 11:13:20AM -0500, will schmidt wrote:
>   The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
> and can be disabled by dependent options when it should not be.
> This manifests in the issue seen in PR101865 where -mno-vsx
> mistakenly disables _ARCH_PWR8.

> This change replaces the relevant TARGET_DIRECT_MOVE references
> with a TARGET_POWER8 entry so that the direct_move and power8
> features can be enabled or disabled independently.

We should get rid of TARGET_DIRECT_MOVE altogether.  Please see
57f108f5a1e1:
rs6000: Disable -m[no-]direct-move (PR85293)

The -mno-direct-move option causes a lot of problems, since it forces
us to be able to generate code for p8 and up with some crucial
instructions missing.  This patch removes the -m[no-]direct-move
options so that the user cannot put us into this unexpected situation
anymore.  Internally we still have all the same flags, and they are
automatically set based on -mcpu; getting rid of that is a lot more
work and will have to wait for GCC 9 (in some places the flag is used
to see if we are compiling for a p8 _at all_).

It did not happen in GCC 9 obviously.  Do you want to take a shot?  It
doesn't have to be all at once, it's probably best if not even -- as I
wrote in the commit message, the flag always was used to mean different
things.

> The existing (and rather lengthy) commentary for DIRECT_MOVE remains
> in place in rs6000-c.cc:rs6000_target_modify_macros().  The
> if-defined logic there will now set a __DIRECT_MOVE__ define when
> TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
> purposes, but is otherwise unused.  This can be removed in a
> subsequent patch, or in an update of this patch, depending on feedback.

There should be no such macro, for the same reason there should be no
-mdirect-move option: it is so very essential to all code we generate,
it *always* is enabled if we have P8 or later.

> gcc/
>   PR Target/101865
>   * config/rs6000/rs6000-builtin.cc
>   (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
>   usage with TARGET_POWER8.

Please don't arbitrarily wrap lines.  It is harder to read, and it looks
like something is missing.

>   * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
>   Add OPTION_MASK_POWER8 entry.

Especially in cases like this, where it looks like you forgot to write
something after the colon.

> @@ -24046,10 +24045,11 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
>    { "block-ops-vector-pair", OPTION_MASK_BLOCK_OPS_VECTOR_PAIR,
>   false, true  },
>    { "cmpb",  OPTION_MASK_CMPB,   false, true  },
>    { "crypto",OPTION_MASK_CRYPTO, false, true  },
>    { "direct-move",   OPTION_MASK_DIRECT_MOVE,false, true  },
> +  { "power8",OPTION_MASK_POWER8, false, true  },

Why would we want a #pragma power8 ?

> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -490,10 +490,15 @@ mcrypto
>  Target Mask(CRYPTO) Var(rs6000_isa_flags)
>  Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
>  
>  mdirect-move
>  Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags) WarnRemoved
> +Enable direct move (ISA 2.07).

It is undocumented and should remain that, except eventually we should
remove it completely (but leave some stubs so that code in the wild
keeps compiling).

> +mpower8
> +Target Mask(POWER8) Var(rs6000_isa_flags)
> +Use instructions added in ISA 2.07 (power8).

There should not be such an option.  It is set by -mcpu=power8 and
later, but can never be enabled or disabled directly by the user.

> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3407,11 +3407,11 @@ (define_insn "vsx_extract_"
>if (element == VECTOR_ELEMENT_SCALAR_64BIT)
>  {
>if (op0_regno == op1_regno)
>   return ASM_COMMENT_START " vec_extract to same register";
>  
> -  else if (INT_REGNO_P (op0_regno) && TARGET_DIRECT_MOVE
> +  else if (INT_REGNO_P (op0_regno) && TARGET_POWER8
>  && TARGET_POWERPC64)

That fits on one line now.

Thanks,


Segher


[PATCH] d: Remove D-specific version definitions from target headers

2022-10-17 Thread Iain Buclaw via Gcc-patches
Hi,

This splits up the targetdm sources so that each file only handles one
target platform.

Having all logic kept in the headers means that they could become out of
sync when a new target is added (loongarch*-*-linux*) or accidentally
broken if some headers in tm_file are omitted or changed about.

There might be an open bikeshed question as to appropriate names for
some of the platform sources (kfreebsd-d.cc or kfreebsd-gnu-d.cc).

Bootstrapped and regression tested on x86_64-linux-gnu, and also built
i686-cygwin, i686-gnu, i686-kfreebsd-gnu, i686-kopensolaris-gnu,
x86_64-cygwin, x86_64-w64-mingw32 cross compilers, the dumps of all
predefined version identifiers remain correct in all configurations.

OK?

Regards,
Iain.

---
gcc/ChangeLog:

* config.gcc: Split out glibc-d.o into linux-d.o, kfreebsd-d.o,
kopensolaris-d.o, and gnu-d.o.  Split out cygwin-d.o from winnt-d.o.
* config/arm/linux-eabi.h (EXTRA_TARGET_D_OS_VERSIONS): Remove.
* config/gnu.h (GNU_USER_TARGET_D_OS_VERSIONS): Remove.
* config/i386/cygwin.h (EXTRA_TARGET_D_OS_VERSIONS): Remove.
* config/i386/linux-common.h (EXTRA_TARGET_D_OS_VERSIONS): Remove.
* config/i386/mingw32.h (EXTRA_TARGET_D_OS_VERSIONS): Remove.
* config/i386/t-cygming: Add cygwin-d.o.
* config/i386/winnt-d.cc (winnt_d_os_builtins): Only add
MinGW-specific version condition.
* config/kfreebsd-gnu.h (GNU_USER_TARGET_D_OS_VERSIONS): Remove.
* config/kopensolaris-gnu.h (GNU_USER_TARGET_D_OS_VERSIONS): Remove.
* config/linux-android.h (ANDROID_TARGET_D_OS_VERSIONS): Remove.
* config/linux.h (GNU_USER_TARGET_D_OS_VERSIONS): Remove.
* config/mips/linux-common.h (EXTRA_TARGET_D_OS_VERSIONS): Remove.
* config/t-glibc: Remove glibc-d.o, add gnu-d.o, kfreebsd-d.o,
kopensolaris-d.o.
* config/t-linux: Add linux-d.o.
* config/glibc-d.cc: Remove file.
* config/gnu-d.cc: New file.
* config/i386/cygwin-d.cc: New file.
* config/kfreebsd-d.cc: New file.
* config/kopensolaris-d.cc: New file.
* config/linux-d.cc: New file.
---
 gcc/config.gcc  | 24 +++--
 gcc/config/arm/linux-eabi.h |  3 --
 gcc/config/{glibc-d.cc => gnu-d.cc} | 30 ---
 gcc/config/gnu.h|  6 ---
 gcc/config/i386/cygwin-d.cc | 83 +
 gcc/config/i386/cygwin.h|  9 
 gcc/config/i386/linux-common.h  |  3 --
 gcc/config/i386/mingw32.h   | 12 -
 gcc/config/i386/t-cygming   |  4 ++
 gcc/config/i386/winnt-d.cc  | 10 ++--
 gcc/config/kfreebsd-d.cc| 65 ++
 gcc/config/kfreebsd-gnu.h   |  6 ---
 gcc/config/kopensolaris-d.cc| 65 ++
 gcc/config/kopensolaris-gnu.h   |  6 ---
 gcc/config/linux-android.h  |  6 ---
 gcc/config/linux-d.cc   | 78 +++
 gcc/config/linux.h  | 13 -
 gcc/config/mips/linux-common.h  |  3 --
 gcc/config/t-glibc  | 10 +++-
 gcc/config/t-linux  |  4 ++
 20 files changed, 345 insertions(+), 95 deletions(-)
 rename gcc/config/{glibc-d.cc => gnu-d.cc} (65%)
 create mode 100644 gcc/config/i386/cygwin-d.cc
 create mode 100644 gcc/config/kfreebsd-d.cc
 create mode 100644 gcc/config/kopensolaris-d.cc
 create mode 100644 gcc/config/linux-d.cc

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2af30b4a6ec..2c9b9a06564 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -879,10 +879,8 @@ case ${target} in
   esac
   c_target_objs="${c_target_objs} glibc-c.o"
   cxx_target_objs="${cxx_target_objs} glibc-c.o"
-  d_target_objs="${d_target_objs} glibc-d.o"
   tmake_file="${tmake_file} t-glibc"
   target_has_targetcm=yes
-  target_has_targetdm=yes
   case $target in
 *-*-*uclibc* | *-*-uclinuxfdpiceabi)
   ;;
@@ -891,6 +889,24 @@ case ${target} in
   gcc_cv_initfini_array=yes
   ;;
   esac
+  case $target in
+*-*-*linux*)
+  d_target_objs="${d_target_objs} linux-d.o"
+  target_has_targetdm=yes
+  ;;
+*-*-kfreebsd*-gnu)
+  d_target_objs="${d_target_objs} kfreebsd-d.o"
+  target_has_targetdm=yes
+  ;;
+*-*-kopensolaris*-gnu)
+  d_target_objs="${d_target_objs} kopensolaris-d.o"
+  target_has_targetdm=yes
+  ;;
+*-*-gnu*)
+  d_target_objs="${d_target_objs} gnu-d.o"
+  target_has_targetdm=yes
+  ;;
+  esac
   ;;
 *-*-netbsd*)
   tm_p_file="${tm_p_file} netbsd-protos.h"
@@ -2051,7 +2067,7 @@ i[34567]86-*-cygwin*)
extra_objs="${extra_objs} winnt.o winnt-stubs.o"
c_target_objs="${c_target_objs} msformat-c.o"
cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
-   d_target_objs="${d_target_objs} winnt-d.o"
+   d_target_objs="${d_target_objs} cygwin-d.o"
target_has_targetdm="yes"
if test x$enable_threads = xyes; 

Re: [PATCH] middle-end IFN_ASSUME support [PR106654]

2022-10-17 Thread Andrew MacLeod via Gcc-patches



The assume function can have many arguments (one is created for each
automatic var referenced or set by the condition), so it would be nice to
track all of them rather than just hardcoding the first.  And, the argument
doesn't necessarily have to be a scalar, so perhaps later on we could derive
ranges even for structure members etc. if needed.  Or just handle
assume_function in IPA-SRA somehow at least.

The C++23 paper mentions
[[assume(size > 0)]];
[[assume(size % 32 == 0)]];
[[assume(std::isfinite(data[i]))]];
[[assume(*pn >= 1)]];
[[assume(i == 42)]];
[[assume(++i == 43)]];
[[assume((std::cin >> i, i == 42))]];
[[assume(++almost_last == last)]];
among other things, the first and fifth are already handled the
if (!cond) __builtin_unreachable (); way and so work even without this
patch, the (std::cin >> i, i == 42) case is not worth bothering for now
and the rest would be single block assumptions that can be handled easily
(except the last one which would have record type arguments and so we'd need
SRA).

Jakub
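
As a small illustration of the "if (!cond) __builtin_unreachable ();" lowering mentioned above for side-effect-free conditions such as size > 0, consider the sketch below. The function name is hypothetical, and __builtin_unreachable is the GCC/Clang builtin, so this assumes one of those compilers.

```cpp
#include <cassert>

// Hypothetical helper.  The caller promises size > 0; calling it with
// any other value is undefined behaviour, exactly as with
// [[assume(size > 0)]].
inline int halve(int size)
{
  // Equivalent of [[assume(size > 0)]] for a condition without side
  // effects: the false branch is declared unreachable, so the
  // optimizers may treat size as positive from here on.
  if (!(size > 0))
    __builtin_unreachable ();
  return size / 2;
}
```

With the assumption in place the compiler is free to implement the signed division as a plain shift, because the negative case no longer needs handling.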
I put together an initial prototype; attached are the two patches so far. I
applied this on top of one of your sets of patches to try it out.
The first patch has the initial simple version, and the second patch
hacks VRP to add a loop over all the ssa-names in the function and show
what assume_range_p would return for them.


First, I added another routine to ranger:

*bool gimple_ranger::assume_range_p (vrange , tree name)*

This is the routine that is called to determine what the range of NAME 
is at the end of the function if the function returns [1,1]. It is 
painfully simple, only working on names in the definition chain of the 
return variable. It returns TRUE if it finds a non-varying result.   I 
will next expand on this to look back in the CFG and be more flexible.


To apply any assumed values, I added a routine to be called

*bool query_assume_call (vrange , tree assume_id, tree name);*

This routine would be what is called to look up whether there is any range
associated with NAME in the assume function ASSUME_ID.  I hacked one
up to return [42, 42] for any integer query just as a proof of concept.
You'd need to supply this routine somewhere instead.


As the ASSUME function has no defs, we can't produce results for the 
parameters in normal ways, so I leverage the inferred range code.  When 
doing a stmt walk, when VRP is done processing a stmt, it applies any 
side effects of the statement going forward. The gimple_inferred_range 
constructor now also looks for assume calls, and calls query_assume_call 
() on each argument, and if it gets back TRUE, applies an inferred range 
record for that range at that stmt.  (This also means those ASSUME 
ranges will only show up in a VRP walk.)


This seems like it might be functional enough for you to experiment with.

For the simple

int
bar (int x)
{
  [[assume (++x == 43)]];
  return x;
}

The VRP hack for the assume function shows:

for an assume function, x_2(D) would have a range of [irange] int [42, 
42] NONZERO 0x2a.


I also tried it for

bool foo1 (int x, int y) { return x < 10 || x > 20 || x == 12; }
for an assume function, x_5(D) would have a range of [irange] int [-INF, 
9][12, 12][21, +INF]



bool foo2 (int x, int y) { return (x >= 10 && x <= 20) || x == 2; }
for an assume function, x_5(D) would have a range of [irange] int [2, 
2][10, 20] NONZERO 0x1f
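
The ranges quoted above can be sanity-checked by brute force over a small window. The sketch below is not how ranger or GORI computes anything; it simply enumerates values for which the condition holds, merges them into intervals, and ORs them together to model the NONZERO mask. All names are hypothetical, the predicates take only x (y is unused in the originals), and [0, 30] stands in for the full int range.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using interval = std::pair<int, int>;

// Maximal runs of consecutive x in [lo, hi] for which pred(x) is true.
template<typename Pred>
std::vector<interval> true_ranges(Pred pred, int lo, int hi)
{
  std::vector<interval> out;
  for (int x = lo; x <= hi; ++x)
    {
      if (!pred(x))
        continue;
      if (!out.empty() && out.back().second == x - 1)
        out.back().second = x;      // extend the current run
      else
        out.push_back({x, x});      // start a new run
    }
  return out;
}

// Bits that are set in at least one satisfying value in [lo, hi].
template<typename Pred>
unsigned nonzero_mask(Pred pred, int lo, int hi)
{
  unsigned mask = 0;
  for (int x = lo; x <= hi; ++x)
    if (pred(x))
      mask |= static_cast<unsigned>(x);
  return mask;
}

inline bool foo1(int x) { return x < 10 || x > 20 || x == 12; }
inline bool foo2(int x) { return (x >= 10 && x <= 20) || x == 2; }
```

Over [0, 30], foo1 gives [0, 9][12, 12][21, 30] and foo2 gives [2, 2][10, 20] with mask 0x1f, which matches the shape of the dumps above once the window bounds are read as -INF and +INF.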


for:

int
bar (int x)
{
  [[assume (++x == 43)]];
  return x;
}

As for propagating assumed values, the hacked up version returning 42 
shows it propagates into the return:


query_assume_call injection
_Z3bari._assume.0 assume inferred range of x_1(D) to [irange] int [42, 
42] NONZERO 0x2a

int bar (int x)
{
   :
  .ASSUME (_Z3bari._assume.0, x_1(D));
  return 42;

}

So in principle, I think there's enough infrastructure there to get
going.  You can query parameter ranges by creating a ranger, and
querying the parameters via *assume_range_p ()*.  You can do that
anywhere: as the hack I put in VRP shows, it creates a new ranger,
simply queries each SSA_NAME, then disposes of the ranger before
invoking VRP on a fresh ranger.  Then you just wire up a proper
*query_assume_call ()* to return those ranges.


That's the basic API to deal with: call one function, supply another.
Does this model seem like it would work OK for you?  Or do we need to 
tweak it?


I am planning to extend assume_range_p to handle other basic blocks, as 
well as pick up a few of the extra things that outgoing_edge_range_p does.


Andrew


PS. It also seems to me that the assume_range_p() infrastructure may 
have some other uses when it comes to inlining or LTO or IPA.    This 
particular version works with a return value of [1,1], but that value is 
manually supplied to GORI by the routine.  If any other pass has some 
reason to know that the return value was within a certain range, we 
could use that and query what the incoming ranges of any parameter might 
have to be. Just a thought.




Re: [PATCH 1/2] ipa-cp: Better representation of aggregate values we clone for

2022-10-17 Thread Martin Jambor
Hi,

thanks for the review.

On Fri, Oct 14 2022, Jan Hubicka wrote:
>>

[...]

>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2022-08-15  Martin Jambor  
>> 
>>  * gcc.dg/ipa/ipcp-agg-11.c: Adjust dumps.
>>  * gcc.dg/ipa/ipcp-agg-8.c: Likewise.
>> ---
>>  gcc/ipa-cp.cc  | 1010 
>>  gcc/ipa-prop.cc|  254 +++---
>>  gcc/ipa-prop.h |  139 +++-
>>  gcc/testsuite/gcc.dg/ipa/ipcp-agg-11.c |4 +-
>>  gcc/testsuite/gcc.dg/ipa/ipcp-agg-8.c  |4 +-
>>  5 files changed, 736 insertions(+), 675 deletions(-)
>> 
>> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
>> index 543a9334e2c..024f8c06b5d 100644
>> --- a/gcc/ipa-cp.cc
>> +++ b/gcc/ipa-cp.cc
>> @@ -127,6 +127,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "attribs.h"
>>  #include "dbgcnt.h"
>>  #include "symtab-clones.h"
>> +#include 
>>  
>>  template  class ipcp_value;
>>  
>> @@ -455,6 +456,26 @@ ipcp_lattice::is_single_const ()
>>  return true;
>>  }
>>  
>> +/* Return true iff X and Y should be considered equal values by IPA-CP.  */
>> +
>> +static bool
>> +values_equal_for_ipcp_p (tree x, tree y)
>> +{
>> +  gcc_checking_assert (x != NULL_TREE && y != NULL_TREE);
>> +
>> +  if (x == y)
>> +return true;
>> +
>> +  if (TREE_CODE (x) == ADDR_EXPR
>> +  && TREE_CODE (y) == ADDR_EXPR
>> +  && TREE_CODE (TREE_OPERAND (x, 0)) == CONST_DECL
>> +  && TREE_CODE (TREE_OPERAND (y, 0)) == CONST_DECL)
>> +return operand_equal_p (DECL_INITIAL (TREE_OPERAND (x, 0)),
>> +DECL_INITIAL (TREE_OPERAND (y, 0)), 0);
>
> I wonder if we want to handle MEM_REFs here too? They get quite common
> in IPA mode and I think we miss the fixup removing them here.

This patch just moves the function up without modifying it, and I'd like
to do any changes separately, unless they are required for this patch.

And just to be sure, you mean it should cover also the MEM_REF case as
in is_gimple_invariant_address, right?

>> +  else
>> +return operand_equal_p (x, y, 0);
>> +/* Return the item describing a constant stored for INDEX at UNIT_OFFSET or
>> +   NULL if there is no such constant.  */
>> +
>> +const ipa_argagg_value *
>> +ipa_argagg_value_list::get_elt (int index, unsigned unit_offset) const
>> +{
>> +  ipa_argagg_value key;
>> +  key.index = index;
>> +  key.unit_offset = unit_offset;
>> +  const ipa_argagg_value *res
>> += std::lower_bound (m_elts.begin (), m_elts.end (), key,
>> +[] (const ipa_argagg_value &elt,
>> +const ipa_argagg_value &val)
>> +{
>> +  if (elt.index < val.index)
>> +return true;
>> +  if (elt.index > val.index)
>> +return false;
>> +  if (elt.unit_offset < val.unit_offset)
>> +return true;
>> +  return false;
>> +});
>> +
>> +  if (res == m_elts.end ()
>> +  || res->index != index
>> +  || res->unit_offset != unit_offset)
>> +res = nullptr;
>> +
>> +  /* TODO: perhaps remove after some extensive testing? */
>> +  if (!flag_checking)
>> +return res;
>> +
>> +  const ipa_argagg_value *slow_res = NULL;
>> +  int prev_index = -1;
>> +  unsigned prev_unit_offset = 0;
>> +  for (const ipa_argagg_value &av : m_elts)
>> +{
>> +  gcc_assert (prev_index < 0
>> +  || prev_index < av.index
>> +  || prev_unit_offset < av.unit_offset);
>> +  prev_index = av.index;
>> +  prev_unit_offset = av.unit_offset;
>> +  if (av.index == index
>> +  && av.unit_offset == unit_offset)
>> +slow_res = &av;
>> +}
>> +  gcc_assert (res == slow_res);

> So this is just checking that the std::lower_bound works as expected?
> I am just curious if you expect it to break?

It rather checks that the underlying array on which it operates really
is sorted :-)

When I was writing this code I had not carefully checked all the places
where we construct them.  Now I am quite confident they indeed are
always sorted but still thought this would be a useful check against
future errors.  We can remove the test at any time if it ever becomes
too slow.
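
For anyone reading along without the full patch, the lookup pattern under discussion is roughly the following: a binary search keyed on (index, unit_offset), guarded by a check that the array really is strictly sorted. This is a simplified sketch with hypothetical names (agg_value, key_less, find_elt), not the code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for ipa_argagg_value.
struct agg_value
{
  int index;
  unsigned unit_offset;
  int value;
};

// Lexicographic order on (index, unit_offset).
inline bool key_less(const agg_value& a, const agg_value& b)
{
  if (a.index != b.index)
    return a.index < b.index;
  return a.unit_offset < b.unit_offset;
}

// Return the element for (index, unit_offset), or nullptr if absent.
inline const agg_value*
find_elt(const std::vector<agg_value>& elts, int index, unsigned unit_offset)
{
  // Binary search is only meaningful on sorted input, so verify that
  // every element is strictly less than its successor.
  assert(std::adjacent_find(elts.begin(), elts.end(),
                            [](const agg_value& a, const agg_value& b)
                            { return !key_less(a, b); })
         == elts.end());

  const agg_value key{index, unit_offset, 0};
  auto it = std::lower_bound(elts.begin(), elts.end(), key, key_less);
  if (it == elts.end()
      || it->index != index
      || it->unit_offset != unit_offset)
    return nullptr;
  return &*it;
}
```

The sortedness assertion plays the role of the flag_checking loop quoted above: it costs a linear scan per lookup, so it belongs in checking builds only.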

>> +/* Turn all values that are not present in OTHER into NULL_TREEs.  Return the
>> +   number of remaining valid entries.  */
>> +
>> +bool
>> +ipa_argagg_value_list::superset_of_p (const ipa_argagg_value_list &other) const

> It returns bool, so not number of entries.

Umm, that comment was from an entirely different function, fixed.

I also changed the names of local variables this_index and this_offset
to other_index and other_offset because that is what they really are.

>> +/* Push into RES aggregate all stored aggregate values relating to parameter
>> +   with SRC_INDEX as those relating to of DST_INDEX while subtracting
>> +   UNIT_DELTA from the individual unit offsets.  */
>> +
>> +void
>> 

Re: [PATCH, v2] Fortran: handle bad array ctors with typespec [PR93483, PR107216, PR107219]

2022-10-17 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

On 16.10.22 at 23:17, Mikael Morin wrote:

On 15/10/2022 at 22:15, Harald Anlauf via Fortran wrote:

Dear all,

here is an updated version of the patch that includes suggestions
and comments by Mikael in PR93483.

Basic new features are:
- a new enum value ARITH_NOT_REDUCED to keep track if we encountered
   an expression that was not reduced via reduce_unary/reduce_binary
- a cleanup of the related checking, resulting in more readable
   code.
- a new testcase by Mikael that exhibited a flaw in the first patch
   due to a false resolution of a symbol by premature simplification.

Regtested again.  OK for mainline?


(...)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 10bb098d136..7b8f0b148bd 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -222,11 +222,12 @@ enum gfc_intrinsic_op
    Assumptions are made about the numbering of the interface_op enums.  */
 #define GFC_INTRINSIC_OPS GFC_INTRINSIC_END

-/* Arithmetic results.  */
+/* Arithmetic results.  ARITH_NOT_REDUCED is used to keep track of failed
+   reductions because an erroneous expression was encountered.  */


The expressions are not always erroneous.  They can be, but in the
testcase for example, all the expressions are valid.  They are just
unsupported by the arithmetic evaluation code which works only with
literal constants and arrays of literal constants (and arrays of arrays
etc).

OK with that comment fixed.


you're absolutely right.  I adjusted the comment and the commit
message according to your suggestion.

Pushed as https://gcc.gnu.org/g:d45af5c2eb1ba1e48449d8f3c5b4e3994a956f92

Thanks,
Harald


Thanks.





Re: [PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)

2022-10-17 Thread will schmidt via Gcc-patches
On Mon, 2022-10-17 at 10:32 -0500, Segher Boessenkool wrote:
> Hi!
> 
> Everything Ke Wen said.  Some more comments / hints:

Thanks for the reviews. :-)

I'll rework things and repost 'soon'.

Thanks
-WIll



[PATCH] libstdc++: Redefine __from_chars_alnum_to_val's table

2022-10-17 Thread Patrick Palka via Gcc-patches
It looks like the constexpr <charconv> commit r13-3313-g378a0f1840e694
caused some modules regressions:

  FAIL: g++.dg/modules/xtreme-header-4_b.C -std=c++2b (test for excess errors)
  FAIL: g++.dg/modules/xtreme-header_b.C -std=c++2b (test for excess errors)

Like PR105297, the problem seems to be the local class from
__from_chars_alnum_to_val ending up as the type of a namespace-scope
entity (the variable template __detail::__table in this case).

This patch works around this modules issue by using an ordinary class
instead of a local class.  Also, I suppose we might as well use a static
data member to define the table once for all dialects instead of having
to define it twice in C++23 mode, once as a static local variable (which
isn't usable during constexpr evaluation) and again as a variable template
(which is).
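
As a rough illustration of the idea (not the libstdc++ code itself), here is a simplified C++17 sketch with non-reserved, hypothetical names: the table type is an ordinary nested class rather than a local class, and the table is defined once as a static constexpr data member of a class template, usable both at run time and during constant evaluation.

```cpp
#include <climits>

// Hypothetical, simplified analogue of the approach described above.
template<bool DecOnly>
struct alnum_to_val_table
{
  // An ordinary (non-local) class serves as the table type.
  struct type { unsigned char data[1u << CHAR_BIT] = {}; };

  // Map 0-9, A-Z and a-z to their base-36 value, everything else to 127.
  static constexpr type
  make_table ()
  {
    constexpr unsigned char lower[27] = "abcdefghijklmnopqrstuvwxyz";
    constexpr unsigned char upper[27] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    type table{};
    for (auto &entry : table.data)
      entry = 127;
    for (int i = 0; i < 10; ++i)
      table.data[static_cast<unsigned char> ('0' + i)]
        = static_cast<unsigned char> (i);
    for (int i = 0; i < 26; ++i)
      {
        table.data[lower[i]] = static_cast<unsigned char> (10 + i);
        table.data[upper[i]] = static_cast<unsigned char> (10 + i);
      }
    return table;
  }

  // The initializer mentions DecOnly so that it is dependent and only
  // instantiated when the member is actually used.
  static constexpr type value
    = (static_cast<void> (DecOnly), make_table ());
};

// Single lookup path for both run time and constant evaluation.
constexpr unsigned char
alnum_to_val (unsigned char c)
{
  return alnum_to_val_table<false>::value.data[c];
}

static_assert (alnum_to_val ('z') == 35, "usable in constant expressions");
```

Because static constexpr data members are implicitly inline since C++17, no separate out-of-class definition is needed in this sketch.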

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Diff
generated with -w to ignore noisy whitespace changes.

libstdc++-v3/ChangeLog:

* include/std/charconv (__detail::__from_chars_alnum_to_val_table):
Redefine as a class template containing type, value and _S_table
members.  Don't use a local class as the table type.
(__detail::__table): Remove.
(__detail::__from_chars_alnum_to_val): Adjust after the above.
---
 libstdc++-v3/include/std/charconv | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 7aefdd3298c..c157d4c74ab 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -413,14 +413,19 @@ namespace __detail
   return true;
 }
 
+  template<bool _DecOnly>
+struct __from_chars_alnum_to_val_table
+{
+  struct type { unsigned char __data[1u << __CHAR_BIT__] = {}; };
+
   // Construct and return a lookup table that maps 0-9, A-Z and a-z to their
   // corresponding base-36 value and maps all other characters to 127.
-  constexpr auto
-  __from_chars_alnum_to_val_table()
+  static constexpr type
+  _S_table()
   {
constexpr unsigned char __lower_letters[27] = "abcdefghijklmnopqrstuvwxyz";
constexpr unsigned char __upper_letters[27] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
-struct { unsigned char __data[1u << __CHAR_BIT__] = {}; } __table;
+   type __table;
for (auto& __entry : __table.__data)
  __entry = 127;
for (int __i = 0; __i < 10; ++__i)
@@ -433,10 +438,11 @@ namespace __detail
return __table;
   }
 
-#if __cpp_lib_constexpr_charconv
-  template
-inline constexpr auto __table = __from_chars_alnum_to_val_table();
-#endif
+  // This initializer is made superficially dependent in order
+  // to prevent the compiler from wastefully constructing the
+  // table ahead of time when it's not needed.
+  static constexpr type value = (_DecOnly, _S_table());
+};
 
   // If _DecOnly is true: if the character is a decimal digit, then
   // return its corresponding base-10 value, otherwise return a value >= 127.
@@ -449,16 +455,7 @@ namespace __detail
   if _GLIBCXX17_CONSTEXPR (_DecOnly)
return static_cast<unsigned char>(__c - '0');
   else
-   {
-#if __cpp_lib_constexpr_charconv
- if (std::__is_constant_evaluated())
-   return __table<_DecOnly>.__data[__c];
-#endif
- // This initializer is deliberately made dependent in order to work
- // around modules bug PR105322.
- static constexpr auto __table = (_DecOnly, __from_chars_alnum_to_val_table());
- return __table.__data[__c];
-   }
+   return __from_chars_alnum_to_val_table<_DecOnly>::value.__data[__c];
 }
 
   /// std::from_chars implementation for integers in a power-of-two base.
-- 
2.38.0.68.ge85701b4af



[PATCH] libstdc++, v3: Partial library support for std::float{16,32,64,128}_t and std::bfloat16_t

2022-10-17 Thread Jakub Jelinek via Gcc-patches
Hi!

On Mon, Oct 17, 2022 at 02:07:00PM +0100, Jonathan Wakely wrote:
> Yes, that's now https://cplusplus.github.io/LWG/issue3790
> The current proposed resolution is to just restore the C++20 functions
> and not provide anything for the new types.

Ok.

> > If you want to have <cmath> done in a different way, e.g. the patch
> > groups a lot of different function overloads by the floating point type,
> > is that ok or do you want to have them one function at a time for all types,
> > then next?
> 
> No, I think this way makes more sense. Otherwise the line count in the
> file will balloon with all the repeated #if #endif directives.

Ok, changed.
I've also changed this in limits and std_abs.h (ditto for
_GLIBCXX_USE_CONSTEXPR, _GLIBCXX_USE_NOEXCEPT).

There is one thing I'm not sure about but can be handled incrementally.
What exactly is is_iec559 supposed to be?
Currently for float/double/long double/__float128 it seems to be defined
to true if the type has Inf, qNaN and denormals.
For std::float{16,32,64,128}_t even a note in the spec says they are
true.
Shall it be true only if the type is actually an IEEE754 type
(binary16/32/64/128) and false otherwise, or that + the x86 extended type?
Or if it is IEEE754-like and shall it be true also for
std::bfloat16_t?
Yet another case is the IBM double double, which has infinities, NaNs
and denormals, but for that one it is hard to claim it is even IEEE754-like
(variable precision).
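
To make the question concrete, here is a small sketch that collects the properties mentioned above for a given type. The `fp_traits` struct and `probe` helper are hypothetical names introduced for illustration; only the std::numeric_limits members are standard. Subnormal support is inferred from denorm_min() being smaller than min(), which is an assumption of this sketch rather than the library's own definition.

```cpp
#include <limits>

// Hypothetical summary record; not part of any library.
struct fp_traits
{
  bool is_iec559;
  bool has_inf;
  bool has_qnan;
  bool has_subnormals;
};

// Gather the std::numeric_limits properties discussed for is_iec559.
template<typename T>
constexpr fp_traits
probe ()
{
  using L = std::numeric_limits<T>;
  return { L::is_iec559, L::has_infinity, L::has_quiet_NaN,
           L::denorm_min () < L::min () };
}
```

Calling `probe<long double> ()` on a target with IBM double double or the x86 extended type shows what the library currently reports for formats that are not binary16/32/64/128.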

> > I could try to handle <complex> too, but am kind of lost there.
> > The paper dropped the explicit std::complex specializations, can they stay
> > around as is and should new overloads be added for the
> > _Float*/__gnu_cxx::__bfloat16_t types?
> 
> The explicit specializations can stay, they do no harm.

Ok.  Shall those specializations also get the P1467 changes for the ctors?
Shall we also have specializations for the extended floating point types,
or only conditionally (say when float is binary have _Complex _Float32
so that we get better code)?

> I can take care of the <complex> changes.

Ok.

> > And I/O etc. support is missing, not sure I'm able to handle that and if it
> > is e.g. possible to keep that support out of libstdc++.so.6, because what
> > extended floating point types one has on a particular arch could change over
> > time (I mean e.g. bfloat16_t support or float16_t support can be added
> > etc.).
> 
> Yes, I think we can add the I/O functions as always_inline because all
> they're going to do is convert the argument to float, double, or long
> double and then call the existing overloads. There will be no new
> virtual functions.
> 
> I can take care of that too.

Thanks.

Here is an updated patch that I'll test overnight (but can't commit
until the builtins patch is reviewed as it depends on that;
well, I could comment out the std::float128_t cmath support if
long double is not IEEE quad and commit that only once the builtins
patch is in).

2022-10-17  Jakub Jelinek  

* include/std/stdfloat: New file.
* include/std/numbers (__glibcxx_numbers): Define and use it
for __float128 explicit instantiations as well as
_Float{16,32,64,128} and __gnu_cxx::__bfloat16_t.
* include/std/atomic (atomic<_Float16>, atomic<_Float32>,
atomic<_Float64>, atomic<_Float128>, atomic<__gnu_cxx::__bfloat16_t>):
New explicit instantiations.
* include/std/type_traits (__is_floating_point_helper<_Float16>,
__is_floating_point_helper<_Float32>,
__is_floating_point_helper<_Float64>,
__is_floating_point_helper<_Float128>,
__is_floating_point_helper<__gnu_cxx::__bfloat16_t>): Likewise.
* include/std/limits (__glibcxx_concat3_, __glibcxx_concat3,
__glibcxx_float_n): Define.
(numeric_limits<_Float16>, numeric_limits<_Float32>,
numeric_limits<_Float64>, numeric_limits<_Float128>,
numeric_limits<__gnu_cxx::__bfloat16_t>): New explicit instantiations.
* include/bits/std_abs.h (abs): New overloads for
_Float{16,32,64,128} and __gnu_cxx::__bfloat16_t.
* include/bits/c++config (_GLIBCXX_LDOUBLE_IS_IEEE_BINARY128): Define
if long double is IEEE quad.
(__gnu_cxx::__bfloat16_t): New using.
* include/c_global/cmath (acos, asin, atan, atan2, ceil, cos, cosh,
exp, fabs, floor, fmod, frexp, ldexp, log, log10, modf, pow, sin,
sinh, sqrt, tan, tanh, fpclassify, isfinite, isinf, isnan, isnormal,
signbit, isgreater, isgreaterequal, isless, islessequal,
islessgreater, isunordered, acosh, asinh, atanh, cbrt, copysign, erf,
erfc, exp2, expm1, fdim, fma, fmax, fmin, hypot, ilogb, lgamma,
llrint, llround, log1p, log2, logb, lrint, lround, nearbyint,
nextafter, remainder, rint, round, scalbln, scalbn, tgamma, trunc,
lerp): New overloads with _Float{16,32,64,128} or
__gnu_cxx::__bfloat16_t types.
* config/os/gnu-linux/os_defines.h (_GLIBCXX_HAVE_FLOAT128_MATH):
Define if 

[PATCH] middle-end, v4: IFN_ASSUME support [PR106654]

2022-10-17 Thread Jakub Jelinek via Gcc-patches
Hi!

On Mon, Oct 17, 2022 at 06:55:40AM +, Richard Biener wrote:
> > That is what I wrote in the patch description as alternative:
> > "with the condition wrapped into a GIMPLE_BIND (I admit the above isn't
> > extra clean but it is just something to hold it from gimplifier until
> > gimple low pass; it resembles if (condition_never_true) { cond; };
> > an alternative would be introduce GOMP_ASSUME statement that would have
> > the guard var as operand and the GIMPLE_BIND as body, but for the
> > few passes (tree-nested and omp lowering) in between that looked like
> > an overkill to me)"
> > I can certainly implement that easily.
> 
> I'd prefer that, it looks possibly less messy.

Ok, introduced GIMPLE_ASSUME for this then.

> Ah, they are all in all_passes :/  Maybe we can add something
> like TODO_discard_function (or a property) that will not discard
> the function but stop compiling it?  I wonder if all cleanup
> is properly done for the function -  I suppose we want to keep the
> body around for callers indefinitely.
> 
> > What I had in the patch was just skip pass_expand
> > and pass_rest_of_compilation, perhaps another possibility
> > to do the former would be to define a gate on pass_expand.
> 
> Or some gate in the pass manager, like in override_gate_status
> check fun->properties & PROP_suspended and have some
> pass_suspend_assume add that property for all assume function
> bodies.
> 
> In case you like any of the above give it a shot, otherwise what
> you have isn't too bad, I just wondered if there's a nicer way.

Turns out we already had TODO_discard_function, so I've reused it
with the detail that assume function's bodies aren't actually dropped,
and a new pass which returns that and where I'd like to have the
backwards range walk implemented.

> I suppose for now adding noipa is easiest, we'd still inline into
> the body of course.

Ok, done.

On Fri, Oct 14, 2022 at 11:27:07AM +, Richard Biener wrote:
> > @@ -237,6 +244,383 @@ lower_omp_directive (gimple_stmt_iterato
> >gsi_next (gsi);
> >  }
> 
> comment missing

Added.

> > +static tree
> > +create_assumption_fn (location_t loc)
> > +{
> > +  tree name = clone_function_name_numbered (current_function_decl, "_assume");
> > +  /* For now, will be changed later.  */
> 
> ?

I've tried error_mark_node as the type, but that didn't work out, get
various ICEs with it even for the short time before it is fixed up.
But changed that to
  tree type = build_varargs_function_type_list (boolean_type_node, NULL_TREE);
which is closer to what it ends up later.

> > +  DECL_FUNCTION_VERSIONED (decl)
> > += DECL_FUNCTION_VERSIONED (current_function_decl);
> 
> what does it mean to copy DECL_FUNCTION_VERSIONED here?

Dropped.

> > + && !is_gimple_val (vargs[i - sz]))
> 
> a few comments might be helpful here

Added some to various places in that function.
> > @@ -1490,7 +1494,14 @@ public:
> >  
> >/* opt_pass methods: */
> >opt_pass * clone () final override { return new pass_slp_vectorize (m_ctxt); }
> > -  bool gate (function *) final override { return flag_tree_slp_vectorize != 0; }
> > +  bool gate (function *fun) final override
> > +  {
> > +/* Vectorization makes range analysis of assume functions harder,
> > +   not easier.  */
> 
> Can we split out these kind of considerations from the initial patch?

Dropped for now.

> Reading some of the patch I guessed you wanted to handle nested
> assumes.  So - is
> 
> [[assume (a == 4 && ([[assume(b == 3)]], b != 2))]]
> 
> a thing?

Added 2 tests for nested assumptions (one with a simple assumption
nested in a complex one, one with a side-effects assumption nested in a
complex one).
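
For readers unfamiliar with the attribute, a minimal sketch of plain (non-nested) assumptions follows. The `ASSUME` macro and the `scale` function are hypothetical; the macro only falls back to a no-op on compilers that do not report support for `[[assume]]`. It does not reproduce the nested form from the tests mentioned above.

```cpp
// Use the C++23 attribute when the compiler reports it, else do nothing.
#if defined(__has_cpp_attribute)
# if __has_cpp_attribute(assume)
#  define ASSUME(expr) [[assume(expr)]]
# endif
#endif
#ifndef ASSUME
# define ASSUME(expr) ((void) 0)
#endif

// The caller promises a == 4 and b == 3; calling this with other values
// is undefined behavior when the attribute is in effect.
int
scale (int a, int b)
{
  ASSUME (a == 4);
  ASSUME (b == 3);
  return a * 10 + b;
}
```

The assumed expressions are never evaluated, so an optimizer may use them as facts but must not emit code for them.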

So far lightly tested, will test fully overnight.

2022-10-17  Jakub Jelinek  

PR c++/106654
gcc/
* gimple.def (GIMPLE_ASSUME): New statement kind.
* gimple.h (struct gimple_statement_assume): New type.
(is_a_helper ::test,
is_a_helper ::test): New.
(gimple_build_assume): Declare.
(gimple_has_substatements): Return true for GIMPLE_ASSUME.
(gimple_assume_guard, gimple_assume_set_guard,
gimple_assume_guard_ptr, gimple_assume_body_ptr, gimple_assume_body):
New inline functions.
* gsstruct.def (GSS_ASSUME): New.
* gimple.cc (gimple_build_assume): New function.
(gimple_copy): Handle GIMPLE_ASSUME.
* gimple-pretty-print.cc (dump_gimple_assume): New function.
(pp_gimple_stmt_1): Handle GIMPLE_ASSUME.
* gimple-walk.cc (walk_gimple_op): Handle GIMPLE_ASSUME.
* omp-low.cc (WALK_SUBSTMTS): Likewise.
(lower_omp_1): Likewise.
* omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
Likewise.
* tree-cfg.cc (verify_gimple_stmt, verify_gimple_in_seq_2): Likewise.
* function.h (struct function): Add assume_function bitfield.
* gimplify.cc (gimplify_call_expr): If the assumption isn't
simple enough, 

Re: [PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)

2022-10-17 Thread Segher Boessenkool
Hi!

Everything Ke Wen said.  Some more comments / hints:

On Mon, Sep 19, 2022 at 11:05:17AM -0500, will schmidt wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
> @@ -0,0 +1,9 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR7 and ARCH_PWR8 defines gets set
> + * when we specify power7, plus options.
> +/* This is a variation of the test at issue in GCC PR 101865 */

Please don't start comment lines with stars.  And don't start a comment
inside of another comment :-)

> +/* { dg-options "-dM -E -mdejagnu-cpu=power7 -mno-vsx" } */
> +/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR7 
> 1($|\\n)"  } } */

REs are easier to read and write if you write them inside {} instead of
inside "".

Whenever you see  (^|\n)  it should hint you to use newline-sensitive
matching.  Like
  {(?n)^#define _ARCH_PWR7 1$}
(it makes ^ and $ match the begin/end of lines instead of the string,
and makes . and similar not match newlines).
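
Putting these hints together, the header of the first test might look roughly like the sketch below. It only rewrites the lines quoted above (no leading stars, no nested comment, brace-quoted RE with (?n)); the remaining lines of the test are not shown in this mail and are left out. The block is comment-only, so it is valid in both C and C++.

```cpp
/* { dg-do preprocess } */
/* Test whether the ARCH_PWR7 and ARCH_PWR8 defines get set
   when we specify power7, plus options.
   This is a variation of the test at issue in GCC PR 101865.  */
/* { dg-options "-dM -E -mdejagnu-cpu=power7 -mno-vsx" } */
/* { dg-final { scan-file predefine_p7-novsx.i {(?n)^#define _ARCH_PWR7 1$} } } */
```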

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
> @@ -0,0 +1,10 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
> +   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
> +/* This is the primary test at issue in GCC PR 101865 */
> +/* { dg-options "-dM -E -mdejagnu-cpu=power9 -mno-vsx" } */
> +/* {xfail *-*-*} */

An xfail always needs a comment :-)


Segher


Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-17 Thread Richard Earnshaw via Gcc-patches




On 14/10/2022 09:34, Haochen Jiang via Gcc-patches wrote:

gcc/ChangeLog:

* builtins.cc (expand_builtin_prefetch): Handle the fourth parameter in
expand function.
* config/aarch64/aarch64-sve.md: Add default parameter value.
* config/aarch64/aarch64.md (prefetch): New define_expand.
(*prefetch): Add default parameter value.
* config/alpha/alpha.md (prefetch): New define_expand.
(*prefetch): Add default parameter value.
* config/arc/arc.md: Add default parameter value.
* config/arm/arm.md (prefetch): New define_expand.
(*prefetch): Add default parameter value.
* config/frv/frv.md: Ditto.
* config/i386/i386.md: Ditto.
* config/ia64/ia64.md (prefetch): New define_expand.
(*prefetch): Add default parameter value.
* config/mips/mips.md (prefetch): New define_expand.
(*prefetch): Add default parameter value.
* config/pa/pa.md: Ditto.
* config/rs6000/rs6000.md (prefetch): New define_expand.
(*prefetch): Add default parameter value.
* config/s390/s390.cc (s390_expand_cpymem): Generate fourth parameter for
gen_prefetch call.
(s390_expand_setmem): Ditto.
(s390_expand_cmpmem): Ditto.
* config/s390/s390.md (prefetch): New define_expand.
(*prefetch): Add default parameter value.
* config/sh/sh.md: Ditto.
* config/sparc/sparc.md: Ditto.
* doc/rtl.texi: Document cache variable for prefetch.
* rtl.def (PREFETCH): Change prefetch DEF_RTL_EXPR to add fourth parameter.
* rtlanal.cc (setup_reg_subrtx_bounds): Change gcc_checking_assert for
fourth parameter.
* target-insns.def (prefetch): Add fourth rtx for prefetch.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/builtin-prefetch-1.c: Add fourth parameter for
testcases.
* gcc.c-torture/execute/builtin-prefetch-2.c: Ditto.
* gcc.c-torture/execute/builtin-prefetch-3.c: Ditto.
* gcc.c-torture/execute/builtin-prefetch-4.c: Ditto.
* gcc.c-torture/execute/builtin-prefetch-5.c: Ditto.
* gcc.c-torture/execute/builtin-prefetch-6.c: Ditto.
* gcc.dg/builtin-prefetch-1.c: Ditto.
* gcc.misc-tests/i386-pf-3dnow-1.c: Ditto.
* gcc.misc-tests/i386-pf-athlon-1.c: Ditto.
* gcc.misc-tests/i386-pf-none-1.c: Ditto.
* gcc.misc-tests/i386-pf-sse-1.c: Ditto.
* gcc.target/i386/avx-1.c: Change prefetch macro define to variable args.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/aarch64/prefetchi-1.c: New test.
* gcc.target/alpha/prefetchi-1.c: Ditto.
* gcc.target/arc/prefetchi-1.c: Ditto.
* gcc.target/arm/prefetchi-1.c: Ditto.
* gcc.target/hppa/prefetchi-1.c: Ditto.
* gcc.target/i386/prefetchi-1.c: Ditto.
* gcc.target/ia64/prefetchi-1.c: Ditto.
* gcc.target/mips/prefetchi-1.c: Ditto.
* gcc.target/powerpc/prefetchi-1.c: Ditto.
* gcc.target/s390/prefetchi-1.c: Ditto.
* gcc.target/sh/prefetchi-1.c: Ditto.
* gcc.target/sparc/prefetchi-1.c: Ditto.
---
  gcc/builtins.cc   |  34 --
  gcc/config/aarch64/aarch64-sve.md |  15 ++-
  gcc/config/aarch64/aarch64.md |  19 +++-
  gcc/config/alpha/alpha.md |  19 +++-
  gcc/config/arc/arc.md |  20 +++-
  gcc/config/arm/arm.md |  19 +++-
  gcc/config/frv/frv.md |   6 +-
  gcc/config/i386/i386.md   |  17 ++-
  gcc/config/ia64/ia64.md   |  19 +++-
  gcc/config/mips/mips.md   |  22 +++-
  gcc/config/pa/pa.md   |  12 +-
  gcc/config/rs6000/rs6000.md   |  19 +++-
  gcc/config/s390/s390.cc   |  10 +-
  gcc/config/s390/s390.md   |  19 +++-
  gcc/config/sh/sh.md   |  15 ++-
  gcc/config/sparc/sparc.md |  15 ++-
  gcc/doc/rtl.texi  |   6 +-
  gcc/rtl.def   |   5 +-
  gcc/rtlanal.cc|   2 +-
  gcc/target-insns.def  |   2 +-
  .../execute/builtin-prefetch-1.c  |  45 
  .../execute/builtin-prefetch-2.c  | 106 +-
  .../execute/builtin-prefetch-3.c  |  92 +++
  .../execute/builtin-prefetch-4.c  |  44 
  .../execute/builtin-prefetch-5.c  |  12 +-
  .../execute/builtin-prefetch-6.c  |   4 +-
  gcc/testsuite/gcc.dg/builtin-prefetch-1.c |   5 +-
  .../gcc.misc-tests/i386-pf-3dnow-1.c  |  16 +--
  .../gcc.misc-tests/i386-pf-athlon-1.c |  16 +--
  gcc/testsuite/gcc.misc-tests/i386-pf-none-1.c |  16 +--
  

Re: [GCC][PATCH] arm: Add cde feature support for Cortex-M55 CPU.

2022-10-17 Thread Bernhard Reutner-Fischer via Gcc-patches
On 17 October 2022 15:29:33 CEST, Christophe Lyon via Gcc-patches 
 wrote:
>Hi Srinath,
>
>
>On 10/10/22 10:20, Srinath Parvathaneni via Gcc-patches wrote:
>> Hi,
>> 
>> This patch adds cde feature (optional) support for Cortex-M55 CPU, please 
>> refer
>> [1] for more details. To use this feature we need to specify +cdecpN
>> (e.g. -mcpu=cortex-m55+cdecp), where N is the coprocessor number 0 to 7.
>> 
>> Bootstrapped for arm-none-linux-gnueabihf target, regression tested
>> on arm-none-eabi target and found no regressions.
>> 
>> [1] https://developer.arm.com/documentation/101051/0101/?lang=en (version: 
>> r1p1).
>> 
>> Ok for master?
>> 
>> Regards,
>> Srinath.
>> 
>> gcc/ChangeLog:
>> 
>> 2022-10-07  Srinath Parvathaneni  
>> 
>>  * common/config/arm/arm-common.cc (arm_canon_arch_option_1): Ignore 
>> cde
>>  options for mlibarch.
>>  * config/arm/arm-cpus.in (begin cpu cortex-m55): Add cde options.
>>  * doc/invoke.texi (CDE): Document options for Cortex-M55 CPU.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2022-10-07  Srinath Parvathaneni  
>> 
>>  * gcc.target/arm/multilib.exp: Add multilib tests for Cortex-M55 
>> CPU.
>> 
>> 
>> ### Attachment also inlined for ease of reply
>> ###
>> 
>> 
>> diff --git a/gcc/common/config/arm/arm-common.cc 
>> b/gcc/common/config/arm/arm-common.cc
>> index 
>> c38812f1ea6a690cd19b0dc74d963c4f5ae155ca..b6f955b3c012475f398382e72c9a3966412991ec
>>  100644
>> --- a/gcc/common/config/arm/arm-common.cc
>> +++ b/gcc/common/config/arm/arm-common.cc
>> @@ -753,6 +753,15 @@ arm_canon_arch_option_1 (int argc, const char **argv, 
>> bool arch_for_multilib)
>> arm_initialize_isa (target_isa, selected_cpu->common.isa_bits);
>> arm_parse_option_features (target_isa, &selected_cpu->common,
>>   strchr (cpu, '+'));
>> +  if (arch_for_multilib)
>> +{
>> +  const enum isa_feature removable_bits[] = {ISA_IGNORE_FOR_MULTILIB,
>> + isa_nobit};
>> +  sbitmap isa_bits = sbitmap_alloc (isa_num_bits);
>> +  arm_initialize_isa (isa_bits, removable_bits);
>> +  bitmap_and_compl (target_isa, target_isa, isa_bits);
>> +}
>> +
>
>I can see the piece of code you add here is exactly the same as the one a few 
>lines above when handling "if (arch)". Can this be moved below and thus be 
>common to the two cases, or does it have to be performed before bitmap_ior of 
>fpu_isa?
>
>Also, IIUC, CDE was already optional for other CPUs (M33, M35P, star-mc1), so 
>the hunk above fixes a latent bug when handling multilibs for these CPUs too? 
>If so, maybe worth splitting the patch into two parts since the above is not 
>strictly related to M55?
>
>But I'm not a maintainer ;-)

Don't you have to sbitmap_free the thing, short of using an auto_sbitmap?

thanks,
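
To illustrate the concern in general terms, here is a sketch using hypothetical stand-ins (`toy_bitmap`, `toy_bitmap_alloc`, `toy_bitmap_free`, `auto_toy_bitmap`); these are not GCC's sbitmap API. A manually allocated object leaks unless every path frees it, whereas a scoped owner releases it automatically.

```cpp
#include <memory>

// Hypothetical toy type and allocator; NOT the GCC sbitmap interface.
struct toy_bitmap { unsigned n_bits; };

static int live_bitmaps = 0;   // Counts allocations not yet freed.

toy_bitmap *
toy_bitmap_alloc (unsigned n_bits)
{
  ++live_bitmaps;
  return new toy_bitmap{ n_bits };
}

void
toy_bitmap_free (toy_bitmap *b)
{
  delete b;
  --live_bitmaps;
}

// Scoped owner that calls toy_bitmap_free on destruction.
struct toy_bitmap_deleter
{
  void operator() (toy_bitmap *b) const { toy_bitmap_free (b); }
};
using auto_toy_bitmap = std::unique_ptr<toy_bitmap, toy_bitmap_deleter>;

// Manual management with the free forgotten: the bitmap leaks
// (deliberately, to show the problem).
unsigned
manual_use (unsigned n_bits)
{
  toy_bitmap *b = toy_bitmap_alloc (n_bits);
  return b->n_bits;
}

// Scoped management: freed when B goes out of scope.
unsigned
scoped_use (unsigned n_bits)
{
  auto_toy_bitmap b (toy_bitmap_alloc (n_bits));
  return b->n_bits;
}
```

After `scoped_use` returns, `live_bitmaps` is unchanged, while each call to `manual_use` leaves it one higher.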

>
>Thanks,
>
>Christophe
>
>> if (fpu && strcmp (fpu, "auto") != 0)
>>  {
>>/* The easiest and safest way to remove the default fpu
>> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
>> index 
>> 5a63bc548e54dbfdce5d1df425bd615d81895d80..aa02c04c4924662f3ddd58e6967392ba3f4b4a87
>>  100644
>> --- a/gcc/config/arm/arm-cpus.in
>> +++ b/gcc/config/arm/arm-cpus.in
>> @@ -1633,6 +1633,14 @@ begin cpu cortex-m55
>>option nomve remove mve mve_float
>>option nofp remove ALL_FP mve_float
>>option nodsp remove MVE mve_float
>> + option cdecp0 add cdecp0
>> + option cdecp1 add cdecp1
>> + option cdecp2 add cdecp2
>> + option cdecp3 add cdecp3
>> + option cdecp4 add cdecp4
>> + option cdecp5 add cdecp5
>> + option cdecp6 add cdecp6
>> + option cdecp7 add cdecp7
>>isa quirk_no_asmcpu quirk_vlldm
>>costs v7m
>>vendor 41
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 
>> aa5655764a0360959f9c1061749d2cc9ebd23489..26857f7a90e42d925bc6908686ac78138a53c4ad
>>  100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -21698,6 +21698,10 @@ floating-point instructions on @samp{cortex-m55}.
>>   Disable the M-Profile Vector Extension (MVE) single precision 
>> floating-point
>>   instructions on @samp{cortex-m55}.
>>   +@item +cdecp0, +cdecp1, ... , +cdecp7
>> +Enable the Custom Datapath Extension (CDE) on selected coprocessors 
>> according
>> +to the numbers given in the options in the range 0 to 7 on 
>> @samp{cortex-m55}.
>> +
>>   @item  +nofp
>>   Disables the floating-point instructions on @samp{arm9e},
>>   @samp{arm946e-s}, @samp{arm966e-s}, @samp{arm968e-s}, @samp{arm10e},
>> diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
>> b/gcc/testsuite/gcc.target/arm/multilib.exp
>> index 
>> 2fa648c61dafebb663969198bf7849400a7547f6..7a977bff58b7b68bfe9e49d7602989a39caa6534
>>  100644
>> --- a/gcc/testsuite/gcc.target/arm/multilib.exp
>> +++ b/gcc/testsuite/gcc.target/arm/multilib.exp
>> @@ -851,6 +851,18 @@ if {[multilib_config "rmprofile"] } {
>>  {-mcpu=cortex-m55+nomve+nofp -mfpu=auto 

RE: [EXTERNAL] RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors)

2022-10-17 Thread Eugene Rozenfeld via Gcc-patches
Yes, I received that one. The root cause is the -gstatement-frontiers issue 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 . I submitted a workaround 
patch for that ( 
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603673.html ) but it 
hasn't been approved yet. Another workaround is passing 
-gno-statement-frontiers to the test.

Eugene

-----Original Message-----
From: Thomas Schwinge  
Sent: Monday, October 17, 2022 3:46 AM
To: Eugene Rozenfeld 
Cc: haochen.ji...@intel.com; gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org
Subject: [EXTERNAL] RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

Hi!

On 2022-10-17T06:11:01+, "Jiang, Haochen via Gcc-patches" 
 wrote:
> I just checkout to your commit and the test still got failed.
>
> It is reporting like this:
> xgcc: error: 
> /export/users2/haochenj/src/gcc/master/./libgomp/testsuite/libgomp.oac
> c-c++/../libgomp.oacc-c-c++-common/kernels-loop-g.c: '-fcompare-debug' 
> failure (length)

Right.  I had filed 

 "[13 Regression]
c-c++-common/goacc/kernels-loop-g.c: '-fcompare-debug' failure (length)"
with you in CC: .  (Have you not received that
one?)


Grüße
 Thomas


> Also fix a typo in manually sending, should be this to reproduce
>
> To reproduce:
>
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
>
> BRs,
> Haochen
>
> From: Jiang, Haochen
> Sent: Monday, October 17, 2022 1:41 PM
> To: Eugene Rozenfeld ; 
> gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org
> Subject: RE: [r13-3172 Regression] 
> FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 
> (test for excess errors) on Linux/x86_64
>
> If that has been fixed, just ignore that mail.
>
> It is run through by a script and got the result few days ago. 
> However, the sendmail service was down on that machine and I just 
> noticed that issue. So I sent that result manually today in case that is not 
> fixed.
>
> Sorry for the disturb!
>
> BRs,
> Haochen
>
> From: Eugene Rozenfeld <eugene.rozenf...@microsoft.com>
> Sent: Monday, October 17, 2022 1:23 PM
> To: Jiang, Haochen <haochen.ji...@intel.com>; gcc-patches@gcc.gnu.org; 
> gcc-regress...@gcc.gnu.org
> Subject: RE: [r13-3172 Regression] 
> FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 
> (test for excess errors) on Linux/x86_64
>
> That commit had a bug that was fixed in 
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee
>
> Was that fix included in your GCC build?
>
> From: Jiang, Haochen <haochen.ji...@intel.com>
> Sent: Sunday, October 16, 2022 8:09 PM
> To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld 
> <eugene.rozenf...@microsoft.com>; Jiang, Haochen 
> <haochen.ji...@intel.com>; gcc-regress...@gcc.gnu.org
> Subject: [EXTERNAL] [r13-3172 Regression] 
> FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 
> (test for excess errors) on Linux/x86_64
>

RE: [EXTERNAL] Re: [PATCH] Don't print discriminators for -fcompare-debug.

2022-10-17 Thread Eugene Rozenfeld via Gcc-patches
Yes, -gstatement-frontiers is the root cause here but the new approach to 
discriminators is especially prone to this. I added the workaround to pr85213.c 
in my original discriminator patch but now two more -fcompare-debug bugs were 
opened (PR107231 and PR107169). I suspect we'll keep getting more. So I'd like 
to disable printing discriminators in -fcompare-debug dumps until the 
-gstatement-frontiers issue is fixed.

Eugene
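
As a rough, self-contained sketch of what the proposed patch does (with hypothetical names standing in for TDF_COMPARE_DEBUG and the insn location data; this is not the print-rtl.cc code): the location is always printed, but the discriminator is omitted when the dump is meant for comparison.

```cpp
#include <string>

// Hypothetical flag; stands in for a dump flag such as TDF_COMPARE_DEBUG.
enum dump_flag : unsigned { DUMP_COMPARE_DEBUG = 1u << 0 };

// Hypothetical location record for one instruction.
struct insn_loc
{
  std::string file;
  int line;
  int column;
  int discriminator;
};

// Format a location; skip the discriminator for comparison dumps, since
// its value may differ between compilations with and without debug info.
std::string
format_loc (const insn_loc &loc, unsigned flags)
{
  std::string out = " \"" + loc.file + "\":" + std::to_string (loc.line)
                    + ":" + std::to_string (loc.column);
  if (!(flags & DUMP_COMPARE_DEBUG) && loc.discriminator)
    out += " discrim " + std::to_string (loc.discriminator);
  return out;
}
```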

-----Original Message-----
From: Richard Biener  
Sent: Monday, October 17, 2022 12:06 AM
To: Eugene Rozenfeld 
Cc: gcc-patches@gcc.gnu.org; Jason Merrill 
Subject: [EXTERNAL] Re: [PATCH] Don't print discriminators for -fcompare-debug.

On Sun, Oct 16, 2022 at 10:25 PM Eugene Rozenfeld via Gcc-patches 
 wrote:
>
> With -gstatement-frontiers we may end up with different IR coming from 
> the front end with and without debug information turned on.
> See 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733
>  for details.
> That may result in differences in discriminator values and 
> -fcompare-debug failures.
>
> This patch disables printing of discriminators when the dump is 
> intended for -fcompare-debug comparison and reverses the workaround in a test.

I don't think this is the correct approach.  -gstatement-frontiers is known to 
be prone to these issues and is the one to blame here.  I think the bugs should 
be SUSPENDED until -gstatement-frontiers is fixed or at least disabled by 
default (IIRC Jakub tried that but failed last time)

> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
> PR debug/107231
> PR debug/107169
> * print-rtl.cc (print_rtx_operand_code_i): Don't print discriminators
> for -fcompare-debug.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/ubsan/pr85213.c: Reverse the workaround for 
> discriminators.
> ---
>  gcc/print-rtl.cc   | 13 ++---
>  gcc/testsuite/c-c++-common/ubsan/pr85213.c |  7 +--
>  2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc index 
> e115f987173..0476f3d7e79 100644
> --- a/gcc/print-rtl.cc
> +++ b/gcc/print-rtl.cc
> @@ -453,10 +453,17 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, 
> int idx)
>   expanded_location xloc = insn_location (in_insn);
>   fprintf (m_outfile, " \"%s\":%i:%i", xloc.file, xloc.line,
>xloc.column);
> - int discriminator = insn_discriminator (in_insn);
> -   if (discriminator)
> - fprintf (m_outfile, " discrim %d", discriminator);
>
> + /* Don't print discriminators for -fcompare-debug since the IR
> +coming from the front end may be different with and without
> +debug information turned on. That may result in different
> +discriminator values. */
> + if (!(dump_flags & TDF_COMPARE_DEBUG))
> +   {
> + int discriminator = insn_discriminator (in_insn);
> + if (discriminator)
> +   fprintf (m_outfile, " discrim %d", discriminator);
> +   }
> }
>  #endif
>  }
> diff --git a/gcc/testsuite/c-c++-common/ubsan/pr85213.c 
> b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
> index e903e976f2c..8a6be81d20f 100644
> --- a/gcc/testsuite/c-c++-common/ubsan/pr85213.c
> +++ b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
> @@ -1,11 +1,6 @@
>  /* PR sanitizer/85213 */
>  /* { dg-do compile } */
> -/* Pass -gno-statement-frontiers to work around
> -   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733:
> -   without it the IR coming from the front end may be different with and 
> without
> -   debug information turned on. That may cause e.g., different discriminator 
> values
> -   and -fcompare-debug failures. */
> -/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug 
> -gno-statement-frontiers" } */
> +/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug" } */
>
>  int
>  foo (int x)
> --
> 2.25.1


Re: Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.)

2022-10-17 Thread Thomas Schwinge
Hi!

On 2022-10-17T15:58:47+0200, Aldy Hernandez  wrote:
> On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge  
> wrote:
>> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches 
>>  wrote:
>> > When solving 0 = _15 & 1, we calculate _15 as:
>> >
>> >   [irange] int [-INF, -2][0, +INF] NONZERO 0xfffe
>> >
>> > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
>> > the above, yielding:
>> >
>> >   [0, 1] NONZERO 0x0
>> >
>> > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
>> >
>> > This is problematic because here we have a bool which is zero, but
>> > returns false for irange::zero_p, since the latter does not look at
>> > nonzero bits.  This causes logical_combine to assume the range is
>> > not-zero, and all hell breaks loose.
>> >
>> > I think we should just normalize a nonzero mask of 0 to [0, 0] at
>> > creation, thus avoiding all this.
>>
>> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
>> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
>> offloading test case:
>>
>> UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
>> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
>> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
>> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
>> (test for excess errors)
>> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
>> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
>> execution test
>> [-PASS:-]{+FAIL:+} 
>> libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
>> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   
>> scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* 
>> [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
>>
>> Same for C++.
>>
>> I'll later send a patch (for the test case!) to fix that up.
>>
>> 2. Looking into this, I found that this
>> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
>> "[PR107195] Set range to zero when nonzero mask is 0" actually enables a
>> code transformation/optimization that GCC apparently has not been doing
>> before!  I've tried to capture that in the attached
>> "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".
>
> Nice.
>
>> Will you please verify that one?  In its current '#if 1' configuration,
>> it's all-PASS after commit
>> r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
>> "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we
>> get two calls to 'foo', because GCC apparently didn't understand the
>> relation (optimization opportunity) between 'r *= 2;' and the subsequent
>> 'if (r & 1)'.
>
> Yeah, that looks correct.  We keep better track of nonzero masks.

OK, next observation: this also works for split-up expressions
'if ((r & 2) && (r & 1))' (same rationale as for 'if (r & 1)' alone).
I've added such a variant in my test case.

But: it doesn't work for logically equal 'if (r & 3)'.  I've added such
an XFAILed variant in my test case.  Do you have guidance what needs to
be done to make such cases work, too?

>> I've left in the other '#if' variants in case you'd like to experiment
>> with these, but would otherwise clean that up before pushing.
>>
>> Where does one put such a test case?
>>
>> Should the file be named 'pr107195' or something else?
>
> The aforementioned patch already has:
>
> * gcc.dg/tree-ssa/pr107195-1.c: New test.
> * gcc.dg/tree-ssa/pr107195-2.c: New test.
>
> So I would just add a pr107195-3.c test.

But note that unlike yours in 'gcc.dg/tree-ssa/', I had put mine into
'c-c++-common/torture/'.  That's so that we get C and C++ testing, and
all torture testing flag variants.  (... where we do see the optimization
happen starting at '-O1'.)  Do you think that is excessive, and a single
'gcc.dg/tree-ssa/' test case, C only, '-O1' only is sufficient for this?
(I don't have much experience with test cases in such regions of GCC,
hence these questions.)

>> Do we scan 'optimized', or an earlier dump?
>>
>> At '-O1', the actual code transformation is visible already in the 'dom2'
>> dump:
>>
>> [local count: 536870913]:
>>gimple_assign 
>> +  gimple_assign 
>> +  goto ; [100.00%]
>>
>> -   [local count: 1073741824]:
>> -  # gimple_phi 
>> +   [local count: 536870912]:
>> +  # gimple_phi 
>>gimple_assign 
>>gimple_cond 
>> -goto ; [50.00%]
>> +goto ; [100.00%]
>>else
>> -goto ; [50.00%]
>> +goto ; [0.00%]
>>
>> [local count: 536870913]:
>>gimple_call 
>>gimple_assign 
>>
>> [local count: 1073741824]:
>> -  # gimple_phi 
>> +  # gimple_phi 
>>gimple_return 
>>
>> And, the actual "avoid second call 'foo'" optimization is visible
>> starting 'dom3':
>>
>> [local count: 536870913]:
>>gimple_assign 
>> +  goto ; [100.00%]
>>
>> 

[PATCH] i386: Allow setting target attribute from conditional expression

2022-10-17 Thread J.W. Jagersma via Gcc-patches
Recently I tried to set a function's target attribute conditionally
based on template parameters, eg.:

template <bool enable_sse>
[[gnu::target (enable_sse ? "sse" : "")]]
void func () { /* ... */ }

I then discovered that this is currently not possible.  This small patch
resolves that.

A possible alternative solution is to do this globally, eg. in
decl_attributes.  But doing so would trigger empty-string warnings from
handle_target_attribute, and I don't know how safe it is to remove that.
There likely isn't much use for this with other attributes, anyway.

2022-10-17  Jan W. Jagersma  

gcc/ChangeLog:
* config/i386/i386-options.cc
(ix86_valid_target_attribute_inner_p):  Dereference args string
from ADDR_EXPR.

gcc/testsuite/ChangeLog:
* g++.target/i386/target-attr-conditional.C: New test.
---
 gcc/config/i386/i386-options.cc   |  9 
 .../g++.target/i386/target-attr-conditional.C | 53 +++
 2 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/target-attr-conditional.C

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index acb2291e70f..915f3b0c1f0 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1123,6 +1123,15 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree 
args, char *p_strings[],
 = fndecl == NULL ? UNKNOWN_LOCATION : DECL_SOURCE_LOCATION (fndecl);
   const char *attr_name = target_clone_attr ? "target_clone" : "target";
 
+  args = tree_strip_nop_conversions (args);
+
+  if (TREE_CODE (args) == ADDR_EXPR)
+{
+  /* Attribute string is given by a constexpr function or conditional
+expression.  Dereference ADDR_EXPR, operand should be a STRING_CST.  */
+  args = TREE_OPERAND (args, 0);
+}
+
   /* If this is a list, recurse to get the options.  */
   if (TREE_CODE (args) == TREE_LIST)
 {
diff --git a/gcc/testsuite/g++.target/i386/target-attr-conditional.C 
b/gcc/testsuite/g++.target/i386/target-attr-conditional.C
new file mode 100644
index 000..2d418ed90bf
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/target-attr-conditional.C
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-Wno-psabi -m32 -march=i386 -std=c++20" } */
+
+#pragma GCC push_options
+#pragma GCC target("sse")
+
+typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
+typedef short __v4hi __attribute__ ((__vector_size__ (8)));
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_extract_pi16 (__m64 const __A, int const __N)
+{
+  return (unsigned short) __builtin_ia32_vec_ext_v4hi ((__v4hi)__A, __N);
+}
+
+#pragma GCC pop_options
+
+consteval const char*
+target_string (bool enable_sse)
+{
+  return enable_sse ? "sse" : "";
+}
+
+// Via consteval function
+template <bool enable_sse>
+[[gnu::target (target_string (enable_sse))]]
+int
+extract1 (__m64 const src)
+{
+  if constexpr (enable_sse)
+return _mm_extract_pi16 (src, 0);
+  else
+return reinterpret_cast<__v4hi>(src)[1];
+}
+
+// Via ternary operator
+template <bool enable_sse>
+[[gnu::target (enable_sse ? "sse" : "")]]
+int
+extract2 (__m64 const src)
+{
+  if constexpr (enable_sse)
+return _mm_extract_pi16 (src, 2);
+  else
+return reinterpret_cast<__v4hi>(src)[3];
+}
+
+int
+test (__m64 const src)
+{
+  return extract1(src) + extract1(src)
+   + extract2(src) + extract2(src);
+}
-- 
2.35.1



RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-17 Thread Joshi, Tejas Sanjay via Gcc-patches
[Public]

Hi,

> BTW: Perhaps znver1.md is not the right filename anymore, since it hosts all 
> four Zen schedulers.

I have renamed the file to znver.md in this revision, PFA.
Thank you for the review, we will push it for trunk if we don't get any further 
comments.

Thanks and Regards,
Tejas


0001-Enable-AMD-znver4-support-and-add-instruction-reserv.patch
Description: 0001-Enable-AMD-znver4-support-and-add-instruction-reserv.patch


Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-10-17 Thread Chung-Lin Tang
Ping.

On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
> Hi Tom,
> I had a patch submitted earlier, where I reported that the current way of 
> implementing
> barriers in libgomp on nvptx created a quite significant performance drop on 
> some SPEChpc2021
> benchmarks:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
> 
> That previous patch wasn't accepted well (admittedly, it was kind of a hack).
> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
> 
> Basically, instead of trying to have the GPU do CPU-with-OS-like things that 
> it isn't suited for,
> barriers are implemented simplistically with bar.* synchronization 
> instructions.
> Tasks are processed after threads have joined, and only if team->task_count 
> != 0
> 
> (arguably, there might be a little bit of performance forfeited where earlier 
> arriving threads
> could've been used to process tasks ahead of other threads. But that again 
> falls into requiring
> implementing complex futex-wait/wake like behavior. Really, that kind of 
> tasking is not what target
> offloading is usually used for)
> 
> Implementation highlight notes:
> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in 
> the usual manner)
> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
> 3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
> 
> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
> The main synchronization is done using a 'bar.red' instruction. This 
> reduces across all threads
> the condition (team->task_count != 0), to enable the task processing down 
> below if any thread
> created a task. (this bar.red usage required the need of the second GCC 
> patch in this series)
> 
> This patch has been tested on x86_64/powerpc64le with nvptx offloading, using 
> libgomp, ovo, omptests,
> and sollve_vv testsuites, all without regressions. Also verified that the 
> SPEChpc 2021 521.miniswp_t
> and 534.hpgmgfv_t performance regressions that occurred in the GCC12 cycle 
> has been restored to
> devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
> 
> (also suggest backporting to GCC12 branch, if performance regression can be 
> considered a defect)
> 
> Thanks,
> Chung-Lin
> 
> libgomp/ChangeLog:
> 
> 2022-09-21  Chung-Lin Tang  
> 
>   * config/nvptx/bar.c (generation_to_barrier): Remove.
>   (futex_wait,futex_wake,do_spin,do_wait): Remove.
>   (GOMP_WAIT_H): Remove.
>   (#include "../linux/bar.c"): Remove.
>   (gomp_barrier_wait_end): New function.
>   (gomp_barrier_wait): Likewise.
>   (gomp_barrier_wait_last): Likewise.
>   (gomp_team_barrier_wait_end): Likewise.
>   (gomp_team_barrier_wait): Likewise.
>   (gomp_team_barrier_wait_final): Likewise.
>   (gomp_team_barrier_wait_cancel_end): Likewise.
>   (gomp_team_barrier_wait_cancel): Likewise.
>   (gomp_team_barrier_cancel): Likewise.
>   * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>   prototype, add new static inline function.
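
For anyone skimming the ping, item 4 above can be pictured with a
sequential toy model.  This is not the PTX code from the patch: it
assumes the reduction is an OR over a per-thread predicate, and the
function name is invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a barrier that also reduces a predicate: every thread
   contributes (task_count != 0) and every thread receives the OR of
   all contributions.  Here the "threads" are just array slots.  */
int
team_needs_task_processing (const unsigned *task_count_seen, size_t nthreads)
{
  assert (task_count_seen != NULL);

  int any = 0;
  for (size_t i = 0; i < nthreads; i++)
    any |= (task_count_seen[i] != 0);

  /* Nonzero means: run the task-processing path after the barrier.  */
  return any;
}
```

If no slot reports a pending task, the result is 0 and the
task-processing path is skipped entirely.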


Re: Announcement: Porting the Docs to Sphinx - 9. November 2022

2022-10-17 Thread Paul Iannetta via Gcc-patches
Hi Martin,

Thank you very much for porting the documentation to Sphinx, it is
very convenient to use, especially the menu on the left and the
search bar.

However, I also regularly browse and search the documentation through
info, especially when I want to use regexps to search or need to
include a special character (e.g. +, -, _ or (, as happens when I
search for '(define') in the search string.

Does the port to Sphinx mean the end of texinfo?  Or will both be
available, as is the case now with the official texinfo and your
unofficial splichal.eu pages?

Paul

On Mon, Oct 17, 2022 at 03:28:34PM +0200, Martin Liška wrote:
> Hello.
> 
> Based on the very positive feedback I was given at the Cauldron Sphinx 
> Documentation BoF,
> I'm planning migrating the documentation on 9th November. There are still 
> some minor comments
> from Sandra when it comes to the PDF output, but we can address that once the 
> conversion is done.
> 
> The reason I'm sending the email now is that I was waiting for latest Sphinx 
> release (5.3.0) that
> simplifies reference format for options and results in much simpler Option 
> summary section ([1])
> 
> The current GCC master (using Sphinx 5.3.0) converted docs can be seen here:
> https://splichal.eu/scripts/sphinx/
> 
> If you see any issues with the converted documentation, or have a feedback 
> about it,
> please reply to this email.
> 
> Cheers,
> Martin
> 
> [1] https://github.com/sphinx-doc/sphinx/pull/10840
> [1] 
> https://splichal.eu/scripts/sphinx/gcc/_build/html/gcc-command-options/option-summary.html
> 
> 
> 
> 






Re: Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.)

2022-10-17 Thread Aldy Hernandez via Gcc-patches
On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches 
>  wrote:
> > When solving 0 = _15 & 1, we calculate _15 as:
> >
> >   [irange] int [-INF, -2][0, +INF] NONZERO 0xfffe
> >
> > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
> > the above, yielding:
> >
> >   [0, 1] NONZERO 0x0
> >
> > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
> >
> > This is problematic because here we have a bool which is zero, but
> > returns false for irange::zero_p, since the latter does not look at
> > nonzero bits.  This causes logical_combine to assume the range is
> > not-zero, and all hell breaks loose.
> >
> > I think we should just normalize a nonzero mask of 0 to [0, 0] at
> > creation, thus avoiding all this.
>
> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
> offloading test case:
>
> UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
> (test for excess errors)
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
> execution test
> [-PASS:-]{+FAIL:+} 
> libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   
> scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* 
> [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
>
> Same for C++.
>
> I'll later send a patch (for the test case!) to fix that up.
>
> 2. Looking into this, I found that this
> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0" actually enables a
> code transformation/optimization that GCC apparently has not been doing
> before!  I've tried to capture that in the attached
> "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".

Nice.

>
> Will you please verify that one?  In its current '#if 1' configuration,
> it's all-PASS after commit
> r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we
> get two calls to 'foo', because GCC apparently didn't understand the
> relation (optimization opportunity) between 'r *= 2;' and the subsequent
> 'if (r & 1)'.

Yeah, that looks correct.  We keep better track of nonzero masks.
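
A toy model of that bookkeeping, for illustration only: this is not how
irange is implemented, and the type and function names below are
invented for the sketch.  It tracks which bits of 'r' may be nonzero and
shows why 'if (r & 1)' after 'r *= 2;' can be folded away.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that may be nonzero in a value; a cleared bit is known to be 0.  */
typedef uint32_t nonzero_mask;

/* Mask of 'r * 2': every possibly-set bit moves up by one position,
   so bit 0 becomes known zero.  */
static nonzero_mask
mask_after_double (nonzero_mask m)
{
  nonzero_mask doubled = m << 1;
  assert ((doubled & 1u) == 0);
  return doubled;
}

/* Mask of 'r & c' for a constant C.  */
static nonzero_mask
mask_after_and (nonzero_mask m, uint32_t c)
{
  return m & c;
}

/* Return 1 if 'if (r & test_bits)' following 'r *= 2;' is provably
   false, i.e. the resulting mask is 0 and the value can only be 0.  */
int
branch_dead_after_double (uint32_t test_bits)
{
  nonzero_mask m = UINT32_MAX;	/* Nothing known about r.  */
  m = mask_after_double (m);
  m = mask_after_and (m, test_bits);
  return m == 0;
}
```

In this model branch_dead_after_double (1) is true because bit 0 is
known to be clear after doubling, whereas a test against bit 1 cannot be
decided from the mask alone.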

>
> I've left in the other '#if' variants in case you'd like to experiment
> with these, but would otherwise clean that up before pushing.
>
> Where does one put such a test case?
>
> Should the file be named 'pr107195' or something else?

The aforementioned patch already has:

* gcc.dg/tree-ssa/pr107195-1.c: New test.
* gcc.dg/tree-ssa/pr107195-2.c: New test.

So I would just add a pr107195-3.c test.

>
> Do we scan 'optimized', or an earlier dump?
>
> At '-O1', the actual code transformation is visible already in the 'dom2'
> dump:
>
> [local count: 536870913]:
>gimple_assign 
> +  gimple_assign 
> +  goto ; [100.00%]
>
> -   [local count: 1073741824]:
> -  # gimple_phi 
> +   [local count: 536870912]:
> +  # gimple_phi 
>gimple_assign 
>gimple_cond 
> -goto ; [50.00%]
> +goto ; [100.00%]
>else
> -goto ; [50.00%]
> +goto ; [0.00%]
>
> [local count: 536870913]:
>gimple_call 
>gimple_assign 
>
> [local count: 1073741824]:
> -  # gimple_phi 
> +  # gimple_phi 
>gimple_return 
>
> And, the actual "avoid second call 'foo'" optimization is visible
> starting 'dom3':
>
> [local count: 536870913]:
>gimple_assign 
> +  goto ; [100.00%]
>
> -   [local count: 1073741824]:
> -  # gimple_phi 
> -  gimple_assign 
> +   [local count: 536870912]:
> +  gimple_assign 
>gimple_cond 
> -goto ; [50.00%]
> +goto ; [100.00%]
>else
> -goto ; [50.00%]
> +goto ; [0.00%]
>
> [local count: 536870913]:
> -  gimple_call 
> -  gimple_assign 
> +  gimple_assign 
> +  gimple_assign 
>
> [local count: 1073741824]:
> -  # gimple_phi 
> +  # gimple_phi 
>gimple_return 
>
> ..., but I don't know if either of those would be stable/appropriate to
> scan instead of 'optimized'?

IMO, either dom3 or optimized is fine.

Thanks.
Aldy



[COMMITTED] [PR105820] Add test.

2022-10-17 Thread Aldy Hernandez via Gcc-patches
PR tree-optimization/105820

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr105820.c: New test.
---
 gcc/testsuite/g++.dg/tree-ssa/pr105820.c | 26 
 1 file changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr105820.c

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr105820.c 
b/gcc/testsuite/g++.dg/tree-ssa/pr105820.c
new file mode 100644
index 000..507950f42d3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr105820.c
@@ -0,0 +1,26 @@
+// { dg-do compile }
+// { dg-options "-O2 -fstrict-enums --param case-values-threshold=1"}
+
+typedef int basic_block;
+
+enum gimple_code {};
+
+struct omp_region {
+  omp_region *outer;
+  basic_block cont;
+};
+
+void
+oof (void);
+
+void
+build_omp_regions_1 (omp_region *parent, basic_block bb, gimple_code code)
+{
+  if (code == 2)
+parent = parent->outer;
+  else if (code != 0)
+parent->cont = bb;
+
+  if (parent)
+oof ();
+}
-- 
2.37.3



Re: [GCC][PATCH] arm: Add cde feature support for Cortex-M55 CPU.

2022-10-17 Thread Christophe Lyon via Gcc-patches

Hi Srinath,


On 10/10/22 10:20, Srinath Parvathaneni via Gcc-patches wrote:

Hi,

This patch adds cde feature (optional) support for Cortex-M55 CPU, please refer
[1] for more details. To use this feature we need to specify +cdecpN
(e.g. -mcpu=cortex-m55+cdecp), where N is the coprocessor number 0 to 7.

Bootstrapped for arm-none-linux-gnueabihf target, regression tested
on arm-none-eabi target and found no regressions.

[1] https://developer.arm.com/documentation/101051/0101/?lang=en (version: 
r1p1).

Ok for master?

Regards,
Srinath.

gcc/ChangeLog:

2022-10-07  Srinath Parvathaneni  

 * common/config/arm/arm-common.cc (arm_canon_arch_option_1): Ignore cde
 options for mlibarch.
 * config/arm/arm-cpus.in (begin cpu cortex-m55): Add cde options.
 * doc/invoke.texi (CDE): Document options for Cortex-M55 CPU.

gcc/testsuite/ChangeLog:

2022-10-07  Srinath Parvathaneni  

 * gcc.target/arm/multilib.exp: Add multilib tests for Cortex-M55 CPU.


### Attachment also inlined for ease of reply###


diff --git a/gcc/common/config/arm/arm-common.cc 
b/gcc/common/config/arm/arm-common.cc
index 
c38812f1ea6a690cd19b0dc74d963c4f5ae155ca..b6f955b3c012475f398382e72c9a3966412991ec
 100644
--- a/gcc/common/config/arm/arm-common.cc
+++ b/gcc/common/config/arm/arm-common.cc
@@ -753,6 +753,15 @@ arm_canon_arch_option_1 (int argc, const char **argv, bool 
arch_for_multilib)
arm_initialize_isa (target_isa, selected_cpu->common.isa_bits);
arm_parse_option_features (target_isa, &selected_cpu->common,
 strchr (cpu, '+'));
+  if (arch_for_multilib)
+   {
+ const enum isa_feature removable_bits[] = {ISA_IGNORE_FOR_MULTILIB,
+isa_nobit};
+ sbitmap isa_bits = sbitmap_alloc (isa_num_bits);
+ arm_initialize_isa (isa_bits, removable_bits);
+ bitmap_and_compl (target_isa, target_isa, isa_bits);
+   }
+


I can see the piece of code you add here is exactly the same as the one 
a few lines above when handling "if (arch)". Can this be moved below and 
thus be common to the two cases, or does it have to be performed before 
bitmap_ior of fpu_isa?
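
For reference, the step both copies perform amounts to clearing the
ignore-for-multilib features from the target set.  A sketch with plain
bit masks instead of sbitmaps follows; the feature names and values are
invented for the example and are not GCC's isa_feature enum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only.  */
enum
{
  FEAT_MVE = 1 << 0,
  FEAT_FP = 1 << 1,
  FEAT_CDECP0 = 1 << 2,
  FEAT_CDECP1 = 1 << 3
};

/* Features assumed not to influence multilib selection.  */
#define IGNORE_FOR_MULTILIB ((uint32_t) (FEAT_CDECP0 | FEAT_CDECP1))

/* Equivalent of 'target_isa &= ~removable' on a bit mask: when the
   result is used to pick a multilib, drop the ignorable features.  */
uint32_t
canon_isa (uint32_t target_isa, int arch_for_multilib)
{
  if (arch_for_multilib)
    {
      target_isa &= ~IGNORE_FOR_MULTILIB;
      assert ((target_isa & IGNORE_FOR_MULTILIB) == 0);
    }
  return target_isa;
}
```

Written this way the masking is a single step that does not depend on
whether the feature set came from the arch or the cpu branch.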


Also, IIUC, CDE was already optional for other CPUs (M33, M35P, 
star-mc1), so the hunk above fixes a latent bug when handling multilibs 
for these CPUs too? If so, maybe worth splitting the patch into two 
parts since the above is not strictly related to M55?


But I'm not a maintainer ;-)

Thanks,

Christophe


if (fpu && strcmp (fpu, "auto") != 0)
{
  /* The easiest and safest way to remove the default fpu
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
5a63bc548e54dbfdce5d1df425bd615d81895d80..aa02c04c4924662f3ddd58e6967392ba3f4b4a87
 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1633,6 +1633,14 @@ begin cpu cortex-m55
   option nomve remove mve mve_float
   option nofp remove ALL_FP mve_float
   option nodsp remove MVE mve_float
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
   isa quirk_no_asmcpu quirk_vlldm
   costs v7m
   vendor 41
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
aa5655764a0360959f9c1061749d2cc9ebd23489..26857f7a90e42d925bc6908686ac78138a53c4ad
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21698,6 +21698,10 @@ floating-point instructions on @samp{cortex-m55}.
  Disable the M-Profile Vector Extension (MVE) single precision floating-point
  instructions on @samp{cortex-m55}.
  
+@item +cdecp0, +cdecp1, ... , +cdecp7

+Enable the Custom Datapath Extension (CDE) on selected coprocessors according
+to the numbers given in the options in the range 0 to 7 on @samp{cortex-m55}.
+
  @item  +nofp
  Disables the floating-point instructions on @samp{arm9e},
  @samp{arm946e-s}, @samp{arm966e-s}, @samp{arm968e-s}, @samp{arm10e},
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
b/gcc/testsuite/gcc.target/arm/multilib.exp
index 
2fa648c61dafebb663969198bf7849400a7547f6..7a977bff58b7b68bfe9e49d7602989a39caa6534
 100644
--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -851,6 +851,18 @@ if {[multilib_config "rmprofile"] } {
{-mcpu=cortex-m55+nomve+nofp -mfpu=auto -mfloat-abi=softfp} 
"thumb/v8-m.main/nofp"
{-mcpu=cortex-m55+nodsp+nofp -mfpu=auto -mfloat-abi=soft} 
"thumb/v8-m.main/nofp"
{-mcpu=cortex-m55+nodsp+nofp -mfpu=auto -mfloat-abi=softfp} 
"thumb/v8-m.main/nofp"
+   {-mcpu=cortex-m55 -mfloat-abi=hard -mfpu=auto} "thumb/v8-m.main+dp/hard"
+   {-mcpu=cortex-m55+cdecp0 -mfloat-abi=hard -mfpu=auto} 
"thumb/v8-m.main+dp/hard"
+   {-mcpu=cortex-m55+nomve+cdecp0 -mfloat-abi=hard -mfpu=auto} 

Announcement: Porting the Docs to Sphinx - 9. November 2022

2022-10-17 Thread Martin Liška
Hello.

Based on the very positive feedback I was given at the Cauldron Sphinx 
Documentation BoF,
I'm planning migrating the documentation on 9th November. There are still some 
minor comments
from Sandra when it comes to the PDF output, but we can address that once the 
conversion is done.

The reason I'm sending the email now is that I was waiting for latest Sphinx 
release (5.3.0) that
simplifies reference format for options and results in much simpler Option 
summary section ([1])

The current GCC master (using Sphinx 5.3.0) converted docs can be seen here:
https://splichal.eu/scripts/sphinx/

If you see any issues with the converted documentation, or have a feedback 
about it,
please reply to this email.

Cheers,
Martin

[1] https://github.com/sphinx-doc/sphinx/pull/10840
[1] 
https://splichal.eu/scripts/sphinx/gcc/_build/html/gcc-command-options/option-summary.html


[COMMITTED] Do not test for -Inf when flag_finite_math_only.

2022-10-17 Thread Aldy Hernandez via Gcc-patches
PR tree-optimization/107286

gcc/ChangeLog:

* value-range.cc (range_tests_floats): Do not test for -Inf when
flag_finite_math_only.
---
 gcc/value-range.cc | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 4794d2386a8..90d5e660684 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -4022,10 +4022,13 @@ range_tests_floats ()
   r0.intersect (r1);
   ASSERT_TRUE (r0.undefined_p ());
 
-  // Make sure [-Inf, -Inf] doesn't get normalized.
-  r0 = frange_float ("-Inf", "-Inf");
-  ASSERT_TRUE (real_isinf (&r0.lower_bound (), true));
-  ASSERT_TRUE (real_isinf (&r0.upper_bound (), true));
+  if (!flag_finite_math_only)
+{
+  // Make sure [-Inf, -Inf] doesn't get normalized.
+  r0 = frange_float ("-Inf", "-Inf");
+  ASSERT_TRUE (real_isinf (&r0.lower_bound (), true));
+  ASSERT_TRUE (real_isinf (&r0.upper_bound (), true));
+}
 }
 
 void
-- 
2.37.3



[COMMITTED] Add 3 floating NAN tests.

2022-10-17 Thread Andrew MacLeod via Gcc-patches
3 tests from Aldy using the relations generated by GORI between operands 
to set or clear NANs as appropriate on outgoing edges.
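
The three conditions can also be checked at run time with a small
sketch.  This is not one of the committed tests (those scan the evrp
dump); the helper names are made up, and it assumes IEEE semantics,
i.e. no -ffinite-math-only.

```c
#include <assert.h>
#include <math.h>

/* Return 1 if the three comparisons agree with isnan (x):
   x != x is true only for NAN, x == x is false only for NAN,
   and x UNORD x is true only for NAN.  */
int
nan_checks_agree (float x)
{
  int is_nan = isnan (x) != 0;
  return (x != x) == is_nan
	 && (x == x) == !is_nan
	 && (isunordered (x, x) != 0) == is_nan;
}

/* Abort if the relations do not hold for a NAN and a few non-NANs.  */
void
check_nan_relations (void)
{
  assert (nan_checks_agree (NAN));
  assert (nan_checks_agree (0.0f));
  assert (nan_checks_agree (INFINITY));
}
```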


Pushed.

Andrew

From 7896a31d3003bad8b845881f59e570fbc3c78cfa Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Mon, 10 Oct 2022 11:01:48 +0200
Subject: [PATCH 4/4] Add 3 floating NAN tests.

x UNORD x should set NAN on the TRUE side.
The false side of x == x should set NAN.
The true side of x != x should set NAN.

	gcc/testsuite/
	* gcc.dg/tree-ssa/vrp-float-3a.c: New.
	* gcc.dg/tree-ssa/vrp-float-4a.c: New.
	* gcc.dg/tree-ssa/vrp-float-5a.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-3a.c | 19 
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-4a.c | 23 
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-5a.c | 16 ++
 3 files changed, 58 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-3a.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-4a.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-5a.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-3a.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-3a.c
new file mode 100644
index 000..5aadaa7c4db
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-3a.c
@@ -0,0 +1,19 @@
+// { dg-do compile }
+// { dg-options "-O2 -fno-thread-jumps -fdisable-tree-fre1 -fdump-tree-evrp" }
+
+void link_error ();
+void bar ();
+
+float
+foo (float x)
+{
+  if (x != x)
+{
+  // The true side of x != x implies NAN, so we should be able to
+  // fold this.
+  if (!__builtin_isnan (x))
+	link_error ();
+}
+}
+
+// { dg-final { scan-tree-dump-not "link_error" "evrp" } }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-4a.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-4a.c
new file mode 100644
index 000..7d3187b3962
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-4a.c
@@ -0,0 +1,23 @@
+// { dg-do compile }
+// { dg-options "-O2 -fno-thread-jumps -fdisable-tree-fre1 -fdump-tree-evrp" }
+
+void link_error ();
+void bar ();
+
+float
+foo (float x)
+{
+  if (x == x)
+{
+  bar ();
+}
+  else
+{
+  // The false side of x == x implies NAN, so we should be able to
+  // fold this.
+  if (!__builtin_isnan (x))
+	link_error ();
+}
+}
+
+// { dg-final { scan-tree-dump-not "link_error" "evrp" } }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-5a.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-5a.c
new file mode 100644
index 000..08332305f2c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-5a.c
@@ -0,0 +1,16 @@
+// { dg-do compile }
+// { dg-options "-O2 -fno-thread-jumps -fdisable-tree-fre1 -fdump-tree-evrp" }
+
+void link_error ();
+
+float
+foo (float x)
+{
+  if (__builtin_isnan (x))
+{
+  if (!__builtin_isnan (x))
+	link_error ();
+}
+}
+
+// { dg-final { scan-tree-dump-not "link_error" "evrp" } }
-- 
2.37.3



[COMMITTED] Add relation_trio class for range-ops.

2022-10-17 Thread Andrew MacLeod via Gcc-patches
When I first added relations to range_ops, I struggled with obfuscating 
the API too much by adding all of the 3 possible relations.  For 
simplicity, it seemed like only one was ever relevant, so I elected to add 
one relation, and make it always the relation between the 2 known operands.


fold_range()  - relation passed  op1 REL op2
op1_range()   - relation passed  lhs REL op2
op2_range()   - relation passed  lhs REL op1

With some of the floating point enhancements, we've tripped over cases 
where it's useful to know one or more of the other relations.


This patch provides a new class in value_relation.h called 
"relation_trio" which packages up 3 relations into a single unsigned 
value, and allows them to be extracted by request.  I have changed all 3 
of the primary range-ops interface routines mentioned above to take a 
relation_trio object by value rather than a relation kind, and then each 
range-op routine explicitly asks for the relation it is looking for.
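
As a rough illustration of packing three relations into one unsigned value, here is a minimal sketch. It is not the class from the patch: the names `rel_kind`, `trio_sketch`, and `from_op1_op2`, the 4-bit field width, and the reduced set of relation kinds are all assumptions made for this example.

```cpp
// Hypothetical relation kinds; the real relation_kind enum has more entries.
enum rel_kind : unsigned
{
  REL_VARYING = 0, REL_LT, REL_LE, REL_GT, REL_GE, REL_EQ, REL_NE
};

// Sketch of a trio packed into a single word: one 4-bit field each for
// lhs REL op1, lhs REL op2, and op1 REL op2.
class trio_sketch
{
public:
  trio_sketch (rel_kind a, rel_kind b, rel_kind c)
    : m_val ((unsigned (a) << (2 * SHIFT))
             | (unsigned (b) << SHIFT)
             | unsigned (c))
  {}

  // Build a trio where only the op1/op2 relation is known.
  static trio_sketch from_op1_op2 (rel_kind k)
  { return trio_sketch (REL_VARYING, REL_VARYING, k); }

  rel_kind lhs_op1 () const
  { return rel_kind ((m_val >> (2 * SHIFT)) & MASK); }
  rel_kind lhs_op2 () const
  { return rel_kind ((m_val >> SHIFT) & MASK); }
  rel_kind op1_op2 () const
  { return rel_kind (m_val & MASK); }

  // View the same facts with op1 and op2 exchanged: the two lhs relations
  // trade places and the op1/op2 relation is mirrored (LT becomes GT, etc.).
  trio_sketch swap_op1_op2 () const
  { return trio_sketch (lhs_op2 (), lhs_op1 (), swapped (op1_op2 ())); }

private:
  static const unsigned SHIFT = 4;
  static const unsigned MASK = 0xF;

  static rel_kind swapped (rel_kind k)
  {
    switch (k)
      {
      case REL_LT: return REL_GT;
      case REL_GT: return REL_LT;
      case REL_LE: return REL_GE;
      case REL_GE: return REL_LE;
      default: return k;  // EQ, NE and VARYING are symmetric.
      }
  }

  unsigned m_val;
};
```

Because the object is a single word, passing it by value costs the same as passing the old relation kind, and each routine reads only the field it cares about.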


I have also audited the range-op and range-op-float routines to make 
sure that when op2_range invokes op1_range that we do the appropriate 
relation swapping.


There is virtually no performance impact from this change, and it is now clear 
when looking at one of the range-ops routines exactly what relation it 
is using.  This seems much less confusing.


I have also adjusted the compute_operand[12]_range routines in GORI to 
provide a second relation when appropriate.


Bootstrapped on  x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew
From b565ac19264a5827162d28537bccc8531c25e817 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 13 Oct 2022 18:03:58 -0400
Subject: [PATCH 3/4] Add relation_trio class for range-ops.

There are 3 possible relations range-ops might care about, but only the one
most likely to be needed is supplied.   This patch provides a new class
relation_trio which allows 3 relations to be passed in a single word.

fold_range (), op1_range (), and op2_range () are adjusted to take a
relation_trio class instead of a relation_kind, then the routine can
extract which relation it wants to work with.

	* gimple-range-fold.cc (fold_using_range::range_of_range_op):
	Provide relation_trio class.
	* gimple-range-gori.cc (gori_compute::refine_using_relation):
	Provide relation_trio class.
	(gori_compute::refine_using_relation): Ditto.
	(gori_compute::compute_operand1_range): Provide lhs_op2 and
	op1_op2 relations via relation_trio class.
	(gori_compute::compute_operand2_range): Ditto.
	* gimple-range-op.cc (gimple_range_op_handler::calc_op1): Use
	relation_trio instead of relation_kind.
	(gimple_range_op_handler::calc_op2): Ditto.
	(*::fold_range): Ditto.
	* gimple-range-op.h (gimple_range_op::calc_op1): Adjust prototypes.
	(gimple_range_op::calc_op2): Adjust prototypes.
	* range-op-float.cc (*::fold_range): Use relation_trio instead of
	relation_kind.
	(*::op1_range): Ditto.
	(*::op2_range): Ditto.
	* range-op.cc (*::fold_range): Use relation_trio instead of
	relation_kind.
	(*::op1_range): Ditto.
	(*::op2_range): Ditto.
	* range-op.h (class range_operator): Adjust prototypes.
	(class range_operator_float): Ditto.
	(class range_op_handler): Adjust prototypes.
	(relop_early_resolve): Pickup op1_op2 relation from relation_trio.
	* value-relation.cc (VREL_LAST): Adjust use to be one past the end of
	the enum.
	(relation_oracle::validate_relation): Use relation_trio in call
	to fold_range.
	* value-relation.h (enum relation_kind_t): Add VREL_LAST as
	final element.
	(class relation_trio): New.
	(TRIO_VARYING, TRIO_SHIFT, TRIO_MASK): New.
---
 gcc/gimple-range-fold.cc |   5 +-
 gcc/gimple-range-gori.cc |  43 ---
 gcc/gimple-range-op.cc   |  40 +++---
 gcc/gimple-range-op.h|   4 +-
 gcc/range-op-float.cc| 170 +
 gcc/range-op.cc  | 267 ---
 gcc/range-op.h   |  29 +++--
 gcc/value-relation.cc|  19 ++-
 gcc/value-relation.h | 119 +++--
 9 files changed, 405 insertions(+), 291 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index c381ef94087..f91923782dc 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -578,7 +578,8 @@ fold_using_range::range_of_range_op (vrange &r,
 	  fputc ('\n', dump_file);
 	}
 	  // Fold range, and register any dependency if available.
-	  if (!handler.fold_range (r, type, range1, range2, rel))
+	  if (!handler.fold_range (r, type, range1, range2,
+   relation_trio::op1_op2 (rel)))
 	r.set_varying (type);
 	  if (irange::supports_p (type))
 	relation_fold_and_or (as_a <irange> (r), s, src);
@@ -597,7 +598,7 @@ fold_using_range::range_of_range_op (vrange &r,
 		}
 	  if (gimple_range_ssa_p (op2))
 		{
-		  rel= handler.lhs_op2_relation (r, range1, range2, rel);
+		  rel = handler.lhs_op2_relation (r, range1, range2, rel);
 		  if (rel != VREL_VARYING)
 		src.register_relation (s, rel, lhs, op2);
 		}
diff --git 

[COMMITTED] Fix nan updating in range-ops.

2022-10-17 Thread Andrew MacLeod via Gcc-patches
There is a path in which clear_nan() is called on an UNDEFINED range, 
which is not allowed.  This patch simply makes sure VARYING is set 
before calling clear_nan().


In operator_not_equal, we should check if op1 == op1 AFTER the check for 
a singleton.


operator_ordered was also clearing the NAN on the false side, and should 
be setting it.
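
To see why the false side of an ORDERED comparison of a value with itself must set NAN rather than clear it, consider this small sketch. The helper names are hypothetical; `std::isunordered` is the standard <cmath> function.

```cpp
#include <cmath>
#include <limits>

// ORDERED (x, x) is true exactly when x is not NaN, so when the
// comparison is false, x can only be NaN.
bool ordered_with_self (double x)
{
  return !std::isunordered (x, x);
}

// UNORDERED (x, x) is the mirror image: true only for NaN.
bool unordered_with_self (double x)
{
  return std::isunordered (x, x);
}
```

For any non-NaN input, including infinities, `ordered_with_self` returns true; it returns false only for NaN.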


None of these paths were executed up to this point, as GORI was not 
passing in the relation between op1 and op2, but the next patch changes 
that and would trigger these issues.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew


From 04874fedae8074b252abbd70fea68bf3dd0a605b Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 14 Oct 2022 09:29:23 -0400
Subject: [PATCH 2/4] Fix nan updating in range-ops.

Calling clear_nan on an undefined range traps, so call set_varying first.
Other tweaks for correctness.

	* range-op-float.cc (foperator_not_equal::op1_range): Check for
	VREL_EQ after singleton.
	(foperator_unordered::op1_range): Set VARYING before calling
	clear_nan().
	(foperator_ordered::op1_range): Set rather than clear NAN if both
	operands are the same.
---
 gcc/range-op-float.cc | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 23e0f5ef4e2..6cf2180ce59 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -510,12 +510,9 @@ foperator_not_equal::op1_range (frange &r, tree type,
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
-  // The TRUE side of op1 != op1 implies op1 is NAN.
-  if (rel == VREL_EQ)
-	r.set_nan (type);
   // If the result is true, the only time we know anything is if
   // OP2 is a constant.
-  else if (op2.singleton_p ())
+  if (op2.singleton_p ())
 	{
 	  // This is correct even if op1 is NAN, because the following
 	  // range would be ~[tmp, tmp] with the NAN property set to
@@ -523,6 +520,9 @@ foperator_not_equal::op1_range (frange , tree type,
 	  REAL_VALUE_TYPE tmp = op2.lower_bound ();
 	  r.set (type, tmp, tmp, VR_ANTI_RANGE);
 	}
+  // The TRUE side of op1 != op1 implies op1 is NAN.
+  else if (rel == VREL_EQ)
+	r.set_nan (type);
   else
 	r.set_varying (type);
   break;
@@ -1045,22 +1045,18 @@ foperator_unordered::op1_range (frange &r, tree type,
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
-  if (rel == VREL_EQ)
-	r.set_nan (type);
   // Since at least one operand must be NAN, if one of them is
   // not, the other must be.
-  else if (!op2.maybe_isnan ())
+  if (rel == VREL_EQ || !op2.maybe_isnan ())
 	r.set_nan (type);
   else
 	r.set_varying (type);
   break;
 
 case BRS_FALSE:
-  if (rel == VREL_EQ)
-	r.clear_nan ();
   // A false UNORDERED means both operands are !NAN, so it's
   // impossible for op2 to be a NAN.
-  else if (op2.known_isnan ())
+  if (op2.known_isnan ())
 	r.set_undefined ();
   else
 	{
@@ -1132,10 +1128,11 @@ foperator_ordered::op1_range (frange &r, tree type,
   break;
 
 case BRS_FALSE:
-  r.set_varying (type);
-  // The FALSE side of op1 ORDERED op1 implies op1 is !NAN.
+  // The FALSE side of op1 ORDERED op1 implies op1 is NAN.
   if (rel == VREL_EQ)
-	r.clear_nan ();
+	r.set_nan (type);
+  else
+	r.set_varying (type);
   break;
 
 default:
-- 
2.37.3



[COMMITTED] Don't set useless relations.

2022-10-17 Thread Andrew MacLeod via Gcc-patches
The oracle will not register nonsense/useless relations (basically X op 
X).  Symbolically, x == x is implied, and x != x, x < x, etc. are all 
nonsense.


Now that we are using class value_relation in a couple of other places, 
it shouldn't register such relations either.


Bootstrapped on  x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew
From fca529517484bf19098ca9efa77e95534086abdc Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 14 Oct 2022 14:31:02 -0400
Subject: [PATCH 1/4] Don't set useless relations.

The oracle will not register nonsense/useless relations, class
value_relation shouldn't either.

	* value-relation.cc (value_relation::dump): Change message.
	* value-relation.h (value_relation::set_relation): If op1 is the
	same as op2 do not create a relation.
---
 gcc/value-relation.cc | 2 +-
 gcc/value-relation.h  | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 50fc190a36b..3fb7b96c9e0 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -891,7 +891,7 @@ value_relation::dump (FILE *f) const
 {
   if (!name1 || !name2)
 {
-  fprintf (f, "uninitialized");
+  fprintf (f, "no relation registered");
   return;
 }
   fputc ('(', f);
diff --git a/gcc/value-relation.h b/gcc/value-relation.h
index a3bbe1e8157..fa9097a8069 100644
--- a/gcc/value-relation.h
+++ b/gcc/value-relation.h
@@ -349,6 +349,13 @@ value_relation::set_relation (relation_kind r, tree n1, tree n2)
 {
   gcc_checking_assert (TREE_CODE (n1) == SSA_NAME
 		   && TREE_CODE (n2) == SSA_NAME);
+  if (n1 == n2)
+{
+  related = VREL_VARYING;
+  name1 = NULL_TREE;
+  name2 = NULL_TREE;
+  return;
+}
   related = r;
   name1 = n1;
   name2 = n2;
-- 
2.37.3



Re: [PATCH] libgcc: Mostly vectorize CIE encoding extraction for FDEs

2022-10-17 Thread Florian Weimer via Gcc-patches
* Richard Biener:

> On Mon, Oct 17, 2022 at 3:01 PM Florian Weimer via Gcc-patches
>  wrote:
>>
>> "zR" and "zPLR" are the most common augmentations.  Use a simple
>> SIMD-within-a-register technique to check for both augmentations,
>> and that the following variable-length integers have length 1, to
>> get more quickly at the encoding field.
>>
>> libgcc/
>>
>> * unwind-dw2-fde.c (get_cie_encoding_slow): Rename from
>> get_cie_encoding.  Mark as noinline.
>> (get_cie_encoding): Add fast path for "zR" and "zPLR"
>> augmentations.  Call get_cie_encoding_slow as a fall-back.
>>
>> ---
>>  libgcc/unwind-dw2-fde.c | 61 
>> +++--
>>  1 file changed, 59 insertions(+), 2 deletions(-)
>>
>> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
>> index 3c0cc654ec0..4e3a54c5a1a 100644
>> --- a/libgcc/unwind-dw2-fde.c
>> +++ b/libgcc/unwind-dw2-fde.c
>> @@ -333,8 +333,10 @@ base_from_object (unsigned char encoding, const struct 
>> object *ob)
>>  /* Return the FDE pointer encoding from the CIE.  */
>>  /* ??? This is a subset of extract_cie_info from unwind-dw2.c.  */
>>
>> -static int
>> -get_cie_encoding (const struct dwarf_cie *cie)
>> +/* Disable inlining because the function is only used as a slow path in
>> +   get_cie_encoding below.  */
>> +static int __attribute__ ((noinline))
>> +get_cie_encoding_slow (const struct dwarf_cie *cie)
>>  {
>>const unsigned char *aug, *p;
>>_Unwind_Ptr dummy;
>> @@ -389,6 +391,61 @@ get_cie_encoding (const struct dwarf_cie *cie)
>>  }
>>  }
>>
>> +static inline int
>> +get_cie_encoding (const struct dwarf_cie *cie)
>> +{
>> +  /* Fast path for some augmentations and single-byte variable-length
>> + integers.  Do this only for targets that align struct dwarf_cie to 8
>> + bytes, which ensures that at least 8 bytes are available starting at
>> + cie->version.  */
>> +#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ \
>> +  || __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>> +  if (__alignof (*cie) == 8 && sizeof (unsigned long long) == 8)
>> +{
>> +  unsigned long long value = *(const unsigned long long *) 
>> &cie->version;
>
> TBAA?  Maybe use
>
>unsigned long long value;
>memcpy (&value, &cie->version, 8);
>
> instead?

It's following the existing libgcc style, see
read_encoded_value_with_base in unwind-pe.h.

We know here that the pointer is aligned, but perhaps GCC still can use
that information if using memcpy.

I can certainly change it to memcpy.
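
For reference, the memcpy form under discussion looks roughly like the following sketch. The function name `load_u64` is hypothetical; the point is only that copying through `memcpy` reads the same 8 bytes as the pointer cast without relying on type-based aliasing, and compilers commonly turn a fixed-size copy like this into a single load.

```cpp
#include <cstdint>
#include <cstring>

// Read 8 bytes starting at P into an integer in host byte order.
// memcpy may legally read any object representation, so no aliasing
// rule is violated regardless of the declared type behind P.
std::uint64_t load_u64 (const unsigned char *p)
{
  std::uint64_t value;
  std::memcpy (&value, p, sizeof value);
  return value;
}
```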

Thanks,
Florian



Re: [PATCH] libgcc: Mostly vectorize CIE encoding extraction for FDEs

2022-10-17 Thread Richard Biener via Gcc-patches
On Mon, Oct 17, 2022 at 3:01 PM Florian Weimer via Gcc-patches
 wrote:
>
> "zR" and "zPLR" are the most common augmentations.  Use a simple
> SIMD-within-a-register technique to check for both augmentations,
> and that the following variable-length integers have length 1, to
> get more quickly at the encoding field.
>
> libgcc/
>
> * unwind-dw2-fde.c (get_cie_encoding_slow): Rename from
> get_cie_encoding.  Mark as noinline.
> (get_cie_encoding): Add fast path for "zR" and "zPLR"
> augmentations.  Call get_cie_encoding_slow as a fall-back.
>
> ---
>  libgcc/unwind-dw2-fde.c | 61 
> +++--
>  1 file changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index 3c0cc654ec0..4e3a54c5a1a 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -333,8 +333,10 @@ base_from_object (unsigned char encoding, const struct 
> object *ob)
>  /* Return the FDE pointer encoding from the CIE.  */
>  /* ??? This is a subset of extract_cie_info from unwind-dw2.c.  */
>
> -static int
> -get_cie_encoding (const struct dwarf_cie *cie)
> +/* Disable inlining because the function is only used as a slow path in
> +   get_cie_encoding below.  */
> +static int __attribute__ ((noinline))
> +get_cie_encoding_slow (const struct dwarf_cie *cie)
>  {
>const unsigned char *aug, *p;
>_Unwind_Ptr dummy;
> @@ -389,6 +391,61 @@ get_cie_encoding (const struct dwarf_cie *cie)
>  }
>  }
>
> +static inline int
> +get_cie_encoding (const struct dwarf_cie *cie)
> +{
> +  /* Fast path for some augmentations and single-byte variable-length
> + integers.  Do this only for targets that align struct dwarf_cie to 8
> + bytes, which ensures that at least 8 bytes are available starting at
> + cie->version.  */
> +#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ \
> +  || __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +  if (__alignof (*cie) == 8 && sizeof (unsigned long long) == 8)
> +{
> +  unsigned long long value = *(const unsigned long long *) &cie->version;

TBAA?  Maybe use

   unsigned long long value;
   memcpy (&value, &cie->version, 8);

instead?

> +
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +#define C(x) __builtin_bswap64 (x)
> +#else
> +#define C(x) x
> +#endif
> +
> +  /* Fast path for "zR".  Check for version 1, the "zR" string and that
> +the sleb128/uleb128 values are single bytes.  In the comments
> +below, '1', 'c', 'd', 'r', 'l' are version, code alignment, data
> +alignment, return address column, augmentation length.  Note that
> +with CIE version 1, the return address column is byte-encoded.  */
> +  unsigned long long expected =
> +   /*   1 z R 0 c d r l.  */
> +   C (0x017a5200ULL);
> +  unsigned long long mask =
> +   /*   1 z R 0 c d r l.  */
> +   C (0x80800080ULL);
> +
> +  if ((value & mask) == expected)
> +   return cie->augmentation[7];
> +
> +  /* Fast path for "zPLR".  */
> +  expected =
> +   /*   1 z P L R 0 c d.  */
> +   C (0x017a504c5200ULL);
> +  mask =
> +   /*   1 z P L R 0 c d.  */
> +   C (0x8080ULL);
> +#undef C
> +
> +  /* Validate the augmentation length, and return the encoding after
> +it.  No check for the return address column because it is
> +byte-encoded with CIE version 1.  */
> +  if (__builtin_expect ((value & mask) == expected
> +   && (cie->augmentation[8] & 0x80) == 0, 1))
> + return cie->augmentation[9];
> +}
> +#endif
> +
> +  return get_cie_encoding_slow (cie);
> +}
> +
>  static inline int
>  get_fde_encoding (const struct dwarf_fde *f)
>  {
>
> base-commit: de84a1e4b107b803ec3b064c3771a6ed8c0e201e
>


Re: [RFC PATCH] libstdc++, v2: Partial library support for std::float{16, 32, 64, 128}_t

2022-10-17 Thread Jonathan Wakely via Gcc-patches
On Sun, 16 Oct 2022 at 11:23, Jakub Jelinek  wrote:
>
> Hi!
>
> As the __bf16 support is now in at least on x86_64/i686, I've
> updated my patch to cover bfloat16_t as well and implemented almost
> everything for <cmath> - the only thing missing I'm aware of is
> std::nextafter std::float16_t and std::bfloat16_t overloads (I think
> we probably need to implement that out of line somewhere, or inline? - might
> need inline asm barriers) and std::nexttoward overloads (those are
> intentional, you said there is a LWG issue about that).

Yes, that's now https://cplusplus.github.io/LWG/issue3790
The current proposed resolution is to just restore the C++20 functions
and not provide anything for the new types.

> If you want to have <cmath> done in a different way, e.g. the patch
> groups a lot of different function overloads by the floating point type,
> is that ok or do you want to have them one function at a time for all types,
> then next?

No, I think this way makes more sense. Otherwise the line count in the
file will balloon with all the repeated #if #endif directives.

The only comment I have about the <cmath> changes is that I think all
the new functions should be just 'constexpr' not 'inline
_GLIBCXX_CONSTEXPR'. The __STDCPP_FLOATN__ macros are only defined for
C++23, right? So _GLIBCXX_CONSTEXPR is always just 'constexpr' (it's
only something different for C++98), and that already implies 'inline'
too. So just:

constexpr _Float16
log10(_Float16 __x)
{ return _Float16(__builtin_log10f(__x)); }
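
The same pattern can be sketched without relying on `_Float16` support. In the following example `narrow_fp` is a hypothetical stand-in for one of the new types and `square_f` stands in for the float builtin; neither is part of libstdc++. The wrapper is declared with only `constexpr`, which already makes it an inline function.

```cpp
// Hypothetical stand-in for a narrow floating-point type.
struct narrow_fp
{
  float stored;
  constexpr explicit narrow_fp (float f) : stored (f) {}
  constexpr explicit operator float () const { return stored; }
};

// Stand-in for the float routine the wrapper forwards to.
constexpr float square_f (float x) { return x * x; }

// Only 'constexpr' is written; 'inline' is implied, so this definition
// can live in a header that is included from several translation units.
constexpr narrow_fp square (narrow_fp x)
{ return narrow_fp (square_f (float (x))); }

static_assert (float (square (narrow_fp (3.0f))) == 9.0f,
               "usable in constant expressions");
```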


> I could try to handle <complex> too, but am kind of lost there.
> The paper dropped the explicit std::complex specializations, can they stay
> around as is and should new overloads be added for the
> _Float*/__gnu_cxx::__bfloat16_t types?

The explicit specializations can stay, they do no harm.

I think to handle the new FP types we can modify the primary template
as shown in P1467. I don't think we'll need to add any new function
overloads for the new types.

I can take care of the <complex> changes.

> And I/O etc. support is missing, not sure I'm able to handle that and if it
> is e.g. possible to keep that support out of libstdc++.so.6, because what
> extended floating point types one has on a particular arch could change over
> time (I mean e.g. bfloat16_t support or float16_t support can be added
> etc.).

Yes, I think we can add the I/O functions as always_inline because all
they're going to do is convert the argument to float, double, or long
double and then call the existing overloads. There will be no new
virtual functions.
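
A minimal sketch of that forwarding idea, assuming a hypothetical wrapper type `tiny_float` in place of the real extended types: the inserter converts to float and calls the existing overload, so nothing new has to be exported from the shared library. The real implementation might additionally mark the function always_inline; this sketch uses plain `inline`.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for an extended floating-point type.
struct tiny_float
{
  float stored;
};

// Header-only inserter: convert to float, then reuse the existing
// operator<< (float) overload of std::ostream.
inline std::ostream &
operator<< (std::ostream &os, tiny_float v)
{
  return os << static_cast<float> (v.stored);
}

// Small helper to show the inserter in use.
std::string format (tiny_float v)
{
  std::ostringstream s;
  s << v;
  return s.str ();
}
```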

I can take care of that too.



> --- libstdc++-v3/include/bits/c++config.jj  2022-05-23 21:44:49.082847038 
> +0200
> +++ libstdc++-v3/include/bits/c++config 2022-10-14 22:32:55.411346463 +0200
> @@ -1,4 +1,4 @@
> -// Predefined symbols and macros -*- C++ -*-
> +   // Predefined symbols and macros -*- C++ -*-

This whitespace change looks accidental.

Apart from that and simplifying 'inline _GLIBCXX_CONSTEXPR' to just
'constexpr' this looks good for trunk.



Re: [DOCS] Python Language Conventions

2022-10-17 Thread Martin Liška
On 10/13/22 19:16, David Malcolm wrote:
> On Thu, 2022-10-13 at 11:44 +0200, Gerald Pfeifer wrote:
>> Hi Martin,
>>
>> On Thu, 13 Oct 2022, Martin Liška wrote:
>>> I think we should add how Python scripts should be formatted. I
>>> noticed
>>> that while reading the Modula-2 patchset where it follows the C/C++
>>> style
>>> when it comes to Python files.
>>
>> good initiative, thank you! This makes sense to me, alas I'm not a
>> Python 
>> hacker, so best wait to see what David and Gaius think, too?
> 
> I'm very much +1 on recommending PEP 8.
> 
> My Python skills are bit-rotting somewhat, and I've not used flake8,
> but it seems a reasonable recommendation to me.

All right, let me install my initial patch with the improved wording.

Cheers,
Martin

> 
>>
>>
>> Some suggestions on the web side of things:
>>
>>> +Python Language Conventions
>>
>> Since the name of the page already is codingconventions.html, I
>> suggest
>> making this simply "#python" - shorter and simpler. :-)
>>
>>> +Python scripts should follow <a href="https://peps.python.org/pep-0008/">PEP 8 – Style Guide for
>>> Python Code</a>
>>> +which can be verified by flake8
>>> tool.
>>
>> ...by the...tool.
>>
>>> +We do also recommend using the following flake8 plug-
>>> ins:
>>
>> Here maybe simply say "We recommend using"?
> 
> That's a much better wording.
> 
> Dave
> 
>>
>> Hope this helps,
>> Gerald
> 



[PATCH] libgcc: Mostly vectorize CIE encoding extraction for FDEs

2022-10-17 Thread Florian Weimer via Gcc-patches
"zR" and "zPLR" are the most common augmentations.  Use a simple
SIMD-within-a-register technique to check for both augmentations,
and that the following variable-length integers have length 1, to
get more quickly at the encoding field.

libgcc/

* unwind-dw2-fde.c (get_cie_encoding_slow): Rename from
get_cie_encoding.  Mark as noinline.
(get_cie_encoding): Add fast path for "zR" and "zPLR"
augmentations.  Call get_cie_encoding_slow as a fall-back.

---
 libgcc/unwind-dw2-fde.c | 61 +++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 3c0cc654ec0..4e3a54c5a1a 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -333,8 +333,10 @@ base_from_object (unsigned char encoding, const struct 
object *ob)
 /* Return the FDE pointer encoding from the CIE.  */
 /* ??? This is a subset of extract_cie_info from unwind-dw2.c.  */
 
-static int
-get_cie_encoding (const struct dwarf_cie *cie)
+/* Disable inlining because the function is only used as a slow path in
+   get_cie_encoding below.  */
+static int __attribute__ ((noinline))
+get_cie_encoding_slow (const struct dwarf_cie *cie)
 {
   const unsigned char *aug, *p;
   _Unwind_Ptr dummy;
@@ -389,6 +391,61 @@ get_cie_encoding (const struct dwarf_cie *cie)
 }
 }
 
+static inline int
+get_cie_encoding (const struct dwarf_cie *cie)
+{
+  /* Fast path for some augmentations and single-byte variable-length
+ integers.  Do this only for targets that align struct dwarf_cie to 8
+ bytes, which ensures that at least 8 bytes are available starting at
+ cie->version.  */
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ \
+  || __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  if (__alignof (*cie) == 8 && sizeof (unsigned long long) == 8)
+{
+  unsigned long long value = *(const unsigned long long *) &cie->version;
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define C(x) __builtin_bswap64 (x)
+#else
+#define C(x) x
+#endif
+
+  /* Fast path for "zR".  Check for version 1, the "zR" string and that
+the sleb128/uleb128 values are single bytes.  In the comments
+below, '1', 'c', 'd', 'r', 'l' are version, code alignment, data
+alignment, return address column, augmentation length.  Note that
+with CIE version 1, the return address column is byte-encoded.  */
+  unsigned long long expected =
+   /*   1 z R 0 c d r l.  */
+   C (0x017a5200ULL);
+  unsigned long long mask =
+   /*   1 z R 0 c d r l.  */
+   C (0x80800080ULL);
+
+  if ((value & mask) == expected)
+   return cie->augmentation[7];
+
+  /* Fast path for "zPLR".  */
+  expected =
+   /*   1 z P L R 0 c d.  */
+   C (0x017a504c5200ULL);
+  mask =
+   /*   1 z P L R 0 c d.  */
+   C (0x8080ULL);
+#undef C
+
+  /* Validate the augmentation length, and return the encoding after
+it.  No check for the return address column because it is
+byte-encoded with CIE version 1.  */
+  if (__builtin_expect ((value & mask) == expected
+   && (cie->augmentation[8] & 0x80) == 0, 1))
+ return cie->augmentation[9];
+}
+#endif
+
+  return get_cie_encoding_slow (cie);
+}
+
 static inline int
 get_fde_encoding (const struct dwarf_fde *f)
 {

base-commit: de84a1e4b107b803ec3b064c3771a6ed8c0e201e
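
The masked-compare idea in the patch above can be sketched in a portable form. The constants below are reconstructed from the byte-layout comments in the patch ("1 z R 0 c d r l") and are an assumption of this example, not copied from the patch text. The helper names are hypothetical, and the sketch assembles the bytes most significant first instead of using `__builtin_bswap64`.

```cpp
#include <cstdint>

// Assemble 8 bytes so that the first byte lands in the most significant
// position, matching the left-to-right order of the layout comments.
std::uint64_t load_be64 (const unsigned char *p)
{
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v = (v << 8) | p[i];
  return v;
}

// Check in one comparison that the 8 bytes at P are: version 1, "zR",
// NUL, then code alignment, data alignment and augmentation length that
// each fit in a single LEB128 byte (high bit clear).  The return address
// column byte is ignored because it is byte-encoded in CIE version 1.
bool looks_like_zR (const unsigned char *p)
{
  //                               1 z R 0 c d r l
  const std::uint64_t expected = 0x017a520000000000ULL;
  const std::uint64_t mask     = 0xffffffff80800080ULL;
  return (load_be64 (p) & mask) == expected;
}
```

The mask keeps the first four bytes whole, keeps only the continuation bit of the three LEB128 bytes, and drops the return address column entirely, so one AND and one compare replace several byte-wise checks.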



Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-10-17 Thread Kewen.Lin via Gcc-patches
Hi Will,

Thanks for fixing this, some comments are inline as below.

on 2022/9/20 00:13, will schmidt wrote:
> [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865]
> 
> Hi,
>   The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
> and can be disabled by dependent options when it should not be.
> This manifests in the issue seen in PR101865 where -mno-vsx
> mistakenly disables _ARCH_PWR8.
> 
> This change replaces the relevant TARGET_DIRECT_MOVE references
> with a TARGET_POWER8 entry so that the direct_move and power8
> features can be enabled or disabled independently.
> 
> This is done via the OPTION_MASK definitions, so this
> means that some references to the OPTION_MASK_DIRECT_MOVE
> option are now replaced with OPTION_MASK_POWER8.
> 
> The existing (and rather lengthy) commentary for DIRECT_MOVE remains
> in place in rs6000-c.cc:rs6000_target_modify_macros().  The
> if-defined logic there will now set a __DIRECT_MOVE__ define when
> TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
> purposes, but is otherwise unused.  This can be removed in a
> subsequent patch, or in an update of this patch, depending on feedback.

The mentioned commentary for DIRECT_MOVE looks out of date since
option direct_move is marked as Undocumented & WarnRemoved, it can't
be enabled/disabled explicitly.  Personally I'm inclined not to
introduce a __DIRECT_MOVE__ define, since we don't have a separate
option for it now, and if users want to check the availability,
they can check __VSX__ && _ARCH_PWR8 instead.
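
A user-side check along those lines might look like the following sketch. The function name `have_p8_direct_move` is hypothetical; only the two predefined macros come from the discussion.

```cpp
// Compile-time query: true only when the compiler predefines both
// __VSX__ and _ARCH_PWR8 for the current target and options.
constexpr bool have_p8_direct_move ()
{
#if defined (__VSX__) && defined (_ARCH_PWR8)
  return true;
#else
  return false;
#endif
}
```

On targets other than PowerPC, or when either macro is absent, the function simply returns false.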

> 
> This regtests cleanly (power8,power9,power10), and resolves
> PR 101865 as represented in the tests from (1/2).
> 
> OK for trunk?
> Thanks,
> -Will
> 
> 
> gcc/
>   PR Target/101865
>   * config/rs6000/rs6000-builtin.cc
>   (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
>   usage with TARGET_POWER8.
>   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros):
>   Add __DIRECT_MOVE__ define.  Replace _ARCH_PWR8_ define
>   conditional with OPTION_MASK_POWER8.
>   * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
>   Add OPTION_MASK_POWER8 entry.
>   (POWERPC_MASKS): Same.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8.
>   (rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8.
>   * config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8.
>   * config/rs6000/vsx.md (vsx_extract_): Replace
>   TARGET_DIRECT_MOVE usage with TARGET_POWER8.
>   (define_peephole2): Same.
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 3ce729c1e6de..91a0f39bd796 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
> fncode)
>  case ENB_P7:
>return TARGET_POPCNTD;
>  case ENB_P7_64:
>return TARGET_POPCNTD && TARGET_POWERPC64;
>  case ENB_P8:
> -  return TARGET_DIRECT_MOVE;
> +  return TARGET_POWER8;
>  case ENB_P8V:
>return TARGET_P8_VECTOR;
>  case ENB_P9:
>return TARGET_MODULO;
>  case ENB_P9_64:
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index ca9cc42028f7..41d51b039061 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, 
> HOST_WIDE_INT flags)
>   turned off in any of the following conditions:
>   1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly
>   disabled and OPTION_MASK_DIRECT_MOVE was not explicitly
>   enabled.
>   2. TARGET_VSX is off.  */

As mentioned above, the comments might need some updates.

> -  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
> +  if ((OPTION_MASK_DIRECT_MOVE) != 0)
> +rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__");
> +  if ((flags & OPTION_MASK_POWER8) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
>if ((flags & OPTION_MASK_MODULO) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
>if ((flags & OPTION_MASK_POWER10) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index c3825bcccd84..c873f6d58989 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -48,10 +48,11 @@
> system.  */
>  #define ISA_2_7_MASKS_SERVER (ISA_2_6_MASKS_SERVER   \
>| OPTION_MASK_P8_VECTOR\
>| OPTION_MASK_CRYPTO   \
>| OPTION_MASK_DIRECT_MOVE  \
> +  | OPTION_MASK_POWER8   \
>  

Re: [PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)

2022-10-17 Thread Kewen.Lin via Gcc-patches
Hi Will,

Some comments are inline.

on 2022/9/20 00:05, will schmidt wrote:
> [PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option.
> 
> Hi,
> 
> This adds an assortment of tests to exercise the -mno-vsx option and
> confirm the impacts on the ARCH_PWR8 define.
> 
> These are based on and inspired by PR 101865, which
> reports that _ARCH_PWR8 is disabled when -mno-vsx
> is passed on the commandline.
> 
> There are a small number of failures introduced by these tests,
> those are resolved with the changes in part 2.
> 
> OK for trunk?
> Thanks,
> -Will
> 
> 
> gcc/testsuite:
>   * gcc.target/powerpc/predefine_p7-novsx.c: New test.
>   * gcc.target/powerpc/predefine_p8-noaltivec-novsx.c: New test.
>   * gcc.target/powerpc/predefine_p8-novsx.c: New test.
>   * gcc.target/powerpc/predefine_p9-novsx.c: New test.
>   * gcc.target/powerpc/predefine_pragma_vsx.c: New test.
> 
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c 
> b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
> new file mode 100644
> index ..e842025b4d3c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
> @@ -0,0 +1,9 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR7 and ARCH_PWR8 defines gets set

Nit: s/gets/get/.

> + * when we specify power7, plus options.
> +/* This is a variation of the test at issue in GCC PR 101865 */
> +/* { dg-options "-dM -E -mdejagnu-cpu=power7 -mno-vsx" } */
> +/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR7 
> 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define 
> _ARCH_PWR8 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define __VSX__ 
> 1($|\\n)" } } */
> +/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define __ALTIVEC__ 
> 1($|\\n)" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c 
> b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
> new file mode 100644
> index ..c3b705ca3d48
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
> @@ -0,0 +1,7 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR8 define remains set after disabling both 
> altivec and vsx. */
> +/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-altivec -mno-vsx" } */
> +/* { dg-final { scan-file predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
> _ARCH_PWR8 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
> _ARCH_PWR9 1($|\\n)" } } */
> +/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
> __VSX__ 1($|\\n)" } } */
> +/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
> __ALTIVEC__ 1($|\\n)" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c 
> b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
> new file mode 100644
> index ..8b6c69b20104
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
> @@ -0,0 +1,8 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
> +   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
> +/* This is the primary test at issue in GCC PR 101865 */

Nit: the last comment is missing a period.

> +/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-vsx" } */
> +/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define _ARCH_PWR8 
> 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p8-novsx.i "(^|\\n)#define __VSX__ 
> 1($|\\n)" } } */
> +/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define __ALTIVEC__ 
> 1($|\\n)" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c 
> b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
> new file mode 100644
> index ..eef42c111663
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
> @@ -0,0 +1,10 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
> +   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
> +/* This is the primary test at issue in GCC PR 101865 */

Nit: it seems this part of the comment was copied from the above case?
Better with "s/ARCH_PWR8 define/ARCH_PWR8 and ARCH_PWR9 defines/" and
removing the last sentence, since power9 isn't the primary test?

> +/* { dg-options "-dM -E -mdejagnu-cpu=power9 -mno-vsx" } */
> +/* {xfail *-*-*} */
> +/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR8 
> 1($|\\n)"  } } */
> +/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR9 
> 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p9-novsx.i "(^|\\n)#define __VSX__ 
> 1($|\\n)" } } */
> +/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define __ALTIVEC__ 
> 1($|\\n)" } } */
> diff --git 

GCN: Restore build with GCC 4.8 (was: [committed 1/6] amdgcn: add multiple vector sizes)

2022-10-17 Thread Thomas Schwinge
Hi!

On 2022-10-11T12:02:03+0100, Andrew Stubbs  wrote:
> --- a/gcc/config/gcn/gcn.cc
> +++ b/gcc/config/gcn/gcn.cc

> +/* Return a vector mode with N lanes of MODE.  */
> +
> +static machine_mode
> +VnMODE (int n, machine_mode mode)
> +{
> +  switch (mode)
> +{
> +case QImode:

Pushed to master branch commit 612de72b0d2904b5a5a2b487ce4cb907c768a947
"GCN: Restore build with GCC 4.8", see attached.

Cherry-picked and pushed to devel/omp/gcc-12 branch in
commit 38e4f4f55a6823d028b8f5332c500b7267ad320b
"GCN: Restore build with GCC 4.8", see attached.


Grüße
 Thomas


> +  switch (n)
> + {
> + case 2: return V2QImode;
> + case 4: return V4QImode;
> + case 8: return V8QImode;
> + case 16: return V16QImode;
> + case 32: return V32QImode;
> + case 64: return V64QImode;
> + }
> +  break;
> +case HImode:
> +  switch (n)
> + {
> + case 2: return V2HImode;
> + case 4: return V4HImode;
> + case 8: return V8HImode;
> + case 16: return V16HImode;
> + case 32: return V32HImode;
> + case 64: return V64HImode;
> + }
> +  break;
> +case HFmode:
> +  switch (n)
> + {
> + case 2: return V2HFmode;
> + case 4: return V4HFmode;
> + case 8: return V8HFmode;
> + case 16: return V16HFmode;
> + case 32: return V32HFmode;
> + case 64: return V64HFmode;
> + }
> +  break;
> +case SImode:
> +  switch (n)
> + {
> + case 2: return V2SImode;
> + case 4: return V4SImode;
> + case 8: return V8SImode;
> + case 16: return V16SImode;
> + case 32: return V32SImode;
> + case 64: return V64SImode;
> + }
> +  break;
> +case SFmode:
> +  switch (n)
> + {
> + case 2: return V2SFmode;
> + case 4: return V4SFmode;
> + case 8: return V8SFmode;
> + case 16: return V16SFmode;
> + case 32: return V32SFmode;
> + case 64: return V64SFmode;
> + }
> +  break;
> +case DImode:
> +  switch (n)
> + {
> + case 2: return V2DImode;
> + case 4: return V4DImode;
> + case 8: return V8DImode;
> + case 16: return V16DImode;
> + case 32: return V32DImode;
> + case 64: return V64DImode;
> + }
> +  break;
> +case DFmode:
> +  switch (n)
> + {
> + case 2: return V2DFmode;
> + case 4: return V4DFmode;
> + case 8: return V8DFmode;
> + case 16: return V16DFmode;
> + case 32: return V32DFmode;
> + case 64: return V64DFmode;
> + }
> +  break;
> +default:
> +  break;
> +}
> +
> +  return VOIDmode;
> +}


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 612de72b0d2904b5a5a2b487ce4cb907c768a947 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sat, 15 Oct 2022 00:10:29 +0200
Subject: [PATCH] GCN: Restore build with GCC 4.8
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

For example, for "g++-4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4", the recent
commit r13-3220-g45381d6f9f4e7b5c7b062f5ad8cc9788091c2d07
"amdgcn: add multiple vector sizes" broke the build:

In file included from [...]/source-gcc/gcc/coretypes.h:458:0,
 from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
[...]/source-gcc/gcc/config/gcn/gcn.cc: In function ‘machine_mode VnMODE(int, machine_mode)’:
./insn-modes.h:42:71: error: temporary of non-literal type ‘scalar_int_mode’ in a constant expression
 #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
   ^
[...]/source-gcc/gcc/config/gcn/gcn.cc:405:10: note: in expansion of macro ‘QImode’
 case QImode:
  ^
In file included from [...]/source-gcc/gcc/coretypes.h:478:0,
 from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
[...]/source-gcc/gcc/machmode.h:410:7: note: ‘scalar_int_mode’ is not literal because:
 class scalar_int_mode
   ^
[...]/source-gcc/gcc/machmode.h:410:7: note:   ‘scalar_int_mode’ is not an aggregate, does not have a trivial default constructor, and has no constexpr constructor that is not a copy or move constructor
[...]

Address this the way similar issues have been addressed in the past.
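
A minimal sketch of the failure mode, under stated assumptions: 'mode_code',
'int_mode' and 'bytes_of' below are hypothetical stand-ins and not GCC's real
machine_mode machinery.  A 'case' label has to be a constant expression, so a
label spelled through a macro that builds a class temporary only works if the
compiler accepts that temporary in a constant expression; a plain enumerator
has no such requirement.

```cpp
#include <cassert>

// Hypothetical stand-in for the E_* enumerators (assumption, not GCC code).
enum mode_code { E_QI, E_HI, E_SI };

// Hypothetical stand-in for a wrapper class in the style of scalar_int_mode.
// The constructor is deliberately not constexpr, so an expression such as
// int_mode (E_QI) is not a constant expression and cannot be a case label.
class int_mode
{
public:
  explicit int_mode (mode_code c) : m_code (c) {}
  operator mode_code () const { return m_code; }
private:
  mode_code m_code;
};

// Returns the element size in bytes, or 0 for a mode this sketch does not know.
int
bytes_of (int_mode mode)
{
  switch (mode)            // converts to mode_code via the conversion operator
    {
    case E_QI: return 1;   // plain enumerators are always valid case labels
    case E_HI: return 2;
    case E_SI: return 4;
    }
  return 0;
}
```

The switch still takes the wrapper object; only the labels change from wrapper
temporaries to enumerators, which mirrors the 'case QImode:' to 'case E_QImode:'
change in shape.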

	gcc/
	* config/gcn/gcn.cc (VnMODE): Use 'case E_QImode:' instead of
	'case QImode:', etc.
---
 gcc/config/gcn/gcn.cc | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 3dc294c2d2f..8777255a5c6 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -402,7 +402,7 @@ VnMODE (int n, machine_mode mode)
 {
   switch (mode)
 {
-case QImode:
+case E_QImode:
   switch (n)
 	{
 	case 2: return V2QImode;
@@ 

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-17 Thread Richard Biener via Gcc-patches
On Tue, 11 Oct 2022, juzhe.zh...@rivai.ai wrote:

> Hi, I applied this patch in the RVV downstream and tested it with a lot of vector 
> benchmarks. Overall it gives a great performance gain.
> Maybe the last thing before merging this patch is to wait for Richard Sandiford to test 
> it on ARM SVE?
> 
> By the way, would you mind sharing the list of cases that GCC fails to vectorize 
> but Clang succeeds on? 
> I am familiar with LLVM. I think I can do this job.

Sure - for (most of?) the TSVC tests GCC is failing to vectorize there
should be a bugzilla open.  So if you have TSVC tests that LLVM can
vectorize but GCC can not then check if there is a bugzilla open.
If so, note that LLVM can vectorize, if not, open one.

I have now pushed the change.

Thanks,
Richard.


Tag 'gcc/gimple-expr.cc:mark_addressable_2' as 'static' (was: [PR67891] drop is_gimple_reg test from set_parm_rtl)

2022-10-17 Thread Thomas Schwinge
Hi!

On 2015-11-03T02:29:41-0200, Alexandre Oliva  wrote:
> Thanks, here's the patch as just installed.

> --- a/gcc/gimple-expr.c
> +++ b/gcc/gimple-expr.c

> +static void
> +mark_addressable_1 (tree x)
> +{
> +  [...]
> +}
> +
> +/* Adaptor for mark_addressable_1 for use in hash_set traversal.  */
> +
> +bool
> +mark_addressable_2 (tree const &x, void * ATTRIBUTE_UNUSED = NULL)
> +{
> +  mark_addressable_1 (x);
> +  return false;
> +}

Found already a while ago, now pushed to master branch in
commit aeb1e2bff95ae17717026905ef404699d91f5c61
"Tag 'gcc/gimple-expr.cc:mark_addressable_2' as 'static'", see attached.


Grüße
 Thomas


> +void
> +flush_mark_addressable_queue ()
> +{
> +  gcc_assert (!currently_expanding_to_rtl);
> +  if (mark_addressable_queue)
> +{
> +  mark_addressable_queue->traverse<void*, mark_addressable_2> (NULL);
> +  delete mark_addressable_queue;
> +  mark_addressable_queue = NULL;
> +}
> +}

> --- a/gcc/gimple-expr.h
> +++ b/gcc/gimple-expr.h

> +extern void flush_mark_addressable_queue (void);


>From aeb1e2bff95ae17717026905ef404699d91f5c61 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 15 Dec 2021 22:00:53 +0100
Subject: [PATCH] Tag 'gcc/gimple-expr.cc:mark_addressable_2' as 'static'

Added in 2015 r229696 (commit 1b223a9f3489296c625bdb7cc764196d04fd9231)
"defer mark_addressable calls during expand till the end of expand",
it has never been used 'extern'ally.
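
A minimal sketch of the pattern, assuming hypothetical names ('visit_1',
'visit_adaptor', 'traverse', 'sum_all') and std::set in place of GCC's
hash_set: an adaptor that is only ever named as a template argument inside its
own translation unit can have internal linkage.

```cpp
#include <cassert>
#include <set>

// File-local accumulator, only here so the sketch has an observable effect.
static int visited_sum = 0;

// The real work, file-local.
static void
visit_1 (int x)
{
  visited_sum += x;
}

// Adaptor with internal linkage.  It is only referenced below as a template
// argument, so nothing outside this file needs to see it.
static bool
visit_adaptor (const int &x, void *)
{
  visit_1 (x);
  return false;
}

// Hypothetical traversal helper taking the callback as a non-type template
// argument.  This sketch ignores the callback's return value.
template <typename Arg, bool (*Callback) (const int &, Arg)>
void
traverse (const std::set<int> &s, Arg arg)
{
  for (const int &x : s)
    Callback (x, arg);
}

// The only entry point with external linkage.
int
sum_all (const std::set<int> &s)
{
  visited_sum = 0;
  traverse<void *, visit_adaptor> (s, nullptr);
  return visited_sum;
}
```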

	gcc/
	* gimple-expr.cc (mark_addressable_2): Tag as 'static'.
---
 gcc/gimple-expr.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-expr.cc b/gcc/gimple-expr.cc
index c9c7285efbc..4fbce9369c7 100644
--- a/gcc/gimple-expr.cc
+++ b/gcc/gimple-expr.cc
@@ -912,7 +912,7 @@ mark_addressable_1 (tree x)
 
 /* Adaptor for mark_addressable_1 for use in hash_set traversal.  */
 
-bool
+static bool
 mark_addressable_2 (tree const &x, void * ATTRIBUTE_UNUSED = NULL)
 {
   mark_addressable_1 (x);
-- 
2.35.1



Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible)

2022-10-17 Thread Thomas Schwinge
Hi!

On 2022-09-12T14:02:16+0200, Tobias Burnus  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
> @@ -0,0 +1,15 @@
> +/* { dg-do link { target { offload_target_nvptx } } } */
> +/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 
> -foffload=-mptx=_" } */

Pushed to master branch
commit b61796663ba1fe8fb83203829398f3f89ec212b7
"Fix nvptx-specific '-foffload-options' syntax in
'libgomp.c/reverse-offload-sm30.c'", see attached.

Cherry-picked and pushed to devel/omp/gcc-12 branch in
commit f36ce95ad928578aa6739f61480e6c8fbaf2248e
"Fix nvptx-specific '-foffload-options' syntax in
'libgomp.c/reverse-offload-sm30.c'", see attached.


Grüße
 Thomas


> +
> +#pragma omp requires reverse_offload
> +
> +int
> +main ()
> +{
> +  #pragma omp target
> +{
> +}
> +  return 0;
> +}
> +
> +/* { dg-warning "'omp requires reverse_offload' requires at least 'sm_35' 
> for '-march=' - disabling offload-code generation for this device type" "" { 
> target *-*-* } 0 } */


>From b61796663ba1fe8fb83203829398f3f89ec212b7 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 23 Sep 2022 11:29:50 +0200
Subject: [PATCH] Fix nvptx-specific '-foffload-options' syntax in
 'libgomp.c/reverse-offload-sm30.c'

That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.

Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".

	libgomp/
	* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
	'-foffload-options' syntax.
---
 libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
index fbfeae1fd41..7f10fd4ded9 100644
--- a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
+++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload-options=nvptx-none=-mptx=_" } */
 
 #pragma omp requires reverse_offload
 
-- 
2.35.1

>From f36ce95ad928578aa6739f61480e6c8fbaf2248e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 23 Sep 2022 11:29:50 +0200
Subject: [PATCH] Fix nvptx-specific '-foffload-options' syntax in
 'libgomp.c/reverse-offload-sm30.c'

That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.

Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".

	libgomp/
	* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
	'-foffload-options' syntax.

(cherry picked from commit b61796663ba1fe8fb83203829398f3f89ec212b7)
---
 libgomp/ChangeLog.omp  | 8 
 libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index cb3541be378..048314eb1be 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,11 @@
+2022-10-17  Thomas Schwinge  
+
+	Backported from master:
+	2022-10-17  Thomas Schwinge  
+
+	* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
+	'-foffload-options' syntax.
+
 2022-10-14  Julian Brown  
 
 	* testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output.
diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
index fbfeae1fd41..7f10fd4ded9 100644
--- a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
+++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload-options=nvptx-none=-mptx=_" } */
 
 #pragma omp requires reverse_offload
 
-- 
2.35.1



RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-17 Thread Thomas Schwinge
Hi!

On 2022-10-17T06:11:01+, "Jiang, Haochen via Gcc-patches" 
 wrote:
> I just checkout to your commit and the test still got failed.
>
> It is reporting like this:
> xgcc: error: 
> /export/users2/haochenj/src/gcc/master/./libgomp/testsuite/libgomp.oacc-c++/../libgomp.oacc-c-c++-common/kernels-loop-g.c:
>  '-fcompare-debug' failure (length)

Right.  I had filed  "[13 Regression]
c-c++-common/goacc/kernels-loop-g.c: '-fcompare-debug' failure (length)"
with you in CC: .  (Have you not received that
one?)


Grüße
 Thomas


> Also fix a typo in manually sending, should be this to reproduce
>
> To reproduce:
>
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
>
> BRs,
> Haochen
>
> From: Jiang, Haochen
> Sent: Monday, October 17, 2022 1:41 PM
> To: Eugene Rozenfeld ; 
> gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org
> Subject: RE: [r13-3172 Regression] 
> FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
> excess errors) on Linux/x86_64
>
> If that has been fixed, just ignore that mail.
>
> It is run through by a script and got the result few days ago. However, the 
> sendmail
> service was down on that machine and I just noticed that issue. So I sent 
> that result
> manually today in case that is not fixed.
>
> Sorry for the disturb!
>
> BRs,
> Haochen
>
> From: Eugene Rozenfeld <eugene.rozenf...@microsoft.com>
> Sent: Monday, October 17, 2022 1:23 PM
> To: Jiang, Haochen <haochen.ji...@intel.com>; gcc-patches@gcc.gnu.org; 
> gcc-regress...@gcc.gnu.org
> Subject: RE: [r13-3172 Regression] 
> FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
> excess errors) on Linux/x86_64
>
> That commit had a bug that was fixed in 
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee
>
> Was that fix included in your GCC build?
>
> From: Jiang, Haochen <haochen.ji...@intel.com>
> Sent: Sunday, October 16, 2022 8:09 PM
> To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld 
> <eugene.rozenf...@microsoft.com>; Jiang, Haochen <haochen.ji...@intel.com>; 
> gcc-regress...@gcc.gnu.org
> Subject: [EXTERNAL] [r13-3172 Regression] 
> FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
> excess errors) on Linux/x86_64
>
> On Linux/x86_64,
>
> f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
> commit f30e9fd33e56a5a721346ea6140722e1b193db42
> Author: Eugene Rozenfeld <ero...@microsoft.com>
> Date:   Thu Apr 21 16:43:24 2022 -0700
>
> Set discriminators for call stmts on the same line within the same basic 
> block.
>
> caused
>
> FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
> excess errors)
>
> with GCC configured with
>
> ../../gcc/configure 
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
> --target_board='unix{-m64\ -march=cascadelake}'"

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-10-17 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 10 Oct 2022 at 16:18, Prathamesh Kulkarni
 wrote:
>
> On Fri, 30 Sept 2022 at 21:38, Richard Sandiford
>  wrote:
> >
> > Richard Sandiford via Gcc-patches  writes:
> > > Prathamesh Kulkarni  writes:
> > >> Sorry to ask a silly question but in which case shall we select 2nd 
> > >> vector ?
> > >> For num_poly_int_coeffs == 2,
> > >> a1 /trunc n1 == (a1 + 0x) / (n1.coeffs[0] + n1.coeffs[1]*x)
> > >> If a1/trunc n1 succeeds,
> > >> 0 / n1.coeffs[1] == a1/n1.coeffs[0] == 0.
> > >> So, a1 has to be < n1.coeffs[0] ?
> > >
> > > Remember that a1 is itself a poly_int.  It's not necessarily a constant.
> > >
> > > E.g. the TRN1 .D instruction maps to a VEC_PERM_EXPR with the selector:
> > >
> > >   { 0, 2 + 2x, 1, 4 + 2x, 2, 6 + 2x, ... }
> >
> > Sorry, should have been:
> >
> >   { 0, 2 + 2x, 2, 4 + 2x, 4, 6 + 2x, ... }
> Hi Richard,
> Thanks for the clarifications, and sorry for late reply.
> I have attached POC patch that tries to implement the above approach.
> Passes bootstrap+test on x86_64-linux-gnu and aarch64-linux-gnu for VLS 
> vectors.
>
> For VLA vectors, I have only done limited testing so far.
> It seems to pass couple of tests written in the patch for
> nelts_per_pattern == 3,
> and folds the following svld1rq test:
> int32x4_t v = {1, 2, 3, 4};
> return svld1rq_s32 (svptrue_b8 (), &v[0])
> into:
> return {1, 2, 3, 4, ...};
> I will try to bootstrap+test it on SVE machine to test further for VLA 
> folding.
With the attached patch it seems to pass bootstrap+test with SVE enabled.
The only difference w.r.t. the previous patch is that
get_vector_for_pattern
now checks whether S is constant and returns NULL_TREE if it is not.

I added this check because 930325-1.c ICE'd with the previous patch:
it had the following vec_perm_expr,
where S was non-constant:
vect__16.13_70 = VEC_PERM_EXPR ;
I am not sure how to proceed in this case, so chose to bail out.

Thanks,
Prathamesh

>
> I have a couple of questions:
> 1] When mask selects elements from same vector but from different patterns:
> For eg:
> arg0 = {1, 11, 2, 12, 3, 13, ...},
> arg1 = {21, 31, 22, 32, 23, 33, ...},
> mask = {0, 0, 0, 1, 0, 2, ... },
> All have npatterns = 2, nelts_per_pattern = 3.
>
> With above mask,
> Pattern {0, ...} selects arg0[0], ie {1, ...}
> Pattern {0, 1, 2, ...} selects arg0[0], arg0[1], arg0[2], ie {1, 11, 2, ...}
> While arg0[0] and arg0[2] belong to same pattern, arg0[1] belongs to different
> pattern in arg0.
> The result is:
> res = {1, 1, 1, 11, 1, 2, ...}
> In this case, res's 2nd pattern {1, 11, 2, ...} is encoded with:
> with a0 = 1, a1 = 11, S = -9.
> Is that expected tho ? It seems to create a new encoding which
> wasn't present in the input vector. For instance, the next elem in
> sequence would be -7,
> which is not present originally in arg0.
> I suppose it's fine since if the user defines mask to have pattern {0,
> 1, 2, ...}
> they intended result to have pattern with above encoding.
> Just wanted to confirm if this is correct ?
>
> 2] Could you please suggest a test-case for S < 0 ?
> I am not able to come up with one :/
>
> Thanks,
> Prathamesh
> >
> > > which is an interleaving of the two patterns:
> > >
> > >   { 0, 2, 4, ... }  a0 = 0, a1 = 2, S = 2
> > >   { 2 + 2x, 4 + 2x, 6 + 2x }  a0 = 2 + 2x, a1 = 4 + 2x, S = 2
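
The stepped-pattern encoding discussed above (a0, a1 and a step S per pattern,
with patterns interleaved) can be sketched as follows.  This is a simplified
model under assumptions: 'encoded_vector' and 'element_at' are hypothetical
names, elements are plain integers rather than trees or poly_ints, and the
decoding rule is the one implied by the examples in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a pattern-encoded constant vector.
struct encoded_vector
{
  unsigned npatterns;          // number of interleaved patterns
  unsigned nelts_per_pattern;  // 1, 2 or 3 explicitly encoded elements each
  std::vector<long> elts;      // npatterns * nelts_per_pattern values
};

// Decode element I of the (conceptually unbounded) vector V.
long
element_at (const encoded_vector &v, std::size_t i)
{
  assert (v.nelts_per_pattern >= 1 && v.nelts_per_pattern <= 3);
  assert (v.elts.size () == std::size_t (v.npatterns) * v.nelts_per_pattern);

  std::size_t pattern = i % v.npatterns;  // which pattern element I belongs to
  std::size_t index = i / v.npatterns;    // position of I within that pattern

  // Explicitly encoded elements are read back directly.
  if (index < v.nelts_per_pattern)
    return v.elts[index * v.npatterns + pattern];

  // Last explicitly encoded element of this pattern.
  long last = v.elts[(v.nelts_per_pattern - 1) * v.npatterns + pattern];
  if (v.nelts_per_pattern < 3)
    return last;                          // the pattern just repeats

  // Three encoded elements: continue with the step between the last two.
  long prev = v.elts[v.npatterns + pattern];
  long step = last - prev;
  return last + long (index - 2) * step;
}
```

With the result vector from question 1], {1, 1, 1, 11, 1, 2, ...}, the second
pattern decodes as 1, 11, 2 and then continues with 2 + (-9) = -7, which is the
value the question points out as not being present in arg0.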
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 9f7beae14e5..e93f2c7b592 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -85,6 +85,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "vec-perm-indices.h"
 #include "asan.h"
 #include "gimple-range.h"
+#include 
+#include "tree-pretty-print.h"
+#include "print-tree.h"
 
 /* Nonzero if we are folding constants inside an initializer or a C++
manifestly-constant-evaluated context; zero otherwise.
@@ -10494,38 +10497,56 @@ fold_mult_zconjz (location_t loc, tree type, tree 
expr)
  build_zero_cst (itype));
 }
 
+/* Check if PATTERN in SEL selects either ARG0 or ARG1,
+   and return the selected arg, otherwise return NULL_TREE.  */
 
-/* Helper function for fold_vec_perm.  Store elements of VECTOR_CST or
-   CONSTRUCTOR ARG into array ELTS, which has NELTS elements, and return
-   true if successful.  */
-
-static bool
-vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
+static tree
+get_vector_for_pattern (tree arg0, tree arg1,
+   const vec_perm_indices &sel, unsigned pattern)
 {
-  unsigned HOST_WIDE_INT i, nunits;
+  unsigned sel_npatterns = sel.encoding ().npatterns ();
+  unsigned sel_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
 
-  if (TREE_CODE (arg) == VECTOR_CST
-  && VECTOR_CST_NELTS (arg).is_constant ())
+  poly_uint64 n1 = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+  poly_uint64 nsel = sel.length ();
+  poly_uint64 esel;
+
+  if (!multiple_p (nsel, sel_npatterns, &esel))
+return NULL_TREE;
+
+  poly_uint64 a1 = sel[pattern + sel_npatterns];
+  int64_t S = 0;
+  if (sel_nelts_per_pattern == 3)
 {
-  for (i = 0; 

Re: [PATCH] gcc: honour -ffile-prefix-map in ASM_MAP [PR93371]

2022-10-17 Thread Rasmus Villemoes
On 27/09/2022 08.54, Rasmus Villemoes wrote:
> On 12/09/2022 11.46, Rasmus Villemoes wrote:
>> On 29/08/2022 11.29, Rasmus Villemoes wrote:
>>> -ffile-prefix-map is supposed to be a superset of -fmacro-prefix-map
>>> and -fdebug-prefix-map. However, when building .S or .s files, gas is
>>> not called with the appropriate --debug-prefix-map option when
>>> -ffile-prefix-map is used.
>>>
>>> While the user can specify -fdebug-prefix-map when building assembly
>>> files via gcc, it's more ergonomic to also support -ffile-prefix-map;
>>> especially since for .S files that could contain the __FILE__ macro,
>>> one would then also have to specify -fmacro-prefix-map.
>>>
>>> gcc:
>>> PR driver/93371
>>> * gcc.cc (ASM_MAP): Honour -ffile-prefix-map.
>>> ---
>>>
>>> I've tested that this works as expected, both by looking at how gas is
>>> now invoked, and by running 'strings' on the generated .o file. But I
>>> don't know how to add something to the testsuite for this.
>>
>> Is this ok for trunk? If so, how about older maintained branches?
>>
>> And does anyone have ideas for how I could add a test case?
> 
> ping.
> 

ping^2


RE: [PATCH]middle-end fix floating out of constants in conditionals

2022-10-17 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, September 26, 2022 11:28 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; jeffreya...@gmail.com;
> ebotca...@adacore.com
> Subject: Re: [PATCH]middle-end fix floating out of constants in conditionals
> 
> On Fri, 23 Sep 2022, Tamar Christina wrote:
> 
> > Hi All,
> >
> > The following testcase:
> >
> > int zoo1 (int a, int b, int c, int d)
> > {
> >return (a > b ? c : d) & 1;
> > }
> >
> > gets de-optimized by the front-end since somewhere around GCC 4.x due
> > to a fix that was added to fold_binary_op_with_conditional_arg.
> >
> > The folding is supposed to succeed only if we have folded at least one
> > of the branches, however the check doesn't test that all of the
> > values are non-constant.  So if one of the operators is a constant it
> accepts the folding.
> >
> > This ends up folding
> >
> >return (a > b ? c : d) & 1;
> >
> > into
> >
> >return (a > b ? c & 1 : d & 1);
> >
> > and thus performing the AND twice.
> >
> > This change rejects the folding if one of the arguments is a
> > constant and if the operations being performed are the same.
> >
> > Secondly it adds a new match.pd rule to now also fold the opposite
> > direction, so it now also folds:
> >
> >return (a > b ? c & 1 : d & 1);
> >
> > into
> >
> >return (a > b ? c : d) & 1;
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * fold-const.cc (fold_binary_op_with_conditional_arg): Add
> relaxation.
> > * match.pd: Add ternary constant fold rule.
> > * tree-cfg.cc (verify_gimple_assign_ternary): RHS1 of a COND_EXPR
> isn't
> > a value but an expression itself.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/if-compare_3.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index
> >
> 4f4ec81c8d4b6937ade3141a14c695b67c874c35..0ee083f290d12104969f1b335d
> c3
> > 3917c97b4808 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -7212,7 +7212,9 @@ fold_binary_op_with_conditional_arg (location_t
> loc,
> >  }
> >
> >/* Check that we have simplified at least one of the branches.  */
> > -  if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) &&
> !TREE_CONSTANT
> > (rhs))
> > +  if ((!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) &&
> !TREE_CONSTANT (rhs))
> > +  || (TREE_CONSTANT (arg) && TREE_CODE (lhs) == TREE_CODE (rhs)
> > + && !TREE_CONSTANT (lhs)))
> >  return NULL_TREE;
> 
> I think the better fix would be to only consider TREE_CONSTANT (arg) if it
> wasn't constant initially.  Because clearly "simplify" intends to be 
> "constant"
> here.  In fact I wonder why we test !TREE_CONSTANT (arg) at all, we don't
> simplify 'arg' ...

The function allows this because even when !TREE_CONSTANT (arg) the
true_value or false_value can instead be constant.

Yes it's of limited use unless true_value and false_value are 0 or 1.

> 
> Eric added this test (previously we'd just always done the folding), but I 
> think
> not enough?
> 
> >
> >return fold_build3_loc (loc, cond_code, type, test, lhs, rhs); diff
> > --git a/gcc/match.pd b/gcc/match.pd index
> >
> b225d36dc758f1581502c8d03761544bfd499c01..b61ed70e69b881a49177f10f20
> c1
> > f92712bb8665 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -4318,6 +4318,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(op @3 (vec_cond:s @0 @1 @2))
> >(vec_cond @0 (op! @3 @1) (op! @3 @2
> >
> > +/* Float out binary operations from branches if they can't be folded.
> > +   Fold (a ? (b op c) : (d op c)) --> (op (a ? b : d) c).  */ (for op
> > +(plus mult min max bit_and bit_ior bit_xor minus lshift rshift rdiv
> > +trunc_div ceil_div floor_div round_div trunc_mod ceil_mod
> floor_mod
> > +round_mod)
> > + (simplify
> > +  (cond @0 (op @1 @2) (op @3 @2))
> > +   (if (!FLOAT_TYPE_P (type) || !(HONOR_NANS (@1) &&
> flag_trapping_math))
> > +(op (cond @0 @1 @3) @2
> 
> Ick.  Adding a reverse tranform is going to be prone to recursing :/
> 
> Why do you need to care about NANs or FP exceptions?

Because the function in fold-const.cc has a check in the start:

  /* Do not move possibly trapping operations into the conditional as this
 pessimizes code and causes gimplification issues when applied late.  */

And so I stayed conservative and just didn't want to touch them.

> How do you know if they can't be folded?

Because otherwise they would have been reduced in fold-const.cc.
The new conditions, along with the rest in fold-const.cc, require:

1. at least one of the operands to be constant
2. if the operands are equal, at least one of them must have been reduced to a 
constant.

Together these mean that (cond @0 (op @1 @2) (op @3 @2)) can't match if it's 
something
that fold-const.cc can handle.
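
For reference, a small sketch of the two source forms under discussion.  The
function name 'zoo1' comes from the testcase; 'zoo1_distributed' is a
hypothetical name for the form that the front-end folding currently produces
and that the proposed match.pd rule would turn back into the first form.  Both
compute the same value; the difference is only whether the AND is performed
once after the selection or once per branch.

```cpp
#include <cassert>

// Form from the testcase: select first, then a single AND.
int
zoo1 (int a, int b, int c, int d)
{
  return (a > b ? c : d) & 1;
}

// Form after moving the operation into both arms of the conditional:
// the same constant operand 1 appears in each branch.
int
zoo1_distributed (int a, int b, int c, int d)
{
  return a > b ? (c & 1) : (d & 1);
}
```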

> Since match.pd cannot handle 

[PATCH] libgcc: Special-case BFD ld unwind table encodings in find_fde_tail

2022-10-17 Thread Florian Weimer via Gcc-patches
BFD ld (and the other linkers) only produce one encoding of these
values.  It is not necessary to use the general
read_encoded_value_with_base decoding routine.  This avoids the
data-dependent branches in its implementation.
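
A minimal sketch of the idea, under stated assumptions: 'read_general' and
'read_value' are hypothetical, much-reduced stand-ins for the libgcc routines,
base adjustments are left out entirely, and only two value formats are
modelled.  The point is just the shape: test for the one encoding that is
expected in practice, read it with a fixed-size memcpy, and fall back to the
general decoder otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pointer-encoding constants with the DW_EH_PE_* values.
constexpr unsigned char PE_udata4 = 0x03;
constexpr unsigned char PE_sdata4 = 0x0b;
constexpr unsigned char PE_pcrel = 0x10;

// Hypothetical reduced general decoder: dispatches on the format bits.
// Returns the advanced pointer, or nullptr for a format not modelled here.
static const unsigned char *
read_general (unsigned char enc, const unsigned char *p, std::int64_t *val)
{
  switch (enc & 0x0f)
    {
    case PE_udata4:
      {
        std::uint32_t v;
        std::memcpy (&v, p, sizeof (v));
        *val = v;
        return p + sizeof (v);
      }
    case PE_sdata4:
      {
        std::int32_t v;
        std::memcpy (&v, p, sizeof (v));
        *val = v;
        return p + sizeof (v);
      }
    default:
      return nullptr;
    }
}

// Fast path for the single expected encoding, general decoder otherwise.
const unsigned char *
read_value (unsigned char enc, const unsigned char *p, std::int64_t *val)
{
  if (enc == (PE_sdata4 | PE_pcrel))
    {
      std::int32_t v;                   // fixed-size signed 4-byte read
      std::memcpy (&v, p, sizeof (v));  // memcpy avoids alignment assumptions
      *val = v;
      return p + sizeof (v);
    }
  return read_general (enc, p, val);
}
```

The patch additionally marks the fast-path condition with __builtin_expect;
the sketch uses a plain 'if' to stay compiler-neutral.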

libgcc/

* unwind-dw2-fde-dip.c (find_fde_tail): Special-case encoding
values actually used by BFD ld.

---
 libgcc/unwind-dw2-fde-dip.c | 58 +
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 7f9be5e6b02..f370c1279ae 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -396,10 +396,21 @@ find_fde_tail (_Unwind_Ptr pc,
   if (hdr->version != 1)
 return NULL;
 
-  p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
-   base_from_cb_data (hdr->eh_frame_ptr_enc,
-  dbase),
-   p, &eh_frame);
+  if (__builtin_expect (hdr->eh_frame_ptr_enc == (DW_EH_PE_sdata4
+ | DW_EH_PE_pcrel), 1))
+{
+  /* Specialized version of read_encoded_value_with_base, based on what
+BFD ld generates.  */
+  signed value __attribute__ ((mode (SI)));
+  memcpy (&value, p, sizeof (value));
+  p += sizeof (value);
+  dbase = value;   /* No adjustment because pcrel has base 0.  */
+}
+  else
+p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
+ base_from_cb_data (hdr->eh_frame_ptr_enc,
+dbase),
+ p, &eh_frame);
 
   /* We require here specific table encoding to speed things up.
  Also, DW_EH_PE_datarel here means using PT_GNU_EH_FRAME start
@@ -409,10 +420,20 @@ find_fde_tail (_Unwind_Ptr pc,
 {
   _Unwind_Ptr fde_count;
 
-  p = read_encoded_value_with_base (hdr->fde_count_enc,
-   base_from_cb_data (hdr->fde_count_enc,
-  dbase),
-   p, &fde_count);
+  if (__builtin_expect (hdr->fde_count_enc == DW_EH_PE_udata4, 1))
+   {
+ /* Specialized version of read_encoded_value_with_base, based on
+what BFD ld generates.  */
+ unsigned value __attribute__ ((mode (SI)));
+ memcpy (&value, p, sizeof (value));
+ p += sizeof (value);
+ fde_count = value;
+   }
+  else
+   p = read_encoded_value_with_base (hdr->fde_count_enc,
+ base_from_cb_data (hdr->fde_count_enc,
+dbase),
+ p, &fde_count);
   /* Shouldn't happen.  */
   if (fde_count == 0)
return NULL;
@@ -454,8 +475,25 @@ find_fde_tail (_Unwind_Ptr pc,
  f = (fde *) (table[mid].fde + data_base);
  f_enc = get_fde_encoding (f);
  f_enc_size = size_of_encoded_value (f_enc);
- read_encoded_value_with_base (f_enc & 0x0f, 0,
-   &f->pc_begin[f_enc_size], &range);
+
+ /* BFD ld uses DW_EH_PE_sdata4 | DW_EH_PE_pcrel on non-FDPIC targets,
+so optimize for that.
+
+This optimization is not valid for FDPIC targets.  f_enc & 0x0f as
+passed to read_encoded_value_with_base masks away the base flags,
+but they are implicit for FDPIC.  */
+#ifndef __FDPIC__
+ if (__builtin_expect (f_enc == (DW_EH_PE_sdata4 | DW_EH_PE_pcrel),
+   1))
+   {
+ signed value __attribute__ ((mode (SI)));
+ memcpy (&value, &f->pc_begin[f_enc_size], sizeof (value));
+ range = value;
+   }
+ else
+#endif
+   read_encoded_value_with_base (f_enc & 0x0f, 0,
+ &f->pc_begin[f_enc_size], &range);
  _Unwind_Ptr func = table[mid].initial_loc + data_base;
  if (pc < table[mid].initial_loc + data_base + range)
{

base-commit: de7d6310862c6045cf2dfb0ef209ff0e0923e648
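
A minimal sketch of the idea behind the patch, written as standalone C++ rather than libgcc code: when the encoding byte is one that BFD ld is known to emit, a fixed-size `memcpy` load can stand in for the general decoder. The helper names (`classify`, `load_sdata4`, `load_udata4`, `decode_sdata4_pcrel`) are hypothetical and do not exist in libgcc; only the `DW_EH_PE_*` values are the standard ones. The pcrel helper assumes the usual meaning of `DW_EH_PE_pcrel`, that is, an offset relative to the address of the encoded field itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Standard DWARF EH pointer-encoding values.
constexpr unsigned DW_EH_PE_udata4 = 0x03;
constexpr unsigned DW_EH_PE_sdata4 = 0x0b;
constexpr unsigned DW_EH_PE_pcrel = 0x10;

// Hypothetical classification of an encoding byte into a decode strategy.
enum class DecodePath { FastSdata4Pcrel, FastUdata4, General };

inline DecodePath classify(unsigned char enc) {
  if (enc == (DW_EH_PE_sdata4 | DW_EH_PE_pcrel))
    return DecodePath::FastSdata4Pcrel;  // common case for pointers
  if (enc == DW_EH_PE_udata4)
    return DecodePath::FastUdata4;       // common case for counts
  return DecodePath::General;            // fall back to the generic decoder
}

// Load a 4-byte signed value without assuming alignment.
inline std::int32_t load_sdata4(const unsigned char* p) {
  assert(p != nullptr);
  std::int32_t value;
  std::memcpy(&value, p, sizeof value);
  return value;
}

// Load a 4-byte unsigned value without assuming alignment.
inline std::uint32_t load_udata4(const unsigned char* p) {
  assert(p != nullptr);
  std::uint32_t value;
  std::memcpy(&value, p, sizeof value);
  return value;
}

// pcrel: the stored offset is relative to the address of the field itself.
inline const unsigned char* decode_sdata4_pcrel(const unsigned char* field) {
  return field + load_sdata4(field);
}
```

The fast paths contain no switch on the encoding, which is the data-dependent branching the patch description wants to avoid; anything classified as `General` would still go through the existing routine.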



Re: [PATCH v2] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-10-17 Thread Kewen.Lin via Gcc-patches
Hi Iain,

on 2022/10/13 18:09, Iain Sandoe wrote:
> 
> 
>> On 12 Oct 2022, at 09:57, Iain Sandoe  wrote:
>>> On 12 Oct 2022, at 09:12, Kewen.Lin  wrote:
>>
>>> PR106680 shows that -m32 -mpowerpc64 is different from
>>> -mpowerpc64 -m32, this is determined by the way how we
>>> handle option powerpc64 in rs6000_handle_option.
>>>
>>> Segher pointed out this difference should be taken as
>>> a bug and we should ensure that option powerpc64 is
>>> independent of -m32/-m64.  So this patch removes the
>>> handlings in rs6000_handle_option and add some necessary
>>> supports in rs6000_option_override_internal instead.
>>>
>>> With this patch, if users specify -m{no-,}powerpc64, the
>>> specified value is honoured, otherwise, for 64bit it
>>> always enables OPTION_MASK_POWERPC64; while for 32bit
>>> and TARGET_POWERPC64 and OS_MISSING_POWERPC64, it disables
>>> OPTION_MASK_POWERPC64.
>>>
>>> btw, following Segher's suggestion, I did some tries to warn
>>> when OPTION_MASK_POWERPC64 is set for OS_MISSING_POWERPC64.
>>> If warn for the case that powerpc64 is specified explicitly,
>>> there are some TCs using -m32 -mpowerpc64 on ppc64-linux,
>>> they need some updates, meanwhile the artificial run
>>> with "--target_board=unix'{-m32/-mpowerpc64}'" will have
>>> noisy warnings on ppc64-linux.  If warn for the case that
>>> it's specified implicitly, they can just be initialized by
>>> TARGET_DEFAULT (like -m32 on ppc64-linux) or set from the 
>>> given cpu mask, we have to special case them and not to warn.
>>> As Segher's latest comment, I decide not to warn them and
>>> keep it consistent with before.
>>>
>>> Bootstrapped and regress-tested on:
>>> - powerpc64-linux-gnu P7 and P8 {-m64,-m32}
>>> - powerpc64le-linux-gnu P9 and P10
>>> - powerpc-ibm-aix7.2.0.0 {-maix64,-maix32}
>>>
>>> Hi Iain, could you help to test this new patch on darwin
>>> again?  Thanks in advance!
>>
>> I kicked off a bootstrap - and 'check-gcc-c' .. if all goes well, there will
>> be an answer in ≈ 7 hours.  If something fails, the answer will be sooner ;)
> 
> bootstrapped and tested on powerpc-darwin9, with default CPU configuration.
> I have not yet tried tuning or cpu configure options.
> 
> test results compare "nominal" against a recent set (another day elapsed time
> would be needed for a proper regtest).

Sounds good!  Many thanks again for your help!

BR,
Kewen
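
The option resolution described in the quoted patch summary above can be sketched as a pure function of the final option state, which is what makes the result independent of the order of -m32/-m64 and -mpowerpc64 on the command line. This is an illustrative model only, not the rs6000 code: the struct, its field names and `resolve_powerpc64` are hypothetical, and the rules are taken solely from the summary (an explicit -m{no-,}powerpc64 is honoured; otherwise 64-bit enables the flag; otherwise 32-bit with a default-on flag on an OS lacking support disables it).

```cpp
#include <cassert>
#include <optional>

// Hypothetical snapshot of the state after all options have been parsed.
struct Powerpc64Inputs {
  std::optional<bool> explicit_powerpc64;  // set by -mpowerpc64 / -mno-powerpc64
  bool target_64bit;                       // true for -m64, false for -m32
  bool default_powerpc64;                  // from TARGET_DEFAULT or the cpu mask
  bool os_missing_powerpc64;               // models OS_MISSING_POWERPC64
};

// Decide whether the powerpc64 flag ends up enabled.
inline bool resolve_powerpc64(const Powerpc64Inputs& in) {
  if (in.explicit_powerpc64)
    return *in.explicit_powerpc64;  // user choice is honoured
  if (in.target_64bit)
    return true;                    // 64-bit always enables it
  if (in.default_powerpc64 && in.os_missing_powerpc64)
    return false;                   // 32-bit on an OS without support
  return in.default_powerpc64;      // otherwise keep the implicit default
}

// Worked examples under the assumptions above.
inline void check_examples() {
  // "-m32 -mpowerpc64" and "-mpowerpc64 -m32" reach the same final state.
  assert(resolve_powerpc64({true, false, false, true}));
  // Plain -m64 with nothing explicit.
  assert(resolve_powerpc64({std::nullopt, true, false, false}));
  // Plain -m32, flag on by default, OS lacks support.
  assert(!resolve_powerpc64({std::nullopt, false, true, true}));
}
```

Because the decision is made once from the final state rather than incrementally in the option handler, there is no place for ordering to leak in.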


RE: [r13-3219 Regression] FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2 on Linux/x86_64

2022-10-17 Thread Jiang, Haochen via Gcc-patches
Yes, the mail service on the script machine went down after an expected
reboot.  It has just recovered, but still ran into some problems when sending
the previously queued email.

That is why this is the only stuck mail that got sent; sorry for the
disturbance.

> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, October 17, 2022 4:53 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org;
> andre.simoesdiasvie...@arm.com
> Subject: Re: [r13-3219 Regression] FAIL: gcc.target/i386/pr92658-sse4.c scan-
> assembler-times pmovzxwq 2 on Linux/x86_64
> 
> This should be already fixed.
> 
> On Mon, Oct 17, 2022 at 4:34 PM haochen.jiang via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > On Linux/x86_64,
> >
> > 25413fdb2ac24933214123e24ba165026452a6f2 is the first bad commit
> > commit 25413fdb2ac24933214123e24ba165026452a6f2
> > Author: Andre Vieira 
> > Date:   Tue Oct 11 10:49:27 2022 +0100
> >
> > vect: Teach vectorizer how to handle bitfield accesses
> >
> > caused
> >
> > FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
> >
> > with GCC configured with
> >
> > ../../gcc/configure
> > --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3219/
> > usr --enable-clocale=gnu --with-system-zlib 

Re: [r13-3219 Regression] FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2 on Linux/x86_64

2022-10-17 Thread Hongtao Liu via Gcc-patches
This should be already fixed.

On Mon, Oct 17, 2022 at 4:34 PM haochen.jiang via Gcc-patches
 wrote:
>
> On Linux/x86_64,
>
> 25413fdb2ac24933214123e24ba165026452a6f2 is the first bad commit
> commit 25413fdb2ac24933214123e24ba165026452a6f2
> Author: Andre Vieira 
> Date:   Tue Oct 11 10:49:27 2022 +0100
>
> vect: Teach vectorizer how to handle bitfield accesses
>
> caused
>
> FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
>
> with GCC configured with
>
> ../../gcc/configure 
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3219/usr 
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101668.c --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101668.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101668.c --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101668.c --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr92658-avx2-2.c --target_board='unix{-m32}'"
> 

Re: [committed] libstdc++: Implement constexpr std::to_chars for C++23 (P2291R3)

2022-10-17 Thread Jonathan Wakely via Gcc-patches
On Sat, 15 Oct 2022 at 21:26, Jonathan Wakely via Libstdc++
 wrote:
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/charconv.h (__to_chars_10_impl): Add constexpr
> for C++23. Remove 'static' from array.
> * include/std/charconv (__cpp_lib_constexpr_charconv): Define.

I managed to define the feature test macro with the wrong value. Fixed
by the attached patch, pushed to trunk.
commit 0f4815502d8dac07579dc7a5a40c597a18291b4c
Author: Jonathan Wakely 
Date:   Mon Oct 17 09:38:02 2022

libstdc++: Fix value of __cpp_lib_constexpr_charconv

libstdc++-v3/ChangeLog:

* include/std/charconv (__cpp_lib_constexpr_charconv): Define to
correct value.
* include/std/version (__cpp_lib_constexpr_charconv): Likewise.
* testsuite/20_util/to_chars/constexpr.cc: Check correct value.
* testsuite/20_util/to_chars/version.cc: Likewise.

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 4b6cc83a567..7aefdd3298c 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -51,7 +51,7 @@
 #endif
 
 #if __cplusplus > 202002L
-# define __cpp_lib_constexpr_charconv 202202L
+# define __cpp_lib_constexpr_charconv 202207L
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index bec9e7aa792..3c7c440bd80 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -302,7 +302,7 @@
 #if __cplusplus > 202002L
 // c++23
 #define __cpp_lib_byteswap 202110L
-#define __cpp_lib_constexpr_charconv 202202L
+#define __cpp_lib_constexpr_charconv 202207L
 #define __cpp_lib_constexpr_typeinfo 202106L
 #if __cpp_concepts >= 202002L
 # define __cpp_lib_expected 202202L
diff --git a/libstdc++-v3/testsuite/20_util/to_chars/constexpr.cc b/libstdc++-v3/testsuite/20_util/to_chars/constexpr.cc
index 30c591659ee..10855b737c7 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/constexpr.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/constexpr.cc
@@ -5,7 +5,7 @@
 
 #ifndef __cpp_lib_constexpr_charconv
 # error "Feature-test macro for constexpr charconv missing in <charconv>"
-#elif __cpp_lib_constexpr_charconv != 202202L
+#elif __cpp_lib_constexpr_charconv != 202207L
 # error "Feature-test macro for constexpr charconv has wrong value in <charconv>"
 #endif
 
diff --git a/libstdc++-v3/testsuite/20_util/to_chars/version.cc b/libstdc++-v3/testsuite/20_util/to_chars/version.cc
index af06e1bf054..25b1e0036e8 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/version.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/version.cc
@@ -11,6 +11,6 @@
 
 #ifndef __cpp_lib_constexpr_charconv
 # error "Feature-test macro for constexpr charconv missing in <version>"
-#elif __cpp_lib_constexpr_charconv != 202202L
+#elif __cpp_lib_constexpr_charconv != 202207L
 # error "Feature-test macro for constexpr charconv has wrong value in <version>"
 #endif
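
For user code, the practical effect of the corrected value is that a check against 202207L now succeeds where the library provides the feature. The following is a minimal sketch of such a check; `SKETCH_CHARCONV_CONSTEXPR` and `decimal_length` are hypothetical names chosen for illustration and are not part of libstdc++.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <system_error>

// Use constexpr only when the library advertises constexpr to_chars.
#if defined(__cpp_lib_constexpr_charconv) && __cpp_lib_constexpr_charconv >= 202207L
# define SKETCH_CHARCONV_CONSTEXPR constexpr
#else
# define SKETCH_CHARCONV_CONSTEXPR inline
#endif

// Number of decimal digits std::to_chars writes for VALUE.
SKETCH_CHARCONV_CONSTEXPR std::size_t decimal_length(unsigned value) {
  char buf[16] = {};  // large enough for any 32-bit unsigned value
  const std::to_chars_result res = std::to_chars(buf, buf + sizeof buf, value);
  assert(res.ec == std::errc{});
  return static_cast<std::size_t>(res.ptr - buf);
}

#if defined(__cpp_lib_constexpr_charconv) && __cpp_lib_constexpr_charconv >= 202207L
// Only evaluated at compile time when the macro says it is possible.
static_assert(decimal_length(12345u) == 5, "constexpr to_chars available");
#endif
```

With the old value of 202202L, a `>= 202207L` test like this would have taken the fallback branch even though the implementation was present.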


Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-17 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 11 Oct 2022, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Mon, 10 Oct 2022, Andrew Stubbs wrote:
>> >> On 10/10/2022 12:03, Richard Biener wrote:
>> >> > The following picks up the prototype by Ju-Zhe Zhong for vectorizing
>> >> > first order recurrences.  That solves two TSVC missed optimization PRs.
>> >> > 
>> >> > There's a new scalar cycle def kind, vect_first_order_recurrence
>> >> > and its handling of the backedge value vectorization is complicated
>> >> > by the fact that the vectorized value isn't the PHI but instead
>> >> > a (series of) permute(s) shifting in the recurring value from the
>> >> > previous iteration.  I've implemented this by creating both the
>> >> > single vectorized PHI and the series of permutes when vectorizing
>> >> > the scalar PHI but leave the backedge values in both unassigned.
>> >> > The backedge values are (for the testcases) computed by a load
>> >> > which is also the place after which the permutes are inserted.
>> >> > That placement also restricts the cases we can handle (without
>> >> > resorting to code motion).
>> >> > 
>> >> > I added both costing and SLP handling though SLP handling is
>> >> > restricted to the case where a single vectorized PHI is enough.
>> >> > 
>> >> > Missing is epilogue handling - while prologue peeling would
>> >> > be handled transparently by adjusting iv_phi_p the epilogue
>> >> > case doesn't work with just inserting a scalar LC PHI since
>> >> > that a) keeps the scalar load live and b) that load is the
>> >> > wrong one, it has to be the last, much like when we'd vectorize
>> >> > the LC PHI as live operation.  Unfortunately LIVE
>> >> > compute/analysis happens too early before we decide on
>> >> > peeling.  When using fully masked loop vectorization the
>> >> > vect-recurr-6.c works as expected though.
>> >> > 
>> >> > I have tested this on x86_64 for now, but since epilogue
>> >> > handling is missing there are probably no practical cases.
>> >> > My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c
>> >> > just fine but I didn't feel like running SPEC within SDE nor
>> >> > is the WHILE_ULT patch complete enough.  Builds of SPEC 2k7
>> >> > with fully masked loops succeed (minus three cases of
>> >> > PR107096, caused by my WHILE_ULT prototype).
>> >> > 
>> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >> > 
>> >> > Testing with SVE, GCN or RVV appreciated, ideas how to cleanly
>> >> > handle epilogues welcome.
>> >> 
>> >> The testcases all produce correct code on GCN and pass the execution tests.
>> >> 
>> >> The code isn't terribly optimal because we don't have a two-input permutation
>> >> instruction, so we permute each half separately and vec_merge the results. In
>> >> this case the first vector is always a no-op permutation so that's wasted
>> >> cycles. We'd really want a vector rotate and write-lane (or the other way
>> >> around). I think the special-case permutations can be recognised and coded
>> >> into the backend, but I don't know if we can easily tell that the first vector
>> >> is just a bunch of duplicates, when it's not constant.
>> >
>> > It's not actually a bunch of duplicates in all but the first iteration.
>> > But what you can recognize is that we're only using lane N - 1 of the
>> > first vector, so you could model the permute as extract last
>> > + shift in scalar (the extracted lane).  IIRC VLA vector targets usually
>> > have something like shift the vector and set the low lane from a
>> > scalar?
>> 
>> Yeah.
>> 
>> > The extract lane N - 1 might be more difficult but then
>> > a rotate plus extracting lane 0 might work as well.
>> 
>> I guess for SVE we should probably use SPLICE, which joins two vectors
>> and uses a predicate to select the first element that should be extracted.
>> 
>> Unfortunately we don't have a way of representing "last bit set, all other
>> bits clear" as a constant though, so I guess it'll have to be hidden
>> behind unspecs.
>> 
>> I meant to start SVE tests running once I'd finished for the day yesterday,
>> but forgot, sorry.  Will try to test today.
>> 
>> On the patch:
>> 
>> +  /* This is the second phase of vectorizing first-order recurrences. An
>> + overview of the transformation is described below. Suppose we have the
>> + following loop.
>> +
>> + int32_t t = 0;
>> + for (int i = 0; i < n; ++i)
>> +   {
>> +b[i] = a[i] - t;
>> +t = a[i];
>> +  }
>> +
>> +There is a first-order recurrence on "a". For this loop, the shorthand
>> +scalar IR looks like:
>> +
>> +scalar.preheader:
>> +  init = a[-1]
>> +  br loop.body
>> +
>> +scalar.body:
>> +  i = PHI <0(scalar.preheader), i+1(scalar.body)>
>> +  _2 = PHI <init(scalar.preheader), _1(scalar.body)>
>> +  _1 = a[i]
>> +  b[i] = _1 - _2
>> +  br cond, scalar.body, ...
>> +
>> +In this example, _2 is a recurrence because it's value 
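
The loop from the quoted comment and the "shift in the value from the previous iteration" permute discussed earlier in the thread can be modelled in portable C++. This is only a sketch of the technique, not the vectorizer's output: `VF`, `Vec`, `shift_in` and both function names are hypothetical, and a `std::array` stands in for a hardware vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t VF = 4;  // assumed vectorization factor
using Vec = std::array<std::int32_t, VF>;

// Reference scalar loop: b[i] = a[i] - t; t = a[i].
inline std::vector<std::int32_t>
recurrence_scalar(const std::vector<std::int32_t>& a) {
  std::vector<std::int32_t> b(a.size());
  std::int32_t t = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    b[i] = a[i] - t;
    t = a[i];
  }
  return b;
}

// The permute: last lane of PREV followed by the first VF-1 lanes of CUR.
inline Vec shift_in(const Vec& prev, const Vec& cur) {
  Vec r{};
  r[0] = prev[VF - 1];
  for (std::size_t l = 1; l < VF; ++l)
    r[l] = cur[l - 1];
  return r;
}

// Blocked version that processes VF elements per iteration.
inline std::vector<std::int32_t>
recurrence_blocked(const std::vector<std::int32_t>& a) {
  std::vector<std::int32_t> b(a.size());
  Vec prev{};  // only lane VF-1 is ever read; it holds the initial t = 0
  std::size_t i = 0;
  for (; i + VF <= a.size(); i += VF) {
    Vec cur{};
    for (std::size_t l = 0; l < VF; ++l)
      cur[l] = a[i + l];
    const Vec shifted = shift_in(prev, cur);
    for (std::size_t l = 0; l < VF; ++l)
      b[i + l] = cur[l] - shifted[l];
    prev = cur;  // the backedge value is the vector just loaded
  }
  // Scalar epilogue: it must start from the last lane of the last vector.
  std::int32_t t = prev[VF - 1];
  for (; i < a.size(); ++i) {
    b[i] = a[i] - t;
    t = a[i];
  }
  assert(i == a.size());
  return b;
}
```

Only lane `VF - 1` of `prev` is consumed, which matches the observation in the thread that the permute could be modelled as "extract last" plus "shift in scalar". The epilogue seeds `t` from that same lane, which is the live value the message describes as having to be the last one.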

[PATCH] RISC-V: Add RVV vsetvl/vsetvlmax intrinsics and tests.

2022-10-17 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config.gcc: Add riscv-vector-builtins-bases.o and 
riscv-vector-builtins-shapes.o
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_I_OPS): New macro.
(DEF_RVV_FUNCTION): Ditto.
(handle_pragma_vector): Add intrinsic framework.
* config/riscv/riscv.cc (riscv_print_operand): Add operand print for 
vsetvl/vsetvlmax.
* config/riscv/riscv.md: include vector.md.
* config/riscv/t-riscv: Add riscv-vector-builtins-bases.o and 
riscv-vector-builtins-shapes.o
* config/riscv/riscv-vector-builtins-bases.cc: New file.
* config/riscv/riscv-vector-builtins-bases.h: New file.
* config/riscv/riscv-vector-builtins-functions.def: New file.
* config/riscv/riscv-vector-builtins-shapes.cc: New file.
* config/riscv/riscv-vector-builtins-shapes.h: New file.
* config/riscv/riscv-vector-builtins-types.def: New file.
* config/riscv/vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vsetvl-1.c: New test.
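
For readers unfamiliar with what these intrinsics return, here is a portable model of the vector-length computation. It is an illustration under stated assumptions rather than the intrinsic API: the function names are hypothetical, fractional LMUL is ignored, and `min(AVL, VLMAX)` is just one result the RISC-V V specification permits (implementations may return a smaller value when AVL lies between VLMAX and 2*VLMAX).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// VLMAX = VLEN / SEW * LMUL, for an integer LMUL.
inline std::size_t model_vlmax(std::size_t vlen_bits, std::size_t sew_bits,
                               std::size_t lmul) {
  assert(sew_bits != 0 && vlen_bits % sew_bits == 0);
  return vlen_bits / sew_bits * lmul;
}

// Model of vsetvl: elements processed this iteration for a requested AVL.
inline std::size_t model_vsetvl(std::size_t avl, std::size_t vlen_bits,
                                std::size_t sew_bits, std::size_t lmul) {
  return std::min(avl, model_vlmax(vlen_bits, sew_bits, lmul));
}

// Model of vsetvlmax: the largest vl for the given SEW and LMUL.
inline std::size_t model_vsetvlmax(std::size_t vlen_bits, std::size_t sew_bits,
                                   std::size_t lmul) {
  return model_vlmax(vlen_bits, sew_bits, lmul);
}

// Strip-mining: count the loop iterations needed to cover N elements.
inline std::size_t model_strip_count(std::size_t n, std::size_t vlen_bits,
                                     std::size_t sew_bits, std::size_t lmul) {
  std::size_t strips = 0;
  while (n > 0) {
    n -= model_vsetvl(n, vlen_bits, sew_bits, lmul);
    ++strips;
  }
  return strips;
}
```

For example, with VLEN = 128, SEW = 32 and LMUL = 1 the model gives VLMAX = 4, so ten elements are covered in three strips of 4, 4 and 2.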

---
 gcc/config.gcc|   2 +-
 .../riscv/riscv-vector-builtins-bases.cc  | 104 +++
 .../riscv/riscv-vector-builtins-bases.h   |  33 +
 .../riscv/riscv-vector-builtins-functions.def |  43 +
 .../riscv/riscv-vector-builtins-shapes.cc | 104 +++
 .../riscv/riscv-vector-builtins-shapes.h  |  33 +
 .../riscv/riscv-vector-builtins-types.def |  50 ++
 gcc/config/riscv/riscv-vector-builtins.cc |  56 ++
 gcc/config/riscv/riscv.cc |  26 +
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/t-riscv  |  28 +-
 gcc/config/riscv/vector.md|  72 ++
 .../gcc.target/riscv/rvv/base/vsetvl-1.c  | 750 ++
 13 files changed, 1300 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-vector-builtins-bases.cc
 create mode 100644 gcc/config/riscv/riscv-vector-builtins-bases.h
 create mode 100644 gcc/config/riscv/riscv-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/riscv-vector-builtins-shapes.cc
 create mode 100644 gcc/config/riscv/riscv-vector-builtins-shapes.h
 create mode 100644 gcc/config/riscv/riscv-vector-builtins-types.def
 create mode 100644 gcc/config/riscv/vector.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vsetvl-1.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 486e8790544..3826ae42803 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -516,7 +516,7 @@ pru-*-*)
 riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
riscv-shorten-memrefs.o riscv-selftests.o"
-   extra_objs="${extra_objs} riscv-vector-builtins.o"
+   extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
d_target_objs="riscv-d.o"
extra_headers="riscv_vector.h"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.cc"
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
new file mode 100644
index 000..8582c0cae4c
--- /dev/null
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -0,0 +1,104 @@
+/* function_base implementation for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2022-2022 Free Software Foundation, Inc.
+   Contributed by Ju-Zhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "memmodel.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "recog.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "function.h"
+#include "fold-const.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "explow.h"
+#include "emit-rtl.h"
+#include "tree-vector-builder.h"
+#include "rtx-vector-builder.h"
+#include "riscv-vector-builtins.h"
+#include "riscv-vector-builtins-shapes.h"
+#include "riscv-vector-builtins-bases.h"
+
+using namespace riscv_vector;
+
+namespace riscv_vector {
+
+/* Implements vsetvl && vsetvlmax.  */
+template
+class vsetvl : public function_base
+{
+public:
+  unsigned int 

[r13-3219 Regression] FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2 on Linux/x86_64

2022-10-17 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

25413fdb2ac24933214123e24ba165026452a6f2 is the first bad commit
commit 25413fdb2ac24933214123e24ba165026452a6f2
Author: Andre Vieira 
Date:   Tue Oct 11 10:49:27 2022 +0100

vect: Teach vectorizer how to handle bitfield accesses

caused

FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3219/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101668.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101668.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101668.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr101668.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr92658-avx2-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr92658-avx2-2.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr92658-avx2-2.c --target_board='unix{-m64}'"
$ cd 

[PATCH] RISC-V: Add RVV intrinsic basic framework.

2022-10-17 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config.gcc: Add gt files since function_instance is GTY ((user)).
* config/riscv/riscv-builtins.cc (riscv_init_builtins): Add RVV 
intrinsic framework.
(riscv_builtin_decl): Ditto.
(riscv_expand_builtin): Ditto.
* config/riscv/riscv-protos.h (builtin_decl): New function.
(expand_builtin): Ditto.
(enum riscv_builtin_class): New enum to classify RVV intrinsic and 
RISC-V general built-in.
* config/riscv/riscv-vector-builtins.cc (class GTY): New declaration.
(struct registered_function_hasher): New struct.
(DEF_RVV_OP_TYPE): New macro.
(DEF_RVV_TYPE): Ditto.
(DEF_RVV_PRED_TYPE): Ditto.
(GTY): New declaration.
(add_attribute): New function.
(check_required_extensions): Ditto.
(rvv_arg_type_info::get_tree_type): Ditto.
(function_instance::function_instance): Ditto.
(function_instance::operator==): Ditto.
(function_instance::any_type_float_p): Ditto.
(function_instance::get_return_type): Ditto.
(function_instance::get_arg_type): Ditto.
(function_instance::hash): Ditto.
(function_instance::call_properties): Ditto.
(function_instance::reads_global_state_p): Ditto.
(function_instance::modifies_global_state_p): Ditto.
(function_instance::could_trap_p): Ditto.
(function_builder::function_builder): Ditto.
(function_builder::~function_builder): Ditto.
(function_builder::allocate_argument_types): Ditto.
(function_builder::register_function_group): Ditto.
(function_builder::append_name): Ditto.
(function_builder::finish_name): Ditto.
(function_builder::get_attributes): Ditto.
(function_builder::add_function): Ditto.
(function_builder::add_unique_function): Ditto.
(function_call_info::function_call_info): Ditto.
(function_expander::function_expander): Ditto.
(function_expander::add_input_operand): Ditto.
(function_expander::generate_insn): Ditto.
(registered_function_hasher::hash): Ditto.
(registered_function_hasher::equal): Ditto.
(builtin_decl): Ditto.
(expand_builtin): Ditto.
(gt_ggc_mx): Define for using GCC garbage collect.
(gt_pch_nx): Define for using GCC garbage collect.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_OP_TYPE): New macro.
(DEF_RVV_PRED_TYPE): Ditto.
(vbool64_t): Add suffix.
(vbool32_t): Ditto.
(vbool16_t): Ditto.
(vbool8_t): Ditto.
(vbool4_t): Ditto.
(vbool2_t): Ditto.
(vbool1_t): Ditto.
(vint8mf8_t): Ditto.
(vuint8mf8_t): Ditto.
(vint8mf4_t): Ditto.
(vuint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vuint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vuint8m1_t): Ditto.
(vint8m2_t): Ditto.
(vuint8m2_t): Ditto.
(vint8m4_t): Ditto.
(vuint8m4_t): Ditto.
(vint8m8_t): Ditto.
(vuint8m8_t): Ditto.
(vint16mf4_t): Ditto.
(vuint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vuint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vuint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vuint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vuint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vuint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vuint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vuint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vuint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32m8_t): Ditto.
(vint64m1_t): Ditto.
(vuint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vuint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vuint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vfloat64m1_t): Ditto.
(vfloat64m2_t): Ditto.
(vfloat64m4_t): Ditto.
(vfloat64m8_t): Ditto.
(vv): Ditto.
(vx): Ditto.
(v): Ditto.
(wv): Ditto.
(wx): Ditto.
(x_x_v): Ditto.
(vf2): Ditto.
(vf4): Ditto.
(vf8): Ditto.
(vvm): Ditto.
(vxm): Ditto.
(x_x_w): Ditto.
(v_v): Ditto.
(v_x): Ditto.
(vs): Ditto.
(mm): Ditto.
(m): Ditto.
(vf): Ditto.
(vm): Ditto.
(wf): Ditto.
(vfm): Ditto.
(v_f): Ditto.
(ta): Ditto.
(tu): Ditto.
(ma): Ditto.
(mu): Ditto.
(tama): Ditto.
(tamu): Ditto.
(tuma): Ditto.
(tumu): Ditto.
(tam): Ditto.
(tum): Ditto.
* 

Re: [PATCH] RISC-V: Reorganize mangle_builtin_type.[NFC]

2022-10-17 Thread Kito Cheng via Gcc-patches
Committed :)

On Sat, Oct 15, 2022 at 7:03 AM  wrote:
>
> From: Ju-Zhe Zhong 
>
> Hi, this patch fixes my mistake in the previously committed patch.
> Since "mangle_builtin_type" is a global function that will be called from
> riscv.cc, it is reasonable to move it down and keep it together with the
> other global functions.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins.cc (mangle_builtin_type): Move 
> down the function.
> ---
>  gcc/config/riscv/riscv-vector-builtins.cc | 26 +++
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
> b/gcc/config/riscv/riscv-vector-builtins.cc
> index 99c482582d3..55d45651618 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -155,19 +155,6 @@ lookup_vector_type_attribute (const_tree type)
>return lookup_attribute ("RVV type", TYPE_ATTRIBUTES (type));
>  }
>
> -/* If TYPE is a built-in type defined by the RVV ABI, return the mangled 
> name,
> -   otherwise return NULL.  */
> -const char *
> -mangle_builtin_type (const_tree type)
> -{
> -  if (TYPE_NAME (type) && TREE_CODE (TYPE_NAME (type)) == TYPE_DECL)
> -type = TREE_TYPE (TYPE_NAME (type));
> -  if (tree attr = lookup_vector_type_attribute (type))
> -if (tree id = TREE_VALUE (chain_index (0, TREE_VALUE (attr))))
> -  return IDENTIFIER_POINTER (id);
> -  return NULL;
> -}
> -
>  /* Return a representation of "const T *".  */
>  static tree
>  build_const_pointer (tree t)
> @@ -250,6 +237,19 @@ register_vector_type (vector_type_index type)
>builtin_types[type].vector_ptr = build_pointer_type (vectype);
>  }
>
> +/* If TYPE is a built-in type defined by the RVV ABI, return the mangled 
> name,
> +   otherwise return NULL.  */
> +const char *
> +mangle_builtin_type (const_tree type)
> +{
> +  if (TYPE_NAME (type) && TREE_CODE (TYPE_NAME (type)) == TYPE_DECL)
> +type = TREE_TYPE (TYPE_NAME (type));
> +  if (tree attr = lookup_vector_type_attribute (type))
> +if (tree id = TREE_VALUE (chain_index (0, TREE_VALUE (attr))))
> +  return IDENTIFIER_POINTER (id);
> +  return NULL;
> +}
> +
>  /* Initialize all compiler built-ins related to RVV that should be
> defined at start-up.  */
>  void
> --
> 2.36.1
>


Re: [PATCH] RISC-V: Fix format[NFC]

2022-10-17 Thread Kito Cheng via Gcc-patches
Committed :)

On Mon, Oct 17, 2022 at 3:31 PM  wrote:
>
> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/t-riscv: Change Tab into 2 space.
>
> ---
>  gcc/config/riscv/t-riscv | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index 2f060437c23..15b9e7c01b1 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -11,7 +11,7 @@ riscv-vector-builtins.o: 
> $(srcdir)/config/riscv/riscv-vector-builtins.cc \
>$(FUNCTION_H) fold-const.h gimplify.h explow.h stor-layout.h $(REGS_H) \
>alias.h langhooks.h attribs.h stringpool.h \
>$(srcdir)/config/riscv/riscv-vector-builtins.h \
> -   $(srcdir)/config/riscv/riscv-vector-builtins.def
> +  $(srcdir)/config/riscv/riscv-vector-builtins.def
> $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> $(srcdir)/config/riscv/riscv-vector-builtins.cc
>
> --
> 2.36.1
>


Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.)

2022-10-17 Thread Thomas Schwinge
Hi!

On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches 
 wrote:
> When solving 0 = _15 & 1, we calculate _15 as:
>
>   [irange] int [-INF, -2][0, +INF] NONZERO 0xfffe
>
> The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
> the above, yielding:
>
>   [0, 1] NONZERO 0x0
>
> This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
>
> This is problematic because here we have a bool which is zero, but
> returns false for irange::zero_p, since the latter does not look at
> nonzero bits.  This causes logical_combine to assume the range is
> not-zero, and all hell breaks loose.
>
> I think we should just normalize a nonzero mask of 0 to [0, 0] at
> creation, thus avoiding all this.
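
To make the quoted arithmetic concrete, here is a minimal sketch in C. It is a toy model, not GCC's irange: the names toy_range, toy_zero_p and toy_intersect are made up for illustration, and only the [0, +INF] sub-range of the first operand is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a range with a nonzero-bits mask; NOT GCC's irange.  */
struct toy_range
{
  int64_t lo, hi;     /* inclusive bounds */
  uint32_t nonzero;   /* bits that may be set */
};

/* Like the behaviour described above: only the bounds are inspected,
   the nonzero mask is ignored.  */
static bool
toy_zero_p (struct toy_range r)
{
  return r.lo == 0 && r.hi == 0;
}

/* Intersect bounds and masks.  If NORMALIZE is set, a mask of 0 is
   turned into the range [0, 0], as the patch proposes.  */
static struct toy_range
toy_intersect (struct toy_range a, struct toy_range b, bool normalize)
{
  struct toy_range r;
  r.lo = a.lo > b.lo ? a.lo : b.lo;
  r.hi = a.hi < b.hi ? a.hi : b.hi;
  r.nonzero = a.nonzero & b.nonzero;
  if (normalize && r.nonzero == 0)
    r.lo = r.hi = 0;
  return r;
}
```

Intersecting { 0, INT32_MAX, 0xfffe } with { 0, 1, 0x1 } gives bounds [0, 1] with mask 0x0; without normalization toy_zero_p is false for it, with normalization it is true.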

1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
offloading test case:

UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test 
for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  
execution test
[-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   
scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* 
[0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"

Same for C++.

I'll later send a patch (for the test case!) to fix that up.

2. Looking into this, I found that this
commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0" actually enables a
code transformation/optimization that GCC apparently has not been doing
before!  I've tried to capture that in the attached
"Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".

Will you please verify that one?  In its current '#if 1' configuration,
it's all-PASS after commit
r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0", whereas before, we
get two calls to 'foo', because GCC apparently didn't understand the
relation (optimization opportunity) between 'r *= 2;' and the subsequent
'if (r & 1)'.
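
For readers without the attachment, the relation can be sketched as follows. This is a hypothetical reduction, not the attached test case; 'f' and the call counter are made up for illustration, and 'unsigned' is used so that the doubling cannot overflow in an undefined way.

```c
static int foo_calls;   /* counts calls, for illustration only */

static void
foo (void)
{
  ++foo_calls;
}

/* Hypothetical reduction; the attached test case may look different.  */
static unsigned
f (unsigned r)
{
  r *= 2;        /* the low bit of r is now known to be zero */
  if (r & 1)     /* so this condition can never hold */
    foo ();      /* and this call is dead code */
  return r;
}
```

Once the nonzero mask of 'r & 1' is known to be 0 and that is treated as the range [0, 0], the branch can be folded away.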

I've left in the other '#if' variants in case you'd like to experiment
with these, but would otherwise clean that up before pushing.

Where does one put such a test case?

Should the file be named 'pr107195' or something else?

Do we scan 'optimized', or an earlier dump?

At '-O1', the actual code transformation is visible already in the 'dom2'
dump:

[local count: 536870913]:
   gimple_assign 
+  gimple_assign 
+  goto ; [100.00%]

-   [local count: 1073741824]:
-  # gimple_phi 
+   [local count: 536870912]:
+  # gimple_phi 
   gimple_assign 
   gimple_cond 
-goto ; [50.00%]
+goto ; [100.00%]
   else
-goto ; [50.00%]
+goto ; [0.00%]

[local count: 536870913]:
   gimple_call 
   gimple_assign 

[local count: 1073741824]:
-  # gimple_phi 
+  # gimple_phi 
   gimple_return 

And, the actual "avoid second call 'foo'" optimization is visible
starting 'dom3':

[local count: 536870913]:
   gimple_assign 
+  goto ; [100.00%]

-   [local count: 1073741824]:
-  # gimple_phi 
-  gimple_assign 
+   [local count: 536870912]:
+  gimple_assign 
   gimple_cond 
-goto ; [50.00%]
+goto ; [100.00%]
   else
-goto ; [50.00%]
+goto ; [0.00%]

[local count: 536870913]:
-  gimple_call 
-  gimple_assign 
+  gimple_assign 
+  gimple_assign 

[local count: 1073741824]:
-  # gimple_phi 
+  # gimple_phi 
   gimple_return 

..., but I don't know if either of those would be stable/appropriate to
scan instead of 'optimized'?


Grüße
 Thomas

>   PR tree-optimization/107195
>
> gcc/ChangeLog:
>
>   * value-range.cc (irange::set_range_from_nonzero_bits): Set range
>   to [0,0] when nonzero mask is 0.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/tree-ssa/pr107195-1.c: New test.
>   * gcc.dg/tree-ssa/pr107195-2.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c | 15 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c | 16 
>  gcc/value-range.cc |  5 +
>  3 files changed, 36 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
> new file mode 100644
> index 000..a0c20dbd4b1
> --- /dev/null
> +++ 

*ping* / Re: [Patch] libgomp: Add offload_device_gcn check, add requires-4a.c test

2022-10-17 Thread Tobias Burnus



On 12.10.22 16:05, Tobias Burnus wrote:

This came up because the USM implementation with
-foffload-memory={unified,pinned}
as posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
does not handle USM with static variables.

This shows up for the OG12 alias devel/omp/gcc-12 branch as FAIL for
requires-4.c.

The attached patch prepares for skipping requires-4.c for the
gcn/nvptx device
and adds an adjacent requires-4a.c testcase, using heap memory, that
can still
run on gcn/nvptx.

Additionally, I commented out a no-longer-used #define, following the
precedent of GOMP_DEVICE_HOST_NONSHM.

Thus, this patch adds another testcase and one effective-target check,
and comments out an unused #define, and that's it.
(Otherwise, it is just a prep patch.)

OK for mainline?

Tobias

PS: Currently, neither the preexisting offload_device_nvptx nor the new
offload_device_gcn target selector is used, neither in old code nor by
this patch.

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Registration court: 
München, HRB 106955


*ping* / Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-10-17 Thread Tobias Burnus



On 12.10.22 10:55, Tobias Burnus wrote:

On 11.10.22 13:12, Alexander Monakov wrote:

My understanding is such trickery should not be necessary with
the barrier-based approach, i.e. the sequence of PTX instructions

   st   % plain store
   membar.sys
   st.volatile

should be enough to guarantee that the former store is visible on the
host
before the latter, and work all the way back to sm_20.


If I understand it correctly, you mean:

  GOMP_REV_OFFLOAD_VAR->dev_num = GOMP_ADDITIONAL_ICVS.device_num;

  __sync_synchronize ();  /* membar.sys */
  asm volatile ("st.volatile.global.u64 [%0], %1;"
: : "r"(addr_struct_fn), "r" (fn) : "memory");


And then directly followed by the busy wait:

  while (__atomic_load_n (&GOMP_REV_OFFLOAD_VAR->fn, __ATOMIC_ACQUIRE)
         != 0)
    ;  /* spin  */

which GCC expands to:

  /* ld.global.u64 %r64,[__gomp_rev_offload_var];
 ld.u64 %r36,[%r64];
 membar.sys;  */

The accordingly updated patch is attached.

(This is the only change + removing the mkoffload.cc part is the only
larger change. Otherwise, it only handles the minor comments by Jakub.
The now removed CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT was used
until commit r10-304-g1f4c5b9bb2eb81880e2bc725435d596fcd2bdfef i.e.
it is a really old left over!)

Otherwise, tested* to work with sm_30 (error by mkoffload, unchanged),
sm_35 and sm_70.

Tobias

*With some added code; until GOMP_OFFLOAD_get_num_devices accepts
GOMP_REQUIRES_UNIFIED_SHARED_MEMORY and GOMP_OFFLOAD_load_image
gets passed a non-NULL for rev_fn_table, the current patch is a no op.

Planned next is the related GCN patch – and the actual change
in libgomp/target.c (+ accepting USM in GOMP_OFFLOAD_get_num_devices)



[PATCH] RISC-V: Fix format[NFC]

2022-10-17 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/t-riscv: Change Tab into 2 space.

---
 gcc/config/riscv/t-riscv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 2f060437c23..15b9e7c01b1 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -11,7 +11,7 @@ riscv-vector-builtins.o: 
$(srcdir)/config/riscv/riscv-vector-builtins.cc \
   $(FUNCTION_H) fold-const.h gimplify.h explow.h stor-layout.h $(REGS_H) \
   alias.h langhooks.h attribs.h stringpool.h \
   $(srcdir)/config/riscv/riscv-vector-builtins.h \
-   $(srcdir)/config/riscv/riscv-vector-builtins.def
+  $(srcdir)/config/riscv/riscv-vector-builtins.def
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vector-builtins.cc
 
-- 
2.36.1



Re: PING^1: [PATCH] x86: Check corrupted return address when unwinding stack

2022-10-17 Thread Hongtao Liu via Gcc-patches
On Wed, Oct 5, 2022 at 5:33 AM H.J. Lu  wrote:
>
> On Wed, Sep 21, 2022 at 1:42 PM H.J. Lu  wrote:
> >
> > If shadow stack is enabled, when unwinding stack, we count how many stack
> > frames we pop to reach the landing pad and adjust shadow stack by the same
> > amount.  When counting the stack frame, we compare the return address on
> > normal stack against the return address on shadow stack.  If they don't
> > match, return _URC_FATAL_PHASE2_ERROR for the corrupted return address on
> > normal stack.  Don't check the return address for
> >
> > 1. Non-catchable exception where exception_class == 0.  Process will be
> > terminated.
> > 2. Zero return address which marks the outermost stack frame.
> > 3. Signal stack frame since kernel puts a restore token on shadow stack.
Ok.
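
For reference, the check described above can be modelled in plain C roughly as follows. This is only a sketch under assumptions: the shadow stack is modelled as an array of return addresses, a NULL array stands for "shadow stack disabled" (ssp == 0), and the function and enum names are hypothetical rather than taken from libgcc.

```c
#include <stddef.h>
#include <stdint.h>

enum frame_check { FRAME_OK, FRAME_CORRUPTED };

/* Count one popped frame and, unless one of the three exceptions
   applies, compare the return address IP found on the normal stack
   with the matching shadow-stack slot.  SHADOW[n] models the slot
   that is n entries above the current shadow stack pointer.  */
static enum frame_check
check_return_address (uint64_t exception_class, uintptr_t ip,
                      int is_signal_frame, const uintptr_t *shadow,
                      size_t *frames)
{
  ++*frames;
  if (exception_class == 0   /* 1. non-catchable exception */
      || ip == 0             /* 2. outermost stack frame */
      || is_signal_frame)    /* 3. signal frame (restore token) */
    return FRAME_OK;
  if (shadow == NULL)        /* shadow stack not enabled */
    return FRAME_OK;
  return shadow[*frames] == ip ? FRAME_OK : FRAME_CORRUPTED;
}
```

In the patch itself a mismatch makes the macro return _URC_FATAL_PHASE2_ERROR; the sketch only reports it to the caller.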
> >
> > * unwind-generic.h (_Unwind_Frames_Increment): Add the EXC
> > argument.
> > * unwind.inc (_Unwind_RaiseException_Phase2): Pass EXC to
> > _Unwind_Frames_Increment.
> > (_Unwind_ForcedUnwind_Phase2): Likewise.
> > * config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
> > Take the EXC argument.  Return _URC_FATAL_PHASE2_ERROR if the
> > return address on normal stack doesn't match the return address
> > on shadow stack.
> > ---
> >  libgcc/config/i386/shadow-stack-unwind.h | 51 ++--
> >  libgcc/unwind-generic.h  |  2 +-
> >  libgcc/unwind.inc|  4 +-
> >  3 files changed, 50 insertions(+), 7 deletions(-)
> >
> > diff --git a/libgcc/config/i386/shadow-stack-unwind.h 
> > b/libgcc/config/i386/shadow-stack-unwind.h
> > index 2b02682bdae..89d44165000 100644
> > --- a/libgcc/config/i386/shadow-stack-unwind.h
> > +++ b/libgcc/config/i386/shadow-stack-unwind.h
> > @@ -54,10 +54,39 @@ see the files COPYING3 and COPYING.RUNTIME 
> > respectively.  If not, see
> > aligned.  If the original shadow stack is 8 byte aligned, we just
> > need to pop 2 slots, one restore token, from shadow stack.  Otherwise,
> > we need to pop 3 slots, one restore token + 4 byte padding, from
> > -   shadow stack.  */
> > -#ifndef __x86_64__
> > +   shadow stack.
> > +
> > +   When popping a stack frame, we compare the return address on normal
> > +   stack against the return address on shadow stack.  If they don't match,
> > +   return _URC_FATAL_PHASE2_ERROR for the corrupted return address on
> > +   normal stack.  Don't check the return address for
> > +   1. Non-catchable exception where exception_class == 0.  Process will
> > +  be terminated.
> > +   2. Zero return address which marks the outermost stack frame.
> > +   3. Signal stack frame since kernel puts a restore token on shadow
> > +  stack.
> > + */
> >  #undef _Unwind_Frames_Increment
> > -#define _Unwind_Frames_Increment(context, frames)  \
> > +#ifdef __x86_64__
> > +#define _Unwind_Frames_Increment(exc, context, frames) \
> > +{  \
> > +  frames++;\
> > +  if (exc->exception_class != 0\
> > + && _Unwind_GetIP (context) != 0   \
> > + && !_Unwind_IsSignalFrame (context))  \
> > +   {   \
> > + _Unwind_Word ssp = _get_ssp ();   \
> > + if (ssp != 0) \
> > +   {   \
> > + ssp += 8 * frames;\
> > + _Unwind_Word ra = *(_Unwind_Word *) ssp;  \
> > + if (ra != _Unwind_GetIP (context))\
> > +   return _URC_FATAL_PHASE2_ERROR; \
> > +   }   \
> > +   }   \
> > +}
> > +#else
> > +#define _Unwind_Frames_Increment(exc, context, frames) \
> >if (_Unwind_IsSignalFrame (context)) \
> >  do \
> >{\
> > @@ -83,5 +112,19 @@ see the files COPYING3 and COPYING.RUNTIME 
> > respectively.  If not, see
> >}\
> >  while (0); \
> >else \
> > -frames++;
> > +{  \
> > +  frames++;\
> > +  if (exc->exception_class != 0\
> > + && _Unwind_GetIP (context) != 0)  \
> > +   {   \
> > + _Unwind_Word ssp = _get_ssp ();   \
> > + if (ssp != 0) \
> > +   {   \
> > +  

Re: [RFC] Add support for vectors in comparisons (like the C++ frontend does)

2022-10-17 Thread Richard Biener via Gcc-patches
On Fri, Oct 14, 2022 at 4:18 PM Paul Iannetta via Gcc-patches
 wrote:
>
> On Wed, Oct 12, 2022 at 01:18:19AM +0200, Paul Iannetta wrote:
> > On Mon, Oct 10, 2022 at 11:07:06PM +, Joseph Myers wrote:
> > > On Mon, 10 Oct 2022, Paul Iannetta via Gcc-patches wrote:
> > >
> > > > I have a patch to bring this feature to the C front-end as well, and
> > > > would like to hear your opinion on it, especially since it may affect
> > > > the feature-set of the objc front-end as well.
> > >
> > > > Currently, this is only a tentative patch and I did not add any tests
> > > > to the testsuite.
> > >
> > > I think tests (possibly existing C++ tests moved to c-c++-common?) are
> > > necessary to judge such a feature; it could better be judged based on
> > > tests without implementation than based on implementation without tests.
> >
> > Currently, this feature has the following tests in g++.dg/ext/
> >   - vector9.C
> >   - vector19.C
> >   - vector21.C
> >   - vector22.C
> >   - vector23.C
> >   - vector27.C
> >   - vector28.C
> > provided by Marc Glisse when he implemented the feature for C++.
> >
> > They are all handled by my mirror implementation (after removing
> > C++-only features), save for a case in vector19.C ( v ? '1' : '2',
> > where v is a vector of unsigned char, but '1' and '2' are considered
> > as int, which results in a type mismatch.)
> >
> > I'll move those tests to c-c++-common tomorrow, but will duplicate
> > vector19.C and vector23.C which rely on C++-only features.
> >
> > During my tests, I've been using variations around this:
> >
> > typedef int v2si __attribute__((__vector_size__ (2 * sizeof(int))));
> >
> > v2si f (v2si a, v2si b, v2si c)
> > {
> >   v2si d = a + !b;
> >   v2si e = a || b;
> >   return c ? (a + !b) && (c - e && a) : (!!b ^ c && e);
> > }
> >
> > It is already possible to express much of the same thing without the
> > syntactic sugar but it is barely legible
> >
> > typedef int v2si __attribute__((__vector_size__ (2 * sizeof(int))));
> >
> > v2si f (v2si a, v2si b, v2si c)
> > {
> >   v2si d = a + (b == 0);
> >   v2si e = (a != 0) | (b != 0);
> >   return ((c != 0) & (((a + (b == 0)) != 0) & (((c - e) != 0) & (a != 0))))
> >    | ((c == 0) & (((((b == 0) == 0) ^ c) != 0) & (e != 0)));
> > }
> >
> > Paul
>
> I still need to check what is done by clang on the objc side, but in
> order to not conflict with what was done before, a warning is
> triggered by c_obj_common_truthvalue_conversion and
> build_unary_operator warns if '!' is used with a vector.  Both warnings
> are only triggered in pedantic mode as suggested by Iain Sandoe.
>
> The support of the binary ops and unary ops works as the C++ front-end
> does, there is however the case of the ternary conditional operator,
> where the C standard mandates the promotion of the operands if they
> have rank less than (unsigned) int, whereas C++ does not.
>
> In any case, as per the documentation of VEC_COND_EXPR,
> "vec0 = vector-condition ? vec1 : vec2" is equivalent to
> ``` (from tree.def)
>   for (int i = 0 ; i < n ; ++i)
> vec0[i] = vector-condition[i] ? vec1[i] : vec2[i];
> ```
> But this is currently not the case, even in C++ where
> ``` (Ex1)
> typedef signed char vec2 __attribute__((vector_size(16)));
> typedef float vec2f __attribute__((vector_size (2 * sizeof (float))));
>
> void j (vec2 *x, vec2 *z, vec2f *y, vec2f *t)
> {
>   *x = (*y < *t) ? '1' : '0'; // error: inferred scalar type ‘char’ is
>   // not an integer or floating-point type
>   // of the same size as ‘float’.
>
>   for (int i = 0 ; i < 2 ; ++i)  // fine
> (*x)[i] = (*y)[i] < (*t)[i] ? '1' : '0'; //
>
>   *z = (*x < *z) ? '1' : '0'; // fine
> }
> ```
>
> The documentation explicitly says:
> > the ternary operator ?: is available. a?b:c, where b and c are
> > vectors of the same type and a is an integer vector with the same
> > number of elements of the same size as b and c, computes all three
> > arguments and creates a vector {a[0]?b[0]:c[0], a[1]?b[1]:c[1], …}
> Here, "*y < *t" is a boolean vector (and bool is an integral type
> ([basic.fundamental] 11)), so this should be accepted.
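
The element-wise semantics quoted above can be sketched with the GNU vector extension in C. This is an illustrative sketch only: the type v4si and both function names are made up, and the mask-based variant assumes that vector comparisons yield -1 or 0 per lane, as they do in GCC and Clang.

```c
typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

/* Reference semantics: the per-element loop from tree.def.  */
static v4si
select_loop (v4si cond, v4si a, v4si b)
{
  v4si r = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; ++i)
    r[i] = cond[i] ? a[i] : b[i];
  return r;
}

/* Same result without ?: on vectors, using a comparison mask.  */
static v4si
select_mask (v4si cond, v4si a, v4si b)
{
  v4si zero = { 0, 0, 0, 0 };
  v4si m = cond != zero;       /* -1 where cond is nonzero, else 0 */
  return (m & a) | (~m & b);
}
```

Both functions pick a[i] where cond[i] is nonzero and b[i] elsewhere, which is what "cond ? a : b" on vectors is documented to compute.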
>
> Another point is that if we look at
> ```
>   for (int i = 0 ; i < n ; ++i)
> vec0[i] = vector-condition[i] ? vec1[i] : vec2[i];
> ```
> implicit conversions may happen, which is completely over-looked
> currently.  That is, the type of (1): "v = v0 ? v1 : v2" is the lowest
> common type of v, v1 and v2; and the type of (2): "v0 ? v1 : v2" is the
> lowest common type of v1 and v2.  (2) can appear as a parameter, but
> even in that case, I think that (2) should be constrained by the type
> of the parameter and we are back to case (1).
>
> My points are that:
>   - the current implementation has a bug: " *x = (*y < *t) ? '1' :
> '0';" from (Ex1) should be fine.
>   - the current implementation does not explicitly follow the
> documented behavior of VEC_COND_EXPR.
>
> 

Re: [PATCH] Move scanning pass of forwprop-19.c to dse1 for r13-3212-gb88adba751da63

2022-10-17 Thread Richard Biener via Gcc-patches
On Mon, Oct 17, 2022 at 5:44 AM Hongtao Liu via Gcc-patches
 wrote:
>
> On Mon, Oct 17, 2022 at 11:26 AM Liwei Xu via Gcc-patches
>  wrote:
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/forwprop-19.c: Move scanning pass from forwprop1 
> > to dse1.  This fixes
> > the test case failure.
> Looks like an obvious fix to me.

Yes, OK.

Thanks,
Richard.

> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c
> > index 4d77138b206..6ca81cb6c49 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O -fdump-tree-forwprop1" } */
> > +/* { dg-options "-O -fdump-tree-dse1" } */
> >
> >  typedef int vec __attribute__((vector_size (4 * sizeof (int))));
> >  void f (vec *x1, vec *x2)
> > @@ -11,4 +11,4 @@ void f (vec *x1, vec *x2)
> >*x1 = z;
> >  }
> >
> > -/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "forwprop1" } } */
> > +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "dse1" } } */
> > --
> > 2.18.2
> >
>
>
> --
> BR,
> Hongtao


Re: [PATCH] Don't print discriminators for -fcompare-debug.

2022-10-17 Thread Richard Biener via Gcc-patches
On Sun, Oct 16, 2022 at 10:25 PM Eugene Rozenfeld via Gcc-patches
 wrote:
>
> With -gstatement-frontiers we may end up with different IR
> coming from the front end with and without debug information turned on.
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 for details.
> That may result in differences in discriminator values and -fcompare-debug
> failures.
>
> This patch disables printing of discriminators when the dump is intended
> for -fcompare-debug comparison and reverses the workaround in a test.

I don't think this is the correct approach.  -gstatement-frontiers is
known to be
prone to these issues and is the one to blame here.  I think the bugs should be
SUSPENDED until -gstatement-frontiers is fixed or at least disabled by default
(IIRC Jakub tried that but failed last time)

> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
> PR debug/107231
> PR debug/107169
> * print-rtl.cc (print_rtx_operand_code_i): Don't print discriminators
> for -fcompare-debug.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/ubsan/pr85213.c: Reverse the workaround for 
> discriminators.
> ---
>  gcc/print-rtl.cc   | 13 ++---
>  gcc/testsuite/c-c++-common/ubsan/pr85213.c |  7 +--
>  2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc
> index e115f987173..0476f3d7e79 100644
> --- a/gcc/print-rtl.cc
> +++ b/gcc/print-rtl.cc
> @@ -453,10 +453,17 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, 
> int idx)
>   expanded_location xloc = insn_location (in_insn);
>   fprintf (m_outfile, " \"%s\":%i:%i", xloc.file, xloc.line,
>xloc.column);
> - int discriminator = insn_discriminator (in_insn);
> -   if (discriminator)
> - fprintf (m_outfile, " discrim %d", discriminator);
>
> + /* Don't print discriminators for -fcompare-debug since the IR
> +coming from the front end may be different with and without
> +debug information turned on. That may result in different
> +discriminator values. */
> + if (!(dump_flags & TDF_COMPARE_DEBUG))
> +   {
> + int discriminator = insn_discriminator (in_insn);
> + if (discriminator)
> +   fprintf (m_outfile, " discrim %d", discriminator);
> +   }
> }
>  #endif
>  }
> diff --git a/gcc/testsuite/c-c++-common/ubsan/pr85213.c 
> b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
> index e903e976f2c..8a6be81d20f 100644
> --- a/gcc/testsuite/c-c++-common/ubsan/pr85213.c
> +++ b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
> @@ -1,11 +1,6 @@
>  /* PR sanitizer/85213 */
>  /* { dg-do compile } */
> -/* Pass -gno-statement-frontiers to work around
> -   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 :
> -   without it the IR coming from the front end may be different with and 
> without
> -   debug information turned on. That may cause e.g., different discriminator 
> values
> -   and -fcompare-debug failures. */
> -/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug 
> -gno-statement-frontiers" } */
> +/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug" } */
>
>  int
>  foo (int x)
> --
> 2.25.1


Re: [PATCH] middle-end, v3: IFN_ASSUME support [PR106654]

2022-10-17 Thread Richard Biener via Gcc-patches
On Fri, 14 Oct 2022, Jakub Jelinek wrote:

> On Fri, Oct 14, 2022 at 11:27:07AM +, Richard Biener wrote:
> > > --- gcc/function.h.jj 2022-10-10 11:57:40.163722972 +0200
> > > +++ gcc/function.h2022-10-12 19:48:28.887554771 +0200
> > > @@ -438,6 +438,10 @@ struct GTY(()) function {
> > >  
> > >/* Set if there are any OMP_TARGET regions in the function.  */
> > >unsigned int has_omp_target : 1;
> > > +
> > > +  /* Set for artificial function created for [[assume (cond)]].
> > > + These should be GIMPLE optimized, but not expanded to RTL.  */
> > > +  unsigned int assume_function : 1;
> > 
> > I wonder if we should have this along force_output in the symtab
> > node and let the symtab code decide whether to expand?
> 
> I actually first had a flag on the symtab node but as the patch shows,
> when it needs to be tested, more frequently I have access to struct function
> than to cgraph node.

I see.

> > > --- gcc/gimplify.cc.jj2022-10-10 11:57:40.165722944 +0200
> > > +++ gcc/gimplify.cc   2022-10-12 19:48:28.890554730 +0200
> > > @@ -3569,7 +3569,52 @@ gimplify_call_expr (tree *expr_p, gimple
> > >fndecl, 0));
> > > return GS_OK;
> > >   }
> > > -   /* FIXME: Otherwise expand it specially.  */
> > > +   /* If not optimizing, ignore the assumptions.  */
> > > +   if (!optimize)
> > > + {
> > > +   *expr_p = NULL_TREE;
> > > +   return GS_ALL_DONE;
> > > + }
> > > +   /* Temporarily, until gimple lowering, transform
> > > +  .ASSUME (cond);
> > > +  into:
> > > +  guard = .ASSUME ();
> > > +  if (guard) goto label_true; else label_false;
> > > +  label_true:;
> > > +  {
> > > +guard = cond;
> > > +  }
> > > +  label_false:;
> > > +  .ASSUME (guard);
> > > +  such that gimple lowering can outline the condition into
> > > +  a separate function easily.  */
> > 
> > So the idea to use lambdas and/or nested functions (for OMP)
> > didn't work out or is more complicated?
> 
> Yes, that didn't work out.  Both lambda creation and nested function
> handling produce big structures with everything while for the assumptions
> it is better to have separate scalars if possible, lambda creation has
> various language imposed restrictions, diagnostics etc. and isn't
> available in C and I think the outlining in the patch is pretty simple and
> short.
> 
> > I wonder if, instead of using the above intermediate form we
> > > can have a new structured GIMPLE code with sub-statements
> > 
> >  .ASSUME
> >{
> >  condition;
> >}
> 
> That is what I wrote in the patch description as alternative:
> "with the condition wrapped into a GIMPLE_BIND (I admit the above isn't
> extra clean but it is just something to hold it from gimplifier until
> gimple low pass; it resembles if (condition_never_true) { cond; };
> an alternative would be to introduce a GOMP_ASSUME statement that would have
> the guard var as operand and the GIMPLE_BIND as body, but for the
> few passes (tree-nested and omp lowering) in between that looked like
> an overkill to me)"
> I can certainly implement that easily.

I'd prefer that, it looks possibly less messy.

> > ?  There's gimple_statement_omp conveniently available as base and
> > IIRC you had the requirement to implement some OMP assume as well?
> 
> For OpenMP assumptions we right now implement just the holds clause
> of assume and implement it the same way as assume/gnu::assume attributes.
> 
> > > Of course a different stmt class with body would work as well here,
> > > maybe we can even use a gbind with a special flag.
> > > 
> > > The outlining code can then be adjusted to outline a single BIND?
> 
> It already is adjusting a single bind (of course with everything nested in
> it).
> 
> > It probably won't simplify much that way.
> 
> > > +static tree
> > > +create_assumption_fn (location_t loc)
> > > +{
> > > +  tree name = clone_function_name_numbered (current_function_decl, 
> > > "_assume");
> > > +  /* For now, will be changed later.  */
> > 
> > ?
> 
> I need to create the FUNCTION_DECL early and only later on discover
> the used automatic vars (for which I need the destination function)
> and only once those are discovered I can create 

Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8

2022-10-17 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 17, 2022 at 2:27 PM Jiang, Haochen  wrote:
>
> > -Original Message-
> > From: Hongtao Liu 
> > Sent: Monday, October 17, 2022 12:05 PM
> > To: Jiang, Haochen 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8
> >
> > > On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches wrote:
> > >
> > > From: Kong Lingling 
> > >
> > > gcc/ChangeLog
> > >
> > > * common/config/i386/cpuinfo.h (get_available_features): Detect
> > > avxvnniint8.
> > > * common/config/i386/i386-common.cc
> > > (OPTION_MASK_ISA2_AVXVNNIINT8_SET): New.
> > > (OPTION_MASK_ISA2_AVXVNNIINT8_UNSET): Ditto.
> > > (ix86_handle_option): Handle -mavxvnniint8.
> > > * common/config/i386/i386-cpuinfo.h (enum processor_features):
> > > Add FEATURE_AVXVNNIINT8.
> > > * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> > > avxvnniint8.
> > > * config.gcc: Add avxvnniint8intrin.h.
> > > * config/i386/avxvnniint8intrin.h: New file.
> > > * config/i386/cpuid.h (bit_AVXVNNIINT8): New.
> > > * config/i386/i386-builtin.def: Add new builtins.
> > > * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> > > __AVXVNNIINT8__.
> > > * config/i386/i386-options.cc (isa2_opts): Add -mavxvnniint8.
> > > (ix86_valid_target_attribute_inner_p): Handle avxvnniint8.
> > > > * config/i386/i386-isa.def: Add DEF_PTA(AVXVNNIINT8).
> > > * config/i386/i386.opt: Add option -mavxvnniint8.
> > > * config/i386/immintrin.h: Include avxvnniint8intrin.h.
> > > * config/i386/sse.md
> > > (vpdp_): New define_insn.
> > > * doc/extend.texi: Document avxvnniint8.
> > > * doc/invoke.texi: Document -mavxvnniint8.
> > > * doc/sourcebuild.texi: Document target avxvnniint8.
> > >
> > > gcc/testsuite/ChangeLog
> > >
> > > * g++.dg/other/i386-2.C: Add -mavxvnniint8.
> > > * g++.dg/other/i386-3.C: Ditto.
> > > * gcc.target/i386/avx-check.h: Add avxvnniint8 check.
> > > * gcc.target/i386/sse-12.c: Add -mavxvnniint8.
> > > * gcc.target/i386/sse-13.c: Ditto.
> > > * gcc.target/i386/sse-14.c: Ditto.
> > > * gcc.target/i386/sse-22.c: Ditto.
> > > * gcc.target/i386/sse-23.c: Ditto.
> > > * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> > > * lib/target-supports.exp
> > > (check_effective_target_avxvnniint8): New.
> > > * gcc.target/i386/avxvnniint8-1.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbssd-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbssds-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbsud-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbsuds-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbuud-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbuuds-2.c: Ditto.
> > >
> > > Co-authored-by: Hongyu Wang 
> > > Co-authored-by: Haochen Jiang 
> > > ---
> > >  gcc/common/config/i386/cpuinfo.h  |   2 +
> > >  gcc/common/config/i386/i386-common.cc |  22 ++-
> > >  gcc/common/config/i386/i386-cpuinfo.h |   1 +
> > >  gcc/common/config/i386/i386-isas.h|   2 +
> > >  gcc/config.gcc|   2 +-
> > >  gcc/config/i386/avxvnniint8intrin.h   | 138 ++
> > >  gcc/config/i386/cpuid.h   |   1 +
> > >  gcc/config/i386/i386-builtin.def  |  14 ++
> > >  gcc/config/i386/i386-c.cc |   2 +
> > >  gcc/config/i386/i386-isa.def  |   1 +
> > >  gcc/config/i386/i386-options.cc   |   4 +-
> > >  gcc/config/i386/i386.opt  |   5 +
> > >  gcc/config/i386/immintrin.h   |   2 +
> > >  gcc/config/i386/sse.md|  31 
> > >  gcc/doc/extend.texi   |   5 +
> > >  gcc/doc/invoke.texi   |   9 +-
> > >  gcc/doc/sourcebuild.texi  |   3 +
> > >  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
> > >  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
> > >  gcc/testsuite/gcc.target/i386/avx-check.h |   3 +
> > >  gcc/testsuite/gcc.target/i386/avxvnniint8-1.c |  43 ++
> > > .../gcc.target/i386/avxvnniint8-vpdpbssd-2.c  |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbssds-2.c |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbsud-2.c  |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbsuds-2.c |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbuud-2.c  |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbuuds-2.c |  72 +
> > >  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
> > >  gcc/testsuite/gcc.target/i386/sse-12.c|   2 +-
> > >  

RE: [PATCH 2/6] Support Intel AVX-VNNI-INT8

2022-10-17 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, October 17, 2022 12:05 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8
> 
> On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches wrote:
> >
> > From: Kong Lingling 
> >
> > gcc/ChangeLog
> >
> > * common/config/i386/cpuinfo.h (get_available_features): Detect
> > avxvnniint8.
> > * common/config/i386/i386-common.cc
> > (OPTION_MASK_ISA2_AVXVNNIINT8_SET): New.
> > (OPTION_MASK_ISA2_AVXVNNIINT8_UNSET): Ditto.
> > (ix86_handle_option): Handle -mavxvnniint8.
> > * common/config/i386/i386-cpuinfo.h (enum processor_features):
> > Add FEATURE_AVXVNNIINT8.
> > * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> > avxvnniint8.
> > * config.gcc: Add avxvnniint8intrin.h.
> > * config/i386/avxvnniint8intrin.h: New file.
> > * config/i386/cpuid.h (bit_AVXVNNIINT8): New.
> > * config/i386/i386-builtin.def: Add new builtins.
> > * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> > __AVXVNNIINT8__.
> > * config/i386/i386-options.cc (isa2_opts): Add -mavxvnniint8.
> > (ix86_valid_target_attribute_inner_p): Handle avxvnniint8.
> > * config/i386/i386-isa.def: Add DEF_PTA(AVXVNNIINT8).
> > * config/i386/i386.opt: Add option -mavxvnniint8.
> > * config/i386/immintrin.h: Include avxvnniint8intrin.h.
> > * config/i386/sse.md
> > (vpdp_): New define_insn.
> > * doc/extend.texi: Document avxvnniint8.
> > * doc/invoke.texi: Document -mavxvnniint8.
> > * doc/sourcebuild.texi: Document target avxvnniint8.
> >
> > gcc/testsuite/ChangeLog
> >
> > * g++.dg/other/i386-2.C: Add -mavxvnniint8.
> > * g++.dg/other/i386-3.C: Ditto.
> > * gcc.target/i386/avx-check.h: Add avxvnniint8 check.
> > * gcc.target/i386/sse-12.c: Add -mavxvnniint8.
> > * gcc.target/i386/sse-13.c: Ditto.
> > * gcc.target/i386/sse-14.c: Ditto.
> > * gcc.target/i386/sse-22.c: Ditto.
> > * gcc.target/i386/sse-23.c: Ditto.
> > * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> > * lib/target-supports.exp
> > (check_effective_target_avxvnniint8): New.
> > * gcc.target/i386/avxvnniint8-1.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbssd-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbssds-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbsud-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbsuds-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbuud-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbuuds-2.c: Ditto.
> >
> > Co-authored-by: Hongyu Wang 
> > Co-authored-by: Haochen Jiang 
> > ---
> >  gcc/common/config/i386/cpuinfo.h  |   2 +
> >  gcc/common/config/i386/i386-common.cc |  22 ++-
> >  gcc/common/config/i386/i386-cpuinfo.h |   1 +
> >  gcc/common/config/i386/i386-isas.h|   2 +
> >  gcc/config.gcc|   2 +-
> >  gcc/config/i386/avxvnniint8intrin.h   | 138 ++
> >  gcc/config/i386/cpuid.h   |   1 +
> >  gcc/config/i386/i386-builtin.def  |  14 ++
> >  gcc/config/i386/i386-c.cc |   2 +
> >  gcc/config/i386/i386-isa.def  |   1 +
> >  gcc/config/i386/i386-options.cc   |   4 +-
> >  gcc/config/i386/i386.opt  |   5 +
> >  gcc/config/i386/immintrin.h   |   2 +
> >  gcc/config/i386/sse.md|  31 
> >  gcc/doc/extend.texi   |   5 +
> >  gcc/doc/invoke.texi   |   9 +-
> >  gcc/doc/sourcebuild.texi  |   3 +
> >  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
> >  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
> >  gcc/testsuite/gcc.target/i386/avx-check.h |   3 +
> >  gcc/testsuite/gcc.target/i386/avxvnniint8-1.c |  43 ++
> > .../gcc.target/i386/avxvnniint8-vpdpbssd-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbssds-2.c |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbsud-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbsuds-2.c |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbuud-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbuuds-2.c |  72 +
> >  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
> >  gcc/testsuite/gcc.target/i386/sse-12.c|   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-13.c|   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-14.c|   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-22.c|   4 +-
> >  gcc/testsuite/gcc.target/i386/sse-23.c|   2 +-
> >  

Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-17 Thread Aldy Hernandez via Gcc-patches
On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek  wrote:
>
> On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote:
> > +// Like real_arithmetic, but round the result to INF if the operation
> > +// produced inexact results.
> > +//
> > +// ?? There is still one problematic case, i387.  With
> > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
> > +// XFmode (long_double_type_node), so that case is OK.  But without
> > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> > +// precision (64-bit mantissa) and only occasionally rounded to
> > +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> > +// this is ok as well though it is just occasionally more precise. ??
> > +
> > +static void
> > +frange_arithmetic (enum tree_code code, tree type,
> > +REAL_VALUE_TYPE &result,
> > +const REAL_VALUE_TYPE &op1,
> > +const REAL_VALUE_TYPE &op2,
> > +const REAL_VALUE_TYPE &inf)
> > +{
> > +  REAL_VALUE_TYPE value;
> > +  enum machine_mode mode = TYPE_MODE (type);
> > +  bool mode_composite = MODE_COMPOSITE_P (mode);
> > +
> > +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> > +  real_convert (&result, mode, &value);
> > +
> > +  // If real_convert above has rounded an inexact value towards
> > +  // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp
> > +  // later (real_nextafter).
> > +  bool rounding = (flag_rounding_math
> > +&& (real_isneg (&inf)
> > +? real_less (&result, &value)
> > +: !real_less (&value, &result)));
>
> I thought the agreement during Cauldron was that we'd do this always,
> regardless of flag_rounding_math.
> Because excess precision (the fast one like on ia32 or -mfpmath=387 on
> x86_64), or -frounding-math, or FMA contraction can all increase precision
> and worst case it all behaves like -frounding-math for the ranges.
>
> So, perhaps use:
>   if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
> : !real_less (&value, &result))
>   && (inexact || !real_identical (&result, 

Done.

> ?
> No need to do the real_isneg/real_less stuff for mode_composite, then
> we do it always for inexacts, but otherwise we check if the rounding
> performed by real.cc has been in the conservative direction (for upper
> bound to +inf, for lower bound to -inf), if yes, we don't need to do
> anything, if not, we frange_nextafter.
>
> As discussed, for mode_composite, I think we want to do the extra
> stuff for inexact denormals and otherwise do the nextafter unconditionally,
> because our internal mode_composite representation isn't precise enough.
>
> > +  // Be extra careful if there may be discrepancies between the
> > +  // compile and runtime results.
> > +  if ((rounding || mode_composite)
> > +  && (inexact || !real_identical (&result, &value)))
> > +{
> > +  if (mode_composite)
> > + {
> > +   bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0;
>
> Use real_isdenormal here?

Done.

> Though, real_iszero needs the same thing.

So... real_isdenormal() || real_iszero() as in the attached patch?

>
> > +   if (denormal)
> > + {
> > +   REAL_VALUE_TYPE tmp;
>
> And explain here why is this, that IBM extended denormals have just
> DFmode precision.

Done.

> Though, now that I think about it, while this is correct for denormals,
>
> > +   real_convert (&tmp, DFmode, &value);
> > +   frange_nextafter (DFmode, tmp, inf);
> > +   real_convert (&result, mode, &tmp);
> > + }
>
> there are also the cases where the higher double exponent is in the
> [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so.
> https://en.wikipedia.org/wiki/Double-precision_floating-point_format
> If the upper double is denormal in the DFmode sense, so smaller absolute
> value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to
> do, the lower double must be always +/- zero.
> Now, if the result is __DBL_MIN__, the upper double is already normalized
> but we can add __DBL_DENORM_MIN__ to it, which will make the number have
> 54-bit precision.
> If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__
> and make it 55-bit precision.  Etc. until we reach __DBL_MIN__ * 2^53
> where it acts like fully normalized 106-bit precision number.
> I must say I'm not really sure what real_nextafter is doing in those cases,
> I'm afraid it doesn't handle it correctly but the only other use
> of real_nextafter is guarded with:
>   /* Don't handle composite modes, nor decimal, nor modes without
>  inf or denorm at least for now.  */
>   if (format->pnan < format->p
>   || format->b == 10
>   || !format->has_inf
>   || !format->has_denorm)
> return false;

Dunno.  Is there a conservative thing we can do for mode_composites
that aren't denormal or zero?

How does this look?
Aldy
From d7be6caf60133bc39f8224e2f0e00dabcdbbe55d Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Thu, 13 Oct 

RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-17 Thread Jiang, Haochen via Gcc-patches
Hi Rozenfeld,

I just checked out your commit and the test still fails.

It reports the following:
xgcc: error: 
/export/users2/haochenj/src/gcc/master/./libgomp/testsuite/libgomp.oacc-c++/../libgomp.oacc-c-c++-common/kernels-loop-g.c:
 '-fcompare-debug' failure (length)

I also fixed a typo from when I sent the mail manually; the commands to reproduce should be the following.

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64\ -march=cascadelake}'"

BRs,
Haochen

From: Jiang, Haochen
Sent: Monday, October 17, 2022 1:41 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; 
gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

If that has been fixed, just ignore that mail.

The test was run by a script, which got this result a few days ago. However,
the sendmail service was down on that machine and I only just noticed. So I
sent the result manually today in case the issue has not been fixed yet.

Sorry for the disturbance!

BRs,
Haochen

From: Eugene Rozenfeld <eugene.rozenf...@microsoft.com>
Sent: Monday, October 17, 2022 1:23 PM
To: Jiang, Haochen <haochen.ji...@intel.com>; gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

That commit had a bug that was fixed in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee

Was that fix included in your GCC build?

From: Jiang, Haochen <haochen.ji...@intel.com>
Sent: Sunday, October 16, 2022 8:09 PM
To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld <eugene.rozenf...@microsoft.com>; Jiang, Haochen <haochen.ji...@intel.com>; gcc-regress...@gcc.gnu.org
Subject: [EXTERNAL] [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld <ero...@microsoft.com>
Date:   Thu Apr 21 16:43:24 2022 -0700

    Set discriminators for call stmts on the same line within the same basic block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64\ -march=cascadelake}'"