Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-11 Thread Jiufu Guo via Gcc-patches
Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Segher Boessenkool  writes:
>
>> On Tue, Apr 11, 2023 at 05:40:09PM +0800, Kewen.Lin wrote:
>>> on 2023/4/11 17:14, guojiufu wrote:
>>> > Thanks for raising this concern.
>>> > The behavior to check about bif on FLOAT128_HW and emit an error message 
>>> > for
>>> > requirements on quad-precision is added in gcc12. This is why gcc12 fails 
>>> > to
>>> > compile the case on -m32.
>>> > 
>>> > Before gcc12, altivec_resolve_overloaded_builtin will return the 
>>> > overloaded
>>> > result directly, and does not check more about the result function.
>>> 
>>> Thanks for checking, I wonder which commit caused this behavior change and 
>>> what's
>>> the underlying justification?  I know there is one new bif handling 
>>> framework
>>> introduced in gcc12, not sure the checking condition was changed together 
>>> or by
>>> a standalone commit.  Anyway, apparently the conditions for the support of 
>>> these
>>> bifs are different on gcc-11 and gcc-12, I wonder why it changed.  As 
>>> mentioned
>>> above, PR108758's c#1 said this case (bifs) work well on gcc-11, I 
>>> suspected the
>>> condition change was an overkill, that's why I asked.
>>
>> It almost certainly was an oversight.  The new builtin framework changed
>> so many things, there was bound to be some breakage to go with all the
>> good things it brought.
>
> Yes, the condition checking on gcc-12 is different from gcc-11. In
> gcc-11, the condition on overloaded bif is not checked.
> And, there are a few commits related to the bifs change. e.g.
> r12-4977-ga28cfe49203705 introduces a new bif expand function which has
> the ability to check more bif's target requirements like ieee128_hw.
> And another commit changes the error message (r12-6684).
>
>>
>> So what is the actual thing going wrong?  QP insns work fine and are
>> valid on all systems and environments, BE or LE, 32-bit or 64-bit.
>> Of

I understand that QP insns (e.g. xscmpexpqp) is valid if the system
meets ISA3.0, no matter BE/LE, 32-bit/64-bit.
I think option -mfloat128-hardware is designed for QP insns.

While there is one issue, on BE machine, when compiling with options
"-mfloat128-hardware -m32", an error message is generated:
"error: '%<-mfloat128-hardware%>' requires '-m64'"

(I'm wondering if we need to relax this limitation.)


BR,
Jeff (Jiufu)

>> course you cannot use the "long double" type for those everywhere, but
>> that is a very different thing.
>
> Currently, when compiling bif __builtin_vsx_scalar_cmp_exp_qp_eq,
> gcc generates error message:
> error: '__builtin_vsx_scalar_cmp_exp_qp_eq' requires quad-precision
> floating-point arithmetic
>
> IMHO, this error would be ok.  Because it makes sense that this bif
> needs ieee128_hw.
>
> BR,
> Jeff (Jiufu)
>
>>
>>
>> Segher


Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-11 Thread Jiufu Guo via Gcc-patches
Hi,

Segher Boessenkool  writes:

> On Tue, Apr 11, 2023 at 05:40:09PM +0800, Kewen.Lin wrote:
>> on 2023/4/11 17:14, guojiufu wrote:
>> > Thanks for raising this concern.
>> > The behavior to check about bif on FLOAT128_HW and emit an error message 
>> > for
>> > requirements on quad-precision is added in gcc12. This is why gcc12 fails 
>> > to
>> > compile the case on -m32.
>> > 
>> > Before gcc12, altivec_resolve_overloaded_builtin will return the overloaded
>> > result directly, and does not check more about the result function.
>> 
>> Thanks for checking, I wonder which commit caused this behavior change and 
>> what's
>> the underlying justification?  I know there is one new bif handling framework
>> introduced in gcc12, not sure the checking condition was changed together or 
>> by
>> a standalone commit.  Anyway, apparently the conditions for the support of 
>> these
>> bifs are different on gcc-11 and gcc-12, I wonder why it changed.  As 
>> mentioned
>> above, PR108758's c#1 said this case (bifs) work well on gcc-11, I suspected 
>> the
>> condition change was an overkill, that's why I asked.
>
> It almost certainly was an oversight.  The new builtin framework changed
> so many things, there was bound to be some breakage to go with all the
> good things it brought.

Yes, the condition checking on gcc-12 is different from gcc-11. In
gcc-11, the condition on overloaded bif is not checked.
And, there are a few commits related to the bifs change. e.g.
r12-4977-ga28cfe49203705 introduces a new bif expand function which has
the ability to check more bif's target requirements like ieee128_hw.
And another commit changes the error message (r12-6684).

>
> So what is the actual thing going wrong?  QP insns work fine and are
> valid on all systems and environments, BE or LE, 32-bit or 64-bit.  Of
> course you cannot use the "long double" type for those everywhere, but
> that is a very different thing.

Currently, when compiling bif __builtin_vsx_scalar_cmp_exp_qp_eq,
gcc generates error message:
error: '__builtin_vsx_scalar_cmp_exp_qp_eq' requires quad-precision
floating-point arithmetic

IMHO, this error would be ok.  Because it makes sense that this bif
needs ieee128_hw.

BR,
Jeff (Jiufu)

>
>
> Segher


Re: Fix ICEs related to VM types in C [PR106465, PR107557, PR108424, PR109450]

2023-04-11 Thread Martin Uecker via Gcc-patches
Am Mittwoch, dem 12.04.2023 um 00:32 + schrieb Joseph Myers:
> On Tue, 11 Apr 2023, Martin Uecker via Gcc-patches wrote:
> 
> > Ok, here is another attempt on fixing issues with size expression.
> > Not all are regressions, but it does not make sense to try to split
> > it up.
> 
> This wording implies this is version 2 or later of the patch, could you 
> please give a reference to whatever previous patch posting / discussion 
> there may have been?

The previous one was the following for PR107557 and PR108423.

https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611562.html

While revising it I realized that recursing for parameter
when gimplifying is incorrect for the same reason as recursing in
other cases (see the comment in gimplify_type_sizes). This lead
to PR109450.

> 
> > Fix ICEs related to VM types in C [PR106465, PR107557, PR108424, 
> > PR109450]
> 
> 108424 seems to be the wrong PR number.

This should be: PR108423


Martin





[PATCH, rs6000] xfail float128 comparison test case that fails on powerpc64 [PR108728]

2023-04-11 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch xfails a float128 comparison test case on powerpc64 that
fails due to a longstanding issue with floating-point compares.

  See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58684 for more
information.

  The patch passed regression test on Power Linux platforms.

Thanks
Gui Haochen

ChangeLog
rs6000: xfail float128 comparison test case that fails on powerpc64.

This patch xfails a float128 comparison test case on powerpc64 that
fails due to a longstanding issue with floating-point compares.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58684 for more information.

gcc/testsuite/
PR target/108728
* gcc.dg/torture/float128-cmp-invalid.c: Add xfail.

patch.diff
diff --git a/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c 
b/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
index 1f675efdd61..f52686e0a24 100644
--- a/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
+++ b/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
@@ -1,5 +1,5 @@
 /* Test for "invalid" exceptions from __float128 comparisons.  */
-/* { dg-do run } */
+/* { dg-do run { xfail { powerpc*-*-* } } } */
 /* { dg-options "" } */
 /* { dg-require-effective-target __float128 } */
 /* { dg-require-effective-target base_quadfloat_support } */


Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-11 Thread Jeff Law via Gcc-patches



On 4/10/23 01:10, Jakub Jelinek wrote:

On Sat, Apr 08, 2023 at 06:25:32PM -0600, Jeff Law wrote:



On 4/6/23 08:21, Eric Botcazou wrote:


So, perhaps just in the return op0; case add further code for
WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
again for the word mode and decide if it is still safe.


Does it work to just replace mode by word_mode in the calls to nonzero_bits?

It helps marginally -- basically we defer mucking up the code a bit.  We
then hit this in simplify_and_const_int_1:


   /* See what bits may be nonzero in VAROP.  Unlike the general case of
  a call to nonzero_bits, here we don't care about bits outside
  MODE.  */

   nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);

That just seems wrong for WORD_REGISTER_OPERATIONS targets.


Hacking both locations in a similar manner fixes the test.


If so, can you post that in patch form and can we go with that version
plus the testcase (e.g. from the first patch I've posted where I've changed
dse)?
So as I mentioned in IRC, I haven't really looked closely at 
simplify_and_const_int_1.  I don't have a high degree of confidence this 
patch is complete, even though it does fix the test for 108947 and 109040.


I did bootstrap on riscv, but not a regression test, that's spinning 
right now.


Jeffdiff --git a/gcc/combine.cc b/gcc/combine.cc
index 22bf8e1ec89..c41d8a09b3b 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10055,9 +10055,10 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx 
varop,
 
   /* See what bits may be nonzero in VAROP.  Unlike the general case of
  a call to nonzero_bits, here we don't care about bits outside
- MODE.  */
+ MODE unless WORD_REGISTER_OPERATIONS is true.  */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  enum machine_mode tmode = WORD_REGISTER_OPERATIONS ? word_mode : mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
  Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 3b33afa2461..5f6f70491d8 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3752,7 +3752,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
return op0;
   if (HWI_COMPUTABLE_MODE_P (mode))
{
- HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+ /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+nonzero bits in WORD_MODE rather than MODE.  */
+ HOST_WIDE_INT nzop0
+   = nonzero_bits (trueop0, WORD_REGISTER_OPERATIONS ? word_mode : 
mode);
  HOST_WIDE_INT nzop1;
  if (CONST_INT_P (trueop1))
{


Re: Fix ICEs related to VM types in C [PR106465, PR107557, PR108424, PR109450]

2023-04-11 Thread Joseph Myers
On Tue, 11 Apr 2023, Martin Uecker via Gcc-patches wrote:

> Ok, here is another attempt on fixing issues with size expression.
> Not all are regressions, but it does not make sense to try to split
> it up.

This wording implies this is version 2 or later of the patch, could you 
please give a reference to whatever previous patch posting / discussion 
there may have been?

>   Fix ICEs related to VM types in C [PR106465, PR107557, PR108424, 
> PR109450]

108424 seems to be the wrong PR number.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-11 Thread Andrew MacLeod via Gcc-patches

This is a carry over from PR 108139.

When we have a PHI node which has 2 arguments and one is undefined, we 
create an equivalence between the LHS and the non-undefined PHI 
argument.  THis allows us to perform certain optimizations.


The problem is, when we are evaluating range-on-entry in the cache, its 
depends on where that equivalence is made, from where we have no context.


a_3 = 

if c_3 is undefined,  then a_3 is equivalent to b_2... but b_2 is not 
equivalence to a_3 everywhere..   its a one way thing.


108139 fixed this by not evaluating any equivalences if the equivalence 
was the LHS.


What it missed, was it possible we are calculating the range of a_3.   
b_2 is not defined in a phi node, so it happily used the equivalence.  
This PR demonstrates that we can't always use that equivlence either 
without more context.  There can be places in the IL where a_3 is used, 
but b_2 has moved to a new value within a loop.


So we can't do this if either NAME or the equivalence is equal via a PHI 
node with an undefined argument.


Unfortunately, this unsafe assumption is why PR 101912 is fixed.   
Fixing this issue properly is going to cause that to reopen as it is 
unsafe. (That PR is  a false uninitialized warning issue, rather than an 
wrong-code issue)


This bootstraps on x86_64-pc-linux-gnu  with that single regression, 
which I have XFAILed for now.  OK for trunk?   Once Jakub verifies it 
actually fixes the execution problem.   we have no executable test . yet.


Andrew



commit 90848fb75cf91a45edd355d2b1485ef835099609
Author: Andrew MacLeod 
Date:   Tue Apr 11 17:29:03 2023 -0400

Don't use ANY PHI equivalences in range-on-entry.

PR 108139 dissallows PHI equivalencies in the on-entry calculator, but
it was only checking if the equivlaence was a PHI.  In this case, NAME
itself is a PHI with an equivlaence caused by an undefined value, so we
also need to check that case.  Unfortunately this un-fixes 101912.

PR tree-optimization/109462
gcc/
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Don't
check for equivalences if NAME is a phi node.

gcc/testsuite/
* gcc.dg/uninit-pr101912.c: XFAIL the warning.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 6a098d8ec28..3b52f1e734c 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1218,7 +1218,9 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
 	  fprintf (dump_file, "\n");
 	}
   // See if any equivalences can refine it.
-  if (m_oracle)
+  // PR 109462, like 108139 below, a one way equivalence introduced
+  // by a PHI node can also be through the definition side.  Disallow it.
+  if (m_oracle && !is_a (SSA_NAME_DEF_STMT (name)))
 	{
 	  tree equiv_name;
 	  relation_kind rel;
diff --git a/gcc/testsuite/gcc.dg/uninit-pr101912.c b/gcc/testsuite/gcc.dg/uninit-pr101912.c
index 1550c03436d..62cd2a0c73e 100644
--- a/gcc/testsuite/gcc.dg/uninit-pr101912.c
+++ b/gcc/testsuite/gcc.dg/uninit-pr101912.c
@@ -11,7 +11,7 @@ tzloadbody (void)
   for (int i = 0; i < n; i++)
 {
   int corr = getint ();
-  if (corr < 1 || (corr == 1 && !(leapcnt == 0 || (prevcorr < corr ? corr == prevcorr + 1 : (corr == prevcorr || corr == prevcorr - 1) /* { dg-bogus "uninitialized" } */
+  if (corr < 1 || (corr == 1 && !(leapcnt == 0 || (prevcorr < corr ? corr == prevcorr + 1 : (corr == prevcorr || corr == prevcorr - 1) /* { dg-bogus "uninitialized" "pr101912" { xfail *-*-* } } */
 	return -1;
 
   prevcorr = corr;


Re: [PATCH] RISC-V: Fix PR108279

2023-04-11 Thread Jeff Law via Gcc-patches




On 4/11/23 17:09, juzhe.zh...@rivai.ai wrote:

I don't want to seperate VSETVL PASS into 2 seperate PASS.
I want make everything cleaner.
Well, two pass vsetvl might actually be cleaner.  But as I've noted 
before, now is not the time to debate the vsetvl implementation detail. 
We've got much more important stuff to deal with.


Jeff


Re: Re: [PATCH] RISC-V: Fix PR108279

2023-04-11 Thread juzhe.zhong
I don't want to seperate VSETVL PASS into 2 seperate PASS.
I want make everything cleaner.

Another example is VSETVL PASS can do the branch prediction:
https://godbolt.org/z/K44r98E5v 
In function "f", you can see we put the hoist vsetvl from a more likely block 
(i !=cond) outside the loop,
then eliminate the vsetvl of this block. (Branch prediction is not that perfect 
in VSETVL PASS, I plan to 
optimize more when GCC 14 is open).

"f2" function is the normal fuse that we do in Phase 3.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-12 05:14
To: Richard Biener; juzhe.zhong
CC: gcc-patches; kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix PR108279
 
 
On 4/11/23 02:55, Richard Biener wrote:
 
> 
> Just to throw in a comment here - I think you should present LCM
> with something it can identify as the same for compatible vsetvl and
> then it should just work?  OTOH if "compatible" is not transitive
> that's not possible (but then I can't quickly make up an example
> where it wouldn't be).
I'm not sure it's that simple.  Or at least not with a single iteration 
of LCM.
 
One problem is that kills may affecting one setting, but not the other. 
I couldn't mentally come up with a single pass LCM to handle the case 
Juzhe was handling.  ie, you may have two compatible settings where you 
can unify them and hoist the compatible setting to a less executed 
point.  But the transp set for one of two compatible settings may be 
different for the other compatible setting because of vector 
instructions in a block.
 
What was starting to form was a two pass approach.  One which worked 
with individual vsetvl settings, another which worked on unified vsetvl 
settings.  It wasn't clear to me which ordering would be better, but I 
didn't work through the likely scenarios -- it was clear this wasn't the 
time to introduce that kind of conceptual change.
 
jeff
 
 
 
 


Re: [PATCH] c++: Fix Solaris bootstraps across midnight

2023-04-11 Thread Nathan Sidwell via Gcc-patches

Jakub,
for avoidance of doubt, your version is fine.

nathan


On 4/11/23 18:06, Nathan Sidwell wrote:

On 4/11/23 04:12, Jakub Jelinek wrote:

Hi!

When working on the PR109040 fix, I wanted to test it on some
WORD_REGISTER_OPERATIONS target and tried sparc-solaris on GCC Farm.
My bootstrap failed in comparison failure on cp/module.o, because
Solaris date doesn't support the -r option and one stage's cp/module.o
was built before midnight and next stage's cp/module.o after midnight,
so they had different -DMODULE_VERSION= value.

Now, I think the advice (don't bootstrap at midnight) is something
we shouldn't have, so the following patch stores the module version
(still generated through the same way, date -r cp/module.cc
if it works, otherwise just date) into a temporary file, makes sure
that temporary file is updated when cp/module.cc source is updated
and when date -r doesn't work copies file from previous stage
if it is newer than cp/module.cc.


looks good.  one could tweak it slightly to avoid the MODULE_VERSION variable 
(but then I was the original lazy one!) Something like:



s-cp-module-version: $(srcdir)/cp/module.cc
 if date -r $(srcdir)/cp/module.cc '+%y%m%d%H%MU' \
   2>/dev/null >$@; then :; \
 elif test ../prev-gcc/$@ -nt \
   $(srcdir)/cp/module.cc; then \
   cp ../prev-gcc/$@ $@; \
 else \
   date '+%y%m%dU' 2>/dev/null >$@; \
 fi`; \


nathan



--
Nathan Sidwell



Re: [PATCH] c++: Fix Solaris bootstraps across midnight

2023-04-11 Thread Nathan Sidwell via Gcc-patches

On 4/11/23 04:12, Jakub Jelinek wrote:

Hi!

When working on the PR109040 fix, I wanted to test it on some
WORD_REGISTER_OPERATIONS target and tried sparc-solaris on GCC Farm.
My bootstrap failed in comparison failure on cp/module.o, because
Solaris date doesn't support the -r option and one stage's cp/module.o
was built before midnight and next stage's cp/module.o after midnight,
so they had different -DMODULE_VERSION= value.

Now, I think the advice (don't bootstrap at midnight) is something
we shouldn't have, so the following patch stores the module version
(still generated through the same way, date -r cp/module.cc
if it works, otherwise just date) into a temporary file, makes sure
that temporary file is updated when cp/module.cc source is updated
and when date -r doesn't work copies file from previous stage
if it is newer than cp/module.cc.


looks good.  one could tweak it slightly to avoid the MODULE_VERSION variable 
(but then I was the original lazy one!) Something like:



s-cp-module-version: $(srcdir)/cp/module.cc
if date -r $(srcdir)/cp/module.cc '+%y%m%d%H%MU' \
  2>/dev/null >$@; then :; \
elif test ../prev-gcc/$@ -nt \
  $(srcdir)/cp/module.cc; then \
  cp ../prev-gcc/$@ $@; \
else \
  date '+%y%m%dU' 2>/dev/null >$@; \
fi`; \


nathan

--
Nathan Sidwell



Re: [PATCH] RISC-V: Fix PR108279

2023-04-11 Thread Jeff Law via Gcc-patches




On 4/11/23 02:55, Richard Biener wrote:



Just to throw in a comment here - I think you should present LCM
with something it can identify as the same for compatible vsetvl and
then it should just work?  OTOH if "compatible" is not transitive
that's not possible (but then I can't quickly make up an example
where it wouldn't be).
I'm not sure it's that simple.  Or at least not with a single iteration 
of LCM.


One problem is that kills may affecting one setting, but not the other. 
I couldn't mentally come up with a single pass LCM to handle the case 
Juzhe was handling.  ie, you may have two compatible settings where you 
can unify them and hoist the compatible setting to a less executed 
point.  But the transp set for one of two compatible settings may be 
different for the other compatible setting because of vector 
instructions in a block.


What was starting to form was a two pass approach.  One which worked 
with individual vsetvl settings, another which worked on unified vsetvl 
settings.  It wasn't clear to me which ordering would be better, but I 
didn't work through the likely scenarios -- it was clear this wasn't the 
time to introduce that kind of conceptual change.


jeff





Re: [PATCH] RISC-V: Force ilp32d for the T-Head FMV test

2023-04-11 Thread Jeff Law via Gcc-patches




On 4/11/23 14:10, Palmer Dabbelt wrote:

These functions are NOPs on the soft-float ABIs.  Since we're already
forcing the ISA, let's just force the ABI too.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmv-fmv.c: Force the ilp32d ABI.
---
This one is also in the testing queue.  OK for trunk?

OK
jeff


Re: [PATCH] RISC-V: Clean up the pr106602.c testcase

2023-04-11 Thread Jeff Law via Gcc-patches




On 4/11/23 13:03, Palmer Dabbelt wrote:

The test case that was added is rv64i-specific, as there's better ways
to generate this code on rv32i (where the long/int cast is a NOP) and on
rv64i_zba (where we have word shifts).  This renames the original test
case and adds two more for those targets.

gcc/testsuite/ChangeLog:
PR target/106602
* gcc.target/riscv/pr106602.c: Moved to...
* gcc.target/riscv/pr106602-rv64i.c: ...here.
* gcc.target/riscv/pr106602-rv32i.c: New test.
* gcc.target/riscv/pr106602-rv64i_zba.c: New test.
---
The test suite is still running, but it looks like it's made it past
these so I figured I'd send it now as otherwise I might forget.

OK for trunk?  (assuming the tests finish and pass)

OK
jeff


Re: [RFC PATCH] range-op-float: Fix up op1_op2_relation of comparisons

2023-04-11 Thread Andrew MacLeod via Gcc-patches


On 4/11/23 04:21, Jakub Jelinek wrote:

Hi!

This patch was what I've tried first before the currently committed
PR109386 fix.  Still, I think it is the right thing until we have proper
full set of VREL_* relations for NANs (though it would be really nice
if op1_op2_relation could be passed either type of the comparison
operands, or even better ranges of the two operands, such that
we could choose if inversion of say VREL_LT is VREL_GE (if !MODE_HONOR_NANS
(TYPE_MODE (type))) or rhs1/rhs2 ranges are guaranteed not to include
NANs (!known_isnan && !maybe_isnan for both), or VREL_UNGE, etc.
Anyway, the current state is that for the LE/LT/GE/GT comparisons
we pretend the inverse is like for integral comparisons, which is
true only if NANs can't appear in operands, while for UNLE/UNLT/UNGE/UNGT
we don't override op1_op2_relation (so it always yields VREL_VARYING).

Though, this patch regresses the
FAIL: gcc.dg/tree-ssa/vrp-float-6.c scan-tree-dump-times evrp "Folding predicate x_.* 
<= y_.* to 1" 1
test, so am not sure what to do with it.  The test has explicit
!isnan tests around it, so e.g. having the ranges passed to op1_op2_relation
would also fix it.



I see no reason op1_op2_relation can't have ranges provided to it for 
op1 and op2.  There was no need originally.  There are times when we 
don't have a range handy and we want the simple answer, but if the 
ranges are available, we could utilize them.


Attached is a patch which added op1 and op2 ranges to the routine.  GORI 
will utilize and pass on real ranges (which I think is the core part you 
want), but the consumers in fold_using_range at this point will simply 
pass in varying.  There are 2 consumers in fold_using_range.. one is a 
combiner for logicals, and the other is for export outgoing relations 
that are not on the branch condition.  The combiner could use real 
ranges, but until I fix dispatch up it is very awkward to get them.  The 
export one simply doesn't have them without going to an calculating 
them.. which would probably be expensive..


Regardless, you can at least try your enhancement using real ranges and 
see if this works for you.


This bootstraps and has no regressions, and is fine by me if you want to 
use it.,


Andrew

commit 3715234f2cba21f2b9ec6c609b6f058d1d8af500
Author: Andrew MacLeod 
Date:   Tue Apr 11 12:25:49 2023 -0400

Add op1 and op2 ranges to op1_op2_relation.

* gimple-range-fold.cc (fold_using_range::relation_fold_and_or):
Provide VARYING for op1 and op2 when calling op1_op2_relation.
(fur_source::register_outgoing_edges): Ditto.
* gimple-range-gori.cc (gori_compute::compute_operand1_range):
Pass op1 and op2 ranges to op1_op2_relation.
(gori_compute::compute_operand2_range): Ditto.
* range-op-float.cc (*::op1_op2_relation): Adjust params.
* range-op.cc (*::op1_op2_relation): Adjust params.
* range-op.h (*::op1_op2_relation): Adjust params.

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index e81f6b3699e..3170b1e71a1 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -1051,9 +1051,11 @@ fold_using_range::relation_fold_and_or (irange& lhs_range, gimple *s,
 return;
 
   int_range<2> bool_one (boolean_true_node, boolean_true_node);
+  Value_Range op (TREE_TYPE (ssa1));
+  op.set_varying (TREE_TYPE (ssa1));
 
-  relation_kind relation1 = handler1.op1_op2_relation (bool_one);
-  relation_kind relation2 = handler2.op1_op2_relation (bool_one);
+  relation_kind relation1 = handler1.op1_op2_relation (bool_one, op, op);
+  relation_kind relation2 = handler2.op1_op2_relation (bool_one, op, op);
   if (relation1 == VREL_VARYING || relation2 == VREL_VARYING)
 return;
 
@@ -1125,15 +1127,17 @@ fur_source::register_outgoing_edges (gcond *s, irange _range, edge e0, edge
   tree ssa2 = gimple_range_ssa_p (handler.operand2 ());
   if (ssa1 && ssa2)
 {
+  Value_Range op (TREE_TYPE (ssa1));
+  op.set_varying (TREE_TYPE (ssa1));
   if (e0)
 	{
-	  relation_kind relation = handler.op1_op2_relation (e0_range);
+	  relation_kind relation = handler.op1_op2_relation (e0_range, op, op);
 	  if (relation != VREL_VARYING)
 	register_relation (e0, relation, ssa1, ssa2);
 	}
   if (e1)
 	{
-	  relation_kind relation = handler.op1_op2_relation (e1_range);
+	  relation_kind relation = handler.op1_op2_relation (e1_range, op, op);
 	  if (relation != VREL_VARYING)
 	register_relation (e1, relation, ssa1, ssa2);
 	}
@@ -1160,17 +1164,19 @@ fur_source::register_outgoing_edges (gcond *s, irange _range, edge e0, edge
   Value_Range r (TREE_TYPE (name));
   if (ssa1 && ssa2)
 	{
+	  Value_Range op (TREE_TYPE (ssa1));
+	  op.set_varying (TREE_TYPE (ssa1));
 	  if (e0 && gori ()->outgoing_edge_range_p (r, e0, name, *m_query)
 	  && r.singleton_p ())
 	{
-	  relation_kind relation = handler.op1_op2_relation (r);
+	  relation_kind 

[PATCH] RISC-V: Force ilp32d for the T-Head FMV test

2023-04-11 Thread Palmer Dabbelt
These functions are NOPs on the soft-float ABIs.  Since we're already
forcing the ISA, let's just force the ABI too.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmv-fmv.c: Force the ilp32d ABI.
---
This one is also in the testing queue.  OK for trunk?
---
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 10d035e9e1d..1036044291e 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { rv32 } } } */
-/* { dg-options "-march=rv32gc_xtheadfmv" } */
+/* { dg-options "-march=rv32gc_xtheadfmv -mabi=ilp32d" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
 double
-- 
2.39.2



[PATCH] Fortran: fix functions with entry and pointer/allocatable result [PR104312]

2023-04-11 Thread Harald Anlauf via Gcc-patches
Dear all,

the testcase in the PR by Gerhard exhibited a mis-treatment of
the function decl of the entry master if the function result
had a pointer attribute and the translation unit was compiled
with -ff2c.  We actually should not use the peculiar special
treatment for default-real functions in that case, as -ff2c is
reserved for function results that can be expressed in Fortran77,
and POINTER was not allowed in that standard.  Same for complex.

Furthermore, it turned out that ALLOCATABLE function results
were not yet handled for functions with entries, even without
-ff2c.  Adding support for this was straightforward.

I also fixed a potential buffer overflow for a generated
internal symbol.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 60e81b97cf3715347de30ed4fd579be54fdb1997 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 11 Apr 2023 21:44:20 +0200
Subject: [PATCH] Fortran: fix functions with entry and pointer/allocatable
 result [PR104312]

gcc/fortran/ChangeLog:

	PR fortran/104312
	* resolve.cc (resolve_entries): Handle functions with ENTRY and
	ALLOCATABLE results.
	* trans-expr.cc (gfc_conv_procedure_call): Functions with a result
	with the POINTER or ALLOCATABLE attribute shall not get any special
	treatment with -ff2c, as they cannot be written in Fortran 77.
	* trans-types.cc (gfc_return_by_reference): Likewise.
	(gfc_get_function_type): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/104312
	* gfortran.dg/entry_26.f90: New test.
	* gfortran.dg/entry_27.f90: New test.
---
 gcc/fortran/resolve.cc | 19 +++-
 gcc/fortran/trans-expr.cc  |  2 +
 gcc/fortran/trans-types.cc |  4 ++
 gcc/testsuite/gfortran.dg/entry_26.f90 | 64 ++
 gcc/testsuite/gfortran.dg/entry_27.f90 | 64 ++
 5 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/entry_26.f90
 create mode 100644 gcc/testsuite/gfortran.dg/entry_27.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 6e42397c2ea..58013d48dff 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -702,7 +702,8 @@ resolve_entries (gfc_namespace *ns)
   gfc_code *c;
   gfc_symbol *proc;
   gfc_entry_list *el;
-  char name[GFC_MAX_SYMBOL_LEN + 1];
+  /* Provide sufficient space to hold "master.%d.%s".  */
+  char name[GFC_MAX_SYMBOL_LEN + 1 + 18];
   static int master_count = 0;

   if (ns->proc_name == NULL)
@@ -827,6 +828,9 @@ resolve_entries (gfc_namespace *ns)
 			"entries returning variables of different "
 			"string lengths", ns->entries->sym->name,
 			>entries->sym->declared_at);
+	  else if (el->sym->result->attr.allocatable
+		   != ns->entries->sym->result->attr.allocatable)
+	break;
 	}

   if (el == NULL)
@@ -838,6 +842,8 @@ resolve_entries (gfc_namespace *ns)
 	gfc_set_array_spec (proc, gfc_copy_array_spec (sym->as), NULL);
 	  if (sym->attr.pointer)
 	gfc_add_pointer (>attr, NULL);
+	  if (sym->attr.allocatable)
+	gfc_add_allocatable (>attr, NULL);
 	}
   else
 	{
@@ -869,6 +875,17 @@ resolve_entries (gfc_namespace *ns)
 			   "FUNCTION %s at %L", sym->name,
 			   ns->entries->sym->name, >declared_at);
 		}
+	  else if (sym->attr.allocatable)
+		{
+		  if (el == ns->entries)
+		gfc_error ("FUNCTION result %s cannot be ALLOCATABLE in "
+			   "FUNCTION %s at %L", sym->name,
+			   ns->entries->sym->name, >declared_at);
+		  else
+		gfc_error ("ENTRY result %s cannot be ALLOCATABLE in "
+			   "FUNCTION %s at %L", sym->name,
+			   ns->entries->sym->name, >declared_at);
+		}
 	  else
 		{
 		  ts = >ts;
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index f052d6b9440..79367fa2ae0 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -7800,6 +7800,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   */
   if (flag_f2c && sym->ts.type == BT_REAL
   && sym->ts.kind == gfc_default_real_kind
+  && !sym->attr.pointer
+  && !sym->attr.allocatable
   && !sym->attr.always_explicit)
 se->expr = fold_convert (gfc_get_real_type (sym->ts.kind), se->expr);

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 9c9489a42bd..fc5c221a301 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -2962,6 +2962,8 @@ gfc_return_by_reference (gfc_symbol * sym)
  require an explicit interface, as no compatibility problems can
  arise there.  */
   if (flag_f2c && sym->ts.type == BT_COMPLEX
+  && !sym->attr.pointer
+  && !sym->attr.allocatable
   && !sym->attr.intrinsic && !sym->attr.always_explicit)
 return 1;

@@ -3273,6 +3275,8 @@ arg_type_list_done:
 type = gfc_get_mixed_entry_union (sym->ns);
   else if (flag_f2c && sym->ts.type == BT_REAL
 	   && sym->ts.kind == gfc_default_real_kind
+	   && !sym->attr.pointer
+	   && !sym->attr.allocatable
 	   && 

libgm2: Adjust 'autogen.sh' to 'ACLOCAL_AMFLAGS', and simplify (was: [PATCH v3 8/19] modula2 front end: libgm2 contents)

2023-04-11 Thread Thomas Schwinge
Hi!

On 2022-12-06T14:47:26+, Gaius Mulley via Gcc-patches 
 wrote:
> This patch set consists of the libgm2 makefile, autoconf sources
> necessary to build the libm2pim, libm2iso, libm2min, libm2cor
> and libm2log.

Notice:

> --- /dev/null 2022-08-24 16:22:16.88870 +0100
> +++ gcc-git-devel-modula2/libgm2/Makefile.am  2022-12-06 02:56:51.428775868 
> +

> +ACLOCAL_AMFLAGS = -I . -I .. -I ../config

... this vs.:

> --- /dev/null 2022-08-24 16:22:16.88870 +0100
> +++ gcc-git-devel-modula2/libgm2/autogen.sh   2022-12-06 02:56:51.432775922 
> +

> +aclocal -I . -I ../config
> +autoreconf -I . -I ../config

... this: for the latter, no explicit '-I ..' before '-I ../config'.
This means that 'autogen.sh' vs. plain 'autoreconf' don't produce
identical results: different order of files found via search paths.
OK to align (and simplify) that as per the attached
"libgm2: Adjust 'autogen.sh' to 'ACLOCAL_AMFLAGS', and simplify"?


Separately, given that plain 'autoreconf' works, why have 'autogen.sh' at
all?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 8fd7ff138c83200ab83df7f08f773e33961550b4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 11 Apr 2023 21:40:14 +0200
Subject: [PATCH] libgm2: Adjust 'autogen.sh' to 'ACLOCAL_AMFLAGS', and
 simplify

Specifying explicit '-I ..' before '-I ../config' is what (most) other GCC
components do.  Specifying '-I .' is not necessary.

With the order of '-I's aligned, 'autogen.sh' and plain 'autoreconf' then
produce identical results.

	libgm2/
	* autogen.sh: For 'aclocal', 'autoreconf', remove '-I .',
	add '-I ..'.
	* Makefile.am (ACLOCAL_AMFLAGS): Remove '-I .'.
	* libm2cor/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
	* libm2iso/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
	* libm2log/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
	* libm2min/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
	* libm2pim/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
	* aclocal.m4: Regenerate.
	* Makefile.in: Likewise.
	* libm2cor/Makefile.in: Likewise.
	* libm2iso/Makefile.in: Likewise.
	* libm2log/Makefile.in: Likewise.
	* libm2min/Makefile.in: Likewise.
	* libm2pim/Makefile.in: Likewise.
---
 libgm2/Makefile.am  |  2 +-
 libgm2/Makefile.in  | 12 ++--
 libgm2/aclocal.m4   | 10 +-
 libgm2/autogen.sh   |  5 ++---
 libgm2/libm2cor/Makefile.am |  2 +-
 libgm2/libm2cor/Makefile.in | 12 ++--
 libgm2/libm2iso/Makefile.am |  2 +-
 libgm2/libm2iso/Makefile.in | 12 ++--
 libgm2/libm2log/Makefile.am |  2 +-
 libgm2/libm2log/Makefile.in | 12 ++--
 libgm2/libm2min/Makefile.am |  2 +-
 libgm2/libm2min/Makefile.in | 12 ++--
 libgm2/libm2pim/Makefile.am |  2 +-
 libgm2/libm2pim/Makefile.in | 12 ++--
 14 files changed, 49 insertions(+), 50 deletions(-)

diff --git a/libgm2/Makefile.am b/libgm2/Makefile.am
index 0b593f6ff216..95df3ed7a301 100644
--- a/libgm2/Makefile.am
+++ b/libgm2/Makefile.am
@@ -25,7 +25,7 @@ AUTOMAKE_OPTIONS = 1.8 foreign
 
 SUFFIXES = .c .mod .def .o .obj .lo .a
 
-ACLOCAL_AMFLAGS = -I . -I .. -I ../config
+ACLOCAL_AMFLAGS = -I .. -I ../config
 
 # Multilib support.
 MAKEOVERRIDES=
diff --git a/libgm2/Makefile.in b/libgm2/Makefile.in
index da2ec7c2a098..d9950065de19 100644
--- a/libgm2/Makefile.in
+++ b/libgm2/Makefile.in
@@ -90,15 +90,15 @@ host_triplet = @host@
 target_triplet = @target@
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/../libtool.m4 \
-	$(top_srcdir)/../ltoptions.m4 $(top_srcdir)/../ltsugar.m4 \
-	$(top_srcdir)/../ltversion.m4 $(top_srcdir)/../lt~obsolete.m4 \
-	$(top_srcdir)/../config/acx.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
 	$(top_srcdir)/../config/depstand.m4 \
 	$(top_srcdir)/../config/lead-dot.m4 \
 	$(top_srcdir)/../config/multi.m4 \
 	$(top_srcdir)/../config/no-executables.m4 \
-	$(top_srcdir)/../config/override.m4 $(top_srcdir)/configure.ac
+	$(top_srcdir)/../config/override.m4 \
+	$(top_srcdir)/../libtool.m4 $(top_srcdir)/../ltoptions.m4 \
+	$(top_srcdir)/../ltsugar.m4 $(top_srcdir)/../ltversion.m4 \
+	$(top_srcdir)/../lt~obsolete.m4 $(top_srcdir)/configure.ac
 am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
 	$(ACLOCAL_M4)
 DIST_COMMON = $(srcdir)/Makefile.am $(top_srcdir)/configure \
@@ -332,7 +332,7 @@ top_srcdir = @top_srcdir@
 # Modula-2 support.
 AUTOMAKE_OPTIONS = 1.8 foreign
 SUFFIXES = .c .mod .def .o .obj .lo .a
-ACLOCAL_AMFLAGS = -I . -I .. -I ../config
+ACLOCAL_AMFLAGS = -I .. -I ../config
 
 # Multilib support.
 MAKEOVERRIDES = 
diff --git a/libgm2/aclocal.m4 b/libgm2/aclocal.m4
index c352303012d2..832065fbb9be 100644
--- a/libgm2/aclocal.m4
+++ b/libgm2/aclocal.m4
@@ -1187,14 +1187,14 @@ AC_SUBST([am__tar])
 AC_SUBST([am__untar])
 ]) 

[PATCH] RISC-V: Clean up the pr106602.c testcase

2023-04-11 Thread Palmer Dabbelt
The test case that was added is rv64i-specific, as there's better ways
to generate this code on rv32i (where the long/int cast is a NOP) and on
rv64i_zba (where we have word shifts).  This renames the original test
case and adds two more for those targets.

gcc/testsuite/ChangeLog:
PR target/106602
* gcc.target/riscv/pr106602.c: Moved to...
* gcc.target/riscv/pr106602-rv64i.c: ...here.
* gcc.target/riscv/pr106602-rv32i.c: New test.
* gcc.target/riscv/pr106602-rv64i_zba.c: New test.
---
The test suite is still running, but it looks like it's made it past
these so I figured I'd send it now as otherwise I might forget.

OK for trunk?  (assuming the tests finish and pass)
---
 gcc/testsuite/gcc.target/riscv/pr106602-rv32i.c   | 14 ++
 .../riscv/{pr106602.c => pr106602-rv64i.c}|  2 +-
 .../gcc.target/riscv/pr106602-rv64i_zba.c | 15 +++
 3 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr106602-rv32i.c
 rename gcc/testsuite/gcc.target/riscv/{pr106602.c => pr106602-rv64i.c} (88%)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr106602-rv64i_zba.c

diff --git a/gcc/testsuite/gcc.target/riscv/pr106602-rv32i.c 
b/gcc/testsuite/gcc.target/riscv/pr106602-rv32i.c
new file mode 100644
index 000..05b54db7486
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr106602-rv32i.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { riscv64*-*-* } } } */
+/* { dg-options "-O2 -march=rv32i -mabi=ilp32" } */
+
+unsigned long
+foo2 (unsigned long a)
+{
+  return (unsigned long)(unsigned int) a << 6;
+}
+
+/* { dg-final { scan-assembler-times "slli\t" 1 } } */
+/* { dg-final { scan-assembler-not "srli\t" } } */
+/* { dg-final { scan-assembler-not "\tli\t" } } */
+/* { dg-final { scan-assembler-not "addi\t" } } */
+/* { dg-final { scan-assembler-not "and\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr106602.c 
b/gcc/testsuite/gcc.target/riscv/pr106602-rv64i.c
similarity index 88%
rename from gcc/testsuite/gcc.target/riscv/pr106602.c
rename to gcc/testsuite/gcc.target/riscv/pr106602-rv64i.c
index 825b1a143b5..ef0719f4a9a 100644
--- a/gcc/testsuite/gcc.target/riscv/pr106602.c
+++ b/gcc/testsuite/gcc.target/riscv/pr106602-rv64i.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { riscv64*-*-* } } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -march=rv64i -mabi=lp64" } */
 
 unsigned long
 foo2 (unsigned long a)
diff --git a/gcc/testsuite/gcc.target/riscv/pr106602-rv64i_zba.c 
b/gcc/testsuite/gcc.target/riscv/pr106602-rv64i_zba.c
new file mode 100644
index 000..23b9f1e60f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr106602-rv64i_zba.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { riscv64*-*-* } } } */
+/* { dg-options "-O2 -march=rv64i_zba -mabi=lp64" } */
+
+unsigned long
+foo2 (unsigned long a)
+{
+  return (unsigned long)(unsigned int) a << 6;
+}
+
+/* { dg-final { scan-assembler-times "slli.uw\t" 1 } } */
+/* { dg-final { scan-assembler-not "slli\t" } } */
+/* { dg-final { scan-assembler-not "srli\t" } } */
+/* { dg-final { scan-assembler-not "\tli\t" } } */
+/* { dg-final { scan-assembler-not "addi\t" } } */
+/* { dg-final { scan-assembler-not "and\t" } } */
-- 
2.39.2



Re: Fix ICEs related to VM types in C [PR106465, PR107557, PR108424, PR109450]

2023-04-11 Thread Andrew Pinski via Gcc-patches
On Tue, Apr 11, 2023 at 11:47 AM Martin Uecker via Gcc-patches
 wrote:
>
>
>
> Ok, here is another attempt on fixing issues with size expression.
> Not all are regressions, but it does not make sense to try to split
> it up.

They might be regressions still from pre gimple (3.4 and before),
though I wonder how much value considering something which didn't work
in GCC 4.0+ these days as a regression.
Also splitting them up if possible is always welcomed to be able to
review each thing separately.

Thanks,
Andrew

>
> Martin
>
>
>
> Fix ICEs related to VM types in C [PR106465, PR107557, PR108424, 
> PR109450]
>
> Size expressions were sometimes lost and not gimplified correctly, 
> leading to
> ICEs and incorrect evaluation order.  Fix this by 1) not recursing 
> into
> pointers when gimplifying parameters in the middle-end (the code is 
> merged with
> gimplify_type_sizes), which is incorrect because it might access 
> variables
> declared later for incomplete structs, and 2) tracking size 
> expressions for
> struct/union members correctly, 3) emitting code to evaluate size 
> expressions
> for missing cases (nested functions, empty declarations, and 
> structs/unions).
>
> PR c/106465
> PR c/107557
> PR c/108423
> PR c/109450
>
> gcc/
> * c/c-decl.cc (start_decl): Make sure size expression are
> evaluated only in correct context.
> (grokdeclarator): Size expression in fields may need a bind
> expression, make sure DECL_EXPR is always created.
> (grokfield, declspecs_add_type): Pass along size expressions.
> (finish_struct): Remove unneeded DECL_EXPR.
> (start_function): Evaluate size expressions for nested functions.
> * c/c-parser.cc (c_parser_struct_declarations,
> c_parser_struct_or_union_specifier): Pass along size expressions.
> (c_parser_declaration_or_fndef): Evaluate size expression.
> (c_parser_objc_at_property_declaration,
> c_parser_objc_class_instance_variables): Adapt.
> * function.cc (gimplify_parm_type): Remove function.
> (gimplify_parameters): Call gimplify_parm_sizes.
> * gimplify.cc (gimplify_type_sizes): Make function static.
> (gimplify_parm_sizes): New function.
>
> gcc/testsuite/
> * gcc.dg/nested-vla-1.c: New test.
> * gcc.dg/nested-vla-2.c: New test.
> * gcc.dg/nested-vla-3.c: New test.
> * gcc.dg/pr106465.c: New test.
> * gcc.dg/pr107557-1.c: New test.
> * gcc.dg/pr107557-2.c: New test.
> * gcc.dg/pr108423-1.c: New test.
> * gcc.dg/pr108423-2.c: New test.
> * gcc.dg/pr108423-3.c: New test.
> * gcc.dg/pr108423-4.c: New test.
> * gcc.dg/pr108423-5.c: New test.
> * gcc.dg/pr108423-6.c: New test.
> * gcc.dg/pr109450-1.c: New test.
> * gcc.dg/pr109450-2.c: New test.
> * gcc.dg/typename-vla-2.c: New test.
> * gcc.dg/typename-vla-3.c: New test.
> * gcc.dg/typename-vla-4.c: New test.
> * gcc.misc-tests/gcov-pr85350.c: Adapt.
>
> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> index e537d33f398..c76cbb3115f 100644
> --- a/gcc/c/c-decl.cc
> +++ b/gcc/c/c-decl.cc
> @@ -5371,7 +5371,8 @@ start_decl (struct c_declarator *declarator, struct 
> c_declspecs *declspecs,
>  if (lastdecl != error_mark_node)
>*lastloc = DECL_SOURCE_LOCATION (lastdecl);
>
> -  if (expr)
> +  /* Make sure the size expression is evaluated at this point.  */
> +  if (expr && !current_scope->parm_flag)
>  add_stmt (fold_convert (void_type_node, expr));
>
>if (TREE_CODE (decl) != FUNCTION_DECL && MAIN_NAME_P (DECL_NAME (decl))
> @@ -7500,7 +7501,8 @@ grokdeclarator (const struct c_declarator *declarator,
> && c_type_variably_modified_p (type))
>   {
> tree bind = NULL_TREE;
> -   if (decl_context == TYPENAME || decl_context == PARM)
> +   if (decl_context == TYPENAME || decl_context == PARM
> +   || decl_context == FIELD)
>   {
> bind = build3 (BIND_EXPR, void_type_node, NULL_TREE,
>NULL_TREE, NULL_TREE);
> @@ -7509,10 +7511,9 @@ grokdeclarator (const struct c_declarator *declarator,
> push_scope ();
>   }
> tree decl = build_decl (loc, TYPE_DECL, NULL_TREE, type);
> -   DECL_ARTIFICIAL (decl) = 1;
> -   pushdecl (decl);
> -   finish_decl (decl, loc, NULL_TREE, NULL_TREE, NULL_TREE);
> +   add_stmt (build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, 
> decl));
> TYPE_NAME (type) = decl;
> +
> if (bind)
>   {
> pop_scope ();
> @@ -8711,7 +8712,7 @@ start_struct (location_t loc, enum tree_code code, tree 
> 

Fix ICEs related to VM types in C [PR106465, PR107557, PR108424, PR109450]

2023-04-11 Thread Martin Uecker via Gcc-patches



Ok, here is another attempt on fixing issues with size expression.
Not all are regressions, but it does not make sense to try to split
it up.

Martin



Fix ICEs related to VM types in C [PR106465, PR107557, PR108424, 
PR109450]

Size expressions were sometimes lost and not gimplified correctly, 
leading to
ICEs and incorrect evaluation order.  Fix this by 1) not recursing into
pointers when gimplifying parameters in the middle-end (the code is 
merged with
gimplify_type_sizes), which is incorrect because it might access 
variables
declared later for incomplete structs, and 2) tracking size expressions 
for
struct/union members correctly, 3) emitting code to evaluate size 
expressions
for missing cases (nested functions, empty declarations, and 
structs/unions).

PR c/106465
PR c/107557
PR c/108423
PR c/109450

gcc/
* c/c-decl.cc (start_decl): Make sure size expression are
evaluated only in correct context.
(grokdeclarator): Size expression in fields may need a bind
expression, make sure DECL_EXPR is always created.
(grokfield, declspecs_add_type): Pass along size expressions.
(finish_struct): Remove unneeded DECL_EXPR.
(start_function): Evaluate size expressions for nested functions.
* c/c-parser.cc (c_parser_struct_declarations,
c_parser_struct_or_union_specifier): Pass along size expressions.
(c_parser_declaration_or_fndef): Evaluate size expression.
(c_parser_objc_at_property_declaration,
c_parser_objc_class_instance_variables): Adapt.
* function.cc (gimplify_parm_type): Remove function.
(gimplify_parameters): Call gimplify_parm_sizes.
* gimplify.cc (gimplify_type_sizes): Make function static.
(gimplify_parm_sizes): New function.

gcc/testsuite/
* gcc.dg/nested-vla-1.c: New test.
* gcc.dg/nested-vla-2.c: New test.
* gcc.dg/nested-vla-3.c: New test.
* gcc.dg/pr106465.c: New test.
* gcc.dg/pr107557-1.c: New test.
* gcc.dg/pr107557-2.c: New test.
* gcc.dg/pr108423-1.c: New test.
* gcc.dg/pr108423-2.c: New test.
* gcc.dg/pr108423-3.c: New test.
* gcc.dg/pr108423-4.c: New test.
* gcc.dg/pr108423-5.c: New test.
* gcc.dg/pr108423-6.c: New test.
* gcc.dg/pr109450-1.c: New test.
* gcc.dg/pr109450-2.c: New test.
* gcc.dg/typename-vla-2.c: New test.
* gcc.dg/typename-vla-3.c: New test.
* gcc.dg/typename-vla-4.c: New test.
* gcc.misc-tests/gcov-pr85350.c: Adapt.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index e537d33f398..c76cbb3115f 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5371,7 +5371,8 @@ start_decl (struct c_declarator *declarator, struct 
c_declspecs *declspecs,
 if (lastdecl != error_mark_node)
   *lastloc = DECL_SOURCE_LOCATION (lastdecl);
 
-  if (expr)
+  /* Make sure the size expression is evaluated at this point.  */
+  if (expr && !current_scope->parm_flag)
 add_stmt (fold_convert (void_type_node, expr));
 
   if (TREE_CODE (decl) != FUNCTION_DECL && MAIN_NAME_P (DECL_NAME (decl))
@@ -7500,7 +7501,8 @@ grokdeclarator (const struct c_declarator *declarator,
&& c_type_variably_modified_p (type))
  {
tree bind = NULL_TREE;
-   if (decl_context == TYPENAME || decl_context == PARM)
+   if (decl_context == TYPENAME || decl_context == PARM
+   || decl_context == FIELD)
  {
bind = build3 (BIND_EXPR, void_type_node, NULL_TREE,
   NULL_TREE, NULL_TREE);
@@ -7509,10 +7511,9 @@ grokdeclarator (const struct c_declarator *declarator,
push_scope ();
  }
tree decl = build_decl (loc, TYPE_DECL, NULL_TREE, type);
-   DECL_ARTIFICIAL (decl) = 1;
-   pushdecl (decl);
-   finish_decl (decl, loc, NULL_TREE, NULL_TREE, NULL_TREE);
+   add_stmt (build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, 
decl));
TYPE_NAME (type) = decl;
+
if (bind)
  {
pop_scope ();
@@ -8711,7 +8712,7 @@ start_struct (location_t loc, enum tree_code code, tree 
name,
 tree
 grokfield (location_t loc,
   struct c_declarator *declarator, struct c_declspecs *declspecs,
-  tree width, tree *decl_attrs)
+  tree width, tree *decl_attrs, tree *expr)
 {
   tree value;
 
@@ -8768,7 +8769,7 @@ grokfield (location_t loc,
 }
 
   value = grokdeclarator (declarator, declspecs, FIELD, false,
- width ?  : NULL, decl_attrs, NULL, NULL,
+ width ?  : NULL, decl_attrs, expr, NULL,
  DEPRECATED_NORMAL);
 
   finish_decl (value, loc, NULL_TREE, 

Re: [PATCH] RISC-V: avoid splitting small constant in i_extrabit pattern

2023-04-11 Thread Jeff Law via Gcc-patches




On 4/9/23 23:07, Lin Sinan via Gcc-patches wrote:

From: Sinan Lin 

there is no need to split an xori/ori with an small constant. take the test
case `int foo(int idx) { return idx|3; }` as an example,

rv64im_zba generates:
 ori a0,a0,3
 ret
but, rv64im_zba_zbs generates:
 ori a0,a0,1
 ori a0,a0,2
 ret

with this change, insn `ori r2,r1,3` will not be splitted in zbs.
As noted, this was actually a regression.  I've bootstrapped this change 
on rv64 with no regressions and pushed it to the trunk.


Thanks!

jeff


Re: [PATCH] RISC-V: avoid splitting small constant in i_extrabit pattern

2023-04-11 Thread Jeff Law via Gcc-patches




On 4/10/23 14:59, Philipp Tomsich wrote:

On Mon, 10 Apr 2023 at 17:57, Jeff Law  wrote:




On 4/9/23 23:07, Lin Sinan via Gcc-patches wrote:

From: Sinan Lin 

there is no need to split an xori/ori with an small constant. take the test
case `int foo(int idx) { return idx|3; }` as an example,

rv64im_zba generates:
  ori a0,a0,3
  ret
but, rv64im_zba_zbs generates:
  ori a0,a0,1
  ori a0,a0,2
  ret

with this change, insn `ori r2,r1,3` will not be splitted in zbs.
---
   gcc/config/riscv/predicates.md |  2 +-
   .../gcc.target/riscv/zbs-extra-bit-or-twobits.c| 14 ++
   2 files changed, 15 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-extra-bit-or-twobits.c

A minor oversight in the VRULL patches in this space.  This is actually
a regression as we were previously generating the single [xo]ri.


Thanks for catching this one!

I looked this change over and it looks fine.  I hope this is the last
fallout from this set of changes.
Just for completeness, I bootstrapped and regression tested Sinan's 
change on rv64.  Given it's a regression, I'm going to go ahead and 
commit it to the trunk momentarily.


jeff


Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-11 Thread Segher Boessenkool
On Tue, Apr 11, 2023 at 05:40:09PM +0800, Kewen.Lin wrote:
> on 2023/4/11 17:14, guojiufu wrote:
> > Thanks for raising this concern.
> > The behavior to check about bif on FLOAT128_HW and emit an error message for
> > requirements on quad-precision is added in gcc12. This is why gcc12 fails to
> > compile the case on -m32.
> > 
> > Before gcc12, altivec_resolve_overloaded_builtin will return the overloaded
> > result directly, and does not check more about the result function.
> 
> Thanks for checking, I wonder which commit caused this behavior change and 
> what's
> the underlying justification?  I know there is one new bif handling framework
> introduced in gcc12, not sure the checking condition was changed together or 
> by
> a standalone commit.  Anyway, apparently the conditions for the support of 
> these
> bifs are different on gcc-11 and gcc-12, I wonder why it changed.  As 
> mentioned
> above, PR108758's c#1 said this case (bifs) work well on gcc-11, I suspected 
> the
> condition change was an overkill, that's why I asked.

It almost certainly was an oversight.  The new builtin framework changed
so many things, there was bound to be some breakage to go with all the
good things it brought.

So what is the actual thing going wrong?  QP insns work fine and are
valid on all systems and environments, BE or LE, 32-bit or 64-bit.  Of
course you cannot use the "long double" type for those everywhere, but
that is a very different thing.


Segher


Re: [PATCH] libstdc++: Implement ranges::enumerate_view from P2164R9

2023-04-11 Thread Jonathan Wakely via Gcc-patches
On Tue, 11 Apr 2023 at 15:59, Patrick Palka via Libstdc++
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk perhaps?

Yes, this is only for C++23 so OK for trunk now.

The auto(x) uses mean this won't work with older versions of Clang,
but that's OK. I already introduced that dependency into
basic_string::resize_for_overwrite, and it just means users of older
Clang versions can't use some C++23 features. They can still use C++20
and lower.

>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (__cpp_lib_ranges_enumerate): Define
> for C++23.
> (__detail::__range_with_movable_reference): Likewise.
> (enumerate_view): Likewise.
> (enumerate_view::_Iterator): Likewise.
> (enumerate_view::_Sentinel): Likewise.
> * include/std/version (__cpp_lib_ranges_enumerate): Likewise.
> * testsuite/std/ranges/version_c++23.cc: Verify value of
> __cpp_lib_ranges_enumerate.
> * testsuite/std/ranges/adaptors/enumerate/1.cc: New test.
> ---
>  libstdc++-v3/include/std/ranges   | 303 ++
>  libstdc++-v3/include/std/version  |   1 +
>  .../std/ranges/adaptors/enumerate/1.cc| 102 ++
>  .../testsuite/std/ranges/version_c++23.cc |   4 +
>  4 files changed, 410 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/enumerate/1.cc
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index 14754c125ff..be71c370eb7 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -8732,6 +8732,309 @@ namespace views::__adaptor
>
>  inline constexpr _AsConst as_const;
>}
> +
> +#define __cpp_lib_ranges_enumerate 202302L
> +
> +  namespace __detail
> +  {
> +template
> +  concept __range_with_movable_reference = input_range<_Range>
> +   && move_constructible>
> +   && move_constructible>;
> +  }
> +
> +  template
> +requires __detail::__range_with_movable_reference<_Vp>
> +  class enumerate_view : public view_interface>
> +  {
> +_Vp _M_base = _Vp();
> +
> +template class _Iterator;
> +template class _Sentinel;
> +
> +  public:
> +enumerate_view() requires default_initializable<_Vp> = default;
> +
> +constexpr explicit
> +enumerate_view(_Vp __base)
> +: _M_base(std::move(__base))
> +{ }
> +
> +constexpr auto
> +begin() requires (!__detail::__simple_view<_Vp>)
> +{ return _Iterator(ranges::begin(_M_base), 0); }
> +
> +constexpr auto
> +begin() const requires __detail::__range_with_movable_reference _Vp>
> +{ return _Iterator(ranges::begin(_M_base), 0); }
> +
> +constexpr auto
> +end() requires (!__detail::__simple_view<_Vp>)
> +{
> +  if constexpr (common_range<_Vp> && sized_range<_Vp>)
> +   return _Iterator(ranges::end(_M_base), 
> ranges::distance(_M_base));
> +  else
> +   return _Sentinel(ranges::end(_M_base));
> +}
> +
> +constexpr auto
> +end() const requires __detail::__range_with_movable_reference
> +{
> +  if constexpr (common_range && sized_range)
> +   return _Iterator(ranges::end(_M_base), 
> ranges::distance(_M_base));
> +  else
> +   return _Sentinel(ranges::end(_M_base));
> +}
> +
> +constexpr auto
> +size() requires sized_range<_Vp>
> +{ return ranges::size(_M_base); }
> +
> +constexpr auto
> +size() const requires sized_range
> +{ return ranges::size(_M_base); }
> +
> +constexpr _Vp
> +base() const & requires copy_constructible<_Vp>
> +{ return _M_base; }
> +
> +constexpr _Vp
> +base() &&
> +{ return std::move(_M_base); }
> +  };
> +
> +  template
> +enumerate_view(_Range&&) -> enumerate_view>;
> +
> +  template
> +inline constexpr bool enable_borrowed_range>
> +  = enable_borrowed_range<_Tp>;
> +
> +  template
> +  requires __detail::__range_with_movable_reference<_Vp>
> +  template
> +  class enumerate_view<_Vp>::_Iterator
> +  {
> +using _Base = __maybe_const_t<_Const, _Vp>;
> +
> +static auto
> +_S_iter_concept()
> +{
> +  if constexpr (random_access_range<_Base>)
> +   return random_access_iterator_tag{};
> +  else if constexpr (bidirectional_range<_Base>)
> +   return bidirectional_iterator_tag{};
> +  else if constexpr (forward_range<_Base>)
> +   return forward_iterator_tag{};
> +  else
> +   return input_iterator_tag{};
> +}
> +
> +friend enumerate_view;
> +
> +  public:
> +using iterator_category = input_iterator_tag;
> +using iterator_concept = decltype(_S_iter_concept());
> +using difference_type = range_difference_t<_Base>;
> +using value_type = tuple>;
> +
> +  private:
> +using __reference_type = tuple range_reference_t<_Base>>;
> +
> +iterator_t<_Base> _M_current = iterator_t<_Base>();
> +difference_type _M_pos = 0;
> +
> +constexpr explicit
> +_Iterator(iterator_t<_Base> 

Re: [PATCH] libstdc++: Implement LWG 3904 change to lazy_split_view's iterator

2023-04-11 Thread Jonathan Wakely via Gcc-patches
On Tue, 11 Apr 2023 at 15:58, Patrick Palka via Libstdc++
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/12?

OK for all, thanks.

(This hasn't been approved by LWG yet, but it should be soon.)

>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (lazy_split_view::_OuterIter::_OuterIter):
> Propagate _M_trailing_empty in the const-converting constructor
> as per LWG 3904.
> * testsuite/std/ranges/adaptors/adjacent/1.cc (test04): Correct
> assertion.
> * testsuite/std/ranges/adaptors/lazy_split.cc (test12): New test.
> ---
>  libstdc++-v3/include/std/ranges  |  3 ++-
>  .../testsuite/std/ranges/adaptors/adjacent/1.cc  |  2 +-
>  .../testsuite/std/ranges/adaptors/lazy_split.cc  | 16 
>  3 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index b230ebefcf5..26a9f2a6872 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -3209,7 +3209,8 @@ namespace views::__adaptor
>   _OuterIter(_OuterIter __i)
> requires _Const
>   && convertible_to, iterator_t<_Base>>
> -   : _M_parent(__i._M_parent), _M_current(std::move(__i._M_current))
> +   : _M_parent(__i._M_parent), _M_current(std::move(__i._M_current)),
> + _M_trailing_empty(__i._M_trailing_empty)
>   { }
>
>   constexpr value_type
> diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/adjacent/1.cc 
> b/libstdc++-v3/testsuite/std/ranges/adaptors/adjacent/1.cc
> index 443c1fbf450..19640abfe93 100644
> --- a/libstdc++-v3/testsuite/std/ranges/adaptors/adjacent/1.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/adaptors/adjacent/1.cc
> @@ -107,7 +107,7 @@ test04()
>// PR libstdc++/106798
>auto r = views::single(0) | views::lazy_split(0) | views::pairwise;
>decltype(ranges::cend(r)) s = r.end();
> -  VERIFY( r.begin() == s );
> +  VERIFY( r.begin() != s );
>
>return true;
>  }
> diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc 
> b/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc
> index 9df6b3b66a6..4e5c0dc3ed5 100644
> --- a/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -218,6 +219,20 @@ test11()
>static_assert(ranges::distance(views::lazy_split("text"sv, ""sv)) == 4);
>  }
>
> +constexpr bool
> +test12()
> +{
> +  // LWG 3904
> +  auto r = views::single(0) | views::lazy_split(0);
> +  auto i = r.begin();
> +  ++i;
> +  VERIFY( i != r.end() );
> +  decltype(std::as_const(r).begin()) j = i;
> +  VERIFY( j != r.end() );
> +
> +  return true;
> +}
> +
>  int
>  main()
>  {
> @@ -232,4 +247,5 @@ main()
>test09();
>test10();
>test11();
> +  static_assert(test12());
>  }
> --
> 2.40.0.315.g0607f793cb
>



[PATCH] libstdc++: Implement ranges::enumerate_view from P2164R9

2023-04-11 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk perhaps?

libstdc++-v3/ChangeLog:

* include/std/ranges (__cpp_lib_ranges_enumerate): Define
for C++23.
(__detail::__range_with_movable_reference): Likewise.
(enumerate_view): Likewise.
(enumerate_view::_Iterator): Likewise.
(enumerate_view::_Sentinel): Likewise.
* include/std/version (__cpp_lib_ranges_enumerate): Likewise.
* testsuite/std/ranges/version_c++23.cc: Verify value of
__cpp_lib_ranges_enumerate.
* testsuite/std/ranges/adaptors/enumerate/1.cc: New test.
---
 libstdc++-v3/include/std/ranges   | 303 ++
 libstdc++-v3/include/std/version  |   1 +
 .../std/ranges/adaptors/enumerate/1.cc| 102 ++
 .../testsuite/std/ranges/version_c++23.cc |   4 +
 4 files changed, 410 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/enumerate/1.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 14754c125ff..be71c370eb7 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -8732,6 +8732,309 @@ namespace views::__adaptor
 
 inline constexpr _AsConst as_const;
   }
+
+#define __cpp_lib_ranges_enumerate 202302L
+
+  namespace __detail
+  {
+template
+  concept __range_with_movable_reference = input_range<_Range>
+   && move_constructible>
+   && move_constructible>;
+  }
+
+  template
+requires __detail::__range_with_movable_reference<_Vp>
+  class enumerate_view : public view_interface>
+  {
+_Vp _M_base = _Vp();
+
+template class _Iterator;
+template class _Sentinel;
+
+  public:
+enumerate_view() requires default_initializable<_Vp> = default;
+
+constexpr explicit
+enumerate_view(_Vp __base)
+: _M_base(std::move(__base))
+{ }
+
+constexpr auto
+begin() requires (!__detail::__simple_view<_Vp>)
+{ return _Iterator(ranges::begin(_M_base), 0); }
+
+constexpr auto
+begin() const requires __detail::__range_with_movable_reference
+{ return _Iterator(ranges::begin(_M_base), 0); }
+
+constexpr auto
+end() requires (!__detail::__simple_view<_Vp>)
+{
+  if constexpr (common_range<_Vp> && sized_range<_Vp>)
+   return _Iterator(ranges::end(_M_base), 
ranges::distance(_M_base));
+  else
+   return _Sentinel(ranges::end(_M_base));
+}
+
+constexpr auto
+end() const requires __detail::__range_with_movable_reference
+{
+  if constexpr (common_range && sized_range)
+   return _Iterator(ranges::end(_M_base), ranges::distance(_M_base));
+  else
+   return _Sentinel(ranges::end(_M_base));
+}
+
+constexpr auto
+size() requires sized_range<_Vp>
+{ return ranges::size(_M_base); }
+
+constexpr auto
+size() const requires sized_range
+{ return ranges::size(_M_base); }
+
+constexpr _Vp
+base() const & requires copy_constructible<_Vp>
+{ return _M_base; }
+
+constexpr _Vp
+base() &&
+{ return std::move(_M_base); }
+  };
+
+  template
+enumerate_view(_Range&&) -> enumerate_view>;
+
+  template
+inline constexpr bool enable_borrowed_range>
+  = enable_borrowed_range<_Tp>;
+
+  template
+  requires __detail::__range_with_movable_reference<_Vp>
+  template
+  class enumerate_view<_Vp>::_Iterator
+  {
+using _Base = __maybe_const_t<_Const, _Vp>;
+
+static auto
+_S_iter_concept()
+{
+  if constexpr (random_access_range<_Base>)
+   return random_access_iterator_tag{};
+  else if constexpr (bidirectional_range<_Base>)
+   return bidirectional_iterator_tag{};
+  else if constexpr (forward_range<_Base>)
+   return forward_iterator_tag{};
+  else
+   return input_iterator_tag{};
+}
+
+friend enumerate_view;
+
+  public:
+using iterator_category = input_iterator_tag;
+using iterator_concept = decltype(_S_iter_concept());
+using difference_type = range_difference_t<_Base>;
+using value_type = tuple>;
+
+  private:
+using __reference_type = tuple>;
+
+iterator_t<_Base> _M_current = iterator_t<_Base>();
+difference_type _M_pos = 0;
+
+constexpr explicit
+_Iterator(iterator_t<_Base> __current, difference_type __pos)
+: _M_current(std::move(__current)), _M_pos(__pos)
+{ }
+
+  public:
+_Iterator() requires default_initializable> = default;
+
+constexpr
+_Iterator(_Iterator __i)
+requires _Const && convertible_to, iterator_t<_Base>>
+: _M_current(std::move(__i._M_current)), _M_pos(__i._M_pos)
+{ }
+
+constexpr const iterator_t<_Base> &
+base() const & noexcept
+{ return _M_current; }
+
+constexpr iterator_t<_Base>
+base() &&
+{ return std::move(_M_current); }
+
+constexpr difference_type
+index() const noexcept
+{ return _M_pos; }
+
+constexpr auto
+operator*() const
+{ return __reference_type(_M_pos, 

[PATCH] libstdc++: Implement LWG 3904 change to lazy_split_view's iterator

2023-04-11 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/12?

libstdc++-v3/ChangeLog:

* include/std/ranges (lazy_split_view::_OuterIter::_OuterIter):
Propagate _M_trailing_empty in the const-converting constructor
as per LWG 3904.
* testsuite/std/ranges/adaptors/adjacent/1.cc (test04): Correct
assertion.
* testsuite/std/ranges/adaptors/lazy_split.cc (test12): New test.
---
 libstdc++-v3/include/std/ranges  |  3 ++-
 .../testsuite/std/ranges/adaptors/adjacent/1.cc  |  2 +-
 .../testsuite/std/ranges/adaptors/lazy_split.cc  | 16 
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index b230ebefcf5..26a9f2a6872 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3209,7 +3209,8 @@ namespace views::__adaptor
  _OuterIter(_OuterIter __i)
requires _Const
  && convertible_to, iterator_t<_Base>>
-   : _M_parent(__i._M_parent), _M_current(std::move(__i._M_current))
+   : _M_parent(__i._M_parent), _M_current(std::move(__i._M_current)),
+ _M_trailing_empty(__i._M_trailing_empty)
  { }
 
  constexpr value_type
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/adjacent/1.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/adjacent/1.cc
index 443c1fbf450..19640abfe93 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/adjacent/1.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/adjacent/1.cc
@@ -107,7 +107,7 @@ test04()
   // PR libstdc++/106798
   auto r = views::single(0) | views::lazy_split(0) | views::pairwise;
   decltype(ranges::cend(r)) s = r.end();
-  VERIFY( r.begin() == s );
+  VERIFY( r.begin() != s );
 
   return true;
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc
index 9df6b3b66a6..4e5c0dc3ed5 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -218,6 +219,20 @@ test11()
   static_assert(ranges::distance(views::lazy_split("text"sv, ""sv)) == 4);
 }
 
+constexpr bool
+test12()
+{
+  // LWG 3904
+  auto r = views::single(0) | views::lazy_split(0);
+  auto i = r.begin();
+  ++i;
+  VERIFY( i != r.end() );
+  decltype(std::as_const(r).begin()) j = i;
+  VERIFY( j != r.end() );
+
+  return true;
+}
+
 int
 main()
 {
@@ -232,4 +247,5 @@ main()
   test09();
   test10();
   test11();
+  static_assert(test12());
 }
-- 
2.40.0.315.g0607f793cb



[PATCH, v2] Fortran: resolve correct generic with TYPE(C_PTR) arguments [PR61615]

2023-04-11 Thread Harald Anlauf via Gcc-patches

Hi Jerry, all,

On 4/11/23 02:43, Jerry D via Gcc-patches wrote:

On 4/10/23 1:49 PM, Harald Anlauf via Fortran wrote:

Dear all,

when comparing formal and actual arguments of a procedure, there was no
check of rank for derived types from intrinsic module ISO_C_BINDING.
This could lead to a wrong resolution of generic procedures with dummy
argument of related types, see PR.  This was likely an oversight.

The attached fix is simple and regtests cleanly on x86_64-pc-linux-gnu.

OK for mainline?

Thanks,
Harald



Looks good to go.

Jerry


I actually found a flaw in the previous patch regarding the handling
of rank, and also realized that a comparison of the types was missing
for those from this intrinsic module (and found the related PR99982).

I updated the patch accordingly and extended the testcase, see attached.

Regtests cleanly on x86_64-pc-linux-gnu.

Will wait for 24h for more comments.

Thanks,
Harald

From 3fa9d2ee99afa38f42c267d17aed5266daa22dbc Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 11 Apr 2023 16:44:32 +0200
Subject: [PATCH] Fortran: resolve correct generic with TYPE(C_PTR) arguments
 [PR61615,PR99982]

gcc/fortran/ChangeLog:

	PR fortran/61615
	PR fortran/99982
	* interface.cc (compare_parameter): Enable type and rank checks for
	arguments of derived type from the intrinsic module ISO_C_BINDING.

gcc/testsuite/ChangeLog:

	PR fortran/61615
	PR fortran/99982
	* gfortran.dg/interface_49.f90: New test.
---
 gcc/fortran/interface.cc   | 18 +++-
 gcc/testsuite/gfortran.dg/interface_49.f90 | 95 ++
 2 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/interface_49.f90

diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index db79b104dc2..e9843e9549c 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -2361,7 +2361,23 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual,
   && formal->ts.u.derived && formal->ts.u.derived->ts.is_iso_c
   && actual->ts.type == BT_DERIVED
   && actual->ts.u.derived && actual->ts.u.derived->ts.is_iso_c)
-return true;
+{
+  if (formal->ts.u.derived->intmod_sym_id
+	  != actual->ts.u.derived->intmod_sym_id)
+	return false;
+
+  if (ranks_must_agree
+	  && symbol_rank (formal) != actual->rank
+	  && symbol_rank (formal) != -1)
+	{
+	  if (where)
+	argument_rank_mismatch (formal->name, >where,
+symbol_rank (formal), actual->rank,
+NULL);
+	  return false;
+	}
+  return true;
+}
 
   if (formal->ts.type == BT_CLASS && actual->ts.type == BT_DERIVED)
 /* Make sure the vtab symbol is present when
diff --git a/gcc/testsuite/gfortran.dg/interface_49.f90 b/gcc/testsuite/gfortran.dg/interface_49.f90
new file mode 100644
index 000..aef5e0c6609
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/interface_49.f90
@@ -0,0 +1,95 @@
+! { dg-do run }
+! PR fortran/61615 - resolve correct generic with TYPE(C_PTR) arguments
+! PR fortran/99982 - dto. with C_PTR and C_FUNPTR
+! Contributed by Jacob Abel and Scot Breitenfeld
+
+MODULE foo
+  USE iso_c_binding, only : c_ptr, c_funptr
+  IMPLICIT NONE
+  integer  :: rank = -99
+  character(8) :: ctyp = ""
+  INTERFACE bar
+MODULE PROCEDURE bar_s
+MODULE PROCEDURE bar_a1d
+MODULE PROCEDURE bar_a2d
+MODULE PROCEDURE bar_fp
+MODULE PROCEDURE bar_fp1
+MODULE PROCEDURE bar_fpx
+  END INTERFACE bar
+CONTAINS
+  SUBROUTINE bar_s(a)
+TYPE(c_ptr) :: a
+WRITE (0, *) 'in bar_s'
+rank = 0
+ctyp = "c_ptr"
+  END SUBROUTINE bar_s
+
+  SUBROUTINE bar_a1d(a)
+TYPE(c_ptr) :: a(:)
+WRITE (0, *) 'in bar_a1d'
+rank = 1
+ctyp = "c_ptr"
+  END SUBROUTINE bar_a1d
+
+  SUBROUTINE bar_a2d(a)
+TYPE(c_ptr) :: a(:,:)
+WRITE (0, *) 'in bar_a2d'
+rank = 2
+ctyp = "c_ptr"
+  END SUBROUTINE bar_a2d
+
+  SUBROUTINE bar_fp(a)
+TYPE(c_funptr) :: a
+WRITE (0, *) 'in bar_fp'
+rank = 0
+ctyp = "c_funptr"
+  END SUBROUTINE bar_fp
+
+  SUBROUTINE bar_fp1(a)
+TYPE(c_funptr) :: a(:)
+WRITE (0, *) 'in bar_fp1'
+rank = 1
+ctyp = "c_funptr"
+  END SUBROUTINE bar_fp1
+
+  SUBROUTINE bar_fpx(a, b)
+TYPE(c_funptr) :: a(..)
+TYPE(c_ptr):: b
+WRITE (0, *) 'in bar_fpx'
+rank = -1
+ctyp = "c_funptr"
+  END SUBROUTINE bar_fpx
+END MODULE foo
+
+PROGRAM cptr_array_vs_scalar_arg
+  USE foo
+  USE iso_c_binding, only : c_ptr, c_loc, c_funptr
+  IMPLICIT NONE
+  INTEGER, TARGET :: i
+  TYPE(c_ptr) :: a, b(1), c(1,1)
+  type(c_funptr)  :: fp, fp1(1), fp2(1,1)
+  a= C_LOC(i)
+  b(1) = C_LOC(i)
+  CALL bar(a)
+  if (rank /= 0 .or. ctyp /= "c_ptr") stop 1
+  CALL bar(b)
+  if (rank /= 1 .or. ctyp /= "c_ptr") stop 2
+  CALL bar(c)
+  if (rank /= 2 .or. ctyp /= "c_ptr") stop 3
+  rank = -99
+  ctyp = ""
+  CALL bar((a))
+  if (rank /= 0 .or. ctyp /= "c_ptr") stop 4
+  CALL bar((b))
+  if (rank /= 1 .or. ctyp /= "c_ptr") stop 5
+  rank = -99
+  ctyp = ""
+  CALL bar(fp)
+  

Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]

2023-04-11 Thread Patrick Palka via Gcc-patches
On Thu, 26 Jan 2023, Jason Merrill wrote:

> On 1/25/23 15:35, Patrick Palka wrote:
> > On Tue, 17 Jan 2023, Jason Merrill wrote:
> > 
> > > On 1/9/23 14:25, Patrick Palka via Gcc-patches wrote:
> > > > On Mon, 9 Jan 2023, Patrick Palka wrote:
> > > > 
> > > > > On Wed, 5 Oct 2022, Patrick Palka wrote:
> > > > > 
> > > > > > On Thu, 7 Jul 2022, Jonathan Wakely via Gcc-patches wrote:
> > > > > > 
> > > > > > > This adds a new built-in to replace the recursive class template
> > > > > > > instantiations done by traits such as std::tuple_element and
> > > > > > > std::variant_alternative. The purpose is to select the Nth type
> > > > > > > from a
> > > > > > > list of types, e.g. __builtin_type_pack_element(1, char, int,
> > > > > > > float)
> > > > > > > is
> > > > > > > int.
> > > > > > > 
> > > > > > > For a pathological example tuple_element_t<1000, tuple<2000
> > > > > > > types...>>
> > > > > > > the compilation time is reduced by more than 90% and the memory
> > > > > > > used
> > > > > > > by
> > > > > > > the compiler is reduced by 97%. In realistic examples the gains
> > > > > > > will
> > > > > > > be
> > > > > > > much smaller, but still relevant.
> > > > > > > 
> > > > > > > Clang has a similar built-in, __type_pack_element, but
> > > > > > > that's
> > > > > > > a
> > > > > > > "magic template" built-in using <> syntax, which GCC doesn't
> > > > > > > support.
> > > > > > > So
> > > > > > > this provides an equivalent feature, but as a built-in function
> > > > > > > using
> > > > > > > parens instead of <>. I don't really like the name "type pack
> > > > > > > element"
> > > > > > > (it gives you an element from a pack of types) but the
> > > > > > > semi-consistency
> > > > > > > with Clang seems like a reasonable argument in favour of keeping
> > > > > > > the
> > > > > > > name. I'd be open to alternative names though, e.g.
> > > > > > > __builtin_nth_type
> > > > > > > or __builtin_type_at_index.
> > > > > > 
> > > > > > Rather than giving the trait a different name from
> > > > > > __type_pack_element,
> > > > > > I wonder if we could just special case cp_parser_trait to expect <>
> > > > > > instead of parens for this trait?
> > > > > > 
> > > > > > Btw the frontend recently got a generic TRAIT_TYPE tree code, which
> > > > > > gets
> > > > > > rid of much of the boilerplate of adding a new type-yielding
> > > > > > built-in
> > > > > > trait, see e.g. cp-trait.def.
> > > > > 
> > > > > Here's a tested patch based on Jonathan's original patch that
> > > > > implements
> > > > > the built-in in terms of TRAIT_TYPE, names it __type_pack_element
> > > > > instead of __builtin_type_pack_element, and treats invocations of it
> > > > > like a template-id instead of a call (to match Clang).
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > Subject: [PATCH] c++: Define built-in for std::tuple_element
> > > > > [PR100157]
> > > > > 
> > > > > This adds a new built-in to replace the recursive class template
> > > > > instantiations done by traits such as std::tuple_element and
> > > > > std::variant_alternative.  The purpose is to select the Nth type from
> > > > > a
> > > > > list of types, e.g. __type_pack_element<1, char, int, float> is int.
> > > > > We implement it as a special kind of TRAIT_TYPE.
> > > > > 
> > > > > For a pathological example tuple_element_t<1000, tuple<2000 types...>>
> > > > > the compilation time is reduced by more than 90% and the memory  used
> > > > > by
> > > > > the compiler is reduced by 97%.  In realistic examples the gains will
> > > > > be
> > > > > much smaller, but still relevant.
> > > > > 
> > > > > Unlike the other built-in traits, __type_pack_element uses template-id
> > > > > syntax instead of call syntax and is SFINAE-enabled, matching Clang's
> > > > > implementation.  And like the other built-in traits, it's not
> > > > > mangleable
> > > > > so we can't use it directly in function signatures.
> > > > > 
> > > > > Some caveats:
> > > > > 
> > > > > * Clang's version of the built-in seems to act like a "magic
> > > > > template"
> > > > >   that can e.g. be used as a template template argument.  For
> > > > > simplicity
> > > > >   we implement it in a more ad-hoc way.
> > > > > * Our parsing of the <>'s in __type_pack_element<...> is currently
> > > > >   rudimentary and doesn't try to disambiguate a trailing >> vs > >
> > > > >   as cp_parser_enclosed_template_argument_list does.
> > > > 
> > > > Hmm, this latter caveat turns out to be inconvenient (for code such as
> > > > type_pack_element3.C) and admits an easy workaround inspired by what
> > > > cp_parser_enclosed_template_argument_list does.
> > > > 
> > > > v2: Consider the >> in __type_pack_element<0, int, char>> to be two >'s.
> > > >   Handle non-type TRAIT_TYPE_TYPE1 in strip_typedefs (for sake of
> > > >   CPTK_TYPE_PACK_ELEMENT).
> > > 
> > > Why not use cp_parser_enclosed_template_argument_list directly?
> > 
> > If we used 

Re: [PATCH v5] RISC-V: Fix regression of -fzero-call-used-regs=all

2023-04-11 Thread Kito Cheng via Gcc-patches
Hi Yanzhang:

Thanks, applied to trunk now, and also congrats for your first patch on GCC!

On Tue, Apr 11, 2023 at 8:00 PM Wang, Yanzhang  wrote:
>
> Hi Kito, Juzhe, Jeff,
>
> Thanks for your kindly reviews. I have modified based on the comments and ran 
> the testsuite on my local. Could you please take another look ? If any more 
> comments please let me know.
>
> Thanks
> Yanzhang
>
> > -Original Message-
> > From: Wang, Yanzhang 
> > Sent: Tuesday, April 11, 2023 7:38 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Li, Pan2
> > ; Wang, Yanzhang 
> > Subject: [PATCH v5] RISC-V: Fix regression of -fzero-call-used-regs=all
> >
> > From: Yanzhang Wang 
> >
> > This patch registers a riscv specific function to
> > TARGET_ZERO_CALL_USED_REGS instead of default in targhooks.cc. It will
> > clean gpr and vector relevant registers.
> >
> >   PR 109104
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-protos.h (emit_hard_vlmax_vsetvl):
> >   * config/riscv/riscv-v.cc (emit_hard_vlmax_vsetvl):
> >   (emit_vlmax_vsetvl):
> >   * config/riscv/riscv.cc (vector_zero_call_used_regs):
> >   (riscv_zero_call_used_regs):
> >   (TARGET_ZERO_CALL_USED_REGS):
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zero-scratch-regs-1.c: New test.
> >   * gcc.target/riscv/zero-scratch-regs-2.c: New test.
> >   * gcc.target/riscv/zero-scratch-regs-3.c: New test.
> >
> > Signed-off-by: Yanzhang Wang 
> > Co-authored-by: Pan Li 
> > Co-authored-by: Ju-Zhe Zhong 
> > Co-authored-by: Kito Cheng 
> > ---
> >  gcc/config/riscv/riscv-protos.h   |  1 +
> >  gcc/config/riscv/riscv-v.cc   | 15 +++-
> >  gcc/config/riscv/riscv.cc | 75 +++
> >  .../gcc.target/riscv/zero-scratch-regs-1.c|  9 +++
> >  .../gcc.target/riscv/zero-scratch-regs-2.c| 24 ++
> >  .../gcc.target/riscv/zero-scratch-regs-3.c| 57 ++
> >  6 files changed, 178 insertions(+), 3 deletions(-)  create mode 100644
> > gcc/testsuite/gcc.target/riscv/zero-scratch-regs-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zero-scratch-regs-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zero-scratch-regs-3.c
> >
> > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-
> > protos.h index 4611447ddde..5244e8dcbf0 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -159,6 +159,7 @@ bool check_builtin_call (location_t, vec,
> > unsigned int,  bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
> > HOST_WIDE_INT);  bool legitimize_move (rtx, rtx, machine_mode);  void
> > emit_vlmax_vsetvl (machine_mode, rtx);
> > +void emit_hard_vlmax_vsetvl (machine_mode, rtx);
> >  void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);  void
> > emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);  void
> > emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode); diff --git
> > a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index
> > 2e91d019f6c..392f5d02e17 100644
> > --- a/gcc/config/riscv/riscv-v.cc
> > +++ b/gcc/config/riscv/riscv-v.cc
> > @@ -118,6 +118,17 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT
> > minval,
> > && IN_RANGE (INTVAL (elt), minval, maxval));  }
> >
> > +/* Emit a vlmax vsetvl instruction.  This should only be used when
> > +   optimization is disabled or after vsetvl insertion pass.  */ void
> > +emit_hard_vlmax_vsetvl (machine_mode vmode, rtx vl) {
> > +  unsigned int sew = get_sew (vmode);
> > +  emit_insn (gen_vsetvl (Pmode, vl, RVV_VLMAX, gen_int_mode (sew, Pmode),
> > +  gen_int_mode (get_vlmul (vmode), Pmode), const0_rtx,
> > +  const0_rtx));
> > +}
> > +
> >  void
> >  emit_vlmax_vsetvl (machine_mode vmode, rtx vl)  { @@ -126,9 +137,7 @@
> > emit_vlmax_vsetvl (machine_mode vmode, rtx vl)
> >unsigned int ratio = calculate_ratio (sew, vlmul);
> >
> >if (!optimize)
> > -emit_insn (gen_vsetvl (Pmode, vl, RVV_VLMAX, gen_int_mode (sew,
> > Pmode),
> > -gen_int_mode (get_vlmul (vmode), Pmode), 
> > const0_rtx,
> > -const0_rtx));
> > +emit_hard_vlmax_vsetvl (vmode, vl);
> >else
> >  emit_insn (gen_vlmax_avl (Pmode, vl, gen_int_mode (ratio,
> > Pmode)));  } diff --git a/gcc/config/riscv/riscv.cc
> > b/gcc/config/riscv/riscv.cc index 5f542932d13..a9c9e1aa32b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -7066,6 +7066,78 @@ riscv_shamt_matches_mask_p (int shamt,
> > HOST_WIDE_INT mask)
> >return shamt == ctz_hwi (mask);
> >  }
> >
> > +HARD_REG_SET
> > +vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs) {
> > +  HARD_REG_SET zeroed_hardregs;
> > +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> > +
> > +  /* Find a register to hold vl.  */
> > +  unsigned vl_regno = INVALID_REGNUM;
> > +  /* Skip the first GPR, 

Re: [match.pd] [SVE] Add pattern to transform svrev(svrev(v)) --> v

2023-04-11 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 11 Apr 2023 at 14:17, Richard Biener  wrote:
>
> On Wed, Apr 5, 2023 at 10:39 AM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > Hi,
> > For the following test:
> >
> > svint32_t f(svint32_t v)
> > {
> >   return svrev_s32 (svrev_s32 (v));
> > }
> >
> > We generate 2 rev instructions instead of nop:
> > f:
> > rev z0.s, z0.s
> > rev z0.s, z0.s
> > ret
> >
> > The attached patch tries to fix that by trying to recognize the following
> > pattern in match.pd:
> > v1 = VEC_PERM_EXPR (v0, v0, mask)
> > v2 = VEC_PERM_EXPR (v1, v1, mask)
> > -->
> > v2 = v0
> > if mask is { nelts - 1, nelts - 2, nelts - 3, ... }
> >
> > Code-gen with patch:
> > f:
> > ret
> >
> > Bootstrap+test passes on aarch64-linux-gnu, and SVE bootstrap in progress.
> > Does it look OK for stage-1 ?
>
> I didn't look at the patch but tree-ssa-forwprop.cc:simplify_permutation 
> should
> handle two consecutive permutes with the is_combined_permutation_identity
> which might need tweaking for VLA vectors
Hi Richard,
Thanks for the suggestions. The attached patch modifies
is_combined_permutation_identity
to recognize the above pattern.
Does it look OK ?
Bootstrap+test in progress on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Prathamesh
>
> Richard.
>
> >
> > Thanks,
> > Prathamesh
gcc/ChangeLog:
* tree-ssa-forwprop.cc (is_combined_permutation_identity):
New parameter def_stmt.
Try to simplify two successive VEC_PERM_EXPRs with single operand
and same mask, where mask chooses elements in reverse order.

gcc/testesuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general/rev-1.c: New test.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c
new file mode 100644
index 000..e57ee67d716
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include 
+
+svint32_t f(svint32_t v)
+{
+  return svrev_s32 (svrev_s32 (v));
+}
+
+/* { dg-final { scan-tree-dump "return v_1\\(D\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 9b567440ba4..5cbee077d89 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2532,7 +2532,7 @@ simplify_bitfield_ref (gimple_stmt_iterator *gsi)
gives back one of the input.  */
 
 static int
-is_combined_permutation_identity (tree mask1, tree mask2)
+is_combined_permutation_identity (tree mask1, tree mask2, gimple *def_stmt = 
NULL)
 {
   tree mask;
   unsigned HOST_WIDE_INT nelts, i, j;
@@ -2541,7 +2541,36 @@ is_combined_permutation_identity (tree mask1, tree mask2)
 
   gcc_checking_assert (TREE_CODE (mask1) == VECTOR_CST
   && TREE_CODE (mask2) == VECTOR_CST);
+  if (def_stmt)
+gcc_checking_assert (is_gimple_assign (def_stmt)
+&& gimple_assign_rhs_code (def_stmt) == VEC_PERM_EXPR);
+
   mask = fold_ternary (VEC_PERM_EXPR, TREE_TYPE (mask1), mask1, mask1, mask2);
+
+  /* For VLA masks, check for the following pattern:
+ v1 = VEC_PERM_EXPR (v0, v0, mask)
+ v2 = VEC_PERM_EXPR (v1, v1, mask)
+ -->
+ v2 = v0
+ if mask is {nelts - 1, nelts - 2, ...}.  */
+
+  if (operand_equal_p (mask1, mask2, 0)
+  && !VECTOR_CST_NELTS (mask1).is_constant ()
+  && def_stmt
+  && operand_equal_p (gimple_assign_rhs1 (def_stmt),
+ gimple_assign_rhs2 (def_stmt), 0))
+{
+  vec_perm_builder builder;
+  if (tree_to_vec_perm_builder (, mask1))
+   {
+ poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask1));
+ vec_perm_indices sel (builder, 1, nelts);
+ if (sel.series_p (0, 1, nelts - 1, -1))
+   return 1;
+   }
+  return 0;
+}
+
   if (mask == NULL_TREE || TREE_CODE (mask) != VECTOR_CST)
 return 0;
 
@@ -2629,7 +2658,7 @@ simplify_permutation (gimple_stmt_iterator *gsi)
   op3 = gimple_assign_rhs3 (def_stmt);
   if (TREE_CODE (op3) != VECTOR_CST)
return 0;
-  ident = is_combined_permutation_identity (op3, op2);
+  ident = is_combined_permutation_identity (op3, op2, def_stmt);
   if (!ident)
return 0;
   orig = (ident == 1) ? gimple_assign_rhs1 (def_stmt)


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Kito Cheng via Gcc-patches
Let me give more explanation why RISC-V vector need so many modes than AArch64.

The following will use "RVV" as an abbreviation for "RISC-V Vector"
instructions.

There are two key points here:

- RVV has a concept called LMUL - you can understand that as register
grouping, we can group up to 8 adjacent registers together and then
operate at once, e.g. one vadd can operate on adding two 8-reg groups
at once.
- We have segment load/store that require vector tuple types. -
AArch64 has similar stuffs on both Neon and SVE, e.g. int32x2x2_t or
svint32x2_t.

In order to model LMUL in backend, we have to the combination of
scalar type and LMUL; possible LMUL is 1, 2, 4, 8, 1/2, 1/4, 1/8 - 8
different types of LMUL, and we'll have QI, HI, SI, DI, HF, SF and DF,
so basically we'll have 7 (LMUL type) * 7 (scalar type) here.

Okay, let's talk about tuple type AArch64 also having tuple type, but
why is it not having such a huge number of modes? It mainly cause by
LMUL; use a concrete example to explain why this cause different
design on machine mode, using scalable vector mode with SI mode tuple
here:

AArch64: svint32_t (VNx4SI) svint32x2_t (VNx8SI) svint32x3_t (VNx12SI)
svint32x3_t (VNx16SI)

AArch64 only has up to 3-tuple, but RISC-V could have up to 8-tuple,
so we already have 8 different types for each scalar mode even though
we don't count LMUL concept yet.

RISC-V*: vint32m1_t (VNx4SI) vint32m1x2_t (VNx8SI) vint32m1x3_t
(VNx12SI) vint32m1x4_t (VNx16SI) vint32m1x5_t (VNx20SI) vint32m1x6_t
(VNx24SI) vint32m1x7_t (VNx28SI) vint32m1x8_t (VNx32SI)

Using VLEN=128 as the base type system, you can ignore it if you don't
understand the meaning for now.

And let's consider LMUL now, add LMUL=2 case here, RVV has a
constraint that the LMUL * NF(NF-tuple) must be less or equal to 8, so
we have only 3 extra modes for LMUL=2.

RISC-V*: vint32m2_t (VNx8SI) vint32m2x2_t (VNx16SI) vint32m2x3_t
(VNx24SI) vint32m2x4_t (VNx32SI)

However, there is a big problem RVV have different register constraint
for different LMUL type, LMUL <= 1 can use any register, LMUL=2 type
require register align to multiple-of-2 (v0, v2, …), and LMUL=4 type
requires register align to multiple-of-4 (v0, v4, …).

So vint32m1x2_t (LMUL=1x2) and vint32m2_t (LMUL=2) have the same size
and NUNIT, but they have different register constraint, vint32m1x2_t
is LMUL 1, so we don't have register constraint, but vint32m2_t is
LMUL 2 so it has reg. constraint, it must be aligned to multiple-of-2.

Based on the above reason, those tuple types must have separated
machine mode even if they have the same size and NUNIT.

Why Neon and SVE didn't have such an issue? Because SVE and Neon
didn't have the concept of LMUL, so tuple type in SVE and Neon won't
have two vector types that have the same size but different register
constraints or alignment - one size is one type.

So based on LMUL and register constraint issue of tuple type, we must
have 37 types for vector tuples, and plus 48 modes variable-length
vector mode, and 42 scalar mode - so we have ~140 modes now, it sounds
like still less than 256, so what happened?


RVV has one more thing special thing in our type system due to ISA
design, the minimal vector length of RVV is 32 bit unlike SVE
guarantee, the minimal is 128 bits, so we did some tricks one our type
system is we have a different mode for minimal vector length
(MIN_VLEN) is 32, 64 or large or equal to 128, this design is because
it would be more friendly for vectorizer, and also model things
precisely for better code gen.

e.g.

vint32m1_t is VNx1SI in MIN_VLEN>=32

vint32m1_t is VNx2SI in MIN_VLEN>=64

vint32m1_t is VNx4SI in MIN_VLEN>=128

So actually we will have 37 * 3 modes for vector tuple mode, and now
~210 modes now (the result is little different than JuZhe's number
since I ignore some mode isn't used in C, but it defined in machine
mode due the the current GCC will always define all possible scalar
mode for a vector mode)

We also plan to add some traditional fixed length vector types like
V2SI in future…and apparently 256 mode isn't enough for this plan :(


Fwd: [V6][PATCH 2/2] Update documentation to clarify a GCC extension

2023-04-11 Thread Qing Zhao via Gcc-patches
Hi, Joseph,

This is the 2nd ping to the 6th version of the patch -:)

Please let me know if you have any further comments on the patch, and whether 
it’s Okay to commit it to trunk?

Thanks a lot for the help.

Qing

Begin forwarded message:

From: Qing Zhao via Gcc-patches 
mailto:gcc-patches@gcc.gnu.org>>
Subject: Fwd: [V6][PATCH 2/2] Update documentation to clarify a GCC extension
Date: April 4, 2023 at 9:07:55 AM EDT
To: Joseph Myers mailto:jos...@codesourcery.com>>
Cc: Jakub Jelinek mailto:ja...@redhat.com>>, Richard Biener 
mailto:richard.guent...@gmail.com>>, Kees Cook 
mailto:keesc...@chromium.org>>, Siddhesh Poyarekar 
mailto:siddh...@gotplt.org>>, gcc Patches 
mailto:gcc-patches@gcc.gnu.org>>
Reply-To: Qing Zhao mailto:qing.z...@oracle.com>>

Ping….

Qing

Begin forwarded message:

From: Qing Zhao 
mailto:qing.z...@oracle.com>>
Subject: [PATCH 2/2] Update documentation to clarify a GCC extension
Date: March 28, 2023 at 11:49:44 AM EDT
To: ja...@redhat.com, 
jos...@codesourcery.com
Cc: 
richard.guent...@gmail.com,
 
keesc...@chromium.org,
 siddh...@gotplt.org, 
gcc-patches@gcc.gnu.org,
 Qing Zhao 
mailto:qing.z...@oracle.com>>

on a structure with a C99 flexible array member being nested in
another structure. (PR77650)

"GCC extension accepts a structure containing an ISO C99 "flexible array
member", or a union containing such a structure (possibly recursively)
to be a member of a structure.

There are two situations:

 * A structure or a union with a C99 flexible array member is the last
   field of another structure, for example:

struct flex  { int length; char data[]; };
union union_flex { int others; struct flex f; };

struct out_flex_struct { int m; struct flex flex_data; };
struct out_flex_union { int n; union union_flex flex_data; };

   In the above, both 'out_flex_struct.flex_data.data[]' and
   'out_flex_union.flex_data.f.data[]' are considered as flexible
   arrays too.

 * A structure or a union with a C99 flexible array member is the
   middle field of another structure, for example:

struct flex  { int length; char data[]; };

struct mid_flex { int m; struct flex flex_data; int n; };

   In the above, 'mid_flex.flex_data.data[]' has undefined behavior.
   Compilers do not handle such case consistently, Any code relying on
   such case should be modified to ensure that flexible array members
   only end up at the ends of structures.

   Please use warning option '-Wflex-array-member-not-at-end' to
   identify all such cases in the source code and modify them.  This
   warning will be on by default starting from GCC 14.
"

gcc/c-family/ChangeLog:

* c.opt: New option -Wflex-array-member-not-at-end.

gcc/c/ChangeLog:

* c-decl.cc> 
(finish_struct): Issue warnings for new option.

gcc/ChangeLog:

* doc/extend.texi: Document GCC extension on a structure containing
a flexible array member to be a member of another structure.

gcc/testsuite/ChangeLog:

* gcc.dg/variable-sized-type-flex-array.c: New test.
---
gcc/c-family/c.opt|  5 +++
gcc/c/c-decl.cc> 
  |  9 
gcc/doc/extend.texi   | 45 ++-
.../gcc.dg/variable-sized-type-flex-array.c   | 31 +
4 files changed, 89 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.dg/variable-sized-type-flex-array.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cddeece..c26d9801b63 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -737,6 +737,11 @@ Wformat-truncation=
C ObjC C++ LTO ObjC++ Joined RejectNegative UInteger Var(warn_format_trunc) 
Warning LangEnabledBy(C ObjC C++ LTO ObjC++,Wformat=, warn_format >= 1, 0) 
IntegerRange(0, 2)
Warn about calls to snprintf and similar functions that truncate output.

+Wflex-array-member-not-at-end
+C C++ Var(warn_flex_array_member_not_at_end) Warning
+Warn when a structure containing a C99 flexible array member as the last
+field is not at the end of another structure.
+
Wif-not-aligned
C ObjC C++ ObjC++ Var(warn_if_not_aligned) Init(1) Warning
Warn when the field in a struct is not aligned.
diff --git 
a/gcc/c/c-decl.cc> 
b/gcc/c/c-decl.cc>
index 14c54809b9d..92304fd9c8f 100644
--- a/gcc/c/c-decl.cc>
+++ 

Fwd: [V6][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-04-11 Thread Qing Zhao via Gcc-patches
Hi,  Jakub,

This is the 2nd ping to the 6th version of the patches -:)

Please let me know if you have any further comments on this patch, and whether 
it’s Okay to commit it to trunk?

Thanks a lot for the help.

Qing

Begin forwarded message:

From: Qing Zhao via Gcc-patches 
mailto:gcc-patches@gcc.gnu.org>>
Subject: Fwd: [V6][PATCH 1/2] Handle component_ref to a structre/union field 
including flexible array member [PR101832]
Date: April 4, 2023 at 9:06:37 AM EDT
To: Jakub Jelinek mailto:ja...@redhat.com>>
Cc: Joseph Myers mailto:jos...@codesourcery.com>>, 
Richard Biener mailto:richard.guent...@gmail.com>>, 
kees Cook mailto:keesc...@chromium.org>>, Siddhesh 
Poyarekar mailto:siddh...@gotplt.org>>, gcc Patches 
mailto:gcc-patches@gcc.gnu.org>>
Reply-To: Qing Zhao mailto:qing.z...@oracle.com>>

Ping…

Qing

Begin forwarded message:

From: Qing Zhao 
mailto:qing.z...@oracle.com>>
Subject: [V6][PATCH 1/2] Handle component_ref to a structre/union field 
including flexible array member [PR101832]
Date: March 28, 2023 at 11:49:43 AM EDT
To: ja...@redhat.com, 
jos...@codesourcery.com
Cc: 
richard.guent...@gmail.com,
 
keesc...@chromium.org,
 siddh...@gotplt.org, 
gcc-patches@gcc.gnu.org,
 Qing Zhao 
mailto:qing.z...@oracle.com>>

the C front-end has been approved by Joseph.

Jacub, could you please eview the middle end part of the changes of this patch?

The major change is in 
tree-object-size.cc 
(addr_object_size).
(To use the new TYPE_INCLUDE_FLEXARRAY info).

This patch is to fix 
PR101832(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832),
and is needed for Linux Kernel security.  It’s better to be put into GCC13.

Thanks a lot!

Qing

==

GCC extension accepts the case when a struct with a flexible array member
is embedded into another struct or union (possibly recursively).
__builtin_object_size should treat such struct as flexible size per
-fstrict-flex-arrays.

gcc/c/ChangeLog:

PR tree-optimization/101832
* c-decl.cc (finish_struct): Set 
TYPE_INCLUDE_FLEXARRAY for
struct/union type.

gcc/lto/ChangeLog:

PR tree-optimization/101832
* lto-common.cc 
(compare_tree_sccs_1): Compare bit
TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDE_FLEXARRAY properly
for its corresponding type.

gcc/ChangeLog:

PR tree-optimization/101832
* print-tree.cc (print_node): Print 
new bit type_include_flexarray.
* tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
as type_include_flexarray for RECORD_TYPE or UNION_TYPE.
* tree-object-size.cc 
(addr_object_size): Handle structure/union type
when it has flexible size.
* tree-streamer-in.cc 
(unpack_ts_type_common_value_fields): Stream
in bit no_named_args_stdarg_p properly for its corresponding type.
* 
tree-streamer-out.cc 
(pack_ts_type_common_value_fields): Stream
out bit no_named_args_stdarg_p properly for its corresponding type.
* tree.h (TYPE_INCLUDE_FLEXARRAY): New macro TYPE_INCLUDE_FLEXARRAY.

gcc/testsuite/ChangeLog:

PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.
---
gcc/c/c-decl.cc 
  |  11 ++
gcc/lto/lto-common.cc   
  |   5 +-
gcc/print-tree.cc   
  |   5 +
.../gcc.dg/builtin-object-size-pr101832.c | 134 ++
gcc/tree-core.h   |   2 +
gcc/tree-object-size.cc 
  |  23 ++-
gcc/tree-streamer-in.cc 
  |   5 +-
gcc/tree-streamer-out.cc
  |   5 +-
gcc/tree.h|   7 +-
9 files changed, 192 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c

diff --git a/gcc/c/c-decl.cc 
b/gcc/c/c-decl.cc
index e537d33f398..14c54809b9d 100644
--- a/gcc/c/c-decl.cc
+++ 

Fwd: [V6][PATCH 0/2] Handle component_ref to a structure/union field including FAM for builtin_object_size

2023-04-11 Thread Qing Zhao via Gcc-patches
Hi, Joseph and Jakub,

This is the 2nd ping to the 6th version of the patches -:)

Please let me know if you have any further comments on the patches, and whether 
it’s Okay to commit them to trunk?

Thanks a lot for the help.

Qing

Begin forwarded message:

From: Qing Zhao mailto:qing.z...@oracle.com>>
Subject: [V6][PATCH 0/2] Handle component_ref to a structure/union field 
including FAM for builtin_object_size
Date: March 28, 2023 at 11:49:42 AM EDT
To: ja...@redhat.com, 
jos...@codesourcery.com
Cc: richard.guent...@gmail.com, 
keesc...@chromium.org, 
siddh...@gotplt.org, 
gcc-patches@gcc.gnu.org, Qing Zhao 
mailto:qing.z...@oracle.com>>

Hi, Joseph and Jakub,

this is the 6th version of the patch.
compared to the 5th version, the major changes are:

1. Update the documentation Per Joseph's comments;
2. Change the name of the new warning option per Jakub's suggestions.
3. Update testing case per the above change.

these changes are all in the 2th patch (2/2 Update documentation to
clarify a GCC extension).

The first patch (1/2 Handle component_ref to a structre/union field
including  flexible array member [PR101832]) is not changed

For the first patch, As a record, Joseph has approved the C front-end change,
I only need a review from Jakub for the Middle-end.

bootstrapped and regression tested on aarch64 and x86.

Okay for commit?

thanks.

Qing

=

Qing Zhao (2):
 Handle component_ref to a structre/union field including flexible
   array member [PR101832]
 Update documentation to clarify a GCC extension

gcc/c-family/c.opt|   5 +
gcc/c/c-decl.cc   |  20 +++
gcc/doc/extend.texi   |  45 +-
gcc/lto/lto-common.cc |   5 +-
gcc/print-tree.cc |   5 +
.../gcc.dg/builtin-object-size-pr101832.c | 134 ++
.../gcc.dg/variable-sized-type-flex-array.c   |  31 
gcc/tree-core.h   |   2 +
gcc/tree-object-size.cc   |  23 
++-
gcc/tree-streamer-in.cc   |   5 
+-
gcc/tree-streamer-out.cc  |   
5 +-
gcc/tree.h|   7 +-
12 files changed, 281 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
create mode 100644 gcc/testsuite/gcc.dg/variable-sized-type-flex-array.c

--
2.31.1




Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-11 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richards. 
> Kindly Ping this patch. 
> This is the most important patch for RVV auto-vectorization support.
> Bootstraped on X86 has passed.

Can it wait for GCC 14?  It doesn't seem like stage 4 material.

Also, pinging after 5 days seems a bit soon.  It's been a 4-day
holiday weekend for much of Europe.

Thanks,
Richard

> Feel free to comments.
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>  
> From: juzhe.zhong
> Date: 2023-04-07 09:47
> To: gcc-patches
> CC: richard.sandiford; rguenther; jeffreyalaw; Juzhe-Zhong
> Subject: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
> auto-vectorization
> From: Juzhe-Zhong 
>  
> This patch is to add WHILE_LEN pattern.
> It's inspired by RVV ISA simple "vvaddint32.s" example:
> https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s
>  
> More details are in "vect_set_loop_controls_by_while_len" implementation
> and comments.
>  
> Consider such following case:
> #define N 16
> int src[N];
> int dest[N];
>  
> void
> foo (int n)
> {
>   for (int i = 0; i < n; i++)
> dest[i] = src[i];
> }
>  
> -march=rv64gcv -O3 --param riscv-autovec-preference=scalable 
> -fno-vect-cost-model -fno-tree-loop-distribute-patterns:
>  
> foo:
> ble a0,zero,.L1
> lui a4,%hi(.LANCHOR0)
> addia4,a4,%lo(.LANCHOR0)
> addia3,a4,64
> csrra2,vlenb
> .L3:
> vsetvli a5,a0,e32,m1,ta,ma
> vle32.v v1,0(a4)
> sub a0,a0,a5
> vse32.v v1,0(a3)
> add a4,a4,a2
> add a3,a3,a2
> bne a0,zero,.L3
> .L1:
> ret
>  
> gcc/ChangeLog:
>  
> * doc/md.texi: Add WHILE_LEN support.
> * internal-fn.cc (while_len_direct): Ditto.
> (expand_while_len_optab_fn): Ditto.
> (direct_while_len_optab_supported_p): Ditto.
> * internal-fn.def (WHILE_LEN): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-ssa-loop-manip.cc (create_iv): Ditto.
> * tree-ssa-loop-manip.h (create_iv): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_by_while_len): 
> Ditto.
> (vect_set_loop_condition_partial_vectors): Ditto.
> * tree-vect-loop.cc (vect_get_loop_len): Ditto.
> * tree-vect-stmts.cc (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (vect_get_loop_len): Ditto.
>  
> ---
> gcc/doc/md.texi |  14 +++
> gcc/internal-fn.cc  |  29 ++
> gcc/internal-fn.def |   1 +
> gcc/optabs.def  |   1 +
> gcc/tree-ssa-loop-manip.cc  |   4 +-
> gcc/tree-ssa-loop-manip.h   |   2 +-
> gcc/tree-vect-loop-manip.cc | 186 ++--
> gcc/tree-vect-loop.cc   |  35 +--
> gcc/tree-vect-stmts.cc  |   9 +-
> gcc/tree-vectorizer.h   |   4 +-
> 10 files changed, 264 insertions(+), 21 deletions(-)
>  
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 8e3113599fd..72178ab014c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4965,6 +4965,20 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
> @end smallexample
> +@cindex @code{while_len@var{m}@var{n}} instruction pattern
> +@item @code{while_len@var{m}@var{n}}
> +Set operand 0 to the number of active elements in vector will be updated 
> value.
> +operand 1 is the total elements need to be updated value.
> +operand 2 is the vectorization factor.
> +The operation is equivalent to:
> +
> +@smallexample
> +operand0 = MIN (operand1, operand2);
> +operand2 can be const_poly_int or poly_int related to vector mode size.
> +Some target like RISC-V has a standalone instruction to get MIN (n, MODE 
> SIZE) so
> +that we can reduce a use of general purpose register.
> +@end smallexample
> +
> @cindex @code{check_raw_ptrs@var{m}} instruction pattern
> @item @samp{check_raw_ptrs@var{m}}
> Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 6e81dc05e0e..5f44def90d3 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -127,6 +127,7 @@ init_internal_fns ()
> #define cond_binary_direct { 1, 1, true }
> #define cond_ternary_direct { 1, 1, true }
> #define while_direct { 0, 2, false }
> +#define while_len_direct { 0, 0, false }
> #define fold_extract_direct { 2, 2, false }
> #define fold_left_direct { 1, 1, false }
> #define mask_fold_left_direct { 1, 1, false }
> @@ -3702,6 +3703,33 @@ expand_while_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>  emit_move_insn (lhs_rtx, ops[0].value);
> }
> +/* Expand WHILE_LEN call STMT using optab OPTAB.  */
> +static void
> +expand_while_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  expand_operand ops[3];
> +  tree rhs_type[2];
> +
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree lhs_type = TREE_TYPE (lhs);
> +  rtx lhs_rtx = expand_expr (lhs, 

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-11 Thread juzhe.zh...@rivai.ai
Hi, Richards. 
Kindly Ping this patch. 
This is the most important patch for RVV auto-vectorization support.
Bootstraped on X86 has passed.
Feel free to comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-04-07 09:47
To: gcc-patches
CC: richard.sandiford; rguenther; jeffreyalaw; Juzhe-Zhong
Subject: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
From: Juzhe-Zhong 
 
This patch is to add WHILE_LEN pattern.
It's inspired by RVV ISA simple "vvaddint32.s" example:
https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s
 
More details are in "vect_set_loop_controls_by_while_len" implementation
and comments.
 
Consider such following case:
#define N 16
int src[N];
int dest[N];
 
void
foo (int n)
{
  for (int i = 0; i < n; i++)
dest[i] = src[i];
}
 
-march=rv64gcv -O3 --param riscv-autovec-preference=scalable 
-fno-vect-cost-model -fno-tree-loop-distribute-patterns:
 
foo:
ble a0,zero,.L1
lui a4,%hi(.LANCHOR0)
addia4,a4,%lo(.LANCHOR0)
addia3,a4,64
csrra2,vlenb
.L3:
vsetvli a5,a0,e32,m1,ta,ma
vle32.v v1,0(a4)
sub a0,a0,a5
vse32.v v1,0(a3)
add a4,a4,a2
add a3,a3,a2
bne a0,zero,.L3
.L1:
ret
 
gcc/ChangeLog:
 
* doc/md.texi: Add WHILE_LEN support.
* internal-fn.cc (while_len_direct): Ditto.
(expand_while_len_optab_fn): Ditto.
(direct_while_len_optab_supported_p): Ditto.
* internal-fn.def (WHILE_LEN): Ditto.
* optabs.def (OPTAB_D): Ditto.
* tree-ssa-loop-manip.cc (create_iv): Ditto.
* tree-ssa-loop-manip.h (create_iv): Ditto.
* tree-vect-loop-manip.cc (vect_set_loop_controls_by_while_len): Ditto.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc (vect_get_loop_len): Ditto.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_get_loop_len): Ditto.
 
---
gcc/doc/md.texi |  14 +++
gcc/internal-fn.cc  |  29 ++
gcc/internal-fn.def |   1 +
gcc/optabs.def  |   1 +
gcc/tree-ssa-loop-manip.cc  |   4 +-
gcc/tree-ssa-loop-manip.h   |   2 +-
gcc/tree-vect-loop-manip.cc | 186 ++--
gcc/tree-vect-loop.cc   |  35 +--
gcc/tree-vect-stmts.cc  |   9 +-
gcc/tree-vectorizer.h   |   4 +-
10 files changed, 264 insertions(+), 21 deletions(-)
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8e3113599fd..72178ab014c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4965,6 +4965,20 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
@end smallexample
+@cindex @code{while_len@var{m}@var{n}} instruction pattern
+@item @code{while_len@var{m}@var{n}}
+Set operand 0 to the number of active elements in vector will be updated value.
+operand 1 is the total elements need to be updated value.
+operand 2 is the vectorization factor.
+The operation is equivalent to:
+
+@smallexample
+operand0 = MIN (operand1, operand2);
+operand2 can be const_poly_int or poly_int related to vector mode size.
+Some target like RISC-V has a standalone instruction to get MIN (n, MODE SIZE) 
so
+that we can reduce a use of general purpose register.
+@end smallexample
+
@cindex @code{check_raw_ptrs@var{m}} instruction pattern
@item @samp{check_raw_ptrs@var{m}}
Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 6e81dc05e0e..5f44def90d3 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -127,6 +127,7 @@ init_internal_fns ()
#define cond_binary_direct { 1, 1, true }
#define cond_ternary_direct { 1, 1, true }
#define while_direct { 0, 2, false }
+#define while_len_direct { 0, 0, false }
#define fold_extract_direct { 2, 2, false }
#define fold_left_direct { 1, 1, false }
#define mask_fold_left_direct { 1, 1, false }
@@ -3702,6 +3703,33 @@ expand_while_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
 emit_move_insn (lhs_rtx, ops[0].value);
}
+/* Expand WHILE_LEN call STMT using optab OPTAB.  */
+static void
+expand_while_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  expand_operand ops[3];
+  tree rhs_type[2];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree lhs_type = TREE_TYPE (lhs);
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand ([0], lhs_rtx, TYPE_MODE (lhs_type));
+
+  for (unsigned int i = 0; i < gimple_call_num_args (stmt); ++i)
+{
+  tree rhs = gimple_call_arg (stmt, i);
+  rhs_type[i] = TREE_TYPE (rhs);
+  rtx rhs_rtx = expand_normal (rhs);
+  create_input_operand ([i + 1], rhs_rtx, TYPE_MODE (rhs_type[i]));
+}
+
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (rhs_type[0]));
+
+  expand_insn (icode, 3, ops);
+  if 

RE: [PATCH v5] RISC-V: Fix regression of -fzero-call-used-regs=all

2023-04-11 Thread Wang, Yanzhang via Gcc-patches
Hi Kito, Juzhe, Jeff,

Thanks for your kindly reviews. I have modified based on the comments and ran 
the testsuite on my local. Could you please take another look ? If any more 
comments please let me know.

Thanks
Yanzhang

> -Original Message-
> From: Wang, Yanzhang 
> Sent: Tuesday, April 11, 2023 7:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Li, Pan2
> ; Wang, Yanzhang 
> Subject: [PATCH v5] RISC-V: Fix regression of -fzero-call-used-regs=all
> 
> From: Yanzhang Wang 
> 
> This patch registers a riscv specific function to
> TARGET_ZERO_CALL_USED_REGS instead of default in targhooks.cc. It will
> clean gpr and vector relevant registers.
> 
>   PR 109104
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv-protos.h (emit_hard_vlmax_vsetvl):
>   * config/riscv/riscv-v.cc (emit_hard_vlmax_vsetvl):
>   (emit_vlmax_vsetvl):
>   * config/riscv/riscv.cc (vector_zero_call_used_regs):
>   (riscv_zero_call_used_regs):
>   (TARGET_ZERO_CALL_USED_REGS):
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/zero-scratch-regs-1.c: New test.
>   * gcc.target/riscv/zero-scratch-regs-2.c: New test.
>   * gcc.target/riscv/zero-scratch-regs-3.c: New test.
> 
> Signed-off-by: Yanzhang Wang 
> Co-authored-by: Pan Li 
> Co-authored-by: Ju-Zhe Zhong 
> Co-authored-by: Kito Cheng 
> ---
>  gcc/config/riscv/riscv-protos.h   |  1 +
>  gcc/config/riscv/riscv-v.cc   | 15 +++-
>  gcc/config/riscv/riscv.cc | 75 +++
>  .../gcc.target/riscv/zero-scratch-regs-1.c|  9 +++
>  .../gcc.target/riscv/zero-scratch-regs-2.c| 24 ++
>  .../gcc.target/riscv/zero-scratch-regs-3.c| 57 ++
>  6 files changed, 178 insertions(+), 3 deletions(-)  create mode 100644
> gcc/testsuite/gcc.target/riscv/zero-scratch-regs-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zero-scratch-regs-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zero-scratch-regs-3.c
> 
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-
> protos.h index 4611447ddde..5244e8dcbf0 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -159,6 +159,7 @@ bool check_builtin_call (location_t, vec,
> unsigned int,  bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
> HOST_WIDE_INT);  bool legitimize_move (rtx, rtx, machine_mode);  void
> emit_vlmax_vsetvl (machine_mode, rtx);
> +void emit_hard_vlmax_vsetvl (machine_mode, rtx);
>  void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);  void
> emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);  void
> emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode); diff --git
> a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index
> 2e91d019f6c..392f5d02e17 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -118,6 +118,17 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT
> minval,
> && IN_RANGE (INTVAL (elt), minval, maxval));  }
> 
> +/* Emit a vlmax vsetvl instruction.  This should only be used when
> +   optimization is disabled or after vsetvl insertion pass.  */ void
> +emit_hard_vlmax_vsetvl (machine_mode vmode, rtx vl) {
> +  unsigned int sew = get_sew (vmode);
> +  emit_insn (gen_vsetvl (Pmode, vl, RVV_VLMAX, gen_int_mode (sew, Pmode),
> +  gen_int_mode (get_vlmul (vmode), Pmode), const0_rtx,
> +  const0_rtx));
> +}
> +
>  void
>  emit_vlmax_vsetvl (machine_mode vmode, rtx vl)  { @@ -126,9 +137,7 @@
> emit_vlmax_vsetvl (machine_mode vmode, rtx vl)
>unsigned int ratio = calculate_ratio (sew, vlmul);
> 
>if (!optimize)
> -emit_insn (gen_vsetvl (Pmode, vl, RVV_VLMAX, gen_int_mode (sew,
> Pmode),
> -gen_int_mode (get_vlmul (vmode), Pmode), const0_rtx,
> -const0_rtx));
> +emit_hard_vlmax_vsetvl (vmode, vl);
>else
>  emit_insn (gen_vlmax_avl (Pmode, vl, gen_int_mode (ratio,
> Pmode)));  } diff --git a/gcc/config/riscv/riscv.cc
> b/gcc/config/riscv/riscv.cc index 5f542932d13..a9c9e1aa32b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7066,6 +7066,78 @@ riscv_shamt_matches_mask_p (int shamt,
> HOST_WIDE_INT mask)
>return shamt == ctz_hwi (mask);
>  }
> 
> +HARD_REG_SET
> +vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs) {
> +  HARD_REG_SET zeroed_hardregs;
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +
> +  /* Find a register to hold vl.  */
> +  unsigned vl_regno = INVALID_REGNUM;
> +  /* Skip the first GPR, otherwise the existing vl is kept due to the
> same
> + between vl and avl.  */
> +  for (unsigned regno = GP_REG_FIRST + 1; regno <= GP_REG_LAST; regno++)
> +{
> +  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> + {
> +   vl_regno = regno;
> +   break;
> + }
> +}
> +
> +  if (vl_regno > GP_REG_LAST)
> +sorry ("cannot allocate 

[PATCH v5] RISC-V: Fix regression of -fzero-call-used-regs=all

2023-04-11 Thread yanzhang.wang--- via Gcc-patches
From: Yanzhang Wang 

This patch registers a riscv specific function to
TARGET_ZERO_CALL_USED_REGS instead of default in targhooks.cc. It will
clean gpr and vector relevant registers.

PR 109104

gcc/ChangeLog:

* config/riscv/riscv-protos.h (emit_hard_vlmax_vsetvl):
* config/riscv/riscv-v.cc (emit_hard_vlmax_vsetvl):
(emit_vlmax_vsetvl):
* config/riscv/riscv.cc (vector_zero_call_used_regs):
(riscv_zero_call_used_regs):
(TARGET_ZERO_CALL_USED_REGS):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zero-scratch-regs-1.c: New test.
* gcc.target/riscv/zero-scratch-regs-2.c: New test.
* gcc.target/riscv/zero-scratch-regs-3.c: New test.

Signed-off-by: Yanzhang Wang 
Co-authored-by: Pan Li 
Co-authored-by: Ju-Zhe Zhong 
Co-authored-by: Kito Cheng 
---
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 15 +++-
 gcc/config/riscv/riscv.cc | 75 +++
 .../gcc.target/riscv/zero-scratch-regs-1.c|  9 +++
 .../gcc.target/riscv/zero-scratch-regs-2.c| 24 ++
 .../gcc.target/riscv/zero-scratch-regs-3.c| 57 ++
 6 files changed, 178 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-scratch-regs-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-scratch-regs-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-scratch-regs-3.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 4611447ddde..5244e8dcbf0 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -159,6 +159,7 @@ bool check_builtin_call (location_t, vec, 
unsigned int,
 bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 bool legitimize_move (rtx, rtx, machine_mode);
 void emit_vlmax_vsetvl (machine_mode, rtx);
+void emit_hard_vlmax_vsetvl (machine_mode, rtx);
 void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
 void emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
 void emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 2e91d019f6c..392f5d02e17 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -118,6 +118,17 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
  && IN_RANGE (INTVAL (elt), minval, maxval));
 }
 
+/* Emit a vlmax vsetvl instruction.  This should only be used when
+   optimization is disabled or after vsetvl insertion pass.  */
+void
+emit_hard_vlmax_vsetvl (machine_mode vmode, rtx vl)
+{
+  unsigned int sew = get_sew (vmode);
+  emit_insn (gen_vsetvl (Pmode, vl, RVV_VLMAX, gen_int_mode (sew, Pmode),
+gen_int_mode (get_vlmul (vmode), Pmode), const0_rtx,
+const0_rtx));
+}
+
 void
 emit_vlmax_vsetvl (machine_mode vmode, rtx vl)
 {
@@ -126,9 +137,7 @@ emit_vlmax_vsetvl (machine_mode vmode, rtx vl)
   unsigned int ratio = calculate_ratio (sew, vlmul);
 
   if (!optimize)
-emit_insn (gen_vsetvl (Pmode, vl, RVV_VLMAX, gen_int_mode (sew, Pmode),
-  gen_int_mode (get_vlmul (vmode), Pmode), const0_rtx,
-  const0_rtx));
+emit_hard_vlmax_vsetvl (vmode, vl);
   else
 emit_insn (gen_vlmax_avl (Pmode, vl, gen_int_mode (ratio, Pmode)));
 }
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5f542932d13..a9c9e1aa32b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7066,6 +7066,78 @@ riscv_shamt_matches_mask_p (int shamt, HOST_WIDE_INT 
mask)
   return shamt == ctz_hwi (mask);
 }
 
+HARD_REG_SET
+vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  HARD_REG_SET zeroed_hardregs;
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
+
+  /* Find a register to hold vl.  */
+  unsigned vl_regno = INVALID_REGNUM;
+  /* Skip the first GPR, otherwise the existing vl is kept due to the same
+ between vl and avl.  */
+  for (unsigned regno = GP_REG_FIRST + 1; regno <= GP_REG_LAST; regno++)
+{
+  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+   {
+ vl_regno = regno;
+ break;
+   }
+}
+
+  if (vl_regno > GP_REG_LAST)
+sorry ("cannot allocate vl register for %qs on this target",
+  "-fzero-call-used-regs");
+
+  /* Vector configurations need not be saved and restored here.  The
+ -fzero-call-used-regs=* option will zero all vector registers and
+ return.  So there's no vector operations between them.  */
+
+  bool emitted_vlmax_vsetvl = false;
+  rtx vl = gen_rtx_REG (Pmode, vl_regno); /* vl is VLMAX.  */
+  for (unsigned regno = V_REG_FIRST; regno <= V_REG_LAST; ++regno)
+{
+  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+   {
+ rtx target = regno_reg_rtx[regno];
+ machine_mode mode = GET_MODE (target);
+ poly_uint16 nunits = GET_MODE_NUNITS (mode);
+   

Re: Re: [PATCH] RISC-V: Fix PR108279

2023-04-11 Thread juzhe.zh...@rivai.ai
Oh, sorry for didn't explain this clearly.
In this example, the "vle32" should be using same VL as "vadd" that I didn't 
mention.

But we do have other instructions doesn't care about VL operand. So let me 
explain more clearly.
Same example I repeated again:
Loop:
{
bb 0:
vsetvl VL, e8,mf8,TA
vadd (Demand SEW = 8, LMUL = MF8, can either TU or TA, demand VL)
bb 1:
vsetvl VL, e32 mf2,TU
vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
e16,mf4 or e32,mf2 or e64,m1], and demand TU, demand VL)
}
In this example, when the "VL" of vadd is the same as "VL" fo vle32, then we 
should do is generate a new vsetvl "vsetvl VL, e32,mf2,TU" outside the loop
then, remove those 2 vsetvl inside the loop.

Another example is that we do have some rvv instructions doesn't demand "VL":
Loop:
{
bb 0:
vsetvl e8,m1,TA (can be any VL, can be any LMUL)
vmv.x.s (Demand SEW = 8, LMUL default value = 1 (However, it's available also 
for all LMUL), tail policty can be either TU or TA, do not demand VL)
bb 1:
vsetvl VL, e32 mf2,TU
vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
e16,mf4 or e32,mf2 or e64,m1], and demand TU, demand VL)
}

Then in this case, we should generate "vsetvl VL (this VL is from vle32.v), e8, 
mf8, TU outside the loop,
And eliminate those 2 instructions inside the loop.

Hmmm, may not be easy to understand since RVV instructions have various vsetvl 
configuration. 

Some instructions like vle32...load/stores, they demand RATIO = SEW/LMUL, 
meanning they don't need the exactly the SEW and LMUL just only SEW/LMUL ratio 
will be enough.
Some instructions like vadd, vsub,...they demand exactly SEW && LMUL to be 
exact value, the SEW/LMUL ratio is not enough.
Some instructions like comparison instructions, they don't care about tail 
policy
Some instructions like vmv.x.s, doesn't care about VL.
etc.
Quite complicated, so we have defined several fusion rules in VSETVL PASS


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-04-11 19:19
To: juzhe.zh...@rivai.ai
CC: jeffreyalaw; gcc-patches; kito.cheng; palmer
Subject: Re: Re: [PATCH] RISC-V: Fix PR108279
On Tue, Apr 11, 2023 at 11:19 AM juzhe.zh...@rivai.ai
 wrote:
>
> No, we can only pass "available" to LCM.
> Passing "compatible" to LCM can not work for us.
>
> LCM can only help for eliminate vsetvls can not help for fuse vsetvls.
>
> For example:
>
> bb 0:
> vsetvl e8,mf8
> vadd (Demand SEW = 8, LMUL = MF8)
> bb 1:
> vsetvl e32 mf2
> vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
> e16,mf4 or e32,mf2 or e64,m1])
>
> I use LCM to handle the case above, I tell LCM that the vsetvl of vadd is 
> "available" for the following "vle" instruction.
> Then LCM will let us to remove "vsetvl e32mf2"
> This is what I said "available" case that I use LCM to handle.
>
> However, LCM can not handle "compatible" case. Here is the example:
>
> Loop:
> {
> bb 0:
> vsetvl e8,mf8,TA
> vadd (Demand SEW = 8, LMUL = MF8, can either TU or TA)
> bb 1:
> vsetvl e32 mf2,TU
> vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
> e16,mf4 or e32,mf2 or e64,m1], and demand TU)
> }
 
So for this case the vle32 instruction is the one that also works with
other VL, which means the vsetvl is too strict.  I agree
if the above is the input to LCM then it's difficult to make it work.
But then maybe initial placement should be only
done for strict cases.  Anyway - I'm just throwing in wild guesses here.
 
> It's obvious that neither "vsetvl e8,mf8,TA" nor "vsetvl e32 mf2,TU" are 
> available for both instructions "vadd" and "vle".
> That's why we need Phase 3 in VSETVL PASS.
> I do the demand fusion generate a new vsetvl instructions "vsetvl e8,mf8,TU" 
> which is available for both RVV instructions "vadd" and "vle",
> and update the first vsetvl "vsetvl e8,mf8,TA" to  "vsetvl e8,mf8,TU"
>
> Then, I tell LCM "vsetvl e8,mf8,TU" is available for both "vadd" and "vle32", 
> so LCM will hoist "vsetvl e8,mf8,TU" outside the LOOP
> and remove all vsetvls inside the loop.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Richard Biener
> Date: 2023-04-11 16:55
> To: juzhe.zhong
> CC: Jeff Law; gcc-patches; kito.cheng; palmer
> Subject: Re: Re: [PATCH] RISC-V: Fix PR108279
> On Wed, Apr 5, 2023 at 3:53 PM  wrote:
> >
> > >> So fusion in this context is really about identifying cases where two
> > >> configuration settings are equivalent and you "fuse" them together.
> > >> Presumably this is only going to be possible when the vector insns are
> > >> just doing data movement rather than actual computations?
> >
> > >> If my understanding is correct, I can kind of see why you're doing
> > >> fusion during phase 3.  My sense is there's a better way, but I'm having
> > >> a bit of trouble working out the details of what that should be to
> > >> myself.  In any event, revamping parts of the vsetvl insertion code
> > >> isn't the kind of thing we should be doing now.
> >
> > 

[committed] gfortran.dg/gomp/affinity-clause-1.f90: Fix scan-tree-dump (was: [r13-7120 Regression] FAIL: gfortran.dg/gomp/affinity-clause-1.f90 -O scan-tree-dump-times original "#pragma omp task affin

2023-04-11 Thread Tobias Burnus

Commit r13-7137 fixes thethe dump issue with -m32, cf. attachment.

Tobias

On 09.04.23 00:11, haochen.jiang via Gcc-patches wrote:

 Fortran: Fix dg directives and remove trailing whitespaces in testsuite
caused
FAIL: gfortran.dg/gomp/affinity-clause-1.f90   -O   scan-tree-dump-times original "#pragma 
omp task affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.[0-9]+:5:1\\):b\\[\\(.* ? 
\\+ -1\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) 
i=D\\.[0-9]+:5:1\\):d\\[\\(\\(integer\\(kind=8\\)\\) i \\+ -1\\) \\* 6\\]\\)" 1

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit b8e32978e3d9e3b88cd4f441edfdebfa395a5c26
Author: Tobias Burnus 
Date:   Tue Apr 11 13:15:39 2023 +0200

gfortran.dg/gomp/affinity-clause-1.f90: Fix scan-tree-dump

Commit r13-7120-g46fe32cb4d887d44a62f9c4ff2a72532d4eb5a19 added the
missing hyphen to 'dg-final', which exposed an -m32 pattern mismatch.

gcc/testsuite/

* gfortran.dg/gomp/affinity-clause-1.f90: Update scan-tree pattern
for -m32.

diff --git a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90 b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90
index b8c7b5d68ad..32c9acef070 100644
--- a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90
@@ -27,2 +27 @@ end
-! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):b\\\[\\(.* ? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):d\\\[\\(\\(integer\\(kind=8\\)\\) i \\+ -1\\) \\* 6\\\]\\)"  1 "original" } }
-
+! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):b\\\[.* ? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):d\\\[\\(.*i \\+ -1\\) \\* 6\\\]\\)"  1 "original" } }


Re: Re: [PATCH] RISC-V: Fix PR108279

2023-04-11 Thread Richard Biener via Gcc-patches
On Tue, Apr 11, 2023 at 11:19 AM juzhe.zh...@rivai.ai
 wrote:
>
> No, we can only pass "available" to LCM.
> Passing "compatible" to LCM can not work for us.
>
> LCM can only help for eliminate vsetvls can not help for fuse vsetvls.
>
> For example:
>
> bb 0:
> vsetvl e8,mf8
> vadd (Demand SEW = 8, LMUL = MF8)
> bb 1:
> vsetvl e32 mf2
> vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
> e16,mf4 or e32,mf2 or e64,m1])
>
> I use LCM to handle the case above, I tell LCM that the vsetvl of vadd is 
> "available" for the following "vle" instruction.
> Then LCM will let us to remove "vsetvl e32mf2"
> This is what I said "available" case that I use LCM to handle.
>
> However, LCM can not handle "compatible" case. Here is the example:
>
> Loop:
> {
> bb 0:
> vsetvl e8,mf8,TA
> vadd (Demand SEW = 8, LMUL = MF8, can either TU or TA)
> bb 1:
> vsetvl e32 mf2,TU
> vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
> e16,mf4 or e32,mf2 or e64,m1], and demand TU)
> }

So for this case the vle32 instruction is the one that also works with
other VL, which means the vsetvl is too strict.  I agree
if the above is the input to LCM then it's difficult to make it work.
But then maybe initial placement should be only
done for strict cases.  Anyway - I'm just throwing in wild guesses here.

> It's obvious that neither "vsetvl e8,mf8,TA" nor "vsetvl e32 mf2,TU" are 
> available for both instructions "vadd" and "vle".
> That's why we need Phase 3 in VSETVL PASS.
> I do the demand fusion generate a new vsetvl instructions "vsetvl e8,mf8,TU" 
> which is available for both RVV instructions "vadd" and "vle",
> and update the first vsetvl "vsetvl e8,mf8,TA" to  "vsetvl e8,mf8,TU"
>
> Then, I tell LCM "vsetvl e8,mf8,TU" is available for both "vadd" and "vle32", 
> so LCM will hoist "vsetvl e8,mf8,TU" outside the LOOP
> and remove all vsetvls inside the loop.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Richard Biener
> Date: 2023-04-11 16:55
> To: juzhe.zhong
> CC: Jeff Law; gcc-patches; kito.cheng; palmer
> Subject: Re: Re: [PATCH] RISC-V: Fix PR108279
> On Wed, Apr 5, 2023 at 3:53 PM  wrote:
> >
> > >> So fusion in this context is really about identifying cases where two
> > >> configuration settings are equivalent and you "fuse" them together.
> > >> Presumably this is only going to be possible when the vector insns are
> > >> just doing data movement rather than actual computations?
> >
> > >> If my understanding is correct, I can kind of see why you're doing
> > >> fusion during phase 3.  My sense is there's a better way, but I'm having
> > >> a bit of trouble working out the details of what that should be to
> > >> myself.  In any event, revamping parts of the vsetvl insertion code
> > >> isn't the kind of thing we should be doing now.
> >
> > The vsetvl demand fusion happens is not necessary "equivalent", instead, we
> > call it we will do demand fusion when they are "compatible".
> > And the fusion can happen between any vector insns including data movement
> > and actual computations.
> >
> > What is "compatible" ??  This definition is according to RVV ISA.
> > For example , For a vadd.vv need a vsetvl zero, 4, e32,m1,ta,ma.
> > and a vle.v need a vsetvl zero,4,e8,mf4,ta,ma.
> >
> > According to RVV ISA:
> > vadd.vv demand SEW = 32, LMUL = M1, AVL = 4
> > vle.v demand RATIO = SEW/LMUL = 32, AVL = 4.
> > So after demand fusion, the demand becomes SEW = 32, LMUL = M1, AVL = 4.
> > Such vsetvl instruction is configured as this demand fusion, we call it 
> > "compatible"
> > since we can find a common vsetvl VL/VTYPE status for both vadd.vv and vle.v
> >
> > However, what case is not "incompatible", same example, if the vadd.vv 
> > demand SEW = 32. LMUL = MF2,
> > the vadd.vv is incompatible with vle.v. since we can't find a common 
> > VL/VTYPE vsetvl instruction available
> > for both of them.
> >
> > We have local demand fusion which is Phase 1. Local demand fusion is doing 
> > the fusion within a block
> > And also we have global demand fusion which is Phase 3. Global demand 
> > fusion is doing across blocks.
> >
> > After Phase 1, each block has a single demand fusion. Phase 3 is doing 
> > global demand fusion trying to
> > find the common VL/VTYPE status available for a bunch of blocks, and fuse 
> > them into a single vsetvl.
> > So that we eliminate redundant vsetvli.
> >
> > Here is a example:
> >
> > bb 0:  (vle.v demand RATIO = 32)
> >   /   \
> > bb 1  bb 2
> >   /  \ /   \
> >  bb 3   bb 4   bb 5
> >vadd   vmul  vdiv
> > (demand  (demand  (demand
> >  sew = 8,sew = 8,  sew = 8,
> > lmul = mf4)  lmul = mf4,   lmul = mf4,
> >   tail policy = tu) mask policy = mu)
> >
> > So in 

Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread juzhe.zh...@rivai.ai
9 bit (512 modes) mode should be enough for RVV.
In the future, I would expect we will have BF16 vector, FP16 vector,.. matrix 
modes. 
And I think it will not be more 512 modes in the future.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-04-11 19:11
To: Richard Biener
CC: juzhe.zhong; Jeff Law; gcc-patches; kito.cheng; palmer; jakub
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
Richard Biener  writes:
> On Tue, 11 Apr 2023, Richard Sandiford wrote:
>
>>  writes:
>> > ARM SVE has?svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
>> > As far as I known, they don't have tuple type for partial vector.
>> 
>> Yeah, there are no separate types for partial vectors, but there
>> are separate modes.  E.g. VNx2QI is a partial vector of QIs,
>> with each QI stored in a 64-bit container.
>> 
>> I agree with all the comments about the danger of growing the number of
>> modes too much.  But it looks like rtx_def should be easy to rearrange.
>> Unless I'm missing something, there are less than 256 rtx codes at
>> present.  So one simple option would be to make the code 8 bits and
>> the machine_mode 16 bits (and swap them, so that they stay well-aligned
>> wrt their size).
>
> But then the bigger issue is tree_type_common where we agreed to
> bump precision from 10 to 16 bits, with bumping machine_mode from
> 8 to 16 we then are left with only 3 spare bits from 15 now - if
> the comments are correct.
 
Hmm, true.  I guess the two options are:
 
(1) Increase the size of the machine_mode field by the smallest amount
possible (accepting that it will be non-power-of-2).  I'd be
surprised if that's a significant performance issue, since modes
aren't as fundamental to trees as rtxes (and since a non-power-of-2
precision doesn't seem to have hurt).
 
(2) Increase the size to 16 anyway, with the understanding that the
mode is the first thing to shrink if we need a fourth spare bit.
 
> In tree_decl_common we have 13 unused bits.
>
> IRA allocno would also increase and it's hard_regno field looks
> suspiciously unaligned already (unless unsigned/signed re-aligns
> bitfields).
 
Yeah, agree it looks unaligned.
 
If I've read it correctly, it looks like there's a 32-bit gap
on 64-bit hosts before objects[2].  So perhaps we could move
the mode fields there and put hard_regno where the modes are now.
 
Thanks,
Richard
 


Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 11 Apr 2023, Richard Sandiford wrote:
>
>>  writes:
>> > ARM SVE has?svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
>> > As far as I known, they don't have tuple type for partial vector.
>> 
>> Yeah, there are no separate types for partial vectors, but there
>> are separate modes.  E.g. VNx2QI is a partial vector of QIs,
>> with each QI stored in a 64-bit container.
>> 
>> I agree with all the comments about the danger of growing the number of
>> modes too much.  But it looks like rtx_def should be easy to rearrange.
>> Unless I'm missing something, there are less than 256 rtx codes at
>> present.  So one simple option would be to make the code 8 bits and
>> the machine_mode 16 bits (and swap them, so that they stay well-aligned
>> wrt their size).
>
> But then the bigger issue is tree_type_common where we agreed to
> bump precision from 10 to 16 bits, with bumping machine_mode from
> 8 to 16 we then are left with only 3 spare bits from 15 now - if
> the comments are correct.

Hmm, true.  I guess the two options are:

(1) Increase the size of the machine_mode field by the smallest amount
possible (accepting that it will be non-power-of-2).  I'd be
surprised if that's a significant performance issue, since modes
aren't as fundamental to trees as rtxes (and since a non-power-of-2
precision doesn't seem to have hurt).

(2) Increase the size to 16 anyway, with the understanding that the
mode is the first thing to shrink if we need a fourth spare bit.

> In tree_decl_common we have 13 unused bits.
>
> IRA allocno would also increase and it's hard_regno field looks
> suspiciously unaligned already (unless unsigned/signed re-aligns
> bitfields).

Yeah, agree it looks unaligned.

If I've read it correctly, it looks like there's a 32-bit gap
on 64-bit hosts before objects[2].  So perhaps we could move
the mode fields there and put hard_regno where the modes are now.

Thanks,
Richard


Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Richard Biener via Gcc-patches
On Tue, 11 Apr 2023, Richard Sandiford wrote:

>  writes:
> > ARM SVE has?svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
> > As far as I known, they don't have tuple type for partial vector.
> 
> Yeah, there are no separate types for partial vectors, but there
> are separate modes.  E.g. VNx2QI is a partial vector of QIs,
> with each QI stored in a 64-bit container.
> 
> I agree with all the comments about the danger of growing the number of
> modes too much.  But it looks like rtx_def should be easy to rearrange.
> Unless I'm missing something, there are less than 256 rtx codes at
> present.  So one simple option would be to make the code 8 bits and
> the machine_mode 16 bits (and swap them, so that they stay well-aligned
> wrt their size).

But then the bigger issue is tree_type_common where we agreed to
bump precision from 10 to 16 bits, with bumping machine_mode from
8 to 16 we then are left with only 3 spare bits from 15 now - if
the comments are correct.  In tree_decl_common we have 13 unused bits.

IRA allocno would also increase and it's hard_regno field looks
suspiciously unaligned already (unless unsigned/signed re-aligns
bitfields).

> That of course would create new problem if we want more than 256 codes
> in future.  But then there would be the option of a non-power-of-2
> split (12/12 or whatever).  Also, it's possible to multiplex operations
> into a single code by adding an extra operand, whereas it's harder to
> multiplex modes.
> 
> Thanks,
> Richard
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 11, 2023 at 06:25:58PM +0800, juzhe.zh...@rivai.ai wrote:
> Explicit sets multiple VNx1SImode with multiple dest operand and let RA to 
> assign them with continguous regsiters
> will make RTL patterns in RVV hard to maintain.

Not necessarily.  It can be handled through define_subst.

Jakub



Re: [PATCH] tree-vect-generic: Fix up ICE with SSA_NAME_OCCURS_IN_ABNORMAL_PHI [PR109392]

2023-04-11 Thread Richard Biener via Gcc-patches
On Tue, 11 Apr 2023, Jakub Jelinek wrote:

> On Wed, Apr 05, 2023 at 02:11:08PM +0200, Richard Biener wrote:
> > Ok, but maybe there?s a gimple_build overload that meanwhile implements
> > the desired semantics?  It would probably need to specify the valueization
> > hook to follow SSA edges beyond the sequence we?re currently building.
> 
> Jeff has apparently committed the patch in the meantime.
> For gimple_build, did you mean to use it just for this fallback case,
> or instead of the initial resimplification as well?

I was originally thinking of the initial resimplification.

But yes, using gimple_build would have simplified it to

  if (!res)
res = gimple_build (, BIT_FIELD_REF, type, t, bitsize, bitpos);

With the gsi overloads the whole function should now be able to
be turned into

return gimple_build (gsi, true, GSI_SAME_STMT, UNKNOWN_LOCATION,
 BIT_FIELD_REF, type, t, bitsize, bitpos);

I think.  Note that's not 100% equivalent to what we do now, so it's
probably better to defer to stage1.  What it doesn't do is

  /* We're using the resimplify API and maybe_push_res_to_seq to
 simplify the BIT_FIELD_REF but restrict the simplification to 
 a single stmt while at the same time following SSA edges for
 simplification with already emitted CTORs.  */

which is achieved by passing NULL as the gimple_seq * argument to
.resimplify.  The above replacement would instead do what a later
forwprop pass would do when folding.  ISTR I wanted to do minimal
changes when rewriting this.

Richard.

> > > 2023-04-05  Jakub Jelinek  
> > > 
> > >PR tree-optimization/109392
> > >* tree-vect-generic.cc (tree_vec_extract): If maybe_push_res_to_seq
> > >fails, build BIT_FIELD_REF insn without trying to simplify it.
> > > 
> > >* gcc.dg/pr109392.c: New test.
> > > 
> > > --- gcc/tree-vect-generic.cc.jj2023-03-23 10:02:18.997935620 +0100
> > > +++ gcc/tree-vect-generic.cc2023-04-04 14:28:32.977729134 +0200
> > > @@ -174,7 +174,16 @@ tree_vec_extract (gimple_stmt_iterator *
> > >   opr.resimplify (NULL, follow_all_ssa_edges);
> > >   gimple_seq stmts = NULL;
> > >   tree res = maybe_push_res_to_seq (, );
> > > -  gcc_assert (res);
> > > +  if (!res)
> > > +{
> > > +  /* This can happen if SSA_NAME_OCCURS_IN_ABNORMAL_PHI are
> > > + used.  Build BIT_FIELD_REF manually otherwise.  */
> > > +  t = build3 (BIT_FIELD_REF, type, t, bitsize, bitpos);
> > > +  res = make_ssa_name (type);
> > > +  gimple *g = gimple_build_assign (res, t);
> > > +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > > +  return res;
> > > +}
> > >   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> > >   return res;
> > > }
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] fix compatability typos

2023-04-11 Thread Reini Urban via Gcc-patches
gcc/Changelog:

  * gcc/config/arm/arm.cc(arm_print_asm_arch_directives): fix compatability
typo
  * gcc/fortran/gfortran.texi(intrinsics): fix compatability typo
  * gcc/fortran/invoke.texi(-fdec-math): fix compatability typo
  * gcc/m2/gm2-compiler/M2Base.def(CannotCheckTypeInPass3): fix
compatability typo in comment
  * gcc/m2/gm2-compiler/M2Base.mod: likewise
  * gcc/po/sr.po: fix compatability typo in config/darwin.cc string
  * gcc/testsuite/gm2/pim/pass/opaquetype.def: fix compatability typo
in comment
  * gcc/testsuite/gm2/switches/pim4/run/pass/modulus.mod: fix
compatability typo in comment
  * gcc/testsuite/objc-obj-c++-shared/GNUStep/Foundation/NSGeometry.h:
fix compatability typo in comment

libgm2/Changelog:

  * libgm2/configure: fix compatability typo
  * libgm2/configure.ac: likewise
---
 gcc/config/arm/arm.cc | 2 +-
 gcc/fortran/gfortran.texi | 2 +-
 gcc/fortran/invoke.texi   | 2 +-
 gcc/m2/gm2-compiler/M2Base.def| 2 +-
 gcc/m2/gm2-compiler/M2Base.mod| 2 +-
 gcc/po/sr.po  | 4 ++--
 gcc/testsuite/gm2/pim/pass/opaquetype.def | 2 +-
 gcc/testsuite/gm2/switches/pim4/run/pass/modulus.mod  | 2 +-
 .../objc-obj-c++-shared/GNUStep/Foundation/NSGeometry.h   | 2 +-
 libgm2/configure  | 4 ++--
 libgm2/configure.ac   | 2 +-
 11 files changed, 13 insertions(+), 13 deletions(-)

diff --git gcc/config/arm/arm.cc gcc/config/arm/arm.cc
index bf7ff9a9704..35e67511788 100644
--- gcc/config/arm/arm.cc
+++ gcc/config/arm/arm.cc
@@ -28534,7 +28534,7 @@ arm_print_asm_arch_directives (FILE *stream,
cl_target_option *targ_options)

   if (strcmp (build_target.arch_name, "armv7ve") == 0)
 {
-  /* Keep backward compatability for assemblers which don't support
+  /* Keep backward compatibility for assemblers which don't support
  armv7ve.  Fortunately, none of the following extensions are reset
  by a .fpu directive.  */
   asm_fprintf (stream, "\t.arch armv7-a\n");
diff --git gcc/fortran/gfortran.texi gcc/fortran/gfortran.texi
index b96712987e1..1509e2cf299 100644
--- gcc/fortran/gfortran.texi
+++ gcc/fortran/gfortran.texi
@@ -2323,7 +2323,7 @@ endsubroutine
 @cindex intrinsics, trigonometric functions

 GNU Fortran supports an extended list of mathematical intrinsics with the
-compile flag @option{-fdec-math} for compatability with legacy code.
+compile flag @option{-fdec-math} for compatibility with legacy code.
 These intrinsics are described fully in @ref{Intrinsic Procedures} where
it is
 noted that they are extensions and should be avoided whenever possible.

diff --git gcc/fortran/invoke.texi gcc/fortran/invoke.texi
index 38150b1e29e..6e2aa7de85d 100644
--- gcc/fortran/invoke.texi
+++ gcc/fortran/invoke.texi
@@ -302,7 +302,7 @@ JIAND, etc...). For a complete list of intrinsics see
the full documentation.
 @opindex @code{fdec-math}
 @item -fdec-math
 Enable legacy math intrinsics such as COTAN and degree-valued trigonometric
-functions (e.g. TAND, ATAND, etc...) for compatability with older code.
+functions (e.g. TAND, ATAND, etc...) for compatibility with older code.

 @opindex @code{fdec-static}
 @item -fdec-static
diff --git gcc/m2/gm2-compiler/M2Base.def gcc/m2/gm2-compiler/M2Base.def
index 14e79e7ae00..5fab1d57451 100644
--- gcc/m2/gm2-compiler/M2Base.def
+++ gcc/m2/gm2-compiler/M2Base.def
@@ -348,7 +348,7 @@ PROCEDURE CannotCheckTypeInPass3 (e: CARDINAL) :
BOOLEAN ;
 (*
MixTypes - returns the type symbol that corresponds to the types t1 and
t2.
   NearTok is used to identify the source position if a type
-  incompatability occurs.
+  incompatibility occurs.
 *)

 PROCEDURE MixTypes (t1, t2: CARDINAL; NearTok: CARDINAL) : CARDINAL ;
diff --git gcc/m2/gm2-compiler/M2Base.mod gcc/m2/gm2-compiler/M2Base.mod
index cc3aa4cc961..7610eea39d6 100644
--- gcc/m2/gm2-compiler/M2Base.mod
+++ gcc/m2/gm2-compiler/M2Base.mod
@@ -2008,7 +2008,7 @@ END MixMetaTypes ;
MixTypes - given types, t1 and t2, returns a type symbol that
   provides expression type compatibility.
   NearTok is used to identify the source position if a type
-  incompatability occurs.
+  incompatibility occurs.
 *)

 PROCEDURE MixTypes (t1, t2: CARDINAL; NearTok: CARDINAL) : CARDINAL ;
diff --git gcc/po/sr.po gcc/po/sr.po
index a78221c999a..6966c0c0dcc 100644
--- gcc/po/sr.po
+++ gcc/po/sr.po
@@ -40333,13 +40333,13 @@ msgstr ""

 #: config/darwin.cc:2116
 #, fuzzy, gcc-internal-format
-#| msgid "%<%s%> 2.95 vtable-compatability attribute applies only when
compiling a kext"
+#| msgid "%<%s%> 2.95 vtable-compatibility attribute applies only when
compiling a kext"
 msgid "%qE 2.95 

Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread juzhe.zh...@rivai.ai
Explicit sets multiple VNx1SImode with multiple dest operand and let RA to 
assign them with continguous regsiters
will make RTL patterns in RVV hard to maintain.

Assume we have a new pattern flag to tell RA assign continguous registers for 
multiple dest operand, and RA can handle this:
in RVV, we have NF = 2 ~ 8
Then we need to define RTL pattern for "vlseg" as follows:
NF = 2:
define_insn "vlseg2"
[(parallel_with_continguous_reg
  (set dest operand 0)
   (set dest operand 1)...])

NF = 3:
define_insn "vlseg3"
[(parallel_with_continguous_reg
  (set dest operand 0)
   (set dest operand 1)
(set dest operand 2)...])
...


NF = 7:
define_insn "vlseg7"
[(parallel_with_continguous_reg
  (set dest operand 0)
   (set dest operand 1)
(set dest operand 2)
(set dest operand 2)
(set dest operand 2)...])



juzhe.zh...@rivai.ai
 
From: Jakub Jelinek
Date: 2023-04-11 18:11
To: juzhe.zh...@rivai.ai
CC: jeffreyalaw; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther; 
Vladimir Makarov
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
On Tue, Apr 11, 2023 at 05:46:15PM +0800, juzhe.zh...@rivai.ai wrote:
> I am not sure whether aggregate type without a tuple mode can work for us.
> Here is the example:
> 
> We already had a vector type "vint8mf8_t", the corresponding mode is 
> VNx1SImode.
> 
> Now we have an intrinsic as following:
> vint8mf8x2_t test_vlseg2e8_v_i8mf8(const int8_t *base, size_t vl) {
>   return __riscv_vlseg2e8_v_i8mf8(base, vl);
> }
> 
> This intrinsic is suppose generate a "vlseg2e8.v" instructions and dest 
> operand of the intrinsic should be 2 continguous registers.
> 
> Another intrinsic:
>  vint8mf8x3_t test_vlseg3e8_v_i8mf8(const int8_t *base, size_t vl) {
>   return __riscv_vlseg3e8_v_i8mf8(base, vl);
> }
> 
> This intrinsic is suppose generate a "vlseg3e8.v" instructions and dest 
> operand of the intrinsic should be 3 continguous registers.
> 
> Now, my plan is to build_array_type for both "vint8mf8x2_t" and 
> "vint8mf8x3_t" and make their TYPE_MODE is "VNx2x1SI" and "VNx3x1SI" 
> corresponding like ARM SVE.
> Then define the RTL pattern which has dest operand is a register_operand with 
> mode  "VNx2x1SI" and "VNx3x1SI". Then we can do the codegen.
 
Another possibility would be just make it explicit in the RTL that it sets 3
VNx1SI mode REGs rather than one, as long as there is some way to tell RA
that they need to be consecutive.  CCing Vlad on that.
 
Jakub
 
 


Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw  writes:
> On 11/04/2023 10:46, Richard Sandiford via Gcc-patches wrote:
>>  writes:
>>> ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
>>> As far as I known, they don't have tuple type for partial vector.
>> 
>> Yeah, there are no separate types for partial vectors, but there
>> are separate modes.  E.g. VNx2QI is a partial vector of QIs,
>> with each QI stored in a 64-bit container.
>> 
>> I agree with all the comments about the danger of growing the number of
>> modes too much.  But it looks like rtx_def should be easy to rearrange.
>> Unless I'm missing something, there are less than 256 rtx codes at
>> present.  So one simple option would be to make the code 8 bits and
>> the machine_mode 16 bits (and swap them, so that they stay well-aligned
>> wrt their size).
>> 
>> That of course would create new problem if we want more than 256 codes
>> in future.  But then there would be the option of a non-power-of-2
>> split (12/12 or whatever).  Also, it's possible to multiplex operations
>> into a single code by adding an extra operand, whereas it's harder to
>> multiplex modes.
>> 
>> Thanks,
>> Richard
>
> The rtx code and mode are both accessed quite frequently, making them 
> non-native machine sizes might have impact on the performance of 
> accessing the fields.

Yeah, that's why I suggested that having a subcode operand would be an
alternative to abandoning non-power-of-2 sizes.  It seems unlikely that
any new codes we add now will be so frequently used that an extra
operand would be a problem in terms of either size or speed.

Having a subcode operand would be very much UNSPECs today.

But as it is, we've added 9 new rtx codes in the last 10 years.
So even with 203 at present, with the current rate of expansion,
it would be at least the 2070s before this becomes an issue.

Richard



Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 11, 2023 at 05:46:15PM +0800, juzhe.zh...@rivai.ai wrote:
> I am not sure whether aggregate type without a tuple mode can work for us.
> Here is the example:
> 
> We already had a vector type "vint8mf8_t", the corresponding mode is 
> VNx1SImode.
> 
> Now we have an intrinsic as following:
> vint8mf8x2_t test_vlseg2e8_v_i8mf8(const int8_t *base, size_t vl) {
>   return __riscv_vlseg2e8_v_i8mf8(base, vl);
> }
> 
> This intrinsic is suppose generate a "vlseg2e8.v" instructions and dest 
> operand of the intrinsic should be 2 continguous registers.
> 
> Another intrinsic:
>  vint8mf8x3_t test_vlseg3e8_v_i8mf8(const int8_t *base, size_t vl) {
>   return __riscv_vlseg3e8_v_i8mf8(base, vl);
> }
> 
> This intrinsic is suppose generate a "vlseg3e8.v" instructions and dest 
> operand of the intrinsic should be 3 continguous registers.
> 
> Now, my plan is to build_array_type for both "vint8mf8x2_t" and 
> "vint8mf8x3_t" and make their TYPE_MODE is "VNx2x1SI" and "VNx3x1SI" 
> corresponding like ARM SVE.
> Then define the RTL pattern which has dest operand is a register_operand with 
> mode  "VNx2x1SI" and "VNx3x1SI". Then we can do the codegen.

Another possibility would be just make it explicit in the RTL that it sets 3
VNx1SI mode REGs rather than one, as long as there is some way to tell RA
that they need to be consecutive.  CCing Vlad on that.

Jakub



Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread juzhe.zh...@rivai.ai
May RTX code grow faster than machine mode ?
Since RTX code grows target independent wheras 
machine mode grows target dependent.

In the future, we may easily have more and more targets that some target may 
have a lot of machine mode.
Maybe Richard Sandiford suggestion is a good idea to fix it?

Thanks for all comments.


juzhe.zh...@rivai.ai
 
From: Jakub Jelinek
Date: 2023-04-11 17:59
To: juzhe.zhong; Jeff Law; gcc-patches; kito.cheng; palmer; rguenther; 
richard.sandiford
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
On Tue, Apr 11, 2023 at 10:46:25AM +0100, Richard Sandiford wrote:
> I agree with all the comments about the danger of growing the number of
> modes too much.  But it looks like rtx_def should be easy to rearrange.
> Unless I'm missing something, there are less than 256 rtx codes at
> present.  So one simple option would be to make the code 8 bits and
> the machine_mode 16 bits (and swap them, so that they stay well-aligned
> wrt their size).
> 
> That of course would create new problem if we want more than 256 codes
> in future.  But then there would be the option of a non-power-of-2
> split (12/12 or whatever).  Also, it's possible to multiplex operations
> into a single code by adding an extra operand, whereas it's harder to
> multiplex modes.
 
We have 151 rtx codes if not a generator, 201 otherwise.  That is closer to
the limit except for the RISCV proposed changes.
 
Jakub
 
 


Re: [PATCH v2][RFC] vect: Verify that GET_MODE_NUNITS is greater than one for vect_grouped_store_supported

2023-04-11 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Mon, Mar 27, 2023 at 6:02 PM Kevin Lee  wrote:
>>
>> This patch is a proper fix to the previous patch
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614463.html
>> vect_grouped_store_supported checks if the count is a power of 2, but
>> doesn't check the size of the GET_MODE_NUNITS.
>> This should handle the riscv case where the mode is VNx1DI since the
>> nelt would be {1, 1}.
>> It was tested on RISCV and x86_64-linux-gnu. Would this be correct
>> for the vectors with size smaller than 2?
>>
>> ---
>>  gcc/tree-vect-data-refs.cc | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
>> index 8daf7bd7dd3..04ad12f7d04 100644
>> --- a/gcc/tree-vect-data-refs.cc
>> +++ b/gcc/tree-vect-data-refs.cc
>> @@ -5399,6 +5399,8 @@ vect_grouped_store_supported (tree vectype, unsigned 
>> HOST_WIDE_INT count)
>>   poly_uint64 nelt = GET_MODE_NUNITS (mode);
>>
>>   /* The encoding has 2 interleaved stepped patterns.  */
>> +if(!nelt.is_constant() && maybe_lt(nelt, (unsigned int) 2))
>> +  return false;
>
> Indentation is off (or your MUA is broken).  I think the nelt.is_constant ()
> check is superfluous but with constant nelt we'd never end up with a
> grouped store.
>
> Note the calls are guarded with
>
>  && ! known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U)
>
> maybe the better fix is to change those to ! maybe_eq?

I think the point of those checks is that a grouped store of N 1-element
vectors is equivalent to a store of N scalars.  Nothing needs to happen
internally within the vectors.

For a grouped store of VNx1 vectors, some permutation would be needed.
But it's difficult to generate code for that case, because the minimum
size reduces to two scalars while larger sizes need normal interleaves.

But I think the better check for location above is:

   if (!multiple_p (nelt, 2))
 return false;

which then guards the assert in the later exact_div (nelt, 2).

Thanks,
Richard


Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Richard Earnshaw via Gcc-patches




On 11/04/2023 10:46, Richard Sandiford via Gcc-patches wrote:

 writes:

ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
As far as I known, they don't have tuple type for partial vector.


Yeah, there are no separate types for partial vectors, but there
are separate modes.  E.g. VNx2QI is a partial vector of QIs,
with each QI stored in a 64-bit container.

I agree with all the comments about the danger of growing the number of
modes too much.  But it looks like rtx_def should be easy to rearrange.
Unless I'm missing something, there are less than 256 rtx codes at
present.  So one simple option would be to make the code 8 bits and
the machine_mode 16 bits (and swap them, so that they stay well-aligned
wrt their size).

That of course would create new problem if we want more than 256 codes
in future.  But then there would be the option of a non-power-of-2
split (12/12 or whatever).  Also, it's possible to multiplex operations
into a single code by adding an extra operand, whereas it's harder to
multiplex modes.

Thanks,
Richard


The rtx code and mode are both accessed quite frequently, making them 
non-native machine sizes might have impact on the performance of 
accessing the fields.


Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 11, 2023 at 10:46:25AM +0100, Richard Sandiford wrote:
> I agree with all the comments about the danger of growing the number of
> modes too much.  But it looks like rtx_def should be easy to rearrange.
> Unless I'm missing something, there are less than 256 rtx codes at
> present.  So one simple option would be to make the code 8 bits and
> the machine_mode 16 bits (and swap them, so that they stay well-aligned
> wrt their size).
> 
> That of course would create new problem if we want more than 256 codes
> in future.  But then there would be the option of a non-power-of-2
> split (12/12 or whatever).  Also, it's possible to multiplex operations
> into a single code by adding an extra operand, whereas it's harder to
> multiplex modes.

We have 151 rtx codes if not a generator, 201 otherwise.  That is closer to
the limit except for the RISCV proposed changes.

Jakub



Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Richard Sandiford via Gcc-patches
 writes:
> ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
> As far as I known, they don't have tuple type for partial vector.

Yeah, there are no separate types for partial vectors, but there
are separate modes.  E.g. VNx2QI is a partial vector of QIs,
with each QI stored in a 64-bit container.

I agree with all the comments about the danger of growing the number of
modes too much.  But it looks like rtx_def should be easy to rearrange.
Unless I'm missing something, there are less than 256 rtx codes at
present.  So one simple option would be to make the code 8 bits and
the machine_mode 16 bits (and swap them, so that they stay well-aligned
wrt their size).

That of course would create new problem if we want more than 256 codes
in future.  But then there would be the option of a non-power-of-2
split (12/12 or whatever).  Also, it's possible to multiplex operations
into a single code by adding an extra operand, whereas it's harder to
multiplex modes.

Thanks,
Richard


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread juzhe.zh...@rivai.ai
I am not sure whether aggregate type without a tuple mode can work for us.
Here is the example:

We already had a vector type "vint8mf8_t", the corresponding mode is VNx1SImode.

Now we have an intrinsic as following:
vint8mf8x2_t test_vlseg2e8_v_i8mf8(const int8_t *base, size_t vl) {
  return __riscv_vlseg2e8_v_i8mf8(base, vl);
}

This intrinsic is suppose generate a "vlseg2e8.v" instructions and dest operand 
of the intrinsic should be 2 continguous registers.

Another intrinsic:
 vint8mf8x3_t test_vlseg3e8_v_i8mf8(const int8_t *base, size_t vl) {
  return __riscv_vlseg3e8_v_i8mf8(base, vl);
}

This intrinsic is suppose generate a "vlseg3e8.v" instructions and dest operand 
of the intrinsic should be 3 continguous registers.

Now, my plan is to build_array_type for both "vint8mf8x2_t" and "vint8mf8x3_t" 
and make their TYPE_MODE is "VNx2x1SI" and "VNx3x1SI" corresponding like ARM 
SVE.
Then define the RTL pattern which has dest operand is a register_operand with 
mode  "VNx2x1SI" and "VNx3x1SI". Then we can do the codegen.

If we don't have a mode for "vint8mf8x2_t" and "vint8mf8x3_t", I don't known 
how to define such instruction RTL pattern. Should its dest operand mode be 
BLKmode?
But we want the dest operand is a register operand.



juzhe.zh...@rivai.ai
 
From: Jakub Jelinek
Date: 2023-04-11 17:16
To: juzhe.zhong
CC: Jeff Law; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
On Mon, Apr 10, 2023 at 11:14:46PM +0800, juzhe.zh...@rivai.ai wrote:
> ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
> As far as I known, they don't have tuple type for partial vector.
> However, for RVV not only has vint8m1_t, vint8m1x2_t, vint8m1x3_t, 
> vint8m1x4_t, vint8m1x5_t, vint8m1x6_t, vint8m1x7_t, vint8m1x8_t
> 
> But also, we have vint8mf8_t, vint8mf8x2_t, vint8mf8x3_t, 
> vint8mf8x4_t, vint8mf8x5_t, vint8mf8x6_t, vint8mf8x7_t, vint8mf8x8_t
> 
> vint8mf4_t, vint8mf4x2_t, vint8mf4x3_t, 
> vint8mf4x4_t, vint8mf4x5_t, vint8mf4x6_t, vint8mf4x7_t, vint8mf4x8_t
> 
> etc
> 
> So many tuple types.
 
Do all of them need their own mode?  I mean, can't you instead use say some
backend aggregate types which act like homogenous aggregates in various
backends?
Modes are needed for something that can appear in instructions, for
something that can be lowered say during expansion at latest you don't
need special modes.  I admit I don't know much about RVV, but if those
tuples are to be handled as configure the CPU for certain vector length,
perform some instruction on effectively variable length vector with certain
element and then reconfigure the CPU again for something else, couldn't
the only vector modes there be the variable length ones?
 
Jakub
 
 


RE: [PATCH] aarch64: Add the cost and scheduling models for Neoverse N1

2023-04-11 Thread Kyrylo Tkachov via Gcc-patches
Hi Evandro,

> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Evandro
> Menezes via Gcc-patches
> Sent: Friday, April 7, 2023 11:34 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Evandro Menezes ; Richard Sandiford
> 
> Subject: [PATCH] aarch64: Add the cost and scheduling models for Neoverse
> N1
> 
> This patch adds the cost and scheduling models for Neoverse N1, based on
> the information from the "Arm Neoverse N1 Software Optimization Guide”.
> 

Thank you for working on this. It is true that we haven't added any scheduling 
models for big cores from Arm for quite a while.
How has this patch been tested and benchmarked?
Using numbers from the Software Optimization Guide is certainly the way to go, 
but we need to ensure that the way GCC uses them actually results in better 
performance in practice.

> --
> Evandro Menezes ◊ evan...@yahoo.com
> 
> [PATCH] aarch64: Add the cost and scheduling models for Neoverse N1
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-cores.def:
> Use the Neoverse N1 scheduling and cost models, but only for itself.
>   * config/aarch64/aarch64.cc
> (cortexa76_tunings): Rename variable.
>   (neoversen1_addrcost_table): New variable.
>   (neoversen1_vector_cost): Likewise.
>   (neoversen1_regmove_cost): Likewise.
>   (neoversen1_advsimd_vector_cost): Likewise.
>   (neoversen1_scalar_issue_info): Likewise.
>   (neoversen1_advsimd_issue_info): Likewise.
>   (neoversen1_vec_issue_info): Likewise.
>   (neoversen1_vector_cost): Likewise.
>   (neoversen1_tunings): Likewise.
>   * config/aarch64/aarch64.md: Include `neoverse-n1.md`.
>   * config/aarch64/neoverse-n1.md: New file.
>   * gcc/config/arm/aarch-cost-tables.h
>   (neoversen1_extra_costs): New variable.
> 
> Signed-off-by: Evandro Menezes 
> 
> ---
>  gcc/config/aarch64/aarch64-cores.def |  22 +-
>  gcc/config/aarch64/aarch64.cc| 155 +-
>  gcc/config/aarch64/aarch64.md|   1 +
>  gcc/config/aarch64/neoverse-n1.md| 716 +++
>  gcc/config/arm/aarch-cost-tables.h   | 107 
>  5 files changed, 977 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/config/aarch64/neoverse-n1.md
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def
> b/gcc/config/aarch64/aarch64-cores.def
> index 2ec88c98400..cc842c4e22c 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -105,18 +105,18 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,
> thunderx2t99, V8_1A,  (CRYPTO), thu
>  /* ARM ('A') cores. */
>  AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, V8_2A,  (F16, RCPC,
> DOTPROD), cortexa53, 0x41, 0xd05, -1)
>  AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD), cortexa73, 0x41, 0xd0a, -1)
> -AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD), neoversen1, 0x41, 0xd0b, -1)
> -AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16,
> RCPC, DOTPROD, SSBS), neoversen1, 0x41, 0xd0e, -1)
> -AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS), neoversen1, 0x41, 0xd0d, -1)
> -AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd41, -1)
> -AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16,
> RCPC, DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd42, -1)
> -AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), neoversen1, 0x41, 0xd4b, -1)
> +AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD), cortexa76, 0x41, 0xd0b, -1)
> +AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16,
> RCPC, DOTPROD, SSBS), cortexa76, 0x41, 0xd0e, -1)
> +AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS), cortexa76, 0x41, 0xd0d, -1)
> +AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd41, -1)
> +AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16,
> RCPC, DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd42, -1)
> +AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), cortexa76, 0x41, 0xd4b, -1)
>  AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS), cortexa73, 0x41, 0xd06, -1)
>  AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS), cortexa73, 0x41, 0xd43, -1)
> -AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd44, -1)
> -AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
> -AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD,
> PROFILE), neoversen1, 0x41, 0xd0c, 

Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-11 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2023/4/11 17:14, guojiufu wrote:
> Hi Kewen,
> 
> Thanks a lot for your very helpful comments!
> 
> On 2023-04-10 17:26, Kewen.Lin wrote:
>> Hi Jeff,
>>
>> on 2023/4/10 10:09, Jiufu Guo via Gcc-patches wrote:
>>> Hi,
>>>
>>> In this test case (float128-cmp2-runnable.c), the instruction
>>> xscmpexpqp is used to support a few builtins e.g.
>>> __builtin_vsx_scalar_cmp_exp_qp_eq on _Float128.
>>> This instruction handles the whole 128bits of the vector, and
>>> it is guarded by [ieee128-hw].
>>
>> The instruction xscmpexpqp is guarded with TARGET_P9_VECTOR,
>>
>> (define_insn "*xscmpexpqp"
>>   [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
>> (compare:CCFP
>>  (unspec:IEEE128 [(match_operand:IEEE128 1 "altivec_register_operand" 
>> "v")
>>   (match_operand:IEEE128 2 "altivec_register_operand" "v")]
>>   UNSPEC_VSX_SCMPEXPQP)
>>  (match_operand:SI 3 "zero_constant" "j")))]
>>   "TARGET_P9_VECTOR"
>>   "xscmpexpqp %0,%1,%2"
>>   [(set_attr "type" "fpcompare")])
>>
>> [ieee128-hw] is used for guarding those bifs, so the above
>> statement doesn't quite match the fact.
>>
> 
> Agree, I'm wondering if P9_VECTOR is perfect here, even if it indicates the 
> ISA
> which contains xscmpexpqp. Let me have more checks.
> 
>> PR108758 said this case doesn't fail with gcc-10 and gcc-11,
>> I wonder why it changes from gcc-12?  The above define_insn
>> shows the underlying insns for these bifs just requires the
>> condition power9-vector.  Could you have a further check?
>> Thanks.
> 
> Thanks for raising this concern.
> The behavior to check about bif on FLOAT128_HW and emit an error message for
> requirements on quad-precision is added in gcc12. This is why gcc12 fails to
> compile the case on -m32.
> 
> Before gcc12, altivec_resolve_overloaded_builtin will return the overloaded
> result directly, and does not check more about the result function.

Thanks for checking, I wonder which commit caused this behavior change and 
what's
the underlying justification?  I know there is one new bif handling framework
introduced in gcc12, not sure the checking condition was changed together or by
a standalone commit.  Anyway, apparently the conditions for the support of these
bifs are different on gcc-11 and gcc-12, I wonder why it changed.  As mentioned
above, PR108758's c#1 said this case (bifs) work well on gcc-11, I suspected the
condition change was an overkill, that's why I asked.

BR,
Kewen

> 
>>
>> btw, please add a PR marker for PR108758.
> 
> Sure,  thanks for catching this!
> 
> 
> BR,
> Jeff (Jiufu)
> 
>>
>> BR,
>> Kewen
>>
>>> So, we may update the testcase to require ppc_float128_hw.
>>>
>>> Tested on ppc64 both BE and LE.
>>> Is this ok for trunk?
>>>
>>> BR,
>>> Jeff (Jiufu)
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/powerpc/float128-cmp2-runnable.c: Update requires.
>>>
>>> ---
>>>  gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c 
>>> b/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
>>> index d376a3ca68e..91287c0fb7a 100644
>>> --- a/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-do run } */
>>> -/* { dg-require-effective-target ppc_float128_sw } */
>>> +/* { dg-require-effective-target ppc_float128_hw } */
>>>  /* { dg-require-effective-target p9vector_hw } */
>>>  /* { dg-options "-O2 -mdejagnu-cpu=power9 " } */
>>>


Re: [PATCH v2][RFC] vect: Verify that GET_MODE_NUNITS is greater than one for vect_grouped_store_supported

2023-04-11 Thread Richard Biener via Gcc-patches
On Mon, Mar 27, 2023 at 6:02 PM Kevin Lee  wrote:
>
> This patch is a proper fix to the previous patch
> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614463.html
> vect_grouped_store_supported checks if the count is a power of 2, but
> doesn't check the size of the GET_MODE_NUNITS.
> This should handle the riscv case where the mode is VNx1DI since the
> nelt would be {1, 1}.
> It was tested on RISCV and x86_64-linux-gnu. Would this be correct
> for the vectors with size smaller than 2?
>
> ---
>  gcc/tree-vect-data-refs.cc | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 8daf7bd7dd3..04ad12f7d04 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -5399,6 +5399,8 @@ vect_grouped_store_supported (tree vectype, unsigned 
> HOST_WIDE_INT count)
>   poly_uint64 nelt = GET_MODE_NUNITS (mode);
>
>   /* The encoding has 2 interleaved stepped patterns.  */
> +if(!nelt.is_constant() && maybe_lt(nelt, (unsigned int) 2))
> +  return false;

Indentation is off (or your MUA is broken).  I think the nelt.is_constant ()
check is superfluous but with constant nelt we'd never end up with a
grouped store.

Note the calls are guarded with

 && ! known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U)

maybe the better fix is to change those to ! maybe_eq?

Richard should know best here.

Richard.

>   vec_perm_builder sel (nelt, 2, 3);
>   sel.quick_grow (6);
>   for (i = 0; i < 3; i++)
> --
> 2.25.1
>


Re: [PATCH] gcov: add info about "calls" to JSON output format

2023-04-11 Thread Richard Biener via Gcc-patches
On Thu, Apr 6, 2023 at 3:58 PM Martin Liška  wrote:
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed after stage1 opens?

Did we release a compiler with version 1?  If not we might want to sneak
this in before 13.1 ...

Up to Honza.

Thanks,
Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * doc/gcov.texi: Document the new "calls" field and document
> the API bump.
> * gcov.cc (output_intermediate_json_line): Output info about
> calls.
> (generate_results): Bump version to 2.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/gcov/gcov-17.C: Add call to a noreturn function.
> * g++.dg/gcov/test-gcov-17.py: Cover new format.
> * lib/gcov.exp: Add options for gcov that emit the extra info.
> ---
>  gcc/doc/gcov.texi | 27 +--
>  gcc/gcov.cc   | 12 +-
>  gcc/testsuite/g++.dg/gcov/gcov-17.C   |  7 ++
>  gcc/testsuite/g++.dg/gcov/test-gcov-17.py | 17 ++
>  gcc/testsuite/lib/gcov.exp|  2 +-
>  5 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
> index d39cce3a683..6739ebb3643 100644
> --- a/gcc/doc/gcov.texi
> +++ b/gcc/doc/gcov.texi
> @@ -195,7 +195,7 @@ Structure of the JSON is following:
>  @{
>"current_working_directory": "foo/bar",
>"data_file": "a.out",
> -  "format_version": "1",
> +  "format_version": "2",
>"gcc_version": "11.1.1 20210510"
>"files": ["$file"]
>  @}
> @@ -214,6 +214,12 @@ a compilation unit was compiled
>  @item
>  @var{format_version}: semantic version of the format
>
> +Changes in version @emph{2}:
> +@itemize @bullet
> +@item
> +@var{calls}: information about function calls is added
> +@end itemize
> +
>  @item
>  @var{gcc_version}: version of the GCC compiler
>  @end itemize
> @@ -292,6 +298,7 @@ Each @var{line} has the following form:
>  @smallexample
>  @{
>"branches": ["$branch"],
> +  "calls": ["$call"],
>"count": 2,
>"line_number": 15,
>"unexecuted_block": false,
> @@ -299,7 +306,7 @@ Each @var{line} has the following form:
>  @}
>  @end smallexample
>
> -Branches are present only with @var{-b} option.
> +Branches and calls are present only with @var{-b} option.
>  Fields of the @var{line} element have following semantics:
>
>  @itemize @bullet
> @@ -341,6 +348,22 @@ Fields of the @var{branch} element have following 
> semantics:
>  @var{throw}: true when the branch is an exceptional branch
>  @end itemize
>
> +Each @var{call} has the following form:
> +
> +@smallexample
> +@{
> +  "returned": 11,
> +@}
> +@end smallexample
> +
> +Fields of the @var{call} element have following semantics:
> +
> +@itemize @bullet
> +@item
> +@var{returned}: number of times a function call returned (call count is equal
> +to @var{line::count})
> +@end itemize
> +
>  @item -H
>  @itemx --human-readable
>  Write counts in human readable format (like 24.6k).
> diff --git a/gcc/gcov.cc b/gcc/gcov.cc
> index 2ec7248cc0e..88324143640 100644
> --- a/gcc/gcov.cc
> +++ b/gcc/gcov.cc
> @@ -1116,6 +1116,9 @@ output_intermediate_json_line (json::array *object,
>json::array *branches = new json::array ();
>lineo->set ("branches", branches);
>
> +  json::array *calls = new json::array ();
> +  lineo->set ("calls", calls);
> +
>vector::const_iterator it;
>if (flag_branches)
>  for (it = line->branches.begin (); it != line->branches.end ();
> @@ -1130,6 +1133,13 @@ output_intermediate_json_line (json::array *object,
>  new json::literal ((*it)->fall_through));
> branches->append (branch);
>   }
> +   else if ((*it)->is_call_non_return)
> + {
> +   json::object *call = new json::object ();
> +   gcov_type returns = (*it)->src->count - (*it)->count;
> +   call->set ("returned", new json::integer_number (returns));
> +   calls->append (call);
> + }
>}
>
>object->append (lineo);
> @@ -1523,7 +1533,7 @@ generate_results (const char *file_name)
>gcov_intermediate_filename = get_gcov_intermediate_filename (file_name);
>
>json::object *root = new json::object ();
> -  root->set ("format_version", new json::string ("1"));
> +  root->set ("format_version", new json::string ("2"));
>root->set ("gcc_version", new json::string (version_string));
>
>if (bbg_cwd != NULL)
> diff --git a/gcc/testsuite/g++.dg/gcov/gcov-17.C 
> b/gcc/testsuite/g++.dg/gcov/gcov-17.C
> index d11883cfd39..efe019599a5 100644
> --- a/gcc/testsuite/g++.dg/gcov/gcov-17.C
> +++ b/gcc/testsuite/g++.dg/gcov/gcov-17.C
> @@ -15,6 +15,11 @@ private:
>  template class Foo;
>  template class Foo;
>
> +static void noret()
> +{
> +  __builtin_exit (0);
> +}
> +
>  int
>  main (void)
>  {
> @@ -34,6 +39,8 @@ main (void)
>  __builtin_printf ("Failure\n");
>else
>  __builtin_printf ("Success\n");
> +
> +  

Re: [PATCH] PR tree-optimization/109417 - Check if dependency is valid before using in may_recompute_p.

2023-04-11 Thread Richard Biener via Gcc-patches
On Wed, Apr 5, 2023 at 11:53 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 4/5/23 14:10, Andrew MacLeod via Gcc-patches wrote:
> > When a statement is first processed, any SSA_NAMEs that are dependencies
> > are cached for quick future access.
> >
> > if we ;later rewrite the statement (say propagate a constant into it),
> > its possible the ssa-name in this cache is no longer active.   Normally
> > this is not a problem, but the changed to may_recompute_p forgot to take
> > that into account, and was checking a dependency from the cache that was
> > in the SSA_NAME_FREE_LIST. It thus had no SSA_NAME_DEF_STMT when we were
> > expecting one.
> >
> > This patch simply rejects dependencies from consideration if they are in
> > the free list.
> >
> > Bootstrapping on x86_64-pc-linux-gnu  and presuming no regressio0ns, OK
> > for trunk?
> eek.  So you've got a released name in the cache?  What happens if the
> name gets released, then re-used?  Aren't you going to get bogus results
> in that case?

We never re-use SSA names from within the pass releasing it.  But if
the ranger cache
persists across passes this could be a problem.  See
flush_ssaname_freelist which
for example resets the SCEV hash table which otherwise would have the
same issue.

IIRC ranger never outlives a pass so the patch should be OK.

_But_ I wonder how ranger gets at the tree SSA name in the first place - usually
caches are indexed by SSA_NAME_VERSION (because that's cheaper and
better than a pointer to the tree) and ssa_name (ver) will return NULL
for released
SSA names.  So range_def_chain::rdc could be just

  struct rdc {
   int ssa1;   // First direct dependency
   int ssa2;   // Second direct dependency
   bitmap bm;   // All dependencies
   bitmap m_import;
  };

and ::depend1 return ssa_name (m_def_chain[v].ssa1) and everything would
work automatically (and save 8 bytes of storage).

Richard.

> jeff


Re: Re: [PATCH] RISC-V: Fix PR108279

2023-04-11 Thread juzhe.zh...@rivai.ai
No, we can only pass "available" to LCM.
Passing "compatible" to LCM can not work for us.

LCM can only help for eliminate vsetvls can not help for fuse vsetvls.

For example:

bb 0:
vsetvl e8,mf8
vadd (Demand SEW = 8, LMUL = MF8)
bb 1:
vsetvl e32 mf2
vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
e16,mf4 or e32,mf2 or e64,m1])

I use LCM to handle the case above, I tell LCM that the vsetvl of vadd is 
"available" for the following "vle" instruction.
Then LCM will let us to remove "vsetvl e32mf2"
This is what I said "available" case that I use LCM to handle.

However, LCM can not handle "compatible" case. Here is the example:

Loop:
{
bb 0:
vsetvl e8,mf8,TA
vadd (Demand SEW = 8, LMUL = MF8, can either TU or TA)
bb 1:
vsetvl e32 mf2,TU
vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or 
e16,mf4 or e32,mf2 or e64,m1], and demand TU)
}
It's obvious that neither "vsetvl e8,mf8,TA" nor "vsetvl e32 mf2,TU" are 
available for both instructions "vadd" and "vle".
That's why we need Phase 3 in VSETVL PASS.
I do the demand fusion generate a new vsetvl instructions "vsetvl e8,mf8,TU" 
which is available for both RVV instructions "vadd" and "vle", 
and update the first vsetvl "vsetvl e8,mf8,TA" to  "vsetvl e8,mf8,TU"

Then, I tell LCM "vsetvl e8,mf8,TU" is available for both "vadd" and "vle32", 
so LCM will hoist "vsetvl e8,mf8,TU" outside the LOOP
and remove all vsetvls inside the loop.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-04-11 16:55
To: juzhe.zhong
CC: Jeff Law; gcc-patches; kito.cheng; palmer
Subject: Re: Re: [PATCH] RISC-V: Fix PR108279
On Wed, Apr 5, 2023 at 3:53 PM  wrote:
>
> >> So fusion in this context is really about identifying cases where two
> >> configuration settings are equivalent and you "fuse" them together.
> >> Presumably this is only going to be possible when the vector insns are
> >> just doing data movement rather than actual computations?
>
> >> If my understanding is correct, I can kind of see why you're doing
> >> fusion during phase 3.  My sense is there's a better way, but I'm having
> >> a bit of trouble working out the details of what that should be to
> >> myself.  In any event, revamping parts of the vsetvl insertion code
> >> isn't the kind of thing we should be doing now.
>
> The vsetvl demand fusion happens is not necessary "equivalent", instead, we
> call it we will do demand fusion when they are "compatible".
> And the fusion can happen between any vector insns including data movement
> and actual computations.
>
> What is "compatible" ??  This definition is according to RVV ISA.
> For example , For a vadd.vv need a vsetvl zero, 4, e32,m1,ta,ma.
> and a vle.v need a vsetvl zero,4,e8,mf4,ta,ma.
>
> According to RVV ISA:
> vadd.vv demand SEW = 32, LMUL = M1, AVL = 4
> vle.v demand RATIO = SEW/LMUL = 32, AVL = 4.
> So after demand fusion, the demand becomes SEW = 32, LMUL = M1, AVL = 4.
> Such vsetvl instruction is configured as this demand fusion, we call it 
> "compatible"
> since we can find a common vsetvl VL/VTYPE status for both vadd.vv and vle.v
>
> However, what case is not "incompatible", same example, if the vadd.vv demand 
> SEW = 32. LMUL = MF2,
> the vadd.vv is incompatible with vle.v. since we can't find a common VL/VTYPE 
> vsetvl instruction available
> for both of them.
>
> We have local demand fusion which is Phase 1. Local demand fusion is doing 
> the fusion within a block
> And also we have global demand fusion which is Phase 3. Global demand fusion 
> is doing across blocks.
>
> After Phase 1, each block has a single demand fusion. Phase 3 is doing global 
> demand fusion trying to
> find the common VL/VTYPE status available for a bunch of blocks, and fuse 
> them into a single vsetvl.
> So that we eliminate redundant vsetvli.
>
> Here is a example:
>
> bb 0:  (vle.v demand RATIO = 32)
>   /   \
> bb 1  bb 2
>   /  \ /   \
>  bb 3   bb 4   bb 5
>vadd   vmul  vdiv
> (demand  (demand  (demand
>  sew = 8,sew = 8,  sew = 8,
> lmul = mf4)  lmul = mf4,   lmul = mf4,
>   tail policy = tu) mask policy = mu)
>
> So in this case, we should do the global demand fusion for bb 0, bb3, bb 4, 
> bb5.
> since they are compatible according to RVV ISA.
> The final demand info of vsetvl should be vsetvl e8,mf4,tu,mu and put it
> in the bb0. Then we can avoid vsetvl in bb 3, 4, 5.
 
Just to throw in a comment here - I think you should present LCM with
something it
can identify as the same for compatible vsetvl and then it should just
work?  OTOH
if "compatible" is not transitive that's not possible (but then I
can't quickly make up
an example where it wouldn't be).
 
> >> We have more fusion rules according to RVV ISA. Phase 3 (Global demand 
> >> 

Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-11 Thread Jakub Jelinek via Gcc-patches
On Mon, Apr 10, 2023 at 11:14:46PM +0800, juzhe.zh...@rivai.ai wrote:
> ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
> As far as I known, they don't have tuple type for partial vector.
> However, for RVV not only has vint8m1_t, vint8m1x2_t, vint8m1x3_t, 
> vint8m1x4_t, vint8m1x5_t, vint8m1x6_t, vint8m1x7_t, vint8m1x8_t
> 
> But also, we have vint8mf8_t, vint8mf8x2_t, vint8mf8x3_t, 
> vint8mf8x4_t, vint8mf8x5_t, vint8mf8x6_t, vint8mf8x7_t, vint8mf8x8_t
> 
> vint8mf4_t, vint8mf4x2_t, vint8mf4x3_t, 
> vint8mf4x4_t, vint8mf4x5_t, vint8mf4x6_t, vint8mf4x7_t, vint8mf4x8_t
> 
> etc
> 
> So many tuple types.

Do all of them need their own mode?  I mean, can't you instead use say some
backend aggregate types which act like homogenous aggregates in various
backends?
Modes are needed for something that can appear in instructions, for
something that can be lowered say during expansion at latest you don't
need special modes.  I admit I don't know much about RVV, but if those
tuples are to be handled as configure the CPU for certain vector length,
perform some instruction on effectively variable length vector with certain
element and then reconfigure the CPU again for something else, couldn't
the only vector modes there be the variable length ones?

Jakub



Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-11 Thread guojiufu via Gcc-patches

Hi Kewen,

Thanks a lot for your very helpful comments!

On 2023-04-10 17:26, Kewen.Lin wrote:

Hi Jeff,

on 2023/4/10 10:09, Jiufu Guo via Gcc-patches wrote:

Hi,

In this test case (float128-cmp2-runnable.c), the instruction
xscmpexpqp is used to support a few builtins e.g.
__builtin_vsx_scalar_cmp_exp_qp_eq on _Float128.
This instruction handles the whole 128bits of the vector, and
it is guarded by [ieee128-hw].


The instruction xscmpexpqp is guarded with TARGET_P9_VECTOR,

(define_insn "*xscmpexpqp"
  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
(compare:CCFP
	 (unspec:IEEE128 [(match_operand:IEEE128 1 "altivec_register_operand" 
"v")

  (match_operand:IEEE128 2 "altivec_register_operand" 
"v")]
  UNSPEC_VSX_SCMPEXPQP)
 (match_operand:SI 3 "zero_constant" "j")))]
  "TARGET_P9_VECTOR"
  "xscmpexpqp %0,%1,%2"
  [(set_attr "type" "fpcompare")])

[ieee128-hw] is used for guarding those bifs, so the above
statement doesn't quite match the fact.



Agree, I'm wondering if P9_VECTOR is perfect here, even if it indicates 
the ISA

which contains xscmpexpqp. Let me have more checks.


PR108758 said this case doesn't fail with gcc-10 and gcc-11,
I wonder why it changes from gcc-12?  The above define_insn
shows the underlying insns for these bifs just requires the
condition power9-vector.  Could you have a further check?
Thanks.


Thanks for raising this concern.
The behavior to check about bif on FLOAT128_HW and emit an error message 
for
requirements on quad-precision is added in gcc12. This is why gcc12 
fails to

compile the case on -m32.

Before gcc12, altivec_resolve_overloaded_builtin will return the 
overloaded

result directly, and does not check more about the result function.



btw, please add a PR marker for PR108758.


Sure,  thanks for catching this!


BR,
Jeff (Jiufu)



BR,
Kewen


So, we may update the testcase to require ppc_float128_hw.

Tested on ppc64 both BE and LE.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/float128-cmp2-runnable.c: Update requires.

---
 gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c

index d376a3ca68e..91287c0fb7a 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target ppc_float128_hw } */
 /* { dg-require-effective-target p9vector_hw } */
 /* { dg-options "-O2 -mdejagnu-cpu=power9 " } */



Re: Rust front-end update 2023-04-05

2023-04-11 Thread Richard Biener via Gcc-patches
On Wed, Apr 5, 2023 at 4:06 PM  wrote:
>
> Hi everyone,
>
> This patchset contains around 80 commits concerning the Rust frontend.
>
> We have been hard at work trying to get the Rust core library to
> compile, and hope to push more commits in the coming days as we try
> and upstream a more recent version of gccrs. All of the team has done
> a tremendous amount of work in the last few weeks, as we get closer
> and closer to compiling libcore 1.49.
>
> Our focus before GCC 13 releases is to improve the documentation of the
> compiler and write more about the various flags used by the frontend,
> which a user will need to understand and interact with.
>
> The commits we will be pushing before GCC 13 releases will not contain
> any "major breakthrough", as some functionality required to compile
> libcore properly will still take some time to implement. Very often used
> Rust macros such as `format_args` are defined within the core library,
> and require a high amount of work on the compiler side.
>
> Furthermore, integrating libcore as part of GCC will require
> significant build system changes which are incompatible with the current
> GCC stage. We will be making these changes as soon as possible and
> integrate a version of libcore in GCC 14, as well as an implementation
> of the libproc_macro crate. We will be submitting these patches and
> pinging build system experts for extensive reviewing.
>
> Thank you to everyone involved with the project, and to everyone who
> has helped us.

Thanks a lot for the update.  I see it's still required to use
-frust-incomplete-and-experimental-compiler-do-not-use - for GCC 13.1
I would suggest to instead discourage from distributing rust in the
documentation
and release notes.

Do you expect enough progress until GCC 13.2 (which should be about
two to three months after GCC 13.1 is released) so that libcore can be compiled?
If not then I'm not sure it makes much sense to keep rust configurable
for GCC 13.

On the general note - were you planning to keep the GCC 13 branch copy of rust
up-to-date with the GCC 14 trunk?

Richard.

> All the best,
>
> Arthur
>
>


Re: [RFC PATCH] range-op-float: Fix up op1_op2_relation of comparisons

2023-04-11 Thread Aldy Hernandez via Gcc-patches




On 4/11/23 10:21, Jakub Jelinek wrote:

Hi!

This patch was what I've tried first before the currently committed
PR109386 fix.  Still, I think it is the right thing until we have proper
full set of VREL_* relations for NANs (though it would be really nice
if op1_op2_relation could be passed either type of the comparison
operands, or even better ranges of the two operands, such that
we could choose if inversion of say VREL_LT is VREL_GE (if !MODE_HONOR_NANS
(TYPE_MODE (type))) or rhs1/rhs2 ranges are guaranteed not to include
NANs (!known_isnan && !maybe_isnan for both), or VREL_UNGE, etc.
Anyway, the current state is that for the LE/LT/GE/GT comparisons
we pretend the inverse is like for integral comparisons, which is
true only if NANs can't appear in operands, while for UNLE/UNLT/UNGE/UNGT
we don't override op1_op2_relation (so it always yields VREL_VARYING).

Though, this patch regresses the
FAIL: gcc.dg/tree-ssa/vrp-float-6.c scan-tree-dump-times evrp "Folding predicate x_.* 
<= y_.* to 1" 1
test, so am not sure what to do with it.  The test has explicit
!isnan tests around it, so e.g. having the ranges passed to op1_op2_relation
would also fix it.


I'll defer to Andrew on this, as he's the master of the relation oracle :).

Aldy



2023-04-11  Jakub Jelinek  

* range-op-float.cc (foperator_lt::op1_op2_relation): Return
VREL_VARYING instead of VREL_GE.
(foperator_le::op1_op2_relation): Return VREL_VARYING instead of
VREL_GT.
(foperator_gt::op1_op2_relation): Return VREL_VARYING instead of
VREL_LE.
(foperator_ge::op1_op2_relation): Return VREL_VARYING instead of
VREL_LT.
(foperator_unordered_lt::op1_op2_relation,
foperator_unordered_le::op1_op2_relation,
foperator_unordered_gt::op1_op2_relation,
foperator_unordered_ge::op1_op2_relation): New.

--- gcc/range-op-float.cc.jj2023-04-01 09:32:02.635423345 +0200
+++ gcc/range-op-float.cc   2023-04-03 10:42:54.0 +0200
@@ -831,7 +831,10 @@ public:
   relation_trio = TRIO_VARYING) const final override;
relation_kind op1_op2_relation (const irange ) const final override
{
-return lt_op1_op2_relation (lhs);
+relation_kind ret = lt_op1_op2_relation (lhs);
+if (ret == VREL_GE)
+  ret = VREL_VARYING; // Inverse of VREL_LT is VREL_UNGE.
+return ret;
}
bool op1_range (frange , tree type,
  const irange , const frange ,
@@ -947,6 +950,10 @@ public:
   relation_trio rel = TRIO_VARYING) const final override;
relation_kind op1_op2_relation (const irange ) const final override
{
+relation_kind ret = le_op1_op2_relation (lhs);
+if (ret == VREL_GT)
+  ret = VREL_VARYING; // Inverse of VREL_LE is VREL_UNGT.
+return ret;
  return le_op1_op2_relation (lhs);
}
bool op1_range (frange , tree type,
@@ -1057,7 +1064,10 @@ public:
   relation_trio = TRIO_VARYING) const final override;
relation_kind op1_op2_relation (const irange ) const final override
{
-return gt_op1_op2_relation (lhs);
+relation_kind ret = gt_op1_op2_relation (lhs);
+if (ret == VREL_LE)
+  ret = VREL_VARYING; // Inverse of VREL_GT is VREL_UNLE.
+return ret;
}
bool op1_range (frange , tree type,
  const irange , const frange ,
@@ -1177,7 +1187,10 @@ public:
   relation_trio = TRIO_VARYING) const final override;
relation_kind op1_op2_relation (const irange ) const final override
{
-return ge_op1_op2_relation (lhs);
+relation_kind ret = ge_op1_op2_relation (lhs);
+if (ret == VREL_LT)
+  ret = VREL_VARYING; // Inverse of VREL_GE is VREL_UNLT.
+return ret;
}
bool op1_range (frange , tree type,
  const irange , const frange ,
@@ -1571,6 +1584,7 @@ class foperator_unordered_lt : public ra
using range_operator_float::fold_range;
using range_operator_float::op1_range;
using range_operator_float::op2_range;
+  using range_operator_float::op1_op2_relation;
  public:
bool fold_range (irange , tree type,
   const frange , const frange ,
@@ -1599,6 +1613,13 @@ public:
return true;
}
}
+  relation_kind op1_op2_relation (const irange ) const final override
+  {
+relation_kind ret = lt_op1_op2_relation (lhs);
+if (ret == VREL_LT)
+  ret = VREL_VARYING; // Should have been VREL_UNLT.
+return ret;
+  }
bool op1_range (frange , tree type,
  const irange ,
  const frange ,
@@ -1682,6 +1703,7 @@ class foperator_unordered_le : public ra
using range_operator_float::fold_range;
using range_operator_float::op1_range;
using range_operator_float::op2_range;
+  using range_operator_float::op1_op2_relation;
  public:
bool fold_range (irange , tree type,
   const frange , const frange ,
@@ -1710,6 +1732,13 @@ public:
return true;

Re: Re: [PATCH] RISC-V: Fix PR108279

2023-04-11 Thread Richard Biener via Gcc-patches
On Wed, Apr 5, 2023 at 3:53 PM  wrote:
>
> >> So fusion in this context is really about identifying cases where two
> >> configuration settings are equivalent and you "fuse" them together.
> >> Presumably this is only going to be possible when the vector insns are
> >> just doing data movement rather than actual computations?
>
> >> If my understanding is correct, I can kind of see why you're doing
> >> fusion during phase 3.  My sense is there's a better way, but I'm having
> >> a bit of trouble working out the details of what that should be to
> >> myself.  In any event, revamping parts of the vsetvl insertion code
> >> isn't the kind of thing we should be doing now.
>
> The vsetvl demand fusion happens is not necessary "equivalent", instead, we
> call it we will do demand fusion when they are "compatible".
> And the fusion can happen between any vector insns including data movement
> and actual computations.
>
> What is "compatible" ??  This definition is according to RVV ISA.
> For example , For a vadd.vv need a vsetvl zero, 4, e32,m1,ta,ma.
> and a vle.v need a vsetvl zero,4,e8,mf4,ta,ma.
>
> According to RVV ISA:
> vadd.vv demand SEW = 32, LMUL = M1, AVL = 4
> vle.v demand RATIO = SEW/LMUL = 32, AVL = 4.
> So after demand fusion, the demand becomes SEW = 32, LMUL = M1, AVL = 4.
> Such vsetvl instruction is configured as this demand fusion, we call it 
> "compatible"
> since we can find a common vsetvl VL/VTYPE status for both vadd.vv and vle.v
>
> However, what case is not "incompatible", same example, if the vadd.vv demand 
> SEW = 32. LMUL = MF2,
> the vadd.vv is incompatible with vle.v. since we can't find a common VL/VTYPE 
> vsetvl instruction available
> for both of them.
>
> We have local demand fusion which is Phase 1. Local demand fusion is doing 
> the fusion within a block
> And also we have global demand fusion which is Phase 3. Global demand fusion 
> is doing across blocks.
>
> After Phase 1, each block has a single demand fusion. Phase 3 is doing global 
> demand fusion trying to
> find the common VL/VTYPE status available for a bunch of blocks, and fuse 
> them into a single vsetvl.
> So that we eliminate redundant vsetvli.
>
> Here is a example:
>
> bb 0:  (vle.v demand RATIO = 32)
>   /   \
> bb 1  bb 2
>   /  \ /   \
>  bb 3   bb 4   bb 5
>vadd   vmul  vdiv
> (demand  (demand  (demand
>  sew = 8,sew = 8,  sew = 8,
> lmul = mf4)  lmul = mf4,   lmul = mf4,
>   tail policy = tu) mask policy = mu)
>
> So in this case, we should do the global demand fusion for bb 0, bb3, bb 4, 
> bb5.
> since they are compatible according to RVV ISA.
> The final demand info of vsetvl should be vsetvl e8,mf4,tu,mu and put it
> in the bb0. Then we can avoid vsetvl in bb 3, 4, 5.

Just to throw in a comment here - I think you should present LCM with
something it
can identify as the same for compatible vsetvl and then it should just
work?  OTOH
if "compatible" is not transitive that's not possible (but then I
can't quickly make up
an example where it wouldn't be).

> >> We have more fusion rules according to RVV ISA. Phase 3 (Global demand 
> >> fusion) is
> >> really important.
>
> >> That would seem to indicate the function is poorly named.  Unless you're
> >> using "empty" here to mean the state is valid or dirty.  Either way it
> >> seems like the function name ought to be improved.
>
> >> The comments talk about bb1 being inside a loop.  Nowhere do you check
> >> that as far as I can tell.
>
> >> When trying to understand what the patch is going I ran across this 
> >> comment:
>
> >>   /* The local_dem vector insn_info of the block.  */
>  >>   vector_insn_info local_dem;
>
>
> >> That comment really doesn't improve anything.  "local_dem" is clearly
> >> short-hand for something (local demand?), whatever it is, make it
> >> clearer in the comment.
>
> Sorry for bad comments in the codes. Currently, I am working on the first 
> patch
> of auto-vectorization. After I sent the first patch of auto-vectorization for 
> you to
> review. I would like to re-check all the comments and code style of VSETVL 
> PASS.
> And refine them.
>
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Jeff Law
> Date: 2023-04-05 21:05
> To: juzhe.zhong; gcc-patches
> CC: kito.cheng; palmer
> Subject: Re: [PATCH] RISC-V: Fix PR108279
>
>
> On 4/2/23 16:40, juzhe.zh...@rivai.ai wrote:
> > This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand
> > info backward fusion and propogation) which
> > is I introduced into VSETVL PASS to enhance LCM && improve vsetvl
> > instruction performance.
> So fusion in this context is really about identifying cases where two
> configuration settings are equivalent and you "fuse" them together.
> Presumably this is only going to be 

Re: [PATCH] tree-optimization/108888 - call if-conversion

2023-04-11 Thread Richard Biener via Gcc-patches
On Wed, 5 Apr 2023, Andre Vieira (lists) wrote:

> Hi,
> 
> The original patch to fix this PR broke the if-conversion of calls into
> IFN_MASK_CALL.  This patch restores that original behaviour and makes sure the
> tests added earlier specifically test inbranch SIMD clones.

OOps.

> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu.
> 
> Is this OK for trunk?

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>   PR tree-optimization/10
>   * tree-if-conv.cc (predicate_statements): Fix gimple call check.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-simd-clone-16.c: Make simd clone inbranch only.
>   * gcc.dg/vect/vect-simd-clone-17.c: Likewise.
>   * gcc.dg/vect/vect-simd-clone-18.c: Likewise.
> 
> On 23/02/2023 10:10, Richard Biener via Gcc-patches wrote:
> > The following makes sure to only predicate calls necessary.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> > 
> >  PR tree-optimization/10
> >  * tree-if-conv.cc (if_convertible_stmt_p): Set PLF_2 on
> >  calls to predicate.
> >  (predicate_statements): Only predicate calls with PLF_2.
> > 
> > * g++.dg/torture/pr10.C: New testcase.
> > ---
> >   gcc/testsuite/g++.dg/torture/pr10.C | 18 ++
> >   gcc/tree-if-conv.cc | 17 ++---
> >   2 files changed, 28 insertions(+), 7 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/torture/pr10.C
> > 
> > diff --git a/gcc/testsuite/g++.dg/torture/pr10.C
> > b/gcc/testsuite/g++.dg/torture/pr10.C
> > new file mode 100644
> > index 000..29a22e21102
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/torture/pr10.C
> > @@ -0,0 +1,18 @@
> > +// { dg-do compile }
> > +
> > +int scaleValueSaturate_scalefactor, scaleValueSaturate___trans_tmp_2,
> > +scaleValuesSaturate_i;
> > +int scaleValueSaturate(int value) {
> > +  int result = __builtin_clz(value);
> > +  if (value)
> > +if (-result <= scaleValueSaturate_scalefactor)
> > +  return 0;
> > +  return scaleValueSaturate___trans_tmp_2;
> > +}
> > +short scaleValuesSaturate_dst;
> > +short *scaleValuesSaturate_src;
> > +void scaleValuesSaturate() {
> > +  for (; scaleValuesSaturate_i; scaleValuesSaturate_i++)
> > +scaleValuesSaturate_dst =
> > +scaleValueSaturate(scaleValuesSaturate_src[scaleValuesSaturate_i]);
> > +}
> > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> > index a7a8406374d..0e384e36394 100644
> > --- a/gcc/tree-if-conv.cc
> > +++ b/gcc/tree-if-conv.cc
> > @@ -1099,6 +1099,7 @@ if_convertible_stmt_p (gimple *stmt,
> > vec refs)
> >n = n->simdclone->next_clone)
> > if (n->simdclone->inbranch)
> >   {
> > +   gimple_set_plf (stmt, GF_PLF_2, true);
> > need_to_predicate = true;
> > return true;
> >   }
> > @@ -2541,7 +2542,8 @@ predicate_statements (loop_p loop)
> >  release_defs (stmt);
> >  continue;
> > }
> > - else if (gimple_plf (stmt, GF_PLF_2))
> > + else if (gimple_plf (stmt, GF_PLF_2)
> > +  && is_gimple_assign (stmt))
> >{
> >  tree lhs = gimple_assign_lhs (stmt);
> >  tree mask;
> > @@ -2625,13 +2627,14 @@ predicate_statements (loop_p loop)
> >  gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, ));
> >  update_stmt (stmt);
> > }
> > -
> > - /* Convert functions that have a SIMD clone to IFN_MASK_CALL.  This
> > -will cause the vectorizer to match the "in branch" clone
> > variants,
> > -and serves to build the mask vector in a natural way.  */
> > - gcall *call = dyn_cast  (gsi_stmt (gsi));
> > - if (call && !gimple_call_internal_p (call))
> > + else if (gimple_plf (stmt, GF_PLF_2)
> > +  && is_gimple_call (stmt))
> > {
> > + /* Convert functions that have a SIMD clone to IFN_MASK_CALL.
> > +This will cause the vectorizer to match the "in branch"
> > +clone variants, and serves to build the mask vector
> > +in a natural way.  */
> > + gcall *call = dyn_cast  (gsi_stmt (gsi));
> >  tree orig_fn = gimple_call_fn (call);
> >  int orig_nargs = gimple_call_num_args (call);
> >  auto_vec args;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [match.pd] [SVE] Add pattern to transform svrev(svrev(v)) --> v

2023-04-11 Thread Richard Biener via Gcc-patches
On Wed, Apr 5, 2023 at 10:39 AM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> Hi,
> For the following test:
>
> svint32_t f(svint32_t v)
> {
>   return svrev_s32 (svrev_s32 (v));
> }
>
> We generate 2 rev instructions instead of nop:
> f:
> rev z0.s, z0.s
> rev z0.s, z0.s
> ret
>
> The attached patch tries to fix that by trying to recognize the following
> pattern in match.pd:
> v1 = VEC_PERM_EXPR (v0, v0, mask)
> v2 = VEC_PERM_EXPR (v1, v1, mask)
> -->
> v2 = v0
> if mask is { nelts - 1, nelts - 2, nelts - 3, ... }
>
> Code-gen with patch:
> f:
> ret
>
> Bootstrap+test passes on aarch64-linux-gnu, and SVE bootstrap in progress.
> Does it look OK for stage-1 ?

I didn't look at the patch but tree-ssa-forwprop.cc:simplify_permutation should
handle two consecutive permutes with the is_combined_permutation_identity
which might need tweaking for VLA vectors

Richard.

>
> Thanks,
> Prathamesh


Re: [PATCH] Add -fsarif-time-report [PR109361]

2023-04-11 Thread Richard Biener via Gcc-patches
On Tue, 4 Apr 2023, David Malcolm wrote:

> Richi, Jakub: I can probably self-approve this, but it's technically a
> new feature.  OK if I push this to trunk in stage 4?  I believe it's
> low risk, and is very useful for benchmarking -fanalyzer.

Please wait for stage1 at this point.  One comment on the patch
below ...

> 
> This patch adds support for embeddding profiling information about the
> compiler itself into the SARIF output.
> 
> In an earlier version of this patch I extended -ftime-report so that
> as well as writing to stderr, it would embed the information in any
> SARIF output.  This turned out to be awkward to use, in that I found
> myself needing to get the data in JSON form without also having it
> emitted on stderr (which was affecting the output of the build).
> 
> Hence this version of the patch adds a new -fsarif-time-report, similar
> to the existing -ftime-report for requesting GCC profile itself using
> the timevar machinery.
> 
> Specifically, if -fsarif-time-report is specified, the timing
> information will be captured (as if -ftime-report were specified), and
> will be embedded in JSON form within any SARIF as a "gcc/timeReport"
> property within a property bag of the "invocation" object.
> 
> Here's an example of the output:
> 
>   "invocations": [
>   {
>   "executionSuccessful": true,
>   "toolExecutionNotifications": [],
>   "properties": {
>   "gcc/timeReport": {
>   "timevars": [
>   {
>   "name": "phase setup",
>   "elapsed": {
>   "user": 0.04,
>   "sys": 0,
>   "wall": 0.04,
>   "ggc_mem": 1863472
>   }
>   },
> 
>   [...snip...]
> 
>   {
>   "name": "analyzer: processing worklist",
>   "elapsed": {
>   "user": 0.06,
>   "sys": 0,
>   "wall": 0.06,
>   "ggc_mem": 48
>   }
>   },
>   {
>   "name": "analyzer: emitting diagnostics",
>   "elapsed": {
>   "user": 0.01,
>   "sys": 0,
>   "wall": 0.01,
>   "ggc_mem": 0
>   }
>   },
>   {
>   "name": "TOTAL",
>   "elapsed": {
>   "user": 0.21,
>   "sys": 0.03,
>   "wall": 0.24,
>   "ggc_mem": 3368736
>   }
>   }
>   ],
>   "CHECKING_P": true,
>   "flag_checking": true
>   }
>   }
>   }
>   ]
> 
> I have successfully used this in my analyzer integration tests to get
> timing information about which source files get slowed down by the
> analyzer.  I've validated the generated .sarif files against the SARIF
> schema.
> 
> The documentation notes that the precise output format is subject
> to change.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> gcc/ChangeLog:
>   PR analyzer/109361
>   * common.opt (fsarif-time-report): New option.

'sarif' is currently used only with -fdiagnostics-format= it seems.
We already have

ftime-report
Common Var(time_report)
Report the time taken by each compiler pass.

ftime-report-details
Common Var(time_report_details)
Record times taken by sub-phases separately. 

so -fsarif-time-report is not a) -ftime-report-sarif and b) it's
unclear if it applies to -ftime-report or to both -ftime-report
and -ftime-report-details?  (note -ftime-report-details needs
-ftime-report to be effective)

I'd rather have a -ftime-report-format= (or -freport-format in
case we want to cover -fmem-report, -fmem-report-wpa,
-fpre-ipa-mem-report and -fpost-ipa-mem-report as well?)

ISTR there's a summer of code project in this are as well.

Thanks,
Richard.

>   * diagnostic-client-data-hooks.h (class sarif_object): New forward
>   decl.
>   (diagnostic_client_data_hooks::add_sarif_invocation_properties):
>   New vfunc.
>   * diagnostic-format-sarif.cc: Include "diagnostic-format-sarif.h".
>   (class sarif_invocation): Inherit from sarif_object rather
>   than json::object.
>   (class sarif_result): Likewise.
>   (class sarif_ice_notification): Likewise.
>   (sarif_object::get_or_create_properties): New.
>   (sarif_invocation::prepare_to_flush): Add "context" param.  Use it
>   to call the context's 

Re: [PATCH] ipa: propagate attributes for target attribute clones

2023-04-11 Thread Richard Biener via Gcc-patches
On Mon, Apr 3, 2023 at 11:03 AM Martin Liška  wrote:
>
> Hi.
>
> The patch propagates noreturn attribute for MV functions. However, I noticed
> we've got the following ICE when we do the same for TREE_READONLY attr:
>
> $ cat tc.c
> double bar() __attribute__((target_clones("avx,avx2,avx512f,default")));
> double bar() { return 1.2f; }
>
> int foo() { return (int)bar(); }
>
> $ ./xgcc -B. ~/Programming/testcases/tc.c -O3 -c -fprofile-generate
> /home/marxin/Programming/testcases/tc.c: In function ‘foo’:
> /home/marxin/Programming/testcases/tc.c:4:5: error: virtual definition of 
> statement not up to date
>  4 | int foo() { return (int)bar(); }
>| ^~~
> _1 = bar ();
> during GIMPLE pass: fixup_cfg
>
> Thus my ambition is to propagate only noreturn attr.

But wouldn't that mean the IL needs fixup for the call to the dispatcher
which would be non-const for actually const implementations?  That said,
maybe the IPA cloning for the actual functions need adjustments as well?

> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> PR ipa/106816
>
> gcc/ChangeLog:
>
> * config/i386/i386-features.cc 
> (ix86_get_function_versions_dispatcher):
> Propagate function attributes for clones.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr106816.C: New test.
>
> Co-Authored-By: H.J. Lu 
> ---
>   gcc/config/i386/i386-features.cc |  1 +
>   gcc/testsuite/g++.target/i386/pr106816.C | 23 +++
>   2 files changed, 24 insertions(+)
>   create mode 100644 gcc/testsuite/g++.target/i386/pr106816.C
>
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index c09abf8fc20..f2b0d59a73c 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -3379,6 +3379,7 @@ ix86_get_function_versions_dispatcher (void *decl)
> /* Right now, the dispatching is done via ifunc.  */
> dispatch_decl = make_dispatcher_decl (default_node->decl);
> TREE_NOTHROW (dispatch_decl) = TREE_NOTHROW (fn);
> +  TREE_THIS_VOLATILE (dispatch_decl) = TREE_THIS_VOLATILE (fn);
>
> dispatcher_node = cgraph_node::get_create (dispatch_decl);
> gcc_assert (dispatcher_node != NULL);
> diff --git a/gcc/testsuite/g++.target/i386/pr106816.C 
> b/gcc/testsuite/g++.target/i386/pr106816.C
> new file mode 100644
> index 000..0f5cc1f13dd
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr106816.C
> @@ -0,0 +1,23 @@
> +// PR ipa/106816
> +
> +// { dg-do compile }
> +// { dg-require-ifunc "" }
> +// { dg-options "-O2 -fdump-tree-optimized" }
> +
> +__attribute__((noreturn, target("default"))) void f()
> +{
> +  for (;;) {}
> +}
> +
> +__attribute__((noreturn, target("sse4.2,bmi"))) void f()
> +{
> +  for (;;) {}
> +}
> +
> +int main()
> +{
> +  f();
> +  return 12345;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "12345" "optimized" } } */
> --
> 2.40.0
>


Re: [PATCH] driver: drop flag_var_tracking_assignments flag

2023-04-11 Thread Richard Biener via Gcc-patches
On Mon, Apr 3, 2023 at 10:46 AM Martin Liška  wrote:
>
> The revision r13-259-g76db543db88727 moved a condition from one
> file to another, but now we do not drop x_flag_var_tracking_assignments
> as it was done before the mentioned revision.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

> Thanks,
> Martin
>
> PR driver/108241
>
> gcc/ChangeLog:
>
> * opts.cc (finish_options): Drop also
>   x_flag_var_tracking_assignments.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr108241.c: New test.
> * gcc.dg/pr79570.c: Add also -g option.
> ---
>   gcc/opts.cc |  1 +
>   gcc/testsuite/gcc.dg/pr108241.c | 63 +
>   gcc/testsuite/gcc.dg/pr79570.c  |  2 +-
>   3 files changed, 65 insertions(+), 1 deletion(-)
>   create mode 100644 gcc/testsuite/gcc.dg/pr108241.c
>
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index f102c1328b9..fb2e5388ab1 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -1384,6 +1384,7 @@ finish_options (struct gcc_options *opts, struct 
> gcc_options *opts_set,
> }
> opts->x_flag_var_tracking = 0;
> opts->x_flag_var_tracking_uninit = 0;
> +  opts->x_flag_var_tracking_assignments = 0;
>   }
>
> /* One could use EnabledBy, but it would lead to a circular dependency.  
> */
> diff --git a/gcc/testsuite/gcc.dg/pr108241.c b/gcc/testsuite/gcc.dg/pr108241.c
> new file mode 100644
> index 000..06d210fae68
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr108241.c
> @@ -0,0 +1,63 @@
> +/* PR driver/108241 */
> +/* { dg-options "-Os -frounding-math -fvar-tracking-assignments -fno-dce 
> -fno-trapping-math -fno-tree-dce -fno-tree-dse" } */
> +
> +long int n1;
> +int n2, n3, n4;
> +char n5;
> +
> +void
> +foo (long int x1, long int x2, int x3, int x4, int x5, char x6, char x7)
> +{
> +  char a01 = n2, a02 = x4, a03 = 0;
> +  short int a04;
> +  unsigned short int a05 = x5;
> +  int a06, a07, a08 = a05, a09 = x3, a10 = 0;
> +  long int a11, a12 = x4;
> +
> +  if (x1)
> +{
> +  a07 = x6 + (float)0x101;
> +  a03 = a12 = a01 = a06 = ~0;
> +
> +  if (x5)
> +   a11 = n5;
> +}
> +  else
> +{
> +  a10 = x3 = n3;
> +  if (n3)
> +   a06 = a05 = x7;
> +}
> +
> +  if (n3 < n5)
> +{
> +  n4 = (x2 == x4) + !n1;
> +  if (n4 % (n1 % x3))
> +   {
> + a04 = n4;
> + a02 = n2;
> +   }
> +
> +  if (x3)
> +   {
> + a05 = !n1 % n2;
> + a08 = n1;
> + a04 = x5 + a06;
> +   }
> +
> +  if (a12)
> +   a09 = n3 + n4;
> +
> +  a12 = a07;
> +  n3 = a11 % x1;
> +  n5 += x6;
> +  n1 = a04;
> +}
> +
> +  n4 = x2 % x5 % a11;
> +  a06 = a10 + a08 % a02 == n4;
> +  a09 = a09 == a01 * x7;
> +  n4 = x4;
> +  a12 += x4 / 0xc000 + !a03;
> +  a03 = !a05;
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr79570.c b/gcc/testsuite/gcc.dg/pr79570.c
> index 00841b9487a..a15be9f201d 100644
> --- a/gcc/testsuite/gcc.dg/pr79570.c
> +++ b/gcc/testsuite/gcc.dg/pr79570.c
> @@ -1,6 +1,6 @@
>   /* PR target/79570 */
>   /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
> -/* { dg-options "-O2 -fselective-scheduling2 -fvar-tracking-assignments" } */
> +/* { dg-options "-O2 -fselective-scheduling2 -fvar-tracking-assignments -g" 
> } */
>   /* { dg-warning "changes selective scheduling" "" { target *-*-* } 0 } */
>
>   #include "pr69956.c"
> --
> 2.40.0
>


[RFC PATCH] range-op-float: Fix up op1_op2_relation of comparisons

2023-04-11 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch was what I've tried first before the currently committed
PR109386 fix.  Still, I think it is the right thing until we have proper
full set of VREL_* relations for NANs (though it would be really nice
if op1_op2_relation could be passed either type of the comparison
operands, or even better ranges of the two operands, such that
we could choose if inversion of say VREL_LT is VREL_GE (if !MODE_HONOR_NANS
(TYPE_MODE (type))) or rhs1/rhs2 ranges are guaranteed not to include
NANs (!known_isnan && !maybe_isnan for both), or VREL_UNGE, etc.
Anyway, the current state is that for the LE/LT/GE/GT comparisons
we pretend the inverse is like for integral comparisons, which is
true only if NANs can't appear in operands, while for UNLE/UNLT/UNGE/UNGT
we don't override op1_op2_relation (so it always yields VREL_VARYING).

Though, this patch regresses the
FAIL: gcc.dg/tree-ssa/vrp-float-6.c scan-tree-dump-times evrp "Folding 
predicate x_.* <= y_.* to 1" 1
test, so am not sure what to do with it.  The test has explicit
!isnan tests around it, so e.g. having the ranges passed to op1_op2_relation
would also fix it.

2023-04-11  Jakub Jelinek  

* range-op-float.cc (foperator_lt::op1_op2_relation): Return
VREL_VARYING instead of VREL_GE.
(foperator_le::op1_op2_relation): Return VREL_VARYING instead of
VREL_GT.
(foperator_gt::op1_op2_relation): Return VREL_VARYING instead of
VREL_LE.
(foperator_ge::op1_op2_relation): Return VREL_VARYING instead of
VREL_LT.
(foperator_unordered_lt::op1_op2_relation,
foperator_unordered_le::op1_op2_relation,
foperator_unordered_gt::op1_op2_relation,
foperator_unordered_ge::op1_op2_relation): New.

--- gcc/range-op-float.cc.jj2023-04-01 09:32:02.635423345 +0200
+++ gcc/range-op-float.cc   2023-04-03 10:42:54.0 +0200
@@ -831,7 +831,10 @@ public:
   relation_trio = TRIO_VARYING) const final override;
   relation_kind op1_op2_relation (const irange ) const final override
   {
-return lt_op1_op2_relation (lhs);
+relation_kind ret = lt_op1_op2_relation (lhs);
+if (ret == VREL_GE)
+  ret = VREL_VARYING; // Inverse of VREL_LT is VREL_UNGE.
+return ret;
   }
   bool op1_range (frange , tree type,
  const irange , const frange ,
@@ -947,6 +950,10 @@ public:
   relation_trio rel = TRIO_VARYING) const final override;
   relation_kind op1_op2_relation (const irange ) const final override
   {
+relation_kind ret = le_op1_op2_relation (lhs);
+if (ret == VREL_GT)
+  ret = VREL_VARYING; // Inverse of VREL_LE is VREL_UNGT.
+return ret;
 return le_op1_op2_relation (lhs);
   }
   bool op1_range (frange , tree type,
@@ -1057,7 +1064,10 @@ public:
   relation_trio = TRIO_VARYING) const final override;
   relation_kind op1_op2_relation (const irange ) const final override
   {
-return gt_op1_op2_relation (lhs);
+relation_kind ret = gt_op1_op2_relation (lhs);
+if (ret == VREL_LE)
+  ret = VREL_VARYING; // Inverse of VREL_GT is VREL_UNLE.
+return ret;
   }
   bool op1_range (frange , tree type,
  const irange , const frange ,
@@ -1177,7 +1187,10 @@ public:
   relation_trio = TRIO_VARYING) const final override;
   relation_kind op1_op2_relation (const irange ) const final override
   {
-return ge_op1_op2_relation (lhs);
+relation_kind ret = ge_op1_op2_relation (lhs);
+if (ret == VREL_LT)
+  ret = VREL_VARYING; // Inverse of VREL_GE is VREL_UNLT.
+return ret;
   }
   bool op1_range (frange , tree type,
  const irange , const frange ,
@@ -1571,6 +1584,7 @@ class foperator_unordered_lt : public ra
   using range_operator_float::fold_range;
   using range_operator_float::op1_range;
   using range_operator_float::op2_range;
+  using range_operator_float::op1_op2_relation;
 public:
   bool fold_range (irange , tree type,
   const frange , const frange ,
@@ -1599,6 +1613,13 @@ public:
return true;
   }
   }
+  relation_kind op1_op2_relation (const irange ) const final override
+  {
+relation_kind ret = lt_op1_op2_relation (lhs);
+if (ret == VREL_LT)
+  ret = VREL_VARYING; // Should have been VREL_UNLT.
+return ret;
+  }
   bool op1_range (frange , tree type,
  const irange ,
  const frange ,
@@ -1682,6 +1703,7 @@ class foperator_unordered_le : public ra
   using range_operator_float::fold_range;
   using range_operator_float::op1_range;
   using range_operator_float::op2_range;
+  using range_operator_float::op1_op2_relation;
 public:
   bool fold_range (irange , tree type,
   const frange , const frange ,
@@ -1710,6 +1732,13 @@ public:
return true;
   }
   }
+  relation_kind op1_op2_relation (const irange ) const final override
+  {
+relation_kind ret = le_op1_op2_relation (lhs);
+if (ret == VREL_LE)
+ 

Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-11 Thread Richard Biener via Gcc-patches
On Tue, 4 Apr 2023, Jan Hubicka wrote:

> > On Tue, Apr 04, 2023 at 01:21:40AM +0200, Jan Hubicka via Gcc-patches wrote:
> > > It is however really side case and I am worried about dropping
> > > pure/const from builtin declarations...
> > 
> > Yeah, that can certainly break stuff.  See e.g. the recently fixed
> > ICE when memcmp wasn't pure in PR109258.
> 
> Yep, i think itis better to poke about this in stage1 (it is a can of
> worms).  Clearly we have conflict here: if memcmp is implemented locally
> one can construct a testcase where profile would be rejected on
> -fprofile-use time if const flag is not cleared :(.  But it should be
> rare thing happening in practice.

If we have a locally implemented memcmp then calls to it shouldn't
be marked 'built-in' ...   But then when the compiler itself
creates a memcmp call it would need to resolve to a not instrumented
library copy, or alternatively we should tell the compiler it cannot
emit such a call (but for some builtins not being able to emit them
might prove interesting).

Richard.


Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-11 Thread Richard Biener via Gcc-patches
On Tue, 4 Apr 2023, Jan Hubicka wrote:

> > On Tue, 28 Mar 2023, Richard Biener wrote:
> > 
> > > When adjusting calls to reflect instrumentation we failed to handle
> > > calls to aliases since they appear to have no body.  Instead resort
> > > to symtab node availability.  The patch also avoids touching
> > > internal function calls in a more obvious way (builtins might
> > > have a body available).
> > > 
> > > profiledbootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > 
> > > Honza - does this look OK?
> > >   PR tree-optimization/109304
> > >   * tree-profile.cc (tree_profiling): Use symtab node
> > >   availability to decide whether to skip adjusting calls.
> > >   Do not adjust calls to internal functions.
> > > @@ -842,12 +842,15 @@ tree_profiling (void)
> > >   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
> > > {
> > >   gcall *call = dyn_cast  (gsi_stmt (gsi));
> > > - if (!call)
> > > + if (!call || gimple_call_internal_p (call))
> > > continue;
> > >  
> > >   /* We do not clear pure/const on decls without body.  */
> > >   tree fndecl = gimple_call_fndecl (call);
> > > - if (fndecl && !gimple_has_body_p (fndecl))
> > > + cgraph_node *callee;
> > > + if (fndecl
> > > + && (callee = cgraph_node::get (fndecl))
> > > + && callee->get_availability (node) == AVAIL_NOT_AVAILABLE)
> 
> As discussed earlier, the testcase I posted can be adjusted to put the
> const declared wrapper into another translation unit, so I think we will
> need to drop the visibility check completely.  But as discussed, it is
> wrong code issue, but not a regression, so we may go with the
> availability check as you suggest. So the patch is OK. 
> 
> 
> I wonder if we do not want to drop it everywhere (as we plan for next
> stage1 anyway).  I think similar ICE as in the PR can be produced with
> LTO. In normal situation declaration merging will do the right thing:
> If you have unit A calling const foo externally, it won't get processed
> by the code above.  However unit B declaring foo will get it downgraded
> to non-const.
> 
> Now at WPA time we will read both A and B and in declaration merging B's
> definition will prevail.  This won't happen if lto_symtab_merge_p
> returns false which can probably be triggered by adding warning/error
> attribute to B's declaration but not to A's.
> 
> It is however really side case and I am worried about dropping
> pure/const from builtin declarations...

Yeah, that's what I'm worried about as well.  I guess we'd need to
do the demotion to non-const/pure at WPA time and have a flag
in the cgraph node indicating instrument_add_{read,write}?  But
then how should we deal with C++ comdats instrumented in one TU
but not in another?  Does this mean we should do instrumentation
at IPA time instead of as small IPA pass before IPA?

That said, when there's a definition of say strlen in a TU and
that's instrumented we do want to drop pure from calls but if
not then we shouldn't worry.

Without LTO we'd still run into coverage issues but at least
with LTO we shouldn't ICE?

Richard.


[PATCH] c++: Fix Solaris bootstraps across midnight

2023-04-11 Thread Jakub Jelinek via Gcc-patches
Hi!

When working on the PR109040 fix, I wanted to test it on some
WORD_REGISTER_OPERATIONS target and tried sparc-solaris on GCC Farm.
My bootstrap failed in comparison failure on cp/module.o, because
Solaris date doesn't support the -r option and one stage's cp/module.o
was built before midnight and next stage's cp/module.o after midnight,
so they had different -DMODULE_VERSION= value.

Now, I think the advice (don't bootstrap at midnight) is something
we shouldn't have, so the following patch stores the module version
(still generated through the same way, date -r cp/module.cc
if it works, otherwise just date) into a temporary file, makes sure
that temporary file is updated when cp/module.cc source is updated
and when date -r doesn't work copies file from previous stage
if it is newer than cp/module.cc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-04-11  Jakub Jelinek  

* Make-lang.in (s-cp-module-version): New target.
(cp/module.o): Depend on it.
(MODULE_VERSION): Remove variable.
(CFLAGS-cp/module.o): For -DMODULE_VERSION= argument just
cat s-cp-module-version.

--- gcc/cp/Make-lang.in.jj  2023-01-16 11:52:16.057734447 +0100
+++ gcc/cp/Make-lang.in 2023-04-06 10:03:34.344469067 +0200
@@ -59,12 +59,21 @@ CFLAGS-cp/module.o += -DHOST_MACHINE=\"$
 
 # In non-release builds, use a date-related module version.
 ifneq ($(DEVPHASE_c),)
-# Some date's don't grok 'r', if so, simply use today's
-# date (don't bootstrap at midnight).
-MODULE_VERSION := $(shell date -r $(srcdir)/cp/module.cc '+%y%m%d-%H%M' \
-  2>/dev/null || date '+%y%m%d-' 2>/dev/null || echo 0)
-
-CFLAGS-cp/module.o += -DMODULE_VERSION='($(subst -,,$(MODULE_VERSION))U)'
+# Some date's don't grok 'r', if so, simply use today's date,
+# but use date from previous stage if bootstrapping to avoid breaking
+# bootstraps across midnight.
+s-cp-module-version: $(srcdir)/cp/module.cc
+   MODULE_VERSION=`if date -r $(srcdir)/cp/module.cc '+%y%m%d%H%M' \
+ 2>/dev/null; then :; \
+   elif test ../prev-gcc/s-cp-module-version -nt \
+  $(srcdir)/cp/module.cc; then \
+ cat ../prev-gcc/s-cp-module-version; \
+   else \
+ date '+%y%m%d' 2>/dev/null; \
+   fi`; \
+   echo $${MODULE_VERSION}U > s-cp-module-version
+cp/module.o: s-cp-module-version
+CFLAGS-cp/module.o += -DMODULE_VERSION='$(shell cat s-cp-module-version)'
 endif
 
 # Create the compiler driver for g++.

Jakub



Re: [PATCH] tree-vect-generic: Fix up ICE with SSA_NAME_OCCURS_IN_ABNORMAL_PHI [PR109392]

2023-04-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 05, 2023 at 02:11:08PM +0200, Richard Biener wrote:
> Ok, but maybe there’s a gimple_build overload that meanwhile implements
> the desired semantics?  It would probably need to specify the valueization
> hook to follow SSA edges beyond the sequence we’re currently building.

Jeff has apparently committed the patch in the meantime.
For gimple_build, did you mean to use it just for this fallback case,
or instead of the initial resimplification as well?

> > 2023-04-05  Jakub Jelinek  
> > 
> >PR tree-optimization/109392
> >* tree-vect-generic.cc (tree_vec_extract): If maybe_push_res_to_seq
> >fails, build BIT_FIELD_REF insn without trying to simplify it.
> > 
> >* gcc.dg/pr109392.c: New test.
> > 
> > --- gcc/tree-vect-generic.cc.jj2023-03-23 10:02:18.997935620 +0100
> > +++ gcc/tree-vect-generic.cc2023-04-04 14:28:32.977729134 +0200
> > @@ -174,7 +174,16 @@ tree_vec_extract (gimple_stmt_iterator *
> >   opr.resimplify (NULL, follow_all_ssa_edges);
> >   gimple_seq stmts = NULL;
> >   tree res = maybe_push_res_to_seq (, );
> > -  gcc_assert (res);
> > +  if (!res)
> > +{
> > +  /* This can happen if SSA_NAME_OCCURS_IN_ABNORMAL_PHI are
> > + used.  Build BIT_FIELD_REF manually otherwise.  */
> > +  t = build3 (BIT_FIELD_REF, type, t, bitsize, bitpos);
> > +  res = make_ssa_name (type);
> > +  gimple *g = gimple_build_assign (res, t);
> > +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > +  return res;
> > +}
> >   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> >   return res;
> > }

Jakub



Ping [PATCH v3] Add condition coverage profiling

2023-04-11 Thread Jørgen Kvalsvik
On 05/12/2022 10:40, Jørgen Kvalsvik wrote:
> This patch adds support in gcc+gcov for modified condition/decision
> coverage (MC/DC) with the -fprofile-conditions flag. MC/DC is a type of
> test/code coverage and it is particularly important in the avation and
> automotive industries for safety-critical applications. MC/DC it is
> required for or recommended by:
> 
> * DO-178C for the most critical software (Level A) in avionics
> * IEC 61508 for SIL 4
> * ISO 26262-6 for ASIL D
> 
>  From the SQLite webpage:
> 
> Two methods of measuring test coverage were described above:
> "statement" and "branch" coverage. There are many other test
> coverage metrics besides these two. Another popular metric is
> "Modified Condition/Decision Coverage" or MC/DC. Wikipedia defines
> MC/DC as follows:
> 
> * Each decision tries every possible outcome.
> * Each condition in a decision takes on every possible outcome.
> * Each entry and exit point is invoked.
> * Each condition in a decision is shown to independently affect
>   the outcome of the decision.
> 
> In the C programming language where && and || are "short-circuit"
> operators, MC/DC and branch coverage are very nearly the same thing.
> The primary difference is in boolean vector tests. One can test for
> any of several bits in bit-vector and still obtain 100% branch test
> coverage even though the second element of MC/DC - the requirement
> that each condition in a decision take on every possible outcome -
> might not be satisfied.
> 
> https://sqlite.org/testing.html#mcdc
> 
> Wahlen, Heimdahl, and De Silva "Efficient Test Coverage Measurement for
> MC/DC" describes an algorithm for adding instrumentation by carrying
> over information from the AST, but my algorithm analyses the the control
> flow graph to instrument for coverage. This has the benefit of being
> programming language independent and faithful to compiler decisions
> and transformations, although I have only tested it on constructs in C
> and C++, see testsuite/gcc.misc-tests and testsuite/g++.dg.
> 
> Like Wahlen et al this implementation records coverage in fixed-size
> bitsets which gcov knows how to interpret. This is very fast, but
> introduces a limit on the number of terms in a single boolean
> expression, the number of bits in a gcov_unsigned_type (which is
> typedef'd to uint64_t), so for most practical purposes this would be
> acceptable. This limitation is in the implementation and not the
> algorithm, so support for more conditions can be added by also
> introducing arbitrary-sized bitsets.
> 
> For space overhead, the instrumentation needs two accumulators
> (gcov_unsigned_type) per condition in the program which will be written
> to the gcov file. In addition, every function gets a pair of local
> accumulators, but these accmulators are reused between conditions in the
> same function.
> 
> For time overhead, there is a zeroing of the local accumulators for
> every condition and one or two bitwise operation on every edge taken in
> the an expression.
> 
> In action it looks pretty similar to the branch coverage. The -g short
> opt carries no significance, but was chosen because it was an available
> option with the upper-case free too.
> 
> gcov --conditions:
> 
> 3:   17:void fn (int a, int b, int c, int d) {
> 3:   18:if ((a && (b || c)) && d)
> condition outcomes covered 3/8
> condition  0 not covered (true false)
> condition  1 not covered (true)
> condition  2 not covered (true)
> condition  3 not covered (true)
> 1:   19:x = 1;
> -:   20:else
> 2:   21:x = 2;
> 3:   22:}
> 
> gcov --conditions --json-format:
> 
> "conditions": [
> {
> "not_covered_false": [
> 0
> ],
> "count": 8,
> "covered": 3,
> "not_covered_true": [
> 0,
> 1,
> 2,
> 3
> ]
> }
> ],
> 
> Some expressions, mostly those without else-blocks, are effectively
> "rewritten" in the CFG construction making the algorithm unable to
> distinguish them:
> 
> and.c:
> 
> if (a && b && c)
> x = 1;
> 
> ifs.c:
> 
> if (a)
> if (b)
> if (c)
> x = 1;
> 
> gcc will build the same graph for both these programs, and gcov will
> report boths as 3-term expressions. It is vital that it is not
> interpreted the other way around (which is consistent with the shape of
> the graph) because otherwise the masking would be wrong for the and.c
> program which is a more severe error. While surprising, users would
> probably expect some minor rewriting of semantically-identical
> expressions.
> 
> and.c.gcov:
> #:2:if (a && b && c)
> condition outcomes covered 6/6
> #:3:x = 1;
> 
> ifs.c.gcov:
> #: