Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-12 Thread guojiufu via Gcc-patches

Hi Mike,

On 2023-04-12 22:46, Michael Meissner wrote:

On Wed, Apr 12, 2023 at 01:31:46PM +0800, Jiufu Guo wrote:

I understand that QP insns (e.g. xscmpexpqp) are valid if the system
implements ISA 3.0, no matter BE/LE, 32-bit/64-bit.
I think the option -mfloat128-hardware is designed for QP insns.

While there is one issue, on BE machine, when compiling with options
"-mfloat128-hardware -m32", an error message is generated:
"error: '%<-mfloat128-hardware%>' requires '-m64'"

(I'm wondering if we need to relax this limitation.)


In the past, the machine-independent portion of the compiler demanded that
for a scalar mode, there be an integer mode of the same size, since
sometimes moves are converted to using an int RTL mode.  Since we don't
have TImode support in 32-bit, you would get various errors because
something tried to do a TImode move for KFmode types, and TImode wasn't
available.

If somebody wants to verify that this now works on 32-bit and/or
implements TImode on 32-bit, then we can relax the restriction.
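Michael's point about integer-mode moves can be illustrated with a small, portable C sketch (this only models the RTL-level idea; it is not GCC's implementation, and `kf_t`/`move_kf_as_ints` are invented names):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit scalar float type (GCC's KFmode): 16 opaque bytes. */
typedef struct { unsigned char bytes[16]; } kf_t;

/* A 128-bit move performed as two 64-bit integer moves -- conceptually a
   TImode move split into DImode halves.  A 32-bit target without TImode
   has no 128-bit integer mode to route the move through, which is why the
   port currently rejects -mfloat128-hardware together with -m32.  */
static inline void move_kf_as_ints(kf_t *dst, const kf_t *src)
{
  uint64_t half[2];
  memcpy(half, src->bytes, sizeof half);   /* load the payload as integer words */
  memcpy(dst->bytes, half, sizeof half);   /* store the words back */
}
```

Relaxing the `-m64` requirement would essentially mean teaching the 32-bit path to do the equivalent with four 32-bit words instead.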


Thanks a lot for pointing this out!

BR,
Jeff (Jiufu)


Re: [PATCH] testsuite: filter out warning noise for CWE-1341 test

2023-04-12 Thread Jiufu Guo via Gcc-patches


Add more reviewers. :)

Jiufu Guo  writes:

> Hi,
>
> The case file-CWE-1341-example.c checks [CWE-1341] (double `fclose`).
> On some systems, besides [CWE-1341], a message for [CWE-415] is also
> reported, because the `malloc` attribute may be attached to fopen:
> ```
> # 258 "/usr/include/stdio.h" 3 4
> extern FILE *fopen (const char *__restrict __filename,
>   const char *__restrict __modes) 
>   
>   
>   __attribute__ ((__malloc__)) __attribute__ ((__malloc__ (fclose, 1))) ;
>
> or say: __attribute_malloc__ __attr_dealloc_fclose __wur;
> ```
>
> It would be OK to suppress messages other than CWE-1341 for this case.
> This patch adds -Wno-analyzer-double-free to make this case pass on
> those systems.
>
> Tested on ppc64 both BE and LE.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu)
>
> gcc/testsuite/ChangeLog:
>
>   PR target/108722
>   * gcc.dg/analyzer/file-CWE-1341-example.c: Update.
>
> ---
>  gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c 
> b/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
> index 2add3cb109b..830cb0376ea 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
> @@ -19,6 +19,9 @@
>  
> IN NO EVENT SHALL THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR 
> IS SPONSORED BY (IF ANY), THE MITRE CORPORATION, ITS BOARD OF TRUSTEES, 
> OFFICERS, AGENTS, AND EMPLOYEES BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
> LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 
> OUT OF OR IN CONNECTION WITH THE INFORMATION OR THE USE OR OTHER DEALINGS IN 
> THE CWE.  */
>  
> +/* This case checks double-fclose only, suppress other warning.  */
> +/* { dg-additional-options -Wno-analyzer-double-free } */
> +
>  #include 
>  #include 
>  #include 
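For context, the pattern this test exercises can be sketched in plain C (a minimal illustration; `safe_fclose` is an invented helper, not part of the testcase). CWE-1341 is a second `fclose` on an already-closed stream, which `-fanalyzer` reports directly — or, when fopen carries the `malloc (fclose, 1)` attribute shown above, additionally as a double-free:

```c
#include <assert.h>
#include <stdio.h>

/* CWE-1341 is a double fclose on the same stream; the second call is
   undefined behavior.  A common guard is to NULL the pointer after
   closing, making a repeated close a harmless no-op instead of a bug. */
static int safe_fclose(FILE **fp)
{
  if (*fp == NULL)
    return 0;          /* already closed: nothing to do */
  int ret = fclose(*fp);
  *fp = NULL;          /* forget the dead stream */
  return ret;
}
```

The unguarded version — calling `fclose (f)` twice in a row — is exactly what the testcase asks the analyzer to flag.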


New Croatian PO file for 'gcc' (version 13.1-b20230409)

2023-04-12 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Croatian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/hr.po

(This file, 'gcc-13.1-b20230409.hr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-12 Thread Jeff Law via Gcc-patches




On 4/12/23 10:58, Jakub Jelinek wrote:

On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek via Gcc-patches wrote:

I've tried the pr108947.c testcase, but I see no differences in the assembly
before/after the patch (but dunno if I'm using the right options).
The pr109040.c testcase from the patch I don't see the expected zero
extension without the patch and do see it with it.


Seems my cross defaulted to 32-bit compilation, reproduced it with
additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
So, let's include that test in the patch too:

2023-04-12  Jeff Law  
Jakub Jelinek  

PR target/108947
PR target/109040
* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
smaller than word_mode.
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
: Likewise.

* gcc.dg/pr108947.c: New test.
* gcc.c-torture/execute/pr109040.c: New test.

Bootstrap of the v3 patch has completed.  Regression testing is still
spinning.  It should be done and waiting for me when I wake up in the
morning.


jeff-




Re: [PATCH] gcc-13: Mention Intel AMX-COMPLEX ISA support and revise march support

2023-04-12 Thread Hongtao Liu via Gcc-patches
On Mon, Apr 10, 2023 at 10:08 AM Haochen Jiang via Gcc-patches
 wrote:
>
> Hi all,
>
> This patch mentions Intel AMX-COMPLEX ISA support in GCC 13.
>
> Also it revises the march support according to newly released
> Intel Architecture Instruction Set Extensions and Future Features.
>
> Ok for trunk?
>
> BRs,
> Haochen
>
> ---
>  htdocs/gcc-13/changes.html | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 71cb335d..84207104 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -520,6 +520,10 @@ a work-in-progress.
>RAO-INT intrinsics are available via the -mraoint
>compiler switch.
>
> +  New ISA extension support for Intel AMX-COMPLEX was added.
> +  AMX-COMPLEX intrinsics are available via the -mamx-complex
> +  compiler switch.
> +  
>GCC now supports the Intel CPU named Raptor Lake through
>  -march=raptorlake.
>  Raptor Lake is based on Alder Lake.
> @@ -538,9 +542,13 @@ a work-in-progress.
>  The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, CMPccXADD
>  and RAO-INT ISA extensions.
>
> +  GCC now supports the Intel CPU named Emerald Rapids through
> +-march=emeraldrapids.
> +Emerald Rapids is based on Sapphire Rapids.
> +  
>GCC now supports the Intel CPU named Granite Rapids through
>  -march=graniterapids.
> -The switch enables the AMX-FP16 and PREFETCHI ISA extensions.
> +The switch enables the AMX-FP16, PREFETCHI and AMX-COMPLEX ISA 
> extensions.
Ok
>
>GCC now supports AMD CPUs based on the znver4 core
>  via -march=znver4.  The switch makes GCC consider
> --
> 2.31.1
>


-- 
BR,
Hongtao


[r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-04-12 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

58c8c1b383bc3c286d6527fc6e8fb62463f9a877 is the first bad commit
commit 58c8c1b383bc3c286d6527fc6e8fb62463f9a877
Author: Andre Vieira 
Date:   Tue Apr 11 10:07:43 2023 +0100

if-conv: Restore MASK_CALL conversion [PR10]

caused

FAIL: gcc.dg/vect/vect-simd-clone-16e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-7135/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16e.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16e.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16f.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16f.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17e.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17e.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17f.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17f.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18e.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18e.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)





Re: [PATCH] RISC-V: Fix PR108279

2023-04-12 Thread Jeff Law via Gcc-patches




On 3/27/23 00:59, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

 PR 108270

Fix bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.

Consider the following testcase:
void f (void * restrict in, void * restrict out, int l, int n, int m)
{
   for (int i = 0; i < l; i++){
 for (int j = 0; j < m; j++){
   for (int k = 0; k < n; k++)
 {
   vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
   __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
 }
 }
   }
}

Compile option: -O3

Before this patch:
        mv      a7,a2
        mv      a6,a0
        mv      t1,a1
        mv      a2,a3
        vsetivli        zero,17,e8,mf8,ta,ma
...

After this patch:
 mv  a7,a2
 mv  a6,a0
 mv  t1,a1
 mv  a2,a3
 ble a7,zero,.L1
 ble a4,zero,.L1
 ble a3,zero,.L1
 add a1,a0,a4
 li  a0,0
 vsetivli        zero,17,e8,mf8,ta,ma
...

It will produce a potential bug in a case like:

int main ()
{
   vsetivli zero, 100,.
   f (in, out, 0,0,0)
   asm volatile ("csrr a0,vl":::"memory");

   // Before this patch the a0 is 17. (Wrong).
   // After this patch the a0 is 100. (Correct).
   ...
}
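The hazard can be modeled in plain C with VL as a piece of global state (a deliberately simplified model for illustration; real vsetvl semantics are richer and the names here are invented). The bug was that the callee's vsetvl was emitted above the loop-bound checks, so it fired even when every loop had a zero trip count:

```c
#include <assert.h>

/* Simplified model: VL is global machine state written by vsetvl. */
static int vl;

static void vsetvl_model(int new_vl) { vl = new_vl; }

/* Correct placement (after the patch): vsetvl executes only when a
   vector insn in the loop body will actually execute. */
static void f_guarded(int l, int n, int m)
{
  for (int i = 0; i < l; i++)
    for (int j = 0; j < m; j++)
      for (int k = 0; k < n; k++)
        vsetvl_model(17);          /* the vle8/vse8 pair's configuration */
}

/* Buggy placement (before the patch): vsetvl hoisted past the guards,
   clobbering the caller's VL even for zero-trip loops. */
static void f_hoisted(int l, int n, int m)
{
  (void) l; (void) n; (void) m;
  vsetvl_model(17);
}
```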

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc 
(vector_infos_manager::all_empty_predecessor_p): New function.
 (pass_vsetvl::backward_demand_fusion): Fix bug.
 * config/riscv/riscv-vsetvl.h: New function declare.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt test.
 * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Adapt test.
 * gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.
I've largely figured this out.  But I'd still recommend we wait for
gcc-14.  The BZ is a missed optimization (poor placement of the vsetvl).
We can address it with your patch once gcc-13 branches.


Thanks for walking me through the implementation details.

Jeff


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-12 Thread 钟居哲
Yeah, like Kito said.
It turns out the tuple type model in ARM SVE is the optimal solution for RVV,
and we like the ARM SVE style implementation.

And now we see that swapping rtx_code and mode in rtx_def can keep rtx_def
within 64 bits overall.
But it seems that there is still a problem in tree_type_common and
tree_decl_common, is that right?

After several tries (removing all redundant TI/TF vector modes and the FP16
vector modes), there are now 252 modes in the RISC-V port.  Basically, I can
keep supporting the new RVV intrinsic features for now.
However, we can't support more in the future, for example, FP16 vectors, BF16
vectors, matrix modes, VLS modes, etc.

From the RVV side, I think extending machine mode by 1 more bit should be
enough for RVV (512 modes overall).
Is it possible to make that happen in tree_type_common and tree_decl_common,
Richards?

Thank you so much for all comments.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-04-12 17:31
To: Richard Biener
CC: juzhe.zh...@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; palmer; 
jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
> > The concept of fractional LMUL is the same as the concept of AArch64's
> > partial SVE vectors,
> > so they can only access the lowest part, like SVE's partial vector.
> >
> > We want to spill/restore the exact size of those modes (1/2, 1/4,
> > 1/8), so adding dedicated modes for those partial vector modes should
> > be unavoidable IMO.
> >
> > And even if we use sub-vector, we still need to define those partial
> > vector types.
>
> Could you use integer modes for the fractional vectors?
 
You mean using the scalar integer mode, like using (subreg:SI
(reg:VNx4SI) 0), to represent LMUL=1/4?
(Assume VNx4SI is the mode for M1.)
 
If so, I think it might not be able to model that right - it seems like
we are using 32 bits, but actually we are using poly_int16(1, 1) * 32 bits.
 
> For computation you can always appropriately limit the LEN?
 
RVV provides the zvl*b extensions (e.g. zvl128b or zvl256b) to guarantee
that the vector length is at least N bits, but that only guarantees the
minimal length, just like SVE guarantees a minimal vector length of
128 bits.
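The two size facts in this thread — the 8-bit machine_mode enum filling up, and fractional LMUL needing its own (smaller) spill sizes under a zvl*b guarantee — can be sketched with simple arithmetic (an illustration only; `mode_capacity` and `min_group_bytes` are invented helpers, not GCC code):

```c
#include <assert.h>

/* An 8-bit machine_mode enum tops out at 256 values; the RISC-V port is
   already at about 252.  One extra bit gives 512, sixteen bits 65536. */
static int mode_capacity(int bits) { return 1 << bits; }

/* Minimum bytes in one vector register group under a zvl<N>b guarantee
   (VLEN >= N bits).  Fractional LMUL (1/2, 1/4, 1/8) occupies only the
   low part of a register, like SVE partial vectors, so the minimum
   spill size shrinks with the fraction lmul_num/lmul_den. */
static int min_group_bytes(int zvl_bits, int lmul_num, int lmul_den)
{
  return zvl_bits / 8 * lmul_num / lmul_den;
}
```

This is why dedicated modes per fractional LMUL are hard to avoid: each fraction has a different minimum spill/restore size.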
 


Re: [PATCH] RISC-V: Fix PR108279

2023-04-12 Thread Jeff Law via Gcc-patches




On 4/5/23 07:53, juzhe.zh...@rivai.ai wrote:

 >> So fusion in this context is really about identifying cases where two

configuration settings are equivalent and you "fuse" them together.
Presumably this is only going to be possible when the vector insns are
just doing data movement rather than actual computations?



If my understanding is correct, I can kind of see why you're doing
fusion during phase 3.  My sense is there's a better way, but I'm having
a bit of trouble working out the details of what that should be to
myself.  In any event, revamping parts of the vsetvl insertion code
isn't the kind of thing we should be doing now.


The vsetvl demand fusion is not necessarily between "equivalent" demands;
instead, we do demand fusion when they are "compatible".
And the fusion can happen between any vector insns, including data movement
and actual computations.
I wasn't precise enough in my language, sorry about that.  "compatible"
would definitely have been a better choice of words on my part.





What is "compatible" ??  This definition is according to RVV ISA.
For example , For a vadd.vv need a vsetvl zero, 4, e32,m1,ta,ma.
and a vle.v need a vsetvl zero,4,e8,mf4,ta,ma.

According to RVV ISA:
vadd.vv demand SEW = 32, LMUL = M1, AVL = 4
vle.v demand RATIO = SEW/LMUL = 32, AVL = 4.
So after demand fusion, the demand becomes SEW = 32, LMUL = M1, AVL = 4.
Such vsetvl instruction is configured as this demand fusion, we call it 
"compatible"

since we can find a common vsetvl VL/VTYPE status for both vadd.vv and vle.v
Thanks.  Yea, that makes sense.  Maybe a better way to state what I was
thinking is that for pure data movement we have degrees of freedom to
adjust the vector configuration to match something else and thus remove
a vsetvl.
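The compatibility rule described above can be sketched as a small predicate (a simplified model with invented struct and field names; the real riscv-vsetvl.cc demand system tracks more state than this):

```c
#include <assert.h>

/* A vector insn's demand on the VL/VTYPE state.  LMUL is the fraction
   lmul_num/lmul_den; ratio_only marks insns (like vle.v here) that only
   care about the SEW/LMUL ratio, not the exact pair. */
struct demand { int sew; int lmul_num; int lmul_den; int avl; int ratio_only; };

static int ratio(const struct demand *d)
{
  return d->sew * d->lmul_den / d->lmul_num;   /* SEW / LMUL */
}

/* Two demands are "compatible" when one vsetvl can satisfy both. */
static int compatible(const struct demand *a, const struct demand *b)
{
  if (a->avl != b->avl)
    return 0;
  if (a->ratio_only || b->ratio_only)
    return ratio(a) == ratio(b);
  return a->sew == b->sew
         && a->lmul_num == b->lmul_num
         && a->lmul_den == b->lmul_den;
}
```

In the example above, e32,m1 and e8,mf4 both have ratio 32 and AVL 4, so a single vsetvl zero,4,e32,m1,ta,ma serves both insns.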


jeff


Re: Re: [PATCH] RISC-V: Fix incorrect condition of EEW = 64 mode

2023-04-12 Thread 钟居哲
Yeah. But this patch is not appropriate now since it conflicts with
upstream GCC.
I am going to re-check the current upstream GCC and the queued patches for
GCC 14.
If there are any conflicts, I will resend them.

Thanks


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-13 07:00
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix incorrect condition of EEW = 64 mode
 
 
On 4/6/23 19:11, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> This patch should be merged before this patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614935.html
> 
> According to the RVV ISA, EEW = 64 is enabled only when -march=*zve64*.
> The current condition is incorrect, since -march=*zve32*_zvl64b will
> enable EEW = 64.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vector-switch.def (ENTRY): Change to 
> TARGET_VECTOR_ELEN_64.
Just to be clear, this was for gcc-14, right?  I don't see these modes 
in the current trunk.
 
jeff
 


Re: [PATCH] RISC-V: Fix incorrect condition of EEW = 64 mode

2023-04-12 Thread Jeff Law via Gcc-patches




On 4/6/23 19:11, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

This patch should be merged before this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614935.html

According to the RVV ISA, EEW = 64 is enabled only when -march=*zve64*.
The current condition is incorrect, since -march=*zve32*_zvl64b will enable
EEW = 64.

gcc/ChangeLog:

 * config/riscv/riscv-vector-switch.def (ENTRY): Change to 
TARGET_VECTOR_ELEN_64.
Just to be clear, this was for gcc-14, right?  I don't see these modes 
in the current trunk.


jeff


[committed] libstdc++: Fix some AIX test failures

2023-04-12 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux and powerpc-aix. Pushed to trunk.

-- >8 --

AIX  defines struct tstate with non-reserved names, so
adjust the 17_intro/names.cc test. It also defines struct user, which
conflicts with namespace user in some tests.

Replacing the global operator new doesn't work on AIX the same way as it
does for ELF, so skip some tests that depend on replacing it.

Add missing DG directives to synchronized_value test so it doesn't run
for the single-threaded AIX multilib.

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc [_AIX]: Do not define policy.
* testsuite/19_diagnostics/error_code/cons/lwg3629.cc: Rename
namespace to avoid clashing with libc struct.
* testsuite/19_diagnostics/error_condition/cons/lwg3629.cc:
Likewise.
* testsuite/23_containers/unordered_map/96088.cc: Skip on AIX.
* testsuite/23_containers/unordered_multimap/96088.cc: Likewise.
* testsuite/23_containers/unordered_multiset/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/96088.cc: Likewise.
* testsuite/experimental/synchronized_value.cc: Require gthreads
and add missing option for pthreads targets.
---
 libstdc++-v3/testsuite/17_intro/names.cc   |  2 ++
 .../19_diagnostics/error_code/cons/lwg3629.cc  | 18 +-
 .../error_condition/cons/lwg3629.cc| 18 +-
 .../23_containers/unordered_map/96088.cc   |  1 +
 .../23_containers/unordered_multimap/96088.cc  |  1 +
 .../23_containers/unordered_multiset/96088.cc  |  1 +
 .../23_containers/unordered_set/96088.cc   |  1 +
 .../experimental/synchronized_value.cc |  2 ++
 8 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index 9932dea14d5..eb4d064177c 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -243,6 +243,8 @@
 #undef v
 //  defines trb::func and cputime_tmr::func
 #undef func
+//  defines tstate::policy
+#undef policy
 #endif
 
 #ifdef __APPLE__
diff --git a/libstdc++-v3/testsuite/19_diagnostics/error_code/cons/lwg3629.cc 
b/libstdc++-v3/testsuite/19_diagnostics/error_code/cons/lwg3629.cc
index 70fa5e80503..bd7c6ce3d9e 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/error_code/cons/lwg3629.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/error_code/cons/lwg3629.cc
@@ -3,18 +3,18 @@
 // 3629. make_error_code and make_error_condition are customization points
 // Verify that make_error_code is looked up using ADL only.
 
-namespace user
+namespace User
 {
   struct E1;
 }
 
 // N.B. not in associated namespace of E1, and declared before .
-user::E1 make_error_code(user::E1);
+User::E1 make_error_code(User::E1);
 
#include <future> // declares std::make_error_code(future_errc)
#include <system_error>
 
-namespace user
+namespace User
 {
   struct E1
   {
@@ -32,17 +32,17 @@ namespace user
   };
 }
 
-template<> struct std::is_error_code_enum<user::E1> : std::true_type { };
-template<> struct std::is_error_code_enum<user::E2> : std::true_type { };
-template<> struct std::is_error_code_enum<user::E3> : std::true_type { };
+template<> struct std::is_error_code_enum<User::E1> : std::true_type { };
+template<> struct std::is_error_code_enum<User::E2> : std::true_type { };
+template<> struct std::is_error_code_enum<User::E3> : std::true_type { };
 
 // ::make_error_code(E1) should not be found by name lookup.
-std::error_code e1( user::E1{} ); // { dg-error "here" }
+std::error_code e1( User::E1{} ); // { dg-error "here" }
 
 // std::make_error_code(future_errc) should not be found by name lookup.
-std::error_code e2( user::E2{} ); // { dg-error "here" }
+std::error_code e2( User::E2{} ); // { dg-error "here" }
 
 // std::make_error_code(errc) should not be found by name lookup.
-std::error_code e3( user::E3{} ); // { dg-error "here" }
+std::error_code e3( User::E3{} ); // { dg-error "here" }
 
 // { dg-error "use of deleted function" "" { target *-*-* } 0 }
diff --git 
a/libstdc++-v3/testsuite/19_diagnostics/error_condition/cons/lwg3629.cc 
b/libstdc++-v3/testsuite/19_diagnostics/error_condition/cons/lwg3629.cc
index 562a99aee3b..d72163b1a07 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/error_condition/cons/lwg3629.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/error_condition/cons/lwg3629.cc
@@ -3,18 +3,18 @@
 // 3629. make_error_code and make_error_condition are customization points
 // Verify that make_error_condition is looked up using ADL only.
 
-namespace user
+namespace User
 {
   struct E1;
 }
 
 // N.B. not in associated namespace of E1, and declared before .
-user::E1 make_error_condition(user::E1);
+User::E1 make_error_condition(User::E1);
 
#include <future> // declares std::make_error_condition(future_errc)
#include <system_error>
 
-namespace user
+namespace User
 {
   struct E1
   {
@@ -32,17 +32,17 @@ namespace user
   };
 }
 
-template<> struct std::is_error_condition_enum : std::true_type { };
-template<> struct std::is_error_condition_enum : 

[committed] libstdc++: Document libstdc++exp.a library for -fcontracts

2023-04-12 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/xml/manual/using.xml: Document libstdc++exp.a library.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/using.xml 
b/libstdc++-v3/doc/xml/manual/using.xml
index 7f011a6d931..3a507fc1671 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -108,6 +108,14 @@
   
 
 
+
+  -lstdc++exp
+  Linking to libstdc++exp
+is required for use of the C++ Contracts extensions enabled by
+-fcontracts.
+  
+
+
 
   -lstdc++_libbacktrace
   Until C++23 support is non-experimental, linking to
@@ -1700,14 +1708,25 @@ A quick read of the relevant part of the GCC
   no shared library for it. To use the library you should include
   experimental/filesystem
   and link with -lstdc++fs. The library implementation
-  is incomplete on non-POSIX platforms, specifically Windows support is
-  rudimentary.
+  is incomplete on non-POSIX platforms, specifically Windows is only
+  partially supported.
 
 
 
-  Due to the experimental nature of the Filesystem library the usual
+  GCC 13 includes an implementation of the C++ Contracts library defined by
+  http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1429r3.pdf;>P1429R3.
+  Because this is an experimental extension, not part of the C++ standard,
+  it is implemented in a separate library,
+  libstdc++exp.a, and there is
+  no shared library for it. To use the library you should include
+  experimental/contract
+  and link with -lstdc++exp.
+
+
+
+  Due to the experimental nature of these libraries the usual
   guarantees about ABI stability and backwards compatibility do not apply
-  to it. There is no guarantee that the components in any
+  to them. There is no guarantee that the components in any
   experimental/xxx
   header will remain compatible between different GCC releases.
 
-- 
2.39.2



Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-12 Thread Andrew MacLeod via Gcc-patches


On 4/12/23 07:01, Richard Biener wrote:

On Wed, Apr 12, 2023 at 12:59 PM Jakub Jelinek  wrote:


Would be nice.

Though, I'm afraid it still wouldn't fix the PR101912 testcase, because
it has exactly what happens in this PR, undefined phi arg from the
pre-header and uses of the previous iteration's value (i.e. across
backedge).

Well yes, that's what's not allowed.  So when the PHI dominates the
to-be-equivalenced argument edge src, then the equivalence isn't valid,
because there's a place (that very source block, for example) where a use
of the PHI lhs could appear and where we'd then mix up iterations.

If we want to implement this more cleanly, then as you say, we don't
create the equivalence if the PHI node dominates the argument edge.  The
attached patch does just that, removing both the "fix" for 108139 and the
just-committed one for 109462, replacing them with catching this at the
time of equivalence registering.


It bootstraps and passes all regressions tests.
Do you want me to check this into trunk?

Andrew

PS: Of course, we still fail 101912.  The only way I see us being able to
do anything with that is to effectively peel the first iteration off,
either physically, or logically with the path ranger, to determine whether
a given use is actually reachable by the undefined value.




   :
  # prevcorr_7 = PHI 
  # leapcnt_8 = PHI <0(2), leapcnt_26(8)>
  if (leapcnt_8 < n_16)   // 0 < n_16
    goto ; [INV]

   :
  corr_22 = getint ();
  if (corr_22 <= 0)
    goto ; [INV]
  else
    goto ; [INV]

   :
  _1 = corr_22 == 1;
  _2 = leapcnt_8 != 0;  // [0, 0] = 0 != 0
  _3 = _1 & _2; // [0, 0] = 0 & _2
  if (_3 != 0)    // 4->5 is not taken on the path starting 2->9

    goto ; [INV]
  else
    goto ; [INV]

   : // We know this path is not taken when prevcorr_7 == prevcorr_19(D)(2)

  if (prevcorr_7 != 1)
    goto ; [INV]
  else
    goto ; [INV]

   :
  _5 = prevcorr_7 + -1;
  if (prevcorr_7 != 2)
    goto ; [INV]
  else
    goto ; [INV]

Using the path ranger (would it even need tweaks, Aldy?), before issuing
the warning the uninit code could easily start at each use, construct the
path(s) to that use from the uninitialized value, and determine that when
prevcorr is uninitialized, 2->9->3->4->5 will not be executed, and of
course, neither will 2->9->3->4->5->6.


  I think threading already does something similar?
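The shape of that GIMPLE reduces to plain C along these lines (a hand-written sketch, not the actual PR101912 testcase): the read of the maybe-uninitialized variable is guarded by `leapcnt != 0`, so it is unreachable on the first iteration — exactly the path-sensitive fact such a query would have to prove.

```c
#include <assert.h>

/* prevcorr is assigned at the end of every iteration and read only when
   leapcnt != 0, i.e. never on the iteration entered with it still
   uninitialized.  A path-insensitive uninit analysis nevertheless sees
   an undefined PHI argument flowing in from the preheader. */
static int count_changes(const int *corrs, int n)
{
  int prevcorr;                          /* deliberately uninitialized */
  int flagged = 0;
  for (int leapcnt = 0; leapcnt < n; leapcnt++)
    {
      int corr = corrs[leapcnt];
      if (corr == 1 && leapcnt != 0     /* guard: first iteration excluded */
          && prevcorr != 1)
        flagged++;
      prevcorr = corr;                  /* defined before any guarded read */
    }
  return flagged;
}
```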



commit 79b13320cf739c965bc8c7ceb8b27903271a3f6e
Author: Andrew MacLeod 
Date:   Wed Apr 12 13:10:55 2023 -0400

Ensure PHI equivalencies do not dominate the argument edge.

When we create an equivalency between a PHI definition and an argument,
ensure the definition does not dominate the incoming argument edge.

PR tree-optimization/108139
PR tree-optimization/109462
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Remove
equivalency check for PHI nodes.
* gimple-range-fold.cc (fold_using_range::range_of_phi): Ensure def
does not dominate single-arg equivalency edges.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 3b52f1e734c..2314478d558 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1220,7 +1220,7 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
   // See if any equivalences can refine it.
   // PR 109462, like 108139 below, a one way equivalence introduced
   // by a PHI node can also be through the definition side.  Disallow it.
-  if (m_oracle && !is_a (SSA_NAME_DEF_STMT (name)))
+  if (m_oracle)
 	{
 	  tree equiv_name;
 	  relation_kind rel;
@@ -1237,13 +1237,6 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
 	  if (!m_gori.has_edge_range_p (equiv_name))
 		continue;
 
-	  // PR 108139. It is hazardous to assume an equivalence with
-	  // a PHI is the same value.  The PHI may be an equivalence
-	  // via UNDEFINED arguments which is really a one way equivalence.
-	  // PHIDEF == name, but name may not be == PHIDEF.
-	  if (is_a (SSA_NAME_DEF_STMT (equiv_name)))
-		continue;
-
 	  // Check if the equiv definition dominates this block
 	  if (equiv_bb == bb ||
 		  (equiv_bb && !dominated_by_p (CDI_DOMINATORS, bb, equiv_bb)))
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index e81f6b3699e..8860152d3a0 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -742,7 +742,8 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
 
   // Track if all executable arguments are the same.
   tree single_arg = NULL_TREE;
-  bool seen_arg = false;
+  edge single_arg_edge = NULL;
+  basic_block bb = gimple_bb (phi);
 
   // Start with an empty range, unioning in each argument's range.
   r.set_undefined ();
@@ -773,13 +774,23 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
 	src.gori ()->register_dependency 

Re: [Patch, fortran] PR109451 - ICE in gfc_conv_expr_descriptor with ASSOCIATE and substrings

2023-04-12 Thread Harald Anlauf via Gcc-patches

Hi Paul,

On 4/12/23 17:25, Paul Richard Thomas via Gcc-patches wrote:

Hi All,

I think that the changelog says it all. OK for mainline?


this looks almost fine, but still fails if one directly uses the
dummy argument as the ASSOCIATE target, as in:

program p
  implicit none
  character(4) :: c(2) = ["abcd","efgh"]
  call dcs0 (c)
! call dcs0 (["abcd","efgh"])
contains
  subroutine dcs0(a)
character(len=*), intent(in) :: a(:)
print *, size(a),len(a)
associate (q => a(:))
  print *, size(q),len(q)
end associate
associate (q => a(:)(:))
  print *, size(q),len(q)
end associate
return
  end subroutine dcs0
end

This prints e.g.

   2   4
   2   0
   2   0

(sometimes I also get junk values for the character length).

Can you please have another look?

Thanks,
Harald



Paul

Fortran: Fix some deferred character problems in associate [PR109451]

2023-04-07  Paul Thomas  

gcc/fortran
PR fortran/109451
* trans-array.cc (gfc_conv_expr_descriptor): Guard expression
character length backend decl before using it. Suppress the
assignment if lhs equals rhs.
* trans-io.cc (gfc_trans_transfer): Scalarize transfer of
associate variables pointing to a variable. Add comment.


gcc/testsuite/
PR fortran/109451
* gfortran.dg/associate_61.f90: New test.




Re: [V6][PATCH 1/2] Handle component_ref to a structure/union field including flexible array member [PR101832]

2023-04-12 Thread Kees Cook via Gcc-patches
On Tue, Mar 28, 2023 at 03:49:43PM +, Qing Zhao wrote:
> the C front-end has been approved by Joseph.
> 
> Jakub, could you please review the middle end part of the changes of this 
> patch?
> 
> The major change is in tree-object-size.cc (addr_object_size).
>  (To use the new TYPE_INCLUDE_FLEXARRAY info). 
> 
> This patch is to fix 
> PR101832(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832),
> and is needed for Linux Kernel security.  It’s better to be put into GCC13.
> 
> Thanks a lot!

Just to confirm, I've done build testing with the Linux kernel, and this
is behaving as I'd expect. This makes my life MUCH easier -- many fewer
false positives for our bounds checking. :)

-Kees

-- 
Kees Cook


New German PO file for 'gcc' (version 13.1-b20230409)

2023-04-12 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

https://translationproject.org/latest/gcc/de.po

(This file, 'gcc-13.1-b20230409.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH v2] ree: Improve ree pass for rs6000 target.

2023-04-12 Thread Bernhard Reutner-Fischer via Gcc-patches
On 6 April 2023 12:49:53 CEST, Ajit Agarwal via Gcc-patches 
 wrote:
>Hello All:
>
>Eliminate unnecessary redundant extensions within and across basic blocks.

To borrow HP's "arm-chair development mode", unfortunately most of the comments 
I had for the previous version apply to this version too, FWIW.

PS: Pretty please avoid moving functions around, if possible, as it spoils the 
history needlessly, IMHO.

thanks, and looking forward to a stage 1 patch.


Re: [PATCH] libstdc++: Implement ranges::enumerate_view from P2164R9

2023-04-12 Thread Patrick Palka via Gcc-patches
On Tue, Apr 11, 2023 at 11:12 AM Jonathan Wakely  wrote:
>
> On Tue, 11 Apr 2023 at 15:59, Patrick Palka via Libstdc++
>  wrote:
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk perhaps?
>
> Yes, this is only for C++23 so OK for trunk now.

Yay thanks, pushed as r13-7161-g0f3b4d38d4bad8.

>
> The auto(x) uses mean this won't work with older versions of Clang,
> but that's OK. I already introduced that dependency into
> basic_string::resize_for_overwrite, and it just means users of older
> Clang versions can't use some C++23 features. They can still use C++20
> and lower.

Ah, sounds good.  It seems our <ranges> is already unusable before
Clang 16 even in C++20 mode anyway: https://godbolt.org/z/Ebob8naMo

>
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/ranges (__cpp_lib_ranges_enumerate): Define
> > for C++23.
> > (__detail::__range_with_movable_reference): Likewise.
> > (enumerate_view): Likewise.
> > (enumerate_view::_Iterator): Likewise.
> > (enumerate_view::_Sentinel): Likewise.
> > * include/std/version (__cpp_lib_ranges_enumerate): Likewise.
> > * testsuite/std/ranges/version_c++23.cc: Verify value of
> > __cpp_lib_ranges_enumerate.
> > * testsuite/std/ranges/adaptors/enumerate/1.cc: New test.
> > ---
> >  libstdc++-v3/include/std/ranges   | 303 ++
> >  libstdc++-v3/include/std/version  |   1 +
> >  .../std/ranges/adaptors/enumerate/1.cc| 102 ++
> >  .../testsuite/std/ranges/version_c++23.cc |   4 +
> >  4 files changed, 410 insertions(+)
> >  create mode 100644 
> > libstdc++-v3/testsuite/std/ranges/adaptors/enumerate/1.cc
> >
> > diff --git a/libstdc++-v3/include/std/ranges 
> > b/libstdc++-v3/include/std/ranges
> > index 14754c125ff..be71c370eb7 100644
> > --- a/libstdc++-v3/include/std/ranges
> > +++ b/libstdc++-v3/include/std/ranges
> > @@ -8732,6 +8732,309 @@ namespace views::__adaptor
> >
> >  inline constexpr _AsConst as_const;
> >}
> > +
> > +#define __cpp_lib_ranges_enumerate 202302L
> > +
> > +  namespace __detail
> > +  {
> > +template
> > +  concept __range_with_movable_reference = input_range<_Range>
> > +   && move_constructible>
> > +   && move_constructible>;
> > +  }
> > +
> > +  template
> > +requires __detail::__range_with_movable_reference<_Vp>
> > +  class enumerate_view : public view_interface>
> > +  {
> > +_Vp _M_base = _Vp();
> > +
> > +template class _Iterator;
> > +template class _Sentinel;
> > +
> > +  public:
> > +enumerate_view() requires default_initializable<_Vp> = default;
> > +
> > +constexpr explicit
> > +enumerate_view(_Vp __base)
> > +: _M_base(std::move(__base))
> > +{ }
> > +
> > +constexpr auto
> > +begin() requires (!__detail::__simple_view<_Vp>)
> > +{ return _Iterator(ranges::begin(_M_base), 0); }
> > +
> > +constexpr auto
> > +begin() const requires __detail::__range_with_movable_reference > _Vp>
> > +{ return _Iterator(ranges::begin(_M_base), 0); }
> > +
> > +constexpr auto
> > +end() requires (!__detail::__simple_view<_Vp>)
> > +{
> > +  if constexpr (common_range<_Vp> && sized_range<_Vp>)
> > +   return _Iterator(ranges::end(_M_base), 
> > ranges::distance(_M_base));
> > +  else
> > +   return _Sentinel(ranges::end(_M_base));
> > +}
> > +
> > +constexpr auto
> > +end() const requires __detail::__range_with_movable_reference > _Vp>
> > +{
> > +  if constexpr (common_range && sized_range)
> > +   return _Iterator(ranges::end(_M_base), 
> > ranges::distance(_M_base));
> > +  else
> > +   return _Sentinel(ranges::end(_M_base));
> > +}
> > +
> > +constexpr auto
> > +size() requires sized_range<_Vp>
> > +{ return ranges::size(_M_base); }
> > +
> > +constexpr auto
> > +size() const requires sized_range
> > +{ return ranges::size(_M_base); }
> > +
> > +constexpr _Vp
> > +base() const & requires copy_constructible<_Vp>
> > +{ return _M_base; }
> > +
> > +constexpr _Vp
> > +base() &&
> > +{ return std::move(_M_base); }
> > +  };
> > +
> > +  template
> > +enumerate_view(_Range&&) -> enumerate_view>;
> > +
> > +  template
> > +inline constexpr bool enable_borrowed_range>
> > +  = enable_borrowed_range<_Tp>;
> > +
> > +  template
> > +  requires __detail::__range_with_movable_reference<_Vp>
> > +  template
> > +  class enumerate_view<_Vp>::_Iterator
> > +  {
> > +using _Base = __maybe_const_t<_Const, _Vp>;
> > +
> > +static auto
> > +_S_iter_concept()
> > +{
> > +  if constexpr (random_access_range<_Base>)
> > +   return random_access_iterator_tag{};
> > +  else if constexpr (bidirectional_range<_Base>)
> > +   return bidirectional_iterator_tag{};
> > +  else if constexpr (forward_range<_Base>)
> > +   return forward_iterator_tag{};
> > +  else
> > +   return 

Re: [PATCH] libstdc++: Ensure headers used by fast_float are included

2023-04-12 Thread Patrick Palka via Gcc-patches
On Wed, Apr 12, 2023 at 11:04 AM Jonathan Wakely  wrote:
>
> On Wed, 12 Apr 2023 at 14:45, Patrick Palka via Libstdc++
>  wrote:
> >
> > This makes floating_from_chars.cc explicitly include all headers
> > that are used by the original fast_float amalgamation according to
> > r12-6647-gf5c8b82512f9d3, except:
> >
> >   1.  since fast_float doesn't seem to use anything from it
> >   2.  since fast_float doesn't seem to use anything directly
> >  from it (as opposed to from )
> >   3.  since std::errc is naturally already available
> >  from 
> >
> > This avoids potential build failures on platforms for which some
> > required headers (namely ) end up not getting transitively
> > included from elsewhere.
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk/12?
>
> Yes for both, thanks.

Thanks, pushed as r13-7159-g13669111e7219e and r12-9396-g43ab94d20e1f68

>
>
> >
> > libstdc++-v3/ChangeLog:
> >
> > * src/c++17/floating_from_chars.cc: Include ,
> > ,  and .
> > ---
> >  libstdc++-v3/src/c++17/floating_from_chars.cc | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
> > b/libstdc++-v3/src/c++17/floating_from_chars.cc
> > index 5d18ca32dbb..3a411cf546a 100644
> > --- a/libstdc++-v3/src/c++17/floating_from_chars.cc
> > +++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
> > @@ -30,14 +30,18 @@
> >  // Prefer to use std::pmr::string if possible, which requires the cxx11 
> > ABI.
> >  #define _GLIBCXX_USE_CXX11_ABI 1
> >
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > --
> > 2.40.0.315.g0607f793cb
> >
>



[PATCH] combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-12 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek via Gcc-patches wrote:
> I've tried the pr108947.c testcase, but I see no differences in the assembly
> before/after the patch (but dunno if I'm using the right options).
> The pr109040.c testcase from the patch I don't see the expected zero
> extension without the patch and do see it with it.

Seems my cross defaulted to 32-bit compilation, reproduced it with
additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
So, let's include that test in the patch too:

2023-04-12  Jeff Law  
Jakub Jelinek  

PR target/108947
PR target/109040
* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
smaller than word_mode.
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
: Likewise.

* gcc.dg/pr108947.c: New test.
* gcc.c-torture/execute/pr109040.c: New test.

--- gcc/combine.cc.jj   2023-04-07 16:02:06.668051629 +0200
+++ gcc/combine.cc  2023-04-12 11:24:18.458240028 +0200
@@ -10055,9 +10055,12 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* See what bits may be nonzero in VAROP.  Unlike the general case of
  a call to nonzero_bits, here we don't care about bits outside
- MODE.  */
+ MODE unless WORD_REGISTER_OPERATIONS is true.  */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  scalar_int_mode tmode = mode;
+  if (WORD_REGISTER_OPERATIONS && GET_MODE_BITSIZE (mode) < BITS_PER_WORD)
+tmode = word_mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
  Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
@@ -10071,7 +10074,7 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* If VAROP is a NEG of something known to be zero or 1 and CONSTOP is
  a power of two, we can replace this with an ASHIFT.  */
-  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), mode) == 1
+  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), tmode) == 1
   && (i = exact_log2 (constop)) >= 0)
 return simplify_shift_const (NULL_RTX, ASHIFT, mode, XEXP (varop, 0), i);
 
--- gcc/simplify-rtx.cc.jj  2023-03-02 19:09:45.459594212 +0100
+++ gcc/simplify-rtx.cc 2023-04-12 11:26:26.027400305 +0200
@@ -3752,7 +3752,13 @@ simplify_context::simplify_binary_operat
return op0;
   if (HWI_COMPUTABLE_MODE_P (mode))
{
- HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+ /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+nonzero bits in WORD_MODE rather than MODE.  */
+  scalar_int_mode tmode = as_a <scalar_int_mode> (mode);
+  if (WORD_REGISTER_OPERATIONS
+ && GET_MODE_BITSIZE (tmode) < BITS_PER_WORD)
+   tmode = word_mode;
+ HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, tmode);
  HOST_WIDE_INT nzop1;
  if (CONST_INT_P (trueop1))
{
--- gcc/testsuite/gcc.dg/pr108947.c.jj  2023-04-12 18:54:13.115630365 +0200
+++ gcc/testsuite/gcc.dg/pr108947.c 2023-04-12 18:53:21.166372386 +0200
@@ -0,0 +1,21 @@
+/* PR target/108947 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-forward-propagate -Wno-psabi" } */
+
+typedef unsigned short __attribute__((__vector_size__ (2 * sizeof (short)))) V;
+
+__attribute__((__noipa__)) V
+foo (V v)
+{
+  V w = 3 > (v & 3992);
+  return w;
+}
+
+int
+main ()
+{
+  V w = foo ((V) { 0, 9 });
+  if (w[0] != 0xffff || w[1] != 0)
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr109040.c.jj   2023-04-12 
11:11:56.728938344 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr109040.c  2023-04-12 
11:11:56.728938344 +0200
@@ -0,0 +1,23 @@
+/* PR target/109040 */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+
+unsigned short a, b, c, d;
+
+void
+foo (V m, unsigned short *ret)
+{
+  V v = 6 > ((V) { 2124, 8 } & m);
+  unsigned short uc = v[0] + a + b + c + d;
+  *ret = uc;
+}
+
+int
+main ()
+{
+  unsigned short x;
  foo ((V) { 0, 15 }, &x);
+  if (x != (unsigned short) ~0)
+__builtin_abort ();
+  return 0;
+}


Jakub



Re: [RFC PATCH] range-op-float: Fix up op1_op2_relation of comparisons

2023-04-12 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 12, 2023 at 06:35:29PM +0200, Bernhard Reutner-Fischer wrote:
> >+relation_kind ret = le_op1_op2_relation (lhs);
> >+if (ret == VREL_GT
> >+&& (op1.known_isnan ()
> >+|| op1.maybe_isnan ()
> >+|| op2.known_isnan ()
> >+|| op2.maybe_isnan ()))
> >+  ret = VREL_VARYING; // Inverse of VREL_LE is VREL_UNGT with NAN ops.
> >+return ret;
> > return le_op1_op2_relation (lhs);
> 
> I think you forgot to delete the above return.

Thanks, adjusted in my copy.

Jakub



Re: [RFC PATCH] range-op-float: Fix up op1_op2_relation of comparisons

2023-04-12 Thread Bernhard Reutner-Fischer via Gcc-patches
On 12 April 2023 16:21:24 CEST, Jakub Jelinek via Gcc-patches 
 wrote:

>--- gcc/range-op-float.cc.jj   2023-04-12 12:17:44.784962757 +0200
>+++ gcc/range-op-float.cc  2023-04-12 16:07:54.948759355 +0200
>@@ -835,10 +835,17 @@ public:
>   bool fold_range (irange , tree type,
>  const frange , const frange ,
>  relation_trio = TRIO_VARYING) const final override;
>-  relation_kind op1_op2_relation (const irange , const frange &,
>-const frange &) const final override
>+  relation_kind op1_op2_relation (const irange , const frange ,
>+const frange ) const final override
>   {
>-return lt_op1_op2_relation (lhs);
>+relation_kind ret = lt_op1_op2_relation (lhs);
>+if (ret == VREL_GE
>+  && (op1.known_isnan ()
>+  || op1.maybe_isnan ()
>+  || op2.known_isnan ()
>+  || op2.maybe_isnan ()))
>+  ret = VREL_VARYING; // Inverse of VREL_LT is VREL_UNGE with NAN ops.
>+return ret;
>   }
>   bool op1_range (frange , tree type,
> const irange , const frange ,
>@@ -952,9 +959,17 @@ public:
>   bool fold_range (irange , tree type,
>  const frange , const frange ,
>  relation_trio rel = TRIO_VARYING) const final override;
>-  relation_kind op1_op2_relation (const irange , const frange &,
>-const frange &) const final override
>+  relation_kind op1_op2_relation (const irange , const frange ,
>+const frange ) const final override
>   {
>+relation_kind ret = le_op1_op2_relation (lhs);
>+if (ret == VREL_GT
>+  && (op1.known_isnan ()
>+  || op1.maybe_isnan ()
>+  || op2.known_isnan ()
>+  || op2.maybe_isnan ()))
>+  ret = VREL_VARYING; // Inverse of VREL_LE is VREL_UNGT with NAN ops.
>+return ret;
> return le_op1_op2_relation (lhs);

I think you forgot to delete the above return.

thanks,


Re: [PATCH] combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-12 Thread Segher Boessenkool
On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 12, 2023 at 08:21:26AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > I would have expected something like
> > WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), 
> > BITS_PER_WORD)
> > as the condition to use word_mode, rather than just
> > WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
> > used as is, not a narrower word_mode instead of them.
> 
> In patch form that would be following (given that the combine.cc change
> had scalar_int_mode mode we can as well just use normal comparison, and
> simplify-rtx.cc has it guarded on HWI_COMPUTABLE_MODE_P, which is also only
> true for scalar int modes).
> 
> I've tried the pr108947.c testcase, but I see no differences in the assembly
> before/after the patch (but dunno if I'm using the right options).
> The pr109040.c testcase from the patch I don't see the expected zero
> extension without the patch and do see it with it.
> 
> As before, I can only test this easily on non-WORD_REGISTER_OPERATIONS
> targets.

There are no doubt tens more similar WORD_REGISTER_OPERATIONS problems
lurking.  We would be much better off if this wart was removed and we
handled such things properly.

That said:

>   PR target/109040
>   * combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
>   word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
>   smaller than word_mode.
>   * simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
>   : Likewise.
> 
>   * gcc.c-torture/execute/pr109040.c: New test.

Okay for trunk.  Thanks!


Segher


[Patch, fortran] PR109451 - ICE in gfc_conv_expr_descriptor with ASSOCIATE and substrings

2023-04-12 Thread Paul Richard Thomas via Gcc-patches
Hi All,

I think that the changelog says it all. OK for mainline?

Paul

Fortran: Fix some deferred character problems in associate [PR109451]

2023-04-07  Paul Thomas  

gcc/fortran
PR fortran/109451
* trans-array.cc (gfc_conv_expr_descriptor): Guard expression
character length backend decl before using it. Suppress the
assignment if lhs equals rhs.
* trans-io.cc (gfc_trans_transfer): Scalarize transfer of
associate variables pointing to a variable. Add comment.


gcc/testsuite/
PR fortran/109451
* gfortran.dg/associate_61.f90: New test.
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index e1725808033..3d90a02cdac 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -7934,8 +7934,12 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
 	  else
 	tmp = se->string_length;
 
-	  if (expr->ts.deferred && VAR_P (expr->ts.u.cl->backend_decl))
-	    gfc_add_modify (&se->pre, expr->ts.u.cl->backend_decl, tmp);
+	  if (expr->ts.deferred && expr->ts.u.cl->backend_decl
+	  && VAR_P (expr->ts.u.cl->backend_decl))
+	{
+	  if (expr->ts.u.cl->backend_decl != tmp)
+	    gfc_add_modify (&se->pre, expr->ts.u.cl->backend_decl, tmp);
+	}
 	  else
 	expr->ts.u.cl->backend_decl = tmp;
 	}
diff --git a/gcc/fortran/trans-io.cc b/gcc/fortran/trans-io.cc
index 9b54d2f0d31..67658769b9e 100644
--- a/gcc/fortran/trans-io.cc
+++ b/gcc/fortran/trans-io.cc
@@ -2620,9 +2620,13 @@ gfc_trans_transfer (gfc_code * code)
 	  gcc_assert (ref && ref->type == REF_ARRAY);
 	}
 
+  /* These expressions don't always have the dtype element length set
+	 correctly, rendering them useless for array transfer.  */
   if (expr->ts.type != BT_CLASS
 	 && expr->expr_type == EXPR_VARIABLE
 	 && ((expr->symtree->n.sym->ts.type == BT_DERIVED && expr->ts.deferred)
+	 || (expr->symtree->n.sym->assoc
+		 && expr->symtree->n.sym->assoc->variable)
 	 || gfc_expr_attr (expr).pointer))
 	goto scalarize;
 
! { dg-do run }
! Test fixes for PR109451
! Contributed by Harald Anlauf  
!
  call dcs3(['abcd','efgh'])
contains
  subroutine dcs3(a)
character(len=*), intent(in)  :: a(:)
character(:), allocatable :: b(:)
b = a(:)
call test (b, a, 1)
associate (q => b(:))! no ICE, but prints the first element repeatedly
  call test (q, a, 2)
  print *, q
end associate
associate (q => b(:)(:)) ! ICE
  call test (q, a, 3)
  associate (r => q(:)(1:3))
call test (r, a(:)(1:3), 4)
  end associate
end associate
associate (q => b(:)(2:3))
  call test (q, a(:)(2:3), 5)
end associate
  end subroutine dcs3
  subroutine test (x, y, i)
character(len=*), intent(in) :: x(:), y(:)
integer, intent(in) :: i
if (any (x .ne. y)) stop i
  end subroutine test
end
! { dg-output " abcdefgh" }


Re: [PATCH v2][RFC] vect: Verify that GET_MODE_NUNITS is greater than one for vect_grouped_store_supported

2023-04-12 Thread Kevin Lee
Thank you for the feedback Richard and Richard.

> Note the calls are guarded with
>
>   && ! known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U)

Yes, I believe nelt.is_constant() wouldn't be necessary. I didn't realize
the call was guarded by this condition.

> But I think the better check for location above is:
>
>if (!multiple_p (nelt, 2))
> return false;
>
> which then guards the assert in the later exact_div (nelt, 2).

I believe this check is better than using maybe_lt because it properly
guards exact_div(nelt, 2) and vec_perm_builder sel(nelt, 2, 3).
I'll modify the patch accordingly, build, test and submit the patch. Thank
you!!

Sincerely,


Re: [PATCH] RISC-V: Fix pr109479 RVV ISA inconsistency bug

2023-04-12 Thread Kito Cheng via Gcc-patches
Thanks for the quick response! Verified and pushed to trunk.

On Wed, Apr 12, 2023 at 9:56 PM  wrote:
>
> From: Ju-Zhe Zhong 
>
> Fix supporting data type according to RVV ISA.
> For vint64m*_t, we should only allow them in zve64* instead of zve32*_zvl64b 
> (>=64b).
> Ideally, we should make the error message more friendly, like Clang's
> (https://godbolt.org/z/f9GMv4dMo), which reports the extension name the RVV
> type requires.
> However, I failed to find a way to do that, so current GCC can only report an
> "unknown" type.
> I added comments to remind us to do this in the future.
>
> PR 109479
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-types.def (vint8mf8_t): Fix 
> predicate.
> (vint16mf4_t): Ditto.
> (vint32mf2_t): Ditto.
> (vint64m1_t): Ditto.
> (vint64m2_t): Ditto.
> (vint64m4_t): Ditto.
> (vint64m8_t): Ditto.
> (vuint8mf8_t): Ditto.
> (vuint16mf4_t): Ditto.
> (vuint32mf2_t): Ditto.
> (vuint64m1_t): Ditto.
> (vuint64m2_t): Ditto.
> (vuint64m4_t): Ditto.
> (vuint64m8_t): Ditto.
> (vfloat32mf2_t): Ditto.
> (vbool64_t): Ditto.
> * config/riscv/riscv-vector-builtins.cc (register_builtin_type): Add 
> comments.
> (register_vector_type): Ditto.
> (check_required_extensions): Fix condition.
> * config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ZVE64): Remove it.
> (RVV_REQUIRE_ELEN_64): New define.
> (RVV_REQUIRE_MIN_VLEN_64): Ditto.
> * config/riscv/riscv-vector-switch.def (TARGET_VECTOR_FP32): Remove 
> it.
> (TARGET_VECTOR_FP64): Ditto.
> (ENTRY): Fix predicate.
> * config/riscv/vector-iterators.md: Fix predicate.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr109479-1.c: New test.
> * gcc.target/riscv/rvv/base/pr109479-2.c: New test.
> * gcc.target/riscv/rvv/base/pr109479-3.c: New test.
> * gcc.target/riscv/rvv/base/pr109479-4.c: New test.
> * gcc.target/riscv/rvv/base/pr109479-5.c: New test.
> * gcc.target/riscv/rvv/base/pr109479-6.c: New test.
>
> ---
>  .../riscv/riscv-vector-builtins-types.def | 348 +-
>  gcc/config/riscv/riscv-vector-builtins.cc |  14 +-
>  gcc/config/riscv/riscv-vector-builtins.h  |   3 +-
>  gcc/config/riscv/riscv-vector-switch.def  |  56 ++-
>  gcc/config/riscv/vector-iterators.md  |  68 ++--
>  .../gcc.target/riscv/rvv/base/pr109479-1.c|  13 +
>  .../gcc.target/riscv/rvv/base/pr109479-2.c|  13 +
>  .../gcc.target/riscv/rvv/base/pr109479-3.c|  20 +
>  .../gcc.target/riscv/rvv/base/pr109479-4.c|  20 +
>  .../gcc.target/riscv/rvv/base/pr109479-5.c|  20 +
>  .../gcc.target/riscv/rvv/base/pr109479-6.c|  20 +
>  11 files changed, 349 insertions(+), 246 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-6.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
> b/gcc/config/riscv/riscv-vector-builtins-types.def
> index a55d494f1d9..a74df066521 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-types.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-types.def
> @@ -235,53 +235,53 @@ along with GCC; see the file COPYING3. If not see
>  #define DEF_RVV_LMUL4_OPS(TYPE, REQUIRE)
>  #endif
>
> -DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_I_OPS (vint8mf4_t, 0)
>  DEF_RVV_I_OPS (vint8mf2_t, 0)
>  DEF_RVV_I_OPS (vint8m1_t, 0)
>  DEF_RVV_I_OPS (vint8m2_t, 0)
>  DEF_RVV_I_OPS (vint8m4_t, 0)
>  DEF_RVV_I_OPS (vint8m8_t, 0)
> -DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_I_OPS (vint16mf2_t, 0)
>  DEF_RVV_I_OPS (vint16m1_t, 0)
>  DEF_RVV_I_OPS (vint16m2_t, 0)
>  DEF_RVV_I_OPS (vint16m4_t, 0)
>  DEF_RVV_I_OPS (vint16m8_t, 0)
> -DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_I_OPS (vint32m1_t, 0)
>  DEF_RVV_I_OPS (vint32m2_t, 0)
>  DEF_RVV_I_OPS (vint32m4_t, 0)
>  DEF_RVV_I_OPS (vint32m8_t, 0)
> -DEF_RVV_I_OPS (vint64m1_t, RVV_REQUIRE_ZVE64)
> -DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ZVE64)
> -DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ZVE64)
> -DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
>
> -DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_ZVE64)
> 

Re: [PATCH] libstdc++: Fix chunk_by_view when value_type& and reference differ [PR108291]

2023-04-12 Thread Jonathan Wakely via Gcc-patches
On Wed, 12 Apr 2023 at 15:41, Patrick Palka via Libstdc++
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

OK, thanks.


>
> PR libstdc++/108291
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (chunk_by_view::_M_find_next): Generalize
> parameter types of the predicate passed to adjacent_find.
> (chunk_by_view::_M_find_prev): Likewise.
> * testsuite/std/ranges/adaptors/chunk_by/1.cc (test04, test05):
> New tests.
> ---
>  libstdc++-v3/include/std/ranges   |  8 ++---
>  .../std/ranges/adaptors/chunk_by/1.cc | 35 +++
>  2 files changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index be71c370eb7..dc37a8afe51 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -6743,8 +6743,8 @@ namespace views::__adaptor
>  _M_find_next(iterator_t<_Vp> __current)
>  {
>__glibcxx_assert(_M_pred.has_value());
> -  auto __pred = [this]<typename _Tp>(_Tp&& __x, _Tp&& __y) {
> -   return !bool((*_M_pred)(std::forward<_Tp>(__x), 
> std::forward<_Tp>(__y)));
> +  auto __pred = [this]<typename _Tp, typename _Up>(_Tp&& __x, _Up&& __y) {
> +   return !bool((*_M_pred)(std::forward<_Tp>(__x), 
> std::forward<_Up>(__y)));
>};
>auto __it = ranges::adjacent_find(__current, ranges::end(_M_base), 
> __pred);
>return ranges::next(__it, 1, ranges::end(_M_base));
> @@ -6754,8 +6754,8 @@ namespace views::__adaptor
>  _M_find_prev(iterator_t<_Vp> __current) requires bidirectional_range<_Vp>
>  {
>__glibcxx_assert(_M_pred.has_value());
> -  auto __pred = [this]<typename _Tp>(_Tp&& __x, _Tp&& __y) {
> -   return !bool((*_M_pred)(std::forward<_Tp>(__y), 
> std::forward<_Tp>(__x)));
> +  auto __pred = [this]<typename _Tp, typename _Up>(_Tp&& __x, _Up&& __y) {
> +   return !bool((*_M_pred)(std::forward<_Up>(__y), 
> std::forward<_Tp>(__x)));
>};
>auto __rbegin = std::make_reverse_iterator(__current);
>auto __rend = std::make_reverse_iterator(ranges::begin(_M_base));
> diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/chunk_by/1.cc 
> b/libstdc++-v3/testsuite/std/ranges/adaptors/chunk_by/1.cc
> index f165c7d9a95..a8fceb105e0 100644
> --- a/libstdc++-v3/testsuite/std/ranges/adaptors/chunk_by/1.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/adaptors/chunk_by/1.cc
> @@ -61,10 +61,45 @@ test03()
>ranges::chunk_by_view, ranges::equal_to> r;
>  }
>
> +constexpr bool
> +test04()
> +{
> +  // PR libstdc++/108291
> +  using namespace std::literals;
> +  std::string_view s = "hello";
> +  auto r = s | views::chunk_by(std::less{});
> +  VERIFY( ranges::equal(r,
> +   (std::string_view[]){"h"sv, "el"sv, "lo"sv},
> +   ranges::equal) );
> +  VERIFY( ranges::equal(r | views::reverse,
> +   (std::string_view[]){"lo"sv, "el"sv, "h"sv},
> +   ranges::equal) );
> +
> +  return true;
> +}
> +
> +void
> +test05()
> +{
> +  // PR libstdc++/109474
> +  std::vector<bool> v = {true, false, true, true, false, false};
> +  auto r = v | views::chunk_by(std::equal_to{});
> +  VERIFY( ranges::equal(r,
> +   (std::initializer_list<bool>[])
> + {{true}, {false}, {true, true}, {false, false}},
> +   ranges::equal) );
> +  VERIFY( ranges::equal(r | views::reverse,
> +   (std::initializer_list<bool>[])
> + {{false, false}, {true, true}, {false}, {true}},
> +   ranges::equal) );
> +}
> +
>  int
>  main()
>  {
>static_assert(test01());
>test02();
>test03();
> +  static_assert(test04());
> +  test05();
>  }
> --
> 2.40.0.335.g9857273be0
>



Re: [PATCH] libstdc++: Ensure headers used by fast_float are included

2023-04-12 Thread Jonathan Wakely via Gcc-patches
On Wed, 12 Apr 2023 at 14:45, Patrick Palka via Libstdc++
 wrote:
>
> This makes floating_from_chars.cc explicitly include all headers
> that are used by the original fast_float amalgamation according to
> r12-6647-gf5c8b82512f9d3, except:
>
>   1.  since fast_float doesn't seem to use anything from it
>   2.  since fast_float doesn't seem to use anything directly
>  from it (as opposed to from )
>   3.  since std::errc is naturally already available
>  from 
>
> This avoids potential build failures on platforms for which some
> required headers (namely ) end up not getting transitively
> included from elsewhere.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/12?

Yes for both, thanks.


>
> libstdc++-v3/ChangeLog:
>
> * src/c++17/floating_from_chars.cc: Include ,
> ,  and .
> ---
>  libstdc++-v3/src/c++17/floating_from_chars.cc | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
> b/libstdc++-v3/src/c++17/floating_from_chars.cc
> index 5d18ca32dbb..3a411cf546a 100644
> --- a/libstdc++-v3/src/c++17/floating_from_chars.cc
> +++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
> @@ -30,14 +30,18 @@
>  // Prefer to use std::pmr::string if possible, which requires the cxx11 ABI.
>  #define _GLIBCXX_USE_CXX11_ABI 1
>
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> --
> 2.40.0.315.g0607f793cb
>



Ping #2: [PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads

2023-04-12 Thread Michael Meissner via Gcc-patches
Ping for patch 105325.  I believe patch V3 answers the objections raised
earlier.  Can I check this patch into master?  Then can I apply this patch to
GCC 12 and 11 after appropriate delays?

| Date: Mon, 27 Mar 2023 23:19:55 -0400
| Subject: [PATCH, V3] PR target/105325, Make load/cmp fusion know about 
prefixed loads
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] reassoc: Fix up another ICE with returns_twice call [PR109410]

2023-04-12 Thread Richard Biener via Gcc-patches



> Am 12.04.2023 um 16:37 schrieb Jakub Jelinek via Gcc-patches 
> :
> 
> Hi!
> 
> The following testcase ICEs in reassoc; unlike the last case I've fixed
> there, here SSA_NAME_USED_IN_ABNORMAL_PHI is not the case anywhere.
> build_and_add_sum places new statements after the later-appearing definition
> of an operand, but if both operands are default defs or constants, we place
> the statement at the start of the function.
> 
> If the very first statement of a function is a call to returns_twice
> function, this doesn't work though, because that call has to be the first
> thing in its basic block, so the following patch splits the entry successor
> edge such that the new statements are added into a different block from the
> returns_twice call.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

> I think we should in stage1 reconsider such placements, I think it
> unnecessarily enlarges the lifetime of the new lhs if its operand(s)
> are used more than once in the function.  Unless something sinks those
> again.  Would be nice to place it closer to the actual uses (or where
> they will be placed).

IStR there is code to do that in reassoc already.

> 
> 2023-04-12  Jakub Jelinek  
> 
>PR tree-optimization/109410
>* tree-ssa-reassoc.cc (build_and_add_sum): Split edge from entry
>block if first statement of the function is a call to returns_twice
>function.
> 
>* gcc.dg/pr109410.c: New test.
> 
> --- gcc/tree-ssa-reassoc.cc.jj2023-02-18 12:40:42.739131728 +0100
> +++ gcc/tree-ssa-reassoc.cc2023-04-12 13:23:49.083979843 +0200
> @@ -1564,6 +1564,15 @@ build_and_add_sum (tree type, tree op1,
>   && (!op2def || gimple_nop_p (op2def)))
> {
>   gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> +  if (!gsi_end_p (gsi)
> +  && is_gimple_call (gsi_stmt (gsi))
> +  && (gimple_call_flags (gsi_stmt (gsi)) & ECF_RETURNS_TWICE))
> +{
> +  /* Don't add statements before a returns_twice call at the start
> + of a function.  */
> +  split_edge (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> +  gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> +}
>   if (gsi_end_p (gsi))
>{
>  gimple_stmt_iterator gsi2
> --- gcc/testsuite/gcc.dg/pr109410.c.jj2023-04-12 13:42:41.759751843 +0200
> +++ gcc/testsuite/gcc.dg/pr109410.c2023-04-12 13:42:27.249959585 +0200
> @@ -0,0 +1,19 @@
> +/* PR tree-optimization/109410 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +__attribute__((returns_twice)) int baz (int, int);
> +
> +int
> +bar (int x)
> +{
> +  return x;
> +}
> +
> +int
> +foo (int x, int y)
> +{
> +  baz (x, y);
> +  int a = bar (x);
> +  return y || a == 42 || a > 42;
> +}
> 
>Jakub
> 


Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-12 Thread Michael Meissner via Gcc-patches
On Wed, Apr 12, 2023 at 01:31:46PM +0800, Jiufu Guo wrote:
> I understand that QP insns (e.g. xscmpexpqp) is valid if the system
> meets ISA3.0, no matter BE/LE, 32-bit/64-bit.
> I think option -mfloat128-hardware is designed for QP insns.
> 
> While there is one issue, on BE machine, when compiling with options
> "-mfloat128-hardware -m32", an error message is generated:
> "error: '%<-mfloat128-hardware%>' requires '-m64'"
> 
> (I'm wondering if we need to relax this limitation.)

In the past, the machine-independent portion of the compiler demanded that for
each scalar mode there be an integer mode of the same size, since moves are
sometimes converted to use an integer RTL mode.  Since we don't have TImode
support in 32-bit, you would get various errors because something tried to do a
TImode move for KFmode types, and TImode wasn't available.

If somebody wants to verify that this now works on 32-bit and/or implements
TImode on 32-bit, then we can relax the restriction.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] libstdc++: Fix chunk_by_view when value_type& and reference differ [PR108291]

2023-04-12 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

PR libstdc++/108291

libstdc++-v3/ChangeLog:

* include/std/ranges (chunk_by_view::_M_find_next): Generalize
parameter types of the predicate passed to adjacent_find.
(chunk_by_view::_M_find_prev): Likewise.
* testsuite/std/ranges/adaptors/chunk_by/1.cc (test04, test05):
New tests.
---
 libstdc++-v3/include/std/ranges   |  8 ++---
 .../std/ranges/adaptors/chunk_by/1.cc | 35 +++
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index be71c370eb7..dc37a8afe51 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -6743,8 +6743,8 @@ namespace views::__adaptor
 _M_find_next(iterator_t<_Vp> __current)
 {
   __glibcxx_assert(_M_pred.has_value());
-  auto __pred = [this](_Tp&& __x, _Tp&& __y) {
-   return !bool((*_M_pred)(std::forward<_Tp>(__x), 
std::forward<_Tp>(__y)));
+  auto __pred = [this](_Tp&& __x, _Up&& __y) {
+   return !bool((*_M_pred)(std::forward<_Tp>(__x), 
std::forward<_Up>(__y)));
   };
   auto __it = ranges::adjacent_find(__current, ranges::end(_M_base), 
__pred);
   return ranges::next(__it, 1, ranges::end(_M_base));
@@ -6754,8 +6754,8 @@ namespace views::__adaptor
 _M_find_prev(iterator_t<_Vp> __current) requires bidirectional_range<_Vp>
 {
   __glibcxx_assert(_M_pred.has_value());
-  auto __pred = [this](_Tp&& __x, _Tp&& __y) {
-   return !bool((*_M_pred)(std::forward<_Tp>(__y), 
std::forward<_Tp>(__x)));
+  auto __pred = [this](_Tp&& __x, _Up&& __y) {
+   return !bool((*_M_pred)(std::forward<_Up>(__y), 
std::forward<_Tp>(__x)));
   };
   auto __rbegin = std::make_reverse_iterator(__current);
   auto __rend = std::make_reverse_iterator(ranges::begin(_M_base));
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/chunk_by/1.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/chunk_by/1.cc
index f165c7d9a95..a8fceb105e0 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/chunk_by/1.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/chunk_by/1.cc
@@ -61,10 +61,45 @@ test03()
   ranges::chunk_by_view, ranges::equal_to> r;
 }
 
+constexpr bool
+test04()
+{
+  // PR libstdc++/108291
+  using namespace std::literals;
+  std::string_view s = "hello";
+  auto r = s | views::chunk_by(std::less{});
+  VERIFY( ranges::equal(r,
+   (std::string_view[]){"h"sv, "el"sv, "lo"sv},
+   ranges::equal) );
+  VERIFY( ranges::equal(r | views::reverse,
+   (std::string_view[]){"lo"sv, "el"sv, "h"sv},
+   ranges::equal) );
+
+  return true;
+}
+
+void
+test05()
+{
+  // PR libstdc++/109474
+  std::vector v = {true, false, true, true, false, false};
+  auto r = v | views::chunk_by(std::equal_to{});
+  VERIFY( ranges::equal(r,
+   (std::initializer_list[])
+ {{true}, {false}, {true, true}, {false, false}},
+   ranges::equal) );
+  VERIFY( ranges::equal(r | views::reverse,
+   (std::initializer_list[])
+ {{false, false}, {true, true}, {false}, {true}},
+   ranges::equal) );
+}
+
 int
 main()
 {
   static_assert(test01());
   test02();
   test03();
+  static_assert(test04());
+  test05();
 }
-- 
2.40.0.335.g9857273be0



Re: [PATCH] mingw: Support building with older gcc versions

2023-04-12 Thread Jonathan Yong via Gcc-patches

On 4/12/23 13:39, Costas Argyris wrote:

This is proposed to fix PR109460, where an older version of
gcc (7.3) was used to build for a windows (mingw) host.



Thanks, accepted and pushed to master branch.



[PATCH] reassoc: Fix up another ICE with returns_twice call [PR109410]

2023-04-12 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase ICEs in reassoc; unlike the last case I've fixed
there, here SSA_NAME_USED_IN_ABNORMAL_PHI is not the case anywhere.
build_and_add_sum places new statements after the later-appearing definition
of an operand, but if both operands are default defs or constants, we place
the statement at the start of the function.

If the very first statement of a function is a call to returns_twice
function, this doesn't work though, because that call has to be the first
thing in its basic block, so the following patch splits the entry successor
edge such that the new statements are added into a different block from the
returns_twice call.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I think we should in stage1 reconsider such placements, I think it
unnecessarily enlarges the lifetime of the new lhs if its operand(s)
are used more than once in the function.  Unless something sinks those
again.  Would be nice to place it closer to the actual uses (or where
they will be placed).

2023-04-12  Jakub Jelinek  

PR tree-optimization/109410
* tree-ssa-reassoc.cc (build_and_add_sum): Split edge from entry
block if first statement of the function is a call to returns_twice
function.

* gcc.dg/pr109410.c: New test.

--- gcc/tree-ssa-reassoc.cc.jj  2023-02-18 12:40:42.739131728 +0100
+++ gcc/tree-ssa-reassoc.cc 2023-04-12 13:23:49.083979843 +0200
@@ -1564,6 +1564,15 @@ build_and_add_sum (tree type, tree op1,
   && (!op2def || gimple_nop_p (op2def)))
 {
   gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+  if (!gsi_end_p (gsi)
+ && is_gimple_call (gsi_stmt (gsi))
+ && (gimple_call_flags (gsi_stmt (gsi)) & ECF_RETURNS_TWICE))
+   {
+ /* Don't add statements before a returns_twice call at the start
+of a function.  */
+ split_edge (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+ gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+   }
   if (gsi_end_p (gsi))
{
  gimple_stmt_iterator gsi2
--- gcc/testsuite/gcc.dg/pr109410.c.jj  2023-04-12 13:42:41.759751843 +0200
+++ gcc/testsuite/gcc.dg/pr109410.c 2023-04-12 13:42:27.249959585 +0200
@@ -0,0 +1,19 @@
+/* PR tree-optimization/109410 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+__attribute__((returns_twice)) int baz (int, int);
+
+int
+bar (int x)
+{
+  return x;
+}
+
+int
+foo (int x, int y)
+{
+  baz (x, y);
+  int a = bar (x);
+  return y || a == 42 || a > 42;
+}

Jakub



Re: [PATCH] i386: Fix up z operand modifier diagnostics on inline-asm [PR109458]

2023-04-12 Thread Uros Bizjak via Gcc-patches
On Wed, Apr 12, 2023 at 4:28 PM Jakub Jelinek  wrote:
>
> Hi!
>
> On the following testcase, we emit weird diagnostics.
> User used the z modifier, but diagnostics talks about Z instead.
> This is because z is implemented by doing some stuff and then falling
> through into the Z case.
>
> The following patch adjusts the Z diagnostics, such that it prints whatever
> modifier user actually uses in places which could happen with either
> modifier.
>
> Furthermore, in case of the non-integer operand used with operand code %
> warning the warning location was incorrect (and of function), so I've used
> warning_for_asm to get it a proper location in case it is a user inline-asm.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2023-04-12  Jakub Jelinek  
>
> PR target/109458
> * config/i386/i386.cc: Include rtl-error.h.
> (ix86_print_operand): For z modifier warning, use warning_for_asm
> if this_is_asm_operands.  For Z modifier errors, use %c and code
> instead of hardcoded Z.
>
> * gcc.target/i386/pr109458.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.cc.jj  2023-03-31 09:26:47.970219929 +0200
> +++ gcc/config/i386/i386.cc 2023-04-10 10:21:39.506793959 +0200
> @@ -96,6 +96,7 @@ along with GCC; see the file COPYING3.
>  #include "i386-expand.h"
>  #include "i386-features.h"
>  #include "function-abi.h"
> +#include "rtl-error.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -13218,7 +13219,13 @@ ix86_print_operand (FILE *file, rtx x, i
> }
>
>   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
> -   warning (0, "non-integer operand used with operand code %");
> +   {
> + if (this_is_asm_operands)
> +   warning_for_asm (this_is_asm_operands,
> +"non-integer operand used with operand code 
> %");
> + else
> +   warning (0, "non-integer operand used with operand code 
> %");
> +   }
>   /* FALLTHRU */
>
> case 'Z':
> @@ -13281,11 +13288,12 @@ ix86_print_operand (FILE *file, rtx x, i
>   else
> {
>   output_operand_lossage ("invalid operand type used with "
> - "operand code 'Z'");
> + "operand code '%c'", code);
>   return;
> }
>
> - output_operand_lossage ("invalid operand size for operand code 
> 'Z'");
> + output_operand_lossage ("invalid operand size for operand code 
> '%c'",
> + code);
>   return;
>
> case 'd':
> --- gcc/testsuite/gcc.target/i386/pr109458.c.jj 2023-04-10 10:30:44.950822263 
> +0200
> +++ gcc/testsuite/gcc.target/i386/pr109458.c2023-04-10 10:30:22.257153906 
> +0200
> @@ -0,0 +1,13 @@
> +/* PR target/109458 */
> +/* { dg-do compile } */
> +/* { dg-options "-msse2" } */
> +
> +void
> +foo (_Float16 x)
> +{
> +  asm volatile ("# %z0" : : "i" (42)); /* { dg-error "invalid 'asm': invalid 
> operand type used with operand code 'z'" } */
> +  asm volatile ("# %Z0" : : "i" (42)); /* { dg-error "invalid 'asm': invalid 
> operand type used with operand code 'Z'" } */
> +  asm volatile ("# %z0" : : "x" (x));  /* { dg-error "invalid 'asm': invalid 
> operand size for operand code 'z'" } */
> +   /* { dg-warning "non-integer operand 
> used with operand code 'z'" "" { target *-*-* } .-1 } */
> +  asm volatile ("# %Z0" : : "x" (x));  /* { dg-error "invalid 'asm': invalid 
> operand size for operand code 'Z'" } */
> +}
>
> Jakub
>


Re: [PATCH] combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-12 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 12, 2023 at 08:17:46AM -0600, Jeff Law wrote:
> Looks pretty sensible.  It'll take most of the day, but I'll do a bootstrap
> and regression test with this variant.

Thanks.  Note, it bootstraps/regtests on x86_64-linux and i686-linux fine,
though those are not WORD_REGISTER_OPERATIONS targets.  And it builds cc1 on
aarch64-linux (I just wanted to make sure I didn't break anything on
2-coefficient poly-int arches).

Jakub



[PATCH] i386: Fix up z operand modifier diagnostics on inline-asm [PR109458]

2023-04-12 Thread Jakub Jelinek via Gcc-patches
Hi!

On the following testcase, we emit weird diagnostics.
User used the z modifier, but diagnostics talks about Z instead.
This is because z is implemented by doing some stuff and then falling
through into the Z case.

The following patch adjusts the Z diagnostics, such that it prints whatever
modifier user actually uses in places which could happen with either
modifier.

Furthermore, in case of the non-integer operand used with operand code %
warning the warning location was incorrect (and of function), so I've used
warning_for_asm to get it a proper location in case it is a user inline-asm.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-04-12  Jakub Jelinek  

PR target/109458
* config/i386/i386.cc: Include rtl-error.h.
(ix86_print_operand): For z modifier warning, use warning_for_asm
if this_is_asm_operands.  For Z modifier errors, use %c and code
instead of hardcoded Z.

* gcc.target/i386/pr109458.c: New test.

--- gcc/config/i386/i386.cc.jj  2023-03-31 09:26:47.970219929 +0200
+++ gcc/config/i386/i386.cc 2023-04-10 10:21:39.506793959 +0200
@@ -96,6 +96,7 @@ along with GCC; see the file COPYING3.
 #include "i386-expand.h"
 #include "i386-features.h"
 #include "function-abi.h"
+#include "rtl-error.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -13218,7 +13219,13 @@ ix86_print_operand (FILE *file, rtx x, i
}
 
  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
-   warning (0, "non-integer operand used with operand code %");
+   {
+ if (this_is_asm_operands)
+   warning_for_asm (this_is_asm_operands,
+"non-integer operand used with operand code 
%");
+ else
+   warning (0, "non-integer operand used with operand code %");
+   }
  /* FALLTHRU */
 
case 'Z':
@@ -13281,11 +13288,12 @@ ix86_print_operand (FILE *file, rtx x, i
  else
{
  output_operand_lossage ("invalid operand type used with "
- "operand code 'Z'");
+ "operand code '%c'", code);
  return;
}
 
- output_operand_lossage ("invalid operand size for operand code 'Z'");
+ output_operand_lossage ("invalid operand size for operand code '%c'",
+ code);
  return;
 
case 'd':
--- gcc/testsuite/gcc.target/i386/pr109458.c.jj 2023-04-10 10:30:44.950822263 
+0200
+++ gcc/testsuite/gcc.target/i386/pr109458.c2023-04-10 10:30:22.257153906 
+0200
@@ -0,0 +1,13 @@
+/* PR target/109458 */
+/* { dg-do compile } */
+/* { dg-options "-msse2" } */
+
+void
+foo (_Float16 x)
+{
+  asm volatile ("# %z0" : : "i" (42)); /* { dg-error "invalid 'asm': invalid 
operand type used with operand code 'z'" } */
+  asm volatile ("# %Z0" : : "i" (42)); /* { dg-error "invalid 'asm': invalid 
operand type used with operand code 'Z'" } */
+  asm volatile ("# %z0" : : "x" (x));  /* { dg-error "invalid 'asm': invalid 
operand size for operand code 'z'" } */
+   /* { dg-warning "non-integer operand 
used with operand code 'z'" "" { target *-*-* } .-1 } */
+  asm volatile ("# %Z0" : : "x" (x));  /* { dg-error "invalid 'asm': invalid 
operand size for operand code 'Z'" } */
+}

Jakub



Re: [RFC PATCH] range-op-float: Fix up op1_op2_relation of comparisons

2023-04-12 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 12, 2023 at 12:33:39PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Apr 11, 2023 at 04:58:19PM -0400, Andrew MacLeod wrote:
> > This bootstraps and has no regressions, and is fine by me if you want to use
> > it.
> 
> Thanks, looks nice.
> My incremental patch on top of that would then be below.
> 
> Though,
> FAIL: gcc.dg/tree-ssa/vrp-float-6.c scan-tree-dump-times evrp "Folding 
> predicate x_.* <= y_.* to 1" 1
> still FAILs with those 2 patches together.
> Shall we just xfail it for now and find some solution for GCC 14?

Except that testing of this patch on top of your patch has shown
+FAIL: gfortran.dg/maxlocval_4.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (internal compiler error: in type, at 
value-range.h:1157)
+FAIL: gfortran.dg/maxlocval_4.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
+UNRESOLVED: gfortran.dg/maxlocval_4.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to 
produce executable
+FAIL: gfortran.dg/maxlocval_4.f90   -O3 -g  (internal compiler error: in type, 
at value-range.h:1157)
+FAIL: gfortran.dg/maxlocval_4.f90   -O3 -g  (test for excess errors)
+UNRESOLVED: gfortran.dg/maxlocval_4.f90   -O3 -g  compilation failed to 
produce executable
+FAIL: gfortran.dg/minlocval_1.f90   -Os  (internal compiler error: in type, at 
value-range.h:1157)
+FAIL: gfortran.dg/minlocval_1.f90   -Os  (test for excess errors)
+UNRESOLVED: gfortran.dg/minlocval_1.f90   -Os  compilation failed to produce 
executable
+FAIL: gfortran.dg/minlocval_4.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (internal compiler error: in type, at 
value-range.h:1157)
+FAIL: gfortran.dg/minlocval_4.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
+UNRESOLVED: gfortran.dg/minlocval_4.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to 
produce executable
+FAIL: gfortran.dg/minlocval_4.f90   -O3 -g  (internal compiler error: in type, 
at value-range.h:1157)
+FAIL: gfortran.dg/minlocval_4.f90   -O3 -g  (test for excess errors)
+UNRESOLVED: gfortran.dg/minlocval_4.f90   -O3 -g  compilation failed to 
produce executable
+FAIL: gfortran.dg/pr104466.f90   -O  (internal compiler error: in type, at 
value-range.h:1157)
+FAIL: gfortran.dg/pr104466.f90   -O  (test for excess errors)
+FAIL: gfortran.dg/pr79315.f90   -O  (internal compiler error: in type, at 
value-range.h:1157)
+FAIL: gfortran.dg/pr79315.f90   -O  (test for excess errors)
+FAIL: gfortran.dg/pr81175.f   -O  (internal compiler error: in type, at 
value-range.h:1157)
+FAIL: gfortran.dg/pr81175.f   -O  (test for excess errors)
+FAIL: gfortran.dg/graphite/id-11.f   -O  (internal compiler error: in type, at 
value-range.h:1157)
+FAIL: gfortran.dg/graphite/id-11.f   -O  (test for excess errors)
regressions; I missed the fact that frange::type ICEs on undefined_p values.
It seems the HONOR_NANS case was there just as a bad attempt to speed the test
up; we can just test known_isnan || maybe_isnan on both operands and that is
it.
With this version all those tests pass again:

2023-04-12  Jakub Jelinek  

* range-op-float.cc (foperator_lt::op1_op2_relation): Return
VREL_VARYING instead of VREL_GE if op1 or op2 could be NANs.
(foperator_le::op1_op2_relation): Similarly return VREL_VARYING
instead of VREL_GT.
(foperator_gt::op1_op2_relation): Similarly return VREL_VARYING
instead of VREL_LE.
(foperator_ge::op1_op2_relation): Similarly return VREL_VARYING
instead of VREL_LT.
(foperator_unordered_lt::op1_op2_relation,
foperator_unordered_le::op1_op2_relation,
foperator_unordered_gt::op1_op2_relation,
foperator_unordered_ge::op1_op2_relation): New.

--- gcc/range-op-float.cc.jj2023-04-12 12:17:44.784962757 +0200
+++ gcc/range-op-float.cc   2023-04-12 16:07:54.948759355 +0200
@@ -835,10 +835,17 @@ public:
   bool fold_range (irange , tree type,
   const frange , const frange ,
   relation_trio = TRIO_VARYING) const final override;
-  relation_kind op1_op2_relation (const irange , const frange &,
- const frange &) const final override
+  relation_kind op1_op2_relation (const irange , const frange ,
+ const frange ) const final override
   {
-return lt_op1_op2_relation (lhs);
+relation_kind ret = lt_op1_op2_relation (lhs);
+if (ret == VREL_GE
+   && (op1.known_isnan ()
+   || op1.maybe_isnan ()
+   || op2.known_isnan ()
+   || op2.maybe_isnan ()))
+  ret = VREL_VARYING; // Inverse of VREL_LT is VREL_UNGE with NAN ops.
+return ret;
   }
   bool op1_range (frange , tree type,
  

Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread 钟居哲
>> It's not so much that we need to do that.  But normally it's only worth
>> adding internal functions if they do something that is too complicated
>> to express in simple gimple arithmetic.  The UQDEC case I mentioned:

>>z = MAX (x, y) - y

>> fell into the "simple arithmetic" category for me.  We could have added
>> an ifn for unsigned saturating decrement, but it didn't seem complicated
>> enough to merit its own ifn.

Ah, I know your concern. I should admit that WHILE_LEN is a simple arithmetic
operation which just takes its result from

min (remain, vf).

The possible solution is to just use MIN_EXPR (remain, vf),
then add special handling in the umin_optab pattern to recognize "vf" in the
backend,
and finally generate vsetvl in the RISC-V backend.

The "vf" would be recognized when the operand of umin is a
const_int/const_poly_int operand.
Otherwise, just generate a scalar umin instruction.

However, there is a case where I can't tell whether umin should generate vsetvl
or a scalar umin. It is the following case:
int32_t foo (int32_t a)
{
  return min (a, 4);
}

In this case I should generate:
li a1,4
umin a1,a0,a1

instead of generating vsetvl

However, in this case:

void foo (int32_t *a...)
for (int i = 0; i < n; i++)
  a[i] = b[i] + c[i];

with -mriscv-vector-bits=128 (which means each vector can handle 4 INT32)
Then the VF will be 4 too. If we also use MIN_EXPR instead of WHILE_LEN:

...
len = MIN_EXPR (n,4)
v = len_load (len)

...

In this case, MIN_EXPR should emit vsetvl.

It's hard for me to tell the difference between these 2 cases...
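
For reference, a scalar C sketch of the length-controlled loop that the MIN_EXPR flow above describes (the names `add_arrays` and `VF` are illustrative only; real codegen would use len_load/len_store or vsetvl rather than the inner scalar loop):

```c
#include <stdint.h>
#include <stddef.h>

#define VF 4  /* e.g. -mriscv-vector-bits=128 => 4 int32 elements per vector */

/* Scalar model of the length-controlled loop: each iteration processes
   len = MIN (remain, VF) elements, so no scalar tail loop is needed.  */
void
add_arrays (int32_t *a, const int32_t *b, const int32_t *c, size_t n)
{
  for (size_t i = 0; i < n;)
    {
      size_t remain = n - i;
      size_t len = remain < VF ? remain : VF;   /* MIN_EXPR (remain, vf) */
      for (size_t j = 0; j < len; j++)          /* models len_load/len_store */
        a[i + j] = b[i + j] + c[i + j];
      i += len;
    }
}
```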

CC RISC-V port backend maintainer: Kito.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-04-12 20:24
To: juzhe.zhong\@rivai.ai
CC: rguenther; gcc-patches; jeffreyalaw; rdapp; linkw
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
"juzhe.zh...@rivai.ai"  writes:
>>> I think that already works for them (could be misremembering).
>>> However, IIUC, they have no special instruction to calculate the
>>> length (unlike for RVV), and so it's open-coded using vect_get_len.
>
> Yeah, the current flow using min, sub, and then min in vect_get_len
> is working for IBM. But I wonder whether switching the current flow of
> length-loop-control to the WHILE_LEN pattern in this patch can improve
> their performance.
>
>>> (1) How easy would it be to express WHILE_LEN in normal gimple?
>>> I haven't thought about this at all, so the answer might be
>>> "very hard".  But it reminds me a little of UQDEC on AArch64,
>>> which we open-code using MAX_EXPR and MINUS_EXPR (see
>>> vect_set_loop_controls_directly).
>
>>> I'm not saying WHILE_LEN is the same operation, just that it seems
>>> like it might be open-codeable in a similar way.
>
>>> Even if we can open-code it, we'd still need some way for the
>>> target to select the "RVV way" from the "s390/PowerPC way".
>
> WHILE_LEN in doc I define is
> operand0 = MIN (operand1, operand2)operand1 is the residual number of scalar 
> elements need to be updated.operand2 is vectorization factor (vf) for single 
> rgroup. if multiple rgroup operan2 = vf * nitems_per_ctrl.You mean 
> such pattern is not well expressed so we need to replace it with normaltree 
> code (MIN OR MAX). And let RISC-V backend to optimize them into vsetvl 
> ?Sorry, maybe I am not on the same page.
 
It's not so much that we need to do that.  But normally it's only worth
adding internal functions if they do something that is too complicated
to express in simple gimple arithmetic.  The UQDEC case I mentioned:
 
   z = MAX (x, y) - y
 
fell into the "simple arithmetic" category for me.  We could have added
an ifn for unsigned saturating decrement, but it didn't seem complicated
enough to merit its own ifn.
 
>>> (2) What effect does using a variable IV step (the result of
>>> the WHILE_LEN) have on ivopts?  I remember experimenting with
>>> something similar once (can't remember the context) and not
>>> having a constant step prevented ivopts from making good
>>> addresing-mode choices.
>
> Thank you so much for pointing this out. Currently, a variable IV step and
> decreasing n down to 0
> work fine for the RISC-V downstream GCC, and we didn't find issues related to
> addressing-mode choices.
 
OK, that's good.  Sounds like it isn't a problem then.
 
> I think I must have missed something; would you mind giving me some hints so that
> I can study ivopts
> to find out which cases may generate inferior codegen for a variable IV step?
 
I think AArch64 was sensitive to this because (a) the vectoriser creates
separate IVs for each base address and (b) for SVE, we instead want
invariant base addresses that are indexed by the loop control IV.
Like Richard says, if the loop control IV isn't a SCEV, ivopts isn't
able to use it and so (b) fails.
 
Thanks,
Richard
 


Re: [PATCH] combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-12 Thread Jeff Law via Gcc-patches




On 4/12/23 04:02, Jakub Jelinek wrote:

Hi!

On Wed, Apr 12, 2023 at 08:21:26AM +0200, Jakub Jelinek via Gcc-patches wrote:

I would have expected something like
WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
as the condition to use word_mode, rather than just
WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
used as is, not a narrower word_mode instead of them.


In patch form that would be the following (given that the combine.cc change
has a scalar_int_mode mode, we can just as well use a normal comparison, and
simplify-rtx.cc has it guarded on HWI_COMPUTABLE_MODE_P, which is also only
true for scalar int modes).

I've tried the pr108947.c testcase, but I see no differences in the assembly
before/after the patch (but dunno if I'm using the right options).
The pr109040.c testcase from the patch I don't see the expected zero
extension without the patch and do see it with it.

As before, I can only test this easily on non-WORD_REGISTER_OPERATIONS
targets.

2023-04-12  Jeff Law  
Jakub Jelinek  

PR target/109040
* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
smaller than word_mode.
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
: Likewise.

* gcc.c-torture/execute/pr109040.c: New test.
Looks pretty sensible.  It'll take most of the day, but I'll do a 
bootstrap and regression test with this variant.


jeff


[PATCH] RISC-V: Fix pr109479 RVV ISA inconsistency bug

2023-04-12 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Fix the supported data types according to the RVV ISA.
For vint64m*_t, we should only allow them with zve64* instead of zve32*_zvl64b
(>=64b).
Ideally, we should make the error message more friendly, like Clang
(https://godbolt.org/z/f9GMv4dMo), which reports the extension name the RVV type requires.
However, I failed to find a way to do that, so current GCC can only report an
"unknown" type.
And I added comments to remind us to do this in the future.

PR 109479

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def (vint8mf8_t): Fix 
predicate.
(vint16mf4_t): Ditto.
(vint32mf2_t): Ditto.
(vint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint8mf8_t): Ditto.
(vuint16mf4_t): Ditto.
(vuint32mf2_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vbool64_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (register_builtin_type): Add 
comments.
(register_vector_type): Ditto.
(check_required_extensions): Fix condition.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ZVE64): Remove it.
(RVV_REQUIRE_ELEN_64): New define.
(RVV_REQUIRE_MIN_VLEN_64): Ditto.
* config/riscv/riscv-vector-switch.def (TARGET_VECTOR_FP32): Remove it.
(TARGET_VECTOR_FP64): Ditto.
(ENTRY): Fix predicate.
* config/riscv/vector-iterators.md: Fix predicate.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr109479-1.c: New test.
* gcc.target/riscv/rvv/base/pr109479-2.c: New test.
* gcc.target/riscv/rvv/base/pr109479-3.c: New test.
* gcc.target/riscv/rvv/base/pr109479-4.c: New test.
* gcc.target/riscv/rvv/base/pr109479-5.c: New test.
* gcc.target/riscv/rvv/base/pr109479-6.c: New test.

---
 .../riscv/riscv-vector-builtins-types.def | 348 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  14 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +-
 gcc/config/riscv/riscv-vector-switch.def  |  56 ++-
 gcc/config/riscv/vector-iterators.md  |  68 ++--
 .../gcc.target/riscv/rvv/base/pr109479-1.c|  13 +
 .../gcc.target/riscv/rvv/base/pr109479-2.c|  13 +
 .../gcc.target/riscv/rvv/base/pr109479-3.c|  20 +
 .../gcc.target/riscv/rvv/base/pr109479-4.c|  20 +
 .../gcc.target/riscv/rvv/base/pr109479-5.c|  20 +
 .../gcc.target/riscv/rvv/base/pr109479-6.c|  20 +
 11 files changed, 349 insertions(+), 246 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479-6.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index a55d494f1d9..a74df066521 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -235,53 +235,53 @@ along with GCC; see the file COPYING3. If not see
 #define DEF_RVV_LMUL4_OPS(TYPE, REQUIRE)
 #endif
 
-DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_I_OPS (vint8mf4_t, 0)
 DEF_RVV_I_OPS (vint8mf2_t, 0)
 DEF_RVV_I_OPS (vint8m1_t, 0)
 DEF_RVV_I_OPS (vint8m2_t, 0)
 DEF_RVV_I_OPS (vint8m4_t, 0)
 DEF_RVV_I_OPS (vint8m8_t, 0)
-DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_I_OPS (vint16mf2_t, 0)
 DEF_RVV_I_OPS (vint16m1_t, 0)
 DEF_RVV_I_OPS (vint16m2_t, 0)
 DEF_RVV_I_OPS (vint16m4_t, 0)
 DEF_RVV_I_OPS (vint16m8_t, 0)
-DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_I_OPS (vint32m1_t, 0)
 DEF_RVV_I_OPS (vint32m2_t, 0)
 DEF_RVV_I_OPS (vint32m4_t, 0)
 DEF_RVV_I_OPS (vint32m8_t, 0)
-DEF_RVV_I_OPS (vint64m1_t, RVV_REQUIRE_ZVE64)
-DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ZVE64)
-DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ZVE64)
-DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
 
-DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_U_OPS (vuint8mf4_t, 0)
 DEF_RVV_U_OPS (vuint8mf2_t, 0)
 DEF_RVV_U_OPS (vuint8m1_t, 0)
 DEF_RVV_U_OPS (vuint8m2_t, 0)
 DEF_RVV_U_OPS (vuint8m4_t, 0)
 DEF_RVV_U_OPS (vuint8m8_t, 0)
-DEF_RVV_U_OPS (vuint16mf4_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_U_OPS (vuint16mf4_t, 

[PATCH] libstdc++: Ensure headers used by fast_float are included

2023-04-12 Thread Patrick Palka via Gcc-patches
This makes floating_from_chars.cc explicitly include all headers
that are used by the original fast_float amalgamation according to
r12-6647-gf5c8b82512f9d3, except:

  1.  since fast_float doesn't seem to use anything from it
  2.  since fast_float doesn't seem to use anything directly
 from it (as opposed to from )
  3.  since std::errc is naturally already available
 from 

This avoids potential build failures on platforms for which some
required headers (namely ) end up not getting transitively
included from elsewhere.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/12?

libstdc++-v3/ChangeLog:

* src/c++17/floating_from_chars.cc: Include ,
,  and .
---
 libstdc++-v3/src/c++17/floating_from_chars.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
b/libstdc++-v3/src/c++17/floating_from_chars.cc
index 5d18ca32dbb..3a411cf546a 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -30,14 +30,18 @@
 // Prefer to use std::pmr::string if possible, which requires the cxx11 ABI.
 #define _GLIBCXX_USE_CXX11_ABI 1
 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.40.0.315.g0607f793cb



[PATCH] mingw: Support building with older gcc versions

2023-04-12 Thread Costas Argyris via Gcc-patches
This is proposed to fix PR109460 where an older version of
gcc (7.3) was used to build for windows (mingw) host.
From e5b608072f80a83cca65e88bb75ecc62ab0bbb87 Mon Sep 17 00:00:00 2001
From: Costas Argyris 
Date: Wed, 12 Apr 2023 08:48:18 +0100
Subject: [PATCH] mingw: Support building with older gcc versions

The $@ argument to the compiler is causing
only a warning in some gcc versions but an
error in others. In any case, $@ was never
necessary so remove it completely, just like
the rules in x-mingw32 where the object file
gets named after the source file.

This fixes both warnings and errors about
sym-mingw32.o appearing in the command line
unnecessarily.

The -nostdlib flag is required along with -r
for older gcc versions that don't apply it
automatically with -r; without it, main
functions erroneously enter the partial link.
---
 gcc/config/i386/x-mingw32-utf8 | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x-mingw32-utf8 b/gcc/config/i386/x-mingw32-utf8
index 2783dd259a6..b5a6cfcf702 100644
--- a/gcc/config/i386/x-mingw32-utf8
+++ b/gcc/config/i386/x-mingw32-utf8
@@ -34,7 +34,7 @@ utf8rc-mingw32.o : $(srcdir)/config/i386/utf8-mingw32.rc \
 # Create an object file that just exports the global symbol
 # HOST_EXTRA_OBJS_SYMBOL
 sym-mingw32.o : $(srcdir)/config/i386/sym-mingw32.cc
-	$(COMPILER) -c $< $@
+	$(COMPILER) -c $<
 
 # Combine the two object files into one which has both the
 # compiled utf8 resource and the HOST_EXTRA_OBJS_SYMBOL symbol.
@@ -44,8 +44,10 @@ sym-mingw32.o : $(srcdir)/config/i386/sym-mingw32.cc
 # If nothing references it into libbackend.a, it will not
 # get linked into the compiler proper eventually.
 # Therefore we need to request the symbol at compiler link time.
+# -nostdlib is required for supporting old gcc versions that
+# don't apply it automatically with -r.
 utf8-mingw32.o : utf8rc-mingw32.o sym-mingw32.o
-	$(COMPILER) -r utf8rc-mingw32.o sym-mingw32.o -o $@
+	$(COMPILER) -r -nostdlib utf8rc-mingw32.o sym-mingw32.o -o $@
 
 # Force compilers to link against the utf8 resource by
 # requiring the symbol to be defined.
-- 
2.30.2



Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-12 Thread Jeff Law via Gcc-patches




On 4/12/23 00:21, Jakub Jelinek wrote:

On Tue, Apr 11, 2023 at 07:26:07PM -0600, Jeff Law wrote:

I did bootstrap on riscv, but not a regression test, that's spinning right
now.

Jeff



diff --git a/gcc/combine.cc b/gcc/combine.cc
index 22bf8e1ec89..c41d8a09b3b 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10055,9 +10055,10 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx 
varop,
  
/* See what bits may be nonzero in VAROP.  Unlike the general case of

   a call to nonzero_bits, here we don't care about bits outside
- MODE.  */
+ MODE unless WORD_REGISTER_OPERATIONS is true.  */


I would have expected something like
WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
as the condition to use word_mode, rather than just
WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
used as is, not a narrower word_mode instead of them.

Agreed.

Jeff


Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread 钟居哲
Thanks Kewen. 

It seems that this proposal WHILE_LEN can help s390 when using --param 
vect-partial-vector-usage=2 compile option.

Would you mind apply this patch && support WHILE_LEN in s390 backend and test 
it to see the overal benefits for s390
as well as the correctness of this sequence ? 
If it may create some correctness issue for s390 or rs6000 (I saw 
len_load/len_store in rs6000 too), I can fix this patch for you.

I hope both RVV and IBM targets can gain benefits from this patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-04-12 20:56
To: juzhe.zh...@rivai.ai; richard.sandiford; rguenther
CC: gcc-patches; jeffreyalaw; rdapp
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
Hi!
 
on 2023/4/12 19:37, juzhe.zh...@rivai.ai wrote:
> 
> Thank you. Richard.
> 
> 
>>> I think that already works for them (could be misremembering).
>>> However, IIUC, they have no special instruction to calculate the
>>> length (unlike for RVV), and so it's open-coded using vect_get_len.
> 
 
Yeah, Richard is right, we don't have some special hardware instruction
for efficient length calculation.
 
> Yeah, the current flow using min, sub, and then min in vect_get_len
> is working for IBM. But I wonder whether switching the current flow of
> length-loop-control into the WHILE_LEN pattern that this patch can improve
> their performance.
 
Based on some cons of vector load/store with length in bytes on Power
(e.g. we need one extra GPR to hold the length, and the length needs to be
in the most significant 8 bits, requiring an extra shift), we use normal
vector load/store in the main loop and only use vector load/store with length
for the epilogue.  For the epilogue, the remaining length is known to be less
than the whole vector length, so the related sequence can be optimized.
I just had a check on s390 code, which also enables it only for the
epilogue.  From this perspective, this WHILE_LEN proposal may not give us
more.  But for the case of vect-partial-vector-usage=2 (fully adopting
vector with length on the main loop), I think the proposed sequence looks
better to me.
 
BR,
Kewen
 


[PATCH V5] Testsuite: Fix a redefinition bug for the fd-4.c

2023-04-12 Thread shiyulong
From: yulong 

This patch fixes a redefinition bug.
There is a definition of mode_t in fd-4.c, but it duplicates the definition in
types.h, which is included by stdio.h.
Thanks to Jeff Law for reviewing the previous version.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/fd-4.c: Delete the definition of mode_t.

---
 gcc/testsuite/gcc.dg/analyzer/fd-4.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-4.c 
b/gcc/testsuite/gcc.dg/analyzer/fd-4.c
index 994bad84342..9ec015679e9 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-4.c
@@ -13,11 +13,6 @@ int read (int fd, void *buf, int nbytes);
 #define O_WRONLY 1
 #define O_RDWR 2
 
-typedef enum {
-  S_IRWXU
-  // etc
-} mode_t;
-
 int creat (const char *, mode_t mode);
 
 void
-- 
2.25.1



Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-12 Thread Andrew MacLeod via Gcc-patches



On 4/12/23 04:20, Jakub Jelinek wrote:

On Tue, Apr 11, 2023 at 07:52:29PM -0400, Andrew MacLeod wrote:

This bootstraps on x86_64-pc-linux-gnu  with that single regression, which I
have XFAILed for now.  OK for trunk?

Yes.


    Once Jakub verifies it actually fixes
the execution problem; we have no executable test yet.

I have verified this fix both on the original clang testcase, and
on a self-contained testcase I've reduced overnight and this morning.

Ok to commit it to trunk incrementally after your commit?



Sure. I just pushed it.

Andrew




Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread Kewen.Lin via Gcc-patches
Hi!

on 2023/4/12 19:37, juzhe.zh...@rivai.ai wrote:
> 
> Thank you. Richard.
> 
> 
>>> I think that already works for them (could be misremembering).
>>> However, IIUC, they have no special instruction to calculate the
>>> length (unlike for RVV), and so it's open-coded using vect_get_len.
> 

Yeah, Richard is right, we don't have some special hardware instruction
for efficient length calculation.

> Yeah, the current flow using min, sub, and then min in vect_get_len
> is working for IBM. But I wonder whether switching the current flow of
> length-loop-control into the WHILE_LEN pattern that this patch can improve
> their performance.

Based on some cons of vector load/store with length in bytes on Power
(e.g. we need one extra GPR to hold the length, and the length needs to be
in the most significant 8 bits, requiring an extra shift), we use normal
vector load/store in the main loop and only use vector load/store with length
for the epilogue.  For the epilogue, the remaining length is known to be less
than the whole vector length, so the related sequence can be optimized.
I just had a check on s390 code, which also enables it only for the
epilogue.  From this perspective, this WHILE_LEN proposal may not give us
more.  But for the case of vect-partial-vector-usage=2 (fully adopting
vector with length on the main loop), I think the proposed sequence looks
better to me.

BR,
Kewen


Re: [PATCH] RISC-V: Fix PR109479

2023-04-12 Thread Kito Cheng via Gcc-patches
OK for trunk, but please improve the coverage of the testcase; e.g.
vint16mf4_t has been fixed too but is not tested in the testcase.

On Wed, Apr 12, 2023 at 7:09 PM  wrote:
>
> From: Ju-Zhe Zhong 
>
> Fix supporting data type according to RVV ISA.
> For vint64m*_t, we should only allow them in zve64* instead of zve32*_zvl64b 
> (>=64b).
> Ideally, we should make error message more friendly like Clang.
> https://godbolt.org/z/f9GMv4dMo to report the RVV type require extenstion 
> name.
> However, I failed to find a way to do that. So current GCC can only report 
> "unknown" type.
> And I added comments to remind us doing this in the future.
>
> Tested and Regression all passed. OK for GCC-13 trunk ?
>
> PR 109479
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-types.def (vint8mf8_t): Fix 
> predicate.
> (vint16mf4_t): Ditto.
> (vint32mf2_t): Ditto.
> (vint64m1_t): Ditto.
> (vint64m2_t): Ditto.
> (vint64m4_t): Ditto.
> (vint64m8_t): Ditto.
> (vuint8mf8_t): Ditto.
> (vuint16mf4_t): Ditto.
> (vuint32mf2_t): Ditto.
> (vuint64m1_t): Ditto.
> (vuint64m2_t): Ditto.
> (vuint64m4_t): Ditto.
> (vuint64m8_t): Ditto.
> (vfloat32mf2_t): Ditto.
> (vbool64_t): Ditto.
> * config/riscv/riscv-vector-builtins.cc (register_builtin_type): Add 
> comments.
> (register_vector_type): Ditto.
> (check_required_extensions): Fix predicate.
> * config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ZVE64): Ditto.
> (RVV_REQUIRE_ELEN_64): Ditto.
> (RVV_REQUIRE_MIN_VLEN_64): Ditto.
> * config/riscv/riscv-vector-switch.def (TARGET_VECTOR_FP32): Remove 
> it.
> (TARGET_VECTOR_FP64): Ditto.
> (ENTRY): Fix predicate.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr109479.c: New test.
>
> ---
>  .../riscv/riscv-vector-builtins-types.def | 348 +-
>  gcc/config/riscv/riscv-vector-builtins.cc |  14 +-
>  gcc/config/riscv/riscv-vector-builtins.h  |   3 +-
>  gcc/config/riscv/riscv-vector-switch.def  |  56 ++-
>  .../gcc.target/riscv/rvv/base/pr109479.c  |  11 +
>  5 files changed, 220 insertions(+), 212 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
> b/gcc/config/riscv/riscv-vector-builtins-types.def
> index a55d494f1d9..a74df066521 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-types.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-types.def
> @@ -235,53 +235,53 @@ along with GCC; see the file COPYING3. If not see
>  #define DEF_RVV_LMUL4_OPS(TYPE, REQUIRE)
>  #endif
>
> -DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_I_OPS (vint8mf4_t, 0)
>  DEF_RVV_I_OPS (vint8mf2_t, 0)
>  DEF_RVV_I_OPS (vint8m1_t, 0)
>  DEF_RVV_I_OPS (vint8m2_t, 0)
>  DEF_RVV_I_OPS (vint8m4_t, 0)
>  DEF_RVV_I_OPS (vint8m8_t, 0)
> -DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_I_OPS (vint16mf2_t, 0)
>  DEF_RVV_I_OPS (vint16m1_t, 0)
>  DEF_RVV_I_OPS (vint16m2_t, 0)
>  DEF_RVV_I_OPS (vint16m4_t, 0)
>  DEF_RVV_I_OPS (vint16m8_t, 0)
> -DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_I_OPS (vint32m1_t, 0)
>  DEF_RVV_I_OPS (vint32m2_t, 0)
>  DEF_RVV_I_OPS (vint32m4_t, 0)
>  DEF_RVV_I_OPS (vint32m8_t, 0)
> -DEF_RVV_I_OPS (vint64m1_t, RVV_REQUIRE_ZVE64)
> -DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ZVE64)
> -DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ZVE64)
> -DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
>
> -DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_U_OPS (vuint8mf4_t, 0)
>  DEF_RVV_U_OPS (vuint8mf2_t, 0)
>  DEF_RVV_U_OPS (vuint8m1_t, 0)
>  DEF_RVV_U_OPS (vuint8m2_t, 0)
>  DEF_RVV_U_OPS (vuint8m4_t, 0)
>  DEF_RVV_U_OPS (vuint8m8_t, 0)
> -DEF_RVV_U_OPS (vuint16mf4_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_U_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_U_OPS (vuint16mf2_t, 0)
>  DEF_RVV_U_OPS (vuint16m1_t, 0)
>  DEF_RVV_U_OPS (vuint16m2_t, 0)
>  DEF_RVV_U_OPS (vuint16m4_t, 0)
>  DEF_RVV_U_OPS (vuint16m8_t, 0)
> -DEF_RVV_U_OPS (vuint32mf2_t, RVV_REQUIRE_ZVE64)
> +DEF_RVV_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_U_OPS (vuint32m1_t, 0)
>  DEF_RVV_U_OPS (vuint32m2_t, 0)
>  DEF_RVV_U_OPS (vuint32m4_t, 0)
>  DEF_RVV_U_OPS (vuint32m8_t, 0)
> -DEF_RVV_U_OPS (vuint64m1_t, RVV_REQUIRE_ZVE64)
> -DEF_RVV_U_OPS (vuint64m2_t, RVV_REQUIRE_ZVE64)
> -DEF_RVV_U_OPS (vuint64m4_t, RVV_REQUIRE_ZVE64)
> -DEF_RVV_U_OPS (vuint64m8_t, RVV_REQUIRE_ZVE64)
> 

Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-12 Thread Kewen.Lin via Gcc-patches
Hi Segher & Jeff,

on 2023/4/11 23:13, Segher Boessenkool wrote:
> On Tue, Apr 11, 2023 at 05:40:09PM +0800, Kewen.Lin wrote:
>> on 2023/4/11 17:14, guojiufu wrote:
>>> Thanks for raising this concern.
>>> The behavior to check about bif on FLOAT128_HW and emit an error message for
>>> requirements on quad-precision is added in gcc12. This is why gcc12 fails to
>>> compile the case on -m32.
>>>
>>> Before gcc12, altivec_resolve_overloaded_builtin will return the overloaded
>>> result directly, and does not check more about the result function.
>>
>> Thanks for checking, I wonder which commit caused this behavior change and 
>> what's
>> the underlying justification?  I know there is one new bif handling framework

I answered this question myself after some digging: test case
float128-cmp2-runnable.c started to fail from r12-5752-gd08236359eb229, which
is exactly where the new bif framework starts to take effect, and the reason
the behavior changes is the condition change from **TARGET_P9_VECTOR** to
**TARGET_FLOAT128_HW**.

With r12-5751-gc9dd01314d8467 (still old bif framework):

$ grep -r scalar_cmp_exp_qp gcc/config/rs6000/rs6000-builtin.def
BU_P9V_VSX_2 (VSCEQPGT, "scalar_cmp_exp_qp_gt", CONST,  xscmpexpqp_gt_kf)
BU_P9V_VSX_2 (VSCEQPLT, "scalar_cmp_exp_qp_lt", CONST,  xscmpexpqp_lt_kf)
BU_P9V_VSX_2 (VSCEQPEQ, "scalar_cmp_exp_qp_eq", CONST,  xscmpexpqp_eq_kf)
BU_P9V_VSX_2 (VSCEQPUO, "scalar_cmp_exp_qp_unordered",  CONST,  
xscmpexpqp_unordered_kf)
BU_P9V_OVERLOAD_2 (VSCEQPGT,"scalar_cmp_exp_qp_gt")
BU_P9V_OVERLOAD_2 (VSCEQPLT,"scalar_cmp_exp_qp_lt")
BU_P9V_OVERLOAD_2 (VSCEQPEQ,"scalar_cmp_exp_qp_eq")
BU_P9V_OVERLOAD_2 (VSCEQPUO,"scalar_cmp_exp_qp_unordered")

There were only 13 bifs requiring TARGET_FLOAT128_HW in old bif framework.

$ grep ^BU_FLOAT128_HW gcc/config/rs6000/rs6000-builtin.def
BU_FLOAT128_HW_VSX_1 (VSEEQP,   "scalar_extract_expq",  CONST,  xsxexpqp_kf)
BU_FLOAT128_HW_VSX_1 (VSESQP,   "scalar_extract_sigq",  CONST,  xsxsigqp_kf)
BU_FLOAT128_HW_VSX_1 (VSTDCNQP, "scalar_test_neg_qp",   CONST,  xststdcnegqp_kf)
BU_FLOAT128_HW_VSX_2 (VSIEQP,   "scalar_insert_exp_q",  CONST,  xsiexpqp_kf)
BU_FLOAT128_HW_VSX_2 (VSIEQPF,  "scalar_insert_exp_qp", CONST,  xsiexpqpf_kf)
BU_FLOAT128_HW_VSX_2 (VSTDCQP, "scalar_test_data_class_qp", CONST,  
xststdcqp_kf)
BU_FLOAT128_HW_1 (SQRTF128_ODD,  "sqrtf128_round_to_odd",  FP, sqrtkf2_odd)
BU_FLOAT128_HW_1 (TRUNCF128_ODD, "truncf128_round_to_odd", FP, trunckfdf2_odd)
BU_FLOAT128_HW_2 (ADDF128_ODD,   "addf128_round_to_odd",   FP, addkf3_odd)
BU_FLOAT128_HW_2 (SUBF128_ODD,   "subf128_round_to_odd",   FP, subkf3_odd)
BU_FLOAT128_HW_2 (MULF128_ODD,   "mulf128_round_to_odd",   FP, mulkf3_odd)
BU_FLOAT128_HW_2 (DIVF128_ODD,   "divf128_round_to_odd",   FP, divkf3_odd)
BU_FLOAT128_HW_3 (FMAF128_ODD,   "fmaf128_round_to_odd",   FP, fmakf4_odd)

Starting from r12-5752-gd08236359eb229, these
scalar_cmp_exp_qp_{gt,lt,eq,unordered} bifs were put under the stanza
ieee128-hw, which brings ieee128-hw to 17 bifs; compared to the previous 13,
the four extra ones are exactly these
scalar_cmp_exp_qp_{gt,lt,eq,unordered}.

>> introduced in gcc12, not sure the checking condition was changed together or 
>> by
>> a standalone commit.  Anyway, apparently the conditions for the support of 
>> these
>> bifs are different on gcc-11 and gcc-12, I wonder why it changed.  As 
>> mentioned
>> above, PR108758's c#1 said this case (bifs) work well on gcc-11, I suspected 
>> the
>> condition change was an overkill, that's why I asked.
> 
> It almost certainly was an oversight.  The new builtin framework changed
> so many things, there was bound to be some breakage to go with all the
> good things it brought.

Yeah, per the above findings, I also found that r12-3126-g2ed356a4c9af06
introduced the power9-related stanzas and r12-3167-g2f9489a1009d98 introduced
the ieee128-hw stanza including these four bifs.  Neither of them has any
notes on why we would change the condition for these
scalar_cmp_exp_qp_{gt,lt,eq,unordered} from power9-vector to ieee128-hw, so I
think it's just an oversight (ieee128-hw is an overkill compared to
power9-vector :)).

> 
> So what is the actual thing going wrong?  QP insns work fine and are
> valid on all systems and environments, BE or LE, 32-bit or 64-bit.  Of
> course you cannot use the "long double" type for those everywhere, but
> that is a very different thing.

The actual thing going wrong is that the test case float128-cmp2-runnable.c
runs well on BE -m32 and -m64 with gcc-11, but fails to compile on BE -m32
with the latest gcc-12 and trunk, with error messages like:

gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c: In function 'main':
gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c:155:3: error:
  '__builtin_vsx_scalar_cmp_exp_qp_eq' requires ISA 3.0 IEEE 128-bit floating 
point

This is because scalar_cmp_exp_qp_{gt,lt,eq,unordered} now requires
TARGET_FLOAT128_HW (since the new bif framework took effect).

(To be 

[committed] libstdc++: Initialize all members of basic_endpoint union [PR109482]

2023-04-12 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux and sparc-solaris2.11. Pushed to trunk.

-- >8 --

On Solaris the in_addr struct contains a union and value-initializing it
does not make the s_addr member active. This means we can't access that
member later during constant evaluation.

Make the constructors explicitly set every member that we might want to
read later in constexpr member functions. This means even the default
constructor can only be constexpr for C++20, because we can't change the
active member of a union in older standards.

libstdc++-v3/ChangeLog:

PR libstdc++/109482
* include/experimental/internet (basic_endpoint::basic_endpoint()):
Ensure that the required union members are active. Only define
as constexpr for C++20 and later.
(basic_endpoint::basic_endpoint(const protocol_type&, port_type)):
Likewise.
* testsuite/experimental/net/internet/endpoint/cons.cc: Only
check constexpr default constructor for C++20 and later.
* testsuite/experimental/net/internet/endpoint/extensible.cc:
Likewise.
---
 libstdc++-v3/include/experimental/internet| 22 ---
 .../net/internet/endpoint/cons.cc | 27 +--
 .../net/internet/endpoint/extensible.cc   |  4 +++
 3 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/libstdc++-v3/include/experimental/internet 
b/libstdc++-v3/include/experimental/internet
index eb23ae21cdc..1f63c61ce85 100644
--- a/libstdc++-v3/include/experimental/internet
+++ b/libstdc++-v3/include/experimental/internet
@@ -1512,9 +1512,14 @@ namespace ip
 
   // constructors:
 
-  constexpr
+  _GLIBCXX20_CONSTEXPR
   basic_endpoint() noexcept : _M_data()
-  { _M_data._M_v4.sin_family = protocol_type::v4().family(); }
+  {
+   _M_data._M_v4.sin_family = protocol_type::v4().family();
+   // If in_addr contains a union, make the correct member active:
+   if (std::__is_constant_evaluated())
+ std::_Construct(&_M_data._M_v4.sin_addr.s_addr);
+  }
 
   _GLIBCXX20_CONSTEXPR
   basic_endpoint(const protocol_type& __proto,
@@ -1523,19 +1528,25 @@ namespace ip
   {
if (__proto == protocol_type::v4())
  {
-   _M_data._M_v4.sin_family = __proto.family();
+   _M_data._M_v4.sin_family = protocol_type::v4().family();
_M_data._M_v4.sin_port = address_v4::_S_hton_16(__port_num);
+   if (std::__is_constant_evaluated())
+ std::_Construct(&_M_data._M_v4.sin_addr.s_addr);
  }
else if (__proto == protocol_type::v6())
  {
std::_Construct(&_M_data._M_v6);
_M_data._M_v6.sin6_family = __proto.family();
_M_data._M_v6.sin6_port = address_v4::_S_hton_16(__port_num);
+   _M_data._M_v6.sin6_scope_id = 0;
+   if (std::__is_constant_evaluated())
+ std::_Construct(&_M_data._M_v6.sin6_addr.s6_addr);
  }
else
  {
__glibcxx_assert(__proto == protocol_type::v4()
   || __proto == protocol_type::v6());
+
  }
   }
 
@@ -1548,13 +1559,16 @@ namespace ip
  {
_M_data._M_v4.sin_family = protocol_type::v4().family();
_M_data._M_v4.sin_port = address_v4::_S_hton_16(__port_num);
-   _M_data._M_v4.sin_addr.s_addr = __addr._M_v4._M_addr;
+   std::_Construct(&_M_data._M_v4.sin_addr.s_addr,
+   __addr._M_v4._M_addr);
  }
else
  {
std::_Construct(&_M_data._M_v6);
_M_data._M_v6.sin6_family = protocol_type::v6().family();
_M_data._M_v6.sin6_port = address_v4::_S_hton_16(__port_num);
+   if (std::__is_constant_evaluated())
+ std::_Construct(&_M_data._M_v6.sin6_addr.s6_addr);
uint8_t* __s6a = _M_data._M_v6.sin6_addr.s6_addr;
for (int __i = 0; __i < 16; ++__i)
  __s6a[__i] = __addr._M_v6._M_bytes[__i];
diff --git a/libstdc++-v3/testsuite/experimental/net/internet/endpoint/cons.cc 
b/libstdc++-v3/testsuite/experimental/net/internet/endpoint/cons.cc
index b4bef88b4a3..d54b0c9550b 100644
--- a/libstdc++-v3/testsuite/experimental/net/internet/endpoint/cons.cc
+++ b/libstdc++-v3/testsuite/experimental/net/internet/endpoint/cons.cc
@@ -7,7 +7,10 @@
 
 using namespace std::experimental::net;
 
-constexpr void
+#if __cplusplus >= 202002
+constexpr
+#endif
+void
 test_default()
 {
   ip::tcp::endpoint t1;
@@ -57,23 +60,19 @@ test_addr()
   VERIFY( t2.port() == 80 );
 }
 
-constexpr bool
-test_constexpr()
-{
-  test_default();
-#if __cplusplus >= 202002
-  // Non-default basic_endpoint constructors are only constexpr in C++20.
-  test_proto();
-  test_addr();
-#endif
-  return true;
-}
-
 int main()
 {
   test_default();
   test_proto();
   test_addr();
 
-  static_assert( test_constexpr(), "valid in constant expressions" );
+#if __cplusplus >= 202002
+  // basic_endpoint 

[committed] libstdc++: Update tzdata to 2023c

2023-04-12 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux and sparc-solaris2.11. Pushed to trunk.

-- >8 --

Import the new 2023c tzdata.zi file.

libstdc++-v3/ChangeLog:

* src/c++20/tzdata.zi: Import new file from 2023c release.
---
 libstdc++-v3/src/c++20/tzdata.zi | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/src/c++20/tzdata.zi b/libstdc++-v3/src/c++20/tzdata.zi
index 5f65edcd5cf..b522e395326 100644
--- a/libstdc++-v3/src/c++20/tzdata.zi
+++ b/libstdc++-v3/src/c++20/tzdata.zi
@@ -1,4 +1,4 @@
-# version 2023b
+# version 2023c
 # This zic input file is in the public domain.
 R d 1916 o - Jun 14 23s 1 S
 R d 1916 1919 - O Su>=1 23s 0 -
@@ -920,11 +920,9 @@ R l 1988 o - Jun 1 0 1 S
 R l 1989 o - May 10 0 1 S
 R l 1990 1992 - May 1 0 1 S
 R l 1992 o - O 4 0 0 -
-R l 1993 2022 - Mar lastSu 0 1 S
+R l 1993 ma - Mar lastSu 0 1 S
 R l 1993 1998 - S lastSu 0 0 -
 R l 1999 ma - O lastSu 0 0 -
-R l 2023 o - Ap 21 0 1 S
-R l 2024 ma - Mar lastSu 0 1 S
 Z Asia/Beirut 2:22 - LMT 1880
 2 l EE%sT
 R NB 1935 1941 - S 14 0 0:20 -
-- 
2.39.2



Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
>>> I think that already works for them (could be misremembering).
>>> However, IIUC, they have no special instruction to calculate the
>>> length (unlike for RVV), and so it's open-coded using vect_get_len.
>
> Yeah, the current flow using min, sub, and then min in vect_get_len
> is working for IBM. But I wonder whether switching the current flow of
> length-loop-control into the WHILE_LEN pattern that this patch can improve
> their performance.
>
>>> (1) How easy would it be to express WHILE_LEN in normal gimple?
>>> I haven't thought about this at all, so the answer might be
>>> "very hard".  But it reminds me a little of UQDEC on AArch64,
>>> which we open-code using MAX_EXPR and MINUS_EXPR (see
>  >>vect_set_loop_controls_directly).
>
>   >>   I'm not saying WHILE_LEN is the same operation, just that it seems
>   >>   like it might be open-codeable in a similar way.
>
>  >>Even if we can open-code it, we'd still need some way for the
>   >>   target to select the "RVV way" from the "s390/PowerPC way".
>
> WHILE_LEN in doc I define is:
> operand0 = MIN (operand1, operand2)
> operand1 is the residual number of scalar elements need to be updated.
> operand2 is vectorization factor (vf) for single rgroup. if multiple rgroup,
> operand2 = vf * nitems_per_ctrl.
> You mean such pattern is not well expressed so we need to replace it with
> normal tree code (MIN OR MAX), and let RISC-V backend to optimize them into
> vsetvl? Sorry, maybe I am not on the same page.

It's not so much that we need to do that.  But normally it's only worth
adding internal functions if they do something that is too complicated
to express in simple gimple arithmetic.  The UQDEC case I mentioned:

   z = MAX (x, y) - y

fell into the "simple arithmetic" category for me.  We could have added
an ifn for unsigned saturating decrement, but it didn't seem complicated
enough to merit its own ifn.

>>> (2) What effect does using a variable IV step (the result of
>>> the WHILE_LEN) have on ivopts?  I remember experimenting with
>>> something similar once (can't remember the context) and not
>>> having a constant step prevented ivopts from making good
>>> addresing-mode choices.
>
> Thank you so much for pointing this out.  Currently, a variable IV step and
> decreasing n down to 0
> work fine for RISC-V downstream GCC, and we didn't find issues related to
> addressing-mode choices.

OK, that's good.  Sounds like it isn't a problem then.

> I think I must have missed something; would you mind giving me some hints so
> that I can study ivopts
> to find out which cases may generate inferior codegen for a variable IV step?

I think AArch64 was sensitive to this because (a) the vectoriser creates
separate IVs for each base address and (b) for SVE, we instead want
invariant base addresses that are indexed by the loop control IV.
Like Richard says, if the loop control IV isn't a SCEV, ivopts isn't
able to use it and so (b) fails.

Thanks,
Richard


[GCC14 PATCH] LoongArch: Improve cpymemsi expansion [PR109465]

2023-04-12 Thread Xi Ruoyao via Gcc-patches
We'd been generating really bad block move sequences, which kernel developers
who tried __builtin_memcpy recently complained about.  To improve it:

1. Take the advantage of -mno-strict-align.  When it is set, set mode
   size to UNITS_PER_WORD regardless of the alignment.
2. Half the mode size when (block size) % (mode size) != 0, instead of
   falling back to ld.bu/st.b at once.
3. Limit the length of block move sequence considering the number of
   instructions, not the size of block.  When -mstrict-align is set and
   the block is not aligned, the old size limit for straight-line
   implementation (64 bytes) was definitely too large (we don't have 64
   registers anyway).

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for GCC 14?

gcc/ChangeLog:

PR target/109465
* config/loongarch/loongarch-protos.h
(loongarch_expand_block_move): Add a parameter as alignment RTX.
* config/loongarch/loongarch.h:
(LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER): Remove.
(LARCH_MAX_MOVE_BYTES_STRAIGHT): Remove.
(LARCH_MAX_MOVE_OPS_PER_LOOP_ITER): Define.
(LARCH_MAX_MOVE_OPS_STRAIGHT): Define.
(MOVE_RATIO): Use LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
* config/loongarch/loongarch.cc (loongarch_expand_block_move):
Take the alignment from the parameter, but set it to
UNITS_PER_WORD if !TARGET_STRICT_ALIGN.  Limit the length of
straight-line implementation with LARCH_MAX_MOVE_OPS_STRAIGHT
instead of LARCH_MAX_MOVE_BYTES_STRAIGHT.
(loongarch_block_move_straight): When there are left-over bytes,
half the mode size instead of falling back to byte mode at once.
(loongarch_block_move_loop): Limit the length of loop body with
LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
* config/loongarch/loongarch.md (cpymemsi): Pass the alignment
to loongarch_expand_block_move.

gcc/testsuite/ChangeLog:

PR target/109465
* gcc.target/loongarch/pr109465-1.c: New test.
* gcc.target/loongarch/pr109465-2.c: New test.
* gcc.target/loongarch/pr109465-3.c: New test.
---
 gcc/config/loongarch/loongarch-protos.h   |  2 +-
 gcc/config/loongarch/loongarch.cc | 87 ++-
 gcc/config/loongarch/loongarch.h  | 10 +--
 gcc/config/loongarch/loongarch.md |  3 +-
 .../gcc.target/loongarch/pr109465-1.c |  9 ++
 .../gcc.target/loongarch/pr109465-2.c |  9 ++
 .../gcc.target/loongarch/pr109465-3.c | 12 +++
 7 files changed, 83 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr109465-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr109465-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr109465-3.c

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index 83df489c7a5..b71b188507a 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -95,7 +95,7 @@ extern void loongarch_expand_conditional_trap (rtx);
 #endif
 extern void loongarch_set_return_address (rtx, rtx);
 extern bool loongarch_move_by_pieces_p (unsigned HOST_WIDE_INT, unsigned int);
-extern bool loongarch_expand_block_move (rtx, rtx, rtx);
+extern bool loongarch_expand_block_move (rtx, rtx, rtx, rtx);
 extern bool loongarch_do_optimize_block_move_p (void);
 
 extern bool loongarch_expand_ext_as_unaligned_load (rtx, rtx, HOST_WIDE_INT,
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index dfb731fca9d..06fc1cd0604 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4459,41 +4459,38 @@ loongarch_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
Assume that the areas do not overlap.  */
 
 static void
-loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length)
+loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length,
+  HOST_WIDE_INT delta)
 {
-  HOST_WIDE_INT offset, delta;
-  unsigned HOST_WIDE_INT bits;
+  HOST_WIDE_INT offs, delta_cur;
   int i;
   machine_mode mode;
   rtx *regs;
 
-  bits = MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest)));
-
-  mode = int_mode_for_size (bits, 0).require ();
-  delta = bits / BITS_PER_UNIT;
+  HOST_WIDE_INT num_reg = length / delta;
+  for (delta_cur = delta / 2; delta_cur != 0; delta_cur /= 2)
+    num_reg += !!(length & delta_cur);
 
   /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, num_reg);
 
-  /* Load as many BITS-sized chunks as possible.  Use a normal load if
- the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (delta_cur = delta, i = 0, offs 

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread Richard Biener via Gcc-patches
On Wed, Apr 12, 2023 at 1:18 PM Richard Sandiford via Gcc-patches
 wrote:
>
> Richard Biener  writes:
> > On Wed, 12 Apr 2023, juzhe.zh...@rivai.ai wrote:
> >
> >>
> >> >> Thanks for the detailed explanation.  Just to clarify - with RVV
> >> >> there's only a single mask register, v0.t, or did you want to
> >> >> say an instruction can only specify a single mask register?
> >>
> >> RVV has 32 (v0~v31) vector register in total.
> >> We can store vector data value or mask value in any of them.
> >> We also have mask-logic instruction for example mask-and between any 
> >> vector register.
> >>
> >> However, any vector operation, for example vadd.vv, can only be
> >> predicated by v0 (in asm, v0.t), which is the first vector register.
> >> We can't predicate vadd.vv with v1 - v31.
> >>
> >> So, you can imagine that every time we want to use a mask to predicate a
> >> vector operation, we should always first store the mask value
> >> into v0.
> >>
> >> So, we can write intrinsic sequence like this:
> >>
> >> vmseq v0,v8,v9 (store mask value to v0)
> >> vmslt v1,v10,v11 (store mask value to v1)
> >> vmand v0,v0,v1
> >> vadd.vv ...v0.t (predicate mask should always be mask).
> >
> > Ah, I see - that explains it well.
> >
> >> >> ARM SVE would have a loop control mask and a separate mask
> >> >> for the if (cond[i]) which would be combined with a mask-and
> >> >> instruction to a third mask which is then used on the
> >> >> predicated instructions.
> >>
> >> Yeah, I know it.  The ARM SVE way is more elegant than what RVV does.
> >> However, for RVV, we can't follow this flow.
> >> We don't have a "whilelo" instruction to generate the loop control mask.
> >
> > Yep.  Similar for AVX512 where I have to use a vector compare.  I'm
> > currently using
> >
> >  { 0, 1, 2 ... } < { remaining_len, remaining_len, ... }
> >
> > and careful updating of remaining_len (we know it will either
> > be adjusted by the full constant vector length or updated to zero).
> >
> >> We only can do loop control with length generated by vsetvl.
> >> And we can only use "v0" to mask predicate vadd.vv, and mask value can 
> >> only generated by comparison or mask logical instructions.
> >>
> >> >> PowerPC and s390x might be able to use WHILE_LEN as well (though
> >> >> they only have LEN variants of loads and stores) - of course
> >> >> only "simulating it".  For the fixed-vector-length ISAs the
> >> >> predicated vector loop IMHO makes most sense for the epilogue to
> >> >> handle low-trip loops better.
> >>
> >> Yeah, I wonder how they do the flow control (if (cond[i])).
> >> For RVV, you can imagine I will need to add a pattern
> >> LEN_MASK_LOAD/LEN_MASK_STORE (length generated by WHILE_LEN and mask
> >> generated by comparison)
> >>
> >> I think we can CC IBM folks to see whether we can make WHILE_LEN works
> >> for both IBM and RVV ?
> >
> > I've CCed them.  Adding WHILE_LEN support to rs6000/s390x would be
> > mainly the "easy" way to get len-masked (epilog) loop support.
>
> I think that already works for them (could be misremembering).
> However, IIUC, they have no special instruction to calculate the
> length (unlike for RVV), and so it's open-coded using vect_get_len.
>
> I suppose my two questions are:
>
> (1) How easy would it be to express WHILE_LEN in normal gimple?
> I haven't thought about this at all, so the answer might be
> "very hard".  But it reminds me a little of UQDEC on AArch64,
> which we open-code using MAX_EXPR and MINUS_EXPR (see
> vect_set_loop_controls_directly).
>
> I'm not saying WHILE_LEN is the same operation, just that it seems
> like it might be open-codeable in a similar way.

I think WHILE_LEN is a saturate-to-zero subtraction.  So when the IV
can be expressed signed

   remain = MAX (0, remain - vf);

if the details are more complicated, you need an unsigned IV.

It might be that WHILE_LEN for RVV computes remain % VL
so another MIN around (not sure).  For the AVX512 work I
also have a scalar 'remain' like above but currently I'm adding
a branch

do {
 if (remain < vf)
   mask = ... vector compare ..;
 else
   mask = all-ones;
} while (mask-not-all-zeros);

so I'm using the mask as control "IV".  But that's because I do
open-code WHILE_ULT at RTL expansion time and this is how
the vectorizer works for SVE.

When manually creating a loop mask in the vectorizer tracking
'remain' is easier.  Note the extra control flow complicates the
fully masked variant, for the epilog we know remain < vf and
that we'll immediately exit the loop.

> Even if we can open-code it, we'd still need some way for the
> target to select the "RVV way" from the "s390/PowerPC way".
>
> (2) What effect does using a variable IV step (the result of
> the WHILE_LEN) have on ivopts?  I remember experimenting with
> something similar once (can't remember the context) and not
> having a constant step prevented ivopts from making good
> addresing-mode choices.

Any kind of variable length stuff 

Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread juzhe.zh...@rivai.ai
Thank you. Richard.


>> I think that already works for them (could be misremembering).
>> However, IIUC, they have no special instruction to calculate the
>> length (unlike for RVV), and so it's open-coded using vect_get_len.

Yeah, the current flow using min, sub, and then min in vect_get_len
is working for IBM.  But I wonder whether switching the current flow of
length-loop-control to the WHILE_LEN pattern of this patch can improve
their performance.

>> (1) How easy would it be to express WHILE_LEN in normal gimple?
>> I haven't thought about this at all, so the answer might be
>> "very hard".  But it reminds me a little of UQDEC on AArch64,
>> which we open-code using MAX_EXPR and MINUS_EXPR (see
>> vect_set_loop_controls_directly).
>>
>> I'm not saying WHILE_LEN is the same operation, just that it seems
>> like it might be open-codeable in a similar way.
>>
>> Even if we can open-code it, we'd still need some way for the
>> target to select the "RVV way" from the "s390/PowerPC way".

WHILE_LEN in the doc I define is:
operand0 = MIN (operand1, operand2)
operand1 is the residual number of scalar elements that need to be updated.
operand2 is the vectorization factor (vf) for a single rgroup; for multiple
rgroups, operand2 = vf * nitems_per_ctrl.
You mean such a pattern is not well expressed, so we need to replace it with
normal tree code (MIN or MAX) and let the RISC-V backend optimize them into
vsetvl?  Sorry, maybe I am not on the same page.
>> (2) What effect does using a variable IV step (the result of
>> the WHILE_LEN) have on ivopts?  I remember experimenting with
>> something similar once (can't remember the context) and not
>> having a constant step prevented ivopts from making good
>> addresing-mode choices.

Thank you so much for pointing this out.  Currently, a variable IV step and
decreasing n down to 0
work fine for RISC-V downstream GCC, and we didn't find issues related to
addressing-mode choices.

I think I must have missed something; would you mind giving me some hints so
that I can study ivopts
to find out which cases may generate inferior codegen for a variable IV step?

Thank you so much.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-04-12 19:17
To: Richard Biener
CC: juzhe.zhong@rivai.ai; gcc-patches; jeffreyalaw; rdapp; linkw
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
Richard Biener  writes:
> On Wed, 12 Apr 2023, juzhe.zh...@rivai.ai wrote:
>
>> 
>> >> Thanks for the detailed explanation.  Just to clarify - with RVV
>> >> there's only a single mask register, v0.t, or did you want to
>> >> say an instruction can only specify a single mask register?
>> 
>> RVV has 32 (v0~v31) vector register in total.
>> We can store vector data value or mask value in any of them.
>> We also have mask-logic instruction for example mask-and between any vector 
>> register.
>> 
>> However, any vector operation, for example vadd.vv, can only be predicated
>> by v0 (in asm, v0.t), which is the first vector register.
>> We can't predicate vadd.vv with v1 - v31.
>> 
>> So, you can imagine that every time we want to use a mask to predicate a
>> vector operation, we should always first store the mask value
>> into v0.
>> 
>> So, we can write intrinsic sequence like this:
>> 
>> vmseq v0,v8,v9 (store mask value to v0)
>> vmslt v1,v10,v11 (store mask value to v1)
>> vmand v0,v0,v1
>> vadd.vv ...v0.t (predicate mask should always be mask).
>
> Ah, I see - that explains it well.
>
>> >> ARM SVE would have a loop control mask and a separate mask
>> >> for the if (cond[i]) which would be combined with a mask-and
>> >> instruction to a third mask which is then used on the
>> >> predicated instructions.
>> 
>> Yeah, I know it.  The ARM SVE way is more elegant than what RVV does.
>> However, for RVV, we can't follow this flow.
>> We don't have a "whilelo" instruction to generate the loop control mask.
>
> Yep.  Similar for AVX512 where I have to use a vector compare.  I'm
> currently using
>
>  { 0, 1, 2 ... } < { remaining_len, remaining_len, ... }
>
> and careful updating of remaining_len (we know it will either
> be adjusted by the full constant vector length or updated to zero).
>
>> We only can do loop control with length generated by vsetvl.
>> And we can only use "v0" to mask predicate vadd.vv, and mask value can only 
>> generated by comparison or mask logical instructions. 
>> 
>> >> PowerPC and s390x might be able to use WHILE_LEN as well (though
>> >> they only have LEN variants of loads and stores) - of course
>> >> only "simulating it".  For the fixed-vector-length ISAs the
>> >> predicated vector loop IMHO makes most sense for the epilogue to
>> >> handle low-trip loops better.
>> 
>> Yeah, I wonder how they do the flow control (if (cond[i])).
>> For RVV, you can imagine I will need to add a pattern
>> LEN_MASK_LOAD/LEN_MASK_STORE (length generated by WHILE_LEN and mask
>> generated by comparison)
>> 
>> I think we can CC IBM folks 

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 12 Apr 2023, juzhe.zh...@rivai.ai wrote:
>
>> 
>> >> Thanks for the detailed explanation.  Just to clarify - with RVV
>> >> there's only a single mask register, v0.t, or did you want to
>> >> say an instruction can only specify a single mask register?
>> 
>> RVV has 32 (v0~v31) vector register in total.
>> We can store vector data value or mask value in any of them.
>> We also have mask-logic instruction for example mask-and between any vector 
>> register.
>> 
>> However, any vector operation, for example vadd.vv, can only be predicated
>> by v0 (in asm, v0.t), which is the first vector register.
>> We can't predicate vadd.vv with v1 - v31.
>> 
>> So, you can imagine that every time we want to use a mask to predicate a
>> vector operation, we should always first store the mask value
>> into v0.
>> 
>> So, we can write intrinsic sequence like this:
>> 
>> vmseq v0,v8,v9 (store mask value to v0)
>> vmslt v1,v10,v11 (store mask value to v1)
>> vmand v0,v0,v1
>> vadd.vv ...v0.t (predicate mask should always be mask).
>
> Ah, I see - that explains it well.
>
>> >> ARM SVE would have a loop control mask and a separate mask
>> >> for the if (cond[i]) which would be combined with a mask-and
>> >> instruction to a third mask which is then used on the
>> >> predicated instructions.
>> 
>> Yeah, I know it.  The ARM SVE way is more elegant than what RVV does.
>> However, for RVV, we can't follow this flow.
>> We don't have a "whilelo" instruction to generate the loop control mask.
>
> Yep.  Similar for AVX512 where I have to use a vector compare.  I'm
> currently using
>
>  { 0, 1, 2 ... } < { remaining_len, remaining_len, ... }
>
> and careful updating of remaining_len (we know it will either
> be adjusted by the full constant vector length or updated to zero).
>
>> We only can do loop control with length generated by vsetvl.
>> And we can only use "v0" to mask predicate vadd.vv, and mask value can only 
>> generated by comparison or mask logical instructions. 
>> 
>> >> PowerPC and s390x might be able to use WHILE_LEN as well (though
>> >> they only have LEN variants of loads and stores) - of course
>> >> only "simulating it".  For the fixed-vector-length ISAs the
>> >> predicated vector loop IMHO makes most sense for the epilogue to
>> >> handle low-trip loops better.
>> 
>> Yeah, I wonder how they do the flow control (if (cond[i])).
>> For RVV, you can imagine I will need to add a pattern
>> LEN_MASK_LOAD/LEN_MASK_STORE (length generated by WHILE_LEN and mask
>> generated by comparison)
>> 
>> I think we can CC IBM folks to see whether we can make WHILE_LEN works 
>> for both IBM and RVV ?
>
> I've CCed them.  Adding WHILE_LEN support to rs6000/s390x would be
> mainly the "easy" way to get len-masked (epilog) loop support.

I think that already works for them (could be misremembering).
However, IIUC, they have no special instruction to calculate the
length (unlike for RVV), and so it's open-coded using vect_get_len.

I suppose my two questions are:

(1) How easy would it be to express WHILE_LEN in normal gimple?
I haven't thought about this at all, so the answer might be
"very hard".  But it reminds me a little of UQDEC on AArch64,
which we open-code using MAX_EXPR and MINUS_EXPR (see
vect_set_loop_controls_directly).

I'm not saying WHILE_LEN is the same operation, just that it seems
like it might be open-codeable in a similar way.

Even if we can open-code it, we'd still need some way for the
target to select the "RVV way" from the "s390/PowerPC way".

(2) What effect does using a variable IV step (the result of
the WHILE_LEN) have on ivopts?  I remember experimenting with
something similar once (can't remember the context) and not
having a constant step prevented ivopts from making good
addresing-mode choices.

Thanks,
Richard




[PATCH] RISC-V: Fix PR109479

2023-04-12 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Fix the supported data types according to the RVV ISA.
For vint64m*_t, we should only allow them with zve64*, instead of with
zve32*_zvl64b (>= 64b).
Ideally, we should make the error message more friendly, like Clang's
(https://godbolt.org/z/f9GMv4dMo), which reports the extension name that the
RVV type requires.
However, I failed to find a way to do that, so current GCC can only report an
"unknown" type.
I added comments to remind us to do this in the future.

Tested and regression tests all passed.  OK for GCC-13 trunk?

PR 109479

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def (vint8mf8_t): Fix 
predicate.
(vint16mf4_t): Ditto.
(vint32mf2_t): Ditto.
(vint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint8mf8_t): Ditto.
(vuint16mf4_t): Ditto.
(vuint32mf2_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vbool64_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (register_builtin_type): Add 
comments.
(register_vector_type): Ditto.
(check_required_extensions): Fix predicate.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ZVE64): Ditto.
(RVV_REQUIRE_ELEN_64): Ditto.
(RVV_REQUIRE_MIN_VLEN_64): Ditto.
* config/riscv/riscv-vector-switch.def (TARGET_VECTOR_FP32): Remove it.
(TARGET_VECTOR_FP64): Ditto.
(ENTRY): Fix predicate.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr109479.c: New test.

---
 .../riscv/riscv-vector-builtins-types.def | 348 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  14 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +-
 gcc/config/riscv/riscv-vector-switch.def  |  56 ++-
 .../gcc.target/riscv/rvv/base/pr109479.c  |  11 +
 5 files changed, 220 insertions(+), 212 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109479.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def
index a55d494f1d9..a74df066521 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -235,53 +235,53 @@ along with GCC; see the file COPYING3. If not see
 #define DEF_RVV_LMUL4_OPS(TYPE, REQUIRE)
 #endif
 
-DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_I_OPS (vint8mf4_t, 0)
 DEF_RVV_I_OPS (vint8mf2_t, 0)
 DEF_RVV_I_OPS (vint8m1_t, 0)
 DEF_RVV_I_OPS (vint8m2_t, 0)
 DEF_RVV_I_OPS (vint8m4_t, 0)
 DEF_RVV_I_OPS (vint8m8_t, 0)
-DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_I_OPS (vint16mf2_t, 0)
 DEF_RVV_I_OPS (vint16m1_t, 0)
 DEF_RVV_I_OPS (vint16m2_t, 0)
 DEF_RVV_I_OPS (vint16m4_t, 0)
 DEF_RVV_I_OPS (vint16m8_t, 0)
-DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_I_OPS (vint32m1_t, 0)
 DEF_RVV_I_OPS (vint32m2_t, 0)
 DEF_RVV_I_OPS (vint32m4_t, 0)
 DEF_RVV_I_OPS (vint32m8_t, 0)
-DEF_RVV_I_OPS (vint64m1_t, RVV_REQUIRE_ZVE64)
-DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ZVE64)
-DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ZVE64)
-DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
 
-DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_U_OPS (vuint8mf4_t, 0)
 DEF_RVV_U_OPS (vuint8mf2_t, 0)
 DEF_RVV_U_OPS (vuint8m1_t, 0)
 DEF_RVV_U_OPS (vuint8m2_t, 0)
 DEF_RVV_U_OPS (vuint8m4_t, 0)
 DEF_RVV_U_OPS (vuint8m8_t, 0)
-DEF_RVV_U_OPS (vuint16mf4_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_U_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_U_OPS (vuint16mf2_t, 0)
 DEF_RVV_U_OPS (vuint16m1_t, 0)
 DEF_RVV_U_OPS (vuint16m2_t, 0)
 DEF_RVV_U_OPS (vuint16m4_t, 0)
 DEF_RVV_U_OPS (vuint16m8_t, 0)
-DEF_RVV_U_OPS (vuint32mf2_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_U_OPS (vuint32m1_t, 0)
 DEF_RVV_U_OPS (vuint32m2_t, 0)
 DEF_RVV_U_OPS (vuint32m4_t, 0)
 DEF_RVV_U_OPS (vuint32m8_t, 0)
-DEF_RVV_U_OPS (vuint64m1_t, RVV_REQUIRE_ZVE64)
-DEF_RVV_U_OPS (vuint64m2_t, RVV_REQUIRE_ZVE64)
-DEF_RVV_U_OPS (vuint64m4_t, RVV_REQUIRE_ZVE64)
-DEF_RVV_U_OPS (vuint64m8_t, RVV_REQUIRE_ZVE64)
+DEF_RVV_U_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
 
-DEF_RVV_F_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_ZVE64)
+DEF_RVV_F_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
 DEF_RVV_F_OPS (vfloat32m1_t, 

Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-12 Thread Richard Biener via Gcc-patches
On Wed, Apr 12, 2023 at 12:59 PM Jakub Jelinek  wrote:
>
> On Wed, Apr 12, 2023 at 11:12:17AM +0200, Richard Biener wrote:
> > > 108139 fixed this by not evaluating any equivalences if the equivalence
> > > was the LHS.
> > >
> > > What it missed was that it is possible we are calculating the range of a_3.
> > > b_2 is not defined in a phi node, so it happily used the equivalence.
> > > This PR demonstrates that we can't always use that equivalence either
> > > without more context.  There can be places in the IL where a_3 is used,
> > > but b_2 has moved to a new value within a loop.
> >
> > I think that's only possible when b_2 flows in via a backedge (from BB3).
> > So isn't this all about backedges?  Indeed creating equivalences across
>
> Apparently backedges and undefined phi arg, without that it doesn't seem
> to trigger because then it isn't considered equivalent even if the two have
> the same range.
>
> > backedges is futile with SSA.  I think ranger requires dominators, so
> > to have the above scenario - a_3 used after the b_2 definition - requires
> > BB3 to be dominated by the a_3 definition which is what you could check.
>
> Would be nice.
>
> Though, I'm afraid it still wouldn't fix the PR101912 testcase, because
> it has exactly what happens in this PR, undefined phi arg from the
> pre-header and uses of the previous iteration's value (i.e. across
> backedge).

Well yes, that's what's not allowed.  So when the PHI dominates the
to-be-equivalenced argument edge src, then the equivalence isn't
valid, because there's a place (that very source block, for example) where a
use of the PHI lhs could appear and where we'd then mix up iterations.

>
> Jakub
>


Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-12 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 12, 2023 at 11:12:17AM +0200, Richard Biener wrote:
> > 108139 fixed this by not evaluating any equivalences if the equivalence
> > was the LHS.
> >
> > What it missed was that it is possible we are calculating the range of a_3.
> > b_2 is not defined in a phi node, so it happily used the equivalence.
> > This PR demonstrates that we can't always use that equivalence either
> > without more context.  There can be places in the IL where a_3 is used,
> > but b_2 has moved to a new value within a loop.
> 
> I think that's only possible when b_2 flows in via a backedge (from BB3).
> So isn't this all about backedges?  Indeed creating equivalences across

Apparently backedges and undefined phi arg, without that it doesn't seem
to trigger because then it isn't considered equivalent even if the two have
the same range.

> backedges is futile with SSA.  I think ranger requires dominators, so
> to have the above scenario - a_3 used after the b_2 definition - requires
> BB3 to be dominated by the a_3 definition which is what you could check.

Would be nice.

Though, I'm afraid it still wouldn't fix the PR101912 testcase, because
it has exactly what happens in this PR, undefined phi arg from the
pre-header and uses of the previous iteration's value (i.e. across
backedge).

Jakub



Re: [RFC PATCH] range-op-float: Fix up op1_op2_relation of comparisons

2023-04-12 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 11, 2023 at 04:58:19PM -0400, Andrew MacLeod wrote:
> This bootstraps and has no regressions, and is fine by me if you want to use
> it.

Thanks, looks nice.
My incremental patch on top of that would then be below.

Though,
FAIL: gcc.dg/tree-ssa/vrp-float-6.c scan-tree-dump-times evrp "Folding predicate x_.* <= y_.* to 1" 1
still FAILs with those 2 patches together.
Shall we just xfail it for now and find some solution for GCC 14?

2023-04-12  Jakub Jelinek  

* range-op-float.cc (foperator_lt::op1_op2_relation): Return
VREL_VARYING instead of VREL_GE if HONOR_NANS and op1 or op2 could be
NANs.
(foperator_le::op1_op2_relation): Similarly return VREL_VARYING
instead of VREL_GT.
(foperator_gt::op1_op2_relation): Similarly return VREL_VARYING
instead of VREL_LE.
(foperator_ge::op1_op2_relation): Similarly return VREL_VARYING
instead of VREL_LT.
(foperator_unordered_lt::op1_op2_relation,
foperator_unordered_le::op1_op2_relation,
foperator_unordered_gt::op1_op2_relation,
foperator_unordered_ge::op1_op2_relation): New.

--- gcc/range-op-float.cc.jj	2023-04-12 12:17:44.784962757 +0200
+++ gcc/range-op-float.cc	2023-04-12 12:26:11.740657502 +0200
@@ -835,10 +835,18 @@ public:
   bool fold_range (irange &r, tree type,
                    const frange &op1, const frange &op2,
                    relation_trio = TRIO_VARYING) const final override;
-  relation_kind op1_op2_relation (const irange &lhs, const frange &,
-                                  const frange &) const final override
+  relation_kind op1_op2_relation (const irange &lhs, const frange &op1,
+                                  const frange &op2) const final override
   {
-    return lt_op1_op2_relation (lhs);
+    relation_kind ret = lt_op1_op2_relation (lhs);
+    if (ret == VREL_GE
+        && HONOR_NANS (op1.type ())
+        && (op1.known_isnan ()
+            || op1.maybe_isnan ()
+            || op2.known_isnan ()
+            || op2.maybe_isnan ()))
+      ret = VREL_VARYING; // Inverse of VREL_LT is VREL_UNGE with NAN ops.
+    return ret;
   }
   bool op1_range (frange &r, tree type,
                   const irange &lhs, const frange &op2,
@@ -952,9 +960,18 @@ public:
   bool fold_range (irange &r, tree type,
                    const frange &op1, const frange &op2,
                    relation_trio rel = TRIO_VARYING) const final override;
-  relation_kind op1_op2_relation (const irange &lhs, const frange &,
-                                  const frange &) const final override
+  relation_kind op1_op2_relation (const irange &lhs, const frange &op1,
+                                  const frange &op2) const final override
   {
-    return le_op1_op2_relation (lhs);
+    relation_kind ret = le_op1_op2_relation (lhs);
+    if (ret == VREL_GT
+        && HONOR_NANS (op1.type ())
+        && (op1.known_isnan ()
+            || op1.maybe_isnan ()
+            || op2.known_isnan ()
+            || op2.maybe_isnan ()))
+      ret = VREL_VARYING; // Inverse of VREL_LE is VREL_UNGT with NAN ops.
+    return ret;
   }
   bool op1_range (frange &r, tree type,
@@ -1063,10 +1080,18 @@ public:
   bool fold_range (irange &r, tree type,
                    const frange &op1, const frange &op2,
                    relation_trio = TRIO_VARYING) const final override;
-  relation_kind op1_op2_relation (const irange &lhs, const frange &,
-                                  const frange &) const final override
+  relation_kind op1_op2_relation (const irange &lhs, const frange &op1,
+                                  const frange &op2) const final override
   {
-    return gt_op1_op2_relation (lhs);
+    relation_kind ret = gt_op1_op2_relation (lhs);
+    if (ret == VREL_LE
+        && HONOR_NANS (op1.type ())
+        && (op1.known_isnan ()
+            || op1.maybe_isnan ()
+            || op2.known_isnan ()
+            || op2.maybe_isnan ()))
+      ret = VREL_VARYING; // Inverse of VREL_GT is VREL_UNLE with NAN ops.
+    return ret;
   }
   bool op1_range (frange &r, tree type,
                   const irange &lhs, const frange &op2,
@@ -1184,10 +1209,18 @@ public:
   bool fold_range (irange &r, tree type,
                    const frange &op1, const frange &op2,
                    relation_trio = TRIO_VARYING) const final override;
-  relation_kind op1_op2_relation (const irange &lhs, const frange &,
-                                  const frange &) const final override
+  relation_kind op1_op2_relation (const irange &lhs, const frange &op1,
+                                  const frange &op2) const final override
   {
-    return ge_op1_op2_relation (lhs);
+    relation_kind ret = ge_op1_op2_relation (lhs);
+    if (ret == VREL_LT
+        && HONOR_NANS (op1.type ())
+        && (op1.known_isnan ()
+            || op1.maybe_isnan ()
+            || op2.known_isnan ()
+            || op2.maybe_isnan ()))
+      ret = VREL_VARYING; // Inverse of VREL_GE is VREL_UNLT with NAN ops.
+    return ret;
   }
   bool op1_range (frange &r, tree type,
                   const irange &lhs, const frange &op2,
@@ 

[PATCH] tree-optimization/109473 - ICE with reduction epilog adjustment op

2023-04-12 Thread Richard Biener via Gcc-patches
The following makes sure to carry out the reduction epilog adjustment
in the original computation type which for pointers is an unsigned
integer type.  There's a similar issue with signed vs. unsigned ops
and overflow which is fixed by this as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/109473
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Convert scalar result to the computation type before performing
the reduction adjustment.

* gcc.dg/vect/pr109473.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr109473.c | 16 ++++++++++++++++
 gcc/tree-vect-loop.cc                |  7 +++++--
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr109473.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr109473.c 
b/gcc/testsuite/gcc.dg/vect/pr109473.c
new file mode 100644
index 000..9dee5515dc6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr109473.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O" } */
+
+struct spa_buffer {
+  __UINT32_TYPE__ *metas;
+};
+void do_port_use_buffers(struct spa_buffer **buffers, void *endptr, void *mem)
+{
+  for (int i = 0; i < 128; i++)
+{
+  for (int j = 0; j < 128; j++)
+   endptr = (void *)((__UINTPTR_TYPE__)endptr + buffers[i]->metas[j]);
+  if (endptr > mem)
+   return;
+}
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 1ba9f18d73e..ba28214f09a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6297,9 +6297,12 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
{
   new_temp = scalar_results[0];
  gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) != VECTOR_TYPE);
- adjustment_def = gimple_convert (&stmts, scalar_type, adjustment_def);
- new_temp = gimple_build (&stmts, code, scalar_type,
+ adjustment_def = gimple_convert (&stmts, TREE_TYPE (vectype),
+  adjustment_def);
+ new_temp = gimple_convert (&stmts, TREE_TYPE (vectype), new_temp);
+ new_temp = gimple_build (&stmts, code, TREE_TYPE (vectype),
    new_temp, adjustment_def);
+ new_temp = gimple_convert (&stmts, scalar_type, new_temp);
}
 
   epilog_stmt = gimple_seq_last_stmt (stmts);
-- 
2.35.3


[PATCH] combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-12 Thread Jakub Jelinek via Gcc-patches
Hi!

On Wed, Apr 12, 2023 at 08:21:26AM +0200, Jakub Jelinek via Gcc-patches wrote:
> I would have expected something like
> WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), 
> BITS_PER_WORD)
> as the condition to use word_mode, rather than just
> WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
> used as is, not a narrower word_mode instead of them.

In patch form that would be following (given that the combine.cc change
had scalar_int_mode mode we can as well just use normal comparison, and
simplify-rtx.cc has it guarded on HWI_COMPUTABLE_MODE_P, which is also only
true for scalar int modes).

I've tried the pr108947.c testcase, but I see no differences in the assembly
before/after the patch (but dunno if I'm using the right options).
The pr109040.c testcase from the patch I don't see the expected zero
extension without the patch and do see it with it.

As before, I can only test this easily on non-WORD_REGISTER_OPERATIONS
targets.

2023-04-12  Jeff Law  
Jakub Jelinek  

PR target/109040
* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
smaller than word_mode.
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
: Likewise.

* gcc.c-torture/execute/pr109040.c: New test.

--- gcc/combine.cc.jj   2023-04-07 16:02:06.668051629 +0200
+++ gcc/combine.cc  2023-04-12 11:24:18.458240028 +0200
@@ -10055,9 +10055,12 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* See what bits may be nonzero in VAROP.  Unlike the general case of
  a call to nonzero_bits, here we don't care about bits outside
- MODE.  */
+ MODE unless WORD_REGISTER_OPERATIONS is true.  */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  scalar_int_mode tmode = mode;
+  if (WORD_REGISTER_OPERATIONS && GET_MODE_BITSIZE (mode) < BITS_PER_WORD)
+tmode = word_mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
  Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
@@ -10071,7 +10074,7 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* If VAROP is a NEG of something known to be zero or 1 and CONSTOP is
  a power of two, we can replace this with an ASHIFT.  */
-  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), mode) == 1
+  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), tmode) == 1
   && (i = exact_log2 (constop)) >= 0)
 return simplify_shift_const (NULL_RTX, ASHIFT, mode, XEXP (varop, 0), i);
 
--- gcc/simplify-rtx.cc.jj  2023-03-02 19:09:45.459594212 +0100
+++ gcc/simplify-rtx.cc 2023-04-12 11:26:26.027400305 +0200
@@ -3752,7 +3752,13 @@ simplify_context::simplify_binary_operat
return op0;
   if (HWI_COMPUTABLE_MODE_P (mode))
{
- HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+ /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+nonzero bits in WORD_MODE rather than MODE.  */
+  scalar_int_mode tmode = as_a <scalar_int_mode> (mode);
+  if (WORD_REGISTER_OPERATIONS
+ && GET_MODE_BITSIZE (tmode) < BITS_PER_WORD)
+   tmode = word_mode;
+ HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, tmode);
  HOST_WIDE_INT nzop1;
  if (CONST_INT_P (trueop1))
{
--- gcc/testsuite/gcc.c-torture/execute/pr109040.c.jj   2023-04-12 
11:11:56.728938344 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr109040.c  2023-04-12 
11:11:56.728938344 +0200
@@ -0,0 +1,23 @@
+/* PR target/109040 */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+
+unsigned short a, b, c, d;
+
+void
+foo (V m, unsigned short *ret)
+{
+  V v = 6 > ((V) { 2124, 8 } & m);
+  unsigned short uc = v[0] + a + b + c + d;
+  *ret = uc;
+}
+
+int
+main ()
+{
+  unsigned short x;
+  foo ((V) { 0, 15 }, &x);
+  if (x != (unsigned short) ~0)
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread Robin Dapp via Gcc-patches
>> I think we can CC IBM folks to see whether we can make WHILE_LEN works 
>> for both IBM and RVV ?
> 
> I've CCed them.  Adding WHILE_LEN support to rs6000/s390x would be
> mainly the "easy" way to get len-masked (epilog) loop support.  I've
> figured actually implementing WHILE_ULT for AVX512 in the backend
> results in some code generation challenges so I'm going to play
> (again) with open-coding it as outlined above in the vectorizer itself
> so followup passes (mostly IVOPTs) can do a better job.

I'm with Ventana now but haven't updated my affiliation yet.  CC'ing Stefan and 
Andreas fyi.


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-12 Thread Kito Cheng via Gcc-patches
> > The concept of fractional LMUL is the same as the concept of AArch64's
> > partial SVE vectors,
> > so they can only access the lowest part, like SVE's partial vector.
> >
> > We want to spill/restore the exact size of those modes (1/2, 1/4,
> > 1/8), so adding dedicated modes for those partial vector modes should
> > be unavoidable IMO.
> >
> > And even if we use sub-vector, we still need to define those partial
> > vector types.
>
> Could you use integer modes for the fractional vectors?

You mean using the scalar integer mode like using (subreg:SI
(reg:VNx4SI) 0) to represent
LMUL=1/4?
(Assume VNx4SI is mode for M1)

If so I think it might not be able to model that right - it seems like
we are using 32-bits
but actually we are using poly_int16(1, 1) * 32 bits.

> For computation you can always appropriately limit the LEN?

RVV provide zvl*b extension like zvlb (e.g.zvl128b or zvl256b)
to guarantee the vector length is at least larger than N bits, but it's
just guarantee the minimal length like SVE guarantee the minimal
vector length is 128 bits


Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread Richard Biener via Gcc-patches
On Wed, 12 Apr 2023, juzhe.zh...@rivai.ai wrote:

> 
> >> Thanks for the detailed explanation.  Just to clarify - with RVV
> >> there's only a single mask register, v0.t, or did you want to
> >> say an instruction can only specify a single mask register?
> 
> RVV has 32 (v0~v31) vector register in total.
> We can store vector data value or mask value in any of them.
> We also have mask-logic instruction for example mask-and between any vector 
> register.
> 
> However, any vector operation for example like vadd.vv can only  predicated 
> by v0 (in asm is v0.t) which is the first vector register.
> We can predicate vadd.vv with v1 - v31.
> 
> So, you can image every time we want to use a mask to predicate a vector 
> operation, we should always first store the mask value
> into v0.
> 
> So, we can write intrinsic sequence like this:
> 
> vmseq v0,v8,v9 (store mask value to v0)
> vmslt v1,v10,v11 (store mask value to v1)
> vmand v0,v0,v1
> vadd.vv ...v0.t (predicate mask should always be mask).

Ah, I see - that explains it well.

> >> ARM SVE would have a loop control mask and a separate mask
> >> for the if (cond[i]) which would be combined with a mask-and
> >> instruction to a third mask which is then used on the
> >> predicated instructions.
> 
> Yeah, I know it. ARM SVE way is a more elegant way than RVV do. 
> However, for RVV, we can't follow this flow.
> We don't have a  "whilelo" instruction to generate loop control mask.

Yep.  Similar for AVX512 where I have to use a vector compare.  I'm
currently using

 { 0, 1, 2 ... } < { remaining_len, remaining_len, ... }

and careful updating of remaining_len (we know it will either
be adjusted by the full constant vector length or updated to zero).

> We only can do loop control with length generated by vsetvl.
> And we can only use "v0" to mask predicate vadd.vv, and mask value can only 
> generated by comparison or mask logical instructions. 
> 
> >> PowerPC and s390x might be able to use WHILE_LEN as well (though
> >> they only have LEN variants of loads and stores) - of course
> >> only "simulating it".  For the fixed-vector-length ISAs the
> >> predicated vector loop IMHO makes most sense for the epilogue to
> >> handle low-trip loops better.
> 
> Yeah, I wonder how they do the flow control (if (cond[i])). 
> For RVV, you can image I will need to add a pattern 
> LEN_MASK_LOAD/LEN_MASK_STORE (length generated by WHILE_LEN and mask 
> generated by comparison)
> 
> I think we can CC IBM folks to see whether we can make WHILE_LEN works 
> for both IBM and RVV ?

I've CCed them.  Adding WHILE_LEN support to rs6000/s390x would be
mainly the "easy" way to get len-masked (epilog) loop support.  I've
figured actually implementing WHILE_ULT for AVX512 in the backend
results in some code generation challenges so I'm going to play
(again) with open-coding it as outlined above in the vectorizer itself
so followup passes (mostly IVOPTs) can do a better job.

Richard.

> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-04-12 16:42
> To: juzhe.zh...@rivai.ai
> CC: richard.sandiford; gcc-patches; jeffreyalaw
> Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support 
> for auto-vectorization
> On Wed, 12 Apr 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Thank you very much for reply.
> > 
> > WHILE_LEN is the pattern that calculates the number of the elements of the 
> > vector will be updated in each iteration.
> > For RVV, we use vsetvl instruction to calculate the number of the elements 
> > of the vector.
> > 
> > WHILE_ULT can not work for RVV since WHILE_ULT is generating mask to 
> > predicate vector operation, but RVV do not
> > use mask to do the loop strip mining (RVV only use mask for control flow 
> > inside the loop).
> > 
> > Here is the example WHILE_ULT working in ARM SVE:
> > https://godbolt.org/z/jKsT8E1hP 
> > 
> > The first example is:
> > void foo (int32_t * __restrict a, int32_t * __restrict b, int n)
> > {
> > for (int i = 0; i < n; i++)
> >   a[i] = a[i] + b[i];
> > }
> > 
> > ARM SVE:
> > foo:
> > cmp w2, 0
> > ble .L1
> > mov x3, 0
> cntw    x4
> > whilelo p0.s, wzr, w2
> > .L3:
> ld1w    z1.s, p0/z, [x0, x3, lsl 2]
> ld1w    z0.s, p0/z, [x1, x3, lsl 2]
> add z0.s, z0.s, z1.s
> st1w    z0.s, p0, [x0, x3, lsl 2]
> > add x3, x3, x4
> > whilelo p0.s, w3, w2
> > b.any   .L3
> > .L1:
> > ret
> > 
> > Here, whilelo will generate the mask according to w3 to w2.
> > So for example, if w3 = 0, and w2 = 3 (Suppose machine vector length > 3).
> > Then it will generate a mask with 0b111 mask to predicate loads and stores.
> > 
> > For RVV, we can't do that since RVV doesn't have whilelo instructions to 
> > generate predicate mask.
> > Also, we can't use mask as the predicate to do loop strip mining since RVV 
> > only has 1 single mask 
> > to handle flow control  

Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-12 Thread Richard Biener via Gcc-patches
On Wed, 12 Apr 2023, Kito Cheng wrote:

> Hi Richard:
> 
> > > In order to model LMUL in backend, we have to the combination of
> > > scalar type and LMUL; possible LMUL is 1, 2, 4, 8, 1/2, 1/4, 1/8 - 8
> > > different types of LMUL, and we'll have QI, HI, SI, DI, HF, SF and DF,
> > > so basically we'll have 7 (LMUL type) * 7 (scalar type) here.
> >
> > Other archs have load/store-multiple instructions, IIRC those
> > are modeled with the appropriate set of operands. Do RVV LMUL
> > group inputs/outputs overlap with the non-LMUL grouped registers
> > and can they be used as aliases or is this supposed to be
> > implemented transparently on the register file level only?
> 
> LMUL and non-LMUL (or LMUL=1) modes use the same vector register file.
> 
> Reg for LMUL=1/2 : { {v0, v1, ...v31} }
> Reg for LMUL=1 : { {v0, v1, ...v31} }
> Reg for LMUL=2 : { {v0, v1}, {v2, v3}, ... {v30, v31} } // reg. must
> align to multiple of 2.
> Reg for LMUL=4 : { {v0, v1, v2, v3}, {v4, v5, v6, v7}, ... {v28, v29,
> v30, v31} } // reg. must align to multiple of 4.
> ..
> Reg for 2-tuples of LMUL=1 : { {v0, v1}, {v1, v2}, ... {v29, v30}, {v30, v31} 
> }
> Reg for 2-tuples of LMUL=2 : { {v0, v1, v2, v3}, {v2, v3, v4, v5}, ...
> {v28, v29, v30, v31}, {v28, v29, v30, v31} } // reg. must align to
> multiple of 2.
> ...
> 
> > But yes, implementing this as operations on multi-register
> > ops with large modes is probably the only sensible approach.
> >
> > I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though? Can you
> > explain? Is that supposed to virtually increase the number of
> > registers? How do you represent r0:1/8:0 vs r0:1/8:3 (the first
> > and the third "virtual" register decomposed from r0) in GCC? To
> > me the natural way would be a subreg of r0?
> >
> > Somehow RVV seems to have more knobs than necessary for tuning
> > the actual vector register layout (aka N axes but only N-1 dimensions
> > thus the axes are
> 
> The concept of fractional LMUL is the same as the concept of AArch64's
> partial SVE vectors,
> so they can only access the lowest part, like SVE's partial vector.
> 
> We want to spill/restore the exact size of those modes (1/2, 1/4,
> 1/8), so adding dedicated modes for those partial vector modes should
> be unavoidable IMO.
> 
> And even if we use sub-vector, we still need to define those partial
> vector types.

Could you use integer modes for the fractional vectors?  For computation
you can always appropriately limit the LEN?


Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread juzhe.zh...@rivai.ai
Sorry for incorrect typo.We can predicate vadd.vv with v1 - v31.
>
We can't predicate vadd.vv with v1 - v31.


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-04-12 17:15
To: rguenther
CC: richard.sandiford; gcc-patches; jeffreyalaw
Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support 
for auto-vectorization

>> Thanks for the detailed explanation.  Just to clarify - with RVV
>> there's only a single mask register, v0.t, or did you want to
>> say an instruction can only specify a single mask register?

RVV has 32 (v0~v31) vector register in total.
We can store vector data value or mask value in any of them.
We also have mask-logic instruction for example mask-and between any vector 
register.

However, any vector operation for example like vadd.vv can only  predicated by 
v0 (in asm is v0.t) which is the first vector register.
We can predicate vadd.vv with v1 - v31.

So, you can image every time we want to use a mask to predicate a vector 
operation, we should always first store the mask value
into v0.

So, we can write intrinsic sequence like this:

vmseq v0,v8,v9 (store mask value to v0)
vmslt v1,v10,v11 (store mask value to v1)
vmand v0,v0,v1
vadd.vv ...v0.t (predicate mask should always be mask).

>> ARM SVE would have a loop control mask and a separate mask
>> for the if (cond[i]) which would be combined with a mask-and
>> instruction to a third mask which is then used on the
>> predicated instructions.

Yeah, I know it. ARM SVE way is a more elegant way than RVV do. 
However, for RVV, we can't follow this flow.
We don't have a  "whilelo" instruction to generate loop control mask.
We only can do loop control with length generated by vsetvl.
And we can only use "v0" to mask predicate vadd.vv, and mask value can only 
generated by comparison or mask logical instructions. 

>> PowerPC and s390x might be able to use WHILE_LEN as well (though
>> they only have LEN variants of loads and stores) - of course
>> only "simulating it".  For the fixed-vector-length ISAs the
>> predicated vector loop IMHO makes most sense for the epilogue to
>> handle low-trip loops better.

Yeah, I wonder how they do the flow control (if (cond[i])). 
For RVV, you can image I will need to add a pattern 
LEN_MASK_LOAD/LEN_MASK_STORE (length generated by WHILE_LEN and mask generated 
by comparison)

I think we can CC IBM folks to see whether we can make WHILE_LEN works 
for both IBM and RVV ? 

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-04-12 16:42
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches; jeffreyalaw
Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support 
for auto-vectorization
On Wed, 12 Apr 2023, juzhe.zh...@rivai.ai wrote:
 
> Thank you very much for reply.
> 
> WHILE_LEN is the pattern that calculates the number of the elements of the 
> vector will be updated in each iteration.
> For RVV, we use vsetvl instruction to calculate the number of the elements of 
> the vector.
> 
> WHILE_ULT can not work for RVV since WHILE_ULT is generating mask to 
> predicate vector operation, but RVV do not
> use mask to do the loop strip mining (RVV only use mask for control flow 
> inside the loop).
> 
> Here is the example WHILE_ULT working in ARM SVE:
> https://godbolt.org/z/jKsT8E1hP 
> 
> The first example is:
> void foo (int32_t * __restrict a, int32_t * __restrict b, int n)
> {
> for (int i = 0; i < n; i++)
>   a[i] = a[i] + b[i];
> }
> 
> ARM SVE:
> foo:
> cmp w2, 0
> ble .L1
> mov x3, 0
> cntw    x4
> whilelo p0.s, wzr, w2
> .L3:
> ld1w    z1.s, p0/z, [x0, x3, lsl 2]
> ld1w    z0.s, p0/z, [x1, x3, lsl 2]
> add z0.s, z0.s, z1.s
> st1w    z0.s, p0, [x0, x3, lsl 2]
> add x3, x3, x4
> whilelo p0.s, w3, w2
> b.any   .L3
> .L1:
> ret
> 
> Here, whilelo will generate the mask according to w3 to w2.
> So for example, if w3 = 0, and w2 = 3 (Suppose machine vector length > 3).
> Then it will generate a mask with 0b111 mask to predicate loads and stores.
> 
> For RVV, we can't do that since RVV doesn't have whilelo instructions to 
> generate predicate mask.
> Also, we can't use mask as the predicate to do loop strip mining since RVV 
> only has 1 single mask 
> to handle flow control  inside the loop.
> 
> Instead, we use vsetvl to do the strip mining, so base on this, the same C 
> code, RVV ideal asm according RVV ISA should be:
> 
> preheader:
> a0 = n (the total number of the scalar should be calculated).
>  .
> .L3:
> vsetvli a5,a0,e32,m1,ta,ma  -----> WHILE_LEN pattern generate this 
> instruction, calculate the number of the elements should be updated
> vle32.v v1,0(a4)
> sub a0,a0,a5  -----> Decrement the induction variable 
> by the a5 (generated by WHILE_LEN)
>    
> 
> vadd.vv
> vse32.v v1,0(a3)
> add a4,a4,a2
>   

Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread juzhe.zh...@rivai.ai

>> Thanks for the detailed explanation.  Just to clarify - with RVV
>> there's only a single mask register, v0.t, or did you want to
>> say an instruction can only specify a single mask register?

RVV has 32 (v0~v31) vector register in total.
We can store vector data value or mask value in any of them.
We also have mask-logic instruction for example mask-and between any vector 
register.

However, any vector operation for example like vadd.vv can only  predicated by 
v0 (in asm is v0.t) which is the first vector register.
We can predicate vadd.vv with v1 - v31.

So, you can image every time we want to use a mask to predicate a vector 
operation, we should always first store the mask value
into v0.

So, we can write intrinsic sequence like this:

vmseq v0,v8,v9 (store mask value to v0)
vmslt v1,v10,v11 (store mask value to v1)
vmand v0,v0,v1
vadd.vv ...v0.t (predicate mask should always be mask).

>> ARM SVE would have a loop control mask and a separate mask
>> for the if (cond[i]) which would be combined with a mask-and
>> instruction to a third mask which is then used on the
>> predicated instructions.

Yeah, I know it. ARM SVE way is a more elegant way than RVV do. 
However, for RVV, we can't follow this flow.
We don't have a  "whilelo" instruction to generate loop control mask.
We only can do loop control with length generated by vsetvl.
And we can only use "v0" to mask predicate vadd.vv, and mask value can only 
generated by comparison or mask logical instructions. 

>> PowerPC and s390x might be able to use WHILE_LEN as well (though
>> they only have LEN variants of loads and stores) - of course
>> only "simulating it".  For the fixed-vector-length ISAs the
>> predicated vector loop IMHO makes most sense for the epilogue to
>> handle low-trip loops better.

Yeah, I wonder how they do the flow control (if (cond[i])). 
For RVV, you can image I will need to add a pattern 
LEN_MASK_LOAD/LEN_MASK_STORE (length generated by WHILE_LEN and mask generated 
by comparison)

I think we can CC IBM folks to see whether we can make WHILE_LEN works 
for both IBM and RVV ? 

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-04-12 16:42
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches; jeffreyalaw
Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support 
for auto-vectorization
On Wed, 12 Apr 2023, juzhe.zh...@rivai.ai wrote:
 
> Thank you very much for reply.
> 
> WHILE_LEN is the pattern that calculates the number of the elements of the 
> vector will be updated in each iteration.
> For RVV, we use vsetvl instruction to calculate the number of the elements of 
> the vector.
> 
> WHILE_ULT can not work for RVV since WHILE_ULT is generating mask to 
> predicate vector operation, but RVV do not
> use mask to do the loop strip mining (RVV only use mask for control flow 
> inside the loop).
> 
> Here is the example WHILE_ULT working in ARM SVE:
> https://godbolt.org/z/jKsT8E1hP 
> 
> The first example is:
> void foo (int32_t * __restrict a, int32_t * __restrict b, int n)
> {
> for (int i = 0; i < n; i++)
>   a[i] = a[i] + b[i];
> }
> 
> ARM SVE:
> foo:
> cmp w2, 0
> ble .L1
> mov x3, 0
> cntw    x4
> whilelo p0.s, wzr, w2
> .L3:
> ld1w    z1.s, p0/z, [x0, x3, lsl 2]
> ld1w    z0.s, p0/z, [x1, x3, lsl 2]
> add z0.s, z0.s, z1.s
> st1w    z0.s, p0, [x0, x3, lsl 2]
> add x3, x3, x4
> whilelo p0.s, w3, w2
> b.any   .L3
> .L1:
> ret
> 
> Here, whilelo will generate the mask according to w3 to w2.
> So for example, if w3 = 0, and w2 = 3 (Suppose machine vector length > 3).
> Then it will generate a mask with 0b111 mask to predicate loads and stores.
> 
> For RVV, we can't do that since RVV doesn't have whilelo instructions to 
> generate predicate mask.
> Also, we can't use mask as the predicate to do loop strip mining since RVV 
> only has 1 single mask 
> to handle flow control  inside the loop.
> 
> Instead, we use vsetvl to do the strip mining, so base on this, the same C 
> code, RVV ideal asm according RVV ISA should be:
> 
> preheader:
> a0 = n (the total number of the scalar should be calculated).
>  .
> .L3:
> vsetvli a5,a0,e32,m1,ta,ma  -----> WHILE_LEN pattern generate this 
> instruction, calculate the number of the elements should be updated
> vle32.v v1,0(a4)
> sub a0,a0,a5  -----> Decrement the induction variable 
> by the a5 (generated by WHILE_LEN)
>    
> 
> vadd.vv
> vse32.v v1,0(a3)
> add a4,a4,a2
> add a3,a3,a2
> bne a0,zero,.L3
> .L1:
> ret
> 
> So you will see, if n = 3 like I said for ARM SVE (Suppose machine vector 
> length > 3), then vsetvli a5,a0,e32,m1,ta,ma will
> generate a5 = 3, then the vle32.v/vadd.vv/vse32.v are all doing the operation 
> only on the element 0,  element 1, element 2.
> 
> Besides, 

Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-12 Thread Richard Biener via Gcc-patches
On Wed, Apr 12, 2023 at 1:52 AM Andrew MacLeod  wrote:
>
> This is a carry over from PR 108139.
>
> When we have a PHI node which has 2 arguments and one is undefined, we
> create an equivalence between the LHS and the non-undefined PHI
> argument.  THis allows us to perform certain optimizations.
>
> The problem is, when we are evaluating range-on-entry in the cache, its
> depends on where that equivalence is made, from where we have no context.
>
> a_3 = PHI <b_2, c_3>
>
> if c_3 is undefined,  then a_3 is equivalent to b_2... but b_2 is not
> equivalence to a_3 everywhere..   its a one way thing.
>
> 108139 fixed this by not evaluating any equivalences if the equivalence
> was the LHS.
>
> What it missed, was it possible we are calculating the range of a_3.
> b_2 is not defined in a phi node, so it happily used the equivalence.
> This PR demonstrates that we can't always use that equivlence either
> without more context.  There can be places in the IL where a_3 is used,
> but b_2 has moved to a new value within a loop.

I think that's only possible when b_2 flows in via a backedge (from BB3).
So isn't this all about backedges?  Indeed creating equivalences across
backedges is futile with SSA.  I think ranger requires dominators, so
to have the above scenario - a_3 used after the b_2 definition - requires
BB3 to be dominated by the a_3 definition which is what you could check.

>
> So we can't do this if either NAME or the equivalence is equal via a PHI
> node with an undefined argument.
>
> Unfortunately, this unsafe assumption is why PR 101912 is fixed.
> Fixing this issue properly is going to cause that to reopen as it is
> unsafe. (That PR is  a false uninitialized warning issue, rather than an
> wrong-code issue)
>
> This bootstraps on x86_64-pc-linux-gnu  with that single regression,
> which I have XFAILed for now.  OK for trunk?   Once Jakub verifies it
> actually fixes the execution problem.   we have no executable test . yet.
>
> Andrew
>
>
>


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-12 Thread Kito Cheng via Gcc-patches
Hi Richard:

> > In order to model LMUL in backend, we have to the combination of
> > scalar type and LMUL; possible LMUL is 1, 2, 4, 8, 1/2, 1/4, 1/8 - 8
> > different types of LMUL, and we'll have QI, HI, SI, DI, HF, SF and DF,
> > so basically we'll have 7 (LMUL type) * 7 (scalar type) here.
>
> Other archs have load/store-multiple instructions, IIRC those
> are modeled with the appropriate set of operands. Do RVV LMUL
> group inputs/outputs overlap with the non-LMUL grouped registers
> and can they be used as aliases or is this supposed to be
> implemented transparently on the register file level only?

LMUL and non-LMUL (or LMUL=1) modes use the same vector register file.

Reg for LMUL=1/2 : { {v0, v1, ...v31} }
Reg for LMUL=1 : { {v0, v1, ...v31} }
Reg for LMUL=2 : { {v0, v1}, {v2, v3}, ... {v30, v31} } // reg. must
align to multiple of 2.
Reg for LMUL=4 : { {v0, v1, v2, v3}, {v4, v5, v6, v7}, ... {v28, v29,
v30, v31} } // reg. must align to multiple of 4.
..
Reg for 2-tuples of LMUL=1 : { {v0, v1}, {v1, v2}, ... {v29, v30}, {v30, v31} }
Reg for 2-tuples of LMUL=2 : { {v0, v1, v2, v3}, {v2, v3, v4, v5}, ...
{v28, v29, v30, v31}, {v28, v29, v30, v31} } // reg. must align to
multiple of 2.
...

> But yes, implementing this as operations on multi-register
> ops with large modes is probably the only sensible approach.
>
> I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though? Can you
> explain? Is that supposed to virtually increase the number of
> registers? How do you represent r0:1/8:0 vs r0:1/8:3 (the first
> and the third "virtual" register decomposed from r0) in GCC? To
> me the natural way would be a subreg of r0?
>
> Somehow RVV seems to have more knobs than necessary for tuning
> the actual vector register layout (aka N axes but only N-1 dimensions
> thus the axes are

The concept of fractional LMUL is the same as the concept of AArch64's
partial SVE vectors,
so they can only access the lowest part, like SVE's partial vector.

We want to spill/restore the exact size of those modes (1/2, 1/4,
1/8), so adding dedicated modes for those partial vector modes should
be unavoidable IMO.

And even if we use sub-vector, we still need to define those partial
vector types.


Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-04-12 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Thu, 6 Apr 2023 at 16:05, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > On Tue, 4 Apr 2023 at 23:35, Richard Sandiford
>> >  wrote:
>> >> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
>> >> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > index cd9cace3c9b..3de79060619 100644
>> >> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > @@ -817,6 +817,62 @@ public:
>> >> >
>> >> >  class svdupq_impl : public quiet<function_base>
>> >> >  {
>> >> > +private:
>> >> > +  gimple *
>> >> > +  fold_nonconst_dupq (gimple_folder , unsigned factor) const
>> >> > +  {
>> >> > +/* Lower lhs = svdupq (arg0, arg1, ..., argN} into:
>> >> > +   tmp = {arg0, arg1, ..., arg<N-1>}
>> >> > +   lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, N-1, ...})  */
>> >> > +
>> >> > +/* TODO: Revisit to handle factor by padding zeros.  */
>> >> > +if (factor > 1)
>> >> > +  return NULL;
>> >>
>> >> Isn't the key thing here predicate vs. vector rather than factor == 1 vs.
>> >> factor != 1?  Do we generate good code for b8, where factor should be 1?
>> > Hi,
>> > It generates the following code for svdup_n_b8:
>> > https://pastebin.com/ypYt590c
>>
>> Hmm, yeah, not pretty :-)  But it's not pretty without either.
>>
>> > I suppose lowering to ctor+vec_perm_expr is not really useful
>> > for this case because it won't simplify ctor, unlike the above case of
>> > svdupq_s32 (x[0], x[1], x[2], x[3]);
>> > However I wonder if it's still a good idea to lower svdupq for predicates, 
>> > for
>> > representing svdupq (or other intrinsics) using GIMPLE constructs as
>> > far as possible ?
>>
>> It's possible, but I think we'd need an example in which its a clear
>> benefit.
> Sorry I posted for wrong test case above.
> For the following test:
> svbool_t f(uint8x16_t x)
> {
>   return svdupq_n_b8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
> x[8], x[9], x[10], x[11], x[12],
> x[13], x[14], x[15]);
> }
>
> Code-gen:
> https://pastebin.com/maexgeJn
>
> I suppose it's equivalent to following ?
>
> svbool_t f2(uint8x16_t x)
> {
>   svuint8_t tmp = svdupq_n_u8 ((bool) x[0], (bool) x[1], (bool) x[2],
> (bool) x[3],
>(bool) x[4], (bool) x[5], (bool) x[6],
> (bool) x[7],
>(bool) x[8], (bool) x[9], (bool) x[10],
> (bool) x[11],
>(bool) x[12], (bool) x[13], (bool)
> x[14], (bool) x[15]);
>   return svcmpne_n_u8 (svptrue_b8 (), tmp, 0);
> }

Yeah, this is essentially the transformation that the svdupq rtl
expander uses.  It would probably be a good idea to do that in
gimple too.

Thanks,
Richard

>
> which generates:
> f2:
> .LFB3901:
> .cfi_startproc
> movi    v1.16b, 0x1
> ptrue   p0.b, all
> cmeq    v0.16b, v0.16b, #0
> bic v0.16b, v1.16b, v0.16b
> dup z0.q, z0.q[0]
> cmpne   p0.b, p0/z, z0.b, #0
> ret
>
> Thanks,
> Prathamesh


New French PO file for 'gcc' (version 13.1-b20230409)

2023-04-12 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-13.1-b20230409.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread Richard Biener via Gcc-patches
On Wed, 12 Apr 2023, juzhe.zh...@rivai.ai wrote:

> Thank you very much for reply.
> 
> WHILE_LEN is the pattern that calculates the number of the elements of the
> vector that will be updated in each iteration.
> For RVV, we use vsetvl instruction to calculate the number of the elements of 
> the vector.
> 
> WHILE_ULT cannot work for RVV since WHILE_ULT generates a mask to
> predicate vector operations, but RVV does not
> use a mask to do the loop strip mining (RVV only uses a mask for control flow
> inside the loop).
> 
> Here is the example WHILE_ULT working in ARM SVE:
> https://godbolt.org/z/jKsT8E1hP 
> 
> The first example is:
> void foo (int32_t * __restrict a, int32_t * __restrict b, int n)
> {
> for (int i = 0; i < n; i++)
>   a[i] = a[i] + b[i];
> }
> 
> ARM SVE:
> foo:
> cmp w2, 0
> ble .L1
> mov x3, 0
> cntw    x4
> whilelo p0.s, wzr, w2
> .L3:
> ld1w    z1.s, p0/z, [x0, x3, lsl 2]
> ld1w    z0.s, p0/z, [x1, x3, lsl 2]
> add z0.s, z0.s, z1.s
> st1w    z0.s, p0, [x0, x3, lsl 2]
> add x3, x3, x4
> whilelo p0.s, w3, w2
> b.any   .L3
> .L1:
> ret
> 
> Here, whilelo will generate the mask according to w3 to w2.
> So for example, if w3 = 0, and w2 = 3 (Suppose machine vector length > 3).
> Then it will generate a mask with 0b111 mask to predicate loads and stores.
> 
> For RVV, we can't do that since RVV doesn't have whilelo instructions to 
> generate predicate mask.
> Also, we can't use mask as the predicate to do loop strip mining since RVV 
> only has 1 single mask 
> to handle flow control  inside the loop.
> 
> Instead, we use vsetvl to do the strip mining, so based on this, for the same C
> code, the ideal RVV asm according to the RVV ISA should be:
> 
> preheader:
> a0 = n (the total number of the scalar should be calculated).
>  .
> .L3:
> vsetvli a5,a0,e32,m1,ta,ma   ==> WHILE_LEN pattern generates this
> instruction, calculating the number of the elements to be updated
> vle32.v v1,0(a4)
> sub a0,a0,a5 ==> decrement the induction variable
> by the a5 (generated by WHILE_LEN)
> ...
> vadd.vv
> vse32.v v1,0(a3)
> add a4,a4,a2
> add a3,a3,a2
> bne a0,zero,.L3
> .L1:
> ret
> 
> So you will see, if n = 3 like I said for ARM SVE (Suppose machine vector 
> length > 3), then vsetvli a5,a0,e32,m1,ta,ma will
> generate a5 = 3, then the vle32.v/vadd.vv/vse32.v are all doing the operation 
> only on the element 0,  element 1, element 2.
> 
> Besides, WHILE_LEN is defined to make sure the result never exceeds the input
> operand, which is "a0".
> That means  sub a0,a0,a5 will make a0 never underflow 0.
> 
> I have tried to return Pmode in TARGET_VECTORIZE_GET_MASK_MODE 
> target hook and then use WHILE_ULT. 
> 
> But there are 2 issues:
> One is that current GCC is doing the flow from 0-based until the TEST_LIMIT. 
> Whereas the optimal flow of RVV I showed above
> is from "n", decreasing n until 0.  Trying to fit the current flow of
> GCC, RVV needs more instructions to do the loop strip mining.
> 
> Second is that if we return a Pmode in TARGET_VECTORIZE_GET_MASK_MODE 
> which not only specify the dest mode for WHILE_ULT but also the mask mode of 
> flow control.
> If we return Pmode which is used as the length for RVV. We can't use mask 
> mode like VNx2BI mode to do the flow control predicate.
> Here is another example:
> void foo2 (int32_t * __restrict a, int32_t * __restrict b, int32_t * restrict 
> cond, int n)
> {
> for (int i = 0; i < n; i++)
>   if (cond[i])
> a[i] = a[i] + b[i];
> }
> 
> ARM SVE:
> ld1w    z0.s, p0/z, [x2, x4, lsl 2]
> cmpne   p0.s, p0/z, z0.s, #0
> ld1w    z0.s, p0/z, [x0, x4, lsl 2]
> ld1w    z1.s, p0/z, [x1, x4, lsl 2]
> add z0.s, z0.s, z1.s
> st1w    z0.s, p0, [x0, x4, lsl 2]
> add x4, x4, x5
> whilelo p0.s, w4, w3
> b.any   .L8
> 
> Here we can see ARM use mask mode for both loop strip mining and flow
> control.
>
> Whereas, RVV use length generated by vsetvl (WHILE_LEN) to do the loop strip
> mining and mask generated by comparison to do the flow control.
> 
> So the ASM generated by my downstream LLVM/GCC:
> .L3:
> vsetvli a6,a3,e32,m1,ta,mu   ==> generate length to predicate 
> RVV operation. 
> vle32.v v0,(a2)
> sub a3,a3,a6  ==> decrease the induction variable 
> until 0.
> vmsne.vi v0,v0,0   ==> generate mask to predicate RVV
> operation. 
> vle32.v v24,(a0),v0.t   ===> here using v0.t is the only mask 
> register to predicate RVV operation
> vle32.v v25,(a1),v0.t
> vadd.vv v24,v24,v25
> vse32.v v24,(a0),v0.t
> add a2,a2,a4
> add a0,a0,a4
> add a1,a1,a4
> bne 

Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-12 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 11, 2023 at 07:52:29PM -0400, Andrew MacLeod wrote:
> This bootstraps on x86_64-pc-linux-gnu  with that single regression, which I
> have XFAILed for now.  OK for trunk?

Yes.

>   Once Jakub verifies it actually fixes
> the execution problem.  We have no executable test yet.

I have verified this fix both on the original clang testcase, and
on a self-contained testcase I've reduced overnight and this morning.

Ok to commit it to trunk incrementally after your commit?

BTW, I've wondered if it is just this uninitialized phi arg problem or if
I could reproduce it also if the phi arg was initialized but had the same
range as the current iteration is known to have due to a comparison (even
when the actual value is from the previous loop's iteration).  In the
test below, that would be Result.c = r_paren; before
while (!TheLexer.LexFromRawLexer (I)) loop.  But it wasn't miscompiled in
that case.

2023-04-12  Jakub Jelinek  

PR tree-optimization/109462
* g++.dg/opt/pr109462.C: New test.

--- gcc/testsuite/g++.dg/opt/pr109462.C.jj  2023-04-12 09:58:23.085603031 
+0200
+++ gcc/testsuite/g++.dg/opt/pr109462.C 2023-04-12 09:54:22.472079711 +0200
@@ -0,0 +1,94 @@
+// PR tree-optimization/109462
+// { dg-do run { target c++11 } }
+// { dg-options "-O2" }
+
+struct A {
+  A (const char *);
+  A (const char *, int);
+  bool empty ();
+  int size ();
+  bool equals (A);
+  A trim (char);
+  A trim ();
+};
+[[gnu::noipa]] A::A (const char *) {}
+[[gnu::noipa]] A::A (const char *, int) { __builtin_abort (); }
+[[gnu::noipa]] bool A::empty () { __builtin_abort (); }
+[[gnu::noipa]] int A::size () { __builtin_abort (); }
+[[gnu::noipa]] bool A::equals (A) { return true; }
+[[gnu::noipa]] A A::trim (char) { __builtin_abort (); }
+[[gnu::noipa]] A A::trim () { __builtin_abort (); }
+
+enum B { raw_identifier = 6, l_paren = 21, r_paren = 22 };
+[[gnu::noipa]] bool isAnyIdentifier (B) { return true; }
+[[gnu::noipa]] bool isStringLiteral (B) { __builtin_abort (); }
+
+struct C {
+  B c;
+  B getKind () { return c; }
+  bool is (B x) { return c == x; }
+  unsigned getLength () { __builtin_abort (); }
+  A getRawIdentifier () {
+A x ("");
+c == raw_identifier ? void () : __builtin_abort ();
+return x;
+  }
+  const char *getLiteralData ();
+};
+[[gnu::noipa]] const char *C::getLiteralData () { __builtin_abort (); }
+
+struct D {
+  D ();
+  bool LexFromRawLexer (C &);
+};
+[[gnu::noipa]] D::D () {}
+[[gnu::noipa]] bool D::LexFromRawLexer (C ) {
+  static int cnt;
+  C tok[] = { { raw_identifier }, { l_paren }, { raw_identifier }, { r_paren } 
};
+  t = tok[cnt++];
+  return false;
+}
+
+bool ok = false;
+[[gnu::noipa]] void reportEmptyContextError ()
+{
+  ok = true;
+}
+
+[[gnu::noipa]] void
+VisitObjCMessageExpr ()
+{
+  D TheLexer;
+  C I;
+  C Result;
+  int p_count = 0;
+  while (!TheLexer.LexFromRawLexer (I)) {
+if (I.getKind () == l_paren)
+  ++p_count;
+if (I.getKind () == r_paren) {
+  if (p_count == 1)
+break;
+  --p_count;
+}
+Result = I;
+  }
+  if (isAnyIdentifier (Result.getKind ())) {
+if (Result.getRawIdentifier ().equals ("nil")) {
+  reportEmptyContextError ();
+  return;
+}
+  }
+  if (!isStringLiteral (Result.getKind ()))
+return;
+  A Comment = A (Result.getLiteralData (), Result.getLength ()).trim ('"');
+  if ((Comment.trim ().size () == 0 && Comment.size () > 0) || Comment.empty 
())
+reportEmptyContextError ();
+}
+
+int
+main ()
+{
+  VisitObjCMessageExpr ();
+  if (!ok)
+__builtin_abort ();
+}


Jakub



Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread juzhe.zh...@rivai.ai
Thank you very much for reply.

WHILE_LEN is the pattern that calculates the number of the elements of the
vector that will be updated in each iteration.
For RVV, we use vsetvl instruction to calculate the number of the elements of 
the vector.

WHILE_ULT cannot work for RVV since WHILE_ULT generates a mask to predicate
vector operations, but RVV does not
use a mask to do the loop strip mining (RVV only uses a mask for control flow
inside the loop).

Here is the example WHILE_ULT working in ARM SVE:
https://godbolt.org/z/jKsT8E1hP 

The first example is:
void foo (int32_t * __restrict a, int32_t * __restrict b, int n)
{
for (int i = 0; i < n; i++)
  a[i] = a[i] + b[i];
}

ARM SVE:
foo:
cmp w2, 0
ble .L1
mov x3, 0
cntw    x4
whilelo p0.s, wzr, w2
.L3:
ld1w    z1.s, p0/z, [x0, x3, lsl 2]
ld1w    z0.s, p0/z, [x1, x3, lsl 2]
add z0.s, z0.s, z1.s
st1w    z0.s, p0, [x0, x3, lsl 2]
add x3, x3, x4
whilelo p0.s, w3, w2
b.any   .L3
.L1:
ret

Here, whilelo will generate the mask according to w3 to w2.
So for example, if w3 = 0, and w2 = 3 (Suppose machine vector length > 3).
Then it will generate a mask with 0b111 mask to predicate loads and stores.

For RVV, we can't do that since RVV doesn't have whilelo instructions to 
generate predicate mask.
Also, we can't use mask as the predicate to do loop strip mining since RVV only 
has 1 single mask 
to handle flow control  inside the loop.

Instead, we use vsetvl to do the strip mining, so based on this, for the same C
code, the ideal RVV asm according to the RVV ISA should be:

preheader:
a0 = n (the total number of the scalar should be calculated).
 .
.L3:
vsetvli a5,a0,e32,m1,ta,ma   ==> WHILE_LEN pattern generates this
instruction, calculating the number of the elements to be updated
vle32.v v1,0(a4)
sub a0,a0,a5 ==> decrement the induction variable by
the a5 (generated by WHILE_LEN)
...
vadd.vv
vse32.v v1,0(a3)
add a4,a4,a2
add a3,a3,a2
bne a0,zero,.L3
.L1:
ret

So you will see, if n = 3 like I said for ARM SVE (Suppose machine vector 
length > 3), then vsetvli a5,a0,e32,m1,ta,ma will
generate a5 = 3, then the vle32.v/vadd.vv/vse32.v are all doing the operation 
only on the element 0,  element 1, element 2.

Besides, WHILE_LEN is defined to make sure the result never exceeds the input
operand, which is "a0".
That means  sub a0,a0,a5 will make a0 never underflow 0.

I have tried to return Pmode in TARGET_VECTORIZE_GET_MASK_MODE 
target hook and then use WHILE_ULT. 

But there are 2 issues:
One is that current GCC is doing the flow from 0-based until the TEST_LIMIT. 
Whereas the optimal flow of RVV I showed above
is from "n", decreasing n until 0.  Trying to fit the current flow of GCC,
RVV needs more instructions to do the loop strip mining.

Second is that if we return a Pmode in TARGET_VECTORIZE_GET_MASK_MODE 
which not only specify the dest mode for WHILE_ULT but also the mask mode of 
flow control.
If we return Pmode which is used as the length for RVV. We can't use mask mode 
like VNx2BI mode to do the flow control predicate.
Here is another example:
void foo2 (int32_t * __restrict a, int32_t * __restrict b, int32_t * restrict 
cond, int n)
{
for (int i = 0; i < n; i++)
  if (cond[i])
a[i] = a[i] + b[i];
}

ARM SVE:
ld1w    z0.s, p0/z, [x2, x4, lsl 2]
cmpne   p0.s, p0/z, z0.s, #0
ld1w    z0.s, p0/z, [x0, x4, lsl 2]
ld1w    z1.s, p0/z, [x1, x4, lsl 2]
add z0.s, z0.s, z1.s
st1w    z0.s, p0, [x0, x4, lsl 2]
add x4, x4, x5
whilelo p0.s, w4, w3
b.any   .L8

Here we can see ARM use mask mode for both loop strip mining and flow control.

Whereas, RVV use length generated by vsetvl (WHILE_LEN) to do the loop strip
mining and mask generated by comparison to do the flow control.

So the ASM generated by my downstream LLVM/GCC:
.L3:
vsetvli a6,a3,e32,m1,ta,mu   ==> generate length to predicate 
RVV operation. 
vle32.v v0,(a2)
sub a3,a3,a6  ==> decrease the induction variable until 
0.
vmsne.vi v0,v0,0   ==> generate mask to predicate RVV
operation. 
vle32.v v24,(a0),v0.t   ===> here using v0.t is the only mask 
register to predicate RVV operation
vle32.v v25,(a1),v0.t
vadd.vv v24,v24,v25
vse32.v v24,(a0),v0.t
add a2,a2,a4
add a0,a0,a4
add a1,a1,a4
bne a3,zero,.L3
.L1:
ret


This is how RVV works.
Feel free to comment if you have any questions.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-04-12 15:00
To: Richard Sandiford
CC: juzhe.zh...@rivai.ai; gcc-patches; jeffreyalaw
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 

New template for 'gcc' made available

2023-04-12 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'gcc' has been made available
to the language teams for translation.  It is archived as:

https://translationproject.org/POT-files/gcc-13.1-b20230409.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

https://gcc.gnu.org/pub/gcc/snapshots/13-20230409/gcc-13-20230409.tar.xz

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-12 Thread Richard Biener via Gcc-patches
On Tue, 11 Apr 2023, Kito Cheng wrote:

> Let me give more explanation of why RISC-V vector needs so many more modes
> than AArch64.
> 
> The following will use "RVV" as an abbreviation for "RISC-V Vector"
> instructions.
> 
> There are two key points here:
> 
> - RVV has a concept called LMUL - you can understand that as register
> grouping, we can group up to 8 adjacent registers together and then
> operate at once, e.g. one vadd can operate on adding two 8-reg groups
> at once.
> - We have segment load/store that require vector tuple types. -
> AArch64 has similar stuffs on both Neon and SVE, e.g. int32x2x2_t or
> svint32x2_t.
> 
> In order to model LMUL in the backend, we have to use the combination of
> scalar type and LMUL; possible LMUL is 1, 2, 4, 8, 1/2, 1/4, 1/8 - 7
> different types of LMUL, and we'll have QI, HI, SI, DI, HF, SF and DF,
> so basically we'll have 7 (LMUL type) * 7 (scalar type) here.

Other archs have load/store-multiple instructions, IIRC those
are modeled with the appropriate set of operands.  Do RVV LMUL
group inputs/outputs overlap with the non-LMUL grouped registers
and can they be used as aliases or is this supposed to be
implemented transparently on the register file level only?

But yes, implementing this as operations on multi-register
ops with large modes is probably the only sensible approach.

I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though?  Can you
explain?  Is that supposed to virtually increase the number of
registers?  How do you represent r0:1/8:0 vs r0:1/8:3 (the first
and the third "virtual" register decomposed from r0) in GCC?  To
me the natural way would be a subreg of r0?

Somehow RVV seems to have more knobs than necessary for tuning
the actual vector register layout (aka N axes but only N-1 dimensions
thus the axes are not orthogonal).

> Okay, let's talk about tuple type AArch64 also having tuple type, but
> why is it not having such a huge number of modes? It mainly cause by
> LMUL; use a concrete example to explain why this cause different
> design on machine mode, using scalable vector mode with SI mode tuple
> here:
> 
> AArch64: svint32_t (VNx4SI) svint32x2_t (VNx8SI) svint32x3_t (VNx12SI)
> svint32x4_t (VNx16SI)
> 
> AArch64 only has up to 4-tuple, but RISC-V could have up to 8-tuple,
> so we already have 8 different types for each scalar mode even though
> we don't count LMUL concept yet.
> 
> RISC-V*: vint32m1_t (VNx4SI) vint32m1x2_t (VNx8SI) vint32m1x3_t
> (VNx12SI) vint32m1x4_t (VNx16SI) vint32m1x5_t (VNx20SI) vint32m1x6_t
> (VNx24SI) vint32m1x7_t (VNx28SI) vint32m1x8_t (VNx32SI)
> 
> Using VLEN=128 as the base type system, you can ignore it if you don't
> understand the meaning for now.
> 
> And let's consider LMUL now, add LMUL=2 case here, RVV has a
> constraint that the LMUL * NF(NF-tuple) must be less or equal to 8, so
> we have only 3 extra modes for LMUL=2.
> 
> RISC-V*: vint32m2_t (VNx8SI) vint32m2x2_t (VNx16SI) vint32m2x3_t
> (VNx24SI) vint32m2x4_t (VNx32SI)
> 
> However, there is a big problem: RVV has different register constraints
> for different LMUL types, LMUL <= 1 can use any register, LMUL=2 type
> requires register aligned to multiple-of-2 (v0, v2, ...), and LMUL=4 type
> requires register aligned to multiple-of-4 (v0, v4, ...).
> 
> So vint32m1x2_t (LMUL=1x2) and vint32m2_t (LMUL=2) have the same size
> and NUNIT, but they have different register constraint, vint32m1x2_t
> is LMUL 1, so we don't have register constraint, but vint32m2_t is
> LMUL 2 so it has reg. constraint, it must be aligned to multiple-of-2.
> 
> Based on the above reason, those tuple types must have separated
> machine mode even if they have the same size and NUNIT.
> 
> Why Neon and SVE didn't have such an issue? Because SVE and Neon
> didn't have the concept of LMUL, so tuple type in SVE and Neon won't
> have two vector types that have the same size but different register
> constraints or alignment - one size is one type.
> 
> So based on LMUL and register constraint issue of tuple type, we must
> have 37 types for vector tuples, and plus 48 modes variable-length
> vector mode, and 42 scalar mode - so we have ~140 modes now, it sounds
> like still less than 256, so what happened?
> 
> 
> RVV has one more thing special thing in our type system due to ISA
> design, the minimal vector length of RVV is 32 bit unlike SVE
> guarantee, the minimal is 128 bits, so we did some tricks one our type
> system is we have a different mode for minimal vector length
> (MIN_VLEN) is 32, 64 or large or equal to 128, this design is because
> it would be more friendly for vectorizer, and also model things
> precisely for better code gen.
> 
> e.g.
> 
> vint32m1_t is VNx1SI in MIN_VLEN>=32
> 
> vint32m1_t is VNx2SI in MIN_VLEN>=64
> 
> vint32m1_t is VNx4SI in MIN_VLEN>=128
> 
> So actually we will have 37 * 3 modes for vector tuple mode, and now
> ~210 modes now (the result is little different than JuZhe's number
> since I ignore some mode isn't used in C, but it defined in machine
> mode due 

Re: [PATCH] gcov: add info about "calls" to JSON output format

2023-04-12 Thread Martin Liška
On 4/11/23 11:23, Richard Biener wrote:
> On Thu, Apr 6, 2023 at 3:58 PM Martin Liška  wrote:
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed after stage1 opens?
> 
> Did we release a compiler with version 1?  If not we might want to sneak

Yes, all compilers starting with 9.1 emit version 1.

> this in before 13.1 ...

Yep, I would welcome sneaking in.

> 
> Up to Honza.

PING: Honza!

Thanks,
Martin

> 
> Thanks,
> Richard.
> 
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> * doc/gcov.texi: Document the new "calls" field and document
>> the API bump.
>> * gcov.cc (output_intermediate_json_line): Output info about
>> calls.
>> (generate_results): Bump version to 2.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.dg/gcov/gcov-17.C: Add call to a noreturn function.
>> * g++.dg/gcov/test-gcov-17.py: Cover new format.
>> * lib/gcov.exp: Add options for gcov that emit the extra info.
>> ---
>>  gcc/doc/gcov.texi | 27 +--
>>  gcc/gcov.cc   | 12 +-
>>  gcc/testsuite/g++.dg/gcov/gcov-17.C   |  7 ++
>>  gcc/testsuite/g++.dg/gcov/test-gcov-17.py | 17 ++
>>  gcc/testsuite/lib/gcov.exp|  2 +-
>>  5 files changed, 57 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
>> index d39cce3a683..6739ebb3643 100644
>> --- a/gcc/doc/gcov.texi
>> +++ b/gcc/doc/gcov.texi
>> @@ -195,7 +195,7 @@ Structure of the JSON is following:
>>  @{
>>"current_working_directory": "foo/bar",
>>"data_file": "a.out",
>> -  "format_version": "1",
>> +  "format_version": "2",
>>"gcc_version": "11.1.1 20210510"
>>"files": ["$file"]
>>  @}
>> @@ -214,6 +214,12 @@ a compilation unit was compiled
>>  @item
>>  @var{format_version}: semantic version of the format
>>
>> +Changes in version @emph{2}:
>> +@itemize @bullet
>> +@item
>> +@var{calls}: information about function calls is added
>> +@end itemize
>> +
>>  @item
>>  @var{gcc_version}: version of the GCC compiler
>>  @end itemize
>> @@ -292,6 +298,7 @@ Each @var{line} has the following form:
>>  @smallexample
>>  @{
>>"branches": ["$branch"],
>> +  "calls": ["$call"],
>>"count": 2,
>>"line_number": 15,
>>"unexecuted_block": false,
>> @@ -299,7 +306,7 @@ Each @var{line} has the following form:
>>  @}
>>  @end smallexample
>>
>> -Branches are present only with @var{-b} option.
>> +Branches and calls are present only with @var{-b} option.
>>  Fields of the @var{line} element have following semantics:
>>
>>  @itemize @bullet
>> @@ -341,6 +348,22 @@ Fields of the @var{branch} element have following 
>> semantics:
>>  @var{throw}: true when the branch is an exceptional branch
>>  @end itemize
>>
>> +Each @var{call} has the following form:
>> +
>> +@smallexample
>> +@{
>> +  "returned": 11,
>> +@}
>> +@end smallexample
>> +
>> +Fields of the @var{call} element have following semantics:
>> +
>> +@itemize @bullet
>> +@item
>> +@var{returned}: number of times a function call returned (call count is 
>> equal
>> +to @var{line::count})
>> +@end itemize
>> +
>>  @item -H
>>  @itemx --human-readable
>>  Write counts in human readable format (like 24.6k).
>> diff --git a/gcc/gcov.cc b/gcc/gcov.cc
>> index 2ec7248cc0e..88324143640 100644
>> --- a/gcc/gcov.cc
>> +++ b/gcc/gcov.cc
>> @@ -1116,6 +1116,9 @@ output_intermediate_json_line (json::array *object,
>>json::array *branches = new json::array ();
>>lineo->set ("branches", branches);
>>
>> +  json::array *calls = new json::array ();
>> +  lineo->set ("calls", calls);
>> +
>>vector::const_iterator it;
>>if (flag_branches)
>>  for (it = line->branches.begin (); it != line->branches.end ();
>> @@ -1130,6 +1133,13 @@ output_intermediate_json_line (json::array *object,
>>  new json::literal ((*it)->fall_through));
>> branches->append (branch);
>>   }
>> +   else if ((*it)->is_call_non_return)
>> + {
>> +   json::object *call = new json::object ();
>> +   gcov_type returns = (*it)->src->count - (*it)->count;
>> +   call->set ("returned", new json::integer_number (returns));
>> +   calls->append (call);
>> + }
>>}
>>
>>object->append (lineo);
>> @@ -1523,7 +1533,7 @@ generate_results (const char *file_name)
>>gcov_intermediate_filename = get_gcov_intermediate_filename (file_name);
>>
>>json::object *root = new json::object ();
>> -  root->set ("format_version", new json::string ("1"));
>> +  root->set ("format_version", new json::string ("2"));
>>root->set ("gcc_version", new json::string (version_string));
>>
>>if (bbg_cwd != NULL)
>> diff --git a/gcc/testsuite/g++.dg/gcov/gcov-17.C 
>> b/gcc/testsuite/g++.dg/gcov/gcov-17.C
>> index d11883cfd39..efe019599a5 100644
>> --- a/gcc/testsuite/g++.dg/gcov/gcov-17.C
>> +++ 

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-12 Thread Richard Biener via Gcc-patches
On Tue, 11 Apr 2023, Richard Sandiford wrote:

> "juzhe.zh...@rivai.ai"  writes:
> > Hi, Richards. 
> > Kindly Ping this patch. 
> > This is the most important patch for RVV auto-vectorization support.
> > Bootstraped on X86 has passed.
> 
> Can it wait for GCC 14?  It doesn't seem like stage 4 material.
> 
> Also, pinging after 5 days seems a bit soon.  It's been a 4-day
> holiday weekend for much of Europe.

Also can you explain why using WHILE_ULT is not possible?  (I've
successfully - to some extent - done that for AVX512 for example)

The patch lacks the description of what WHILE_LEN actually is.

Richard.

> Thanks,
> Richard
> 
> > Feel free to comments.
> >
> > Thanks.
> >
> >
> > juzhe.zh...@rivai.ai
> >  
> > From: juzhe.zhong
> > Date: 2023-04-07 09:47
> > To: gcc-patches
> > CC: richard.sandiford; rguenther; jeffreyalaw; Juzhe-Zhong
> > Subject: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
> > auto-vectorization
> > From: Juzhe-Zhong 
> >  
> > This patch is to add WHILE_LEN pattern.
> > It's inspired by RVV ISA simple "vvaddint32.s" example:
> > https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s
> >  
> > More details are in "vect_set_loop_controls_by_while_len" implementation
> > and comments.
> >  
> > Consider such following case:
> > #define N 16
> > int src[N];
> > int dest[N];
> >  
> > void
> > foo (int n)
> > {
> >   for (int i = 0; i < n; i++)
> > dest[i] = src[i];
> > }
> >  
> > -march=rv64gcv -O3 --param riscv-autovec-preference=scalable 
> > -fno-vect-cost-model -fno-tree-loop-distribute-patterns:
> >  
> > foo:
> > ble a0,zero,.L1
> > lui a4,%hi(.LANCHOR0)
> > addia4,a4,%lo(.LANCHOR0)
> > addia3,a4,64
> > csrra2,vlenb
> > .L3:
> > vsetvli a5,a0,e32,m1,ta,ma
> > vle32.v v1,0(a4)
> > sub a0,a0,a5
> > vse32.v v1,0(a3)
> > add a4,a4,a2
> > add a3,a3,a2
> > bne a0,zero,.L3
> > .L1:
> > ret
> >  
> > gcc/ChangeLog:
> >  
> > * doc/md.texi: Add WHILE_LEN support.
> > * internal-fn.cc (while_len_direct): Ditto.
> > (expand_while_len_optab_fn): Ditto.
> > (direct_while_len_optab_supported_p): Ditto.
> > * internal-fn.def (WHILE_LEN): Ditto.
> > * optabs.def (OPTAB_D): Ditto.
> > * tree-ssa-loop-manip.cc (create_iv): Ditto.
> > * tree-ssa-loop-manip.h (create_iv): Ditto.
> > * tree-vect-loop-manip.cc (vect_set_loop_controls_by_while_len): 
> > Ditto.
> > (vect_set_loop_condition_partial_vectors): Ditto.
> > * tree-vect-loop.cc (vect_get_loop_len): Ditto.
> > * tree-vect-stmts.cc (vectorizable_store): Ditto.
> > (vectorizable_load): Ditto.
> > * tree-vectorizer.h (vect_get_loop_len): Ditto.
> >  
> > ---
> > gcc/doc/md.texi |  14 +++
> > gcc/internal-fn.cc  |  29 ++
> > gcc/internal-fn.def |   1 +
> > gcc/optabs.def  |   1 +
> > gcc/tree-ssa-loop-manip.cc  |   4 +-
> > gcc/tree-ssa-loop-manip.h   |   2 +-
> > gcc/tree-vect-loop-manip.cc | 186 ++--
> > gcc/tree-vect-loop.cc   |  35 +--
> > gcc/tree-vect-stmts.cc  |   9 +-
> > gcc/tree-vectorizer.h   |   4 +-
> > 10 files changed, 264 insertions(+), 21 deletions(-)
> >  
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 8e3113599fd..72178ab014c 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -4965,6 +4965,20 @@ for (i = 1; i < operand3; i++)
> >operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
> > @end smallexample
> > +@cindex @code{while_len@var{m}@var{n}} instruction pattern
> > +@item @code{while_len@var{m}@var{n}}
> > +Set operand 0 to the number of active elements in vector will be updated 
> > value.
> > +operand 1 is the total elements need to be updated value.
> > +operand 2 is the vectorization factor.
> > +The operation is equivalent to:
> > +
> > +@smallexample
> > +operand0 = MIN (operand1, operand2);
> > +operand2 can be const_poly_int or poly_int related to vector mode size.
> > +Some target like RISC-V has a standalone instruction to get MIN (n, MODE 
> > SIZE) so
> > +that we can reduce a use of general purpose register.
> > +@end smallexample
> > +
> > @cindex @code{check_raw_ptrs@var{m}} instruction pattern
> > @item @samp{check_raw_ptrs@var{m}}
> > Check whether, given two pointers @var{a} and @var{b} and a length 
> > @var{len},
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 6e81dc05e0e..5f44def90d3 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -127,6 +127,7 @@ init_internal_fns ()
> > #define cond_binary_direct { 1, 1, true }
> > #define cond_ternary_direct { 1, 1, true }
> > #define while_direct { 0, 2, false }
> > +#define while_len_direct { 0, 0, false }
> > #define fold_extract_direct { 2, 2, false }
> > #define fold_left_direct { 1, 1, 

Re: [PATCH] Fortran: fix functions with entry and pointer/allocatable result [PR104312]

2023-04-12 Thread Paul Richard Thomas via Gcc-patches
Hi Harald,

The patch looks good to me - OK for mainline.

Thanks

Paul


On Tue, 11 Apr 2023 at 21:12, Harald Anlauf via Fortran 
wrote:

> Dear all,
>
> the testcase in the PR by Gerhard exhibited a mis-treatment of
> the function decl of the entry master if the function result
> had a pointer attribute and the translation unit was compiled
> with -ff2c.  We actually should not use the peculiar special
> treatment for default-real functions in that case, as -ff2c is
> reserved for function results that can be expressed in Fortran77,
> and POINTER was not allowed in that standard.  Same for complex.
>
> Furthermore, it turned out that ALLOCATABLE function results
> were not yet handled for functions with entries, even without
> -ff2c.  Adding support for this was straightforward.
>
> I also fixed a potential buffer overflow for a generated
> internal symbol.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> Thanks,
> Harald
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


[PATCH] tree-optimization/109434 - bogus DSE of throwing call LHS

2023-04-12 Thread Richard Biener via Gcc-patches
The byte tracking of call LHS didn't properly handle possibly
throwing calls, which causes bogus DSE and in turn, for the
testcase, a bogus uninit diagnostic and (unreliable) wrong-code.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/109434
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Properly
handle possibly throwing calls when processing the LHS
and may-defs are not OK.

* g++.dg/opt/pr109434.C: New testcase.
---
 gcc/testsuite/g++.dg/opt/pr109434.C | 28 
 gcc/tree-ssa-dse.cc |  3 ++-
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr109434.C

diff --git a/gcc/testsuite/g++.dg/opt/pr109434.C 
b/gcc/testsuite/g++.dg/opt/pr109434.C
new file mode 100644
index 000..cffa327fd9b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr109434.C
@@ -0,0 +1,28 @@
+// { dg-do compile }
+// { dg-require-effective-target c++17 }
+// { dg-options "-O2 -Wall" }
+
+#include <optional>
+#include <stdexcept>
+
+std::optional<int> foo()
+{
+  volatile int x = 1;
+  if (x)
+throw std::runtime_error("haha");
+  return 42;
+}
+
+int main()
+{
+  std::optional<int> optInt;
+  try {
+  // We falsely DSEd the LHS of the call even though foo throws
+  // which results in an uninitialized diagnostic
+  optInt = foo();
+  } catch (...) {
+  return optInt.has_value();
+  }
+  std::optional<double> optDbl{optInt};
+  return optDbl ? optDbl.value () : 2.0;
+}
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 4f8a44fbba0..eabe8ba4522 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -179,7 +179,8 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write, 
bool may_def_ok = false)
 }
   if (tree lhs = gimple_get_lhs (stmt))
 {
-  if (TREE_CODE (lhs) != SSA_NAME)
+  if (TREE_CODE (lhs) != SSA_NAME
+ && (may_def_ok || !stmt_could_throw_p (cfun, stmt)))
{
  ao_ref_init (write, lhs);
  return true;
-- 
2.35.3


[PATCH] tree-optimization/109469 - SLP with returns-twice region start

2023-04-12 Thread Richard Biener via Gcc-patches
The following avoids an SLP region starting with a returns-twice
call where we cannot insert stmts at the head.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/109469
* tree-vect-slp.cc (vect_slp_function): Skip region starts with
a returns-twice call.

* gcc.dg/torture/pr109469.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr109469.c | 15 +++
 gcc/tree-vect-slp.cc| 19 ---
 2 files changed, 31 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr109469.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr109469.c 
b/gcc/testsuite/gcc.dg/torture/pr109469.c
new file mode 100644
index 000..d05a93b6783
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr109469.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+__attribute__((returns_twice)) int foo();
+
+struct xio myproc;
+struct xio {
+  void (*read_proc)();
+  void (*write_proc)();
+};
+
+void dummy_write_proc() {
+  switch (foo())
+  default:
+myproc.read_proc = myproc.write_proc = dummy_write_proc;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 356bdfb93d9..d73deaecce0 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7671,10 +7671,23 @@ vect_slp_function (function *fun)
{
  r |= vect_slp_bbs (bbs, NULL);
  bbs.truncate (0);
- bbs.quick_push (bb);
}
-  else
-   bbs.safe_push (bb);
+
+  /* We need to be able to insert at the head of the region which
+we cannot for region starting with a returns-twice call.  */
+  if (bbs.is_empty ())
+   if (gcall *first = safe_dyn_cast <gcall *> (first_stmt (bb)))
+ if (gimple_call_flags (first) & ECF_RETURNS_TWICE)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"skipping bb%d as start of region as it "
+"starts with returns-twice call\n",
+bb->index);
+ continue;
+   }
+
+  bbs.safe_push (bb);
 
   /* When we have a stmt ending this block and defining a
 value we have to insert on edges when inserting after it for
-- 
2.35.3


Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-12 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 11, 2023 at 07:26:07PM -0600, Jeff Law wrote:
> I did bootstrap on riscv, but not a regression test, that's spinning right
> now.
> 
> Jeff

> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 22bf8e1ec89..c41d8a09b3b 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -10055,9 +10055,10 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx 
> varop,
>  
>/* See what bits may be nonzero in VAROP.  Unlike the general case of
>   a call to nonzero_bits, here we don't care about bits outside
> - MODE.  */
> + MODE unless WORD_REGISTER_OPERATIONS is true.  */

I would have expected something like
WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
as the condition to use word_mode, rather than just
WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
used as is, not a narrower word_mode instead of them.

> -  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
> +  enum machine_mode tmode = WORD_REGISTER_OPERATIONS ? word_mode : mode;
> +  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
>  
>/* Turn off all bits in the constant that are known to already be zero.
>   Thus, if the AND isn't needed at all, we will have CONSTOP == 
> NONZERO_BITS
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 3b33afa2461..5f6f70491d8 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3752,7 +3752,10 @@ simplify_context::simplify_binary_operation_1 
> (rtx_code code,
>   return op0;
>if (HWI_COMPUTABLE_MODE_P (mode))
>   {
> -   HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
> +   /* When WORD_REGISTER_OPERATIONS is true, we need to know the
> +  nonzero bits in WORD_MODE rather than MODE.  */
> +   HOST_WIDE_INT nzop0
> + = nonzero_bits (trueop0, WORD_REGISTER_OPERATIONS ? word_mode : 
> mode);
> HOST_WIDE_INT nzop1;
> if (CONST_INT_P (trueop1))
>   {

Regarding my earlier comments for this spot, the later code does
  nzop1 = nonzero_bits (trueop1, mode);
  /* If we are clearing all the nonzero bits, the result is zero.  */
  if ((nzop1 & nzop0) == 0
  && !side_effects_p (op0) && !side_effects_p (op1))
return CONST0_RTX (mode);
and because nonzero_bits in word_mode, if that is wider, might merely have
more bits set above mode while nzop1 will not have those bits set, I think it
is fine the way you wrote it (except for the precision check).

Jakub



[PATCH] testsuite: filter out warning noise for CWE-1341 test

2023-04-12 Thread Jiufu Guo via Gcc-patches
Hi,

The case file-CWE-1341-example.c checks [CWE-1341] (`double-fclose`).
On some systems, however, besides [CWE-1341] a [CWE-415] message is
also reported. On those systems, attribute `malloc` may be attached to
fopen:
```
# 258 "/usr/include/stdio.h" 3 4
extern FILE *fopen (const char *__restrict __filename,
  const char *__restrict __modes)
  __attribute__ ((__malloc__)) __attribute__ ((__malloc__ (fclose, 1))) ;

or say: __attribute_malloc__ __attr_dealloc_fclose __wur;
```

It would be OK to suppress messages other than CWE-1341 for this case.
This patch adds -Wno-analyzer-double-free to make the case pass on
those systems.

Tested on ppc64 both BE and LE.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/testsuite/ChangeLog:

PR target/108722
* gcc.dg/analyzer/file-CWE-1341-example.c: Update.

---
 gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c 
b/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
index 2add3cb109b..830cb0376ea 100644
--- a/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
+++ b/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
@@ -19,6 +19,9 @@
 
IN NO EVENT SHALL THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS 
SPONSORED BY (IF ANY), THE MITRE CORPORATION, ITS BOARD OF TRUSTEES, OFFICERS, 
AGENTS, AND EMPLOYEES BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 
CONNECTION WITH THE INFORMATION OR THE USE OR OTHER DEALINGS IN THE CWE.  */
 
+/* This case checks double-fclose only, suppress other warning.  */
+/* { dg-additional-options -Wno-analyzer-double-free } */
+
 #include 
 #include 
 #include 
-- 
2.31.1