Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zh...@rivai.ai
Hi, I have checked SDNode in LLVM, which is a data structure similar to RTX
in GCC.
An SDNode in LLVM occupies 80 bytes.
Is there a tool we can use to measure the memory consumption of the whole of
GCC with the extended-size RTX?



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-11 04:42
To: juzhe.zhong; jakub
CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
 
 
On 4/10/23 09:22, juzhe.zh...@rivai.ai wrote:
> Yeah, aarch64 already has 178, RVV has much more types than aarch64...
> You can see intrinsic doc:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
>  
> 
> api number explodes.
> 
> As well as tuples types in RVV much more than aarch64.
> Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV?
> Not sure.
> I think kito may help for this.
I think it's a discussion we need to have.  I really expect efforts to 
have > 256 modes are going to be very controversial.
 
jeff
 
 
 


Re: [PATCH] Fortran: resolve correct generic with TYPE(C_PTR) arguments [PR61615]

2023-04-10 Thread Jerry D via Gcc-patches

On 4/10/23 1:49 PM, Harald Anlauf via Fortran wrote:

Dear all,

when comparing formal and actual arguments of a procedure, there was no
check of rank for derived types from intrinsic module ISO_C_BINDING.
This could lead to a wrong resolution of generic procedures with dummy
argument of related types, see PR.  This was likely an oversight.

The attached fix is simple and regtests cleanly on x86_64-pc-linux-gnu.

OK for mainline?

Thanks,
Harald



Looks good to go.

Jerry


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
Another feasible solution: maybe we can drop support for segment intrinsics
in upstream GCC
and let downstream companies support segment intrinsics in their own GCC trees?



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-11 04:42
To: juzhe.zhong; jakub
CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
 
 
On 4/10/23 09:22, juzhe.zh...@rivai.ai wrote:
> Yeah, aarch64 already has 178, RVV has much more types than aarch64...
> You can see intrinsic doc:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
>  
> 
> api number explodes.
> 
> As well as tuples types in RVV much more than aarch64.
> Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV?
> Not sure.
> I think kito may help for this.
I think it's a discussion we need to have.  I really expect efforts to 
have > 256 modes are going to be very controversial.
 
jeff
 
 
 


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
I don't know. Maybe we can ask the rvv-intrinsic-doc maintainers why they
define so many tuple types, and try to get them to reduce the number of APIs
and tuple types?

I am going to remove all FP16 vector modes to see whether we can get the
number of machine modes down to <= 256.
I think that will probably help.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-11 04:36
To: Jakub Jelinek
CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
 
 
On 4/10/23 09:18, Jakub Jelinek wrote:
> On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
>> This is likely going to be very controversial.  It's going to increase the
>> size of two of the most heavily used data structures in GCC (rtx and trees).
>>
>> The first thing I would ask is whether or not we really need the full matrix
>> in practice or if we can combine some of the modes.
>>
>> Why hasn't aarch64 stumbled over this problem?
> 
>  From what I can see, x86 has 130 modes and aarch64 178 right now.
To put it another way: why does RISC-V have so many more modes than
AArch64?
 
Jeff
 


Re: [PATCH] update_web_docs_git: Add updated Texinfo to PATH

2023-04-10 Thread Arsen Arsenović via Gcc-patches

Gerald Pfeifer  writes:

> On Thu, 6 Apr 2023, Arsen Arsenović wrote:
>> maintainer-scripts/ChangeLog:
>> 
>>  * update_web_docs_git: Add updated Texinfo to PATH
>
> Do we really need to adjust PATH, or could we just introduce a MAKEINFO 
> variable, something like
>
>   if [ x${MAKEINFO}x = xx ]; then
>     if [ -x /home/gccadmin/texinfo/install-git/bin/makeinfo ]; then
>       MAKEINFO=/home/gccadmin/texinfo/install-git/bin/makeinfo;
>     else
>       MAKEINFO=makeinfo
>     fi
>   fi
>
> ?
>
> (This also still allows overriding upon invocation.)
>
> Gerald

Ah!  Good idea.  What do you think of the following?

From ba00aa3882b7e0a5fa247f9fa824474e3ddc8102 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Arsen=20Arsenovi=C4=87?= 
Date: Thu, 6 Apr 2023 12:20:57 +0200
Subject: [PATCH] update_web_docs_git: Allow setting TEXI2*, add git build
 default

maintainer-scripts/ChangeLog:

	* update_web_docs_git: Add a mechanism to override makeinfo,
	texi2dvi and texi2pdf, and default them to
	/home/gccadmin/texinfo/install-git/bin/${tool}, if present.
---
 maintainer-scripts/update_web_docs_git | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/maintainer-scripts/update_web_docs_git b/maintainer-scripts/update_web_docs_git
index d44ab27c1b7..c651e567424 100755
--- a/maintainer-scripts/update_web_docs_git
+++ b/maintainer-scripts/update_web_docs_git
@@ -14,6 +14,17 @@ export GITROOT
 
 PATH=/usr/local/bin:$PATH
 
+makeinfo_git=/home/gccadmin/texinfo/install-git/bin/
+if [ -x "${makeinfo_git}"/makeinfo ]; then
+: "${MAKEINFO:=${makeinfo_git}/makeinfo}"
+: "${TEXI2DVI:=${makeinfo_git}/texi2dvi}"
+: "${TEXI2PDF:=${makeinfo_git}/texi2pdf}"
+else
+: "${MAKEINFO:=makeinfo}"
+: "${TEXI2DVI:=texi2dvi}"
+: "${TEXI2PDF:=texi2pdf}"
+fi
+
 MANUALS="cpp
   cppinternals
   fastjar
@@ -174,10 +185,10 @@ for file in $MANUALS; do
 elif [ "$file" = "gnat_ugn" ]; then
   includes="$includes -I gcc/gcc/ada -I gcc/gcc/ada/doc/gnat_ugn"
 fi
-makeinfo --html -c CONTENTS_OUTPUT_LOCATION=inline --css-ref $CSS $includes -o ${file} ${filename}
+"${MAKEINFO}" --html -c CONTENTS_OUTPUT_LOCATION=inline --css-ref $CSS $includes -o ${file} ${filename}
 tar cf ${file}-html.tar ${file}/*.html
-texi2dvi $includes -o ${file}.dvi ${filename} /dev/null && dvips -o ${file}.ps ${file}.dvi
-texi2pdf $includes -o ${file}.pdf ${filename} /dev/null && dvips -o ${file}.ps ${file}.dvi
+"${TEXI2PDF}" $includes -o ${file}.pdf ${filename} 
... since the other tools are siblings.

Thanks for the smoke test!
-- 
Arsen Arsenović




Re: [PATCH] update_web_docs_git: Add updated Texinfo to PATH

2023-04-10 Thread Gerald Pfeifer
On Thu, 6 Apr 2023, Arsen Arsenović wrote:
> I must ask that whoever decides to apply/update the script tests
> texi2any with a simple example, like
> 
>   echo @node Top | ~/texinfo/install-git/bin/makeinfo --html -o -
> 
> ... before updating; this should be a representative enough smoke test.
> You should see some HTML output with little text in it.

Yep, and one warning:

  -: warning: must specify a title with a title command or @top

The following then proceeds without warning and the output looks fine:

  printf "@title foo\n@node Top" | 
/home/gccadmin/texinfo/install-git/bin/makeinfo  --html -o -

Gerald


Re: [PATCH] update_web_docs_git: Add updated Texinfo to PATH

2023-04-10 Thread Gerald Pfeifer
On Thu, 6 Apr 2023, Arsen Arsenović wrote:
> maintainer-scripts/ChangeLog:
> 
>   * update_web_docs_git: Add updated Texinfo to PATH

Do we really need to adjust PATH, or could we just introduce a MAKEINFO 
variable, something like

  if [ x${MAKEINFO}x = xx ]; then
    if [ -x /home/gccadmin/texinfo/install-git/bin/makeinfo ]; then
      MAKEINFO=/home/gccadmin/texinfo/install-git/bin/makeinfo;
    else
      MAKEINFO=makeinfo
    fi
  fi

?

(This also still allows overriding upon invocation.)

Gerald


Re: [PATCH] RISC-V: avoid splitting small constant in i_extrabit pattern

2023-04-10 Thread Philipp Tomsich
On Mon, 10 Apr 2023 at 17:57, Jeff Law  wrote:
>
>
>
> On 4/9/23 23:07, Lin Sinan via Gcc-patches wrote:
> > From: Sinan Lin 
> >
> > There is no need to split an xori/ori with a small constant.  Take the test
> > case `int foo(int idx) { return idx|3; }` as an example,
> >
> > rv64im_zba generates:
> >  ori a0,a0,3
> >  ret
> > but, rv64im_zba_zbs generates:
> >  ori a0,a0,1
> >  ori a0,a0,2
> >  ret
> >
> > With this change, insn `ori r2,r1,3` will not be split with zbs.
> > ---
> >   gcc/config/riscv/predicates.md |  2 +-
> >   .../gcc.target/riscv/zbs-extra-bit-or-twobits.c| 14 ++
> >   2 files changed, 15 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-extra-bit-or-twobits.c
> A minor oversight in the VRULL patches in this space.  This is actually
> a regression as we were previously generating the single [xo]ri.

Thanks for catching this one!

I looked this change over and it looks fine.  I hope this is the last
fallout from this set of changes.

>
> The patch looks fine, though it does need to go through a test cycle.
>
> jeff
>
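
For readers following the thread, the case discussed above can be sketched
in C as a range check on the constant.  This is only an illustration under
the assumption that "small constant" means a value that fits the 12-bit
signed immediate of ori/xori; it is not the predicate from predicates.md,
and the helper names fits_simm12 and may_need_split are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if VALUE fits the 12-bit signed immediate
   field used by RISC-V I-type instructions such as ori and xori.  */
static bool fits_simm12(long value)
{
    return value >= -2048 && value <= 2047;
}

/* Hypothetical helper: a constant that already fits the immediate can be
   emitted as a single ori/xori, so only larger constants are candidates
   for being split into several instructions.  */
static bool may_need_split(long value)
{
    return !fits_simm12(value);
}
```

With these definitions the constant 3 from the test case is not a split
candidate, which matches the single `ori a0,a0,3` expected above.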


Re: [PATCH v3] RISC-V: Fix regression of -fzero-call-used-regs=all

2023-04-10 Thread Jeff Law via Gcc-patches





On 4/9/23 21:11, Kito Cheng wrote:


I think one key point here is that -fzero-call-used-regs=* emits the zeroing
instructions right before the return.  That means there won't be any vector
operations between the zeroing instructions and the return, so we don't need
to restore the vcsr after zeroing.

Oh yea, makes perfect sense.  Thanks.
jeff


[PATCH] Fortran: resolve correct generic with TYPE(C_PTR) arguments [PR61615]

2023-04-10 Thread Harald Anlauf via Gcc-patches
Dear all,

when comparing formal and actual arguments of a procedure, there was no
check of rank for derived types from intrinsic module ISO_C_BINDING.
This could lead to a wrong resolution of generic procedures with dummy
argument of related types, see PR.  This was likely an oversight.

The attached fix is simple and regtests cleanly on x86_64-pc-linux-gnu.

OK for mainline?

Thanks,
Harald

From d41aa0f60b53799a5d28743f168fbf312461f51f Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 10 Apr 2023 22:39:52 +0200
Subject: [PATCH] Fortran: resolve correct generic with TYPE(C_PTR) arguments
 [PR61615]

gcc/fortran/ChangeLog:

	PR fortran/61615
	* interface.cc (compare_parameter): Enable rank check for arguments
	of derived type from the intrinsic module ISO_C_BINDING.

gcc/testsuite/ChangeLog:

	PR fortran/61615
	* gfortran.dg/interface_49.f90: New test.
---
 gcc/fortran/interface.cc   | 14 ++-
 gcc/testsuite/gfortran.dg/interface_49.f90 | 43 ++
 2 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/interface_49.f90

diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index db79b104dc2..8682dc999be 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -2361,7 +2361,19 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual,
   && formal->ts.u.derived && formal->ts.u.derived->ts.is_iso_c
   && actual->ts.type == BT_DERIVED
   && actual->ts.u.derived && actual->ts.u.derived->ts.is_iso_c)
-    return true;
+    {
+      if (ranks_must_agree
+	  && ((actual->rank == 0 && formal->attr.dimension)
+	      || (actual->rank != 0 && !formal->attr.dimension)))
+	{
+	  if (where)
+	    argument_rank_mismatch (formal->name, &actual->where,
+				    symbol_rank (formal), actual->rank,
+				    NULL);
+	  return false;
+	}
+      return true;
+    }

   if (formal->ts.type == BT_CLASS && actual->ts.type == BT_DERIVED)
 /* Make sure the vtab symbol is present when
diff --git a/gcc/testsuite/gfortran.dg/interface_49.f90 b/gcc/testsuite/gfortran.dg/interface_49.f90
new file mode 100644
index 000..67d3e3f871b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/interface_49.f90
@@ -0,0 +1,43 @@
+! { dg-do run }
+! PR fortran/61615 - resolve correct generic with TYPE(C_PTR) arguments
+! Contributed by Jacob Abel
+
+MODULE foo
+  USE iso_c_binding, only : c_ptr
+  IMPLICIT NONE
+  integer :: rank = -99
+  INTERFACE bar
+MODULE PROCEDURE bar_s
+MODULE PROCEDURE bar_a1d
+  END INTERFACE bar
+CONTAINS
+  SUBROUTINE bar_s(a)
+TYPE(c_ptr) :: a
+WRITE (0, *) 'in bar_s'
+rank = 0
+  END SUBROUTINE bar_s
+
+  SUBROUTINE bar_a1d(a)
+TYPE(c_ptr) :: a(:)
+WRITE (0, *) 'in bar_a1d'
+rank = 1
+  END SUBROUTINE bar_a1d
+END MODULE foo
+
+PROGRAM cptr_array_vs_scalar_arg
+  USE foo
+  USE iso_c_binding, only : c_ptr, c_loc
+  IMPLICIT NONE
+  INTEGER, TARGET :: i
+  TYPE(c_ptr) :: a, b(1)
+  a= C_LOC(i)
+  b(1) = C_LOC(i)
+  CALL bar(a)
+  if (rank /= 0) stop 1
+  CALL bar(b)
+  if (rank /= 1) stop 2
+  CALL bar((a))
+  if (rank /= 0) stop 3
+  CALL bar((b))
+  if (rank /= 1) stop 4
+END PROGRAM cptr_array_vs_scalar_arg
--
2.35.3



Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread Jeff Law via Gcc-patches




On 4/10/23 09:22, juzhe.zh...@rivai.ai wrote:

Yeah, aarch64 already has 178, RVV has much more types than aarch64...
You can see intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
 

api number explodes.

As well as tuples types in RVV much more than aarch64.
Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV?
Not sure.
I think kito may help for this.
I think it's a discussion we need to have.  I really expect efforts to 
have > 256 modes are going to be very controversial.


jeff




Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread Jeff Law via Gcc-patches




On 4/10/23 09:33, juzhe.zh...@rivai.ai wrote:

I saw many redundant scalar modes:

  E_CDImode,               /* machmode.def:267 */
#define HAVE_CDImode
#ifdef USE_ENUM_MODES
#define CDImode E_CDImode
#else
#define CDImode (complex_mode ((complex_mode::from_int) E_CDImode))
#endif
   E_CTImode,               /* machmode.def:267 */
#define HAVE_CTImode
#ifdef USE_ENUM_MODES
#define CTImode E_CTImode
#else
#define CTImode (complex_mode ((complex_mode::from_int) E_CTImode))
#endif
   E_HCmode,                /* machmode.def:269 */
#define HAVE_HCmode
#ifdef USE_ENUM_MODES
#define HCmode E_HCmode
#else
#define HCmode (complex_mode ((complex_mode::from_int) E_HCmode))
#endif
   E_SCmode,                /* machmode.def:269 */
#define HAVE_SCmode
#ifdef USE_ENUM_MODES
#define SCmode E_SCmode
#else
#define SCmode (complex_mode ((complex_mode::from_int) E_SCmode))
#endif
   E_DCmode,                /* machmode.def:269 */
#define HAVE_DCmode
#ifdef USE_ENUM_MODES
#define DCmode E_DCmode
#else
#define DCmode (complex_mode ((complex_mode::from_int) E_DCmode))
#endif
   E_TCmode,                /* machmode.def:269 */
#define HAVE_TCmode
#ifdef USE_ENUM_MODES
#define TCmode E_TCmode
#else
#define TCmode (complex_mode ((complex_mode::from_int) E_TCmode))
#endif
...

These scalar modes are redundant I think, can we forbid them?
There are 40+ scalar modes that are not used.

Those are fairly standard complex modes.  Those are unlikely to go away.

Some of those might be redundant with 2 element vector modes, but I'd 
hesitate to do something like using CDI to represent a 2XDI vector.




Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread Jeff Law via Gcc-patches




On 4/10/23 09:18, Jakub Jelinek wrote:

On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:

This is likely going to be very controversial.  It's going to increase the
size of two of the most heavily used data structures in GCC (rtx and trees).

The first thing I would ask is whether or not we really need the full matrix
in practice or if we can combine some of the modes.

Why hasn't aarch64 stumbled over this problem?


 From what I can see, x86 has 130 modes and aarch64 178 right now.
To put it another way: why does RISC-V have so many more modes than
AArch64?


Jeff
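
As a rough illustration of the limit under discussion: a field of N bits can
hold at most 2^N distinct mode values, so an 8-bit field tops out at 256.
The sketch below is not GCC code; max_modes_for_bits and modes_fit are
hypothetical names, and the mode counts used with it are the ones quoted in
this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: number of distinct values a BITS-wide field can
   represent.  BITS is assumed to be small enough not to overflow.  */
static unsigned long max_modes_for_bits(unsigned bits)
{
    assert(bits < 32);
    return 1UL << bits;
}

/* Hypothetical helper: true if N_MODES machine modes can be numbered
   within a BITS-wide field.  */
static bool modes_fit(unsigned n_modes, unsigned bits)
{
    return n_modes <= max_modes_for_bits(bits);
}
```

For example, modes_fit (178, 8) holds for the aarch64 count quoted above,
while any target needing more than 256 modes would need a wider field.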


[PATCH v3 08/10] RISCV: Weaken mem_thread_fence

2023-04-10 Thread Patrick O'Neill
This change brings atomic fences in line with table A.6 of the ISA
manual.

Relax mem_thread_fence according to the memmodel given.

2023-04-10 Patrick O'Neill 

* sync.md (mem_thread_fence_1): Change fence depending on the
given memory model.

Signed-off-by: Patrick O'Neill 
---
v3 Changelog:
* Consolidate tests in [PATCH v3 10/10]
* Remove helper functions
---
 gcc/config/riscv/sync.md | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index a31b8c4f28a..e91fa29da51 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -42,14 +42,24 @@
   DONE;
 })
 
-;; Until the RISC-V memory model (hence its mapping from C++) is finalized,
-;; conservatively emit a full FENCE.
 (define_insn "mem_thread_fence_1"
   [(set (match_operand:BLK 0 "" "")
(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
(match_operand:SI 1 "const_int_operand" "")] ;; model
   ""
-  "fence\tiorw,iorw")
+  {
+enum memmodel model = (enum memmodel) INTVAL (operands[1]);
+model = memmodel_base (model);
+if (model == MEMMODEL_SEQ_CST)
+   return "fence\trw,rw";
+else if (model == MEMMODEL_ACQ_REL)
+   return "fence.tso";
+else if (model == MEMMODEL_ACQUIRE)
+   return "fence\tr,rw";
+else if (model == MEMMODEL_RELEASE)
+   return "fence\trw,w";
+  }
+  [(set (attr "length") (const_int 4))])
 
 ;; Atomic memory operations.
 
-- 
2.25.1
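
The mapping in the patch above can be read as a small lookup from memory
model to fence instruction.  The following is a standalone sketch of that
lookup, not the GCC implementation: the enum and the function
fence_for_model are hypothetical stand-ins, and models the patch does not
list are reported as NULL here purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's enum memmodel (base models only).  */
enum fence_model {
    FENCE_RELAXED, FENCE_CONSUME, FENCE_ACQUIRE,
    FENCE_RELEASE, FENCE_ACQ_REL, FENCE_SEQ_CST
};

/* Return the fence text the patch emits for MODEL, or NULL for models
   the patch does not list.  */
static const char *fence_for_model(enum fence_model model)
{
    assert(model <= FENCE_SEQ_CST);
    switch (model) {
    case FENCE_SEQ_CST: return "fence\trw,rw";
    case FENCE_ACQ_REL: return "fence.tso";
    case FENCE_ACQUIRE: return "fence\tr,rw";
    case FENCE_RELEASE: return "fence\trw,w";
    default:            return NULL;
    }
}
```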



[PATCH v3 10/10] RISCV: Table A.6 conformance tests

2023-04-10 Thread Patrick O'Neill
These tests cover basic cases to ensure the atomic mappings follow the
strengthened Table A.6 mappings that are compatible with Table A.7.

2023-04-10 Patrick O'Neill 

* amo-table-a-6-amo-add-1.c: New test.
* amo-table-a-6-amo-add-2.c: Likewise.
* amo-table-a-6-amo-add-3.c: Likewise.
* amo-table-a-6-amo-add-4.c: Likewise.
* amo-table-a-6-amo-add-5.c: Likewise.
* amo-table-a-6-compare-exchange-1.c: Likewise.
* amo-table-a-6-compare-exchange-2.c: Likewise.
* amo-table-a-6-compare-exchange-3.c: Likewise.
* amo-table-a-6-compare-exchange-4.c: Likewise.
* amo-table-a-6-compare-exchange-5.c: Likewise.
* amo-table-a-6-fence-1.c: Likewise.
* amo-table-a-6-fence-2.c: Likewise.
* amo-table-a-6-fence-3.c: Likewise.
* amo-table-a-6-fence-4.c: Likewise.
* amo-table-a-6-fence-5.c: Likewise.
* amo-table-a-6-load-1.c: Likewise.
* amo-table-a-6-load-2.c: Likewise.
* amo-table-a-6-load-3.c: Likewise.
* amo-table-a-6-store-1.c: Likewise.
* amo-table-a-6-store-2.c: Likewise.
* amo-table-a-6-store-compat-3.c: Likewise.

Signed-off-by: Patrick O'Neill 
---
v3 Changelog:
* Consolidate existing tests in this patch
* Add new tests for store/load/amoadd
---
 .../gcc.target/riscv/amo-table-a-6-amo-add-1.c   |  8 
 .../gcc.target/riscv/amo-table-a-6-amo-add-2.c   |  8 
 .../gcc.target/riscv/amo-table-a-6-amo-add-3.c   |  8 
 .../gcc.target/riscv/amo-table-a-6-amo-add-4.c   |  8 
 .../gcc.target/riscv/amo-table-a-6-amo-add-5.c   |  8 
 .../riscv/amo-table-a-6-compare-exchange-1.c | 12 
 .../riscv/amo-table-a-6-compare-exchange-2.c | 12 
 .../riscv/amo-table-a-6-compare-exchange-3.c | 12 
 .../riscv/amo-table-a-6-compare-exchange-4.c | 12 
 .../riscv/amo-table-a-6-compare-exchange-5.c | 12 
 .../gcc.target/riscv/amo-table-a-6-fence-1.c |  9 +
 .../gcc.target/riscv/amo-table-a-6-fence-2.c |  7 +++
 .../gcc.target/riscv/amo-table-a-6-fence-3.c |  7 +++
 .../gcc.target/riscv/amo-table-a-6-fence-4.c |  7 +++
 .../gcc.target/riscv/amo-table-a-6-fence-5.c |  7 +++
 .../gcc.target/riscv/amo-table-a-6-load-1.c  |  9 +
 .../gcc.target/riscv/amo-table-a-6-load-2.c  | 10 ++
 .../gcc.target/riscv/amo-table-a-6-load-3.c  | 10 ++
 .../gcc.target/riscv/amo-table-a-6-store-1.c |  9 +
 .../gcc.target/riscv/amo-table-a-6-store-2.c | 10 ++
 .../gcc.target/riscv/amo-table-a-6-store-compat-3.c  | 10 ++
 21 files changed, 195 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-fence-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-store-compat-3.c

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c b/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
new file mode 100644
index 000..ae7e407befc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* Verify that fence mappings match Table A.6's recommended mapping.  */
+/* { dg-final { scan-assembler "amoadd.w\t" } } */
+
+void
+foo (int* bar, int* baz) {
+  __atomic_add_fetch(bar, baz, __ATOMIC_RELAXED);
+}
diff --git 

[PATCH v3 09/10] RISCV: Weaken atomic loads

2023-04-10 Thread Patrick O'Neill
This change brings atomic loads in line with table A.6 of the ISA
manual.

2023-04-10 Patrick O'Neill 

* sync.md (atomic_load): Implement atomic load mapping.

Signed-off-by: Patrick O'Neill 
---
v3 Changelog:
* Add this patch
---
 gcc/config/riscv/sync.md | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index e91fa29da51..9e3685f5b1c 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -23,6 +23,7 @@
   UNSPEC_COMPARE_AND_SWAP
   UNSPEC_SYNC_OLD_OP
   UNSPEC_SYNC_EXCHANGE
+  UNSPEC_ATOMIC_LOAD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -63,7 +64,31 @@
 
 ;; Atomic memory operations.
 
-;; Implement atomic stores with conservative fences. Fall back to fences for atomic loads.
+(define_insn "atomic_load<mode>"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(unspec_volatile:GPR
+  [(match_operand:GPR 1 "memory_operand" "A")
+   (match_operand:SI 2 "const_int_operand")]  ;; model
+  UNSPEC_ATOMIC_LOAD))]
+  "TARGET_ATOMIC"
+  {
+enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+model = memmodel_base (model);
+
+if (model == MEMMODEL_SEQ_CST)
+  return "fence\trw,rw\;"
+"l<amo>\t%0,%1\;"
+"fence\tr,rw";
+if (model == MEMMODEL_ACQUIRE)
+  return "l<amo>\t%0,%1\;"
+"fence\tr,rw";
+else
+  return "l<amo>\t%0,%1";
+  }
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 12))])
+
+;; Implement atomic stores with conservative fences.
 ;; This allows us to be compatible with the ISA manual Table A.6 and Table A.7.
 (define_insn "atomic_store<mode>"
   [(set (match_operand:GPR 0 "memory_operand" "=A")
-- 
2.25.1
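
To make the instruction counts behind the "length" attribute easier to see,
here is a standalone sketch of the load mapping from the patch above for a
32-bit load.  It is an illustration only: the enum and the function
atomic_load_sequence are hypothetical, and "lw" is assumed as the word-sized
load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's enum memmodel (base models only).  */
enum ld_model {
    LD_RELAXED, LD_CONSUME, LD_ACQUIRE,
    LD_RELEASE, LD_ACQ_REL, LD_SEQ_CST
};

/* Store the instruction templates for an atomic load under MODEL into
   OUT (room for three entries) and return how many were written.  */
static size_t atomic_load_sequence(enum ld_model model, const char *out[3])
{
    size_t n = 0;

    assert(out != NULL);
    if (model == LD_SEQ_CST)
        out[n++] = "fence\trw,rw";      /* leading fence, SEQ_CST only */
    out[n++] = "lw\t%0,%1";             /* the load itself */
    if (model == LD_SEQ_CST || model == LD_ACQUIRE)
        out[n++] = "fence\tr,rw";       /* trailing acquire fence */
    return n;
}
```

The longest sequence is three 4-byte instructions, which is consistent with
the length of 12 given in the pattern.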



[PATCH v3 05/10] RISCV: Strengthen atomic stores

2023-04-10 Thread Patrick O'Neill
This change makes atomic stores strictly stronger than table A.6 of the
ISA manual. This mapping makes the overall patchset compatible with
table A.7 as well.

2023-04-10 Patrick O'Neill 

PR target/89835
* sync.md (atomic_store): Use simple store instruction in
combination with a fence.
* pr89835.c: New test.

Signed-off-by: Patrick O'Neill 
---
v3 Changelog:
* Use a trailing fence for atomic stores to be compatible with Table A.7
---
 gcc/config/riscv/sync.md | 20 +---
 gcc/testsuite/gcc.target/riscv/pr89835.c |  9 +
 2 files changed, 26 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr89835.c

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index de42245981b..eef083b06e8 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -53,7 +53,8 @@
 
 ;; Atomic memory operations.
 
-;; Implement atomic stores with amoswap.  Fall back to fences for atomic loads.
+;; Implement atomic stores with conservative fences. Fall back to fences for atomic loads.
+;; This allows us to be compatible with the ISA manual Table A.6 and Table A.7.
 (define_insn "atomic_store<mode>"
   [(set (match_operand:GPR 0 "memory_operand" "=A")
 (unspec_volatile:GPR
@@ -61,9 +62,22 @@
(match_operand:SI 2 "const_int_operand")]  ;; model
   UNSPEC_ATOMIC_STORE))]
   "TARGET_ATOMIC"
-  "%F2amoswap.<amo>%A2 zero,%z1,%0"
+  {
+enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+model = memmodel_base (model);
+
+if (model == MEMMODEL_SEQ_CST)
+  return "fence\trw,w\;"
+"s<amo>\t%z1,%0\;"
+"fence\trw,rw";
+if (model == MEMMODEL_RELEASE)
+  return "fence\trw,w\;"
+"s<amo>\t%z1,%0";
+else
+  return "s<amo>\t%z1,%0";
+  }
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 12))])
 
 (define_insn "atomic_<atomic_optab><mode>"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
diff --git a/gcc/testsuite/gcc.target/riscv/pr89835.c b/gcc/testsuite/gcc.target/riscv/pr89835.c
new file mode 100644
index 000..ab190e11b60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr89835.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* Verify that relaxed atomic stores use simple store instructions.  */
+/* { dg-final { scan-assembler-not "amoswap" } } */
+
+void
+foo(int bar, int baz)
+{
+  __atomic_store_n(&bar, baz, __ATOMIC_RELAXED);
+}
-- 
2.25.1



[PATCH v3 03/10] RISCV: Enforce atomic compare_exchange SEQ_CST

2023-04-10 Thread Patrick O'Neill
This patch enforces SEQ_CST for atomic compare_exchange ops.

Replace Fence/LR.aq/SC.aq pairs with SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-10 Patrick O'Neill 

* sync.md: Change FENCE/LR.aq/SC.aq into sequentially
consistent LR.aqrl/SC.rl pair.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync.md | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index c932ef87b9d..de42245981b 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -115,9 +115,16 @@
 UNSPEC_COMPARE_AND_SWAP))
(clobber (match_scratch:GPR 6 "=&r"))]
   "TARGET_ATOMIC"
-  "%F5 1: lr.<amo>%A5 %0,%1; bne %0,%z2,1f; sc.<amo>%A4 %6,%z3,%1; bnez %6,1b; 1:"
+  {
+return "1:\;"
+  "lr.<amo>.aqrl\t%0,%1\;"
+  "bne\t%0,%z2,1f\;"
+  "sc.<amo>.rl\t%6,%z3,%1\;"
+  "bnez\t%6,1b\;"
+  "1:";
+  }
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 20))])
+   (set (attr "length") (const_int 16))])
 
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "register_operand" "")   ;; bool output
-- 
2.25.1



[PATCH v3 07/10] RISCV: Weaken compare_exchange LR/SC pairs

2023-04-10 Thread Patrick O'Neill
Introduce the %I and %J flags for setting the .aqrl bits on LR/SC pairs
as needed.

Atomic compare and exchange ops provide success and failure memory
models. C++17 and later place no restrictions on the relative strength
of each model, so ensure we cover both by using a model that enforces
the ordering of both given models.

This change brings compare_exchange LR/SC ops in line with table A.6 of the ISA
manual.

2023-04-10 Patrick O'Neill 

* riscv.cc: Add function to get the union of two
memmodels in sync.md.
* riscv-protos.h: Likewise.
* sync.md (atomic_cas_value_strong): Remove static
.aqrl bits on SC op/.rl bits on LR op and replace with
optimized %I, %J flags.

Signed-off-by: Patrick O'Neill 
---
v3 Changelog:
* Consolidate tests in [PATCH v3 10/10]
---
 gcc/config/riscv/riscv-protos.h |  3 +++
 gcc/config/riscv/riscv.cc   | 44 +
 gcc/config/riscv/sync.md|  9 +--
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 4611447ddde..b03edc3e8a5 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_RISCV_PROTOS_H
 #define GCC_RISCV_PROTOS_H
 
+#include "memmodel.h"
+
 /* Symbol types we understand.  The order of this list must match that of
the unspec enum in riscv.md, subsequent to UNSPEC_ADDRESS_FIRST.  */
 enum riscv_symbol_type {
@@ -79,6 +81,7 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
+extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6576e9ae524..061d2cf42b4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4278,6 +4278,36 @@ riscv_print_operand_reloc (FILE *file, rtx op, bool hi_reloc)
   fputc (')', file);
 }
 
+/* Return the memory model that encapsulates both given models.  */
+
+enum memmodel
+riscv_union_memmodels (enum memmodel model1, enum memmodel model2)
+{
+  model1 = memmodel_base (model1);
+  model2 = memmodel_base (model2);
+
+  enum memmodel weaker = model1 <= model2 ? model1: model2;
+  enum memmodel stronger = model1 > model2 ? model1: model2;
+
+  switch (stronger)
+{
+  case MEMMODEL_SEQ_CST:
+  case MEMMODEL_ACQ_REL:
+   return stronger;
+  case MEMMODEL_RELEASE:
+   if (weaker == MEMMODEL_ACQUIRE || weaker == MEMMODEL_CONSUME)
+ return MEMMODEL_ACQ_REL;
+   else
+ return stronger;
+  case MEMMODEL_ACQUIRE:
+  case MEMMODEL_CONSUME:
+  case MEMMODEL_RELAXED:
+   return stronger;
+  default:
+   gcc_unreachable ();
+}
+}
+
 /* Return true if the .AQ suffix should be added to an AMO to implement the
acquire portion of memory model MODEL.  */
 
@@ -4331,6 +4361,8 @@ riscv_memmodel_needs_amo_release (enum memmodel model)
'R' Print the low-part relocation associated with OP.
'C' Print the integer branch condition for comparison OP.
'A' Print the atomic operation suffix for memory model OP.
+   'I' Print the LR suffix for memory model OP.
+   'J' Print the SC suffix for memory model OP.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.
'S' Print shift-index of single-bit mask OP.
@@ -4500,6 +4532,18 @@ riscv_print_operand (FILE *file, rtx op, int letter)
fputs (".rl", file);
   break;
 
+case 'I':
+  if (model == MEMMODEL_SEQ_CST)
+   fputs (".aqrl", file);
+  else if (riscv_memmodel_needs_amo_acquire (model))
+   fputs (".aq", file);
+  break;
+
+case 'J':
+  if (riscv_memmodel_needs_amo_release (model))
+   fputs (".rl", file);
+  break;
+
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index fdfc56d64a1..a31b8c4f28a 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -130,10 +130,15 @@
(clobber (match_scratch:GPR 6 "=&r"))]
   "TARGET_ATOMIC"
   {
+enum memmodel model_success = (enum memmodel) INTVAL(operands[4]);
+enum memmodel model_failure = (enum memmodel) INTVAL(operands[5]);
+/* Find the union of the two memory models so we can satisfy both success
+   and failure memory models.  */
+operands[5] = GEN_INT(riscv_union_memmodels(model_success, model_failure));
 return "1:\;"
-  "lr..aqrl\t%0,%1\;"
+  "lr.%I5\t%0,%1\;"
   "bne\t%0,%z2,1f\;"
-  "sc..rl\t%6,%z3,%1\;"
+  "sc.%J5\t%6,%z3,%1\;"
   "bnez\t%6,1b\;"
   "1:";
   }
-- 

[PATCH v3 02/10] RISCV: Enforce Libatomic LR/SC SEQ_CST

2023-04-10 Thread Patrick O'Neill
Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-10 Patrick O'Neill 

* atomic.c: Change LR.aq/SC.rl pairs into sequentially
consistent LR.aqrl/SC.rl pair.

Signed-off-by: Patrick O'Neill 
---
 libgcc/config/riscv/atomic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 69f53623509..5f895939b0b 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -39,7 +39,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 unsigned old, tmp1, tmp2;  \
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"\
  #insn " %[tmp1], %[old], %[value]\n\t"\
  invert\
  "and %[tmp1], %[tmp1], %[mask]\n\t"   \
@@ -73,7 +73,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 unsigned old, tmp1;
\
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"\
  "and %[tmp1], %[old], %[mask]\n\t"\
  "bne %[tmp1], %[o], 1f\n\t"   \
  "and %[tmp1], %[old], %[not_mask]\n\t"\
-- 
2.25.1



[PATCH v3 04/10] RISCV: Add AMO release bits

2023-04-10 Thread Patrick O'Neill
This patch sets the relevant .rl bits on amo operations.

2023-04-10 Patrick O'Neill 

* riscv.cc (riscv_print_operand): change behavior of %A to
include release bits.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8f5636c93ed..8ffee494fbe 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4492,8 +4492,13 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 
 case 'A':
-  if (riscv_memmodel_needs_amo_acquire (model))
+  if (riscv_memmodel_needs_amo_acquire (model) &&
+ riscv_memmodel_needs_release_fence (model))
+   fputs (".aqrl", file);
+  else if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
+  else if (riscv_memmodel_needs_release_fence (model))
+   fputs (".rl", file);
   break;
 
 case 'F':
-- 
2.25.1



[PATCH v3 06/10] RISCV: Eliminate AMO op fences

2023-04-10 Thread Patrick O'Neill
Atomic operations with the appropriate bits set already enforce release
semantics. Remove unnecessary release fences from atomic ops.

This change brings AMO ops in line with table A.6 of the ISA manual.

2023-04-10 Patrick O'Neill 

* riscv.cc (riscv_memmodel_needs_amo_acquire): Change function
name.
* riscv.cc (riscv_print_operand): Remove unneeded %F case.
* sync.md: Remove unneeded fences.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv.cc | 16 +---
 gcc/config/riscv/sync.md  | 12 ++--
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8ffee494fbe..6576e9ae524 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4301,11 +4301,11 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
 }
 }
 
-/* Return true if a FENCE should be emitted to before a memory access to
-   implement the release portion of memory model MODEL.  */
+/* Return true if the .RL suffix should be added to an AMO to implement the
+   release portion of memory model MODEL.  */
 
 static bool
-riscv_memmodel_needs_release_fence (enum memmodel model)
+riscv_memmodel_needs_amo_release (enum memmodel model)
 {
   switch (model)
 {
@@ -4331,7 +4331,6 @@ riscv_memmodel_needs_release_fence (enum memmodel model)
'R' Print the low-part relocation associated with OP.
'C' Print the integer branch condition for comparison OP.
'A' Print the atomic operation suffix for memory model OP.
-   'F' Print a FENCE if the memory model requires a release.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.
'S' Print shift-index of single-bit mask OP.
@@ -4493,19 +4492,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 
 case 'A':
   if (riscv_memmodel_needs_amo_acquire (model) &&
- riscv_memmodel_needs_release_fence (model))
+ riscv_memmodel_needs_amo_release (model))
fputs (".aqrl", file);
   else if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
-  else if (riscv_memmodel_needs_release_fence (model))
+  else if (riscv_memmodel_needs_amo_release (model))
fputs (".rl", file);
   break;
 
-case 'F':
-  if (riscv_memmodel_needs_release_fence (model))
-   fputs ("fence iorw,ow; ", file);
-  break;
-
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index eef083b06e8..fdfc56d64a1 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -87,9 +87,9 @@
   (match_operand:SI 2 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F2amo.%A2 zero,%z1,%0"
+  "amo.%A2\tzero,%z1,%0"
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 4))])
 
 (define_insn "atomic_fetch_"
   [(set (match_operand:GPR 0 "register_operand" "=")
@@ -101,9 +101,9 @@
   (match_operand:SI 3 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F3amo.%A3 %0,%z2,%1"
+  "amo.%A3\t%0,%z2,%1"
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 4))])
 
 (define_insn "atomic_exchange"
   [(set (match_operand:GPR 0 "register_operand" "=")
@@ -114,9 +114,9 @@
(set (match_dup 1)
(match_operand:GPR 2 "register_operand" "0"))]
   "TARGET_ATOMIC"
-  "%F3amoswap.%A3 %0,%z2,%1"
+  "amoswap.%A3\t%0,%z2,%1"
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 4))])
 
 (define_insn "atomic_cas_value_strong"
   [(set (match_operand:GPR 0 "register_operand" "=")
-- 
2.25.1



[PATCH v3 01/10] RISCV: Eliminate SYNC memory models

2023-04-10 Thread Patrick O'Neill
Remove references to MEMMODEL_SYNC_* models by converting via
memmodel_base().

2023-04-10 Patrick O'Neill 

* riscv.cc: Remove MEMMODEL_SYNC_* cases and sanitize memmodel
input with memmodel_base

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv.cc | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 76eee4a55e9..8f5636c93ed 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4288,14 +4288,11 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
 {
   case MEMMODEL_ACQ_REL:
   case MEMMODEL_SEQ_CST:
-  case MEMMODEL_SYNC_SEQ_CST:
   case MEMMODEL_ACQUIRE:
   case MEMMODEL_CONSUME:
-  case MEMMODEL_SYNC_ACQUIRE:
return true;
 
   case MEMMODEL_RELEASE:
-  case MEMMODEL_SYNC_RELEASE:
   case MEMMODEL_RELAXED:
return false;
 
@@ -4314,14 +4311,11 @@ riscv_memmodel_needs_release_fence (enum memmodel model)
 {
   case MEMMODEL_ACQ_REL:
   case MEMMODEL_SEQ_CST:
-  case MEMMODEL_SYNC_SEQ_CST:
   case MEMMODEL_RELEASE:
-  case MEMMODEL_SYNC_RELEASE:
return true;
 
   case MEMMODEL_ACQUIRE:
   case MEMMODEL_CONSUME:
-  case MEMMODEL_SYNC_ACQUIRE:
   case MEMMODEL_RELAXED:
return false;
 
@@ -4360,6 +4354,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 }
   machine_mode mode = GET_MODE (op);
   enum rtx_code code = GET_CODE (op);
+  const enum memmodel model = memmodel_base (INTVAL (op));
 
   switch (letter)
 {
@@ -4497,12 +4492,12 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 
 case 'A':
-  if (riscv_memmodel_needs_amo_acquire ((enum memmodel) INTVAL (op)))
+  if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
   break;
 
 case 'F':
-  if (riscv_memmodel_needs_release_fence ((enum memmodel) INTVAL (op)))
+  if (riscv_memmodel_needs_release_fence (model))
fputs ("fence iorw,ow; ", file);
   break;
 
-- 
2.25.1



[PATCH v3 00/10] RISCV: Implement ISA Manual Table A.6 Mappings

2023-04-10 Thread Patrick O'Neill
This patchset aims to make the RISCV atomics implementation stronger
than the recommended mapping present in table A.6 of the ISA manual.
  
https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157

The current mapping in GCC is not internally consistent. Andrea Parri
pointed this out here along with a litmus test:
  https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/

As a result, we have an opportunity to jump straight to the A.6
implementation (meaning we will be compatible with LLVM's mappings which
are A.6). In light of a proposal by Hans Boehm and to avoid an ABI break
in the future, the mapping implemented is strictly stronger than the one
in table A.6 in order to be compatible with Table A.7.
  
https://lists.riscv.org/g/tech-unprivileged/topic/risc_v_memory_model_topics/92916241

If Hans' proposal is accepted, it makes sense to migrate to the mapping
recommended by table A.7. Since the stronger mapping in this patchset
(provided by Hans Boehm) appears to be compatible with both A.6 and A.7,
this transition should not result in an ABI break for GCC.

Patch 1 simplifies the memmodel to ignore MEMMODEL_SYNC_* cases (legacy
cases that aren't handled differently for RISC-V).
Patches 2-5 make the mappings strictly stronger.
Patches 6-9 weaken the mappings to be in line with table A.6 of the ISA
manual.
Patch 10 adds some basic conformance tests to ensure the implemented
mapping matches table A.6 with stronger SEQ_CST stores.

Christoph Muellner also submitted a similar patchset here:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
I used my previous patchset as a starting point since it was easier for 
me.

LLVM mapping notes:
* LLVM emits corresponding fences for atomic_signal_fence instructions.
  This seems to be an oversight since AFAIK atomic_signal_fence acts as
  a compiler directive. GCC does not emit any fences for
  atomic_signal_fence instructions.

Patchset v1:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592950.html

Patchset v2:
  https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615264.html

Changes for v2:
* Use memmodel_base rather than a custom simplify_memmodel function
  (Inspired by Christoph Muellner's patch 1/9)
* Move instruction styling change from [v1 5/7] to [v2 3/8] to reduce
  [v2 6/8]'s complexity
* Eliminated %K flag for atomic store introduced in v1 in favor of
  if/else
* Rebase/test

Changes for v3:
* Use a trailing fence for atomic stores to be compatible with Table A.7
* Emit an optimized fence r,rw following a SEQ_CST load
* Consolidate tests in [PATCH v3 10/10]
* Add tests for basic A.6 conformance

Patrick O'Neill (10):
  RISCV: Eliminate SYNC memory models
  RISCV: Enforce Libatomic LR/SC SEQ_CST
  RISCV: Enforce atomic compare_exchange SEQ_CST
  RISCV: Add AMO release bits
  RISCV: Strengthen atomic stores
  RISCV: Eliminate AMO op fences
  RISCV: Weaken compare_exchange LR/SC pairs
  RISCV: Weaken mem_thread_fence
  RISCV: Weaken atomic loads
  RISCV: Table A.6 conformance tests

 gcc/config/riscv/riscv-protos.h   |  3 +
 gcc/config/riscv/riscv.cc | 66 +++---
 gcc/config/riscv/sync.md  | 89 ---
 .../riscv/amo-table-a-6-amo-add-1.c   |  8 ++
 .../riscv/amo-table-a-6-amo-add-2.c   |  8 ++
 .../riscv/amo-table-a-6-amo-add-3.c   |  8 ++
 .../riscv/amo-table-a-6-amo-add-4.c   |  8 ++
 .../riscv/amo-table-a-6-amo-add-5.c   |  8 ++
 .../riscv/amo-table-a-6-compare-exchange-1.c  | 12 +++
 .../riscv/amo-table-a-6-compare-exchange-2.c  | 12 +++
 .../riscv/amo-table-a-6-compare-exchange-3.c  | 12 +++
 .../riscv/amo-table-a-6-compare-exchange-4.c  | 12 +++
 .../riscv/amo-table-a-6-compare-exchange-5.c  | 12 +++
 .../gcc.target/riscv/amo-table-a-6-fence-1.c  |  9 ++
 .../gcc.target/riscv/amo-table-a-6-fence-2.c  |  7 ++
 .../gcc.target/riscv/amo-table-a-6-fence-3.c  |  7 ++
 .../gcc.target/riscv/amo-table-a-6-fence-4.c  |  7 ++
 .../gcc.target/riscv/amo-table-a-6-fence-5.c  |  7 ++
 .../gcc.target/riscv/amo-table-a-6-load-1.c   |  9 ++
 .../gcc.target/riscv/amo-table-a-6-load-2.c   | 10 +++
 .../gcc.target/riscv/amo-table-a-6-load-3.c   | 10 +++
 .../gcc.target/riscv/amo-table-a-6-store-1.c  |  9 ++
 .../gcc.target/riscv/amo-table-a-6-store-2.c  | 10 +++
 .../riscv/amo-table-a-6-store-compat-3.c  | 10 +++
 gcc/testsuite/gcc.target/riscv/pr89835.c  |  9 ++
 libgcc/config/riscv/atomic.c  |  4 +-
 26 files changed, 336 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-5.c
 create mode 

Re: [PATCH] RISC-V: avoid splitting small constant in i_extrabit pattern

2023-04-10 Thread Jeff Law via Gcc-patches




On 4/9/23 23:07, Lin Sinan via Gcc-patches wrote:

From: Sinan Lin 

There is no need to split an xori/ori with a small constant. Take the test
case `int foo(int idx) { return idx|3; }` as an example:

rv64im_zba generates:
 ori a0,a0,3
 ret
but rv64im_zba_zbs generates:
 ori a0,a0,1
 ori a0,a0,2
 ret

With this change, the insn `ori r2,r1,3` will not be split under zbs.
---
  gcc/config/riscv/predicates.md |  2 +-
  .../gcc.target/riscv/zbs-extra-bit-or-twobits.c| 14 ++
  2 files changed, 15 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-extra-bit-or-twobits.c
A minor oversight in the VRULL patches in this space.  This is actually 
a regression as we were previously generating the single [xo]ri.



The patch looks fine, though it does need to go through a test cycle.

jeff



Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
I saw many redundant scalar modes:

 E_CDImode,   /* machmode.def:267 */
#define HAVE_CDImode
#ifdef USE_ENUM_MODES
#define CDImode E_CDImode
#else
#define CDImode (complex_mode ((complex_mode::from_int) E_CDImode))
#endif
  E_CTImode,   /* machmode.def:267 */
#define HAVE_CTImode
#ifdef USE_ENUM_MODES
#define CTImode E_CTImode
#else
#define CTImode (complex_mode ((complex_mode::from_int) E_CTImode))
#endif
  E_HCmode,/* machmode.def:269 */
#define HAVE_HCmode
#ifdef USE_ENUM_MODES
#define HCmode E_HCmode
#else
#define HCmode (complex_mode ((complex_mode::from_int) E_HCmode))
#endif
  E_SCmode,/* machmode.def:269 */
#define HAVE_SCmode
#ifdef USE_ENUM_MODES
#define SCmode E_SCmode
#else
#define SCmode (complex_mode ((complex_mode::from_int) E_SCmode))
#endif
  E_DCmode,/* machmode.def:269 */
#define HAVE_DCmode
#ifdef USE_ENUM_MODES
#define DCmode E_DCmode
#else
#define DCmode (complex_mode ((complex_mode::from_int) E_DCmode))
#endif
  E_TCmode,/* machmode.def:269 */
#define HAVE_TCmode
#ifdef USE_ENUM_MODES
#define TCmode E_TCmode
#else
#define TCmode (complex_mode ((complex_mode::from_int) E_TCmode))
#endif
...

I think these scalar modes are redundant. Can we forbid them?
There are 40+ scalar modes that are not used.



juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-04-10 23:22
To: jakub; Jeff Law
CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
Yeah, aarch64 already has 178, RVV has much more types than aarch64...
You can see intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
 
api number explodes.

As well as tuples types in RVV much more than aarch64.
Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV?
Not sure.
I think kito may help for this.


juzhe.zh...@rivai.ai
 
From: Jakub Jelinek
Date: 2023-04-10 23:18
To: Jeff Law
CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
> This is likely going to be very controversial.  It's going to increase the
> size of two of most heavily used data structures in GCC (rtx and trees).
> 
> The first thing I would ask is whether or not we really need the full matrix
> in practice or if we can combine some of the modes.
> 
> Why hasn't aarch64 stumbled over this problem?
 
From what I can see, x86 has 130 modes and aarch64 178 right now.
 
Jakub
 
 


Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
Yeah, aarch64 already has 178, and RVV has many more types than aarch64...
You can see the intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
 
The number of APIs explodes.

RVV also has many more tuple types than aarch64.
Maybe we need to ask the RVV API doc maintainers to reduce the types and APIs
of RVV?
Not sure.
I think kito may be able to help with this.


juzhe.zh...@rivai.ai
 
From: Jakub Jelinek
Date: 2023-04-10 23:18
To: Jeff Law
CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
> This is likely going to be very controversial.  It's going to increase the
> size of two of most heavily used data structures in GCC (rtx and trees).
> 
> The first thing I would ask is whether or not we really need the full matrix
> in practice or if we can combine some of the modes.
> 
> Why hasn't aarch64 stumbled over this problem?
 
From what I can see, x86 has 130 modes and aarch64 178 right now.
 
Jakub
 
 


Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread Jakub Jelinek via Gcc-patches
On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
> This is likely going to be very controversial.  It's going to increase the
> size of two of most heavily used data structures in GCC (rtx and trees).
> 
> The first thing I would ask is whether or not we really need the full matrix
> in practice or if we can combine some of the modes.
> 
> Why hasn't aarch64 stumbled over this problem?

From what I can see, x86 has 130 modes and aarch64 178 right now.

Jakub



Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
ARM SVE has: svint8_t, svint8x2_t, svint8x3_t, svint8x4_t.
As far as I know, they don't have tuple types for partial vectors.
RVV, however, not only has vint8m1_t, vint8m1x2_t, vint8m1x3_t, 
vint8m1x4_t, vint8m1x5_t, vint8m1x6_t, vint8m1x7_t, vint8m1x8_t,

but also vint8mf8_t, vint8mf8x2_t, vint8mf8x3_t, 
vint8mf8x4_t, vint8mf8x5_t, vint8mf8x6_t, vint8mf8x7_t, vint8mf8x8_t,

vint8mf4_t, vint8mf4x2_t, vint8mf4x3_t, 
vint8mf4x4_t, vint8mf4x5_t, vint8mf4x6_t, vint8mf4x7_t, vint8mf4x8_t,

etc.

That is a lot of tuple types.  I saw there are redundant scalar modes in the
RISC-V backend, like UQQmode and HQQmode.  Not sure, but maybe we can remove
these scalar modes to bring the total number of machine modes below 256?


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-10 22:54
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer; jakub; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
 
 
On 4/10/23 08:48, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> According RVV ISA:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-type-register-vtype
> We have LMUL: 1/8, 1/4, 1/2, 1, 2, 4, 8
> Also, for segment instructions, we have tuple type for NF = 2 ~ 8.
> For example, for LMUL = 1/2, SEW = 32, we have vint32mf2_t,
> we will have NF from 2 ~ 8 tuples: vint32mf2x2_t, vint32mf2x2...  
> vint32mf2x8_t.
> So we will end up with over 220+ vector machine mode for RVV.
> 
> PLUS the scalar machine modes that we already have in RISC-V port.
> 
> The total machine modes in RISC-V port > 256.
> 
> Current GCC can not allow us support RVV segment instructions tuple types.
> 
> So extend machine mode size from 8bit to 16bit.
> 
> I have another solution related to this patch,
> May be adding a target dependent macro is better?
> Revise this patch like this:
> 
> #ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
> ENUM_BITFIELD(machine_mode)  last_set_mode : 16;
> #else
> ENUM_BITFIELD(machine_mode)  last_set_mode : 8;
> #endif
> 
> Not sure whether this solution is better?
> 
> This patch Bootstraped on X86 is PASS. Will run make-check gcc-testsuite 
> tomorrow.
> 
> Expecting land in GCC-14, any suggestions ?
> 
> gcc/ChangeLog:
> 
>  * combine.cc (struct reg_stat_type): Extend 8bit to 16bit.
>  * cse.cc (struct qty_table_elem): Ditto.
>  (struct table_elt): Ditto.
>  (struct set): Ditto.
>  * genopinit.cc (main): Ditto.
>  * ira-int.h (struct ira_allocno): Ditto.
>  * ree.cc (struct ATTRIBUTE_PACKED): Ditto.
>  * rtl-ssa/accesses.h: Ditto.
>  * rtl.h (struct GTY): Ditto.
>  (subreg_shape::unique_id): Ditto.
>  * rtlanal.h: Ditto.
>  * tree-core.h (struct tree_type_common): Ditto.
>  (struct tree_decl_common): Ditto.
This is likely going to be very controversial.  It's going to increase 
the size of two of most heavily used data structures in GCC (rtx and trees).
 
The first thing I would ask is whether or not we really need the full 
matrix in practice or if we can combine some of the modes.
 
Why hasn't aarch64 stumbled over this problem?
 
Jeff
 


Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread Jakub Jelinek via Gcc-patches
On Mon, Apr 10, 2023 at 10:48:08PM +0800, juzhe.zh...@rivai.ai wrote:
> * rtl.h (struct GTY): Ditto.

> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -313,7 +313,7 @@ struct GTY((desc("0"), tag("0"),
>ENUM_BITFIELD(rtx_code) code: 16;
>  
>/* The kind of value the expression has.  */
> -  ENUM_BITFIELD(machine_mode) mode : 8;
> +  ENUM_BITFIELD(machine_mode) mode : 16;
>  
>/* 1 in a MEM if we should keep the alias set for this mem unchanged
>   when we access a component.

At least for struct rtx_def this is certainly unacceptable.
The widely used structure is carefully laid out so that it doesn't waste any
bits: there are 16 + 8 + 8 bits, then a 32-bit union, and then a union of
something that needs 64-bit alignment on 64-bit hosts.  So the header is a
tidy 64 bits before the variable-sized payloads.
The above change grows that to 16 + 16 + 8 bits; the 32-bit union needs
32-bit alignment, so that is already 96 bits, and then comes the payload,
which needs 64-bit alignment.  So the above change grows the rtl header by
100%, from 64 bits to 128 bits.

> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1693,7 +1693,7 @@ struct GTY(()) tree_type_common {
>unsigned restrict_flag : 1;
>unsigned contains_placeholder_bits : 2;
>  
> -  ENUM_BITFIELD(machine_mode) mode : 8;
> +  ENUM_BITFIELD(machine_mode) mode : 16;
>  
>/* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
>   TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */

This structure has 15 spare bits, so in theory it could accommodate more
bits for mode, but the above change is insufficient for that.

> @@ -1776,7 +1776,7 @@ struct GTY(()) tree_decl_common {
>struct tree_decl_minimal common;
>tree size;
>  
> -  ENUM_BITFIELD(machine_mode) mode : 8;
> +  ENUM_BITFIELD(machine_mode) mode : 16;
>  
>unsigned nonlocal_flag : 1;
>unsigned virtual_flag : 1;

I think this one has 13 spare bits, but again one would need to adjust the
structure more so that it doesn't grow unnecessarily.

I think you should try hard to avoid having too many modes.  There are a lot
of arrays, especially in RA, sized by the number of modes or even that times
the number of register classes (I thought we had some sized by number of
modes ^ 2 but can't find them right now).  If there is no way to avoid that,
we should consider making those changes dependent on the maximum number of
modes and keep the current, more compact compile-time data structures unless
the target has more than 256 modes.

Jakub



Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe.zhong
Because RVV has many more types than aarch64.
You can see in the rvv-intrinsic doc that there are a great many RVV intrinsics:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
 
The number of RVV intrinsics explodes.

For segment instructions, RVV has array types supporting NF from 2 ~ 8 for
LMUL <= 1 (MF8, MF4, MF2, M1),
whereas aarch64 only has array types with array size 2 ~ 4, and only for
LMUL = 1 (a whole vector).

I think kito can explain this issue more clearly.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-10 22:54
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer; jakub; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit
 
 
On 4/10/23 08:48, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> According RVV ISA:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-type-register-vtype
> We have LMUL: 1/8, 1/4, 1/2, 1, 2, 4, 8
> Also, for segment instructions, we have tuple type for NF = 2 ~ 8.
> For example, for LMUL = 1/2, SEW = 32, we have vint32mf2_t,
> we will have NF from 2 ~ 8 tuples: vint32mf2x2_t, vint32mf2x2...  
> vint32mf2x8_t.
> So we will end up with over 220+ vector machine mode for RVV.
> 
> PLUS the scalar machine modes that we already have in RISC-V port.
> 
> The total machine modes in RISC-V port > 256.
> 
> Current GCC can not allow us support RVV segment instructions tuple types.
> 
> So extend machine mode size from 8bit to 16bit.
> 
> I have another solution related to this patch,
> May be adding a target dependent macro is better?
> Revise this patch like this:
> 
> #ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
> ENUM_BITFIELD(machine_mode)  last_set_mode : 16;
> #else
> ENUM_BITFIELD(machine_mode)  last_set_mode : 8;
> #endif
> 
> Not sure whether this solution is better?
> 
> This patch Bootstraped on X86 is PASS. Will run make-check gcc-testsuite 
> tomorrow.
> 
> Expecting land in GCC-14, any suggestions ?
> 
> gcc/ChangeLog:
> 
>  * combine.cc (struct reg_stat_type): Extend 8bit to 16bit.
>  * cse.cc (struct qty_table_elem): Ditto.
>  (struct table_elt): Ditto.
>  (struct set): Ditto.
>  * genopinit.cc (main): Ditto.
>  * ira-int.h (struct ira_allocno): Ditto.
>  * ree.cc (struct ATTRIBUTE_PACKED): Ditto.
>  * rtl-ssa/accesses.h: Ditto.
>  * rtl.h (struct GTY): Ditto.
>  (subreg_shape::unique_id): Ditto.
>  * rtlanal.h: Ditto.
>  * tree-core.h (struct tree_type_common): Ditto.
>  (struct tree_decl_common): Ditto.
This is likely going to be very controversial.  It's going to increase 
the size of two of most heavily used data structures in GCC (rtx and trees).
 
The first thing I would ask is whether or not we really need the full 
matrix in practice or if we can combine some of the modes.
 
Why hasn't aarch64 stumbled over this problem?
 
Jeff
 


Re: [PATCH] RISC-V: add TARGET_ZBKB to the condition of bswapsi2, bswapdi2 and rotr3 patterns

2023-04-10 Thread Jeff Law via Gcc-patches




On 4/10/23 04:56, Lin Sinan wrote:

From: Sinan Lin 

Tell gcc that zbkb has these two standard pattern names (SPNs) to enable some
optimizations, e.g. 1) the rrotate_expr could match rotrm3 during expand;
2) hook up __builtin_bswap64 with `rev8` in zbkb64.
---
  gcc/config/riscv/bitmanip.md | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

Deferred to gcc-14 as the trunk is not currently open for development.

jeff


Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread Jeff Law via Gcc-patches




On 4/10/23 08:48, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

According RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-type-register-vtype
We have LMUL: 1/8, 1/4, 1/2, 1, 2, 4, 8
Also, for segment instructions, we have tuple type for NF = 2 ~ 8.
For example, for LMUL = 1/2, SEW = 32, we have vint32mf2_t,
we will have NF from 2 ~ 8 tuples: vint32mf2x2_t, vint32mf2x2...  vint32mf2x8_t.
So we will end up with over 220+ vector machine mode for RVV.

PLUS the scalar machine modes that we already have in RISC-V port.

The total machine modes in RISC-V port > 256.

Current GCC can not allow us support RVV segment instructions tuple types.

So extend machine mode size from 8bit to 16bit.

I have another solution related to this patch,
May be adding a target dependent macro is better?
Revise this patch like this:

#ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
ENUM_BITFIELD(machine_mode)  last_set_mode : 16;
#else
ENUM_BITFIELD(machine_mode)  last_set_mode : 8;
#endif

Not sure whether this solution is better?

This patch Bootstraped on X86 is PASS. Will run make-check gcc-testsuite 
tomorrow.

Expecting land in GCC-14, any suggestions ?

gcc/ChangeLog:

 * combine.cc (struct reg_stat_type): Extend 8bit to 16bit.
 * cse.cc (struct qty_table_elem): Ditto.
 (struct table_elt): Ditto.
 (struct set): Ditto.
 * genopinit.cc (main): Ditto.
 * ira-int.h (struct ira_allocno): Ditto.
 * ree.cc (struct ATTRIBUTE_PACKED): Ditto.
 * rtl-ssa/accesses.h: Ditto.
 * rtl.h (struct GTY): Ditto.
 (subreg_shape::unique_id): Ditto.
 * rtlanal.h: Ditto.
 * tree-core.h (struct tree_type_common): Ditto.
 (struct tree_decl_common): Ditto.
This is likely going to be very controversial.  It's going to increase 
the size of two of most heavily used data structures in GCC (rtx and trees).


The first thing I would ask is whether or not we really need the full 
matrix in practice or if we can combine some of the modes.


Why hasn't aarch64 stumbled over this problem?

Jeff


[PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-10 Thread juzhe . zhong
From: Juzhe-Zhong 

According to the RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-type-register-vtype
We have LMUL: 1/8, 1/4, 1/2, 1, 2, 4, 8.
Also, for segment instructions, we have tuple types for NF = 2 ~ 8.
For example, for LMUL = 1/2, SEW = 32, we have vint32mf2_t,
and we will have tuples for NF from 2 ~ 8: vint32mf2x2_t, ..., vint32mf2x8_t.
So we will end up with 220+ vector machine modes for RVV,

plus the scalar machine modes that we already have in the RISC-V port.

The total number of machine modes in the RISC-V port is > 256.

Current GCC therefore does not allow us to support the tuple types of the RVV
segment instructions.

So extend the machine mode size from 8 bits to 16 bits.

I have another solution related to this patch:
maybe adding a target-dependent macro is better?
Revise this patch like this:

#ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
ENUM_BITFIELD(machine_mode)  last_set_mode : 16;
#else
ENUM_BITFIELD(machine_mode)  last_set_mode : 8;
#endif

Not sure whether this solution is better?

This patch bootstrapped successfully on x86. I will run make-check on the GCC
testsuite tomorrow.

Expecting this to land in GCC 14; any suggestions?

gcc/ChangeLog:

* combine.cc (struct reg_stat_type): Extend 8bit to 16bit.
* cse.cc (struct qty_table_elem): Ditto.
(struct table_elt): Ditto.
(struct set): Ditto.
* genopinit.cc (main): Ditto.
* ira-int.h (struct ira_allocno): Ditto.
* ree.cc (struct ATTRIBUTE_PACKED): Ditto.
* rtl-ssa/accesses.h: Ditto.
* rtl.h (struct GTY): Ditto.
(subreg_shape::unique_id): Ditto.
* rtlanal.h: Ditto.
* tree-core.h (struct tree_type_common): Ditto.
(struct tree_decl_common): Ditto.

---
 gcc/combine.cc | 4 ++--
 gcc/cse.cc | 6 +++---
 gcc/genopinit.cc   | 2 +-
 gcc/ira-int.h  | 4 ++--
 gcc/ree.cc | 2 +-
 gcc/rtl-ssa/accesses.h | 2 +-
 gcc/rtl.h  | 4 ++--
 gcc/rtlanal.h  | 2 +-
 gcc/tree-core.h| 4 ++--
 9 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..af9bae23c92 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -200,7 +200,7 @@ struct reg_stat_type {
 
   unsigned HOST_WIDE_INT   last_set_nonzero_bits;
   char last_set_sign_bit_copies;
-  ENUM_BITFIELD(machine_mode)  last_set_mode : 8;
+  ENUM_BITFIELD(machine_mode)  last_set_mode : 16;
 
   /* Set nonzero if references to register n in expressions should not be
  used.  last_set_invalid is set nonzero when this register is being
@@ -235,7 +235,7 @@ struct reg_stat_type {
  truncation if we know that value already contains a truncated
  value.  */
 
-  ENUM_BITFIELD(machine_mode)  truncated_to_mode : 8;
+  ENUM_BITFIELD(machine_mode)  truncated_to_mode : 16;
 };
 
 
diff --git a/gcc/cse.cc b/gcc/cse.cc
index 8fbda4ecc86..d78efaa39f7 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -251,7 +251,7 @@ struct qty_table_elem
   /* The sizes of these fields should match the sizes of the
  code and mode fields of struct rtx_def (see rtl.h).  */
   ENUM_BITFIELD(rtx_code) comparison_code : 16;
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;
 };
 
 /* The table of all qtys, indexed by qty number.  */
@@ -406,7 +406,7 @@ struct table_elt
   int regcost;
   /* The size of this field should match the size
  of the mode field of struct rtx_def (see rtl.h).  */
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;
   char in_memory;
   char is_const;
   char flag;
@@ -4146,7 +4146,7 @@ struct set
   /* Original machine mode, in case it becomes a CONST_INT.
  The size of this field should match the size of the mode
  field of struct rtx_def (see rtl.h).  */
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;
   /* Hash value of constant equivalent for SET_SRC.  */
   unsigned src_const_hash;
   /* A constant equivalent for SET_SRC, if any.  */
diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 83cb7504fa1..3ca3e9fd946 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -182,7 +182,7 @@ main (int argc, const char **argv)
 
   progname = "genopinit";
 
-  if (NUM_OPTABS > 0xffff || MAX_MACHINE_MODE >= 0xff)
+  if (NUM_OPTABS > 0xffff || MAX_MACHINE_MODE >= 0xffff)
 fatal ("genopinit range assumptions invalid");
 
   if (!init_rtx_reader_args_cb (argc, argv, handle_arg))
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index e2de47213b4..65ec1678146 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -281,10 +281,10 @@ struct ira_allocno
   int regno;
   /* Mode of the allocno which is the mode of the corresponding
  pseudo-register.  */
-  ENUM_BITFIELD (machine_mode) mode : 8;
+  ENUM_BITFIELD (machine_mode) mode : 16;
   /* Widest mode of the allocno which in at least one case could be
  for paradoxical subregs where wmode > mode.  */
-  ENUM_BITFIELD (machine_mode) wmode : 

[PATCH] RISC-V: add TARGET_ZBKB to the condition of bswapsi2, bswapdi2 and rotr3 patterns

2023-04-10 Thread Lin Sinan via Gcc-patches
From: Sinan Lin 

Tell GCC that zbkb provides these two SPNs (standard pattern names) to enable
some optimizations, e.g. 1) the rrotate_expr could match rotrm3 during expand;
2) hook up __builtin_bswap64 with `rev8` in zbkb64.
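
For illustration, these are the kinds of source-level operations the patterns are meant to cover. The function names are made up, and whether a given compiler actually emits a rotate or `rev8` for them depends on the target flags; this is a sketch, not a test from the patch.

```c
#include <stdint.h>

/* Rotate X right by N bits, written as the usual shift-and-or idiom
   that a compiler can recognize as a rotate.  */
uint32_t
rotr32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return (x >> n) | (x << (-n & 31));
}

/* Byte-swap a 64-bit value through the GCC builtin named above.  */
uint64_t
swap_bytes64 (uint64_t x)
{
  return __builtin_bswap64 (x);
}
```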
---
 gcc/config/riscv/bitmanip.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 7aa591689ba..3ed9f5d403a 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -297,7 +297,7 @@
   [(set (match_operand:GPR 0 "register_operand")
(rotatert:GPR (match_operand:GPR 1 "register_operand")
 (match_operand:QI 2 "arith_operand")))]
-  "TARGET_ZBB || TARGET_XTHEADBB"
+  "TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB"
 {
   if (TARGET_XTHEADBB && !immediate_operand (operands[2], VOIDmode))
 FAIL;
@@ -362,12 +362,12 @@
 (define_expand "bswapdi2"
   [(set (match_operand:DI 0 "register_operand")
(bswap:DI (match_operand:DI 1 "register_operand")))]
-  "TARGET_64BIT && (TARGET_ZBB || TARGET_XTHEADBB)")
+  "TARGET_64BIT && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB)")
 
 (define_expand "bswapsi2"
   [(set (match_operand:SI 0 "register_operand")
(bswap:SI (match_operand:SI 1 "register_operand")))]
-  "(!TARGET_64BIT && TARGET_ZBB) || TARGET_XTHEADBB")
+  "(!TARGET_64BIT && (TARGET_ZBB || TARGET_ZBKB)) || TARGET_XTHEADBB")
 
 (define_insn "*bswap2"
   [(set (match_operand:X 0 "register_operand" "=r")
-- 
2.19.1.6.gb485710b



Re: [PATCH] LoongArch: Improve GAR store for va_list

2023-04-10 Thread Lulu Cheng
Sorry, it's my fault. I still have some questions that I don't understand 
yet, so I haven't replied to the email. :-(



On 2023/4/10 at 5:04 PM, Xi Ruoyao wrote:

Ping.  Or maybe I've lost some replies here because my mail server
crashed several days ago :).

On Wed, 2023-03-29 at 02:01 +0800, Xi Ruoyao wrote:

LoongArch backend used to save all GARs for a function with variable
arguments.  But sometimes a function only accepts variable arguments for
a purpose like C++ function overloading.  For example, POSIX defines
open() as:

     int open(const char *path, int oflag, ...);

But only two forms are actually used:

     int open(const char *pathname, int flags);
     int open(const char *pathname, int flags, mode_t mode);

So it's obviously a waste to save all 8 GARs in open().  We can use the
cfun->va_list_gpr_size field set by the stdarg pass to only save the
GARs necessary to be saved.
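
As a rough illustration of the open()-style usage described above, here is a variadic function that reads at most one optional argument. The function name and the flag value are hypothetical, and the optional argument is read as int to keep the sketch portable.

```c
#include <stdarg.h>

/* Hypothetical flag meaning "an extra mode argument follows".  */
#define SKETCH_O_CREAT 0x40

/* Return the optional third argument if FLAGS says it is present,
   or -1 otherwise.  Only one variadic argument is ever read, so only
   one argument register needs to be saved for va_arg.  */
int
open_mode_sketch (const char *path, int flags, ...)
{
  int mode = -1;

  (void) path;
  if (flags & SKETCH_O_CREAT)
    {
      va_list ap;
      va_start (ap, flags);
      mode = va_arg (ap, int);
      va_end (ap);
    }
  return mode;
}
```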

If the va_list escapes (for example, in fprintf() we pass it to
vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we
don't need a special case.
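
The register count described here can be sketched as a stand-alone function. This is not the backend code: MAX_ARGS_IN_REGISTERS is taken as 8 from the "all 8 GARs" remark, UNITS_PER_WORD is assumed to be 8 as on a 64-bit target, and gars_to_save is a made-up name.

```c
#define MAX_ARGS_IN_REGISTERS 8   /* number of GARs, per the text above */
#define UNITS_PER_WORD 8          /* assumption: 64-bit words */

/* VA_LIST_GPR_SIZE is the byte count computed by the stdarg pass
   (255 when the va_list escapes); NAMED_GPRS is the number of GARs
   already used by named arguments.  Returns how many GARs to save.  */
int
gars_to_save (int va_list_gpr_size, int named_gprs)
{
  int available = MAX_ARGS_IN_REGISTERS - named_gprs;
  int gp_saved = va_list_gpr_size / UNITS_PER_WORD;

  /* Never save more than the registers that can hold anonymous
     arguments; this also covers the "escaped" value 255.  */
  if (gp_saved > available)
    gp_saved = available;
  return gp_saved;
}
```

For an open()-like function with two named arguments and one 8-byte va_arg read, this gives one GAR; with the escaped value 255 it falls back to all six remaining GARs.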

With this patch, only one GAR ($a2/$r6) is saved in open().  Ideally
even this stack store should be omitted too, but doing so is not trivial
and AFAIK there are no compilers (for any target) performing the "ideal"
optimization here, see https://godbolt.org/z/n1YqWq9c9.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
(GCC 14 or now)?

gcc/ChangeLog:

 * config/loongarch/loongarch.cc
 (loongarch_setup_incoming_varargs): Don't save more GARs than
 cfun->va_list_gpr_size / UNITS_PER_WORD.

gcc/testsuite/ChangeLog:

 * gcc.target/loongarch/va_arg.c: New test.
---
  gcc/config/loongarch/loongarch.cc   |  4 +++-
  gcc/testsuite/gcc.target/loongarch/va_arg.c | 24
+
  2 files changed, 27 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/va_arg.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 6927bdc7fe5..0ecb91ca997 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -764,7 +764,9 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
   loongarch_function_arg_advance (pack_cumulative_args (&local_cum), arg);
 
   /* Found out how many registers we need to save.  */
-  gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs;
+  gp_saved = cfun->va_list_gpr_size / UNITS_PER_WORD;
+  if (gp_saved > (int) (MAX_ARGS_IN_REGISTERS - local_cum.num_gprs))
+    gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs;
 
   if (!no_rtl && gp_saved > 0)
     {
diff --git a/gcc/testsuite/gcc.target/loongarch/va_arg.c b/gcc/testsuite/gcc.target/loongarch/va_arg.c
new file mode 100644
index 000..980c96d0e3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/va_arg.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Technically we shouldn't save any register for this function: it should be
+   compiled as if it accepts 3 named arguments.  But AFAIK no compilers can
+   achieve this "perfect" optimization now, so just ensure we are using the
+   knowledge provided by stdarg pass and we won't save GARs impossible to be
+   accessed with __builtin_va_arg () when the va_list does not escape.  */
+
+/* { dg-final { scan-assembler-not "st.*r7" } } */
+
+int
+test (int a0, ...)
+{
+  void *arg;
+  int a1, a2;
+
+  __builtin_va_start (arg, a0);
+  a1 = __builtin_va_arg (arg, int);
+  a2 = __builtin_va_arg (arg, int);
+  __builtin_va_end (arg);
+
+  return a0 + a1 + a2;
+}




Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-10 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2023/4/10 10:09, Jiufu Guo via Gcc-patches wrote:
> Hi,
> 
> In this test case (float128-cmp2-runnable.c), the instruction
> xscmpexpqp is used to support a few builtins e.g.
> __builtin_vsx_scalar_cmp_exp_qp_eq on _Float128.
> This instruction handles the whole 128bits of the vector, and
> it is guarded by [ieee128-hw].

The instruction xscmpexpqp is guarded with TARGET_P9_VECTOR,

(define_insn "*xscmpexpqp"
  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
        (compare:CCFP
         (unspec:IEEE128 [(match_operand:IEEE128 1 "altivec_register_operand" "v")
                          (match_operand:IEEE128 2 "altivec_register_operand" "v")]
          UNSPEC_VSX_SCMPEXPQP)
         (match_operand:SI 3 "zero_constant" "j")))]
  "TARGET_P9_VECTOR"
  "xscmpexpqp %0,%1,%2"
  [(set_attr "type" "fpcompare")])

[ieee128-hw] is used for guarding those bifs, so the above
statement doesn't quite match the fact.

PR108758 says this case doesn't fail with gcc-10 and gcc-11;
I wonder why that changed in gcc-12.  The above define_insn
shows that the underlying insn for these bifs just requires the
power9-vector condition.  Could you check further?
Thanks.

btw, please add a PR marker for PR108758.

BR,
Kewen

> So, we may update the testcase to require ppc_float128_hw.
> 
> Tested on ppc64 both BE and LE.
> Is this ok for trunk?
> 
> BR,
> Jeff (Jiufu)
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/float128-cmp2-runnable.c: Update requires.
> 
> ---
>  gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
> index d376a3ca68e..91287c0fb7a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-require-effective-target ppc_float128_sw } */
> +/* { dg-require-effective-target ppc_float128_hw } */
>  /* { dg-require-effective-target p9vector_hw } */
>  /* { dg-options "-O2 -mdejagnu-cpu=power9 " } */
>  


Re: [PATCH] LoongArch: Improve GAR store for va_list

2023-04-10 Thread Xi Ruoyao via Gcc-patches
Ping.  Or maybe I've lost some replies here because my mail server
crashed several days ago :).

On Wed, 2023-03-29 at 02:01 +0800, Xi Ruoyao wrote:
> LoongArch backend used to save all GARs for a function with variable
> arguments.  But sometimes a function only accepts variable arguments for
> a purpose like C++ function overloading.  For example, POSIX defines
> open() as:
> 
>     int open(const char *path, int oflag, ...);
> 
> But only two forms are actually used:
> 
>     int open(const char *pathname, int flags);
>     int open(const char *pathname, int flags, mode_t mode);
> 
> So it's obviously a waste to save all 8 GARs in open().  We can use the
> cfun->va_list_gpr_size field set by the stdarg pass to only save the
> GARs necessary to be saved.
> 
> If the va_list escapes (for example, in fprintf() we pass it to
> vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we
> don't need a special case.
> 
> With this patch, only one GAR ($a2/$r6) is saved in open().  Ideally
> even this stack store should be omitted too, but doing so is not trivial
> and AFAIK there are no compilers (for any target) performing the "ideal"
> optimization here, see https://godbolt.org/z/n1YqWq9c9.
> 
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
> (GCC 14 or now)?
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch.cc
> (loongarch_setup_incoming_varargs): Don't save more GARs than
> cfun->va_list_gpr_size / UNITS_PER_WORD.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/va_arg.c: New test.
> ---
>  gcc/config/loongarch/loongarch.cc   |  4 +++-
>  gcc/testsuite/gcc.target/loongarch/va_arg.c | 24
> +
>  2 files changed, 27 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/va_arg.c
> 
> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> index 6927bdc7fe5..0ecb91ca997 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -764,7 +764,9 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
>    loongarch_function_arg_advance (pack_cumulative_args (&local_cum), arg);
>  
>    /* Found out how many registers we need to save.  */
> -  gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs;
> +  gp_saved = cfun->va_list_gpr_size / UNITS_PER_WORD;
> +  if (gp_saved > (int) (MAX_ARGS_IN_REGISTERS - local_cum.num_gprs))
> +    gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs;
>  
>    if (!no_rtl && gp_saved > 0)
>      {
> diff --git a/gcc/testsuite/gcc.target/loongarch/va_arg.c b/gcc/testsuite/gcc.target/loongarch/va_arg.c
> new file mode 100644
> index 000..980c96d0e3d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/va_arg.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Technically we shouldn't save any register for this function: it should be
> +   compiled as if it accepts 3 named arguments.  But AFAIK no compilers can
> +   achieve this "perfect" optimization now, so just ensure we are using the
> +   knowledge provided by stdarg pass and we won't save GARs impossible to be
> +   accessed with __builtin_va_arg () when the va_list does not escape.  */
> +
> +/* { dg-final { scan-assembler-not "st.*r7" } } */
> +
> +int
> +test (int a0, ...)
> +{
> +  void *arg;
> +  int a1, a2;
> +
> +  __builtin_va_start (arg, a0);
> +  a1 = __builtin_va_arg (arg, int);
> +  a2 = __builtin_va_arg (arg, int);
> +  __builtin_va_end (arg);
> +
> +  return a0 + a1 + a2;
> +}

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-10 Thread Jakub Jelinek via Gcc-patches
On Sat, Apr 08, 2023 at 06:25:32PM -0600, Jeff Law wrote:
> 
> 
> On 4/6/23 08:21, Eric Botcazou wrote:
> 
> > > So, perhaps just in the return op0; case add further code for
> > > WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
> > > again for the word mode and decide if it is still safe.
> > 
> > Does it work to just replace mode by word_mode in the calls to nonzero_bits?
> It helps marginally -- basically we defer mucking up the code a bit.  We
> then hit this in simplify_and_const_int_1:
> 
> 
>   /* See what bits may be nonzero in VAROP.  Unlike the general case of
>  a call to nonzero_bits, here we don't care about bits outside
>  MODE.  */
> 
>   nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
> 
> That just seems wrong for WORD_REGISTER_OPERATIONS targets.
> 
> 
> Hacking both locations in a similar manner fixes the test.

If so, can you post that in patch form and can we go with that version
plus the testcase (e.g. from the first patch I've posted where I've changed
dse)?
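
As a toy model of the masking quoted above (not GCC code), the following shows what `nonzero & GET_MODE_MASK (mode)` discards when the value really lives in a full word register. All names are hypothetical, and modes are represented simply by their bit width.

```c
#include <stdint.h>

/* Mask covering the low BITS bits, a stand-in for GET_MODE_MASK.  */
uint64_t
mode_mask_sketch (unsigned int bits)
{
  return bits >= 64 ? UINT64_MAX : (((uint64_t) 1 << bits) - 1);
}

/* Possibly-nonzero bits as seen after masking to a MODE_BITS-wide mode.  */
uint64_t
nonzero_in_mode (uint64_t nonzero_word, unsigned int mode_bits)
{
  return nonzero_word & mode_mask_sketch (mode_bits);
}

/* Nonzero when bits that may be set in the full word lie outside the
   mode and were therefore dropped by the masking.  */
int
drops_upper_bits (uint64_t nonzero_word, unsigned int mode_bits)
{
  return (nonzero_word & ~mode_mask_sketch (mode_bits)) != 0;
}
```

On a target that operates on whole words, dropped upper bits are the information a later word-mode use would still need, which is the situation the quoted remark is worried about.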

Jakub