Re: [PATCH] install: Correct check-g++ to check-gcc-c++

2023-12-30 Thread Andrew Pinski
On Sat, Dec 30, 2023 at 11:26 PM YunQiang Su  wrote:
>
> make: *** No rule to make target 'check-g++'.  Stop.
>
> gcc
>
> * doc/install.texi (Testing): Correct check-g++ to
> check-gcc-c++.

Actually these targets exist in the gcc subdirectory.
Which is mentioned slightly above:
```
In order to run sets of tests selectively, there are targets
@samp{make check-gcc} and language specific @samp{make check-c},
@samp{make check-c++}, @samp{make check-d} @samp{make check-fortran},
@samp{make check-ada}, @samp{make check-m2}, @samp{make check-objc},
@samp{make check-obj-c++}, @samp{make check-lto} in the @file{gcc}
subdirectory of the object directory.
```

Though maybe the documentation should make it clearer that these make
targets are the ones in the gcc subdirectory rather than the ones at the
toplevel.

Thanks,
Andrew Pinski


> ---
>  gcc/doc/install.texi | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index d20b43a5b21..cff4d0cd71f 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -3345,7 +3345,7 @@ Likewise, in order to run only the @command{g++} ``old-deja'' tests in
>  the testsuite with filenames matching @samp{9805*}, you would use
>
>  @smallexample
> -make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}"
> +make check-gcc-c++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}"
>  @end smallexample
>
>  The file-matching expression following @var{filename}@command{.exp=} is treated
> @@ -3354,8 +3354,8 @@ may be passed, although any whitespace must either be escaped or surrounded by
>  single quotes if multiple expressions are desired. For example,
>
>  @smallexample
> -make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}"
> -make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}"
> +make check-gcc-c++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}"
> +make check-gcc-c++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}"
>  @end smallexample
>
>  The @file{*.exp} files are located in the testsuite directories of the GCC
> @@ -3373,7 +3373,7 @@ You can pass multiple options to the testsuite using the
>  work outside the makefiles.  For example,
>
>  @smallexample
> -make check-g++ RUNTESTFLAGS="--target_board=unix/-O3/-fmerge-constants"
> +make check-gcc-c++ RUNTESTFLAGS="--target_board=unix/-O3/-fmerge-constants"
>  @end smallexample
>
>  will run the standard @command{g++} testsuites (``unix'' is the target name
> --
> 2.39.2
>


[PATCH] install: Correct check-g++ to check-gcc-c++

2023-12-30 Thread YunQiang Su
make: *** No rule to make target 'check-g++'.  Stop.

gcc

* doc/install.texi (Testing): Correct check-g++ to
check-gcc-c++.
---
 gcc/doc/install.texi | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index d20b43a5b21..cff4d0cd71f 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3345,7 +3345,7 @@ Likewise, in order to run only the @command{g++} ``old-deja'' tests in
 the testsuite with filenames matching @samp{9805*}, you would use
 
 @smallexample
-make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}"
+make check-gcc-c++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}"
 @end smallexample
 
 The file-matching expression following @var{filename}@command{.exp=} is treated
@@ -3354,8 +3354,8 @@ may be passed, although any whitespace must either be escaped or surrounded by
 single quotes if multiple expressions are desired. For example,
 
 @smallexample
-make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}"
-make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}"
+make check-gcc-c++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c @var{other-options}"
+make check-gcc-c++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' @var{other-options}"
 @end smallexample
 
 The @file{*.exp} files are located in the testsuite directories of the GCC
@@ -3373,7 +3373,7 @@ You can pass multiple options to the testsuite using the
 work outside the makefiles.  For example,
 
 @smallexample
-make check-g++ RUNTESTFLAGS="--target_board=unix/-O3/-fmerge-constants"
+make check-gcc-c++ RUNTESTFLAGS="--target_board=unix/-O3/-fmerge-constants"
 @end smallexample
 
 will run the standard @command{g++} testsuites (``unix'' is the target name
-- 
2.39.2



Re: [PATCH] Improved RTL expansion of field assignments into promoted registers.

2023-12-30 Thread YunQiang Su
> Right.  But that's the whole point behind avoiding the narrowing subreg
> and forcing use of a truncate operation.
>
> So basically the question becomes is there a way to modify those bits in
> a way that GCC doesn't know that it needs to truncate/extend?
>

I guess that this code may cause some problems:
int test(int val, unsigned char c, int pos) {
  ((unsigned char*)&val)[pos+0] = c;
  return val;
}
GCC avoids using bitops here; instead it uses a load/store.
Does any ISA have an INSERT_CHAR_VAR instruction?
 INSERT_CHAR_VAR $rN, $rM, $rX

So I guess that known_lt may be a better choice:
if (known_lt)
  no_truncate_or_extend_needed;
else
  add_truncate_or_extend;
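
To make the question concrete, here is a minimal sketch of the two ways such a byte store can be expressed: through memory, and with shift-and-mask bit operations. It is only an illustration, not what GCC emits. The helper names insert_byte_mem and insert_byte_bitops are hypothetical, and the code assumes a 32-bit value and 0 <= pos <= 3.

```c
#include <stdint.h>
#include <string.h>

/* Store C into byte POS of VAL through memory, which is what the
   pointer cast in test() amounts to.  Assumes 0 <= POS <= 3.  */
static int32_t
insert_byte_mem (int32_t val, unsigned char c, int pos)
{
  unsigned char bytes[sizeof val];
  memcpy (bytes, &val, sizeof val);
  bytes[pos] = c;
  memcpy (&val, bytes, sizeof val);
  return val;
}

/* The same update written with bitops.  The bit offset of byte POS
   depends on the host byte order, so probe it at run time.  */
static int32_t
insert_byte_bitops (int32_t val, unsigned char c, int pos)
{
  const uint32_t probe = 1;
  unsigned char first;
  uint32_t u;
  int shift;

  memcpy (&first, &probe, 1);
  /* first == 1 means the host is little-endian.  */
  shift = 8 * (first == 1 ? pos : 3 - pos);

  memcpy (&u, &val, sizeof u);
  u = (u & ~(UINT32_C (0xff) << shift)) | ((uint32_t) c << shift);
  memcpy (&val, &u, sizeof val);
  return val;
}
```

With pos == 3 on a little-endian host (or pos == 0 on a big-endian one) the store lands on the byte that holds the sign bit, which is the case where a promoted register would need re-extending.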

> The most obvious concern would be bitfield insertions that modify those
> bits.  But in that case the destination must have been DImode and we
> must truncate it to SImode before we can do anything with the SImode
> object.  But that's all supposed to work as long as
> TRULY_NOOP_TRUNCATION is defined properly.
>
> Jeff


[PATCH] Pass GUILE down to subdirectories

2023-12-30 Thread Tom Tromey
When I enable cgen rebuilding in the binutils-gdb tree, the default is
to run cgen using 'guile'.  However, on my host, guile is guile 2.2,
which doesn't work for me -- I have to use guile3.0.

This patch arranges to pass "GUILE" down to subdirectories, so I can
use 'make GUILE=guile3.0'.

ChangeLog
2023-12-30  Tom Tromey  

* Makefile.in: Rebuild.
* Makefile.tpl (BASE_EXPORTS): Add GUILE.
(GUILE): New variable.
* Makefile.def (flags_to_pass): Add GUILE.
---
 ChangeLog| 7 +++
 Makefile.def | 1 +
 Makefile.in  | 8 ++--
 Makefile.tpl | 7 +--
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index 662e50fdc18..792919e561c 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -310,6 +310,7 @@ flags_to_pass = { flag= GNATBIND ; };
 flags_to_pass = { flag= GNATMAKE ; };
 flags_to_pass = { flag= GDC ; };
 flags_to_pass = { flag= GDCFLAGS ; };
+flags_to_pass = { flag= GUILE ; };
 
 // Target tools
 flags_to_pass = { flag= AR_FOR_TARGET ; };
diff --git a/Makefile.in b/Makefile.in
index 48320bb549e..9a58d5a4f20 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -3,7 +3,7 @@
 #
 # Makefile for directory with subdirs to build.
 #   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
+#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2023
 #   Free Software Foundation
 #
 # This file is free software; you can redistribute it and/or modify
@@ -143,7 +143,8 @@ BASE_EXPORTS = \
M4="$(M4)"; export M4; \
SED="$(SED)"; export SED; \
AWK="$(AWK)"; export AWK; \
-   MAKEINFO="$(MAKEINFO)"; export MAKEINFO;
+   MAKEINFO="$(MAKEINFO)"; export MAKEINFO; \
+   GUILE="$(GUILE)"; export GUILE;
 
 # This is the list of variables to export in the environment when
 # configuring subdirectories for the build system.
@@ -450,6 +451,8 @@ GM2FLAGS = $(CFLAGS)
 
 PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
 
+GUILE = guile
+
 # Pass additional PGO and LTO compiler options to the PGO build.
 BUILD_CFLAGS = $(PGO_BUILD_CFLAGS) $(PGO_BUILD_LTO_CFLAGS)
 override CFLAGS += $(BUILD_CFLAGS)
@@ -878,6 +881,7 @@ BASE_FLAGS_TO_PASS = \
"GNATMAKE=$(GNATMAKE)" \
"GDC=$(GDC)" \
"GDCFLAGS=$(GDCFLAGS)" \
+   "GUILE=$(GUILE)" \
"AR_FOR_TARGET=$(AR_FOR_TARGET)" \
"AS_FOR_TARGET=$(AS_FOR_TARGET)" \
"CC_FOR_TARGET=$(CC_FOR_TARGET)" \
diff --git a/Makefile.tpl b/Makefile.tpl
index 36fa20950d4..17e585df541 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -6,7 +6,7 @@ in
 #
 # Makefile for directory with subdirs to build.
 #   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
+#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2023
 #   Free Software Foundation
 #
 # This file is free software; you can redistribute it and/or modify
@@ -146,7 +146,8 @@ BASE_EXPORTS = \
M4="$(M4)"; export M4; \
SED="$(SED)"; export SED; \
AWK="$(AWK)"; export AWK; \
-   MAKEINFO="$(MAKEINFO)"; export MAKEINFO;
+   MAKEINFO="$(MAKEINFO)"; export MAKEINFO; \
+   GUILE="$(GUILE)"; export GUILE;
 
 # This is the list of variables to export in the environment when
 # configuring subdirectories for the build system.
@@ -453,6 +454,8 @@ GM2FLAGS = $(CFLAGS)
 
 PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
 
+GUILE = guile
+
 # Pass additional PGO and LTO compiler options to the PGO build.
 BUILD_CFLAGS = $(PGO_BUILD_CFLAGS) $(PGO_BUILD_LTO_CFLAGS)
 override CFLAGS += $(BUILD_CFLAGS)
-- 
2.43.0



Re: [PATCH] Add a late-combine pass [PR106594]

2023-12-30 Thread Segher Boessenkool
Hi!

On Tue, Oct 24, 2023 at 07:49:10PM +0100, Richard Sandiford wrote:
> This patch adds a combine pass that runs late in the pipeline.

But it is not.  It is a completely new thing, and much closer to
fwprop than to combine, too.

Could you rename it to something else, please?  Something less confusing
to both users and maintainers :-)

> There are two instances: one between combine and split1,

So, what kind of things does this do that the real combine does not?
And, same question but for fwprop.  That would be the crucial motivation
for why we want to have this new pass at all :-)

> The pass currently has a single objective: remove definitions by
> substituting into all uses.

The easy case ;-)


Segher


Re: [PATCH] libstdc++ testsuite/std/ranges/iota/max_size_type.cc: Reduce /10 for simulators

2023-12-30 Thread Hans-Peter Nilsson
On Sat, 30 Dec 2023, Jonathan Wakely wrote:

> On Sat, 30 Dec 2023, 01:41 Hans-Peter Nilsson,  wrote:
> > Or perhaps the cause is known?
> 
> Not to me. It probably is a target codegen bug, since all this test really
> does is emulate a wide integer type using masks and shifts.

If so, a generic code-generator bug.  I've repeated the 5x 
performance regression observation for a native build and 
updated PR113175 (.32 vs 1.73 seconds).  I'll see if I can 
quickly find out whether it's codegen or libstdc++.  I've set
the PR to the latter for the moment.

> > With this, the test successfully completes in ~34 seconds.
> >
> > Ok to commit?
> >
> 
> Looks OK to me, but Patrick wrote this test so please wait for him to
> confirm. I think this just reduces the number of cases tested, but doesn't
> miss any important edge cases that should be checked.

Understood: holding, but will ping after the usual week.  
Thanks for the review!

brgds, H-P


Re: [PATCH 1/2] RTX_COST: Count instructions

2023-12-30 Thread Segher Boessenkool
On Fri, Dec 29, 2023 at 09:14:52PM -0700, Jeff Law wrote:
> On 12/29/23 10:46, YunQiang Su wrote:
> >When we try to combine RTLs, the result may be very complex,
> >and `rtx_cost` may think that it need lots of costs. But in
> >fact, it may match a pattern in machine descriptions, which
> >may emit only 1 or 2 hardware instructions.  This combination
> >may be refused due to cost comparison failure.
> Then that's a problem with the backend's implementation of RTX_COST.
> 
> >Since the high cost may be due to a more expensive operation.
> >To get real reason, we also need information about instruction
> >count.
> Then cost the *operations*, not the number of instructions.  Also note 
> that a single insn may generate multiple assembler instructions.
> 
> Even with all its warts, the real solution here is to fix the port's RTX 
> costs.

Or implement the insn_cost hook instead; it will then be used in preference
to rtx_costs in most places, including in the combiner.  insn_cost
is much easier to implement, and it is even possible to make good cost
estimates with it :-)


Segher


[PATCH] MIPS: Add pattern insqisi_extended and inshisi_extended

2023-12-30 Thread YunQiang Su
This match pattern allows combine to fold (zero_extract:DI 8, 24, QI)
followed by a sign-extend into a single 32-bit INS instruction on TARGET_64BIT.

In SImode, if the sign bit is modified by bitops, a sign-extend operation
is needed afterwards.  The 32-bit INS instruction guarantees that its
result is sign-extended, and the QImode source register is safe for INS, too.

(insn 19 18 20 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
(const_int 8 [0x8])
(const_int 24 [0x18]))
(subreg:DI (reg:QI 205) 0)) "../xx.c":7:29 -1
 (nil))
(insn 20 19 23 2 (set (reg/v:DI 200 [ val ])
(sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0))) "../xx.c":7:29 -1
 (nil))

Combine tries to merge them into:

(insn 20 19 23 2 (set (reg/v:DI 200 [ val ])
(sign_extend:DI (ior:SI (and:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
(const_int 16777215 [0xffffff]))
(ashift:SI (subreg:SI (reg:QI 205 [ MEM[(const unsigned char 
*)buf_8(D) + 3B] ]) 0)
(const_int 24 [0x18]) "../xx.c":7:29 18 {*insv_extended}
 (expr_list:REG_DEAD (reg:QI 205 [ MEM[(const unsigned char *)buf_8(D) + 
3B] ])
(nil)))

The 16/16 pair is handled similarly:
(insn 13 12 14 2 (set (zero_extract:DI (reg/v:DI 198 [ val ])
(const_int 16 [0x10])
(const_int 16 [0x10]))
(subreg:DI (reg:HI 201 [ MEM[(const short unsigned int *)buf_6(D) + 2B] 
]) 0)) "xx.c":5:30 286 {*insvdi}
 (expr_list:REG_DEAD (reg:HI 201 [ MEM[(const short unsigned int *)buf_6(D) 
+ 2B] ])
(nil)))
(insn 14 13 17 2 (set (reg/v:DI 198 [ val ])
(sign_extend:DI (subreg:SI (reg/v:DI 198 [ val ]) 0))) "xx.c":5:30 241 
{extendsidi2}
 (nil))
=>
(insn 14 13 17 2 (set (reg/v:DI 198 [ val ])
(sign_extend:DI (ior:SI (ashift:SI (subreg:SI (reg:HI 201 [ MEM[(const 
short unsigned int *)buf_6(D) + 2B] ]) 0)
(const_int 16 [0x10]))
(zero_extend:SI (subreg:HI (reg/v:DI 198 [ val ]) 0) 
"xx.c":5:30 284 {*inshisi_extended}
 (expr_list:REG_DEAD (reg:HI 201 [ MEM[(const short unsigned int *)buf_6(D) 
+ 2B] ])
(nil)))

Let's accept these patterns, and set the cost to 1 instruction.
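
The arithmetic behind the 8/24 pattern can be sketched in plain C. This is only a model of the RTL shown above, not code from the MIPS back end, and the function names sext32, ins_then_extend and ins_extended are hypothetical. It shows that inserting the byte into the 64-bit register and then sign-extending the low 32 bits gives the same value as the combined SImode expression wrapped in a sign-extend.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of X to 64 bits (models sign_extend:DI
   of an SImode value).  */
static int64_t
sext32 (uint64_t x)
{
  uint64_t low = x & UINT64_C (0xffffffff);
  if (low & UINT64_C (0x80000000))
    return (int64_t) low - INT64_C (0x100000000);
  return (int64_t) low;
}

/* Insns 19 and 20: insert the byte C at bits 24..31 of the 64-bit
   register, then sign-extend the low 32 bits.  */
static int64_t
ins_then_extend (uint64_t val, uint8_t c)
{
  val = (val & ~(UINT64_C (0xff) << 24)) | ((uint64_t) c << 24);
  return sext32 (val);
}

/* The combined form: (ior:SI (and:SI val 0xffffff) (ashift:SI c 24))
   wrapped in a sign_extend:DI.  */
static int64_t
ins_extended (uint64_t val, uint8_t c)
{
  uint32_t low = (uint32_t) val;
  uint32_t merged = (low & UINT32_C (0xffffff)) | ((uint32_t) c << 24);
  return sext32 (merged);
}
```

For example, ins_extended (0, 0xff) yields -16777216: the inserted byte sets bit 31, so the 64-bit result is negative.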

gcc

PR rtl-optimization/104914
* config/mips/mips.md (insqisi_extended): New patterns.
(inshisi_extended): Ditto.
* config/mips/mips.cc (mips_binary_cost): Set the cost of new
  patterns to COSTS_N_INSNS (1).
---
 gcc/config/mips/mips.cc | 43 +
 gcc/config/mips/mips.md | 22 +
 2 files changed, 65 insertions(+)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 9180dbbf843..225a9ee1fd4 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -4066,6 +4066,49 @@ mips_binary_cost (rtx x, int single_cost, int 
double_cost, bool speed)
 {
   int cost;
 
+  rtx op0 = XEXP (x, 0);
+  rtx op1 = XEXP (x, 1);
+  rtx op00, op01, op10, op11;
+  if (GET_RTX_LENGTH (GET_CODE (op0)) > 0)
+op00 = XEXP (op0, 0);
+  if (GET_RTX_LENGTH (GET_CODE (op0)) > 1)
+op01 = XEXP (op0, 1);
+  if (GET_RTX_LENGTH (GET_CODE (op1)) > 0)
+op10 = XEXP (op1, 0);
+  if (GET_RTX_LENGTH (GET_CODE (op1)) > 1)
+op11 = XEXP (op1, 1);
+  /* On TARGET_64BIT, these 2 RTXs can be converted to INS instruction.
+ (ior:SI (and:SI (subreg:SI (reg/v:DI 200) 0)
+   (const_int 16777215 [0xffffff]))
+(ashift:SI (subreg:SI (reg:QI 205) 0)
+   (const_int 24 [0x18]))
+ )
+ (ior:SI (ashift:SI (subreg:SI (reg:HI 201) 0)
+   (const_int 16 [0x10]))
+(zero_extend:SI (subreg:HI (reg/v:DI 198) 0)))
+  */
+  if (TARGET_64BIT && ISA_HAS_EXT_INS
+  && GET_CODE (x) == IOR && GET_MODE (x) == SImode
+  && GET_MODE (op0) == SImode && GET_MODE (op1) == SImode
+  && ((GET_CODE (op0) == AND && GET_CODE (op1) == ASHIFT
+&& SUBREG_P (op00) && GET_MODE (op00) == SImode
+   && GET_MODE (SUBREG_REG (op00)) == DImode
+   && SUBREG_BYTE (op00) == 0
+&& GET_CODE (op01) == CONST_INT && INTVAL (op01) == 0xffffff
+&& SUBREG_P (op10) && GET_MODE (op10) == SImode
+   && GET_MODE (SUBREG_REG (op10)) == QImode
+   && SUBREG_BYTE (op10) == 0
+&& GET_CODE (op11) == CONST_INT && INTVAL (op11) == 0x18)
+ || (GET_CODE (op0) == ASHIFT && GET_CODE (op1) == ZERO_EXTEND
+   && SUBREG_P (op00) && GET_MODE (op00) == SImode
+  && GET_MODE (SUBREG_REG (op00)) == HImode
+  && SUBREG_BYTE (op00) == 0
+   && GET_CODE (op01) == CONST_INT && INTVAL (op01) == 0x10
+   && SUBREG_P (op10) && GET_MODE (op10) == HImode
+   && GET_MODE (SUBREG_REG (op10)) == DImode
+   && SUBREG_BYTE (op10) == 0)))
+return COSTS_N_INSNS (1);
+
   if (GET_MODE_SIZE (GET_MODE (x)) == UNITS_PER_WORD * 2)
 cost = double_cost;
   

Ping^3: [PATCH] Add a late-combine pass [PR106594]

2023-12-30 Thread Richard Sandiford
Ping^3

---

This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.

The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.  I hope it would
also help with Robin's vec_duplicate testcase, although the
pressure heuristic might need tweaking for that case.

This is just a first step.  I'm hoping that the pass could be
used for other combine-related optimisations in future.  In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure.  If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.

I've run an assembly comparison with one target per CPU directory,
and it seems to be a win for all targets except nvptx (which is hard
to measure, being a higher-level asm).  The biggest winner seemed
to be AVR.

I'd originally hoped to enable the pass by default at -O2 and above
on all targets.  But in the end, I don't think that's possible,
because it interacts badly with x86's STV and partial register
dependency passes.

For example, gcc.target/i386/minmax-6.c tests whether the code
compiles without any spilling.  The RTL created by STV contains:

(insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
(vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(const_int 1 [0x1]))) -1
 (nil))
(insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
(subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
 (expr_list:REG_DEAD (reg:SI 120)
(nil)))
(insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
(reg:SI 118)) -1
 (nil))

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.

That one could be solved by running STV2 later.  But RPAD is
a bigger problem.  In gcc.target/i386/pr87007-5.c, RPAD converts:

(insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
(sqrt:DF (mem/c:DF (symbol_ref:DI ("d2") {*sqrtdf2_sse}
 (nil))

into:

(insn 45 26 44 6 (set (reg:V4SF 108)
(const_vector:V4SF [
(const_double:SF 0.0 [0x0.0p+0]) repeated x4
])) -1
 (nil))
(insn 44 45 27 6 (set (reg:V2DF 109)
(vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI 
("d2")
(subreg:V2DF (reg:V4SF 108) 0)
(const_int 1 [0x1]))) -1
 (nil))
(insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
(subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
 (nil))

But both the pre-RA and post-RA passes are able to combine these
instructions back to the original form.

The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu (as posted, with no regressions, and with the
pass enabled by default, with some gcc.target/i386 regressions).
OK to install?

Richard


gcc/
PR rtl-optimization/106594
* Makefile.in (OBJS): Add late-combine.o.
* common.opt (flate-combine-instructions): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc: Enable it by default
at -O2 and above.
* tree-pass.h (make_pass_late_combine): Declare.
* late-combine.cc: New file.
* passes.def: Add two instances of late_combine.

gcc/testsuite/
PR rtl-optimization/106594
* gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
targets.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
* gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
* gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
* gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
* gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
described in the comment.
* gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
* gcc.target/aarch64/pr106594_1.c: New test.
---
 gcc/Makefile.in   |   1 +
 gcc/common.opt|   5 +
 

Re: [PATCH v1] LoongArch: testsuite:Add the "-ffast-math" compilation option for the file vect-fmin-3.c.

2023-12-30 Thread chenglulu



On 2023/12/30 at 8:25 PM, Xi Ruoyao wrote:
> On Sat, 2023-12-30 at 12:15 +, Richard Sandiford wrote:
> > This shouldn't be necessary.  The test does:
> > 
> >   for (int i = 0; i < n; i += 2)
> >     {
> >       x0 = __builtin_fmin (x0, ptr[i + 0]);
> >       x1 = __builtin_fmin (x1, ptr[i + 1]);
> >     }
> >   res[0] = x0;
> >   res[1] = x1;
> > 
> > __builtin_fmin is an FP minimum operation that corresponds directly to
> > the fmin*3 optab (or reduc_fmin_scal_* for reductions).  It is naturally
> > associative, so doesn't need -ffast-math for that.
> > 
> > Does LoongArch provide reduc_min_scal_* but not reduc_fmin_scal_*?
> > If so, we probably need a new target selector for fmin/fmax reduction.
> 
> Let me check whether the [x]vf{min,max} instructions are IEEE-conformant.  They
> still haven't released volume 2 of the instruction manual, so I can only
> test empirically...

These two instructions comply with the IEEE 754 standard.



Re: [PATCH v1] LoongArch: testsuite:Add the "-ffast-math" compilation option for the file vect-fmin-3.c.

2023-12-30 Thread Xi Ruoyao
On Sat, 2023-12-30 at 20:25 +0800, Xi Ruoyao wrote:
> On Sat, 2023-12-30 at 12:15 +, Richard Sandiford wrote:
> > This shouldn't be necessary.  The test does:
> > 
> >   for (int i = 0; i < n; i += 2)
> >     {
> >       x0 = __builtin_fmin (x0, ptr[i + 0]);
> >       x1 = __builtin_fmin (x1, ptr[i + 1]);
> >     }
> >   res[0] = x0;
> >   res[1] = x1;
> > 
> > __builtin_fmin is an FP minimum operation that corresponds directly to
> > the fmin*3 optab (or reduc_fmin_scal_* for reductions).  It is naturally
> > associative, so doesn't need -ffast-math for that.
> > 
> > Does LoongArch provide reduc_min_scal_* but not reduc_fmin_scal_*?
> > If so, we probably need a new target selector for fmin/fmax reduction.
> 
> Let me check whether the [x]vf{min,max} instructions are IEEE-conformant.  They
> still haven't released volume 2 of the instruction manual, so I can only
> test empirically...

They are conforming (at least on LA464).  I'll make a patch to add
f{min/max} and reduc_f{min/max}_scal_* for LoongArch SIMD.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1] LoongArch: testsuite:Add the "-ffast-math" compilation option for the file vect-fmin-3.c.

2023-12-30 Thread Xi Ruoyao
On Sat, 2023-12-30 at 12:15 +, Richard Sandiford wrote:
> This shouldn't be necessary.  The test does:
> 
>   for (int i = 0; i < n; i += 2)
>     {
>       x0 = __builtin_fmin (x0, ptr[i + 0]);
>       x1 = __builtin_fmin (x1, ptr[i + 1]);
>     }
>   res[0] = x0;
>   res[1] = x1;
> 
> __builtin_fmin is an FP minimum operation that corresponds directly to
> the fmin*3 optab (or reduc_fmin_scal_* for reductions).  It is naturally
> associative, so doesn't need -ffast-math for that.
> 
> Does LoongArch provide reduc_min_scal_* but not reduc_fmin_scal_*?
> If so, we probably need a new target selector for fmin/fmax reduction.

Let me check whether the [x]vf{min,max} instructions are IEEE-conformant.  They
still haven't released volume 2 of the instruction manual, so I can only
test empirically...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Ping: [PATCH] toplevel: don't override gettext-runtime/configure-discovered build args

2023-12-30 Thread Arsen Arsenović
Hi,

Ping on this patch.

TIA, have a lovely day and happy holidays!
--
Arsen Arsenović




Re: [PATCH v1] LoongArch: testsuite:Add the "-ffast-math" compilation option for the file vect-fmin-3.c.

2023-12-30 Thread Richard Sandiford
chenxiaolong  writes:
> After the detection of maximum reduction is enabled on LoongArch architecture,
> the regression test of GCC finds that vect-fmin-3.c fails. Currently, in the
> target-supports.exp file, only aarch64,arm,riscv, and LoongArch architectures
> are supported. Through analysis, the "-ffast-math" compilation option needs to
> be added to the test case in order to successfully reduce using vectorization.
> The original patch was submitted by author Richard Sandiford.
>
> The initial patch information submitted is as follows:
>
> commit e32b9eb32d7cd2d39bf9c70497890ac61b9ee14c
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-fmin-3.c:Adding an extra "-ffast-math" to the
> compilation option ensures that the loop can be reduced to maximum
> success.

This shouldn't be necessary.  The test does:

  for (int i = 0; i < n; i += 2)
    {
      x0 = __builtin_fmin (x0, ptr[i + 0]);
      x1 = __builtin_fmin (x1, ptr[i + 1]);
    }
  res[0] = x0;
  res[1] = x1;

__builtin_fmin is an FP minimum operation that corresponds directly to
the fmin*3 optab (or reduc_fmin_scal_* for reductions).  It is naturally
associative, so doesn't need -ffast-math for that.

Does LoongArch provide reduc_min_scal_* but not reduc_fmin_scal_*?
If so, we probably need a new target selector for fmin/fmax reduction.

Thanks,
Richard


> ---
>  gcc/testsuite/gcc.dg/vect/vect-fmin-3.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c 
> b/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c
> index 2e282ba6878..edef57925c1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-additional-options "-ffast-math" } */
>  
>  #include "tree-vect.h"


Re: Fortran: Use non conflicting file extensions for intermediates [PR81615]

2023-12-30 Thread Thomas Koenig

Replying to myself...



I think this also deserves a mention in changes.html.  Here is something
that I came up with.  OK? Or does anybody have suggestions for a better
wording?



Or maybe this is better:

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 4b83037a..d232f631 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -282,8 +282,14 @@ a work-in-progress.

 

-
-
+Fortran
+
+   With the -save-temps option, preprocessed files
+with the .fii extension will be generated for
+free-form source files such as .F90 and
+.fi for fixed-form files such as .F.
+  
+
 

 





Re: Fortran: Use non conflicting file extensions for intermediates [PR81615]

2023-12-30 Thread Thomas Koenig

Hi Rimvydas,


Documentation part.
The makeinfo gcc/fortran/gfortran.texi does not seem to have any new warnings.


Thanks for your work on this!

I think this also deserves a mention in changes.html.  Here is something
that I came up with.  OK? Or does anybody have suggestions for a better
wording?

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 4b83037a..b3b67dda 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -282,8 +282,13 @@ a work-in-progress.

 

-
-
+Fortran
+
+   With the -save-temps option, .fii and
+.fi files are now generated from .F90
+and .F files, respectively.
+  
+
 

 







Re: [PATCH] libstdc++ testsuite/std/ranges/iota/max_size_type.cc: Reduce /10 for simulators

2023-12-30 Thread Jonathan Wakely
On Sat, 30 Dec 2023, 01:41 Hans-Peter Nilsson,  wrote:

> I'm not completely sure I got the intent of the "log2_limit",
> or whether "limit" is sane to decrease like this; it just
> looked like an obvious and safe reduction.  Also, I verified
> the 10+ minute runtime, on this same host (clocked at 11:43.61
> elapsed time) for a r12-2797-g307e0d40367996 build that I
> happened to have kept around; likely the build that led up
> to that commit.  Now it's 58:45.78 elapsed time for a
> successful run.  Looks like a 5x performance regression.
> Worrisome; PR mentioned below.
>
> Incidentally, a parallel build and a serial test-run takes 9
> hours on that laptop, so that's almost 2 hours just for one
> test, if just updating the timeout to fit.  IOW, currently 48
> minutes out of 9 hours for one test that just times out.
>
> (That was just mentioned for comparison purposes: when suitable,
> I test with `nprocs`-1 in parallel.)
>
> I'll put it on the back-burner to investigate.  I think I'll
> try to graft that version of libstdc++-v3 to this version
>

Unfortunately that will probably be difficult.


> and see if I can shift the blame away from MMIX code
> generation onto libstdc++-v3.  ;)
> Or perhaps the cause is known?
>

Not to me. It probably is a target codegen bug, since all this test really
does is emulate a wide integer type using masks and shifts.


> With this, the test successfully completes in ~34 seconds.
>
> Ok to commit?
>

Looks OK to me, but Patrick wrote this test so please wait for him to
confirm. I think this just reduces the number of cases tested, but doesn't
miss any important edge cases that should be checked.
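
The relationship that the test's static_assert enforces between the two constants can be checked in isolation. The sketch below is plain C rather than the C++ of the test, and ceil_log2 is a hypothetical helper, not something from the testsuite. It shows why a limit of 100 pairs with 7 and a limit of 1000 pairs with 10: each is the smallest shift count whose power of two reaches the limit.

```c
/* Smallest N such that (1 << N) >= LIMIT.  Assumes 1 <= LIMIT and that
   LIMIT is small enough for the shift not to overflow.  */
static int
ceil_log2 (int limit)
{
  int n = 0;
  while ((1 << n) < limit)
    n++;
  return n;
}

/* The same condition as the test's static_assert, for both settings.  */
_Static_assert ((1 << 7) >= 100, "7 bits cover a limit of 100");
_Static_assert ((1 << 10) >= 1000, "10 bits cover a limit of 1000");
```

A smaller value would fail the assertion (1 << 6 is 64, 1 << 9 is 512), so 7 and 10 are the tightest choices for those limits.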




> -- >8 --
> Looks like the MMIX port code quality and/or libstdc++
> performance of this test has regressed since
> r12-2799-ge9b639c4b53221 by a factor 5.  Anyway what was 11+
> minutes runtime then, is now at r14-6859-gd1eacedc6d9ba9
> close to 60 minutes.  Better prune the test, not just
> increase timeouts.  Also of course, investigate the
> performance regression, logged as PR113175.
>
> * testsuite/std/ranges/iota/max_size_type.cc: Adjust
> limits from -1000..1000 to -100..100 for simulators.
> ---
>  .../std/ranges/iota/max_size_type.cc  | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
> b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
> index a1fbc3241dca..38fa6323d47e 100644
> --- a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
> @@ -16,6 +16,7 @@
>  // .
>
>  // { dg-do run { target c++20 } }
> +// { dg-additional-options "-DSIMULATOR_TEST" { target simulator } }
>  // { dg-timeout-factor 4 }
>
>  #include 
> @@ -31,6 +32,14 @@ using signed_rep_t = __int128;
>  using signed_rep_t = long long;
>  #endif
>
> +#ifdef SIMULATOR_TEST
> +#define LIMIT 100
> +#define LOG2_CEIL_LIMIT 7
> +#else
> +#define LIMIT 1000
> +#define LOG2_CEIL_LIMIT 10
> +#endif
> +
>  static_assert(sizeof(max_size_t) == sizeof(max_diff_t));
>  static_assert(sizeof(rep_t) == sizeof(signed_rep_t));
>
> @@ -199,8 +208,8 @@ test02()
>using max_type = std::conditional_t;
>using shorten_type = std::conditional_t;
>const int hw_type_bit_size = sizeof(hw_type) * __CHAR_BIT__;
> -  const int limit = 1000;
> -  const int log2_limit = 10;
> +  const int limit = LIMIT;
> +  const int log2_limit = LOG2_CEIL_LIMIT;
>static_assert((1 << log2_limit) >= limit);
>const int min = (signed_p ? -limit : 0);
>const int max = limit;
> @@ -257,8 +266,8 @@ test03()
>using max_type = std::conditional_t;
>using base_type = std::conditional_t;
>constexpr int hw_type_bit_size = sizeof(hw_type) * __CHAR_BIT__;
> -  constexpr int limit = 1000;
> -  constexpr int log2_limit = 10;
> +  constexpr int limit = LIMIT;
> +  constexpr int log2_limit = LOG2_CEIL_LIMIT;
>static_assert((1 << log2_limit) >= limit);
>const int min = (signed_p ? -limit : 0);
>const int max = limit;
> @@ -312,7 +321,7 @@ test03()
>  void
>  test04()
>  {
> -  constexpr int limit = 1000;
> +  constexpr int limit = LIMIT;
>for (int i = -limit; i <= limit; i++)
>  {
>VERIFY( -max_size_t(-i) == i );
> --
> 2.30.2
>
>


Re: [PATCH] libstdc++ testsuite/20_util/hash/quality.cc: Increase timeout 3x

2023-12-30 Thread Jonathan Wakely
On Sat, 30 Dec 2023, 01:24 Hans-Peter Nilsson,  wrote:

> Tested for mmix and observing the increased timeout in the .log
> file - and the test passing.
>
> Ok to commit?  Or better suggestions?
>

OK to commit, thanks.



> -- >8 --
> Testing for mmix (a 64-bit target using Knuth's simulator).  The test
> is largely pruned for simulators, but still needs 5m57s on my laptop
> from 3.5 years ago to run to successful completion.  Perhaps slow
> hosted targets could also have problems, so increase the timeout
> limit, not just for simulators but for everyone, and by more than a
> factor of 2.
>
> * testsuite/20_util/hash/quality.cc: Increase timeout by a factor
> 3.
> ---
>  libstdc++-v3/testsuite/20_util/hash/quality.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libstdc++-v3/testsuite/20_util/hash/quality.cc
> b/libstdc++-v3/testsuite/20_util/hash/quality.cc
> index 7d4208ed6d21..80efc026 100644
> --- a/libstdc++-v3/testsuite/20_util/hash/quality.cc
> +++ b/libstdc++-v3/testsuite/20_util/hash/quality.cc
> @@ -1,5 +1,6 @@
>  // { dg-options "-DNTESTS=1 -DNSTRINGS=100 -DSTRSIZE=21" { target
> simulator } }
>  // { dg-do run { target c++11 } }
> +// { dg-timeout-factor 3 }
>
>  // Copyright (C) 2010-2023 Free Software Foundation, Inc.
>  //
> --
> 2.30.2
>
>