Re: Repost [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.

2024-02-03 Thread Kewen.Lin
Hi Mike,

on 2024/1/6 07:40, Michael Meissner wrote:
> This patch changes the assembler instruction names for MMA instructions from
> the original name used in power10 to the new name when used with the dense math
> system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
> same bits for either spelling.
> 
> The patches have been tested on both little and big endian systems.  Can I check
> it into the master branch?
> 
> 2024-01-05   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
>   (avvi4i4i8_dm): Likewise.
>   (vvi4i4i2_dm): Likewise.
>   (avvi4i4i2_dm): Likewise.
>   (vvi4i4_dm): Likewise.
>   (avvi4i4_dm): Likewise.
>   (pvi4i2_dm): Likewise.
>   (apvi4i2_dm): Likewise.
>   (vvi4i4i4_dm): Likewise.
>   (avvi4i4i4_dm): Likewise.
>   (mma_): Add support for running on DMF systems, generating the dense
>   math instruction and using the dense math accumulators.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_   (mma_): Likewise.
>   (mma_   (mma_): Likewise.
>   (mma_): Likewise.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/dm-double-test.c: New test.
>   * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
>   target test.
> ---
>  gcc/config/rs6000/mma.md  |  98 +++--
>  .../gcc.target/powerpc/dm-double-test.c   | 194 ++
>  gcc/testsuite/lib/target-supports.exp |  19 ++
>  3 files changed, 299 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-double-test.c
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 525a85146ff..f06e6bbb184 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -227,13 +227,22 @@ (define_int_attr apv[(UNSPEC_MMA_XVF64GERPP 
> "xvf64gerpp")
>  
>  (define_int_attr vvi4i4i8[(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")])
>  
> +(define_int_attr vvi4i4i8_dm [(UNSPEC_MMA_PMXVI4GER8 
> "pmdmxvi4ger8")])

Can we update vvi4i4i8 to

(define_int_attr vvi4i4i8   [(UNSPEC_MMA_PMXVI4GER8 "xvi4ger8")])

instead of introducing vvi4i4i8_dm, so that its uses would look like:

-  " %A0,%x1,%x2,%3,%4,%5"
+  "@
+   pmdm %A0,%x1,%x2,%3,%4,%5
+   pm %A0,%x1,%x2,%3,%4,%5
+   pm %A0,%x1,%x2,%3,%4,%5"

and 

- define_insn "mma_"
+ define_insn "mma_pm"

(or updating its use in corresponding bif expander field)

?  

This comment also applies to the other iterator changes.

> +
>  (define_int_attr avvi4i4i8   [(UNSPEC_MMA_PMXVI4GER8PP   
> "pmxvi4ger8pp")])
>  
> +(define_int_attr avvi4i4i8_dm[(UNSPEC_MMA_PMXVI4GER8PP   
> "pmdmxvi4ger8pp")])
> +
>  (define_int_attr vvi4i4i2[(UNSPEC_MMA_PMXVI16GER2"pmxvi16ger2")
>(UNSPEC_MMA_PMXVI16GER2S   "pmxvi16ger2s")
>(UNSPEC_MMA_PMXVF16GER2"pmxvf16ger2")
>(UNSPEC_MMA_PMXVBF16GER2   
> "pmxvbf16ger2")])
>  
> +(define_int_attr vvi4i4i2_dm [(UNSPEC_MMA_PMXVI16GER2"pmdmxvi16ger2")
> +  (UNSPEC_MMA_PMXVI16GER2S   
> "pmdmxvi16ger2s")
> +  (UNSPEC_MMA_PMXVF16GER2"pmdmxvf16ger2")
> +  (UNSPEC_MMA_PMXVBF16GER2   
> "pmdmxvbf16ger2")])
> +
>  (define_int_attr avvi4i4i2   [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
>(UNSPEC_MMA_PMXVI16GER2SPP 
> "pmxvi16ger2spp")
>(UNSPEC_MMA_PMXVF16GER2PP  "pmxvf16ger2pp")
> @@ -245,25 +254,54 @@ (define_int_attr avvi4i4i2  
> [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
>(UNSPEC_MMA_PMXVBF16GER2NP 
> "pmxvbf16ger2np")
>(UNSPEC_MMA_PMXVBF16GER2NN 
> "pmxvbf16ger2nn")])
>  
> +(define_int_attr avvi4i4i2_dm[(UNSPEC_MMA_PMXVI16GER2PP  
> "pmdmxvi16ger2pp")
> +  (UNSPEC_MMA_PMXVI16GER2SPP 
> "pmdmxvi16ger2spp")
> +  (UNSPEC_MMA_PMXVF16GER2PP  
> "pmdmxvf16ger2pp")
> +  (UNSPEC_MMA_PMXVF16GER2PN  
> "pmdmxvf16ger2pn")
> +  (UNSPEC_MMA_PMXVF16GER2NP  
> "pmdmxvf16ger2np")
> +  (UNSPEC_MMA_PMXVF16GER2NN  
> "pmdmxvf16ger2nn")
> +  (UNSPEC_MMA_PMXVBF16GER2PP 
> "pmdmxvbf16ger2pp")
> +  (UNSPEC_MMA_PMXVBF16GER2PN 
> "pmdmxvbf16ger2pn")
> +  (UNSPEC_MMA_PMXVBF16GER2NP 
> "pmdmxvbf16ger2np")
> +  (UNSPEC_MMA_PMXVBF16GER2NN 
> 

Re: [PATCH v4 5/5] Add documentation for musttail attribute

2024-02-03 Thread Sandra Loosemore

On 2/2/24 02:09, Andi Kleen wrote:

gcc/ChangeLog:

* doc/extend.texi: Document [[musttail]]
---
  gcc/doc/extend.texi | 16 
  1 file changed, 16 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 142e41ab8fbf..866f6c4a9fed 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9875,6 +9875,22 @@ foo (int x, int y)
  @code{y} is not actually incremented and the compiler can but does not
  have to optimize it to just @code{return 42 + 42;}.
  
+@cindex @code{musttail} statement attribute

+@item musttail
+
+The @code{gnu::musttail} or @code{clang::musttail} attribute
+can be applied to a return statement that returns the value
+of a call to indicate that the call must be a tail call
+that does not allocate extra stack space.


It took me about 3 attempts to parse this.  :-S  I think this might be a 
little better:


...can be applied to a @code{return} statement with a return-value 
expression that is a function call.  It asserts that the call must be a 
tail call that does not allocate extra stack space.



+
+@smallexample
+[[gnu::musttail]] return foo();
+@end smallexample
+
+If the compiler cannot generate a tail call it will generate


s/will generate/generates/

I'm a big fan of writing in the present tense.  ;-)


+an error. Tail calls generally require enabling optimization.
+On some targets they may not be supported.
+
  @end table
  
  @node Attribute Syntax


In addition to these changes, at the beginning of this section we have

@node Statement Attributes
@section Statement Attributes
@cindex Statement Attributes

GCC allows attributes to be set on null statements.  @xref{Attribute Syntax},
for details of the exact syntax for using attributes. [...]

Well, we now have an attribute that goes on a non-null statement, so we 
have to fix this.  The documentation for the other statement attributes 
is already explicit that they go on null statements so those already 
would be OK if we just removed the "null" restriction here.  OTOH, the 
Attribute Syntax section, in discussing GCC's traditional attribute 
syntax, says:


@subsubheading Statement Attributes
In GNU C, an attribute specifier list may appear as part of a null
statement.  The attribute goes before the semicolon.

If "musttail" is only supported in the standard attribute syntax, its 
new entry in the Statement Attributes node must say that, and the blurb 
at the top of the node quoted above must say something to the effect 
that the traditional syntax only allows statement attributes on null 
statements and attributes on non-null statements are only permitted in 
the new standard attribute form.


-Sandra



[r14-8768 Regression] FAIL: libgomp.fortran/non-rectangular-loop-1.f90 -O1 execution test on Linux/x86_64

2024-02-03 Thread haochen.jiang
On Linux/x86_64,

85094e2aa6dba7908f053046f02dd443e8f65d72 is the first bad commit
commit 85094e2aa6dba7908f053046f02dd443e8f65d72
Author: Tamar Christina 
Date:   Fri Feb 2 23:52:27 2024 +

middle-end: check memory accesses in the destination block [PR113588].

caused

FAIL: libgomp.fortran/non-rectangular-loop-6.f90   -O1  execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-8768/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/non-rectangular-loop-6.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/non-rectangular-loop-6.f90 
--target_board='unix{-m64}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: Repost [PATCH 4/6] PowerPC: Make MMA insns support DMR registers.

2024-02-03 Thread Kewen.Lin
Hi Mike,

on 2024/1/6 07:39, Michael Meissner wrote:
> This patch changes the MMA instructions to use either FPR registers
> (-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
> instruction names are used.
> 
> A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
> 
> The patches have been tested on both little and big endian systems.  Can I check
> it into the master branch?
> 
> 2024-01-05   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/mma.md (mma_): New define_expand to handle
>   mma_ for dense math and non dense math.
>   (mma_ insn): Restrict to non dense math.
>   (mma_xxsetaccz): Convert to define_expand to handle non dense math and
>   dense math.
>   (mma_xxsetaccz_vsx): Rename from mma_xxsetaccz and restrict usage to non
>   dense math.
>   (mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
>   (mma_): Add support for dense math.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   (mma_): Likewise.
>   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
>   __PPC_DMR__ if we have dense math instructions.
>   * config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
>   dense math and only FPRs if not dense math.
>   (rs6000_split_multireg_move): Do not generate the xxmtacc instruction to
>   prime the DMR registers or the xxmfacc instruction to de-prime
>   instructions if we have dense math register support.
> ---
>  gcc/config/rs6000/mma.md  | 247 +-
>  gcc/config/rs6000/rs6000-c.cc |   3 +
>  gcc/config/rs6000/rs6000.cc   |  35 ++---
>  3 files changed, 176 insertions(+), 109 deletions(-)
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index bb898919ab5..525a85146ff 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -559,190 +559,249 @@ (define_insn "*mma_disassemble_acc_dm"
>"dmxxextfdmr256 %0,%1,2"
>[(set_attr "type" "mma")])
>  
> -(define_insn "mma_"
> +;; MMA instructions that do not use their accumulators as an input, still 
> must
> +;; not allow their vector operands to overlap the registers used by the
> +;; accumulator.  We enforce this by marking the output as early clobber.  If 
> we
> +;; have dense math, we don't need the whole prime/de-prime action, so just 
> make
> +;; thse instructions be NOPs.

typo: thse.

> +
> +(define_expand "mma_"
> +  [(set (match_operand:XO 0 "register_operand")
> + (unspec:XO [(match_operand:XO 1 "register_operand")]

s/register_operand/accumulator_operand/?

> +MMA_ACC))]
> +  "TARGET_MMA"
> +{
> +  if (TARGET_DENSE_MATH)
> +{
> +  if (!rtx_equal_p (operands[0], operands[1]))
> + emit_move_insn (operands[0], operands[1]);
> +  DONE;
> +}
> +
> +  /* Generate the prime/de-prime code.  */
> +})
> +
> +(define_insn "*mma_"

Maybe better to name it "*mma__nodm"?

>[(set (match_operand:XO 0 "fpr_reg_operand" "=")
>   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
>   MMA_ACC))]
> -  "TARGET_MMA"
> +  "TARGET_MMA && !TARGET_DENSE_MATH"

I found that "TARGET_MMA && !TARGET_DENSE_MATH" is used a lot (for example in the
changes to rs6000_split_multireg_move in this patch and in some places in previous
patches), so maybe we can introduce a macro named TARGET_MMA_NODM as shorthand for it?

>" %A0"
>[(set_attr "type" "mma")])
>  
>  ;; We can't have integer constants in XOmode so we wrap this in an
> -;; UNSPEC_VOLATILE.
> +;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't 
> need
> +;; to disable optimization and we can do a normal UNSPEC.
>  
> -(define_insn "mma_xxsetaccz"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +(define_expand "mma_xxsetaccz"
> +  [(set (match_operand:XO 0 "register_operand")

s/register_operand/accumulator_operand/?

>   (unspec_volatile:XO [(const_int 0)]
>   UNSPECV_MMA_XXSETACCZ))]
>"TARGET_MMA"
> +{
> +  if (TARGET_DENSE_MATH)
> +{
> +  emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
> +  DONE;
> +}
> +})
> +
> +(define_insn "*mma_xxsetaccz_vsx"

s/vsx/nodm/

> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> + (unspec_volatile:XO [(const_int 0)]
> + UNSPECV_MMA_XXSETACCZ))]
> +  "TARGET_MMA && !TARGET_DENSE_MATH"
>"xxsetaccz %A0"
>[(set_attr "type" "mma")])
>  
> +
> +(define_insn "mma_xxsetaccz_dm"
> +  [(set (match_operand:XO 0 "dmr_operand" "=wD")
> + (unspec:XO [(const_int 0)]
> +UNSPECV_MMA_XXSETACCZ))]
> +  "TARGET_DENSE_MATH"
> +  "dmsetdmrz %0"
> +  [(set_attr "type" "mma")])
> +
>  (define_insn "mma_"
> -  

Re: [PATCH] LoongArch: Fix wrong LSX FP vector negation

2024-02-03 Thread chenglulu



On 2024/2/3 at 4:58 PM, Xi Ruoyao wrote:

We expanded (neg x) to (minus const0 x) for LSX FP vectors.  This is
wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
fail when Python is built with LSX enabled.

Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead.  We are already doing this for LASX and now we can unify them
into simd.md.
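
The difference between the two negation strategies can be sketched in scalar C. This is only an illustration under stated assumptions: the helper names neg_by_sub and neg_by_bitflip are hypothetical, doubles are assumed to be IEEE 754 binary64, and the code models a single vector element rather than the actual LSX instructions.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: negate by subtracting from +0.0, which is the
   shape of the removed expander.  For x == +0.0 the result is +0.0
   under the default rounding mode, not -0.0.  */
static double neg_by_sub(double x)
{
  return 0.0 - x;
}

/* Hypothetical helper: negate by flipping bit 63, a scalar model of what
   a per-element sign-bit flip does.  Assumes IEEE 754 binary64.  */
static double neg_by_bitflip(double x)
{
  uint64_t bits;

  assert(sizeof(double) == sizeof(uint64_t));
  memcpy(&bits, &x, sizeof bits);
  bits ^= UINT64_C(1) << 63;   /* toggle only the sign bit */
  memcpy(&x, &bits, sizeof bits);
  return x;
}
```

Calling signbit() on neg_by_sub(0.0) reports a clear sign bit, while signbit() on neg_by_bitflip(0.0) reports a set one, which is the behavior a true negation needs.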

gcc/ChangeLog:

* config/loongarch/lsx.md (neg2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?


LGTM!

Thanks!



  gcc/config/loongarch/lasx.md | 16 
  gcc/config/loongarch/lsx.md  | 11 ---
  gcc/config/loongarch/simd.md | 18 ++
  3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e2115ffb884..ac84db7f0ce 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -3028,22 +3028,6 @@ (define_insn "absv8sf2"
[(set_attr "type" "simd_logic")
 (set_attr "mode" "V8SF")])
  
-(define_insn "negv4df2"

-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-   (neg:V4DF (match_operand:V4DF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.d\t%u0,%u1,63"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "negv8sf2"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-   (neg:V8SF (match_operand:V8SF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.w\t%u0,%u1,31"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V8SF")])
-
  (define_insn "xvfmadd4"
[(set (match_operand:FLASX 0 "register_operand" "=f")
(fma:FLASX (match_operand:FLASX 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 7002edae4d4..b9b94b9079c 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -728,17 +728,6 @@ (define_expand "neg2"
DONE;
  })
  
-(define_expand "neg2"

-  [(set (match_operand:FLSX 0 "register_operand")
-   (neg:FLSX (match_operand:FLSX 1 "register_operand")))]
-  "ISA_HAS_LSX"
-{
-  rtx reg = gen_reg_rtx (mode);
-  emit_move_insn (reg, CONST0_RTX (mode));
-  emit_insn (gen_sub3 (operands[0], reg, operands[1]));
-  DONE;
-})
-
  (define_expand "lsx_vrepli"
[(match_operand:ILSX 0 "register_operand")
 (match_operand 1 "const_imm10_operand")]
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index cb0a19447a1..00ff2823a4e 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -85,12 +85,21 @@ (define_mode_attr simdfmt [(V2DF "d") (V4DF "d")
  (define_mode_attr simdifmt_for_f [(V2DF "l") (V4DF "l")
  (V4SF "w") (V8SF "w")])
  
+;; Suffix for integer mode in LSX or LASX instructions to operating FP

+;; vectors using integer vector operations.
+(define_mode_attr simdfmt_as_i [(V2DF "d") (V4DF "d")
+   (V4SF "w") (V8SF "w")])
+
  ;; Size of vector elements in bits.
  (define_mode_attr elmbits [(V2DI "64") (V4DI "64")
   (V4SI "32") (V8SI "32")
   (V8HI "16") (V16HI "16")
   (V16QI "8") (V32QI "8")])
  
+;; The index of sign bit in FP vector elements.

+(define_mode_attr elmsgnbit [(V2DF "63") (V4DF "63")
+(V4SF "31") (V8SF "31")])
+
  ;; This attribute is used to form an immediate operand constraint using
  ;; "const__operand".
  (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
@@ -457,6 +466,15 @@ (define_expand "reduc__scal_"
DONE;
  })
  
+;; FP negation.

+(define_insn "neg2"
+  [(set (match_operand:FVEC 0 "register_operand" "=f")
+   (neg:FVEC (match_operand:FVEC 1 "register_operand" "f")))]
+  ""
+  "vbitrevi.\t%0,%1,"
+  [(set_attr "type" "simd_logic")
+   (set_attr "mode" "")])
+
  ; The LoongArch SX Instructions.
  (include "lsx.md")
  




Re: [PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns

2024-02-03 Thread chenglulu



On 2024/2/2 at 5:55 PM, Xi Ruoyao wrote:

We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
But in loongarch_symbol_insns:

 if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
   return 0;

And LSX_SUPPORTED_MODE_P is defined as:

 #define LSX_SUPPORTED_MODE_P(MODE) \
   (ISA_HAS_LSX \
&& GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...

GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:

 ALWAYS_INLINE poly_uint16
 mode_to_bytes (machine_mode mode)
 {
 #if GCC_VERSION >= 4001
   return (__builtin_constant_p (mode)
  ? mode_size_inline (mode) : mode_size[mode]);
 #else
   return mode_size[mode];
 #endif
 }

There is an assertion in mode_size_inline:

 gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);

Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
thus if __builtin_constant_p (mode) is evaluated true (it happens when
GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
cause an ICE.  OTOH if __builtin_constant_p (mode) is evaluated false,
mode_size[mode] is still an out-of-bound array access (the length of the
mode_size array is NUM_MACHINE_MODES).

So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
MAX_MACHINE_MODE in loongarch_symbol_insns.  This is very similar to a
MIPS bug PR98491 fixed by me about 3 years ago.
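
The guard described above can be sketched in isolation. Everything in this block is a hypothetical stand-in (toy_mode, toy_mode_size, toy_vector_mode_p, toy_symbol_is_unsupported); none of it is GCC's real machine_mode machinery. It only shows why the sentinel has to be tested before the size lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for machine_mode and mode_size[].  TOY_MAX_MODE
   plays the role of MAX_MACHINE_MODE: a sentinel equal to the table length.  */
enum toy_mode { TOY_SI, TOY_DI, TOY_V4SI, TOY_MAX_MODE };

static const unsigned char toy_mode_size[TOY_MAX_MODE] = { 4, 8, 16 };

#define TOY_VECTOR_REG_BYTES 16

/* Looks up the size of MODE.  Passing TOY_MAX_MODE would index one past
   the end of the table, so the assertion rejects it.  */
static bool toy_vector_mode_p(enum toy_mode mode)
{
  assert((unsigned) mode < TOY_MAX_MODE);
  return toy_mode_size[mode] == TOY_VECTOR_REG_BYTES;
}

/* Same shape as the fix: compare against the sentinel first, so that &&
   short-circuits and the lookup never sees TOY_MAX_MODE.  */
static bool toy_symbol_is_unsupported(enum toy_mode mode)
{
  return mode != TOY_MAX_MODE && toy_vector_mode_p(mode);
}
```

With this ordering, toy_symbol_is_unsupported(TOY_MAX_MODE) returns false without touching the table, while real modes are still classified by size.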

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
MAX_MACHINE_MODE.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?


LGTM!

I have a question. I see that you often add compilation options in
BOOT_CFLAGS. I would like to test this as well. Do you have a recommended
set of compilation options?


Thanks!



  gcc/config/loongarch/loongarch.cc | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 963e86d61af..6badef45d62 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2007,7 +2007,8 @@ loongarch_symbol_insns (enum loongarch_symbol_type type, 
machine_mode mode)
  {
/* LSX LD.* and ST.* cannot support loading symbols via an immediate
   operand.  */
-  if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
+  if (mode != MAX_MACHINE_MODE
+  && (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)))
  return 0;
  
switch (type)




Re: Re: [PATCH] RISC-V: Allow LICM hoist POLY_INT configuration code sequence

2024-02-03 Thread juzhe.zh...@rivai.ai
Hi Kito, Robin, and Jeff,

I haven't committed this patch yet because I found that it causes an ICE:

during RTL pass: loop2_unroll
dump file: bug.c.286r.loop2_unroll
bug.c: In function 'crashIt':
bug.c:23:1: internal compiler error: in decompose, at wide-int.h:1049
   23 | }
  | ^
0x1043946 wi::int_traits > 
>::decompose(long*, unsigned int, generic_wide_int > const&)
../../../../gcc/gcc/wide-int.h:1049
0x1043a80 wide_int_ref_storage::wide_int_ref_storage > >(generic_wide_int > const&, 
unsigned int)
../../../../gcc/gcc/wide-int.h:1099
0x1042f72 generic_wide_int 
>::generic_wide_int > 
>(generic_wide_int > const&, unsigned int)
../../../../gcc/gcc/wide-int.h:855
0x145b5d0 wi::binary_traits 
>, generic_wide_int >, 
wi::int_traits > 
>::precision_type, wi::int_traits > >::precision_type>::result_type 
wi::add >, 
generic_wide_int > 
>(generic_wide_int > const&, 
generic_wide_int > const&)
../../../../gcc/gcc/wide-int.h:2872
0x1458439 wi::binary_traits 
>, generic_wide_int >, 
wi::int_traits > 
>::precision_type, wi::int_traits > >::precision_type>::operator_result 
operator+ >, 
generic_wide_int > 
>(generic_wide_int > const&, 
generic_wide_int > const&)
../../../../gcc/gcc/wide-int.h:3857
0x195f866 poly_int<2u, poly_result >, generic_wide_int >, 
poly_coeff_pair_traits >, 
generic_wide_int > >::result_kind>::type> 
operator+<2u, generic_wide_int >, 
generic_wide_int > >(poly_int<2u, 
generic_wide_int > > const&, poly_int<2u, 
generic_wide_int > > const&)
../../../../gcc/gcc/poly-int.h:772
0x194d423 simplify_const_binary_operation(rtx_code, machine_mode, rtx_def*, 
rtx_def*)
../../../../gcc/gcc/simplify-rtx.cc:5392
0x1940374 simplify_context::simplify_binary_operation(rtx_code, machine_mode, 
rtx_def*, rtx_def*)
../../../../gcc/gcc/simplify-rtx.cc:2664
0x1936e62 simplify_context::simplify_gen_binary(rtx_code, machine_mode, 
rtx_def*, rtx_def*)
../../../../gcc/gcc/simplify-rtx.cc:182
0x11b43f6 simplify_gen_binary(rtx_code, machine_mode, rtx_def*, rtx_def*)
../../../../gcc/gcc/rtl.h:3529
0x16c0e35 get_biv_step_1
../../../../gcc/gcc/loop-iv.cc:788
0x16c0c97 get_biv_step_1
../../../../gcc/gcc/loop-iv.cc:758
0x16c0f68 get_biv_step
../../../../gcc/gcc/loop-iv.cc:828
0x16c1390 iv_analyze_biv
../../../../gcc/gcc/loop-iv.cc:921
0x16c1e7d iv_analyze_op
../../../../gcc/gcc/loop-iv.cc:1187
0x16c1d71 iv_analyze_op
../../../../gcc/gcc/loop-iv.cc:1157
0x16c15e0 iv_analyze_expr(rtx_insn*, scalar_int_mode, rtx_def*, rtx_iv*)
../../../../gcc/gcc/loop-iv.cc:976
0x16c1757 iv_analyze_expr(rtx_insn*, scalar_int_mode, rtx_def*, rtx_iv*)
../../../../gcc/gcc/loop-iv.cc:1020
0x16c1757 iv_analyze_expr(rtx_insn*, scalar_int_mode, rtx_def*, rtx_iv*)
../../../../gcc/gcc/loop-iv.cc:1020
0x16c1b83 iv_analyze_def
../../../../gcc/gcc/loop-iv.cc:1115

To reproduce this ICE, compile the following test case with the options
-O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions:

typedef unsigned short (FUNC_P) (void *, unsigned char *, unsigned short);

void crashIt(int id, FUNC_P *func, unsigned char *funcparm)
{
  unsigned char buff[5], reverse[4];
  unsigned char *bp = buff;
  unsigned char *rp = reverse;
  unsigned short int count = 0;
  unsigned short cnt;
  while (id > 0)
{
  *rp++ = (unsigned char) (id & 0x7F);
  id >>= 7;
  count++;
}
  cnt = count + 1;
  while ((count--) > 1)
{
  *bp++ = (unsigned char)(*(--rp) | 0x80);
}
  *bp++ = *(--rp);
  (void)(*func)(funcparm, buff, cnt);
}

The root cause is the following RTL pattern, after fwprop1:

(insn 82 78 84 9 (set (reg:DI 230)
(sign_extend:DI (minus:SI (subreg/s/v:SI (reg:DI 150 [ niters.10 ]) 0)
(subreg:SI (reg:DI 221) 0 13 {subsi3_extended}
 (expr_list:REG_EQUAL (sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 150 [ 
niters.10 ]) 0)
(const_poly_int:SI [-16, -16])))
(nil)))

The highlighted (const_poly_int:SI [-16, -16]) causes the ICE.

This RTL arises because:
(insn 69 68 71 8 (set (reg:DI 221)
(const_poly_int:DI [16, 16])) 208 {*movdi_64bit}
 (nil))
(insn 82 78 84 9 (set (reg:DI 230)
(sign_extend:DI (minus:SI (subreg/s/v:SI (reg:DI 150 [ niters.10 ]) 0)
(subreg:SI (reg:DI 221) 0 13 {subsi3_extended}  
> (subreg:SI (const_poly_int:SI [-16, -16])) 
fwprop1 add  (const_poly_int:SI [-16, -16]) reg_equal
 (expr_list:REG_EQUAL (sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 150 [ 
niters.10 ]) 0)
(const_poly_int:SI [-16, -16])))
(nil)))

Previously, we were doing:

(set (subreg:DI (reg:SI)  (DI: poly value)). --> outer mode bigger than inner
mode in dest operand.

We never had (subreg: (poly_value)), so we did not hit the ICE. However, I don't
think our previous approach was correct.

Actually, I believe we should apply this 

[PATCH v2] LoongArch: libsanitizer: Enable Lsan and Tsan for loongarch64.

2024-02-03 Thread Lulu Cheng
From: chenguoqi 

libsanitizer/ChangeLog:

* configure.tgt: Enable tsan and lsan for loongarch64.
* tsan/Makefile.am (EXTRA_libtsan_la_SOURCES): Add
tsan_rtl_loongarch64.S.
* tsan/Makefile.in: Regenerate.
---
 libsanitizer/configure.tgt| 5 +
 libsanitizer/tsan/Makefile.am | 2 +-
 libsanitizer/tsan/Makefile.in | 3 ++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index 38fc7001ff7..77a0e68222b 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -79,6 +79,11 @@ case "${target}" in
fi
;;
   loongarch64-*-linux*)
+   if test x$ac_cv_sizeof_void_p = x8; then
+   TSAN_SUPPORTED=yes
+   LSAN_SUPPORTED=yes
+   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_loongarch64.lo
+   fi
;;
   *)
UNSUPPORTED=1
diff --git a/libsanitizer/tsan/Makefile.am b/libsanitizer/tsan/Makefile.am
index cb8bf2e705e..e8fca16be5f 100644
--- a/libsanitizer/tsan/Makefile.am
+++ b/libsanitizer/tsan/Makefile.am
@@ -50,7 +50,7 @@ tsan_files = \
tsan_vector_clock.cpp
 
 libtsan_la_SOURCES = $(tsan_files)
-EXTRA_libtsan_la_SOURCES = tsan_rtl_amd64.S tsan_rtl_aarch64.S 
tsan_rtl_mips64.S tsan_rtl_ppc64.S tsan_rtl_s390x.S tsan_rtl_riscv64.S
+EXTRA_libtsan_la_SOURCES = tsan_rtl_amd64.S tsan_rtl_aarch64.S 
tsan_rtl_loongarch64.S tsan_rtl_mips64.S tsan_rtl_ppc64.S tsan_rtl_s390x.S 
tsan_rtl_riscv64.S
 libtsan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la 
$(top_builddir)/interception/libinterception.la $(TSAN_TARGET_DEPENDENT_OBJECTS)
 libtsan_la_DEPENDENCIES = 
$(top_builddir)/sanitizer_common/libsanitizer_common.la 
$(top_builddir)/interception/libinterception.la $(TSAN_TARGET_DEPENDENT_OBJECTS)
 if LIBBACKTRACE_SUPPORTED
diff --git a/libsanitizer/tsan/Makefile.in b/libsanitizer/tsan/Makefile.in
index 5cc6f95a40a..5bbdf3915b8 100644
--- a/libsanitizer/tsan/Makefile.in
+++ b/libsanitizer/tsan/Makefile.in
@@ -456,7 +456,7 @@ tsan_files = \
tsan_vector_clock.cpp
 
 libtsan_la_SOURCES = $(tsan_files)
-EXTRA_libtsan_la_SOURCES = tsan_rtl_amd64.S tsan_rtl_aarch64.S 
tsan_rtl_mips64.S tsan_rtl_ppc64.S tsan_rtl_s390x.S tsan_rtl_riscv64.S
+EXTRA_libtsan_la_SOURCES = tsan_rtl_amd64.S tsan_rtl_aarch64.S 
tsan_rtl_loongarch64.S tsan_rtl_mips64.S tsan_rtl_ppc64.S tsan_rtl_s390x.S 
tsan_rtl_riscv64.S
 libtsan_la_LIBADD =  \
$(top_builddir)/sanitizer_common/libsanitizer_common.la \
$(top_builddir)/interception/libinterception.la \
@@ -614,6 +614,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/tsan_rtl_aarch64.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tsan_rtl_access.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tsan_rtl_amd64.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/tsan_rtl_loongarch64.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tsan_rtl_mips64.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tsan_rtl_mutex.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tsan_rtl_ppc64.Plo@am__quote@
-- 
2.39.3



Re: [PATCH] LoongArch: libsanitizer: Enable build lsan and tsan for loongarch64.

2024-02-03 Thread chenglulu



On 2024/2/2 at 6:01 PM, Jakub Jelinek wrote:

On Tue, Jan 30, 2024 at 10:09:51AM +0800, Lulu Cheng wrote:

From: chenguoqi 

libsanitizer/ChangeLog:

* configure.tgt: Enable tsan and lsan for loongarch64.
* tsan/Makefile.am: Add tsan_rtl_loongarch64.S to 
EXTRA_libtsan_la_SOURCES.

This line is too long and should read
* tsan/Makefile.am (EXTRA_libtsan_la_SOURCES): Add
tsan_rtl_loongarch64.S.


* tsan/Makefile.in: Regenerate.

Otherwise LGTM.

Jakub


Thanks for your review.

I will send a patch for the V2 version immediately.




[committed] d: Merge dmd, druntime a6f1083699, phobos 31dedd7da

2024-02-03 Thread Iain Buclaw
Hi,

This patch merges the D front-end and runtime library with upstream dmd
a6f1083699, and the standard library with phobos 31dedd7da.

D front-end changes:

- Import dmd v2.107.0.
- Character postfixes can now also be used for integers of size
  two or four.

D run-time changes:

- Import druntime v2.107.0.

Phobos changes:

- Import phobos v2.107.0.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd a6f1083699.
* dmd/VERSION: Bump version to v2.107.0
* Make-lang.in (D_FRONTEND_OBJS): Add d/pragmasem.o.
* d-builtins.cc (strip_type_modifiers): Update for new front-end
interface.
* d-codegen.cc (declaration_type): Likewise.
(parameter_type): Likewise.
* d-target.cc (TargetCPP::parameterType): Likewise.
* expr.cc (ExprVisitor::visit (IndexExp *)): Likewise.
(ExprVisitor::visit (VarExp *)): Likewise.
(ExprVisitor::visit (AssocArrayLiteralExp *)): Likewise.
* runtime.cc (get_libcall_type): Likewise.
* typeinfo.cc (TypeInfoVisitor::visit (TypeInfoConstDeclaration *)):
Likewise.
(TypeInfoVisitor::visit (TypeInfoInvariantDeclaration *)): Likewise.
(TypeInfoVisitor::visit (TypeInfoSharedDeclaration *)): Likewise.
(TypeInfoVisitor::visit (TypeInfoWildDeclaration *)): Likewise.
* types.cc (build_ctype): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime a6f1083699.
* src/MERGE: Merge upstream phobos 31dedd7da.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
 gcc/d/Make-lang.in|1 +
 gcc/d/d-builtins.cc   |2 +-
 gcc/d/d-codegen.cc|4 +-
 gcc/d/d-target.cc |4 +-
 gcc/d/dmd/MERGE   |2 +-
 gcc/d/dmd/README.md   |1 +
 gcc/d/dmd/VERSION |2 +-
 gcc/d/dmd/constfold.d |6 +-
 gcc/d/dmd/cparse.d|2 +-
 gcc/d/dmd/ctfeexpr.d  |2 +-
 gcc/d/dmd/dcast.d |   20 +-
 gcc/d/dmd/dclass.d|1 +
 gcc/d/dmd/declaration.h   |1 -
 gcc/d/dmd/denum.d |7 +-
 gcc/d/dmd/dinterpret.d|   43 +-
 gcc/d/dmd/dmangle.d   |   20 +-
 gcc/d/dmd/dsymbol.h   |2 +-
 gcc/d/dmd/dsymbolsem.d| 1888 ++---
 gcc/d/dmd/dtemplate.d |  759 +--
 gcc/d/dmd/dtoh.d  |1 +
 gcc/d/dmd/enumsem.d   |6 +
 gcc/d/dmd/expression.d|3 +-
 gcc/d/dmd/expression.h|3 +-
 gcc/d/dmd/expressionsem.d |   31 +-
 gcc/d/dmd/func.d  |  172 +-
 gcc/d/dmd/funcsem.d   | 1150 ++
 gcc/d/dmd/hdrgen.d|3 +-
 gcc/d/dmd/initsem.d   |   86 +-
 gcc/d/dmd/mtype.d |  353 +--
 gcc/d/dmd/mtype.h |   26 +-
 gcc/d/dmd/opover.d|1 +
 gcc/d/dmd/optimize.d  |3 +-
 gcc/d/dmd/pragmasem.d |  650 ++
 gcc/d/dmd/scope.h |2 +-
 gcc/d/dmd/semantic2.d |   23 +-
 gcc/d/dmd/sideeffect.d|   10 +
 gcc/d/dmd/statementsem.d  |  181 +-
 gcc/d/dmd/templatesem.d   |  909 +++-
 gcc/d/dmd/typesem.d   |  304 ++-
 gcc/d/dmd/utils.d |   41 +
 gcc/d/expr.cc |9 +-
 gcc/d/runtime.cc  |6 +-
 gcc/d/typeinfo.cc |8 +-
 gcc/d/types.cc|2 +-
 gcc/testsuite/gdc.test/compilable/ddoc4162.d  |2 +-
 gcc/testsuite/gdc.test/compilable/ddoc5446.d  |2 +-
 gcc/testsuite/gdc.test/compilable/ddoc7795.d  |2 +-
 .../compilable/{ddoc12.d => ddoc_bom_UTF8.d}  |0
 gcc/testsuite/gdc.test/compilable/test24338.d |   10 +
 .../gdc.test/fail_compilation/discard_value.d |   34 +
 .../gdc.test/fail_compilation/fail12390.d |   16 -
 .../gdc.test/fail_compilation/gag4269a.d  |2 +-
 .../gdc.test/fail_compilation/gag4269b.d  |2 +-
 .../gdc.test/fail_compilation/gag4269c.d  |2 +-
 .../gdc.test/fail_compilation/gag4269d.d  |2 +-
 .../gdc.test/fail_compilation/gag4269e.d  |2 +-
 .../gdc.test/fail_compilation/gag4269f.d  |2 +-
 .../gdc.test/fail_compilation/gag4269g.d  |2 +-
 

Re: [PATCH]middle-end: check memory accesses in the destination block [PR113588].

2024-02-03 Thread Sam James


Toon Moene  writes:

> On 2/1/24 22:33, Tamar Christina wrote:
>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu no 
>> issues.
>> Also checked both with --enable-lto --with-build-config='bootstrap-O3 
>> bootstrap-lto' --enable-multilib
>> and --enable-lto --with-build-config=bootstrap-O3 
>> --enable-checking=release,yes,rtl,extra;
>> and checked the libcrypt testsuite as reported on PR113467.
>
> Note that I still run into problems if bootstrapping
> --with-build-config=bootstrap-O3
> (https://gcc.gnu.org/pipermail/gcc-testresults/2024-February/806840.html),
> but it is not visible.
>
> That is because it happens in the test suite of gmp, which I build
> locally as part of the build.
>
> It *is* visible in the full log of the bootstrap:

Can you file a bug please? The GMP test suite passes for me. It sounds
like it _might_ be PR113576?

> [...]

thanks,
sam


[committed] Fix xfail for 32-bit hppa*-*-* in gcc.dg/pr84877.c

2024-02-03 Thread John David Anglin
Tested on hppa-unknown-linux-gnu.  Committed to trunk.

Dave
---

Fix xfail for 32-bit hppa*-*-* in gcc.dg/pr84877.c

2024-02-03  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/pr84877.c: Adjust xfail parentheses.

diff --git a/gcc/testsuite/gcc.dg/pr84877.c b/gcc/testsuite/gcc.dg/pr84877.c
index 68681206e73..e82991f42dd 100644
--- a/gcc/testsuite/gcc.dg/pr84877.c
+++ b/gcc/testsuite/gcc.dg/pr84877.c
@@ -1,4 +1,4 @@
-/* { dg-do run { xfail { cris-*-* sparc*-*-* } || { { ! lp64 } && hppa*-*-* } 
} } */
+/* { dg-do run { xfail { { cris-*-* sparc*-*-* } || { { ! lp64 } && hppa*-*-* 
} } } } */
 /* { dg-options "-O2" } */
 
 #include 




[PATCH 1/2 v2] libdecnumber: fixed multiple potential access-out-of bounds errors by moving range conditions before reads.

2024-02-03 Thread Ian McCormack
Multiple `for` loops across `libdecnumber` contain boolean expressions where 
memory is accessed prior to checking if the pointer is still within a valid 
range, which can lead to out-of-bounds reads.

This patch moves the range conditions to appear before the memory accesses in 
each conjunction so that these expressions short-circuit instead of performing 
an invalid read. 
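For illustration only (this is not code from the patch): a minimal sketch of the same reordering, assuming a hypothetical helper `significant_units` that strips trailing zero units from a buffer.  Because `&&` short-circuits, putting the range test first means the element is only read while the index is known to be valid.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of libdecnumber: returns how many units
// remain once zero units at the most significant end have been stripped.
std::size_t significant_units(const unsigned char *units, std::size_t len)
{
  // Range check first: units[len - 1] is evaluated only when len > 0,
  // so an empty or all-zero buffer never causes a read outside of it.
  while (len > 0 && units[len - 1] == 0)
    --len;
  return len;
}
```

Writing the condition the other way round (`units[len - 1] == 0 && len > 0`) would read one element before the buffer once `len` reaches zero, which is the pattern the patch removes.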

libdecnumber/ChangeLog
   * In each `for` loop and `if` statement, all boolean expressions of the 
form `*ptr == value && in_range(ptr)` have been changed to `in_range(ptr) && 
*ptr == value`.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.
---
 libdecnumber/decBasic.c  | 20 ++--
 libdecnumber/decCommon.c |  2 +-
 libdecnumber/decNumber.c |  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/libdecnumber/decBasic.c b/libdecnumber/decBasic.c
index 6319f66b25d..04833f2390d 100644
--- a/libdecnumber/decBasic.c
+++ b/libdecnumber/decBasic.c
@@ -341,7 +341,7 @@ static decFloat * decDivide(decFloat *result, const 
decFloat *dfl,
 for (;;) {   /* inner loop -- calculate quotient unit */
   /* strip leading zero units from acc (either there initially or */
   /* from subtraction below); this may strip all if exactly 0 */
-  for (; *msua==0 && msua>=lsua;) msua--;
+  for (; msua>=lsua && *msua==0;) msua--;
   accunits=(Int)(msua-lsua+1);   /* [maybe 0] */
   /* subtraction is only necessary and possible if there are as */
   /* least as many units remaining in acc for this iteration as */
@@ -515,7 +515,7 @@ static decFloat * decDivide(decFloat *result, const 
decFloat *dfl,
 /* (must also continue to original lsu for correct quotient length) */
 if (lsua>acc+DIVACCLEN-DIVOPLEN) continue;
 for (; msua>lsua && *msua==0;) msua--;
-if (*msua==0 && msua==lsua) break;
+if (msua==lsua && *msua==0) break;
 } /* outer loop */
 
   /* all of the original operand in acc has been covered at this point */
@@ -1543,8 +1543,8 @@ decFloat * decFloatAdd(decFloat *result,
umsd=acc+COFF+DECPMAX-1;   /* so far, so zero */
if (ulsd>umsd) {   /* more to check */
  umsd++;  /* to align after checked area */
- for (; UBTOUI(umsd)==0 && umsd+3msd)==0 && hi->msd+3lsd;) hi->msd+=4;
-  for (; *hi->msd==0 && hi->msd<hi->lsd;) hi->msd++;
-  for (; UBTOUI(lo->msd)==0 && lo->msd+3<lo->lsd;) lo->msd+=4;
-  for (; *lo->msd==0 && lo->msd<lo->lsd;) lo->msd++;
+  for (; hi->msd+3<hi->lsd && UBTOUI(hi->msd)==0;) hi->msd+=4;
+  for (; hi->msd<hi->lsd && *hi->msd==0;) hi->msd++;
+  for (; lo->msd+3<lo->lsd && UBTOUI(lo->msd)==0;) lo->msd+=4;
+  for (; lo->msd<lo->lsd && *lo->msd==0;) lo->msd++;
 
   /* if hi is zero then result will be lo (which has the smaller */
   /* exponent), which also may need to be tested for zero for the */
@@ -2252,8 +2252,8 @@ decFloat * decFloatFMA(decFloat *result, const decFloat 
*dfl,
   /* all done except for the special IEEE 754 exact-zero-result */
   /* rule (see above); while testing for zero, strip leading */
   /* zeros (which will save decFinalize doing it) */
-  for (; UBTOUI(lo->msd)==0 && lo->msd+3<lo->lsd;) lo->msd+=4;
-  for (; *lo->msd==0 && lo->msd<lo->lsd;) lo->msd++;
+  for (; lo->msd+3<lo->lsd && UBTOUI(lo->msd)==0;) lo->msd+=4;
+  for (; lo->msd<lo->lsd && *lo->msd==0;) lo->msd++;
   if (*lo->msd==0) {  /* must be true zero (and diffsign) */
lo->sign=0;/* assume + */
if (set->round==DEC_ROUND_FLOOR) lo->sign=DECFLOAT_Sign;
diff --git a/libdecnumber/decCommon.c b/libdecnumber/decCommon.c
index 6f7563de6e6..1f9fe4a1935 100644
--- a/libdecnumber/decCommon.c
+++ b/libdecnumber/decCommon.c
@@ -276,7 +276,7 @@ static decFloat * decFinalize(decFloat *df, bcdnum *num,
 /* [this is quite expensive] */
 if (*umsd==0) {
   for (; umsd+3exponent);
diff --git a/libdecnumber/decNumber.c b/libdecnumber/decNumber.c
index 094bc51c14a..89baef15749 100644
--- a/libdecnumber/decNumber.c
+++ b/libdecnumber/decNumber.c
@@ -4505,7 +4505,7 @@ static decNumber * decDivideOp(decNumber *res,
   for (;;) {   /* inner forever loop */
/* strip leading zero units [from either pre-adjust or from */
/* subtract last time around].  Leave at least one unit. */
-   for (; *msu1==0 && msu1>var1; msu1--) var1units--;
+   for (; msu1>var1 && *msu1==0; msu1--) var1units--;
 
if (var1units

[committed] libatomic: Provide FPU exception defines for hppa

2024-02-03 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

libatomic: Provide FPU exception defines for hppa

The exception defines in <fenv.h> do not match the exception bits
in the FPU status register on hppa-linux and hppa64-hpux11.11.  On
linux, they match the trap enable bits.  On 64-bit hpux, they match
the exception bits for IA64.  The IA64 bits are in a different
order and location than HPPA.  HP uses table look ups to reorder
the bits in code to test and raise exceptions.

All the architectures that I looked at just pass the FPU status
register to __atomic_feraiseexcept().  The simplest approach for
hppa is to define FE_INEXACT, etc, to match the status register
and not include <fenv.h>.

2024-02-03  John David Anglin  

libatomic/ChangeLog:

PR target/59778
* configure.tgt (hppa*): Set ARCH.
* config/pa/fenv.c: New file.

diff --git a/libatomic/config/pa/fenv.c b/libatomic/config/pa/fenv.c
new file mode 100644
index 00000000000..232e8416ffd
--- /dev/null
+++ b/libatomic/config/pa/fenv.c
@@ -0,0 +1,74 @@
+/* Copyright (C) 2012-2024 Free Software Foundation, Inc.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "libatomic_i.h"
+
+#define FE_INEXACT (1<<27)
+#define FE_UNDERFLOW   (1<<28)
+#define FE_OVERFLOW    (1<<29)
+#define FE_DIVBYZERO   (1<<30)
+#define FE_INVALID (1<<31)
+
+/* Raise the supported floating-point exceptions from EXCEPTS.  Other
+   bits in EXCEPTS are ignored.  */
+
+void
+__atomic_feraiseexcept (int excepts __attribute__ ((unused)))
+{
+  volatile float r __attribute__ ((unused));
+#ifdef FE_INVALID
+  if (excepts & FE_INVALID)
+  {
+volatile float zero = 0.0f;
+r = zero / zero;
+  }
+#endif
+#ifdef FE_DIVBYZERO
+  if (excepts & FE_DIVBYZERO)
+{
+  volatile float zero = 0.0f;
+  r = 1.0f / zero;
+}
+#endif
+#ifdef FE_OVERFLOW
+  if (excepts & FE_OVERFLOW)
+{
+  volatile float max = __FLT_MAX__;
+  r = max * max;
+}
+#endif
+#ifdef FE_UNDERFLOW
+  if (excepts & FE_UNDERFLOW)
+{
+  volatile float min = __FLT_MIN__;
+  r = min * min;
+}
+#endif
+#ifdef FE_INEXACT
+  if (excepts & FE_INEXACT)
+{
+  volatile float three = 3.0f;
+  r = 1.0f / three;
+}
+#endif
+}
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index 67a5f2dff80..4237f283fe4 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -36,6 +36,7 @@ case "${target_cpu}" in
XCFLAGS="${XCFLAGS} -mfp-trap-mode=sui"
ARCH=alpha
;;
+  hppa*)   ARCH=pa ;;
   rs6000 | powerpc*)   ARCH=powerpc ;;
   riscv*)  ARCH=riscv ;;
   sh*) ARCH=sh ;;




Re: [PATCH 1/2] xtensa: Recover constant synthesis for HImode after LRA transition

2024-02-03 Thread Max Filippov
Hi Suwa-san,

On Sat, Feb 3, 2024 at 6:20 AM Takayuki 'January June' Suwa
 wrote:
> After LRA transition, HImode constants that don't fit into signed 12 bits
> are no longer subject to constant synthesis:

with this change I get multiple ICEs during libgomp, libgfortran and
libstdc++ builds, e.g.:

/home/jcmvbkbc/ws/tensilica/gcc/gcc/libstdc++-v3/src/c++20/tzdb.cc:1228:3:
error: unrecognizable insn:
1228 |   }
 |   ^
(insn 3131 27 3132 2 (set (subreg:SI (reg:DI 176) 0)
   (const_int 78796800 [0x4b25800]))
"/home/jcmvbkbc/ws/tensilica/gcc/builds/gcc-14-8779-ge15d00be88c1-xtensa-call0-le/xtensa-buildroot-linux-uclibc/libstdc++-v3/include/bits/chrono.h":574:6
-1
(nil))
during RTL pass: subreg3
/home/jcmvbkbc/ws/tensilica/gcc/gcc/libstdc++-v3/src/c++20/tzdb.cc:1228:3:
internal compiler error: in extract_insn, at recog.cc:2812
0x7cb898 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/rtl-error.cc:108
0x7cb8b4 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/rtl-error.cc:116
0x7ca31e extract_insn(rtx_insn*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/recog.cc:2812
0x1c08b57 decompose_multiword_subregs
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/lower-subreg.cc:1569
0x1c09d7d execute
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/lower-subreg.cc:1834



/home/jcmvbkbc/ws/tensilica/gcc/gcc/libstdc++-v3/src/filesystem/ops.cc:936:1:
error: unrecognizable insn:
 936 | }
 | ^
(insn 260 21 261 2 (set (reg:SI 4 a4)
   (const_int 1000000000 [0x3b9aca00]))
"/home/jcmvbkbc/ws/tensilica/gcc/builds/gcc-14-8779-ge15d00be88c1-xtensa-call0-le/xtensa-buildroot-linux-uclibc/libstdc++-v3/include/bits/chrono.h":214:38
discrim 1 -1
(nil))
during RTL pass: subreg3
/home/jcmvbkbc/ws/tensilica/gcc/gcc/libstdc++-v3/src/filesystem/ops.cc:936:1:
internal compiler error: in extract_insn, at recog.cc:2812
0x7cb898 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/rtl-error.cc:108
0x7cb8b4 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/rtl-error.cc:116
0x7ca31e extract_insn(rtx_insn*)
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/recog.cc:2812
0x1c08b57 decompose_multiword_subregs
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/lower-subreg.cc:1569
0x1c09d7d execute
   /home/jcmvbkbc/ws/tensilica/gcc/gcc/gcc/lower-subreg.cc:1834


-- 
Thanks.
-- Max


[PATCH] c++: DR2237, cdtor and template-id tweaks [PR107126]

2024-02-03 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

I'm not certain OPT_Wc__20_extensions is the best thing for something
from [diff.cpp17]; would you prefer something else?

-- >8 --
Since my r11-532 changes to implement DR2237, for this test:

  template<typename T>
  struct S {
    S<T>();
  };

in C++20 we emit the ugly:

q.C:3:8: error: expected unqualified-id before ')' token
    3 |   S<T>();

which doesn't explain what the problem is.  This patch improves that
diagnostic, reduces the error to a pedwarn, and adds a -Wc++20-compat
diagnostic.  We now say:

q.C:3:7: warning: template-id not allowed for constructor [-Wc++20-extensions]
    3 |   S<T>();

This patch does *not* fix

where the C++20 diagnostic is missing altogether.  Something for the
next stage1 I reckon.

-Wc++20-compat triggered in libitm/; I sent a patch for that.

DR 2237
PR c++/107126
PR c++/97202

gcc/cp/ChangeLog:

* parser.cc (cp_parser_unqualified_id): Downgrade the DR2237 error to
a pedwarn.  Emit a -Wc++20-compat message.
(cp_parser_constructor_declarator_p): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/DRs/dr2237.C: Adjust dg-error.
* g++.dg/parse/constructor2.C: Likewise.
* g++.dg/template/error34.C: Likewise.
* g++.old-deja/g++.pt/ctor2.C: Likewise.
* g++.dg/DRs/dr2237-2.C: New test.
* g++.dg/DRs/dr2237-3.C: New test.
* g++.dg/DRs/dr2237-4.C: New test.
* g++.dg/diagnostic/cdtor-template1.C: New test.
---
 gcc/cp/parser.cc  | 33 ++-
 gcc/testsuite/g++.dg/DRs/dr2237-2.C   |  9 +
 gcc/testsuite/g++.dg/DRs/dr2237-3.C   | 16 +
 gcc/testsuite/g++.dg/DRs/dr2237-4.C   | 11 +++
 gcc/testsuite/g++.dg/DRs/dr2237.C |  2 +-
 .../g++.dg/diagnostic/cdtor-template1.C   |  9 +
 gcc/testsuite/g++.dg/parse/constructor2.C | 16 -
 gcc/testsuite/g++.dg/template/error34.C   | 10 +++---
 gcc/testsuite/g++.old-deja/g++.pt/ctor2.C |  2 +-
 9 files changed, 85 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/DRs/dr2237-2.C
 create mode 100644 gcc/testsuite/g++.dg/DRs/dr2237-3.C
 create mode 100644 gcc/testsuite/g++.dg/DRs/dr2237-4.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/cdtor-template1.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 3748ccd49ff..4f7d4edbad9 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -6717,12 +6717,17 @@ cp_parser_unqualified_id (cp_parser* parser,
 
/* DR 2237 (C++20 only): A simple-template-id is no longer valid as the
   declarator-id of a constructor or destructor.  */
-   if (token->type == CPP_TEMPLATE_ID && declarator_p
-   && cxx_dialect >= cxx20)
+   if (token->type == CPP_TEMPLATE_ID && declarator_p)
  {
-   if (!cp_parser_simulate_error (parser))
- error_at (tilde_loc, "template-id not allowed for destructor");
-   return error_mark_node;
+   if (cxx_dialect >= cxx20)
+ {
+   if (!cp_parser_simulate_error (parser))
+ pedwarn (tilde_loc, OPT_Wc__20_extensions,
+  "template-id not allowed for destructor");
+   return error_mark_node;
+ }
+   warning_at (tilde_loc, OPT_Wc__20_compat,
+   "template-id not allowed for destructor in C++20");
  }
 
/* If there was an explicit qualification (S::~T), first look
@@ -32329,11 +32334,11 @@ cp_parser_constructor_declarator_p (cp_parser 
*parser, cp_parser_flags flags,
   if (next_token->type != CPP_NAME
   && next_token->type != CPP_SCOPE
   && next_token->type != CPP_NESTED_NAME_SPECIFIER
-  /* DR 2237 (C++20 only): A simple-template-id is no longer valid as the
-declarator-id of a constructor or destructor.  */
-  && (next_token->type != CPP_TEMPLATE_ID || cxx_dialect >= cxx20))
+  && next_token->type != CPP_TEMPLATE_ID)
 return false;
 
+  const bool saw_template_id = (next_token->type == CPP_TEMPLATE_ID);
+
   /* Parse tentatively; we are going to roll back all of the tokens
  consumed here.  */
   cp_parser_parse_tentatively (parser);
@@ -32550,6 +32555,18 @@ cp_parser_constructor_declarator_p (cp_parser *parser, 
cp_parser_flags flags,
   /* We did not really want to consume any tokens.  */
   cp_parser_abort_tentative_parse (parser);
 
+  /* DR 2237 (C++20 only): A simple-template-id is no longer valid as the
+ declarator-id of a constructor or destructor.  */
+  if (constructor_p && saw_template_id)
+{
+  if (cxx_dialect >= cxx20)
+   pedwarn (input_location, OPT_Wc__20_extensions,
+"template-id not allowed for constructor");
+  else
+   warning (OPT_Wc__20_compat,
+"template-id not allowed for constructor in C++20");
+}
+
   return 

[PATCH] libitm: small update for C++20

2024-02-03 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
C++20 DR 2237 disallows simple-template-id in cdtors, so you
can't write

template<typename T>
struct S {
  S<T>(); // should be S();
};

This hasn't been a problem until now but I'm adding a warning about it
to -Wc++20-compat which libitm apparently uses.
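For illustration only (not taken from libitm): a minimal sketch of the spelling that stays valid in C++20, assuming a hypothetical class template `S` with one data member.  Inside the class the constructor and destructor are declared through the injected-class-name, with no template argument list.

```cpp
#include <cassert>

// Hypothetical class template used only to show the declarator forms.
template<typename T>
struct S {
  S();    // injected-class-name: accepted in C++17 and C++20
  ~S();   // likewise for the destructor
  T value;
};

// Out-of-class definitions still qualify the name with S<T>::, but the
// declarator-id itself is plain S / ~S.
template<typename T>
S<T>::S() : value() {}

template<typename T>
S<T>::~S() {}
```

The same change applied to an existing class amounts to deleting the `<T>` after the class name in each constructor and destructor declaration.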

libitm/ChangeLog:

* containers.h (vector): Remove the template-id in constructors.
---
 libitm/containers.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libitm/containers.h b/libitm/containers.h
index 2842fa038ed..4160b16d569 100644
--- a/libitm/containers.h
+++ b/libitm/containers.h
@@ -48,7 +48,7 @@ class vector
   static const size_t default_resize_min = 32;
 
   // Don't try to copy this vector.
-  vector<T>(const vector<T>& x);
+  vector(const vector<T>& x);
 
  public:
   typedef T datatype;
@@ -59,7 +59,7 @@ class vector
   T& operator[] (size_t pos) { return entries[pos]; }
   const T& operator[] (size_t pos) const  { return entries[pos]; }
 
-  vector<T>(size_t initial_size = default_initial_capacity)
+  vector(size_t initial_size = default_initial_capacity)
 : m_capacity(initial_size),
   m_size(0)
   {
@@ -68,7 +68,7 @@ class vector
 else
   entries = 0;
   }
-  ~vector<T>() { if (m_capacity) free(entries); }
+  ~vector() { if (m_capacity) free(entries); }
 
   void resize(size_t additional_capacity)
   {

base-commit: 78005c648921899a674d1e561b49b05ccabedfe0
-- 
2.43.0



Re: [patch, libgfortran] PR111022 ES0.0E0 format gave ES0.dE0 output with d too high.

2024-02-03 Thread Harald Anlauf

Jerry, Steve,

On 03.02.24 at 04:24, Steve Kargl wrote:

> Jerry,
>
> The patch looks good to me, but please give Harald a chance
> to comment.



I just tested it a little, and it looked good.

We even get a runtime error on E0.0 now as required.  :-)

Thanks for the patch!

Harald




[PATCH 1/2] xtensa: Recover constant synthesis for HImode after LRA transition

2024-02-03 Thread Takayuki 'January June' Suwa
After LRA transition, HImode constants that don't fit into signed 12 bits
are no longer subject to constant synthesis:

/* example */
void test(void) {
  short foo = 32767;
  __asm__ ("" :: "r"(foo));
}

;; before
.literal_position
.literal .LC0, 32767
test:
l32r    a9, .LC0
ret.n

This patch fixes that:

;; after
test:
movi.n  a9, -1
extui   a9, a9, 17, 15
ret.n
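For illustration only (not part of the patch): a minimal sketch that models the two-instruction sequence above on the host, assuming `extui` extracts a field of the given width starting at the given bit position.  The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of "extui dst, src, shift, width": take WIDTH bits
// of SRC starting at bit SHIFT and zero-extend them.
std::uint32_t extui(std::uint32_t src, unsigned shift, unsigned width)
{
  return (src >> shift) & ((std::uint32_t{1} << width) - 1);
}

// Models the "after" sequence: movi.n a9, -1 followed by
// extui a9, a9, 17, 15.
std::uint32_t synthesize_himode_max()
{
  std::uint32_t a9 = static_cast<std::uint32_t>(-1);  // all ones
  return extui(a9, 17, 15);                           // low 15 bits set
}
```

Under that assumption the result is 0x7fff, i.e. the 32767 that the "before" code loaded from the literal pool.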

gcc/ChangeLog:

* config/xtensa/xtensa.md (2 split patterns related to constsynth):
Change to also accept HImode operands.
---
 gcc/config/xtensa/xtensa.md | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index f3953aa26b0..5242eb3c006 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1291,28 +1291,36 @@
(set_attr "length"  "2,2,2,2,2,2,3,3,3,3,6,3,3,3,3,3")])
 
 (define_split
-  [(set (match_operand:SI 0 "register_operand")
-   (match_operand:SI 1 "const_int_operand"))]
+  [(set (match_operand 0 "register_operand")
+   (match_operand 1 "const_int_operand"))]
   "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
&& ! xtensa_split1_finished_p ()
-   && ! xtensa_simm12b (INTVAL (operands[1]))"
+   && ! xtensa_simm12b (INTVAL (operands[1]))
+   && GET_MODE (operands[0]) == GET_MODE (operands[1])
+   && (GET_MODE (operands[0]) == SImode
+   || GET_MODE (operands[0]) == HImode)"
   [(set (match_dup 0)
(match_dup 1))]
 {
-  operands[1] = force_const_mem (SImode, operands[1]);
+  operands[1] = force_const_mem (GET_MODE (operands[0]), operands[1]);
 })
 
 (define_split
-  [(set (match_operand:SI 0 "register_operand")
-   (match_operand:SI 1 "constantpool_operand"))]
-  "! optimize_debug && reload_completed"
+  [(set (match_operand 0 "register_operand")
+   (match_operand 1 "constantpool_operand"))]
+  "! optimize_debug && reload_completed
+   && GET_MODE (operands[0]) == GET_MODE (operands[1])
+   && (GET_MODE (operands[0]) == SImode
+   || GET_MODE (operands[0]) == HImode)"
   [(const_int 0)]
 {
-  rtx x = avoid_constant_pool_reference (operands[1]);
-  if (! CONST_INT_P (x))
+  rtx x, dst;
+  if (! CONST_INT_P (x = avoid_constant_pool_reference (operands[1])))
 FAIL;
-  if (! xtensa_constantsynth (operands[0], INTVAL (x)))
-emit_move_insn (operands[0], x);
+  if (GET_MODE (dst = operands[0]) == HImode)
+dst = gen_rtx_REG (SImode, REGNO (dst));
+  if (! xtensa_constantsynth (dst, INTVAL (x)))
+emit_move_insn (dst, x);
   DONE;
 })
 
-- 
2.30.2


[PATCH 2/2] xtensa: Fix missing mode warning in "*eqne_zero_masked_bits"

2024-02-03 Thread Takayuki 'January June' Suwa
gcc/ChangeLog:

* config/xtensa/xtensa.md (*eqne_zero_masked_bits):
Add missing ":SI" to the match_operator.
---
 gcc/config/xtensa/xtensa.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 5242eb3c006..1a031d79cf3 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -3271,7 +3271,7 @@
 
 (define_insn_and_split "*eqne_zero_masked_bits"
   [(set (match_operand:SI 0 "register_operand" "=a")
-   (match_operator 3 "boolean_operator"
+   (match_operator:SI 3 "boolean_operator"
[(and:SI (match_operand:SI 1 "register_operand" "r")
 (match_operand:SI 2 "const_int_operand" "i"))
 (const_int 0)]))]
-- 
2.30.2


[committed] MAINTAINERS: Update my e-mail address

2024-02-03 Thread Maciej W. Rozycki
* MAINTAINERS: Update my e-mail address.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b47e0465852..3720344308e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -124,7 +124,7 @@ sparc port  David S. Miller 

 sparc port Eric Botcazou   
 v850 port  Nick Clifton
 vax port   Matt Thomas 
-vax port   Maciej W. Rozycki   
+vax port   Maciej W. Rozycki   
 visium portEric Botcazou   
 x86-64 portJan Hubicka 
 xstormy16 port Nick Clifton
-- 
2.11.0



Re: [PATCH] wide-int: Fix up wi::bswap_large [PR113722]

2024-02-03 Thread Richard Biener



> On 03.02.2024 at 09:46, Jakub Jelinek wrote:
> 
> Hi!
> 
> Since bswap has been converted from a method to a function we miscompile
> the following testcase.  The problem is the assumption that the passed in
> len argument (number of limbs in the xval array) is the upper bound for the
> bswap result, which is true only if precision is <= 64.  If precision is
> larger than that, e.g. 128 as in the testcase, if the argument has only
> one limb (i.e. 0 to ~(unsigned HOST_WIDE_INT) 0), the result can still
> need 2 limbs for that precision, or generally BLOCKS_NEEDED (precision)
> limbs, it all depends on how many least significant limbs of the operand
> are zero.  bswap_large as implemented only cleared len limbs of result,
> then swapped the bytes (invoking UB when oring something in all the limbs
> above it) and finally passed len to canonize, saying that more limbs
> aren't needed.
> 
> The following patch fixes it by renaming len to xlen (so that it is clear
> it is X's length), using it solely for safe_uhwi argument when we attempt
> to read from X, and using new len = BLOCKS_NEEDED (precision) instead in
> the other two spots (i.e. when clearing the val array, turned it also
> into memset, and in canonize argument).  wi::bswap asserts it isn't invoked
> on widest_int, so we are always invoked on wide_int or similar and those
> have preallocated result sized for the corresponding precision (i.e.
> BLOCKS_NEEDED (precision)).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-02-03  Jakub Jelinek  
> 
>PR middle-end/113722
>* wide-int.cc (wi::bswap_large): Rename third argument from
>len to xlen and adjust use in safe_uhwi.  Add len variable, set
>it to BLOCKS_NEEDED (precision) and use it for clearing of val
>and as canonize argument.  Clear val using memset instead of
>a loop.
>
>* gcc.dg/pr113722.c: New test.
> 
> --- gcc/wide-int.cc.jj2024-01-03 11:51:42.077584823 +0100
> +++ gcc/wide-int.cc2024-02-02 18:13:34.993332159 +0100
> @@ -729,20 +729,19 @@ wi::set_bit_large (HOST_WIDE_INT *val, c
> }
> }
> 
> -/* Byte swap the integer represented by XVAL and LEN into VAL.  Return
> +/* Byte swap the integer represented by XVAL and XLEN into VAL.  Return
>the number of blocks in VAL.  Both XVAL and VAL have PRECISION bits.  */
> unsigned int
> wi::bswap_large (HOST_WIDE_INT *val, const HOST_WIDE_INT *xval,
> - unsigned int len, unsigned int precision)
> + unsigned int xlen, unsigned int precision)
> {
> -  unsigned int i, s;
> +  unsigned int s, len = BLOCKS_NEEDED (precision);
> 
>   /* This is not a well defined operation if the precision is not a
>  multiple of 8.  */
>   gcc_assert ((precision & 0x7) == 0);
> 
> -  for (i = 0; i < len; i++)
> -val[i] = 0;
> +  memset (val, 0, sizeof (unsigned HOST_WIDE_INT) * len);
> 
>   /* Only swap the bytes that are not the padding.  */
>   for (s = 0; s < precision; s += 8)
> @@ -753,7 +752,7 @@ wi::bswap_large (HOST_WIDE_INT *val, con
>   unsigned int block = s / HOST_BITS_PER_WIDE_INT;
>   unsigned int offset = s & (HOST_BITS_PER_WIDE_INT - 1);
> 
> -  byte = (safe_uhwi (xval, len, block) >> offset) & 0xff;
> +  byte = (safe_uhwi (xval, xlen, block) >> offset) & 0xff;
> 
>   block = d / HOST_BITS_PER_WIDE_INT;
>   offset = d & (HOST_BITS_PER_WIDE_INT - 1);
> --- gcc/testsuite/gcc.dg/pr113722.c.jj2024-02-02 18:25:22.702561427 +0100
> +++ gcc/testsuite/gcc.dg/pr113722.c2024-02-02 18:21:00.109186858 +0100
> @@ -0,0 +1,22 @@
> +/* PR middle-end/113722 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O2" } */
> +
> +int
> +main ()
> +{
> +  unsigned __int128 a = __builtin_bswap128 ((unsigned __int128) 2);
> +  if (a != ((unsigned __int128) 2) << 120)
> +__builtin_abort ();
> +  a = __builtin_bswap128 ((unsigned __int128) 0xdeadbeefULL);
> +  if (a != ((unsigned __int128) 0xefbeaddeULL) << 96)
> +__builtin_abort ();
> +  a = __builtin_bswap128 (((unsigned __int128) 0xdeadbeefULL) << 64);
> +  if (a != ((unsigned __int128) 0xefbeaddeULL) << 32)
> +__builtin_abort ();
> +  a = __builtin_bswap128 ((((unsigned __int128) 0xdeadbeefULL) << 64)
> +  | 0xcafed00dfeedbac1ULL);
> +  if (a != ((((unsigned __int128) 0xc1baedfe0dd0fecaULL) << 64)
> +| (((unsigned __int128) 0xefbeaddeULL) << 32)))
> +__builtin_abort ();
> +}
> 
>Jakub
> 
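For illustration only (this is not the wide-int code): a minimal sketch of why the input's limb count is not an upper bound for the result, assuming a hypothetical unsigned two-limb type `U128` with the least significant limb first.  Real wide_int values use signed, sign-extended limbs, so this is a simplification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value as two 64-bit limbs, least significant first.
struct U128 { std::uint64_t lo, hi; };

// Portable byte swap of one 64-bit limb.
std::uint64_t bswap64(std::uint64_t v)
{
  std::uint64_t r = 0;
  for (int i = 0; i < 8; ++i)
    {
      r = (r << 8) | (v & 0xff);
      v >>= 8;
    }
  return r;
}

// Byte swap over the full 128-bit precision: the limbs trade places and
// each one is swapped internally.
U128 bswap128(U128 x)
{
  return U128{bswap64(x.hi), bswap64(x.lo)};
}

// Number of limbs needed to hold X without leading zero limbs.
unsigned limbs_needed(U128 x)
{
  return x.hi != 0 ? 2 : 1;
}
```

The value 2 fits in one limb, yet its 128-bit byte swap has only the top limb non-zero and therefore needs both limbs, which matches the first check in the testcase (`2 << 120`).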


Re: [PATCH] ggc-common: Fix save PCH assertion

2024-02-03 Thread Richard Biener



> On 03.02.2024 at 09:36, Jakub Jelinek wrote:
> 
> Hi!
> 
> We are getting a gnuradio PCH ICE
> /usr/include/pybind11/stl.h:447:1: internal compiler error: in gt_pch_save, 
> at ggc-common.cc:693
> 0x1304e7d gt_pch_save(_IO_FILE*)
>../../gcc/ggc-common.cc:693
> 0x12a45fb c_common_write_pch()
>../../gcc/c-family/c-pch.cc:175
> 0x18ad711 c_parse_final_cleanups()
>../../gcc/cp/decl2.cc:5062
> 0x213988b c_common_parse_file()
>../../gcc/c-family/c-opts.cc:1319
> (unfortunately it isn't always reproducible, but often needs
> up to 100 attempts, isn't reproducible in a cross etc.).
> The bug is in the assertion I've added in gt_pch_save when adding
> relocation support for the PCH files in case they happen not to be
> mmapped at the selected address.
> addr is a relocated address which points to a location in the PCH
> blob (starting at mmi.preferred_base, with mmi.size bytes) which contains
> a pointer that needs to be relocated.  So the assertion is meant to
> verify the address is within the PCH blob, obviously it needs to be
> equal or above mmi.preferred_base, but I got the other comparison wrong
> and when one is very unlucky and the last sizeof (void *) bytes of the
> blob happen to be a pointer which needs to be relocated, such as on the
> s390x host addr 0x8008a04ff8, mmi.preferred_base 0x8000000000 and
> mmi.size 0x8a05000, addr + sizeof (void *) is equal to mmi.preferred_base +
> mmi.size and that is still fine, both addresses are end of something.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested on s390x
> on the testcase which was ICEing in 1-100 iterations and there it survived
> 7750 attempts without ICE (forgot to stop it earlier), ok for trunk?
> 

Ok

Richard 
> 2024-02-03  Jakub Jelinek  
> 
>* ggc-common.cc (gt_pch_save): Allow addr to be equal to
>mmi.preferred_base + mmi.size - sizeof (void *).
> 
> --- gcc/ggc-common.cc.jj2024-01-03 11:51:39.397622018 +0100
> +++ gcc/ggc-common.cc2024-02-02 17:33:13.106727473 +0100
> @@ -692,7 +692,7 @@ gt_pch_save (FILE *f)
> {
>   gcc_assert ((uintptr_t) addr >= (uintptr_t) mmi.preferred_base
>  && ((uintptr_t) addr + sizeof (void *)
> -  < (uintptr_t) mmi.preferred_base + mmi.size));
> +  <= (uintptr_t) mmi.preferred_base + mmi.size));
>   if (addr == last_addr)
>continue;
>   if (last_addr == NULL)
> 
>Jakub
> 
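For illustration only (not the ggc-common.cc code): a minimal sketch of the corrected range test, assuming a hypothetical helper `slot_within_blob` that takes the blob's base address and size as plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if a pointer-sized slot starting at ADDR lies
// entirely inside the blob [BASE, BASE + SIZE).
bool slot_within_blob(std::uintptr_t addr, std::uintptr_t base,
                      std::size_t size)
{
  // The end of the slot may coincide with the end of the blob, hence <=.
  return addr >= base && addr + sizeof (void *) <= base + size;
}
```

With `<` instead of `<=`, a slot occupying the last `sizeof (void *)` bytes of the blob would be rejected even though it is fully in range, which is the case described above.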


Re: [PATCH v2] RISC-V: THEAD: Fix improper immediate value for MODIFY_DISP instruction on 32-bit systems.

2024-02-03 Thread Andreas Schwab
On Jan 30 2024, Christoph Müllner wrote:

> retested

Nope.

../../gcc/config/riscv/thead.cc:1144:22: error: invalid suffix on literal; 
C++11 requires a space between literal and string macro [-Werror=literal-suffix]
 1144 |   fprintf (file, "(%s),"HOST_WIDE_INT_PRINT_DEC",%u", 
reg_names[REGNO (addr.reg)],
  |  ^
cc1plus: all warnings being treated as errors
make[3]: *** [../../gcc/config/riscv/t-riscv:127: thead.o] Error 1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
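For illustration only (this is not the thead.cc code): a minimal sketch of the spelling the diagnostic asks for, assuming a hypothetical macro `DISP_PRINT_DEC` standing in for HOST_WIDE_INT_PRINT_DEC and a hypothetical helper `format_disp`.  Since C++11, a string literal immediately followed by an identifier is parsed as a user-defined literal suffix, so a space is needed before the macro name.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a format macro that expands to a string literal.
#define DISP_PRINT_DEC "%" PRId64

// Hypothetical helper: formats "(reg),disp,shift" into a string.
std::string format_disp(const char *reg, std::int64_t disp, unsigned shift)
{
  char buf[64];
  // Note the spaces around DISP_PRINT_DEC: writing "(%s),"DISP_PRINT_DEC
  // without them is what triggers -Wliteral-suffix.
  std::snprintf(buf, sizeof buf, "(%s)," DISP_PRINT_DEC ",%u",
                reg, disp, shift);
  return std::string(buf);
}
```

Adjacent string literals are still concatenated after macro expansion, so the added spaces do not change the resulting format string.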


[committed] d: Merge dmd. druntime e770945277, phobos 6d6e0b9b9

2024-02-03 Thread Iain Buclaw
Hi,

This patch merges the D front-end and runtime library with upstream dmd
e770945277, and the standard runtime library with phobos 6d6e0b9b9.

Synchronizing with the upstream release candidate as of 2024-01-27.

D front-end changes:

- Import latest fixes from dmd v2.107.0-beta.1.
- Hex strings can now be cast to integer arrays.
- Add support for Interpolated Expression Sequences.

D runtime changes:

- Import latest fixes from druntime v2.107.0-beta.1.
- New core.interpolation module to provide run-time support for D
  interpolated expression sequence literals.

Phobos changes:

- Import latest fixes from phobos v2.107.0-beta.1.
- `std.range.primitives.isBidirectionalRange', and
  `std.range.primitives.isRandomAccessRange' now take an optional
  element type.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd e770945277.
* Make-lang.in (D_FRONTEND_OBJS): Add d/basicmangle.o, d/enumsem.o,
d/funcsem.o, d/templatesem.o.
* d-builtins.cc (build_frontend_type): Update for new front-end
interface.
* d-codegen.cc (declaration_type): Likewise.
(parameter_type): Likewise.
* d-incpath.cc (add_globalpaths): Likewise.
(add_filepaths): Likewise.
(add_import_paths): Likewise.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_parse_file): Likewise.
* decl.cc (DeclVisitor::finish_vtable): Likewise.
(DeclVisitor::visit (FuncDeclaration *)): Likewise.
(get_symbol_decl): Likewise.
* expr.cc (ExprVisitor::visit (StringExp *)): Likewise.
Implement support for 8-byte hexadecimal strings.
* typeinfo.cc (create_tinfo_types): Update internal TypeInfo
representation.
(TypeInfoVisitor::visit (TypeInfoConstDeclaration *)): Update for new
front-end interface.
(TypeInfoVisitor::visit (TypeInfoInvariantDeclaration *)): Likewise.
(TypeInfoVisitor::visit (TypeInfoSharedDeclaration *)): Likewise.
(TypeInfoVisitor::visit (TypeInfoWildDeclaration *)): Likewise.
(TypeInfoVisitor::visit (TypeInfoClassDeclaration *)): Move data for
TypeInfo_Class.nameSig to the end of the object.
(create_typeinfo): Update for new front-end interface.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime e770945277.
* libdruntime/Makefile.am (DRUNTIME_SOURCES): Add
core/interpolation.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 6d6e0b9b9.
---
 gcc/d/Make-lang.in|4 +
 gcc/d/d-builtins.cc   |2 +-
 gcc/d/d-codegen.cc|4 +-
 gcc/d/d-incpath.cc|   41 +-
 gcc/d/d-lang.cc   |   34 +-
 gcc/d/decl.cc |   37 +-
 gcc/d/dmd/MERGE   |2 +-
 gcc/d/dmd/README.md   |4 +
 gcc/d/dmd/aggregate.h |3 +-
 gcc/d/dmd/basicmangle.d   |  109 ++
 gcc/d/dmd/clone.d |9 +-
 gcc/d/dmd/common/outbuffer.d  |   27 +
 gcc/d/dmd/cond.d  |   19 +-
 gcc/d/dmd/constfold.d |6 +-
 gcc/d/dmd/ctfeexpr.d  |   10 +-
 gcc/d/dmd/dclass.d|2 +
 gcc/d/dmd/declaration.h   |7 +-
 gcc/d/dmd/denum.d |   85 -
 gcc/d/dmd/dinterpret.d|   68 +-
 gcc/d/dmd/dmangle.d   |  144 +-
 gcc/d/dmd/dmodule.d   |6 +-
 gcc/d/dmd/doc.d   |3 +-
 gcc/d/dmd/dstruct.d   |2 +-
 gcc/d/dmd/dsymbolsem.d|  574 +-
 gcc/d/dmd/dtemplate.d | 1646 +
 gcc/d/dmd/enum.h  |2 -
 gcc/d/dmd/enumsem.d   |  714 +++
 gcc/d/dmd/expression.d|   44 +-
 gcc/d/dmd/expression.h|   15 +-
 gcc/d/dmd/expressionsem.d |  103 +-
 gcc/d/dmd/func.d  |  199 +-
 gcc/d/dmd/funcsem.d   |  219 +++
 gcc/d/dmd/globals.d   |   12 +-
 gcc/d/dmd/globals.h   |   12 +-
 gcc/d/dmd/hdrgen.d|   84 +
 gcc/d/dmd/id.d|6 +
 gcc/d/dmd/json.d  |   14 +-
 gcc/d/dmd/lexer.d |  166 +-
 gcc/d/dmd/mtype.d |   56 +-
 gcc/d/dmd/mtype.h |

[PATCH 3/4] ira: Apply DF_LIVE_SUBREG data

2024-02-03 Thread Lehua Ding
This patch simply replaces df_get_live_in with df_get_subreg_live_in
and df_get_live_out with df_get_subreg_live_out.

gcc/ChangeLog:

* ira-build.cc (create_bb_allocnos): Switch to DF_LIVE_SUBREG df data.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(add_conflict_from_region_landing_pads): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.

---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc   | 19 ---
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index ea593d5a087..283ff36d3dd 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1921,7 +1921,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_in (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1937,9 +1938,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = df_get_subreg_live_in (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_out (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..bfebc48ef83 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2786,8 +2786,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int 
regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno))))
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2795,8 +2795,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int 
regno, bool exit_p)
   auto_vec edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index d347f11fa02..8075b082e36 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = df_get_subreg_live_in (e->dest);
+  regs_live_out_src = df_get_subreg_live_out (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, df_get_subreg_live_in (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, df_get_subreg_live_out (bb));
   add_range_and_copies_from_move_list
(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
   FOR_EACH_EDGE (e, ei, bb->succs)
{
- bitmap_and (live_through,
- df_get_live_in (e->dest), df_get_live_out (bb));
+ bitmap_and (live_through, df_get_subreg_live_in (e->dest),
+ df_get_subreg_live_out (bb));
   

[PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-02-03 Thread Lehua Ding
This patch applies DF_LIVE_SUBREG to the LRA pass. More changes were needed
in LRA than in IRA, since LRA modifies the DF data directly.
Most of the changes are in the lra-lives.cc file.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): Include subreg-live-range.h.
(struct lra_insn_reg): Add op field to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos to
include new partial_def/use and range_def/use fields for DF_LIVE_SUBREG
problem.
(need_track_subreg_p): Check whether the regno needs to be tracked.
(make_hard_regno_live): Switch to live_subreg field.
(make_hard_regno_dead): Ditto.
(mark_regno_live): Support record subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg liveness.
(live_con_fun_0): Adjust Confluence function to support subreg liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.
---
 gcc/lra-coalesce.cc|  27 +++-
 gcc/lra-constraints.cc | 109 ++---
 gcc/lra-int.h  |   4 +
 gcc/lra-lives.cc   | 357 -
 gcc/lra-remat.cc   |   8 +-
 gcc/lra-spills.cc  |  27 +++-
 gcc/lra.cc |  10 +-
 7 files changed, 430 insertions(+), 112 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index a9b5b51cb3f..9416775a009 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -186,19 +186,28 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (_pseudos_bitmap, first_coalesced_pseudo[j]);
-  if (! bitmap_empty_p (_pseudos_bitmap))
+  if (!bitmap_empty_p (_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, _pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, _pseudos_bitmap);
+  bitmap_and_compl_into (all, _pseudos_bitmap);
+  bitmap_ior_into (all, _pseudos_bitmap);
+
+  if (flag_track_subreg_liveness)
+   {
+ bitmap_and_compl_into (full, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (full, _pseudos_bitmap, partial);
+
+ bitmap_and_compl_into (partial, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (partial, _pseudos_bitmap, full);
+   }
 }
 }
 
@@ -301,8 +310,12 @@ lra_coalesce (void)
   bitmap_initialize (_pseudos_bitmap, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (df_get_subreg_live_in (bb),
+   df_get_subreg_live_full_in (bb),
+   df_get_subreg_live_partial_in (bb));
+  update_live_info (df_get_subreg_live_out (bb),
+   df_get_subreg_live_full_out (bb),
+   df_get_subreg_live_partial_out (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0ae81c1ff9c..d1316620f51 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6505,34 +6505,86 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = df_get_subreg_live_in (prev_bb);
+ bitmap subreg_full_in = df_get_subreg_live_full_in (prev_bb);
+ bitmap subreg_partial_in = df_get_subreg_live_partial_in 
(prev_bb);
+ subregs_live *range_in = df_get_subreg_live_range_in (prev_bb);
  EXECUTE_IF_SET_IN_BITMAP (_only_regs, 0, j, 

[PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-02-03 Thread Lehua Ding
This patch adds a new DF problem named DF_LIVE_SUBREG. The problem
extends the DF_LR problem and tracks the subreg liveness of a multireg
pseudo when the pseudo satisfies the following conditions:

  1. its mode size is greater than its REGMODE_NATURAL_SIZE.
  2. the reg is used in insns via a subreg pattern.

The main methods are as follows:

  1. Split the bitmap in/out/def/use fields into full_in/out/def/use and
     partial_in/out/def/use. If a pseudo needs its subreg liveness
     tracked, it is recorded in the partial_in/out/def/use fields.
     In addition, the range_in/out/def/use fields record the live
     range of each tracked pseudo.
  2. In the df_live_subreg_finalize function, move a tracked pseudo from
     partial_in/out/def/use to full_in/out/def/use if the pseudo's live
     range is full.
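
The full/partial split described above can be sketched in a few lines of C.
This is only a toy model under stated assumptions, not code from the patch:
pseudos are bits in a uint32_t instead of GCC bitmaps, each live range is a
mask with one bit per natural-size block, and the names live_sets,
full_range and finalize_live are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 8

/* Toy model of one liveness set (for example the live-in set of a
   basic block).  Bit R of FULL/PARTIAL stands for pseudo R; RANGE[R]
   has one bit per natural-size block of pseudo R that is live.  */
struct live_sets
{
  uint32_t full;
  uint32_t partial;
  uint32_t range[NREGS];
};

/* Mask with the low NBLOCKS bits set, meaning "every block is live".  */
static uint32_t
full_range (unsigned nblocks)
{
  return nblocks >= 32 ? UINT32_MAX : (((uint32_t) 1 << nblocks) - 1);
}

/* Mimic the finalize step: a tracked pseudo whose live range covers
   all of its blocks moves from PARTIAL to FULL.  Returns the union of
   both sets, i.e. every pseudo that is live at all.  */
static uint32_t
finalize_live (struct live_sets *s, const unsigned nblocks[NREGS])
{
  for (unsigned r = 0; r < NREGS; r++)
    {
      uint32_t bit = (uint32_t) 1 << r;
      if (!(s->partial & bit))
        continue;
      if (s->range[r] == full_range (nblocks[r]))
        {
          s->partial &= ~bit;
          s->full |= bit;
          s->range[r] = 0;
        }
    }
  return s->full | s->partial;
}
```

In this model a two-block pseudo with range mask 0x3 ends up in the full
set, while one with range mask 0x1 stays in the partial set and keeps its
range.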

gcc/ChangeLog:

* Makefile.in: Add subreg-live-range object file.
* df-problems.cc (struct df_live_subreg_problem_data): Private struct
for DF_LIVE_SUBREG problem.
(df_live_subreg_get_bb_info): Get bb regs in/out data.
(get_live_subreg_local_bb_info): Get bb regs use/def data.
(multireg_p): Check whether the regno is a multireg pseudo.
(need_track_subreg_p): Check whether the regno needs to be tracked.
(init_range): Get the range of a subreg rtx.
(remove_subreg_range): Remove use data for the reg/subreg rtx.
(add_subreg_range): Add def/use data for the reg/subreg rtx.
(df_live_subreg_free_bb_info): Free basic block df data.
(df_live_subreg_alloc): Allocate and init df data.
(df_live_subreg_reset): Reset the live in/out df data.
(df_live_subreg_bb_local_compute): Compute basic block df data.
(df_live_subreg_local_compute): Compute all basic blocks df data.
(df_live_subreg_init): Init the in/out df data.
(df_live_subreg_check_result): Assert the full and partial df data.
(df_live_subreg_confluence_0): Confluence function for infinite loops.
(df_live_subreg_confluence_n): Confluence function for normal edge.
(df_live_subreg_transfer_function): Transfer function.
(df_live_subreg_finalize): Finalize the all_in/all_out df data.
(df_live_subreg_free): Free the df data.
(df_live_subreg_top_dump): Dump top df data.
(df_live_subreg_bottom_dump): Dump bottom df data.
(df_live_subreg_add_problem): Add the DF_LIVE_SUBREG problem.
* df.h (enum df_problem_id): Add DF_LIVE_SUBREG.
(class subregs_live): Forward declaration.
(class df_live_subreg_local_bb_info): New class for full/partial def/use
df data.
(class df_live_subreg_bb_info): New class for full/partial in/out
df data.
(df_live_subreg): Get the df_live_subreg data.
(df_live_subreg_add_problem): Exported.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_check_result): Ditto.
(multireg_p): Ditto.
(init_range): Ditto.
(add_subreg_range): Ditto.
(remove_subreg_range): Ditto.
(df_get_subreg_live_in): Accessor for the all_in df data.
(df_get_subreg_live_out): Accessor for the all_out df data.
(df_get_subreg_live_full_in): Accessor for the full_in df data.
(df_get_subreg_live_full_out): Accessor for the full_out df data.
(df_get_subreg_live_partial_in): Accessor for the partial_in df data.
(df_get_subreg_live_partial_out): Accessor for the partial_out df data.
(df_get_subreg_live_range_in): Accessor for the range_in df data.
(df_get_subreg_live_range_out): Accessor for the range_out df data.
* regs.h (get_nblocks): Get the number of blocks of a mode.
* sbitmap.cc (bitmap_full_p): New sbitmap predicate.
(bitmap_same_p): Likewise.
(test_full): Test bitmap_full_p.
(test_same): Test bitmap_same_p.
(sbitmap_cc_tests): Add test_full and test_same.
* sbitmap.h (bitmap_full_p): Exported.
(bitmap_same_p): Ditto.
* timevar.def (TV_DF_LIVE_SUBREG): Add DF_LIVE_SUBREG timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New file.

---
 gcc/Makefile.in  |   1 +
 gcc/df-problems.cc   | 855 ++-
 gcc/df.h | 155 +++
 gcc/regs.h   |   5 +
 gcc/sbitmap.cc   |  98 +
 gcc/sbitmap.h|   2 +
 gcc/subreg-live-range.cc |  53 +++
 gcc/subreg-live-range.h  | 206 ++
 gcc/timevar.def  |   1 +
 9 files changed, 1375 insertions(+), 1 deletion(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 95caa54a52b..c30ea333736 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1684,6 +1684,7 @@ OBJS = \
store-motion.o \
streamer-hooks.o \
stringpool.o \
+   subreg-live-range.o \

[PATCH 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-02-03 Thread Lehua Ding
Hi,

These patches add a new data flow problem, DF_LIVE_SUBREG,
which tracks subreg liveness, and apply it to the IRA and LRA
passes (enabled via -O3 or -ftrack-subreg-liveness). These patches
are for GCC 15.

These patches are split out from the subreg-coalesce patches submitted
a few months ago. I refactored the code according to the comments. The next
patches will add subreg coalescing based on them. Here is some data
about the build time of SPEC INT 2017 (x86-64 target):

  baseline   baseline(+track-subreg-liveness)
specint2017 build time :  1892s  1883s

Regarding build times, I've run the build a few times, and the runs with
the patches all seem to take slightly less time. Since the difference is
small, it may just be a change in the environment. But a real improvement
is theoretically possible, since supporting subreg liveness could reduce
the number of live regs.

For memory usage, I tried PR 69609 under valgrind: peak memory size grew from
2003910656 to 2003947520, a very small increase.

For SPEC INT 2017, when using upstream GCC (without these patches), I get a
coredump when training the peak case, so there is no data yet. The cause of
the core dump still needs to be investigated.

No regressions on x86-64, AArch64 and RISC-V targets.

Best,
Lehua

Lehua Ding (4):
  df: Add -ftrack-subreg-liveness option
  df: Add DF_LIVE_SUBREG problem
  ira: Apply DF_LIVE_SUBREG data
  lra: Apply DF_LIVE_SUBREG data

 gcc/Makefile.in  |   1 +
 gcc/common.opt   |   4 +
 gcc/df-problems.cc   | 855 ++-
 gcc/df.h | 155 +++
 gcc/ira-build.cc |   7 +-
 gcc/ira-color.cc |   8 +-
 gcc/ira-emit.cc  |  12 +-
 gcc/ira-lives.cc |   7 +-
 gcc/ira.cc   |  19 +-
 gcc/lra-coalesce.cc  |  27 +-
 gcc/lra-constraints.cc   | 109 -
 gcc/lra-int.h|   4 +
 gcc/lra-lives.cc | 355 
 gcc/lra-remat.cc |   8 +-
 gcc/lra-spills.cc|  27 +-
 gcc/lra.cc   |  10 +-
 gcc/opts.cc  |   1 +
 gcc/regs.h   |   5 +
 gcc/sbitmap.cc   |  98 +
 gcc/sbitmap.h|   2 +
 gcc/subreg-live-range.cc |  53 +++
 gcc/subreg-live-range.h  | 206 ++
 gcc/timevar.def  |   1 +
 23 files changed, 1839 insertions(+), 135 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.36.3



[PATCH 1/4] df: Add -ftrack-subreg-liveness option

2024-02-03 Thread Lehua Ding
Add a new flag, -ftrack-subreg-liveness, to enable tracking of subreg
liveness.  This flag is enabled at -O3/fast.

gcc/ChangeLog:

* common.opt: Add -ftrack-subreg-liveness option.
* opts.cc: Enable -ftrack-subreg-liveness automatically at -O3/fast.

---
 gcc/common.opt | 4 
 gcc/opts.cc| 1 +
 2 files changed, 5 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index 51c4a17da83..d4592c6426a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2156,6 +2156,10 @@ fira-verbose=
 Common RejectNegative Joined UInteger Var(flag_ira_verbose) Init(5)
 -fira-verbose= Control IRA's level of diagnostic messages.
 
+ftrack-subreg-liveness
+Common Var(flag_track_subreg_liveness) Init(0) Optimization
+Track subreg liveness information for IRA and LRA, enabled at -O3.
+
 fivopts
 Common Var(flag_ivopts) Init(1) Optimization
 Optimize induction variables on trees.
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 600e0ea..50c0b62c5af 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -689,6 +689,7 @@ static const struct default_options default_options_table[] 
=
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC 
},
 { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftrack_subreg_liveness, NULL, 1 },
 
 /* -O3 parameters.  */
 { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },
-- 
2.36.3



Pushed: [PATCH] LoongArch: Fix an ODR violation

2024-02-03 Thread Xi Ruoyao
On Fri, 2024-02-02 at 10:42 +0800, chenglulu wrote:
> LGTM!
> 
> Thanks!

Pushed r14-8773.

> On 2024/2/2 at 5:54 AM, Xi Ruoyao wrote:
> > When bootstrapping GCC 14 --with-build-config=bootstrap-lto, an ODR
> > violation is detected:
> > 
> >  ../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
> >  'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
> >  57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
> >  ../../gcc/config/loongarch/loongarch-def.cc:186: note:
> >  'abi_minimal_isa' was previously declared here
> >  186 |   abi_minimal_isa = array,
> >  ../../gcc/config/loongarch/loongarch-def.cc:186: note:
> >  code may be misoptimized unless '-fno-strict-aliasing' is used
> > 
> > Fix it by adding a proper declaration of abi_minimal_isa into
> > loongarch-def.h and remove the ODR-violating local declaration in
> > loongarch-opts.cc.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
> > * config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
> > the ODR-violating local declaration.
> > ---
> > 
> > Bootstrapped on loongarch64-linux-gnu.  Not fully regtested but it
> > should be an obvious fix.  Ok for trunk?
> > 
> >   gcc/config/loongarch/loongarch-def.h   | 3 +++
> >   gcc/config/loongarch/loongarch-opts.cc | 2 --
> >   2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch-def.h 
> > b/gcc/config/loongarch/loongarch-def.h
> > index a1237ecf1fd..2dbf006d013 100644
> > --- a/gcc/config/loongarch/loongarch-def.h
> > +++ b/gcc/config/loongarch/loongarch-def.h
> > @@ -203,5 +203,8 @@ extern loongarch_def_array > N_TUNE_TYPES>
> >     loongarch_cpu_align;
> >   extern loongarch_def_array
> >     loongarch_cpu_rtx_cost_data;
> > +extern loongarch_def_array<
> > +  loongarch_def_array,
> > +  N_ABI_BASE_TYPES> abi_minimal_isa;
> >   
> >   #endif /* LOONGARCH_DEF_H */
> > diff --git a/gcc/config/loongarch/loongarch-opts.cc 
> > b/gcc/config/loongarch/loongarch-opts.cc
> > index b87299513c9..7eeac43ed2f 100644
> > --- a/gcc/config/loongarch/loongarch-opts.cc
> > +++ b/gcc/config/loongarch/loongarch-opts.cc
> > @@ -53,8 +53,6 @@ static const int tm_multilib_list[] = { TM_MULTILIB_LIST 
> > };
> >   static int enabled_abi_types[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES] = { 0 };
> >   
> >   #define isa_required(ABI) (abi_minimal_isa[(ABI).base][(ABI).ext])
> > -extern "C" const struct loongarch_isa
> > -abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
> >   
> >   static inline int
> >   is_multilib_enabled (struct loongarch_abi abi)
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] LoongArch: Fix wrong LSX FP vector negation

2024-02-03 Thread Xi Ruoyao
We expanded (neg x) to (minus const0 x) for LSX FP vectors.  This is
wrong because -0.0 is not 0 - 0.0, and it causes some Python tests to
fail when Python is built with LSX enabled.

Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead.  We are already doing this for LASX and now we can unify them
into simd.md.
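
The difference between the two expansions can be seen on a scalar double.
The sketch below is not from the patch; it assumes IEEE 754 binary64 doubles
and the default rounding mode, and the helper names neg_by_subtract and
neg_by_sign_flip are made up for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The removed expansion: compute 0 - x.  For x == +0.0 this yields
   +0.0 in the default rounding mode, not -0.0.  */
static double
neg_by_subtract (double x)
{
  return 0.0 - x;
}

/* The replacement idea: flip bit 63, the sign bit of a binary64
   value, which is what a bit-reverse of the top bit does per element.  */
static double
neg_by_sign_flip (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits ^= (uint64_t) 1 << 63;
  memcpy (&x, &bits, sizeof bits);
  return x;
}
```

Checking signbit () on both results for an input of 0.0 shows the problem:
only the sign flip produces a negative zero.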

gcc/ChangeLog:

* config/loongarch/lsx.md (neg2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/lasx.md | 16 
 gcc/config/loongarch/lsx.md  | 11 ---
 gcc/config/loongarch/simd.md | 18 ++
 3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e2115ffb884..ac84db7f0ce 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -3028,22 +3028,6 @@ (define_insn "absv8sf2"
   [(set_attr "type" "simd_logic")
(set_attr "mode" "V8SF")])
 
-(define_insn "negv4df2"
-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-   (neg:V4DF (match_operand:V4DF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.d\t%u0,%u1,63"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "negv8sf2"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-   (neg:V8SF (match_operand:V8SF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.w\t%u0,%u1,31"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V8SF")])
-
 (define_insn "xvfmadd4"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
(fma:FLASX (match_operand:FLASX 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 7002edae4d4..b9b94b9079c 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -728,17 +728,6 @@ (define_expand "neg2"
   DONE;
 })
 
-(define_expand "neg2"
-  [(set (match_operand:FLSX 0 "register_operand")
-   (neg:FLSX (match_operand:FLSX 1 "register_operand")))]
-  "ISA_HAS_LSX"
-{
-  rtx reg = gen_reg_rtx (mode);
-  emit_move_insn (reg, CONST0_RTX (mode));
-  emit_insn (gen_sub3 (operands[0], reg, operands[1]));
-  DONE;
-})
-
 (define_expand "lsx_vrepli"
   [(match_operand:ILSX 0 "register_operand")
(match_operand 1 "const_imm10_operand")]
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index cb0a19447a1..00ff2823a4e 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -85,12 +85,21 @@ (define_mode_attr simdfmt [(V2DF "d") (V4DF "d")
 (define_mode_attr simdifmt_for_f [(V2DF "l") (V4DF "l")
  (V4SF "w") (V8SF "w")])
 
+;; Suffix for integer mode in LSX or LASX instructions to operating FP
+;; vectors using integer vector operations.
+(define_mode_attr simdfmt_as_i [(V2DF "d") (V4DF "d")
+   (V4SF "w") (V8SF "w")])
+
 ;; Size of vector elements in bits.
 (define_mode_attr elmbits [(V2DI "64") (V4DI "64")
   (V4SI "32") (V8SI "32")
   (V8HI "16") (V16HI "16")
   (V16QI "8") (V32QI "8")])
 
+;; The index of sign bit in FP vector elements.
+(define_mode_attr elmsgnbit [(V2DF "63") (V4DF "63")
+(V4SF "31") (V8SF "31")])
+
 ;; This attribute is used to form an immediate operand constraint using
 ;; "const__operand".
 (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
@@ -457,6 +466,15 @@ (define_expand "reduc__scal_"
   DONE;
 })
 
+;; FP negation.
+(define_insn "neg2"
+  [(set (match_operand:FVEC 0 "register_operand" "=f")
+   (neg:FVEC (match_operand:FVEC 1 "register_operand" "f")))]
+  ""
+  "vbitrevi.\t%0,%1,"
+  [(set_attr "type" "simd_logic")
+   (set_attr "mode" "")])
+
 ; The LoongArch SX Instructions.
 (include "lsx.md")
 
-- 
2.43.0



[r14-8768 Regression] FAIL: libgomp.fortran/non-rectangular-loop-1.f90 -O1 execution test on Linux/x86_64

2024-02-03 Thread haochen.jiang
On Linux/x86_64,

85094e2aa6dba7908f053046f02dd443e8f65d72 is the first bad commit
commit 85094e2aa6dba7908f053046f02dd443e8f65d72
Author: Tamar Christina 
Date:   Fri Feb 2 23:52:27 2024 +

middle-end: check memory accesses in the destination block [PR113588].

caused

FAIL: libgomp.fortran/non-rectangular-loop-1.f90   -O1  execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-8768/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/non-rectangular-loop-1.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/non-rectangular-loop-1.f90 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/non-rectangular-loop-1.f90 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/non-rectangular-loop-1.f90 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH] wide-int: Fix up wi::bswap_large [PR113722]

2024-02-03 Thread Jakub Jelinek
Hi!

Since bswap has been converted from a method to a function we miscompile
the following testcase.  The problem is the assumption that the passed in
len argument (number of limbs in the xval array) is the upper bound for the
bswap result, which is true only if precision is <= 64.  If precision is
larger than that, e.g. 128 as in the testcase, if the argument has only
one limb (i.e. 0 to ~(unsigned HOST_WIDE_INT) 0), the result can still
need 2 limbs for that precision, or generally BLOCKS_NEEDED (precision)
limbs, it all depends on how many least significant limbs of the operand
are zero.  bswap_large as implemented only cleared len limbs of result,
then swapped the bytes (invoking UB when oring something in all the limbs
above it) and finally passed len to canonize, saying that more limbs
aren't needed.

The following patch fixes it by renaming len to xlen (so that it is clear
it is X's length), using it solely for safe_uhwi argument when we attempt
to read from X, and using new len = BLOCKS_NEEDED (precision) instead in
the other two spots (i.e. when clearing the val array, turned it also
into memset, and in canonize argument).  wi::bswap asserts it isn't invoked
on widest_int, so we are always invoked on wide_int or similar and those
have preallocated result sized for the corresponding precision (i.e.
BLOCKS_NEEDED (precision)).
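
To illustrate why the input length is not an upper bound for the result,
here is a small stand-alone model of the fixed logic.  It is a sketch, not
the wide-int code: limbs are plain 64-bit integers, read_limb is a
simplified stand-in for safe_uhwi, toy_bswap is a hypothetical name, and no
canonicalization is done on the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LIMB_BITS 64
#define BLOCKS_NEEDED(prec) (((prec) + LIMB_BITS - 1) / LIMB_BITS)

/* Read limb I of a value stored in XLEN limbs; limbs past the end are
   the sign extension of the top stored limb.  */
static uint64_t
read_limb (const int64_t *xval, unsigned xlen, unsigned i)
{
  if (i < xlen)
    return (uint64_t) xval[i];
  return xval[xlen - 1] < 0 ? UINT64_MAX : 0;
}

/* Byte swap XVAL (XLEN stored limbs, PRECISION bits) into VAL, which
   must have room for BLOCKS_NEEDED (PRECISION) limbs.  XLEN is used
   only for reading; the result length depends on PRECISION alone.  */
static unsigned
toy_bswap (uint64_t *val, const int64_t *xval, unsigned xlen,
           unsigned precision)
{
  unsigned len = BLOCKS_NEEDED (precision);

  assert ((precision & 7) == 0);
  memset (val, 0, sizeof (uint64_t) * len);

  for (unsigned s = 0; s < precision; s += 8)
    {
      unsigned d = precision - s - 8;
      uint64_t byte = (read_limb (xval, xlen, s / LIMB_BITS)
                       >> (s % LIMB_BITS)) & 0xff;
      val[d / LIMB_BITS] |= byte << (d % LIMB_BITS);
    }
  return len;
}
```

With a one-limb input holding 2 and a precision of 128, the only nonzero
byte lands in the second limb of the result, so clearing and reporting just
one limb would be wrong.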

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-02-03  Jakub Jelinek  

PR middle-end/113722
* wide-int.cc (wi::bswap_large): Rename third argument from
len to xlen and adjust use in safe_uhwi.  Add len variable, set
it to BLOCKS_NEEDED (precision) and use it for clearing of val
and as canonize argument.  Clear val using memset instead of
a loop.

* gcc.dg/pr113722.c: New test.

--- gcc/wide-int.cc.jj  2024-01-03 11:51:42.077584823 +0100
+++ gcc/wide-int.cc 2024-02-02 18:13:34.993332159 +0100
@@ -729,20 +729,19 @@ wi::set_bit_large (HOST_WIDE_INT *val, c
 }
 }
 
-/* Byte swap the integer represented by XVAL and LEN into VAL.  Return
+/* Byte swap the integer represented by XVAL and XLEN into VAL.  Return
the number of blocks in VAL.  Both XVAL and VAL have PRECISION bits.  */
 unsigned int
 wi::bswap_large (HOST_WIDE_INT *val, const HOST_WIDE_INT *xval,
-unsigned int len, unsigned int precision)
+unsigned int xlen, unsigned int precision)
 {
-  unsigned int i, s;
+  unsigned int s, len = BLOCKS_NEEDED (precision);
 
   /* This is not a well defined operation if the precision is not a
  multiple of 8.  */
   gcc_assert ((precision & 0x7) == 0);
 
-  for (i = 0; i < len; i++)
-val[i] = 0;
+  memset (val, 0, sizeof (unsigned HOST_WIDE_INT) * len);
 
   /* Only swap the bytes that are not the padding.  */
   for (s = 0; s < precision; s += 8)
@@ -753,7 +752,7 @@ wi::bswap_large (HOST_WIDE_INT *val, con
   unsigned int block = s / HOST_BITS_PER_WIDE_INT;
   unsigned int offset = s & (HOST_BITS_PER_WIDE_INT - 1);
 
-  byte = (safe_uhwi (xval, len, block) >> offset) & 0xff;
+  byte = (safe_uhwi (xval, xlen, block) >> offset) & 0xff;
 
   block = d / HOST_BITS_PER_WIDE_INT;
   offset = d & (HOST_BITS_PER_WIDE_INT - 1);
--- gcc/testsuite/gcc.dg/pr113722.c.jj  2024-02-02 18:25:22.702561427 +0100
+++ gcc/testsuite/gcc.dg/pr113722.c 2024-02-02 18:21:00.109186858 +0100
@@ -0,0 +1,22 @@
+/* PR middle-end/113722 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2" } */
+
+int
+main ()
+{
+  unsigned __int128 a = __builtin_bswap128 ((unsigned __int128) 2);
+  if (a != ((unsigned __int128) 2) << 120)
+__builtin_abort ();
+  a = __builtin_bswap128 ((unsigned __int128) 0xdeadbeefULL);
+  if (a != ((unsigned __int128) 0xefbeaddeULL) << 96)
+__builtin_abort ();
+  a = __builtin_bswap128 (((unsigned __int128) 0xdeadbeefULL) << 64);
+  if (a != ((unsigned __int128) 0xefbeaddeULL) << 32)
+__builtin_abort ();
+  a = __builtin_bswap128 ((((unsigned __int128) 0xdeadbeefULL) << 64)
+ | 0xcafed00dfeedbac1ULL);
+  if (a != ((((unsigned __int128) 0xc1baedfe0dd0fecaULL) << 64)
+   | (((unsigned __int128) 0xefbeaddeULL) << 32)))
+__builtin_abort ();
+}

Jakub



[PATCH] ggc-common: Fix save PCH assertion

2024-02-03 Thread Jakub Jelinek
Hi!

We are getting a gnuradio PCH ICE
/usr/include/pybind11/stl.h:447:1: internal compiler error: in gt_pch_save, at 
ggc-common.cc:693
0x1304e7d gt_pch_save(_IO_FILE*)
../../gcc/ggc-common.cc:693
0x12a45fb c_common_write_pch()
../../gcc/c-family/c-pch.cc:175
0x18ad711 c_parse_final_cleanups()
../../gcc/cp/decl2.cc:5062
0x213988b c_common_parse_file()
../../gcc/c-family/c-opts.cc:1319
(unfortunately it isn't always reproducible: it often needs
up to 100 attempts, and it isn't reproducible in a cross compiler, etc.).
The bug is in the assertion I've added in gt_pch_save when adding
relocation support for the PCH files in case they happen not to be
mmapped at the selected address.
addr is a relocated address which points to a location in the PCH
blob (starting at mmi.preferred_base, with mmi.size bytes) which contains
a pointer that needs to be relocated.  So the assertion is meant to
verify the address is within the PCH blob.  Obviously it needs to be
equal to or above mmi.preferred_base, but I got the other comparison wrong.
When one is very unlucky, the last sizeof (void *) bytes of the
blob happen to be a pointer which needs to be relocated.  For example, on
the s390x host with addr 0x8008a04ff8, mmi.preferred_base 0x8000000000 and
mmi.size 0x8a05000, addr + sizeof (void *) is equal to mmi.preferred_base +
mmi.size, and that is still fine: both addresses are the end of something.
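
The off-by-one can be shown in isolation.  The following is a minimal
sketch, not code from ggc-common.cc; the helper names slot_in_blob and
slot_in_blob_strict are hypothetical and the blob is described by plain
integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a pointer-sized slot starting at ADDR lies entirely inside
   the blob [BASE, BASE + SIZE).  The end of the slot may coincide with
   the end of the blob, hence "<=" rather than "<".  */
static bool
slot_in_blob (uintptr_t addr, uintptr_t base, size_t size)
{
  return addr >= base && addr + sizeof (void *) <= base + size;
}

/* The comparison as originally written, which wrongly rejects a slot
   occupying the last sizeof (void *) bytes of the blob.  */
static bool
slot_in_blob_strict (uintptr_t addr, uintptr_t base, size_t size)
{
  return addr >= base && addr + sizeof (void *) < base + size;
}
```

For a slot that occupies exactly the final pointer-sized bytes of the blob,
the first predicate accepts it and the strict one rejects it, which matches
the spurious ICE described above.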

Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested on s390x
on the testcase which was ICEing in 1-100 iterations and there it survived
7750 attempts without ICE (forgot to stop it earlier), ok for trunk?

2024-02-03  Jakub Jelinek  

* ggc-common.cc (gt_pch_save): Allow addr to be equal to
mmi.preferred_base + mmi.size - sizeof (void *).

--- gcc/ggc-common.cc.jj2024-01-03 11:51:39.397622018 +0100
+++ gcc/ggc-common.cc   2024-02-02 17:33:13.106727473 +0100
@@ -692,7 +692,7 @@ gt_pch_save (FILE *f)
 {
   gcc_assert ((uintptr_t) addr >= (uintptr_t) mmi.preferred_base
  && ((uintptr_t) addr + sizeof (void *)
- < (uintptr_t) mmi.preferred_base + mmi.size));
+ <= (uintptr_t) mmi.preferred_base + mmi.size));
   if (addr == last_addr)
continue;
   if (last_addr == NULL)

Jakub