Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-01-21 Thread Richard Biener
On Fri, Jan 19, 2024 at 5:06 PM Georg-Johann Lay  wrote:
>
>
>
> Am 18.01.24 um 20:54 schrieb Roger Sayle:
> >
> > This patch tweaks RTL expansion of multi-word shifts and rotates to use
> > PLUS rather than IOR for disjunctive operations.  During expansion of
> > these operations, the middle-end creates RTL like (X<>C2)
> > where the constants C1 and C2 guarantee that bits don't overlap.
> > Hence the IOR can be performed by any any_or_plus operation, such as
> > IOR, XOR or PLUS; for word-size operations where carry chains aren't
> > an issue these should all be equally fast (single-cycle) instructions.
> > The benefit of this change is that targets with shift-and-add insns,
> > like x86's lea, can benefit from the LSHIFT-ADD form.
> >
> > An example of a backend that benefits is ARC, which is demonstrated
> > by these two simple functions:
>
> But there are also back-ends where this is bad.
>
> The reason is that with ORI, the back-end needs only to operate no
> these sub-words where the sub-mask is non-zero.  But for PLUS this
> is not the case because the back-end does not know that intermediate
> carry will be zero.  Hence, with PLUS, more instructions are needed.
> An example is AVR, but maybe much more target with multi-word operations
> are affected in a bad way.
>
> Take for example the case with 2 words and a value of 1.
>
> LO |= 1
> HI |= 0
>
> can be optimized to
>
> LO |= 1
>
> but for addition this is not the case:
>
> LO += 1
> HI +=c 0 ;; Does not know that always carry = 0.

I wonder if the PLUS can be done on the lowpart only to make this
detail obvious?

> Johann
>
>
> >
> > unsigned long long foo(unsigned long long x) { return x<<2; }
> >
> > which with -O2 is currently compiled to:
> >
> > foo:lsr r2,r0,30
> >  asl_s   r1,r1,2
> >  asl_s   r0,r0,2
> >  j_s.d   [blink]
> >  or_sr1,r1,r2
> >
> > with this patch becomes:
> >
> > foo:lsr r2,r0,30
> >  add2r1,r2,r1
> >  j_s.d   [blink]
> >  asl_s   r0,r0,2
> >
> > unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }
> >
> > which with -O2 is currently compiled to 6 insns + return:
> >
> > bar:lsr r12,r0,30
> >  asl_s   r3,r1,2
> >  asl_s   r0,r0,2
> >  lsr_s   r1,r1,30
> >  or_sr0,r0,r1
> >  j_s.d   [blink]
> >  or  r1,r12,r3
> >
> > with this patch becomes 4 insns + return:
> >
> > bar:lsr r3,r1,30
> >  lsr r2,r0,30
> >  add2r1,r2,r1
> >  j_s.d   [blink]
> >  add2r0,r3,r0
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2024-01-18  Roger Sayle  
> >
> > gcc/ChangeLog
> >  * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
> >  to generate PLUS instead or IOR when unioning disjoint bitfields.
> >  * optabs.cc (expand_subword_shift): Likewise.
> >  (expand_binop): Likewise for double-word rotate.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >


Re: Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.

2024-01-21 Thread juzhe.zh...@rivai.ai
OK I guess change register_operand into REG_P should fix those ICEs.

I will have a try and send a patch. 

Thanks.



juzhe.zh...@rivai.ai
 
From: Andrew Pinski
Date: 2024-01-22 15:14
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; kito.cheng; jeffreyalaw; vineetg; Robin Dapp; 
palmer; Patrick O'Neill
Subject: Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.
On Sun, Jan 21, 2024 at 11:06 PM juzhe.zh...@rivai.ai
 wrote:
>
> Hi, this patch causes these regressions (all ICEs) in RISC-V backend:
 
Note this is related to the fix for PR 109092.
 
But right now register_operand still accepts `(SUBREG (MEM...))`
before reload ...
So basically there is a check against register_operand to then
reg_or_subregno is called
but since register_operand allows `(SUBREG (MEM())` but
reg_or_subregno  asserts on only having `REG` or `SUBREG(REG)`, things
go downhill.
 
What a mess.  I really wish `(SUBREG (MEM...))` handling would go away ...
 
Thanks,
Andrew Pinski
 
>
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (test for excess errors)
> FAIL: 

Re: [patch][gcn] mkoffload: Fix linking with "-g"; fix file deletion; improve diagnostic [PR111966]

2024-01-21 Thread Richard Biener
On Fri, Jan 19, 2024 at 9:08 PM Tobias Burnus  wrote:
>
> This patch fixes PR111966, i.e. when compiling offloaded code with "-g"
> but without "-march=", mkoffload created a file with e_flags set to
> gfx803/fiji as architecture - while all other files used gfx900, which
> the linker did not like.
>
> Reason: When the default was changed, this flag was missed. When passing
> -march=... instead of relying on the default, it works.
>
> Additionally, it fixed a bug with dangling pointers and multiple
> deletion attempts for the same file, leading normally only to the
> accumulation of /tmp/cc*.mkoffload.dbg.o files.
>
> And, finally,  when building with a recent GCC; it warned about missing
> %<...%> or %qs quotes. I added a couple to reduce the number of warnings.
>
> OK for mainline? — I think the /tmp/cc*.mkoffload.dbg.o part of the
> patch could also be backported to GCC 13 (and 12) if deemed to be useful.

OK for trunk

> Tobias


Re: HELP: Questions on unshare_expr

2024-01-21 Thread Richard Biener
On Fri, Jan 19, 2024 at 5:26 PM Qing Zhao  wrote:
>
>
>
> > On Jan 19, 2024, at 4:30 AM, Richard Biener  
> > wrote:
> >
> > On Thu, Jan 18, 2024 at 3:46 PM Qing Zhao  wrote:
> >>
> >>
> >>
> >>> On Jan 17, 2024, at 1:43 AM, Richard Biener  
> >>> wrote:
> >>>
> >>> On Wed, Jan 17, 2024 at 7:42 AM Richard Biener
> >>>  wrote:
> 
>  On Tue, Jan 16, 2024 at 9:26 PM Qing Zhao  wrote:
> >
> >
> >
> >> On Jan 15, 2024, at 4:31 AM, Richard Biener 
> >>  wrote:
> >>
> >>> All my questions for unshare_expr relate to a  LTO bug that I 
> >>> currently stuck with
> >>> when using .ACCESS_WITH_SIZE in bound sanitizer (only with -flto, 
> >>> without -flto, no issue):
> >>>
> >>> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
> >>> during IPA pass: modref
> >>> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not 
> >>> supported in LTO streams
> >>> 0x14c3993 lto_write_tree
> >>>  ../../latest-gcc-write/gcc/lto-streamer-out.cc:561
> >>> 0x14c3aeb lto_output_tree_1
> >>>
> >>> And the value of the tree node that triggered the ICE is:
> >>> (gdb) call debug_tree(expr)
> >>> 
> >>>  nothrow
> >>>  def_stmt
> >>>  version:13 in-free-list>
> >>>
> >>> Is there any good way to debug LTO bug?
> >>
> >> This happens usually when you have a VLA type and its type fields are 
> >> not
> >> properly gimplified which usually happens because the frontend fails to
> >> insert a gimplification point for it (a DECL_EXPR).
> >
> > I found an old gcc bug
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
> > ICE: tree code ‘ssa_name’ is not supported in LTO streams since 
> > r11-3303-g6450f07388f9fe57
> >
> > Which is very similar to the bug I am having right now.
> >
> > After further study, I suspect that the issue I am having right now 
> > with the LTO streaming also
> > relate to “unshare_expr”, “save_expr”, and the combination of these 
> > two, I suspect that
> > the current gcc cannot handle the combination of these two correctly 
> > for my case.
> >
> > My testing case is:
> >
> > #include 
> > void __attribute__((__noinline__)) setup_and_test_vla (int n1, int n2, 
> > int m)
> > {
> >  struct foo {
> >  int n;
> >  int p[][n2][n1] __attribute__((counted_by(n)));
> >  } *f;
> >
> >  f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
> >  f->n = m;
> >  f->p[m][n2][n1]=1;
> >  return;
> > }
> >
> > int main(int argc, char *argv[])
> > {
> > setup_and_test_vla (10, 11, 20);
> > return 0;
> > }
> >
> > Failed with
> > my_gcc -Os -fsanitize=bounds -flto
> >
> > If changing either n1 or n2 to a constant, the testing passed.
> > If deleting -flto, the testing passed too.
> >
> > I double checked my code per the suggestions provided by you and Jakub 
> > in this
> > email thread, and I think the code should be fine.
> >
> > The code is following:
> >
> > =
> > 504 /* Instrument array bounds for INDIRECT_REFs whose pointers are
> > 505POINTER_PLUS_EXPRs of calls to .ACCESS_WITH_SIZE. We create 
> > special
> > 506builtins that gets expanded in the sanopt pass, and make an array
> > 507dimension of it.  ARRAY is the pointer to the base of the array,
> > 508which is a call to .ACCESS_WITH_SIZE, *OFFSET is the offset to 
> > the
> > 509beginning of array.
> > 510Return NULL_TREE if no instrumentation is emitted.  */
> > 511
> > 512 tree
> > 513 ubsan_instrument_bounds_indirect_ref (location_t loc, tree array, 
> > tree *offset)
> > 514 {
> > 515   if (!is_access_with_size_p (array))
> > 516 return NULL_TREE;
> > 517   tree bound = get_bound_from_access_with_size (array);
> > 518   /* The type of the call to .ACCESS_WITH_SIZE is a pointer type to
> > 519  the element of the array.  */
> > 520   tree element_size = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE 
> > (array)));
> > 521   gcc_assert (bound);
> > 522
> > 523   /* Given the offset, and the size of each element, the index can 
> > be
> > 524  computed as: offset/element_size.  */
> > 525   *offset = save_expr (*offset);
> > 526   tree index = fold_build2 (EXACT_DIV_EXPR,
> > 527sizetype, *offset,
> > 528unshare_expr (element_size));
> > 529   /* Create a "(T *) 0" tree node to describe the original array 
> > type.
> > 530  We get the original array type from the first argument of the 
> > call to
> > 531  .ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, num_bytes, -1).
> > 532
> > 533  Originally, REF is a COMPONENT_REF with the original array 
> > type,
> > 534 

Re: [PATCH] c/c++: Tweak warning for 'always_inline function might not be inlinable'

2024-01-21 Thread Richard Biener
On Sun, Jan 21, 2024 at 2:33 PM Hans-Peter Nilsson  wrote:
>
> Tested x86_64-linux-gnu.  Ok to commit?
>
> Or, does the message need more tweaking?
> (If so, suggestions from native speakers?)
> FWIW, I found no PR for just the message being bad.
>
> -- >8 --
> When you're not regularly exposed to this warning, it is
> easy to be misled by its wording, believing that there's
> something else in the function that stops it from being
> inlined, than the lack of also being *declared* inline.
>
> It's just a warning: without the inline directive, there has
> to be a secondary reasons for the function to be inlined,
> than the always_inline attribute, reasons that may be in
> effect despite the warning.
>
> Whenever the text is quoted in inline-related bugzilla
> entries, there seems to often have been an initial step of
> confusion that has to be cleared, for example in PR55830.
> The powerpc test-case in this patch where a comment is
> adjusted, seems to be another example, and I testify as the
> first-hand third "experience".  The wording has been the
> same since the warning was added.
>
> Let's just tweak the wording, adding the cause, so that the
> reason for the warning is clearer.  This hopefully stops the
> user from immediately asking "'Might'?  Because why?"  and
> then going off looking at the function body - or grepping
> the gcc source or documentation, or enter a bug-report
> subsequently closed as resolved/invalid.
>
> gcc:
> * cgraphunit.cc (process_function_and_variable_attributes): Tweak
> the warning for an attribute-always_inline without inline declaration.
>
> gcc/testsuite:
> * g++.dg/Wattributes-3.C: Adjust expected warning.
> * gcc.dg/fail_always_inline.c: Ditto.
> * gcc.target/powerpc/vec-extract-v16qiu-v2.h: Adjust
> quoted warning in comment.
> ---
>  gcc/cgraphunit.cc| 3 ++-
>  gcc/testsuite/g++.dg/Wattributes-3.C | 4 ++--
>  gcc/testsuite/gcc.dg/fail_always_inline.c| 2 +-
>  gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h | 2 +-
>  4 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
> index 38052674aaa5..89dc1af522a4 100644
> --- a/gcc/cgraphunit.cc
> +++ b/gcc/cgraphunit.cc
> @@ -918,7 +918,8 @@ process_function_and_variable_attributes (cgraph_node 
> *first,
>   /* redefining extern inline function makes it DECL_UNINLINABLE.  */
>   && !DECL_UNINLINABLE (decl))
> warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
> -   "% function might not be inlinable");
> +   "% function is not always inlined"
> +   " unless also declared %");

I don't like the "is not always inlined", maybe simply reword to

  "% function might not be inlinable"
  " unless also declared %"

?

>
>process_common_attributes (node, decl);
>  }
> diff --git a/gcc/testsuite/g++.dg/Wattributes-3.C 
> b/gcc/testsuite/g++.dg/Wattributes-3.C
> index 208ec6696551..4adf0944cd4f 100644
> --- a/gcc/testsuite/g++.dg/Wattributes-3.C
> +++ b/gcc/testsuite/g++.dg/Wattributes-3.C
> @@ -26,7 +26,7 @@ B::operator char () const { return 0; }
>
>  ATTR ((__noinline__))
>  B::operator int () const  // { dg-warning "ignoring attribute .noinline. 
> because it conflicts with attribute .always_inline." }
> -// { dg-warning "function might not be inlinable" "" { target *-*-* } .-1 }
> +// { dg-warning "function is not always inlined unless also declared 
> .inline." "" { target *-*-* } .-1 }
>
>  {
>return 0;
> @@ -45,7 +45,7 @@ C::operator char () { return 0; }
>
>  ATTR ((__noinline__))
>  C::operator short ()   // { dg-warning "ignoring attribute 
> .noinline. because it conflicts with attribute .always_inline." }
> -// { dg-warning "function might not be inlinable" "" { target *-*-* } .-1 }
> +// { dg-warning "function is not always inlined unless also declared 
> .inline." "" { target *-*-* } .-1 }
>  { return 0; }
>
>  inline ATTR ((__noinline__))
> diff --git a/gcc/testsuite/gcc.dg/fail_always_inline.c 
> b/gcc/testsuite/gcc.dg/fail_always_inline.c
> index 86645b850de8..2f48d7f5c6be 100644
> --- a/gcc/testsuite/gcc.dg/fail_always_inline.c
> +++ b/gcc/testsuite/gcc.dg/fail_always_inline.c
> @@ -2,7 +2,7 @@
>  /* { dg-add-options bind_pic_locally } */
>
>  extern __attribute__ ((always_inline)) void
> - bar() { } /* { dg-warning "function might not be inlinable" } */
> + bar() { } /* { dg-warning "function is not always inlined unless also 
> declared .inline." } */
>
>  void
>  f()
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h 
> b/gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h
> index d1157599ee7c..b9800e1c950d 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h
> @@ -179,7 +179,7 @@ get_auto_15 (vector TYPE a)
>  #ifdef 

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-21 Thread chenglulu



在 2024/1/19 下午4:51, chenglulu 写道:


在 2024/1/19 下午1:46, Xi Ruoyao 写道:

On Wed, 2024-01-17 at 17:57 +0800, chenglulu wrote:
Virtual register 1479 will be used in insn 2744, but register 1479 
was

assigned the REG_UNUSED attribute in the previous instruction.

The attached file is the wrong file.
The compilation command is as follows:

$ ./gcc/cc1 -fpreprocessed regrename.i -quiet -dp -dumpbase 
regrename.c

-dumpbase-ext .c -mno-relax -mabi=lp64d -march=loongarch64 -mfpu=64
-msimd=lasx -mcmodel=extreme -mtune=loongarch64 -g3 -O2
-Wno-int-conversion -Wno-implicit-int 
-Wno-implicit-function-declaration

-Wno-incompatible-pointer-types -version -o regrename.s
-mexplicit-relocs=always -fdump-rtl-all-all

I've seen some "guality" test failures in GCC test suite as well.
Normally I just ignore the guality failures but this time they look 
very

suspicious.  I'll investigate these issues...

I've also seen this type of failed regression tests and I'll 
continue to

look at this issue as well.

The guality regression is simple: I didn't call
delegitimize_mem_from_attrs (the default TARGET_DELEGITIMIZE_ADDRESS) in
the custom implementation.

The failure of this test case was because the compiler believes that two
(UNSPEC_PCREL_64_PART2 [(symbol)]) instances would always produce the
same result, but this isn't true because the result depends on PC.  Thus
(pc) needed to be included in the RTX, like:

   [(set (match_operand:DI 0 "register_operand" "=r")
 (unspec:DI [(match_operand:DI 2 "") (pc)] 
UNSPEC_LA_PCREL_64_PART1))

    (set (match_operand:DI 1 "register_operand" "=r")
 (unspec:DI [(match_dup 2) (pc)] UNSPEC_LA_PCREL_64_PART2))]

With this the buggy REG_UNUSED notes were gone.  But it then prevented
the CSE when loading the address of __tls_get_addr (i.e. if we address
10 TLE_LD symbols in a function it would emit 10 instances of "la.global
__tls_get_addr") so I added an REG_EQUAL note for it.  For symbols other
than __tls_get_addr such notes are added automatically by optimization
passes.

Updated patch attached.

I'm eliminating redundant la.global directives in my macro 
implementation.


I will be testing this patch.




With this patch, spec2006 can pass the test, but spec2017 621 and 654 
tests fail.

I haven't debugged the specific cause of the problem yet.



Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.

2024-01-21 Thread Andrew Pinski
On Sun, Jan 21, 2024 at 11:06 PM juzhe.zh...@rivai.ai
 wrote:
>
> Hi, this patch causes these regressions (all ICEs) in RISC-V backend:

Note this is related to the fix for PR 109092.

But right now register_operand still accepts `(SUBREG (MEM...))`
before reload ...
So basically there is a check against register_operand to then
reg_or_subregno is called
but since register_operand allows `(SUBREG (MEM())` but
reg_or_subregno  asserts on only having `REG` or `SUBREG(REG)`, things
go downhill.

What a mess.  I really wish `(SUBREG (MEM...))` handling would go away ...

Thanks,
Andrew Pinski

>
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -Os  (test for excess errors)
>
> 
> juzhe.zh...@rivai.ai


[PATCH] PR rtl-optimization/111267: Improved forward propagation.

2024-01-21 Thread juzhe.zh...@rivai.ai
Hi, this patch causes these regressions (all ICEs) in RISC-V backend:

FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -Os  (internal compiler error: in 
reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -Os  (test for excess errors) 



juzhe.zh...@rivai.ai


Re: Re: [PATCH v2] RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]

2024-01-21 Thread Li Xu
Committed, thanks



xu...@eswincomputing.com
 
From: juzhe.zh...@rivai.ai
Date: 2024-01-22 14:40
To: Li Xu; gcc-patches
CC: kito.cheng; palmer; Li Xu
Subject: Re: [PATCH v2] RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]
LGTM.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2024-01-22 12:11
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH v2] RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]
From: xuli 
 
v2:
Avoid internal ICE for the case below.
vint8mf8_t test_vle8_v_i8mf8_m(vbool64_t vm, const int32_t *rs1, size_t vl) {
  return __riscv_vle8(vm, rs1, vl);
}
 
v1:
Change the hash value of overloaded intrinsic from considering
all parameter types to:
1. Encoding vector data type
2. In order to distinguish vle8_v_i8mf8_m(vbool64_t vm, const int8_t *rs1, 
size_t vl)
   and vle8_v_u8mf8_m(vbool64_t vm, const uint8_t *rs1, size_t vl), encode the 
pointer type
3. In order to distinguish vfadd_vv_f32mf2_rm(vfloat32mf2_t vs2, vfloat32mf2_t 
vs1, size_t vl)
   and vfadd_vv_f32mf2(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl), encode 
the number of
   parameters. The same goes for the vxrm intrinsics.
 
PR target/113420
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins.cc (has_vxrm_or_frm_p):remove.
(registered_function::overloaded_hash):refacotr.
(resolve_overloaded_builtin):avoid interal ICE.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr113420-1.c: New test.
* gcc.target/riscv/rvv/base/pr113420-2.c: New test.
---
gcc/config/riscv/riscv-vector-builtins.cc | 93 ---
.../gcc.target/riscv/rvv/base/pr113420-1.c| 30 ++
.../gcc.target/riscv/rvv/base/pr113420-2.c| 31 +++
3 files changed, 77 insertions(+), 77 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr113420-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr113420-2.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 25e0b6e56de..c0e7af482da 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -4271,24 +4271,22 @@ registered_function::overloaded_hash () const
: TYPE_UNSIGNED (type);
   mode_p = POINTER_TYPE_P (type) ? TYPE_MODE (TREE_TYPE (type))
 : TYPE_MODE (type);
-  h.add_int (unsigned_p);
-  h.add_int (mode_p);
+  if (POINTER_TYPE_P (type) || lookup_vector_type_attribute (type))
+ {
+   h.add_int (unsigned_p);
+   h.add_int (mode_p);
+ }
+  else if (instance.base->may_require_vxrm_p ()
+|| instance.base->may_require_frm_p ())
+ {
+   h.add_int (argument_types.length ());
+   break;
+ }
 }
   return h.end ();
}
-bool
-has_vxrm_or_frm_p (function_instance , const vec 
)
-{
-  if (instance.base->may_require_vxrm_p ()
-  || (instance.base->may_require_frm_p ()
-   && (TREE_CODE (TREE_TYPE (arglist[arglist.length () - 2]))
-   == INTEGER_TYPE)))
-return true;
-  return false;
-}
-
hashval_t
registered_function::overloaded_hash (const vec )
{
@@ -4296,68 +4294,8 @@ registered_function::overloaded_hash (const vec )
   unsigned int len = arglist.length ();
   for (unsigned int i = 0; i < len; i++)
-{
-  /* vint8m1_t __riscv_vget_i8m1(vint8m2_t src, size_t index);
-  When the user calls vget intrinsic, the __riscv_vget_i8m1(src, 1)
-   form is used. The compiler recognizes that the parameter index is signed
-   int, which is inconsistent with size_t, so the index is converted to
-   size_t type in order to get correct hash value. vint8m2_t
-   __riscv_vset(vint8m2_t dest, size_t index, vint8m1_t value); The reason
-   is the same as above. */
-  if ((instance.base == bases::vget && (i == (len - 1)))
-   || ((instance.base == bases::vset
-   || instance.shape == shapes::crypto_vi)
- && (i == (len - 2
- argument_types.safe_push (size_type_node);
-  /* Vector fixed-point arithmetic instructions requiring argument vxrm.
-  For example: vuint32m4_t __riscv_vaaddu(vuint32m4_t vs2,
-  vuint32m4_t vs1, unsigned int vxrm, size_t vl); The user calls vaaddu
-  intrinsic in the form of __riscv_vaaddu(vs2, vs1, 2, vl). The compiler
-  recognizes that the parameter vxrm is a signed int, which is inconsistent
-  with the parameter unsigned int vxrm declared by intrinsic, so the
-  parameter vxrm is converted to an unsigned int type in order to get
-  correct hash value.
-
-  Vector Floating-Point Instructions requiring argument frm.
-  DEF_RVV_FUNCTION (vfadd, alu, full_preds, f_vvv_ops)
-  DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvv_ops)
-  Taking vfadd as an example, theoretically we can add base or shape to the
-  hash value to distinguish whether the frm parameter is required.
-  vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, float32_t rs1, size_t vl);
-  vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, 

[PATCH] RISC-V: Lower vmv.v.x (avl = 1) into vmv.s.x

2024-01-21 Thread Juzhe-Zhong
Notice there is a AI benchmark, GCC vs Clang has 3% performance drop.

It's because Clang/LLVM has a simplification transform vmv.v.x (avl = 1) into 
vmv.s.x.

Since vmv.s.x has more flexible vsetvl demand than vmv.v.x that can allow us to 
have
better chances to fuse vsetvl.

Consider this following case:

void
foo (uint32_t *outputMat, uint32_t *inputMat)
{
  vuint32m1_t matRegIn0 = __riscv_vle32_v_u32m1 (inputMat, 4);
  vuint32m1_t matRegIn1 = __riscv_vle32_v_u32m1 (inputMat + 4, 4);
  vuint32m1_t matRegIn2 = __riscv_vle32_v_u32m1 (inputMat + 8, 4);
  vuint32m1_t matRegIn3 = __riscv_vle32_v_u32m1 (inputMat + 12, 4);

  vbool32_t oddMask
= __riscv_vreinterpret_v_u32m1_b32 (__riscv_vmv_v_x_u32m1 (0x, 1));

  vuint32m1_t smallTransposeMat0
= __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn0, matRegIn1, 1, 4);
  vuint32m1_t smallTransposeMat2
= __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn2, matRegIn3, 1, 4);

  vuint32m1_t outMat0 = __riscv_vslideup_vx_u32m1_tu (smallTransposeMat0,
  smallTransposeMat2, 2, 4);

  __riscv_vse32_v_u32m1 (outputMat, outMat0, 4);
}

Before this patch:

vsetivlizero,4,e32,m1,ta,ma
li  a5,45056
addia2,a1,16
addia3,a1,32
addia4,a1,48
vle32.v v1,0(a1)
vle32.v v4,0(a2)
vle32.v v2,0(a3)
vle32.v v3,0(a4)
addiw   a5,a5,-1366
vsetivlizero,1,e32,m1,ta,ma
vmv.v.x v0,a5 ---> Since it avl = 1, we can 
transform it into vmv.s.x
vsetivlizero,4,e32,m1,tu,mu
vslideup.vi v1,v4,1,v0.t
vslideup.vi v2,v3,1,v0.t
vslideup.vi v1,v2,2
vse32.v v1,0(a0)
ret

After this patch:

li  a5,45056
addia2,a1,16
vsetivlizero,4,e32,m1,tu,mu
addiw   a5,a5,-1366
vle32.v v3,0(a2)
addia3,a1,32
addia4,a1,48
vle32.v v1,0(a1)
vmv.s.x v0,a5
vle32.v v2,0(a3)
vslideup.vi v1,v3,1,v0.t
vle32.v v3,0(a4)
vslideup.vi v2,v3,1,v0.t
vslideup.vi v1,v2,2
vse32.v v1,0(a0)
ret

Tested on both RV32 and RV64 no regression.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (splat_to_scalar_move_p): New function.
* config/riscv/riscv-v.cc (splat_to_scalar_move_p): Ditto.
* config/riscv/vector.md: Simplify vmv.v.x. into vmv.s.x.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/attribute-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/attribute-3.c: New test.

---
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 12 ++
 gcc/config/riscv/vector.md|  9 -
 .../gcc.target/riscv/rvv/vsetvl/attribute-2.c | 37 +++
 .../gcc.target/riscv/rvv/vsetvl/attribute-3.c | 36 ++
 5 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-3.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 7fe26fcd939..b3f0bdb9924 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -708,6 +708,7 @@ bool can_be_broadcasted_p (rtx);
 bool gather_scatter_valid_offset_p (machine_mode);
 HOST_WIDE_INT estimated_poly_value (poly_int64, unsigned int);
 bool whole_reg_to_reg_move_p (rtx *, machine_mode, int);
+bool splat_to_scalar_move_p (rtx *);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 93a1238a5ab..4bacb7fea45 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -5151,4 +5151,16 @@ whole_reg_to_reg_move_p (rtx *ops, machine_mode mode, 
int avl_type_index)
   return false;
 }
 
+/* Return true if we can transform vmv.v.x/vfmv.v.f to vmv.s.x/vfmv.s.f.  */
+bool
+splat_to_scalar_move_p (rtx *ops)
+{
+  return satisfies_constraint_Wc1 (ops[1])
+&& satisfies_constraint_vu (ops[2])
+&& !MEM_P (ops[3])
+&& satisfies_constraint_c01 (ops[4])
+&& INTVAL (ops[7]) == NONVLMAX
+&& known_ge (GET_MODE_SIZE (Pmode), GET_MODE_SIZE (GET_MODE (ops[3])));
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 307d9a8c952..ab6e099852d 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1977,8 +1977,15 @@
  (match_operand:V_VLS 2 "vector_merge_operand")))]
   "TARGET_VECTOR"
 {
+  /* Transform vmv.v.x/vfmv.v.f (avl = 1) into vmv.s.x since vmv.s.x/vfmv.s.f
+ has better chances to do vsetvl fusion in vsetvl pass.  */
+  if (riscv_vector::splat_to_scalar_move_p (operands))
+{
+  operands[1] = riscv_vector::gen_scalar_move_mask (mode);
+  operands[3] 

Re: [PATCH v2] RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]

2024-01-21 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2024-01-22 12:11
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH v2] RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]
From: xuli 
 
v2:
Avoid internal ICE for the case below.
vint8mf8_t test_vle8_v_i8mf8_m(vbool64_t vm, const int32_t *rs1, size_t vl) {
  return __riscv_vle8(vm, rs1, vl);
}
 
v1:
Change the hash value of overloaded intrinsic from considering
all parameter types to:
1. Encoding vector data type
2. In order to distinguish vle8_v_i8mf8_m(vbool64_t vm, const int8_t *rs1, 
size_t vl)
   and vle8_v_u8mf8_m(vbool64_t vm, const uint8_t *rs1, size_t vl), encode the 
pointer type
3. In order to distinguish vfadd_vv_f32mf2_rm(vfloat32mf2_t vs2, vfloat32mf2_t 
vs1, size_t vl)
   and vfadd_vv_f32mf2(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl), encode 
the number of
   parameters. The same goes for the vxrm intrinsics.
 
PR target/113420
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins.cc (has_vxrm_or_frm_p):remove.
(registered_function::overloaded_hash):refacotr.
(resolve_overloaded_builtin):avoid interal ICE.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr113420-1.c: New test.
* gcc.target/riscv/rvv/base/pr113420-2.c: New test.
---
gcc/config/riscv/riscv-vector-builtins.cc | 93 ---
.../gcc.target/riscv/rvv/base/pr113420-1.c| 30 ++
.../gcc.target/riscv/rvv/base/pr113420-2.c| 31 +++
3 files changed, 77 insertions(+), 77 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr113420-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr113420-2.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 25e0b6e56de..c0e7af482da 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -4271,24 +4271,22 @@ registered_function::overloaded_hash () const
: TYPE_UNSIGNED (type);
   mode_p = POINTER_TYPE_P (type) ? TYPE_MODE (TREE_TYPE (type))
 : TYPE_MODE (type);
-  h.add_int (unsigned_p);
-  h.add_int (mode_p);
+  if (POINTER_TYPE_P (type) || lookup_vector_type_attribute (type))
+ {
+   h.add_int (unsigned_p);
+   h.add_int (mode_p);
+ }
+  else if (instance.base->may_require_vxrm_p ()
+|| instance.base->may_require_frm_p ())
+ {
+   h.add_int (argument_types.length ());
+   break;
+ }
 }
   return h.end ();
}
-bool
-has_vxrm_or_frm_p (function_instance , const vec 
)
-{
-  if (instance.base->may_require_vxrm_p ()
-  || (instance.base->may_require_frm_p ()
-   && (TREE_CODE (TREE_TYPE (arglist[arglist.length () - 2]))
-   == INTEGER_TYPE)))
-return true;
-  return false;
-}
-
hashval_t
registered_function::overloaded_hash (const vec )
{
@@ -4296,68 +4294,8 @@ registered_function::overloaded_hash (const vec )
   unsigned int len = arglist.length ();
   for (unsigned int i = 0; i < len; i++)
-{
-  /* vint8m1_t __riscv_vget_i8m1(vint8m2_t src, size_t index);
-  When the user calls vget intrinsic, the __riscv_vget_i8m1(src, 1)
-   form is used. The compiler recognizes that the parameter index is signed
-   int, which is inconsistent with size_t, so the index is converted to
-   size_t type in order to get correct hash value. vint8m2_t
-   __riscv_vset(vint8m2_t dest, size_t index, vint8m1_t value); The reason
-   is the same as above. */
-  if ((instance.base == bases::vget && (i == (len - 1)))
-   || ((instance.base == bases::vset
-   || instance.shape == shapes::crypto_vi)
- && (i == (len - 2
- argument_types.safe_push (size_type_node);
-  /* Vector fixed-point arithmetic instructions requiring argument vxrm.
-  For example: vuint32m4_t __riscv_vaaddu(vuint32m4_t vs2,
-  vuint32m4_t vs1, unsigned int vxrm, size_t vl); The user calls vaaddu
-  intrinsic in the form of __riscv_vaaddu(vs2, vs1, 2, vl). The compiler
-  recognizes that the parameter vxrm is a signed int, which is inconsistent
-  with the parameter unsigned int vxrm declared by intrinsic, so the
-  parameter vxrm is converted to an unsigned int type in order to get
-  correct hash value.
-
-  Vector Floating-Point Instructions requiring argument frm.
-  DEF_RVV_FUNCTION (vfadd, alu, full_preds, f_vvv_ops)
-  DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvv_ops)
-  Taking vfadd as an example, theoretically we can add base or shape to the
-  hash value to distinguish whether the frm parameter is required.
-  vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, float32_t rs1, size_t vl);
-  vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, vfloat32m1_t vs1, unsigned 
int
-  frm, size_t vl);
-
- However, the current registration mechanism of overloaded intinsic for gcc
-  limits the intrinsic obtained by entering the hook to always be vfadd, 
not
-  vfadd_frm. 

Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin

2024-01-21 Thread François Dumont
Thanks, nice result, I'll try to run the performance benchmarks that are 
coming with libstdc++ to see if they spot anything.


That's tests in testsuite/performance folder in case you want to have a 
try yourself.


François


On 18/01/2024 10:26, Huanghui Nie wrote:


Yes, I have. I did a benchmark today.

The conclusion is: the time consumption can be reduced by 0.4% ~ 1.2% 
when unordered_set erase(begin()), and 1.2% ~ 2.4% when erase(begin(), 
end()).



My test environment:

CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 2393.365 MHz, 56 CPUs

MEM: 256G

OS: CentOS-8.2

g++: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)

Compile flags: -O3 -std=c++17


Test conclusion data (time taken to delete every 100 million elements):

erase(begin()):

|size of unordered_set |100 |1,000 |10,000|100,000 |1,000,000|10,000,000|

|base time consuming (ms)|3827.736|3807.725|3830.168|3807.373|3798.713 
|3854.168|


|test time consuming (ms)|3783.406|3789.460|3791.146|3778.033|3783.494 
|3808.137|


|Time-consuming reduction|1.16% |0.48% |1.02% |0.77% |0.40%|1.19% |

erase(begin(),end()):

|size of unordered_set |100 |1,000 |10,000|100,000 |1,000,000|10,000,000|

|base time consuming (ms)|2779.229|2768.550|2795.778|2767.385|2761.521 
|2804.099|


|test time consuming (ms)|2712.759|2726.578|2752.224|2732.140|2718.953 
|2739.727|


|Time-consuming reduction|2.39% |1.52% |1.56% |1.27% |1.54%|2.30% |


Please see the attachment for test code and detailed test result.


2024年1月18日(木) 4:04 François Dumont :

Hi

Looks like a great finding to me, this is indeed a useless check,
thanks!

Have you any figures on the performance enhancement ? It might
help to get proper approval as gcc is currently in dev stage 4
that is to say only bug fixes normally.

François

On 17/01/2024 09:11, Huanghui Nie wrote:


Hi.

When I implemented a hash table with reference to the C++ STL, I
found that when the hash table in the C++ STL deletes elements,
if the first element deleted is the begin element, the before
begin node is repeatedly assigned. This creates unnecessary
performance overhead.


First, let’s see the code implementation:

In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned
when &_M_before_begin == _M_buckets[__bkt]. That also means
_M_buckets[__bkt]->_M_nxt is assigned under some conditions.

_M_remove_bucket_begin is called by _M_erase and _M_extract_node:

 1. Case _M_erase a range: _M_remove_bucket_begin is called in a
for loop when __is_bucket_begin is true. And if
__is_bucket_begin is true and &_M_before_begin ==
_M_buckets[__bkt], __prev_n must be &_M_before_begin.
__prev_n->_M_nxt is always assigned in _M_erase. That means
_M_before_begin._M_nxt is always assigned, if
_M_remove_bucket_begin is called and &_M_before_begin ==
_M_buckets[__bkt]. So there’s no need to assign
_M_before_begin._M_nxt in _M_remove_bucket_begin.
 2. Other cases: _M_remove_bucket_begin is called when __prev_n
== _M_buckets[__bkt]. And __prev_n->_M_nxt is always assigned
in _M_erase and _M_before_begin. That means
_M_buckets[__bkt]->_M_nxt is always assigned. So there's no
need to assign _M_buckets[__bkt]->_M_nxt in
_M_remove_bucket_begin.

In summary, there’s no need to check &_M_before_begin ==
_M_buckets[__bkt] and assign _M_before_begin._M_nxt in
_M_remove_bucket_begin.


Then let’s see the responsibility of each method:

The hash table in the C++ STL is composed of hash buckets and a
node list. The update of the node list is responsible for
_M_erase and _M_extract_node method. _M_remove_bucket_begin
method only needs to update the hash buckets. The update of
_M_before_begin belongs to the update of the node list. So
_M_remove_bucket_begin doesn’t need to update _M_before_begin.


Existing tests listed below cover this change:

23_containers/unordered_set/allocator/copy.cc

23_containers/unordered_set/allocator/copy_assign.cc

23_containers/unordered_set/allocator/move.cc

23_containers/unordered_set/allocator/move_assign.cc

23_containers/unordered_set/allocator/swap.cc

23_containers/unordered_set/erase/1.cc

23_containers/unordered_set/erase/24061-set.cc

23_containers/unordered_set/modifiers/extract.cc

23_containers/unordered_set/operations/count.cc

23_containers/unordered_set/requirements/exception/basic.cc

23_containers/unordered_map/allocator/copy.cc

23_containers/unordered_map/allocator/copy_assign.cc

23_containers/unordered_map/allocator/move.cc

23_containers/unordered_map/allocator/move_assign.cc

23_containers/unordered_map/allocator/swap.cc

23_containers/unordered_map/erase/1.cc

23_containers/unordered_map/erase/24061-map.cc

23_containers/unordered_map/modifiers/extract.cc


[PATCH v2] RISC-V: Add split pattern to generate SFB instructions. [PR113095]

2024-01-21 Thread Monk Chiang
Since the match.pd transforms (zero_one == 0) ? y : z  y,
into ((typeof(y))zero_one * z)  y. Add splitters to recongize
this expression to generate SFB instructions.

gcc/ChangeLog:
PR target/113095
* config/riscv/sfb.md: New splitters to rewrite single bit
sign extension as the condition to SFB instructions.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/sfb.c: New test.
* gcc.target/riscv/pr113095.c: New test.
---
 gcc/config/riscv/sfb.md   | 32 +++
 gcc/testsuite/gcc.target/riscv/pr113095.c | 20 ++
 gcc/testsuite/gcc.target/riscv/sfb.c  | 24 +
 3 files changed, 76 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr113095.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sfb.c

diff --git a/gcc/config/riscv/sfb.md b/gcc/config/riscv/sfb.md
index 8ab747142c8..bfd229e3d09 100644
--- a/gcc/config/riscv/sfb.md
+++ b/gcc/config/riscv/sfb.md
@@ -35,3 +35,35 @@
   [(set_attr "length" "8")
(set_attr "type" "sfb_alu")
(set_attr "mode" "")])
+
+;; Combine creates this form ((typeof(y))zero_one * z)  y
+;; for SiFive short forward branches.
+
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_SFB_ALU"
+  [(set (match_dup 4) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (if_then_else:X (ne (match_dup 4) (const_int 0))
+ (match_dup 3)
+ (const_int 0)))])
+
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_SFB_ALU && (UINTVAL (operands[2]) < 11)"
+  [(set (match_dup 4) (and:X (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (if_then_else:X (ne (match_dup 4) (const_int 0))
+ (match_dup 3)
+ (const_int 0)))]
+{
+  operands[2] = GEN_INT (1 << UINTVAL(operands[2]));
+})
diff --git a/gcc/testsuite/gcc.target/riscv/pr113095.c 
b/gcc/testsuite/gcc.target/riscv/pr113095.c
new file mode 100644
index 000..50dd6be11c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr113095.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -march=rv32gc -mabi=ilp32d -mtune=sifive-7-series" } */
+
+extern void abort (void);
+extern void exit (int);
+
+unsigned short __attribute__ ((noinline, noclone))
+foo (unsigned short x) {
+  if (x == 1)
+x ^= 0x4002;
+
+  return x;
+}
+
+int main () {
+  if (foo(1) != 0x4003)
+abort ();
+
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/sfb.c 
b/gcc/testsuite/gcc.target/riscv/sfb.c
new file mode 100644
index 000..22f164051f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sfb.c
@@ -0,0 +1,24 @@
+//* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32gc -mabi=ilp32d -mtune=sifive-7-series" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+/* { dg-final { scan-assembler-times "bne" 4 } } */
+/* { dg-final { scan-assembler-times "movcc" 4 } } */
-- 
2.40.1



[PATCH v2] RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]

2024-01-21 Thread Li Xu
From: xuli 

v2:
Avoid internal ICE for the case below.
vint8mf8_t test_vle8_v_i8mf8_m(vbool64_t vm, const int32_t *rs1, size_t vl) {
  return __riscv_vle8(vm, rs1, vl);
}

v1:
Change the hash value of overloaded intrinsic from considering
all parameter types to:
1. Encoding vector data type
2. In order to distinguish vle8_v_i8mf8_m(vbool64_t vm, const int8_t *rs1, 
size_t vl)
   and vle8_v_u8mf8_m(vbool64_t vm, const uint8_t *rs1, size_t vl), encode the 
pointer type
3. In order to distinguish vfadd_vv_f32mf2_rm(vfloat32mf2_t vs2, vfloat32mf2_t 
vs1, size_t vl)
   and vfadd_vv_f32mf2(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl), encode 
the number of
   parameters. The same goes for the vxrm intrinsics.

PR target/113420

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc (has_vxrm_or_frm_p):remove.
(registered_function::overloaded_hash):refacotr.
(resolve_overloaded_builtin):avoid interal ICE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr113420-1.c: New test.
* gcc.target/riscv/rvv/base/pr113420-2.c: New test.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 93 ---
 .../gcc.target/riscv/rvv/base/pr113420-1.c| 30 ++
 .../gcc.target/riscv/rvv/base/pr113420-2.c| 31 +++
 3 files changed, 77 insertions(+), 77 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr113420-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr113420-2.c

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 25e0b6e56de..c0e7af482da 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -4271,24 +4271,22 @@ registered_function::overloaded_hash () const
 : TYPE_UNSIGNED (type);
   mode_p = POINTER_TYPE_P (type) ? TYPE_MODE (TREE_TYPE (type))
 : TYPE_MODE (type);
-  h.add_int (unsigned_p);
-  h.add_int (mode_p);
+  if (POINTER_TYPE_P (type) || lookup_vector_type_attribute (type))
+   {
+ h.add_int (unsigned_p);
+ h.add_int (mode_p);
+   }
+  else if (instance.base->may_require_vxrm_p ()
+  || instance.base->may_require_frm_p ())
+   {
+ h.add_int (argument_types.length ());
+ break;
+   }
 }
 
   return h.end ();
 }
 
-bool
-has_vxrm_or_frm_p (function_instance , const vec 
)
-{
-  if (instance.base->may_require_vxrm_p ()
-  || (instance.base->may_require_frm_p ()
- && (TREE_CODE (TREE_TYPE (arglist[arglist.length () - 2]))
- == INTEGER_TYPE)))
-return true;
-  return false;
-}
-
 hashval_t
 registered_function::overloaded_hash (const vec )
 {
@@ -4296,68 +4294,8 @@ registered_function::overloaded_hash (const vec )
   unsigned int len = arglist.length ();
 
   for (unsigned int i = 0; i < len; i++)
-{
-  /* vint8m1_t __riscv_vget_i8m1(vint8m2_t src, size_t index);
-When the user calls vget intrinsic, the __riscv_vget_i8m1(src, 1)
-   form is used. The compiler recognizes that the parameter index is signed
-   int, which is inconsistent with size_t, so the index is converted to
-   size_t type in order to get correct hash value. vint8m2_t
-   __riscv_vset(vint8m2_t dest, size_t index, vint8m1_t value); The reason
-   is the same as above. */
-  if ((instance.base == bases::vget && (i == (len - 1)))
- || ((instance.base == bases::vset
-   || instance.shape == shapes::crypto_vi)
- && (i == (len - 2
-   argument_types.safe_push (size_type_node);
-  /* Vector fixed-point arithmetic instructions requiring argument vxrm.
-For example: vuint32m4_t __riscv_vaaddu(vuint32m4_t vs2,
-  vuint32m4_t vs1, unsigned int vxrm, size_t vl); The user calls vaaddu
-  intrinsic in the form of __riscv_vaaddu(vs2, vs1, 2, vl). The compiler
-  recognizes that the parameter vxrm is a signed int, which is inconsistent
-  with the parameter unsigned int vxrm declared by intrinsic, so the
-  parameter vxrm is converted to an unsigned int type in order to get
-  correct hash value.
-
-  Vector Floating-Point Instructions requiring argument frm.
-  DEF_RVV_FUNCTION (vfadd, alu, full_preds, f_vvv_ops)
-  DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvv_ops)
-  Taking vfadd as an example, theoretically we can add base or shape to the
-  hash value to distinguish whether the frm parameter is required.
-  vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, float32_t rs1, size_t vl);
-  vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, vfloat32m1_t vs1, unsigned 
int
-  frm, size_t vl);
-
-   However, the current registration mechanism of overloaded intinsic 
for gcc
-  limits the intrinsic obtained by entering the hook to always be vfadd, 
not
-  vfadd_frm. Therefore, the correct 

Re: [PATCH 1/2] x86: Add no_callee_saved_registers function attribute

2024-01-21 Thread Hongtao Liu
On Sat, Jan 20, 2024 at 10:30 PM H.J. Lu  wrote:
>
> When an interrupt handler is implemented by an assembly stub which does:
>
> 1. Save all registers.
> 2. Call a C function.
> 3. Restore all registers.
> 4. Return from interrupt.
>
> it is completely unnecessary to save and restore any registers in the C
> function called by the assembly stub, even if they would normally be
> callee-saved.
>
> Add no_callee_saved_registers function attribute, which is complementary
> to no_caller_saved_registers function attribute, to mark a function which
> doesn't have any callee-saved registers.  Such a function won't save and
> restore any registers.  Classify function call-saved register handling
> type with:
>
> 1. Default call-saved registers.
> 2. No caller-saved registers with no_caller_saved_registers attribute.
> 3. No callee-saved registers with no_callee_saved_registers attribute.
>
> Disallow sibcall if callee is a no_callee_saved_registers function
> and caller isn't a no_callee_saved_registers function.  Otherwise,
> callee-saved registers won't be preserved.
>
> After a no_callee_saved_registers function is called, all registers may
> be clobbered.  If the calling function isn't a no_callee_saved_registers
> function, we need to preserve all registers which aren't used by function
> calls.
>
> gcc/
>
> PR target/103503
> PR target/113312
> * config/i386/i386-expand.cc (ix86_expand_call): Set
> call_no_callee_saved_registers to true when calling function
> with no_callee_saved_registers attribute.  Replace
> no_caller_saved_registers check with call_saved_registers check.
> * config/i386/i386-options.cc (ix86_set_func_type): Set
> call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
> noreturn function.  Disallow no_callee_saved_registers with
> interrupt or no_caller_saved_registers attributes together.
> (ix86_set_current_function): Replace no_caller_saved_registers
> check with call_saved_registers check.
> (ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
> (ix86_handle_call_saved_registers_attribute): This.
> (ix86_gnu_attributes): Add
> ix86_handle_call_saved_registers_attribute.
> * config/i386/i386.cc (ix86_conditional_register_usage): Replace
> no_caller_saved_registers check with call_saved_registers check.
> (ix86_function_ok_for_sibcall): Don't allow callee with
> no_callee_saved_registers attribute when the calling function
> has callee-saved registers.
> (ix86_comp_type_attributes): Also check
> no_callee_saved_registers.
> (ix86_epilogue_uses): Replace no_caller_saved_registers check
> with call_saved_registers check.
> (ix86_hard_regno_scratch_ok): Likewise.
> (ix86_save_reg): Replace no_caller_saved_registers check with
> call_saved_registers check.  Don't save any registers for
> TYPE_NO_CALLEE_SAVED_REGISTERS.  Save all registers with
> TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
> no_callee_saved_registers attribute is called.
> (find_drap_reg): Replace no_caller_saved_registers check with
> call_saved_registers check.
> * config/i386/i386.h (call_saved_registers_type): New enum.
> (machine_function): Replace no_caller_saved_registers with
> call_saved_registers.  Add call_no_callee_saved_registers.
> * doc/extend.texi: Document no_callee_saved_registers attribute.
>
> gcc/testsuite/
>
> PR target/103503
> PR target/113312
> * gcc.dg/torture/no-callee-saved-run-1a.c: New file.
> * gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
> * gcc.target/i386/no-callee-saved-1.c: Likewise.
> * gcc.target/i386/no-callee-saved-2.c: Likewise.
> * gcc.target/i386/no-callee-saved-3.c: Likewise.
> * gcc.target/i386/no-callee-saved-4.c: Likewise.
> * gcc.target/i386/no-callee-saved-5.c: Likewise.
> * gcc.target/i386/no-callee-saved-6.c: Likewise.
> * gcc.target/i386/no-callee-saved-7.c: Likewise.
> * gcc.target/i386/no-callee-saved-8.c: Likewise.
> * gcc.target/i386/no-callee-saved-9.c: Likewise.
> * gcc.target/i386/no-callee-saved-10.c: Likewise.
> * gcc.target/i386/no-callee-saved-11.c: Likewise.
> * gcc.target/i386/no-callee-saved-12.c: Likewise.
> * gcc.target/i386/no-callee-saved-13.c: Likewise.
> * gcc.target/i386/no-callee-saved-14.c: Likewise.
> * gcc.target/i386/no-callee-saved-15.c: Likewise.
> * gcc.target/i386/no-callee-saved-16.c: Likewise.
> * gcc.target/i386/no-callee-saved-17.c: Likewise.
> * gcc.target/i386/no-callee-saved-18.c: Likewise.
> ---
>  gcc/config/i386/i386-expand.cc| 72 ---
>  gcc/config/i386/i386-options.cc   | 49 +
>  

Re: [PATCH] RISC-V: Fix vfirst/vmsbf/vmsif/vmsof ratio attributes

2024-01-21 Thread Kito Cheng
LGTM :)

On Mon, Jan 22, 2024 at 10:49 AM Juzhe-Zhong  wrote:
>
> vfirst/vmsbf/vmsif/vmsof instructions are supposed to demand ratio instead of 
> demanding sew_lmul.
> But my previous typo makes VSETVL PASS miss honor the risc-v v spec.
>
> Consider this following simple case:
>
> int foo4 (void * in, void * out)
> {
>   vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
>   v = __riscv_vadd_vv_i32m1 (v, v, 4);
>   vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32(v);
>   mask = __riscv_vmsof_m_b32(mask, 4);
>   return __riscv_vfirst_m_b32(mask, 4);
> }
>
> Before this patch:
>
> foo4:
> vsetivlizero,4,e32,m1,ta,ma
> vle32.v v1,0(a0)
> vadd.vv v1,v1,v1
> vsetvli zero,zero,e8,mf4,ta,ma> redundant.
> vmsof.m v2,v1
> vfirst.ma0,v2
> ret
>
> After this patch:
>
> foo4:
> vsetivlizero,4,e32,m1,ta,ma
> vle32.v v1,0(a0)
> vadd.vv v1,v1,v1
> vmsof.m v2,v1
> vfirst.ma0,v2
> ret
>
> Confirm RVV spec and Clang, this patch makes VSETVL PASS match the correct 
> behavior.
>
> Tested on both RV32/RV64, no regression.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Fix vfirst/vmsbf/vmsof ratio attributes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/attribute-1.c: New test.
>
> ---
>  gcc/config/riscv/vector.md|  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/attribute-1.c | 47 +++
>  2 files changed, 48 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-1.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index cfc54ae5eac..307d9a8c952 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -433,7 +433,7 @@
>   vialu,vshift,vicmp,vimul,vidiv,vsalu,\
>   vext,viwalu,viwmul,vicalu,vnshift,\
>   vimuladd,vimerge,vaalu,vsmul,vsshift,\
> - vnclip,viminmax,viwmuladd,vmffs,vmsfs,\
> + vnclip,viminmax,viwmuladd,\
>   vmiota,vmidx,vfalu,vfmul,vfminmax,vfdiv,\
>   vfwalu,vfwmul,vfsqrt,vfrecp,vfsgnj,vfcmp,\
>   vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-1.c
> new file mode 100644
> index 000..28dcf986bac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-1.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +int
> +foo (void *in, void *out)
> +{
> +  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
> +  v = __riscv_vadd_vv_i32m1 (v, v, 4);
> +  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32 (v);
> +  return __riscv_vfirst_m_b32 (mask, 4);
> +}
> +
> +int
> +foo2 (void *in, void *out)
> +{
> +  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
> +  v = __riscv_vadd_vv_i32m1 (v, v, 4);
> +  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32 (v);
> +  mask = __riscv_vmsbf_m_b32 (mask, 4);
> +  return __riscv_vfirst_m_b32 (mask, 4);
> +}
> +
> +int
> +foo3 (void *in, void *out)
> +{
> +  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
> +  v = __riscv_vadd_vv_i32m1 (v, v, 4);
> +  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32 (v);
> +  mask = __riscv_vmsif_m_b32 (mask, 4);
> +  return __riscv_vfirst_m_b32 (mask, 4);
> +}
> +
> +int
> +foo4 (void *in, void *out)
> +{
> +  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
> +  v = __riscv_vadd_vv_i32m1 (v, v, 4);
> +  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32 (v);
> +  mask = __riscv_vmsof_m_b32 (mask, 4);
> +  return __riscv_vfirst_m_b32 (mask, 4);
> +}
> +
> +/* { dg-final { scan-assembler-times 
> {vsetivli\s+zero,\s*4,\s*e32,\s*m1,\s*t[au],\s*m[au]} 4 } } */
> +/* { dg-final { scan-assembler-times {vsetivli} 4 } } */
> +/* { dg-final { scan-assembler-not {vsetvli} } } */
> --
> 2.36.3
>


[PATCH] RISC-V: Fix vfirst/vmsbf/vmsif/vmsof ratio attributes

2024-01-21 Thread Juzhe-Zhong
vfirst/vmsbf/vmsif/vmsof instructions are supposed to demand ratio instead of 
demanding sew_lmul.
But my previous typo makes VSETVL PASS miss honor the risc-v v spec.

Consider this following simple case:

int foo4 (void * in, void * out)
{
  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
  v = __riscv_vadd_vv_i32m1 (v, v, 4);
  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32(v);
  mask = __riscv_vmsof_m_b32(mask, 4);
  return __riscv_vfirst_m_b32(mask, 4);
}

Before this patch:

foo4:
vsetivlizero,4,e32,m1,ta,ma
vle32.v v1,0(a0)
vadd.vv v1,v1,v1
vsetvli zero,zero,e8,mf4,ta,ma> redundant.
vmsof.m v2,v1
vfirst.ma0,v2
ret

After this patch:

foo4:
vsetivlizero,4,e32,m1,ta,ma
vle32.v v1,0(a0)
vadd.vv v1,v1,v1
vmsof.m v2,v1
vfirst.ma0,v2
ret

Confirm RVV spec and Clang, this patch makes VSETVL PASS match the correct 
behavior.

Tested on both RV32/RV64, no regression.

gcc/ChangeLog:

* config/riscv/vector.md: Fix vfirst/vmsbf/vmsof ratio attributes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/attribute-1.c: New test.

---
 gcc/config/riscv/vector.md|  2 +-
 .../gcc.target/riscv/rvv/vsetvl/attribute-1.c | 47 +++
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-1.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index cfc54ae5eac..307d9a8c952 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -433,7 +433,7 @@
  vialu,vshift,vicmp,vimul,vidiv,vsalu,\
  vext,viwalu,viwmul,vicalu,vnshift,\
  vimuladd,vimerge,vaalu,vsmul,vsshift,\
- vnclip,viminmax,viwmuladd,vmffs,vmsfs,\
+ vnclip,viminmax,viwmuladd,\
  vmiota,vmidx,vfalu,vfmul,vfminmax,vfdiv,\
  vfwalu,vfwmul,vfsqrt,vfrecp,vfsgnj,vfcmp,\
  vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-1.c
new file mode 100644
index 000..28dcf986bac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/attribute-1.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+int
+foo (void *in, void *out)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
+  v = __riscv_vadd_vv_i32m1 (v, v, 4);
+  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32 (v);
+  return __riscv_vfirst_m_b32 (mask, 4);
+}
+
+int
+foo2 (void *in, void *out)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
+  v = __riscv_vadd_vv_i32m1 (v, v, 4);
+  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32 (v);
+  mask = __riscv_vmsbf_m_b32 (mask, 4);
+  return __riscv_vfirst_m_b32 (mask, 4);
+}
+
+int
+foo3 (void *in, void *out)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
+  v = __riscv_vadd_vv_i32m1 (v, v, 4);
+  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32 (v);
+  mask = __riscv_vmsif_m_b32 (mask, 4);
+  return __riscv_vfirst_m_b32 (mask, 4);
+}
+
+int
+foo4 (void *in, void *out)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
+  v = __riscv_vadd_vv_i32m1 (v, v, 4);
+  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32 (v);
+  mask = __riscv_vmsof_m_b32 (mask, 4);
+  return __riscv_vfirst_m_b32 (mask, 4);
+}
+
+/* { dg-final { scan-assembler-times 
{vsetivli\s+zero,\s*4,\s*e32,\s*m1,\s*t[au],\s*m[au]} 4 } } */
+/* { dg-final { scan-assembler-times {vsetivli} 4 } } */
+/* { dg-final { scan-assembler-not {vsetvli} } } */
-- 
2.36.3



[PATCH] i386: Modify testcases failed under -DDEBUG

2024-01-21 Thread Haochen Jiang
Hi all,

Recently, I happened to run i386.exp under -DDEBUG and found some fail.

This patch aims to fix that. Ok for trunk?

Thx,
Haochen

gcc/testsuite/ChangeLog:

* gcc.target/i386/adx-check.h: Include stdio.h when DEBUG
is defined.
* gcc.target/i386/avx512fp16-vscalefph-1b.c: Do not define
DEBUG.
* gcc.target/i386/avx512fp16vl-vaddph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcmpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vdivph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfpclassph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vgetexpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vgetmantph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmaxph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vminph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmulph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vrcpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vreduceph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vscalefph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsqrtph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsubph-1b.c: Ditto.
* gcc.target/i386/readeflags-1.c: Include stdio.h when DEBUG
is defined.
* gcc.target/i386/rtm-check.h: Ditto.
* gcc.target/i386/sha-check.h: Ditto.
* gcc.target/i386/writeeflags-1.c: Ditto.
---
 gcc/testsuite/gcc.target/i386/adx-check.h   | 3 +++
 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c | 3 ---
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c   | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c   | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c| 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c   | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c | 1 -
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c  | 1 -
 gcc/testsuite/gcc.target/i386/readeflags-1.c| 3 +++
 gcc/testsuite/gcc.target/i386/rtm-check.h   | 3 +++
 gcc/testsuite/gcc.target/i386/sha-check.h   | 3 +++
 gcc/testsuite/gcc.target/i386/writeeflags-1.c   | 3 +++
 22 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/adx-check.h 
b/gcc/testsuite/gcc.target/i386/adx-check.h
index cfed1a38483..45435b91d0e 100644
--- a/gcc/testsuite/gcc.target/i386/adx-check.h
+++ b/gcc/testsuite/gcc.target/i386/adx-check.h
@@ -1,5 +1,8 @@
 #include 
 #include "cpuid.h"
+#ifdef DEBUG
+#include 
+#endif
 
 static void adx_test (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
index 7c7288d6eb3..0ba9ec57f37 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
@@ -1,9 +1,6 @@
 /* { dg-do run { target avx512fp16 } } */
 /* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
 
-
-#define DEBUG
-
 #define AVX512FP16
 #include "avx512fp16-helper.h"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
index fcf6a9058f5..1db7c565262 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
@@ -1,7 +1,6 @@
 /* { dg-do run { target avx512fp16 } } */
 /* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
 
-#define DEBUG
 #define AVX512VL
 #define AVX512F_LEN 256  
 #define AVX512F_LEN_HALF 128 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
index c201a9258bf..bbd366a5d29 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
@@ -1,7 +1,6 @@
 /* { dg-do run { target avx512fp16 } } */
 /* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
 
-#define DEBUG
 #define AVX512VL
 #define AVX512F_LEN 256  
 #define AVX512F_LEN_HALF 128 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c 

Re: [committed] Adjust expectations for pr59533-1.c

2024-01-21 Thread Oleg Endo


On Sun, 2024-01-21 at 19:14 -0700, Jeff Law wrote:
> The change for pr111267 twiddled code generation for sh/pr59533-1.c
> 
> We end up eliminating two comparisons, but require two shll instructions 
> to do so.  And in a couple places we're using an addc sequence rather 
> than a subc sequence.   This patch adjusts the expected codegen for the 
> test as all those are either a wash or a
> 
> The fwprop change does cause some code regressions on the same test. 
> I'll file a distinct but for that issue.
> 
> Pushed to the trunk,
> 
> Jeff

Thanks for keeping an eye on this.

Note that on SH4 the comparison insns are of MT type, which increases
likelihood of parallel execution.  So it's better to use those e.g. to shift
out the MSB into T bit than shll.

Cheers,
Oleg


Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference

2024-01-21 Thread Jeff Law




On 1/15/24 06:34, Richard Biener wrote:

When the x86 backend generates code for cpymem with the rep_8byte
strathegy for the 8 byte aligned main rep movq it needs to compute
an adjusted pointer to the source after doing a prologue aligning
the destination.  It computes that via

   src_ptr + (dest_ptr - orig_dest_ptr)

which is perfectly fine.  On RTL this is then

 8: r134:DI=const(`g'+0x44)
 9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
   REG_UNUSED flags:CC
56: r129:DI=const(`g'+0x4c)
57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
   REG_UNUSED flags:CC
   REG_EQUAL const(`g'+0x4c)&0xfff8
58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
   REG_DEAD r134:DI
   REG_UNUSED flags:CC
   REG_EQUAL const(`g'+0x44)-r129:DI
59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
   REG_DEAD r133:DI
   REG_UNUSED flags:CC

but as written find_base_term happily picks the first candidate
it finds for the MINUS which means it picks const(`g') rather
than the correct frame:DI.  This way find_base_term (but also
the unfixed find_base_value used by init_alias_analysis to
initialize REG_BASE_VALUE) performs pointer analysis isn't
sound.  The following restricts the handling of multi-operand
operations to the case we know only one can be a pointer.

This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
RTL PRE (I've opened PR113395 for this).  A more drastic patch,
removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
regressing (so testsuite coverage is bad).  I've looked at
gcc.dg/tree-ssa tests and mostly scheduling changes are present,
the cc1plus .text size is only 230 bytes worse.  With the this
less drastic patch below most scheduling changes are gone.

x86_64 might not the very best target to test for impact, but
test coverage on other targets is unlikely to be very much better.

Bootstrapped and tested on x86_64-unknown-linux-gnu (together
with 2/2).  Jeff, can you maybe throw this on your tester?
Jakub, you did the PR64025 fix which was for a similar issue.

OK for trunk?
Pending testing, yes.  Though I'd be a bit surprised if anything pops on 
this.  I just doubt we've got much coverage in this space.  I'll pass 
along the cross results as soon as they're done.


jeff


[committed] Adjust expectations for pr59533-1.c

2024-01-21 Thread Jeff Law

The change for pr111267 twiddled code generation for sh/pr59533-1.c

We end up eliminating two comparisons, but require two shll instructions 
to do so.  And in a couple places we're using an addc sequence rather 
than a subc sequence.   This patch adjusts the expected codegen for the 
test as all those are either a wash or a


The fwprop change does cause some code regressions on the same test. 
I'll file a distinct but for that issue.


Pushed to the trunk,

Jeffcommit 7e16f819ff413c48702f9087b62eaac39a060a14
Author: Jeff Law 
Date:   Sun Jan 21 19:12:21 2024 -0700

[committed] Adjust expectations for pr59533-1.c

The change for pr111267 twiddled code generation for sh/pr59533-1.c

We end up eliminating two comparisons, but require two shll instructions to 
do
so.  And in a couple places we're using an addc sequence rather than a subc
sequence.   This patch adjusts the expected codegen for the test as all 
those
are either a wash or a

The fwprop change does cause some code regressions on the same test.  I'll 
file
a distinct but for that issue.

gcc/testsuite
* gcc.target/sh/pr59533-1.c: Adjust expected output.

diff --git a/gcc/testsuite/gcc.target/sh/pr59533-1.c 
b/gcc/testsuite/gcc.target/sh/pr59533-1.c
index b0469859df5..859b8e2d24c 100644
--- a/gcc/testsuite/gcc.target/sh/pr59533-1.c
+++ b/gcc/testsuite/gcc.target/sh/pr59533-1.c
@@ -2,15 +2,15 @@
 /* { dg-do compile }  */
 /* { dg-options "-O1" } */
 
-/* { dg-final { scan-assembler-times "shll" 1 } }  */
+/* { dg-final { scan-assembler-times "shll" 3 } }  */
 /* { dg-final { scan-assembler-times "movt" 5 } }  */
 /* { dg-final { scan-assembler-times "rotcl" 1 } }  */
 /* { dg-final { scan-assembler-times "and" 3 } }  */
 /* { dg-final { scan-assembler-times "extu.b" 5 } }  */
 
-/* { dg-final { scan-assembler-times "cmp/pz" 27 { target { ! sh2a } } } }  */
-/* { dg-final { scan-assembler-times "addc" 4 { target { ! sh2a } } } }  */
-/* { dg-final { scan-assembler-times "subc" 16 { target { ! sh2a } } } }  */
+/* { dg-final { scan-assembler-times "cmp/pz" 25 { target { ! sh2a } } } }  */
+/* { dg-final { scan-assembler-times "addc" 6 { target { ! sh2a } } } }  */
+/* { dg-final { scan-assembler-times "subc" 14 { target { ! sh2a } } } }  */
 
 /* { dg-final { scan-assembler-times "cmp/pz" 25 { target { sh2a } } } }  */
 /* { dg-final { scan-assembler-times "addc" 6 { target { sh2a } } } }  */


Re: [Committed] RISC-V: Suppress warning

2024-01-21 Thread Andreas Schwab
On Jan 21 2024, Jeff Law wrote:

> Yea.  The biggest problem with ATTRIBUTE_UNUSED is that it's a "may be
> unused" and thus if the code changes it's sometimes left on an parameter
> incorrectly.  C++ allows us to specify a "is definitely unused" concept by
> dropping the parameter's name, but leaving its type.

It's problematic if the parameter is only used conditionally on a macro
definition (which is less common with target hooks now being function
calls).

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH v3 2/2] RISC-V: Fix XCValu test

2024-01-21 Thread Jeff Law




On 1/16/24 10:13, Mary Bennett wrote:

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-alu-fail-compile.c: Change warning to error.
AFAICT this is independent of the other patches and fixes a clear 
testsuite issue.  I've pushed it to the trunk.


Thanks,
Jeff


Re: [PATCH v2 1/1] RISC-V: Add support for XCVbitmanip extension in CV32E40P

2024-01-21 Thread Jeff Law




On 1/16/24 09:25, Mary Bennett wrote:

Spec: 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
   Mary Bennett 
   Nandni Jamnadas 
   Pietra Ferreira 
   Charlie Keaney
   Jessica Mills
   Craig Blackmore 
   Simon Cook 
   Jeremy Bennett 
   Helene Chelin 

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add XCVbitmanip.
* config/riscv/constraints.md: Likewise.
* config/riscv/corev.def: Likewise.
* config/riscv/corev.md: Likewise.
* config/riscv/predicates.md: Likewise.
* config/riscv/riscv-builtins.cc (AVAIL): Likewise.
* config/riscv/riscv-ftypes.def: Likewise.
* config/riscv/riscv.opt: Likewise.
* doc/extend.texi: Add XCVbitmanip builtin documentation.
* doc/sourcebuild.texi: Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-bitmanip-compile-bclr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bclrr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bitrev.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bset.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-bsetr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-clb.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-cnt.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extract.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extractr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extractu.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-extractur.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-ff1.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-fl1.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-insert.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-insertr.c: New test.
* gcc.target/riscv/cv-bitmanip-compile-ror.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-bclr.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-bitrev.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-bset.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-extract.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-extractu.c: New test.
* gcc.target/riscv/cv-bitmanip-fail-compile-insert.c: New test.
* lib/target-supports.exp: Add proc for the XCVbitmanip extension.
---






diff --git a/gcc/config/riscv/corev.md b/gcc/config/riscv/corev.md
index adad2409fb6..9afd69ec080 100644
--- a/gcc/config/riscv/corev.md
+++ b/gcc/config/riscv/corev.md
@@ -706,3 +710,163 @@
  
[(set_attr "type" "load")

(set_attr "mode" "SI")])
+
+;; XCVBITMANIP builtins
+
+(define_insn "riscv_cv_bitmanip_extract"
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
+(sign_extend:SI (and:SI (rotate:SI (ashift:SI (const_int 1)
+(ashiftrt:HI
+(match_operand:HI 2 "bit_extract_operand" 
"CV_bit_si10,r")
+(const_int 5)))
+(and:HI (match_dup 2)
+(const_int 31)))
+   (match_operand:SI 1 "register_operand" "r,r"]
So it looks like the you've got an ashift:SI where the shift amount of 
the result of an ashiftrt:HI -- note the difference in the modes.  That 
doesn't seem right.  It also looks like the indentation on the RTL 
pattern is wrong -- that ashiftrt is an argument of the ashift:SI, so it 
should line up with the (const_int 1) argument.


It looks like the same issue is repeated on the extractu pattern.  Which 
leads to a second comment.  I think the two patterns can be combined. 
There'a an "any_extend" iterator that you could use in place of the 
sign_extend/zero_extend.  And there's a "u" iterator that you can use in 
the output template to conditionally emit a "u".


The insertion, bclr, and bset patterns seem to be goofy on their modes 
in a similar way.




+
+(define_insn "riscv_cv_bitmanip_ff1"
+  [(set (match_operand:QI 0 "register_operand" "=r")
+(if_then_else (eq:SI (match_operand:SI 1 "register_operand" "r") 
(const_int 0))
+  (const_int 32)
+  (minus:SI (ffs:SI (match_dup 1))
+(const_int 1]
Presumably this is an if-then-else to deal with the case when the input 
value is zero?  Is there any way to use ctz/clz in combination with 
C[LT]Z_DEFINED_VALUE_AT_ZERO instead of ffs with a conditional?


More generally, can we avoid the UNSPECs in here by defining these 
operations in terms of ffs, clz, ctz?




+
+(define_insn "riscv_cv_bitmanip_cnt"
+  [(set (match_operand:QI 0 "register_operand" "=r")
+(truncate:QI (popcount:SI (match_operand:SI 1 "register_operand" 
"r"]
+
+  "TARGET_XCVBITMANIP && !TARGET_64BIT"
+  "cv.cnt\t%0,%1"
+  [(set_attr "type" "bitmanip")
+  (set_attr "mode" "SI")])
Interesting truncation 

Re: [PATCH] Avoid ICE with m68k-elf -malign-int and libcalls

2024-01-21 Thread Jeff Law




On 1/16/24 09:55, Mikael Pettersson wrote:

On Thu, 4 Jan 2024 14:39:23 -0700, Jeff Law wrote:

On 1/4/24 02:23, Mikael Pettersson wrote:

emit_library_call_value_1 calls emit_push_insn with NULL_TREE
for TYPE.  Sometimes emit_push_insn needs to assign a temp with
that TYPE, which causes a segfault.

Fixed by computing the TYPE from MODE when needed.

Original patch by Thorsten Otto.

gcc/

2024-01-03  Thorsten Otto  
  Mikael Pettersson  

PR target/82420
PR target/111279
* expr.cc (emit_push_insn): If TYPE is NULL compute it
from MODE before calling assign_temp.

gcc/teststuite/

2024-01-03  Mikael Pettersson  

PR target/82420
* gcc.target/m68k/pr82420.c: New test.

This really needs to happen in the two call paths which pass in
NULL_TREE for the type.  Note how the type is used to determine padding
earlier in emit_push_insn.  That would also make the code more
consistent with the comment before emit_push_insn which implies that
both MODE and TYPE are valid.


Additionally you should bootstrap and regression test this patch on at
least one target.


Updated as requested, and bootstrapped and tested on
{x86_64,aarch64,m68k}-linux-gnu without regressions.

gcc/

2024-01-16  Thorsten Otto  
 Mikael Pettersson  

PR target/82420
PR target/111279
* calls.cc (emit_library_call_value_1): Pass valid TYPE
to emit_push_insn.
* expr.cc (emit_push_insn): Likewise.

gcc/teststuite/

2024-01-16  Mikael Pettersson  

PR target/82420
* gcc.target/m68k/pr82420.c: New test.

I made minor adjustments to the commit message and pushed this to the trunk.

jeff


Re: [PATCH] sra: Disqualify bases of operands of asm gotos

2024-01-21 Thread Jeff Law




On 1/17/24 11:21, Martin Jambor wrote:

Hi,

PR 110422 shows that SRA can ICE assuming there is a single edge
outgoing from a block terminated with an asm goto.  We need that for
BB-terminating statements so that any adjustments they make to the
aggregates can be copied over to their replacements.  Because we can't
have that after ASM gotos, we need to punt.

Bootstrapped and tested on x86_64-linux, OK for master?  It will need
some tweaking for release branches, is it in principle OK for them too
(after testing)?

Thanks,

Martin


gcc/ChangeLog:

2024-01-17  Martin Jambor  

PR tree-optimization/110422
* tree-sra.cc (scan_function): Disqualify bases of operands of asm
gotos.

gcc/testsuite/ChangeLog:

2024-01-17  Martin Jambor  

PR tree-optimization/110422
* gcc.dg/torture/pr110422.c: New test.
OK.  Also OK for the other release branches with necessary adjustments 
after testing.


jeff


Re: [Committed] RISC-V: Suppress warning

2024-01-21 Thread Jeff Law




On 1/19/24 18:18, 钟居哲 wrote:

OK. I saw the other arguments there:

                             tree fntype ATTRIBUTE_UNUSED,
                             rtx libname ATTRIBUTE_UNUSED,

So I leverage these and add ATTRIBUTE_UNUSED to 'fndecl'

Maybe it's better remove all arguments for riscv_init_cumulative_args 
which are unused as you suggested.
Yea.  The biggest problem with ATTRIBUTE_UNUSED is that it's a "may be 
unused" and thus if the code changes it's sometimes left on an parameter 
incorrectly.  C++ allows us to specify a "is definitely unused" concept 
by dropping the parameter's name, but leaving its type.


I just pushed the obvious patch to fixup that function signature.  It 
would be a nice easy task for someone to go through all of riscv*.cc and 
clean those up after gcc-14 branches.


jeff


[committed] [NFC] Fix riscv_init_cumulative_args for unused arguments

2024-01-21 Thread Jeff Law


The signature was still using ATTRIBUTE_UNUSED and actually marked one 
of the used arguments with ATTRIBUTE_UNUSED.


This patch drops the decorations and instead remove the name of 
arguments which are actually unused which is the preferred way to handle 
this now when we can.


Bootstrapped.  I didn't have test results on the platform where I 
bootstrapped, so no results to compare against.  Given its NFC, I think 
we're OK without the regression results.


Pushed to the trunk.
jeffdiff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1f9546f4d3e..3ba45ffaa74 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4873,11 +4873,7 @@ riscv_pass_fpr_pair (machine_mode mode, unsigned regno1,
For a library call, FNTYPE is 0.  */
 
 void
-riscv_init_cumulative_args (CUMULATIVE_ARGS *cum,
-   tree fntype ATTRIBUTE_UNUSED,
-   rtx libname ATTRIBUTE_UNUSED,
-   tree fndecl ATTRIBUTE_UNUSED,
-   int caller ATTRIBUTE_UNUSED)
+riscv_init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype, rtx, tree, int)
 {
   memset (cum, 0, sizeof (*cum));
 


[committed] libstdc++: Fix std::format for floating-point chrono::time_point [PR113500]

2024-01-21 Thread Jonathan Wakely
Tested aarch64-linux. Pushed to trunk.

Backport to gcc-13 would be good too, I think.

-- >8 --

Currently trying to use std::format with certain specializations of
std::chrono::time_point is ill-formed, due to one member function of the
__formatter_chrono type which tries to write a time_point to an ostream.
For sys_time or sys_time with a period greater than days
there is no operator<< that can be used.

That operator<< is only needed when using an empty chrono-specs in the
format string, like "{}", but the ill-formed expression gives an error
even if not actually used. This means it's not possible to format some
other specializations of chrono::time_point, even when using a non-empty
chrono-specs.

This fixes it by avoiding using 'os << t' for all chrono::time_point
specializations, and instead using std::format("{:L%F %T}", t). So that
we continue to reject std::format("{}", sys_time{1.0s}) a check for
empty chrono-specs is added to the formatter, C>
specialization.

While testing this I noticed that the output for %S with a
floating-point duration was incorrect, as the subseconds part was being
appended to the seconds without a decimal point, and without the correct
number of leading zeros.

libstdc++-v3/ChangeLog:

PR libstdc++/113500
* include/bits/chrono_io.h (__formatter_chrono::_M_S): Fix
printing of subseconds with floating-point rep.
(__formatter_chrono::_M_format_to_ostream): Do not write
time_point specializations directly to the ostream.
(formatter, C>::parse): Do not allow an
empty chrono-spec if the type fails to meet the constraints for
writing to an ostream with operator<<.
* testsuite/std/time/clock/file/io.cc: Check formatting
non-integral times with empty chrono-specs.
* testsuite/std/time/clock/gps/io.cc: Likewise.
* testsuite/std/time/clock/utc/io.cc: Likewise.
* testsuite/std/time/hh_mm_ss/io.cc: Likewise.
---
 libstdc++-v3/include/bits/chrono_io.h | 71 +--
 .../testsuite/std/time/clock/file/io.cc   | 17 +
 .../testsuite/std/time/clock/gps/io.cc| 27 +++
 .../testsuite/std/time/clock/utc/io.cc|  4 ++
 .../testsuite/std/time/hh_mm_ss/io.cc | 28 +++-
 5 files changed, 124 insertions(+), 23 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 7b5876b24e6..82f2d39ec44 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -681,6 +681,7 @@ namespace __format
return __fc.locale();
}
 
+  // Format for empty chrono-specs, e.g. "{}" (C++20 [time.format] p6).
   // TODO: consider moving body of every operator<< into this function
   // and use std::format("{}", t) to implement those operators. That
   // would avoid std::format("{}", t) calling operator<< which calls
@@ -702,6 +703,22 @@ namespace __format
 
  if constexpr (__is_specialization_of<_Tp, __utc_leap_second>)
__os << __t._M_date << ' ' << __t._M_time;
+ else if constexpr (chrono::__is_time_point_v<_Tp>)
+   {
+ // Need to be careful here because not all specializations
+ // of chrono::sys_time can be written to an ostream.
+ // For the specializations of time_point that can be
+ // formatted with an empty chrono-specs, either it's a
+ // sys_time with period greater or equal to days:
+ if constexpr (is_convertible_v<_Tp, chrono::sys_days>)
+   __os << _S_date(__t);
+ else // Or it's formatted as "{:L%F %T}":
+   {
+ auto __days = chrono::floor(__t);
+ __os << chrono::year_month_day(__days) << ' '
+<< chrono::hh_mm_ss(__t - __days);
+   }
+   }
  else
{
  if constexpr (chrono::__is_duration_v<_Tp>)
@@ -1122,39 +1139,43 @@ namespace __format
   'S', 'O');
}
 
- __out = __format::__write(std::move(__out),
-   _S_two_digits(__hms.seconds().count()));
- if constexpr (__hms.fractional_width != 0)
+ if constexpr (__hms.fractional_width == 0)
+   __out = __format::__write(std::move(__out),
+ _S_two_digits(__hms.seconds().count()));
+ else
{
  locale __loc = _M_locale(__ctx);
+ auto __s = __hms.seconds();
  auto __ss = __hms.subseconds();
  using rep = typename decltype(__ss)::rep;
  if constexpr (is_floating_point_v)
{
+ chrono::duration __fs = __s + __ss;
  __out = std::format_to(std::move(__out), __loc,
-  

[committed] libstdc++: Fix std::chrono::file_clock conversions for low-precision times

2024-01-21 Thread Jonathan Wakely
Tested aarch64-linux. Pushed to trunk. Backport needed to gcc-13 too.

-- >8 --

THe std::chrono::file_clock conversions were not using common_type and
so failed to compile when converting anything that should have increased
precision after arithmetic with a std::chrono::seconds value.

libstdc++-v3/ChangeLog:

* include/bits/chrono.h (__file_clock::from_sys)
(__file_clock::to_sys, __file_clock::_S_from_sys)
(__file_clock::_S_to_sys): Use common_type for return type.
* testsuite/std/time/clock/file/members.cc: Check round trip
conversion for time with lower precision that seconds.
---
 libstdc++-v3/include/bits/chrono.h | 14 --
 .../testsuite/std/time/clock/file/members.cc   |  9 +
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono.h 
b/libstdc++-v3/include/bits/chrono.h
index 6f4fe55f9c0..0773867da71 100644
--- a/libstdc++-v3/include/bits/chrono.h
+++ b/libstdc++-v3/include/bits/chrono.h
@@ -1453,14 +1453,14 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
 #if __cplusplus > 201703L
   template
static
-   chrono::file_time<_Dur>
+   chrono::file_time>
from_sys(const chrono::sys_time<_Dur>& __t) noexcept
{ return _S_from_sys(__t); }
 
   // For internal use only
   template
static
-   chrono::sys_time<_Dur>
+   chrono::sys_time>
to_sys(const chrono::file_time<_Dur>& __t) noexcept
{ return _S_to_sys(__t); }
 #endif // C++20
@@ -1477,20 +1477,22 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
   // For internal use only
   template
static
-   chrono::time_point<__file_clock, _Dur>
+   chrono::time_point<__file_clock, common_type_t<_Dur, chrono::seconds>>
_S_from_sys(const chrono::time_point<__sys_clock, _Dur>& __t) noexcept
{
- using __file_time = chrono::time_point<__file_clock, _Dur>;
+ using _CDur = common_type_t<_Dur, chrono::seconds>;
+ using __file_time = chrono::time_point<__file_clock, _CDur>;
  return __file_time{__t.time_since_epoch()} - _S_epoch_diff;
}
 
   // For internal use only
   template
static
-   chrono::time_point<__sys_clock, _Dur>
+   chrono::time_point<__sys_clock, common_type_t<_Dur, chrono::seconds>>
_S_to_sys(const chrono::time_point<__file_clock, _Dur>& __t) noexcept
{
- using __sys_time = chrono::time_point<__sys_clock, _Dur>;
+ using _CDur = common_type_t<_Dur, chrono::seconds>;
+ using __sys_time = chrono::time_point<__sys_clock, _CDur>;
  return __sys_time{__t.time_since_epoch()} + _S_epoch_diff;
}
 };
diff --git a/libstdc++-v3/testsuite/std/time/clock/file/members.cc 
b/libstdc++-v3/testsuite/std/time/clock/file/members.cc
index 54459cc56a9..9d217b17811 100644
--- a/libstdc++-v3/testsuite/std/time/clock/file/members.cc
+++ b/libstdc++-v3/testsuite/std/time/clock/file/members.cc
@@ -41,9 +41,18 @@ test02()
   VERIFY( t - s < 1s );
 }
 
+void
+test03()
+{
+  using namespace std::chrono;
+  auto st = sys_days(2024y/January/21);
+  VERIFY( file_clock::to_sys(file_clock::from_sys(st)) == st );
+}
+
 int
 main()
 {
   test01();
   test02();
+  test03();
 }
-- 
2.43.0



Re: PR82943 - Suggested patch to fix

2024-01-21 Thread Harald Anlauf

Am 20.01.24 um 23:42 schrieb Alexander Westbrooks:

Based on what I am reading here, I can either do the DCO path or the copy
right form path? Or is it both, where I send in the copy right forms and
then on every commit I put the DCO?


I thought the text is pretty clear.  As already mentioned,

  https://gcc.gnu.org/contribute.html#legal

gives you the options.  One of these options is:

"Alternatively, a contributor can certify the Developer Certificate of
Origin for their contribution by adding the Signed-off-by: tag to their
submission."

If you opt for this variant,

  https://gcc.gnu.org/dco.html

tells you everything.  Basically (but please read yourself):

"The sign-off is a simple line at the end of the explanation for the
patch, which certifies that you wrote it or otherwise have the right to
pass it on as a free software patch. ..."

[...]

"then you just add a line saying:

Signed-off-by: Random J Developer 

using your real name (sorry, no pseudonyms or anonymous contributions.) ..."

I think this would be sufficient.



On Sat, Jan 20, 2024 at 3:40 PM Harald Anlauf  wrote:


Am 20.01.24 um 21:37 schrieb Jerry D:

On 1/20/24 12:08 PM, Harald Anlauf wrote:

Am 20.01.24 um 20:08 schrieb Jerry D:

On 1/20/24 10:46 AM, Alexander Westbrooks wrote:

Hello and Happy New Year!

I wanted to follow up on this patch I made to address PR82943 for
GFortran. Has anyone had a chance to review it?

Thanks,

Alexander Westbrooks



Inserting myself in here just a little bit.  I will apply and test

today

if I can. Paul is unavailable for a few weeks. Harald can chime in.

Do you have commit rights for gcc?


I am not aware of Alex having a copyright assignment on file,
or a DCO certificate, and the patch is not signed off.
But I might be wrong.


--- snip ---

I do not mind committing this but need clarifications regarding the
copyright (copyleft?) rules in this case. In the past we have allowed
small contributions like this. This may be a little more than minor.


It is.  This is why I pointed to:

https://gcc.gnu.org/dco.html


Regardless it appears to do the job!

Jerry











[COMMITTED] Make the manual clearer about what options -Wunused enables [PR90464]

2024-01-21 Thread Sandra Loosemore
gcc/ChangeLog
PR c++/90464
* doc/invoke.texi (Warning Options): Document that -Wunused-parameter
isn't enabled by -Wunused unless -Wextra is provided, and that
-Wunused does enable -Wunused-const-variable=1 for C.  Clarify that
-Wunused doesn't enable -Wunused-* options documented as behaving
otherwise, and list them explicitly.
---
 gcc/doc/invoke.texi | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f7a6e11d20e..278c931b6a3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7598,6 +7598,8 @@ This warning is enabled by @option{-Wall}.
 @opindex Wno-unused-parameter
 @item -Wunused-parameter
 Warn whenever a function parameter is unused aside from its declaration.
+This option is not enabled by @code{-Wunused} unless @code{-Wextra} is also
+specified.
 
 To suppress this warning use the @code{unused} attribute
 (@pxref{Variable Attributes}).
@@ -7624,25 +7626,26 @@ To suppress this warning use the @code{unused} attribute
 @item -Wunused-const-variable
 @itemx -Wunused-const-variable=@var{n}
 Warn whenever a constant static variable is unused aside from its declaration.
-@option{-Wunused-const-variable=1} is enabled by @option{-Wunused-variable}
-for C, but not for C++. In C this declares variable storage, but in C++ this
-is not an error since const variables take the place of @code{#define}s.
 
 To suppress this warning use the @code{unused} attribute
 (@pxref{Variable Attributes}).
 
 @table @gcctabopt
 @item -Wunused-const-variable=1
-This is the warning level that is enabled by @option{-Wunused-variable} for
-C.  It warns only about unused static const variables defined in the main
+Warn about unused static const variables defined in the main
 compilation unit, but not about static const variables declared in any
 header included.
 
+@option{-Wunused-const-variable=1} is enabled by either
+@option{-Wunused-variable} or @option{-Wunused} for C, but not for
+C++. In C this declares variable storage, but in C++ this is not an
+error since const variables take the place of @code{#define}s.
+
 @item -Wunused-const-variable=2
 This warning level also warns for unused constant static variables in
-headers (excluding system headers).  This is the warning level of
-@option{-Wunused-const-variable} and must be explicitly requested since
-in C++ this isn't an error and in C it might be harder to clean up all
+headers (excluding system headers).  It is equivalent to the short form
+@option{-Wunused-const-variable}.  This level must be explicitly
+requested in both C and C++ because it might be hard to clean up all
 headers included.
 @end table
 
@@ -7661,11 +7664,16 @@ This warning is enabled by @option{-Wall}.
 @opindex Wunused
 @opindex Wno-unused
 @item -Wunused
-All the above @option{-Wunused} options combined.
+All the above @option{-Wunused} options combined, except those documented
+as needing to be specified explicitly.
 
 In order to get a warning about an unused function parameter, you must
 either specify @option{-Wextra -Wunused} (note that @option{-Wall} implies
-@option{-Wunused}), or separately specify @option{-Wunused-parameter}.
+@option{-Wunused}), or separately specify @option{-Wunused-parameter} and/or
+@option{-Wunused-but-set-parameter}.
+
+@option{-Wunused} enables only @option{-Wunused-const-variable=1} rather than
+@option{-Wunused-const-variable}, and only for C, not C++.
 
 @opindex Wuse-after-free
 @opindex Wno-use-after-free
-- 
2.31.1



Re: [PATCH] Fortran: passing of optional scalar arguments with VALUE attribute [PR113377]

2024-01-21 Thread Harald Anlauf

Hi Mikael!

Am 21.01.24 um 11:50 schrieb Mikael Morin:

Hello,

Le 20/01/2024 à 22:58, Harald Anlauf a écrit :

Dear all,

here's the first part of an attempt to fix issues with optional
dummy arguments as actual arguments to optional dummies.  This patch
rectifies the case of scalar dummies with the VALUE attribute,
which in gfortran's argument passing convention are passed on the
stack when they are of intrinsic type, and have a hidden variable
for the presence status.

The testcase tries to cover valid combinations of actual and dummy
argument.  A few tests that are not standard-conforming but would
still work with gfortran (due to the argument passing convention)
are left there but commented out with a pointer to the standard
(thanks, Mikael!).

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Well, not yet.



diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 9dd1f4086f4..2f47a75955c 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6526,6 +6526,10 @@ gfc_conv_procedure_call (gfc_se * se,
gfc_symbol * sym,
 gfc_init_se (, NULL);
 argse.want_pointer = 1;
 gfc_conv_expr (, e);
+    if (e->symtree->n.sym->attr.dummy
+    && POINTER_TYPE_P (TREE_TYPE (argse.expr)))
+  argse.expr = gfc_build_addr_expr (NULL_TREE,
+    argse.expr);


The second part of the condition looks superfluous: if
argse.want_pointer was set, we can expect to get a pointer result.

But more important, I don't understand the need for this whole part, the
new test seems to pass without it.
And here is an example that regresses with it.

program p
   type :: t
     integer, allocatable :: c
   end type
   call s2(t())
contains
   subroutine s1(a)
     integer, value, optional :: a
     if (present(a)) stop 1
   end subroutine
   subroutine s2(a)
     type(t) :: a
     call s1(a%c)
   end subroutine
end program


Thanks for this example!  I've taken the liberty to add a slightly
extended version of it to the testcase.

I was taken astray by the attempt to handle the (invalid by the
standard) variant of passing an absent allocatable scalar to
an optional scalar dummy with the value attribute under since
we use a hidden variable for the present status.  Without the
code above there is an unprotected pointer dereference.

I think that it still could be done, but it is probably not worth
it.  So I followed your suggestion and removed that part.




 cond = fold_convert (TREE_TYPE (argse.expr),
  null_pointer_node);
 cond = fold_build2_loc (input_location, NE_EXPR,
@@ -7256,6 +7260,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol
* sym,
   && e->symtree->n.sym->attr.optional
   && (((e->rank != 0 && elemental_proc)
    || e->representation.length || e->ts.type == BT_CHARACTER
+   || (e->rank == 0 && e->symtree->n.sym->attr.value)


This looks good.


    || (e->rank != 0
    && (fsym == NULL
    || (fsym->as
diff --git a/gcc/testsuite/gfortran.dg/optional_absent_9.f90
b/gcc/testsuite/gfortran.dg/optional_absent_9.f90
new file mode 100644
index 000..495a6c00d7f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/optional_absent_9.f90
@@ -0,0 +1,324 @@
+! { dg-do run }
+! PR fortran/113377
+!
+! Test passing of missing optional scalar dummies of intrinsic type
+
+module m_int
+  implicit none
+contains
+  subroutine test_int ()
+    integer :: k = 1
+    call one (k)
+    call one_val (k)
+    call one_all (k)
+    call one_ptr (k)
+  end
+
+  subroutine one (i, j)
+    integer, intent(in)   :: i
+    integer ,optional :: j
+    integer, allocatable :: aa
+    integer, pointer :: pp => NULL()
+    if (present (j)) error stop "j is present"
+    call two (i, j)
+    call two_val (i, j)
+    call two (i, aa)
+    call two (i, pp)


To be complete, you could check two_val(i, aa) and two_val(i, pp) as well.
Both seem to pass already without the patch, so not absolutely needed.
Your call.


It is already contained in testcase gfortran.dg/value_optional_1.f90,
(see call sub there), but then it may be helpful to have it here too.
Thus added.


+  end
+


I think the patch is OK with the first trans-expr.cc hunk removed.
Thanks.


That's what I have done and pushed as r14-8317-g68862e5c75ef0e.


Mikael


Thanks for the review!

Harald




Re: [PATCH] Pass GUILE down to subdirectories

2024-01-21 Thread Eric Gallager
On Thu, Jan 18, 2024 at 3:37 PM Tom Tromey  wrote:
>
> Andrew> This change is causing some problems for me.
>
> Yeah, Tom de Vries as well.
>
> Andrew> One of my build machines has 2 versions of guile installed.  One is
> Andrew> guile 2.0.14 and the other is guile 2.2.21.
>
> Andrew> When GDB configures itself the configure script figures out that it
> Andrew> should use 2.2.21 to compile the guile libraries that GDB uses.
>
> Andrew> However, when we actually build the guile libraries we do use 
> guild2.2,
> Andrew> but due to this 'GUILE = guile' line, guild2.2 uses guile 2.0.14 in
> Andrew> order to perform the compile (I guess, I don't know the details of how
> Andrew> guile compilation works).
>
> Andrew> Unfortunately guile 2.0.14 compiles in a way which is not compatible
> Andrew> with how GDB then tries to load the guile library.
>
> I consider this a bug in guile -- it installs 'guild' with this:
>
> #!/usr/bin/sh
> # -*- scheme -*-
> exec ${GUILE:-/usr/bin/guile2.2} $GUILE_FLAGS -e '(@@ (guild) main)' -s 
> "$0" "$@"
> !#
>
> Allowing a system script to pick $GUILE here seems weird, especially for
> a versioned install of "guild", where as you can see it already knows
> the correct guile to use.
>
> However -- I think it's better to just work around this.
> I plan to back out this change.  Anyone needing to re-run cgen (which
> itself ought to come with smarts here, but since it is un-maintained...)
> can just specify this by hand.  I.e., the status quo ante.

I mean, I've been trying to figure out how to re-run cgen myself, to
regenerate some cgen-erated files in libopcodes to fix some compiler
warnings in them, but it's pretty hard to do so; I'd really appreciate
it if the whole process of regenerating files with cgen could be made
easy and well-documented and understandable...

>
> I'll try to send a patch tomorrow.
>
> Tom


[PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-01-21 Thread Ajit Agarwal


Hello All:

New pass to replace adjacent memory addresses lxv with lxvp.
Added common infrastructure for load store fusion for
different targets.

Common routines are refactored in fusion-common.h.

AARCH64 load/store fusion pass is not changed with the 
common infrastructure.

For AARCH64 architectures just include "fusion-common.h"
and target dependent code can be added to that.


Alex/Richard:

If you would like me to add for AARCH64 I can do that for AARCH64.

If you would like to do that is fine with me.

Bootstrapped and regtested with powerpc64-linux-gnu.

Improvement in performance is seen with Spec 2017 spec FP benchmarks.

Thanks & Regards
Ajit

rs6000: New  pass for replacement of adjacent lxv with lxvp.

New pass to replace adjacent memory addresses lxv with lxvp.
Added common infrastructure for load store fusion for
different targets.

Common routines are refactored in fusion-common.h.

2024-01-21  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: New vecload pass
before pass_early_remat.
* config/rs6000/rs6000-vecload-opt.cc: Add new pass.
* config.gcc: Add new executable.
* config/rs6000/rs6000-protos.h: Add new prototype for vecload
pass.
* config/rs6000/rs6000.cc: Add new prototype for vecload pass.
* config/rs6000/t-rs6000: Add new rule.
* fusion-common.h: Add common infrastructure for load store
fusion that can be shared across different architectures.
* emit-rtl.cc: Modify assert code.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/vecload.C: New test.
* g++.target/powerpc/vecload1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|4 +-
 gcc/config/rs6000/rs6000-passes.def   |3 +
 gcc/config/rs6000/rs6000-protos.h |1 +
 gcc/config/rs6000/rs6000-vecload-opt.cc   | 1186 
 gcc/config/rs6000/rs6000.cc   |1 +
 gcc/config/rs6000/t-rs6000|5 +
 gcc/emit-rtl.cc   |   14 +-
 gcc/fusion-common.h   | 1195 +
 gcc/testsuite/g++.target/powerpc/vecload.C|   15 +
 gcc/testsuite/g++.target/powerpc/vecload1.C   |   22 +
 .../gcc.target/powerpc/mma-builtin-1.c|4 +-
 11 files changed, 2433 insertions(+), 17 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
 create mode 100644 gcc/fusion-common.h
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload1.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 00355509c92..9bff42cf830 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -518,7 +518,7 @@ or1k*-*-*)
;;
 powerpc*-*-*)
cpu_type=rs6000
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
@@ -555,7 +555,7 @@ riscv*)
;;
 rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-logue.cc 
\$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
diff --git a/gcc/config/rs6000/rs6000-passes.def 
b/gcc/config/rs6000/rs6000-passes.def
index 46a0d0b8c56..eb4a65ebe10 100644
--- a/gcc/config/rs6000/rs6000-passes.def
+++ b/gcc/config/rs6000/rs6000-passes.def
@@ -28,6 +28,9 @@ along with GCC; see the file COPYING3.  If not see
  The power8 does not have instructions that automaticaly do the byte swaps
  for loads and stores.  */
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
+  /* Pass to replace adjacent memory addresses lxv instruction with lxvp
+ instruction.  */
+  INSERT_PASS_BEFORE (pass_early_remat, 1, pass_analyze_vecload);
 
   /* Pass to do the PCREL_OPT optimization that combines the load of an
  external symbol's address along with a single load or store using that
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 09a57a806fa..f0a9f36602e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -343,6 +343,7 @@ namespace gcc { class context; }
 class rtl_opt_pass;
 
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
+extern rtl_opt_pass *make_pass_analyze_vecload (gcc::context *);
 extern 

Fwd: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-01-21 Thread Ajit Agarwal


Hello All:

New pass to replace adjacent memory addresses lxv with lxvp.
Added common infrastructure for load store fusion for
different targets.

Common routines are refactored in fusion-common.h.

AARCH64 load/store fusion pass is not changed with the 
common infrastructure.

For AARCH64 architectures just include "fusion-common.h"
and target dependent code is added to that.


Alex/Richard:

If you would like me to add for AARCH64 I can do that for AARCH64.

If you would like to do that is fine with me.

Bootstrapped and regtested with powerpc64-linux-gnu.

Improvement in performance is seen with Spec 2017 spec FP benchmarks.

Thanks & Regards
Ajit

rs6000: New  pass for replacement of adjacent lxv with lxvp.

New pass to replace adjacent memory addresses lxv with lxvp.
Added common infrastructure for load store fusion for
different targets.

Common routines are refactored in fusion-common.h.

2024-01-21  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: New vecload pass
before pass_early_remat.
* config/rs6000/rs6000-vecload-opt.cc: Add new pass.
* config.gcc: Add new executable.
* config/rs6000/rs6000-protos.h: Add new prototype for vecload
pass.
* config/rs6000/rs6000.cc: Add new prototype for vecload pass.
* config/rs6000/t-rs6000: Add new rule.
* fusion-common.h: Add common infrastructure for load store
fusion that can be shared across different architectures.
* emit-rtl.cc: Modify assert code.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/vecload.C: New test.
* g++.target/powerpc/vecload1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|4 +-
 gcc/config/rs6000/rs6000-passes.def   |3 +
 gcc/config/rs6000/rs6000-protos.h |1 +
 gcc/config/rs6000/rs6000-vecload-opt.cc   | 1186 
 gcc/config/rs6000/rs6000.cc   |1 +
 gcc/config/rs6000/t-rs6000|5 +
 gcc/emit-rtl.cc   |   14 +-
 gcc/fusion-common.h   | 1195 +
 gcc/testsuite/g++.target/powerpc/vecload.C|   15 +
 gcc/testsuite/g++.target/powerpc/vecload1.C   |   22 +
 .../gcc.target/powerpc/mma-builtin-1.c|4 +-
 11 files changed, 2433 insertions(+), 17 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
 create mode 100644 gcc/fusion-common.h
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload1.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 00355509c92..9bff42cf830 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -518,7 +518,7 @@ or1k*-*-*)
;;
 powerpc*-*-*)
cpu_type=rs6000
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
@@ -555,7 +555,7 @@ riscv*)
;;
 rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-logue.cc 
\$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
diff --git a/gcc/config/rs6000/rs6000-passes.def 
b/gcc/config/rs6000/rs6000-passes.def
index 46a0d0b8c56..eb4a65ebe10 100644
--- a/gcc/config/rs6000/rs6000-passes.def
+++ b/gcc/config/rs6000/rs6000-passes.def
@@ -28,6 +28,9 @@ along with GCC; see the file COPYING3.  If not see
  The power8 does not have instructions that automaticaly do the byte swaps
  for loads and stores.  */
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
+  /* Pass to replace adjacent memory addresses lxv instruction with lxvp
+ instruction.  */
+  INSERT_PASS_BEFORE (pass_early_remat, 1, pass_analyze_vecload);
 
   /* Pass to do the PCREL_OPT optimization that combines the load of an
  external symbol's address along with a single load or store using that
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 09a57a806fa..f0a9f36602e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -343,6 +343,7 @@ namespace gcc { class context; }
 class rtl_opt_pass;
 
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
+extern rtl_opt_pass *make_pass_analyze_vecload (gcc::context *);
 extern 

[committed] libstdc++: Fix std::format floating-point alternate forms [PR113512]

2024-01-21 Thread Jonathan Wakely
Tested aarch64-linux. Pushed to trunk. Backport to gcc-13 to follow.

-- >8 --

The logic for handling '#' forms was ... not good. The count of
significant figures just counted digits, instead of ignoring leading
zeros. And when moving the result from the stack buffer to a dynamic
string the exponent could get lost in some cases.

libstdc++-v3/ChangeLog:

PR libstdc++/113512
* include/std/format (__formatter_fp::format): Fix logic for
alternate forms.
* testsuite/std/format/functions/format.cc: Check buggy cases of
alternate forms with g presentation type.
---
 libstdc++-v3/include/std/format   | 51 +--
 .../testsuite/std/format/functions/format.cc  |  6 +++
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index f4d91517656..0eca8b58bfa 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1623,6 +1623,7 @@ namespace __format
*__p = std::toupper(*__p);
}
 
+ bool __have_sign = true;
  // Add sign for non-negative values.
  if (!__builtin_signbit(__v))
{
@@ -1630,56 +1631,73 @@ namespace __format
*--__start = '+';
  else if (_M_spec._M_sign == _Sign_space)
*--__start = ' ';
+ else
+   __have_sign = false;
}
 
  string_view __narrow_str(__start, __res.ptr - __start);
 
- // Use alternate form.
+ // Use alternate form. Ensure decimal point is always present,
+ // and add trailing zeros (up to precision) for g and G forms.
  if (_M_spec._M_alt && __builtin_isfinite(__v))
{
  string_view __s = __narrow_str;
- size_t __z = 0;
- size_t __p;
- size_t __d = __s.find('.');
- size_t __sigfigs;
- if (__d != __s.npos)
+ size_t __sigfigs; // Number of significant figures.
+ size_t __z = 0;   // Number of trailing zeros to add.
+ size_t __p;   // Position of the exponent character (if any).
+ size_t __d = __s.find('.'); // Position of decimal point.
+ if (__d != __s.npos) // Found decimal point.
{
  __p = __s.find(__expc, __d + 1);
  if (__p == __s.npos)
__p = __s.size();
- __sigfigs = __p - 1;
+
+ // If presentation type is g or G we might need to add zeros.
+ if (__trailing_zeros)
+   {
+ // Find number of digits after first significant figure.
+ if (__s[__have_sign] != '0')
+   // A string like "D.D" or "-D.DDD"
+   __sigfigs = __p - __have_sign - 1;
+ else
+   // A string like "0.D" or "-0.0DD".
+   // Safe to assume there is a non-zero digit, because
+   // otherwise there would be no decimal point.
+   __sigfigs = __p - __s.find_first_not_of('0', __d + 1);
+   }
}
- else
+ else // No decimal point, we need to insert one.
{
- __p = __s.find(__expc);
+ __p = __s.find(__expc); // Find the exponent, if present.
  if (__p == __s.npos)
__p = __s.size();
  __d = __p; // Position where '.' should be inserted.
- __sigfigs = __d;
+ __sigfigs = __d - __have_sign;
}
 
  if (__trailing_zeros && __prec != 0)
{
- if (!__format::__is_xdigit(__s[0]))
-   --__sigfigs;
- __z = __prec - __sigfigs; // Number of zeros to insert.
+ // For g and G presentation types std::to_chars produces
+ // no more than prec significant figures. Insert this many
+ // zeros so the result has exactly prec significant figures.
+ __z = __prec - __sigfigs;
}
 
- if (size_t __extras = int(__d == __p) + __z)
+ if (size_t __extras = int(__d == __p) + __z) // How many to add.
{
  if (__dynbuf.empty() && __extras <= size_t(__end - __res.ptr))
{
+ // The stack buffer is large enough for the result.
  // Move exponent to make space for extra chars.
  __builtin_memmove(__start + __p + __extras,
__start + __p,
__s.size() - __p);
-
  if (__d == __p)
__start[__p++] = '.';
  __builtin_memset(__start + __p, '0', __z);

[PATCH] c/c++: Tweak warning for 'always_inline function might not be inlinable'

2024-01-21 Thread Hans-Peter Nilsson
Tested x86_64-linux-gnu.  Ok to commit?

Or, does the message need more tweaking?
(If so, suggestions from native speakers?)
FWIW, I found no PR for just the message being bad.

-- >8 --
When you're not regularly exposed to this warning, it is
easy to be misled by its wording, believing that there's
something else in the function that stops it from being
inlined, than the lack of also being *declared* inline.

It's just a warning: without the inline directive, there has
to be a secondary reasons for the function to be inlined,
than the always_inline attribute, reasons that may be in
effect despite the warning.

Whenever the text is quoted in inline-related bugzilla
entries, there seems to often have been an initial step of
confusion that has to be cleared, for example in PR55830.
The powerpc test-case in this patch where a comment is
adjusted, seems to be another example, and I testify as the
first-hand third "experience".  The wording has been the
same since the warning was added.

Let's just tweak the wording, adding the cause, so that the
reason for the warning is clearer.  This hopefully stops the
user from immediately asking "'Might'?  Because why?"  and
then going off looking at the function body - or grepping
the gcc source or documentation, or enter a bug-report
subsequently closed as resolved/invalid.

gcc:
* cgraphunit.cc (process_function_and_variable_attributes): Tweak
the warning for an attribute-always_inline without inline declaration.

gcc/testsuite:
* g++.dg/Wattributes-3.C: Adjust expected warning.
* gcc.dg/fail_always_inline.c: Ditto.
* gcc.target/powerpc/vec-extract-v16qiu-v2.h: Adjust
quoted warning in comment.
---
 gcc/cgraphunit.cc| 3 ++-
 gcc/testsuite/g++.dg/Wattributes-3.C | 4 ++--
 gcc/testsuite/gcc.dg/fail_always_inline.c| 2 +-
 gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index 38052674aaa5..89dc1af522a4 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -918,7 +918,8 @@ process_function_and_variable_attributes (cgraph_node 
*first,
  /* redefining extern inline function makes it DECL_UNINLINABLE.  */
  && !DECL_UNINLINABLE (decl))
warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
-   "% function might not be inlinable");
+   "% function is not always inlined"
+   " unless also declared %");
 
   process_common_attributes (node, decl);
 }
diff --git a/gcc/testsuite/g++.dg/Wattributes-3.C 
b/gcc/testsuite/g++.dg/Wattributes-3.C
index 208ec6696551..4adf0944cd4f 100644
--- a/gcc/testsuite/g++.dg/Wattributes-3.C
+++ b/gcc/testsuite/g++.dg/Wattributes-3.C
@@ -26,7 +26,7 @@ B::operator char () const { return 0; }
 
 ATTR ((__noinline__))
 B::operator int () const  // { dg-warning "ignoring attribute .noinline. 
because it conflicts with attribute .always_inline." }
-// { dg-warning "function might not be inlinable" "" { target *-*-* } .-1 }
+// { dg-warning "function is not always inlined unless also declared .inline." 
"" { target *-*-* } .-1 }
 
 {
   return 0;
@@ -45,7 +45,7 @@ C::operator char () { return 0; }
 
 ATTR ((__noinline__))
 C::operator short ()   // { dg-warning "ignoring attribute .noinline. 
because it conflicts with attribute .always_inline." }
-// { dg-warning "function might not be inlinable" "" { target *-*-* } .-1 }
+// { dg-warning "function is not always inlined unless also declared .inline." 
"" { target *-*-* } .-1 }
 { return 0; }
 
 inline ATTR ((__noinline__))
diff --git a/gcc/testsuite/gcc.dg/fail_always_inline.c 
b/gcc/testsuite/gcc.dg/fail_always_inline.c
index 86645b850de8..2f48d7f5c6be 100644
--- a/gcc/testsuite/gcc.dg/fail_always_inline.c
+++ b/gcc/testsuite/gcc.dg/fail_always_inline.c
@@ -2,7 +2,7 @@
 /* { dg-add-options bind_pic_locally } */
 
 extern __attribute__ ((always_inline)) void
- bar() { } /* { dg-warning "function might not be inlinable" } */
+ bar() { } /* { dg-warning "function is not always inlined unless also 
declared .inline." } */
 
 void
 f()
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h 
b/gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h
index d1157599ee7c..b9800e1c950d 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h
+++ b/gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h
@@ -179,7 +179,7 @@ get_auto_15 (vector TYPE a)
 #ifdef DISABLE_INLINE_OF_GET_AUTO_N
 __attribute__ ((__noinline__))
 #else
-/* gcc issues warning: always_inline function might not be inlinable
+/* gcc issues warning: always_inline function is not always inlined unless 
also declared inline
 
__attribute__ ((__always_inline__))
 */
-- 
2.30.2



Re: [PATCH] Fortran: passing of optional scalar arguments with VALUE attribute [PR113377]

2024-01-21 Thread Mikael Morin

Hello,

Le 20/01/2024 à 22:58, Harald Anlauf a écrit :

Dear all,

here's the first part of an attempt to fix issues with optional
dummy arguments as actual arguments to optional dummies.  This patch
rectifies the case of scalar dummies with the VALUE attribute,
which in gfortran's argument passing convention are passed on the
stack when they are of intrinsic type, and have a hidden variable
for the presence status.

The testcase tries to cover valid combinations of actual and dummy
argument.  A few tests that are not standard-conforming but would
still work with gfortran (due to the argument passing convention)
are left there but commented out with a pointer to the standard
(thanks, Mikael!).

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Well, not yet.



diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 9dd1f4086f4..2f47a75955c 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6526,6 +6526,10 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
gfc_init_se (, NULL);
argse.want_pointer = 1;
gfc_conv_expr (, e);
+   if (e->symtree->n.sym->attr.dummy
+   && POINTER_TYPE_P (TREE_TYPE (argse.expr)))
+ argse.expr = gfc_build_addr_expr (NULL_TREE,
+   argse.expr);


The second part of the condition looks superfluous: if 
argse.want_pointer was set, we can expect to get a pointer result.


But more important, I don't understand the need for this whole part, the 
new test seems to pass without it.

And here is an example that regresses with it.

program p
  type :: t
integer, allocatable :: c
  end type
  call s2(t())
contains
  subroutine s1(a)
integer, value, optional :: a
if (present(a)) stop 1
  end subroutine
  subroutine s2(a)
type(t) :: a
call s1(a%c)
  end subroutine
end program



cond = fold_convert (TREE_TYPE (argse.expr),
 null_pointer_node);
cond = fold_build2_loc (input_location, NE_EXPR,
@@ -7256,6 +7260,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  && e->symtree->n.sym->attr.optional
  && (((e->rank != 0 && elemental_proc)
   || e->representation.length || e->ts.type == BT_CHARACTER
+  || (e->rank == 0 && e->symtree->n.sym->attr.value)


This looks good.


   || (e->rank != 0
   && (fsym == NULL
   || (fsym->as
diff --git a/gcc/testsuite/gfortran.dg/optional_absent_9.f90 
b/gcc/testsuite/gfortran.dg/optional_absent_9.f90
new file mode 100644
index 000..495a6c00d7f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/optional_absent_9.f90
@@ -0,0 +1,324 @@
+! { dg-do run }
+! PR fortran/113377
+!
+! Test passing of missing optional scalar dummies of intrinsic type
+
+module m_int
+  implicit none
+contains
+  subroutine test_int ()
+integer :: k = 1
+call one (k)
+call one_val (k)
+call one_all (k)
+call one_ptr (k)
+  end
+
+  subroutine one (i, j)
+integer, intent(in)   :: i
+integer ,optional :: j
+integer, allocatable :: aa
+integer, pointer :: pp => NULL()
+if (present (j)) error stop "j is present"
+call two (i, j)
+call two_val (i, j)
+call two (i, aa)
+call two (i, pp)


To be complete, you could check two_val(i, aa) and two_val(i, pp) as well.
Both seem to pass already without the patch, so not absolutely needed.
Your call.


+  end
+


I think the patch is OK with the first trans-expr.cc hunk removed.
Thanks.

Mikael