Re: [PATCH][gdb/build] Fix gdbserver build with -fsanitize=thread

2022-06-26 Thread Richard Biener via Gcc-patches
On Sat, Jun 25, 2022 at 11:09 AM Tom de Vries via Gcc-patches
 wrote:
>
> Hi,
>
> When building gdbserver with -fsanitize=thread (added to CFLAGS/CXXFLAGS) we
> run into:
> ...
> ld: ../libiberty/libiberty.a(safe-ctype.o): warning: relocation against \
>   `__tsan_init' in read-only section `.text'
> ld: ../libiberty/libiberty.a(safe-ctype.o): relocation R_X86_64_PC32 \
>   against symbol `__tsan_init' can not be used when making a shared object; \
>   recompile with -fPIC
> ld: final link failed: bad value
> collect2: error: ld returned 1 exit status
> make[1]: *** [libinproctrace.so] Error 1
> ...
> which looks similar to what is described in commit 78e49486944 ("[gdb/build]
> Fix gdbserver build with -fsanitize=address").
>
> The gdbserver component builds a shared library libinproctrace.so, which uses
> libiberty and therefore requires the pic variant.  The gdbserver Makefile is
> set up to use this variant, if available, but it's not there.
>
> Fix this by listing gdbserver in the toplevel configure alongside libcc1, as a
> component that needs the libiberty pic variant, setting:
> ...
> extra_host_libiberty_configure_flags=--enable-shared
> ...
>
> Tested on x86_64-linux.
>
> OK for trunk gcc?

OK

> Thanks,
> - Tom
>
> [gdb/build] Fix gdbserver build with -fsanitize=thread
>
> ---
>  configure| 2 +-
>  configure.ac | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/configure b/configure
> index 1badcb314f8..aac80b88d70 100755
> --- a/configure
> +++ b/configure
> @@ -6964,7 +6964,7 @@ fi
>  extra_host_libiberty_configure_flags=
>  extra_host_zlib_configure_flags=
>  case " $configdirs " in
> -  *" lto-plugin "* | *" libcc1 "*)
> +  *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
>  # When these are to be built as shared libraries, the same applies to
>  # libiberty.
>  extra_host_libiberty_configure_flags=--enable-shared
> diff --git a/configure.ac b/configure.ac
> index 5b6e2048514..29f74d10b5a 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2344,7 +2344,7 @@ fi
>  extra_host_libiberty_configure_flags=
>  extra_host_zlib_configure_flags=
>  case " $configdirs " in
> -  *" lto-plugin "* | *" libcc1 "*)
> +  *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
>  # When these are to be built as shared libraries, the same applies to
>  # libiberty.
>  extra_host_libiberty_configure_flags=--enable-shared


Re: PING^1 [PATCH] x86: Skip ENDBR when emitting direct call/jmp to local function

2022-06-26 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 21, 2022 at 3:50 AM Uros Bizjak via Gcc-patches
 wrote:
>
> On Mon, Jun 20, 2022 at 8:14 PM H.J. Lu  wrote:
> >
> > On Tue, May 10, 2022 at 9:25 AM H.J. Lu  wrote:
> > >
> > > Mark a function with SYMBOL_FLAG_FUNCTION_ENDBR when inserting ENDBR at
> > > function entry.  Skip the 4-byte ENDBR when emitting a direct call/jmp
> > > to a local function with ENDBR at function entry.
> > >
> > > This has been tested on Linux kernel.
> > >
> > > gcc/
> > >
> > > PR target/102953
> > > * config/i386/i386-features.cc
> > > (rest_of_insert_endbr_and_patchable_area): Set
> > > SYMBOL_FLAG_FUNCTION_ENDBR when inserting ENDBR.
> > > * config/i386/i386.cc (ix86_print_operand): Skip the 4-byte ENDBR
> > > when calling the local function with ENDBR at function entry.
> > > * config/i386/i386.h (SYMBOL_FLAG_FUNCTION_ENDBR): New.
> > > (SYMBOL_FLAG_FUNCTION_ENDBR_P): Likewise.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/102953
> > > * gcc.target/i386/pr102953-1.c: New test.
> > > * gcc.target/i386/pr102953-2.c: Likewise.
The patch looks good to me.
For a direct call, endbr64 should not be used as a marker, right?
> > > ---
> > >  gcc/config/i386/i386-features.cc   |  2 ++
> > >  gcc/config/i386/i386.cc| 11 +++-
> > >  gcc/config/i386/i386.h |  5 
> > >  gcc/testsuite/gcc.target/i386/pr102953-1.c | 25 ++
> > >  gcc/testsuite/gcc.target/i386/pr102953-2.c | 30 ++
> > >  5 files changed, 72 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-2.c
> > >
> > > diff --git a/gcc/config/i386/i386-features.cc 
> > > b/gcc/config/i386/i386-features.cc
> > > index 6fe41c3c24f..3ca1131ed59 100644
> > > --- a/gcc/config/i386/i386-features.cc
> > > +++ b/gcc/config/i386/i386-features.cc
> > > @@ -1979,6 +1979,8 @@ rest_of_insert_endbr_and_patchable_area (bool 
> > > need_endbr,
> > >   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES
> > >   && DECL_DLLIMPORT_P (cfun->decl
> > > {
> > > + rtx symbol = XEXP (DECL_RTL (cfun->decl), 0);
> > > + SYMBOL_REF_FLAGS (symbol) |= SYMBOL_FLAG_FUNCTION_ENDBR;
> > >   if (crtl->profile && flag_fentry)
> > > {
> > >   /* Queue ENDBR insertion to x86_function_profiler.
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 86752a6516a..ad1de239bef 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -13787,7 +13787,16 @@ ix86_print_operand (FILE *file, rtx x, int code)
> > >else if (flag_pic || MACHOPIC_INDIRECT)
> > > output_pic_addr_const (file, x, code);
> > >else
> > > -   output_addr_const (file, x);
> > > +   {
> > > + /* Skip ENDBR when emitting a direct call/jmp to a local
> > > +function with ENDBR at function entry.  */
> > > + if (code == 'P'
> > > + && GET_CODE (x) == SYMBOL_REF
> > > + && SYMBOL_REF_LOCAL_P (x)
> > > + && SYMBOL_FLAG_FUNCTION_ENDBR_P (x))
> > > +   x = gen_rtx_PLUS (Pmode, x, GEN_INT (4));
> > > + output_addr_const (file, x);
> > > +   }
> > >  }
> > >  }
> > >
> > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > > index 363082ba47b..7a6317fea57 100644
> > > --- a/gcc/config/i386/i386.h
> > > +++ b/gcc/config/i386/i386.h
> > > @@ -2792,6 +2792,11 @@ extern GTY(()) tree ms_va_list_type_node;
> > >  #define SYMBOL_REF_STUBVAR_P(X) \
> > > ((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_STUBVAR) != 0)
> > >
> > > +/* Flag to mark a function with ENDBR at entry.  */
> > > +#define SYMBOL_FLAG_FUNCTION_ENDBR (SYMBOL_FLAG_MACH_DEP << 5)
> > > +#define SYMBOL_FLAG_FUNCTION_ENDBR_P(X) \
> > > +   ((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_FUNCTION_ENDBR) != 0)
> > > +
> > >  extern void debug_ready_dispatch (void);
> > >  extern void debug_dispatch_window (int);
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr102953-1.c 
> > > b/gcc/testsuite/gcc.target/i386/pr102953-1.c
> > > new file mode 100644
> > > index 000..2afad391baf
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr102953-1.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do compile { target { ! *-*-darwin* } } } */
> > > +/* { dg-options "-O2 -fno-pic -fplt -fcf-protection" } */
> > > +
> > > +extern int func (int);
> > > +
> > > +extern int i;
> > > +
> > > +__attribute__ ((noclone, noinline, noipa))
> > > +static int
> > > +bar (int x)
> > > +{
> > > +  if (x == 0)
> > > +return x;
> > > +  return bar (x - 1) + func (x);
> > > +}
> > > +
> > > +void *
> > > +foo (void)
> > > +{
> > > +  i = bar (2);
> > > +  return bar;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times {call\t_?bar\+4\M} 2 } } */
> > > +/* { 

Re: [statistics.cc] Emit asm name of function with -fdump-statistics-asmname

2022-06-26 Thread Richard Biener via Gcc-patches



> On 27.06.2022 at 06:51, Prathamesh Kulkarni
> wrote:
> 
> On Mon, 20 Jun 2022 at 12:52, Richard Biener  
> wrote:
>> 
>>> On Thu, Jun 16, 2022 at 5:05 PM Prathamesh Kulkarni via Gcc-patches
>>>  wrote:
>>> 
>>> Hi,
>>> I just noticed -fdump-statistics supports asmname sub-option, which
>>> according to the doc states:
>>> "If DECL_ASSEMBLER_NAME has been set for a given decl, use that in the dump
>>> instead of DECL_NAME. Its primary use is ease of use working backward from
>>> mangled names in the assembly file."
>>> 
>>> When passed -fdump-statistics-asmname, the dump however still contains the
>>> original name of functions. The patch modifies statistics.cc to emit asm
>>> name of function instead. Also for C++, it helps to better disambiguate
>>> overloaded function names in the stats dump file.
>>> I have attached stats dump for a simple test-case.
>>> 
>>> Does it look OK ?
>> 
>> decl_assembler_name has the side-effect of computing and setting it if it is
>> not set already - I think that's unwanted.  You probably want to use
>> it only if DECL_ASSEMBLER_NAME_SET_P which then means when it
>> gets it later the dump will be split.
> Hi Richard,
> Sorry for late reply. In the attached patch, I checked for
> DECL_ASSEMBLER_NAME_SET_P before calling
> decl_assembler_name.
> Does that look OK ?

Ok.

Richard 


> Thanks,
> Prathamesh
>> 
>> Richard.
>> 
>>> 
>>> Thanks,
>>> Prathamesh
> 


RE: [PATCH 1/2]AArch64 Add fallback case using sdot for usdot

2022-06-26 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, June 16, 2022 7:54 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 1/2]AArch64 Add fallback case using sdot for usdot
> 
> Richard Sandiford via Gcc-patches  writes:
> > Tamar Christina  writes:
> >> Hi All,
> >>
> >> The usdot operation is common in video encoder and decoders including
> >> some of the most widely used ones.
> >>
> >> This patch adds a +dotprod version of the optab as a fallback for
> >> when you do have sdot but not usdot available.
> >>
> >> The fallback works by adding a bias to the unsigned argument to
> >> convert it to a signed value and then correcting for the bias later on.
> >>
> >> Essentially it relies on (x - 128)y + 128y == xy where x is unsigned
> >> and y is signed (assuming both are 8-bit values).  Because the range
> >> of a signed byte only goes up to 127, we split the bias correction into:
> >>
> >>(x - 128)y + 127y + y
> >
> > I bet you knew this question was coming, but: this technique isn't
> > target-specific, so wouldn't it be better to handle it in
> > tree-vect-patterns.cc instead?
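[For reference, the bias identity quoted above can be checked exhaustively.
This is only an illustrative sketch, not part of the patch; it assumes the
operands are an unsigned and a signed 8-bit value as the thread states.]

```python
# Exhaustive check of the usdot fallback identity from the patch:
#   x*y == (x - 128)*y + 127*y + 1*y
# where x is unsigned 8-bit and y is signed 8-bit.  The point of the
# split is that every multiplier on the right (x - 128, 127, and 1)
# fits in a signed byte, whereas a single bias correction of 128 would not.
for x in range(256):            # unsigned 8-bit operand
    for y in range(-128, 128):  # signed 8-bit operand
        sx = x - 128            # bias: reinterpret x as a signed byte
        assert -128 <= sx <= 127
        assert x * y == sx * y + 127 * y + y
print("identity holds for all 8-bit pairs")
```

[Richard's suggested variant, (x - 128)y + 64y + 64y, satisfies the same
identity with one hoisted constant fewer.]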

Ok, so after many hours of trying I don't know how to make this work.
DOT_PROD_EXPR is a reduction, but emitting them as additional pattern
statements doesn't work because they'll be marked as internal_def rather
than reduction_def.  I tried marking the new vec_stmt_info that I create
explicitly as reduction_def, but this gets overwritten during analysis.

I then looked into handling it as a vectorizable_operation, but this has
the obvious problem that it is no longer treated as a reduction, and so
the vectorizer tries to decompose it into hi/lo.

I then looked into treating additional patterns from a reduction as
reductions themselves, but this is obviously wrong, as non-reduction
statements also get marked as reductions.

The conclusion is that I don't think the vectorizer allows additional
reductions to be emitted from patterns.

> Also, how about doing (x - 128)y + 64y + 64y instead, to reduce the number
> of hoisted constants?
> 
> Thanks,
> Richard
> 
> > Thanks,
> > Richard
> >
> >> Concretely for:
> >>
> >> #define N 480
> >> #define SIGNEDNESS_1 unsigned
> >> #define SIGNEDNESS_2 signed
> >> #define SIGNEDNESS_3 signed
> >> #define SIGNEDNESS_4 unsigned
> >>
> >> SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> >> SIGNEDNESS_3 char *restrict a,
> >>SIGNEDNESS_4 char *restrict b)
> >> {
> >>   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> >> {
> >>   int av = a[i];
> >>   int bv = b[i];
> >>   SIGNEDNESS_2 short mult = av * bv;
> >>   res += mult;
> >> }
> >>   return res;
> >> }
> >>
> >> we generate:
> >>
> >> movi    v5.16b, 0x7f
> >> mov     x3, 0
> >> movi    v4.16b, 0x1
> >> movi    v3.16b, 0xff80
> >> movi    v0.4s, 0
> >> .L2:
> >> ldr     q2, [x2, x3]
> >> ldr     q1, [x1, x3]
> >> add     x3, x3, 16
> >> sub     v2.16b, v2.16b, v3.16b
> >> sdot    v0.4s, v2.16b, v1.16b
> >> sdot    v0.4s, v5.16b, v1.16b
> >> sdot    v0.4s, v4.16b, v1.16b
> >> cmp     x3, 480
> >> bne     .L2
> >>
> >> instead of:
> >>
> >> movi    v0.4s, 0
> >> mov     x3, 0
> >> .L2:
> >> ldr     q2, [x1, x3]
> >> ldr     q1, [x2, x3]
> >> add     x3, x3, 16
> >> sxtl    v4.8h, v2.8b
> >> sxtl2   v3.8h, v2.16b
> >> uxtl    v2.8h, v1.8b
> >> uxtl2   v1.8h, v1.16b
> >> mul     v2.8h, v2.8h, v4.8h
> >> mul     v1.8h, v1.8h, v3.8h
> >> saddw   v0.4s, v0.4s, v2.4h
> >> saddw2  v0.4s, v0.4s, v2.8h
> >> saddw   v0.4s, v0.4s, v1.4h
> >> saddw2  v0.4s, v0.4s, v1.8h
> >> cmp     x3, 480
> >> bne     .L2
> >>
> >> The new sequence is significantly faster as the operations it uses
> >> are well optimized.  Note that execution tests are already in the mid-end
> testsuite.
> >>
> >> Thanks to James Greenhalgh for the tip-off.
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >>* config/aarch64/aarch64-simd.md (usdot_prod): Generate
> fallback
> >>or call original isns ...
> >>(usdot_prod_insn): ...here.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.target/aarch64/simd/vusdot-autovec-2.c: New test.
> >>
> >> --- inline copy of patch --
> >> diff --git a/gcc/config/aarch64/aarch64-simd.md
> >> b/gcc/config/aarch64/aarch64-simd.md
> >> index
> >>
> cf2f4badacc594df9ecf06de3f8ea570ef9e0ff2..235a6fa371e471816284e3383e8
> >> 564e9cf643a74 100644
> >> --- a/gcc/config/aarch64/aarch64-simd.md
> >> +++ b/gcc/config/aarch64/aarch64-simd.md
> >> @@ -623,7 +623,7 @@ (define_insn "dot_prod"
> >>
> >>  ;; These instructions map to the __builtins for the 

Re: [statistics.cc] Emit asm name of function with -fdump-statistics-asmname

2022-06-26 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 20 Jun 2022 at 12:52, Richard Biener  wrote:
>
> On Thu, Jun 16, 2022 at 5:05 PM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > Hi,
> > I just noticed -fdump-statistics supports asmname sub-option, which
> > according to the doc states:
> > "If DECL_ASSEMBLER_NAME has been set for a given decl, use that in the dump
> > instead of DECL_NAME. Its primary use is ease of use working backward from
> > mangled names in the assembly file."
> >
> > When passed -fdump-statistics-asmname, the dump however still contains the
> > original name of functions. The patch modifies statistics.cc to emit asm
> > name of function instead. Also for C++, it helps to better disambiguate
> > overloaded function names in the stats dump file.
> > I have attached stats dump for a simple test-case.
> >
> > Does it look OK ?
>
> decl_assembler_name has the side-effect of computing and setting it if it is
> not set already - I think that's unwanted.  You probably want to use
> it only if DECL_ASSEMBLER_NAME_SET_P which then means when it
> gets it later the dump will be split.
Hi Richard,
Sorry for late reply. In the attached patch, I checked for
DECL_ASSEMBLER_NAME_SET_P before calling
decl_assembler_name.
Does that look OK ?

Thanks,
Prathamesh
>
> Richard.
>
> >
> > Thanks,
> > Prathamesh
diff --git a/gcc/statistics.cc b/gcc/statistics.cc
index 0d596e34189..6c21415bf65 100644
--- a/gcc/statistics.cc
+++ b/gcc/statistics.cc
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "context.h"
 #include "pass_manager.h"
+#include "tree.h"
 
 static int statistics_dump_nr;
 static dump_flags_t statistics_dump_flags;
@@ -113,6 +114,22 @@ curr_statistics_hash (void)
   return statistics_hashes[idx];
 }
 
+/* Helper function to return asmname or name of FN
+   depending on whether asmname option is set.  */
+
+static const char *
+get_function_name (struct function *fn)
+{
+  if ((statistics_dump_flags & TDF_ASMNAME)
+  && DECL_ASSEMBLER_NAME_SET_P (fn->decl))
+{
+  tree asmname = decl_assembler_name (fn->decl);
+  if (asmname)
+   return IDENTIFIER_POINTER (asmname);
+}
+  return function_name (fn);
+}
+
 /* Helper for statistics_fini_pass.  Print the counter difference
since the last dump for the pass dump files.  */
 
@@ -152,7 +169,7 @@ statistics_fini_pass_2 (statistics_counter **slot,
 current_pass->static_pass_number,
 current_pass->name,
 counter->id, counter->val,
-current_function_name (),
+get_function_name (cfun),
 count);
   else
 fprintf (statistics_dump_file,
@@ -160,7 +177,7 @@ statistics_fini_pass_2 (statistics_counter **slot,
 current_pass->static_pass_number,
 current_pass->name,
 counter->id,
-current_function_name (),
+get_function_name (cfun),
 count);
   counter->prev_dumped_count = counter->count;
   return 1;
@@ -329,7 +346,7 @@ statistics_counter_event (struct function *fn, const char 
*id, int incr)
   current_pass ? current_pass->static_pass_number : -1,
   current_pass ? current_pass->name : "none",
   id,
-  function_name (fn),
+  get_function_name (fn),
   incr);
 }
 
@@ -359,5 +376,5 @@ statistics_histogram_event (struct function *fn, const char 
*id, int val)
   current_pass->static_pass_number,
   current_pass->name,
   id, val,
-  function_name (fn));
+  get_function_name (fn));
 }
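[A hypothetical Python rendering of the selection rule the patch adds — the
names below are illustrative, not GCC API: prefer the assembler name only
when the asmname dump flag is set AND the decl already has one computed, so
that dumping never forces decl_assembler_name's side effect.]

```python
# Sketch of get_function_name's decision, per Richard's review comment:
# use the mangled name only if it is already set; otherwise fall back,
# which means the dump for that function may be split across both names.
def dump_name(asmname_flag_set, asm_name, plain_name):
    if asmname_flag_set and asm_name is not None:
        return asm_name
    return plain_name

assert dump_name(True, "_Z3foov", "foo") == "_Z3foov"
assert dump_name(True, None, "foo") == "foo"    # not yet set: fall back
assert dump_name(False, "_Z3foov", "foo") == "foo"
```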


Re: [PATCH] xtensa: Optimize integer constant addition that is between -32896 and 32639

2022-06-26 Thread Max Filippov via Gcc-patches
On Sun, Jun 26, 2022 at 7:53 AM Takayuki 'January June' Suwa
 wrote:
>
> Such constants are often subject to the constant synthesis:
>
> int test(int a) {
>   return a - 31999;
> }
>
> test:
> movi    a3, 1
> addmi   a3, a3, -0x7d00
> add     a2, a2, a3
> ret
>
> This patch optimizes such case as follows:
>
> test:
> addi    a2, a2, 1
> addmi   a2, a2, -0x7d00
> ret
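[A sketch of the constant split behind the folded addi/addmi pair, assuming
the usual Xtensa immediate ranges: addi takes a value in [-128, 127] and
addmi a multiple of 256 in [-32768, 32512], which together cover exactly the
[-32896, 32639] range in the subject line. Not part of the patch.]

```python
# Split a constant into an addi immediate (lo) plus an addmi immediate (hi).
def split(c):
    lo = ((c + 128) % 256) - 128   # addi part, in [-128, 127]
    hi = c - lo                    # addmi part, a multiple of 256
    assert -128 <= lo <= 127 and hi % 256 == 0 and -32768 <= hi <= 32512
    return lo, hi

# Matches the example above: addi a2, a2, 1 ; addmi a2, a2, -0x7d00
assert split(-31999) == (1, -0x7D00)

# Every constant in the patch's range splits into one addi plus one addmi.
for c in range(-32896, 32640):
    lo, hi = split(c)
    assert lo + hi == c
```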
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md:
> Suppress unnecessary emitting nop insn in the split patterns for
> integer/FP constant synthesis, and add new peephole2 pattern that
> folds such synthesized additions.
> ---
>  gcc/config/xtensa/xtensa.md | 35 +++
>  1 file changed, 35 insertions(+)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH v3] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-06-26 Thread Kewen.Lin via Gcc-patches
Hi Segher!

on 2022/6/25 00:49, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jun 24, 2022 at 09:03:59AM +0800, Kewen.Lin wrote:
>> on 2022/6/24 03:06, Segher Boessenkool wrote:
>>> On Wed, May 18, 2022 at 10:07:48PM +0800, Kewen.Lin wrote:
 As PR103353 shows, we may want to continue to expand a MMA built-in
 function like a normal function, even if we have already emitted
 error messages about some missing required conditions.  As shown in
 that PR, without one explicit mov optab on OOmode provided, it would
 call emit_move_insn recursively.
>>>
>>> First off: lxvp is a VSX insn, not an MMA insn.  So please don't call it
>>> that -- this confusion is what presumably caused the problem here, so it
>>> would be good to root it out :-)
>>
>> I guess the "it" in "don't call it that" is for "MMA built-in function"?
>> It comes from the current code:
> 
> Your proposed commit message says "MMA built-in function".  It is not
> an MMA builtin, or rather, it should not be.
> 
 +  /* Opaque modes are only expected to be available when MMA is supported,
>>>
>>> Why do people expect that?  It is completely wrong.  The name "opaque"
>>> itself already says this is not just for MMA, but perhaps more
>>> importantly, it is a basic VSX insn, doesn't touch any MMA resources,
>>> and is useful in other contexts as well.
>>
>> ... The above statements are also based on current code, for now, the
>> related things like built-in functions, mov optab, hard_regno_ok etc.
>> for these two modes are guarded by TARGET_MMA.
> 
> Opaque modes are a generic thing, not an rs6000 thing.  It is important
> not to conflate completely different things that just happened to
> coincide some months ago (but not anymore right now even!)
> 
>> I think I get your points here, you want to separate these opaque
>> modes from MMA since the underlying lxvp/stxvp are not MMA specific,
>> so those related things (bifs, mov optabs etc.) are not necessarily
>> guarded under MMA.
> 
> Yup.  This can take some time of course, but in the mean time we should
> stop pretending the status quo is correct.
> 
>>> So this needs some bigger surgery.
>>
>> Yes, Peter may have more comments on this.
> 
> Yes.  Can you do a patch that just fixes this PR103353, without adding
> more misleading comments?  :-)
> 

Many thanks for all the further explanation above!  The attached patch
updated the misleading comments as you pointed out and suggested, could
you help to have another look?

BR,
Kewen

From ee49cd14b69aaa373b0aca71c4560944a0b43fbc Mon Sep 17 00:00:00 2001
From: "Kewen.Lin" 
Date: Mon, 27 Jun 2022 10:42:37 +0800
Subject: [PATCH] rs6000: Adjust mov optabs for opaque modes [PR103353]

As PR103353 shows, we may want to continue to expand built-in
function __builtin_vsx_lxvp, even if we have already emitted
error messages about some missing required conditions.  As
shown in that PR, without one explicit mov optab on OOmode
provided, it would call emit_move_insn recursively.

So this patch is to allow the mov pattern to be generated during
expanding phase if compiler has already seen errors.

PR target/103353

gcc/ChangeLog:

* config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition
check to preparation statements and add handlings for !TARGET_MMA.
(define_expand movxo): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr103353.c: New test.
---
 gcc/config/rs6000/mma.md| 35 +
 gcc/testsuite/gcc.target/powerpc/pr103353.c | 22 +
 2 files changed, 51 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103353.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index a183b6a168a..a9cf59d68b5 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -268,10 +268,23 @@ (define_int_attr avvi4i4i4
[(UNSPEC_MMA_PMXVI8GER4PP   "pmxvi8ger4pp")
 (define_expand "movoo"
   [(set (match_operand:OO 0 "nonimmediate_operand")
(match_operand:OO 1 "input_operand"))]
-  "TARGET_MMA"
+  ""
 {
-  rs6000_emit_move (operands[0], operands[1], OOmode);
-  DONE;
+  if (TARGET_MMA) {
+rs6000_emit_move (operands[0], operands[1], OOmode);
+DONE;
+  }
+  /* PR103353 shows we may want to continue to expand the __builtin_vsx_lxvp
+ built-in function, even if we have already emitted error messages about
+ some missing required conditions.  As shown in that PR, without one
+ explicit mov optab on OOmode provided, it would call emit_move_insn
+ recursively.  So we allow this pattern to be generated when we are
+ expanding to RTL and have seen errors.  It would not cause further ICEs
+ as the compilation would stop soon after expanding.  */
+  else if (currently_expanding_to_rtl && seen_error ())
+;
+  else
+gcc_unreachable ();
 })
 
 (define_insn_and_split "*movoo"
@@ -300,10 +313,20 @@ (define_insn_and_split "*movoo"
 (define_expand "movxo"
   

Re: [PATCH 5/5] Convert ranger and clients to vrange.

2022-06-26 Thread Xi Ruoyao via Gcc-patches
On Wed, 2022-06-01 at 11:04 +0200, Aldy Hernandez via Gcc-patches wrote:
> Final patch committed.
> 
> All users but one of Value_Range::set_type() have been removed in
> favor of using a constructor taking a type.   We still need to delay
> initialization for one use in gimple_infer_range, as it has an array
> of temporaries for which the type is not known until later.
> 
> Re-tested on x86-64 Linux.

Bootstrap on loongarch64-linux-gnu is broken since r13-911.  I've
created https://gcc.gnu.org/PR106096 and put all information I've
gathered so far in the PR.


[rs6000 PATCH] Improve constant integer multiply using rldimi.

2022-06-26 Thread Roger Sayle
 

This patch tweaks the code generated on POWER for integer multiplications
by a constant, by making use of rldimi instructions.  Much like x86's
lea instruction, rldimi can be used to implement a shift and add pair
in some circumstances.  For rldimi this is when the shifted operand
is known to have no bits in common with the added operand.

Hence for the new testcase below:

int foo(int x)
{
  int t = x & 42;
  return t * 0x2001;
}

when compiled with -O2, GCC currently generates:

andi. 3,3,0x2a
slwi 9,3,13
add 3,9,3
extsw 3,3
blr

with this patch, we now generate:

andi. 3,3,0x2a
rlwimi 3,3,13,0,31-13
extsw 3,3
blr

It turns out this optimization already exists in the form of a combine
splitter in rs6000.md, but the constraints on combine splitters,
requiring three of four input instructions (and generating one or two
output instructions), mean it doesn't get applied as often as it could.
This patch converts the define_split into a define_insn_and_split to
catch more cases (such as the one above).

The one bit that's tricky/controversial is the use of RTL's
nonzero_bits, which is accurate during the combine pass when this
pattern is first recognized, but not as advanced (not kept up to
date) when this pattern is eventually split.  To support this,
I've used a "|| reload_completed" idiom.  Does this approach seem
reasonable?  [I've another patch for x86 that uses the same idiom.]

This patch has been tested on powerpc64le-unknown-linux-gnu with
make bootstrap and make -k check with no new failures.
Ok for mainline?

2022-06-26  Roger Sayle  

gcc/ChangeLog
* config/rs6000/rs6000.md (*rotl3_insert_3b_): New
define_insn_and_split created from existing define_split.

gcc/testsuite/ChangeLog
* gcc.target/powerpc/rldimi-3.c: New test case.

Thanks in advance,
Roger
--

 

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 090dbcf..b8aada32 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4209,13 +4209,19 @@
 
 (define_code_iterator plus_ior_xor [plus ior xor])
 
-(define_split
-  [(set (match_operand:GPR 0 "gpc_reg_operand")
-   (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand")
- (match_operand:SI 2 "const_int_operand"))
- (match_operand:GPR 3 "gpc_reg_operand")))]
-  "nonzero_bits (operands[3], mode)
-   < HOST_WIDE_INT_1U << INTVAL (operands[2])"
+(define_insn_and_split "*rotl3_insert_3b_"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+   (plus_ior_xor:GPR
+ (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+ (match_operand:SI 2 "const_int_operand" "n"))
+ (match_operand:GPR 3 "gpc_reg_operand" "0")))]
+  "INTVAL (operands[2]) > 0
+   && INTVAL (operands[2]) < 64
+   && ((nonzero_bits (operands[3], mode)
+   < HOST_WIDE_INT_1U << INTVAL (operands[2]))
+   || reload_completed)"
+  "#"
+  "&& 1"
   [(set (match_dup 0)
(ior:GPR (and:GPR (match_dup 3)
  (match_dup 4))
diff --git a/gcc/testsuite/gcc.target/powerpc/rldimi-3.c 
b/gcc/testsuite/gcc.target/powerpc/rldimi-3.c
new file mode 100644
index 000..80ff86e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rldimi-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2" } */
+
+int foo(int x)
+{
+  int t = x & 42;
+  return t * 0x2001;
+}
+/* { dg-final { scan-assembler {\mrldimi\M} } } */
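[The rewrite above relies on t and t << 13 having no overlapping bits,
which is what nonzero_bits establishes; a small sketch checking that
premise for the testcase's t = x & 42 (not part of the patch):]

```python
# With t = x & 42, nonzero_bits(t) lies below bit 6, so t << 13 and t
# never overlap.  Hence t * 0x2001 == (t << 13) + t == (t << 13) | t,
# which is exactly what a single rlwimi/rldimi (insert t at bit 13 into
# a copy of t) computes.
for x in range(-(1 << 15), 1 << 15):
    t = x & 42
    assert t * 0x2001 == (t << 13) + t == (t << 13) | t
```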


Re: [PATCH] Enhance _Hashtable for range insertion 0/5

2022-06-26 Thread François Dumont via Gcc-patches
I knew you were going to ask for it, but I was too impatient to wait
any longer before proposing those patches.


Attached you'll find what I have started to work on.  But I am quite
disappointed by the results.  At least they show that the results are
no worse.


To be honest, I was also hoping for some feedback from potential users
interested in testing those patches.  And maybe there are some
well-known (and free) benchmarks that I could run against?


François

On 21/06/22 19:12, Jonathan Wakely wrote:

On Mon, 20 Jun 2022 at 17:58, François Dumont via Libstdc++
 wrote:

Hi

Here is a series of patch to enhance _Hashtable behavior mostly in the
context of range insertion. I also start considering the problem of
memory fragmentation in this container with 2 objectives:

- It is easier to find out when you're done with the elements of a
bucket if the last node of the bucket N is the before-begin node of
bucket N + 1.

- It is faster to loop through the nodes of a bucket if those nodes are
close in memory; ultimately we should have addressof(Node + 1) ==
addressof(Node) + 1

Have these changes been profiled or benchmarked? Is it measurably
faster? By how much?



[1/5] Make more use of user hints as both insertion and allocation hints.

[2/5] Introduce a new method to check if we are still looping through
the same bucket's nodes

[3/5] Consider that all initializer_list elements are going to be inserted

[4/5] Introduce a before-begin cache policy to remember which bucket is
currently pointing on it

[5/5] Prealloc nodes on _Hashtable copy and introduce a new assignment
method which replicate buckets data structure

François

diff --git a/libstdc++-v3/testsuite/performance/23_containers/insert/54075.cc b/libstdc++-v3/testsuite/performance/23_containers/insert/54075.cc
index 8aa0cd20193..bb5778a257c 100644
--- a/libstdc++-v3/testsuite/performance/23_containers/insert/54075.cc
+++ b/libstdc++-v3/testsuite/performance/23_containers/insert/54075.cc
@@ -17,12 +17,14 @@
 
 // { dg-do run { target c++11 } }
 
-#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 
+#include 
+
 #define USE_MY_FOO 1
 
 struct Foo
@@ -71,10 +73,13 @@ struct HashFunction
 };
 
 const int sz = 30;
+const int usz = sz / 2;
 
 template
   void
-  bench(const char* container_desc, const typename _ContType::value_type* foos)
+  bench(const char* container_desc,
+	const typename _ContType::value_type* foos,
+	const typename _ContType::value_type* ufoos)
   {
 using namespace __gnu_test;
 
@@ -106,6 +111,51 @@ template
 ostr << container_desc << nb_loop << " times insertion of "
 	 << sz << " elements";
 report_performance(__FILE__, ostr.str().c_str(), time, resource);
+
+// Try to lookup for mostly unknown entries.
+start_counters(time, resource);
+
+int fcount = 0;
+for (int j = 0; j != nb_loop; ++j)
+  for (int i = 0; i != usz; ++i)
+	fcount += s.find(ufoos[i]) != s.end() ? 1 : 0;
+
+stop_counters(time, resource);
+ostr.str("");
+ostr << container_desc << nb_loop << " times lookup of "
+	 << usz << " elements " << fcount / nb_loop << " found";
+report_performance(__FILE__, ostr.str().c_str(), time, resource);
+
+// Try again the previous operations but on a copy with potentially
+// less memory fragmentation.
+_ContType scopy(s);
+
+// Try to insert again to check performance of collision detection
+start_counters(time, resource);
+
+for (int j = 0; j != nb_loop; ++j)
+  for (int i = 0; i != sz; ++i)
+	scopy.insert(foos[i]);
+
+stop_counters(time, resource);
+ostr.str("");
+ostr << container_desc << nb_loop << " times insertion of "
+	 << sz << " elements in copy";
+report_performance(__FILE__, ostr.str().c_str(), time, resource);
+
+// Try to lookup for mostly unknown entries.
+start_counters(time, resource);
+
+fcount = 0;
+for (int j = 0; j != nb_loop; ++j)
+  for (int i = 0; i != usz; ++i)
+	fcount += scopy.find(ufoos[i]) != scopy.end() ? 1 : 0;
+
+stop_counters(time, resource);
+ostr.str("");
+ostr << container_desc << nb_loop << " times lookup of "
+	 << usz << " elements " << fcount / nb_loop << " found";
+report_performance(__FILE__, ostr.str().c_str(), time, resource);
   }
 
 template
@@ -155,67 +205,78 @@ int main()
 
   {
 int bars[sz];
+int ubars[usz];
 for (int i = 0; i != sz; ++i)
   bars[i] = i;
+for (int i = 0; i != usz; ++i)
+  ubars[i] = sz + i;
 bench>(
-	"std::tr1::unordered_set ", bars);
+  "std::tr1::unordered_set ", bars, ubars);
 bench>(
-	"std::unordered_set ", bars);
+  "std::unordered_set ", bars, ubars);
   }
 
-  Foo foos[sz];
-#if USE_MY_FOO
   {
-std::random_device randev;
-for (int i = 0; i != sz; ++i)
-  foos[i].init(randev);
-  }
+Foo foos[sz];
+Foo ufoos[usz];
+#if USE_MY_FOO
+{
+  std::random_device randev;
+  for (int i = 0; i != sz; ++i)
+	foos[i].init(randev);
+  for (int i = 0; i != usz; ++i)
+	

Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-26 Thread Jeff Law via Gcc-patches




On 6/21/2022 6:34 PM, Tamar Christina via Gcc-patches wrote:

-Original Message-
From: Tamar Christina
Sent: Tuesday, June 14, 2022 4:58 PM
To: Richard Sandiford ; Richard Biener

Cc: gcc-patches@gcc.gnu.org; nd 
Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
bitmask




-Original Message-
From: Richard Sandiford 
Sent: Tuesday, June 14, 2022 2:43 PM
To: Richard Biener 
Cc: Tamar Christina ;
gcc-patches@gcc.gnu.org; nd 
Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
bitmask

Richard Biener  writes:

On Mon, 13 Jun 2022, Tamar Christina wrote:


-Original Message-
From: Richard Biener 
Sent: Monday, June 13, 2022 12:48 PM
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford

Subject: RE: [PATCH 1/2]middle-end Support optimized division by
pow2 bitmask

On Mon, 13 Jun 2022, Tamar Christina wrote:


-Original Message-
From: Richard Biener 
Sent: Monday, June 13, 2022 10:39 AM
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org; nd ; Richard
Sandiford 
Subject: Re: [PATCH 1/2]middle-end Support optimized division
by
pow2 bitmask

On Mon, 13 Jun 2022, Richard Biener wrote:


On Thu, 9 Jun 2022, Tamar Christina wrote:


Hi All,

In plenty of image and video processing code it's common
to modify pixel values by a widening operation and then
scale them back into range

by dividing by 255.

This patch adds an optab to allow us to emit an optimized
sequence when doing an unsigned division that is equivalent to:

x = y / (2 ^ (bitsize (y)/2) - 1)

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no issues.

Ok for master?

Looking at 2/2 it seems that this is the wrong way to
attack the problem.  The ISA doesn't have such an instruction
so adding an optab looks premature.  I suppose that there's
no unsigned vector integer division and thus we open-code
that in a different way?

Isn't the correct thing then to fixup that open-coding if
it is more efficient?

The problem is that even if you fixup the open-coding it would
need to be something target specific? The sequence of
instructions we generate don't have a GIMPLE representation.
So whatever is generated I'd have to fixup in RTL then.

What's the operation that doesn't have a GIMPLE representation?

For NEON we use two operations:
1. Add high narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2,
   where the + widens and the >> narrows.  So you give it two shorts,
   get a byte.
2. Add widening add of lowpart, so basically lowpart (a +w b).

For SVE2 we use a different sequence, two back-to-back sequences of:
1. Add narrow high part (bottom).  In SVE the Top and Bottom
   instructions select even and odd elements of the vector rather than
   "top half" and "bottom half".  So this instruction does: add each
   vector element of the first source vector to the corresponding
   vector element of the second source vector, and place the most
   significant half of the result in the even-numbered half-width
   destination elements, while setting the odd-numbered elements to
   zero.

So there's an explicit permute in there.  The instructions are
sufficiently different that there wouldn't be a single GIMPLE
representation.

I see.  Are these also useful to express scalar integer division?

I'll defer to others to ack the special udiv_pow2_bitmask optab or
suggest some piecemeal things other targets might be able to do as
well.  It does look very special.  I'd also bikeshed it to
udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
interpreted 'bitmask' correctly ;)).  It seems to be even less
general since it is a unary op and the actual divisor is
constrained by the mode itself?

Yeah, those were my concerns as well.  For n-bit numbers, the same
kind of arithmetic transformation can be used for any 2^m-1 for m in
[n/2, n), so from a target-independent point of view, m==n/2 isn't
particularly special.

Hard-coding one value of m would make sense if there was an underlying
instruction that did exactly this, but like you say, there isn't.

Would a compromise be to define an optab for ADDHN and then add a
vector pattern for this division that (at least initially) prefers
ADDHN over the current approach whenever ADDHN is available?  We
could then adapt the conditions on the pattern if other targets also
provide ADDHN but don't want this transform.  (I think the other
instructions in the pattern already have optabs.)

That still leaves open the question about what to do about SVE2, but
the underlying problem there is that the vectoriser doesn't know about
the B/T layout.

Wouldn't it be better to just generalize the optab and to pass on the mask?
I'd prefer to do that than teach the vectorizer about ADDHN (which can't be
easily done now) let alone teaching it about B/T.   It also seems somewhat
unnecessary to diverge the implementation here in the mid-end. After all,
you can generate better SSE code here as 

Re: [statistics.cc] Emit asm name of function with -fdump-statistics-asmname

2022-06-26 Thread Jeff Law via Gcc-patches




On 6/16/2022 9:04 AM, Prathamesh Kulkarni via Gcc-patches wrote:

Hi,
I just noticed -fdump-statistics supports an asmname sub-option, which
the doc describes as:
"If DECL_ASSEMBLER_NAME has been set for a given decl, use that in the dump
instead of DECL_NAME. Its primary use is ease of use working backward from
mangled names in the assembly file."

When passed -fdump-statistics-asmname, the dump however still contains the
original name of functions. The patch modifies statistics.cc to emit asm
name of function instead. Also for C++, it helps to better disambiguate
overloaded function names in the stats dump file.
I have attached stats dump for a simple test-case.

Does it look OK ?

Yes.  This is fine for the trunk.
jeff



Re: [PATCH] expr.cc: Optimize if char array initialization consists of all zeros

2022-06-26 Thread Jeff Law via Gcc-patches




On 5/30/2022 9:35 PM, Takayuki 'January June' Suwa via Gcc-patches wrote:

Hi all,

In some targets, initialization code for char array may be split into two
parts even if the initialization consists of all zeros:

/* example */
extern void foo(char*);
void test(void) {
  char a[256] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  foo(a);
}

;; Xtensa (xtensa-lx106)
.LC0:
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.zero   246
test:
movi    a9, 0x110
sub    sp, sp, a9
l32r    a3, .LC1
movi.n    a4, 0xa
mov.n    a2, sp
s32i    a0, sp, 268
call0    memcpy
movi    a4, 0xf6
movi.n    a3, 0
addi.n    a2, sp, 10
call0    memset
mov.n    a2, sp
call0    foo
l32i    a0, sp, 268
movi    a9, 0x110
add.n    sp, sp, a9
ret.n

;; H8/300 (-mh -mint32)
.LC0:
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.zero   246
_test:
sub.l    #256,er7
sub.l    er2,er2
add.b    #10,r2l
mov.l    #.LC0,er1
mov.l    er7,er0
jsr    @_memcpy
sub.l    er2,er2
add.b    #246,r2l
sub.l    er1,er1
sub.l    er0,er0
add.b    #10,r0l
add.l    er7,er0
jsr    @_memset
mov.l    er7,er0
jsr    @_foo
add.l    #256,er7
rts

i386 target (both 32 and 64bit) does not show such behavior.

    gcc/ChangeLog:

* expr.cc (store_expr): Add check if the initialization content
consists of all zeros.
It's not entirely clear what you're trying to accomplish.  Is it the 
.zero which allocates space in the .rodata you're trying to remove? If 
so, it looks like that is already addressed on the trunk to me (I 
checked H8 with and without optimization).


jeff



Re: [PATCH] testsuite: constrain some fp tests to hard_float

2022-06-26 Thread Jeff Law via Gcc-patches




On 5/29/2022 9:53 PM, Vineet Gupta wrote:

These tests validate fp conversions with various rounding modes which
would not work on soft-float ABIs.

On -march=rv64imac/-mabi=lp64 this reduces 5 unique failures (overall 35
due to multi flag combination builds)

gcc/testsuite/Changelog:
* gcc.dg/torture/fp-double-convert-float-1.c: Add
dg-require-effective-target hard_float.
* gcc.dg/torture/fp-int-convert-timode-3.c: Ditto.
* gcc.dg/torture/fp-int-convert-timode-4.c: Ditto.
* gcc.dg/torture/fp-uint64-convert-double-1.c: Ditto.
* gcc.dg/torture/fp-uint64-convert-double-2.c: Ditto.

Thanks.  I've pushed this to the trunk.

jeff



Re: [PATCH] Strip of a vector load which is only used partially.

2022-06-26 Thread Jeff Law via Gcc-patches




On 5/10/2022 12:30 AM, Richard Biener wrote:

On Tue, May 10, 2022 at 12:58 AM Jeff Law via Gcc-patches
 wrote:



On 5/5/2022 2:26 AM, Richard Biener via Gcc-patches wrote:

On Thu, May 5, 2022 at 7:04 AM liuhongt  wrote:

Optimize

_1 = *srcp_3(D);
_4 = VEC_PERM_EXPR <_1, _1, { 4, 5, 6, 7, 4, 5, 6, 7 }>;
_5 = BIT_FIELD_REF <_4, 128, 0>;

to

_1 = *srcp_3(D);
_5 = BIT_FIELD_REF <_1, 128, 128>;

the upper will finally be optimized to

_5 = BIT_FIELD_REF <*srcp_3(D), 128, 128>;

Bootstrapped and regtested on x86_64-pc-linux-gnu{m32,}.
Ok for trunk?

Hmm, tree-ssa-forwprop.cc:simplify_bitfield_ref should already
handle this in the

if (code == VEC_PERM_EXPR
&& constant_multiple_p (bit_field_offset (op), size, ))
  {

part of the code - maybe that needs to be enhanced to cover
a contiguous stride in the VEC_PERM_EXPR.  I see
we have

size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
if (maybe_ne (bit_field_size (op), size))
  return false;

where it will currently bail, so adjust that to check for a
constant multiple.  I also think we should only handle the
case where the new bit_field_offset alignment is not
worse than the original one.

That said, I'd prefer if you integrate this transform with
simplify_bitfield_ref.

I've got a hack here that tries to do something similar, but it's trying
to catch the case where we CONSTRUCTOR feeds the BIT_FIELD_REF.  It
walks the CONSTRUCTOR elements to see if an element has the right
offset/size to satisfy the BIT_FIELD_REF. For x264 we're often able to
eliminate the VEC_PERMUTE entirely and just forward operands into the
BIT_FIELD_REF.

I was leaning towards moving those bits into match.pd before submitting,
but if you'd prefer them in tree-ssa-forwprop, that's even easier.

I think when deciding where to put things it's important to look where related
transforms reside.  We already do have a (simplify (BIT_FIELD_REF
CONSTRUCTOR@ ...))
pattern which should also handle your case already.  So instead of
adding something
new it would be nice to figure why it doesn't handle the case you are
interested in and
eventually just adjust the existing pattern.
I'm aware of that pattern.  I've found it painfully inadequate in every 
case I've looked at.    In general I've found tree-ssa-forwprop is a 
reasonable place to prototype a lot of stuff to see how it works in 
practice, but I think match.pd is better for most of the transformations 
in the long term.


It sounds like you'd prefer this particular case to move into match.pd.  
Fine.  That's what I'd originally planned to do.  It's pretty simple 
support code, so doing it in match.pd shouldn't be too hard.


Jeff


Re: [PATCH] [PR105455] predict: Check for no REG_BR_PROB in uninitialized case

2022-06-26 Thread Jeff Law via Gcc-patches




On 5/11/2022 7:48 PM, Alexandre Oliva via Gcc-patches wrote:

There is an assumption in force_edge_cold that, if any edge out of the
same src block has uninitialized probability, then a conditional
branch out of src won't have REG_BR_PROB set.

This assumption is supposed to hold, but buggy gimple passes may turn
unconditional edges into conditional ones, adding edges with
uninitialized probability out of blocks that retain originally
unconditional edges with precise always probability.  Expand may then
copy the formerly-unconditional edge's probability to REG_BR_PROB, and
if that edge ends up forced cold, the probability in the edge will be
modified without adjusting the note, and rtl_verify_edges complains
about that.

This patch adds checking that REG_BR_PROB is absent to the path taken
by force_cold_edge for uninitialized probabilities, so that the
problem is caught earlier and fixed sooner.

I'm not sure it buys us much, but...  Regstrapped on x86_64-linux-gnu.
Ok to install?


for  gcc/ChangeLog

* predict.cc (force_edge_cold): Check for no REG_BR_PROB in
the uninitialized probability case.
Should that be a runtime test (flag_checking) rather than a 
compile/configure time test (#if CHECKING_P)?  I think we generally 
prefer the former these days.   If you strongly think it should be a #if 
CHECKING_P, that's fine.  I just want you to ponder if the runtime test 
is more appropriate or not and change if you think it's warranted.


OK either way.

Jeff




Re: [PATCH] fortran, libgfortran, v2: Avoid using libquadmath for glibc 2.26+

2022-06-26 Thread Mikael Morin

Hello,

I don’t like the _Float128 vs __float128 business, it’s confusing.
And according to https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html 
they seem to be basically the same thing, so it’s also redundant.

Is there a reason why the standard one, _Float128, can’t be used everywhere?
Mikael


Re: Fix another Rust demangling bug (PR 105039)

2022-06-26 Thread Jeff Law via Gcc-patches




On 3/24/2022 7:12 AM, Nick Clifton via Gcc-patches wrote:

Hi Guys,

  Attached is a proposed patch to fix another Rust demangling bug
  reported in PR 105039.  I would like to say that this is the
  last time that we will see this problem for the Rust demangler,
  but I am not that naive...

  OK to apply ?

OK.

There are days when I wonder if restructuring the demanglers to use 
loops or gotos to avoid the recursion would be a step forward.  But I'm 
not convinced the same kinds of inputs wouldn't just hang rather than 
eating stack space until the limit is exhausted.  So it wouldn't be a 
significant QOI improvement.


jeff


Re: [x86 PATCH] Use xchg for DImode double word rotate by 32 bits with -m32.

2022-06-26 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 26, 2022 at 5:54 PM Roger Sayle  wrote:
>
>
> This patch was motivated by the investigation of Linus Torvalds' spill
> heavy cryptography kernels in PR 105930.  The di3 expander
> handles all rotations by an immediate constant for 1..63 bits with the
> exception of 32 bits, which FAILs and is then split by the middle-end.
> This patch makes these 32-bit doubleword rotations consistent with the
> other DImode rotations during reload, which results in reduced register
> pressure, fewer instructions and the use of x86's xchg instruction
> when appropriate.  In theory, xchg can be handled by register renaming,
> but even on micro-architectures where it's implemented by 3 uops (no
> worse than a three instruction shuffle), avoiding nominating a
> "temporary" register, reduces user-visible register pressure (and
> has obvious code size benefits).
>
> The effects are best shown with the new testcase:
>
> unsigned long long bar();
> unsigned long long foo()
> {
>   unsigned long long x = bar();
>   return (x>>32) | (x<<32);
> }
>
> for which GCC with -m32 -O2 currently generates:
>
> subl$12, %esp
> callbar
> addl$12, %esp
> movl%eax, %ecx
> movl%edx, %eax
> movl%ecx, %edx
> ret
>
> but with this patch now generates:
>
> subl$12, %esp
> callbar
> addl$12, %esp
> xchgl   %edx, %eax
> ret
>
> With this patch, the number of lines of assembly language generated
> for the blake2b kernel (from the attachment to PR105930) decreases
> from 5626 to 5404. Although there's an impressive reduction in
> instruction count, there's no change/reduction in stack frame size.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-06-26  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (swap_mode): Rename from *swap to
> provide gen_swapsi.
> (di3): Handle !TARGET_64BIT rotations via new
> gen_ix86_32di2_doubleword below.
> (ix86_32di2_doubleword): New define_insn_and_split
> that splits after reload as either a pair of move instructions
> or an xchgl (using gen_swapsi).
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/xchg-3.c: New test case.

+(define_insn_and_split "ix86_32di2_doubleword"

We don't encode the target in the insn name - 32di2_doubleword
should be OK.

+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+   (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "r,o")
+  (const_int 32)))]

Please use "=r,r,r"/"0,r,o" constraints here.

Uros.

>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH 2/2] tree-optimization/104530 - Mark defs dependent on non-null stale.

2022-06-26 Thread Jeff Law via Gcc-patches




On 2/22/2022 9:40 AM, Andrew MacLeod via Gcc-patches wrote:
This patch simply leverages the existing computation machinery to 
re-evaluate values dependent on a newly found non-null value.


Ranger associates a monotonically increasing temporal value with every 
def as it is defined.  When that value is used, we check if any of the 
values used in the definition have been updated, making the current 
cached global value stale.  This makes the evaluation lazy, if there 
are no more uses, we will never re-evaluate.


When an ssa-name is marked non-null it does not change the global 
value, and thus will not invalidate any global values.  This patch 
marks any definitions in the block which are dependent on the non-null 
value as stale.  This will cause them to be re-evaluated when they are 
next used.


Imports: b.0_1  d.3_7
Exports: b.0_1  _2  _3  d.3_7  _8
 _2 : b.0_1(I)
 _3 : b.0_1(I)  _2
 _8 : b.0_1(I)  _2  _3  d.3_7(I)

   b.0_1 = b;
    _2 = b.0_1 == 0B;
    _3 = (int) _2;
    c = _3;
    _5 = *b.0_1;    <<-- from this point b.0_1 is [+1, +INF]
    a = _5;
    d.3_7 = d;
    _8 = _3 % d.3_7;
    if (_8 != 0)

when _5 is defined, and b.0_1 becomes non-null, we mark the dependent 
names that are exports and defined in this block as stale: _2, _3 
and _8.


When _8 is being calculated, _3 is stale, which causes it to be 
recomputed.  It is dependent on _2, also stale, so it is also 
recomputed, and we end up with


  _2 == [0, 0]
  _3 == [0 ,0]
and _8 = [0, 0]
And then we can fold away the condition.

The side effect is that _2 and _3 are globally changed to be [0, 0], 
but this is OK because it is the definition block, so it dominates all 
other uses of these names, and they should be [0,0] upon exit anyway.  
The previous patch ensures that the global values written to 
SSA_NAME_RANGE_INFO is the correct [0,1] for both _2 and _3.


The patch would have been even smaller if I already had a mark_stale 
method.   I thought there was one, but I guess it never made it in 
from lack of need at the time.   The only other tweak was to make the 
value stale if the dependent value was the same as the definitions.


This bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running 
to ensure.


OK for trunk? or defer to stage 1?
Seems reasonable now that we're in stage1.  Obviously given the time 
between original posting and now you should probably bootstrap and 
regression test it again.


Jeff


Re: [PATCH] libcpp: Update cpp_wcwidth() to Unicode 14.0.0

2022-06-26 Thread Lewis Hyatt via Gcc-patches
On Fri, Jun 24, 2022 at 3:26 PM Joseph Myers  wrote:
>
> On Fri, 24 Jun 2022, David Malcolm via Gcc-patches wrote:
>
> > > BTW, is this something simple enough I should just commit it without
> > > bugging
> > > the list for approval?
> >
> > The patch seems reasonable to me, but Joseph seems to be the expert on
> > i18n-related matters.
> >
> > Joseph, do we have a policy on this?
>
> I don't think we have a policy on Unicode updates specifically, but the
> general principle for updating files maintained outside GCC and copied
> verbatim into the GCC sources doesn't require prior approval.
>
> (Note that Unicode data is also used to generate ucnid.h - data for
> Unicode characters in identifiers - we should probably also include
> DerivedNormalizationProps.txt and DerivedCoreProperties.txt in the
> checked-in Unicode data for that purpose.)

Thank you both for the feedback. I have pushed the change for wcwidth,
and then I will follow up with a patch that adds these other two
Unicode data files and updates contrib/unicode/README so that the
procedure there will update both wcwidth and ucnid.h... and the patch
will follow that procedure to update ucnid.h from Unicode 13 to
Unicode 14.

-Lewis


Re: [x86 PATCH] PR rtl-optimization/96692: ((A|B)^C)^A using andn with -mbmi.

2022-06-26 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 26, 2022 at 2:04 PM Roger Sayle  wrote:
>
>
> This patch addresses PR rtl-optimization/96692 on x86_64, by providing
> a define_split for combine to convert the three operation ((A|B)^C)^D
> into a two operation sequence using andn when either A or B is the same
> register as C or D.  This is essentially a reassociation problem that's
> only a win if the target supports an and-not instruction (as with -mbmi).
>
> Hence for the new test case:
>
> int f(int a, int b, int c)
> {
> return (a ^ b) ^ (a | c);
> }
>
> GCC on x86_64-pc-linux-gnu with -O2 -mbmi would previously generate:
>
> xorl%edi, %esi
> orl %edx, %edi
> movl%esi, %eax
> xorl%edi, %eax
> ret
>
> but with this patch now generates:
>
> andn%edx, %edi, %eax
> xorl%esi, %eax
> ret
>
> I'll investigate whether this optimization can also be implemented
> more generically in simplify_rtx when the backend provides
> accurate rtx_costs for "(and (not ..." (as there's no optab for
> andn).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-06-26  Roger Sayle  
>
> gcc/ChangeLog
> PR rtl-optimization/96692
> * config/i386/i386.md (define_split): Split ((A | B) ^ C) ^ D
> as (X & ~Y) ^ Z on target BMI when either C or D is A or B.
>
> gcc/testsuite/ChangeLog
> PR rtl-optimization/96692
> * gcc.target/i386/bmi-andn-4.c: New test case.

+  "TARGET_BMI
+   && ix86_pre_reload_split ()
+   && (rtx_equal_p (operands[1], operands[3])
+   || rtx_equal_p (operands[1], operands[4])
+   || (REG_P (operands[2])
+   && (rtx_equal_p (operands[2], operands[3])
+   || rtx_equal_p (operands[2], operands[4]"

You don't need a ix86_pre_reload_split for combine splitter*

OTOH, please split the pattern to two for each commutative operand and
use (match_dup x) instead. Something similar to [1].

*combine splitter is described in the documentation as the splitter
pattern that does *not* match any existing insn pattern.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596804.html

Uros.


Re: [x86_64 PATCH] Implement __imag__ of float _Complex using shufps.

2022-06-26 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 26, 2022 at 1:12 PM Roger Sayle  wrote:
>
>
> This patch is a follow-up improvement to my recent patch for
> PR rtl-optimization/7061.  That patch added the test case
> gcc.target/i386/pr7061-2.c:
>
> float im(float _Complex a) { return __imag__ a; }
>
> For which GCC on x86_64 currently generates:
>
> movq%xmm0, %rax
> shrq$32, %rax
> movd%eax, %xmm0
> ret
>
> but with this patch we now generate (the same as LLVM):
>
> shufps  $85, %xmm0, %xmm0
> ret
>
> This is achieved by providing a define_insn_and_split that allows
> truncated lshiftrt:DI by 32 to be performed on either SSE or general
> regs, where if the register allocator prefers to use SSE, we split
> to a shufps_v4si, or if not, we use a regular shrq.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, with no new failures.  Ok for mainline?
>
>
> 2022-06-26  Roger Sayle  
>
> gcc/ChangeLog
> PR rtl-optimization/7061
> * config/i386/i386.md (*highpartdisi2): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog
> PR rtl-optimization/7061
> * gcc.target/i386/pr7061-2.c: Update to look for shufps.

OK.

Thanks,
Uros.

>
>
> Roger
> --
>


Re: [PATCH] testsuite: pthread: call sched_yield for non-preemptive targets

2022-06-26 Thread Jeff Law via Gcc-patches




On 6/20/2022 11:51 PM, Alexandre Oliva via Gcc-patches wrote:

Systems without preemptive multi-threading require sched_yield calls
to be placed at points in which a context switch might be needed to
enable the test to complete.

Regstrapped on x86_64-linux-gnu, also tested with a cross to
aarch64-rtems6.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/atomic/c11-atomic-exec-4.c: Call sched_yield.
* gcc.dg/atomic/c11-atomic-exec-5.c: Likewise.
* gcc.dg/atomic/pr80640-2.c: Likewise.
* gcc.dg/atomic/pr80640.c: Likewise.
* gcc.dg/atomic/pr81316.c: Likewise.
* gcc.dg/di-sync-multithread.c: Likewise.

OK.

I was hoping this would improve things internally as I know we have 
problems with the c11-atomic-exec-{4,5} and pr80640 testcases with one 
of our simulators that doesn't handle threads well.  But it doesn't seem 
to have helped.  Too bad, it would have been nice to have those tests 
flip to pass consistently.


Jeff



Re: [PATCH] PR tree-optimization/94026: Simplify (X>>8)&6 != 0 as X&1536 != 0.

2022-06-26 Thread Jeff Law via Gcc-patches




On 6/24/2022 9:09 AM, Roger Sayle wrote:

This patch implements the missed optimization described in PR 94026,
where a the shift can be eliminated from the sequence of a shift,
followed by a bit-wise AND followed by an equality/inequality test.
Specifically, ((X << C1) & C2) cmp C3 into (X & (C2 >> C1)) cmp (C3 >> C1)
and likewise ((X >> C1) & C2) cmp C3 into (X & (C2 << C1)) cmp (C3 << C1)
where cmp is == or !=, and C1, C2 and C3 are integer constants.
The example in the subject line is taken from the hot function
self_atari from the Go program Leela (in SPEC CPU 2017).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}, with
no new failures, OK for mainline?


2022-06-24  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/94026
* match.pd (((X << C1) & C2) eq/ne C3): New simplification.
(((X >> C1) & C2) eq/ne C3): Likewise.

gcc/testsuite/ChangeLog
PR tree-optimization/94026
* gcc.dg/pr94026.c: New test case.

OK.  But please check if we still need this code from fold-const.c:

  /* Fold ((X >> C1) & C2) == 0 and ((X >> C1) & C2) != 0 where
 C1 is a valid shift constant, and C2 is a power of two, i.e.
 a single bit.  */
  if (TREE_CODE (arg0) == BIT_AND_EXPR
  && integer_pow2p (TREE_OPERAND (arg0, 1))
  && integer_zerop (arg1))
[ ... ]

There's a whole series of transformations that are done for equality 
comparisons where one side is a constant and the other is combination of 
logicals & shifting.  Some (like the one noted above) are likely 
redundant now.  Others may fit better into the match.pd framework rather 
than fold-const.


Anyway, the patch is fine, but please take a looksie at the referenced 
cases in fold-const.c and see if there's any cleanup/refactoring we 
ought to be doing there.


jeff


Re: [PATCH] testsuite: Adjust btf-bitfields-1.c for default_packed

2022-06-26 Thread Jeff Law via Gcc-patches




On 6/23/2022 3:21 PM, Dimitar Dimitrov wrote:

If target packs structures by default, the bitfield offset which the
tests validates must be adjusted to not include padding.

Ok for trunk?

gcc/testsuite/ChangeLog:

* gcc.dg/debug/btf/btf-bitfields-1.c: Adjust the checked offsets
for targets which pack structures by default.

OK
jeff



Re: Pushed patch to convert DOM from EVRP to Ranger

2022-06-26 Thread Jeff Law via Gcc-patches




On 6/26/2022 9:38 AM, Aldy Hernandez wrote:

Thanks for pushing this.

The patch triggered a (known) regression on
g++.dg/warn/Wstringop-overflow-4.C.  In the original submission I
mentioned I would XFAIL it, but forgot to do so.  I have pushed the
attached patch.

We both forgot about this :-)



Note that since this was the last user of EVRP, I think it is now safe
to remove its code, along with any options on params.def.  Andrew, are
you OK with removing the legacy evrp code (gimple-ssa-evrp-analyze.*,
and any relevant bits)?  Of course, the core VRP code would still
remain, as VRP1 still uses it.

Sounds good to me.
jeff




[x86 PATCH] Use xchg for DImode double word rotate by 32 bits with -m32.

2022-06-26 Thread Roger Sayle

This patch was motivated by the investigation of Linus Torvalds' spill
heavy cryptography kernels in PR 105930.  The di3 expander
handles all rotations by an immediate constant for 1..63 bits with the
exception of 32 bits, which FAILs and is then split by the middle-end.
This patch makes these 32-bit doubleword rotations consistent with the
other DImode rotations during reload, which results in reduced register
pressure, fewer instructions and the use of x86's xchg instruction
when appropriate.  In theory, xchg can be handled by register renaming,
but even on micro-architectures where it's implemented by 3 uops (no
worse than a three instruction shuffle), avoiding nominating a
"temporary" register, reduces user-visible register pressure (and
has obvious code size benefits).

The effects are best shown with the new testcase:

unsigned long long bar();
unsigned long long foo()
{
  unsigned long long x = bar();
  return (x>>32) | (x<<32);
}

for which GCC with -m32 -O2 currently generates:

subl$12, %esp
callbar
addl$12, %esp
movl%eax, %ecx
movl%edx, %eax
movl%ecx, %edx
ret

but with this patch now generates:

subl$12, %esp
callbar
addl$12, %esp
xchgl   %edx, %eax
ret

With this patch, the number of lines of assembly language generated
for the blake2b kernel (from the attachment to PR105930) decreases
from 5626 to 5404. Although there's an impressive reduction in
instruction count, there's no change/reduction in stack frame size.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-06-26  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (swap_mode): Rename from *swap to
provide gen_swapsi.
(di3): Handle !TARGET_64BIT rotations via new
gen_ix86_32di2_doubleword below.
(ix86_32di2_doubleword): New define_insn_and_split
that splits after reload as either a pair of move instructions
or an xchgl (using gen_swapsi).

gcc/testsuite/ChangeLog
* gcc.target/i386/xchg-3.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index dd173f7..ab94866 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2966,7 +2966,7 @@
(set_attr "memory" "load")
(set_attr "mode" "")])
 
-(define_insn "*swap"
+(define_insn "swap"
   [(set (match_operand:SWI48 0 "register_operand" "+r")
(match_operand:SWI48 1 "register_operand" "+r"))
(set (match_dup 1)
@@ -13648,6 +13648,8 @@
   else if (const_1_to_31_operand (operands[2], VOIDmode))
 emit_insn (gen_ix86_di3_doubleword
(operands[0], operands[1], operands[2]));
+  else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 32)
+emit_insn (gen_ix86_32di2_doubleword (operands[0], operands[1]));
   else
 FAIL;
 
@@ -13820,6 +13822,24 @@
   split_double_mode (mode, [0], 1, [4], [5]);
 })
 
+(define_insn_and_split "ix86_32di2_doubleword"
+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+   (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "r,o")
+  (const_int 32)))]
+ "!TARGET_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (match_dup 3))
+  (set (match_dup 2) (match_dup 1))]
+{
+  split_double_mode (DImode, [0], 2, [0], [2]);
+  if (rtx_equal_p (operands[0], operands[1]))
+{
+  emit_insn (gen_swapsi (operands[0], operands[2]));
+  DONE;
+}
+})
+
 (define_mode_attr rorx_immediate_operand
[(SI "const_0_to_31_operand")
 (DI "const_0_to_63_operand")])
diff --git a/gcc/testsuite/gcc.target/i386/xchg-3.c 
b/gcc/testsuite/gcc.target/i386/xchg-3.c
new file mode 100644
index 000..eec05f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/xchg-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+unsigned long long bar();
+
+unsigned long long foo()
+{
+  unsigned long long x = bar();
+  return (x>>32) | (x<<32);
+}
+
+/*{ dg-final { scan-assembler "xchgl" } } */


Re: Pushed patch to convert DOM from EVRP to Ranger

2022-06-26 Thread Aldy Hernandez via Gcc-patches
Thanks for pushing this.

The patch triggered a (known) regression on
g++.dg/warn/Wstringop-overflow-4.C.  In the original submission I
mentioned I would XFAIL it, but forgot to do so.  I have pushed the
attached patch.

Note that since this was the last user of EVRP, I think it is now safe
to remove its code, along with any options on params.def.  Andrew, are
you OK with removing the legacy evrp code (gimple-ssa-evrp-analyze.*,
and any relevant bits)?  Of course, the core VRP code would still
remain, as VRP1 still uses it.

Aldy

On Sun, Jun 26, 2022 at 1:08 AM Jeff Law  wrote:
>
> This is 99.99% Aldy's work.  My only real contribution was twiddling one
> MIPS specific test in response to a regression after adding this patch
> to my tester.
>
> In simplest terms, this patch converts DOM to use the Ranger
> infrastructure rather than the EVRP infrastructure.  I'd basically
> approved it before Aldy went on PTO and it's been sitting in my tester
> ever since.  So I figured it was time to go ahead and push it.
>
> Jeff
From 80ace9185dc534e4d6cb19506300c4fcad4bd916 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Sun, 26 Jun 2022 17:30:18 +0200
Subject: [PATCH] XFAIL a test in g++.dg/warn/Wstringop-overflow-4.C

As per the explanation in the test, and in the DOM conversion to
ranger patch, this is a known regression.  I had mentioned I would
XFAIL this test, but forgot to do so.  There is an analysis in the
test itself as to what is going on.

Tested on x86-64 Linux.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/Wstringop-overflow-4.C: XFAIL a test.
---
 gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C b/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C
index eb4801918fc..c9d63932977 100644
--- a/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C
+++ b/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C
@@ -195,7 +195,7 @@ void test_strcpy_new_int16_t (size_t n, const size_t vals[])
   iftmp.2_33 = _45 * 2;;; iftmp.2_33 = 0
   _34 = operator new [] (iftmp.2_33);		;; new [] (0)
   */
-  T (S (2), new int16_t[r_dmin_dmax + 1]);
+  T (S (2), new int16_t[r_dmin_dmax + 1]); // { dg-bogus "into a region of size" "" { xfail *-*-*} }
   T (S (9), new int16_t[r_dmin_dmax * 2 + 1]);
 }
 
-- 
2.36.1



[PATCH] configure: When host-shared, pass --with-pic to in-tree lib configs.

2022-06-26 Thread Iain Sandoe via Gcc-patches
If we are building PIC/PIE host executables, and we are building dependent
libs (e.g. GMP) in-tree, those libs need to be configured to generate PIC code.

At present, if an --enable-host-shared build is attempted on ELF platforms,
with in-tree dependents, the build will fail with incompatible relocations.
One can append --with-pic to the configure line, but then that is applied
everywhere, not just to the libraries that need it.

Tested on x86_64-linux-gnu with "--enable-host-shared" and compared with an
"--enable-host-shared --with-pic" version.

OK for master?
comments?
thanks
Iain

Signed-off-by: Iain Sandoe 

ChangeLog:

* Makefile.def: Pass host_libs_picflag to host dependent library
configures.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac (host_libs_picflag): New configure variable set to
'--with-pic' when building 'host_shared'.
---
 Makefile.def |  15 +++---
 Makefile.in  | 140 +--
 configure|  11 
 configure.ac |  10 
 4 files changed, 99 insertions(+), 77 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index 72d58549645..92239aebb57 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -50,7 +50,7 @@ host_modules= { module= gcc; bootstrap=true;
extra_make_flags="$(EXTRA_GCC_FLAGS)"; };
 host_modules= { module= gmp; lib_path=.libs; bootstrap=true;
// Work around in-tree gmp configure bug with missing flex.
-   extra_configure_flags='--disable-shared LEX="touch lex.yy.c"';
+   extra_configure_flags='--disable-shared LEX="touch lex.yy.c" 
@host_libs_picflag@';
extra_make_flags='AM_CFLAGS="-DNO_ASM"';
no_install= true;
// none-*-* disables asm optimizations, bootstrap-testing
@@ -60,21 +60,22 @@ host_modules= { module= gmp; lib_path=.libs; bootstrap=true;
// different from host for target.
target="none-${host_vendor}-${host_os}"; };
 host_modules= { module= mpfr; lib_path=src/.libs; bootstrap=true;
-   extra_configure_flags='--disable-shared 
@extra_mpfr_configure_flags@';
+   extra_configure_flags='--disable-shared 
@extra_mpfr_configure_flags@ @host_libs_picflag@';
extra_make_flags='AM_CFLAGS="-DNO_ASM"';
no_install= true; };
 host_modules= { module= mpc; lib_path=src/.libs; bootstrap=true;
-   extra_configure_flags='--disable-shared 
@extra_mpc_gmp_configure_flags@ @extra_mpc_mpfr_configure_flags@ 
--disable-maintainer-mode';
+   extra_configure_flags='--disable-shared 
@extra_mpc_gmp_configure_flags@ @extra_mpc_mpfr_configure_flags@  
@host_libs_picflag@ --disable-maintainer-mode';
no_install= true; };
 host_modules= { module= isl; lib_path=.libs; bootstrap=true;
-   extra_configure_flags='--disable-shared 
@extra_isl_gmp_configure_flags@';
+   extra_configure_flags='--disable-shared 
@extra_isl_gmp_configure_flags@  @host_libs_picflag@';
extra_make_flags='V=1';
no_install= true; };
 host_modules= { module= libelf; lib_path=.libs; bootstrap=true;
-   extra_configure_flags='--disable-shared';
+   extra_configure_flags='--disable-shared  @host_libs_picflag@';
no_install= true; };
 host_modules= { module= gold; bootstrap=true; };
 host_modules= { module= gprof; };
+// intl acts on 'host_shared' directly, and does not support --with-pic.
 host_modules= { module= intl; bootstrap=true; };
 host_modules= { module= tcl;
 missing=mostlyclean; };
@@ -110,7 +111,7 @@ host_modules= { module= libiberty-linker-plugin; 
bootstrap=true;
 // We abuse missing to avoid installing anything for libiconv.
 host_modules= { module= libiconv;
bootstrap=true;
-   extra_configure_flags='--disable-shared';
+   extra_configure_flags='--disable-shared  @host_libs_picflag@';
no_install= true;
missing= pdf;
missing= html;
@@ -125,7 +126,7 @@ host_modules= { module= sim; };
 host_modules= { module= texinfo; no_install= true; };
 host_modules= { module= zlib; no_install=true; no_check=true;
bootstrap=true;
-   extra_configure_flags='@extra_host_zlib_configure_flags@';};
+   extra_configure_flags='@extra_host_zlib_configure_flags@ 
@host_libs_picflag@';};
 host_modules= { module= gnulib; };
 host_modules= { module= gdbsupport; };
 host_modules= { module= gdbserver; };

diff --git a/configure.ac b/configure.ac
index dcea067759d..09958bd7782 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1928,8 +1928,18 @@ AC_ARG_ENABLE(host-shared,
   x86_64-*-darwin* | aarch64-*-darwin*) host_shared=yes ;;
   *) host_shared=no ;;
  esac])
+
 AC_SUBST(host_shared)
 
+# If we are building PIC/PIE host executables, and we are building dependent
+# libs (e.g. GMP) 

[pushed] configure, Darwin: Correct a pasto in host-shared processing.

2022-06-26 Thread Iain Sandoe via Gcc-patches
We do, of course, mean $host not $target in this case.  Corrected thus.

tested on x86_64-darwin and x86_64-linux, pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

ChangeLog:

* configure: Regenerate.
* configure.ac: Correct use of $host.
---
 configure| 4 ++--
 configure.ac | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/configure.ac b/configure.ac
index f941b81af7f..dcea067759d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1915,7 +1915,7 @@ AC_ARG_ENABLE(host-shared,
 [AS_HELP_STRING([--enable-host-shared],
[build host code as shared libraries])],
 [host_shared=$enableval
- case $target in
+ case $host in
x86_64-*-darwin* | aarch64-*-darwin*)
  if test x$host_shared != xyes ; then
# PIC is the default, and actually cannot be switched off.
@@ -1924,7 +1924,7 @@ AC_ARG_ENABLE(host-shared,
  fi ;;
   *) ;;
  esac],
-[case $target in
+[case $host in
   x86_64-*-darwin* | aarch64-*-darwin*) host_shared=yes ;;
   *) host_shared=no ;;
  esac])
-- 
2.24.3 (Apple Git-128)



[PATCH] xtensa: Optimize integer constant addition that is between -32896 and 32639

2022-06-26 Thread Takayuki 'January June' Suwa via Gcc-patches
Such constants are often subject to constant synthesis:

int test(int a) {
  return a - 31999;
}

test:
	movi	a3, 1
	addmi	a3, a3, -0x7d00
	add	a2, a2, a3
	ret

This patch optimizes such cases as follows:

test:
	addi	a2, a2, 1
	addmi	a2, a2, -0x7d00
	ret

gcc/ChangeLog:

* config/xtensa/xtensa.md:
Suppress the unnecessary emission of a nop insn in the split
patterns for integer/FP constant synthesis, and add a new
peephole2 pattern that folds such synthesized additions.
---
 gcc/config/xtensa/xtensa.md | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index f31ec33b362..9d998589631 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1033,6 +1033,7 @@
 FAIL;
   if (! xtensa_constantsynth (operands[0], INTVAL (x)))
 emit_move_insn (operands[0], x);
+  DONE;
 })
 
 ;; 16-bit Integer moves
@@ -1272,6 +1273,7 @@
   x = gen_rtx_REG (SImode, REGNO (operands[0]));
   if (! xtensa_constantsynth (x, l[i]))
 emit_move_insn (x, GEN_INT (l[i]));
+  DONE;
 })
 
 ;; 64-bit floating point moves
@@ -2808,3 +2810,36 @@
 && REGNO (x) == regno + REG_NREGS (operands[0]) / 2))
 FAIL;
 })
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+   (match_operand:SI 1 "const_int_operand"))
+   (set (match_dup 0)
+   (plus:SI (match_dup 0)
+(match_operand:SI 2 "const_int_operand")))
+   (set (match_operand:SI 3 "register_operand")
+   (plus:SI (match_operand:SI 4 "register_operand")
+(match_dup 0)))]
+  "IN_RANGE (INTVAL (operands[1]) + INTVAL (operands[2]),
+(-128 - 32768), (127 + 32512))
+   && REGNO (operands[0]) != REGNO (operands[3])
+   && REGNO (operands[0]) != REGNO (operands[4])
+   && peep2_reg_dead_p (3, operands[0])"
+  [(set (match_dup 3)
+   (plus:SI (match_dup 4)
+(match_dup 1)))
+   (set (match_dup 3)
+   (plus:SI (match_dup 3)
+(match_dup 2)))]
+{
+  HOST_WIDE_INT value = INTVAL (operands[1]) + INTVAL (operands[2]);
+  int imm0, imm1;
+  value += 128;
+  if (value > 32512)
+imm1 = 32512;
+  else
+imm1 = value & ~255;
+  imm0 = value - imm1 - 128;
+  operands[1] = GEN_INT (imm0);
+  operands[2] = GEN_INT (imm1);
+})
-- 
2.20.1


[PATCH take 2] middle-end: Support ABIs that pass FP values as wider integers.

2022-06-26 Thread Roger Sayle

Hi Jeff,
Sorry for the long delay getting back to this, but after deeper
investigation, it turns out that your tingling spider senses that
the original patch wasn't updating everywhere that was required
were spot on.  Although my nvptx testing showed no problems with -O2,
compiling the same tests with -O0 found several additional assertion
ICEs (exactly where you'd predicted they should/would be).

Here's a revised patch that updates five locations (up from the
previous two).  Finding any remaining locations might be easier once
folks are able to test things on their targets.


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and on nvptx-none, where it is
the middle-end portion of a pair of patches to allow the default ISA to
be advanced.  Ok for mainline?


2022-06-26  Roger Sayle  

gcc/ChangeLog
PR target/104489
* calls.cc (precompute_register_parameters): Allow promotion
of floating point values to be passed in wider integer modes.
(expand_call): Allow floating point results to be returned in
wider integer modes.
* cfgexpand.cc (expand_value_return): Allow backends to promote
a scalar floating point return value to a wider integer mode.
* expr.cc (expand_expr_real_1) : Likewise, allow
backends to promote scalar FP PARM_DECLs to wider integer modes.
* function.cc (assign_parm_setup_stack): Allow floating point
values to be passed on the stack as wider integer modes.

Thanks in advance,
Roger
--

> -Original Message-
> From: Jeff Law 
> Sent: 14 March 2022 15:30
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider
> integers.
> 
> 
> 
> On 2/9/2022 1:12 PM, Roger Sayle wrote:
> > This patch adds middle-end support for target ABIs that pass/return
> > floating point values in integer registers with precision wider than
> > the original FP mode.  An example, is the nvptx backend where 16-bit
> > HFmode registers are passed/returned as (promoted to) SImode registers.
> > Unfortunately, this currently falls foul of the various (recent?)
> > sanity checks that (very sensibly) prevent creating paradoxical
> > SUBREGs of floating point registers.  The approach below is to
> > explicitly perform the conversion/promotion in two steps, via an
> > integer mode of same precision as the floating point value.  So on
> > nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using
> > SUBREG), then zero-extended to SImode, and likewise when going the
> > other way, parameters truncated to HImode then converted to HFmode
> > (using SUBREG).  These changes are localized to expand_value_return
> > and expanding DECL_RTL to support strange ABIs, rather than inside
> > convert_modes or gen_lowpart, as mismatched precision integer/FP
> > conversions should be explicit in the RTL, and these semantics not generally
> visible/implicit in user code.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures, and on nvptx-none, where it is
> > the middle-end portion of a pair of patches to allow the default ISA
> > to be advanced.  Ok for mainline?
> >
> > 2022-02-09  Roger Sayle  
> >
> > gcc/ChangeLog
> > * cfgexpand.cc (expand_value_return): Allow backends to promote
> > a scalar floating point return value to a wider integer mode.
> > * expr.cc (expand_expr_real_1) [expand_decl_rtl]: Likewise, allow
> > backends to promote scalar FP PARM_DECLs to wider integer modes.
> 
> Buried somewhere in our calling conventions code is the ability to pass around
> BLKmode objects in registers along with the ability to tune left vs right 
> padding
> adjustments.   Much of this support grew out of the PA
> 32 bit SOM ABI.
> 
> While I think we could probably make those bits do what we want, I suspect the
> result will actually be uglier than what you've done here and I wouldn't be
> surprised if there was a performance hit as the code to handle those cases was
> pretty dumb in its implementation.
> 
> What I find rather surprising is the location of your changes -- they feel
> incomplete.  For example, you fix the callee side of returns in
> expand_value_return, but I don't analogous code for the caller side.
> 
> Similarly while you fix things for arguments in expand_expr_real_1, that's 
> again
> just the callee side.  Don't you need to so something on the caller side too?
> 
> Jeff
> 

diff --git a/gcc/calls.cc b/gcc/calls.cc
index f4e1299..06d8a95 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -992,11 +992,24 @@ precompute_register_parameters (int num_actuals, struct 
arg_data *args,
/* If we are to promote the function arg to a wider mode,
   do it now.  */
 
-   if (args[i].mode != TYPE_MODE (TREE_TYPE (args[i].tree_value)))
- args[i].value
-   = convert_modes 

[x86 PATCH] PR rtl-optimization/96692: ((A|B)^C)^A using andn with -mbmi.

2022-06-26 Thread Roger Sayle

This patch addresses PR rtl-optimization/96692 on x86_64, by providing
a define_split for combine to convert the three-operation ((A|B)^C)^D
into a two-operation sequence using andn when either A or B is the same
register as C or D.  This is essentially a reassociation problem that's
only a win if the target supports an and-not instruction (as with -mbmi).

Hence for the new test case:

int f(int a, int b, int c)
{
return (a ^ b) ^ (a | c);
}

GCC on x86_64-pc-linux-gnu with -O2 -mbmi would previously generate:

	xorl	%edi, %esi
	orl	%edx, %edi
	movl	%esi, %eax
	xorl	%edi, %eax
	ret

but with this patch now generates:

	andn	%edx, %edi, %eax
	xorl	%esi, %eax
	ret

I'll investigate whether this optimization can also be implemented
more generically in simplify_rtx when the backend provides
accurate rtx_costs for "(and (not ..." (as there's no optab for
andn).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-06-26  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/96692
* config/i386/i386.md (define_split): Split ((A | B) ^ C) ^ D
as (X & ~Y) ^ Z on target BMI when either C or D is A or B.

gcc/testsuite/ChangeLog
PR rtl-optimization/96692
* gcc.target/i386/bmi-andn-4.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3093cb5..27dddaf 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10525,6 +10525,57 @@
(set (match_dup 0) (match_op_dup 1
 [(and:SI (match_dup 3) (match_dup 2))
 (const_int 0)]))])
+
+;; Split ((A | B) ^ C) ^ D as (X & ~Y) ^ Z.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+   (xor:SWI48
+  (xor:SWI48
+ (ior:SWI48 (match_operand:SWI48 1 "register_operand")
+(match_operand:SWI48 2 "nonimmediate_operand"))
+ (match_operand:SWI48 3 "nonimmediate_operand"))
+  (match_operand:SWI48 4 "nonimmediate_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI
+   && ix86_pre_reload_split ()
+   && (rtx_equal_p (operands[1], operands[3])
+   || rtx_equal_p (operands[1], operands[4])
+   || (REG_P (operands[2])
+  && (rtx_equal_p (operands[2], operands[3])
+  || rtx_equal_p (operands[2], operands[4]"
+  [(parallel
+  [(set (match_dup 5) (and:SWI48 (not:SWI48 (match_dup 6)) (match_dup 7)))
+   (clobber (reg:CC FLAGS_REG))])
+   (parallel
+  [(set (match_dup 0) (xor:SWI48 (match_dup 5) (match_dup 8)))
+   (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[5] = gen_reg_rtx (mode);
+  if (rtx_equal_p (operands[1], operands[3]))
+{
+  operands[6] = operands[1];
+  operands[7] = operands[2];
+  operands[8] = operands[4];
+}
+  else if (rtx_equal_p (operands[1], operands[4]))
+{
+  operands[6] = operands[1];
+  operands[7] = operands[2];
+  operands[8] = operands[3];
+}
+  else if (rtx_equal_p (operands[2], operands[3]))
+{
+  operands[6] = operands[2];
+  operands[7] = operands[1];
+  operands[8] = operands[4];
+}
+  else
+{
+  operands[6] = operands[2];
+  operands[7] = operands[1];
+  operands[8] = operands[3];
+}
+})
 
 ;; Logical inclusive and exclusive OR instructions
 
diff --git a/gcc/testsuite/gcc.target/i386/bmi-andn-4.c 
b/gcc/testsuite/gcc.target/i386/bmi-andn-4.c
new file mode 100644
index 000..fb89529
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/bmi-andn-4.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbmi" } */
+
+int f(int a, int b, int c)
+{
+return (a ^ b) ^ (a | c);
+}
+
+/* { dg-final { scan-assembler "andn\[ \\t\]+" } } */


[x86_64 PATCH] Implement __imag__ of float _Complex using shufps.

2022-06-26 Thread Roger Sayle

This patch is a follow-up improvement to my recent patch for
PR rtl-optimization/7061.  That patch added the test case
gcc.target/i386/pr7061-2.c:

float im(float _Complex a) { return __imag__ a; }

For which GCC on x86_64 currently generates:

	movq	%xmm0, %rax
	shrq	$32, %rax
	movd	%eax, %xmm0
	ret

but with this patch we now generate (the same as LLVM):

shufps  $85, %xmm0, %xmm0
ret

This is achieved by providing a define_insn_and_split that allows
truncated lshiftrt:DI by 32 to be performed on either SSE or general
regs, where if the register allocator prefers to use SSE, we split
to a shufps_v4si, or if not, we use a regular shrq.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, with no new failures.  Ok for mainline?


2022-06-26  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/7061
* config/i386/i386.md (*highpartdisi2): New define_insn_and_split.

gcc/testsuite/ChangeLog
PR rtl-optimization/7061
* gcc.target/i386/pr7061-2.c: Update to look for shufps.


Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5b53841..709598c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -13234,6 +13234,31 @@
(const_string "*")))
(set_attr "mode" "")])
 
+;; Specialization of *lshr3_1 below, extracting the SImode
+;; highpart of a DI, but allowing the input to be clobbered.
+(define_insn_and_split "*highpartdisi2"
+  [(set (subreg:DI (match_operand:SI 0 "register_operand" "=r,x,?k") 0)
+(lshiftrt:DI (match_operand:DI 1 "register_operand" "0,0,k")
+(const_int 32)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT"
+  "#"
+  "&& reload_completed"
+  [(parallel
+[(set (match_dup 0) (lshiftrt:DI (match_dup 1) (const_int 32)))
+ (clobber (reg:CC FLAGS_REG))])]
+{
+  if (SSE_REG_P (operands[0]))
+{
+  rtx tmp = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+  emit_insn (gen_sse_shufps_v4si (tmp, tmp, tmp,
+ const1_rtx, const1_rtx,
+ GEN_INT (5), GEN_INT (5)));
+  DONE;
+}
+  operands[0] = gen_rtx_REG (DImode, REGNO (operands[0]));
+})
+
 (define_insn "*lshr3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k")
(lshiftrt:SWI48
diff --git a/gcc/testsuite/gcc.target/i386/pr7061-2.c 
b/gcc/testsuite/gcc.target/i386/pr7061-2.c
index ac33340..837cd83 100644
--- a/gcc/testsuite/gcc.target/i386/pr7061-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr7061-2.c
@@ -1,5 +1,9 @@
 /* { dg-do compile { target { ! ia32 } } } */
 /* { dg-options "-O2" } */
 float im(float _Complex a) { return __imag__ a; }
+/* { dg-final { scan-assembler "shufps" } } */
+/* { dg-final { scan-assembler-not "movd" } } */
+/* { dg-final { scan-assembler-not "movq" } } */
 /* { dg-final { scan-assembler-not "movss" } } */
 /* { dg-final { scan-assembler-not "rsp" } } */
+/* { dg-final { scan-assembler-not "shr" } } */


Re: [PATCH] Fortran: fix simplification of INDEX(str1,str2) [PR105691]

2022-06-26 Thread Thomas Koenig via Gcc-patches

Hello Harald,


Compile-time simplification of INDEX(str1,str2,back=.true.) gave wrong
results.  Looking at gfc_simplify_index, this appeared to be close to
a complete mess, while the runtime library code - which was developed
later - was a relief.

The solution is to use the runtime library code as template to fix this.
I took the opportunity to change string index and length variables
in gfc_simplify_index to HOST_WIDE_INT.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As this is a wrong-code issue, would this qualify for backports to
open branches?


OK for both.

Thanks for the patch!


Best regards

Thomas


[PATCH v3] eliminate mutex in fast path of __register_frame

2022-06-26 Thread Thomas Neumann via Gcc-patches
NOTE: A stress test program and a detailed walkthrough that breaks this
patch into manageable parts can be found here:
https://databasearchitects.blogspot.com/2022/06/making-unwinding-through-jit-ed-code.html

The __register_frame/__deregister_frame functions are used to register
unwinding frames from JITed code in a sorted list. That list itself
is protected by object_mutex, which leads to terrible performance
in multi-threaded code and is somewhat expensive even if single-threaded.
There was already a fast-path that avoided taking the mutex if no
frame was registered at all.

This commit eliminates both the mutex and the sorted list from
the atomic fast path, and replaces it with a btree that uses
optimistic lock coupling during lookup. This allows for fully parallel
unwinding and is essential to scale exception handling to large
core counts.

Changes since v2:
- fix contention problem during unlocking

libgcc/ChangeLog:

* unwind-dw2-fde.c (release_registered_frames): Cleanup at shutdown.
(__register_frame_info_table_bases): Use btree in atomic fast path.
(__deregister_frame_info_bases): Likewise.
(_Unwind_Find_FDE): Likewise.
(base_from_object): Make parameter const.
(get_pc_range_from_fdes): Compute PC range for lookup.
(get_pc_range): Likewise.
* unwind-dw2-fde.h (last_fde): Make parameter const.
* unwind-dw2-btree.h: New file.
---
 libgcc/unwind-dw2-btree.h | 953 ++
 libgcc/unwind-dw2-fde.c   | 194 ++--
 libgcc/unwind-dw2-fde.h   |   2 +-
 3 files changed, 1113 insertions(+), 36 deletions(-)
 create mode 100644 libgcc/unwind-dw2-btree.h

diff --git a/libgcc/unwind-dw2-btree.h b/libgcc/unwind-dw2-btree.h
new file mode 100644
index 000..3b2b6871b46
--- /dev/null
+++ b/libgcc/unwind-dw2-btree.h
@@ -0,0 +1,953 @@
+/* Lock-free btree for manually registered unwind frames  */
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Thomas Neumann
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#ifndef GCC_UNWIND_DW2_BTREE_H
+#define GCC_UNWIND_DW2_BTREE_H
+
+#include 
+
+// Common logic for version locks
+struct version_lock
+{
+  // The lock itself. The lowest bit indicates an exclusive lock,
+  // the second bit indicates waiting threads. All other bits are
+  // used as counter to recognize changes.
+  // Overflows are okay here, we must only prevent overflow to the
+  // same value within one lock_optimistic/validate
+  // range. Even on 32 bit platforms that would require 1 billion
+  // frame registrations within the time span of a few assembler
+  // instructions.
+  uintptr_t version_lock;
+};
+
+#ifdef __GTHREAD_HAS_COND
+// We should never get contention within the tree as it rarely changes.
+// But if we ever do get contention we use these for waiting
+static __gthread_mutex_t version_lock_mutex = __GTHREAD_MUTEX_INIT;
+static __gthread_cond_t version_lock_cond = __GTHREAD_COND_INIT;
+#endif
+
+// Initialize in locked state
+static inline void
+version_lock_initialize_locked_exclusive (struct version_lock *vl)
+{
+  vl->version_lock = 1;
+}
+
+// Try to lock the node exclusive
+static inline bool
+version_lock_try_lock_exclusive (struct version_lock *vl)
+{
+  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
+  if (state & 1)
+return false;
+  return __atomic_compare_exchange_n (&(vl->version_lock), , state | 1,
+ false, __ATOMIC_SEQ_CST,
+ __ATOMIC_SEQ_CST);
+}
+
+// Lock the node exclusive, blocking as needed
+static void
+version_lock_lock_exclusive (struct version_lock *vl)
+{
+#ifndef __GTHREAD_HAS_COND
+restart:
+#endif
+
+  // We should virtually never get contention here, as frame
+  // changes are rare
+  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
+  if (!(state & 1))
+{
+  if (__atomic_compare_exchange_n (&(vl->version_lock), , state | 1,
+  false, __ATOMIC_SEQ_CST,
+  

Re: [PATCH] Fortran: handle explicit-shape specs with constant bounds [PR105954]

2022-06-26 Thread Thomas Koenig via Gcc-patches

Hello Harald,


after simplification of constant bound expressions of an explicit
shape spec of an array, we need to ensure that we never obtain
negative extents.  In some cases this did happen, and we ICEd
as we hit an assert that this should never happen...

The original testcase by Gerhard exhibited this for sizeof()
of a derived type with an array component, but the issue is
more fundamental and affects other intrinsics during
simplification.

A straightforward solution "fixes up" the upper bound in the
shape spec when it is known to be below lower bounds minus one.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


OK.  Thanks for the patch!

Regards

Thomas


Re: [PATCH] testsuite: Add new target check for no_alignment_constraints

2022-06-26 Thread Dimitar Dimitrov
On Fri, Jun 24, 2022 at 08:58:49AM +0200, Richard Biener wrote:
> On Fri, Jun 24, 2022 at 2:34 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > On Thu, Jun 23, 2022 at 2:24 PM Dimitar Dimitrov  wrote:
> > >
> > > A few testcases were marked for avr target, which has no alignment
> > > requirements.  But those tests in fact should filter for any
> > > target having __BIGGEST_ALIGNMENT__=1.
> > >
> > > A new effective target check is introduced: no_alignment_constraints.
> > > It checks whether __BIGGEST_ALIGNMENT__ is declared as 1.
> > >
> > > Alternative names I considered for the new macro are:
> > >   - !natural_alignment_16
> > >   - biggest_alignment_1
> > >
> > > This change fixes the testsuite cases for the PRU target.  I don't have
> > > an environment to test m32c and cris targets, which also declare
> > > __BIGGEST_ALIGNMENT__=1.
> > >
> > > It was regression-tested on x86_64-pc-linux-gnu.
> > >
> > > The following two existing macros were considered, but they check for
> > > subtly different target behaviour:
> > >  1. non_strict_align
> > > If true, non-aligned access is permitted. But it also allows
> > > variables to be naturally aligned, which is not true for
> > > no_alignment_constraints.
> > >
> > >  2. default_packed
> > > Whether structures are packed by default is not necessarily
> > > the same as lacking constraints for any variable alignment.
> > > For example, BIGGEST_FIELD_ALIGNMENT or ADJUST_FIELD_ALIGN
> > > could be defined for a target.
> > >
> > > Ok for trunk?
> >
> > How is no_alignment_constraints different from default_packed? I
> > suspect they have the same effect really.
> 
> Different when non-aggregates are involved?  Does default_packed
> also apply to scalar types?

It is my understanding that aggregates and scalars could have
different alignment constraints.

For example, consider the following target settings combination, which I
found in the vax backend:

#define BIGGEST_ALIGNMENT 32
#define BIGGEST_FIELD_ALIGNMENT 8

I made an experiment and hacked pru-unknonwn-elf with the above change.
This resulted in:

  default_packed=1
  no_alignment_constraints=0

> 
> Btw, new effective targets should be documented in sourcebuild.texi

I'll fix and post a new version.

Thanks,
Dimitar

> 
> Richard.
> 
> > Thanks,
> > Andrew
> >
> > >
> > > Signed-off-by: Dimitar Dimitrov 
> > > ---
> > >  gcc/testsuite/c-c++-common/Wcast-align.c |  4 ++--
> > >  gcc/testsuite/gcc.dg/c11-align-4.c   |  2 +-
> > >  gcc/testsuite/gcc.dg/strlenopt-10.c  |  6 +++---
> > >  gcc/testsuite/gcc.dg/strlenopt-11.c  | 14 +++---
> > >  gcc/testsuite/gcc.dg/strlenopt-13.c  | 16 
> > >  gcc/testsuite/lib/target-supports.exp| 13 +
> > >  6 files changed, 34 insertions(+), 21 deletions(-)
> > >
> > > diff --git a/gcc/testsuite/c-c++-common/Wcast-align.c 
> > > b/gcc/testsuite/c-c++-common/Wcast-align.c
> > > index c296c7fd249..1087b10fd99 100644
> > > --- a/gcc/testsuite/c-c++-common/Wcast-align.c
> > > +++ b/gcc/testsuite/c-c++-common/Wcast-align.c
> > > @@ -16,8 +16,8 @@ struct t { double x; } *q;
> > >  void
> > >  foo (void)
> > >  {
> > > -  y = (c *) x;  /* { dg-warning "7:cast \[^\n\r]* required alignment of 
> > > target type" } */
> > > -  z = (d *) x;  /* { dg-warning "7:cast \[^\n\r]* required alignment of 
> > > target type" } */
> > > +  y = (c *) x;  /* { dg-warning "7:cast \[^\n\r]* required alignment of 
> > > target type" "" { target { ! no_alignment_constraints } } } */
> > > +  z = (d *) x;  /* { dg-warning "7:cast \[^\n\r]* required alignment of 
> > > target type" "" { target { ! no_alignment_constraints } } } */
> > >(long long *) p;  /* { dg-bogus "alignment" } */
> > >(double *) q; /* { dg-bogus "alignment" } */
> > >  }
> > > diff --git a/gcc/testsuite/gcc.dg/c11-align-4.c 
> > > b/gcc/testsuite/gcc.dg/c11-align-4.c
> > > index 57f93ff05fc..eb9071b9149 100644
> > > --- a/gcc/testsuite/gcc.dg/c11-align-4.c
> > > +++ b/gcc/testsuite/gcc.dg/c11-align-4.c
> > > @@ -2,7 +2,7 @@
> > > are at least some alignment constraints).  */
> > >  /* { dg-do compile } */
> > >  /* { dg-options "-std=c11 -pedantic-errors" } */
> > > -/* { dg-skip-if "no alignment constraints" { "avr-*-*" } } */
> > > +/* { dg-skip-if "no alignment constraints" { no_alignment_constraints } 
> > > } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/strlenopt-10.c 
> > > b/gcc/testsuite/gcc.dg/strlenopt-10.c
> > > index ce959c34a80..6e2c2597b27 100644
> > > --- a/gcc/testsuite/gcc.dg/strlenopt-10.c
> > > +++ b/gcc/testsuite/gcc.dg/strlenopt-10.c
> > > @@ -70,10 +70,10 @@ main ()
> > >  }
> > >
> > >  /* { dg-final { scan-tree-dump-times "strlen \\(" 2 "strlen1" } } */
> > > -/* avr has BIGGEST_ALIGNMENT 8, allowing fold_builtin_memory_op
> > > +/* Some targets have BIGGEST_ALIGNMENT 8-bits, allowing 
> > > fold_builtin_memory_op
> > > to expand the memcpy call at the end of fn2.  */
> > > -/* {