Re: [PATCH] AVX512FP16: support basic 64/32bit vector type and operation.

2021-09-27 Thread Hongyu Wang via Gcc-patches
> ia32 ABI declares that __m64 values pass via MMX registers. Due to
> this, we are not able to fully disable MMX register usage, as is the
> case with x86_64. So, V4HFmode values will pass to functions via MMX
> registers on ia32 targets.
>
> So, there should be no additional define_insn, the addition to the
> existing MMXMODE mode iterator should be enough. V4HFmodes should be
> handled in the same way as e.g. V8QImode.
>
> This is not the case with 4-byte values, which should be passed using
> integer ABI.

Thanks for the explanation. I updated the patch by removing the extra
define_insn and dropping V4HFmode from VALID_AVX512FP16_REG_MODE. Now
V4HF behaves the same as V8QI.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and sde.

OK for master with the updated version?

Uros Bizjak via Gcc-patches  于2021年9月27日周一 下午7:35写道:
>
> On Mon, Sep 27, 2021 at 12:42 PM Hongyu Wang  wrote:
> >
> > Hi Uros,
> >
> > This patch intends to support V4HF/V2HF vector type and basic operations.
> >
> > For 32-bit targets, a V4HF vector is passed the same as the __m64
> > type; V2HF is passed on the stack and returned in a GPR since it is
> > not specified by the ABI.
> >
> > We found that for 64-bit vectors on ia32, when MMX is disabled there
> > seems to be no mov_internal, so we added a define_insn for V4HF mode.
> > It would be much appreciated if you know why the handling of 64-bit
> > vectors looks the way it does and could give some advice.
>
> ia32 ABI declares that __m64 values pass via MMX registers. Due to
> this, we are not able to fully disable MMX register usage, as is the
> case with x86_64. So, V4HFmode values will pass to functions via MMX
> registers on ia32 targets.
>
> So, there should be no additional define_insn, the addition to the
> existing MMXMODE mode iterator should be enough. V4HFmodes should be
> handled in the same way as e.g. V8QImode.
>
> This is not the case with 4-byte values, which should be passed using
> integer ABI.
>
> Uros.
>
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> >
> > OK for master?
> >
> > gcc/ChangeLog:
> >
> > PR target/102230
> > * config/i386/i386.h (VALID_AVX512FP16_REG_MODE): Add
> > V4HF and V2HF mode check.
> > (VALID_SSE2_REG_VHF_MODE): Likewise.
> > (VALID_MMX_REG_MODE): Likewise.
> > (SSE_REG_MODE_P): Replace VALID_AVX512FP16_REG_MODE with
> > vector mode condition.
> > * config/i386/i386.c (classify_argument): Parse V4HF/V2HF
> > via sse regs.
> > (function_arg_32): Add V4HFmode.
> > (function_arg_advance_32): Likewise.
> > * config/i386/i386.md (mode): Add V4HF/V2HF.
> > (MODE_SIZE): Likewise.
> > * config/i386/mmx.md (MMXMODE): Add V4HF mode.
> > (V_32): Add V2HF mode.
> > (*mov_internal): Adjust sse alternatives to support
> > V4HF mode vector move.
> > (*mov_internal): Adjust sse alternatives
> > to support V2HF mode move.
> > * config/i386/sse.md (VHF_32_64): New mode iterator.
> > (3): New define_insn for add/sub/mul/div.
> > (*movv4hf_internal_sse): New define_insn for -mno-mmx and -msse.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/102230
> > * gcc.target/i386/avx512fp16-floatvnhf.c: Remove xfail.
> > * gcc.target/i386/avx512fp16-trunc-extendvnhf.c: Ditto.
> > * gcc.target/i386/avx512fp16-truncvnhf.c: Ditto.
> > * gcc.target/i386/avx512fp16-64-32-vecop-1.c: New test.
> > * gcc.target/i386/avx512fp16-64-32-vecop-2.c: Ditto.
> > * gcc.target/i386/pr102230.c: Ditto.
> > ---
> >  gcc/config/i386/i386.c|  4 +
> >  gcc/config/i386/i386.h| 12 ++-
> >  gcc/config/i386/i386.md   |  5 +-
> >  gcc/config/i386/mmx.md| 27 ---
> >  gcc/config/i386/sse.md| 49 
> >  .../i386/avx512fp16-64-32-vecop-1.c   | 30 
> >  .../i386/avx512fp16-64-32-vecop-2.c   | 75 +++
> >  .../gcc.target/i386/avx512fp16-floatvnhf.c| 12 +--
> >  .../i386/avx512fp16-trunc-extendvnhf.c| 12 +--
> >  .../gcc.target/i386/avx512fp16-truncvnhf.c| 12 +--
> >  gcc/testsuite/gcc.target/i386/pr102230.c  | 38 ++
> >  11 files changed, 243 insertions(+), 33 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102230.c
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index ba89e111d28..b3e4add4b9e 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -2462,6 +2462,8 @@ classify_argument (machine_mode mode, const_tree type,
> >  case E_V2SFmode:
> >  case E_V2SImode:
> >  case E_V4HImode:
> > +case E_V4HFmode:
> > +case E_V2HFmode:
> >  case E_V8QImode:
> > 

Re: [PATCH] [GIMPLE] Simplify (_Float16) ceil ((double) x) to .CEIL (x) when available.

2021-09-27 Thread Hongtao Liu via Gcc-patches
On Mon, Sep 27, 2021 at 8:53 PM Richard Biener
 wrote:
>
> On Fri, Sep 24, 2021 at 1:26 PM liuhongt  wrote:
> >
> > Hi:
> >   Related discussion in [1] and PR.
> >
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >   Ok for trunk?
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574330.html
> >
> > gcc/ChangeLog:
> >
> > PR target/102464
> > * config/i386/i386.c (ix86_optab_supported_p):
> > Return true for HFmode.
> > * match.pd: Simplify (_Float16) ceil ((double) x) to
> > __builtin_ceilf16 (a) when a is _Float16 type and
> > direct_internal_fn_supported_p.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr102464.c: New test.
> > ---
> >  gcc/config/i386/i386.c   | 20 +++-
> >  gcc/match.pd | 28 +
> >  gcc/testsuite/gcc.target/i386/pr102464.c | 39 
> >  3 files changed, 79 insertions(+), 8 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464.c
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index ba89e111d28..3767fe9806d 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -23582,20 +23582,24 @@ ix86_optab_supported_p (int op, machine_mode 
> > mode1, machine_mode,
> >return opt_type == OPTIMIZE_FOR_SPEED;
> >
> >  case rint_optab:
> > -  if (SSE_FLOAT_MODE_P (mode1)
> > - && TARGET_SSE_MATH
> > - && !flag_trapping_math
> > - && !TARGET_SSE4_1)
> > +  if (mode1 == HFmode)
> > +   return true;
> > +  else if (SSE_FLOAT_MODE_P (mode1)
> > +  && TARGET_SSE_MATH
> > +  && !flag_trapping_math
> > +  && !TARGET_SSE4_1)
> > return opt_type == OPTIMIZE_FOR_SPEED;
> >return true;
> >
> >  case floor_optab:
> >  case ceil_optab:
> >  case btrunc_optab:
> > -  if (SSE_FLOAT_MODE_P (mode1)
> > - && TARGET_SSE_MATH
> > - && !flag_trapping_math
> > - && TARGET_SSE4_1)
> > +  if (mode1 == HFmode)
> > +   return true;
> > +  else if (SSE_FLOAT_MODE_P (mode1)
> > +  && TARGET_SSE_MATH
> > +  && !flag_trapping_math
> > +  && TARGET_SSE4_1)
> > return true;
> >return opt_type == OPTIMIZE_FOR_SPEED;
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index a9791ceb74a..9ccec8b6ce3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -6191,6 +6191,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > (froms (convert float_value_p@0))
> > (convert (tos @0)
> >
> > +#if GIMPLE
> > +(match float16_value_p
> > + @0
> > + (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == float16_type_node)))
> > +(for froms (BUILT_IN_TRUNCL BUILT_IN_TRUNC BUILT_IN_TRUNCF
> > +   BUILT_IN_FLOORL BUILT_IN_FLOOR BUILT_IN_FLOORF
> > +   BUILT_IN_CEILL BUILT_IN_CEIL BUILT_IN_CEILF
> > +   BUILT_IN_ROUNDEVENL BUILT_IN_ROUNDEVEN BUILT_IN_ROUNDEVENF
> > +   BUILT_IN_ROUNDL BUILT_IN_ROUND BUILT_IN_ROUNDF
> > +   BUILT_IN_NEARBYINTL BUILT_IN_NEARBYINT BUILT_IN_NEARBYINTF
> > +   BUILT_IN_RINTL BUILT_IN_RINT BUILT_IN_RINTF)
>
> we do have patterns that convert (truncl (convert floatval)) to
> (float) truncf (val); yours does
> (_Float16) trunc ((double) float16) -> truncF16 (float16).  Doesn't it
> make sense to have trunc ((double) float16) -> (double) truncF16
> (float16) as well?
>
> Why do you conditionalize on GIMPLE here?
To avoid
error: ‘direct_internal_fn_supported_p’ was not declared in this scope

>
> That said, I wonder whether we can somehow address pattern explosion here,
> eliding the outer (convert ...) from the match would help a bit already.
>
> The related patterns use optimize && canonicalize_math_p as well btw., not
> sure whether either is appropriate here since there are no _Float16 math
> functions available.
Yes, that's why I didn't follow the existing pattern. I think we can
add optimize back to the condition, but not canonicalize_math_p (),
since there are no math functions for _Float16.
Also, without the outer (convert ...), it looks like a canonicalization
transforming ceil ((double) a) to (double) __builtin_ceilf16 (a), but
not an optimization.
>
> > + tos (IFN_TRUNC IFN_TRUNC IFN_TRUNC
> > + IFN_FLOOR IFN_FLOOR IFN_FLOOR
> > + IFN_CEIL IFN_CEIL IFN_CEIL
> > + IFN_ROUNDEVEN IFN_ROUNDEVEN IFN_ROUNDEVEN
> > + IFN_ROUND IFN_ROUND IFN_ROUND
> > + IFN_NEARBYINT IFN_NEARBYINT IFN_NEARBYINT
> > + IFN_RINT IFN_RINT IFN_RINT)
> > + /* (_Float16) round ((double) x) -> __builtin_roundf16 (x), etc.,
> > +if x is a _Float16.  */
> > + (simplify
> > +   (convert (froms (convert float16_value_p@0)))
> > + (if (types_match (type, TREE_TYPE (@0))
> > + && direct_internal_fn_supported_p (as_internal_fn (tos),
> > +type, 

Re: [PATCH] Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode.

2021-09-27 Thread Jeff Law via Gcc-patches




On 9/27/2021 6:07 AM, Richard Biener via Gcc-patches wrote:

On Mon, Sep 27, 2021 at 11:42 AM Hongtao Liu  wrote:

On Fri, Sep 24, 2021 at 9:08 PM Segher Boessenkool
 wrote:

On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote:

On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches
 wrote:

I'm not convinced that we need the inner mode to match anything.  As
long as the vec_concat's mode is twice the size of the vec_select modes
and the vec_select mode is <= the mode of its operands ISTM this is
fine.   We  might want the modes of the vec_select to match, but I don't
think that's strictly necessary either, they just need to be the same
size.  I.e., we could have something like

(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))

I'm not sure if that level of generality is useful though.  If we want
the modes of the vec_selects to match I think we could still support

(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))

Thoughts?

I think the component or scalar modes of the elements to concat need to match
the component mode of the result.  I don't think you example involving
a cat of DF and DI is too useful - but you could use a subreg around the DI
value ;)

I agree.

If you want to concatenate components of different modes, you should
change mode first, using subregs for example.

I don't really understand.

for

(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))

Thoughts?

how can it be simplified when reg:V4DF is different from (reg:V8DF)?
To
(vec_select: (vec_concat: (subreg:V8DF (reg:V4DF) 0) (reg:V8DF))
(parallel [...]))
That doesn't look like a simplification.

Similar for

(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))

here we require rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) so
vec_concat (vec_select vec_select) can be simplified to just
vec_select.

Yes, I think your patch is reasonable, I don't understand how we can
intermediately convert here either or how that would help.
Agreed.  It was never my intention to hold things up.  I was just 
curious about whether or not we wanted to relax things even further.


Jeff



Re: [PATCH] Control all jump threading passes with -fjump-threads.

2021-09-27 Thread Jeff Law via Gcc-patches




On 9/27/2021 9:00 AM, Aldy Hernandez wrote:

Last year I mentioned that -fthread-jumps was being ignored by the
majority of our jump threading passes, and Jeff said he'd be in favor
of fixing this.

This patch remedies the situation, but it does change existing behavior.
Currently -fthread-jumps is only enabled for -O2, -O3, and -Os.  This
means that even if we restricted all jump threading passes with
-fthread-jumps, DOM jump threading would still seep through since it
runs at -O1.

I propose this patch, but it does mean that DOM jump threading would
have to be explicitly enabled with -O1 -fthread-jumps.  An
alternative would be to also offer a specific -fno-dom-threading, but
that seems icky.

OK pending tests?

gcc/ChangeLog:

* tree-ssa-threadbackward.c (pass_thread_jumps::gate): Check
flag_thread_jumps.
(pass_early_thread_jumps::gate): Same.
* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
Return if !flag_thread_jumps.
* tree-ssa-threadupdate.c
(jt_path_registry::register_jump_thread): Assert that
flag_thread_jumps is true.
OK.  Clearly this is going to be even better once we disentangle 
threading from DOM.

jeff



Re: [COMMITTED] Remove old VRP jump threader code.

2021-09-27 Thread Jeff Law via Gcc-patches




On 9/27/2021 9:41 AM, Aldy Hernandez via Gcc-patches wrote:

There's a lot of code that melts away without the ASSERT_EXPR based jump
threader.  Also, I cleaned up the include files as part of the process.

gcc/ChangeLog:

* tree-vrp.c (lhs_of_dominating_assert): Remove.

Whee, goodness :-)

jeff



Re: [PATCH] libgccjit: add some reflection functions in the jit C api

2021-09-27 Thread Antoni Boucher via Gcc-patches
I fixed an issue (it would show an error message when
gcc_jit_type_dyncast_function_ptr_type was called on a type other than
a function pointer type).

Here's the updated patch.

Le vendredi 18 juin 2021 à 16:37 -0400, David Malcolm a écrit :
> On Fri, 2021-06-18 at 15:41 -0400, Antoni Boucher wrote:
> > I have write access now.
> 
> Great.
> 
> > I'm not sure how I'm supposed to send my patches:
> > should I put it in personal branches and you'll merge them?
> 
> Please send them to this mailing list for review; once they're
> approved
> you can merge them.
> 
> > 
> > And for the MAINTAINERS file, should I just push to master right
> > away,
> > after sending it to the mailing list?
> 
> I think people just push the MAINTAINERS change and then let the list
> know, since it makes a good test that write access is working
> correctly.
> 
> Dave
> 
> > 
> > Thanks for your help!
> > 
> > Le vendredi 18 juin 2021 à 12:09 -0400, David Malcolm a écrit :
> > > On Fri, 2021-06-18 at 11:55 -0400, Antoni Boucher wrote:
> > > > Le vendredi 11 juin 2021 à 14:00 -0400, David Malcolm a écrit :
> > > > > On Fri, 2021-06-11 at 08:15 -0400, Antoni Boucher wrote:
> > > > > > Thank you for your answer.
> > > > > > I attached the updated patch.
> > > > > 
> > > > > BTW you (or possibly me) dropped the mailing lists; was that
> > > > > deliberate?
> > > > 
> > > > Oh, my bad.
> > > > 
> > > 
> > > [...]
> > > 
> > > 
> > > > > 
> > > > > 
> > > > > > I have signed the FSF copyright attribution.
> > > > > 
> > > > > I can push changes on your behalf, but I'd prefer it if you
> > > > > did
> > > > > it,
> > > > > especially given that you have various other patches you want
> > > > > to
> > > > > get
> > > > > in.
> > > > > 
> > > > > Instructions on how to get push rights to the git repo are
> > > > > here:
> > > > >   https://gcc.gnu.org/gitwrite.html
> > > > > 
> > > > > I can sponsor you.
> > > > 
> > > > Thanks.
> > > > I did sign up to get push rights.
> > > > Have you accepted my request to get those?
> > > 
> > > I did, but I didn't see any kind of notification.  Did you get an
> > > email
> > > about it?
> > > 
> > > 
> > > Dave
> > > 
> > 
> > 
> 
> 

From 95f8b85bcc7b1259eef1e9916de824c752b2f2c0 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Sat, 1 Aug 2020 17:52:17 -0400
Subject: [PATCH] libgccjit: Add some reflection functions [PR96889]

2021-07-19  Antoni Boucher  

gcc/jit/
	PR target/96889
	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_16): New ABI tag.
	* docs/topics/functions.rst: Add documentation for the
	functions gcc_jit_function_get_return_type and
	gcc_jit_function_get_param_count.
	* docs/topics/types.rst: Add documentation for the functions
	gcc_jit_function_type_get_return_type,
	gcc_jit_function_type_get_param_count,
	gcc_jit_function_type_get_param_type,
	gcc_jit_type_unqualified, gcc_jit_type_dyncast_array,
	gcc_jit_type_is_bool,
	gcc_jit_type_dyncast_function_ptr_type,
	gcc_jit_type_is_integral, gcc_jit_type_is_pointer,
	gcc_jit_type_dyncast_vector,
	gcc_jit_vector_type_get_element_type,
	gcc_jit_vector_type_get_num_units,
	gcc_jit_struct_get_field, gcc_jit_type_is_struct,
	and gcc_jit_struct_get_field_count.
	* libgccjit.c:
	(gcc_jit_function_get_return_type, gcc_jit_function_get_param_count,
	gcc_jit_function_type_get_return_type,
	gcc_jit_function_type_get_param_count,
	gcc_jit_function_type_get_param_type, gcc_jit_type_unqualified,
	gcc_jit_type_dyncast_array, gcc_jit_type_is_bool,
	gcc_jit_type_dyncast_function_ptr_type, gcc_jit_type_is_integral,
	gcc_jit_type_is_pointer, gcc_jit_type_dyncast_vector,
	gcc_jit_vector_type_get_element_type,
	gcc_jit_vector_type_get_num_units, gcc_jit_struct_get_field,
	gcc_jit_type_is_struct, gcc_jit_struct_get_field_count): New
	functions.
	(struct gcc_jit_function_type, struct gcc_jit_vector_type):
	New types.
	* libgccjit.h:
	(gcc_jit_function_get_return_type, gcc_jit_function_get_param_count,
	gcc_jit_function_type_get_return_type,
	gcc_jit_function_type_get_param_count,
	gcc_jit_function_type_get_param_type, gcc_jit_type_unqualified,
	gcc_jit_type_dyncast_array, gcc_jit_type_is_bool,
	gcc_jit_type_dyncast_function_ptr_type, gcc_jit_type_is_integral,
	gcc_jit_type_is_pointer, gcc_jit_type_dyncast_vector,
	gcc_jit_vector_type_get_element_type,
	gcc_jit_vector_type_get_num_units, gcc_jit_struct_get_field,
	gcc_jit_type_is_struct, gcc_jit_struct_get_field_count): New
	function declarations.
	(struct gcc_jit_function_type, struct gcc_jit_vector_type):
	New types.
	* jit-recording.h: New functions (is_struct and is_vector).
	* libgccjit.map (LIBGCCJIT_ABI_16): New ABI tag.

gcc/testsuite/
	PR target/96889
	* jit.dg/all-non-failing-tests.h: Add test-reflection.c.
	* jit.dg/test-reflection.c: New test.
---
 gcc/jit/docs/topics/compatibility.rst|  43 ++-
 gcc/jit/docs/topics/functions.rst|  26 ++
 gcc/jit/docs/topics/types.rst| 122 +
 gcc/jit/jit-recording.h  |   7 +
 gcc/jit/libgccjit.c   

[r12-3903 Regression] FAIL: gcc.dg/guality/pr41616-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION execution test on Linux/x86_64

2021-09-27 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

0288527f47cec6698b31ccb3210816415506009e is the first bad commit
commit 0288527f47cec6698b31ccb3210816415506009e
Author: Aldy Hernandez 
Date:   Tue Sep 21 10:27:53 2021 +0200

Replace VRP threader with a hybrid forward threader.

caused

FAIL: gcc.dg/guality/pr41616-1.c   -O2  -DPREVENT_OPTIMIZATION  execution test
FAIL: gcc.dg/guality/pr41616-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION execution test
FAIL: gcc.dg/guality/pr41616-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION execution test
FAIL: gcc.dg/guality/pr41616-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  execution 
test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-3903/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: Fix 48631_neg test in _GLIBCXX_VERSION_NAMESPACE mode

2021-09-27 Thread Jonathan Wakely via Gcc-patches
On Mon, 27 Sept 2021 at 21:26, François Dumont via Libstdc++
 wrote:
>
> Here is a small patch to fix a test which fails in
> _GLIBCXX_VERSION_NAMESPACE mode.
>
> IMHO it would be better to avoid putting  content in
> versioned namespace, no ?
>
> There is of course more work to do, so for now here is the simpler approach.
>
> Ok to commit ?

Leaving the pattern ending with just "struct" isn't very useful.
Wouldn't it be better to do:

// { dg-prune-output "no type named 'type' in" }

or just:

// { dg-prune-output "enable_if" }

?

Either of those is OK to commit.


Re: [RFC] Experimental __attribute__((saturating)) on integer types.

2021-09-27 Thread Joseph Myers
On Mon, 27 Sep 2021, Richard Biener via Gcc-patches wrote:

> Now - ISTR that elsewhere Joseph suggested that taking on
> saturating operations by type was eventually misguided and we should
> have instead added saturating arithmetic tree codes that we could
> expose via some builtin functions like the overflow ones.

There are several issues there:

* saturating (and other fixed-point) types at the C API level;

* saturating (and other fixed-point) types in GIMPLE;

* saturating (and other fixed-point) modes in RTL.

As I said in 
, I think 
having special modes for these kinds of types is a bad idea, because 
operations should be lowered to ordinary integer arithmetic at some point 
in GIMPLE, or at the latest in expand.  (Maybe a few cases would sensibly 
use libgcc functions rather than inline arithmetic, but those would be the 
exception.  We handle inline expansion of the overflow-checking built-in 
functions in general, much of that code could be shared to expand 
saturating arithmetic in general on hardware lacking the operations.)  At 
present, there are loads of fixed-point machine modes, and very many 
libgcc functions on the targets supporting fixed-point, and very little 
optimization done on these operations, when if the operations were lowered 
to normal arithmetic earlier, generic code in the compiler could optimize 
them.  (Back ends would still need to know enough about the types in 
question to be able to implement any desired ABI differences from the 
underlying ordinary integer types.)

My inclination is that GIMPLE should also use saturating operations rather 
than saturating types.

At the C API level it's less clear.  When you have saturating types in the 
front end - as in those we currently have implemented, from the Embedded C 
TR, for example - at some point they need lowering to saturating 
operations on normal types, if you follow my suggested model above.  That 
could be at gimplification, or you could allow saturating types in GIMPLE 
but then have some early pass that replaces them by normal types using 
saturating operations.

For some kinds of algorithm, saturating types may well be a convenient 
abstraction for the user.  For others, saturating operations on normal 
types may make more sense (e.g. using saturating arithmetic on size_t to 
compute an allocation size, knowing that SIZE_MAX will result in 
allocation failure if passed to an allocation function).

As for the specific patch: it looks like you create a new type every time 
the user uses the attribute.  If you allow users to create such saturating 
types (distinct from the fixed-point ones) at all, I think that every time 
someone requests int __attribute__ ((saturating)) it should produce the 
same type (and likewise for each other underlying non-saturating integer 
type, and watch out for any interactions with types created for 
bit-fields).  Then there would be API design questions to address such as 
the results of converting out-of-range integer or floating-point values - 
or, for that matter, wider pointers - to a saturating type.

-- 
Joseph S. Myers
jos...@codesourcery.com


Fix 48631_neg test in _GLIBCXX_VERSION_NAMESPACE mode

2021-09-27 Thread François Dumont via Gcc-patches
Here is a small patch to fix a test which fails in 
_GLIBCXX_VERSION_NAMESPACE mode.


IMHO it would be better to avoid putting  content in 
versioned namespace, no ?


There is of course more work to do, so for now here is the simpler approach.

Ok to commit ?

François


diff --git a/libstdc++-v3/testsuite/20_util/default_delete/48631_neg.cc b/libstdc++-v3/testsuite/20_util/default_delete/48631_neg.cc
index 3e80b73603e..f710806ef42 100644
--- a/libstdc++-v3/testsuite/20_util/default_delete/48631_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/default_delete/48631_neg.cc
@@ -26,4 +26,4 @@ struct D : B { };
 D d;
 std::default_delete db;
 typedef decltype(db()) type; // { dg-error "no match" }
-// { dg-prune-output "no type named 'type' in 'struct std::enable_if" }
+// { dg-prune-output "no type named 'type' in 'struct" }


Re: [PATCH] Make flag_trapping_math a non-binary Boolean.

2021-09-27 Thread Joseph Myers
On Sat, 25 Sep 2021, Roger Sayle wrote:

> Normally Boolean options/flags in GCC take the values zero or one.
> This patch tweaks flag_trapping_math to take the values 0 or 65535.
> More accurately it introduces a new trapping_math_model enumeration in
> flag-types.h, and uses this to allow front-ends to (potentially) control
> which expressions may be constant folded at compile-time by the middle-end.
> Floating point/language experts may recognize these flags (bits) as being
> modelled upon (extended) FENV_ACCESS.

I'm not sure exactly what those bits are modelled on (they are similar to, 
but not exactly the same as, IEEE 754 sub-exceptions), but a lot more 
explanation is needed - explanation of the rationale for the particular 
model chosen, explanation for what is or is not considered a default 
there, comments on each bit documenting its semantics in detail, and 
explanation of how this relates to other relevant discussions and patch 
proposals.

I think the following are the three key things this should be related to, 
none of which are mentioned in this patch submission:


(a) The various possible effects -ftrapping-math might have on allowed 
transformations, as discussed in bug 54192, where in comment #8 I 
identified five different kinds of restriction that -ftrapping-math might 
imply (only the first of which, and maybe to some extent the second, is 
handled much in GCC at present - and even there, there are various open 
bugs about cases where e.g. the expanders generate code not raising the 
right exceptions, or libgcc functions don't raise the right exceptions, 
especially when conversions between floating-point and integer are 
involved).

I actually think this sort of classification of effects of -ftrapping-math 
is probably more useful to provide control over (both internally in GCC 
and externally via more fine-grained command-line options) than the 
details of which individual exceptions are involved as suggested in your 
flags values.  However, the two are largely orthogonal (other than the 
point about exact underflows only relating to one of the exceptions).

I don't make any assertion here of which of these effects (if any) ought 
to be the default - making -ftrapping-math actually implement all five 
restrictions fully while keeping it the default might significantly impair 
optimization.  Also as mentioned in bug 54192, even if some such 
restrictions are applied by default, for soft-float systems with no 
support for exceptions it would make sense to apply more transformations 
unconditionally.


(b) Marc Glisse's -ffenv-access patches from August 2020 (and the 
discussion of them from that time).  Those don't claim to be complete, but 
they are the nearest we have to an attempt at implementing the sort of 
thing that would actually be needed to avoid code movement or removal that 
is invalid in the presence of code using floating-point flags (which 
overlaps a lot with what's needed to get -frounding-math correct under 
similar circumstances - except that a full -ftrapping-math might well 
involve stricter optimization restrictions than full -frounding-math, even 
in the absence of supporting non-local control float for trap handlers, 
because floating-point operations only read the rounding mode, but both 
read and write the exception state).


(c) The alternate exception handling bindings (FENV_EXCEPT pragma) in TS 
18661-5.  I'm not aware of any implementations of those bindings, it's far 
from clear whether they will turn out in the end to be a good way of 
providing C bindings to IEEE 754 alternate exception handling or not, and 
(given those issues) they aren't going to be integrated into C23.  But 
it's at least possible that the OPTIONAL_FLAG action (allowing 
transformations that cause certain exceptions or sub-exceptions not to 
raise the corresponding flag) could sometimes be useful in practice - and 
it's what seems to relate most closely to the sort of classification of 
exceptions in your patch (to implement it, you'd need that classification 
- though you'd also need to fix the other issues under (a) above).


> +  TRAPPING_MATH_QNANOP = 1UL << 0,
> +  TRAPPING_MATH_SNANOP = 1UL << 1,
> +  TRAPPING_MATH_QNANCMP = 1UL << 2,
> +  TRAPPING_MATH_SNANCMP = 1UL << 3,
> +  TRAPPING_MATH_INTCONV = 1UL << 4,
> +  TRAPPING_MATH_SQRTNEG = 1UL << 5,
> +  TRAPPING_MATH_LIBMFUN = 1UL << 6,
> +  TRAPPING_MATH_FDIVZERO = 1UL << 7,
> +  TRAPPING_MATH_IDIVZERO = 1UL << 8,
> +  TRAPPING_MATH_FPDENORM = 1UL << 9,
> +  TRAPPING_MATH_OVERFLOW = 1UL << 10,
> +  TRAPPING_MATH_UNDERFLOW = 1UL << 11,
> +  TRAPPING_MATH_INFDIVINF = 1UL << 12,
> +  TRAPPING_MATH_INFSUBINF = 1UL << 13,
> +  TRAPPING_MATH_INFMULZERO = 1UL << 14,
> +  TRAPPING_MATH_ZERODIVZERO = 1UL << 15,

Many of these are similar to, but not the same as, the sub-exceptions in 
IEEE 754 (enumerated as a more explicit list with names in TS 18661-5).

I think that if you want to handle sub-exceptions at all, it would be much 

[PATCH] coroutines: Only set parm copy guard vars if we have exceptions [PR 102454].

2021-09-27 Thread Iain Sandoe via Gcc-patches
For coroutines, we make copies of the original function arguments into
the coroutine frame.  Normally, these are destroyed on the proper exit
from the coroutine when the frame is destroyed.

However, if an exception is thrown before the first suspend point is
reached, the cleanup has to happen in the ramp function.  These cleanups
are guarded such that they are only applied to any param copies actually
made.

The ICE is caused by an attempt to set the guard variable when there are
no exceptions enabled (the guard var is not created in this case).

Fixed by checking for flag_exceptions in this case too.

While touching these code paths, also clean up the synthetic names used
when a function parm is unnamed.

Tested on x86_64-darwin.
OK for master?
thanks
Iain

Signed-off-by: Iain Sandoe 

PR c++/102454

gcc/cp/ChangeLog:

* coroutines.cc (analyze_fn_parms): Clean up synthetic names for
unnamed function params.
(morph_fn_to_coro): Do not try to set a guard variable for param
DTORs in the ramp, unless we have exceptions active.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr102454.C: New test.
---
 gcc/cp/coroutines.cc   | 26 ---
 gcc/testsuite/g++.dg/coroutines/pr102454.C | 38 ++
 2 files changed, 52 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr102454.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index fbd5c49533f..c761e769c12 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3829,13 +3829,12 @@ analyze_fn_parms (tree orig)
 
   if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (parm.frame_type))
{
- char *buf = xasprintf ("_Coro_%s_live", IDENTIFIER_POINTER (name));
- parm.guard_var = build_lang_decl (VAR_DECL, get_identifier (buf),
-   boolean_type_node);
- free (buf);
- DECL_ARTIFICIAL (parm.guard_var) = true;
- DECL_CONTEXT (parm.guard_var) = orig;
- DECL_INITIAL (parm.guard_var) = boolean_false_node;
+ char *buf = xasprintf ("%s%s_live", DECL_NAME (arg) ? "_Coro_" : "",
+IDENTIFIER_POINTER (name));
+ parm.guard_var
+   = coro_build_artificial_var (UNKNOWN_LOCATION, get_identifier (buf),
+boolean_type_node, orig,
+boolean_false_node);
  parm.trivial_dtor = false;
}
   else
@@ -4843,11 +4842,14 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
 NULL, parm.frame_type,
 LOOKUP_NORMAL,
 tf_warning_or_error);
- /* This var is now live.  */
- r = build_modify_expr (fn_start, parm.guard_var,
-boolean_type_node, INIT_EXPR, fn_start,
-boolean_true_node, boolean_type_node);
- finish_expr_stmt (r);
+ if (flag_exceptions)
+   {
+ /* This var is now live.  */
+ r = build_modify_expr (fn_start, parm.guard_var,
+boolean_type_node, INIT_EXPR, fn_start,
+boolean_true_node, boolean_type_node);
+ finish_expr_stmt (r);
+   }
}
}
 }
diff --git a/gcc/testsuite/g++.dg/coroutines/pr102454.C b/gcc/testsuite/g++.dg/coroutines/pr102454.C
new file mode 100644
index 000..41aeda7b973
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr102454.C
@@ -0,0 +1,38 @@
+//  { dg-additional-options "-fno-exceptions" }
+
+#include 
+#include 
+
+template <typename T>
+struct looper {
+  struct promise_type {
+auto get_return_object () { return handle_type::from_promise (*this); }
+auto initial_suspend () { return suspend_always_prt {}; }
+auto final_suspend () noexcept { return suspend_always_prt {}; }
+void return_value (T);
+void unhandled_exception ();
+  };
+
+  using handle_type = std::coroutine_handle<promise_type>;
+
+  looper (handle_type);
+
+  struct suspend_always_prt {
+bool await_ready () noexcept;
+void await_suspend (handle_type) noexcept;
+void await_resume () noexcept;
+  };
+};
+
+template <typename T>
+looper<T>
+with_ctorable_state (T)
+{
+  co_return T ();
+}
+
+auto
+foo ()
+{
+  return with_ctorable_state;
+}
-- 
2.24.3 (Apple Git-128)



Re: [Patch] Fortran: Fix assumed-size to assumed-rank passing [PR94070]

2021-09-27 Thread Harald Anlauf via Gcc-patches

Hi Thomas,

Am 27.09.21 um 14:07 schrieb Tobias Burnus:

While playing I stumbled over the fact that when allocating an array
with a dimension that has extent 0, e.g. 4:-5, the lbound gets reset
to 1 and ubound set to 0.


I am not sure, whether I fully understand what you wrote. For:

   integer, allocatable :: a(:)
   allocate(a(4:-5))
   print *, size(a), size(a, dim=1), shape(a) ! should print the '0 0 0'
   print *, lbound(a, dim=1) ! should print '1'
   print *, ubound(a, dim=1) ! should print '0'

where the last line is due to F2018, 16.9.196, which has:

  'Otherwise, if DIM is present, the result has a value equal to the
   number of elements in dimension DIM of ARRAY.'

And lbound(a,dim=1) == 1 due to the "otherwise" case of F2018:16.9.109 
LBOUND:

"Case (i): If DIM is present, ARRAY is a whole array, and either
  ARRAY is an assumed-size array of rank DIM or dimension DIM of
  ARRAY has nonzero extent, the result has a value equal to the
  lower bound for subscript DIM of ARRAY. Otherwise, if DIM is
  present, the result value is 1."

And when doing
   call f(a)
   call g(a)
with 'subroutine f(x); integer :: x(:)'
and 'subroutine g(y); integer :: y(..)'

Here, ubound == 0 due to the reason above and lbound is set to
the declared lower bound, which is for 'x' the default ("1") but
could also be 5 with "x(5:)" and for 'y' it cannot be specified.
For 'x', see last sentence of F2018:8.5.8.3. For 'y', I did not
find the exact location but it follows alongside.


it appears that I managed to kill the related essential part of
my comment (fat fingers?).  So here is what I played with:

program p
  implicit none
! integer, allocatable :: x(:,:)
  integer, pointer :: x(:,:)
  allocate (x(-3:3,4:0))
  print *, "lbound =", lbound (x)
  call sub (x)
contains
  subroutine sub (y)
!   integer, allocatable :: y(..)
integer, pointer :: y(..)
print *, "size   =", size (y)
print *, "shape  =", shape (y)
print *, "lbound =", lbound (y)
print *, "ubound =", ubound (y)
  end subroutine sub
end

Array x is deferred shape, as is the dummy y.
This prints:

 lbound =  -3   1
 size   =   0
 shape  =   7   0
 lbound =  -3   1
 ubound =   3   0

For some reason Intel prints different lbound for main/sub,
meaning that it is broken, but here it goes:

 lbound =  -3   1
 size   =   0
 shape  =   7   0
 lbound =  -3   4
 ubound =   3   3

So for the first dimension everything is fine, but for the
second dim, which has extent zero, my question is: what should
the lbound be?  1 or 4?


With BIND(C) applied to f and g, ubound remains the same but
lbound is now 0 instead of 1.


I haven't checked the BIND(C) complications.
For "common" Fortran code, I looked at 9.7.1.2(1):

"When an ALLOCATE statement is executed for an array for which
 allocate-shape-spec-list is specified, the values of the lower bound
 and upper bound expressions determine the bounds of the array.
 Subsequent redefinition or undefinition of any entities in the bound
 expressions do not affect the array bounds. If the lower bound is
 omitted, the default value is 1. If the upper bound is less than the
 lower bound, the extent in that dimension is zero and the array has
 zero size."

It is the word "determine" in the first sentence that made me stumble.
I am not saying that it is wrong to handle extent zero the way it
is done - using lower bound 1 and upper bound 0 - as the extent is
correct.  *If* that is the case, then I would consider gfortran having
a consistent quality of implementation, but not Intel...


Has the standard changed in this respect?


I doubt it, but only looked at F2018 and not at older standards.




PS: I saw that we recently had a couple of double reviews. I think it is
useful if multiple persons look at patches, but hope that we do not
start requiring two reviews for each patch ;-)


That would certainly have a very adverse effect and terribly increase
the spam^WPING rate...


Harald



Re: [PATCH] c++: deduction guides and ttp rewriting [PR102479]

2021-09-27 Thread Jason Merrill via Gcc-patches

On 9/27/21 10:44, Patrick Palka wrote:

The problem here is ultimately that rewrite_tparm_list when rewriting a
TEMPLATE_TEMPLATE_PARM introduces a tree cycle in the rewritten
ttp that structural_comptypes can't cope with.  In particular the
DECL_TEMPLATE_PARMS of a ttp's TEMPLATE_DECL normally captures an empty
parameter list at its own level (and so the TEMPLATE_DECL doesn't appear
in its own DECL_TEMPLATE_PARMS), but rewrite_tparm_list ends up giving
it a complete parameter list.  In the new testcase below, this causes
infinite recursion from structural_comptypes when comparing Tmpl
with Tmpl (here both 'Tmpl's are rewritten).

This patch fixes this by making rewrite_template_parm give a rewritten
template template parm an empty parameter list at its own level, thereby
avoiding the tree cycle.  Testing the alias CTAD case revealed that
we're not setting current_template_parms in alias_ctad_tweaks, which
this patch also fixes.  Also, the change to use TMPL_ARGS_LEVEL instead
of TREE_VEC_ELT is needed because alias_ctad_tweaks passes only a single
level of targs to rewrite_tparm_list.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/102479

gcc/cp/ChangeLog:

* pt.c (rewrite_template_parm): Use TMPL_ARGS_LEVEL instead of
TREE_VEC_ELT directly to properly handle one-level tsubst_args.
Avoid a tree cycle when assigning the DECL_TEMPLATE_PARMS for a
rewritten ttp.
(alias_ctad_tweaks): Set current_template_parms accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction12.C: Also test alias CTAD in the
same way.
* g++.dg/cpp1z/class-deduction99.C: New test.
---
  gcc/cp/pt.c   | 20 +--
  .../g++.dg/cpp1z/class-deduction12.C  |  6 
  .../g++.dg/cpp1z/class-deduction99.C  | 35 +++
  3 files changed, 58 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction99.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6bd6ceb29be..cba0f5c8279 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28754,7 +28754,7 @@ rewrite_template_parm (tree olddecl, unsigned index, 
unsigned level,
  const int depth = TMPL_ARGS_DEPTH (tsubst_args);
  tree ttargs = make_tree_vec (depth + 1);
  for (int i = 0; i < depth; ++i)
-   TREE_VEC_ELT (ttargs, i) = TREE_VEC_ELT (tsubst_args, i);
+   TREE_VEC_ELT (ttargs, i) = TMPL_ARGS_LEVEL (tsubst_args, i + 1);
  TREE_VEC_ELT (ttargs, depth)
= template_parms_level_to_args (ttparms);
  // Substitute ttargs into ttparms to fix references to
@@ -28767,8 +28767,17 @@ rewrite_template_parm (tree olddecl, unsigned index, 
unsigned level,
  ttparms = tsubst_template_parms_level (ttparms, ttargs,
 complain);
  // Finally, tack the adjusted parms onto tparms.
- ttparms = tree_cons (size_int (depth), ttparms,
-  current_template_parms);
+ ttparms = tree_cons (size_int (level + 1), ttparms,
+  copy_node (current_template_parms));
+ // As with all template template parms, the parameter list captured
+ // by this template template parm that corresponds to its own level
+ // should be empty.  This avoids infinite recursion when structurally
+ // comparing two such rewritten template template parms (102479).
+ gcc_assert (!TREE_VEC_LENGTH
+ (TREE_VALUE (TREE_CHAIN (DECL_TEMPLATE_PARMS (olddecl)))));
+ gcc_assert (TMPL_PARMS_DEPTH (TREE_CHAIN (ttparms)) == level);
+ TREE_VALUE (TREE_CHAIN (ttparms)) = make_tree_vec (0);
+ // All done.
  DECL_TEMPLATE_PARMS (newdecl) = ttparms;
}
  }
@@ -29266,6 +29275,11 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  ++ndlen;
  tree gtparms = make_tree_vec (natparms + ndlen);
  
+	  /* Set current_template_parms as in build_deduction_guide.  */

+ auto ctp = make_temp_override (current_template_parms);
+ current_template_parms = copy_node (DECL_TEMPLATE_PARMS (tmpl));
+ TREE_VALUE (current_template_parms) = gtparms;
+
  /* First copy over the parms of A.  */
  for (j = 0; j < natparms; ++j)
TREE_VEC_ELT (gtparms, j) = TREE_VEC_ELT (atparms, j);
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction12.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction12.C
index a31cc1526db..f0d7ea0e16b 100644
--- a/gcc/testsuite/g++.dg/cpp1z/class-deduction12.C
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction12.C
@@ -15,3 +15,9 @@ A a(,2,B<42>());
  template  class same;
  template  class same {};
  same> s;
+
+#if __cpp_deduction_guides >= 201907
+template  using C = A;
+
+same())), A> t;
+#endif
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction99.C 

[r12-3899 Regression] FAIL: gcc.dg/strlenopt-13.c scan-tree-dump-times strlen1 "memcpy \\(" 7 on Linux/x86_64

2021-09-27 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

d06dc8a2c73735e9496f434787ba4c93ceee5eea is the first bad commit
commit d06dc8a2c73735e9496f434787ba4c93ceee5eea
Author: Richard Biener 
Date:   Mon Sep 27 13:36:12 2021 +0200

middle-end/102450 - avoid type_for_size for non-existing modes

caused

FAIL: gcc.dg/out-of-bounds-1.c  (test for warnings, line 12)
FAIL: gcc.dg/pr78408-1.c scan-tree-dump-times fab1 "after previous" 17
FAIL: gcc.dg/strlenopt-13.c scan-tree-dump-times strlen1 "memcpy \\(" 7

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-3899/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=gcc.dg/out-of-bounds-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=gcc.dg/out-of-bounds-1.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr78408-1.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr78408-1.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/strlenopt-13.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/strlenopt-13.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


[r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64

2021-09-27 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
commit 6390c5047adb75960f86d56582e6322aaa4d9281
Author: Richard Biener 
Date:   Wed Nov 18 09:36:57 2020 +0100

Allow different vector types for stmt groups

caused

FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects  scan-tree-dump-times 
slp2 "optimized: basic block" 1
FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic 
block" 1
FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects  
scan-tree-dump-times slp1 "optimized: basic block" 10
FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized: basic 
block" 10
FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-3893/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-17.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-17.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr65935.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr65935.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-pr97352.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


Aw: Re: [PATCH] PR fortran/102458 - ICE tree check: expected array_type, have pointer_type in gfc_conv_array_initializer, at fortran/trans-array.c:6136

2021-09-27 Thread Harald Anlauf via Gcc-patches
Hi Thomas,

> > Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> It was actually 10.1.12 :-)

dang, you're right.  Steve had it right, and I failed miserably
on copy & paste.  I should fix the comment.

> OK for trunk.
>
> Thanks for the patch!

Thanks for the review, which came after Jerry's!
I should have waited for yours, too.

Harald

> Best regards
>
>   Thomas



Re: [PATCH] Replace VRP threader with a hybrid forward threader.

2021-09-27 Thread Richard Biener via Gcc-patches
On September 27, 2021 6:07:40 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
 wrote:
>
>
>On 9/27/21 5:27 PM, Aldy Hernandez wrote:
>> 
>> 
>> On 9/27/21 5:01 PM, Jeff Law wrote:
>>>
>>>
>>> On 9/24/2021 9:46 AM, Aldy Hernandez wrote:
>
>>> And the big question, is the pass running after VRP2 doing anything 
>>> particularly useful?  Do we want to try and kill it now, or later?
>> 
>> Interesting question.  Perhaps if we convert DOM threading to a hybrid 
>> model, it will render the post-VRP threader completely useless.  Huhh... 
>> That could kill 2 birds with one stone... we get rid of a threading 
>> pass, and we don't need to worry as much about the super-fast ranger.
>
>These are just a few of the threading passes at -O2:
>
>a.c.192t.thread3   <-- bck threader
>a.c.193t.dom3  <-- fwd threader
>a.c.194t.strlen1
>a.c.195t.thread4   <-- bck threader
>a.c.196t.vrp2
>a.c.197t.vrp-thread2 <-- fwd threader
>
>That's almost 4 back to back threaders!
>
>*pause for effect*

We've always known we have too many of these once Jeff triplicated all the 
backwards threading ones. I do hope we manage to reduce the number for GCC 12. 
Esp. if the new ones are slower because they no longer use simple lattices.

Richard. 

>Aldy
>



Re: [PATCH] Replace VRP threader with a hybrid forward threader.

2021-09-27 Thread Aldy Hernandez via Gcc-patches




On 9/27/21 5:27 PM, Aldy Hernandez wrote:



On 9/27/21 5:01 PM, Jeff Law wrote:



On 9/24/2021 9:46 AM, Aldy Hernandez wrote:


And the big question, is the pass running after VRP2 doing anything 
particularly useful?  Do we want to try and kill it now, or later?


Interesting question.  Perhaps if we convert DOM threading to a hybrid 
model, it will render the post-VRP threader completely useless.  Huhh... 
That could kill 2 birds with one stone... we get rid of a threading 
pass, and we don't need to worry as much about the super-fast ranger.


These are just a few of the threading passes at -O2:

a.c.192t.thread3   <-- bck threader
a.c.193t.dom3  <-- fwd threader
a.c.194t.strlen1
a.c.195t.thread4   <-- bck threader
a.c.196t.vrp2
a.c.197t.vrp-thread2 <-- fwd threader

That's almost 4 back to back threaders!

*pause for effect*

Aldy



Re: [PATCH] Introduce sh_mul and uh_mul RTX codes for high-part multiplications

2021-09-27 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> This patch introduces new RTX codes to allow the RTL passes and
> backends to consistently represent high-part multiplications.
> Currently, the RTL used by different backends for expanding
> smul<mode>3_highpart and umul<mode>3_highpart varies greatly,
> with many but not all choosing to express this something like:
>
> (define_insn "smuldi3_highpart"
>   [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
>(truncate:DI
> (lshiftrt:TI
>  (mult:TI (sign_extend:TI
>(match_operand:DI 1 "nvptx_register_operand" "R"))
>   (sign_extend:TI
>(match_operand:DI 2 "nvptx_register_operand" "R")))
>  (const_int 64))))]
>   ""
>   "%.\\tmul.hi.s64\\t%0, %1, %2;")
>
> One complication with using this "widening multiplication" representation
> is that it requires an intermediate in a wider mode, making it difficult
> or impossible to encode a high-part multiplication of the widest supported
> integer mode.

Yeah.  It's also a problem when representing vector ops.

> A second is that it can interfere with optimization; for
> example simplify-rtx.c contains the comment:
>
>case TRUNCATE:
>   /* Don't optimize (lshiftrt (mult ...)) as it would interfere
>  with the umulXi3_highpart patterns.  */
>
> Hopefully these problems are solved (or reduced) by introducing a
> new canonical form for high-part multiplications in RTL passes.
> This also simplifies insn patterns when one operand is constant.
>
> Whilst implementing some constant folding simplifications and
> compile-time evaluation of these new RTX codes, I noticed that
> this functionality could also be added for the existing saturating
> arithmetic RTX codes.  Then likewise when documenting these new RTX
> codes, I also took the opportunity to silence the @xref warnings in
> invoke.texi.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  Ok for mainline?
>
>
> 2021-09-25  Roger Sayle  
>
> gcc/ChangeLog
>   * gcc/rtl.def (SH_MULT, UH_MULT): New RTX codes for representing
>   signed and unsigned high-part multiplication respectively.
>   * gcc/simplify-rtx.c (simplify_binary_operation_1) [SH_MULT,
>   UH_MULT]: Simplify high-part multiplications by zero.
>   [SS_PLUS, US_PLUS, SS_MINUS, US_MINUS, SS_MULT, US_MULT,
>   SS_DIV, US_DIV]: Similar simplifications for saturating
>   arithmetic.
>   (simplify_const_binary_operation) [SS_PLUS, US_PLUS, SS_MINUS,
>   US_MINUS, SS_MULT, US_MULT, SH_MULT, UH_MULT]: Implement
>   compile-time evaluation for constant operands.
>   * gcc/dwarf2out.c (mem_loc_descriptor): Skip SH_MULT and UH_MULT.
>   * doc/rtl.texi (sh_mult, uh_mult): Document new RTX codes.
>   * doc/md.texi (smul@var{m}3_highpart, umul@var{m}3_highpart):
>   Mention the new sh_mul and uh_mul RTX codes.
>   * doc/invoke.texi: Silence @xref "compilation" warnings.

Looks like a good idea to me.  Only real comment is on the naming:
if possible, I think we should try to avoid introducing yet more
differences between optab names and rtl codes.  How about umul_highpart
for the unsigned code, to match both the optab and the existing
convention of adding “u” directly to the front of non-saturating
operations?

Things are more inconsistent for signed rtx codes: sometimes the
“s” is present and sometimes it isn't.  But since “smin” and “smax”
have it, I think we can justify having it here too.

So I think we should use smul_highpart and umul_highpart.
It's a bit more wordy than sh_mul, but still a lot shorter than
the status quo ;-)

> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index ebad5cb..b4b04b9 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -4142,11 +4142,40 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>  case US_PLUS:
>  case SS_MINUS:
>  case US_MINUS:
> +  /* Simplify x + 0 to x, if possible.  */

Nit: +/-

> +  if (trueop1 == CONST0_RTX (mode) && !HONOR_SIGNED_ZEROS (mode))

The HONOR_SIGNED_ZEROS check is redundant, since these ops don't support
modes with signed zero.

Same for the other HONOR_* macros in the patch.  E.g. I don't think
we should try to guess how infinities and saturation work together.

> + return op0;
> +  return 0;
> +
>  case SS_MULT:
>  case US_MULT:
> +  /* Simplify x * 0 to 0, if possible.  */
> +  if (trueop1 == CONST0_RTX (mode)
> +   && !HONOR_NANS (mode)
> +   && !HONOR_SIGNED_ZEROS (mode)
> +   && !side_effects_p (op0))
> + return op1;
> +
> +  /* Simplify x * 1 to x, if possible.  */
> +  if (trueop1 == CONST1_RTX (mode) && !HONOR_SNANS (mode))
> + return op0;
> +  return 0;
> +
> +case SH_MULT:
> +case UH_MULT:
> +  /* Simplify x * 0 to 0, if possible.  */
> +  if (trueop1 == CONST0_RTX (mode)
> +   && !HONOR_NANS (mode)
> +   && 

[COMMITTED] Minor cleanups to solver.

2021-09-27 Thread Aldy Hernandez via Gcc-patches
These are some minor cleanups and renames that surfaced after the
hybrid_threader work.

gcc/ChangeLog:

* gimple-range-path.cc
(path_range_query::precompute_ranges_in_block): Rename to...
(path_range_query::compute_ranges_in_block): ...this.
(path_range_query::precompute_ranges): Rename to...
(path_range_query::compute_ranges): ...this.
(path_range_query::precompute_relations): Rename to...
(path_range_query::compute_relations): ...this.
(path_range_query::precompute_phi_relations): Rename to...
(path_range_query::compute_phi_relations): ...this.
* gimple-range-path.h: Rename precompute* to compute*.
* tree-ssa-threadbackward.c
(back_threader::find_taken_edge_switch): Same.
(back_threader::find_taken_edge_cond): Same.
* tree-ssa-threadedge.c
(hybrid_jt_simplifier::compute_ranges_from_state): Same.
(hybrid_jt_state::register_equivs_stmt): Inline...
* tree-ssa-threadedge.h: ...here.
---
 gcc/gimple-range-path.cc  | 28 ++--
 gcc/gimple-range-path.h   | 14 +-
 gcc/tree-ssa-threadbackward.c |  4 ++--
 gcc/tree-ssa-threadedge.c |  8 +---
 gcc/tree-ssa-threadedge.h |  7 +--
 5 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 0738a5ca159..71e04e4deba 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -293,11 +293,11 @@ path_range_query::range_defined_in_block (irange &r, tree name, basic_block bb)
   return true;
 }
 
-// Precompute ranges defined in the current block, or ranges
-// that are exported on an edge to the next block.
+// Compute ranges defined in the current block, or exported to the
+// next block.
 
 void
-path_range_query::precompute_ranges_in_block (basic_block bb)
+path_range_query::compute_ranges_in_block (basic_block bb)
 {
   bitmap_iterator bi;
   int_range_max r, cached_range;
@@ -452,14 +452,14 @@ path_range_query::add_copies_to_imports ()
 }
 }
 
-// Precompute the ranges for IMPORTS along PATH.
+// Compute the ranges for IMPORTS along PATH.
 //
 // IMPORTS are the set of SSA names, any of which could potentially
 // change the value of the final conditional in PATH.
 
 void
-path_range_query::precompute_ranges (const vec<basic_block> &path,
-				     const bitmap_head *imports)
+path_range_query::compute_ranges (const vec<basic_block> &path,
+				  const bitmap_head *imports)
 {
   if (DEBUG_SOLVER)
 fprintf (dump_file, "\n*** path_range_query **\n");
@@ -472,12 +472,12 @@ path_range_query::precompute_ranges (const vec<basic_block> &path,
 {
   add_copies_to_imports ();
   m_oracle->reset_path ();
-  precompute_relations (path);
+  compute_relations (path);
 }
 
   if (DEBUG_SOLVER)
 {
-  fprintf (dump_file, "\npath_range_query: precompute_ranges for path: ");
+  fprintf (dump_file, "\npath_range_query: compute_ranges for path: ");
   for (unsigned i = path.length (); i > 0; --i)
{
  basic_block bb = path[i - 1];
@@ -504,7 +504,7 @@ path_range_query::precompute_ranges (const vec<basic_block> &path,
  bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
}
 
-  precompute_ranges_in_block (bb);
+  compute_ranges_in_block (bb);
   adjust_for_non_null_uses (bb);
 
   if (at_exit ())
@@ -611,12 +611,12 @@ path_range_query::range_of_stmt (irange &r, gimple *stmt, tree)
   return true;
 }
 
-// Precompute relations on a path.  This involves two parts: relations
+// Compute relations on a path.  This involves two parts: relations
 // along the conditionals joining a path, and relations determined by
 // examining PHIs.
 
 void
-path_range_query::precompute_relations (const vec<basic_block> &path)
+path_range_query::compute_relations (const vec<basic_block> &path)
 {
   if (!dom_info_available_p (CDI_DOMINATORS))
 return;
@@ -628,7 +628,7 @@ path_range_query::precompute_relations (const vec<basic_block> &path)
   basic_block bb = path[i - 1];
   gimple *stmt = last_stmt (bb);
 
-  precompute_phi_relations (bb, prev);
+  compute_phi_relations (bb, prev);
 
   // Compute relations in outgoing edges along the path.  Skip the
   // final conditional which we don't know yet.
@@ -656,14 +656,14 @@ path_range_query::precompute_relations (const vec<basic_block> &path)
 }
 }
 
-// Precompute relations for each PHI in BB.  For example:
+// Compute relations for each PHI in BB.  For example:
 //
 //   x_5 = PHI
 //
 // If the path flows through BB5, we can register that x_5 == y_9.
 
 void
-path_range_query::precompute_phi_relations (basic_block bb, basic_block prev)
+path_range_query::compute_phi_relations (basic_block bb, basic_block prev)
 {
   if (prev == NULL)
 return;
diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
index f7d9832ac8c..cf49c6dc086 100644
--- a/gcc/gimple-range-path.h
+++ b/gcc/gimple-range-path.h
@@ -26,9 +26,6 @@ 

[COMMITTED] Remove old VRP jump threader code.

2021-09-27 Thread Aldy Hernandez via Gcc-patches
There's a lot of code that melts away without the ASSERT_EXPR based jump
threader.  Also, I cleaned up the include files as part of the process.

gcc/ChangeLog:

* tree-vrp.c (lhs_of_dominating_assert): Remove.
(class vrp_jt_state): Remove.
(class vrp_jt_simplifier): Remove.
(vrp_jt_simplifier::simplify): Remove.
(class vrp_jump_threader): Remove.
(vrp_jump_threader::vrp_jump_threader): Remove.
(vrp_jump_threader::~vrp_jump_threader): Remove.
(vrp_jump_threader::before_dom_children): Remove.
(vrp_jump_threader::after_dom_children): Remove.
---
 gcc/tree-vrp.c | 308 ++---
 1 file changed, 7 insertions(+), 301 deletions(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index c55a7499c14..5aded5edb11 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -21,54 +21,34 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
-#include "backend.h"
-#include "insn-codes.h"
-#include "rtl.h"
+#include "basic-block.h"
+#include "bitmap.h"
+#include "sbitmap.h"
+#include "options.h"
+#include "dominance.h"
+#include "function.h"
+#include "cfg.h"
 #include "tree.h"
 #include "gimple.h"
-#include "cfghooks.h"
 #include "tree-pass.h"
 #include "ssa.h"
-#include "optabs-tree.h"
 #include "gimple-pretty-print.h"
-#include "flags.h"
 #include "fold-const.h"
-#include "stor-layout.h"
-#include "calls.h"
 #include "cfganal.h"
-#include "gimple-fold.h"
-#include "tree-eh.h"
 #include "gimple-iterator.h"
-#include "gimple-walk.h"
 #include "tree-cfg.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
-#include "tree-ssa-loop.h"
 #include "tree-into-ssa.h"
-#include "tree-ssa.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-propagate.h"
-#include "tree-chrec.h"
-#include "tree-ssa-threadupdate.h"
-#include "tree-ssa-scopedtables.h"
 #include "tree-ssa-threadedge.h"
-#include "omp-general.h"
-#include "target.h"
-#include "case-cfn-macros.h"
-#include "alloc-pool.h"
 #include "domwalk.h"
-#include "tree-cfgcleanup.h"
-#include "stringpool.h"
-#include "attribs.h"
 #include "vr-values.h"
-#include "builtins.h"
-#include "range-op.h"
-#include "value-range-equiv.h"
 #include "gimple-array-bounds.h"
 #include "gimple-range.h"
 #include "gimple-range-path.h"
-#include "tree-ssa-dom.h"
 
 /* Set of SSA names found live during the RPO traversal of the function
for still active basic-blocks.  */
@@ -2349,34 +2329,6 @@ stmt_interesting_for_vrp (gimple *stmt)
   return false;
 }
 
-
-/* Return the LHS of any ASSERT_EXPR where OP appears as the first
-   argument to the ASSERT_EXPR and in which the ASSERT_EXPR dominates
-   BB.  If no such ASSERT_EXPR is found, return OP.  */
-
-static tree
-lhs_of_dominating_assert (tree op, basic_block bb, gimple *stmt)
-{
-  imm_use_iterator imm_iter;
-  gimple *use_stmt;
-  use_operand_p use_p;
-
-  if (TREE_CODE (op) == SSA_NAME)
-{
-  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, op)
-   {
- use_stmt = USE_STMT (use_p);
- if (use_stmt != stmt
- && gimple_assign_single_p (use_stmt)
- && TREE_CODE (gimple_assign_rhs1 (use_stmt)) == ASSERT_EXPR
- && TREE_OPERAND (gimple_assign_rhs1 (use_stmt), 0) == op
- && dominated_by_p (CDI_DOMINATORS, bb, gimple_bb (use_stmt)))
-   return gimple_assign_lhs (use_stmt);
-   }
-}
-  return op;
-}
-
 /* Searches the case label vector VEC for the index *IDX of the CASE_LABEL
that includes the value VAL.  The search is restricted to the range
[START_IDX, n - 1] where n is the size of VEC.
@@ -4163,252 +4115,6 @@ vrp_folder::fold_stmt (gimple_stmt_iterator *si)
   return simplifier.simplify (si);
 }
 
-class vrp_jt_state : public jt_state
-{
-public:
-  vrp_jt_state (const_and_copies *copies, avail_exprs_stack *avails)
-: m_copies (copies), m_avails (avails)
-  {
-  }
-  void push (edge e) override
-  {
-m_copies->push_marker ();
-m_avails->push_marker ();
-jt_state::push (e);
-  }
-  void pop () override
-  {
-m_copies->pop_to_marker ();
-m_avails->pop_to_marker ();
-jt_state::pop ();
-  }
-  void register_equiv (tree dest, tree src, bool) override
-  {
-m_copies->record_const_or_copy (dest, src);
-  }
-  void register_equivs_edge (edge e) override
-  {
-record_temporary_equivalences (e, m_copies, m_avails);
-  }
-  void record_ranges_from_stmt (gimple *, bool) override
-  {
-  }
-private:
-  const_and_copies *m_copies;
-  avail_exprs_stack *m_avails;
-};
-
-class vrp_jt_simplifier : public jt_simplifier
-{
-public:
-  vrp_jt_simplifier (vr_values *v, avail_exprs_stack *avails)
-: m_vr_values (v), m_avail_exprs_stack (avails) { }
-
-private:
-  tree simplify (gimple *, gimple *, basic_block, jt_state *) override;
-  vr_values *m_vr_values;
-  avail_exprs_stack *m_avail_exprs_stack;
-};
-
-tree

Re: [PATCH] Replace VRP threader with a hybrid forward threader.

2021-09-27 Thread Aldy Hernandez via Gcc-patches




On 9/27/21 5:01 PM, Jeff Law wrote:



On 9/24/2021 9:46 AM, Aldy Hernandez wrote:

This patch implements the new hybrid forward threader and replaces the
embedded VRP threader with it.
But most importantly, it pulls it out of the VRP pass as we no longer 
need the VRP data or ASSERT_EXPRs.


Yes, I have a follow-up patch removing the old mini-pass.





With all the pieces that have gone in, the implementation of the hybrid
threader is straightforward: convert the current state into
SSA imports that the solver will understand, and let the path solver
precompute ranges and relations for the path.  After this setup is done,
we can use the range_query API to solve gimple statements in the 
threader.

The forward threader is now engine agnostic so there are no changes to
the threader per se.
So the big question is do we think it's going to be this clean when we 
try to divorce the threading from DOM?


Interestingly, yes.  With all the refactoring I've done, it turns out 
that divorcing evrp from the DOM threader is a matter of having 
dom_jt_simplifier inherit from hybrid_jt_simplifier instead of the base 
class.  Then we have simplify() look at the const_copies/avails, 
otherwise let the hybrid simplifier do its thing.  Yes, I was amazed too.


As usual there are caveats:

First, notice that we'd still depend on const_copies/avails, because 
we'd need them for floats anyhow.  But this has the added benefit of 
catching a few things in the presence of the IL changing from under us.


Second, it turns out that DOM has other uses of evrp that need to be 
addressed-- particularly its use of evrp to do its simple copy prop.


Be that as it may, none of these are show stoppers.  I have a proof of 
concept that converts everything with a few lines of code.


The big issue now is performance.  Plugging in the full ranger makes it 
uncomfortably slower than just using evrp.  Andrew has some ideas for a 
super fast ranger that doesn't do full look-ups, so we have finally 
found a good use case for something we had on the back burner.


Now, numbers...

Converting the DOM threader to a hybrid client improves DOM threading 
counts by 4%, but it's all at the expense of other passes.  The total 
threading counts was unchanged (well, it got worse by -0.05%).  It 
doesn't look like there's any gain.  We're shuffling things around at 
this point.






I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
because they will also be used in the evrp removal of the DOM/threader,
which is my next task.

Sweet.



Most of the patch, is actually test changes.  I have gone through every
single one and verified that we're correct.  Most were trivial dump
file name changes, but others required going through the IL and
certifying that the different IL was expected.

For example, in pr59597.c, we have one less thread because the
ASSERT_EXPR was getting in the way, and making it seem like things were
not crossing loops.  The hybrid threader sees the correct representation
of the IL, and avoids threading this one case.

The final numbers are a 12.16% improvement in jump threads immediately
after VRP, and a 0.82% improvement in overall jump threads.  The
performance drop is 0.6% (plus the 1.43% hit from moving the embedded
threader into its own pass).  As I've said, I'd prefer to keep the
threader in its own pass, but if this is an issue, we can address this
with a shared ranger when VRP is replaced with an evrp instance
(upcoming).
Presumably we're also seeing a cannibalization of threads from later 
passes.   And just to be clear, this is good.


And the big question, is the pass running after VRP2 doing anything 
particularly useful?  Do we want to try and kill it now, or later?


Interesting question.  Perhaps if we convert DOM threading to a hybrid 
model, it will render the post-VRP threader completely useless.  Huhh... 
That could kill 2 birds with one stone... we get rid of a threading 
pass, and we don't need to worry as much about the super-fast ranger.


Huh...good idea.  I will experiment.

Thanks.
Aldy



Re: [PATCH] Replace VRP threader with a hybrid forward threader.

2021-09-27 Thread Jeff Law via Gcc-patches




On 9/24/2021 9:46 AM, Aldy Hernandez wrote:

This patch implements the new hybrid forward threader and replaces the
embedded VRP threader with it.
But most importantly, it pulls it out of the VRP pass as we no longer 
need the VRP data or ASSERT_EXPRs.




With all the pieces that have gone in, the implementation of the hybrid
threader is straightforward: convert the current state into
SSA imports that the solver will understand, and let the path solver
precompute ranges and relations for the path.  After this setup is done,
we can use the range_query API to solve gimple statements in the threader.
The forward threader is now engine agnostic so there are no changes to
the threader per se.
So the big question is do we think it's going to be this clean when we 
try to divorce the threading from DOM?




I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
because they will also be used in the evrp removal of the DOM/threader,
which is my next task.

Sweet.



Most of the patch, is actually test changes.  I have gone through every
single one and verified that we're correct.  Most were trivial dump
file name changes, but others required going through the IL and
certifying that the different IL was expected.

For example, in pr59597.c, we have one less thread because the
ASSERT_EXPR was getting in the way, and making it seem like things were
not crossing loops.  The hybrid threader sees the correct representation
of the IL, and avoids threading this one case.

The final numbers are a 12.16% improvement in jump threads immediately
after VRP, and a 0.82% improvement in overall jump threads.  The
performance drop is 0.6% (plus the 1.43% hit from moving the embedded
threader into its own pass).  As I've said, I'd prefer to keep the
threader in its own pass, but if this is an issue, we can address this
with a shared ranger when VRP is replaced with an evrp instance
(upcoming).
Presumably we're also seeing a cannibalization of threads from later 
passes.   And just to be clear, this is good.


And the big question, is the pass running after VRP2 doing anything 
particularly useful?  Do we want to try and kill it now, or later?




As I mentioned in my introductory note, paths ending in a MEM_REF
conditional are missing.  In reality, this didn't make a difference, as
it was so rare.  However, as a follow-up, I will distill a test and add
a suitable PR to keep us honest.
Yea, I don't think these are going to be a notable issue for the 
threaders that were previously run out of VRP.  I'm less sure about DOM 
though.




There is a one-line change to libgomp/team.c silencing a new used
uninitialized warning.  As my previous work with the threaders has
shown, warnings flare up after each improvement to jump threading.  I
expect this to be no different.  I've promised Jakub to investigate
fully, so I will analyze and add the appropriate PR for the warning
experts.

ACK.




Oh yeah, the new pass dump is called vrp-threader[12] to match each
VRP[12] pass.  However, there's no reason for it to either be named
vrp-threader, or for it to live in tree-vrp.c.

Tested on x86-64 Linux.

OK?

p.s. "Did I say 5 weeks?  My bad, I meant 5 months."

gcc/ChangeLog:

* passes.def (pass_vrp_threader): New.
* tree-pass.h (make_pass_vrp_threader): Add make_pass_vrp_threader.
* tree-ssa-threadedge.c (hybrid_jt_state::register_equivs_stmt): New.
(hybrid_jt_simplifier::hybrid_jt_simplifier): New.
(hybrid_jt_simplifier::simplify): New.
(hybrid_jt_simplifier::compute_ranges_from_state): New.
* tree-ssa-threadedge.h (class hybrid_jt_state): New.
(class hybrid_jt_simplifier): New.
* tree-vrp.c (execute_vrp): Remove ASSERT_EXPR based jump
threader.
(class hybrid_threader): New.
(hybrid_threader::hybrid_threader): New.
(hybrid_threader::~hybrid_threader): New.
(hybrid_threader::before_dom_children): New.
(hybrid_threader::after_dom_children): New.
(execute_vrp_threader): New.
(class pass_vrp_threader): New.
(make_pass_vrp_threader): New.

libgomp/ChangeLog:

* team.c: Initialize start_data.
* testsuite/libgomp.graphite/force-parallel-4.c: Adjust.
* testsuite/libgomp.graphite/force-parallel-8.c: Adjust.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr55107.c: Adjust.
* gcc.dg/tree-ssa/phi_on_compare-1.c: Adjust.
* gcc.dg/tree-ssa/phi_on_compare-2.c: Adjust.
* gcc.dg/tree-ssa/phi_on_compare-3.c: Adjust.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
* gcc.dg/tree-ssa/pr21559.c: Adjust.
* gcc.dg/tree-ssa/pr59597.c: Adjust.
* gcc.dg/tree-ssa/pr61839_1.c: Adjust.
* gcc.dg/tree-ssa/pr61839_3.c: Adjust.
* gcc.dg/tree-ssa/pr71437.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-18.c: 

[PATCH] Control all jump threading passes with -fthread-jumps.

2021-09-27 Thread Aldy Hernandez via Gcc-patches
Last year I mentioned that -fthread-jumps was being ignored by the
majority of our jump threading passes, and Jeff said he'd be in favor
of fixing this.

This patch remedies the situation, but it does change existing behavior.
Currently -fthread-jumps is only enabled for -O2, -O3, and -Os.  This
means that even if we restricted all jump threading passes with
-fthread-jumps, DOM jump threading would still seep through since it
runs at -O1.

I propose this patch, but it does mean that DOM jump threading would
have to be explicitly enabled with -O1 -fthread-jumps.  An
alternative would be to also offer a specific -fno-dom-threading, but
that seems icky.

OK pending tests?

gcc/ChangeLog:

* tree-ssa-threadbackward.c (pass_thread_jumps::gate): Check
flag_thread_jumps.
(pass_early_thread_jumps::gate): Same.
* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
Return if !flag_thread_jumps.
* tree-ssa-threadupdate.c
(jt_path_registry::register_jump_thread): Assert that
flag_thread_jumps is true.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr18134.c   | 2 +-
 gcc/tree-ssa-threadbackward.c | 4 ++--
 gcc/tree-ssa-threadedge.c | 3 +++
 gcc/tree-ssa-threadupdate.c   | 2 ++
 5 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c
index 8717640e327..1b409852189 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-optimized-blocks" } */
+/* { dg-options "-O1 -fthread-jumps -fdump-tree-optimized-blocks" } */
 
 int c, d;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c
index cd40ab2c162..d7f5d241eb9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-optimized" } */
+/* { dg-options "-O1 -fthread-jumps -fdump-tree-optimized" } */
 
 int  foo (int a)
 {
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 95542079faf..8940728cbf2 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -943,7 +943,7 @@ public:
 bool
 pass_thread_jumps::gate (function *fun ATTRIBUTE_UNUSED)
 {
-  return flag_expensive_optimizations;
+  return flag_thread_jumps && flag_expensive_optimizations;
 }
 
 // Try to thread blocks in FUN.  Return TRUE if any jump thread paths were
@@ -1013,7 +1013,7 @@ public:
 bool
 pass_early_thread_jumps::gate (function *fun ATTRIBUTE_UNUSED)
 {
-  return true;
+  return flag_thread_jumps;
 }
 
 unsigned int
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index ae77e5eb396..e6f0ff0b54b 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -1195,6 +1195,9 @@ jump_threader::thread_outgoing_edges (basic_block bb)
   int flags = (EDGE_IGNORE | EDGE_COMPLEX | EDGE_ABNORMAL);
   gimple *last;
 
+  if (!flag_thread_jumps)
+return;
+
   /* If we have an outgoing edge to a block with multiple incoming and
  outgoing edges, then we may be able to thread the edge, i.e., we
  may be able to statically determine which of the outgoing edges
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 2b9b8f81274..cf96c903668 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2822,6 +2822,8 @@ jt_path_registry::cancel_invalid_paths 
(vec )
 bool
 jt_path_registry::register_jump_thread (vec *path)
 {
+  gcc_checking_assert (flag_thread_jumps);
+
   if (!dbg_cnt (registered_jump_thread))
 {
   path->release ();
-- 
2.31.1



Re: [PATCH] Update pathname for IBM long double description.

2021-09-27 Thread Jeff Law via Gcc-patches




On 9/27/2021 5:17 AM, Vincent Lefevre wrote:

Update due to file moved to libgcc/config/rs6000/ibm-ldouble-format
in commit aca0b0b315f6e5a0ee60981fd4b0cbc9a7f59096.

Signed-off-by: Vincent Lefevre 
---
  include/floatformat.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks.  Pushed to the trunk.

jeff



[PATCH] c++: deduction guides and ttp rewriting [PR102479]

2021-09-27 Thread Patrick Palka via Gcc-patches
The problem here is ultimately that rewrite_tparm_list when rewriting a
TEMPLATE_TEMPLATE_PARM introduces a tree cycle in the rewritten
ttp that structural_comptypes can't cope with.  In particular the
DECL_TEMPLATE_PARMS of a ttp's TEMPLATE_DECL normally captures an empty
parameter list at its own level (and so the TEMPLATE_DECL doesn't appear
in its own DECL_TEMPLATE_PARMS), but rewrite_tparm_list ends up giving
it a complete parameter list.  In the new testcase below, this causes
infinite recursion from structural_comptypes when comparing Tmpl
with Tmpl (here both 'Tmpl's are rewritten).

This patch fixes this by making rewrite_template_parm give a rewritten
template template parm an empty parameter list at its own level, thereby
avoiding the tree cycle.  Testing the alias CTAD case revealed that
we're not setting current_template_parms in alias_ctad_tweaks, which
this patch also fixes.  Also, the change to use TMPL_ARGS_LEVEL instead
of TREE_VEC_ELT is needed because alias_ctad_tweaks passes only a single
level of targs to rewrite_tparm_list.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/102479

gcc/cp/ChangeLog:

* pt.c (rewrite_template_parm): Use TMPL_ARGS_LEVEL instead of
TREE_VEC_ELT directly to properly handle one-level tsubst_args.
Avoid a tree cycle when assigning the DECL_TEMPLATE_PARMS for a
rewritten ttp.
(alias_ctad_tweaks): Set current_template_parms accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction12.C: Also test alias CTAD in the
same way.
* g++.dg/cpp1z/class-deduction99.C: New test.
---
 gcc/cp/pt.c   | 20 +--
 .../g++.dg/cpp1z/class-deduction12.C  |  6 
 .../g++.dg/cpp1z/class-deduction99.C  | 35 +++
 3 files changed, 58 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction99.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6bd6ceb29be..cba0f5c8279 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28754,7 +28754,7 @@ rewrite_template_parm (tree olddecl, unsigned index, 
unsigned level,
  const int depth = TMPL_ARGS_DEPTH (tsubst_args);
  tree ttargs = make_tree_vec (depth + 1);
  for (int i = 0; i < depth; ++i)
-   TREE_VEC_ELT (ttargs, i) = TREE_VEC_ELT (tsubst_args, i);
+   TREE_VEC_ELT (ttargs, i) = TMPL_ARGS_LEVEL (tsubst_args, i + 1);
  TREE_VEC_ELT (ttargs, depth)
= template_parms_level_to_args (ttparms);
  // Substitute ttargs into ttparms to fix references to
@@ -28767,8 +28767,17 @@ rewrite_template_parm (tree olddecl, unsigned index, 
unsigned level,
  ttparms = tsubst_template_parms_level (ttparms, ttargs,
 complain);
  // Finally, tack the adjusted parms onto tparms.
- ttparms = tree_cons (size_int (depth), ttparms,
-  current_template_parms);
+ ttparms = tree_cons (size_int (level + 1), ttparms,
+  copy_node (current_template_parms));
+ // As with all template template parms, the parameter list captured
+ // by this template template parm that corresponds to its own level
+ // should be empty.  This avoids infinite recursion when structurally
+ // comparing two such rewritten template template parms (102479).
+ gcc_assert (!TREE_VEC_LENGTH
+ (TREE_VALUE (TREE_CHAIN (DECL_TEMPLATE_PARMS 
(olddecl);
+ gcc_assert (TMPL_PARMS_DEPTH (TREE_CHAIN (ttparms)) == level);
+ TREE_VALUE (TREE_CHAIN (ttparms)) = make_tree_vec (0);
+ // All done.
  DECL_TEMPLATE_PARMS (newdecl) = ttparms;
}
 }
@@ -29266,6 +29275,11 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  ++ndlen;
  tree gtparms = make_tree_vec (natparms + ndlen);
 
+ /* Set current_template_parms as in build_deduction_guide.  */
+ auto ctp = make_temp_override (current_template_parms);
+ current_template_parms = copy_node (DECL_TEMPLATE_PARMS (tmpl));
+ TREE_VALUE (current_template_parms) = gtparms;
+
  /* First copy over the parms of A.  */
  for (j = 0; j < natparms; ++j)
TREE_VEC_ELT (gtparms, j) = TREE_VEC_ELT (atparms, j);
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction12.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction12.C
index a31cc1526db..f0d7ea0e16b 100644
--- a/gcc/testsuite/g++.dg/cpp1z/class-deduction12.C
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction12.C
@@ -15,3 +15,9 @@ A a(,2,B<42>());
 template  class same;
 template  class same {};
 same> s;
+
+#if __cpp_deduction_guides >= 201907
+template  using C = A;
+
+same())), A> t;
+#endif
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction99.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction99.C
new file mode 100644
index 

Re: [PATCH] tree-optimization/100112 - VN last_vuse and redundant store elimination

2021-09-27 Thread Richard Biener via Gcc-patches
On Mon, Sep 27, 2021 at 3:08 PM Richard Biener via Gcc-patches
 wrote:
>
> This avoids the last_vuse optimization hindering redundant store
> elimination by always also recording the original VUSE that was
> in effect on the load.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> I'm still pondering on how to avoid the wastage of adding the ref
> twice and will at least record some statistics for this.

In stage3 gcc/*.o we have 3182752 times recorded a single
entry and 903409 times two entries (that's ~20% overhead).

With just recording a single entry the number of hashtable lookups
done when walking the vuse->vdef links to find an earlier access
is 28961618.  When recording the second entry this makes us find
that earlier for downstream redundant accesses, reducing the number
of hashtable lookups to 25401052 (that's a ~10% reduction).

Overall I think it's a reasonable trade-off but as said, I'm pondering
a bit on how to reduce the overhead without too ugly hacks.

Richard.

> 2021-09-27  Richard Biener  
>
> PR tree-optimization/100112
> * tree-ssa-sccvn.c (visit_reference_op_load): Record the
> reference into the hashtable twice in case last_vuse is
> different from the original vuse on the stmt.
>
> * gcc.dg/tree-ssa/ssa-fre-95.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c | 25 ++
>  gcc/tree-ssa-sccvn.c   | 17 +++
>  2 files changed, 38 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
> new file mode 100644
> index 000..b0936be5e77
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
> @@ -0,0 +1,25 @@
> +/* PR100112 and dups.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-fre1-details -fdump-tree-optimized" } */
> +
> +int *c, *b;
> +void foo()
> +{
> +  int *tem = b;
> +  *tem = 0;
> +  int *footem = c;
> +  c = footem;
> +}
> +
> +void bar()
> +{
> +  int *tem = b;
> +  int *bartem = c;
> +  *tem = 0;
> +  c = bartem;
> +}
> +
> +/* We should elide the redundant store in foo, in bar it is not redundant 
> since
> +   the *tem = 0 store might alias.  */
> +/* { dg-final { scan-tree-dump "Deleted redundant store c = footem" "fre1" } 
> } */
> +/* { dg-final { scan-tree-dump "c = bartem" "optimized" } } */
> diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
> index e8b1c39184d..416a5252144 100644
> --- a/gcc/tree-ssa-sccvn.c
> +++ b/gcc/tree-ssa-sccvn.c
> @@ -5125,13 +5125,12 @@ static bool
>  visit_reference_op_load (tree lhs, tree op, gimple *stmt)
>  {
>bool changed = false;
> -  tree last_vuse;
>tree result;
>vn_reference_t res;
>
> -  last_vuse = gimple_vuse (stmt);
> -  result = vn_reference_lookup (op, gimple_vuse (stmt),
> -   default_vn_walk_kind, , true, _vuse);
> +  tree vuse = gimple_vuse (stmt);
> +  tree last_vuse = vuse;
> +  result = vn_reference_lookup (op, vuse, default_vn_walk_kind, , true, 
> _vuse);
>
>/* We handle type-punning through unions by value-numbering based
>   on offset and size of the access.  Be prepared to handle a
> @@ -5174,6 +5173,16 @@ visit_reference_op_load (tree lhs, tree op, gimple 
> *stmt)
>  {
>changed = set_ssa_val_to (lhs, lhs);
>vn_reference_insert (op, lhs, last_vuse, NULL_TREE);
> +  if (vuse && SSA_VAL (last_vuse) != SSA_VAL (vuse))
> +   {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> + fprintf (dump_file, "Using extra use virtual operand ");
> + print_generic_expr (dump_file, last_vuse);
> + fprintf (dump_file, "\n");
> +   }
> + vn_reference_insert (op, lhs, vuse, NULL_TREE);
> +   }
>  }
>
>return changed;
> --
> 2.31.1


[PATCH] libstdc++: Clear padding bits in atomic compare_exchange

2021-09-27 Thread Thomas Rodgers
From: Thomas Rodgers 

Now with checks for __has_builtin(__builtin_clear_padding)

This change implements P0528, which requires that padding bits not
participate in atomic compare-exchange operations.  All arguments to the
generic template are 'sanitized' by the __builtin_clear_padding intrinsic
before they are used in comparisons.  This also requires that any stores
sanitize the incoming value.

Signed-off-by: Thomas Rodgers 

libstdc++-v3/ChangeLog:

* include/std/atomic (atomic::atomic(_Tp)): Clear padding for
__cplusplus > 201703L.
(atomic::store()): Clear padding.
(atomic::exchange()): Likewise.
(atomic::compare_exchange_weak()): Likewise.
(atomic::compare_exchange_strong()): Likewise.
* testsuite/29_atomics/atomic/compare_exchange_padding.cc: New
test.
---
 libstdc++-v3/include/std/atomic   | 41 +-
 .../atomic/compare_exchange_padding.cc| 42 +++
 2 files changed, 81 insertions(+), 2 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/29_atomics/atomic/compare_exchange_padding.cc

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 936dd50ba1c..4ac9ccdc1ab 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -228,7 +228,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   atomic& operator=(const atomic&) = delete;
   atomic& operator=(const atomic&) volatile = delete;
 
-  constexpr atomic(_Tp __i) noexcept : _M_i(__i) { }
+#if __cplusplus > 201703L && __has_builtin(__builtin_clear_padding)
+  constexpr atomic(_Tp __i) noexcept : _M_i(__i)
+  { __builtin_clear_padding(std::__addressof(_M_i)); }
+#else
+  constexpr atomic(_Tp __i) noexcept : _M_i(__i)
+  { }
+#endif
 
   operator _Tp() const noexcept
   { return load(); }
@@ -268,12 +274,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   store(_Tp __i, memory_order __m = memory_order_seq_cst) noexcept
   {
+#if __has_builtin(__builtin_clear_padding)
+   __builtin_clear_padding(std::__addressof(__i));
+#endif
__atomic_store(std::__addressof(_M_i), std::__addressof(__i), int(__m));
   }
 
   void
   store(_Tp __i, memory_order __m = memory_order_seq_cst) volatile noexcept
   {
+#if __has_builtin(__builtin_clear_padding)
+   __builtin_clear_padding(std::__addressof(__i));
+#endif
__atomic_store(std::__addressof(_M_i), std::__addressof(__i), int(__m));
   }
 
@@ -300,6 +312,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
 alignas(_Tp) unsigned char __buf[sizeof(_Tp)];
_Tp* __ptr = reinterpret_cast<_Tp*>(__buf);
+#if __has_builtin(__builtin_clear_padding)
+   __builtin_clear_padding(std::__addressof(__i));
+#endif
__atomic_exchange(std::__addressof(_M_i), std::__addressof(__i),
  __ptr, int(__m));
return *__ptr;
@@ -311,6 +326,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
 alignas(_Tp) unsigned char __buf[sizeof(_Tp)];
_Tp* __ptr = reinterpret_cast<_Tp*>(__buf);
+#if __has_builtin(__builtin_clear_padding)
+   __builtin_clear_padding(std::__addressof(__i));
+#endif
__atomic_exchange(std::__addressof(_M_i), std::__addressof(__i),
  __ptr, int(__m));
return *__ptr;
@@ -322,6 +340,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
__glibcxx_assert(__is_valid_cmpexch_failure_order(__f));
 
+#if __has_builtin(__builtin_clear_padding)
+   __builtin_clear_padding(std::__addressof(__e));
+   __builtin_clear_padding(std::__addressof(__i));
+#endif
return __atomic_compare_exchange(std::__addressof(_M_i),
 std::__addressof(__e),
 std::__addressof(__i),
@@ -334,6 +356,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
__glibcxx_assert(__is_valid_cmpexch_failure_order(__f));
 
+#if __has_builtin(__builtin_clear_padding)
+   __builtin_clear_padding(std::__addressof(__e));
+   __builtin_clear_padding(std::__addressof(__i));
+#endif
return __atomic_compare_exchange(std::__addressof(_M_i),
 std::__addressof(__e),
 std::__addressof(__i),
@@ -358,6 +384,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
__glibcxx_assert(__is_valid_cmpexch_failure_order(__f));
 
+#if __has_builtin(__builtin_clear_padding)
+   __builtin_clear_padding(std::__addressof(__e));
+   __builtin_clear_padding(std::__addressof(__i));
+#endif
return __atomic_compare_exchange(std::__addressof(_M_i),
 std::__addressof(__e),
 std::__addressof(__i),
@@ -370,6 +400,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
__glibcxx_assert(__is_valid_cmpexch_failure_order(__f));
 
+#if __has_builtin(__builtin_clear_padding)
+   

RE: [PATCH] aarch64: Fix type qualifiers for qtbl1 and qtbx1 Neon builtins

2021-09-27 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Jonathan
> Wright via Gcc-patches
> Sent: 24 September 2021 13:54
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford 
> Subject: [PATCH] aarch64: Fix type qualifiers for qtbl1 and qtbx1 Neon
> builtins
> 
> Hi,
> 
> This patch fixes type qualifiers for the qtbl1 and qtbx1 Neon builtins
> and removes the casts from the Neon intrinsic function bodies that
> use these builtins.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 23-09-2021  Jonathan Wright  
> 
>   * config/aarch64/aarch64-builtins.c (TYPES_BINOP_PPU): Define
>   new type qualifier enum.
>   (TYPES_TERNOP_SSSU): Likewise.
>   (TYPES_TERNOP_PPPU): Likewise.
>   * config/aarch64/aarch64-simd-builtins.def: Define PPU, SSU,
>   PPPU and SSSU builtin generator macros for qtbl1 and qtbx1
>   Neon builtins.
>   * config/aarch64/arm_neon.h (vqtbl1_p8): Use type-qualified
>   builtin and remove casts.
>   (vqtbl1_s8): Likewise.
>   (vqtbl1q_p8): Likewise.
>   (vqtbl1q_s8): Likewise.
>   (vqtbx1_s8): Likewise.
>   (vqtbx1_p8): Likewise.
>   (vqtbx1q_s8): Likewise.
>   (vqtbx1q_p8): Likewise.
>   (vtbl1_p8): Likewise.
>   (vtbl2_p8): Likewise.
>   (vtbx2_p8): Likewise.


Re: [RFC] Experimental __attribute__((saturating)) on integer types.

2021-09-27 Thread Richard Biener via Gcc-patches
On Sun, Sep 26, 2021 at 10:38 PM Roger Sayle  wrote:
>
>
> This patch is prototype proof-of-concept (and request for feedback)
> that touches the front-end, middle-end and backend.  My recent patch to
> perform RTL constant folding of saturating arithmetic revealed how
> difficult it is to generate a (portable) test case for that functionality.
> This patch experiments with adding an "saturating" attribute to the
> C-family front-ends to set the TYPE_SATURATING flag on integer types,
> initially as a debugging/testing tool for the middle-end.  GCC already
> contains logic during RTL expansion to emit [us]s_plus and [us]s_minus
> instructions via the standard named [us]ss{add,sub}3 optabs.
>
> Disappointingly, although the documentation for ssplus3 patterns
> implies this should work for arbitrary (i.e. integer) modes, the
> optab querying infrastructure (based on optabs.def) is currently
> limited to fixed-point modes.  Hence the patch below contains a
> tweak to optabs.def.
>
> With both of the above pieces in place, GCC can now generate an
> ssaddsi3 instruction (such as the example provided for the nvptx
> backend), or ICE if the required saturating operation doesn't exist,
> as libgcc doesn't (yet) provide fall-back implementations for
> saturating signed and unsigned arithmetic.
>
> Sticking with the positive, the following code:
>
> typedef int sat_int32 __attribute__ ((saturating));
> int ssadd32(int x, int y) {
>   sat_int32 t = (sat_int32)x + (sat_int32)y;
>   return (int)t;
> }
>
> with this patch, now generates the following on nvptx-none:
>
> mov.u32 %r23, %ar0;
> mov.u32 %r24, %ar1;
> add.sat.s32 %value, %r23, %r24;
>
>
> Are any of the independent chunks below suitable for the compiler?
> Tested on nvptx-none and x86_64-pc-linux-gnu, but nothing changes
> unless __attribute__ ((saturating)) is explicitly added to the source
> code [and I'd recommend against that except for testing purposes].
>
> Eventually saturating arithmetic such as this might be useful for
> kernel security (a hot topic of last week's Linux Plumbers' Conference)
> but it would require a lot of polishing to clean-up the rough edges
> (and ideally better hardware support).
>
> Thoughts?  Even if a new C-family attribute is unsuitable, is my
> logic/implementation in handle_saturating_attribute correct?

I wonder if you need to use tricks like those in handle_vector_size_attribute
to handle say

 __attribute__((saturating)) int foo(void);

Now - ISTR that elsewhere Joseph suggested that taking on
saturating operations by type was eventually misguided and we should
have instead added saturating arithmetic tree codes that we could
expose via some builtin functions like the overflow ones.

Btw, I do welcome patches like this to eventually make the
types accessible to the GIMPLE frontend though we might need
something like 'stopat' to stop us from trying to expand things to
RTL when not all targets support saturating arithmetic and we
have no fallback libgcc implementation.

I think the print-tree bits are OK.

Joseph may want to chime in as to whether it's good to expose
saturating "types" more or whether that works against any intent
to retire that detail.

Richard.

>
> 2021-09-26  Roger Sayle  
>
> gcc/c-family/ChangeLog
> * c-attribs (handle_saturating_attribute): New callback function
> for a "saturating" attribute to set the TYPE_SATURATING flag on
> an integer type.
> (c_common_attribute_table): New entry for "saturating".
>
> gcc/ChangeLog
> * config/nvptx/nvptx.md (ssaddsi3, sssubsi3): New define_insn
> patterns for SImode saturating addition/subtraction respectively.
>
> * optabs.def (ssadd_optab, usadd_optab, ssub_optab, usub_optab):
> Allow querying of integer modes in addition to fixed-point modes.
>
> * print-tree.c (print_node): Output "saturating" when the
> TYPE_SATURATING flag is set on integer types.
>
> Roger
> --
>


[PATCH] tree-optimization/100112 - VN last_vuse and redundant store elimination

2021-09-27 Thread Richard Biener via Gcc-patches
This avoids the last_vuse optimization hindering redundant store
elimination by always also recording the original VUSE that was
in effect on the load.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I'm still pondering on how to avoid the wastage of adding the ref
twice and will at least record some statistics for this.

2021-09-27  Richard Biener  

PR tree-optimization/100112
* tree-ssa-sccvn.c (visit_reference_op_load): Record the
reference into the hashtable twice in case last_vuse is
different from the original vuse on the stmt.

* gcc.dg/tree-ssa/ssa-fre-95.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c | 25 ++
 gcc/tree-ssa-sccvn.c   | 17 +++
 2 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
new file mode 100644
index 000..b0936be5e77
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
@@ -0,0 +1,25 @@
+/* PR100112 and dups.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1-details -fdump-tree-optimized" } */
+
+int *c, *b;
+void foo()
+{
+  int *tem = b;
+  *tem = 0;
+  int *footem = c;
+  c = footem;
+}
+
+void bar()
+{
+  int *tem = b;
+  int *bartem = c;
+  *tem = 0;
+  c = bartem;
+}
+
+/* We should elide the redundant store in foo, in bar it is not redundant since
+   the *tem = 0 store might alias.  */
+/* { dg-final { scan-tree-dump "Deleted redundant store c = footem" "fre1" } } */
+/* { dg-final { scan-tree-dump "c = bartem" "optimized" } } */
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index e8b1c39184d..416a5252144 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -5125,13 +5125,12 @@ static bool
 visit_reference_op_load (tree lhs, tree op, gimple *stmt)
 {
   bool changed = false;
-  tree last_vuse;
   tree result;
   vn_reference_t res;
 
-  last_vuse = gimple_vuse (stmt);
-  result = vn_reference_lookup (op, gimple_vuse (stmt),
-   default_vn_walk_kind, &res, true, &last_vuse);
+  tree vuse = gimple_vuse (stmt);
+  tree last_vuse = vuse;
+  result = vn_reference_lookup (op, vuse, default_vn_walk_kind, &res, true,
+   &last_vuse);
 
   /* We handle type-punning through unions by value-numbering based
  on offset and size of the access.  Be prepared to handle a
@@ -5174,6 +5173,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
 {
   changed = set_ssa_val_to (lhs, lhs);
   vn_reference_insert (op, lhs, last_vuse, NULL_TREE);
+  if (vuse && SSA_VAL (last_vuse) != SSA_VAL (vuse))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Using extra use virtual operand ");
+ print_generic_expr (dump_file, last_vuse);
+ fprintf (dump_file, "\n");
+   }
+ vn_reference_insert (op, lhs, vuse, NULL_TREE);
+   }
 }
 
   return changed;
-- 
2.31.1


[PATCH] middle-end/102450 - avoid type_for_size for non-existing modes

2021-09-27 Thread Richard Biener via Gcc-patches
This avoids asking type_for_size for types with sizes for which
no scalar integer mode exists.  Instead the following uses
int_mode_for_size to get the same result.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-09-27  Richard Biener  

PR middle-end/102450
* gimple-fold.c (gimple_fold_builtin_memory_op): Avoid using
type_for_size, instead use int_mode_for_size.
---
 gcc/gimple-fold.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 6fea8a6f9fd..474d0f44375 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1001,9 +1001,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
  return false;
 
  scalar_int_mode mode;
- tree type = lang_hooks.types.type_for_size (ilen * 8, 1);
- if (type
- && is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
+ if (int_mode_for_size (ilen * 8, 0).exists (&mode)
  && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
  && have_insn_for (SET, mode)
  /* If the destination pointer is not aligned we must be able
@@ -1013,6 +1011,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
  || (optab_handler (movmisalign_optab, mode)
  != CODE_FOR_nothing)))
{
+ tree type = build_nonstandard_integer_type (ilen * 8, 1);
  tree srctype = type;
  tree desttype = type;
  if (src_align < GET_MODE_ALIGNMENT (mode))
-- 
2.31.1


Re: [PATCH] [GIMPLE] Simplify (_Float16) ceil ((double) x) to .CEIL (x) when available.

2021-09-27 Thread Richard Biener via Gcc-patches
On Fri, Sep 24, 2021 at 1:26 PM liuhongt  wrote:
>
> Hi:
>   Related discussion in [1] and PR.
>
>   Bootstrapped and regtest on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574330.html
>
> gcc/ChangeLog:
>
> PR target/102464
> * config/i386/i386.c (ix86_optab_supported_p):
> Return true for HFmode.
> * match.pd: Simplify (_Float16) ceil ((double) x) to
> __builtin_ceilf16 (a) when a is _Float16 type and
> direct_internal_fn_supported_p.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr102464.c: New test.
> ---
>  gcc/config/i386/i386.c   | 20 +++-
>  gcc/match.pd | 28 +
>  gcc/testsuite/gcc.target/i386/pr102464.c | 39 
>  3 files changed, 79 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ba89e111d28..3767fe9806d 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -23582,20 +23582,24 @@ ix86_optab_supported_p (int op, machine_mode mode1, 
> machine_mode,
>return opt_type == OPTIMIZE_FOR_SPEED;
>
>  case rint_optab:
> -  if (SSE_FLOAT_MODE_P (mode1)
> - && TARGET_SSE_MATH
> - && !flag_trapping_math
> - && !TARGET_SSE4_1)
> +  if (mode1 == HFmode)
> +   return true;
> +  else if (SSE_FLOAT_MODE_P (mode1)
> +  && TARGET_SSE_MATH
> +  && !flag_trapping_math
> +  && !TARGET_SSE4_1)
> return opt_type == OPTIMIZE_FOR_SPEED;
>return true;
>
>  case floor_optab:
>  case ceil_optab:
>  case btrunc_optab:
> -  if (SSE_FLOAT_MODE_P (mode1)
> - && TARGET_SSE_MATH
> - && !flag_trapping_math
> - && TARGET_SSE4_1)
> +  if (mode1 == HFmode)
> +   return true;
> +  else if (SSE_FLOAT_MODE_P (mode1)
> +  && TARGET_SSE_MATH
> +  && !flag_trapping_math
> +  && TARGET_SSE4_1)
> return true;
>return opt_type == OPTIMIZE_FOR_SPEED;
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a9791ceb74a..9ccec8b6ce3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6191,6 +6191,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (froms (convert float_value_p@0))
> (convert (tos @0)
>
> +#if GIMPLE
> +(match float16_value_p
> + @0
> + (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == float16_type_node)))
> +(for froms (BUILT_IN_TRUNCL BUILT_IN_TRUNC BUILT_IN_TRUNCF
> +   BUILT_IN_FLOORL BUILT_IN_FLOOR BUILT_IN_FLOORF
> +   BUILT_IN_CEILL BUILT_IN_CEIL BUILT_IN_CEILF
> +   BUILT_IN_ROUNDEVENL BUILT_IN_ROUNDEVEN BUILT_IN_ROUNDEVENF
> +   BUILT_IN_ROUNDL BUILT_IN_ROUND BUILT_IN_ROUNDF
> +   BUILT_IN_NEARBYINTL BUILT_IN_NEARBYINT BUILT_IN_NEARBYINTF
> +   BUILT_IN_RINTL BUILT_IN_RINT BUILT_IN_RINTF)

we do have patterns that convert (truncl (convert floatval)) to
(float)truncf (val),
yours does (_Float16)trunc ((double) float16) -> truncF16 (float16); doesn't it
make sense to have trunc ((double) float16) -> (double)truncF16
(float16) as well?

Why do you conditionalize on GIMPLE here?

That said, I wonder whether we can somehow address pattern explosion here,
eliding the outer (convert ...) from the match would help a bit already.

The related patterns use optimize && canonicalize_math_p as well btw., not
sure whether either is appropriate here since there are no _Float16 math
functions available.
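The reason such narrowing transforms are valid at all is that the promotion is exact: every _Float16 value converts to double without rounding, so rounding in the wider type and narrowing back cannot change the result. A small sketch of that exactness property using float/double (float standing in for _Float16, so no FP16 support is needed to demonstrate it):

```c
/* Check that a float survives a round-trip through double unchanged;
   this is the exactness property that transforms of the shape
   (_Float16) ceil ((double) x) -> .CEIL (x) rely on.  */
static int
promotes_exactly (float x)
{
  double d = (double) x;   /* widening is value-preserving */
  return (float) d == x;   /* narrowing back recovers x    */
}
```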

> + tos (IFN_TRUNC IFN_TRUNC IFN_TRUNC
> + IFN_FLOOR IFN_FLOOR IFN_FLOOR
> + IFN_CEIL IFN_CEIL IFN_CEIL
> + IFN_ROUNDEVEN IFN_ROUNDEVEN IFN_ROUNDEVEN
> + IFN_ROUND IFN_ROUND IFN_ROUND
> + IFN_NEARBYINT IFN_NEARBYINT IFN_NEARBYINT
> + IFN_RINT IFN_RINT IFN_RINT)
> + /* (_Float16) round ((double) x) -> __builtin_roundf16 (x), etc.,
> +if x is a _Float16.  */
> + (simplify
> +   (convert (froms (convert float16_value_p@0)))
> + (if (types_match (type, TREE_TYPE (@0))
> + && direct_internal_fn_supported_p (as_internal_fn (tos),
> +type, OPTIMIZE_FOR_BOTH))
> +   (tos @0
> +#endif
> +
>  (for froms (XFLOORL XCEILL XROUNDL XRINTL)
>   tos (XFLOOR XCEIL XROUND XRINT)
>   /* llfloorl(extend(x)) -> llfloor(x), etc., if x is a double.  */
> diff --git a/gcc/testsuite/gcc.target/i386/pr102464.c 
> b/gcc/testsuite/gcc.target/i386/pr102464.c
> new file mode 100644
> index 000..e3e060ee80b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr102464.c
> @@ -0,0 +1,39 @@
> +/* PR target/102464.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +#define FOO(FUNC,SUFFIX)   \
> +  _Float16 \
> +  foo_##FUNC##_##SUFFIX (_Float16 a)   \
> +  { 

[committed] libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note (was: [Patch] Fortran: Fix assumed-size to assumed-rank passing [PR94070])

2021-09-27 Thread Tobias Burnus

On 27.09.21 14:07, Tobias Burnus wrote:

now committed r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0


I accidentally changed dg-note to dg-message when updating the expected
output after the dump changed (seemingly copying the sorry line
instead of the dg-note lines as a template).

Changed back to dg-note & committed as
r12-3898-gda1f6391b7c255e4e2eea983832120eff4f7d3df.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit da1f6391b7c255e4e2eea983832120eff4f7d3df
Author: Tobias Burnus 
Date:   Mon Sep 27 14:33:39 2021 +0200

libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note

In my last commit, r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0,
which inlined array-size code, I had to update the expected output.  However,
in doing so, I accidentally (copy'n'paste) changed dg-note into dg-message.

libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Change
dg-message back to dg-note.

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
index 2ff60226109..588f528b2d5 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
@@ -75,9 +75,9 @@ contains
 ! { dg-note {variable 'A\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: static} "" { target *-*-* } l_compute$c_compute }
 array = [(-2*i, i = 1, size(array))]
 !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
-! { dg-message {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-! { dg-message {variable 'array\.[0-9]+' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
-! { dg-message {variable 'array\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
+! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
+! { dg-note {variable 'array\.[0-9]+' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+! { dg-note {variable 'array\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
 
 ! { dg-message {sorry, unimplemented: target cannot support alloca} PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
 


Re: [PATCH] ipa: Fix ICE when speculating calls from inlined functions (PR 102388)

2021-09-27 Thread Martin Liška

On 9/23/21 19:26, Martin Jambor wrote:

Hi,

The code handling various cases which lead to call graph edge
duplication (in order to update reference descriptions used to track
and remove no-longer needed references) has missed one important case.

When edge duplication is an effect of creating a speculative edge for
an indirect edge which carries a constant jump function which had been
created from a pass-through function when the edge caller has was
inlined into one of its callers, the reference description attached to
the function describes an edge higher up in the "inlined" clone tree
and so even the new speculative edge will.  Therefore we should not
try to duplicate the reference description itself but rather just bump
the refcount of the existing one.

Creating a small testcase unfortunately is not very straightforward, I
have not attempted to trigger just the right speculation after inlining.

Bootstrapped and tested on an x86_64-linux.  OK for trunk?


Approved.

Martin



Thanks,

Martin


gcc/ChangeLog:

2021-09-22  Martin Jambor  

PR ipa/102388
* ipa-prop.c (ipa_edge_args_sum_t::duplicate): Also handle the
case when the source reference description corresponds to a
referance taken in a function src->caller is inlined to.
reference taken in a function src->caller is inlined to.
---
  gcc/ipa-prop.c | 40 +++-
  1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 1c69d9766c5..443f21ce61b 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -4428,19 +4428,33 @@ ipa_edge_args_sum_t::duplicate (cgraph_edge *src, 
cgraph_edge *dst,
dst_jf->value.constant.rdesc = NULL;
  else if (src->caller == dst->caller)
{
- struct ipa_ref *ref;
- symtab_node *n = symtab_node_for_jfunc (src_jf);
- gcc_checking_assert (n);
- ref = src->caller->find_reference (n, src->call_stmt,
-src->lto_stmt_uid);
- gcc_checking_assert (ref);
- dst->caller->clone_reference (ref, ref->stmt);
-
- struct ipa_cst_ref_desc *dst_rdesc = ipa_refdesc_pool.allocate ();
- dst_rdesc->cs = dst;
- dst_rdesc->refcount = src_rdesc->refcount;
- dst_rdesc->next_duplicate = NULL;
- dst_jf->value.constant.rdesc = dst_rdesc;
+ /* Creation of a speculative edge.  If the source edge is the one
+grabbing a reference, we must create a new (duplicate)
+reference description.  Otherwise they refer to the same
+description corresponding to a reference taken in a function
+src->caller is inlined to.  In that case we just must
+increment the refcount.  */
+ if (src_rdesc->cs == src)
+   {
+  symtab_node *n = symtab_node_for_jfunc (src_jf);
+  gcc_checking_assert (n);
+  ipa_ref *ref
+= src->caller->find_reference (n, src->call_stmt,
+   src->lto_stmt_uid);
+  gcc_checking_assert (ref);
+  dst->caller->clone_reference (ref, ref->stmt);
+
+  ipa_cst_ref_desc *dst_rdesc = ipa_refdesc_pool.allocate ();
+  dst_rdesc->cs = dst;
+  dst_rdesc->refcount = src_rdesc->refcount;
+  dst_rdesc->next_duplicate = NULL;
+  dst_jf->value.constant.rdesc = dst_rdesc;
+   }
+ else
+   {
+ src_rdesc->refcount++;
+ dst_jf->value.constant.rdesc = src_rdesc;
+   }
}
  else if (src_rdesc->cs == src)
{





Re: [Patch] Fortran: Fix assumed-size to assumed-rank passing [PR94070]

2021-09-27 Thread Tobias Burnus

Hi Thomas, hi Harald, hi all,

now committed r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0
with the following changes:
* Removed now unused gfor_fndecl_size0/gfor_fndecl_size1 (trans{-decl.c,.h})
* Add a scan-dump-not check for those.

See below for some comments.

On 24.09.21 22:38, Thomas Koenig wrote:

OK for mainline?

As promised on IRC, here's the review.

Thanks for the quick review :-)

Maybe you can add a test case which shows that the call to the size
intrinsic really does not happen.
OK with that.


I think you mean the (_gfortran_)size0/size1 functions in libgfortran.
Unsurprisingly, the intrinsic itself _is_ still used and the simplify.c +
trans-intrinsic.c functions are called. However, the libgfortran
functions size0/size1 shouldn't be callable – given that I deleted
the function decl in the front end. I have nonetheless added some
-fdump-tree checks that size0 and size1 do not appear.

Hmm, looking at my patch again – I think I did intend to remove
the decl – but did not actually do it.

In this patch, I have now actually done what I intended and wrote
above: I removed the gfor_fndecl_size0/gfor_fndecl_size1 also from
trans.h (declared) and trans-decl.c (global var, init with fn decl).

size_optional_dim_1.f90 was the only testcase that used size1 before
the patch (it also used size0). Thus, I added the dump check to it
and to the new assumed_rank_22.f90, which has 7 size0 calls with an
unpatched compiler.

Thus: Thanks for asking for the dump check as it showed that I did
forget to remove something ... :-)

Conclusion: Reviews are very helpful :-)

 * * *

As the following email by Harald did not show up at the gcc-patches mailing 
list:
you can find it at 
https://gcc.gnu.org/pipermail/fortran/2021-September/056578.html

In my email, it shows up with "To: fortran@" and "CC: gcc-patches@", thus,
I have no idea why it did not arrive at the mailing-list archive :-(

On 24.09.21 23:12, Harald Anlauf via Fortran wrote:

I played around with your patch and was unable to break it.

Good. That means we can now hand it over to Gerald ;-)

Are you tracking the xfailed parts?


For my newly added xfail, it is fixed by the posted CFI<->GFC
descriptor patch, which I am currently updating (for several reasons).

Otherwise, I lost a bit track of the various TS29113, C-interop,
class(*), type(*), dimension(..) etc. PRs. I think they do cover
most of it. – Besides that CFI/GFC descriptor patch, some additional
patches are still in Sandra's and my pipeline.

And once the CFI/GFC descriptor patch is in, I think it makes sense
to check a bunch of PRs to see whether they are now fixed or something
still needs to be done. Likewise for José's patches. I think they
will be partially superseded by the patches committed, submitted or
soon to be submitted – but I am sure not all issues are fixed.


While playing I stumbled over the fact that when allocating an array
with a dimension that has extent 0, e.g. 4:-5, the lbound gets reset
to 1 and ubound set to 0.


I am not sure, whether I fully understand what you wrote. For:

  integer, allocatable :: a(:)
  allocate(a(4:-5))
  print *, size(a), size(a, dim=1), shape(a) ! should print '0 0 0'
  print *, lbound(a, dim=1) ! should print '1'
  print *, ubound(a, dim=1) ! should print '0'

where the last line is due to F2018, 16.9.196, which has:

 'Otherwise, if DIM is present, the result has a value equal to the
  number of elements in dimension DIM of ARRAY.'

And lbound(a,dim=1) == 1 due to the "otherwise" case of F2018:16.9.109 LBOUND:
"Case (i): If DIM is present, ARRAY is a whole array, and either
 ARRAY is an assumed-size array of rank DIM or dimension DIM of
 ARRAY has nonzero extent, the result has a value equal to the
 lower bound for subscript DIM of ARRAY. Otherwise, if DIM is
 present, the result value is 1."

And when doing
  call f(a)
  call g(a)
with 'subroutine f(x); integer :: x(:)'
and 'subroutine g(y); integer :: y(..)'

Here, ubound == 0 due to the reason above and lbound is set to
the declared lower bound, which is for 'x' the default ("1") but
could also be 5 with "x(5:)" and for 'y' it cannot be specified.
For 'x', see last sentence of F2018:8.5.8.3. For 'y', I did not
find the exact location, but it follows along the same lines.

With BIND(C) applied to f and g, ubound remains the same but
lbound is now 0 instead of 1.


Has the standard changed in this respect?


I doubt it, but only looked at F2018 and not at older standards.


I am probably not the best person to review the trans-* parts, but
I did not spot anything I could point at, and the dump-tree looked
reasonable.  Therefore OK from my side.

Thanks for the work!


Thanks also for your review.

Thanks,

Tobias

PS: I saw that we recently had a couple of double reviews. I think it is
useful if multiple persons look at patches, but hope that we do not
start requiring two reviews for each patch ;-)


Re: [PATCH] Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode.

2021-09-27 Thread Richard Biener via Gcc-patches
On Mon, Sep 27, 2021 at 11:42 AM Hongtao Liu  wrote:
>
> On Fri, Sep 24, 2021 at 9:08 PM Segher Boessenkool
>  wrote:
> >
> > On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote:
> > > On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches
> > >  wrote:
> > > > I'm not convinced that we need the inner mode to match anything.  As
> > > > long as the vec_concat's mode is twice the size of the vec_select modes
> > > > and the vec_select mode is <= the mode of its operands ISTM this is
> > > > fine.   We  might want the modes of the vec_select to match, but I don't
> > > > think that's strictly necessary either, they just need to be the same
> > > > size.  ie, we could have somethig like
> > > >
> > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
> > > >
> > > > I'm not sure if that level of generality is useful though.  If we want
> > > > the modes of the vec_selects to match I think we could still support
> > > >
> > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> > > >
> > > > Thoughts?
> > >
> > > I think the component or scalar modes of the elements to concat need to 
> > > match
> > > the component mode of the result.  I don't think you example involving
> > > a cat of DF and DI is too useful - but you could use a subreg around the 
> > > DI
> > > value ;)
> >
> > I agree.
> >
> > If you want to concatenate components of different modes, you should
> > change mode first, using subregs for example.
> I don't really understand.
>
> for
> > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> > > >
> > > > Thoughts?
> how can it be simplified when reg:V4DF is different from (reg:V8DF)
> to
> (vec_select: (vec_concat:(subreg V8DF (reg:V4DF) 0) (reg:V8DF))
> (parallel [...])
> ?, which doesn't look like a simplification.
>
> Similar for
> > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
>
> here we require rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) so
> vec_concat (vec_select vec_select) can be simplified to just
> vec_select.

Yes, I think your patch is reasonable, I don't understand how we can
intermediately convert here either or how that would help.

Quoting the original example:

(set (reg:V2DF 87 [ xx ])
(vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
(parallel [
(const_int 2 [0x2])
]))
(vec_select:DF (reg:V4DF 92)
(parallel [
(const_int 3 [0x3])
]

it's not about a vec_concat of different modes but a vec_concat with V2DF
mode concatenated from vec_selects of V4DF, with V2DF and V4DF not matching
up.  But we can do (vec_select:V2DF (reg:V4DF ..) (parallel [... ])) just
fine which this case doesn't handle.

Richard.


>
> >
> > ("Inner mode" is something of subregs btw, "component mode" is what this
> > concept of modes is called, the name GET_MODE_INNER is a bit confusing
> > though :-) )
> >
> > Btw, the documentation for "concat" says
> >   @findex concat
> >   @item (concat@var{m} @var{rtx} @var{rtx})
> >   This RTX represents the concatenation of two other RTXs.  This is used
> >   for complex values.  It should only appear in the RTL attached to
> >   declarations and during RTL generation.  It should not appear in the
> >   ordinary insn chain.
> > which needs some updating (in many ways).
> >
> >
> > Segher
>
>
>
> --
> BR,
> Hongtao


RE: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-09-27 Thread Richard Biener via Gcc-patches
On Mon, 27 Sep 2021, Jirui Wu wrote:

> Hi all,
> 
> I now use the type based on the specification of the intrinsic
> instead of the type based on the formal argument.
> 
> I use signed Int vector types because the outputs of the neon builtins
> that I am lowering are always signed. In addition, fcode and stmt
> do not have information on whether the result is signed.
> 
> Because I am replacing the stmt with new_stmt,
> a VIEW_CONVERT_EXPR cast is already in the code if needed.
> As a result, the result assembly code is correct.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master? If OK can it be committed for me, I have no commit rights.

+   tree temp_lhs = gimple_call_lhs (stmt);
+   aarch64_simd_type_info simd_type
+ = aarch64_simd_types[mem_type];
+   tree elt_ptr_type = build_pointer_type (simd_type.eltype);
+   tree zero = build_zero_cst (elt_ptr_type);
+   gimple_seq stmts = NULL;
+   tree base = gimple_convert (&stmts, elt_ptr_type,
+   args[0]);
+   new_stmt = gimple_build_assign (temp_lhs,
+fold_build2 (MEM_REF,
+TREE_TYPE (temp_lhs),
+base,
+zero));

this now uses the alignment info as on the LHS of the call by using
TREE_TYPE (temp_lhs) as type of the MEM_REF.  So for example

 typedef int foo __attribute__((vector_size(N),aligned(256)));

 foo tem = ld1 (ptr);

will now access *ptr as if it were aligned to 256 bytes.  But I'm sure
the ld1 intrinsic documents the required alignment (either it's the
natural alignment of the vector type loaded or element alignment?).

For element alignment you'd do sth like

  tree access_type = build_aligned_type (vector_type, TYPE_ALIGN 
(TREE_TYPE (vector_type)));

for example.
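A quick check of the hazard using the over-aligned typedef from the example above (hypothetical types, just to show that the lhs type's alignment can exceed anything the incoming pointer guarantees, which is why the access type must be rebuilt with build_aligned_type):

```c
/* The lhs typedef reports 256-byte alignment, but a pointer passed to
   ld1 only guarantees (at most) the natural vector or element
   alignment, so the MEM_REF must not inherit the 256.  */
typedef int v4si __attribute__ ((vector_size (16)));
typedef v4si foo __attribute__ ((aligned (256)));

static unsigned lhs_align (void) { return _Alignof (foo);  }  /* 256 */
static unsigned vec_align (void) { return _Alignof (v4si); }  /* 16  */
static unsigned elt_align (void) { return _Alignof (int);  }  /* 4   */
```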

Richard.


> Thanks,
> Jirui
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, September 16, 2021 2:59 PM
> > To: Jirui Wu 
> > Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; i...@airs.com; Richard
> > Sandiford 
> > Subject: Re: [Patch][GCC][middle-end] - Lower store and load neon builtins 
> > to
> > gimple
> > 
> > On Thu, 16 Sep 2021, Jirui Wu wrote:
> > 
> > > Hi all,
> > >
> > > This patch lowers the vld1 and vst1 variants of the store and load
> > > neon builtins functions to gimple.
> > >
> > > The changes in this patch covers:
> > > * Replaces calls to the vld1 and vst1 variants of the builtins
> > > * Uses MEM_REF gimple assignments to generate better code
> > > * Updates test cases to prevent over optimization
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master? If OK can it be committed for me, I have no commit rights.
> > 
> > +   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
> > +   fold_build2 (MEM_REF,
> > +   TREE_TYPE
> > +   (gimple_call_lhs (stmt)),
> > +   args[0], build_int_cst
> > +   (TREE_TYPE (args[0]), 0)));
> > 
> > you are using TBAA info based on the formal argument type that might have
> > pointer conversions stripped.  Instead you should use a type based on the
> > specification of the intrinsics (or the builtins).
> > 
> > Likewise for the type of the access (mind alignment info there!).
> > 
> > Richard.
> > 
> > > Thanks,
> > > Jirui
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/aarch64/aarch64-builtins.c
> > (aarch64_general_gimple_fold_builtin):
> > > lower vld1 and vst1 variants of the neon builtins
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/fmla_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/fmls_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/fmul_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/mla_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/mls_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/mul_intrinsic_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/simd/vmul_elem_1.c:
> > > prevent over optimization
> > > * gcc.target/aarch64/vclz.c:
> > > replace macro with function to prevent over optimization
> > > * gcc.target/aarch64/vneg_s.c:
> > > replace macro with function to prevent over optimization
> > >
> > 
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH] RISC-V: Pattern name fix mulm3_highpart -> smulm3_highpart.

2021-09-27 Thread Geng Qi via Gcc-patches
gcc/ChangeLog:

* config/riscv/riscv.md
(<u>muldi3_highpart): Rename to <su>muldi3_highpart.
(<u>mulditi3): Emit <su>muldi3_highpart.
(<u>mulsi3_highpart): Rename to <su>mulsi3_highpart.
(<u>mulsidi3): Emit <su>mulsi3_highpart.
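For reference, the renamed patterns implement the standard smulm3_highpart/umulm3_highpart operations, i.e. the upper half of the widened product. In C terms for DImode (a sketch relying on the GCC __int128 extension):

```c
#include <stdint.h>

/* What smuldi3_highpart computes: the upper 64 bits of the
   sign-extended 128-bit product.  */
static int64_t
smuldi3_highpart (int64_t a, int64_t b)
{
  return (int64_t) (((__int128) a * (__int128) b) >> 64);
}

/* And umuldi3_highpart: the same for the zero-extended product.  */
static uint64_t
umuldi3_highpart (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * (unsigned __int128) b) >> 64);
}
```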
---
 gcc/config/riscv/riscv.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index f88877fd596..3115a508bdf 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -899,14 +899,14 @@
   emit_insn (gen_muldi3 (low, operands[1], operands[2]));
 
   rtx high = gen_reg_rtx (DImode);
-  emit_insn (gen_<u>muldi3_highpart (high, operands[1], operands[2]));
+  emit_insn (gen_<su>muldi3_highpart (high, operands[1], operands[2]));
 
   emit_move_insn (gen_lowpart (DImode, operands[0]), low);
   emit_move_insn (gen_highpart (DImode, operands[0]), high);
   DONE;
 })
 
-(define_insn "<u>muldi3_highpart"
+(define_insn "<su>muldi3_highpart"
   [(set (match_operand:DI 0 "register_operand" "=r")
(truncate:DI
  (lshiftrt:TI
@@ -961,13 +961,13 @@
 {
   rtx temp = gen_reg_rtx (SImode);
   emit_insn (gen_mulsi3 (temp, operands[1], operands[2]));
-  emit_insn (gen_<u>mulsi3_highpart (riscv_subword (operands[0], true),
+  emit_insn (gen_<su>mulsi3_highpart (riscv_subword (operands[0], true),
 operands[1], operands[2]));
   emit_insn (gen_movsi (riscv_subword (operands[0], false), temp));
   DONE;
 })
 
-(define_insn "<u>mulsi3_highpart"
+(define_insn "<su>mulsi3_highpart"
   [(set (match_operand:SI 0 "register_operand" "=r")
(truncate:SI
  (lshiftrt:DI
-- 
2.22.0.windows.1



Re: [PATCH] AVX512FP16:support basic 64/32bit vector type and operation.

2021-09-27 Thread Uros Bizjak via Gcc-patches
On Mon, Sep 27, 2021 at 12:42 PM Hongyu Wang  wrote:
>
> Hi Uros,
>
> This patch intends to support V4HF/V2HF vector type and basic operations.
>
> For 32bit targets, a V4HF vector is passed the same as the __m64 type; V2HF
> is passed on the stack and returned from a GPR since it is not specified
> by the ABI.
>
> We found that for 64bit vectors on ia32, when MMX is disabled there seems
> to be no mov<mode>_internal, so we add a define_insn for V4HF mode. It would be
> much appreciated if you know why the handling of 64bit vectors looks as it
> does and could give some advice.

ia32 ABI declares that __m64 values pass via MMX registers. Due to
this, we are not able to fully disable MMX register usage, as is the
case with x86_64. So, V4HFmode values will pass to functions via MMX
registers on ia32 targets.

So, there should be no additional define_insn, the addition to the
existing MMXMODE mode iterator should be enough. V4HFmodes should be
handled in the same way as e.g. V8QImode.

This is not the case with 4-byte values, which should be passed using
integer ABI.

Uros.

>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and sde.
>
> OK for master?
>
> gcc/ChangeLog:
>
> PR target/102230
> * config/i386/i386.h (VALID_AVX512FP16_REG_MODE): Add
> V4HF and V2HF mode check.
> (VALID_SSE2_REG_VHF_MODE): Likewise.
> (VALID_MMX_REG_MODE): Likewise.
> (SSE_REG_MODE_P): Replace VALID_AVX512FP16_REG_MODE with
> vector mode condition.
> * config/i386/i386.c (classify_argument): Parse V4HF/V2HF
> via sse regs.
> (function_arg_32): Add V4HFmode.
> (function_arg_advance_32): Likewise.
> * config/i386/i386.md (mode): Add V4HF/V2HF.
> (MODE_SIZE): Likewise.
> * config/i386/mmx.md (MMXMODE): Add V4HF mode.
> (V_32): Add V2HF mode.
> (*mov<mode>_internal): Adjust sse alternatives to support
> V4HF mode vector move.
> (*mov<mode>_internal): Adjust sse alternatives
> to support V2HF mode move.
> * config/i386/sse.md (VHF_32_64): New mode iterator.
> (<insn><mode>3): New define_insn for add/sub/mul/div.
> (*movv4hf_internal_sse): New define_insn for -mno-mmx and -msse.
>
> gcc/testsuite/ChangeLog:
>
> PR target/102230
> * gcc.target/i386/avx512fp16-floatvnhf.c: Remove xfail.
> * gcc.target/i386/avx512fp16-trunc-extendvnhf.c: Ditto.
> * gcc.target/i386/avx512fp16-truncvnhf.c: Ditto.
> * gcc.target/i386/avx512fp16-64-32-vecop-1.c: New test.
> * gcc.target/i386/avx512fp16-64-32-vecop-2.c: Ditto.
> * gcc.target/i386/pr102230.c: Ditto.
> ---
>  gcc/config/i386/i386.c|  4 +
>  gcc/config/i386/i386.h| 12 ++-
>  gcc/config/i386/i386.md   |  5 +-
>  gcc/config/i386/mmx.md| 27 ---
>  gcc/config/i386/sse.md| 49 
>  .../i386/avx512fp16-64-32-vecop-1.c   | 30 
>  .../i386/avx512fp16-64-32-vecop-2.c   | 75 +++
>  .../gcc.target/i386/avx512fp16-floatvnhf.c| 12 +--
>  .../i386/avx512fp16-trunc-extendvnhf.c| 12 +--
>  .../gcc.target/i386/avx512fp16-truncvnhf.c| 12 +--
>  gcc/testsuite/gcc.target/i386/pr102230.c  | 38 ++
>  11 files changed, 243 insertions(+), 33 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102230.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ba89e111d28..b3e4add4b9e 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2462,6 +2462,8 @@ classify_argument (machine_mode mode, const_tree type,
>  case E_V2SFmode:
>  case E_V2SImode:
>  case E_V4HImode:
> +case E_V4HFmode:
> +case E_V2HFmode:
>  case E_V8QImode:
>classes[0] = X86_64_SSE_CLASS;
>return 1;
> @@ -2902,6 +2904,7 @@ pass_in_reg:
>
>  case E_V8QImode:
>  case E_V4HImode:
> +case E_V4HFmode:
>  case E_V2SImode:
>  case E_V2SFmode:
>  case E_V1TImode:
> @@ -3149,6 +3152,7 @@ pass_in_reg:
>
>  case E_V8QImode:
>  case E_V4HImode:
> +case E_V4HFmode:
>  case E_V2SImode:
>  case E_V2SFmode:
>  case E_V1TImode:
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 8a4251b4926..9f3cad31f96 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1033,7 +1033,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> || (MODE) == TImode)
>
>  #define VALID_AVX512FP16_REG_MODE(MODE) \
> -  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode)
> +  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode\
> +   || (MODE) == V4HFmode || (MODE) == V2HFmode)
>
>  #define 

RE: [PATCH] RISC-V: The 'multilib-generator' enhancement.

2021-09-27 Thread gengqi via Gcc-patches
Sorry, I sent the wrong one.

-----Original Message-----
From: Geng Qi [mailto:gen...@linux.alibaba.com] 
Sent: September 27, 2021 19:25
To: gcc-patches@gcc.gnu.org; cooper...@linux.alibaba.com
Cc: gengqi 
Subject: [PATCH] RISC-V: The 'multilib-generator' enhancement.

From: gengqi 

gcc/ChangeLog:
* config/riscv/arch-canonicalize
(longext_sort): New function for sorting 'multi-letter'.
* config/riscv/multilib-generator: Skip to next loop when current
'alt' is 'arch'. The 'arch' may not be the first of 'alts'.
(_expand_combination): Add underline for the ext without '*'.
This is because, a single-letter extension can always be treated well
with a '_' prefix, but it cannot be separated out if it is appended
to a multi-letter.
---
 gcc/config/riscv/arch-canonicalize  | 14 +-
 gcc/config/riscv/multilib-generator | 12 +++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/arch-canonicalize b/gcc/config/riscv/arch-canonicalize
index 2b4289e..a1e4570 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -74,8 +74,20 @@ def arch_canonicalize(arch):
   # becasue we just append extensions list to the arch string.
   std_exts += list(filter(lambda x:len(x) == 1, long_exts))
 
+  def longext_sort (exts):
+if not exts.startswith("zxm") and exts.startswith("z"):
+  # If "Z" extensions are named, they should be ordered first by CANONICAL.
+  if exts[1] not in CANONICAL_ORDER:
+raise Exception("Unsupported extension `%s`" % exts)
+  canonical_sort = CANONICAL_ORDER.index(exts[1])
+else:
+  canonical_sort = -1
+return (exts.startswith("x"), exts.startswith("zxm"),
+LONG_EXT_PREFIXES.index(exts[0]), canonical_sort, exts[1:])
+
   # Multi-letter extension must be in lexicographic order.
-  long_exts = list(sorted(filter(lambda x:len(x) != 1, long_exts)))
+  long_exts = list(sorted(filter(lambda x:len(x) != 1, long_exts),
+  key=longext_sort))
 
   # Put extensions in canonical order.
   for ext in CANONICAL_ORDER:
diff --git a/gcc/config/riscv/multilib-generator b/gcc/config/riscv/multilib-generator
index 64ff15f..7b22537 100755
--- a/gcc/config/riscv/multilib-generator
+++ b/gcc/config/riscv/multilib-generator
@@ -68,15 +68,15 @@ def arch_canonicalize(arch):
 def _expand_combination(ext):
   exts = list(ext.split("*"))
 
-  # No need to expand if there is no `*`.
-  if len(exts) == 1:
-return [(exts[0],)]
-
   # Add underline to every extension.
   # e.g.
   #  _b * zvamo => _b * _zvamo
   exts = list(map(lambda x: '_' + x, exts))
 
+  # No need to expand if there is no `*`.
+  if len(exts) == 1:
+return [(exts[0],)]
+
   # Generate combination!
   ext_combs = []
   for comb_len in range(1, len(exts)+1):
@@ -147,7 +147,9 @@ for cfg in sys.argv[1:]:
   # Drop duplicated entry.
   alts = unique(alts)
 
-  for alt in alts[1:]:
+  for alt in alts:
+if alt == arch:
+  continue
 arches[alt] = 1
 reuse.append('march.%s/mabi.%s=march.%s/mabi.%s' % (arch, abi, alt, abi))
   required.append('march=%s/mabi=%s' % (arch, abi))
--
2.7.4



[PATCH] RISC-V: The 'multilib-generator' enhancement.

2021-09-27 Thread Geng Qi via Gcc-patches
From: gengqi 

gcc/ChangeLog:
* config/riscv/arch-canonicalize
(longext_sort): New function for sorting 'multi-letter'.
* config/riscv/multilib-generator: Skip to next loop when current
'alt' is 'arch'. The 'arch' may not be the first of 'alts'.
(_expand_combination): Add underline for the ext without '*'.
This is because, a single-letter extension can always be treated well
with a '_' prefix, but it cannot be separated out if it is appended
to a multi-letter.
---
 gcc/config/riscv/arch-canonicalize  | 14 +-
 gcc/config/riscv/multilib-generator | 12 +++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/arch-canonicalize b/gcc/config/riscv/arch-canonicalize
index 2b4289e..a1e4570 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -74,8 +74,20 @@ def arch_canonicalize(arch):
   # becasue we just append extensions list to the arch string.
   std_exts += list(filter(lambda x:len(x) == 1, long_exts))
 
+  def longext_sort (exts):
+if not exts.startswith("zxm") and exts.startswith("z"):
+  # If "Z" extensions are named, they should be ordered first by CANONICAL.
+  if exts[1] not in CANONICAL_ORDER:
+raise Exception("Unsupported extension `%s`" % exts)
+  canonical_sort = CANONICAL_ORDER.index(exts[1])
+else:
+  canonical_sort = -1
+return (exts.startswith("x"), exts.startswith("zxm"),
+LONG_EXT_PREFIXES.index(exts[0]), canonical_sort, exts[1:])
+
   # Multi-letter extension must be in lexicographic order.
-  long_exts = list(sorted(filter(lambda x:len(x) != 1, long_exts)))
+  long_exts = list(sorted(filter(lambda x:len(x) != 1, long_exts),
+  key=longext_sort))
 
   # Put extensions in canonical order.
   for ext in CANONICAL_ORDER:
diff --git a/gcc/config/riscv/multilib-generator b/gcc/config/riscv/multilib-generator
index 64ff15f..7b22537 100755
--- a/gcc/config/riscv/multilib-generator
+++ b/gcc/config/riscv/multilib-generator
@@ -68,15 +68,15 @@ def arch_canonicalize(arch):
 def _expand_combination(ext):
   exts = list(ext.split("*"))
 
-  # No need to expand if there is no `*`.
-  if len(exts) == 1:
-return [(exts[0],)]
-
   # Add underline to every extension.
   # e.g.
   #  _b * zvamo => _b * _zvamo
   exts = list(map(lambda x: '_' + x, exts))
 
+  # No need to expand if there is no `*`.
+  if len(exts) == 1:
+return [(exts[0],)]
+
   # Generate combination!
   ext_combs = []
   for comb_len in range(1, len(exts)+1):
@@ -147,7 +147,9 @@ for cfg in sys.argv[1:]:
   # Drop duplicated entry.
   alts = unique(alts)
 
-  for alt in alts[1:]:
+  for alt in alts:
+if alt == arch:
+  continue
 arches[alt] = 1
 reuse.append('march.%s/mabi.%s=march.%s/mabi.%s' % (arch, abi, alt, abi))
   required.append('march=%s/mabi=%s' % (arch, abi))
-- 
2.7.4
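[Editor's note: for readers following the sort-key hunk above, here is a standalone Python sketch of the same logic, runnable outside the script. The CANONICAL_ORDER and LONG_EXT_PREFIXES tables below are simplified stand-ins, not the exact tables from arch-canonicalize.]

```python
# Simplified stand-ins for the tables defined in arch-canonicalize.
CANONICAL_ORDER = "imafdgqlcbjtpvn"   # hypothetical subset of the real order
LONG_EXT_PREFIXES = ['z', 'h', 'x', 's']

def longext_sort(ext):
    # 'z' extensions (other than the legacy 'zxm') sort by the canonical
    # position of the letter that follows the 'z' prefix.
    if not ext.startswith("zxm") and ext.startswith("z"):
        if ext[1] not in CANONICAL_ORDER:
            raise Exception("Unsupported extension `%s`" % ext)
        canonical_sort = CANONICAL_ORDER.index(ext[1])
    else:
        canonical_sort = -1
    # Vendor ('x') and machine ('zxm') extensions sort last; remaining
    # ties break on the prefix class and then lexicographically.
    return (ext.startswith("x"), ext.startswith("zxm"),
            LONG_EXT_PREFIXES.index(ext[0]), canonical_sort, ext[1:])

exts = ["zvamo", "zba", "xtheadc", "zfh"]
print(sorted(exts, key=longext_sort))  # → ['zfh', 'zba', 'zvamo', 'xtheadc']
```

Returning a tuple lets Python's sort compare the criteria in priority order, which is why a plain lexicographic sort (the old behavior) could be dropped.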



Re: [RFC PATCH 0/8] RISC-V: Bit-manipulation extension.

2021-09-27 Thread Christoph Muellner
In case somebody wants to test this patchset, a patchset for Binutils
is required as well.
AFAIK here would be the Binutils branch with the required changes:
https://github.com/riscv-collab/riscv-binutils-gdb/tree/riscv-binutils-experiment

On Thu, Sep 23, 2021 at 9:57 AM Kito Cheng  wrote:
>
> Bit manipulation extension[1] is finishing the public review and waiting for
> the rest of the ratification process, I believe that will become a ratified
> extension soon, so I think it's time to submit to upstream for review now :)
>
> As the title included RFC, it's not a rush to merge to trunk yet, I would
> like to merge that until it is officially ratified.
>
> This patch set is the implementation of bit-manipulation extension, which
> includes zba, zbb, zbc and zbs extension, but only included in instruction/md
> pattern only, no intrinsic function implementation.
>
> Most work is done by Jim Willson and many other contributors
> on https://github.com/riscv-collab/riscv-gcc.
>
>
> [1] https://github.com/riscv/riscv-bitmanip/releases/tag/1.0.0
>
>


[PATCH] Update pathname for IBM long double description.

2021-09-27 Thread Vincent Lefevre
Update due to file moved to libgcc/config/rs6000/ibm-ldouble-format
in commit aca0b0b315f6e5a0ee60981fd4b0cbc9a7f59096.

Signed-off-by: Vincent Lefevre 
---
 include/floatformat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/floatformat.h b/include/floatformat.h
index 5f9c14361f5..288aedda192 100644
--- a/include/floatformat.h
+++ b/include/floatformat.h
@@ -91,7 +91,7 @@ struct floatformat
 
   /* Is the format actually the sum of two smaller floating point
  formats (IBM long double, as described in
- gcc/config/rs6000/darwin-ldouble-format)?  If so, this is the
+ libgcc/config/rs6000/ibm-ldouble-format)?  If so, this is the
  smaller format in question, and the fields sign_start through
  intbit describe the first half.  If not, this is NULL.  */
   const struct floatformat *split_half;
-- 
2.33.0



Re: [RFC PATCH 1/8] RISC-V: Minimal support of bitmanip extension

2021-09-27 Thread Christoph Muellner
Hi Kito,

On Thu, Sep 23, 2021 at 9:57 AM Kito Cheng  wrote:
>
> 2021-09-23  Kito Cheng  
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.c (riscv_ext_version_table):
> Add zba, zbb, zbc and zbs.
> (riscv_ext_flag_table): Ditto.
> * config/riscv/riscv-opts.h (MASK_ZBA): New.
> (MASK_ZBB): Ditto.
> (MASK_ZBC): Ditto.
> (MASK_ZBS): Ditto.
> (TARGET_ZBA): Ditto.
> (TARGET_ZBB): Ditto.
> (TARGET_ZBC): Ditto.
> (TARGET_ZBS): Ditto.
> * config/riscv/riscv.opt (riscv_zb_subext): New.
> ---
>  gcc/common/config/riscv/riscv-common.c | 10 ++
>  gcc/config/riscv/riscv-opts.h  | 10 ++
>  gcc/config/riscv/riscv.opt |  3 +++
>  3 files changed, 23 insertions(+)
>
> diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
> index 10868fd417d..37b6ea80086 100644
> --- a/gcc/common/config/riscv/riscv-common.c
> +++ b/gcc/common/config/riscv/riscv-common.c
> @@ -101,6 +101,11 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
>{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
>{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
>
> +  {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zbs", ISA_SPEC_CLASS_NONE, 1, 0},

I think this needs another specification class (there is a
specification for the instructions and it is in public review).
Proposal: ISA_SPEC_CLASS_FROZEN_2021

BR
Christoph

> +
>/* Terminate the list.  */
>{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
>  };
> @@ -906,6 +911,11 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
>   {"zicsr", &gcc_options::x_riscv_zi_subext, MASK_ZICSR},
>   {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
>
> +  {"zba", &gcc_options::x_riscv_zb_subext, MASK_ZBA},
> +  {"zbb", &gcc_options::x_riscv_zb_subext, MASK_ZBB},
> +  {"zbc", &gcc_options::x_riscv_zb_subext, MASK_ZBC},
> +  {"zbs", &gcc_options::x_riscv_zb_subext, MASK_ZBS},
> +
>{NULL, NULL, 0}
>  };
>
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index f4cf6ca4b82..2efc4b80f1f 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -73,4 +73,14 @@ enum stack_protector_guard {
>  #define TARGET_ZICSR ((riscv_zi_subext & MASK_ZICSR) != 0)
>  #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
>
> +#define MASK_ZBA  (1 << 0)
> +#define MASK_ZBB  (1 << 1)
> +#define MASK_ZBC  (1 << 2)
> +#define MASK_ZBS  (1 << 3)
> +
> +#define TARGET_ZBA ((riscv_zb_subext & MASK_ZBA) != 0)
> +#define TARGET_ZBB ((riscv_zb_subext & MASK_ZBB) != 0)
> +#define TARGET_ZBC ((riscv_zb_subext & MASK_ZBC) != 0)
> +#define TARGET_ZBS ((riscv_zb_subext & MASK_ZBS) != 0)
> +
>  #endif /* ! GCC_RISCV_OPTS_H */
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index 5ff85c21430..15bf89e17c2 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -195,6 +195,9 @@ long riscv_stack_protector_guard_offset = 0
>  TargetVariable
>  int riscv_zi_subext
>
> +TargetVariable
> +int riscv_zb_subext
> +
>  Enum
>  Name(isa_spec_class) Type(enum riscv_isa_spec_class)
>  Supported ISA specs (for use with the -misa-spec= option):
> --
> 2.33.0
>
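[Editor's note: the MASK_ZB*/TARGET_ZB* macros above pack one flag bit per extension into the single riscv_zb_subext target variable. A rough Python model of that bookkeeping; the parse_march_zb helper is illustrative only, not part of the patch.]

```python
# Mirror of the bitmask scheme from riscv-opts.h / riscv.opt.
MASK_ZBA = 1 << 0
MASK_ZBB = 1 << 1
MASK_ZBC = 1 << 2
MASK_ZBS = 1 << 3

def parse_march_zb(exts):
    """Fold the named Zb* extensions into one flag word (hypothetical helper)."""
    flags = {"zba": MASK_ZBA, "zbb": MASK_ZBB, "zbc": MASK_ZBC, "zbs": MASK_ZBS}
    riscv_zb_subext = 0
    for ext in exts:
        riscv_zb_subext |= flags[ext]
    return riscv_zb_subext

subext = parse_march_zb(["zba", "zbs"])
# The TARGET_ZB* tests are simple mask checks on the accumulated word:
print((subext & MASK_ZBA) != 0, (subext & MASK_ZBB) != 0)  # → True False
```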


RE: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-09-27 Thread Jirui Wu via Gcc-patches
Hi all,

I now use the type based on the specification of the intrinsic
instead of type based on formal argument. 

I use signed Int vector types because the outputs of the neon builtins
that I am lowering is always signed. In addition, fcode and stmt
does not have information on whether the result is signed.

Because I am replacing the stmt with new_stmt,
a VIEW_CONVERT_EXPR cast is already in the code if needed.
As a result, the result assembly code is correct.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? If OK can it be committed for me, I have no commit rights.

Thanks,
Jirui

> -Original Message-
> From: Richard Biener 
> Sent: Thursday, September 16, 2021 2:59 PM
> To: Jirui Wu 
> Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; i...@airs.com; Richard
> Sandiford 
> Subject: Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to
> gimple
> 
> On Thu, 16 Sep 2021, Jirui Wu wrote:
> 
> > Hi all,
> >
> > This patch lowers the vld1 and vst1 variants of the store and load
> > neon builtins functions to gimple.
> >
> > The changes in this patch covers:
> > * Replaces calls to the vld1 and vst1 variants of the builtins
> > * Uses MEM_REF gimple assignments to generate better code
> > * Updates test cases to prevent over optimization
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? If OK can it be committed for me, I have no commit rights.
> 
> +   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
> +   fold_build2 (MEM_REF,
> +   TREE_TYPE
> +   (gimple_call_lhs (stmt)),
> +   args[0], build_int_cst
> +   (TREE_TYPE (args[0]), 0)));
> 
> you are using TBAA info based on the formal argument type that might have
> pointer conversions stripped.  Instead you should use a type based on the
> specification of the intrinsics (or the builtins).
> 
> Likewise for the type of the access (mind alignment info there!).
> 
> Richard.
> 
> > Thanks,
> > Jirui
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-builtins.c
> (aarch64_general_gimple_fold_builtin):
> > lower vld1 and vst1 variants of the neon builtins
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/fmla_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/fmls_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/fmul_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/mla_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/mls_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/mul_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/simd/vmul_elem_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/vclz.c:
> > replace macro with function to prevent over optimization
> > * gcc.target/aarch64/vneg_s.c:
> > replace macro with function to prevent over optimization
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 119f67d4e4c9e70e9ab1de773b42a171fbdf423e..124fd35caa01ef4a83dae0626f83efb62c053bd1 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -46,6 +46,7 @@
 #include "emit-rtl.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "gimple-fold.h"
 
 #define v8qi_UP  E_V8QImode
 #define v4hi_UP  E_V4HImode
@@ -2387,6 +2388,59 @@ aarch64_general_fold_builtin (unsigned int fcode, tree type,
   return NULL_TREE;
 }
 
+enum aarch64_simd_type
+get_mem_type_for_load_store (unsigned int fcode)
+{
+  switch (fcode)
+  {
+VAR1 (LOAD1, ld1 , 0, LOAD, v8qi)
+VAR1 (STORE1, st1 , 0, STORE, v8qi)
+  return Int8x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v16qi)
+VAR1 (STORE1, st1 , 0, STORE, v16qi)
+  return Int8x16_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4hi)
+VAR1 (STORE1, st1 , 0, STORE, v4hi)
+  return Int16x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v8hi)
+VAR1 (STORE1, st1 , 0, STORE, v8hi)
+  return Int16x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v2si)
+VAR1 (STORE1, st1 , 0, STORE, v2si)
+  return Int32x2_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4si)
+VAR1 (STORE1, st1 , 0, STORE, v4si)
+  return Int32x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v2di)
+VAR1 (STORE1, st1 , 0, STORE, v2di)
+  return Int64x2_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4hf)
+VAR1 (STORE1, st1 , 0, STORE, v4hf)
+  return Float16x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v8hf)
+VAR1 (STORE1, st1 , 0, STORE, v8hf)
+  return Float16x8_t;
+VAR1 (LOAD1, ld1 , 0, 

[PATCH] AVX512FP16:support basic 64/32bit vector type and operation.

2021-09-27 Thread Hongyu Wang via Gcc-patches
Hi Uros,

This patch intends to support V4HF/V2HF vector type and basic operations.

For 32bit target, V4HF vector is passed the same as __m64 type, V2HF
is passed on the stack and returned from GPR since it is not specified
by ABI.

We found that for 64bit vectors on ia32, when mmx is disabled there seems
to be no mov_internal, so we add a define_insn for v4hf mode. It would be
much appreciated if you know why the handling of 64bit vectors looks as it
does and could give some advice.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and sde.

OK for master?

gcc/ChangeLog:

PR target/102230
* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): Add
V4HF and V2HF mode check.
(VALID_SSE2_REG_VHF_MODE): Likewise.
(VALID_MMX_REG_MODE): Likewise.
(SSE_REG_MODE_P): Replace VALID_AVX512FP16_REG_MODE with
vector mode condition.
* config/i386/i386.c (classify_argument): Parse V4HF/V2HF
via sse regs.
(function_arg_32): Add V4HFmode.
(function_arg_advance_32): Likewise.
* config/i386/i386.md (mode): Add V4HF/V2HF.
(MODE_SIZE): Likewise.
* config/i386/mmx.md (MMXMODE): Add V4HF mode.
(V_32): Add V2HF mode.
(*mov_internal): Adjust sse alternatives to support
V4HF mode vector move.
(*mov_internal): Adjust sse alternatives
to support V2HF mode move.
* config/i386/sse.md (VHF_32_64): New mode iterator.
(3): New define_insn for add/sub/mul/div.
(*movv4hf_internal_sse): New define_insn for -mno-mmx and -msse.

gcc/testsuite/ChangeLog:

PR target/102230
* gcc.target/i386/avx512fp16-floatvnhf.c: Remove xfail.
* gcc.target/i386/avx512fp16-trunc-extendvnhf.c: Ditto.
* gcc.target/i386/avx512fp16-truncvnhf.c: Ditto.
* gcc.target/i386/avx512fp16-64-32-vecop-1.c: New test.
* gcc.target/i386/avx512fp16-64-32-vecop-2.c: Ditto.
* gcc.target/i386/pr102230.c: Ditto.
---
 gcc/config/i386/i386.c|  4 +
 gcc/config/i386/i386.h| 12 ++-
 gcc/config/i386/i386.md   |  5 +-
 gcc/config/i386/mmx.md| 27 ---
 gcc/config/i386/sse.md| 49 
 .../i386/avx512fp16-64-32-vecop-1.c   | 30 
 .../i386/avx512fp16-64-32-vecop-2.c   | 75 +++
 .../gcc.target/i386/avx512fp16-floatvnhf.c| 12 +--
 .../i386/avx512fp16-trunc-extendvnhf.c| 12 +--
 .../gcc.target/i386/avx512fp16-truncvnhf.c| 12 +--
 gcc/testsuite/gcc.target/i386/pr102230.c  | 38 ++
 11 files changed, 243 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102230.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ba89e111d28..b3e4add4b9e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2462,6 +2462,8 @@ classify_argument (machine_mode mode, const_tree type,
 case E_V2SFmode:
 case E_V2SImode:
 case E_V4HImode:
+case E_V4HFmode:
+case E_V2HFmode:
 case E_V8QImode:
   classes[0] = X86_64_SSE_CLASS;
   return 1;
@@ -2902,6 +2904,7 @@ pass_in_reg:
 
 case E_V8QImode:
 case E_V4HImode:
+case E_V4HFmode:
 case E_V2SImode:
 case E_V2SFmode:
 case E_V1TImode:
@@ -3149,6 +3152,7 @@ pass_in_reg:
 
 case E_V8QImode:
 case E_V4HImode:
+case E_V4HFmode:
 case E_V2SImode:
 case E_V2SFmode:
 case E_V1TImode:
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 8a4251b4926..9f3cad31f96 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1033,7 +1033,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
|| (MODE) == TImode)
 
 #define VALID_AVX512FP16_REG_MODE(MODE) \
-  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode)
+  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode\
+   || (MODE) == V4HFmode || (MODE) == V2HFmode)
 
 #define VALID_SSE2_REG_MODE(MODE)  \
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode \
@@ -1041,7 +1042,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
|| (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
 
 #define VALID_SSE2_REG_VHF_MODE(MODE)  \
-  (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode)
+  (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode\
+   || (MODE) == V4HFmode || (MODE) == V2HFmode)
 
 #define VALID_SSE_REG_MODE(MODE)   \
   ((MODE) == V1TImode || (MODE) == TImode  \
@@ -1054,7 +1056,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define 

[COMMITTED] Use on-demand ranges in ssa_name_has_boolean_range before querying nonzero bits.

2021-09-27 Thread Aldy Hernandez via Gcc-patches
The function ssa_name_has_boolean_range looks at the nonzero bits stored
in SSA_NAME_RANGE_INFO.  These are global in nature and are the result
of a previous evrp/VRP run (technically other passes can also set them).

However, we can do better if we use get_range_query.  Doing so will use
a ranger if enabled in a pass, or global ranges otherwise.  The call to
get_nonzero_bits remains, as there are passes that will set them
independently of the global range info.

Tested on x86-64 Linux with a regstrap as well as in a DOM environment
using an on-demand ranger instead of evrp.

Committed to trunk.

gcc/ChangeLog:

* tree-ssanames.c (ssa_name_has_boolean_range): Use
get_range_query.
---
 gcc/tree-ssanames.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 2165ad71cf3..f427c5a789b 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
+#include "value-query.h"
 
 /* Rewriting a function into SSA form can create a huge number of SSA_NAMEs,
many of which may be thrown away shortly after their creation if jumps
@@ -484,7 +485,7 @@ get_nonzero_bits (const_tree name)
 
This can be because it is a boolean type, any unsigned integral
type with a single bit of precision, or has known range of [0..1]
-   via VRP analysis.  */
+   via range analysis.  */
 
 bool
 ssa_name_has_boolean_range (tree op)
@@ -502,12 +503,20 @@ ssa_name_has_boolean_range (tree op)
 return true;
 
   /* An integral type with more precision, but the object
- only takes on values [0..1] as determined by VRP
+ only takes on values [0..1] as determined by range
  analysis.  */
   if (INTEGRAL_TYPE_P (TREE_TYPE (op))
-  && (TYPE_PRECISION (TREE_TYPE (op)) > 1)
-  && wi::eq_p (get_nonzero_bits (op), 1))
-return true;
+  && (TYPE_PRECISION (TREE_TYPE (op)) > 1))
+{
+  int_range<2> onezero (build_zero_cst (TREE_TYPE (op)),
+   build_one_cst (TREE_TYPE (op)));
+  int_range<2> r;
+  if (get_range_query (cfun)->range_of_expr (r, op) && r == onezero)
+   return true;
+
+  if (wi::eq_p (get_nonzero_bits (op), 1))
+   return true;
+}
 
   return false;
 }
-- 
2.31.1
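[Editor's note: a short sketch of why the retained get_nonzero_bits fallback implies a [0..1] range. The nonzero-bits mask records which bits of a value may be set, so a mask equal to 1 confines the value to {0, 1}. Modeled in Python; the helper names are illustrative only, not GCC API.]

```python
def has_boolean_range(nonzero_bits):
    # get_nonzero_bits returns a mask of the bits that MAY be nonzero.
    # If that mask is exactly 1, only bit 0 can ever be set, so the value
    # is confined to [0, 1] -- the check the patch keeps as a fallback
    # after first asking the range query.
    return nonzero_bits == 1

def range_is_onezero(lo, hi):
    # The new code path: compare the queried range against int_range [0, 1].
    return (lo, hi) == (0, 1)

print(has_boolean_range(0b1), has_boolean_range(0b11))   # → True False
print(range_is_onezero(0, 1), range_is_onezero(0, 255))  # → True False
```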



[COMMITTED] Convert some evrp uses in DOM to the range_query API.

2021-09-27 Thread Aldy Hernandez via Gcc-patches
DOM is the last remaining user of the evrp engine.  This patch converts
a few uses of the engine and vr-values into the new API.

There is one subtle change.  The call to vr_value's
op_with_constant_singleton_value_range can theoretically return
non-constants, unlike the range_query API which only returns constants.
In this particular case it doesn't matter because the symbolic stuff will
have been handled by the const_and_copies/avail_exprs read in the
SSA_NAME_VALUE copy immediately before.  I have verified this is the case
by asserting that all calls to op_with_constant_singleton_value_range at
this point return either NULL or an INTEGER_CST.

Tested on x86-64 Linux with a regstrap, as well as the aforementioned
assert.

Committed to trunk.

gcc/ChangeLog:

* gimple-ssa-evrp-analyze.h (class evrp_range_analyzer): Remove
vrp_visit_cond_stmt.
* tree-ssa-dom.c (cprop_operand): Convert to range_query API.
(cprop_into_stmt): Same.
(dom_opt_dom_walker::optimize_stmt): Same.
---
 gcc/gimple-ssa-evrp-analyze.h |  7 ---
 gcc/tree-ssa-dom.c| 17 +++--
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/gcc/gimple-ssa-evrp-analyze.h b/gcc/gimple-ssa-evrp-analyze.h
index 0a70a1e58bd..4cf82e69c3a 100644
--- a/gcc/gimple-ssa-evrp-analyze.h
+++ b/gcc/gimple-ssa-evrp-analyze.h
@@ -38,13 +38,6 @@ class evrp_range_analyzer : public vr_values
   /* Record a new unwindable range.  */
   void push_value_range (tree var, value_range_equiv *vr);
 
-  /* A bit of a wart.  This should ideally go away.  */
-  void vrp_visit_cond_stmt (gcond *cond, edge *e)
-  {
-simplify_using_ranges simpl (this);
-simpl.vrp_visit_cond_stmt (cond, e);
-  }
-
  private:
   DISABLE_COPY_AND_ASSIGN (evrp_range_analyzer);
 
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index f58b6b78a41..a8a5b34f725 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1810,7 +1810,7 @@ record_equivalences_from_stmt (gimple *stmt, int may_optimize_p,
CONST_AND_COPIES.  */
 
 static void
-cprop_operand (gimple *stmt, use_operand_p op_p, vr_values *vr_values)
+cprop_operand (gimple *stmt, use_operand_p op_p, range_query *query)
 {
   tree val;
   tree op = USE_FROM_PTR (op_p);
@@ -1820,7 +1820,12 @@ cprop_operand (gimple *stmt, use_operand_p op_p, vr_values *vr_values)
  CONST_AND_COPIES.  */
   val = SSA_NAME_VALUE (op);
   if (!val)
-val = vr_values->op_with_constant_singleton_value_range (op);
+{
+  value_range r;
+  tree single;
+  if (query->range_of_expr (r, op, stmt) && r.singleton_p (&single))
+   val = single;
+}
 
   if (val && val != op)
 {
@@ -1878,7 +1883,7 @@ cprop_operand (gimple *stmt, use_operand_p op_p, vr_values *vr_values)
vdef_ops of STMT.  */
 
 static void
-cprop_into_stmt (gimple *stmt, vr_values *vr_values)
+cprop_into_stmt (gimple *stmt, range_query *query)
 {
   use_operand_p op_p;
   ssa_op_iter iter;
@@ -1895,7 +1900,7 @@ cprop_into_stmt (gimple *stmt, vr_values *vr_values)
 operands.  */
   if (old_op != last_copy_propagated_op)
{
- cprop_operand (stmt, op_p, vr_values);
+ cprop_operand (stmt, op_p, query);
 
  tree new_op = USE_FROM_PTR (op_p);
  if (new_op != old_op && TREE_CODE (new_op) == SSA_NAME)
@@ -2203,8 +2208,8 @@ dom_opt_dom_walker::optimize_stmt (basic_block bb, gimple_stmt_iterator *si,
 SSA_NAMES.  */
  update_stmt_if_modified (stmt);
  edge taken_edge = NULL;
- m_evrp_range_analyzer->vrp_visit_cond_stmt
-   (as_a <gcond *> (stmt), &taken_edge);
+ simplify_using_ranges simpl (m_evrp_range_analyzer);
+ simpl.vrp_visit_cond_stmt (as_a <gcond *> (stmt), &taken_edge);
  if (taken_edge)
{
  if (taken_edge->flags & EDGE_TRUE_VALUE)
-- 
2.31.1
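[Editor's note: a rough Python model of the converted cprop_operand logic, where a (lo, hi) pair stands in for value_range and lo == hi plays the role of singleton_p. Names are illustrative, not GCC API.]

```python
def singleton_p(r):
    lo, hi = r
    return lo == hi

def cprop_operand(op, range_query):
    # Mirrors the conversion: ask the query for op's range and substitute
    # the constant only when the range is a singleton.  Unlike the old
    # op_with_constant_singleton_value_range, this can only yield constants.
    r = range_query(op)
    if r is not None and singleton_p(r):
        return r[0]
    return op

query = {"x_1": (7, 7), "y_2": (0, 255)}.get
print(cprop_operand("x_1", query), cprop_operand("y_2", query))  # → 7 y_2
```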



Re: [PATCH] Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode.

2021-09-27 Thread Hongtao Liu via Gcc-patches
On Fri, Sep 24, 2021 at 9:08 PM Segher Boessenkool
 wrote:
>
> On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote:
> > On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches
> >  wrote:
> > > I'm not convinced that we need the inner mode to match anything.  As
> > > long as the vec_concat's mode is twice the size of the vec_select modes
> > > and the vec_select mode is <= the mode of its operands ISTM this is
> > > fine.   We  might want the modes of the vec_select to match, but I don't
> > > think that's strictly necessary either, they just need to be the same
> > > size.  ie, we could have somethig like
> > >
> > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
> > >
> > > I'm not sure if that level of generality is useful though.  If we want
> > > the modes of the vec_selects to match I think we could still support
> > >
> > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> > >
> > > Thoughts?
> >
> > I think the component or scalar modes of the elements to concat need to 
> > match
> > the component mode of the result.  I don't think you example involving
> > a cat of DF and DI is too useful - but you could use a subreg around the DI
> > value ;)
>
> I agree.
>
> If you want to concatenate components of different modes, you should
> change mode first, using subregs for example.
I don't really understand.

for
> > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> > >
> > > Thoughts?
how can it be simplified when reg:V4DF is different from (reg:V8DF)
to
(vec_select: (vec_concat:(subreg V8DF (reg:V4DF) 0) (reg:V8DF))
   (parallel [...]))
?, which doesn't look like a simplification.

Similar for
> > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))

here we require rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) so
vec_concat (vec_select vec_select) can be simplified to just
vec_select.

>
> ("Inner mode" is something of subregs btw, "component mode" is what this
> concept of modes is called, the name GET_MODE_INNER is a bit confusing
> though :-) )
>
> Btw, the documentation for "concat" says
>   @findex concat
>   @item (concat@var{m} @var{rtx} @var{rtx})
>   This RTX represents the concatenation of two other RTXs.  This is used
>   for complex values.  It should only appear in the RTL attached to
>   declarations and during RTL generation.  It should not appear in the
>   ordinary insn chain.
> which needs some updating (in many ways).
>
>
> Segher



-- 
BR,
Hongtao
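[Editor's note: the simplification under discussion can be modeled with Python lists standing in for vectors. When both vec_selects read the same operand (the rtx_equal_p requirement Hongtao mentions), the concat of two selects collapses into one vec_select with a merged parallel. A sketch, not the simplify-rtx implementation:]

```python
def vec_select(op, idx):
    # (vec_select op (parallel idx)) picks the listed elements.
    return [op[i] for i in idx]

def vec_concat(a, b):
    # (vec_concat a b) glues two vectors end to end.
    return a + b

def simplified(op, idx0, idx1):
    # When both selects read the SAME operand, the concat collapses into
    # a single vec_select over the merged parallel.
    return vec_select(op, idx0 + idx1)

op0 = ["a", "b", "c", "d"]  # stands in for a V4DF register
lhs = vec_concat(vec_select(op0, [0]), vec_select(op0, [2]))
print(lhs == simplified(op0, [0], [2]))  # → True
```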


[PATCH] Support 128/256/512-bit vector _Float16 plus/smin/smax reduce.

2021-09-27 Thread liuhongt via Gcc-patches
Hi:
  Add expanders for reduc_{smin,smax,plus}_scal_{v8hf,v16hf,v32hf}
  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
  
gcc/ChangeLog:

* config/i386/i386-expand.c (emit_reduc_half): Handle
V8HF/V16HF/V32HFmode.
* config/i386/sse.md (REDUC_SSE_PLUS_MODE): Add V8HF.
(REDUC_SSE_SMINMAX_MODE): Ditto.
(REDUC_PLUS_MODE): Add V16HF and V32HF.
(REDUC_SMINMAX_MODE): Ditto.

gcc/testsuite

* gcc.target/i386/avx512fp16-reduce-op-2.c: New test.
* gcc.target/i386/avx512fp16-reduce-op-3.c: New test.
---
 gcc/config/i386/i386-expand.c |  3 +
 gcc/config/i386/sse.md| 10 +-
 .../gcc.target/i386/avx512fp16-reduce-op-2.c  | 96 +++
 .../gcc.target/i386/avx512fp16-reduce-op-3.c  | 91 ++
 4 files changed, 198 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-3.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 94ac303585e..4780b993917 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -16045,6 +16045,7 @@ emit_reduc_half (rtx dest, rtx src, int i)
   break;
 case E_V16QImode:
 case E_V8HImode:
+case E_V8HFmode:
 case E_V4SImode:
 case E_V2DImode:
   d = gen_reg_rtx (V1TImode);
@@ -16066,6 +16067,7 @@ emit_reduc_half (rtx dest, rtx src, int i)
   break;
 case E_V32QImode:
 case E_V16HImode:
+case E_V16HFmode:
 case E_V8SImode:
 case E_V4DImode:
   if (i == 256)
@@ -16085,6 +16087,7 @@ emit_reduc_half (rtx dest, rtx src, int i)
   break;
 case E_V64QImode:
 case E_V32HImode:
+case E_V32HFmode:
   if (i < 64)
{
  d = gen_reg_rtx (V4TImode);
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bb7600edbab..4559b0ce9c9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3157,7 +3157,8 @@ (define_insn "sse3_hv4sf3"
(set_attr "mode" "V4SF")])
 
 (define_mode_iterator REDUC_SSE_PLUS_MODE
- [(V2DF "TARGET_SSE") (V4SF "TARGET_SSE")])
+ [(V2DF "TARGET_SSE") (V4SF "TARGET_SSE")
+  (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")])
 
 (define_expand "reduc_plus_scal_"
  [(plus:REDUC_SSE_PLUS_MODE
@@ -3194,7 +3195,9 @@ (define_expand "reduc_plus_scal_v16qi"
 
 (define_mode_iterator REDUC_PLUS_MODE
  [(V4DF "TARGET_AVX") (V8SF "TARGET_AVX")
+  (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
   (V8DF "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
+  (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
   (V32QI "TARGET_AVX") (V64QI "TARGET_AVX512F")])
 
 (define_expand "reduc_plus_scal_"
@@ -3214,7 +3217,8 @@ (define_expand "reduc_plus_scal_"
 
 ;; Modes handled by reduc_sm{in,ax}* patterns.
 (define_mode_iterator REDUC_SSE_SMINMAX_MODE
-  [(V4SF "TARGET_SSE") (V2DF "TARGET_SSE")
+  [(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V4SF "TARGET_SSE") (V2DF "TARGET_SSE")
(V4SI "TARGET_SSE2") (V8HI "TARGET_SSE2") (V16QI "TARGET_SSE2")
(V2DI "TARGET_SSE4_2")])
 
@@ -3233,9 +3237,11 @@ (define_expand "reduc__scal_"
 
 (define_mode_iterator REDUC_SMINMAX_MODE
   [(V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")
(V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
(V64QI "TARGET_AVX512BW")
+   (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V32HI "TARGET_AVX512BW") (V16SI "TARGET_AVX512F")
(V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
(V8DF "TARGET_AVX512F")])
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-2.c
new file mode 100644
index 000..593340e4afa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-2.c
@@ -0,0 +1,96 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mprefer-vector-width=512 -fdump-tree-optimized" } */
+
+/* { dg-final { scan-tree-dump-times "\.REDUC_PLUS" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.REDUC_MIN" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.REDUC_MAX" 3 "optimized" } } */
+
+_Float16
+__attribute__((noipa, target("avx512fp16,avx512vl"), optimize("Ofast")))
+reduc_add_128 (_Float16* p)
+{
+  _Float16 sum = 0;
+  for (int i = 0; i != 8; i++)
+    sum += p[i];
+  return sum;
+}
+
+_Float16
+__attribute__((noipa, target("avx512fp16,avx512vl"), optimize("Ofast")))
+reduc_add_256 (_Float16* p)
+{
+  _Float16 sum = 0;
+  for (int i = 0; i != 16; i++)
+    sum += p[i];
+  return sum;
+}
+
+_Float16
+__attribute__((noipa, target("avx512fp16,avx512vl"), optimize("Ofast")))
+reduc_add_512 (_Float16* p)
+{
+  _Float16 sum = 0;
+  for (int i = 0; i != 32; i++)
+    sum += p[i];
+  return sum;
+}
+
+_Float16
+__attribute__((noipa, target("avx512fp16,avx512vl"), optimize("Ofast")))
+reduc_min_128 (_Float16* p)
+{
+  _Float16 sum = p[0];
+  

Re: [PATCHv2] top-level configure: setup target_configdirs based on repository

2021-09-27 Thread Richard Biener via Gcc-patches
On Fri, Sep 24, 2021 at 12:34 PM Andrew Burgess
 wrote:
>
> * Thomas Schwinge  [2021-09-23 11:29:05 +0200]:
>
> > Hi!
> >
> > I only had a curious look here; hope that's still useful.
> >
> > On 2021-09-22T16:30:42+0100, Andrew Burgess  wrote:
> > > The top-level configure script is shared between the gcc repository
> > > and the binutils-gdb repository.
> > >
> > > The target_configdirs variable in the configure.ac script, defines
> > > sub-directories that contain components that should be built for the
> > > target using the target tools.
> > >
> > > Some components, e.g. zlib, are built as both host and target
> > > libraries.
> > >
> > > This causes problems for binutils-gdb.  If we run 'make all' in the
> > > binutils-gdb repository we end up trying to build a target version of
> > > the zlib library, which requires the target compiler be available.
> > > Often the target compiler isn't immediately available, and so the
> > > build fails.
> >
> > I did wonder: shouldn't normally these target libraries be masked out via
> > 'noconfigdirs' (see 'Handle --disable- generically' section),
> > via 'enable_[...]' being set to 'no'?  But I think I now see the problem
> > here: the 'enable_[...]' variables guard both the host and target library
> > build!  (... if I'm quickly understanding that correctly...)
> >
> > ... and you do need the host zlib, thus '$enable_zlib != no'.
> >
> > > The problem with zlib impacted a previous attempt to synchronise the
> > > top-level configure scripts from gcc to binutils-gdb, see this thread:
> > >
> > >   https://sourceware.org/pipermail/binutils/2019-May/107094.html
> > >
> > > And I'm in the process of importing libbacktrace in to binutils-gdb,
> > > which is also a host and target library, and triggers the same issues.
> > >
> > > I believe that for binutils-gdb, at least at the moment, there are no
> > > target libraries that we need to build.
> > >
> > > My proposal then is to make the value of target_libraries change based
> > > on which repository we are building in.  Specifically, if the source
> > > tree has a gcc/ directory then we should set the target_libraries
> > > variable, otherwise this variable is left empty.
> > >
> > > I think that if someone tries to create a single unified tree (gcc +
> > > binutils-gdb in a single source tree) and then build, this change will
> > > not have a negative impact, the tree still has gcc/ so we'd expect the
> > > target compiler to be built, which means building the target_libraries
> > > should work just fine.
> > >
> > > However, if the source tree lacks gcc/ then we assume the target
> > > compiler isn't built/available, and so target_libraries shouldn't be
> > > built.
> > >
> > > There is already precedent within configure.ac for check on the
> > > existence of gcc/ in the source tree, see the handling of
> > > -enable-werror around line 3658.
> >
> > (I understand that one to just guard the 'cat $srcdir/gcc/DEV-PHASE',
> > though.)
> >
> > > I've tested a build of gcc on x86-64, and the same set of target
> > > libraries still seem to get built.  On binutils-gdb this change
> > > resolves the issues with 'make all'.
> > >
> > > Any thoughts?
> >
> > > --- a/configure.ac
> > > +++ b/configure.ac
> > > @@ -180,9 +180,17 @@ target_tools="target-rda"
> > >  ## We assign ${configdirs} this way to remove all embedded newlines.  This
> > >  ## is important because configure will choke if they ever get through.
> > >  ## ${configdirs} is directories we build using the host tools.
> > > -## ${target_configdirs} is directories we build using the target tools.
> > > +##
> > > +## ${target_configdirs} is directories we build using the target
> > > +## tools, these are only needed when working in the gcc tree.  This
> > > +## file is also reused in the binutils-gdb tree, where building any
> > > +## target stuff doesn't make sense.
> > >  configdirs=`echo ${host_libs} ${host_tools}`
> > > -target_configdirs=`echo ${target_libraries} ${target_tools}`
> > > +if test -d ${srcdir}/gcc; then
> > > +  target_configdirs=`echo ${target_libraries} ${target_tools}`
> > > +else
> > > +  target_configdirs=""
> > > +fi
> > >  build_configdirs=`echo ${build_libs} ${build_tools}`
> >
> > What I see is that after this, there are still occasions where inside
> > 'case "${target}"', 'target_configdirs' gets amended, so those won't be
> > caught by your approach?
>
> Good point, I'd failed to spot these.
>
> >
> > Instead of erasing 'target_configdirs' as you've posted, and
> > understanding that we can't just instead add all the "offending" ones to
> > 'noconfigdirs' for '! test -d "$srcdir"/gcc/' (because that would also
> > disable them for host usage),
>
> Great idea, this is what I've done in the revised patch below.
>
> >I wonder if it'd make sense to turn all
> > existing 'target_libraries=[...]' and 'target_tools=[...]' assignments
> > and later amendments into '[...]_gcc=[...]' variants, with 

Re: [PATCH] Allow different vector types for stmt groups

2021-09-27 Thread Richard Biener via Gcc-patches
On Mon, 27 Sep 2021, Richard Biener wrote:

> On Fri, 24 Sep 2021, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > This allows vectorization (in practice non-loop vectorization) to
> > > have a stmt participate in different vector type vectorizations.
> > > It allows us to remove vect_update_shared_vectype and replace it
> > > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > > vect_analyze_stmt and vect_transform_stmt.
> > >
> > > For data-ref the situation is a bit more complicated since we
> > > analyze alignment info with a specific vector type in mind which
> > > doesn't play well when that changes.
> > >
> > > So the bulk of the change is passing down the actual vector type
> > > used for a vectorized access to the various accessors of alignment
> > > info, first and foremost dr_misalignment but also aligned_access_p,
> > > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > > vect_supportable_dr_alignment.  I took the liberty to replace
> > > ALL_CAPS macro accessors with the lower-case function invocations.
> > >
> > > The actual changes to the behavior are in dr_misalignment which now
> > > is the place factoring in the negative step adjustment as well as
> > > handling alignment queries for a vector type with bigger alignment
> > > requirements than what we can (or have) analyze(d).
> > >
> > > vect_slp_analyze_node_alignment makes use of this and upon receiving
> > > a vector type with a bigger alignment desire re-analyzes the DR
> > > with respect to it but keeps an older more precise result if possible.
> > > In this context it might be possible to do the analysis just once
> > > but instead of analyzing with respect to a specific desired alignment
> > > look for the biggest alignment for which we can compute a known alignment.
> > >
> > > The ChangeLog includes the functional changes but not the bulk due
> > > to the alignment accessor API changes - I hope that's something good.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> > > CPU 2017 in progress (for stats and correctness).
> > >
> > > Any comments?
> > 
> > Sorry for the super-slow response, some comments below.
> > 
> > > […]
> > > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > > index a57700f2c1b..c42fc2fb272 100644
> > > --- a/gcc/tree-vect-data-refs.c
> > > +++ b/gcc/tree-vect-data-refs.c
> > > @@ -887,37 +887,53 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance)
> > >return res;
> > >  }
> > >  
> > > -/* Return the misalignment of DR_INFO.  */
> > > +/* Return the misalignment of DR_INFO accessed in VECTYPE.  */
> > >  
> > >  int
> > > -dr_misalignment (dr_vec_info *dr_info)
> > > +dr_misalignment (dr_vec_info *dr_info, tree vectype)
> > >  {
> > > +  HOST_WIDE_INT diff = 0;
> > > +  /* Alignment is only analyzed for the first element of a DR group,
> > > + use that but adjust misalignment by the offset of the access.  */
> > >if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> > >  {
> > >dr_vec_info *first_dr
> > >   = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> > > -  int misalign = first_dr->misalignment;
> > > -  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > > -  if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > > - return misalign;
> > >/* vect_analyze_data_ref_accesses guarantees that DR_INIT are
> > >INTEGER_CSTs and the first element in the group has the lowest
> > >address.  Likewise vect_compute_data_ref_alignment will
> > >have ensured that target_alignment is constant and otherwise
> > >set misalign to DR_MISALIGNMENT_UNKNOWN.  */
> > 
> > Can you move the second sentence down so that it stays with the to_constant?
> > 
> > > -  HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > > - - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> > > +  diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > > +   - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> > >gcc_assert (diff >= 0);
> > > -  unsigned HOST_WIDE_INT target_alignment_c
> > > - = first_dr->target_alignment.to_constant ();
> > > -  return (misalign + diff) % target_alignment_c;
> > > +  dr_info = first_dr;
> > >  }
> > > -  else
> > > +
> > > +  int misalign = dr_info->misalignment;
> > > +  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > > +  if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > > +return misalign;
> > > +
> > > +  /* If the access is only aligned for a vector type with smaller alignment
> > > + requirement the access has unknown misalignment.  */
> > > +  if (maybe_lt (dr_info->target_alignment * BITS_PER_UNIT,
> > > + targetm.vectorize.preferred_vector_alignment (vectype)))
> > > +return DR_MISALIGNMENT_UNKNOWN;
> > > +
> > > +  /* If this is a backward running DR then first access in the larger
> > > + vectype actually 

Re: [PATCH] Fix PR c/94726: ICE with __builtin_shuffle and changing of types

2021-09-27 Thread Richard Biener via Gcc-patches
On Sun, Sep 26, 2021 at 12:00 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> The problem here is __builtin_shuffle when called with two arguments
> instead of 1, uses a SAVE_EXPR to put in for the 1st and 2nd operand
> of VEC_PERM_EXPR and when we go and gimplify the SAVE_EXPR, the type
> is now error_mark_node and that fails hard.
> This fixes the problem by adding a simple check for type of operand
> of SAVE_EXPR not to be error_mark_node.
>
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

OK.

Richard.

> gcc/ChangeLog:
>
> PR c/94726
> * gimplify.c (gimplify_save_expr): Return early
> if the type of val is error_mark_node.
>
> gcc/testsuite/ChangeLog:
>
> PR c/94726
> * gcc.dg/pr94726.c: New test.
> ---
>  gcc/gimplify.c |  3 +++
>  gcc/testsuite/gcc.dg/pr94726.c | 11 +++
>  2 files changed, 14 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr94726.c
>
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 9163dcda438..943c5cb8f2d 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -6232,6 +6232,9 @@ gimplify_save_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
>gcc_assert (TREE_CODE (*expr_p) == SAVE_EXPR);
>val = TREE_OPERAND (*expr_p, 0);
>
> +  if (TREE_TYPE (val) == error_mark_node)
> +return GS_ERROR;
> +
>/* If the SAVE_EXPR has not been resolved, then evaluate it once.  */
>if (!SAVE_EXPR_RESOLVED_P (*expr_p))
>  {
> diff --git a/gcc/testsuite/gcc.dg/pr94726.c b/gcc/testsuite/gcc.dg/pr94726.c
> new file mode 100644
> index 000..d6911a644a4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr94726.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +typedef unsigned int type __attribute__ ( ( vector_size ( 2*sizeof(int) ) ) ) ;
> +type a , b;
> +/* { dg-message "note: previous declaration" "previous declaration" { target *-*-* } .-1 } */
> +void foo ( void ) {
> +   type var = { 2 , 2 } ;
> +   b = __builtin_shuffle ( a , var ) ;
> +}
> +
> +void * a [ ] = { } ; /* { dg-error "conflicting types" } */
> --
> 2.17.1
>


Re: [PATCHv2] top-level configure: setup target_configdirs based on repository

2021-09-27 Thread Thomas Schwinge
Hi!

On 2021-09-24T11:34:34+0100, Andrew Burgess  wrote:
> The V2 patch below:
>
>   - Moves the check for gcc/ to much later in the configure script,
> after we've finished building target_configdirs,
>
>   - Makes use of skipdirs to avoid building anything from
> target_configdirs if we're not also building gcc.

Thanks, that looks better in line with how that script generally appears
to work (... per my not-in-depth understanding).  (But I can't formally
approve.)

Reviewed-by: Thomas Schwinge 


Regards,
 Thomas


> commit 84c8b7f1605c8f2840d3c857a4d86abc7dde0668
> Author: Andrew Burgess 
> Date:   Wed Sep 22 15:15:41 2021 +0100
>
> top-level configure: setup target_configdirs based on repository
>
> The top-level configure script is shared between the gcc repository
> and the binutils-gdb repository.
>
> The target_configdirs variable in the configure.ac script, defines
> sub-directories that contain components that should be built for the
> target using the target tools.
>
> Some components, e.g. zlib, are built as both host and target
> libraries.
>
> This causes problems for binutils-gdb.  If we run 'make all' in the
> binutils-gdb repository we end up trying to build a target version of
> the zlib library, which requires the target compiler be available.
> Often the target compiler isn't immediately available, and so the
> build fails.
>
> The problem with zlib impacted a previous attempt to synchronise the
> top-level configure scripts from gcc to binutils-gdb, see this thread:
>
>   https://sourceware.org/pipermail/binutils/2019-May/107094.html
>
> And I'm in the process of importing libbacktrace in to binutils-gdb,
> which is also a host and target library, and triggers the same issues.
>
> I believe that for binutils-gdb, at least at the moment, there are no
> target libraries that we need to build.
>
> In the configure script we build three lists of things we want to
> build, $configdirs, $build_configdirs, and $target_configdirs, we also
> build two lists of things we don't want to build, $skipdirs and
> $noconfigdirs.  We then remove anything that is in the lists of things
> not to build, from the list of things that should be built.
>
> My proposal is to add everything in target_configdirs into skipdirs,
> if the source tree doesn't contain a gcc/ sub-directory.  The result
> is that for binutils-gdb no target tools or libraries will be built,
> while for the gcc repository, nothing should change.
>
> If a user builds a unified source tree, then the target tools and
> libraries should still be built as the gcc/ directory will be present.
>
> I've tested a build of gcc on x86-64, and the same set of target
> libraries still seem to get built.  On binutils-gdb this change
> resolves the issues with 'make all'.
>
> ChangeLog:
>
> * configure: Regenerate.
> * configure.ac (skipdirs): Add the contents of target_configdirs
> if we are not building gcc.
>
> diff --git a/configure b/configure
> index 85ab9915402..785498efff5 100755
> --- a/configure
> +++ b/configure
> @@ -8874,6 +8874,16 @@ case ,${enable_languages}, in
>  ;;
>  esac
>
> +# If gcc/ is not in the source tree then we'll not be building a
> +# target compiler, assume in that case we don't want to build any
> +# target libraries or tools.
> +#
> +# This was added primarily for the benefit for binutils-gdb who reuse
> +# this configure script, but don't always have target tools available.
> +if test ! -d ${srcdir}/gcc; then
> +   skipdirs="${skipdirs} ${target_configdirs}"
> +fi
> +
>  # Remove the entries in $skipdirs and $noconfigdirs from $configdirs,
>  # $build_configdirs and $target_configdirs.
>  # If we have the source for $noconfigdirs entries, add them to $notsupp.
> diff --git a/configure.ac b/configure.ac
> index 1df038b04f3..c523083c346 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2272,6 +2272,16 @@ case ,${enable_languages}, in
>  ;;
>  esac
>
> +# If gcc/ is not in the source tree then we'll not be building a
> +# target compiler, assume in that case we don't want to build any
> +# target libraries or tools.
> +#
> +# This was added primarily for the benefit for binutils-gdb who reuse
> +# this configure script, but don't always have target tools available.
> +if test ! -d ${srcdir}/gcc; then
> +   skipdirs="${skipdirs} ${target_configdirs}"
> +fi
> +
>  # Remove the entries in $skipdirs and $noconfigdirs from $configdirs,
>  # $build_configdirs and $target_configdirs.
>  # If we have the source for $noconfigdirs entries, add them to $notsupp.
--
Siemens Electronic Design Automation GmbH; address: Arnulfstraße 201, 80634 Munich, Germany; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; registered office: Munich; commercial register: Munich, HRB 106955


[PATCH] Revert "Optimize v4sf reduction.".

2021-09-27 Thread liuhongt via Gcc-patches
Revert due to a performance regression.

This reverts commit 8f323c712ea76cc4506b03895e9b991e4e4b2baf.

 PR target/102473
 PR target/101059
---
 gcc/config/i386/sse.md                        | 39 ++-
 gcc/testsuite/gcc.target/i386/sse2-pr101059.c | 32 ---
 gcc/testsuite/gcc.target/i386/sse3-pr101059.c | 13 ---
 3 files changed, 11 insertions(+), 73 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr101059.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/sse3-pr101059.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a446dedb2ec..bb7600edbab 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3156,36 +3156,19 @@ (define_insn "sse3_hv4sf3"
(set_attr "prefix_rep" "1,*")
(set_attr "mode" "V4SF")])
 
-(define_expand "reduc_plus_scal_v4sf"
- [(plus:V4SF
-   (match_operand:SF 0 "register_operand")
-   (match_operand:V4SF 1 "register_operand"))]
- "TARGET_SSE"
-{
-  rtx vtmp = gen_reg_rtx (V4SFmode);
-  rtx stmp = gen_reg_rtx (SFmode);
-  if (TARGET_SSE3)
-    emit_insn (gen_sse3_movshdup (vtmp, operands[1]));
-  else
-    emit_insn (gen_sse_shufps (vtmp, operands[1], operands[1], GEN_INT(177)));
+(define_mode_iterator REDUC_SSE_PLUS_MODE
+ [(V2DF "TARGET_SSE") (V4SF "TARGET_SSE")])
 
-  emit_insn (gen_addv4sf3 (operands[1], operands[1], vtmp));
-  emit_insn (gen_sse_movhlps (vtmp, vtmp, operands[1]));
-  emit_insn (gen_vec_extractv4sfsf (stmp, vtmp, const0_rtx));
-  emit_insn (gen_vec_extractv4sfsf (operands[0], operands[1], const0_rtx));
-  emit_insn (gen_addsf3 (operands[0], operands[0], stmp));
-  DONE;
-})
-
-(define_expand "reduc_plus_scal_v2df"
- [(plus:V2DF
-   (match_operand:DF 0 "register_operand")
-   (match_operand:V2DF 1 "register_operand"))]
- "TARGET_SSE"
+(define_expand "reduc_plus_scal_"
+ [(plus:REDUC_SSE_PLUS_MODE
+   (match_operand: 0 "register_operand")
+   (match_operand:REDUC_SSE_PLUS_MODE 1 "register_operand"))]
+ ""
 {
-  rtx tmp = gen_reg_rtx (V2DFmode);
-  ix86_expand_reduc (gen_addv2df3, tmp, operands[1]);
-  emit_insn (gen_vec_extractv2dfdf (operands[0], tmp, const0_rtx));
+  rtx tmp = gen_reg_rtx (mode);
+  ix86_expand_reduc (gen_add3, tmp, operands[1]);
+  emit_insn (gen_vec_extract (operands[0], tmp,
+const0_rtx));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr101059.c b/gcc/testsuite/gcc.target/i386/sse2-pr101059.c
deleted file mode 100644
index d155bf5b43c..000
--- a/gcc/testsuite/gcc.target/i386/sse2-pr101059.c
+++ /dev/null
@@ -1,32 +0,0 @@
-/* { dg-do run } */
-/* { dg-options "-O2 -ffast-math -msse2" } */
-/* { dg-require-effective-target sse2 } */
-
-#ifndef CHECK_H
-#define CHECK_H "sse2-check.h"
-#endif
-
-#ifndef TEST
-#define TEST sse2_test
-#endif
-
-#include CHECK_H
-
-float
-__attribute__((noipa, optimize("tree-vectorize")))
-foo (float* p)
-{
-  float sum = 0.f;
-  for (int i = 0; i != 4; i++)
-    sum += p[i];
-  return sum;
-}
-
-static void
-TEST (void)
-{
-  float p[4] = {1.0f, 2.0f, 3.0f, 4.0f};
-  float res = foo (p);
-  if (res != 10.0f)
-    abort();
-}
diff --git a/gcc/testsuite/gcc.target/i386/sse3-pr101059.c b/gcc/testsuite/gcc.target/i386/sse3-pr101059.c
deleted file mode 100644
index 4795e892883..000
--- a/gcc/testsuite/gcc.target/i386/sse3-pr101059.c
+++ /dev/null
@@ -1,13 +0,0 @@
-/* { dg-do run } */
-/* { dg-options "-O2 -ffast-math -msse3" } */
-/* { dg-require-effective-target sse3 } */
-
-#ifndef CHECK_H
-#define CHECK_H "sse3-check.h"
-#endif
-
-#ifndef TEST
-#define TEST sse3_test
-#endif
-
-#include "sse2-pr101059.c"
-- 
2.27.0



Re: [PATCH] Allow different vector types for stmt groups

2021-09-27 Thread Richard Biener via Gcc-patches
On Fri, 24 Sep 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > This allows vectorization (in practice non-loop vectorization) to
> > have a stmt participate in different vector type vectorizations.
> > It allows us to remove vect_update_shared_vectype and replace it
> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > vect_analyze_stmt and vect_transform_stmt.
> >
> > For data-ref the situation is a bit more complicated since we
> > analyze alignment info with a specific vector type in mind which
> > doesn't play well when that changes.
> >
> > So the bulk of the change is passing down the actual vector type
> > used for a vectorized access to the various accessors of alignment
> > info, first and foremost dr_misalignment but also aligned_access_p,
> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > vect_supportable_dr_alignment.  I took the liberty to replace
> > ALL_CAPS macro accessors with the lower-case function invocations.
> >
> > The actual changes to the behavior are in dr_misalignment which now
> > is the place factoring in the negative step adjustment as well as
> > handling alignment queries for a vector type with bigger alignment
> > requirements than what we can (or have) analyze(d).
> >
> > vect_slp_analyze_node_alignment makes use of this and upon receiving
> > a vector type with a bigger alignment desire re-analyzes the DR
> > with respect to it but keeps an older more precise result if possible.
> > In this context it might be possible to do the analysis just once
> > but instead of analyzing with respect to a specific desired alignment
> > look for the biggest alignment for which we can compute a known alignment.
> >
> > The ChangeLog includes the functional changes but not the bulk due
> > to the alignment accessor API changes - I hope that's something good.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> > CPU 2017 in progress (for stats and correctness).
> >
> > Any comments?
> 
> Sorry for the super-slow response, some comments below.
> 
> > […]
> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > index a57700f2c1b..c42fc2fb272 100644
> > --- a/gcc/tree-vect-data-refs.c
> > +++ b/gcc/tree-vect-data-refs.c
> > @@ -887,37 +887,53 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance)
> >return res;
> >  }
> >  
> > -/* Return the misalignment of DR_INFO.  */
> > +/* Return the misalignment of DR_INFO accessed in VECTYPE.  */
> >  
> >  int
> > -dr_misalignment (dr_vec_info *dr_info)
> > +dr_misalignment (dr_vec_info *dr_info, tree vectype)
> >  {
> > +  HOST_WIDE_INT diff = 0;
> > +  /* Alignment is only analyzed for the first element of a DR group,
> > + use that but adjust misalignment by the offset of the access.  */
> >if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> >  {
> >dr_vec_info *first_dr
> > = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> > -  int misalign = first_dr->misalignment;
> > -  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > -  if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > -   return misalign;
> >/* vect_analyze_data_ref_accesses guarantees that DR_INIT are
> >  INTEGER_CSTs and the first element in the group has the lowest
> >  address.  Likewise vect_compute_data_ref_alignment will
> >  have ensured that target_alignment is constant and otherwise
> >  set misalign to DR_MISALIGNMENT_UNKNOWN.  */
> 
> Can you move the second sentence down so that it stays with the to_constant?
> 
> > -  HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > -   - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> > +  diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > + - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> >gcc_assert (diff >= 0);
> > -  unsigned HOST_WIDE_INT target_alignment_c
> > -   = first_dr->target_alignment.to_constant ();
> > -  return (misalign + diff) % target_alignment_c;
> > +  dr_info = first_dr;
> >  }
> > -  else
> > +
> > +  int misalign = dr_info->misalignment;
> > +  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > +  if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > +return misalign;
> > +
> > +  /* If the access is only aligned for a vector type with smaller alignment
> > + requirement the access has unknown misalignment.  */
> > +  if (maybe_lt (dr_info->target_alignment * BITS_PER_UNIT,
> > +   targetm.vectorize.preferred_vector_alignment (vectype)))
> > +return DR_MISALIGNMENT_UNKNOWN;
> > +
> > +  /* If this is a backward running DR then first access in the larger
> > + vectype actually is N-1 elements before the address in the DR.
> > + Adjust misalign accordingly.  */
> > +  if (tree_int_cst_sgn (DR_STEP (dr_info->dr)) < 0)
> >  {
> > -  int misalign = dr_info->misalignment;
> > -  gcc_assert (misalign