Mailing list unsubscribe not working?

2019-10-30 Thread Steve Ellcey
I am not sure if this is the correct mailing list but I did not see a
better one to use.

I have been trying to unsubscribe from some mailing lists and the process
does not seem to be working.  As an example, I sent an unsubscribe request
for libstdc++-digest and got a reply asking me to confirm.  When I replied
to confirm, I got:

| Acknowledgment: The address
| 
|prvs=82067f4a26=sell...@marvell.com
| 
| was not on the libstdc++-digest mailing list when I received
| your request and is not a subscriber of this list.
|
| If you unsubscribe, but continue to receive mail, you're subscribed
| under a different address than you currently use. Please look at the
| header for:
| 
| 'Return-Path: '


I looked at the email sent to me and I see:

| Return-Path: libstdc++-digest-return-14453-sellcey=marvell@gcc.gnu.org

So that would seem to imply that I am in fact subscribed as sell...@marvell.com,
yet the unsubscribe still failed.  Has anyone else had this issue, or does
anyone have an idea of what is going on?

Steve Ellcey


Re: RFC: Extending --with-advance-toolchain to aarch64

2019-10-10 Thread Steve Ellcey
On Thu, 2019-10-10 at 15:38 -0300, Tulio Magno Quites Machado Filho wrote:
> 
> > Let me first describe what I do now:
> > 
> > configure/build BINUTILS with --prefix=${X} --with-sysroot=${X}
> > configure/build an initial GCC (all-gcc all-target-libgcc) with
> > --prefix=${X} --with-sysroot=${X}
> > configure/build GLIBC, using that GCC, with --prefix=/usr,
> > followed by install with DESTDIR=${X}
> 
> Can you use --prefix=${X}?

I can.  I would rather not, because when you don't have prefix set to
/usr you get a different glibc build.  For example, on aarch64 building
with --prefix=/usr means that libraries are put in lib64 (or libilp32)
instead of just lib.  The glibc folks are always preaching against 
building with a prefix of anything other than /usr.

> 
> Florian already explained why glibc has this test.
> But the Advance Toolchain carries the following patch:
> 
> https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=9ca2648e2aa7094e022d5150281b2575f866259f
> 

Ah, I see.  I was hoping that using --with-advance-toolchain would give
me a way to build a toolchain without needing any local/non-standard
patches.

Steve Ellcey
sell...@marvell.com


Re: RFC: Extending --with-advance-toolchain to aarch64

2019-10-10 Thread Steve Ellcey
On Thu, 2019-10-10 at 18:41 +0200, Florian Weimer wrote:
> 
> * Steve Ellcey:
> 
> > I would like these used by default so I took some ideas from
> > --with-advance-toolchain and used that to automatically add these options
> > to LINK_SPEC (see attached patch).  I can compile and link a program with
> > this setup, but when I run the program I get:
> > 
> > % ./x
> > Inconsistency detected by ld.so: get-dynamic-info.h: 147: 
> > elf_get_dynamic_info: 
> > Assertion `info[DT_RPATH] == NULL' failed!
> > 
> > I am not sure why this doesn't work.  Can anyone help me understand
> > why this doesn't work or help me figure out how else I might be able to
> > get the functionality I want. That is: to use shared libraries and a dynamic
> > linker (at run time) that are in a non-standard location without needing
> > to compile or link with special flags.
> 
> An argument could be made that if ld.so has DT_RPATH set,
> LD_LIBRARY_PATH would stop working, which would be a bug.  Hence the
> assert.  It's probably less an issue for DT_RUNPATH.
> 
> The real fix would be to make sure that ld.so isn't built with those
> dynamic tags.  If ld.so wants to use an alternative search path, that
> should be baked into the loader itself, explicitly.
> 
> Do you know where those dynamic tags originate?  Is there some wrapper
> script involved that sets them unconditionally?

I am not sure, but my guess is that it is because I am building
binutils (including ld) using --with-sysroot.  I build both GCC and
binutils with the sysroot directory where I put the glibc that I am
building.  Maybe I should try building GCC with --with-sysroot but
build binutils without it.

Steve Ellcey
sell...@marvell.com


Re: RFC: Extending --with-advance-toolchain to aarch64

2019-10-10 Thread Steve Ellcey
On Thu, 2019-10-10 at 10:49 +1030, Alan Modra wrote:
> On Wed, Oct 09, 2019 at 10:29:48PM +0000, Steve Ellcey wrote:
> > I have a question about building a toolchain that uses (at run
> > time) a
> > dynamic linker and system libraries and headers that are in a non-
> > standard
> > place.
> 
> I had scripts a long time ago to build a complete toolchain including
> glibc that could be installed in a non-standard location and co-exist
> with other system libraries.  I worked around..
> 
> > Inconsistency detected by ld.so: get-dynamic-info.h: 147:
> > elf_get_dynamic_info: 
> > Assertion `info[DT_RPATH] == NULL' failed!
> 
> ..this by patching glibc.

Yes, I have something working by patching glibc (and gcc) but when I
saw the IBM --with-advance-toolchain option I was hoping I might be
able to come up with a build process that worked and did not need any
patching.

Steve Ellcey
sell...@marvell.com


RFC: Extending --with-advance-toolchain to aarch64

2019-10-09 Thread Steve Ellcey
I have a question about building a toolchain that uses (at run time) a
dynamic linker and system libraries and headers that are in a non-standard
place.

I just noticed the IBM --with-advance-toolchain option and I would
like to replicate it for aarch64.

Let me first describe what I do now:

configure/build BINUTILS with --prefix=${X} --with-sysroot=${X}
configure/build an initial GCC (all-gcc all-target-libgcc) with
--prefix=${X} --with-sysroot=${X}
configure/build GLIBC, using that GCC, with --prefix=/usr,
followed by install with DESTDIR=${X}
configure/build final GCC with --prefix=${X} --with-sysroot=${X}

This all works, but if I want my executables to find the shared libraries and
dynamic linker from ${X} when they are running, I need to compile things with:

   -Wl,--rpath=${X}/lib64 -Wl,--dynamic-linker=${X}/lib/ld-linux-aarch64.so.1

I would like these options used by default, so I took some ideas from
--with-advance-toolchain and used them to automatically add these options
to LINK_SPEC (see attached patch).  I can compile and link a program with
this setup, but when I run the program I get:

% ./x
Inconsistency detected by ld.so: get-dynamic-info.h: 147: elf_get_dynamic_info: 
Assertion `info[DT_RPATH] == NULL' failed!

I am not sure why this doesn't work.  Can anyone help me understand why,
or help me figure out how else I might be able to get the functionality I
want?  That is: to use shared libraries and a dynamic linker (at run time)
that are in a non-standard location, without needing to compile or link
with special flags.

Steve Ellcey
sell...@marvell.com


Here is the patch I am trying.  I use the --with-advance-toolchain option as
an absolute pathname instead of relative to /opt as IBM does, and I set it
to ${X} in a build that otherwise looks like what I describe above.  Everything
works until I start the final GCC build, which is when I get the assertion.
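For reference, the advance-toolchain.h header that the config.gcc hunk below
writes out boils down to a single spec macro along these lines (the /path/to/X
prefix stands in for ${X}; this is an illustrative sketch of the generated
header, not a file that exists in the tree):

/* Illustrative sketch of the generated advance-toolchain.h when
   --with-advance-toolchain=${X} is given.  The %{...} spec syntax picks
   the per-ABI library directory and dynamic linker name at link time.  */
#undef  LINK_ADVANCE_SPEC
#define LINK_ADVANCE_SPEC \
  "--rpath=/path/to/X/lib%{mabi=ilp32:ilp32}%{mabi=lp64:64} " \
  "--rpath=/path/to/X/usr/lib%{mabi=ilp32:ilp32}%{mabi=lp64:64} " \
  "--dynamic-linker=/path/to/X/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"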


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 481bc9586a7..0532139b0b1 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3879,7 +3879,7 @@ fi
 supported_defaults=
 case "${target}" in
aarch64*-*-*)
-   supported_defaults="abi cpu arch"
+   supported_defaults="abi cpu arch advance_toolchain"
for which in cpu arch; do
 
eval "val=\$with_$which"
@@ -3981,6 +3981,23 @@ case "${target}" in
  exit 1
fi
done
+   if test "x$with_advance_toolchain" != x; then
+   at=$with_advance_toolchain
+   if test -d "$at/." -a -d "$at/include/."; then
+   tm_file="$tm_file ./advance-toolchain.h"
+   (
+echo "/* Use Advance Toolchain $at */"
+echo "#undef  LINK_ADVANCE_SPEC"
+echo "#define LINK_ADVANCE_SPEC" \
+  "\"--rpath=$at/lib%{mabi=ilp32:ilp32}%{mabi=lp64:64} 
\
+  
"--rpath=$at/usr/lib%{mabi=ilp32:ilp32}%{mabi=lp64:64} \
+  
"--dynamic-linker=$at/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1\""
+   ) > advance-toolchain.h
+   else
+   echo "Unknown advance-toolchain $at"
+   exit 1
+   fi
+   fi
;;
 
alpha*-*-*)
diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 6ff2163b633..d76fa56c73e 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -47,7 +47,10 @@
-maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"
 
 
-#define LINK_SPEC LINUX_TARGET_LINK_SPEC AARCH64_ERRATA_LINK_SPEC
+#ifndef LINK_ADVANCE_SPEC
+#define LINK_ADVANCE_SPEC
+#endif
+#define LINK_SPEC LINUX_TARGET_LINK_SPEC AARCH64_ERRATA_LINK_SPEC LINK_ADVANCE_SPEC
 
 #define GNU_USER_TARGET_MATHFILE_SPEC \
   "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s}"



Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-10-08 Thread Steve Ellcey
On Mon, 2019-10-07 at 12:04 +0200, Richard Biener wrote:
> On Tue, Oct 1, 2019 at 1:48 PM Dmitrij Pochepko
>  wrote:
> > 
> > Hi Richard,
> > 
> > I updated patch according to all your comments.
> > Also bootstrapped and tested again on x86_64-pc-linux-gnu and
> > aarch64-linux-gnu, which took some time.
> > 
> > attached v3.
> 
> OK.
> 
> Thanks,
> Richard.

Dmitrij,

I checked in this patch for you.

Steve Ellcey
sell...@marvell.com


Re: SPEC 2017 profiling question (502.gcc_r and 505.mcf_r fail)

2019-10-04 Thread Steve Ellcey
On Fri, 2019-10-04 at 15:58 -0500, Bill Schmidt wrote:
> 
> > Has anyone else seen these failures?
> 
> 
> Have you tried -fno-strict-aliasing?  There is a known issue with 
> spec_qsort() that affects both of these benchmarks.  See 
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201 .
> 
> Hope this helps,
> 
> Bill

Ah, of course, thank you.  I verified that this fixes my mcf failure;
gcc is still running.  I already had -fno-strict-aliasing for perlbench,
so I should have figured out that it could be affecting other tests too.

Steve Ellcey
sell...@marvell.com


SPEC 2017 profiling question (502.gcc_r and 505.mcf_r fail)

2019-10-04 Thread Steve Ellcey
I am curious whether anyone has tried running 'peak' SPEC 2017 numbers using
profiling.  Now that the cactus LTO bug has been fixed I can run all
the SPEC intrate and fprate benchmarks with '-Ofast -flto -march=native'
on my aarch64 box and get accurate results, but when I try to use these
options along with -fprofile-generate/-fprofile-use I get two
verification errors: 502.gcc_r and 505.mcf_r.  The gcc benchmark is
generating different assembly language for some of its tests, and mcf is
generating different numbers that look too large to be due only to
unsafe math optimizations.

Has anyone else seen these failures?

Steve Ellcey
sell...@marvell.com


Boost build broken due to recent C++ change?

2019-09-24 Thread Steve Ellcey
A recent g++ change (I haven't tracked down exactly which one, but in
the last day or two) seems to have broken my boost build.  It is dying
with lots of errors like:

./boost/intrusive/list.hpp:1448:7:   required from here
./boost/intrusive/detail/list_iterator.hpp:93:41: error: call of
overloaded 'get_next(boost::intrusive::list_node*&)' is ambiguous
   93 |   node_ptr p = node_traits::get_next(members_.nodeptr_);
  |~^~~
In file included from ./boost/intrusive/list_hook.hpp:20,
 from ./boost/intrusive/list.hpp:20,
 from ./boost/fiber/context.hpp:29,
 from libs/fiber/src/algo/algorithm.cpp:9:

Has anyone else run into this?  I will try to create a cutdown test
case.

Steve Ellcey
sell...@marvell.com


ICE when compiling SPEC 526.blender_r benchmark (profiling)

2019-09-23 Thread Steve Ellcey


Before I submit a Bugzilla report or try to cut down a test case, has
anyone seen this problem when compiling the 526.blender_r benchmark from
SPEC 2017:

Compiling with '-Ofast -flto -march=native -fprofile-generate' on Aarch64:

during GIMPLE pass: vect
blender/source/blender/imbuf/intern/indexer.c: In function 'IMB_indexer_open':
blender/source/blender/imbuf/intern/indexer.c:157:20: internal compiler error: 
in execute_todo, at passes.c:2012
  157 | struct anim_index *IMB_indexer_open(const char *name)
  |^
0xa5ee2b execute_todo
../../gcc/gcc/passes.c:2012
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


Re: Help with bug in GCC garbage collector

2019-08-19 Thread Steve Ellcey
On Mon, 2019-08-19 at 17:05 -0600, Jeff Law wrote:
> 
> There's a real good chance Martin Liska has already fixed this.  He's
> made a couple fixes in the last week or so with the interactions
> between
> the GC system and the symbol tables.
> 
> 
> 2019-08-15  Martin Liska  
> 
> PR ipa/91404
> * passes.c (order): Remove.
> (uid_hash_t): Likewise).
> (remove_cgraph_node_from_order): Remove from set
> of pointers (cgraph_node *).
> (insert_cgraph_node_to_order): New.
> (duplicate_cgraph_node_to_order): New.
> (do_per_function_toporder): Register all 3 cgraph hooks.
> Skip removed_nodes now as we know about all of them.
> 
> 
> The way I'd approach it would be to configure a compiler with
> --enable-checking=gc,gcac, just build it through stage1.  Then run your
> test through that compiler, which should fail.  Then apply Martin's patch
> (or update to the head of the trunk), rebuild the stage1 compiler and
> verify it works.

I had already built a compiler with --enable-checking=gc,gcac; that did
not catch the bug (I still got a segfault).  I did update my sources,
though, and the bug does not happen at ToT, so it looks like Martin's
patch did fix my bug.

Steve Ellcey
sell...@marvell.com


Help with bug in GCC garbage collector

2019-08-19 Thread Steve Ellcey
I was wondering if anyone could help me investigate a bug I am seeing
in the GCC garbage collector.  This bug (which may or may not be PR
89179) is causing a segfault in GCC, but when I try to create a
preprocessed source file, the bug doesn't trigger.  The problem is with
the garbage collector trying to mark some memory that has already been
freed.  I have tracked down the initial allocation to:

symbol_table::allocate_cgraph_symbol

It has:

node = ggc_cleared_alloc<cgraph_node> ();

to allocate a cgraph node.  With the GGC debugging on I see this
allocated:

Allocating object, requested size=360, actual=360 at 0x7029c210 on 0x41b148c0

then freed:

Freeing object, actual size=360, at 0x7029c210 on 0x41b148c0

And then later, while the garbage collector is marking nodes, I see:

Marking 0x7029c210

The garbage collector shouldn't be marking this node if it has already
been freed.

So I guess my main question is: how do I figure out how the garbage
collector got to this memory location?  I am guessing some GTY pointer
is still pointing to it and was not nulled out when the memory was
freed.  Does that seem like the most likely cause?
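In case it helps to see the failure mode I have in mind spelled out, here is
a minimal sketch of a stale GTY root; the names are made up for illustration
and this is not actual GCC code:

/* Hypothetical sketch, not actual GCC code.  A static GTY root keeps a
   pointer to a cgraph_node; ggc_mark_roots () walks this root on every
   collection, so if the node is explicitly freed without clearing the
   root, the collector ends up marking memory that is already dead.  */
static GTY(()) struct cgraph_node *cached_node;

void
remember_node (struct cgraph_node *node)
{
  cached_node = node;   /* the root now points at the node */
}

void
release_node (struct cgraph_node *node)
{
  ggc_free (node);      /* bug: cached_node still points at freed memory */
  /* the fix would be: cached_node = NULL;  */
}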

I am not sure why I am only running into this with one particular
application on my Aarch64 platform.  I am building it with -fopenmp,
which could have something to do with it (though there are no simd functions in 
the application).  The application is not that large as C++ programs go.

Steve Ellcey
sell...@marvell.com


Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-08-07 Thread Steve Ellcey
On Wed, 2019-08-07 at 10:40 +, Szabolcs Nagy wrote:
> ---
> ---
> On 31/07/2019 08:25, Richard Sandiford wrote:
> > Steve Ellcey  writes:
> > > 
> > > 2019-07-30  Steve Ellcey  
> > > 
> > >   * omp-simd-clone.c (simd_clone_adjust_return_type): Remove call
> > > to
> > >   build_distinct_type_copy.
> > >   (simd_clone_adjust_argument_types): Ditto.
> > >   (simd_clone_adjust): Call build_distinct_type_copy here.
> > >   (expand_simd_clones): Ditto.
> > > 
> > > 2019-07-30  Steve Ellcey  
> > > 
> > >   * gcc.target/aarch64/simd_pcs_attribute.c: New test.
> > >   * gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
> > >   * gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.
> > 
> > OK if there are no objections in 48 hours.
> 
> i think this should be backported to gcc-9 too.

Yes, I agree.  The 9.X branch is frozen right now for the 9.2 release;
I will backport it to the branch after it reopens, assuming there are no
objections.

Steve Ellcey
sell...@marvell.com


RFC: Help with updating LTO documentation

2019-08-07 Thread Steve Ellcey
While trying to use the -flto and -fwhole-program flags I ran into problems
understanding what they do.  I would like to update the documentation, but I
still don't understand these flags well enough to be able to describe their
behaviour.  Here is the documentation section I would like to fix but don't
have enough information to do so.

From lto.texi:

| @subsection LTO modes of operation
| 
| One of the main goals of the GCC link-time infrastructure was to allow
| effective compilation of large programs.  For this reason GCC implements two
| link-time compilation modes.
| 
| @enumerate
| @item   @emph{LTO mode}, in which the whole program is read into the
| compiler at link-time and optimized in a similar way as if it
| were a single source-level compilation unit.
| 
| @item   @emph{WHOPR or partitioned mode}, designed to utilize multiple
| CPUs and/or a distributed compilation environment to quickly link
| large applications.  WHOPR stands for WHOle Program optimizeR (not to
| be confused with the semantics of @option{-fwhole-program}).  It
| partitions the aggregated callgraph from many different @code{.o}
| files and distributes the compilation of the sub-graphs to different
| CPUs.

What flag(s) do I use (or not use) to enable @emph{LTO mode}?
I am guessing that if I use -flto but not -flto-partition on a
link, this is what I get.  Is that correct?

What flag(s) do I use to enable @emph{WHOPR or partitioned mode}?
I am guessing that this is -flto-partition?  Do I also need -flto if I
am using -flto-partition?  I don't see any description in lto.texi or in
common.opt of exactly what the various values for -flto-partition
(none, one, balanced, 1to1, max) do.  Does such a description exist
anywhere?

Steve Ellcey
sell...@marvell.com


Re: [EXT] Re: [PATCH 1/3] C++20 constexpr lib part 1/3

2019-08-06 Thread Steve Ellcey
On Tue, 2019-08-06 at 16:47 -0400, Marek Polacek wrote:
> On Tue, Aug 06, 2019 at 08:30:14PM +0000, Steve Ellcey wrote:
> > On Tue, 2019-08-06 at 21:04 +0100, Jonathan Wakely wrote:
> > > 
> > > The RAJAPerf code appears to be built with -std=gnu++11 which
> > > means
> > > Ed's patch should make almost no difference at all. 99% of the
> > > patch
> > > has no effect unless compiling with -std=gnu++2a.
> > > 
> > > I don't see any ICE running the libstdc++ testsuite with
> > > -std=gnu++11,
> > > so I have no better suggestion than creating a bugzilla report
> > > for the
> > > C++ front-end, with the preprocessed source of WIP-COUPLE.cpp
> > 
> > I created a preprocessed file.  Unfortunately when I compile that
> > preprocessed file with the same options as the unpreprocessed file,
> > it does not ICE.  I compiled the original file with -save-temps and
> > that compile no longer gives an ICE.  I hate bugs like this.
> 
> Does -fdirectives-only make any difference?
> 
> Marek

Nope.  I also compared the preprocessed files generated with and without
this patch, and the two are identical.  I am thinking that this patch is
triggering some unrelated latent bug.

Steve Ellcey
sell...@marvell.com


Re: [EXT] Re: [PATCH 1/3] C++20 constexpr lib part 1/3

2019-08-06 Thread Steve Ellcey
On Tue, 2019-08-06 at 21:04 +0100, Jonathan Wakely wrote:
> 
> The RAJAPerf code appears to be built with -std=gnu++11 which means
> Ed's patch should make almost no difference at all. 99% of the patch
> has no effect unless compiling with -std=gnu++2a.
> 
> I don't see any ICE running the libstdc++ testsuite with -std=gnu++11,
> so I have no better suggestion than creating a bugzilla report for the
> C++ front-end, with the preprocessed source of WIP-COUPLE.cpp

I created a preprocessed file.  Unfortunately when I compile that
preprocessed file with the same options as the unpreprocessed file,
it does not ICE.  I compiled the original file with -save-temps and
that compile no longer gives an ICE.  I hate bugs like this.

Steve Ellcey
sell...@marvell.com


Re: [PATCH 1/3] C++20 constexpr lib part 1/3

2019-08-06 Thread Steve Ellcey
Ed,

I have run into an ICE that I tracked down to this patch:

commit 02fefffe6b78c4c39169206aa40fb53f367ecba8
Author: emsr 
Date:   Thu Aug 1 15:25:42 2019 +

2019-08-01  Edward Smith-Rowland  <3dw...@verizon.net>

Implement C++20 p0202 - Add Constexpr Modifiers to Functions
in <array> and <utility> Headers.
Implement C++20 p1023 - constexpr comparison operators for 
std::array.


Before I try to create a smaller test example (the current failure happens
when I build https://github.com/llnl/RAJAPerf.git) I was wondering if you
have already seen this ICE:

/extra/sellcey/raja-build-error/RAJAPerf/src/apps/WIP-COUPLE.cpp: In member 
function 'virtual void rajaperf::apps::COUPLE::runKernel(rajaperf::VariantID)':
/extra/sellcey/raja-build-error/RAJAPerf/src/apps/WIP-COUPLE.cpp:217:1: 
internal compiler error: Segmentation fault
  217 | } // end namespace rajaperf
  | ^
0xe81ddf crash_signal
../../gcc/gcc/toplev.c:326
0x968d14 lookup_page_table_entry
../../gcc/gcc/ggc-page.c:632
0x968d14 ggc_set_mark(void const*)
../../gcc/gcc/ggc-page.c:1531
0xbfeadf gt_ggc_mx_symtab_node(void*)
/extra/sellcey/gcc-tot/obj-gcc/gcc/gtype-desc.c:1302
0xd9d2a7 gt_ggc_ma_order
./gt-passes.h:31
0xd9d2a7 gt_ggc_ma_order
./gt-passes.h:26
0xb6f49f ggc_mark_root_tab
../../gcc/gcc/ggc-common.c:77
0xb6f6c3 ggc_mark_roots()
../../gcc/gcc/ggc-common.c:94
0x9696af ggc_collect()
../../gcc/gcc/ggc-page.c:2201
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-07-30 Thread Steve Ellcey
On Tue, 2019-07-30 at 14:31 +0100, Richard Sandiford wrote:
> 
> > -
> > -  tree new_type = build_distinct_type_copy (TREE_TYPE (node-
> > >decl));
> > -  TYPE_ARG_TYPES (new_type) = new_reversed;
> 
> I think you still need this line, just applied to the existing type
> rather than a new one.
> 
> > -  TREE_TYPE (node->decl) = new_type;
> > -
> >adjustments.release ();
> >  }

OK, I fixed that and retested with no regressions.

Steve Ellcey
sell...@marvell.com


2019-07-30  Steve Ellcey  

* omp-simd-clone.c (simd_clone_adjust_return_type): Remove call to
build_distinct_type_copy.
(simd_clone_adjust_argument_types): Ditto.
(simd_clone_adjust): Call build_distinct_type_copy here.
(expand_simd_clones): Ditto.

2019-07-30  Steve Ellcey  

* gcc.target/aarch64/simd_pcs_attribute.c: New test.
* gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
* gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.
diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index caa8da3cba5..c8c528471a3 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -498,7 +498,6 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
   /* Adjust the function return type.  */
   if (orig_rettype == void_type_node)
 return NULL_TREE;
-  TREE_TYPE (fndecl) = build_distinct_type_copy (TREE_TYPE (fndecl));
   t = TREE_TYPE (TREE_TYPE (fndecl));
   if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
 veclen = node->simdclone->vecsize_int;
@@ -724,11 +723,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  else
 	new_reversed = void_list_node;
 	}
-
-  tree new_type = build_distinct_type_copy (TREE_TYPE (node->decl));
-  TYPE_ARG_TYPES (new_type) = new_reversed;
-  TREE_TYPE (node->decl) = new_type;
-
+  TYPE_ARG_TYPES (TREE_TYPE (node->decl)) = new_reversed;
   adjustments.release ();
 }
   args.release ();
@@ -1164,6 +1159,7 @@ simd_clone_adjust (struct cgraph_node *node)
 {
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
+  TREE_TYPE (node->decl) = build_distinct_type_copy (TREE_TYPE (node->decl));
   targetm.simd_clone.adjust (node);
 
   tree retval = simd_clone_adjust_return_type (node);
@@ -1737,6 +1733,8 @@ expand_simd_clones (struct cgraph_node *node)
 	simd_clone_adjust (n);
 	  else
 	{
+	  TREE_TYPE (n->decl)
+		= build_distinct_type_copy (TREE_TYPE (n->decl));
 	  targetm.simd_clone.adjust (n);
 	  simd_clone_adjust_return_type (n);
 	  simd_clone_adjust_argument_types (n);
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
index e69de29bb2d..8eec6824f37 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+__attribute__ ((__simd__ ("notinbranch")))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+extern double foo (double x);
+
+void bar(double * f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = foo(f[i]);
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
index e69de29bb2d..95f6a6803e8 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+__attribute__ ((__simd__))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+double foo (double x);
+
+void bar(double *f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = foo(f[i]);
+}
+
+double foo(double x)
+{
+	return x * x / 3.0;
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM1v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM2v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN1v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
index e69de29bb2d..eddcef3597c 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+__attribute__ ((__simd__ ("notinbranch")))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+extern double log (double __x);
+
+void foo(double *f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = log(f[i]);
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tlog} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_log} 1 } } */


Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-07-29 Thread Steve Ellcey
Ping.

Steve Ellcey
sell...@marvell.com

On Mon, 2019-07-22 at 11:25 -0700, Steve Ellcey wrote:
> On Fri, 2019-07-19 at 19:24 +0100, Richard Sandiford wrote:
> > 
> > You can probably also remove:
> > 
> >   tree new_type = build_distinct_type_copy (TREE_TYPE (node-
> > >decl));
> >   ...
> >   TREE_TYPE (node->decl) = new_type;
> > 
> > in simd_clone_adjust_argument_types.
> > 
> > I'm happy doing it this way or doing the copy in the AArch64 hook.
> > It's really Jakub's call.
> 
> You are right, that is no longer needed with the current patch.  I
> removed it and retested with no regressions.  Jakub, do you have
> any preference?  I have attached a new version of the patch to this
> email.
> 
> > I don't think the tests need:
> > 
> > /* { dg-require-effective-target aarch64_variant_pcs } */
> > 
> > since they're only dg-do compile.  Leaving the line out would get
> > more
> > coverage for people using older binutils.
> > 
> > The tests are OK with that change, thanks.
> 
> OK, I made that change to the tests.
> 
> 
> Latest version of the patch:
> 
> 2019-07-22  Steve Ellcey  
> 
>   * omp-simd-clone.c (simd_clone_adjust_return_type): Remove call
> to
>   build_distinct_type_copy.
>   (simd_clone_adjust_argument_types): Ditto.
>   (simd_clone_adjust): Call build_distinct_type_copy here.
>   (expand_simd_clones): Ditto.
> 
> 
> 2019-07-22  Steve Ellcey  
> 
>   * gcc.target/aarch64/simd_pcs_attribute.c: New test.
>   * gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
>   * gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.
> 
> 


[PATCH] Fix gcc.dg/gomp/pr89104.c failure on aarch64

2019-07-24 Thread Steve Ellcey
I noticed that the test gcc.dg/gomp/pr89104.c fails on aarch64 platforms.
As mentioned in the bug report for PR 89104, this message is coming from
aarch64 target-specific code, which is why it does not occur on other
platforms.  There doesn't seem to be complete consensus in the bug report
on how to deal with this, but I chose to just use -w on aarch64 to suppress
the warning.

The warning that we get is:

pr89104.c:8:1: warning: GCC does not currently support mixed size types for 
‘simd’ functions
8 | foo (int *x, int y)

This is because 'x' is a 64-bit pointer and 'y' is a 32-bit integer
in the default LP64 mode.  If I use -mabi=ilp32, then aarch64 does not
generate a warning because both arguments are 32 bits.  I could force
ILP32 mode for aarch64 and/or use -w only when not in 32-bit mode,
but that seemed like overkill to me.
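To make the size mismatch concrete, this is essentially the declaration from
the test, shown here only for illustration:

/* On LP64 aarch64, sizeof (int *) == 8 while sizeof (int) == 4, so a simd
   clone of this function would have to mix 64-bit and 32-bit element
   types, which the aarch64 back end currently warns about.  */
#pragma omp declare simd uniform (x) aligned (x)
int foo (int *x, int y);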

OK to checkin?

Steve Ellcey
sell...@marvell.com



2019-07-24  Steve Ellcey  

* gcc.dg/gomp/pr89104.c: Use -w on aarch64*-*-* targets.


diff --git a/gcc/testsuite/gcc.dg/gomp/pr89104.c b/gcc/testsuite/gcc.dg/gomp/pr89104.c
index 505fdda..7f0f688 100644
--- a/gcc/testsuite/gcc.dg/gomp/pr89104.c
+++ b/gcc/testsuite/gcc.dg/gomp/pr89104.c
@@ -2,6 +2,7 @@
 /* PR ipa/89104 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -fopenmp-simd" } */
+/* { dg-options "-O2 -fopenmp-simd -w" { target aarch64*-*-* } } */
 
 #pragma omp declare simd uniform (x) aligned (x)
 int


Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-07-22 Thread Steve Ellcey
On Fri, 2019-07-19 at 19:24 +0100, Richard Sandiford wrote:
> 
> You can probably also remove:
> 
>   tree new_type = build_distinct_type_copy (TREE_TYPE (node->decl));
>   ...
>   TREE_TYPE (node->decl) = new_type;
> 
> in simd_clone_adjust_argument_types.
> 
> I'm happy doing it this way or doing the copy in the AArch64 hook.
> It's really Jakub's call.

You are right, that is no longer needed with the current patch.  I
removed it and retested with no regressions.  Jakub, do you have
any preference?  I have attached a new version of the patch to this
email.

> I don't think the tests need:
> 
> /* { dg-require-effective-target aarch64_variant_pcs } */
> 
> since they're only dg-do compile.  Leaving the line out would get more
> coverage for people using older binutils.
> 
> The tests are OK with that change, thanks.

OK, I made that change to the tests.


Latest version of the patch:

2019-07-22  Steve Ellcey  

* omp-simd-clone.c (simd_clone_adjust_return_type): Remove call to
build_distinct_type_copy.
(simd_clone_adjust_argument_types): Ditto.
(simd_clone_adjust): Call build_distinct_type_copy here.
(expand_simd_clones): Ditto.


2019-07-22  Steve Ellcey  

* gcc.target/aarch64/simd_pcs_attribute.c: New test.
* gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
* gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.


diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index caa8da3cba5..da01d9b8e6c 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -498,7 +498,6 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
   /* Adjust the function return type.  */
   if (orig_rettype == void_type_node)
 return NULL_TREE;
-  TREE_TYPE (fndecl) = build_distinct_type_copy (TREE_TYPE (fndecl));
   t = TREE_TYPE (TREE_TYPE (fndecl));
   if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
 veclen = node->simdclone->vecsize_int;
@@ -724,11 +723,6 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	  else
 	new_reversed = void_list_node;
 	}
-
-  tree new_type = build_distinct_type_copy (TREE_TYPE (node->decl));
-  TYPE_ARG_TYPES (new_type) = new_reversed;
-  TREE_TYPE (node->decl) = new_type;
-
   adjustments.release ();
 }
   args.release ();
@@ -1164,6 +1158,7 @@ simd_clone_adjust (struct cgraph_node *node)
 {
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
+  TREE_TYPE (node->decl) = build_distinct_type_copy (TREE_TYPE (node->decl));
   targetm.simd_clone.adjust (node);
 
   tree retval = simd_clone_adjust_return_type (node);
@@ -1737,6 +1732,8 @@ expand_simd_clones (struct cgraph_node *node)
 	simd_clone_adjust (n);
 	  else
 	{
+	  TREE_TYPE (n->decl)
+		= build_distinct_type_copy (TREE_TYPE (n->decl));
 	  targetm.simd_clone.adjust (n);
 	  simd_clone_adjust_return_type (n);
 	  simd_clone_adjust_argument_types (n);
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
index e69de29bb2d..8eec6824f37 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+__attribute__ ((__simd__ ("notinbranch")))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+extern double foo (double x);
+
+void bar(double * f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = foo(f[i]);
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
index e69de29bb2d..95f6a6803e8 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+__attribute__ ((__simd__))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+double foo (double x);
+
+void bar(double *f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = foo(f[i]);
+}
+
+double foo(double x)
+{
+	return x * x / 3.0;
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM1v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM2v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN1v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
index e69de29bb2d..eddcef3597c 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c

[PATCH, fortran, arm] Fix PR 78314 on arm and aarch64

2019-07-22 Thread Steve Ellcey
This patch fixes PR libfortran/78314, the failure of
gfortran.dg/ieee/ieee_6.f90 on some Arm and Aarch64 platforms.
As mentioned in the PR, trapping FPU exceptions is optional
on ARM, and this function cannot do a runtime check for support,
so we should return 0.

There are a couple of discussion threads about this issue;
pointers to them are in the PR.  This patch has already been
suggested by a couple of people (Uros and Christophe) but has
never been checked in.  Can we go ahead and check it in?

Tested on Aarch64 (ThunderX) with no regressions.

Steve Ellcey
sell...@marvell.com



2019-07-22  Steve Ellcey  

PR libfortran/78314
* config/fpu-glibc.h (support_fpu_trap): Return 0 for Arm/Aarch64.

diff --git a/libgfortran/config/fpu-glibc.h b/libgfortran/config/fpu-glibc.h
index df2588e..692091c 100644
--- a/libgfortran/config/fpu-glibc.h
+++ b/libgfortran/config/fpu-glibc.h
@@ -129,7 +129,11 @@ get_fpu_trap_exceptions (void)
 int
 support_fpu_trap (int flag)
 {
+#if defined(__arm__) || defined(__aarch64__)
+  return 0;
+#else
   return support_fpu_flag (flag);
+#endif
 }
 
 


Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-07-19 Thread Steve Ellcey
Here is version two of my patch to fix simd attribute handling on
aarch64.  Unlike the first patch where I swapped the order of the
calls to targetm.simd_clone.adjust and simd_clone_adjust_return_type,
in this one I remove the (conditional) call to build_distinct_type_copy
from simd_clone_adjust_return_type and do it unconditionally before
calling either routine.  The only downside to this that I can see is
that on non-aarch64 platforms where the return type of a vector
function is VOID (and not changed), we will create a distinct type
where we did not before.

I also added some tests to ensure that, on aarch64, the vector
functions created by cloning a simd function have the .variant_pcs
directive and that the original non-vector version of the function
does not have the directive.  Without this patch the non-vector
version is putting out the directive; that is what this patch
fixes.

Retested on x86 and aarch64 with no regressions.

OK to checkin?

Steve Ellcey
sell...@marvell.com


2019-07-19  Steve Ellcey  

* omp-simd-clone.c (simd_clone_adjust_return_type): Remove call to
build_distinct_type_copy.
(simd_clone_adjust): Call build_distinct_type_copy.
(expand_simd_clones): Ditto.


2019-07-19  Steve Ellcey  

* gcc.target/aarch64/simd_pcs_attribute.c: New test.
* gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
* gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.


diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index caa8da3cba5..427d6f6f514 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -498,7 +498,6 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
   /* Adjust the function return type.  */
   if (orig_rettype == void_type_node)
 return NULL_TREE;
-  TREE_TYPE (fndecl) = build_distinct_type_copy (TREE_TYPE (fndecl));
   t = TREE_TYPE (TREE_TYPE (fndecl));
   if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
 veclen = node->simdclone->vecsize_int;
@@ -1164,6 +1163,7 @@ simd_clone_adjust (struct cgraph_node *node)
 {
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
+  TREE_TYPE (node->decl) = build_distinct_type_copy (TREE_TYPE (node->decl));
   targetm.simd_clone.adjust (node);
 
   tree retval = simd_clone_adjust_return_type (node);
@@ -1737,6 +1737,8 @@ expand_simd_clones (struct cgraph_node *node)
 	simd_clone_adjust (n);
 	  else
 	{
+	  TREE_TYPE (n->decl)
+		= build_distinct_type_copy (TREE_TYPE (n->decl));
 	  targetm.simd_clone.adjust (n);
 	  simd_clone_adjust_return_type (n);
 	  simd_clone_adjust_argument_types (n);
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
index e69de29bb2d..913960c607b 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+/* { dg-require-effective-target aarch64_variant_pcs } */
+
+__attribute__ ((__simd__ ("notinbranch")))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+extern double foo (double x);
+
+void bar(double * f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = foo(f[i]);
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
index e69de29bb2d..e3debb0ab18 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+/* { dg-require-effective-target aarch64_variant_pcs } */
+
+__attribute__ ((__simd__))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+double foo (double x);
+
+void bar(double *f, int n)
+{
+	int i;
+	for (i = 0; i < n; i++)
+		f[i] = foo(f[i]);
+}
+
+double foo(double x)
+{
+	return x * x / 3.0;
+}
+
+/* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM1v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM2v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN1v_foo} 1 } } */
+/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
index e69de29bb2d..17a0a701cf4 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+/* { dg-require-effective-target aarch64_variant_pcs } */
+
+__attribute__ ((__simd__ ("notinbranch")))
+__attribute__ ((__no

Re: [PATCH] Fix simd attribute handling on aarch64

2019-07-18 Thread Steve Ellcey
On Thu, 2019-07-18 at 08:37 +0100, Richard Sandiford wrote:
> 
> > 2019-07-17  Steve Ellcey  
> > 
> > * omp-simd-clone.c (simd_clone_adjust):  Call targetm.simd_clone.adjust
> > after calling simd_clone_adjust_return_type.
> > (expand_simd_clones): Ditto.
> 
> It should be pretty easy to add a test for this, now that we use
> .variant_pcs to mark symbols with the attribute.

OK, I will add some tests that make sure this mark is not on the
scalar version of a simd function.

> > diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
> > index caa8da3cba5..6a6b439d146 100644
> > --- a/gcc/omp-simd-clone.c
> > +++ b/gcc/omp-simd-clone.c
> > @@ -1164,9 +1164,8 @@ simd_clone_adjust (struct cgraph_node *node)
> >  {
> >push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> >  
> > -  targetm.simd_clone.adjust (node);
> > -
> >tree retval = simd_clone_adjust_return_type (node);
> > +  targetm.simd_clone.adjust (node);
> >ipa_parm_adjustment_vec adjustments
> >  = simd_clone_adjust_argument_types (node);
> >  
> > @@ -1737,8 +1736,8 @@ expand_simd_clones (struct cgraph_node *node)
> > simd_clone_adjust (n);
> >   else
> > {
> > - targetm.simd_clone.adjust (n);
> >   simd_clone_adjust_return_type (n);
> > + targetm.simd_clone.adjust (n);
> >   simd_clone_adjust_argument_types (n);
> > }
> > }
> 
> I don't think this is enough, since simd_clone_adjust_return_type
> does nothing for functions that return void (e.g. sincos).
> I think instead aarch64_simd_clone_adjust should do something like:
> 
>   TREE_TYPE (node->decl) = build_distinct_type_copy (TREE_TYPE (node-
> >decl));
> 
> But maybe that has consequences that I've not thought about...

I think that would work, but it would build two distinct types for non-void
functions, one of which would be unused/unneeded.  I.e.
aarch64_simd_clone_adjust would create a distinct type and then
simd_clone_adjust_return_type would create another distinct type,
and the previous one would no longer be used anywhere.

What do you think about moving the call to build_distinct_type_copy
out of simd_clone_adjust_return_type and doing it even for null
types?  Below is what I am thinking about (untested).  I suppose
we could also leave the call to build_distinct_type_copy in 
simd_clone_adjust_return_type but just move it above the check
for a NULL type so that a distinct type is always created there.
That would still require that we change the order of the
targetm.simd_clone.adjust and simd_clone_adjust_return_type
calls as my original patch does.


diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index caa8da3cba5..427d6f6f514 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -498,7 +498,6 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
   /* Adjust the function return type.  */
   if (orig_rettype == void_type_node)
 return NULL_TREE;
-  TREE_TYPE (fndecl) = build_distinct_type_copy (TREE_TYPE (fndecl));
   t = TREE_TYPE (TREE_TYPE (fndecl));
   if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
 veclen = node->simdclone->vecsize_int;
@@ -1164,6 +1163,7 @@ simd_clone_adjust (struct cgraph_node *node)
 {
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
+  TREE_TYPE (node->decl) = build_distinct_type_copy (TREE_TYPE (node->decl));
   targetm.simd_clone.adjust (node);
 
   tree retval = simd_clone_adjust_return_type (node);
@@ -1737,6 +1737,8 @@ expand_simd_clones (struct cgraph_node *node)
simd_clone_adjust (n);
  else
{
+     TREE_TYPE (n->decl)
+   = build_distinct_type_copy (TREE_TYPE (n->decl));
  targetm.simd_clone.adjust (n);
  simd_clone_adjust_return_type (n);
  simd_clone_adjust_argument_types (n);


Steve Ellcey
sell...@marvell.com


[PATCH] Fix simd attribute handling on aarch64

2019-07-17 Thread Steve Ellcey
This patch fixes a bug with SIMD functions on Aarch64.  I found it
while trying to run SPEC with ToT GCC and a glibc that defines vector
math functions for aarch64.  When a function is declared with the simd
attribute GCC creates vector clones of that function with the return
and argument types changed to vector types.  On Aarch64 the vector
clones are also marked with the aarch64_vector_pcs attribute to signify
that they use an alternate calling convention.  Due to a bug in GCC the
non-vector version of the function being cloned was also being marked
with this attribute.

Because simd_clone_adjust and expand_simd_clones are calling
targetm.simd_clone.adjust (which attached the aarch64_vector_pcs
attribute to the function type) before calling
simd_clone_adjust_return_type (which created a new distinct type tree
for the cloned function) the attribute got attached to both the
'normal' scalar version of the SIMD function and any vector versions of
the function.  The attribute should only be on the vector versions.
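As a concrete illustration of the bug: given a declaration like the one
below, only the generated vector clones (for example _ZGVnN2v_foo) should
carry the aarch64_vector_pcs calling convention and the .variant_pcs marker
in the assembly; without the fix, the scalar foo symbol gets marked as
well.  The example itself is purely illustrative.

__attribute__ ((__simd__ ("notinbranch")))
__attribute__ ((__nothrow__ , __leaf__ , __const__))
extern double foo (double x);

void bar (double *f, int n)
{
  int i;
  for (i = 0; i < n; i++)
    f[i] = foo (f[i]);   /* the loop can vectorize into a _ZGVnN2v_foo call */
}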

My fix is to call simd_clone_adjust_return_type and create the new type
before calling targetm.simd_clone.adjust which adds the attribute.  The
only other platform that this patch could affect is x86 because that is
the only other platform to use targetm.simd_clone.adjust.  I did a
bootstrap and gcc test run on x86 (as well as Aarch64) and got no
regressions.

OK to checkin?

Steve Ellcey
sell...@marvell.com


2019-07-17  Steve Ellcey  

* omp-simd-clone.c (simd_clone_adjust):  Call targetm.simd_clone.adjust
after calling simd_clone_adjust_return_type.
(expand_simd_clones): Ditto.


diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index caa8da3cba5..6a6b439d146 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -1164,9 +1164,8 @@ simd_clone_adjust (struct cgraph_node *node)
 {
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
-  targetm.simd_clone.adjust (node);
-
   tree retval = simd_clone_adjust_return_type (node);
+  targetm.simd_clone.adjust (node);
   ipa_parm_adjustment_vec adjustments
 = simd_clone_adjust_argument_types (node);
 
@@ -1737,8 +1736,8 @@ expand_simd_clones (struct cgraph_node *node)
simd_clone_adjust (n);
  else
{
- targetm.simd_clone.adjust (n);
  simd_clone_adjust_return_type (n);
+ targetm.simd_clone.adjust (n);
  simd_clone_adjust_argument_types (n);
}
}




RFC: GCC Aarch64 SIMD vectorization question involving libmvec

2019-06-27 Thread Steve Ellcey
I am testing the latest GCC with not-yet-submitted GLIBC changes that
implement libmvec on Aarch64.

While trying to run SPEC 2017 (specifically 521.wrf_r) I ran into a
case where GCC was generating a call to _ZGVnN2vv_powf, that is a
vectorized powf call for 2 (not 4) elements.  This was a problem
because I only implemented a 4 element 32 bit vectorized powf function
for libmvec and not a 2 element version.

I think this is due to aarch64_simd_clone_compute_vecsize_and_simdlen
which allows for (element count * element size) to be either 64
or 128.

I would like some thoughts on what we should do about this: should
we require glibc/libmvec to provide 2-element 32-bit floating point
vector functions (as well as the 4-element ones), or should we change
aarch64_simd_clone_compute_vecsize_and_simdlen to only allow 4-element
(128 total bit size) vectors and not 2-element (64 total bit size)
ones?
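To make the two alternatives concrete, the declarations involved would look
roughly like this; the 4-lane name just follows the same vector-ABI mangling
as the 2-lane call GCC emitted, so treat both as illustrative rather than as
the final libmvec interface:

#include <arm_neon.h>

/* Illustrative declarations only.  GCC currently emits a call to the
   2-lane (64-bit vector) variant, while the libmvec work only provides
   the 4-lane (128-bit vector) variant.  */
__attribute__ ((aarch64_vector_pcs))
float32x2_t _ZGVnN2vv_powf (float32x2_t x, float32x2_t y);  /* 2 x float */

__attribute__ ((aarch64_vector_pcs))
float32x4_t _ZGVnN4vv_powf (float32x4_t x, float32x4_t y);  /* 4 x float */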

This is obviously a question for the pre-SVE vector instructions,
I am not sure how this would be handled in SVE.

Steve Ellcey
sell...@marvell.com

P.S.  Here is a test case in Fortran that generated the 2-element
  vector call.  It unrolled the loop into one vector call
  of 2 elements and one scalar call.

  SUBROUTINE FOO(B,W,P)
  REAL, DIMENSION (3) :: W, P
  DO 10 I = 1, 3
  P(I) = W(I) ** B
10CONTINUE
  END SUBROUTINE FOO


Re: [PATCH][AArch64] Increase default function alignment

2019-05-31 Thread Steve Ellcey
On Fri, 2019-05-31 at 11:52 +, Wilco Dijkstra wrote:
> With -mcpu=generic the function alignment is currently 8, however almost all
> supported cores prefer 16 or higher, so increase the default to 16:12.
> This gives ~0.2% performance increase on SPECINT2017, while codesize is 0.12%
> larger.
> 
> ChangeLog:
> 2019-05-31  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (generic_tunings): Set function
> alignment to 16.
> 
> --
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0023cb37bbae5afe9387840c1bb6b43586d4fac2..ed1422af6aab5e3c6eeea37ec57e69b64092a0ab 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -693,7 +693,7 @@ static const struct tune_params generic_tunings =
>4, /* memmov_cost  */
>2, /* issue_rate  */
>(AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
> -  "8",   /* function_align.  */
> +  "16:12",   /* function_align.  */
>"4",   /* jump_align.  */
>"8",   /* loop_align.  */
>2, /* int_reassoc_width.  */

I have no objection to the change, but could the commit message and/or
comments be expanded to explain the ':12' part of this value?  I
couldn't find an explanation for it in the code and I don't understand
what it does.

Steve Ellcey
sell...@marvell.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Steve Ellcey
I hope this isn't too much of a thread drift, but I was wondering: after
the Git conversion, will the GCC repo look like a 'normal' Git repo with
the mainline sources on the master branch?

I.e. right now instead of a simple clone, the GCC Wiki says to use a
sequence of git init/config/fetch/checkout commands.  After the
conversion will we be able to just use 'git clone'?  And will the
default master branch be the usable GCC top-of-tree sources (vs the
trunk branch) that we can do checkins to?

Steve Ellcey
sell...@marvell.com


Re: [Patch] [Aarch64] PR rtl-optimization/87763 - this patch fixes gcc.target/aarch64/lsl_asr_sbfiz.c

2019-04-16 Thread Steve Ellcey
Re-ping.  I know there are discussions about bigger changes to fix the
various failures listed in PR rtl-optimization/87763 but this patch
at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).

Steve Ellcey
sell...@marvell.com


On Wed, 2019-04-10 at 15:35 -0700, Steve Ellcey wrote:
> On Wed, 2019-04-10 at 15:03 -0700, Steve Ellcey wrote:
> > Here is another patch to fix one of the failures
> > listed in PR rtl-optimization/87763. This change
> > fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
> > an alternative version of *ashiftsi_extv_bfiz that
> > has a subreg in it.
> > 
> > Tested with bootstrap and regression test run.
> > 
> > OK for checkin?
> > 
> > Steve Ellcey
> > 
> > 
> > 2018-04-10  Steve Ellcey  
> > 
> > PR rtl-optimization/87763
> > * config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
> > New Instruction.
> 
> I forgot I sent this out before (in January).  It generated some
> discussion then but no action.  So I guess this is a ping.
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01694.html
> 
> Steve Ellcey
> sell...@marvell.com


Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-04-11 Thread Steve Ellcey
On Thu, 2019-04-11 at 17:00 +0100, Richard Earnshaw (lists) wrote:
> 
> 
> Please add _alt at the end, to distinguish from the insn above.
> 
> Otherwise OK.

I added _alt.  I also had to move the "0" constraint and the "r"
constraint: I had "0" with operand 3 instead of operand 1, and
that caused a couple of test failures.  I fixed it and the regressions
went away.  I also had to tweak gcc.target/aarch64/combine_bfxil.c;
some of the bfxil instructions became bfi instructions, so I updated
the scan checks.

Steve Ellcey


Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-04-11 Thread Steve Ellcey
On Thu, 2019-04-11 at 14:58 +, Steve Ellcey wrote:
> 
> > You've removed the ..._noshift_alt variant.  That wasn't my
> > intention,
> > so perhaps you misunderstood what I was trying to say.
> > 
> > The two versions are both needed, since the register tie is not
> > orthogonal to the constants used in the masks and canonicalization
> > will
> > not generate a consistent ordering of the operands.
> 
> I started doing this and then convinced myself (perhaps incorrectly)
> that I didn't need the alt version.  Operands 1 and 3 are registers
> and Operands 2 and 4 are constants, so the only difference is in the 
> call to aarch64_masks_and_shift_for_bfi_p.  Given that argument 2 to
> this call is 0 this call should be the equivalent of ((x & MASK1) |
> (y
> & MASK2)) and that should mean that:
> 
> aarch64_masks_and_shift_for_bfi_p(X,0,Y) ==
> aarch64_masks_and_shift_for_bfi_p(Y,0,X)
> 
> 
> Maybe I am wrong about that?  I will do some experiments.  My testing
> did not find any cases in the testsuite where not having the _alt
> version resulted in a bfi not being generated.

OK, I think I see where my idea that I didn't need both versions is
wrong.  There is an extra check on the final argument that is not done
on the initial argument.  Here is the patch that I am testing; I think
it is what you have in mind.  It looks weird having arguments 3 and 4
before 1 and 2 (I think that is why I had it differently in my original
patch), but if you prefer this version, that is fine with me.

OK to check in once my build/test is finished?



2018-04-11  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64.md (*aarch64_bfi4_noshift):
New Instruction.


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e0df975..eac688a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5565,7 +5565,8 @@
 )
 
 ;; Like *aarch64_bfi<GPI:mode>5_shift but with no shifting, we are just
-;; copying the least significant bits of OP3 to OP0.
+;; copying the least significant bits of OP3 to OP0.  We need two versions
+;; of the instruction to handle different checks on the constant values.
 
 (define_insn "*aarch64_bfi4_noshift"
   [(set (match_operand:GPI 0 "register_operand" "=r")
@@ -5579,6 +5580,18 @@
   [(set_attr "type" "bfm")]
 )
 
+(define_insn "*aarch64_bfi4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ior:GPI (and:GPI (match_operand:GPI 3 "register_operand" "0")
+  (match_operand:GPI 4 "const_int_operand" "n"))
+ (and:GPI (match_operand:GPI 1 "register_operand" "r")
+  (match_operand:GPI 2 "const_int_operand" "n"]
+  "aarch64_masks_and_shift_for_bfi_p (mode, UINTVAL (operands[2]), 0,
+ UINTVAL (operands[4]))"
+  "bfi\t%0, %3, 0, %P4"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
  (match_operand 1 "const_int_operand" "n")



Re: [EXT] Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-04-11 Thread Steve Ellcey
On Thu, 2019-04-11 at 09:59 +0100, Richard Earnshaw (lists) wrote:
> 
> > 
> > 2018-04-10  Steve Ellcey  
> > 
> > PR rtl-optimization/87763
> > * config/aarch64/aarch64-protos.h
> > (aarch64_masks_and_shift_for_bfi_p):
> > New prototype.
> > * config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
> > New function.
> > * config/aarch64/aarch64.md (*aarch64_bfi5_shift):
> > New instruction.
> > (*aarch64_bfi5_shift_alt): Ditto.
> > (*aarch64_bfi4_noand): Ditto.
> > (*aarch64_bfi4_noand_alt): Ditto.
> > (*aarch64_bfi4_noshift): Ditto.
> > 
> 
> You've removed the ..._noshift_alt variant.  That wasn't my intention,
> so perhaps you misunderstood what I was trying to say.
>
> The two versions are both needed, since the register tie is not
> orthogonal to the constants used in the masks and canonicalization will
> not generate a consistent ordering of the operands.

I started doing this and then convinced myself (perhaps incorrectly)
that I didn't need the alt version.  Operands 1 and 3 are registers
and Operands 2 and 4 are constants, so the only difference is in the 
call to aarch64_masks_and_shift_for_bfi_p.  Given that argument 2 to
this call is 0, the call should be the equivalent of
((x & MASK1) | (y & MASK2)) and that should mean that:

aarch64_masks_and_shift_for_bfi_p(X,0,Y) ==
aarch64_masks_and_shift_for_bfi_p(Y,0,X)


Maybe I am wrong about that?  I will do some experiments.  My testing
did not find any cases in the testsuite where not having the _alt
version resulted in a bfi not being generated.

Steve Ellcey
sell...@marvell.com
> > 


Re: [Patch] [Aarch64] PR rtl-optimization/87763 - this patch fixes gcc.target/aarch64/lsl_asr_sbfiz.c

2019-04-10 Thread Steve Ellcey
On Wed, 2019-04-10 at 15:03 -0700, Steve Ellcey wrote:
> Here is another patch to fix one of the failures
> listed in PR rtl-optimization/87763. This change
> fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
> an alternative version of *ashiftsi_extv_bfiz that
> has a subreg in it.
> 
> Tested with bootstrap and regression test run.
> 
> OK for checkin?
> 
> Steve Ellcey
> 
> 
> 2018-04-10  Steve Ellcey  
> 
>   PR rtl-optimization/87763
>   * config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
>   New Instruction.

I forgot I sent this out before (in January).  It generated some
discussion then but no action.  So I guess this is a ping.

https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01694.html

Steve Ellcey
sell...@marvell.com


[Patch] [Aarch64] PR rtl-optimization/87763 - this patch fixes gcc.target/aarch64/lsl_asr_sbfiz.c

2019-04-10 Thread Steve Ellcey

Here is another patch to fix one of the failures
listed in PR rtl-optimization/87763. This change
fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
an alternative version of *ashiftsi_extv_bfiz that
has a subreg in it.
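
For reference, the kind of source this pattern is about looks like the
following (a hypothetical example modelled on lsl_asr_sbfiz.c, not
copied from it): a left shift followed by a smaller arithmetic right
shift sign-extracts the low bits and repositions them, which should end
up as a single sbfiz, e.g. "sbfiz w0, w0, 19, 3" for the function below.

int
lsl_then_asr (int x)
{
  return (x << 29) >> 10;
}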

Tested with bootstrap and regression test run.

OK for checkin?

Steve Ellcey


2018-04-10  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
New Instruction.


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e0df975..04dc06f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5634,6 +5634,22 @@
   [(set_attr "type" "bfx")]
 )
 
+(define_insn "*ashiftsi_extv_bfiz_alt"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (ashift:SI
+ (subreg:SI
+   (sign_extract:DI
+ (subreg:DI (match_operand:SI 1 "register_operand" "r") 0)
+ (match_operand 2 "aarch64_simd_shift_imm_offset_si" "n")
+ (const_int 0))
+   0)
+ (match_operand 3 "aarch64_simd_shift_imm_si" "n")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+1, GET_MODE_BITSIZE (SImode) - 1)"
+  "sbfiz\\t%w0, %w1, %3, %2"
+  [(set_attr "type" "bfx")]
+)
+
 ;; When the bit position and width of the equivalent extraction add up to 32
 ;; we can use a W-reg LSL instruction taking advantage of the implicit
 ;; zero-extension of the X-reg.


Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-04-10 Thread Steve Ellcey
On Wed, 2019-04-10 at 11:10 +0100, Richard Earnshaw (lists) wrote:
> 
> OK with those changes.
> 
> R.

I made the changes you suggested and checked in the patch.  Just to be
complete, here is the final version of the patch that I checked in.

2018-04-10  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
New prototype.
* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
New function.
* config/aarch64/aarch64.md (*aarch64_bfi5_shift):
New instruction.
(*aarch64_bfi5_shift_alt): Ditto.
(*aarch64_bfi4_noand): Ditto.
(*aarch64_bfi4_noand_alt): Ditto.
(*aarch64_bfi4_noshift): Ditto.

2018-04-10  Steve Ellcey  

PR rtl-optimization/87763
* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
to bfi.
* gcc.target/aarch64/combine_bfi_2.c: New test.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35..b6c0d0a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 95e5b03..9be7548 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9336,6 +9336,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	 & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *aarch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+   unsigned HOST_WIDE_INT mask1,
+   unsigned HOST_WIDE_INT shft_amnt,
+   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if (mask1 != ~mask2)
+return false;
+
+  /* Verify that mask2 is not all zeros or ones.  */
+  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
+return false;
+
+  /* The shift amount should always be less than the mode size.  */
+  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+ least significant bits after shifting by shft_amnt.  */
+  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
+  return (t == (t & -t));
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ab8786a..e0df975 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5491,6 +5491,94 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never triggered
+;;  in a bootstrap build and test of GCC so it was not included.
+
+(define_insn "*aarch64_bfi5_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+  (match_operand:GPI 2 "const_int_operand" "n"))
+ (and:GPI (ashift:GPI
+   (match_operand:GPI 3 "register_operand" "r")
+   (match_operand:GPI 4 "aarch64_simd_shift_imm_" "n"))
+  (match_operand:GPI 5 "const_int_operand" "n"]
+  "aarch64_masks_and_shift_for_bfi_p (mode, UINTVAL (operands[2]),
+  UINTVAL (operands[4]),
+  UINTVAL(operands[5]))"
+  "bfi\t%0, %3, %4, %P5"
+  [(set_attr "type" "bfm")]
+)
+
+(define_insn "*aarch64_bfi5_shift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ior:GPI (and:GPI (ashift:GPI
+   (match_operand:GPI 1 "register_

Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-04-01 Thread Steve Ellcey
This is a ping**3 for a patch to fix one of the test failures in PR 87763.
It fixes the gcc.target/aarch64/combine_bfi_1.c failure, but not the other
ones.

Could one of the Aarch64 maintainers take a look at it?  This version of
the patch was originally submitted on February 11 after incorporating some
changes suggested by Wilco Dijkstra but I have not gotten any comments
since then despite pinging it.  I updated it to apply cleanly on ToT but
did not make any other changes since the February 11th version.

If we want to encourage people to fix bugs in the run up to a release it
would help to get feedback on bugfix patches.

Steve Ellcey
sell...@marvell.com


2018-04-01  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
New prototype.
* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
New function.
* config/aarch64/aarch64.md (*aarch64_bfi5_shift):
New instruction.
(*aarch64_bfi4_noand): Ditto.
(*aarch64_bfi4_noshift): Ditto.
(*aarch64_bfi4_noshift_alt): Ditto.


2018-04-01  Steve Ellcey  

PR rtl-optimization/87763
* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
to bfi.
* gcc.target/aarch64/combine_bfi_2.c: New test.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35..b6c0d0a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b38505b..3017e99 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9336,6 +9336,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	 & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *aarch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+   unsigned HOST_WIDE_INT mask1,
+   unsigned HOST_WIDE_INT shft_amnt,
+   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if (mask1 != ~mask2)
+return false;
+
+  /* Verify that mask2 is not all zeros or ones.  */
+  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
+return false;
+
+  /* The shift amount should always be less than the mode size.  */
+  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+ least significant bits after shifting by shft_amnt.  */
+  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
+  return (t == (t & -t));
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 70f0418..baa8983 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5490,6 +5490,76 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never triggered
+;;  in a bootstrap build and test of GCC so it was not included.
+
+(define_insn "*aarch64_bfi5_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+  (match_operand:GPI 2 "const_int_operand" "n"))
+ (and:GPI (ashift:GPI
+   (match_operand:GPI 3 "register_operand" "r")
+   (match_operand:GPI 4 "aarch64_simd_shift_imm_" "n"))
+  (match_operand:GPI 5 "const_int_operand" "n"]
+  "aarch64_masks_and_shift_for_bfi_p (mode, UINTVAL (operands[2]),
+  UINTVAL (opera

Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-03-14 Thread Steve Ellcey
Double ping.

Steve Ellcey
sell...@marvell.com

On Tue, 2019-02-26 at 08:44 -0800, Steve Ellcey wrote:
> Ping.
> 
> Steve Ellcey
> sell...@marvell.com
> 
> 
> On Mon, 2019-02-11 at 10:46 -0800, Steve Ellcey wrote:
> > On Thu, 2019-02-07 at 18:13 +, Wilco Dijkstra wrote
> > > 
> > > Hi Steve,
> > > 
> > > > > After special cases you could do something like t = mask2 +
> > > > > (HWI_1U << shift);
> > > > > return t == (t & -t) to check for a valid bfi.
> > > > 
> > > > I am not sure I follow this logic and my attempts to use this
> > > > did
> > > > not
> > > > work so I kept my original code.
> > > 
> > > It's similar to the initial code in aarch64_bitmask_imm, but
> > > rather
> > > than adding
> > > the lowest bit to the value to verify it is a mask (val + (val &
> > > -val)), we use the
> > > shift instead. If the shift is exactly right, it reaches the
> > > first
> > > set bit of the mask.
> > > Adding the low bit to a valid mask always results in zero or a
> > > single set bit.
> > > The standard idiom to check that is t == (t & -t).
> > > 
> > > > > +  "bfi\t%0, %1, 0, %P2"
> > > > > 
> > > > > This could emit a width that may be 32 too large in SImode if
> > > > > bit 31 is set
> > > > > (there is another use of %P in aarch64.md which may have the
> > > > > same
> > > > > issue).
> > > > 
> > > > I am not sure why having bit 31 set would be a problem.  Sign
> > > > extension?
> > > 
> > > Yes, if bit 31 is set, %P will emit 33 for a 32-bit constant which
> > > is obviously wrong.
> > > Your patch avoids this for bfi by explicitly computing the correct
> > > value.
> > > 
> > > This looks good to me (and creates useful bfi's as expected), but I
> > > can't approve.
> > > 
> > > Wilco
> > 
> > Thanks for looking this over.  I have updated the mask check to use
> > your method and retested to make sure it still works.  Can one of the
> > aarch64 maintainers approve this patch?
> > 
> > 2018-02-11  Steve Ellcey  
> > 
> > PR rtl-optimization/87763
> > * config/aarch64/aarch64-protos.h
> > (aarch64_masks_and_shift_for_bfi_p):
> > New prototype.
> > * config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
> > New function.
> > * config/aarch64/aarch64.md (*aarch64_bfi5_shift):
> > New instruction.
> > (*aarch64_bfi4_noand): Ditto.
> > (*aarch64_bfi4_noshift): Ditto.
> > (*aarch64_bfi4_noshift_alt): Ditto.
> > 
> > 2018-02-11  Steve Ellcey  
> > 
> > PR rtl-optimization/87763
> > * gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
> > to bfi.
> > * gcc.target/aarch64/combine_bfi_2.c: New test.
> > 
> > 


Re: RFC: Patch to allow use of DECL_NAME in libmvec calls instead of DECL_ASSEMBLER_NAME

2019-03-13 Thread Steve Ellcey
On Wed, 2019-03-13 at 14:38 +, Joseph Myers wrote:
> 
> On Wed, 13 Mar 2019, Jakub Jelinek wrote:
> 
> > If the finite only doesn't buy anything, then another option is to drop the
> > math-finite.h stuff or portions thereof.
> 
> Well, finite-only entry points avoid wrappers for the scalar functions - 
> it's just that suitable optimized implementations could avoid the wrappers 
> in all cases without needing a separate finite-only variant.  It's not 
> clear that adding wrappers in this case for scalar functions to avoid them 
> for vector functions is a good idea.  And regardless of the merits of a 
> particular set of entry points, I think requiring the same set of variants 
> for both vector and scalar functions is flawed; the headers should be able 
> to declare a scalar variant to be used under certain conditions without 
> requiring a corresponding vector variant, or of course the other way 
> round.

If some targets don't need _finite variants it might be useful to tell
the compiler not to generate them.  For example, on aarch64 we know
that __exp_finite and __expf_finite are just aliases of exp and expf,
so if we could turn off the math-finite.h functionality for those
functions on this target, GCC would also not generate calls to the
vector versions of exp_finite and expf_finite.  This doesn't directly
address the issue of the scalar and vector variants being independent
of each other but it does make the problem moot in this specific case.

Steve Ellcey
sell...@marvell.com


Re: RFC: Patch to allow use of DECL_NAME in libmvec calls instead of DECL_ASSEMBLER_NAME

2019-03-12 Thread Steve Ellcey
On Tue, 2019-03-12 at 14:17 +, Joseph Myers wrote:
> 
> On Tue, 12 Mar 2019, Richard Biener wrote:
> 
> > It shouldn't be difficult to provide an alias from the glibc side, no?  
> > How does x86 avoid this issue?
> 
> There are static-only wrappers in libmvec_nonshared.a to work around the 
> GCC limitation (not included in the shared library, so less efficient than 
> direct calls to the vectorized functions, because it ought not be 
> necessary to have multiple symbols for the same function exported from the 
> shared library for this purpose).
> 
> The issue is as I said in 
>  - vector and scalar 
> versions of functions should not need to be in one-to-one correspondence.  
> For example, you could add __attribute__ ((__vector_asm__ ("name"))) to 
> set the version of a function's name to be used as the basis for forming 
> vector function names, overriding the use of a name specified with asm 
> ("name") for that purpose.

I like this idea.  I have prototyped something: I created 'vector_asm'
as an aarch64 attribute because I couldn't find where to put global
attributes like __simd__.  Does anyone know where these are listed?
I left off the leading/trailing underscores because GCC didn't like
them in a target attribute.

I then updated my sysdeps/aarch64/fpu/bits/math-vector.h file in glibc
and things seemed to work fine.  Here are my changes, what do people
think about changing GCC to do this?

The GCC changes I made are:


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b38505b..45fde16 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1180,8 +1180,9 @@ static const struct attribute_spec 
aarch64_attribute_table[] =
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
affects_type_identity, handler, exclude } */
-  { "aarch64_vector_pcs", 0, 0, false, true,  true,  true,  NULL, NULL },
-  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+  { "aarch64_vector_pcs", 0,  0, false, true,  true,  true,  NULL, NULL },
+  { "vector_asm", 1,  1, true,  false, false, false, NULL, NULL },
+  { NULL, 0,  0, false, false, false, false, NULL, NULL }
 };
 
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index 388198b..59183f0 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -342,6 +342,8 @@ simd_clone_mangle (struct cgraph_node *node,
   unsigned int simdlen = clone_info->simdlen;
   unsigned int n;
   pretty_printer pp;
+  tree attr;
+  const char *str;
 
   gcc_assert (vecsize_mangle && simdlen);
 
@@ -409,7 +411,12 @@ simd_clone_mangle (struct cgraph_node *node,
 }
 
   pp_underscore (&pp);
-  const char *str = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+  attr = lookup_attribute ("vector_asm", DECL_ATTRIBUTES (node->decl));
+  if (attr)
+str = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr)));
+  else
+str = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+
   if (*str == '*')
 ++str;
   pp_string (&pp, str);



The new sysdeps/aarch64/fpu/bits/math-vector.h was modified to include:

# ifdef __DECL_SIMD_AARCH64
#  undef __DECL_SIMD_exp
#  define __DECL_SIMD_exp __DECL_SIMD_AARCH64 __attribute__ ((vector_asm ("exp")))
#  undef __DECL_SIMD_expf
#  define __DECL_SIMD_expf __DECL_SIMD_AARCH64 __attribute__ ((vector_asm ("expf")))

# endif
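
To illustrate the intended effect, here is a sketch of what a
declaration using the prototype attribute would look like (vector_asm
is only the experiment above, not a committed GCC feature, and the
exact glibc declaration differs in detail):

/* The scalar call still resolves to __expf_finite via the asm name,
   but the simd clone is mangled from "expf", so a vectorized loop
   calls _ZGVnN4v_expf instead of _ZGVnN4v___expf_finite.  */
__attribute__ ((__simd__ ("notinbranch")))
__attribute__ ((vector_asm ("expf")))
extern float expf (float __x) __asm__ ("__expf_finite");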





RFC: Patch to allow use of DECL_NAME in libmvec calls instead of DECL_ASSEMBLER_NAME

2019-03-11 Thread Steve Ellcey
This is a proposed GCC patch that allows targets to modify the names of
the libmvec routines that get called.  Currently, if you build ToT GCC
on Aarch64 and include this glibc patch:

https://sourceware.org/ml/libc-alpha/2019-03/msg00106.html

If you then compile a call to expf that gets vectorized, GCC will
generate a libmvec call to '_ZGVnN4v___expf_finite' instead of
'_ZGVnN4v_expf', because the libmvec name is based on the assembly name
of the scalar function (__expf_finite) and not the 'real' name (expf).
This means that libmvec needs to provide both names, even though the
routines don't differ.
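
To make the scenario concrete, here is a minimal sketch of the kind of
code involved (the symbol names are the ones above; -Ofast and the simd
declaration coming from the glibc headers are assumed):

#include <math.h>

/* This loop vectorizes, and the vectorizer emits a libmvec call for
   expf: today that call is to _ZGVnN4v___expf_finite, because the
   mangling starts from the scalar assembler name __expf_finite.  */
void
apply_expf (float *restrict out, const float *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = expf (in[i]);
}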

Rather than create both names I would like to make it possible for
GCC to generate calls to libmvec based on the real name by having
a target specific function that allows GCC to use the DECL_NAME instead
of DECL_ASSEMBLER_NAME to create the libmvec name.

The responses to my glibc patch (referenced above) have a pointer
to where this was discussed on the GCC mailing list a couple of years
ago:

https://gcc.gnu.org/ml/gcc/2015-06/msg00173.html

and which has a pointer back to an older glibc thread as well:

https://sourceware.org/ml/libc-alpha/2015-06/msg00213.html

Any thoughts on this patch as a way of 'fixing' GCC to not use the
finite alias names?

Steve Ellcey
sell...@marvell.com


2018-03-11  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_simd_clone_vec_base_name):
New function.
(TARGET_SIMD_CLONE_VEC_BASE_NAME): New macro.
* doc/tm.texi.in (TARGET_SIMD_CLONE_VEC_BASE_NAME): New hook.
* doc/tm.texi: Regenerate.
* omp-simd-clone.c (simd_clone_mangle): Call vec_base_name hook.
* target.def (vec_base_name): New hook.
* targhooks.c (cgraph.h): New include.
(default_vec_base_name): New function.
* targhooks.h (default_vec_base_name): New function declaration.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 252bed7..cddab80 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18711,6 +18711,14 @@ aarch64_simd_clone_usable (struct cgraph_node *node)
 }
 }
 
+/* Implement TARGET_SIMD_CLONE_VEC_BASE_NAME */
+
+static const char *
+aarch64_simd_clone_vec_base_name (struct cgraph_node *node)
+{
+  return IDENTIFIER_POINTER (DECL_NAME (node->decl));
+}
+
 /* Implement TARGET_COMP_TYPE_ATTRIBUTES */
 
 static int
@@ -19251,6 +19259,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SIMD_CLONE_USABLE
 #define TARGET_SIMD_CLONE_USABLE aarch64_simd_clone_usable
 
+#undef TARGET_SIMD_CLONE_VEC_BASE_NAME
+#define TARGET_SIMD_CLONE_VEC_BASE_NAME aarch64_simd_clone_vec_base_name
+
 #undef TARGET_COMP_TYPE_ATTRIBUTES
 #define TARGET_COMP_TYPE_ATTRIBUTES aarch64_comp_type_attributes
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index fe1194e..de4bdb42 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4196,6 +4196,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_SIMD_CLONE_USABLE
 
+@hook TARGET_SIMD_CLONE_VEC_BASE_NAME
+
 @hook TARGET_SIMT_VF
 
 @hook TARGET_GOACC_VALIDATE_DIMS
diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index 388198b..b3a57aa 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -409,7 +409,7 @@ simd_clone_mangle (struct cgraph_node *node,
 }
 
   pp_underscore (&pp);
-  const char *str = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+  const char *str = targetm.simd_clone.vec_base_name (node);
   if (*str == '*')
 ++str;
   pp_string (&pp, str);
diff --git a/gcc/target.def b/gcc/target.def
index 66cee07..da60249 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1655,6 +1655,13 @@ usable.  In that case, the smaller the number is, the more desirable it is\n\
 to use it.",
 int, (struct cgraph_node *), NULL)
 
+DEFHOOK
+(vec_base_name,
+"This hook should return the name of the scalar function being cloned.\n\
+This defaults to DECL_ASSEMBLER_NAME, but targets could use DECL_NAME\n\
+instead or some other variation of the function name.",
+const char *, (struct cgraph_node *), default_vec_base_name)
+
 HOOK_VECTOR_END (simd_clone)
 
 /* Functions relating to OpenMP SIMT vectorization transform.  */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 318f7e9..6792ee5 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -83,6 +83,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "real.h"
 #include "langhooks.h"
 #include "sbitmap.h"
+#include "cgraph.h"
 
 bool
 default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
@@ -2379,4 +2380,10 @@ default_remove_extra_call_preserved_regs (rtx_insn *, HARD_REG_SET *)
 {
 }
 
+const char *
+default_vec_base_name (struct cgraph_node * node)
+{
+  return IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl));
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5943627..85dce1a 100

Re: [Patch, aarch64] PR 89628 - fix register allocation in SIMD functions

2019-03-11 Thread Steve Ellcey
Richard,

I don't necessarily disagree with anything in your comments and long
term I think that is the right direction, but I wonder if that level of
change is appropriate for GCC Stage 4 which is where we are now.  Your
changes would require fixes in shared code, whereas setting
REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.
I am not sure how long it would take me to implement something along
the lines of what you are suggesting.

Steve Ellcey
sell...@marvell.com


On Sat, 2019-03-09 at 08:03 +, Richard Sandiford wrote:

> Steve Ellcey  writes:
> > This is a patch to fix the register allocation in SIMD functions.  In
> > normal functions registers V16 to V31 are all caller saved.  In SIMD
> > functions V16 to V23 are callee saved and V24 to V31 are caller saved.
> > This means that SIMD functions should use V24 to V31 before V16 to V23
> > in order to avoid some saves and restores.
> > 
> > My fix for this is to define REG_ALLOC_ORDER.  Right now it is not
> > defined on Aarch64, so I just defined it as all the registers in order
> > except for putting V24 to V31 before V16 to V23.  This fixes the
> > register allocation in SIMD functions.  It also changes the register
> > allocation order in regular functions but since all the registers (V16
> > to V31) are caller saved in that case, it doesn't matter.  We could use
> > ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see
> > any reason to do that.
> 
> REG_ALLOC_ORDER shouldn't really be needed for testcases like the ones
> in the PR.  Like you say, we don't currently need it to handle the
> equivalent situation for the standard ABI.
> 
> I think the problem is that the RA is still using the register sets
> for the default ABI when evaluating the prologue/epilogue cost of using
> a hard register.  E.g. see calculate_saved_nregs.
> 
> Maybe one fix would be to add an equivalent of call_used_reg_set to
> rtl_data.  By default it would be the same as call_used_reg_set,
> but the target could have an opportunity to change it.  Then code like
> calculate_saved_nregs should use the new set to find out what registers
> the current function can use without spilling.
> 
> This would also be useful for targets that implement interrupt handler
> attributes.
> 
> It would be good to add the testcase in the PR to the testsuite,
> with a scan-assembler to check for spills.
> 
> > diff --git a/gcc/config/aarch64/aarch64.h
> > b/gcc/config/aarch64/aarch64.h
> > index 7bd3bf5..d3723ff 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version;
> > ZR  zero register, encoded as X/R31 elsewhere
> >  
> > 32 x 128-bit floating-point/vector registers
> > -   V16-V31 Caller-saved (temporary) registers
> > +   V24-V31 Caller-saved (temporary) registers
> > +   V16-V23 Caller-saved (temporary) registers in most functions;
> > +   Callee-saved in SIMD functions.
> > V8-V15  Callee-saved registers
> > V0-V7   Parameter/result registers
> 
> Probably better as s/SIMD/vector PCS/.  The hunk above is OK with
> that
> change, independently of the rest.
> 
> Thanks,
> Richard


[Patch, aarch64] PR 89628 - fix register allocation in SIMD functions

2019-03-08 Thread Steve Ellcey
This is a patch to fix the register allocation in SIMD functions.  In
normal functions registers V16 to V31 are all caller saved.  In SIMD
functions V16 to V23 are callee saved and V24 to V31 are caller saved.
This means that SIMD functions should use V24 to V31 before V16 to V23
in order to avoid some saves and restores.

My fix for this is to define REG_ALLOC_ORDER.  Right now it is not
defined on Aarch64, so I just defined it as all the registers in order
except for putting V24 to V31 before V16 to V23.  This fixes the
register allocation in SIMD functions.  It also changes the register
allocation order in regular functions but since all the registers (V16
to V31) are caller saved in that case, it doesn't matter.  We could use
ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see
any reason to do that.

Tested on ThunderX2 Aarch64 with no regressions.

OK to checkin?


2018-03-08  Steve Ellcey  

PR target/89628
* config/aarch64/aarch64.h (V16-V31): Update comment.
(REG_ALLOC_ORDER): New macro.


diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 7bd3bf5..d3723ff 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version;
ZR		zero register, encoded as X/R31 elsewhere
 
32 x 128-bit floating-point/vector registers
-   V16-V31	Caller-saved (temporary) registers
+   V24-V31	Caller-saved (temporary) registers
+   V16-V23	Caller-saved (temporary) registers in most functions;
+		Callee-saved in SIMD functions.
V8-V15	Callee-saved registers
V0-V7	Parameter/result registers
 
@@ -431,6 +433,26 @@ extern unsigned aarch64_architecture_version;
 V_ALIASES(28), V_ALIASES(29), V_ALIASES(30), V_ALIASES(31)  \
   }
 
+/* This is the default order for everything except V16-V23.  These
+   are caller saved in most functions but callee saved in SIMD
+   functions so we want to use V24-V31 which are always
+   caller saved before using these registers. */
+
+#define REG_ALLOC_ORDER		\
+  {\
+ 0,  1,  2,  3,    4,  5,  6,  7,	/* R0 - R7 */		\
+ 8,  9, 10, 11,   12, 13, 14, 15,	/* R8 - R15 */		\
+16, 17, 18, 19,   20, 21, 22, 23,	/* R16 - R23 */		\
+24, 25, 26, 27,   28, 29, 30, 31,	/* R24 - R30, SP */	\
+32, 33, 34, 35,   36, 37, 38, 39,	/* V0 - V7 */		\
+40, 41, 42, 43,   44, 45, 46, 47,	/* V8 - V15 */		\
+56, 57, 58, 59,   60, 61, 62, 63,	/* V24 - V31 */		\
+48, 49, 50, 51,   52, 53, 54, 55,	/* V16 - V23 */		\
+64, 65, 66, 67,			/* SFP, AP, CC, VG */	\
+68, 69, 70, 71,   72, 73, 74, 75,	/* P0 - P7 */		\
+76, 77, 78, 79,   80, 81, 82, 83,	/* P8 - P15 */		\
+  }
+
 #define EPILOGUE_USES(REGNO) (aarch64_epilogue_uses (REGNO))
 
 /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,


Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-02-26 Thread Steve Ellcey
Ping.

Steve Ellcey
sell...@marvell.com


On Mon, 2019-02-11 at 10:46 -0800, Steve Ellcey wrote:
> On Thu, 2019-02-07 at 18:13 +, Wilco Dijkstra wrote
> > 
> > Hi Steve,
> > 
> > > > After special cases you could do something like t = mask2 +
> > > > (HWI_1U << shift);
> > > > return t == (t & -t) to check for a valid bfi.
> > > 
> > > I am not sure I follow this logic and my attempts to use this did
> > > not
> > > work so I kept my original code.
> > 
> > It's similar to the initial code in aarch64_bitmask_imm, but rather
> > than adding
> > the lowest bit to the value to verify it is a mask (val + (val &
> > -val)), we use the
> > shift instead. If the shift is exactly right, it reaches the first
> > set bit of the mask.
> > Adding the low bit to a valid mask always results in zero or a
> > single set bit.
> > The standard idiom to check that is t == (t & -t).
> > 
> > > > +  "bfi\t%0, %1, 0, %P2"
> > > > 
> > > > This could emit a width that may be 32 too large in SImode if
> > > > bit 31 is set
> > > > (there is another use of %P in aarch64.md which may have the
> > > > same
> > > > issue).
> > > 
> > > I am not sure why having bit 31 set would be a problem.  Sign
> > > extension?
> > 
> > Yes, if bit 31 is set, %P will emit 33 for a 32-bit constant which
> > is obviously wrong.
> > Your patch avoids this for bfi by explicitly computing the correct
> > value.
> > 
> > This looks good to me (and creates useful bfi's as expected), but I
> > can't approve.
> > 
> > Wilco
> 
> Thanks for looking this over.  I have updated the mask check to use
> your method and retested to make sure it still works.  Can one of the
> aarch64 maintainers approve this patch?
> 
> 2018-02-11  Steve Ellcey  
> 
>   PR rtl-optimization/87763
>   * config/aarch64/aarch64-protos.h
> (aarch64_masks_and_shift_for_bfi_p):
>   New prototype.
>   * config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
>   New function.
>   * config/aarch64/aarch64.md (*aarch64_bfi5_shift):
>   New instruction.
>   (*aarch64_bfi4_noand): Ditto.
>   (*aarch64_bfi4_noshift): Ditto.
>   (*aarch64_bfi4_noshift_alt): Ditto.
> 
> 2018-02-11  Steve Ellcey  
> 
>   PR rtl-optimization/87763
>   * gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
>   to bfi.
>   * gcc.target/aarch64/combine_bfi_2.c: New test.
> 
> 


Re: [EXT] Re: [Patch, Aarch64] Implement TARGET_GET_MULTILIB_ABI_NAME for aarch64 (for use in Fortran vector header file)

2019-02-26 Thread Steve Ellcey
On Tue, 2019-02-26 at 10:17 +, Richard Sandiford wrote:
> 
> I'm torn about about whether we should proactively add the ILP32 and
> big-endian conditions now or wait until there's a specific need.
> Unless anyone strongly objects, let's keep them for now.
> 
> Thanks,
> Richard

OK, here is a new patch with the ILP32 and big-endian conditions still
in place but the sve stuff removed.  Retested on aarch64 with no
regressions.  OK for checkin?

Steve Ellcey
sell...@marvell.com



2018-02-26  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_get_multilib_abi_name):
New function.
(TARGET_GET_MULTILIB_ABI_NAME): New macro.


2018-02-26  Steve Ellcey  

* gfortran.dg/simd-builtins-1.f90: Update for aarch64*-*-*.
* gfortran.dg/simd-builtins-2.f90: Ditto.
* gfortran.dg/simd-builtins-6.f90: Ditto.
* gfortran.dg/simd-builtins-8.f90: New test.
* gfortran.dg/simd-builtins-8.h: New header file.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 91e79d3..b8125d5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18722,6 +18722,17 @@ aarch64_comp_type_attributes (const_tree type1, const_tree type2)
   return 1;
 }
 
+/* Implement TARGET_GET_MULTILIB_ABI_NAME */
+
+static const char *
+aarch64_get_multilib_abi_name (void)
+{
+  if (TARGET_BIG_END)
+return TARGET_ILP32 ? "aarch64_be_ilp32" : "aarch64_be";
+  return TARGET_ILP32 ? "aarch64_ilp32" : "aarch64";
+}
+
+
 /* Implement TARGET_STACK_PROTECT_GUARD. In case of a
global variable based guard use the default else
return a null tree.  */
@@ -19244,6 +19255,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_COMP_TYPE_ATTRIBUTES
 #define TARGET_COMP_TYPE_ATTRIBUTES aarch64_comp_type_attributes
 
+#undef TARGET_GET_MULTILIB_ABI_NAME
+#define TARGET_GET_MULTILIB_ABI_NAME aarch64_get_multilib_abi_name
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/testsuite/gfortran.dg/simd-builtins-1.f90 b/gcc/testsuite/gfortran.dg/simd-builtins-1.f90
index 6d79ef8..5cb3eb5 100644
--- a/gcc/testsuite/gfortran.dg/simd-builtins-1.f90
+++ b/gcc/testsuite/gfortran.dg/simd-builtins-1.f90
@@ -1,5 +1,6 @@
-! { dg-do compile { target { i?86-*-linux* x86_64-*-linux* } } }
-! { dg-additional-options "-msse2 -mno-avx -nostdinc -Ofast -fpre-include=simd-builtins-1.h -fdump-tree-optimized" }
+! { dg-do compile { target i?86-*-linux* x86_64-*-linux* aarch64*-*-linux* } }
+! { dg-additional-options "-nostdinc -Ofast -fpre-include=simd-builtins-1.h -fdump-tree-optimized" }
+! { dg-additional-options "-msse2 -mno-avx" { target i?86-*-linux* x86_64-*-linux* } }
 
 program test_overloaded_intrinsic
   real(4) :: x4(3200), y4(3200)
@@ -14,6 +15,7 @@ program test_overloaded_intrinsic
   print *, y8
 end
 
-! { dg-final { scan-tree-dump "sinf.simdclone" "optimized" } } */
-! { dg-final { scan-tree-dump "__builtin_sin" "optimized" } } */
-! { dg-final { scan-assembler "call.*_ZGVbN4v_sinf" } }
+! { dg-final { scan-tree-dump "sinf.simdclone" "optimized" } }
+! { dg-final { scan-tree-dump "__builtin_sin" "optimized" } }
+! { dg-final { scan-assembler "call.*_ZGVbN4v_sinf" { target i?86-*-linux* x86_64-*-* } } }
+! { dg-final { scan-assembler "bl.*_ZGVnN4v_sinf" { target aarch64*-*-* } } }
diff --git a/gcc/testsuite/gfortran.dg/simd-builtins-2.f90 b/gcc/testsuite/gfortran.dg/simd-builtins-2.f90
index f0e6bc1..2e5bc22 100644
--- a/gcc/testsuite/gfortran.dg/simd-builtins-2.f90
+++ b/gcc/testsuite/gfortran.dg/simd-builtins-2.f90
@@ -1,11 +1,12 @@
-! { dg-do compile { target { i?86-*-linux* x86_64-*-linux* } } }
-! { dg-additional-options "-msse2 -nostdinc -Ofast -fdump-tree-optimized" }
+! { dg-do compile { target { i?86-*-linux* x86_64-*-linux* aarch64*-*-linux* } } }
+! { dg-additional-options "-nostdinc -Ofast -fdump-tree-optimized" }
+! { dg-additional-options "-msse2" { target i?86-*-linux* x86_64-*-linux* } }
 
 program test_overloaded_intrinsic
   real(4) :: x4(3200), y4(3200)
   real(8) :: x8(3200), y8(3200)
 
-  ! this should be using simd clone
+  ! this should not be using simd clone
   y4 = sin(x4)
   print *, y4
 
@@ -18,3 +19,4 @@ end
 ! { dg-final { scan-tree-dump "__builtin_sin" "optimized" } } */
 ! { dg-final { scan-tree-dump-not "simdclone" "optimized" } } */
 ! { dg-final { scan-assembler-not "call.*_ZGVbN4v_sinf" } }
+! { dg-final { scan-assembler-not "bl.*_ZGVnN4v_sinf" } }
diff --git a/gcc/testsuite/gfortran.dg/simd-builtins-6.f90 b/gcc/testsuite/gfortran.dg/simd-builtins-6.f90
index 2ffa807..60bcac7 100644
--- a/gcc/testsuite/gfortran.dg/simd-builtins-6.f90
+++ b

Re: [EXT] Re: [Patch, Aarch64] Implement TARGET_GET_MULTILIB_ABI_NAME for aarch64 (for use in Fortran vector header file)

2019-02-25 Thread Steve Ellcey
On Wed, 2019-02-20 at 10:04 +, Richard Sandiford wrote:
> 
> (E.g. __attribute__((vector_size)) never creates an ABI-level SVE vector,
> even with -msve-vector-bits=N, but it can create an ABI-level Advanced
> SIMD vector.)
> 
> I think we should leave the SVE stuff out for now though.  ISTM that:
> 
> !GCC$ builtin (sin) attributes simd (notinbranch) if('aarch64')
> !GCC$ builtin (sin) attributes simd (notinbranch) if('aarch64_sve')
> 
> is trying to say that there's both an Advanced SIMD implementation of sin
> and an SVE implementation of sin.  But AFAICT the patch would just treat
> the two the same way when SVE is enabled, which I think in practice means
> that both would declare an Advanced SIMD function.

It looks like you are right.  I did not know that GCC would essentially
use the non-SVE vector functions to handle SVE vectors. So right now,
GCC doesn't seem to need to differentiate between SVE and non-SVE.

> Once we support SVE vector functions, how would the header file declare
> the Advanced SIMD function when SVE is enabled on the command line?

I guess this is what I was thinking the aarch64_sve if would be for,
but there is a difference between supporting SVE (using non-SVE vector
functions) and supporting SVE with functions that can take and return
an SVE vector.  I think that we should use 'if('aarch64_sve')' to mean
SVE is enabled and we want GCC to call SVE vector functions.  So right
now this would always return false, until such time as GCC is changed
to do calls with SVE vectors as arguments.

Given that it would always return false I guess we could just drop it
out for now.

Steve Ellcey
sell...@marvell.com


FYI: if anyone is interested, here is a test program I was compiling to
see what GCC does with SVE.  I compiled with -march=armv8.5-a,
-march=armv8.5-a+sve, and -msve-vector-bits=[256,2048,scalable] to see
what sort of code got generated.  I was a little surprised that using
-march=armv8.5-a (without the +sve) together with -msve-vector-bits=
did not give a warning.  Why set sve-vector-bits if SVE is not enabled?


__attribute__ ((__simd__ ("notinbranch"))) extern float
xsinf (float __x) __attribute__ ((__nothrow__ , __leaf__, __const__));
__attribute__ ((__simd__ ("notinbranch"))) extern float
xcosf (float __x) __attribute__ ((__nothrow__ , __leaf__, __const__));

void
foo (float * __restrict__ a, float * __restrict__ b, float * __restrict__ c)
{
  int i;
  for (i = 0; i < 1; i++)
    a[i] = xsinf (b[i]) * xcosf (c[i]);
}


[Patch, Aarch64] Implement TARGET_GET_MULTILIB_ABI_NAME for aarch64 (for use in Fortran vector header file)

2019-02-19 Thread Steve Ellcey
Here is a patch to use the new TARGET_GET_MULTILIB_ABI_NAME macro that
Martin Liska added with the Aarch64 platform.  The main issue to
consider is what ABI names to support; I created 8 versions for big-
endian/little-endian, ilp32/lp64, and sve/non-sve.  I am not sure if we
need all of those but it seemed better to have them and not use them
than to find out we need them later.

Tested on Aarch64 and x86 with bootstraps and test runs.  There were no
regressions.

Ok to checkin?

Steve Ellcey
sell...@marvell.com


2018-02-19  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_get_multilib_abi_name):
New function.
(TARGET_GET_MULTILIB_ABI_NAME): New macro.


2018-02-19  Steve Ellcey  

* gfortran.dg/simd-builtins-1.f90: Update for aarch64*-*-*.
* gfortran.dg/simd-builtins-2.f90: Ditto.
* gfortran.dg/simd-builtins-6.f90: Ditto.
* gfortran.dg/simd-builtins-8.f90: New test.
* gfortran.dg/simd-builtins-8.h: New header file.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9f52cc9ff3b..d26ff85b683 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18720,6 +18720,23 @@ aarch64_comp_type_attributes (const_tree type1, const_tree type2)
   return 1;
 }
 
+/* Implement TARGET_GET_MULTILIB_ABI_NAME */
+
+static const char *
+aarch64_get_multilib_abi_name (void)
+{
+  if (TARGET_BIG_END)
+{
+  if (AARCH64_ISA_SVE)
+	return TARGET_ILP32 ? "aarch64_be_ilp32_sve" : "aarch64_be_sve";
+  return TARGET_ILP32 ? "aarch64_be_ilp32" : "aarch64_be";
+}
+  if (AARCH64_ISA_SVE)
+return TARGET_ILP32 ? "aarch64_ilp32_sve" : "aarch64_sve";
+  return TARGET_ILP32 ? "aarch64_ilp32" : "aarch64";
+}
+
+
 /* Implement TARGET_STACK_PROTECT_GUARD. In case of a
global variable based guard use the default else
return a null tree.  */
@@ -19242,6 +19259,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_COMP_TYPE_ATTRIBUTES
 #define TARGET_COMP_TYPE_ATTRIBUTES aarch64_comp_type_attributes
 
+#undef TARGET_GET_MULTILIB_ABI_NAME
+#define TARGET_GET_MULTILIB_ABI_NAME aarch64_get_multilib_abi_name
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/testsuite/gfortran.dg/simd-builtins-1.f90 b/gcc/testsuite/gfortran.dg/simd-builtins-1.f90
index 6d79ef8dc46..5cb3eb5132a 100644
--- a/gcc/testsuite/gfortran.dg/simd-builtins-1.f90
+++ b/gcc/testsuite/gfortran.dg/simd-builtins-1.f90
@@ -1,5 +1,6 @@
-! { dg-do compile { target { i?86-*-linux* x86_64-*-linux* } } }
-! { dg-additional-options "-msse2 -mno-avx -nostdinc -Ofast -fpre-include=simd-builtins-1.h -fdump-tree-optimized" }
+! { dg-do compile { target i?86-*-linux* x86_64-*-linux* aarch64*-*-linux* } }
+! { dg-additional-options "-nostdinc -Ofast -fpre-include=simd-builtins-1.h -fdump-tree-optimized" }
+! { dg-additional-options "-msse2 -mno-avx" { target i?86-*-linux* x86_64-*-linux* } }
 
 program test_overloaded_intrinsic
   real(4) :: x4(3200), y4(3200)
@@ -14,6 +15,7 @@ program test_overloaded_intrinsic
   print *, y8
 end
 
-! { dg-final { scan-tree-dump "sinf.simdclone" "optimized" } } */
-! { dg-final { scan-tree-dump "__builtin_sin" "optimized" } } */
-! { dg-final { scan-assembler "call.*_ZGVbN4v_sinf" } }
+! { dg-final { scan-tree-dump "sinf.simdclone" "optimized" } }
+! { dg-final { scan-tree-dump "__builtin_sin" "optimized" } }
+! { dg-final { scan-assembler "call.*_ZGVbN4v_sinf" { target i?86-*-linux* x86_64-*-* } } }
+! { dg-final { scan-assembler "bl.*_ZGVnN4v_sinf" { target aarch64*-*-* } } }
diff --git a/gcc/testsuite/gfortran.dg/simd-builtins-2.f90 b/gcc/testsuite/gfortran.dg/simd-builtins-2.f90
index f0e6bc13862..2e5bc22716b 100644
--- a/gcc/testsuite/gfortran.dg/simd-builtins-2.f90
+++ b/gcc/testsuite/gfortran.dg/simd-builtins-2.f90
@@ -1,11 +1,12 @@
-! { dg-do compile { target { i?86-*-linux* x86_64-*-linux* } } }
-! { dg-additional-options "-msse2 -nostdinc -Ofast -fdump-tree-optimized" }
+! { dg-do compile { target { i?86-*-linux* x86_64-*-linux* aarch64*-*-linux* } } }
+! { dg-additional-options "-nostdinc -Ofast -fdump-tree-optimized" }
+! { dg-additional-options "-msse2" { target i?86-*-linux* x86_64-*-linux* } }
 
 program test_overloaded_intrinsic
   real(4) :: x4(3200), y4(3200)
   real(8) :: x8(3200), y8(3200)
 
-  ! this should be using simd clone
+  ! this should not be using simd clone
   y4 = sin(x4)
   print *, y4
 
@@ -18,3 +19,4 @@ end
 ! { dg-final { scan-tree-dump "__builtin_sin" "optimized" } } */
 ! { dg-final { scan-tree-dump-not "simdclone" "optimized" } } */
 ! { dg-final { scan-assembler-not "call.*_ZGVbN4v_sinf&qu

Re: [EXT] Re: GCC missing -flto optimizations? SPEC lbm benchmark

2019-02-15 Thread Steve Ellcey
On Fri, 2019-02-15 at 17:48 +0800, Jun Ma wrote:
> 
> ICC is doing much more than GCC in ipo, especially memory layout 
> optimizations. See https://software.intel.com/en-us/node/522667.
> ICC is more aggressive in array transposition/structure splitting
> /field reordering. However, these optimizations have been removed
> from GCC long time ago.  
> As for case lbm_r, IIRC a loop with memory access which stride is 20 is 
> most time-consuming.  ICC will optimize the array(maybe structure?) 
> and vectorize the loop under ipo.
>  
> Thanks
> Jun

Interesting.  I tried using '-qno-opt-mem-layout-trans' on ICC
along with '-Ofast -ipo' and that had no effect on the speed.  I also
tried '-no-vec' and that had no effect either.  The only thing that
slowed down ICC was '-ip-no-inlining' or '-fno-inline'.  I see that
'-Ofast -ipo' resulted in everything (except libc functions) getting
inlined into the main program when using ICC.  GCC did not do that, but
if I forced it to by using the always_inline attribute, GCC could
inline everything into main the way ICC does.  But that did not speed
up the GCC executable.

Steve Ellcey
sell...@marvell.com


Re: Fortran vector math header

2019-02-14 Thread Steve Ellcey
On Wed, 2019-02-13 at 12:34 +0100, Martin Liška wrote:
> May I please ping this so that we can reach mainline soon?
> 
> Thanks,
> Martin

Martin, I can't approve this patch but I can say that I have used it on
Aarch64 and created a follow up patch for aarch64 to create a
get_multilib_abi_name target function for that platform.  Everything
seemed to work fine for me and I did not have any problems or see any
regressions when using your patch.  I hope it gets approved and checked
in soon.

Steve Ellcey
sell...@marvell.com


GCC missing -flto optimizations? SPEC lbm benchmark

2019-02-14 Thread Steve Ellcey
I have a question about SPEC CPU 2017 and what GCC can and cannot do
with -flto.  As part of some SPEC analysis I am doing I found that with
-Ofast, ICC and GCC were not that far apart (especially on SPEC int
rate; the SPEC fp rate difference was slightly larger).

But when I added -ipo to the ICC command and -flto to the GCC command,
the difference got larger.  In particular the 519.lbm_r was more than
twice as fast with ICC and -ipo, but -flto did not help GCC at all.

There are other tests that also show this type of improvement with -ipo
like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
548.exchange2_r, but none are as dramatic as 519.lbm_r.  Anyone have
any idea on what ICC is doing that GCC is missing?  Is GCC just not
aggressive enough with its inlining?

Steve Ellcey
sell...@marvell.com


[Patch, aarch64] Issue warning/error for mixing functions with/without aarch64_vector_pcs attribute

2019-02-13 Thread Steve Ellcey
Szabolcs pointed out that my SIMD ABI patches that implement the
aarch64_vector_pcs attribute do not generate a warning or error
when being mixed with functions that do not have the attribute because
the 'affects_type_identity' field was false in the attribute table.

This patch fixes that.  I thought I could just set it to true but it
turned out I also had to implement TARGET_COMP_TYPE_ATTRIBUTES as well.

This patch does that and adds a test case to check for the error
when assigning a function with the attribute to a pointer type without
the attribute.

The test checks for an error because the testsuite adds
-pedantic-errors to the compile line.  Without this you would just get
a warning,
but that is consistent with any mixing of different function types in a
function pointer assignment.
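
For what it's worth, the direction that can silently go wrong is a
plain function reached through a vector-PCS pointer; here is a
hypothetical sketch (not part of the patch) of why the diagnostic
matters:

void plain (void);

/* vp points to a vector-PCS function, so callers may keep values in
   q16-q23 live across a call through vp (the vector PCS makes those
   registers callee-saved).  A plain function is free to clobber them,
   so this assignment has to be diagnosed rather than silently
   accepted.  */
void (__attribute__ ((aarch64_vector_pcs)) *vp) (void) = plain;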

Tested with a bootstrap build and test run on aarch64.  OK for checkin?

Steve Ellcey
sell...@marvell.com


2018-02-13  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_attribute_table): Change
affects_type_identity to true for aarch64_vector_pcs.
(aarch64_comp_type_attributes): New function.
(TARGET_COMP_TYPE_ATTRIBUTES): New macro.

2018-02-13  Steve Ellcey  

* gcc.target/aarch64/pcs_attribute.c: New test.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1fa28fe82fe..9f52cc9ff3b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1180,7 +1180,7 @@ static const struct attribute_spec aarch64_attribute_table[] =
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
affects_type_identity, handler, exclude } */
-  { "aarch64_vector_pcs", 0, 0, false, true,  true,  false, NULL, NULL },
+  { "aarch64_vector_pcs", 0, 0, false, true,  true,  true,  NULL, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -18709,6 +18709,17 @@ aarch64_simd_clone_usable (struct cgraph_node *node)
 }
 }
 
+/* Implement TARGET_COMP_TYPE_ATTRIBUTES */
+
+static int
+aarch64_comp_type_attributes (const_tree type1, const_tree type2)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (type1))
+  != lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (type2)))
+return 0;
+  return 1;
+}
+
 /* Implement TARGET_STACK_PROTECT_GUARD. In case of a
global variable based guard use the default else
return a null tree.  */
@@ -19228,6 +19239,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SIMD_CLONE_USABLE
 #define TARGET_SIMD_CLONE_USABLE aarch64_simd_clone_usable
 
+#undef TARGET_COMP_TYPE_ATTRIBUTES
+#define TARGET_COMP_TYPE_ATTRIBUTES aarch64_comp_type_attributes
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/testsuite/gcc.target/aarch64/pcs_attribute.c b/gcc/testsuite/gcc.target/aarch64/pcs_attribute.c
index e69de29bb2d..9a99b91a6ed 100644
--- a/gcc/testsuite/gcc.target/aarch64/pcs_attribute.c
+++ b/gcc/testsuite/gcc.target/aarch64/pcs_attribute.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+ 
+/* Test that the assignment of f (with the attribute) to function pointer g
+   (with no attribute) results in an error.  */
+
+__attribute__((aarch64_vector_pcs)) void f(void);
+void (*g)(void) = f; /* { dg-error "incompatible pointer type" } */


Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-02-13 Thread Steve Ellcey
On Wed, 2019-02-13 at 16:54 +, Szabolcs Nagy wrote:

> > +/* Table of machine attributes.  */
> > +static const struct attribute_spec aarch64_attribute_table[] =
> > +{
> > +  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
> > +   affects_type_identity, handler, exclude } */
> > +  { "aarch64_vector_pcs", 0, 0, false, true,  true,  false, NULL,
> > NULL },
> 
> i just noticed that "affects_type_identity" is set to false,
> so gcc accepts
> 
>   __attribute__((aarch64_vector_pcs)) void f(void);
>   void (*g)(void) = f;
> 
> without a warning (treats function types with different
> pcs as compatible)
> 
> i think we don't want to allow calls through the wrong
> pointer type, such assignment should be an error.

I agree.  I will submit a patch to change the affects_type_identity
flag and add a test for it.

Steve Ellcey
sell...@marvell.com


Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-02-11 Thread Steve Ellcey
On Thu, 2019-02-07 at 18:13 +, Wilco Dijkstra wrote:
> External Email
> 
> Hi Steve,
> 
> > > After special cases you could do something like t = mask2 +
> > > (HWI_1U << shift);
> > > return t == (t & -t) to check for a valid bfi.
> > 
> > I am not sure I follow this logic and my attempts to use this did not
> > work so I kept my original code.
> 
> It's similar to the initial code in aarch64_bitmask_imm, but rather than 
> adding
> the lowest bit to the value to verify it is a mask (val + (val & -val)), we 
> use the
> shift instead. If the shift is exactly right, it reaches the first
> set bit of the mask.
> Adding the low bit to a valid mask always results in zero or a single set bit.
> The standard idiom to check that is t == (t & -t).
> 
> > > +  "bfi\t%0, %1, 0, %P2"
> > > 
> > > This could emit a width that may be 32 too large in SImode if bit 31 is 
> > > set
> > > (there is another use of %P in aarch64.md which may have the same
> > > issue).
> > 
> > I am not sure why having bit 31 set would be a problem.  Sign
> > extension?
> 
> Yes, if bit 31 is set, %P will emit 33 for a 32-bit constant which is 
> obviously wrong.
> Your patch avoids this for bfi by explicitly computing the correct value.
> 
> This looks good to me (and creates useful bfi's as expected), but I
> can't approve.
> 
> Wilco

Thanks for looking this over.  I have updated the mask check to use
your method and retested to make sure it still works.  Can one of the
aarch64 maintainers approve this patch?

2018-02-11  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
New prototype.
* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
New function.
    * config/aarch64/aarch64.md (*aarch64_bfi5_shift):
New instruction.
(*aarch64_bfi4_noand): Ditto.
(*aarch64_bfi4_noshift): Ditto.
(*aarch64_bfi4_noshift_alt): Ditto.

2018-02-11  Steve Ellcey  

PR rtl-optimization/87763
* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
to bfi.
* gcc.target/aarch64/combine_bfi_2.c: New test.


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35f33b..b6c0d0a8eb6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7c453cdad0..a7ef952ad1b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9330,6 +9330,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	 & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *aarch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+   unsigned HOST_WIDE_INT mask1,
+   unsigned HOST_WIDE_INT shft_amnt,
+   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if (mask1 != ~mask2)
+return false;
+
+  /* Verify that mask2 is not all zeros or ones.  */
+  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
+return false;
+
+  /* The shift amount should always be less than the mode size.  */
+  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+ least significant bits after shifting by shft_amnt.  */
+  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
+  return (t == (t & -t));
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0f135..2bbd3f1055c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5476,6 +5476,76 @@
   [(set_attr "type" "bfm")]
 )
 

Re: [EXT] Re: [PATCH, fortran ieee]: PR 88678, Many gfortran.dg/ieee/ieee_X.f90 test cases fail starting with r267465

2019-02-08 Thread Steve Ellcey
On Fri, 2019-02-08 at 10:42 +0100, Uros Bizjak wrote:

> so, the reverted patch neglected this assumption. Ignoring this, we
> can use
> 
> --cut here--
> Index: libgfortran/config/fpu-glibc.h
> ===
> --- libgfortran/config/fpu-glibc.h  (revision 268424)
> +++ libgfortran/config/fpu-glibc.h  (working copy)
> @@ -129,6 +129,10 @@
>  int
>  support_fpu_trap (int flag)
>  {
> +#if defined(__arm__) || defined(__aarch64__)
> +  return 0;
> +#endif
> +
>return support_fpu_flag (flag);
>  }
> 
> --cut here--
> 
> which would in effect revert to the previous status on arm/aarch64
> targets, while using correct setting for other targets, where
> exception support is assumed.
> 
> Uros.

I am not sure we want to disable this for all arm/Aarch64 chips. 
ieee_6.f90 is failing on my T88 ThunderX box with a 4.13 kernel and
passing on my T99 Thunderx2 box with a 4.15 kernel.  I don't know
if it is the hardware or the kernel that is making the difference;
I am still trying to figure that out.  I guess returning zero would be
a 'safe' choice in that it would say we do not support this on some
machines where it actually is supported but it would no longer claim
support on a machine where it is not supported.
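
One way to avoid hard-coding the answer per architecture would be to
probe at run time whether the trap can actually be enabled.  The sketch
below is only meant to illustrate the idea in the context of
fpu-glibc.h; it is untested, and gfc_flag_to_fe () is a made-up name
standing in for the existing GFC_FPE_* to FE_* mapping in that file:

#include <fenv.h>

int
support_fpu_trap (int flag)
{
  int exc, old;

  if (!support_fpu_flag (flag))
    return 0;

  exc = gfc_flag_to_fe (flag);   /* hypothetical helper name */
  old = feenableexcept (exc);    /* -1 means the trap cannot be enabled */
  if (old == -1)
    return 0;
  fedisableexcept (exc & ~old);  /* restore the previous trap mask */
  return 1;
}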

Steve Ellcey
sell...@marvell.com


Re: [PATCH, fortran ieee]: PR 88678, Many gfortran.dg/ieee/ieee_X.f90 test cases fail starting with r267465

2019-02-07 Thread Steve Ellcey
On Thu, 2019-01-31 at 08:46 +0100, Uros Bizjak wrote:
> On Wed, Jan 30, 2019 at 9:51 PM Janne Blomqvist
>
> > This seems to change the only user of support_fpu_trap() that is
> > different from support_fpu_flag(), so with this change one could
> > remove support_fpu_trap() entirely and modify all callers (since
> > it's an internal function it's not used outside libgfortran) to
> > call support_fpu_flag() directly. Otherwise Ok.
> 
> Some targets only support IEEE flags (so, flags in some FP status
> register), but not exceptions (traps) based on these flags that halt
> the program. Currently, fpu-glibc.h assumes that all flags can
> generate exceptions (that is true for the current set of gfortran
> targets), but some future target wants to return 0 from
> support_fpu_trap.
> 
> Uros.

Is this actually true for all existing gfortran targets?  Specifically
I am wondering about Arm and Aarch64.  PR 78314 says that on ARM,
trapping FPU exceptions are optional.

I am currently seeing gfortran.dg/ieee/ieee_6.f90 fail on my Aarch64
ThunderX box.  I wonder if we should have an Arm/Aarch64 specific
version of the fpu header file (fpu-arm.h) that would use the previous 
version of support_fpu_trap.

Steve Ellcey
sell...@marvell.com



Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-02-06 Thread Steve Ellcey
On Tue, 2019-02-05 at 21:12 +, Wilco Dijkstra wrote:

> +bool
> +aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
> +  unsigned HOST_WIDE_INT mask1,
> +  unsigned HOST_WIDE_INT shft_amnt,
> +  unsigned HOST_WIDE_INT mask2)
> +{
> +  unsigned HOST_WIDE_INT t;
> +
> +  /* Verify that there is no overlap in what bits are set in the two
> masks.  */
> +  if ((mask1 + mask2) != HOST_WIDE_INT_M1U)
> +return false;
> +  if ((mask1 & mask2) != 0)
> +return false;
> 
> Why not check mask1 == ~mask2? That's much simpler...

Yes that would be simpler.  I made that change.
> 
> +  /* Verify that the shift amount is less than the mode size.  */
> +  if (shft_amnt >= GET_MODE_BITSIZE (mode))
> +return false;
> 
> The md pattern already guarantees this (this could be an assert). We need
> to check that  shift+width <= mode size. Given immediates are limited to
> the mode size, the easiest way is to check mask2 is not 0 or M1.

OK, I changed the if statement to a gcc_assert and added a check to
make sure mask2 is not 0 or M1.

> +  /* Verify that the mask being shifted is contiguous and would be in the
> + least significant bits after shifting by creating a mask 't' based on
> + the number of bits set in mask2 and the shift amount for mask2 and
> + comparing that to the actual mask2.  */
> +  t = popcount_hwi (mask2);
> +  t = (HOST_WIDE_INT_1U << t) - 1;
> +  t = t << shft_amnt;
> +  if (t != mask2)
> +return false;
> +
> +  return true;
> 
> This would return true if mask2 == 0 or M1 (I think this can't happen with
> current md patterns, but this would emit an illegal bfi).
> 
> After special cases you could do something like t = mask2 + (HWI_1U << shift);
> return t == (t & -t) to check for a valid bfi.

I am not sure I follow this logic and my attempts to use this did not
work so I kept my original code.

> +  "bfi\t%0, %1, 0, %P2"
> 
> This could emit a width that may be 32 too large in SImode if bit 31 is set
> (there is another use of %P in aarch64.md which may have the same
> issue).

I am not sure why having bit 31 set would be a problem.  Sign
extension?

> Finally from some quick testing it seems one case is not yet
> supported:
> 
> int f1(int x, int y)
> {
>   return (y & 0xfff) | (((x <<28) & 0xf000));
> }
> 
> This just has a shift, rather than an AND plus a shift.

I added another instruction to handle this and added a new test to
check for it.

Steve Ellcey
sell...@marvell.com


Here is the latest version of the patch.


2019-02-06  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
New prototype.
* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
New function.
    * config/aarch64/aarch64.md (*aarch64_bfi5_shift):
New instruction.
(*aarch64_bfi4_noand): Ditto.
(*aarch64_bfi4_noshift): Ditto.
(*aarch64_bfi4_noshift_alt): Ditto.

2019-02-06  Steve Ellcey  

PR rtl-optimization/87763
* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
to bfi.
* gcc.target/aarch64/combine_bfi_2.c: New test.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35f33b..b6c0d0a8eb6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7c453cdad0..26b5ab47d6f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9330,6 +9330,42 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	 & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+   unsigned HOST_WIDE_INT mask1,
+   unsigned HOST_WIDE_INT shft

Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-02-04 Thread Steve Ellcey
Ping.  And adding Aarch64 Maintainers.


On Mon, 2019-01-28 at 16:11 -0800, Steve Ellcey wrote:
> On Sat, 2019-01-26 at 00:00 +0100, Jakub Jelinek wrote:
> > 
> > > +  /* Verify that there is no overlap in what bits are set in the
> > > two masks.  */
> > > +  if ((m1 + m2 + 1) != 0)
> > > +return false;
> > 
> > Wouldn't that be clearer to test
> >   if (m1 + m2 != HOST_WIDE_INT_1U)
> > return false;
> > ?
> > You IMHO also should test
> >   if ((m1 & m2) != 0)
> > return false;
> 
> I think you mean HOST_WIDE_INT_M1U, not HOST_WIDE_INT_1U.  That does
> seem clearer and I made that change as well as adding the second
> test.
> 
> > > 
> > > +  t = popcount_hwi (m2);
> > > +  t = (1 << t) - 1;
> > 
> > This should be (HOST_WIDE_INT_1U << t), otherwise if popcount of m2
> > is
> > > = 32, there will be UB.
> 
> Fixed.
> 
> > As mentioned in rs6000.md, I believe you also need a similar
> > pattern where
> > the two ANDs are swapped, because they have the same priority.
> 
> I fixed the long lines in aarch64.md and I added a second pattern for
> the *aarch64_bfi4_noshift pattern, swapping the order of
> the
> IOR operands.  I did not swap the AND operands, I assume the compiler
> would ensure that the register was always before the constant or that
> it would check both orderings.
> 
> I tried adding a second version of *aarch64_bfi5_shift as
> well but when I tested it, it never got used during a bootstrap build
> or a GCC test run so I decided it was not needed.
> 
> Tested on aarch64 with a bootstrap build and a GCC testsuite run.
> OK to checkin?
> 
> 
> Steve Ellcey
> sell...@marvell.com
> 
> 
> 2018-01-28  Steve Ellcey  
> 
>   PR rtl-optimization/87763
>   * config/aarch64/aarch64-protos.h
> (aarch64_masks_and_shift_for_bfi_p):
>   New prototype.
>   * config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
>   New function.
>   * config/aarch64/aarch64.md (*aarch64_bfi5_shift):
>   New instruction.
>   (*aarch64_bfi4_noshift): Ditto.
>   (*aarch64_bfi4_noshift_alt): Ditto.
> 
> 
> 2018-01-28  Steve Ellcey  
> 
>   PR rtl-optimization/87763
>   * gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
>   to bfi.
> 
> 
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35f33b..b6c0d0a8eb6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7c453cdad0..9a3080b5db8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9330,6 +9330,41 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	 & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+   unsigned HOST_WIDE_INT mask1,
+   unsigned HOST_WIDE_INT shft_amnt,
+   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if ((mask1 + mask2) != HOST_WIDE_INT_M1U)
+return false;
+  if ((mask1 & mask2) != 0)
+return false;
+
+  /* Verify that the shift amount is less than the mode size.  */
+  if (shft_amnt >= GET_MODE_BITSIZE (mode))
+return false;
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+ least significant bits after shifting by creating a mask 't' based on
+ the number of bits set in mask2 and the shift amount for mask2 and
+ comparing that to the actual mask2.  */
+  t = popcount_hwi (mask2);
+  t = (HOST_WIDE_INT_1U << t) - 1;
+  t = t << shft_amnt;
+  if (t != mask2)
+return false;
+  
+  return true;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total 

[Patch][Aarch64]PR rtl-optimization/87763 - Fix lsl_asr_sbfiz.c test by checking for subregs

2019-01-29 Thread Steve Ellcey
So the various tests that started failing with r265398 seem to need
different fixes.  This particular fix is for the
gcc.target/aarch64/lsl_asr_sbfiz.c failure.  The problem is that the
instructions we are trying to match to *ashiftsi_extv_bfiz now have
explicit subregs in them where they didn't before.   The new version
is:

(set (reg:SI 93)
(ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 95) 0)
(const_int 3 [0x3])
(const_int 0 [0])) 0)
(const_int 19 [0x13])))


The subregs were not there before.  My proposed fix is to add a new
instruction like *ashiftsi_extv_bfiz but with the subregs.  This fixes
lsl_asr_sbfiz.c.  Does this seem like the right way to fix this?
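
For reference, the 32-bit case in lsl_asr_sbfiz.c that produces the RTL
above is essentially the function below (quoted from memory, so the
exact shift counts in the test may differ):

/* A left shift followed by an arithmetic right shift should combine
   into a single sbfiz instruction.  */
int
sbfiz32 (int x)
{
  return x << 29 >> 10;
}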

Steve Ellcey
sell...@marvell.com


2019-01-29  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
New Instruction.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0f135..d65230c4837 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5531,6 +5531,22 @@
   [(set_attr "type" "bfx")]
 )
 
+(define_insn "*ashiftsi_extv_bfiz_alt"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (ashift:SI
+ (subreg:SI
+   (sign_extract:DI
+ (subreg:DI (match_operand:SI 1 "register_operand" "r") 0)
+ (match_operand 2 "aarch64_simd_shift_imm_offset_si" "n")
+ (const_int 0))
+   0)
+ (match_operand 3 "aarch64_simd_shift_imm_si" "n")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+1, GET_MODE_BITSIZE (SImode) - 1)"
+  "sbfiz\\t%w0, %w1, %3, %2"
+  [(set_attr "type" "bfx")]
+)
+
;; When the bit position and width of the equivalent extraction add up to 32
 ;; we can use a W-reg LSL instruction taking advantage of the implicit
 ;; zero-extension of the X-reg.


Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-01-28 Thread Steve Ellcey
On Sat, 2019-01-26 at 00:00 +0100, Jakub Jelinek wrote:
> 
> > +  /* Verify that there is no overlap in what bits are set in the two 
> > masks.  */
> > +  if ((m1 + m2 + 1) != 0)
> > +return false;
> 
> Wouldn't that be clearer to test
>   if (m1 + m2 != HOST_WIDE_INT_1U)
> return false;
> ?
> You IMHO also should test
>   if ((m1 & m2) != 0)
> return false;

I think you mean HOST_WIDE_INT_M1U, not HOST_WIDE_INT_1U.  That does
seem clearer and I made that change as well as adding the second test.

> > 
> > +  t = popcount_hwi (m2);
> > +  t = (1 << t) - 1;
> 
> This should be (HOST_WIDE_INT_1U << t), otherwise if popcount of m2 is
> > = 32, there will be UB.

Fixed.

> As mentioned in rs6000.md, I believe you also need a similar pattern where
> the two ANDs are swapped, because they have the same priority.

I fixed the long lines in aarch64.md and I added a second pattern for
the *aarch64_bfi4_noshift pattern, swapping the order of the
IOR operands.  I did not swap the AND operands, I assume the compiler
would ensure that the register was always before the constant or that
it would check both orderings.

I tried adding a second version of *aarch64_bfi5_shift as
well but when I tested it, it never got used during a bootstrap build
or a GCC test run so I decided it was not needed.

Tested on aarch64 with a bootstrap build and a GCC testsuite run.
OK to checkin?


Steve Ellcey
sell...@marvell.com


2019-01-28  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
New prototype.
* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
New function.
* config/aarch64/aarch64.md (*aarch64_bfi5_shift):
New instruction.
    (*aarch64_bfi4_noshift): Ditto.
(*aarch64_bfi4_noshift_alt): Ditto.


2019-01-28  Steve Ellcey  

PR rtl-optimization/87763
* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
to bfi.


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35f33b..b6c0d0a8eb6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7c453cdad0..9a3080b5db8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9330,6 +9330,41 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	 & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+   unsigned HOST_WIDE_INT mask1,
+   unsigned HOST_WIDE_INT shft_amnt,
+   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if ((mask1 + mask2) != HOST_WIDE_INT_M1U)
+return false;
+  if ((mask1 & mask2) != 0)
+return false;
+
+  /* Verify that the shift amount is less than the mode size.  */
+  if (shft_amnt >= GET_MODE_BITSIZE (mode))
+return false;
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+ least significant bits after shifting by creating a mask 't' based on
+ the number of bits set in mask2 and the shift amount for mask2 and
+ comparing that to the actual mask2.  */
+  t = popcount_hwi (mask2);
+  t = (HOST_WIDE_INT_1U << t) - 1;
+  t = t << shft_amnt;
+  if (t != mask2)
+return false;
+  
+  return true;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0f135..6b5339092ba 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5476,6 +5476,56 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant b

Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-01-25 Thread Steve Ellcey
On Fri, 2019-01-25 at 10:32 +, Richard Earnshaw (lists) wrote:
> 
> Do we need another variant pattern to handle the case where the
> insertion is into the top of the destination?  In that case the
> immediate mask on the shifted operand is technically redundant as the
> bottom bits are known zero and there are no top bits.

I am not sure about this.  Do you have an example where this might be
needed?
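
(For context, I assume the case being described is something like the
function below, where the shifted field lands in the top bits of the
destination so the mask on the shifted operand is redundant; this is
just my guess at an example, not a testcase from the patch.)

/* Insert the low 4 bits of y into the top 4 bits of x; ideally a
   single "bfi w0, w1, 28, 4" with no extra masking of w1.  */
unsigned int
insert_top (unsigned int x, unsigned int y)
{
  return (x & 0x0fffffff) | (y << 28);
}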

I updated my patch to address your other comments and have bootstrapped
and tested this on aarch64.  Does this version look good for checkin?
I had to modify the two tests because with my new instructions we
sometimes generate bfi instructions where we used to generate bfxil
instructions.  The only regression this is fixing is combine_bfi_1.c;
combine_bfxil.c was passing before but still needed to be changed in
order to keep passing.

Steve Ellcey
sell...@marvell.com


2019-01-25  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
New prototype.
* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
New function.
* config/aarch64/aarch64.md (*aarch64_bfi4_shift):
New instruction.
(*aarch64_bfi4_noshift): Ditto.


2019-01-25  Steve Ellcey  

PR rtl-optimization/87763
* gcc.target/aarch64/combine_bfi_1.c: Change some bfxil checks
to bfi.
* gcc.target/aarch64/combine_bfxil.c: Ditto.



[Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-01-24 Thread Steve Ellcey
Here is my attempt at creating a couple of new instructions to
generate more bfi instructions on aarch64.  I haven't finished
testing this but it helps with gcc.target/aarch64/combine_bfi_1.c.

Before I went any further with it I wanted to see if anyone
else was working on something like this and if this seems like
a reasonable approach.
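
For anyone not familiar with the testcase, the kind of source the new
patterns are aimed at is bit-field insertion written with masks and an
IOR; roughly along these lines (illustrative, not the exact testcase):

/* Copy the low 8 bits of y into the low 8 bits of x: ideally a single
   bfi/bfxil instead of and + and + orr.  */
unsigned int
copy_low_byte (unsigned int x, unsigned int y)
{
  return (x & 0xffffff00) | (y & 0xff);
}

/* Copy the low 8 bits of y into bits 8..15 of x: a bfi with a shift.  */
unsigned int
copy_byte_shifted (unsigned int x, unsigned int y)
{
  return (x & 0xffff00ff) | ((y << 8) & 0xff00);
}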

Steve Ellcey
sell...@marvell.com


2019-01-24  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64-protos.h
(aarch64_masks_and_shift_for_aarch64_bfi_p): New prototype.
* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_aarch64_bfi_p):
New function.
* config/aarch64/aarch64.md (*aarch64_bfi4_shift):
New instruction.
(*aarch64_bfi4_noshift): Ditto.


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35..ec90053 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,7 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_aarch64_bfi_p (scalar_int_mode, rtx, rtx, rtx);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5df5a8b..69cc69f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9294,6 +9294,44 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	 & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_aarch64_bfi_p (scalar_int_mode mode, rtx mask1,
+	   rtx shft_amnt, rtx mask2)
+{
+  unsigned HOST_WIDE_INT m1, m2, s, t;
+
+  if (!CONST_INT_P (mask1) || !CONST_INT_P (mask2) || !CONST_INT_P (shft_amnt))
+return false;
+
+  m1 = UINTVAL (mask1);
+  m2 = UINTVAL (mask2);
+  s = UINTVAL (shft_amnt);
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if ((m1 + m2 + 1) != 0)
+return false;
+
+  /* Verify that the shift amount is less than the mode size.  */
+  if (s >= GET_MODE_BITSIZE (mode))
+return false;
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+ least significant bits after shifting by creating a mask 't' based on
+ the number of bits set in mask2 and the shift amount for mask2 and
+ comparing that to the actual mask2.  */
+  t = popcount_hwi (m2);
+  t = (1 << t) - 1;
+  t = t << s;
+  if (t != m2)
+return false;
+  
+  return true;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0..e1f526b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5476,6 +5476,41 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.
+
+(define_insn "*aarch64_bfi4_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+  (match_operand:GPI 2 "const_int_operand" "n"))
+ (and:GPI (ashift:GPI
+   (match_operand:GPI 3 "register_operand" "r")
+   (match_operand:GPI 4 "aarch64_simd_shift_imm_" "n"))
+  (match_operand:GPI 5 "const_int_operand" "n"]
+  "aarch64_masks_and_shift_for_aarch64_bfi_p (mode, operands[2], operands[4], operands[5])"
+{
+  return "bfi\t%0, %3, %4, %P5";
+}
+  [(set_attr "type" "bfm")]
+)
+
+;; Like the above instruction but with no shifting, we are just copying the
+;; least significant bits of OP3 to OP0.
+
+(define_insn "*aarch64_bfi4_noshift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+  (match_operand:GPI 2 "const_int_operand" "n"))
+ (and:GPI (match_operand:GPI 3 "regist

Re: Fortran vector math header

2019-01-24 Thread Steve Ellcey
On Thu, 2019-01-24 at 16:25 +0100, Martin Liška wrote:
> 
> > 
> > > +ix86_get_multilib_abi_name (void)
> > > +{
> > > +  if (!(TARGET_64BIT_P (ix86_isa_flags)))
> > > +return "i386";
> > > +  else if (TARGET_X32_P (ix86_isa_flags))
> > > +return "x32";
> > > +  else
> > > +return "x86_64";
> > > +}


I'd like to get an aarch64 version of this target function into GCC 9.*
too in order to support ILP32/LP64 on aarch64.  It doesn't necessarily
have to be part of this patch though, I could submit one later if you
do not want to add it here.

Aarch64 ILP32 support is in GCC and binutils but not GLIBC (except on a
branch), I am thinking the Aarch64 version of this function would
return "aarch64" or "aarch64_ilp32".  Perhaps we should also have
"aarch64_be" and "aarch64_be_ilp32" for big endian ABI's?

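Concretely, something along the lines of the sketch below is what I
have in mind; TARGET_ILP32 and BYTES_BIG_ENDIAN are existing macros,
but the hook body and the returned strings are only a suggestion at
this point:

/* Possible aarch64 implementation of the multilib ABI name hook.  */
static const char *
aarch64_get_multilib_abi_name (void)
{
  if (TARGET_ILP32)
    return BYTES_BIG_ENDIAN ? "aarch64_be_ilp32" : "aarch64_ilp32";
  return BYTES_BIG_ENDIAN ? "aarch64_be" : "aarch64";
}
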
Steve Ellcey
sell...@marvell.com


Re: [EXT] Re: Fortran vector math header

2019-01-23 Thread Steve Ellcey
On Wed, 2019-01-23 at 23:53 +0100, Jakub Jelinek wrote:
> External Email
> 
> ---
> ---
> On Wed, Jan 23, 2019 at 09:56:21PM +0000, Steve Ellcey wrote:
> > I wonder if we even need to pass a string to the target
> > hook.  Instead
> > of:
> > 
> > !GCC$ builtin (cos) attributes simd (notinbranch) if('x86_64-linux-gnu')
> > 
> > We just have:
> > 
> > !GCC$ builtin (cos) attributes simd (notinbranch) if_allowed
> 
> That is not possible.  The whole point why we have a header is that glibc
> as the library provider provides a header which says to gcc which builtins
> have simd entrypoints, for which target and for which multilib.
> 
> If GCC knew or wanted to hardcode that info, we wouldn't have a header, we'd
> just hardcode it.
> The point is that at some later point, glibc might get an implementation for
> say powerpc64 64-bit, or mips n32, or whatever else, say for a subset of the
> builtins x86_64 has, or superset, ...

Duh.  Of course.  I am not sure how I lost track of that fact.

Steve Ellcey
sell...@marvell.com


Re: Fortran vector math header

2019-01-23 Thread Steve Ellcey
On Tue, 2019-01-22 at 13:18 +0100, Richard Biener wrote:
> 
> Yeah, I would even suggest to use a target hook multilib_ABI_active_p
> (const char *)
> for this ... (where the hook should diagnose unknown multilib specifiers).
> 
> Richard.


I wonder if we even need to pass a string to the target hook.  Instead
of:

!GCC$ builtin (cos) attributes simd (notinbranch) if('x86_64-linux-gnu')

We just have:

!GCC$ builtin (cos) attributes simd (notinbranch) if_allowed

When we see if_allowed we call a target function like multilib_ABI_active_p
and it can use whatever criteria it wants to use to determine if the simd
attribute should be honored.

Or, if we are going this far, how about leaving out the if altogher and
everytime we see a simd attribute in Fortran we call a target function
to determine if it should or should not be honored.  The default function
can just return true, target specific versions could look at the ABI and
return true or false as needed.

It might also be worth passing the function (cos) as an argument to the
target function, then we could have different ABI's enabling different
functions and not have to have them all on or all off based on the ABI.
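
As a strawman, the target function I am imagining would look something
like this; the hook name, its argument, and the policy are all made up
here for illustration, nothing with this name exists in GCC today:

/* Hypothetical per-function hook: decide whether to honor a Fortran
   '!GCC$ builtin ... attributes simd' directive for FNDECL on the
   current multilib/ABI.  */
static bool
aarch64_fortran_simd_builtin_allowed_p (tree fndecl ATTRIBUTE_UNUSED)
{
  /* e.g. only the LP64 little-endian ABI has vector math routines.  */
  return !TARGET_ILP32 && !BYTES_BIG_ENDIAN;
}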

Steve Ellcey
sell...@marvell.com


Re: [EXT] Re: [Patch] PR target/85711 - Fix ICE on aarch64

2019-01-23 Thread Steve Ellcey
On Wed, 2019-01-23 at 16:54 +, Richard Sandiford wrote:
> 
> IMO we shouldn't remove the assert.  See:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01969.html
> 
> and the thread leading up to it.
> 
> Thanks,
> Richard

OK, I hadn't seen that thread.  I didn't see any patch submitted
in response to your comment there so I created a new patch.
This patch leaves the assert in aarch64.c and changes the check
for the 'p' constraint in constrain_operands; this version
fixes the pr84682-2.c test failure and causes no regressions
on aarch64 or x86.

Steve Ellcey
sell...@marvell.com


2019-01-23  Bin Cheng  
Steve Ellcey 

PR target/85711
* recog.c (address_operand): Return false on wrong mode for address.
(constrain_operands): Check for mode with 'p' constraint.


diff --git a/gcc/recog.c b/gcc/recog.c
index d0c498fced2..a9f584bc0dc 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -1070,6 +1070,11 @@ general_operand (rtx op, machine_mode mode)
 int
 address_operand (rtx op, machine_mode mode)
 {
+  /* Wrong mode for an address expr.  */
+  if (GET_MODE (op) != VOIDmode
+  && ! SCALAR_INT_MODE_P (GET_MODE (op)))
+return false;
+
   return memory_address_p (mode, op);
 }
 
@@ -2696,10 +2701,13 @@ constrain_operands (int strict, alternative_mask alternatives)
 		/* p is used for address_operands.  When we are called by
 		   gen_reload, no one will have checked that the address is
 		   strictly valid, i.e., that all pseudos requiring hard regs
-		   have gotten them.  */
-		if (strict <= 0
-		|| (strict_memory_address_p (recog_data.operand_mode[opno],
-		 op)))
+		   have gotten them.  We also want to make sure we have a
+		   valid mode.  */
+		if ((GET_MODE (op) == VOIDmode
+		 || SCALAR_INT_MODE_P (GET_MODE (op)))
+		&& (strict <= 0
+			|| (strict_memory_address_p
+			 (recog_data.operand_mode[opno], op
 		  win = 1;
 		break;
 


[Patch] PR target/85711 - Fix ICE on aarch64

2019-01-23 Thread Steve Ellcey
The test gcc.dg/torture/pr84682-2.c has been failing for some time on
aarch64.  Bin Cheng submitted a patch for this some time ago; the
original patch was:

https://gcc.gnu.org/ml/gcc-patches/2018-03/msg00784.html

But Richard Sandiford thought it should be fixed in recog.c instead of
just in aarch64.c.  Bin submitted a followup:

https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01166.html

But there were no replies to that patch submission.  I have retested
the second patch and verified it fixes the aarch64 failure and causes
no regressions on aarch64 or x86.

OK to checkin?

Steve Ellcey
sell...@marvell.com




2019-01-23  Bin Cheng  
Steve Ellcey 

* recog.c (address_operand): Return false on wrong mode for address.
* config/aarch64/aarch64.c (aarch64_classify_address): Remove assert
since it's checked in general code now.




diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5df5a8b..aa3054d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6565,9 +6565,6 @@ aarch64_classify_address (struct aarch64_address_info *info,
   && (code != POST_INC && code != REG))
 return false;
 
-  gcc_checking_assert (GET_MODE (x) == VOIDmode
-		   || SCALAR_INT_MODE_P (GET_MODE (x)));
-
   switch (code)
 {
 case REG:
diff --git a/gcc/recog.c b/gcc/recog.c
index d0c498f..fb90302 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -1070,6 +1070,11 @@ general_operand (rtx op, machine_mode mode)
 int
 address_operand (rtx op, machine_mode mode)
 {
+  /* Wrong mode for an address expr.  */
+  if (GET_MODE (op) != VOIDmode
+  && ! SCALAR_INT_MODE_P (GET_MODE (op)))
+return false;
+
   return memory_address_p (mode, op);
 }
 


Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-23 Thread Steve Ellcey
On Wed, 2019-01-23 at 12:50 +, Tamar Christina wrote:
> Hi Steve,
> 
> > 
> > Hi Steve,
> > 
> > No we are using aarch64_be-*-* but this is the only one that popped
> > up as a
> > failure which is why I didn't change the others.
> > But now I'm wondering why... I'll check the log file manually
> > tomorrow to be
> > sure.
> 
> I've checked and the reason this didn't show up is because we don't
> run AArch64 big-endian cross linux tests by default and the baremetal
> builds don't support gomp.
> 
> I'm happy to update them for you if you want though.

I already updated them.  Thanks for checking on the failures.

Steve Ellcey
sell...@cavium.com


Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-22 Thread Steve Ellcey
On Mon, 2019-01-21 at 18:00 +, Tamar Christina wrote:
> 
> > That would need to be aarch64*-*-* to include big-endian.  Fixing that here
> > and in the other tests is OK under the obvious rule.
> 
> Ah, true, I didn't look at the testcase. I tested the ILP32 case and
> committed the fix for both.
> 
> Thanks,
> Tamar
> 
> > 
> > Adding && lp64 (as per Steve's patch below) is OK too if it works.
> > 
> > Thanks,
> > Richard

Thanks for fixing this test Tamar.  I am going to change the others to
use 'aarch64*-*-*' instead of 'aarch64-*-*' since that seems to be the
standard method (though I see a few other tests that are still using
aarch64 instead of aarch64*).

I guess that while you are testing big-endian the target triplet you
are using is still aarch64-*-* and not aarch64_be-*-*?  Otherwise I
would have expected you to have more failures.

Steve Ellcey
sell...@marvell.com


Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-18 Thread Steve Ellcey
On Fri, 2019-01-18 at 15:35 +0100, Christophe Lyon wrote:
> 
> Hi Steve,
> 
> I've noticed that
> FAIL: g++.dg/vect/simd-clone-7.cc  -std=c++14  (test for warnings,
> line 7)
> (and for c++17 and c++98)
> when forcing -mabi=ilp32.
> 
> I suspect you want to skip the test in this case?
> 
> Christophe

Actually, I think we can compile that test; it just would not generate
a warning in ILP32 mode because int, floats and pointers would now all
be the same size.  So I think the fix is:


% git diff simd-clone-7.cc
diff --git a/gcc/testsuite/g++.dg/vect/simd-clone-7.cc b/gcc/testsuite/g++.dg/vect/simd-clone-7.cc
index c2a63cd5f8e..3617f0ab6a7 100644
--- a/gcc/testsuite/g++.dg/vect/simd-clone-7.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-clone-7.cc
@@ -8,4 +8,4 @@ bar (float x, float *y, int)
 {
   return y[0] + y[1] * x;
 }
-// { dg-warning "GCC does not currently support mixed size types for 'simd' functions" "" { target aarch64-*-* } .-4 }
+// { dg-warning "GCC does not currently support mixed size types for 'simd' functions" "" { target { { aarch64-*-* } && lp64 } } .-4 }


I haven't tested this, I don't have an ILP32 build sitting around right
now.  Does it work for you?  I can build a toolchain, test it, and
submit a patch if you want.


Steve Ellcey
sell...@marvell.com



[PATCH] PR fortran/88898 - fix tests to check for warnings on aarch64 only

2019-01-17 Thread Steve Ellcey
I am going to check this patch in as obvious.  My earlier patch for Aarch64
SIMD support added some warning tests to three Fortran gomp tests as they were
needed for aarch64.  Unfortunately, I forgot to add '{ target aarch64-*-* }'
to the warning checks so the tests failed on x86 where the warning is not 
generated.
The fix is just to add the target check to the new warnings I added earlier
today.

Sorry for the noise.

Steve Ellcey
sell...@marvell.com


2019-01-17  Steve Ellcey  

PR fortran/88898
* gfortran.dg/gomp/declare-simd-2.f90: Add aarch64 target specifier to
warning checks.
* gfortran.dg/gomp/pr79154-1.f90: Ditto.
* gfortran.dg/gomp/pr83977.f90: Ditto.



diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-simd-2.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-simd-2.f90
index fd4e119744d..5852122a7de 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-simd-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-simd-2.f90
@@ -1,6 +1,6 @@
 ! { dg-do compile }
 
-function f1 (a, b, c, d, e, f) ! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" }
+function f1 (a, b, c, d, e, f) ! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" "" { target aarch64-*-* } }
   integer, value :: a, b, c
   integer :: d, e, f, f1
 !$omp declare simd (f1) uniform(b) linear(c, d) linear(uval(e)) linear(ref(f))
@@ -12,7 +12,7 @@ function f1 (a, b, c, d, e, f) ! { dg-warning "GCC does not currently support mi
   f = f + 1
   f1 = a + b + c + d + e + f
 end function f1
-integer function f2 (a, b) ! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" }
+integer function f2 (a, b) ! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" "" { target aarch64-*-* } }
   integer :: a, b
 !$omp declare simd uniform(b) linear(ref(a):b)
   a = a + 1
diff --git a/gcc/testsuite/gfortran.dg/gomp/pr79154-1.f90 b/gcc/testsuite/gfortran.dg/gomp/pr79154-1.f90
index 953dcadd786..29a570a990f 100644
--- a/gcc/testsuite/gfortran.dg/gomp/pr79154-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/pr79154-1.f90
@@ -1,7 +1,7 @@
 ! PR fortran/79154
 ! { dg-do compile }
 
-pure real function foo (a, b)		! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" }
+pure real function foo (a, b)		! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" "" { target aarch64-*-* } }
 !$omp declare simd(foo)			! { dg-bogus "may not appear in PURE or ELEMENTAL" }
   real, intent(in) :: a, b
   foo = a + b
@@ -20,7 +20,7 @@ pure real function baz (a, b)
   real, intent(in) :: a, b
   baz = a + b
 end function baz
-elemental real function fooe (a, b)	! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" }
+elemental real function fooe (a, b)	! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" "" { target aarch64-*-* } }
 !$omp declare simd(fooe)		! { dg-bogus "may not appear in PURE or ELEMENTAL" }
   real, intent(in) :: a, b
   fooe = a + b
diff --git a/gcc/testsuite/gfortran.dg/gomp/pr83977.f90 b/gcc/testsuite/gfortran.dg/gomp/pr83977.f90
index 6dfdbc32c2d..35fe2cc5edc 100644
--- a/gcc/testsuite/gfortran.dg/gomp/pr83977.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/pr83977.f90
@@ -1,7 +1,7 @@
 ! PR middle-end/83977
 ! { dg-do compile }
 
-integer function foo (a, b) ! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" }
+integer function foo (a, b) ! { dg-warning "GCC does not currently support mixed size types for 'simd' functions" "" { target aarch64-*-* } }
integer :: a, b
 !$omp declare simd uniform(b) linear(ref(a):b)
a = a + 1


Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-17 Thread Steve Ellcey
On Thu, 2019-01-17 at 09:10 +, Richard Sandiford wrote:
> 
> > +static bool supported_simd_type (tree t)
> 
> Missing line break after "static bool".

Fixed.

> > +static bool currently_supported_simd_type (tree t, tree b)
> 
> Same here.

Fixed.

> > +  return 0;
> 
> The return should use tab indentation.

Fixed.
> 
> OK otherwise, thanks.
> 
> Richard

Thanks for the reviews, Richard.  I made those changes and checked in the
patch.  That is the last of the Aarch64 SIMD / Vector ABI patches I
have so everything should be checked in and working now.

Steve Ellcey
sell...@marvell.com


Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-16 Thread Steve Ellcey
On Wed, 2019-01-16 at 08:50 +, Richard Sandiford wrote:
> 
> I suggest for now we add:
> 
> /* { dg-excess-errors "partial simd clone support" { target { aarch64*-*-* } 
> } }  */

OK, that works.

> 
> ...this doesn't handle explicit simdlen correctly.  The vecsize_int and
> vecsize_float need to be set from the user's simdlen, with num only
> making a difference for the default simdlen.  And there should only
> be 1 size for an explicit simdlen.
> 
> Maybe this would be more obvious if we have something like:

OK, that works better, I think returning 2 instead of 1 is what was
causing the ICE.  I used your code to set the return value and the
vecsize and everything works now.

Here is a new version of the patch, it bootstraps on aarch64 and x86
and the GCC testsuite ran with no regressions on both platforms as
well.

Steve Ellcey
sell...@marvell.com


2019-01-16  Steve Ellcey  

* config/aarch64/aarch64.c (cgraph.h): New include.
(intl.h): New include.
(supported_simd_type): New function.
(currently_supported_simd_type): Ditto.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Ditto.
(aarch64_simd_clone_adjust): Ditto.
(aarch64_simd_clone_usable): Ditto.
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro.
(TARGET_SIMD_CLONE_ADJUST): Ditto.
(TARGET_SIMD_CLONE_USABLE): Ditto.
* config/i386/i386.c (ix86_simd_clone_adjust): Add definition check.
* omp-simd-clone.c (expand_simd_clones): Add targetm.simd_clone.adjust
call.

2019-01-16  Steve Ellcey  

* c-c++-common/gomp/pr60823-1.c: Add aarch64 specific
warning checks and assembler scans.
* c-c++-common/gomp/pr60823-3.c: Ditto.
* c-c++-common/gomp/pr63328.c: Ditto.
* g++.dg/gomp/declare-simd-1.C: Ditto.
* g++.dg/gomp/declare-simd-3.C: Ditto.
* g++.dg/gomp/declare-simd-4.C: Ditto.
* g++.dg/gomp/declare-simd-7.C: Ditto.
* g++.dg/gomp/pr88182.C: Ditto.
* g++.dg/vect/simd-clone-7.cc: Ditto.
* gcc.dg/gomp/declare-simd-1.c: Ditto.
* gcc.dg/gomp/declare-simd-3.c: Ditto.
* gcc.dg/gomp/pr59669-2.c: Ditto.
* gcc.dg/gomp/pr87895-1.c: Ditto.
* gcc.dg/gomp/pr87895-2.c: Ditto.
* gcc.dg/gomp/simd-clones-2.c: Ditto.
* gfortran.dg/gomp/declare-simd-2.f90: Ditto.
* gfortran.dg/gomp/pr79154-1.f90: Ditto.
* gfortran.dg/gomp/pr83977.f90: Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fd60bdd..7331482 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -40,6 +40,7 @@
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
+#include "cgraph.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
 #include "alias.h"
@@ -71,6 +72,7 @@
 #include "selftest.h"
 #include "selftest-rtl.h"
 #include "rtx-vector-builder.h"
+#include "intl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -18420,6 +18422,149 @@ aarch64_estimated_poly_value (poly_int64 val)
   return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
 }
 
+
+/* Return true for types that could be supported as SIMD return or
+   argument types.  */
+
+static bool supported_simd_type (tree t)
+{
+  if (SCALAR_FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
+{
+  HOST_WIDE_INT s = tree_to_shwi (TYPE_SIZE_UNIT (t));
+  return s == 1 || s == 2 || s == 4 || s == 8;
+}
+  return false;
+}
+
+/* Return true for types that currently are supported as SIMD return
+   or argument types.  */
+
+static bool currently_supported_simd_type (tree t, tree b)
+{
+  if (COMPLEX_FLOAT_TYPE_P (t))
+return false;
+
+  if (TYPE_SIZE (t) != TYPE_SIZE (b))
+return false;
+
+  return supported_simd_type (t);
+}
+
+/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
+
+static int
+aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+	struct cgraph_simd_clone *clonei,
+	tree base_type, int num)
+{
+  tree t, ret_type, arg_type;
+  unsigned int elt_bits, vec_bits, count;
+
+  if (!TARGET_SIMD)
+return 0;
+
+  if (clonei->simdlen
+  && (clonei->simdlen < 2
+	  || clonei->simdlen > 1024
+	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+{
+  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %d", clonei->simdlen);
+  return 0;
+}
+
+  ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+  if (TREE_CODE (ret_type) != VOID_TYPE
+  && !currently_supported_simd_type (ret_type, base_type))
+{
+  if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		"GCC does not currently support mix

Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-15 Thread Steve Ellcey
Richard,

Here is a new version of the patch but it is not passing the testsuite
right now.  I added the check for the size of the base type being the
same as the size of the return or argument type and modified the error
messages in some cases to make more sense.  This caused some things
to not get cloned which is fine but it also caused a couple of ICE
failures and I am not sure how to deal with those.  I also am having
trouble with a couple of tests that include other tests and how to
set dg-warning on them.

I will try to figure out what is going on but I wanted to send the
latest versions to you to see if you had any ideas on the problems
I am seeing.

Steve Ellcey
sell...@marvell.com



Here are the failures I am getting with this patch:

c-c++-common/gomp/pr63328.c
gcc.dg/gomp/pr87895-2.c

These tests include another test (which passes) and the included tests
have a dg-warning check.  For some reason the dg-warning in the include
is ignored and when I tried adding one in the main file (that includes
the other test), that didn't work either.


gcc.dg/gomp/simd-clones-1.c
g++.dg/gomp/declare-simd-1.C

These two tests are generating an ICE and I am not sure why.

I cut declare-simd-1.C down to:

#pragma omp declare simd simdlen (2) aligned (b : sizeof (long long) * 2)
__extension__ long long
f10 (long long *b)
{
  return *b;
}

And it results in:

% install/usr/bin/g++ -fopenmp-simd -c b1.C
during RTL pass: expand
b1.C: In function ‘long long int f10(long long int*)’:
b1.C:5:15: internal compiler error: in expand_assignment, at expr.c:5101
5 |   return *b;
  |   ^
0xaeee73 expand_assignment(tree_node*, tree_node*, bool)
/home/sellcey/gcc-vect/src/gcc/gcc/expr.c:5101
0x9aeecf expand_gimple_stmt_1
/home/sellcey/gcc-vect/src/gcc/gcc/cfgexpand.c:3746
0x9aeecf expand_gimple_stmt
/home/sellcey/gcc-vect/src/gcc/gcc/cfgexpand.c:3844
0x9b682f expand_gimple_basic_block
/home/sellcey/gcc-vect/src/gcc/gcc/cfgexpand.c:5880
0x9b90a7 execute
/home/sellcey/gcc-vect/src/gcc/gcc/cfgexpand.c:6503
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.



gcc.dg/gomp/simd-clones-1.c is pretty small, when I run that
I get the same trace as above.

Here is the current version of my patch.

2019-01-15  Steve Ellcey  

* config/aarch64/aarch64.c (cgraph.h): New include.
(intl.h): New include.
(supported_simd_type): New function.
(currently_supported_simd_type): Ditto.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Ditto.
(aarch64_simd_clone_adjust): Ditto.
(aarch64_simd_clone_usable): Ditto.
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro.
(TARGET_SIMD_CLONE_ADJUST): Ditto.
(TARGET_SIMD_CLONE_USABLE): Ditto.
* config/i386/i386.c (ix86_simd_clone_adjust): Add definition check.
* omp-simd-clone.c (expand_simd_clones): Add targetm.simd_clone.adjust
call.


2019-01-15  Steve Ellcey  

* c-c++-common/gomp/pr60823-1.c: Add aarch64 specific
warning checks and assembler scans.
* c-c++-common/gomp/pr60823-3.c: Ditto.
* g++.dg/gomp/declare-simd-1.C: Ditto.
* g++.dg/gomp/declare-simd-3.C: Ditto.
* g++.dg/gomp/declare-simd-4.C: Ditto.
* g++.dg/gomp/declare-simd-7.C: Ditto.
* g++.dg/gomp/pr88182.C: Ditto.
* gcc.dg/gomp/declare-simd-1.c: Ditto.
* gcc.dg/gomp/declare-simd-3.c: Ditto.
* gcc.dg/gomp/pr59669-2.c: Ditto.
* gcc.dg/gomp/pr87895-1.c: Ditto.
* gcc.dg/gomp/simd-clones-2.c: Ditto.
* gfortran.dg/gomp/declare-simd-2.f90: Ditto.
* gfortran.dg/gomp/pr79154-1.f90: Ditto.
* gfortran.dg/gomp/pr83977.f90: Ditto.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fd60bdd..5e5248f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -40,6 +40,7 @@
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
+#include "cgraph.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
 #include "alias.h"
@@ -71,6 +72,7 @@
 #include "selftest.h"
 #include "selftest-rtl.h"
 #include "rtx-vector-builder.h"
+#include "intl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -18420,6 +18422,151 @@ aarch64_estimated_poly_value (poly_int64 val)
   return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
 }
 
+
+/* Return true for types that could be supported as SIMD return or
+   argument types.  */
+
+static bool supported_simd_type (tree t)
+{
+  if (SCALAR_FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
+{
+  HOST_WIDE_INT s = tree_to_shwi (TYPE_SIZE_UNIT (t));
+  return s == 1 || 

Failing aarch64 tests (PR 87763), no longer combining instructions with hard registers

2019-01-14 Thread Steve Ellcey
I have a question about PR87763, these are aarch64 specific tests
that are failing after r265398 (combine: Do not combine moves from hard
registers).

These tests are all failing when the assembler scan looks for
specific instructions and these instructions are no longer being
generated.  In some cases the new code is no worse than the old code
(just different) but in most cases the new code is a performance
regression from the old code.

Note that these tests are generally *very* small functions where the
body of the function consists of only 1 to 4 instructions so if we
do not combine instructions involving hard registers there isn't much,
if any, combining that can be done.  In larger functions this probably
would not be an issue and I think those cases are where the incentive
for this patch came from.  So my question is, what do we want to
do about these failures?

Find a GCC patch to generate the better code?  If it isn't done by
combine, how would we do it?  Peephole optimizations?

Modify the tests to pass with the current output?  That would, in my
opinion, make the tests of little value.

Remove the tests?  Tests that search for specific assembly language
output are rather brittle to begin with and if they are no longer
serving a purpose after the combine patch, maybe we don't need them.

The tests in question are:

gcc.target/aarch64/combine_bfi_1.c
gcc.target/aarch64/insv_1.c
gcc.target/aarch64/lsl_asr_sbfiz.c
gcc.target/aarch64/sve/tls_preserve_1.c
gcc.target/aarch64/tst_5.c
gcc.target/aarch64/tst_6.c
gcc.dg/vect/vect-nop-move.c # Scanning combine dump file, not asm file


Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-11 Thread Steve Ellcey
On Fri, 2019-01-11 at 14:45 +, Richard Sandiford wrote:
> 
> > +
> > +/* Return true for types that could be supported as SIMD return or
> > +   argument types.  */
> > +
> > +static bool supported_simd_type (tree t)
> > +{
> > +  return (FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t));
> 
> We should also check that the size is 1, 2, 4 or 8 bytes.

I fixed this, I also allow for POINTER_P types which allowed me
to not do the POINTER_P check below which you asked about and
which I now think was a mistake (more comments below).

> > 
> > +   wmsg = G_("unsupported return type return type %qT for simd");
> 
> Typo: doubled "return type".

Fixed.
> 
> Maybe s/for simd/for % functions/ in both messages.
> Since that will blow the line limit...
> 
> > +  warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg,
> > ret_type);
> > +  return 0;
> > +}
> 
> ...it's probably worth just calling warning_at in each arm of the
> "if".
> We'll then get proper format checking in bootstraps.

I made this change.

> > +  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
> > +{
> > +  arg_type = TREE_TYPE (t);
> > +  if (POINTER_TYPE_P (arg_type))
> > +   arg_type = TREE_TYPE (arg_type);
> > +  if (!currently_supported_simd_type (arg_type))
> > +   {
> > + if (supported_simd_type (arg_type))
> > +   wmsg = G_("GCC does not currently support argument type %qT
> > "
> > + "for simd");
> > + else
> > +   wmsg = G_("unsupported argument type %qT for simd");
> > + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg,
> > arg_type);
> > + return 0;
> > +   }
> > +}
> 
> The ABI supports all argument types, so this should only check
> current_supported_simd_type and should always use the "GCC does
> not..."
> message.

Done.

> Could you explain why the POINTER_TYPE_P handling is correct?
> Think it's worth a comment.  Dropping it is also fine if that's
> easier.

I now think this was a mistake but after removing it I had to allow
for POINTER_P type in supported_simd_type to get the regression
tests to pass.  I think the current code is the correct behaviour
and more closely matches x86 in terms of what is and is not vectorized.


> > +  if (clonei->simdlen == 0)
> > +{
> > +  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
> > +   clonei->simdlen = clonei->vecsize_int;
> > +  else
> > +   clonei->simdlen = clonei->vecsize_float;
> > +  clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> > +  return 1;
> > +}
> 
> I should have noticed this last time, but base_type is the CDT in the
> Intel ABI.  That isn't always right for the AArch64 ABI.
> 
> I think for now currently_supported_simd_type should take base_type
> as a second parameter and check that the given type has the same
> size.

I have not changed this; I am not quite sure what you mean.  What is
CDT?  Clone data type?  Are you saying I should use node->decl->type
instead of base_type?

> 
> > +  /* Restrict ourselves to vectors that fit in a single
> > register  */
> > +
> > +  gcc_assert (tree_fits_shwi_p (TYPE_SIZE (base_type)));
> > +  vsize = clonei->simdlen * tree_to_shwi (TYPE_SIZE (base_type));
> > +  if (vsize > 128)
> > +{
> > + warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> > + "GCC does not currently support simdlen %d for
> > type %qT",
> > + clonei->simdlen, base_type);
> > + return 0;
> > +}
> 
> nit: block contents indented too far.

Fixed.

> > +  return 0;
> > +}
> 
> Doesn't this mean that we always silently fail for an explicit and
> correct simdlen?  If so, that suggests we might not have enough
> testsuite coverage :-)

Well, we were failing here and some tests were failing but then I
'fixed' the tests in my last patch to pass instead of fixing this
bug.  I have now changed it to 'return 1' and re-fixed the tests
that I incorrectly fixed last time.  So at least it wasn't a silent
fail (until I silenced it).

Here is the latest patch; any help you can give me on the base_type
issue would be appreciated.


2019-01-11  Steve Ellcey  

* config/aarch64/aarch64.c (cgraph.h): New include.
(intl.h): New include.
(supported_simd_type): New function.
(currently_supported_simd_type): Ditto.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Ditto.
(aarch64_simd_clone_adjust): Ditto.
(aa

ISL tiling question (gcc.dg/graphite/interchange-3.c)

2019-01-11 Thread Steve Ellcey
Someone here was asking about GCC, ISL, and tiling and we looked at
the test gcc.dg/graphite/interchange-3.c on Aarch64.  When this
test is run the graphite pass output file contains the string 'not
tiled' and since the dg-final scan-tree-dump is just looking for
the string 'tiled', it matches and the test passes.

Is this intentional?  It seems like if we wanted to check that it was
not tiled we should scan for 'not tiled', not just 'tiled'.  And if we
want to check that it was tiled, then the current check is wrong,
because it matches even when the dump says 'not tiled'.
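
For example, if the intent is to verify that the loop was not tiled,
the directive would need to be something like the first line below; if
the intent is to verify that it was tiled, something like the second
(illustrative only, not a tested change):

/* Check that the dump explicitly says the loop was not tiled.  */
/* { dg-final { scan-tree-dump "not tiled" "graphite" } } */

/* Check that the dump never says "not tiled".  */
/* { dg-final { scan-tree-dump-not "not tiled" "graphite" } } */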

Steve Ellcey
sell...@marvell.com



Re: [EXT] Re: [Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-11 Thread Steve Ellcey
On Fri, 2019-01-11 at 12:30 +0100, Jakub Jelinek wrote:
> 
> > Yeah, like you say, this was originally posted in stage 1 and is the
> > last patch in the series.  Not committing it would leave the work
> > incomplete in GCC 9.  The problem is that we're now in stage 4 rather
> > than stage 3.
> > 
> > I don't think I'm neutral enough to make the call.  Richard, Jakub?
> 
> I'm ok with accepting this late as an exception, if it can be committed
> reasonably soon (within a week or at most two).
> 
>   Jakub

OK, I will make the comment change and check it in.  Note that this
does not give us the complete implementation yet.  Patch 3/4 was OK'ed
by Richard last week but I hadn't checked that in either.  So I will
check both of these in later today unless I hear otherwise.

That still leaves Patch 2/4 which is Aarch64 specific.

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01421.html

Jakub had some comments on the test changes which I fixed but I did
not get any feedback on the actual code changes so I am not sure if 
that is OK or not.

Steve Ellcey
sell...@marvell.com


Re: [Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-10 Thread Steve Ellcey
OK, I fixed the issues in your last email.  I initially found one
regression while testing.  In lra_create_live_ranges_1 I had removed
the 'call_p = false' statement but did not replaced it with anything.
This resulted in no regressions on aarch64 but caused a single
regression on x86 (gcc.target/i386/pr87759.c).  I replaced the
line with 'call_insn = NULL' and the regression went away so I
have clean bootstraps and no regressions on aarch64 and x86 now.

If this looks good to you can I go ahead and check it in?  I know
we are in Stage 3 now, but my recollection is that patches that were
initially submitted during Stage 1 could go ahead once approved.

Steve Ellcey
sell...@marvell.com



2019-01-10  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
(aarch64_hard_regno_call_part_clobbered): Add insn argument.
(aarch64_return_call_with_max_clobbers): New function.
(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
argument.
* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
* cselib.c (cselib_process_insn): Add argument to
targetm.hard_regno_call_part_clobbered call.
* ira-conflicts.c (ira_build_conflicts): Ditto.
* ira-costs.c (ira_tune_allocno_costs): Ditto.
* lra-constraints.c (inherit_reload_reg): Ditto.
* lra-int.h (struct lra_reg): Add call_insn field, remove call_p field.
* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
argument.  Call targetm.return_call_with_max_clobbers.
Add argument to targetm.hard_regno_call_part_clobbered call.
(calls_have_same_clobbers_p): New function.
(process_bb_lives): Add call_insn and last_call_insn variables.
Pass call_insn to check_pseudos_live_through_calls.
Modify if stmt to check targetm.return_call_with_max_clobbers.
Update setting of flush variable.
(lra_create_live_ranges_1): Set call_insn to NULL instead of call_p
to false.
* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
* regcprop.c (copyprop_hardreg_forward_1): Add argument to
targetm.hard_regno_call_part_clobbered call.
* reginfo.c (choose_hard_reg_mode): Ditto.
* regrename.c (check_new_reg_p): Ditto.
* reload.c (find_equiv_reg): Ditto.
* reload1.c (emit_reload_insns): Ditto.
* sched-deps.c (deps_analyze_insn): Ditto.
* sel-sched.c (init_regs_for_mode): Ditto.
(mark_unavailable_hard_regs): Ditto.
* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
* target.def (hard_regno_call_part_clobbered): Add insn argument.
(return_call_with_max_clobbers): New target function.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
* hooks.c (hook_bool_uint_mode_false): Change to
hook_bool_insn_uint_mode_false.
* hooks.h (hook_bool_uint_mode_false): Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c300af..7a1f838 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1655,14 +1655,53 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
+	machine_mode mode)
+{
+  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
+  return FP_REGNUM_P (regno)
+	 && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16 : 8);
+}
+
+/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
+
+rtx_insn *
+aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
 {
-  return FP_REGNUM_P (regno) && m

Re: [Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-09 Thread Steve Ellcey
On Wed, 2019-01-09 at 10:00 +, Richard Sandiford wrote:

Thanks for the quick turnaround on the comments Richard.  Here is a new
version where I tried to address all the issues you raised.  One thing
I noticed is that I think your calls_have_same_clobbers_p function only
works if, when return_call_with_max_clobbers is called with two calls
that clobber the same set of registers, it always returns the first
call.

I don't think my original function had that guarantee but I changed it 
so that it would and documented that requirement in target.def.  I
couldn't see a better way to implement the calls_have_same_clobbers_p
function other than doing that.
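
Roughly, the check relies on that guarantee like this (sketch only,
not necessarily the exact code in the attached patch):

static bool
calls_have_same_clobbers_p (rtx_insn *call_1, rtx_insn *call_2)
{
  if (!targetm.return_call_with_max_clobbers)
    return false;

  /* If each call wins the comparison against the other, they must
     clobber the same set of registers; this only works because the
     hook returns its first argument when the two calls are
     equivalent.  */
  return (targetm.return_call_with_max_clobbers (call_1, call_2) == call_1
	  && targetm.return_call_with_max_clobbers (call_2, call_1) == call_2);
}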

Steve Ellcey
sell...@marvell.com


2019-01-09  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
(aarch64_hard_regno_call_part_clobbered): Add insn argument.
(aarch64_return_call_with_max_clobbers): New function.
(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
argument.
* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
* cselib.c (cselib_process_insn): Add argument to
targetm.hard_regno_call_part_clobbered call.
* ira-conflicts.c (ira_build_conflicts): Ditto.
* ira-costs.c (ira_tune_allocno_costs): Ditto.
* lra-constraints.c (inherit_reload_reg): Ditto.
* lra-int.h (struct lra_reg): Add call_insn field, remove call_p field.
* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
argument.  Call targetm.return_call_with_max_clobbers.
Add argument to targetm.hard_regno_call_part_clobbered call.
(calls_have_same_clobbers_p): New function.
(process_bb_lives): Add call_insn and last_call_insn variables.
Pass call_insn to check_pseudos_live_through_calls.
Modify if stmt to check targetm.return_call_with_max_clobbers.
Update setting of flush variable.
* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
* regcprop.c (copyprop_hardreg_forward_1): Add argument to
targetm.hard_regno_call_part_clobbered call.
* reginfo.c (choose_hard_reg_mode): Ditto.
* regrename.c (check_new_reg_p): Ditto.
* reload.c (find_equiv_reg): Ditto.
* reload1.c (emit_reload_insns): Ditto.
* sched-deps.c (deps_analyze_insn): Ditto.
* sel-sched.c (init_regs_for_mode): Ditto.
(mark_unavailable_hard_regs): Ditto.
* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
* target.def (hard_regno_call_part_clobbered): Add insn argument.
(return_call_with_max_clobbers): New target function.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
* hooks.c (hook_bool_uint_mode_false): Change to
hook_bool_insn_uint_mode_false.
* hooks.h (hook_bool_uint_mode_false): Ditto.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c300af..d88be6c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1655,14 +1655,56 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
+	machine_mode mode)
 {
-  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
+  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
+  return FP_REGNUM_P (regno)
+	 && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16 : 8);
+}
+
+/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
+
+rtx_insn *
+aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
+{
+  gcc_assert (CALL_P (call_1) && CALL_P (call_2

Re: [Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-08 Thread Steve Ellcey
On Mon, 2019-01-07 at 17:38 +, Richard Sandiford wrote:
> 
> Yeah, this was the kind of thing I had in mind, thanks.

Here is an updated version of the patch.  I bootstrapped and tested
on aarch64 and x86.  I didn't test the other platforms where I changed
the arguments to hard_regno_call_part_clobbered but I think they should
be OK.  I believe I addressed all the issues you brought up.  The ones
I am least confident of are the lra-lives.c changes.  I think they are
right and testing had no regressions, but they are probably the changes
that need to be checked most closely.

Steve Ellcey
sell...@marvell.com


2019-01-08  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
(aarch64_hard_regno_call_part_clobbered): Add insn argument.
(aarch64_return_call_with_max_clobbers): New function.
(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
argument.
* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
* cselib.c (cselib_process_insn): Add argument to
targetm.hard_regno_call_part_clobbered call.
* ira-conflicts.c (ira_build_conflicts): Ditto.
* ira-costs.c (ira_tune_allocno_costs): Ditto.
* lra-constraints.c (inherit_reload_reg): Ditto.
* lra-int.h (struct lra_reg): Add call_insn field.
* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
argument.  Call targetm.return_call_with_max_clobbers.
Add argument to targetm.hard_regno_call_part_clobbered call.
(process_bb_lives): Use new target function
targetm.return_call_with_max_clobbers to set call_insn.
Pass call_insn to check_pseudos_live_through_calls.
Modify if to check targetm.return_call_with_max_clobbers.
* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
* regcprop.c (copyprop_hardreg_forward_1): Add argument to
targetm.hard_regno_call_part_clobbered call.
* reginfo.c (choose_hard_reg_mode): Ditto.
* regrename.c (check_new_reg_p): Ditto.
* reload.c (find_equiv_reg): Ditto.
* reload1.c (emit_reload_insns): Ditto.
* sched-deps.c (deps_analyze_insn): Ditto.
* sel-sched.c (init_regs_for_mode): Ditto.
(mark_unavailable_hard_regs): Ditto.
* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
* target.def (hard_regno_call_part_clobbered): Add insn argument.
(return_call_with_max_clobbers): New target function.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
* hooks.c (hook_bool_uint_mode_false): Change to
hook_bool_insn_uint_mode_false.
* hooks.h (hook_bool_uint_mode_false): Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c45243..2063292 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1644,14 +1644,51 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
+	machine_mode mode)
 {
-  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
+  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
+  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16: 8);
+}
+
+/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
+
+rtx_insn *
+aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
+{
+  gcc_assert (CALL_P (call_1));
+  if (call_2 == NULL_RTX || aarch64_simd_call_p (call_2))
+return call_1;
+  else
+return call_2;
 }
 
 /* Implement REGMODE_NATURAL_SIZE.  */
@@ -18764,6 +18801,10 @@ aa

Re: [Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-04 Thread Steve Ellcey
On Thu, 2018-12-06 at 12:25 +, Richard Sandiford wrote:
> 
> Since we're looking at the call insns anyway, we could have a hook that
> "jousts" two calls and picks the one that preserves *fewer* registers.
> This would mean that loop produces a single instruction that conservatively
> describes the call-preserved registers.  We could then stash that
> instruction in lra_reg instead of the current check_part_clobbered
> boolean.
> 
> The hook should by default be a null pointer, so that we can avoid
> the instruction walk on targets that don't need it.
> 
> That would mean that LRA would always have a call instruction to hand
> when asking about call-preserved information.  So I think we should
> add an insn parameter to targetm.hard_regno_call_part_clobbered,
> with a null insn selecting the defaul behaviour.   I know it's
> going to be a pain to update all callers and targets, sorry.

Richard, here is an updated version of this patch.  It is not
completely tested yet but I wanted to send this out and make
sure it is what you had in mind and see if you had any comments about
the new target function while I am testing it (including building
some of the other targets).
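
To spell out how I am using the new hook in lra-lives.c, the idea is
to keep a single representative call insn per pseudo, roughly like the
sketch below (the helper name and shape are illustrative only, not the
exact code in the patch):

/* Given the previous representative call insn for a pseudo and a newly
   seen call, return the one that clobbers at least as much, so that a
   single insn can later be passed to
   targetm.hard_regno_call_part_clobbered.  */
static rtx_insn *
example_worst_call_insn (rtx_insn *prev_call, rtx_insn *new_call)
{
  if (prev_call == NULL || !targetm.return_call_with_max_clobbers)
    return new_call;
  return targetm.return_call_with_max_clobbers (new_call, prev_call);
}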

Steve Ellcey
sell...@cavium.com


2019-01-04  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
(aarch64_hard_regno_call_part_clobbered): Add insn argument.
(aarch64_return_call_with_max_clobbers): New function.
(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
argument.
* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
* cselib.c (cselib_process_insn): Add argument to
targetm.hard_regno_call_part_clobbered call.
* conflicts.c (ira_build_conflicts): Ditto.
* ira-costs.c (ira_tune_allocno_costs): Ditto.
* lra-constraints.c (inherit_reload_reg): Ditto, plus refactor
return statement.
* lra-int.h (struct lra_reg): Add call_insn field.
* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
argument.  Add argument to targetm.hard_regno_call_part_clobbered
call.
(process_bb_lives): Use new target function
targetm.return_call_with_max_clobbers to set call_insn.
Pass call_insn to check_pseudos_live_through_calls.
Set call_insn in lra_reg_info.
* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
* regcprop.c (copyprop_hardreg_forward_1): Add argument to
targetm.hard_regno_call_part_clobbered call.
* reginfo.c (choose_hard_reg_mode): Ditto.
* regrename.c (check_new_reg_p): Ditto.
* reload.c (find_equiv_reg): Ditto.
* reload1.c (emit_reload_insns): Ditto.
* sched-deps.c (deps_analyze_insn): Ditto.
* sel-sched.c (init_regs_for_mode): Ditto.
(mark_unavailable_hard_regs): Ditto.
* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
* target.def (hard_regno_call_part_clobbered): Add insn argument.
(return_call_with_max_clobbers): New target function.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
* hooks.c (hook_bool_uint_mode_false): Change to
hook_bool_insn_uint_mode_false.
* hooks.h (hook_bool_uint_mode_false): Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c5036c8..87af31b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1565,16 +1565,55 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn,
+	unsigned int regno,
+	machine_mode mode)
 {

Re: [Patch 3/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-02 Thread Steve Ellcey

Here is an update of this patch.  I believe I addressed all of the
issues you raised.  Thanks for the new target.def description, it
seems much clearer than mine.  I did a full build and test on x86 as
well as aarch64 to make sure that architectures that do not define
TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS build correctly.

Steve Ellcey
sell...@marvell.com


2019-01-02  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
(aarch64_remove_extra_call_preserved_regs): New function.
(TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS): New macro.
* doc/tm.texi.in (TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS): New hook.
* final.c (get_call_reg_set_usage): Call new hook.
* target.def (remove_extra_call_preserved_regs): New hook.
* targhooks.c (default_remove_extra_call_preserved_regs): New function.
* targhooks.h (default_remove_extra_call_preserved_regs): New function.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c5036c8..c916689 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1565,6 +1565,45 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
+/* Implement TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS.  If INSN calls
+   a function that uses the SIMD ABI, take advantage of the extra
+   call-preserved registers that the ABI provides.  */
+
+void
+aarch64_remove_extra_call_preserved_regs (rtx_insn *insn,
+	  HARD_REG_SET *return_set)
+{
+  if (aarch64_simd_call_p (insn))
+{
+  for (int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+	if (FP_SIMD_SAVED_REGNUM_P (regno))
+	  CLEAR_HARD_REG_BIT (*return_set, regno);
+}
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -18524,6 +18563,10 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
 
+#undef TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
+#define TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS \
+  aarch64_remove_extra_call_preserved_regs
+
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 976a700..d9f40a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1707,6 +1707,8 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
+@hook TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
+
 @findex fixed_regs
 @findex call_used_regs
 @findex global_regs
diff --git a/gcc/final.c b/gcc/final.c
index 6dc1cd1..f6edd6a 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -5095,7 +5095,7 @@ get_call_reg_set_usage (rtx_insn *insn, HARD_REG_SET *reg_set,
 	  return true;
 	}
 }
-
   COPY_HARD_REG_SET (*reg_set, default_set);
+  targetm.remove_extra_call_preserved_regs (insn, reg_set);
   return false;
 }
diff --git a/gcc/target.def b/gcc/target.def
index e8f0f70..908f9b9 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5775,6 +5775,20 @@ for targets that don't have partly call-clobbered registers.",
  bool, (unsigned int regno, machine_mode mode),
  hook_bool_uint_mode_false)
 
+DEFHOOK
+(remove_extra_call_preserved_regs,
+ "This hook removes registers from the set of call-clobbered registers\n\
+ in @var{used_regs} if, contrary to the default rules, something guarantees\n\
+ that @samp{insn} preserves those registers.  For example, some targets\n\
+ support variant ABIs in which functions preserve more registers than\n\
+ normal functions would.  Removing those extra registers from @var{used_regs}\n\
+ can lead to better register allocation.\n\
+ \n\
+ The default implementation does nothing, which is always safe.\n\
+ Defining the hook is purely an optimization.",
+ void, (rtx_insn *insn, HARD_REG_SET *used_regs),
+ default_remove_extra_call_preserved_regs)
+
 /* Return the smallest number of different values for which it is best to
use a jump-table instead of a tree of conditional branches.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 898848f..6bd9767 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -2374,4 +2374,9 @@ default_speculation_safe_value 

Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-21 Thread Steve Ellcey
Here is an update to the test part of this patch.  I did not change the
actual source code part of this, just the tests, so that is all I am
including here.  I removed the x86 changes that had gotten in there by
accident and used relative line numbers in the warning checks instead
of absolute line numbers.  I also moved the warning checks to be closer
to the lines where the warnings are generated.

Retested on x86 and aarch64 with no regressions.

Steve Ellcey
sell...@cavium.com


2018-12-21  Steve Ellcey  

* g++.dg/gomp/declare-simd-1.C: Add aarch64 specific
warning checks and assembler scans.
* g++.dg/gomp/declare-simd-3.C: Ditto.
* g++.dg/gomp/declare-simd-4.C: Ditto.
* g++.dg/gomp/declare-simd-7.C: Ditto.
* gcc.dg/gomp/declare-simd-1.c: Ditto.
* gcc.dg/gomp/declare-simd-3.c: Ditto.

diff --git a/gcc/testsuite/g++.dg/gomp/declare-simd-1.C b/gcc/testsuite/g++.dg/gomp/declare-simd-1.C
index d2659e1..f44efd5 100644
--- a/gcc/testsuite/g++.dg/gomp/declare-simd-1.C
+++ b/gcc/testsuite/g++.dg/gomp/declare-simd-1.C
@@ -14,6 +14,7 @@ int f2 (int a, int *b, int c)
   return a + *b + c;
 }
 
+// { dg-warning "GCC does not currently support simdlen 8 for type 'int'" "" { target aarch64-*-* } .-5 }
 // { dg-final { scan-assembler-times "_ZGVbM8uva32l4__Z2f2iPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVbN8uva32l4__Z2f2iPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVcM8uva32l4__Z2f2iPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
@@ -89,6 +90,7 @@ namespace N1
 // { dg-final { scan-assembler-times "_ZGVdN2va16__ZN2N12N23f10EPx:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeM2va16__ZN2N12N23f10EPx:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeN2va16__ZN2N12N23f10EPx:" 1 { target { i?86-*-* x86_64-*-* } } } }
+// { dg-final { scan-assembler-times "_ZN2N12N23f10EPx:" 1 { target { aarch64-*-* } } } }
 
 struct A
 {
@@ -191,6 +193,7 @@ int B::f25<7> (int a, int *b, int c)
   return a + *b + c;
 }
 
+// { dg-warning "unsupported argument type 'B' for simd" "" { target aarch64-*-* } .-5 }
 // { dg-final { scan-assembler-times "_ZGVbM8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVbN8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVcM8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
@@ -216,6 +219,7 @@ int B::f26<-1> (int a, int *b, int c)
 // { dg-final { scan-assembler-times "_ZGVdN4vl2va32__ZN1BIiE3f26ILin1EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeM4vl2va32__ZN1BIiE3f26ILin1EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeN4vl2va32__ZN1BIiE3f26ILin1EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
+// { dg-final { scan-assembler-times "_ZN1BIiE3f26ILin1EEEiiPii:" 1 { target { aarch64-*-* } } } }
 
 int
 f27 (int x)
@@ -239,6 +243,7 @@ f30 (int x)
   return x;
 }
 
+// { dg-warning "GCC does not currently support simdlen 16 for type 'int'" "" { target aarch64-*-* } .-7 }
 // { dg-final { scan-assembler-times "_ZGVbM16v__Z3f30i:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVbN16v__Z3f30i:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVcM16v__Z3f30i:" 1 { target { i?86-*-* x86_64-*-* } } } }
@@ -281,6 +286,7 @@ struct D
   int f37 (int a);
   int e;
 };
+// { dg-warning "GCC does not currently support simdlen 16 for type 'int'" "" { target aarch64-*-* } .-3 }
 
 void
 f38 (D )
diff --git a/gcc/testsuite/g++.dg/gomp/declare-simd-3.C b/gcc/testsuite/g++.dg/gomp/declare-simd-3.C
index 32cdc58..3d668ff 100644
--- a/gcc/testsuite/g++.dg/gomp/declare-simd-3.C
+++ b/gcc/testsuite/g++.dg/gomp/declare-simd-3.C
@@ -21,6 +21,8 @@ int f1 (int a, int b, int c, int , int , int )
 // { dg-final { scan-assembler-times "_ZGVdN8vulLUR4__Z2f1iiiRiS_S_:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeM16vulLUR4__Z2f1iiiRiS_S_:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeN16vulLUR4__Z2f1iiiRiS_S_:" 1 { target { i?86-*-* x86_64-*-* } } } }
+// { dg-final { scan-assembler-times "_ZGVnN4vulLUR4__Z2f1iiiRiS_S_:" 1 { target { aarch64-*-* } } } }
+  
 
 #pragma omp declare simd uniform(b) linear(c, d) linear(uval(e)) linear(ref(f))
 int f2 (int a, int b, int c, int , int , in

Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-19 Thread Steve Ellcey
On Wed, 2018-12-19 at 23:57 +0100, Jakub Jelinek wrote:
> On Wed, Dec 19, 2018 at 10:10:19PM +0000, Steve Ellcey wrote:
> > @@ -199,6 +201,7 @@ int B::f25<7> (int a, int *b, int c)
> >  // { dg-final { scan-assembler-times
> > "_ZGVdN8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-*
> > x86_64-*-* } } } }
> >  // { dg-final { scan-assembler-times
> > "_ZGVeM8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-*
> > x86_64-*-* } } } }
> >  // { dg-final { scan-assembler-times
> > "_ZGVeN8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-*
> > x86_64-*-* } } } }
> > +// { dg-warning "unsupported argument type 'B' for simd" "" {
> > target aarch64-*-* } 191 }
> 
> Can you use relative line number instead, like .-10 or so?

That sounds like a good idea.

> 
> > @@ -62,7 +65,7 @@ int f3 (const int a, const int b, const int c,
> > const int , const int , const
> >  // { dg-final { scan-assembler-times
> > "_ZGVdM8vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-*-
> > * } } } }
> >  // { dg-final { scan-assembler-times
> > "_ZGVdN8vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-*-
> > * } } } }
> >  // { dg-final { scan-assembler-times
> > "_ZGVeM16vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-
> > *-* } } } }
> > -// { dg-final { scan-assembler-times
> > "_ZGVeN16vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-
> > *-* } } } }
> > +// { dg-final { scan-assembler-times
> > "_ZGVeN4vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-*-
> > * } } } }
> 
> Can you explain this change?  Are you changing the x86 ABI?

No, that is a mistake that snuck in.  None of the x86 lines should
change.  Same for the other x86 changes.  I was changing the aarch64
manglings and obviously messed up some of the x86 ones.  Unfortunately
I did those changes after I did my x86 testing to verify the x86
code change I made so I didn't notice them.  I will fix those so that
no x86 lines are different.

Steve Ellcey
sell...@marvell.com


Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-19 Thread Steve Ellcey
Here is an updated version of the GCC patch to enable SIMD functions on
Aarch64.  There are a number of changes from the last patch.

I reduced the shared code changes, there is still one change in shared code
(omp-simd-clone.c) to call targetm.simd_clone.adjust from expand_simd_clones
but it now uses the same argument as the existing call.  This new call allows
Aarch64 to add the aarch64_vector_pcs attribute to SIMD clone declarations
as well as definitions, which in turn ensures they use the correct ABI.
Previously this target function was only called on definitions, not
declarations.  This change affects the x86 target, so I modified
ix86_simd_clone_adjust to return and do nothing when called with a
declaration.  This means there is no change in behaviour on x86.  I did
a build and GCC testsuite run on x86 to verify this.

Most of the changes from the previous patch are in the
aarch64_simd_clone_compute_vecsize_and_simdlen function.

The previous version was heavily based on the x86 function, this one has
changes to address the issues that were raised in the earlier patch
and so it no longer looks like the x86 version.  I use types instead of modes
to check for what we can/cannot vectorize and I (try to) differentiate
between vectors that we are not currently handling (but could later) and
those that won't ever be handled.

I have also added a testsuite patch to fix regressions in the gcc.dg/gomp
and g++.dg/gomp tests.  There are no regressions with this patch applied.

Steve Ellcey
sell...@marvell.com


2018-12-19  Steve Ellcey  

* config/aarch64/aarch64.c (cgraph.h): New include.
(supported_simd_type): New function.
(currently_supported_simd_type): Ditto.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Ditto.
(aarch64_simd_clone_adjust): Ditto.
(aarch64_simd_clone_usable): Ditto.
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro.
(TARGET_SIMD_CLONE_ADJUST): Ditto.
(TARGET_SIMD_CLONE_USABLE): Ditto.
* config/i386/i386.c (ix86_simd_clone_adjust): Add definition check.
* omp-simd-clone.c (expand_simd_clones): Add targetm.simd_clone.adjust
call.

2018-12-19  Steve Ellcey  

* g++.dg/gomp/declare-simd-1.C: Add aarch64 specific
warning checks and assembler scans.
* g++.dg/gomp/declare-simd-3.C: Ditto.
* g++.dg/gomp/declare-simd-4.C: Ditto.
* g++.dg/gomp/declare-simd-7.C: Ditto.
* gcc.dg/gomp/declare-simd-1.c: Ditto.
* gcc.dg/gomp/declare-simd-3.c: Ditto.


 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6038494..e61f6e1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -40,6 +40,7 @@
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
+#include "cgraph.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
 #include "alias.h"
@@ -71,6 +72,7 @@
 #include "selftest.h"
 #include "selftest-rtl.h"
 #include "rtx-vector-builder.h"
+#include "intl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -18064,6 +18066,138 @@ aarch64_estimated_poly_value (poly_int64 val)
   return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
 }
 
+
+/* Return true for types that could be supported as SIMD return or
+   argument types.  */
+
+static bool supported_simd_type (tree t)
+{
+  return (FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t));
+}
+
+/* Return true for types that currently are supported as SIMD return
+   or argument types.  */
+
+static bool currently_supported_simd_type (tree t)
+{
+  if (COMPLEX_FLOAT_TYPE_P (t))
+return false;
+
+  return supported_simd_type (t);
+}
+
+/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
+
+static int
+aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+	struct cgraph_simd_clone *clonei,
+	tree base_type,
+	int num ATTRIBUTE_UNUSED)
+{
+  const char *wmsg;
+  int vsize;
+  tree t, ret_type, arg_type;
+
+  if (!TARGET_SIMD)
+return 0;
+
+  if (clonei->simdlen
+  && (clonei->simdlen < 2
+	  || clonei->simdlen > 1024
+	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+{
+  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %d", clonei->simdlen);
+  return 0;
+}
+
+  ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+  if (TREE_CODE (ret_type) != VOID_TYPE
+  && !currently_supported_simd_type (ret_type))
+{
+  if (supported_simd_type (ret_type))
+	wmsg = G_("GCC does not currently support return type %qT for simd");
+  else
+	wmsg = G_("unsupported return type return type %qT for simd");
+  warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg, ret_type);
+  return 0;
+}
+
+  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHA

Re: [EXT] Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Steve Ellcey
On Wed, 2018-12-12 at 11:39 +, Richard Sandiford wrote:
> 
> Steve Ellcey  writes:
> > On Fri, 2018-12-07 at 17:34 +, Richard Sandiford wrote:
> > > > +  (match_operand:TX 2 "register_operand" "w"))
> > > > + (set (mem:TX (plus:P (match_dup 0)
> > > > +  (match_operand:P 5 "const_int_operand"
> > > > "n")))
> > > > +  (match_operand:TX 3 "register_operand" "w"))])]
> > > 
> > > Think this last part should be:
> > > 
> > >  (set (mem:TX (plus:P (plus:P (match_dup 0)
> > >   (match_dup 4))
> > >   (match_operand:P 5 "const_int_operand"
> > > "n")))
> > >   (match_operand:TX 3 "register_operand" "w"))])]
> > 
> > I think you are right about this.  What I have for
> > loadwb_pair_ matches what is there for
> > loadwb_pair_.  If this one is wrong, then I assume
> > the others are wrong too?  This won't make a practical difference since
> > we call these with gen_loadwb_pair*_* calls and not via pattern
> > recognition, but still they should be right.  Should I change them
> > all?  I did not change this as part of this patch.
> 
> I think we should fix the new pattern, but I agree fixing the others
> should be a separate patch.
> 
> Patch LGTM with that change.

I am not sure this is right.  I created a patch (separate from any of
the SIMD changes) to fix the storewb_pair_ and
storewb_pair_ and when I try to build GCC with
that change, gcc aborts while building libgcc.  I didn't think
this change could affect the build but it appears to do so.


/home/sellcey/gcc-md-fix/src/gcc/libgcc/static-object.mk:17: recipe for
target 'addtf3.o' failed
make[1]: *** [addtf3.o] Error 1
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
/home/sellcey/gcc-md-fix/src/gcc/libgcc/static-object.mk:17: recipe for
target 'unwind-dw2.o' failed
make[1]: *** [unwind-dw2.o] Error 1
0x86bc7b dwarf2out_frame_debug_expr
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:1910
0x86acaf dwarf2out_frame_debug_expr
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:1616
0x86c13b dwarf2out_frame_debug
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:2169
0x86c13b scan_insn_after
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:2511


The patch I was trying was:


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6657316..3530dd4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1464,7 +1464,8 @@
  (set (mem:GPI (plus:P (match_dup 0)
   (match_dup 4)))
  (match_operand:GPI 2 "register_operand" "r"))
- (set (mem:GPI (plus:P (match_dup 0)
+ (set (mem:GPI (plus:P (plus:P (match_dup 0)
+  (match_dup 4))
   (match_operand:P 5 "const_int_operand" "n")))
  (match_operand:GPI 3 "register_operand" "r"))])]
   "INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE 
(mode)"
@@ -1480,7 +1481,8 @@
  (set (mem:GPF (plus:P (match_dup 0)
   (match_dup 4)))
  (match_operand:GPF 2 "register_operand" "w"))
- (set (mem:GPF (plus:P (match_dup 0)
+ (set (mem:GPF (plus:P (plus:P (match_dup 0)
+  (match_dup 4))
   (match_operand:P 5 "const_int_operand" "n")))
  (match_operand:GPF 3 "register_operand" "w"))])]
   "INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE 
(mode)"





Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Steve Ellcey
On Wed, 2018-12-12 at 13:41 +0100, Jakub Jelinek wrote:
> On Wed, Dec 12, 2018 at 12:34:46PM +, Richard Sandiford wrote:
> > > I considered comparing node->decl and cfun->decl to differentiate
> > > between definitions and declarations instead of using a new
> > > argument
> > > but having an argument seemed cleaner and clearer.
> > 
> > Yeah, agreed.
> 
> I actually disagree, there is no point in passing another argument.
> You should be able to just check node->definition whether it is a
> definition
> or declaration.
> 
>   Jakub

You are right, I can just use node->definition and not add the new
argument.  I will make that change.
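
So the target hook can stay single-argument and just bail out early
for declarations, along these lines (sketch; the function name is
illustrative, not the actual i386 hook):

static void
example_simd_clone_adjust (struct cgraph_node *node)
{
  /* Nothing to do when only a declaration for an external SIMD clone
     is being created; definitions get the usual adjustments.  */
  if (!node->definition)
    return;

  /* ... definition-only adjustments (e.g. target attributes) ...  */
}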

Steve Ellcey
sell...@cavium.com



[Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-11 Thread Steve Ellcey
This is the modified version of the second of my Aarch64 SIMD ABI patches.
While implementing this functionality I found I wanted
targetm.simd_clone.adjust to be called when creating SIMD clone definitions
and also when creating SIMD clone declarations.  The current
implementation (used only by x86) only called this target function when
creating clone definitions.  I added a second argument to the target
function to say if it was creating a definition or a declaration and
modified the i386 code to do nothing on declarations, thus maintaining
its current behaviour.

This allowed me to add the aarch64_vector_pcs attribute to SIMD clone
declarations and definitions on Aarch64. 

I considered comparing node->decl and cfun->decl to differentiate
between definitions and declarations instead of using a new argument
but having an argument seemed cleaner and clearer.

Tested on x86 and aarch64.

Steve Ellcey
sell...@marvell.com


2018-12-11  Steve Ellcey  

* config/aarch64/aarch64.c (cgraph.h): New include.
(aarch64_simd_clone_compute_vecsize_and_simdlen): New function.
(aarch64_simd_clone_adjust): Ditto.
(aarch64_simd_clone_usable): Ditto.
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro.
(TARGET_SIMD_CLONE_ADJUST): Ditto.
(TARGET_SIMD_CLONE_USABLE): Ditto.
* config/i386/i386.c (ix86_simd_clone_adjust): Add new argument.
* omp-simd-clone.c (simd_clone_adjust): Add new argument
to targetm.simd_clone.adjust call.
(expand_simd_clones): Add new targetm.simd_clone.adjust call.
* target.def (simd_clone_adjust): Add new argument.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ea7e79f..40f18ef 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -40,6 +40,7 @@
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
+#include "cgraph.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
 #include "alias.h"
@@ -17936,6 +17937,135 @@ aarch64_estimated_poly_value (poly_int64 val)
   return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
 }
 
+/* Set CLONEI->vecsize_mangle, CLONEI->mask_mode, CLONEI->vecsize_int,
+   CLONEI->vecsize_float and if CLONEI->simdlen is 0, also
+   CLONEI->simdlen.  Return 0 if SIMD clones shouldn't be emitted,
+   or number of vecsize_mangle variants that should be emitted.  */
+
+static int
+aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+	struct cgraph_simd_clone *clonei,
+	tree base_type,
+	int num ATTRIBUTE_UNUSED)
+{
+  int ret = 0;
+
+  if (clonei->simdlen
+  && (clonei->simdlen < 2
+	  || clonei->simdlen > 1024
+	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+{
+  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %d", clonei->simdlen);
+  return 0;
+}
+
+  tree ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+  if (TREE_CODE (ret_type) != VOID_TYPE)
+switch (TYPE_MODE (ret_type))
+  {
+  case E_QImode:
+  case E_HImode:
+  case E_SImode:
+  case E_DImode:
+  case E_SFmode:
+  case E_DFmode:
+  /* case E_SCmode: */
+  /* case E_DCmode: */
+	break;
+  default:
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		"unsupported return type %qT for simd\n", ret_type);
+	return 0;
+  }
+
+  tree t;
+  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
+/* FIXME: Shouldn't we allow such arguments if they are uniform?  */
+switch (TYPE_MODE (TREE_TYPE (t)))
+  {
+  case E_QImode:
+  case E_HImode:
+  case E_SImode:
+  case E_DImode:
+  case E_SFmode:
+  case E_DFmode:
+  /* case E_SCmode: */
+  /* case E_DCmode: */
+	break;
+  default:
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		"unsupported argument type %qT for simd\n", TREE_TYPE (t));
+	return 0;
+  }
+
+  if (TARGET_SIMD)
+{
+clonei->vecsize_mangle = 'n';
+clonei->mask_mode = VOIDmode;
+clonei->vecsize_int = 128;
+clonei->vecsize_float = 128;
+
+if (clonei->simdlen == 0)
+  {
+  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
+	clonei->simdlen = clonei->vecsize_int;
+  else
+	clonei->simdlen = clonei->vecsize_float;
+  clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
+  }
+else if (clonei->simdlen > 16)
+  {
+  /* If it is possible for given SIMDLEN to pass CTYPE value in
+	 registers (v0-v7) accept that SIMDLEN, otherwise warn and don't
+	 emit corresponding clone.  */
+  int cnt = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)) * clonei->simdlen;
+  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
+	cnt /= clonei->vecsize_int;
+  else
+	cnt /= clonei-&

Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-11 Thread Steve Ellcey
On Fri, 2018-12-07 at 17:34 +, Richard Sandiford wrote:
> 
> I'm not an expert on this stuff, but it looks like:
> 
>   struct cgraph_node *node = cgraph_node::get (fndecl);
>   return node && node->simdclone;
> 
> might work.  But in some ways it would be cleaner to add the
> aarch64_vector_pcs attribute for SIMD clones, e.g. via a new hook,
> so that the function type is "correct".

I have changed this to add the aarch64_vector_pcs attribute to clones.
To do this I had to tweak the targetm.simd_clone.adjust target
function, but I did not have to add a new target function.  That change
is part of Patch 2/4 and I will resubmit that after this email.  The
change in this patch is to just check for the aarch64_vector_pcs
attribute and not look at the simd attribute to determine the ABI.
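
In other words, the ABI test boils down to a type-attribute lookup,
something like this sketch (helper name is illustrative only):

static bool
example_vector_pcs_fn_p (tree fndecl)
{
  if (fndecl == NULL_TREE)
    return false;

  /* The aarch64_vector_pcs attribute lives on the function type, so
     this catches both hand-written SIMD functions and generated
     clones.  */
  return lookup_attribute ("aarch64_vector_pcs",
			   TYPE_ATTRIBUTES (TREE_TYPE (fndecl))) != NULL_TREE;
}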

> 
> > @@ -4863,6 +4949,7 @@ aarch64_process_components (sbitmap
> > components, bool prologue_p)
> >mergeable with the current one into a pair.  */
> >if (!satisfies_constraint_Ump (mem)
> > || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
> > +   || (aarch64_simd_decl_p (cfun->decl) && (FP_REGNUM_P
> > (regno)))
> 
> Formatting nit: redundant brackets around FP_REGNUM_P (regno).

Fixed.

> > 
> > -(define_insn "simple_return"
> > +(define_expand "simple_return"
> > +  [(simple_return)]
> > +  "aarch64_use_simple_return_insn_p ()"
> > +  ""
> > +)
> > +
> > +(define_insn "*simple_return"
> >[(simple_return)]
> >""
> >"ret"
> 
> Can't you just change the condition on the existing define_insn,
> without turning it in a define_expand?  Worth a comment explaining
> why if not.

Yes, I am not sure why I did it this way (copying some other target
probably) but I got rid of the define_expand and changed the
define_insn and things seem to work fine.

> 
> > @@ -1487,6 +1538,23 @@
> >[(set_attr "type" "neon_store1_2reg")]
> >  )
> > 
> > +(define_insn "storewb_pair_"
> > +  [(parallel
> > +[(set (match_operand:P 0 "register_operand" "=")
> > +  (plus:P (match_operand:P 1 "register_operand" "0")
> > +  (match_operand:P 4 "aarch64_mem_pair_offset"
> > "n")))
> > + (set (mem:TX (plus:P (match_dup 0)
> > +  (match_dup 4)))
> 
> Should be indented under the (match_dup 0).

Fixed.

> 
> > +  (match_operand:TX 2 "register_operand" "w"))
> > + (set (mem:TX (plus:P (match_dup 0)
> > +  (match_operand:P 5 "const_int_operand" "n")))
> > +  (match_operand:TX 3 "register_operand" "w"))])]
> 
> Think this last part should be:
> 
>  (set (mem:TX (plus:P (plus:P (match_dup 0)
>   (match_dup 4))
>   (match_operand:P 5 "const_int_operand"
> "n")))
>   (match_operand:TX 3 "register_operand" "w"))])]

I think you are right about this.  What I have for
loadwb_pair_ matches what is there for
loadwb_pair_.  If this one is wrong, then I assume
the others are wrong too?  This won't make a practical difference since
we call these with gen_loadwb_pair*_* calls and not via pattern
recognition, but still they should be right.  Should I change them
all?  I did not change this as part of this patch.

> 
> > +  "TARGET_SIMD &&
> > +   INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE
> > (mode)"
> 
> && should be on the second line (which makes the line long enough to
> need breaking).

Fixed.

> 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> > 
> 
> Comment doesn't match code: this is testing a normal PCS function.

Fixed.
> 
> > +++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c
> 
> This is a nice test, but I think it would also be good to have versions
> that don't clobber full register pairs.  E.g. one without q9 and another
> without q10 would test individual STR Qs.

I added two new tests for this, simd-abi-6.c and simd-abi-7.c.

Steve Ellcey
sell...@marvell.com



ChangeLog:

2018-12-11  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_function_ok_for_s

[Ping][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-05 Thread Steve Ellcey
Ping.

This is a ping of my patch set to implement the SIMD ABI on Aarch64.

https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00636.html
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00637.html
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00639.html
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00641.html
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00642.html

Steve Ellcey
sell...@marvell.com



Re: [PATCH][AArch64][2/3] Correct type attribute for mul and mneg instructions

2018-11-30 Thread Steve Ellcey
On Fri, 2018-11-30 at 15:37 +, Kyrill Tkachov wrote:
> 
> In thunderx2t99.md the reservation changes from thunderx2t99_mul to
> thunderx2t99_madd.
> 
> Steve, can you share whether the AArch64 MUL and MNEG instructions
> really do have different latencies and reservations from MADD and
> MSUB
> on Thunderx2? If so, then this change is wrong :( and we'll want to
> model these forms differently indeed.
> 
> Thanks,
> Kyrill

According to the thunderx2 documents I looked at, the mul/mneg
instructions do have the same latencies as madd/msub.  So this
patch is OK from that standpoint and fixes an existing problem
on Thunderx2.

Steve Ellcey



Re: [PATCH][RFC] Extend locations where to seach for Fortran pre-include.

2018-11-27 Thread Steve Ellcey
On Mon, 2018-11-26 at 17:35 +0100, Martin Liška wrote:
> On 11/26/18 5:19 PM, Matthias Klose wrote:
> > On 26.11.18 13:20, Martin Liška wrote:
> > > On 11/23/18 7:08 PM, Joseph Myers wrote:
> > > > In the multiarch case, do you want 
> > > > /include/finclude/ or 
> > > > /include//finclude?  (This is where I'd
> > > > hope Debian 
> > > > / Ubuntu GCC people would comment.)
> > > 
> > > Mathias can you please reply to this?
> > 
> > this should not matter, as long as you use the multilib name, and the 
> > correct
> > directory is used with the -m32/-m64/-mabi options.  Are other compilers 
> > like a
> > clang based compiler supposed to access this directory as well? 
> 
> I don't think so.
> 
> > In that case
> > the include directories should be documented.

Why wouldn't clang (flang) want to use the same mechanism as
GCC/gfortran?  I know there is some interest/work going on here for
flang and we would like a consistent way to use pre-includes to define
SIMD vector functions in both gfortran and flang.  I think this should
be documented so flang and other compilers can use it.  Even if no
other compilers did use it I think it should be documented because it
crosses project/package boundaries, i.e. it is created by glibc and used
by gfortran.

Steve Ellcey
sell...@cavium.com



Re: Bootstrap problem with genatautomata and sysroot

2018-11-26 Thread Steve Ellcey
On Mon, 2018-11-26 at 22:47 +0100, Andreas Schwab wrote:
> On Nov 26 2018, Steve Ellcey  wrote:
> 
> > I looked through the patches for the last couple of weeks to see if
> > I could identify
> > what changed here but I haven't found anything.  Maybe it was
> > something in
> > glibc that changed.
> 
> Most likely it only worked by accident so far.  Last week the first
> GLIBC_2.29 symbol has been added to libm.
> 
> Andreas.

Yup, I backed off those glibc changes and I could build, so that seems
to be the problem.  I guess if I want to build a complete toolchain
with bootstrap I will need to update the libm that is in /lib.

Steve Ellcey



Bootstrap problem with genatautomata and sysroot

2018-11-26 Thread Steve Ellcey
I am trying to do a bootstrap build of GCC using a newly built glibc in
a non standard location on my aarch64 platform (thunderx).  This was working
up until a week or so ago but now I am running into a problem I haven't seen
before:

build/genautomata /home/sellcey/test-tot/src/gcc/gcc/common.md 
/home/sellcey/test-tot/src/gcc/gcc/config/aarch64/aarch64.md \
  insn-conditions.md > tmp-automata.c
build/genautomata: /lib/aarch64-linux-gnu/libm.so.6: version `GLIBC_2.29' not 
found (required by build/genautomata)
Makefile:2326: recipe for target 's-automata' failed

Has anyone else seen this?

I am building binutils and an initial GCC into a sysroot location, then I build 
glibc using that GCC and install it into that sysroot location and finally do
a full GCC build with bootstrap.  It is the final bootstrap build that fails.
If I do a non-bootstrap build of the final GCC then it works.

I looked through the patches for the last couple of weeks to see if I could
identify what changed here but I haven't found anything.  Maybe it was
something in glibc that changed.

Steve Ellcey
sell...@cavium.com



[Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-11-08 Thread Steve Ellcey
This is a patch 4 to support the Aarch64 SIMD ABI [1] in GCC.

It defines a new target hook targetm.check_part_clobbered that
takes a rtx_insn and checks to see if it is a call to a function
that may clobber partial registers.  It returns true by default,
which results in the current behaviour, but if we can determine
that the function will not do any partial clobbers (like the
Aarch64 SIMD functions) then it returns false.

Steve Ellcey
sell...@cavium.com



2018-11-08  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_check_part_clobbered): New function.
(TARGET_CHECK_PART_CLOBBERED): New macro.
* doc/tm.texi.in (TARGET_CHECK_PART_CLOBBERED): New hook.
* lra-constraints.c (need_for_call_save_p): Use check_part_clobbered.
* lra-int.h (check_part_clobbered): New field in lra_reg struct.
* lra-lives.c (check_pseudos_live_through_calls): Pass in
check_partial_clobber bool argument and use it.
(process_bb_lives): Check basic block for functions that may do
partial clobbers.  Pass this to check_pseudos_live_through_calls.
* lra.c (initialize_lra_reg_info_element): Initialize 
check_part_clobbered to false.
* target.def (check_part_clobbered): New target hook.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c82c7b6..c2de4111 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1480,6 +1480,17 @@ aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
   return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
 }
 
+/* Implement TARGET_CHECK_PART_CLOBBERED.  SIMD functions never save
+   partial registers, so they return false.  */
+
+static bool
+aarch64_check_part_clobbered(rtx_insn *insn)
+{
+  if (aarch64_simd_call_p (insn))
+return false;
+  return true;
+}
+
 /* Implement REGMODE_NATURAL_SIZE.  */
 poly_uint64
 aarch64_regmode_natural_size (machine_mode mode)
@@ -18294,6 +18305,9 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
 
+#undef TARGET_CHECK_PART_CLOBBERED
+#define TARGET_CHECK_PART_CLOBBERED aarch64_check_part_clobbered
+
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index e8af1bf..7dd6c54 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1704,6 +1704,8 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
+@hook TARGET_CHECK_PART_CLOBBERED
+
 @findex fixed_regs
 @findex call_used_regs
 @findex global_regs
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index ab61989..89483d3 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5325,16 +5325,23 @@ inherit_reload_reg (bool def_p, int original_regno,
 static inline bool
 need_for_call_save_p (int regno)
 {
+  machine_mode pmode = PSEUDO_REGNO_MODE (regno);
+  int new_regno = reg_renumber[regno];
+
   lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
-  return (usage_insns[regno].calls_num < calls_num
-	  && (overlaps_hard_reg_set_p
-	  ((flag_ipa_ra &&
-		! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
-	   ? lra_reg_info[regno].actual_call_used_reg_set
-	   : call_used_reg_set,
-	   PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
-	  || (targetm.hard_regno_call_part_clobbered
-		  (reg_renumber[regno], PSEUDO_REGNO_MODE (regno);
+
+  if (usage_insns[regno].calls_num >= calls_num)
+return false;
+
+  if (flag_ipa_ra
+  && !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
+return (overlaps_hard_reg_set_p
+		(lra_reg_info[regno].actual_call_used_reg_set, pmode, new_regno)
+	|| (lra_reg_info[regno].check_part_clobbered
+		&& targetm.hard_regno_call_part_clobbered (new_regno, pmode)));
+  else
+return (overlaps_hard_reg_set_p (call_used_reg_set, pmode, new_regno)
+|| targetm.hard_regno_call_part_clobbered (new_regno, pmode));
 }
 
 /* Global registers occurring in the current EBB.  */
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 5267b53..e6aacd2 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -117,6 +117,8 @@ struct lra_reg
   /* This member is set up in lra-lives.c for subsequent
  assignments.  */
   lra_copy_t copies;
+  /* Whether or not the register is partially clobbered.  */
+  bool check_part_clobbered;
 };
 
 /* References to the common info about each register.  */
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 0bf8cd0..b2dfe0e 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -597,7 +597,8 @@ lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
 static inline void
 check_pseudos_live_through_

[Patch 3/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-11-08 Thread Steve Ellcey
This is patch 3 to support the Aarch64 SIMD ABI [1] in GCC.

It defines a new target hook targetm.remove_extra_call_preserved_regs
that takes a rtx_insn and will remove registers from the register
set passed in if we know that this call preserves those registers.
Aarch64 SIMD functions preserve some registers that normal functions
do not.  The default version of this function will do nothing.
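
To make the caller-side effect concrete, here is a rough example; nothing
below is part of the patch itself, and it assumes the aarch64_vector_pcs
attribute added in patch 1:

#include <arm_neon.h>

/* vec_op uses the SIMD ABI, so it preserves all 128 bits of V8-V23.  */
__attribute__ ((aarch64_vector_pcs)) float64x2_t vec_op (float64x2_t);

float64x2_t
caller (float64x2_t x, float64x2_t y)
{
  /* y is live across the call.  Once get_call_reg_set_usage knows that
     vec_op preserves V16-V23, the caller no longer has to treat those
     registers as clobbered and can keep y in one of them instead of
     spilling it around the call.  */
  float64x2_t t = vec_op (x);
  return vaddq_f64 (t, y);
}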

Steve Ellcey
sell...@cavium.com


2018-11-08  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
(aarch64_remove_extra_call_preserved_regs): New function.
(TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS): New macro.
* doc/tm.texi.in (TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS): New hook.
* final.c (get_call_reg_set_usage): Call new hook.
* target.def (remove_extra_call_preserved_regs): New hook.
* targhooks.c (default_remove_extra_call_preserved_regs): New function.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c82c7b6..62112ac 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1470,6 +1470,50 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  if (!insn)
+return false;
+  call = get_call_rtx_from (insn);
+  if (!call)
+return false;
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
+/* Possibly remove some registers from register set if we know they
+   are preserved by this call, even though they are marked as not
+   being callee saved in CALL_USED_REGISTERS.  */
+
+void
+aarch64_remove_extra_call_preserved_regs (rtx_insn *insn,
+	  HARD_REG_SET *return_set)
+{
+  int regno;
+
+  if (aarch64_simd_call_p (insn))
+{
+  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+	if (FP_SIMD_SAVED_REGNUM_P (regno))
+	  CLEAR_HARD_REG_BIT (*return_set, regno);
+}
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -18290,6 +18334,10 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MODES_TIEABLE_P
 #define TARGET_MODES_TIEABLE_P aarch64_modes_tieable_p
 
+#undef TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
+#define TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS \
+  aarch64_remove_extra_call_preserved_regs
+
 #undef TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index e8af1bf..73febe9 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1704,6 +1704,8 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
+@hook TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
+
 @findex fixed_regs
 @findex call_used_regs
 @findex global_regs
diff --git a/gcc/final.c b/gcc/final.c
index 6e61f1e..8df869e 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -5080,7 +5080,7 @@ get_call_reg_set_usage (rtx_insn *insn, HARD_REG_SET *reg_set,
 	  return true;
 	}
 }
-
   COPY_HARD_REG_SET (*reg_set, default_set);
+  targetm.remove_extra_call_preserved_regs (insn, reg_set);
   return false;
 }
diff --git a/gcc/target.def b/gcc/target.def
index 4b166d1..25be927 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5757,6 +5757,12 @@ for targets that don't have partly call-clobbered registers.",
  bool, (unsigned int regno, machine_mode mode),
  hook_bool_uint_mode_false)
 
+DEFHOOK
+(remove_extra_call_preserved_regs,
+ "This hook removes some registers from the callee used register set.",
+ void, (rtx_insn *insn, HARD_REG_SET *used_regs),
+ default_remove_extra_call_preserved_regs)
+
 /* Return the smallest number of different values for which it is best to
use a jump-table instead of a tree of conditional branches.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 3d8b3b9..a9fb101 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -2372,4 +2372,11 @@ default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
   return result;
 }
 
+void
+default_remove_extra_call_preserved_regs (rtx_insn *insn ATTRIBUTE_UNUSED,
+	  HARD_REG_SET *used_regs
+		ATTRIBUTE_UNUSED)
+{
+}
+
 #include "gt-targhooks.h"


[Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-11-08 Thread Steve Ellcey
This is patch 2 to support the Aarch64 SIMD ABI [1] in GCC.

It defines the TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN,
TARGET_SIMD_CLONE_ADJUST, and TARGET_SIMD_CLONE_USABLE macros
so that GCC can generate SIMD clones on aarch64.
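
For context, SIMD clones are the vector variants the compiler creates for
functions declared with OpenMP "declare simd"; a hypothetical example of
the kind of code this lets the vectorizer handle on aarch64:

/* With -fopenmp-simd, vector clones of dosomething are created where it
   is defined (128-bit vectors with the hooks below), and the loop in
   use() can then be vectorized into calls to the matching clone.  */
#pragma omp declare simd
double dosomething (double x);

void
use (double *a, double *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = dosomething (b[i]);
}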

Steve Ellcey
sell...@cavium.com


2018-11-08  Steve Ellcey  

* config/aarch64/aarch64.c (cgraph.h): New include.
(aarch64_simd_clone_compute_vecsize_and_simdlen): New function.
(aarch64_simd_clone_adjust): Ditto.
(aarch64_simd_clone_usable): Ditto.
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro.
(TARGET_SIMD_CLONE_ADJUST): Ditto.
(TARGET_SIMD_CLONE_USABLE): Ditto.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c82c7b6..cccf961 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -40,6 +40,7 @@
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
+#include "cgraph.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
 #include "alias.h"
@@ -17834,6 +17835,131 @@ aarch64_speculation_safe_value (machine_mode mode,
   return result;
 }
 
+/* Set CLONEI->vecsize_mangle, CLONEI->mask_mode, CLONEI->vecsize_int,
+   CLONEI->vecsize_float and if CLONEI->simdlen is 0, also
+   CLONEI->simdlen.  Return 0 if SIMD clones shouldn't be emitted,
+   or number of vecsize_mangle variants that should be emitted.  */
+
+static int
+aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+	struct cgraph_simd_clone *clonei,
+	tree base_type,
+	int num ATTRIBUTE_UNUSED)
+{
+  int ret = 0;
+
+  if (clonei->simdlen
+  && (clonei->simdlen < 2
+	  || clonei->simdlen > 1024
+	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+{
+  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %d", clonei->simdlen);
+  return 0;
+}
+
+  tree ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+  if (TREE_CODE (ret_type) != VOID_TYPE)
+switch (TYPE_MODE (ret_type))
+  {
+  case E_QImode:
+  case E_HImode:
+  case E_SImode:
+  case E_DImode:
+  case E_SFmode:
+  case E_DFmode:
+  /* case E_SCmode: */
+  /* case E_DCmode: */
+	break;
+  default:
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		"unsupported return type %qT for simd\n", ret_type);
+	return 0;
+  }
+
+  tree t;
+  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
+/* FIXME: Shouldn't we allow such arguments if they are uniform?  */
+switch (TYPE_MODE (TREE_TYPE (t)))
+  {
+  case E_QImode:
+  case E_HImode:
+  case E_SImode:
+  case E_DImode:
+  case E_SFmode:
+  case E_DFmode:
+  /* case E_SCmode: */
+  /* case E_DCmode: */
+	break;
+  default:
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		"unsupported argument type %qT for simd\n", TREE_TYPE (t));
+	return 0;
+  }
+
+  if (TARGET_SIMD)
+{
+clonei->vecsize_mangle = 'n';
+clonei->mask_mode = VOIDmode;
+clonei->vecsize_int = 128;
+clonei->vecsize_float = 128;
+
+if (clonei->simdlen == 0)
+  {
+  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
+	clonei->simdlen = clonei->vecsize_int;
+  else
+	clonei->simdlen = clonei->vecsize_float;
+  clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
+  }
+else if (clonei->simdlen > 16)
+  {
+  /* If it is possible for given SIMDLEN to pass CTYPE value in
+	 registers (v0-v7) accept that SIMDLEN, otherwise warn and don't
+	 emit corresponding clone.  */
+  int cnt = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)) * clonei->simdlen;
+  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
+	cnt /= clonei->vecsize_int;
+  else
+	cnt /= clonei->vecsize_float;
+  if (cnt > 8)
+	{
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		"unsupported simdlen %d", clonei->simdlen);
+	return 0;
+	}
+  }
+  ret = 1;
+}
+  return ret;
+}
+
+/* Add target attribute to SIMD clone NODE if needed.  */
+
+static void
+aarch64_simd_clone_adjust (struct cgraph_node *node ATTRIBUTE_UNUSED)
+{
+}
+
+/* If SIMD clone NODE can't be used in a vectorized loop
+   in current function, return -1, otherwise return a badness of using it
+   (0 if it is most desirable from vecsize_mangle point of view, 1
+   slightly less desirable, etc.).  */
+
+static int
+aarch64_simd_clone_usable (struct cgraph_node *node)
+{
+  switch (node->simdclone->vecsize_mangle)
+{
+case 'n':
+  if (!TARGET_SIMD)
+	return -1;
+  return 0;
+default:
+  gcc_unreachable ();
+}
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -18313,6 +18439,16 @@ aarch64_libgcc_float

[Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-11-08 Thread Steve Ellcey
This is a resubmission of patch 1 to support the Aarch64 SIMD ABI [1] in
GCC; it does not have any functional changes from the last submission.

The significant difference between the standard ARM ABI and the SIMD ABI
is that in the normal ABI a callee saves only the lower 64 bits of registers
V8-V15, in the SIMD ABI the callee must save all 128 bits of registers
V8-V23.

This patch checks for SIMD functions and saves the extra registers when
needed.  It does not change the caller behaviour, so with just this patch
there may be values saved by both the caller and callee.  This is not
efficient, but it is correct code. Patches 3 and 4 will remove the extra
saves from the caller.
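
For reference, the new aarch64_vector_pcs attribute (added below) is how a
function is marked as using the SIMD ABI; an illustrative declaration:

#include <arm_neon.h>

/* foo follows the vector PCS: any of V8-V23 that it modifies must be
   saved and restored in full (128 bits), not just the low 64 bits.  */
__attribute__ ((aarch64_vector_pcs)) float64x2_t foo (float64x2_t x);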

Steve Ellcey
sell...@cavium.com


2018-11-08  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Check for simd function.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(aarch64_use_simple_return_insn_p): New function.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
(storewb_pair_): Ditto.


2018-11-08  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-5.c: Ditto.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1fe1a50..e1528a4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -467,6 +467,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -552,6 +553,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c82c7b6..b848c2a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1088,6 +1088,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, false, true,  true,  false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1470,6 +1479,45 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  tree fntype;
+
+  if (fndecl == NULL)
+return false;
+  fntype = TREE_TYPE (fndecl);
+  if (fntype == NULL)
+return false;
+
+  /* All functions with the aarch64_vector_pcs attribute use the simd ABI.  */
+  if (lookup_attribute ("aarch64_vector_pcs

[Patch 0/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-11-08 Thread Steve Ellcey
This is an updated set of patches for the SIMD ABI on aarch64.  I have
updated all of them to apply to ToT.  The first two have no functional
changes from the last submittal.  The last two are a reworking of what
used to be a single patch.  I split it up into two parts and reworked
it to address comments by Richard Sandiford.

Code that does not use SIMD functions should not be affected by these
patches.  Patch 1 implements the ABI and Patch 2 allows for vectorization
of loops with SIMD functions.

Patch 3 allows for optimization of SIMD functions by recognizing that some
registers that need to be caller saved by normal functions do not need to
be saved when calling a SIMD function.

Patch 4 allows for optimization of SIMD functions by recognizing that some
registers that are partially clobbered by normal functions are not partially
clobbered by SIMD functions and do not need to be saved by the caller.


Steve Ellcey
sell...@cavium.com



Re: Running the C++ library tests in the GCC testsuite

2018-11-07 Thread Steve Ellcey
On Wed, 2018-11-07 at 17:39 +, Joseph Myers wrote:
> On Wed, 7 Nov 2018, Steve Ellcey wrote:
> 
> > 
> > I have a question about the C++ library testsuite.  I built and
> > installed
> > a complete toolchain with GCC, binutils, and glibc in a directory
> > ($T) and
> > then I run the GCC testsuite with this command:
> > 
> > # cd to GCC object directory
> > make -j50 check RUNTESTFLAGS="--tool_opts  '--sysroot=$T -Wl,
> > --dynamic-linker=$T/lib/ld-linux-aarch64.so.1 -Wl,-rpath=$T/lib64
> > -Wl,-rpath=$T/usr/lib64'"
> I advise instead putting those options in your board file.
> 
> set_board_info ldflags "-Wl,whatever"
> 
> Note that you also need to make your board file set LOCPATH and GCONV_PATH
> appropriately (pointing the $sysroot/usr/lib/locale and
> $sysroot/usr/lib64/gconv respectively) for libstdc++ locale tests to work
> correctly with such a non-default glibc.  That would be code in your
> _load procedure in the board file (or in such a procedure in a
> file it loads via load_generic_config, etc.).

I copied unix.exp to unix-sysroot.exp and added this to it:

if {[info exists env(DEJAGNU_UNIX_SYSROOT_FLAGS)]} {
    set_board_info ldflags "$env(DEJAGNU_UNIX_SYSROOT_FLAGS)"
}
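
With that board file in place (somewhere on DejaGNU's board search path),
the run is driven roughly like this:

DEJAGNU_UNIX_SYSROOT_FLAGS="--sysroot=$T -Wl,--dynamic-linker=$T/lib/ld-linux-aarch64.so.1 -Wl,-rpath=$T/lib64 -Wl,-rpath=$T/usr/lib64" \
make -j50 check RUNTESTFLAGS="--target_board=unix-sysroot"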

I figured I would deal with LOCPATH and GCONV_PATH later.  When
I do a partial testrun, I don't get any failures but I do get some
new unresolved tests like this:

Download of ./2108-1.exe to unix-sysroot failed.
UNRESOLVED: gcc.dg/2108-1.c execution test

Have you ever seen this error?

Steve Ellcey
sell...@cavium.com



Running the C++ library tests in the GCC testsuite

2018-11-07 Thread Steve Ellcey


I have a question about the C++ library testsuite.  I built and installed
a complete toolchain with GCC, binutils, and glibc in a directory ($T) and
then I run the GCC testsuite with this command:

# cd to GCC object directory
make -j50 check RUNTESTFLAGS="--tool_opts  '--sysroot=$T 
-Wl,--dynamic-linker=$T/lib/ld-linux-aarch64.so.1 -Wl,-rpath=$T/lib64 
-Wl,-rpath=$T/usr/lib64'"

When I look at the gcc.log, g++.log, gfortran.log files I see the -Wl options
that I specified being used when the tests are compiled, but when I look at
the C++ library test log file
(aarch64-linux-gnu/libstdc++-v3/testsuite/libstdc++.log) I do not see
the --rpath or other flags getting used.  Is this expected?  I have a
few tests that fail because of this and die with:

./check_nan.exe: /lib/aarch64-linux-gnu/libm.so.6: version `GLIBC_2.27' not 
found (required by ./check_nan.exe)

If I rerun by hand and add the --rpath, etc. flags the test works but I
am not sure why the test harness did not add them itself.

Steve Ellcey
sell...@cavium.com

