date:20150422


Hi all,

This hunk that slightly reduces the cost of immediate moves doesn't actually 
have any effect.
In the whole of SPEC2006 it didn't make a difference. In any case, I'd like to 
move to a point
where we use COSTS_N_INSNS units for our costs and not increment decrement them 
by one.

This patch removes that bit of logic and makes it slightly cleaner to look at. 
As far as I know
its logic has never been confirmed in practice.

Bootstrapped and tested on arm.

Ok for trunk?

Thanks,
Kyrill

2015-04-22  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/arm.c (arm_new_rtx_costs): Do not lower cost
immediate moves.
commit e225669ff70f09520007b7898b170fb8fa75281f
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Wed Apr 8 10:18:23 2015 +0100

[ARM] Do not lower cost of setting core reg to constant. It doesn't have any effect

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0ef05c9..03988ac 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9725,11 +9725,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	 and we would otherwise be unable to work out the true cost.  */
 	  *cost = rtx_cost (SET_DEST (x), SET, 0, speed_p);
 	  outer_code = SET;
-	  /* Slightly lower the cost of setting a core reg to a constant.
-	 This helps break up chains and allows for better scheduling.  */
-	  if (REG_P (SET_DEST (x))
-	   REGNO (SET_DEST (x)) = LR_REGNUM)
-	*cost -= 1;
+
 	  x = SET_SRC (x);
 	  /* Immediate moves with an immediate in the range [0, 255] can be
 	 encoded in 16 bits in Thumb mode.  */

[PATCH] PR target/65846: Optimize data access in PIE with copy reloc

Normally, with PIE, GCC accesses globals that are extern to the module
using GOT.  This is two instructions, one to get the address of the global
from GOT and the other to get the value.  Examples:

---
extern int a_glob;
int
main ()
{
  return a_glob;
}
---

With PIE, the generated code accesses global via GOT using two memory
loads:

movqa_glob@GOTPCREL(%rip), %rax
movl(%rax), %eax

for 64-bit or

movla_glob@GOT(%ecx), %eax
movl(%eax), %eax

for 32-bit.

Some experiments on google and SPEC CPU benchmarks show that the extra
instruction affects performance by 1% to 5%.

Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that
the global will be defined in the executable.  For globals that are
truly extern (come from shared objects), the linker will create copy
relocations and have them defined in the executable.  Result is that
no global access needs to go through GOT and hence improves performance.
We can generate

movla_glob(%rip), %eax

for 64-bit and

movla_glob@GOTOFF(%eax), %eax

for 32-bit.  This optimization only applies to undefined non-weak
non-TLS global data.  Undefined weak global or TLS data access still
must go through GOT.

This patch reverts legitimate_pic_address_disp_p change made in revision
218397, which only applies to x86-64.  Instead, this patch updates
targetm.binds_local_p to indicate if undefined non-weak non-TLS global
data is defined locally in PIE.  It also introduces a new target hook,
binds_tls_local_p to distinguish TLS variable from non-TLS variable.  By
default, binds_tls_local_p is the same as binds_local_p.

This patch checks if 32-bit and 64-bit linkers support PIE with copy
reloc at configure time.  64-bit linker is enabled in binutils 2.25
and 32-bit linker is enabled in binutils 2.26.  This optimization
is enabled only if the linker support is available.

Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without
support for copy relocation in PIE.  OK for trunk?

Thanks.

H.J.
---
gcc/

PR target/65846
* configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ...
(HAVE_LD_64BIT_PIE_COPYRELOC): This.
(HAVE_LD_32BIT_PIE_COPYRELOC): New.   Defined to 1 if Linux/ia32
linker supports PIE with copy reloc.
* output.h (default_binds_tls_local_p): New.
(default_binds_local_p_3): Add 2 bool arguments.
* target.def (binds_tls_local_p): New target hook.
* varasm.c (decl_default_tls_model): Replace targetm.binds_local_p
with targetm.binds_tls_local_p.
(default_binds_local_p_3): Add a bool argument to indicate TLS
variable and a bool argument to indicate if an undefined non-TLS
non-weak data is local.  Double check TLS variable.  If an
undefined non-TLS non-weak data is local, treat it as defined
locally.
(default_binds_local_p): Pass false and false to
default_binds_local_p_3.
(default_binds_local_p_2): Likewise.
(default_binds_local_p_1): Likewise.
(default_binds_tls_local_p): New.
* config.in: Regenerated.
* configure: Likewise.
* doc/tm.texi: Likewise.
* config/i386/i386.c (legitimate_pic_address_disp_p): Don't
check HAVE_LD_PIE_COPYRELOC here.
(ix86_binds_local): New.
(ix86_binds_tls_local_p): Likewise.
(ix86_binds_local_p): Use it.
(TARGET_BINDS_TLS_LOCAL_P): New.
* doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook.

gcc/testsuite/

PR target/65846
* gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32.
* gcc.target/i386/pie-copyrelocs-2.c: Likewise.
* gcc.target/i386/pie-copyrelocs-3.c: Likewise.
* gcc.target/i386/pie-copyrelocs-4.c: Likewise.
* gcc.target/i386/pr32219-9.c: Likewise.
* gcc.target/i386/pr32219-10.c: New file.

* lib/target-supports.exp (check_effective_target_pie_copyreloc):
Check HAVE_LD_64BIT_PIE_COPYRELOC and HAVE_LD_32BIT_PIE_COPYRELOC
instead of HAVE_LD_64BIT_PIE_COPYRELOC.
---
 gcc/config.in| 18 ---
 gcc/config/i386/i386.c   | 44 ++-
 gcc/configure| 68 +---
 gcc/configure.ac | 64 +++---
 gcc/doc/tm.texi  | 10 
 gcc/doc/tm.texi.in   |  2 +
 gcc/output.h |  4 +-
 gcc/target.def   | 14 +
 gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c |  4 +-
 gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c |  4 +-
 gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c |  2 +-
 gcc/testsuite/gcc.target/i386/pie-copyrelocs-4.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr32219-10.c   | 16 ++

[PATCH 0/14][ARM/AArch64] __FP16 support, vectors, intrinsics, testsuite

This patch series adds support for ARM Neon float16x4_t and float16x8_t vector 
types and intrinsics, and the __fp16 type, on both ARM and AArch64, and extends 
the tests in Christophe Lyon's advsimd-intrinsics testsuite to cover these. (I 
chose to extend the existing tests rather than add new ones, as the majority of 
f16 intrinsics are just moving blocks of 16-bits around and do not depend on HW 
support; I added new files for the conversion intrinsics.)


The ARM parts were previously posted at 
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01434.html but have had some fixes 
following the testsuite additions. Also The ARM patches depend upon my ARM 
lane-checking improvements at 
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which I have just pinged.


I've cross-tested baremetal arm-none-eabi, aarch64-none-elf and 
aarch64_be-none-elf most patches individually, and bootstrapped each patch in 
series on (the relevant one of) arm-none-linux-gnueabihf and aarch64-none-linux-gnu.


OK for trunk?

Cheers, Alan

Re: trunk test result inconsistencies

2015-04-22 Thread Jakub Jelinek

On Wed, Apr 22, 2015 at 05:40:07AM -0700, H.J. Lu wrote:
 On Wed, Apr 22, 2015 at 5:19 AM, Jakub Jelinek ja...@redhat.com wrote:
  On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote:
  Is anyone else seeing comparison problems on trunk?
 
  I was having problems testing a patch on a 4/16 extraction, so last night I
  checked out a fresh trunk, built it, ran make check... then removed the
  build directory, re-built it from scratch again. make check..  and get a
  bunch of different results. I even used test_summary instead of my own
  home-grown scripts.
 
  That is a buggy kernel, you must have missed a warning not to upgrade your
  kernel.  See
  https://bugzilla.kernel.org/show_bug.cgi?id=96311
  https://lkml.org/lkml/2015/4/13/600
  Dunno what progress has been on that patch since then, don't see it in
  latest Linus' git though.
  AFAIK at least 3.19 and 4.0 kernels are affected.
 
 3.19.5-100/3.19.5-200 from Fedora 20/21 fix the bug.

I admit I have not tried, but am certainly not seeing Peter's fix
in https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.19.5
has it been fixed differently?  Not seeing anything in the Fedora kernel
package %changelog either...

Jakub

Re: [PATCH 3/5] libcc1: set debug compile: Display GCC driver filename


On 04/21/2015 03:41 PM, Jan Kratochvil wrote:

Hi,

as discussed in
How to use compile  execute function in GDB
https://sourceware.org/ml/gdb/2015-04/msg00026.html

GDB currently searches for /usr/bin/ARCH-OS-gcc and chooses one but it does not
display which one.  It cannot, GCC method set_arguments() does not yet know
whether 'set debug compile' is enabled or not.

Unfortunately this changes libcc1 API in an incompatible way.  There is
a possibility of a hack to keep the API the same - one could pass -v option
explicitly to set_arguments(), set_arguments() could compare the -v string
and print the GCC filename accordingly.  Then the 'verbose' parameter of
compile() would lose its meaning.  What do you think?
I think we're early enough in the evolution of libcc1 that changing the 
ABI shouldn't be a big deal.  I'd expect gcc  gdb to need to move in 
lock-step for this stuff for a while.




GDB counterpart:
[PATCH 3/4] compile: set debug compile: Display GCC driver filename
https://sourceware.org/ml/gdb-patches/2015-04/msg00807.html
Message-ID: 20150421213649.14147.79719.st...@host1.jankratochvil.net


Jan


include/ChangeLog
2015-04-21  Jan Kratochvil  jan.kratoch...@redhat.com

* gcc-interface.h (enum gcc_base_api_version): Add comment to
GCC_FE_VERSION_1.
(struct gcc_base_vtable): Move parameter verbose from compile to
set_arguments.

libcc1/ChangeLog
2015-04-21  Jan Kratochvil  jan.kratoch...@redhat.com

* libcc1.cc: Include intl.h.
(struct libcc1): Add field verbose.
(libcc1::libcc1): Initialize it.
(libcc1_set_arguments): Add parameter verbose, implement it.
(libcc1_compile): Remove parameter verbose, use self's field instead.

OK.  Please install on the trunk.

jeff

Re: trunk test result inconsistencies

On Wed, Apr 22, 2015 at 6:10 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Apr 22, 2015 at 05:40:07AM -0700, H.J. Lu wrote:
 On Wed, Apr 22, 2015 at 5:19 AM, Jakub Jelinek ja...@redhat.com wrote:
  On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote:
  Is anyone else seeing comparison problems on trunk?
 
  I was having problems testing a patch on a 4/16 extraction, so last night 
  I
  checked out a fresh trunk, built it, ran make check... then removed the
  build directory, re-built it from scratch again. make check..  and get a
  bunch of different results. I even used test_summary instead of my own
  home-grown scripts.
 
  That is a buggy kernel, you must have missed a warning not to upgrade your
  kernel.  See
  https://bugzilla.kernel.org/show_bug.cgi?id=96311
  https://lkml.org/lkml/2015/4/13/600
  Dunno what progress has been on that patch since then, don't see it in
  latest Linus' git though.
  AFAIK at least 3.19 and 4.0 kernels are affected.

 3.19.5-100/3.19.5-200 from Fedora 20/21 fix the bug.

 I admit I have not tried, but am certainly not seeing Peter's fix
 in https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.19.5
 has it been fixed differently?  Not seeing anything in the Fedora kernel
 package %changelog either...


* Wed Apr 15 2015 Josh Boyer jwbo...@fedoraproject.org
- Add patch to fix tty closure race (rhbz 1208953)

https://bugzilla.redhat.com/show_bug.cgi?id=1208953


-- 
H.J.

Re: [PATCH][PR65823] Fix va_arg ap_copy nop detection

2015-04-22 Thread Tom de Vries


On 22-04-15 10:06, Richard Biener wrote:

On Wed, Apr 22, 2015 at 9:41 AM, Tom de Vries tom_devr...@mentor.com wrote:

Hi,

this patch fixes PR65823.



SNIP



The patches fixes the problem by using operand_equal_p to do the equality
test.


Bootstrapped and reg-tested on x86_64.

Did minimal non-bootstrap build on arm and reg-tested.

OK for trunk?


Hmm, ok for now.


Committed.


But I wonder if we can't fix things to not require that
odd extra copy.


Agreed, that would be good.


In fact that we introduce ap.1 looks completely bogus to me


AFAICT, it's introduced by gimplify_arg ('argp') because argp (a PARM_DECL) is 
not addressable.



(and we don't in this case for arm).  Note that the pointer compare obviously
fails because we unshare the expression.

So ... what breaks if we simply remove this odd fixup?



[ Originally mentioned at https://gcc.gnu.org/ml/gcc/2015-02/msg00011.html . ]

I've committed gcc.target/x86_64/abi/callabi/vaarg-6.c specifically as a minimal 
version of this problem.


If we remove the ap_copy fixup, at original we have:
...
;; Function do_cpy2 (null)
{
  char * e;

char * e;
  e = VA_ARG_EXPR argp;
  e = VA_ARG_EXPR argp;
  if (e != b)
{
  abort ();
}
}
...

and after gimplify we have:
...
do_cpy2 (char * argp)
{
  char * argp.1;
  char * argp.2;
  char * b.3;
  char * e;

  argp.1 = argp;
  e = VA_ARG (argp.1, 0B);
  argp.2 = argp;
  e = VA_ARG (argp.2, 0B);
  b.3 = b;
  if (e != b.3) goto D.1373; else goto D.1374;
  D.1373:
  abort ();
  D.1374:
}
...

The second VA_ARG uses argp.2, which is a copy of argp, which is unmodified by 
the first VA_ARG.



Using attached _proof-of-concept_ patch, I get callabi.exp working without the 
ap_copy, still at the cost of one 'argp.1 = argp' copy though:

...
do_cpy2 (char * argp)
{
  char * argp.1;
  char * b.2;
  char * e;

  argp.1 = argp;
  e = VA_ARG (argp.1, 0B);
  e = VA_ARG (argp.1, 0B);
  b.2 = b;
  if (e != b.2) goto D.1372; else goto D.1373;
  D.1372:
  abort ();
  D.1373:
}
...

But perhaps there's an easier way?

Thanks,
- Tom
Add copy for va_list parameter

---
 gcc/function.c | 29 +
 gcc/gimplify.c | 16 
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index 7d4df92..2ebfec4 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3855,6 +3855,24 @@ gimplify_parm_type (tree *tp, int *walk_subtrees, void *data)
   return NULL;
 }
 
+static inline bool
+is_va_list_type (tree type)
+{
+  tree id = TYPE_IDENTIFIER (type);
+  if (id == NULL_TREE)
+return false;
+  const char *s = IDENTIFIER_POINTER (id);
+  if (s == NULL)
+return false;
+  if (strcmp (s, va_list) == 0)
+return true;
+  if (strcmp (s, __builtin_sysv_va_list) == 0)
+return true;
+  if (strcmp (s, __builtin_ms_va_list) == 0)
+return true;
+  return false;
+}
+
 /* Gimplify the parameter list for current_function_decl.  This involves
evaluating SAVE_EXPRs of variable sized parameters and generating code
to implement callee-copies reference parameters.  Returns a sequence of
@@ -3953,6 +3971,17 @@ gimplify_parameters (void)
 	  DECL_HAS_VALUE_EXPR_P (parm) = 1;
 	}
 	}
+  else if (is_va_list_type (TREE_TYPE (parm)))
+	{
+	  tree cp = create_tmp_reg (data.nominal_type, get_name (parm));
+	  DECL_IGNORED_P (cp) = 0;
+	  TREE_ADDRESSABLE (cp) = 1;
+	  tree t = build2 (MODIFY_EXPR, TREE_TYPE (cp), cp, parm);
+	  gimplify_and_add (t, stmts);
+
+	  SET_DECL_VALUE_EXPR (parm, cp);
+	  DECL_HAS_VALUE_EXPR_P (parm) = 1;
+	}
 }
 
   fnargs.release ();
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 5f1dd1a..c922dc7 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -4569,7 +4569,6 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   gimple assign;
   location_t loc = EXPR_LOCATION (*expr_p);
   gimple_stmt_iterator gsi;
-  tree ap = NULL_TREE, ap_copy = NULL_TREE;
 
   gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
 	  || TREE_CODE (*expr_p) == INIT_EXPR);
@@ -4730,16 +4729,12 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	  enum internal_fn ifn = CALL_EXPR_IFN (*from_p);
 	  auto_vectree vargs (nargs);
 
-	  if (ifn == IFN_VA_ARG)
-	ap = unshare_expr (CALL_EXPR_ARG (*from_p, 0));
 	  for (i = 0; i  nargs; i++)
 	{
 	  gimplify_arg (CALL_EXPR_ARG (*from_p, i), pre_p,
 			EXPR_LOCATION (*from_p));
 	  vargs.quick_push (CALL_EXPR_ARG (*from_p, i));
 	}
-	  if (ifn == IFN_VA_ARG)
-	ap_copy = CALL_EXPR_ARG (*from_p, 0);
 	  call_stmt = gimple_build_call_internal_vec (ifn, vargs);
 	  gimple_set_location (call_stmt, EXPR_LOCATION (*expr_p));
 	}
@@ -4784,17 +4779,6 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   gsi = gsi_last (*pre_p);
   maybe_fold_stmt (gsi);
 
-  /* When gimplifying the ap argument of va_arg, we might end up with
-   ap.1 = ap
-   va_arg (ap.1, 0B)
- We need to

Re: [PATCH 12/13] libstdc++, libgfortran gthr workaround for musl


On 04/20/2015 12:59 PM, Szabolcs Nagy wrote:

libgcc/gthr-posix.h uses weak reference logic to determine if libpthread
is linked into the application or not.  This is broken unless there is
special workaround with libc internal knowledge and even then static
linking needs further manual link time workaround, so this was disabled
for os/generic in libstdc++v3 and for musl in libgfortran.

The change minimizes the impact on other setups, but I think the weak
ref logic should be disabled by default, it is never entirely correct.
Conforming code can crash on a glibc setup too:

$ cat a.cpp
#include pthread.h
void(*f)(void) = (void(*)(void))pthread_key_create;
int main(){}
$ g++ -static a.cpp -lpthread
$ ./a.out
Segmentation fault

I reported this previously at
https://gcc.gnu.org/ml/gcc/2014-11/msg00246.html


libgfortran/Changelog:

2015-04-16  Szabolcs Nagy  szabolcs.n...@arm.com

* acinclude.m4 (GTHREAD_USE_WEAK): Define as 0 for *-*-musl*.
* configure: Regenerate.

libstdc++v3/Changelog:

2015-04-16  Szabolcs Nagy  szabolcs.n...@arm.com

* config/os/generic/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK): Define.
* configure.host (os_include_dir): Set to os/generic for linux-musl*.

OK.  Please install on the trunk.

jeff

[PATCH][AArch64] Properly cost FABD pattern

Hi all,

In rtx costs we do not handle the FP abs (minus (a b)) case which maps down
to a FABD instruction.
This patch fixes that. FABD behaves similarly to the FADD class of
instructions unlike simple FABS
which is closer to FNEG.

Tested aarch64-none-elf.
Ok for trunk?

Thanks,
Kyrill

2015-04-22  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle pattern for
fabd in ABS case.


aarch64-costs-fabd.patch
Description: Binary data

Re: [PATCH 10/13] fixincludes



On 20/04/15 21:38, Jeff Law wrote:

On 04/20/2015 12:58 PM, Szabolcs Nagy wrote:

No fixincludes are needed for musl.

fixincludes/Changelog:

2015-04-16  Gregor Richards  gregor.richa...@uwaterloo.ca

* mkfixinc.sh: Add *-musl* with no fixes.

OK.
jeff



I've committed this on Szabolcs' behalf with r222327.

Kyrill

Re: trunk test result inconsistencies


On 04/22/2015 06:28 AM, Andrew MacLeod wrote:

On 04/22/2015 08:19 AM, Jakub Jelinek wrote:

On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote:

Is anyone else seeing comparison problems on trunk?

I was having problems testing a patch on a 4/16 extraction, so last
night I
checked out a fresh trunk, built it, ran make check... then removed the
build directory, re-built it from scratch again. make check..  and get a
bunch of different results. I even used test_summary instead of
my own
home-grown scripts.

That is a buggy kernel, you must have missed a warning not to upgrade
your
kernel.  See
https://bugzilla.kernel.org/show_bug.cgi?id=96311
https://lkml.org/lkml/2015/4/13/600
Dunno what progress has been on that patch since then, don't see it in
latest Linus' git though.
AFAIK at least 3.19 and 4.0 kernels are affected.

A nice. Thanks.   I did not see the warning, no :-)  That pretty
seriously buggy!
It may also have been poor timing on installing the OS and initial
upgrading of all the packages
shell.devel.redhat.com/~law has a 4.0 kernel with the tty fix if you 
don't want to try and downgrade.


jeff

Re: [PATCH 5/5] libcc1: 'set debug compile': Display absolute GCC driver filename


On 04/21/2015 03:41 PM, Jan Kratochvil wrote:

Hi,

with the patches so far after
(gdb) set debug compile 1
one would get:
searching for compiler matching regex 
^(x86_64|i.86)(-[^-]*)?-linux(-gnu)?-gcc$
found compiler x86_64-unknown-linux-gnu-gcc
But I believe it is more readable to see:
searching for compiler matching regex 
^(x86_64|i.86)(-[^-]*)?-linux(-gnu)?-gcc$
found compiler /usr/bin/x86_64-unknown-linux-gnu-gcc

I do not think the change will have functionality impact, although the filename
gets used even for executing the command.


Jan


libcc1/ChangeLog
2015-04-21  Jan Kratochvil  jan.kratoch...@redhat.com

* findcomp.cc: Include system.h.
(search_dir): Return absolute filename.

OK.  Please install on the trunk.

Thanks,
Jeff

Re: [PATCH 2/13] musl libc config


On 04/20/2015 12:52 PM, Szabolcs Nagy wrote:

Add musl libc support to gcc and the command line option -mmusl following other
libc support code.

Note that -mlibc cannot be entirely correct: there are build time decisions
based on the default libc.

gcc/Changelog:

2015-04-16  Gregor Richards  gregor.richa...@uwaterloo.ca

* config.gcc (LIBC_MUSL): New tm_defines macro.
* config/linux.h (OPTION_MUSL): Define.
(INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
(INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
(INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.

* config/linux.opt (mmusl): New option.
* gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
(gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.

* gcc/configure: Regenerate.

OK for the trunk.  Please install.

jeff

Re: trunk test result inconsistencies

2015-04-22 Thread Andrew MacLeod


On 04/22/2015 08:33 AM, Jeff Law wrote:

On 04/22/2015 06:28 AM, Andrew MacLeod wrote:

On 04/22/2015 08:19 AM, Jakub Jelinek wrote:

On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote:

Is anyone else seeing comparison problems on trunk?

I was having problems testing a patch on a 4/16 extraction, so last
night I
checked out a fresh trunk, built it, ran make check... then removed 
the
build directory, re-built it from scratch again. make check..  and 
get a

bunch of different results. I even used test_summary instead of
my own
home-grown scripts.

That is a buggy kernel, you must have missed a warning not to upgrade
your
kernel.  See
https://bugzilla.kernel.org/show_bug.cgi?id=96311
https://lkml.org/lkml/2015/4/13/600
Dunno what progress has been on that patch since then, don't see it in
latest Linus' git though.
AFAIK at least 3.19 and 4.0 kernels are affected.

A nice. Thanks.   I did not see the warning, no :-)  That pretty
seriously buggy!
It may also have been poor timing on installing the OS and initial
upgrading of all the packages
shell.devel.redhat.com/~law has a 4.0 kernel with the tty fix if you 
don't want to try and downgrade.



Thanks, but the original 3.17.blah is still available at boot time, so I 
just switched to that.Should be good enough until the fix shows up.


Andrew

Re: [PATCH 1/5] libcc1: Make libcc1.so-libcc1.so.0


On 04/21/2015 03:41 PM, Jan Kratochvil wrote:

Hi,

the next [patch 3/5] will change the libcc1.so API.  I am not sure if the API
change gets approved that way but for such case:
(1) We really need to change GCC_FE_VERSION_0 - GCC_FE_VERSION_1, this
 feature is there for this purpose.  That is [patch 2/5].
(2) Currently GDB does only dlopen(libcc1.so) and then depending on which
 libcc1.so version it would find first it would succeed/fail.
 I guess it is more convenient to do dlopen(libcc1.so.1) instead
 (where .1=.x corresponds to GCC_FE_VERSION_x).
 That is this patch (with x=0).
 GCC_C_FE_LIBCC is used only by GDB.
(3) Currently there is no backward or forward compatibility although there
 could be one implemented.  Personally I think the 'compile' feature is
 still in experimental stage so that it is OK to require last releases.
 At least in Fedora we can keep GDB-GCC in sync.

GDB counterpart:
[PATCH 1/4] compile: Use libcc1.so-libcc1.so.0
https://sourceware.org/ml/gdb-patches/2015-04/msg00805.html
Message-ID: 20150421213635.14147.15653.st...@host1.jankratochvil.net


Jan


include/ChangeLog
2015-04-21  Jan Kratochvil  jan.kratoch...@redhat.com

* gcc-c-interface.h (GCC_C_FE_LIBCC): Quote it.  Append
GCC_FE_VERSION_0.

OK.  Please install on the trunk.
jeff

Re: [PATCH 11/13] unwind fix for musl


On 04/20/2015 12:58 PM, Szabolcs Nagy wrote:

dl_iterate_phdr depends on USE_PT_GNU_EH_FRAME.

I think USE_PT_GNU_EH_FRAME could be enabled more generally (whenever
libc provides dl_iterate_phdr), but I only made a conservative change.


libgcc/Changelog:

2015-04-16  Gregor Richards  gregor.richa...@uwaterloo.ca
Szabolcs Nagy  szabolcs.n...@arm.com

* unwind-dw2-fde-dip.c (USE_PT_GNU_EH_FRAME): Define it on
Linux if target provides dl_iterate_phdr.

OK.  Please install on the trunk.

At this point I think everything but the target files have been 
approved, right?


jeff

Re: [PATCH][PR65823] Fix va_arg ap_copy nop detection

On Wed, Apr 22, 2015 at 3:38 PM, Tom de Vries tom_devr...@mentor.com wrote:
 On 22-04-15 10:06, Richard Biener wrote:

 On Wed, Apr 22, 2015 at 9:41 AM, Tom de Vries tom_devr...@mentor.com
 wrote:

 Hi,

 this patch fixes PR65823.


 SNIP


 The patches fixes the problem by using operand_equal_p to do the equality
 test.


 Bootstrapped and reg-tested on x86_64.

 Did minimal non-bootstrap build on arm and reg-tested.

 OK for trunk?


 Hmm, ok for now.


 Committed.

 But I wonder if we can't fix things to not require that
 odd extra copy.


 Agreed, that would be good.

 In fact that we introduce ap.1 looks completely bogus to me


 AFAICT, it's introduced by gimplify_arg ('argp') because argp (a PARM_DECL)
 is not addressable.

 (and we don't in this case for arm).  Note that the pointer compare
 obviously
 fails because we unshare the expression.

 So ... what breaks if we simply remove this odd fixup?


 [ Originally mentioned at https://gcc.gnu.org/ml/gcc/2015-02/msg00011.html .
 ]

 I've committed gcc.target/x86_64/abi/callabi/vaarg-6.c specifically as a
 minimal version of this problem.

 If we remove the ap_copy fixup, at original we have:
 ...
 ;; Function do_cpy2 (null)
 {
   char * e;

 char * e;
   e = VA_ARG_EXPR argp;
   e = VA_ARG_EXPR argp;
   if (e != b)
 {
   abort ();
 }
 }
 ...

 and after gimplify we have:
 ...
 do_cpy2 (char * argp)
 {
   char * argp.1;
   char * argp.2;
   char * b.3;
   char * e;

   argp.1 = argp;
   e = VA_ARG (argp.1, 0B);
   argp.2 = argp;
   e = VA_ARG (argp.2, 0B);
   b.3 = b;
   if (e != b.3) goto D.1373; else goto D.1374;
   D.1373:
   abort ();
   D.1374:
 }
 ...

 The second VA_ARG uses argp.2, which is a copy of argp, which is unmodified
 by the first VA_ARG.


 Using attached _proof-of-concept_ patch, I get callabi.exp working without
 the ap_copy, still at the cost of one 'argp.1 = argp' copy though:
 ...
 do_cpy2 (char * argp)
 {
   char * argp.1;
   char * b.2;
   char * e;

   argp.1 = argp;
   e = VA_ARG (argp.1, 0B);
   e = VA_ARG (argp.1, 0B);
   b.2 = b;
   if (e != b.2) goto D.1372; else goto D.1373;
   D.1372:
   abort ();
   D.1373:
 }
 ...

 But perhaps there's an easier way?

Hum, simply

Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 222320)
+++ gcc/gimplify.c  (working copy)
@@ -9419,6 +9419,7 @@ gimplify_va_arg_expr (tree *expr_p, gimp
 }

   /* Transform a VA_ARG_EXPR into an VA_ARG internal function.  */
+  mark_addressable (valist);
   ap = build_fold_addr_expr_loc (loc, valist);
   tag = build_int_cst (build_pointer_type (type), 0);
   *expr_p = build_call_expr_internal_loc (loc, IFN_VA_ARG, type, 2, ap, tag);

pre-approved with removing the kludge.

Thanks,
Richard.

 Thanks,
 - Tom

[PATCH] Quiet down -Wlogical-op a bit (PR c/61534)

This patch stifles -Wlogical-op a bit: don't warn if either operand comes from
a macro expansion.  As the comment says, it doesn't fix the bug completely, but
it's a simple improvement.  I did this by introducing a new macro.

Bootstrapped/regtested on x86_64-linux, ok for trunk?
(Bootstrap with -Wlogical-op enabled does not pass yet.)

2015-04-22  Marek Polacek  pola...@redhat.com

PR c/61534
* input.h (from_macro_expansion_at): Define.

* c-common.c (warn_logical_operator): Bail if either operand comes
from a macro expansion.

* c-c++-common/pr61534-1.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 7fe7fa6..b09bbb8 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -1697,6 +1697,13 @@ warn_logical_operator (location_t location, enum 
tree_code code, tree type,
code != TRUTH_OR_EXPR)
 return;
 
+  /* We don't want to warn if either operand comes from a macro
+ expansion.  ??? This doesn't work with e.g. NEGATE_EXPR yet;
+ see PR61534.  */
+  if (from_macro_expansion_at (EXPR_LOCATION (op_left))
+  || from_macro_expansion_at (EXPR_LOCATION (op_right)))
+return;
+
   /* Warn if /|| are being used in a context where it is
  likely that the bitwise equivalent was intended by the
  programmer. That is, an expression such as op  MASK
diff --git gcc/input.h gcc/input.h
index 7a0483f..93eb6ed 100644
--- gcc/input.h
+++ gcc/input.h
@@ -70,6 +70,10 @@ extern location_t input_location;
header, but expanded in a non-system file.  */
 #define in_system_header_at(LOC) \
   (linemap_location_in_system_header_p (line_table, LOC))
+/* Return a positive value if LOCATION is the locus of a token that
+   comes from a macro expansion, O otherwise.  */
+#define from_macro_expansion_at(LOC) \
+  ((linemap_location_from_macro_expansion_p (line_table, LOC)))
 
 void dump_line_table_statistics (void);
 
diff --git gcc/testsuite/c-c++-common/pr61534-1.c 
gcc/testsuite/c-c++-common/pr61534-1.c
index e69de29..1e304f0 100644
--- gcc/testsuite/c-c++-common/pr61534-1.c
+++ gcc/testsuite/c-c++-common/pr61534-1.c
@@ -0,0 +1,13 @@
+/* PR c/61534 */
+/* { dg-options -Wlogical-op } */
+
+extern int xxx;
+#define XXX !xxx
+int
+test (void)
+{
+  if (XXX  xxx) /* { dg-bogus logical } */
+return 4;
+  else
+return 0;
+}

Marek

Re: trunk test result inconsistencies

On Wed, Apr 22, 2015 at 5:19 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote:
 Is anyone else seeing comparison problems on trunk?

 I was having problems testing a patch on a 4/16 extraction, so last night I
 checked out a fresh trunk, built it, ran make check... then removed the
 build directory, re-built it from scratch again. make check..  and get a
 bunch of different results. I even used test_summary instead of my own
 home-grown scripts.

 That is a buggy kernel, you must have missed a warning not to upgrade your
 kernel.  See
 https://bugzilla.kernel.org/show_bug.cgi?id=96311
 https://lkml.org/lkml/2015/4/13/600
 Dunno what progress has been on that patch since then, don't see it in
 latest Linus' git though.
 AFAIK at least 3.19 and 4.0 kernels are affected.

3.19.5-100/3.19.5-200 from Fedora 20/21 fix the bug.


-- 
H.J.

Re: [PATCH 2/5] libcc1: Use libcc1.so.0-libcc1.so.1


On 04/21/2015 03:41 PM, Jan Kratochvil wrote:

Hi,

see [patch 1/5], particularly:
(3) Currently there is no backward or forward compatibility although there
 could be one implemented.  Personally I think the 'compile' feature is
 still in experimental stage so that it is OK to require last releases.
 At least in Fedora we can keep GDB-GCC in sync.

GDB counterpart:
[PATCH 2/4] compile: Use libcc1.so.0-libcc1.so.1
https://sourceware.org/ml/gdb-patches/2015-04/msg00806.html
Message-ID: 20150421213642.14147.93210.st...@host1.jankratochvil.net


Jan


include/ChangeLog
2015-04-21  Jan Kratochvil  jan.kratoch...@redhat.com

* gcc-c-interface.h (GCC_C_FE_LIBCC): Update it to GCC_FE_VERSION_1.
* gcc-interface.h (enum gcc_base_api_version): Add GCC_FE_VERSION_1.

libcc1/ChangeLog
2015-04-21  Jan Kratochvil  jan.kratoch...@redhat.com

* Makefile.am (libcc1_la_LDFLAGS): Add version-info 1.
* Makefile.in: Regenerate.
* libcc1.cc (vtable, gcc_c_fe_context): Update it to GCC_FE_VERSION_1.

OK.  Please install on the trunk.

jeff

[PATCH] [AArch32] Additional bics patterns.

2015-04-22 Thread Alex Velenko

Hi,

This patch adds arm rtl patterns to generate bics instructions with shift.

Done full regression run on arm-none-eabi.

Is patch ok?

gcc/config

2015-04-22  Alex Velenko  alex.vele...@arm.com

  * arm/arm.md (andsi_not_shiftsi_si_scc): New pattern.
  * (andsi_not_shiftsi_si_scc_no_reuse): New pattern.

gcc/testsuite

2015-04-22  Alex Velenko alex.vele...@arm.com

  * gcc.target/arm/bics_1.c : New testcase.
  * gcc.target/arm/bics_2.c : New testcase.
  * gcc.target/arm/bics_3.c : New testcase.
  * gcc.target/arm/bics_4.c : New testcase.
---
 gcc/config/arm/arm.md | 42 ++
 gcc/testsuite/gcc.target/arm/bics_1.c | 54 +
 gcc/testsuite/gcc.target/arm/bics_2.c | 57 +++
 gcc/testsuite/gcc.target/arm/bics_3.c | 41 +
 gcc/testsuite/gcc.target/arm/bics_4.c | 49 ++
 5 files changed, 243 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/bics_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/bics_2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/bics_3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/bics_4.c

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 164ac13..51a149e 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2768,6 +2768,48 @@
  (const_string logic_shift_reg)))]
 )
 
+(define_insn andsi_not_shiftsi_si_scc_no_reuse
+  [(set (reg:CC_NOOV CC_REGNUM)
+   (compare:CC_NOOV
+   (and:SI (not:SI (match_operator:SI 0 shift_operator
+   [(match_operand:SI 1 s_register_operand r)
+(match_operand:SI 2 arm_rhs_operand rM)]))
+   (match_operand:SI 3 s_register_operand r))
+   (const_int 0)))
+   (clobber (match_scratch:SI 4 =r))]
+  TARGET_32BIT
+  bic%.%?\\t%4, %3, %1%S0
+  [(set_attr predicable yes)
+   (set_attr conds set)
+   (set_attr shift 1)
+   (set (attr type) (if_then_else (match_operand 2 const_int_operand )
+ (const_string logic_shift_imm)
+ (const_string logic_shift_reg)))]
+)
+
+(define_insn andsi_not_shiftsi_si_scc
+  [(parallel [(set (reg:CC_NOOV CC_REGNUM)
+   (compare:CC_NOOV
+   (and:SI (not:SI (match_operator:SI 0 shift_operator
+   [(match_operand:SI 1 s_register_operand r)
+(match_operand:SI 2 arm_rhs_operand rM)]))
+   (match_operand:SI 3 s_register_operand r))
+   (const_int 0)))
+   (set (match_operand:SI 4 s_register_operand =r)
+(and:SI (not:SI (match_op_dup 0
+[(match_dup 1)
+ (match_dup 2)]))
+(match_dup 3)))])]
+  TARGET_32BIT
+  bic%.%?\\t%4, %3, %1%S0
+  [(set_attr predicable yes)
+   (set_attr conds set)
+   (set_attr shift 1)
+   (set (attr type) (if_then_else (match_operand 2 const_int_operand )
+ (const_string logic_shift_imm)
+ (const_string logic_shift_reg)))]
+)
+
 (define_insn *andsi_notsi_si_compare0
   [(set (reg:CC_NOOV CC_REGNUM)
(compare:CC_NOOV
diff --git a/gcc/testsuite/gcc.target/arm/bics_1.c 
b/gcc/testsuite/gcc.target/arm/bics_1.c
new file mode 100644
index 000..173eb89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bics_1.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options -O2 --save-temps -fno-inline } */
+/* { dg-require-effective-target arm32 } */
+
+extern void abort (void);
+
+int
+bics_si_test1 (int a, int b, int c)
+{
+  int d = a  ~b;
+
+  /* { dg-final { scan-assembler-times bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+ 
2 } } */
+  if (d == 0)
+return a + c;
+  else
+return b + d + c;
+}
+
+int
+bics_si_test2 (int a, int b, int c)
+{
+  int d = a  ~(b  3);
+
+  /* { dg-final { scan-assembler-times bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, 
.sl \#3 1 } } */
+  if (d == 0)
+return a + c;
+  else
+return b + d + c;
+}
+
+int
+main ()
+{
+  int x;
+
+  x = bics_si_test1 (29, ~4, 5);
+  if (x != ((29  4) + ~4 + 5))
+abort ();
+
+  x = bics_si_test1 (5, ~2, 20);
+  if (x != 25)
+abort ();
+
+x = bics_si_test2 (35, ~4, 5);
+  if (x != ((35  ~(~4  3)) + ~4 + 5))
+abort ();
+
+  x = bics_si_test2 (96, ~2, 20);
+  if (x != 116)
+  abort ();
+
+  return 0;
+}
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/bics_2.c 
b/gcc/testsuite/gcc.target/arm/bics_2.c
new file mode 100644
index 000..740d7c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bics_2.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options -O2 --save-temps -fno-inline } */
+/* { dg-require-effective-target arm32 } */
+
+extern void abort (void);
+
+int
+bics_si_test1 (int a, int b, int c)
+{
+  int d = a  ~b;
+
+  /* { dg-final { scan-assembler-not bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+ } 
} */
+  /* { dg-final { scan-assembler-times bic\tr\[0-9\]+,

RE: [PATCH] [AArch32] Additional bics patterns.

Hi Alex,

On 22/04/15 14:12, Alex Velenko wrote:
 Hi,

 This patch adds arm rtl patterns to generate bics instructions with shift.

 Done full regression run on arm-none-eabi.

A bootstrap on arm-none-linux-gnueabihf would be nice too.



 Is patch ok?

 gcc/config

 2015-04-22  Alex Velenko  alex.vele...@arm.com

   * arm/arm.md (andsi_not_shiftsi_si_scc): New pattern.
   * (andsi_not_shiftsi_si_scc_no_reuse): New pattern.

the path to arm.md should be:
* config/arm/arm.md


  
 +(define_insn andsi_not_shiftsi_si_scc_no_reuse
 +  [(set (reg:CC_NOOV CC_REGNUM)
 + (compare:CC_NOOV
 + (and:SI (not:SI (match_operator:SI 0 shift_operator
 + [(match_operand:SI 1 s_register_operand r)
 +  (match_operand:SI 2 arm_rhs_operand rM)]))
 + (match_operand:SI 3 s_register_operand r))
 + (const_int 0)))
 +   (clobber (match_scratch:SI 4 =r))]
 +  TARGET_32BIT
 +  bic%.%?\\t%4, %3, %1%S0
 +  [(set_attr predicable yes)
 +   (set_attr conds set)
 +   (set_attr shift 1)
 +   (set (attr type) (if_then_else (match_operand 2 const_int_operand
)
 +   (const_string logic_shift_imm)
 +   (const_string logic_shift_reg)))]
 +)

Since this is a predicable instruction and has a 32-bit encoding
you should also set the 'predicable_short_it' attribute to 'no'
to prevent GCC from trying to put it inside an IT block when
compiling for ARMv8-A.


 +
 +(define_insn andsi_not_shiftsi_si_scc
 +  [(parallel [(set (reg:CC_NOOV CC_REGNUM)
 + (compare:CC_NOOV
 + (and:SI (not:SI (match_operator:SI 0 shift_operator
 + [(match_operand:SI 1 s_register_operand r)
 +  (match_operand:SI 2 arm_rhs_operand rM)]))
 + (match_operand:SI 3 s_register_operand r))
 + (const_int 0)))
 + (set (match_operand:SI 4 s_register_operand =r)
 +  (and:SI (not:SI (match_op_dup 0
 +  [(match_dup 1)
 +   (match_dup 2)]))
 +  (match_dup 3)))])]
 +  TARGET_32BIT
 +  bic%.%?\\t%4, %3, %1%S0
 +  [(set_attr predicable yes)
 +   (set_attr conds set)
 +   (set_attr shift 1)
 +   (set (attr type) (if_then_else (match_operand 2 const_int_operand
)
 +   (const_string logic_shift_imm)
 +   (const_string logic_shift_reg)))]
same comment about predicable_short_it.

Cheers,
Kyrill


 +)
 +
  (define_insn *andsi_notsi_si_compare0
[(set (reg:CC_NOOV CC_REGNUM)
   (compare:CC_NOOV

Re: [PATCH 4/5] libcc1: Add 'set compile-gcc'


On 04/21/2015 03:41 PM, Jan Kratochvil wrote:

as discussed in
How to use compile  execute function in GDB
https://sourceware.org/ml/gdb/2015-04/msg00026.html

GDB currently searches for /usr/bin/ARCH-OS-gcc and chooses one but one cannot
override which one.  GDB would provide new option 'set compile-gcc'.

This patch does not change the libcc1 API as it overloads the triplet_regexp
parameter of GCC's set_arguments according to:

+  if (access (triplet_regexp, X_OK) == 0)

GDB counterpart:
[PATCH 4/4] compile: Add 'set compile-gcc'
https://sourceware.org/ml/gdb-patches/2015-04/msg00808.html
Message-ID: 20150421213657.14147.60506.st...@host1.jankratochvil.net


Jan


include/ChangeLog
2015-04-21  Jan Kratochvil  jan.kratoch...@redhat.com

* gcc-interface.h (enum gcc_base_api_version): Add comment to
GCC_FE_VERSION_1.
(struct gcc_base_vtable): Describe triplet_regexp parameter overload
for set_arguments.

libcc1/ChangeLog
2015-04-21  Jan Kratochvil  jan.kratoch...@redhat.com

* libcc1.cc (libcc1_set_arguments): Implement filenames for
triplet_regexp.

OK.  Please install on the trunk.

jeff

Re: trunk test result inconsistencies

2015-04-22 Thread Jakub Jelinek

On Wed, Apr 22, 2015 at 06:16:24AM -0700, H.J. Lu wrote:
  I admit I have not tried, but am certainly not seeing Peter's fix
  in https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.19.5
  has it been fixed differently?  Not seeing anything in the Fedora kernel
  package %changelog either...
 
 
 * Wed Apr 15 2015 Josh Boyer jwbo...@fedoraproject.org
 - Add patch to fix tty closure race (rhbz 1208953)
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1208953

Ah, nice, thanks.  Now to get it fixed upstream too.

Jakub

Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state

2015-04-22 Thread Peter Bergner

On Tue, 2015-04-21 at 21:17 -0500, Segher Boessenkool wrote:
 On Tue, Apr 21, 2015 at 03:56:18PM -0500, Peter Bergner wrote:
  This patch also fixes some issues I hit with the tabortdc[i] and
  htm_m[ft]spr_mode patterns when used with -m32 -mpowerpc64.
 
 Running the testsuite, or did you actually try to _use_ -m32 -mpowerpc64?  :-)

Not with the testsuite.  I had some simple unit tests that basically
just returned the CR/SPR and hit some ICEs.





 Maybe you can fold tabortdc with tabortwc now?  Use one UNSPEC name
 for both, :GPR and wd?

Wouldn't that change the tabortwc pattern to use DImode rather
than SImode when compiled with -m64 or -m32 -mpowerpc64?
I'm not sure we want that.



  + case HTM_BUILTIN_TTEST: /* Alias for: tabortwci. 0,r0,0  */
  +   op[nopnds++] = GEN_INT (0);
  +   op[nopnds++] = gen_rtx_REG (SImode, 0);
  +   op[nopnds++] = GEN_INT (0);
 
 Is that really r0, isn't that (0|rA)?  [Too lazy to read the docs myself
 right now, sorry.]

The ISA doc shows:

  tabortwci. TO,RA,SI

  a - EXTS((RA)32:63)
  abort - 0
  CR0 - 0 || MSR(TS) || 0

  if a  EXTS(SI)  TO0 then abort - 1
  if a  EXTS(SI)  TO1 then abort - 1
  if a = EXTS(SI)  T02 then abort - 1
  if a u EXTS(SI)  TO3 then abort - 1
  if a u EXTS(SI)  TO4 then abort - 1

  ...

Given that I'm passing in a zero TO value, the second and third
operands are don't care values, so I'm just using r0 and 0 as
random input values.  I'm only interested in extracting the
MSR's Transaction Status (TS) bits and placing them into CR0.



  +   emit_insn (gen_movcc (subreg, cr));
  +   emit_insn (gen_lshrsi3 (scratch2, scratch1, GEN_INT (28)));
  +   emit_insn (gen_andsi3 (target, scratch2, GEN_INT (0xf)));
  + }
  + }
 
 Don't we have helper functions/expanders to do these moves?  Yuck.

Heh, I looked.  The only helper pattern was the movcc pattern, but
that placed the CR into bits 32-35 of the register.  I needed the
shift to move it down into the low nibble and I use the and, since
one of the move cr insns places two copies of the CR value into
bits 32-35 and 36-39.


  -/* { dg-final { scan-assembler-times tabortdc\\. 1 } } */
  -/* { dg-final { scan-assembler-times tabortdci\\. 1 } } */
  +/* { dg-final { scan-assembler-times tabortdc\\. 1 { target lp64 } } } */
  +/* { dg-final { scan-assembler-times tabortdci\\. 1 { target lp64 } } } 
  */
 
 This skips this test on -m32 -mpowerpc64, is that on purpose?

Ummm, not exactly.  :-)   Not that many people test that though.
I'll see if I can find a replacement for lp64 that covers that case.
If not, I'm not too torn up if we skip it for -m32 -mpowerpc64.


Peter

Re: [PATCH 1/13] libitm fixes for musl support



On 20/04/15 21:41, Jeff Law wrote:

On 04/20/2015 12:51 PM, Szabolcs Nagy wrote:

This are minor correctness fixes required for musl.

(fcntl.h is the standard header and always available on Linux,
sys/fcntl.h is just a legacy alias, so use the standard one.)

libitm/Changelog:

2015-04-16  Gregor Richards  gregor.richa...@uwaterloo.ca

* config/arm/hwcap.cc: Use fcntl.h instead of sys/fcntl.h.
* config/linux/x86/tls.h: Only use __GLIBC_PREREQ if defined.


OK.
jeff


I've committed this on Szabolcs' behalf with r222325.

Kyrill

Re: [PATCH 11/13] unwind fix for musl



On 22/04/15 14:20, Jeff Law wrote:

On 04/20/2015 12:58 PM, Szabolcs Nagy wrote:

dl_iterate_phdr depends on USE_PT_GNU_EH_FRAME.

I think USE_PT_GNU_EH_FRAME could be enabled more generally (whenever
libc provides dl_iterate_phdr), but I only made a conservative change.


libgcc/Changelog:

2015-04-16  Gregor Richards  gregor.richa...@uwaterloo.ca
Szabolcs Nagy  szabolcs.n...@arm.com

* unwind-dw2-fde-dip.c (USE_PT_GNU_EH_FRAME): Define it on
Linux if target provides dl_iterate_phdr.

OK.  Please install on the trunk.


I've committed this on Szabolcs' behalf with r222328.

Kyrill



At this point I think everything but the target files have been
approved, right?

jeff

Re: [PATCH 2/13] musl libc config



On 22/04/15 14:16, Jeff Law wrote:

On 04/20/2015 12:52 PM, Szabolcs Nagy wrote:

Add musl libc support to gcc and the command line option -mmusl following other
libc support code.

Note that -mlibc cannot be entirely correct: there are build time decisions
based on the default libc.

gcc/Changelog:

2015-04-16  Gregor Richards  gregor.richa...@uwaterloo.ca

* config.gcc (LIBC_MUSL): New tm_defines macro.
* config/linux.h (OPTION_MUSL): Define.
(INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
(INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
(INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.

* config/linux.opt (mmusl): New option.
* gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
(gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.

* gcc/configure: Regenerate.

OK for the trunk.  Please install.


I've committed this on Szabolcs' behalf with r222326
with slightly adjusted ChangeLog paths:

2015-04-22  Gregor Richards  gregor.richa...@uwaterloo.ca

* config.gcc (LIBC_MUSL): New tm_defines macro.
* config/linux.h (OPTION_MUSL): Define.
(INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
(INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
(INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.

* config/linux.opt (mmusl): New option.
* configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
(gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.

* configure: Regenerate.


Kyrill



jeff

Re: [WIP] OpenMP 4 NVPTX support

2015-04-22 Thread Bernd Schmidt


On 04/21/2015 05:58 PM, Jakub Jelinek wrote:


suggests that while it is nice that when building nvptx accel compiler
we build libgcc.a, libc.a, libm.a, libgfortran.a (and in the future hopefully 
libgomp.a),
nothing attempts to link those in :(.


I have that fixed; I expect I'll get around to posting this at some 
point now that stage1 is open.



Bernd

[PATCH] Fix up tm_clone_hasher

handle_cache_entry in tm_clone_hasher looks wrong: the condition
if (e != HTAB_EMPTY_ENTRY || e != HTAB_DELETED_ENTRY) is always true.  While
it could be fixed by just changing || into , I decided to follow suit and
do what we do in handle_cache_entry's elsewhere in the codebase.  I've fixed
a formatting issue below while at it.

Bootstrapped/regtested on x86_64-linux, ok for trunk?
I think this should also go into 5.1.

2015-04-22  Marek Polacek  pola...@redhat.com

* varasm.c (handle_cache_entry): Fix logic.

diff --git gcc/varasm.c gcc/varasm.c
index 1597de1..3fc0316 100644
--- gcc/varasm.c
+++ gcc/varasm.c
@@ -5779,21 +5779,20 @@ struct tm_clone_hasher : ggc_cache_hashertree_map *
   static hashval_t hash (tree_map *m) { return tree_map_hash (m); }
   static bool equal (tree_map *a, tree_map *b) { return tree_map_eq (a, b); }
 
-  static void handle_cache_entry (tree_map *e)
+  static void
+  handle_cache_entry (tree_map *e)
   {
-if (e != HTAB_EMPTY_ENTRY || e != HTAB_DELETED_ENTRY)
-  {
-   extern void gt_ggc_mx (tree_map *);
-   if (ggc_marked_p (e-base.from))
- gt_ggc_mx (e);
-   else
- e = static_casttree_map * (HTAB_DELETED_ENTRY);
-  }
+extern void gt_ggc_mx (tree_map *);
+if (e == HTAB_EMPTY_ENTRY || e == HTAB_DELETED_ENTRY)
+  return;
+else if (ggc_marked_p (e-base.from))
+  gt_ggc_mx (e);
+else
+  e = static_casttree_map * (HTAB_DELETED_ENTRY);
   }
 };
 
-static GTY((cache))
- hash_tabletm_clone_hasher *tm_clone_hash;
+static GTY((cache)) hash_tabletm_clone_hasher *tm_clone_hash;
 
 void
 record_tm_clone_pair (tree o, tree n)

Marek

Re: [PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only /proc/cpuinfo

On 22/04/15 12:46, Kyrill Tkachov wrote:

[Sorry for resending twice. My mail client glitched]

On 20/04/15 16:47, Kyrill Tkachov wrote:

Hi all,

This is an attempt to add native CPU detection to AArch64 GNU/Linux targets.
Similar to other ports we use SPEC rewriting to rewrite -m{cpu,tune,arch}=native
options into the appropriate CPU/architecture and the architecture extension
options
when appropriate (i.e. +crypto/+crc etc).

For CPU/architecture detection it gets a bit involved, especially when running
on a
big.LITTLE system. My proposed approach is to look at /proc/cpuinfo/ and search
for the
implementer id and part number fields that uniquely identify each core
(appropriate identifying
information is added to aarch64-cores.def). If we find two types of core we
have a big.LITTLE
system, so search through the core definitions extracted from aarch64-cores.def
to find if we
support such a combination (currently only cortex-a57.cortex-a53 and
cortex-a72.cortex-a53)
and make sure that the implementer id field matches up.

I tested this on a 4xCortex-A53 + 2xCortex-A57 big.LITTLE Ubuntu GNU/Linux
system.
There are two formats for /proc/cpuinfo/ that I'm aware of. The first (old) one
has the format:
--
processor: 0
processor: 1
processor: 2
processor: 3
processor: 4
processor: 5
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer: 0x41
CPU architecture: AArch64
CPU variant: 0x0
CPU part: 0xd03
--

In this format it lists the 6 cores but the CPU part it reports is only the one
for the core
from which /proc/cpuinfo was read from (!), in this case one of the Cortex-A53
cores.
This means we detect a different CPU depending on which
core GCC was invoked on. Not ideal really, but there's no more information that
we can extract.
Given the /proc/cpuinfo above, this patch will rewrite -mcpu=native into
-mcpu=cortex-a53+fp+simd+crypto+crc

The newer /proc/cpuinfo format proposed at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=44b82b7700d05a52cd983799d3ecde1a976b3bed
looks like this:

--
processor : 0
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 0

processor : 1
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 0

processor : 2
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 0

processor : 3
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 0

processor : 4
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd07
CPU revision: 0

processor : 5
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd07
CPU revision: 0
--

The Features field is used to detect the architectural features that we map to
GCC option extensions
i.e. +fp,+crypto,+simd,+crc etc.

Similarly, -march=native would be rewritten into
-march=armv8-a+fp+simd+crypto+crc
while -mtune=native into -march=cortex-a57.cortex-a53 (the arch extension
options are not valid
for -mtune).

If it detects more than one implementer ID or the implementer IDs not matching
up somewhere
or some other weirdness /proc/cpuinfo or fails to recognise the CPU it will
bail out and ignore
the option entirely (similarly to other ports).

The patch works fine with both /proc/cpuinfo formats although, as mentioned
above, it will not be
able to detect the big.LITTLE combination from the first format.

I've filled in the implementer ID and part numbers for the Cortex-A57,
Cortex-A53, Cortex-A72, X-Gene 1 cores,
but I don't have that info for thunderx or exynosm1. Could someone from Cavium
and Samsung help me out
here? At present this patch has some false dummy values that I'd like to fill
out before committing this.

Thanks Andrew and Evandro for the info.
I've added the numbers to the patch, so it should work on those systems.
I'm attaching the final patch here for review.

And resending here with a minor whitespace change in aarch64-cores.def to
make thunderx line up with the other entries. Thanks Evandro for pointing it
out!

Kyrill

Thanks,
Kyrill

2014-04-22 Kyrylo Tkachov kyrylo.tkac...@arm.com

* config.host (case ${host}): Add aarch64*-*-linux case.
*

RE: [PATCH v3][MIPS] fix CRT_CALL_STATIC_FUNCTION macro

2015-04-22 Thread Petar Jovanovic

-Original Message-
From: Maciej W. Rozycki [mailto:ma...@linux-mips.org] 
Sent: Tuesday, April 21, 2015 8:52 PM
To: Petar Jovanovic
Cc: gcc-patches@gcc.gnu.org; catherine_mo...@mentor.com;
matthew.fort...@imgtec.com
Subject: Re: [PATCH v3][MIPS] fix CRT_CALL_STATIC_FUNCTION macro

 I think this will best be 
 reduced to a link-only test on bare iron, hoping for a link failure.

I am not sure how we can reduce the test to a link failure (today), if
ld will not report an error (today).
What exactly is wrong with the run time test as is in the last patch?

As of ld issue you have mentioned, it has been reported - see BZ#18297
[1].

Regards,
Petar

[1] BZ#18297, https://sourceware.org/bugzilla/show_bug.cgi?id=18297

Re: [PATCH 12/13] libstdc++, libgfortran gthr workaround for musl



On 22/04/15 14:17, Jeff Law wrote:

On 04/20/2015 12:59 PM, Szabolcs Nagy wrote:

libgcc/gthr-posix.h uses weak reference logic to determine if libpthread
is linked into the application or not.  This is broken unless there is
special workaround with libc internal knowledge and even then static
linking needs further manual link time workaround, so this was disabled
for os/generic in libstdc++v3 and for musl in libgfortran.

The change minimizes the impact on other setups, but I think the weak
ref logic should be disabled by default, it is never entirely correct.
Conforming code can crash on a glibc setup too:

$ cat a.cpp
#include pthread.h
void(*f)(void) = (void(*)(void))pthread_key_create;
int main(){}
$ g++ -static a.cpp -lpthread
$ ./a.out
Segmentation fault

I reported this previously at
https://gcc.gnu.org/ml/gcc/2014-11/msg00246.html


libgfortran/Changelog:

2015-04-16  Szabolcs Nagy  szabolcs.n...@arm.com

* acinclude.m4 (GTHREAD_USE_WEAK): Define as 0 for *-*-musl*.
* configure: Regenerate.

libstdc++v3/Changelog:

2015-04-16  Szabolcs Nagy  szabolcs.n...@arm.com

* config/os/generic/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK): Define.
* configure.host (os_include_dir): Set to os/generic for linux-musl*.

OK.  Please install on the trunk.


I've committed this on Szabolcs' behalf with r222329.

Kyrill



jeff

Re: [PATCH 2/13] musl libc config

2015-04-22 Thread Ilya Verbin

On Wed, Apr 22, 2015 at 15:34:51 +0100, Kyrill Tkachov wrote:
 
 On 22/04/15 14:16, Jeff Law wrote:
 On 04/20/2015 12:52 PM, Szabolcs Nagy wrote:
 Add musl libc support to gcc and the command line option -mmusl following 
 other
 libc support code.
 
 Note that -mlibc cannot be entirely correct: there are build time 
 decisions
 based on the default libc.
 
 gcc/Changelog:
 
 2015-04-16  Gregor Richards  gregor.richa...@uwaterloo.ca
 
 * config.gcc (LIBC_MUSL): New tm_defines macro.
 * config/linux.h (OPTION_MUSL): Define.
 (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
 (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
 (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.
 
 * config/linux.opt (mmusl): New option.
 * gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
 (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.
 
 * gcc/configure: Regenerate.
 OK for the trunk.  Please install.
 
 I've committed this on Szabolcs' behalf with r222326
 with slightly adjusted ChangeLog paths:
 
 2015-04-22  Gregor Richards  gregor.richa...@uwaterloo.ca
 
 * config.gcc (LIBC_MUSL): New tm_defines macro.
 * config/linux.h (OPTION_MUSL): Define.
 (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
 (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
 (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.
 
 * config/linux.opt (mmusl): New option.
 * configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
 (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.
 
 * configure: Regenerate.

This caused:

https://gcc.gnu.org/ml/gcc-regression/2015-04/msg00262.html

  -- Ilya

Re: [PATCH 2/13] musl libc config

2015-04-22 Thread Szabolcs Nagy


On 22/04/15 16:36, Kyrylo Tkachov wrote:
 On 22/04/15 16:26, Ilya Verbin wrote:
 On Wed, Apr 22, 2015 at 15:34:51 +0100, Kyrill Tkachov wrote:
 On 22/04/15 14:16, Jeff Law wrote:
 On 04/20/2015 12:52 PM, Szabolcs Nagy wrote:
 Add musl libc support to gcc and the command line option -mmusl
 following other
 libc support code.

 Note that -mlibc cannot be entirely correct: there are build time
 decisions
 based on the default libc.

 gcc/Changelog:

 2015-04-16  Gregor Richards  gregor.richa...@uwaterloo.ca

   * config.gcc (LIBC_MUSL): New tm_defines macro.
   * config/linux.h (OPTION_MUSL): Define.
   (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
   (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
   (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.

   * config/linux.opt (mmusl): New option.
   * gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
   (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.

   * gcc/configure: Regenerate.
 OK for the trunk.  Please install.
 I've committed this on Szabolcs' behalf with r222326
 with slightly adjusted ChangeLog paths:

 2015-04-22  Gregor Richards  gregor.richa...@uwaterloo.ca

 * config.gcc (LIBC_MUSL): New tm_defines macro.
 * config/linux.h (OPTION_MUSL): Define.
 (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
 (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
 (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.

 * config/linux.opt (mmusl): New option.
 * configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
 (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.

 * configure: Regenerate.
 This caused:

 https://gcc.gnu.org/ml/gcc-regression/2015-04/msg00262.html
 
 Sorry about that. I've reverted the patch.
 Szabolcs, we should wait until the target-specific parts are
 approved and install it all together? Or did you want to #ifdef
 some parts out to make this patch more robust towards targets that
 don't support musl?
 

yes, i didn't realize that this depends on the target specific parts

i will prepare an updated patch that works if the target has no musl
dynamic linker name defined

sorry

RE: [PATCH 2/13] musl libc config


On 22/04/15 16:26, Ilya Verbin wrote:
 On Wed, Apr 22, 2015 at 15:34:51 +0100, Kyrill Tkachov wrote:
 On 22/04/15 14:16, Jeff Law wrote:
 On 04/20/2015 12:52 PM, Szabolcs Nagy wrote:
 Add musl libc support to gcc and the command line option -mmusl
following other
 libc support code.

 Note that -mlibc cannot be entirely correct: there are build time
decisions
 based on the default libc.

 gcc/Changelog:

 2015-04-16  Gregor Richards  gregor.richa...@uwaterloo.ca

* config.gcc (LIBC_MUSL): New tm_defines macro.
* config/linux.h (OPTION_MUSL): Define.
(INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
(INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
(INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.

* config/linux.opt (mmusl): New option.
* gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
(gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.

* gcc/configure: Regenerate.
 OK for the trunk.  Please install.
 I've committed this on Szabolcs' behalf with r222326
 with slightly adjusted ChangeLog paths:

 2015-04-22  Gregor Richards  gregor.richa...@uwaterloo.ca

 * config.gcc (LIBC_MUSL): New tm_defines macro.
 * config/linux.h (OPTION_MUSL): Define.
 (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
 (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
 (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.

 * config/linux.opt (mmusl): New option.
 * configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
 (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.

 * configure: Regenerate.
 This caused:

 https://gcc.gnu.org/ml/gcc-regression/2015-04/msg00262.html

Sorry about that. I've reverted the patch.
Szabolcs, we should wait until the target-specific parts are
approved and install it all together? Or did you want to #ifdef
some parts out to make this patch more robust towards targets that
don't support musl?

Kyrill
   -- Ilya

Re: [PATCH 2/2][ARM] PR/63870: Add a __builtin_lane_check


Ping (https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01436.html).

These are required for float16 patches posted at 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01332.html


Bootstrapped + check-gcc on arm-none-linux-gnueabihf.

Alan Lawrence wrote:
This parallels the present form of __builtin_aarch64_im_lane_boundsi, and allows 
to check lane indices for intrinsics that can otherwise be written in terms of 
GCC vector extensions.


The new builtin is not used in this patch but is used in my series of float16_t 
intrinsics (https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01434.html), and at 
some point in the future we should rewrite existing intrinsics (for other types) 
to this form too, but I'm leaving that for a later patch series :).


Cross-tested check-gcc on arm-none-eabi
Bootstrapped on arm-none-linux-gnueabihf cortex-a15

gcc/ChangeLog:

 * config/arm/arm-builtins.c (enum arm_builtins):
 Add ARM_BUILTIN_NEON_BASE and ARM_BUILTIN_NEON_LANE_CHECK.
 (ARM_BUILTIN_NEON_BASE): Rename macro to
 (ARM_BUILTIN_NEON_PATTERN_START): ...this.
 (arm_init_neon_builtins): Register __builtin_arm_lane_check.
 (arm_expand_neon_builtin): Handle ARM_BUILTIN_NEON_LANE_CHECK.


commit 3d5f2b80dc4527b4874bff458bb047946322028f
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Mon Dec 8 18:36:30 2014 +

Add __builtin_arm_lane_check

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 20d2198..3de2be7 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -546,12 +546,16 @@ enum arm_builtins
 #undef CRYPTO2
 #undef CRYPTO3
 
+  ARM_BUILTIN_NEON_BASE,
+  ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
+
 #include arm_neon_builtins.def
 
   ARM_BUILTIN_MAX
 };
 
-#define ARM_BUILTIN_NEON_BASE (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
+#define ARM_BUILTIN_NEON_PATTERN_START \
+(ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
 
 #undef CF
 #undef VAR1
@@ -910,7 +914,7 @@ arm_init_simd_builtin_scalar_types (void)
 static void
 arm_init_neon_builtins (void)
 {
-  unsigned int i, fcode = ARM_BUILTIN_NEON_BASE;
+  unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START;
 
   arm_init_simd_builtin_types ();
 
@@ -920,6 +924,15 @@ arm_init_neon_builtins (void)
  system.  */
   arm_init_simd_builtin_scalar_types ();
 
+  tree lane_check_fpr = build_function_type_list (void_type_node,
+		  intSI_type_node,
+		  intSI_type_node,
+		  NULL);
+  arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] =
+  add_builtin_function (__builtin_arm_lane_check, lane_check_fpr,
+			ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD,
+			NULL, NULL_TREE);
+
   for (i = 0; i  ARRAY_SIZE (neon_builtin_data); i++, fcode++)
 {
   bool print_type_signature_p = false;
@@ -2183,14 +2196,28 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
   return target;
 }
 
-/* Expand a Neon builtin. These are special because they don't have symbolic
+/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
+   Most of these are special because they don't have symbolic
constants defined per-instruction or per instruction-variant. Instead, the
required info is looked up in the table neon_builtin_data.  */
 static rtx
 arm_expand_neon_builtin (int fcode, tree exp, rtx target)
 {
+  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
+{
+  tree nlanes = CALL_EXPR_ARG (exp, 0);
+  gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
+  rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
+  if (CONST_INT_P (lane_idx))
+	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
+  else
+	error (%Klane index must be a constant immediate, exp);
+  /* Don't generate any RTL.  */
+  return const0_rtx;
+}
+
   neon_builtin_datum *d =
-		neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE];
+		neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
   enum insn_code icode = d-code;
   builtin_arg args[SIMD_MAX_BUILTIN_ARGS];
   int num_args = insn_data[d-code].n_operands;

[PATCH 2/14][ARM]Add float16x8_t type


Identical to https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01438.html .

Bootstrapped on arm-none-linux-gnueabihf.
commit bc582bd6a0ed7c7c91fc834603fc573ed745b1a7
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Mon Dec 8 18:40:24 2014 +

Add float16x8_t + V8HFmode support (regardless of -mfp16-format)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 3de2be7..2d97023 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -204,6 +204,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define di_UPDImode
 #define v16qi_UP V16QImode
 #define v8hi_UP  V8HImode
+#define v8hf_UP  V8HFmode
 #define v4si_UP  V4SImode
 #define v4sf_UP  V4SFmode
 #define v2di_UP  V2DImode
@@ -839,6 +840,7 @@ arm_init_simd_builtin_types (void)
   /* Continue with standard types.  */
   arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x2_t].eltype = float_type_node;
+  arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
   for (i = 0; i  nelts; i++)
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
index bcbd20b..b178ae6 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -44,5 +44,7 @@
 
   ENTRY (Float16x4_t, V4HF, none, 64, float16, 18)
   ENTRY (Float32x2_t, V2SF, none, 64, float32, 18)
+
+  ENTRY (Float16x8_t, V8HF, none, 128, float16, 19)
   ENTRY (Float32x4_t, V4SF, none, 128, float32, 19)
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4181f12..9de63fa 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26367,7 +26367,8 @@ arm_vector_mode_supported_p (machine_mode mode)
 {
   /* Neon also supports V2SImode, etc. listed in the clause below.  */
   if (TARGET_NEON  (mode == V2SFmode || mode == V4SImode || mode == V8HImode
-  || mode == V4HFmode || mode == V16QImode || mode == V4SFmode || mode == V2DImode))
+  || mode ==V4HFmode || mode == V16QImode || mode == V4SFmode
+  || mode == V2DImode || mode == V8HFmode))
 return true;
 
   if ((TARGET_NEON || TARGET_IWMMXT)
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8c10ea3..f0ef33f 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1104,7 +1104,7 @@ extern int arm_arch_crc;
 /* Modes valid for Neon Q registers.  */
 #define VALID_NEON_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V4SFmode || (MODE) == V2DImode)
+   || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)
 
 /* Structure modes valid for Neon registers.  */
 #define VALID_NEON_STRUCT_MODE(MODE) \
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b4100c8..a958f63 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -58,6 +58,7 @@ typedef __simd128_int8_t int8x16_t;
 typedef __simd128_int16_t int16x8_t;
 typedef __simd128_int32_t int32x4_t;
 typedef __simd128_int64_t int64x2_t;
+typedef __simd128_float16_t float16x8_t;
 typedef __simd128_float32_t float32x4_t;
 typedef __simd128_poly8_t poly8x16_t;
 typedef __simd128_poly16_t poly16x8_t;

[PATCH 1/14][ARM] Add float16x4_t intrinsics

This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01437.html , 
but fixes a wrong 'lane index out of bounds' error on vget_lane_f16 and 
vset_lane_f16, and drops vdup_n_f16 and vdup_lane_f16, as these are not in the 
ACLE spec. As previously, these use GCC vector extensions to maximise mid-end 
optimization, and do not attempt to support bigendian.


The vld1, vldN, vldN_lane and corresponding intrinsics follow in patch 4/14.

Bootstrapped + check-gcc on arm-none-linux-gnueabihf.

gcc/ChangeLog:

* config/arm/arm_neon.h (float16_t, vget_lane_f16, vset_lane_f16,
vcreate_f16, vld1_lane_f16, vld1_dup_f16, vreinterpret_p8_f16,
vreinterpret_p16_f16, vreinterpret_f16_p8, vreinterpret_f16_p16,
vreinterpret_f16_f32, vreinterpret_f16_p64, vreinterpret_f16_s64,
vreinterpret_f16_u64, vreinterpret_f16_s8, vreinterpret_f16_s16,
vreinterpret_f16_s32, vreinterpret_f16_u8, vreinterpret_f16_u16,
vreinterpret_f16_u32, vreinterpret_f32_f16, vreinterpret_p64_f16,
vreinterpret_s64_f16, vreinterpret_u64_f16, vreinterpret_s8_f16,
vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
vreinterpret_u16_f16, vreinterpret_u32_f16): New.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index c923e294cda2f8cb88e4b1ccca6fd4f13a3ed98d..b4100c88f83bc603377912b7aab085532178ef99 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -41,6 +41,7 @@ typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
 typedef __builtin_neon_di int64x1_t;
+typedef __builtin_neon_hf float16_t;
 typedef __simd64_float16_t float16x4_t;
 typedef __simd64_float32_t float32x2_t;
 typedef __simd64_poly8_t poly8x8_t;
@@ -5201,6 +5202,19 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev2si (__a, __b);
 }
 
+/* Functions cannot accept or return __FP16 types.  Even if the function
+   were marked always-inline so there were no call sites, the declaration
+   would nonetheless raise an error.  Hence, we must use a macro instead.  */
+
+#define vget_lane_f16(__v, __idx)		\
+  __extension__	\
+({		\
+  float16x4_t __vec = (__v);		\
+  __builtin_arm_lane_check (4, __idx);	\
+  float16_t __res = __vec[__idx];		\
+  __res;	\
+})
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vget_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -5333,6 +5347,16 @@ vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c);
 }
 
+#define vset_lane_f16(__e, __v, __idx)		\
+  __extension__	\
+({		\
+  float16_t __elem = (__e);			\
+  float16x4_t __vec = (__v);		\
+  __builtin_arm_lane_check (4, __idx);	\
+  __vec[__idx] = __elem;			\
+  __vec;	\
+})
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c)
 {
@@ -5479,6 +5503,12 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcreate_f32 (uint64_t __a)
 {
@@ -8796,6 +8826,12 @@ vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_lane_f16 (const float16_t * __a, float16x4_t __b, const int __c)
+{
+  return vset_lane_f16 (*__a, __b, __c);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c)
 {
@@ -8944,6 +8980,13 @@ vld1_dup_s32 (const int32_t * __a)
   return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_dup_f16 (const float16_t * __a)
+{
+  float16_t __f = *__a;
+  return (float16x4_t) { __f, __f, __f, __f };
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_dup_f32 (const float32_t * __a)
 {
@@ -11828,6 +11871,12 @@ vreinterpret_p8_p16 (poly16x4_t __a)
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vreinterpret_p8_f16 (float16x4_t __a)
+{
+  return (poly8x8_t) __a;
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vreinterpret_p8_f32 (float32x2_t __a)
 {
   return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
@@ -11896,6 +11945,12 @@ vreinterpret_p16_p8 (poly8x8_t __a)
 }
 
 __extension__ static __inline poly16x4_t __attribute__

[PATCH 3/14][ARM] Add float16x8_t intrinsics

This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01439.html , 
again fixing a wrong 'lane index out of bounds' error for vgetq_lane_f16 and 
vsetq_lane-f16 at -O0, and dropping vdupq_n_f16 and vdupq_lane_f16 as these are 
not in the ACLE spec.


The vld1, vldN, vldN_lane and corresponding intrinsics follow in patch 4/14.

Bootstrapped + check-gcc on arm-none-linux-gnueabihf.

gcc/ChangeLog:

* config/arm/arm_neon.h (vgetq_lane_f16, vsetq_lane_f16, vld1q_lane_f16,
vld1q_dup_f16, vreinterpretq_p8_f16, vreinterpretq_p16_f16,
vreinterpretq_f16_p8, vreinterpretq_f16_p16, vreinterpretq_f16_f32,
vreinterpretq_f16_p64, vreinterpretq_f16_p128, vreinterpretq_f16_s64,
vreinterpretq_f16_u64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
vreinterpretq_f16_s32, vreinterpretq_f16_u8, vreinterpretq_f16_u16,
vreinterpretq_f16_u32, vreinterpretq_f32_f16, vreinterpretq_p64_f16,
vreinterpretq_p128_f16, vreinterpretq_s64_f16, vreinterpretq_u64_f16,
vreinterpretq_s8_f16, vreinterpretq_s16_f16, vreinterpretq_s32_f16,
vreinterpretq_u8_f16, vreinterpretq_u16_f16, vreinterpretq_u32_f16):
New.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index a958f63ca3084bf7cfaf6420e535d69f50efa6b6..db73c70c6e4ca99db62ff4055a33bfe00db29039 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -5282,6 +5282,15 @@ vgetq_lane_s32 (int32x4_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev4si (__a, __b);
 }
 
+#define vgetq_lane_f16(__v, __idx)		\
+  __extension__	\
+({		\
+  float16x8_t __vec = (__v);		\
+  __builtin_arm_lane_check (8, __idx);	\
+  float16_t __res = __vec[__idx];		\
+  __res;	\
+})
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vgetq_lane_f32 (float32x4_t __a, const int __b)
 {
@@ -5424,6 +5433,16 @@ vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __c)
   return (int32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, __b, __c);
 }
 
+#define vsetq_lane_f16(__e, __v, __idx)		\
+  __extension__	\
+({		\
+  float16_t __elem = (__e);			\
+  float16x8_t __vec = (__v);		\
+  __builtin_arm_lane_check (8, __idx);	\
+  __vec[__idx] = __elem;			\
+  __vec;	\
+})
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __c)
 {
@@ -8907,6 +8926,12 @@ vld1q_lane_s32 (const int32_t * __a, int32x4_t __b, const int __c)
   return (int32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) __a, __b, __c);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_lane_f16 (const float16_t * __a, float16x8_t __b, const int __c)
+{
+  return vsetq_lane_f16 (*__a, __b, __c);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_lane_f32 (const float32_t * __a, float32x4_t __b, const int __c)
 {
@@ -9062,6 +9087,13 @@ vld1q_dup_s32 (const int32_t * __a)
   return (int32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) __a);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_dup_f16 (const float16_t * __a)
+{
+  float16_t __f = *__a;
+  return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_dup_f32 (const float32_t * __a)
 {
@@ -12856,6 +12888,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f16 (float16x8_t __a)
+{
+  return (poly8x16_t) __a;
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_p8_f32 (float32x4_t __a)
 {
   return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a);
@@ -12932,6 +12970,12 @@ vreinterpretq_p16_p8 (poly8x16_t __a)
 }
 
 __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_f16 (float16x8_t __a)
+{
+  return (poly16x8_t) __a;
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_p16_f32 (float32x4_t __a)
 {
   return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a);
@@ -13001,6 +13045,88 @@ vreinterpretq_p16_u32 (uint32x4_t __a)
   return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p8 (poly8x16_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p16 (poly16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_f32 (float32x4_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+#ifdef __ARM_FEATURE_CRYPTO

[PATCH 5/14][AArch64] Add basic fp16 support

This adds basic support for moving __fp16 values around, passing and returning, 
and operating on them by promoting to 32-bit floats. Also a few scalar testcases.


Note I've not got an fmov (immediate) variant, because there is no 'fmov hn, 
...' - the only way to load a 16-bit immediate is to reinterpret the bit pattern 
into some other type. Vector MOVs are turned off for the same reason. If this is 
practical it can follow in a separate patch.


My reading of ACLE suggests the type name to use is __fp16, rather than 
__builtin_aarch64_simd_hf. I can use the latter if that's preferable?


int-f16 conversions are a little odd, assembly

int_to_f16: scvtf d0, w0 fcvt h0, d0 ret

int_from_f16: fcvt s0, h0 fcvtzs w0, s0 ret

The spec is silent on the absence or existence of intermediate rounding steps, 
however, I don't think this matters: even float32_t offers s many more bits 
than __fp16, that any integer which fits into the range of an __fp16 (i.e. is 
not infinite), can be expressed exactly as a float32_t without any loss of 
precision. So I think the above are OK. (if they can be optimized, that can 
follow in a later patch.)


Note that unlike ARM, where we support both IEEE and Alternative formats (and, 
somewhat-awkwardly, format-agnostic code too), here we are settling on IEEE 
format always. Technically, we should output an EABI attribute saying which 
format we are using here, however, aarch64 asm does not support the 
.eabi-attribute directive yet, so it seems reasonable to leave this while there 
is only one possible format.


Bootstrapped + check-gcc on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New.
(aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16.

* config/aarch64/aarch64-modes.def: Add HFmode.

* config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define
__ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS. Set bit 1 of __ARM_FP.

* config/aarch64/aarch64.c (aarch64_init_libfuncs,
aarch64_promoted_type): New.

(aarch64_float_const_representable_p): Disable HFmode.
(aarch64_mangle_type): Mangle half-precision floats to Dh.
(TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type.
(TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs.

* config/aarch64/aarch64.md (movmode): Include HFmode using GPF_F16.
(movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New.

* config/aarch64/iterators.md (GPF_F16): New.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/f16_convs_1.c: New test.
* gcc.target/aarch64/f16_convs_2.c: New test.
* gcc.target/aarch64/f16_movs_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 87f1ac2ec1e3c774782c567b20c673802ae90d99..5a7b112bd1fe77826bfb84383c86dceb6b1521e3 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -453,6 +453,9 @@ static struct aarch64_simd_type_info aarch64_simd_types [] = {
 };
 #undef ENTRY
 
+/* This type is not SIMD-specific; it is the user-visible __fp16.  */
+static tree aarch64_fp16_type_node = NULL_TREE;
+
 static tree aarch64_simd_intOI_type_node = NULL_TREE;
 static tree aarch64_simd_intEI_type_node = NULL_TREE;
 static tree aarch64_simd_intCI_type_node = NULL_TREE;
@@ -862,6 +865,12 @@ aarch64_init_builtins (void)
 = add_builtin_function (__builtin_aarch64_set_fpsr, ftype_set_fpr,
 			AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);
 
+  aarch64_fp16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (aarch64_fp16_type_node) = 16;
+  layout_type (aarch64_fp16_type_node);
+
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, __fp16);
+
   if (TARGET_SIMD)
 aarch64_init_simd_builtins ();
   if (TARGET_CRC32)
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index b17b90d90601ae0a631a78560da743720c4638ce..c30059b632fa8cb7fd9071917d3f581f0966a86d 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -36,6 +36,10 @@ CC_MODE (CC_DLTU);
 CC_MODE (CC_DGEU);
 CC_MODE (CC_DGTU);
 
+/* Half-precision floating point for arm_neon.h float16_t.  */
+FLOAT_MODE (HF, 2, 0);
+ADJUST_FLOAT_FORMAT (HF, ieee_half_format);
+
 /* Vector modes.  */
 VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI.  */
 VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bf59e40a64459f6daddef47a5f5214adfd92d9b6..67c37ebc0e06d22e524322e5a82b6bcde550bd93 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -57,7 +57,9 @@
   if (TARGET_FLOAT) \
 {   \
   builtin_define (__ARM_FEATURE_FMA); \
-

[PATCH 5/14][AArch64] Add basic fp16 support


[Resending with correct in-reply-to header]

This adds basic support for moving __fp16 values around, passing and returning, 
and operating on them by promoting to 32-bit floats. Also a few scalar testcases.


Note I've not got an fmov (immediate) variant, because there is no 'fmov hn, 
...' - the only way to load a 16-bit immediate is to reinterpret the bit pattern 
into some other type. Vector MOVs are turned off for the same reason. If this is 
practical it can follow in a separate patch.



My reading of ACLE suggests the type name to use is __fp16, rather than 
__builtin_aarch64_simd_hf. I can use the latter if that's preferable?



int-f16 conversions are a little odd, assembly

int_to_f16: scvtf d0, w0 fcvt h0, d0 ret

int_from_f16: fcvt s0, h0 fcvtzs w0, s0 ret

The spec is silent on the absence or existence of intermediate rounding steps, 
however, I don't think this matters: even float32_t offers s many more bits 
than __fp16, that any integer which fits into the range of an __fp16 (i.e. is 
not infinite), can be expressed exactly as a float32_t without any loss of 
precision. So I think the above are OK. (if they can be optimized, that can 
follow in a later patch.)



Note that unlike ARM, where we support both IEEE and Alternative formats (and, 
somewhat-awkwardly, format-agnostic code too), here we are settling on IEEE 
format always. Technically, we should output an EABI attribute saying which 
format we are using here, however, aarch64 asm does not support the 
.eabi-attribute directive yet, so it seems reasonable to leave this while there 
is only one possible format.



Bootstrapped + check-gcc on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New.
(aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16.

* config/aarch64/aarch64-modes.def: Add HFmode.

* config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define
__ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS. Set bit 1 of __ARM_FP.

* config/aarch64/aarch64.c (aarch64_init_libfuncs,
aarch64_promoted_type): New.

(aarch64_float_const_representable_p): Disable HFmode.
(aarch64_mangle_type): Mangle half-precision floats to Dh.
(TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type.
(TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs.

* config/aarch64/aarch64.md (movmode): Include HFmode using GPF_F16.
(movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New.

* config/aarch64/iterators.md (GPF_F16): New.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/f16_convs_1.c: New test.
* gcc.target/aarch64/f16_convs_2.c: New test.
* gcc.target/aarch64/f16_movs_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 87f1ac2ec1e3c774782c567b20c673802ae90d99..5a7b112bd1fe77826bfb84383c86dceb6b1521e3 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -453,6 +453,9 @@ static struct aarch64_simd_type_info aarch64_simd_types [] = {
 };
 #undef ENTRY
 
+/* This type is not SIMD-specific; it is the user-visible __fp16.  */
+static tree aarch64_fp16_type_node = NULL_TREE;
+
 static tree aarch64_simd_intOI_type_node = NULL_TREE;
 static tree aarch64_simd_intEI_type_node = NULL_TREE;
 static tree aarch64_simd_intCI_type_node = NULL_TREE;
@@ -862,6 +865,12 @@ aarch64_init_builtins (void)
 = add_builtin_function (__builtin_aarch64_set_fpsr, ftype_set_fpr,
 			AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);
 
+  aarch64_fp16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (aarch64_fp16_type_node) = 16;
+  layout_type (aarch64_fp16_type_node);
+
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, __fp16);
+
   if (TARGET_SIMD)
 aarch64_init_simd_builtins ();
   if (TARGET_CRC32)
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index b17b90d90601ae0a631a78560da743720c4638ce..c30059b632fa8cb7fd9071917d3f581f0966a86d 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -36,6 +36,10 @@ CC_MODE (CC_DLTU);
 CC_MODE (CC_DGEU);
 CC_MODE (CC_DGTU);
 
+/* Half-precision floating point for arm_neon.h float16_t.  */
+FLOAT_MODE (HF, 2, 0);
+ADJUST_FLOAT_FORMAT (HF, ieee_half_format);
+
 /* Vector modes.  */
 VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI.  */
 VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bf59e40a64459f6daddef47a5f5214adfd92d9b6..67c37ebc0e06d22e524322e5a82b6bcde550bd93 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -57,7 +57,9 @@
   if (TARGET_FLOAT) \
 {   \

[PATCH 7/14][AArch64] vld{2,3,4}{,_lane,_dup},vcombine,vcreate


gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_split_simd_combine): Add V4HFmode.
* config/aarch64/aarch64-builtins.c (VAR13, VAR14): New.
(aarch64_scalar_builtin_types, aarch64_init_simd_builtin_scalar_types):
Add __builtin_aarch64_simd_hf.
* config/aarch64/arm_neon.h (float16x4x2_t, float16x8x2_t,
float16x4x3_t, float16x8x3_t, float16x4x4_t, float16x8x4_t,
vcombine_f16, vst2_lane_f16, vst2q_lane_f16, vst3_lane_f16,
vst3q_lane_f16, vst4_lane_f16, vst4q_lane_f16, vld2_f16, vld2q_f16,
vld3_f16, vld3q_f16, vld4_f16, vld4q_f16, vld2_dup_f16, vld2q_dup_f16,
vld3_dup_f16, vld3q_dup_f16, vld4_dup_f16, vld4q_dup_f16,
vld2_lane_f16, vld2q_lane_f16, vld3_lane_f16, vld3q_lane_f16,
vld4_lane_f16, vld4q_lane_f16, vst2_f16, vst2q_f16, vst3_f16,
vst3q_f16, vst4_f16, vst4q_f16, vcreate_f16): New.

* config/aarch64/iterators.md (VALLDIF, Vtype, Vetype, Vbtype,
V_cmp_result, v_cmp_result): Add cases for V4HF and V8HF.
(VDC, Vdbl): Add V4HF.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vldN_1.c: Add float16x4_t and float16x8_t cases.
* gcc.target/aarch64/vldN_dup_1.c: Likewise.
* gcc.target/aarch64/vldN_lane_1.c: Likewise.

[PATCH 12/14][ARM/AArch64 Testsuite] Update advsimd-intrinsics tests to add float16 vectors

This is a fairly straightforward addition of a new type: I've added it in on 
equal status to the other types, because the various 
vector-load/store/element-manipulating intrinsics, are *not* conditional on HW 
support. (They just involve moving 16-bit chunks around, just like s16/u16/p16).


Thus, for many tests, this just involves adding default expected values of { 
0x ...}. While there are indeed more of such default values for 
float16x4/8 than any other type (because there are fewer intrinsics), there are 
plenty others, so this seems consistent. However, there is no vdup_n_f16 
intrinsic so I worked around this using a macro (yes, a bit ugh).


There are many check_GNU_style.sh violations here but I tried to be consistent 
with the existing code.


Passing on arm-none-linux-gnueabihf and aarch64-none-linux-gnu, 
aarch64_be-none-elf (following previous patch)


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (hfloat16_t,
vdup_n_f16): New.
(result, expected, CHECK_RESULTS, CHECK_RESULTS_NAMED, clean_results):
Add float16x4 and float16x8 cases.

DECL_VARIABLE_64BITS_VARIANTS: Add float16x4 case.
DECL_VARIABLE_128BITS_VARIANTS: Add float16x8 case.

* gcc.target/aarch64/advsimd-intrinsics/compute-data-ref.h (buffer,
buffer_pad, buffer_dup, buffer_dup_pad): Add float16x4 and float16x8.

* gcc.target/aarch64/advsimd-intrinsics/vaba.c: Add expected results
for float16x4 and float16x8.
* gcc.target/aarch64/advsimd-intrinsics/vabal.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vabd.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vabdl.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vabs.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vadd.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vaddl.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vaddw.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vand.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vbic.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vbsl.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcls.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vclz.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcnt.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise.

* gcc.target/aarch64/advsimd-intrinsics/vcombine.c: Add expected
results for float16x4 and float16x8.
(main): add test of float16x4 - float16x8 case.
* gcc.target/aarch64/advsimd-intrinsics/vcreate.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 80d6b5893cc33eaff4178a2f26aa53ccf1c48dda..3d1b36fd111e906f6e940ccf89900b44e79a68e9 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -7,6 +7,7 @@
 #include inttypes.h
 
 /* helper type, to help write floating point results in integer form.  */
+typedef uint16_t hfloat16_t;
 typedef uint32_t hfloat32_t;
 typedef uint64_t hfloat64_t;
 
@@ -132,6 +133,7 @@ static ARRAY(result, uint, 32, 2);
 static ARRAY(result, uint, 64, 1);
 static ARRAY(result, poly, 8, 8);
 static ARRAY(result, poly, 16, 4);
+static ARRAY(result, float, 16, 4);
 static ARRAY(result, float, 32, 2);
 static ARRAY(result, int, 8, 16);
 static ARRAY(result, int, 16, 8);
@@ -143,6 +145,7 @@ static ARRAY(result, uint, 32, 4);
 static ARRAY(result, uint, 64, 2);
 static ARRAY(result, poly, 8, 16);
 static ARRAY(result, poly, 16, 8);
+static ARRAY(result, float, 16, 8);
 static ARRAY(result, float, 32, 4);
 #ifdef __aarch64__
 static ARRAY(result, float, 64, 2);
@@ -160,6 +163,7 @@ extern ARRAY(expected, uint, 32, 2);
 extern ARRAY(expected, uint, 64, 1);
 extern ARRAY(expected, poly, 8, 8);
 extern ARRAY(expected, poly, 16, 4);
+extern ARRAY(expected, hfloat, 16, 4);
 extern ARRAY(expected, hfloat, 32, 2);
 extern ARRAY(expected, int, 8, 16);
 extern ARRAY(expected, int, 16, 8);
@@ -171,6 +175,7 @@ extern ARRAY(expected, uint, 32, 4);
 extern ARRAY(expected, uint, 64, 2);
 extern ARRAY(expected, poly, 8, 16);
 extern ARRAY(expected, poly, 16, 8);
+extern ARRAY(expected, hfloat, 16, 8);
 extern ARRAY(expected, hfloat, 32, 4);
 extern ARRAY(expected, hfloat, 64, 2);
 
@@ -187,6 +192,7 @@ extern ARRAY(expected, hfloat, 64, 2);
 CHECK(test_name, uint, 64, 1, PRIx64, expected, comment);		\
 CHECK(test_name, poly, 8, 8, PRIx8, expected, comment);		\
 CHECK(test_name, poly, 16, 4, PRIx16, expected, comment);		\
+CHECK_FP(test_name, float, 16, 4, PRIx16, expected, comment);	\
 CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment);	\
 	\

[PATCH 11/14][fold-const.c] Fix bigendian HFmode in native_interpret_real

native_interpret_real in fold_const.c has an assumption that floats are at least 
32-bits (on bigendian targets with UNITS_PER_WORD = 4). This patch relaxes that 
assumption (allowing e.g. 16-bit HFmode values).


On aarch64_be-none-elf, this fixes the float16x4_t variant of 
gcc.target/aarch64/advsimd-intrinsics/vcreate.c added in the next patch in series.


Bootstrapped + check-gcc on:
x86-unknown-linux-gnu
powerpc64-unknown-linux-gnu (gcc110 on compile farm, as I believe this is 
bigendian, as opposed to powerpc64le-unknown-linux-gnu on gcc112)

arm-none-linux-gnueabihf
aarch64-none-elf

It's not clear that any of those actually test this code, but the 
aarch64_be-none-elf tests in next patch definitely do ;).


gcc/ChangeLog:

fold-const.c (native_interpret_real): Correctly read floats of less
than 32 bits on bigendian targets.
commit f8ad02fecdb7b6f91bab77cc154a246bd719ac20
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Thu Apr 9 10:54:40 2015 +0100

Fix native_interpret_real for HFmode floats on Bigendian with UNITS_PER_WORD=4

(with missing space)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 6d085b1..52bc8e9 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -7625,7 +7625,7 @@ native_interpret_real (tree type, const unsigned char *ptr, int len)
 	offset += byte % UNITS_PER_WORD;
 	}
   else
-	offset = BYTES_BIG_ENDIAN ? 3 - byte : byte;
+	offset = BYTES_BIG_ENDIAN ? MIN (3, total_bytes - 1) - byte : byte;
   value = ptr[offset + ((bitpos / BITS_PER_UNIT)  ~3)];
 
   tmp[bitpos / 32] |= (unsigned long)value  (bitpos  31);

Re: [PATCH 1/2][ARM] PR/63870: Add qualifier to check lane bounds in expand


Ping (https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html).

These are required for float16 patches posted at 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01332.html .


Bootstrapped + check-gcc on arm-none-linux-gnueabihf.

Alan Lawrence wrote:
This is based loosely upon svn r217440, [AArch64] Add bounds checking to 
vqdm_lane intrinsics..., but applies to more intrinsics (including e.g. 
vget_lane), and does not do the endianness-flipping present on AArch64: the 
objective is to exactly preserve behaviour on all valid code. (Yes, the new 
qualifier may perhaps give us a location for flipping lanes according to 
endianness in the future, but I'm not doing that here.) Checks for lanes being 
in range for many insns are thus moved from assembly to expand time, with 
inlining history. For example, previous error message:


vqrdmulh_lane_s16_indices_1.c: In function 'test1':
vqrdmulh_lane_s16_indices_1.c:9:1: error: lane out of range
}
^

becomes:

In file included vqrdmulh_lane_s16_indices_1.c:3:0:
In function 'vqrdmulh_lane_s16',
inlined from 'test1' at 
gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_lane_s16_indices_1.c:8:10:
.../install/lib/gcc/arm-none-eabi/5.0.0/include/arm_neon.h:6882:10: error: lane 
-1 out of range 0 - 3

return (int16x4_t)builtin_neon_vqrdmulh_lanev4hi (a, b, c);

Note the question of how to common up tests with those in 
gcc.target/aarch64/simd/*_indices_1.c is not resolved by this patch.


Cross-tested check-gcc on arm-none-eabi
Bootstrapped on arm-none-linux-gnueabihf cortex-a15

gcc/ChangeLog:

 * config/arm/arm-builtins.c (enum arm_type_qualifiers):
 Add qualifier_lane_index.
 (arm_binop_imm_qualifiers, BINOP_IMM_QUALIFIERS): New.
 (arm_getlane_qualifiers): Use qualifier_lane_index.
 (arm_lanemac_qualifiers): Rename to...
 (arm_mac_n_qualifiers): ...this.
 (LANEMAC_QUALIFIERS): Rename to...
 (MAC_N_QUALIFIERS): ...this.
 (arm_mac_lane_qualifiers, MAC_LANE_QUALIFIERS): New.
 (arm_setlane_qualifiers): Use qualifier_lane_index.
 (arm_ternop_imm_qualifiers, TERNOP_IMM_QUALIFIERS): New.
 (enum builtin_arg): Add NEON_ARG_LANE_INDEX.
 (arm_expand_neon_args): Handle NEON_ARG_LANE_INDEX.
 (arm_expand_neon_builtin): Handle qualifier_lane_index.

 * config/arm/arm-protos.h (neon_lane_bounds): Add const_tree parameter.
 * config/arm/arm.c (bounds_check): Likewise, improve error message.
 (neon_lane_bounds, neon_const_bounds): Add arguments to bounds_check.
 * config/arm/arm_neon_builtins.def (vshrs_n, vshru_n, vrshrs_n,
 vrshru_n, vshrn_n, vrshrn_n, vqshrns_n, vqshrnu_n, vqrshrns_n,
 vqrshrnu_n, vqshrun_n, vqrshrun_n, vshl_n, vqshl_s_n, vqshl_u_n,
 vqshlu_n, vshlls_n, vshllu_n): Change qualifiers to BINOP_IMM.
 (vsras_n, vsrau_n, vrsras_n, vrsrau_n, vsri_n, vsli_n): Change
 qualifiers to TERNOP_IMM.
 (vdup_lane): Change qualifiers to GETLANE.
 (vmla_lane, vmlals_lane, vmlalu_lane, vqdmlal_lane, vmls_lane,
 vmlsls_lane, vmlslu_lane, vqdmlsl_lane): Change qualifiers to MAC_LANE.
 (vmla_n, vmlals_n, vmlalu_n, vqdmlal_n, vmls_n, vmlsls_n, vmlslu_n,
 vqdmlsl_n): Change qualifiers to MAC_N.

 * config/arm/neon.md (neon_vget_lanemode, neon_vget_laneumode,
 neon_vget_lanedi, neon_vget_lanev2di, neon_vset_lanemode,
 neon_vset_lanedi, neon_vdup_lanemode, neon_vdup_lanedi,
 neon_vdup_lanev2di, neon_vmul_lanemode, neon_vmul_lanemode,
 neon_vmullsup_lanemode, neon_vqdmull_lanemode,
 neon_vqrdmulh_lanemode, neon_vqrdmulh_lanemode,
 neon_vmla_lanemode, neon_vmla_lanemode, neon_vmlalsup_lanemode,
 neon_vqdmlal_lanemode, neon_vmls_lanemode, neon_vmls_lanemode,
 neon_vmlslsup_lanemode, neon_vqdmlsl_lanemode):
 Remove call to neon_lane_bounds.


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 7a45113..20d2198 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -89,7 +89,9 @@ enum arm_type_qualifiers
   /* qualifier_const_pointer | qualifier_map_mode  */
   qualifier_const_pointer_map_mode = 0x86,
   /* Polynomial types.  */
-  qualifier_poly = 0x100
+  qualifier_poly = 0x100,
+  /* Lane indices - must be within range of previous argument = a vector.  */
+  qualifier_lane_index = 0x200
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -120,21 +122,40 @@ arm_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 
 /* T (T, immediate).  */
 static enum arm_type_qualifiers
-arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate };
+#define BINOP_IMM_QUALIFIERS (arm_binop_imm_qualifiers)
+
+/* T (T, lane index).  */
+static enum arm_type_qualifiers
+arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_lane_index };
 #define GETLANE_QUALIFIERS (arm_getlane_qualifiers)
 
 /* T (T, T, T, immediate).  */
 static enum arm_type_qualifiers

[PATCH 8/14][AArch64]Add vreinterpret, float_truncate_lo/hi, vget_low/high


gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_v2sf):
Reparameterize to...
(aarch64_float_truncate_lo_mode): ...this, for both V2SF and V4HF.
(aarch64_float_truncate_hi_v4sf): Reparameterize to...
(aarch64_float_truncate_hi_Vdbl): ...this, for both V4SF and V8HF.

* config/aarch64/aarch64-simd-builtins.def (float_truncate_hi_): Add
v8hf variant.
(float_truncate_lo_): Use BUILTIN_VDF iterator.

* config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16,
vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16,
vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32,
vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32,
vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16,
vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32,
vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32,
vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16,
vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16,
vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16,
vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
vreinterpret_u16_f16, vreinterpret_u32_f16, vget_low_f16, vget_high_f16,
vcvt_f16_f32, vcvt_high_f16_f32): New.

* config/aarch64/iterators.md (VDF, Vdtype): New.
(VWIDE, Vmwtype): Add cases for V4HF and V2SF.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vget_high_1.c: Add float16x8-float16x4 case.
* gcc.target/aarch64/vget_low_1.c: Likewise.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6298063d13c444d9b7c6cb0c14cfabce611f0d56..ea84055476c9e56e78d1b843e0b028e85a672ee6 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -4857,6 +4857,12 @@ vsetq_lane_u64 (uint64_t __elem, uint64x2_t __vec, const int __index)
   uint64x1_t lo = vcreate_u64 (vgetq_lane_u64 (tmp, 0));  \
   return vreinterpret_##__TYPE##_u64 (lo);
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vget_low_f16 (float16x8_t __a)
+{
+  __GET_LOW (f16);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vget_low_f32 (float32x4_t __a)
 {
@@ -4936,6 +4942,12 @@ vget_low_u64 (uint64x2_t __a)
   uint64x1_t hi = vcreate_u64 (vgetq_lane_u64 (tmp, 1));	\
   return vreinterpret_##__TYPE##_u64 (hi);
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vget_high_f16 (float16x8_t __a)
+{
+  __GET_HIGH (f16);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vget_high_f32 (float32x4_t __a)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/vget_high_1.c b/gcc/testsuite/gcc.target/aarch64/vget_high_1.c
index 4cb872da2cd269df5290a6af928ed958c4fecd09..b6b57e0c5468dbf571ec9e9196ac2d0fa3754d7a 100644
--- a/gcc/testsuite/gcc.target/aarch64/vget_high_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vget_high_1.c
@@ -14,6 +14,7 @@ VARIANT (int8_t, 8, int8x8_t, int8x16_t, s8)		\
 VARIANT (int16_t, 4, int16x4_t, int16x8_t, s16)		\
 VARIANT (int32_t, 2, int32x2_t, int32x4_t, s32)		\
 VARIANT (int64_t, 1, int64x1_t, int64x2_t, s64)		\
+VARIANT (float16_t, 4, float16x4_t, float16x8_t, f16)	\
 VARIANT (float32_t, 2, float32x2_t, float32x4_t, f32)	\
 VARIANT (float64_t, 1, float64x1_t, float64x2_t, f64)
 
@@ -51,6 +52,8 @@ main (int argc, char **argv)
   int16_t int16_t_data[8] = { -17, 19, 3, -999, 44048, 505, , 1000};
   int32_t int32_t_data[4] = { 123456789, -987654321, -135792468, 975318642 };
   int64_t int64_t_data[2] = {0xfedcba9876543210LL, 0xdeadbabecafebeefLL };
+  float16_t float16_t_data[8] = { 1.25, 4.5, 7.875, 2.3125, 5.675, 8.875,
+  3.6875, 6.75};
   float32_t float32_t_data[4] = { 3.14159, 2.718, 1.414, 100.0 };
   float64_t float64_t_data[2] = { 1.0100100011, 12345.6789 };
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vget_low_1.c b/gcc/testsuite/gcc.target/aarch64/vget_low_1.c
index f8016ef73124981f7042957521f42754566e9518..2223676521c4c10b2d839746873eb559559d76ba 100644
--- a/gcc/testsuite/gcc.target/aarch64/vget_low_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vget_low_1.c
@@ -14,6 +14,7 @@ VARIANT (int8_t, 8, int8x8_t, int8x16_t, s8)		\
 VARIANT (int16_t, 4, int16x4_t, int16x8_t, s16)		\
 VARIANT (int32_t, 2, int32x2_t, int32x4_t, s32)		\
 VARIANT (int64_t, 1, int64x1_t, int64x2_t, s64)		\
+VARIANT (float16_t, 4, float16x4_t, float16x8_t, f16)	\
 VARIANT (float32_t, 2, float32x2_t, float32x4_t, f32)	\
 VARIANT (float64_t, 1, float64x1_t, float64x2_t, f64)
 
@@ -51,6 +52,8 @@ main (int argc, char **argv)
   int16_t int16_t_data[8] = { -17, 19, 3, -999, 44048, 505, , 1000};
   int32_t int32_t_data[4] = { 123456789, -987654321, -135792468,

[PATCH 10/14][AArch64] Add vcvt(_high)?_f32_f16 intrinsics

This adds the two remaining widening intrinsics, first adding patterns in 
aarch64-simd.md, then entries in aarch64-simd-builtins.def, and finally 
intrinsics in arm_neon.h .


Note this changes the vector indices present in the RTL on bigendian for float 
vec_unpacks, to be the same as for integer vec_unpacks. This appears consistent 
with the usage of VEC_UNPACK_(FLOAT_)?EXPR in tree-vect-stmts.c, which uses a 
different EXPR for the same half of the vector depending on endianness. I was 
not able to construct a testcase where the RTL here mattered (i.e. where the RTL 
was constant-folded, but the tree had not been), but the correctness can be seen 
from a testcase:


double d[4];
void
bar (float *f)
{
  for (int i = 0; i  4; i++)
d[i] = f[i];
}

which used to produced as final RTL (-O3)

(insn:TI 8 10 12 (set (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])
(float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float 
*)f_6(D)] ] [77])

(parallel [
(const_int 2 [0x2])
(const_int 3 [0x3])
] test.c:40 1274 {vec_unpacks_hi_v4sf}
 (expr_list:REG_EQUIV (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double 
*)d]+0 S16 A64])

(nil)))
(insn:TI 12 8 11 (set (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])
(float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float 
*)f_6(D)] ] [77])

(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
] test.c:40 1272 {vec_unpacks_lo_v4sf}
 (expr_list:REG_EQUIV (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79])
(const_int 16 [0x10])) [2 MEM[(double *)d + 16B]+0 S16 A64])
(nil)))
(insn:TI 11 12 15 (set (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double *)d]+0 
S16 A64])
(reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])) test.c:40 808 
{*aarch64_simd_movv2df}

 (expr_list:REG_DEAD (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])
(nil)))
(insn:TI 15 11 22 (set (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79])
(const_int 16 [0x10])) [2 MEM[(double *)d + 16B]+0 S16 A64])
(reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])) test.c:40 808 
{*aarch64_simd_movv2df}

 (expr_list:REG_DEAD (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])

i.e. apparently storing vector elements 2 and 3 to the address of d, and elems 
0+1 to address (d+16). Of course this was flipped back again to be correct at 
assembly time, but following this patch the RTL indices are also correct (elems 
0+1 to address d, elems 2+3 to address d+16).


gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_vec_unpacks_lo_mode,
aarch64_simd_vec_unpacks_hi_mode): New insn.
(vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf): Delete insn.
(vec_unpacks_lo_mode, vec_unpacks_hi_mode): New expand.
(aarch64_float_extend_lo_v2df): Rename to...
(aarch64_float_extend_lo_Vwide): this, using VDF and so adding V4SF.

* config/aarch64/aarch64-simd-builtins.def (vec_unpacks_hi): Add v8hf.
(float_extend_lo): Add v4sf.

* config/aarch64/arm_neon.h (vcvt_f32_f16, vcvt_high_f32_f16): New.
* config/aarch64/iterators.md (VQ_HSF): New iterator.
(VWIDE, Vwtype, Vhalftype): Add V8HF, V4SF.
(Vwide): New mode_attr.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 604bfa20bf838ee04ef0e1dda0b47b55dbdd82a6..1eefb37c2eba37aecee6ccae100274c5a8cc5ae3 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -360,11 +360,11 @@
  only ever used for the int64x1_t intrinsic, there is no scalar version.  */
   BUILTIN_VALLDI (UNOP, abs, 2)
 
-  VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
+  VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
   VAR1 (BINOP, float_truncate_hi_, 0, v8hf)
   
-  VAR1 (UNOP, float_extend_lo_, 0, v2df)
+  VAR2 (UNOP, float_extend_lo_, 0, v2df, v4sf)
   BUILTIN_VDF (UNOP, float_truncate_lo_, 0)
   
   /* Implemented by aarch64_ld1VALL_F16:mode.  */
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 161396a331cab777bb2108f86c39b74557be4abc..17a5d5f8c757833a7ed387083f8076af2c8cad66 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1677,36 +1677,57 @@
 
 ;; Float widening operations.
 
-(define_insn vec_unpacks_lo_v4sf
-  [(set (match_operand:V2DF 0 register_operand =w)
-	(float_extend:V2DF
-	  (vec_select:V2SF
-	(match_operand:V4SF 1 register_operand w)
-	(parallel [(const_int 0) (const_int 1)])
-	  )))]
+(define_insn aarch64_simd_vec_unpacks_lo_mode
+  [(set (match_operand:VWIDE 0 register_operand =w)
+(float_extend:VWIDE (vec_select:VHALF
+			   (match_operand:VQ_HSF 1 register_operand w)
+			   (match_operand:VQ_HSF 2 vect_par_cnst_lo_half )

[PATCH 13/14][ARM/AArch64 testsuite] Use gcc-dg-runtest in advsimd-intrinsics.exp

In the first revision of Christophe Lyon's advsimd-intrinsics tests, 
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00532.html , both gcc-dg-runtest 
(to assemble only) and c-torture-execute were used. In review the gcc-dg-runtest 
part was then dropped, and execution tests continued using c-torture-execute. 
However, c-torture-execute ignores e.g. dg-options directives in the individual 
test files, whereas gcc-dg-runtest does not.


This patch switches to gcc-dg-runtest (with dg-do-what-default = run) for all 
tests, allowing use of e.g. dg-options (in testsuite patch 3/3). This generally 
seems to work OK - indeed I also dropped the parallelization-disabling code - 
and the new advsimd-intrinsics.exp now follows gcc.c-torture/compile/compile.exp 
and gcc.c-torture/execute/execute.exp very closely. However, there are side 
effects, of which we should be aware, but with which I think we can live, 
specifically:


(1) the lines in the test log change from...

PASS: gcc.target/aarch64/advsimd-intrinsics/vcombine.c compilation,  -O1
PASS: gcc.target/aarch64/advsimd-intrinsics/vcombine.c execution,  -O1

...to...

PASS: gcc.target/aarch64/advsimd-intrinsics/vcombine.c   -O1  execution test

(that is, the compilation line disappears, but the (test for excess errors) 
remains unchanged)


(2) The -Og -g variant is no longer tested;  all of -O0, -O1, -O2, -O2 -flto 
-fno-use-linker-plugin -flto-partition=none, -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects, -O3 -fomit-frame-pointer, -O3 -g, -Os still are. My 
feeling is that this set of options is exhaustive enough.


Cross-tested arm-none-eabi, aarch64-none-elf, aarch64_be-none-elf; natively 
tested arm-none-linux-gnueabihf and aarch64-none-linux-gnu.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
Use gcc-dg-runtest.
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
index 551299ef5634b7fea68d2e2f813ab61270b59e35..aa5d4d5ec2f45fa745fac152125e1badc2f2df43 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
@@ -26,10 +26,6 @@ if {![istarget arm*-*-*]
 load_lib gcc-dg.exp
 
 # Initialize `dg'.
-load_lib c-torture.exp
-load_lib target-supports.exp
-load_lib torture-options.exp
-
 dg-init
 
 if {[istarget arm*-*-*]
@@ -37,29 +33,14 @@ if {[istarget arm*-*-*]
   return
 }
 
-torture-init
-set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
-
 # Make sure Neon flags are provided, if necessary.
-set additional_flags [add_options_for_arm_neon ]
+set additional_flags [add_options_for_arm_neon -w]
 
 # Main loop.
-foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
-# If we're only testing specific files and this isn't one of them, skip it.
-if ![runtest_file_p $runtests $src] then {
-	continue
-}
-
-# runtest_file_p is already run above, and the code below can run
-# runtest_file_p again, make sure everything for this test is
-# performed if the above runtest_file_p decided this runtest
-# instance should execute the test
-gcc_parallel_test_enable 0
-c-torture-execute $src $additional_flags
-gcc-dg-runtest $src  $additional_flags
-gcc_parallel_test_enable 1
-}
+set saved-dg-do-what-default ${dg-do-what-default}
+set dg-do-what-default run
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]]  ${additional_flags}
+set dg-do-what-default ${saved-dg-do-what-default}
 
 # All done.
-torture-finish
 dg-finish

Re: [PATCH 7/14][AArch64] vld{2,3,4}{,_lane,_dup},vcombine,vcreate


Alan Lawrence wrote:

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_split_simd_combine): Add V4HFmode.
* config/aarch64/aarch64-builtins.c (VAR13, VAR14): New.
(aarch64_scalar_builtin_types, aarch64_init_simd_builtin_scalar_types):
Add __builtin_aarch64_simd_hf.
* config/aarch64/arm_neon.h (float16x4x2_t, float16x8x2_t,
float16x4x3_t, float16x8x3_t, float16x4x4_t, float16x8x4_t,
vcombine_f16, vst2_lane_f16, vst2q_lane_f16, vst3_lane_f16,
vst3q_lane_f16, vst4_lane_f16, vst4q_lane_f16, vld2_f16, vld2q_f16,
vld3_f16, vld3q_f16, vld4_f16, vld4q_f16, vld2_dup_f16, vld2q_dup_f16,
vld3_dup_f16, vld3q_dup_f16, vld4_dup_f16, vld4q_dup_f16,
vld2_lane_f16, vld2q_lane_f16, vld3_lane_f16, vld3q_lane_f16,
vld4_lane_f16, vld4q_lane_f16, vst2_f16, vst2q_f16, vst3_f16,
vst3q_f16, vst4_f16, vst4q_f16, vcreate_f16): New.

* config/aarch64/iterators.md (VALLDIF, Vtype, Vetype, Vbtype,
V_cmp_result, v_cmp_result): Add cases for V4HF and V8HF.
(VDC, Vdbl): Add V4HF.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vldN_1.c: Add float16x4_t and float16x8_t cases.
* gcc.target/aarch64/vldN_dup_1.c: Likewise.
* gcc.target/aarch64/vldN_lane_1.c: Likewise.




diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index ac05c43faf47e2de6b20a976defe2b269f7f1633..e791533f4c9e26a0c0c5ddf78f015f334f1ca2ed 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -314,6 +314,12 @@ aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define VAR12(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \
   VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, MAP, L)
+#define VAR13(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR12 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR1 (T, N, MAP, M)
+#define VAR14(T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M, N) \
+  VAR13 (T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR1 (T, X, MAP, N)
 
 #include aarch64-builtin-iterators.h
 
@@ -391,6 +397,7 @@ const char *aarch64_scalar_builtin_types[] = {
   __builtin_aarch64_simd_qi,
   __builtin_aarch64_simd_hi,
   __builtin_aarch64_simd_si,
+  __builtin_aarch64_simd_hf,
   __builtin_aarch64_simd_sf,
   __builtin_aarch64_simd_di,
   __builtin_aarch64_simd_df,
@@ -678,6 +685,8 @@ aarch64_init_simd_builtin_scalar_types (void)
 	 __builtin_aarch64_simd_qi);
   (*lang_hooks.types.register_builtin_type) (intHI_type_node,
 	 __builtin_aarch64_simd_hi);
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node,
+	 __builtin_aarch64_simd_hf);
   (*lang_hooks.types.register_builtin_type) (intSI_type_node,
 	 __builtin_aarch64_simd_si);
   (*lang_hooks.types.register_builtin_type) (float_type_node,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 13debc8f94b54873bbe8cfda0a8e00921489d9c3..425bcbdc47e3b08d456b02e823a4b370e0fc6312 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1052,6 +1052,9 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2)
 	case V2SImode:
 	  gen = gen_aarch64_simd_combinev2si;
 	  break;
+	case V4HFmode:
+	  gen = gen_aarch64_simd_combinev4hf;
+	  break;
 	case V2SFmode:
 	  gen = gen_aarch64_simd_combinev2sf;
 	  break;
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index dca74025c80eb82f6f56e43ec009ceb0803de6e9..935041297ac6878292c81d5db0d70674def21f03 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -153,6 +153,16 @@ typedef struct uint64x2x2_t
   uint64x2_t val[2];
 } uint64x2x2_t;
 
+typedef struct float16x4x2_t
+{
+  float16x4_t val[2];
+} float16x4x2_t;
+
+typedef struct float16x8x2_t
+{
+  float16x8_t val[2];
+} float16x8x2_t;
+
 typedef struct float32x2x2_t
 {
   float32x2_t val[2];
@@ -273,6 +283,16 @@ typedef struct uint64x2x3_t
   uint64x2_t val[3];
 } uint64x2x3_t;
 
+typedef struct float16x4x3_t
+{
+  float16x4_t val[3];
+} float16x4x3_t;
+
+typedef struct float16x8x3_t
+{
+  float16x8_t val[3];
+} float16x8x3_t;
+
 typedef struct float32x2x3_t
 {
   float32x2_t val[3];
@@ -393,6 +413,16 @@ typedef struct uint64x2x4_t
   uint64x2_t val[4];
 } uint64x2x4_t;
 
+typedef struct float16x4x4_t
+{
+  float16x4_t val[4];
+} float16x4x4_t;
+
+typedef struct float16x8x4_t
+{
+  float16x8_t val[4];
+} float16x8x4_t;
+
 typedef struct float32x2x4_t
 {
   float32x2_t val[4];
@@ -2644,6 +2674,12 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t) {__a};
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcreate_f32 (uint64_t __a)
 {
@@ -4780,6 +4816,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b)
   return

[PATCH 4/14][ARM] Remaining float16 intrinsics: vld..., vst..., vget_low|high, vcombine

This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01440.html ; 
changes are to add in several missing vst... intrinsics, and fix a missing 
iterator V_uf_sclr used in vec_extract.


These intrinsics are all made from patterns in neon.md, and are all tied 
together by iterators - I've tried to reduce coupling where I can.


Cross-tested check-gcc on arm-none-eabi;
Bootstrapped + check-gcc on arm-none-linux-gnueabihf.

gcc/ChangeLog:

* config/arm/arm-builtins.c (VAR11, VAR12): New.
* config/arm/arm_neon_builtins.def (vcombine, vld2_dup, vld3_dup,
vld4_dup): Add v4hf variant.
(vget_high, vget_low): Add v8hf variant.
(vld1, vst1, vst1_lane, vld2, vld2_lane, vst2, vst2_lane, vld3,
vld3_lane, vst3, vst3_lane, vld4, vld4_lane, vst4, vst4_lane): Add
v4hf and v8hf variants.

* config/arm/iterators.md (VD_LANE, VD_RE, VQ2, VQ_HS): New.
(VDX): Add V4HF.
(V_DOUBLE): Add case for V4HF.
(VQX): Add V8HF.
(V_HALF): Add case for V8HF.
(VDQX): Add V4HF, V8HF.
(V_elem, V_two_elem, V_three_elem, V_four_elem, V_cmp_result,
V_uf_sclr, V_sz_elem, V_mode_nunits, q): Add cases for V4HF  V8HF.

* config/arm/neon.md (vec_setmodeinternal, vec_extractmode,
neon_vget_lanemode_sext_internal, neon_vget_lanemode_zext_internal,
vec_load_lanesoimode, neon_vld2mode, vec_store_lanesoimode,
neon_vst2mode, vec_load_lanescimode, neon_vld3mode,
neon_vld3qamode, neon_vld3qbmode, vec_store_lanescimode,
neon_vst3mode, neon_vst3qamode, neon_vst3qbmode,
vec_load_lanesximode, neon_vld4mode, neon_vld4qamode,
neon_vld4qbmode, vec_store_lanesximode, neon_vst4mode,
neon_vst4qamode, neon_vst4qbmode): Change VQ iterator to VQ2.

(neon_vcreate, neon_vreinterpretv8qimode,
neon_vreinterpretv4himode, neon_vreinterpretv2simode,
neon_vreinterpretv2sfmode, neon_vreinterpretdimode):
Change VDX to VD_RE.

(neon_vld2_lanemode, neon_vst2_lanemode, neon_vld3_lanemode,
neon_vst3_lanemode, neon_vld4_lanemode, neon_vst4_lanemode):
Change VD iterator to VD_LANE, and VMQ iterator to VQ_HS.

* config/arm/arm_neon.h (float16x4x2_t, float16x8x2_t, float16x4x3_t,
float16x8x3_t, float16x4x4_t, float16x8x4_t, vcombine_f16,
vget_high_f16, vget_low_f16, vld1_f16, vld1q_f16, vst1_f16, vst1q_f16,
vst1_lane_f16, vst1q_lane_f16, vld2_f16, vld2q_f16, vld2_lane_f16,
vld2q_lane_f16, vld2_dup_f16, vst2_f16, vst2q_f16, vst2_lane_f16,
vst2q_lane_f16, vld3_f16, vld3q_f16, vld3_lane_f16, vld3q_lane_f16,
vld3_dup_f16, vst3_f16, vst3q_f16, vst3_lane_f16, vst3q_lane_f16,
vld4_f16, vld4q_f16, vld4_lane_f16, vld4q_lane_f16, vld4_dup_f16,
vst4_f16, vst4q_f16, vst4_lane_f16, vst4q_lane_f16, ): New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9855b86202e80816bada565786f35dd21fe68c91..4c3f0e888969f16ff6c84e2a1bf65321d73ec8b4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -228,6 +228,12 @@ typedef struct {
 #define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
   VAR9 (T, N, A, B, C, D, E, F, G, H, I) \
   VAR1 (T, N, J)
+#define VAR11(T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR10 (T, N, A, B, C, D, E, F, G, H, I, J) \
+  VAR1 (T, N, K)
+#define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR1 (T, N, L)
 
 /* The NEON builtin data can be found in arm_neon_builtins.def.
The mode entries in the following table correspond to the key type of the
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index e308bd4b7a404d869a7d125e8c5ddc4fb16ec884..1892fac503582366105b7dae0ac5e4da16698648 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -162,6 +162,16 @@ typedef struct uint64x2x2_t
   uint64x2_t val[2];
 } uint64x2x2_t;
 
+typedef struct float16x4x2_t
+{
+  float16x4_t val[2];
+} float16x4x2_t;
+
+typedef struct float16x8x2_t
+{
+  float16x8_t val[2];
+} float16x8x2_t;
+
 typedef struct float32x2x2_t
 {
   float32x2_t val[2];
@@ -288,6 +298,16 @@ typedef struct uint64x2x3_t
   uint64x2_t val[3];
 } uint64x2x3_t;
 
+typedef struct float16x4x3_t
+{
+  float16x4_t val[3];
+} float16x4x3_t;
+
+typedef struct float16x8x3_t
+{
+  float16x8_t val[3];
+} float16x8x3_t;
+
 typedef struct float32x2x3_t
 {
   float32x2_t val[3];
@@ -414,6 +434,16 @@ typedef struct uint64x2x4_t
   uint64x2_t val[4];
 } uint64x2x4_t;
 
+typedef struct float16x4x4_t
+{
+  float16x4_t val[4];
+} float16x4x4_t;
+
+typedef struct float16x8x4_t
+{
+  float16x8_t val[4];
+} float16x8x4_t;
+
 typedef struct float32x2x4_t
 {
   float32x2_t val[4];
@@ -6063,6 +6093,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b)
   return (int64x2_t)__builtin_neon_vcombinedi (__a, __b);
 }
 
+__extension__ static __inline float16x8_t __attribute__

[PATCH 6/14][AArch64] Add support for float16x{4,8}_t vectors/builtins

This adds some basic intrinsics - vget_lane, vset_lane, vld1_lane, vld1, vst1 - 
for float16 types, and the necessary support in the builtin generator, basic 
patterns for moving values around, etc. Other intrinsics will follow in later 
patches.


I've extended the existing testcases in aarch64/, but advsimd-intrinsics tests 
follow later in the series.


gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_vector_mode_supported_p): Support
V4HFmode and V8HFmode.
(aarch64_split_simd_move): Add case for V8HFmode.
* config/aarch64/aarch64-builtins.c (v4hf_UP, v8hf_UP): Define.
(aarch64_simd_builtin_std_type): Handle HFmode.
(aarch64_init_simd_builtin_types): Include Float16x4_t and Float16x8_t.

* config/aarch64/aarch64-simd.md (movmode, aarch64_get_lanemode,
aarch64_ld1VALL:mode, aarch64_st1VALL:mode): Use VALL_F16 iterator.
(aarch64_be_ld1mode, aarch64_be_st1mode): Use VALLDI_F16 iterator.

* config/aarch64/aarch64-simd-builtin-types.def: Add Float16x4_t,
Float16x8_t.

* config/aarch64/aarch64-simd-builtins.def (ld1, st1): Use VALL_F16.
* config/aarch64/arm_neon.h (float16x4_t, float16x8_t, float16_t):
New typedefs.
(vget_lane_f16, vgetq_lane_f16, vset_lane_f16, vsetq_lane_f16,
vld1_f16, vld1q_f16, vst1_f16, vst1q_f16, vst1_lane_f16,
vst1q_lane_f16): New.
* config/aarch64/iterators.md (VD, VQ, VQ_NO2E): Add vectors of HFmode.
(VALLDI_F16, VALL_F16): New.
(Vmtype, VEL, VCONQ, VHALF, VRL3, VRL4, V_TWO_ELEM, V_THREE_ELEM,
V_FOUR_ELEM, q): Add cases for V4HF and V8HF.
(VDBL, VRL2): Add V4HF case.

gcc/testsuite/ChangeLog:

* g++.dg/abi/mangle-neon-aarch64.C: Add cases for float16x4_t and
float16x8_t.
* gcc.target/aarch64/vset_lane_1.c: Likewise.
* gcc.target/aarch64/vld1-vst1_1.c: Likewise, also missing float32x4_t.
* gcc.target/aarch64/vld1_lane.c: Remove unused constants; add cases
for float16x4_t and float16x8_t.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index d554735ab480f9e9b1f49fd3510555197bb7b5f4..6544643a3cd1dd46b440eca0e1a05bad4c499262 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -63,6 +63,7 @@
 
 #define v8qi_UP  V8QImode
 #define v4hi_UP  V4HImode
+#define v4hf_UP  V4HFmode
 #define v2si_UP  V2SImode
 #define v2sf_UP  V2SFmode
 #define v1df_UP  V1DFmode
@@ -70,6 +71,7 @@
 #define df_UPDFmode
 #define v16qi_UP V16QImode
 #define v8hi_UP  V8HImode
+#define v8hf_UP  V8HFmode
 #define v4si_UP  V4SImode
 #define v4sf_UP  V4SFmode
 #define v2di_UP  V2DImode
@@ -522,6 +524,8 @@ aarch64_simd_builtin_std_type (enum machine_mode mode,
   return aarch64_simd_intCI_type_node;
 case XImode:
   return aarch64_simd_intXI_type_node;
+case HFmode:
+  return aarch64_fp16_type_node;
 case SFmode:
   return float_type_node;
 case DFmode:
@@ -606,6 +610,8 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Poly64x2_t].eltype = aarch64_simd_types[Poly64_t].itype;
 
   /* Continue with standard types.  */
+  aarch64_simd_types[Float16x4_t].eltype = aarch64_fp16_type_node;
+  aarch64_simd_types[Float16x8_t].eltype = aarch64_fp16_type_node;
   aarch64_simd_types[Float32x2_t].eltype = float_type_node;
   aarch64_simd_types[Float32x4_t].eltype = float_type_node;
   aarch64_simd_types[Float64x1_t].eltype = double_type_node;
diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def b/gcc/config/aarch64/aarch64-simd-builtin-types.def
index b85a23109efae6301931f12c6b665015af570fb7..ef8f20574c52170facfe67fc9fa433dc64926bca 100644
--- a/gcc/config/aarch64/aarch64-simd-builtin-types.def
+++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def
@@ -44,6 +44,8 @@
   ENTRY (Poly16x8_t, V8HI, poly, 12)
   ENTRY (Poly64x1_t, DI, poly, 12)
   ENTRY (Poly64x2_t, V2DI, poly, 12)
+  ENTRY (Float16x4_t, V4HF, none, 13)
+  ENTRY (Float16x8_t, V8HF, none, 13)
   ENTRY (Float32x2_t, V2SF, none, 13)
   ENTRY (Float32x4_t, V4SF, none, 13)
   ENTRY (Float64x1_t, V1DF, none, 13)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index af39d9c2b42eea0bc45ea5bc3d4fc576849cfd65..07f8ba961c1546ccac7ecaa5756c631afeae4b3e 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -345,11 +345,11 @@
   VAR1 (UNOP, float_extend_lo_, 0, v2df)
   VAR1 (UNOP, float_truncate_lo_, 0, v2sf)
 
-  /* Implemented by aarch64_ld1VALL:mode.  */
-  BUILTIN_VALL (LOAD1, ld1, 0)
+  /* Implemented by aarch64_ld1VALL_F16:mode.  */
+  BUILTIN_VALL_F16 (LOAD1, ld1, 0)
 
-  /* Implemented by aarch64_st1VALL:mode.  */
-  BUILTIN_VALL (STORE1, st1, 0)
+  /* Implemented by aarch64_st1VALL_F16:mode.  */
+  BUILTIN_VALL_F16 (STORE1, st1, 0)
 
   /* Implemented by fmamode4.  */
   BUILTIN_VDQF (TERNOP, fma,

[PATCH 9/14][AArch64] vld1(q?)_dup, missing vreinterpretq intrinsics


gcc/ChangeLog:

* config/aarch64/arm_neon.h (vreinterpretq_p8_f16,
vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16,
vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16,
vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16,
vreinterpretq_u32_f16, vld1_dup_f16, vld1q_dup_f16): New.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 88723231e5c32faf3bc68eccdf4e3a2b104b57b9..6d98b2e08221c3e25c4f66e6058b4f228d90a094 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -2993,6 +2993,12 @@ vreinterpretq_p8_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f16 (float16x8_t __a)
+{
+  return (poly8x16_t) __a;
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_p8_f32 (float32x4_t __a)
 {
   return (poly8x16_t) __a;
@@ -3131,6 +3137,12 @@ vreinterpretq_p16_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_f16 (float16x8_t __a)
+{
+  return (poly16x8_t) __a;
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_p16_f32 (float32x4_t __a)
 {
   return (poly16x8_t) __a;
@@ -3383,6 +3395,12 @@ vreinterpret_f32_p16 (poly16x4_t __a)
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_f16 (float16x8_t __a)
+{
+  return (float32x4_t) __a;
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_f32_f64 (float64x2_t __a)
 {
   return (float32x4_t) __a;
@@ -3521,6 +3539,12 @@ vreinterpret_f64_u64 (uint64x1_t __a)
 }
 
 __extension__ static __inline float64x2_t __attribute__((__always_inline__))
+vreinterpretq_f64_f16 (float16x8_t __a)
+{
+  return (float64x2_t) __a;
+}
+
+__extension__ static __inline float64x2_t __attribute__((__always_inline__))
 vreinterpretq_f64_f32 (float32x4_t __a)
 {
   return (float64x2_t) __a;
@@ -3683,6 +3707,12 @@ vreinterpretq_s64_s32 (int32x4_t __a)
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_f16 (float16x8_t __a)
+{
+  return (int64x2_t) __a;
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vreinterpretq_s64_f32 (float32x4_t __a)
 {
   return (int64x2_t) __a;
@@ -3965,6 +3995,12 @@ vreinterpretq_s8_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_f16 (float16x8_t __a)
+{
+  return (int8x16_t) __a;
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_s8_f32 (float32x4_t __a)
 {
   return (int8x16_t) __a;
@@ -4103,6 +4139,12 @@ vreinterpretq_s16_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_f16 (float16x8_t __a)
+{
+  return (int16x8_t) __a;
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_s16_f32 (float32x4_t __a)
 {
   return (int16x8_t) __a;
@@ -4241,6 +4283,12 @@ vreinterpretq_s32_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_f16 (float16x8_t __a)
+{
+  return (int32x4_t) __a;
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_s32_f32 (float32x4_t __a)
 {
   return (int32x4_t) __a;
@@ -4385,6 +4433,12 @@ vreinterpretq_u8_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_u8_f16 (float16x8_t __a)
+{
+  return (uint8x16_t) __a;
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_u8_f32 (float32x4_t __a)
 {
   return (uint8x16_t) __a;
@@ -4523,6 +4577,12 @@ vreinterpretq_u16_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t) __a;
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_u16_f32 (float32x4_t __a)
 {
   return (uint16x8_t) __a;
@@ -4661,6 +4721,12 @@ vreinterpretq_u32_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_f16 (float16x8_t __a)
+{
+  return (uint32x4_t) __a;
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_u32_f32 (float32x4_t __a)
 {
   return (uint32x4_t) __a;
@@ -15107,6 +15173,13 @@ vld1q_u64 (const uint64_t *a)
 
 /* vld1_dup  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_dup_f16 (const float16_t* __a)
+{
+  float16_t __f = *__a;
+  return (float16x4_t) { __f, __f, __f, __f };
+}
+
 __extension__ static __inline float32x2_t

[PATCH 14/14][ARM/AArch64 testsuite] Test float16_t vcvt_* intrinsics

This adds a test of vcvt_f32_f16 and vcvt_f16_f32, also vcvt_high_f32_f16 and 
vcvt_high_f16_f32.


On ARM, we pass additional option -mfpu=neon-fp16 to the compiler (possible 
following patch 2/3). The compiler is already receiving an option such as 
-mfpu=neon or -mfpu=crypto-neon-fp-armv8, but passing neon-fp16 as well as 
either of those appears to do no harm, and turns on the superset of all -mfpu 
options, as desired.


On AArch64, we additionally test vcvt_high_f32_f16 and vcvt_high_f16_f32; these 
are not tested on ARM as the relevant intrinsics do not exist in 32-bit state.


Passing on aarch64_be-none-elf, aarch64-none-elf, arm-none-linux-gnueabi, 
aarch64-none-linux-gnu.


gcc/testsuite/ChangeLog:
 * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
new file mode 100644
index ..a346b3d72e13d5b2028de5ae7b88f910dcb3f862
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
@@ -0,0 +1,96 @@
+/* { dg-additional-options -mfpu=neon-fp16 { target { arm*-*-* } } } */
+#include arm_neon.h
+#include arm-neon-ref.h
+#include compute-ref-data.h
+#include math.h
+
+/* Expected results for vcvt.  */
+VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x4180, 0x4170,
+	0x4160, 0x4150 };
+VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
+
+/* Expected results for vcvt_high_f32_f16.  */
+VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc140, 0xc130,
+		 0xc120, 0xc110 };
+/* Expected results for vcvt_high_f16_f32.  */
+VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000,
+		 0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+void
+exec_vcvt (void)
+{
+#define TEST_MSG vcvt_f32_f16
+  {
+VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 };
+
+DECL_VARIABLE (vector_src, float, 16, 4);
+
+VLOAD (vector_src, buffer_src, , float, f, 16, 4);
+DECL_VARIABLE (vector_res, float, 32, 4) =
+	vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+	   VECT_VAR (vector_res, float, 32, 4));
+
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, );
+  }
+#undef TEST_MSG
+
+  clean_results ();
+
+#define TEST_MSG vcvt_f16_f32
+  {
+VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 };
+DECL_VARIABLE (vector_src, float, 32, 4);
+
+VLOAD (vector_src, buffer_src, q, float, f, 32, 4);
+DECL_VARIABLE (vector_res, float, 16, 4) =
+  vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4));
+vst1_f16 (VECT_VAR (result, float, 16, 4),
+	  VECT_VAR (vector_res, float, 16 ,4));
+
+CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, );
+  }
+#undef TEST_MSG
+
+#ifdef __ARM_64BIT_STATE
+  clean_results ();
+
+#define TEST_MSG vcvt_high_f32_f16
+  {
+DECL_VARIABLE (vector_src, float, 16, 8);
+VLOAD (vector_src, buffer, q, float, f, 16, 8);
+DECL_VARIABLE (vector_res, float, 32, 4);
+VECT_VAR (vector_res, float, 32, 4) =
+  vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+	   VECT_VAR (vector_res, float, 32, 4));
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, );
+  }
+#undef TEST_MSG
+  clean_results ();
+
+#define TEST_MSG vcvt_high_f16_f32
+  {
+DECL_VARIABLE (vector_low, float, 16, 4);
+VDUP (vector_low, , float, f, 16, 4, 2.0);
+
+DECL_VARIABLE (vector_src, float, 32, 4);
+VLOAD (vector_src, buffer, q, float, f, 32, 4);
+
+DECL_VARIABLE (vector_res, float, 16, 8) =
+  vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4),
+			 VECT_VAR (vector_src, float, 32, 4));
+vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	   VECT_VAR (vector_res, float, 16, 8));
+
+CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, );
+  }
+#endif
+}
+
+int
+main (void)
+{
+  exec_vcvt ();
+  return 0;
+}

Re: [PATCH 00/12] Reduce conditional compilation

2015-04-22 Thread David Malcolm

On Tue, 2015-04-21 at 11:29 -0400, Trevor Saunders wrote:
 On Tue, Apr 21, 2015 at 07:57:19AM -0600, Jeff Law wrote:
  On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
  From: Trevor Saunders tbsaunde+...@tbsaunde.org
  
  Hi,
  
  This is a first round of patches to reduce the amount of code with in #if /
  #ifdef.  This makes it incrementally easier to not break configs other 
  than the
  one being built, and moves things slightly closer to using target hooks for
  everything.
  
  each commit bootstrapped and regtested on x86_64-linux-gnu without 
  regression,
  and whole patch set run through config-list.mk without issue, ok?
  So I think after looking at this patchset, any changes of a similar nature
  you want to make should be considered pre-approved.  Just post them for
  archival purposes, but no need for you to wait for review as long as they
  have the same purpose and overall structure as was seen in these patches.
 
 thanks!  Its also always nice to have someone double check your logic
 :-)

Thanks for working on this!

Conditional compilation was a major PITA when doing the rtx-rtx_insn *
work last year, so I'm very pleased to see these cleanups go in.

Dave

Re: [PATCH] Makefile.tpl: Remove surplus whitespace

On 9 April 2015 at 13:20, Bernhard Reutner-Fischer
rep.dot@gmail.com wrote:
 essentially s/[[:space:]]*[[:space:]];/;/g

 ChangeLog (attn: to src, IIRC no write-access, ask someone to commit)

 Ok for trunk now?

Jeff OKed this, applied as r222334.

Please holler if i broke something..

thanks,


 2015-04-01  Bernhard Reutner-Fischer  al...@gcc.gnu.org

 * Makefile.tpl: Remove surplus whitespace throughout.
 * Makefile.in: Regenerate.

 Signed-off-by: Bernhard Reutner-Fischer rep.dot@gmail.com
 ---
  Makefile.in  | 4858 
 +-
  Makefile.tpl |  110 +-
  2 files changed, 2484 insertions(+), 2484 deletions(-)

 diff --git a/Makefile.tpl b/Makefile.tpl
 index 1ea1954..9972909 100644
 --- a/Makefile.tpl
 +++ b/Makefile.tpl
 @@ -225,8 +225,8 @@ HOST_EXPORTS = \
 GMPINC=$(HOST_GMPINC); export GMPINC; \
 ISLLIBS=$(HOST_ISLLIBS); export ISLLIBS; \
 ISLINC=$(HOST_ISLINC); export ISLINC; \
 -   LIBELFLIBS=$(HOST_LIBELFLIBS) ; export LIBELFLIBS; \
 -   LIBELFINC=$(HOST_LIBELFINC) ; export LIBELFINC; \
 +   LIBELFLIBS=$(HOST_LIBELFLIBS); export LIBELFLIBS; \
 +   LIBELFINC=$(HOST_LIBELFINC); export LIBELFINC; \
  @if gcc-bootstrap
 $(RPATH_ENVVAR)=`echo $(TARGET_LIB_PATH)$$$(RPATH_ENVVAR) | sed 
 's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR); \
  @endif gcc-bootstrap
 @@ -785,9 +785,9 @@ do-info: maybe-all-texinfo

  install-info: do-install-info dir.info
 s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
 -   if [ -f dir.info ] ; then \
 - $(INSTALL_DATA) dir.info $(DESTDIR)$(infodir)/dir.info ; \
 -   else true ; fi
 +   if [ -f dir.info ]; then \
 + $(INSTALL_DATA) dir.info $(DESTDIR)$(infodir)/dir.info; \
 +   else true; fi

  install-pdf: do-install-pdf

 @@ -913,14 +913,14 @@ uninstall:

  .PHONY: install.all
  install.all: install-no-fixedincludes
 -   @if [ -f ./gcc/Makefile ] ; then \
 -   r=`${PWD_COMMAND}` ; export r ; \
 +   @if [ -f ./gcc/Makefile ]; then \
 +   r=`${PWD_COMMAND}`; export r; \
 s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
 $(HOST_EXPORTS) \
 (cd ./gcc  \
 -   $(MAKE) $(FLAGS_TO_PASS) install-headers) ; \
 +   $(MAKE) $(FLAGS_TO_PASS) install-headers); \
 else \
 -   true ; \
 +   true; \
 fi

  # install-no-fixedincludes is used to allow the elaboration of binary 
 packages
 @@ -960,15 +960,15 @@ installdirs: mkinstalldirs
 $(SHELL) $(srcdir)/mkinstalldirs $(MAKEDIRS)

  dir.info: do-install-info
 -   if [ -f $(srcdir)/texinfo/gen-info-dir ] ; then \
 - $(srcdir)/texinfo/gen-info-dir $(DESTDIR)$(infodir) 
 $(srcdir)/texinfo/dir.info-template  dir.info.new ; \
 - mv -f dir.info.new dir.info ; \
 -   else true ; \
 +   if [ -f $(srcdir)/texinfo/gen-info-dir ]; then \
 + $(srcdir)/texinfo/gen-info-dir $(DESTDIR)$(infodir) 
 $(srcdir)/texinfo/dir.info-template  dir.info.new; \
 + mv -f dir.info.new dir.info; \
 +   else true; \
 fi

  dist:
 @echo Building a full distribution of this tree isn't done
 -   @echo via 'make dist'.  Check out the etc/ subdirectory
 +   @echo via 'make dist'.  Check out the etc/ subdirectory

  etags tags: TAGS

 @@ -998,8 +998,8 @@ configure-[+prefix+][+module+]: [+ IF bootstrap +][+ ELSE 
 +]
 s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
 [+ IF check_multilibs
 +]echo Checking multilib configuration for [+module+]...; \
 -   $(SHELL) $(srcdir)/mkinstalldirs [+subdir+]/[+module+] ; \
 -   $(CC_FOR_TARGET) --print-multi-lib  
 [+subdir+]/[+module+]/multilib.tmp 2 /dev/null ; \
 +   $(SHELL) $(srcdir)/mkinstalldirs [+subdir+]/[+module+]; \
 +   $(CC_FOR_TARGET) --print-multi-lib  
 [+subdir+]/[+module+]/multilib.tmp 2 /dev/null; \
 if test -r [+subdir+]/[+module+]/multilib.out; then \
   if cmp -s [+subdir+]/[+module+]/multilib.tmp 
 [+subdir+]/[+module+]/multilib.out; then \
 rm -f [+subdir+]/[+module+]/multilib.tmp; \
 @@ -1011,7 +1011,7 @@ configure-[+prefix+][+module+]: [+ IF bootstrap +][+ 
 ELSE +]
   mv [+subdir+]/[+module+]/multilib.tmp 
 [+subdir+]/[+module+]/multilib.out; \
 fi; \
 [+ ENDIF check_multilibs +]test ! -f [+subdir+]/[+module+]/Makefile 
 || exit 0; \
 -   $(SHELL) $(srcdir)/mkinstalldirs [+subdir+]/[+module+] ; \
 +   $(SHELL) $(srcdir)/mkinstalldirs [+subdir+]/[+module+]; \
 [+exports+] [+extra_exports+] \
 echo Configuring in [+subdir+]/[+module+]; \
 cd [+subdir+]/[+module+] || exit 1; \
 @@ -1044,7 +1044,7 @@ configure-stage[+id+]-[+prefix+][+module+]:
 TFLAGS=$(STAGE[+id+]_TFLAGS); \
 [+ IF check_multilibs
 +]echo Checking multilib configuration for [+module+]...; \
 -   $(CC_FOR_TARGET)

[C/C++ PATCH] Implement -Wshift-negative-value (PR c/65179)

Currently, we warn if the right operand of a shift expression is negative,
or greater than or equal to the length in bits of the promoted left operand.

But we don't warn when we see a left shift of a negative value.  That is
undefined behavior since C99 and I believe C++11, so this patch implements
a new warning, -Wshift-negative-value, only active in C99/C++11.

A bunch of tests needed tweaking; I find it scary that some vect tests are
invoking UB.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-04-22  Marek Polacek  pola...@redhat.com

PR c/65179
* c-common.c (c_fully_fold_internal): Warn when left shifting a
negative value.
* c.opt (Wshift-negative-value): New option.

* c-typeck.c (build_binary_op): Warn when left shifting a negative
value.

* typeck.c (cp_build_binary_op): Warn when left shifting a negative
value.

* doc/invoke.texi: Document -Wshift-negative-value.

* c-c++-common/Wshift-negative-value-1.c: New test.
* c-c++-common/Wshift-negative-value-2.c: New test.
* c-c++-common/torture/vector-shift2.c: Use -Wno-shift-negative-value.
* gcc.dg/torture/vector-shift2.c: Likewise.
* gcc.c-torture/execute/pr40386.c: Likewise.
* gcc.dg/tree-ssa/vrp65.c: Likewise.
* gcc.dg/tree-ssa/vrp66.c: Likewise.
* gcc.dg/vect/vect-sdivmod-1.c: Likewise.
* gcc.dg/vect/vect-shift-2-big-array.c: Likewise.
* gcc.dg/vect/vect-shift-2.c: Likewise.
* gcc.target/i386/pr31167.c: Likewise.
* g++.dg/init/array11.C: Add dg-warning.
* gcc.dg/c99-const-expr-7.c: Add dg-warning and dg-error.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 7fe7fa6..e944f11 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -1361,6 +1361,15 @@ c_fully_fold_internal (tree expr, bool in_init, bool 
*maybe_const_operands,
   !TREE_OVERFLOW_P (op0)
   !TREE_OVERFLOW_P (op1))
overflow_warning (EXPR_LOCATION (expr), ret);
+  if (code == LSHIFT_EXPR
+  TREE_CODE (orig_op0) != INTEGER_CST
+  TREE_CODE (TREE_TYPE (orig_op0)) == INTEGER_TYPE
+  TREE_CODE (op0) == INTEGER_CST
+  c_inhibit_evaluation_warnings == 0
+  tree_int_cst_sgn (op0)  0
+  flag_isoc99)
+   warning_at (loc, OPT_Wshift_negative_value,
+   left shift of negative value);
   if ((code == LSHIFT_EXPR || code == RSHIFT_EXPR)
   TREE_CODE (orig_op1) != INTEGER_CST
   TREE_CODE (op1) == INTEGER_CST
diff --git gcc/c-family/c.opt gcc/c-family/c.opt
index 983f4a8..47e0913 100644
--- gcc/c-family/c.opt
+++ gcc/c-family/c.opt
@@ -781,6 +781,10 @@ Wshift-count-overflow
 C ObjC C++ ObjC++ Var(warn_shift_count_overflow) Init(1) Warning
 Warn if shift count = width of type
 
+Wshift-negative-value
+C ObjC C++ ObjC++ Var(warn_shift_negative_value) Init(1) Warning
+Warn if left shifting a negative value
+
 Wsign-compare
 C ObjC C++ ObjC++ Var(warn_sign_compare) Warning LangEnabledBy(C++ ObjC++,Wall)
 Warn about signed-unsigned comparisons
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index ebe4c73..17d2cac 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -10701,6 +10701,15 @@ build_binary_op (location_t location, enum tree_code 
code,
   code1 == INTEGER_TYPE)
{
  doing_shift = true;
+ if (TREE_CODE (op0) == INTEGER_CST
+  tree_int_cst_sgn (op0)  0
+  flag_isoc99)
+   {
+ int_const = false;
+ if (c_inhibit_evaluation_warnings == 0)
+   warning_at (location, OPT_Wshift_negative_value,
+   left shift of negative value);
+   }
  if (TREE_CODE (op1) == INTEGER_CST)
{
  if (tree_int_cst_sgn (op1)  0)
diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index 250b5d6..d5d36bf 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -4327,11 +4327,21 @@ cp_build_binary_op (location_t location,
}
   else if (code0 == INTEGER_TYPE  code1 == INTEGER_TYPE)
{
+ tree const_op0 = fold_non_dependent_expr (op0);
+ if (TREE_CODE (const_op0) != INTEGER_CST)
+   const_op0 = op0;
  tree const_op1 = fold_non_dependent_expr (op1);
  if (TREE_CODE (const_op1) != INTEGER_CST)
const_op1 = op1;
  result_type = type0;
  doing_shift = true;
+ if (TREE_CODE (const_op0) == INTEGER_CST
+  tree_int_cst_sgn (const_op0)  0
+  (complain  tf_warning)
+  c_inhibit_evaluation_warnings == 0
+  cxx_dialect = cxx11)
+   warning (OPT_Wshift_negative_value,
+left shift of negative value);
  if (TREE_CODE (const_op1) == INTEGER_CST)
{
  if (tree_int_cst_lt (const_op1, integer_zero_node))
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index a939ff7..2e14921 100644
---

Re: [PATCH 00/12] Reduce conditional compilation


On 04/22/2015 12:13 PM, David Malcolm wrote:



Conditional compilation was a major PITA when doing the rtx-rtx_insn *
work last year, so I'm very pleased to see these cleanups go in.
Yup.  It also got in Andrew's way last year and we regularly see cases 
where small patches which work fine on the mainstream architectures fail 
to build in the lesser used architectures (particularly cc0 targets). 
It's a whole class of problems I want to see slowly disappear.


glibc went through this process in their codebase for similar reasons, 
but they had more to lose when they got it wrong -- IIRC they had a case 
where exported ABI would differ as a result of conditionally compiled 
code.  Not good.


Jeff

[PATCH] Tidy up locking for libgomp OpenACC entry points

2015-04-22 Thread Julian Brown

Hi,

This patch is an attempt to fix some potential race conditions with
accesses to shared data structures from multiple concurrent threads in
libgomp's OpenACC entry points. The main change is to move locking out
of lookup_host and lookup_dev in oacc-mem.c and into their callers
(which can then hold the locks for the whole operation that they are
performing).

Also missing locking has been added for gomp_acc_insert_pointer.

Tests look OK (with offloading to NVidia PTX).

OK? (For the gomp4 branch, maybe, if trunk's not suitable at the
moment?)

Thanks,

Julian

ChangeLog

libgomp/
* oacc-mem.c (lookup_host): Remove locking from function. Note
locking requirement for caller in function comment.
(lookup_dev): Likewise.
(acc_free, acc_deviceptr, acc_hostptr, acc_is_present)
(acc_map_data, acc_unmap_data, present_create_copy, delete_copyout)
(update_dev_host, gomp_acc_insert_pointer, gomp_acc_remove_pointer):
Add locking.
commit 983e08e46be24380a52095851cd9c6eb481eb47c
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Apr 21 12:42:17 2015 -0700

More locking in oacc-mem.c

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 89ef5fc..d53af4b 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -35,7 +35,8 @@
 #include stdint.h
 #include assert.h
 
-/* Return block containing [H-S), or NULL if not contained.  */
+/* Return block containing [H-S), or NULL if not contained.  The device lock
+   for DEV must be locked on entry, and remains locked on exit.  */
 
 static splay_tree_key
 lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
@@ -46,9 +47,7 @@ lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
   node.host_start = (uintptr_t) h;
   node.host_end = (uintptr_t) h + s;
 
-  gomp_mutex_lock (dev-lock);
   key = splay_tree_lookup (dev-mem_map, node);
-  gomp_mutex_unlock (dev-lock);
 
   return key;
 }
@@ -56,7 +55,8 @@ lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
 /* Return block containing [D-S), or NULL if not contained.
The list isn't ordered by device address, so we have to iterate
over the whole array.  This is not expected to be a common
-   operation.  */
+   operation.  The device lock associated with TGT must be locked on entry, and
+   remains locked on exit.  */
 
 static splay_tree_key
 lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
@@ -67,16 +67,12 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
   if (!tgt)
 return NULL;
 
-  gomp_mutex_lock (tgt-device_descr-lock);
-
   for (t = tgt; t != NULL; t = t-prev)
 {
   if (t-tgt_start = (uintptr_t) d  t-tgt_end = (uintptr_t) d + s)
 break;
 }
 
-  gomp_mutex_unlock (tgt-device_descr-lock);
-
   if (!t)
 return NULL;
 
@@ -120,25 +116,32 @@ acc_free (void *d)
 {
   splay_tree_key k;
   struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr-dev;
 
   if (!d)
 return;
 
   assert (thr  thr-dev);
 
+  gomp_mutex_lock (acc_dev-lock);
+
   /* We don't have to call lazy open here, as the ptr value must have
  been returned by acc_malloc.  It's not permitted to pass NULL in
  (unless you got that null from acc_malloc).  */
-  if ((k = lookup_dev (thr-dev-openacc.data_environ, d, 1)))
-   {
- void *offset;
+  if ((k = lookup_dev (acc_dev-openacc.data_environ, d, 1)))
+{
+  void *offset;
+
+  offset = d - k-tgt-tgt_start + k-tgt_offset;
 
- offset = d - k-tgt-tgt_start + k-tgt_offset;
+  gomp_mutex_unlock (acc_dev-lock);
 
- acc_unmap_data ((void *)(k-host_start + offset));
-   }
+  acc_unmap_data ((void *)(k-host_start + offset));
+}
+  else
+gomp_mutex_unlock (acc_dev-lock);
 
-  thr-dev-free_func (thr-dev-target_id, d);
+  acc_dev-free_func (acc_dev-target_id, d);
 }
 
 void
@@ -178,16 +181,24 @@ acc_deviceptr (void *h)
   goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *dev = thr-dev;
+
+  gomp_mutex_lock (dev-lock);
 
-  n = lookup_host (thr-dev, h, 1);
+  n = lookup_host (dev, h, 1);
 
   if (!n)
-return NULL;
+{
+  gomp_mutex_unlock (dev-lock);
+  return NULL;
+}
 
   offset = h - n-host_start;
 
   d = n-tgt-tgt_start + n-tgt_offset + offset;
 
+  gomp_mutex_unlock (dev-lock);
+
   return d;
 }
 
@@ -204,16 +215,24 @@ acc_hostptr (void *d)
   goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr-dev;
 
-  n = lookup_dev (thr-dev-openacc.data_environ, d, 1);
+  gomp_mutex_lock (acc_dev-lock);
+
+  n = lookup_dev (acc_dev-openacc.data_environ, d, 1);
 
   if (!n)
-return NULL;
+{
+  gomp_mutex_unlock (acc_dev-lock);
+  return NULL;
+}
 
   offset = d - n-tgt-tgt_start + n-tgt_offset;
 
   h = n-host_start + offset;
 
+  gomp_mutex_unlock (acc_dev-lock);
+
   return h;
 }
 
@@ -232,6 +251,8 @@ acc_is_present (void *h, size_t s)
   struct goacc_thread *thr

[debug-early] Adjust g++.dg/debug/dwarf2/auto1.C testcase

2015-04-22 Thread Aldy Hernandez

This patch adjusts the testcase to work with the now slightly different 
ordering of DIEs in the branch.


Brought to you by the letter N for nightmare.

Committed to branch.
Aldy
commit 7996af2f984f42a9694c466ee05d5067696503cc
Author: Aldy Hernandez al...@redhat.com
Date:   Wed Apr 22 12:20:10 2015 -0700

Adjust testcase for debug-early's different ordering.

diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/auto1.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/auto1.C
index e38334b..c04e923 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/auto1.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/auto1.C
@@ -10,14 +10,14 @@
 // .uleb128 0x5# (DIE (0x4c) DW_TAG_unspecified_type)
 // .long   .LASF6  # DW_AT_name: auto
 //...
+// .uleb128 0x9# (DIE (0x87) DW_TAG_base_type)
+// .ascii int\0  # DW_AT_name
+//...
 // .uleb128 0x7# (DIE (0x57) DW_TAG_subprogram)
 // .long   0x33# DW_AT_specification
 // .long   0x87# DW_AT_type
-//...
-// .uleb128 0x9# (DIE (0x87) DW_TAG_base_type)
-// .ascii int\0  # DW_AT_name
 
-// { dg-final { scan-assembler a1.*(0x\[0-9a-f]+)\[^\n\r]*DW_AT_type.*\\1. 
DW_TAG_unspecified_type.*DW_AT_specification\[\n\r]{1,2}\[^\n\r]*(0x\[0-9a-f]+)\[^\n\r]*DW_AT_type.*\\2.
 DW_TAG_base_type } }
+// { dg-final { scan-assembler a1.*(0x\[0-9a-f]+)\[^\n\r]*DW_AT_type.*\\1. 
DW_TAG_unspecified_type.*(0x\[0-9a-f]+). 
DW_TAG_base_type.*DW_AT_specification\[\n\r]{1,2}\[^\n\r]*\\2\[^\n\r]*DW_AT_type
 } }
 
 struct A
 {

Hide _S_n_primes from user code

2015-04-22 Thread François Dumont


Hello

Here is a rather trivial patch, just code cleanup. Since we export 
_Prime_rehash_policy we do not need to expose the _S_n_primes anymore.


* include/bits/hashtable_policy.h (_Prime_rehash_policy::_S_n_primes):
Delete.
* src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
Remove usage of latter and compute size of the prime numbers array
locally.

Tested under Linux x86_64.

Ok to commit ?

François

Re: patch ping

On April 13, 2015 3:12:48 PM GMT+02:00, Jeff Law l...@redhat.com wrote:
On 04/11/2015 04:27 PM, Bernhard Reutner-Fischer wrote:
 Hi,

 I'd like to ask an RM or global reviewer to kindly consider the
 following patches preventing one or the other target in
config-list.mk
 to build:

 [PATCH, bfin] handle BFIN_CPU_UNKNOWN in TARGET_CPU_CPP_BUILTINS
 https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00034.html
OK.


 [PATCH, c6x] handle unk_isa in TARGET_CPU_CPP_BUILTINS
 https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00089.html
OK.



 Cosmetic patchlets pending but probably for stage 1 now:

 Remove redundant guard in emit_bss()
 https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00337.html
OK.


 tree-tailcall: Commentary typo fix, remove fwd declaration
 https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00342.html
OK.


 s/ ;/;/g Makefile.tpl
 https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00380.html
OK

Note there is a policy that requires all patches to be bootstrapped and

regression tested.  These are trivial enough that I'll approve them 
as-is.  However, in the future, please bootstrap and regression test 
changes whenever possible.

I'm aware of this policy. I did my best not to break other configs. By now all 
of the above were pushed including the erroneously committed
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01270.html
to fix 
PR target/47122 vax-*-openbsd* config.gcc typo

that Jakub was kind enough to confirm to be obvious on IRC.

Thanks for your reviews!

Since I touched Makefile.tpl and there was at least one other patch against it 
in GCC, i would be grateful if someone could synch Makefile.tpl back to 
binutils-gdb in two days or three so I can sleep well again a couple of days 
after that :)
cheers,

stregnhten ICF WRT inline and operator_new flags

2015-04-22 Thread Jan Hubicka

Hi,
this patch strenghtens ipa-icf to allow merging non-inline function to
inline function.  This is safe because inline flag does not affect function
itself. It only affects way the function is used, so we need to compare the
flag only when comparing references. (inline flag mismatch is one of common
reasons to give up on merging).

The patch does the same for DECL_IS_OPERATOR_NEW and refactors code so
comparsions of all such symol properties (that matter only at referring
time) is done by compare_referenced_symbol_properties.

I also cleaned up reference walking and added code to match TREE_CODE of pointer
types to avoid wrong code with tree-vrp change I did last week.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* ipa-icf.c (symbol_compare_collection::symbol_compare_collection):
cleanup.
(sem_function::get_hash): Do not hash DECL_DISREGARD_INLINE_LIMITS,
DECL_DECLARED_INLINE_P and DECL_IS_OPERATOR_NEW.
(sem_item::compare_referenced_symbol_properties): New.
(sem_item::hash_referenced_symbol_properties): New.
(sem_item::compare_cgraph_references): Rename to ...
(sem_item::compare_symbol_references): ... this one; use
compare_referenced_symbol_properties.
(sem_function::equals_wpa): Do not compare
DECL_DISREGARD_INLINE_LIMITS, DECL_DECLARED_INLINE_P,
DECL_IS_OPERATOR_NEW; compare pointer sizes.
(sem_item::update_hash_by_addr_refs): Call
hash_referenced_symbol_properties.
(sem_item::update_hash_by_local_refs): Cleanup.
(sem_function::merge): Do not mix up symbol properties.
(sem_variable::equals_wpa): Use compare_symbol_references.
* ipa-icf.h (sem_item::compare_referenced_symbol_properties): New.
(sem_item::hash_referenced_symbol_properties): New.
(sem_item::compare_symbol_references): New.
(sem_item::compare_cgraph_references): Remove.
Index: ipa-icf.c
===
--- ipa-icf.c   (revision 92)
+++ ipa-icf.c   (working copy)
@@ -145,9 +145,8 @@ symbol_compare_collection::symbol_compar
   if (is_a varpool_node * (node)  DECL_VIRTUAL_P (node-decl))
 return;
 
-  for (unsigned i = 0; i  node-num_references (); i++)
+  for (unsigned i = 0; node-iterate_reference (i, ref); i++)
 {
-  ref = node-iterate_reference (i, ref);
   if (ref-address_matters_p ())
m_references.safe_push (ref-referred);
 
@@ -342,8 +341,6 @@ sem_function::get_hash (void)
   if (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl))
 (cl_optimization_hash
   (TREE_OPTIMIZATION (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl;
-  hstate.add_flag (DECL_DISREGARD_INLINE_LIMITS (decl));
-  hstate.add_flag (DECL_DECLARED_INLINE_P (decl));
-  hstate.add_flag (DECL_IS_OPERATOR_NEW (decl));
   hstate.add_flag (DECL_CXX_CONSTRUCTOR_P (decl));
   hstate.add_flag (DECL_CXX_DESTRUCTOR_P (decl));
@@ -354,12 +351,117 @@ sem_function::get_hash (void)
   return hash;
 }
 
+/* Compare properties of symbols N1 and N2 that does not affect semantics of
+   symbol itself but affects semantics of its references from USED_BY (which
+   may be NULL if it is unknown).  If comparsion is false, symbols
+   can still be merged but any symbols referring them can't.
+
+   If ADDRESS is true, do extra checking needed for IPA_REF_ADDR.
+
+   TODO: We can also split attributes to those that determine codegen of
+   a function body/variable constructor itself and those that are used when
+   referring to it.  */
+
+bool
+sem_item::compare_referenced_symbol_properties (symtab_node *used_by,
+   symtab_node *n1,
+   symtab_node *n2,
+   bool address)
+{
+  if (is_a cgraph_node * (n1))
+{
+  /* Inline properties matters: we do now want to merge uses of inline
+function to uses of normal function because inline hint would be lost.
+We however can merge inline function to noinline because the alias
+will keep its DECL_DECLARED_INLINE flag.
+
+Also ignore inline flag when optimizing for size or when function
+is known to not be inlinable.
+
+TODO: the optimize_size checks can also be assumed to be true if
+unit has no !optimize_size functions. */
+
+  if ((!used_by || address || !is_a cgraph_node * (used_by)
+  || !opt_for_fn (used_by-decl, optimize_size))
+  !opt_for_fn (n1-decl, optimize_size)
+  n1-get_availability ()  AVAIL_INTERPOSABLE
+  (!DECL_UNINLINABLE (n1-decl) || !DECL_UNINLINABLE (n2-decl)))
+   {
+ if (DECL_DISREGARD_INLINE_LIMITS (n1-decl)
+ != DECL_DISREGARD_INLINE_LIMITS (n2-decl))
+   return return_false_with_msg
+(DECL_DISREGARD_INLINE_LIMITS are different);
+
+ if

Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state

2015-04-22 Thread Peter Bergner

On Wed, 2015-04-22 at 20:55 -0500, Segher Boessenkool wrote:
 Using a hard reg in the RTL like this has a few problems:
 a) It might hinder register allocation.  Maybe it doesn't, not sure;
 b) It does hinder scheduling;
 c) It can make things ICE, maybe with register asm.

Ahh, I see what you mean now.  Yeah, I hadn't thought of that.


 The alternative is to write a separate define_insn for ttest, one
 without inputs; the generated assembler can still be the same of
 course.

In that case, I think you're right that this is the best course
if action.  I'll do that ans retest.  Thanks for catching this.

Peter

Re: [RFC][PATCH 3/3] Enable zero/sign extension elimination



On 23/04/15 09:48, H.J. Lu wrote:
 On Wed, Apr 22, 2015 at 3:15 PM, Kugan
 kugan.vivekanandara...@linaro.org wrote:
 On 17/01/15 13:11, Kugan wrote:

 Re-enable zero/sign extension elimination using value range that
 includes wrapped attribute.


 Now that stage-1 is open, rebased it and regression tested on
 x86-64-none-linux-gnu with no new regressions.

 Is this OK for trunk?

 Thanks,
 Kugan

 gcc/ChangeLog:

 2015-04-22  Kugan Vivekanandarajah  kug...@linaro.org

 * calls.c (precompute_arguments): Check
  promoted_for_signed_and_unsigned_p and set the promoted mode.
 * expr.c (expand_expr_real_1): Likewise.
 (promoted_for_signed_and_unsigned_p): New function.
 * cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
 SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
 * expr.h (promoted_for_signed_and_unsigned_p): New definition.
 
 Are you planning to submit some testcases to show its improvement?
 Will it help
 
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53639
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33349
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44532

Thanks H.J. Lu for the link. I will investigate them and will come up
with test cases if my patches help these.

Kugan

Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state

2015-04-22 Thread Segher Boessenkool

On Wed, Apr 22, 2015 at 06:08:26PM -0500, Peter Bergner wrote:
 +   case HTM_BUILTIN_TTEST: /* Alias for: tabortwci. 0,r0,0  */
 + op[nopnds++] = GEN_INT (0);
 + op[nopnds++] = gen_rtx_REG (SImode, 0);
 + op[nopnds++] = GEN_INT (0);

Is that really r0, isn't that (0|rA)?  [Too lazy to read the docs myself
right now, sorry.]
   
   The ISA doc shows:
  
  [snip]
  
  Thanks for looking it up!
  
  I'm still a bit worried about putting a reg in the RTL (while the 
  instruction
  doesn't actually use one), but perhaps it's harmless.
 
 I'm not sure what you mean by the instruction doesn't use one.
 The hardware instruction does use a register for its second
 operand (even though its contents are ignored due to TO == 0)
 and the pattern requires us to pass in a reg rtx, so I'm not
 sure what you're referring to.

I mean the instruction doesn't actually use the value in the register
(if it did, you couldn't just pass in a non-fixed hard register in RTL).

Using a hard reg in the RTL like this has a few problems:
a) It might hinder register allocation.  Maybe it doesn't, not sure;
b) It does hinder scheduling;
c) It can make things ICE, maybe with register asm.

I no longer think c) will happen in this case.

The alternative is to write a separate define_insn for ttest, one
without inputs; the generated assembler can still be the same of
course.

Cheers,


Segher

Re: [PATCH 02/12] remove some ifdef HAVE_cc0

2015-04-22 Thread James Greenhalgh

On Tue, Apr 21, 2015 at 04:24:44PM +0100, Trevor Saunders wrote:
 On Tue, Apr 21, 2015 at 04:14:01PM +0200, Richard Biener wrote:
  On Tue, Apr 21, 2015 at 3:24 PM,  tbsaunde+...@tbsaunde.org wrote:
   From: Trevor Saunders tbsaunde+...@tbsaunde.org
  
   gcc/ChangeLog:
  
   2015-04-21  Trevor Saunders  tbsaunde+...@tbsaunde.org
  
   * conditions.h: Define macros even if HAVE_cc0 is undefined.
   * emit-rtl.c: Define functions even if HAVE_cc0 is undefined.
   * final.c: Likewise.
   * jump.c: Likewise.
   * recog.c: Likewise.
   * recog.h: Declare functions even when HAVE_cc0 is undefined.
   * sched-deps.c (sched_analyze_2): Always compile case for cc0.

If I've counted right after the git bisect, this patch seems to break
the ARM buildi (arm-none-linux-gnueabihf):

  In file included from insn-output.c:40:0:
  /gcc-src/gcc/conditions.h:112:0: error: CC_STATUS_INIT redefined 
[-Werror]
   #define CC_STATUS_INIT  \
   ^
  In file included from tm.h:35:0,
   from insn-output.c:7:
  /gcc-src/gcc/config/arm/arm.h:2159:0: note: this is the location of the 
previous definition
   #define CC_STATUS_INIT \
   ^

I guess the conditions.h definition wants wrapping in #ifndef - though a
quick grep suggests that ARM is the only target defining CC_STATUS_INIT
so lets CC the ARM maintainers and see what their preference is...

Thanks,
James

[Patch][ARM]Correct options for arm test case pr65710

2015-04-22 Thread Terry Guo

Hi there,

This patch is to correct options in arm test case pr65710.c. I reused
some existing test case as template to produce this case, but forgot
to update the options. Is it OK to trunk?

BR,
Terry

2015-04-23  Terry Guo  terry@arm.com

   * gcc.target/arm/pr65710.c: Update the options.
diff --git a/gcc/testsuite/gcc.target/arm/pr65710.c 
b/gcc/testsuite/gcc.target/arm/pr65710.c
index 139bc64..737b7f3 100644
--- a/gcc/testsuite/gcc.target/arm/pr65710.c
+++ b/gcc/testsuite/gcc.target/arm/pr65710.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -march=armv6-m -mthumb -O3 -w -mfloat-abi=soft } */
+/* { dg-options -mthumb -O2 -mfloat-abi=soft } */
 
 struct ST {
   char *buffer;

Re: [PATCH] fortran/65429 -- don't dereference a null pointer

2015-04-22 Thread Steve Kargl

On Tue, Apr 07, 2015 at 12:59:20PM -0700, Steve Kargl wrote:
 On Tue, Mar 31, 2015 at 10:17:14AM -0700, Jerry DeLisle wrote:
  On 03/29/2015 09:25 AM, Steve Kargl wrote:
   On Sat, Mar 28, 2015 at 01:01:57AM +0100, Dominique Dhumieres wrote:
  
   AFAICT your test succeeds without your patch and does not test that the 
   ICE
   reported by FX is gone (indeed it is with your patch).
  
  
   New patch and testcase.  The ChangeLog entries are in the
   original email.  Built and tested on x86_64-*-freebsd.
   OK, now?
  
  
  OK Steve.
 
 I just checked and this is not a regression with respect
 to 4.6, 4.7. 4.8, or 4.9.  As 5.0 is coming soon, I'll
 for stage 1 to commit.
 

Fixed with the following

r222342 trunk
r222343 5.1 branch

-- 
Steve

Re: [RFC][PATCH 3/3] Enable zero/sign extension elimination

On Wed, Apr 22, 2015 at 3:15 PM, Kugan
kugan.vivekanandara...@linaro.org wrote:
 On 17/01/15 13:11, Kugan wrote:

 Re-enable zero/sign extension elimination using value range that
 includes wrapped attribute.


 Now that stage-1 is open, rebased it and regression tested on
 x86-64-none-linux-gnu with no new regressions.

 Is this OK for trunk?

 Thanks,
 Kugan

 gcc/ChangeLog:

 2015-04-22  Kugan Vivekanandarajah  kug...@linaro.org

 * calls.c (precompute_arguments): Check
  promoted_for_signed_and_unsigned_p and set the promoted mode.
 * expr.c (expand_expr_real_1): Likewise.
 (promoted_for_signed_and_unsigned_p): New function.
 * cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
 SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
 * expr.h (promoted_for_signed_and_unsigned_p): New definition.

Are you planning to submit some testcases to show its improvement?
Will it help

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53639
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33349
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44532

-- 
H.J.

Re: [PATCH 00/12] Reduce conditional compilation


On 04/22/2015 02:46 PM, Trevor Saunders wrote:

yeah, its irritated me on a number of occasions too.  I'd really like it
if building config-list.mk could be faster, but that's a much bigger
project, but at least if everything is target hooks maybe ccache can
kick in some.
I don't see ccache kicking in here because the target .h files get 
included most of the time.


Maybe one day that won't be the case :-)

jeff

Re: [PATCH 00/12] Reduce conditional compilation

2015-04-22 Thread Trevor Saunders

On Wed, Apr 22, 2015 at 12:36:58PM -0600, Jeff Law wrote:
 On 04/22/2015 12:13 PM, David Malcolm wrote:
 
 
 Conditional compilation was a major PITA when doing the rtx-rtx_insn *
 work last year, so I'm very pleased to see these cleanups go in.
 Yup.  It also got in Andrew's way last year and we regularly see cases where
 small patches which work fine on the mainstream architectures fail to build
 in the lesser used architectures (particularly cc0 targets). It's a whole
 class of problems I want to see slowly disappear.

yeah, its irritated me on a number of occasions too.  I'd really like it
if building config-list.mk could be faster, but that's a much bigger
project, but at least if everything is target hooks maybe ccache can
kick in some.

Trev

 
 glibc went through this process in their codebase for similar reasons, but
 they had more to lose when they got it wrong -- IIRC they had a case where
 exported ABI would differ as a result of conditionally compiled code.  Not
 good.
 
 Jeff

Re: [RFC][PATCH 2/3] Propagate and save value ranges wrapped information


On 19/01/15 22:28, Richard Biener wrote:
 On Sat, 17 Jan 2015, Kugan wrote:
 

 This patch propagate value range wrapps attribute and save this to
 SSA_NAME.
 
 diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
 index 9b7695d..832c35d 100644
 --- a/gcc/tree-vrp.c
 +++ b/gcc/tree-vrp.c
 @@ -103,6 +103,9 @@ struct value_range_d
tree min;
tree max;
  
 +  /* Set to true if values in this value range could wrapp. */
 +  bool is_wrapped;
 +
/* Set of SSA names whose value ranges are equivalent to this one.
   This set is only valid when TYPE is VR_RANGE or VR_ANTI_RANGE.  */
bitmap equiv;
 
 I can't make sense of this description (wrap with one p as well).
 I assume you mean that the expression that has this value-range
 assigned has an operation that may have wrapped?  (a value
 can't wrap)
 
 You need to specify how is_wrapped behaves for range union and
 intersect operations and which operations can wrap.
 
 I miss an overall description of these patches as to a) why you
 need this information and b) why it helps.
 
 It's now also too late and thus you have plenty of time until stage1
 starts again.


Thanks Richard for the comments. Now that stage1 is open, here is the
modified patch with the changes requested.

Due to wrapping in the value range computation, there was a regression
in aplha-linux
(https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02458.html) while using
value range infromation to remove zero/sign extensions in rtl
expansaion. Hence I had to revert the patch that enabled zero/sign
extension. Now I am propgating this wrap_p information to SSA_NAME so
that we know, when used in PROMOTE_MODE, the values can have
unpredictable bits beyon the type width.

I have also updated the comments as below:


+  /* Set to true if the values in this range might have been wrapped
+ during the operation that computed it.
+
+ This is mainly used in zero/sign-extension elimination where value
ranges
+ computed are for the type of SSA_NAME and computation is
ultimately done
+ in PROMOTE_MODE.  Therefore, the value ranges has to be correct upto
+ PROMOTE_MODE precision.  If the operation can WRAP, higher bits in
+ PROMOTE_MODE can be unpredictable and cannot be used in zero/sign
extension
+ elimination; additional wrap_p attribute is needed to show this.
+
+ For example:
+ on alpha where PROMOTE_MODE is 64 bit and _344 is a 32 bit unsigned
+ variable,
+ _343 = ivtmp.179_52 + 2147483645;  [0x8004, 0x80043]
+
+ the value range VRP will compute is:
+
+ _344 = _343 * 2;  [0x8, 0x86]
+ _345 = (integer(kind=4)) _344;[0x8, 0x86]
+
+ In PROMOTE_MODE, there will be garbage above the type width.  In
places
+ like this, attribute wrap_p will be true.
+
+ wrap_p in range union operation will be true if either of the
value range
+ has wrap_p set.  In intersect operation, true when both the value
ranges
+ have wrap_p set.  */
+  bool wrap_p;
+

Thanks,
Kugan


gcc/testsuite/ChangeLog:

2015-04-22  Kugan Vivekanandarajah  kug...@linaro.org

* gcc.dg/tree-ssa/vrp92.c: Update scanned pattern.

gcc/ChangeLog:

2015-04-22  Kugan Vivekanandarajah  kug...@linaro.org

* builtins.c (determine_block_size): Use new definition of
 get_range_info.
* gimple-pretty-print.c (dump_ssaname_info): Dump new wrap_p info.
* internal-fn.c (get_range_pos_neg): Use new definition of
 get_range_info.
(get_min_precision): Likewise.
* tree-ssa-copy.c (fini_copy_prop): Use new definition of
 duplicate_ssa_range_info.
* tree-ssa-pre.c (insert_into_preds_of_block): Likewise.
(move_computations_dom_walker::before_dom_children): Likewise.
* tree-ssa-phiopt.c (value_replacement): Likewise.
* tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
Likewise.
* tree-ssa-loop-niter.c (determine_value_range): Use new definition.
(record_nonwrapping_iv): Likewise.
* tree-ssanames.c (set_range_info): Save wrap_p information.
(get_range_info): Retrive wrap_p information.
(set_nonzero_bits): Set wrap_p info.
(duplicate_ssa_name_range_info): Likewise.
(duplicate_ssa_name_fn): Likewise.
* tree-ssanames.h: (set_range_info): Update definition.
(get_range_info): Likewise.
* tree-vect-patterns.c (vect_recog_divmod_pattern): Use new
declaration get_range_info.
* tree-vrp.c (struct value_range_d): Add wrap_p field.
(set_value_range): Calculate and add wrap_p field.
(set_and_canonicalize_value_range): Likewise.
(copy_value_range): Likewise.
(set_value_range_to_value): Likewise.
(set_value_range_to_nonnegative): Likewise.
(set_value_range_to_nonnull): Likewise.
(set_value_range_to_truthvalue): Likewise.
(abs_extent_range): Likewise.
(get_value_range): Return wrap_p info.

[debug-early] Only output DW_TAG_GNU_formal_parameter_pack DIEs once

2015-04-22 Thread Aldy Hernandez

The attached patch fixes 
gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C.


The problem is that DW_TAG_GNU_formal_parameter_pack DIEs are generated 
multiple times (once for early dwarf and once for late dwarf).  Fixed by 
only outputting in early dwarf.


Tested with GCC and GDB testsuites.

Committed to branch.

Aldy
commit e74781aabb821c402a9c1efeb69a6311e4e905cf
Author: Aldy Hernandez al...@redhat.com
Date:   Wed Apr 22 16:50:09 2015 -0700

Only output DW_TAG_GNU_formal_parameter_pack DIEs once.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 7cc6bb5..624ed19 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -19137,9 +19137,14 @@ gen_subprogram_die (tree decl, dw_die_ref context_die)
{
  if (generic_decl_parm
   lang_hooks.function_parameter_pack_p (generic_decl_parm))
-   gen_formal_parameter_pack_die (generic_decl_parm,
-  parm, subr_die,
-  parm);
+   {
+ if (early_dwarf_dumping)
+   gen_formal_parameter_pack_die (generic_decl_parm,
+  parm, subr_die,
+  parm);
+ else if (parm)
+   parm = DECL_CHAIN (parm);
+   }
  else if (parm  !POINTER_BOUNDS_P (parm))
{
  dw_die_ref parm_die = gen_decl_die (parm, NULL, subr_die);

Re: [RFC][PATCH 1/3] Free a bit in SSA_NAME to save wrapped information


On 17/01/15 13:06, Kugan wrote:
 Freeing a spare-bit to store wrapped attribute by going back to
 representing VR_ANTI_RANGE as [max + 1, min - 1] in SSA_NAME.
 

Now that stage-1 is open, rebased it and regression tested on
x86-64-none-linux-gnu with no new regressions.

Is this OK for trunk?

Thanks,
Kugan


gcc/ChangeLog:

2015-04-22  Kugan Vivekanandarajah  kug...@linaro.org

* tree-ssanames.c (set_range_info): Change range info representation
  and represent VR_ANTI_RANGE as [max + 1, min - 1].
(get_range_info): Likewise.
(set_nonzero_bits): Likewise.
(duplicate_ssa_name_range_info): Likewise.
* tree-ssanames.h (set_range_info): Change prototype.
(get_range_info): Likewise.
(set_nonzero_bits): Likewise.
(duplicate_ssa_name_range_info): Likewise.
* tree-vrp.c (remove_range_assertions): Use new representation.
(vrp_finalize): Likewise.
* tree-ssa-pre.c (insert_into_preds_of_block): Likewise.
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index d857d84..67e5351 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3205,7 +3205,6 @@ insert_into_preds_of_block (basic_block block, unsigned 
int exprnum,
   !wi::neg_p (max, SIGNED))
/* Just handle extension and sign-changes of all-positive ranges.  */
set_range_info (temp,
-   SSA_NAME_RANGE_TYPE (expr-u.nary-op[0]),
wide_int_storage::from (min, TYPE_PRECISION (type),
TYPE_SIGN (type)),
wide_int_storage::from (max, TYPE_PRECISION (type),
diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 9c39f65..744dc43 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -200,11 +200,10 @@ make_ssa_name_fn (struct function *fn, tree var, gimple 
stmt)
 /* Store range information RANGE_TYPE, MIN, and MAX to tree ssa_name NAME.  */
 
 void
-set_range_info (tree name, enum value_range_type range_type,
+set_range_info (tree name,
const wide_int_ref min, const wide_int_ref max)
 {
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
-  gcc_assert (range_type == VR_RANGE || range_type == VR_ANTI_RANGE);
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
   unsigned int precision = TYPE_PRECISION (TREE_TYPE (name));
 
@@ -219,16 +218,12 @@ set_range_info (tree name, enum value_range_type 
range_type,
   ri-set_nonzero_bits (wi::shwi (-1, precision));
 }
 
-  /* Record the range type.  */
-  if (SSA_NAME_RANGE_TYPE (name) != range_type)
-SSA_NAME_ANTI_RANGE_P (name) = (range_type == VR_ANTI_RANGE);
-
   /* Set the values.  */
   ri-set_min (min);
   ri-set_max (max);
 
   /* If it is a range, try to improve nonzero_bits from the min/max.  */
-  if (range_type == VR_RANGE)
+  if (wi::cmp (min, max, TYPE_SIGN (TREE_TYPE (name)))  0)
 {
   wide_int xorv = ri-get_min () ^ ri-get_max ();
   if (xorv != 0)
@@ -248,6 +243,7 @@ get_range_info (const_tree name, wide_int *min, wide_int 
*max)
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
   gcc_assert (min  max);
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
+  value_range_type range_type;
 
   /* Return VR_VARYING for SSA_NAMEs with NULL RANGE_INFO or SSA_NAMEs
  with integral types width  2 * HOST_BITS_PER_WIDE_INT precision.  */
@@ -255,9 +251,22 @@ get_range_info (const_tree name, wide_int *min, wide_int 
*max)
   2 * HOST_BITS_PER_WIDE_INT))
 return VR_VARYING;
 
-  *min = ri-get_min ();
-  *max = ri-get_max ();
-  return SSA_NAME_RANGE_TYPE (name);
+   /* If max  min, it is VR_ANTI_RANGE.  */
+  if (wi::cmp (ri-get_max (), ri-get_min (), TYPE_SIGN (TREE_TYPE (name)))  
0)
+{
+  /* VR_ANTI_RANGE ~[min, max] is encoded as [max + 1, min - 1].  */
+  range_type = VR_ANTI_RANGE;
+  *min = wi::add (ri-get_max (), 1);
+  *max = wi::sub (ri-get_min (), 1);
+}
+  else
+{
+  /* Otherwise (when min = max), it is VR_RANGE.  */
+  range_type = VR_RANGE;
+  *min = ri-get_min ();
+  *max = ri-get_max ();
+}
+  return range_type;
 }
 
 /* Change non-zero bits bitmask of NAME.  */
@@ -267,7 +276,7 @@ set_nonzero_bits (tree name, const wide_int_ref mask)
 {
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
   if (SSA_NAME_RANGE_INFO (name) == NULL)
-set_range_info (name, VR_RANGE,
+set_range_info (name,
TYPE_MIN_VALUE (TREE_TYPE (name)),
TYPE_MAX_VALUE (TREE_TYPE (name)));
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
@@ -495,7 +504,8 @@ duplicate_ssa_name_ptr_info (tree name, struct ptr_info_def 
*ptr_info)
 /* Creates a duplicate of the range_info_def at RANGE_INFO of type
RANGE_TYPE for use by the SSA name NAME.  */
 void
-duplicate_ssa_name_range_info (tree name, enum value_range_type range_type,
+duplicate_ssa_name_range_info (tree name,
+  enum value_range_type range_type

Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state

2015-04-22 Thread Segher Boessenkool

On Wed, Apr 22, 2015 at 08:43:10AM -0500, Peter Bergner wrote:
  Maybe you can fold tabortdc with tabortwc now?  Use one UNSPEC name
  for both, :GPR and wd?
 
 Wouldn't that change the tabortwc pattern to use DImode rather
 than SImode when compiled with -m64 or -m32 -mpowerpc64?
 I'm not sure we want that.

The GPR mode iterator creates two patterns, one for SI and one for DI,
and the tabortwc would be the one for SI if you use wd.

   +   case HTM_BUILTIN_TTEST: /* Alias for: tabortwci. 0,r0,0  */
   + op[nopnds++] = GEN_INT (0);
   + op[nopnds++] = gen_rtx_REG (SImode, 0);
   + op[nopnds++] = GEN_INT (0);
  
  Is that really r0, isn't that (0|rA)?  [Too lazy to read the docs myself
  right now, sorry.]
 
 The ISA doc shows:

[snip]

Thanks for looking it up!

I'm still a bit worried about putting a reg in the RTL (while the instruction
doesn't actually use one), but perhaps it's harmless.

   + emit_insn (gen_movcc (subreg, cr));
   + emit_insn (gen_lshrsi3 (scratch2, scratch1, GEN_INT (28)));
   + emit_insn (gen_andsi3 (target, scratch2, GEN_INT (0xf)));
   +   }
   +   }
  
  Don't we have helper functions/expanders to do these moves?  Yuck.
 
 Heh, I looked.  The only helper pattern was the movcc pattern, but
 that placed the CR into bits 32-35 of the register.  I needed the
 shift to move it down into the low nibble and I use the and, since
 one of the move cr insns places two copies of the CR value into
 bits 32-35 and 36-39.

At least the VMX patterns have something like it for CR6.  Probably not
directly usable either, sigh.

   -/* { dg-final { scan-assembler-times tabortdc\\. 1 } } */
   -/* { dg-final { scan-assembler-times tabortdci\\. 1 } } */
   +/* { dg-final { scan-assembler-times tabortdc\\. 1 { target lp64 } } } 
   */
   +/* { dg-final { scan-assembler-times tabortdci\\. 1 { target lp64 } } 
   } */
  
  This skips this test on -m32 -mpowerpc64, is that on purpose?
 
 Ummm, not exactly.  :-)   Not that many people test that though.
 I'll see if I can find a replacement for lp64 that covers that case.

Maybe just { powerpc64 } works?

 If not, I'm not too torn up if we skip it for -m32 -mpowerpc64.

Me neither, just looked like an oversight.


Segher

Re: [PATCH 02/12] remove some ifdef HAVE_cc0

2015-04-22 Thread Trevor Saunders

On Thu, Apr 23, 2015 at 04:27:59AM +0100, James Greenhalgh wrote:
 On Tue, Apr 21, 2015 at 04:24:44PM +0100, Trevor Saunders wrote:
  On Tue, Apr 21, 2015 at 04:14:01PM +0200, Richard Biener wrote:
   On Tue, Apr 21, 2015 at 3:24 PM,  tbsaunde+...@tbsaunde.org wrote:
From: Trevor Saunders tbsaunde+...@tbsaunde.org
   
gcc/ChangeLog:
   
2015-04-21  Trevor Saunders  tbsaunde+...@tbsaunde.org
   
* conditions.h: Define macros even if HAVE_cc0 is undefined.
* emit-rtl.c: Define functions even if HAVE_cc0 is undefined.
* final.c: Likewise.
* jump.c: Likewise.
* recog.c: Likewise.
* recog.h: Declare functions even when HAVE_cc0 is undefined.
* sched-deps.c (sched_analyze_2): Always compile case for cc0.
 
 If I've counted right after the git bisect, this patch seems to break
 the ARM buildi (arm-none-linux-gnueabihf):
 
   In file included from insn-output.c:40:0:
   /gcc-src/gcc/conditions.h:112:0: error: CC_STATUS_INIT redefined 
 [-Werror]
#define CC_STATUS_INIT  \
^
   In file included from tm.h:35:0,
from insn-output.c:7:
   /gcc-src/gcc/config/arm/arm.h:2159:0: note: this is the location of the 
 previous definition
#define CC_STATUS_INIT \
^
 
 I guess the conditions.h definition wants wrapping in #ifndef - though a
 quick grep suggests that ARM is the only target defining CC_STATUS_INIT
 so lets CC the ARM maintainers and see what their preference is...

Well, that seems pretty weird, but it looks intentional arm does this
see http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00437.html

Of course I now see final.c also defines a fall back, so maybe the right
thing to do is wrap the conditions.h definition in #if HAVE_cc0, or
maybe the final.c definition can go away? Right now I'm to tired to make
a good decision about that.

sorry about the bustage!

Trev

 
 
 Thanks,
 James

Re: [PATCH] tetstsuite gcc.target/i386/ avx512*

2015-04-22 Thread Andreas Tobler


Hi Kirill,

On 21.04.15 10:28, Kirill Yukhin wrote:


On 19 Apr 21:56, Andreas Tobler wrote:

Done so and tested on FreeBSD amd64-unknown-freebsd11.0 and CentOS7.1.

Ok for trunk?

The patch is OK for trunk and for gcc-5 branch (when it is open).

Thanks for fixing this!


Done on trunk and gcc-5.

Thanks for the review!

Andreas

Re: [PATCH][libstc++v3]Add new dg-require-thread-fence directive.

2015-04-22 Thread Jonathan Wakely

On 22 April 2015 at 12:25, Renlin Li wrote:
 Hi Jonathan,

 Thank you for the suggestion. I have just attached the updated the patch. It
 works as before.
 Is this Okay to commit?

OK, thanks.

[PATCH] Dev tree housekeeping


This pushes a bunch of changes from my dev tree to trunk.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-04-22  Richard Biener  rguent...@suse.de

* cfgexpand.c (expand_gimple_stmt_1): Use ops.code.
* cfgloop.c (verify_loop_structure): Verify the root loop node.
* except.c (duplicate_eh_regions): Call get_eh_region_from_lp_number_fn
instead of get_eh_region_from_lp_number.
* loop-init.c (fix_loop_structure): If we removed a loop, reset
the SCEV cache.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c (revision 222320)
+++ gcc/cfgexpand.c (working copy)
@@ -3413,7 +3413,7 @@ expand_gimple_stmt_1 (gimple stmt)
 
ops.code = gimple_assign_rhs_code (assign_stmt);
ops.type = TREE_TYPE (lhs);
-   switch (get_gimple_rhs_class (gimple_expr_code (stmt)))
+   switch (get_gimple_rhs_class (ops.code))
  {
case GIMPLE_TERNARY_RHS:
  ops.op2 = gimple_assign_rhs3 (assign_stmt);
Index: gcc/cfgloop.c
===
--- gcc/cfgloop.c   (revision 222320)
+++ gcc/cfgloop.c   (working copy)
@@ -1347,6 +1347,16 @@ verify_loop_structure (void)
   else
 verify_dominators (CDI_DOMINATORS);
 
+  /* Check the loop tree root.  */
+  if (current_loops-tree_root-header != ENTRY_BLOCK_PTR_FOR_FN (cfun)
+  || current_loops-tree_root-latch != EXIT_BLOCK_PTR_FOR_FN (cfun)
+  || (current_loops-tree_root-num_nodes
+ != (unsigned) n_basic_blocks_for_fn (cfun)))
+{
+  error (corrupt loop tree root);
+  err = 1;
+}
+
   /* Check the headers.  */
   FOR_EACH_BB_FN (bb, cfun)
 if (bb_loop_header_p (bb))
Index: gcc/except.c
===
--- gcc/except.c(revision 222320)
+++ gcc/except.c(working copy)
@@ -649,7 +649,7 @@ duplicate_eh_regions (struct function *i
   data.label_map_data = map_data;
   data.eh_map = new hash_mapvoid *, void *;
 
-  outer_region = get_eh_region_from_lp_number (outer_lp);
+  outer_region = get_eh_region_from_lp_number_fn (cfun, outer_lp);
 
   /* Copy all the regions in the subtree.  */
   if (copy_region)
Index: gcc/loop-init.c
===
--- gcc/loop-init.c (revision 222320)
+++ gcc/loop-init.c (working copy)
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
 #include ggc.h
 #include tree-ssa-loop-niter.h
 #include loop-unroll.h
+#include tree-scalar-evolution.h
 
 
 /* Apply FLAGS to the loop state.  */
@@ -221,6 +222,9 @@ fix_loop_structure (bitmap changed_bbs)
 
   timevar_push (TV_LOOP_INIT);
 
+  if (dump_file  (dump_flags  TDF_DETAILS))
+fprintf (dump_file, fix_loop_structure: fixing up loops for function\n);
+
   /* We need exact and fast dominance info to be available.  */
   gcc_assert (dom_info_state (CDI_DOMINATORS) == DOM_OK);
 
@@ -290,6 +294,7 @@ fix_loop_structure (bitmap changed_bbs)
 }
 
   /* Finally free deleted loops.  */
+  bool any_deleted = false;
   FOR_EACH_VEC_ELT (*get_loops (cfun), i, loop)
 if (loop  loop-header == NULL)
   {
@@ -322,8 +327,14 @@ fix_loop_structure (bitmap changed_bbs)
  }
(*get_loops (cfun))[i] = NULL;
flow_loop_free (loop);
+   any_deleted = true;
   }
 
+  /* If we deleted loops then the cached scalar evolutions refering to
+ those loops become invalid.  */
+  if (any_deleted  scev_initialized_p ())
+scev_reset_htab ();
+
   loops_state_clear (LOOPS_NEED_FIXUP);
 
   /* Apply flags to loops.  */

niter_base simplification

2015-04-22 Thread François Dumont


Hello

I don't know if I am missing something but I think __niter_base 
could be simplified to remove usage of _Iter_base. Additionally I 
overload it to also remove __normal_iterator layer even if behind a 
reverse_iterator or move_iterator, might help compiler to optimize code, 
no ? If not, might allow other algo optimization in the future...


I prefered to provide a __make_reverse_iterator to allow the latter 
in C++11 and not only in C++14. Is it fine to do it this way or do you 
prefer to simply get rid of all this part ?


* include/bits/cpp_type_traits.h (__gnu_cxx::__normal_iterator): 
Delete.

* include/bits/stl_algobase.h (std::__niter_base): Adapt.
* include/bits/stl_iterator.h (__make_reverse_iterator): New in C++11.
(std::__niter_base): Overloads for std::reverse_iterator,
__gnu_cxx::__normal_iterator and std::move_iterator.

Tested under Linux x86_64. I checked that std::copy still ends up 
calling __builtin_memmove when used on vector iterators.


François

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h b/libstdc++-v3/include/bits/cpp_type_traits.h
index 8c6bb7f..2142917 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -64,17 +64,6 @@
 // removed.
 //
 
-// Forward declaration hack, should really include this from somewhere.
-namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
-{
-_GLIBCXX_BEGIN_NAMESPACE_VERSION
-
-  templatetypename _Iterator, typename _Container
-class __normal_iterator;
-
-_GLIBCXX_END_NAMESPACE_VERSION
-} // namespace
-
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -331,24 +320,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
 };
 
   //
-  // Normal iterator type
-  //
-  templatetypename _Tp
-struct __is_normal_iterator
-{
-  enum { __value = 0 };
-  typedef __false_type __type;
-};
-
-  templatetypename _Iterator, typename _Container
-struct __is_normal_iterator __gnu_cxx::__normal_iterator_Iterator,
-			  _Container 
-{
-  enum { __value = 1 };
-  typedef __true_type __type;
-};
-
-  //
   // An arithmetic type is an integer type or a floating point type
   //
   templatetypename _Tp
diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 0bcb133..73eea6b 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -270,17 +270,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __a;
 }
 
-  // If _Iterator is a __normal_iterator return its base (a plain pointer,
-  // normally) otherwise return it untouched.  See copy, fill, ... 
+  // Fallback implementation of the function used to remove the
+  // __normal_iterator wrapper. See copy, fill, ...
   templatetypename _Iterator
-struct _Niter_base
-: _Iter_base_Iterator, __is_normal_iterator_Iterator::__value
-{ };
-
-  templatetypename _Iterator
-inline typename _Niter_base_Iterator::iterator_type
+inline _Iterator
 __niter_base(_Iterator __it)
-{ return std::_Niter_base_Iterator::_S_base(__it); }
+{ return __it; }
 
   // Likewise, for move_iterator.
   templatetypename _Iterator
diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 4a9189e..3aad9f3 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -390,7 +390,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return __y.base() - __x.base(); }
   //@}
 
-#if __cplusplus  201103L
+#if __cplusplus == 201103L
+  templatetypename _Iterator
+inline reverse_iterator_Iterator
+__make_reverse_iterator(_Iterator __i)
+{ return reverse_iterator_Iterator(__i); }
+
+# define _GLIBCXX_MAKE_REVERSE_ITERATOR(_Iter) \
+  std::__make_reverse_iterator(_Iter)
+#elif __cplusplus  201103L
 #define __cpp_lib_make_reverse_iterator 201402
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
@@ -400,6 +408,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline reverse_iterator_Iterator
 make_reverse_iterator(_Iterator __i)
 { return reverse_iterator_Iterator(__i); }
+
+# define _GLIBCXX_MAKE_REVERSE_ITERATOR(_Iter) \
+  std::make_reverse_iterator(_Iter)
+#endif
+
+#if __cplusplus = 201103L
+  templatetypename _Iterator
+auto
+__niter_base(reverse_iterator_Iterator __it)
+- decltype(_GLIBCXX_MAKE_REVERSE_ITERATOR(__niter_base(__it.base(
+{ return _GLIBCXX_MAKE_REVERSE_ITERATOR(__niter_base(__it.base())); }
 #endif
 
   // 24.4.2.2.1 back_insert_iterator
@@ -979,6 +998,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  templatetypename _Iterator, typename _Container
+_Iterator
+__niter_base(__gnu_cxx::__normal_iterator_Iterator, _Container __it)
+{ return __it.base(); }
+
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace
+
 #if __cplusplus =

Re: Hide _S_n_primes from user code

2015-04-22 Thread François Dumont


With the patch this time.


On 22/04/2015 21:39, François Dumont wrote:

Hello

Here is a rather trivial patch, just code cleanup. Since we export 
_Prime_rehash_policy we do not need to expose the _S_n_primes anymore.


* include/bits/hashtable_policy.h 
(_Prime_rehash_policy::_S_n_primes):

Delete.
* src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
Remove usage of latter and compute size of the prime numbers array
locally.

Tested under Linux x86_64.

Ok to commit ?

François



diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 14bcca6..a9ad7dd 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -495,8 +495,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_reset(_State __state)
 { _M_next_resize = __state; }
 
-enum { _S_n_primes = sizeof(unsigned long) != 8 ? 256 : 256 + 48 };
-
 static const std::size_t _S_growth_factor = 2;
 
 float		_M_max_load_factor;
diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
index 22de51b..69f999f 100644
--- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc
+++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
@@ -56,8 +56,10 @@ namespace __detail
 	return __fast_bkt[__n];
   }
 
+constexpr auto __n_primes
+  = sizeof(__prime_list) / sizeof(unsigned long) - 1;
 const unsigned long* __next_bkt =
-  std::lower_bound(__prime_list + 5, __prime_list + _S_n_primes, __n);
+  std::lower_bound(__prime_list + 5, __prime_list + __n_primes, __n);
 _M_next_resize =
   __builtin_ceil(*__next_bkt * (long double)_M_max_load_factor);
 return *__next_bkt;

Re: Hide _S_n_primes from user code

On Wed, Apr 22, 2015 at 09:39:48PM +0200, François Dumont wrote:
 Hello
 
 Here is a rather trivial patch, just code cleanup. Since we export
 _Prime_rehash_policy we do not need to expose the _S_n_primes anymore.
 
 * include/bits/hashtable_policy.h (_Prime_rehash_policy::_S_n_primes):
 Delete.
 * src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
 Remove usage of latter and compute size of the prime numbers array
 locally.
 
 Tested under Linux x86_64.
 
 Ok to commit ?

The patch is missing :).

Marek

Re: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc

2015-04-22 Thread Ramana Radhakrishnan

On Wed, Apr 22, 2015 at 5:34 PM, H.J. Lu hongjiu...@intel.com wrote:
 Normally, with PIE, GCC accesses globals that are extern to the module
 using GOT.  This is two instructions, one to get the address of the global
 from GOT and the other to get the value.  Examples:

 ---
 extern int a_glob;
 int
 main ()
 {
   return a_glob;
 }
 ---

 With PIE, the generated code accesses global via GOT using two memory
 loads:

 movqa_glob@GOTPCREL(%rip), %rax
 movl(%rax), %eax

 for 64-bit or

 movla_glob@GOT(%ecx), %eax
 movl(%eax), %eax

 for 32-bit.

 Some experiments on google and SPEC CPU benchmarks show that the extra
 instruction affects performance by 1% to 5%.

 Solution - Copy Relocations:

 When the linker supports copy relocations, GCC can always assume that
 the global will be defined in the executable.  For globals that are
 truly extern (come from shared objects), the linker will create copy
 relocations and have them defined in the executable.  Result is that
 no global access needs to go through GOT and hence improves performance.
 We can generate

 movla_glob(%rip), %eax

 for 64-bit and

 movla_glob@GOTOFF(%eax), %eax

 for 32-bit.  This optimization only applies to undefined non-weak
 non-TLS global data.  Undefined weak global or TLS data access still
 must go through GOT.

 This patch reverts legitimate_pic_address_disp_p change made in revision
 218397, which only applies to x86-64.  Instead, this patch updates
 targetm.binds_local_p to indicate if undefined non-weak non-TLS global
 data is defined locally in PIE.  It also introduces a new target hook,
 binds_tls_local_p to distinguish TLS variable from non-TLS variable.  By
 default, binds_tls_local_p is the same as binds_local_p.

 This patch checks if 32-bit and 64-bit linkers support PIE with copy
 reloc at configure time.  64-bit linker is enabled in binutils 2.25
 and 32-bit linker is enabled in binutils 2.26.  This optimization
 is enabled only if the linker support is available.

 Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without
 support for copy relocation in PIE.  OK for trunk?

 Thanks.


Looking at this my first reaction was that surely most (if not all ? )
targets that use ELF and had copy relocs would benefit from this ?
Couldn't we find a simpler way for targets to have this support ? I
don't have a more constructive suggestion to make at the minute but
getting this to work just from the targetm.binds_local_p (decl)
interface would probably be better ?

It's late in the evening and I probably won't have time to look at
this in more detail till Friday afternoon given other personal
commitments.


regards,
Ramana





 H.J.
 ---
 gcc/

 PR target/65846
 * configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ...
 (HAVE_LD_64BIT_PIE_COPYRELOC): This.
 (HAVE_LD_32BIT_PIE_COPYRELOC): New.   Defined to 1 if Linux/ia32
 linker supports PIE with copy reloc.



 * output.h (default_binds_tls_local_p): New.
 (default_binds_local_p_3): Add 2 bool arguments.
 * target.def (binds_tls_local_p): New target hook.
 * varasm.c (decl_default_tls_model): Replace targetm.binds_local_p
 with targetm.binds_tls_local_p.
 (default_binds_local_p_3): Add a bool argument to indicate TLS
 variable and a bool argument to indicate if an undefined non-TLS
 non-weak data is local.  Double check TLS variable.  If an
 undefined non-TLS non-weak data is local, treat it as defined
 locally.
 (default_binds_local_p): Pass false and false to
 default_binds_local_p_3.
 (default_binds_local_p_2): Likewise.
 (default_binds_local_p_1): Likewise.
 (default_binds_tls_local_p): New.
 * config.in: Regenerated.
 * configure: Likewise.
 * doc/tm.texi: Likewise.
 * config/i386/i386.c (legitimate_pic_address_disp_p): Don't
 check HAVE_LD_PIE_COPYRELOC here.
 (ix86_binds_local): New.
 (ix86_binds_tls_local_p): Likewise.
 (ix86_binds_local_p): Use it.
 (TARGET_BINDS_TLS_LOCAL_P): New.
 * doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook.

 gcc/testsuite/

 PR target/65846
 * gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32.
 * gcc.target/i386/pie-copyrelocs-2.c: Likewise.
 * gcc.target/i386/pie-copyrelocs-3.c: Likewise.
 * gcc.target/i386/pie-copyrelocs-4.c: Likewise.
 * gcc.target/i386/pr32219-9.c: Likewise.
 * gcc.target/i386/pr32219-10.c: New file.

 * lib/target-supports.exp (check_effective_target_pie_copyreloc):
 Check HAVE_LD_64BIT_PIE_COPYRELOC and HAVE_LD_32BIT_PIE_COPYRELOC
 instead of HAVE_LD_64BIT_PIE_COPYRELOC.
 ---
  gcc/config.in| 18 ---
  gcc/config/i386/i386.c   | 44 ++-

Re: [RFC][PATCH 3/3] Enable zero/sign extension elimination

On 17/01/15 13:11, Kugan wrote:
 
 Re-enable zero/sign extension elimination using value range that
 includes wrapped attribute.
 

Now that stage-1 is open, rebased it and regression tested on
x86-64-none-linux-gnu with no new regressions.

Is this OK for trunk?

Thanks,
Kugan

gcc/ChangeLog:

2015-04-22  Kugan Vivekanandarajah  kug...@linaro.org

* calls.c (precompute_arguments): Check
 promoted_for_signed_and_unsigned_p and set the promoted mode.
* expr.c (expand_expr_real_1): Likewise.
(promoted_for_signed_and_unsigned_p): New function.
* cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
* expr.h (promoted_for_signed_and_unsigned_p): New definition.
diff --git a/gcc/calls.c b/gcc/calls.c
index 3be7ca5..6b8d861 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1637,7 +1637,10 @@ precompute_arguments (int num_actuals, struct arg_data 
*args)
  args[i].initial_value
= gen_lowpart_SUBREG (mode, args[i].value);
  SUBREG_PROMOTED_VAR_P (args[i].initial_value) = 1;
- SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
+ if (promoted_for_signed_and_unsigned_p (args[i].tree_value))
+   SUBREG_PROMOTED_SET (args[i].initial_value, 
SRP_SIGNED_AND_UNSIGNED);
+ else
+   SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
}
}
 }
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index ca491a0..5fcee87 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3452,7 +3452,13 @@ expand_gimple_stmt_1 (gimple stmt)
  GET_MODE (target), temp, unsignedp);
  }
 
-   convert_move (SUBREG_REG (target), temp, unsignedp);
+   if ((SUBREG_PROMOTED_GET (target) == SRP_SIGNED_AND_UNSIGNED)
+(GET_CODE (temp) == SUBREG)
+(GET_MODE (target) == GET_MODE (temp))
+(GET_MODE (SUBREG_REG (target)) == GET_MODE (SUBREG_REG 
(temp
+ emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
+   else
+ convert_move (SUBREG_REG (target), temp, unsignedp);
  }
else if (nontemporal  emit_storent_insn (target, temp))
  ;
diff --git a/gcc/expr.c b/gcc/expr.c
index 530a944..224a50f 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -185,6 +185,39 @@ static rtx const_vector_from_tree (tree);
 static tree tree_expr_size (const_tree);
 static HOST_WIDE_INT int_expr_size (tree);
 
+/* Return TRUE if value in SSA is zero and sign extended for wider mode MODE
+   using value range information stored.  Return FALSE otherwise.
+
+   This is used to check if SUBREG is zero and sign extended and to set
+   promoted mode SRP_SIGNED_AND_UNSIGNED to SUBREG.  */
+
+bool
+promoted_for_signed_and_unsigned_p (tree ssa)
+{
+  wide_int min, max;
+  bool ovf;
+
+  if (ssa == NULL_TREE
+  || TREE_CODE (ssa) != SSA_NAME
+  || !INTEGRAL_TYPE_P (TREE_TYPE (ssa)))
+return false;
+
+  /* Return FALSE if value_range is not recorded for SSA.  */
+  if (get_range_info (ssa, min, max, ovf) != VR_RANGE)
+return false;
+
+  if (ovf)
+return false;
+
+  /* Return true (to set SRP_SIGNED_AND_UNSIGNED to SUBREG) if MSB of the
+ smaller mode is not set (i.e.  MSB of ssa is not set).  */
+  if (!wi::neg_p (min, SIGNED)  !wi::neg_p(max, SIGNED))
+return true;
+  else
+return false;
+
+}
+
 
 /* This is run to set up which modes can be used
directly in memory and to initialize the block move optab.  It is run
@@ -9674,7 +9707,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
tmode,
 
  temp = gen_lowpart_SUBREG (mode, decl_rtl);
  SUBREG_PROMOTED_VAR_P (temp) = 1;
- SUBREG_PROMOTED_SET (temp, unsignedp);
+ if (promoted_for_signed_and_unsigned_p (ssa_name))
+   SUBREG_PROMOTED_SET (temp, SRP_SIGNED_AND_UNSIGNED);
+ else
+   SUBREG_PROMOTED_SET (temp, unsignedp);
  return temp;
}
 
diff --git a/gcc/expr.h b/gcc/expr.h
index 867852e..f965fe0 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -243,6 +243,7 @@ extern rtx expand_expr_real_1 (tree, rtx, machine_mode,
   enum expand_modifier, rtx *, bool);
 extern rtx expand_expr_real_2 (sepops, rtx, machine_mode,
   enum expand_modifier);
+extern bool promoted_for_signed_and_unsigned_p (tree);
 
 /* Generate code for computing expression EXP.
An rtx for the computed value is returned.  The value is never null.

Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state

2015-04-22 Thread Peter Bergner

On Wed, 2015-04-22 at 17:16 -0500, Segher Boessenkool wrote:
 On Wed, Apr 22, 2015 at 08:43:10AM -0500, Peter Bergner wrote:
   Maybe you can fold tabortdc with tabortwc now?  Use one UNSPEC name
   for both, :GPR and wd?
  
  Wouldn't that change the tabortwc pattern to use DImode rather
  than SImode when compiled with -m64 or -m32 -mpowerpc64?
  I'm not sure we want that.
 
 The GPR mode iterator creates two patterns, one for SI and one for DI,
 and the tabortwc would be the one for SI if you use wd.

Ah, I think I know what you mean.  Sure, I can try that,





+ case HTM_BUILTIN_TTEST: /* Alias for: tabortwci. 0,r0,0  */
+   op[nopnds++] = GEN_INT (0);
+   op[nopnds++] = gen_rtx_REG (SImode, 0);
+   op[nopnds++] = GEN_INT (0);
   
   Is that really r0, isn't that (0|rA)?  [Too lazy to read the docs myself
   right now, sorry.]
  
  The ISA doc shows:
 
 [snip]
 
 Thanks for looking it up!
 
 I'm still a bit worried about putting a reg in the RTL (while the instruction
 doesn't actually use one), but perhaps it's harmless.

I'm not sure what you mean by the instruction doesn't use one.
The hardware instruction does use a register for its second
operand (even though its contents are ignored due to TO == 0)
and the pattern requires us to pass in a reg rtx, so I'm not
sure what you're referring to.





  This skips this test on -m32 -mpowerpc64, is that on purpose?
  
  Ummm, not exactly.  :-)   Not that many people test that though.
  I'll see if I can find a replacement for lp64 that covers that case.
 
 Maybe just { powerpc64 } works?

I'll take a look at that to see if that works, thanks.

Peter

Re: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc

On Wed, Apr 22, 2015 at 3:15 PM, Ramana Radhakrishnan
ramana@googlemail.com wrote:
 On Wed, Apr 22, 2015 at 5:34 PM, H.J. Lu hongjiu...@intel.com wrote:
 Normally, with PIE, GCC accesses globals that are extern to the module
 using GOT.  This is two instructions, one to get the address of the global
 from GOT and the other to get the value.  Examples:

 ---
 extern int a_glob;
 int
 main ()
 {
   return a_glob;
 }
 ---

 With PIE, the generated code accesses global via GOT using two memory
 loads:

 movqa_glob@GOTPCREL(%rip), %rax
 movl(%rax), %eax

 for 64-bit or

 movla_glob@GOT(%ecx), %eax
 movl(%eax), %eax

 for 32-bit.

 Some experiments on google and SPEC CPU benchmarks show that the extra
 instruction affects performance by 1% to 5%.

 Solution - Copy Relocations:

 When the linker supports copy relocations, GCC can always assume that
 the global will be defined in the executable.  For globals that are
 truly extern (come from shared objects), the linker will create copy
 relocations and have them defined in the executable.  Result is that
 no global access needs to go through GOT and hence improves performance.
 We can generate

 movla_glob(%rip), %eax

 for 64-bit and

 movla_glob@GOTOFF(%eax), %eax

 for 32-bit.  This optimization only applies to undefined non-weak
 non-TLS global data.  Undefined weak global or TLS data access still
 must go through GOT.

 This patch reverts legitimate_pic_address_disp_p change made in revision
 218397, which only applies to x86-64.  Instead, this patch updates
 targetm.binds_local_p to indicate if undefined non-weak non-TLS global
 data is defined locally in PIE.  It also introduces a new target hook,
 binds_tls_local_p to distinguish TLS variable from non-TLS variable.  By
 default, binds_tls_local_p is the same as binds_local_p.

 This patch checks if 32-bit and 64-bit linkers support PIE with copy
 reloc at configure time.  64-bit linker is enabled in binutils 2.25
 and 32-bit linker is enabled in binutils 2.26.  This optimization
 is enabled only if the linker support is available.

 Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without
 support for copy relocation in PIE.  OK for trunk?

 Thanks.


 Looking at this my first reaction was that surely most (if not all ? )
 targets that use ELF and had copy relocs would benefit from this ?
 Couldn't we find a simpler way for targets to have this support ? I
 don't have a more constructive suggestion to make at the minute but
 getting this to work just from the targetm.binds_local_p (decl)
 interface would probably be better ?

default_binds_local_p_3 is a global function which is used to
implement targetm.binds_local_p in x86 backend.  Any backend
can use it to optimize for copy relocation.

-- 
H.J.

Re: Ping^3 : [PATCH] [gcc, combine] PR46164: Don't combine the insns if a volatile register is contained.

2015-04-22 Thread Terry Guo

On Wed, Apr 22, 2015 at 10:30 AM, Segher Boessenkool
seg...@kernel.crashing.org wrote:
 On Wed, Apr 22, 2015 at 10:21:43AM +0800, Terry Guo wrote:
 gcc/ChangeLog:
 2015-04-22 Hale Wang hale.w...@arm.com
 Terry Guo  terry@arm.com

PR rtl-optimization/64818
* combine.c (can_combine_p): Don't combine user-specified register if
it is in an asm input.

 gcc/testsuite/ChangeLog:
 2015-04-22 Hale Wang hale.w...@arm.com
 Terry Guo  terry@arm.com

PR rtl-optimization/64818
* gcc.target/arm/pr64818.c: New.

 This is okay for trunk, if it has been bootstrapped and regression tested.

 Thanks,


 Segher

Thanks Segher. The patch is tested with bootstrap and regression test
for x86_64. No problem found. Committed as revision 222306.

BR,
Terry

Re: Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias)

On Tue, 21 Apr 2015, Thomas Schwinge wrote:

 Hi!
 
 On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries tom_devr...@mentor.com 
 wrote:
  On 24-11-14 11:56, Tom de Vries wrote:
   On 15-11-14 18:19, Tom de Vries wrote:
   On 15-11-14 13:14, Tom de Vries wrote:
   I'm submitting a patch series with initial support for the oacc kernels
   directive.
  
   The patch series uses pass_parallelize_loops to implement 
   parallelization of
   loops in the oacc kernels region.
  
   The patch series consists of these 8 patches:
   ...
1  Expand oacc kernels after pass_build_ealias
2  Add pass_oacc_kernels
3  Add pass_ch_oacc_kernels to pass_oacc_kernels
4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
5  Add pass_loop_im to pass_oacc_kernels
6  Add pass_ccp to pass_oacc_kernels
7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
8  Do simple omp lowering for no address taken var
   ...
  
   This patch moves omp expansion of the oacc kernels directive to after
   pass_build_ealias.
  
   The rationale is that in order to use pass_parallelize_loops for 
   analysis and
   transformation of an oacc kernels region, we postpone omp expansion of 
   that
   region until the earliest point in the pass list where enough 
   information is
   availabe to run pass_parallelize_loops, in other words, after 
   pass_build_ealias.
  
   The patch postpones expansion in expand_omp, and ensures expansion by 
   adding
   pass_expand_omp_ssa:
   - after pass_build_ealias, and
   - after pass_all_early_optimizations for the case we're not optimizing.
  
   In order to make sure the oacc kernels region arrives at 
   pass_expand_omp_ssa,
   the way it left expand_omp, the patch makes pass_ccp and pass_forwprop 
   aware of
   lowered omp code, to handle it conservatively.
  
   The patch contains changes in expand_omp_target to deal with ssa-code, 
   similar
   to what is already present in expand_omp_taskreg.
  
   Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to 
   not be
   static for oacc kernels. It does this to get some references to 
   .omp_data_sizes
   and .omp_data_kinds in the ssa code.  Without these references, the 
   definitions
   will be removed. The reference of the variables in GIMPLE_OACC_KERNELS 
   is not
   enough to have them not removed. [ In vries/oacc-kernels, I used a 
   BUILT_IN_USE
   kludge for this purpose ].
  
   Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in 
   the
   original function of which the definition has been removed (as in moved 
   to the
   split off function). TODO_remove_unused_locals takes care of some of 
   them, but
   not the anonymous ones. So the patch iterates over all SSA_NAMEs to find 
   these
   dangling SSA_NAMEs and releases them.
  
  
   Reposting with small update: I've replaced the use of the rather generic
   gimple_stmt_omp_lowering_p with the more specific 
   gimple_stmt_omp_data_i_init_p.
  
   Bootstrapped and reg-tested in the same way as before.
  
  
  I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.
  
  This allows fre to unify references to the same omp variable before 
  entering 
  pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.
  
  F.i. this reduction fragment:
  ...
 # VUSE .MEM_8
 # PT = { D.2282 }
 _67 = .omp_data_i_59-sumD.2270;
 # VUSE .MEM_8
 _68 = *_67;
  
 _70 = _66 + _68;
  
 # VUSE .MEM_8
 # PT = { D.2282 }
 _69 = .omp_data_i_59-sumD.2270;
 # .MEM_71 = VDEF .MEM_8
 *_69 = _70;
  ...
  
  is transformed by fre into:
  ...
 # VUSE .MEM_8
 # PT = { D.2282 }
 _67 = .omp_data_i_59-sumD.2270;
 # VUSE .MEM_8
 _68 = *_67;
  
 _70 = _66 + _68;
  
 # .MEM_71 = VDEF .MEM_8
 *_67 = _70;
  ...
  
  In order for pass_fre to respect the kernels region boundaries, I've added 
  a 
  change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init 
  conservatively.
  
  Bootstrapped and reg-tested as before.
  
  OK for trunk?
 
 Committed to gomp-4_0-branch in r79:
 
 commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
 Date:   Tue Apr 21 19:37:19 2015 +
 
 Expand oacc kernels after pass_fre
 
   gcc/
   * omp-low.c: Include gimple-pretty-print.h.
   (release_first_vuse_in_edge_dest): New function.
   (expand_omp_target): When not in ssa, don't split off oacc kernels
   region, clear PROP_gimple_eomp in cfun-curr_properties to force later
   expanssion, and add GOACC_kernels_internal call.
   When in ssa, split off oacc kernels and convert GOACC_kernels_internal
   into GOACC_kernels call.  Handle ssa-code.
   (pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
   properties_provided field.
   (pass_expand_omp::execute): Set PROP_gimple_eomp in
   cfun-curr_properties

Re: [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels

On Tue, 21 Apr 2015, Thomas Schwinge wrote:

 Hi!
 
 On Tue, 25 Nov 2014 12:38:55 +0100, Tom de Vries tom_devr...@mentor.com 
 wrote:
  On 15-11-14 18:22, Tom de Vries wrote:
   On 15-11-14 13:14, Tom de Vries wrote:
   I'm submitting a patch series with initial support for the oacc kernels
   directive.
  
   The patch series uses pass_parallelize_loops to implement 
   parallelization of
   loops in the oacc kernels region.
  
   The patch series consists of these 8 patches:
   ...
1  Expand oacc kernels after pass_build_ealias
2  Add pass_oacc_kernels
3  Add pass_ch_oacc_kernels to pass_oacc_kernels
4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
5  Add pass_loop_im to pass_oacc_kernels
6  Add pass_ccp to pass_oacc_kernels
7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
8  Do simple omp lowering for no address taken var
   ...
  
   This patch adds pass_loop_ccp to pass group pass_oacc_kernels.
  
   We need this pass to simplify the loop body, and allow pass_parloops to 
   detect
   that loop iterations are independent.
  
  
  As suggested here ( 
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02993.html ) 
  I've replaced the pass_ccp with pass_copyprop, which performs trivial 
  constant 
  propagation in addition to copy propagation.
  
  Bootstrapped and reg-tested as before.
  
  OK for trunk?

I've recently wondered why we do copy propagation after LIM and I don't
remember.  Can you remind me?  Can you add testcases that fail before
this kind of patches and pass afterwards?

Richard.

 Committed to gomp-4_0-branch in r84:
 
 commit 1c2529b64620811cbff4a50374af797ee52ef5f8
 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
 Date:   Tue Apr 21 19:58:54 2015 +
 
 Add pass_copy_prop in pass_oacc_kernels
 
   gcc/
   * passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
   * tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
   conservatively.
 
 git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@84 
 138bc75d-0d04-0410-961f-82ee72b054a4
 ---
  gcc/ChangeLog.gomp  |4 
  gcc/passes.def  |1 +
  gcc/tree-ssa-copy.c |4 
  3 files changed, 9 insertions(+)
 
 diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
 index 98e33ad..0be9191 100644
 --- gcc/ChangeLog.gomp
 +++ gcc/ChangeLog.gomp
 @@ -1,5 +1,9 @@
  2015-04-21  Tom de Vries  t...@codesourcery.com
  
 + * passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
 + * tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
 + conservatively.
 +
   * passes.def: Add pass_lim in pass group pass_ch_oacc_kernels.
  
   * passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
 diff --git gcc/passes.def gcc/passes.def
 index e6c9287..e6f1c33 100644
 --- gcc/passes.def
 +++ gcc/passes.def
 @@ -93,6 +93,7 @@ along with GCC; see the file COPYING3.  If not see
 NEXT_PASS (pass_ch_oacc_kernels);
 NEXT_PASS (pass_tree_loop_init);
 NEXT_PASS (pass_lim);
 +   NEXT_PASS (pass_copy_prop);
 NEXT_PASS (pass_expand_omp_ssa);
 NEXT_PASS (pass_tree_loop_done);
 POP_INSERT_PASSES ()
 diff --git gcc/tree-ssa-copy.c gcc/tree-ssa-copy.c
 index 5ae8e6c..6f35f99 100644
 --- gcc/tree-ssa-copy.c
 +++ gcc/tree-ssa-copy.c
 @@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
  #include tree-scalar-evolution.h
  #include tree-ssa-dom.h
  #include tree-ssa-loop-niter.h
 +#include omp-low.h
  
  
  /* This file implements the copy propagation pass and provides a
 @@ -116,6 +117,9 @@ stmt_may_generate_copy (gimple stmt)
if (gimple_has_volatile_ops (stmt))
  return false;
  
 +  if (gimple_stmt_omp_data_i_init_p (stmt))
 +return false;
 +
/* Statements with loads and/or stores will never generate a useful copy.  
 */
if (gimple_vuse (stmt))
  return false;
 
 
 Grüße,
  Thomas
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

Re: [PATCH] [1/2] [ARM] [libgcc] Support RTABI half-precision conversion functions.

2015-04-22 Thread Ramana Radhakrishnan

On Mon, Apr 13, 2015 at 12:25 PM, Joseph Myers jos...@codesourcery.com wrote:
 On Mon, 13 Apr 2015, Hale Wang wrote:

 Yes, you are right. It's my fault to add the only here. Thank you to point
 out this.
 Beside this, is this patch OK for you?

 I don't think it's a good idea for libgcc to include large pieces of
 assembly code generated by a compiler.  Just compile the code with
 whatever options are needed at the time libgcc is built - possibly with
 #if conditionals to allow compiling different versions of the code.
 Indeed, are any special options needed at all?


I agree and I don't think it's maintainable in the long run. From my
reading of this thread I can't see any special options being needed.
Can we just massage it in C ?


regards
Ramana


 --
 Joseph S. Myers
 jos...@codesourcery.com

Re: [PATCH] emit_bss(): Remove redundant guard

On 8 April 2015 at 20:19, Bernhard Reutner-Fischer
rep.dot@gmail.com wrote:
 gcc/ChangeLog:

OKed by Jeff and installed to trunk as r222311.
thanks,

 2015-04-01  Bernhard Reutner-Fischer  al...@gcc.gnu.org

 * varasm.c (emit_bss): Remove redundant guard.

 Signed-off-by: Bernhard Reutner-Fischer rep.dot@gmail.com
 ---
  gcc/varasm.c | 2 --
  1 file changed, 2 deletions(-)

 diff --git a/gcc/varasm.c b/gcc/varasm.c
 index 537a64d..2bb5f27 100644
 --- a/gcc/varasm.c
 +++ b/gcc/varasm.c
 @@ -1951,21 +1951,19 @@ emit_local (tree decl,

  #if defined ASM_OUTPUT_ALIGNED_BSS
  static bool
  emit_bss (tree decl ATTRIBUTE_UNUSED,
   const char *name ATTRIBUTE_UNUSED,
   unsigned HOST_WIDE_INT size ATTRIBUTE_UNUSED,
   unsigned HOST_WIDE_INT rounded ATTRIBUTE_UNUSED)
  {
 -#if defined ASM_OUTPUT_ALIGNED_BSS
ASM_OUTPUT_ALIGNED_BSS (asm_out_file, decl, name, size,
   get_variable_align (decl));
return true;
 -#endif
  }
  #endif

  /* A noswitch_section_callback for comm_section.  */

  static bool
  emit_common (tree decl ATTRIBUTE_UNUSED,
  const char *name ATTRIBUTE_UNUSED,
 --
 2.1.4

Re: [PATCH] PR target/55144: bfin: fix opening glibc-c.o: No such file or directory

On 8 April 2015 at 20:37, Bernhard Reutner-Fischer
rep.dot@gmail.com wrote:
 building all-gcc for bfin-linux-uclibc results in

 build/genchecksum cp/cp-lang.o c-family/stub-objc.o ... glibc-c.o \
 libbackend.a ..  cc1plus-checksum.c.tmp
 opening glibc-c.o: No such file or directory
 make[2]: *** [cc1-checksum.c] Error 1

 Fix this by prepending tmake_file which nowadays consists of t-slibgcc
 t-linux t-glibc. Remove the already listed tmake_file entries.

 Fixes all-gcc config-list.mk build for bfin-linux-uclibc.
 Ok for trunk?

This was OKed by Jeff and committed to trunk as r222313.
thanks,

 gcc/ChangeLog

 PR target/55144
 * config.gcc (bfin*-linux-uclibc*): Prepend tmake_file and
 remove already contained t-files.

 Signed-off-by: Bernhard Reutner-Fischer rep.dot@gmail.com
 Cc: Bernd Schmidt ber...@codesourcery.com
 Cc: Jie Zhang jzhang...@gmail.com
 Signed-off-by: Bernhard Reutner-Fischer rep.dot@gmail.com
 ---
  gcc/config.gcc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/gcc/config.gcc b/gcc/config.gcc
 index cb08a5c..ddbd57b 100644
 --- a/gcc/config.gcc
 +++ b/gcc/config.gcc
 @@ -1118,7 +1118,7 @@ bfin*-uclinux*)
 ;;
  bfin*-linux-uclibc*)
 tm_file=${tm_file} dbxelf.h elfos.h bfin/elf.h gnu-user.h linux.h 
 glibc-stdint.h bfin/linux.h ./linux-sysroot-suffix.h
 -   tmake_file=bfin/t-bfin-linux t-slibgcc t-linux
 +   tmake_file=${tmake_file} bfin/t-bfin-linux
 use_collect2=no
 ;;
  bfin*-rtems*)
 --
 2.1.4

RE: [PATCH] [1/2] [ARM] [libgcc] Support RTABI half-precision conversion functions.

2015-04-22 Thread Hale Wang

 -Original Message-
 From: Ramana Radhakrishnan [mailto:ramana@googlemail.com]
 Sent: Wednesday, April 22, 2015 3:50 PM
 To: Joseph Myers
 Cc: Hale Wang; GCC Patches
 Subject: Re: [PATCH] [1/2] [ARM] [libgcc] Support RTABI half-precision
 conversion functions.

 On Mon, Apr 13, 2015 at 12:25 PM, Joseph Myers
 jos...@codesourcery.com wrote:
  On Mon, 13 Apr 2015, Hale Wang wrote:

  Yes, you are right. It's my fault to add the only here. Thank you
  to point out this.
  Beside this, is this patch OK for you?

  I don't think it's a good idea for libgcc to include large pieces of
  assembly code generated by a compiler.  Just compile the code with
  whatever options are needed at the time libgcc is built - possibly
  with #if conditionals to allow compiling different versions of the code.

Indeed, just compile the code with option '-mfloat-abi=soft' at the time libgcc 
is build which can solve this problem.

  Indeed, are any special options needed at all?

The reason is that the current GNU versions of the fp16 conversions are more 
efficient than the AEABI versions in this patch(and also more efficient than 
the code compiled with option '-mfloat-abi=soft', because no fp registers will 
be used to implement these functions which is allowed in the GNU versions). We 
provide an option so that the users can choose the version as they want(whether 
they want to follow the AEABI constraint or not).

 I agree and I don't think it's maintainable in the long run. From my reading 
 of
 this thread I can't see any special options being needed.
 Can we just massage it in C ?

The reason is that the implementations of these helper functions are allowed to 
corrupt the integer core registers permitted to be corrupted by the [AAPCS] 
(r0-r3, ip, lr, and CPSR). To guarantee this if we just massage it in C, as 
Joseph suggested, we can compile the code with whatever options are needed at 
the time libgcc is built. Possibly the option '-mfloat-abi=soft ' can help us 
to guarantee this (seems more strict than the AEABI constraint).

The special option is provided so that the users can choose the version as they 
want(whether they want to follow the AEABI constraint or not). Because the 
current GNU versions of the fp16 conversions are more efficient than the AEABI 
versions in this patch.

Best Regards,
Hale

 regards
 Ramana

  --
  Joseph S. Myers
  jos...@codesourcery.com

Re: [PATCH] Optionally sanitize globals in user-defined sections

2015-04-22 Thread Yury Gribov


On 04/19/2015 06:11 PM, Jakub Jelinek wrote:

On Sun, Apr 19, 2015 at 10:54:57AM +0300, Yury Gribov wrote:

On 04/17/2015 08:29 PM, Andi Kleen wrote:

Yury Gribov y.gri...@samsung.com writes:

+
+static bool
+section_sanitized_p (const char *sec)
+{
+  if (!sanitized_sections)
+return false;
+  size_t len = strlen (sec);
+  const char *p = sanitized_sections;
+  while ((p = strstr (p, sec)))
+{
+  if ((p == sanitized_sections || p[-1] == ',')
+  (p[len] == 0 || p[len] == ','))
+   return true;


No wildcard support? That may be a long option in some cases.


Right. Do you think * will be enough or we also need ? and [a-f] syntax?


libiberty contains and gcc build utilities already use fnmatch, so you
should just use that (with carefully chosen FNM_* options).


Hi all,

Here is an new patch which adds support for wildcards in 
-fsanitize-file:///home/ygribov/user-section-2.diff
sections.  This also adds a test which I forgot to svn-add last time 
(shame on me).


Bootstrapped and regtested on x64.  Ok to commit?

-Y

Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels

On Tue, 21 Apr 2015, Thomas Schwinge wrote:

 Hi!
 
 On Tue, 25 Nov 2014 12:29:28 +0100, Tom de Vries tom_devr...@mentor.com 
 wrote:
  On 15-11-14 18:21, Tom de Vries wrote:
   On 15-11-14 13:14, Tom de Vries wrote:
   I'm submitting a patch series with initial support for the oacc kernels
   directive.
  
   The patch series uses pass_parallelize_loops to implement 
   parallelization of
   loops in the oacc kernels region.
  
   The patch series consists of these 8 patches:
   ...
1  Expand oacc kernels after pass_build_ealias
2  Add pass_oacc_kernels
3  Add pass_ch_oacc_kernels to pass_oacc_kernels
4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
5  Add pass_loop_im to pass_oacc_kernels
6  Add pass_ccp to pass_oacc_kernels
7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
8  Do simple omp lowering for no address taken var
   ...
  
   This patch adds pass_tree_loop_init and pass_tree_loop_init_done to
   pass_oacc_kernels.
  
   Pass_parallelize_loops is run between these passes in the pass group
   pass_tree_loop, since it requires loop information.  We do the same for
   pass_oacc_kernels.
  
  
  Updated for moving pass_oacc_kernels down past pass_fre in the pass list.
  
  Bootstrapped and reg-tested as before.
  
  OK for trunk?

Both passes should be basically no-ops.  Why not call 
loop_optimizer_init/finalize from expand_omp_ssa instead?

 Committed to gomp-4_0-branch in r82:
 
 commit cb95b4a1efcdb96c58cda986d53b20c3537c1ab7
 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
 Date:   Tue Apr 21 19:51:33 2015 +
 
 Add pass_tree_loop_{init,done} to pass_oacc_kernels
 
   gcc/
   * passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
   group pass_oacc_kernels.
   * tree-ssa-loop.c (pass_tree_loop_init::clone)
   (pass_tree_loop_done::clone): New function.
 
 git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@82 
 138bc75d-0d04-0410-961f-82ee72b054a4
 ---
  gcc/ChangeLog.gomp  |5 +
  gcc/passes.def  |2 ++
  gcc/tree-ssa-loop.c |2 ++
  3 files changed, 9 insertions(+)
 
 diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
 index d00c5e0..1fb060f 100644
 --- gcc/ChangeLog.gomp
 +++ gcc/ChangeLog.gomp
 @@ -1,5 +1,10 @@
  2015-04-21  Tom de Vries  t...@codesourcery.com
  
 + * passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
 + group pass_oacc_kernels.
 + * tree-ssa-loop.c (pass_tree_loop_init::clone)
 + (pass_tree_loop_done::clone): New function.
 +
   * omp-low.c (loop_in_oacc_kernels_region_p): New function.
   * omp-low.h (loop_in_oacc_kernels_region_p): Declare.
   * passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
 diff --git gcc/passes.def gcc/passes.def
 index 5cdbc87..83ae04e 100644
 --- gcc/passes.def
 +++ gcc/passes.def
 @@ -91,7 +91,9 @@ along with GCC; see the file COPYING3.  If not see
 NEXT_PASS (pass_oacc_kernels);
 PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 NEXT_PASS (pass_ch_oacc_kernels);
 +   NEXT_PASS (pass_tree_loop_init);
 NEXT_PASS (pass_expand_omp_ssa);
 +   NEXT_PASS (pass_tree_loop_done);
 POP_INSERT_PASSES ()
 NEXT_PASS (pass_merge_phi);
 NEXT_PASS (pass_cd_dce);
 diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
 index a041858..2a96a39 100644
 --- gcc/tree-ssa-loop.c
 +++ gcc/tree-ssa-loop.c
 @@ -272,6 +272,7 @@ public:
  
/* opt_pass methods: */
virtual unsigned int execute (function *);
 +  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
  
  }; // class pass_tree_loop_init
  
 @@ -566,6 +567,7 @@ public:
  
/* opt_pass methods: */
virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
 +  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
  
  }; // class pass_tree_loop_done
  
 
 
 Grüße,
  Thomas
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

[PATCH][PR65823] Fix va_arg ap_copy nop detection

2015-04-22 Thread Tom de Vries


Hi,

this patch fixes PR65823.

The problem is a verify_gimple ICE during compilation of 
gcc.c-torture/execute/stdarg-2.c for arm at -O0/-O1:

...
In function 'f3':
src/gcc/testsuite/gcc.c-torture/execute/stdarg-2.c:61:1:
error: incorrect sharing of tree nodes
aps[4]
# .MEM_5 = VDEF .MEM_11
aps[4] = aps[4];
...


Before gimplification, f3 looks like this in the original dump:
...
  struct va_list aps[10];

struct va_list aps[10];
  __builtin_va_start ((struct  ) (struct  *) aps[4], i);
  x = VA_ARG_EXPR aps[4];
  __builtin_va_end ((struct  ) (struct  *) aps[4]);
...

After gimplification, it looks like:
...
f3 (int i)
{
  long intD.5 x.0D.4231;
  struct va_listD.4222 apsD.4227[10];

  try
{
  # USE = anything
  # CLB = anything
  __builtin_va_startD.1052 (apsD.4227[4], 0);
  # USE = anything
  # CLB = anything
  x.0D.4231 = VA_ARG (apsD.4227[4], 0B);
  apsD.4227[4] = apsD.4227[4];
  xD.4223 = x.0D.4231;
  # USE = anything
  # CLB = anything
  __builtin_va_endD.1051 (apsD.4227[4]);
}
  finally
{
  apsD.4227 = {CLOBBER};
}
}
...

The nop 'apsD.4227[4] = apsD.4227[4]' introduced during gimplification is not 
meant to be there.



There is already a test 'TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0))' in 
gimplify_modify_expr to prevent this nop:

...
  /* When gimplifying the ap argument of va_arg, we might end up with
   ap.1 = ap
   va_arg (ap.1, 0B)
 We need to assign ap.1 back to ap, otherwise va_arg has no effect on
 ap.  */
  if (ap != NULL_TREE
   TREE_CODE (ap) == ADDR_EXPR
   TREE_CODE (ap_copy) == ADDR_EXPR
   TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0))
gimplify_assign (TREE_OPERAND (ap, 0), TREE_OPERAND (ap_copy, 0), pre_p);
...

But the test is a pointer equality test, and it fails in this case.

The patches fixes the problem by using operand_equal_p to do the equality test.


Bootstrapped and reg-tested on x86_64.

Did minimal non-bootstrap build on arm and reg-tested.

OK for trunk?

Thanks,
- Tom
Fix va_arg ap_copy nop detection

2015-04-22  Tom de Vries  t...@codesourcery.com

	PR tree-optimization/65823
	* gimplify.c (gimplify_modify_expr): Use operand_equal_p to test for
	equality between ap_copy and ap.
---
 gcc/gimplify.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 0a8ef84..c68bd47 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -4792,7 +4792,7 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   if (ap != NULL_TREE
TREE_CODE (ap) == ADDR_EXPR
TREE_CODE (ap_copy) == ADDR_EXPR
-   TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0))
+   !operand_equal_p (TREE_OPERAND (ap, 0), TREE_OPERAND (ap_copy, 0), 0))
 gimplify_assign (TREE_OPERAND (ap, 0), TREE_OPERAND (ap_copy, 0), pre_p);
 
   if (want_value)
-- 
1.9.1

Re: [PATCH][PR65823] Fix va_arg ap_copy nop detection

On Wed, Apr 22, 2015 at 9:41 AM, Tom de Vries tom_devr...@mentor.com wrote:
 Hi,

 this patch fixes PR65823.

 The problem is a verify_gimple ICE during compilation of
 gcc.c-torture/execute/stdarg-2.c for arm at -O0/-O1:
 ...
 In function 'f3':
 src/gcc/testsuite/gcc.c-torture/execute/stdarg-2.c:61:1:
 error: incorrect sharing of tree nodes
 aps[4]
 # .MEM_5 = VDEF .MEM_11
 aps[4] = aps[4];
 ...


 Before gimplification, f3 looks like this in the original dump:
 ...
   struct va_list aps[10];

 struct va_list aps[10];
   __builtin_va_start ((struct  ) (struct  *) aps[4], i);
   x = VA_ARG_EXPR aps[4];
   __builtin_va_end ((struct  ) (struct  *) aps[4]);
 ...

 After gimplification, it looks like:
 ...
 f3 (int i)
 {
   long intD.5 x.0D.4231;
   struct va_listD.4222 apsD.4227[10];

   try
 {
   # USE = anything
   # CLB = anything
   __builtin_va_startD.1052 (apsD.4227[4], 0);
   # USE = anything
   # CLB = anything
   x.0D.4231 = VA_ARG (apsD.4227[4], 0B);
   apsD.4227[4] = apsD.4227[4];
   xD.4223 = x.0D.4231;
   # USE = anything
   # CLB = anything
   __builtin_va_endD.1051 (apsD.4227[4]);
 }
   finally
 {
   apsD.4227 = {CLOBBER};
 }
 }
 ...

 The nop 'apsD.4227[4] = apsD.4227[4]' introduced during gimplification is
 not meant to be there.


 There is already a test 'TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0))'
 in gimplify_modify_expr to prevent this nop:
 ...
   /* When gimplifying the ap argument of va_arg, we might end up with
ap.1 = ap
va_arg (ap.1, 0B)
  We need to assign ap.1 back to ap, otherwise va_arg has no effect on
  ap.  */
   if (ap != NULL_TREE
TREE_CODE (ap) == ADDR_EXPR
TREE_CODE (ap_copy) == ADDR_EXPR
TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0))
 gimplify_assign (TREE_OPERAND (ap, 0), TREE_OPERAND (ap_copy, 0),
 pre_p);
 ...

 But the test is a pointer equality test, and it fails in this case.

 The patches fixes the problem by using operand_equal_p to do the equality
 test.


 Bootstrapped and reg-tested on x86_64.

 Did minimal non-bootstrap build on arm and reg-tested.

 OK for trunk?

Hmm, ok for now.  But I wonder if we can't fix things to not require that
odd extra copy.  In fact that we introduce ap.1 looks completely bogus to me
(and we don't in this case for arm).  Note that the pointer compare obviously
fails because we unshare the expression.

So ... what breaks if we simply remove this odd fixup?

Thanks,
Richard.

 Thanks,
 - Tom

Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels