[PATCH][ARM] Do not lower cost of setting core reg to constant. It doesn't have any effect
Hi all, This hunk that slightly reduces the cost of immediate moves doesn't actually have any effect. In the whole of SPEC2006 it didn't make a difference. In any case, I'd like to move to a point where we use COSTS_N_INSNS units for our costs and not increment decrement them by one. This patch removes that bit of logic and makes it slightly cleaner to look at. As far as I know its logic has never been confirmed in practice. Bootstrapped and tested on arm. Ok for trunk? Thanks, Kyrill 2015-04-22 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.c (arm_new_rtx_costs): Do not lower cost immediate moves. commit e225669ff70f09520007b7898b170fb8fa75281f Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Wed Apr 8 10:18:23 2015 +0100 [ARM] Do not lower cost of setting core reg to constant. It doesn't have any effect diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 0ef05c9..03988ac 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9725,11 +9725,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, and we would otherwise be unable to work out the true cost. */ *cost = rtx_cost (SET_DEST (x), SET, 0, speed_p); outer_code = SET; - /* Slightly lower the cost of setting a core reg to a constant. - This helps break up chains and allows for better scheduling. */ - if (REG_P (SET_DEST (x)) - REGNO (SET_DEST (x)) = LR_REGNUM) - *cost -= 1; + x = SET_SRC (x); /* Immediate moves with an immediate in the range [0, 255] can be encoded in 16 bits in Thumb mode. */
[PATCH] PR target/65846: Optimize data access in PIE with copy reloc
Normally, with PIE, GCC accesses globals that are extern to the module using GOT. This is two instructions, one to get the address of the global from GOT and the other to get the value. Examples: --- extern int a_glob; int main () { return a_glob; } --- With PIE, the generated code accesses global via GOT using two memory loads: movqa_glob@GOTPCREL(%rip), %rax movl(%rax), %eax for 64-bit or movla_glob@GOT(%ecx), %eax movl(%eax), %eax for 32-bit. Some experiments on google and SPEC CPU benchmarks show that the extra instruction affects performance by 1% to 5%. Solution - Copy Relocations: When the linker supports copy relocations, GCC can always assume that the global will be defined in the executable. For globals that are truly extern (come from shared objects), the linker will create copy relocations and have them defined in the executable. Result is that no global access needs to go through GOT and hence improves performance. We can generate movla_glob(%rip), %eax for 64-bit and movla_glob@GOTOFF(%eax), %eax for 32-bit. This optimization only applies to undefined non-weak non-TLS global data. Undefined weak global or TLS data access still must go through GOT. This patch reverts legitimate_pic_address_disp_p change made in revision 218397, which only applies to x86-64. Instead, this patch updates targetm.binds_local_p to indicate if undefined non-weak non-TLS global data is defined locally in PIE. It also introduces a new target hook, binds_tls_local_p to distinguish TLS variable from non-TLS variable. By default, binds_tls_local_p is the same as binds_local_p. This patch checks if 32-bit and 64-bit linkers support PIE with copy reloc at configure time. 64-bit linker is enabled in binutils 2.25 and 32-bit linker is enabled in binutils 2.26. This optimization is enabled only if the linker support is available. Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without support for copy relocation in PIE. OK for trunk? Thanks. H.J. --- gcc/ PR target/65846 * configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ... (HAVE_LD_64BIT_PIE_COPYRELOC): This. (HAVE_LD_32BIT_PIE_COPYRELOC): New. Defined to 1 if Linux/ia32 linker supports PIE with copy reloc. * output.h (default_binds_tls_local_p): New. (default_binds_local_p_3): Add 2 bool arguments. * target.def (binds_tls_local_p): New target hook. * varasm.c (decl_default_tls_model): Replace targetm.binds_local_p with targetm.binds_tls_local_p. (default_binds_local_p_3): Add a bool argument to indicate TLS variable and a bool argument to indicate if an undefined non-TLS non-weak data is local. Double check TLS variable. If an undefined non-TLS non-weak data is local, treat it as defined locally. (default_binds_local_p): Pass false and false to default_binds_local_p_3. (default_binds_local_p_2): Likewise. (default_binds_local_p_1): Likewise. (default_binds_tls_local_p): New. * config.in: Regenerated. * configure: Likewise. * doc/tm.texi: Likewise. * config/i386/i386.c (legitimate_pic_address_disp_p): Don't check HAVE_LD_PIE_COPYRELOC here. (ix86_binds_local): New. (ix86_binds_tls_local_p): Likewise. (ix86_binds_local_p): Use it. (TARGET_BINDS_TLS_LOCAL_P): New. * doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook. gcc/testsuite/ PR target/65846 * gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32. * gcc.target/i386/pie-copyrelocs-2.c: Likewise. * gcc.target/i386/pie-copyrelocs-3.c: Likewise. * gcc.target/i386/pie-copyrelocs-4.c: Likewise. * gcc.target/i386/pr32219-9.c: Likewise. * gcc.target/i386/pr32219-10.c: New file. * lib/target-supports.exp (check_effective_target_pie_copyreloc): Check HAVE_LD_64BIT_PIE_COPYRELOC and HAVE_LD_32BIT_PIE_COPYRELOC instead of HAVE_LD_64BIT_PIE_COPYRELOC. --- gcc/config.in| 18 --- gcc/config/i386/i386.c | 44 ++- gcc/configure| 68 +--- gcc/configure.ac | 64 +++--- gcc/doc/tm.texi | 10 gcc/doc/tm.texi.in | 2 + gcc/output.h | 4 +- gcc/target.def | 14 + gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c | 4 +- gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c | 4 +- gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c | 2 +- gcc/testsuite/gcc.target/i386/pie-copyrelocs-4.c | 4 +- gcc/testsuite/gcc.target/i386/pr32219-10.c | 16 ++
[PATCH 0/14][ARM/AArch64] __FP16 support, vectors, intrinsics, testsuite
This patch series adds support for ARM Neon float16x4_t and float16x8_t vector types and intrinsics, and the __fp16 type, on both ARM and AArch64, and extends the tests in Christophe Lyon's advsimd-intrinsics testsuite to cover these. (I chose to extend the existing tests rather than add new ones, as the majority of f16 intrinsics are just moving blocks of 16-bits around and do not depend on HW support; I added new files for the conversion intrinsics.) The ARM parts were previously posted at https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01434.html but have had some fixes following the testsuite additions. Also The ARM patches depend upon my ARM lane-checking improvements at https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which I have just pinged. I've cross-tested baremetal arm-none-eabi, aarch64-none-elf and aarch64_be-none-elf most patches individually, and bootstrapped each patch in series on (the relevant one of) arm-none-linux-gnueabihf and aarch64-none-linux-gnu. OK for trunk? Cheers, Alan
Re: trunk test result inconsistencies
On Wed, Apr 22, 2015 at 05:40:07AM -0700, H.J. Lu wrote: On Wed, Apr 22, 2015 at 5:19 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote: Is anyone else seeing comparison problems on trunk? I was having problems testing a patch on a 4/16 extraction, so last night I checked out a fresh trunk, built it, ran make check... then removed the build directory, re-built it from scratch again. make check.. and get a bunch of different results. I even used test_summary instead of my own home-grown scripts. That is a buggy kernel, you must have missed a warning not to upgrade your kernel. See https://bugzilla.kernel.org/show_bug.cgi?id=96311 https://lkml.org/lkml/2015/4/13/600 Dunno what progress has been on that patch since then, don't see it in latest Linus' git though. AFAIK at least 3.19 and 4.0 kernels are affected. 3.19.5-100/3.19.5-200 from Fedora 20/21 fix the bug. I admit I have not tried, but am certainly not seeing Peter's fix in https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.19.5 has it been fixed differently? Not seeing anything in the Fedora kernel package %changelog either... Jakub
Re: [PATCH 3/5] libcc1: set debug compile: Display GCC driver filename
On 04/21/2015 03:41 PM, Jan Kratochvil wrote: Hi, as discussed in How to use compile execute function in GDB https://sourceware.org/ml/gdb/2015-04/msg00026.html GDB currently searches for /usr/bin/ARCH-OS-gcc and chooses one but it does not display which one. It cannot, GCC method set_arguments() does not yet know whether 'set debug compile' is enabled or not. Unfortunately this changes libcc1 API in an incompatible way. There is a possibility of a hack to keep the API the same - one could pass -v option explicitly to set_arguments(), set_arguments() could compare the -v string and print the GCC filename accordingly. Then the 'verbose' parameter of compile() would lose its meaning. What do you think? I think we're early enough in the evolution of libcc1 that changing the ABI shouldn't be a big deal. I'd expect gcc gdb to need to move in lock-step for this stuff for a while. GDB counterpart: [PATCH 3/4] compile: set debug compile: Display GCC driver filename https://sourceware.org/ml/gdb-patches/2015-04/msg00807.html Message-ID: 20150421213649.14147.79719.st...@host1.jankratochvil.net Jan include/ChangeLog 2015-04-21 Jan Kratochvil jan.kratoch...@redhat.com * gcc-interface.h (enum gcc_base_api_version): Add comment to GCC_FE_VERSION_1. (struct gcc_base_vtable): Move parameter verbose from compile to set_arguments. libcc1/ChangeLog 2015-04-21 Jan Kratochvil jan.kratoch...@redhat.com * libcc1.cc: Include intl.h. (struct libcc1): Add field verbose. (libcc1::libcc1): Initialize it. (libcc1_set_arguments): Add parameter verbose, implement it. (libcc1_compile): Remove parameter verbose, use self's field instead. OK. Please install on the trunk. jeff
Re: trunk test result inconsistencies
On Wed, Apr 22, 2015 at 6:10 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Apr 22, 2015 at 05:40:07AM -0700, H.J. Lu wrote: On Wed, Apr 22, 2015 at 5:19 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote: Is anyone else seeing comparison problems on trunk? I was having problems testing a patch on a 4/16 extraction, so last night I checked out a fresh trunk, built it, ran make check... then removed the build directory, re-built it from scratch again. make check.. and get a bunch of different results. I even used test_summary instead of my own home-grown scripts. That is a buggy kernel, you must have missed a warning not to upgrade your kernel. See https://bugzilla.kernel.org/show_bug.cgi?id=96311 https://lkml.org/lkml/2015/4/13/600 Dunno what progress has been on that patch since then, don't see it in latest Linus' git though. AFAIK at least 3.19 and 4.0 kernels are affected. 3.19.5-100/3.19.5-200 from Fedora 20/21 fix the bug. I admit I have not tried, but am certainly not seeing Peter's fix in https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.19.5 has it been fixed differently? Not seeing anything in the Fedora kernel package %changelog either... * Wed Apr 15 2015 Josh Boyer jwbo...@fedoraproject.org - Add patch to fix tty closure race (rhbz 1208953) https://bugzilla.redhat.com/show_bug.cgi?id=1208953 -- H.J.
Re: [PATCH][PR65823] Fix va_arg ap_copy nop detection
On 22-04-15 10:06, Richard Biener wrote: On Wed, Apr 22, 2015 at 9:41 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, this patch fixes PR65823. SNIP The patches fixes the problem by using operand_equal_p to do the equality test. Bootstrapped and reg-tested on x86_64. Did minimal non-bootstrap build on arm and reg-tested. OK for trunk? Hmm, ok for now. Committed. But I wonder if we can't fix things to not require that odd extra copy. Agreed, that would be good. In fact that we introduce ap.1 looks completely bogus to me AFAICT, it's introduced by gimplify_arg ('argp') because argp (a PARM_DECL) is not addressable. (and we don't in this case for arm). Note that the pointer compare obviously fails because we unshare the expression. So ... what breaks if we simply remove this odd fixup? [ Originally mentioned at https://gcc.gnu.org/ml/gcc/2015-02/msg00011.html . ] I've committed gcc.target/x86_64/abi/callabi/vaarg-6.c specifically as a minimal version of this problem. If we remove the ap_copy fixup, at original we have: ... ;; Function do_cpy2 (null) { char * e; char * e; e = VA_ARG_EXPR argp; e = VA_ARG_EXPR argp; if (e != b) { abort (); } } ... and after gimplify we have: ... do_cpy2 (char * argp) { char * argp.1; char * argp.2; char * b.3; char * e; argp.1 = argp; e = VA_ARG (argp.1, 0B); argp.2 = argp; e = VA_ARG (argp.2, 0B); b.3 = b; if (e != b.3) goto D.1373; else goto D.1374; D.1373: abort (); D.1374: } ... The second VA_ARG uses argp.2, which is a copy of argp, which is unmodified by the first VA_ARG. Using attached _proof-of-concept_ patch, I get callabi.exp working without the ap_copy, still at the cost of one 'argp.1 = argp' copy though: ... do_cpy2 (char * argp) { char * argp.1; char * b.2; char * e; argp.1 = argp; e = VA_ARG (argp.1, 0B); e = VA_ARG (argp.1, 0B); b.2 = b; if (e != b.2) goto D.1372; else goto D.1373; D.1372: abort (); D.1373: } ... But perhaps there's an easier way? Thanks, - Tom Add copy for va_list parameter --- gcc/function.c | 29 + gcc/gimplify.c | 16 2 files changed, 29 insertions(+), 16 deletions(-) diff --git a/gcc/function.c b/gcc/function.c index 7d4df92..2ebfec4 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -3855,6 +3855,24 @@ gimplify_parm_type (tree *tp, int *walk_subtrees, void *data) return NULL; } +static inline bool +is_va_list_type (tree type) +{ + tree id = TYPE_IDENTIFIER (type); + if (id == NULL_TREE) +return false; + const char *s = IDENTIFIER_POINTER (id); + if (s == NULL) +return false; + if (strcmp (s, va_list) == 0) +return true; + if (strcmp (s, __builtin_sysv_va_list) == 0) +return true; + if (strcmp (s, __builtin_ms_va_list) == 0) +return true; + return false; +} + /* Gimplify the parameter list for current_function_decl. This involves evaluating SAVE_EXPRs of variable sized parameters and generating code to implement callee-copies reference parameters. Returns a sequence of @@ -3953,6 +3971,17 @@ gimplify_parameters (void) DECL_HAS_VALUE_EXPR_P (parm) = 1; } } + else if (is_va_list_type (TREE_TYPE (parm))) + { + tree cp = create_tmp_reg (data.nominal_type, get_name (parm)); + DECL_IGNORED_P (cp) = 0; + TREE_ADDRESSABLE (cp) = 1; + tree t = build2 (MODIFY_EXPR, TREE_TYPE (cp), cp, parm); + gimplify_and_add (t, stmts); + + SET_DECL_VALUE_EXPR (parm, cp); + DECL_HAS_VALUE_EXPR_P (parm) = 1; + } } fnargs.release (); diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 5f1dd1a..c922dc7 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -4569,7 +4569,6 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, gimple assign; location_t loc = EXPR_LOCATION (*expr_p); gimple_stmt_iterator gsi; - tree ap = NULL_TREE, ap_copy = NULL_TREE; gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR || TREE_CODE (*expr_p) == INIT_EXPR); @@ -4730,16 +4729,12 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, enum internal_fn ifn = CALL_EXPR_IFN (*from_p); auto_vectree vargs (nargs); - if (ifn == IFN_VA_ARG) - ap = unshare_expr (CALL_EXPR_ARG (*from_p, 0)); for (i = 0; i nargs; i++) { gimplify_arg (CALL_EXPR_ARG (*from_p, i), pre_p, EXPR_LOCATION (*from_p)); vargs.quick_push (CALL_EXPR_ARG (*from_p, i)); } - if (ifn == IFN_VA_ARG) - ap_copy = CALL_EXPR_ARG (*from_p, 0); call_stmt = gimple_build_call_internal_vec (ifn, vargs); gimple_set_location (call_stmt, EXPR_LOCATION (*expr_p)); } @@ -4784,17 +4779,6 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, gsi = gsi_last (*pre_p); maybe_fold_stmt (gsi); - /* When gimplifying the ap argument of va_arg, we might end up with - ap.1 = ap - va_arg (ap.1, 0B) - We need to
Re: [PATCH 12/13] libstdc++, libgfortran gthr workaround for musl
On 04/20/2015 12:59 PM, Szabolcs Nagy wrote: libgcc/gthr-posix.h uses weak reference logic to determine if libpthread is linked into the application or not. This is broken unless there is special workaround with libc internal knowledge and even then static linking needs further manual link time workaround, so this was disabled for os/generic in libstdc++v3 and for musl in libgfortran. The change minimizes the impact on other setups, but I think the weak ref logic should be disabled by default, it is never entirely correct. Conforming code can crash on a glibc setup too: $ cat a.cpp #include pthread.h void(*f)(void) = (void(*)(void))pthread_key_create; int main(){} $ g++ -static a.cpp -lpthread $ ./a.out Segmentation fault I reported this previously at https://gcc.gnu.org/ml/gcc/2014-11/msg00246.html libgfortran/Changelog: 2015-04-16 Szabolcs Nagy szabolcs.n...@arm.com * acinclude.m4 (GTHREAD_USE_WEAK): Define as 0 for *-*-musl*. * configure: Regenerate. libstdc++v3/Changelog: 2015-04-16 Szabolcs Nagy szabolcs.n...@arm.com * config/os/generic/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK): Define. * configure.host (os_include_dir): Set to os/generic for linux-musl*. OK. Please install on the trunk. jeff
[PATCH][AArch64] Properly cost FABD pattern
Hi all, In rtx costs we do not handle the FP abs (minus (a b)) case which maps down to a FABD instruction. This patch fixes that. FABD behaves similarly to the FADD class of instructions unlike simple FABS which is closer to FNEG. Tested aarch64-none-elf. Ok for trunk? Thanks, Kyrill 2015-04-22 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle pattern for fabd in ABS case. aarch64-costs-fabd.patch Description: Binary data
Re: [PATCH 10/13] fixincludes
On 20/04/15 21:38, Jeff Law wrote: On 04/20/2015 12:58 PM, Szabolcs Nagy wrote: No fixincludes are needed for musl. fixincludes/Changelog: 2015-04-16 Gregor Richards gregor.richa...@uwaterloo.ca * mkfixinc.sh: Add *-musl* with no fixes. OK. jeff I've committed this on Szabolcs' behalf with r222327. Kyrill
Re: trunk test result inconsistencies
On 04/22/2015 06:28 AM, Andrew MacLeod wrote: On 04/22/2015 08:19 AM, Jakub Jelinek wrote: On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote: Is anyone else seeing comparison problems on trunk? I was having problems testing a patch on a 4/16 extraction, so last night I checked out a fresh trunk, built it, ran make check... then removed the build directory, re-built it from scratch again. make check.. and get a bunch of different results. I even used test_summary instead of my own home-grown scripts. That is a buggy kernel, you must have missed a warning not to upgrade your kernel. See https://bugzilla.kernel.org/show_bug.cgi?id=96311 https://lkml.org/lkml/2015/4/13/600 Dunno what progress has been on that patch since then, don't see it in latest Linus' git though. AFAIK at least 3.19 and 4.0 kernels are affected. A nice. Thanks. I did not see the warning, no :-) That pretty seriously buggy! It may also have been poor timing on installing the OS and initial upgrading of all the packages shell.devel.redhat.com/~law has a 4.0 kernel with the tty fix if you don't want to try and downgrade. jeff
Re: [PATCH 5/5] libcc1: 'set debug compile': Display absolute GCC driver filename
On 04/21/2015 03:41 PM, Jan Kratochvil wrote: Hi, with the patches so far after (gdb) set debug compile 1 one would get: searching for compiler matching regex ^(x86_64|i.86)(-[^-]*)?-linux(-gnu)?-gcc$ found compiler x86_64-unknown-linux-gnu-gcc But I believe it is more readable to see: searching for compiler matching regex ^(x86_64|i.86)(-[^-]*)?-linux(-gnu)?-gcc$ found compiler /usr/bin/x86_64-unknown-linux-gnu-gcc I do not think the change will have functionality impact, although the filename gets used even for executing the command. Jan libcc1/ChangeLog 2015-04-21 Jan Kratochvil jan.kratoch...@redhat.com * findcomp.cc: Include system.h. (search_dir): Return absolute filename. OK. Please install on the trunk. Thanks, Jeff
Re: [PATCH 2/13] musl libc config
On 04/20/2015 12:52 PM, Szabolcs Nagy wrote: Add musl libc support to gcc and the command line option -mmusl following other libc support code. Note that -mlibc cannot be entirely correct: there are build time decisions based on the default libc. gcc/Changelog: 2015-04-16 Gregor Richards gregor.richa...@uwaterloo.ca * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * gcc/configure: Regenerate. OK for the trunk. Please install. jeff
Re: trunk test result inconsistencies
On 04/22/2015 08:33 AM, Jeff Law wrote: On 04/22/2015 06:28 AM, Andrew MacLeod wrote: On 04/22/2015 08:19 AM, Jakub Jelinek wrote: On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote: Is anyone else seeing comparison problems on trunk? I was having problems testing a patch on a 4/16 extraction, so last night I checked out a fresh trunk, built it, ran make check... then removed the build directory, re-built it from scratch again. make check.. and get a bunch of different results. I even used test_summary instead of my own home-grown scripts. That is a buggy kernel, you must have missed a warning not to upgrade your kernel. See https://bugzilla.kernel.org/show_bug.cgi?id=96311 https://lkml.org/lkml/2015/4/13/600 Dunno what progress has been on that patch since then, don't see it in latest Linus' git though. AFAIK at least 3.19 and 4.0 kernels are affected. A nice. Thanks. I did not see the warning, no :-) That pretty seriously buggy! It may also have been poor timing on installing the OS and initial upgrading of all the packages shell.devel.redhat.com/~law has a 4.0 kernel with the tty fix if you don't want to try and downgrade. Thanks, but the original 3.17.blah is still available at boot time, so I just switched to that.Should be good enough until the fix shows up. Andrew
Re: [PATCH 1/5] libcc1: Make libcc1.so-libcc1.so.0
On 04/21/2015 03:41 PM, Jan Kratochvil wrote: Hi, the next [patch 3/5] will change the libcc1.so API. I am not sure if the API change gets approved that way but for such case: (1) We really need to change GCC_FE_VERSION_0 - GCC_FE_VERSION_1, this feature is there for this purpose. That is [patch 2/5]. (2) Currently GDB does only dlopen(libcc1.so) and then depending on which libcc1.so version it would find first it would succeed/fail. I guess it is more convenient to do dlopen(libcc1.so.1) instead (where .1=.x corresponds to GCC_FE_VERSION_x). That is this patch (with x=0). GCC_C_FE_LIBCC is used only by GDB. (3) Currently there is no backward or forward compatibility although there could be one implemented. Personally I think the 'compile' feature is still in experimental stage so that it is OK to require last releases. At least in Fedora we can keep GDB-GCC in sync. GDB counterpart: [PATCH 1/4] compile: Use libcc1.so-libcc1.so.0 https://sourceware.org/ml/gdb-patches/2015-04/msg00805.html Message-ID: 20150421213635.14147.15653.st...@host1.jankratochvil.net Jan include/ChangeLog 2015-04-21 Jan Kratochvil jan.kratoch...@redhat.com * gcc-c-interface.h (GCC_C_FE_LIBCC): Quote it. Append GCC_FE_VERSION_0. OK. Please install on the trunk. jeff
Re: [PATCH 11/13] unwind fix for musl
On 04/20/2015 12:58 PM, Szabolcs Nagy wrote: dl_iterate_phdr depends on USE_PT_GNU_EH_FRAME. I think USE_PT_GNU_EH_FRAME could be enabled more generally (whenever libc provides dl_iterate_phdr), but I only made a conservative change. libgcc/Changelog: 2015-04-16 Gregor Richards gregor.richa...@uwaterloo.ca Szabolcs Nagy szabolcs.n...@arm.com * unwind-dw2-fde-dip.c (USE_PT_GNU_EH_FRAME): Define it on Linux if target provides dl_iterate_phdr. OK. Please install on the trunk. At this point I think everything but the target files have been approved, right? jeff
Re: [PATCH][PR65823] Fix va_arg ap_copy nop detection
On Wed, Apr 22, 2015 at 3:38 PM, Tom de Vries tom_devr...@mentor.com wrote: On 22-04-15 10:06, Richard Biener wrote: On Wed, Apr 22, 2015 at 9:41 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, this patch fixes PR65823. SNIP The patches fixes the problem by using operand_equal_p to do the equality test. Bootstrapped and reg-tested on x86_64. Did minimal non-bootstrap build on arm and reg-tested. OK for trunk? Hmm, ok for now. Committed. But I wonder if we can't fix things to not require that odd extra copy. Agreed, that would be good. In fact that we introduce ap.1 looks completely bogus to me AFAICT, it's introduced by gimplify_arg ('argp') because argp (a PARM_DECL) is not addressable. (and we don't in this case for arm). Note that the pointer compare obviously fails because we unshare the expression. So ... what breaks if we simply remove this odd fixup? [ Originally mentioned at https://gcc.gnu.org/ml/gcc/2015-02/msg00011.html . ] I've committed gcc.target/x86_64/abi/callabi/vaarg-6.c specifically as a minimal version of this problem. If we remove the ap_copy fixup, at original we have: ... ;; Function do_cpy2 (null) { char * e; char * e; e = VA_ARG_EXPR argp; e = VA_ARG_EXPR argp; if (e != b) { abort (); } } ... and after gimplify we have: ... do_cpy2 (char * argp) { char * argp.1; char * argp.2; char * b.3; char * e; argp.1 = argp; e = VA_ARG (argp.1, 0B); argp.2 = argp; e = VA_ARG (argp.2, 0B); b.3 = b; if (e != b.3) goto D.1373; else goto D.1374; D.1373: abort (); D.1374: } ... The second VA_ARG uses argp.2, which is a copy of argp, which is unmodified by the first VA_ARG. Using attached _proof-of-concept_ patch, I get callabi.exp working without the ap_copy, still at the cost of one 'argp.1 = argp' copy though: ... do_cpy2 (char * argp) { char * argp.1; char * b.2; char * e; argp.1 = argp; e = VA_ARG (argp.1, 0B); e = VA_ARG (argp.1, 0B); b.2 = b; if (e != b.2) goto D.1372; else goto D.1373; D.1372: abort (); D.1373: } ... But perhaps there's an easier way? Hum, simply Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 222320) +++ gcc/gimplify.c (working copy) @@ -9419,6 +9419,7 @@ gimplify_va_arg_expr (tree *expr_p, gimp } /* Transform a VA_ARG_EXPR into an VA_ARG internal function. */ + mark_addressable (valist); ap = build_fold_addr_expr_loc (loc, valist); tag = build_int_cst (build_pointer_type (type), 0); *expr_p = build_call_expr_internal_loc (loc, IFN_VA_ARG, type, 2, ap, tag); pre-approved with removing the kludge. Thanks, Richard. Thanks, - Tom
[PATCH] Quiet down -Wlogical-op a bit (PR c/61534)
This patch stifles -Wlogical-op a bit: don't warn if either operand comes from a macro expansion. As the comment says, it doesn't fix the bug completely, but it's a simple improvement. I did this by introducing a new macro. Bootstrapped/regtested on x86_64-linux, ok for trunk? (Bootstrap with -Wlogical-op enabled does not pass yet.) 2015-04-22 Marek Polacek pola...@redhat.com PR c/61534 * input.h (from_macro_expansion_at): Define. * c-common.c (warn_logical_operator): Bail if either operand comes from a macro expansion. * c-c++-common/pr61534-1.c: New test. diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c index 7fe7fa6..b09bbb8 100644 --- gcc/c-family/c-common.c +++ gcc/c-family/c-common.c @@ -1697,6 +1697,13 @@ warn_logical_operator (location_t location, enum tree_code code, tree type, code != TRUTH_OR_EXPR) return; + /* We don't want to warn if either operand comes from a macro + expansion. ??? This doesn't work with e.g. NEGATE_EXPR yet; + see PR61534. */ + if (from_macro_expansion_at (EXPR_LOCATION (op_left)) + || from_macro_expansion_at (EXPR_LOCATION (op_right))) +return; + /* Warn if /|| are being used in a context where it is likely that the bitwise equivalent was intended by the programmer. That is, an expression such as op MASK diff --git gcc/input.h gcc/input.h index 7a0483f..93eb6ed 100644 --- gcc/input.h +++ gcc/input.h @@ -70,6 +70,10 @@ extern location_t input_location; header, but expanded in a non-system file. */ #define in_system_header_at(LOC) \ (linemap_location_in_system_header_p (line_table, LOC)) +/* Return a positive value if LOCATION is the locus of a token that + comes from a macro expansion, O otherwise. */ +#define from_macro_expansion_at(LOC) \ + ((linemap_location_from_macro_expansion_p (line_table, LOC))) void dump_line_table_statistics (void); diff --git gcc/testsuite/c-c++-common/pr61534-1.c gcc/testsuite/c-c++-common/pr61534-1.c index e69de29..1e304f0 100644 --- gcc/testsuite/c-c++-common/pr61534-1.c +++ gcc/testsuite/c-c++-common/pr61534-1.c @@ -0,0 +1,13 @@ +/* PR c/61534 */ +/* { dg-options -Wlogical-op } */ + +extern int xxx; +#define XXX !xxx +int +test (void) +{ + if (XXX xxx) /* { dg-bogus logical } */ +return 4; + else +return 0; +} Marek
Re: trunk test result inconsistencies
On Wed, Apr 22, 2015 at 5:19 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Apr 22, 2015 at 08:04:03AM -0400, Andrew MacLeod wrote: Is anyone else seeing comparison problems on trunk? I was having problems testing a patch on a 4/16 extraction, so last night I checked out a fresh trunk, built it, ran make check... then removed the build directory, re-built it from scratch again. make check.. and get a bunch of different results. I even used test_summary instead of my own home-grown scripts. That is a buggy kernel, you must have missed a warning not to upgrade your kernel. See https://bugzilla.kernel.org/show_bug.cgi?id=96311 https://lkml.org/lkml/2015/4/13/600 Dunno what progress has been on that patch since then, don't see it in latest Linus' git though. AFAIK at least 3.19 and 4.0 kernels are affected. 3.19.5-100/3.19.5-200 from Fedora 20/21 fix the bug. -- H.J.
Re: [PATCH 2/5] libcc1: Use libcc1.so.0-libcc1.so.1
On 04/21/2015 03:41 PM, Jan Kratochvil wrote: Hi, see [patch 1/5], particularly: (3) Currently there is no backward or forward compatibility although there could be one implemented. Personally I think the 'compile' feature is still in experimental stage so that it is OK to require last releases. At least in Fedora we can keep GDB-GCC in sync. GDB counterpart: [PATCH 2/4] compile: Use libcc1.so.0-libcc1.so.1 https://sourceware.org/ml/gdb-patches/2015-04/msg00806.html Message-ID: 20150421213642.14147.93210.st...@host1.jankratochvil.net Jan include/ChangeLog 2015-04-21 Jan Kratochvil jan.kratoch...@redhat.com * gcc-c-interface.h (GCC_C_FE_LIBCC): Update it to GCC_FE_VERSION_1. * gcc-interface.h (enum gcc_base_api_version): Add GCC_FE_VERSION_1. libcc1/ChangeLog 2015-04-21 Jan Kratochvil jan.kratoch...@redhat.com * Makefile.am (libcc1_la_LDFLAGS): Add version-info 1. * Makefile.in: Regenerate. * libcc1.cc (vtable, gcc_c_fe_context): Update it to GCC_FE_VERSION_1. OK. Please install on the trunk. jeff
[PATCH] [AArch32] Additional bics patterns.
Hi, This patch adds arm rtl patterns to generate bics instructions with shift. Done full regression run on arm-none-eabi. Is patch ok? gcc/config 2015-04-22 Alex Velenko alex.vele...@arm.com * arm/arm.md (andsi_not_shiftsi_si_scc): New pattern. * (andsi_not_shiftsi_si_scc_no_reuse): New pattern. gcc/testsuite 2015-04-22 Alex Velenko alex.vele...@arm.com * gcc.target/arm/bics_1.c : New testcase. * gcc.target/arm/bics_2.c : New testcase. * gcc.target/arm/bics_3.c : New testcase. * gcc.target/arm/bics_4.c : New testcase. --- gcc/config/arm/arm.md | 42 ++ gcc/testsuite/gcc.target/arm/bics_1.c | 54 + gcc/testsuite/gcc.target/arm/bics_2.c | 57 +++ gcc/testsuite/gcc.target/arm/bics_3.c | 41 + gcc/testsuite/gcc.target/arm/bics_4.c | 49 ++ 5 files changed, 243 insertions(+) create mode 100644 gcc/testsuite/gcc.target/arm/bics_1.c create mode 100644 gcc/testsuite/gcc.target/arm/bics_2.c create mode 100644 gcc/testsuite/gcc.target/arm/bics_3.c create mode 100644 gcc/testsuite/gcc.target/arm/bics_4.c diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 164ac13..51a149e 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -2768,6 +2768,48 @@ (const_string logic_shift_reg)))] ) +(define_insn andsi_not_shiftsi_si_scc_no_reuse + [(set (reg:CC_NOOV CC_REGNUM) + (compare:CC_NOOV + (and:SI (not:SI (match_operator:SI 0 shift_operator + [(match_operand:SI 1 s_register_operand r) +(match_operand:SI 2 arm_rhs_operand rM)])) + (match_operand:SI 3 s_register_operand r)) + (const_int 0))) + (clobber (match_scratch:SI 4 =r))] + TARGET_32BIT + bic%.%?\\t%4, %3, %1%S0 + [(set_attr predicable yes) + (set_attr conds set) + (set_attr shift 1) + (set (attr type) (if_then_else (match_operand 2 const_int_operand ) + (const_string logic_shift_imm) + (const_string logic_shift_reg)))] +) + +(define_insn andsi_not_shiftsi_si_scc + [(parallel [(set (reg:CC_NOOV CC_REGNUM) + (compare:CC_NOOV + (and:SI (not:SI (match_operator:SI 0 shift_operator + [(match_operand:SI 1 s_register_operand r) +(match_operand:SI 2 arm_rhs_operand rM)])) + (match_operand:SI 3 s_register_operand r)) + (const_int 0))) + (set (match_operand:SI 4 s_register_operand =r) +(and:SI (not:SI (match_op_dup 0 +[(match_dup 1) + (match_dup 2)])) +(match_dup 3)))])] + TARGET_32BIT + bic%.%?\\t%4, %3, %1%S0 + [(set_attr predicable yes) + (set_attr conds set) + (set_attr shift 1) + (set (attr type) (if_then_else (match_operand 2 const_int_operand ) + (const_string logic_shift_imm) + (const_string logic_shift_reg)))] +) + (define_insn *andsi_notsi_si_compare0 [(set (reg:CC_NOOV CC_REGNUM) (compare:CC_NOOV diff --git a/gcc/testsuite/gcc.target/arm/bics_1.c b/gcc/testsuite/gcc.target/arm/bics_1.c new file mode 100644 index 000..173eb89 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/bics_1.c @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ +/* { dg-require-effective-target arm32 } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-times bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler-times bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl \#3 1 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + +x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) + abort (); + + return 0; +} +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/bics_2.c b/gcc/testsuite/gcc.target/arm/bics_2.c new file mode 100644 index 000..740d7c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/bics_2.c @@ -0,0 +1,57 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ +/* { dg-require-effective-target arm32 } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-not bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+ } } */ + /* { dg-final { scan-assembler-times bic\tr\[0-9\]+,
RE: [PATCH] [AArch32] Additional bics patterns.
Hi Alex, On 22/04/15 14:12, Alex Velenko wrote: Hi, This patch adds arm rtl patterns to generate bics instructions with shift. Done full regression run on arm-none-eabi. A bootstrap on arm-none-linux-gnueabihf would be nice too. Is patch ok? gcc/config 2015-04-22 Alex Velenko alex.vele...@arm.com * arm/arm.md (andsi_not_shiftsi_si_scc): New pattern. * (andsi_not_shiftsi_si_scc_no_reuse): New pattern. the path to arm.md should be: * config/arm/arm.md +(define_insn andsi_not_shiftsi_si_scc_no_reuse + [(set (reg:CC_NOOV CC_REGNUM) + (compare:CC_NOOV + (and:SI (not:SI (match_operator:SI 0 shift_operator + [(match_operand:SI 1 s_register_operand r) + (match_operand:SI 2 arm_rhs_operand rM)])) + (match_operand:SI 3 s_register_operand r)) + (const_int 0))) + (clobber (match_scratch:SI 4 =r))] + TARGET_32BIT + bic%.%?\\t%4, %3, %1%S0 + [(set_attr predicable yes) + (set_attr conds set) + (set_attr shift 1) + (set (attr type) (if_then_else (match_operand 2 const_int_operand ) + (const_string logic_shift_imm) + (const_string logic_shift_reg)))] +) Since this is a predicable instruction and has a 32-bit encoding you should also set the 'predicable_short_it' attribute to 'no' to prevent GCC from trying to put it inside an IT block when compiling for ARMv8-A. + +(define_insn andsi_not_shiftsi_si_scc + [(parallel [(set (reg:CC_NOOV CC_REGNUM) + (compare:CC_NOOV + (and:SI (not:SI (match_operator:SI 0 shift_operator + [(match_operand:SI 1 s_register_operand r) + (match_operand:SI 2 arm_rhs_operand rM)])) + (match_operand:SI 3 s_register_operand r)) + (const_int 0))) + (set (match_operand:SI 4 s_register_operand =r) + (and:SI (not:SI (match_op_dup 0 + [(match_dup 1) + (match_dup 2)])) + (match_dup 3)))])] + TARGET_32BIT + bic%.%?\\t%4, %3, %1%S0 + [(set_attr predicable yes) + (set_attr conds set) + (set_attr shift 1) + (set (attr type) (if_then_else (match_operand 2 const_int_operand ) + (const_string logic_shift_imm) + (const_string logic_shift_reg)))] same comment about predicable_short_it. Cheers, Kyrill +) + (define_insn *andsi_notsi_si_compare0 [(set (reg:CC_NOOV CC_REGNUM) (compare:CC_NOOV
Re: [PATCH 4/5] libcc1: Add 'set compile-gcc'
On 04/21/2015 03:41 PM, Jan Kratochvil wrote: as discussed in How to use compile execute function in GDB https://sourceware.org/ml/gdb/2015-04/msg00026.html GDB currently searches for /usr/bin/ARCH-OS-gcc and chooses one but one cannot override which one. GDB would provide new option 'set compile-gcc'. This patch does not change the libcc1 API as it overloads the triplet_regexp parameter of GCC's set_arguments according to: + if (access (triplet_regexp, X_OK) == 0) GDB counterpart: [PATCH 4/4] compile: Add 'set compile-gcc' https://sourceware.org/ml/gdb-patches/2015-04/msg00808.html Message-ID: 20150421213657.14147.60506.st...@host1.jankratochvil.net Jan include/ChangeLog 2015-04-21 Jan Kratochvil jan.kratoch...@redhat.com * gcc-interface.h (enum gcc_base_api_version): Add comment to GCC_FE_VERSION_1. (struct gcc_base_vtable): Describe triplet_regexp parameter overload for set_arguments. libcc1/ChangeLog 2015-04-21 Jan Kratochvil jan.kratoch...@redhat.com * libcc1.cc (libcc1_set_arguments): Implement filenames for triplet_regexp. OK. Please install on the trunk. jeff
Re: trunk test result inconsistencies
On Wed, Apr 22, 2015 at 06:16:24AM -0700, H.J. Lu wrote: I admit I have not tried, but am certainly not seeing Peter's fix in https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.19.5 has it been fixed differently? Not seeing anything in the Fedora kernel package %changelog either... * Wed Apr 15 2015 Josh Boyer jwbo...@fedoraproject.org - Add patch to fix tty closure race (rhbz 1208953) https://bugzilla.redhat.com/show_bug.cgi?id=1208953 Ah, nice, thanks. Now to get it fixed upstream too. Jakub
Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state
On Tue, 2015-04-21 at 21:17 -0500, Segher Boessenkool wrote: On Tue, Apr 21, 2015 at 03:56:18PM -0500, Peter Bergner wrote: This patch also fixes some issues I hit with the tabortdc[i] and htm_m[ft]spr_mode patterns when used with -m32 -mpowerpc64. Running the testsuite, or did you actually try to _use_ -m32 -mpowerpc64? :-) Not with the testsuite. I had some simple unit tests that basically just returned the CR/SPR and hit some ICEs. Maybe you can fold tabortdc with tabortwc now? Use one UNSPEC name for both, :GPR and wd? Wouldn't that change the tabortwc pattern to use DImode rather than SImode when compiled with -m64 or -m32 -mpowerpc64? I'm not sure we want that. + case HTM_BUILTIN_TTEST: /* Alias for: tabortwci. 0,r0,0 */ + op[nopnds++] = GEN_INT (0); + op[nopnds++] = gen_rtx_REG (SImode, 0); + op[nopnds++] = GEN_INT (0); Is that really r0, isn't that (0|rA)? [Too lazy to read the docs myself right now, sorry.] The ISA doc shows: tabortwci. TO,RA,SI a - EXTS((RA)32:63) abort - 0 CR0 - 0 || MSR(TS) || 0 if a EXTS(SI) TO0 then abort - 1 if a EXTS(SI) TO1 then abort - 1 if a = EXTS(SI) T02 then abort - 1 if a u EXTS(SI) TO3 then abort - 1 if a u EXTS(SI) TO4 then abort - 1 ... Given that I'm passing in a zero TO value, the second and third operands are don't care values, so I'm just using r0 and 0 as random input values. I'm only interested in extracting the MSR's Transaction Status (TS) bits and placing them into CR0. + emit_insn (gen_movcc (subreg, cr)); + emit_insn (gen_lshrsi3 (scratch2, scratch1, GEN_INT (28))); + emit_insn (gen_andsi3 (target, scratch2, GEN_INT (0xf))); + } + } Don't we have helper functions/expanders to do these moves? Yuck. Heh, I looked. The only helper pattern was the movcc pattern, but that placed the CR into bits 32-35 of the register. I needed the shift to move it down into the low nibble and I use the and, since one of the move cr insns places two copies of the CR value into bits 32-35 and 36-39. -/* { dg-final { scan-assembler-times tabortdc\\. 1 } } */ -/* { dg-final { scan-assembler-times tabortdci\\. 1 } } */ +/* { dg-final { scan-assembler-times tabortdc\\. 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times tabortdci\\. 1 { target lp64 } } } */ This skips this test on -m32 -mpowerpc64, is that on purpose? Ummm, not exactly. :-) Not that many people test that though. I'll see if I can find a replacement for lp64 that covers that case. If not, I'm not too torn up if we skip it for -m32 -mpowerpc64. Peter
Re: [PATCH 1/13] libitm fixes for musl support
On 20/04/15 21:41, Jeff Law wrote: On 04/20/2015 12:51 PM, Szabolcs Nagy wrote: This are minor correctness fixes required for musl. (fcntl.h is the standard header and always available on Linux, sys/fcntl.h is just a legacy alias, so use the standard one.) libitm/Changelog: 2015-04-16 Gregor Richards gregor.richa...@uwaterloo.ca * config/arm/hwcap.cc: Use fcntl.h instead of sys/fcntl.h. * config/linux/x86/tls.h: Only use __GLIBC_PREREQ if defined. OK. jeff I've committed this on Szabolcs' behalf with r222325. Kyrill
Re: [PATCH 11/13] unwind fix for musl
On 22/04/15 14:20, Jeff Law wrote: On 04/20/2015 12:58 PM, Szabolcs Nagy wrote: dl_iterate_phdr depends on USE_PT_GNU_EH_FRAME. I think USE_PT_GNU_EH_FRAME could be enabled more generally (whenever libc provides dl_iterate_phdr), but I only made a conservative change. libgcc/Changelog: 2015-04-16 Gregor Richards gregor.richa...@uwaterloo.ca Szabolcs Nagy szabolcs.n...@arm.com * unwind-dw2-fde-dip.c (USE_PT_GNU_EH_FRAME): Define it on Linux if target provides dl_iterate_phdr. OK. Please install on the trunk. I've committed this on Szabolcs' behalf with r222328. Kyrill At this point I think everything but the target files have been approved, right? jeff
Re: [PATCH 2/13] musl libc config
On 22/04/15 14:16, Jeff Law wrote: On 04/20/2015 12:52 PM, Szabolcs Nagy wrote: Add musl libc support to gcc and the command line option -mmusl following other libc support code. Note that -mlibc cannot be entirely correct: there are build time decisions based on the default libc. gcc/Changelog: 2015-04-16 Gregor Richards gregor.richa...@uwaterloo.ca * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * gcc/configure: Regenerate. OK for the trunk. Please install. I've committed this on Szabolcs' behalf with r222326 with slightly adjusted ChangeLog paths: 2015-04-22 Gregor Richards gregor.richa...@uwaterloo.ca * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * configure: Regenerate. Kyrill jeff
Re: [WIP] OpenMP 4 NVPTX support
On 04/21/2015 05:58 PM, Jakub Jelinek wrote: suggests that while it is nice that when building nvptx accel compiler we build libgcc.a, libc.a, libm.a, libgfortran.a (and in the future hopefully libgomp.a), nothing attempts to link those in :(. I have that fixed; I expect I'll get around to posting this at some point now that stage1 is open. Bernd
[PATCH] Fix up tm_clone_hasher
handle_cache_entry in tm_clone_hasher looks wrong: the condition if (e != HTAB_EMPTY_ENTRY || e != HTAB_DELETED_ENTRY) is always true. While it could be fixed by just changing || into , I decided to follow suit and do what we do in handle_cache_entry's elsewhere in the codebase. I've fixed a formatting issue below while at it. Bootstrapped/regtested on x86_64-linux, ok for trunk? I think this should also go into 5.1. 2015-04-22 Marek Polacek pola...@redhat.com * varasm.c (handle_cache_entry): Fix logic. diff --git gcc/varasm.c gcc/varasm.c index 1597de1..3fc0316 100644 --- gcc/varasm.c +++ gcc/varasm.c @@ -5779,21 +5779,20 @@ struct tm_clone_hasher : ggc_cache_hashertree_map * static hashval_t hash (tree_map *m) { return tree_map_hash (m); } static bool equal (tree_map *a, tree_map *b) { return tree_map_eq (a, b); } - static void handle_cache_entry (tree_map *e) + static void + handle_cache_entry (tree_map *e) { -if (e != HTAB_EMPTY_ENTRY || e != HTAB_DELETED_ENTRY) - { - extern void gt_ggc_mx (tree_map *); - if (ggc_marked_p (e-base.from)) - gt_ggc_mx (e); - else - e = static_casttree_map * (HTAB_DELETED_ENTRY); - } +extern void gt_ggc_mx (tree_map *); +if (e == HTAB_EMPTY_ENTRY || e == HTAB_DELETED_ENTRY) + return; +else if (ggc_marked_p (e-base.from)) + gt_ggc_mx (e); +else + e = static_casttree_map * (HTAB_DELETED_ENTRY); } }; -static GTY((cache)) - hash_tabletm_clone_hasher *tm_clone_hash; +static GTY((cache)) hash_tabletm_clone_hasher *tm_clone_hash; void record_tm_clone_pair (tree o, tree n) Marek
Re: [PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only /proc/cpuinfo
On 22/04/15 12:46, Kyrill Tkachov wrote: [Sorry for resending twice. My mail client glitched] On 20/04/15 16:47, Kyrill Tkachov wrote: Hi all, This is an attempt to add native CPU detection to AArch64 GNU/Linux targets. Similar to other ports we use SPEC rewriting to rewrite -m{cpu,tune,arch}=native options into the appropriate CPU/architecture and the architecture extension options when appropriate (i.e. +crypto/+crc etc). For CPU/architecture detection it gets a bit involved, especially when running on a big.LITTLE system. My proposed approach is to look at /proc/cpuinfo/ and search for the implementer id and part number fields that uniquely identify each core (appropriate identifying information is added to aarch64-cores.def). If we find two types of core we have a big.LITTLE system, so search through the core definitions extracted from aarch64-cores.def to find if we support such a combination (currently only cortex-a57.cortex-a53 and cortex-a72.cortex-a53) and make sure that the implementer id field matches up. I tested this on a 4xCortex-A53 + 2xCortex-A57 big.LITTLE Ubuntu GNU/Linux system. There are two formats for /proc/cpuinfo/ that I'm aware of. The first (old) one has the format: -- processor: 0 processor: 1 processor: 2 processor: 3 processor: 4 processor: 5 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 CPU implementer: 0x41 CPU architecture: AArch64 CPU variant: 0x0 CPU part: 0xd03 -- In this format it lists the 6 cores but the CPU part it reports is only the one for the core from which /proc/cpuinfo was read from (!), in this case one of the Cortex-A53 cores. This means we detect a different CPU depending on which core GCC was invoked on. Not ideal really, but there's no more information that we can extract. Given the /proc/cpuinfo above, this patch will rewrite -mcpu=native into -mcpu=cortex-a53+fp+simd+crypto+crc The newer /proc/cpuinfo format proposed at https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=44b82b7700d05a52cd983799d3ecde1a976b3bed looks like this: -- processor : 0 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd03 CPU revision: 0 processor : 1 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd03 CPU revision: 0 processor : 2 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd03 CPU revision: 0 processor : 3 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd03 CPU revision: 0 processor : 4 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd07 CPU revision: 0 processor : 5 Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd07 CPU revision: 0 -- The Features field is used to detect the architectural features that we map to GCC option extensions i.e. +fp,+crypto,+simd,+crc etc. Similarly, -march=native would be rewritten into -march=armv8-a+fp+simd+crypto+crc while -mtune=native into -march=cortex-a57.cortex-a53 (the arch extension options are not valid for -mtune). If it detects more than one implementer ID or the implementer IDs not matching up somewhere or some other weirdness /proc/cpuinfo or fails to recognise the CPU it will bail out and ignore the option entirely (similarly to other ports). The patch works fine with both /proc/cpuinfo formats although, as mentioned above, it will not be able to detect the big.LITTLE combination from the first format. I've filled in the implementer ID and part numbers for the Cortex-A57, Cortex-A53, Cortex-A72, X-Gene 1 cores, but I don't have that info for thunderx or exynosm1. Could someone from Cavium and Samsung help me out here? At present this patch has some false dummy values that I'd like to fill out before committing this. Thanks Andrew and Evandro for the info. I've added the numbers to the patch, so it should work on those systems. I'm attaching the final patch here for review. And resending here with a minor whitespace change in aarch64-cores.def to make thunderx line up with the other entries. Thanks Evandro for pointing it out! Kyrill Thanks, Kyrill 2014-04-22 Kyrylo Tkachov kyrylo.tkac...@arm.com * config.host (case ${host}): Add aarch64*-*-linux case. *
RE: [PATCH v3][MIPS] fix CRT_CALL_STATIC_FUNCTION macro
-Original Message- From: Maciej W. Rozycki [mailto:ma...@linux-mips.org] Sent: Tuesday, April 21, 2015 8:52 PM To: Petar Jovanovic Cc: gcc-patches@gcc.gnu.org; catherine_mo...@mentor.com; matthew.fort...@imgtec.com Subject: Re: [PATCH v3][MIPS] fix CRT_CALL_STATIC_FUNCTION macro I think this will best be reduced to a link-only test on bare iron, hoping for a link failure. I am not sure how we can reduce the test to a link failure (today), if ld will not report an error (today). What exactly is wrong with the run time test as is in the last patch? As of ld issue you have mentioned, it has been reported - see BZ#18297 [1]. Regards, Petar [1] BZ#18297, https://sourceware.org/bugzilla/show_bug.cgi?id=18297
Re: [PATCH 12/13] libstdc++, libgfortran gthr workaround for musl
On 22/04/15 14:17, Jeff Law wrote: On 04/20/2015 12:59 PM, Szabolcs Nagy wrote: libgcc/gthr-posix.h uses weak reference logic to determine if libpthread is linked into the application or not. This is broken unless there is special workaround with libc internal knowledge and even then static linking needs further manual link time workaround, so this was disabled for os/generic in libstdc++v3 and for musl in libgfortran. The change minimizes the impact on other setups, but I think the weak ref logic should be disabled by default, it is never entirely correct. Conforming code can crash on a glibc setup too: $ cat a.cpp #include pthread.h void(*f)(void) = (void(*)(void))pthread_key_create; int main(){} $ g++ -static a.cpp -lpthread $ ./a.out Segmentation fault I reported this previously at https://gcc.gnu.org/ml/gcc/2014-11/msg00246.html libgfortran/Changelog: 2015-04-16 Szabolcs Nagy szabolcs.n...@arm.com * acinclude.m4 (GTHREAD_USE_WEAK): Define as 0 for *-*-musl*. * configure: Regenerate. libstdc++v3/Changelog: 2015-04-16 Szabolcs Nagy szabolcs.n...@arm.com * config/os/generic/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK): Define. * configure.host (os_include_dir): Set to os/generic for linux-musl*. OK. Please install on the trunk. I've committed this on Szabolcs' behalf with r222329. Kyrill jeff
Re: [PATCH 2/13] musl libc config
On Wed, Apr 22, 2015 at 15:34:51 +0100, Kyrill Tkachov wrote: On 22/04/15 14:16, Jeff Law wrote: On 04/20/2015 12:52 PM, Szabolcs Nagy wrote: Add musl libc support to gcc and the command line option -mmusl following other libc support code. Note that -mlibc cannot be entirely correct: there are build time decisions based on the default libc. gcc/Changelog: 2015-04-16 Gregor Richards gregor.richa...@uwaterloo.ca * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * gcc/configure: Regenerate. OK for the trunk. Please install. I've committed this on Szabolcs' behalf with r222326 with slightly adjusted ChangeLog paths: 2015-04-22 Gregor Richards gregor.richa...@uwaterloo.ca * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * configure: Regenerate. This caused: https://gcc.gnu.org/ml/gcc-regression/2015-04/msg00262.html -- Ilya
Re: [PATCH 2/13] musl libc config
On 22/04/15 16:36, Kyrylo Tkachov wrote: On 22/04/15 16:26, Ilya Verbin wrote: On Wed, Apr 22, 2015 at 15:34:51 +0100, Kyrill Tkachov wrote: On 22/04/15 14:16, Jeff Law wrote: On 04/20/2015 12:52 PM, Szabolcs Nagy wrote: Add musl libc support to gcc and the command line option -mmusl following other libc support code. Note that -mlibc cannot be entirely correct: there are build time decisions based on the default libc. gcc/Changelog: 2015-04-16 Gregor Richards gregor.richa...@uwaterloo.ca * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * gcc/configure: Regenerate. OK for the trunk. Please install. I've committed this on Szabolcs' behalf with r222326 with slightly adjusted ChangeLog paths: 2015-04-22 Gregor Richards gregor.richa...@uwaterloo.ca * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * configure: Regenerate. This caused: https://gcc.gnu.org/ml/gcc-regression/2015-04/msg00262.html Sorry about that. I've reverted the patch. Szabolcs, we should wait until the target-specific parts are approved and install it all together? Or did you want to #ifdef some parts out to make this patch more robust towards targets that don't support musl? yes, i didn't realize that this depends on the target specific parts i will prepare an updated patch that works if the target has no musl dynamic linker name defined sorry
RE: [PATCH 2/13] musl libc config
On 22/04/15 16:26, Ilya Verbin wrote: On Wed, Apr 22, 2015 at 15:34:51 +0100, Kyrill Tkachov wrote: On 22/04/15 14:16, Jeff Law wrote: On 04/20/2015 12:52 PM, Szabolcs Nagy wrote: Add musl libc support to gcc and the command line option -mmusl following other libc support code. Note that -mlibc cannot be entirely correct: there are build time decisions based on the default libc. gcc/Changelog: 2015-04-16 Gregor Richards gregor.richa...@uwaterloo.ca * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * gcc/configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * gcc/configure: Regenerate. OK for the trunk. Please install. I've committed this on Szabolcs' behalf with r222326 with slightly adjusted ChangeLog paths: 2015-04-22 Gregor Richards gregor.richa...@uwaterloo.ca * config.gcc (LIBC_MUSL): New tm_defines macro. * config/linux.h (OPTION_MUSL): Define. (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,) (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,) (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define. * config/linux.opt (mmusl): New option. * configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*. (gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*. * configure: Regenerate. This caused: https://gcc.gnu.org/ml/gcc-regression/2015-04/msg00262.html Sorry about that. I've reverted the patch. Szabolcs, we should wait until the target-specific parts are approved and install it all together? Or did you want to #ifdef some parts out to make this patch more robust towards targets that don't support musl? Kyrill -- Ilya
Re: [PATCH 2/2][ARM] PR/63870: Add a __builtin_lane_check
Ping (https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01436.html). These are required for float16 patches posted at https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01332.html Bootstrapped + check-gcc on arm-none-linux-gnueabihf. Alan Lawrence wrote: This parallels the present form of __builtin_aarch64_im_lane_boundsi, and allows to check lane indices for intrinsics that can otherwise be written in terms of GCC vector extensions. The new builtin is not used in this patch but is used in my series of float16_t intrinsics (https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01434.html), and at some point in the future we should rewrite existing intrinsics (for other types) to this form too, but I'm leaving that for a later patch series :). Cross-tested check-gcc on arm-none-eabi Bootstrapped on arm-none-linux-gnueabihf cortex-a15 gcc/ChangeLog: * config/arm/arm-builtins.c (enum arm_builtins): Add ARM_BUILTIN_NEON_BASE and ARM_BUILTIN_NEON_LANE_CHECK. (ARM_BUILTIN_NEON_BASE): Rename macro to (ARM_BUILTIN_NEON_PATTERN_START): ...this. (arm_init_neon_builtins): Register __builtin_arm_lane_check. (arm_expand_neon_builtin): Handle ARM_BUILTIN_NEON_LANE_CHECK. commit 3d5f2b80dc4527b4874bff458bb047946322028f Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 18:36:30 2014 + Add __builtin_arm_lane_check diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 20d2198..3de2be7 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -546,12 +546,16 @@ enum arm_builtins #undef CRYPTO2 #undef CRYPTO3 + ARM_BUILTIN_NEON_BASE, + ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE, + #include arm_neon_builtins.def ARM_BUILTIN_MAX }; -#define ARM_BUILTIN_NEON_BASE (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data)) +#define ARM_BUILTIN_NEON_PATTERN_START \ +(ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data)) #undef CF #undef VAR1 @@ -910,7 +914,7 @@ arm_init_simd_builtin_scalar_types (void) static void arm_init_neon_builtins (void) { - unsigned int i, fcode = ARM_BUILTIN_NEON_BASE; + unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START; arm_init_simd_builtin_types (); @@ -920,6 +924,15 @@ arm_init_neon_builtins (void) system. */ arm_init_simd_builtin_scalar_types (); + tree lane_check_fpr = build_function_type_list (void_type_node, + intSI_type_node, + intSI_type_node, + NULL); + arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] = + add_builtin_function (__builtin_arm_lane_check, lane_check_fpr, + ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD, + NULL, NULL_TREE); + for (i = 0; i ARRAY_SIZE (neon_builtin_data); i++, fcode++) { bool print_type_signature_p = false; @@ -2183,14 +2196,28 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, return target; } -/* Expand a Neon builtin. These are special because they don't have symbolic +/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds. + Most of these are special because they don't have symbolic constants defined per-instruction or per instruction-variant. Instead, the required info is looked up in the table neon_builtin_data. */ static rtx arm_expand_neon_builtin (int fcode, tree exp, rtx target) { + if (fcode == ARM_BUILTIN_NEON_LANE_CHECK) +{ + tree nlanes = CALL_EXPR_ARG (exp, 0); + gcc_assert (TREE_CODE (nlanes) == INTEGER_CST); + rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1)); + if (CONST_INT_P (lane_idx)) + neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp); + else + error (%Klane index must be a constant immediate, exp); + /* Don't generate any RTL. */ + return const0_rtx; +} + neon_builtin_datum *d = - neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE]; + neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START]; enum insn_code icode = d-code; builtin_arg args[SIMD_MAX_BUILTIN_ARGS]; int num_args = insn_data[d-code].n_operands;
[PATCH 2/14][ARM]Add float16x8_t type
Identical to https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01438.html . Bootstrapped on arm-none-linux-gnueabihf. commit bc582bd6a0ed7c7c91fc834603fc573ed745b1a7 Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 18:40:24 2014 + Add float16x8_t + V8HFmode support (regardless of -mfp16-format) diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 3de2be7..2d97023 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -204,6 +204,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define di_UPDImode #define v16qi_UP V16QImode #define v8hi_UP V8HImode +#define v8hf_UP V8HFmode #define v4si_UP V4SImode #define v4sf_UP V4SFmode #define v2di_UP V2DImode @@ -839,6 +840,7 @@ arm_init_simd_builtin_types (void) /* Continue with standard types. */ arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node; arm_simd_types[Float32x2_t].eltype = float_type_node; + arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node; arm_simd_types[Float32x4_t].eltype = float_type_node; for (i = 0; i nelts; i++) diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def index bcbd20b..b178ae6 100644 --- a/gcc/config/arm/arm-simd-builtin-types.def +++ b/gcc/config/arm/arm-simd-builtin-types.def @@ -44,5 +44,7 @@ ENTRY (Float16x4_t, V4HF, none, 64, float16, 18) ENTRY (Float32x2_t, V2SF, none, 64, float32, 18) + + ENTRY (Float16x8_t, V8HF, none, 128, float16, 19) ENTRY (Float32x4_t, V4SF, none, 128, float32, 19) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 4181f12..9de63fa 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -26367,7 +26367,8 @@ arm_vector_mode_supported_p (machine_mode mode) { /* Neon also supports V2SImode, etc. listed in the clause below. */ if (TARGET_NEON (mode == V2SFmode || mode == V4SImode || mode == V8HImode - || mode == V4HFmode || mode == V16QImode || mode == V4SFmode || mode == V2DImode)) + || mode ==V4HFmode || mode == V16QImode || mode == V4SFmode + || mode == V2DImode || mode == V8HFmode)) return true; if ((TARGET_NEON || TARGET_IWMMXT) diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 8c10ea3..f0ef33f 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1104,7 +1104,7 @@ extern int arm_arch_crc; /* Modes valid for Neon Q registers. */ #define VALID_NEON_QREG_MODE(MODE) \ ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \ - || (MODE) == V4SFmode || (MODE) == V2DImode) + || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode) /* Structure modes valid for Neon registers. */ #define VALID_NEON_STRUCT_MODE(MODE) \ diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index b4100c8..a958f63 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -58,6 +58,7 @@ typedef __simd128_int8_t int8x16_t; typedef __simd128_int16_t int16x8_t; typedef __simd128_int32_t int32x4_t; typedef __simd128_int64_t int64x2_t; +typedef __simd128_float16_t float16x8_t; typedef __simd128_float32_t float32x4_t; typedef __simd128_poly8_t poly8x16_t; typedef __simd128_poly16_t poly16x8_t;
[PATCH 1/14][ARM] Add float16x4_t intrinsics
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01437.html , but fixes a wrong 'lane index out of bounds' error on vget_lane_f16 and vset_lane_f16, and drops vdup_n_f16 and vdup_lane_f16, as these are not in the ACLE spec. As previously, these use GCC vector extensions to maximise mid-end optimization, and do not attempt to support bigendian. The vld1, vldN, vldN_lane and corresponding intrinsics follow in patch 4/14. Bootstrapped + check-gcc on arm-none-linux-gnueabihf. gcc/ChangeLog: * config/arm/arm_neon.h (float16_t, vget_lane_f16, vset_lane_f16, vcreate_f16, vld1_lane_f16, vld1_dup_f16, vreinterpret_p8_f16, vreinterpret_p16_f16, vreinterpret_f16_p8, vreinterpret_f16_p16, vreinterpret_f16_f32, vreinterpret_f16_p64, vreinterpret_f16_s64, vreinterpret_f16_u64, vreinterpret_f16_s8, vreinterpret_f16_s16, vreinterpret_f16_s32, vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32, vreinterpret_f32_f16, vreinterpret_p64_f16, vreinterpret_s64_f16, vreinterpret_u64_f16, vreinterpret_s8_f16, vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16, vreinterpret_u16_f16, vreinterpret_u32_f16): New. diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index c923e294cda2f8cb88e4b1ccca6fd4f13a3ed98d..b4100c88f83bc603377912b7aab085532178ef99 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -41,6 +41,7 @@ typedef __simd64_int8_t int8x8_t; typedef __simd64_int16_t int16x4_t; typedef __simd64_int32_t int32x2_t; typedef __builtin_neon_di int64x1_t; +typedef __builtin_neon_hf float16_t; typedef __simd64_float16_t float16x4_t; typedef __simd64_float32_t float32x2_t; typedef __simd64_poly8_t poly8x8_t; @@ -5201,6 +5202,19 @@ vget_lane_s32 (int32x2_t __a, const int __b) return (int32_t)__builtin_neon_vget_lanev2si (__a, __b); } +/* Functions cannot accept or return __FP16 types. Even if the function + were marked always-inline so there were no call sites, the declaration + would nonetheless raise an error. Hence, we must use a macro instead. */ + +#define vget_lane_f16(__v, __idx) \ + __extension__ \ +({ \ + float16x4_t __vec = (__v); \ + __builtin_arm_lane_check (4, __idx); \ + float16_t __res = __vec[__idx]; \ + __res; \ +}) + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vget_lane_f32 (float32x2_t __a, const int __b) { @@ -5333,6 +5347,16 @@ vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c) return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c); } +#define vset_lane_f16(__e, __v, __idx) \ + __extension__ \ +({ \ + float16_t __elem = (__e); \ + float16x4_t __vec = (__v); \ + __builtin_arm_lane_check (4, __idx); \ + __vec[__idx] = __elem; \ + __vec; \ +}) + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c) { @@ -5479,6 +5503,12 @@ vcreate_s64 (uint64_t __a) return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcreate_f16 (uint64_t __a) +{ + return (float16x4_t) __a; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vcreate_f32 (uint64_t __a) { @@ -8796,6 +8826,12 @@ vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c) return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vld1_lane_f16 (const float16_t * __a, float16x4_t __b, const int __c) +{ + return vset_lane_f16 (*__a, __b, __c); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c) { @@ -8944,6 +8980,13 @@ vld1_dup_s32 (const int32_t * __a) return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vld1_dup_f16 (const float16_t * __a) +{ + float16_t __f = *__a; + return (float16x4_t) { __f, __f, __f, __f }; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vld1_dup_f32 (const float32_t * __a) { @@ -11828,6 +11871,12 @@ vreinterpret_p8_p16 (poly16x4_t __a) } __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_f16 (float16x4_t __a) +{ + return (poly8x8_t) __a; +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) vreinterpret_p8_f32 (float32x2_t __a) { return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a); @@ -11896,6 +11945,12 @@ vreinterpret_p16_p8 (poly8x8_t __a) } __extension__ static __inline poly16x4_t __attribute__
[PATCH 3/14][ARM] Add float16x8_t intrinsics
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01439.html , again fixing a wrong 'lane index out of bounds' error for vgetq_lane_f16 and vsetq_lane-f16 at -O0, and dropping vdupq_n_f16 and vdupq_lane_f16 as these are not in the ACLE spec. The vld1, vldN, vldN_lane and corresponding intrinsics follow in patch 4/14. Bootstrapped + check-gcc on arm-none-linux-gnueabihf. gcc/ChangeLog: * config/arm/arm_neon.h (vgetq_lane_f16, vsetq_lane_f16, vld1q_lane_f16, vld1q_dup_f16, vreinterpretq_p8_f16, vreinterpretq_p16_f16, vreinterpretq_f16_p8, vreinterpretq_f16_p16, vreinterpretq_f16_f32, vreinterpretq_f16_p64, vreinterpretq_f16_p128, vreinterpretq_f16_s64, vreinterpretq_f16_u64, vreinterpretq_f16_s8, vreinterpretq_f16_s16, vreinterpretq_f16_s32, vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32, vreinterpretq_f32_f16, vreinterpretq_p64_f16, vreinterpretq_p128_f16, vreinterpretq_s64_f16, vreinterpretq_u64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16, vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16, vreinterpretq_u32_f16): New. diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index a958f63ca3084bf7cfaf6420e535d69f50efa6b6..db73c70c6e4ca99db62ff4055a33bfe00db29039 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -5282,6 +5282,15 @@ vgetq_lane_s32 (int32x4_t __a, const int __b) return (int32_t)__builtin_neon_vget_lanev4si (__a, __b); } +#define vgetq_lane_f16(__v, __idx) \ + __extension__ \ +({ \ + float16x8_t __vec = (__v); \ + __builtin_arm_lane_check (8, __idx); \ + float16_t __res = __vec[__idx]; \ + __res; \ +}) + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vgetq_lane_f32 (float32x4_t __a, const int __b) { @@ -5424,6 +5433,16 @@ vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __c) return (int32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, __b, __c); } +#define vsetq_lane_f16(__e, __v, __idx) \ + __extension__ \ +({ \ + float16_t __elem = (__e); \ + float16x8_t __vec = (__v); \ + __builtin_arm_lane_check (8, __idx); \ + __vec[__idx] = __elem; \ + __vec; \ +}) + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __c) { @@ -8907,6 +8926,12 @@ vld1q_lane_s32 (const int32_t * __a, int32x4_t __b, const int __c) return (int32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) __a, __b, __c); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vld1q_lane_f16 (const float16_t * __a, float16x8_t __b, const int __c) +{ + return vsetq_lane_f16 (*__a, __b, __c); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vld1q_lane_f32 (const float32_t * __a, float32x4_t __b, const int __c) { @@ -9062,6 +9087,13 @@ vld1q_dup_s32 (const int32_t * __a) return (int32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) __a); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vld1q_dup_f16 (const float16_t * __a) +{ + float16_t __f = *__a; + return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f }; +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vld1q_dup_f32 (const float32_t * __a) { @@ -12856,6 +12888,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a) } __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_f16 (float16x8_t __a) +{ + return (poly8x16_t) __a; +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) vreinterpretq_p8_f32 (float32x4_t __a) { return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a); @@ -12932,6 +12970,12 @@ vreinterpretq_p16_p8 (poly8x16_t __a) } __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_f16 (float16x8_t __a) +{ + return (poly16x8_t) __a; +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) vreinterpretq_p16_f32 (float32x4_t __a) { return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a); @@ -13001,6 +13045,88 @@ vreinterpretq_p16_u32 (uint32x4_t __a) return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p8 (poly8x16_t __a) +{ + return (float16x8_t) __a; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p16 (poly16x8_t __a) +{ + return (float16x8_t) __a; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_f32 (float32x4_t __a) +{ + return (float16x8_t) __a; +} + +#ifdef __ARM_FEATURE_CRYPTO
[PATCH 5/14][AArch64] Add basic fp16 support
This adds basic support for moving __fp16 values around, passing and returning, and operating on them by promoting to 32-bit floats. Also a few scalar testcases. Note I've not got an fmov (immediate) variant, because there is no 'fmov hn, ...' - the only way to load a 16-bit immediate is to reinterpret the bit pattern into some other type. Vector MOVs are turned off for the same reason. If this is practical it can follow in a separate patch. My reading of ACLE suggests the type name to use is __fp16, rather than __builtin_aarch64_simd_hf. I can use the latter if that's preferable? int-f16 conversions are a little odd, assembly int_to_f16: scvtf d0, w0 fcvt h0, d0 ret int_from_f16: fcvt s0, h0 fcvtzs w0, s0 ret The spec is silent on the absence or existence of intermediate rounding steps, however, I don't think this matters: even float32_t offers s many more bits than __fp16, that any integer which fits into the range of an __fp16 (i.e. is not infinite), can be expressed exactly as a float32_t without any loss of precision. So I think the above are OK. (if they can be optimized, that can follow in a later patch.) Note that unlike ARM, where we support both IEEE and Alternative formats (and, somewhat-awkwardly, format-agnostic code too), here we are settling on IEEE format always. Technically, we should output an EABI attribute saying which format we are using here, however, aarch64 asm does not support the .eabi-attribute directive yet, so it seems reasonable to leave this while there is only one possible format. Bootstrapped + check-gcc on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New. (aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16. * config/aarch64/aarch64-modes.def: Add HFmode. * config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define __ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS. Set bit 1 of __ARM_FP. * config/aarch64/aarch64.c (aarch64_init_libfuncs, aarch64_promoted_type): New. (aarch64_float_const_representable_p): Disable HFmode. (aarch64_mangle_type): Mangle half-precision floats to Dh. (TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type. (TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs. * config/aarch64/aarch64.md (movmode): Include HFmode using GPF_F16. (movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New. * config/aarch64/iterators.md (GPF_F16): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/f16_convs_1.c: New test. * gcc.target/aarch64/f16_convs_2.c: New test. * gcc.target/aarch64/f16_movs_1.c: New test. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 87f1ac2ec1e3c774782c567b20c673802ae90d99..5a7b112bd1fe77826bfb84383c86dceb6b1521e3 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -453,6 +453,9 @@ static struct aarch64_simd_type_info aarch64_simd_types [] = { }; #undef ENTRY +/* This type is not SIMD-specific; it is the user-visible __fp16. */ +static tree aarch64_fp16_type_node = NULL_TREE; + static tree aarch64_simd_intOI_type_node = NULL_TREE; static tree aarch64_simd_intEI_type_node = NULL_TREE; static tree aarch64_simd_intCI_type_node = NULL_TREE; @@ -862,6 +865,12 @@ aarch64_init_builtins (void) = add_builtin_function (__builtin_aarch64_set_fpsr, ftype_set_fpr, AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_fp16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (aarch64_fp16_type_node) = 16; + layout_type (aarch64_fp16_type_node); + + (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, __fp16); + if (TARGET_SIMD) aarch64_init_simd_builtins (); if (TARGET_CRC32) diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index b17b90d90601ae0a631a78560da743720c4638ce..c30059b632fa8cb7fd9071917d3f581f0966a86d 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -36,6 +36,10 @@ CC_MODE (CC_DLTU); CC_MODE (CC_DGEU); CC_MODE (CC_DGTU); +/* Half-precision floating point for arm_neon.h float16_t. */ +FLOAT_MODE (HF, 2, 0); +ADJUST_FLOAT_FORMAT (HF, ieee_half_format); + /* Vector modes. */ VECTOR_MODES (INT, 8);/* V8QI V4HI V2SI. */ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI. */ diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index bf59e40a64459f6daddef47a5f5214adfd92d9b6..67c37ebc0e06d22e524322e5a82b6bcde550bd93 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -57,7 +57,9 @@ if (TARGET_FLOAT) \ { \ builtin_define (__ARM_FEATURE_FMA); \ -
[PATCH 5/14][AArch64] Add basic fp16 support
[Resending with correct in-reply-to header] This adds basic support for moving __fp16 values around, passing and returning, and operating on them by promoting to 32-bit floats. Also a few scalar testcases. Note I've not got an fmov (immediate) variant, because there is no 'fmov hn, ...' - the only way to load a 16-bit immediate is to reinterpret the bit pattern into some other type. Vector MOVs are turned off for the same reason. If this is practical it can follow in a separate patch. My reading of ACLE suggests the type name to use is __fp16, rather than __builtin_aarch64_simd_hf. I can use the latter if that's preferable? int-f16 conversions are a little odd, assembly int_to_f16: scvtf d0, w0 fcvt h0, d0 ret int_from_f16: fcvt s0, h0 fcvtzs w0, s0 ret The spec is silent on the absence or existence of intermediate rounding steps, however, I don't think this matters: even float32_t offers s many more bits than __fp16, that any integer which fits into the range of an __fp16 (i.e. is not infinite), can be expressed exactly as a float32_t without any loss of precision. So I think the above are OK. (if they can be optimized, that can follow in a later patch.) Note that unlike ARM, where we support both IEEE and Alternative formats (and, somewhat-awkwardly, format-agnostic code too), here we are settling on IEEE format always. Technically, we should output an EABI attribute saying which format we are using here, however, aarch64 asm does not support the .eabi-attribute directive yet, so it seems reasonable to leave this while there is only one possible format. Bootstrapped + check-gcc on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New. (aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16. * config/aarch64/aarch64-modes.def: Add HFmode. * config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define __ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS. Set bit 1 of __ARM_FP. * config/aarch64/aarch64.c (aarch64_init_libfuncs, aarch64_promoted_type): New. (aarch64_float_const_representable_p): Disable HFmode. (aarch64_mangle_type): Mangle half-precision floats to Dh. (TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type. (TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs. * config/aarch64/aarch64.md (movmode): Include HFmode using GPF_F16. (movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New. * config/aarch64/iterators.md (GPF_F16): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/f16_convs_1.c: New test. * gcc.target/aarch64/f16_convs_2.c: New test. * gcc.target/aarch64/f16_movs_1.c: New test. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 87f1ac2ec1e3c774782c567b20c673802ae90d99..5a7b112bd1fe77826bfb84383c86dceb6b1521e3 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -453,6 +453,9 @@ static struct aarch64_simd_type_info aarch64_simd_types [] = { }; #undef ENTRY +/* This type is not SIMD-specific; it is the user-visible __fp16. */ +static tree aarch64_fp16_type_node = NULL_TREE; + static tree aarch64_simd_intOI_type_node = NULL_TREE; static tree aarch64_simd_intEI_type_node = NULL_TREE; static tree aarch64_simd_intCI_type_node = NULL_TREE; @@ -862,6 +865,12 @@ aarch64_init_builtins (void) = add_builtin_function (__builtin_aarch64_set_fpsr, ftype_set_fpr, AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_fp16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (aarch64_fp16_type_node) = 16; + layout_type (aarch64_fp16_type_node); + + (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, __fp16); + if (TARGET_SIMD) aarch64_init_simd_builtins (); if (TARGET_CRC32) diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index b17b90d90601ae0a631a78560da743720c4638ce..c30059b632fa8cb7fd9071917d3f581f0966a86d 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -36,6 +36,10 @@ CC_MODE (CC_DLTU); CC_MODE (CC_DGEU); CC_MODE (CC_DGTU); +/* Half-precision floating point for arm_neon.h float16_t. */ +FLOAT_MODE (HF, 2, 0); +ADJUST_FLOAT_FORMAT (HF, ieee_half_format); + /* Vector modes. */ VECTOR_MODES (INT, 8);/* V8QI V4HI V2SI. */ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI. */ diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index bf59e40a64459f6daddef47a5f5214adfd92d9b6..67c37ebc0e06d22e524322e5a82b6bcde550bd93 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -57,7 +57,9 @@ if (TARGET_FLOAT) \ { \
[PATCH 7/14][AArch64] vld{2,3,4}{,_lane,_dup},vcombine,vcreate
gcc/ChangeLog: * config/aarch64/aarch64.c (aarch64_split_simd_combine): Add V4HFmode. * config/aarch64/aarch64-builtins.c (VAR13, VAR14): New. (aarch64_scalar_builtin_types, aarch64_init_simd_builtin_scalar_types): Add __builtin_aarch64_simd_hf. * config/aarch64/arm_neon.h (float16x4x2_t, float16x8x2_t, float16x4x3_t, float16x8x3_t, float16x4x4_t, float16x8x4_t, vcombine_f16, vst2_lane_f16, vst2q_lane_f16, vst3_lane_f16, vst3q_lane_f16, vst4_lane_f16, vst4q_lane_f16, vld2_f16, vld2q_f16, vld3_f16, vld3q_f16, vld4_f16, vld4q_f16, vld2_dup_f16, vld2q_dup_f16, vld3_dup_f16, vld3q_dup_f16, vld4_dup_f16, vld4q_dup_f16, vld2_lane_f16, vld2q_lane_f16, vld3_lane_f16, vld3q_lane_f16, vld4_lane_f16, vld4q_lane_f16, vst2_f16, vst2q_f16, vst3_f16, vst3q_f16, vst4_f16, vst4q_f16, vcreate_f16): New. * config/aarch64/iterators.md (VALLDIF, Vtype, Vetype, Vbtype, V_cmp_result, v_cmp_result): Add cases for V4HF and V8HF. (VDC, Vdbl): Add V4HF. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vldN_1.c: Add float16x4_t and float16x8_t cases. * gcc.target/aarch64/vldN_dup_1.c: Likewise. * gcc.target/aarch64/vldN_lane_1.c: Likewise.
[PATCH 12/14][ARM/AArch64 Testsuite] Update advsimd-intrinsics tests to add float16 vectors
This is a fairly straightforward addition of a new type: I've added it in on equal status to the other types, because the various vector-load/store/element-manipulating intrinsics, are *not* conditional on HW support. (They just involve moving 16-bit chunks around, just like s16/u16/p16). Thus, for many tests, this just involves adding default expected values of { 0x ...}. While there are indeed more of such default values for float16x4/8 than any other type (because there are fewer intrinsics), there are plenty others, so this seems consistent. However, there is no vdup_n_f16 intrinsic so I worked around this using a macro (yes, a bit ugh). There are many check_GNU_style.sh violations here but I tried to be consistent with the existing code. Passing on arm-none-linux-gnueabihf and aarch64-none-linux-gnu, aarch64_be-none-elf (following previous patch) gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (hfloat16_t, vdup_n_f16): New. (result, expected, CHECK_RESULTS, CHECK_RESULTS_NAMED, clean_results): Add float16x4 and float16x8 cases. DECL_VARIABLE_64BITS_VARIANTS: Add float16x4 case. DECL_VARIABLE_128BITS_VARIANTS: Add float16x8 case. * gcc.target/aarch64/advsimd-intrinsics/compute-data-ref.h (buffer, buffer_pad, buffer_dup, buffer_dup_pad): Add float16x4 and float16x8. * gcc.target/aarch64/advsimd-intrinsics/vaba.c: Add expected results for float16x4 and float16x8. * gcc.target/aarch64/advsimd-intrinsics/vabal.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vabd.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vabdl.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vabs.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vadd.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vaddl.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vaddw.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vand.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vbic.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vbsl.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vcls.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vclz.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vcnt.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vcombine.c: Add expected results for float16x4 and float16x8. (main): add test of float16x4 - float16x8 case. * gcc.target/aarch64/advsimd-intrinsics/vcreate.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h index 80d6b5893cc33eaff4178a2f26aa53ccf1c48dda..3d1b36fd111e906f6e940ccf89900b44e79a68e9 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h @@ -7,6 +7,7 @@ #include inttypes.h /* helper type, to help write floating point results in integer form. */ +typedef uint16_t hfloat16_t; typedef uint32_t hfloat32_t; typedef uint64_t hfloat64_t; @@ -132,6 +133,7 @@ static ARRAY(result, uint, 32, 2); static ARRAY(result, uint, 64, 1); static ARRAY(result, poly, 8, 8); static ARRAY(result, poly, 16, 4); +static ARRAY(result, float, 16, 4); static ARRAY(result, float, 32, 2); static ARRAY(result, int, 8, 16); static ARRAY(result, int, 16, 8); @@ -143,6 +145,7 @@ static ARRAY(result, uint, 32, 4); static ARRAY(result, uint, 64, 2); static ARRAY(result, poly, 8, 16); static ARRAY(result, poly, 16, 8); +static ARRAY(result, float, 16, 8); static ARRAY(result, float, 32, 4); #ifdef __aarch64__ static ARRAY(result, float, 64, 2); @@ -160,6 +163,7 @@ extern ARRAY(expected, uint, 32, 2); extern ARRAY(expected, uint, 64, 1); extern ARRAY(expected, poly, 8, 8); extern ARRAY(expected, poly, 16, 4); +extern ARRAY(expected, hfloat, 16, 4); extern ARRAY(expected, hfloat, 32, 2); extern ARRAY(expected, int, 8, 16); extern ARRAY(expected, int, 16, 8); @@ -171,6 +175,7 @@ extern ARRAY(expected, uint, 32, 4); extern ARRAY(expected, uint, 64, 2); extern ARRAY(expected, poly, 8, 16); extern ARRAY(expected, poly, 16, 8); +extern ARRAY(expected, hfloat, 16, 8); extern ARRAY(expected, hfloat, 32, 4); extern ARRAY(expected, hfloat, 64, 2); @@ -187,6 +192,7 @@ extern ARRAY(expected, hfloat, 64, 2); CHECK(test_name, uint, 64, 1, PRIx64, expected, comment); \ CHECK(test_name, poly, 8, 8, PRIx8, expected, comment); \ CHECK(test_name, poly, 16, 4, PRIx16, expected, comment); \ +CHECK_FP(test_name, float, 16, 4, PRIx16, expected, comment); \ CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment); \ \
[PATCH 11/14][fold-const.c] Fix bigendian HFmode in native_interpret_real
native_interpret_real in fold_const.c has an assumption that floats are at least 32-bits (on bigendian targets with UNITS_PER_WORD = 4). This patch relaxes that assumption (allowing e.g. 16-bit HFmode values). On aarch64_be-none-elf, this fixes the float16x4_t variant of gcc.target/aarch64/advsimd-intrinsics/vcreate.c added in the next patch in series. Bootstrapped + check-gcc on: x86-unknown-linux-gnu powerpc64-unknown-linux-gnu (gcc110 on compile farm, as I believe this is bigendian, as opposed to powerpc64le-unknown-linux-gnu on gcc112) arm-none-linux-gnueabihf aarch64-none-elf It's not clear that any of those actually test this code, but the aarch64_be-none-elf tests in next patch definitely do ;). gcc/ChangeLog: fold-const.c (native_interpret_real): Correctly read floats of less than 32 bits on bigendian targets. commit f8ad02fecdb7b6f91bab77cc154a246bd719ac20 Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Apr 9 10:54:40 2015 +0100 Fix native_interpret_real for HFmode floats on Bigendian with UNITS_PER_WORD=4 (with missing space) diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 6d085b1..52bc8e9 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -7625,7 +7625,7 @@ native_interpret_real (tree type, const unsigned char *ptr, int len) offset += byte % UNITS_PER_WORD; } else - offset = BYTES_BIG_ENDIAN ? 3 - byte : byte; + offset = BYTES_BIG_ENDIAN ? MIN (3, total_bytes - 1) - byte : byte; value = ptr[offset + ((bitpos / BITS_PER_UNIT) ~3)]; tmp[bitpos / 32] |= (unsigned long)value (bitpos 31);
Re: [PATCH 1/2][ARM] PR/63870: Add qualifier to check lane bounds in expand
Ping (https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html). These are required for float16 patches posted at https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01332.html . Bootstrapped + check-gcc on arm-none-linux-gnueabihf. Alan Lawrence wrote: This is based loosely upon svn r217440, [AArch64] Add bounds checking to vqdm_lane intrinsics..., but applies to more intrinsics (including e.g. vget_lane), and does not do the endianness-flipping present on AArch64: the objective is to exactly preserve behaviour on all valid code. (Yes, the new qualifier may perhaps give us a location for flipping lanes according to endianness in the future, but I'm not doing that here.) Checks for lanes being in range for many insns are thus moved from assembly to expand time, with inlining history. For example, previous error message: vqrdmulh_lane_s16_indices_1.c: In function 'test1': vqrdmulh_lane_s16_indices_1.c:9:1: error: lane out of range } ^ becomes: In file included vqrdmulh_lane_s16_indices_1.c:3:0: In function 'vqrdmulh_lane_s16', inlined from 'test1' at gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_lane_s16_indices_1.c:8:10: .../install/lib/gcc/arm-none-eabi/5.0.0/include/arm_neon.h:6882:10: error: lane -1 out of range 0 - 3 return (int16x4_t)builtin_neon_vqrdmulh_lanev4hi (a, b, c); Note the question of how to common up tests with those in gcc.target/aarch64/simd/*_indices_1.c is not resolved by this patch. Cross-tested check-gcc on arm-none-eabi Bootstrapped on arm-none-linux-gnueabihf cortex-a15 gcc/ChangeLog: * config/arm/arm-builtins.c (enum arm_type_qualifiers): Add qualifier_lane_index. (arm_binop_imm_qualifiers, BINOP_IMM_QUALIFIERS): New. (arm_getlane_qualifiers): Use qualifier_lane_index. (arm_lanemac_qualifiers): Rename to... (arm_mac_n_qualifiers): ...this. (LANEMAC_QUALIFIERS): Rename to... (MAC_N_QUALIFIERS): ...this. (arm_mac_lane_qualifiers, MAC_LANE_QUALIFIERS): New. (arm_setlane_qualifiers): Use qualifier_lane_index. (arm_ternop_imm_qualifiers, TERNOP_IMM_QUALIFIERS): New. (enum builtin_arg): Add NEON_ARG_LANE_INDEX. (arm_expand_neon_args): Handle NEON_ARG_LANE_INDEX. (arm_expand_neon_builtin): Handle qualifier_lane_index. * config/arm/arm-protos.h (neon_lane_bounds): Add const_tree parameter. * config/arm/arm.c (bounds_check): Likewise, improve error message. (neon_lane_bounds, neon_const_bounds): Add arguments to bounds_check. * config/arm/arm_neon_builtins.def (vshrs_n, vshru_n, vrshrs_n, vrshru_n, vshrn_n, vrshrn_n, vqshrns_n, vqshrnu_n, vqrshrns_n, vqrshrnu_n, vqshrun_n, vqrshrun_n, vshl_n, vqshl_s_n, vqshl_u_n, vqshlu_n, vshlls_n, vshllu_n): Change qualifiers to BINOP_IMM. (vsras_n, vsrau_n, vrsras_n, vrsrau_n, vsri_n, vsli_n): Change qualifiers to TERNOP_IMM. (vdup_lane): Change qualifiers to GETLANE. (vmla_lane, vmlals_lane, vmlalu_lane, vqdmlal_lane, vmls_lane, vmlsls_lane, vmlslu_lane, vqdmlsl_lane): Change qualifiers to MAC_LANE. (vmla_n, vmlals_n, vmlalu_n, vqdmlal_n, vmls_n, vmlsls_n, vmlslu_n, vqdmlsl_n): Change qualifiers to MAC_N. * config/arm/neon.md (neon_vget_lanemode, neon_vget_laneumode, neon_vget_lanedi, neon_vget_lanev2di, neon_vset_lanemode, neon_vset_lanedi, neon_vdup_lanemode, neon_vdup_lanedi, neon_vdup_lanev2di, neon_vmul_lanemode, neon_vmul_lanemode, neon_vmullsup_lanemode, neon_vqdmull_lanemode, neon_vqrdmulh_lanemode, neon_vqrdmulh_lanemode, neon_vmla_lanemode, neon_vmla_lanemode, neon_vmlalsup_lanemode, neon_vqdmlal_lanemode, neon_vmls_lanemode, neon_vmls_lanemode, neon_vmlslsup_lanemode, neon_vqdmlsl_lanemode): Remove call to neon_lane_bounds. diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 7a45113..20d2198 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -89,7 +89,9 @@ enum arm_type_qualifiers /* qualifier_const_pointer | qualifier_map_mode */ qualifier_const_pointer_map_mode = 0x86, /* Polynomial types. */ - qualifier_poly = 0x100 + qualifier_poly = 0x100, + /* Lane indices - must be within range of previous argument = a vector. */ + qualifier_lane_index = 0x200 }; /* The qualifier_internal allows generation of a unary builtin from @@ -120,21 +122,40 @@ arm_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] /* T (T, immediate). */ static enum arm_type_qualifiers -arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] +arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_immediate }; +#define BINOP_IMM_QUALIFIERS (arm_binop_imm_qualifiers) + +/* T (T, lane index). */ +static enum arm_type_qualifiers +arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_lane_index }; #define GETLANE_QUALIFIERS (arm_getlane_qualifiers) /* T (T, T, T, immediate). */ static enum arm_type_qualifiers
[PATCH 8/14][AArch64]Add vreinterpret, float_truncate_lo/hi, vget_low/high
gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_v2sf): Reparameterize to... (aarch64_float_truncate_lo_mode): ...this, for both V2SF and V4HF. (aarch64_float_truncate_hi_v4sf): Reparameterize to... (aarch64_float_truncate_hi_Vdbl): ...this, for both V4SF and V8HF. * config/aarch64/aarch64-simd-builtins.def (float_truncate_hi_): Add v8hf variant. (float_truncate_lo_): Use BUILTIN_VDF iterator. * config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16, vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16, vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32, vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32, vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16, vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16, vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32, vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32, vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16, vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16, vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16, vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16, vreinterpret_u16_f16, vreinterpret_u32_f16, vget_low_f16, vget_high_f16, vcvt_f16_f32, vcvt_high_f16_f32): New. * config/aarch64/iterators.md (VDF, Vdtype): New. (VWIDE, Vmwtype): Add cases for V4HF and V2SF. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vget_high_1.c: Add float16x8-float16x4 case. * gcc.target/aarch64/vget_low_1.c: Likewise. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 6298063d13c444d9b7c6cb0c14cfabce611f0d56..ea84055476c9e56e78d1b843e0b028e85a672ee6 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -4857,6 +4857,12 @@ vsetq_lane_u64 (uint64_t __elem, uint64x2_t __vec, const int __index) uint64x1_t lo = vcreate_u64 (vgetq_lane_u64 (tmp, 0)); \ return vreinterpret_##__TYPE##_u64 (lo); +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vget_low_f16 (float16x8_t __a) +{ + __GET_LOW (f16); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vget_low_f32 (float32x4_t __a) { @@ -4936,6 +4942,12 @@ vget_low_u64 (uint64x2_t __a) uint64x1_t hi = vcreate_u64 (vgetq_lane_u64 (tmp, 1)); \ return vreinterpret_##__TYPE##_u64 (hi); +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vget_high_f16 (float16x8_t __a) +{ + __GET_HIGH (f16); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vget_high_f32 (float32x4_t __a) { diff --git a/gcc/testsuite/gcc.target/aarch64/vget_high_1.c b/gcc/testsuite/gcc.target/aarch64/vget_high_1.c index 4cb872da2cd269df5290a6af928ed958c4fecd09..b6b57e0c5468dbf571ec9e9196ac2d0fa3754d7a 100644 --- a/gcc/testsuite/gcc.target/aarch64/vget_high_1.c +++ b/gcc/testsuite/gcc.target/aarch64/vget_high_1.c @@ -14,6 +14,7 @@ VARIANT (int8_t, 8, int8x8_t, int8x16_t, s8) \ VARIANT (int16_t, 4, int16x4_t, int16x8_t, s16) \ VARIANT (int32_t, 2, int32x2_t, int32x4_t, s32) \ VARIANT (int64_t, 1, int64x1_t, int64x2_t, s64) \ +VARIANT (float16_t, 4, float16x4_t, float16x8_t, f16) \ VARIANT (float32_t, 2, float32x2_t, float32x4_t, f32) \ VARIANT (float64_t, 1, float64x1_t, float64x2_t, f64) @@ -51,6 +52,8 @@ main (int argc, char **argv) int16_t int16_t_data[8] = { -17, 19, 3, -999, 44048, 505, , 1000}; int32_t int32_t_data[4] = { 123456789, -987654321, -135792468, 975318642 }; int64_t int64_t_data[2] = {0xfedcba9876543210LL, 0xdeadbabecafebeefLL }; + float16_t float16_t_data[8] = { 1.25, 4.5, 7.875, 2.3125, 5.675, 8.875, + 3.6875, 6.75}; float32_t float32_t_data[4] = { 3.14159, 2.718, 1.414, 100.0 }; float64_t float64_t_data[2] = { 1.0100100011, 12345.6789 }; diff --git a/gcc/testsuite/gcc.target/aarch64/vget_low_1.c b/gcc/testsuite/gcc.target/aarch64/vget_low_1.c index f8016ef73124981f7042957521f42754566e9518..2223676521c4c10b2d839746873eb559559d76ba 100644 --- a/gcc/testsuite/gcc.target/aarch64/vget_low_1.c +++ b/gcc/testsuite/gcc.target/aarch64/vget_low_1.c @@ -14,6 +14,7 @@ VARIANT (int8_t, 8, int8x8_t, int8x16_t, s8) \ VARIANT (int16_t, 4, int16x4_t, int16x8_t, s16) \ VARIANT (int32_t, 2, int32x2_t, int32x4_t, s32) \ VARIANT (int64_t, 1, int64x1_t, int64x2_t, s64) \ +VARIANT (float16_t, 4, float16x4_t, float16x8_t, f16) \ VARIANT (float32_t, 2, float32x2_t, float32x4_t, f32) \ VARIANT (float64_t, 1, float64x1_t, float64x2_t, f64) @@ -51,6 +52,8 @@ main (int argc, char **argv) int16_t int16_t_data[8] = { -17, 19, 3, -999, 44048, 505, , 1000}; int32_t int32_t_data[4] = { 123456789, -987654321, -135792468,
[PATCH 10/14][AArch64] Add vcvt(_high)?_f32_f16 intrinsics
This adds the two remaining widening intrinsics, first adding patterns in aarch64-simd.md, then entries in aarch64-simd-builtins.def, and finally intrinsics in arm_neon.h . Note this changes the vector indices present in the RTL on bigendian for float vec_unpacks, to be the same as for integer vec_unpacks. This appears consistent with the usage of VEC_UNPACK_(FLOAT_)?EXPR in tree-vect-stmts.c, which uses a different EXPR for the same half of the vector depending on endianness. I was not able to construct a testcase where the RTL here mattered (i.e. where the RTL was constant-folded, but the tree had not been), but the correctness can be seen from a testcase: double d[4]; void bar (float *f) { for (int i = 0; i 4; i++) d[i] = f[i]; } which used to produced as final RTL (-O3) (insn:TI 8 10 12 (set (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78]) (float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float *)f_6(D)] ] [77]) (parallel [ (const_int 2 [0x2]) (const_int 3 [0x3]) ] test.c:40 1274 {vec_unpacks_hi_v4sf} (expr_list:REG_EQUIV (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double *)d]+0 S16 A64]) (nil))) (insn:TI 12 8 11 (set (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81]) (float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float *)f_6(D)] ] [77]) (parallel [ (const_int 0 [0]) (const_int 1 [0x1]) ] test.c:40 1272 {vec_unpacks_lo_v4sf} (expr_list:REG_EQUIV (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79]) (const_int 16 [0x10])) [2 MEM[(double *)d + 16B]+0 S16 A64]) (nil))) (insn:TI 11 12 15 (set (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double *)d]+0 S16 A64]) (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])) test.c:40 808 {*aarch64_simd_movv2df} (expr_list:REG_DEAD (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78]) (nil))) (insn:TI 15 11 22 (set (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79]) (const_int 16 [0x10])) [2 MEM[(double *)d + 16B]+0 S16 A64]) (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])) test.c:40 808 {*aarch64_simd_movv2df} (expr_list:REG_DEAD (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81]) i.e. apparently storing vector elements 2 and 3 to the address of d, and elems 0+1 to address (d+16). Of course this was flipped back again to be correct at assembly time, but following this patch the RTL indices are also correct (elems 0+1 to address d, elems 2+3 to address d+16). gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_simd_vec_unpacks_lo_mode, aarch64_simd_vec_unpacks_hi_mode): New insn. (vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf): Delete insn. (vec_unpacks_lo_mode, vec_unpacks_hi_mode): New expand. (aarch64_float_extend_lo_v2df): Rename to... (aarch64_float_extend_lo_Vwide): this, using VDF and so adding V4SF. * config/aarch64/aarch64-simd-builtins.def (vec_unpacks_hi): Add v8hf. (float_extend_lo): Add v4sf. * config/aarch64/arm_neon.h (vcvt_f32_f16, vcvt_high_f32_f16): New. * config/aarch64/iterators.md (VQ_HSF): New iterator. (VWIDE, Vwtype, Vhalftype): Add V8HF, V4SF. (Vwide): New mode_attr. diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 604bfa20bf838ee04ef0e1dda0b47b55dbdd82a6..1eefb37c2eba37aecee6ccae100274c5a8cc5ae3 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -360,11 +360,11 @@ only ever used for the int64x1_t intrinsic, there is no scalar version. */ BUILTIN_VALLDI (UNOP, abs, 2) - VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf) + VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf) VAR1 (BINOP, float_truncate_hi_, 0, v4sf) VAR1 (BINOP, float_truncate_hi_, 0, v8hf) - VAR1 (UNOP, float_extend_lo_, 0, v2df) + VAR2 (UNOP, float_extend_lo_, 0, v2df, v4sf) BUILTIN_VDF (UNOP, float_truncate_lo_, 0) /* Implemented by aarch64_ld1VALL_F16:mode. */ diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 161396a331cab777bb2108f86c39b74557be4abc..17a5d5f8c757833a7ed387083f8076af2c8cad66 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1677,36 +1677,57 @@ ;; Float widening operations. -(define_insn vec_unpacks_lo_v4sf - [(set (match_operand:V2DF 0 register_operand =w) - (float_extend:V2DF - (vec_select:V2SF - (match_operand:V4SF 1 register_operand w) - (parallel [(const_int 0) (const_int 1)]) - )))] +(define_insn aarch64_simd_vec_unpacks_lo_mode + [(set (match_operand:VWIDE 0 register_operand =w) +(float_extend:VWIDE (vec_select:VHALF + (match_operand:VQ_HSF 1 register_operand w) + (match_operand:VQ_HSF 2 vect_par_cnst_lo_half )
[PATCH 13/14][ARM/AArch64 testsuite] Use gcc-dg-runtest in advsimd-intrinsics.exp
In the first revision of Christophe Lyon's advsimd-intrinsics tests, https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00532.html , both gcc-dg-runtest (to assemble only) and c-torture-execute were used. In review the gcc-dg-runtest part was then dropped, and execution tests continued using c-torture-execute. However, c-torture-execute ignores e.g. dg-options directives in the individual test files, whereas gcc-dg-runtest does not. This patch switches to gcc-dg-runtest (with dg-do-what-default = run) for all tests, allowing use of e.g. dg-options (in testsuite patch 3/3). This generally seems to work OK - indeed I also dropped the parallelization-disabling code - and the new advsimd-intrinsics.exp now follows gcc.c-torture/compile/compile.exp and gcc.c-torture/execute/execute.exp very closely. However, there are side effects, of which we should be aware, but with which I think we can live, specifically: (1) the lines in the test log change from... PASS: gcc.target/aarch64/advsimd-intrinsics/vcombine.c compilation, -O1 PASS: gcc.target/aarch64/advsimd-intrinsics/vcombine.c execution, -O1 ...to... PASS: gcc.target/aarch64/advsimd-intrinsics/vcombine.c -O1 execution test (that is, the compilation line disappears, but the (test for excess errors) remains unchanged) (2) The -Og -g variant is no longer tested; all of -O0, -O1, -O2, -O2 -flto -fno-use-linker-plugin -flto-partition=none, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects, -O3 -fomit-frame-pointer, -O3 -g, -Os still are. My feeling is that this set of options is exhaustive enough. Cross-tested arm-none-eabi, aarch64-none-elf, aarch64_be-none-elf; natively tested arm-none-linux-gnueabihf and aarch64-none-linux-gnu. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: Use gcc-dg-runtest. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp index 551299ef5634b7fea68d2e2f813ab61270b59e35..aa5d4d5ec2f45fa745fac152125e1badc2f2df43 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp @@ -26,10 +26,6 @@ if {![istarget arm*-*-*] load_lib gcc-dg.exp # Initialize `dg'. -load_lib c-torture.exp -load_lib target-supports.exp -load_lib torture-options.exp - dg-init if {[istarget arm*-*-*] @@ -37,29 +33,14 @@ if {[istarget arm*-*-*] return } -torture-init -set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS - # Make sure Neon flags are provided, if necessary. -set additional_flags [add_options_for_arm_neon ] +set additional_flags [add_options_for_arm_neon -w] # Main loop. -foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] { -# If we're only testing specific files and this isn't one of them, skip it. -if ![runtest_file_p $runtests $src] then { - continue -} - -# runtest_file_p is already run above, and the code below can run -# runtest_file_p again, make sure everything for this test is -# performed if the above runtest_file_p decided this runtest -# instance should execute the test -gcc_parallel_test_enable 0 -c-torture-execute $src $additional_flags -gcc-dg-runtest $src $additional_flags -gcc_parallel_test_enable 1 -} +set saved-dg-do-what-default ${dg-do-what-default} +set dg-do-what-default run +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] ${additional_flags} +set dg-do-what-default ${saved-dg-do-what-default} # All done. -torture-finish dg-finish
Re: [PATCH 7/14][AArch64] vld{2,3,4}{,_lane,_dup},vcombine,vcreate
Alan Lawrence wrote: gcc/ChangeLog: * config/aarch64/aarch64.c (aarch64_split_simd_combine): Add V4HFmode. * config/aarch64/aarch64-builtins.c (VAR13, VAR14): New. (aarch64_scalar_builtin_types, aarch64_init_simd_builtin_scalar_types): Add __builtin_aarch64_simd_hf. * config/aarch64/arm_neon.h (float16x4x2_t, float16x8x2_t, float16x4x3_t, float16x8x3_t, float16x4x4_t, float16x8x4_t, vcombine_f16, vst2_lane_f16, vst2q_lane_f16, vst3_lane_f16, vst3q_lane_f16, vst4_lane_f16, vst4q_lane_f16, vld2_f16, vld2q_f16, vld3_f16, vld3q_f16, vld4_f16, vld4q_f16, vld2_dup_f16, vld2q_dup_f16, vld3_dup_f16, vld3q_dup_f16, vld4_dup_f16, vld4q_dup_f16, vld2_lane_f16, vld2q_lane_f16, vld3_lane_f16, vld3q_lane_f16, vld4_lane_f16, vld4q_lane_f16, vst2_f16, vst2q_f16, vst3_f16, vst3q_f16, vst4_f16, vst4q_f16, vcreate_f16): New. * config/aarch64/iterators.md (VALLDIF, Vtype, Vetype, Vbtype, V_cmp_result, v_cmp_result): Add cases for V4HF and V8HF. (VDC, Vdbl): Add V4HF. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vldN_1.c: Add float16x4_t and float16x8_t cases. * gcc.target/aarch64/vldN_dup_1.c: Likewise. * gcc.target/aarch64/vldN_lane_1.c: Likewise. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index ac05c43faf47e2de6b20a976defe2b269f7f1633..e791533f4c9e26a0c0c5ddf78f015f334f1ca2ed 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -314,6 +314,12 @@ aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define VAR12(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \ VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \ VAR1 (T, N, MAP, L) +#define VAR13(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \ + VAR12 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \ + VAR1 (T, N, MAP, M) +#define VAR14(T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M, N) \ + VAR13 (T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \ + VAR1 (T, X, MAP, N) #include aarch64-builtin-iterators.h @@ -391,6 +397,7 @@ const char *aarch64_scalar_builtin_types[] = { __builtin_aarch64_simd_qi, __builtin_aarch64_simd_hi, __builtin_aarch64_simd_si, + __builtin_aarch64_simd_hf, __builtin_aarch64_simd_sf, __builtin_aarch64_simd_di, __builtin_aarch64_simd_df, @@ -678,6 +685,8 @@ aarch64_init_simd_builtin_scalar_types (void) __builtin_aarch64_simd_qi); (*lang_hooks.types.register_builtin_type) (intHI_type_node, __builtin_aarch64_simd_hi); + (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, + __builtin_aarch64_simd_hf); (*lang_hooks.types.register_builtin_type) (intSI_type_node, __builtin_aarch64_simd_si); (*lang_hooks.types.register_builtin_type) (float_type_node, diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 13debc8f94b54873bbe8cfda0a8e00921489d9c3..425bcbdc47e3b08d456b02e823a4b370e0fc6312 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1052,6 +1052,9 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2) case V2SImode: gen = gen_aarch64_simd_combinev2si; break; + case V4HFmode: + gen = gen_aarch64_simd_combinev4hf; + break; case V2SFmode: gen = gen_aarch64_simd_combinev2sf; break; diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index dca74025c80eb82f6f56e43ec009ceb0803de6e9..935041297ac6878292c81d5db0d70674def21f03 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -153,6 +153,16 @@ typedef struct uint64x2x2_t uint64x2_t val[2]; } uint64x2x2_t; +typedef struct float16x4x2_t +{ + float16x4_t val[2]; +} float16x4x2_t; + +typedef struct float16x8x2_t +{ + float16x8_t val[2]; +} float16x8x2_t; + typedef struct float32x2x2_t { float32x2_t val[2]; @@ -273,6 +283,16 @@ typedef struct uint64x2x3_t uint64x2_t val[3]; } uint64x2x3_t; +typedef struct float16x4x3_t +{ + float16x4_t val[3]; +} float16x4x3_t; + +typedef struct float16x8x3_t +{ + float16x8_t val[3]; +} float16x8x3_t; + typedef struct float32x2x3_t { float32x2_t val[3]; @@ -393,6 +413,16 @@ typedef struct uint64x2x4_t uint64x2_t val[4]; } uint64x2x4_t; +typedef struct float16x4x4_t +{ + float16x4_t val[4]; +} float16x4x4_t; + +typedef struct float16x8x4_t +{ + float16x8_t val[4]; +} float16x8x4_t; + typedef struct float32x2x4_t { float32x2_t val[4]; @@ -2644,6 +2674,12 @@ vcreate_s64 (uint64_t __a) return (int64x1_t) {__a}; } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcreate_f16 (uint64_t __a) +{ + return (float16x4_t) __a; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vcreate_f32 (uint64_t __a) { @@ -4780,6 +4816,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b) return
[PATCH 4/14][ARM] Remaining float16 intrinsics: vld..., vst..., vget_low|high, vcombine
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01440.html ; changes are to add in several missing vst... intrinsics, and fix a missing iterator V_uf_sclr used in vec_extract. These intrinsics are all made from patterns in neon.md, and are all tied together by iterators - I've tried to reduce coupling where I can. Cross-tested check-gcc on arm-none-eabi; Bootstrapped + check-gcc on arm-none-linux-gnueabihf. gcc/ChangeLog: * config/arm/arm-builtins.c (VAR11, VAR12): New. * config/arm/arm_neon_builtins.def (vcombine, vld2_dup, vld3_dup, vld4_dup): Add v4hf variant. (vget_high, vget_low): Add v8hf variant. (vld1, vst1, vst1_lane, vld2, vld2_lane, vst2, vst2_lane, vld3, vld3_lane, vst3, vst3_lane, vld4, vld4_lane, vst4, vst4_lane): Add v4hf and v8hf variants. * config/arm/iterators.md (VD_LANE, VD_RE, VQ2, VQ_HS): New. (VDX): Add V4HF. (V_DOUBLE): Add case for V4HF. (VQX): Add V8HF. (V_HALF): Add case for V8HF. (VDQX): Add V4HF, V8HF. (V_elem, V_two_elem, V_three_elem, V_four_elem, V_cmp_result, V_uf_sclr, V_sz_elem, V_mode_nunits, q): Add cases for V4HF V8HF. * config/arm/neon.md (vec_setmodeinternal, vec_extractmode, neon_vget_lanemode_sext_internal, neon_vget_lanemode_zext_internal, vec_load_lanesoimode, neon_vld2mode, vec_store_lanesoimode, neon_vst2mode, vec_load_lanescimode, neon_vld3mode, neon_vld3qamode, neon_vld3qbmode, vec_store_lanescimode, neon_vst3mode, neon_vst3qamode, neon_vst3qbmode, vec_load_lanesximode, neon_vld4mode, neon_vld4qamode, neon_vld4qbmode, vec_store_lanesximode, neon_vst4mode, neon_vst4qamode, neon_vst4qbmode): Change VQ iterator to VQ2. (neon_vcreate, neon_vreinterpretv8qimode, neon_vreinterpretv4himode, neon_vreinterpretv2simode, neon_vreinterpretv2sfmode, neon_vreinterpretdimode): Change VDX to VD_RE. (neon_vld2_lanemode, neon_vst2_lanemode, neon_vld3_lanemode, neon_vst3_lanemode, neon_vld4_lanemode, neon_vst4_lanemode): Change VD iterator to VD_LANE, and VMQ iterator to VQ_HS. * config/arm/arm_neon.h (float16x4x2_t, float16x8x2_t, float16x4x3_t, float16x8x3_t, float16x4x4_t, float16x8x4_t, vcombine_f16, vget_high_f16, vget_low_f16, vld1_f16, vld1q_f16, vst1_f16, vst1q_f16, vst1_lane_f16, vst1q_lane_f16, vld2_f16, vld2q_f16, vld2_lane_f16, vld2q_lane_f16, vld2_dup_f16, vst2_f16, vst2q_f16, vst2_lane_f16, vst2q_lane_f16, vld3_f16, vld3q_f16, vld3_lane_f16, vld3q_lane_f16, vld3_dup_f16, vst3_f16, vst3q_f16, vst3_lane_f16, vst3q_lane_f16, vld4_f16, vld4q_f16, vld4_lane_f16, vld4q_lane_f16, vld4_dup_f16, vst4_f16, vst4q_f16, vst4_lane_f16, vst4q_lane_f16, ): New. diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 9855b86202e80816bada565786f35dd21fe68c91..4c3f0e888969f16ff6c84e2a1bf65321d73ec8b4 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -228,6 +228,12 @@ typedef struct { #define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \ VAR9 (T, N, A, B, C, D, E, F, G, H, I) \ VAR1 (T, N, J) +#define VAR11(T, N, A, B, C, D, E, F, G, H, I, J, K) \ + VAR10 (T, N, A, B, C, D, E, F, G, H, I, J) \ + VAR1 (T, N, K) +#define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \ + VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \ + VAR1 (T, N, L) /* The NEON builtin data can be found in arm_neon_builtins.def. The mode entries in the following table correspond to the key type of the diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index e308bd4b7a404d869a7d125e8c5ddc4fb16ec884..1892fac503582366105b7dae0ac5e4da16698648 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -162,6 +162,16 @@ typedef struct uint64x2x2_t uint64x2_t val[2]; } uint64x2x2_t; +typedef struct float16x4x2_t +{ + float16x4_t val[2]; +} float16x4x2_t; + +typedef struct float16x8x2_t +{ + float16x8_t val[2]; +} float16x8x2_t; + typedef struct float32x2x2_t { float32x2_t val[2]; @@ -288,6 +298,16 @@ typedef struct uint64x2x3_t uint64x2_t val[3]; } uint64x2x3_t; +typedef struct float16x4x3_t +{ + float16x4_t val[3]; +} float16x4x3_t; + +typedef struct float16x8x3_t +{ + float16x8_t val[3]; +} float16x8x3_t; + typedef struct float32x2x3_t { float32x2_t val[3]; @@ -414,6 +434,16 @@ typedef struct uint64x2x4_t uint64x2_t val[4]; } uint64x2x4_t; +typedef struct float16x4x4_t +{ + float16x4_t val[4]; +} float16x4x4_t; + +typedef struct float16x8x4_t +{ + float16x8_t val[4]; +} float16x8x4_t; + typedef struct float32x2x4_t { float32x2_t val[4]; @@ -6063,6 +6093,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b) return (int64x2_t)__builtin_neon_vcombinedi (__a, __b); } +__extension__ static __inline float16x8_t __attribute__
[PATCH 6/14][AArch64] Add support for float16x{4,8}_t vectors/builtins
This adds some basic intrinsics - vget_lane, vset_lane, vld1_lane, vld1, vst1 - for float16 types, and the necessary support in the builtin generator, basic patterns for moving values around, etc. Other intrinsics will follow in later patches. I've extended the existing testcases in aarch64/, but advsimd-intrinsics tests follow later in the series. gcc/ChangeLog: * config/aarch64/aarch64.c (aarch64_vector_mode_supported_p): Support V4HFmode and V8HFmode. (aarch64_split_simd_move): Add case for V8HFmode. * config/aarch64/aarch64-builtins.c (v4hf_UP, v8hf_UP): Define. (aarch64_simd_builtin_std_type): Handle HFmode. (aarch64_init_simd_builtin_types): Include Float16x4_t and Float16x8_t. * config/aarch64/aarch64-simd.md (movmode, aarch64_get_lanemode, aarch64_ld1VALL:mode, aarch64_st1VALL:mode): Use VALL_F16 iterator. (aarch64_be_ld1mode, aarch64_be_st1mode): Use VALLDI_F16 iterator. * config/aarch64/aarch64-simd-builtin-types.def: Add Float16x4_t, Float16x8_t. * config/aarch64/aarch64-simd-builtins.def (ld1, st1): Use VALL_F16. * config/aarch64/arm_neon.h (float16x4_t, float16x8_t, float16_t): New typedefs. (vget_lane_f16, vgetq_lane_f16, vset_lane_f16, vsetq_lane_f16, vld1_f16, vld1q_f16, vst1_f16, vst1q_f16, vst1_lane_f16, vst1q_lane_f16): New. * config/aarch64/iterators.md (VD, VQ, VQ_NO2E): Add vectors of HFmode. (VALLDI_F16, VALL_F16): New. (Vmtype, VEL, VCONQ, VHALF, VRL3, VRL4, V_TWO_ELEM, V_THREE_ELEM, V_FOUR_ELEM, q): Add cases for V4HF and V8HF. (VDBL, VRL2): Add V4HF case. gcc/testsuite/ChangeLog: * g++.dg/abi/mangle-neon-aarch64.C: Add cases for float16x4_t and float16x8_t. * gcc.target/aarch64/vset_lane_1.c: Likewise. * gcc.target/aarch64/vld1-vst1_1.c: Likewise, also missing float32x4_t. * gcc.target/aarch64/vld1_lane.c: Remove unused constants; add cases for float16x4_t and float16x8_t. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index d554735ab480f9e9b1f49fd3510555197bb7b5f4..6544643a3cd1dd46b440eca0e1a05bad4c499262 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -63,6 +63,7 @@ #define v8qi_UP V8QImode #define v4hi_UP V4HImode +#define v4hf_UP V4HFmode #define v2si_UP V2SImode #define v2sf_UP V2SFmode #define v1df_UP V1DFmode @@ -70,6 +71,7 @@ #define df_UPDFmode #define v16qi_UP V16QImode #define v8hi_UP V8HImode +#define v8hf_UP V8HFmode #define v4si_UP V4SImode #define v4sf_UP V4SFmode #define v2di_UP V2DImode @@ -522,6 +524,8 @@ aarch64_simd_builtin_std_type (enum machine_mode mode, return aarch64_simd_intCI_type_node; case XImode: return aarch64_simd_intXI_type_node; +case HFmode: + return aarch64_fp16_type_node; case SFmode: return float_type_node; case DFmode: @@ -606,6 +610,8 @@ aarch64_init_simd_builtin_types (void) aarch64_simd_types[Poly64x2_t].eltype = aarch64_simd_types[Poly64_t].itype; /* Continue with standard types. */ + aarch64_simd_types[Float16x4_t].eltype = aarch64_fp16_type_node; + aarch64_simd_types[Float16x8_t].eltype = aarch64_fp16_type_node; aarch64_simd_types[Float32x2_t].eltype = float_type_node; aarch64_simd_types[Float32x4_t].eltype = float_type_node; aarch64_simd_types[Float64x1_t].eltype = double_type_node; diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def b/gcc/config/aarch64/aarch64-simd-builtin-types.def index b85a23109efae6301931f12c6b665015af570fb7..ef8f20574c52170facfe67fc9fa433dc64926bca 100644 --- a/gcc/config/aarch64/aarch64-simd-builtin-types.def +++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def @@ -44,6 +44,8 @@ ENTRY (Poly16x8_t, V8HI, poly, 12) ENTRY (Poly64x1_t, DI, poly, 12) ENTRY (Poly64x2_t, V2DI, poly, 12) + ENTRY (Float16x4_t, V4HF, none, 13) + ENTRY (Float16x8_t, V8HF, none, 13) ENTRY (Float32x2_t, V2SF, none, 13) ENTRY (Float32x4_t, V4SF, none, 13) ENTRY (Float64x1_t, V1DF, none, 13) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index af39d9c2b42eea0bc45ea5bc3d4fc576849cfd65..07f8ba961c1546ccac7ecaa5756c631afeae4b3e 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -345,11 +345,11 @@ VAR1 (UNOP, float_extend_lo_, 0, v2df) VAR1 (UNOP, float_truncate_lo_, 0, v2sf) - /* Implemented by aarch64_ld1VALL:mode. */ - BUILTIN_VALL (LOAD1, ld1, 0) + /* Implemented by aarch64_ld1VALL_F16:mode. */ + BUILTIN_VALL_F16 (LOAD1, ld1, 0) - /* Implemented by aarch64_st1VALL:mode. */ - BUILTIN_VALL (STORE1, st1, 0) + /* Implemented by aarch64_st1VALL_F16:mode. */ + BUILTIN_VALL_F16 (STORE1, st1, 0) /* Implemented by fmamode4. */ BUILTIN_VDQF (TERNOP, fma,
[PATCH 9/14][AArch64] vld1(q?)_dup, missing vreinterpretq intrinsics
gcc/ChangeLog: * config/aarch64/arm_neon.h (vreinterpretq_p8_f16, vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16, vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16, vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16, vreinterpretq_u32_f16, vld1_dup_f16, vld1q_dup_f16): New. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 88723231e5c32faf3bc68eccdf4e3a2b104b57b9..6d98b2e08221c3e25c4f66e6058b4f228d90a094 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -2993,6 +2993,12 @@ vreinterpretq_p8_s64 (int64x2_t __a) } __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_f16 (float16x8_t __a) +{ + return (poly8x16_t) __a; +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) vreinterpretq_p8_f32 (float32x4_t __a) { return (poly8x16_t) __a; @@ -3131,6 +3137,12 @@ vreinterpretq_p16_s64 (int64x2_t __a) } __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_f16 (float16x8_t __a) +{ + return (poly16x8_t) __a; +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) vreinterpretq_p16_f32 (float32x4_t __a) { return (poly16x8_t) __a; @@ -3383,6 +3395,12 @@ vreinterpret_f32_p16 (poly16x4_t __a) } __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_f16 (float16x8_t __a) +{ + return (float32x4_t) __a; +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vreinterpretq_f32_f64 (float64x2_t __a) { return (float32x4_t) __a; @@ -3521,6 +3539,12 @@ vreinterpret_f64_u64 (uint64x1_t __a) } __extension__ static __inline float64x2_t __attribute__((__always_inline__)) +vreinterpretq_f64_f16 (float16x8_t __a) +{ + return (float64x2_t) __a; +} + +__extension__ static __inline float64x2_t __attribute__((__always_inline__)) vreinterpretq_f64_f32 (float32x4_t __a) { return (float64x2_t) __a; @@ -3683,6 +3707,12 @@ vreinterpretq_s64_s32 (int32x4_t __a) } __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_f16 (float16x8_t __a) +{ + return (int64x2_t) __a; +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vreinterpretq_s64_f32 (float32x4_t __a) { return (int64x2_t) __a; @@ -3965,6 +3995,12 @@ vreinterpretq_s8_s64 (int64x2_t __a) } __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_f16 (float16x8_t __a) +{ + return (int8x16_t) __a; +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) vreinterpretq_s8_f32 (float32x4_t __a) { return (int8x16_t) __a; @@ -4103,6 +4139,12 @@ vreinterpretq_s16_s64 (int64x2_t __a) } __extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_f16 (float16x8_t __a) +{ + return (int16x8_t) __a; +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) vreinterpretq_s16_f32 (float32x4_t __a) { return (int16x8_t) __a; @@ -4241,6 +4283,12 @@ vreinterpretq_s32_s64 (int64x2_t __a) } __extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_f16 (float16x8_t __a) +{ + return (int32x4_t) __a; +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) vreinterpretq_s32_f32 (float32x4_t __a) { return (int32x4_t) __a; @@ -4385,6 +4433,12 @@ vreinterpretq_u8_s64 (int64x2_t __a) } __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_f16 (float16x8_t __a) +{ + return (uint8x16_t) __a; +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) vreinterpretq_u8_f32 (float32x4_t __a) { return (uint8x16_t) __a; @@ -4523,6 +4577,12 @@ vreinterpretq_u16_s64 (int64x2_t __a) } __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_f16 (float16x8_t __a) +{ + return (uint16x8_t) __a; +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) vreinterpretq_u16_f32 (float32x4_t __a) { return (uint16x8_t) __a; @@ -4661,6 +4721,12 @@ vreinterpretq_u32_s64 (int64x2_t __a) } __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_f16 (float16x8_t __a) +{ + return (uint32x4_t) __a; +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) vreinterpretq_u32_f32 (float32x4_t __a) { return (uint32x4_t) __a; @@ -15107,6 +15173,13 @@ vld1q_u64 (const uint64_t *a) /* vld1_dup */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vld1_dup_f16 (const float16_t* __a) +{ + float16_t __f = *__a; + return (float16x4_t) { __f, __f, __f, __f }; +} + __extension__ static __inline float32x2_t
[PATCH 14/14][ARM/AArch64 testsuite] Test float16_t vcvt_* intrinsics
This adds a test of vcvt_f32_f16 and vcvt_f16_f32, also vcvt_high_f32_f16 and vcvt_high_f16_f32. On ARM, we pass additional option -mfpu=neon-fp16 to the compiler (possible following patch 2/3). The compiler is already receiving an option such as -mfpu=neon or -mfpu=crypto-neon-fp-armv8, but passing neon-fp16 as well as either of those appears to do no harm, and turns on the superset of all -mfpu options, as desired. On AArch64, we additionally test vcvt_high_f32_f16 and vcvt_high_f16_f32; these are not tested on ARM as the relevant intrinsics do not exist in 32-bit state. Passing on aarch64_be-none-elf, aarch64-none-elf, arm-none-linux-gnueabi, aarch64-none-linux-gnu. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c new file mode 100644 index ..a346b3d72e13d5b2028de5ae7b88f910dcb3f862 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c @@ -0,0 +1,96 @@ +/* { dg-additional-options -mfpu=neon-fp16 { target { arm*-*-* } } } */ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h +#include math.h + +/* Expected results for vcvt. */ +VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x4180, 0x4170, + 0x4160, 0x4150 }; +VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 }; + +/* Expected results for vcvt_high_f32_f16. */ +VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc140, 0xc130, + 0xc120, 0xc110 }; +/* Expected results for vcvt_high_f16_f32. */ +VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000, + 0xcc00, 0xcb80, 0xcb00, 0xca80 }; + +void +exec_vcvt (void) +{ +#define TEST_MSG vcvt_f32_f16 + { +VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 }; + +DECL_VARIABLE (vector_src, float, 16, 4); + +VLOAD (vector_src, buffer_src, , float, f, 16, 4); +DECL_VARIABLE (vector_res, float, 32, 4) = + vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4)); +vst1q_f32 (VECT_VAR (result, float, 32, 4), + VECT_VAR (vector_res, float, 32, 4)); + +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, ); + } +#undef TEST_MSG + + clean_results (); + +#define TEST_MSG vcvt_f16_f32 + { +VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 }; +DECL_VARIABLE (vector_src, float, 32, 4); + +VLOAD (vector_src, buffer_src, q, float, f, 32, 4); +DECL_VARIABLE (vector_res, float, 16, 4) = + vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4)); +vst1_f16 (VECT_VAR (result, float, 16, 4), + VECT_VAR (vector_res, float, 16 ,4)); + +CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, ); + } +#undef TEST_MSG + +#ifdef __ARM_64BIT_STATE + clean_results (); + +#define TEST_MSG vcvt_high_f32_f16 + { +DECL_VARIABLE (vector_src, float, 16, 8); +VLOAD (vector_src, buffer, q, float, f, 16, 8); +DECL_VARIABLE (vector_res, float, 32, 4); +VECT_VAR (vector_res, float, 32, 4) = + vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8)); +vst1q_f32 (VECT_VAR (result, float, 32, 4), + VECT_VAR (vector_res, float, 32, 4)); +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, ); + } +#undef TEST_MSG + clean_results (); + +#define TEST_MSG vcvt_high_f16_f32 + { +DECL_VARIABLE (vector_low, float, 16, 4); +VDUP (vector_low, , float, f, 16, 4, 2.0); + +DECL_VARIABLE (vector_src, float, 32, 4); +VLOAD (vector_src, buffer, q, float, f, 32, 4); + +DECL_VARIABLE (vector_res, float, 16, 8) = + vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4), + VECT_VAR (vector_src, float, 32, 4)); +vst1q_f16 (VECT_VAR (result, float, 16, 8), + VECT_VAR (vector_res, float, 16, 8)); + +CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, ); + } +#endif +} + +int +main (void) +{ + exec_vcvt (); + return 0; +}
Re: [PATCH 00/12] Reduce conditional compilation
On Tue, 2015-04-21 at 11:29 -0400, Trevor Saunders wrote: On Tue, Apr 21, 2015 at 07:57:19AM -0600, Jeff Law wrote: On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, This is a first round of patches to reduce the amount of code with in #if / #ifdef. This makes it incrementally easier to not break configs other than the one being built, and moves things slightly closer to using target hooks for everything. each commit bootstrapped and regtested on x86_64-linux-gnu without regression, and whole patch set run through config-list.mk without issue, ok? So I think after looking at this patchset, any changes of a similar nature you want to make should be considered pre-approved. Just post them for archival purposes, but no need for you to wait for review as long as they have the same purpose and overall structure as was seen in these patches. thanks! Its also always nice to have someone double check your logic :-) Thanks for working on this! Conditional compilation was a major PITA when doing the rtx-rtx_insn * work last year, so I'm very pleased to see these cleanups go in. Dave
Re: [PATCH] Makefile.tpl: Remove surplus whitespace
On 9 April 2015 at 13:20, Bernhard Reutner-Fischer rep.dot@gmail.com wrote: essentially s/[[:space:]]*[[:space:]];/;/g ChangeLog (attn: to src, IIRC no write-access, ask someone to commit) Ok for trunk now? Jeff OKed this, applied as r222334. Please holler if i broke something.. thanks, 2015-04-01 Bernhard Reutner-Fischer al...@gcc.gnu.org * Makefile.tpl: Remove surplus whitespace throughout. * Makefile.in: Regenerate. Signed-off-by: Bernhard Reutner-Fischer rep.dot@gmail.com --- Makefile.in | 4858 +- Makefile.tpl | 110 +- 2 files changed, 2484 insertions(+), 2484 deletions(-) diff --git a/Makefile.tpl b/Makefile.tpl index 1ea1954..9972909 100644 --- a/Makefile.tpl +++ b/Makefile.tpl @@ -225,8 +225,8 @@ HOST_EXPORTS = \ GMPINC=$(HOST_GMPINC); export GMPINC; \ ISLLIBS=$(HOST_ISLLIBS); export ISLLIBS; \ ISLINC=$(HOST_ISLINC); export ISLINC; \ - LIBELFLIBS=$(HOST_LIBELFLIBS) ; export LIBELFLIBS; \ - LIBELFINC=$(HOST_LIBELFINC) ; export LIBELFINC; \ + LIBELFLIBS=$(HOST_LIBELFLIBS); export LIBELFLIBS; \ + LIBELFINC=$(HOST_LIBELFINC); export LIBELFINC; \ @if gcc-bootstrap $(RPATH_ENVVAR)=`echo $(TARGET_LIB_PATH)$$$(RPATH_ENVVAR) | sed 's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR); \ @endif gcc-bootstrap @@ -785,9 +785,9 @@ do-info: maybe-all-texinfo install-info: do-install-info dir.info s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \ - if [ -f dir.info ] ; then \ - $(INSTALL_DATA) dir.info $(DESTDIR)$(infodir)/dir.info ; \ - else true ; fi + if [ -f dir.info ]; then \ + $(INSTALL_DATA) dir.info $(DESTDIR)$(infodir)/dir.info; \ + else true; fi install-pdf: do-install-pdf @@ -913,14 +913,14 @@ uninstall: .PHONY: install.all install.all: install-no-fixedincludes - @if [ -f ./gcc/Makefile ] ; then \ - r=`${PWD_COMMAND}` ; export r ; \ + @if [ -f ./gcc/Makefile ]; then \ + r=`${PWD_COMMAND}`; export r; \ s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \ $(HOST_EXPORTS) \ (cd ./gcc \ - $(MAKE) $(FLAGS_TO_PASS) install-headers) ; \ + $(MAKE) $(FLAGS_TO_PASS) install-headers); \ else \ - true ; \ + true; \ fi # install-no-fixedincludes is used to allow the elaboration of binary packages @@ -960,15 +960,15 @@ installdirs: mkinstalldirs $(SHELL) $(srcdir)/mkinstalldirs $(MAKEDIRS) dir.info: do-install-info - if [ -f $(srcdir)/texinfo/gen-info-dir ] ; then \ - $(srcdir)/texinfo/gen-info-dir $(DESTDIR)$(infodir) $(srcdir)/texinfo/dir.info-template dir.info.new ; \ - mv -f dir.info.new dir.info ; \ - else true ; \ + if [ -f $(srcdir)/texinfo/gen-info-dir ]; then \ + $(srcdir)/texinfo/gen-info-dir $(DESTDIR)$(infodir) $(srcdir)/texinfo/dir.info-template dir.info.new; \ + mv -f dir.info.new dir.info; \ + else true; \ fi dist: @echo Building a full distribution of this tree isn't done - @echo via 'make dist'. Check out the etc/ subdirectory + @echo via 'make dist'. Check out the etc/ subdirectory etags tags: TAGS @@ -998,8 +998,8 @@ configure-[+prefix+][+module+]: [+ IF bootstrap +][+ ELSE +] s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \ [+ IF check_multilibs +]echo Checking multilib configuration for [+module+]...; \ - $(SHELL) $(srcdir)/mkinstalldirs [+subdir+]/[+module+] ; \ - $(CC_FOR_TARGET) --print-multi-lib [+subdir+]/[+module+]/multilib.tmp 2 /dev/null ; \ + $(SHELL) $(srcdir)/mkinstalldirs [+subdir+]/[+module+]; \ + $(CC_FOR_TARGET) --print-multi-lib [+subdir+]/[+module+]/multilib.tmp 2 /dev/null; \ if test -r [+subdir+]/[+module+]/multilib.out; then \ if cmp -s [+subdir+]/[+module+]/multilib.tmp [+subdir+]/[+module+]/multilib.out; then \ rm -f [+subdir+]/[+module+]/multilib.tmp; \ @@ -1011,7 +1011,7 @@ configure-[+prefix+][+module+]: [+ IF bootstrap +][+ ELSE +] mv [+subdir+]/[+module+]/multilib.tmp [+subdir+]/[+module+]/multilib.out; \ fi; \ [+ ENDIF check_multilibs +]test ! -f [+subdir+]/[+module+]/Makefile || exit 0; \ - $(SHELL) $(srcdir)/mkinstalldirs [+subdir+]/[+module+] ; \ + $(SHELL) $(srcdir)/mkinstalldirs [+subdir+]/[+module+]; \ [+exports+] [+extra_exports+] \ echo Configuring in [+subdir+]/[+module+]; \ cd [+subdir+]/[+module+] || exit 1; \ @@ -1044,7 +1044,7 @@ configure-stage[+id+]-[+prefix+][+module+]: TFLAGS=$(STAGE[+id+]_TFLAGS); \ [+ IF check_multilibs +]echo Checking multilib configuration for [+module+]...; \ - $(CC_FOR_TARGET)
[C/C++ PATCH] Implement -Wshift-negative-value (PR c/65179)
Currently, we warn if the right operand of a shift expression is negative, or greater than or equal to the length in bits of the promoted left operand. But we don't warn when we see a left shift of a negative value. That is undefined behavior since C99 and I believe C++11, so this patch implements a new warning, -Wshift-negative-value, only active in C99/C++11. A bunch of tests needed tweaking; I find it scary that some vect tests are invoking UB. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-04-22 Marek Polacek pola...@redhat.com PR c/65179 * c-common.c (c_fully_fold_internal): Warn when left shifting a negative value. * c.opt (Wshift-negative-value): New option. * c-typeck.c (build_binary_op): Warn when left shifting a negative value. * typeck.c (cp_build_binary_op): Warn when left shifting a negative value. * doc/invoke.texi: Document -Wshift-negative-value. * c-c++-common/Wshift-negative-value-1.c: New test. * c-c++-common/Wshift-negative-value-2.c: New test. * c-c++-common/torture/vector-shift2.c: Use -Wno-shift-negative-value. * gcc.dg/torture/vector-shift2.c: Likewise. * gcc.c-torture/execute/pr40386.c: Likewise. * gcc.dg/tree-ssa/vrp65.c: Likewise. * gcc.dg/tree-ssa/vrp66.c: Likewise. * gcc.dg/vect/vect-sdivmod-1.c: Likewise. * gcc.dg/vect/vect-shift-2-big-array.c: Likewise. * gcc.dg/vect/vect-shift-2.c: Likewise. * gcc.target/i386/pr31167.c: Likewise. * g++.dg/init/array11.C: Add dg-warning. * gcc.dg/c99-const-expr-7.c: Add dg-warning and dg-error. diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c index 7fe7fa6..e944f11 100644 --- gcc/c-family/c-common.c +++ gcc/c-family/c-common.c @@ -1361,6 +1361,15 @@ c_fully_fold_internal (tree expr, bool in_init, bool *maybe_const_operands, !TREE_OVERFLOW_P (op0) !TREE_OVERFLOW_P (op1)) overflow_warning (EXPR_LOCATION (expr), ret); + if (code == LSHIFT_EXPR + TREE_CODE (orig_op0) != INTEGER_CST + TREE_CODE (TREE_TYPE (orig_op0)) == INTEGER_TYPE + TREE_CODE (op0) == INTEGER_CST + c_inhibit_evaluation_warnings == 0 + tree_int_cst_sgn (op0) 0 + flag_isoc99) + warning_at (loc, OPT_Wshift_negative_value, + left shift of negative value); if ((code == LSHIFT_EXPR || code == RSHIFT_EXPR) TREE_CODE (orig_op1) != INTEGER_CST TREE_CODE (op1) == INTEGER_CST diff --git gcc/c-family/c.opt gcc/c-family/c.opt index 983f4a8..47e0913 100644 --- gcc/c-family/c.opt +++ gcc/c-family/c.opt @@ -781,6 +781,10 @@ Wshift-count-overflow C ObjC C++ ObjC++ Var(warn_shift_count_overflow) Init(1) Warning Warn if shift count = width of type +Wshift-negative-value +C ObjC C++ ObjC++ Var(warn_shift_negative_value) Init(1) Warning +Warn if left shifting a negative value + Wsign-compare C ObjC C++ ObjC++ Var(warn_sign_compare) Warning LangEnabledBy(C++ ObjC++,Wall) Warn about signed-unsigned comparisons diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c index ebe4c73..17d2cac 100644 --- gcc/c/c-typeck.c +++ gcc/c/c-typeck.c @@ -10701,6 +10701,15 @@ build_binary_op (location_t location, enum tree_code code, code1 == INTEGER_TYPE) { doing_shift = true; + if (TREE_CODE (op0) == INTEGER_CST + tree_int_cst_sgn (op0) 0 + flag_isoc99) + { + int_const = false; + if (c_inhibit_evaluation_warnings == 0) + warning_at (location, OPT_Wshift_negative_value, + left shift of negative value); + } if (TREE_CODE (op1) == INTEGER_CST) { if (tree_int_cst_sgn (op1) 0) diff --git gcc/cp/typeck.c gcc/cp/typeck.c index 250b5d6..d5d36bf 100644 --- gcc/cp/typeck.c +++ gcc/cp/typeck.c @@ -4327,11 +4327,21 @@ cp_build_binary_op (location_t location, } else if (code0 == INTEGER_TYPE code1 == INTEGER_TYPE) { + tree const_op0 = fold_non_dependent_expr (op0); + if (TREE_CODE (const_op0) != INTEGER_CST) + const_op0 = op0; tree const_op1 = fold_non_dependent_expr (op1); if (TREE_CODE (const_op1) != INTEGER_CST) const_op1 = op1; result_type = type0; doing_shift = true; + if (TREE_CODE (const_op0) == INTEGER_CST + tree_int_cst_sgn (const_op0) 0 + (complain tf_warning) + c_inhibit_evaluation_warnings == 0 + cxx_dialect = cxx11) + warning (OPT_Wshift_negative_value, +left shift of negative value); if (TREE_CODE (const_op1) == INTEGER_CST) { if (tree_int_cst_lt (const_op1, integer_zero_node)) diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index a939ff7..2e14921 100644 ---
Re: [PATCH 00/12] Reduce conditional compilation
On 04/22/2015 12:13 PM, David Malcolm wrote: Conditional compilation was a major PITA when doing the rtx-rtx_insn * work last year, so I'm very pleased to see these cleanups go in. Yup. It also got in Andrew's way last year and we regularly see cases where small patches which work fine on the mainstream architectures fail to build in the lesser used architectures (particularly cc0 targets). It's a whole class of problems I want to see slowly disappear. glibc went through this process in their codebase for similar reasons, but they had more to lose when they got it wrong -- IIRC they had a case where exported ABI would differ as a result of conditionally compiled code. Not good. Jeff
[PATCH] Tidy up locking for libgomp OpenACC entry points
Hi, This patch is an attempt to fix some potential race conditions with accesses to shared data structures from multiple concurrent threads in libgomp's OpenACC entry points. The main change is to move locking out of lookup_host and lookup_dev in oacc-mem.c and into their callers (which can then hold the locks for the whole operation that they are performing). Also missing locking has been added for gomp_acc_insert_pointer. Tests look OK (with offloading to NVidia PTX). OK? (For the gomp4 branch, maybe, if trunk's not suitable at the moment?) Thanks, Julian ChangeLog libgomp/ * oacc-mem.c (lookup_host): Remove locking from function. Note locking requirement for caller in function comment. (lookup_dev): Likewise. (acc_free, acc_deviceptr, acc_hostptr, acc_is_present) (acc_map_data, acc_unmap_data, present_create_copy, delete_copyout) (update_dev_host, gomp_acc_insert_pointer, gomp_acc_remove_pointer): Add locking. commit 983e08e46be24380a52095851cd9c6eb481eb47c Author: Julian Brown jul...@codesourcery.com Date: Tue Apr 21 12:42:17 2015 -0700 More locking in oacc-mem.c diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c index 89ef5fc..d53af4b 100644 --- a/libgomp/oacc-mem.c +++ b/libgomp/oacc-mem.c @@ -35,7 +35,8 @@ #include stdint.h #include assert.h -/* Return block containing [H-S), or NULL if not contained. */ +/* Return block containing [H-S), or NULL if not contained. The device lock + for DEV must be locked on entry, and remains locked on exit. */ static splay_tree_key lookup_host (struct gomp_device_descr *dev, void *h, size_t s) @@ -46,9 +47,7 @@ lookup_host (struct gomp_device_descr *dev, void *h, size_t s) node.host_start = (uintptr_t) h; node.host_end = (uintptr_t) h + s; - gomp_mutex_lock (dev-lock); key = splay_tree_lookup (dev-mem_map, node); - gomp_mutex_unlock (dev-lock); return key; } @@ -56,7 +55,8 @@ lookup_host (struct gomp_device_descr *dev, void *h, size_t s) /* Return block containing [D-S), or NULL if not contained. The list isn't ordered by device address, so we have to iterate over the whole array. This is not expected to be a common - operation. */ + operation. The device lock associated with TGT must be locked on entry, and + remains locked on exit. */ static splay_tree_key lookup_dev (struct target_mem_desc *tgt, void *d, size_t s) @@ -67,16 +67,12 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s) if (!tgt) return NULL; - gomp_mutex_lock (tgt-device_descr-lock); - for (t = tgt; t != NULL; t = t-prev) { if (t-tgt_start = (uintptr_t) d t-tgt_end = (uintptr_t) d + s) break; } - gomp_mutex_unlock (tgt-device_descr-lock); - if (!t) return NULL; @@ -120,25 +116,32 @@ acc_free (void *d) { splay_tree_key k; struct goacc_thread *thr = goacc_thread (); + struct gomp_device_descr *acc_dev = thr-dev; if (!d) return; assert (thr thr-dev); + gomp_mutex_lock (acc_dev-lock); + /* We don't have to call lazy open here, as the ptr value must have been returned by acc_malloc. It's not permitted to pass NULL in (unless you got that null from acc_malloc). */ - if ((k = lookup_dev (thr-dev-openacc.data_environ, d, 1))) - { - void *offset; + if ((k = lookup_dev (acc_dev-openacc.data_environ, d, 1))) +{ + void *offset; + + offset = d - k-tgt-tgt_start + k-tgt_offset; - offset = d - k-tgt-tgt_start + k-tgt_offset; + gomp_mutex_unlock (acc_dev-lock); - acc_unmap_data ((void *)(k-host_start + offset)); - } + acc_unmap_data ((void *)(k-host_start + offset)); +} + else +gomp_mutex_unlock (acc_dev-lock); - thr-dev-free_func (thr-dev-target_id, d); + acc_dev-free_func (acc_dev-target_id, d); } void @@ -178,16 +181,24 @@ acc_deviceptr (void *h) goacc_lazy_initialize (); struct goacc_thread *thr = goacc_thread (); + struct gomp_device_descr *dev = thr-dev; + + gomp_mutex_lock (dev-lock); - n = lookup_host (thr-dev, h, 1); + n = lookup_host (dev, h, 1); if (!n) -return NULL; +{ + gomp_mutex_unlock (dev-lock); + return NULL; +} offset = h - n-host_start; d = n-tgt-tgt_start + n-tgt_offset + offset; + gomp_mutex_unlock (dev-lock); + return d; } @@ -204,16 +215,24 @@ acc_hostptr (void *d) goacc_lazy_initialize (); struct goacc_thread *thr = goacc_thread (); + struct gomp_device_descr *acc_dev = thr-dev; - n = lookup_dev (thr-dev-openacc.data_environ, d, 1); + gomp_mutex_lock (acc_dev-lock); + + n = lookup_dev (acc_dev-openacc.data_environ, d, 1); if (!n) -return NULL; +{ + gomp_mutex_unlock (acc_dev-lock); + return NULL; +} offset = d - n-tgt-tgt_start + n-tgt_offset; h = n-host_start + offset; + gomp_mutex_unlock (acc_dev-lock); + return h; } @@ -232,6 +251,8 @@ acc_is_present (void *h, size_t s) struct goacc_thread *thr
[debug-early] Adjust g++.dg/debug/dwarf2/auto1.C testcase
This patch adjusts the testcase to work with the now slightly different ordering of DIEs in the branch. Brought to you by the letter N for nightmare. Committed to branch. Aldy commit 7996af2f984f42a9694c466ee05d5067696503cc Author: Aldy Hernandez al...@redhat.com Date: Wed Apr 22 12:20:10 2015 -0700 Adjust testcase for debug-early's different ordering. diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/auto1.C b/gcc/testsuite/g++.dg/debug/dwarf2/auto1.C index e38334b..c04e923 100644 --- a/gcc/testsuite/g++.dg/debug/dwarf2/auto1.C +++ b/gcc/testsuite/g++.dg/debug/dwarf2/auto1.C @@ -10,14 +10,14 @@ // .uleb128 0x5# (DIE (0x4c) DW_TAG_unspecified_type) // .long .LASF6 # DW_AT_name: auto //... +// .uleb128 0x9# (DIE (0x87) DW_TAG_base_type) +// .ascii int\0 # DW_AT_name +//... // .uleb128 0x7# (DIE (0x57) DW_TAG_subprogram) // .long 0x33# DW_AT_specification // .long 0x87# DW_AT_type -//... -// .uleb128 0x9# (DIE (0x87) DW_TAG_base_type) -// .ascii int\0 # DW_AT_name -// { dg-final { scan-assembler a1.*(0x\[0-9a-f]+)\[^\n\r]*DW_AT_type.*\\1. DW_TAG_unspecified_type.*DW_AT_specification\[\n\r]{1,2}\[^\n\r]*(0x\[0-9a-f]+)\[^\n\r]*DW_AT_type.*\\2. DW_TAG_base_type } } +// { dg-final { scan-assembler a1.*(0x\[0-9a-f]+)\[^\n\r]*DW_AT_type.*\\1. DW_TAG_unspecified_type.*(0x\[0-9a-f]+). DW_TAG_base_type.*DW_AT_specification\[\n\r]{1,2}\[^\n\r]*\\2\[^\n\r]*DW_AT_type } } struct A {
Hide _S_n_primes from user code
Hello Here is a rather trivial patch, just code cleanup. Since we export _Prime_rehash_policy we do not need to expose the _S_n_primes anymore. * include/bits/hashtable_policy.h (_Prime_rehash_policy::_S_n_primes): Delete. * src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt): Remove usage of latter and compute size of the prime numbers array locally. Tested under Linux x86_64. Ok to commit ? François
Re: patch ping
On April 13, 2015 3:12:48 PM GMT+02:00, Jeff Law l...@redhat.com wrote: On 04/11/2015 04:27 PM, Bernhard Reutner-Fischer wrote: Hi, I'd like to ask an RM or global reviewer to kindly consider the following patches preventing one or the other target in config-list.mk to build: [PATCH, bfin] handle BFIN_CPU_UNKNOWN in TARGET_CPU_CPP_BUILTINS https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00034.html OK. [PATCH, c6x] handle unk_isa in TARGET_CPU_CPP_BUILTINS https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00089.html OK. Cosmetic patchlets pending but probably for stage 1 now: Remove redundant guard in emit_bss() https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00337.html OK. tree-tailcall: Commentary typo fix, remove fwd declaration https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00342.html OK. s/ ;/;/g Makefile.tpl https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00380.html OK Note there is a policy that requires all patches to be bootstrapped and regression tested. These are trivial enough that I'll approve them as-is. However, in the future, please bootstrap and regression test changes whenever possible. I'm aware of this policy. I did my best not to break other configs. By now all of the above were pushed including the erroneously committed https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01270.html to fix PR target/47122 vax-*-openbsd* config.gcc typo that Jakub was kind enough to confirm to be obvious on IRC. Thanks for your reviews! Since I touched Makefile.tpl and there was at least one other patch against it in GCC, i would be grateful if someone could synch Makefile.tpl back to binutils-gdb in two days or three so I can sleep well again a couple of days after that :) cheers,
stregnhten ICF WRT inline and operator_new flags
Hi, this patch strenghtens ipa-icf to allow merging non-inline function to inline function. This is safe because inline flag does not affect function itself. It only affects way the function is used, so we need to compare the flag only when comparing references. (inline flag mismatch is one of common reasons to give up on merging). The patch does the same for DECL_IS_OPERATOR_NEW and refactors code so comparsions of all such symol properties (that matter only at referring time) is done by compare_referenced_symbol_properties. I also cleaned up reference walking and added code to match TREE_CODE of pointer types to avoid wrong code with tree-vrp change I did last week. Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza * ipa-icf.c (symbol_compare_collection::symbol_compare_collection): cleanup. (sem_function::get_hash): Do not hash DECL_DISREGARD_INLINE_LIMITS, DECL_DECLARED_INLINE_P and DECL_IS_OPERATOR_NEW. (sem_item::compare_referenced_symbol_properties): New. (sem_item::hash_referenced_symbol_properties): New. (sem_item::compare_cgraph_references): Rename to ... (sem_item::compare_symbol_references): ... this one; use compare_referenced_symbol_properties. (sem_function::equals_wpa): Do not compare DECL_DISREGARD_INLINE_LIMITS, DECL_DECLARED_INLINE_P, DECL_IS_OPERATOR_NEW; compare pointer sizes. (sem_item::update_hash_by_addr_refs): Call hash_referenced_symbol_properties. (sem_item::update_hash_by_local_refs): Cleanup. (sem_function::merge): Do not mix up symbol properties. (sem_variable::equals_wpa): Use compare_symbol_references. * ipa-icf.h (sem_item::compare_referenced_symbol_properties): New. (sem_item::hash_referenced_symbol_properties): New. (sem_item::compare_symbol_references): New. (sem_item::compare_cgraph_references): Remove. Index: ipa-icf.c === --- ipa-icf.c (revision 92) +++ ipa-icf.c (working copy) @@ -145,9 +145,8 @@ symbol_compare_collection::symbol_compar if (is_a varpool_node * (node) DECL_VIRTUAL_P (node-decl)) return; - for (unsigned i = 0; i node-num_references (); i++) + for (unsigned i = 0; node-iterate_reference (i, ref); i++) { - ref = node-iterate_reference (i, ref); if (ref-address_matters_p ()) m_references.safe_push (ref-referred); @@ -342,8 +341,6 @@ sem_function::get_hash (void) if (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)) (cl_optimization_hash (TREE_OPTIMIZATION (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl; - hstate.add_flag (DECL_DISREGARD_INLINE_LIMITS (decl)); - hstate.add_flag (DECL_DECLARED_INLINE_P (decl)); - hstate.add_flag (DECL_IS_OPERATOR_NEW (decl)); hstate.add_flag (DECL_CXX_CONSTRUCTOR_P (decl)); hstate.add_flag (DECL_CXX_DESTRUCTOR_P (decl)); @@ -354,12 +351,117 @@ sem_function::get_hash (void) return hash; } +/* Compare properties of symbols N1 and N2 that does not affect semantics of + symbol itself but affects semantics of its references from USED_BY (which + may be NULL if it is unknown). If comparsion is false, symbols + can still be merged but any symbols referring them can't. + + If ADDRESS is true, do extra checking needed for IPA_REF_ADDR. + + TODO: We can also split attributes to those that determine codegen of + a function body/variable constructor itself and those that are used when + referring to it. */ + +bool +sem_item::compare_referenced_symbol_properties (symtab_node *used_by, + symtab_node *n1, + symtab_node *n2, + bool address) +{ + if (is_a cgraph_node * (n1)) +{ + /* Inline properties matters: we do now want to merge uses of inline +function to uses of normal function because inline hint would be lost. +We however can merge inline function to noinline because the alias +will keep its DECL_DECLARED_INLINE flag. + +Also ignore inline flag when optimizing for size or when function +is known to not be inlinable. + +TODO: the optimize_size checks can also be assumed to be true if +unit has no !optimize_size functions. */ + + if ((!used_by || address || !is_a cgraph_node * (used_by) + || !opt_for_fn (used_by-decl, optimize_size)) + !opt_for_fn (n1-decl, optimize_size) + n1-get_availability () AVAIL_INTERPOSABLE + (!DECL_UNINLINABLE (n1-decl) || !DECL_UNINLINABLE (n2-decl))) + { + if (DECL_DISREGARD_INLINE_LIMITS (n1-decl) + != DECL_DISREGARD_INLINE_LIMITS (n2-decl)) + return return_false_with_msg +(DECL_DISREGARD_INLINE_LIMITS are different); + + if
Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state
On Wed, 2015-04-22 at 20:55 -0500, Segher Boessenkool wrote: Using a hard reg in the RTL like this has a few problems: a) It might hinder register allocation. Maybe it doesn't, not sure; b) It does hinder scheduling; c) It can make things ICE, maybe with register asm. Ahh, I see what you mean now. Yeah, I hadn't thought of that. The alternative is to write a separate define_insn for ttest, one without inputs; the generated assembler can still be the same of course. In that case, I think you're right that this is the best course if action. I'll do that ans retest. Thanks for catching this. Peter
Re: [RFC][PATCH 3/3] Enable zero/sign extension elimination
On 23/04/15 09:48, H.J. Lu wrote: On Wed, Apr 22, 2015 at 3:15 PM, Kugan kugan.vivekanandara...@linaro.org wrote: On 17/01/15 13:11, Kugan wrote: Re-enable zero/sign extension elimination using value range that includes wrapped attribute. Now that stage-1 is open, rebased it and regression tested on x86-64-none-linux-gnu with no new regressions. Is this OK for trunk? Thanks, Kugan gcc/ChangeLog: 2015-04-22 Kugan Vivekanandarajah kug...@linaro.org * calls.c (precompute_arguments): Check promoted_for_signed_and_unsigned_p and set the promoted mode. * expr.c (expand_expr_real_1): Likewise. (promoted_for_signed_and_unsigned_p): New function. * cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED. * expr.h (promoted_for_signed_and_unsigned_p): New definition. Are you planning to submit some testcases to show its improvement? Will it help https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53639 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33349 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44532 Thanks H.J. Lu for the link. I will investigate them and will come up with test cases if my patches help these. Kugan
Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state
On Wed, Apr 22, 2015 at 06:08:26PM -0500, Peter Bergner wrote: + case HTM_BUILTIN_TTEST: /* Alias for: tabortwci. 0,r0,0 */ + op[nopnds++] = GEN_INT (0); + op[nopnds++] = gen_rtx_REG (SImode, 0); + op[nopnds++] = GEN_INT (0); Is that really r0, isn't that (0|rA)? [Too lazy to read the docs myself right now, sorry.] The ISA doc shows: [snip] Thanks for looking it up! I'm still a bit worried about putting a reg in the RTL (while the instruction doesn't actually use one), but perhaps it's harmless. I'm not sure what you mean by the instruction doesn't use one. The hardware instruction does use a register for its second operand (even though its contents are ignored due to TO == 0) and the pattern requires us to pass in a reg rtx, so I'm not sure what you're referring to. I mean the instruction doesn't actually use the value in the register (if it did, you couldn't just pass in a non-fixed hard register in RTL). Using a hard reg in the RTL like this has a few problems: a) It might hinder register allocation. Maybe it doesn't, not sure; b) It does hinder scheduling; c) It can make things ICE, maybe with register asm. I no longer think c) will happen in this case. The alternative is to write a separate define_insn for ttest, one without inputs; the generated assembler can still be the same of course. Cheers, Segher
Re: [PATCH 02/12] remove some ifdef HAVE_cc0
On Tue, Apr 21, 2015 at 04:24:44PM +0100, Trevor Saunders wrote: On Tue, Apr 21, 2015 at 04:14:01PM +0200, Richard Biener wrote: On Tue, Apr 21, 2015 at 3:24 PM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org gcc/ChangeLog: 2015-04-21 Trevor Saunders tbsaunde+...@tbsaunde.org * conditions.h: Define macros even if HAVE_cc0 is undefined. * emit-rtl.c: Define functions even if HAVE_cc0 is undefined. * final.c: Likewise. * jump.c: Likewise. * recog.c: Likewise. * recog.h: Declare functions even when HAVE_cc0 is undefined. * sched-deps.c (sched_analyze_2): Always compile case for cc0. If I've counted right after the git bisect, this patch seems to break the ARM buildi (arm-none-linux-gnueabihf): In file included from insn-output.c:40:0: /gcc-src/gcc/conditions.h:112:0: error: CC_STATUS_INIT redefined [-Werror] #define CC_STATUS_INIT \ ^ In file included from tm.h:35:0, from insn-output.c:7: /gcc-src/gcc/config/arm/arm.h:2159:0: note: this is the location of the previous definition #define CC_STATUS_INIT \ ^ I guess the conditions.h definition wants wrapping in #ifndef - though a quick grep suggests that ARM is the only target defining CC_STATUS_INIT so lets CC the ARM maintainers and see what their preference is... Thanks, James
[Patch][ARM]Correct options for arm test case pr65710
Hi there, This patch is to correct options in arm test case pr65710.c. I reused some existing test case as template to produce this case, but forgot to update the options. Is it OK to trunk? BR, Terry 2015-04-23 Terry Guo terry@arm.com * gcc.target/arm/pr65710.c: Update the options. diff --git a/gcc/testsuite/gcc.target/arm/pr65710.c b/gcc/testsuite/gcc.target/arm/pr65710.c index 139bc64..737b7f3 100644 --- a/gcc/testsuite/gcc.target/arm/pr65710.c +++ b/gcc/testsuite/gcc.target/arm/pr65710.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -march=armv6-m -mthumb -O3 -w -mfloat-abi=soft } */ +/* { dg-options -mthumb -O2 -mfloat-abi=soft } */ struct ST { char *buffer;
Re: [PATCH] fortran/65429 -- don't dereference a null pointer
On Tue, Apr 07, 2015 at 12:59:20PM -0700, Steve Kargl wrote: On Tue, Mar 31, 2015 at 10:17:14AM -0700, Jerry DeLisle wrote: On 03/29/2015 09:25 AM, Steve Kargl wrote: On Sat, Mar 28, 2015 at 01:01:57AM +0100, Dominique Dhumieres wrote: AFAICT your test succeeds without your patch and does not test that the ICE reported by FX is gone (indeed it is with your patch). New patch and testcase. The ChangeLog entries are in the original email. Built and tested on x86_64-*-freebsd. OK, now? OK Steve. I just checked and this is not a regression with respect to 4.6, 4.7. 4.8, or 4.9. As 5.0 is coming soon, I'll for stage 1 to commit. Fixed with the following r222342 trunk r222343 5.1 branch -- Steve
Re: [RFC][PATCH 3/3] Enable zero/sign extension elimination
On Wed, Apr 22, 2015 at 3:15 PM, Kugan kugan.vivekanandara...@linaro.org wrote: On 17/01/15 13:11, Kugan wrote: Re-enable zero/sign extension elimination using value range that includes wrapped attribute. Now that stage-1 is open, rebased it and regression tested on x86-64-none-linux-gnu with no new regressions. Is this OK for trunk? Thanks, Kugan gcc/ChangeLog: 2015-04-22 Kugan Vivekanandarajah kug...@linaro.org * calls.c (precompute_arguments): Check promoted_for_signed_and_unsigned_p and set the promoted mode. * expr.c (expand_expr_real_1): Likewise. (promoted_for_signed_and_unsigned_p): New function. * cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED. * expr.h (promoted_for_signed_and_unsigned_p): New definition. Are you planning to submit some testcases to show its improvement? Will it help https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53639 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33349 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44532 -- H.J.
Re: [PATCH 00/12] Reduce conditional compilation
On 04/22/2015 02:46 PM, Trevor Saunders wrote: yeah, its irritated me on a number of occasions too. I'd really like it if building config-list.mk could be faster, but that's a much bigger project, but at least if everything is target hooks maybe ccache can kick in some. I don't see ccache kicking in here because the target .h files get included most of the time. Maybe one day that won't be the case :-) jeff
Re: [PATCH 00/12] Reduce conditional compilation
On Wed, Apr 22, 2015 at 12:36:58PM -0600, Jeff Law wrote: On 04/22/2015 12:13 PM, David Malcolm wrote: Conditional compilation was a major PITA when doing the rtx-rtx_insn * work last year, so I'm very pleased to see these cleanups go in. Yup. It also got in Andrew's way last year and we regularly see cases where small patches which work fine on the mainstream architectures fail to build in the lesser used architectures (particularly cc0 targets). It's a whole class of problems I want to see slowly disappear. yeah, its irritated me on a number of occasions too. I'd really like it if building config-list.mk could be faster, but that's a much bigger project, but at least if everything is target hooks maybe ccache can kick in some. Trev glibc went through this process in their codebase for similar reasons, but they had more to lose when they got it wrong -- IIRC they had a case where exported ABI would differ as a result of conditionally compiled code. Not good. Jeff
Re: [RFC][PATCH 2/3] Propagate and save value ranges wrapped information
On 19/01/15 22:28, Richard Biener wrote: On Sat, 17 Jan 2015, Kugan wrote: This patch propagate value range wrapps attribute and save this to SSA_NAME. diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 9b7695d..832c35d 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -103,6 +103,9 @@ struct value_range_d tree min; tree max; + /* Set to true if values in this value range could wrapp. */ + bool is_wrapped; + /* Set of SSA names whose value ranges are equivalent to this one. This set is only valid when TYPE is VR_RANGE or VR_ANTI_RANGE. */ bitmap equiv; I can't make sense of this description (wrap with one p as well). I assume you mean that the expression that has this value-range assigned has an operation that may have wrapped? (a value can't wrap) You need to specify how is_wrapped behaves for range union and intersect operations and which operations can wrap. I miss an overall description of these patches as to a) why you need this information and b) why it helps. It's now also too late and thus you have plenty of time until stage1 starts again. Thanks Richard for the comments. Now that stage1 is open, here is the modified patch with the changes requested. Due to wrapping in the value range computation, there was a regression in aplha-linux (https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02458.html) while using value range infromation to remove zero/sign extensions in rtl expansaion. Hence I had to revert the patch that enabled zero/sign extension. Now I am propgating this wrap_p information to SSA_NAME so that we know, when used in PROMOTE_MODE, the values can have unpredictable bits beyon the type width. I have also updated the comments as below: + /* Set to true if the values in this range might have been wrapped + during the operation that computed it. + + This is mainly used in zero/sign-extension elimination where value ranges + computed are for the type of SSA_NAME and computation is ultimately done + in PROMOTE_MODE. Therefore, the value ranges has to be correct upto + PROMOTE_MODE precision. If the operation can WRAP, higher bits in + PROMOTE_MODE can be unpredictable and cannot be used in zero/sign extension + elimination; additional wrap_p attribute is needed to show this. + + For example: + on alpha where PROMOTE_MODE is 64 bit and _344 is a 32 bit unsigned + variable, + _343 = ivtmp.179_52 + 2147483645; [0x8004, 0x80043] + + the value range VRP will compute is: + + _344 = _343 * 2; [0x8, 0x86] + _345 = (integer(kind=4)) _344;[0x8, 0x86] + + In PROMOTE_MODE, there will be garbage above the type width. In places + like this, attribute wrap_p will be true. + + wrap_p in range union operation will be true if either of the value range + has wrap_p set. In intersect operation, true when both the value ranges + have wrap_p set. */ + bool wrap_p; + Thanks, Kugan gcc/testsuite/ChangeLog: 2015-04-22 Kugan Vivekanandarajah kug...@linaro.org * gcc.dg/tree-ssa/vrp92.c: Update scanned pattern. gcc/ChangeLog: 2015-04-22 Kugan Vivekanandarajah kug...@linaro.org * builtins.c (determine_block_size): Use new definition of get_range_info. * gimple-pretty-print.c (dump_ssaname_info): Dump new wrap_p info. * internal-fn.c (get_range_pos_neg): Use new definition of get_range_info. (get_min_precision): Likewise. * tree-ssa-copy.c (fini_copy_prop): Use new definition of duplicate_ssa_range_info. * tree-ssa-pre.c (insert_into_preds_of_block): Likewise. (move_computations_dom_walker::before_dom_children): Likewise. * tree-ssa-phiopt.c (value_replacement): Likewise. * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children): Likewise. * tree-ssa-loop-niter.c (determine_value_range): Use new definition. (record_nonwrapping_iv): Likewise. * tree-ssanames.c (set_range_info): Save wrap_p information. (get_range_info): Retrive wrap_p information. (set_nonzero_bits): Set wrap_p info. (duplicate_ssa_name_range_info): Likewise. (duplicate_ssa_name_fn): Likewise. * tree-ssanames.h: (set_range_info): Update definition. (get_range_info): Likewise. * tree-vect-patterns.c (vect_recog_divmod_pattern): Use new declaration get_range_info. * tree-vrp.c (struct value_range_d): Add wrap_p field. (set_value_range): Calculate and add wrap_p field. (set_and_canonicalize_value_range): Likewise. (copy_value_range): Likewise. (set_value_range_to_value): Likewise. (set_value_range_to_nonnegative): Likewise. (set_value_range_to_nonnull): Likewise. (set_value_range_to_truthvalue): Likewise. (abs_extent_range): Likewise. (get_value_range): Return wrap_p info.
[debug-early] Only output DW_TAG_GNU_formal_parameter_pack DIEs once
The attached patch fixes gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C. The problem is that DW_TAG_GNU_formal_parameter_pack DIEs are generated multiple times (once for early dwarf and once for late dwarf). Fixed by only outputting in early dwarf. Tested with GCC and GDB testsuites. Committed to branch. Aldy commit e74781aabb821c402a9c1efeb69a6311e4e905cf Author: Aldy Hernandez al...@redhat.com Date: Wed Apr 22 16:50:09 2015 -0700 Only output DW_TAG_GNU_formal_parameter_pack DIEs once. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 7cc6bb5..624ed19 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -19137,9 +19137,14 @@ gen_subprogram_die (tree decl, dw_die_ref context_die) { if (generic_decl_parm lang_hooks.function_parameter_pack_p (generic_decl_parm)) - gen_formal_parameter_pack_die (generic_decl_parm, - parm, subr_die, - parm); + { + if (early_dwarf_dumping) + gen_formal_parameter_pack_die (generic_decl_parm, + parm, subr_die, + parm); + else if (parm) + parm = DECL_CHAIN (parm); + } else if (parm !POINTER_BOUNDS_P (parm)) { dw_die_ref parm_die = gen_decl_die (parm, NULL, subr_die);
Re: [RFC][PATCH 1/3] Free a bit in SSA_NAME to save wrapped information
On 17/01/15 13:06, Kugan wrote: Freeing a spare-bit to store wrapped attribute by going back to representing VR_ANTI_RANGE as [max + 1, min - 1] in SSA_NAME. Now that stage-1 is open, rebased it and regression tested on x86-64-none-linux-gnu with no new regressions. Is this OK for trunk? Thanks, Kugan gcc/ChangeLog: 2015-04-22 Kugan Vivekanandarajah kug...@linaro.org * tree-ssanames.c (set_range_info): Change range info representation and represent VR_ANTI_RANGE as [max + 1, min - 1]. (get_range_info): Likewise. (set_nonzero_bits): Likewise. (duplicate_ssa_name_range_info): Likewise. * tree-ssanames.h (set_range_info): Change prototype. (get_range_info): Likewise. (set_nonzero_bits): Likewise. (duplicate_ssa_name_range_info): Likewise. * tree-vrp.c (remove_range_assertions): Use new representation. (vrp_finalize): Likewise. * tree-ssa-pre.c (insert_into_preds_of_block): Likewise. diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index d857d84..67e5351 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -3205,7 +3205,6 @@ insert_into_preds_of_block (basic_block block, unsigned int exprnum, !wi::neg_p (max, SIGNED)) /* Just handle extension and sign-changes of all-positive ranges. */ set_range_info (temp, - SSA_NAME_RANGE_TYPE (expr-u.nary-op[0]), wide_int_storage::from (min, TYPE_PRECISION (type), TYPE_SIGN (type)), wide_int_storage::from (max, TYPE_PRECISION (type), diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c index 9c39f65..744dc43 100644 --- a/gcc/tree-ssanames.c +++ b/gcc/tree-ssanames.c @@ -200,11 +200,10 @@ make_ssa_name_fn (struct function *fn, tree var, gimple stmt) /* Store range information RANGE_TYPE, MIN, and MAX to tree ssa_name NAME. */ void -set_range_info (tree name, enum value_range_type range_type, +set_range_info (tree name, const wide_int_ref min, const wide_int_ref max) { gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name))); - gcc_assert (range_type == VR_RANGE || range_type == VR_ANTI_RANGE); range_info_def *ri = SSA_NAME_RANGE_INFO (name); unsigned int precision = TYPE_PRECISION (TREE_TYPE (name)); @@ -219,16 +218,12 @@ set_range_info (tree name, enum value_range_type range_type, ri-set_nonzero_bits (wi::shwi (-1, precision)); } - /* Record the range type. */ - if (SSA_NAME_RANGE_TYPE (name) != range_type) -SSA_NAME_ANTI_RANGE_P (name) = (range_type == VR_ANTI_RANGE); - /* Set the values. */ ri-set_min (min); ri-set_max (max); /* If it is a range, try to improve nonzero_bits from the min/max. */ - if (range_type == VR_RANGE) + if (wi::cmp (min, max, TYPE_SIGN (TREE_TYPE (name))) 0) { wide_int xorv = ri-get_min () ^ ri-get_max (); if (xorv != 0) @@ -248,6 +243,7 @@ get_range_info (const_tree name, wide_int *min, wide_int *max) gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name))); gcc_assert (min max); range_info_def *ri = SSA_NAME_RANGE_INFO (name); + value_range_type range_type; /* Return VR_VARYING for SSA_NAMEs with NULL RANGE_INFO or SSA_NAMEs with integral types width 2 * HOST_BITS_PER_WIDE_INT precision. */ @@ -255,9 +251,22 @@ get_range_info (const_tree name, wide_int *min, wide_int *max) 2 * HOST_BITS_PER_WIDE_INT)) return VR_VARYING; - *min = ri-get_min (); - *max = ri-get_max (); - return SSA_NAME_RANGE_TYPE (name); + /* If max min, it is VR_ANTI_RANGE. */ + if (wi::cmp (ri-get_max (), ri-get_min (), TYPE_SIGN (TREE_TYPE (name))) 0) +{ + /* VR_ANTI_RANGE ~[min, max] is encoded as [max + 1, min - 1]. */ + range_type = VR_ANTI_RANGE; + *min = wi::add (ri-get_max (), 1); + *max = wi::sub (ri-get_min (), 1); +} + else +{ + /* Otherwise (when min = max), it is VR_RANGE. */ + range_type = VR_RANGE; + *min = ri-get_min (); + *max = ri-get_max (); +} + return range_type; } /* Change non-zero bits bitmask of NAME. */ @@ -267,7 +276,7 @@ set_nonzero_bits (tree name, const wide_int_ref mask) { gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name))); if (SSA_NAME_RANGE_INFO (name) == NULL) -set_range_info (name, VR_RANGE, +set_range_info (name, TYPE_MIN_VALUE (TREE_TYPE (name)), TYPE_MAX_VALUE (TREE_TYPE (name))); range_info_def *ri = SSA_NAME_RANGE_INFO (name); @@ -495,7 +504,8 @@ duplicate_ssa_name_ptr_info (tree name, struct ptr_info_def *ptr_info) /* Creates a duplicate of the range_info_def at RANGE_INFO of type RANGE_TYPE for use by the SSA name NAME. */ void -duplicate_ssa_name_range_info (tree name, enum value_range_type range_type, +duplicate_ssa_name_range_info (tree name, + enum value_range_type range_type
Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state
On Wed, Apr 22, 2015 at 08:43:10AM -0500, Peter Bergner wrote: Maybe you can fold tabortdc with tabortwc now? Use one UNSPEC name for both, :GPR and wd? Wouldn't that change the tabortwc pattern to use DImode rather than SImode when compiled with -m64 or -m32 -mpowerpc64? I'm not sure we want that. The GPR mode iterator creates two patterns, one for SI and one for DI, and the tabortwc would be the one for SI if you use wd. + case HTM_BUILTIN_TTEST: /* Alias for: tabortwci. 0,r0,0 */ + op[nopnds++] = GEN_INT (0); + op[nopnds++] = gen_rtx_REG (SImode, 0); + op[nopnds++] = GEN_INT (0); Is that really r0, isn't that (0|rA)? [Too lazy to read the docs myself right now, sorry.] The ISA doc shows: [snip] Thanks for looking it up! I'm still a bit worried about putting a reg in the RTL (while the instruction doesn't actually use one), but perhaps it's harmless. + emit_insn (gen_movcc (subreg, cr)); + emit_insn (gen_lshrsi3 (scratch2, scratch1, GEN_INT (28))); + emit_insn (gen_andsi3 (target, scratch2, GEN_INT (0xf))); + } + } Don't we have helper functions/expanders to do these moves? Yuck. Heh, I looked. The only helper pattern was the movcc pattern, but that placed the CR into bits 32-35 of the register. I needed the shift to move it down into the low nibble and I use the and, since one of the move cr insns places two copies of the CR value into bits 32-35 and 36-39. At least the VMX patterns have something like it for CR6. Probably not directly usable either, sigh. -/* { dg-final { scan-assembler-times tabortdc\\. 1 } } */ -/* { dg-final { scan-assembler-times tabortdci\\. 1 } } */ +/* { dg-final { scan-assembler-times tabortdc\\. 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times tabortdci\\. 1 { target lp64 } } } */ This skips this test on -m32 -mpowerpc64, is that on purpose? Ummm, not exactly. :-) Not that many people test that though. I'll see if I can find a replacement for lp64 that covers that case. Maybe just { powerpc64 } works? If not, I'm not too torn up if we skip it for -m32 -mpowerpc64. Me neither, just looked like an oversight. Segher
Re: [PATCH 02/12] remove some ifdef HAVE_cc0
On Thu, Apr 23, 2015 at 04:27:59AM +0100, James Greenhalgh wrote: On Tue, Apr 21, 2015 at 04:24:44PM +0100, Trevor Saunders wrote: On Tue, Apr 21, 2015 at 04:14:01PM +0200, Richard Biener wrote: On Tue, Apr 21, 2015 at 3:24 PM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org gcc/ChangeLog: 2015-04-21 Trevor Saunders tbsaunde+...@tbsaunde.org * conditions.h: Define macros even if HAVE_cc0 is undefined. * emit-rtl.c: Define functions even if HAVE_cc0 is undefined. * final.c: Likewise. * jump.c: Likewise. * recog.c: Likewise. * recog.h: Declare functions even when HAVE_cc0 is undefined. * sched-deps.c (sched_analyze_2): Always compile case for cc0. If I've counted right after the git bisect, this patch seems to break the ARM buildi (arm-none-linux-gnueabihf): In file included from insn-output.c:40:0: /gcc-src/gcc/conditions.h:112:0: error: CC_STATUS_INIT redefined [-Werror] #define CC_STATUS_INIT \ ^ In file included from tm.h:35:0, from insn-output.c:7: /gcc-src/gcc/config/arm/arm.h:2159:0: note: this is the location of the previous definition #define CC_STATUS_INIT \ ^ I guess the conditions.h definition wants wrapping in #ifndef - though a quick grep suggests that ARM is the only target defining CC_STATUS_INIT so lets CC the ARM maintainers and see what their preference is... Well, that seems pretty weird, but it looks intentional arm does this see http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00437.html Of course I now see final.c also defines a fall back, so maybe the right thing to do is wrap the conditions.h definition in #if HAVE_cc0, or maybe the final.c definition can go away? Right now I'm to tired to make a good decision about that. sorry about the bustage! Trev Thanks, James
Re: [PATCH] tetstsuite gcc.target/i386/ avx512*
Hi Kirill, On 21.04.15 10:28, Kirill Yukhin wrote: On 19 Apr 21:56, Andreas Tobler wrote: Done so and tested on FreeBSD amd64-unknown-freebsd11.0 and CentOS7.1. Ok for trunk? The patch is OK for trunk and for gcc-5 branch (when it is open). Thanks for fixing this! Done on trunk and gcc-5. Thanks for the review! Andreas
Re: [PATCH][libstc++v3]Add new dg-require-thread-fence directive.
On 22 April 2015 at 12:25, Renlin Li wrote: Hi Jonathan, Thank you for the suggestion. I have just attached the updated the patch. It works as before. Is this Okay to commit? OK, thanks.
[PATCH] Dev tree housekeeping
This pushes a bunch of changes from my dev tree to trunk. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2015-04-22 Richard Biener rguent...@suse.de * cfgexpand.c (expand_gimple_stmt_1): Use ops.code. * cfgloop.c (verify_loop_structure): Verify the root loop node. * except.c (duplicate_eh_regions): Call get_eh_region_from_lp_number_fn instead of get_eh_region_from_lp_number. * loop-init.c (fix_loop_structure): If we removed a loop, reset the SCEV cache. Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 222320) +++ gcc/cfgexpand.c (working copy) @@ -3413,7 +3413,7 @@ expand_gimple_stmt_1 (gimple stmt) ops.code = gimple_assign_rhs_code (assign_stmt); ops.type = TREE_TYPE (lhs); - switch (get_gimple_rhs_class (gimple_expr_code (stmt))) + switch (get_gimple_rhs_class (ops.code)) { case GIMPLE_TERNARY_RHS: ops.op2 = gimple_assign_rhs3 (assign_stmt); Index: gcc/cfgloop.c === --- gcc/cfgloop.c (revision 222320) +++ gcc/cfgloop.c (working copy) @@ -1347,6 +1347,16 @@ verify_loop_structure (void) else verify_dominators (CDI_DOMINATORS); + /* Check the loop tree root. */ + if (current_loops-tree_root-header != ENTRY_BLOCK_PTR_FOR_FN (cfun) + || current_loops-tree_root-latch != EXIT_BLOCK_PTR_FOR_FN (cfun) + || (current_loops-tree_root-num_nodes + != (unsigned) n_basic_blocks_for_fn (cfun))) +{ + error (corrupt loop tree root); + err = 1; +} + /* Check the headers. */ FOR_EACH_BB_FN (bb, cfun) if (bb_loop_header_p (bb)) Index: gcc/except.c === --- gcc/except.c(revision 222320) +++ gcc/except.c(working copy) @@ -649,7 +649,7 @@ duplicate_eh_regions (struct function *i data.label_map_data = map_data; data.eh_map = new hash_mapvoid *, void *; - outer_region = get_eh_region_from_lp_number (outer_lp); + outer_region = get_eh_region_from_lp_number_fn (cfun, outer_lp); /* Copy all the regions in the subtree. */ if (copy_region) Index: gcc/loop-init.c === --- gcc/loop-init.c (revision 222320) +++ gcc/loop-init.c (working copy) @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. #include ggc.h #include tree-ssa-loop-niter.h #include loop-unroll.h +#include tree-scalar-evolution.h /* Apply FLAGS to the loop state. */ @@ -221,6 +222,9 @@ fix_loop_structure (bitmap changed_bbs) timevar_push (TV_LOOP_INIT); + if (dump_file (dump_flags TDF_DETAILS)) +fprintf (dump_file, fix_loop_structure: fixing up loops for function\n); + /* We need exact and fast dominance info to be available. */ gcc_assert (dom_info_state (CDI_DOMINATORS) == DOM_OK); @@ -290,6 +294,7 @@ fix_loop_structure (bitmap changed_bbs) } /* Finally free deleted loops. */ + bool any_deleted = false; FOR_EACH_VEC_ELT (*get_loops (cfun), i, loop) if (loop loop-header == NULL) { @@ -322,8 +327,14 @@ fix_loop_structure (bitmap changed_bbs) } (*get_loops (cfun))[i] = NULL; flow_loop_free (loop); + any_deleted = true; } + /* If we deleted loops then the cached scalar evolutions refering to + those loops become invalid. */ + if (any_deleted scev_initialized_p ()) +scev_reset_htab (); + loops_state_clear (LOOPS_NEED_FIXUP); /* Apply flags to loops. */
niter_base simplification
Hello I don't know if I am missing something but I think __niter_base could be simplified to remove usage of _Iter_base. Additionally I overload it to also remove __normal_iterator layer even if behind a reverse_iterator or move_iterator, might help compiler to optimize code, no ? If not, might allow other algo optimization in the future... I prefered to provide a __make_reverse_iterator to allow the latter in C++11 and not only in C++14. Is it fine to do it this way or do you prefer to simply get rid of all this part ? * include/bits/cpp_type_traits.h (__gnu_cxx::__normal_iterator): Delete. * include/bits/stl_algobase.h (std::__niter_base): Adapt. * include/bits/stl_iterator.h (__make_reverse_iterator): New in C++11. (std::__niter_base): Overloads for std::reverse_iterator, __gnu_cxx::__normal_iterator and std::move_iterator. Tested under Linux x86_64. I checked that std::copy still ends up calling __builtin_memmove when used on vector iterators. François diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h b/libstdc++-v3/include/bits/cpp_type_traits.h index 8c6bb7f..2142917 100644 --- a/libstdc++-v3/include/bits/cpp_type_traits.h +++ b/libstdc++-v3/include/bits/cpp_type_traits.h @@ -64,17 +64,6 @@ // removed. // -// Forward declaration hack, should really include this from somewhere. -namespace __gnu_cxx _GLIBCXX_VISIBILITY(default) -{ -_GLIBCXX_BEGIN_NAMESPACE_VERSION - - templatetypename _Iterator, typename _Container -class __normal_iterator; - -_GLIBCXX_END_NAMESPACE_VERSION -} // namespace - namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -331,24 +320,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3) }; // - // Normal iterator type - // - templatetypename _Tp -struct __is_normal_iterator -{ - enum { __value = 0 }; - typedef __false_type __type; -}; - - templatetypename _Iterator, typename _Container -struct __is_normal_iterator __gnu_cxx::__normal_iterator_Iterator, - _Container -{ - enum { __value = 1 }; - typedef __true_type __type; -}; - - // // An arithmetic type is an integer type or a floating point type // templatetypename _Tp diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h index 0bcb133..73eea6b 100644 --- a/libstdc++-v3/include/bits/stl_algobase.h +++ b/libstdc++-v3/include/bits/stl_algobase.h @@ -270,17 +270,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return __a; } - // If _Iterator is a __normal_iterator return its base (a plain pointer, - // normally) otherwise return it untouched. See copy, fill, ... + // Fallback implementation of the function used to remove the + // __normal_iterator wrapper. See copy, fill, ... templatetypename _Iterator -struct _Niter_base -: _Iter_base_Iterator, __is_normal_iterator_Iterator::__value -{ }; - - templatetypename _Iterator -inline typename _Niter_base_Iterator::iterator_type +inline _Iterator __niter_base(_Iterator __it) -{ return std::_Niter_base_Iterator::_S_base(__it); } +{ return __it; } // Likewise, for move_iterator. templatetypename _Iterator diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h index 4a9189e..3aad9f3 100644 --- a/libstdc++-v3/include/bits/stl_iterator.h +++ b/libstdc++-v3/include/bits/stl_iterator.h @@ -390,7 +390,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { return __y.base() - __x.base(); } //@} -#if __cplusplus 201103L +#if __cplusplus == 201103L + templatetypename _Iterator +inline reverse_iterator_Iterator +__make_reverse_iterator(_Iterator __i) +{ return reverse_iterator_Iterator(__i); } + +# define _GLIBCXX_MAKE_REVERSE_ITERATOR(_Iter) \ + std::__make_reverse_iterator(_Iter) +#elif __cplusplus 201103L #define __cpp_lib_make_reverse_iterator 201402 // _GLIBCXX_RESOLVE_LIB_DEFECTS @@ -400,6 +408,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION inline reverse_iterator_Iterator make_reverse_iterator(_Iterator __i) { return reverse_iterator_Iterator(__i); } + +# define _GLIBCXX_MAKE_REVERSE_ITERATOR(_Iter) \ + std::make_reverse_iterator(_Iter) +#endif + +#if __cplusplus = 201103L + templatetypename _Iterator +auto +__niter_base(reverse_iterator_Iterator __it) +- decltype(_GLIBCXX_MAKE_REVERSE_ITERATOR(__niter_base(__it.base( +{ return _GLIBCXX_MAKE_REVERSE_ITERATOR(__niter_base(__it.base())); } #endif // 24.4.2.2.1 back_insert_iterator @@ -979,6 +998,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _GLIBCXX_END_NAMESPACE_VERSION } // namespace +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + + templatetypename _Iterator, typename _Container +_Iterator +__niter_base(__gnu_cxx::__normal_iterator_Iterator, _Container __it) +{ return __it.base(); } + +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace + #if __cplusplus =
Re: Hide _S_n_primes from user code
With the patch this time. On 22/04/2015 21:39, François Dumont wrote: Hello Here is a rather trivial patch, just code cleanup. Since we export _Prime_rehash_policy we do not need to expose the _S_n_primes anymore. * include/bits/hashtable_policy.h (_Prime_rehash_policy::_S_n_primes): Delete. * src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt): Remove usage of latter and compute size of the prime numbers array locally. Tested under Linux x86_64. Ok to commit ? François diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index 14bcca6..a9ad7dd 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -495,8 +495,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_reset(_State __state) { _M_next_resize = __state; } -enum { _S_n_primes = sizeof(unsigned long) != 8 ? 256 : 256 + 48 }; - static const std::size_t _S_growth_factor = 2; float _M_max_load_factor; diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc b/libstdc++-v3/src/c++11/hashtable_c++0x.cc index 22de51b..69f999f 100644 --- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc +++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc @@ -56,8 +56,10 @@ namespace __detail return __fast_bkt[__n]; } +constexpr auto __n_primes + = sizeof(__prime_list) / sizeof(unsigned long) - 1; const unsigned long* __next_bkt = - std::lower_bound(__prime_list + 5, __prime_list + _S_n_primes, __n); + std::lower_bound(__prime_list + 5, __prime_list + __n_primes, __n); _M_next_resize = __builtin_ceil(*__next_bkt * (long double)_M_max_load_factor); return *__next_bkt;
Re: Hide _S_n_primes from user code
On Wed, Apr 22, 2015 at 09:39:48PM +0200, François Dumont wrote: Hello Here is a rather trivial patch, just code cleanup. Since we export _Prime_rehash_policy we do not need to expose the _S_n_primes anymore. * include/bits/hashtable_policy.h (_Prime_rehash_policy::_S_n_primes): Delete. * src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt): Remove usage of latter and compute size of the prime numbers array locally. Tested under Linux x86_64. Ok to commit ? The patch is missing :). Marek
Re: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc
On Wed, Apr 22, 2015 at 5:34 PM, H.J. Lu hongjiu...@intel.com wrote: Normally, with PIE, GCC accesses globals that are extern to the module using GOT. This is two instructions, one to get the address of the global from GOT and the other to get the value. Examples: --- extern int a_glob; int main () { return a_glob; } --- With PIE, the generated code accesses global via GOT using two memory loads: movqa_glob@GOTPCREL(%rip), %rax movl(%rax), %eax for 64-bit or movla_glob@GOT(%ecx), %eax movl(%eax), %eax for 32-bit. Some experiments on google and SPEC CPU benchmarks show that the extra instruction affects performance by 1% to 5%. Solution - Copy Relocations: When the linker supports copy relocations, GCC can always assume that the global will be defined in the executable. For globals that are truly extern (come from shared objects), the linker will create copy relocations and have them defined in the executable. Result is that no global access needs to go through GOT and hence improves performance. We can generate movla_glob(%rip), %eax for 64-bit and movla_glob@GOTOFF(%eax), %eax for 32-bit. This optimization only applies to undefined non-weak non-TLS global data. Undefined weak global or TLS data access still must go through GOT. This patch reverts legitimate_pic_address_disp_p change made in revision 218397, which only applies to x86-64. Instead, this patch updates targetm.binds_local_p to indicate if undefined non-weak non-TLS global data is defined locally in PIE. It also introduces a new target hook, binds_tls_local_p to distinguish TLS variable from non-TLS variable. By default, binds_tls_local_p is the same as binds_local_p. This patch checks if 32-bit and 64-bit linkers support PIE with copy reloc at configure time. 64-bit linker is enabled in binutils 2.25 and 32-bit linker is enabled in binutils 2.26. This optimization is enabled only if the linker support is available. Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without support for copy relocation in PIE. OK for trunk? Thanks. Looking at this my first reaction was that surely most (if not all ? ) targets that use ELF and had copy relocs would benefit from this ? Couldn't we find a simpler way for targets to have this support ? I don't have a more constructive suggestion to make at the minute but getting this to work just from the targetm.binds_local_p (decl) interface would probably be better ? It's late in the evening and I probably won't have time to look at this in more detail till Friday afternoon given other personal commitments. regards, Ramana H.J. --- gcc/ PR target/65846 * configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ... (HAVE_LD_64BIT_PIE_COPYRELOC): This. (HAVE_LD_32BIT_PIE_COPYRELOC): New. Defined to 1 if Linux/ia32 linker supports PIE with copy reloc. * output.h (default_binds_tls_local_p): New. (default_binds_local_p_3): Add 2 bool arguments. * target.def (binds_tls_local_p): New target hook. * varasm.c (decl_default_tls_model): Replace targetm.binds_local_p with targetm.binds_tls_local_p. (default_binds_local_p_3): Add a bool argument to indicate TLS variable and a bool argument to indicate if an undefined non-TLS non-weak data is local. Double check TLS variable. If an undefined non-TLS non-weak data is local, treat it as defined locally. (default_binds_local_p): Pass false and false to default_binds_local_p_3. (default_binds_local_p_2): Likewise. (default_binds_local_p_1): Likewise. (default_binds_tls_local_p): New. * config.in: Regenerated. * configure: Likewise. * doc/tm.texi: Likewise. * config/i386/i386.c (legitimate_pic_address_disp_p): Don't check HAVE_LD_PIE_COPYRELOC here. (ix86_binds_local): New. (ix86_binds_tls_local_p): Likewise. (ix86_binds_local_p): Use it. (TARGET_BINDS_TLS_LOCAL_P): New. * doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook. gcc/testsuite/ PR target/65846 * gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32. * gcc.target/i386/pie-copyrelocs-2.c: Likewise. * gcc.target/i386/pie-copyrelocs-3.c: Likewise. * gcc.target/i386/pie-copyrelocs-4.c: Likewise. * gcc.target/i386/pr32219-9.c: Likewise. * gcc.target/i386/pr32219-10.c: New file. * lib/target-supports.exp (check_effective_target_pie_copyreloc): Check HAVE_LD_64BIT_PIE_COPYRELOC and HAVE_LD_32BIT_PIE_COPYRELOC instead of HAVE_LD_64BIT_PIE_COPYRELOC. --- gcc/config.in| 18 --- gcc/config/i386/i386.c | 44 ++-
Re: [RFC][PATCH 3/3] Enable zero/sign extension elimination
On 17/01/15 13:11, Kugan wrote: Re-enable zero/sign extension elimination using value range that includes wrapped attribute. Now that stage-1 is open, rebased it and regression tested on x86-64-none-linux-gnu with no new regressions. Is this OK for trunk? Thanks, Kugan gcc/ChangeLog: 2015-04-22 Kugan Vivekanandarajah kug...@linaro.org * calls.c (precompute_arguments): Check promoted_for_signed_and_unsigned_p and set the promoted mode. * expr.c (expand_expr_real_1): Likewise. (promoted_for_signed_and_unsigned_p): New function. * cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED. * expr.h (promoted_for_signed_and_unsigned_p): New definition. diff --git a/gcc/calls.c b/gcc/calls.c index 3be7ca5..6b8d861 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1637,7 +1637,10 @@ precompute_arguments (int num_actuals, struct arg_data *args) args[i].initial_value = gen_lowpart_SUBREG (mode, args[i].value); SUBREG_PROMOTED_VAR_P (args[i].initial_value) = 1; - SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp); + if (promoted_for_signed_and_unsigned_p (args[i].tree_value)) + SUBREG_PROMOTED_SET (args[i].initial_value, SRP_SIGNED_AND_UNSIGNED); + else + SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp); } } } diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index ca491a0..5fcee87 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -3452,7 +3452,13 @@ expand_gimple_stmt_1 (gimple stmt) GET_MODE (target), temp, unsignedp); } - convert_move (SUBREG_REG (target), temp, unsignedp); + if ((SUBREG_PROMOTED_GET (target) == SRP_SIGNED_AND_UNSIGNED) +(GET_CODE (temp) == SUBREG) +(GET_MODE (target) == GET_MODE (temp)) +(GET_MODE (SUBREG_REG (target)) == GET_MODE (SUBREG_REG (temp + emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp)); + else + convert_move (SUBREG_REG (target), temp, unsignedp); } else if (nontemporal emit_storent_insn (target, temp)) ; diff --git a/gcc/expr.c b/gcc/expr.c index 530a944..224a50f 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -185,6 +185,39 @@ static rtx const_vector_from_tree (tree); static tree tree_expr_size (const_tree); static HOST_WIDE_INT int_expr_size (tree); +/* Return TRUE if value in SSA is zero and sign extended for wider mode MODE + using value range information stored. Return FALSE otherwise. + + This is used to check if SUBREG is zero and sign extended and to set + promoted mode SRP_SIGNED_AND_UNSIGNED to SUBREG. */ + +bool +promoted_for_signed_and_unsigned_p (tree ssa) +{ + wide_int min, max; + bool ovf; + + if (ssa == NULL_TREE + || TREE_CODE (ssa) != SSA_NAME + || !INTEGRAL_TYPE_P (TREE_TYPE (ssa))) +return false; + + /* Return FALSE if value_range is not recorded for SSA. */ + if (get_range_info (ssa, min, max, ovf) != VR_RANGE) +return false; + + if (ovf) +return false; + + /* Return true (to set SRP_SIGNED_AND_UNSIGNED to SUBREG) if MSB of the + smaller mode is not set (i.e. MSB of ssa is not set). */ + if (!wi::neg_p (min, SIGNED) !wi::neg_p(max, SIGNED)) +return true; + else +return false; + +} + /* This is run to set up which modes can be used directly in memory and to initialize the block move optab. It is run @@ -9674,7 +9707,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, temp = gen_lowpart_SUBREG (mode, decl_rtl); SUBREG_PROMOTED_VAR_P (temp) = 1; - SUBREG_PROMOTED_SET (temp, unsignedp); + if (promoted_for_signed_and_unsigned_p (ssa_name)) + SUBREG_PROMOTED_SET (temp, SRP_SIGNED_AND_UNSIGNED); + else + SUBREG_PROMOTED_SET (temp, unsignedp); return temp; } diff --git a/gcc/expr.h b/gcc/expr.h index 867852e..f965fe0 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -243,6 +243,7 @@ extern rtx expand_expr_real_1 (tree, rtx, machine_mode, enum expand_modifier, rtx *, bool); extern rtx expand_expr_real_2 (sepops, rtx, machine_mode, enum expand_modifier); +extern bool promoted_for_signed_and_unsigned_p (tree); /* Generate code for computing expression EXP. An rtx for the computed value is returned. The value is never null.
Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state
On Wed, 2015-04-22 at 17:16 -0500, Segher Boessenkool wrote: On Wed, Apr 22, 2015 at 08:43:10AM -0500, Peter Bergner wrote: Maybe you can fold tabortdc with tabortwc now? Use one UNSPEC name for both, :GPR and wd? Wouldn't that change the tabortwc pattern to use DImode rather than SImode when compiled with -m64 or -m32 -mpowerpc64? I'm not sure we want that. The GPR mode iterator creates two patterns, one for SI and one for DI, and the tabortwc would be the one for SI if you use wd. Ah, I think I know what you mean. Sure, I can try that, + case HTM_BUILTIN_TTEST: /* Alias for: tabortwci. 0,r0,0 */ + op[nopnds++] = GEN_INT (0); + op[nopnds++] = gen_rtx_REG (SImode, 0); + op[nopnds++] = GEN_INT (0); Is that really r0, isn't that (0|rA)? [Too lazy to read the docs myself right now, sorry.] The ISA doc shows: [snip] Thanks for looking it up! I'm still a bit worried about putting a reg in the RTL (while the instruction doesn't actually use one), but perhaps it's harmless. I'm not sure what you mean by the instruction doesn't use one. The hardware instruction does use a register for its second operand (even though its contents are ignored due to TO == 0) and the pattern requires us to pass in a reg rtx, so I'm not sure what you're referring to. This skips this test on -m32 -mpowerpc64, is that on purpose? Ummm, not exactly. :-) Not that many people test that though. I'll see if I can find a replacement for lp64 that covers that case. Maybe just { powerpc64 } works? I'll take a look at that to see if that works, thanks. Peter
Re: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc
On Wed, Apr 22, 2015 at 3:15 PM, Ramana Radhakrishnan ramana@googlemail.com wrote: On Wed, Apr 22, 2015 at 5:34 PM, H.J. Lu hongjiu...@intel.com wrote: Normally, with PIE, GCC accesses globals that are extern to the module using GOT. This is two instructions, one to get the address of the global from GOT and the other to get the value. Examples: --- extern int a_glob; int main () { return a_glob; } --- With PIE, the generated code accesses global via GOT using two memory loads: movqa_glob@GOTPCREL(%rip), %rax movl(%rax), %eax for 64-bit or movla_glob@GOT(%ecx), %eax movl(%eax), %eax for 32-bit. Some experiments on google and SPEC CPU benchmarks show that the extra instruction affects performance by 1% to 5%. Solution - Copy Relocations: When the linker supports copy relocations, GCC can always assume that the global will be defined in the executable. For globals that are truly extern (come from shared objects), the linker will create copy relocations and have them defined in the executable. Result is that no global access needs to go through GOT and hence improves performance. We can generate movla_glob(%rip), %eax for 64-bit and movla_glob@GOTOFF(%eax), %eax for 32-bit. This optimization only applies to undefined non-weak non-TLS global data. Undefined weak global or TLS data access still must go through GOT. This patch reverts legitimate_pic_address_disp_p change made in revision 218397, which only applies to x86-64. Instead, this patch updates targetm.binds_local_p to indicate if undefined non-weak non-TLS global data is defined locally in PIE. It also introduces a new target hook, binds_tls_local_p to distinguish TLS variable from non-TLS variable. By default, binds_tls_local_p is the same as binds_local_p. This patch checks if 32-bit and 64-bit linkers support PIE with copy reloc at configure time. 64-bit linker is enabled in binutils 2.25 and 32-bit linker is enabled in binutils 2.26. This optimization is enabled only if the linker support is available. Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without support for copy relocation in PIE. OK for trunk? Thanks. Looking at this my first reaction was that surely most (if not all ? ) targets that use ELF and had copy relocs would benefit from this ? Couldn't we find a simpler way for targets to have this support ? I don't have a more constructive suggestion to make at the minute but getting this to work just from the targetm.binds_local_p (decl) interface would probably be better ? default_binds_local_p_3 is a global function which is used to implement targetm.binds_local_p in x86 backend. Any backend can use it to optimize for copy relocation. -- H.J.
Re: Ping^3 : [PATCH] [gcc, combine] PR46164: Don't combine the insns if a volatile register is contained.
On Wed, Apr 22, 2015 at 10:30 AM, Segher Boessenkool seg...@kernel.crashing.org wrote: On Wed, Apr 22, 2015 at 10:21:43AM +0800, Terry Guo wrote: gcc/ChangeLog: 2015-04-22 Hale Wang hale.w...@arm.com Terry Guo terry@arm.com PR rtl-optimization/64818 * combine.c (can_combine_p): Don't combine user-specified register if it is in an asm input. gcc/testsuite/ChangeLog: 2015-04-22 Hale Wang hale.w...@arm.com Terry Guo terry@arm.com PR rtl-optimization/64818 * gcc.target/arm/pr64818.c: New. This is okay for trunk, if it has been bootstrapped and regression tested. Thanks, Segher Thanks Segher. The patch is tested with bootstrap and regression test for x86_64. No problem found. Committed as revision 222306. BR, Terry
Re: Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias)
On Tue, 21 Apr 2015, Thomas Schwinge wrote: Hi! On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries tom_devr...@mentor.com wrote: On 24-11-14 11:56, Tom de Vries wrote: On 15-11-14 18:19, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote: I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region. The patch series consists of these 8 patches: ... 1 Expand oacc kernels after pass_build_ealias 2 Add pass_oacc_kernels 3 Add pass_ch_oacc_kernels to pass_oacc_kernels 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels 5 Add pass_loop_im to pass_oacc_kernels 6 Add pass_ccp to pass_oacc_kernels 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels 8 Do simple omp lowering for no address taken var ... This patch moves omp expansion of the oacc kernels directive to after pass_build_ealias. The rationale is that in order to use pass_parallelize_loops for analysis and transformation of an oacc kernels region, we postpone omp expansion of that region until the earliest point in the pass list where enough information is availabe to run pass_parallelize_loops, in other words, after pass_build_ealias. The patch postpones expansion in expand_omp, and ensures expansion by adding pass_expand_omp_ssa: - after pass_build_ealias, and - after pass_all_early_optimizations for the case we're not optimizing. In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa, the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of lowered omp code, to handle it conservatively. The patch contains changes in expand_omp_target to deal with ssa-code, similar to what is already present in expand_omp_taskreg. Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be static for oacc kernels. It does this to get some references to .omp_data_sizes and .omp_data_kinds in the ssa code. Without these references, the definitions will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE kludge for this purpose ]. Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the original function of which the definition has been removed (as in moved to the split off function). TODO_remove_unused_locals takes care of some of them, but not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these dangling SSA_NAMEs and releases them. Reposting with small update: I've replaced the use of the rather generic gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p. Bootstrapped and reg-tested in the same way as before. I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre. This allows fre to unify references to the same omp variable before entering pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels. F.i. this reduction fragment: ... # VUSE .MEM_8 # PT = { D.2282 } _67 = .omp_data_i_59-sumD.2270; # VUSE .MEM_8 _68 = *_67; _70 = _66 + _68; # VUSE .MEM_8 # PT = { D.2282 } _69 = .omp_data_i_59-sumD.2270; # .MEM_71 = VDEF .MEM_8 *_69 = _70; ... is transformed by fre into: ... # VUSE .MEM_8 # PT = { D.2282 } _67 = .omp_data_i_59-sumD.2270; # VUSE .MEM_8 _68 = *_67; _70 = _66 + _68; # .MEM_71 = VDEF .MEM_8 *_67 = _70; ... In order for pass_fre to respect the kernels region boundaries, I've added a change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init conservatively. Bootstrapped and reg-tested as before. OK for trunk? Committed to gomp-4_0-branch in r79: commit 93557ac5e30c26ee1a3d1255e31265b287171a0d Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Apr 21 19:37:19 2015 + Expand oacc kernels after pass_fre gcc/ * omp-low.c: Include gimple-pretty-print.h. (release_first_vuse_in_edge_dest): New function. (expand_omp_target): When not in ssa, don't split off oacc kernels region, clear PROP_gimple_eomp in cfun-curr_properties to force later expanssion, and add GOACC_kernels_internal call. When in ssa, split off oacc kernels and convert GOACC_kernels_internal into GOACC_kernels call. Handle ssa-code. (pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in properties_provided field. (pass_expand_omp::execute): Set PROP_gimple_eomp in cfun-curr_properties
Re: [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels
On Tue, 21 Apr 2015, Thomas Schwinge wrote: Hi! On Tue, 25 Nov 2014 12:38:55 +0100, Tom de Vries tom_devr...@mentor.com wrote: On 15-11-14 18:22, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote: I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region. The patch series consists of these 8 patches: ... 1 Expand oacc kernels after pass_build_ealias 2 Add pass_oacc_kernels 3 Add pass_ch_oacc_kernels to pass_oacc_kernels 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels 5 Add pass_loop_im to pass_oacc_kernels 6 Add pass_ccp to pass_oacc_kernels 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels 8 Do simple omp lowering for no address taken var ... This patch adds pass_loop_ccp to pass group pass_oacc_kernels. We need this pass to simplify the loop body, and allow pass_parloops to detect that loop iterations are independent. As suggested here ( https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02993.html ) I've replaced the pass_ccp with pass_copyprop, which performs trivial constant propagation in addition to copy propagation. Bootstrapped and reg-tested as before. OK for trunk? I've recently wondered why we do copy propagation after LIM and I don't remember. Can you remind me? Can you add testcases that fail before this kind of patches and pass afterwards? Richard. Committed to gomp-4_0-branch in r84: commit 1c2529b64620811cbff4a50374af797ee52ef5f8 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Apr 21 19:58:54 2015 + Add pass_copy_prop in pass_oacc_kernels gcc/ * passes.def: Add pass_copy_prop to pass group pass_oacc_kernels. * tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init conservatively. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@84 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |4 gcc/passes.def |1 + gcc/tree-ssa-copy.c |4 3 files changed, 9 insertions(+) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 98e33ad..0be9191 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,9 @@ 2015-04-21 Tom de Vries t...@codesourcery.com + * passes.def: Add pass_copy_prop to pass group pass_oacc_kernels. + * tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init + conservatively. + * passes.def: Add pass_lim in pass group pass_ch_oacc_kernels. * passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass diff --git gcc/passes.def gcc/passes.def index e6c9287..e6f1c33 100644 --- gcc/passes.def +++ gcc/passes.def @@ -93,6 +93,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_ch_oacc_kernels); NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); + NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_expand_omp_ssa); NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () diff --git gcc/tree-ssa-copy.c gcc/tree-ssa-copy.c index 5ae8e6c..6f35f99 100644 --- gcc/tree-ssa-copy.c +++ gcc/tree-ssa-copy.c @@ -61,6 +61,7 @@ along with GCC; see the file COPYING3. If not see #include tree-scalar-evolution.h #include tree-ssa-dom.h #include tree-ssa-loop-niter.h +#include omp-low.h /* This file implements the copy propagation pass and provides a @@ -116,6 +117,9 @@ stmt_may_generate_copy (gimple stmt) if (gimple_has_volatile_ops (stmt)) return false; + if (gimple_stmt_omp_data_i_init_p (stmt)) +return false; + /* Statements with loads and/or stores will never generate a useful copy. */ if (gimple_vuse (stmt)) return false; Grüße, Thomas -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
Re: [PATCH] [1/2] [ARM] [libgcc] Support RTABI half-precision conversion functions.
On Mon, Apr 13, 2015 at 12:25 PM, Joseph Myers jos...@codesourcery.com wrote: On Mon, 13 Apr 2015, Hale Wang wrote: Yes, you are right. It's my fault to add the only here. Thank you to point out this. Beside this, is this patch OK for you? I don't think it's a good idea for libgcc to include large pieces of assembly code generated by a compiler. Just compile the code with whatever options are needed at the time libgcc is built - possibly with #if conditionals to allow compiling different versions of the code. Indeed, are any special options needed at all? I agree and I don't think it's maintainable in the long run. From my reading of this thread I can't see any special options being needed. Can we just massage it in C ? regards Ramana -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] emit_bss(): Remove redundant guard
On 8 April 2015 at 20:19, Bernhard Reutner-Fischer rep.dot@gmail.com wrote: gcc/ChangeLog: OKed by Jeff and installed to trunk as r222311. thanks, 2015-04-01 Bernhard Reutner-Fischer al...@gcc.gnu.org * varasm.c (emit_bss): Remove redundant guard. Signed-off-by: Bernhard Reutner-Fischer rep.dot@gmail.com --- gcc/varasm.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/gcc/varasm.c b/gcc/varasm.c index 537a64d..2bb5f27 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -1951,21 +1951,19 @@ emit_local (tree decl, #if defined ASM_OUTPUT_ALIGNED_BSS static bool emit_bss (tree decl ATTRIBUTE_UNUSED, const char *name ATTRIBUTE_UNUSED, unsigned HOST_WIDE_INT size ATTRIBUTE_UNUSED, unsigned HOST_WIDE_INT rounded ATTRIBUTE_UNUSED) { -#if defined ASM_OUTPUT_ALIGNED_BSS ASM_OUTPUT_ALIGNED_BSS (asm_out_file, decl, name, size, get_variable_align (decl)); return true; -#endif } #endif /* A noswitch_section_callback for comm_section. */ static bool emit_common (tree decl ATTRIBUTE_UNUSED, const char *name ATTRIBUTE_UNUSED, -- 2.1.4
Re: [PATCH] PR target/55144: bfin: fix opening glibc-c.o: No such file or directory
On 8 April 2015 at 20:37, Bernhard Reutner-Fischer rep.dot@gmail.com wrote: building all-gcc for bfin-linux-uclibc results in build/genchecksum cp/cp-lang.o c-family/stub-objc.o ... glibc-c.o \ libbackend.a .. cc1plus-checksum.c.tmp opening glibc-c.o: No such file or directory make[2]: *** [cc1-checksum.c] Error 1 Fix this by prepending tmake_file which nowadays consists of t-slibgcc t-linux t-glibc. Remove the already listed tmake_file entries. Fixes all-gcc config-list.mk build for bfin-linux-uclibc. Ok for trunk? This was OKed by Jeff and committed to trunk as r222313. thanks, gcc/ChangeLog PR target/55144 * config.gcc (bfin*-linux-uclibc*): Prepend tmake_file and remove already contained t-files. Signed-off-by: Bernhard Reutner-Fischer rep.dot@gmail.com Cc: Bernd Schmidt ber...@codesourcery.com Cc: Jie Zhang jzhang...@gmail.com Signed-off-by: Bernhard Reutner-Fischer rep.dot@gmail.com --- gcc/config.gcc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index cb08a5c..ddbd57b 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -1118,7 +1118,7 @@ bfin*-uclinux*) ;; bfin*-linux-uclibc*) tm_file=${tm_file} dbxelf.h elfos.h bfin/elf.h gnu-user.h linux.h glibc-stdint.h bfin/linux.h ./linux-sysroot-suffix.h - tmake_file=bfin/t-bfin-linux t-slibgcc t-linux + tmake_file=${tmake_file} bfin/t-bfin-linux use_collect2=no ;; bfin*-rtems*) -- 2.1.4
RE: [PATCH] [1/2] [ARM] [libgcc] Support RTABI half-precision conversion functions.
-Original Message- From: Ramana Radhakrishnan [mailto:ramana@googlemail.com] Sent: Wednesday, April 22, 2015 3:50 PM To: Joseph Myers Cc: Hale Wang; GCC Patches Subject: Re: [PATCH] [1/2] [ARM] [libgcc] Support RTABI half-precision conversion functions. On Mon, Apr 13, 2015 at 12:25 PM, Joseph Myers jos...@codesourcery.com wrote: On Mon, 13 Apr 2015, Hale Wang wrote: Yes, you are right. It's my fault to add the only here. Thank you to point out this. Beside this, is this patch OK for you? I don't think it's a good idea for libgcc to include large pieces of assembly code generated by a compiler. Just compile the code with whatever options are needed at the time libgcc is built - possibly with #if conditionals to allow compiling different versions of the code. Indeed, just compile the code with option '-mfloat-abi=soft' at the time libgcc is build which can solve this problem. Indeed, are any special options needed at all? The reason is that the current GNU versions of the fp16 conversions are more efficient than the AEABI versions in this patch(and also more efficient than the code compiled with option '-mfloat-abi=soft', because no fp registers will be used to implement these functions which is allowed in the GNU versions). We provide an option so that the users can choose the version as they want(whether they want to follow the AEABI constraint or not). I agree and I don't think it's maintainable in the long run. From my reading of this thread I can't see any special options being needed. Can we just massage it in C ? The reason is that the implementations of these helper functions are allowed to corrupt the integer core registers permitted to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR). To guarantee this if we just massage it in C, as Joseph suggested, we can compile the code with whatever options are needed at the time libgcc is built. Possibly the option '-mfloat-abi=soft ' can help us to guarantee this (seems more strict than the AEABI constraint). The special option is provided so that the users can choose the version as they want(whether they want to follow the AEABI constraint or not). Because the current GNU versions of the fp16 conversions are more efficient than the AEABI versions in this patch. Best Regards, Hale regards Ramana -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Optionally sanitize globals in user-defined sections
On 04/19/2015 06:11 PM, Jakub Jelinek wrote: On Sun, Apr 19, 2015 at 10:54:57AM +0300, Yury Gribov wrote: On 04/17/2015 08:29 PM, Andi Kleen wrote: Yury Gribov y.gri...@samsung.com writes: + +static bool +section_sanitized_p (const char *sec) +{ + if (!sanitized_sections) +return false; + size_t len = strlen (sec); + const char *p = sanitized_sections; + while ((p = strstr (p, sec))) +{ + if ((p == sanitized_sections || p[-1] == ',') + (p[len] == 0 || p[len] == ',')) + return true; No wildcard support? That may be a long option in some cases. Right. Do you think * will be enough or we also need ? and [a-f] syntax? libiberty contains and gcc build utilities already use fnmatch, so you should just use that (with carefully chosen FNM_* options). Hi all, Here is an new patch which adds support for wildcards in -fsanitize-file:///home/ygribov/user-section-2.diff sections. This also adds a test which I forgot to svn-add last time (shame on me). Bootstrapped and regtested on x64. Ok to commit? -Y
Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
On Tue, 21 Apr 2015, Thomas Schwinge wrote: Hi! On Tue, 25 Nov 2014 12:29:28 +0100, Tom de Vries tom_devr...@mentor.com wrote: On 15-11-14 18:21, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote: I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region. The patch series consists of these 8 patches: ... 1 Expand oacc kernels after pass_build_ealias 2 Add pass_oacc_kernels 3 Add pass_ch_oacc_kernels to pass_oacc_kernels 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels 5 Add pass_loop_im to pass_oacc_kernels 6 Add pass_ccp to pass_oacc_kernels 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels 8 Do simple omp lowering for no address taken var ... This patch adds pass_tree_loop_init and pass_tree_loop_init_done to pass_oacc_kernels. Pass_parallelize_loops is run between these passes in the pass group pass_tree_loop, since it requires loop information. We do the same for pass_oacc_kernels. Updated for moving pass_oacc_kernels down past pass_fre in the pass list. Bootstrapped and reg-tested as before. OK for trunk? Both passes should be basically no-ops. Why not call loop_optimizer_init/finalize from expand_omp_ssa instead? Committed to gomp-4_0-branch in r82: commit cb95b4a1efcdb96c58cda986d53b20c3537c1ab7 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Apr 21 19:51:33 2015 + Add pass_tree_loop_{init,done} to pass_oacc_kernels gcc/ * passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass group pass_oacc_kernels. * tree-ssa-loop.c (pass_tree_loop_init::clone) (pass_tree_loop_done::clone): New function. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@82 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |5 + gcc/passes.def |2 ++ gcc/tree-ssa-loop.c |2 ++ 3 files changed, 9 insertions(+) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index d00c5e0..1fb060f 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,10 @@ 2015-04-21 Tom de Vries t...@codesourcery.com + * passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass + group pass_oacc_kernels. + * tree-ssa-loop.c (pass_tree_loop_init::clone) + (pass_tree_loop_done::clone): New function. + * omp-low.c (loop_in_oacc_kernels_region_p): New function. * omp-low.h (loop_in_oacc_kernels_region_p): Declare. * passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels. diff --git gcc/passes.def gcc/passes.def index 5cdbc87..83ae04e 100644 --- gcc/passes.def +++ gcc/passes.def @@ -91,7 +91,9 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_oacc_kernels); PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) NEXT_PASS (pass_ch_oacc_kernels); + NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_expand_omp_ssa); + NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_cd_dce); diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c index a041858..2a96a39 100644 --- gcc/tree-ssa-loop.c +++ gcc/tree-ssa-loop.c @@ -272,6 +272,7 @@ public: /* opt_pass methods: */ virtual unsigned int execute (function *); + opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); } }; // class pass_tree_loop_init @@ -566,6 +567,7 @@ public: /* opt_pass methods: */ virtual unsigned int execute (function *) { return tree_ssa_loop_done (); } + opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); } }; // class pass_tree_loop_done Grüße, Thomas -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
[PATCH][PR65823] Fix va_arg ap_copy nop detection
Hi, this patch fixes PR65823. The problem is a verify_gimple ICE during compilation of gcc.c-torture/execute/stdarg-2.c for arm at -O0/-O1: ... In function 'f3': src/gcc/testsuite/gcc.c-torture/execute/stdarg-2.c:61:1: error: incorrect sharing of tree nodes aps[4] # .MEM_5 = VDEF .MEM_11 aps[4] = aps[4]; ... Before gimplification, f3 looks like this in the original dump: ... struct va_list aps[10]; struct va_list aps[10]; __builtin_va_start ((struct ) (struct *) aps[4], i); x = VA_ARG_EXPR aps[4]; __builtin_va_end ((struct ) (struct *) aps[4]); ... After gimplification, it looks like: ... f3 (int i) { long intD.5 x.0D.4231; struct va_listD.4222 apsD.4227[10]; try { # USE = anything # CLB = anything __builtin_va_startD.1052 (apsD.4227[4], 0); # USE = anything # CLB = anything x.0D.4231 = VA_ARG (apsD.4227[4], 0B); apsD.4227[4] = apsD.4227[4]; xD.4223 = x.0D.4231; # USE = anything # CLB = anything __builtin_va_endD.1051 (apsD.4227[4]); } finally { apsD.4227 = {CLOBBER}; } } ... The nop 'apsD.4227[4] = apsD.4227[4]' introduced during gimplification is not meant to be there. There is already a test 'TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0))' in gimplify_modify_expr to prevent this nop: ... /* When gimplifying the ap argument of va_arg, we might end up with ap.1 = ap va_arg (ap.1, 0B) We need to assign ap.1 back to ap, otherwise va_arg has no effect on ap. */ if (ap != NULL_TREE TREE_CODE (ap) == ADDR_EXPR TREE_CODE (ap_copy) == ADDR_EXPR TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0)) gimplify_assign (TREE_OPERAND (ap, 0), TREE_OPERAND (ap_copy, 0), pre_p); ... But the test is a pointer equality test, and it fails in this case. The patches fixes the problem by using operand_equal_p to do the equality test. Bootstrapped and reg-tested on x86_64. Did minimal non-bootstrap build on arm and reg-tested. OK for trunk? Thanks, - Tom Fix va_arg ap_copy nop detection 2015-04-22 Tom de Vries t...@codesourcery.com PR tree-optimization/65823 * gimplify.c (gimplify_modify_expr): Use operand_equal_p to test for equality between ap_copy and ap. --- gcc/gimplify.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 0a8ef84..c68bd47 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -4792,7 +4792,7 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, if (ap != NULL_TREE TREE_CODE (ap) == ADDR_EXPR TREE_CODE (ap_copy) == ADDR_EXPR - TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0)) + !operand_equal_p (TREE_OPERAND (ap, 0), TREE_OPERAND (ap_copy, 0), 0)) gimplify_assign (TREE_OPERAND (ap, 0), TREE_OPERAND (ap_copy, 0), pre_p); if (want_value) -- 1.9.1
Re: [PATCH][PR65823] Fix va_arg ap_copy nop detection
On Wed, Apr 22, 2015 at 9:41 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, this patch fixes PR65823. The problem is a verify_gimple ICE during compilation of gcc.c-torture/execute/stdarg-2.c for arm at -O0/-O1: ... In function 'f3': src/gcc/testsuite/gcc.c-torture/execute/stdarg-2.c:61:1: error: incorrect sharing of tree nodes aps[4] # .MEM_5 = VDEF .MEM_11 aps[4] = aps[4]; ... Before gimplification, f3 looks like this in the original dump: ... struct va_list aps[10]; struct va_list aps[10]; __builtin_va_start ((struct ) (struct *) aps[4], i); x = VA_ARG_EXPR aps[4]; __builtin_va_end ((struct ) (struct *) aps[4]); ... After gimplification, it looks like: ... f3 (int i) { long intD.5 x.0D.4231; struct va_listD.4222 apsD.4227[10]; try { # USE = anything # CLB = anything __builtin_va_startD.1052 (apsD.4227[4], 0); # USE = anything # CLB = anything x.0D.4231 = VA_ARG (apsD.4227[4], 0B); apsD.4227[4] = apsD.4227[4]; xD.4223 = x.0D.4231; # USE = anything # CLB = anything __builtin_va_endD.1051 (apsD.4227[4]); } finally { apsD.4227 = {CLOBBER}; } } ... The nop 'apsD.4227[4] = apsD.4227[4]' introduced during gimplification is not meant to be there. There is already a test 'TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0))' in gimplify_modify_expr to prevent this nop: ... /* When gimplifying the ap argument of va_arg, we might end up with ap.1 = ap va_arg (ap.1, 0B) We need to assign ap.1 back to ap, otherwise va_arg has no effect on ap. */ if (ap != NULL_TREE TREE_CODE (ap) == ADDR_EXPR TREE_CODE (ap_copy) == ADDR_EXPR TREE_OPERAND (ap, 0) != TREE_OPERAND (ap_copy, 0)) gimplify_assign (TREE_OPERAND (ap, 0), TREE_OPERAND (ap_copy, 0), pre_p); ... But the test is a pointer equality test, and it fails in this case. The patches fixes the problem by using operand_equal_p to do the equality test. Bootstrapped and reg-tested on x86_64. Did minimal non-bootstrap build on arm and reg-tested. OK for trunk? Hmm, ok for now. But I wonder if we can't fix things to not require that odd extra copy. In fact that we introduce ap.1 looks completely bogus to me (and we don't in this case for arm). Note that the pointer compare obviously fails because we unshare the expression. So ... what breaks if we simply remove this odd fixup? Thanks, Richard. Thanks, - Tom
Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
On Tue, 21 Apr 2015, Thomas Schwinge wrote: Hi! On Tue, 25 Nov 2014 12:27:34 +0100, Tom de Vries tom_devr...@mentor.com wrote: On 15-11-14 18:21, Tom de Vries wrote: On 15-11-14 13:14, Tom de Vries wrote: Hi, I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region. The patch series consists of these 8 patches: ... 1 Expand oacc kernels after pass_build_ealias 2 Add pass_oacc_kernels 3 Add pass_ch_oacc_kernels to pass_oacc_kernels 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels 5 Add pass_loop_im to pass_oacc_kernels 6 Add pass_ccp to pass_oacc_kernels 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels 8 Do simple omp lowering for no address taken var ... This patch adds a pass_ch_oacc_kernels to the pass group pass_oacc_kernels. The idea is that pass_parallelize_loops only deals with loops for which the header has been copied, so the easiest way to meet that requirement when running pass_parallelize_loops in group pass_oacc_kernels, is to run pass_ch as a part of pass_oacc_kernels. We define a seperate pass pass_ch_oacc_kernels, to leave all loops that aren't part of a kernels region alone. Updated for moving pass_oacc_kernels down past pass_fre in the pass list. Bootstrapped and reg-tested as before. OK for trunk? Committed to gomp-4_0-branch in r81: commit 58c33a7965c379b55b549d50e3b79b2252bcc876 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Apr 21 19:48:16 2015 + Add pass_ch_oacc_kernels to pass_oacc_kernels gcc/ * omp-low.c (loop_in_oacc_kernels_region_p): New function. * omp-low.h (loop_in_oacc_kernels_region_p): Declare. * passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels. * tree-pass.h (make_pass_ch_oacc_kernels): Declare * tree-ssa-loop-ch.c: Include omp-low.h. (pass_ch_execute): Declare. (pass_ch::execute): Factor out ... (pass_ch_execute): ... this new function. If handling oacc kernels, skip loops that are not in oacc kernels region. (pass_ch_oacc_kernels::execute): (pass_data_ch_oacc_kernels): New pass_data. (class pass_ch_oacc_kernels): New pass. (pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New function. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@81 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 15 gcc/omp-low.c | 91 gcc/omp-low.h |2 ++ gcc/passes.def |1 + gcc/tree-pass.h|1 + gcc/tree-ssa-loop-ch.c | 59 +-- 6 files changed, 167 insertions(+), 2 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 8a53ad8..d00c5e0 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,20 @@ 2015-04-21 Tom de Vries t...@codesourcery.com + * omp-low.c (loop_in_oacc_kernels_region_p): New function. + * omp-low.h (loop_in_oacc_kernels_region_p): Declare. + * passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels. + * tree-pass.h (make_pass_ch_oacc_kernels): Declare + * tree-ssa-loop-ch.c: Include omp-low.h. + (pass_ch_execute): Declare. + (pass_ch::execute): Factor out ... + (pass_ch_execute): ... this new function. If handling oacc kernels, + skip loops that are not in oacc kernels region. + (pass_ch_oacc_kernels::execute): + (pass_data_ch_oacc_kernels): New pass_data. + (class pass_ch_oacc_kernels): New pass. + (pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New + function. + * passes.def: Add pass group pass_oacc_kernels. * tree-pass.h (make_pass_oacc_kernels): Declare. * tree-ssa-loop.c (gate_oacc_kernels): New static function. diff --git gcc/omp-low.c gcc/omp-low.c index 16d9a5e..1b03ae6 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -13920,4 +13920,95 @@ gimple_stmt_omp_data_i_init_p (gimple stmt) SSA_OP_DEF); } +/* Return true if LOOP is inside a kernels region. */ + +bool +loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry, +basic_block *region_exit) Ehm. So why not simply add a flag to struct loop instead and set it during OMP region parsing/lowering? It's also very odd that you disable transforms on OMP regions but at the same time do all the OMP processing _after_ those transforms. Something feels backward here. Richard. +{ + bitmap excludes_bitmap = BITMAP_GGC_ALLOC (); + bitmap
Re: [wwwdocs] Add libstdc++ ABI changes to /gcc-5/changes.html
On 21 April 2015 at 12:54, I wrote: I plan to commit this to wwwdocs later today, it adds a caveat to the top of the file, with a link to a larger description in the libstdc++ section, which links to the new page I've just added to the manual. I committed it, and fixed the invalid HTML in /code0/code.