Re: [PATCH] [AArch64, NEON] Improve vpmaxX vpminX intrinsics
On 28 November 2014 at 09:23, Yangfei (Felix) felix.y...@huawei.com wrote: Hi, This patch converts vpmaxX vpminX intrinsics to use builtin functions instead of the previous inline assembly syntax. Regtested with aarch64-linux-gnu on QEMU. Also passed the glorious testsuite of Christophe Lyon. OK for the trunk? Hi Felix, We know from experience that the advsimd intrinsics tend to be fragile for big endian and in general it is fairly easy to break the big endian case. For these advsimd improvements that you are working on (that we very much appreciate) it is important to run both little endian and big endian regressions. Thanks /Marcus Okay. Any plan for the advsimd big-endian improvement? I rebased this patch over Alan Lawrence's patch: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html No regressions for the aarch64_be-linux-gnu target either. OK for the trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 218464) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,18 @@ +2014-12-09 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>): New + pattern. + * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp, + uminp, smax_nanp, smin_nanp): New builtins.
+ * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32, + vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32, + vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64, + vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64, + vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8, + vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8, + vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64, + vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64, + 2014-12-07 Felix Yang felix.y...@huawei.com Shanyao Chen chenshan...@huawei.com Index: gcc/config/aarch64/arm_neon.h === --- gcc/config/aarch64/arm_neon.h (revision 218464) +++ gcc/config/aarch64/arm_neon.h (working copy) @@ -8843,491 +8843,7 @@ vpadds_f32 (float32x2_t a) return result; } -__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) -vpmax_f32 (float32x2_t a, float32x2_t b) -{ - float32x2_t result; - __asm__ (fmaxp %0.2s, %1.2s, %2.2s - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vpmax_s8 (int8x8_t a, int8x8_t b) -{ - int8x8_t result; - __asm__ (smaxp %0.8b, %1.8b, %2.8b - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - __extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) -vpmax_s16 (int16x4_t a, int16x4_t b) -{ - int16x4_t result; - __asm__ (smaxp %0.4h, %1.4h, %2.4h - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) -vpmax_s32 (int32x2_t a, int32x2_t b) -{ - int32x2_t result; - __asm__ (smaxp %0.2s, %1.2s, %2.2s - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) -vpmax_u8 (uint8x8_t a, uint8x8_t b) -{ - uint8x8_t result; - 
__asm__ (umaxp %0.8b, %1.8b, %2.8b - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) -vpmax_u16 (uint16x4_t a, uint16x4_t b) -{ - uint16x4_t result; - __asm__ (umaxp %0.4h, %1.4h, %2.4h - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) -vpmax_u32 (uint32x2_t a, uint32x2_t b) -{ - uint32x2_t result; - __asm__ (umaxp %0.2s, %1.2s, %2.2s - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) -vpmaxnm_f32 (float32x2_t a, float32x2_t b) -{ - float32x2_t result; - __asm__ (fmaxnmp %0.2s,%1.2s,%2.2s - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) -vpmaxnmq_f32 (float32x4_t a, float32x4_t b) -{ - float32x4_t result; - __asm__ (fmaxnmp %0.4s,%1.4s,%2.4s - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline float64x2_t
Re: [PATCH] [AArch64, NEON] Improve vpmaxX vpminX intrinsics
You'll need to rebase over Alan Lawrence's patch. https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html Yes, see my new patch: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00750.html +;; Pairwise Integer Max/Min operations. +(define_insn "aarch64_<maxmin_uns>p<mode>" + [(set (match_operand:VQ_S 0 "register_operand" "=w") + (unspec:VQ_S [(match_operand:VQ_S 1 "register_operand" "w") +(match_operand:VQ_S 2 "register_operand" "w")] + MAXMINV))] + "TARGET_SIMD" + "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" + [(set_attr "type" "neon_minmax<q>")] +) + Could you roll aarch64_reduc_<maxmin_uns>_internalv2si into this pattern? Will come up with another patch to fix this. Thanks for pointing this out. Thanks, Tejas.
Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library
On 24 Nov 17:02, Ilya Enkovich wrote: Right. This works for both top level and multilib checks because a failing test is used and CC is usually not set when it's called by the top level configure. If we configure with CC=... then it may go wrong. I left only the target check in configure.tgt and inlined the test for x32 into libmpx configure. -- Joseph S. Myers jos...@codesourcery.com Here is an updated version. Thanks, Ilya -- Here is an updated version. I moved linker specs to target. Currently mpx libraries are built for x86_64-*-linux* | i?86-*-linux*, so I think gcc/config/i386/linux-common.h is a proper place for LIBMPX declarations. Thanks, Ilya -- 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * Makefile.def: Add libmpx. * configure.ac: Add libmpx. * Makefile.in: Regenerate. * configure: Regenerate. gcc/ 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * config/i386/linux-common.h (LIBMPX_LIBS): New. (LIBMPX_SPEC): New. (CHKP_SPEC): New. * gcc.c (CHKP_SPEC): New. (LINK_COMMAND_SPEC): Add CHKP_SPEC. * c-family/c.opt (static-libmpx): New. libmpx/ 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com Initial commit. diff --git a/Makefile.def b/Makefile.def index 7c8761a..82432ef 100644 --- a/Makefile.def +++ b/Makefile.def @@ -129,6 +129,9 @@ target_modules = { module= libsanitizer; bootstrap=true; lib_path=.libs; raw_cxx=true; }; +target_modules = { module= libmpx; + bootstrap=true; + lib_path=.libs; }; target_modules = { module= libvtv; bootstrap=true; lib_path=.libs; diff --git a/configure.ac b/configure.ac index fd1bdf0..1d36f6b 100644 --- a/configure.ac +++ b/configure.ac @@ -162,6 +162,7 @@ target_libraries=target-libgcc \ target-libstdc++-v3 \ target-libsanitizer \ target-libvtv \ + target-libmpx \ target-libssp \ target-libquadmath \ target-libgfortran \ @@ -642,6 +643,25 @@ if test -d ${srcdir}/libvtv; then fi fi + +# Disable libmpx on unsupported systems.
+if test -d ${srcdir}/libmpx; then +if test x$enable_libmpx = x; then + AC_MSG_CHECKING([for libmpx support]) + if (srcdir=${srcdir}/libmpx; \ + . ${srcdir}/configure.tgt; \ + test $LIBMPX_SUPPORTED != yes) + then + AC_MSG_RESULT([no]) + noconfigdirs=$noconfigdirs target-libmpx + else + AC_MSG_RESULT([yes]) + fi +fi +fi + + + # Disable libquadmath for some systems. case ${target} in avr-*-*) @@ -2662,6 +2682,11 @@ if echo ${target_configdirs} | grep libvtv /dev/null 21 bootstrap_target_libs=${bootstrap_target_libs}target-libvtv, fi +# If we are building libmpx, bootstrap it. +if echo ${target_configdirs} | grep libmpx /dev/null 21; then + bootstrap_target_libs=${bootstrap_target_libs}target-libmpx, +fi + # Determine whether gdb needs tk/tcl or not. # Use 'maybe' since enable_gdbtk might be true even if tk isn't available # and in that case we want gdb to be built without tk. Ugh! diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index b9f7c65..65731cc 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1020,6 +1020,9 @@ fchkp-instrument-marked-only C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0) Instrument only functions marked with bnd_instrument attribute. +static-libmpx +Driver + fcilkplus C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0) Enable Cilk Plus diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h index 1eaf024..07cd1a2 100644 --- a/gcc/config/i386/linux-common.h +++ b/gcc/config/i386/linux-common.h @@ -53,3 +53,27 @@ along with GCC; see the file COPYING3. 
If not see GNU_USER_TARGET_ENDFILE_SPEC, \ GNU_USER_TARGET_MATHFILE_SPEC \ ANDROID_ENDFILE_SPEC) + +#ifndef LIBMPX_LIBS +#define LIBMPX_LIBS \ + %:include(libmpx.spec)%(link_libmpx) +#endif + +#ifndef LIBMPX_SPEC +#if defined(HAVE_LD_STATIC_DYNAMIC) +#define LIBMPX_SPEC \ +%{mmpx:%{fcheck-pointer-bounds:\ +%{static:--whole-archive -lmpx --no-whole-archive LIBMPX_LIBS }\ +%{!static:%{static-libmpx: LD_STATIC_OPTION --whole-archive}\ +-lmpx %{static-libmpx:--no-whole-archive LD_DYNAMIC_OPTION \ +LIBMPX_LIBS +#else +#define LIBMPX_SPEC \ +%{mmpx:%{fcheck-pointer-bounds:-lmpx LIBMPX_LIBS }} +#endif +#endif + +#ifndef CHKP_SPEC +#define CHKP_SPEC \ +%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC }} +#endif diff --git a/gcc/gcc.c b/gcc/gcc.c index 4cb4ba3..636830e 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -809,6 +809,10 @@ proper position among the other output files. */ %{fvtable-verify=preinit: -lvtv
Re: [PATCH, MPX wrappers 1/3] Add MPX wrappers library
On 05 Dec 15:52, Jeff Law wrote: On 12/03/14 07:28, Ilya Enkovich wrote: #ifndef MPX_SPEC #define MPX_SPEC \ -%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC }} +%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC LIBMPXWRAPPERS_SPEC }} #endif Ugh. Somehow I missed that MPX_SPEC was in gcc.c along with the uses of LIBMPX_SPEC. Aren't all these target specific and thus belong in the x86 specific files? Is config/i386/linux-common.h a proper place for these specs then? Depends on whether or not we expect MPX to show up on other systems such as *bsd, mingw, solaris, etc. So I'd say linux-common.h is better than gcc.c, but perhaps not the best location. Uros should chime in here. Right. The wrappers code doesn't use anything specific to MPX. In the case of a pure software solution we should be able to compile and use this library without changes (except compilation flags). But in case a pure software solution exists, the MPX option should still be available and we should have two builds for this library. Ok. Just wanted to be sure I understood how the pieces fit together. I don't really expect a software implementation, but keeping it in mind helps us reasonably consider where certain things belong implementation wise. jeff Here is a version with linker specs moved into linux-common.h. Thanks, Ilya -- gcc/ 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * config/i386/linux-common.h (LIBMPXWRAPPERS_SPEC): New. (CHKP_SPEC): Add wrappers library. * c-family/c.opt (static-libmpxwrappers): New. libmpx/ 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * Makefile.am (SUBDIRS): Add mpxwrap when used AS supports MPX. (MAKEOVERRIDES): New. * Makefile.in: Regenerate. * configure.ac: Check AS supports MPX. Add mpxintr/Makefile to config files. * configure: Regenerate. * mpxwrap/Makefile.am: New. * mpxwrap/Makefile.in: New. * mpxwrap/libtool-version: New. * mpxwrap/mpx_wrappers.cc: New. * mpxwrap/libmpxwrappers.map: New.
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 65731cc..2632c3f 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1023,6 +1023,9 @@ Instrument only functions marked with bnd_instrument attribute. static-libmpx Driver +static-libmpxwrappers +Driver + fcilkplus C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0) Enable Cilk Plus diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h index 07cd1a2..854af46 100644 --- a/gcc/config/i386/linux-common.h +++ b/gcc/config/i386/linux-common.h @@ -73,7 +73,21 @@ along with GCC; see the file COPYING3. If not see #endif #endif +#ifndef LIBMPXWRAPPERS_SPEC +#if defined(HAVE_LD_STATIC_DYNAMIC) +#define LIBMPXWRAPPERS_SPEC \ +%{mmpx:%{fcheck-pointer-bounds:%{!fno-chkp-use-wrappers:\ +%{static:-lmpxwrappers}\ +%{!static:%{static-libmpxwrappers: LD_STATIC_OPTION --whole-archive}\ +-lmpxwrappers %{static-libmpxwrappers:--no-whole-archive \ +LD_DYNAMIC_OPTION } +#else +#define LIBMPXWRAPPERS_SPEC \ +%{mmpx:%{fcheck-pointer-bounds:{!fno-chkp-use-wrappers:-lmpxwrappers}}} +#endif +#endif + #ifndef CHKP_SPEC #define CHKP_SPEC \ -%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC }} +%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC LIBMPXWRAPPERS_SPEC }} #endif diff --git a/libmpx/Makefile.am b/libmpx/Makefile.am index 6cee4ac..bd0a8b6 100644 --- a/libmpx/Makefile.am +++ b/libmpx/Makefile.am @@ -2,6 +2,9 @@ ACLOCAL_AMFLAGS = -I .. 
-I ../config if LIBMPX_SUPPORTED SUBDIRS = mpxrt +if MPX_AS_SUPPORTED +SUBDIRS += mpxwrap +endif nodist_toolexeclib_HEADERS = libmpx.spec endif @@ -45,3 +48,5 @@ AM_MAKEFLAGS = \ PICFLAG=$(PICFLAG) \ RANLIB=$(RANLIB) \ DESTDIR=$(DESTDIR) + +MAKEOVERRIDES = diff --git a/libmpx/configure.ac b/libmpx/configure.ac index dbfad02..4669525 100644 --- a/libmpx/configure.ac +++ b/libmpx/configure.ac @@ -100,6 +100,18 @@ AC_CHECK_TOOL(AS, as) AC_CHECK_TOOL(AR, ar) AC_CHECK_TOOL(RANLIB, ranlib, :) +# Check we may build wrappers library +echo test: bndmov %bnd0, %bnd1 conftest.s +if AC_TRY_COMMAND([$AS -o conftest.o conftest.s 1AS_MESSAGE_LOG_FD]) +then +mpx_as=yes +else +mpx_as=no +echo configure: no MPX support fo as AS_MESSAGE_LOG_FD +fi +rm -f conftest.o conftest.s +AM_CONDITIONAL(MPX_AS_SUPPORTED, [test x$mpx_as = xyes]) + # Configure libtool AC_LIBTOOL_DLOPEN AM_PROG_LIBTOOL @@ -117,7 +129,7 @@ fi AC_CONFIG_FILES([Makefile libmpx.spec]) AC_CONFIG_HEADERS(config.h) -AC_CONFIG_FILES(AC_FOREACH([DIR], [mpxrt], [DIR/Makefile]), +AC_CONFIG_FILES(AC_FOREACH([DIR], [mpxrt mpxwrap], [DIR/Makefile ]), [cat vpsed$$ \_EOF s!`test -f '$' || echo '$(srcdir)/'`!! _EOF diff --git a/libmpx/mpxwrap/Makefile.am b/libmpx/mpxwrap/Makefile.am new file mode 100644 index 000..c254d9b --- /dev/null +++ b/libmpx/mpxwrap/Makefile.am @@ -0,0 +1,54 @@ +ALCLOCAL_AMFLAGS = -I .. -I ../config + +#
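The assembler check added to libmpx/configure.ac above follows a common autoconf idiom: write a one-instruction asm file that only an MPX-aware assembler accepts, try to assemble it, and set a conditional on the result. A standalone sketch of that idiom (using `false` as a stand-in assembler so the sketch runs anywhere; the real configure uses the AS found by AC_CHECK_TOOL):

```shell
# Sketch of the configure-time assembler-feature probe used above.
# "false" stands in for the assembler so the sketch is self-contained;
# a real probe would use the detected $AS.
AS=${AS:-false}

echo 'test: bndmov %bnd0, %bnd1' > conftest.s
if $AS -o conftest.o conftest.s 2>/dev/null; then
    mpx_as=yes
else
    mpx_as=no
fi
rm -f conftest.o conftest.s
echo "mpx_as=$mpx_as"
```

The resulting `mpx_as` flag then feeds AM_CONDITIONAL, exactly as in the patch.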
[PING ^ 3] [RFC PATCH, AARCH64] Add support for -mlong-calls option
Hi, This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02258.html Thanks.
Re: [PATCH] Fix broadcast from scalar patterns (PR target/63594)
On Mon, Dec 8, 2014 at 10:42 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! This patch attempts to fix (set (reg:V*mode) (vec_duplicate:V*mode (reg/mem:mode))) patterns. One issue is that there were separate patterns for broadcast from gpr and separate patterns for broadcast from memory (and vector reg), that isn't a good idea for reload, which can't then freely choose. Another issue is that some pre-AVX2 broadcast patterns were present above the avx512vl broadcast patterns, so again, reload didn't have the possibility to use %xmm16-31/%ymm16-31 registers. Also, the splitter written for AVX2 broadcasts from gpr went into the way of AVX512VL broadcasts. And finally, the avx512*intrin.h headers were using #ifdef TARGET_64BIT, macro not used anywhere (probably meant to write __x86_64__ instead, but with the patch we actually just have one set of builtins. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-12-08 Jakub Jelinek ja...@redhat.com PR target/63594 * config/i386/sse.md (vec_dupv4sf): Move after mask_codeforavx512_vec_dup_gprmodemask_name pattern. (*vec_dupv4si, *vec_dupv2di): Likewise. (mask_codeforavx512_vec_dup_memmodemask_name): Merge into ... (mask_codeforavx512_vec_dup_gprmodemask_name): ... this pattern. (*vec_dupmode AVX2_VEC_DUP_MODE splitter): Disable for TARGET_AVX512VL (for QI/HI scalar modes only if TARGET_AVX512BW is set too). * config/i386/i386.c (enum ix86_builtins): Remove IX86_BUILTIN_PBROADCASTQ256_MEM_MASK, IX86_BUILTIN_PBROADCASTQ128_MEM_MASK and IX86_BUILTIN_PBROADCASTQ512_MEM. (bdesc_args): Use __builtin_ia32_pbroadcastq512_gpr_mask, __builtin_ia32_pbroadcastq256_gpr_mask and __builtin_ia32_pbroadcastq128_gpr_mask instead of *_mem_mask regardless of OPTION_MASK_ISA_64BIT. * config/i386/avx512fintrin.h (_mm512_set1_epi64, _mm512_mask_set1_epi64, _mm512_maskz_set1_epi64): Use *_gpr_mask builtins regardless of whether TARGET_64BIT is defined or not. 
* config/i386/avx512vlintrin.h (_mm256_mask_set1_epi64, _mm256_maskz_set1_epi64, _mm_mask_set1_epi64, _mm_maskz_set1_epi64): Likewise. LGTM, but please see inline comment below. --- gcc/config/i386/sse.md.jj 2014-12-03 11:52:41.0 +0100 +++ gcc/config/i386/sse.md 2014-12-08 13:26:06.505543457 +0100 @@ -6319,22 +6319,6 @@ (define_insn avx512f_vec_dupmode_1 (set_attr prefix evex) (set_attr mode MODE)]) -(define_insn vec_dupv4sf - [(set (match_operand:V4SF 0 register_operand =x,x,x) - (vec_duplicate:V4SF - (match_operand:SF 1 nonimmediate_operand x,m,0)))] - TARGET_SSE - @ - vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0} - vbroadcastss\t{%1, %0|%0, %1} - shufps\t{$0, %0, %0|%0, %0, 0} - [(set_attr isa avx,avx,noavx) - (set_attr type sseshuf1,ssemov,sseshuf1) - (set_attr length_immediate 1,0,1) - (set_attr prefix_extra 0,1,*) - (set_attr prefix vex,vex,orig) - (set_attr mode V4SF)]) - ;; Although insertps takes register source, we prefer ;; unpcklps with register source since it is shorter. (define_insn *vec_concatv2sf_sse4_1 @@ -12821,37 +12805,6 @@ (define_split operands[1] = adjust_address (operands[1], ssescalarmodemode, offs); }) -(define_insn *vec_dupv4si - [(set (match_operand:V4SI 0 register_operand =x,x,x) - (vec_duplicate:V4SI - (match_operand:SI 1 nonimmediate_operand x,m,0)))] - TARGET_SSE - @ - %vpshufd\t{$0, %1, %0|%0, %1, 0} - vbroadcastss\t{%1, %0|%0, %1} - shufps\t{$0, %0, %0|%0, %0, 0} - [(set_attr isa sse2,avx,noavx) - (set_attr type sselog1,ssemov,sselog1) - (set_attr length_immediate 1,0,1) - (set_attr prefix_extra 0,1,*) - (set_attr prefix maybe_vex,vex,orig) - (set_attr mode TI,V4SF,V4SF)]) - -(define_insn *vec_dupv2di - [(set (match_operand:V2DI 0 register_operand =x,x,x,x) - (vec_duplicate:V2DI - (match_operand:DI 1 nonimmediate_operand 0,x,m,0)))] - TARGET_SSE - @ - punpcklqdq\t%0, %0 - vpunpcklqdq\t{%d1, %0|%0, %d1} - %vmovddup\t{%1, %0|%0, %1} - movlhps\t%0, %0 - [(set_attr isa sse2_noavx,avx,sse3,noavx) - (set_attr type 
sselog1,sselog1,sselog1,ssemov) - (set_attr prefix orig,vex,maybe_vex,orig) - (set_attr mode TI,TI,DF,V4SF)]) - (define_insn *vec_concatv2si_sse4_1 [(set (match_operand:V2SI 0 register_operand =Yr,*x,x, Yr,*x,x, x, *y,*y) (vec_concat:V2SI @@ -16665,46 +16618,78 @@ (define_insn mask_codeforavx512f_broa (set_attr mode sseinsnmode)]) (define_insn mask_codeforavx512_vec_dup_gprmodemask_name - [(set (match_operand:VI12_AVX512VL 0 register_operand =v) + [(set (match_operand:VI12_AVX512VL 0 register_operand =v,v) (vec_duplicate:VI12_AVX512VL - (match_operand:ssescalarmode 1 register_operand r)))] +
Re: [PATCH, x86] Fix pblendv expand.
On Tue, Dec 9, 2014 at 12:33 AM, Evgeny Stupachenko evstu...@gmail.com wrote: The patch fixes the pblendv expand. The bug was uncovered when the permutation operands are constants. In this case we initialize the target register for expand_vec_perm_1 with a constant and then rewrite the target with a constant for expand_vec_perm_pblendv. The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops -flto -march=corei7. Bootstrap and make check passed. Is it ok? Please add a testcase. Uros. Evgeny 2014-12-09 Evgeny Stupachenko evstu...@gmail.com gcc/ * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for expand_vec_perm_1 target. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index eafc15a..5a914ad 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d) dcopy.op0 = dcopy.op1 = d->op1; else dcopy.op0 = dcopy.op1 = d->op0; + dcopy.target = gen_reg_rtx (vmode); dcopy.one_operand_p = true; for (i = 0; i < nelt; ++i)
Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype
On Mon, 8 Dec 2014, Joseph Myers wrote: On Mon, 8 Dec 2014, Richard Biener wrote: The alternative is to decide used in the middle-end at one point, for example at the end of all_lowering_passes where hopefully we have constant folded and removed dead code enough. We can also compute an overall uses libm flag to fix the testcase I reported (of course we'd like to re-compute that at LTO time). If you do that, of course any optimizations before deciding used need to be conservative - assume that any function except the one you're optimizing a call of is not used. Yes. I guess I'm willing to consider a new langhook that for C evaluates C_DECL_USED. We still have to find a suitable point to query it - the earliest point being after parsing has finished (thus when we finalize the TU). At that point all constant expression folding should have happened (up to the point required by the language) - but I suppose C_DECL_USED is still set even if all calls were constant folded away. Note that computing a middle-end C_DECL_USED can be done from gimplification (possibly after folding the calls there once). Same for a global libm function used flag (where at some point we might need a target hook to allow specifying whether a call is from libm). Richard.
[PATCH, trivial] [AArch64] Remove declaration of removed function from aarch64-protos.h
The definition of the function aarch64_function_profiler was removed in GCC 4.9, but the declaration is still there in aarch64-protos.h. So remove it. OK for the trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 218464) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-12-09 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64-protos.h (aarch64_function_profiler): Remove + declaration of removed function. + 2014-12-07 Felix Yang felix.y...@huawei.com Shanyao Chen chenshan...@huawei.com Index: gcc/config/aarch64/aarch64-protos.h === --- gcc/config/aarch64/aarch64-protos.h (revision 218464) +++ gcc/config/aarch64/aarch64-protos.h (working copy) @@ -247,7 +247,6 @@ void aarch64_expand_epilogue (bool); void aarch64_expand_mov_immediate (rtx, rtx); void aarch64_expand_prologue (void); void aarch64_expand_vector_init (rtx, rtx); -void aarch64_function_profiler (FILE *, int); void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx, const_tree, unsigned); void aarch64_init_expanders (void);
Re: [PATCH, x86] Fix pblendv expand.
On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak ubiz...@gmail.com wrote: The patch fixes the pblendv expand. The bug was uncovered when the permutation operands are constants. In this case we initialize the target register for expand_vec_perm_1 with a constant and then rewrite the target with a constant for expand_vec_perm_pblendv. The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops -flto -march=corei7. Bootstrap and make check passed. Is it ok? Please add a testcase. Also, it surprises me that we enter expand_vec_perm_pblendv with an uninitialized (?) target. Does your patch only paper over a real problem up the call chain (hard to say without a testcase)? Uros. Evgeny 2014-12-09 Evgeny Stupachenko evstu...@gmail.com gcc/ * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for expand_vec_perm_1 target. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index eafc15a..5a914ad 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d) dcopy.op0 = dcopy.op1 = d->op1; else dcopy.op0 = dcopy.op1 = d->op0; + dcopy.target = gen_reg_rtx (vmode); dcopy.one_operand_p = true; for (i = 0; i < nelt; ++i)
[PATCH] Fix half of PR64191
This avoids vectorizing empty loops that only contain an indirect clobber. The real bug of course is that we don't remove the loop in the first place (but we might not be able to prove it terminates). Still, vectorizing it adds additional prologue/epilogue loops that only convolute the final code we generate. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk and branch. Richard. 2014-12-09 Richard Biener rguent...@suse.de PR tree-optimization/64191 * tree-vect-stmts.c (vect_stmt_relevant_p): Clobbers are not relevant (nor are their uses). Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 218479) +++ gcc/tree-vect-stmts.c (working copy) @@ -340,7 +340,8 @@ vect_stmt_relevant_p (gimple stmt, loop_ /* changing memory. */ if (gimple_code (stmt) != GIMPLE_PHI) -if (gimple_vdef (stmt)) +if (gimple_vdef (stmt) + && !gimple_clobber_p (stmt)) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location,
Re: [PATCH] Fix broadcast from scalar patterns (PR target/63594)
On Tue, Dec 09, 2014 at 09:49:17AM +0100, Uros Bizjak wrote: (define_insn mask_codeforavx512_vec_dup_gprmodemask_name - [(set (match_operand:VI48_AVX512VL 0 register_operand =v) - (vec_duplicate:VI48_AVX512VL - (match_operand:ssescalarmode 1 register_operand r)))] - TARGET_AVX512F (ssescalarmodemode != DImode || TARGET_64BIT) -{ - return vpbroadcastbcstscalarsuff\t{%1, %0mask_operand2|%0mask_operand2, %1}; -} - [(set_attr type ssemov) - (set_attr prefix evex) - (set_attr mode sseinsnmode)]) - -(define_insn mask_codeforavx512_vec_dup_memmodemask_name - [(set (match_operand:V48_AVX512VL 0 register_operand =v) - (vec_duplicate:V48_AVX512VL - (match_operand:ssescalarmode 1 nonimmediate_operand vm)))] + [(set (match_operand:V48_AVX512VL 0 register_operand =v,v) + (vec_duplicate:V48_AVX512VL + (match_operand:ssescalarmode 1 nonimmediate_operand vm,r)))] TARGET_AVX512F vsseintprefixbroadcastbcstscalarsuff\t{%1, %0mask_operand2|%0mask_operand2, %1} [(set_attr type ssemov) (set_attr prefix evex) - (set_attr mode sseinsnmode)]) + (set_attr mode sseinsnmode) + (set (attr enabled) + (if_then_else (eq_attr alternative 1) + (symbol_ref GET_MODE_CLASS (ssescalarmodemode) == MODE_INT + (ssescalarmodemode != DImode || TARGET_64BIT)) + (const_int 1)))]) I have no idea how to get rid of this enabled attribute, as it disables just one alternative. Creating a special isa attribute for it looks less readable and much more expensive. This enabled attribute is IMHO of the good kind, GET_MODE_CLASS (ssescalarmodemode) == MODE_INT and ssescalarmodemode != DImode are constants for the particular chosen mode, TARGET_64BIT doesn't change during compilation of a TU, and so the only variable is that it disables just one of the alternatives. 
@@ -16759,7 +16744,10 @@ (define_split [(set (match_operand:AVX2_VEC_DUP_MODE 0 register_operand) (vec_duplicate:AVX2_VEC_DUP_MODE (match_operand:ssescalarmode 1 register_operand)))] - TARGET_AVX2 reload_completed GENERAL_REG_P (operands[1]) + TARGET_AVX2 +(!TARGET_AVX512VL + || (!TARGET_AVX512BW GET_MODE_SIZE (ssescalarmodemode) 2)) +reload_completed GENERAL_REG_P (operands[1]) [(const_int 0)] We would like to avoid convoluted insn enable condition by moving the target delated complexity to the mode iterator, even if it requires additional single-use mode iterator. In the ideal case, the remaining target-dependant condition would represent the baseline target for an insn and all other target-related conditions would be inside mode iterator. This splitter condition can be replaced with mode iterator, but it results in big repetitions there (incremental diff below). Or do you have something else in mind? --- gcc/config/i386/sse.md 2014-12-08 13:26:06.505543457 +0100 +++ gcc/config/i386/sse.md 2014-12-08 13:26:06.505543457 +0100 @@ -16740,14 +16740,21 @@ (set_attr isa avx2,noavx2,avx2,noavx2) (set_attr mode sseinsnmode,V8SF,sseinsnmode,V8SF)]) +;; Modes handled by AVX2 vec_dup splitter. Assumes TARGET_AVX2, +;; if both -mavx512vl and -mavx512bw are used, disables all modes, +;; if just -mavx512vl, disables just V*SI. 
+(define_mode_iterator AVX2_VEC_DUP_MODE_SPLIT + [(V32QI !TARGET_AVX512VL || !TARGET_AVX512BW) + (V16QI !TARGET_AVX512VL || !TARGET_AVX512BW) + (V16HI !TARGET_AVX512VL || !TARGET_AVX512BW) + (V8HI !TARGET_AVX512VL || !TARGET_AVX512BW) + (V8SI !TARGET_AVX512VL) (V4SI !TARGET_AVX512VL)]) + (define_split - [(set (match_operand:AVX2_VEC_DUP_MODE 0 register_operand) - (vec_duplicate:AVX2_VEC_DUP_MODE + [(set (match_operand:AVX2_VEC_DUP_MODE_SPLIT 0 register_operand) + (vec_duplicate:AVX2_VEC_DUP_MODE_SPLIT (match_operand:ssescalarmode 1 register_operand)))] - TARGET_AVX2 -(!TARGET_AVX512VL - || (!TARGET_AVX512BW GET_MODE_SIZE (ssescalarmodemode) 2)) -reload_completed GENERAL_REG_P (operands[1]) + TARGET_AVX2 reload_completed GENERAL_REG_P (operands[1]) [(const_int 0)] { emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]), Jakub
Re: [RFC, PATCH, fortran] PR fortran/60255 Deferred character length
Hi Paul, s/furure/future/ :-) Oops, fixed. Why are you using a double underscore in get__len_component? Because the component is called _len. The routine should be called get _len component, but spaces aren't allowed :-) Anyways, does this violate some style guide? Should I remove one of the underscores? More seriously, I think that the len field should be added unconditionally to unlimited polymorphic variables. Otherwise, you might find unlimited polymorphic variables that are created in an already compiled module/subprogramme arriving without the requisite field. I was thinking about that, too. For a start I just wanted to give an idea of where this is going. When more gfortran gurus vote for the unconditional add to u-poly variables, then I will change it. Michael Metcalf has posted an example that makes use of unlimited polymorphism at ftp://ftp.numerical.rl.ac.uk/pub/MRandC/oo.f90 . gfortran does not work correctly with it at the moment because of the lack of a len field. Removing all the string input allows it to run correctly. I think that you should ensure that your patch fixes the problem. A slight obstacle is that the substring at line 216 causes the emission of: type is (character(*)) 1 Error: Associate-name '__tmp_CHARACTER_0_1' at (1) is used as array Just retaining print *, 'character = ', v, '' allows the example to compile. OK, I'll take a look at it. As I am paid to fix certain bugs that prevent another piece of software from compiling, I will not prioritize working on 55901 as long as it is not needed in the software I focus on. Sorry for not being more enthusiastic, but there are more than 8 PRs (and only one down yet) I have to fix and time is limited. What I did not mention in the previous mail is that I also plan to implement these fixes in the fortran-dev branch with the new array descriptor. Given that there is no other volunteer. :-) Please continue commenting.
Regards, Andre ifort compiles and runs it successfully and so I think that it would be nice if gfortran catches up on this one. Parenthetically, I wonder if this is not the time to implement PR53971... including responding to Mikael's comment? Anyway, this is a good start in the right direction. Please persist! Thanks Paul On 8 December 2014 at 18:38, Andre Vehreschild ve...@gmx.de wrote: Hi all, please find attached a more elaborate patch for pr60255. I totally agree that my first attempt was just scratching the surface of the work needed. This patch also is *not* complete, but because I am really new to gfortran patching, I don't want to present a final patch only to learn then, that I have violated design rules, common practice or the like. Therefore please comment and direct me to any sources/ideas to improve the patch. Topic: The pr 60255 is about assigning a char array to an unlimited polymorphic entity. In the comments the concern about the lost length information is raised. The patch adds a _len component to the unlimited polymorphic entity (after _data and _vtab) and adds an assignment of the string length to _len when a string is pointer assigned to the unlimited poly entity. Furthermore is the intrinsic len(unlimited poly pointing to a string) resolved to give the _len component. Yet missing: - assign _len component back to deferred char array length component - transport length along chains of unlimited poly entities, i.e., a = b; c = a where all objects are unlimited poly and b is a string. - allocate() in this context Patch dependencies: none Comments, concerns, candy welcome! Regards, Andre On Sun, 17 Aug 2014 14:32:21 +0200 domi...@lps.ens.fr (Dominique Dhumieres) wrote: the testcase should check that the code generated is actually working, not just that the ICE disappeared. I agree. Note that there is a test in the comment 3 of PR60255 that can be used to check the run time behavior (and possibly check the vtab issue). 
Dominique -- Andre Vehreschild * Email: vehre ad gmx dot de -- Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen Tel.: +49 241 9291018 * Email: ve...@gmx.de
RE: [PATCH] Fix PR 61225
-----Original Message----- From: Jeff Law [mailto:l...@redhat.com] Sent: Tuesday, December 09, 2014 5:29 AM To: Zhenqiang Chen Cc: Steven Bosscher; gcc-patches@gcc.gnu.org; Jakub Jelinek Subject: Re: [PATCH] Fix PR 61225 On 12/04/14 01:43, Zhenqiang Chen wrote: Part of PR rtl-optimization/61225 * combine.c (refer_same_reg_p): New function. (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn. (try_combine): Add one more parameter TO_COMBINED_INSN, which is used to create a new insn parallel (TO_COMBINED_INSN, I3). testsuite/ChangeLog: 2014-08-04 Zhenqiang Chen zhenqiang.c...@linaro.org * gcc.target/i386/pr61225.c: New test. Thanks for the updates and clarifications. Just a few minor things and while it's a bit of a hack, I'll approve: + +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. + It returns TRUE, if reg1 == reg2, and no other refer of reg1 + except A and B. */ + +static bool +refer_same_reg_p (rtx_insn *a, rtx_insn *b) +{ + rtx seta = single_set (a); + rtx setb = single_set (b); + + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) + || !seta + || !setb) +return false; + if (GET_CODE (SET_SRC (seta)) != COMPARE + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC + || !REG_P (XEXP (SET_SRC (seta), 0)) + || XEXP (SET_SRC (seta), 1) != const0_rtx + || !REG_P (SET_SRC (setb)) + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) +return false; Do you need to verify SETA and SETB satisfy single_set? Or has that already been done elsewhere? A is NEXT_INSN (insn), B is prev_nonnote_nondebug_insn (insn). For I1 -> I2 -> B; I2 -> A; LOG_LINK can make sure I1 and I2 are single_set, but not A and B. And I did find code in function try_combine which can make sure B (or I3) is single_set. So I think the check can skip failed cases at an early stage. The name refer_same_reg_p seems wrong -- your function is verifying the underlying RTL store as well as the existence of a dependency between the insns. Can you try to come up with a better name?
Change it to can_reuse_cc_set_p. Please use CONST0_RTX (mode). IIRC that'll allow this to work regardless of the size of the modes relative to the host word size. Updated. + + if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2) +{ + df_ref use; + rtx insn; + unsigned int i = REGNO (SET_SRC (setb)); + + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) +{ + insn = DF_REF_INSN (use); + if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P (insn))) + return false; + } +} + + return true; +} Is this fragment really needed? Does it ever trigger? I'd think that for > 2 uses punting would be fine. Do we really commonly have cases with > 2 uses, but where they're all in SETA and SETB? The check is to ensure correctness. Here is a case, int f1 (int *x) { int t = --*x; if (!t) foo (x); return t; } _4 = *x_3(D); _5 = _4 + -1; *x_3(D) = _5; # DEBUG t = _5 if (_5 == 0) ... bb 4: return _5; _5 is used in return _5. So we cannot remove _5 = _4 + -1. } } + /* Try to combine a compare insn that sets CC +with a preceding insn that can set CC, and maybe with its +logical predecessor as well. +We need this special code because data flow connections +do not get entered in LOG_LINKS. */ + if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX + && refer_same_reg_p (insn, prev) + && max_combine >= 4) + { + struct insn_link *next1; + FOR_EACH_LOG_LINK (next1, prev) + { + rtx_insn *link1 = next1->insn; + if (NOTE_P (link1)) + continue; + /* I1 -> I2 -> I3; I2 -> insn; + output parallel (insn, I3). */ + FOR_EACH_LOG_LINK (nextlinks, link1) + if ((next = try_combine (prev, link1, + nextlinks->insn, NULL, + new_direct_jump_p, + last_combined_insn, insn)) != 0) + + { + delete_insn (insn); + insn = next; + statistics_counter_event (cfun, "four-insn combine", 1); + goto retry; + } + /* I2 -> I3; I2 -> insn + output next = parallel (insn, I3). */ + if ((next = try_combine (prev, link1, +NULL, NULL, +new_direct_jump_p, +
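For readers following the thread, here is a minimal standalone C illustration (my own sketch, not part of the patch) of the decrement-and-test shape this combine change targets: a single SET whose result feeds both a later use and a compare against zero, so the flag-setting arithmetic instruction can subsume the separate compare.

```c
#include <assert.h>

/* On targets whose arithmetic instructions set the condition codes
   (e.g. "subl $1, (%rdi)" on x86), the compare of `t' against zero
   below is redundant: the decrement already produced the flags.
   The combine change discussed here lets GCC reuse that CC set
   instead of emitting a separate test instruction.  */
int
dec_and_test (int *x)
{
  int t = --*x;   /* one SET: t and *x both get the decremented value */
  if (t == 0)     /* compare of the same value against zero */
    return 100;   /* cold path, like the foo (x) call in f1 above */
  return t;
}
```

Compiling this at -O2 on x86_64 before and after the patch shows the `testl`/`cmpl` of the result disappearing.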
Re: [testsuite patch] avoid test when compile options is conflict with default mthumb
On Dec 8, 2014, at 7:06 PM, Wang Deqiang wang.deqi...@linaro.org wrote: This is a ping for https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01049.html Seems reasonable enough. I was hoping the arm folks would chime in; we should have enough of them to review. Let's give them two more days, and if there are no comments, OK. If they do comment, then it seems reasonable to let them have the final say.
Re: locales fixes
On 08/12/14 23:53 +0100, François Dumont wrote: After having installed all necessary locales on my system I end up with 4 failures. Here is a patch to fix them all. Did you discover why only you are seeing failures? The whole testsuite passes for me, with glibc 2.18, including the tests you are fixing (i.e. I don't see any need for a fix). The changes to the glibc versions tests are definitely not correct.
Re: [PATCH] Fix broadcast from scalar patterns (PR target/63594)
On Tue, Dec 9, 2014 at 10:17 AM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Dec 09, 2014 at 09:49:17AM +0100, Uros Bizjak wrote: (define_insn mask_codeforavx512_vec_dup_gprmodemask_name - [(set (match_operand:VI48_AVX512VL 0 register_operand =v) - (vec_duplicate:VI48_AVX512VL - (match_operand:ssescalarmode 1 register_operand r)))] - TARGET_AVX512F (ssescalarmodemode != DImode || TARGET_64BIT) -{ - return vpbroadcastbcstscalarsuff\t{%1, %0mask_operand2|%0mask_operand2, %1}; -} - [(set_attr type ssemov) - (set_attr prefix evex) - (set_attr mode sseinsnmode)]) - -(define_insn mask_codeforavx512_vec_dup_memmodemask_name - [(set (match_operand:V48_AVX512VL 0 register_operand =v) - (vec_duplicate:V48_AVX512VL - (match_operand:ssescalarmode 1 nonimmediate_operand vm)))] + [(set (match_operand:V48_AVX512VL 0 register_operand =v,v) + (vec_duplicate:V48_AVX512VL + (match_operand:ssescalarmode 1 nonimmediate_operand vm,r)))] TARGET_AVX512F vsseintprefixbroadcastbcstscalarsuff\t{%1, %0mask_operand2|%0mask_operand2, %1} [(set_attr type ssemov) (set_attr prefix evex) - (set_attr mode sseinsnmode)]) + (set_attr mode sseinsnmode) + (set (attr enabled) + (if_then_else (eq_attr alternative 1) + (symbol_ref GET_MODE_CLASS (ssescalarmodemode) == MODE_INT + (ssescalarmodemode != DImode || TARGET_64BIT)) + (const_int 1)))]) I have no idea how to get rid of this enabled attribute, as it disables just one alternative. Creating a special isa attribute for it looks less readable and much more expensive. This enabled attribute is IMHO of the good kind, GET_MODE_CLASS (ssescalarmodemode) == MODE_INT and ssescalarmodemode != DImode are constants for the particular chosen mode, TARGET_64BIT doesn't change during compilation of a TU, and so the only variable is that it disables just one of the alternatives. IMO, this one is not problematic. 
@@ -16759,7 +16744,10 @@ (define_split [(set (match_operand:AVX2_VEC_DUP_MODE 0 register_operand) (vec_duplicate:AVX2_VEC_DUP_MODE (match_operand:ssescalarmode 1 register_operand)))] - TARGET_AVX2 && reload_completed && GENERAL_REG_P (operands[1]) + TARGET_AVX2 + && (!TARGET_AVX512VL + || (!TARGET_AVX512BW && GET_MODE_SIZE (ssescalarmodemode) > 2)) + && reload_completed && GENERAL_REG_P (operands[1]) [(const_int 0)] We would like to avoid a convoluted insn enable condition by moving the target-related complexity to the mode iterator, even if it requires an additional single-use mode iterator. In the ideal case, the remaining target-dependent condition would represent the baseline target for an insn, and all other target-related conditions would be inside the mode iterator. This splitter condition can be replaced with a mode iterator, but it results in big repetitions there (incremental diff below). Or do you have something else in mind? --- gcc/config/i386/sse.md 2014-12-08 13:26:06.505543457 +0100 +++ gcc/config/i386/sse.md 2014-12-08 13:26:06.505543457 +0100 @@ -16740,14 +16740,21 @@ (set_attr isa avx2,noavx2,avx2,noavx2) (set_attr mode sseinsnmode,V8SF,sseinsnmode,V8SF)]) +;; Modes handled by AVX2 vec_dup splitter. Assumes TARGET_AVX2, +;; if both -mavx512vl and -mavx512bw are used, disables all modes, +;; if just -mavx512vl, disables just V*SI.
+(define_mode_iterator AVX2_VEC_DUP_MODE_SPLIT + [(V32QI !TARGET_AVX512VL || !TARGET_AVX512BW) + (V16QI !TARGET_AVX512VL || !TARGET_AVX512BW) + (V16HI !TARGET_AVX512VL || !TARGET_AVX512BW) + (V8HI !TARGET_AVX512VL || !TARGET_AVX512BW) + (V8SI !TARGET_AVX512VL) (V4SI !TARGET_AVX512VL)]) + (define_split - [(set (match_operand:AVX2_VEC_DUP_MODE 0 register_operand) - (vec_duplicate:AVX2_VEC_DUP_MODE + [(set (match_operand:AVX2_VEC_DUP_MODE_SPLIT 0 register_operand) + (vec_duplicate:AVX2_VEC_DUP_MODE_SPLIT (match_operand:ssescalarmode 1 register_operand)))] - TARGET_AVX2 -(!TARGET_AVX512VL - || (!TARGET_AVX512BW GET_MODE_SIZE (ssescalarmodemode) 2)) -reload_completed GENERAL_REG_P (operands[1]) + TARGET_AVX2 reload_completed GENERAL_REG_P (operands[1]) [(const_int 0)] { emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]), Ugh, tough choice. Let's go with the original, but please add the comment, similar to the one above. Maybe also use ssecalarmodemode == SImode instead of GET_MODE_SIZE, it better matches the comment. Thanks, Uros.
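As an aside for readers less familiar with the RTL above, vec_duplicate (what the vpbroadcast* patterns implement) simply splats one scalar into every vector element. A portable model using GNU vector extensions (my illustration of the semantics, not the patterns' implementation):

```c
#include <assert.h>

/* GNU C vector of eight ints, the same shape as V8SI in the patterns.  */
typedef int v8si __attribute__ ((vector_size (32)));

/* Model of (vec_duplicate:V8SI (match_operand:SI ...)): broadcast one
   32-bit scalar into all eight lanes.  With -mavx2 GCC can emit this
   as a single vpbroadcastd, from a general register or from memory,
   which is exactly the alternative-selection being discussed here.  */
static v8si
splat_si (int x)
{
  return (v8si) { x, x, x, x, x, x, x, x };
}
```

On targets without AVX the same source still compiles; the vector is simply emulated element by element.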
Re: [PATCH, trivial] [AArch64] Remove declaration of removed function from aarch64-protos.h
On 09/12/14 08:59, Yangfei (Felix) wrote: The definition of function aarch64_function_profiler is removed since GCC-4.9. But the declaration is still there in aarch64-protos.h. So remove it. OK for the trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 218464) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-12-09 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64-protos.h (aarch64_function_profiler): Remove + declaration of removed function. + 2014-12-07 Felix Yang felix.y...@huawei.com Shanyao Chen chenshan...@huawei.com Index: gcc/config/aarch64/aarch64-protos.h === --- gcc/config/aarch64/aarch64-protos.h (revision 218464) +++ gcc/config/aarch64/aarch64-protos.h (working copy) @@ -247,7 +247,6 @@ void aarch64_expand_epilogue (bool); void aarch64_expand_mov_immediate (rtx, rtx); void aarch64_expand_prologue (void); void aarch64_expand_vector_init (rtx, rtx); -void aarch64_function_profiler (FILE *, int); void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx, const_tree, unsigned); void aarch64_init_expanders (void); OK. R.
[PATCH][libstdc++][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model
Hi all, There are a few libstdc++ tests that don't fit into memory when the -mcmodel=tiny option is used. For example, 25_algorithms/random_shuffle/1.cc is a 5.5M binary on aarch64 with -mcmodel=small. In the gcc and g++ testsuite we already catch such cases and mark them as UNSUPPORTED. In libstdc++.exp there is no such functionality. This patch adds that check. I've implemented the v3_check_unsupported_p predicate the same way as it's implemented in the gcc testsuite. I'm not very happy that it had to be copied, but I couldn't find a way to include the gcc definition sanely. With this patch 8 tests that were previously FAILing on aarch64-none-elf/-mcmodel=tiny due to relocation truncation errors are now marked as UNSUPPORTED. A test run on x86 didn't show any bad behaviour appearing. Is this a sane approach to what I'm trying to solve? Thanks, Kyrill 2014-12-03 Kyrylo Tkachov kyrylo.tkac...@arm.com * testsuite/lib/libstdc++.exp (${tool}_check_unsupported_p): New procedure. (v3_target_compile): Check if test is unsupported. (v3_target_compile_as_c): Likewise. diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp index 3d9913b..385ae07 100644 --- a/libstdc++-v3/testsuite/lib/libstdc++.exp +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp @@ -343,6 +343,22 @@ proc libstdc++_exit { } { } } +proc ${tool}_check_unsupported_p { output } { +if [regexp "(^|\n)\[^\n\]*: region \[^\n\]* is full" $output] { + return "memory full" +} +if { [regexp "(^|\n)\[^\n\]*: relocation truncated to fit" $output] + && [check_effective_target_tiny] } { +return "memory full" + } + +if { [istarget spu-*-*] \ + && [string match "*exceeds local store*" $output] } { + return "memory full" +} +return "" +} # Callback from system dg-test. proc libstdc++-dg-test { prog do_what extra_tool_flags } { # Set up the compiler flags, based on what we're going to do.
@@ -455,6 +471,7 @@ proc v3_target_compile { source dest type options } { global cxxldflags global includes global STATIC_LIBCXXFLAGS +global tool if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } { lappend options "libs=${gluefile}" @@ -483,7 +500,14 @@ proc v3_target_compile { source dest type options } { lappend options "compiler=$cxx_final" lappend options "timeout=[timeout_value]" -return [target_compile $source $dest $type $options] +set comp_output [target_compile $source $dest $type $options] +set unsupported_message [${tool}_check_unsupported_p $comp_output] + +if { $unsupported_message != "" } { + unsupported "$dest: $unsupported_message" + return +} +return $comp_output } @@ -498,6 +522,7 @@ proc v3_target_compile_as_c { source dest type options } { global cc global cxxflags global STATIC_LIBCXXFLAGS +global tool if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } { lappend options "libs=${gluefile}" @@ -551,7 +576,14 @@ proc v3_target_compile_as_c { source dest type options } { lappend options "compiler=$cc_final" lappend options "timeout=[timeout_value]" -return [target_compile $source $dest $type $options] +set comp_output [target_compile $source $dest $type $options] +set unsupported_message [${tool}_check_unsupported_p $comp_output] + +if { $unsupported_message != "" } { + unsupported "$dest: $unsupported_message" + return +} +return $comp_output } # Build the support objects linked in with the libstdc++ tests. In
Re: Fix libgomp crash without TLS (PR42616)
Instead of adding new macros in config/tls.m4, can we use something like that in libgomp: #if defined (HAVE_TLS) defined (USE_EMUTLS) (with GCC_CHECK_EMUTLS in libgomp/configure.ac)? 2014-12-08 19:28 GMT+03:00 Jakub Jelinek ja...@redhat.com: On Mon, Dec 08, 2014 at 07:01:09PM +0300, Varvara Rainchik wrote: Is it ok to add a GCC_CHECK_EMUTLS test in libgomp/configure.ac, or should I add in tls.m4 a similar test that would be used only in libgomp? I think config/tls.m4 would be a better place, but doing it in some way where the users of config/tls.m4 could actually choose what they want by using different macros from there (either check solely for real TLS, or only for emutls, or for both). And libgomp/configure.ac would then choose it is happy with both. Jakub
[patch] libstdc++/64203 fix shared_mutex for bare-metal targets
Tested x86_64-linux, committed to trunk and the 4.9 branch. commit 6e201632e6b99a8381ec973ae77a9b4e1db255d7 Author: Jonathan Wakely jwak...@redhat.com Date: Mon Dec 8 16:02:48 2014 +0000 PR libstdc++/64203 * include/std/shared_mutex: Fix preprocessor conditions. * testsuite/experimental/feat-cxx14.cc: Check conditions. diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex index 6405f10..c193eb2 100644 --- a/libstdc++-v3/include/std/shared_mutex +++ b/libstdc++-v3/include/std/shared_mutex @@ -36,10 +36,8 @@ #else #include <bits/c++config.h> -#if defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1) -# include <mutex> -# include <condition_variable> -#endif +#include <mutex> +#include <condition_variable> #include <bits/functexcept.h> namespace std _GLIBCXX_VISIBILITY(default) @@ -51,7 +49,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION * @{ */ -#if defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1) +#ifdef _GLIBCXX_USE_C99_STDINT_TR1 +#ifdef _GLIBCXX_HAS_GTHREADS #define __cpp_lib_shared_timed_mutex 201402 @@ -254,7 +253,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } }; -#endif // _GLIBCXX_HAS_GTHREADS && _GLIBCXX_USE_C99_STDINT_TR1 +#endif // _GLIBCXX_HAS_GTHREADS /// shared_lock template<typename _Mutex> @@ -393,6 +392,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION swap(shared_lock<_Mutex>& __x, shared_lock<_Mutex>& __y) noexcept { __x.swap(__y); } +#endif // _GLIBCXX_USE_C99_STDINT_TR1 + // @} group mutexes _GLIBCXX_END_NAMESPACE_VERSION } // namespace diff --git a/libstdc++-v3/testsuite/experimental/feat-cxx14.cc b/libstdc++-v3/testsuite/experimental/feat-cxx14.cc index 1c590dd..2cc31ca 100644 --- a/libstdc++-v3/testsuite/experimental/feat-cxx14.cc +++ b/libstdc++-v3/testsuite/experimental/feat-cxx14.cc @@ -106,10 +106,12 @@ # error shared_mutex #endif -#ifndef __cpp_lib_shared_timed_mutex -# error __cpp_lib_shared_timed_mutex -#elif __cpp_lib_shared_timed_mutex != 201402 -# error __cpp_lib_shared_timed_mutex != 201402 +#if
defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1) +# ifndef __cpp_lib_shared_timed_mutex +#error __cpp_lib_shared_timed_mutex +# elif __cpp_lib_shared_timed_mutex != 201402 +#error __cpp_lib_shared_timed_mutex != 201402 +# endif #endif #ifndef __cpp_lib_is_final
[PATCH] Fix PR64199
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-12-09 Richard Biener rguent...@suse.de PR middle-end/64199 * fold-const.c (fold_binary_loc): Use TREE_OVERFLOW_P. * gcc.dg/torture/pr64199.c: New testcase. Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 218479) +++ gcc/fold-const.c(working copy) @@ -10398,8 +10415,8 @@ fold_binary_loc (location_t loc, /* Don't introduce overflows through reassociation. */ if (!any_overflows - && ((lit0 && TREE_OVERFLOW (lit0)) - || (minus_lit0 && TREE_OVERFLOW (minus_lit0)))) + && ((lit0 && TREE_OVERFLOW_P (lit0)) + || (minus_lit0 && TREE_OVERFLOW_P (minus_lit0)))) return NULL_TREE; if (minus_lit0) Index: gcc/testsuite/gcc.dg/torture/pr64199.c === --- gcc/testsuite/gcc.dg/torture/pr64199.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr64199.c (working copy) @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-ffast-math -frounding-math" } */ + +float +foo (void) +{ + return 1.1f + 2.2f + 2.2f; +}
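The guard the patch touches ("Don't introduce overflows through reassociation") exists because regrouping terms can change a floating-point result. A small self-contained demonstration of that grouping sensitivity (my own example, separate from the TREE_OVERFLOW_P fix itself; it assumes IEEE-754 binary32 with per-operation rounding, i.e. no -ffast-math):

```c
#include <assert.h>

/* (a + b) + c and a + (b + c) can differ under IEEE-754 rounding.
   With a = 1e8f, b = -1e8f, c = 1.0f the left grouping is exact,
   while in the right grouping 1.0f is absorbed: float has a 24-bit
   significand and the spacing between floats near 1e8 is 8.0.  */
static float
sum_left (float a, float b, float c)
{
  return (a + b) + c;
}

static float
sum_right (float a, float b, float c)
{
  return a + (b + c);
}
```

This is why fold_binary_loc may only reassociate literal terms when it can prove nothing (overflow flags included) changes.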
Re: Fix libgomp crash without TLS (PR42616)
On Tue, Dec 09, 2014 at 02:49:44PM +0300, Varvara Rainchik wrote: Can we instead of adding new macroses in config/tls.m4 use something like that in libgomp: #if defined (HAVE_TLS) defined (USE_EMUTLS) (with GCC_CHECK_EMUTLS in libgomp/configure.ac)? That would be fine too. Jakub
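For context, the state behind the HAVE_TLS / USE_EMUTLS discussion is libgomp's per-thread data. A tiny sketch of the pattern the guard controls (hypothetical variable name, not libgomp's actual declaration):

```c
#include <assert.h>

/* When the configure check is satisfied, the library can declare
   per-thread state with __thread.  With native TLS the access is a
   register-relative load; with emutls the compiler routes the access
   through __emutls_get_address; the C source is identical, which is
   why checking "either real TLS or emutls" at configure time is
   enough to enable this code path.  */
static __thread int per_thread_counter;  /* hypothetical variable */

static int
bump (void)
{
  return ++per_thread_counter;
}
```

Each thread sees its own counter starting at zero; the PR42616 crash came from this declaration being compiled out even though emutls could have supported it.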
Re: [PATCH 3/n] OpenMP 4.0 offloading infrastructure: offload tables
On 04 Dec 20:52, Jakub Jelinek wrote: On Thu, Dec 04, 2014 at 10:35:19PM +0300, Ilya Verbin wrote: This issue can be resolved by forcing output of such variables. Is this fix ok? Should I add a testcase? Yes, with proper ChangeLog. Yes. Here is the updated patch, ok to commit? However, I don't see the -flto option in the build log. It seems that check_effective_target_lto isn't working inside the libgomp/ directory. Maybe because ENABLE_LTO is defined only in gcc/configure.ac ? gcc/ * varpool.c (varpool_node::get_create): Force output of vars with omp declare target attribute. libgomp/ * testsuite/libgomp.c/target-9.c: New test. diff --git a/gcc/varpool.c b/gcc/varpool.c index 0526b7f..db28c2a 100644 --- a/gcc/varpool.c +++ b/gcc/varpool.c @@ -175,6 +175,7 @@ varpool_node::get_create (tree decl) g->have_offload = true; if (!in_lto_p) vec_safe_push (offload_vars, decl); + node->force_output = 1; #endif } diff --git a/libgomp/testsuite/libgomp.c/target-9.c b/libgomp/testsuite/libgomp.c/target-9.c new file mode 100644 index 000..00fe0cb --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-9.c @@ -0,0 +1,37 @@ +/* { dg-do run } */ +/* { dg-options "-O1" } */ +/* { dg-additional-options "-flto" { target lto } } */ + +#include <stdlib.h> + +#define N 123456 + +#pragma omp declare target +int X, Y; +#pragma omp end declare target + +void +foo () +{ + #pragma omp target map(alloc: X) +X = N; +} + +int +main () +{ + int res; + + foo (); + + #pragma omp target map(alloc: X, Y) map(from: res) +{ + Y = N; + res = X + Y; +} + + if (res != N + N) +abort (); + + return 0; +} -- Ilya
[Android] Stop unconditional disabling of ifuncs for Bionic #2
Hi, Whether the *linux* target supports ifuncs or not is defined here: linux_has_ifunc_p (void) { return OPTION_BIONIC ? false : HAVE_GNU_INDIRECT_FUNCTION; } Bionic right now supports indirect functions, but there is no way to notify the compiler about that (For Android OPTION_BIONIC is always on and so configure time check with --enable-gnu-indirect-function does not work) The following patch makes the use of the default version of TARGET_HAS_IFUNC_P for *linux*, so we can decide whether the Android target supports indirect functions or not on configure time. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index f22bba8..d4d09d0 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2014-12-08 Alexander Ivchenko alexander.ivche...@intel.com + + * config/linux.c (linux_has_ifunc_p): Remove. + * config/linux.h (TARGET_HAS_IFUNC_P): Use default version. + 2014-12-08 Ilya Tocar ilya.to...@intel.com * config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi. diff --git a/gcc/config/linux.c b/gcc/config/linux.c index 6242e11..15df213 100644 --- a/gcc/config/linux.c +++ b/gcc/config/linux.c @@ -23,14 +23,6 @@ along with GCC; see the file COPYING3. If not see #include tm.h #include linux-protos.h -/* Android does not support GNU indirect functions. */ - -bool -linux_has_ifunc_p (void) -{ - return OPTION_BIONIC ? false : HAVE_GNU_INDIRECT_FUNCTION; -} - bool linux_libc_has_function (enum function_class fn_class) { diff --git a/gcc/config/linux.h b/gcc/config/linux.h index d38ef81..6ccacff 100644 --- a/gcc/config/linux.h +++ b/gcc/config/linux.h @@ -117,10 +117,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #else /* !uClinux, i.e., normal Linux */ -/* IFUNCs are supported by glibc, but not by uClibc or Bionic. */ -# undef TARGET_HAS_IFUNC_P -# define TARGET_HAS_IFUNC_P linux_has_ifunc_p - /* Determine what functions are present at the runtime; this includes full c99 runtime and sincos. 
*/ # undef TARGET_LIBC_HAS_FUNCTION Is the patch ok? Bootstrapped/regtested on x86_64-unknown-linux-gnu + checked that the behavior of i686-linux-android is correct. Alexander
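For readers unfamiliar with the mechanism being gated here: a GNU indirect function (IFUNC) is a symbol whose implementation is picked once, at load time, by a resolver. A plain-C model of that dispatch follows (my sketch with illustrative names; real IFUNCs use __attribute__((ifunc("resolver"))) and run the resolver inside the dynamic loader, which is exactly what Bionic historically could not do):

```c
#include <assert.h>

/* Two candidate implementations, e.g. a generic one and one that
   would exploit a CPU feature.  Names are illustrative only.  */
static int add_generic (int a, int b) { return a + b; }
static int add_fancy   (int a, int b) { return a + b; }

typedef int (*add_fn) (int, int);

/* The resolver: runs once and returns the address of the chosen
   implementation.  With real IFUNCs the dynamic linker stores the
   result in the GOT/PLT, so later calls pay no dispatch cost.  */
static add_fn
resolve_add (void)
{
  int cpu_has_feature = 0;   /* stand-in for a CPUID-style probe */
  return cpu_has_feature ? add_fancy : add_generic;
}

/* Cached function pointer emulating the one-time binding.  */
static int
add (int a, int b)
{
  static add_fn impl;
  if (!impl)
    impl = resolve_add ();
  return impl (a, b);
}
```

The patch's point is that this binding step is a loader capability, so it should be probed at configure time rather than hard-coded off for Bionic.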
[AArch64, NEON] Improve vmulX intrinsics
Hi, This patch converts more intrinsics to use builtin functions instead of the previous inline assembly syntax. Passed the glorious testsuite of Christophe Lyon. Three testcases are added for the testing of intrinsics which are not covered by the testsuite: gcc.target/aarch64/vmull_high.c gcc.target/aarch64/vmull_high_lane.c gcc.target/aarch64/vmull_high_n.c Regtested with aarch64-linux-gnu on QEMU. This patch has no regressions for the aarch64_be-linux-gnu big-endian target either. OK for the trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 218464) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,38 @@ +2014-12-09 Felix Yang felix.y...@huawei.com +Jiji Jiang jiangj...@huawei.com + + * config/aarch64/aarch64-simd.md (aarch64_mul_nmode, + aarch64_sumull_nmode, aarch64_sumullmode, + aarch64_simd_sumull2_nmode, aarch64_sumull2_nmode, + aarch64_sumull_lanemode, aarch64_sumull2_lanemode_internal, + aarch64_sumull_laneqmode, aarch64_sumull2_laneqmode_internal, + aarch64_smull2_lanemode, aarch64_umull2_lanemode, + aarch64_smull2_laneqmode, aarch64_umull2_laneqmode, + aarch64_fmulxmode, aarch64_fmulxmode, aarch64_fmulx_lanemode, + aarch64_pmull2v16qi, aarch64_pmullv8qi): New patterns. + * config/aarch64/aarch64-simd-builtins.def (vec_widen_smult_hi_, + vec_widen_umult_hi_, umull, smull, smull_n, umull_n, mul_n, smull2_n, + umull2_n, smull_lane, umull_lane, smull_laneq, umull_laneq, pmull, + umull2_lane, smull2_laneq, umull2_laneq, fmulx, fmulx_lane, pmull2, + smull2_lane): New builtins.
+ * config/aarch64/arm_neon.h (vmul_n_f32, vmul_n_s16, vmul_n_s32, + vmul_n_u16, vmul_n_u32, vmulq_n_f32, vmulq_n_f64, vmulq_n_s16, + vmulq_n_s32, vmulq_n_u16, vmulq_n_u32, vmull_high_lane_s16, + vmull_high_lane_s32, vmull_high_lane_u16, vmull_high_lane_u32, + vmull_high_laneq_s16, vmull_high_laneq_s32, vmull_high_laneq_u16, + vmull_high_laneq_u32, vmull_high_n_s16, vmull_high_n_s32, + vmull_high_n_u16, vmull_high_n_u32, vmull_high_p8, vmull_high_s8, + vmull_high_s16, vmull_high_s32, vmull_high_u8, vmull_high_u16, + vmull_high_u32, vmull_lane_s16, vmull_lane_s32, vmull_lane_u16, + vmull_lane_u32, vmull_laneq_s16, vmull_laneq_s32, vmull_laneq_u16, + vmull_laneq_u32, vmull_n_s16, vmull_n_s32, vmull_n_u16, vmull_n_u32, + vmull_p8, vmull_s8, vmull_s16, vmull_s32, vmull_u8, vmull_u16, + vmull_u32, vmulx_f32, vmulx_lane_f32, vmulxd_f64, vmulxq_f32, + vmulxq_f64, vmulxq_lane_f32, vmulxq_lane_f64, vmulxs_f32): Rewrite + using builtin functions. + * config/aarch64/iterators.md (UNSPEC_FMULX, UNSPEC_FMULX_LANE, + VDQF_Q): New unspec and int iterator. + 2014-12-07 Felix Yang felix.y...@huawei.com Shanyao Chen chenshan...@huawei.com Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c === --- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c (revision 0) @@ -0,0 +1,111 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + + +/* Expected results. 
*/ +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfc48, 0xfcbf, 0xfd36, 0xfdad, +0xfe24, 0xfe9b, 0xff12, 0xff89 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xf9a0, 0xfa28, +0xfab0, 0xfb38 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0xf7a2, +0xf83b }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xa4b0, 0xa55a, 0xa604, 0xa6ae, + 0xa758, 0xa802, 0xa8ac, 0xa956 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xbaf73c, 0xbaf7f7, + 0xbaf8b2, 0xbaf96d }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0xcbf4d8, + 0xcbf5a4}; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x6530, 0x659a, 0x6464, 0x64ce, + 0x6798, 0x6732, 0x66cc, 0x }; + +#ifndef INSN_NAME +#define INSN_NAME vmull_high +#define TEST_MSG VMUL_HIGH +#endif + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ +#define DECL_VMUL(T, W, N) \ + DECL_VARIABLE(vector1, T, W, N); \ + DECL_VARIABLE(vector2, T, W, N); + + /* vector_res = OP(vector1, vector2), then store the result. */ +#define TEST_VMULL_HIGH1(INSN, Q, T1, T2, W, N, W1, N1) \ + VECT_VAR(vector_res, T1, W1, N1) =\ +INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),
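For intuition about what the new tests exercise: vmull_high_s8 multiplies the high halves of two 128-bit vectors element-wise, widening each 8-bit product to 16 bits. A scalar reference model (my own, mirroring the operation's ACLE semantics, not GCC's implementation):

```c
#include <assert.h>

/* vmull_high_s8 (a, b): take lanes 8..15 of each 16 x int8 input,
   multiply pairwise, and produce eight widened int16 results.
   The widening matters: (-128) * (-128) = 16384 does not fit in
   an int8, but fits comfortably in an int16.  */
static void
vmull_high_s8_ref (const signed char a[16], const signed char b[16],
                   short r[8])
{
  for (int i = 0; i < 8; i++)
    r[i] = (short) ((int) a[8 + i] * (int) b[8 + i]);
}
```

The unsigned and polynomial variants differ only in the element interpretation (modular vs. carry-less multiplication for poly8).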
Re: match.pd tweaks for vectors and issues with HONOR_NANS
On Mon, Dec 8, 2014 at 3:45 PM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 18 Nov 2014, Richard Biener wrote: I'll try to replace some more TYPE_MODE during stage3... Btw, a convenience would be to be able to write HONOR_NANS (type) thus effectively make HONOR_* inline functions with a machine_mode and a type overload (and the type overload properly looking at element types). Making those functions inline is not easy, because real.h and tree.h don't include each other. Here is a version with the functions not inline. I was tempted to also overload on gcond const* (for the cases that call gimple_cond_lhs) but the arguments were always gimple and not gcond*, so I didn't. Passes bootstrap+testsuite on x86_64-linux-gnu. Ok. Thanks, Richard. 2014-12-08 Marc Glisse marc.gli...@inria.fr * real.h (HONOR_NANS): Replace macro with 3 overloaded declarations. * real.c: Include rtl.h and options.h. (HONOR_NANS): Define three overloads. * builtins.c (fold_builtin_classify, fold_builtin_unordered_cmp): Simplify argument of HONOR_NANS. * fold-const.c (combine_comparisons, fold_truth_not_expr, fold_cond_expr_with_comparison, merge_truthop_with_opposite_arm, fold_comparison, fold_binary_loc): Likewise. * ifcvt.c (noce_try_move, noce_try_minmax): Likewise. * ipa-inline-analysis.c (add_clause, set_cond_stmt_execution_predicate): Likewise. * match.pd: Likewise. * rtlanal.c (may_trap_p_1): Likewise. * simplify-rtx.c (simplify_const_relational_operation): Likewise. * tree-if-conv.c (parse_predicate): Likewise. * tree-ssa-ccp.c (valid_lattice_transition): Likewise. * tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise. * tree-ssa-phiopt.c (minmax_replacement, neg_replacement): Likewise. * tree-ssa-reassoc.c (eliminate_using_constants): Likewise. * tree-ssa-tail-merge.c (gimple_equal_p): Likewise. 
-- Marc Glisse Index: builtins.c === --- builtins.c (revision 218467) +++ builtins.c (working copy) @@ -9641,34 +9641,34 @@ fold_builtin_classify (location_t loc, t integer_minus_one_node, integer_one_node); tmp = fold_build3_loc (loc, COND_EXPR, integer_type_node, isinf_call, tmp, integer_zero_node); } return tmp; } case BUILT_IN_ISFINITE: - if (!HONOR_NANS (TYPE_MODE (TREE_TYPE (arg))) + if (!HONOR_NANS (arg) && !HONOR_INFINITIES (TYPE_MODE (TREE_TYPE (arg)))) return omit_one_operand_loc (loc, type, integer_one_node, arg); if (TREE_CODE (arg) == REAL_CST) { r = TREE_REAL_CST (arg); return real_isfinite (r) ? integer_one_node : integer_zero_node; } return NULL_TREE; case BUILT_IN_ISNAN: - if (!HONOR_NANS (TYPE_MODE (TREE_TYPE (arg)))) + if (!HONOR_NANS (arg)) return omit_one_operand_loc (loc, type, integer_zero_node, arg); if (TREE_CODE (arg) == REAL_CST) { r = TREE_REAL_CST (arg); return real_isnan (r) ? integer_one_node : integer_zero_node; } arg = builtin_save_expr (arg); return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg); @@ -9782,27 +9782,26 @@ fold_builtin_unordered_cmp (location_t l else if (code0 == REAL_TYPE && code1 == INTEGER_TYPE) cmp_type = type0; else if (code0 == INTEGER_TYPE && code1 == REAL_TYPE) cmp_type = type1; arg0 = fold_convert_loc (loc, cmp_type, arg0); arg1 = fold_convert_loc (loc, cmp_type, arg1); if (unordered_code == UNORDERED_EXPR) { - if (!HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) + if (!HONOR_NANS (arg0)) return omit_two_operands_loc (loc, type, integer_zero_node, arg0, arg1); return fold_build2_loc (loc, UNORDERED_EXPR, type, arg0, arg1); } - code = HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))) ? unordered_code - : ordered_code; + code = HONOR_NANS (arg0) ?
unordered_code : ordered_code; return fold_build1_loc (loc, TRUTH_NOT_EXPR, type, fold_build2_loc (loc, code, type, arg0, arg1)); } /* Fold __builtin_{,s,u}{add,sub,mul}{,l,ll}_overflow, either into normal arithmetics if it can never overflow, or into internal functions that return both result of arithmetics and overflowed boolean flag in a complex integer result, or some other check for overflow. */ static tree Index: fold-const.c === --- fold-const.c(revision 218467) +++ fold-const.c(working copy) @@ -2585,21 +2585,21 @@ compcode_to_comparison (enum comparison_ and RCODE on the
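The reason all these folds consult HONOR_NANS in the first place: identities such as x == x being true only hold when NaNs cannot occur. A minimal C demonstration of the property being guarded (my own example, unrelated to the patch's mechanics of overloading HONOR_NANS on types):

```c
#include <assert.h>
#include <math.h>

/* Without NaNs, x == x is a tautology and a compiler may fold it
   to 1; with NaNs honored the comparison must survive, because a
   NaN compares unequal to everything, including itself.  That is
   exactly the information HONOR_NANS (type) conveys per-type.  */
static int
self_equal (double x)
{
  return x == x;
}
```

Under -ffinite-math-only (which clears HONOR_NANS) GCC is entitled to fold this function to a constant 1.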
Re: [PATCH 2/3] Extended if-conversion
Richard, Here is an updated patch2 with the following changes: 1. Delete functions phi_has_two_different_args and find_insertion_point. 2. Use only one function for extended predication - predicate_extended_scalar_phi. 3. Save gsi before insertion of predicate computations for basic blocks if a bb has 2 predecessors and both incoming edges are critical, or it has more than 2 predecessors and at least one incoming edge is critical. Here is a motivating test case which explains this point. The test case is attached (t5.c) and it must be compiled with the -O2 -ftree-loop-vectorize -fopenmp options. The problem phi is in bb_7: bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) { <bb 5>: xmax_edge_18 = xmax_edge_36 + 1; if (xmax_17 == xmax_27) goto <bb 7>; else goto <bb 9>; } bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) { <bb 6>: if (xmax_17 == xmax_27) goto <bb 7>; else goto <bb 8>; } bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) { <bb 7>: # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> xmax_edge_19 = xmax_edge_39 + 1; goto <bb 11>; } Note that both incoming edges to bb_7 are critical. If we comment out restoring gsi in predicate_all_scalar_phi: #if 0 if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) gsi = bb_insert_point (bb); else #endif gsi = gsi_after_labels (bb); we will get an ICE: t5.c: In function 'foo': t5.c:9:6: error: definition in block 4 follows the use void foo (int n) ^ for SSA_NAME: _1 in statement: _52 = _1 & _3; t5.c:9:6: internal compiler error: verify_ssa failed since predicate computations were inserted in bb_7. The ChangeLog is
	(if_convertible_bb_p): Allow bb to have more than 2 predecessors
	if AGGRESSIVE_IF_CONV is true.
	(if_convertible_bb_p): Delete check that bb has at least one
	non-critical incoming edge.
	(is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
	Allow interchange of PHI arguments if EXTENDED is false.
	Change check that the block containing the reduction statement
	candidate is a predecessor of the phi-block, since the phi may have
	more than two arguments.
	(predicate_scalar_phi): Add new arguments for call of
	is_cond_scalar_reduction.
	(get_predicate_for_edge): New function.
	(struct phi_args_hash_traits): New type.
	(phi_args_hash_traits::hash): New function.
	(phi_args_hash_traits::equal_keys): New function.
	(gen_phi_arg_condition): New function.
	(predicate_extended_scalar_phi): New function.
	(predicate_all_scalar_phis): Add boolean variable EXTENDED and set
	it to true if the BB containing the phi has more than 2 predecessors
	or both incoming edges are critical. Invoke
	find_phi_replacement_condition and predicate_scalar_phi if EXTENDED
	is false. Use saved gsi if BB has 2 predecessors and both incoming
	edges are critical, or it has more than 2 predecessors and at least
	one incoming edge is critical. Use standard gsi_after_labels
	otherwise. Invoke predicate_extended_scalar_phi if EXTENDED is true.
	(insert_gimplified_predicates): Add bool variable
	EXTENDED_PREDICATION to save gsi before insertion of predicate
	computations. Set it to true for a BB with 2 predecessors and
	critical incoming edges, or when the number of predecessors is
	greater than 2 and at least one incoming edge is critical. Add check
	that a non-predicated block may have statements to insert. Insert
	predicate computation of BB just after the label if
	EXTENDED_PREDICATION is true.
	(tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV
	which is a copy of the inner or outer loop force_vectorize field.
2014-12-04 16:37 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I did simple change by saving gsi iterator for each bb that has critical edges by adding additional field to bb_predicate_s: typedef struct bb_predicate_s { /* The condition under which this basic block is executed. */ tree predicate; /* PREDICATE is gimplified, and the sequence of statements is recorded here, in order to avoid the duplication of computations that occur in previous conditions. See PR44483. */ gimple_seq predicate_gimplified_stmts; /* Insertion point for blocks having incoming critical edges. */ gimple_stmt_iterator gsi; } *bb_predicate_p; and this iterator is saved in insert_gimplified_predicates before insertion code for predicate computation. I checked that this fix works. Huh? I still wonder what the issue is with inserting everything after the PHI we predicate. Well, your updated patch will come with testcases for the testsuite that will
Re: [COMMITTED] [PING] [PATCH] [AArch64, NEON] More NEON intrinsics improvement
On 9 December 2014 at 03:26, Yangfei (Felix) felix.y...@huawei.com wrote: On 5 December 2014 at 18:44, Tejas Belagod tejas.bela...@arm.com wrote: +__extension__ static __inline float32x2_t __attribute__ +((__always_inline__)) +vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c) { + return __builtin_aarch64_fmav2sf (-__b, __c, __a); } + +__extension__ static __inline float32x4_t __attribute__ +((__always_inline__)) +vfmsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c) { + return __builtin_aarch64_fmav4sf (-__b, __c, __a); } + +__extension__ static __inline float64x2_t __attribute__ +((__always_inline__)) +vfmsq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c) { + return __builtin_aarch64_fmav2df (-__b, __c, __a); } + + Thanks, the patch looks good. Just one comment: You could also add float32x2_t vfms_n_f32(float32x2_t a, float32x2_t b, float32_t n) and its Q-variant. You can, if you wish, deal with Tejas' comment with a follow on patch rather than re-spinning this one. Provided this patch has no regressions on a big endian and a little endian test run then you can commit it. Thanks /Marcus No regressions for aarch64_be-linux-gnu target. Committed as r218484. Will come up with a new patch to deal with Tejas' comment. Thanks. My validations of trunk show that your new tests are incorrect: none of them compiles because the hfloat64_t type isn't defined. Also, keep in mind that the tests in this directory are executed by the aarch32 target too. Christophe
Re: [Patch] Improving jump-thread pass for PR 54742
On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey sell...@mips.com wrote: On Sat, 2014-12-06 at 19:21 +, Sebastian Pop wrote: I think it does not make sense to duplicate paths at -Os: I disabled the FSM jump-threading when optimizing for size like this.

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 29b20c8..ce70311 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -1335,8 +1335,9 @@ thread_through_normal_block (edge e,
       return 1;
     }
   if (!flag_expensive_optimizations
+      || optimize_function_for_size_p (cfun)
       || TREE_CODE (cond) != SSA_NAME
       || e->dest->loop_father != e->src->loop_father
       || loop_depth (e->dest->loop_father) == 0)
     return 0;

I will regstrap and commit the attached patch. Bootstrapped and regression tested on x86_64-linux. Committed r218451. Sebastian

Thanks for getting all this checked in Sebastian, I have tested it on coremark and I am getting the speed up that I expect. But I am a little confused about turning off jump threading. I am getting the optimization on coremark with -O3 and that is great, and if I use '-O3 -fno-expensive-optimizations' then I don't see this part of the jump threading, also great. But I was surprised that if I just used '-O3 -fno-thread-jumps' then I still see this optimization. Is that expected? Should this test also check flag_thread_jumps? Or should that be getting checked somewhere else?

-fthread-jumps is an RTL optimization flag and ignored on GIMPLE. Richard.

Steve Ellcey
Re: [RFC, PATCH, fortran] PR fortran/60255 Deferred character length
Dear Andre, The patch causes an ICE for the test gfortran.dg/unlimited_polymorphic_1.f03:

f951: internal compiler error: in gfc_add_component_ref, at fortran/class.c:236
f951: internal compiler error: Abort trap: 6
gfc: internal compiler error: Abort trap: 6 (program f951)
Abort

Reduced test for which the ICE is triggered by 'len(w)':

MODULE m
contains
  subroutine bar (arg, res)
    class(*) :: arg
    character(100) :: res
    select type (w => arg)
      type is (character(*))
        write (res, '(I2)') len(w)
    end select
  end subroutine
END MODULE

Note that with your patch at https://gcc.gnu.org/ml/fortran/2014-08/msg00022.html, I get the same ICE for Mikael's test at https://gcc.gnu.org/ml/fortran/2014-08/msg00055.html (before your patch for pr60255, it used to give a wrong length: 80 instead of 20 AFAICR). Note that the assert at fortran/class.c:236 is also triggered for pr61115. Thanks for working on these issues, Dominique

On 8 December 2014 at 18:38, Andre Vehreschild ve...@gmx.de wrote: Hi all, please find attached a more elaborate patch for pr60255. I totally agree that my first attempt was just scratching the surface of the work needed. This patch also is *not* complete, but because I am really new to gfortran patching, I don't want to present a final patch only to learn then that I have violated design rules, common practice or the like. Therefore please comment and direct me to any sources/ideas to improve the patch. Topic: The pr 60255 is about assigning a char array to an unlimited polymorphic entity. In the comments the concern about the lost length information is raised. The patch adds a _len component to the unlimited polymorphic entity (after _data and _vtab) and adds an assignment of the string length to _len when a string is pointer assigned to the unlimited poly entity. Furthermore, the intrinsic len(unlimited poly pointing to a string) is resolved to give the _len component.
Yet missing: - assign _len component back to deferred char array length component - transport length along chains of unlimited poly entities, i.e., a = b; c = a where all objects are unlimited poly and b is a string. - allocate() in this context Patch dependencies: none Comments, concerns, candy welcome! Regards, Andre
Re: [PATCH] Do not download packages for graphite loop optimizations by default when using ./contrib/download_prerequisites
On Tue, Dec 9, 2014 at 6:36 AM, Chung-Ju Wu jasonw...@gmail.com wrote: Hi, all, In the discussion thread last year: https://gcc.gnu.org/ml/gcc-patches/2013-05/msg01334.html I extended the script ./contrib/download_prerequisites so that it can download isl and cloog packages for graphite loop optimizations. The patch was proposed to use GRAPHITE_LOOP_OPT=no by default. However, the change I committed into trunk is setting GRAPHITE_LOOP_OPT=yes: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=199297 I am sorry about my carelessness and I would like to propose a new patch to fix it. The plaintext ChangeLog and patch are as follows:

I'd say keep it as =yes (now that we only need ISL) and adjust the comment instead. Richard.

Index: contrib/ChangeLog
===
--- contrib/ChangeLog (revision 218505)
+++ contrib/ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2014-12-09  Chung-Ju Wu  jasonw...@gmail.com
+
+	* download_prerequisites: Set GRAPHITE_LOOP_OPT=no by default.
+
 2014-12-04  Thomas Preud'homme  thomas.preudho...@arm.com

 	* check_GNU_style.sh: Warn for incorrect number of spaces in function

Index: contrib/download_prerequisites
===
--- contrib/download_prerequisites (revision 218505)
+++ contrib/download_prerequisites (working copy)
@@ -22,7 +22,7 @@
 # If you want to build GCC with the Graphite loop optimizations,
 # set GRAPHITE_LOOP_OPT=yes to download optional prerequisties
 # ISL Library and CLooG.
-GRAPHITE_LOOP_OPT=yes
+GRAPHITE_LOOP_OPT=no

 # Necessary to build GCC.
 MPFR=mpfr-2.4.2

Is this patch OK for trunk? Or we can still keep it yes since it has been using GRAPHITE_LOOP_OPT=yes for a long time. In that case, I will help to update the comment accordingly. Any comment? :) Best regards, jasonwucj
Re: [PATCH, x86] Fix pblendv expand.
The case comes from spec2006 403.gcc (or old GCC itself).

  for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
    {
      vd->e[i].mode = VOIDmode;
      vd->e[i].oldest_regno = i;
      vd->e[i].next_regno = INVALID_REGNUM;
    }

It is vectorized and only then completely peeled. Only after peeling all stored values become constant. Currently expand_vec_perm_pblendv works as follows: Let d.target, d.op0, d.op1 be permutation parameters. First we permute an operand (d.op1 or d.op0) and then blend it with the other argument:

d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, d.target) (if d.op0 is equal to d.target this is buggy)

The patch makes it more accurate:

tmp = gen_reg_rtx (vmode)
tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, tmp) (Here d.op0 can be equal to d.target)

Below is the rtl dump of the buggy case:

(183t.optimized)
...
vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34, 35 }, { 0, 4, 0, 1 }>;
vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, { 4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
...
(184r.expand)
...
(insn 205 204 206 (set (reg:V4SI 768) (const_vector:V4SI [ (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) ])) ../regrename.c:1171 -1 (nil)) (insn 206 205 208 (set (reg:V4SI 769) (mem/u/c:V4SI (symbol_ref/u:DI (*.LC28) [flags 0x2]) [3 S16 A128])) ../regrename.c:1171 -1 (expr_list:REG_EQUAL (const_vector:V4SI [ (const_int 32 [0x20]) (const_int 33 [0x21]) (const_int 34 [0x22]) (const_int 35 [0x23]) ]) (nil))) (insn 208 206 207 (set (reg:V4SI 770) (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768) (reg:V4SI 769)) (parallel [ (const_int 0 [0]) (const_int 4 [0x4]) (const_int 1 [0x1]) (const_int 5 [0x5]) ]))) ../regrename.c:1171 -1 (nil)) (insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ]) (vec_select:V4SI (reg:V4SI 770) (parallel [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 0 [0]) (const_int 2 [0x2]) ]))) ../regrename.c:1171 -1 (nil)) (insn 209 207 210 (set (reg:V4SI 771) (const_vector:V4SI [ (const_int -1 [0x]) (const_int -1 [0x]) (const_int -1 [0x]) (const_int -1 [0x]) ])) ../regrename.c:1171 -1 (nil)) (insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ]) (vec_select:V4SI (reg:V4SI 771) (parallel [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 0 [0]) (const_int 3 [0x3]) ]))) ../regrename.c:1171 -1 (nil)) (insn 211 210 212 (set (reg:V8HI 772) (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0) (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0) (const_int 48 [0x30]))) ../regrename.c:1171 -1 (nil)) ... On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak ubiz...@gmail.com wrote: The patch fix pblendv expand. The bug was uncovered when permutation operands are constants. In this case we init target register for expand_vec_perm_1 with constant and then rewrite the target with constant for expand_vec_perm_pblend. The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops -flto -march=corei7. Bootstrap and make check passed. Is it ok? Please add a testcase. 
Also, it surprises me that we enter expand_vec_perm_pblendv with uninitialized (?) target. Does your patch only paper over a real problem up the call chain (hard to say without a testcase)? Uros. Evgeny

2014-12-09  Evgeny Stupachenko  evstu...@gmail.com

gcc/
	* config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
	expand_vec_perm_1 target.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..5a914ad 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
     dcopy.op0 = dcopy.op1 = d->op1;
   else
     dcopy.op0 = dcopy.op1 = d->op0;
+  dcopy.target = gen_reg_rtx (vmode);
   dcopy.one_operand_p = true;

   for (i = 0; i < nelt; ++i)
Re: [AArch64, NEON] Improve vmulX intrinsics
On 9 December 2014 at 13:52, Jiangjiji jiangj...@huawei.com wrote: Hi, This patch converts more intrinsics to use builtin functions instead of the previous inline assembly syntax. Passed the glorious testsuite of Christophe Lyon. Three testcases are added for the testing of intrinsics which are not covered by the testsuite: gcc.target/aarch64/vmull_high.c gcc.target/aarch64/vmull_high_lane.c gcc.target/aarch64/vmull_high_n.c

As I said here: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01934.html I am in the process of converting my existing testsuite to GCC/Dejagnu. Please do not duplicate work.

Regtested with aarch64-linux-gnu on QEMU. This patch has no regressions for aarch64_be-linux-gnu big-endian target too. OK for the trunk?

Index: gcc/ChangeLog
===
--- gcc/ChangeLog (revision 218464)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,38 @@
+2014-12-09  Felix Yang  felix.y...@huawei.com
+	    Jiji Jiang  jiangj...@huawei.com
+
+	* config/aarch64/aarch64-simd.md (aarch64_mul_nmode,
+	aarch64_sumull_nmode, aarch64_sumullmode,
+	aarch64_simd_sumull2_nmode, aarch64_sumull2_nmode,
+	aarch64_sumull_lanemode, aarch64_sumull2_lanemode_internal,
+	aarch64_sumull_laneqmode, aarch64_sumull2_laneqmode_internal,
+	aarch64_smull2_lanemode, aarch64_umull2_lanemode,
+	aarch64_smull2_laneqmode, aarch64_umull2_laneqmode,
+	aarch64_fmulxmode, aarch64_fmulxmode, aarch64_fmulx_lanemode,
+	aarch64_pmull2v16qi, aarch64_pmullv8qi): New patterns.
+	* config/aarch64/aarch64-simd-builtins.def (vec_widen_smult_hi_,
+	vec_widen_umult_hi_, umull, smull, smull_n, umull_n, mul_n, smull2_n,
+	umull2_n, smull_lane, umull_lane, smull_laneq, umull_laneq, pmull,
+	umull2_lane, smull2_laneq, umull2_laneq, fmulx, fmulx_lane, pmull2,
+	smull2_lane): New builtins.
+	* config/aarch64/arm_neon.h (vmul_n_f32, vmul_n_s16, vmul_n_s32,
+	vmul_n_u16, vmul_n_u32, vmulq_n_f32, vmulq_n_f64, vmulq_n_s16,
+	vmulq_n_s32, vmulq_n_u16, vmulq_n_u32, vmull_high_lane_s16,
+	vmull_high_lane_s32, vmull_high_lane_u16, vmull_high_lane_u32,
+	vmull_high_laneq_s16, vmull_high_laneq_s32, vmull_high_laneq_u16,
+	vmull_high_laneq_u32, vmull_high_n_s16, vmull_high_n_s32,
+	vmull_high_n_u16, vmull_high_n_u32, vmull_high_p8, vmull_high_s8,
+	vmull_high_s16, vmull_high_s32, vmull_high_u8, vmull_high_u16,
+	vmull_high_u32, vmull_lane_s16, vmull_lane_s32, vmull_lane_u16,
+	vmull_lane_u32, vmull_laneq_s16, vmull_laneq_s32, vmull_laneq_u16,
+	vmull_laneq_u32, vmull_n_s16, vmull_n_s32, vmull_n_u16, vmull_n_u32,
+	vmull_p8, vmull_s8, vmull_s16, vmull_s32, vmull_u8, vmull_u16,
+	vmull_u32, vmulx_f32, vmulx_lane_f32, vmulxd_f64, vmulxq_f32,
+	vmulxq_f64, vmulxq_lane_f32, vmulxq_lane_f64, vmulxs_f32): Rewrite
+	using builtin functions.
+	* config/aarch64/iterators.md (UNSPEC_FMULX, UNSPEC_FMULX_LANE,
+	VDQF_Q): New unspec and int iterator.
+
 2014-12-07  Felix Yang  felix.y...@huawei.com
 	    Shanyao Chen  chenshan...@huawei.com

Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
===
--- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c (revision 0)
+++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c (revision 0)
@@ -0,0 +1,111 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+
+/* Expected results.
*/ +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfc48, 0xfcbf, 0xfd36, 0xfdad, +0xfe24, 0xfe9b, 0xff12, 0xff89 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xf9a0, 0xfa28, +0xfab0, 0xfb38 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0xf7a2, +0xf83b }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xa4b0, 0xa55a, 0xa604, 0xa6ae, + 0xa758, 0xa802, 0xa8ac, 0xa956 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xbaf73c, 0xbaf7f7, + 0xbaf8b2, 0xbaf96d }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0xcbf4d8, + 0xcbf5a4}; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x6530, 0x659a, 0x6464, 0x64ce, + 0x6798, 0x6732, 0x66cc, 0x }; + +#ifndef INSN_NAME +#define INSN_NAME vmull_high +#define TEST_MSG VMUL_HIGH +#endif + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ +#define DECL_VMUL(T, W, N) \ + DECL_VARIABLE(vector1, T, W, N);
[PATCH] Fix PR64193
The following patch restores proper CSE by FRE/PRE when doing optimistic value-numbering of stores/loads. In that case we may end up value-numbering a virtual operand to the only executable edge value of a PHI - but as the alias walker doesn't know this it fails to handle the PHI which results in inconsistent virtual operand being used for the entry in the hashtable. Since r173250 we refuse to value-number to sth non-VARYING in the 2nd iteration if the 1st iteration computed VARYING (which is correct). The fix is to tell the alias walker about this optimistic handling. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2014-12-09 Richard Biener rguent...@suse.de PR tree-optimization/64193 * tree-ssa-alias.c (walk_non_aliased_vuses): Add valueize parameter and valueize the VUSE before looking up the def stmt. * tree-ssa-alias.h (walk_non_aliased_vuses): Adjust prototype. * tree-ssa-sccvn.c (vn_reference_lookup_pieces): Pass vn_valueize to walk_non_aliased_vuses. (vn_reference_lookup): Likewise. * tree-ssa-dom.c (lookup_avail_expr): Pass NULL as valueize callback to walk_non_aliased_vuses. * gcc.dg/tree-ssa/ssa-fre-43.c: New testcase. Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 218479) +++ gcc/tree-ssa-alias.c(working copy) @@ -2632,12 +2632,18 @@ get_continuation_for_phi (gimple phi, ao If TRANSLATE returns NULL the walk continues and TRANSLATE is supposed to adjust REF and *DATA to make that valid. + VALUEIZE if non-NULL is called with the next VUSE that is considered + and return value is substituted for that. This can be used to + implement optimistic value-numbering for example. Note that the + VUSE argument is assumed to be valueized already. + TODO: Cache the vector of equivalent vuses per ref, vuse pair. 
*/

void *
walk_non_aliased_vuses (ao_ref *ref, tree vuse,
			void *(*walker)(ao_ref *, tree, unsigned int, void *),
			void *(*translate)(ao_ref *, tree, void *, bool),
+			tree (*valueize)(tree),
			void *data)
{
  bitmap visited = NULL;
@@ -2663,6 +2669,8 @@ walk_non_aliased_vuses (ao_ref *ref, tre
       else if (res != NULL)
 	break;

+      if (valueize)
+	vuse = valueize (vuse);
       def_stmt = SSA_NAME_DEF_STMT (vuse);
       if (gimple_nop_p (def_stmt))
 	break;
Index: gcc/tree-ssa-alias.h
===
--- gcc/tree-ssa-alias.h	(revision 218479)
+++ gcc/tree-ssa-alias.h	(working copy)
@@ -124,6 +124,7 @@ extern void *walk_non_aliased_vuses (ao_
 				     void *(*)(ao_ref *, tree, unsigned int, void *),
 				     void *(*)(ao_ref *, tree, void *, bool),
+				     tree (*)(tree),
 				     void *);
 extern unsigned int walk_aliased_vdefs (ao_ref *, tree,
 					bool (*)(ao_ref *, tree, void *),
Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c	(revision 218479)
+++ gcc/tree-ssa-sccvn.c	(working copy)
@@ -2160,7 +2176,8 @@ vn_reference_lookup_pieces (tree vuse, a
 	*vnresult =
 	  (vn_reference_t)walk_non_aliased_vuses (&r, vr1.vuse,
 						  vn_reference_lookup_2,
-						  vn_reference_lookup_3, &vr1);
+						  vn_reference_lookup_3,
+						  vn_valueize, &vr1);
       gcc_checking_assert (vr1.operands == shared_lookup_references);
     }
@@ -2212,7 +2229,8 @@ vn_reference_lookup (tree op, tree vuse,
       wvnresult =
 	(vn_reference_t)walk_non_aliased_vuses (&r, vr1.vuse,
 						vn_reference_lookup_2,
-						vn_reference_lookup_3, &vr1);
+						vn_reference_lookup_3,
+						vn_valueize, &vr1);
       gcc_checking_assert (vr1.operands == shared_lookup_references);
       if (wvnresult)
 	{
Index: gcc/tree-ssa-dom.c
===
--- gcc/tree-ssa-dom.c	(revision 218479)
+++ gcc/tree-ssa-dom.c	(working copy)
@@ -2635,7 +2635,7 @@ lookup_avail_expr (gimple stmt, bool ins
 	  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
 	  && (ao_ref_init (&ref, gimple_assign_rhs1 (stmt)), true)
 	  && walk_non_aliased_vuses (&ref, vuse2,
-				     vuse_eq, NULL, &vuse1) != NULL))
+
[PATCH] Fix PR42108
The following finally fixes PR42108 (well, hopefully...) by using range-information on SSA names to allow the integer divisions introduced by Fortran array lowering to be hoisted out of loops, thus detecting them as not trapping. I chose to enhance tree_single_nonzero_warnv_p for this and adjusted operation_could_trap_helper_p to use this helper. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard.

2014-12-09  Richard Biener  rguent...@suse.de

	PR tree-optimization/42108
	* fold-const.c (tree_single_nonzero_warnv_p): Use range
	information associated with SSA names.
	* tree-eh.c (operation_could_trap_helper_p): Use
	tree_single_nonzero_warnv_p to check for trapping non-fp
	operation.
	* tree-vrp.c (remove_range_assertions): Remove bogus assert.

	* gfortran.dg/pr42108.f90: Adjust testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 218479)
+++ gcc/fold-const.c	(working copy)
@@ -83,6 +83,8 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "generic-match.h"
 #include "optabs.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"

 /* Nonzero if we are folding constants inside an initializer; zero otherwise.
*/
@@ -15362,6 +15381,26 @@ tree_single_nonzero_warnv_p (tree t, boo
       }
       break;

+    case SSA_NAME:
+      if (INTEGRAL_TYPE_P (TREE_TYPE (t)))
+	{
+	  wide_int minv, maxv;
+	  enum value_range_type rtype = get_range_info (t, &minv, &maxv);
+	  if (rtype == VR_RANGE)
+	    {
+	      if (wi::lt_p (maxv, 0, TYPE_SIGN (TREE_TYPE (t)))
+		  || wi::gt_p (minv, 0, TYPE_SIGN (TREE_TYPE (t))))
+		return true;
+	    }
+	  else if (rtype == VR_ANTI_RANGE)
+	    {
+	      if (wi::le_p (minv, 0, TYPE_SIGN (TREE_TYPE (t)))
+		  && wi::ge_p (maxv, 0, TYPE_SIGN (TREE_TYPE (t))))
+		return true;
+	    }
+	}
+      break;
+
     default:
       break;
     }
Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c	(revision 218479)
+++ gcc/tree-eh.c	(working copy)
@@ -2440,13 +2440,16 @@ operation_could_trap_helper_p (enum tree
     case ROUND_MOD_EXPR:
     case TRUNC_MOD_EXPR:
     case RDIV_EXPR:
-      if (honor_snans || honor_trapv)
-	return true;
-      if (fp_operation)
-	return flag_trapping_math;
-      if (!TREE_CONSTANT (divisor) || integer_zerop (divisor))
-	return true;
-      return false;
+      {
+	if (honor_snans || honor_trapv)
+	  return true;
+	if (fp_operation)
+	  return flag_trapping_math;
+	bool sop;
+	if (!tree_single_nonzero_warnv_p (divisor, &sop))
+	  return true;
+	return false;
+      }

     case LT_EXPR:
     case LE_EXPR:
Index: gcc/testsuite/gfortran.dg/pr42108.f90
===
--- gcc/testsuite/gfortran.dg/pr42108.f90	(revision 218479)
+++ gcc/testsuite/gfortran.dg/pr42108.f90	(working copy)
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O2 -fdump-tree-fre1" }
+! { dg-options "-O2 -fdump-tree-fre1 -fdump-tree-lim1-details" }

 subroutine eval(foo1,foo2,foo3,foo4,x,n,nnd)
   implicit real*8 (a-h,o-z)
@@ -21,7 +21,10 @@ subroutine eval(foo1,foo2,foo3,foo4,x,n
   end do
 end subroutine eval

-! There should be only one load from n left
+! There should be only one load from n left and the division should
+! be hoisted out of the loop
 ! { dg-final { scan-tree-dump-times "\\*n_" 1 "fre1" } }
+! { dg-final { scan-tree-dump-times "Moving statement" 5 "lim1" } }
 ! { dg-final { cleanup-tree-dump "fre1" } }
+!
{ dg-final { cleanup-tree-dump lim1 } } Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 218479) +++ gcc/tree-vrp.c (working copy) @@ -6866,12 +6866,9 @@ remove_range_assertions (void) tree lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); tree var; - tree cond = fold (ASSERT_EXPR_COND (rhs)); use_operand_p use_p; imm_use_iterator iter; - gcc_assert (cond != boolean_false_node); - var = ASSERT_EXPR_VAR (rhs); gcc_assert (TREE_CODE (var) == SSA_NAME);
Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling
On Mon, 8 Dec 2014, Ilya Verbin wrote: Hi, On 28 Nov 09:36, Richard Biener wrote: On Fri, 28 Nov 2014, Ilya Verbin wrote: I found a bug here, have_{lto,offload} must be set if at least one file contains lto/offload sections, but currently they are overwritten by the last file. Fix is bootstrapped and regtested on x86_64-linux. OK for trunk? Ok. Unfortunately, this fix was not general enough. There might be cases when mixed object files get into lto-wrapper, ie some of them contain only LTO sections, some contain only offload sections, and some contain both. But when lto-wrapper will pass all these files to recompilation, the compiler might crash (it depends on the order of input files), since in read_cgraph_and_symbols it expects that *all* input files contain IR section of given type. This patch splits input objects from argv into lto_argv and offload_argv, so that all files in arrays contain corresponding IR. Similarly, in lto-plugin, it was bad idea to add objects, which contain offload IR without LTO, to claimed_files, since this may corrupt a resolution file. Tested on various combinations of files with/without -flto and with/without offload, using trunk ld and gold, also tested on ld without plugin support. Bootstrap and make check passed on x86_64-linux and i686-linux. Ok for trunk? Did you check that bootstrap-lto still works? Ok if so. Thanks, Richard. But I am unable to run lto-bootstrap on trunk, it says: /x/build-x86_64-linux/./prev-gcc/gcc-ar -B/x/build-x86_64-linux/./prev-gcc/ cru libdecnumber.a decNumber.o decContext.o decimal32.o decimal64.o decimal128.o bid2dpd_dpd2bid.o host-ieee32.o host-ieee64.o host-ieee128.o /usr/bin/ar terminated with signal 11 [Segmentation fault], core dumped make[4]: *** [libdecnumber.a] Error 1 gcc/ * lto-wrapper.c (compile_offload_image): Start processing in_argv from 0 instead of 1. (run_gcc): Put offload objects into offload_argv, put LTO objects and possible preceding arguments into lto_argv. 
	Pass offload_argv to compile_images_for_offload_targets instead of
	argv. Use lto_argv for LTO recompilation instead of argv.
lto-plugin/
	* lto-plugin.c (offload_files, num_offload_files): New static
	variables.
	(free_1): Use arguments instead of global variables.
	(free_2): Free offload_files.
	(all_symbols_read_handler): Add names from offload_files to
	lto-wrapper arguments.
	(claim_file_handler): Do not add file to claimed_files if it
	contains offload sections without LTO sections. Add it to
	offload_files instead.

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 0f69457..f75c0dc 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -669,7 +669,7 @@ compile_offload_image (const char *target, const char *compiler_path,
   obstack_ptr_grow (&argv_obstack, filename);

   /* Append names of input object files.  */
-  for (unsigned i = 1; i < in_argc; i++)
+  for (unsigned i = 0; i < in_argc; i++)
     obstack_ptr_grow (&argv_obstack, in_argv[i]);

   /* Append options from offload_lto sections.  */
@@ -883,6 +883,8 @@ run_gcc (unsigned argc, char *argv[])
   int new_head_argc;
   bool have_lto = false;
   bool have_offload = false;
+  unsigned lto_argc = 0, offload_argc = 0;
+  char **lto_argv, **offload_argv;

   /* Get the driver and options.  */
   collect_gcc = getenv ("COLLECT_GCC");
@@ -896,6 +898,11 @@ run_gcc (unsigned argc, char *argv[])
 					&decoded_options,
 					&decoded_options_count);

+  /* Allocate arrays for input object files with LTO or offload IL,
+     and for possible preceding arguments.  */
+  lto_argv = XNEWVEC (char *, argc);
+  offload_argv = XNEWVEC (char *, argc);
+
   /* Look at saved options in the IL files.
*/
   for (i = 1; i < argc; ++i)
     {
@@ -918,17 +925,27 @@ run_gcc (unsigned argc, char *argv[])
 	}
       fd = open (argv[i], O_RDONLY);
       if (fd == -1)
-	continue;
+	{
+	  lto_argv[lto_argc++] = argv[i];
+	  continue;
+	}
+
+      if (find_and_merge_options (fd, file_offset, LTO_SECTION_NAME_PREFIX,
+				  &fdecoded_options, &fdecoded_options_count,
+				  collect_gcc))
+	{
+	  have_lto = true;
+	  lto_argv[lto_argc++] = argv[i];
+	}
+
+      if (find_and_merge_options (fd, file_offset, OFFLOAD_SECTION_NAME_PREFIX,
+				  &offload_fdecoded_options,
+				  &offload_fdecoded_options_count, collect_gcc))
+	{
+	  have_offload = true;
+	  offload_argv[offload_argc++] = argv[i];
+	}

-      have_lto
-	|= find_and_merge_options (fd, file_offset, LTO_SECTION_NAME_PREFIX,
-				   &fdecoded_options, &fdecoded_options_count,
-
Re: [PATCH 8/9] Negative numbers added for sreal class.
On Mon, Dec 8, 2014 at 5:52 PM, Martin Liška mli...@suse.cz wrote: On 11/28/2014 10:32 AM, Richard Biener wrote: On Thu, Nov 27, 2014 at 6:08 PM, Martin Liška mli...@suse.cz wrote: On 11/21/2014 04:21 PM, Martin Liška wrote: On 11/21/2014 04:02 PM, Richard Biener wrote: On Fri, Nov 21, 2014 at 3:39 PM, Martin Liška mli...@suse.cz wrote: Hello. Ok, this is simplified, one can use sreal a = 12345 and it works ;)

that's a new API, right? There is no max () and I think that using LONG_MIN here is asking for trouble (host dependence). The comment in the file says the max should be sreal (SREAL_MAX_SIG, SREAL_MAX_EXP) and the min sreal (-SREAL_MAX_SIG, SREAL_MAX_EXP)?

Sure, sreal can store much bigger (smaller) numbers :)

Where do you need sreal::to_double? The host shouldn't perform double calculations so it can be only for dumping? In which case the user should have used sreal::dump (), maybe with extra arguments.

That new function was a request from Honza, only for debugging purposes. I agree that dump should do this kind of job. If there are no other problems, I will run tests once more and commit it. Thanks, Martin

-#define SREAL_MAX_EXP (INT_MAX / 4)
+#define SREAL_MAX_EXP (INT_MAX / 8)

this change doesn't look necessary anymore? Btw, it's also odd that...

#define SREAL_PART_BITS 32
...
#define SREAL_MIN_SIG ((uint64_t) 1 << (SREAL_PART_BITS - 1))
#define SREAL_MAX_SIG (((uint64_t) 1 << SREAL_PART_BITS) - 1)

thus all m_sig values fit in 32bits but we still use a uint64_t m_sig ... (the implementation uses 64bit for internal computations, but still the storage is wasteful?) Of course the way normalize() works requires that storage to be 64bits to store unnormalized values. I'd say ok with the SREAL_MAX_EXP change reverted.

Hi. You are right, this change was done because I used one bit for m_negative (bitfield), not needed any more. Final version attached. Thank you, Martin

Thanks, Richard. Otherwise looks good to me and sorry for not noticing the above earlier. Thanks, Richard.
Thanks, Martin

 };

 extern void debug (sreal &ref);
@@ -76,12 +133,12 @@ inline sreal operator+= (sreal &a, const sreal &b)

 inline sreal operator-= (sreal &a, const sreal &b)
 {
-return a = a - b;
+  return a = a - b;
 }

 inline sreal operator/= (sreal &a, const sreal &b)
 {
-return a = a / b;
+  return a = a / b;
 }

 inline sreal operator*= (sreal &a, const sreal &b)
--
2.1.2

Hello. After IRC discussions, I decided to give sreal another refactoring where I use int64_t for m_sig. This approach looks much easier and more straightforward. I would like to ask folks for comments.

I think you want to exclude LONG_MIN (how do you know that LONG_MIN is min(int64_t)?) as a valid value for m_sig - after all SREAL_MIN_SIG makes it symmetric. Or simply use abs_hwi at

Hello. I decided to use abs_hwi.

That will ICE if you do sreal x (- __LONG_MAX__ - 1); maybe that's the only case though.

 sreal::normalize ()
 {
+  int64_t s = m_sig < 0 ? -1 : 1;
+  HOST_WIDE_INT sig = abs_hwi (m_sig);
+
   if (m_sig == 0)
...
   }
+
+  m_sig = s * sig;
 }

it's a bit awkward to strip the sign and then put it back on this way. Also now using a signed 'sig' where it was unsigned before. And keeping the first test using m_sig instead of sig. I'd simply have used 'unsigned HOST_WIDE_INT sig = absu_hwi (m_sig);' instead. The rest of the patch is ok with the above change. Thanks, Richard.

+  int64_t s = m_sig < 0 ? -1 : 1;
+  uint64_t sig = m_sig == LONG_MIN ? LONG_MAX : std::abs (m_sig);

-#define SREAL_MIN_SIG ((uint64_t) 1 << (SREAL_PART_BITS - 1))
-#define SREAL_MAX_SIG (((uint64_t) 1 << SREAL_PART_BITS) - 1)
+#define SREAL_MIN_SIG ((uint64_t) 1 << (SREAL_PART_BITS - 2))
+#define SREAL_MAX_SIG (((uint64_t) 1 << (SREAL_PART_BITS - 1)) - 1)

shouldn't this also be -2 in the last line? It's correct: in the first line, I s/'SREAL_PART_BITS - 1'/'SREAL_PART_BITS - 2' and the second one is also decremented: s/'SREAL_PART_BITS'/'SREAL_PART_BITS - 1'. That is, you effectively use the MSB as a sign bit? Yes. It uses a signed integer with the MSB as a sign bit.
Btw, a further change would be to make m_sig 'signed int', matching the real storage requirements according to SREAL_PART_BITS. This would of course still require temporaries used for computation to be 64 bits, and it would require normalization to work on the temporaries. But then we'd get down to 8 bytes of storage ... Agree, we can shrink the size in the future. I've been running tests; I hope this implementation is much nicer than having bool m_negative. What do you think about acceptance into trunk? Thanks, Martin Richard. I am able to run profiled bootstrap on x86_64-linux-pc and ppc64-linux-pc and no new regression is introduced. Thanks, Martin
[PATCH, rtl-optimization]: Fix PR64213 by reverting fix for PR64037
Hello! The fix for PR64037 [1] introduced severe problems, including bootstrap failure on alphaev6-linux-gnu. Non-bootstrapped testsuite results [2] can be compared with bootstrapped gcc with a revert [3]. As discussed in the PR, the problem was in a missed SI->DI zero_extend. [1] https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03405.html [2] https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01068.html [3] https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01069.html The patch fixes the regression by reverting the above patch. 2014-12-09 Uros Bizjak ubiz...@gmail.com PR bootstrap/64213 Revert: 2014-11-28 H.J. Lu hongjiu...@intel.com PR rtl-optimization/64037 * combine.c (setup_incoming_promotions): Pass the argument before any promotions happen to promote_function_mode. * configure.tgt (x86_64): Tune -m32 multilib to generic. testsuite/ChangeLog: 2014-12-09 Uros Bizjak ubiz...@gmail.com PR bootstrap/64213 Revert: 2014-11-28 H.J. Lu hongjiu...@intel.com PR rtl-optimization/64037 * g++.dg/pr64037.C: New test. Patch was bootstrapped and regression tested on alphaev68-linux-gnu. Approved by Richi in the PR audit trail, will be committed to mainline SVN, 4.9 and 4.8 branch. Uros. Index: combine.c === --- combine.c (revision 218160) +++ combine.c (revision 218161) @@ -1561,8 +1561,8 @@ setup_incoming_promotions (rtx_insn *first) uns3 = TYPE_UNSIGNED (DECL_ARG_TYPE (arg)); /* The mode and signedness of the argument as it is actually passed, - after any TARGET_PROMOTE_FUNCTION_ARGS-driven ABI promotions. */ - mode3 = promote_function_mode (DECL_ARG_TYPE (arg), mode2, &uns3, + see assign_parm_setup_reg in function.c. */ + mode3 = promote_function_mode (TREE_TYPE (arg), mode1, &uns1, TREE_TYPE (cfun->decl), 0); /* The mode of the register in which the argument is being passed. */
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 18/11/14 11:28, Yangfei (Felix) wrote: On 11/18/2014 11:48 AM, Yangfei (Felix) wrote: +(define_expand "doloop_end" + [(use (match_operand 0 "")) ; loop pseudo + (use (match_operand 1 ""))] ; label + +{ + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + if (optimize > 0 && flag_modulo_sched) You'd be better off moving this condition into the expansion predicate (which is currently ""). This short-circuits a lot of unnecessary work. See pass_rtl_doloop::gate. r~ Yeah, that's a good idea. See my updated patch :-) Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.md (doloop_end): New pattern. + * config/aarch64/aarch64.c (TARGET_CAN_USE_DOLOOP_P): Implement. + 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md (revision 217394) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -4087,6 +4087,43 @@ [(set_attr "type" "mrs")]) +;; Define the subtract-one-and-jump insns so loop.c +;; knows what to generate. +(define_expand "doloop_end" + [(use (match_operand 0 "")) ; loop pseudo + (use (match_operand 1 ""))] ; label + "optimize > 0 && flag_modulo_sched" +{ + rtx s0; + rtx bcomp; + rtx loc_ref; + rtx cc_reg; + rtx insn; + rtx cmp; + + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ???
The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + + if (GET_MODE (operands[0]) != DImode) +FAIL; + + s0 = operands [0]; + insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1))); + + cmp = XVECEXP (PATTERN (insn), 0, 0); + cc_reg = SET_DEST (cmp); + bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx); + loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]); + emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, + gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp, +loc_ref, pc_rtx))); + DONE; +}) + ;; AdvSIMD Stuff (include aarch64-simd.md) Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c(revision 217394) +++ gcc/config/aarch64/aarch64.c(working copy) @@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ aarch64_use_by_pieces_infrastructure_p +#undef TARGET_CAN_USE_DOLOOP_P +#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost + struct gcc_target targetm = TARGET_INITIALIZER; #include gt-aarch64.h Hi Felix, This patch causes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64240 when sms-3 is tested with -fPIC. It runs fine when I reverse this patch out. Please could you have a look? Thanks, Tejas.
Re: Fix libgomp crash without TLS (PR42616)
Ok, the following patch works for Android: 2014-12-09 Varvara Rainchik varvara.rainc...@intel.com * config.h.in: Regenerate. * configure: Regenerate. * configure.ac: Add GCC_CHECK_EMUTLS. * libgomp.h: Add check for USE_EMUTLS: this case is equal to HAVE_TLS. * team.c: Likewise. -- diff --git a/libgomp/configure.ac b/libgomp/configure.ac index cea6366..16ec158 100644 --- a/libgomp/configure.ac +++ b/libgomp/configure.ac @@ -245,6 +245,9 @@ fi # See if we support thread-local storage. GCC_CHECK_TLS +# See if we have emulated thread-local storage. +GCC_CHECK_EMUTLS + # See what sort of export controls are available. LIBGOMP_CHECK_ATTRIBUTE_VISIBILITY LIBGOMP_CHECK_ATTRIBUTE_DLLEXPORT diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index a1482cc..b694356 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -471,7 +471,7 @@ enum gomp_cancel_kind /* ... and here is that TLS data. */ -#ifdef HAVE_TLS +#if defined HAVE_TLS || defined USE_EMUTLS extern __thread struct gomp_thread gomp_tls_data; static inline struct gomp_thread *gomp_thread (void) { diff --git a/libgomp/team.c b/libgomp/team.c index e6a6d8f..594127c 100644 --- a/libgomp/team.c +++ b/libgomp/team.c @@ -37,7 +37,7 @@ pthread_key_t gomp_thread_destructor; /* This is the libgomp per-thread data structure. 
*/ -#ifdef HAVE_TLS +#if defined HAVE_TLS || defined USE_EMUTLS __thread struct gomp_thread gomp_tls_data; #else pthread_key_t gomp_tls_key; @@ -70,7 +70,7 @@ gomp_thread_start (void *xdata) void (*local_fn) (void *); void *local_data; -#ifdef HAVE_TLS +#if defined HAVE_TLS || defined USE_EMUTLS thr = &gomp_tls_data; #else struct gomp_thread local_thr; @@ -916,7 +916,7 @@ gomp_team_end (void) static void __attribute__((constructor)) initialize_team (void) { -#ifndef HAVE_TLS +#if !defined HAVE_TLS && !defined USE_EMUTLS static struct gomp_thread initial_thread_tls_data; pthread_key_create (&gomp_tls_key, NULL); Changes are bootstrapped and regtested on linux, all make check tests now also pass with --disable-tls. Is it ok for upstream? 2014-12-09 15:12 GMT+03:00 Jakub Jelinek ja...@redhat.com: On Tue, Dec 09, 2014 at 02:49:44PM +0300, Varvara Rainchik wrote: Can we instead of adding new macros in config/tls.m4 use something like that in libgomp: #if defined (HAVE_TLS) || defined (USE_EMUTLS) (with GCC_CHECK_EMUTLS in libgomp/configure.ac)? That would be fine too. Jakub
Re: [PATCH] Update libgcc.texi to match implementation in libgcc/libgcc2.c
On Mon, Oct 27, 2014 at 8:19 PM, Kito Cheng kito.ch...@gmail.com wrote: This patch updates the `Bit operations` section in libgcc.texi: most bit operation functions take an unsigned integer instead of a signed integer in libgcc/libgcc2.c [1], and it seems to make more sense :) ChangeLog 2014-10-28 Kito Cheng k...@0xlab.org * doc/libgcc.texi: Update text to match implementation in libgcc/libgcc2.c [1] https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=libgcc/libgcc2.c;h=46d6a2ef030ff98a944935f77529cee9c05fb326;hb=HEAD This is OK. Thanks. Ian
Re: [PATCH 2/3] Extended if-conversion
On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Here is the updated patch2 with the following changes: 1. Delete functions phi_has_two_different_args and find_insertion_point. 2. Use only one function for extended predication - predicate_extended_scalar_phi. 3. Save gsi before insertion of predicate computations for basic blocks if it has 2 predecessors and both incoming edges are critical or it has more than 2 predecessors and at least one incoming edge is critical. This saved iterator can be used by extended phi predication. Here is a motivating test case which explains this point. The test case is attached (t5.c) and it must be compiled with -O2 -ftree-loop-vectorize -fopenmp options. The problem phi is in bb-7: bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) { <bb 5>: xmax_edge_18 = xmax_edge_36 + 1; if (xmax_17 == xmax_27) goto <bb 7>; else goto <bb 9>; } bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) { <bb 6>: if (xmax_17 == xmax_27) goto <bb 7>; else goto <bb 8>; } bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) { <bb 7>: # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> xmax_edge_19 = xmax_edge_39 + 1; goto <bb 11>; } Note that both incoming edges to bb_7 are critical. If we comment out restoring gsi in predicate_all_scalar_phi: #if 0 if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) gsi = bb_insert_point (bb); else #endif gsi = gsi_after_labels (bb); we will get ICE: t5.c: In function 'foo': t5.c:9:6: error: definition in block 4 follows the use void foo (int n) ^ for SSA_NAME: _1 in statement: _52 = _1 & _3; t5.c:9:6: internal compiler error: verify_ssa failed since predicate computations were inserted in bb_7. The issue is obviously that the predicates have already been emitted in the target BB - that's of course the wrong place. This is done by insert_gimplified_predicates.
This just shows how edge predicate handling is broken - we don't seem to have a sequence of gimplified stmts for edge predicates but push those to e->dest, which makes this really messy. Rather than having a separate phase where we insert all gimplified bb predicates, we should do that on demand when predicating a PHI. Your patch writes to stderr - that's bad - use dump_file and guard the printfs properly. You also still have two functions for PHI predication. And the new extended variant doesn't commonize the 2-args and general paths. I'm not at all happy with this code. It may be the existing if-conv code's fault, but making it even worse is not an option. Again - what's wrong with simply splitting critical edges if aggressive_if_conv? I think that would very much simplify things here. Or alternatively use gsi_insert_on_edge and commit edge insertions before merging the blocks. Thanks, Richard. ChangeLog is 2014-12-09 Yuri Rumyantsev ysrum...@gmail.com * tree-if-conv.c : Include hash-map.h. (struct bb_predicate_s): Add new field to save copy of gimple statement iterator. (bb_insert_point): New function. (set_bb_insert_point): New function. (has_pred_critical_p): New function. (if_convertible_bb_p): Allow bb to have more than 2 predecessors if AGGRESSIVE_IF_CONV is true. (if_convertible_bb_p): Delete check that bb has at least one non-critical incoming edge. (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. Allow interchange of PHI arguments if EXTENDED is false. Change check that block containing reduction statement candidate is predecessor of phi-block since phi may have more than two arguments. (predicate_scalar_phi): Add new arguments for call of is_cond_scalar_reduction. (get_predicate_for_edge): New function. (struct phi_args_hash_traits): New type. (phi_args_hash_traits::hash): New function. (phi_args_hash_traits::equal_keys): New function. (gen_phi_arg_condition): New function. (predicate_extended_scalar_phi): New function.
(predicate_all_scalar_phis): Add boolean variable EXTENDED and set it to true if BB containing phi has more than 2 predecessors or both incoming edges are critical. Invoke find_phi_replacement_condition and predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB has 2 predecessors and both incoming edges are critical or it has more than 2 predecessors and at least one incoming edge is critical. Use standard gsi_after_labels otherwise. Invoke predicate_extended_scalar_phi if EXTENDED is true. (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION to save gsi before insertion of predicate computations. Set it to true for BB with 2 predecessors and critical incoming edges, or if the number of predecessors is greater than 2 and at least one incoming edge is critical. Add check that non-predicated block may have statements to insert. Insert predicate computation of BB just after
[PATCH 0/4] [AARCH64,SIMD] PR63870 Improve error messages for single lane load/store
From: Charles Baylis charles.bay...@linaro.org This patch series moves the checking of lane indices for vld[234](q?)_lane and vst[234](q?)_lane intrinsics so that it occurs during builtin expansion. The q register variants are checked directly, but since the d register variants use the same intrinsics, these are checked in arm_neon.h using __builtin_aarch64_im_lane_boundsi(). Tested with make check-gcc on aarch64-oe-linux, with no regressions. Charles Baylis (4): vldN_lane error message enhancements (Q registers) vldN_lane error message enhancements (D registers) vstN_lane error message enhancements (Q register) vstN_lane error message enhancements (D registers) gcc/config/aarch64/aarch64-builtins.c | 32 +++- gcc/config/aarch64/aarch64-simd.md| 12 ++-- gcc/config/aarch64/arm_neon.h | 12 3 files changed, 45 insertions(+), 11 deletions(-) -- 1.9.1
[PATCH 1/4] vldN_lane error message enhancements (Q registers)
From: Charles Baylis charles.bay...@linaro.org gcc/ChangeLog: DATE Charles Baylis charles.bay...@linaro.org PR target/63870 * config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers): Add qualifier_struct_load_store_lane_index. (aarch64_types_loadstruct_lane_qualifiers): Use qualifier_struct_load_store_lane_index for lane index argument for last argument. (builtin_simd_arg): Add SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX. (aarch64_simd_expand_args): Add new argument describing mode of builtin. Check lane bounds for arguments with SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX. (aarch64_simd_expand_builtin): Emit error for incorrect lane indices if marked with SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX. (aarch64_simd_expand_builtin): Pass machine mode of builtin to aarch64_simd_expand_args. * config/aarch64/aarch64-simd.md: (aarch64_ld2_lanemode): Remove lane bounds check. Adjust lane numbers for big-endian. (aarch64_ld3_lanemode): Likewise. (aarch64_ld4_lanemode): Likewise. gcc/testsuite/ChangeLog: DATE Charles Baylis charles.bay...@linaro.org * gcc.target/aarch64/simd/vld4q_lane.c: New test. Change-Id: Ib17adaf64e631cf8d00a1a1a6c12409d2d7f4239 --- gcc/config/aarch64/aarch64-builtins.c | 30 +++--- gcc/config/aarch64/aarch64-simd.md | 12 - gcc/testsuite/gcc.target/aarch64/simd/vld4q_lane.c | 16 3 files changed, 48 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vld4q_lane.c diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index aac7269..27046e2 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -116,7 +116,9 @@ enum aarch64_type_qualifiers /* Polynomial types. */ qualifier_poly = 0x100, /* Lane indices - must be in range, and flipped for bigendian. 
*/ - qualifier_lane_index = 0x200 + qualifier_lane_index = 0x200, + /* Lane indices for single lane structure loads and stores */ + qualifier_struct_load_store_lane_index = 0x400 }; typedef struct @@ -224,7 +226,7 @@ aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum aarch64_type_qualifiers aarch64_types_loadstruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_const_pointer_map_mode, - qualifier_none, qualifier_none }; + qualifier_none, qualifier_struct_load_store_lane_index }; #define TYPES_LOADSTRUCT_LANE (aarch64_types_loadstruct_lane_qualifiers) static enum aarch64_type_qualifiers @@ -859,12 +861,14 @@ typedef enum SIMD_ARG_COPY_TO_REG, SIMD_ARG_CONSTANT, SIMD_ARG_LANE_INDEX, + SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX, SIMD_ARG_STOP } builtin_simd_arg; static rtx aarch64_simd_expand_args (rtx target, int icode, int have_retval, - tree exp, builtin_simd_arg *args) + tree exp, builtin_simd_arg *args, + enum machine_mode builtin_mode) { rtx pat; rtx op[SIMD_MAX_BUILTIN_ARGS + 1]; /* First element for result operand. */ @@ -903,6 +907,21 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval, op[opc] = copy_to_mode_reg (mode, op[opc]); break; + case SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX: + /* we expect arguments in order (ptr, array_of_vector, lane), and +we have to grub around in the ptr to find the lane size */ + gcc_assert (opc > 1); + if (CONST_INT_P (op[opc])) + { + aarch64_simd_lane_bounds (op[opc], 0, + GET_MODE_NUNITS (builtin_mode), + exp); + /* Keep to GCC-vector-extension lane indices in the RTL. */ + op[opc] = + GEN_INT (ENDIAN_LANE_N (builtin_mode, INTVAL (op[opc]))); + } + goto constant_arg; + case SIMD_ARG_LANE_INDEX: /* Must be a previous operand into which this is an index. */ gcc_assert (opc > 0); @@ -917,6 +936,7 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval, /* Fall through - if the lane index isn't a constant then the next case will error.
*/ case SIMD_ARG_CONSTANT: +constant_arg: if (!(*insn_data[icode].operand[opc].predicate) (op[opc], mode)) { @@ -1003,6 +1023,8 @@ aarch64_simd_expand_builtin (int fcode, tree exp, rtx target) if (d->qualifiers[qualifiers_k] & qualifier_lane_index) args[k] = SIMD_ARG_LANE_INDEX; + else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index) + args[k] = SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX; else if
[PATCH 2/4] vldN_lane error message enhancements (D registers)
From: Charles Baylis charles.bay...@linaro.org gcc/ChangeLog DATE Charles Baylis charles.bay...@linaro.org * config/aarch64/arm_neon.h (__LD2_LANE_FUNC): Add explicit lane bounds check. (__LD3_LANE_FUNC): Likewise. (__LD4_LANE_FUNC): Likewise gcc/testsuite/ChangeLog: DATE Charles Baylis charles.bay...@linaro.org * gcc.target/aarch64/simd/vld4_lane.c: New test. Change-Id: Ia95fbed34b50cf710ea9032ff3428a5f1432e0aa --- gcc/config/aarch64/arm_neon.h | 6 ++ gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c | 15 +++ 2 files changed, 21 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 8cff719..22df564 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -17901,6 +17901,8 @@ vld2_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \ __o = __builtin_aarch64_set_qregoi##mode (__o, \ (signedtype) __temp.val[1], \ 1); \ + __builtin_aarch64_im_lane_boundsi (__c, \ +sizeof (vectype) / sizeof (*__ptr)); \ __o =__builtin_aarch64_ld2_lane##mode ( \ (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \ __b.val[0] = (vectype) __builtin_aarch64_get_dregoidi (__o, 0); \ @@ -17991,6 +17993,8 @@ vld3_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \ __o = __builtin_aarch64_set_qregci##mode (__o, \ (signedtype) __temp.val[2], \ 2); \ + __builtin_aarch64_im_lane_boundsi (__c, \ +sizeof (vectype) / sizeof (*__ptr)); \ __o =__builtin_aarch64_ld3_lane##mode ( \ (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \ __b.val[0] = (vectype) __builtin_aarch64_get_dregcidi (__o, 0); \ @@ -18089,6 +18093,8 @@ vld4_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \ __o = __builtin_aarch64_set_qregxi##mode (__o, \ (signedtype) __temp.val[3], \ 3); \ + __builtin_aarch64_im_lane_boundsi (__c, \ +sizeof (vectype) / sizeof (*__ptr)); \ __o =__builtin_aarch64_ld4_lane##mode ( \ 
(__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \ __b.val[0] = (vectype) __builtin_aarch64_get_dregxidi (__o, 0); \ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c b/gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c new file mode 100644 index 000..d14e6c1 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c @@ -0,0 +1,15 @@ +/* Test error message when passing an invalid value as a lane index. */ + +/* { dg-do compile } */ + +#include <arm_neon.h> + +int8x8x4_t +f_vld4_lane (int8_t * p, int8x8x4_t v) +{ + int8x8x4_t res; + /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */ + res = vld4_lane_s8 (p, v, 8); + return res; +} + -- 1.9.1
[PATCH 4/4] vstN_lane error message enhancements (D registers)
From: Charles Baylis charles.bay...@linaro.org gcc/ChangeLog: DATE Charles Baylis charles.bay...@linaro.org * config/aarch64/arm_neon.h (__ST2_LANE_FUNC): Add explicit lane bounds check. (__ST3_LANE_FUNC): Likewise. (__ST4_LANE_FUNC): Likewise. gcc/testsuite/ChangeLog: DATE Charles Baylis charles.bay...@linaro.org * gcc.target/aarch64/simd/vst4_lane.c: New test. Change-Id: I6bceaeb7773bf20860daca4013ea1c4d2c06afa6 --- gcc/config/aarch64/arm_neon.h | 6 ++ gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c | 14 ++ 2 files changed, 20 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 22df564..22c6d06 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -11181,6 +11181,8 @@ vst2_lane_ ## funcsuffix (ptrtype *__ptr, \ __temp.val[1] \ = vcombine_##funcsuffix (__b.val[1],\ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \ + __builtin_aarch64_im_lane_boundsi (__c, \ +sizeof (__b.val[0]) / sizeof (*__ptr)); \ __o = __builtin_aarch64_set_qregoi##mode (__o,\ (signedtype) __temp.val[0], 0); \ __o = __builtin_aarch64_set_qregoi##mode (__o,\ @@ -11258,6 +11260,8 @@ vst3_lane_ ## funcsuffix (ptrtype *__ptr, \ (signedtype) __temp.val[1], 1); \ __o = __builtin_aarch64_set_qregci##mode (__o,\ (signedtype) __temp.val[2], 2); \ + __builtin_aarch64_im_lane_boundsi (__c, \ +sizeof (__b.val[0]) / sizeof (*__ptr)); \ __builtin_aarch64_st3_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \ __ptr, __o, __c); \ } @@ -11336,6 +11340,8 @@ vst4_lane_ ## funcsuffix (ptrtype *__ptr, \ (signedtype) __temp.val[2], 2); \ __o = __builtin_aarch64_set_qregxi##mode (__o,\ (signedtype) __temp.val[3], 3); \ + __builtin_aarch64_im_lane_boundsi (__c, \ +sizeof (__b.val[0]) / sizeof (*__ptr)); \ __builtin_aarch64_st4_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \ __ptr, __o, __c); \ } diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c new file mode 100644 index 000..6627ecf --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c @@ -0,0 +1,14 @@ +/* Test error message when passing an invalid value as a lane index. */ + +/* { dg-do compile } */ + +#include <arm_neon.h> + +void +f_vst4_lane (int8_t * p, int8x8x4_t v) +{ + /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */ + vst4_lane_s8 (p, v, 8); + return; +} + -- 1.9.1
[PATCH 3/4] vstN_lane error message enhancements (Q register)
From: Charles Baylis charles.bay...@linaro.org gcc/ChangeLog: DATE Charles Baylis charles.bay...@linaro.org * config/aarch64/aarch64-builtins.c (aarch64_types_storestruct_lane_qualifiers): Mark last argument with qualifier_struct_load_store_lane_index. gcc/testsuite/ChangeLog: DATE Charles Baylis charles.bay...@linaro.org * gcc.target/aarch64/simd/vst4q_lane.c: New test. Change-Id: If097c9d32eb6eb3d4c4e16db81f81e44a3154509 --- gcc/config/aarch64/aarch64-builtins.c | 2 +- gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c | 15 +++ 2 files changed, 16 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 27046e2..f2fb939 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -258,7 +258,7 @@ aarch64_types_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum aarch64_type_qualifiers aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_void, qualifier_pointer_map_mode, - qualifier_none, qualifier_none }; + qualifier_none, qualifier_struct_load_store_lane_index }; #define TYPES_STORESTRUCT_LANE (aarch64_types_storestruct_lane_qualifiers) #define CF0(N, X) CODE_FOR_aarch64_##N##X diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c b/gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c new file mode 100644 index 000..849f07a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c @@ -0,0 +1,15 @@ +/* Test error message when passing an invalid value as a lane index. */ + +/* { dg-do compile } */ + +#include <arm_neon.h> + +void +f_vst4q_lane (int8_t * p, int8x16x4_t v) +{ + vst4q_lane_s8 (p, v, 16); +/* { dg-error "lane 16 out of range 0 - 15" "" { target *-*-* } 0 } */ + return; +} + + -- 1.9.1
Re: [PATCH] PR ipa/63909 ICE: SIGSEGV in ipa_icf_gimple::func_checker::compare_bb()
On 11/21/2014 01:23 PM, Richard Biener wrote: On Fri, Nov 21, 2014 at 12:52 PM, Martin Liška mli...@suse.cz wrote: On 11/20/2014 05:41 PM, Richard Biener wrote: On Thu, Nov 20, 2014 at 5:30 PM, Martin Liška mli...@suse.cz wrote: Hello. Following patch fixes ICE in IPA ICF. Problem was that number of non-debug statements in a BB can change (for instance by IPA split), so that the number is recomputed. Huh, so can it get different for both candidates? I think the stmt compare loop should be terminated on gsi_end_p of either iterator and return false for any remaining non-debug-stmts on the other. Thus, not walk all stmts twice here. Hello. Sorry for the previous patch, you are right it can be fixed in purer way. Please take a look at attached patch. As IPA split is run early I don't see how it should affect a real IPA pass though? Sorry for non precise information, the problematic BB is changed here: #0 gsi_split_seq_before (i=0x7fffd550, pnew_seq=0x7fffd528) at ../../gcc/gimple-iterator.c:429 #1 0x00b95a2a in gimple_split_block (bb=0x76c41548, stmt=0x0) at ../../gcc/tree-cfg.c:5707 #2 0x007563cf in split_block (bb=0x76c41548, i=i@entry=0x0) at ../../gcc/cfghooks.c:508 #3 0x00756b44 in split_block_after_labels (bb=optimized out) at ../../gcc/cfghooks.c:549 #4 make_forwarder_block (bb=optimized out, redirect_edge_p=redirect_edge_p@entry=0x75d4e0 mfb_keep_just(edge_def*), new_bb_cbk=new_bb_cbk@entry=0x0) at ../../gcc/cfghooks.c:842 #5 0x0076085a in create_preheader (loop=0x76d56948, flags=optimized out) at ../../gcc/cfgloopmanip.c:1563 #6 0x00760aea in create_preheaders (flags=1) at ../../gcc/cfgloopmanip.c:1613 #7 0x009bc6b0 in apply_loop_flags (flags=15) at ../../gcc/loop-init.c:75 #8 0x009bc7d3 in loop_optimizer_init (flags=15) at ../../gcc/loop-init.c:136 #9 0x00957914 in estimate_function_body_sizes (node=0x76c47620, early=false) at ../../gcc/ipa-inline-analysis.c:2480 #10 0x0095948b in compute_inline_parameters (node=0x76c47620, early=false) at 
../../gcc/ipa-inline-analysis.c:2907 #11 0x0095bd88 in inline_analyze_function (node=0x76c47620) at ../../gcc/ipa-inline-analysis.c:3994 #12 0x0095bed3 in inline_generate_summary () at ../../gcc/ipa-inline-analysis.c:4045 #13 0x00a70b71 in execute_ipa_summary_passes (ipa_pass=0x1dcb9e0) at So inline_summary is generated after IPA-ICF does its job? But the bug is obviously that an IPA analysis phase does a code transform (here initializes loops without AVOID_CFG_MANIPULATIONS). Honza - if that is really needed then I think we should make sure loops are initialized at the start of the IPA analysis phase, not randomly in between. Thanks, Richard. Hello. Even though the root of the problem is hidden somewhere in loop creation, I would like to apply the patch, which makes iteration over non-debug gimple statements cleaner. What do you think, Richard? Thanks, Martin ../../gcc/passes.c:2137 #14 0x00777a15 in ipa_passes () at ../../gcc/cgraphunit.c:2074 #15 symbol_table::compile (this=this@entry=0x76c3a000) at ../../gcc/cgraphunit.c:2187 #16 0x00778bcd in symbol_table::finalize_compilation_unit (this=0x76c3a000) at ../../gcc/cgraphunit.c:2340 #17 0x006580ee in c_write_global_declarations () at ../../gcc/c/c-decl.c:10777 #18 0x00b5bb8b in compile_file () at ../../gcc/toplev.c:584 #19 0x00b5def1 in do_compile () at ../../gcc/toplev.c:2041 #20 0x00b5e0fa in toplev::main (this=0x7fffdc9f, argc=20, argv=0x7fffdd98) at ../../gcc/toplev.c:2138 #21 0x0063f1d9 in main (argc=20, argv=0x7fffdd98) at ../../gcc/main.c:38 Patch can bootstrap on x86_64-linux-pc and no regression has been seen. Ready for trunk? Thanks, Martin Thanks, Richard. Patch can bootstrap on x86_64-linux-pc and no regression has been seen. Ready for trunk? Thanks, Martin
Re: [PATCH, x86] Fix pblendv expand.
I've tried to get a smaller reproducer, but currently it is complicated, as it involves several functions in GCC. However, the patch fixes the spec2006 benchmark. Shouldn't that be enough for regression testing? Thanks, Evgeny On Tue, Dec 9, 2014 at 4:29 PM, Evgeny Stupachenko evstu...@gmail.com wrote: The case comes from spec2006 403.gcc (or old GCC itself). for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i) { vd->e[i].mode = VOIDmode; vd->e[i].oldest_regno = i; vd->e[i].next_regno = INVALID_REGNUM; } It is vectorized and only then completely peeled. Only after peeling all stored values become constant. Currently expand_vec_perm_pblendv works as follows: Let d.target, d.op0, d.op1 be permutation parameters. First we permute an operand (d.op1 or d.op0) and then blend it with the other argument: d.target = shuffle(d.op1) /* using expand_vec_perm_1 */ d.target = pblend(d.op0, d.target) (if d.op0 is equal to d.target, this is buggy) The patch makes it more accurate: tmp = gen_reg_rtx (vmode) tmp = shuffle(d.op1) /* using expand_vec_perm_1 */ d.target = pblend(d.op0, tmp) (Here d.op0 can be equal to d.target) Below is the rtl dump of the buggy case: (183t.optimized) ... vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34, 35 }, { 0, 4, 0, 1 }>; vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, { 4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>; ... (184r.expand) ...
(insn 205 204 206 (set (reg:V4SI 768) (const_vector:V4SI [ (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) ])) ../regrename.c:1171 -1 (nil)) (insn 206 205 208 (set (reg:V4SI 769) (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3 S16 A128])) ../regrename.c:1171 -1 (expr_list:REG_EQUAL (const_vector:V4SI [ (const_int 32 [0x20]) (const_int 33 [0x21]) (const_int 34 [0x22]) (const_int 35 [0x23]) ]) (nil))) (insn 208 206 207 (set (reg:V4SI 770) (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768) (reg:V4SI 769)) (parallel [ (const_int 0 [0]) (const_int 4 [0x4]) (const_int 1 [0x1]) (const_int 5 [0x5]) ]))) ../regrename.c:1171 -1 (nil)) (insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ]) (vec_select:V4SI (reg:V4SI 770) (parallel [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 0 [0]) (const_int 2 [0x2]) ]))) ../regrename.c:1171 -1 (nil)) (insn 209 207 210 (set (reg:V4SI 771) (const_vector:V4SI [ (const_int -1 [0x]) (const_int -1 [0x]) (const_int -1 [0x]) (const_int -1 [0x]) ])) ../regrename.c:1171 -1 (nil)) (insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ]) (vec_select:V4SI (reg:V4SI 771) (parallel [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 0 [0]) (const_int 3 [0x3]) ]))) ../regrename.c:1171 -1 (nil)) (insn 211 210 212 (set (reg:V8HI 772) (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0) (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0) (const_int 48 [0x30]))) ../regrename.c:1171 -1 (nil)) ... On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak ubiz...@gmail.com wrote: The patch fixes the pblendv expand. The bug was uncovered when permutation operands are constants. In this case we init the target register for expand_vec_perm_1 with a constant and then rewrite the target with a constant for expand_vec_perm_pblendv. The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops -flto -march=corei7. Bootstrap and make check passed. Is it ok? Please add a testcase.
Also, it surprises me that we enter expand_vec_perm_pblendv with uninitialized (?) target. Does your patch only paper over a real problem up the call chain (hard to say without a testcase)? Uros. Evgeny 2014-12-09 Evgeny Stupachenko evstu...@gmail.com gcc/ * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for expand_vec_perm_1 target. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index eafc15a..5a914ad 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d) dcopy.op0 = dcopy.op1 = d->op1; else dcopy.op0 = dcopy.op1 = d->op0; + dcopy.target = gen_reg_rtx
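The aliasing bug Evgeny describes can be modeled outside of GCC. The sketch below is hand-written C, not GCC code (`perm_buggy` and `perm_fixed` are invented names): it mimics the shuffle-then-blend sequence, showing that when the shuffle result is written straight into the target and the target happens to be the same storage as d.op0, the blend then reads already-clobbered lanes. Shuffling into a fresh temporary, as the `gen_reg_rtx (vmode)` in the patch does, avoids that.

```c
#include <string.h>

enum { N = 4 };

/* Lane-wise blend: dst[i] = mask[i] ? b[i] : a[i].  */
static void
blend (int *dst, const int *a, const int *b, const int *mask)
{
  for (int i = 0; i < N; i++)
    dst[i] = mask[i] ? b[i] : a[i];
}

/* Permute SRC into DST according to SEL (safe even if DST == SRC).  */
static void
shuffle (int *dst, const int *src, const int *sel)
{
  int tmp[N];
  for (int i = 0; i < N; i++)
    tmp[i] = src[sel[i]];
  memcpy (dst, tmp, sizeof tmp);
}

/* Buggy sequence: the shuffle result goes straight into TARGET, which
   may alias OP0, so the blend then reads clobbered lanes.  */
static void
perm_buggy (int *target, int *op0, const int *op1,
            const int *sel, const int *mask)
{
  shuffle (target, op1, sel);       /* clobbers op0 if target == op0 */
  blend (target, op0, target, mask);
}

/* Fixed sequence: shuffle into a fresh temporary first, mirroring the
   patch's use of gen_reg_rtx (vmode).  */
static void
perm_fixed (int *target, const int *op0, const int *op1,
            const int *sel, const int *mask)
{
  int tmp[N];
  shuffle (tmp, op1, sel);
  blend (target, op0, tmp, mask);
}
```

With mask lanes that select from op0, the buggy variant returns op1's shuffled values everywhere once target aliases op0, while the fixed variant keeps op0's lanes intact.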
Re: [PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH
On 24 November 2014 at 13:46, Wilco Dijkstra wdijk...@arm.com wrote: Richard Earnshaw wrote: If all cores seem to benefit from FP reassociation set to 4, then it seems odd that 4 is not also the default for generic. Andrew, you may need to pick a target-specific value for ThunderX; I think Wilco has just picked something that seems plausible because he needs to put a real value in there. What happens if the integer and vector numbers are bumped up? I'd have thought that integer numbers > 1 would be appropriate on all dual-issue or greater cores. I tried int and vector as well, and setting int to 2 did give an improvement, but vector had no effect, so I'll leave it at 1 for now. The patch is the same as last time, it just sets integer to 2, and uses the same settings for all CPUs. OK for commit? ChangeLog: 2014-11-24 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64-protos.h (tune_params): Add reassociation tuning parameters. * gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH): Define. (aarch64_reassociation_width): New function. (generic_tunings): Add reassociation tuning parameters. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (thunderx_tunings): Likewise. OK /Marcus
Re: [PATCH] PR other/63613: Add fixincludes for dejagnu.h
On Mon, 2014-12-08 at 14:13 +0100, Rainer Orth wrote: Jeff Law l...@redhat.com writes: On 12/04/14 15:42, Rainer Orth wrote: David Malcolm dmalc...@redhat.com writes: dejagnu.h assumed -fgnu89-inline until a recent upstream fix; see http://lists.gnu.org/archive/html/dejagnu/2014-10/msg00011.html Remove the workaround from jit.exp that used -fgnu89-inline in favor of a fixincludes to dejagnu.h that applies the upstream fix to a local copy. This should make it easier to support C++ testcases from jit.exp. I wonder how this would work if dejagnu.h doesn't live in a system include dir (e.g. a self-compiled version)? fixincludes won't touch those AFAIU. The previous version with -fgnu89-inline would still work in that case provided dejagnu.h is found at all. Presumably in that case the answer is upgrade dejagnu? :-) I've two problems with this: * There's not yet a DejaGnu release available with the fix and I've no idea if there are any planned any time soon. Not everyone is comfortable with random git (or whatever) snapshots. FWIW I've asked on the DejaGnu mailing list, and Ben Elliston said: Yes. I plan on releasing 1.6 over the holidays. http://lists.gnu.org/archive/html/dejagnu/2014-12/msg1.html * I don't consider this a critical issue that cannot work without current releases. We're already working around several upstream DejaGnu issues in our codebase, and I don't consider this particular one important enough to require everyone to upgrade to a not-a-release version. Rainer
Re: [PATCH, x86] Fix pblendv expand.
On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote: However patch is fixing spec2006 benchmark. Shouldn't that be enough for regression testing? No. Spec is not free. r~
Re: [PATCH][AARCH64][5/5] Add macro fusion support for cmp/b.X for ThunderX
On 5 December 2014 at 14:36, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Andrew posted this patch sometime ago (before stage1 closed) and I had rebased it on top of the other macro fusion patches in that series. This is a respin of that patch with the comment about not calling get_attr_type repeatedly resolved (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02251.html) Tested aarch64-none-elf. Is this ok to go in? Thanks, Kyrill 2014-12-01 Andrew Pinski apin...@cavium.com Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define. (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops. (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH. OK /Marcus
Re: [PATCH, x86] Fix pblendv expand.
On Tue, Dec 09, 2014 at 06:59:27PM +0300, Evgeny Stupachenko wrote: I've tried to get a smaller reproducer, but currently it is complicated, as it involves several functions from GCC. Just add a gcc_assert where you are changing the code, testing for what you want to avoid. Then just delta reduce it or creduce it. Jakub
Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern
It's unfortunate that GCC does not have separate patterns for a safe ADD used by reload. It does, the addptrM3 pattern, but it works only with LRA at the moment. -- Eric Botcazou
Re: Fix libgomp crash without TLS (PR42616)
On 12/09/2014 06:53 AM, Varvara Rainchik wrote: 2014-12-09 Varvara Rainchik varvara.rainc...@intel.com * config.h.in: Regenerate. * configure: Regenerate. * configure.ac: Add GCC_CHECK_EMUTLS. * libgomp.h: Add check for USE_EMUTLS: this case is equal to HAVE_TLS. * team.c: Likewise. Ok. r~
Re: [PATCH, x86] Fix pblendv expand.
I mean that there are a lot of people tracking spec2006 stability and therefore the issue should be on track in future. And that I can create the test case, but it would be as big as several GCC functions. Will work on reducing the test case. On Tue, Dec 9, 2014 at 7:20 PM, Richard Henderson r...@redhat.com wrote: On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote: However patch is fixing spec2006 benchmark. Shouldn't that be enough for regression testing? No. Spec is not free. r~
Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern
On Tue, Dec 9, 2014 at 11:24 AM, Eric Botcazou ebotca...@adacore.com wrote: It's unfortunate that GCC does not have separate patterns for a safe ADD used by reload. It does, the addptrM3 pattern, but it works only with LRA at the moment. Hi, Eric Segher and I previously discussed addptrM3, but I thought that it was restricted in a different way. We definitely will have to define addptrM3 and restore addM3 when rs6000 switches to LRA. Thanks, David
Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern
On Tue, Dec 09, 2014 at 11:48:44AM -0500, David Edelsohn wrote: We definitely will have to define addptrM3 and restore addM3 when rs6000 switches to LRA. Can LRA deal with adding a clobber of a hard register? How would you express that in the patterns, anyway? Segher
Re: [patch, build] Restore bootstrap in building libcc1 on darwin
This was for x86_64-apple-darwin14. The patch also works for x86_64-apple-darwin10. Dominique On 6 December 2014 at 01:49, Dominique d'Humières domi...@lps.ens.fr wrote: Bootstrap just finished with the patch. Thanks, Dominique
RE: [PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH
Marcus Shawcroft wrote: OK for commit? ChangeLog: 2014-11-24 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64-protos.h (tune_params): Add reassociation tuning parameters. * gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH): Define. (aarch64_reassociation_width): New function. (generic_tunings): Add reassociation tuning parameters. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (thunderx_tunings): Likewise. OK /Marcus Jiong, can you commit this please? --- gcc/config/aarch64/aarch64-protos.h | 3 +++ gcc/config/aarch64/aarch64.c | 36 2 files changed, 35 insertions(+), 4 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index a9985b5..ac3487b 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -170,6 +170,9 @@ struct tune_params const struct cpu_vector_cost *const vec_costs; const int memmov_cost; const int issue_rate; + const int int_reassoc_width; + const int fp_reassoc_width; + const int vec_reassoc_width; }; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 3832123..e543161 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -314,7 +314,10 @@ static const struct tune_params generic_tunings = generic_regmove_cost, generic_vector_cost, NAMED_PARAM (memmov_cost, 4), - NAMED_PARAM (issue_rate, 2) + NAMED_PARAM (issue_rate, 2), + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1 /* vec_reassoc_width. */ }; static const struct tune_params cortexa53_tunings = @@ -324,7 +327,10 @@ static const struct tune_params cortexa53_tunings = cortexa53_regmove_cost, generic_vector_cost, NAMED_PARAM (memmov_cost, 4), - NAMED_PARAM (issue_rate, 2) + NAMED_PARAM (issue_rate, 2), + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1 /* vec_reassoc_width. 
*/ }; static const struct tune_params cortexa57_tunings = @@ -334,7 +340,10 @@ static const struct tune_params cortexa57_tunings = cortexa57_regmove_cost, cortexa57_vector_cost, NAMED_PARAM (memmov_cost, 4), - NAMED_PARAM (issue_rate, 3) + NAMED_PARAM (issue_rate, 3), + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1 /* vec_reassoc_width. */ }; static const struct tune_params thunderx_tunings = @@ -344,7 +353,10 @@ static const struct tune_params thunderx_tunings = thunderx_regmove_cost, generic_vector_cost, NAMED_PARAM (memmov_cost, 6), - NAMED_PARAM (issue_rate, 2) + NAMED_PARAM (issue_rate, 2), + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1 /* vec_reassoc_width. */ }; /* A processor implementing AArch64. */ @@ -437,6 +449,19 @@ static const char * const aarch64_condition_codes[] = "hi", "ls", "ge", "lt", "gt", "le", "al", "nv" }; +static int +aarch64_reassociation_width (unsigned opc ATTRIBUTE_UNUSED, +enum machine_mode mode) +{ + if (VECTOR_MODE_P (mode)) +return aarch64_tune_params->vec_reassoc_width; + if (INTEGRAL_MODE_P (mode)) +return aarch64_tune_params->int_reassoc_width; + if (FLOAT_MODE_P (mode)) +return aarch64_tune_params->fp_reassoc_width; + return 1; +} + /* Provide a mapping from gcc register numbers to dwarf register numbers. */ unsigned aarch64_dbx_register_number (unsigned regno) @@ -10499,6 +10524,9 @@ aarch64_gen_ccmp_next (rtx prev, int cmp_code, rtx op0, rtx op1, int bit_code) #undef TARGET_PREFERRED_RELOAD_CLASS #define TARGET_PREFERRED_RELOAD_CLASS aarch64_preferred_reload_class +#undef TARGET_SCHED_REASSOCIATION_WIDTH +#define TARGET_SCHED_REASSOCIATION_WIDTH aarch64_reassociation_width + #undef TARGET_SECONDARY_RELOAD #define TARGET_SECONDARY_RELOAD aarch64_secondary_reload -- 1.9.1
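As a model of what the reassociation width means (illustrative C, not GCC code): with a width of 1 a reduction is one serial dependency chain, while a width of 2 lets the reassociation pass split it into two independent chains that a dual-issue core can execute in parallel. For integer addition both orders compute the same total; only the dependency structure differs.

```c
/* One serial chain: each add depends on the previous one (width 1).  */
static int
sum_width1 (const int *v, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Two independent chains (width 2): s0 and s1 have no dependency on
   each other, so their adds can issue in the same cycle.  */
static int
sum_width2 (const int *v, int n)
{
  int s0 = 0, s1 = 0;
  int i;
  for (i = 0; i + 1 < n; i += 2)
    {
      s0 += v[i];
      s1 += v[i + 1];
    }
  if (i < n)
    s0 += v[i];
  return s0 + s1;
}
```

This is also why the patch keeps fp_reassoc_width separate: for floating point the two orders are not bit-identical in general, so the pass may only do this when unsafe math optimizations permit it.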
Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern
On Mon, Dec 08, 2014 at 09:01:39AM -0600, Segher Boessenkool wrote: Why are you removing the alternative instead of clobbering XER[CA]? I should have mentioned, sorry. We don't want to clobber CA on *every* add, and reload can generate more adds out of thin air. And CA is a fixed register. There may be a better way to do this, but I don't know it. Some numbers... I checked how often the compiler used to allocate GPR0 here. I looked at cc1 of 4.7.2, 12MB text size. It happened zero times (it already was ?r to discourage the RA / reload from using that alternative). There are 143962 addi insns in there. Segher
Re: [Patch] Improving jump-thread pass for PR 54742
Richard Biener wrote: On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey sell...@mips.com wrote: expected? Should this test also check flag_thread_jumps? Or should that be getting checked somewhere else? -fthread-jumps is an RTL optimization flag and ignored on GIMPLE. Does it make sense to add a -f[no-]tree-thread-jumps to enable/disable the tree jump threading? I could also add -f[no-]tree-fsm-thread-jumps. Opinions? On the llvm test-suite, I have seen one ICE with my fsm jump-thread patch. This patch fixes the problem: diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 12f83ba..f8c736e 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2564,6 +2564,7 @@ thread_through_all_blocks (bool may_peel_loop_headers) FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) { if (!loop->header + || !loop_latch_edge (loop) || !bitmap_bit_p (threaded_blocks, loop->header->index)) continue; retval |= thread_through_loop_header (loop, may_peel_loop_headers); Ok to commit after regstrap? Thanks, Sebastian
[PATCH, libcpp]: Check the result of vasprintf
Hello! The compilation with gentoo glibc 2.20 emits the following warning: ../../../gcc-svn/trunk/libcpp/directives.c:2411:28: warning: ignoring return value of ‘int vasprintf(char**, const char*, __gnuc_va_list)’, declared with attribute warn_unused_result [-Wunused-result] The manpage says: If memory allocation wasn't possible, or some other error occurs, these functions will return -1, and the contents of strp are undefined. Attached patch checks the return value and sets ptr to NULL in this case. 2014-12-09 Uros Bizjak ubiz...@gmail.com * directives.c (cpp_define_formatted): Check return value of vasprintf and in case of error set ptr to NULL. Bootstrapped on x86_64-linux-gnu. OK for mainline? Uros. Index: directives.c === --- directives.c(revision 218525) +++ directives.c(working copy) @@ -2408,7 +2408,8 @@ cpp_define_formatted (cpp_reader *pfile, const cha va_list ap; va_start (ap, fmt); - vasprintf (&ptr, fmt, ap); + if (vasprintf (&ptr, fmt, ap) < 0) + ptr = NULL; va_end (ap); cpp_define (pfile, ptr);
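The pattern the patch adopts can be exercised in isolation. A minimal sketch, assuming glibc's GNU `vasprintf` (`fmt_or_null` is an invented helper name, not libcpp code): treat a negative return as failure and normalize the result pointer to NULL, since the manpage leaves the pointer's contents undefined on error.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>   /* free, for callers */
#include <string.h>

/* Return a freshly allocated formatted string, or NULL if vasprintf
   reports an error, mirroring the libcpp fix.  */
static char *
fmt_or_null (const char *fmt, ...)
{
  char *ptr;
  va_list ap;

  va_start (ap, fmt);
  if (vasprintf (&ptr, fmt, ap) < 0)
    ptr = NULL;   /* contents of ptr are undefined on error */
  va_end (ap);
  return ptr;
}
```

Callers then need only a single NULL check, rather than relying on an indeterminate pointer.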
[PATCH] TYPE_OVERFLOW_* cleanup
The issue here is that the TYPE_OVERFLOW_TRAPS, TYPE_OVERFLOW_UNDEFINED, and TYPE_OVERFLOW_WRAPS macros work on integral types only, yet we pass e.g. pointer_type/real_type to them. This patch adds proper checking for these macros and adds missing guards to various places. This looks pretty straightforward, but I had to tweak a few places wrt vector_types (where I've used VECTOR_INTEGER_TYPE_P) so as not to regress folding - and I'm afraid I missed places that aren't tested in our testsuite :/. Bootstrapped/regtested on ppc64-linux and x86_64-linux. 2014-12-09 Marek Polacek pola...@redhat.com * fold-const.c (negate_expr_p): Check for INTEGRAL_TYPE_P. (fold_negate_expr): Likewise. (extract_muldiv_1): Likewise. (maybe_canonicalize_comparison_1): Likewise. (fold_comparison): Likewise. (fold_binary_loc): Likewise. (tree_binary_nonnegative_warnv_p): Likewise. (tree_binary_nonzero_warnv_p): Likewise. * gimple-ssa-strength-reduction.c (legal_cast_p_1): Likewise. * tree-scalar-evolution.c (simple_iv): Likewise. (scev_const_prop): Likewise. * tree-ssa-loop-niter.c (expand_simple_operations): Likewise. * match.pd (X % -C): Likewise. (-A - 1 -> ~A): Likewise. (~A + A -> -1): Check for INTEGRAL_TYPE_P or VECTOR_INTEGER_TYPE_P. * tree-vect-generic.c (expand_vector_operation): Likewise. * tree.h (INTEGRAL_TYPE_CHECK): Define. (TYPE_OVERFLOW_WRAPS, TYPE_OVERFLOW_UNDEFINED, TYPE_OVERFLOW_TRAPS): Add INTEGRAL_TYPE_CHECK.
diff --git gcc/fold-const.c gcc/fold-const.c index 0c4fe40..ff9d917 100644 --- gcc/fold-const.c +++ gcc/fold-const.c @@ -426,7 +426,8 @@ negate_expr_p (tree t) case VECTOR_CST: { - if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type)) + if (FLOAT_TYPE_P (TREE_TYPE (type)) + || (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type))) return true; int count = TYPE_VECTOR_SUBPARTS (type), i; @@ -558,7 +559,8 @@ fold_negate_expr (location_t loc, tree t) case INTEGER_CST: tem = fold_negate_const (t, type); if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t) - || (!TYPE_OVERFLOW_TRAPS (type) + || (INTEGRAL_TYPE_P (type) + && !TYPE_OVERFLOW_TRAPS (type) && TYPE_OVERFLOW_WRAPS (type)) || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0) return tem; @@ -5951,7 +5953,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type, || EXPRESSION_CLASS_P (op0)) /* ... and has wrapping overflow, and its type is smaller than ctype, then we cannot pass through as widening. */ - && ((TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0)) + && (((INTEGRAL_TYPE_P (TREE_TYPE (op0)) + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0))) && (TYPE_PRECISION (ctype) > TYPE_PRECISION (TREE_TYPE (op0)))) /* ... or this is a truncation (t is narrower than op0), @@ -5966,7 +5969,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type, /* ... or has undefined overflow while the converted to type has not, we cannot do the operation in the inner type as that would introduce undefined overflow. */ - || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0)) + || ((INTEGRAL_TYPE_P (TREE_TYPE (op0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))) && !TYPE_OVERFLOW_UNDEFINED (type)) break; @@ -6159,6 +6163,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type, if ((code == TRUNC_MOD_EXPR || code == CEIL_MOD_EXPR || code == FLOOR_MOD_EXPR || code == ROUND_MOD_EXPR) /* If the multiplication can overflow we cannot optimize this. 
*/ + && INTEGRAL_TYPE_P (TREE_TYPE (t)) && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (t)) && TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST && wi::multiple_of_p (op1, c, TYPE_SIGN (type))) @@ -6211,7 +6216,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type, If we have an unsigned type, we cannot do this since it will change the result if the original computation overflowed. */ - if (TYPE_OVERFLOW_UNDEFINED (ctype) + if (INTEGRAL_TYPE_P (ctype) + && TYPE_OVERFLOW_UNDEFINED (ctype) && ((code == MULT_EXPR && tcode == EXACT_DIV_EXPR) || (tcode == MULT_EXPR && code != TRUNC_MOD_EXPR && code != CEIL_MOD_EXPR @@ -8497,7 +8503,8 @@ maybe_canonicalize_comparison_1 (location_t loc, enum tree_code code, tree type, /* Match A +- CST code arg1 and CST code arg1. We can change the first form only if overflow is undefined. */ - if (!((TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)) + if (!(((INTEGRAL_TYPE_P (TREE_TYPE (arg0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE
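The distinction these macros draw can be seen directly in standard C (an illustrative aside, not GCC internals): unsigned arithmetic is defined to wrap modulo 2^N, which is what TYPE_OVERFLOW_WRAPS reports for such types, while signed overflow is undefined behaviour, which is why a fold must check TYPE_OVERFLOW_UNDEFINED before transforming signed expressions. The checked-add helper below (`add_int_checked` is an invented name) handles the signed case in a wider type so the undefined operation is never performed.

```c
#include <limits.h>

/* Unsigned overflow wraps modulo 2^N by definition.  */
static unsigned
wrap_add (unsigned a, unsigned b)
{
  return a + b;
}

/* Signed overflow is undefined, so test for it without performing it:
   do the add in a wider type and range-check the result.
   Returns 1 and stores the sum on success, 0 if it would overflow.  */
static int
add_int_checked (int a, int b, int *out)
{
  long long r = (long long) a + b;
  if (r > INT_MAX || r < INT_MIN)
    return 0;
  *out = (int) r;
  return 1;
}
```

Passing a pointer or real type to the wrap/undefined question is meaningless in exactly this sense, which is what the new INTEGRAL_TYPE_CHECK enforces.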
Re: [Patch] Improving jump-thread pass for PR 54742
On 12/09/14 10:38, Sebastian Pop wrote: Richard Biener wrote: On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey sell...@mips.com wrote: expected? Should this test also check flag_thread_jumps? Or should that be getting checked somewhere else? -fthread-jumps is an RTL optimization flag and ignored on GIMPLE. Does it make sense to add a -f[no-]tree-thread-jumps to enable/disable the tree jump threading? I could also add -f[no-]tree-fsm-thread-jumps. Opinions? Our option handling is a bit of a mess if we look at it from the user standpoint. Given that the vast majority of jump threading happens on gimple, ISTM that -f[no-]thread-jumps ought to be controlling the gimple implementation. One could easily argue that the user doesn't really care about where in the pipeline the optimization is implemented. My vote would be to just make -fthread-jumps control both RTL and gimple jump threading. On the llvm test-suite, I have seen one ICE with my fsm jump-thread patch. This patch fixes the problem: diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 12f83ba..f8c736e 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2564,6 +2564,7 @@ thread_through_all_blocks (bool may_peel_loop_headers) FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) { if (!loop->header + || !loop_latch_edge (loop) || !bitmap_bit_p (threaded_blocks, loop->header->index)) continue; retval |= thread_through_loop_header (loop, may_peel_loop_headers); Ok to commit after regstrap? This seems to be indicating that we have a loop with no edge from the latch block to the header block. I'd like to know better how we got into that state. Also, a test for the GCC testsuite would be good. I have no idea what license covers the LLVM testsuite. But given a good analysis of the problem we may be able to write a suitable test independent of the LLVM test. jeff
Re: [Android] Stop unconditional disabling of ifuncs for Bionic #2
On 12/09/14 05:39, Alexander Ivchenko wrote: Hi, Whether the *linux* target supports ifuncs or not is defined here: linux_has_ifunc_p (void) { return OPTION_BIONIC ? false : HAVE_GNU_INDIRECT_FUNCTION; } Bionic right now supports indirect functions, but there is no way to notify the compiler about that (For Android OPTION_BIONIC is always on and so configure time check with --enable-gnu-indirect-function does not work) The following patch makes the use of the default version of TARGET_HAS_IFUNC_P for *linux*, so we can decide whether the Android target supports indirect functions or not on configure time. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index f22bba8..d4d09d0 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2014-12-08 Alexander Ivchenko alexander.ivche...@intel.com + + * config/linux.c (linux_has_ifunc_p): Remove. + * config/linux.h (TARGET_HAS_IFUNC_P): Use default version. OK. jeff
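For reference, this is what a GNU indirect function looks like at the source level (a minimal sketch for GCC on a glibc-style Linux target; the names are invented): the loader calls the resolver once and binds the symbol to whichever implementation it returns. This runtime mechanism is what the `--enable-gnu-indirect-function` configure check, and the patch above, let the compiler assume.

```c
/* A baseline implementation; a real resolver would pick between
   several variants based on CPU features (e.g. hwcaps).  */
static int
add_generic (int a, int b)
{
  return a + b;
}

/* The resolver runs once, at load time, and returns the address of
   the implementation to bind my_add to.  */
static void *
resolve_my_add (void)
{
  return (void *) add_generic;
}

/* my_add has no body here; calls go through the resolver's choice.  */
int my_add (int a, int b) __attribute__ ((ifunc ("resolve_my_add")));
```

On C libraries without IRELATIVE relocation support (which is the Bionic situation the thread discusses), this construct cannot work, which is why the decision has to be made at configure time rather than hard-coded per libc.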
[PATCH, committed] jit-playback.c: Move dlopen code into a new function
Another simplification of playback::context::compile. Committed to trunk as r218527. Index: gcc/jit/ChangeLog === --- gcc/jit/ChangeLog (revision 218526) +++ gcc/jit/ChangeLog (revision 218527) @@ -1,3 +1,12 @@ +2014-12-09 David Malcolm dmalc...@redhat.com + + * jit-playback.c (gcc::jit::playback::context::compile): Move the + dlopen code into... + (gcc::jit::playback::context::dlopen_built_dso): ...this new + function. + * jit-playback.h (gcc::jit::playback::context::dlopen_built_dso): + New function. + 2014-12-08 David Malcolm dmalc...@redhat.com * libgccjit++.h: Indent the forward declarations of the classes to Index: gcc/jit/jit-playback.c === --- gcc/jit/jit-playback.c (revision 218526) +++ gcc/jit/jit-playback.c (revision 218527) @@ -1586,7 +1586,6 @@ playback::context:: compile () { - void *handle = NULL; const char *ctxt_progname; result *result_obj = NULL; @@ -1648,25 +1647,8 @@ if (errors_occurred ()) return NULL; - /* dlopen the .so file. */ - { -auto_timevar load_timevar (TV_LOAD); + result_obj = dlopen_built_dso (); -const char *error; - -/* Clear any existing error. */ -dlerror (); - -handle = dlopen (m_path_so_file, RTLD_NOW | RTLD_LOCAL); -if ((error = dlerror()) != NULL) { - add_error (NULL, "%s", error); -} -if (handle) - result_obj = new result (handle); -else - result_obj = NULL; - } - return result_obj; }
*/ + dlerror (); + + handle = dlopen (m_path_so_file, RTLD_NOW | RTLD_LOCAL); + if ((error = dlerror()) != NULL) { +add_error (NULL, "%s", error); + } + if (handle) +result_obj = new result (handle); + else +result_obj = NULL; + + return result_obj; +} + /* Top-level hook for playing back a recording context. This plays back m_recording_ctxt, and, if no errors Index: gcc/jit/jit-playback.h === --- gcc/jit/jit-playback.h (revision 218526) +++ gcc/jit/jit-playback.h (revision 218527) @@ -250,6 +250,9 @@ void convert_to_dso (const char *ctxt_progname); + result * + dlopen_built_dso (); + private: ::gcc::jit::recording::context *m_recording_ctxt;
Re: [PATCH] Fix PR 61225
On 12/09/14 02:49, Zhenqiang Chen wrote: Do you need to verify SETA and SETB satisfy single_set? Or has that already been done elsewhere? A is NEXT_INSN (insn) B is prev_nonnote_nondebug_insn (insn), For I1 -> I2 -> B; I2 -> A; LOG_LINK can make sure I1 and I2 are single_set, but not A and B. And I did find code in function try_combine which can make sure B (or I3) is single_set. So I think the check can skip failed cases at an early stage. Thanks for doing the research on this. The check is to make sure the correctness. Here is a case, int f1 (int *x) { int t = --*x; if (!t) foo (x); return t; } _4 = *x_3(D); _5 = _4 + -1; *x_3(D) = _5; # DEBUG t = _5 if (_5 == 0) ... bb 4: return _5; _5 is used in return _5. So we can not remove _5 = _4 + -1. Right, but ISTM that if the # uses > 2, then we could just return false rather than bothering to see if all the uses are consumed by A or B. It's not a big deal, I just have a hard time seeing that doing something more complex than if (# uses > 2) return false; makes sense. So you've got two new combine cases here, but I think the testcase only tests one of them. Can you include a testcase for both of the major paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN) pr61225.c is the case to cover I1->I2->I3; I2->insn. For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also handle them. So I removed the code from the patch. Seems like the reasonable thing to do. Here is the final patch. Bootstrap and no make check regression on X86-64. ChangeLog: 2014-11-09 Zhenqiang Chen zhenqiang.c...@linaro.org Part of PR rtl-optimization/61225 * combine.c (can_reuse_cc_set_p): New function. (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn. (try_combine): Add one more parameter TO_COMBINED_INSN, which is used to create a new insn parallel (TO_COMBINED_INSN, I3). testsuite/ChangeLog: 2014-11-09 Zhenqiang Chen zhenqiang.c...@linaro.org * gcc.target/i386/pr61225.c: New test. OK for the trunk. jeff
[PATCH, fixincludes]: Check return value of getcwd
Hello! The compilation with gentoo glibc 2.20 emits the following warning: ../../../gcc-svn/trunk/fixincludes/server.c:195:10: warning: ignoring return value of ‘getcwd’, declared with attribute warn_unused_result [-Wunused-result] The manpage says: On failure, these functions return NULL, and errno is set to indicate the error. The contents of the array pointed to by buf are undefined on error. Attached patch checks the return value and sets buff[0] to 0 in this case. 2014-12-09 Uros Bizjak ubiz...@gmail.com * server.c (server_setup): Check return value of getcwd and in case of error set buff[0] to 0. Bootstrapped on x86_64-linux-gnu. OK for mainline? Index: server.c === --- server.c(revision 218525) +++ server.c(working copy) @@ -192,7 +192,8 @@ server_setup (void) fputs ("trap : 1\n", server_pair.pf_write); fflush (server_pair.pf_write); - getcwd (buff, MAXPATHLEN + 1); + if (getcwd (buff, MAXPATHLEN + 1) == NULL) +buff[0] = 0; p_cur_dir = xstrdup (buff); }
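The same defensive shape can be tested on its own. A sketch of the pattern, not the fixincludes code (`safe_cwd` is an invented name): since getcwd leaves the buffer contents undefined on failure, substitute an empty string before the value is consumed (here, before it would be copied with xstrdup).

```c
#include <unistd.h>

/* Fill BUFF with the current working directory, or with the empty
   string if getcwd fails (e.g. ERANGE when LEN is too small).  */
static const char *
safe_cwd (char *buff, size_t len)
{
  if (getcwd (buff, len) == NULL)
    buff[0] = '\0';
  return buff;
}
```

The fallback value is debatable (one could also abort), but an empty string at least keeps the later xstrdup well defined.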
[PATCH, committed] Guard less code with the JIT mutex
Move acquisition/release of the JIT mutex from jit-recording.c:gcc::jit::recording::context::compile into jit-playback.c: gcc::jit::playback::context::compile and reduce the amount of code guarded by the mutex somewhat. For acquisition, we didn't need the mutex when building the tempdir and building toplev's argv, so now acquire the mutex immediately before using toplev. For release, we didn't need the mutex when cleaning up the tempdir and the other cleanup of the playback context, so release the mutex on each exit path from playback::context::compile. Ideally we'd release it immediately after toplev.finalize (), but there are various places after that point that use timevars (which is global state) so we have to hold the mutex until we're done with them. Committed to trunk as r218528. gcc/jit/ChangeLog: * jit-playback.c (gcc::jit::playback::context::compile): Acquire the mutex here, immediately before using toplev, and release it here, on each exit path after acquisition. (jit_mutex): Move this variable here, from jit-recording.c. (gcc::jit::playback::context::acquire_mutex): New function, based on code in jit-recording.c. (gcc::jit::playback::context::release_mutex): Likewise. * jit-playback.h (gcc::jit::playback::context::acquire_mutex): New function. (gcc::jit::playback::context::release_mutex): New function. * jit-recording.c (jit_mutex): Move this variable to jit-playback.c. (gcc::jit::recording::context::compile): Move mutex-handling from here into jit-playback.c's gcc::jit::playback::context::compile. * notes.txt: Update to show the new locations of ACQUIRE_MUTEX and RELEASE_MUTEX. Index: gcc/jit/jit-recording.c === --- gcc/jit/jit-recording.c (revision 218527) +++ gcc/jit/jit-recording.c (revision 218528) @@ -888,12 +888,6 @@ m_requested_dumps.safe_push (d); } - -/* This mutex guards gcc::jit::recording::context::compile, so that only - one thread can be accessing the bulk of GCC's state at once. 
*/ -static pthread_mutex_t jit_mutex = PTHREAD_MUTEX_INITIALIZER; - /* Validate this context, and if it passes, compile it within a mutex. @@ -908,20 +902,12 @@ if (errors_occurred ()) return NULL; - /* Acquire the big GCC mutex. */ - pthread_mutex_lock (&jit_mutex); - gcc_assert (NULL == ::gcc::jit::active_playback_ctxt); - /* Set up a playback context. */ ::gcc::jit::playback::context replayer (this); - ::gcc::jit::active_playback_ctxt = &replayer; + /* Use it. */ result *result_obj = replayer.compile (); - /* Release the big GCC mutex. */ - ::gcc::jit::active_playback_ctxt = NULL; - pthread_mutex_unlock (&jit_mutex); - return result_obj; } Index: gcc/jit/ChangeLog === --- gcc/jit/ChangeLog (revision 218527) +++ gcc/jit/ChangeLog (revision 218528) @@ -1,5 +1,23 @@ 2014-12-09 David Malcolm dmalc...@redhat.com + * jit-playback.c (gcc::jit::playback::context::compile): Acquire the + mutex here, immediately before using toplev, and release it here, on + each exit path after acquisition. + (jit_mutex): Move this variable here, from jit-recording.c. + (gcc::jit::playback::context::acquire_mutex): New function, based on + code in jit-recording.c. + (gcc::jit::playback::context::release_mutex): Likewise. + * jit-playback.h (gcc::jit::playback::context::acquire_mutex): New + function. + (gcc::jit::playback::context::release_mutex): New function. + * jit-recording.c (jit_mutex): Move this variable to jit-playback.c. + (gcc::jit::recording::context::compile): Move mutex-handling from + here into jit-playback.c's gcc::jit::playback::context::compile. + * notes.txt: Update to show the new locations of ACQUIRE_MUTEX + and RELEASE_MUTEX. +2014-12-09 David Malcolm dmalc...@redhat.com + * jit-playback.c (gcc::jit::playback::context::compile): Move the dlopen code into...
(gcc::jit::playback::context::dlopen_built_dso): ...this new Index: gcc/jit/jit-playback.c === --- gcc/jit/jit-playback.c (revision 218527) +++ gcc/jit/jit-playback.c (revision 218528) @@ -1622,6 +1622,9 @@ if (errors_occurred ()) return NULL; + /* Acquire the JIT mutex and set this as the active playback ctxt. */ + acquire_mutex (); + /* This runs the compiler. */ toplev toplev (false); toplev.main (fake_args.length (), @@ -1635,10 +1638,14 @@ /* Clean up the compiler. */ toplev.finalize (); - active_playback_ctxt = NULL; + /* Ideally we would release the jit mutex here, but we can't yet since + followup activities use timevars, which are global state. */ if (errors_occurred ()) -return NULL; +{ + release_mutex (); + return NULL; +} if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE))
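The locking discipline this patch adopts, namely acquire immediately before touching the global state and release on every exit path after acquisition, can be sketched independently of the JIT (a hand-written example; `guarded_compile` and its error flag are invented, not libgccjit API):

```c
#include <pthread.h>

static pthread_mutex_t jit_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Model of playback::context::compile: the mutex guards only the
   section that uses shared global state, and every return path after
   acquisition releases it.  */
static int
guarded_compile (int fail)
{
  /* Set-up work that needs no lock (tempdir, argv) would go here.  */

  pthread_mutex_lock (&jit_mutex);

  /* ... work on global compiler state ... */

  if (fail)
    {
      pthread_mutex_unlock (&jit_mutex);   /* error exit path */
      return -1;
    }

  pthread_mutex_unlock (&jit_mutex);       /* normal exit path */
  return 0;
}
```

Calling it repeatedly, including through the error path, demonstrates the invariant: if any exit path forgot to unlock, the next call would deadlock on the non-recursive mutex.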
Re: [PATCH 07/10] rs6000: The big carry insns addition
On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool seg...@kernel.crashing.org wrote: Now that everything is in place for letting GCC use the carry-using and carry-producing machine insns as separate RTL insns, switch over all remaining patterns that clobber CA without telling the compiler. These are the multiple-precision add/sub/neg patterns, and the various eq/ne/ ltu/gtu/leu/geu patterns. The eq/ne patterns are special. The optimal machine code for those isn't regular (like e.g. the ltu patterns are), and we want to implement a plain eq as cntlz[wd];sr[wd]i; but that means that if we split those patterns at expand time, combine will happily put them back together again. So expand them as eq/ne, and split later. 2014-12-08 Segher Boessenkool seg...@kernel.crashing.org gcc/ PR target/64180 * config/rs6000/predicates.md (unsigned_comparison_operator): New. (signed_comparison_operator): New. * config/rs6000/rs6000-protos.h (rs6000_emit_eqne): Declare. * config/rs6000/rs6000.c (rs6000_emit_eqne): New function. (rs6000_emit_sCOND): Remove ISEL test (move it to the expander). * config/rs6000/rs6000.md (add<mode>3 for SDI): Expand DImode add to addc,adde directly, if !TARGET_POWERPC64. (sub<mode>3 for SDI): Expand DImode sub to subfc,subfe directly, if !TARGET_POWERPC64. (neg<mode>2): Delete expander. (*neg<mode>2): Rename to neg<mode>2. (addti3, subti3): Delete. (addti3, subti3): New expanders. (*adddi3_noppc64, *subdi3_noppc64, *negdi2_noppc64): Delete. (cstore<mode>4_unsigned): New expander. (cstore<mode>4): Allow GPR as output (not just SI). Rewrite. (cstore<mode>4 for FP): Remove superfluous quotes. 
	(*eq<mode>, *eq<mode>_compare, *plus_eqsi and splitter,
	*compare_plus_eqsi and splitter, *plus_eqsi_compare and splitter,
	*neg_eq0<mode>, *neg_eq<mode>, *ne0_<mode>, plus_ne0_<mode>,
	compare_plus_ne0_<mode> and splitter, *compare_plus_ne0_<mode>_1 and
	splitter, *plus_ne0_<mode>_compare and splitter, *leu<mode>,
	*leu<mode>_compare and splitter, *plus_leu<mode>, *neg_leu<mode>,
	*and_neg_leu<mode>, *ltu<mode>, *ltu<mode>_compare, *plus_ltu<mode>,
	*plus_ltu<mode>_1, *plus_ltu<mode>compare, *neg_ltu<mode>, *geu<mode>,
	*geu<mode>_compare and splitter, *plus_geu<mode>, *neg_geu<mode>,
	*and_neg_geu<mode>, *plus_gt0<mode>, *gtu<mode>, *gtu<mode>_compare,
	*plus_gtu<mode>, *plus_gtu<mode>_1, *plus_gtu<mode>_compare,
	*neg_gtu<mode>, 12 anonymous insns, and 12 anonymous splitters): Delete.
	(eq<mode>3, ne<mode>3): New.
	(*neg_eq_<mode>, *neg_ne_<mode>): New.
	(*plus_eq_<mode>, *plus_ne_<mode>): New.
	(*minus_eq_<mode>, *minus_ne_<mode>): New.

Okay. Thanks, David
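The cntlz[wd];sr[wd]i idiom mentioned above works because only zero has a full-width leading-zero count. A portable C model of the 64-bit case (the helper names here are made up; this is an illustration, not the rs6000 pattern itself):

```c
#include <stdint.h>

/* Portable model of the PowerPC cntlzd instruction: the number of
   leading zero bits in a 64-bit value, defined as 64 for input 0.  */
static int cntlzd (uint64_t x)
{
  int n = 0;
  for (uint64_t bit = (uint64_t) 1 << 63; bit != 0 && !(x & bit); bit >>= 1)
    n++;
  return n;
}

/* The eq-with-zero idiom from the patch description: cntlzd ; srdi 6.
   Only x == 0 yields a count of 64, and 64 >> 6 == 1; any nonzero x
   gives a count below 64, which shifts down to 0.  */
static uint64_t eq0 (uint64_t x)
{
  return (uint64_t) cntlzd (x) >> 6;
}
```

This also shows why combine would undo a too-early split: the shifted clz sequence is recognizably equivalent to a plain equality test.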
Re: [PATCH 08/10] rs6000: compare->two for rld*;rld*.
On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool seg...@kernel.crashing.org wrote: These now are the only remaining patterns that have type compare. And they shouldn't: two is a better type for them. 2014-12-08 Segher Boessenkool seg...@kernel.crashing.org gcc/ * config/rs6000/rs6000.md (*anddi3_2rld_dot, *anddi3_rld_dot2): Change type from compare to two. Okay. Thanks, David
Re: [PATCH 09/10] rs6000: Remove the (now unused) insn type compare
On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool seg...@kernel.crashing.org wrote:

2014-12-08  Segher Boessenkool  seg...@kernel.crashing.org

gcc/
	* config/rs6000/40x.md (ppc403-compare): Remove compare.
	* config/rs6000/440.md (ppc440-compare): Remove compare.
	* config/rs6000/476.md (ppc476-compare): Remove compare.
	* config/rs6000/601.md (ppc601-compare): Remove compare.
	* config/rs6000/603.md (ppc603-compare): Remove compare.
	* config/rs6000/6xx.md (ppc604-compare): Remove compare.
	* config/rs6000/7450.md (ppc7450-compare): Remove compare.
	* config/rs6000/7xx.md (ppc750-compare): Remove compare.
	* config/rs6000/8540.md (ppc8540_su): Remove compare.
	* config/rs6000/cell.md (cell-fast-cmp, cell-cmp-microcoded):
	Remove compare.
	* config/rs6000/e300c2c3.md (ppce300c3_cmp): Remove compare.
	* config/rs6000/e500mc.md (e500mc_su): Remove compare.
	* config/rs6000/e500mc64.md (e500mc64_su2): Remove compare.
	* config/rs6000/e5500.md (e5500_sfx2): Remove compare.
	* config/rs6000/e6500.md (e6500_sfx2): Remove compare.
	* config/rs6000/mpc.md (mpccore-compare): Remove compare.
	* config/rs6000/power4.md (power4-compare): Remove compare.
	* config/rs6000/power5.md (power5-compare): Remove compare.
	* config/rs6000/power6.md (power6-compare): Remove compare.
	* config/rs6000/power7.md (power7-compare): Remove compare.
	* config/rs6000/power8.md (power8-compare): Remove compare.
	Update comment.
	* config/rs6000/rs6000.c (rs6000_adjust_cost) <TYPE_COMPARE>:
	Remove (three times).
	(is_cracked_insn): Remove TYPE_COMPARE case.
	(insn_must_be_first_in_group) <TYPE_COMPARE>: Remove (twice).
	* config/rs6000/rs6000.md (type): Remove compare.
	(cell_micro): Remove compare.
	* config/rs6000/rs64.md (rs64a-compare): Remove compare.

Okay. Thanks, David
Re: [PATCH 10/10] rs6000: Get rid of the weird decimal thing in add
On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool seg...@kernel.crashing.org wrote: Peter tells me it was an artifact of old versions of the DFP code. The condition can never be false; delete it. 2014-12-08 Segher Boessenkool seg...@kernel.crashing.org gcc/ * config/rs6000/rs6000.md (*add<mode>3): Remove condition. Okay. Thanks, David
Re: [PATCH] Fix PR 61225
On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote:

Do you need to verify SETA and SETB satisfy single_set? Or has that already been done elsewhere?

A is NEXT_INSN (insn), B is prev_nonnote_nondebug_insn (insn). For I1 -> I2 -> B; I2 -> A, LOG_LINK can make sure I1 and I2 are single_set,

It cannot, not anymore anyway. LOG_LINKs can point to an insn with multiple SETs; multiple LOG_LINKs can point to such an insn. The only thing a LOG_LINK from Y to X tells you is that Y is the earliest insn after X that uses some register set by X (and it knows which register, too, nowadays).

+  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2)
+    {
+      df_ref use;
+      rtx insn;
+      unsigned int i = REGNO (SET_SRC (setb));
+
+      for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use))
+	{
+	  insn = DF_REF_INSN (use);
+	  if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P (insn)))
+	    return false;
+	}
+    }
+
+  return true;
+}

Is this fragment really needed? Does it ever trigger? I'd think that for > 2 uses punting would be fine. Do we really commonly have cases with > 2 uses, but where they're all in SETA and SETB? Can't you just check for a death note on the second insn? Together with reg_used_between_p?

+  /* Try to combine a compare insn that sets CC
+     with a preceding insn that can set CC, and maybe with its
+     logical predecessor as well.
+     We need this special code because data flow connections
+     do not get entered in LOG_LINKS.  */

I think you mean not _all_ data flow connections?

+  if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
+      && refer_same_reg_p (insn, prev)
+      && max_combine >= 4)
+    {
+      struct insn_link *next1;
+      FOR_EACH_LOG_LINK (next1, prev)
+	{
+	  rtx_insn *link1 = next1->insn;
+	  if (NOTE_P (link1))
+	    continue;
+	  /* I1 -> I2 -> I3; I2 -> insn;
+	     output parallel (insn, I3).
*/
+	  FOR_EACH_LOG_LINK (nextlinks, link1)
+	    if ((next = try_combine (prev, link1,
+				     nextlinks->insn, NULL,
+				     new_direct_jump_p,
+				     last_combined_insn, insn)) != 0)
+	      {
+		delete_insn (insn);
+		insn = next;
+		statistics_counter_event (cfun, "four-insn combine", 1);
+		goto retry;
+	      }
+	  /* I2 -> I3; I2 -> insn
+	     output next = parallel (insn, I3).  */
+	  if ((next = try_combine (prev, link1,
+				   NULL, NULL,
+				   new_direct_jump_p,
+				   last_combined_insn, insn)) != 0)
+	    {
+	      delete_insn (insn);
+	      insn = next;
+	      statistics_counter_event (cfun, "three-insn combine", 1);
+	      goto retry;
+	    }
+	}
+    }

So you've got two new combine cases here, but I think the testcase only tests one of them. Can you include a testcase for both of the major paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)?

pr61225.c is the case to cover I1->I2->I3; I2->insn. For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also handle them. So I removed the code from the patch.

Why? The simpler case has much better chances of being used. In fact, there are many more cases you could handle: You handle

I1 -> I2 -> I3; I2 -> insn
I2 -> I3; I2 -> insn

but there are also

I1,I2 -> I3; I2 -> insn

and the many 4-insn combos, too. But that's not all: instead of just dealing with I2->insn, you can just as well have I1->insn or I0->insn, and if you could handle the SET not dying in the resulting insn, I3->insn. In fact, in that case you really only need to handle I3->insn (no other instructions involved), as a simple 2-insn combination that combines into the earlier insn instead of the later, to get the effect you want. Just like your patch, that would pull insn earlier, but it would do it much more explicitly.

Some comments on the patch...

+/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
+   It returns TRUE, if reg1 == reg2, and no other refer of reg1
+   except A and B.  */

That sounds like the only correct inputs are such a compare etc., but the routine tests whether that is true.
+static bool +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b) +{ + rtx seta = single_set (a); + rtx setb = single_set (b); + + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) Neither the comment nor the function name mention this. This test is better placed in the
Re: [PATCH] Fix PR 61225
On 12/09/14 12:07, Segher Boessenkool wrote: On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote: Do you need to verify SETA and SETB satisfy single_set? Or has that already been done elsewhere? A is NEXT_INSN (insn), B is prev_nonnote_nondebug_insn (insn). For I1 -> I2 -> B; I2 -> A, LOG_LINK can make sure I1 and I2 are single_set, It cannot, not anymore anyway. LOG_LINKs can point to an insn with multiple SETs; multiple LOG_LINKs can point to such an insn. So let's go ahead and put a single_set test in this function. Is this fragment really needed? Does it ever trigger? I'd think that for > 2 uses punting would be fine. Do we really commonly have cases with > 2 uses, but where they're all in SETA and SETB? Can't you just check for a death note on the second insn? Together with reg_used_between_p? Yea, that'd accomplish the same thing I think Zhenqiang is trying to catch and is simpler than walking the lists. + /* Try to combine a compare insn that sets CC + with a preceding insn that can set CC, and maybe with its + logical predecessor as well. + We need this special code because data flow connections + do not get entered in LOG_LINKS. */ I think you mean not _all_ data flow connections? I almost said something about this comment, but figured I was nitpicking too much :-) So you've got two new combine cases here, but I think the testcase only tests one of them. Can you include a testcase for both of the major paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)? pr61225.c is the case to cover I1->I2->I3; I2->insn. For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also handle them. So I removed the code from the patch. Why? The simpler case has much better chances of being used. The question is does it actually catch anything not already handled? I guess you could argue that doing it in combine is better than peep2 and I'd agree with that.
In fact, there are many more cases you could handle: You handle I1 -> I2 -> I3; I2 -> insn and I2 -> I3; I2 -> insn, but there are also I1,I2 -> I3; I2 -> insn and the many 4-insn combos, too. Yes, but I wonder how much of this is really necessary in practice. We could do exhaustive testing here, but I suspect the payoff isn't all that great. Thus I'm comfortable with faulting in the cases we actually find are useful in practice. +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. + It returns TRUE, if reg1 == reg2, and no other refer of reg1 + except A and B. */ That sounds like the only correct inputs are such a compare etc., but the routine tests whether that is true. Correct, the RTL has to have a specific form and that is tested for. Comment updates can't hurt. +static bool +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b) +{ + rtx seta = single_set (a); + rtx setb = single_set (b); + + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) Neither the comment nor the function name mention this. This test is better placed in the caller of this function, anyway. Didn't consider it terribly important. Moving it to the caller doesn't change anything significantly, though I would agree it's marginally cleaner. @@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, rtx old = newpat; total_sets = 1 + extra_sets; newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets)); - XVECEXP (newpat, 0, 0) = old; + + if (to_combined_insn) + XVECEXP (newpat, 0, --total_sets) = old; + else + XVECEXP (newpat, 0, 0) = old; } Is this correct? If so, it needs a big fat comment, because it is not exactly obvious :-) Also, it doesn't handle at all the case where the new pattern already is a PARALLEL; can that never happen? I'd convinced myself it was. But yes, a comment here would be good. Presumably you're thinking about a PARALLEL that satisfies single_set_p? jeff
Re: Running cc1 etc under valgrind (was Re: [PATCH 13/21] PR jit/63854: Add support for running make check-jit under valgrind)
On Wed, 2014-11-19 at 13:38 -0500, David Malcolm wrote: On Wed, 2014-11-19 at 09:57 -0700, Jeff Law wrote: On 11/19/14 03:46, David Malcolm wrote: This commit updates jit.exp so that if RUN_UNDER_VALGRIND is present in the environment, all of the built client code using libgccjit.so is run under valgrind, with --leak-check=full. Hence: RUN_UNDER_VALGRIND= make check-jit will run all jit testcases under valgrind (taking 27 mins on my machine). Results are written to testsuite/jit/test-FOO.exe.valgrind.txt jit.exp automatically parses these result file, looking for lines of the form definitely lost: 11,316 bytes in 235 blocks indirectly lost: 352 bytes in 4 blocks in the valgrind log's summary footer, adding PASSes if they are zero bytes, and, for now generating XFAILs for non-zero bytes. Sadly this diverges jit.exp's fixed_host_execute further from DejaGnu's host_execute, but I don't see a clean way to fix that. This currently adds 63 PASSes and 49 XFAILs to jit.sum, giving: # of expected passes 2481 # of expected failures 49 gcc/testsuite/ChangeLog: PR jit/63854 * jit.dg/jit.exp (report_leak): New. (parse_valgrind_logfile): New. (fixed_host_execute): Detect if RUN_UNDER_VALGRIND is present in the environment, and if so, run the executable under valgrind, capturing valgrind's output to a logfile. Parse the log file, generating PASSes and XFAILs for the summary of leaks. OK for the trunk. FWIW, I'd love to see a mode where we can easily do this for the other testsuites as well. Many of the cleanups in these patches are called from toplev::finalize, or something called from there, so that they're called by libgccjit.so, but not called by cc1/cc1plus etc. In general, cc1 etc don't need to bother free-ing everything, and can instead simply exit. But if you're running under valgrind, you'd probably want them to call toplev::finalize before exiting, to make the valgrind log shorter. 
So perhaps cc1 etc could detect if they're being run under valgrind, and call toplev::finalize in the appropriate place? Or maybe this could be a command-line option? [I think I prefer autodetection, fwiw] It turns out that this isn't necessary (I think), since the pointers are typically still reachable.
[PATCH] Fix IRA register preferencing
This fixes a bug in register preferencing. When live range splitting creates a new register from another, it copies most fields except for the register preferences. The preference GENERAL_REGS is used as reg_pref[i].prefclass is initialized with GENERAL_REGS in allocate_reg_info () and resize_reg_info (). This initialization value is not innocuous like the comment suggests - if a new register has a non-integer mode, it is forced to prefer GENERAL_REGS. This changes the register costs in pass 2 so that they are incorrect. As a result the live range is either spilled or allocated to an integer register:

void g(double);
void f(double x)
{
  if (x == 0)
    return;
  g (x);
  g (x);
}

f:
	fcmp	d0, #0.0
	bne	.L6
	ret
.L6:
	stp	x19, x30, [sp, -16]!
	fmov	x19, d0
	bl	g
	fmov	d0, x19
	ldp	x19, x30, [sp], 16
	b	g

With the fix it uses a floating point register as expected. Given a similar issue in https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html, would it not be better to change the initialization values of reg_pref to illegal register classes so this kind of issue can be trivially found with an assert? Also would it not be a good idea to have a single register copy function that ensures all data is copied?

ChangeLog:
2014-12-09  Wilco Dijkstra  wdijk...@arm.com

	* gcc/ira-emit.c (ira_create_new_reg): Copy preference classes.
---
 gcc/ira-emit.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/ira-emit.c b/gcc/ira-emit.c
index d246b7f..d736836 100644
--- a/gcc/ira-emit.c
+++ b/gcc/ira-emit.c
@@ -348,6 +348,7 @@ rtx
 ira_create_new_reg (rtx original_reg)
 {
   rtx new_reg;
+  int original_regno = REGNO (original_reg);
 
   new_reg = gen_reg_rtx (GET_MODE (original_reg));
   ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (original_reg);
@@ -356,8 +357,16 @@ ira_create_new_reg (rtx original_reg)
   REG_ATTRS (new_reg) = REG_ATTRS (original_reg);
   if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
     fprintf (ira_dump_file, "Creating newreg=%i from oldreg=%i\n",
-	     REGNO (new_reg), REGNO (original_reg));
+	     REGNO (new_reg), original_regno);
   ira_expand_reg_equiv ();
+
+  /* Copy the preference classes to new_reg.  */
+  resize_reg_info ();
+  setup_reg_classes (REGNO (new_reg),
+		     reg_preferred_class (original_regno),
+		     reg_alternate_class (original_regno),
+		     reg_allocno_class (original_regno));
+
   return new_reg;
 }
-- 
1.9.1
Re: [PATCH] TYPE_OVERFLOW_* cleanup
On Tue, 9 Dec 2014, Marek Polacek wrote: The issue here is that TYPE_OVERFLOW_TRAPS, TYPE_OVERFLOW_UNDEFINED, and TYPE_OVERFLOW_WRAPS macros work on integral types only, yet we pass e.g. pointer_type/real_type to them. This patch adds proper checking for these macros and adds missing guards to various places. This looks pretty straightforward, but I had to tweak a few places wrt vector_types (where I've used VECTOR_INTEGER_TYPE_P) to not to regress folding - and I'm afraid I missed places that aren't tested in our testsuite :/. Bootstrapped/regtested on ppc64-linux and x86_64-linux.

2014-12-09  Marek Polacek  pola...@redhat.com

	* fold-const.c (negate_expr_p): Check for INTEGRAL_TYPE_P.
	(fold_negate_expr): Likewise.
	(extract_muldiv_1): Likewise.
	(maybe_canonicalize_comparison_1): Likewise.
	(fold_comparison): Likewise.
	(fold_binary_loc): Likewise.
	(tree_binary_nonnegative_warnv_p): Likewise.
	(tree_binary_nonzero_warnv_p): Likewise.
	* gimple-ssa-strength-reduction.c (legal_cast_p_1): Likewise.
	* tree-scalar-evolution.c (simple_iv): Likewise.
	(scev_const_prop): Likewise.
	* tree-ssa-loop-niter.c (expand_simple_operations): Likewise.
	* match.pd (X % -C): Likewise.
	(-A - 1 -> ~A): Likewise.
	(~A + A -> -1): Check for INTEGRAL_TYPE_P or VECTOR_INTEGER_TYPE_P.
	* tree-vect-generic.c (expand_vector_operation): Likewise.
	* tree.h (INTEGRAL_TYPE_CHECK): Define.
	(TYPE_OVERFLOW_WRAPS, TYPE_OVERFLOW_UNDEFINED, TYPE_OVERFLOW_TRAPS):
	Add INTEGRAL_TYPE_CHECK.

diff --git gcc/fold-const.c gcc/fold-const.c
index 0c4fe40..ff9d917 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -426,7 +426,8 @@ negate_expr_p (tree t)
     case VECTOR_CST:
       {
-	if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))
+	if (FLOAT_TYPE_P (TREE_TYPE (type))
+	    || (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)))
	  return true;

type is a vector type, so INTEGRAL_TYPE_P (type) is always false I think.
	int count = TYPE_VECTOR_SUBPARTS (type), i;

@@ -558,7 +559,8 @@ fold_negate_expr (location_t loc, tree t)
     case INTEGER_CST:
       tem = fold_negate_const (t, type);
       if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
-	  || (!TYPE_OVERFLOW_TRAPS (type)
+	  || (INTEGRAL_TYPE_P (type)

Can that be false for an INTEGER_CST?

+	      && !TYPE_OVERFLOW_TRAPS (type)
	      && TYPE_OVERFLOW_WRAPS (type))
	  || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)
	return tem;

@@ -5951,7 +5953,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type,
	       || EXPRESSION_CLASS_P (op0))
	      /* ... and has wrapping overflow, and its type is smaller
		 than ctype, then we cannot pass through as widening.  */
-	      && ((TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0))
+	      && (((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+		    && TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0)))
		   && (TYPE_PRECISION (ctype) > TYPE_PRECISION (TREE_TYPE (op0
	  /* ... or this is a truncation (t is narrower than op0),

@@ -5966,7 +5969,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type,
	      /* ... or has undefined overflow while the converted to
		 type has not, we cannot do the operation in the inner type
		 as that would introduce undefined overflow.  */
-	      || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))
+	      || ((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+		   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0)))
		  && !TYPE_OVERFLOW_UNDEFINED (type
	break;

@@ -6159,6 +6163,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type,
       if ((code == TRUNC_MOD_EXPR || code == CEIL_MOD_EXPR
	   || code == FLOOR_MOD_EXPR || code == ROUND_MOD_EXPR)
	  /* If the multiplication can overflow we cannot optimize this.  */
+	  && INTEGRAL_TYPE_P (TREE_TYPE (t))
	  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (t))
	  && TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST
	  && wi::multiple_of_p (op1, c, TYPE_SIGN (type)))

@@ -6211,7 +6216,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, tree wide_type,
	 If we have an unsigned type, we cannot do this since it will change
	 the result if the original computation overflowed.
*/
-      if (TYPE_OVERFLOW_UNDEFINED (ctype)
+      if (INTEGRAL_TYPE_P (ctype)
+	  && TYPE_OVERFLOW_UNDEFINED (ctype)
	  && ((code == MULT_EXPR && tcode == EXACT_DIV_EXPR)
	      || (tcode == MULT_EXPR
		  && code != TRUNC_MOD_EXPR && code != CEIL_MOD_EXPR

@@ -8497,7 +8503,8 @@ maybe_canonicalize_comparison_1 (location_t loc, enum tree_code code, tree type,
   /* Match A +- CST code arg1 and CST code arg1.  We can change the
      first form only if overflow is undefined.
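The two match.pd rewrites named in the ChangeLog (-A - 1 -> ~A and ~A + A -> -1) are ordinary two's-complement identities; the new INTEGRAL_TYPE_P guards just ensure they only fire on types where overflow actually wraps or is undefined. A quick check on unsigned C arithmetic, where both identities hold unconditionally (the helper name is made up for illustration):

```c
/* Check the two match.pd identities on wrapping (unsigned) arithmetic:
   -A - 1 == ~A   and   ~A + A == -1 (all-ones).  */
static int identities_hold (unsigned a)
{
  return (-a - 1u == ~a) && (~a + a == ~0u);
}
```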
Re: locales fixes
On Tue, 9 Dec 2014, Jonathan Wakely wrote: On 08/12/14 23:53 +0100, François Dumont wrote: After having installed all necessary locales on my system I end up with 4 failures. Here is a patch to fix them all. Did you discover why only you are seeing failures? Not just him, anyone with a debian-based system (assuming we are talking about the same thing). I believe it was documented somewhere, Paolo should know. -- Marc Glisse
Re: [Patch] Improving jump-thread pass for PR 54742
On December 9, 2014 7:39:48 PM CET, Jeff Law l...@redhat.com wrote: On 12/09/14 10:38, Sebastian Pop wrote: Richard Biener wrote: On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey sell...@mips.com wrote: expected? Should this test also check flag_thread_jumps? Or should that be getting checked somewhere else? -fthread-jumps is an RTL optimization flag and ignored on GIMPLE. Does it make sense to add a -f[no-]tree-thread-jumps to enable/disable the tree jump threading? I could also add -f[no-]tree-fsm-thread-jumps. Opinions? Our option handling is a bit of a mess if we look at it from the user standpoint. Given that the vast majority of jump threading happens on gimple, ISTM that -f[no-]thread-jumps ought to be controlling the gimple implementation. One could easily argue that the user doesn't really care about where in the pipeline the optimization is implemented. My vote would be to just make -fthread-jumps control both RTL and gimple jump threading. Works for me. On the llvm test-suite, I have seen one ICE with my fsm jump-thread patch. This patch fixes the problem:

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 12f83ba..f8c736e 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2564,6 +2564,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
     {
       if (!loop->header
+	  || !loop_latch_edge (loop)
	  || !bitmap_bit_p (threaded_blocks, loop->header->index))
	continue;

       retval |= thread_through_loop_header (loop, may_peel_loop_headers);

Ok to commit after regstrap? This seems to be indicating that we have a loop with no edge from the latch block to the header block. I'd like to know better how we got into that state. It also returns NULL for loops with multiple latches. So the patch looks OK for me. Thanks, Richard. Also, a test for the GCC testsuite would be good. I have no idea what license covers the LLVM testsuite.
But given a good analysis of the problem we may be able to write a suitable test independent of the LLVM test. jeff
Re: [PATCH, MPX wrappers 1/3] Add MPX wrappers library
On 12/09/14 01:27, Ilya Enkovich wrote: 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * config/i386/linux-common.h (LIBMPX_WRAPPERSSPEC): New. (CHKP_SPEC): Add wrappers library. * c-family/c.opt (static-libmpxwrappers): New. libmpx/ 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * Makefile.am (SUBDIRS): Add mpxwrap when used AS supports MPX. (MAKEOVERRIDES): New. * Makefile.in: Regenerate. * configure.ac: Check AS supports MPX. Add mpxintr/Makefile to config files. * configure: Regenerate. * mpxwrap/Makefile.am: New. * mpxwrap/Makefile.in: New. * mpxwrap/libtool-version: New. * mpxwrap/mpx_wrappers.cc: New. * mpxwrap/libmpxwrappers.map: New. OK once prerequisites are approved. jeff
Re: [Patch] Improving jump-thread pass for PR 54742
On 12/09/14 12:43, Richard Biener wrote: This seems to be indicating that we have a loop with no edge from the latch block to the header block. I'd like to know better how we got into that state. It also returns NULL for loops with multiple latches. So the patch looks OK for me. Ah, OK. Jeff
Re: [PATCH] Set stores_off_frame_dead_at_return to false if sibling call
On 12/08/14 14:50, John David Anglin wrote: On 12/8/2014 3:49 PM, Jeff Law wrote: I think the terminology and variable names certainly makes this tougher to follow than it should. I certainly agree. When I first looked at the code, I thought it was completely backwards. Me too. Thoughts on using both patches to ensure the we don't have extraneous entries in the GEN set and that we're adding more stuff to the KILL set for sibcalls? Jeff
[PATCH, committed] Add jit-tempdir.{c|h}
Move tempdir handling out from jit-playback.c into new jit-tempdir.{c|h} files. This simplifies jit-playback.c, and may help fix PR jit/64206. Committed to trunk as r218533. gcc/jit/ChangeLog: PR jit/64206 * Make-lang.in (jit_OBJS): Add jit/jit-tempdir.o. * jit-common.h (gcc::jit::tempdir): New forward decl. * jit-playback.c: Include jit-tempdir.h. (gcc::jit::playback::context::context): Initialize m_tempdir. (gcc::jit::playback::context::~context): Move tempdir cleanup to new file jit-tempdir.c (make_tempdir_path_template): Move to new file jit-tempdir.c. (gcc::jit::playback::context::compile): Move tempdir creation to new tempdir object in new file jit-tempdir.c. (gcc::jit::playback::context::make_fake_args): Get path from tempdir object rather than from member data. (gcc::jit::playback::context::convert_to_dso): Likewise. (gcc::jit::playback::context::dlopen_built_dso): Likewise. (gcc::jit::playback::context::dump_generated_code): Likewise. (gcc::jit::playback::context::get_path_c_file): New function. (gcc::jit::playback::context::get_path_s_file): New function. (gcc::jit::playback::context::get_path_so_file): New function. * jit-playback.h (gcc::jit::playback::context::get_path_c_file): New function. (gcc::jit::playback::context::get_path_s_file): New function. (gcc::jit::playback::context::get_path_so_file): New function. (gcc::jit::playback::context): Move fields m_path_template, m_path_tempdir, m_path_c_file, m_path_s_file, m_path_so_file to new jit::tempdir class; add field m_tempdir. * jit-tempdir.c: New file. * jit-tempdir.h: New file. 
--- gcc/jit/Make-lang.in | 1 + gcc/jit/jit-common.h | 1 + gcc/jit/jit-playback.c | 118 +--- gcc/jit/jit-playback.h | 15 ++ gcc/jit/jit-tempdir.c | 129 + gcc/jit/jit-tempdir.h | 81 +++ 6 files changed, 262 insertions(+), 83 deletions(-) create mode 100644 gcc/jit/jit-tempdir.c create mode 100644 gcc/jit/jit-tempdir.h diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in index e88fd00..818e14b 100644 --- a/gcc/jit/Make-lang.in +++ b/gcc/jit/Make-lang.in @@ -65,6 +65,7 @@ jit_OBJS = attribs.o \ jit/jit-recording.o \ jit/jit-playback.o \ jit/jit-result.o \ + jit/jit-tempdir.o \ jit/jit-builtins.o # Use strict warnings for this front end. diff --git a/gcc/jit/jit-common.h b/gcc/jit/jit-common.h index c9dde3e..25c2c6f 100644 --- a/gcc/jit/jit-common.h +++ b/gcc/jit/jit-common.h @@ -98,6 +98,7 @@ namespace jit { class result; class dump; class builtins_manager; // declared within jit-builtins.h +class tempdir; namespace recording { diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c index 281ad85..8498900 100644 --- a/gcc/jit/jit-playback.c +++ b/gcc/jit/jit-playback.c @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see #include jit-playback.h #include jit-result.h #include jit-builtins.h +#include jit-tempdir.h /* gcc::jit::playback::context::build_cast uses the convert.h API, @@ -86,6 +87,7 @@ namespace jit { playback::context::context (recording::context *ctxt) : m_recording_ctxt (ctxt), +m_tempdir (NULL), m_char_array_type_node (NULL), m_const_char_ptr (NULL) { @@ -98,25 +100,8 @@ playback::context::context (recording::context *ctxt) playback::context::~context () { - if (get_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES)) -fprintf (stderr, intermediate files written to %s\n, m_path_tempdir); - else -{ - /* Clean up .s/.so and tempdir. 
*/ - if (m_path_s_file) -unlink (m_path_s_file); - if (m_path_so_file) -unlink (m_path_so_file); - if (m_path_tempdir) -rmdir (m_path_tempdir); -} - - free (m_path_template); - /* m_path_tempdir aliases m_path_template, or is NULL, so don't - attempt to free it . */ - free (m_path_c_file); - free (m_path_s_file); - free (m_path_so_file); + if (m_tempdir) +delete m_tempdir; m_functions.release (); } @@ -1515,44 +1500,6 @@ block (function *func, m_label_expr = NULL; } -/* Construct a tempdir path template suitable for use by mkdtemp - e.g. /tmp/libgccjit-XX, but respecting the rules in - libiberty's choose_tempdir rather than hardcoding /tmp/. - - The memory is allocated using malloc and must be freed. - Aborts the process if allocation fails. */ - -static char * -make_tempdir_path_template () -{ - const char *tmpdir_buf; - size_t tmpdir_len; - const char *file_template_buf; - size_t file_template_len; - char *result; - - /* The result of choose_tmpdir is a cached buffer within libiberty, so - we must *not* free it. */ - tmpdir_buf = choose_tmpdir (); - - /*
Re: [PATCH][libstdc++][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model
On Dec 9, 2014, at 3:17 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: In the gcc and g++ testsuite we already catch such cases and mark them as UNSUPPORTED. In libstdc++.exp there is no such functionality. I'm not very happy that it had to be copied, but I couldn't find a way to include the gcc definition sanely. Is this a sane approach to what I'm trying to solve? Ok. If you would like, you can try and pull the common parts out into a new file, and include (load) that file from the two places that currently do that. If they are exactly identical, should be trivial enough. If not exactly the same, I'd do two patches, one to make them the same, then a second one to split them out.
Re: [Patch] Improving jump-thread pass for PR 54742
On Dec 9, 2014, at 10:39 AM, Jeff Law l...@redhat.com wrote: Also, a test for the GCC testsuite would be good. I have no idea what license covers the LLVM testsuite. But given a good analysis of the problem we may be able to write a suitable test independent of the LLVM test. So, the usual engineering rules should work just fine on it: delta reduce and submit.
Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library
On 12/09/14 01:24, Ilya Enkovich wrote: On 24 Nov 17:02, Ilya Enkovich wrote: Right. This works for both top level and multilib checks because failing test is used and CC is usually not set when it's called by the top level configure. If we configure with CC=... then it may go wrong. I left only target check in configure.tgt and inlined test for x32 into libmpx configure. -- Joseph S. Myers jos...@codesourcery.com Here is an updated version. Thanks, Ilya -- Here is an updated version. I moved linker specs to target. Currently mpx libraries are built for x86_64-*-linux* | i?86-*-linux*, so I think gcc/config/i386/linux-common.h is a proper place for LIBMPX declarations. Thanks, Ilya -- 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * Makefile.def: Add libmpx. * configure.ac: Add libmpx. * Makefile.in: Regenerate. * configure: Regenerate. gcc/ 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * config/i386/linux-common.h (LIBMPX_LIBS): New. (LIBMPX_SPEC): New. (CHKP_SPEC): New. * gcc.c (CHKP_SPEC): New. (LINK_COMMAND_SPEC): Add CHKP_SPEC. * c-family/c.opt (static-libmpx): New. libmpx/ diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index b9f7c65..65731cc 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1020,6 +1020,9 @@ fchkp-instrument-marked-only C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0) Instrument only functions marked with bnd_instrument attribute. +static-libmpx +Driver + fcilkplus C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0) Enable Cilk Plus Wasn't this duplicated in the mpx-wrapper patch? I think this is OK on the technical side. I need to do to some vote tallying on the steering committee side WRT licensing, ownership of the sources, canonical source location, etc, so please don't commit yet. jeff
[PATCH, committed] PR jit/63854: Document how to run the jit testsuite under valgrind
Committed to trunk as r218538. Index: gcc/jit/ChangeLog === --- gcc/jit/ChangeLog (revision 218537) +++ gcc/jit/ChangeLog (revision 218538) @@ -1,5 +1,12 @@ 2014-12-09 David Malcolm dmalc...@redhat.com + PR jit/63854 + * docs/internals/index.rst (Running under valgrind): New + subsection. + (docs/_build/texinfo/libgccjit.texi): Regenerate. + +2014-12-09 David Malcolm dmalc...@redhat.com + PR jit/64206 * Make-lang.in (jit_OBJS): Add jit/jit-tempdir.o. * jit-common.h (gcc::jit::tempdir): New forward decl. Index: gcc/jit/docs/internals/index.rst === --- gcc/jit/docs/internals/index.rst (revision 218537) +++ gcc/jit/docs/internals/index.rst (revision 218538) @@ -118,6 +118,51 @@ gdb --args \ testsuite/jit/test-factorial.exe +Running under valgrind +** + +The jit testsuite detects if RUN_UNDER_VALGRIND is present in the +environment (with any value). If it is present, it runs the test client +code under `valgrind <http://valgrind.org>`_, +specifically, the default +`memcheck <http://valgrind.org/docs/manual/mc-manual.html>`_ +tool with +`--leak-check=full +<http://valgrind.org/docs/manual/mc-manual.html#opt.leak-check>`_. + +It automatically parses the output from valgrind, injecting XFAIL results if +any issues are found, or PASS results if the output is clean. The output +is saved to ``TESTNAME.exe.valgrind.txt``. + +For example, the following invocation verbosely runs the testcase +``test-sum-of-squares.c`` under valgrind, showing an issue: + +.. code-block:: console + + $ RUN_UNDER_VALGRIND= \ + make check-jit \ +RUNTESTFLAGS="-v -v -v jit.exp=test-sum-of-squares.c" + + (...verbose log contains detailed valgrind errors, if any...) + + === jit Summary === + + # of expected passes 28 + # of expected failures 2 + + $ less testsuite/jit/jit.sum + (...other results...)
+ XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: definitely lost: 8 bytes in 1 blocks + XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: unsuppressed errors: 1 + (...other results...) + + $ less testsuite/jit/test-sum-of-squares.exe.valgrind.txt + (...shows full valgrind report for this test case...) + +When running under valgrind, it's best to have configured gcc with +:option:`--enable-valgrind-annotations`, which automatically suppresses +various known false positives. + Environment variables - When running client code against a locally-built libgccjit, three Index: gcc/jit/docs/_build/texinfo/libgccjit.texi === --- gcc/jit/docs/_build/texinfo/libgccjit.texi (revision 218537) +++ gcc/jit/docs/_build/texinfo/libgccjit.texi (revision 218538) @@ -197,6 +197,10 @@ * Environment variables:: * Overview of code structure:: +Running the test suite + +* Running under valgrind:: + @end detailmenu @end menu @@ -6548,8 +6552,61 @@ @noindent +@menu +* Running under valgrind:: + +@end menu + +@node Running under valgrind,,,Running the test suite +@anchor{internals/index running-under-valgrind}@anchor{c9} +@subsection Running under valgrind + + +The jit testsuite detects if RUN_UNDER_VALGRIND is present in the +environment (with any value). If it is present, it runs the test client +code under valgrind@footnote{http://valgrind.org}, +specifically, the default +memcheck@footnote{http://valgrind.org/docs/manual/mc-manual.html} +tool with +--leak-check=full@footnote{http://valgrind.org/docs/manual/mc-manual.html#opt.leak-check}. + +It automatically parses the output from valgrind, injecting XFAIL results if +any issues are found, or PASS results if the output is clean. The output +is saved to @code{TESTNAME.exe.valgrind.txt}.
+ +For example, the following invocation verbosely runs the testcase +@code{test-sum-of-squares.c} under valgrind, showing an issue: + +@example +$ RUN_UNDER_VALGRIND= \ +make check-jit \ + RUNTESTFLAGS="-v -v -v jit.exp=test-sum-of-squares.c" + +(...verbose log contains detailed valgrind errors, if any...) + +=== jit Summary === + +# of expected passes 28 +# of expected failures 2 + +$ less testsuite/jit/jit.sum +(...other results...) +XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: definitely lost: 8 bytes in 1 blocks +XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: unsuppressed errors: 1 +(...other results...) + +$ less testsuite/jit/test-sum-of-squares.exe.valgrind.txt +(...shows full valgrind report for this test case...) +@end example + +@noindent + +When running under valgrind, it's best to have configured gcc with +@code{--enable-valgrind-annotations}, which automatically suppresses +various known false positives. + @node
Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library
2014-12-09 22:57 GMT+03:00 Jeff Law l...@redhat.com: On 12/09/14 01:24, Ilya Enkovich wrote: On 24 Nov 17:02, Ilya Enkovich wrote: Right. This works for both top level and multilib checks because failing test is used and CC is usually not set when it's called by the top level configure. If we configure with CC=... then it may go wrong. I left only target check in configure.tgt and inlined test for x32 into libmpx configure. -- Joseph S. Myers jos...@codesourcery.com Here is an updated version. Thanks, Ilya -- Here is an updated version. I moved linker specs to target. Currently mpx libraries are built for x86_64-*-linux* | i?86-*-linux*, so I think gcc/config/i386/linux-common.h is a proper place for LIBMPX declarations. Thanks, Ilya -- 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * Makefile.def: Add libmpx. * configure.ac: Add libmpx. * Makefile.in: Regenerate. * configure: Regenerate. gcc/ 2014-12-09 Ilya Enkovich ilya.enkov...@intel.com * config/i386/linux-common.h (LIBMPX_LIBS): New. (LIBMPX_SPEC): New. (CHKP_SPEC): New. * gcc.c (CHKP_SPEC): New. (LINK_COMMAND_SPEC): Add CHKP_SPEC. * c-family/c.opt (static-libmpx): New. libmpx/ diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index b9f7c65..65731cc 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1020,6 +1020,9 @@ fchkp-instrument-marked-only C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0) Instrument only functions marked with bnd_instrument attribute. +static-libmpx +Driver + fcilkplus C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0) Enable Cilk Plus Wasn't this duplicated in the mpx-wrapper patch? The wrappers patch introduces a similar option, static-libmpxwrappers. I think this is OK on the technical side. I need to do some vote tallying on the steering committee side WRT licensing, ownership of the sources, canonical source location, etc, so please don't commit yet. Great! Let me know when it's time to commit. Thanks, Ilya jeff
Re: [PATCH] Set stores_off_frame_dead_at_return to false if sibling call
On 12/9/2014 3:00 PM, Jeff Law wrote: Thoughts on using both patches to ensure that we don't have extraneous entries in the GEN set and that we're adding more stuff to the KILL set for sibcalls? Maybe we should add all stores to the KILL set for a sibling call and not worry about whether they are frame related or not. Dave -- John David Anglin dave.ang...@bell.net
[patch] gdb python pretty printer for DIEs
I am tired of dumping entire DIEs just to see what type they are. With this patch, we get: (gdb) print context_die $5 = <dw_die_ref 0x76de0230 DW_TAG_module <parent=0x76de DW_TAG_compile_unit>> I know it's past the end of stage1, but this debugging aid can help in fixing bugs in stage >= 3. I am committing this to the [debug-early] branch, but I am hoping I can also commit it to mainline and avoid dragging it along. OK for mainline? commit e20f40f92eeba51f9a1ffb078e060366e7beb471 Author: Aldy Hernandez al...@redhat.com Date: Tue Dec 9 13:04:46 2014 -0800 * gdbhooks.py (class DWDieRefPrinter): New class. (build_pretty_printer): Register dw_die_ref's. diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py index a74e712..26affd9 100644 --- a/gcc/gdbhooks.py +++ b/gcc/gdbhooks.py @@ -253,6 +253,26 @@ class CGraphNodePrinter: return result ## +# Dwarf DIE pretty-printers +## + +class DWDieRefPrinter: +def __init__(self, gdbval): +self.gdbval = gdbval + +def to_string (self): +result = '<dw_die_ref 0x%x' % long(self.gdbval) +if long(self.gdbval) == 0: +return '<dw_die_ref 0x0>' +result += ' %s' % self.gdbval['die_tag'] +if long(self.gdbval['die_parent']) != 0: +result += ' <parent=0x%x %s>' % (long(self.gdbval['die_parent']), + self.gdbval['die_parent']['die_tag']) + +result += '>' +return result + +## class GimplePrinter: def __init__(self, gdbval): @@ -455,6 +475,8 @@ def build_pretty_printer(): 'tree', TreePrinter) pp.add_printer_for_types(['cgraph_node *'], 'cgraph_node', CGraphNodePrinter) +pp.add_printer_for_types(['dw_die_ref'], + 'dw_die_ref', DWDieRefPrinter) pp.add_printer_for_types(['gimple', 'gimple_statement_base *', # Keep this in the same order as gimple.def:
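For readers unfamiliar with gdb pretty-printers, the formatting logic of DWDieRefPrinter above can be exercised outside gdb. The sketch below is a hypothetical standalone rewrite: plain ints and strings stand in for the gdb.Value the real printer receives, and the function name and parameters are illustrative, not part of the patch.

```python
# Hypothetical standalone rewrite of DWDieRefPrinter.to_string, with
# plain Python values standing in for the gdb.Value wrapper.
def format_die_ref(addr, die_tag, parent_addr=0, parent_tag=None):
    # A null dw_die_ref prints as a bare null pointer.
    if addr == 0:
        return '<dw_die_ref 0x0>'
    result = '<dw_die_ref 0x%x' % addr
    result += ' %s' % die_tag
    if parent_addr != 0:
        # The parent DIE is shown inline, as in the gdb session above.
        result += ' <parent=0x%x %s>' % (parent_addr, parent_tag)
    result += '>'
    return result

print(format_die_ref(0x76de0230, 'DW_TAG_module',
                     0x76de0000, 'DW_TAG_compile_unit'))
```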
Re: [PATCH, x86] Fix pblendv expand.
I've added the reproducer to the patch. Is it OK? ChangeLog: 2014-12-10 Evgeny Stupachenko evstu...@gmail.com gcc/testsuite * gcc.target/i386/blend.c: New. gcc/ * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for expand_vec_perm_1 target. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index eafc15a..5a914ad 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d) dcopy.op0 = dcopy.op1 = d->op1; else dcopy.op0 = dcopy.op1 = d->op0; + dcopy.target = gen_reg_rtx (vmode); dcopy.one_operand_p = true; for (i = 0; i < nelt; ++i) diff --git a/gcc/testsuite/gcc.target/i386/blend.c b/gcc/testsuite/gcc.target/i386/blend.c new file mode 100644 index 000..d03bdbb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/blend.c @@ -0,0 +1,61 @@ +/* Test correctness of size 3 store groups permutation. */ +/* { dg-do run } */ +/* { dg-options "-O3" } */ + +#define N 50 + +enum num3 +{ + a, b, c +}; + +struct flags +{ + enum num3 f; + unsigned int c; + unsigned int p; +}; + +struct flagsN +{ + struct flags a[N]; +}; + +void +bar (int n, struct flagsN *ff) +{ + struct flagsN *fc; + for (fc = ff + 1; fc < (ff + n); fc++) +{ + int i; + for (i = 0; i < N; ++i) + { + ff->a[i].f = 0; + ff->a[i].c = i; + ff->a[i].p = -1; + } + for (i = 0; i < n; i++) + { + int j; + for (j = 0; j < N - n; ++j) + { + fc->a[i + j].f = 0; + fc->a[i + j].c = j + i; + fc->a[i + j].p = -1; + } + } +} +} + +struct flagsN q[2]; + +int main() +{ + int i; + long long *rr = (long long *)q[0].a; + bar(2, q); + for (i = 0; i < N * 2; i += 2) +if (rr[i] == -1 && rr[i + 1] == -1) + return 1; + return 0; +} On Tue, Dec 9, 2014 at 7:38 PM, Evgeny Stupachenko evstu...@gmail.com wrote: I mean that there are a lot of people tracking spec2006 stability and therefore the issue should be on track in future. And that I can create the test case, but it would be as big as several GCC functions. Will work on reducing the test case.
On Tue, Dec 9, 2014 at 7:20 PM, Richard Henderson r...@redhat.com wrote: On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote: However patch is fixing spec2006 benchmark. Shouldn't that be enough for regression testing? No. Spec is not free. r~
[PATCH, committed] jit: toyvm.c: use correct path in debuginfo
I noticed while debugging another issue that at some point I'd broken the source locations that toyvm.c sets up into the .toy source files, by using the filename, rather than the filesystem path. Also fix a segfault due to fclose (NULL) when the .toy file is not found. Committed to trunk as r218540. gcc/jit/ChangeLog: * docs/examples/tut04-toyvm/toyvm.c (toyvm_function_compile): Move logic for determining funcname to new function... (get_function_name): ...here, adding logic to skip any leading path from the filename. (toyvm_function_parse): Use the filename for fn_filename, rather than name, so that the debugger can locate the source .toy file. (toyvm_function_parse): Don't fclose a NULL FILE *. --- gcc/jit/docs/examples/tut04-toyvm/toyvm.c | 31 +++ 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/gcc/jit/docs/examples/tut04-toyvm/toyvm.c b/gcc/jit/docs/examples/tut04-toyvm/toyvm.c index 07de507..0089ea7 100644 --- a/gcc/jit/docs/examples/tut04-toyvm/toyvm.c +++ b/gcc/jit/docs/examples/tut04-toyvm/toyvm.c @@ -121,6 +121,25 @@ add_unary_op (toyvm_function *fn, enum opcode opcode, add_op (fn, opcode, operand, linenum); } +static char * +get_function_name (const char *filename) +{ + /* Skip any path separators. */ + const char *pathsep = strrchr (filename, '/'); + if (pathsep) +filename = pathsep + 1; + + /* Copy filename to funcname. */ + char *funcname = (char *)malloc (strlen (filename) + 1); + + strcpy (funcname, filename); + + /* Convert "." to NIL terminator. */ + *(strchr (funcname, '.')) = '\0'; + + return funcname; +} + static toyvm_function * toyvm_function_parse (const char *filename, const char *name) { @@ -149,7 +168,7 @@ toyvm_function_parse (const char *filename, const char *name) fprintf (stderr, "out of memory allocating toyvm_function\n"); goto error; } - fn->fn_filename = name; + fn->fn_filename = filename; /* Read the lines of the file.
*/ while ((linelen = getline (&line, &bufsize, f)) != -1) @@ -208,7 +227,8 @@ toyvm_function_parse (const char *filename, const char *name) error: free (line); - fclose (f); + if (f) +fclose (f); free (fn); return NULL; } @@ -460,12 +480,7 @@ toyvm_function_compile (toyvm_function *fn) memset (&state, 0, sizeof (state)); - /* Copy filename to funcname. */ - funcname = (char *)malloc (strlen (fn->fn_filename) + 1); - strcpy (funcname, fn->fn_filename); - - /* Convert "." to NIL terminator. */ - *(strchr (funcname, '.')) = '\0'; + funcname = get_function_name (fn->fn_filename); state.ctxt = gcc_jit_context_acquire (); -- 1.8.5.3
Re: [PATCH 01/10] rs6000: Clobber XER[CA] in all user asm statements
On 12/08/2014 06:25 AM, David Edelsohn wrote: Okay. I don't know of another option that preserves backward compatibility. If RTH looks at this series of patches, he may have a better suggestion. No, this looks like what I'd do. Indeed, I added this hook back when the i386 port gained direct support for the flags register, so it's exactly what I did. ;-) r~
Re: [PATCH, x86] Fix pblendv expand.
On Wed, Dec 10, 2014 at 12:33:52AM +0300, Evgeny Stupachenko wrote: --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/blend.c @@ -0,0 +1,61 @@ +/* Test correctness of size 3 store groups permutation. */ +/* { dg-do run } */ +/* { dg-options "-O3" } */ + +#define N 50 + +enum num3 +{ + a, b, c +}; + +struct flags +{ + enum num3 f; Does this really have to be an enum? Doesn't unsigned int there work the same? +int main() +{ + int i; + long long *rr = (long long *)q[0].a; + bar(2, q); + for (i = 0; i < N * 2; i += 2) +if (rr[i] == -1 && rr[i + 1] == -1) This is an aliasing violation, can't you avoid that? I mean, just check the individual struct flags fields? And with the aliasing violation fixed, is there anything i?86 specific left in the testcase (i.e. shouldn't it go either to gcc.dg/ or gcc.c-torture/execute/ instead?)? Jakub
Re: [PATCH, libcpp]: Check the result of vasprintf
On Tue, 9 Dec 2014, Uros Bizjak wrote: Attached patch checks the return value and sets ptr to NULL in this case. 2014-12-09 Uros Bizjak ubiz...@gmail.com * directives.c (cpp_define_formatted): Check return value of vasprintf and in case of error set ptr to NULL. Bootstrapped on x86_64-linux-gnu. OK for mainline? No, this will just continue to pass NULL into cpp_define, and so into strlen, where it isn't a valid argument. You need to give an error message for allocation failure and exit, much like xmalloc does (or put xvasprintf in libiberty and use that here - see https://gcc.gnu.org/ml/gcc-patches/2009-11/msg01448.html and https://gcc.gnu.org/ml/gcc-patches/2009-11/msg01449.html - I don't know if that's the latest version). -- Joseph S. Myers jos...@codesourcery.com
RE: [PATCH][MIPS] Fix P5600 memory cost
Hi Prachi, I'm afraid you updated the wrong ChangeLog with this commit. GCC changes are recorded in gcc/ChangeLog. I'm not sure what the correct procedure is for fixing this. Jeff: Should a mistake like this be fixed by removing the entry from the top level and adding one at the appropriate location in the gcc/ChangeLog one? Thanks, Matthew -Original Message- From: Prachi Godbole Sent: 26 November 2014 08:40 To: Matthew Fortune; gcc-patches@gcc.gnu.org Subject: RE: [PATCH][MIPS] Fix P5600 memory cost Committed with ChangeLog entry fixes. Prachi -Original Message- From: Matthew Fortune Sent: Wednesday, November 5, 2014 4:07 PM To: Prachi Godbole; gcc-patches@gcc.gnu.org Subject: RE: [PATCH][MIPS] Fix P5600 memory cost The patch below fixes the memory cost for P5600. ChangeLog: 2014-11-05 Prachi Godbole prachi.godb...@imgtec.com * config/mips/mips.c (mips_rtx_cost_data): Fix memory_letency cost for p5600. Please follow these instructions to add yourself to MAINTAINERS in the write-after-approval section now that you have write access to GCC: https://gcc.gnu.org/svnwrite.html#authenticated OK with fixes to the changelog entry: "latency" not "letency". Remember to tab in the changelog entry and split the line as it will exceed 80 chars. Also two spaces between the date/name and name/email. E.g. 2014-11-05 Prachi Godbole prachi.godb...@imgtec.com * config/mips/mips.c (mips_rtx_cost_data): Fix memory_latency cost for p5600. Thanks, Matthew diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index af6a913..558ba2f 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -1193,7 +1193,7 @@ static const struct mips_rtx_cost_data COSTS_N_INSNS (8),/* int_div_si */ COSTS_N_INSNS (8),/* int_div_di */ 2,/* branch_cost */ - 10 /* memory_latency */ + 4 /* memory_latency */ } };