Re: [PATCH] [AArch64, NEON] Improve vpmaxX vpminX intrinsics

2014-12-09 Thread Yangfei (Felix)
 On 28 November 2014 at 09:23, Yangfei (Felix) felix.y...@huawei.com wrote:
  Hi,
This patch converts vpmaxX  vpminX intrinsics to use builtin functions
 instead of the previous inline assembly syntax.
Regtested with aarch64-linux-gnu on QEMU.  Also passed the glorious
 testsuite of Christophe Lyon.
OK for the trunk?
 
 Hi Felix,   We know from experience that the advsimd intrinsics tend
 to be fragile for big endian and in general it is fairly easy to break the 
 big endian
 case.  For these advsimd improvements that you are working on (that we very
 much appreciate) it is important to run both little endian and big endian
 regressions.
 
 Thanks
 /Marcus


Okay.  Any plan for the advsimd big-endian improvement? 
I rebased this patch over Alan Lawrance's patch: 
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html 
No regressions for aarch64_be-linux-gnu target too.  OK for the thunk? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 218464)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,18 @@
+2014-12-09  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/aarch64-simd.md (aarch64_maxmin_unspmode): New
+   pattern.
+   * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
+   uminp, smax_nanp, smin_nanp): New builtins.
+   * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
+   vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32,
+   vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64,
+   vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64,
+   vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8,
+   vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8,
+   vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64,
+   vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64,
+
 2014-12-07  Felix Yang  felix.y...@huawei.com
Shanyao Chen  chenshan...@huawei.com
 
Index: gcc/config/aarch64/arm_neon.h
===
--- gcc/config/aarch64/arm_neon.h   (revision 218464)
+++ gcc/config/aarch64/arm_neon.h   (working copy)
@@ -8843,491 +8843,7 @@ vpadds_f32 (float32x2_t a)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpmax_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ (fmaxp %0.2s, %1.2s, %2.2s
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vpmax_s8 (int8x8_t a, int8x8_t b)
-{
-  int8x8_t result;
-  __asm__ (smaxp %0.8b, %1.8b, %2.8b
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vpmax_s16 (int16x4_t a, int16x4_t b)
-{
-  int16x4_t result;
-  __asm__ (smaxp %0.4h, %1.4h, %2.4h
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vpmax_s32 (int32x2_t a, int32x2_t b)
-{
-  int32x2_t result;
-  __asm__ (smaxp %0.2s, %1.2s, %2.2s
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vpmax_u8 (uint8x8_t a, uint8x8_t b)
-{
-  uint8x8_t result;
-  __asm__ (umaxp %0.8b, %1.8b, %2.8b
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vpmax_u16 (uint16x4_t a, uint16x4_t b)
-{
-  uint16x4_t result;
-  __asm__ (umaxp %0.4h, %1.4h, %2.4h
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vpmax_u32 (uint32x2_t a, uint32x2_t b)
-{
-  uint32x2_t result;
-  __asm__ (umaxp %0.2s, %1.2s, %2.2s
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpmaxnm_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ (fmaxnmp %0.2s,%1.2s,%2.2s
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vpmaxnmq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ (fmaxnmp %0.4s,%1.4s,%2.4s
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t 

Re: [PATCH] [AArch64, NEON] Improve vpmaxX vpminX intrinsics

2014-12-09 Thread Yangfei (Felix)
 You'll need to rebase over Alan Lawrance's patch.
 https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html

Yes, see my new patch: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00750.html 

  +;; Pairwise Integer Max/Min operations.
  +(define_insn aarch64_maxmin_unspmode
  + [(set (match_operand:VQ_S 0 register_operand =w)
  +   (unspec:VQ_S [(match_operand:VQ_S 1 register_operand w)
  +(match_operand:VQ_S 2 register_operand w)]
  +   MAXMINV))]
  + TARGET_SIMD
  + maxmin_uns_opp\t%0.Vtype, %1.Vtype, %2.Vtype
  +  [(set_attr type neon_minmaxq)]
  +)
  +
 
 Could you roll aarch64_reduc_maxmin_uns_internalv2si into this pattern?

Will come up with another patch to fix this.  Thanks for pointing this out.

 
 Thanks,
 Tejas.



Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-12-09 Thread Ilya Enkovich
On 24 Nov 17:02, Ilya Enkovich wrote:
 
 Right.  This works for both top level and multilib checks because failing 
 test is used and CC is usually not set when it's called by the top level 
 configure.  If we configure with CC=... then it may go wrong.  I left only 
 target check in configure.tgt and inlined test for x32 into libmpx configure.
 
  
  -- 
  Joseph S. Myers
  jos...@codesourcery.com
 
 Here is an updated version.
 
 Thanks,
 Ilya
 --

Here is an updated version.  I moved linker specs to target.  Currently mpx 
libraries are built for x86_64-*-linux* | i?86-*-linux*, so I think 
gcc/config/i386/linux-common.h is a proper place for LIBMPX delcarations.

Thanks,
Ilya
--
2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

* Makefile.def: Add libmpx.
* configure.ac: Add libmpx.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/

2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

* config/i386/linux-common.h (LIBMPX_LIBS): New.
(LIBMPX_SPEC): New.
(CHKP_SPEC): New.
* gcc.c (CHKP_SPEC): New.
(LINK_COMMAND_SPEC): Add CHKP_SPEC.
* c-family/c.opt (static-libmpx): New.

libmpx/

2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

Initial commit.


diff --git a/Makefile.def b/Makefile.def
index 7c8761a..82432ef 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -129,6 +129,9 @@ target_modules = { module= libsanitizer;
   bootstrap=true;
   lib_path=.libs;
   raw_cxx=true; };
+target_modules = { module= libmpx;
+  bootstrap=true;
+  lib_path=.libs; };
 target_modules = { module= libvtv;
   bootstrap=true;
   lib_path=.libs;
diff --git a/configure.ac b/configure.ac
index fd1bdf0..1d36f6b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -162,6 +162,7 @@ target_libraries=target-libgcc \
target-libstdc++-v3 \
target-libsanitizer \
target-libvtv \
+   target-libmpx \
target-libssp \
target-libquadmath \
target-libgfortran \
@@ -642,6 +643,25 @@ if test -d ${srcdir}/libvtv; then
 fi
 fi
 
+
+# Disable libmpx on unsupported systems.
+if test -d ${srcdir}/libmpx; then
+if test x$enable_libmpx = x; then
+   AC_MSG_CHECKING([for libmpx support])
+   if (srcdir=${srcdir}/libmpx; \
+   . ${srcdir}/configure.tgt; \
+   test $LIBMPX_SUPPORTED != yes)
+   then
+   AC_MSG_RESULT([no])
+   noconfigdirs=$noconfigdirs target-libmpx
+   else
+   AC_MSG_RESULT([yes])
+   fi
+fi
+fi
+
+
+
 # Disable libquadmath for some systems.
 case ${target} in
   avr-*-*)
@@ -2662,6 +2682,11 @@ if echo  ${target_configdirs}  | grep  libvtv   
/dev/null 21 
   bootstrap_target_libs=${bootstrap_target_libs}target-libvtv,
 fi
 
+# If we are building libmpx, bootstrap it.
+if echo  ${target_configdirs}  | grep  libmpx   /dev/null 21; then
+  bootstrap_target_libs=${bootstrap_target_libs}target-libmpx,
+fi
+
 # Determine whether gdb needs tk/tcl or not.
 # Use 'maybe' since enable_gdbtk might be true even if tk isn't available
 # and in that case we want gdb to be built without tk.  Ugh!
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index b9f7c65..65731cc 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1020,6 +1020,9 @@ fchkp-instrument-marked-only
 C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0)
 Instrument only functions marked with bnd_instrument attribute.
 
+static-libmpx
+Driver
+
 fcilkplus
 C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
 Enable Cilk Plus
diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h
index 1eaf024..07cd1a2 100644
--- a/gcc/config/i386/linux-common.h
+++ b/gcc/config/i386/linux-common.h
@@ -53,3 +53,27 @@ along with GCC; see the file COPYING3.  If not see
   GNU_USER_TARGET_ENDFILE_SPEC, \
   GNU_USER_TARGET_MATHFILE_SPEC   \
   ANDROID_ENDFILE_SPEC)
+
+#ifndef LIBMPX_LIBS
+#define LIBMPX_LIBS \
+ %:include(libmpx.spec)%(link_libmpx)
+#endif
+
+#ifndef LIBMPX_SPEC
+#if defined(HAVE_LD_STATIC_DYNAMIC)
+#define LIBMPX_SPEC \
+%{mmpx:%{fcheck-pointer-bounds:\
+%{static:--whole-archive -lmpx --no-whole-archive LIBMPX_LIBS }\
+%{!static:%{static-libmpx: LD_STATIC_OPTION  --whole-archive}\
+-lmpx %{static-libmpx:--no-whole-archive  LD_DYNAMIC_OPTION \
+LIBMPX_LIBS 
+#else
+#define LIBMPX_SPEC \
+%{mmpx:%{fcheck-pointer-bounds:-lmpx LIBMPX_LIBS }}
+#endif
+#endif
+
+#ifndef CHKP_SPEC
+#define CHKP_SPEC \
+%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC }}
+#endif
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 4cb4ba3..636830e 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -809,6 +809,10 @@ proper position among the other output files.  */
 %{fvtable-verify=preinit: -lvtv 

Re: [PATCH, MPX wrappers 1/3] Add MPX wrappers library

2014-12-09 Thread Ilya Enkovich
On 05 Dec 15:52, Jeff Law wrote:
 On 12/03/14 07:28, Ilya Enkovich wrote:
#ifndef MPX_SPEC
#define MPX_SPEC \
 -%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC }}
 +%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC LIBMPXWRAPPERS_SPEC }}
#endif
 
 Ugh.  Somehow I missed that MPX_SPEC was in gcc.c along with the uses of
 LIBMPX_SPEC.  Aren't all these target specific and thus belong in the x86
 specific files?
 
 Is config/i386/linux-common.h is a proper place for these specs then?
 Depends on whether or not we expect MPX to show up on other systems
 such as *bsd, mingw, solaris, etc.
 
 So I'd say linux-common.h is better than gcc.c, but perhaps not the
 best location.  Uros should chime in here.
 
 Right. Wrappers code doesn't use anything specific to MPX.  In case of
 pure software solution we should be able to compile and use this
 library without changes (except compilation flags).  But in case pure
 software solution exists MPX option should still be available and we
 should have two builds for this library.
 Ok.  Just wanted to be sure I understood how the pieces fit
 together.  I don't really expect a software implementation, but
 keeping it in mind helps us reasonably consider where certain things
 belong implementation wise.
 
 jeff
 

Here is a version with linker specs moved into linux-common.h.

Thanks,
Ilya
--
gcc/

2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

* config/i386/linux-common.h (LIBMPX_WRAPPERSSPEC): New.
(CHKP_SPEC): Add wrappers library.
* c-family/c.opt (static-libmpxwrappers): New.

libmpx/

2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

* Makefile.am (SUBDIRS): Add mpxwrap when used
AS supports MPX.
(MAKEOVERRIDES): New.
* Makefile.in: Regenerate.
* configure.ac: Check AS supports MPX.  Add
mpxintr/Makefile to config files.
* configure: Regenerate.
* mpxwrap/Makefile.am: New.
* mpxwrap/Makefile.in: New.
* mpxwrap/libtool-version: New.
* mpxwrap/mpx_wrappers.cc: New.
* mpxwrap/libmpxwrappers.map: New.


diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 65731cc..2632c3f 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1023,6 +1023,9 @@ Instrument only functions marked with bnd_instrument 
attribute.
 static-libmpx
 Driver
 
+static-libmpxwrappers
+Driver
+
 fcilkplus
 C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
 Enable Cilk Plus
diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h
index 07cd1a2..854af46 100644
--- a/gcc/config/i386/linux-common.h
+++ b/gcc/config/i386/linux-common.h
@@ -73,7 +73,21 @@ along with GCC; see the file COPYING3.  If not see
 #endif
 #endif
 
+#ifndef LIBMPXWRAPPERS_SPEC
+#if defined(HAVE_LD_STATIC_DYNAMIC)
+#define LIBMPXWRAPPERS_SPEC \
+%{mmpx:%{fcheck-pointer-bounds:%{!fno-chkp-use-wrappers:\
+%{static:-lmpxwrappers}\
+%{!static:%{static-libmpxwrappers: LD_STATIC_OPTION  --whole-archive}\
+-lmpxwrappers %{static-libmpxwrappers:--no-whole-archive \
+LD_DYNAMIC_OPTION }
+#else
+#define LIBMPXWRAPPERS_SPEC \
+%{mmpx:%{fcheck-pointer-bounds:{!fno-chkp-use-wrappers:-lmpxwrappers}}}
+#endif
+#endif
+
 #ifndef CHKP_SPEC
 #define CHKP_SPEC \
-%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC }}
+%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC LIBMPXWRAPPERS_SPEC }}
 #endif
diff --git a/libmpx/Makefile.am b/libmpx/Makefile.am
index 6cee4ac..bd0a8b6 100644
--- a/libmpx/Makefile.am
+++ b/libmpx/Makefile.am
@@ -2,6 +2,9 @@ ACLOCAL_AMFLAGS = -I .. -I ../config
 
 if LIBMPX_SUPPORTED
 SUBDIRS = mpxrt
+if MPX_AS_SUPPORTED
+SUBDIRS += mpxwrap
+endif
 nodist_toolexeclib_HEADERS = libmpx.spec
 endif
 
@@ -45,3 +48,5 @@ AM_MAKEFLAGS = \
PICFLAG=$(PICFLAG) \
RANLIB=$(RANLIB) \
DESTDIR=$(DESTDIR)
+
+MAKEOVERRIDES =
diff --git a/libmpx/configure.ac b/libmpx/configure.ac
index dbfad02..4669525 100644
--- a/libmpx/configure.ac
+++ b/libmpx/configure.ac
@@ -100,6 +100,18 @@ AC_CHECK_TOOL(AS, as)
 AC_CHECK_TOOL(AR, ar)
 AC_CHECK_TOOL(RANLIB, ranlib, :)
 
+# Check we may build wrappers library
+echo test:  bndmov %bnd0, %bnd1  conftest.s
+if AC_TRY_COMMAND([$AS -o conftest.o conftest.s 1AS_MESSAGE_LOG_FD])
+then
+mpx_as=yes
+else
+mpx_as=no
+echo configure: no MPX support fo as AS_MESSAGE_LOG_FD
+fi
+rm -f conftest.o conftest.s
+AM_CONDITIONAL(MPX_AS_SUPPORTED, [test x$mpx_as = xyes])
+
 # Configure libtool
 AC_LIBTOOL_DLOPEN
 AM_PROG_LIBTOOL
@@ -117,7 +129,7 @@ fi
 
 AC_CONFIG_FILES([Makefile libmpx.spec])
 AC_CONFIG_HEADERS(config.h)
-AC_CONFIG_FILES(AC_FOREACH([DIR], [mpxrt], [DIR/Makefile]),
+AC_CONFIG_FILES(AC_FOREACH([DIR], [mpxrt mpxwrap], [DIR/Makefile ]),
   [cat  vpsed$$  \_EOF
 s!`test -f '$' || echo '$(srcdir)/'`!!
 _EOF
diff --git a/libmpx/mpxwrap/Makefile.am b/libmpx/mpxwrap/Makefile.am
new file mode 100644
index 000..c254d9b
--- /dev/null
+++ b/libmpx/mpxwrap/Makefile.am
@@ -0,0 +1,54 @@
+ALCLOCAL_AMFLAGS = -I .. -I ../config
+
+# 

[PING ^ 3] [RFC PATCH, AARCH64] Add support for -mlong-calls option

2014-12-09 Thread Yangfei (Felix)
Hi, 
  This is a pin for: 
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02258.html 

Thanks.


Re: [PATCH] Fix broadcast from scalar patterns (PR target/63594)

2014-12-09 Thread Uros Bizjak
On Mon, Dec 8, 2014 at 10:42 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 This patch attempts to fix
 (set (reg:V*mode) (vec_duplicate:V*mode (reg/mem:mode)))
 patterns.  One issue is that there were separate patterns for
 broadcast from gpr and separate patterns for broadcast from memory
 (and vector reg), that isn't a good idea for reload, which can't then
 freely choose.  Another issue is that some pre-AVX2 broadcast patterns
 were present above the avx512vl broadcast patterns, so again, reload didn't
 have the possibility to use %xmm16-31/%ymm16-31 registers.  Also, the
 splitter written for AVX2 broadcasts from gpr went into the way of AVX512VL
 broadcasts.  And finally, the avx512*intrin.h headers were using
 #ifdef TARGET_64BIT, macro not used anywhere (probably meant to write
 __x86_64__ instead, but with the patch we actually just have one set of
 builtins.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

 2014-12-08  Jakub Jelinek  ja...@redhat.com

 PR target/63594
 * config/i386/sse.md (vec_dupv4sf): Move after
 mask_codeforavx512_vec_dup_gprmodemask_name pattern.
 (*vec_dupv4si, *vec_dupv2di): Likewise.
 (mask_codeforavx512_vec_dup_memmodemask_name): Merge into ...
 (mask_codeforavx512_vec_dup_gprmodemask_name): ... this
 pattern.
 (*vec_dupmode AVX2_VEC_DUP_MODE splitter): Disable for
 TARGET_AVX512VL (for QI/HI scalar modes only if TARGET_AVX512BW
 is set too).
 * config/i386/i386.c (enum ix86_builtins): Remove
 IX86_BUILTIN_PBROADCASTQ256_MEM_MASK,
 IX86_BUILTIN_PBROADCASTQ128_MEM_MASK and
 IX86_BUILTIN_PBROADCASTQ512_MEM.
 (bdesc_args): Use __builtin_ia32_pbroadcastq512_gpr_mask,
 __builtin_ia32_pbroadcastq256_gpr_mask and
 __builtin_ia32_pbroadcastq128_gpr_mask instead of *_mem_mask
 regardless of OPTION_MASK_ISA_64BIT.
 * config/i386/avx512fintrin.h (_mm512_set1_epi64,
 _mm512_mask_set1_epi64, _mm512_maskz_set1_epi64): Use *_gpr_mask
 builtins regardless of whether TARGET_64BIT is defined or not.
 * config/i386/avx512vlintrin.h (_mm256_mask_set1_epi64,
 _mm256_maskz_set1_epi64, _mm_mask_set1_epi64, _mm_maskz_set1_epi64):
 Likewise.

LGTM, but please see inline comment below.

 --- gcc/config/i386/sse.md.jj   2014-12-03 11:52:41.0 +0100
 +++ gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
 @@ -6319,22 +6319,6 @@ (define_insn avx512f_vec_dupmode_1
  (set_attr prefix evex)
  (set_attr mode MODE)])

 -(define_insn vec_dupv4sf
 -  [(set (match_operand:V4SF 0 register_operand =x,x,x)
 -   (vec_duplicate:V4SF
 - (match_operand:SF 1 nonimmediate_operand x,m,0)))]
 -  TARGET_SSE
 -  @
 -   vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0}
 -   vbroadcastss\t{%1, %0|%0, %1}
 -   shufps\t{$0, %0, %0|%0, %0, 0}
 -  [(set_attr isa avx,avx,noavx)
 -   (set_attr type sseshuf1,ssemov,sseshuf1)
 -   (set_attr length_immediate 1,0,1)
 -   (set_attr prefix_extra 0,1,*)
 -   (set_attr prefix vex,vex,orig)
 -   (set_attr mode V4SF)])
 -
  ;; Although insertps takes register source, we prefer
  ;; unpcklps with register source since it is shorter.
  (define_insn *vec_concatv2sf_sse4_1
 @@ -12821,37 +12805,6 @@ (define_split
operands[1] = adjust_address (operands[1], ssescalarmodemode, offs);
  })

 -(define_insn *vec_dupv4si
 -  [(set (match_operand:V4SI 0 register_operand =x,x,x)
 -   (vec_duplicate:V4SI
 - (match_operand:SI 1 nonimmediate_operand  x,m,0)))]
 -  TARGET_SSE
 -  @
 -   %vpshufd\t{$0, %1, %0|%0, %1, 0}
 -   vbroadcastss\t{%1, %0|%0, %1}
 -   shufps\t{$0, %0, %0|%0, %0, 0}
 -  [(set_attr isa sse2,avx,noavx)
 -   (set_attr type sselog1,ssemov,sselog1)
 -   (set_attr length_immediate 1,0,1)
 -   (set_attr prefix_extra 0,1,*)
 -   (set_attr prefix maybe_vex,vex,orig)
 -   (set_attr mode TI,V4SF,V4SF)])
 -
 -(define_insn *vec_dupv2di
 -  [(set (match_operand:V2DI 0 register_operand =x,x,x,x)
 -   (vec_duplicate:V2DI
 - (match_operand:DI 1 nonimmediate_operand  0,x,m,0)))]
 -  TARGET_SSE
 -  @
 -   punpcklqdq\t%0, %0
 -   vpunpcklqdq\t{%d1, %0|%0, %d1}
 -   %vmovddup\t{%1, %0|%0, %1}
 -   movlhps\t%0, %0
 -  [(set_attr isa sse2_noavx,avx,sse3,noavx)
 -   (set_attr type sselog1,sselog1,sselog1,ssemov)
 -   (set_attr prefix orig,vex,maybe_vex,orig)
 -   (set_attr mode TI,TI,DF,V4SF)])
 -
  (define_insn *vec_concatv2si_sse4_1
[(set (match_operand:V2SI 0 register_operand =Yr,*x,x, Yr,*x,x, x, 
 *y,*y)
 (vec_concat:V2SI
 @@ -16665,46 +16618,78 @@ (define_insn mask_codeforavx512f_broa
 (set_attr mode sseinsnmode)])

  (define_insn mask_codeforavx512_vec_dup_gprmodemask_name
 -  [(set (match_operand:VI12_AVX512VL 0 register_operand =v)
 +  [(set (match_operand:VI12_AVX512VL 0 register_operand =v,v)
 (vec_duplicate:VI12_AVX512VL
 - (match_operand:ssescalarmode 1 register_operand r)))]
 +  

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Uros Bizjak
On Tue, Dec 9, 2014 at 12:33 AM, Evgeny Stupachenko evstu...@gmail.com wrote:

 The patch fix pblendv expand.
 The bug was uncovered when permutation operands are constants.
 In this case we init target register for expand_vec_perm_1 with
 constant and then rewrite the target with constant for
 expand_vec_perm_pblend.

 The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
 -flto -march=corei7.

 Bootstrap and make check passed.

 Is it ok?

Please add a testcase.

Uros.


 Evgeny

 2014-12-09  Evgeny Stupachenko  evstu...@gmail.com

 gcc/
 * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
 expand_vec_perm_1 target.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index eafc15a..5a914ad 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
  dcopy.op0 = dcopy.op1 = d-op1;
else
  dcopy.op0 = dcopy.op1 = d-op0;
 +  dcopy.target = gen_reg_rtx (vmode);
dcopy.one_operand_p = true;

for (i = 0; i  nelt; ++i)


Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-09 Thread Richard Biener
On Mon, 8 Dec 2014, Joseph Myers wrote:

 On Mon, 8 Dec 2014, Richard Biener wrote:
 
  The alternative is to decide used in the middle-end at one point,
  for example at the end of all_lowering_passes where hopefully
  we have constant folded and removed dead code enough.  We can also
  compute an overall uses libm flag to fix the testcase I reported
  (of course we'd like to re-compute that at LTO time).
 
 If you do that, of course any optimizations before deciding used need to 
 be conservative - assume that any function except the one you're 
 optimizing a call of is not used.

Yes.  I guess I'm willing to consider a new langhook that for C evaluates
C_DECL_USED.  We still have to find a suitable point to query it -
with the earliest point is after parsing finished (thus when we
finalize the TU).  At that point all constant expression folding
should have happened (up to the point required by the language) - but
I suppose C_DECL_USED is still set even if all calls were constant
folded away.

Note that computing a middle-end C_DECL_USED can be done from
gimplification (possibly after folding the calls there once).
Same for a global libm function used flag (where at some point
we might need a target hook to allow specifying whether a call
is from libm).

Richard.


[PATCH, trivial] [AArch64] Remove declaration of removed function from aarch64-protos.h

2014-12-09 Thread Yangfei (Felix)
The definition of function aarch64_function_profiler is removed since GCC-4.9. 
But the declaration is still there in aarch64-protos.h.  So remove it. 
OK for the trunk? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 218464)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-12-09  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/aarch64-protos.h (aarch64_function_profiler): Remove
+   declaration of removed function.
+
 2014-12-07  Felix Yang  felix.y...@huawei.com
Shanyao Chen  chenshan...@huawei.com
 
Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h (revision 218464)
+++ gcc/config/aarch64/aarch64-protos.h (working copy)
@@ -247,7 +247,6 @@ void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
 void aarch64_expand_prologue (void);
 void aarch64_expand_vector_init (rtx, rtx);
-void aarch64_function_profiler (FILE *, int);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
   const_tree, unsigned);
 void aarch64_init_expanders (void);


Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Uros Bizjak
On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak ubiz...@gmail.com wrote:

 The patch fix pblendv expand.
 The bug was uncovered when permutation operands are constants.
 In this case we init target register for expand_vec_perm_1 with
 constant and then rewrite the target with constant for
 expand_vec_perm_pblend.

 The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
 -flto -march=corei7.

 Bootstrap and make check passed.

 Is it ok?

 Please add a testcase.

Also, it surprises me that we enter expand_vec_perm_pblendv with
uninitialized (?) target. Does your patch only papers over a real
problem up the call chain (hard to say without a testcase)?

Uros.



 Evgeny

 2014-12-09  Evgeny Stupachenko  evstu...@gmail.com

 gcc/
 * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
 expand_vec_perm_1 target.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index eafc15a..5a914ad 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
  dcopy.op0 = dcopy.op1 = d-op1;
else
  dcopy.op0 = dcopy.op1 = d-op0;
 +  dcopy.target = gen_reg_rtx (vmode);
dcopy.one_operand_p = true;

for (i = 0; i  nelt; ++i)


[PATCH] Fix half of PR64191

2014-12-09 Thread Richard Biener

This avoids vectorizing empty loops that only contain an indirect
clobber.  The real bug of course is that we don't remove the loop
in the first place (but we might not able to prove it terminates).
Still vectorizing it adds additional prologue/epilogue loops that
only convolutes the final code we generate.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
and branch.

Richard.

2014-12-09  Richard Biener  rguent...@suse.de

PR tree-optimization/64191
* tree-vect-stmts.c (vect_stmt_relevant_p): Clobbers are
not relevant (nor are their uses).

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 218479)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -340,7 +340,8 @@ vect_stmt_relevant_p (gimple stmt, loop_
 
   /* changing memory.  */
   if (gimple_code (stmt) != GIMPLE_PHI)
-if (gimple_vdef (stmt))
+if (gimple_vdef (stmt)
+!gimple_clobber_p (stmt))
   {
if (dump_enabled_p ())
  dump_printf_loc (MSG_NOTE, vect_location,


Re: [PATCH] Fix broadcast from scalar patterns (PR target/63594)

2014-12-09 Thread Jakub Jelinek
On Tue, Dec 09, 2014 at 09:49:17AM +0100, Uros Bizjak wrote:
   (define_insn mask_codeforavx512_vec_dup_gprmodemask_name
  -  [(set (match_operand:VI48_AVX512VL 0 register_operand =v)
  -   (vec_duplicate:VI48_AVX512VL
  - (match_operand:ssescalarmode 1 register_operand r)))]
  -  TARGET_AVX512F  (ssescalarmodemode != DImode || TARGET_64BIT)
  -{
  -  return vpbroadcastbcstscalarsuff\t{%1, 
  %0mask_operand2|%0mask_operand2, %1};
  -}
  -  [(set_attr type ssemov)
  -   (set_attr prefix evex)
  -   (set_attr mode sseinsnmode)])
  -
  -(define_insn mask_codeforavx512_vec_dup_memmodemask_name
  -  [(set (match_operand:V48_AVX512VL 0 register_operand =v)
  -   (vec_duplicate:V48_AVX512VL
  - (match_operand:ssescalarmode 1 nonimmediate_operand vm)))]
  +  [(set (match_operand:V48_AVX512VL 0 register_operand =v,v)
  +   (vec_duplicate:V48_AVX512VL
  + (match_operand:ssescalarmode 1 nonimmediate_operand vm,r)))]
 TARGET_AVX512F
 vsseintprefixbroadcastbcstscalarsuff\t{%1, 
  %0mask_operand2|%0mask_operand2, %1}
 [(set_attr type ssemov)
  (set_attr prefix evex)
  -   (set_attr mode sseinsnmode)])
  +   (set_attr mode sseinsnmode)
  +   (set (attr enabled)
  + (if_then_else (eq_attr alternative 1)
  +   (symbol_ref GET_MODE_CLASS (ssescalarmodemode) == MODE_INT
  + (ssescalarmodemode != DImode || TARGET_64BIT))
  +   (const_int 1)))])

I have no idea how to get rid of this enabled attribute, as it disables just
one alternative.  Creating a special isa attribute for it looks less
readable and much more expensive.  This enabled attribute is IMHO of the
good kind, GET_MODE_CLASS (ssescalarmodemode) == MODE_INT and
ssescalarmodemode != DImode are constants for the particular chosen mode,
TARGET_64BIT doesn't change during compilation of a TU, and so the only
variable is that it disables just one of the alternatives.

  @@ -16759,7 +16744,10 @@ (define_split
 [(set (match_operand:AVX2_VEC_DUP_MODE 0 register_operand)
  (vec_duplicate:AVX2_VEC_DUP_MODE
(match_operand:ssescalarmode 1 register_operand)))]
  -  TARGET_AVX2  reload_completed  GENERAL_REG_P (operands[1])
  +  TARGET_AVX2
  +(!TARGET_AVX512VL
  +   || (!TARGET_AVX512BW  GET_MODE_SIZE (ssescalarmodemode)  2))
  +reload_completed  GENERAL_REG_P (operands[1])
 [(const_int 0)]
 
 We would like to avoid convoluted insn enable condition by moving the
 target delated complexity to the mode iterator, even if it requires
 additional single-use mode iterator. In the ideal case, the remaining
 target-dependant condition would represent the baseline target for an
 insn and all other target-related conditions would be inside mode
 iterator.

This splitter condition can be replaced with mode iterator, but it results
in big repetitions there (incremental diff below).  Or do you have something
else in mind?

--- gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
+++ gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
@@ -16740,14 +16740,21 @@
(set_attr isa avx2,noavx2,avx2,noavx2)
(set_attr mode sseinsnmode,V8SF,sseinsnmode,V8SF)])
 
+;; Modes handled by AVX2 vec_dup splitter.  Assumes TARGET_AVX2,
+;; if both -mavx512vl and -mavx512bw are used, disables all modes,
+;; if just -mavx512vl, disables just V*SI.
+(define_mode_iterator AVX2_VEC_DUP_MODE_SPLIT
+  [(V32QI !TARGET_AVX512VL || !TARGET_AVX512BW)
+   (V16QI !TARGET_AVX512VL || !TARGET_AVX512BW)
+   (V16HI !TARGET_AVX512VL || !TARGET_AVX512BW)
+   (V8HI !TARGET_AVX512VL || !TARGET_AVX512BW)
+   (V8SI !TARGET_AVX512VL) (V4SI !TARGET_AVX512VL)])
+
 (define_split
-  [(set (match_operand:AVX2_VEC_DUP_MODE 0 register_operand)
-   (vec_duplicate:AVX2_VEC_DUP_MODE
+  [(set (match_operand:AVX2_VEC_DUP_MODE_SPLIT 0 register_operand)
+   (vec_duplicate:AVX2_VEC_DUP_MODE_SPLIT
  (match_operand:ssescalarmode 1 register_operand)))]
-  TARGET_AVX2
-(!TARGET_AVX512VL
-   || (!TARGET_AVX512BW  GET_MODE_SIZE (ssescalarmodemode)  2))
-reload_completed  GENERAL_REG_P (operands[1])
+  TARGET_AVX2  reload_completed  GENERAL_REG_P (operands[1])
   [(const_int 0)]
 {
   emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),

Jakub


Re: [RFC, PATCH, fortran] PR fortran/60255 Deferred character length

2014-12-09 Thread Andre Vehreschild
Hi Paul,

 s/furure/future/ :-)

Hups, fixed.

 Why are you using a double underscore in get__len_component?

Because the component is called _len. The routine should be called get _len
component, but spaces aren't allowed :-) Anyways, does this violate some style
guide? Should I remove one of underscores?

 More seriously, I think that the len field should be added unconditionally
 to unlimited polymorphic variables. Otherwise, you might find unlimited
 polymorphic variables that are created in an already compiled
 module/subprogramme arriving without the requisite field.

I was thinking about that, too. For a start I just wanted to give an idea of
where this is going. When more gfortran gurus vote for the unconditional add to
u-poly variables, then I will change it. 

 Michael Metcalf has posted an example that makes use of unlimited
 polymorphism at ftp://ftp.numerical.rl.ac.uk/pub/MRandC/oo.f90 . gfortran
 does not work correctly with it at the moment because of the lack of a len
 field. Removing all the string input allows it to run correctly. I think
 that you should ensure that your patch fixes the problem.
 
 A slight obstacle is that the substring at line 216 causes the emission of:
 type is (character(*))
   1
 Error: Associate-name '__tmp_CHARACTER_0_1' at (1) is used as array
 
 Just retaining print *, 'character = ', v, '' allows the example to
 compile

Ok, I take a look at it. As I am paid to fix certain bugs that prevent
compiling another software, I will not prioritize working on 55901 as long as
it is not needed in the software I focus on. Sorry for not being more
enthusiastic, but there are more than 8 prs (and only one down yet) I have to
fix and time is limited.

What I did not mention in the previous mail is that I also plan to implement
this fixes in the fortran-dev branch with the new array descriptor. Given that
there is no other volunteer. :-)

Please continue commenting.

Regards,
Andre

 ifort compiles and runs it successfully and so I think that it would be
 nice if gfortran catches up on this one.
 
 Parenthetically, I wonder if this is not the time to implement PR53971...
 including responding to Mikael's comment?
 
 Anyway, this is a good start in the right direction. Please persist!
 
 Thanks
 
 Paul
 
 
 On 8 December 2014 at 18:38, Andre Vehreschild ve...@gmx.de wrote:
 
  Hi all,
 
  please find attached a more elaborate patch for pr60255. I totally agree
  that
  my first attempt was just scratching the surface of the work needed.
 
  This patch also is *not* complete, but because I am really new to gfortran
  patching, I don't want to present a final patch only to learn then, that I
  have
  violated design rules, common practice or the like. Therefore please
  comment
  and direct me to any sources/ideas to improve the patch.
 
  Topic:
  The pr 60255 is about assigning a char array to an unlimited polymorphic
  entity. In the comments the concern about the lost length information is
  raised. The patch adds a _len component to the unlimited polymorphic entity
  (after _data and _vtab) and adds an assignment of the string length to _len
  when a string is pointer assigned to the unlimited poly entity.
  Furthermore is
  the intrinsic len(unlimited poly pointing to a string) resolved to give the
  _len component.
 
  Yet missing:
  - assign _len component back to deferred char array length component
  - transport length along chains of unlimited poly entities, i.e., a = b;
  c =
a where all objects are unlimited poly and b is a string.
  - allocate() in this context
 
  Patch dependencies:
  none
 
  Comments, concerns, candy welcome!
 
  Regards,
  Andre
 
  On Sun, 17 Aug 2014 14:32:21 +0200
  domi...@lps.ens.fr (Dominique Dhumieres) wrote:
 
the testcase should check that the code generated is actually working,
not just that the ICE disappeared.
  
   I agree. Note that there is a test in the comment 3 of PR60255 that
   can be used to check the run time behavior (and possibly check the
   vtab issue).
  
   Dominique
 
 
  --
  Andre Vehreschild * Email: vehre ad gmx dot de
 
 
 
 


-- 
Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen
Tel.: +49 241 9291018 * Email: ve...@gmx.de 


RE: [PATCH] Fix PR 61225

2014-12-09 Thread Zhenqiang Chen

 -Original Message-
 From: Jeff Law [mailto:l...@redhat.com]
 Sent: Tuesday, December 09, 2014 5:29 AM
 To: Zhenqiang Chen
 Cc: Steven Bosscher; gcc-patches@gcc.gnu.org; Jakub Jelinek
 Subject: Re: [PATCH] Fix PR 61225
 
 On 12/04/14 01:43, Zhenqiang Chen wrote:
   
 Part of PR rtl-optimization/61225
 * combine.c (refer_same_reg_p): New function.
 (combine_instructions): Handle I1 - I2 - I3; I2 -
insn.
 (try_combine): Add one more parameter TO_COMBINED_INSN,
which
  is
 used to create a new insn parallel (TO_COMBINED_INSN,
I3).
   
   testsuite/ChangeLog:
   2014-08-04  Zhenqiang Chenzhenqiang.c...@linaro.org
   
 * gcc.target/i386/pr61225.c: New test.
 THanks for the updates and clarifications.  Just a few minor things and
while
 it's a bit of a hack, I'll approve:
 
 
 
  +
  +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
  +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
  +   except A and B.  */
  +
  +static bool
  +refer_same_reg_p (rtx_insn *a, rtx_insn *b)
  +{
  +  rtx seta = single_set (a);
  +  rtx setb = single_set (b);
  +
  +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
  +  || !seta
  +  || !setb)
  +return false;
  
  +  if (GET_CODE (SET_SRC (seta)) != COMPARE
  +  || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
  +  || !REG_P (XEXP (SET_SRC (seta), 0))
  +  || XEXP (SET_SRC (seta), 1) != const0_rtx
  +  || !REG_P (SET_SRC (setb))
  +  || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
  +return false;
 Do you need to verify SETA and SETB satisfy single_set?  Or has that
 already been done elsewhere?

A is NEXT_INSN (insn)
B is prev_nonnote_nondebug_insn (insn),

For I1 - I2 - B; I2 - A;
LOG_LINK can make sure I1 and I2 are single_set, but not A and B. And I did
found codes in function try_combine, which can make sure B (or I3) is
single_set. 

So I think the check can skip failed cases at early stage.
 
 The name refer_same_reg_p seems wrong -- your function is verifying the
 underlying RTL store as well as the existence of a a dependency between
 the insns.  Can you try to come up with a better name?

Change it to can_reuse_cc_set_p.

 Please use CONST0_RTX (mode)  IIRC that'll allow this to work regardless
 of the size of the modes relative to the host word size.
 
Updated. 
 
  +
  +  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb)))  2)
  +{
  +  df_ref use;
  +  rtx insn;
  +  unsigned int i = REGNO (SET_SRC (setb));
  +
  +  for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG
(use))
  +{
  + insn = DF_REF_INSN (use);
  + if (insn != a  insn != b  !(NOTE_P (insn) || DEBUG_INSN_P
 (insn)))
  +   return false;
  +   }
  +}
  +
  +  return true;
  +}
 Is this fragment really needed?  Does it ever trigger?  I'd think that
 for  2 uses punting would be fine.  Do we really commonly have cases
 with  2 uses, but where they're all in SETA and SETB?

The check is to make sure the correctness.  Here is a case,

int 
f1 (int *x)
{
  int t = --*x;
  if (!t)
foo (x);
  return t;
}

  _4 = *x_3(D);
  _5 = _4 + -1;
  *x_3(D) = _5;
  # DEBUG t = _5
  if (_5 == 0)
   ...
  bb 4:
  return _5;

_5 is used in return _5. So we can not remove _5 = _4 + -1.
 
}
}
 
  + /* Try to combine a compare insn that sets CC
  +with a preceding insn that can set CC, and maybe with its
  +logical predecessor as well.
  +We need this special code because data flow connections
  +do not get entered in LOG_LINKS.  */
  + if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
  +  refer_same_reg_p (insn, prev)
  +  max_combine = 4)
  +   {
  +   struct insn_link *next1;
  +   FOR_EACH_LOG_LINK (next1, prev)
  + {
  +   rtx_insn *link1 = next1-insn;
  +   if (NOTE_P (link1))
  + continue;
  +   /* I1 - I2 - I3; I2 - insn;
  +  output parallel (insn, I3).  */
  +   FOR_EACH_LOG_LINK (nextlinks, link1)
  + if ((next = try_combine (prev, link1,
  +  nextlinks-insn, NULL,
  +  new_direct_jump_p,
  +  last_combined_insn, insn)) !=
0)
  +
  +   {
  + delete_insn (insn);
  + insn = next;
  + statistics_counter_event (cfun, four-insn
combine,
 1);
  + goto retry;
  +   }
  +   /* I2 - I3; I2 - insn
  +  output next = parallel (insn, I3).  */
  +   if ((next = try_combine (prev, link1,
  +NULL, NULL,
  +new_direct_jump_p,
  + 

Re: [testsuite patch] avoid test when compile options is conflict with default mthumb

2014-12-09 Thread Mike Stump
On Dec 8, 2014, at 7:06 PM, Wang Deqiang wang.deqi...@linaro.org wrote:
 This is a ping for
 
 https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01049.html

Seems reasonable enough.  I was hoping the arm folks would chime in, we should 
have enough of them to review.  Let’s given them two more days, and if no 
comments, Ok.  If they comment then seems reasonable to let them have the final 
say.

Re: locales fixes

2014-12-09 Thread Jonathan Wakely

On 08/12/14 23:53 +0100, François Dumont wrote:
   After having installed all necessary locales on my system I end up 
with 4 failures. Here is a patch to fix them all.


Did you discover why only you are seeing failures?

The whole testsuite passes for me, with glibc 2.18, including the
tests you are fixing (i.e. I don't see any need for a fix).

The changes to the glibc versions tests are definitely not correct.


Re: [PATCH] Fix broadcast from scalar patterns (PR target/63594)

2014-12-09 Thread Uros Bizjak
On Tue, Dec 9, 2014 at 10:17 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Dec 09, 2014 at 09:49:17AM +0100, Uros Bizjak wrote:
   (define_insn mask_codeforavx512_vec_dup_gprmodemask_name
  -  [(set (match_operand:VI48_AVX512VL 0 register_operand =v)
  -   (vec_duplicate:VI48_AVX512VL
  - (match_operand:ssescalarmode 1 register_operand r)))]
  -  TARGET_AVX512F  (ssescalarmodemode != DImode || TARGET_64BIT)
  -{
  -  return vpbroadcastbcstscalarsuff\t{%1, 
  %0mask_operand2|%0mask_operand2, %1};
  -}
  -  [(set_attr type ssemov)
  -   (set_attr prefix evex)
  -   (set_attr mode sseinsnmode)])
  -
  -(define_insn mask_codeforavx512_vec_dup_memmodemask_name
  -  [(set (match_operand:V48_AVX512VL 0 register_operand =v)
  -   (vec_duplicate:V48_AVX512VL
  - (match_operand:ssescalarmode 1 nonimmediate_operand vm)))]
  +  [(set (match_operand:V48_AVX512VL 0 register_operand =v,v)
  +   (vec_duplicate:V48_AVX512VL
  + (match_operand:ssescalarmode 1 nonimmediate_operand 
  vm,r)))]
 TARGET_AVX512F
 vsseintprefixbroadcastbcstscalarsuff\t{%1, 
  %0mask_operand2|%0mask_operand2, %1}
 [(set_attr type ssemov)
  (set_attr prefix evex)
  -   (set_attr mode sseinsnmode)])
  +   (set_attr mode sseinsnmode)
  +   (set (attr enabled)
  + (if_then_else (eq_attr alternative 1)
  +   (symbol_ref GET_MODE_CLASS (ssescalarmodemode) == MODE_INT
  + (ssescalarmodemode != DImode || TARGET_64BIT))
  +   (const_int 1)))])

 I have no idea how to get rid of this enabled attribute, as it disables just
 one alternative.  Creating a special isa attribute for it looks less
 readable and much more expensive.  This enabled attribute is IMHO of the
 good kind, GET_MODE_CLASS (ssescalarmodemode) == MODE_INT and
 ssescalarmodemode != DImode are constants for the particular chosen mode,
 TARGET_64BIT doesn't change during compilation of a TU, and so the only
 variable is that it disables just one of the alternatives.

IMO, this one is not problematic.

  @@ -16759,7 +16744,10 @@ (define_split
 [(set (match_operand:AVX2_VEC_DUP_MODE 0 register_operand)
  (vec_duplicate:AVX2_VEC_DUP_MODE
(match_operand:ssescalarmode 1 register_operand)))]
  -  TARGET_AVX2  reload_completed  GENERAL_REG_P (operands[1])
  +  TARGET_AVX2
  +(!TARGET_AVX512VL
  +   || (!TARGET_AVX512BW  GET_MODE_SIZE (ssescalarmodemode)  2))
  +reload_completed  GENERAL_REG_P (operands[1])
 [(const_int 0)]

 We would like to avoid convoluted insn enable condition by moving the
 target delated complexity to the mode iterator, even if it requires
 additional single-use mode iterator. In the ideal case, the remaining
 target-dependant condition would represent the baseline target for an
 insn and all other target-related conditions would be inside mode
 iterator.

 This splitter condition can be replaced with mode iterator, but it results
 in big repetitions there (incremental diff below).  Or do you have something
 else in mind?

 --- gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
 +++ gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
 @@ -16740,14 +16740,21 @@
 (set_attr isa avx2,noavx2,avx2,noavx2)
 (set_attr mode sseinsnmode,V8SF,sseinsnmode,V8SF)])

 +;; Modes handled by AVX2 vec_dup splitter.  Assumes TARGET_AVX2,
 +;; if both -mavx512vl and -mavx512bw are used, disables all modes,
 +;; if just -mavx512vl, disables just V*SI.
 +(define_mode_iterator AVX2_VEC_DUP_MODE_SPLIT
 +  [(V32QI !TARGET_AVX512VL || !TARGET_AVX512BW)
 +   (V16QI !TARGET_AVX512VL || !TARGET_AVX512BW)
 +   (V16HI !TARGET_AVX512VL || !TARGET_AVX512BW)
 +   (V8HI !TARGET_AVX512VL || !TARGET_AVX512BW)
 +   (V8SI !TARGET_AVX512VL) (V4SI !TARGET_AVX512VL)])
 +
  (define_split
 -  [(set (match_operand:AVX2_VEC_DUP_MODE 0 register_operand)
 -   (vec_duplicate:AVX2_VEC_DUP_MODE
 +  [(set (match_operand:AVX2_VEC_DUP_MODE_SPLIT 0 register_operand)
 +   (vec_duplicate:AVX2_VEC_DUP_MODE_SPLIT
   (match_operand:ssescalarmode 1 register_operand)))]
 -  TARGET_AVX2
 -(!TARGET_AVX512VL
 -   || (!TARGET_AVX512BW  GET_MODE_SIZE (ssescalarmodemode)  2))
 -reload_completed  GENERAL_REG_P (operands[1])
 +  TARGET_AVX2  reload_completed  GENERAL_REG_P (operands[1])
[(const_int 0)]
  {
emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),

Ugh, tough choice.

Let's go with the original, but please add the comment, similar to the
one above. Maybe also use ssecalarmodemode == SImode instead of
GET_MODE_SIZE, it better matches the comment.

Thanks,
Uros.


Re: [PATCH, trivial] [AArch64] Remove declaration of removed function from aarch64-protos.h

2014-12-09 Thread Richard Earnshaw
On 09/12/14 08:59, Yangfei (Felix) wrote:
 The definition of function aarch64_function_profiler is removed since 
 GCC-4.9. 
 But the declaration is still there in aarch64-protos.h.  So remove it. 
 OK for the trunk? 
 
 
 Index: gcc/ChangeLog
 ===
 --- gcc/ChangeLog (revision 218464)
 +++ gcc/ChangeLog (working copy)
 @@ -1,3 +1,8 @@
 +2014-12-09  Felix Yang  felix.y...@huawei.com
 +
 + * config/aarch64/aarch64-protos.h (aarch64_function_profiler): Remove
 + declaration of removed function.
 +
  2014-12-07  Felix Yang  felix.y...@huawei.com
   Shanyao Chen  chenshan...@huawei.com
  
 Index: gcc/config/aarch64/aarch64-protos.h
 ===
 --- gcc/config/aarch64/aarch64-protos.h   (revision 218464)
 +++ gcc/config/aarch64/aarch64-protos.h   (working copy)
 @@ -247,7 +247,6 @@ void aarch64_expand_epilogue (bool);
  void aarch64_expand_mov_immediate (rtx, rtx);
  void aarch64_expand_prologue (void);
  void aarch64_expand_vector_init (rtx, rtx);
 -void aarch64_function_profiler (FILE *, int);
  void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
  const_tree, unsigned);
  void aarch64_init_expanders (void);
 

OK.

R.



[PATCH][libstdc++][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model

2014-12-09 Thread Kyrill Tkachov

Hi all,

There are a few libstdc++ that don't fit into memory when the 
-mcmodel=tiny option is used.
For example 25_algorithms/random_shuffle/1.cc on aarch64 is a 5.5M 
binary on aarch64 with -mcmodel=small.


In the gcc and g++ testsuite we already catch such cases and mark them 
as UNSUPPORTED. In libstdc++.exp there is no such functionality.
This patch adds that check. I've implemented the v3_check_unsupported_p 
predicate the same way as it's implemented in the gcc testsuite.
I'm not very happy that it had to be copied, but I couldn't find a way 
to include the gcc definition sanely.


With this patch 8 tests that were previously FAILing on 
aarch64-none-elf/-mcmodel=tiny due to relocation truncation errors are 
now marked as UNSUPPORTED.


A test run on x86 didn't show any bad behaviour appearing.

Is this a sane approach to what I'm trying to solve?

Thanks,
Kyrill

2014-12-03  Kyrylo Tkachov  kyrylo.tkac...@arm.com\

 * testsuite/lib/libstdc++.exp (${tool}_check_unsupported_p):
 New procedure.
 (v3_target_compile): Check if test is unsupported.
 (v3_target_compile_as_c): Likewise.diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 3d9913b..385ae07 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -343,6 +343,22 @@ proc libstdc++_exit { } {
 }
 }
 
+proc ${tool}_check_unsupported_p { output } {
+if [regexp (^|\n)\[^\n\]*: region \[^\n\]* is full $output] {
+	return memory full
+}
+if { [regexp (^|\n)\[^\n\]*: relocation truncated to fit $output]
+   [check_effective_target_tiny] } {
+return memory full
+ }
+
+if { [istarget spu-*-*]  \
+	 [string match *exceeds local store* $output] } {
+	return memory full
+}
+return 
+}
+
 # Callback from system dg-test.
 proc libstdc++-dg-test { prog do_what extra_tool_flags } {
 # Set up the compiler flags, based on what we're going to do.
@@ -455,6 +471,7 @@ proc v3_target_compile { source dest type options } {
 global cxxldflags
 global includes
 global STATIC_LIBCXXFLAGS
+global tool
 
 if { [target_info needs_status_wrapper] !=   [info exists gluefile] } {
 lappend options libs=${gluefile}
@@ -483,7 +500,14 @@ proc v3_target_compile { source dest type options } {
 lappend options compiler=$cxx_final
 lappend options timeout=[timeout_value]
 
-return [target_compile $source $dest $type $options]
+set comp_output [target_compile $source $dest $type $options]
+set unsupported_message [${tool}_check_unsupported_p $comp_output]
+
+if { $unsupported_message !=  } {
+  unsupported $dest: $unsupported_message
+  return 
+}
+return $comp_output
 }
 
 
@@ -498,6 +522,7 @@ proc v3_target_compile_as_c { source dest type options } {
 global cc
 global cxxflags
 global STATIC_LIBCXXFLAGS
+global tool
 
 if { [target_info needs_status_wrapper] !=   [info exists gluefile] } {
 lappend options libs=${gluefile}
@@ -551,7 +576,14 @@ proc v3_target_compile_as_c { source dest type options } {
 lappend options compiler=$cc_final
 lappend options timeout=[timeout_value]
 
-return [target_compile $source $dest $type $options]
+set comp_output [target_compile $source $dest $type $options]
+set unsupported_message [${tool}_check_unsupported_p $comp_output]
+
+if { $unsupported_message !=  } {
+  unsupported $dest: $unsupported_message
+  return 
+}
+return $comp_output
 }
 
 # Build the support objects linked in with the libstdc++ tests.  In

Re: Fix libgomp crash without TLS (PR42616)

2014-12-09 Thread Varvara Rainchik
Can we instead of adding new macroses in config/tls.m4 use something
like that in libgomp:

#if defined (HAVE_TLS)  defined (USE_EMUTLS)

(with GCC_CHECK_EMUTLS in libgomp/configure.ac)?

2014-12-08 19:28 GMT+03:00 Jakub Jelinek ja...@redhat.com:
 On Mon, Dec 08, 2014 at 07:01:09PM +0300, Varvara Rainchik wrote:
 Is it ok to add GCC_CHECK_EMUTLS test in libgomp/configure.ac or
 should I add in tls.m3 a similair test that would be used only in
 libgomp?

 I think config/tls.m4 would be a better place, but doing it in some way
 where the users of config/tls.m4 could actually by using different macros
 from there choose what they want (either check solely for real TLS, or
 only for emutls, or for both).  And libgomp/configure.ac would then choose
 it is happy with both.

 Jakub


[patch] libstdc++/64203 fix shared_mutex for bare-metal targets

2014-12-09 Thread Jonathan Wakely

Tested x86_64-linux, committed to trunk and the 4.9 branch.
commit 6e201632e6b99a8381ec973ae77a9b4e1db255d7
Author: Jonathan Wakely jwak...@redhat.com
Date:   Mon Dec 8 16:02:48 2014 +

	PR libstdc++/64203
	* include/std/shared_mutex: Fix preprocessor conditions.
	* testsuite/experimental/feat-cxx14.cc: Check conditions.

diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex
index 6405f10..c193eb2 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -36,10 +36,8 @@
 #else
 
 #include bits/c++config.h
-#if defined(_GLIBCXX_HAS_GTHREADS)  defined(_GLIBCXX_USE_C99_STDINT_TR1)
-# include mutex
-# include condition_variable
-#endif
+#include mutex
+#include condition_variable
 #include bits/functexcept.h
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -51,7 +49,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-#if defined(_GLIBCXX_HAS_GTHREADS)  defined(_GLIBCXX_USE_C99_STDINT_TR1)
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+#ifdef _GLIBCXX_HAS_GTHREADS
 
 #define __cpp_lib_shared_timed_mutex 201402
 
@@ -254,7 +253,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 }
   };
-#endif // _GLIBCXX_HAS_GTHREADS  _GLIBCXX_USE_C99_STDINT_TR1
+#endif // _GLIBCXX_HAS_GTHREADS
 
   /// shared_lock
   templatetypename _Mutex
@@ -393,6 +392,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 swap(shared_lock_Mutex __x, shared_lock_Mutex __y) noexcept
 { __x.swap(__y); }
 
+#endif // _GLIBCXX_USE_C99_STDINT_TR1
+
   // @} group mutexes
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
diff --git a/libstdc++-v3/testsuite/experimental/feat-cxx14.cc b/libstdc++-v3/testsuite/experimental/feat-cxx14.cc
index 1c590dd..2cc31ca 100644
--- a/libstdc++-v3/testsuite/experimental/feat-cxx14.cc
+++ b/libstdc++-v3/testsuite/experimental/feat-cxx14.cc
@@ -106,10 +106,12 @@
 #  error shared_mutex
 #endif
 
-#ifndef  __cpp_lib_shared_timed_mutex
-#  error __cpp_lib_shared_timed_mutex
-#elif  __cpp_lib_shared_timed_mutex != 201402
-#  error __cpp_lib_shared_timed_mutex != 201402
+#if defined(_GLIBCXX_HAS_GTHREADS)  defined(_GLIBCXX_USE_C99_STDINT_TR1)
+#  ifndef  __cpp_lib_shared_timed_mutex
+#error __cpp_lib_shared_timed_mutex
+#  elif  __cpp_lib_shared_timed_mutex != 201402
+#error __cpp_lib_shared_timed_mutex != 201402
+#  endif
 #endif
 
 #ifndef  __cpp_lib_is_final


[PATCH] Fix PR64199

2014-12-09 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-12-09  Richard Biener  rguent...@suse.de

PR middle-end/64199
* fold-const.c (fold_binary_loc): Use TREE_OVERFLOW_P.

* gcc.dg/torture/pr64199.c: New testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 218479)
+++ gcc/fold-const.c(working copy)
@@ -10398,8 +10415,8 @@ fold_binary_loc (location_t loc,
 
  /* Don't introduce overflows through reassociation.  */
  if (!any_overflows
-  ((lit0  TREE_OVERFLOW (lit0))
- || (minus_lit0  TREE_OVERFLOW (minus_lit0
+  ((lit0  TREE_OVERFLOW_P (lit0))
+ || (minus_lit0  TREE_OVERFLOW_P (minus_lit0
return NULL_TREE;
 
  if (minus_lit0)
Index: gcc/testsuite/gcc.dg/torture/pr64199.c
===
--- gcc/testsuite/gcc.dg/torture/pr64199.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr64199.c  (working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-additional-options -ffast-math -frounding-math } */
+
+float
+foo (void)
+{
+  return 1.1f + 2.2f + 2.2f;
+}


Re: Fix libgomp crash without TLS (PR42616)

2014-12-09 Thread Jakub Jelinek
On Tue, Dec 09, 2014 at 02:49:44PM +0300, Varvara Rainchik wrote:
 Can we instead of adding new macroses in config/tls.m4 use something
 like that in libgomp:
 
 #if defined (HAVE_TLS)  defined (USE_EMUTLS)
 
 (with GCC_CHECK_EMUTLS in libgomp/configure.ac)?

That would be fine too.

Jakub


Re: [PATCH 3/n] OpenMP 4.0 offloading infrastructure: offload tables

2014-12-09 Thread Ilya Verbin
On 04 Dec 20:52, Jakub Jelinek wrote:
 On Thu, Dec 04, 2014 at 10:35:19PM +0300, Ilya Verbin wrote:
  This issue can be resolved by forcing output of such variables.
  Is this fix ok?  Should I add a testcase?
 
 Yes, with proper ChangeLog.  Yes.

Here is updated patch, ok to commit?

However, I don't see -flto option in the build log.  It seems that
check_effective_target_lto isn't working inside libgomp/ directory.
Maybe because ENABLE_LTO is defined only in gcc/configure.ac ?


gcc/
* varpool.c (varpool_node::get_create): Force output of vars with
omp declare target attribute.
libgomp/
* testsuite/libgomp.c/target-9.c: New test.


diff --git a/gcc/varpool.c b/gcc/varpool.c
index 0526b7f..db28c2a 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -175,6 +175,7 @@ varpool_node::get_create (tree decl)
   g-have_offload = true;
   if (!in_lto_p)
vec_safe_push (offload_vars, decl);
+  node-force_output = 1;
 #endif
 }
 
diff --git a/libgomp/testsuite/libgomp.c/target-9.c 
b/libgomp/testsuite/libgomp.c/target-9.c
new file mode 100644
index 000..00fe0cb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-9.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options -O1 } */
+/* { dg-additional-options -flto { target lto } } */
+
+#include stdlib.h
+
+#define N 123456
+
+#pragma omp declare target
+int X, Y;
+#pragma omp end declare target
+
+void
+foo ()
+{
+  #pragma omp target map(alloc: X)
+X = N;
+}
+
+int
+main ()
+{
+  int res;
+
+  foo ();
+
+  #pragma omp target map(alloc: X, Y) map(from: res)
+{
+  Y = N;
+  res = X + Y;
+}
+
+  if (res != N + N)
+abort ();
+
+  return 0;
+}


  -- Ilya


[Android] Stop unconditional disabling of ifuncs for Bionic #2

2014-12-09 Thread Alexander Ivchenko
Hi,

Whether the *linux* target supports ifuncs or not is defined here:

linux_has_ifunc_p (void)
{
  return OPTION_BIONIC ? false : HAVE_GNU_INDIRECT_FUNCTION;
}

Bionic right now supports indirect functions, but there is no way to
notify the compiler about that (For Android OPTION_BIONIC is always on
and so configure time check with --enable-gnu-indirect-function does
not work)

The following patch makes the use of the default version of
TARGET_HAS_IFUNC_P for *linux*, so we can decide whether the Android
target supports indirect functions or not on configure time.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f22bba8..d4d09d0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-08  Alexander Ivchenko  alexander.ivche...@intel.com
+
+   * config/linux.c (linux_has_ifunc_p): Remove.
+   * config/linux.h (TARGET_HAS_IFUNC_P): Use default version.
+
 2014-12-08  Ilya Tocar  ilya.to...@intel.com

* config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi.
diff --git a/gcc/config/linux.c b/gcc/config/linux.c
index 6242e11..15df213 100644
--- a/gcc/config/linux.c
+++ b/gcc/config/linux.c
@@ -23,14 +23,6 @@ along with GCC; see the file COPYING3.  If not see
 #include tm.h
 #include linux-protos.h

-/* Android does not support GNU indirect functions.  */
-
-bool
-linux_has_ifunc_p (void)
-{
-  return OPTION_BIONIC ? false : HAVE_GNU_INDIRECT_FUNCTION;
-}
-
 bool
 linux_libc_has_function (enum function_class fn_class)
 {
diff --git a/gcc/config/linux.h b/gcc/config/linux.h
index d38ef81..6ccacff 100644
--- a/gcc/config/linux.h
+++ b/gcc/config/linux.h
@@ -117,10 +117,6 @@ see the files COPYING3 and COPYING.RUNTIME
respectively.  If not, see

 #else /* !uClinux, i.e., normal Linux */

-/* IFUNCs are supported by glibc, but not by uClibc or Bionic.  */
-# undef TARGET_HAS_IFUNC_P
-# define TARGET_HAS_IFUNC_P linux_has_ifunc_p
-
 /* Determine what functions are present at the runtime;
this includes full c99 runtime and sincos.  */
 # undef TARGET_LIBC_HAS_FUNCTION

Is the patch ok? Bootstrapped/regtested on x86_64-unknown-linux-gnu +
checked that the behavior of i686-linux-android is correct.

Alexander


[AArch64, NEON] Improve vmulX intrinsics

2014-12-09 Thread Jiangjiji

Hi,
 This patch converts more intrinsics to use builtin functions instead of the
previous inline assembly syntax.
 Passed the glorious testsuite of Christophe Lyon.

 Three testcases are added for the testing of intriniscs which are not
covered by the testsuite:
 gcc.target/aarch64/vmull_high.c
 gcc.target/aarch64/vmull_high_lane.c
 gcc.target/aarch64/vmull_high_n.c

 Regtested with aarch64-linux-gnu on QEMU.
 This patch has no regressions for aarch64_be-linux-gnu big-endian target 
too.
 OK for the trunk?



Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 218464)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,38 @@
+2014-12-09  Felix Yang  felix.y...@huawei.com
+Jiji Jiang  jiangj...@huawei.com
+
+   * config/aarch64/aarch64-simd.md (aarch64_mul_nmode,
+   aarch64_sumull_nmode, aarch64_sumullmode,
+   aarch64_simd_sumull2_nmode, aarch64_sumull2_nmode,
+   aarch64_sumull_lanemode, aarch64_sumull2_lanemode_internal,
+   aarch64_sumull_laneqmode, aarch64_sumull2_laneqmode_internal,
+   aarch64_smull2_lanemode, aarch64_umull2_lanemode,
+   aarch64_smull2_laneqmode, aarch64_umull2_laneqmode,
+   aarch64_fmulxmode, aarch64_fmulxmode, aarch64_fmulx_lanemode,
+   aarch64_pmull2v16qi, aarch64_pmullv8qi): New patterns.
+   * config/aarch64/aarch64-simd-builtins.def (vec_widen_smult_hi_,
+   vec_widen_umult_hi_, umull, smull, smull_n, umull_n, mul_n, smull2_n,
+   umull2_n, smull_lane, umull_lane, smull_laneq, umull_laneq, pmull,
+   umull2_lane, smull2_laneq, umull2_laneq, fmulx, fmulx_lane, pmull2,
+   smull2_lane): New builtins.
+   * config/aarch64/arm_neon.h (vmul_n_f32, vmul_n_s16, vmul_n_s32,
+   vmul_n_u16, vmul_n_u32, vmulq_n_f32, vmulq_n_f64, vmulq_n_s16,
+   vmulq_n_s32, vmulq_n_u16, vmulq_n_u32, vmull_high_lane_s16,
+   vmull_high_lane_s32, vmull_high_lane_u16, vmull_high_lane_u32,
+   vmull_high_laneq_s16, vmull_high_laneq_s32, vmull_high_laneq_u16,
+   vmull_high_laneq_u32, vmull_high_n_s16, vmull_high_n_s32,
+   vmull_high_n_u16, vmull_high_n_u32, vmull_high_p8, vmull_high_s8,
+   vmull_high_s16, vmull_high_s32, vmull_high_u8, vmull_high_u16,
+   vmull_high_u32, vmull_lane_s16, vmull_lane_s32, vmull_lane_u16,
+   vmull_lane_u32, vmull_laneq_s16, vmull_laneq_s32, vmull_laneq_u16,
+   vmull_laneq_u32, vmull_n_s16, vmull_n_s32, vmull_n_u16, vmull_n_u32,
+   vmull_p8, vmull_s8, vmull_s16, vmull_s32, vmull_u8, vmull_u16,
+   vmull_u32, vmulx_f32, vmulx_lane_f32, vmulxd_f64, vmulxq_f32,
+   vmulxq_f64, vmulxq_lane_f32, vmulxq_lane_f64, vmulxs_f32): Rewrite
+   using builtin functions.
+   * config/aarch64/iterators.md (UNSPEC_FMULX, UNSPEC_FMULX_LANE,
+   VDQF_Q): New unspec and int iterator.
+
 2014-12-07  Felix Yang  felix.y...@huawei.com
Shanyao Chen  chenshan...@huawei.com
 
Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c

===
--- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
(revision 0)
@@ -0,0 +1,111 @@
+#include arm_neon.h
+#include arm-neon-ref.h
+#include compute-ref-data.h
+
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfc48, 0xfcbf, 0xfd36, 0xfdad,
+0xfe24, 0xfe9b, 0xff12, 0xff89 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xf9a0, 0xfa28,
+0xfab0, 0xfb38 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xf7a2,
+0xf83b };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xa4b0, 0xa55a, 0xa604, 0xa6ae,
+ 0xa758, 0xa802, 0xa8ac, 0xa956 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xbaf73c, 0xbaf7f7,
+ 0xbaf8b2, 0xbaf96d };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xcbf4d8,
+ 0xcbf5a4};
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x6530, 0x659a, 0x6464, 0x64ce,
+ 0x6798, 0x6732, 0x66cc, 0x };
+
+#ifndef INSN_NAME
+#define INSN_NAME vmull_high
+#define TEST_MSG VMUL_HIGH
+#endif
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+#define DECL_VMUL(T, W, N)  \
+  DECL_VARIABLE(vector1, T, W, N);  \
+  DECL_VARIABLE(vector2, T, W, N); 
+
+  /* vector_res = OP(vector1, vector2), then store the result.  */
+#define TEST_VMULL_HIGH1(INSN, Q, T1, T2, W, N, W1, N1) \
+  VECT_VAR(vector_res, T1, W1, N1) =\
+INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), 

Re: match.pd tweaks for vectors and issues with HONOR_NANS

2014-12-09 Thread Richard Biener
On Mon, Dec 8, 2014 at 3:45 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 18 Nov 2014, Richard Biener wrote:

 I'll try to replace some more TYPE_MODE during stage3...


 Btw, a convenience would be to be able to write

 HONOR_NANS (type)

 thus effectively make HONOR_* inline functions with a machine_mode
 and a type overload (and the type overload properly looking at
 element types).


 Making those functions inline is not easy, because real.h and tree.h don't
 include each other. Here is a version with the functions not inline. I was
 tempted to also overload on gcond const* (for the cases that call
 gimple_cond_lhs) but the arguments were always gimple and not gcond*, so I
 didn't.

 Passes bootstrap+testsuite on x86_64-linux-gnu.

Ok.

Thanks,
Richard.

 2014-12-08  Marc Glisse  marc.gli...@inria.fr

 * real.h (HONOR_NANS): Replace macro with 3 overloaded declarations.
 * real.c: Include rtl.h and options.h.
 (HONOR_NANS): Define three overloads.
 * builtins.c (fold_builtin_classify, fold_builtin_unordered_cmp):
 Simplify argument of HONOR_NANS.
 * fold-const.c (combine_comparisons, fold_truth_not_expr,
 fold_cond_expr_with_comparison, merge_truthop_with_opposite_arm,
 fold_comparison, fold_binary_loc): Likewise.
 * ifcvt.c (noce_try_move, noce_try_minmax): Likewise.
 * ipa-inline-analysis.c (add_clause,
 set_cond_stmt_execution_predicate): Likewise.
 * match.pd: Likewise.
 * rtlanal.c (may_trap_p_1): Likewise.
 * simplify-rtx.c (simplify_const_relational_operation): Likewise.
 * tree-if-conv.c (parse_predicate): Likewise.
 * tree-ssa-ccp.c (valid_lattice_transition): Likewise.
 * tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
 * tree-ssa-phiopt.c (minmax_replacement, neg_replacement): Likewise.
 * tree-ssa-reassoc.c (eliminate_using_constants): Likewise.
 * tree-ssa-tail-merge.c (gimple_equal_p): Likewise.

 --
 Marc Glisse
 Index: builtins.c
 ===
 --- builtins.c  (revision 218467)
 +++ builtins.c  (working copy)
 @@ -9641,34 +9641,34 @@ fold_builtin_classify (location_t loc, t
integer_minus_one_node, integer_one_node);
 tmp = fold_build3_loc (loc, COND_EXPR, integer_type_node,
isinf_call, tmp,
integer_zero_node);
   }

 return tmp;
}

  case BUILT_IN_ISFINITE:
 -  if (!HONOR_NANS (TYPE_MODE (TREE_TYPE (arg)))
 +  if (!HONOR_NANS (arg)
!HONOR_INFINITIES (TYPE_MODE (TREE_TYPE (arg
 return omit_one_operand_loc (loc, type, integer_one_node, arg);

if (TREE_CODE (arg) == REAL_CST)
 {
   r = TREE_REAL_CST (arg);
   return real_isfinite (r) ? integer_one_node : integer_zero_node;
 }

return NULL_TREE;

  case BUILT_IN_ISNAN:
 -  if (!HONOR_NANS (TYPE_MODE (TREE_TYPE (arg
 +  if (!HONOR_NANS (arg))
 return omit_one_operand_loc (loc, type, integer_zero_node, arg);

if (TREE_CODE (arg) == REAL_CST)
 {
   r = TREE_REAL_CST (arg);
   return real_isnan (r) ? integer_one_node : integer_zero_node;
 }

arg = builtin_save_expr (arg);
return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
 @@ -9782,27 +9782,26 @@ fold_builtin_unordered_cmp (location_t l
else if (code0 == REAL_TYPE  code1 == INTEGER_TYPE)
  cmp_type = type0;
else if (code0 == INTEGER_TYPE  code1 == REAL_TYPE)
  cmp_type = type1;

arg0 = fold_convert_loc (loc, cmp_type, arg0);
arg1 = fold_convert_loc (loc, cmp_type, arg1);

if (unordered_code == UNORDERED_EXPR)
  {
 -  if (!HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0
 +  if (!HONOR_NANS (arg0))
 return omit_two_operands_loc (loc, type, integer_zero_node, arg0,
 arg1);
return fold_build2_loc (loc, UNORDERED_EXPR, type, arg0, arg1);
  }

 -  code = HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))) ? unordered_code
 -  : ordered_code;
 +  code = HONOR_NANS (arg0) ? unordered_code : ordered_code;
return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
   fold_build2_loc (loc, code, type, arg0, arg1));
  }

  /* Fold __builtin_{,s,u}{add,sub,mul}{,l,ll}_overflow, either into normal
 arithmetics if it can never overflow, or into internal functions that
 return both result of arithmetics and overflowed boolean flag in
 a complex integer result, or some other check for overflow.  */

  static tree
 Index: fold-const.c
 ===
 --- fold-const.c(revision 218467)
 +++ fold-const.c(working copy)
 @@ -2585,21 +2585,21 @@ compcode_to_comparison (enum comparison_
 and RCODE on the 

Re: [PATCH 2/3] Extended if-conversion

2014-12-09 Thread Yuri Rumyantsev
Richard,

Here is updated patch2 with the following changes:
1. Delete functions  phi_has_two_different_args and find_insertion_point.
2. Use only one function for extended predication -
predicate_extended_scalar_phi.
3. Save gsi before insertion of predicate computations for basic
blocks if it has 2 predecessors and
both incoming edges are critical or it gas more than 2 predecessors
and at least one incoming edge
is critical. This saved iterator can be used by extended phi predication.

Here is motivated test-case which explains this point.
Test-case is attached (t5.c) and it must be compiled with -O2
-ftree-loop-vectorize -fopenmp options.
The problem phi is in bb-7:

  bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
  {
bb 5:
xmax_edge_18 = xmax_edge_36 + 1;
if (xmax_17 == xmax_27)
  goto bb 7;
else
  goto bb 9;

  }
  bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
  {
bb 6:
if (xmax_17 == xmax_27)
  goto bb 7;
else
  goto bb 8;

  }
  bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
  {
bb 7:
# xmax_edge_30 = PHI xmax_edge_36(6), xmax_edge_18(5)
xmax_edge_19 = xmax_edge_39 + 1;
goto bb 11;

  }

Note that both incoming edges to bb_7 are critical. If we comment out
restoring gsi in predicate_all_scalar_phi:
#if 0
 if ((EDGE_COUNT (bb-preds) == 2  all_preds_critical_p (bb))
 || (EDGE_COUNT (bb-preds)  2  has_pred_critical_p (bb)))
   gsi = bb_insert_point (bb);
 else
#endif
   gsi = gsi_after_labels (bb);

we will get ICE:
t5.c: In function 'foo':
t5.c:9:6: error: definition in block 4 follows the use
 void foo (int n)
  ^
for SSA_NAME: _1 in statement:
_52 = _1  _3;
t5.c:9:6: internal compiler error: verify_ssa failed

smce predicate computations were inserted in bb_7.

ChangeLog is

2014-12-09  Yuri Rumyantsev  ysrum...@gmail.com

* tree-if-conv.c : Include hash-map.h.
(struct bb_predicate_s): Add new field to save copy of gimple
statement iterator.
(bb_insert_point): New function.
(set_bb_insert_point): New function.
(has_pred_critical_p): New function.
(if_convertible_bb_p): Allow bb has more than 2 predecessors if
AGGRESSIVE_IF_CONV is true.
(if_convertible_bb_p): Delete check that bb has at least one
non-critical incoming edge.
(is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
Allow interchange PHI arguments if EXTENDED is false.
Change check that block containing reduction statement candidate
is predecessor of phi-block since phi may have more than two arguments.
(predicate_scalar_phi): Add new arguments for call of
is_cond_scalar_reduction.
(get_predicate_for_edge): New function.
(struct phi_args_hash_traits): New type.
(phi_args_hash_traits::hash): New function.
(phi_args_hash_traits::equal_keys): New function.
(gen_phi_arg_condition): New function.
(predicate_extended_scalar_phi): New function.
(predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
to true if BB containing phi has more than 2 predecessors or both
incoming edges are critical. Invoke find_phi_replacement_condition and
predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
has 2 predecessors and both incoming edges are critical or it has more
than 2 predecessors and atleast one incoming edge is critical.
Use standard gsi_after_labels otherwise.
Invoke predicate_extended_scalar_phi if EXTENDED is true.
(insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
to save gsi before insertion of predicate computations. SEt-up it to
true for BB with 2 predecessors and critical incoming edges either
number of predecessors is geater 2 and at least one incoming edge is
critical.
Add check that non-predicated block may have statements to insert.
Insert predicate computation of BB just after label if
EXTENDED_PREDICATION is true.
(tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
is copy of inner or outer loop force_vectorize field.




2014-12-04 16:37 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I did simple change by saving gsi iterator for each bb that has
 critical edges by adding additional field to bb_predicate_s:

 typedef struct bb_predicate_s {

   /* The condition under which this basic block is executed.  */
   tree predicate;

   /* PREDICATE is gimplified, and the sequence of statements is
  recorded here, in order to avoid the duplication of computations
  that occur in previous conditions.  See PR44483.  */
   gimple_seq predicate_gimplified_stmts;

   /* Insertion point for blocks having incoming critical edges.  */
   gimple_stmt_iterator gsi;
 } *bb_predicate_p;

 and this iterator is saved in  insert_gimplified_predicates before
 insertion code for predicate computation. I checked that this fix
 works.

 Huh?  I still wonder what the issue is with inserting everything
 after the PHI we predicate.

 Well, your updated patch will come with testcases for the testsuite
 that will 

Re: [COMMITTED] [PING] [PATCH] [AArch64, NEON] More NEON intrinsics improvement

2014-12-09 Thread Christophe Lyon
On 9 December 2014 at 03:26, Yangfei (Felix) felix.y...@huawei.com wrote:
 On 5 December 2014 at 18:44, Tejas Belagod tejas.bela...@arm.com wrote:
 
 
  +__extension__ static __inline float32x2_t __attribute__
  +((__always_inline__))
  +vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c) {
  +  return __builtin_aarch64_fmav2sf (-__b, __c, __a); }
  +
  +__extension__ static __inline float32x4_t __attribute__
  +((__always_inline__))
  +vfmsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c) {
  +  return __builtin_aarch64_fmav4sf (-__b, __c, __a); }
  +
  +__extension__ static __inline float64x2_t __attribute__
  +((__always_inline__))
  +vfmsq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c) {
  +  return __builtin_aarch64_fmav2df (-__b, __c, __a); }
  +
  +
 
 
  Thanks, the patch looks good. Just one comment:
  You could also add
  float32x2_t vfms_n_f32(float32x2_t a, float32x2_t b, float32_t n) and
  its Q-variant.

 You can, if you wish,  deal with Tejas' comment with a follow on patch
 rather than re-spinning this one.   Provided this patch has no
 regressions on a big endian and a little endian test run then you can commit 
 it.
 Thanks
 /Marcus


 No regressions for aarch64_be-linux-gnu target.  Committed as r218484.
 Will come up with a new patch to deal with Tejas' comment.  Thanks.

My validations of trunk show that your new tests are incorrect: none
of them compiles because the hfloat64_t type isn't defined.

Also, keep in mind that the tests in this directory are executed by
the aarch32 target too.

Christophe


Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Richard Biener
On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey sell...@mips.com wrote:
 On Sat, 2014-12-06 at 19:21 +, Sebastian Pop wrote:

  I think it does not make sense to duplicate paths at -Os: I disabled the 
  FSM
  jump-threading when optimizing for size like this.
 
  diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
  index 29b20c8..ce70311 100644
  --- a/gcc/tree-ssa-threadedge.c
  +++ b/gcc/tree-ssa-threadedge.c
  @@ -1335,8 +1335,9 @@ thread_through_normal_block (edge e,
return 1;
  }
 
 if (!flag_expensive_optimizations
  + || optimize_function_for_size_p (cfun)
|| TREE_CODE (cond) != SSA_NAME
|| e-dest-loop_father != e-src-loop_father
|| loop_depth (e-dest-loop_father) == 0)
  return 0;
 
  I will regstrap and commit the attached patch.

 Bootstrapped and regression tested on x86_64-linux.
 Committed r218451.

 Sebastian

 Thanks for getting all this checked in Sebastian, I have tested it on
 coremark and I am getting the speed up that I expect.  But I am a little
 confused about turning off jump threading.  I am getting the
 optimization on coremark with -O3 and that is great and if I use '-O3
 -fno-expensive-optimizations' then I don't see this part of the jump
 threading, also great.  But I was surprised that if I just used '-O3
 -fno-thread-jumps' then I still see this optimization.  Is that
 expected?  Should this test also check flag_thread_jumps?  Or should
 that be getting checked somewhere else?

-fthread-jumps is an RTL optimization flag and ignored on GIMPLE.

Richard.

 Steve Ellcey




Re: [RFC, PATCH, fortran] PR fortran/60255 Deferred character length

2014-12-09 Thread Dominique d'Humières
Dear Andre,

The patch causes an ICE for the test gfortran.dg/unlimited_polymorphic_1.f03:

f951: internal compiler error: in gfc_add_component_ref, at fortran/class.c:236

f951: internal compiler error: Abort trap: 6
gfc: internal compiler error: Abort trap: 6 (program f951)
Abort

Reduced test for which the ICE is triggered by ‘len(w)'

MODULE m

contains
  subroutine bar (arg, res)
class(*) :: arg
character(100) :: res
select type (w = arg)
  type is (character(*))
write (res, '(I2)') len(w)
end select
  end subroutine

END MODULE

Note that with your patch at 
https://gcc.gnu.org/ml/fortran/2014-08/msg00022.html, I get the same ICE for 
the Mikael’s test at https://gcc.gnu.org/ml/fortran/2014-08/msg00055.html 
(before your patch for pr60255, it used to give a wrong length: 80 instead of 
20 AFAICR).

Note that the assert at fortran/class.c:236 is also triggered for pr61115.

Thanks for working on these issues,

Dominique

 On 8 December 2014 at 18:38, Andre Vehreschild ve...@gmx.de wrote:
 Hi all,
 
 please find attached a more elaborate patch for pr60255. I totally agree that
 my first attempt was just scratching the surface of the work needed.
 
 This patch also is *not* complete, but because I am really new to gfortran
 patching, I don't want to present a final patch only to learn then, that I 
 have
 violated design rules, common practice or the like. Therefore please comment
 and direct me to any sources/ideas to improve the patch.
 
 Topic:
 The pr 60255 is about assigning a char array to an unlimited polymorphic
 entity. In the comments the concern about the lost length information is
 raised. The patch adds a _len component to the unlimited polymorphic entity
 (after _data and _vtab) and adds an assignment of the string length to _len
 when a string is pointer assigned to the unlimited poly entity. Furthermore 
 is
 the intrinsic len(unlimited poly pointing to a string) resolved to give the
 _len component.
 
 Yet missing:
 - assign _len component back to deferred char array length component
 - transport length along chains of unlimited poly entities, i.e., a = b; c 
 =
  a where all objects are unlimited poly and b is a string.
 - allocate() in this context
 
 Patch dependencies:
 none
 
 Comments, concerns, candy welcome!
 
 Regards,
Andre




Re: [PATCH] Do not download packages for graphite loop optimizations by default when using ./contrib/download_prerequisites

2014-12-09 Thread Richard Biener
On Tue, Dec 9, 2014 at 6:36 AM, Chung-Ju Wu jasonw...@gmail.com wrote:
 Hi, all,

 In the discussion thread last year:
   https://gcc.gnu.org/ml/gcc-patches/2013-05/msg01334.html

 I extended the script ./contrib/download_prerequisites so that it can
 download isl and cloog packages for graphite loop optimizations.
 The patch was proposed to use GRAPHITE_LOOP_OPT=no by default.
 However, the change I committed into trunk is setting GRAPHITE_LOOP_OPT=yes:
   https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=199297

 I am sorry about my carelessness and I would like to propose a new patch
 to fix it.  The plaintext ChangeLog and patch are as follow:

I'd say keep it as =yes (now that we only need ISL) and adjust the comment
instead.

Richard.


 Index: contrib/ChangeLog
 ===
 --- contrib/ChangeLog   (revision 218505)
 +++ contrib/ChangeLog   (working copy)
 @@ -1,3 +1,7 @@
 +2014-12-09  Chung-Ju Wu  jasonw...@gmail.com
 +
 +   * download_prerequisites: Set GRAPHITE_LOOP_OPT=no by default.
 +
  2014-12-04  Thomas Preud'homme  thomas.preudho...@arm.com

 * check_GNU_style.sh: Warn for incorrect number of spaces in function


 Index: contrib/download_prerequisites
 ===
 --- contrib/download_prerequisites  (revision 218505)
 +++ contrib/download_prerequisites  (working copy)
 @@ -22,7 +22,7 @@
  # If you want to build GCC with the Graphite loop optimizations,
  # set GRAPHITE_LOOP_OPT=yes to download optional prerequisties
  # ISL Library and CLooG.
 -GRAPHITE_LOOP_OPT=yes
 +GRAPHITE_LOOP_OPT=no

  # Necessary to build GCC.
  MPFR=mpfr-2.4.2


 Is this patch OK for trunk?

 Or we can still keep it yes since it has been using GRAPHITE_LOOP_OPT=yes
 for a long time.  In that case, I will help to update the comment accordingly.

 Any comment? :)


 Best regards,
 jasonwucj


Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Evgeny Stupachenko
The case comes from spec2006 403.gcc (or old GCC itself).

  for (i = 0; i  FIRST_PSEUDO_REGISTER; ++i)
{
  vd-e[i].mode = VOIDmode;
  vd-e[i].oldest_regno = i;
  vd-e[i].next_regno = INVALID_REGNUM;
}

It is vectorized and only then completely peeled.
Only after peeling all stored values become constant.

Currently expand_vec_perm_pblendv works as following:
Let d.target, d.op0, dop1 be permutation parameters.

First we permute an operand (d.op1 or d.op0) and then blend it with
other argument:

d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, d.target)
(if d.op0 equal to d.target this is buggy)

Patch make it more accurate:
tmp = gen_reg_rtx (vmode)
tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, tmp)
(Here d.op0 can be equal to d.target)

Below is rtl dump of buggy case:

(183t.optimized)
...
vect_shuffle3_low_470 = VEC_PERM_EXPR { 0, 0, 0, 0 }, { 32, 33, 34,
35 }, { 0, 4, 0, 1 };
vect_shuffle3_high_469 = VEC_PERM_EXPR vect_shuffle3_low_470, {
4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 };
...
(184r.expand)
...
(insn 205 204 206 (set (reg:V4SI 768)
(const_vector:V4SI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
])) ../regrename.c:1171 -1
 (nil))

(insn 206 205 208 (set (reg:V4SI 769)
(mem/u/c:V4SI (symbol_ref/u:DI (*.LC28) [flags 0x2]) [3  S16
A128])) ../regrename.c:1171 -1
 (expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 32 [0x20])
(const_int 33 [0x21])
(const_int 34 [0x22])
(const_int 35 [0x23])
])
(nil)))

(insn 208 206 207 (set (reg:V4SI 770)
(vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
(reg:V4SI 769))
(parallel [
(const_int 0 [0])
(const_int 4 [0x4])
(const_int 1 [0x1])
(const_int 5 [0x5])
]))) ../regrename.c:1171 -1
 (nil))

(insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
(vec_select:V4SI (reg:V4SI 770)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 0 [0])
(const_int 2 [0x2])
]))) ../regrename.c:1171 -1
 (nil))

(insn 209 207 210 (set (reg:V4SI 771)
(const_vector:V4SI [
(const_int -1 [0x])
(const_int -1 [0x])
(const_int -1 [0x])
(const_int -1 [0x])
])) ../regrename.c:1171 -1
 (nil))

(insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
(vec_select:V4SI (reg:V4SI 771)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 0 [0])
(const_int 3 [0x3])
]))) ../regrename.c:1171 -1
 (nil))
(insn 211 210 212 (set (reg:V8HI 772)
(vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
(subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
(const_int 48 [0x30]))) ../regrename.c:1171 -1
 (nil))
...

On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak ubiz...@gmail.com wrote:

 The patch fix pblendv expand.
 The bug was uncovered when permutation operands are constants.
 In this case we init target register for expand_vec_perm_1 with
 constant and then rewrite the target with constant for
 expand_vec_perm_pblend.

 The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
 -flto -march=corei7.

 Bootstrap and make check passed.

 Is it ok?

 Please add a testcase.

 Also, it surprises me that we enter expand_vec_perm_pblendv with
 uninitialized (?) target. Does your patch only papers over a real
 problem up the call chain (hard to say without a testcase)?

 Uros.



 Evgeny

 2014-12-09  Evgeny Stupachenko  evstu...@gmail.com

 gcc/
 * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
 expand_vec_perm_1 target.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index eafc15a..5a914ad 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d 
 *d)
  dcopy.op0 = dcopy.op1 = d-op1;
else
  dcopy.op0 = dcopy.op1 = d-op0;
 +  dcopy.target = gen_reg_rtx (vmode);
dcopy.one_operand_p = true;

for (i = 0; i  nelt; ++i)


Re: [AArch64, NEON] Improve vmulX intrinsics

2014-12-09 Thread Christophe Lyon
On 9 December 2014 at 13:52, Jiangjiji jiangj...@huawei.com wrote:
 Hi,
  This patch converts more intrinsics to use builtin functions instead of
 the
 previous inline assembly syntax.
  Passed the glorious testsuite of Christophe Lyon.

  Three testcases are added for the testing of intriniscs which are not
 covered by the testsuite:
  gcc.target/aarch64/vmull_high.c
  gcc.target/aarch64/vmull_high_lane.c
  gcc.target/aarch64/vmull_high_n.c


As I said here:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01934.html
I am in tre process of converting my existing testsuite to GCC/Dejagnu.
Please do not duplicate work.


  Regtested with aarch64-linux-gnu on QEMU.
  This patch has no regressions for aarch64_be-linux-gnu big-endian
 target too.
  OK for the trunk?



 Index: gcc/ChangeLog
 ===
 --- gcc/ChangeLog   (revision 218464)
 +++ gcc/ChangeLog   (working copy)
 @@ -1,3 +1,38 @@
 +2014-12-09  Felix Yang  felix.y...@huawei.com
 +Jiji Jiang  jiangj...@huawei.com
 +
 +   * config/aarch64/aarch64-simd.md (aarch64_mul_nmode,
 +   aarch64_sumull_nmode, aarch64_sumullmode,
 +   aarch64_simd_sumull2_nmode, aarch64_sumull2_nmode,
 +   aarch64_sumull_lanemode, aarch64_sumull2_lanemode_internal,
 +   aarch64_sumull_laneqmode,
 aarch64_sumull2_laneqmode_internal,
 +   aarch64_smull2_lanemode, aarch64_umull2_lanemode,
 +   aarch64_smull2_laneqmode, aarch64_umull2_laneqmode,
 +   aarch64_fmulxmode, aarch64_fmulxmode, aarch64_fmulx_lanemode,
 +   aarch64_pmull2v16qi, aarch64_pmullv8qi): New patterns.
 +   * config/aarch64/aarch64-simd-builtins.def (vec_widen_smult_hi_,
 +   vec_widen_umult_hi_, umull, smull, smull_n, umull_n, mul_n,
 smull2_n,
 +   umull2_n, smull_lane, umull_lane, smull_laneq, umull_laneq, pmull,
 +   umull2_lane, smull2_laneq, umull2_laneq, fmulx, fmulx_lane, pmull2,
 +   smull2_lane): New builtins.
 +   * config/aarch64/arm_neon.h (vmul_n_f32, vmul_n_s16, vmul_n_s32,
 +   vmul_n_u16, vmul_n_u32, vmulq_n_f32, vmulq_n_f64, vmulq_n_s16,
 +   vmulq_n_s32, vmulq_n_u16, vmulq_n_u32, vmull_high_lane_s16,
 +   vmull_high_lane_s32, vmull_high_lane_u16, vmull_high_lane_u32,
 +   vmull_high_laneq_s16, vmull_high_laneq_s32, vmull_high_laneq_u16,
 +   vmull_high_laneq_u32, vmull_high_n_s16, vmull_high_n_s32,
 +   vmull_high_n_u16, vmull_high_n_u32, vmull_high_p8, vmull_high_s8,
 +   vmull_high_s16, vmull_high_s32, vmull_high_u8, vmull_high_u16,
 +   vmull_high_u32, vmull_lane_s16, vmull_lane_s32, vmull_lane_u16,
 +   vmull_lane_u32, vmull_laneq_s16, vmull_laneq_s32, vmull_laneq_u16,
 +   vmull_laneq_u32, vmull_n_s16, vmull_n_s32, vmull_n_u16, vmull_n_u32,
 +   vmull_p8, vmull_s8, vmull_s16, vmull_s32, vmull_u8, vmull_u16,
 +   vmull_u32, vmulx_f32, vmulx_lane_f32, vmulxd_f64, vmulxq_f32,
 +   vmulxq_f64, vmulxq_lane_f32, vmulxq_lane_f64, vmulxs_f32): Rewrite
 +   using builtin functions.
 +   * config/aarch64/iterators.md (UNSPEC_FMULX, UNSPEC_FMULX_LANE,
 +   VDQF_Q): New unspec and int iterator.
 +
  2014-12-07  Felix Yang  felix.y...@huawei.com
 Shanyao Chen  chenshan...@huawei.com
  Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
 ===
 --- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
 (revision 0)
 +++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
 (revision 0)
 @@ -0,0 +1,111 @@
 +#include arm_neon.h
 +#include arm-neon-ref.h
 +#include compute-ref-data.h
 +
 +
 +/* Expected results.  */
 +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfc48, 0xfcbf, 0xfd36, 0xfdad,
 +0xfe24, 0xfe9b, 0xff12, 0xff89 };
 +VECT_VAR_DECL(expected,int,32,4) [] = { 0xf9a0, 0xfa28,
 +0xfab0, 0xfb38 };
 +VECT_VAR_DECL(expected,int,64,2) [] = { 0xf7a2,
 +0xf83b };
 +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xa4b0, 0xa55a, 0xa604, 0xa6ae,
 + 0xa758, 0xa802, 0xa8ac, 0xa956 };
 +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xbaf73c, 0xbaf7f7,
 + 0xbaf8b2, 0xbaf96d };
 +VECT_VAR_DECL(expected,uint,64,2) [] = { 0xcbf4d8,
 + 0xcbf5a4};
 +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x6530, 0x659a, 0x6464, 0x64ce,
 + 0x6798, 0x6732, 0x66cc, 0x };
 +
 +#ifndef INSN_NAME
 +#define INSN_NAME vmull_high
 +#define TEST_MSG VMUL_HIGH
 +#endif
 +
 +#define FNNAME1(NAME) exec_ ## NAME
 +#define FNNAME(NAME) FNNAME1(NAME)
 +
 +void FNNAME (INSN_NAME) (void)
 +{
 +#define DECL_VMUL(T, W, N)  \
 +  DECL_VARIABLE(vector1, T, W, N);  

[PATCH] Fix PR64193

2014-12-09 Thread Richard Biener

The following patch restores proper CSE by FRE/PRE when doing
optimistic value-numbering of stores/loads.  In that case we
may end up value-numbering a virtual operand to the only executable
edge value of a PHI - but as the alias walker doesn't know this
it fails to handle the PHI which results in inconsistent virtual
operand being used for the entry in the hashtable.  Since
r173250 we refuse to value-number to sth non-VARYING in the 2nd
iteration if the 1st iteration computed VARYING (which is correct).

The fix is to tell the alias walker about this optimistic handling.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-12-09  Richard Biener  rguent...@suse.de

PR tree-optimization/64193
* tree-ssa-alias.c (walk_non_aliased_vuses): Add valueize parameter
and valueize the VUSE before looking up the def stmt.
* tree-ssa-alias.h (walk_non_aliased_vuses): Adjust prototype.
* tree-ssa-sccvn.c (vn_reference_lookup_pieces): Pass vn_valueize
to walk_non_aliased_vuses.
(vn_reference_lookup): Likewise.
* tree-ssa-dom.c (lookup_avail_expr): Pass NULL as valueize
callback to walk_non_aliased_vuses.

* gcc.dg/tree-ssa/ssa-fre-43.c: New testcase.

Index: gcc/tree-ssa-alias.c
===
--- gcc/tree-ssa-alias.c(revision 218479)
+++ gcc/tree-ssa-alias.c(working copy)
@@ -2632,12 +2632,18 @@ get_continuation_for_phi (gimple phi, ao
If TRANSLATE returns NULL the walk continues and TRANSLATE is supposed
to adjust REF and *DATA to make that valid.
 
+   VALUEIZE if non-NULL is called with the next VUSE that is considered
+   and return value is substituted for that.  This can be used to
+   implement optimistic value-numbering for example.  Note that the
+   VUSE argument is assumed to be valueized already.
+
TODO: Cache the vector of equivalent vuses per ref, vuse pair.  */
 
 void *
 walk_non_aliased_vuses (ao_ref *ref, tree vuse,
void *(*walker)(ao_ref *, tree, unsigned int, void *),
void *(*translate)(ao_ref *, tree, void *, bool),
+   tree (*valueize)(tree),
void *data)
 {
   bitmap visited = NULL;
@@ -2663,6 +2669,8 @@ walk_non_aliased_vuses (ao_ref *ref, tre
   else if (res != NULL)
break;
 
+  if (valueize)
+   vuse = valueize (vuse);
   def_stmt = SSA_NAME_DEF_STMT (vuse);
   if (gimple_nop_p (def_stmt))
break;
Index: gcc/tree-ssa-alias.h
===
--- gcc/tree-ssa-alias.h(revision 218479)
+++ gcc/tree-ssa-alias.h(working copy)
@@ -124,6 +124,7 @@ extern void *walk_non_aliased_vuses (ao_
 void *(*)(ao_ref *, tree,
   unsigned int, void *),
 void *(*)(ao_ref *, tree, void *, bool),
+tree (*)(tree),
 void *);
 extern unsigned int walk_aliased_vdefs (ao_ref *, tree,
bool (*)(ao_ref *, tree, void *),
Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 218479)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -2160,7 +2176,8 @@ vn_reference_lookup_pieces (tree vuse, a
*vnresult =
  (vn_reference_t)walk_non_aliased_vuses (r, vr1.vuse,
  vn_reference_lookup_2,
- vn_reference_lookup_3, vr1);
+ vn_reference_lookup_3,
+ vn_valueize, vr1);
   gcc_checking_assert (vr1.operands == shared_lookup_references);
 }
 
@@ -2212,7 +2229,8 @@ vn_reference_lookup (tree op, tree vuse,
   wvnresult =
(vn_reference_t)walk_non_aliased_vuses (r, vr1.vuse,
vn_reference_lookup_2,
-   vn_reference_lookup_3, vr1);
+   vn_reference_lookup_3,
+   vn_valueize, vr1);
   gcc_checking_assert (vr1.operands == shared_lookup_references);
   if (wvnresult)
{
Index: gcc/tree-ssa-dom.c
===
--- gcc/tree-ssa-dom.c  (revision 218479)
+++ gcc/tree-ssa-dom.c  (working copy)
@@ -2635,7 +2635,7 @@ lookup_avail_expr (gimple stmt, bool ins
 TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
 (ao_ref_init (ref, gimple_assign_rhs1 (stmt)), true)
 walk_non_aliased_vuses (ref, vuse2,
-  vuse_eq, NULL, vuse1) != NULL))
+   

[PATCH] Fix PR42108

2014-12-09 Thread Richard Biener

The following finally fixes PR42108 (well, hopefully...) by using
range-information on SSA names to allow the integer divisions introduced
by Fortran array lowering being hoisted out of loops, thus detecting
them as not trapping.

I chose to enhance tree_single_nonzero_warnv_p for this and adjusted
operation_could_trap_helper_p to use this helper.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-12-09  Richard Biener  rguent...@suse.de

PR tree-optimization/42108
* fold-const.c (tree_single_nonzero_warnv_p): Use range
information associated with SSA names.
* tree-eh.c (operation_could_trap_helper_p): Use
tree_single_nonzero_warnv_p to check for trapping non-fp
operation.
* tree-vrp.c (remove_range_assertions): Remove bogus assert.

* gfortran.dg/pr42108.f90: Adjust testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 218479)
+++ gcc/fold-const.c(working copy)
@@ -83,6 +83,8 @@ along with GCC; see the file COPYING3.
 #include cgraph.h
 #include generic-match.h
 #include optabs.h
+#include stringpool.h
+#include tree-ssanames.h
 
 /* Nonzero if we are folding constants inside an initializer; zero
otherwise.  */
@@ -15362,6 +15381,26 @@ tree_single_nonzero_warnv_p (tree t, boo
}
   break;
 
+case SSA_NAME:
+  if (INTEGRAL_TYPE_P (TREE_TYPE (t)))
+   {
+ wide_int minv, maxv;
+ enum value_range_type rtype = get_range_info (t, minv, maxv);
+ if (rtype == VR_RANGE)
+   {
+ if (wi::lt_p (maxv, 0, TYPE_SIGN (TREE_TYPE (t)))
+ || wi::gt_p (minv, 0, TYPE_SIGN (TREE_TYPE (t
+   return true;
+   }
+ else if (rtype == VR_ANTI_RANGE)
+   {
+ if (wi::le_p (minv, 0, TYPE_SIGN (TREE_TYPE (t)))
+  wi::ge_p (maxv, 0, TYPE_SIGN (TREE_TYPE (t
+   return true;
+   }
+   }
+  break;
+
 default:
   break;
 }
Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c   (revision 218479)
+++ gcc/tree-eh.c   (working copy)
@@ -2440,13 +2440,16 @@ operation_could_trap_helper_p (enum tree
 case ROUND_MOD_EXPR:
 case TRUNC_MOD_EXPR:
 case RDIV_EXPR:
-  if (honor_snans || honor_trapv)
-   return true;
-  if (fp_operation)
-   return flag_trapping_math;
-  if (!TREE_CONSTANT (divisor) || integer_zerop (divisor))
-return true;
-  return false;
+  {
+   if (honor_snans || honor_trapv)
+ return true;
+   if (fp_operation)
+ return flag_trapping_math;
+   bool sop;
+   if (!tree_single_nonzero_warnv_p (divisor, sop))
+ return true;
+   return false;
+  }
 
 case LT_EXPR:
 case LE_EXPR:
Index: gcc/testsuite/gfortran.dg/pr42108.f90
===
--- gcc/testsuite/gfortran.dg/pr42108.f90   (revision 218479)
+++ gcc/testsuite/gfortran.dg/pr42108.f90   (working copy)
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options -O2 -fdump-tree-fre1 }
+! { dg-options -O2 -fdump-tree-fre1 -fdump-tree-lim1-details }
 
 subroutine  eval(foo1,foo2,foo3,foo4,x,n,nnd)
   implicit real*8 (a-h,o-z)
@@ -21,7 +21,10 @@ subroutine  eval(foo1,foo2,foo3,foo4,x,n
   end do
 end subroutine eval
 
-! There should be only one load from n left
+! There should be only one load from n left and the division should
+! be hoisted out of the loop
 
 ! { dg-final { scan-tree-dump-times \\*n_ 1 fre1 } }
+! { dg-final { scan-tree-dump-times Moving statement 5 lim1 } }
 ! { dg-final { cleanup-tree-dump fre1 } }
+! { dg-final { cleanup-tree-dump lim1 } }
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 218479)
+++ gcc/tree-vrp.c  (working copy)
@@ -6866,12 +6866,9 @@ remove_range_assertions (void)
tree lhs = gimple_assign_lhs (stmt);
tree rhs = gimple_assign_rhs1 (stmt);
tree var;
-   tree cond = fold (ASSERT_EXPR_COND (rhs));
use_operand_p use_p;
imm_use_iterator iter;
 
-   gcc_assert (cond != boolean_false_node);
-
var = ASSERT_EXPR_VAR (rhs);
gcc_assert (TREE_CODE (var) == SSA_NAME);
 


Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2014-12-09 Thread Richard Biener
On Mon, 8 Dec 2014, Ilya Verbin wrote:

 Hi,
 
 On 28 Nov 09:36, Richard Biener wrote:
  On Fri, 28 Nov 2014, Ilya Verbin wrote:
   I found a bug here, have_{lto,offload} must be set if at least one file 
   contains
   lto/offload sections, but currently they are overwritten by the last file.
   Fix is bootstrapped and regtested on x86_64-linux.  OK for trunk?
  Ok.
 
 Unfortunately, this fix was not general enough.
 There might be cases when mixed object files get into lto-wrapper, ie some of
 them contain only LTO sections, some contain only offload sections, and some
 contain both.  But when lto-wrapper will pass all these files to 
 recompilation,
 the compiler might crash (it depends on the order of input files), since in
 read_cgraph_and_symbols it expects that *all* input files contain IR section 
 of
 given type.
 This patch splits input objects from argv into lto_argv and offload_argv, so
 that all files in arrays contain corresponding IR.
 Similarly, in lto-plugin, it was bad idea to add objects, which contain 
 offload
 IR without LTO, to claimed_files, since this may corrupt a resolution file.
 
 Tested on various combinations of files with/without -flto and with/without
 offload, using trunk ld and gold, also tested on ld without plugin support.
 Bootstrap and make check passed on x86_64-linux and i686-linux.  Ok for trunk?

Did you check that bootstrap-lto still works?  Ok if so.

Thanks,
Richard.

 But I am unable to run lto-bootstrap on trunk, it says:
 /x/build-x86_64-linux/./prev-gcc/gcc-ar -B/x/build-x86_64-linux/./prev-gcc/ 
 cru libdecnumber.a decNumber.o decContext.o decimal32.o decimal64.o 
 decimal128.o bid2dpd_dpd2bid.o host-ieee32.o host-ieee64.o host-ieee128.o
 /usr/bin/ar terminated with signal 11 [Segmentation fault], core dumped
 make[4]: *** [libdecnumber.a] Error 1
 
 
 gcc/
   * lto-wrapper.c (compile_offload_image): Start processing in_argv
   from 0 instead of 1.
   (run_gcc): Put offload objects into offload_argv, put LTO objects and
   possible preceding arguments into lto_argv.
   Pass offload_argv to compile_images_for_offload_targets instead of argv.
   Use lto_argv for LTO recompilation instead of argv.
 lto-plugin/
   * lto-plugin.c (offload_files, num_offload_files): New static variables.
   (free_1): Use arguments instead of global variables.
   (free_2): Free offload_files.
   (all_symbols_read_handler): Add names from offload_files to lto-wrapper
   arguments.
   (claim_file_handler): Do not add file to claimed_files if it contains
   offload sections without LTO sections.  Add it to offload_files instead.
 
 
 diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
 index 0f69457..f75c0dc 100644
 --- a/gcc/lto-wrapper.c
 +++ b/gcc/lto-wrapper.c
 @@ -669,7 +669,7 @@ compile_offload_image (const char *target, const char 
 *compiler_path,
obstack_ptr_grow (argv_obstack, filename);
  
/* Append names of input object files.  */
 -  for (unsigned i = 1; i  in_argc; i++)
 +  for (unsigned i = 0; i  in_argc; i++)
   obstack_ptr_grow (argv_obstack, in_argv[i]);
  
/* Append options from offload_lto sections.  */
 @@ -883,6 +883,8 @@ run_gcc (unsigned argc, char *argv[])
int new_head_argc;
bool have_lto = false;
bool have_offload = false;
 +  unsigned lto_argc = 0, offload_argc = 0;
 +  char **lto_argv, **offload_argv;
  
/* Get the driver and options.  */
collect_gcc = getenv (COLLECT_GCC);
 @@ -896,6 +898,11 @@ run_gcc (unsigned argc, char *argv[])
   decoded_options,
   decoded_options_count);
  
 +  /* Allocate arrays for input object files with LTO or offload IL,
 + and for possible preceding arguments.  */
 +  lto_argv = XNEWVEC (char *, argc);
 +  offload_argv = XNEWVEC (char *, argc);
 +
/* Look at saved options in the IL files.  */
for (i = 1; i  argc; ++i)
  {
 @@ -918,17 +925,27 @@ run_gcc (unsigned argc, char *argv[])
   }
fd = open (argv[i], O_RDONLY);
if (fd == -1)
 - continue;
 + {
 +   lto_argv[lto_argc++] = argv[i];
 +   continue;
 + }
 +
 +  if (find_and_merge_options (fd, file_offset, LTO_SECTION_NAME_PREFIX,
 +   fdecoded_options, fdecoded_options_count,
 +   collect_gcc))
 + {
 +   have_lto = true;
 +   lto_argv[lto_argc++] = argv[i];
 + }
 +
 +  if (find_and_merge_options (fd, file_offset, 
 OFFLOAD_SECTION_NAME_PREFIX,
 +   offload_fdecoded_options,
 +   offload_fdecoded_options_count, collect_gcc))
 + {
 +   have_offload = true;
 +   offload_argv[offload_argc++] = argv[i];
 + }
  
 -  have_lto
 - |= find_and_merge_options (fd, file_offset, LTO_SECTION_NAME_PREFIX,
 -fdecoded_options, fdecoded_options_count,
 - 

Re: [PATCH 8/9] Negative numbers added for sreal class.

2014-12-09 Thread Richard Biener
On Mon, Dec 8, 2014 at 5:52 PM, Martin Liška mli...@suse.cz wrote:
 On 11/28/2014 10:32 AM, Richard Biener wrote:

 On Thu, Nov 27, 2014 at 6:08 PM, Martin Liška mli...@suse.cz wrote:

 On 11/21/2014 04:21 PM, Martin Liška wrote:


 On 11/21/2014 04:02 PM, Richard Biener wrote:


 On Fri, Nov 21, 2014 at 3:39 PM, Martin Liška mli...@suse.cz wrote:

 Hello.

 Ok, this is simplified, one can use sreal a = 12345 and it works ;)

 that's a  new API, right?  There is no max () and I think that using
 LONG_MIN here is asking for trouble (host dependence).  The
 comment in the file says the max should be
 sreal (SREAL_MAX_SIG, SREAL_MAX_EXP) and the min
 sreal (-SREAL_MAX_SIG, SREAL_MAX_EXP)?


 Sure, sreal can store much bigger(smaller) numbers :)

 Where do you need sreal::to_double?  The host shouldn't perform
 double calculations so it can be only for dumping?  In which case
 the user should have used sreal::dump (), maybe with extra
 arguments.


 That new function was request from Honza, only for debugging purpose.
 I agree that dump should this kind of job.

 If no other problem, I will run tests once more and commit it.
 Thanks,
 Martin



 -#define SREAL_MAX_EXP (INT_MAX / 4)
 +#define SREAL_MAX_EXP (INT_MAX / 8)

 this change doesn't look necessary anymore?

 Btw, it's also odd that...

#define SREAL_PART_BITS 32
 ...
#define SREAL_MIN_SIG ((uint64_t) 1  (SREAL_PART_BITS - 1))
#define SREAL_MAX_SIG (((uint64_t) 1  SREAL_PART_BITS) - 1)

 thus all m_sig values fit in 32bits but we still use a uint64_t m_sig
 ...
 (the implementation uses 64bit for internal computations, but still
 the storage is wasteful?)

 Of course the way normalize() works requires that storage to be
 64bits to store unnormalized values.

 I'd say ok with the SREAL_MAX_EXP change reverted.


 Hi.

 You are right, this change was done because I used one bit for
 m_negative
 (bitfield), not needed any more.

 Final version attached.

 Thank you,
 Martin

 Thanks,
 Richard.



 Otherwise looks good to me and sorry for not noticing the above
 earlier.

 Thanks,
 Richard.

 Thanks,
 Martin


  };

  extern void debug (sreal ref);
 @@ -76,12 +133,12 @@ inline sreal operator+= (sreal a, const
 sreal
 b)

  inline sreal operator-= (sreal a, const sreal b)
  {
 -return a = a - b;
 +  return a = a - b;
  }

  inline sreal operator/= (sreal a, const sreal b)
  {
 -return a = a / b;
 +  return a = a / b;
  }

  inline sreal operator*= (sreal a, const sreal b)
 --
 2.1.2






 Hello.

 After IRC discussions, I decided to give sreal another refactoring where
 I
 use int64_t for m_sig.

 This approach looks much easier and straightforward. I would like to
 ask folk for comments?


 I think you want to exclude LONG_MIN (how do you know that LONG_MIN
 is min(int64_t)?) as a valid value for m_sig - after all SREAL_MIN_SIG
 makes it symmetric.  Or simply use abs_hwi at


 Hello.

 I decided to use abs_hwi.

That will ICE if you do

  sreal x (- __LONG_MAX__ - 1);

maybe that's the only case though.

 sreal::normalize ()
 {
+  int64_t s = m_sig  0 ? -1 : 1;
+  HOST_WIDE_INT sig = abs_hwi (m_sig);
+
   if (m_sig == 0)
...
 }
+
+  m_sig = s * sig;
 }

it's a bit awkward to strip the sign and then put it back on this way.  Also
now using a signed 'sig' where it was unsigned before.  And keeping
the first test using m_sig instead of sig.

I'd simply have used 'unsigned HOST_WIDE_INT sig = absu_hwi (m_sig);'
instead.

The rest of the patch is ok with the above change.

Thanks,
Richard.



 +  int64_t s = m_sig  0 ? -1 : 1;
 +  uint64_t sig = m_sig == LONG_MIN ? LONG_MAX : std::abs (m_sig);

 -#define SREAL_MIN_SIG ((uint64_t) 1  (SREAL_PART_BITS - 1))
 -#define SREAL_MAX_SIG (((uint64_t) 1  SREAL_PART_BITS) - 1)
 +#define SREAL_MIN_SIG ((uint64_t) 1  (SREAL_PART_BITS - 2))
 +#define SREAL_MAX_SIG (((uint64_t) 1  (SREAL_PART_BITS - 1)) - 1)

 shouldn't this also be -2 in the last line?


 It's correct, in the first line, I s/'SREAL_PART_BITS - 1'/'SREAL_PART_BITS
 - 2' and
 second one is also decremented: s/'SREAL_PART_BITS'/'REAL_PART_BITS - 1'.


 That is, you effectively use the MSB as a sign bit?


 Yes. It uses signed integer with MSB as a sign bit.


 Btw, a further change would be to make m_sig 'signed int', matching
 the real storage requirements according to SREAL_PART_BITS.
 This would of course still require temporaries used for computation
 to be 64bits and it would require normalization to work on the
 temporaries.  But then we'd get down to 8 bytes storage ...


 Agree, we can shrink the size for future.

 I've been running tests, I hope this implementation is much nicer than
 having bool m_negative. What do you think about trunk acceptation?

 Thanks,
 Martin



 Richard.


 I am able to run profiled bootstrap on x86_64-linux-pc and ppc64-linux-pc
 and new regression is introduced.

 Thanks,
 Martin





[PATCH, rtl-optimization]: Fix PR64213 by reverting fix for PR64037

2014-12-09 Thread Uros Bizjak
Hello!

The fix for PR64037 [1] introduced severe problems, including
bootstrap failure on alphaev6-linux-gnu. Non-bootstrapped testsuite
results [2] can be compared with bootstrapped gcc with a revert [3].
As discussed in the PR, the problem was in a missed SI-DI zero_extend.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03405.html
[2] https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01068.html
[3] https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01069.html

The patch fixes the regression by reverting the above patch.

2014-12-09  Uros Bizjak  ubiz...@gmail.com

PR bootstrap/64213
Revert:
2014-11-28  H.J. Lu  hongjiu...@intel.com

PR rtl-optimization/64037
* combine.c (setup_incoming_promotions): Pass the argument
before any promotions happen to promote_function_mode.

* configure.tgt (x86_64): Tune -m32 multilib to generic.

testsuite/ChangeLog:

2014-12-09  Uros Bizjak  ubiz...@gmail.com

PR bootstrap/64213
Revert:
2014-11-28  H.J. Lu  hongjiu...@intel.com

PR rtl-optimization/64037
* g++.dg/pr64037.C: New test.

Patch was bootstrapped and regression tested on alphaev68-linux-gnu.

Approved by Richi in the PR audit trail, will be committed to mainline
SVN, 4.9 and 4.8 branch.

Uros.
Index: combine.c
===
--- combine.c   (revision 218160)
+++ combine.c   (revision 218161)
@@ -1561,8 +1561,8 @@ setup_incoming_promotions (rtx_insn *first)
   uns3 = TYPE_UNSIGNED (DECL_ARG_TYPE (arg));
 
   /* The mode and signedness of the argument as it is actually passed,
- after any TARGET_PROMOTE_FUNCTION_ARGS-driven ABI promotions.  */
-  mode3 = promote_function_mode (DECL_ARG_TYPE (arg), mode2, uns3,
+ see assign_parm_setup_reg in function.c.  */
+  mode3 = promote_function_mode (TREE_TYPE (arg), mode1, uns1,
 TREE_TYPE (cfun-decl), 0);
 
   /* The mode of the register in which the argument is being passed.  */


Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-12-09 Thread Tejas Belagod

On 18/11/14 11:28, Yangfei (Felix) wrote:

On 11/18/2014 11:48 AM, Yangfei (Felix) wrote:

+(define_expand doloop_end
+  [(use (match_operand 0  ))  ; loop pseudo
+   (use (match_operand 1  ))] ; label
+  
+{
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+  if (optimize  0  flag_modulo_sched)


You'd be better off moving this condition into the expansion predicate (which is
currently ).

This short-circuits a lot of unnecessary work.  See  pass_rtl_doloop::gate.


r~



Yeah, that's a good idea.  See my updated patch :-)


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-11-13  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/aarch64.c (doloop_end): New pattern.
+   * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement.
+
  2014-11-11  Andrew Pinski  apin...@cavium.com

Bug target/61997
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   (revision 217394)
+++ gcc/config/aarch64/aarch64.md   (working copy)
@@ -4087,6 +4087,43 @@
[(set_attr type mrs)])


+;; Define the subtract-one-and-jump insns so loop.c
+;; knows what to generate.
+(define_expand doloop_end
+  [(use (match_operand 0  ))  ; loop pseudo
+   (use (match_operand 1  ))] ; label
+  optimize  0  flag_modulo_sched
+{
+  rtx s0;
+  rtx bcomp;
+  rtx loc_ref;
+  rtx cc_reg;
+  rtx insn;
+  rtx cmp;
+
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+
+  if (GET_MODE (operands[0]) != DImode)
+FAIL;
+
+  s0 = operands [0];
+  insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1)));
+
+  cmp = XVECEXP (PATTERN (insn), 0, 0);
+  cc_reg = SET_DEST (cmp);
+  bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
+  loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]);
+  emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
+  gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
+loc_ref, pc_rtx)));
+  DONE;
+})
+
  ;; AdvSIMD Stuff
  (include aarch64-simd.md)

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 217394)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i
  #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
aarch64_use_by_pieces_infrastructure_p

+#undef TARGET_CAN_USE_DOLOOP_P
+#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
  struct gcc_target targetm = TARGET_INITIALIZER;

  #include gt-aarch64.h



Hi Felix,

This patch causes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64240 
when sms-3 is tested with -fPIC. It runs fine when I reverse this patch out.


Please could you have a look?

Thanks,
Tejas.



Re: Fix libgomp crash without TLS (PR42616)

2014-12-09 Thread Varvara Rainchik
Ok, the following patch works for Android:

2014-12-09  Varvara Rainchik  varvara.rainc...@intel.com

* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add GCC_CHECK_EMUTLS.
* libgomp.h: Add check for USE_EMUTLS: this case
is equal to HAVE_TLS.
* team.c: Likewise.

--

diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index cea6366..16ec158 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -245,6 +245,9 @@ fi
 # See if we support thread-local storage.
 GCC_CHECK_TLS

+# See if we have emulated thread-local storage.
+GCC_CHECK_EMUTLS
+
 # See what sort of export controls are available.
 LIBGOMP_CHECK_ATTRIBUTE_VISIBILITY
 LIBGOMP_CHECK_ATTRIBUTE_DLLEXPORT
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1482cc..b694356 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -471,7 +471,7 @@ enum gomp_cancel_kind

 /* ... and here is that TLS data.  */

-#ifdef HAVE_TLS
+#if defined HAVE_TLS || defined USE_EMUTLS
 extern __thread struct gomp_thread gomp_tls_data;
 static inline struct gomp_thread *gomp_thread (void)
 {
diff --git a/libgomp/team.c b/libgomp/team.c
index e6a6d8f..594127c 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -37,7 +37,7 @@ pthread_key_t gomp_thread_destructor;


 /* This is the libgomp per-thread data structure.  */
-#ifdef HAVE_TLS
+#if defined HAVE_TLS || defined USE_EMUTLS
 __thread struct gomp_thread gomp_tls_data;
 #else
 pthread_key_t gomp_tls_key;
@@ -70,7 +70,7 @@ gomp_thread_start (void *xdata)
   void (*local_fn) (void *);
   void *local_data;

-#ifdef HAVE_TLS
+#if defined HAVE_TLS || defined USE_EMUTLS
   thr = gomp_tls_data;
 #else
   struct gomp_thread local_thr;
@@ -916,7 +916,7 @@ gomp_team_end (void)
 static void __attribute__((constructor))
 initialize_team (void)
 {
-#ifndef HAVE_TLS
+#if !defined HAVE_TLS  !defined USE_EMUTLS
   static struct gomp_thread initial_thread_tls_data;

   pthread_key_create (gomp_tls_key, NULL);

Changes are bootstrapped and regtested on linux, all make check tests
now also pass with --disable-tls.
Is it ok for upstream?

2014-12-09 15:12 GMT+03:00 Jakub Jelinek ja...@redhat.com:
 On Tue, Dec 09, 2014 at 02:49:44PM +0300, Varvara Rainchik wrote:
 Can we instead of adding new macroses in config/tls.m4 use something
 like that in libgomp:

 #if defined (HAVE_TLS)  defined (USE_EMUTLS)

 (with GCC_CHECK_EMUTLS in libgomp/configure.ac)?

 That would be fine too.

 Jakub


Re: [PATCH] Update libgcc.texi to match implementation in libgcc/libgcc2.c

2014-12-09 Thread Ian Lance Taylor
On Mon, Oct 27, 2014 at 8:19 PM, Kito Cheng kito.ch...@gmail.com wrote:

 This patch update `Bit operations` section in libgcc.text, most bit
 operation function is take an unsigned integer instead of signed
 integer in libgcc/libgcc2.c [1], and it seem more make sense :)

 ChangeLog
 2014-10-28  Kito Cheng  k...@0xlab.org

 * doc/libgcc.texi: Update text to match implementation in
 libgcc/libgcc2.c

 [1] 
 https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=libgcc/libgcc2.c;h=46d6a2ef030ff98a944935f77529cee9c05fb326;hb=HEAD

This is OK.

Thanks.

Ian


Re: [PATCH 2/3] Extended if-conversion

2014-12-09 Thread Richard Biener
On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 Here is updated patch2 with the following changes:
 1. Delete functions  phi_has_two_different_args and find_insertion_point.
 2. Use only one function for extended predication -
 predicate_extended_scalar_phi.
 3. Save gsi before insertion of predicate computations for basic
 blocks if it has 2 predecessors and
 both incoming edges are critical or it gas more than 2 predecessors
 and at least one incoming edge
 is critical. This saved iterator can be used by extended phi predication.

 Here is motivated test-case which explains this point.
 Test-case is attached (t5.c) and it must be compiled with -O2
 -ftree-loop-vectorize -fopenmp options.
 The problem phi is in bb-7:

   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
   {
 bb 5:
 xmax_edge_18 = xmax_edge_36 + 1;
 if (xmax_17 == xmax_27)
   goto bb 7;
 else
   goto bb 9;

   }
   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
   {
 bb 6:
 if (xmax_17 == xmax_27)
   goto bb 7;
 else
   goto bb 8;

   }
   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
   {
 bb 7:
 # xmax_edge_30 = PHI xmax_edge_36(6), xmax_edge_18(5)
 xmax_edge_19 = xmax_edge_39 + 1;
 goto bb 11;

   }

 Note that both incoming edges to bb_7 are critical. If we comment out
 restoring gsi in predicate_all_scalar_phi:
 #if 0
  if ((EDGE_COUNT (bb-preds) == 2  all_preds_critical_p (bb))
  || (EDGE_COUNT (bb-preds)  2  has_pred_critical_p (bb)))
gsi = bb_insert_point (bb);
  else
 #endif
gsi = gsi_after_labels (bb);

 we will get ICE:
 t5.c: In function 'foo':
 t5.c:9:6: error: definition in block 4 follows the use
  void foo (int n)
   ^
 for SSA_NAME: _1 in statement:
 _52 = _1  _3;
 t5.c:9:6: internal compiler error: verify_ssa failed

 smce predicate computations were inserted in bb_7.

The issue is obviously that the predicates have already been emitted
in the target BB - that's of course the wrong place.  This is done
by insert_gimplified_predicates.

This just shows how edge predicate handling is broken - we don't
seem to have a sequence of gimplified stmts for edge predicates
but push those to e-dest which makes this really messy.

Rather than having a separate phase where we insert all
gimplified bb predicates we should do that on-demand when
predicating a PHI.

Your patch writes to stderr - that's bad - use dump_file and guard
the printfs properly.

You also still have two functions for PHI predication.  And the
new extended variant doesn't commonize the 2-args and general
paths.

I'm not at all happy with this code.  It may be existing if-conv codes
fault but making it even worse is not an option.

Again - what's wrong with simply splitting critical edges if
aggressive_if_conv?  I think that would very much simplify
things here.  Or alternatively use gsi_insert_on_edge and
commit edge insertions before merging the blocks.

Thanks,
Richard.

 ChangeLog is

 2014-12-09  Yuri Rumyantsev  ysrum...@gmail.com

 * tree-if-conv.c : Include hash-map.h.
 (struct bb_predicate_s): Add new field to save copy of gimple
 statement iterator.
 (bb_insert_point): New function.
 (set_bb_insert_point): New function.
 (has_pred_critical_p): New function.
 (if_convertible_bb_p): Allow bb has more than 2 predecessors if
 AGGRESSIVE_IF_CONV is true.
 (if_convertible_bb_p): Delete check that bb has at least one
 non-critical incoming edge.
 (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
 Allow interchange PHI arguments if EXTENDED is false.
 Change check that block containing reduction statement candidate
 is predecessor of phi-block since phi may have more than two arguments.
 (predicate_scalar_phi): Add new arguments for call of
 is_cond_scalar_reduction.
 (get_predicate_for_edge): New function.
 (struct phi_args_hash_traits): New type.
 (phi_args_hash_traits::hash): New function.
 (phi_args_hash_traits::equal_keys): New function.
 (gen_phi_arg_condition): New function.
 (predicate_extended_scalar_phi): New function.
 (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
 to true if BB containing phi has more than 2 predecessors or both
 incoming edges are critical. Invoke find_phi_replacement_condition and
 predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
 has 2 predecessors and both incoming edges are critical or it has more
 than 2 predecessors and atleast one incoming edge is critical.
 Use standard gsi_after_labels otherwise.
 Invoke predicate_extended_scalar_phi if EXTENDED is true.
 (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
 to save gsi before insertion of predicate computations. SEt-up it to
 true for BB with 2 predecessors and critical incoming edges either
 number of predecessors is geater 2 and at least one incoming edge is
 critical.
 Add check that non-predicated block may have statements to insert.
 Insert predicate computation of BB just after 

[PATCH 0/4] [AARCH64,SIMD] PR63870 Improve error messages for single lane load/store

2014-12-09 Thread charles . baylis
From: Charles Baylis charles.bay...@linaro.org

This patch series moves the checking of lane indices for vld[234](q?)_lane and
vst[234](q?)_lane intrinsics so that it occurs during builtin expansion.

The q register variants are checked directly, but since the d register variants
use the same intrinsics, these are checked in arm_neon.h using
__builtin_aarch64_im_land_boundsi().

Tested with make check-gcc on aarch64-oe-linux, with no regressions.

Charles Baylis (4):
  vldN_lane error message enhancements (Q registers)
  vldN_lane error message enhancements (D registers)
  vstN_lane error message enhancements (Q register)
  vstN_lane error message enhancements (D registers)

 gcc/config/aarch64/aarch64-builtins.c | 32 +++-
 gcc/config/aarch64/aarch64-simd.md| 12 ++--
 gcc/config/aarch64/arm_neon.h | 12 
 3 files changed, 45 insertions(+), 11 deletions(-)

-- 
1.9.1



[PATCH 1/4] vldN_lane error message enhancements (Q registers)

2014-12-09 Thread charles . baylis
From: Charles Baylis charles.bay...@linaro.org

gcc/ChangeLog:

DATE  Charles Baylis  charles.bay...@linaro.org

PR target/63870
* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers):
Add qualifier_struct_load_store_lane_index.
(aarch64_types_loadstruct_lane_qualifiers): Use
qualifier_struct_load_store_lane_index for lane index argument for
last argument.
(builtin_simd_arg): Add SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
(aarch64_simd_expand_args): Add new argument describing mode of
builtin. Check lane bounds for arguments with
SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
(aarch64_simd_expand_builtin): Emit error for incorrect lane indices
if marked with SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
(aarch64_simd_expand_builtin): Pass machine mode of builtin to
aarch64_simd_expand_args.
* config/aarch64/aarch64-simd.md: (aarch64_ld2_lanemode): Remove
lane bounds check. Adjust lane numbers for big-endian.
(aarch64_ld3_lanemode): Likewise.
(aarch64_ld4_lanemode): Likewise.

gcc/testsuite/ChangeLog:

DATE  Charles Baylis  charles.bay...@linaro.org

* gcc.target/aarch64/simd/vld4q_lane.c: New test.

Change-Id: Ib17adaf64e631cf8d00a1a1a6c12409d2d7f4239
---
 gcc/config/aarch64/aarch64-builtins.c  | 30 +++---
 gcc/config/aarch64/aarch64-simd.md | 12 -
 gcc/testsuite/gcc.target/aarch64/simd/vld4q_lane.c | 16 
 3 files changed, 48 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vld4q_lane.c

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index aac7269..27046e2 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -116,7 +116,9 @@ enum aarch64_type_qualifiers
   /* Polynomial types.  */
   qualifier_poly = 0x100,
   /* Lane indices - must be in range, and flipped for bigendian.  */
-  qualifier_lane_index = 0x200
+  qualifier_lane_index = 0x200,
+  /* Lane indices for single lane structure loads and stores */
+  qualifier_struct_load_store_lane_index = 0x400
 };
 
 typedef struct
@@ -224,7 +226,7 @@ aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum aarch64_type_qualifiers
 aarch64_types_loadstruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_const_pointer_map_mode,
-  qualifier_none, qualifier_none };
+  qualifier_none, qualifier_struct_load_store_lane_index };
 #define TYPES_LOADSTRUCT_LANE (aarch64_types_loadstruct_lane_qualifiers)
 
 static enum aarch64_type_qualifiers
@@ -859,12 +861,14 @@ typedef enum
   SIMD_ARG_COPY_TO_REG,
   SIMD_ARG_CONSTANT,
   SIMD_ARG_LANE_INDEX,
+  SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX,
   SIMD_ARG_STOP
 } builtin_simd_arg;
 
 static rtx
 aarch64_simd_expand_args (rtx target, int icode, int have_retval,
- tree exp, builtin_simd_arg *args)
+ tree exp, builtin_simd_arg *args,
+ enum machine_mode builtin_mode)
 {
   rtx pat;
   rtx op[SIMD_MAX_BUILTIN_ARGS + 1]; /* First element for result operand.  */
@@ -903,6 +907,21 @@ aarch64_simd_expand_args (rtx target, int icode, int 
have_retval,
op[opc] = copy_to_mode_reg (mode, op[opc]);
  break;
 
+   case SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX:
+ /* we expect arguments in order (ptr, array_of_vector, lane), and
+we have to grub around in the ptr to find the lane size */
+ gcc_assert (opc  1);
+ if (CONST_INT_P (op[opc]))
+   {
+ aarch64_simd_lane_bounds (op[opc], 0,
+   GET_MODE_NUNITS (builtin_mode),
+   exp);
+ /* Keep to GCC-vector-extension lane indices in the RTL.  */
+ op[opc] =
+   GEN_INT (ENDIAN_LANE_N (builtin_mode, INTVAL (op[opc])));
+   }
+ goto constant_arg;
+
case SIMD_ARG_LANE_INDEX:
  /* Must be a previous operand into which this is an index.  */
  gcc_assert (opc  0);
@@ -917,6 +936,7 @@ aarch64_simd_expand_args (rtx target, int icode, int 
have_retval,
  /* Fall through - if the lane index isn't a constant then
 the next case will error.  */
case SIMD_ARG_CONSTANT:
+constant_arg:
  if (!(*insn_data[icode].operand[opc].predicate)
  (op[opc], mode))
  {
@@ -1003,6 +1023,8 @@ aarch64_simd_expand_builtin (int fcode, tree exp, rtx 
target)
 
   if (d-qualifiers[qualifiers_k]  qualifier_lane_index)
args[k] = SIMD_ARG_LANE_INDEX;
+  else if (d-qualifiers[qualifiers_k]  
qualifier_struct_load_store_lane_index)
+   args[k] = SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX;
   else if 

[PATCH 2/4] vldN_lane error message enhancements (D registers)

2014-12-09 Thread charles . baylis
From: Charles Baylis charles.bay...@linaro.org

gcc/ChangeLog

DATE  Charles Baylis  charles.bay...@linaro.org

* config/aarch64/arm_neon.h (__LD2_LANE_FUNC): Add explicit lane
bounds check.
(__LD3_LANE_FUNC): Likewise.
(__LD4_LANE_FUNC): Likewise

gcc/testsuite/ChangeLog:

DATE  Charles Baylis  charles.bay...@linaro.org

* gcc.target/aarch64/simd/vld4_lane.c: New test.

Change-Id: Ia95fbed34b50cf710ea9032ff3428a5f1432e0aa
---
 gcc/config/aarch64/arm_neon.h |  6 ++
 gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c | 15 +++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 8cff719..22df564 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -17901,6 +17901,8 @@ vld2_lane_##funcsuffix (const ptrtype * __ptr, intype 
__b, const int __c)  \
   __o = __builtin_aarch64_set_qregoi##mode (__o,  \
   (signedtype) __temp.val[1], \
   1); \
+  __builtin_aarch64_im_lane_boundsi (__c, \
+sizeof (vectype) / sizeof (*__ptr));  \
   __o =__builtin_aarch64_ld2_lane##mode (  
   \
  (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c);   \
   __b.val[0] = (vectype) __builtin_aarch64_get_dregoidi (__o, 0); \
@@ -17991,6 +17993,8 @@ vld3_lane_##funcsuffix (const ptrtype * __ptr, intype 
__b, const int __c)  \
   __o = __builtin_aarch64_set_qregci##mode (__o,  \
   (signedtype) __temp.val[2], \
   2); \
+  __builtin_aarch64_im_lane_boundsi (__c, \
+sizeof (vectype) / sizeof (*__ptr));  \
   __o =__builtin_aarch64_ld3_lane##mode (  
   \
  (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c);   \
   __b.val[0] = (vectype) __builtin_aarch64_get_dregcidi (__o, 0); \
@@ -18089,6 +18093,8 @@ vld4_lane_##funcsuffix (const ptrtype * __ptr, intype 
__b, const int __c)  \
   __o = __builtin_aarch64_set_qregxi##mode (__o,  \
   (signedtype) __temp.val[3], \
   3); \
+  __builtin_aarch64_im_lane_boundsi (__c, \
+sizeof (vectype) / sizeof (*__ptr));  \
   __o =__builtin_aarch64_ld4_lane##mode (  
   \
  (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c);   \
   __b.val[0] = (vectype) __builtin_aarch64_get_dregxidi (__o, 0); \
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c
new file mode 100644
index 000..d14e6c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c
@@ -0,0 +1,15 @@
+/* Test error message when passing an invalid value as a lane index.  */
+
+/* { dg-do compile } */
+
+#include arm_neon.h
+
+int8x8x4_t
+f_vld4_lane (int8_t * p, int8x8x4_t v)
+{
+  int8x8x4_t res;
+  /* { dg-error lane 8 out of range 0 - 7  { target *-*-* } 0 } */
+  res = vld4_lane_s8 (p, v, 8);
+  return res;
+}
+
-- 
1.9.1



[PATCH 4/4] vstN_lane error message enhancements (D registers)

2014-12-09 Thread charles . baylis
From: Charles Baylis charles.bay...@linaro.org

gcc/ChangeLog:

DATE  Charles Baylis  charles.bay...@linaro.org

* config/aarch64/arm_neon.h (__ST2_LANE_FUNC): Add explicit lane bounds
check.
(__ST3_LANE_FUNC): Likewise.
(__ST4_LANE_FUNC): Likewise.

gcc/testsuite/ChangeLog:

DATE  Charles Baylis  charles.bay...@linaro.org

* gcc.target/aarch64/simd/vst4_lane.c: New test.

Change-Id: I6bceaeb7773bf20860daca4013ea1c4d2c06afa6
---
 gcc/config/aarch64/arm_neon.h |  6 ++
 gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c | 14 ++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 22df564..22c6d06 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11181,6 +11181,8 @@ vst2_lane_ ## funcsuffix (ptrtype *__ptr,   
 \
   __temp.val[1]
 \
 = vcombine_##funcsuffix (__b.val[1],\
 vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+  __builtin_aarch64_im_lane_boundsi (__c,   \
+sizeof (__b.val[0]) / sizeof (*__ptr)); \
   __o = __builtin_aarch64_set_qregoi##mode (__o,\
(signedtype) __temp.val[0], 0);  \
   __o = __builtin_aarch64_set_qregoi##mode (__o,\
@@ -11258,6 +11260,8 @@ vst3_lane_ ## funcsuffix (ptrtype *__ptr,   
 \
(signedtype) __temp.val[1], 1);  \
   __o = __builtin_aarch64_set_qregci##mode (__o,\
(signedtype) __temp.val[2], 2);  \
+  __builtin_aarch64_im_lane_boundsi (__c,   \
+sizeof (__b.val[0]) / sizeof (*__ptr)); \
   __builtin_aarch64_st3_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *)  \
 __ptr, __o, __c);   \
 }
@@ -11336,6 +11340,8 @@ vst4_lane_ ## funcsuffix (ptrtype *__ptr,   
 \
(signedtype) __temp.val[2], 2);  \
   __o = __builtin_aarch64_set_qregxi##mode (__o,\
(signedtype) __temp.val[3], 3);  \
+  __builtin_aarch64_im_lane_boundsi (__c,   \
+sizeof (__b.val[0]) / sizeof (*__ptr)); \
   __builtin_aarch64_st4_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *)  \
 __ptr, __o, __c);   \
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c
new file mode 100644
index 000..6627ecf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c
@@ -0,0 +1,14 @@
+/* Test error message when passing an invalid value as a lane index.  */
+
+/* { dg-do compile } */
+
+#include arm_neon.h
+
+void
+f_vst4_lane (int8_t * p, int8x8x4_t v)
+{
+  /* { dg-error lane 8 out of range 0 - 7  { target *-*-* } 0 } */
+  vst4_lane_s8 (p, v, 8);
+  return;
+}
+
-- 
1.9.1



[PATCH 3/4] vstN_lane error message enhancements (Q register)

2014-12-09 Thread charles . baylis
From: Charles Baylis charles.bay...@linaro.org

gcc/ChangeLog:

DATE  Charles Baylis  charles.bay...@linaro.org

* config/aarch64/aarch64-builtins.c
(aarch64_types_storestruct_lane_qualifiers): Mark last argument with
qualifier_struct_load_store_lane_index.

gcc/testsuite/ChangeLog:

DATE  Charles Baylis  charles.bay...@linaro.org

* gcc.target/aarch64/simd/vst4q_lane.c: New test.

Change-Id: If097c9d32eb6eb3d4c4e16db81f81e44a3154509
---
 gcc/config/aarch64/aarch64-builtins.c  |  2 +-
 gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 27046e2..f2fb939 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -258,7 +258,7 @@ aarch64_types_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum aarch64_type_qualifiers
 aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer_map_mode,
-  qualifier_none, qualifier_none };
+  qualifier_none, qualifier_struct_load_store_lane_index };
 #define TYPES_STORESTRUCT_LANE (aarch64_types_storestruct_lane_qualifiers)
 
 #define CF0(N, X) CODE_FOR_aarch64_##N##X
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c
new file mode 100644
index 000..849f07a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c
@@ -0,0 +1,15 @@
+/* Test error message when passing an invalid value as a lane index.  */
+
+/* { dg-do compile } */
+
+#include arm_neon.h
+
+void
+f_vst4q_lane (int8_t * p, int8x16x4_t v)
+{
+  vst4q_lane_s8 (p, v, 16);
+/* { dg-error lane 16 out of range 0 - 15  { target *-*-* } 0 } */
+  return;
+}
+
+
-- 
1.9.1



Re: [PATCH] PR ipa/63909 ICE: SIGSEGV in ipa_icf_gimple::func_checker::compare_bb()

2014-12-09 Thread Martin Liška

On 11/21/2014 01:23 PM, Richard Biener wrote:

On Fri, Nov 21, 2014 at 12:52 PM, Martin Liška mli...@suse.cz wrote:

On 11/20/2014 05:41 PM, Richard Biener wrote:


On Thu, Nov 20, 2014 at 5:30 PM, Martin Liška mli...@suse.cz wrote:


Hello.

Following patch fixes ICE in IPA ICF. Problem was that number of
non-debug
statements in a BB can
change (for instance by IPA split), so that the number is recomputed.



Huh, so can it get different for both candidates?  I think the stmt
compare
loop should be terminated on gsi_end_p of either iterator and return
false for any remaining non-debug-stmts on the other.

Thus, not walk all stmts twice here.



Hello.

Sorry for the previous patch, you are right it can be fixed in purer way.
Please take a look at attached patch.



As IPA split is run early I don't see how it should affect a real IPA
pass though?





Sorry for non precise information, the problematic BB is changed here:
#0  gsi_split_seq_before (i=0x7fffd550, pnew_seq=0x7fffd528) at
../../gcc/gimple-iterator.c:429
#1  0x00b95a2a in gimple_split_block (bb=0x76c41548, stmt=0x0)
at ../../gcc/tree-cfg.c:5707
#2  0x007563cf in split_block (bb=0x76c41548, i=i@entry=0x0) at
../../gcc/cfghooks.c:508
#3  0x00756b44 in split_block_after_labels (bb=optimized out) at
../../gcc/cfghooks.c:549
#4  make_forwarder_block (bb=optimized out,
redirect_edge_p=redirect_edge_p@entry=0x75d4e0 mfb_keep_just(edge_def*),
new_bb_cbk=new_bb_cbk@entry=0x0) at ../../gcc/cfghooks.c:842
#5  0x0076085a in create_preheader (loop=0x76d56948,
flags=optimized out) at ../../gcc/cfgloopmanip.c:1563
#6  0x00760aea in create_preheaders (flags=1) at
../../gcc/cfgloopmanip.c:1613
#7  0x009bc6b0 in apply_loop_flags (flags=15) at
../../gcc/loop-init.c:75
#8  0x009bc7d3 in loop_optimizer_init (flags=15) at
../../gcc/loop-init.c:136
#9  0x00957914 in estimate_function_body_sizes (node=0x76c47620,
early=false) at ../../gcc/ipa-inline-analysis.c:2480
#10 0x0095948b in compute_inline_parameters (node=0x76c47620,
early=false) at ../../gcc/ipa-inline-analysis.c:2907
#11 0x0095bd88 in inline_analyze_function (node=0x76c47620) at
../../gcc/ipa-inline-analysis.c:3994
#12 0x0095bed3 in inline_generate_summary () at
../../gcc/ipa-inline-analysis.c:4045
#13 0x00a70b71 in execute_ipa_summary_passes (ipa_pass=0x1dcb9e0) at


So inline_summary is generated after IPA-ICF does its job?

But the bug is obviously that an IPA analysis phase does a code transform
(here initializes loops without AVOID_CFG_MANIPULATIONS).
Honza - if that is really needed then I think we should make sure
loops are initialized at the start of the IPA analysis phase, not randomly
inbetween.

Thanks,
Richard.


Hello.

Even thought the root of problem is hidden somewhere in loop creation, I would
like to apply the patch which makes iteration of non-debug gimple statement
more clearly?

What do you think Richard?

Thanks,
Martin




../../gcc/passes.c:2137
#14 0x00777a15 in ipa_passes () at ../../gcc/cgraphunit.c:2074
#15 symbol_table::compile (this=this@entry=0x76c3a000) at
../../gcc/cgraphunit.c:2187
#16 0x00778bcd in symbol_table::finalize_compilation_unit
(this=0x76c3a000) at ../../gcc/cgraphunit.c:2340
#17 0x006580ee in c_write_global_declarations () at
../../gcc/c/c-decl.c:10777
#18 0x00b5bb8b in compile_file () at ../../gcc/toplev.c:584
#19 0x00b5def1 in do_compile () at ../../gcc/toplev.c:2041
#20 0x00b5e0fa in toplev::main (this=0x7fffdc9f, argc=20,
argv=0x7fffdd98) at ../../gcc/toplev.c:2138
#21 0x0063f1d9 in main (argc=20, argv=0x7fffdd98) at
../../gcc/main.c:38


Patch can bootstrap on x86_64-linux-pc and no regression has been seen.
Ready for trunk?


Thanks,
Martin



Thanks,
Richard.


Patch can bootstrap on x86_64-linux-pc and no regression has been seen.
Ready for trunk?

Thanks,
Martin







Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Evgeny Stupachenko
I've tried to get smaller reproducer, however currently it is
complicated as several functions in GCC.
However patch is fixing spec2006 benchmark. Shouldn't that be enough
for regression testing?

Thanks,
Evgeny

On Tue, Dec 9, 2014 at 4:29 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 The case comes from spec2006 403.gcc (or old GCC itself).

   for (i = 0; i  FIRST_PSEUDO_REGISTER; ++i)
 {
   vd-e[i].mode = VOIDmode;
   vd-e[i].oldest_regno = i;
   vd-e[i].next_regno = INVALID_REGNUM;
 }

 It is vectorized and only then completely peeled.
 Only after peeling all stored values become constant.

 Currently expand_vec_perm_pblendv works as following:
 Let d.target, d.op0, dop1 be permutation parameters.

 First we permute an operand (d.op1 or d.op0) and then blend it with
 other argument:

 d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
 d.target = pblend(d.op0, d.target)
 (if d.op0 equal to d.target this is buggy)

 Patch make it more accurate:
 tmp = gen_reg_rtx (vmode)
 tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
 d.target = pblend(d.op0, tmp)
 (Here d.op0 can be equal to d.target)

 Below is rtl dump of buggy case:

 (183t.optimized)
 ...
 vect_shuffle3_low_470 = VEC_PERM_EXPR { 0, 0, 0, 0 }, { 32, 33, 34,
 35 }, { 0, 4, 0, 1 };
 vect_shuffle3_high_469 = VEC_PERM_EXPR vect_shuffle3_low_470, {
 4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 };
 ...
 (184r.expand)
 ...
 (insn 205 204 206 (set (reg:V4SI 768)
 (const_vector:V4SI [
 (const_int 0 [0])
 (const_int 0 [0])
 (const_int 0 [0])
 (const_int 0 [0])
 ])) ../regrename.c:1171 -1
  (nil))

 (insn 206 205 208 (set (reg:V4SI 769)
 (mem/u/c:V4SI (symbol_ref/u:DI (*.LC28) [flags 0x2]) [3  S16
 A128])) ../regrename.c:1171 -1
  (expr_list:REG_EQUAL (const_vector:V4SI [
 (const_int 32 [0x20])
 (const_int 33 [0x21])
 (const_int 34 [0x22])
 (const_int 35 [0x23])
 ])
 (nil)))

 (insn 208 206 207 (set (reg:V4SI 770)
 (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
 (reg:V4SI 769))
 (parallel [
 (const_int 0 [0])
 (const_int 4 [0x4])
 (const_int 1 [0x1])
 (const_int 5 [0x5])
 ]))) ../regrename.c:1171 -1
  (nil))

 (insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
 (vec_select:V4SI (reg:V4SI 770)
 (parallel [
 (const_int 0 [0])
 (const_int 1 [0x1])
 (const_int 0 [0])
 (const_int 2 [0x2])
 ]))) ../regrename.c:1171 -1
  (nil))

 (insn 209 207 210 (set (reg:V4SI 771)
 (const_vector:V4SI [
 (const_int -1 [0x])
 (const_int -1 [0x])
 (const_int -1 [0x])
 (const_int -1 [0x])
 ])) ../regrename.c:1171 -1
  (nil))

 (insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
 (vec_select:V4SI (reg:V4SI 771)
 (parallel [
 (const_int 0 [0])
 (const_int 1 [0x1])
 (const_int 0 [0])
 (const_int 3 [0x3])
 ]))) ../regrename.c:1171 -1
  (nil))
 (insn 211 210 212 (set (reg:V8HI 772)
 (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
 (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
 (const_int 48 [0x30]))) ../regrename.c:1171 -1
  (nil))
 ...

 On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak ubiz...@gmail.com wrote:

 The patch fix pblendv expand.
 The bug was uncovered when permutation operands are constants.
 In this case we init target register for expand_vec_perm_1 with
 constant and then rewrite the target with constant for
 expand_vec_perm_pblend.

 The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
 -flto -march=corei7.

 Bootstrap and make check passed.

 Is it ok?

 Please add a testcase.

 Also, it surprises me that we enter expand_vec_perm_pblendv with
 uninitialized (?) target. Does your patch only papers over a real
 problem up the call chain (hard to say without a testcase)?

 Uros.



 Evgeny

 2014-12-09  Evgeny Stupachenko  evstu...@gmail.com

 gcc/
 * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
 expand_vec_perm_1 target.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index eafc15a..5a914ad 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d 
 *d)
  dcopy.op0 = dcopy.op1 = d-op1;
else
  dcopy.op0 = dcopy.op1 = d-op0;
 +  dcopy.target = gen_reg_rtx 

Re: [PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH

2014-12-09 Thread Marcus Shawcroft
On 24 November 2014 at 13:46, Wilco Dijkstra wdijk...@arm.com wrote:
 Richard Earnshaw wrote:
 If all cores seem to benefit from FP reassociation set to 4, then it
 seems odd that 4 is not also the default for generic.

 Andrew, you may need to pick a target-specific value for ThunderX; I
 think Wilco has just picked something that seems plausible because he
 needs to put a real value in there.

 What happens if the integer and vector numbers are bumped up?  I'd have
 thought that integer numbers 1 would be appropriate on all dual-issue
 or greater cores.

 I tried int and vector as well, and setting int to 2 did give an improvement,
 but vector had no effect, so I'll leave to 1 for now. The patch is the same
 as last time, it just sets integer to 2, and uses the same settings for all 
 CPUs.

 OK for commit?

 ChangeLog:
 2014-11-24  Wilco Dijkstra  wdijk...@arm.com

 * gcc/config/aarch64/aarch64-protos.h (tune-params):
 Add reasociation tuning parameters.
 * gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH):
 Define.  (aarch64_reassociation_width): New function.
 (generic_tunings) Add reassociation tuning parameters.
 (cortexa53_tunings): Likewise.
 (cortexa57_tunings): Likewise.
 (thunderx_tunings): Likewise.

OK /Marcus


Re: [PATCH] PR other/63613: Add fixincludes for dejagnu.h

2014-12-09 Thread David Malcolm
On Mon, 2014-12-08 at 14:13 +0100, Rainer Orth wrote:
 Jeff Law l...@redhat.com writes:
 
  On 12/04/14 15:42, Rainer Orth wrote:
  David Malcolm dmalc...@redhat.com writes:
 
  dejagnu.h assumed -fgnu89-inline until a recent upstream fix;
  see http://lists.gnu.org/archive/html/dejagnu/2014-10/msg00011.html
 
  Remove the workaround from jit.exp that used -fgnu89-inline
  in favor of a fixincludes to dejagnu.h that applies the upstream fix
  to a local copy.
 
  This should make it easier to support C++ testcases from jit.exp.
 
  I wonder how this would work if dejagnu.h doesn't live in a system
  include dir (e.g. a self-compiled version)?  fixincludes won't touch
  those AFAIU.  The previous version with -fgnu89-inline would still work
  in that case provided dejagnu.h is found at all.
  Presumably in that case the answer is upgrade dejagnu? :-)
 
 I've two problems with this:
 
 * There's not yet a DejaGnu release available with the fix and I've no
   idea if there are any planned any time soon.  Not everyone is
   comfortable with random git (or whatever) snapshots.

FWIW I've asked on the DejaGnu mailing list, and Ben Elliston said:
 Yes. I plan on releasing 1.6 over the holidays.
http://lists.gnu.org/archive/html/dejagnu/2014-12/msg1.html

 * I don't consider this a critical issue that cannot work without
   current releases.  We're already working around several upstream
   DejaGnu issues in our codebase, and I don't consider this particular
   one important enough to require everyone to upgrade to a not-a-release
   version.
 
   Rainer
 




Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Richard Henderson
On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
 However patch is fixing spec2006 benchmark. Shouldn't that be enough
 for regression testing?
 

No.  Spec is not free.


r~


Re: [PATCH][AARCH64][5/5] Add macro fusion support for cmp/b.X for ThunderX

2014-12-09 Thread Marcus Shawcroft
On 5 December 2014 at 14:36, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 Andrew posted this patch sometime ago (before stage1 closed) and I had
 rebased it on top of the
 other macro fusion patches in that series.
 This is a respin of that patch with the comment about not calling
 get_attr_type
 repeatedly resolved
 (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02251.html)

 Tested aarch64-none-elf.

 Is this ok to go in?

 Thanks,
 Kyrill

 2014-12-01  Andrew Pinski  apin...@cavium.com
 Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
  (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
  (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.

OK /Marcus


Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Jakub Jelinek
On Tue, Dec 09, 2014 at 06:59:27PM +0300, Evgeny Stupachenko wrote:
 I've tried to get smaller reproducer, however currently it is
 complicated as several functions in GCC.

Just add a gcc_assert where you are changing the code, testing for
what you want to avoid.
Then just delta reduce it or creduce it.

Jakub


Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern

2014-12-09 Thread Eric Botcazou
 It's unfortunate that GCC does not have separate patterns for a safe
 ADD used by reload.

It does, the addptrM3 pattern, but it works only with LRA at the moment.

-- 
Eric Botcazou


Re: Fix libgomp crash without TLS (PR42616)

2014-12-09 Thread Richard Henderson
On 12/09/2014 06:53 AM, Varvara Rainchik wrote:
 2014-12-09  Varvara Rainchik  varvara.rainc...@intel.com
 
 * config.h.in: Regenerate.
 * configure: Regenerate.
 * configure.ac: Add GCC_CHECK_EMUTLS.
 * libgomp.h: Add check for USE_EMUTLS: this case
 is equal to HAVE_TLS.
 * team.c: Likewise.

Ok.


r~


Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Evgeny Stupachenko
I mean that there are a lot of people tracking spec2006 stability and
therefore the issue should be on track in future.
And that I can create the test case, but it would be as big as several
GCC functions.
Will work on reducing the test case.


On Tue, Dec 9, 2014 at 7:20 PM, Richard Henderson r...@redhat.com wrote:
 On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
 However patch is fixing spec2006 benchmark. Shouldn't that be enough
 for regression testing?


 No.  Spec is not free.


 r~


Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern

2014-12-09 Thread David Edelsohn
On Tue, Dec 9, 2014 at 11:24 AM, Eric Botcazou ebotca...@adacore.com wrote:
 It's unfortunate that GCC does not have separate patterns for a safe
 ADD used by reload.

 It does, the addptrM3 pattern, but it works only with LRA at the moment.

Hi, Eric

Segher and I previously discussed addptrM3, but I thought that it was
restricted in a different way.

We definitely will have to define addptrM3 and restore addM3 when
rs6000 switches to LRA.

Thanks, David


Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern

2014-12-09 Thread Segher Boessenkool
On Tue, Dec 09, 2014 at 11:48:44AM -0500, David Edelsohn wrote:
 We definitely will have to define addptrM3 and restore addM3 when
 rs6000 switches to LRA.

Can LRA deal with adding a clobber of a hard register?  How would
you express that in the patterns, anyway?


Segher


Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-09 Thread Dominique d'Humières
This was for x86_64-apple-darwin14. The patch also works for 
x86_64-apple-darwin10.

Dominique

 Le 6 déc. 2014 à 01:49, Dominique d'Humières domi...@lps.ens.fr a écrit :
 
 Bootstrap just finished with the patch.
 
 Thanks,
 
 Dominique



RE: [PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH

2014-12-09 Thread Wilco Dijkstra
 Marcus Shawcroft wrote:
  OK for commit?
 
  ChangeLog:
  2014-11-24  Wilco Dijkstra  wdijk...@arm.com
 
  * gcc/config/aarch64/aarch64-protos.h (tune-params):
  Add reasociation tuning parameters.
  * gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH):
  Define.  (aarch64_reassociation_width): New function.
  (generic_tunings) Add reassociation tuning parameters.
  (cortexa53_tunings): Likewise.
  (cortexa57_tunings): Likewise.
  (thunderx_tunings): Likewise.
 
 OK /Marcus

Jiong, can you commit this please?

---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c| 36 
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index a9985b5..ac3487b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -170,6 +170,9 @@ struct tune_params
   const struct cpu_vector_cost *const vec_costs;
   const int memmov_cost;
   const int issue_rate;
+  const int int_reassoc_width;
+  const int fp_reassoc_width;
+  const int vec_reassoc_width;
 };
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3832123..e543161 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -314,7 +314,10 @@ static const struct tune_params generic_tunings =
   generic_regmove_cost,
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1/* vec_reassoc_width.  */
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -324,7 +327,10 @@ static const struct tune_params cortexa53_tunings =
   cortexa53_regmove_cost,
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1/* vec_reassoc_width.  */
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -334,7 +340,10 @@ static const struct tune_params cortexa57_tunings =
   cortexa57_regmove_cost,
   cortexa57_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 3)
+  NAMED_PARAM (issue_rate, 3),
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1/* vec_reassoc_width.  */
 };
 
 static const struct tune_params thunderx_tunings =
@@ -344,7 +353,10 @@ static const struct tune_params thunderx_tunings =
   thunderx_regmove_cost,
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 6),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1/* vec_reassoc_width.  */
 };
 
 /* A processor implementing AArch64.  */
@@ -437,6 +449,19 @@ static const char * const aarch64_condition_codes[] =
   hi, ls, ge, lt, gt, le, al, nv
 };
 
+static int
+aarch64_reassociation_width (unsigned opc ATTRIBUTE_UNUSED,
+enum machine_mode mode)
+{
+  if (VECTOR_MODE_P (mode))
+return aarch64_tune_params-vec_reassoc_width;
+  if (INTEGRAL_MODE_P (mode))
+return aarch64_tune_params-int_reassoc_width;
+  if (FLOAT_MODE_P (mode))
+return aarch64_tune_params-fp_reassoc_width;
+  return 1;
+}
+
 /* Provide a mapping from gcc register numbers to dwarf register numbers.  */
 unsigned
 aarch64_dbx_register_number (unsigned regno)
@@ -10499,6 +10524,9 @@ aarch64_gen_ccmp_next (rtx prev, int cmp_code, rtx op0, 
rtx op1, int bit_code)
 #undef TARGET_PREFERRED_RELOAD_CLASS
 #define TARGET_PREFERRED_RELOAD_CLASS aarch64_preferred_reload_class
 
+#undef TARGET_SCHED_REASSOCIATION_WIDTH
+#define TARGET_SCHED_REASSOCIATION_WIDTH aarch64_reassociation_width
+
 #undef TARGET_SECONDARY_RELOAD
 #define TARGET_SECONDARY_RELOAD aarch64_secondary_reload
 
-- 
1.9.1



Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern

2014-12-09 Thread Segher Boessenkool
On Mon, Dec 08, 2014 at 09:01:39AM -0600, Segher Boessenkool wrote:
  Why are you removing the alternative instead of clobbering XER[CA]?
 
 I should have mentioned, sorry.
 
 We don't want to clobber CA on *every* add, and reload can generate more
 adds out of thin air.  And CA is a fixed register.
 
 There may be a better way to do this, but I don't know it.

Some numbers...

I checked how often the compiler used to allocate GPR0 here.  I looked
at cc1 of 4.7.2, 12MB text size.  It happened zero times (it already was
?r to discourage the RA / reload from using that alternative).

There are 143962 addi insns in there.


Segher


Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Sebastian Pop
Richard Biener wrote:
 On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey sell...@mips.com wrote:
  expected?  Should this test also check flag_thread_jumps?  Or should
  that be getting checked somewhere else?
 
 -fthread-jumps is an RTL optimization flag and ignored on GIMPLE.

Does it make sense to add a -f[no-]tree-thread-jumps to enable/disable the tree
jump threading?  I could also add -f[no-]tree-fsm-thread-jumps.  Opinions?

On the llvm test-suite, I have seen one ICE with my fsm jump-thread patch.
This patch fixes the problem:

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 12f83ba..f8c736e 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2564,6 +2564,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
 {
   if (!loop-header
+|| !loop_latch_edge (loop)
   || !bitmap_bit_p (threaded_blocks, loop-header-index))
   continue;
 
  retval |= thread_through_loop_header (loop, may_peel_loop_headers);

Ok to commit after regstrap?

Thanks,
Sebastian


[PATCH, libcpp]: Check the result of vasprintf

2014-12-09 Thread Uros Bizjak
Hello!

The compilation with gentoo glibc 2.20 emits following warning:

../../../gcc-svn/trunk/libcpp/directives.c:2411:28: warning: ignoring
return value of ‘int vasprintf(char**, const char*, __gnuc_va_list)’,
declared with attribute warn_unused_result [-Wunused-result]

The manpage says:

If memory allocation wasn't possible, or some other error occurs,
these functions will return -1, and the contents of strp is undefined.

Attached patch checks the return value and sets ptr to NULL in this case.

2014-12-09  Uros Bizjak  ubiz...@gmail.com

* directives.c (cpp_define_formatted): Check return value of
vasprintf and in case of error set ptr to NULL.

Bootstrapped on x86_64-linux-gnu.

OK for mainline?

Uros.

Index: directives.c
===
--- directives.c(revision 218525)
+++ directives.c(working copy)
@@ -2408,7 +2408,8 @@ cpp_define_formatted (cpp_reader *pfile, const cha

   va_list ap;
   va_start (ap, fmt);
-  vasprintf (ptr, fmt, ap);
+  if (vasprintf (ptr, fmt, ap)  0)
+ptr = NULL;
   va_end (ap);

   cpp_define (pfile, ptr);


[PATCH] TYPE_OVERFLOW_* cleanup

2014-12-09 Thread Marek Polacek
The issue here is that TYPE_OVERFLOW_TRAPS, TYPE_OVERFLOW_UNDEFINED,
and TYPE_OVERFLOW_WRAPS macros work on integral types only, yet we
pass e.g. pointer_type/real_type to them.  This patch adds proper
checking for these macros and adds missing guards to various places.
This looks pretty straightforward, but I had to tweak a few places
wrt vector_types (where I've used VECTOR_INTEGER_TYPE_P) to not to
regress folding - and I'm afraid I missed places that aren't tested
in our testsuite :/.

Bootstrapped/regtested on ppc64-linux and x86_64-linux.

2014-12-09  Marek Polacek  pola...@redhat.com

* fold-const.c (negate_expr_p): Check for INTEGRAL_TYPE_P.
(fold_negate_expr): Likewise.
(extract_muldiv_1): Likewise.
(maybe_canonicalize_comparison_1): Likewise.
(fold_comparison): Likewise.
(fold_binary_loc): Likewise.
(tree_binary_nonnegative_warnv_p): Likewise.
(tree_binary_nonzero_warnv_p): Likewise.
* gimple-ssa-strength-reduction.c (legal_cast_p_1): Likewise.
* tree-scalar-evolution.c (simple_iv): Likewise.
(scev_const_prop): Likewise.
* tree-ssa-loop-niter.c (expand_simple_operations): Likewise.
* match.pd (X % -C): Likewise.
(-A - 1 - ~A): Likewise.
(~A + A - -1): Check for INTEGRAL_TYPE_P or VECTOR_INTEGER_TYPE_P.
* tree-vect-generic.c (expand_vector_operation): Likewise.
* tree.h (INTEGRAL_TYPE_CHECK): Define.
(TYPE_OVERFLOW_WRAPS, TYPE_OVERFLOW_UNDEFINED, TYPE_OVERFLOW_TRAPS):
Add INTEGRAL_TYPE_CHECK.

diff --git gcc/fold-const.c gcc/fold-const.c
index 0c4fe40..ff9d917 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -426,7 +426,8 @@ negate_expr_p (tree t)
 
 case VECTOR_CST:
   {
-   if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))
+   if (FLOAT_TYPE_P (TREE_TYPE (type))
+   || (INTEGRAL_TYPE_P (type)  TYPE_OVERFLOW_WRAPS (type)))
  return true;
 
int count = TYPE_VECTOR_SUBPARTS (type), i;
@@ -558,7 +559,8 @@ fold_negate_expr (location_t loc, tree t)
 case INTEGER_CST:
   tem = fold_negate_const (t, type);
   if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
- || (!TYPE_OVERFLOW_TRAPS (type)
+ || (INTEGRAL_TYPE_P (type)
+  !TYPE_OVERFLOW_TRAPS (type)
   TYPE_OVERFLOW_WRAPS (type))
  || (flag_sanitize  SANITIZE_SI_OVERFLOW) == 0)
return tem;
@@ -5951,7 +5953,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
   || EXPRESSION_CLASS_P (op0))
  /* ... and has wrapping overflow, and its type is smaller
 than ctype, then we cannot pass through as widening.  */
-  ((TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0))
+  (((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0)))
(TYPE_PRECISION (ctype)
TYPE_PRECISION (TREE_TYPE (op0
  /* ... or this is a truncation (t is narrower than op0),
@@ -5966,7 +5969,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
  /* ... or has undefined overflow while the converted to
 type has not, we cannot do the operation in the inner type
 as that would introduce undefined overflow.  */
- || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))
+ || ((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+   TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0)))
   !TYPE_OVERFLOW_UNDEFINED (type
break;
 
@@ -6159,6 +6163,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
   if ((code == TRUNC_MOD_EXPR || code == CEIL_MOD_EXPR
   || code == FLOOR_MOD_EXPR || code == ROUND_MOD_EXPR)
  /* If the multiplication can overflow we cannot optimize this.  */
+  INTEGRAL_TYPE_P (TREE_TYPE (t))
   TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (t))
   TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST
   wi::multiple_of_p (op1, c, TYPE_SIGN (type)))
@@ -6211,7 +6216,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
 
 If we have an unsigned type, we cannot do this since it will change
 the result if the original computation overflowed.  */
-  if (TYPE_OVERFLOW_UNDEFINED (ctype)
+  if (INTEGRAL_TYPE_P (ctype)
+  TYPE_OVERFLOW_UNDEFINED (ctype)
   ((code == MULT_EXPR  tcode == EXACT_DIV_EXPR)
  || (tcode == MULT_EXPR
   code != TRUNC_MOD_EXPR  code != CEIL_MOD_EXPR
@@ -8497,7 +8503,8 @@ maybe_canonicalize_comparison_1 (location_t loc, enum 
tree_code code, tree type,
 
   /* Match A +- CST code arg1 and CST code arg1.  We can change the
  first form only if overflow is undefined.  */
-  if (!((TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
+  if (!(((INTEGRAL_TYPE_P (TREE_TYPE (arg0))
+  TYPE_OVERFLOW_UNDEFINED (TREE_TYPE 

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Jeff Law

On 12/09/14 10:38, Sebastian Pop wrote:

Richard Biener wrote:

On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey sell...@mips.com wrote:

expected?  Should this test also check flag_thread_jumps?  Or should
that be getting checked somewhere else?


-fthread-jumps is an RTL optimization flag and ignored on GIMPLE.


Does it make sense to add a -f[no-]tree-thread-jumps to enable/disable the tree
jump threading?  I could also add -f[no-]tree-fsm-thread-jumps.  Opinions?
Our option handling is a bit of a mess if we look at it from the user 
standpoint.  Given that the vast majority of jump threading happens on 
gimple, ISTM that -f[no-]thread-jumps ought to be controlling the gimple 
implementation.  One could easily argue that the user doesn't really 
care about where in the pipeline the optimization is implemented.


My vote would be to just make -fthread-jumps control both RTL and gimple 
jump threading.





On the llvm test-suite, I have seen one ICE with my fsm jump-thread patch.
This patch fixes the problem:

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 12f83ba..f8c736e 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2564,6 +2564,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
  {
if (!loop-header
+|| !loop_latch_edge (loop)
|| !bitmap_bit_p (threaded_blocks, loop-header-index))
continue;

   retval |= thread_through_loop_header (loop, may_peel_loop_headers);

Ok to commit after regstrap?
This seems to be indicating that we have with no edge from the latch 
block to the header block.  I'd like to know better how we got into that 
state.


Also, a test for the GCC testsuite would be good.  I have no idea what 
license covers the LLVM testsuite.  But given a good analysis of the 
problem we may be able to write a suitable test independent of the LLVM 
test.


jeff


Re: [Android] Stop unconditional disabling of ifuncs for Bionic #2

2014-12-09 Thread Jeff Law

On 12/09/14 05:39, Alexander Ivchenko wrote:

Hi,

Whether the *linux* target supports ifuncs or not is defined here:

linux_has_ifunc_p (void)
{
   return OPTION_BIONIC ? false : HAVE_GNU_INDIRECT_FUNCTION;
}

Bionic right now supports indirect functions, but there is no way to
notify the compiler about that (For Android OPTION_BIONIC is always on
and so configure time check with --enable-gnu-indirect-function does
not work)

The following patch makes the use of the default version of
TARGET_HAS_IFUNC_P for *linux*, so we can decide whether the Android
target supports indirect functions or not on configure time.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f22bba8..d4d09d0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-08  Alexander Ivchenko  alexander.ivche...@intel.com
+
+   * config/linux.c (linux_has_ifunc_p): Remove.
+   * config/linux.h (TARGET_HAS_IFUNC_P): Use default version.

OK.
jeff



[PATCH, committed] jit-playback.c: Move dlopen code into a new function

2014-12-09 Thread David Malcolm
Another simplification of playback::context::compile.

Committed to trunk as r218527.

Index: gcc/jit/ChangeLog
===
--- gcc/jit/ChangeLog	(revision 218526)
+++ gcc/jit/ChangeLog	(revision 218527)
@@ -1,3 +1,12 @@
+2014-12-09  David Malcolm  dmalc...@redhat.com
+
+	* jit-playback.c (gcc::jit::playback::context::compile): Move the
+	dlopen code into...
+	(gcc::jit::playback::context::dlopen_built_dso): ...this new
+	function.
+	* jit-playback.h (gcc::jit::playback::context::dlopen_built_dso):
+	New function.
+
 2014-12-08  David Malcolm  dmalc...@redhat.com
 
 	* libgccjit++.h: Indent the forward declarations of the classes to
Index: gcc/jit/jit-playback.c
===
--- gcc/jit/jit-playback.c	(revision 218526)
+++ gcc/jit/jit-playback.c	(revision 218527)
@@ -1586,7 +1586,6 @@
 playback::context::
 compile ()
 {
-  void *handle = NULL;
   const char *ctxt_progname;
   result *result_obj = NULL;
 
@@ -1648,25 +1647,8 @@
   if (errors_occurred ())
 return NULL;
 
-  /* dlopen the .so file. */
-  {
-auto_timevar load_timevar (TV_LOAD);
+  result_obj = dlopen_built_dso ();
 
-const char *error;
-
-/* Clear any existing error.  */
-dlerror ();
-
-handle = dlopen (m_path_so_file, RTLD_NOW | RTLD_LOCAL);
-if ((error = dlerror()) != NULL)  {
-  add_error (NULL, %s, error);
-}
-if (handle)
-  result_obj = new result (handle);
-else
-  result_obj = NULL;
-  }
-
   return result_obj;
 }
 
@@ -1916,6 +1898,34 @@
 }
 }
 
+/* Dynamically-link the built DSO file into this process, using dlopen.
+   Wrap it up within a jit::result *, and return that.
+   Return NULL if any errors occur, reporting them on this context.  */
+
+result *
+playback::context::
+dlopen_built_dso ()
+{
+  auto_timevar load_timevar (TV_LOAD);
+  void *handle = NULL;
+  const char *error = NULL;
+  result *result_obj = NULL;
+
+  /* Clear any existing error.  */
+  dlerror ();
+
+  handle = dlopen (m_path_so_file, RTLD_NOW | RTLD_LOCAL);
+  if ((error = dlerror()) != NULL)  {
+add_error (NULL, %s, error);
+  }
+  if (handle)
+result_obj = new result (handle);
+  else
+result_obj = NULL;
+
+  return result_obj;
+}
+
 /* Top-level hook for playing back a recording context.
 
This plays back m_recording_ctxt, and, if no errors
Index: gcc/jit/jit-playback.h
===
--- gcc/jit/jit-playback.h	(revision 218526)
+++ gcc/jit/jit-playback.h	(revision 218527)
@@ -250,6 +250,9 @@
   void
   convert_to_dso (const char *ctxt_progname);
 
+  result *
+  dlopen_built_dso ();
+
 private:
   ::gcc::jit::recording::context *m_recording_ctxt;
 


Re: [PATCH] Fix PR 61225

2014-12-09 Thread Jeff Law

On 12/09/14 02:49, Zhenqiang Chen wrote:

Do you need to verify SETA and SETB satisfy single_set?  Or has that
already been done elsewhere?


A is NEXT_INSN (insn)
B is prev_nonnote_nondebug_insn (insn),

For I1 - I2 - B; I2 - A;
LOG_LINK can make sure I1 and I2 are single_set, but not A and B. And I did
found codes in function try_combine, which can make sure B (or I3) is
single_set.

So I think the check can skip failed cases at early stage.

Thanks for doing the research on this.



The check is to make sure the correctness.  Here is a case,

int
f1 (int *x)
{
   int t = --*x;
   if (!t)
 foo (x);
   return t;
}

   _4 = *x_3(D);
   _5 = _4 + -1;
   *x_3(D) = _5;
   # DEBUG t = _5
   if (_5 == 0)
...
   bb 4:
   return _5;

_5 is used in return _5. So we can not remove _5 = _4 + -1.
Right, but ISTM that if the # uses  2, then we could just return false 
rather than bothering to see if all the uses are consumed by A or B. 
It's not a big deal, I just have a hard time seeing that doing something 
more complex than if (# uses  2) return false; makes sense.






So you've got two new combine cases here, but I think the testcase only
tests one of them.  Can you include a testcase for both of hte major
paths above (I1-I2-I3; I2-insn and I2-I3; I2-INSN)


pr61225.c is the case to cover I1-I2-I3; I2-insn.

For I2 - I3; I2 - insn, I tried my test cases and found peephole2 can also
handle them. So I removed the code from the patch.

Seems like the reasonable thing to do.



Here is the final patch.
Bootstrap and no make check regression on X86-64.

ChangeLog:
2014-11-09  Zhenqiang Chen  zhenqiang.c...@linaro.org

Part of PR rtl-optimization/61225
* combine.c (can_reuse_cc_set_p): New function.
(combine_instructions): Handle I1 - I2 - I3; I2 - insn.
(try_combine): Add one more parameter TO_COMBINED_INSN, which
is used to create a new insn parallel (TO_COMBINED_INSN, I3).

testsuite/ChangeLog:
2014-11-09  Zhenqiang Chenzhenqiang.c...@linaro.org

* gcc.target/i386/pr61225.c: New test.



OK for the trunk.

jeff


[PATCH, fixincludes]: Check return value of getcwd

2014-12-09 Thread Uros Bizjak
Hello!

The compilation with gentoo glibc 2.20 emits following warning:

../../../gcc-svn/trunk/fixincludes/server.c:195:10: warning: ignoring
return value of ‘getcwd’, declared with attribute warn_unused_result
[-Wunused-result]

The manpage says:

On failure, these functions return NULL, and errno is set to indicate
the error.  The contents of the array pointed to by buf are undefined
on error.

Attached patch checks the return value and sets buff[0] to 0 in this case.

2014-12-09  Uros Bizjak  ubiz...@gmail.com

* server.c (server_setup): Check return value of
getcwd and in case of error set buff[0] to 0.

Bootstrapped on x86_64-linux-gnu.

OK for mainline?

Index: server.c
===
--- server.c(revision 218525)
+++ server.c(working copy)
@@ -192,7 +192,8 @@ server_setup (void)

   fputs (trap : 1\n, server_pair.pf_write);
   fflush (server_pair.pf_write);
-  getcwd (buff, MAXPATHLEN + 1);
+  if (getcwd (buff, MAXPATHLEN + 1) == NULL)
+buff[0] = 0;
   p_cur_dir = xstrdup (buff);
 }


[PATCH, committed] Guard less code with the JIT mutex

2014-12-09 Thread David Malcolm
Move acquisition/release of the JIT mutex from
  jit-recording.c:gcc::jit::recording::context::compile
into
  jit-playback.c: gcc::jit::playback::context::compile
and reduce the amount of code guarded by the mutex somewhat.

For acquisition, we didn't need the mutex when building the tempdir and
building toplev's argv, so now acquire the mutex immediately before using
toplev.

For release, we didn't need the mutex when cleaning up the tempdir and
the other cleanup of the playback context, so release the mutex on each exit
path from playback::context::compile.

Ideally we'd release it immediately after toplev.finalize (), but there
are various places after that point that use timevars (which is global
state) so we have to hold the mutex until we're done with them.

Committed to trunk as r218528.

gcc/jit/ChangeLog:
* jit-playback.c (gcc::jit::playback::context::compile): Acquire the
mutex here, immediately before using toplev, and release it here, on
each exit path after acquisition.
(jit_mutex): Move this variable here, from jit-recording.c.
(gcc::jit::playback::context::acquire_mutex): New function, based on
code in jit-recording.c.
(gcc::jit::playback::context::release_mutex): Likewise.
* jit-playback.h (gcc::jit::playback::context::acquire_mutex): New
function.
(gcc::jit::playback::context::release_mutex): New function.
* jit-recording.c (jit_mutex): Move this variable to jit-playback.c.
(gcc::jit::recording::context::compile): Move mutex-handling from
here into jit-playback.c's gcc::jit::playback::context::compile.
* notes.txt: Update to show the new locations of ACQUIRE_MUTEX
and RELEASE_MUTEX.

Index: gcc/jit/jit-recording.c
===
--- gcc/jit/jit-recording.c	(revision 218527)
+++ gcc/jit/jit-recording.c	(revision 218528)
@@ -888,12 +888,6 @@
   m_requested_dumps.safe_push (d);
 }
 
-
-/* This mutex guards gcc::jit::recording::context::compile, so that only
-   one thread can be accessing the bulk of GCC's state at once.  */
-
-static pthread_mutex_t jit_mutex = PTHREAD_MUTEX_INITIALIZER;
-
 /* Validate this context, and if it passes, compile it within a
mutex.
 
@@ -908,20 +902,12 @@
   if (errors_occurred ())
 return NULL;
 
-  /* Acquire the big GCC mutex. */
-  pthread_mutex_lock (jit_mutex);
-  gcc_assert (NULL == ::gcc::jit::active_playback_ctxt);
-
   /* Set up a playback context.  */
   ::gcc::jit::playback::context replayer (this);
-  ::gcc::jit::active_playback_ctxt = replayer;
 
+  /* Use it.  */
   result *result_obj = replayer.compile ();
 
-  /* Release the big GCC mutex. */
-  ::gcc::jit::active_playback_ctxt = NULL;
-  pthread_mutex_unlock (jit_mutex);
-
   return result_obj;
 }
 
Index: gcc/jit/ChangeLog
===
--- gcc/jit/ChangeLog	(revision 218527)
+++ gcc/jit/ChangeLog	(revision 218528)
@@ -1,5 +1,23 @@
 2014-12-09  David Malcolm  dmalc...@redhat.com
 
+	* jit-playback.c (gcc::jit::playback::context::compile): Acquire the
+	mutex here, immediately before using toplev, and release it here, on
+	each exit path after acquisition.
+	(jit_mutex): Move this variable here, from jit-recording.c.
+	(gcc::jit::playback::context::acquire_mutex): New function, based on
+	code in jit-recording.c.
+	(gcc::jit::playback::context::release_mutex): Likewise.
+	* jit-playback.h (gcc::jit::playback::context::acquire_mutex): New
+	function.
+	(gcc::jit::playback::context::release_mutex): New function.
+	* jit-recording.c (jit_mutex): Move this variable to jit-playback.c.
+	(gcc::jit::recording::context::compile): Move mutex-handling from
+	here into jit-playback.c's gcc::jit::playback::context::compile.
+	* notes.txt: Update to show the new locations of ACQUIRE_MUTEX
+	and RELEASE_MUTEX.
+
+2014-12-09  David Malcolm  dmalc...@redhat.com
+
 	* jit-playback.c (gcc::jit::playback::context::compile): Move the
 	dlopen code into...
 	(gcc::jit::playback::context::dlopen_built_dso): ...this new
Index: gcc/jit/jit-playback.c
===
--- gcc/jit/jit-playback.c	(revision 218527)
+++ gcc/jit/jit-playback.c	(revision 218528)
@@ -1622,6 +1622,9 @@
   if (errors_occurred ())
 return NULL;
 
+  /* Acquire the JIT mutex and set this as the active playback ctxt.  */
+  acquire_mutex ();
+
   /* This runs the compiler.  */
   toplev toplev (false);
   toplev.main (fake_args.length (),
@@ -1635,10 +1638,14 @@
   /* Clean up the compiler.  */
   toplev.finalize ();
 
-  active_playback_ctxt = NULL;
+  /* Ideally we would release the jit mutex here, but we can't yet since
+ followup activities use timevars, which are global state.  */
 
   if (errors_occurred ())
-return NULL;
+{
+  release_mutex ();
+  return NULL;
+}
 
   if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE))
   

Re: [PATCH 07/10] rs6000: The big carry insns addition

2014-12-09 Thread David Edelsohn
On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool
seg...@kernel.crashing.org wrote:
 Now that everything is in place for letting GCC use the carry-using and
 carry-producing machine insns as separate RTL insns, switch over all
 remaining patterns that clobber CA without telling the compiler.  This
 are the multiple-precision add/sub/neg patterns, and the various eq/ne/
 ltu/gtu/leu/geu patterns.

 The eq/ne patterns are special.  The optimal machine code for those isn't
 regular (like e.g. the ltu patterns are), and we want to implement a plain
 eq as cntlz[wd];sr[wd]i; but that means that if we split those patterns
 at expand time, combine will happily put them back together again.  So
 expand them as eq/ne, and split later.


 2014-12-08  Segher Boessenkool  seg...@kernel.crashing.org

 gcc/
 PR target/64180
 * config/rs6000/predicates.md (unsigned_comparison_operator): New.
 (signed_comparison_operator): New.
 * config/rs6000/rs6000-protos.h (rs6000_emit_eqne): Declare.
 * config/rs6000/rs6000.c (rs6000_emit_eqne): New function.
 (rs6000_emit_sCOND): Remove ISEL test (move it to the expander).
 * config/rs6000/rs6000.md (addmode3 for SDI): Expand DImode
 add to addc,adde directly, if !TARGET_POWERPC64.
 (submode3 for SDI): Expand DImode sub to subfc,subfe directly,
 if !TARGET_POWERPC64.
 (negmode2): Delete expander.
 (*negmode2): Rename to negmode2.
 (addti3, subti3): Delete.
 (addti3, subti3): New expanders.
 (*adddi3_noppc64, *subdi3_noppc64, *negdi2_noppc64): Delete.
 (cstoremode4_unsigned): New expander.
 (cstoremode4): Allow GPR as output (not just SI).  Rewrite.
 (cstoremode4 for FP): Remove superfluous quotes.
 (*eqmode, *eqmode_compare, *plus_eqsi and splitter,
 *compare_plus_eqsi and splitter, *plus_eqsi_compare and splitter,
 *neg_eq0mode, *neg_eqmode, *ne0_mode, plus_ne0_mode,
 compare_plus_ne0_mode and splitter, *compare_plus_ne0_mode_1 and
 splitter, *plus_ne0_mode_compare and splitter, *leumode,
 *leumode_compare and splitter, *plus_leumode, *neg_leumode,
 *and_neg_leumode, *ltumode, *ltumode_compare, *plus_ltumode,
 *plus_ltumode_1, *plus_ltumodecompare, *neg_ltumode, *geumode,
 *geumode_compare and splitter, *plus_geumode, *neg_geumode,
 *and_neg_geumode, *plus_gt0mode, *gtumode, *gtumode_compare,
 *plus_gtumode, *plus_gtumode_1, *plus_gtumode_compare,
 *neg_gtumode, 12 anonymous insns, and 12 anonymous splitters):
 Delete.
 (eqmode3, nemode3): New.
 (*neg_eq_mode, *neg_ne_mode): New.
 (*plus_eq_mode, *plus_ne_mode): New.
 (*minus_eq_mode, *minus_ne_mode): New.

Okay.

Thanks, David


Re: [PATCH 08/10] rs6000: compare-two for rld*;rld*.

2014-12-09 Thread David Edelsohn
On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool
seg...@kernel.crashing.org wrote:
 These now are the only remaining patterns that have type compare.
 And they shouldn't: two is a better type for them.


 2014-12-08  Segher Boessenkool  seg...@kernel.crashing.org

 gcc/
 * config/rs6000/rs6000.md (*anddi3_2rld_dot, *anddi3_rld_dot2):
 Change type from compare to two.

Okay.

Thanks, David


Re: [PATCH 09/10] rs6000: Remove the (now unused) insn type compare

2014-12-09 Thread David Edelsohn
On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool
seg...@kernel.crashing.org wrote:

 2014-12-08  Segher Boessenkool  seg...@kernel.crashing.org

 gcc/
 * config/rs6000/40x.md (ppc403-compare): Remove compare.
 config/rs6000/440.md (ppc440-compare): Remove compare.
 config/rs6000/476.md (ppc476-compare): Remove compare.
 config/rs6000/601.md (ppc601-compare): Remove compare.
 config/rs6000/603.md (ppc603-compare): Remove compare.
 config/rs6000/6xx.md (ppc604-compare): Remove compare.
 config/rs6000/7450.md (ppc7450-compare): Remove compare.
 config/rs6000/7xx.md (ppc750-compare): Remove compare.
 config/rs6000/8540.md (ppc8540_su): Remove compare.
 config/rs6000/cell.md (cell-fast-cmp, cell-cmp-microcoded): Remove
 compare.
 config/rs6000/e300c2c3.md (ppce300c3_cmp): Remove compare.
 config/rs6000/e500mc.md (e500mc_su): Remove compare.
 config/rs6000/e500mc64.md (e500mc64_su2): Remove compare.
 config/rs6000/e5500.md (e5500_sfx2): Remove compare.
 config/rs6000/e6500.md (e6500_sfx2): Remove compare.
 config/rs6000/mpc.md (mpccore-compare): Remove compare.
 config/rs6000/power4.md (power4-compare): Remove compare.
 config/rs6000/power5.md (power5-compare): Remove compare.
 config/rs6000/power6.md (power6-compare): Remove compare.
 config/rs6000/power7.md (power7-compare): Remove compare.
 config/rs6000/power8.md (power8-compare): Remove compare.  Update
 comment.
 config/rs6000/rs6000.c (rs6000_adjust_cost) TYPE_COMPARE: Remove
 (three times).
 (is_cracked_insn): Remove TYPE_COMPARE case.
 (insn_must_be_first_in_group) TYPE_COMPARE: Remove (twice).
 config/rs6000/rs6000.md (type): Remove compare.
 (cell_micro): Remove compare.
 config/rs6000/rs64.md (rs64a-compare): Remove compare.

Okay.

Thanks, David


Re: [PATCH 10/10] rs6000: Get rid of the weird decimal thing in add

2014-12-09 Thread David Edelsohn
On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool
seg...@kernel.crashing.org wrote:
 Peter tells me it was an artifact of old versions of the DFP code.  The
 condition can never be false; delete it.


 2014-12-08  Segher Boessenkool  seg...@kernel.crashing.org

 gcc/
 * config/rs6000/rs6000.md (*addmode3): Remove condition.

Okay.

Thanks, David


Re: [PATCH] Fix PR 61225

2014-12-09 Thread Segher Boessenkool
On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote:
  Do you need to verify SETA and SETB satisfy single_set?  Or has that
  already been done elsewhere?
 
 A is NEXT_INSN (insn)
 B is prev_nonnote_nondebug_insn (insn),
 
 For I1 - I2 - B; I2 - A;
 LOG_LINK can make sure I1 and I2 are single_set,

It cannot, not anymore anyway.  LOG_LINKs can point to an insn with multiple
SETs; multiple LOG_LINKs can point to such an insn.

The only thing a LOG_LINK from Y to X tells you is that Y is the earliest
insn after X that uses some register set by X (and it knows which register,
too, nowadays).

   +  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb)))  2)
   +{
   +  df_ref use;
   +  rtx insn;
   +  unsigned int i = REGNO (SET_SRC (setb));
   +
   +  for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG
 (use))
   +{
   +   insn = DF_REF_INSN (use);
   +   if (insn != a  insn != b  !(NOTE_P (insn) || DEBUG_INSN_P
  (insn)))
   + return false;
   + }
   +}
   +
   +  return true;
   +}
  Is this fragment really needed?  Does it ever trigger?  I'd think that
  for  2 uses punting would be fine.  Do we really commonly have cases
  with  2 uses, but where they're all in SETA and SETB?

Can't you just check for a death note on the second insn?  Together with
reg_used_between_p?

   +   /* Try to combine a compare insn that sets CC
   +  with a preceding insn that can set CC, and maybe with its
   +  logical predecessor as well.
   +  We need this special code because data flow connections
   +  do not get entered in LOG_LINKS.  */

I think you mean not _all_ data flow connections?

   +   if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
   +refer_same_reg_p (insn, prev)
   +max_combine = 4)
   + {
   + struct insn_link *next1;
   + FOR_EACH_LOG_LINK (next1, prev)
   +   {
   + rtx_insn *link1 = next1-insn;
   + if (NOTE_P (link1))
   +   continue;
   + /* I1 - I2 - I3; I2 - insn;
   +output parallel (insn, I3).  */
   + FOR_EACH_LOG_LINK (nextlinks, link1)
   +   if ((next = try_combine (prev, link1,
   +nextlinks-insn, NULL,
   +new_direct_jump_p,
   +last_combined_insn, insn)) !=
 0)
   +
   + {
   +   delete_insn (insn);
   +   insn = next;
   +   statistics_counter_event (cfun, four-insn
 combine,
  1);
   +   goto retry;
   + }
   + /* I2 - I3; I2 - insn
   +output next = parallel (insn, I3).  */
   + if ((next = try_combine (prev, link1,
   +  NULL, NULL,
   +  new_direct_jump_p,
   +  last_combined_insn, insn)) !=
 0)
   +
   +   {
   + delete_insn (insn);
   + insn = next;
   + statistics_counter_event (cfun, three-insn
 combine,
  1);
   + goto retry;
   +   }
   +   }
   + }
  So you've got two new combine cases here, but I think the testcase only
  tests one of them.  Can you include a testcase for both of hte major
  paths above (I1-I2-I3; I2-insn and I2-I3; I2-INSN)
 
 pr61225.c is the case to cover I1-I2-I3; I2-insn.
 
 For I2 - I3; I2 - insn, I tried my test cases and found peephole2 can also
 handle them. So I removed the code from the patch.

Why?  The simpler case has much better chances of being used.

In fact, there are many more cases you could handle:

You handle

I1 - I2 - I3; I2 - insn
  I2 - I3; I2 - insn

but there are also

   I1,I2 - I3; I2 - insn

and the many 4-insn combos, too.
But that's not all: instead of just dealing with I2-insn, you can just as
well have I1-insn or I0-insn, and if you could handle the SET not dying
in the resulting insn, I3-insn.

In fact, in that case you really only need to handle I3-insn (no other
instructions involved), as a simple 2-insn combination that combines into
the earlier insn instead of the later, to get the effect you want.

Just like your patch, that would pull insn earlier, but it would do it
much more explicitly.


Some comments on the patch...

 +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
 +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
 +   except A and B.  */

That sound like the only correct inputs are such a compare etc., but the
routine tests whether that is true.

 +static bool
 +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
 +{
 +  rtx seta = single_set (a);
 +  rtx setb = single_set (b);
 +
 +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)

Neither the comment nor the function name mention this.  This test is
better placed in the 

Re: [PATCH] Fix PR 61225

2014-12-09 Thread Jeff Law

On 12/09/14 12:07, Segher Boessenkool wrote:

On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote:

Do you need to verify SETA and SETB satisfy single_set?  Or has that
already been done elsewhere?


A is NEXT_INSN (insn)
B is prev_nonnote_nondebug_insn (insn),

For I1 - I2 - B; I2 - A;
LOG_LINK can make sure I1 and I2 are single_set,


It cannot, not anymore anyway.  LOG_LINKs can point to an insn with multiple
SETs; multiple LOG_LINKs can point to such an insn.

So let's go ahead and put a single_set test in this function.


Is this fragment really needed?  Does it ever trigger?  I'd think that

for  2 uses punting would be fine.  Do we really commonly have cases
with  2 uses, but where they're all in SETA and SETB?


Can't you just check for a death note on the second insn?  Together with
reg_used_between_p?
Yea, that'd accomplish the same thing I think Zhenqiang is trying to 
catch and is simpler than walking the lists.





+ /* Try to combine a compare insn that sets CC
+with a preceding insn that can set CC, and maybe with its
+logical predecessor as well.
+We need this special code because data flow connections
+do not get entered in LOG_LINKS.  */


I think you mean not _all_ data flow connections?
I almost said something about this comment, but figured I was nitpicking 
too much :-)



So you've got two new combine cases here, but I think the testcase only
tests one of them.  Can you include a testcase for both of hte major
paths above (I1-I2-I3; I2-insn and I2-I3; I2-INSN)


pr61225.c is the case to cover I1-I2-I3; I2-insn.

For I2 - I3; I2 - insn, I tried my test cases and found peephole2 can also
handle them. So I removed the code from the patch.


Why?  The simpler case has much better chances of being used.
The question does it actually catch anything not already handled?  I 
guess you could argue that doing it in combine is better than peep2 and 
I'd agree with that.




In fact, there are many more cases you could handle:

You handle

I1 - I2 - I3; I2 - insn
   I2 - I3; I2 - insn

but there are also

I1,I2 - I3; I2 - insn

and the many 4-insn combos, too.
Yes, but I wonder how much of this is really necessary in practice.  We 
could do exhaustive testing here, but I suspect the payoff isn't all 
that great.  Thus I'm comfortable with faulting in the cases we actually 
find are useful in practice.





+/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
+   It returns TRUE, if reg1 == reg2, and no other refer of reg1
+   except A and B.  */


That sound like the only correct inputs are such a compare etc., but the
routine tests whether that is true.
Correct, the RTL has to have a specific form and that is tested for. 
Comment updates can't hurt.






+static bool
+can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
+{
+  rtx seta = single_set (a);
+  rtx setb = single_set (b);
+
+  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)


Neither the comment nor the function name mention this.  This test is
better placed in the caller of this function, anyway.
Didn't consider it terribly important.  Moving it to the caller doesn't 
change anything significantly, though I would agree it's martinally cleaner.





@@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
*i1, rtx_insn *i0,
  rtx old = newpat;
  total_sets = 1 + extra_sets;
  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
- XVECEXP (newpat, 0, 0) = old;
+
+ if (to_combined_insn)
+   XVECEXP (newpat, 0, --total_sets) = old;
+ else
+   XVECEXP (newpat, 0, 0) = old;
}


Is this correct?  If so, it needs a big fat comment, because it is
not exactly obvious :-)

Also, it doesn't handle at all the case where the new pattern already is
a PARALLEL; can that never happen?

I'd convinced myself it was.  But yes, a comment here would be good.

Presumably you're thinking about a PARALLEL that satisfies single_set_p?

jeff



Re: Running cc1 etc under valgrind (was Re: [PATCH 13/21] PR jit/63854: Add support for running make check-jit under valgrind)

2014-12-09 Thread David Malcolm
On Wed, 2014-11-19 at 13:38 -0500, David Malcolm wrote:
 On Wed, 2014-11-19 at 09:57 -0700, Jeff Law wrote:
  On 11/19/14 03:46, David Malcolm wrote:
   This commit updates jit.exp so that if RUN_UNDER_VALGRIND is present
   in the environment, all of the built client code using libgccjit.so is
   run under valgrind, with --leak-check=full.
  
   Hence:
  RUN_UNDER_VALGRIND= make check-jit
   will run all jit testcases under valgrind (taking 27 mins on my
   machine).
  
   Results are written to testsuite/jit/test-FOO.exe.valgrind.txt
  
   jit.exp automatically parses these result file, looking for lines of
   the form
  definitely lost: 11,316 bytes in 235 blocks
  indirectly lost: 352 bytes in 4 blocks
   in the valgrind log's summary footer, adding PASSes if they are zero
   bytes, and, for now generating XFAILs for non-zero bytes.
  
   Sadly this diverges jit.exp's fixed_host_execute further from DejaGnu's
   host_execute, but I don't see a clean way to fix that.
  
   This currently adds 63 PASSes and 49 XFAILs to jit.sum, giving:
  # of expected passes   2481
  # of expected failures 49
  
   gcc/testsuite/ChangeLog:
 PR jit/63854
 * jit.dg/jit.exp (report_leak): New.
 (parse_valgrind_logfile): New.
 (fixed_host_execute): Detect if RUN_UNDER_VALGRIND is present
 in the environment, and if so, run the executable under
 valgrind, capturing valgrind's output to a logfile.  Parse the
 log file, generating PASSes and XFAILs for the summary of leaks.
  OK for the trunk.  FWIW, I'd love to see a mode where we can easily do 
  this for the other testsuites as well.
 
 Many of the cleanups in these patches are called from toplev::finalize,
 or something called from there, so that they're called by libgccjit.so,
 but not called by cc1/cc1plus etc.
 
 In general, cc1 etc don't need to bother free-ing everything, and can
 instead simply exit.
 
 But if you're running under valgrind, you'd probably want them to call
 toplev::finalize before exiting, to make the valgrind log shorter.
 
 So perhaps cc1 etc could detect if they're being run under valgrind, and
 call toplev::finalize in the appropriate place?
 
 Or maybe this could be a command-line option?
 
 [I think I prefer autodetection, fwiw]

It turns out that this isn't necessary (I think), since the pointers are
typically still reachable.




[PATCH] Fix IRA register preferencing

2014-12-09 Thread Wilco Dijkstra
This fixes a bug in register preferencing. When live range splitting creates a 
new register from
another, it copies most fields except for the register preferences. The 
preference GENERAL_REGS is
used as reg_pref[i].prefclass is initialized with GENERAL_REGS in 
allocate_reg_info () and
resize_reg_info ().

This initialization value is not innocuous like the comment suggests - if a new 
register has a
non-integer mode, it is forced to prefer GENERAL_REGS. This changes the 
register costs in pass 2 so
that they are incorrect. As a result the liverange is either spilled or 
allocated to an integer
register:

void g(double);
void f(double x) 
{ 
  if (x == 0) 
return; 
  g (x);
  g (x); 
}

f:
fcmpd0, #0.0
bne .L6
ret
.L6:
stp x19, x30, [sp, -16]!
fmovx19, d0
bl  g
fmovd0, x19
ldp x19, x30, [sp], 16
b   g

With the fix it uses a floating point register as expected. Given a similar 
issue in
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html, would it not be 
better to change the
initialization values of reg_pref to illegal register classes so this kind of 
issue can be trivially
found with an assert? Also would it not be a good idea to have a single 
register copy function that
ensures all data is copied?


ChangeLog: 2014-12-09  Wilco Dijkstra  wdijk...@arm.com

* gcc/ira-emit.c (ira_create_new_reg): Copy preference classes.

---
 gcc/ira-emit.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/ira-emit.c b/gcc/ira-emit.c
index d246b7f..d736836 100644
--- a/gcc/ira-emit.c
+++ b/gcc/ira-emit.c
@@ -348,6 +348,7 @@ rtx
 ira_create_new_reg (rtx original_reg)
 {
   rtx new_reg;
+  int original_regno = REGNO (original_reg);
 
   new_reg = gen_reg_rtx (GET_MODE (original_reg));
   ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (original_reg);
@@ -356,8 +357,16 @@ ira_create_new_reg (rtx original_reg)
   REG_ATTRS (new_reg) = REG_ATTRS (original_reg);
   if (internal_flag_ira_verbose  3  ira_dump_file != NULL)
 fprintf (ira_dump_file,   Creating newreg=%i from oldreg=%i\n,
-REGNO (new_reg), REGNO (original_reg));
+REGNO (new_reg), original_regno);
   ira_expand_reg_equiv ();
+
+  /* Copy the preference classes to new_reg.  */
+  resize_reg_info ();
+  setup_reg_classes (REGNO (new_reg),
+reg_preferred_class (original_regno),
+reg_alternate_class (original_regno),
+reg_allocno_class (original_regno));
+
   return new_reg;
 }
 
-- 
1.9.1





Re: [PATCH] TYPE_OVERFLOW_* cleanup

2014-12-09 Thread Marc Glisse

On Tue, 9 Dec 2014, Marek Polacek wrote:


The issue here is that TYPE_OVERFLOW_TRAPS, TYPE_OVERFLOW_UNDEFINED,
and TYPE_OVERFLOW_WRAPS macros work on integral types only, yet we
pass e.g. pointer_type/real_type to them.  This patch adds proper
checking for these macros and adds missing guards to various places.
This looks pretty straightforward, but I had to tweak a few places
wrt vector_types (where I've used VECTOR_INTEGER_TYPE_P) to not to
regress folding - and I'm afraid I missed places that aren't tested
in our testsuite :/.

Bootstrapped/regtested on ppc64-linux and x86_64-linux.

2014-12-09  Marek Polacek  pola...@redhat.com

* fold-const.c (negate_expr_p): Check for INTEGRAL_TYPE_P.
(fold_negate_expr): Likewise.
(extract_muldiv_1): Likewise.
(maybe_canonicalize_comparison_1): Likewise.
(fold_comparison): Likewise.
(fold_binary_loc): Likewise.
(tree_binary_nonnegative_warnv_p): Likewise.
(tree_binary_nonzero_warnv_p): Likewise.
* gimple-ssa-strength-reduction.c (legal_cast_p_1): Likewise.
* tree-scalar-evolution.c (simple_iv): Likewise.
(scev_const_prop): Likewise.
* tree-ssa-loop-niter.c (expand_simple_operations): Likewise.
* match.pd (X % -C): Likewise.
(-A - 1 - ~A): Likewise.
(~A + A - -1): Check for INTEGRAL_TYPE_P or VECTOR_INTEGER_TYPE_P.
* tree-vect-generic.c (expand_vector_operation): Likewise.
* tree.h (INTEGRAL_TYPE_CHECK): Define.
(TYPE_OVERFLOW_WRAPS, TYPE_OVERFLOW_UNDEFINED, TYPE_OVERFLOW_TRAPS):
Add INTEGRAL_TYPE_CHECK.

diff --git gcc/fold-const.c gcc/fold-const.c
index 0c4fe40..ff9d917 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -426,7 +426,8 @@ negate_expr_p (tree t)

case VECTOR_CST:
  {
-   if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))
+   if (FLOAT_TYPE_P (TREE_TYPE (type))
+   || (INTEGRAL_TYPE_P (type)  TYPE_OVERFLOW_WRAPS (type)))
  return true;


type is a vector type, so INTEGRAL_TYPE_P (type) is always false I think.



int count = TYPE_VECTOR_SUBPARTS (type), i;
@@ -558,7 +559,8 @@ fold_negate_expr (location_t loc, tree t)
case INTEGER_CST:
  tem = fold_negate_const (t, type);
  if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
- || (!TYPE_OVERFLOW_TRAPS (type)
+ || (INTEGRAL_TYPE_P (type)


Can that be false for an INTEGER_CST?


+  !TYPE_OVERFLOW_TRAPS (type)
   TYPE_OVERFLOW_WRAPS (type))
  || (flag_sanitize  SANITIZE_SI_OVERFLOW) == 0)
return tem;
@@ -5951,7 +5953,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
   || EXPRESSION_CLASS_P (op0))
  /* ... and has wrapping overflow, and its type is smaller
 than ctype, then we cannot pass through as widening.  */
-  ((TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0))
+  (((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0)))
(TYPE_PRECISION (ctype)
TYPE_PRECISION (TREE_TYPE (op0
  /* ... or this is a truncation (t is narrower than op0),
@@ -5966,7 +5969,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
  /* ... or has undefined overflow while the converted to
 type has not, we cannot do the operation in the inner type
 as that would introduce undefined overflow.  */
- || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))
+ || ((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+   TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0)))
   !TYPE_OVERFLOW_UNDEFINED (type
break;

@@ -6159,6 +6163,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
  if ((code == TRUNC_MOD_EXPR || code == CEIL_MOD_EXPR
   || code == FLOOR_MOD_EXPR || code == ROUND_MOD_EXPR)
  /* If the multiplication can overflow we cannot optimize this.  */
+  INTEGRAL_TYPE_P (TREE_TYPE (t))
   TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (t))
   TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST
   wi::multiple_of_p (op1, c, TYPE_SIGN (type)))
@@ -6211,7 +6216,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,

 If we have an unsigned type, we cannot do this since it will change
 the result if the original computation overflowed.  */
-  if (TYPE_OVERFLOW_UNDEFINED (ctype)
+  if (INTEGRAL_TYPE_P (ctype)
+  TYPE_OVERFLOW_UNDEFINED (ctype)
   ((code == MULT_EXPR  tcode == EXACT_DIV_EXPR)
  || (tcode == MULT_EXPR
   code != TRUNC_MOD_EXPR  code != CEIL_MOD_EXPR
@@ -8497,7 +8503,8 @@ maybe_canonicalize_comparison_1 (location_t loc, enum 
tree_code code, tree type,

  /* Match A +- CST code arg1 and CST code arg1.  We can change the
 first form only if overflow is undefined.  

Re: locales fixes

2014-12-09 Thread Marc Glisse

On Tue, 9 Dec 2014, Jonathan Wakely wrote:


On 08/12/14 23:53 +0100, François Dumont wrote:
   After having installed all necessary locales on my system I end up with 
4 failures. Here is a patch to fix them all.


Did you discover why only you are seeing failures?


Not just him, anyone with a debian-based system (assuming we are talking 
about the same thing). I believe it was documented somewhere, Paolo should 
know.


--
Marc Glisse


Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Richard Biener
On December 9, 2014 7:39:48 PM CET, Jeff Law l...@redhat.com wrote:
On 12/09/14 10:38, Sebastian Pop wrote:
 Richard Biener wrote:
 On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey sell...@mips.com
wrote:
 expected?  Should this test also check flag_thread_jumps?  Or
should
 that be getting checked somewhere else?

 -fthread-jumps is an RTL optimization flag and ignored on GIMPLE.

 Does it make sense to add a -f[no-]tree-thread-jumps to
enable/disable the tree
 jump threading?  I could also add -f[no-]tree-fsm-thread-jumps. 
Opinions?
Our option handling is a bit of a mess if we look at it from the user 
standpoint.  Given that the vast majority of jump threading happens on 
gimple, ISTM that -f[no-]thread-jumps ought to be controlling the
gimple 
implementation.  One could easily argue that the user doesn't really 
care about where in the pipeline the optimization is implemented.

My vote would be to just make -fthread-jumps control both RTL and
gimple 
jump threading.

Works for me.



 On the llvm test-suite, I have seen one ICE with my fsm jump-thread
patch.
 This patch fixes the problem:

 diff --git a/gcc/tree-ssa-threadupdate.c
b/gcc/tree-ssa-threadupdate.c
 index 12f83ba..f8c736e 100644
 --- a/gcc/tree-ssa-threadupdate.c
 +++ b/gcc/tree-ssa-threadupdate.c
 @@ -2564,6 +2564,7 @@ thread_through_all_blocks (bool
may_peel_loop_headers)
 FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
   {
 if (!loop-header
 +|| !loop_latch_edge (loop)
 || !bitmap_bit_p (threaded_blocks, loop-header-index))
 continue;

retval |= thread_through_loop_header (loop,
may_peel_loop_headers);

 Ok to commit after regstrap?
This seems to be indicating that we have with no edge from the latch 
block to the header block.  I'd like to know better how we got into
that 
state.

It Also returns null for loops with multiple latches. So the patch looks OK for 
me.

Thanks,
Richard.

Also, a test for the GCC testsuite would be good.  I have no idea what 
license covers the LLVM testsuite.  But given a good analysis of the 
problem we may be able to write a suitable test independent of the LLVM

test.

jeff




Re: [PATCH, MPX wrappers 1/3] Add MPX wrappers library

2014-12-09 Thread Jeff Law

On 12/09/14 01:27, Ilya Enkovich wrote:


2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

* config/i386/linux-common.h (LIBMPX_WRAPPERSSPEC): New.
(CHKP_SPEC): Add wrappers library.
* c-family/c.opt (static-libmpxwrappers): New.

libmpx/

2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

* Makefile.am (SUBDIRS): Add mpxwrap when used
AS supports MPX.
(MAKEOVERRIDES): New.
* Makefile.in: Regenerate.
* configure.ac: Check AS supports MPX.  Add
mpxintr/Makefile to config files.
* configure: Regenerate.
* mpxwrap/Makefile.am: New.
* mpxwrap/Makefile.in: New.
* mpxwrap/libtool-version: New.
* mpxwrap/mpx_wrappers.cc: New.
* mpxwrap/libmpxwrappers.map: New.



OK once prerequisites are approved.

jeff



Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Jeff Law

On 12/09/14 12:43, Richard Biener wrote:

This seems to be indicating that we have with no edge from the latch
block to the header block.  I'd like to know better how we got into
that
state.


It Also returns null for loops with multiple latches. So the patch looks OK for 
me.

Ah, OK.

Jeff



Re: [PATCH] Set stores_off_frame_dead_at_return to false if sibling call

2014-12-09 Thread Jeff Law

On 12/08/14 14:50, John David Anglin wrote:

On 12/8/2014 3:49 PM, Jeff Law wrote:

I think the terminology and variable names certainly makes this
tougher to follow than it should.

I certainly agree. When I first looked at the code, I thought it
was completely backwards.

Me too.

Thoughts on using both patches to ensure the we don't have extraneous 
entries in the GEN set and that we're adding more stuff to the KILL set 
for sibcalls?


Jeff


[PATCH, committed] Add jit-tempdir.{c|h}

2014-12-09 Thread David Malcolm
Move tempdir handling out from jit-playback.c into new
jit-tempdir.{c|h} files.

This simplifies jit-playback.c, and may help fix PR jit/64206.

Committed to trunk as r218533.

gcc/jit/ChangeLog:
PR jit/64206
* Make-lang.in (jit_OBJS): Add jit/jit-tempdir.o.
* jit-common.h (gcc::jit::tempdir): New forward decl.
* jit-playback.c: Include jit-tempdir.h.
(gcc::jit::playback::context::context): Initialize m_tempdir.
(gcc::jit::playback::context::~context): Move tempdir
cleanup to new file jit-tempdir.c
(make_tempdir_path_template): Move to new file jit-tempdir.c.
(gcc::jit::playback::context::compile): Move tempdir creation
to new tempdir object in new file jit-tempdir.c.
(gcc::jit::playback::context::make_fake_args): Get path from
tempdir object rather than from member data.
(gcc::jit::playback::context::convert_to_dso): Likewise.
(gcc::jit::playback::context::dlopen_built_dso): Likewise.
(gcc::jit::playback::context::dump_generated_code): Likewise.
(gcc::jit::playback::context::get_path_c_file): New function.
(gcc::jit::playback::context::get_path_s_file): New function.
(gcc::jit::playback::context::get_path_so_file): New function.
* jit-playback.h (gcc::jit::playback::context::get_path_c_file):
New function.
(gcc::jit::playback::context::get_path_s_file): New function.
(gcc::jit::playback::context::get_path_so_file): New function.
(gcc::jit::playback::context): Move fields m_path_template,
m_path_tempdir, m_path_c_file, m_path_s_file,
m_path_so_file to new jit::tempdir class; add field m_tempdir.
* jit-tempdir.c: New file.
* jit-tempdir.h: New file.
---
 gcc/jit/Make-lang.in   |   1 +
 gcc/jit/jit-common.h   |   1 +
 gcc/jit/jit-playback.c | 118 +---
 gcc/jit/jit-playback.h |  15 ++
 gcc/jit/jit-tempdir.c  | 129 +
 gcc/jit/jit-tempdir.h  |  81 +++
 6 files changed, 262 insertions(+), 83 deletions(-)
 create mode 100644 gcc/jit/jit-tempdir.c
 create mode 100644 gcc/jit/jit-tempdir.h

diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in
index e88fd00..818e14b 100644
--- a/gcc/jit/Make-lang.in
+++ b/gcc/jit/Make-lang.in
@@ -65,6 +65,7 @@ jit_OBJS = attribs.o \
jit/jit-recording.o \
jit/jit-playback.o \
jit/jit-result.o \
+   jit/jit-tempdir.o \
jit/jit-builtins.o
 
 # Use strict warnings for this front end.
diff --git a/gcc/jit/jit-common.h b/gcc/jit/jit-common.h
index c9dde3e..25c2c6f 100644
--- a/gcc/jit/jit-common.h
+++ b/gcc/jit/jit-common.h
@@ -98,6 +98,7 @@ namespace jit {
 class result;
 class dump;
 class builtins_manager; // declared within jit-builtins.h
+class tempdir;
 
 namespace recording {
 
diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index 281ad85..8498900 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include jit-playback.h
 #include jit-result.h
 #include jit-builtins.h
+#include jit-tempdir.h
 
 
 /* gcc::jit::playback::context::build_cast uses the convert.h API,
@@ -86,6 +87,7 @@ namespace jit {
 
 playback::context::context (recording::context *ctxt)
   : m_recording_ctxt (ctxt),
+m_tempdir (NULL),
 m_char_array_type_node (NULL),
 m_const_char_ptr (NULL)
 {
@@ -98,25 +100,8 @@ playback::context::context (recording::context *ctxt)
 
 playback::context::~context ()
 {
-  if (get_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES))
-fprintf (stderr, intermediate files written to %s\n, m_path_tempdir);
-  else
-{
-  /* Clean up .s/.so and tempdir. */
-  if (m_path_s_file)
-unlink (m_path_s_file);
-  if (m_path_so_file)
-unlink (m_path_so_file);
-  if (m_path_tempdir)
-rmdir (m_path_tempdir);
-}
-
-  free (m_path_template);
-  /* m_path_tempdir aliases m_path_template, or is NULL, so don't
- attempt to free it .  */
-  free (m_path_c_file);
-  free (m_path_s_file);
-  free (m_path_so_file);
+  if (m_tempdir)
+delete m_tempdir;
   m_functions.release ();
 }
 
@@ -1515,44 +1500,6 @@ block (function *func,
   m_label_expr = NULL;
 }
 
-/* Construct a tempdir path template suitable for use by mkdtemp
-   e.g. /tmp/libgccjit-XX, but respecting the rules in
-   libiberty's choose_tempdir rather than hardcoding /tmp/.
-
-   The memory is allocated using malloc and must be freed.
-   Aborts the process if allocation fails. */
-
-static char *
-make_tempdir_path_template ()
-{
-  const char *tmpdir_buf;
-  size_t tmpdir_len;
-  const char *file_template_buf;
-  size_t file_template_len;
-  char *result;
-
-  /* The result of choose_tmpdir is a cached buffer within libiberty, so
- we must *not* free it.  */
-  tmpdir_buf = choose_tmpdir ();
-
-  /* 

Re: [PATCH][libstdc++][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model

2014-12-09 Thread Mike Stump
On Dec 9, 2014, at 3:17 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 In the gcc and g++ testsuite we already catch such cases and mark them as 
 UNSUPPORTED. In libstdc++.exp there is no such functionality.

 I'm not very happy that it had to be copied, but I couldn't find a way to 
 include the gcc definition sanely.

 Is this a sane approach to what I'm trying to solve?

Ok.

If you would like, you can try and pull the common parts out into a new file, 
and include (load) that file from the two places that currently do that.  If 
they are exactly identical, should be trivial enough.  If not exactly the same, 
I’d do two patches, once to make them the same, then, the second one to split 
them out.

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Mike Stump
On Dec 9, 2014, at 10:39 AM, Jeff Law l...@redhat.com wrote:
 Also, a test for the GCC testsuite would be good.  I have no idea what 
 license covers the LLVM testsuite.  But given a good analysis of the problem 
 we may be able to write a suitable test independent of the LLVM test.

So, the usual engineerings rules should work just fine on it.  delta reduce and 
submit.

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-12-09 Thread Jeff Law

On 12/09/14 01:24, Ilya Enkovich wrote:

On 24 Nov 17:02, Ilya Enkovich wrote:


Right.  This works for both top level and multilib checks because failing test 
is used and CC is usually not set when it's called by the top level configure.  
If we configure with CC=... then it may go wrong.  I left only target check in 
configure.tgt and inlined test for x32 into libmpx configure.



--
Joseph S. Myers
jos...@codesourcery.com


Here is an updated version.

Thanks,
Ilya
--


Here is an updated version.  I moved linker specs to target.  Currently mpx 
libraries are built for x86_64-*-linux* | i?86-*-linux*, so I think 
gcc/config/i386/linux-common.h is a proper place for LIBMPX delcarations.

Thanks,
Ilya
--
2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

* Makefile.def: Add libmpx.
* configure.ac: Add libmpx.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/

2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

* config/i386/linux-common.h (LIBMPX_LIBS): New.
(LIBMPX_SPEC): New.
(CHKP_SPEC): New.
* gcc.c (CHKP_SPEC): New.
(LINK_COMMAND_SPEC): Add CHKP_SPEC.
* c-family/c.opt (static-libmpx): New.

libmpx/




diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index b9f7c65..65731cc 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1020,6 +1020,9 @@ fchkp-instrument-marked-only
  C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0)
  Instrument only functions marked with bnd_instrument attribute.

+static-libmpx
+Driver
+
  fcilkplus
  C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
  Enable Cilk Plus

Wasn't this duplicated in the mpx-wrapper patch?

I think this is OK on the technical side.  I need to do to some vote 
tallying on the steering committee side WRT licensing, ownership of the 
sources, canonical source location, etc, so please don't commit yet.


jeff



[PATCH, commmited] PR jit/63854: Document how to run the jit testsuite under valgrind

2014-12-09 Thread David Malcolm
Committed to trunk as r218538.
Index: gcc/jit/ChangeLog
===
--- gcc/jit/ChangeLog	(revision 218537)
+++ gcc/jit/ChangeLog	(revision 218538)
@@ -1,5 +1,12 @@
 2014-12-09  David Malcolm  dmalc...@redhat.com
 
+	PR jit/63854
+	* docs/internals/index.rst (Running under valgrind): New
+	subsection.
+	(docs/_build/texinfo/libgccjit.texi): Regenerate.
+
+2014-12-09  David Malcolm  dmalc...@redhat.com
+
 PR jit/64206
 	* Make-lang.in (jit_OBJS): Add jit/jit-tempdir.o.
 	* jit-common.h (gcc::jit::tempdir): New forward decl.
Index: gcc/jit/docs/internals/index.rst
===
--- gcc/jit/docs/internals/index.rst	(revision 218537)
+++ gcc/jit/docs/internals/index.rst	(revision 218538)
@@ -118,6 +118,51 @@
  gdb --args \
testsuite/jit/test-factorial.exe
 
+Running under valgrind
+**
+
+The jit testsuite detects if RUN_UNDER_VALGRIND is present in the
+environment (with any value).  If it is present, it runs the test client
+code under `valgrind http://valgrind.org`_,
+specifcally, the default
+`memcheck http://valgrind.org/docs/manual/mc-manual.html`_
+tool with
+`--leak-check=full
+http://valgrind.org/docs/manual/mc-manual.html#opt.leak-check`_.
+
+It automatically parses the output from valgrind, injecting XFAIL results if
+any issues are found, or PASS results if the output is clean.  The output
+is saved to ``TESTNAME.exe.valgrind.txt``.
+
+For example, the following invocation verbosely runs the testcase
+``test-sum-of-squares.c`` under valgrind, showing an issue:
+
+.. code-block:: console
+
+  $ RUN_UNDER_VALGRIND= \
+  make check-jit \
+RUNTESTFLAGS=-v -v -v jit.exp=test-sum-of-squares.c
+
+  (...verbose log contains detailed valgrind errors, if any...)
+
+  === jit Summary ===
+
+  # of expected passes28
+  # of expected failures  2
+
+  $ less testsuite/jit/jit.sum
+  (...other results...)
+  XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: definitely lost: 8 bytes in 1 blocks
+  XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: unsuppressed errors: 1
+  (...other results...)
+
+  $ less testsuite/jit/test-sum-of-squares.exe.valgrind.txt
+  (...shows full valgrind report for this test case...)
+
+When running under valgrind, it's best to have configured gcc with
+:option:`--enable-valgrind-annotations`, which automatically suppresses
+various known false positives.
+
 Environment variables
 -
 When running client code against a locally-built libgccjit, three
Index: gcc/jit/docs/_build/texinfo/libgccjit.texi
===
--- gcc/jit/docs/_build/texinfo/libgccjit.texi	(revision 218537)
+++ gcc/jit/docs/_build/texinfo/libgccjit.texi	(revision 218538)
@@ -197,6 +197,10 @@
 * Environment variables:: 
 * Overview of code structure:: 
 
+Running the test suite
+
+* Running under valgrind:: 
+
 @end detailmenu
 @end menu
 
@@ -6548,8 +6552,61 @@
 
 @noindent
 
+@menu
+* Running under valgrind:: 
+
+@end menu
+
+@node Running under valgrind,,,Running the test suite
+@anchor{internals/index running-under-valgrind}@anchor{c9}
+@subsection Running under valgrind
+
+
+The jit testsuite detects if RUN_UNDER_VALGRIND is present in the
+environment (with any value).  If it is present, it runs the test client
+code under valgrind@footnote{http://valgrind.org},
+specifcally, the default
+memcheck@footnote{http://valgrind.org/docs/manual/mc-manual.html}
+tool with
+--leak-check=full@footnote{http://valgrind.org/docs/manual/mc-manual.html#opt.leak-check}.
+
+It automatically parses the output from valgrind, injecting XFAIL results if
+any issues are found, or PASS results if the output is clean.  The output
+is saved to @code{TESTNAME.exe.valgrind.txt}.
+
+For example, the following invocation verbosely runs the testcase
+@code{test-sum-of-squares.c} under valgrind, showing an issue:
+
+@example
+$ RUN_UNDER_VALGRIND= \
+make check-jit \
+  RUNTESTFLAGS=-v -v -v jit.exp=test-sum-of-squares.c
+
+(...verbose log contains detailed valgrind errors, if any...)
+
+=== jit Summary ===
+
+# of expected passes28
+# of expected failures  2
+
+$ less testsuite/jit/jit.sum
+(...other results...)
+XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: definitely lost: 8 bytes in 1 blocks
+XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: unsuppressed errors: 1
+(...other results...)
+
+$ less testsuite/jit/test-sum-of-squares.exe.valgrind.txt
+(...shows full valgrind report for this test case...)
+@end example
+
+@noindent
+
+When running under valgrind, it's best to have configured gcc with
+@code{--enable-valgrind-annotations}, which automatically suppresses
+various known false positives.
+
 @node 

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-12-09 Thread Ilya Enkovich
2014-12-09 22:57 GMT+03:00 Jeff Law l...@redhat.com:
 On 12/09/14 01:24, Ilya Enkovich wrote:

 On 24 Nov 17:02, Ilya Enkovich wrote:


 Right.  This works for both top level and multilib checks because failing
 test is used and CC is usually not set when it's called by the top level
 configure.  If we configure with CC=... then it may go wrong.  I left only
 target check in configure.tgt and inlined test for x32 into libmpx
 configure.


 --
 Joseph S. Myers
 jos...@codesourcery.com


 Here is an updated version.

 Thanks,
 Ilya
 --


 Here is an updated version.  I moved linker specs to target.  Currently
 mpx libraries are built for x86_64-*-linux* | i?86-*-linux*, so I think
 gcc/config/i386/linux-common.h is a proper place for LIBMPX delcarations.

 Thanks,
 Ilya
 --
 2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

 * Makefile.def: Add libmpx.
 * configure.ac: Add libmpx.
 * Makefile.in: Regenerate.
 * configure: Regenerate.

 gcc/

 2014-12-09  Ilya Enkovich  ilya.enkov...@intel.com

 * config/i386/linux-common.h (LIBMPX_LIBS): New.
 (LIBMPX_SPEC): New.
 (CHKP_SPEC): New.
 * gcc.c (CHKP_SPEC): New.
 (LINK_COMMAND_SPEC): Add CHKP_SPEC.
 * c-family/c.opt (static-libmpx): New.

 libmpx/


 diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
 index b9f7c65..65731cc 100644
 --- a/gcc/c-family/c.opt
 +++ b/gcc/c-family/c.opt
 @@ -1020,6 +1020,9 @@ fchkp-instrument-marked-only
   C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only)
 Init(0)
   Instrument only functions marked with bnd_instrument attribute.

 +static-libmpx
 +Driver
 +
   fcilkplus
   C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
   Enable Cilk Plus

 Wasn't this duplicated in the mpx-wrapper patch?

Wrappers patch introduces similar option static-libmpxwrappers.


 I think this is OK on the technical side.  I need to do to some vote
 tallying on the steering committee side WRT licensing, ownership of the
 sources, canonical source location, etc, so please don't commit yet.

Great! Let me know when it's time to commit.

Thanks,
Ilya


 jeff



Re: [PATCH] Set stores_off_frame_dead_at_return to false if sibling call

2014-12-09 Thread John David Anglin

On 12/9/2014 3:00 PM, Jeff Law wrote:
Thoughts on using both patches to ensure the we don't have extraneous 
entries in the GEN set and that we're adding more stuff to the KILL 
set for sibcalls?

Maybe we should add all stores to the KILL set
for a sibling call and not worry about whether
they are frame related or not.

Dave

--
John David Anglindave.ang...@bell.net



[patch] gdb python pretty printer for DIEs

2014-12-09 Thread Aldy Hernandez
I am tired of dumping entire DIEs just to see what type they are.  With 
this patch, we get:


(gdb) print context_die
$5 = dw_die_ref 0x76de0230 DW_TAG_module parent=0x76de 
DW_TAG_compile_unit


I know it's past the end of stage1, but this debugging aid can help in 
fixing bugs in stage = 3.


I am committing this to the [debug-early] branch, but I am hoping I can 
also commit it to mainline and avoid dragging it along.


OK for mainline?
commit e20f40f92eeba51f9a1ffb078e060366e7beb471
Author: Aldy Hernandez al...@redhat.com
Date:   Tue Dec 9 13:04:46 2014 -0800

* gdbhooks.py (class DWDieRefPrinter): New class.
(build_pretty_printer): Register dw_die_ref's.

diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index a74e712..26affd9 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -253,6 +253,26 @@ class CGraphNodePrinter:
 return result
 
 ##
+# Dwarf DIE pretty-printers
+##
+
+class DWDieRefPrinter:
+def __init__(self, gdbval):
+self.gdbval = gdbval
+
+def to_string (self):
+result = 'dw_die_ref 0x%x' % long(self.gdbval)
+if long(self.gdbval) == 0:
+return 'dw_die_ref 0x0'
+result += ' %s' % self.gdbval['die_tag']
+if long(self.gdbval['die_parent']) != 0:
+result += ' parent=0x%x %s' % (long(self.gdbval['die_parent']),
+ 
self.gdbval['die_parent']['die_tag'])
+ 
+result += ''
+return result
+
+##
 
 class GimplePrinter:
 def __init__(self, gdbval):
@@ -455,6 +475,8 @@ def build_pretty_printer():
  'tree', TreePrinter)
 pp.add_printer_for_types(['cgraph_node *'],
  'cgraph_node', CGraphNodePrinter)
+pp.add_printer_for_types(['dw_die_ref'],
+ 'dw_die_ref', DWDieRefPrinter)
 pp.add_printer_for_types(['gimple', 'gimple_statement_base *',
 
   # Keep this in the same order as gimple.def:


Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Evgeny Stupachenko
I've added the reproducer to the patch.

is it ok?

ChangeLog:

2014-12-10  Evgeny Stupachenko  evstu...@gmail.com

gcc/testsuite
* gcc.target/i386/blend.c: New.

gcc/
* config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
expand_vec_perm_1 target.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..5a914ad 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
 dcopy.op0 = dcopy.op1 = d-op1;
   else
 dcopy.op0 = dcopy.op1 = d-op0;
+  dcopy.target = gen_reg_rtx (vmode);
   dcopy.one_operand_p = true;

   for (i = 0; i  nelt; ++i)
diff --git a/gcc/testsuite/gcc.target/i386/blend.c
b/gcc/testsuite/gcc.target/i386/blend.c
new file mode 100644
index 000..d03bdbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/blend.c
@@ -0,0 +1,61 @@
+/* Test correctness of size 3 store groups permutation.  */
+/* { dg-do run } */
+/* { dg-options -O3 } */
+
+#define N 50
+
+enum num3
+{
+  a, b, c
+};
+
+struct flags
+{
+  enum num3 f;
+  unsigned int c;
+  unsigned int p;
+};
+
+struct flagsN
+{
+  struct flags a[N];
+};
+
+void
+bar (int n, struct flagsN *ff)
+{
+  struct flagsN *fc;
+  for (fc = ff + 1; fc  (ff + n); fc++)
+{
+  int i;
+  for (i = 0; i  N; ++i)
+   {
+ ff-a[i].f = 0;
+ ff-a[i].c = i;
+ ff-a[i].p = -1;
+   }
+  for (i = 0; i  n; i++)
+   {
+ int j;
+ for (j = 0; j  N - n; ++j)
+   {
+ fc-a[i + j].f = 0;
+ fc-a[i + j].c = j + i;
+ fc-a[i + j].p = -1;
+   }
+   }
+}
+}
+
+struct flagsN q[2];
+
+int main()
+{
+  int i;
+  long long *rr = (long long *)q[0].a;
+  bar(2, q);
+  for (i = 0; i  N * 2; i += 2)
+if (rr[i] == -1  rr[i + 1] == -1)
+  return 1;
+  return 0;
+}



On Tue, Dec 9, 2014 at 7:38 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 I mean that there are a lot of people tracking spec2006 stability and
 therefore the issue should be on track in future.
 And that I can create the test case, but it would be as big as several
 GCC functions.
 Will work on reducing the test case.


 On Tue, Dec 9, 2014 at 7:20 PM, Richard Henderson r...@redhat.com wrote:
 On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
 However patch is fixing spec2006 benchmark. Shouldn't that be enough
 for regression testing?


 No.  Spec is not free.


 r~


[PATCH, committed] jit: toyvm.c: use correct path in debuginfo

2014-12-09 Thread David Malcolm
I noticed while debugging another issue that at some point I'd broken
the source locations that toyvm.c sets up into the .toy source files,
by using the filename, rather than the filesystem path.

Also fix a segfault due to fclose (NULL) when the .toy file is not
found.

Committed to trunk as r218540.

gcc/jit/ChangeLog:
* docs/examples/tut04-toyvm/toyvm.c (toyvm_function_compile): Move
logic for determine funcname to new function...
(get_function_name): ...here, adding logic to skip any leading
path from the filename.
(toyvm_function_parse): Use the filename for fn_filename, rather
than name, so that the debugger can locate the source .toy
file.
(toyvm_function_parse): Don't fclose a NULL FILE *.
---
 gcc/jit/docs/examples/tut04-toyvm/toyvm.c | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/gcc/jit/docs/examples/tut04-toyvm/toyvm.c 
b/gcc/jit/docs/examples/tut04-toyvm/toyvm.c
index 07de507..0089ea7 100644
--- a/gcc/jit/docs/examples/tut04-toyvm/toyvm.c
+++ b/gcc/jit/docs/examples/tut04-toyvm/toyvm.c
@@ -121,6 +121,25 @@ add_unary_op (toyvm_function *fn, enum opcode opcode,
   add_op (fn, opcode, operand, linenum);
 }
 
+static char *
+get_function_name (const char *filename)
+{
+  /* Skip any path separators.  */
+  const char *pathsep = strrchr (filename, '/');
+  if (pathsep)
+filename = pathsep + 1;
+
+  /* Copy filename to funcname.  */
+  char *funcname = (char *)malloc (strlen (filename) + 1);
+
+  strcpy (funcname, filename);
+
+  /* Convert . to NIL terminator.  */
+  *(strchr (funcname, '.')) = '\0';
+
+  return funcname;
+}
+
 static toyvm_function *
 toyvm_function_parse (const char *filename, const char *name)
 {
@@ -149,7 +168,7 @@ toyvm_function_parse (const char *filename, const char 
*name)
   fprintf (stderr, out of memory allocating toyvm_function\n);
   goto error;
 }
-  fn-fn_filename = name;
+  fn-fn_filename = filename;
 
   /* Read the lines of the file.  */
   while ((linelen = getline (line, bufsize, f)) != -1)
@@ -208,7 +227,8 @@ toyvm_function_parse (const char *filename, const char 
*name)
 
  error:
   free (line);
-  fclose (f);
+  if (f)
+fclose (f);
   free (fn);
   return NULL;
 }
@@ -460,12 +480,7 @@ toyvm_function_compile (toyvm_function *fn)
 
   memset (state, 0, sizeof (state));
 
-  /* Copy filename to funcname.  */
-  funcname = (char *)malloc (strlen (fn-fn_filename) + 1);
-  strcpy (funcname, fn-fn_filename);
-
-  /* Convert . to NIL terminator.  */
-  *(strchr (funcname, '.')) = '\0';
+  funcname = get_function_name (fn-fn_filename);
 
   state.ctxt = gcc_jit_context_acquire ();
 
-- 
1.8.5.3



Re: [PATCH 01/10] rs6000: Clobber XER[CA] in all user asm statements

2014-12-09 Thread Richard Henderson
On 12/08/2014 06:25 AM, David Edelsohn wrote:
 Okay.  I don't know of another option that preserves backward
 compatibility.  If RTH looks at this series of patches, he may have a
 better suggestion.

No, this looks like what I'd do.

Indeed, I added this hook back when the i386 port gained direct
support for the flags register, so it's exactly what I did.  ;-)


r~


Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Jakub Jelinek
On Wed, Dec 10, 2014 at 12:33:52AM +0300, Evgeny Stupachenko wrote:
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/i386/blend.c
 @@ -0,0 +1,61 @@
 +/* Test correctness of size 3 store groups permutation.  */
 +/* { dg-do run } */
 +/* { dg-options -O3 } */
 +
 +#define N 50
 +
 +enum num3
 +{
 +  a, b, c
 +};
 +
 +struct flags
 +{
 +  enum num3 f;

Does this really has to be an enum?  Doesn't unsigned int there work the
same?

 +int main()
 +{
 +  int i;
 +  long long *rr = (long long *)q[0].a;
 +  bar(2, q);
 +  for (i = 0; i  N * 2; i += 2)
 +if (rr[i] == -1  rr[i + 1] == -1)

This is aliasing violation, can't you avoid that?  I mean, just
check the individual struct flags fields?  And with the aliasing violation
fixed, is there anything i?86 specific left in the testcase (i.e. shouldn't
it go either to gcc.dg/ or gcc.c-torture/execute/ instead?)?

Jakub


Re: [PATCH, libcpp]: Check the result of vasprintf

2014-12-09 Thread Joseph Myers
On Tue, 9 Dec 2014, Uros Bizjak wrote:

 Attached patch checks the return value and sets ptr to NULL in this case.
 
 2014-12-09  Uros Bizjak  ubiz...@gmail.com
 
 * directives.c (cpp_define_formatted): Check return value of
 vasprintf and in case of error set ptr to NULL.
 
 Bootstrapped on x86_64-linux-gnu.
 
 OK for mainline?

No, this will just continue to pass NULL into cpp_define, and so into 
strlen, where it isn't a valid argument.  You need to give an error 
message for allocation failure and exit, much like xmalloc does (or put 
xvasprintf in libiberty and use that here - see 
https://gcc.gnu.org/ml/gcc-patches/2009-11/msg01448.html and 
https://gcc.gnu.org/ml/gcc-patches/2009-11/msg01449.html - I don't know 
if that's the latest version).

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [PATCH][MIPS] Fix P5600 memory cost

2014-12-09 Thread Matthew Fortune
Hi Prachi,

I'm afraid you updated the wrong Changelog with this commit. GCC changes
are recorded in gcc/ChangeLog.

I'm not sure what the correct procedure is for fixing this.

Jeff: Should a mistake like this be fixed by removing the entry
from the top level and adding one at the appropriate location in the
gcc/ChangeLog one?

Thanks,
Matthew

 -Original Message-
 From: Prachi Godbole
 Sent: 26 November 2014 08:40
 To: Matthew Fortune; gcc-patches@gcc.gnu.org
 Subject: RE: [PATCH][MIPS] Fix P5600 memory cost
 
 Committed with ChangeLog entry fixes.
 
 Prachi
 
 
 -Original Message-
 From: Matthew Fortune
 Sent: Wednesday, November 5, 2014 4:07 PM
 To: Prachi Godbole; gcc-patches@gcc.gnu.org
 Subject: RE: [PATCH][MIPS] Fix P5600 memory cost
 
  The patch below fixes the memory cost for P5600.
 
  ChangeLog:
  2014-11-05 Prachi Godbole prachi.godb...@imgtec.com
 
  * config/mips/mips.c (mips_rtx_cost_data): Fix memory_letency cost for
  p5600.
 
 Please follow these instructions to add yourself to MAINTAINERS in the
 write-after-approval section now that you have write access to GCC:
 
 https://gcc.gnu.org/svnwrite.html#authenticated
 
 OK with fixes to the changelog entry:
 
 latency not latency. Remember to tab in the changelog entry and split
 the line as it will exceed 80 chars. Also two spaces between the
 date/name and name/email. E.g.
 
 2014-11-05  Prachi Godbole  prachi.godb...@imgtec.com
 
   * config/mips/mips.c (mips_rtx_cost_data): Fix memory_latency cost
 for
   p5600.
 
 Thanks,
 Matthew
 
  diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index
  af6a913..558ba2f 100644
  --- a/gcc/config/mips/mips.c
  +++ b/gcc/config/mips/mips.c
  @@ -1193,7 +1193,7 @@ static const struct mips_rtx_cost_data
   COSTS_N_INSNS (8),/* int_div_si */
   COSTS_N_INSNS (8),/* int_div_di */
  2,/* branch_cost */
  -  10 /* memory_latency */
  +   4 /* memory_latency */
 }
   };
   ^L


  1   2   >