Re: [PATCH] Fix PR rtl-optimization/60663

2014-03-27 Thread Zhenqiang Chen
On 26 March 2014 15:45, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Mar 26, 2014 at 03:30:44PM +0800, Zhenqiang Chen wrote:
 Agree. CSE should never modify asm insns to drop some of the outputs.

 So the right fix is to prevent this from happening, not papering over
 it.

 But in this case, CSE does not drop any of the outputs. It just takes
 the SRC of one set and replaces references to that set's DEST with it. And
 the instruction validation tells CSE that the instruction is still legal
 after the replacement. (The original, correct asm insn is then optimized
 away.)

 I think it is common for most rtl-optimizations to do this kind of
 validation. So to avoid this kind of bug, check_asm_operands must tell
 the optimizer that the asm is illegal.

 As it is wrong if CSE does that even with asm ("" : "=r" (i), "=r" (j));,
 your patch is not the right place to fix this.  CSE just must check where
 the ASM_OPERANDS is coming from and, if it comes from a PARALLEL with
 multiple outputs, either give up or duplicate all outputs (if that makes sense
 at all).  Or just don't enter ASM_OPERANDS with multiple outputs into the
 hash tables.
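
For reference, here is a minimal sketch (illustrative only, not the exact
PR60663 testcase) of the kind of asm at issue: a single ASM_OPERANDS shared
by two output SETs inside one PARALLEL, which CSE must not pick apart:

    int foo (int x)
    {
      int hi, lo;
      /* One asm, two outputs: in RTL this is a PARALLEL of two SETs whose
         SET_SRCs share the same ASM_OPERANDS.  Substituting one SET_SRC
         elsewhere and deleting the insn silently loses the other output.  */
      __asm__ ("# compute %0 and %1 from %2"
               : "=r" (hi), "=r" (lo)
               : "r" (x));
      return hi + lo;
    }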

Patch is updated:

diff --git a/gcc/cse.c b/gcc/cse.c
index 852d13e..ce84982 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4280,6 +4280,19 @@ find_sets_in_insn (rtx insn, struct set **psets)
;
  else if (GET_CODE (SET_SRC (y)) == CALL)
;
+ else if (GET_CODE (SET_SRC (y)) == ASM_OPERANDS)
+   {
+ if (i + 1 < lim)
+   {
+ rtx n = XVECEXP (x, 0, i + 1);
+ /* For inline asm with multiple outputs, we cannot handle
+the SETs separately.  Refer to PR60663.  */
+ if (GET_CODE (n) == SET
+  && GET_CODE (SET_SRC (n)) == ASM_OPERANDS)
+   break;
+   }
+ sets[n_sets++].rtl = y;
+   }
  else
sets[n_sets++].rtl = y;
}

Thanks!
-Zhenqiang


[PATCH, x86, testsuite, AVX-512] Fix initialization in 4 tests for shuffles.

2014-03-27 Thread Kirill Yukhin
Hello,
The straightforward patch at the bottom fixes
a copy-and-paste problem in the initialization
part of the tests.

Updated tests pass on simulator.

Is it ok for trunk?

gcc/testsuite:
* gcc.target/i386/avx512f-vshuff32x4-2.c: Fix initialization
of second source operand.
* gcc.target/i386/avx512f-vshuff64x2-2.c: Ditto.
* gcc.target/i386/avx512f-vshufi32x4-2.c: Ditto.
* gcc.target/i386/avx512f-vshufi64x2-2.c: Ditto.

--
Thanks, K

PS: I fully expect that there are lots of such bugs in the
testsuite, and I am going to fix them all.

commit 2a5c128e75b4f18189d62b0e159de73272c41cf9
Author: Kirill Yukhin kirill.yuk...@intel.com
Date:   Thu Mar 27 13:04:15 2014 +0400

AVX-512. Fix initialization of AVX-512 shuffle tests.
---
 gcc/testsuite/gcc.target/i386/avx512f-vshuff32x4-2.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-vshuff64x2-2.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-vshufi32x4-2.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-vshufi64x2-2.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vshuff32x4-2.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vshuff32x4-2.c
index 271c862..35eabc2 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vshuff32x4-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vshuff32x4-2.c
@@ -43,7 +43,7 @@ TEST (void)
   for (i = 0; i < SIZE; i++)
 {
   s1.a[i] = 1.2 / (i + 0.378);
-  s1.a[i] = 91.02 / (i + 4.3578);
+  s2.a[i] = 91.02 / (i + 4.3578);
   u1.a[i] = DEFAULT_VALUE;
   u2.a[i] = DEFAULT_VALUE;
   u3.a[i] = DEFAULT_VALUE;
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vshuff64x2-2.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vshuff64x2-2.c
index 4842942..9fee420 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vshuff64x2-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vshuff64x2-2.c
@@ -43,7 +43,7 @@ TEST (void)
   for (i = 0; i < SIZE; i++)
 {
   s1.a[i] = 1.2 / (i + 0.378);
-  s1.a[i] = 91.02 / (i + 4.3578);
+  s2.a[i] = 91.02 / (i + 4.3578);
   u1.a[i] = DEFAULT_VALUE;
   u2.a[i] = DEFAULT_VALUE;
   u3.a[i] = DEFAULT_VALUE;
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vshufi32x4-2.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vshufi32x4-2.c
index 105c715..9b1603c 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vshufi32x4-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vshufi32x4-2.c
@@ -43,7 +43,7 @@ TEST (void)
   for (i = 0; i < SIZE; i++)
 {
   s1.a[i] = 1.2 / (i + 0.378);
-  s1.a[i] = 91.02 / (i + 4.3578);
+  s2.a[i] = 91.02 / (i + 4.3578);
   u1.a[i] = DEFAULT_VALUE;
   u2.a[i] = DEFAULT_VALUE;
   u3.a[i] = DEFAULT_VALUE;
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vshufi64x2-2.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vshufi64x2-2.c
index d79d8f6..85a5918 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vshufi64x2-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vshufi64x2-2.c
@@ -43,7 +43,7 @@ TEST (void)
   for (i = 0; i < SIZE; i++)
 {
   s1.a[i] = 1.2 / (i + 0.378);
-  s1.a[i] = 91.02 / (i + 4.3578);
+  s2.a[i] = 91.02 / (i + 4.3578);
   u1.a[i] = DEFAULT_VALUE;
   u2.a[i] = DEFAULT_VALUE;
   u3.a[i] = DEFAULT_VALUE;


Re: [PATCH, x86, testsuite, AVX-512] Fix initialization in 4 tests for shuffles.

2014-03-27 Thread Uros Bizjak
On Thu, Mar 27, 2014 at 10:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 The straightforward patch at the bottom fixes
 a copy-and-paste problem in the initialization
 part of the tests.

 Updated tests pass on simulator.

 Is it ok for trunk?

 gcc/testsuite:
 * gcc.target/i386/avx512f-vshuff32x4-2.c: Fix initialization
 of second source operand.
 * gcc.target/i386/avx512f-vshuff64x2-2.c: Ditto.
 * gcc.target/i386/avx512f-vshufi32x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vshufi64x2-2.c: Ditto.

OK.

Thanks,
Uros.


Re: Fix PR60644

2014-03-27 Thread Alexander Ivchenko
Adding Balaji.

--Alexander

2014-03-26 18:56 GMT+04:00 Alexander Ivchenko aivch...@gmail.com:
 Hi,

 In gcc/config/linux-android.h we have builtin_define ("__ANDROID__");
 so ANDROID, as currently used in libcilkrts, is not the correct macro
 to check.
 Bootstrapped and passed cilk testsuite on x86_64-unknown-linux-gnu.
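
 As a sketch of the resulting convention (my illustration, not part of the
 patch): on GCC's Android targets only __ANDROID__ is predefined, while plain
 ANDROID is seen only if a build system passes -DANDROID explicitly:

     /* Reliable compile-time check for Android/bionic with GCC.  */
     #if defined(__ANDROID__)
     #  include <malloc.h>   /* e.g. memalign lives here on bionic */
     #else
     #  include <stdlib.h>   /* e.g. posix_memalign elsewhere */
     #endif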

 diff --git a/libcilkrts/ChangeLog b/libcilkrts/ChangeLog
 index eb0d6ec..65efef0 100644
 --- a/libcilkrts/ChangeLog
 +++ b/libcilkrts/ChangeLog
 @@ -1,3 +1,12 @@
 +2014-03-26  Alexander Ivchenko  alexander.ivche...@intel.com
 +
 + PR bootstrap/60644
 +
 + * include/cilk/metaprogramming.h: Change ANDROID to __ANDROID__.
 + * include/cilk/reducer_min_max.h: Ditto.
 + * runtime/bug.h: Ditto.
 + * runtime/os-unix.c: Ditto.
 +
  2014-03-20  Tobias Burnus  bur...@net-b.de

   PR other/60589
 diff --git a/libcilkrts/include/cilk/metaprogramming.h
 b/libcilkrts/include/cilk/metaprogramming.h
 index 5f6f29d..29b0839 100644
 --- a/libcilkrts/include/cilk/metaprogramming.h
 +++ b/libcilkrts/include/cilk/metaprogramming.h
 @@ -468,7 +468,7 @@ inline void* allocate_aligned(std::size_t size,
 std::size_t alignment)
  #ifdef _WIN32
  return _aligned_malloc(size, alignment);
  #else
 -#if defined(ANDROID) || defined(__ANDROID__)
 +#if defined(__ANDROID__)
  return memalign(std::max(alignment, sizeof(void*)), size);
  #else
  void* ptr;
 diff --git a/libcilkrts/include/cilk/reducer_min_max.h
 b/libcilkrts/include/cilk/reducer_min_max.h
 index 55f068c..7fe09e8 100644
 --- a/libcilkrts/include/cilk/reducer_min_max.h
 +++ b/libcilkrts/include/cilk/reducer_min_max.h
 @@ -3025,7 +3025,7 @@ struct legacy_reducer_downcast<reducer<op_min_index<Index, Type, Compare, Alig
 #include <limits.h>

  /* Wchar_t min/max constants */
 -#if defined(_MSC_VER) || defined(ANDROID)
 +#if defined(_MSC_VER) || defined(__ANDROID__)
 #   include <wchar.h>
 #   else
 #   include <stdint.h>
 diff --git a/libcilkrts/runtime/bug.h b/libcilkrts/runtime/bug.h
 index bb18913..1a64bea 100644
 --- a/libcilkrts/runtime/bug.h
 +++ b/libcilkrts/runtime/bug.h
 @@ -90,7 +90,7 @@ COMMON_PORTABLE extern const char *const
 __cilkrts_assertion_failed;
   * GPL V3 licensed.
   */
  COMMON_PORTABLE void cilkbug_assert_no_uncaught_exception(void);
 -#if defined(_WIN32) || defined(ANDROID)
 +#if defined(_WIN32) || defined(__ANDROID__)
  #  define CILKBUG_ASSERT_NO_UNCAUGHT_EXCEPTION()
  #else
  #  define CILKBUG_ASSERT_NO_UNCAUGHT_EXCEPTION() \
 diff --git a/libcilkrts/runtime/os-unix.c b/libcilkrts/runtime/os-unix.c
 index fafb91d..85bc08d 100644
 --- a/libcilkrts/runtime/os-unix.c
 +++ b/libcilkrts/runtime/os-unix.c
 @@ -282,7 +282,7 @@ void __cilkrts_init_tls_variables(void)
  }
  #endif

 -#if defined (__linux__) && ! defined(ANDROID)
 +#if defined (__linux__) && ! defined(__ANDROID__)
  /*
   * Get the thread id, rather than the pid. In the case of MIC offload, it's
   * possible that we have multiple threads entering Cilk, and each has a
 @@ -354,7 +354,7 @@ static int linux_get_affinity_count (int tid)

  COMMON_SYSDEP int __cilkrts_hardware_cpu_count(void)
  {
 -#if defined ANDROID || (defined(__sun__) && defined(__svr4__))
 +#if defined __ANDROID__ || (defined(__sun__) && defined(__svr4__))
  return sysconf (_SC_NPROCESSORS_ONLN);
  #elif defined __MIC__
  /// HACK: Usually, the 3rd and 4th hyperthreads are not beneficial
 @@ -409,7 +409,7 @@ COMMON_SYSDEP void __cilkrts_yield(void)
  // giving up the processor and latency starting up when work becomes
  // available
  _mm_delay_32(1024);
 -#elif defined(ANDROID) || (defined(__sun__) && defined(__svr4__))
 +#elif defined(__ANDROID__) || (defined(__sun__) && defined(__svr4__))
  // On Android and Solaris, call sched_yield to yield quantum.  I'm not
  // sure why we don't do this on Linux also.
  sched_yield();




 Is it OK?

 --Alexander


Re: [PATCH] x86: _mm*_undefined_* (for real)

2014-03-27 Thread Kirill Yukhin
Hello Ulrich,
On 21 Mar 06:41, Ulrich Drepper wrote:
 From personal experience I find it
 very frustrating if a gcc release doesn't have the complete set of
 intrinsics since then you have to provide your own implementations in
 code which doesn't assume the latest compiler.

I think I should mention here that there is a set of intrinsics which
didn't make it into GCC; they failed review.
Initial submission: http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00782.html
They were rejected because of too-complex patterns; UNSPECs were also rejected.

Although they seem to be rarely used, we might retry the review
if we invent a better MD description...

--
Thanks, K


[AArch64/ARM 0/3] Patch series for ZIP intrinsics

2014-03-27 Thread Alan Lawrence

Hi,

AArch64 zip_* intrinsics are currently implemented with temporary inline asm, 
which prevents analysis through them. This series replaces those asm blocks 
with (equivalent) calls to __builtin_shuffle, which produce the same assembler 
instructions (unless gcc can do better).


First patch adds a bunch of tests, passing for the current asm implementation;
Second patch reimplements with __builtin_shuffle;
Third patch reuses the test bodies in equivalent tests on the ARM architecture.

Ok for stage 1 ?

Cheers, Alan



[AArch64/ARM 2/3] Rewrite AArch64 ZIP Intrinsics using __builtin_shuffle

2014-03-27 Thread Alan Lawrence
This patch replaces the temporary inline assembler for vzip_* in arm_neon.h with 
equivalent calls to __builtin_shuffle. These are matched by 
aarch64_expand_vec_perm_const{,_1} to output the same assembler instructions.


Tests from first patch still passing on aarch64-none-elf and 
aarch64_be-none-elf.
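
Roughly, each replacement has the following shape, as it would appear inside
arm_neon.h (a sketch based on the series description; the hunks are cut off in
the archived diff below, and the actual patch must also account for big-endian
lane order, which this sketch omits):

    __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
    vzip1_f32 (float32x2_t __a, float32x2_t __b)
    {
      /* Interleave lane 0 of each input: {a[0], b[0]}.  */
      return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
    }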

gcc/ChangeLog:

2012-03-27  Alan Lawrence  alan.lawre...@arm.com

* config/aarch64/arm_neon.h (vzip1_f32, vzip1_p8, vzip1_p16, vzip1_s8,
vzip1_s16, vzip1_s32, vzip1_u8, vzip1_u16, vzip1_u32, vzip1q_f32,
vzip1q_f64, vzip1q_p8, vzip1q_p16, vzip1q_s8, vzip1q_s16, vzip1q_s32,
vzip1q_s64, vzip1q_u8, vzip1q_u16, vzip1q_u32, vzip1q_u64, vzip2_f32,
vzip2_p8, vzip2_p16, vzip2_s8, vzip2_s16, vzip2_s32, vzip2_u8,
vzip2_u16, vzip2_u32, vzip2q_f32, vzip2q_f64, vzip2q_p8, vzip2q_p16,
vzip2q_s8, vzip2q_s16, vzip2q_s32, vzip2q_s64, vzip2q_u8, vzip2q_u16,
vzip2q_u32, vzip2q_u64): Replace inline __asm__ with __builtin_shuffle.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6af99361..0ee0aae 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -13414,468 +13414,6 @@ vuzp2q_u64 (uint64x2_t a, uint64x2_t b)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vzip1_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("zip1 %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vzip1_p8 (poly8x8_t a, poly8x8_t b)
-{
-  poly8x8_t result;
-  __asm__ ("zip1 %0.8b,%1.8b,%2.8b"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vzip1_p16 (poly16x4_t a, poly16x4_t b)
-{
-  poly16x4_t result;
-  __asm__ ("zip1 %0.4h,%1.4h,%2.4h"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vzip1_s8 (int8x8_t a, int8x8_t b)
-{
-  int8x8_t result;
-  __asm__ ("zip1 %0.8b,%1.8b,%2.8b"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vzip1_s16 (int16x4_t a, int16x4_t b)
-{
-  int16x4_t result;
-  __asm__ ("zip1 %0.4h,%1.4h,%2.4h"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vzip1_s32 (int32x2_t a, int32x2_t b)
-{
-  int32x2_t result;
-  __asm__ ("zip1 %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vzip1_u8 (uint8x8_t a, uint8x8_t b)
-{
-  uint8x8_t result;
-  __asm__ ("zip1 %0.8b,%1.8b,%2.8b"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vzip1_u16 (uint16x4_t a, uint16x4_t b)
-{
-  uint16x4_t result;
-  __asm__ ("zip1 %0.4h,%1.4h,%2.4h"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vzip1_u32 (uint32x2_t a, uint32x2_t b)
-{
-  uint32x2_t result;
-  __asm__ ("zip1 %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vzip1q_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("zip1 %0.4s,%1.4s,%2.4s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vzip1q_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("zip1 %0.2d,%1.2d,%2.2d"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vzip1q_p8 (poly8x16_t a, poly8x16_t b)
-{
-  poly8x16_t result;
-  __asm__ ("zip1 %0.16b,%1.16b,%2.16b"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vzip1q_p16 (poly16x8_t a, poly16x8_t b)
-{
-  poly16x8_t result;
-  __asm__ ("zip1 %0.8h,%1.8h,%2.8h"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x16_t 

[AArch64/ARM 1/3] Add execution + assembler tests of the AArch64 ZIP Intrinsics.

2014-03-27 Thread Alan Lawrence
This adds DejaGNU tests of the existing AArch64 vzip_* intrinsics, checking both 
the assembler output and the runtime results. Test bodies are in separate files, 
ready to be reused for ARM in the third patch. These go in a new subdirectory, 
ready for tests of other related intrinsics.


All tests passing on aarch64-none-elf and aarch64_be-none-elf.

testsuite/ChangeLog:

2014-03-25  Alan Lawrence  alan.lawre...@arm.com

* gcc.target/aarch64/simd/simd.exp: New file.
* gcc.target/aarch64/simd/vzipf32_1.c: New file.
* gcc.target/aarch64/simd/vzipf32.x: New file.
* gcc.target/aarch64/simd/vzipp16_1.c: New file.
* gcc.target/aarch64/simd/vzipp16.x: New file.
* gcc.target/aarch64/simd/vzipp8_1.c: New file.
* gcc.target/aarch64/simd/vzipp8.x: New file.
* gcc.target/aarch64/simd/vzipqf32_1.c: New file.
* gcc.target/aarch64/simd/vzipqf32.x: New file.
* gcc.target/aarch64/simd/vzipqp16_1.c: New file.
* gcc.target/aarch64/simd/vzipqp16.x: New file.
* gcc.target/aarch64/simd/vzipqp8_1.c: New file.
* gcc.target/aarch64/simd/vzipqp8.x: New file.
* gcc.target/aarch64/simd/vzipqs16_1.c: New file.
* gcc.target/aarch64/simd/vzipqs16.x: New file.
* gcc.target/aarch64/simd/vzipqs32_1.c: New file.
* gcc.target/aarch64/simd/vzipqs32.x: New file.
* gcc.target/aarch64/simd/vzipqs8_1.c: New file.
* gcc.target/aarch64/simd/vzipqs8.x: New file.
* gcc.target/aarch64/simd/vzipqu16_1.c: New file.
* gcc.target/aarch64/simd/vzipqu16.x: New file.
* gcc.target/aarch64/simd/vzipqu32_1.c: New file.
* gcc.target/aarch64/simd/vzipqu32.x: New file.
* gcc.target/aarch64/simd/vzipqu8_1.c: New file.
* gcc.target/aarch64/simd/vzipqu8.x: New file.
* gcc.target/aarch64/simd/vzips16_1.c: New file.
* gcc.target/aarch64/simd/vzips16.x: New file.
* gcc.target/aarch64/simd/vzips32_1.c: New file.
* gcc.target/aarch64/simd/vzips32.x: New file.
* gcc.target/aarch64/simd/vzips8_1.c: New file.
* gcc.target/aarch64/simd/vzips8.x: New file.
* gcc.target/aarch64/simd/vzipu16_1.c: New file.
* gcc.target/aarch64/simd/vzipu16.x: New file.
* gcc.target/aarch64/simd/vzipu32_1.c: New file.
* gcc.target/aarch64/simd/vzipu32.x: New file.
* gcc.target/aarch64/simd/vzipu8_1.c: New file.
* gcc.target/aarch64/simd/vzipu8.x: New file.
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/simd.exp b/gcc/testsuite/gcc.target/aarch64/simd/simd.exp
new file mode 100644
index 000..097d29a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/simd.exp
@@ -0,0 +1,45 @@
+#  Specific regression driver for AArch64 SIMD instructions.
+#  Copyright (C) 2014 Free Software Foundation, Inc.
+#  Contributed by ARM Ltd.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  <http://www.gnu.org/licenses/>.  */
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+	 "" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vzipf32.x b/gcc/testsuite/gcc.target/aarch64/simd/vzipf32.x
new file mode 100644
index 000..cc69b89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vzipf32.x
@@ -0,0 +1,27 @@
+extern void abort (void);
+
+float32x2x2_t
+test_vzipf32 (float32x2_t _a, float32x2_t _b)
+{
+  return vzip_f32 (_a, _b);
+}
+
+int
+main (int argc, char **argv)
+{
+  int i;
+  float32_t first[] = {1, 2};
+  float32_t second[] = {3, 4};
+  float32x2x2_t result = test_vzipf32 (vld1_f32 (first), vld1_f32 (second));
+  float32x2_t res1 = result.val[0], res2 = result.val[1];
+  float32_t exp1[] = {1, 3};
+  float32_t exp2[] = {2, 4};
+  float32x2_t expected1 = vld1_f32 (exp1);
+  float32x2_t expected2 = vld1_f32 (exp2);
+
+  for (i = 0; i < 2; i++)
+if ((res1[i] != expected1[i]) || (res2[i] != expected2[i]))
+  abort ();
+
+  return 0;
+}
diff 

[AArch64/ARM 3/3] Add execution tests of ARM ZIP Intrinsics

2014-03-27 Thread Alan Lawrence
The final patch adds new tests of the ARM ZIP intrinsics (subsuming the 
autogenerated ones in testsuite/gcc.target/arm/neon/) that also check the 
execution results, reusing the test bodies introduced for AArch64 in the first 
patch.


All tests passing on arm-none-eabi.

gcc/testsuite/ChangeLog:
2012-03-27  Alan Lawrence  alan.lawre...@arm.com

* gcc.target/arm/simd/simd.exp: New file.
* gcc.target/arm/simd/vzipqf32_1.c: New file.
* gcc.target/arm/simd/vzipqp16_1.c: New file.
* gcc.target/arm/simd/vzipqp8_1.c: New file.
* gcc.target/arm/simd/vzipqs16_1.c: New file.
* gcc.target/arm/simd/vzipqs32_1.c: New file.
* gcc.target/arm/simd/vzipqs8_1.c: New file.
* gcc.target/arm/simd/vzipqu16_1.c: New file.
* gcc.target/arm/simd/vzipqu32_1.c: New file.
* gcc.target/arm/simd/vzipqu8_1.c: New file.
* gcc.target/arm/simd/vzipf32_1.c: New file.
* gcc.target/arm/simd/vzipp16_1.c: New file.
* gcc.target/arm/simd/vzipp8_1.c: New file.
* gcc.target/arm/simd/vzips16_1.c: New file.
* gcc.target/arm/simd/vzips32_1.c: New file.
* gcc.target/arm/simd/vzips8_1.c: New file.
* gcc.target/arm/simd/vzipu16_1.c: New file.
* gcc.target/arm/simd/vzipu32_1.c: New file.
* gcc.target/arm/simd/vzipu8_1.c: New file.
diff --git a/gcc/testsuite/gcc.target/arm/simd/simd.exp b/gcc/testsuite/gcc.target/arm/simd/simd.exp
new file mode 100644
index 000..746429d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/simd.exp
@@ -0,0 +1,35 @@
+# Copyright (C) 1997-2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an ARM target.
+if ![istarget arm*-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+	 "" ""
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/arm/simd/vzipf32_1.c b/gcc/testsuite/gcc.target/arm/simd/vzipf32_1.c
new file mode 100644
index 000..efaa96e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vzipf32_1.c
@@ -0,0 +1,12 @@
+/* Test the `vzipf32' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vzipf32.x"
+
+/* { dg-final { scan-assembler-times "vuzp\.32\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vzipp16_1.c b/gcc/testsuite/gcc.target/arm/simd/vzipp16_1.c
new file mode 100644
index 000..4154333
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vzipp16_1.c
@@ -0,0 +1,12 @@
+/* Test the `vzipp16' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vzipp16.x"
+
+/* { dg-final { scan-assembler-times "vzip\.16\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vzipp8_1.c b/gcc/testsuite/gcc.target/arm/simd/vzipp8_1.c
new file mode 100644
index 000..9fe2384
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vzipp8_1.c
@@ -0,0 +1,12 @@
+/* Test the `vzipp8' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vzipp8.x"
+
+/* { dg-final { scan-assembler-times "vzip\.8\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vzipqf32_1.c b/gcc/testsuite/gcc.target/arm/simd/vzipqf32_1.c
new file mode 100644
index 000..8c547a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vzipqf32_1.c
@@ -0,0 +1,12 @@
+/* Test the `vzipQf32' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { 

RE: [PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A

2014-03-27 Thread Ian Bolton
 -Original Message-
 From: Richard Earnshaw
 Sent: 21 March 2014 13:57
 To: Ian Bolton
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
 
 On 19/03/14 16:53, Ian Bolton wrote:
  This is a follow-on patch to one already committed:
  http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html
 
  It implements patterns to simplify our RTL as follows:
 
  OR (Not:DI (A:DI), ZeroExtend:DI (B:SI))
    -->  the top half can be done with a MVN
 
  AND (Not:DI (A:DI), ZeroExtend:DI (B:SI))
    -->  the top half becomes zero.
 
  I've added test cases for both of these and also the existing
  anddi_notdi patterns.  The tests all pass.
 
  Full regression runs passed.
 
  OK for stage 1?
 
  Cheers,
  Ian
 
 
  2014-03-19  Ian Bolton  ian.bol...@arm.com
 
  gcc/
  * config/arm/arm.md (*anddi_notdi_zesidi): New pattern
  * config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern.
 
  testsuite/
  * gcc.target/arm/anddi_notdi-1.c: New test.
  * gcc.target/arm/iordi_notdi-1.c: New test case.
 
 
  arm-and-ior-notdi-zeroextend-patch-v1.txt
 
 
  diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
  index 2ddda02..d2d85ee 100644
  --- a/gcc/config/arm/arm.md
  +++ b/gcc/config/arm/arm.md
  @@ -2962,6 +2962,28 @@
  (set_attr "type" "multiple")]
   )
 
  +(define_insn_and_split "*anddi_notdi_zesidi"
  +  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
  +(and:DI (not:DI (match_operand:DI 2 "s_register_operand"
 "0,?r"))
  +(zero_extend:DI
  + (match_operand:SI 1 "s_register_operand" "r,r"))))]
 
 The early clobber and register tying here is unnecessary.  All of the
 input operands are consumed in the first instruction, so you can
 eliminate the ties and the restriction on the overlap.  Something like
 (untested):
 
 +(define_insn_and_split "*anddi_notdi_zesidi"
 +  [(set (match_operand:DI 0 "s_register_operand" "=r")
 +(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "r"))
 +(zero_extend:DI
 + (match_operand:SI 1 "s_register_operand" "r"))))]
 
 Ok for stage-1 with that change (though I'd recommend a another test
 run
 to validate the above).
 
 R.

Thanks, Richard.  Regression runs came back OK with that change, so
I will consider this ready for stage 1.

The patch is attached for reference.
 
Cheers,
Ian
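
For reference, a small C sketch (my illustration, not from the patch) of the
source patterns these two define_insn_and_splits match, where b is
zero-extended from SImode to DImode:

    /* AND (Not:DI (a), ZeroExtend:DI (b)): the high half becomes zero.  */
    unsigned long long
    andnot_zesidi (unsigned long long a, unsigned int b)
    {
      return ~a & b;
    }

    /* OR (Not:DI (a), ZeroExtend:DI (b)): the high half is just MVN of
       a's high part.  */
    unsigned long long
    iornot_zesidi (unsigned long long a, unsigned int b)
    {
      return ~a | b;
    }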


diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2ddda02..4176b7ff 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2962,6 +2962,28 @@
 (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*anddi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "r"))
+(zero_extend:DI
+ (match_operand:SI 1 "s_register_operand" "r"))))]
+  "TARGET_32BIT"
+  "#"
+  "TARGET_32BIT && reload_completed"
+  [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (const_int 0))]
+  ""
+  {
+operands[3] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[2] = gen_lowpart (SImode, operands[2]);
+  }
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*anddi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
(and:DI (not:DI (sign_extend:DI
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 467c619..10bc8b1 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1418,6 +1418,30 @@
 (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*iordi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
+   (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
+   (zero_extend:DI
+    (match_operand:SI 1 "s_register_operand" "r,r"))))]
+  "TARGET_THUMB2"
+  "#"
+  "TARGET_THUMB2 && reload_completed"
+  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (not:SI (match_dup 4)))]
+  ""
+  {
+operands[3] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+operands[4] = gen_highpart (SImode, operands[2]);
+operands[2] = gen_lowpart (SImode, operands[2]);
+  }
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*iordi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
(ior:DI (not:DI (sign_extend:DI
diff --git a/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c 
b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
new file mode 100644
index 000..cfb33fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+typedef 

Re: Fix PR ipa/60315 (inliner explosion)

2014-03-27 Thread Andreas Schwab
Jan Hubicka hubi...@ucw.cz writes:

 Index: testsuite/g++.dg/torture/pr60315.C
 ===
 --- testsuite/g++.dg/torture/pr60315.C(revision 0)
 +++ testsuite/g++.dg/torture/pr60315.C(revision 0)
 @@ -0,0 +1,32 @@
 +// { dg-do compile }
 +struct Base {
 +virtual int f() = 0;
 +};
 +
 +struct Derived : public Base {
 +virtual int f() final override {
 +return 42;
 +}
 +};
 +
 +extern Base* b;
 +
 +int main() {
 +return (static_cast<Derived*>(b)->*(&Derived::f))();
 +}

FAIL: g++.dg/torture/pr60315.C  -O0  (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20140327/gcc/testsuite/g++.dg/torture/pr60315.C:7:19: 
warning: override controls (override/final) only available with -std=c++11 or 
-std=gnu++11
/usr/local/gcc/gcc-20140327/gcc/testsuite/g++.dg/torture/pr60315.C:7:21: 
warning: override controls (override/final) only available with -std=c++11 or 
-std=gnu++11

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)

2014-03-27 Thread Andreas Schwab
Jason Merrill ja...@redhat.com writes:

 diff --git a/gcc/testsuite/g++.dg/abi/thunk6.C 
 b/gcc/testsuite/g++.dg/abi/thunk6.C
 new file mode 100644
 index 000..e3d07f2
 --- /dev/null
 +++ b/gcc/testsuite/g++.dg/abi/thunk6.C
 @@ -0,0 +1,18 @@
 +// PR c++/60566
 +// We need to emit the construction vtable thunk for ~C even if we aren't
 +// going to use it.
 +
 +struct A
 +{
 +  virtual void f() = 0;
 +  virtual ~A() {}
 +};
 +
 +struct B: virtual A { int i; };
 +struct C: virtual A { int i; ~C(); };
 +
 +C::~C() {}
 +
 +int main() {}
 +
 +// { dg-final { scan-assembler _ZTv0_n32_N1CD1Ev } }

FAIL: g++.dg/abi/thunk6.C -std=c++11  scan-assembler _ZTv0_n32_N1CD1Ev

$ grep _ZTv0_ thunk6.s
.globl  _ZTv0_n16_N1CD1Ev
.type   _ZTv0_n16_N1CD1Ev, @function
_ZTv0_n16_N1CD1Ev:
.size   _ZTv0_n16_N1CD1Ev, .-_ZTv0_n16_N1CD1Ev
.globl  _ZTv0_n16_N1CD0Ev
.type   _ZTv0_n16_N1CD0Ev, @function
_ZTv0_n16_N1CD0Ev:
.size   _ZTv0_n16_N1CD0Ev, .-_ZTv0_n16_N1CD0Ev

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


[Patch debug] Fix PR60655 partially.

2014-03-27 Thread Ramana Radhakrishnan

Hi,

   This is a partial fix for PR60655, where dwarf2out.c rejects NOT of a 
value in const_ok_for_output_1. There is still a problem with the 
testcase on armhf, where we get operations of the form const (minus 
(const_int) (symref)) without the -fdata-sections option, which is just 
weird. I'm not yet sure where this is produced from and will not have 
the time to dig further today.


As Jakub said on IRC, const_ok_for_output_1 is called only with partial 
rtx's, and therefore disabling minus (const_int) (symref) might not be 
the best thing to do, especially if this were part of plus (symref) 
(minus (const_int) (symref)) and both symrefs were in the same section. 
I will try and find some time to investigate this further tomorrow.


Bootstrapped and regtested on armhf

Bootstrap and regression test running on x86_64.

Ok to commit ?

regards
Ramana

gcc/

DATE   Jakub Jelinek ja...@redhat.com
 Ramana Radhakrishnan  ramana.radhakrish...@arm.com

* dwarf2out.c (const_ok_for_output_1): Reject expressions containing a 
NOT.

gcc/testsuite

DATE  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

* gcc.c-torture/compile/pr60655-1.c: New test.
commit a0ccb2bce5e3179e7b3560ecd7eecea6289baf7b
Author: Ramana Radhakrishnan ramana.radhakrish...@arm.com
Date:   Thu Mar 27 10:47:50 2014 +

[Patch debug] Fix PR60655 partially.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 2b584a5..67b37eb 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -11325,8 +11325,18 @@ const_ok_for_output_1 (rtx *rtlp, void *data 
ATTRIBUTE_UNUSED)
   return 1;
 }
 
+  /* FIXME: Refer to PR60655. It is possible for simplification
+ of rtl expressions in var tracking to produce such expressions.
+ We should really identify / validate expressions
+ enclosed in CONST that can be handled by assemblers on various
+ targets and only handle legitimate cases here.  */
   if (GET_CODE (rtl) != SYMBOL_REF)
-return 0;
+{
+  if (GET_CODE (rtl) == NOT)
+ return 1;
+
+  return 0;
+}
 
   if (CONSTANT_POOL_ADDRESS_P (rtl))
 {
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr60655-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr60655-1.c
new file mode 100644
index 000..5f38701
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr60655-1.c
@@ -0,0 +1,31 @@
+/* { dg-options "-fdata-sections" } */
+
+typedef unsigned char unit;
+typedef unit *unitptr;
+extern short global_precision;
+typedef unsigned int size_t;
+extern void *memcpy (void *dest, const void *src, size_t n);
+
+short mp_compare(const unit* r1, const unit* r2)
+{
+  register short precision;
+  precision = global_precision;
+  (r1) = ((r1)+(precision)-1);
+  (r2) = ((r2)+(precision)-1);
+  do
+{ if (*r1 < *r2)
+   return(-1);
+  if (*((r1)--) > *((r2)--))
+   return(1);
+} while (--precision);
+}
+
+static unit modulus[((1280+(2*8))/8)];
+static unit d_data[((1280+(2*8))/8)*2];
+
+int upton_modmult (unitptr prod, unitptr multiplicand, unitptr multiplier)
+{
+ unitptr d = d_data;
+ while (mp_compare(d,modulus) > 0)
+   memcpy((void*)(prod), (const void*)(d), (global_precision));
+}

[PATCH] libstdc++: Add hexfloat/defaultfloat io manipulators.

2014-03-27 Thread Rüdiger Sonderfeld
* include/bits/ios_base.h (hexfloat): New function.
(defaultfloat): New function.
* src/c++98/locale_facets.cc (__num_base::_S_format_float):
Support hexadecimal floating point format.
* testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc:
New file.

hexfloat/defaultfloat are new iostream manipulators introduced in C++11.
See the changes in [locale.nm.put] (§22.4.2.2) and
[floatfield.manip] (§27.5.6.4).  This patch does not add input support
for hexfloats.  The effect of outputting hexadecimal floating points by
setting both fixed and scientific is also added in C++98.  I am not sure
how to change this except for adding a C++11 specific implementation of
`__num_base::_S_format_float'.  But since the C++11 standard explicitly
says that C++2003 gives no meaning to the combination of fixed and
scientific, this might be acceptable anyway.
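
Under the hood the facet builds a printf-style format string, so the intended
output can be sanity-checked against the C library's %a conversion (a sketch
assuming a C99 libc; exact digits and exponent form may vary):

    #include <stdio.h>

    int main (void)
    {
      printf ("%a\n", 0.5);    /* e.g. 0x1p-1, what hexfloat maps to */
      printf ("%A\n", 255.0);  /* e.g. 0X1.FEP+7, uppercase via 'A' */
      return 0;
    }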

I have signed the FSF papers.

Signed-off-by: Rüdiger Sonderfeld ruedi...@c-plusplus.de
---
 libstdc++-v3/ChangeLog |   9 ++
 libstdc++-v3/include/bits/ios_base.h   |  20 +++
 libstdc++-v3/src/c++98/locale_facets.cc|   2 +
 .../inserters_arithmetic/char/hexfloat.cc  | 141 
+
 4 files changed, 172 insertions(+)
 create mode 100644 libstdc++-
v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index f6008d1..3345d12 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,12 @@
+2014-03-27  Rüdiger Sonderfeld  ruedi...@c-plusplus.de
+
+   * include/bits/ios_base.h (hexfloat): New function.
+   (defaultfloat): New function.
+   * src/c++98/locale_facets.cc (__num_base::_S_format_float):
+   Support hexadecimal floating point format.
+   * testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc:
+   New file.
+
 2014-03-25  Jonathan Wakely  jwak...@redhat.com
 
PR libstdc++/60658
diff --git a/libstdc++-v3/include/bits/ios_base.h b/libstdc++-
v3/include/bits/ios_base.h
index ae856de..8f263c1 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -969,6 +969,26 @@ namespace std _GLIBCXX_VISIBILITY(default)
 return __base;
   }
 
+#if __cplusplus >= 201103L
+  // New floatingfield manipulators
+
+  /// Calls base.setf(ios_base::fixed|ios_base::scientific, ios_base::floatfield)
+  inline ios_base&
+  hexfloat(ios_base& __base)
+  {
+    __base.setf(ios_base::fixed | ios_base::scientific, ios_base::floatfield);
+    return __base;
+  }
+
+  /// Calls base.unsetf(ios_base::floatfield)
+  inline ios_base&
+  defaultfloat(ios_base& __base)
+  {
+    __base.unsetf(ios_base::floatfield);
+    return __base;
+  }
+#endif // __cplusplus >= 201103L
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
diff --git a/libstdc++-v3/src/c++98/locale_facets.cc b/libstdc++-
v3/src/c++98/locale_facets.cc
index 3669acb..9455f42 100644
--- a/libstdc++-v3/src/c++98/locale_facets.cc
+++ b/libstdc++-v3/src/c++98/locale_facets.cc
@@ -82,6 +82,8 @@ namespace std _GLIBCXX_VISIBILITY(default)
   *__fptr++ = 'f';
 else if (__fltfield == ios_base::scientific)
    *__fptr++ = (__flags & ios_base::uppercase) ? 'E' : 'e';
+else if (__fltfield == (ios_base::fixed | ios_base::scientific))
+  *__fptr++ = (__flags & ios_base::uppercase) ? 'A' : 'a';
 else
    *__fptr++ = (__flags & ios_base::uppercase) ? 'G' : 'g';
 *__fptr = '\0';
diff --git a/libstdc++-
v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc 
b/libstdc++-
v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc
new file mode 100644
index 000..b0f5724
--- /dev/null
+++ b/libstdc++-
v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc
@@ -0,0 +1,141 @@
+// { dg-options "-std=gnu++0x" }
+
+// 2014-03-27 Rüdiger Sonderfeld
+// test the hexadecimal floating point inserters (facet num_put)
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <iostream>
+#include <iomanip>
+#include <sstream>
+#include <limits>
+#include <testsuite_hooks.h>
+
+using namespace std;
+
+//#ifndef _GLIBCXX_ASSERT
+#  define TEST_NUMPUT_VERBOSE 1
+//#endif
+
+void
+test01()
+{
+  {
+ostringstream os;
+double d = 

Re: [committed] Skip gcc.dg/torture/pr60092.c on 32-bit hpux

2014-03-27 Thread John David Anglin


On 24-Mar-14, at 2:45 AM, Rainer Orth wrote:


John David Anglin dave.ang...@bell.net writes:


Index: gcc.dg/torture/pr60092.c
===
--- gcc.dg/torture/pr60092.c(revision 208769)
+++ gcc.dg/torture/pr60092.c(working copy)
@@ -1,5 +1,6 @@
/* { dg-do run } */
/* { dg-require-weak "" } */
+/* { dg-skip-if "No undefined weak" { hppa*-*-hpux* && { ! lp64 } } { "*" } { "" } } */


Please omit those default args to dg-skip-if.  Besides, it seems we
should have a separate undefined_weak (or whatever) keyword for this,
rather than listing targets explicitly.  But that's certainly not 4.9


Done.  Tested on hppa2.0w-hp-hpux11.11.

Dave
--
John David Anglin   dave.ang...@bell.net


2014-03-27  John David Anglin  dang...@gcc.gnu.org

* gcc.dg/torture/pr60092.c: Remove default dg-skip-if arguments.

Index: gcc.dg/torture/pr60092.c
===
--- gcc.dg/torture/pr60092.c(revision 208856)
+++ gcc.dg/torture/pr60092.c(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-weak "" } */
-/* { dg-skip-if "No undefined weak" { hppa*-*-hpux* && { ! lp64 } } { "*" } { "" } } */
+/* { dg-skip-if "No undefined weak" { hppa*-*-hpux* && { ! lp64 } } } */
 /* { dg-xfail-run-if "posix_memalign modifies first arg on error" { *-*-solaris2.11* } { "-O0" } } */
 
 typedef __SIZE_TYPE__ size_t;


Re: [PATCH] Fix GDB PR15559 (inferior calls using thiscall calling convention)

2014-03-27 Thread Julian Brown
On Wed, 26 Mar 2014 10:25:19 -0600
Tom Tromey tro...@redhat.com wrote:

  Julian == Julian Brown jul...@codesourcery.com writes:
 
 Julian include/
 Julian * dwarf2.h (enum dwarf_calling_convention): Add
 Julian DW_CC_GNU_thiscall_i386.
 
 We've been trying to ensure that all GNU DWARF extensions are
 documented.  In the past we had problems where an extension was added
 and then, years later, its use was unclear.
 
 The usual approach is some appropriate text somewhere on the GCC wiki
 (though I suppose a note in the mail archives would do in a pinch)
 along with a URL in a comment in the appropriate file (dwarf2.h or
 dwarf2.def).
 
 Could you please do that?

How's this, as a first attempt?

http://gcc.gnu.org/wiki/GNUDwarfExtensions

Thanks,

Julian


[committed] Fix #pragma omp simd ICE (PR middle-end/60682)

2014-03-27 Thread Jakub Jelinek
Hi!

gimple_regimplify_operands doesn't grok gimple_clobber_p stmts
very well (it tries to regimplify the CONSTRUCTOR), but in the only case
where we might need to regimplify them in omp-low.c (the addressable
local vars in simd regions; remember, this is before inlining) the clobbers
actually don't do any good: the vars are laid out into the magic arrays
anyway, and if vectorized, the vectorizer drops the clobbers anyway,
so I think there is no need to preserve the clobbers.  The other possibility
would be to gimplify it into
  temporary = &foo_simd_array[bar];
  MEM_REF<temporary> =v {CLOBBER};
but that might even prevent vectorization.

Tested on x86_64-linux, committed to trunk.

2014-03-27  Jakub Jelinek  ja...@redhat.com

PR middle-end/60682
* omp-low.c (lower_omp_1): For gimple_clobber_p stmts,
if they need regimplification, just drop them instead of
calling gimple_regimplify_operands on them.

* g++.dg/gomp/pr60682.C: New test.

--- gcc/omp-low.c.jj2014-03-18 10:24:08.0 +0100
+++ gcc/omp-low.c   2014-03-27 13:47:49.474233639 +0100
@@ -10124,7 +10124,20 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p
   if ((ctx || task_shared_vars)
       && walk_gimple_op (stmt, lower_omp_regimplify_p,
			  ctx ? NULL : &wi))
-   gimple_regimplify_operands (stmt, gsi_p);
+   {
+ /* Just remove clobbers, this should happen only if we have
+privatized local addressable variables in SIMD regions,
+the clobber isn't needed in that case and gimplifying address
+of the ARRAY_REF into a pointer and creating MEM_REF based
+clobber would create worse code than we get with the clobber
+dropped.  */
+ if (gimple_clobber_p (stmt))
+   {
+ gsi_replace (gsi_p, gimple_build_nop (), true);
+ break;
+   }
+ gimple_regimplify_operands (stmt, gsi_p);
+   }
   break;
 }
 }
--- gcc/testsuite/g++.dg/gomp/pr60682.C.jj  2014-03-27 14:03:16.889205684 
+0100
+++ gcc/testsuite/g++.dg/gomp/pr60682.C 2014-03-27 14:02:39.0 +0100
@@ -0,0 +1,44 @@
+// PR middle-end/60682
+// { dg-do compile }
+// { dg-options "-O2 -fopenmp-simd" }
+
+struct A
+{
+  float a;
+  A () {}
+  A (const A &x) { a = x.a; }
+};
+
+struct B
+{
+  A a[16];
+};
+
+struct C
+{
+  float a[1];
+  C () {}
+  C (const C &x) { a[0] = x.a[0]; }
+};
+
+struct D
+{
+  C a[16];
+};
+
+void
+foo (int x, B &y, D &z)
+{
+#pragma omp simd
+  for (int i = 0; i < x; ++i)
+{
+  A a;
+  y.a[i] = a;
+}
+#pragma omp simd
+  for (int i = 0; i < x; ++i)
+{
+  C a;
+  z.a[i] = a;
+}
+}

Jakub


Re: [gomp4] Add tables generation

2014-03-27 Thread Ilya Verbin
+#ifdef ACCEL_COMPILER
+  /* Decls are placed in reversed order in fat-objects, so we need to
+ revert them back if we compile target.  */
...

Actually this change is incorrect.  If the host binary is built with -flto, then
both the host gcc and the target gcc read decls from the lto and target_lto
sections in the same order, and the resulting tables are identical.
So, in this case there is no need to change the order.

But what if one wants to link non-lto host object files with a target image
produced from target_lto sections?
In this case the order of the host table, produced during ordinary compilation,
will differ from the order of the target table, produced during lto compilation.
Jakub, what do you think?


Here is a simple example with 4 functions and 4 global variables:

#define N 100

#pragma omp declare target
int arr1[N];
int arr2[N];
int arr3[N];
int arr4[N];
#pragma omp end declare target

void foo ()
{
  #pragma omp target
  for (int i = 0; i < N; i++)
arr1[i] = 41 + i;

  #pragma omp target
  for (int i = 0; i < N; i++)
arr2[i] = 42 + i;

  #pragma omp target
  for (int i = 0; i < N; i++)
arr3[i] = 43 + i;

  #pragma omp target
  for (int i = 0; i < N; i++)
arr4[i] = 44 + i;
}


I print DECL_NAME ((*v_funcs)[i]) and DECL_NAME ((*v_vars)[i]) in
omp_finish_file:

Host compilation:
$ gcc -std=c99 -fopenmp -flto -c test.c -o test.o

host func 0: foo._omp_fn.0
host func 1: foo._omp_fn.1
host func 2: foo._omp_fn.2
host func 3: foo._omp_fn.3
host var 0:  arr4
host var 1:  arr3
host var 2:  arr2
host var 3:  arr1

Host lto and target lto:
$ gcc -std=c99 -fopenmp -flto test.o -o test

host func 0: foo._omp_fn.3
host func 1: foo._omp_fn.2
host func 2: foo._omp_fn.1
host func 3: foo._omp_fn.0
host var 0:  arr4
host var 1:  arr3
host var 2:  arr2
host var 3:  arr1

target func 0: foo._omp_fn.3
target func 1: foo._omp_fn.2
target func 2: foo._omp_fn.1
target func 3: foo._omp_fn.0
target var 0:  arr4
target var 1:  arr3
target var 2:  arr2
target var 3:  arr1

The func tables produced during ordinary compilation and lto are different.

  -- Ilya


Re: [PATCH] libstdc++: Add hexfloat/defaultfloat io manipulators.

2014-03-27 Thread Jonathan Wakely

On 27/03/14 12:52 +0100, Rüdiger Sonderfeld wrote:

* include/bits/ios_base.h (hexfloat): New function.
(defaultfloat): New function.
* src/c++98/locale_facets.cc (__num_base::_S_format_float):
Support hexadecimal floating point format.
* testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc:
New file.


Thanks for the patch. I have a few comments inline below.

With a few tweaks I think this can be committed once the trunk
re-opens for stage 1 http://gcc.gnu.org/develop.html#stage1


hexfloat/defaultfloat are new iostream manipulators introduced in C++11.
See the changes in [locale.nm.put] (§22.4.2.2) and
[floatfield.manip] (§27.5.6.4).  This patch does not add input support
for hexfloats.  The effect of outputting hexadecimal floating points by
setting both fixed and scientific is also added in C++98.  I am not sure
how to change this except for adding a C++11 specific implementation of
`__num_base::_S_format_float'.  But since the C++11 standard explicitly
says that C++2003 gives no meaning to the combination of fixed and
scientific, this might be acceptable anyway.


Yes, I think that's OK, thanks for explaining it.

We could document that (fixed|scientific) has that effect in c++98
mode. I'll also need to update the C++11 status table in the manual to
note std::hexfloat works for output streams.


I have signed the FSF papers.


Excellent, that's the hardest part of contributing.


+2014-03-27  Rüdiger Sonderfeld  ruedi...@c-plusplus.de
+
+   * include/bits/ios_base.h (hexfloat): New function.
+   (defaultfloat): New function.
+   * src/c++98/locale_facets.cc (__num_base::_S_format_float):
+   Support hexadecimal floating point format.
+   * testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc:
+   New file.


N.B. patches to the ChangeLog rarely apply cleanly (because someone
else may have changed the ChangeLog since the patch was created) so
the convention is to send the ChangeLog entry in the email body, or as
a separate attachment, or by using 'git log -p @{u}..' so the commit
log shows it, rather than as part of the patch.


diff --git a/libstdc++-v3/include/bits/ios_base.h b/libstdc++-
v3/include/bits/ios_base.h
index ae856de..8f263c1 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -969,6 +969,26 @@ namespace std _GLIBCXX_VISIBILITY(default)
return __base;
  }

+#if __cplusplus = 201103L
+  // New floatingfield manipulators


It's only a comment, but that should be floatfield


+
+  /// Calls base.setf(ios_base::fixed|ios_base::scientific,ios_base::floatfield)


Does this line exceed 80 characters?


diff --git a/libstdc++-v3/src/c++98/locale_facets.cc b/libstdc++-
v3/src/c++98/locale_facets.cc
index 3669acb..9455f42 100644
--- a/libstdc++-v3/src/c++98/locale_facets.cc
+++ b/libstdc++-v3/src/c++98/locale_facets.cc
@@ -82,6 +82,8 @@ namespace std _GLIBCXX_VISIBILITY(default)
  *__fptr++ = 'f';
else if (__fltfield == ios_base::scientific)
  *__fptr++ = (__flags & ios_base::uppercase) ? 'E' : 'e';
+else if (__fltfield == (ios_base::fixed | ios_base::scientific))
+  *__fptr++ = (__flags & ios_base::uppercase) ? 'A' : 'a';
else
  *__fptr++ = (__flags & ios_base::uppercase) ? 'G' : 'g';
*__fptr = '\0';


I wonder if we need a config macro to test whether the libc's
vsnprintf supports the A conversion specifier. You could do:

#ifdef _GLIBCXX_USE_C99
else if (__fltfield == (ios_base::fixed | ios_base::scientific))
  *__fptr++ = (__flags & ios_base::uppercase) ? 'A' : 'a';
#endif

which I think would be correct for now (I'm planning to overhaul how we
use _GLIBCXX_USE_C99 later this year).



diff --git a/libstdc++-
v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc
b/libstdc++-
v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc
new file mode 100644
index 000..b0f5724
--- /dev/null
+++ b/libstdc++-
v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc
@@ -0,0 +1,141 @@
+// { dg-options -std=gnu++0x }


New tests should use -std=gnu++11, the gnu++0x option is deprecated so
will stop working at some point (and we'll have to update the existing
tests using it).


+//#ifndef _GLIBCXX_ASSERT
+#  define TEST_NUMPUT_VERBOSE 1
+//#endif


This appears to make the test write to std::cout by default, did you
mean to remove that before submitting the patch?

Thanks again for contributing this to the library!



Re: [gomp4] Add tables generation

2014-03-27 Thread Bernd Schmidt

On 03/27/2014 02:31 PM, Ilya Verbin wrote:

+#ifdef ACCEL_COMPILER
+  /* Decls are placed in reversed order in fat-objects, so we need to
+ revert them back if we compile target.  */
...


Actually this change is incorrect.  If host binary is built with -flto, then
both host gcc and target gcc read decls from lto and target_lto sections in the
same order, and resulting tables are identical.
So, in this case there is no need to change the order.

But what if one wants to link non-lto host object files with a target image,
produced from target_lto sections?
In this case the order of host table, produced during ordinary compilation will
differ from the order of target table, produced during lto compilation.


I haven't looked into the ordering issue here (the reversing of the 
order is from Michael's original patch), because I still think the whole 
scheme can't work and I was intending to produce a testcase to 
demonstrate that. Looks like you saved me some time here :)


My suggestion would be to augment the tables with the unique-name scheme 
I posted previously. I think the objections against it were a little 
exaggerated, and it would ensure reliability.



Bernd





Re: [RFA][PATCH][pr target/60648] Fix non-canonical RTL from x86 backend -- P1 regression

2014-03-27 Thread Jakub Jelinek
On Wed, Mar 26, 2014 at 09:53:47PM +, Richard Sandiford wrote:
 Richard Henderson r...@redhat.com writes:
  On 03/26/2014 12:40 PM, Jakub Jelinek wrote:
  On Wed, Mar 26, 2014 at 01:32:44PM -0600, Jeff Law wrote:
  On 03/26/14 12:28, Jakub Jelinek wrote:
  (mult:SI (const_int 0) (const_int 4)) is IMHO far from being canonical.
  And, I'd say it is likely other target legitimization hooks would also 
  try
  to simplify it similarly.
  simplify_gen_binary is used in several other places during expansion,
  so I don't see why it couldn't be desirable here.
  No particular reason.  I'll try that since we disagree about the
  validity of the RTL and we can both agree that using
  simplify_gen_binary is reasonable.
  
  Other possibility if you want to change it in the i386.c legitimize_address
  hook would be IMHO using force_reg instead of force_operand, it should be
  the same thing in most cases, except for these corner cases, and there 
  would
  be no need to canonizalize anything afterwards.
  But, if the i?86 maintainers feel otherwise on this and think your patch is
  ok, I don't feel that strongly about this.
 
  I like this as a solution.  Let the combiner clean things up if it's
  gotten so far.

Did you mean Jeff's original change, or say:
--- gcc/config/i386/i386.c  2014-03-20 17:41:45.917689676 +0100
+++ gcc/config/i386/i386.c  2014-03-27 14:47:21.876254288 +0100
@@ -13925,13 +13925,13 @@ ix86_legitimize_address (rtx x, rtx oldx
   if (GET_CODE (XEXP (x, 0)) == MULT)
{
  changed = 1;
- XEXP (x, 0) = force_operand (XEXP (x, 0), 0);
+ XEXP (x, 0) = copy_addr_to_reg (XEXP (x, 0));
}
 
   if (GET_CODE (XEXP (x, 1)) == MULT)
{
  changed = 1;
- XEXP (x, 1) = force_operand (XEXP (x, 1), 0);
+ XEXP (x, 1) = copy_addr_to_reg (XEXP (x, 1));
}
 
   if (changed
(or copy_to_reg, should be the same thing).

 How about doing both?  Jakub's simplify_gen_binary change looked like a good
 idea regardless of whatever else happens.  Seems a shame not to go with it.

Agreed.

Jakub
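
For readers following along, the distinction at issue, sketched with GCC's
internal RTL APIs (an illustrative fragment, not a standalone program):

    /* gen_rtx_MULT builds the expression verbatim ...  */
    rtx raw = gen_rtx_MULT (SImode, const0_rtx, GEN_INT (4));
    /* ... yielding non-canonical (mult:SI (const_int 0) (const_int 4)),
       whereas simplify_gen_binary folds constant operands first ...  */
    rtx folded = simplify_gen_binary (MULT, SImode, const0_rtx, GEN_INT (4));
    /* ... yielding just (const_int 0).  */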


[PATCH] S/390: Don't include 32 bit fp to int routines for 64 bit libgcc

2014-03-27 Thread Andreas Krebbel
Hi,

with r207507 I've made our fp to int conversion routines available
also for the 32 bit biarch libgcc.  The patch included these also for
the 64 bit libgcc, which is wrong since it prevents routines like
fixsfti from being generated by libgcc2.c.

The attached patch fixes the following testsuite regressions:
 FAIL: gcc.c-torture/execute/pr49218.c compilation,  -O0
 FAIL: gcc.dg/torture/fp-int-convert-timode.c  -O0  (test for excess errors)

Bye,

-Andreas-

2014-03-27  Andreas Krebbel  andreas.kreb...@de.ibm.com

* configure.ac: Set host_address for S/390.
* configure: Regenerate.
* config.host: Append t-floattodi to tmake_file depending on
host_address.

diff --git a/libgcc/config.host b/libgcc/config.host
index f8f74cc..f4a7428 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1027,7 +1027,10 @@ s390-*-linux*)
md_unwind_header=s390/linux-unwind.h
;;
 s390x-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
+	if test "${host_address}" = 32; then
+	   tmake_file="${tmake_file} s390/32/t-floattodi"
+	fi
md_unwind_header=s390/linux-unwind.h
;;
 s390x-ibm-tpf*)
diff --git a/libgcc/configure b/libgcc/configure
index 35896de..9e30d5e 100644
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -4321,7 +4321,7 @@ $as_echo "$libgcc_cv_cfi" >&6; }
 # word size rather than the address size.
 cat > conftest.c <<EOF
 #if defined(__x86_64__) || (!defined(__i386__) && defined(__LP64__)) \
-|| defined(__mips64)
+|| defined(__mips64) || defined(__s390x__)
 host_address=64
 #else
 host_address=32
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index d877d21..3a1a11c 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -283,7 +283,7 @@ AC_CACHE_CHECK([whether assembler supports CFI directives], 
[libgcc_cv_cfi],
 # word size rather than the address size.
 cat > conftest.c <<EOF
 #if defined(__x86_64__) || (!defined(__i386__) && defined(__LP64__)) \
-|| defined(__mips64)
+|| defined(__mips64) || defined(__s390x__)
 host_address=64
 #else
 host_address=32



Re: [gomp4] Add tables generation

2014-03-27 Thread Jakub Jelinek
On Thu, Mar 27, 2014 at 05:31:29PM +0400, Ilya Verbin wrote:
 +#ifdef ACCEL_COMPILER
 +  /* Decls are placed in reversed order in fat-objects, so we need to
 + revert them back if we compile target.  */
 ...
 
 Actually this change is incorrect.  If host binary is built with -flto, then
 both host gcc and target gcc read decls from lto and target_lto sections in 
 the
 same order, and resulting tables are identical.
 So, in this case there is no need to change the order.
 
 But what if one wants to link non-lto host object files with a target image,
 produced from target_lto sections?
 In this case the order of host table, produced during ordinary compilation 
 will
 differ from the order of target table, produced during lto compilation.
 
 Jakub, what do you think?

The tables need to be created before IPA, that way it really shouldn't
matter in what order you emit them.  E.g. the outlined target functions
could be added to the table during ompexp pass which actually creates the
outlined functions, the vars need to be added before target lto or host lto
is streamed.

Jakub


Re: [PATCH] S/390: Don't include 32 bit fp to int routines for 64 bit libgcc

2014-03-27 Thread Jakub Jelinek
On Thu, Mar 27, 2014 at 02:59:05PM +0100, Andreas Krebbel wrote:
 Hi,
 
 with r207507 I've made our fp to int conversion routines available
 also for the 32 bit biarch libgcc.  The patch included these also for
 the 64 bit libgcc, which is wrong since it prevents routines like
 fixsfti from being generated by libgcc2.c.
 
 The attached patch fixes the following testsuite regressions:
  FAIL: gcc.c-torture/execute/pr49218.c compilation,  -O0
  FAIL: gcc.dg/torture/fp-int-convert-timode.c  -O0  (test for excess errors)

Does this fix the:
-__fixdfti@@GCC_3.0 FUNC GLOBAL DEFAULT
-__fixsfti@@GCC_3.0 FUNC GLOBAL DEFAULT
-__fixtfti@@GCC_4.1.0 FUNC GLOBAL DEFAULT
-__fixunsdfti@@GCC_3.0 FUNC GLOBAL DEFAULT
-__fixunssfti@@GCC_3.0 FUNC GLOBAL DEFAULT
-__fixunstfti@@GCC_4.1.0 FUNC GLOBAL DEFAULT
regression when comparing
readelf -Ws libgcc_s.so.1 | sed -n '/\.symtab/,$d;/ UND /d;/@GLIBC_PRIVATE/d;/\(GLOBAL\|WEAK\|UNIQUE\)/p' | awk '{ if ($4 == "OBJECT") { printf "%s %s %s %s %s\n", $8, $4, $5, $6, $3 } else { printf "%s %s %s %s\n", $8, $4, $5, $6 }}' | LC_ALL=C sort -u
output between 4.8 and 4.9?

 2014-03-27  Andreas Krebbel  andreas.kreb...@de.ibm.com
 
   * configure.ac: Set host_address for S/390.
   * configure: Regenerate.
   * config.host: Append t-floattodi to tmake_file depending on
   host_address.
 
 diff --git a/libgcc/config.host b/libgcc/config.host
 index f8f74cc..f4a7428 100644
 --- a/libgcc/config.host
 +++ b/libgcc/config.host
 @@ -1027,7 +1027,10 @@ s390-*-linux*)
   md_unwind_header=s390/linux-unwind.h
   ;;
  s390x-*-linux*)
 - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
 + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
 + if test "${host_address}" = 32; then
 +    tmake_file="${tmake_file} s390/32/t-floattodi"
 + fi
   md_unwind_header=s390/linux-unwind.h
   ;;
  s390x-ibm-tpf*)
 --- a/libgcc/configure.ac
 +++ b/libgcc/configure.ac
 @@ -283,7 +283,7 @@ AC_CACHE_CHECK([whether assembler supports CFI 
 directives], [libgcc_cv_cfi],
  # word size rather than the address size.
   cat > conftest.c <<EOF
   #if defined(__x86_64__) || (!defined(__i386__) && defined(__LP64__)) \
 -|| defined(__mips64)
 +|| defined(__mips64) || defined(__s390x__)
  host_address=64
  #else
  host_address=32

Why is this needed?  Don't s390x define __LP64__ ?

Jakub


Re: [RFA][PATCH][pr target/60648] Fix non-canonical RTL from x86 backend -- P1 regression

2014-03-27 Thread Richard Henderson
On 03/27/2014 06:51 AM, Jakub Jelinek wrote:
 Did you mean Jeff's original change, or say:
 --- gcc/config/i386/i386.c2014-03-20 17:41:45.917689676 +0100
 +++ gcc/config/i386/i386.c2014-03-27 14:47:21.876254288 +0100
 @@ -13925,13 +13925,13 @@ ix86_legitimize_address (rtx x, rtx oldx
if (GET_CODE (XEXP (x, 0)) == MULT)
   {
 changed = 1;
 -   XEXP (x, 0) = force_operand (XEXP (x, 0), 0);
 +   XEXP (x, 0) = copy_addr_to_reg (XEXP (x, 0));

I meant more like this.

 How about doing both?  Jakub's simplify_gen_binary change looked like a good
 idea regardless of whatever else happens.  Seems a shame not to go with it.
 
 Agreed.

Certainly.


r~


Re: [DOC PATCH] Fix up __builtin_ffs* args (PR c/50347)

2014-03-27 Thread Marek Polacek
Actually I suppose this is obvious enough, so will commit it today.

On Tue, Mar 25, 2014 at 09:59:39PM +0100, Marek Polacek wrote:
 ffs builtins had wrong type of parameters.
 
 2014-03-25  Marek Polacek  pola...@redhat.com
 
   PR c/50347
   * doc/extend.texi (ffs Builtins): Change unsigned types to signed
   types.
 
 diff --git gcc/doc/extend.texi gcc/doc/extend.texi
 index c0da713..cb28bc9 100644
 --- gcc/doc/extend.texi
 +++ gcc/doc/extend.texi
 @@ -8963,7 +8963,7 @@ Similar to @code{__builtin_nans}, except the return 
 type is @code{float}.
  Similar to @code{__builtin_nans}, except the return type is @code{long 
 double}.
  @end deftypefn
  
 -@deftypefn {Built-in Function} int __builtin_ffs (unsigned int x)
 +@deftypefn {Built-in Function} int __builtin_ffs (int x)
  Returns one plus the index of the least significant 1-bit of @var{x}, or
  if @var{x} is zero, returns zero.
  @end deftypefn
 @@ -8993,9 +8993,9 @@ Returns the parity of @var{x}, i.e.@: the number of 
 1-bits in @var{x}
  modulo 2.
  @end deftypefn
  
 -@deftypefn {Built-in Function} int __builtin_ffsl (unsigned long)
 +@deftypefn {Built-in Function} int __builtin_ffsl (long)
  Similar to @code{__builtin_ffs}, except the argument type is
 -@code{unsigned long}.
 +@code{long}.
  @end deftypefn
  
  @deftypefn {Built-in Function} int __builtin_clzl (unsigned long)
 @@ -9023,9 +9023,9 @@ Similar to @code{__builtin_parity}, except the argument 
 type is
  @code{unsigned long}.
  @end deftypefn
  
 -@deftypefn {Built-in Function} int __builtin_ffsll (unsigned long long)
 +@deftypefn {Built-in Function} int __builtin_ffsll (long long)
  Similar to @code{__builtin_ffs}, except the argument type is
 -@code{unsigned long long}.
 +@code{long long}.
  @end deftypefn
  
  @deftypefn {Built-in Function} int __builtin_clzll (unsigned long long)
 
Marek
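
For reference, the semantics being documented, with the now-correct signed
argument type (a standalone illustration, not part of the patch):

  #include <stdio.h>

  int
  main (void)
  {
    /* __builtin_ffs returns one plus the index of the least
       significant 1-bit, or zero if the argument is zero.  */
    printf ("%d\n", __builtin_ffs (0));   /* 0          */
    printf ("%d\n", __builtin_ffs (8));   /* 4 (bit 3)  */
    printf ("%d\n", __builtin_ffs (-1));  /* 1 (bit 0)  */
    return 0;
  }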


Re: [PATCH] S/390: Don't include 32 bit fp to int routines for 64 bit libgcc

2014-03-27 Thread Andreas Krebbel
On 27/03/14 15:15, Jakub Jelinek wrote:
 Does this fix the:
 -__fixdfti@@GCC_3.0 FUNC GLOBAL DEFAULT
 -__fixsfti@@GCC_3.0 FUNC GLOBAL DEFAULT
 -__fixtfti@@GCC_4.1.0 FUNC GLOBAL DEFAULT
 -__fixunsdfti@@GCC_3.0 FUNC GLOBAL DEFAULT
 -__fixunssfti@@GCC_3.0 FUNC GLOBAL DEFAULT
 -__fixunstfti@@GCC_4.1.0 FUNC GLOBAL DEFAULT
 regression when comparing
  readelf -Ws libgcc_s.so.1 | sed -n '/\.symtab/,$d;/ UND /d;/@GLIBC_PRIVATE/d;/\(GLOBAL\|WEAK\|UNIQUE\)/p' | awk '{ if ($4 == "OBJECT") { printf "%s %s %s %s %s\n", $8, $4, $5, $6, $3 } else { printf "%s %s %s %s\n", $8, $4, $5, $6 }}' | LC_ALL=C sort -u
 output between 4.8 and 4.9?
Yes. It does fix it.

   #if defined(__x86_64__) || (!defined(__i386__) && defined(__LP64__)) \
 -|| defined(__mips64)
 +|| defined(__mips64) || defined(__s390x__)
  host_address=64
  #else
  host_address=32
 
 Why is this needed?  Don't s390x define __LP64__ ?

We do. I'll remove it.

Bye,

-Andreas-




Re: [RFA][PATCH][pr target/60648] Fix non-canonical RTL from x86 backend -- P1 regression

2014-03-27 Thread Jeff Law

On 03/26/14 15:53, Richard Sandiford wrote:

Richard Henderson r...@redhat.com writes:

On 03/26/2014 12:40 PM, Jakub Jelinek wrote:

On Wed, Mar 26, 2014 at 01:32:44PM -0600, Jeff Law wrote:

On 03/26/14 12:28, Jakub Jelinek wrote:

(mult:SI (const_int 0) (const_int 4)) is IMHO far from being canonical.
And, I'd say it is likely other target legitimization hooks would also try
to simplify it similarly.
simplify_gen_binary is used in several other places during expansion,
so I don't see why it couldn't be desirable here.

No particular reason.  I'll try that since we disagree about the
validity of the RTL and we can both agree that using
simplify_gen_binary is reasonable.


Other possibility if you want to change it in the i386.c legitimize_address
hook would be IMHO using force_reg instead of force_operand, it should be
the same thing in most cases, except for these corner cases, and there would
be no need to canonicalize anything afterwards.
But, if the i?86 maintainers feel otherwise on this and think your patch is
ok, I don't feel that strongly about this.


I like this as a solution.  Let the combiner clean things up if it's
gotten so far.


How about doing both?  Jakub's simplify_gen_binary change looked like a good
idea regardless of whatever else happens.  Seems a shame not to go with it.
Agreed.  The simplify_gen_binary change was spinning overnight, I'll be 
looking at the results momentarily.


jeff



Re: [PATCH] Add support for vbpermq builtin; Improve vec_extract

2014-03-27 Thread Michael Meissner
On Wed, Mar 26, 2014 at 08:30:39PM -0400, David Edelsohn wrote:
 Okay.
 
 Good to add the optimizations.
 
 I notice that you emit nop with a comment after a # character. I
 notice that you also added that to the POWER8 vector fusion peepholes.
 
 Is it safe to assume that all assemblers for PowerPC will consider all
 characters after a # to be a comment?

Well in this case, we are considering only PowerPC assemblers that support VSX.

The fusion stuff uses ASM_COMMENT_START, so it should be safe.  I can delete
the comments on the nop, or delete the nop handling and just do fmr/xxlor to
the same register.  I put the comments on, so I could tell in processing the
asm files which flavor of vec_extract was used (and as I said, on spec 2006,
only the fmr/xxlor's were generated, and no nop's).
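
A sketch of the two idioms under discussion (hypothetical output code, not
taken from the patch): a literal '#' bakes in one assembler's comment
syntax, while ASM_COMMENT_START stays assembler-agnostic.

  /* Hard-coded comment character -- assumes '#' starts a comment.  */
  fprintf (asm_out_file, "\tnop\t# vec_extract variant\n");

  /* Using the target's comment prefix instead.  */
  fprintf (asm_out_file, "\tnop\t%s vec_extract variant\n",
	   ASM_COMMENT_START);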

 I would like to make sure there are no other problems with the patch
 before backporting to 4.8. It wasn't included in the group of patches
 for 4.8 that have been widely tested.

I would at least like to add the part that adds vbpermq, even if we don't add
the vec_extract optimizations.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] S/390: Don't include 32 bit fp to int routines for 64 bit libgcc

2014-03-27 Thread Jakub Jelinek
On Thu, Mar 27, 2014 at 04:32:19PM +0100, Andreas Krebbel wrote:
 On 27/03/14 15:15, Jakub Jelinek wrote:
  Does this fix the:
  -__fixdfti@@GCC_3.0 FUNC GLOBAL DEFAULT
  -__fixsfti@@GCC_3.0 FUNC GLOBAL DEFAULT
  -__fixtfti@@GCC_4.1.0 FUNC GLOBAL DEFAULT
  -__fixunsdfti@@GCC_3.0 FUNC GLOBAL DEFAULT
  -__fixunssfti@@GCC_3.0 FUNC GLOBAL DEFAULT
  -__fixunstfti@@GCC_4.1.0 FUNC GLOBAL DEFAULT
  regression when comparing
   readelf -Ws libgcc_s.so.1 | sed -n '/\.symtab/,$d;/ UND /d;/@GLIBC_PRIVATE/d;/\(GLOBAL\|WEAK\|UNIQUE\)/p' | awk '{ if ($4 == "OBJECT") { printf "%s %s %s %s %s\n", $8, $4, $5, $6, $3 } else { printf "%s %s %s %s\n", $8, $4, $5, $6 }}' | LC_ALL=C sort -u
  output between 4.8 and 4.9?
 Yes. It does fix it.

Looks ok to me.

    #if defined(__x86_64__) || (!defined(__i386__) && defined(__LP64__)) \
  -|| defined(__mips64)
  +|| defined(__mips64) || defined(__s390x__)
   host_address=64
   #else
   host_address=32
  
  Why is this needed?  Don't s390x define __LP64__ ?
 
 We do. I'll remove it.

BTW, your patch forced me to look at abilists of versioned shared libraries
between 4.8.x and trunk and these 6 symbols are the only ones that are
removed from {i?86,x86_64,ppc,ppc64,s390,s390x} arches.

Jakub


[PATCH v2] libstdc++: Add hexfloat/defaultfloat io manipulators.

2014-03-27 Thread Rüdiger Sonderfeld
Hello Jonathan,

thanks for your comments.

 N.B. patches to the ChangeLog rarely apply cleanly (because someone
 else may have changed the ChangeLog since the patch was created) so
 the convention is to send the ChangeLog entry in the email body, or as
 a separate attachment, or by using 'git log -p @{u}..' so the commit
 log shows it, rather than as part of the patch.

Yes, ChangeLog's can be a bit of a pain.  I removed the ChangeLog from the
patch.  But FYI there is a ChangeLog merge driver hidden in gnulib, which
can be helpful when dealing with ChangeLog files in git (and potentially
other version control systems)

http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/git-merge-changelog.c

 We could document that (fixed|scientific) has that effect in c++98
 mode.

Where should it be documented?

Regards,
Rüdiger

-- >8 -- >8 --

* include/bits/ios_base.h (hexfloat): New function.
(defaultfloat): New function.
* src/c++98/locale_facets.cc (__num_base::_S_format_float):
Support hexadecimal floating point format.
* testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc:
New file.

Signed-off-by: Rüdiger Sonderfeld ruedi...@c-plusplus.de
---
 libstdc++-v3/include/bits/ios_base.h   |  21 +++
 libstdc++-v3/src/c++98/locale_facets.cc|   4 +
 .../inserters_arithmetic/char/hexfloat.cc  | 141 +
 3 files changed, 166 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc

diff --git a/libstdc++-v3/include/bits/ios_base.h 
b/libstdc++-v3/include/bits/ios_base.h
index ae856de..b7fae43 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -969,6 +969,27 @@ namespace std _GLIBCXX_VISIBILITY(default)
 return __base;
   }
 
+#if __cplusplus >= 201103L
+  // New C++11 floatfield manipulators
+
+  /// Calls base.setf(ios_base::fixed | ios_base::scientific,
+  ///   ios_base::floatfield).
+  inline ios_base&
+  hexfloat(ios_base& __base)
+  {
+    __base.setf(ios_base::fixed | ios_base::scientific, ios_base::floatfield);
+    return __base;
+  }
+
+  /// Calls base.unsetf(ios_base::floatfield).
+  inline ios_base&
+  defaultfloat(ios_base& __base)
+  {
+    __base.unsetf(ios_base::floatfield);
+    return __base;
+  }
+#endif // __cplusplus >= 201103L
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
diff --git a/libstdc++-v3/src/c++98/locale_facets.cc 
b/libstdc++-v3/src/c++98/locale_facets.cc
index 3669acb..a21d665 100644
--- a/libstdc++-v3/src/c++98/locale_facets.cc
+++ b/libstdc++-v3/src/c++98/locale_facets.cc
@@ -82,6 +82,10 @@ namespace std _GLIBCXX_VISIBILITY(default)
 	  *__fptr++ = 'f';
 	else if (__fltfield == ios_base::scientific)
 	  *__fptr++ = (__flags & ios_base::uppercase) ? 'E' : 'e';
+#ifdef _GLIBCXX_USE_C99
+	else if (__fltfield == (ios_base::fixed | ios_base::scientific))
+	  *__fptr++ = (__flags & ios_base::uppercase) ? 'A' : 'a';
+#endif
 	else
 	  *__fptr++ = (__flags & ios_base::uppercase) ? 'G' : 'g';
 *__fptr = '\0';
diff --git 
a/libstdc++-v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc
 
b/libstdc++-v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc
new file mode 100644
index 000..55c46ad
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc
@@ -0,0 +1,141 @@
+// { dg-options "-std=gnu++11" }
+
+// 2014-03-27 Rüdiger Sonderfeld
+// test the hexadecimal floating point inserters (facet num_put)
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <iostream>
+#include <iomanip>
+#include <sstream>
+#include <limits>
+#include <testsuite_hooks.h>
+
+using namespace std;
+
+#ifndef _GLIBCXX_ASSERT
+#  define TEST_NUMPUT_VERBOSE 1
+#endif
+
+void
+test01()
+{
+  {
+    ostringstream os;
+    double d = 272.; // 0x1.1p+8;
+    cout << os.precision() << endl;
+    os << hexfloat << setprecision(1);
+    os << d;
+#ifdef TEST_NUMPUT_VERBOSE
+    cout << "got: " << os.str() << endl;
+#endif
+    VERIFY( os && os.str() == "0x1.1p+8" );
+    os.str("");
+    os << uppercase << d;
+#ifdef TEST_NUMPUT_VERBOSE
+    cout << "got: " << os.str() << endl;
+#endif
+    VERIFY( os && os.str() ==

Re: [GOOGLE] Refactor the LIPO fixup

2014-03-27 Thread Dehao Chen
On Wed, Mar 26, 2014 at 4:05 PM, Xinliang David Li davi...@google.com wrote:
 is cgraph_init_gid_map called after linking?

Oh, forgot that part. It's interesting that the test can pass without
another cgraph_init_gid_map call.

Patch updated. Retested and the performance is OK.

Dehao


 David

 On Wed, Mar 26, 2014 at 3:54 PM, Dehao Chen de...@google.com wrote:
 Patch updated, passed performance tests.

 Dehao

 On Tue, Mar 25, 2014 at 4:03 PM, Xinliang David Li davi...@google.com 
 wrote:
 Add comment to the new function. init_node_map is better invoked after
  the link step to avoid creating entries for dead nodes.

 Ok if large perf testing is fine.

 David

 On Tue, Mar 25, 2014 at 3:38 PM, Dehao Chen de...@google.com wrote:
 This patch refactors LIPO fixup related code to move it into a
 standalone function. This makes sure that
 symtab_remove_unreachable_nodes is called right after the fixup so
  that there are no dangling cgraph nodes at any time.

 Bootstrapped and regression test on-going.

 OK for google-4_8?

 Thanks,
 Dehao
Index: gcc/cgraphbuild.c
===
--- gcc/cgraphbuild.c   (revision 208869)
+++ gcc/cgraphbuild.c   (working copy)
@@ -244,9 +244,6 @@ add_fake_indirect_call_edges (struct cgraph_node *
   if (!L_IPO_COMP_MODE)
 return;
 
-  if (cgraph_pre_profiling_inlining_done)
-return;
-
   ic_counts
     = get_coverage_counts_no_warn (DECL_STRUCT_FUNCTION (node->symbol.decl),
				    GCOV_COUNTER_ICALL_TOPNV, &n_counts);
@@ -599,7 +596,7 @@ record_references_in_initializer (tree decl, bool
needs to be set to the resolved node so that ipa-inline
sees the definitions.  */
 #include gimple-pretty-print.h
-void
+static void
 lipo_fixup_cgraph_edge_call_target (gimple stmt)
 {
   tree decl;
@@ -625,6 +622,58 @@ lipo_fixup_cgraph_edge_call_target (gimple stmt)
 }
 }
 
+/* Link the cgraph nodes, varpool nodes and fixup the call target to
+   the correct decl. Remove dead functions.  */
+
+
+void
+lipo_link_and_fixup ()
+{
+  struct cgraph_node *node;
+
+  cgraph_pre_profiling_inlining_done = true;
+  cgraph_process_module_scope_statics ();
+  /* Now perform link to allow cross module inlining.  */
+  cgraph_do_link ();
+  varpool_do_link ();
+  cgraph_unify_type_alias_sets ();
+  cgraph_init_gid_map ();
+ 
+  FOR_EACH_DEFINED_FUNCTION (node)
+{
+  if (!gimple_has_body_p (node->symbol.decl))
+   continue;
+
+  /* Don't profile functions produced for builtin stuff.  */
+  if (DECL_SOURCE_LOCATION (node->symbol.decl) == BUILTINS_LOCATION)
+   continue;
+
+  push_cfun (DECL_STRUCT_FUNCTION (node->symbol.decl));
+
+  if (L_IPO_COMP_MODE)
+   {
+ basic_block bb;
+ FOR_EACH_BB (bb)
+   {
+ gimple_stmt_iterator gsi;
+	 for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ gimple stmt = gsi_stmt (gsi);
+ if (is_gimple_call (stmt))
+   lipo_fixup_cgraph_edge_call_target (stmt);
+   }
+   }
+ update_ssa (TODO_update_ssa);
+   }
+  rebuild_cgraph_edges ();
+  pop_cfun ();
+}
+
+  cgraph_add_fake_indirect_call_edges ();
+  symtab_remove_unreachable_nodes (true, dump_file);
+}
+
+
 /* Rebuild cgraph edges for current function node.  This needs to be run after
passes that don't update the cgraph.  */
 
@@ -677,7 +726,8 @@ rebuild_cgraph_edges (void)
   mark_load, mark_store, mark_address);
 }
 
-  add_fake_indirect_call_edges (node);
+  if (!cgraph_pre_profiling_inlining_done)
+add_fake_indirect_call_edges (node);
   record_eh_tables (node, cfun);
   gcc_assert (!node->global.inlined_to);
 
Index: gcc/l-ipo.h
===
--- gcc/l-ipo.h (revision 208869)
+++ gcc/l-ipo.h (working copy)
@@ -60,7 +60,7 @@ void add_decl_to_current_module_scope (tree decl,
 int lipo_cmp_type (tree t1, tree t2);
 tree get_type_or_decl_name (tree);
 int equivalent_struct_types_for_tbaa (const_tree t1, const_tree t2);
-void lipo_fixup_cgraph_edge_call_target (gimple);
+void lipo_link_and_fixup (void);
 extern void copy_defined_module_set (tree, tree);
 extern bool is_parsing_done_p (void);
 extern const char* get_module_name (unsigned int);
Index: gcc/tree-profile.c
===
--- gcc/tree-profile.c  (revision 208869)
+++ gcc/tree-profile.c  (working copy)
@@ -1118,19 +1118,12 @@ tree_profiling (void)
   /* This is a small-ipa pass that gets called only once, from
  cgraphunit.c:ipa_passes().  */
   gcc_assert (cgraph_state == CGRAPH_STATE_IPA_SSA);
-
   /* After value profile transformation, artificial edges (that keep
  function body from being deleted) won't be needed.  */
+  if (L_IPO_COMP_MODE)
+lipo_link_and_fixup ();
+  init_node_map ();
 
-  

Re: [gomp4] Add tables generation

2014-03-27 Thread Ilya Verbin
On 27 Mar 15:02, Jakub Jelinek wrote:
 The tables need to be created before IPA, that way it really shouldn't
 matter in what order you emit them.  E.g. the outlined target functions
 could be added to the table during ompexp pass which actually creates the
 outlined functions, the vars need to be added before target lto or host lto
 is streamed.

For host tables it's ok, but when target compiler will create tables with 
functions?
It reads bytecode from target_lto sections, so it never executes ompexp pass.

  -- Ilya


Re: [RFA][PATCH][pr target/60648] Fix non-canonical RTL from x86 backend -- P1 regression

2014-03-27 Thread Jeff Law

On 03/27/14 07:51, Jakub Jelinek wrote:

On Wed, Mar 26, 2014 at 09:53:47PM +, Richard Sandiford wrote:

Richard Henderson r...@redhat.com writes:

On 03/26/2014 12:40 PM, Jakub Jelinek wrote:

On Wed, Mar 26, 2014 at 01:32:44PM -0600, Jeff Law wrote:

On 03/26/14 12:28, Jakub Jelinek wrote:

(mult:SI (const_int 0) (const_int 4)) is IMHO far from being canonical.
And, I'd say it is likely other target legitimization hooks would also try
to simplify it similarly.
simplify_gen_binary is used in several other places during expansion,
so I don't see why it couldn't be desirable here.

No particular reason.  I'll try that since we disagree about the
validity of the RTL and we can both agree that using
simplify_gen_binary is reasonable.


Other possibility if you want to change it in the i386.c legitimize_address
hook would be IMHO using force_reg instead of force_operand, it should be
the same thing in most cases, except for these corner cases, and there would
be no need to canonicalize anything afterwards.
But, if the i?86 maintainers feel otherwise on this and think your patch is
ok, I don't feel that strongly about this.


I like this as a solution.  Let the combiner clean things up if it's
gotten so far.


Did you mean Jeff's original change, or say:
--- gcc/config/i386/i386.c  2014-03-20 17:41:45.917689676 +0100
+++ gcc/config/i386/i386.c  2014-03-27 14:47:21.876254288 +0100
@@ -13925,13 +13925,13 @@ ix86_legitimize_address (rtx x, rtx oldx
if (GET_CODE (XEXP (x, 0)) == MULT)
{
  changed = 1;
- XEXP (x, 0) = force_operand (XEXP (x, 0), 0);
+ XEXP (x, 0) = copy_addr_to_reg (XEXP (x, 0));
}

if (GET_CODE (XEXP (x, 1)) == MULT)
{
  changed = 1;
- XEXP (x, 1) = force_operand (XEXP (x, 1), 0);
+ XEXP (x, 1) = copy_addr_to_reg (XEXP (x, 1));
}

if (changed
(or copy_to_reg, should be the same thing).
copy_addr_to_reg is probably better since it forces us into Pmode (which 
is useful if we had a mode-less constant).
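
(For the record, the distinction being relied on, sketched from the two
helpers' general contracts rather than from this patch:

  rtx r1 = copy_to_reg (x);       /* fresh pseudo in GET_MODE (x)  */
  rtx r2 = copy_addr_to_reg (x);  /* fresh pseudo, always Pmode    */

so copy_addr_to_reg also pins down the mode when x is a constant.)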


Both the simplify_gen_binary change and the copy_addr_to_reg change 
independently fix this problem.  I'll do a regression run with both of 
them installed for completeness.


jeff



Re: [RFA][PATCH][pr target/60648] Fix non-canonical RTL from x86 backend -- P1 regression

2014-03-27 Thread Jakub Jelinek
On Thu, Mar 27, 2014 at 10:17:26AM -0600, Jeff Law wrote:
 Did you mean Jeff's original change, or say:
 --- gcc/config/i386/i386.c   2014-03-20 17:41:45.917689676 +0100
 +++ gcc/config/i386/i386.c   2014-03-27 14:47:21.876254288 +0100
 @@ -13925,13 +13925,13 @@ ix86_legitimize_address (rtx x, rtx oldx
 if (GET_CODE (XEXP (x, 0)) == MULT)
  {
changed = 1;
 -  XEXP (x, 0) = force_operand (XEXP (x, 0), 0);
 +  XEXP (x, 0) = copy_addr_to_reg (XEXP (x, 0));
  }
 
 if (GET_CODE (XEXP (x, 1)) == MULT)
  {
changed = 1;
 -  XEXP (x, 1) = force_operand (XEXP (x, 1), 0);
 +  XEXP (x, 1) = copy_addr_to_reg (XEXP (x, 1));
  }
 
 if (changed
 (or copy_to_reg, should be the same thing).
 copy_addr_to_reg is probably better since it forces us into Pmode
 (which is useful if we had a mode-less constant).

Well, but in both of these cases you know that what you pass in is
a MULT and thus never mode-less.  That said, copy_addr_to_reg has the
advantage that it will ICE if the MULT isn't Pmode, but that really should
never happen for addresses, so not a big difference.

Jakub


Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)

2014-03-27 Thread Jason Merrill

On 03/27/2014 01:42 AM, Jan Hubicka wrote:

I believe the problem here is the _vptr.MultiTermDocs vtable is initialized from
VTT that is not understood by ipa-prop jump functions.


Makes sense.  It would be good to update those functions to understand 
that the initialization is always setting the vptr to a construction 
vtable for MultiTermDocs (in some derived class).


Jason



Re: [PATCH v2] libstdc++: Add hexfloat/defaultfloat io manipulators.

2014-03-27 Thread Jonathan Wakely

On 27/03/14 17:00 +0100, Rüdiger Sonderfeld wrote:

Hello Jonathan,

thanks for your comments.


N.B. patches to the ChangeLog rarely apply cleanly (because someone
else may have changed the ChangeLog since the patch was created) so
the convention is to send the ChangeLog entry in the email body, or as
a separate attachment, or by using 'git log -p @{u}..' so the commit
log shows it, rather than as part of the patch.


Yes, ChangeLog's can be a bit of a pain.  I removed the ChangeLog from the
patch.  But FYI there is a ChangeLog merge driver hidden in gnulib, which
can be helpful when dealing with ChangeLog files in git (and potentially
other version control systems)

http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/git-merge-changelog.c


Yes, I use that myself, and generate patches with the 'git lgp'
command shown at http://gcc.gnu.org/wiki/GitMirror


We could document that (fixed|scientific) has that effect in c++98
mode.


Where should it be documented?


Probably somewhere in doc/xml/manual/io.xml but I'm happy to do that
once the patch is committed if you like.

Thanks for the updated patch, I will try to remember to commit it when
stage 1 starts. If you don't get a mail from me then please ping me as
a reminder.
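
For anyone writing that documentation, the C++98-mode spelling being
discussed looks like this (a sketch assuming the patched num_put behaviour
on a C99-enabled target; the exact output string is illustrative):

  #include <sstream>
  #include <iostream>

  int main()
  {
    std::ostringstream os;
    // No std::hexfloat manipulator before C++11; setting both flags
    // at once selects the %a/%A hex-float format with this patch.
    os.setf(std::ios_base::fixed | std::ios_base::scientific,
            std::ios_base::floatfield);
    os.precision(1);
    os << 272.0;                  // e.g. "0x1.1p+8"
    std::cout << os.str() << '\n';
  }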




Re: [PATCH] Fix PR c++/60573

2014-03-27 Thread Jason Merrill

On 03/26/2014 09:12 PM, Adam Butcher wrote:

+Note: cp_binding_level::class_shadowed is used as a predicate to
+indicate whether a class scope is a class-defining scope.  We stop
+at the first such scope as this will be the currently open class
+definition into which the function being declared will be appended;
+and therefore the scope into which the synthesized template
+parameter list for the declarator should be injected.  */
+
+ while (scope-kind == sk_class  !scope-class_shadowed)


That doesn't seem reliable either, unfortunately; class_shadowed is 
populated when names are looked up, so a declarator that refers to a 
type member of B will cause scope-class_shadowed to be non-null.


Jason



[AArch64/ARM 0/3] Patch series for UZP intrinsics

2014-03-27 Thread Alan Lawrence

Hi,

Much like the zip intrinsics, the vuzp_* intrinsics are implemented with inline
ASM, which prevents compiler analysis. This series replaces those with calls to
__builtin_shuffle, which produce the same** assembler instructions.

(**except for two-element vectors where UZP and ZIP are equivalent and the
backend outputs ZIP.)
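
As a sketch of the rewrite pattern (generic GCC vector extensions, not the
actual arm_neon.h text from the second patch):

  /* UZP1/UZP2 select the even/odd-indexed elements of the double-width
     concatenation of a and b; __builtin_shuffle expresses exactly that.  */
  typedef short v4hi __attribute__ ((vector_size (8)));

  v4hi uzp1 (v4hi a, v4hi b)
  { return __builtin_shuffle (a, b, (v4hi) {0, 2, 4, 6}); }
  v4hi uzp2 (v4hi a, v4hi b)
  { return __builtin_shuffle (a, b, (v4hi) {1, 3, 5, 7}); }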

First patch adds a bunch of tests, passing for the current asm implementation;
Second patch reimplements with __builtin_shuffle;
Third patch adds equivalent ARM tests using test bodies shared from first patch.

OK for stage 1?

Cheers, Alan




[AArch64/ARM 1/3] Add execution + assembler tests of AArch64 UZP Intrinsics

2014-03-27 Thread Alan Lawrence
This adds DejaGNU tests of the existing AArch64 vuzp_* intrinsics, both checking 
the assembler output and the runtime results. Test bodies are in separate files 
ready to reuse for ARM in the third patch.


Putting these in a new subdirectory with the ZIP Intrinsic tests, using simd.exp 
added there (will commit ZIP tests first).


All tests passing on aarch64-none-elf and aarch64_be-none-elf.

testsuite/ChangeLog:
2014-03-27  Alan Lawrence  alan.lawre...@arm.com

* gcc.target/aarch64/simd/vuzpf32_1.c: New file.
* gcc.target/aarch64/simd/vuzpf32.x: New file.
* gcc.target/aarch64/simd/vuzpp16_1.c: New file.
* gcc.target/aarch64/simd/vuzpp16.x: New file.
* gcc.target/aarch64/simd/vuzpp8_1.c: New file.
* gcc.target/aarch64/simd/vuzpp8.x: New file.
* gcc.target/aarch64/simd/vuzpqf32_1.c: New file.
* gcc.target/aarch64/simd/vuzpqf32.x: New file.
* gcc.target/aarch64/simd/vuzpqp16_1.c: New file.
* gcc.target/aarch64/simd/vuzpqp16.x: New file.
* gcc.target/aarch64/simd/vuzpqp8_1.c: New file.
* gcc.target/aarch64/simd/vuzpqp8.x: New file.
* gcc.target/aarch64/simd/vuzpqs16_1.c: New file.
* gcc.target/aarch64/simd/vuzpqs16.x: New file.
* gcc.target/aarch64/simd/vuzpqs32_1.c: New file.
* gcc.target/aarch64/simd/vuzpqs32.x: New file.
* gcc.target/aarch64/simd/vuzpqs8_1.c: New file.
* gcc.target/aarch64/simd/vuzpqs8.x: New file.
* gcc.target/aarch64/simd/vuzpqu16_1.c: New file.
* gcc.target/aarch64/simd/vuzpqu16.x: New file.
* gcc.target/aarch64/simd/vuzpqu32_1.c: New file.
* gcc.target/aarch64/simd/vuzpqu32.x: New file.
* gcc.target/aarch64/simd/vuzpqu8_1.c: New file.
* gcc.target/aarch64/simd/vuzpqu8.x: New file.
* gcc.target/aarch64/simd/vuzps16_1.c: New file.
* gcc.target/aarch64/simd/vuzps16.x: New file.
* gcc.target/aarch64/simd/vuzps32_1.c: New file.
* gcc.target/aarch64/simd/vuzps32.x: New file.
* gcc.target/aarch64/simd/vuzps8_1.c: New file.
* gcc.target/aarch64/simd/vuzps8.x: New file.
* gcc.target/aarch64/simd/vuzpu16_1.c: New file.
* gcc.target/aarch64/simd/vuzpu16.x: New file.
* gcc.target/aarch64/simd/vuzpu32_1.c: New file.
* gcc.target/aarch64/simd/vuzpu32.x: New file.
* gcc.target/aarch64/simd/vuzpu8_1.c: New file.
* gcc.target/aarch64/simd/vuzpu8.x: New file.diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vuzpf32.x b/gcc/testsuite/gcc.target/aarch64/simd/vuzpf32.x
new file mode 100644
index 000..86c3700
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vuzpf32.x
@@ -0,0 +1,26 @@
+extern void abort (void);
+
+float32x2x2_t
+test_vuzpf32 (float32x2_t _a, float32x2_t _b)
+{
+  return vuzp_f32 (_a, _b);
+}
+
+int
+main (int argc, char **argv)
+{
+  int i;
+  float32_t first[] = {1, 2};
+  float32_t second[] = {3, 4};
+  float32x2x2_t result = test_vuzpf32 (vld1_f32 (first), vld1_f32 (second));
+  float32_t exp1[] = {1, 3};
+  float32_t exp2[] = {2, 4};
+  float32x2_t expect1 = vld1_f32 (exp1);
+  float32x2_t expect2 = vld1_f32 (exp2);
+
+  for (i = 0; i < 2; i++)
+if ((result.val[0][i] != expect1[i]) || (result.val[1][i] != expect2[i]))
+  abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vuzpf32_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vuzpf32_1.c
new file mode 100644
index 000..fedee93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vuzpf32_1.c
@@ -0,0 +1,11 @@
+/* Test the `vuzp_f32' AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-save-temps -fno-inline" } */
+
+#include <arm_neon.h>
+#include "vuzpf32.x"
+
+/* { dg-final { scan-assembler-times "uzp1\[ \t\]+v\[0-9\]+\.2s, ?v\[0-9\]+\.2s, ?v\[0-9\]+\.2s!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { scan-assembler-times "uzp2\[ \t\]+v\[0-9\]+\.2s, ?v\[0-9\]+\.2s, ?v\[0-9\]+\.2s!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vuzpp16.x b/gcc/testsuite/gcc.target/aarch64/simd/vuzpp16.x
new file mode 100644
index 000..bc45efc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vuzpp16.x
@@ -0,0 +1,26 @@
+extern void abort (void);
+
+poly16x4x2_t
+test_vuzpp16 (poly16x4_t _a, poly16x4_t _b)
+{
+  return vuzp_p16 (_a, _b);
+}
+
+int
+main (int argc, char **argv)
+{
+  int i;
+  poly16_t first[] = {1, 2, 3, 4};
+  poly16_t second[] = {5, 6, 7, 8};
+  poly16x4x2_t result = test_vuzpp16 (vld1_p16 (first), vld1_p16 (second));
+  poly16_t exp1[] = {1, 3, 5, 7};
+  poly16_t exp2[] = {2, 4, 6, 8};
+  poly16x4_t expect1 = vld1_p16 (exp1);
+  poly16x4_t expect2 = vld1_p16 (exp2);
+
+  for (i = 0; i < 4; i++)
+if ((result.val[0][i] != expect1[i]) || (result.val[1][i] != expect2[i]))
+  abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vuzpp16_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vuzpp16_1.c
new file mode 100644
index 000..03b0722
--- /dev/null
+++ 

[AArch64/ARM 2/3] Rewrite AArch64 UZP Intrinsics using __builtin_shuffle

2014-03-27 Thread Alan Lawrence
This patch replaces the temporary inline assembler for vuzp_* in arm_neon.h with 
equivalent calls to __builtin_shuffle.  These are matched by 
aarch64_expand_vec_perm_const{,_1} to output (generally) the same assembler 
instructions.  That is, except for two-element vectors, where ZIP, UZP and TRN 
instructions all have the same effect; gcc's backend chooses to output ZIP so 
this patch also updates the 3 affected tests.


Regressed, and tests from first patch still passing modulo updates herein, on 
aarch64-none-elf and aarch64_be-none-elf.
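
To see why the two-element cases change: for 2-element inputs the UZP1 and
ZIP1 lane selections coincide, so the backend may print either mnemonic.
A sketch in generic vector-extension C (not the patch text):

  typedef unsigned int v2si __attribute__ ((vector_size (8)));

  /* uzp1 takes the even lanes {0, 2} of (a, b); zip1 interleaves
     a[0] with b[0] -- for two elements that is the same selection.  */
  v2si uzp1_or_zip1 (v2si a, v2si b)
  {
    return __builtin_shuffle (a, b, (v2si) {0, 2});
  }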


gcc/testsuite/ChangeLog:
2014-03-27  Alan Lawrence  alan.lawre...@arm.com

* gcc.target/aarch64/vuzps32_1.c: Expect zip1/2 insn rather than uzp1/2.
* gcc.target/aarch64/vuzpu32_1.c: Likewise.
* gcc.target/aarch64/vuzpf32_1.c: Likewise.

gcc/ChangeLog:
2014-03-27  Alan Lawrence  alan.lawre...@arm.com

* config/aarch64/arm_neon.h (vuzp1_f32, vuzp1_p8, vuzp1_p16, vuzp1_s8,
vuzp1_s16, vuzp1_s32, vuzp1_u8, vuzp1_u16, vuzp1_u32, vuzp1q_f32,
vuzp1q_f64, vuzp1q_p8, vuzp1q_p16, vuzp1q_s8, vuzp1q_s16, vuzp1q_s32,
vuzp1q_s64, vuzp1q_u8, vuzp1q_u16, vuzp1q_u32, vuzp1q_u64, vuzp2_f32,
vuzp2_p8, vuzp2_p16, vuzp2_s8, vuzp2_s16, vuzp2_s32, vuzp2_u8,
vuzp2_u16, vuzp2_u32, vuzp2q_f32, vuzp2q_f64, vuzp2q_p8, vuzp2q_p16,
vuzp2q_s8, vuzp2q_s16, vuzp2q_s32, vuzp2q_s64, vuzp2q_u8, vuzp2q_u16,
vuzp2q_u32, vuzp2q_u64): Replace temporary asm with __builtin_shuffle.diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6af99361..efbba09 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -12952,467 +12952,6 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b)
: /* No clobbers */);
   return result;
 }
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vuzp1_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("uzp1 %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vuzp1_p8 (poly8x8_t a, poly8x8_t b)
-{
-  poly8x8_t result;
-  __asm__ ("uzp1 %0.8b,%1.8b,%2.8b"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vuzp1_p16 (poly16x4_t a, poly16x4_t b)
-{
-  poly16x4_t result;
-  __asm__ ("uzp1 %0.4h,%1.4h,%2.4h"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vuzp1_s8 (int8x8_t a, int8x8_t b)
-{
-  int8x8_t result;
-  __asm__ ("uzp1 %0.8b,%1.8b,%2.8b"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vuzp1_s16 (int16x4_t a, int16x4_t b)
-{
-  int16x4_t result;
-  __asm__ ("uzp1 %0.4h,%1.4h,%2.4h"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vuzp1_s32 (int32x2_t a, int32x2_t b)
-{
-  int32x2_t result;
-  __asm__ ("uzp1 %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vuzp1_u8 (uint8x8_t a, uint8x8_t b)
-{
-  uint8x8_t result;
-  __asm__ ("uzp1 %0.8b,%1.8b,%2.8b"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vuzp1_u16 (uint16x4_t a, uint16x4_t b)
-{
-  uint16x4_t result;
-  __asm__ ("uzp1 %0.4h,%1.4h,%2.4h"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vuzp1_u32 (uint32x2_t a, uint32x2_t b)
-{
-  uint32x2_t result;
-  __asm__ ("uzp1 %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vuzp1q_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("uzp1 %0.4s,%1.4s,%2.4s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vuzp1q_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("uzp1 %0.2d,%1.2d,%2.2d"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vuzp1q_p8 

[AArch64/ARM 3/3] Add execution tests of ARM UZP Intrinsics

2014-03-27 Thread Alan Lawrence
Final patch in series, adds new tests of the ARM UZP Intrinsics (subsuming the 
autogenerated ones in testsuite/gcc.target/arm/neon/), that also check the 
execution results, reusing the test bodies introduced into AArch64 in the first 
patch.


Tests use gcc.target/arm/simd/simd.exp from corresponding patch for ZIP 
Intrinsics, will commit that first.


All tests passing on arm-none-eabi.

gcc/testsuite/ChangeLog:
2014-03-27  Alan Lawrence  alan.lawre...@arm.com

* gcc.target/arm/simd/vuzpqf32_1.c: New file.
* gcc.target/arm/simd/vuzpqp16_1.c: New file.
* gcc.target/arm/simd/vuzpqp8_1.c: New file.
* gcc.target/arm/simd/vuzpqs16_1.c: New file.
* gcc.target/arm/simd/vuzpqs32_1.c: New file.
* gcc.target/arm/simd/vuzpqs8_1.c: New file.
* gcc.target/arm/simd/vuzpqu16_1.c: New file.
* gcc.target/arm/simd/vuzpqu32_1.c: New file.
* gcc.target/arm/simd/vuzpqu8_1.c: New file.
* gcc.target/arm/simd/vuzpf32_1.c: New file.
* gcc.target/arm/simd/vuzpp16_1.c: New file.
* gcc.target/arm/simd/vuzpp8_1.c: New file.
* gcc.target/arm/simd/vuzps16_1.c: New file.
* gcc.target/arm/simd/vuzps32_1.c: New file.
* gcc.target/arm/simd/vuzps8_1.c: New file.
* gcc.target/arm/simd/vuzpu16_1.c: New file.
* gcc.target/arm/simd/vuzpu32_1.c: New file.
* gcc.target/arm/simd/vuzpu8_1.c: New file.

diff --git a/gcc/testsuite/gcc.target/arm/simd/vuzpf32_1.c b/gcc/testsuite/gcc.target/arm/simd/vuzpf32_1.c
new file mode 100644
index 000..723c86a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vuzpf32_1.c
@@ -0,0 +1,12 @@
+/* Test the `vuzpf32' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vuzpf32.x"
+
+/* { dg-final { scan-assembler-times "vuzp\.32\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vuzpp16_1.c b/gcc/testsuite/gcc.target/arm/simd/vuzpp16_1.c
new file mode 100644
index 000..c7ad757
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vuzpp16_1.c
@@ -0,0 +1,12 @@
+/* Test the `vuzpp16' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vuzpp16.x"
+
+/* { dg-final { scan-assembler-times "vuzp\.16\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vuzpp8_1.c b/gcc/testsuite/gcc.target/arm/simd/vuzpp8_1.c
new file mode 100644
index 000..670b550
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vuzpp8_1.c
@@ -0,0 +1,12 @@
+/* Test the `vuzpp8' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vuzpp8.x"
+
+/* { dg-final { scan-assembler-times "vuzp\.8\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vuzpqf32_1.c b/gcc/testsuite/gcc.target/arm/simd/vuzpqf32_1.c
new file mode 100644
index 000..53147f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vuzpqf32_1.c
@@ -0,0 +1,12 @@
+/* Test the `vuzpQf32' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vuzpqf32.x"
+
+/* { dg-final { scan-assembler-times "vuzp\.32\[ \t\]+\[qQ\]\[0-9\]+, ?\[qQ\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vuzpqp16_1.c b/gcc/testsuite/gcc.target/arm/simd/vuzpqp16_1.c
new file mode 100644
index 000..feef15a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vuzpqp16_1.c
@@ -0,0 +1,12 @@
+/* Test the `vuzpQp16' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vuzpqp16.x"
+
+/* { dg-final { scan-assembler-times "vuzp\.16\[ \t\]+\[qQ\]\[0-9\]+, ?\[qQ\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vuzpqp8_1.c b/gcc/testsuite/gcc.target/arm/simd/vuzpqp8_1.c
new file mode 100644
index 000..db98f35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vuzpqp8_1.c
@@ -0,0 +1,12 @@
+/* Test the 

[AArch64 costs 3/18] Wrap aarch64_rtx_costs to dump verbose output

2014-03-27 Thread James Greenhalgh

Hi,

The rtx_costs implementation in the ARM backend dumps the partial
or total computed cost of an insn as it goes along. This
functionality proves useful when debugging aarch64_rtx_costs,
so we should port it across.

Tested on aarch64-none-elf in series with no issues.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.c (aarch64_rtx_costs_wrapper): New.
(TARGET_RTX_COSTS): Call it.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8c261ca..1d19ed3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -63,6 +63,7 @@
 #include cfgloop.h
 #include tree-vectorizer.h
 #include config/arm/aarch-cost-tables.h
+#include dumpfile.h
 
 /* Defined for convenience.  */
 #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
@@ -4978,6 +4979,26 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
   return false;
 }
 
+/* Wrapper around aarch64_rtx_costs, dumps the partial, or total cost
+   calculated for X.  This cost is stored in *COST.  Returns true
+   if the total cost of X was calculated.  */
+static bool
+aarch64_rtx_costs_wrapper (rtx x, int code, int outer,
+		   int param, int *cost, bool speed)
+{
+  bool result = aarch64_rtx_costs (x, code, outer, param, cost, speed);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  print_rtl_single (dump_file, x);
+      fprintf (dump_file, "\n%s cost: %d (%s)\n",
+	       speed ? "Hot" : "Cold",
+	       *cost, result ? "final" : "partial");
+}
+
+  return result;
+}
+
 static int
 aarch64_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
 			reg_class_t from, reg_class_t to)

[AArch64 costs 2/18] Add cost tables for Cortex-A57

2014-03-27 Thread James Greenhalgh

Hi,

This patch wires up the address and vector cost tables for the
Cortex-A57 processor.

Tested on aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.c (cortexa57_addrcost_table): New.
(cortexa57_vector_cost): Likewise.
(cortexa57_tunings): Use them.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2a2e2e9..8c261ca 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -190,6 +190,27 @@ static const struct cpu_addrcost_table generic_addrcost_table =
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
 #endif
+static const struct cpu_addrcost_table cortexa57_addrcost_table =
+{
+#if HAVE_DESIGNATED_INITIALIZERS
+  .addr_scale_costs =
+#endif
+{
+  NAMED_PARAM (qi, 0),
+  NAMED_PARAM (hi, 1),
+  NAMED_PARAM (si, 0),
+  NAMED_PARAM (ti, 1),
+},
+  NAMED_PARAM (pre_modify, 0),
+  NAMED_PARAM (post_modify, 0),
+  NAMED_PARAM (register_offset, 0),
+  NAMED_PARAM (register_extend, 0),
+  NAMED_PARAM (imm_offset, 0),
+};
+
+#if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
+__extension__
+#endif
 static const struct cpu_regmove_cost generic_regmove_cost =
 {
   NAMED_PARAM (GP2GP, 1),
@@ -221,6 +242,26 @@ static const struct cpu_vector_cost generic_vector_cost =
   NAMED_PARAM (cond_not_taken_branch_cost, 1)
 };
 
+/* Generic costs for vector insn classes.  */
+#if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
+__extension__
+#endif
+static const struct cpu_vector_cost cortexa57_vector_cost =
+{
+  NAMED_PARAM (scalar_stmt_cost, 1),
+  NAMED_PARAM (scalar_load_cost, 4),
+  NAMED_PARAM (scalar_store_cost, 1),
+  NAMED_PARAM (vec_stmt_cost, 3),
+  NAMED_PARAM (vec_to_scalar_cost, 8),
+  NAMED_PARAM (scalar_to_vec_cost, 8),
+  NAMED_PARAM (vec_align_load_cost, 5),
+  NAMED_PARAM (vec_unalign_load_cost, 5),
+  NAMED_PARAM (vec_unalign_store_cost, 1),
+  NAMED_PARAM (vec_store_cost, 1),
+  NAMED_PARAM (cond_taken_branch_cost, 1),
+  NAMED_PARAM (cond_not_taken_branch_cost, 1)
+};
+
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
 #endif
@@ -247,9 +288,9 @@ static const struct tune_params cortexa53_tunings =
 static const struct tune_params cortexa57_tunings =
 {
   cortexa57_extra_costs,
-  generic_addrcost_table,
+  cortexa57_addrcost_table,
   generic_regmove_cost,
-  generic_vector_cost,
+  cortexa57_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
   NAMED_PARAM (issue_rate, 3)
 };

[AArch64 costs 1/18] Refactor aarch64_address_costs.

2014-03-27 Thread James Greenhalgh

Hi,

The address cost function, as it stands, is fairly limited.

Rather than write a lot of new walking code, we would instead
like to reuse some of the existing infrastructure which can
categorize an address.

This patch therefore rewires aarch64_address_costs to use
aarch64_classify_address, which drastically simplifies the
function.

We can also take advantage of the higher level of detail
returned by address classification, allowing us to cost fast
and slow register shifts differently.

Tested in series on aarch64-none-elf with no regressions.

OK for stage 1?

Thanks,
James

---
gcc/

2014-03-27  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64-protos.h (scale_addr_mode_cost): New.
(cpu_addrcost_table): Use it.
* config/aarch64/aarch64.c (generic_addrcost_table): Initialize it.
(aarch64_address_cost): Rewrite using aarch64_classify_address,
move it.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 5542f02..cdea6a4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -108,9 +108,22 @@ enum aarch64_symbol_type
cost models and vectors for address cost calculations, register
move costs and memory move costs.  */
 
+/* Scaled addressing modes can vary cost depending on the mode of the
+   value to be loaded/stored.  QImode values cannot use scaled
+   addressing modes.  */
+
+struct scale_addr_mode_cost
+{
+  const int hi;
+  const int si;
+  const int di;
+  const int ti;
+};
+
 /* Additional cost for addresses.  */
 struct cpu_addrcost_table
 {
+  const struct scale_addr_mode_cost addr_scale_costs;
   const int pre_modify;
   const int post_modify;
   const int register_offset;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3c06d92..2a2e2e9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -171,11 +171,20 @@ __extension__
 #endif
 static const struct cpu_addrcost_table generic_addrcost_table =
 {
+#if HAVE_DESIGNATED_INITIALIZERS
+  .addr_scale_costs =
+#endif
+{
+  NAMED_PARAM (qi, 0),
+  NAMED_PARAM (hi, 0),
+  NAMED_PARAM (si, 0),
+  NAMED_PARAM (ti, 0),
+},
   NAMED_PARAM (pre_modify, 0),
   NAMED_PARAM (post_modify, 0),
   NAMED_PARAM (register_offset, 0),
   NAMED_PARAM (register_extend, 0),
-  NAMED_PARAM (imm_offset, 0)
+  NAMED_PARAM (imm_offset, 0),
 };
 
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
@@ -4515,6 +4524,101 @@ aarch64_strip_shift_or_extend (rtx x)
   return aarch64_strip_shift (x);
 }
 
+static int
+aarch64_address_cost (rtx x,
+		  enum machine_mode mode,
+		  addr_space_t as ATTRIBUTE_UNUSED,
+		  bool speed)
+{
+  enum rtx_code c = GET_CODE (x);
+  const struct cpu_addrcost_table *addr_cost = aarch64_tune_params->addr_cost;
+  struct aarch64_address_info info;
+  int cost = 0;
+  info.shift = 0;
+
+  if (!aarch64_classify_address (&info, x, mode, c, false))
+{
+  if (GET_CODE (x) == CONST || GET_CODE (x) == SYMBOL_REF)
+	{
+	  /* This is a CONST or SYMBOL ref which will be split
+	 in a different way depending on the code model in use.
+	 Cost it through the generic infrastructure.  */
+	  int cost_symbol_ref = rtx_cost (x, MEM, 1, speed);
+	  /* Divide through by the cost of one instruction to
+	 bring it to the same units as the address costs.  */
+	  cost_symbol_ref /= COSTS_N_INSNS (1);
+	  /* The cost is then the cost of preparing the address,
+	 followed by an immediate (possibly 0) offset.  */
+	  return cost_symbol_ref + addr_cost->imm_offset;
+	}
+  else
+	{
+	  /* This is most likely a jump table from a case
+	 statement.  */
+	  return addr_cost->register_offset;
+	}
+}
+
+  switch (info.type)
+{
+  case ADDRESS_LO_SUM:
+  case ADDRESS_SYMBOLIC:
+  case ADDRESS_REG_IMM:
+	cost += addr_cost->imm_offset;
+	break;
+
+  case ADDRESS_REG_WB:
+	if (c == PRE_INC || c == PRE_DEC || c == PRE_MODIFY)
+	  cost += addr_cost->pre_modify;
+	else if (c == POST_INC || c == POST_DEC || c == POST_MODIFY)
+	  cost += addr_cost->post_modify;
+	else
+	  gcc_unreachable ();
+
+	break;
+
+  case ADDRESS_REG_REG:
+	cost += addr_cost->register_offset;
+	break;
+
+  case ADDRESS_REG_UXTW:
+  case ADDRESS_REG_SXTW:
+	cost += addr_cost->register_extend;
+	break;
+
+  default:
+	gcc_unreachable ();
+}
+
+
+  if (info.shift > 0)
+{
+  /* For the sake of calculating the cost of the shifted register
+	 component, we can treat same sized modes in the same way.  */
+  switch (GET_MODE_BITSIZE (mode))
+	{
+	  case 16:
+	    cost += addr_cost->addr_scale_costs.hi;
+	    break;
+
+	  case 32:
+	    cost += addr_cost->addr_scale_costs.si;
+	    break;
+
+	  case 64:
+	    cost += addr_cost->addr_scale_costs.di;
+	    break;
+
+	  /* We can't tell, or this is a 128-bit vector.  */
+	  default:
+	    cost += addr_cost->addr_scale_costs.ti;
+	break;
+	}
+}
+
+  return 

[AArch64 costs 14/18] Cost comparisons, flag setting operators and IF_THEN_ELSE

2014-03-27 Thread James Greenhalgh
Hi,

Next, comparisons, flag setting operations and IF_THEN_ELSE.

Tested on aarch64-none-elf.

Ok for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Cost comparison
operators.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c6f1ac5..bdfcc55 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4833,7 +4833,7 @@ static bool
 aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 		   int param ATTRIBUTE_UNUSED, int *cost, bool speed)
 {
-  rtx op0, op1;
+  rtx op0, op1, op2;
   const struct cpu_cost_table *extra_cost
 = aarch64_tune_params->insn_extra_cost;
   enum machine_mode mode = GET_MODE (x);
@@ -5058,16 +5058,77 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	  goto cost_logic;
 	}
 
-  /* Comparisons can work if the order is swapped.
-	 Canonicalization puts the more complex operation first, but
-	 we want it in op1.  */
-  if (! (REG_P (op0)
-	 || (GET_CODE (op0) == SUBREG && REG_P (SUBREG_REG (op0)))))
-	{
-	  op0 = XEXP (x, 1);
-	  op1 = XEXP (x, 0);
-	}
-  goto cost_minus;
+  if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT)
+{
+  /* TODO: A write to the CC flags possibly costs extra, this
+	 needs encoding in the cost tables.  */
+
+  /* CC_ZESWPmode supports zero extend for free.  */
+  if (GET_MODE (x) == CC_ZESWPmode && GET_CODE (op0) == ZERO_EXTEND)
+op0 = XEXP (op0, 0);
+
+  /* ANDS.  */
+  if (GET_CODE (op0) == AND)
+{
+  x = op0;
+  goto cost_logic;
+}
+
+  if (GET_CODE (op0) == PLUS)
+{
+	  /* ADDS (and CMN alias).  */
+  x = op0;
+  goto cost_plus;
+}
+
+  if (GET_CODE (op0) == MINUS)
+{
+	  /* SUBS.  */
+  x = op0;
+  goto cost_minus;
+}
+
+  if (GET_CODE (op1) == NEG)
+{
+	  /* CMN.  */
+	  if (speed)
+		*cost += extra_cost->alu.arith;
+
+  *cost += rtx_cost (op0, COMPARE, 0, speed);
+	  *cost += rtx_cost (XEXP (op1, 0), NEG, 1, speed);
+  return true;
+}
+
+  /* CMP.
+
+	 Compare can freely swap the order of operands, and
+ canonicalization puts the more complex operation first.
+ But the integer MINUS logic expects the shift/extend
+ operation in op1.  */
+  if (! (REG_P (op0)
+	     || (GET_CODE (op0) == SUBREG && REG_P (SUBREG_REG (op0)))))
+  {
+op0 = XEXP (x, 1);
+op1 = XEXP (x, 0);
+  }
+  goto cost_minus;
+}
+
+  if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_FLOAT)
+{
+	  /* FCMP.  */
+	  if (speed)
+	    *cost += extra_cost->fp[mode == DFmode].compare;
+
+  if (CONST_DOUBLE_P (op1)  aarch64_float_const_zero_rtx_p (op1))
+{
+  /* FCMP supports constant 0.0 for no extra cost. */
+  return true;
+}
+  return false;
+}
+
+  return false;
 
 case MINUS:
   {
@@ -5138,6 +5199,7 @@ cost_minus:
 	op0 = XEXP (x, 0);
 	op1 = XEXP (x, 1);
 
+cost_plus:
 	if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
 	|| GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMM_COMPARE)
 	  {
@@ -5451,6 +5513,81 @@ cost_minus:
 	}
   return false;  /* All arguments need to be in registers.  */
 
+case IF_THEN_ELSE:
+  op2 = XEXP (x, 2);
+  op0 = XEXP (x, 0);
+  op1 = XEXP (x, 1);
+
+  if (GET_CODE (op1) == PC || GET_CODE (op2) == PC)
+{
+  /* Conditional branch.  */
+  if (GET_MODE_CLASS (GET_MODE (XEXP (op0, 0))) == MODE_CC)
+	return true;
+	  else
+	{
+	  if (GET_CODE (op0) == NE
+		  || GET_CODE (op0) == EQ)
+		{
+		  rtx inner = XEXP (op0, 0);
+		  rtx comparator = XEXP (op0, 1);
+
+		  if (comparator == const0_rtx)
+		{
+		  /* TBZ/TBNZ/CBZ/CBNZ.  */
+		  if (GET_CODE (inner) == ZERO_EXTRACT)
+			/* TBZ/TBNZ.  */
+			*cost += rtx_cost (XEXP (inner, 0), ZERO_EXTRACT,
+	   0, speed);
+		  else
+			/* CBZ/CBNZ.  */
+			*cost += rtx_cost (inner, GET_CODE (op0), 0, speed);
+
+		  return true;
+		}
+		}
+	  else if (GET_CODE (op0) == LT
+		   || GET_CODE (op0) == GE)
+		{
+		  rtx comparator = XEXP (op0, 1);
+
+		  /* TBZ/TBNZ.  */
+		  if (comparator == const0_rtx)
+		return true;
+		}
+	}
+}
+  else if (GET_MODE_CLASS (GET_MODE (XEXP (op0, 0))) == MODE_CC)
+{
+  /* It's a conditional operation based on the status flags,
+ so it must be some flavor of CSEL.  */
+
+  /* CSNEG, CSINV, and CSINC are handled for free as part of CSEL.  */
+  if (GET_CODE (op1) == NEG
+  

[AArch64 costs 4/18] Better estimate cost of building a constant

2014-03-27 Thread James Greenhalgh

Hi,

One thing we might want to be more accurate in costing is building
an integer from scratch.

To estimate this, we can repurpose aarch64_build_constant. If we
take an additional flag to decide whether we should actually emit
instructions, we can simply count the number of instructions we would
have spat out, and use that count.

This patch performs that modification, updates the existing
call-sites for aarch64_build_constant, and adds code to
aarch64_rtx_costs to cost a CONST_INT or a CONST_DOUBLE.

Tested in series on aarch64-none-elf with no issues.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_build_constant): Conditionally
emit instructions, return number of instructions which would
be emitted.
(aarch64_add_constant): Update call to aarch64_build_constant.
(aarch64_output_mi_thunk): Likewise.
(aarch64_rtx_costs): Estimate cost of a CONST_INT, cost
a CONST_DOUBLE.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1d19ed3..af947ca 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2503,12 +2503,22 @@ aarch64_final_eh_return_addr (void)
    - 2 * UNITS_PER_WORD));
 }
 
-/* Output code to build up a constant in a register.  */
-static void
-aarch64_build_constant (int regnum, HOST_WIDE_INT val)
+/* Possibly output code to build up a constant in a register.  For
+   the benefit of the costs infrastructure, returns the number of
+   instructions which would be emitted.  GENERATE inhibits or
+   enables code generation.  */
+
+static int
+aarch64_build_constant (int regnum, HOST_WIDE_INT val, bool generate)
 {
+  int insns = 0;
+
   if (aarch64_bitmask_imm (val, DImode))
-emit_move_insn (gen_rtx_REG (Pmode, regnum), GEN_INT (val));
+{
+  if (generate)
+	emit_move_insn (gen_rtx_REG (Pmode, regnum), GEN_INT (val));
+  insns = 1;
+}
   else
 {
   int i;
@@ -2539,15 +2549,19 @@ aarch64_build_constant (int regnum, HOST_WIDE_INT val)
 	 the same.  */
       if (ncount < zcount)
 	{
-	  emit_move_insn (gen_rtx_REG (Pmode, regnum),
-			  GEN_INT (val | ~(HOST_WIDE_INT) 0xffff));
+	  if (generate)
+	    emit_move_insn (gen_rtx_REG (Pmode, regnum),
+			    GEN_INT (val | ~(HOST_WIDE_INT) 0xffff));
 	  tval = 0xffff;
+	  insns++;
 	}
   else
 	{
-	  emit_move_insn (gen_rtx_REG (Pmode, regnum),
-			  GEN_INT (val & 0xffff));
+	  if (generate)
+	    emit_move_insn (gen_rtx_REG (Pmode, regnum),
+			    GEN_INT (val & 0xffff));
 	  tval = 0;
+	  insns++;
 	}
 
   val >>= 16;
@@ -2555,11 +2569,17 @@ aarch64_build_constant (int regnum, HOST_WIDE_INT val)
   for (i = 16; i < 64; i += 16)
 	{
 	  if ((val & 0xffff) != tval)
-	    emit_insn (gen_insv_immdi (gen_rtx_REG (Pmode, regnum),
-				       GEN_INT (i), GEN_INT (val & 0xffff)));
+	    {
+	      if (generate)
+		emit_insn (gen_insv_immdi (gen_rtx_REG (Pmode, regnum),
+					   GEN_INT (i),
+					   GEN_INT (val & 0xffff)));
+	      insns++;
+	    }
 	  val >>= 16;
 	}
 }
+  return insns;
 }
 
 static void
@@ -2574,7 +2594,7 @@ aarch64_add_constant (int regnum, int scratchreg, HOST_WIDE_INT delta)
 
   if (mdelta >= 4096 * 4096)
 {
-  aarch64_build_constant (scratchreg, delta);
+  (void) aarch64_build_constant (scratchreg, delta, true);
   emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx));
 }
   else if (mdelta > 0)
@@ -2648,7 +2668,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
 	  addr = plus_constant (Pmode, temp0, vcall_offset);
   else
 	{
-	  aarch64_build_constant (IP1_REGNUM, vcall_offset);
+	  (void) aarch64_build_constant (IP1_REGNUM, vcall_offset, true);
 	  addr = gen_rtx_PLUS (Pmode, temp0, temp1);
 	}
 
@@ -4670,6 +4690,7 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
   rtx op0, op1;
   const struct cpu_cost_table *extra_cost
     = aarch64_tune_params->insn_extra_cost;
+  enum machine_mode mode = GET_MODE (x);
 
   switch (code)
 {
@@ -4716,6 +4737,57 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	}
   return false;
 
+case CONST_INT:
+  /* If an instruction can incorporate a constant within the
+	 instruction, the instruction's expression avoids calling
+	 rtx_cost() on the constant.  If rtx_cost() is called on a
+	 constant, then it is usually because the constant must be
+	 moved into a register by one or more instructions.
+
+	 The exception is constant 0, which can be expressed
+	 as XZR/WZR and is therefore free.  The exception to this is
+	 if we have (set (reg) (const0_rtx)) in which case we must cost
+	 the move.  However, we can catch that when we cost the SET, so
+	 we don't need to consider that here.  */
+  if (x == const0_rtx)
+	*cost = 0;
+  else
+	{
+	  /* To an approximation, building any other constant is
+	 

[AArch64 costs 12/18] Improve costs for sign/zero extracts

2014-03-27 Thread James Greenhalgh

Hi,

Next SIGN_EXTRACT/ZERO_EXTRACT.

Tested in series on aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Improve costs for
SIGN/ZERO_EXTRACT.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a8de1e3..338f6b3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4798,6 +4798,35 @@ aarch64_address_cost (rtx x,
   return cost;
 }
 
+/* Return true if the RTX X in mode MODE is a zero or sign extract
+   usable in an ADD or SUB (extended register) instruction.  */
+static bool
+aarch64_rtx_arith_op_extract_p (rtx x, enum machine_mode mode)
+{
+  /* Catch add with a sign extract.
+     This is add_<optab><mode>_multp2.  */
+  if (GET_CODE (x) == SIGN_EXTRACT
+  || GET_CODE (x) == ZERO_EXTRACT)
+{
+  rtx op0 = XEXP (x, 0);
+  rtx op1 = XEXP (x, 1);
+  rtx op2 = XEXP (x, 2);
+
+  if (GET_CODE (op0) == MULT
+      && CONST_INT_P (op1)
+      && op2 == const0_rtx
+      && CONST_INT_P (XEXP (op0, 1))
+      && aarch64_is_extend_from_extract (mode,
+					 XEXP (op0, 1),
+					 op1))
+	{
+	  return true;
+	}
+}
+
+  return false;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
@@ -5062,6 +5091,18 @@ cost_minus:
 
 	  }
 
+	/* Look for SUB (extended register).  */
+if (aarch64_rtx_arith_op_extract_p (op1, mode))
+	  {
+	if (speed)
+	  *cost += extra_cost->alu.arith_shift;
+
+	*cost += rtx_cost (XEXP (XEXP (op1, 0), 0),
+			   (enum rtx_code) GET_CODE (op1),
+			   0, speed);
+	return true;
+	  }
+
 	rtx new_op1 = aarch64_strip_extend (op1);
 
 	/* Cost this as an FMA-alike operation.  */
@@ -5118,6 +5159,18 @@ cost_minus:
 	return true;
 	  }
 
+	/* Look for ADD (extended register).  */
+if (aarch64_rtx_arith_op_extract_p (op0, mode))
+	  {
+	if (speed)
+	  *cost += extra_cost->alu.arith_shift;
+
+	*cost += rtx_cost (XEXP (XEXP (op0, 0), 0),
+			   (enum rtx_code) GET_CODE (op0),
+			   0, speed);
+	return true;
+	  }
+
 	/* Strip any extend, leave shifts behind as we will
 	   cost them through mult_cost.  */
 	new_op0 = aarch64_strip_extend (op0);
@@ -5353,7 +5406,13 @@ cost_minus:
 
 case ZERO_EXTRACT:
 case SIGN_EXTRACT:
-  *cost += rtx_cost (XEXP (x, 0), ZERO_EXTRACT, 0, speed);
+  /* UBFX/SBFX.  */
+  if (speed)
+	*cost += extra_cost->alu.bfx;
+
+  /* We can trust that the immediates used will be correct (there
+	 are no by-register forms), so we need only cost op0.  */
+  *cost += rtx_cost (XEXP (x, 0), (enum rtx_code) code, 0, speed);
   return true;
 
 case MULT:
@@ -9030,7 +9089,7 @@ aarch64_cannot_change_mode_class (enum machine_mode from,
 #define TARGET_RETURN_IN_MSB aarch64_return_in_msb
 
 #undef TARGET_RTX_COSTS
-#define TARGET_RTX_COSTS aarch64_rtx_costs
+#define TARGET_RTX_COSTS aarch64_rtx_costs_wrapper
 
 #undef TARGET_SCHED_ISSUE_RATE
 #define TARGET_SCHED_ISSUE_RATE aarch64_sched_issue_rate

[AArch64 costs 13/18] Improve costs for div/mod

2014-03-27 Thread James Greenhalgh

Hi,

Now DIV/MOD.

Tested in series on aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Improve costs for
DIV/MOD.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 338f6b3..c6f1ac5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5423,7 +5423,6 @@ cost_minus:
 
 case MOD:
 case UMOD:
-  *cost = COSTS_N_INSNS (2);
   if (speed)
 	{
 	  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
@@ -5440,15 +5439,15 @@ cost_minus:
 
 case DIV:
 case UDIV:
-  *cost = COSTS_N_INSNS (1);
+case SQRT:
   if (speed)
 	{
-	  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
-	    *cost += extra_cost->mult[GET_MODE (x) == DImode].idiv;
-	  else if (GET_MODE (x) == DFmode)
-	    *cost += extra_cost->fp[1].div;
-	  else if (GET_MODE (x) == SFmode)
-	    *cost += extra_cost->fp[0].div;
+	  if (GET_MODE_CLASS (mode) == MODE_INT)
+	    /* There is no integer SQRT, so only DIV and UDIV can get
+	       here.  */
+	    *cost += extra_cost->mult[mode == DImode].idiv;
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].div;
 	}
   return false;  /* All arguments need to be in registers.  */
 

[AArch64 costs 17/18] Cost for SYMBOL_REF, HIGH and LO_SUM

2014-03-27 Thread James Greenhalgh

Hi,

Next, costs for SYMBOL_REF, HIGH and LO_SUM.

Tested in series on aarch64-none-elf.

OK for Stage-1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Cost SYMBOL_REF,
HIGH, LO_SUM.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7a6255b..8ebb3d0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5455,15 +5455,44 @@ cost_plus:
 	  return false;  /* All arguments need to be in registers.  */
 	}
 
-case HIGH:
-  if (!CONSTANT_P (XEXP (x, 0)))
-	*cost += rtx_cost (XEXP (x, 0), HIGH, 0, speed);
+case SYMBOL_REF:
+
+  if (aarch64_cmodel == AARCH64_CMODEL_LARGE)
+	{
+	  /* LDR.  */
+	  if (speed)
+	    *cost += extra_cost->ldst.load;
+	}
+  else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
+	   || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
+	{
+	  /* ADRP, followed by ADD.  */
+	  *cost += COSTS_N_INSNS (1);
+	  if (speed)
+	    *cost += 2 * extra_cost->alu.arith;
+	}
+  else if (aarch64_cmodel == AARCH64_CMODEL_TINY
+	   || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
+	{
+	  /* ADR.  */
+	  if (speed)
+	    *cost += extra_cost->alu.arith;
+	}
+
+  if (flag_pic)
+	{
+	  /* One extra load instruction, after accessing the GOT.  */
+	  *cost += COSTS_N_INSNS (1);
+	  if (speed)
+	    *cost += extra_cost->ldst.load;
+	}
   return true;
 
+case HIGH:
 case LO_SUM:
-  if (!CONSTANT_P (XEXP (x, 1)))
-	*cost += rtx_cost (XEXP (x, 1), LO_SUM, 1, speed);
-  *cost += rtx_cost (XEXP (x, 0), LO_SUM, 0, speed);
+  /* ADRP/ADD (immediate).  */
+  if (speed)
+	*cost += extra_cost->alu.arith;
   return true;
 
 case ZERO_EXTRACT:

[AArch64 costs 9/18] Better cost logical operations

2014-03-27 Thread James Greenhalgh

Hi,

Next up are costs for the logical operations (AND, OR, etc.).

Tested in series for aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Improve cost for
logical operations.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d158260..f432788 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5153,25 +5153,80 @@ cost_minus:
   op0 = XEXP (x, 0);
   op1 = XEXP (x, 1);
 
+  if (code == AND
+      && GET_CODE (op0) == MULT
+      && CONST_INT_P (XEXP (op0, 1))
+      && CONST_INT_P (op1)
+      && aarch64_uxt_size (exact_log2 (INTVAL (XEXP (op0, 1))),
+			   INTVAL (op1)) != 0)
+    {
+      /* This is a UBFM/SBFM.  */
+      *cost += rtx_cost (XEXP (op0, 0), ZERO_EXTRACT, 0, speed);
+      if (speed)
+	*cost += extra_cost->alu.bfx;
+      return true;
+    }
+
   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
 	{
+	  /* We possibly get the immediate for free, this is not
+	     modelled.  */
 	  if (CONST_INT_P (op1)
 	      && aarch64_bitmask_imm (INTVAL (op1), GET_MODE (x)))
 	    {
-	      *cost += rtx_cost (op0, AND, 0, speed);
+	      *cost += rtx_cost (op0, (enum rtx_code) code, 0, speed);
+
+	      if (speed)
+		*cost += extra_cost->alu.logical;
+
+	  return true;
 	}
 	  else
 	{
+	  rtx new_op0 = op0;
+
+	  /* Handle ORN, EON, or BIC.  */
 	  if (GET_CODE (op0) == NOT)
 		op0 = XEXP (op0, 0);
-	  op0 = aarch64_strip_shift (op0);
-	  *cost += (rtx_cost (op0, AND, 0, speed)
-			+ rtx_cost (op1, AND, 1, speed));
+
+	  new_op0 = aarch64_strip_shift (op0);
+
+	  /* If we had a shift on op0 then this is a logical-shift-
+		 by-register/immediate operation.  Otherwise, this is just
+		 a logical operation.  */
+	  if (speed)
+		{
+		  if (new_op0 != op0)
+		{
+		  /* Shift by immediate.  */
+		  if (CONST_INT_P (XEXP (op0, 1)))
+			*cost += extra_cost->alu.log_shift;
+		      else
+			*cost += extra_cost->alu.log_shift_reg;
+		    }
+		  else
+		    *cost += extra_cost->alu.logical;
+		}
+
+	  /* In both cases we want to cost both operands.  */
+	  *cost += rtx_cost (new_op0, (enum rtx_code) code, 0, speed)
+		   + rtx_cost (op1, (enum rtx_code) code, 1, speed);
+
+	  return true;
 	}
-	  return true;
 	}
   return false;
 
+case NOT:
+  /* MVN.  */
+  if (speed)
+	*cost += extra_cost->alu.logical;
+
+  /* The logical instruction could have the shifted register form,
+ but the cost is the same if the shift is processed as a separate
+ instruction, so we don't bother with it here.  */
+  return false;
+
 case ZERO_EXTEND:
   if ((GET_MODE (x) == DImode
 	&& GET_MODE (XEXP (x, 0)) == SImode)

[AArch64 costs 7/18] Improve SET cost.

2014-03-27 Thread James Greenhalgh

Hi,

This patch adds functionality for costing a SET RTX.

Often these are free in the sense that we factor the cost of the set
into the cost of the RHS of the insn. Notable exceptions are sets to
MEM which should be costed as a store, and simple register moves, which
should be costed.

Tested in series on aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philip Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Improve costing
for SET RTX.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 766d70d..1b21ecc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4826,6 +4826,8 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
   switch (code)
 {
 case SET:
+  /* The cost depends entirely on the operands to SET.  */
+  *cost = 0;
   op0 = SET_DEST (x);
   op1 = SET_SRC (x);
 
@@ -4835,23 +4837,33 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	  if (speed)
 	    *cost += extra_cost->ldst.store;
 
-	  if (op1 != const0_rtx)
-	*cost += rtx_cost (op1, SET, 1, speed);
+	  *cost += rtx_cost (op1, SET, 1, speed);
 	  return true;
 
 	case SUBREG:
 	  if (! REG_P (SUBREG_REG (op0)))
 	*cost += rtx_cost (SUBREG_REG (op0), SET, 0, speed);
+
 	  /* Fall through.  */
 	case REG:
-	  /* Cost is just the cost of the RHS of the set.  */
-	  *cost += rtx_cost (op1, SET, 1, true);
+	  /* const0_rtx is in general free, but we will use an
+	 instruction to set a register to 0.  */
+  if (REG_P (op1) || op1 == const0_rtx)
+{
+  /* The cost is 1 per register copied.  */
+  int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+			  / UNITS_PER_WORD;
+  *cost = COSTS_N_INSNS (n_minus_1 + 1);
+}
+  else
+	/* Cost is just the cost of the RHS of the set.  */
+	*cost += rtx_cost (op1, SET, 1, speed);
 	  return true;
 
-	case ZERO_EXTRACT:  /* Bit-field insertion.  */
+	case ZERO_EXTRACT:
 	case SIGN_EXTRACT:
-	  /* Strip any redundant widening of the RHS to meet the width of
-	 the target.  */
+	  /* Bit-field insertion.  Strip any redundant widening of
+	 the RHS to meet the width of the target.  */
 	  if (GET_CODE (op1) == SUBREG)
 	op1 = SUBREG_REG (op1);
 	  if ((GET_CODE (op1) == ZERO_EXTEND
@@ -4860,10 +4872,25 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	      && (GET_MODE_BITSIZE (GET_MODE (XEXP (op1, 0)))
 		  >= INTVAL (XEXP (op0, 1))))
 	op1 = XEXP (op1, 0);
-	  *cost += rtx_cost (op1, SET, 1, speed);
+
+  if (CONST_INT_P (op1))
+{
+  /* MOV immediate is assumed to always be cheap.  */
+  *cost = COSTS_N_INSNS (1);
+}
+  else
+{
+  /* BFM.  */
+	  if (speed)
+		*cost += extra_cost->alu.bfi;
+  *cost += rtx_cost (op1, (enum rtx_code) code, 1, speed);
+}
+
 	  return true;
 
 	default:
+	  /* We can't make sense of this, assume default cost.  */
+  *cost = COSTS_N_INSNS (1);
 	  break;
 	}
   return false;

[AArch64 costs 6/18] Set default costs and handle vector modes.

2014-03-27 Thread James Greenhalgh

Hi,

The GCC rtx_costs function will try to be helpful by setting the
cost of a multiply to something very high.  As this is unlikely
to be appropriate we want to overwrite these costs as soon as
possible.

We start with the assumption that everything will be as expensive
as the cheapest instruction.

Additionally, we do a terrible job of costing vector operations,
and we really shouldn't pretend that any of the code in this function
will make the right decision when faced with a vector. So we take
the simplifying view that all vector operations are basically the
same. This will not give a good costing function, and there is
scope for improvement in future. Just trying to cost the
element-function is unlikely to be appropriate, as it would imply an ADD
and a vector ADD were equally expensive.

Tested in series on aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Set default costs.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 11dc788..766d70d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4808,6 +4808,21 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
     = aarch64_tune_params->insn_extra_cost;
   enum machine_mode mode = GET_MODE (x);
 
+  /* By default, assume that everything has equivalent cost to the
+ cheapest instruction.  Any additional costs are applied as a delta
+ above this default.  */
+  *cost = COSTS_N_INSNS (1);
+
+  /* TODO: The cost infrastructure currently does not handle
+ vector operations.  Assume that all vector operations
+ are equally expensive.  */
+  if (VECTOR_MODE_P (mode))
+{
+  if (speed)
+	*cost += extra_cost->vect.alu;
+  return true;
+}
+
   switch (code)
 {
 case SET:

[AArch64 costs 11/18] Improve costs for rotate and shift operations.

2014-03-27 Thread James Greenhalgh
Hi,

Now the rotates and shifts.

Tested in series on aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Improve costs for
rotates and shifts.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3c3dd6d..a8de1e3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5286,21 +5286,59 @@ cost_minus:
 	*cost += extra_cost->alu.extend;
   return false;
 
+case ASHIFT:
+  op0 = XEXP (x, 0);
+  op1 = XEXP (x, 1);
+
+  if (CONST_INT_P (op1))
+{
+	  /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
+	 aliases.  */
+	  if (speed)
+	    *cost += extra_cost->alu.shift;
+
+  /* We can incorporate zero/sign extend for free.  */
+  if (GET_CODE (op0) == ZERO_EXTEND
+  || GET_CODE (op0) == SIGN_EXTEND)
+op0 = XEXP (op0, 0);
+
+  *cost += rtx_cost (op0, ASHIFT, 0, speed);
+  return true;
+}
+  else
+{
+	  /* LSLV.  */
+	  if (speed)
+	    *cost += extra_cost->alu.shift_reg;
+
+	  return false;  /* All arguments need to be in registers.  */
+}
+
 case ROTATE:
-  if (!CONST_INT_P (XEXP (x, 1)))
-	*cost += COSTS_N_INSNS (2);
-  /* Fall through.  */
 case ROTATERT:
 case LSHIFTRT:
-case ASHIFT:
 case ASHIFTRT:
+  op0 = XEXP (x, 0);
+  op1 = XEXP (x, 1);
 
-  /* Shifting by a register often takes an extra cycle.  */
-  if (speed && !CONST_INT_P (XEXP (x, 1)))
-	*cost += extra_cost->alu.arith_shift_reg;
+  if (CONST_INT_P (op1))
+	{
+	  /* ASR (immediate) and friends.  */
+	  if (speed)
+	    *cost += extra_cost->alu.shift;
 
-  *cost += rtx_cost (XEXP (x, 0), ASHIFT, 0, speed);
-  return true;
+	  *cost += rtx_cost (op0, (enum rtx_code) code, 0, speed);
+	  return true;
+	}
+  else
+	{
+
+	  /* ASR (register) and friends.  */
+	  if (speed)
+	    *cost += extra_cost->alu.shift_reg;
+
+	  return false;  /* All arguments need to be in registers.  */
+	}
 
 case HIGH:
   if (!CONSTANT_P (XEXP (x, 0)))

[AArch64 costs 8/18] Cost memory accesses using address costs

2014-03-27 Thread James Greenhalgh

Hi,

When we cost an RTX which touches memory, we really want to cost two
things. The cost of the memory operation, plus some additional cost
if we are using an expensive addressing mode.

This patch adds that modelling.

Tested in series on aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Use address
costs when costing loads and stores to memory.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1b21ecc..d158260 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -142,6 +142,7 @@ static bool aarch64_const_vec_all_same_int_p (rtx,
 
 static bool aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 		 const unsigned char *sel);
+static int aarch64_address_cost (rtx, enum machine_mode, addr_space_t, bool);
 
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
@@ -4835,7 +4836,19 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	{
 	case MEM:
 	  if (speed)
-	    *cost += extra_cost->ldst.store;
+	    {
+	      rtx address = XEXP (op0, 0);
+	      if (GET_MODE_CLASS (mode) == MODE_INT)
+		*cost += extra_cost->ldst.store;
+	      else if (mode == SFmode)
+		*cost += extra_cost->ldst.storef;
+	      else if (mode == DFmode)
+		*cost += extra_cost->ldst.stored;
+
+	      *cost +=
+		COSTS_N_INSNS (aarch64_address_cost (address, mode,
+						     0, speed));
+	    }
 
 	  *cost += rtx_cost (op1, SET, 1, speed);
 	  return true;
@@ -4948,7 +4961,22 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 
 case MEM:
   if (speed)
-	*cost += extra_cost->ldst.load;
+	{
+	  /* For loads we want the base cost of a load, plus an
+	     approximation for the additional cost of the addressing
+	     mode.  */
+	  rtx address = XEXP (x, 0);
+	  if (GET_MODE_CLASS (mode) == MODE_INT)
+	    *cost += extra_cost->ldst.load;
+	  else if (mode == SFmode)
+	    *cost += extra_cost->ldst.loadf;
+	  else if (mode == DFmode)
+	    *cost += extra_cost->ldst.loadd;
+
+	  *cost +=
+	    COSTS_N_INSNS (aarch64_address_cost (address, mode,
+						 0, speed));
+	}
 
   return true;
 

[AArch64 costs 10/18] Improve costs for sign/zero extend operations

2014-03-27 Thread James Greenhalgh
Hi,

Next up: SIGN_EXTEND, ZERO_EXTEND.

Tested in series on aarch64-none-elf with no regressions.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Cost
ZERO_EXTEND and SIGN_EXTEND better.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f432788..3c3dd6d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5228,21 +5228,62 @@ cost_minus:
   return false;
 
 case ZERO_EXTEND:
-  if ((GET_MODE (x) == DImode
-	&& GET_MODE (XEXP (x, 0)) == SImode)
-	  || GET_CODE (XEXP (x, 0)) == MEM)
+
+  op0 = XEXP (x, 0);
+  /* If a value is written in SI mode, then zero extended to DI
+	 mode, the operation will in general be free as a write to
+	 a 'w' register implicitly zeroes the upper bits of an 'x'
+	 register.  However, if this is
+
+	   (set (reg) (zero_extend (reg)))
+
+	 we must cost the explicit register move.  */
+  if (mode == DImode
+      && GET_MODE (op0) == SImode
+      && outer == SET)
+	{
+	  int op_cost = rtx_cost (XEXP (x, 0), ZERO_EXTEND, 0, speed);
+
+	  if (!op_cost && speed)
+	/* MOV.  */
+	    *cost += extra_cost->alu.extend;
+	  else
+	/* Free, the cost is that of the SI mode operation.  */
+	*cost = op_cost;
+
+	  return true;
+	}
+  else if (MEM_P (XEXP (x, 0)))
 	{
-	  *cost += rtx_cost (XEXP (x, 0), ZERO_EXTEND, 0, speed);
+	  /* All loads can zero extend to any size for free.  */
+	  *cost = rtx_cost (XEXP (x, 0), ZERO_EXTEND, param, speed);
 	  return true;
 	}
+
+  /* UXTB/UXTH.  */
+  if (speed)
+	*cost += extra_cost->alu.extend;
+
   return false;
 
 case SIGN_EXTEND:
-  if (GET_CODE (XEXP (x, 0)) == MEM)
+  if (MEM_P (XEXP (x, 0)))
 	{
-	  *cost += rtx_cost (XEXP (x, 0), SIGN_EXTEND, 0, speed);
+	  /* LDRSH.  */
+	  if (speed)
+	{
+	  rtx address = XEXP (XEXP (x, 0), 0);
+	  *cost += extra_cost->ldst.load_sign_extend;
+
+	  *cost +=
+		COSTS_N_INSNS (aarch64_address_cost (address, mode,
+		 0, speed));
+	}
 	  return true;
 	}
+
+  if (speed)
+	*cost += extra_cost->alu.extend;
   return false;
 
 case ROTATE:

[AArch64 costs 16/18] Cost TRUNCATE

2014-03-27 Thread James Greenhalgh

Hi,

And now - TRUNCATE.

Tested in series on aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Cost TRUNCATE.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3caff3a..7a6255b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5671,6 +5671,39 @@ cost_plus:
 	}
   return false;
 
+case TRUNCATE:
+
+      /* Decompose <su>muldi3_highpart.  */
+      if (/* (truncate:DI  */
+	  mode == DImode
+	  /*   (lshiftrt:TI  */
+	  && GET_MODE (XEXP (x, 0)) == TImode
+	  && GET_CODE (XEXP (x, 0)) == LSHIFTRT
+	  /*      (mult:TI  */
+	  && GET_CODE (XEXP (XEXP (x, 0), 0)) == MULT
+	  /*        (ANY_EXTEND:TI (reg:DI))
+	            (ANY_EXTEND:TI (reg:DI)))  */
+	  && ((GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 0)) == ZERO_EXTEND
+	       && GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 1)) == ZERO_EXTEND)
+	      || (GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 0)) == SIGN_EXTEND
+		  && GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 1)) == SIGN_EXTEND))
+	  && GET_MODE (XEXP (XEXP (XEXP (XEXP (x, 0), 0), 0), 0)) == DImode
+	  && GET_MODE (XEXP (XEXP (XEXP (XEXP (x, 0), 0), 1), 0)) == DImode
+	  /*     (const_int 64)  */
+	  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+	  && UINTVAL (XEXP (XEXP (x, 0), 1)) == 64)
+{
+  /* UMULH/SMULH.  */
+	  if (speed)
+	    *cost += extra_cost->mult[mode == DImode].extend;
+  *cost += rtx_cost (XEXP (XEXP (XEXP (XEXP (x, 0), 0), 0), 0),
+			 MULT, 0, speed);
+  *cost += rtx_cost (XEXP (XEXP (XEXP (XEXP (x, 0), 0), 1), 0),
+			 MULT, 1, speed);
+  return true;
+}
+
+  /* Fall through.  */
 default:
   break;
 }

[AArch64 costs 5/18] Factor out common MULT cases

2014-03-27 Thread James Greenhalgh

Hi,

MULT, fused/chained multiply with add, and similar operations
with MINUS and NEG should all be handled in a consistent way.

To do that, we can pull out a common mult core from the partial
implementations found around aarch64_rtx_costs.

This patch performs that refactoring. One additional change we make
is to aarch64_strip_shift_or_extend, which becomes aarch64_strip_extend.
This allows us to catch the shift in our multiply code, and cost it
as appropriate. In order to maintain the precondition on
aarch64_is_extend_from_extract we must add an extra check before calling
it.

With this patch, we refactor the PLUS, MINUS and NEG costing cases.

Tested in series on aarch64-none-elf with no regressions.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_strip_shift_or_extend): Rename
to...
(aarch64_strip_extend): ...this, don't strip shifts, check RTX is
well formed.
(aarch64_rtx_mult_cost): New.
(aarch64_rtx_costs): Use it, refactor as appropriate.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index af947ca..11dc788 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -499,7 +499,7 @@ aarch64_is_long_call_p (rtx sym)
represent an expression that matches an extend operation.  The
operands represent the paramters from
 
-   (extract (mult (reg) (mult_imm)) (extract_imm) (const_int 0)).  */
+   (extract:MODE (mult (reg) (MULT_IMM)) (EXTRACT_IMM) (const_int 0)).  */
 bool
 aarch64_is_extend_from_extract (enum machine_mode mode, rtx mult_imm,
 rtx extract_imm)
@@ -4543,18 +4543,19 @@ aarch64_strip_shift (rtx x)
   return x;
 }
 
-/* Helper function for rtx cost calculation.  Strip a shift or extend
+/* Helper function for rtx cost calculation.  Strip an extend
expression from X.  Returns the inner operand if successful, or the
original expression on failure.  We deal with a number of possible
canonicalization variations here.  */
 static rtx
-aarch64_strip_shift_or_extend (rtx x)
+aarch64_strip_extend (rtx x)
 {
   rtx op = x;
 
   /* Zero and sign extraction of a widened value.  */
   if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
       && XEXP (op, 2) == const0_rtx
+      && GET_CODE (XEXP (op, 0)) == MULT
       && aarch64_is_extend_from_extract (GET_MODE (op), XEXP (XEXP (op, 0), 1),
 					 XEXP (op, 1)))
 return XEXP (XEXP (op, 0), 0);
@@ -4583,7 +4584,122 @@ aarch64_strip_shift_or_extend (rtx x)
   if (op != x)
 return op;
 
-  return aarch64_strip_shift (x);
+  return x;
+}
+
+/* Helper function for rtx cost calculation.  Calculate the cost of
+   a MULT, which may be part of a multiply-accumulate rtx.  Return
+   the calculated cost of the expression, recursing manually in to
+   operands where needed.  */
+
+static int
+aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
+{
+  rtx op0, op1;
+  const struct cpu_cost_table *extra_cost
+    = aarch64_tune_params->insn_extra_cost;
+  int cost = 0;
+  bool maybe_fma = (outer == PLUS || outer == MINUS);
+  enum machine_mode mode = GET_MODE (x);
+
+  gcc_checking_assert (code == MULT);
+
+  op0 = XEXP (x, 0);
+  op1 = XEXP (x, 1);
+
+  if (VECTOR_MODE_P (mode))
+mode = GET_MODE_INNER (mode);
+
+  /* Integer multiply/fma.  */
+  if (GET_MODE_CLASS (mode) == MODE_INT)
+{
+  /* The multiply will be canonicalized as a shift, cost it as such.  */
+      if (CONST_INT_P (op1)
+	  && exact_log2 (INTVAL (op1)) > 0)
+	{
+	  if (speed)
+	{
+	  if (maybe_fma)
+		/* ADD (shifted register).  */
+		cost += extra_cost->alu.arith_shift;
+	  else
+		/* LSL (immediate).  */
+		cost += extra_cost->alu.shift;
+	}
+
+	  cost += rtx_cost (op0, GET_CODE (op0), 0, speed);
+
+	  return cost;
+	}
+
+  /* Integer multiplies or FMAs have zero/sign extending variants.  */
+      if ((GET_CODE (op0) == ZERO_EXTEND
+	   && GET_CODE (op1) == ZERO_EXTEND)
+	  || (GET_CODE (op0) == SIGN_EXTEND
+	      && GET_CODE (op1) == SIGN_EXTEND))
+	{
+	  cost += rtx_cost (XEXP (op0, 0), MULT, 0, speed)
+		  + rtx_cost (XEXP (op1, 0), MULT, 1, speed);
+
+	  if (speed)
+	{
+	  if (maybe_fma)
+		/* MADD/SMADDL/UMADDL.  */
+		cost += extra_cost->mult[0].extend_add;
+	  else
+		/* MUL/SMULL/UMULL.  */
+		cost += extra_cost->mult[0].extend;
+	}
+
+	  return cost;
+	}
+
+  /* This is either an integer multiply or an FMA.  In both cases
+	 we want to recurse and cost the operands.  */
+  cost += rtx_cost (op0, MULT, 0, speed)
+	  + rtx_cost (op1, MULT, 1, speed);
+
+  if (speed)
+	{
+	  if (maybe_fma)
+	/* MADD.  */
+	    cost += extra_cost->mult[mode == DImode].add;
+	  else
+	/* MUL.  */
+	    cost += extra_cost->mult[mode == DImode].simple;
+	}
+
+  return cost;
+}
+  else
+{
+  if (speed)
+	{
+	  /* Floating-point FMA can also 

[AArch64 costs 18/18] Dump a message if we are unable to cost an insn.

2014-03-27 Thread James Greenhalgh

Hi,

If we are unable to fully cost an RTX, we should return the default
cost and avoid recursing to the operands. This will bias us towards
picking bigger RTX - which presumably have been added as patterns
because somebody expects them to be more efficient.

To aid future debugging and development, we also dump our shortcomings.

Tested on aarch64-none-elf with no issues.

OK for 5.0?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle the case
where we were unable to cost an RTX.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8ebb3d0..f284641 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5734,7 +5734,11 @@ cost_plus:
 
   /* Fall through.  */
 default:
-  break;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "\nFailed to cost RTX.  Assuming default cost.\n");
+
+  return true;
 }
   return false;
 }

[PATCH], PR 60672, Add xxsldwi/xxpermdi builtins to altivec.h

2014-03-27 Thread Michael Meissner
One of the users within IBM noticed that we did not provide builtins for the
XXSLDWI (vector shift left) and XXPERMDI (permute 64-bit values to make 128-bit
vector) instructions.  It turns out, we had provided these builtins, but we had
not documented them, nor did we add them to altivec.h with a user visible name.

When I added these builtins several years ago, I did not understand the naming
scheme for overloaded functions (i.e. __builtin_vec_xxx in the compiler, and
vec_xxx in altivec.h), so I added the overloaded builtin as
__builtin_vsx_xxsldwi and __builtin_vsx_xxpermdi.  This patch does not fix the
historical accident, but instead just uses the name that is created.

I can change the name, and provide a #define for somebody using the old name,
or we can just leave the compiler generating the old name, and altivec.h just
has to adapt.

I have done bootstraps and make check with no regressions.  Are these patches
ok to apply to 4.9 and backported to 4.8 when the rest of the changes go in?

[gcc]
2014-03-27  Michael Meissner  meiss...@linux.vnet.ibm.com

PR target/60672
* config/rs6000/altivec.h (vec_xxsldwi): Add missing define to
enable use of xxsldwi and xxpermdi builtin functions.
(vec_xxpermdi): Likewise.

* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
Document use of vec_xxsldwi and vec_xxpermdi builtins.

[gcc/testsuite]
2014-03-27  Michael Meissner  meiss...@linux.vnet.ibm.com

PR target/60672
* gcc.target/powerpc/pr60676.c: New file, make sure xxsldwi and
xxpermdi builtins are supported.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/altivec.h
===
--- gcc/config/rs6000/altivec.h (revision 208851)
+++ gcc/config/rs6000/altivec.h (working copy)
@@ -319,6 +319,11 @@
 #define vec_sqrt __builtin_vec_sqrt
 #define vec_vsx_ld __builtin_vec_vsx_ld
 #define vec_vsx_st __builtin_vec_vsx_st
+
+/* Note, xxsldi and xxpermdi were added as __builtin_vsx_<xxx> functions
+   instead of __builtin_vec_<xxx>  */
+#define vec_xxsldwi __builtin_vsx_xxsldwi
+#define vec_xxpermdi __builtin_vsx_xxpermdi
 #endif
 
 #ifdef _ARCH_PWR8
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 208851)
+++ gcc/doc/extend.texi (working copy)
@@ -14859,6 +14859,35 @@ void vec_vsx_st (vector unsigned char, i
 void vec_vsx_st (vector bool char, int, vector bool char *);
 void vec_vsx_st (vector bool char, int, unsigned char *);
 void vec_vsx_st (vector bool char, int, signed char *);
+
+vector double vec_xxpermdi (vector double, vector double, int);
+vector float vec_xxpermdi (vector float, vector float, int);
+vector long long vec_xxpermdi (vector long long, vector long long, int);
+vector unsigned long long vec_xxpermdi (vector unsigned long long,
+vector unsigned long long, int);
+vector int vec_xxpermdi (vector int, vector int, int);
+vector unsigned int vec_xxpermdi (vector unsigned int,
+  vector unsigned int, int);
+vector short vec_xxpermdi (vector short, vector short, int);
+vector unsigned short vec_xxpermdi (vector unsigned short,
+vector unsigned short, int);
+vector signed char vec_xxpermdi (vector signed char, vector signed char, int);
+vector unsigned char vec_xxpermdi (vector unsigned char,
+   vector unsigned char, int);
+
+vector double vec_xxsldi (vector double, vector double, int);
+vector float vec_xxsldi (vector float, vector float, int);
+vector long long vec_xxsldi (vector long long, vector long long, int);
+vector unsigned long long vec_xxsldi (vector unsigned long long,
+  vector unsigned long long, int);
+vector int vec_xxsldi (vector int, vector int, int);
+vector unsigned int vec_xxsldi (vector unsigned int, vector unsigned int, int);
+vector short vec_xxsldi (vector short, vector short, int);
+vector unsigned short vec_xxsldi (vector unsigned short,
+  vector unsigned short, int);
+vector signed char vec_xxsldi (vector signed char, vector signed char, int);
+vector unsigned char vec_xxsldi (vector unsigned char,
+ vector unsigned char, int);
 @end smallexample
 
 Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
Index: gcc/testsuite/gcc.target/powerpc/pr60676.c
===
--- gcc/testsuite/gcc.target/powerpc/pr60676.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr60676.c  (revision 0)
@@ -0,0 +1,128 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { 

[AArch64 costs 15/18] Cost more Floating point RTX.

2014-03-27 Thread James Greenhalgh

Hi,

This one adds FMA, FLOAT_EXTEND, FLOAT_TRUNCATE, ABS,
and SMAX/SMIN.

Tested in series on aarch64-none-elf.

OK for stage 1?

Thanks,
James

---
2014-03-27  James Greenhalgh  james.greenha...@arm.com
Philipp Tomsich  philipp.toms...@theobroma-systems.com

* config/aarch64/aarch64.c (aarch64_rtx_costs): Cost FMA,
FLOAT_EXTEND, FLOAT_TRUNCATE, ABS, SMAX, and SMIN.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bdfcc55..3caff3a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5588,6 +5588,89 @@ cost_plus:
 
   return false; /* All arguments must be in registers.  */
 
+case FMA:
+  op0 = XEXP (x, 0);
+  op1 = XEXP (x, 1);
+  op2 = XEXP (x, 2);
+
+  if (speed)
+	*cost += extra_cost->fp[mode == DFmode].fma;
+
+  /* FMSUB, FNMADD, and FNMSUB are free.  */
+  if (GET_CODE (op0) == NEG)
+op0 = XEXP (op0, 0);
+
+  if (GET_CODE (op2) == NEG)
+op2 = XEXP (op2, 0);
+
+  /* aarch64_fnma4_elt_to_64v2df has the NEG as operand 1,
+	 and the by-element operand as operand 0.  */
+  if (GET_CODE (op1) == NEG)
+op1 = XEXP (op1, 0);
+
+  /* Catch vector-by-element operations.  The by-element operand can
+	 either be (vec_duplicate (vec_select (x))) or just
+	 (vec_select (x)), depending on whether we are multiplying by
+	 a vector or a scalar.
+
+	 Canonicalization is not very good in these cases, FMA4 will put the
+	 by-element operand as operand 0, FNMA4 will have it as operand 1.  */
+  if (GET_CODE (op0) == VEC_DUPLICATE)
+	op0 = XEXP (op0, 0);
+  else if (GET_CODE (op1) == VEC_DUPLICATE)
+	op1 = XEXP (op1, 0);
+
+  if (GET_CODE (op0) == VEC_SELECT)
+	op0 = XEXP (op0, 0);
+  else if (GET_CODE (op1) == VEC_SELECT)
+	op1 = XEXP (op1, 0);
+
+  /* If the remaining parameters are not registers,
+ get the cost to put them into registers.  */
+  *cost += rtx_cost (op0, FMA, 0, speed);
+  *cost += rtx_cost (op1, FMA, 1, speed);
+  *cost += rtx_cost (op2, FMA, 2, speed);
+  return true;
+
+case FLOAT_EXTEND:
+  if (speed)
+	*cost += extra_cost->fp[mode == DFmode].widen;
+  return false;
+
+case FLOAT_TRUNCATE:
+  if (speed)
+	*cost += extra_cost->fp[mode == DFmode].narrow;
+  return false;
+
+case ABS:
+  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
+	{
+	  /* FABS and FNEG are analogous.  */
+	  if (speed)
+	    *cost += extra_cost->fp[mode == DFmode].neg;
+	}
+  else
+	{
+	  /* Integer ABS will either be split to
+	 two arithmetic instructions, or will be an ABS
+	 (scalar), which we don't model.  */
+	  *cost = COSTS_N_INSNS (2);
+	  if (speed)
+	    *cost += 2 * extra_cost->alu.arith;
+	}
+  return false;
+
+case SMAX:
+case SMIN:
+  if (speed)
+	{
+	  /* FMAXNM/FMINNM/FMAX/FMIN.
+	 TODO: This may not be accurate for all implementations, but
+	 we do not model this in the cost tables.  */
+	  *cost += extra_cost->fp[mode == DFmode].addsub;
+	}
+  return false;
+
 default:
   break;
 }

[AArch64 costs 0/18] Improve address- and rtx-cost models

2014-03-27 Thread James Greenhalgh
Hi,

This patch series improves the costing model in the AArch64 backend to
match a number of new idioms.

This patch is a combination of a series I had been working on, with the
cost-model for XGene-1 proposed by Philipp Tomsich.
( http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01084.html )

Where sensible I have integrated the idiom matching in Philipp's patch
with my own work, though there were cases which were redundant, or
could be folded to reduce code duplication. There were other cases where
the code suggested XGene-1 would benefit from special-case handling.
Without documentation for the XGene-1 I can't cater for these special
cases, and I have not tried to do so here.

The patch series has been bootstrapped natively on aarch64-none-elf, and
has been through aarch64-none-elf testsuite runs with no issues.

Is this OK for stage-1?

Thanks,
James

Re: [PATCH], PR 60672, Add xxsldwi/xxpermdi builtins to altivec.h

2014-03-27 Thread David Edelsohn
On Thu, Mar 27, 2014 at 1:43 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 One of the users within IBM noticed that we did not provide builtins for the
 XXSLDWI (vector shift left) and XXPERMDI (permute 64-bit values to make 
 128-bit
 vector) instructions.  It turns out, we had provided these builtins, but we 
 had
 not documented them, nor did we add them to altivec.h with a user visible 
 name.

 When I added these builtins several years ago, I did not understand the naming
 scheme for overloaded functions (i.e. __builtin_vec_xxx in the compiler, and
 vec_xxx in altivec.h), so I added the overloaded builtin as
 __builtin_vsx_xxsldwi and __builtin_vsx_xxpermdi.  This patch does not fix the
 historical accident, but instead just uses the name that is created.

 I can change the name, and provide a #define for somebody using the old name,
 or we can just leave the compiler generating the old name, and altivec.h just
 has to adapt.

 I have done bootstraps and make check with no regressions.  Are these patches
 ok to apply to 4.9 and backported to 4.8 when the rest of the changes go in?

 [gcc]
 2014-03-27  Michael Meissner  meiss...@linux.vnet.ibm.com

 PR target/60672
 * config/rs6000/altivec.h (vec_xxsldwi): Add missing define to
 enable use of xxsldwi and xxpermdi builtin functions.
 (vec_xxpermdi): Likewise.

 * doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
 Document use of vec_xxsldwi and vec_xxpermdi builtins.

 [gcc/testsuite]
 2014-03-27  Michael Meissner  meiss...@linux.vnet.ibm.com

 PR target/60672
 * gcc.target/powerpc/pr60676.c: New file, make sure xxsldwi and
 xxpermdi builtins are supported.

Thanks for fixing the missing functions and documentation.

Just to clarify the explanation, the builtin name remains
__builtin_vsx_xxx, but the altivec.h macro definition is
vec_xxx.

Okay.

Thanks, David


Re: std::rethrow_exception is broken

2014-03-27 Thread Jonathan Wakely

On 25/03/14 17:25 +, Jonathan Wakely wrote:

Tested x86_64-linux, I plan to commit this to trunk soon.




commit 06a845f80204947afd6866109db58cc85dc87117
Author: Jonathan Wakely jwak...@redhat.com
Date:   Tue Mar 25 14:42:45 2014 +

PR libstdc++/60612
* libsupc++/eh_ptr.cc: Assert __cxa_dependent_exception layout is
compatible with __cxa_exception.
* libsupc++/unwind-cxx.h (__cxa_dependent_exception): Add padding.
Fix typos in comments.
* testsuite/18_support/exception_ptr/60612-terminate.cc: New.
* testsuite/18_support/exception_ptr/60612-unexpected.cc: New.


Committed to trunk.



Re: [GOOGLE] Refactor the LIPO fixup

2014-03-27 Thread Xinliang David Li
ok.

On Thu, Mar 27, 2014 at 9:02 AM, Dehao Chen de...@google.com wrote:
 On Wed, Mar 26, 2014 at 4:05 PM, Xinliang David Li davi...@google.com wrote:
 is cgraph_init_gid_map called after linking?

 Oh, forgot that part. It's interesting that the test can pass without
 another cgraph_init_gid_map call.

 Patch updated. Retested and the performance is OK.

 Dehao


 David

 On Wed, Mar 26, 2014 at 3:54 PM, Dehao Chen de...@google.com wrote:
 Patch updated, passed performance tests.

 Dehao

 On Tue, Mar 25, 2014 at 4:03 PM, Xinliang David Li davi...@google.com 
 wrote:
 Add comment to the new function. init_node_map is better invoked after
 the link step to avoid creating entries with for dead nodes.

 Ok if large perf testing is fine.

 David

 On Tue, Mar 25, 2014 at 3:38 PM, Dehao Chen de...@google.com wrote:
 This patch refactors LIPO fixup related code to move it into a
 standalone function. This makes sure that
 symtab_remove_unreachable_nodes is called right after the fixup so
 that there is not dangling cgraph nodes any time.

 Bootstrapped and regression test on-going.

 OK for google-4_8?

 Thanks,
 Dehao


Re: [PATCH], PR 60672, Add xxsldwi/xxpermdi builtins to altivec.h

2014-03-27 Thread Michael Meissner
On Thu, Mar 27, 2014 at 02:06:23PM -0400, David Edelsohn wrote:
 Thanks for fixing the missing functions and documentation.
 
 Just to clarify the explanation, the builtin name remains
 __builtin_vsx_xxx, but the altivec.h macro definition \ is
 vec_xxx.
 
 Okay.

Yes.  I can change if desired, but I figured just having the vec_xxx would
suffice, particularly if it is documented.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[patch] libstdc++ doc improvements

2014-03-27 Thread Jonathan Wakely

The three attachments have been committed as well as another commit to
regenerate the HTML pages.

  * doc/xml/manual/io.xml (std.io.objects): Additional markup.

  * doc/xml/faq.xml (faq): Refer to clauses instead of chapters.
  * doc/xml/manual/appendix_contributing.xml (contrib.design_notes):
  Likewise.
  * doc/xml/manual/backwards_compatibility.xml (backwards.third):
  Likewise.
  * doc/xml/manual/test.xml (test.organization.layout): Likewise.

  * doc/xml/manual/containers.xml (associative.bitset.size_variable):
  Fix bad s/part/chapter/ substitutions.
  * doc/xml/manual/io.xml (std.io): Likewise.
  * doc/xml/manual/numerics.xml (std.numerics.generalized_ops): Likewise.
  * doc/xml/manual/strings.xml (strings.string.Cstring): Likewise.

  * doc/html/*: Regenerate.

commit 3f5c2a4e2635254a7e71907e655bf7ad324cea28
Author: Jonathan Wakely jwak...@redhat.com
Date:   Thu Mar 27 17:48:20 2014 +

* doc/xml/manual/containers.xml (associative.bitset.size_variable):
Fix bad s/part/chapter/ substitutions.
* doc/xml/manual/io.xml (std.io): Likewise.
* doc/xml/manual/numerics.xml (std.numerics.generalized_ops): Likewise.
* doc/xml/manual/strings.xml (strings.string.Cstring): Likewise.

diff --git a/libstdc++-v3/doc/xml/manual/containers.xml 
b/libstdc++-v3/doc/xml/manual/containers.xml
index 653033d..9fea0f7 100644
--- a/libstdc++-v3/doc/xml/manual/containers.xml
+++ b/libstdc++-v3/doc/xml/manual/containers.xml
@@ -232,7 +232,7 @@
para
  There are a couple of ways to handle this kind of thing.  Please
  consider all of them before passing judgement.  They include, in
- no chaptericular order:
+ no particular order:
/para
   itemizedlist
listitemparaA very large N in 
codebitsetlt;Ngt;/code./para/listitem
diff --git a/libstdc++-v3/doc/xml/manual/io.xml 
b/libstdc++-v3/doc/xml/manual/io.xml
index 34e47ea..5ae93b9 100644
--- a/libstdc++-v3/doc/xml/manual/io.xml
+++ b/libstdc++-v3/doc/xml/manual/io.xml
@@ -424,7 +424,7 @@
paraSeriously, go do it.  Get surprised, then come back.  It's worth it.
/para
paraThe thing to remember is that the codebasic_[io]stream/code 
classes
-  handle formatting, nothing else.  In chaptericular, they break up on
+  handle formatting, nothing else.  In particular, they break up on
   whitespace.  The actual reading, writing, and storing of data is
   handled by the codebasic_streambuf/code family.  Fortunately, the
   codeoperatorlt;lt;/code is overloaded to take an ostream and
@@ -442,7 +442,7 @@
<programlisting>
OUT &lt;&lt; IN.rdbuf();</programlisting>
paraSo what emphasiswas/emphasis happening with OUTlt;lt;IN?  
Undefined
-  behavior, since that chaptericular lt;lt; isn't defined by the 
Standard.
+  behavior, since that particular lt;lt; isn't defined by the Standard.
   I have seen instances where it is implemented, but the character
   extraction process removes all the whitespace, leaving you with no
   blank lines and only Thequickbrownfox  With
@@ -659,7 +659,7 @@
paraNote, by the way, that the synchronization requirement only applies to
   the standard streams (codecin/code, codecout/code,
   codecerr/code,
-  codeclog/code, and their wide-character counterchapters).  File 
stream
+  codeclog/code, and their wide-character counterparts).  File stream
   objects that you declare yourself have no such requirement and are fully
   buffered.
/para
diff --git a/libstdc++-v3/doc/xml/manual/numerics.xml 
b/libstdc++-v3/doc/xml/manual/numerics.xml
index 4957355..cc26153 100644
--- a/libstdc++-v3/doc/xml/manual/numerics.xml
+++ b/libstdc++-v3/doc/xml/manual/numerics.xml
@@ -65,7 +65,7 @@
itemizedlist
   listitemparacodeaccumulate/code/para/listitem
   listitemparacodeinner_product/code/para/listitem
-  listitemparacodechapterial_sum/code/para/listitem
+  listitemparacodepartial_sum/code/para/listitem
   listitemparacodeadjacent_difference/code/para/listitem
/itemizedlist
paraHere is a simple example of the two forms of codeaccumulate/code.
diff --git a/libstdc++-v3/doc/xml/manual/strings.xml 
b/libstdc++-v3/doc/xml/manual/strings.xml
index d281c02..6a94fa2 100644
--- a/libstdc++-v3/doc/xml/manual/strings.xml
+++ b/libstdc++-v3/doc/xml/manual/strings.xml
@@ -462,7 +462,7 @@ stringtok(Container amp;container, string const amp;in,
 emphasisif the implementors do it correctly/emphasis.  The 
libstdc++
 implementors did it correctly.  Other vendors might not.
 /para/listitem
-listitemparaWhile chapters of the SGI STL are used in libstdc++, 
their
+listitemparaWhile parts of the SGI STL are used in libstdc++, their
 string class is not.  The SGI codestring/code is essentially
 codevectorlt;chargt;/code and does not do any reference
 counting like libstdc++'s does.  (It is O(n), though.)
commit 

Re: [gomp4] Add tables generation

2014-03-27 Thread Ilya Verbin
On 27 Mar 17:16, Jakub Jelinek wrote:
 Which is why the table created for host by the ompexp pass should be
 streamed into the target_lto sections (marked specially somehow, special
 attribute or whatever), and then corresponding target table created from
 that, rather than created from some possibly different ordering there.

Ok, this should work.  I'll rewrite tables generation.

  -- Ilya


[PATCH, rs6000] Avoid clobbering stack pointer via P8 fusion peephole

2014-03-27 Thread Ulrich Weigand
Hello,

when trying to build Ada for powerpc64le-linux, I ran into an ICE
in fixup_args_size_notes.

It turns out that the p8 fusion peephole acts on these two insns
from the epilog sequence:

(insn 1693 1078 1079 91 (set (reg:DI 7 7)
(plus:DI (reg/f:DI 31 31)
(const_int 65536 [0x10000]))) 82 {*adddi3_internal1}
 (nil))
(insn 1079 1693 1511 91 (set (reg/f:DI 1 1)
(mem/c:DI (plus:DI (reg:DI 7 7)
(const_int -16096 [0xffffffffffffc120])) [233 %sfp+49440 S8 
A64])) 519 {*movdi_internal64}
 (expr_list:REG_DEAD (reg:DI 7 7)
(expr_list:REG_ARGS_SIZE (const_int 0 [0])
(nil

and replaces them by:

(insn 1776 1078 1777 91 (set (reg/f:DI 1 1)
(plus:DI (reg/f:DI 31 31)  
(const_int 65536 [0x10000]))) -1
 (nil)) 

(insn 1777 1776 1511 91 (set (reg/f:DI 1 1)
(mem/c:DI (plus:DI (reg/f:DI 1 1)  
(const_int -16096 [0xffffffffffffc120])) [233  S8 A8])) -1
 (expr_list:REG_ARGS_SIZE (const_int 0 [0])   
(nil)))   

Then peephole common code thinks it needs to re-create the REG_ARGS_SIZE
note and fails since the code is too complex for it to understand.  (Which
is reasonable since it doesn't know what value is being restored from the
stack here.)

However, the more fundamental problem seems to be that this transformation
should be invalid anyway, since it creates an intermediate state where
the stack pointer points to a location without proper backchain, which
violates the ABI.

The following patch fixes this by disabling the fusion peephole in those
cases where it would introduce a new use of the stack pointer as temporary
register.

Tested on powerpc64le-linux.  OK for mainline (and 4.8 after the big patch
series is committed)?

Bye,
Ulrich

ChangeLog:

* config/rs6000/rs6000.c (fusion_gpr_load_p): Refuse optimization
if it would clobber the stack pointer, even temporarily.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 208870)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -32519,6 +32519,11 @@
 
   if (!peep2_reg_dead_p (2, addis_reg))
return false;
+
+  /* If the target register being loaded is the stack pointer, we must
+ avoid loading any other value into it, even temporarily.  */
+  if (REG_P (target) && REGNO (target) == STACK_POINTER_REGNUM)
+   return false;
 }
 
   base_reg = XEXP (addr, 0);

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



patch to fix PR 60650

2014-03-27 Thread Vladimir Makarov
  The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60650

  The reason was fragmentation of the general register pool, which
resulted in a failure to assign double regs to 3 conflicting reload
pseudos.  The fragmentation started in IRA, and the chain of unfortunate
events further on in LRA resulted in an LRA crash.

  The patch fixes this problem by a more radical approach than the
previous one.  In case we can not assign a hard reg on the 1st pass
(which is a very rare event, as we can spill *non-reload* pseudos), we
spill all conflicting pseudos and always assign the 1st available hard
reg, which permits us to avoid the fragmentation.

  The patch was successfully bootstrapped and tested on x86-64 and arm.

  Committed as rev.208876.

2014-03-27  Vladimir Makarov  vmaka...@redhat.com

PR rtl-optimization/60650
* lra-assign.c (find_hard_regno_for, spill_for): Add parameter
first_p.  Use it.
(find_spills_for): New.
(assign_by_spills): Pass the new parameter to find_hard_regno_for.
Spill all pseudos on the second iteration.

2014-03-27  Vladimir Makarov  vmaka...@redhat.com

PR rtl-optimization/60650
* gcc.target/arm/pr60650.c: New.

index 28d4a0f..cba8b8f 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -451,10 +451,16 @@ adjust_hard_regno_cost (int hard_regno, int incr)
that register.  (If several registers have equal cost, the one with
the highest priority wins.)  Return -1 on failure.
 
+   If FIRST_P, return the first available hard reg ignoring other
+   criteria, e.g. allocation cost.  This approach results in less hard
+   reg pool fragmentation and permits allocating hard regs to reload
+   pseudos in complicated situations where pseudo sizes differ.
+
If TRY_ONLY_HARD_REGNO >= 0, consider only that hard register,
otherwise consider all hard registers in REGNO's class.  */
 static int
-find_hard_regno_for (int regno, int *cost, int try_only_hard_regno)
+find_hard_regno_for (int regno, int *cost, int try_only_hard_regno,
+		 bool first_p)
 {
   HARD_REG_SET conflict_set;
   int best_cost = INT_MAX, best_priority = INT_MIN, best_usage = INT_MAX;
@@ -630,7 +636,7 @@ find_hard_regno_for (int regno, int *cost, int try_only_hard_regno)
 	  best_usage = lra_hard_reg_usage[hard_regno];
 	}
 	}
-  if (try_only_hard_regno >= 0)
+  if (try_only_hard_regno >= 0 || (first_p && best_hard_regno >= 0))
 	break;
 }
   if (best_hard_regno >= 0)
@@ -816,9 +822,15 @@ static int *sorted_reload_pseudos;
to be spilled), we take into account not only how REGNO will
benefit from the spills but also how other reload pseudos not yet
assigned to hard registers benefit from the spills too.  In very
-   rare cases, the function can fail and return -1.  */
+   rare cases, the function can fail and return -1.
+
+   If FIRST_P, return the first available hard reg ignoring other
+   criteria, e.g. allocation cost and cost of spilling non-reload
+   pseudos.  This approach results in less hard reg pool fragmentation
+   and permits allocating hard regs to reload pseudos in complicated
+   situations where pseudo sizes differ.  */
 static int
-spill_for (int regno, bitmap spilled_pseudo_bitmap)
+spill_for (int regno, bitmap spilled_pseudo_bitmap, bool first_p)
 {
   int i, j, n, p, hard_regno, best_hard_regno, cost, best_cost, rclass_size;
   int reload_hard_regno, reload_cost;
@@ -905,7 +917,7 @@ spill_for (int regno, bitmap spilled_pseudo_bitmap)
 	   && (ira_reg_classes_intersect_p
 	       [rclass][regno_allocno_class_array[reload_regno]])
 	   && live_pseudos_reg_renumber[reload_regno] < 0
-	   && find_hard_regno_for (reload_regno, &cost, -1) < 0)
+	   && find_hard_regno_for (reload_regno, &cost, -1, first_p) < 0)
 	sorted_reload_pseudos[n++] = reload_regno;
   EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi)
 	{
@@ -914,7 +926,7 @@ spill_for (int regno, bitmap spilled_pseudo_bitmap)
 	fprintf (lra_dump_file, " spill %d(freq=%d)",
 		 spill_regno, lra_reg_info[spill_regno].freq);
 	}
-  hard_regno = find_hard_regno_for (regno, &cost, -1);
+  hard_regno = find_hard_regno_for (regno, &cost, -1, first_p);
   if (hard_regno >= 0)
 	{
 	  assign_temporarily (regno, hard_regno);
@@ -926,7 +938,7 @@ spill_for (int regno, bitmap spilled_pseudo_bitmap)
 	  lra_assert (live_pseudos_reg_renumber[reload_regno] < 0);
 	  if ((reload_hard_regno
 	       = find_hard_regno_for (reload_regno,
-				      &reload_cost, -1)) >= 0)
+				      &reload_cost, -1, first_p)) >= 0)
 		{
 		  if (lra_dump_file != NULL)
 		fprintf (lra_dump_file, " assign %d(cost=%d)",
@@ -1148,8 +1160,8 @@ improve_inheritance (bitmap changed_pseudos)
 		   regno, hard_regno, another_regno, another_hard_regno);
 	  update_lives (another_regno, true);
 	  lra_setup_reg_renumber (another_regno, -1, false);
-	  if (hard_regno
-	      == find_hard_regno_for (another_regno, &cost, hard_regno))
+	  if (hard_regno == 

Re: Fix PR ipa/60315 (inliner explosion)

2014-03-27 Thread Jan Hubicka
  Bootstrapped/regtested x86_64-linux, comitted.
 
 Not with Ada apparently, resulting in 
 
 === acats tests ===
 FAIL:   c34007d
 FAIL:   c34007g
 FAIL:   c34007s
 FAIL:   c37213j
 FAIL:   c37213k
 FAIL:   c37213l
 FAIL:   ce2201g
 FAIL:   cxa5a03
 FAIL:   cxa5a04
 FAIL:   cxa5a06
 FAIL:   cxg2013
 FAIL:   cxg2015
 
The problem is that by redirecting to a noreturn callee, we end up freeing
the SSA name of the LHS, but later we still process statements that refer
to it until they are removed as unreachable.
The following patch fixes it.  I tested it on x86_64-linux, but changed my
mind: I think fixup_noreturn_call should do it instead, so I will send an
updated patch after testing.
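
Roughly, the shape of the problem is this (a hedged illustration, not the
PR's testcase):

/* Once the callee is known to be noreturn, the LHS of the call must be
   dropped and its SSA name freed -- yet the now-unreachable use below
   is still scanned until the dead code is actually removed.  */
extern int callee (void) __attribute__ ((noreturn));

int caller (void)
{
  int r = callee ();	/* LHS to be removed.  */
  return r + 1;		/* Dead, but still refers to r's SSA name.  */
}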

Honza

Index: cgraph.c
===
--- cgraph.c(revision 208875)
+++ cgraph.c(working copy)
@@ -1329,6 +1331,7 @@ gimple
 cgraph_redirect_edge_call_stmt_to_callee (struct cgraph_edge *e)
 {
   tree decl = gimple_call_fndecl (e->call_stmt);
+  tree lhs = gimple_call_lhs (e->call_stmt);
   gimple new_stmt;
   gimple_stmt_iterator gsi;
 #ifdef ENABLE_CHECKING
@@ -1471,6 +1474,22 @@ cgraph_redirect_edge_call_stmt_to_callee
    update_stmt_fn (DECL_STRUCT_FUNCTION (e->caller->decl), new_stmt);
 }
 
+  /* If the call becomes noreturn, remove the lhs.  */
+  if (lhs && (gimple_call_flags (new_stmt) & ECF_NORETURN))
+    {
+      if (TREE_CODE (lhs) == SSA_NAME)
+	{
+	  gsi = gsi_for_stmt (new_stmt);
+
+	  tree var = create_tmp_var (TREE_TYPE (lhs), NULL);
+	  tree def = get_or_create_ssa_default_def
+		      (DECL_STRUCT_FUNCTION (e->caller->decl), var);
+	  gimple set_stmt = gimple_build_assign (lhs, def);
+	  gsi_insert_before (gsi, set_stmt, GSI_SAME_STMT);
+	}
+      gimple_call_set_lhs (new_stmt, NULL_TREE);
+    }
+
   cgraph_set_call_stmt_including_clones (e->caller, e->call_stmt, new_stmt, 
false);
 
   if (cgraph_dump_file)


Re: [Patch, Fortran] PR60576 Fix out-of-bounds problem

2014-03-27 Thread Tobias Burnus

An early * PING* for this wrong-code issue.

Tobias Burnus wrote:
This patch fixes part of the problems of the PR. The problem is that
one assigns an array descriptor to an assumed-rank array descriptor.
The latter has for BT_CLASS the size of max_dim (reason: we have first
the data array and then the vtab). With the flag set to true, one takes
the TREE_TYPE from the LHS (i.e. the assumed-rank variable), and as the
type determines how many bytes the range assignment copies, one reads
max_dimension elements from the RHS array - which can be too many.
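
For readers not fluent in the descriptor internals, here is a minimal C
model of the over-read (an assumed, simplified layout, not gfortran's
exact one, and MAX_DIM is an illustrative value only):

#include <string.h>

enum { MAX_DIM = 7 };	/* illustrative only */
struct dim { long lbound, ubound, stride; };
struct desc_rank1   { void *base; long dtype; struct dim dim[1]; };
struct desc_assumed { void *base; long dtype; struct dim dim[MAX_DIM]; };

/* An assumed-rank descriptor reserves bound triples for the maximal
   rank, so copying with the LHS type reads more dim[] entries than a
   lower-rank RHS descriptor actually owns.  */
void assign_desc (struct desc_assumed *lhs, struct desc_rank1 *rhs)
{
  memcpy (lhs, rhs, sizeof (*lhs));	/* reads past the end of *rhs */
}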


Testcase: Already in the testsuite, even if it only fails under 
special conditions.


Build and regtested on x86-64-gnu-linux.
OK for the trunk and 4.8?

Tobias

PS: I haven't investigated the issues Jakub is seeing. With valgrind,
they do not pop up, and my attempt to build with all checking enabled
failed with configure or compile errors.




Re: [Patch, Fortran] PR58880 - Fix ICE with finalizers

2014-03-27 Thread Tobias Burnus

* PING*

Tobias Burnus wrote:

Hi all,

this patch fixes a problem with the conversion of scalars to 
descriptors. There one assigns the address of the scalar to the 
base_address field of the descriptor. The ICE occurred when the RHS 
(the scalar) wasn't a pointer.


It does not fully solve the PR as for some reasons the finalization 
wrapper is not generated - which causes link errors or ICEs (see PR).


Build and regtested on x86-64-gnu-linux.
OK for the (4.9) trunk?

Tobias




[PATCH] Fix PR c++/60573

2014-03-27 Thread Adam Butcher
PR c++/60573
* name-lookup.h (cp_binding_level): New field scope_defines_class_p.
* semantics.c (begin_class_definition): Set scope_defines_class_p.
* pt.c (instantiate_class_template_1): Likewise.
* parser.c (synthesize_implicit_template_parm): Use cp_binding_level::
scope_defines_class_p rather than TYPE_BEING_DEFINED as the predicate
for unwinding to class-defining scope to handle the erroneous definition
of a generic function of an arbitrarily nested class within an enclosing
class.

PR c++/60573
* g++.dg/cpp1y/pr60573.C: New testcase.
---
 gcc/cp/name-lookup.h                 |  6 +++++-
 gcc/cp/parser.c                      | 23 +++++++++++++++++------
 gcc/cp/pt.c                          |  5 ++++-
 gcc/cp/semantics.c                   |  1 +
 gcc/testsuite/g++.dg/cpp1y/pr60573.C | 28 ++++++++++++++++++++++++++++
 5 files changed, 55 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/pr60573.C

diff --git a/gcc/cp/name-lookup.h b/gcc/cp/name-lookup.h
index a63442f..9e5d812 100644
--- a/gcc/cp/name-lookup.h
+++ b/gcc/cp/name-lookup.h
@@ -255,7 +255,11 @@ struct GTY(()) cp_binding_level {
   unsigned more_cleanups_ok : 1;
   unsigned have_cleanups : 1;
 
-  /* 24 bits left to fill a 32-bit word.  */
+  /* Set if this scope is of sk_class kind and is the defining
+ scope for this_entity.  */
+  unsigned scope_defines_class_p : 1;
+
+  /* 23 bits left to fill a 32-bit word.  */
 };
 
 /* The binding level currently in effect.  */
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index e729d65..4919a67 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32000,7 +32000,7 @@ synthesize_implicit_template_parm  (cp_parser *parser)
{
  /* If not defining a class, then any class scope is a scope level in
 an out-of-line member definition.  In this case simply wind back
-beyond the first such scope to inject the template argument list.
+beyond the first such scope to inject the template parameter list.
 Otherwise wind back to the class being defined.  The latter can
 occur in class member friend declarations such as:
 
@@ -32011,12 +32011,23 @@ synthesize_implicit_template_parm  (cp_parser *parser)
 friend void A::foo (auto);
   };
 
-   The template argument list synthesized for the friend declaration
-   must be injected in the scope of 'B', just beyond the scope of 'A'
-   introduced by 'A::'.  */
+   The template parameter list synthesized for the friend declaration
+   must be injected in the scope of 'B'.  This can also occur in
+   erroneous cases such as:
 
-  while (scope->kind == sk_class
-	 && !TYPE_BEING_DEFINED (scope->this_entity))
+	 struct A {
+	   struct B {
+	     void foo (auto);
+	   };
+	   void B::foo (auto) {}
+	 };
+
+   Here the attempted definition of 'B::foo' within 'A' is ill-formed
+   but, nevertheless, the template parameter list synthesized for the
+   declarator should be injected into the scope of 'A' as if the
+   ill-formed template was specified explicitly.  */
+
+  while (scope->kind == sk_class && !scope->scope_defines_class_p)
{
  parent_scope = scope;
  scope = scope->level_chain;
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c791d03..90faeec 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8905,9 +8905,12 @@ instantiate_class_template_1 (tree type)
 return type;
 
   /* Now we're really doing the instantiation.  Mark the type as in
- the process of being defined.  */
+ the process of being defined...  */
   TYPE_BEING_DEFINED (type) = 1;
 
+  /* ... and the scope defining it.  */
+  class_binding_level->scope_defines_class_p = 1;
+
   /* We may be in the middle of deferred access check.  Disable
  it now.  */
   push_deferring_access_checks (dk_no_deferred);
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 886fbb8..deba2ab 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -2777,6 +2777,7 @@ begin_class_definition (tree t)
   maybe_process_partial_specialization (t);
   pushclass (t);
   TYPE_BEING_DEFINED (t) = 1;
+  class_binding_level->scope_defines_class_p = 1;
 
   if (flag_pack_struct)
 {
diff --git a/gcc/testsuite/g++.dg/cpp1y/pr60573.C b/gcc/testsuite/g++.dg/cpp1y/pr60573.C
new file mode 100644
index 0000000..2f60707
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/pr60573.C
@@ -0,0 +1,28 @@
+// PR c++/60573
+// { dg-do compile { target c++1y } }
+// { dg-options "" }
+
+struct A
+{
+  struct B
+  {
+void foo(auto);
+  };
+
+  void B::foo(auto) {}  // { dg-error "cannot define" }
+
+  struct X
+  {
+struct Y
+{
+  struct Z
+  {
+void foo(auto);
+  };
+};
+
+void 

Re: [Patch, Fortran] PR58880 - Fix ICE with finalizers

2014-03-27 Thread Janus Weil
Hi Tobias,

 this patch fixes a problem with the conversion of scalars to descriptors.
 There one assigns the address of the scalar to the base_address field of the
 descriptor. The ICE occurred when the RHS (the scalar) wasn't a pointer.

looks good to me. Ok for trunk!


 It does not fully solve the PR as for some reasons the finalization wrapper
 is not generated - which causes link errors or ICEs (see PR).

I was planning to have a look at that problem, but currently I have
too much other stuff going on to devote any serious amount of time to
gfortran. I hope I will manage to do it soon, but cannot say when ...

Cheers,
Janus


Re: [PATCH] Fix PR c++/60573

2014-03-27 Thread Adam Butcher

On 2014-03-27 20:45, Adam Butcher wrote:

@@ -8905,9 +8905,12 @@ instantiate_class_template_1 (tree type)
 return type;

   /* Now we're really doing the instantiation.  Mark the type as in
- the process of being defined.  */
+ the process of being defined...  */
   TYPE_BEING_DEFINED (type) = 1;

+  /* ... and the scope defining it.  */
+  class_binding_level->scope_defines_class_p = 1;



I meant current_binding_level here, but I'm not sure it's necessary
at all.




Re: [PATCH] Fix PR c++/60573

2014-03-27 Thread Adam Butcher

On 2014-03-27 20:45, Adam Butcher wrote:

PR c++/60573
* name-lookup.h (cp_binding_level): New field scope_defines_class_p.
* semantics.c (begin_class_definition): Set scope_defines_class_p.
* pt.c (instantiate_class_template_1): Likewise.
	* parser.c (synthesize_implicit_template_parm): Use cp_binding_level::
	scope_defines_class_p rather than TYPE_BEING_DEFINED as the predicate
	for unwinding to class-defining scope to handle the erroneous definition
	of a generic function of an arbitrarily nested class within an enclosing
	class.

Still got issues with this.  It fails on out-of-line defs.  I'll have 
another look.




Re: [C++ patch] for C++/52369

2014-03-27 Thread Fabien Chêne
Hi,

As a followup, the following patch homogenizes some diagnostics that
relate to uninitialized const or reference members.
Testing on x86_64-linux is in progress; OK to commit for next stage 1 if
it succeeds? (Or for trunk otherwise, I dare to mention it.)

2014-03-28  Fabien Chêne  fab...@gcc.gnu.org

* cp/init.c (perform_member_init): Homogenize uninitialized
diagnostics.

2014-03-28  Fabien Chêne  fab...@gcc.gnu.org

* g++.dg/init/ctor4.C: Adjust.
* g++.dg/init/ctor4-1.C: New.

2014-03-24 18:21 GMT+01:00 Jason Merrill ja...@redhat.com:
 OK, thanks.

 Jason

-- 
Fabien
Index: gcc/testsuite/g++.dg/init/ctor4-1.C
===
--- gcc/testsuite/g++.dg/init/ctor4-1.C	(revision 0)
+++ gcc/testsuite/g++.dg/init/ctor4-1.C	(revision 0)
@@ -0,0 +1,21 @@
+// { dg-do compile }
+
+class foo {
+public:
+  foo();
+};
+
+class bar: public foo {	// { dg-error "uninitialized" }
+		   // { dg-message "implicitly deleted" "" { target c++11 } 8 }
+private:
+  int const a; // { dg-message "should be initialized" }
+};
+
+foo::foo() {
+}
+
+int main(int argc, char **argv)
+{
+  bar x; // { dg-error "deleted" "" { target c++11 } }
+	 // { dg-message "synthesized" "" { target { ! c++11 } } 19 }
+}
Index: gcc/testsuite/g++.dg/init/ctor4.C
===
--- gcc/testsuite/g++.dg/init/ctor4.C	(revision 208853)
+++ gcc/testsuite/g++.dg/init/ctor4.C	(working copy)
@@ -6,9 +6,10 @@ public:
   foo();
 };
 
-class bar: public foo {		// { dg-error "reference|bar::bar" }
+class bar: public foo {	// { dg-error "uninitialized" }
+		   // { dg-message "implicitly deleted" "" { target c++11 } 9 }
 private:
-  int a;
+  int a; // { dg-message "should be initialized" }
 };
 
 foo::foo() {
@@ -16,5 +17,6 @@ foo::foo() {
 
 int main(int argc, char **argv)
 {
-  bar x; // { dg-message "synthesized|deleted" }
+  bar x; // { dg-error "deleted" "" { target c++11 } }
+	 // { dg-message "synthesized" "" { target { ! c++11 } } 20 }
 }
Index: gcc/cp/init.c
===
--- gcc/cp/init.c	(revision 208854)
+++ gcc/cp/init.c	(working copy)
@@ -710,13 +710,19 @@ perform_member_init (tree member, tree i
 	  tree core_type;
 	  /* member traversal: note it leaves init NULL */
 	  if (TREE_CODE (type) == REFERENCE_TYPE)
-	    permerror (DECL_SOURCE_LOCATION (current_function_decl),
-		       "uninitialized reference member %qD",
-		       member);
+	    {
+	      permerror (DECL_SOURCE_LOCATION (current_function_decl),
+			 "uninitialized reference member in %q#T", type);
+	      inform (DECL_SOURCE_LOCATION (member),
+		      "%q#D should be initialized", member);
+	    }
 	  else if (CP_TYPE_CONST_P (type))
-	    permerror (DECL_SOURCE_LOCATION (current_function_decl),
-		       "uninitialized member %qD with %<const%> type %qT",
-		       member, type);
+	    {
+	      permerror (DECL_SOURCE_LOCATION (current_function_decl),
+			 "uninitialized const member in %q#T", type);
+	      inform (DECL_SOURCE_LOCATION (member),
+		      "%q#D should be initialized", member);
+	    }
 
 	  core_type = strip_array_types (type);
 


[PATCH] RL78 - minor size optimization

2014-03-27 Thread Richard Hulme

Hi,

This patch is a small optimization for the RL78 target that uses the 
'clrb' instruction where possible when performing a zero-extend instead 
of 'mov'ing a literal #0.  This saves a byte on each operation.


Regards,

Richard

2014-03-27  Richard Hulme  pepe...@yahoo.com

* config/rl78/rl78-real.md (zero_extendqihi2_real):
Minor optimization to use clrb instruction where possible,
which is 1 byte shorter than 'mov'ing #0.

---
 gcc/config/rl78/rl78-real.md |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rl78/rl78-real.md b/gcc/config/rl78/rl78-real.md
index 27ff60f..3503a02 100644
--- a/gcc/config/rl78/rl78-real.md
+++ b/gcc/config/rl78/rl78-real.md
@@ -77,12 +77,13 @@
 ;;-- Conversions 

 (define_insn "*zero_extendqihi2_real"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rv,A")
-	(zero_extend:HI (match_operand:QI 1 "general_operand" "0,a")))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=Bv,DT,A")
+	(zero_extend:HI (match_operand:QI 1 "general_operand" "0,0,a")))]
   "rl78_real_insns_ok ()"
   "@
+   clrb\t%Q0
    mov\t%Q0, #0
-   mov\tx, a \;mov\ta, #0"
+   mov\tx, a \;clrb\ta"
 )

 (define_insn *extendqihi2_real
--
1.7.9.5


Re: [PATCH, rs6000] Avoid clobbering stack pointer via P8 fusion peephole

2014-03-27 Thread David Edelsohn
On Thu, Mar 27, 2014 at 2:49 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
 Hello,

 when trying to build Ada for powerpc64le-linux, I ran into an ICE
 in fixup_args_size_notes.

 It turns out that the p8 fusion peephole acts on these two insns
 from the epilog sequence:

 (insn 1693 1078 1079 91 (set (reg:DI 7 7)
 (plus:DI (reg/f:DI 31 31)
 (const_int 65536 [0x10000]))) 82 {*adddi3_internal1}
  (nil))
 (insn 1079 1693 1511 91 (set (reg/f:DI 1 1)
 (mem/c:DI (plus:DI (reg:DI 7 7)
 (const_int -16096 [0xffffffffffffc120])) [233 %sfp+49440 S8 A64])) 519 {*movdi_internal64}
  (expr_list:REG_DEAD (reg:DI 7 7)
 (expr_list:REG_ARGS_SIZE (const_int 0 [0])
 (nil))))

 and replaces them by:

 (insn 1776 1078 1777 91 (set (reg/f:DI 1 1)
 (plus:DI (reg/f:DI 31 31)
 (const_int 65536 [0x10000]))) -1
  (nil))

 (insn 1777 1776 1511 91 (set (reg/f:DI 1 1)
 (mem/c:DI (plus:DI (reg/f:DI 1 1)
 (const_int -16096 [0xffffffffffffc120])) [233  S8 A8])) -1
  (expr_list:REG_ARGS_SIZE (const_int 0 [0])
 (nil)))

 Then peephole common code thinks it needs to re-create the REG_ARGS_SIZE
 note and fails since the code is too complex for it to understand.  (Which
 is reasonable since it doesn't know what value is being restored from the
 stack here.)

 However, the more fundamental problem seems to be that this transformation
 should be invalid anyway, since it creates an intermediate state where
 the stack pointer points to a location without proper backchain, which
 violates the ABI.

 The following patch fixes this by disabling the fusion peephole in those
 cases where it would introduce a new use of the stack pointer as temporary
 register.

 Tested on powerpc64le-linux.  OK for mainline (and 4.8 after the big patch
 series is committed)?

 Bye,
 Ulrich

 ChangeLog:

 * config/rs6000/rs6000.c (fusion_gpr_load_p): Refuse optimization
 if it would clobber the stack pointer, even temporarily.

Okay.

Thanks, David


Re: [PATCH], PR 60672, Add xxsldwi/xxpermdi builtins to altivec.h

2014-03-27 Thread Michael Meissner
Whoops, I forgot to document the new builtin.  I just committed this change to
the documentation file.  Sorry about that.

I also deleted the comment on the nop instruction, just in case there is a VSX
assembler some day that uses a different comment convention.

2014-03-27  Michael Meissner  meiss...@linux.vnet.ibm.com

* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
Document use of vec_xxsldwi and vec_xxpermdi builtins.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 208879)
+++ gcc/doc/extend.texi (working copy)
@@ -15075,6 +15075,9 @@ vector unsigned long long vec_vaddudm (v
 vector unsigned long long vec_vaddudm (vector unsigned long long,
vector bool unsigned long long);
 
+vector long long vec_vbpermq (vector signed char, vector signed char);
+vector long long vec_vbpermq (vector unsigned char, vector unsigned char);
+
 vector long long vec_vclz (vector long long);
 vector unsigned long long vec_vclz (vector unsigned long long);
 vector int vec_vclz (vector int);

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] RL78 - minor size optimization

2014-03-27 Thread DJ Delorie

This is OK after 4.9 branches (i.e. stage1).  I suspect we could add
AX to the first alternative, although I don't know if it will get
used.  We could add HL to the second alternative to complete the
replacement of the 'r' constraint.


[Fortran-CAF, patch, committed] Implement the library call for caf_send

2014-03-27 Thread Tobias Burnus

This patch implements the call to the library for code of the form:
   caf[j] = (rhs - expr)

Caveats: It currently handles only scalars, and for characters only
length-one ones. While it also copies derived types, it does not handle
allocatable components.
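
For reference, the coindex-to-image-index mapping that the new
caf_get_image_index helper expands corresponds roughly to this scalar C
sketch (an assumed simplification; the patch builds the equivalent tree
expressions):

/* Column-major linearization of the cosubscripts; result is 1-based.  */
int
image_index (int corank, const int idx[], const int lcobound[],
	     const int extent[])
{
  int img = 0, stride = 1;
  for (int i = 0; i < corank; i++)
    {
      img += (idx[i] - lcobound[i]) * stride;
      stride *= extent[i];
    }
  return img + 1;
}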


With a suitable communication library, this patch finally permits real 
multi-image communication. Hooray!



The next steps are (in no specific order and without committal to do it 
myself):

- Adding test cases for this code (dumps, -fcaf_single run-time checks)
- supporting array sections and array vector sections
- supporting len > 1 character strings
- allocatable/pointer components of coarrays
- supporting coindexed coarrays on the RHS.

Committed to the Fortran-CAF branch as Rev.

Tobias
Index: gcc/fortran/ChangeLog.fortran-caf
===
--- gcc/fortran/ChangeLog.fortran-caf	(Revision 208886)
+++ gcc/fortran/ChangeLog.fortran-caf	(working copy)
@@ -1,3 +1,10 @@
+2014-03-28  Tobias Burnus  bur...@net-b.de
+
+	* trans-intrinsic.c (caf_get_image_index, conv_caf_send): New.
+	(gfc_conv_intrinsic_subroutine): Call it.
+	* resolve.c (resolve_ordinary_assign): Enable coindex LHS
+	support for -fcoarray=lib.
+
 2014-03-15  Tobias Burnus  bur...@net-b.de
 
 	* gfortran.h (gfc_isym_id): Add GFC_ISYM_CAF_SEND.
@@ -6,6 +13,8 @@
 	* resolve.c (resolve_ordinary_assign): Prepare the
 	replacement of the assignment for coindexed LHS by
 	a call to caf_send.
+	(resolve_code): Ignore component_assignments for those
+	assignments which have been replaced.
 
 2014-03-14  Tobias Burnus  bur...@net-b.de
 
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c	(Revision 208886)
+++ gcc/fortran/resolve.c	(working copy)
@@ -9229,7 +9229,7 @@ resolve_ordinary_assign (gfc_code *code, gfc_names
 
   gfc_check_assign (lhs, rhs, 1);
 
-  if (false && lhs_coindexed && gfc_option.coarray == GFC_FCOARRAY_LIB)
+  if (lhs_coindexed && gfc_option.coarray == GFC_FCOARRAY_LIB)
 {
   code->op = EXEC_CALL;
   gfc_get_sym_tree (GFC_PREFIX ("caf_send"), ns, &code->symtree, true);
Index: gcc/fortran/trans-intrinsic.c
===
--- gcc/fortran/trans-intrinsic.c	(Revision 208886)
+++ gcc/fortran/trans-intrinsic.c	(working copy)
@@ -7788,6 +7788,182 @@ conv_intrinsic_move_alloc (gfc_code *code)
 }
 
 
+/* Convert the coindex of a coarray into an image index; the result is
+   image_num =  (idx(1)-lcobound(1)+1) + (idx(2)-lcobound(2)+1)*extent(1)
+  + (idx(3)-lcobound(3)+1)*extent(2) + ...  */
+
+static tree
+caf_get_image_index (stmtblock_t *block, gfc_expr *e, tree desc)
+{
+  gfc_ref *ref;
+  tree lbound, ubound, extent, tmp, img_idx;
+  gfc_se se;
+  int i;
+
+  for (ref = e->ref; ref; ref = ref->next)
+    if (ref->type == REF_ARRAY && ref->u.ar.codimen > 0)
+      break;
+  gcc_assert (ref != NULL);
+
+  img_idx = integer_zero_node;
+  extent = integer_one_node;
+  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (desc)))
+    for (i = ref->u.ar.dimen; i < ref->u.ar.dimen + ref->u.ar.codimen; i++)
+      {
+	gfc_init_se (&se, NULL);
+	gfc_conv_expr_type (&se, ref->u.ar.start[i], integer_type_node);
+	gfc_add_block_to_block (block, &se.pre);
+	lbound = gfc_conv_descriptor_lbound_get (desc, gfc_rank_cst[i]);
+	tmp = fold_build2_loc (input_location, MINUS_EXPR,
+			       integer_type_node, se.expr,
+			       fold_convert (integer_type_node, lbound));
+	tmp = fold_build2_loc (input_location, PLUS_EXPR, integer_type_node,
+			       tmp, integer_one_node);
+	tmp = fold_build2_loc (input_location, MULT_EXPR, integer_type_node,
+			       extent, tmp);
+	img_idx = fold_build2_loc (input_location, PLUS_EXPR, integer_type_node,
+				   img_idx, tmp);
+	if (i < ref->u.ar.dimen + ref->u.ar.codimen - 1)
+	  {
+	    ubound = gfc_conv_descriptor_ubound_get (desc, gfc_rank_cst[i]);
+	    extent = gfc_conv_array_extent_dim (lbound, ubound, NULL);
+	    extent = fold_convert (integer_type_node, extent);
+	  }
+      }
+  else
+    for (i = ref->u.ar.dimen; i < ref->u.ar.dimen + ref->u.ar.codimen; i++)
+      {
+	gfc_init_se (&se, NULL);
+	gfc_conv_expr_type (&se, ref->u.ar.start[i], integer_type_node);
+	gfc_add_block_to_block (block, &se.pre);
+	lbound = GFC_TYPE_ARRAY_LBOUND (TREE_TYPE (desc), i);
+	lbound = fold_convert (integer_type_node, lbound);
+	tmp = fold_build2_loc (input_location, MINUS_EXPR,
+			       integer_type_node, se.expr, lbound);
+	tmp = fold_build2_loc (input_location, PLUS_EXPR, integer_type_node,
+			       tmp, integer_one_node);
+	tmp = fold_build2_loc (input_location, MULT_EXPR, integer_type_node,
+			       extent, tmp);
+	img_idx = fold_build2_loc (input_location, PLUS_EXPR, integer_type_node,
+				   img_idx, tmp);
+	if (i < ref->u.ar.dimen + ref->u.ar.codimen - 1)
+	  {
+	    ubound = GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (desc), i);
+	    ubound = fold_convert (integer_type_node, ubound);
+	    extent = fold_build2_loc 

Re: Fix PR ipa/60315 (inliner explosion)

2014-03-27 Thread Eric Botcazou
 I will check, thanks for the reduced testcase. It seems like another case
 where we get predicate wrong that ought to be fixed, of course.

You're welcome.  Another bug introduced by the patch:

eric@polaris:~/build/gcc/native> gcc/xgcc -Bgcc -S opt33.adb -O
/home/eric/install/gcc/lib64/gcc/x86_64-suse-linux/4.9.0/adainclude/a-
coorse.adb: In function 'Opt33.My_Ordered_Sets.To_Set':
/home/eric/install/gcc/lib64/gcc/x86_64-suse-linux/4.9.0/adainclude/a-
coorse.adb:1983:4: error: static chain with function that doesn't use one
.builtin_unreachable (tree, node_54, 1, 0B); [static-chain: &FRAME.437]
/home/eric/install/gcc/lib64/gcc/x86_64-suse-linux/4.9.0/adainclude/a-
coorse.adb:1983:4: error: static chain with function that doesn't use one
.builtin_unreachable (tree, node_54, _60, node_57); [static-chain: &FRAME.437]
+===GNAT BUG DETECTED==+
| 4.9.0 20140327 (experimental) [trunk revision 208879] (x86_64-suse-linux) 
GCC error:|
| verify_gimple failed |
| Error detected around /home/eric/install/gcc/lib64/gcc/x86_64-suse-
linux/4.9.0/adainclude/a-coorse.adb:1983:4|

You can install the testcase as gnat.dg/opt33.adb in the testsuite.

-- 
Eric Botcazou

-- { dg-do compile }
-- { dg-options "-O" }

with Ada.Containers.Ordered_Sets;
with Ada.Strings.Unbounded;

procedure Opt33 is

   type Rec is record
  Name : Ada.Strings.Unbounded.Unbounded_String;
   end record;

   function "<" (Left : Rec; Right : Rec) return Boolean;

   package My_Ordered_Sets is new Ada.Containers.Ordered_Sets (Rec);

   protected type Data is
  procedure Do_It;
   private
  Set : My_Ordered_Sets.Set;
   end Data;

   function "<" (Left : Rec; Right : Rec) return Boolean is
   begin
      return False;
   end "<";

   protected body Data is
  procedure Do_It is
 procedure Dummy (Position : My_Ordered_Sets.Cursor) is
 begin
null;
 end;
  begin
 Set.Iterate (Dummy'Access);
  end;
   end Data;

begin
   null;
end;


various _mm512_set* intrinsics

2014-03-27 Thread Ulrich Drepper
Here are more intrinsics that are missing.  I know that gcc currently
generates horrible code for most of them, but I think it's more important
to have the API in place, albeit non-optimal.  Maybe this entices someone
to add the necessary optimizations.

The code is self-contained and shouldn't interfere with any correct
code.  Should this also go into 4.9?
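
For instance, a quick usage sketch (compile with -mavx512f; the setr4
form lays its four arguments out in ascending memory order, repeated
across the vector):

#include <immintrin.h>

__m512i
four_pattern (void)
{
  /* Memory order: [1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4].  */
  return _mm512_setr4_epi32 (1, 2, 3, 4);
}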

2014-03-27  Ulrich Drepper  drep...@gmail.com

* config/i386/avx512fintrin.h (__v32hi): Define type.
(__v64qi): Likewise.
(_mm512_set1_epi8): Define.
(_mm512_set1_epi16): Define.
(_mm512_set4_epi32): Define.
(_mm512_set4_epi64): Define.
(_mm512_set4_pd): Define.
(_mm512_set4_ps): Define.
(_mm512_setr4_epi64): Define.
(_mm512_setr4_epi32): Define.
(_mm512_setr4_pd): Define.
(_mm512_setr4_ps): Define.
(_mm512_setzero_epi32): Define.

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 9602866..314895a 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -39,6 +39,8 @@ typedef double __v8df __attribute__ ((__vector_size__ (64)));
 typedef float __v16sf __attribute__ ((__vector_size__ (64)));
 typedef long long __v8di __attribute__ ((__vector_size__ (64)));
 typedef int __v16si __attribute__ ((__vector_size__ (64)));
+typedef short __v32hi __attribute__ ((__vector_size__ (64)));
+typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 /* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components.  */
@@ -130,6 +132,32 @@ _mm512_undefined_si512 (void)
   return __Y;
 }
 
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi8 (char __A)
+{
+  return __extension__ (__m512i)(__v64qi)
+{ __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A };
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi16 (short __A)
+{
+  return __extension__ (__m512i)(__v32hi)
+{ __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A };
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_set1_pd (double __A)
@@ -152,6 +180,54 @@ _mm512_set1_ps (float __A)
 (__mmask16) -1);
 }
 
+/* Create the vector [A B C D A B C D A B C D A B C D].  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_epi32 (int __A, int __B, int __C, int __D)
+{
+  return __extension__ (__m512i)(__v16si)
+{ __D, __C, __B, __A, __D, __C, __B, __A,
+  __D, __C, __B, __A, __D, __C, __B, __A };
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_epi64 (long long __A, long long __B, long long __C,
+  long long __D)
+{
+  return __extension__ (__m512i) (__v8di)
+{ __D, __C, __B, __A, __D, __C, __B, __A };
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_pd (double __A, double __B, double __C, double __D)
+{
+  return __extension__ (__m512d)
+{ __D, __C, __B, __A, __D, __C, __B, __A };
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_ps (float __A, float __B, float __C, float __D)
+{
+  return __extension__ (__m512)
+{ __D, __C, __B, __A, __D, __C, __B, __A,
+  __D, __C, __B, __A, __D, __C, __B, __A };
+}
+
+#define _mm512_setr4_epi64(e0,e1,e2,e3) \
+  _mm512_set4_epi64(e3,e2,e1,e0)
+
+#define _mm512_setr4_epi32(e0,e1,e2,e3) \
+  _mm512_set4_epi32(e3,e2,e1,e0)
+
+#define _mm512_setr4_pd(e0,e1,e2,e3) \
+  _mm512_set4_pd(e3,e2,e1,e0)
+
+#define _mm512_setr4_ps(e0,e1,e2,e3) \
+  _mm512_set4_ps(e3,e2,e1,e0)
+
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_ps (void)
@@ -169,6 +245,13 @@ _mm512_setzero_pd (void)
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_epi32 (void)
+{
+  return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+}
+
+extern __inline __m512i
+__attribute__ 

Go patch committed: Avoid reading bogus field

2014-03-27 Thread Ian Lance Taylor
PR 59545 points out that there is a case where the Go frontend reads an
invalid value from a class field.  This happens because of an incorrect
static_cast.  This patch fixes the problem by performing the static_cast
only when it is valid.  Bootstrapped and ran Go tests on
x86_64-unknown-linux-gnu.  Committed to mainline.
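
The pattern being fixed, shown as a hedged C analogue (names invented;
the real code is the C++ in go/expressions.cc below):

/* Downcasting on a guess reads fields that may not exist; check the
   kind tag first, as the patch now checks classification().  */
enum expr_kind { EXPR_UNARY, EXPR_OTHER };
struct expr { enum expr_kind kind; };
struct unary_expr { struct expr base; int op; };

int
is_address_of (const struct expr *e)
{
  if (e->kind != EXPR_UNARY)
    return 0;	/* never touch unary-only fields here */
  return ((const struct unary_expr *) e)->op == '&';
}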

Ian

diff -r 55fb8756d889 go/expressions.cc
--- a/go/expressions.cc	Thu Mar 27 14:22:49 2014 -0700
+++ b/go/expressions.cc	Thu Mar 27 22:11:25 2014 -0700
@@ -4163,8 +4163,12 @@
 
   go_assert(!this->expr_->is_composite_literal()
	     || this->expr_->is_immutable());
-	  Unary_expression* ue = static_cast<Unary_expression*>(this->expr_);
-	  go_assert(ue == NULL || ue->op() != OPERATOR_AND);
+	  if (this->expr_->classification() == EXPRESSION_UNARY)
+	    {
+	      Unary_expression* ue =
+		static_cast<Unary_expression*>(this->expr_);
+	      go_assert(ue->op() != OPERATOR_AND);
+	    }
 	}
 
   // Build a decl for a constant constructor.


Re: Fix PR ipa/60315 (inliner explosion)

2014-03-27 Thread Jan Hubicka
  I will check, thanks for the reduced testcase. It seems like another case
  where we get predicate wrong that ought to be fixed, of course.
 
 You're welcome.  Another bug introduced by the patch:
 
 eric@polaris:~/build/gcc/native> gcc/xgcc -Bgcc -S opt33.adb -O
 /home/eric/install/gcc/lib64/gcc/x86_64-suse-linux/4.9.0/adainclude/a-
 coorse.adb: In function 'Opt33.My_Ordered_Sets.To_Set':
 /home/eric/install/gcc/lib64/gcc/x86_64-suse-linux/4.9.0/adainclude/a-
 coorse.adb:1983:4: error: static chain with function that doesn't use one
 .builtin_unreachable (tree, node_54, 1, 0B); [static-chain: &FRAME.437]
 /home/eric/install/gcc/lib64/gcc/x86_64-suse-linux/4.9.0/adainclude/a-
 coorse.adb:1983:4: error: static chain with function that doesn't use one
 .builtin_unreachable (tree, node_54, _60, node_57); [static-chain: &FRAME.437]
 +===GNAT BUG DETECTED==+
 | 4.9.0 20140327 (experimental) [trunk revision 208879] (x86_64-suse-linux) 
 GCC error:|
 | verify_gimple failed |
 | Error detected around /home/eric/install/gcc/lib64/gcc/x86_64-suse-
 linux/4.9.0/adainclude/a-coorse.adb:1983:4|

Thanks, looks like another (semi-)latent bug.  I will add code to the
cgraph redirection to get rid of the static chain if it is unneeded.

Honza