Re: [RFA] [PATCH][PR tree-optimization/59749] Fix recently introduced ree bug

2014-01-12 Thread Jakub Jelinek
On Sun, Jan 12, 2014 at 11:21:59PM -0700, Jeff Law wrote:
> --- a/gcc/ree.c
> +++ b/gcc/ree.c
> @@ -297,6 +297,13 @@ combine_set_extension (ext_cand *cand, rtx curr_insn, 
> rtx *orig_set)
>else
>  new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (*orig_set)));
>  
> +  /* We're going to be widening the result of DEF_INSN, ensure that doing so
> + doesn't change the number of hard registers needed for the result.  */
> +  if (HARD_REGNO_NREGS (REGNO (new_reg), cand->mode)
> +  != HARD_REGNO_NREGS (REGNO (SET_SRC (*orig_set)),

Note you can use orig_src instead of SET_SRC (*orig_set) here.

> +GET_MODE (SET_DEST (*orig_set))))
> + return false;
> +
>/* Merge constants by directly moving the constant into the register under
>   some conditions.  Recall that RTL constants are sign-extended.  */
>if (GET_CODE (orig_src) == CONST_INT

Are you sure the above is needed even for the
REGNO (new_reg) == REGNO (SET_DEST (*orig_set))
&& REGNO (new_reg) == REGNO (orig_src) case?
I mean in that case no copy insn is going to be scheduled right now, nor has
been previously scheduled, so we are back to what the code did before
r206418.  I can imagine it can be a problem, but doesn't have to be.

(set (reg:SI 3) (something:SI))
(set (reg:SI 2) (expression:SI))// def_insn
(use (reg:SI 3))
(set (reg:DI 3) (sign_extend:DI (reg:SI 2)))

So, perhaps if we wanted to handle the HARD_REGNO_NREGS != HARD_REGNO_NREGS
case when all 3 REGNOs are the same, we'd need to limit it to the case where
cand->insn and curr_insn are in the same bb, DF_INSN_LUID of curr_insn
is smaller than DF_INSN_LUID of cand->insn and the extra hard regs aren't
used between the two.  Perhaps not worth it?
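
For illustration, the bail-out condition discussed above can be sketched outside of GCC.  This is a toy model only: toy_hard_regno_nregs is a hypothetical stand-in for HARD_REGNO_NREGS, and it assumes every hard register has the same width, which real targets need not satisfy.

```c
#include <assert.h>

/* Toy stand-in for HARD_REGNO_NREGS: how many consecutive hard registers
   a value of MODE_BYTES bytes occupies when each hard register holds
   REG_BYTES bytes.  Hypothetical helper, not GCC's macro.  */
static unsigned
toy_hard_regno_nregs (unsigned mode_bytes, unsigned reg_bytes)
{
  assert (mode_bytes > 0 && reg_bytes > 0);
  return (mode_bytes + reg_bytes - 1) / reg_bytes;
}

/* Return 1 if widening a value from OLD_BYTES to NEW_BYTES keeps the
   number of hard registers unchanged, i.e. the case the check lets
   through; return 0 when the widened value would spill into an extra
   hard register.  */
static int
toy_widening_is_safe (unsigned old_bytes, unsigned new_bytes,
                      unsigned reg_bytes)
{
  return toy_hard_regno_nregs (old_bytes, reg_bytes)
         == toy_hard_regno_nregs (new_bytes, reg_bytes);
}
```

With 8-byte registers an SImode-to-DImode widening (4 to 8 bytes) stays in one register; with 4-byte registers it needs two, which is the case the check rejects.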

BTW, I'm surprised to hear that it triggers in the testsuite already (for
the 3 REGNOs the same case, or different?), is that on x86_64 or i?86?
Do you have an example?  I'm surprised that we'd have post reload a pattern
that extends into multiple hard registers.

Jakub


Re: [Ping]Two pending IVOPT patches

2014-01-12 Thread Bin.Cheng
On Mon, Jan 13, 2014 at 2:29 PM, Jeff Law  wrote:
> On 01/11/14 02:21, Bin.Cheng wrote:
>>
>> On Sat, Jan 11, 2014 at 5:07 PM, Jakub Jelinek  wrote:
>>>
>>> On Sat, Jan 11, 2014 at 05:02:26PM +0800, Bin.Cheng wrote:
>
> I reduced the case and attached ivopt dumps with/without the patch.
> It seems the patch is doing the right thing and choosing better
> candidates, most likely it reveals an existing bug.
> I am looking into this issue, in the meantime, I am wondering should I
> apply the patch and file a PR for it, or apply the patch after root
> causing it?
>>>
>>>
>>> Sounds like PR59743 which should be already fixed.
>>
>> Yes, it is.  I just did the test against trunk exactly before Jeff's fix.
>> Then I will apply the patch to trunk.
>>
>> And an additional question: Are these uses before definition always
>> caused by uninitialized use in GCC?
>
> In general, no.
>
> Consider a loop where an object's use is guarded (say it only gets used on
> the 2nd and later iterations) and the set is unguarded.  ISTM that the
> reaching def would be from a later insn in the loop.
I see, thanks for elaborating.
BTW, I applied the approved patch as revision 206552, and there should
be no case violated anymore.

Thanks,
bin

>
> jeff
>



-- 
Best Regards.


Re: [Ping]Two pending IVOPT patches

2014-01-12 Thread Jeff Law

On 01/11/14 02:21, Bin.Cheng wrote:

On Sat, Jan 11, 2014 at 5:07 PM, Jakub Jelinek  wrote:

On Sat, Jan 11, 2014 at 05:02:26PM +0800, Bin.Cheng wrote:

I reduced the case and attached ivopt dumps with/without the patch.
It seems the patch is doing the right thing and choosing better
candidates, most likely it reveals an existing bug.
I am looking into this issue, in the meantime, I am wondering should I
apply the patch and file a PR for it, or apply the patch after root
causing it?


Sounds like PR59743 which should be already fixed.

Yes, it is.  I just did the test against trunk exactly before Jeff's fix.
Then I will apply the patch to trunk.

And an additional question: Are these uses before definition always
caused by uninitialized use in GCC?

In general, no.

Consider a loop where an object's use is guarded (say it only gets used 
on the 2nd and later iterations) and the set is unguarded.  ISTM that 
the reaching def would be from a later insn in the loop.
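
A minimal C sketch of the situation described above, with a hypothetical function name.  The use of prev is guarded so it only happens from the second iteration on, while the set is unguarded, so the definition reaching the use comes from a later statement in the loop body and no uninitialized value is ever read.

```c
#include <assert.h>
#include <stddef.h>

/* Sum every element of A except the last one by always adding the
   element seen on the previous iteration.  */
static int
toy_sum_previous (const int *a, int n)
{
  int prev;                 /* deliberately has no initializer */
  int sum = 0;

  assert (a != NULL || n <= 0);
  for (int i = 0; i < n; i++)
    {
      if (i > 0)
        sum += prev;        /* guarded use: 2nd and later iterations only */
      prev = a[i];          /* unguarded set, textually after the use */
    }
  return sum;
}
```

A compiler may still report prev as possibly uninitialized here, since proving the guard is sufficient requires reasoning about the loop.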


jeff



Re: [Ping]Two pending IVOPT patches

2014-01-12 Thread Jeff Law

On 01/11/14 02:07, Jakub Jelinek wrote:

On Sat, Jan 11, 2014 at 05:02:26PM +0800, Bin.Cheng wrote:

I reduced the case and attached ivopt dumps with/without the patch.
It seems the patch is doing the right thing and choosing better
candidates, most likely it reveals an existing bug.
I am looking into this issue, in the meantime, I am wondering should I
apply the patch and file a PR for it, or apply the patch after root
causing it?


Sounds like PR59743 which should be already fixed.

Certainly what it looks like to me as well.
jeff



Re: [RFA] [PATCH][PR tree-optimization/59749] Fix recently introduced ree bug

2014-01-12 Thread Jeff Law

On 01/10/14 14:52, Jakub Jelinek wrote:

There is one thing I still worry about, if some target has
an insn to say sign extend or zero extend a short memory load
into HARD_REGNO_NREGS () > 1 register, but the other involved
register has the only one (or fewer) hard registers available to it.
Consider registers SImode hard registers 0, 1, 2, 3:
   (set (reg:SI 3) (something:SI))
   (set (reg:HI 0) (expression:HI))
   (set (reg:SI 2) (sign_extend:SI (reg:HI 0)))
   (set (reg:DI 0) (sign_extend:DI (reg:HI 0)))
   (use (reg:SI 3))
we transform this into:
   (set (reg:SI 3) (something:SI))
   (set (reg:SI 2) (sign_extend:SI (expression:HI)))
   (set (reg:SI 0) (reg:HI 2))
   (set (reg:DI 0) (sign_extend:DI (reg:HI 0)))
   (use (reg:SI 3))
first (well, the middle is then pending in copy list), and next:
   (set (reg:SI 3) (something))
   (set (reg:DI 2) (sign_extend:DI (expression:HI)))
   (set (reg:DI 0) (reg:DI 2))
   (use (reg:SI 3))
but that looks wrong, because the second instruction would now clobber
(reg:SI 3).  Dunno if we have such a target and thus if it is possible
to construct a testcase.
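
The clobber in the last sequence can be reproduced with a toy register file.  This is only a sketch under assumptions: four 32-bit registers, a 64-bit value occupying REGNO and REGNO + 1, and an arbitrary choice of which half goes where.  All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_REGS 4

/* Four 32-bit hard registers, numbered 0..3 as in the example.  */
struct toy_regfile
{
  uint32_t r[TOY_NUM_REGS];
};

/* Write a 32-bit (SImode-like) value: touches one register.  */
static void
toy_set_si (struct toy_regfile *rf, unsigned regno, uint32_t value)
{
  assert (regno < TOY_NUM_REGS);
  rf->r[regno] = value;
}

/* Write a 64-bit (DImode-like) value: touches REGNO and REGNO + 1.
   Putting the low half in REGNO is an arbitrary choice for this model.  */
static void
toy_set_di (struct toy_regfile *rf, unsigned regno, uint64_t value)
{
  assert (regno + 1 < TOY_NUM_REGS);
  rf->r[regno] = (uint32_t) value;
  rf->r[regno + 1] = (uint32_t) (value >> 32);
}
```

Setting register 3 with toy_set_si and then writing a 64-bit value to register 2 with toy_set_di overwrites register 3, so the later (use (reg:SI 3)) would see the wrong value.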
No need to construct a testcase, there's a few that trip the condition 
in the existing testsuite :-)


Basically I just put a check in combine_set_extension to detect when 
widening of the result of the reaching def requires more hard registers 
than it previously needed and ran the testsuite.





So, I'd say the handling of the second extend should notice that
it is actually extending load into a different register and bail out
if it would need more hard registers than it needed previously, or
something similar.

Yes, like in the attached patch?  OK for the trunk?


commit 1313449102ac8d62e36818d8660ef2e897bd59e3
Author: Jeff Law 
Date:   Fri Jan 10 14:31:15 2014 -0700

PR tree-optimization/59747
* ree.c (find_and_remove_re): Properly handle case where a second
eliminated extension requires widening a copy created for elimination
of a prior extension.
(combine_set_extension): Ensure that the number of hard regs needed
for a destination register does not change when we widen it.

PR tree-optimization/59747
* gcc.c-torture/execute/pr59747.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c554609..a82e23c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -5,6 +5,13 @@
occurs before the extension when optimizing extensions with
different source and destination hard registers.
 
+   PR tree-optimization/59747
+   * ree.c (find_and_remove_re): Properly handle case where a second
+   eliminated extension requires widening a copy created for elimination
+   of a prior extension.
+   (combine_set_extension): Ensure that the number of hard regs needed
+   for a destination register does not change when we widen it.
+
 2014-01-10  Jan Hubicka  
 
PR ipa/58585
diff --git a/gcc/ree.c b/gcc/ree.c
index 63cc8cc..3ee97cd 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -297,6 +297,13 @@ combine_set_extension (ext_cand *cand, rtx curr_insn, rtx 
*orig_set)
   else
 new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (*orig_set)));
 
+  /* We're going to be widening the result of DEF_INSN, ensure that doing so
+ doesn't change the number of hard registers needed for the result.  */
+  if (HARD_REGNO_NREGS (REGNO (new_reg), cand->mode)
+  != HARD_REGNO_NREGS (REGNO (SET_SRC (*orig_set)),
+  GET_MODE (SET_DEST (*orig_set))))
+   return false;
+
   /* Merge constants by directly moving the constant into the register under
  some conditions.  Recall that RTL constants are sign-extended.  */
   if (GET_CODE (orig_src) == CONST_INT
@@ -1017,11 +1024,20 @@ find_and_remove_re (void)
   for (unsigned int i = 0; i < reinsn_copy_list.length (); i += 2)
 {
   rtx curr_insn = reinsn_copy_list[i];
+  rtx def_insn = reinsn_copy_list[i + 1];
+
+  /* Use the mode of the destination of the defining insn
+for the mode of the copy.  This is necessary if the
+defining insn was used to eliminate a second extension
+that was wider than the first.  */
+  rtx sub_rtx = *get_sub_rtx (def_insn);
   rtx pat = PATTERN (curr_insn);
-  rtx new_reg = gen_rtx_REG (GET_MODE (SET_DEST (pat)),
+  rtx new_dst = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
 REGNO (XEXP (SET_SRC (pat), 0)));
-  rtx set = gen_rtx_SET (VOIDmode, new_reg, SET_DEST (pat));
-  emit_insn_after (set, reinsn_copy_list[i + 1]);
+  rtx new_src = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
+REGNO (SET_DEST (pat)));
+  rtx set = gen_rtx_SET (VOIDmode, new_dst, new_src);
+  emit_insn_after (set, def_insn);
 }
 
   /* Delete all useless extensions here in one sweep.  */
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index f40d56e..a603952 100644
--- a/gcc/testsuite/ChangeLog
+++ 

Re: [PATCH i386 10/8] [AVX512] Add missing AVX-512ER patterns, intrinsics, tests.

2014-01-12 Thread Kirill Yukhin
Hello,
On 11 Jan 12:42, Uros Bizjak wrote:
> On Fri, Jan 10, 2014 at 5:24 PM, Jakub Jelinek  wrote:
> > This means you should ensure aligned_mem will be set for
> > CODE_FOR_avx512f_movntdqa in ix86_expand_special_args_builtin.
Fixed. Updated patch in the bottom.

> > Leaving the rest of review to Uros/Richard.
> 
> The rest is OK.
Thanks! I'll check it in tomorrow if no more issues!

--
Thanks, K

 gcc/config/i386/avx512erintrin.h   | 62 +++
 gcc/config/i386/avx512fintrin.h|  7 +++
 gcc/config/i386/i386-builtin-types.def |  1 +
 gcc/config/i386/i386.c | 14 +
 gcc/config/i386/sse.md | 71 +++---
 gcc/config/i386/subst.md   |  4 --
 gcc/testsuite/gcc.target/i386/avx-1.c  | 20 --
 gcc/testsuite/gcc.target/i386/avx512er-vexp2pd-1.c | 12 ++--
 gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-1.c | 12 ++--
 .../gcc.target/i386/avx512er-vrcp28pd-1.c  | 12 ++--
 .../gcc.target/i386/avx512er-vrcp28ps-1.c  | 12 ++--
 .../gcc.target/i386/avx512er-vrcp28sd-1.c  | 15 +
 .../gcc.target/i386/avx512er-vrcp28sd-2.c  | 29 +
 .../gcc.target/i386/avx512er-vrcp28ss-1.c  | 15 +
 .../gcc.target/i386/avx512er-vrcp28ss-2.c  | 29 +
 .../gcc.target/i386/avx512er-vrsqrt28pd-1.c| 12 ++--
 .../gcc.target/i386/avx512er-vrsqrt28ps-1.c| 12 ++--
 .../gcc.target/i386/avx512er-vrsqrt28sd-1.c| 15 +
 .../gcc.target/i386/avx512er-vrsqrt28sd-2.c| 29 +
 .../gcc.target/i386/avx512er-vrsqrt28ss-1.c| 15 +
 .../gcc.target/i386/avx512er-vrsqrt28ss-2.c| 29 +
 .../gcc.target/i386/avx512f-vmovntdqa-1.c  | 14 +
 .../gcc.target/i386/avx512f-vmovntdqa-2.c  | 17 ++
 gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c |  6 +-
 gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c | 10 +--
 gcc/testsuite/gcc.target/i386/sse-22.c | 40 ++--
 gcc/testsuite/gcc.target/i386/sse-23.c | 16 +++--
 27 files changed, 430 insertions(+), 100 deletions(-)

diff --git a/gcc/config/i386/avx512erintrin.h b/gcc/config/i386/avx512erintrin.h
index f442f2b..6fe05bc 100644
--- a/gcc/config/i386/avx512erintrin.h
+++ b/gcc/config/i386/avx512erintrin.h
@@ -159,6 +159,24 @@ _mm512_maskz_rcp28_round_ps (__mmask16 __U, __m512 __A, 
int __R)
   (__mmask16) __U, __R);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rcp28_round_sd (__m128d __A, __m128d __B, int __R)
+{
+  return (__m128d) __builtin_ia32_rcp28sd_round ((__v2df) __A,
+(__v2df) __B,
+__R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rcp28_round_ss (__m128 __A, __m128 __B, int __R)
+{
+  return (__m128) __builtin_ia32_rcp28ss_round ((__v4sf) __A,
+   (__v4sf) __B,
+   __R);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_rsqrt28_round_pd (__m512d __A, int __R)
@@ -214,6 +232,25 @@ _mm512_maskz_rsqrt28_round_ps (__mmask16 __U, __m512 __A, 
int __R)
 (__v16sf) _mm512_setzero_ps (),
 (__mmask16) __U, __R);
 }
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rsqrt28_round_sd (__m128d __A, __m128d __B, int __R)
+{
+  return (__m128d) __builtin_ia32_rsqrt28sd_round ((__v2df) __A,
+  (__v2df) __B,
+  __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rsqrt28_round_ss (__m128 __A, __m128 __B, int __R)
+{
+  return (__m128) __builtin_ia32_rsqrt28ss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
+}
+
 #else
 #define _mm512_exp2a23_round_pd(A, C)\
 __builtin_ia32_exp2pd_mask(A, (__v8df)_mm512_setzero_pd(), -1, C)
@@ -268,6 +305,19 @@ _mm512_maskz_rsqrt28_round_ps (__mmask16 __U, __m512 __A, 
int __R)
 
 #define _mm512_maskz_rsqrt28_round_ps(U, A, C)   \
 __builtin_ia32_rsqrt28ps_mask(A, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#define _mm_rcp28_round_sd(A, B, R)\
+__builtin_ia32_rcp28sd_round(A, B, R)
+
+#define _mm_rcp28_round_ss(A, B, R)\
+__builtin_ia32_rcp28ss_round(A, B, R)
+
+#define _mm_rsqrt28_round_sd(A, B, R)  \
+__builtin_ia32_rsqrt28sd_round(A, B, R)
+
+#define _mm_rsqrt28_round_ss(A, B, R)  \
+__builtin_ia32_rsqrt28ss_round(A, B, R)
+

[PATCH,rs6000] Implement -maltivec=be for vec_insert and vec_extract Altivec intrinsics

2014-01-12 Thread Bill Schmidt
This patch provides for interpreting element numbers for the Altivec
vec_insert and vec_extract intrinsics as big-endian (left to right in a
vector register) when targeting a little endian machine and specifying
-maltivec=be.  New test cases are added to test this functionality on
all supported vector types.
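
As a rough illustration of the element numbering, here is a sketch on plain C arrays rather than Altivec vectors.  The index mapping is inferred from the expected values in the little-endian branches of the new tests; the helper names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Map a big-endian (left-to-right) element number to the position used
   on a little-endian target for a vector of NELTS elements.  */
static size_t
toy_be_to_le_index (size_t be_index, size_t nelts)
{
  assert (be_index < nelts);
  return nelts - 1 - be_index;
}

/* Model of vec_insert with big-endian element order on little endian.  */
static void
toy_insert_be_order (int *elts, size_t nelts, size_t be_index, int value)
{
  elts[toy_be_to_le_index (be_index, nelts)] = value;
}

/* Model of vec_extract with big-endian element order on little endian.  */
static int
toy_extract_be_order (const int *elts, size_t nelts, size_t be_index)
{
  return elts[toy_be_to_le_index (be_index, nelts)];
}
```

For a four-element array {0,1,2,3}, inserting 16 at element 2 gives {0,16,2,3}, matching the "vec_insert (ve LE)" expectation in insert-be-order.c.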

Bootstrapped and tested with no new regressions on
powerpc64{,le}-unknown-linux-gnu.  Ok for trunk?

Thanks,
Bill


gcc:

2014-01-12  Bill Schmidt  

* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Implement -maltivec=be for vec_insert and vec_extract.

gcc/testsuite:

2014-01-12  Bill Schmidt  

* gcc.dg/vmx/insert.c: New.
* gcc.dg/vmx/insert-be-order.c: New.
* gcc.dg/vmx/extract.c: New.
* gcc.dg/vmx/extract-be-order.c: New.


Index: gcc/testsuite/gcc.dg/vmx/insert-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/insert-be-order.c  (revision 0)
+++ gcc/testsuite/gcc.dg/vmx/insert-be-order.c  (revision 0)
@@ -0,0 +1,65 @@
+/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */
+
+#include "harness.h"
+
+static void test()
+{
+  vector unsigned char va = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  vector signed char vb = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};
+  vector unsigned short vc = {0,1,2,3,4,5,6,7};
+  vector signed short vd = {-4,-3,-2,-1,0,1,2,3};
+  vector unsigned int ve = {0,1,2,3};
+  vector signed int vf = {-2,-1,0,1};
+  vector float vg = {-2.0f,-1.0f,0.0f,1.0f};
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  check (vec_all_eq (vec_insert (16, va, 5),
+((vector unsigned char)
+  {0,1,2,3,4,5,6,7,8,9,16,11,12,13,14,15})),
+"vec_insert (va LE)");
+  check (vec_all_eq (vec_insert (-16, vb, 0),
+((vector signed char)
+  {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,-16})),
+"vec_insert (vb LE)");
+  check (vec_all_eq (vec_insert (16, vc, 7),
+((vector unsigned short){16,1,2,3,4,5,6,7})),
+"vec_insert (vc LE)");
+  check (vec_all_eq (vec_insert (-16, vd, 3),
+((vector signed short){-4,-3,-2,-1,-16,1,2,3})),
+"vec_insert (vd LE)");
+  check (vec_all_eq (vec_insert (16, ve, 2),
+((vector unsigned int){0,16,2,3})),
+"vec_insert (ve LE)");
+  check (vec_all_eq (vec_insert (-16, vf, 1),
+((vector signed int){-2,-1,-16,1})),
+"vec_insert (vf LE)");
+  check (vec_all_eq (vec_insert (-16.0f, vg, 0),
+((vector float){-2.0f,-1.0f,0.0f,-16.0f})),
+"vec_insert (vg LE)");
+#else
+  check (vec_all_eq (vec_insert (16, va, 5),
+((vector unsigned char)
+  {0,1,2,3,4,16,6,7,8,9,10,11,12,13,14,15})),
+"vec_insert (va BE)");
+  check (vec_all_eq (vec_insert (-16, vb, 0),
+((vector signed char)
+  {-16,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7})),
+"vec_insert (vb BE)");
+  check (vec_all_eq (vec_insert (16, vc, 7),
+((vector unsigned short){0,1,2,3,4,5,6,16})),
+"vec_insert (vc BE)");
+  check (vec_all_eq (vec_insert (-16, vd, 3),
+((vector signed short){-4,-3,-2,-16,0,1,2,3})),
+"vec_insert (vd BE)");
+  check (vec_all_eq (vec_insert (16, ve, 2),
+((vector unsigned int){0,1,16,3})),
+"vec_insert (ve BE)");
+  check (vec_all_eq (vec_insert (-16, vf, 1),
+((vector signed int){-2,-16,0,1})),
+"vec_insert (vf BE)");
+  check (vec_all_eq (vec_insert (-16.0f, vg, 0),
+((vector float){-16.0f,-1.0f,0.0f,1.0f})),
+"vec_insert (vg BE)");
+#endif
+}
+
Index: gcc/testsuite/gcc.dg/vmx/extract-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/extract-be-order.c (revision 0)
+++ gcc/testsuite/gcc.dg/vmx/extract-be-order.c (revision 0)
@@ -0,0 +1,33 @@
+/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */
+
+#include "harness.h"
+
+static void test()
+{
+  vector unsigned char va = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  vector signed char vb = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};
+  vector unsigned short vc = {0,1,2,3,4,5,6,7};
+  vector signed short vd = {-4,-3,-2,-1,0,1,2,3};
+  vector unsigned int ve = {0,1,2,3};
+  vector signed int vf = {-2,-1,0,1};
+  vector float vg = {-2.0f,-1.0f,0.0f,1.0f};
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  check (vec_extract (va, 5) == 10, "vec_extract (va, 5)");
+  check (vec_extract (vb, 0) == 7, "vec_extract (vb, 0)");
+  check (vec_extract (vc, 7) == 0, "vec_extract (vc, 7)");
+  check (vec_extract (vd, 3) == 0, "vec_extract (vd, 3)");
+  check (vec_extract (ve, 2) == 1, "vec_extract (ve, 2)");
+  check (vec_extract (vf, 1) == 0, "vec_extract (vf, 1)");
+  check (vec_extract (vg, 0) == 1.0f, "vec_

RE: Test cases vect-simd-clone-10/12.c keep failing

2014-01-12 Thread Bernd Edlinger
On Sun, 12 Jan 2014 12:01:43, Jakub Jelinek wrote:
>
> On Sun, Jan 12, 2014 at 10:53:58AM +0100, Bernd Edlinger wrote:
>> The test cases gcc.dg/vect/vect-simd-clone-10.c and
>> gcc.dg/vect/vect-simd-clone-12.c keep failing on my i686-pc. I do not really
> >> understand why. The problem seems to be the command line to xgcc has
>> -S and -o and two .c files, probably the test case is not supported at all
>> on this target, does not have AVX, SSE...
>
> It seems that on some configurations (such as very old i?86/x86_64) the
> default is dg-do compile for vect and not dg-do run.
> check_vect_support_and_set_flags has:
> if { [check_effective_target_sse2_runtime] } {
> set dg-do-what-default run
> } else {
> set dg-do-what-default compile
> }
> so if you have really old box that doesn't support SSE2 even, you get this.
>
> I guess explicit /* { dg-do run } */ needs to be added in this case, after
> all the test has all the check_vect stuff in.
>
> Jakub

Yes, explicit /* { dg-do run } */ works.

Bernd.

Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs

2014-01-12 Thread Trevor Saunders
On Sun, Jan 12, 2014 at 02:23:21PM +0100, Richard Biener wrote:
> On Fri, Jan 10, 2014 at 6:37 PM, Richard Henderson  wrote:
> > On 01/09/2014 03:34 PM, Jakub Jelinek wrote:
> >> 2014-01-09  Jakub Jelinek  
> >>
> >>   * target-globals.c (save_target_globals): Allocate < 4KB structs 
> >> using
> >>   GC in payload of target_globals struct instead of allocating them on
> >>   the heap and the larger structs separately using GC.
> >>   * target-globals.h (struct target_globals): Make regs, hard_regs,
> >>   reload, expmed, ira, ira_int and lra_fields GTY((atomic)) instead
> >>   of GTY((skip)) and change type to void *.
> >>   (reset_target_globals): Cast loads from those fields to corresponding
> >>   types.
> >>
> >> --- gcc/target-globals.h.jj   2014-01-09 19:24:20.0 +0100
> >> +++ gcc/target-globals.h  2014-01-09 19:39:43.879348712 +0100
> >> @@ -41,17 +41,17 @@ extern struct target_lower_subreg *this_
> >>
> >>  struct GTY(()) target_globals {
> >>struct target_flag_state *GTY((skip)) flag_state;
> >> -  struct target_regs *GTY((skip)) regs;
> >> +  void *GTY((atomic)) regs;
> >
> > I'm not entirely fond of this either, for the obvious reason.  Clearly a
> > deficiency in gengtype, but after 2 hours of poking around I can see that
> > it isn't a quick fix.
> >
> > I guess I'm ok with the patch, since the use of the target_globals structure
> > is so restricted.
> 
> Yeah.  At some time we need a way to specify a finalization hook called
> if an object is collected and eventually a hook that walks extra roots
> indirectly
> reachable via an object (so you can have GC -> heap -> GC memory layouts
> more easily).

I actually tried to add finalizers a couple weeks ago, but it seems
pretty nontrivial.  ggc seems to basically just allocate by searching
for the first unmarked block. It doesn't even sweep unmarked stuff, it
just marks and then waits for the space to be allocated over.  I believe
it deals with size by using different pages for each size class? So even
if it did sweep it would be somewhat tricky to know what finalizer to
call. Perhaps a solution is to have separate pages for each type that
needs a finalizer, and be able to mark things as being in one of three
states (in use, needs finalization but not in use, finalized and not in
use).  That might hurt memory consumption in the short term, but I think
finalizers will be really useful in getting stuff out of gc memory so
that's probably not too bad.
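
To make the idea concrete, here is a very rough sketch of a per-type page with the three states.  Nothing here is ggc code; the types, names and the single finalizer per page are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS_PER_PAGE 4

/* The three states.  TOY_FINALIZED is 0 so that a zeroed page starts
   out with every slot free for allocation.  */
enum toy_slot_state
{
  TOY_FINALIZED,            /* finalized and not in use */
  TOY_NEEDS_FINALIZATION,   /* not in use, finalizer not yet run */
  TOY_IN_USE
};

struct toy_slot
{
  enum toy_slot_state state;
  int marked;               /* set by the (not shown) mark phase */
  int payload;
};

/* One page holds objects of a single type, so one finalizer suffices.  */
struct toy_page
{
  void (*finalizer) (int *payload);
  struct toy_slot slots[TOY_SLOTS_PER_PAGE];
};

/* Example finalizer: just clears the payload.  */
static void
toy_clear_payload (int *payload)
{
  *payload = 0;
}

/* Sweep one page: unmarked live slots get finalized and become free.
   Returns the number of finalizers run and clears the mark bits.  */
static size_t
toy_sweep_page (struct toy_page *page)
{
  size_t finalized = 0;

  assert (page != NULL && page->finalizer != NULL);
  for (size_t i = 0; i < TOY_SLOTS_PER_PAGE; i++)
    {
      struct toy_slot *slot = &page->slots[i];

      if (slot->state == TOY_IN_USE && !slot->marked)
        slot->state = TOY_NEEDS_FINALIZATION;
      if (slot->state == TOY_NEEDS_FINALIZATION)
        {
          page->finalizer (&slot->payload);
          slot->state = TOY_FINALIZED;
          finalized++;
        }
      slot->marked = 0;
    }
  return finalized;
}
```

The sketch runs finalizers right away during the sweep; the intermediate state would matter more if finalization were deferred until the slot is about to be handed out again.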

Trev

> 
> Richard.
> 
> >
> > r~
> >


[PATCH] Fix a nonfatal build error

2014-01-12 Thread Patrick Palka
This patch fixes the cause of the following build output during a
non-bootstrap build:

make[2]: Entering directory
`/home/patrick/code/gcc-build/x86_64-unknown-linux-gnu/libgcc'
# If this is the top-level multilib, build all the other
# multilibs.
# Early copyback; see "all" above for the rationale.  The
# early copy is necessary so that the gcc -B options find
# the right startup files when linking shared libgcc.
/bin/bash ../../../gcc/libgcc/../mkinstalldirs ../.././gcc
parts="crtbegin.o crtbeginS.o crtbeginT.o crtend.o crtendS.o
vtv_start.o vtv_end.o vtv_start_preinit.o vtv_end_preinit.o
crtprec32.o crtprec64.o crtprec80.o crtfastmath.o";
\
for file in $parts; do  \
  rm -f ../.././gcc/$file;  \
  /usr/bin/install -c -m 644 $file ../.././gcc/;\
  case $file in \
*.a)\
  ranlib ../.././gcc/$file ;;   \
  esac; \
done
/usr/bin/install: cannot stat ‘vtv_start.o’: No such file or directory
/usr/bin/install: cannot stat ‘vtv_end.o’: No such file or directory
/usr/bin/install: cannot stat ‘vtv_start_preinit.o’: No such file or directory
/usr/bin/install: cannot stat ‘vtv_end_preinit.o’: No such file or directory

The vtv_*.o files are only built when vtable verification is enabled
(--enable-vtable-verify) and are otherwise nonexistent.  Therefore
these object files should only be added to $(EXTRA_PARTS)
when vtable verification is enabled.

2014-01-11  Patrick Palka  

* config.host (extra_parts): Don't include vtv_*.o objects unless
vtable verification is enabled.

--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -209,7 +209,10 @@ case ${host} in
   ;;
 *-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-knetbsd*-gnu |
*-*-gnu* | *-*-kopensolaris*-gnu)
   tmake_file="$tmake_file t-crtstuff-pic t-libgcc-pic t-eh-dw2-dip
t-slibgcc t-slibgcc-gld t-slibgcc-elf-ver t-linux"
-  extra_parts="crtbegin.o crtbeginS.o crtbeginT.o crtend.o crtendS.o
vtv_start.o vtv_end.o vtv_start_preinit.o vtv_end_preinit.o"
+  extra_parts="crtbegin.o crtbeginS.o crtbeginT.o crtend.o crtendS.o"
+  if test $enable_vtable_verify = yes; then
+extra_parts="$extra_parts vtv_start.o vtv_end.o
vtv_start_preinit.o vtv_end_preinit.o"
+  fi
   ;;
 *-*-lynxos*)
   tmake_file="$tmake_file t-lynx $cpu_type/t-crtstuff t-crtstuff-pic
t-libgcc-pic"


Re: [committed] Fix predcom (PR tree-optimization/59745)

2014-01-12 Thread Jakub Jelinek
On Sun, Jan 12, 2014 at 02:24:22PM +0100, Richard Biener wrote:
> Uh.  Is this also applicable to branches?

In theory yes, I don't have a testcase that can trigger it though.
I'll apply it to 4.8 soon.

> > 2014-01-10  Jakub Jelinek  
> >
> > PR tree-optimization/59745
> > * tree-predcom.c (tree_predictive_commoning_loop): Call
> > free_affine_expand_cache if giving up because components is NULL.
> >
> > --- gcc/tree-predcom.c.jj   2014-01-07 08:48:34.0 +0100
> > +++ gcc/tree-predcom.c  2014-01-10 10:08:04.476340865 +0100
> > @@ -2447,6 +2447,7 @@ tree_predictive_commoning_loop (struct l
> >if (!components)
> >  {
> >free_data_refs (datarefs);
> > +  free_affine_expand_cache (&name_expansions);
> >return false;
> >  }
> >
> >
> > Jakub

Jakub


[Ada] Fix PR ada/59772

2014-01-12 Thread Eric Botcazou
This is a regression present on all active branches for 8-bit/16-bit targets 
and introduced by the rewrite of build_int_cst which now truncates its output.
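
The effect of the truncation can be sketched in plain C, assuming the default integer type is only 16 bits wide on such a target.  toy_wrap_to_precision is a hypothetical helper that models wrapping a constant to a signed type of a given precision; it is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low PREC bits of VALUE and sign-extend the result, the
   way a constant is wrapped to fit a signed integer type of that
   precision.  PREC is limited to 1..62 to keep the arithmetic free of
   overflow.  */
static int64_t
toy_wrap_to_precision (int64_t value, unsigned prec)
{
  assert (prec >= 1 && prec <= 62);
  uint64_t modulus = UINT64_C (1) << prec;
  uint64_t low = (uint64_t) value & (modulus - 1);

  if (low & (modulus >> 1))
    return (int64_t) low - (int64_t) modulus;
  return (int64_t) low;
}
```

In this model 100000 wrapped to 16 bits becomes -31072, while a 32-bit intermediate type keeps it intact.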

Tested on x86_64-suse-linux, applied on all active branches.


2014-01-12  Eric Botcazou  

PR ada/59772
* gcc-interface/cuintp.c (build_cst_from_int): Use 32-bit integer type
as intermediate type.
(UI_To_gnu): Likewise.


-- 
Eric Botcazou
Index: gcc-interface/cuintp.c
===
--- gcc-interface/cuintp.c	(revision 206563)
+++ gcc-interface/cuintp.c	(working copy)
@@ -6,7 +6,7 @@
  *  *
  *  C Implementation File   *
  *  *
- *  Copyright (C) 1992-2013, Free Software Foundation, Inc. *
+ *  Copyright (C) 1992-2014, Free Software Foundation, Inc. *
  *  *
  * GNAT is free software;  you can  redistribute it  and/or modify it under *
  * terms of the  GNU General Public License as published  by the Free Soft- *
@@ -55,7 +55,7 @@ static tree
 build_cst_from_int (tree type, HOST_WIDE_INT low)
 {
   if (SCALAR_FLOAT_TYPE_P (type))
-return convert (type, build_int_cst (NULL_TREE, low));
+return convert (type, build_int_cst (gnat_type_for_size (32, 0), low));
   else
 return build_int_cst_type (type, low);
 }
@@ -89,19 +89,12 @@ UI_To_gnu (Uint Input, tree type)
   gcc_assert (Length > 0);
 
   /* The computations we perform below always require a type at least as
-	 large as an integer not to overflow.  REAL types are always fine, but
+	 large as an integer not to overflow.  FP types are always fine, but
 	 INTEGER or ENUMERAL types we are handed may be too short.  We use a
 	 base integer type node for the computations in this case and will
-	 convert the final result back to the incoming type later on.
-	 The base integer precision must be superior than 16.  */
-
-  if (TREE_CODE (comp_type) != REAL_TYPE
-	  && TYPE_PRECISION (comp_type)
-	 < TYPE_PRECISION (long_integer_type_node))
-	{
-	  comp_type = long_integer_type_node;
-	  gcc_assert (TYPE_PRECISION (comp_type) > 16);
-	}
+	 convert the final result back to the incoming type later on.  */
+  if (!SCALAR_FLOAT_TYPE_P (comp_type) && TYPE_PRECISION (comp_type) < 32)
+	comp_type = gnat_type_for_size (32, 0);
 
   gnu_base = build_cst_from_int (comp_type, Base);
 

Re: [committed] Fix predcom (PR tree-optimization/59745)

2014-01-12 Thread Richard Biener
On Fri, Jan 10, 2014 at 9:41 PM, Jakub Jelinek  wrote:
> Hi!
>
> split_data_refs_to_components used the name_expansions affine cache
> through determine_offset, and since my patch uses it even more often,
> but if it returns NULL, we don't free the cache and it can contain garbage
> next time we perform tree_predictive_commoning_loop.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, committed
> to trunk as obvious.  No testcase for testsuite, as it is pretty
> random if we ICE or not, e.g. stage1 f951 doesn't ICE, but stage2/3 did.
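
The general shape of the bug, a shared cache that is not released on an early exit, can be sketched as follows.  The names are hypothetical and the cache is reduced to a single heap cell; this is not the tree-predcom.c code.

```c
#include <assert.h>
#include <stdlib.h>

/* Lazily allocated cache shared between calls.  */
static int *toy_cache;

static void
toy_cache_fill (void)
{
  if (!toy_cache)
    {
      toy_cache = malloc (sizeof *toy_cache);
      assert (toy_cache != NULL);
      *toy_cache = 42;
    }
}

static void
toy_cache_free (void)
{
  free (toy_cache);
  toy_cache = NULL;
}

/* Returns 1 if the "loop" was processed, 0 when giving up early.  */
static int
toy_process_loop (int have_components)
{
  toy_cache_fill ();          /* the analysis populates the cache */

  if (!have_components)
    {
      toy_cache_free ();      /* the early exit must release it too */
      return 0;
    }

  /* ... the transformation itself would go here ... */
  toy_cache_free ();
  return 1;
}
```

Without the call on the early-exit path, the next invocation would find stale entries left over from the previous loop.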

Uh.  Is this also applicable to branches?

Thanks for fixing this,
Richard.

> 2014-01-10  Jakub Jelinek  
>
> PR tree-optimization/59745
> * tree-predcom.c (tree_predictive_commoning_loop): Call
> free_affine_expand_cache if giving up because components is NULL.
>
> --- gcc/tree-predcom.c.jj   2014-01-07 08:48:34.0 +0100
> +++ gcc/tree-predcom.c  2014-01-10 10:08:04.476340865 +0100
> @@ -2447,6 +2447,7 @@ tree_predictive_commoning_loop (struct l
>if (!components)
>  {
>free_data_refs (datarefs);
> +  free_affine_expand_cache (&name_expansions);
>return false;
>  }
>
>
> Jakub


Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs

2014-01-12 Thread Richard Biener
On Fri, Jan 10, 2014 at 6:37 PM, Richard Henderson  wrote:
> On 01/09/2014 03:34 PM, Jakub Jelinek wrote:
>> 2014-01-09  Jakub Jelinek  
>>
>>   * target-globals.c (save_target_globals): Allocate < 4KB structs using
>>   GC in payload of target_globals struct instead of allocating them on
>>   the heap and the larger structs separately using GC.
>>   * target-globals.h (struct target_globals): Make regs, hard_regs,
>>   reload, expmed, ira, ira_int and lra_fields GTY((atomic)) instead
>>   of GTY((skip)) and change type to void *.
>>   (reset_target_globals): Cast loads from those fields to corresponding
>>   types.
>>
>> --- gcc/target-globals.h.jj   2014-01-09 19:24:20.0 +0100
>> +++ gcc/target-globals.h  2014-01-09 19:39:43.879348712 +0100
>> @@ -41,17 +41,17 @@ extern struct target_lower_subreg *this_
>>
>>  struct GTY(()) target_globals {
>>struct target_flag_state *GTY((skip)) flag_state;
>> -  struct target_regs *GTY((skip)) regs;
>> +  void *GTY((atomic)) regs;
>
> I'm not entirely fond of this either, for the obvious reason.  Clearly a
> deficiency in gengtype, but after 2 hours of poking around I can see that
> it isn't a quick fix.
>
> I guess I'm ok with the patch, since the use of the target_globals structure
> is so restricted.

Yeah.  At some time we need a way to specify a finalization hook called
if an object is collected and eventually a hook that walks extra roots
indirectly
reachable via an object (so you can have GC -> heap -> GC memory layouts
more easily).

Richard.

>
> r~
>


Re: [PATCH] Final removal of mudflap

2014-01-12 Thread Gerald Pfeifer
Jeff Law  wrote:
>It's been so long since I did anything with our web pages, I'm not
>entirely sure of proper procedures anymore.
>
>Gerald, this look OK?

Basically. ;-)

Per http://gcc.gnu.org/codingconventions.html, should it be "run time"?

And add a  at the end of the item.

If anything else needs tweaking, I'll keep an eye on it.

Gerald


-- 
Gerald Pfeifer 


Re: [PATCH] Fix PR59715

2014-01-12 Thread Richard Biener
On Fri, Jan 10, 2014 at 4:45 PM, Tom de Vries  wrote:
> On 09-01-14 13:33, Richard Biener wrote:
>>
>> On Thu, 9 Jan 2014, Tom de Vries wrote:
>>
>>> On 09-01-14 10:16, Richard Biener wrote:


 This fixes PR59715 by splitting critical edges again before
 code sinking.  The critical edge splitting done before PRE
 was designed to survive until sinking originally, but at least
 since 4.5 PRE now eventually cleans up the CFG and thus undoes
 critical edge splitting.  This results in less than optimal
 code placement (and lost opportunities) for sinking and it
 breaks (at least) the virtual operand updating code which
 assumes that critical edges are still split.

>>>
>>> Richard,
>>>
>>> this follow-up patch:
>>> - notes in pass_pre that PROP_no_crit_edge is destroyed
>>> - notes in pass_sink_code that PROP_no_crit_edge is not required
>>>(because it's now ensured by the pass itself)
>>>
>>> Build and reg-tested pr59715.c on x86_64.
>>>
>>> OK for stage3 trunk if bootstrap and full reg-test on x86_64 is ok?
>>
>>
>> Ok with /* PROP_no_crit_edges | */ not commented but removed.
>>
>
> Richard,
>
> Committed to trunk with that change.
>
> I saw you propagated the fix for PR59715 to 4.8 and 4.7 as well.
>
> Should I propagate this follow-up patch as well?

No, it's purely cosmetic after all.

Richard.

> Thanks,
> - Tom
>
>> Thanks,
>> Richard.
>>
>


Re: [Patch, fortran] PR58007: unresolved fixup hell

2014-01-12 Thread Dominique Dhumieres
> However, I don't quite see the necessity for changing the module
> format (apart from the fact that it makes your patch slightly
> simpler).

I think it should; otherwise reading an old module gives
"Expected left parenthesis".

Cheers,

Dominique


Re: [Patch, Fortran] PR 58026: Bad error recovery for allocatable component of undeclared type

2014-01-12 Thread Janus Weil
2014/1/11 Mikael Morin :
>
>
> On 09/01/2014 16:30, Janus Weil wrote:
>> Hi all,
>>
>> the attached patch started out as an ICE-on-invalid regression fix,
>> but after the ICE had been fixed recently by other means, it was
>> degraded to a mere error-recovery improvement. It removes some rather
>> 'hackish' code that was added by Paul quite a long time ago.
>>
>> Regtests cleanly on x86_64-unknown-linux-gnu. Ok for trunk?
>>
> Could you check whether it works with a regular error?
> i.e. s/gfc_error_now/gfc_error/
> If it doesn't, OK as is.

Good point. In fact it works just as well with a plain gfc_error.
Committed as r206564 with that change. Thanks for the review.

Cheers,
Janus


Re: Test cases vect-simd-clone-10/12.c keep failing

2014-01-12 Thread Jakub Jelinek
On Sun, Jan 12, 2014 at 10:53:58AM +0100, Bernd Edlinger wrote:
> The test cases gcc.dg/vect/vect-simd-clone-10.c and
> gcc.dg/vect/vect-simd-clone-12.c keep failing on my i686-pc. I do not really
> understand why. The problem seems to be the command line to xgcc has
> -S and -o and two .c files, probably the test case is not supported at all
> on this target, does not have AVX, SSE...

It seems that on some configurations (such as very old i?86/x86_64) the
default is dg-do compile for vect and not dg-do run.
check_vect_support_and_set_flags has:
    if { [check_effective_target_sse2_runtime] } {
        set dg-do-what-default run
    } else {
        set dg-do-what-default compile
    }
so if you have a really old box that doesn't even support SSE2, you get this.

I guess explicit /* { dg-do run } */ needs to be added in this case, after
all the test has all the check_vect stuff in.
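
To make the failure mode concrete, here is a toy model in C of the two decisions involved. It is only a sketch under stated assumptions: `default_dg_do` and `driver_accepts` are hypothetical names, not DejaGnu or driver code, and the rule in `driver_accepts` is simply the one quoted in the error message from the log (-o together with -c, -S or -E is rejected when there are multiple input files).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the harness default: "run" when the host can
   execute SSE2 code, "compile" otherwise.  */
const char *
default_dg_do (bool sse2_runtime)
{
  return sse2_runtime ? "run" : "compile";
}

/* Toy model of the driver rule from the log: a "compile" test is built
   with -S and -o, which is rejected once a second source file (the
   additional source of the test) is on the command line.  A "run" test
   links all sources into one executable, so several inputs are fine.  */
bool
driver_accepts (const char *dg_do, int n_sources)
{
  assert (n_sources >= 1);
  bool stops_before_link = strcmp (dg_do, "compile") == 0;
  return !(stops_before_link && n_sources > 1);
}
```

Under this model a box without SSE2 falls back to "compile" and the two-file test is rejected, whereas an explicit dg-do run in the test would bypass the default.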

Jakub


Re: [Patch] Patch set for regex instantiation

2014-01-12 Thread Paolo Carlini
Hi

> On 12 Jan 2014, at 01:48, Tim Shen  wrote:
> 
> Here are 4 patches that finally lead to _Compiler's instantiation and
> some other optimizations for compile time.
> 
> 1) Create class _ScannerBase to make _Scanner pithier. Move const
> static members to src/c++11/regex.cc.
> 2) Make _Compiler and _Scanner `_FwdIter independent`. We store the
> input regex string in basic_regex as a basic_string; but when
> compiling it, const _CharT* is used.
> 3) Avoid using std::map, std::set and std::queue to reduce compile time.
> 4) Instantiate _Compiler> and
> _Compiler>. Export vector and
> vector's ctor and dtor as well for _Compiler's dependency.

Thanks, but as we already tried to explain, instantiating, thus adding many 
exported symbols, is post 4.9 material, can't be committed until we branch. 
Please make sure to have in a separate patch or multiple patches the 
correctness fixes and maybe anything unrelated to instantiation which you 
consider stable and independently useful.

> Booted, and tested with -m64 and -m32; But check-debug failed some
> 23_containers/* cases? I suppose it's not my problem?

Please make sure Francois knows about that...

Paolo

Test cases vect-simd-clone-10/12.c keep failing

2014-01-12 Thread Bernd Edlinger
Hi Jakub,

The test cases gcc.dg/vect/vect-simd-clone-10.c and
gcc.dg/vect/vect-simd-clone-12.c keep failing on my i686-pc. I do not really
understand why. The problem seems to be the command line to xgcc has
-S and -o and two .c files, probably the test case is not supported at all
on this target, does not have AVX, SSE...

Any ideas?

Regards
Bernd.



Executing on host: /home/ed/gnu/gcc-build/gcc/xgcc 
-B/home/ed/gnu/gcc-build/gcc/ 
/home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c  
-fno-diagnostics-show-caret -fdiagnostics-color=never  -flto -ffat-lto-objects 
-msse2 -ftree-vectorize -fno-vect-cost-model -fno-common -O2 
-fdump-tree-vect-details -fopenmp-simd  
/home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c -S  
-o vect-simd-clone-10.s    (timeout = 300)
spawn /home/ed/gnu/gcc-build/gcc/xgcc -B/home/ed/gnu/gcc-build/gcc/ 
/home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c 
-fno-diagnostics-show-caret -fdiagnostics-color=never -flto -ffat-lto-objects 
-msse2 -ftree-vectorize -fno-vect-cost-model -fno-common -O2 
-fdump-tree-vect-details -fopenmp-simd 
/home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c -S 
-o vect-simd-clone-10.s
xgcc: fatal error: cannot specify -o with -c, -S or -E with multiple files
compilation terminated.
compiler exited with status 1
output is:
xgcc: fatal error: cannot specify -o with -c, -S or -E with multiple files
compilation terminated.

FAIL: gcc.dg/vect/vect-simd-clone-10.c -flto -ffat-lto-objects (test for excess 
errors)
Excess errors:
xgcc: fatal error: cannot specify -o with -c, -S or -E with multiple files
compilation terminated.

Executing on host: /home/ed/gnu/gcc-build/gcc/xgcc 
-B/home/ed/gnu/gcc-build/gcc/ 
/home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c  
-fno-diagnostics-show-caret -fdiagnostics-color=never  -flto -ffat-lto-objects 
-msse2 -ftree-vectorize -fno-vect-cost-model -fno-common -O2 
-fdump-tree-vect-details -S  -o vect-simd-clone-10a.s    (timeout = 300)
spawn /home/ed/gnu/gcc-build/gcc/xgcc -B/home/ed/gnu/gcc-build/gcc/ 
/home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c 
-fno-diagnostics-show-caret -fdiagnostics-color=never -flto -ffat-lto-objects 
-msse2 -ftree-vectorize -fno-vect-cost-model -fno-common -O2 
-fdump-tree-vect-details -S -o vect-simd-clone-10a.s
PASS: gcc.dg/vect/vect-simd-clone-10a.c -flto -ffat-lto-objects (test for 
excess errors)