Re: [RFA] [PATCH][PR tree-optimization/59749] Fix recently introduced ree bug
On Sun, Jan 12, 2014 at 11:21:59PM -0700, Jeff Law wrote:
> --- a/gcc/ree.c
> +++ b/gcc/ree.c
> @@ -297,6 +297,13 @@ combine_set_extension (ext_cand *cand, rtx curr_insn,
> rtx *orig_set)
>    else
>      new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (*orig_set)));
>
> +  /* We're going to be widening the result of DEF_INSN, ensure that doing so
> +     doesn't change the number of hard registers needed for the result.  */
> +  if (HARD_REGNO_NREGS (REGNO (new_reg), cand->mode)
> +      != HARD_REGNO_NREGS (REGNO (SET_SRC (*orig_set)),

Note you can use orig_src instead of SET_SRC (*orig_set) here.

> +                           GET_MODE (SET_DEST (*orig_set))))
> +    return false;
> +
>    /* Merge constants by directly moving the constant into the register under
>       some conditions.  Recall that RTL constants are sign-extended.  */
>    if (GET_CODE (orig_src) == CONST_INT

Are you sure the above is needed even for the
REGNO (new_reg) == REGNO (SET_DEST (*orig_set))
&& REGNO (new_reg) == REGNO (orig_src) case?
I mean in that case no copy insn is going to be scheduled right now, nor
has been previously scheduled, so we are back to what the code did before
r206418.  I can imagine it can be a problem, but doesn't have to be.

(set (reg:SI 3) (something:SI))
(set (reg:SI 2) (expression:SI))        // def_insn
(use (reg:SI 3))
(set (reg:DI 3) (sign_extend:DI (reg:SI 2)))

So, perhaps if we wanted to handle the HARD_REGNO_NREGS != HARD_REGNO_NREGS
case when all 3 REGNOs are the same, we'd need to limit it to the case where
cand->insn and curr_insn are in the same bb, DF_INSN_LUID of curr_insn
is smaller than DF_INSN_LUID of cand->insn and the extra hard regs
aren't used between the two.  Perhaps not worth it?

BTW, I'm surprised to hear that it triggers in the testsuite already (for
the 3 REGNOs the same case, or different?), is that on x86_64 or i?86?
Do you have an example?  I'm surprised that we'd have post reload
a pattern that extends into multiple hard registers.

        Jakub
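For readers following the HARD_REGNO_NREGS discussion, here is a minimal toy model of the check being proposed. It is only a sketch under stated assumptions: toy_nregs and toy_widening_is_safe are hypothetical helpers, not GCC code, and the model assumes every hard register has the same width, whereas the real HARD_REGNO_NREGS is target-defined and takes a register number and a machine mode.

```c
#include <stdbool.h>

/* Toy stand-in for HARD_REGNO_NREGS (hypothetical, not the GCC macro):
   the number of consecutive hard registers needed to hold a value of
   MODE_BYTES bytes when every hard register holds REG_BYTES bytes.  */
unsigned
toy_nregs (unsigned mode_bytes, unsigned reg_bytes)
{
  return (mode_bytes + reg_bytes - 1) / reg_bytes;
}

/* Return true if widening a value from OLD_BYTES to NEW_BYTES leaves the
   number of hard registers unchanged, so the wider value cannot spill
   into (and clobber) the next hard register.  */
bool
toy_widening_is_safe (unsigned old_bytes, unsigned new_bytes,
                      unsigned reg_bytes)
{
  return toy_nregs (new_bytes, reg_bytes) == toy_nregs (old_bytes, reg_bytes);
}
```

In this model, widening a 4-byte value to 8 bytes is rejected when registers are 4 bytes wide (one register becomes two) and accepted when they are 8 bytes wide, which is the situation the quoted hunk bails out on.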
Re: [Ping]Two pending IVOPT patches
On Mon, Jan 13, 2014 at 2:29 PM, Jeff Law wrote:
> On 01/11/14 02:21, Bin.Cheng wrote:
>>
>> On Sat, Jan 11, 2014 at 5:07 PM, Jakub Jelinek wrote:
>>>
>>> On Sat, Jan 11, 2014 at 05:02:26PM +0800, Bin.Cheng wrote:
>>>>
>>>> I reduced the case and attached ivopt dumps with/without the patch.
>>>> It seems the patch is doing right thing and choosing better
>>>> candidates, most likely it reveals an existing bug.
>>>> I am looking into this issue, in the meantime, I am wondering should I
>>>> apply the patch and file a PR for it, or apply the patch after root
>>>> causing it?
>>>
>>>
>>> Sounds like PR59743 which should be already fixed.
>>
>> Yes, it is.  I just did the test against trunk exactly before Jeff's fix.
>> Then I will apply the patch to trunk.
>>
>> And an additional question: Are these uses before definition always
>> caused by uninitialized use in GCC?
>
> In general, no.
>
> Consider a loop where an object's use is guarded (say it only gets used on
> the 2nd and later iterations) and the set is unguarded.  ISTM that the
> reaching def would be from a later insn in the loop.

I see, thanks for elaborating.
BTW, I applied the approved patch as revision 206552, and there should
be no case violated anymore.

Thanks,
bin
>
> jeff
>

-- 
Best Regards.
Re: [Ping]Two pending IVOPT patches
On 01/11/14 02:21, Bin.Cheng wrote:
> On Sat, Jan 11, 2014 at 5:07 PM, Jakub Jelinek wrote:
>> On Sat, Jan 11, 2014 at 05:02:26PM +0800, Bin.Cheng wrote:
>>> I reduced the case and attached ivopt dumps with/without the patch.
>>> It seems the patch is doing right thing and choosing better
>>> candidates, most likely it reveals an existing bug.
>>> I am looking into this issue, in the meantime, I am wondering should I
>>> apply the patch and file a PR for it, or apply the patch after root
>>> causing it?
>>
>> Sounds like PR59743 which should be already fixed.
>
> Yes, it is.  I just did the test against trunk exactly before Jeff's fix.
> Then I will apply the patch to trunk.
>
> And an additional question: Are these uses before definition always
> caused by uninitialized use in GCC?

In general, no.

Consider a loop where an object's use is guarded (say it only gets used
on the 2nd and later iterations) and the set is unguarded.  ISTM that
the reaching def would be from a later insn in the loop.

jeff
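The loop shape described above can be sketched in plain C. This is only an illustration; the function name sum_of_previous is hypothetical and the code is not taken from the IVOPT testcase. The use of prev appears textually before its definition, yet it is never reached uninitialized, because the only definition that reaches it comes from the previous iteration.

```c
/* Sum every element of A except the last one, by adding on each
   iteration the element that was stored on the previous iteration.  */
int
sum_of_previous (const int *a, int n)
{
  int prev;              /* deliberately not set before the loop */
  int sum = 0;

  for (int i = 0; i < n; i++)
    {
      if (i > 0)
        sum += prev;     /* guarded use: skipped on the first iteration */
      prev = a[i];       /* unguarded set: reaches the use via the back edge */
    }

  return sum;
}
```

Here the reaching definition of the guarded use is the assignment that follows it in the loop body, so a use-before-definition in program order does not by itself imply an uninitialized use.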
Re: [Ping]Two pending IVOPT patches
On 01/11/14 02:07, Jakub Jelinek wrote:
> On Sat, Jan 11, 2014 at 05:02:26PM +0800, Bin.Cheng wrote:
>> I reduced the case and attached ivopt dumps with/without the patch.
>> It seems the patch is doing right thing and choosing better
>> candidates, most likely it reveals an existing bug.
>> I am looking into this issue, in the meantime, I am wondering should I
>> apply the patch and file a PR for it, or apply the patch after root
>> causing it?
>
> Sounds like PR59743 which should be already fixed.

Certainly what it looks like to me as well.

jeff
Re: [RFA] [PATCH][PR tree-optimization/59749] Fix recently introduced ree bug
On 01/10/14 14:52, Jakub Jelinek wrote:
> There is one thing I still worry about, if some target has
> an insn to say sign extend or zero extend a short memory load
> into HARD_REGNO_NREGS () > 1 register, but the other involved
> register has the only one (or fewer) hard registers available to it.
> Consider registers SImode hard registers 0, 1, 2, 3:
>    (set (reg:SI 3) (something:SI))
>    (set (reg:HI 0) (expression:HI))
>    (set (reg:SI 2) (sign_extend:SI (reg:HI 0)))
>    (set (reg:DI 0) (sign_extend:DI (reg:HI 0)))
>    (use (reg:SI 3))
> we transform this into:
>    (set (reg:SI 3) (something:SI))
>    (set (reg:SI 2) (sign_extend:SI (expression:HI)))
>    (set (reg:SI 0) (reg:HI 2))
>    (set (reg:DI 0) (sign_extend:DI (reg:HI 0)))
>    (use (reg:SI 3))
> first (well, the middle is then pending in copy list), and next:
>    (set (reg:SI 3) (something))
>    (set (reg:DI 2) (sign_extend:DI (expression:HI)))
>    (set (reg:DI 0) (reg:DI 2))
>    (use (reg:SI 3))
> but that looks wrong, because the second instruction would now clobber
> (reg:SI 3).  Dunno if we have such an target and thus if it is possible
> to construct a testcase.

No need to construct a testcase, there's a few that trip the condition
in the existing testsuite :-)  Basically I just put a check in
combine_set_extension to detect when widening of the result of the
reaching def requires more hard registers than it previously needed and
ran the testsuite.

> So, I'd say the handling of the second extend should notice that
> it is actually extending load into a different register and bail out
> if it would need more hard registers than it needed previously, or
> something similar.

Yes, like in the attached patch?

OK for the trunk?

commit 1313449102ac8d62e36818d8660ef2e897bd59e3
Author: Jeff Law
Date:   Fri Jan 10 14:31:15 2014 -0700

        PR tree-optimization/59747
        * ree.c (find_and_remove_re): Properly handle case where a second
        eliminated extension requires widening a copy created for elimination
        of a prior extension.
        (combine_set_extension): Ensure that the number of hard regs needed
        for a destination register does not change when we widen it.

        PR tree-optimization/59747
        * gcc.c-torture/execute/pr59747.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c554609..a82e23c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -5,6 +5,13 @@
        occurs before the extension when optimizing extensions with
        different source and destination hard registers.
 
+       PR tree-optimization/59747
+       * ree.c (find_and_remove_re): Properly handle case where a second
+       eliminated extension requires widening a copy created for elimination
+       of a prior extension.
+       (combine_set_extension): Ensure that the number of hard regs needed
+       for a destination register does not change when we widen it.
+
 2014-01-10  Jan Hubicka
 
        PR ipa/58585
diff --git a/gcc/ree.c b/gcc/ree.c
index 63cc8cc..3ee97cd 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -297,6 +297,13 @@ combine_set_extension (ext_cand *cand, rtx curr_insn, rtx *orig_set)
   else
     new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (*orig_set)));
 
+  /* We're going to be widening the result of DEF_INSN, ensure that doing so
+     doesn't change the number of hard registers needed for the result.  */
+  if (HARD_REGNO_NREGS (REGNO (new_reg), cand->mode)
+      != HARD_REGNO_NREGS (REGNO (SET_SRC (*orig_set)),
+                           GET_MODE (SET_DEST (*orig_set))))
+    return false;
+
   /* Merge constants by directly moving the constant into the register under
      some conditions.  Recall that RTL constants are sign-extended.  */
   if (GET_CODE (orig_src) == CONST_INT
@@ -1017,11 +1024,20 @@ find_and_remove_re (void)
   for (unsigned int i = 0; i < reinsn_copy_list.length (); i += 2)
     {
       rtx curr_insn = reinsn_copy_list[i];
+      rtx def_insn = reinsn_copy_list[i + 1];
+
+      /* Use the mode of the destination of the defining insn
+         for the mode of the copy.  This is necessary if the
+         defining insn was used to eliminate a second extension
+         that was wider than the first.  */
+      rtx sub_rtx = *get_sub_rtx (def_insn);
       rtx pat = PATTERN (curr_insn);
-      rtx new_reg = gen_rtx_REG (GET_MODE (SET_DEST (pat)),
+      rtx new_dst = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
                                  REGNO (XEXP (SET_SRC (pat), 0)));
-      rtx set = gen_rtx_SET (VOIDmode, new_reg, SET_DEST (pat));
-      emit_insn_after (set, reinsn_copy_list[i + 1]);
+      rtx new_src = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
+                                 REGNO (SET_DEST (pat)));
+      rtx set = gen_rtx_SET (VOIDmode, new_dst, new_src);
+      emit_insn_after (set, def_insn);
     }
 
   /* Delete all useless extensions here in one sweep.  */
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index f40d56e..a603952 100644
--- a/gcc/testsuite/ChangeLog
+++
Re: [PATCH i386 10/8] [AVX512] Add missing AVX-512ER patterns, intrinsics, tests.
Hello, On 11 Jan 12:42, Uros Bizjak wrote: > On Fri, Jan 10, 2014 at 5:24 PM, Jakub Jelinek wrote: > > This means you should ensure aligned_mem will be set for > > CODE_FOR_avx512f_movntdqa in ix86_expand_special_args_builtin. Fixed. Updated patch in the bottom. > > Leaving the rest of review to Uros/Richard. > > The rest is OK. Thanks! I'll check it in tomorrow if no more issues! -- Thanks, K gcc/config/i386/avx512erintrin.h | 62 +++ gcc/config/i386/avx512fintrin.h| 7 +++ gcc/config/i386/i386-builtin-types.def | 1 + gcc/config/i386/i386.c | 14 + gcc/config/i386/sse.md | 71 +++--- gcc/config/i386/subst.md | 4 -- gcc/testsuite/gcc.target/i386/avx-1.c | 20 -- gcc/testsuite/gcc.target/i386/avx512er-vexp2pd-1.c | 12 ++-- gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-1.c | 12 ++-- .../gcc.target/i386/avx512er-vrcp28pd-1.c | 12 ++-- .../gcc.target/i386/avx512er-vrcp28ps-1.c | 12 ++-- .../gcc.target/i386/avx512er-vrcp28sd-1.c | 15 + .../gcc.target/i386/avx512er-vrcp28sd-2.c | 29 + .../gcc.target/i386/avx512er-vrcp28ss-1.c | 15 + .../gcc.target/i386/avx512er-vrcp28ss-2.c | 29 + .../gcc.target/i386/avx512er-vrsqrt28pd-1.c| 12 ++-- .../gcc.target/i386/avx512er-vrsqrt28ps-1.c| 12 ++-- .../gcc.target/i386/avx512er-vrsqrt28sd-1.c| 15 + .../gcc.target/i386/avx512er-vrsqrt28sd-2.c| 29 + .../gcc.target/i386/avx512er-vrsqrt28ss-1.c| 15 + .../gcc.target/i386/avx512er-vrsqrt28ss-2.c| 29 + .../gcc.target/i386/avx512f-vmovntdqa-1.c | 14 + .../gcc.target/i386/avx512f-vmovntdqa-2.c | 17 ++ gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c | 6 +- gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c | 10 +-- gcc/testsuite/gcc.target/i386/sse-22.c | 40 ++-- gcc/testsuite/gcc.target/i386/sse-23.c | 16 +++-- 27 files changed, 430 insertions(+), 100 deletions(-) diff --git a/gcc/config/i386/avx512erintrin.h b/gcc/config/i386/avx512erintrin.h index f442f2b..6fe05bc 100644 --- a/gcc/config/i386/avx512erintrin.h +++ b/gcc/config/i386/avx512erintrin.h @@ -159,6 +159,24 @@ 
_mm512_maskz_rcp28_round_ps (__mmask16 __U, __m512 __A, int __R) (__mmask16) __U, __R); } +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp28_round_sd (__m128d __A, __m128d __B, int __R) +{ + return (__m128d) __builtin_ia32_rcp28sd_round ((__v2df) __A, +(__v2df) __B, +__R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp28_round_ss (__m128 __A, __m128 __B, int __R) +{ + return (__m128) __builtin_ia32_rcp28ss_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} + extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_rsqrt28_round_pd (__m512d __A, int __R) @@ -214,6 +232,25 @@ _mm512_maskz_rsqrt28_round_ps (__mmask16 __U, __m512 __A, int __R) (__v16sf) _mm512_setzero_ps (), (__mmask16) __U, __R); } + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt28_round_sd (__m128d __A, __m128d __B, int __R) +{ + return (__m128d) __builtin_ia32_rsqrt28sd_round ((__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt28_round_ss (__m128 __A, __m128 __B, int __R) +{ + return (__m128) __builtin_ia32_rsqrt28ss_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} + #else #define _mm512_exp2a23_round_pd(A, C)\ __builtin_ia32_exp2pd_mask(A, (__v8df)_mm512_setzero_pd(), -1, C) @@ -268,6 +305,19 @@ _mm512_maskz_rsqrt28_round_ps (__mmask16 __U, __m512 __A, int __R) #define _mm512_maskz_rsqrt28_round_ps(U, A, C) \ __builtin_ia32_rsqrt28ps_mask(A, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm_rcp28_round_sd(A, B, R)\ +__builtin_ia32_rcp28sd_round(A, B, R) + +#define _mm_rcp28_round_ss(A, B, R)\ +__builtin_ia32_rcp28ss_round(A, B, R) + +#define _mm_rsqrt28_round_sd(A, B, R) \ +__builtin_ia32_rsqrt28sd_round(A, B, R) + +#define _mm_rsqrt28_round_ss(A, B, R) \ +__builtin_ia32_rsqrt28ss_round(A, B, R) +
[PATCH,rs6000] Implement -maltivec=be for vec_insert and vec_extract Altivec intrinsics
This patch provides for interpreting element numbers for the Altivec vec_insert and vec_extract intrinsics as big-endian (left to right in a vector register) when targeting a little endian machine and specifying -maltivec=be. New test cases are added to test this functionality on all supported vector types. Bootstrapped and tested with no new regressions on powerpc64{,le}-unknown-linux-gnu. Ok for trunk? Thanks, Bill gcc: 2014-01-12 Bill Schmidt * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Implement -maltivec=be for vec_insert and vec_extract. gcc/testsuite: 2014-01-12 Bill Schmidt * gcc.dg/vmx/insert.c: New. * gcc.dg/vmx/insert-be-order.c: New. * gcc.dg/vmx/extract.c: New. * gcc.dg/vmx/extract-be-order.c: New. Index: gcc/testsuite/gcc.dg/vmx/insert-be-order.c === --- gcc/testsuite/gcc.dg/vmx/insert-be-order.c (revision 0) +++ gcc/testsuite/gcc.dg/vmx/insert-be-order.c (revision 0) @@ -0,0 +1,65 @@ +/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */ + +#include "harness.h" + +static void test() +{ + vector unsigned char va = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; + vector signed char vb = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7}; + vector unsigned short vc = {0,1,2,3,4,5,6,7}; + vector signed short vd = {-4,-3,-2,-1,0,1,2,3}; + vector unsigned int ve = {0,1,2,3}; + vector signed int vf = {-2,-1,0,1}; + vector float vg = {-2.0f,-1.0f,0.0f,1.0f}; + +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + check (vec_all_eq (vec_insert (16, va, 5), +((vector unsigned char) + {0,1,2,3,4,5,6,7,8,9,16,11,12,13,14,15})), +"vec_insert (va LE)"); + check (vec_all_eq (vec_insert (-16, vb, 0), +((vector signed char) + {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,-16})), +"vec_insert (vb LE)"); + check (vec_all_eq (vec_insert (16, vc, 7), +((vector unsigned short){16,1,2,3,4,5,6,7})), +"vec_insert (vc LE)"); + check (vec_all_eq (vec_insert (-16, vd, 3), +((vector signed short){-4,-3,-2,-1,-16,1,2,3})), +"vec_insert (vd LE)"); + check (vec_all_eq 
(vec_insert (16, ve, 2), +((vector unsigned int){0,16,2,3})), +"vec_insert (ve LE)"); + check (vec_all_eq (vec_insert (-16, vf, 1), +((vector signed int){-2,-1,-16,1})), +"vec_insert (vf LE)"); + check (vec_all_eq (vec_insert (-16.0f, vg, 0), +((vector float){-2.0f,-1.0f,0.0f,-16.0f})), +"vec_insert (vg LE)"); +#else + check (vec_all_eq (vec_insert (16, va, 5), +((vector unsigned char) + {0,1,2,3,4,16,6,7,8,9,10,11,12,13,14,15})), +"vec_insert (va BE)"); + check (vec_all_eq (vec_insert (-16, vb, 0), +((vector signed char) + {-16,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7})), +"vec_insert (vb BE)"); + check (vec_all_eq (vec_insert (16, vc, 7), +((vector unsigned short){0,1,2,3,4,5,6,16})), +"vec_insert (vc BE)"); + check (vec_all_eq (vec_insert (-16, vd, 3), +((vector signed short){-4,-3,-2,-16,0,1,2,3})), +"vec_insert (vd BE)"); + check (vec_all_eq (vec_insert (16, ve, 2), +((vector unsigned int){0,1,16,3})), +"vec_insert (ve BE)"); + check (vec_all_eq (vec_insert (-16, vf, 1), +((vector signed int){-2,-16,0,1})), +"vec_insert (vf BE)"); + check (vec_all_eq (vec_insert (-16.0f, vg, 0), +((vector float){-16.0f,-1.0f,0.0f,1.0f})), +"vec_insert (vg BE)"); +#endif +} + Index: gcc/testsuite/gcc.dg/vmx/extract-be-order.c === --- gcc/testsuite/gcc.dg/vmx/extract-be-order.c (revision 0) +++ gcc/testsuite/gcc.dg/vmx/extract-be-order.c (revision 0) @@ -0,0 +1,33 @@ +/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */ + +#include "harness.h" + +static void test() +{ + vector unsigned char va = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; + vector signed char vb = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7}; + vector unsigned short vc = {0,1,2,3,4,5,6,7}; + vector signed short vd = {-4,-3,-2,-1,0,1,2,3}; + vector unsigned int ve = {0,1,2,3}; + vector signed int vf = {-2,-1,0,1}; + vector float vg = {-2.0f,-1.0f,0.0f,1.0f}; + +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + check (vec_extract (va, 5) == 10, "vec_extract (va, 5)"); + check (vec_extract (vb, 0) == 7, 
"vec_extract (vb, 0)"); + check (vec_extract (vc, 7) == 0, "vec_extract (vc, 7)"); + check (vec_extract (vd, 3) == 0, "vec_extract (vd, 3)"); + check (vec_extract (ve, 2) == 1, "vec_extract (ve, 2)"); + check (vec_extract (vf, 1) == 0, "vec_extract (vf, 1)"); + check (vec_extract (vg, 0) == 1.0f, "vec_
RE: Test cases vect-simd-clone-10/12.c keep failing
On Sun, 12 Jan 2014 12:01:43, Jakub Jelinek wrote:
>
> On Sun, Jan 12, 2014 at 10:53:58AM +0100, Bernd Edlinger wrote:
>> The test cases gcc.dg/vect/vect-simd-clone-10.c and
>> gcc.dg/vect/vect-simd-clone-12.c keep failing on my i686-pc. I do not really
>> understand why. The problem seem to be the command line to xgcc has
>> -S and -o and two .c files, probably the test case is not supported at all
>> on this target, does not have AVX, SSE...
>
> It seems that on some configurations (such as very old i?86/x86_64) the
> default is dg-do compile for vect and not dg-do run.
> check_vect_support_and_set_flags has:
>     if { [check_effective_target_sse2_runtime] } {
>         set dg-do-what-default run
>     } else {
>         set dg-do-what-default compile
>     }
> so if you have really old box that doesn't support SSE2 even, you get this.
>
> I guess explicit /* { dg-do run } */ needs to be added in this case, after
> all the test has all the check_vect stuff in.
>
> Jakub

Yes, explicit /* { dg-do run } */ works.

Bernd.
Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs
On Sun, Jan 12, 2014 at 02:23:21PM +0100, Richard Biener wrote:
> On Fri, Jan 10, 2014 at 6:37 PM, Richard Henderson wrote:
> > On 01/09/2014 03:34 PM, Jakub Jelinek wrote:
> >> 2014-01-09  Jakub Jelinek
> >>
> >>       * target-globals.c (save_target_globals): Allocate < 4KB structs using
> >>       GC in payload of target_globals struct instead of allocating them on
> >>       the heap and the larger structs separately using GC.
> >>       * target-globals.h (struct target_globals): Make regs, hard_regs,
> >>       reload, expmed, ira, ira_int and lra_fields GTY((atomic)) instead
> >>       of GTY((skip)) and change type to void *.
> >>       (reset_target_globals): Cast loads from those fields to corresponding
> >>       types.
> >>
> >> --- gcc/target-globals.h.jj 2014-01-09 19:24:20.0 +0100
> >> +++ gcc/target-globals.h 2014-01-09 19:39:43.879348712 +0100
> >> @@ -41,17 +41,17 @@ extern struct target_lower_subreg *this_
> >>
> >>  struct GTY(()) target_globals {
> >>    struct target_flag_state *GTY((skip)) flag_state;
> >> -  struct target_regs *GTY((skip)) regs;
> >> +  void *GTY((atomic)) regs;
> >
> > I'm not entirely fond of this either, for the obvious reason.  Clearly a
> > deficiency in gengtype, but after 2 hours of poking around I can see that
> > it isn't a quick fix.
> >
> > I guess I'm ok with the patch, since the use of the target_globals structure
> > is so restricted.
>
> Yeah.  At some time we need a way to specify a finalization hook called
> if an object is collected and eventually a hook that walks extra roots
> indirectly
> reachable via an object (so you can have GC -> heap -> GC memory layouts
> more easily).

I actually tried to add finalizers a couple weeks ago, but it seems
pretty non trivial.  ggc seems to basically just allocate by searching
for the first unmarked block.  It doesn't even sweep unmarked stuff, it
just marks and then waits for the space to be allocated over.  I believe
it deals with size by using different pages for each size class?  So
even if it did sweep it would be somewhat tricky to know what finalizer
to call.

Perhaps a solution is to have separate pages for each type that needs a
finalizer, and be able to mark things as being in one of three states
(in use, needs finalization but not in use, finalized and not in use).
That might hurt memory consumption in the short term, but I think
finalizers will be really useful in getting stuff out of gc memory so
that's probably not too bad.

Trev

>
> Richard.
>
> >
> > r~
> >
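The three-state idea above can be sketched as a toy model. Everything here is hypothetical (toy_page, toy_state, the fixed slot count, one finalizer per page); it is not how ggc is implemented, only one possible reading of the proposal: a page holds objects of a single type, so the finalizer to call is known from the page alone.

```c
#include <stdbool.h>

#define TOY_PAGE_SLOTS 4

/* The three states proposed above for each slot of a page.  */
enum toy_state
{
  TOY_IN_USE,
  TOY_NEEDS_FINALIZATION,   /* not in use, finalizer not yet run */
  TOY_FINALIZED             /* not in use, free for reallocation */
};

typedef void (*toy_finalizer) (void *obj);

/* A page dedicated to one type, hence a single finalizer per page.  */
struct toy_page
{
  toy_finalizer finalize;
  enum toy_state state[TOY_PAGE_SLOTS];
  bool marked[TOY_PAGE_SLOTS];
  int payload[TOY_PAGE_SLOTS];
};

/* Example finalizer used for illustration: clear the object.  */
void
toy_zero_payload (void *obj)
{
  *(int *) obj = 0;
}

/* After a mark phase, demote every in-use slot that was not marked
   and reset the mark bits for the next collection.  */
void
toy_page_after_mark (struct toy_page *page)
{
  for (int i = 0; i < TOY_PAGE_SLOTS; i++)
    {
      if (page->state[i] == TOY_IN_USE && !page->marked[i])
        page->state[i] = TOY_NEEDS_FINALIZATION;
      page->marked[i] = false;
    }
}

/* Run the pending finalizers and return how many were called.  */
unsigned
toy_page_finalize (struct toy_page *page)
{
  unsigned count = 0;

  for (int i = 0; i < TOY_PAGE_SLOTS; i++)
    if (page->state[i] == TOY_NEEDS_FINALIZATION)
      {
        page->finalize (&page->payload[i]);
        page->state[i] = TOY_FINALIZED;
        count++;
      }

  return count;
}
```

Keeping the intermediate state separate from the free state means the allocator can skip slots whose finalizer has not run yet, at the cost of holding that memory a little longer, which is the short-term memory trade-off mentioned above.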
[PATCH] Fix a nonfatal build error
This patch fixes the cause of the following build output during a
non-bootstrap build:

make[2]: Entering directory `/home/patrick/code/gcc-build/x86_64-unknown-linux-gnu/libgcc'
# If this is the top-level multilib, build all the other
# multilibs.
# Early copyback; see "all" above for the rationale.  The
# early copy is necessary so that the gcc -B options find
# the right startup files when linking shared libgcc.
/bin/bash ../../../gcc/libgcc/../mkinstalldirs ../.././gcc
parts="crtbegin.o crtbeginS.o crtbeginT.o crtend.o crtendS.o vtv_start.o vtv_end.o vtv_start_preinit.o vtv_end_preinit.o crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"; \
for file in $parts; do \
  rm -f ../.././gcc/$file; \
  /usr/bin/install -c -m 644 $file ../.././gcc/; \
  case $file in \
    *.a) \
      ranlib ../.././gcc/$file ;; \
  esac; \
done
/usr/bin/install: cannot stat ‘vtv_start.o’: No such file or directory
/usr/bin/install: cannot stat ‘vtv_end.o’: No such file or directory
/usr/bin/install: cannot stat ‘vtv_start_preinit.o’: No such file or directory
/usr/bin/install: cannot stat ‘vtv_end_preinit.o’: No such file or directory

The vtv_*.o files are only built when vtable verification is enabled
(--enable-vtable-verify) and are otherwise nonexistent.  Therefore these
object files should only be added to $(EXTRA_PARTS) when vtable
verification is enabled.

2014-01-11  Patrick Palka

        * config.host (extra_parts): Don't include vtv_*.o objects
        unless vtable verification is enabled.

--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -209,7 +209,10 @@ case ${host} in
   ;;
 *-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-knetbsd*-gnu | *-*-gnu* | *-*-kopensolaris*-gnu)
   tmake_file="$tmake_file t-crtstuff-pic t-libgcc-pic t-eh-dw2-dip t-slibgcc t-slibgcc-gld t-slibgcc-elf-ver t-linux"
-  extra_parts="crtbegin.o crtbeginS.o crtbeginT.o crtend.o crtendS.o vtv_start.o vtv_end.o vtv_start_preinit.o vtv_end_preinit.o"
+  extra_parts="crtbegin.o crtbeginS.o crtbeginT.o crtend.o crtendS.o"
+  if test $enable_vtable_verify = yes; then
+    extra_parts="$extra_parts vtv_start.o vtv_end.o vtv_start_preinit.o vtv_end_preinit.o"
+  fi
   ;;
 *-*-lynxos*)
   tmake_file="$tmake_file t-lynx $cpu_type/t-crtstuff t-crtstuff-pic t-libgcc-pic"
Re: [committed] Fix predcom (PR tree-optimization/59745)
On Sun, Jan 12, 2014 at 02:24:22PM +0100, Richard Biener wrote:
> Uh.  Is this also applicable to branches?

In theory yes, I don't have a testcase that can trigger it though.
I'll apply it to 4.8 soon.

> > 2014-01-10  Jakub Jelinek
> >
> >       PR tree-optimization/59745
> >       * tree-predcom.c (tree_predictive_commoning_loop): Call
> >       free_affine_expand_cache if giving up because components is NULL.
> >
> > --- gcc/tree-predcom.c.jj 2014-01-07 08:48:34.0 +0100
> > +++ gcc/tree-predcom.c 2014-01-10 10:08:04.476340865 +0100
> > @@ -2447,6 +2447,7 @@ tree_predictive_commoning_loop (struct l
> >    if (!components)
> >      {
> >        free_data_refs (datarefs);
> > +      free_affine_expand_cache (&name_expansions);
> >        return false;
> >      }
> >
> >
> > Jakub

        Jakub
[Ada] Fix PR ada/59772
This is a regression present on all active branches for 8-bit/16-bit targets
and introduced by the rewrite of build_int_cst which now truncates its output.

Tested on x86_64-suse-linux, applied on all active branches.


2014-01-12  Eric Botcazou

        PR ada/59772
        * gcc-interface/cuintp.c (build_cst_from_int): Use 32-bit integer type
        as intermediate type.
        (UI_To_gnu): Likewise.


-- 
Eric Botcazou

Index: gcc-interface/cuintp.c
===================================================================
--- gcc-interface/cuintp.c (revision 206563)
+++ gcc-interface/cuintp.c (working copy)
@@ -6,7 +6,7 @@
  *                                                                          *
  *                          C Implementation File                           *
  *                                                                          *
- *          Copyright (C) 1992-2013, Free Software Foundation, Inc.         *
+ *          Copyright (C) 1992-2014, Free Software Foundation, Inc.         *
  *                                                                          *
  * GNAT is free software;  you can redistribute it  and/or modify it under  *
  * terms of the  GNU General Public License as published  by the Free Soft- *
@@ -55,7 +55,7 @@ static tree
 build_cst_from_int (tree type, HOST_WIDE_INT low)
 {
   if (SCALAR_FLOAT_TYPE_P (type))
-    return convert (type, build_int_cst (NULL_TREE, low));
+    return convert (type, build_int_cst (gnat_type_for_size (32, 0), low));
   else
     return build_int_cst_type (type, low);
 }
@@ -89,19 +89,12 @@ UI_To_gnu (Uint Input, tree type)
   gcc_assert (Length > 0);
 
   /* The computations we perform below always require a type at least as
-     large as an integer not to overflow.  REAL types are always fine, but
+     large as an integer not to overflow.  FP types are always fine, but
      INTEGER or ENUMERAL types we are handed may be too short.  We use a
      base integer type node for the computations in this case and will
-     convert the final result back to the incoming type later on.
-     The base integer precision must be superior than 16.  */
-
-  if (TREE_CODE (comp_type) != REAL_TYPE
-      && TYPE_PRECISION (comp_type)
-         < TYPE_PRECISION (long_integer_type_node))
-    {
-      comp_type = long_integer_type_node;
-      gcc_assert (TYPE_PRECISION (comp_type) > 16);
-    }
+     convert the final result back to the incoming type later on.  */
+  if (!SCALAR_FLOAT_TYPE_P (comp_type) && TYPE_PRECISION (comp_type) < 32)
+    comp_type = gnat_type_for_size (32, 0);
 
   gnu_base = build_cst_from_int (comp_type, Base);
Re: [committed] Fix predcom (PR tree-optimization/59745)
On Fri, Jan 10, 2014 at 9:41 PM, Jakub Jelinek wrote:
> Hi!
>
> split_data_refs_to_components used the name_expansions affine cache
> through determine_offset, and since my patch uses it even more often,
> but if it returns NULL, we don't free the cache and it can contain garbage
> next time we perform tree_predictive_commoning_loop.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, committed
> to trunk as obvious.  No testcase for testsuite, as it is pretty
> random if we ICE or not, e.g. stage1 f951 doesn't ICE, but stage2/3 did.

Uh.  Is this also applicable to branches?

Thanks for fixing this,
Richard.

> 2014-01-10  Jakub Jelinek
>
>       PR tree-optimization/59745
>       * tree-predcom.c (tree_predictive_commoning_loop): Call
>       free_affine_expand_cache if giving up because components is NULL.
>
> --- gcc/tree-predcom.c.jj 2014-01-07 08:48:34.0 +0100
> +++ gcc/tree-predcom.c 2014-01-10 10:08:04.476340865 +0100
> @@ -2447,6 +2447,7 @@ tree_predictive_commoning_loop (struct l
>    if (!components)
>      {
>        free_data_refs (datarefs);
> +      free_affine_expand_cache (&name_expansions);
>        return false;
>      }
>
>
> Jakub
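The pattern behind the fix can be sketched with a toy cache. The names below (toy_cache, toy_process_loop and friends) are hypothetical and only model the shape of the bug: a function-scoped cache that is filled as a side effect of analysis has to be released on every exit path, including the early "give up" return, or the next call starts with stale contents.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stands in for a cache such as name_expansions: filled during analysis,
   expected to be empty again when the function returns.  */
static int *toy_cache;

void
toy_free_cache (void)
{
  free (toy_cache);
  toy_cache = NULL;
}

bool
toy_cache_is_empty (void)
{
  return toy_cache == NULL;
}

/* Process one "loop".  Returns false when there is nothing to do.  */
bool
toy_process_loop (int n_components)
{
  /* The analysis step populates the cache as a side effect.  */
  if (toy_cache == NULL)
    toy_cache = calloc (1, sizeof *toy_cache);

  if (n_components == 0)
    {
      /* Give-up path: this is the kind of cleanup the patch adds.  */
      toy_free_cache ();
      return false;
    }

  /* ... the transformation itself would go here ... */

  toy_free_cache ();
  return true;
}
```

With the cleanup in place on both paths, a failed call leaves no state behind for the next one, which removes the dependence on whatever happened to be in the cache.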
Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs
On Fri, Jan 10, 2014 at 6:37 PM, Richard Henderson wrote:
> On 01/09/2014 03:34 PM, Jakub Jelinek wrote:
>> 2014-01-09  Jakub Jelinek
>>
>>       * target-globals.c (save_target_globals): Allocate < 4KB structs using
>>       GC in payload of target_globals struct instead of allocating them on
>>       the heap and the larger structs separately using GC.
>>       * target-globals.h (struct target_globals): Make regs, hard_regs,
>>       reload, expmed, ira, ira_int and lra_fields GTY((atomic)) instead
>>       of GTY((skip)) and change type to void *.
>>       (reset_target_globals): Cast loads from those fields to corresponding
>>       types.
>>
>> --- gcc/target-globals.h.jj 2014-01-09 19:24:20.0 +0100
>> +++ gcc/target-globals.h 2014-01-09 19:39:43.879348712 +0100
>> @@ -41,17 +41,17 @@ extern struct target_lower_subreg *this_
>>
>>  struct GTY(()) target_globals {
>>    struct target_flag_state *GTY((skip)) flag_state;
>> -  struct target_regs *GTY((skip)) regs;
>> +  void *GTY((atomic)) regs;
>
> I'm not entirely fond of this either, for the obvious reason.  Clearly a
> deficiency in gengtype, but after 2 hours of poking around I can see that
> it isn't a quick fix.
>
> I guess I'm ok with the patch, since the use of the target_globals structure
> is so restricted.

Yeah.  At some time we need a way to specify a finalization hook called
if an object is collected and eventually a hook that walks extra roots
indirectly reachable via an object (so you can have GC -> heap -> GC
memory layouts more easily).

Richard.

>
> r~
>
Re: [PATCH] Final removal of mudflap
Jeff Law wrote:
>It's been so long since I did anything with our web pages, I'm not
>entirely sure of proper procedures anymore.
>
>Gerald, this look OK?

Basically. ;-)

Per http://gcc.gnu.org/codingconventions.html, should it be
"run time"?  And add a at the end of the item.

If anything else needs tweaking, I'll keep an eye on it.

Gerald

-- 
Gerald Pfeifer
Re: [PATCH] Fix PR59715
On Fri, Jan 10, 2014 at 4:45 PM, Tom de Vries wrote:
> On 09-01-14 13:33, Richard Biener wrote:
>>
>> On Thu, 9 Jan 2014, Tom de Vries wrote:
>>
>>> On 09-01-14 10:16, Richard Biener wrote:
>>>>
>>>> This fixes PR59715 by splitting critical edges again before
>>>> code sinking.  The critical edge splitting done before PRE
>>>> was designed to survive until sinking originally, but at least
>>>> since 4.5 PRE now eventually cleans up the CFG and thus undos
>>>> critical edge splitting.  This results in less than optimal
>>>> code placement (and lost opportunities) for sinking and it
>>>> breaks (at least) the virtual operand updating code which
>>>> assumes that critical edges are still split.
>>>
>>> Richard,
>>>
>>> this follow-up patch:
>>> - notes in pass_pre that PROP_no_crit_edge is destroyed
>>> - notes in pass_sink_code that PROP_no_crit_edge is not required
>>>   (because it's now ensured by the pass itself)
>>>
>>> Build and reg-tested pr59715.c on x86_64.
>>>
>>> OK for stage3 trunk if bootstrap and full reg-test on x86_64 is ok?
>>
>>
>> Ok with /* PROP_no_crit_edges | */ not commented but removed.
>>
>
> Richard,
>
> Committed to trunk with that change.
>
> I saw you propagated the fix for PR59715 to 4.8 and 4.7 as well.
>
> Should I propagate this follow-up patch as well?

No, it's purely cosmetic after all.

Richard.

> Thanks,
> - Tom
>
>> Thanks,
>> Richard.
>>
>
Re: [Patch, fortran] PR58007: unresolved fixup hell
> However, I don't quite see the necessity for changing the module
> format (apart from the fact that it makes your patch slightly
> simpler).

I think it should change; otherwise reading an old module gives
"Expected left parenthesis".

Cheers,

Dominique
Re: [Patch, Fortran] PR 58026: Bad error recovery for allocatable component of undeclared type
2014/1/11 Mikael Morin :
>
>
> On 09/01/2014 16:30, Janus Weil wrote:
>> Hi all,
>>
>> the attached patch started out as an ICE-on-invalid regression fix,
>> but after the ICE had been fixed recently by other means, it was
>> degraded to a mere error-recovery improvement. It removes some rather
>> 'hackish' code that was added by Paul quite a long time ago.
>>
>> Regtests cleanly on x86_64-unknown-linux-gnu. Ok for trunk?
>>
> Could you check whether it works with a regular error?
> i.e. s/gfc_error_now/gfc_error/
> If it doesn't, OK as is.

Good point. In fact it works just as well with a plain gfc_error.
Committed as r206564 with that change.

Thanks for the review.

Cheers,
Janus
Re: Test cases vect-simd-clone-10/12.c keep failing
On Sun, Jan 12, 2014 at 10:53:58AM +0100, Bernd Edlinger wrote:
> The test cases gcc.dg/vect/vect-simd-clone-10.c and
> gcc.dg/vect/vect-simd-clone-12.c keep failing on my i686-pc. I do not really
> understand why. The problem seems to be the command line to xgcc has
> -S and -o and two .c files, probably the test case is not supported at all
> on this target, does not have AVX, SSE...

It seems that on some configurations (such as very old i?86/x86_64) the
default is dg-do compile for vect and not dg-do run.
check_vect_support_and_set_flags has:

  if { [check_effective_target_sse2_runtime] } {
      set dg-do-what-default run
  } else {
      set dg-do-what-default compile
  }

so if you have a really old box that doesn't even support SSE2, you get this.
I guess an explicit /* { dg-do run } */ needs to be added in this case;
after all, the test has all the check_vect stuff in.

	Jakub
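The effect of the suggested explicit directive can be modelled roughly as follows. This is a hedged sketch in C, not the harness itself (the real logic is Tcl in the testsuite): toy_default_action and toy_effective_action are hypothetical names, and only the two actions mentioned in the message ("run" and "compile") are considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the default picked by the Tcl snippet
   quoted above: "run" when SSE2 is usable at run time,
   otherwise "compile".  */
const char *toy_default_action(bool sse2_runtime)
{
  return sse2_runtime ? "run" : "compile";
}

/* EXPLICIT_DG_DO is the action named by a dg-do directive in the
   test file, or NULL when the test has none.  An explicit
   directive takes precedence over the default.  */
const char *toy_effective_action(const char *explicit_dg_do,
                                 bool sse2_runtime)
{
  /* An empty action string would be a malformed directive.  */
  assert(explicit_dg_do == NULL || strlen(explicit_dg_do) > 0);

  if (explicit_dg_do != NULL)
    return explicit_dg_do;
  return toy_default_action(sse2_runtime);
}
```

In this model a test without a dg-do line is only compiled on a machine lacking SSE2, while one carrying /* { dg-do run } */ is always treated as a run test, independent of what the machine supports.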
Re: [Patch] Patch set for regex instantiation
Hi

> On 12/Jan/2014, at 01:48, Tim Shen wrote:
>
> Here're 4 patches that finally led the _Compiler's instantiation and
> some other optimization for compiling time.
>
> 1) Create class _ScannerBase to make _Scanner pithier. Move const
> static members to src/c++11/regex.cc.
> 2) Make _Compiler and _Scanner `_FwdIter independent`. We store the
> input regex string in basic_regex as a basic_string; but when
> compiling it, const _CharT* is used.
> 3) Avoid using std::map, std::set and std::queue to reduce compile time.
> 4) Instantiate _Compiler> and
> _Compiler>. Export vector and
> vector's ctor and dtor as well for _Compiler's dependency.

Thanks, but as we already tried to explain, instantiating, and thus adding
many exported symbols, is post-4.9 material and can't be committed until we
branch. Please make sure to put the correctness fixes, and maybe anything
unrelated to instantiation which you consider stable and independently
useful, in a separate patch or multiple patches.

> Booted, and tested with -m64 and -m32; But check-debug failed some
> 23_containers/* cases? I suppose it's not my problem?

Please make sure Francois knows about that...

Paolo
Test cases vect-simd-clone-10/12.c keep failing
Hi Jakub,

The test cases gcc.dg/vect/vect-simd-clone-10.c and
gcc.dg/vect/vect-simd-clone-12.c keep failing on my i686-pc. I do not really
understand why. The problem seems to be the command line to xgcc has
-S and -o and two .c files, probably the test case is not supported at all
on this target, does not have AVX, SSE...

Any ideas?

Regards
Bernd.

Executing on host: /home/ed/gnu/gcc-build/gcc/xgcc -B/home/ed/gnu/gcc-build/gcc/ /home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c -fno-diagnostics-show-caret -fdiagnostics-color=never -flto -ffat-lto-objects -msse2 -ftree-vectorize -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -fopenmp-simd /home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c -S -o vect-simd-clone-10.s (timeout = 300)
spawn /home/ed/gnu/gcc-build/gcc/xgcc -B/home/ed/gnu/gcc-build/gcc/ /home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c -fno-diagnostics-show-caret -fdiagnostics-color=never -flto -ffat-lto-objects -msse2 -ftree-vectorize -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -fopenmp-simd /home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c -S -o vect-simd-clone-10.s
xgcc: fatal error: cannot specify -o with -c, -S or -E with multiple files
compilation terminated.
compiler exited with status 1
output is:
xgcc: fatal error: cannot specify -o with -c, -S or -E with multiple files
compilation terminated.

FAIL: gcc.dg/vect/vect-simd-clone-10.c -flto -ffat-lto-objects (test for excess errors)
Excess errors:
xgcc: fatal error: cannot specify -o with -c, -S or -E with multiple files
compilation terminated.
Executing on host: /home/ed/gnu/gcc-build/gcc/xgcc -B/home/ed/gnu/gcc-build/gcc/ /home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c -fno-diagnostics-show-caret -fdiagnostics-color=never -flto -ffat-lto-objects -msse2 -ftree-vectorize -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o vect-simd-clone-10a.s (timeout = 300)
spawn /home/ed/gnu/gcc-build/gcc/xgcc -B/home/ed/gnu/gcc-build/gcc/ /home/ed/gnu/gcc-4.9-trunk/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c -fno-diagnostics-show-caret -fdiagnostics-color=never -flto -ffat-lto-objects -msse2 -ftree-vectorize -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o vect-simd-clone-10a.s
PASS: gcc.dg/vect/vect-simd-clone-10a.c -flto -ffat-lto-objects (test for excess errors)
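The rule behind the "cannot specify -o with -c, -S or -E with multiple files" error in the log can be sketched as below. This is a simplified model under stated assumptions, not the GCC driver's actual code: toy_driver_accepts is a hypothetical name, every argument that does not start with '-' (other than the argument of -o) is counted as an input file, and joined forms such as -ofile are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model: reject a command line that names one output file
   with -o, stops before linking (-c, -S or -E), and has more
   than one input file.  */
bool toy_driver_accepts(int argc, const char *const argv[])
{
  int inputs = 0;
  bool has_output = false;
  bool stops_early = false;

  assert(argc >= 0);
  for (int i = 0; i < argc; i++) {
    if (strcmp(argv[i], "-o") == 0) {
      has_output = true;
      i++;  /* Skip the output file name.  */
    } else if (strcmp(argv[i], "-c") == 0
               || strcmp(argv[i], "-S") == 0
               || strcmp(argv[i], "-E") == 0) {
      stops_early = true;
    } else if (argv[i][0] != '-') {
      inputs++;  /* Anything else without a dash is an input.  */
    }
  }
  return !(has_output && stops_early && inputs > 1);
}
```

Under this model the failing command (two .c files plus -S -o vect-simd-clone-10.s) is rejected, while the passing one for vect-simd-clone-10a.c (a single input) is accepted. A link or run step that names both files with -o but without -S would also be accepted, since each input no longer needs its own output file.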