Re: LRA assign same hard register with live range overlapped pseudos
On 13-04-15 1:20 AM, shiva Chen wrote:
> Hi,
>
> I'm trying to port a new 32-bit target to GCC 4.8.0 with LRA enabled. There is an error case which generates the following RTL:
>
> (insn 536 267 643 3 (set (reg/f:SI 0 $r0 [477])    <== r477 assigned to r0
>         (plus:SI (reg/f:SI 31 $sp)
>             (const_int 112 [0x70]))) test2.c:95 64 {*addsi3}
>      (nil))
> (insn 643 536 537 3 (set (reg/f:SI 0 $r0 [565])    <== r565 assigned to r0, corrupting the use of r477
>         (reg/f:SI 31 $sp)) test2.c:95 44 {*movsi}
>      (nil))
> (insn 537 643 538 3 (set (reg/v:SI 13 $r13 [orig:61 i14 ] [61])
>         (mem/c:SI (plus:SI (reg/f:SI 0 $r0 [565])    <== use of r565
>                 (const_int 136 [0x88])) [5 %sfp+24 S4 A32])) test2.c:95 39 {*load_si}
>      (expr_list:REG_DEAD (reg/f:SI 0 $r0 [565])
>         (nil)))
> ...
> (insn 539 540 270 3 (set (reg:SI 0 $r0 [479])
>         (plus:SI (reg/f:SI 0 $r0 [477])
>             (reg:SI 5 $r5 [480]))) test2.c:95 62 {*add_16bit}
>      (expr_list:REG_DEAD (reg:SI 5 $r5 [480])
>         (expr_list:REG_DEAD (reg/f:SI 0 $r0 [477])    <== use of r477, which should be $sp + 112
>
> Note that the live ranges of r477 and r565 overlap but both are assigned the same register $r0 (r31 is the stack pointer).
>
> By tracing the LRA process, I noticed that when r477 is created, lra_reg_info[r477].val = lra_reg_info[r31].val due to (set r477 r31). But after lra_eliminate(), the stack offset changes and r477 is equal to r31+112 instead.
>
> In the next LRA iteration, r565 is created, and r565 = r31. In that case, the content of r477 should be treated as not equal to r565 because the elimination offset has changed. Otherwise, r565 and r477 may be assigned to the same hard register.
>
> To recognize that, I record the elimination offset when the pseudo register is created. Register contents are the same only when both lra_reg_info[].val and lra_reg_info[].offset are equal.
>
>  gcc/lra-assigns.c |    6 ++++--
>  gcc/lra-int.h     |    2 ++
>  gcc/lra.c         |   12 +++++++++++-
>  3 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
> index b204513..daf0aa9 100644
> --- a/gcc/lra-assigns.c
> +++ b/gcc/lra-assigns.c
> @@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int try_only_hard_regno)
>    int hr, conflict_hr, nregs;
>    enum machine_mode biggest_mode;
>    unsigned int k, conflict_regno;
> -  int val, biggest_nregs, nregs_diff;
> +  int offset, val, biggest_nregs, nregs_diff;
>    enum reg_class rclass;
>    bitmap_iterator bi;
>    bool *rclass_intersect_p;
> @@ -508,9 +508,11 @@ find_hard_regno_for (int regno, int *cost, int try_only_hard_regno)
>  #endif
>    sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno);
>    val = lra_reg_info[regno].val;
> +  offset = lra_reg_info[regno].offset;
>    CLEAR_HARD_REG_SET (impossible_start_hard_regs);
>    EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
> -    if (val == lra_reg_info[conflict_regno].val)
> +    if ((val == lra_reg_info[conflict_regno].val)
> +        && (offset == lra_reg_info[conflict_regno].offset))
>        {
>          conflict_hr = live_pseudos_reg_renumber[conflict_regno];
>          nregs = (hard_regno_nregs[conflict_hr]
> diff --git a/gcc/lra-int.h b/gcc/lra-int.h
> index 98f2ff7..8ae4eb0 100644
> --- a/gcc/lra-int.h
> +++ b/gcc/lra-int.h
> @@ -116,6 +116,8 @@ struct lra_reg
>    /* Value holding by register.  If the pseudos have the same value
>       they do not conflict.  */
>    int val;
> +  /* Eliminate offset of the pseudo have been created.  */
> +  int offset;
>    /* These members are set up in lra-lives.c and updated in
>       lra-coalesce.c.  */
>    /* The biggest size mode in which each pseudo reg is referred in
> diff --git a/gcc/lra.c b/gcc/lra.c
> index 9df24b5..69962be 100644
> --- a/gcc/lra.c
> +++ b/gcc/lra.c
> @@ -194,7 +194,17 @@ lra_create_new_reg (enum machine_mode md_mode, rtx original,
>    new_reg
>      = lra_create_new_reg_with_unique_value (md_mode, original, rclass, title);
>    if (original != NULL_RTX && REG_P (original))
> -    lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
> +    {
> +      lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
> +
> +      rtx x = lra_eliminate_regs (original, VOIDmode, NULL_RTX);
> +
> +      if (GET_CODE (x) == PLUS
> +          && GET_CODE (XEXP (x, 1)) == CONST_INT)
> +        lra_reg_info[REGNO (new_reg)].offset = INTVAL (XEXP (x, 1));
> +      else
> +        lra_reg_info[REGNO (new_reg)].offset = 0;
> +    }
>    return new_reg;
>  }
>
> --
> 1.7.9.5
>
> Comments?

Thanks for working on it, Shiva.

Could you send me the full dump for LRA (and IRA if possible) so that I can better understand the problem situation? It is hard for me to say now that your solution is complete (e.g. offsets can be changed again).
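The comparison proposed in the patch can be looked at in isolation. The following is a minimal sketch and not GCC code: `struct reg_info`, `same_content_p` and `conflict_p` are hypothetical stand-ins for `struct lra_reg` and the test inside `find_hard_regno_for`, and the value id 32 used in the note below is an arbitrary assumption.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for GCC's struct lra_reg.  */
struct reg_info {
    int val;     /* value id: a copy inherits the id of its source */
    int offset;  /* elimination offset associated with the pseudo */
};

/* Two pseudos are treated as holding the same content only if both
   the value id and the recorded offset agree.  */
static bool same_content_p(const struct reg_info *a, const struct reg_info *b)
{
    return a->val == b->val && a->offset == b->offset;
}

/* Pseudos with the same content may share a hard register even when
   their live ranges overlap; any other pair is a conflict.  */
static bool conflict_p(const struct reg_info *a, const struct reg_info *b)
{
    return !same_content_p(a, b);
}
```

With r477 modelled as `{32, 112}` and r565 as `{32, 0}`, `conflict_p` reports a conflict, which is the behaviour the patch is after; with only `val` compared, the two would be considered interchangeable.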
2 quick things: dead link + potential fix
Hi there, A quick note to say, first, thank you for providing such a great site! I am in the middle of a writing project (topic: finding - and keeping - a job in the modern world) and discovered your site while researching. And, second, I wanted to let you know that I did find a link that’s no longer working (on this page: http://gcc.gnu.org/ml/gcc/2002-11/msg00060.html) - you’re still linking to Yahoo’s old “hot jobs” site: http://hotjobs.yahoo.com/ This site was awesome in its day; unfortunately it is no longer live. If you would like to replace the site and need a recommendation, I wondered whether you might consider the job resource page provided by Answers.com? Check it out and see if you agree that it is a good replacement: Job Search Help from Answers.com http://jobs.answers.com/ Thanks for your time, and again, thank you for providing a great site! Nicki Nicole Stoff Research Assistant www.answers.com
Re: increasing testsuite-errors when optimizing for amdfam10/bdver2
Hi,

looks like XOP/FMA4/FMA is responsible for the additional errors I see when running the gcc testsuite or the glibc testsuite. I've opened Bug 56866 as a starting point, so the subject is a little bit misleading:

Bug 56866 - gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles glibc-2.17/crypt/sha512.c

Disabling XOP/FMA4/FMA shows no regression (compared with amdfam10) in the glibc testsuite and no additional execution errors in the gcc testsuite.

Currently I'm running gcc-4.8-branch configured with '--with-arch=bdver2' and with a simple patch disabling XOP/FMA4/FMA for bdver2 in gcc/config/i386/i386.c.

regards
winfried

On Mon, Apr 01, 2013 at 08:44:59PM +0200, winfried.mag...@t-online.de wrote:
> Hi,
>
> replacing my AMD Phenom2 with an AMD Piledriver (Bulldozer version 2) was reason enough for me to recompile gcc (and the whole Linux system) with hard optimisation set to bdver2 (as I've done since my first Linux on a 68030). But this time an increasing number of errors makes me a little bit nervous, and after some additional errors when running the glibc-2.17 testsuite I've refused to use this optimisation as the default on my system.
>
> The results might be interesting for the gcc developer community, and I've mailed four results with different sets of '--with-arch' and '--with-tune' to gcc-testresu...@gcc.gnu.org from stock gcc-4.8.0. I've set '--build=x86_64-winnix-linux-gnu' just to make it easier to search the archive for these specific results (results include the complete set of relevant libs/tools).
>
> Basic flags for every compile/test run:
>   --build=x86_64-winnix-linux-gnu --enable-languages=c,c++ --enable-shared --prefix=/usr --enable-multilib=no
>
> Optimization for phenom2 (used since I replaced my Athlon-FX):
>   --with-arch=amdfam10 --with-tune=amdfam10
>
> Soft optimization for bdver2, which is the current configuration I use on my system (no additional errors in glibc-2.17):
>   --with-arch=amdfam10 --with-tune=bdver2
>
> Optimization for bdver2:
>   --with-arch=bdver2 --with-tune=bdver2
>
> The number of additional errors is always increasing. Mostly errors in scan-assembler and scan-tree-dump (maybe wrong expectations in the tests?), but with arch=bdver2 I see an increasing number of execution tests failing. Surprisingly (at least for me) the difference is only visible in the gcc testsuite and doesn't harm other languages.
>
> I've done some work to ensure the errors are not related to the system setup, and maybe it's of interest what I've learned during this process:
>
> gcc.dg/guality/vla-1.c and vla-2.c depend on the gdb version. They fail with stock gdb-7.5.1 (also tested prerelease gdb-7.5.91) and don't fail with the gdb patches from openSUSE (the Fedora patches also work).
>
> Using tcl8.6.0 as the base for expect/dejagnu doesn't currently work, at least not with the gcc testsuite.
>
> Please note that this is not a regression and that gcc-4.7.x gives very similar results.
>
> Thank you for listening and for all the good work I have appreciated for 20 years with all sorts of CPUs and operating systems gcc supports!
>
> best regards
> winfried
Delay slot filling - what still matters, and what doesn't matter so much anymore?
Hello delay-slot target maintainers :-)

As you know, I'm playing with a new for-now-toy delay slot filling pass that preserves the CFG, and uses DF and sched-deps instead of resource.c. It's now beginning to take form enough that I run into the to-be-expected unexpected problems and questions. The biggest problem is that I have never been this far down into machine details since the DFA scheduler conversions, and have never worked with targets that have delay slots. I have no idea what really matters, and I hope you can help me with some of those questions.

First of all: What is still important to handle?

It's clear that the expectations in reorg.c are "anything goes", but modern RISCs (everything since the PA-8000, say) probably have some limitations on what is helpful to have, or not have, in a delay slot. According to the comments in pa.h about MASK_JUMP_IN_DELAY, having jumps in delay slots of other jumps is one such thing: They don't bring benefit to the PA-8000 and they don't work with DWARF2 CFI. As far as I know, SPARC and MIPS don't allow jumps in delay slots, SH looks like it doesn't allow it either, and CRIS can do it for short branches but doesn't because the trade-off between benefit and machine description complexity comes out negative.

On the scheduler implementation side: Branches as delayed insns in delay slots of other branches are impossible to express in the CFG (at least in GCC, but I think in general it can't be done cleanly). Therefore I want to drop support for branches in delay slots. What do you think about this?

What about multiple delay slots? It looks like reorg.c has code to handle insns with multiple delay slots, but there currently are no GCC targets in the FSF tree that have insns with multiple delay slots and that use define_delay.

The C6X has many more delay slots than just 1 (it can have up to 5 delay slots IIRC) but it is much more flexible than traditional RISCs when it comes to putting insns in delay slots (it uses predication so it can annul delayed insns on various conditions) and it uses a very clever (and effective??) delay slot filling mechanism via the normal scheduler, using back-tracking and jump shadows (see UNSPEC_JUMP_SHADOW in the c6x back end). But C6X doesn't use reorg.c delay slot scheduling. I'm not aware of any non-VLIW, non-DSP targets with more than one delay slot per insn, and new VLIW/DSP ports with delay slots probably should look at c6x rather than using define_delay.

Supporting only a single delay slot per delay_insn would make my scheduler a bit less complex. Would that be enough for everyone, or is it necessary to continue to support multiple delay slots per insn?

Another thing I completely fail to grasp is how the pipeline scheduler and delay slots interact. Doesn't dbr_schedule destroy all the good work schedule_insns has tried to do? If so, how much does that hurt on modern RISCs?

Related question: What, if anything, currently prevents dbr_schedule from causing pipeline stalls by stuffing a long-latency insn in a delay slot? I'm currently using a cost function using:

  cost = insn_default_latency (trial_insn) - insn_default_latency (delay_insn);

saying that a trial_insn with greater latency than delay_insn, and from the same basic block as delay_insn, should not be put in the delay slot. But that's preventing my scheduler from filling slots that reorg.c does fill. For example a case like this on sparc, where cost=1 is greater than the cost threshold I'm using (cost==0 i.e. no cost):

(gdb) p debug_rtx(delay_insn)
(jump_insn 18 0 0 2 (set (pc)
        (if_then_else (gt (reg:CCX 100 %icc)
                (const_int 0 [0]))
            (label_ref:DI 77)
            (pc))) t.c:18 48 {*normal_branch}
     (expr_list:REG_DEAD (reg:CCX 100 %icc)
        (expr_list:REG_BR_PROB (const_int 2900 [0xb54])
            (nil)))
 -> 77)
$5 = void
(gdb) p insn_default_latency(delay_insn)
$6 = 1
(gdb) p debug_rtx(trial_insn)
(insn/s:TI 16 13 17 2 (set (reg/v:DI 26 %i2 [orig:112 d ] [112])
        (mem/c:DI (plus:DI (reg/f:DI 1 %g1 [122])
                (const_int 24 [0x18])) [2 x+24 S8 A64])) t.c:14 72 {*movdi_insn_sp64}
     (expr_list:REG_DEAD (reg/f:DI 1 %g1 [122])
        (nil)))
$7 = void
(gdb) p insn_default_latency(trial_insn)
$8 = 2
(gdb)

What do you think will be a good strategy to deal with this (short of integrating delay slot filling in the scheduler proper)? Should I try to find cost==0 delay slot candidates, and only fill slots with cost>0 candidates if nothing cheap is available? Prefer a nop over cost>0 candidates? Ignore insn_default_latency?

Another thing I noticed about targets with delay slots that can be nullified is that at least some of the ifcvt.c transformations could be applied to fill more delay slots (obviously if_case_1 and if_case_2). In reorg.c, optimize_skip does some kind of if-conversion. Has anyone looked at whether optimize_skip still does something, and derived a test case for that?

Thanks for any
Re: LRA assign same hard register with live range overlapped pseudos
The full test2.c.209r.reload is about 296 KB and I can't send it successfully. Is there another way to send the dump file?

Shiva

2013/4/18 Shiva Chen <shiva0...@gmail.com>:
> Hi, Vladimir
>
> The attachment is the IRA dump of the case.
>
> Shiva
>
> 2013/4/17 Vladimir Makarov <vmaka...@redhat.com>:
>> Thanks for working on it, Shiva. Could you send me full dump for lra (and ira if possible)
Re: LRA assign same hard register with live range overlapped pseudos
Hi, Vladimir

The previous patch was probably not complete. The new patch records lra_reg_info[i].offset as the offset from the eliminable register to pseudo i and keeps updating it when the stack offset changes. Therefore, lra-assigns can use the latest offset to determine whether the contents of two pseudos are equal.

 gcc/lra-assigns.c      |    6 ++++--
 gcc/lra-eliminations.c |   12 ++++++++++--
 gcc/lra-int.h          |    2 ++
 gcc/lra.c              |    5 ++++-
 4 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index b204513..daf0aa9 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int try_only_hard_regno)
   int hr, conflict_hr, nregs;
   enum machine_mode biggest_mode;
   unsigned int k, conflict_regno;
-  int val, biggest_nregs, nregs_diff;
+  int offset, val, biggest_nregs, nregs_diff;
   enum reg_class rclass;
   bitmap_iterator bi;
   bool *rclass_intersect_p;
@@ -508,9 +508,11 @@ find_hard_regno_for (int regno, int *cost, int try_only_hard_regno)
 #endif
   sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno);
   val = lra_reg_info[regno].val;
+  offset = lra_reg_info[regno].offset;
   CLEAR_HARD_REG_SET (impossible_start_hard_regs);
   EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
-    if (val == lra_reg_info[conflict_regno].val)
+    if ((val == lra_reg_info[conflict_regno].val)
+        && (offset == lra_reg_info[conflict_regno].offset))
       {
         conflict_hr = live_pseudos_reg_renumber[conflict_regno];
         nregs = (hard_regno_nregs[conflict_hr]
diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c
index 9df0bae..2d34b51 100644
--- a/gcc/lra-eliminations.c
+++ b/gcc/lra-eliminations.c
@@ -1046,6 +1046,7 @@ spill_pseudos (HARD_REG_SET set)
 static void
 update_reg_eliminate (bitmap insns_with_changed_offsets)
 {
+  int i;
   bool prev;
   struct elim_table *ep, *ep1;
   HARD_REG_SET temp_hard_reg_set;
@@ -1124,8 +1125,15 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
   setup_elimination_map ();
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     if (elimination_map[ep->from] == ep && ep->previous_offset != ep->offset)
-      bitmap_ior_into (insns_with_changed_offsets,
-                       &lra_reg_info[ep->from].insn_bitmap);
+      {
+        bitmap_ior_into (insns_with_changed_offsets,
+                         &lra_reg_info[ep->from].insn_bitmap);
+
+        /* Update offset when the eliminate offset have been changed.  */
+        for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+          if (lra_reg_info[i].val - 1 == ep->from)
+            lra_reg_info[i].offset += (ep->offset - ep->previous_offset);
+      }
 }

 /* Initialize the table of hard registers to eliminate.
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 98f2ff7..944cad1 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -116,6 +116,8 @@ struct lra_reg
   /* Value holding by register.  If the pseudos have the same value
      they do not conflict.  */
   int val;
+  /* Offset from relative eliminate register to pseudo reg.  */
+  int offset;
   /* These members are set up in lra-lives.c and updated in
      lra-coalesce.c.  */
   /* The biggest size mode in which each pseudo reg is referred in
diff --git a/gcc/lra.c b/gcc/lra.c
index 9df24b5..7a60281 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -194,7 +194,10 @@ lra_create_new_reg (enum machine_mode md_mode, rtx original,
   new_reg
     = lra_create_new_reg_with_unique_value (md_mode, original, rclass, title);
   if (original != NULL_RTX && REG_P (original))
-    lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
+    {
+      lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
+      lra_reg_info[REGNO (new_reg)].offset = 0;
+    }
   return new_reg;
 }

Thanks for the comment :)

Shiva

2013/4/18 Shiva Chen <shiva0...@gmail.com>:
> Full test2.c.209r.reload is about 296kb and i can't send successfully. Is there another way to send the dump file?
>
> Shiva
>
> 2013/4/18 Shiva Chen <shiva0...@gmail.com>:
>> Hi, Vladimir
>>
>> attachment is the ira dump of the case
>>
>> Shiva
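The bookkeeping described in this revision of the patch, where every pseudo tied to an eliminable register has its recorded offset adjusted whenever the elimination offset moves, can be sketched on its own. This is an illustration only, not the GCC implementation: `struct reg_info` and `update_offsets` are hypothetical names, and matching on a plain value id simplifies the patch's `val - 1 == ep->from` test.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for GCC's struct lra_reg.  */
struct reg_info {
    int val;     /* value id shared with the register it was copied from */
    int offset;  /* offset relative to that register */
};

/* When the elimination offset of the register with value id FROM_VAL
   changes by DELTA, shift the recorded offset of every entry that was
   created as a copy of it.  Entries created later start at offset 0
   and so compare unequal to the older ones.  */
static void update_offsets(struct reg_info *regs, size_t n,
                           int from_val, int delta)
{
    for (size_t i = 0; i < n; i++)
        if (regs[i].val == from_val)
            regs[i].offset += delta;
}
```

In terms of the report: r477 is created with offset 0, the elimination offset then moves by 112 so r477 ends up at 112, and r565 is created afterwards with offset 0, so the two no longer look like the same value.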
Re: Delay slot filling - what still matters, and what doesn't matter so much anymore?
On 04/17/2013 03:52 PM, Steven Bosscher wrote:
> First of all: What is still important to handle?
>
> It's clear that the expectations in reorg.c are "anything goes" but modern RISCs (everything since the PA-8000, say) probably have some limitations on what is helpful to have, or not have, in a delay slot. According to the comments in pa.h about MASK_JUMP_IN_DELAY, having jumps in delay slots of other jumps is one such thing: They don't bring benefit to the PA-8000 and they don't work with DWARF2 CFI. As far as I know, SPARC and MIPS don't allow jumps in delay slots, SH looks like it doesn't allow it either, and CRIS can do it for short branches but doesn't do because the trade-off between benefit and machine description complexity comes out negative.

Note that sparc and/or mips might use the "adjust the return pointer" trick. I know it wasn't my idea when I added it to the PA.

Now the PA really can do jumps in the delay slot of another jump, but the semantics are such that it's not all that helpful and we've never tried to model it. You effectively get a single instruction executed at the first branch target, then you transfer to the second branch target IIRC. It's actually pretty natural semantics once you look at how the pc queues work on the PA.

> On the scheduler implementation side: Branches as delayed insns in delay slots of other branches is impossible to express in the CFG (at least in GCC, but I think in general it can't be done cleanly). Therefore I want to drop support for branches in delay slots. What do you think about this?

Certainly no need to support it in the generic case. The only question is whether or not it's worth supporting the "adjust the return pointer in the delay slot" stuff. Given a target without a call/ret predictor stack, it can be a significant advantage. Such things might exist in the embedded space.

> What about multiple delay slots? It looks like reorg.c has code to handle insns with multiple delay slots, but there currently are no GCC targets in the FSF tree that have insns with multiple delay slots and that use define_delay.

Ping Hans, I think he was the last person who tried to deal with reorg and multiple delay slots (c4x?). I certainly wouldn't lose any sleep if we killed the limited support for multiple delay slots.

> Another thing I completely fail to grasp, is how the pipeline scheduler and delay slots interact. Doesn't dbr_schedule destroy all the good work schedule_insns has tried to do? If so, how much does that hurt on modern RISCs?

It really depends on how the slot is filled and how far in the insn chain you had to look. You're usually just as likely to improve the schedule as you are to muck it up. Also remember you're dealing with stuff at block boundaries, where the scheduler really isn't helping much anyway.

There's always a tradeoff here. It could always be improved by having the scheduler mark insns which are good candidates (scheduling-wise) for filling slots. I certainly pondered this a couple of decades ago when I cared about delay slot filling on in-order targets :-) Oh yea, those hints have to be directional since it may be good to move an insn earlier to fill a path leading to the insn, but it may not be good to move it later to fill a branch after the insn.

> Related question: What, if anything, currently prevents dbr_schedule from causing pipeline stalls by stuffing a long-latency insn in a delay slot? I'm currently using a cost function using:

This has generally been left to ports to sort out. My experience was that loads/stores were often OK to put into a delay slot. A large part of the reason for this is that when we fill via the backwards walk, we're not doing anything speculatively. A nullified slot is different in that it's usually implemented by cancelling out the last stage in the pipeline. So even if you nullify, you still have to go through the entire pipeline. For something like an fpsqrt or fpdiv, that's *really* bad.

> What do you think will be a good strategy to deal with this (short of integrating delay slot filling in the scheduler proper)? Should I try to find cost==0 delay slot candidates, and only fill slots with cost>0 candidates if nothing cheap is available? Prefer a nop over cost>0 candidates? Ignore insn_default_latency?

It's really been left to the backends to deal with. So for example, on the PA anything which touched the FPU was disallowed in a nullified slot.

> Another thing I noticed about targets with delay slots that can be nullified, is that at least some of the ifcvt.c transformations could be applied to fill more delay slots (obviously if_case_1 and if_case_2). In reorg.c, optimize_skip does some kind of if-conversion. Has anyone looked at whether optimize_skip still does something, and derived a test case for that?

I doubt anyone has looked at it recently. It pre-dates our if-conversion code by a decade or more.

Jeff
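One of the strategies raised in this thread (take a candidate with no latency penalty if one exists, and only optionally fall back to a costlier one) can be written down as a small selection routine. This is a sketch for discussion, not code from reorg.c or from the new pass: `struct candidate`, `slot_cost` and `pick_candidate` are hypothetical, and the latencies are plain integers standing in for what `insn_default_latency` would return.

```c
#include <stddef.h>

/* Hypothetical candidate record for one insn that could fill the slot.  */
struct candidate {
    int id;       /* insn identifier, for illustration only */
    int latency;  /* stand-in for insn_default_latency (insn) */
};

/* cost = latency (trial_insn) - latency (delay_insn), as in the thread.  */
static int slot_cost(const struct candidate *trial, int delay_insn_latency)
{
    return trial->latency - delay_insn_latency;
}

/* Return the index of the chosen candidate, or -1 to leave a nop.
   The first candidate with cost <= 0 wins.  Otherwise, if ALLOW_COSTLY
   is nonzero, the cheapest positive-cost candidate is returned.  */
static int pick_candidate(const struct candidate *c, size_t n,
                          int delay_insn_latency, int allow_costly)
{
    int best = -1;
    int best_cost = 0;

    for (size_t i = 0; i < n; i++) {
        int cost = slot_cost(&c[i], delay_insn_latency);
        if (cost <= 0)
            return (int)i;
        if (allow_costly && (best < 0 || cost < best_cost)) {
            best = (int)i;
            best_cost = cost;
        }
    }
    return best;
}
```

For the sparc case quoted above (branch latency 1, load latency 2) the cost is 1, so the slot stays empty with `allow_costly == 0` and is filled with `allow_costly == 1`. Whether a positive-cost fill beats a nop is exactly the open question, and a port-specific rule such as the PA one about FPU insns in nullified slots would need a separate check that is not modelled here.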
[Bug tree-optimization/56984] [4.8/4.9 Regression] ICE in tree_vrp.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56984

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-04-17
                 CC|                            |jakub at gcc dot gnu.org
   Target Milestone|---                         |4.8.1
            Summary|GCC-4.8.0 ICE in tree_vrp.c |[4.8/4.9 Regression] ICE in
                   |                            |tree_vrp.c
     Ever Confirmed|0                           |1

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-04-17 06:23:02 UTC ---
Started with http://gcc.gnu.org/r184927
[Bug rtl-optimization/56957] [4.9 regression] ICE in add_insn_after, at emit-rtl.c:3783
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56957

Andrey Belevantsev <abel at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |abel at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #5 from Andrey Belevantsev <abel at gcc dot gnu.org> 2013-04-17 06:52:47 UTC ---
Created attachment 29886
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29886
proposed patch

Easy enough, we can have a speculation transformation that does not change the insn at all (e.g. we're asked to speculate an insn already speculated, so we just changed the speculation probability, not the pattern itself), but EXPR_WAS_CHANGED only tests that the transformation history vector is non-empty, so it would report that changes have actually happened. So checking additionally that the oldest insn form (last vector element) has the same INSN_ID as the one of the current expr fixes the test.

I will throw this to our Itanium for the full testing.

Steven, thanks for your insn emitting patches. It was not that easy to catch that kind of issue earlier, AFAIR we noticed it via corruption of our own structures and we needed to trace that back to the offending move
[Bug debug/53453] darwin linker expects both AT_name and AT_comp_dir debug notes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53453

Eric Botcazou <ebotcazou at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot
                   |                            |gnu.org

--- Comment #17 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-04-17 07:17:18 UTC ---
The patch was silently backported yesterday, but the wrong ChangeLog has been modified. Please post a message on gcc-patches@ and fix the ChangeLog. TIA.
[Bug tree-optimization/56984] [4.8/4.9 Regression] ICE in tree_vrp.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56984

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |jakub at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-04-17 07:30:20 UTC ---
Created attachment 29887
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29887
gcc49-pr56984.patch

Untested fix. Another thing is that fold resp. gimple_fold aren't able to optimize (x N) M into 0 if M N is the minimum value, but that isn't something VRP should handle.
[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> 2013-04-17 08:26:20 UTC ---
I will have a look.
[Bug tree-optimization/50789] Gather vectorization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

Andrey Turetskiy <andrey.turetskiy at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrey.turetskiy at gmail
                   |                            |dot com

--- Comment #11 from Andrey Turetskiy <andrey.turetskiy at gmail dot com> 2013-04-17 08:31:29 UTC ---
It looks like gathers can be used for vectorization in cases like:

#define N 1024
float x[4*N], y[N];

void foo ()
{
  int i;
  for (i = 0; i < N; i++)
    y[i] = x[179 + 3*i];
}

Currently this code isn't vectorized. In addition, there are a lot of such examples in SPEC 2006. Vectorization with gathers can give a noticeable gain.
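To make the access pattern concrete, the loop from the report can be set next to a version that materialises the indices first, which is roughly the shape a gather instruction consumes (one index per lane, one load per index). This is only a scalar illustration under that assumption: `foo_indexed` and `idx` are hypothetical names, and nothing here claims how the vectorizer would actually lower the loop.

```c
#define N 1024

float x[4 * N], y[N];

/* Loop from the report: a strided read with stride 3 starting at 179.  */
void foo(void)
{
    for (int i = 0; i < N; i++)
        y[i] = x[179 + 3 * i];
}

/* Same computation with the indices computed up front.  The largest
   index is 179 + 3 * (N - 1) = 3248, which is inside x[4 * N].  */
void foo_indexed(void)
{
    static int idx[N];

    for (int i = 0; i < N; i++)
        idx[i] = 179 + 3 * i;
    for (int i = 0; i < N; i++)
        y[i] = x[idx[i]];   /* gather: load through an index vector */
}
```

Both functions write the same values to `y`; the second one just exposes the index vector explicitly.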
[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #16 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> 2013-04-17 08:40:09 UTC ---
(In reply to comment #3)
> A way to tell gcc a variable is not uninitialized is to perform self-initialization like
>
>   int i = i;
>
> this will cause no code generation but inhibits the warning. Other compilers may warn about this construct of course.

What makes things worse about this workaround is that even protecting this by a #if defined(__GNUC__) may not be sufficient, as other compilers may claim GNUC compatibility and behave differently. This is the case of clang (at least under Debian):

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705583

The only good solution would be to fix the bug. I've checked that it is still there in the trunk revision 197260 (current Debian's gcc-snapshot).
[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982

--- Comment #5 from Richard Biener rguenth at gcc dot gnu.org 2013-04-17 08:48:33 UTC ---
> So the questions are:
> - is it desirable that uncprop does anything to SSA_NAME_VAR == NULL phis?

sure - it is all about improving out-of-SSA coalescing opportunities and avoiding copies

> - shouldn't something like that be not performed if current function calls setjmp (or more narrowly, if there is a returns twice function somewhere in between the considered setter and user)?

the testcase shows that uncprop extends the lifetime of an SSA name across a setjmp call - but it can only do so because it's an SSA name. Which means the testcase is questionable as 'n' is not declared volatile, no?

> - what other optimizations might be similarly problematic across returns twice calls?

every optimization pass that performs hoisting. PRE comes to my mind for

  if (x)
    tem = expr;
  setjmp ()
  var = expr;

which would happily eliminate the partial redundancy, moving expr to the else arm of the if () and thus extending the lifetime of 'var' across the setjmp call.

We do not explicitly model the abnormal control flow for setjmp / longjmp, which is the reason all these issues may appear. So I believe the correct fix is to either declare the testcase invalid or to model the abnormal control flow explicitly. Add abnormal edges from all call sites in the function that may end up calling longjmp _and_ eventually an abnormal edge from function entry as we can call longjmp from callers as well (though that may be invalid and thus we do not have to care?).

I don't see an easy fix for the issue (well, maybe the specific testcase). That it happens only after my patch is probably pure luck because of for example the PRE issue.

Testcase for that:

int f (int a, int flag)
{
  int tem;
  if (flag)
    tem = a + 1;
  int x = setjmp (env);
  int tem2 = a + 1;
  if (x)
    return tem2;
  return tem;
}

Validity of course is questionable, but we clearly use tem only on the normal path and tem2 on the abnormal path. PRE does the transform I indicated; proper abnormal edges would disable the transform.
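For illustration, here is a sketch of what the PRE transform described above roughly amounts to at source level. The names f_original, f_after_pre and pretmp, and the file-scope env declaration, are assumptions added so the fragment compiles; the second function is a hand-written approximation, not actual GCC output.

```c
#include <setjmp.h>

static jmp_buf env;   /* assumed declaration; the posted testcase leaves it out */

/* The testcase as posted: a + 1 is computed on one path before setjmp
   and always after it, so the second computation is partially redundant. */
int f_original(int a, int flag)
{
    int tem;
    if (flag)
        tem = a + 1;
    int x = setjmp(env);
    int tem2 = a + 1;
    if (x)
        return tem2;
    return tem;            /* only meaningful when flag != 0 */
}

/* Hand-written approximation of the code after PRE: the expression is
   inserted on the else arm, and the copy after setjmp reuses a value
   that now has to stay live across the setjmp call. */
int f_after_pre(int a, int flag)
{
    int tem, pretmp;       /* pretmp is a hypothetical PRE temporary */
    if (flag) {
        tem = a + 1;
        pretmp = tem;
    } else {
        pretmp = a + 1;    /* computation inserted by PRE */
    }
    int x = setjmp(env);
    int tem2 = pretmp;     /* no recomputation after setjmp */
    if (x)
        return tem2;
    return tem;
}
```

Without a longjmp both versions return the same value for flag != 0; the difference only matters when control re-enters at the setjmp, which is exactly the path the missing abnormal edges would describe.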
[Bug tree-optimization/50789] Gather vectorization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

--- Comment #12 from rguenther at suse dot de rguenther at suse dot de 2013-04-17 08:53:21 UTC ---
On Wed, 17 Apr 2013, andrey.turetskiy at gmail dot com wrote:

> --- Comment #11 from Andrey Turetskiy andrey.turetskiy at gmail dot com 2013-04-17 08:31:29 UTC ---
> It looks like gathers can be used for vectorization in cases like:
>
> #define N 1024
> float x[4*N], y[N];
>
> void foo ()
> {
>   int i;
>   for (i = 0; i < N; i++)
>     y[i] = x[179 + 3*i];
> }
>
> Now this code isn't vectorized. In addition there are a lot of such examples in SPEC 2006. Vectorization with gathers can give noticeable gain.

The above can be vectorized with the strided-load vectorization support (it just doesn't trigger here). And strided-load vectorization code generation can be improved by using gather vectorization: first build a vector of addresses / indices and then perform a gather load. That is, if building a vector of addresses / indices is cheaper than performing scalar loads and building a vector from the results.

So the above is more related to strided load support (and the not yet implemented strided store support as well, if there are also gather stores ...)

Richard.
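As a rough illustration of the "build a vector of indices, then gather" scheme described above, here is a scalar C emulation of one possible vectorized form of the loop. The vector width VF and the function name foo_gather_sketch are assumptions; a real vectorizer would emit a single gather instruction in place of the inner load loop.

```c
#define N 1024
float x[4 * N], y[N];

enum { VF = 4 };   /* assumed vector width in elements */

/* Scalar emulation of a strided load implemented with a gather. */
void foo_gather_sketch(void)
{
    for (int i = 0; i < N; i += VF) {
        int idx[VF];

        /* Step 1: build the vector of indices 179 + 3*i for VF lanes. */
        for (int lane = 0; lane < VF; lane++)
            idx[lane] = 179 + 3 * (i + lane);

        /* Step 2: the gather load; one instruction on hardware that has it. */
        for (int lane = 0; lane < VF; lane++)
            y[i + lane] = x[idx[lane]];
    }
}
```

Whether this pays off depends, as noted above, on the index vector being cheaper to build than VF scalar loads plus a vector construction.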
[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982 --- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2013-04-17 08:56:00 UTC --- I don't see how we could declare the testcase invalid, why would n need to be volatile? It isn't live across the setjmp call, it is even declared after the setjmp call, and it is always initialized after the setjmp call.
[Bug fortran/56814] [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56814

--- Comment #5 from janus at gcc dot gnu.org 2013-04-17 08:58:25 UTC ---
Alternative patch:

Index: gcc/fortran/interface.c
===================================================================
--- gcc/fortran/interface.c    (revision 198007)
+++ gcc/fortran/interface.c    (working copy)
@@ -1184,9 +1184,20 @@ check_result_characteristics (gfc_symbol *s1, gfc_
 {
   gfc_symbol *r1, *r2;

-  r1 = s1->result ? s1->result : s1;
-  r2 = s2->result ? s2->result : s2;
+  if (s1->ts.interface && s1->ts.interface->result)
+    r1 = s1->ts.interface->result;
+  else if (s1->result)
+    r1 = s1->result;
+  else
+    r1 = s1;

+  if (s2->ts.interface && s2->ts.interface->result)
+    r2 = s2->ts.interface->result;
+  else if (s2->result)
+    r2 = s2->result;
+  else
+    r2 = s2;
+
   if (r1->ts.type == BT_UNKNOWN)
     return true;

Regtesting now ...
[Bug bootstrap/56644] --disable-nls requires symbols from libintl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56644

--- Comment #6 from Markus Eisenmann meisenmann@fh-salzburg.ac.at 2013-04-17 09:01:04 UTC ---
Created attachment 29888
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29888
Prevent redirect to some libintl-functions if NLS isn't requested

This patch undefines some macros which would cause unneeded redirections to libintl functions (like vsnprintf) when NLS isn't configured (i.e. ENABLE_NLS is not set).
[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982 --- Comment #7 from rguenther at suse dot de rguenther at suse dot de 2013-04-17 09:07:10 UTC --- On Wed, 17 Apr 2013, jakub at gcc dot gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982 --- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2013-04-17 08:56:00 UTC --- I don't see how we could declare the testcase invalid, why would n need to be volatile? It isn't live across the setjmp call, it is even declared after the setjmp call, and it is always initialized after the setjmp call. Then there is no other way but to model the abnormal control flow properly. Even simple CSE can break things otherwise. Consider int tmp = a + 1; setjmp () int tmp2 = a + 1; even on RTL CSE would break that, no? setjmp doesn't even forcefully start a new basic-block. Hmm, maybe doing that, start a new BB for all returns-twice calls and add an abnormal edge from FN entry is enough to avoid all possibly dangerous transforms. Richard.
[Bug ada/40986] [4.6 regression] Assert_Failure sinfo.adb:360, error detected at a-unccon.ads:23:27
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40986 --- Comment #15 from Markus Schöpflin markus.schoepflin at comsoft dot de 2013-04-17 09:15:22 UTC --- I have bisected the problem using the git gcc repository, unfortunately 121 commits are left after bisecting, as in between the last known good and the first known bad commit the gcc tree does not compile for a lot of commits. Anyway, this is the last known good commit: commit 244de65defd519a1245551886fce58113a4b7b2a Author: charlet charlet@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Jun 6 10:13:25 2007 + This is the first known bad commit: commit 7b29e7de1cf940343eeeb25058b7870877d15524 Author: charlet charlet@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Jun 6 10:54:04 2007 + The 120 commits in between do not compile, and all do massive changes in the Ada part of gcc.
[Bug bootstrap/56644] --disable-nls requires symbols from libintl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56644

--- Comment #7 from Markus Eisenmann meisenmann@fh-salzburg.ac.at 2013-04-17 09:17:36 UTC ---
At least this error is caused by some libintl macros, which redirect some stdio functions (like vsnprintf ...) to their libintl versions; i.e. the header file libintl.h is available and included, but NLS/libintl isn't requested.

Solution (as done by the attached patch/diff file): add the following undefs in the #else region of gcc/intl.h (of #ifdef ENABLE_NLS), for example after line 54:

#undef fprintf
#undef vfprintf
#undef printf
#undef vprintf
#undef sprintf
#undef vsprintf
#undef snprintf
#undef vsnprintf
#undef asprintf
#undef vasprintf
#undef setlocale

Additional comment: the header file gcc/intl.h already contains undefs to prevent using libintl if it is not requested or configured, but not for the affected stdio functions such as vsnprintf [...], which may cause linker errors (i.e. unresolved externals).
[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

Manuel López-Ibáñez manu at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |diagnostic

--- Comment #17 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 09:19:09 UTC ---
(In reply to comment #16)
> (In reply to comment #3)
> > A way to tell gcc a variable is not uninitialized is to perform self-initialization like int i = i; this will cause no code generation but inhibits the warning. Other compilers may warn about this construct of course.
>
> The only good solution would be to fix the bug. I've checked that it is still there in the trunk revision 197260 (current Debian's gcc-snapshot).

If you mean to fix the false warning, then you are likely to wait a long long time (in order of years) because it doesn't seem a trivial thing to fix and there are very very few people with enough GCC knowledge to fix it (and they are busy with other things).

What would be trivial to fix (but require persistence, patience and time) is to implement this idea: http://gcc.gnu.org/ml/gcc/2010-08/msg00297.html that is, either __attribute__ ((initialized)) or _Pragma("GCC diagnostic ignored \"-Wuninitialized\""). (Personally, I prefer the latter, since it reuses existing code.) And as a follow-up, get rid of the non-portable valgrind-unfriendly i=i idiom that has caused so much grief over the years.

However, we still need someone with the persistence, patience and time to implement this and get it past the powers that be.
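For comparison, a minimal sketch of what can already be done with the existing region-based diagnostic pragmas, as opposed to the per-variable mechanism proposed above. The function last_positive is made up for this example; `#pragma GCC diagnostic push`, `ignored` and `pop` are existing GCC pragmas, but whether this exact placement silences a given false positive depends on the GCC version and on where the warning is attributed, so treat it as an assumption rather than a guaranteed recipe.

```c
#include <stddef.h>

/* Hypothetical helper: returns the last positive element of v, or -1.
   `value` is deliberately left without an initializer, so tools that
   track uninitialized reads can still see a genuine misuse. */
int last_positive(const int *v, size_t n)
{
    int found = 0;
    int value;

    for (size_t i = 0; i < n; i++) {
        if (v[i] > 0) {
            value = v[i];
            found = 1;
        }
    }

    /* Restrict the suppression to GCC proper; other compilers that
       define __GNUC__ may not know this warning name. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif
    return found ? value : -1;
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif
}
```

Unlike the i=i idiom this generates no self-assignment and leaves the variable genuinely uninitialized, at the cost of suppressing the warning for everything inside the guarded region.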
[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982

--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org 2013-04-17 09:28:55 UTC ---
#include <stdio.h>
#include <setjmp.h>

static sigjmp_buf env;

static inline int
g (int x)
{
  if (x)
    {
      fprintf (stderr, "Returning 0\n");
      return 0;
    }
  else
    {
      fprintf (stderr, "Returning 1\n");
      return 1;
    }
}

__attribute__ ((noinline)) void
bar (int n)
{
  if (n == 0)
    exit (0);
  static int x;
  if (x++)
    abort ();
  longjmp (env, 42);
}

int
f (int *e)
{
  int n = *e;
  if (n)
    return 1;
  int x = setjmp (env);
  n = g (x);
  fprintf (stderr, "x = %i, n = %i\n", x, n);
  bar (n);
}

int
main ()
{
  int v = 0;
  return f (&v);
}

Adjusted testcase that fails even with GCC 4.7.2 at -O2, works with -O2 -fno-dominator-opts (which disables uncprop). Again, I don't see how this could be declared invalid: while n is declared before the setjmp, it is not live across the setjmp call. This adjusted testcase regressed in April 2005 (i.e. 4.1+ regression).
[Bug web/44269] Search for PR number in mailing lists fails
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44269

Shakthi Kannan skannan at redhat dot com changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |skannan at redhat dot com

--- Comment #2 from Shakthi Kannan skannan at redhat dot com 2013-04-17 09:31:28 UTC ---
Searching for 18249 in the web archive of the gcc-patches mailing list with mnoGoSearch 3.3.13 does return the link to PR 18249: http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01458.html
[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296 --- Comment #18 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 09:31:59 UTC --- In fact, we should have removed the i=i idiom a long time ago. The correct thing to do (as Linus says) is to initialize the variable to a sensible value to silence the warning: http://lwn.net/Articles/529954/ If GCC is smart enough to remove the initialization, then there is no harm. If GCC is not smart enough, then the code is probably complex enough that GCC cannot optimize it properly and this is why it gives a false positive, so the fake initialization is the least of your worries.
[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296 --- Comment #19 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 09:37:24 UTC --- (In reply to comment #2) 1. Split the -Wuninitialized into two different warnings: one for which gcc knows that the variable is uninitialized and one for which it cannot decide. -Wuninitialized currently does both. Note that -Wmaybe-uninitialized is available since at least GCC 4.8.0
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981

--- Comment #3 from Tobias Burnus burnus at gcc dot gnu.org 2013-04-17 09:39:58 UTC ---
(In reply to comment #2)
> There is a seek inside next_record_w_unf. That function is used for DIRECT I/O. Looks conceptually wrong to me for sequential unformatted. I won't have time for a few days to look at this further.

Well, what gfortran does is:
* write place-holder record length in the heading record marker
* write actual data
* write tailing record marker (1st call to write_us_marker in next_record_w_unf)
* write actual length of this record, i.e. seek back + write_us_marker + seek to past the tailing record marker (all in next_record_w_unf)

I think what other compilers do is to make use of the following item in the Fortran standard: "The value of the RECL= specifier shall be positive. It specifies the length of each record in a file being connected for direct access, or specifies the maximum length of a record in a file being connected for sequential access." (F2008, 9.5.6.15 RECL= specifier in the OPEN statement)

I tried the following program:
---
integer, allocatable :: array(:)
integer :: rl, i
open(99,file="/dev/null",form="unformatted")
inquire(99,recl=rl)
allocate(array(1024*1024*100))
array = 0
print *,rl, size(array)/4
write(99) (array, i=1,1000)
close(99)
end
---

With gfortran, it takes only 0.203s and one has:
     19 mmap
     26 open
    392 lseek
   1784 write
The question is why there are that many seeks. There should be only a single record!

With pathf95, it fails after 0.099s with the error: "This request exceeds the maximum record size."

And with g95, it takes 4.946s (!) until it fails with "Writing more data than the record size (RECL)":
     11 close
     17 fstat
     20 mprotect
     21 stat
     25 mmap
     30 write
     47 open
where the mmap+munmap pairs seem to take the lion's share of the time.
However, one can do better: NAG f95 only needs 0.007s and does:
      5 read
      6 lseek
      8 mprotect
     10 fstat
     23 stat
     29 mmap
     40 open
   2003 write

Maybe something like the following would work:
* Create a reasonably sized buffer.
* Use it to buffer the writes, and if the record fits, write the length, the buffer, the length.
* If the argument is a (too) big array, write the length of the data in the buffer plus the array byte size, then the data - and only if another item comes, seek to the beginning and update the length.

That should take care of:
  write(99) i, j, k
  write(99) i, j, k, small_array
  write(99) big_array
and even
  write(99) i, j, k, big_array
but it will not help for
  write(99) big_array1, big_array2
I think that covers the most important cases.

One question is how large the buffer should be initially, whether it should be resizable - and how long it should remain allocated. Even a small buffer of 1024 bytes (= 128 real(8) values) will help when writing small data like in the example of comment 0. If it is larger, the issue of freeing the data and/or resizing becomes more important - and one needs to be careful not to require a huge amount of memory and/or do very frequent memory allocation+freeing, which causes the problems with g95.
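A minimal C sketch of the buffering scheme outlined above, assuming 4-byte record markers and a fixed 1024-byte buffer. All names (rec_writer, rec_init, rec_put, rec_end) are hypothetical and unrelated to libgfortran's actual code; error handling is omitted.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { REC_BUF_SIZE = 1024 };   /* assumed buffer size: 128 real(8) values */

struct rec_writer {
    FILE *fp;
    unsigned char buf[REC_BUF_SIZE];
    size_t used;       /* bytes of the current record held in buf */
    uint32_t length;   /* payload bytes of the current record so far */
    uint32_t marker;   /* value stored in the leading marker, once written */
    long start;        /* file offset of the leading marker, -1 if unwritten */
    int seeks;         /* how often a leading marker had to be patched */
};

static void rec_reset(struct rec_writer *w)
{
    w->used = 0;
    w->length = 0;
    w->marker = 0;
    w->start = -1;
}

void rec_init(struct rec_writer *w, FILE *fp)
{
    w->fp = fp;
    w->seeks = 0;
    rec_reset(w);
}

/* Append one I/O item to the current record. */
void rec_put(struct rec_writer *w, const void *data, size_t n)
{
    if (w->start < 0 && w->used + n <= sizeof w->buf) {
        memcpy(w->buf + w->used, data, n);   /* small item: buffer it */
        w->used += n;
        w->length += (uint32_t)n;
        return;
    }
    if (w->start < 0) {
        /* First item that does not fit: guess that it is the last one. */
        w->marker = w->length + (uint32_t)n;
        w->start = ftell(w->fp);
        fwrite(&w->marker, sizeof w->marker, 1, w->fp);
        fwrite(w->buf, 1, w->used, w->fp);
        w->used = 0;
    }
    fwrite(data, 1, n, w->fp);
    w->length += (uint32_t)n;
}

/* Finish the record, patching the leading marker only if the guess was wrong. */
void rec_end(struct rec_writer *w)
{
    if (w->start < 0) {
        /* Whole record was buffered: length, data, length; no seek. */
        fwrite(&w->length, sizeof w->length, 1, w->fp);
        fwrite(w->buf, 1, w->used, w->fp);
    } else if (w->marker != w->length) {
        long end = ftell(w->fp);
        fseek(w->fp, w->start, SEEK_SET);
        fwrite(&w->length, sizeof w->length, 1, w->fp);
        fseek(w->fp, end, SEEK_SET);
        w->seeks++;
    }
    fwrite(&w->length, sizeof w->length, 1, w->fp);   /* trailing marker */
    rec_reset(w);
}
```

With this layout, a record of only small items and a record of small items followed by one big array both finish without a seek; only a record with a second item after a big array needs the seek-back.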
* * *

Closer look at NAG: It does the following (allocate moved before open, inquire removed):

open("/dev/null", O_RDWR) = 3
mmap(NULL, 3856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab0e000
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab22000
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff645b1c90) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff645b2200) = -1 ENOTTY (Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac22000
lseek(3, 0, SEEK_CUR) = 0
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 419426304) = 419426304
(and 999 further write lines)
lseek(3, 0, SEEK_SET) = 0
read(3, "", 4096) = 0
lseek(3, 12, SEEK_CUR) = 0
write(3, "\0\0\0\250", 4) = 4
lseek(3, 18446744072233156608, SEEK_SET) = 0
read(3, "", 4096) = 0
lseek(3, 20, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
ftruncate(3, 0) = -1 EINVAL (Invalid argument)
close(3) = 0
munmap(0x2ac22000, 4096) = 0
[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982

--- Comment #9 from Richard Biener rguenth at gcc dot gnu.org 2013-04-17 09:57:52 UTC ---
Created attachment 29889
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29889
patch

Untested patch. The patch handles setjmp similarly to a non-local label, thus forcing it to start a new basic-block and get abnormal edges from all sites that can make a non-local goto or call longjmp. Fixes the testcase for me.

Somewhat reduced:

#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

static sigjmp_buf env;

static inline int g(int x)
{
  if (x)
    {
      fprintf(stderr, "Returning 0\n");
      return 0;
    }
  else
    {
      fprintf(stderr, "Returning 1\n");
      return 1;
    }
}

int f(int *e)
{
  if (*e)
    return 1;
  int x = setjmp(env);
  int n = g(x);
  if (n == 0)
    exit(0);
  if (x)
    abort();
  longjmp(env, 42);
}

int main(int argc, char** argv)
{
  int v = 0;
  return f(&v);
}

but I cannot remove the remaining printfs, so it's not appropriate for the testsuite yet.
[Bug translation/56985] New: gcc/fortran/resolve.c:920: '%s' in cannot appear in COMMON ...
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56985

             Bug #: 56985
           Summary: gcc/fortran/resolve.c:920: '%s' in cannot appear in COMMON ...
    Classification: Unclassified
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: translation
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: sti...@antcom.de

In gcc/fortran/resolve.c:920: '%s' in cannot appear in COMMON ... - I guess the "in" is not intended?
[Bug web/45655] GCC WIki Needs Text Colorizing Capability
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45655 Shakthi Kannan skannan at redhat dot com changed: What|Removed |Added CC||skannan at redhat dot com --- Comment #1 from Shakthi Kannan skannan at redhat dot com 2013-04-17 10:06:03 UTC --- I tested with the following: Color2(red courier on blue,col=red,bcol=blue,font=courier) Color2(Green Font on Yellow Background,green,yellow) Color2(Orange Text,orange) Color2(Text with commas:one,two,three,red) Color2(Optional parameters,bcol=yellow) at: http://gcc.gnu.org/wiki/WikiSandBox and can confirm that text colorizing macro isn't working on the GCC Wiki.
[Bug web/45688] Typo in __attribute__((version-id)) docs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45688

Shakthi Kannan skannan at redhat dot com changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |skannan at redhat dot com

--- Comment #2 from Shakthi Kannan skannan at redhat dot com 2013-04-17 10:36:30 UTC ---
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#Function-Attributes now mentions version_id correctly:

extern int foo () __attribute__((version_id ("20040821")));
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981

--- Comment #4 from Janne Blomqvist jb at gcc dot gnu.org 2013-04-17 10:50:07 UTC ---
The reason why gfortran is slow here is that for non-regular files we use unbuffered I/O. If you write to a regular file instead of /dev/null, you'll see us doing ~8 KB writes at a time. On my system, timing writing to /dev/null gives

real    0m0.727s
user    0m0.272s
sys     0m0.452s

whereas writing to a file gives

real    0m0.202s
user    0m0.180s
sys     0m0.020s

The reason for this is that non-regular files (a.k.a. special files) are special in many ways wrt seeking. Some allow seeking just fine, some always return 0, some return an error (and which special files behave in which way is to some extent different on different OS'es). As the buffered IO keeps track of the logical file pointer position, it can easily get out of sync with the physical position if the file doesn't behave like a regular file.

Also, for special files users often expect non-buffered IO, e.g. they want output on the terminal directly instead of waiting until the 8 KB buffer fills up, programs communicating via pipes can deadlock if data sits in the buffers, etc.

One could of course make "unbuffered I/O" in gfortran really mean "flush the buffer at the end of each I/O statement" rather than not using a buffer at all and instead using the raw POSIX I/O syscalls. This would perhaps not be a bad idea per se, but would require making the buffered I/O code handle special files in some sensible way.

Another reason for gfortran slowness is that we do quite a lot of checking in data_transfer_init(), which means that there's quite a lot of per-record overhead. Writing a single element unformatted is thus the worst case.
One way to speed up data_transfer_init, I think, is that instead of checking each flag bit (which says which I/O specifiers are present) separately, create a variable with forbidden flags for each I/O type (unformatted/formatted, sequential/direct/stream = 6x), and check the entire flag variable once ((flag & forbidden_flags) == 0). Only if there is an error, do the bit-by-bit checking in order to generate the error message.
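A small C sketch of the forbidden-flags idea, under the assumption of four specifier bits. The bit names, the io_kind enumeration and the table contents are illustrative only; they are not libgfortran's real flag layout or a complete statement of the Fortran rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "specifier present" bits. */
enum {
    SPEC_FMT     = 1 << 0,
    SPEC_REC     = 1 << 1,
    SPEC_POS     = 1 << 2,
    SPEC_ADVANCE = 1 << 3,
    SPEC_COUNT   = 4
};

/* The six I/O types: (unformatted, formatted) x (sequential, direct, stream). */
enum io_kind {
    UNF_SEQ, UNF_DIRECT, UNF_STREAM,
    FMT_SEQ, FMT_DIRECT, FMT_STREAM,
    IO_KIND_COUNT
};

/* One precomputed mask per I/O type (illustrative contents). */
static const uint32_t forbidden_flags[IO_KIND_COUNT] = {
    [UNF_SEQ]    = SPEC_FMT | SPEC_REC | SPEC_POS | SPEC_ADVANCE,
    [UNF_DIRECT] = SPEC_FMT | SPEC_POS | SPEC_ADVANCE,
    [UNF_STREAM] = SPEC_FMT | SPEC_REC | SPEC_ADVANCE,
    [FMT_SEQ]    = SPEC_REC | SPEC_POS,
    [FMT_DIRECT] = SPEC_POS | SPEC_ADVANCE,
    [FMT_STREAM] = SPEC_REC,
};

static const char *const spec_names[SPEC_COUNT] = {
    "FMT", "REC", "POS", "ADVANCE"
};

/* Returns NULL if all present specifiers are allowed, otherwise the
   name of the first offending specifier. */
const char *check_specifiers(enum io_kind kind, uint32_t flags)
{
    uint32_t bad = flags & forbidden_flags[kind];

    if (bad == 0)
        return NULL;               /* fast path: one AND, one compare */

    /* Slow path, taken only on error: find the bit for the message. */
    for (int bit = 0; bit < SPEC_COUNT; bit++)
        if (bad & (1u << bit))
            return spec_names[bit];
    return "unknown specifier";
}
```

The common, error-free case costs a single table lookup and mask test per record, and the per-bit loop runs only when an error message has to be produced.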
[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296 --- Comment #20 from Vincent Lefèvre vincent-gcc at vinc17 dot net 2013-04-17 11:17:14 UTC --- (In reply to comment #18) In fact, we should have removed the i=i idiom a long time ago. The correct thing to do (as Linus says) is to initialize the variable to a sensible value to silence the warning: http://lwn.net/Articles/529954/ There is no real sensible value except some trap value. Letting the variable uninitialized at that point (the declaration) allows some tools, like the Formalin compiler described in WG14/N1637, to detect potential problems if the variable is really used uninitialized. (In reply to comment #19) Note that -Wmaybe-uninitialized is available since at least GCC 4.8.0 OK, so a solution would be to add a configure test for projects that don't want such warnings (while still using -Wall) to see whether -Wno-maybe-uninitialized is supported.
[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296 --- Comment #21 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 11:26:01 UTC --- (In reply to comment #20) OK, so a solution would be to add a configure test for projects that don't want such warnings (while still using -Wall) to see whether -Wno-maybe-uninitialized is supported. When an unrecognized warning option is requested (e.g., -Wunknown-warning), GCC will emit a diagnostic stating that the option is not recognized. However, if the -Wno- form is used, the behavior is slightly different: No diagnostic will be produced for -Wno-unknown-warning unless other diagnostics are being produced. This allows the use of new -Wno- options with old compilers, but if something goes wrong, the compiler will warn that an unrecognized option was used.
[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296

--- Comment #22 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 11:31:29 UTC ---
(In reply to comment #20)
> (In reply to comment #18)
> > In fact, we should have removed the i=i idiom a long time ago. The correct thing to do (as Linus says) is to initialize the variable to a sensible value to silence the warning: http://lwn.net/Articles/529954/
>
> There is no real sensible value except some trap value. Letting the variable uninitialized at that point (the declaration) allows some tools, like the Formalin compiler described in WG14/N1637, to detect potential problems if the variable is really used uninitialized.

That doesn't contradict my assessment above that the i=i idiom should die. With the _Pragma, one can choose to ignore GCC warnings if one doesn't want to initialize the value.

The trap value would be an additional improvement, but someone needs to implement it. Clang has -fsanitize=undefined-trap: http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
[Bug fortran/56814] [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56814 --- Comment #6 from janus at gcc dot gnu.org 2013-04-17 11:41:21 UTC --- (In reply to comment #5) Alternative patch: In contrast to the patch in comment #3, this one regtests cleanly ...
[Bug rtl-optimization/56921] [4.9 Regression] ICE in rtx_cost called by doloop_optimize_loops for PPC
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56921

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |RESOLVED
         Resolution|                              |FIXED

--- Comment #15 from Richard Biener rguenth at gcc dot gnu.org 2013-04-17 12:02:05 UTC ---
Author: rguenth
Date: Wed Apr 17 12:01:46 2013
New Revision: 198025

URL: http://gcc.gnu.org/viewcvs?rev=198025&root=gcc&view=rev
Log:
2013-04-17  Richard Biener  rguent...@suse.de

    PR rtl-optimization/56921
    * cfgloop.h (struct loop): Add simple_loop_desc member.
    (struct niter_desc): Mark with GTY(()).
    (simple_loop_desc): Do not use aux field but simple_loop_desc.
    * loop-iv.c (get_simple_loop_desc): Likewise.
    (free_simple_loop_desc): Likewise.

    Revert
    2013-04-16  Richard Biener  rguent...@suse.de

    PR rtl-optimization/56921
    * loop-init.c (pass_rtl_move_loop_invariants): Add
    TODO_do_not_ggc_collect to todo_flags_finish.
    (pass_rtl_unswitch): Same.
    (pass_rtl_unroll_and_peel_loops): Same.
    (pass_rtl_doloop): Same.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgloop.h
    trunk/gcc/loop-init.c
    trunk/gcc/loop-iv.c
[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296 --- Comment #23 from Vincent Lefèvre vincent-gcc at vinc17 dot net 2013-04-17 12:24:56 UTC --- (In reply to comment #21) When an unrecognized warning option is requested (e.g., -Wunknown-warning), GCC will emit a diagnostic stating that the option is not recognized. However, if the -Wno- form is used, the behavior is slightly different: No diagnostic will be produced for -Wno-unknown-warning unless other diagnostics are being produced. That was mainly for pre-4.7 GCC versions, where without the i=i idiom, one would get the usual may be used uninitialized in this function warning because -Wno-maybe-uninitialized is not supported, but also the unrecognized command line option -Wno-maybe-uninitialized warning because there was already a warning. However this may not really be important.
[Bug middle-end/36296] bogus uninitialized warning (loop representation, VRP missed-optimization)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36296 --- Comment #24 from Vincent Lefèvre vincent-gcc at vinc17 dot net 2013-04-17 12:34:40 UTC --- BTW, since with the latest GCC versions (such as Debian's GCC 4.7.2), the warning is no longer issued with -Wno-maybe-uninitialized, perhaps the bug severity could be lowered to enhancement.
[Bug web/45688] Typo in __attribute__((version-id)) docs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45688

Manuel López-Ibáñez manu at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |RESOLVED
                 CC|                              |manu at gcc dot gnu.org
         Resolution|                              |FIXED

--- Comment #3 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 13:04:42 UTC ---
So FIXED. Thanks!
[Bug target/56948] PPC V2DI ICE when loading zero into GPRs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56948

David Edelsohn dje at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                      |RESOLVED
         Resolution|                              |FIXED

--- Comment #2 from David Edelsohn dje at gcc dot gnu.org 2013-04-17 13:27:24 UTC ---
Patch applied to trunk and GCC 4.8 branch.
[Bug web/45688] Typo in __attribute__((version-id)) docs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45688 --- Comment #4 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 13:30:35 UTC --- Actually, the bug was version level functioning. Since it is obvious, I fixed it. http://gcc.gnu.org/r198028
[Bug fortran/40958] module files too large
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40958

Tobias Burnus burnus at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |burnus at gcc dot gnu.org

--- Comment #9 from Tobias Burnus burnus at gcc dot gnu.org 2013-04-17 13:50:31 UTC ---
Author: jb
Date: Tue Mar 26 22:08:17 2013
New Revision: 197124

URL: http://gcc.gnu.org/viewcvs?rev=197124&root=gcc&view=rev
Log:
PR 25708 Use a temporary buffer when parsing module files.

2013-03-27  Janne Blomqvist  j...@gcc.gnu.org

    PR fortran/25708
    * module.c (module_locus): Use long for position.
    (module_content): New variable.
    (module_pos): Likewise.
    (prev_character): Remove.
    (bad_module): Free data instead of closing mod file.
    (set_module_locus): Use module_pos.
    (get_module_locus): Likewise.
    (module_char): use buffer rather than stdio file.
    (module_unget_char): Likewise.
    (read_module_to_tmpbuf): New function.
    (gfc_use_module): Call read_module_to_tmpbuf.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/module.c
[Bug fortran/40958] module files too large
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40958

--- Comment #10 from Tobias Burnus burnus at gcc dot gnu.org 2013-04-17 13:50:58 UTC ---
Author: jb
Date: Wed Apr 17 10:19:40 2013
New Revision: 198023

URL: http://gcc.gnu.org/viewcvs?rev=198023&root=gcc&view=rev
Log:
PR 40958 Compress module files with zlib.

frontend ChangeLog:

2013-04-17  Janne Blomqvist  j...@gcc.gnu.org

    PR fortran/40958
    * scanner.h: New file.
    * Make-lang.in: Dependencies on scanner.h.
    * scanner.c (gfc_directorylist): Move to scanner.h.
    * module.c: Don't include md5.h, include scanner.h and zlib.h.
    (MOD_VERSION): Add comment about backwards compatibility.
    (module_fp): Change type to gzFile.
    (ctx): Remove.
    (gzopen_included_file_1): New function.
    (gzopen_included_file): New function.
    (gzopen_intrinsic_module): New function.
    (write_char): Use gzputc.
    (read_crc32_from_module_file): New function.
    (read_md5_from_module_file): Remove.
    (gfc_dump_module): Use gz* functions instead of stdio, check gzip
    crc32 instead of md5.
    (read_module_to_tmpbuf): Use gz* functions instead of stdio.
    (gfc_use_module): Use gz* functions.

testsuite ChangeLog:

2013-04-17  Janne Blomqvist  j...@gcc.gnu.org

    PR fortran/40958
    * lib/gcc-dg.exp (scan-module): Uncompress module file before
    scanning.
    * gfortran.dg/module_md5_1.f90: Remove.

Added:
    trunk/gcc/fortran/scanner.h
Removed:
    trunk/gcc/testsuite/gfortran.dg/module_md5_1.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/Make-lang.in
    trunk/gcc/fortran/module.c
    trunk/gcc/fortran/scanner.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/gcc-dg.exp
[Bug c++/54320] [c++11] range access to VLA
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54320

Paolo Carlini paolo.carlini at oracle dot com changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |NEW
   Last reconfirmed|                              |2013-04-17
     Ever Confirmed|0                             |1

--- Comment #8 from Paolo Carlini paolo.carlini at oracle dot com 2013-04-17 14:11:54 UTC ---
This is now covered (allowed) in N3497.
[Bug c++/54320] [c++11] range access to VLA
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54320 --- Comment #9 from Paolo Carlini paolo.carlini at oracle dot com 2013-04-17 14:16:40 UTC --- Sorry, the most recent paper in the series is actually N3639.
[Bug c++/55149] capturing VLA in lambda
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55149 --- Comment #5 from Paolo Carlini paolo.carlini at oracle dot com 2013-04-17 14:19:07 UTC --- Likewise capturing VLAs is covered in N3639 (only capture by reference allowed)
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981

--- Comment #5 from Tobias Burnus burnus at gcc dot gnu.org 2013-04-17 14:50:16 UTC ---
(In reply to comment #4)
> The reason why gfortran is slow here is that for non-regular files we use
> unbuffered I/O. If you write to a regular file instead of /dev/null, you'll see
> us doing ~8 KB writes at a time.
>
> The reason for this is that non-regular files (a.k.a. special files) are
> special in many ways wrt seeking. Some allow seeking just fine, some always
> return 0, some return an error (and which special files behave in which way is
> to some extent different on different OS'es).

I do not understand the argument regarding seek. If seek doesn't work - why should there be a problem with buffering but not without? At least with SEQUENTIAL one cannot do without (buffer exceeded or no buffering) and with STREAM no seek should be required.

> Also, for special files users often expect non-buffered IO, e.g. they want
> output on the terminal directly instead of waiting until the 8 KB buffer fills
> up, programs communicating via pipes can deadlock if data sits in the buffers,
> etc.

But the code should be able to wait until a complete record has been written? That should be rather quick, unless one writes a 2GB array. I am not talking about flushing the data only when 8kB are filled or when the file is closed. And doing buffering within a record avoids seeks.

> One could of course make unbuffered I/O in gfortran really mean flush the
> buffer at the end of each I/O statement rather than not using a buffer at all.

We should consider this.

* * *

I have now updated timings with writing to a file. Results for the example in comment 0, but writing to a file (test.dat, tmpfs). Unformatted is much faster with a normal file, but some other compilers are still significantly faster. And for formatted, all other compilers are significantly faster.
Timing in sec
Unformatted   Formatted
real / user   real / user   Compiler
-----------   -----------   --------
0.378/0.352   2.815/2.804   GCC 4.8.0 (-Ofast, 20130308, Rev. 196547)
0.307/0.296   1.303/1.288   g95 4.0.3 (g95 0.93!) Aug 17 2010 (-O3)
0.210/0.196   0.555/0.532   Sun Fortran 95 8.3 Linux_i386 2007/05/03
0.208/0.184   0.920/0.888   PathScale 3.2.99
0.176/0.152   2.185/2.168   NAGWare Fortran 5.1
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.127/0.125   1.091/1.080   GCC 4.9 (trunk, -Ofast)
0.120/0.118   0.465/0.459   g95 4.0.3 (g95 0.94!) Dec 17 2012
0.136/0.131   0.527/0.524   PathScale EKOPath 4.9.0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.335/0.316   2.866/2.860   GCC 4.7.2 20120920 (Cray Inc.)
0.204/0.188   0.659/0.628   Cray Fortran : Version 8.1.6
0.881/0.328   1.281/0.672   Intel 64, Version 13.1.1.163
0.444/0.432   0.884/0.864   pgf90 12.10-0
-----------   -----------   --------
[Bug c/35649] Incorrect printf warning: expect double has float
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35649

Trevor Morgan trevmrgn+bug at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|h8300-elf                   |h8300-elf, rx-elf, avr

--- Comment #12 from Trevor Morgan trevmrgn+bug at gmail dot com 2013-04-17 15:22:48 UTC ---
printf( "%f", 2.0D );

will also produce the erroneous warning (tried on rx-elf)
[Bug translation/56986] New: config/epiphany/epiphany.opt:108: floatig
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56986

             Bug #: 56986
           Summary: config/epiphany/epiphany.opt:108: floatig
    Classification: Unclassified
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: translation
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: sti...@antcom.de

Translatable string:
config/epiphany/epiphany.opt:108: floatig -> floating?
[Bug translation/56987] New: gcc/config/avr/avr.opt:80: change - changed?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56987

             Bug #: 56987
           Summary: gcc/config/avr/avr.opt:80: change -> changed?
    Classification: Unclassified
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: translation
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: sti...@antcom.de

Translatable string:
gcc/config/avr/avr.opt:80: Warn if the address space of an address is change. -> changed?
[Bug middle-end/10474] shrink wrapping for functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474

Martin Jambor jamborm at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org
          Component|tree-optimization           |middle-end

--- Comment #11 from Martin Jambor jamborm at gcc dot gnu.org 2013-04-17 15:52:44 UTC ---
I've submitted a patch that actually makes shrink wrapping happen, at least on x86_64. It would be great if someone checked whether it helps on other platforms:

http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01033.html

(I'm also changing the component to middle-end as this is hardly a tree-optimization matter.)
[Bug debug/53453] darwin linker expects both AT_name and AT_comp_dir debug notes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53453

m...@gcc.gnu.org mrs at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.7.4

--- Comment #18 from mrs at gcc dot gnu.org mrs at gcc dot gnu.org 2013-04-17 15:55:25 UTC ---
Fixed the ChangeLog, thanks for spotting it.
[Bug middle-end/42371] dead code not eliminated during folding with whole-program
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42371

Martin Jambor jamborm at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01032.html
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #16 from Martin Jambor jamborm at gcc dot gnu.org 2013-04-17 15:58:17 UTC ---
I have submitted a patch to address this issue:

http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01032.html
[Bug tree-optimization/56718] Early inlining prevents type based devirtualization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56718

Martin Jambor jamborm at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01034.html

--- Comment #1 from Martin Jambor jamborm at gcc dot gnu.org 2013-04-17 16:03:01 UTC ---
I have submitted a patch to address this issue:

http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01034.html
[Bug fortran/56814] [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56814

--- Comment #7 from janus at gcc dot gnu.org 2013-04-17 16:15:06 UTC ---
Fixed on trunk with:

Author: janus
Date: Wed Apr 17 16:13:07 2013
New Revision: 198032

URL: http://gcc.gnu.org/viewcvs?rev=198032&root=gcc&view=rev
Log:
2013-04-17  Janus Weil  <ja...@gcc.gnu.org>

    PR fortran/56814
    * interface.c (check_result_characteristics): Get result from interface
    if present.

2013-04-17  Janus Weil  <ja...@gcc.gnu.org>

    PR fortran/56814
    * gfortran.dg/proc_ptr_42.f90: New.

Added:
    trunk/gcc/testsuite/gfortran.dg/proc_ptr_42.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/interface.c
    trunk/gcc/testsuite/ChangeLog

Will backport to 4.8 soon.
[Bug debug/53453] darwin linker expects both AT_name and AT_comp_dir debug notes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53453 --- Comment #19 from mrs at gcc dot gnu.org mrs at gcc dot gnu.org 2013-04-17 16:21:39 UTC --- I've sent a message to the gcc-patches list with a pointer to the gcc-patches list for the work.
[Bug middle-end/56988] New: ipa-cp incorrectly propagates a field of an aggregate
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56988

             Bug #: 56988
           Summary: ipa-cp incorrectly propagates a field of an aggregate
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: era...@google.com

Created attachment 29890
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29890
Reduced test case

$ trunk_g++ --version
trunk_g++ (GCC) 4.9.0 20130416 (experimental)

$ trunk_g++ -S -O2 -std=c++11 -fno-exceptions upstream_test_case.ii && grep "mov.* _ZTVN12_GLOBAL__N_18RCTesterE" upstream_test_case.s
movq    %rax, _ZTVN12_GLOBAL__N_18RCTesterE+24(%rip)

The generated assembly attempts to write into RCTester class's vtable.

From the dump generated by -fdump-ipa-whole-program-all (just before ipa-cp), the caller has the following code:

  # .MEM_11 = VDEF <.MEM_10>
  obj_3->D.2045._vptr.ReferenceCountedD.2013 = MEM[(voidD.45 *)_ZTVN12_GLOBAL__N_18RCTesterED.2049 + 16B];
  # .MEM_12 = VDEF <.MEM_11>
  obj_3->destructed_D.2025 = 0B;
  # .MEM_13 = VDEF <.MEM_12>
  obj_3->owner_D.2026 = 0B;
  # .MEM_5 = VDEF <.MEM_13>
  # USE = nonlocal null { D.2015 D.2049 } (glob)
  # CLB = nonlocal null { D.2015 D.2049 } (glob)
  _ZN12_GLOBAL__N_19TestResetEPNS_8RCTesterED.2068 (obj_3);

At the callee, we see:

void {anonymous}::TestReset({anonymous}::RCTester*) (struct RCTesterD.2017 * objD.2067)
{
  const struct AssertionResultD.1962 gtest_arD.2071;
  boolD.1899 destructedD.2070;
  struct RCTesterD.2017 * obj.3D.2179;

  # .MEM_2 = VDEF <.MEM_1(D)>
  destructedD.2070 = 0;
  # VUSE <.MEM_2>
  # PT = nonlocal escaped
  obj.3_3 = objD.2067;
  # .MEM_8 = VDEF <.MEM_2>
  MEM[(boolD.1899 * *)obj.3_3 + 8B] = destructedD.2070;

ipa-cp mistakenly thinks that the move statement obj.3_3 = objD.2067; actually loads from offset 0 of objD.2067 and hence propagates MEM[(voidD.45 *)_ZTVN12_GLOBAL__N_18RCTesterED.2049 + 16B] into obj.3_3 which then subsequently gets propagated to the store of destructedD.2070.
The following patch fixes this, but I am not sure whether it is too restrictive:

Index: gcc/ipa-prop.c
===================================================================
--- gcc/ipa-prop.c	(revision 197495)
+++ gcc/ipa-prop.c	(working copy)
@@ -3892,7 +3892,7 @@ ipcp_transform_function (struct cgraph_node *node)
       {
         struct ipa_agg_replacement_value *v;
         gimple stmt = gsi_stmt (gsi);
-        tree rhs, val, t;
+        tree rhs, lhs, val, t;
         HOST_WIDE_INT offset;
         int index;
         bool by_ref, vce;
@@ -3900,6 +3900,7 @@ ipcp_transform_function (struct cgraph_node *node)
         if (!gimple_assign_load_p (stmt))
           continue;
         rhs = gimple_assign_rhs1 (stmt);
+        lhs = gimple_assign_lhs (stmt);
         if (!is_gimple_reg_type (TREE_TYPE (rhs)))
           continue;
 
@@ -3924,7 +3925,8 @@ ipcp_transform_function (struct cgraph_node *node)
           continue;
         for (v = aggval; v; v = v->next)
           if (v->index == index
-              && v->offset == offset)
+              && v->offset == offset
+              && TREE_TYPE (v->value) == TREE_TYPE (lhs))
             break;
         if (!v)
           continue;
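The idea behind the extra condition can be sketched outside of GCC: a known aggregate value is only used as a replacement when the parameter index, the offset and the type all agree with the load being replaced. The sketch below is plain C with hypothetical stand-in types (`val_type`, `agg_value`, `find_replacement`); it does not use GCC's `tree` or `ipa_agg_replacement_value` and only mirrors the shape of the loop in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for TREE_TYPE; not GCC's representation. */
enum val_type { TYPE_VOID_PTR, TYPE_BOOL_PTR, TYPE_INT };

/* Hypothetical stand-in for one known aggregate value. */
struct agg_value {
    int index;               /* which parameter the aggregate belongs to */
    long offset;             /* offset of the value inside the aggregate */
    enum val_type type;      /* type of the known value */
    struct agg_value *next;  /* next known value, or NULL */
};

/* Return the first known value matching index, offset and the type of
   the left-hand side of the load, or NULL when none matches. Dropping
   the type comparison gives the lookup as it was before the patch. */
static const struct agg_value *
find_replacement(const struct agg_value *list, int index, long offset,
                 enum val_type lhs_type)
{
    for (const struct agg_value *v = list; v; v = v->next)
        if (v->index == index
            && v->offset == offset
            && v->type == lhs_type)
            return v;
    return NULL;
}
```

In the reported case, a lookup at offset 0 for a value whose type differs from the known vtable pointer would return NULL under this scheme, so no replacement would happen.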
[Bug ada/40986] [4.6 regression] Assert_Failure sinfo.adb:360, error detected at a-unccon.ads:23:27
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40986

Ludovic Brenta ludo...@ludovic-brenta.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
      Known to work|4.7.2                       |
         Resolution|FIXED                       |
      Known to fail|                            |4.7.2

--- Comment #16 from Ludovic Brenta ludo...@ludovic-brenta.org 2013-04-17 18:30:40 UTC ---
gcc-4.7 -c -I./ -gnato -gnatwl -gnatwauJF -gnatef -g -fno-strict-aliasing -gnatwA -I- ./test.adb
+===========================GNAT BUG DETECTED==============================+
| 4.7.2 (x86_64-linux-gnu) Assert_Failure sinfo.adb:388                    |
| Error detected at a-unccon.ads:23:27                                     |

Thanks Markus for noticing the interference of gnatchop. I made the mistake of gnatchopping the reproducer, which hid the problem.
[Bug target/56866] gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles glibc-2.17/crypt/sha512.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866

--- Comment #9 from Winfried Magerl winfried.mag...@t-online.de 2013-04-17 18:41:06 UTC ---
Hi,

at least one confirmation. I've done some further checks on the float errors in glibc and found that FMA/FMA4 are the extensions responsible for the additional float errors.

How to proceed?

From my point of view, and compared with '-march=amdfam10', the extensions XOP/FMA4/FMA are responsible for the failed tests. Disabling them in gcc-4.8-noxop/gcc/config/i386/i386.c brings me back to the same test results I'm seeing with amdfam10 (excluding all sorts of scan-*-errors).

I would propose the following patch for bdver2 support, because features which are untested and known to break code (for example, all the additional test errors in the gcc testsuite) should be disabled:

--- gcc-4.8-noxop/gcc/config/i386/i386.c.orig	2013-04-12 20:49:09.181351855 +0200
+++ gcc-4.8-noxop/gcc/config/i386/i386.c	2013-04-12 23:15:09.112185980 +0200
@@ -2976,9 +2976,9 @@
       {"bdver2", PROCESSOR_BDVER2, CPU_BDVER2,
         PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
         | PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
-        | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
-        | PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
-        | PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
+        | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+        | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
+        | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE},
       {"bdver3", PROCESSOR_BDVER3, CPU_BDVER3,
         PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
         | PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1

This is just an example, because the features should be disabled in bdver1/3 too (XOP/FMA4/FMA are only available in bdver1/2/3).

Maybe add the gcc developers from @amd.com?

regards

winfried
[Bug c/56989] New: wrong location in error message
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56989

             Bug #: 56989
           Summary: wrong location in error message
    Classification: Unclassified
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: tro...@gcc.gnu.org

Consider this program:

extern void voidf(void);
extern int intf(void);

int check(void)
{
  if (voidf() < 0
      || intf() < 0)
    return -1;
  return 0;
}

I compiled it with a recent git gcc and got:

barimba. gcc --syntax-only qq.c
qq.c: In function ‘check’:
qq.c:7:7: error: void value not ignored as it ought to be
       || intf() < 0)
       ^

I think the error message would be more helpful if it pointed to the call to voidf.
[Bug sanitizer/56990] New: ICE: SIGFPE with -fsanitize=thread and empty struct
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56990

             Bug #: 56990
           Summary: ICE: SIGFPE with -fsanitize=thread and empty struct
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: sanitizer
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: zso...@seznam.cz
                CC: do...@gcc.gnu.org, dvyu...@gcc.gnu.org,
                    ja...@gcc.gnu.org, k...@gcc.gnu.org

Created attachment 29891
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29891
reduced testcase

Compiler output:
$ gcc -fsanitize=thread testcase.c
testcase.c: In function 'foo':
testcase.c:3:6: internal compiler error: Floating point exception
 void foo(struct S *p)
      ^
0xa24dbf crash_signal
        /mnt/svn/gcc-trunk/gcc/toplev.c:332
0xa3af12 instrument_expr
        /mnt/svn/gcc-trunk/gcc/tsan.c:134
0xa3c406 instrument_gimple
        /mnt/svn/gcc-trunk/gcc/tsan.c:612
0xa3c406 instrument_memory_accesses
        /mnt/svn/gcc-trunk/gcc/tsan.c:635
0xa3c406 tsan_pass
        /mnt/svn/gcc-trunk/gcc/tsan.c:700
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.

Program received signal SIGFPE, Arithmetic exception.
0x00a3af12 in instrument_expr (gsi=..., expr=0x76d9f000, is_write=is_write@entry=true) at /mnt/svn/gcc-trunk/gcc/tsan.c:134
134       if (bitpos % (size * BITS_PER_UNIT)

Tested revisions:
r198018 - crash
4.8 r196898 - crash
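The faulting line takes `bitpos` modulo `size * BITS_PER_UNIT`, so a plausible reading (an assumption, not confirmed in the report) is that the empty struct has size 0 and the modulus divides by zero. A small C sketch of that arithmetic with an explicit guard follows. `misaligned_access` is a hypothetical helper, `CHAR_BIT` stands in for GCC's `BITS_PER_UNIT`, and this is not the fix that went into tsan.c.

```c
#include <limits.h>

/* Classify an access of size_bytes bytes starting at bit offset bitpos.
   Returns -1 for a zero-sized access (nothing to instrument, and the
   modulus below would divide by zero), 1 if bitpos is not a multiple of
   the access size in bits, and 0 if it is aligned. */
static int misaligned_access(unsigned long bitpos, unsigned long size_bytes)
{
    if (size_bytes == 0)
        return -1;
    return bitpos % (size_bytes * CHAR_BIT) != 0;
}
```

Checking the size before the modulus keeps the zero-sized case out of the division entirely, whatever the caller then decides to do with such an access.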
[Bug fortran/40958] module files too large
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40958 --- Comment #11 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch 2013-04-17 19:36:45 UTC --- With these patches in, parallel compilation of multi-file cp2k becomes significantly faster. Time for a full build goes from 70s to 50s. I think that in a parallel build the IO bottleneck (bandwidth) was significant, while this is now much improved. The effect will likely be even larger on mounted filesystems.
[Bug ada/56909] [4.8 regression] s-atopri.adb:multiple undefined references on mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56909

Arthur Zhang mail2arthur at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |NEW
         Resolution|WONTFIX                     |

--- Comment #12 from Arthur Zhang mail2arthur at gmail dot com 2013-04-17 19:44:34 UTC ---
I can build successfully with either '--with-arch=i686 --build=mingw32' or '--build=i686-pc-mingw32', but as I mentioned in comment 10, changing the build target causes a packaging issue.

What is the benefit of using '--build=i686-pc-mingw32' rather than '--with-arch=i686'?

Thanks.
[Bug c++/56991] New: constexpr std::initializer_list crashes on too complex initialization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56991

             Bug #: 56991
           Summary: constexpr std::initializer_list crashes on too complex initialization
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: morwen...@hotmail.fr

I found some strange behaviour that, after a discussion on StackOverflow, seems to be a bug (discussion here: http://stackoverflow.com/questions/16057690/confusion-about-constant-expressions/16068953?noredirect=1#16068953).

It seems that GCC implements N3471, which means that every function of std::initializer_list is constexpr. When passing simple constexpr values to the initializer_list, it works fine:

#include <array>
#include <initializer_list>

int main()
{
    constexpr std::array<int, 3> a = {{ 1, 2, 3 }};
    constexpr int a0 = a[0];
    constexpr int a1 = a[1];
    constexpr int a2 = a[2];
    constexpr std::initializer_list<int> b = { a0, a1, a2 };

    return 0;
}

However, without the intermediate variables a0, a1 and a2, the example above fails to compile:

#include <array>
#include <initializer_list>

int main()
{
    constexpr std::array<int, 3> a = {{ 1, 2, 3 }};
    constexpr std::initializer_list<int> b = { a[0], a[1], a[2] };

    return 0;
}

The error is the following one:

error: 'const std::initializer_list<int>{((const int*)(anonymous)), 3u}' is not a constant expression

This last example works fine if I remove the constexpr qualifier at the beginning of the line or if I replace the initializer_list by a std::array. It seems that the bug is only triggered when using std::initializer_list with constexpr.
[Bug target/56866] gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles glibc-2.17/crypt/sha512.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866

--- Comment #10 from Mikael Pettersson mikpe at it dot uu.se 2013-04-17 20:15:47 UTC ---
(In reply to comment #9)
> How to proceed?

Derive a stand-alone test case from the failing glibc module and whatever glibc code it requires, then minimize it.
[Bug ada/56909] [4.8 regression] s-atopri.adb:multiple undefined references on mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56909

Eric Botcazou ebotcazou at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

--- Comment #13 from Eric Botcazou ebotcazou at gcc dot gnu.org 2013-04-17 20:41:54 UTC ---
> I can build successfully with either '--with-arch=i686 --build=mingw32' or
> '--build=i686-pc-mingw32', but as I mentioned in comment 10, change build
> target cause packaging issue.

Too bad, but I don't think this will ultimately change the decision, as i686-pc-mingw32 is the standard triplet for Windows these days.

> What is the benefit to use '--build=i686-pc-mingw32' than '--with-arch=i686'?

It doesn't force -march=i686 by default.
[Bug ada/56909] [4.8 regression] s-atopri.adb:multiple undefined references on mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56909

--- Comment #14 from Arthur Zhang mail2arthur at gmail dot com 2013-04-17 21:02:14 UTC ---
(In reply to comment #13)
> > What is the benefit to use '--build=i686-pc-mingw32' than '--with-arch=i686'?
>
> It doesn't force -march=i686 by default.

Why does the output below have '-march=pentiumpro'?

bash-3.1$ gcc -v -o t.exe ./test.c
Using built-in specs.
COLLECT_GCC=c:\MinGW\bin\gcc.exe
COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/i686-pc-mingw32/4.8.0/lto-wrapper.exe
Target: i686-pc-mingw32
Configured with: ../gcc-4.8.0/configure --enable-languages=c,c++,ada,fortran,objc,obj-c++ --disable-sjlj-exceptions --with-dwarf2 --enable-shared --enable-libgomp --disable-win32-registry --enable-libstdcxx-debug --enable-version-specific-runtime-libs --build=i686-pc-mingw32 --prefix=/mingw
Thread model: win32
gcc version 4.8.0 (GCC)
COLLECT_GCC_OPTIONS='-v' '-o' 't.exe' '-mtune=generic' '-march=pentiumpro'
...
[Bug target/56866] gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles glibc-2.17/crypt/sha512.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866

--- Comment #11 from Winfried Magerl winfried.mag...@t-online.de 2013-04-17 21:02:38 UTC ---
Hi Mike,

On Wed, Apr 17, 2013 at 08:15:47PM +, mikpe at it dot uu.se wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866
>
> --- Comment #10 from Mikael Pettersson mikpe at it dot uu.se 2013-04-17 20:15:47 UTC ---
> (In reply to comment #9)
> > How to proceed?
>
> Derive a stand-alone test case from the failing glibc module and whatever glibc
> code it requires, then minimize it.

If fixing gcc's broken XOP/FMA/FMA4 extensions on AMD CPUs depends on my ability to extract a stand-alone test from the glibc testsuite, then I'm really sorry for not having the necessary skills (as already stated).

Why not simply use the failing test cases from the gcc testsuite, which are all standalone and depend on XOP:

+FAIL: gcc.c-torture/execute/pr51581-1.c execution, -O3 -fomit-frame-pointer
+FAIL: gcc.c-torture/execute/pr51581-1.c execution, -O3 -fomit-frame-pointer -funroll-loops
+FAIL: gcc.c-torture/execute/pr51581-1.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
+FAIL: gcc.c-torture/execute/pr51581-1.c execution, -O3 -g
+FAIL: gcc.c-torture/execute/pr51581-2.c execution, -O3 -fomit-frame-pointer
+FAIL: gcc.c-torture/execute/pr51581-2.c execution, -O3 -fomit-frame-pointer -funroll-loops
+FAIL: gcc.c-torture/execute/pr51581-2.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
+FAIL: gcc.c-torture/execute/pr51581-2.c execution, -O3 -g
+FAIL: gcc.c-torture/execute/pr53645.c execution, -O1
+FAIL: gcc.c-torture/execute/pr53645.c execution, -O2
+FAIL: gcc.c-torture/execute/pr53645.c execution, -O3 -fomit-frame-pointer
+FAIL: gcc.c-torture/execute/pr53645.c execution, -O3 -fomit-frame-pointer -funroll-loops
+FAIL: gcc.c-torture/execute/pr53645.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
+FAIL: gcc.c-torture/execute/pr53645.c execution, -O3 -g
+FAIL: gcc.c-torture/execute/pr53645.c execution, -Os
+FAIL: gcc.c-torture/execute/pr53645.c execution, -Og -g
+FAIL: gcc.c-torture/execute/pr53645.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
+FAIL: gcc.c-torture/execute/pr53645.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
+FAIL: gcc.dg/vect/pr51581-1.c execution test
+FAIL: gcc.dg/vect/pr51581-2.c execution test
+FAIL: gcc.dg/vect/pr51581-3.c execution test
+FAIL: gcc.dg/vect/pr51581-1.c -flto execution test
+FAIL: gcc.dg/vect/pr51581-2.c -flto execution test
+FAIL: gcc.dg/vect/pr51581-3.c -flto execution test
+FAIL: gcc.target/i386/avx-mul-1.c execution test
+FAIL: gcc.target/i386/avx-pr51581-1.c execution test
+FAIL: gcc.target/i386/avx-pr51581-2.c execution test
+FAIL: gcc.target/i386/sse2-mul-1.c execution test
+FAIL: gcc.target/i386/sse4_1-mul-1.c execution test

Or is this a formal problem, because the subject does not really match the whole problem, which looks like a more general problem with extensions specific to bdver1/2/3 (and is therefore not reproducible on other CPUs)?

regards

winfried
[Bug c/56989] wrong location in error message
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56989

Manuel López-Ibáñez manu at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-04-17
                 CC|                            |manu at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #1 from Manuel López-Ibáñez manu at gcc dot gnu.org 2013-04-17 21:08:52 UTC ---
Index: c-typeck.c
===================================================================
--- c-typeck.c	(revision 198021)
+++ c-typeck.c	(working copy)
@@ -1981,11 +1981,12 @@ default_conversion (tree exp)
   if (TREE_NO_WARNING (orig_exp))
     TREE_NO_WARNING (exp) = 1;
 
   if (code == VOID_TYPE)
     {
-      error ("void value not ignored as it ought to be");
+      error_at (EXPR_LOC_OR_HERE (exp),
+                "void value not ignored as it ought to be");
       return error_mark_node;
     }
 
   exp = require_complete_type (exp);
   if (exp == error_mark_node)

/home/manuel/void.c:6:12: error: void value not ignored as it ought to be
   if (voidf() < 0
            ^

The location could be even better, but that is what the c-parser records.

I like Clang's diagnostic much more:

/home/manuel/void.c:6:15: error: invalid operands to binary expression ('void' and 'int')
  if (voidf() < 0
      ~~~~~~~ ^ ~

It is similar to what g++ produces:

/home/manuel/void.c:6:17: error: invalid operands of types ‘void’ and ‘int’ to binary ‘operator<’
   if (voidf() < 0
                 ^

but with better locations.
[Bug ada/56909] [4.8 regression] s-atopri.adb:multiple undefined references on mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56909

--- Comment #15 from Eric Botcazou ebotcazou at gcc dot gnu.org 2013-04-17 21:35:54 UTC ---
> Why below output has '-march=pentiumpro'?

I think it's the autodetected arch, but maybe I'm confused. Never mind.
[Bug target/56866] gcc 4.7.x/gcc-4.8.x with '-O3 -march=bdver2' misscompiles glibc-2.17/crypt/sha512.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56866

--- Comment #12 from Marc Glisse glisse at gcc dot gnu.org 2013-04-17 21:49:15 UTC ---
(In reply to comment #11)
> If fixing broken gcc's XOP/FMA/FMA4-extensions on AMD-CPUs depends on my
> ability to extract a stand-alone-test from glibc-testsuite then I'm realy
> sorry for not having the necessary skills (as already stated).

Skills can be learned, and the best way is through practice. Ideally someone with the right combination of knowledge, hardware and free time would look at it, and you seem to be the closest currently ;-)

> Why not simply using the failing test-cases from gcc-testsuite which
> are all standalone and depends on XOP:

Good idea. I suggest you pick a simple one:

> +FAIL: gcc.target/i386/sse2-mul-1.c execution test

It looks like a list of several tests in a row. If you can first replace the aborts with printf to determine the first one that fails, then remove everything after that point, you have already narrowed the issue quite a bit. Then you can try to simplify what remains. Ideally, you would get a program small enough that posting the dumps would show the obvious issue. Do make sure while reducing the program that it still works correctly without the bdver2 option.
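The "replace the aborts with printf" step can be sketched as follows. This is only an illustration of the reduction technique: `CHECK`, `first_failure` and `run_checks` are hypothetical names, and the multiply loop is a stand-in, not the contents of gcc.target/i386/sse2-mul-1.c.

```c
#include <stdio.h>

/* Identifier of the first failing check; 0 means none has failed. */
static int first_failure = 0;

/* Report a failed check instead of calling abort(), so the run
   continues and every failing check is listed in order. */
#define CHECK(id, cond)                                     \
    do {                                                    \
        if (!(cond)) {                                      \
            printf("check %d failed: %s\n", (id), #cond);   \
            if (first_failure == 0)                         \
                first_failure = (id);                       \
        }                                                   \
    } while (0)

/* Stand-in for a test that multiplies element-wise and compares with
   the expected results. Returns the 1-based number of the first
   failing element, or 0 if all elements match. */
static int run_checks(const int *a, const int *b, const int *expected, int n)
{
    first_failure = 0;
    for (int i = 0; i < n; i++)
        CHECK(i + 1, a[i] * b[i] == expected[i]);
    return first_failure;
}
```

Once the first failing check is known, everything after it can be deleted and the remaining code reduced further, re-running with and without the target option each time as suggested above.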
[Bug rtl-optimization/56847] [4.8/4.9 Regression] '-fpie' triggers - internal compiler error: in gen_add2_insn, at optabs.c:4705
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56847 --- Comment #8 from Han Shen shenhan at google dot com 2013-04-17 23:42:22 UTC --- Hi, any progress on this? Thanks!
[Bug c/56992] New: building Wine with -Og causes GCC to seg fault
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56992

             Bug #: 56992
           Summary: building Wine with -Og causes GCC to seg fault
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: jimpor...@gmail.com

GCC seg faults when building Wine with -Og. The Wine developers pointed me at http://gcc.gnu.org/wiki/A_guide_to_testcase_reduction which helped me reduce the problem down to the attached file (~14 lines).

$ gcc -Og -c testcase-min.c
testcase-min.c: In function ‘DnsHostnameToComputerNameA’:
testcase-min.c:13:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See https://bugs.archlinux.org/ for instructions.

I'm using gcc (GCC) 4.8.0 20130411 (prerelease) as supplied by ArchLinux.

My processor is an AMD Phenom(tm) II X4 955

For hints about how GCC was built, you can look for the configure line here:
https://projects.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/gcc-multilib
[Bug c/56992] building Wine with -Og causes GCC to seg fault
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56992 --- Comment #1 from James Eder jimportal at gmail dot com 2013-04-17 23:47:39 UTC --- Created attachment 29892 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29892 testcase-min.c
[Bug target/56993] New: power gcc built 416.gamess generates wrong result
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56993

             Bug #: 56993
           Summary: power gcc built 416.gamess generates wrong result
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: car...@google.com
              Host: powerpc-linux-gnu
            Target: powerpc-linux-gnu
             Build: powerpc-linux-gnu

When I use the trunk gcc to run spec2006 416.gamess, I get the following error:

$ runspec --config=test.cfg --tune=base --size=test --nofeedback --noreportable game
runspec v6152 - Copyright 1999-2008 Standard Performance Evaluation Corporation
Using 'linux-ydl23-ppc' tools
Reading MANIFEST... 18357 files
Loading runspec modules
Locating benchmarks...found 31 benchmarks in 6 benchsets.
Reading config file '/usr/local/google/carrot/spec2006/config/test.cfg'
Benchmarks selected: 416.gamess
Compiling Binaries
  Building 416.gamess base Linux64 default: (build_base_Linux64.)

Build successes: 416.gamess(base)

Setting Up Run Directories
  Setting up 416.gamess test base Linux64 default: created (run_base_test_Linux64.)
Running Benchmarks
  Running (#1) 416.gamess test base Linux64 default

Contents of exam29.err

STOP IN ABRT

*** Miscompare of exam29.out; for details see
    /usr/local/google/carrot/spec2006/benchspec/CPU2006/416.gamess/run/run_base_test_Linux64./exam29.out.mis
Invalid run; unable to continue.
If you wish to ignore errors please use '-I' or ignore_errors

The log for this run is in /usr/local/google/carrot/spec2006/result/CPU2006.111.log
The debug log for this run is in /usr/local/google/carrot/spec2006/result/CPU2006.111.log.debug

*
* Temporary files were NOT deleted; keeping temporaries such as
* /usr/local/google/carrot/spec2006/result/CPU2006.111.log.debug and
* /usr/local/google/carrot/spec2006/tmp/CPU2006.111
* (These may be large!)
*
runspec finished at Wed Apr 17 16:37:27 2013; 93 total seconds elapsed

My gcc is configured as

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc-linux-gnu/4.6/lto-wrapper
Target: powerpc-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-12' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --enable-secureplt --disable-softfloat --enable-targets=powerpc-linux,powerpc64-linux --with-cpu=default32 --with-long-double-128 --enable-checking=release --build=powerpc-linux-gnu --host=powerpc-linux-gnu --target=powerpc-linux-gnu
Thread model: posix
gcc version 4.6.2 (Debian 4.6.2-12)

GCC 4.8 has the same error, but GCC 4.7 is good.
[Bug tree-optimization/56982] [4.8/4.9 Regression] Bad optimization with setjmp()
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982

--- Comment #10 from Leif Leonhardy bugfeed at online dot de 2013-04-18 01:17:31 UTC ---
> One proposed requirement on setjmp is that it be usable like any other
> function, that is, that it be callable in *any* expression context, and that
> the expression evaluate correctly whether the return from setjmp is direct or
> via a call to longjmp. Unfortunately, any implementation of setjmp as a
> conventional called function cannot know enough about the calling environment
> to save any temporary registers or dynamic stack locations used part way
> through an expression evaluation. [...] The temporaries may be correct on the
> initial call to setjmp, but are not likely to be on any return initiated by a
> corresponding call to longjmp. These considerations dictated the constraint
> that setjmp be called only from within fairly simple expressions, ones not
> likely to need temporary storage.
>
> An alternative proposal considered by the C89 Committee was to require that
> implementations recognize that calling setjmp is a special case, and hence
> that they take whatever precautions are necessary to restore the setjmp
> environment properly upon a longjmp call. This proposal was rejected on
> grounds of consistency: implementations are currently allowed to implement
> library functions specially, but no other situations require special
> treatment.

So according to this (The C99 Rationale [1], page 139 ff., likewise the Single UNIX Specification), here setjmp() is simply used in an invalid context (i.e., in an assignment statement). ;-)

Still, with -Og at least, GCC 4.8.0 produces wrong code even if setjmp() is used in an allowed context (as in e.g. if (setjmp(...)0) ..., or switch (setjmp(...)) { ... }), and no matter whether n is declared volatile or not.

[1] http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf
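For reference, a minimal C sketch of one of the allowed contexts mentioned above: setjmp as the entire controlling expression of a switch, with the counter declared volatile because it is modified between the setjmp call and the longjmp. The function names `bounce` and `count_returns` are made up for this illustration; this is not the test case attached to the bug.

```c
#include <setjmp.h>

static jmp_buf env;

/* Jump back to the most recent setjmp(env) with the given value. */
static void bounce(int code)
{
    longjmp(env, code);
}

/* Returns how many times control came back through longjmp before
   the counter reached limit. */
static int count_returns(int limit)
{
    /* volatile: the object is changed after setjmp and read after
       longjmp, so its value would otherwise be indeterminate. */
    volatile int n = 0;

    switch (setjmp(env)) {  /* setjmp is the whole controlling expression */
    case 0:
        break;              /* direct return from setjmp */
    default:
        n++;                /* return via longjmp */
        break;
    }

    if (n < limit)
        bounce(1);
    return n;
}
```

A conforming compiler has to return `limit` from `count_returns` for any non-negative `limit`; a different result at some optimisation level would be the kind of wrong code the comment describes.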
[Bug fortran/56981] Slow I/O: Unformatted 5x slower, large sys component; formatted slow as well
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981 --- Comment #6 from Jerry DeLisle jvdelisle at gcc dot gnu.org 2013-04-18 01:21:42 UTC ---

I like Janne's idea with the flags.

Also, in some cases we know from the file name, at the time we open a file, that it is /dev/null or /dev/nul. In those few cases it would be very low overhead to disable some or all checks, or even to disable the writing completely. We would not catch all situations, but we could get the low-hanging fruit. It could be done by setting a NULL bit.

One could consider doing this at compile time in some cases, where the frontend could have more elaborate configuration checks that determine the name of the null device on the target system and look for its use. (Probably not really worth it for NULL I/O.)

The other idea to consider is a compiler flag, say -fast-IO or similar, that also disables the extra error checking that is not critical at runtime after a program has been debugged.
[Bug c/56682] -fsanitize documentation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56682 --- Comment #1 from Andrew Pinski pinskia at gcc dot gnu.org 2013-04-18 01:58:03 UTC ---

Regarding -fsanitize=thread: I think it requires -fPIE, but really it should not.
[Bug fortran/56994] New: Incorrect documentation for Fortran NEAREST intrinsic function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56994

Bug #: 56994
Summary: Incorrect documentation for Fortran NEAREST intrinsic function
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: trivial
Priority: P3
Component: fortran
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: spam.brian.tay...@gmail.com

The GNU Fortran documentation** for the Fortran intrinsic function NEAREST(X,S) says that S is an optional argument. It is not optional according to the Fortran standard. It is implemented correctly in gfortran, so this is only an error in the documentation.

** http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gfortran/NEAREST.html
Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output
On 2013-04-16 13:55, Jakub Jelinek wrote:
> On Tue, Apr 16, 2013 at 03:41:52PM +0400, Maksim Kuznetsov wrote:
>> Richard, Jeff, could you please have a look?
>
> I wonder if %{ and %} shouldn't be better handled in final.c for all
> #ifdef ASSEMBLER_DIALECT targets, rather than just for one specific target.

Yes, please.

r~
Re: [patch] Fix PR middle-end/56474
On Wed, Apr 17, 2013 at 1:12 AM, Eric Botcazou ebotca...@adacore.com wrote: For the C family I found exactly one - the layout_type case, and fixed it in the FEs by making empty arrays use [1, 0] domains or signed domains (I don't remember exactly). I believe the layout_type change was to make Ada happy. I'm skeptical, I had to narrow down the initial kludge because it hurted Ada. It may be that enabling overflow detection for even unsigned sizetype was because of Ada as well. After all only Ada changed its sizetype sign recently. Not true, overflow detection has _always_ been enabled for sizetypes. But sizetypes, including unsigned ones, were sign-extended so 0 -1 didn't overflow and we need that behavior back for Ada to work properly, Yeah, well - they were effectively signed. I don't like special casing 0 - 1 in a general compute function. Maybe you want to use size_diffop for the computation? That would result in a signed result and thus no overflow for 0 - 1. But it's not a general compute function, it's size_binop which is meant to be used for sizetypes only and which forces overflow on unsigned types. We need overflow detection for sizetypes but we can also tailor it to fit our needs. I'm not against tailoring it to fit our needs - I'm just against special casing behavior for specific values. That just sounds wrong. Maybe we should detect overflow as if the input and output were signed while computing an unsigned result. As far as I can see int_const_binop_1 does detect overflow as if operations were signed (it passes 'false' as uns to all double-int operations rather than TYPE_UNSIGNED). For example sub_with_overflow simply does neg_double (b.low, b.high, ret.low, ret.high); add_double (low, high, ret.low, ret.high, ret.low, ret.high); *overflow = OVERFLOW_SUM_SIGN (ret.high, b.high, high); which I believe is wrong. 
Shouldn't it be neg_double (b.low, b.high, ret.low, ret.high); HOST_WIDE_INT tem = ret.high; add_double (low, high, ret.low, ret.high, ret.low, ret.high); *overflow = OVERFLOW_SUM_SIGN (ret.high, tem, high); ? Because we are computing a + (-b) and thus OVERFLOW_SUM_SIGN expects the sign of a and -b, not a and b to verify against the sign of ret. The other option is to for example disable overflow handling for _all_ constants and MINUS_EXPR (and then please PLUS_EXPR as well) in size_binop. Maybe it's only the MULT_EXPR overflow we want to know (byte-to-bit conversion / element size scaling IIRC). Well, we just need 0 - 1 because of the way we compute size expressions for variable-sized arrays. I'm sceptical. Where do you compute the size expression for variable-sized arrays? I suppose with the testcase in the initial patch I can then inspect myself what actually happens? Thanks, Richard. -- Eric Botcazou
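To make the "a - b computed as a + (-b)" reasoning concrete, here is a small self-contained sketch on 64-bit integers. It is not GCC's double-int code: sum_sign_overflow, wrap_neg, wrap_add and checked_sub are names invented for this example, and the helper's argument order (first addend, second addend, sum) is this sketch's own choice and is not meant to mirror OVERFLOW_SUM_SIGN. It assumes a two's-complement target on which converting an out-of-range uint64_t to int64_t wraps, as GCC does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed addition overflowed iff both addends have the same sign and the
   sum's sign differs from it.  Argument order is this sketch's own. */
static bool sum_sign_overflow(int64_t x, int64_t y, int64_t sum)
{
    return (~(x ^ y) & (x ^ sum)) < 0;
}

/* Wrapping negate and add, done in unsigned arithmetic to avoid undefined
   behaviour; the conversion back to int64_t is assumed to wrap. */
static int64_t wrap_neg(int64_t x)
{
    return (int64_t)(0u - (uint64_t)x);
}

static int64_t wrap_add(int64_t x, int64_t y)
{
    return (int64_t)((uint64_t)x + (uint64_t)y);
}

/* Computes a - b as a + (-b) and reports signed overflow. */
int64_t checked_sub(int64_t a, int64_t b, bool *overflow)
{
    int64_t neg_b = wrap_neg(b);
    int64_t ret = wrap_add(a, neg_b);

    if (b == INT64_MIN)
        /* -b is not representable, so the sign of neg_b is misleading:
           a - INT64_MIN fits exactly when a is negative. */
        *overflow = (a >= 0);
    else
        /* The sign test looks at the two values actually added: a and -b. */
        *overflow = sum_sign_overflow(a, neg_b, ret);
    return ret;
}
```

With this sketch, 0 - 1 yields -1 with no overflow reported, while INT64_MAX - (-1) and INT64_MIN - 1 are both flagged.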
Re: [patch] simplify emit_delay_sequence
> This patch is also necessary for my new delay-slot scheduler to keep
> basic block boundaries correctly up-to-date. The emit-rtl API does
> that already.
>
> Cross-tested powerpc64 x mips. Currently running bootstrap&test on
> sparc64-unknown-linux-gnu. OK if it passes?

Yes, modulo

@@ -538,6 +502,8 @@ emit_delay_sequence (rtx insn, rtx list, int lengt
       INSN_LOCATION (seq_insn) = INSN_LOCATION (tem);
       INSN_LOCATION (tem) = 0;
 
+      /* Remove any REG_DEAD notes because we can't rely on them now
+         that the insn has been moved.  */
       for (note = REG_NOTES (tem); note; note = next)
         {
           next = XEXP (note, 1);

Did you mean to move the comment instead of duplicating it?

--
Eric Botcazou
Re: [patch] Fix ICE during RTL expansion at -O1
> > +      if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
> > +        goto may_overlap;
>
> ick, can TREE_CODE (type1) != RECORD_TYPE happen as well here? Please add
> a comment similar to the Fortran ??? above.

It can happen because we stop at unions (and qualified unions) and for them we cannot disambiguate based on the fields. I'll add a regular comment.

> Can you please also add at least one testcase as
> gcc.dg/tree-ssa/ssa-fre-??.c that tests the functionality of this and that
> wasn't handled before? I suppose it would be sth like
>
>   struct S { int i; int j; };
>   struct U { struct S a[10]; } u;
>
>   u.a[n].i= i;
>   u.a[n].j = j;
>   return u.a[n].i;
>
> where we miss to CSE the load from u.a[n].i.

Yes, the patch does eliminate the redundant load in .fre1:

  u.a[n_2(D)].i = i_3(D);
  u.a[n_2(D)].j = j_5(D);
  _7 = u.a[n_2(D)].i;
  return _7;

becomes:

  u.a[n_2(D)].i = i_3(D);
  u.a[n_2(D)].j = j_5(D);
  _7 = i_3(D);
  return _7;

> Otherwise the patch is ok.

Thanks.

--
Eric Botcazou
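For readers who want to try the requested case, the fragment from the review can be wrapped into a complete function as sketched below. The function name store_then_load is made up here, and the DejaGnu directives of a real gcc.dg/tree-ssa test are left out because the thread does not give the exact scan pattern.

```c
struct S { int i; int j; };
struct U { struct S a[10]; } u;

/* Two stores to different fields of the same array element, followed by a
   load of the first field.  The store to .j cannot clobber .i, so the
   final load can be replaced by the value of the parameter i. */
int store_then_load(int n, int i, int j)
{
    u.a[n].i = i;
    u.a[n].j = j;
    return u.a[n].i;
}
```

Compiling with something like -O1 -fdump-tree-fre1 and inspecting the dump should show whether the final load has been replaced, as in the before/after dumps quoted in the thread.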
Re: [PATCH] Add a new option -fstack-protector-strong
On 04/17/2013 02:49 AM, Han Shen wrote: + if (flag_stack_protect == 3) +cpp_define (pfile, __SSP_STRONG__=3); if (flag_stack_protect == 2) cpp_define (pfile, __SSP_ALL__=2); 3 and 2 should be replaced by SPCT_FLAG_STRONG and SPCT_FLAG_ALL. I define these SPCT_FLAG_XXX in cfgexpand.c locally, so they are not visible to c-cppbuiltin.c, do you suggest define these inside c-cppbuiltin.c also? I see. Let's use the constants for now. Indentation is off (unless both mail clients I tried are clobbering your patch). I think the GNU coding style prohibits the braces around the single-statement body of the outer 'for. Done with indentation properly on and removed the braces. (GMail composing window drops all the tabs when pasting... I have to use Thunderbird to paste the patch. Hope it is right this time) Thunderbird mangles patches as well, but I was able to repair the damage. When using Thunderbird, please send the patch as a text file attachment. You can put the changelog snippets at the beginning of the file as well. This way, everything is sent out unchanged. Can you make the conditional more similar to the comment, perhaps using a switch statement on the value of the flag_stack_protect variable? That's going to be much easier to read. Re-coded. Now using 'switch-case'. Thanks. I think the comment is now redundant because it matches the code almost word-for-word. 8-) No for 'struct-returning' functions. But I regard this not an issue --- at the programming level, there is no way to get one's hand on the address of a returned structure --- struct Node foo(); struct Node *p = foo(); // compiler error - lvalue required as unary '' operand. C++ const references can bind to rvalues. But I'm more worried about the interaction with the return value optimization. 
Consider this C++ code: struct S { S(); int a; int b; int c; int d; int e; }; void f1(int *); S f2() { S s; f1(s.a); return s; } S g2(); void g3() { S s = g2(); } void g3b(const S); void g3b() { g3b(g2()); } With your patch and -O2 -fstack-protector-strong, this generates the following assembly: .globl _Z2f2v .type _Z2f2v, @function _Z2f2v: .LFB0: .cfi_startproc pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movq%rdi, %rbx call_ZN1SC1Ev movq%rbx, %rdi call_Z2f1Pi movq%rbx, %rax popq%rbx .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE0: .size _Z2f2v, .-_Z2f2v .p2align 4,,15 .globl _Z2g3v .type _Z2g3v, @function _Z2g3v: .LFB1: .cfi_startproc subq$40, %rsp .cfi_def_cfa_offset 48 movq%rsp, %rdi call_Z2g2v addq$40, %rsp .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE1: .size _Z2g3v, .-_Z2g3v .p2align 4,,15 .globl _Z3g3bv .type _Z3g3bv, @function _Z3g3bv: .LFB2: .cfi_startproc subq$40, %rsp .cfi_def_cfa_offset 48 movq%rsp, %rdi movq%fs:40, %rax movq%rax, 24(%rsp) xorl%eax, %eax call_Z2g2v movq%rsp, %rdi call_Z3g3bRK1S movq24(%rsp), %rax xorq%fs:40, %rax jne .L9 addq$40, %rsp .cfi_remember_state .cfi_def_cfa_offset 8 ret .L9: .cfi_restore_state .p2align 4,,6 call__stack_chk_fail .cfi_endproc .LFE2: .size _Z3g3bv, .-_Z3g3bv Here, g3b() is correctly instrumented, and f2() does not need instrumentation (because space for the returned object is not part of the local frame). But an address on the stack escapes in g3() and is used for the return value of the call to g2(). This requires instrumentation, which is missing in this example. I suppose this can be handled in a follow-up patch if necessary. ChangeLog and patch below -- gcc/ChangeLog 2013-04-16 Han Shen shen...@google.com * cfgexpand.c (record_or_union_type_has_array_p): Helper function to check if a record or union contains an array field. I think the GNU convention is to write only this: * cfgexpand.c (record_or_union_type_has_array_p): New function. (expand_used_vars): Add logic handling '-fstack-protector-strong'. 
* common.opt (fstack-protector-all): New option.

Should be fstack-protector-strong.

--
Florian Weimer / Red Hat Product Security Team
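The predefined macros discussed in this review let a program tell which protector mode it was built with. The sketch below assumes __SSP_STRONG__ is defined as proposed in this patch, next to the existing __SSP__ and __SSP_ALL__ macros; the function name ssp_mode and the returned labels are hypothetical.

```c
/* Returns a label for the stack-protector mode this translation unit was
   compiled with, based only on the compiler's predefined macros. */
const char *ssp_mode(void)
{
#if defined(__SSP_ALL__)
    return "all";       /* -fstack-protector-all */
#elif defined(__SSP_STRONG__)
    return "strong";    /* -fstack-protector-strong, as proposed here */
#elif defined(__SSP__)
    return "default";   /* -fstack-protector */
#else
    return "none";
#endif
}
```

The result depends on the flags used for the build, including any protector option a distribution enables by default.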
RE: [PATCH, AArch64] Compare instruction in shift_extend mode
Hi, I suggest for this one test case either making it compile only and dropping main() such that the pattern match only looks in the assembled output of the cmp_* functions The testcase will check only for assembly pattern of the instruction as per your suggestion. Please find attached the modified patch let me know if there should be any further modifications in it. Thanks, Naveen --- gcc/config/aarch64/aarch64.md 2013-04-17 11:18:29.453576713 +0530 +++ gcc/config/aarch64/aarch64.md 2013-04-17 15:22:36.161492471 +0530 @@ -2311,6 +2311,18 @@ (set_attr mode GPI:MODE)] ) +(define_insn *cmp_swp_optabALLX:mode_shft_GPI:mode + [(set (reg:CC_SWP CC_REGNUM) + (compare:CC_SWP (ashift:GPI + (ANY_EXTEND:GPI + (match_operand:ALLX 0 register_operand r)) + (match_operand:QI 1 aarch64_shift_imm_mode n)) + (match_operand:GPI 2 register_operand r)))] + + cmp\\t%GPI:w2, %GPI:w0, suxtALLX:size %1 + [(set_attr v8type alus_ext) + (set_attr mode GPI:MODE)] +) ;; --- ;; Store-flag and conditional select insns --- gcc/testsuite/gcc.target/aarch64/cmp.c 1970-01-01 05:30:00.0 +0530 +++ gcc/testsuite/gcc.target/aarch64/cmp.c 2013-04-17 15:23:36.121492125 +0530 @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +int +cmp_si_test1 (int a, int b, int c) +{ + if (a b) +return a + c; + else +return a + b + c; +} + +int +cmp_si_test2 (int a, int b, int c) +{ + if ((a 3) b) +return a + c; + else +return a + b + c; +} + +typedef long long s64; + +s64 +cmp_di_test1 (s64 a, s64 b, s64 c) +{ + if (a b) +return a + c; + else +return a + b + c; +} + +s64 +cmp_di_test2 (s64 a, s64 b, s64 c) +{ + if ((a 3) b) +return a + c; + else +return a + b + c; +} + +int +cmp_di_test3 (int a, s64 b, s64 c) +{ + if (a b) +return a + c; + else +return a + b + c; +} + +int +cmp_di_test4 (int a, s64 b, s64 c) +{ + if (((s64)a 3) b) +return a + c; + else +return a + b + c; +} + +/* { dg-final { scan-assembler-times cmp\tw\[0-9\]+, w\[0-9\]+ 2 } } */ +/* { dg-final { scan-assembler-times cmp\tx\[0-9\]+, 
x\[0-9\]+ 4 } } */
RE: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics
Hi Julian, From: Julian Brown [mailto:jul...@codesourcery.com] Sent: 13 April 2013 15:04 To: Julian Brown Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics On Fri, 12 Apr 2013 20:09:39 +0100 Julian Brown jul...@codesourcery.com wrote: On Fri, 12 Apr 2013 15:19:18 +0100 Kyrylo Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic to arm_neon.h through the generator ML scripts and also adds the built-ins to which the intrinsics will map to. The generator ML scripts are updated and used to generate the relevant .texi documentation, arm_neon.h and the tests in gcc.target/arm/neon . FWIW, some of the changes to neon*.ml can be simplified somewhat -- my attempt at an improved version of those bits is attached. I'm still not too happy with mode_suffix, but these new instructions require adding semantics to parts of the generator program which weren't really very well-defined to start with :-). I appreciate that it's a bit of a tangle... I thought of an improvement to the mode_suffix part from the last version of the patch, so here it is. I'm done fiddling with this now, so back to you! Thanks for looking at it! My Ocaml-fu is rather limited. It does look cleaner now. Here it is together with all the other parts of the patch, plus some minor formatting changes. Ok for trunk now? gcc/ChangeLog 2013-04-17 Kyrylo Tkachov kyrylo.tkac...@arm.com Julian Brown jul...@codesourcery.com * config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF. (TB_DREG): Add T_V4HF. (v4hf_UP): New macro. (neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW. (arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW. Handle initialisation of V4HF. Adjust initialisation of reinterpret built-ins. (arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW. (arm_vector_mode_supported_p): Handle V4HF. 
(arm_mangle_map): Handle V4HFmode. * config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF. * config/arm/arm_neon_builtins.def: Add entries for vcvtv4hfv4sf, vcvtv4sfv4hf. * config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern. (neon_vcvtv4hfv4sf): Likewise. * config/arm/neon-gen.ml: Handle half-precision floating point features. * config/arm/neon-testgen.ml: Handle Requires_FP_bit feature. * config/arm/arm_neon.h: Regenerate. * config/arm/neon.ml (type elts): Add F16. (type vectype): Add T_float16x4, T_floatHF. (type vecmode): Add V4HF. (type features): Add Requires_FP_bit feature. (elt_width): Handle F16. (elt_class): Likewise. (elt_of_class_width): Likewise. (mode_of_elt): Refactor. (type_for_elt): Handle F16, fix error messages. (vectype_size): Handle T_float16x4. (vcvt_sh): New function. (ops): Add entries for vcvt_f16_f32, vcvt_f32_f16. (string_of_vectype): Handle T_floatHF, T_float16, T_float16x4. (string_of_mode): Handle V4HF. * doc/arm-neon-intrinsics.texi: Regenerate. gcc/testsuite/ChangeLog 2013-04-17 Kyrylo Tkachov kyrylo.tkac...@arm.com Julian Brown jul...@codesourcery.com * gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated. * gcc.target/arm/neon/vcvtf32_f16.c: Likewise. neon-vcvt-intrinsics.patch Description: Binary data
[PATCH] Fix PR56982, handle setjmp like non-local labels
This fixes PR56982 by properly modeling the control-flow of setjmp. It basically behaves as a non-local goto target so this patch treats it so - it makes it start a basic-block and get abnormal edges from possible sources of non-local gotos.

The patch also fixes the bug that longjmp is marked as leaf.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk? What about release branches (after it had some time to settle on trunk of course)?

Thanks,
Richard.

2013-04-17  Richard Biener  rguent...@suse.de

	PR tree-optimization/56982
	* builtins.def (BUILT_IN_LONGJMP): longjmp is not a leaf function.
	* gimplify.c (gimplify_call_expr): Notice special calls.
	(gimplify_modify_expr): Likewise.
	* tree-cfg.c (make_abnormal_goto_edges): Handle setjmp-like
	abnormal control flow receivers.
	(call_can_make_abnormal_goto): Handle cfun->calls_setjmp
	in the same way as cfun->has_nonlocal_labels.
	(gimple_purge_dead_abnormal_call_edges): Likewise.
	(stmt_starts_bb_p): Make setjmp-like abnormal control flow
	receivers start a basic-block.

	* gcc.c-torture/execute/pr56982.c: New testcase.

Index: gcc/gimplify.c === *** gcc/gimplify.c (revision 198021) --- gcc/gimplify.c (working copy) *** gimplify_call_expr (tree *expr_p, gimple *** 2729,2734 --- 2729,2735 gimple_stmt_iterator gsi; call = gimple_build_call_from_tree (*expr_p); gimple_call_set_fntype (call, TREE_TYPE (fnptrtype)); + notice_special_calls (call); gimplify_seq_add_stmt (pre_p, call); gsi = gsi_last (*pre_p); fold_stmt (gsi); *** gimplify_modify_expr (tree *expr_p, gimp *** 4968,4973 --- 4969,4975 STRIP_USELESS_TYPE_CONVERSION (CALL_EXPR_FN (*from_p)); assign = gimple_build_call_from_tree (*from_p); gimple_call_set_fntype (assign, TREE_TYPE (fnptrtype)); + notice_special_calls (assign); if (!gimple_call_noreturn_p (assign)) gimple_call_set_lhs (assign, *to_p); } Index: gcc/tree-cfg.c === *** gcc/tree-cfg.c (revision 198021) --- gcc/tree-cfg.c (working copy) *** make_abnormal_goto_edges (basic_block bb *** 967,991 gimple_stmt_iterator gsi; FOR_EACH_BB (target_bb) ! for (gsi = gsi_start_bb (target_bb); !gsi_end_p (gsi); gsi_next (gsi)) ! { ! gimple label_stmt = gsi_stmt (gsi); ! tree target; ! if (gimple_code (label_stmt) != GIMPLE_LABEL) ! break; ! target = gimple_label_label (label_stmt); ! /* Make an edge to every label block that has been marked as a ! potential target for a computed goto or a non-local goto. */ ! if ((FORCED_LABEL (target) !for_call) ! || (DECL_NONLOCAL (target) for_call)) ! { make_edge (bb, target_bb, EDGE_ABNORMAL); ! break; ! } ! } } /* Create edges for a goto statement at block BB. */ --- 971,1005 gimple_stmt_iterator gsi; FOR_EACH_BB (target_bb) ! { ! for (gsi = gsi_start_bb (target_bb); !gsi_end_p (gsi); gsi_next (gsi)) ! { ! gimple label_stmt = gsi_stmt (gsi); ! tree target; ! if (gimple_code (label_stmt) != GIMPLE_LABEL) ! break; ! target = gimple_label_label (label_stmt); ! /* Make an edge to every label block that has been marked as a !potential target for a computed goto or a non-local goto. */ ! if ((FORCED_LABEL (target) !for_call) ! 
|| (DECL_NONLOCAL (target) for_call)) ! { ! make_edge (bb, target_bb, EDGE_ABNORMAL); ! break; ! } ! } ! if (!gsi_end_p (gsi)) ! { ! /* Make an edge to every setjmp-like call. */ ! gimple call_stmt = gsi_stmt (gsi); ! if (is_gimple_call (call_stmt) ! (gimple_call_flags (call_stmt) ECF_RETURNS_TWICE)) make_edge (bb, target_bb, EDGE_ABNORMAL); ! } ! } } /* Create edges for a goto statement at block BB. */ *** call_can_make_abnormal_goto (gimple t) *** 2147,2153 { /* If the function has no non-local labels, then a call cannot make an abnormal transfer of control. */ ! if (!cfun-has_nonlocal_label) return false; /* Likewise if the call has no side effects. */ --- 2161,2168 { /* If the function has no non-local labels, then a call cannot make an abnormal transfer of control. */ ! if (!cfun-has_nonlocal_label !!cfun-calls_setjmp) return false; /* Likewise if the call has no side effects. */ *** stmt_starts_bb_p (gimple stmt, gimple pr *** 2302,2307 --- 2317,2327 else
[PATCH] Fix PR56921
This fixes PR56921 in a better way and restores the ability to ggc-collect during RTL loop passes. The patch stores the simple-loop-description in a separate member of struct loop and not its aux field which is not scanned by GC. Bootstrapped and tested on x86_64-unknown-linux-gnu and powerpc64-linux-gnu, applied. Richard. 2013-04-17 Richard Biener rguent...@suse.de PR rtl-optimization/56921 * cfgloop.h (struct loop): Add simple_loop_desc member. (struct niter_desc): Mark with GTY(()). (simple_loop_desc): Do not use aux field but simple_loop_desc. * loop-iv.c (get_simple_loop_desc): Likewise. (free_simple_loop_desc): Likewise. Revert 2013-04-16 Richard Biener rguent...@suse.de PR rtl-optimization/56921 * loop-init.c (pass_rtl_move_loop_invariants): Add TODO_do_not_ggc_collect to todo_flags_finish. (pass_rtl_unswitch): Same. (pass_rtl_unroll_and_peel_loops): Same. (pass_rtl_doloop): Same. Index: gcc/cfgloop.h === *** gcc/cfgloop.h (revision 198021) --- gcc/cfgloop.h (working copy) *** struct GTY ((chain_next (%h.next))) lo *** 172,177 --- 172,180 /* Head of the cyclic list of the exits of the loop. */ struct loop_exit *exits; + + /* Number of iteration analysis data for RTL. */ + struct niter_desc *simple_loop_desc; }; /* Flags for state of loop structure. */ *** struct rtx_iv *** 372,378 /* The description of an exit from the loop and of the number of iterations till we take the exit. */ ! struct niter_desc { /* The edge out of the loop. */ edge out_edge; --- 375,381 /* The description of an exit from the loop and of the number of iterations till we take the exit. */ ! struct GTY(()) niter_desc { /* The edge out of the loop. */ edge out_edge; *** extern void free_simple_loop_desc (struc *** 425,431 static inline struct niter_desc * simple_loop_desc (struct loop *loop) { ! return (struct niter_desc *) loop-aux; } /* Accessors for the loop structures. */ --- 428,434 static inline struct niter_desc * simple_loop_desc (struct loop *loop) { ! 
return loop-simple_loop_desc; } /* Accessors for the loop structures. */ Index: gcc/loop-iv.c === *** gcc/loop-iv.c (revision 198021) --- gcc/loop-iv.c (working copy) *** get_simple_loop_desc (struct loop *loop) *** 3016,3025 /* At least desc-infinite is not always initialized by find_simple_loop_exit. */ ! desc = XCNEW (struct niter_desc); iv_analysis_loop_init (loop); find_simple_exit (loop, desc); ! loop-aux = desc; if (desc-simple_p (desc-assumptions || desc-infinite)) { --- 3016,3025 /* At least desc-infinite is not always initialized by find_simple_loop_exit. */ ! desc = ggc_alloc_cleared_niter_desc (); iv_analysis_loop_init (loop); find_simple_exit (loop, desc); ! loop-simple_loop_desc = desc; if (desc-simple_p (desc-assumptions || desc-infinite)) { *** free_simple_loop_desc (struct loop *loop *** 3069,3074 if (!desc) return; ! free (desc); ! loop-aux = NULL; } --- 3069,3074 if (!desc) return; ! ggc_free (desc); ! loop-simple_loop_desc = NULL; } Index: gcc/loop-init.c === *** gcc/loop-init.c (revision 198021) --- gcc/loop-init.c (working copy) *** struct rtl_opt_pass pass_rtl_move_loop_i *** 434,441 0,/* properties_destroyed */ 0,/* todo_flags_start */ TODO_df_verify | ! TODO_df_finish | TODO_verify_rtl_sharing ! | TODO_do_not_ggc_collect /* todo_flags_finish */ } }; --- 434,440 0,/* properties_destroyed */ 0,/* todo_flags_start */ TODO_df_verify | ! TODO_df_finish | TODO_verify_rtl_sharing /* todo_flags_finish */ } }; *** struct rtl_opt_pass pass_rtl_unswitch = *** 471,478 0,/* properties_provided */ 0,/* properties_destroyed */ 0,/* todo_flags_start */ ! TODO_verify_rtl_sharing ! | TODO_do_not_ggc_collect /* todo_flags_finish */ } }; --- 470,476 0,/* properties_provided */ 0,/* properties_destroyed */ 0,/* todo_flags_start */ ! TODO_verify_rtl_sharing, /* todo_flags_finish */ }
Re: [PATCH][RFC] Handle commutative operations in SLP tree build
On Wed, 10 Apr 2013, Richard Biener wrote:

> This handles commutative operations during SLP tree build in the way
> that if one configuration does not match, the build will try again with
> commutated operands. This allows removing the special-casing of
> commutated loads in a complex addition that was in the end handled as a
> permutation. It of course also applies more generally.
>
> Permutation is currently limited to 3 unsuccessful permutes to avoid
> running into the inherently exponential complexity of tree matching.
>
> The gcc.dg/vect/vect-complex-?.c testcases provide some testing coverage
> (previously handled by the special-casing). I have seen failed SLP in
> the wild previously, but it's usually on larger testcases and dependent
> on the operand order of commutative operands.
>
> I've discussed ideas to restrict the cases where we try a permutation
> with Matz, but I'll rather defer that to an eventual followup. (Compute
> per SSA name a value dependent on the shape of its use-def tree and use
> that as a quick check whether sub-trees can possibly match.)
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> Any comments?

Committed to trunk.

Richard.

2013-04-10  Richard Biener  rguent...@suse.de

	* tree-vect-slp.c (vect_build_slp_tree_1): Split out from ...
	(vect_build_slp_tree): ... here.
	(vect_build_slp_tree_1): Compute which stmts of the SLP group
	match. Remove special-casing of mismatched complex loads.
	(vect_build_slp_tree): Based on the result from
	vect_build_slp_tree_1 re-try the match with swapped commutative
	operands.
	(vect_supported_load_permutation_p): Remove special-casing of
	mismatched complex loads.
	(vect_analyze_slp_instance): Adjust.
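The retry-with-swapped-operands idea can be shown on a toy expression tree. The sketch below is not the vectorizer's code or data structures: struct expr, match and shapes_match are invented for this illustration, and the budget of 3 failed swap attempts simply mirrors the limit mentioned in the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression node: op is '+' or '*' (commutative), '-' (not
   commutative), or 0 for a leaf identified by leaf_kind. */
struct expr {
    char op;
    int leaf_kind;
    const struct expr *lhs, *rhs;
};

static bool commutative(char op)
{
    return op == '+' || op == '*';
}

/* Do x and y have the same shape?  On a mismatch at a commutative node,
   retry once with y's operands swapped.  *budget counts how many failed
   swap attempts are still allowed, which bounds the search. */
static bool match(const struct expr *x, const struct expr *y, int *budget)
{
    if (x->op != y->op)
        return false;
    if (x->op == 0)
        return x->leaf_kind == y->leaf_kind;

    /* First try the operands in the order given. */
    if (match(x->lhs, y->lhs, budget) && match(x->rhs, y->rhs, budget))
        return true;

    if (!commutative(x->op) || *budget == 0)
        return false;

    /* Retry with the operands of y commuted. */
    if (match(x->lhs, y->rhs, budget) && match(x->rhs, y->lhs, budget))
        return true;

    --*budget;   /* an unsuccessful permute uses up budget */
    return false;
}

/* Entry point with the limit of 3 unsuccessful permutes. */
bool shapes_match(const struct expr *x, const struct expr *y)
{
    int budget = 3;
    return match(x, y, &budget);
}
```

In this toy model a + b matches b + a through the swapped retry, while a - b does not match b - a because subtraction is not commutative.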
Re: [patch] RFC: ix86 / x86_64 register pressure aware scheduling
These changes are what we tried here at Intel after a bunch of changes that made the pre-alloc scheduler more stable. We benchmarked both register pressure algorithms and the overall result was not that promising. We saw a number of regressions, e.g. for the option set -mavx -O3 -funroll-loops -ffast-math -march=corei7 (for spec2000, not only lucas but also applu regressed). And the overall gain is negative even for x86_64. For 32 bits the picture was worse, if I remember correctly.

In general, we have doubts that this feature is good for an OOO machine.

Thanks,
Igor

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Steven Bosscher
Sent: Monday, April 15, 2013 11:34 PM
To: GCC Patches
Cc: H.J. Lu; Uros Bizjak; Jan Hubicha
Subject: [patch] RFC: ix86 / x86_64 register pressure aware scheduling

Hello,

The attached patch enables register pressure aware scheduling for the ix86 and x86_64 targets. It uses the optimistic algorithm to avoid being overly conservative. This is the same as what other CISCy targets, like s390, also do.

The motivation for this patch is the excessive spilling I've observed in a few test cases with relatively large basic blocks, e.g. encryption algorithms and codecs.

The patch passes bootstrap+testing on x86_64-unknown-linux-gnu and i686-unknown-linux-gnu, with a few new failures due to PR56950.

Off-list, Uros, Honza and others have already looked at the patch and benchmarked it. For x86_64 there is an overall improvement for SPEC2k except that lucas regresses, but such a preliminary result is IMHO very promising.

Comments/suggestions welcome :-)

Ciao!
Steven

	* common/config/i386/i386-common.c (ix86_option_optimization_table):
	Do not disable insns scheduling. Enable register pressure aware
	scheduling.
	* config/i386/i386.c (ix86_option_override): Use the alternative,
	optimistic scheduling-pressure algorithm by default.

Index: common/config/i386/i386-common.c === --- common/config/i386/i386-common.c(revision 197941) +++ common/config/i386/i386-common.c(working copy) @@ -707,9 +707,15 @@ static const struct default_options ix86 { /* Enable redundant extension instructions removal at -O2 and higher. */ { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, -/* Turn off -fschedule-insns by default. It tends to make the - problem with not enough registers even worse. */ -{ OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, +/* Enable -fsched-pressure by default for all optimization levels. + Before SCHED_PRESSURE_MODEL register-pressure aware schedule was + available, -fschedule-insns was turned off completely by default for + this port, because scheduling before register allocation tends to + make the problem with not enough registers even worse. However, + for very long basic blocks the scheduler can help bring register + pressure down significantly, and SCHED_PRESSURE_MODEL is still + conservative enough to avoid creating excessive register pressure. */ +{ OPT_LEVELS_ALL, OPT_fsched_pressure, NULL, 1 }, #ifdef SUBTARGET_OPTIMIZATION_OPTIONS SUBTARGET_OPTIMIZATION_OPTIONS, Index: config/i386/i386.c === --- config/i386/i386.c (revision 197941) +++ config/i386/i386.c (working copy) @@ -3936,6 +3936,10 @@ ix86_option_override (void) ix86_option_override_internal (true); + /* Use the alternative scheduling-pressure algorithm by default. */ + maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2, +global_options.x_param_values, +global_options_set.x_param_values); /* This needs to be done at start up. It's convenient to do it here. */ register_pass (insert_vzeroupper_info);
[PATCH, ARM] emit LDRD epilogue instead of a single LDM return
Currently, the epilogue is not generated in RTL for functions that can return using a single instruction. This patch enables the RTL epilogue for such functions on targets that can benefit from using a sequence of LDRD instructions in the epilogue instead of a single LDM instruction.

No regression on qemu arm-none-eabi with cortex-a15.

Ok for trunk?

Thanks,
Greta

gcc/

2012-10-19  Greta Yorsh  Greta.Yorsh at arm.com

	* config/arm/arm.c (use_return_insn): Return 0 for targets that
	can benefit from using a sequence of LDRD instructions in epilogue
	instead of a single LDM instruction.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 866385c..bca92af 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2296,6 +2296,10 @@ use_return_insn (int iscond, rtx sibling)
   if (IS_INTERRUPT (func_type) && (frame_pointer_needed || TARGET_THUMB))
     return 0;
 
+  if (TARGET_LDRD && current_tune->prefer_ldrd_strd
+      && !optimize_function_for_size_p (cfun))
+    return 0;
+
   offsets = arm_get_frame_offsets ();
   stack_adjust = offsets->outgoing_args - offsets->saved_regs;
[PATCH, ARM][10/n] Split scc patterns using cond_exec
This patch converts define_insn into define_insn_and_split to split some alternatives of movsicc_insn and some scc patterns that cannot be expressed using movsicc. The patch emits cond_exec RTL insns. Ok for trunk? Thanks, Greta gcc/ 2013-02-19 Greta Yorsh greta.yo...@arm.com * config/arm/arm.md (movsicc_insn): Convert define_insn into define_insn_and_split. (and_scc,ior_scc,negscc): Likewise.diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 83b36ca..c2e59ed 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -858,7 +858,7 @@ ;; This is the canonicalization of addsi3_compare0_for_combiner when the ;; addend is a constant. -(define_insn *cmpsi2_addneg +(define_insn cmpsi2_addneg [(set (reg:CC CC_REGNUM) (compare:CC (match_operand:SI 1 s_register_operand r,r) @@ -1415,7 +1415,7 @@ (set_attr type simple_alu_imm,*,*)] ) -(define_insn *subsi3_compare +(define_insn subsi3_compare [(set (reg:CC CC_REGNUM) (compare:CC (match_operand:SI 1 arm_rhs_operand r,r,I) (match_operand:SI 2 arm_rhs_operand I,r,r))) @@ -8619,7 +8619,7 @@ (set_attr type f_selvfp_type)] ) -(define_insn *movsicc_insn +(define_insn_and_split *movsicc_insn [(set (match_operand:SI 0 s_register_operand =r,r,r,r,r,r,r,r) (if_then_else:SI (match_operator 3 arm_comparison_operator @@ -8632,10 +8632,45 @@ mvn%D3\\t%0, #%B2 mov%d3\\t%0, %1 mvn%d3\\t%0, #%B1 - mov%d3\\t%0, %1\;mov%D3\\t%0, %2 - mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 - mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 - mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2 + # + # + # + # + ; alt4: mov%d3\\t%0, %1\;mov%D3\\t%0, %2 + ; alt5: mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 + ; alt6: mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 + ; alt7: mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2 + reload_completed + [(const_int 0)] + { +enum rtx_code rev_code; +enum machine_mode mode; +rtx rev_cond; + +emit_insn (gen_rtx_COND_EXEC (VOIDmode, + operands[3], + gen_rtx_SET (VOIDmode, + operands[0], + operands[1]))); + +rev_code = GET_CODE (operands[3]); +mode = GET_MODE (operands[4]); +if 
(mode == CCFPmode || mode == CCFPEmode) + rev_code = reverse_condition_maybe_unordered (rev_code); +else + rev_code = reverse_condition (rev_code); + +rev_cond = gen_rtx_fmt_ee (rev_code, + VOIDmode, + operands[4], + const0_rtx); +emit_insn (gen_rtx_COND_EXEC (VOIDmode, + rev_cond, + gen_rtx_SET (VOIDmode, + operands[0], + operands[2]))); +DONE; + } [(set_attr length 4,4,4,4,8,8,8,8) (set_attr conds use) (set_attr insn mov,mvn,mov,mvn,mov,mov,mvn,mvn) @@ -9604,27 +9639,64 @@ (set_attr type alu_shift,alu_shift_reg)]) -(define_insn *and_scc +(define_insn_and_split *and_scc [(set (match_operand:SI 0 s_register_operand =r) (and:SI (match_operator:SI 1 arm_comparison_operator -[(match_operand 3 cc_register ) (const_int 0)]) - (match_operand:SI 2 s_register_operand r)))] +[(match_operand 2 cc_register ) (const_int 0)]) + (match_operand:SI 3 s_register_operand r)))] TARGET_ARM - mov%D1\\t%0, #0\;and%d1\\t%0, %2, #1 + # ; mov%D1\\t%0, #0\;and%d1\\t%0, %3, #1 + reload_completed + [(cond_exec (match_dup 5) (set (match_dup 0) (const_int 0))) + (cond_exec (match_dup 4) (set (match_dup 0) + (and:SI (match_dup 3) (const_int 1] + { +enum machine_mode mode = GET_MODE (operands[2]); +enum rtx_code rc = GET_CODE (operands[1]); + +/* Note that operands[4] is the same as operands[1], + but with VOIDmode as the result. 
*/ +operands[4] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx); +if (mode == CCFPmode || mode == CCFPEmode) + rc = reverse_condition_maybe_unordered (rc); +else + rc = reverse_condition (rc); +operands[5] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx); + } [(set_attr conds use) (set_attr insn mov) (set_attr length 8)] ) -(define_insn *ior_scc +(define_insn_and_split *ior_scc [(set (match_operand:SI 0 s_register_operand =r,r) - (ior:SI (match_operator:SI 2 arm_comparison_operator -[(match_operand 3 cc_register ) (const_int 0)]) - (match_operand:SI 1 s_register_operand 0,?r)))] + (ior:SI (match_operator:SI 1 arm_comparison_operator +[(match_operand 2 cc_register ) (const_int 0)]) + (match_operand:SI 3 s_register_operand 0,?r)))] TARGET_ARM @ -
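The splitters in this patch emit one cond_exec for the original condition and one for its reverse, and pick reverse_condition_maybe_unordered for CCFPmode/CCFPEmode. A small stand-alone C model of why that choice matters is sketched below. Everything in it (the enum, holds, reverse_ordered, reverse_maybe_unordered, reverse_for_cc_mode) is a hypothetical stand-in written for illustration, not GCC's rtx_code or its real helpers; it only assumes that a NaN operand makes a floating-point comparison "unordered".

```c
#include <assert.h>
#include <math.h>     /* NAN */
#include <stdbool.h>

/* Hypothetical stand-ins for a few rtx comparison codes (not GCC's enum). */
enum cmp { CMP_EQ, CMP_NE, CMP_LT, CMP_GE, CMP_GT, CMP_LE,
           CMP_UNLT, CMP_UNGE, CMP_UNGT, CMP_UNLE, CMP_NUM };

/* Does comparison C hold for A and B?  A NaN operand means "unordered". */
bool holds (enum cmp c, double a, double b)
{
  bool unord = (a != a) || (b != b);
  switch (c)
    {
    case CMP_EQ:   return a == b;
    case CMP_NE:   return a != b;          /* also true when unordered */
    case CMP_LT:   return a < b;
    case CMP_GE:   return a >= b;
    case CMP_GT:   return a > b;
    case CMP_LE:   return a <= b;
    case CMP_UNLT: return unord || a < b;
    case CMP_UNGE: return unord || a >= b;
    case CMP_UNGT: return unord || a > b;
    case CMP_UNLE: return unord || a <= b;
    default:       assert (0); return false;
    }
}

/* Integer-style reversal: only valid when operands are always ordered. */
enum cmp reverse_ordered (enum cmp c)
{
  switch (c)
    {
    case CMP_EQ: return CMP_NE;  case CMP_NE: return CMP_EQ;
    case CMP_LT: return CMP_GE;  case CMP_GE: return CMP_LT;
    case CMP_GT: return CMP_LE;  case CMP_LE: return CMP_GT;
    default:     assert (0); return c;
    }
}

/* Floating-point reversal: the reverse must also absorb the unordered case. */
enum cmp reverse_maybe_unordered (enum cmp c)
{
  switch (c)
    {
    case CMP_EQ:   return CMP_NE;    case CMP_NE:   return CMP_EQ;
    case CMP_LT:   return CMP_UNGE;  case CMP_GE:   return CMP_UNLT;
    case CMP_GT:   return CMP_UNLE;  case CMP_LE:   return CMP_UNGT;
    case CMP_UNLT: return CMP_GE;    case CMP_UNGE: return CMP_LT;
    case CMP_UNGT: return CMP_LE;    case CMP_UNLE: return CMP_GT;
    default:       assert (0); return c;
    }
}

/* Mirrors the mode test in the splitters: FP condition-code modes use the
   maybe-unordered reversal, everything else the plain one. */
enum cmp reverse_for_cc_mode (enum cmp c, bool fp_cc_mode)
{
  return fp_cc_mode ? reverse_maybe_unordered (c) : reverse_ordered (c);
}

/* True if, for every code and operand pair (NaN included), exactly one of
   the condition and its FP reversal holds, i.e. exactly one of the two
   conditional instructions would execute. */
bool fp_reversal_is_exact_complement (void)
{
  const double vals[] = { 1.0, 2.0, NAN };
  for (int c = 0; c < CMP_NUM; c++)
    for (int i = 0; i < 3; i++)
      for (int j = 0; j < 3; j++)
        if (holds (reverse_maybe_unordered ((enum cmp) c), vals[i], vals[j])
            == holds ((enum cmp) c, vals[i], vals[j]))
          return false;
  return true;
}
```

With the plain reversal, LT and GE are both false for a NaN operand, so neither conditional instruction would write the destination; the maybe-unordered reversal avoids that.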
New German PO file for 'gcc' (version 4.8.0)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the German team of translators. The file is available at: http://translationproject.org/latest/gcc/de.po (This file, 'gcc-4.8.0.de.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Re: [PATCH] Fix linking with -findirect-dispatch
Bryce McKinlay bmckin...@gmail.com writes: It certainly _did_ work as intended previously. Only by chance, when libtool has to relink the library during install. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
[Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure
Hi all,

here is a patch for a recent regression with procedure pointers. Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8?

Cheers,
Janus

2013-04-17  Janus Weil  ja...@gcc.gnu.org

    PR fortran/56814
    * interface.c (check_result_characteristics): Get result from
    interface if present.

2013-04-17  Janus Weil  ja...@gcc.gnu.org

    PR fortran/56814
    * gfortran.dg/proc_ptr_42.f90: New.

Attachments: pr56814_v2.diff, proc_ptr_42.f90
Re: [Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure
Janus Weil: here is patch for a recent regression with procedure pointers. Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8? Looks rather obvious. OK - and thanks for the patch. Tobias PS: If you have time, could you review my C_LOC patch at http://gcc.gnu.org/ml/fortran/2013-04/msg00073.html ? 2013-04-17 Janus Weil ja...@gcc.gnu.org PR fortran/56814 * interface.c (check_result_characteristics): Get result from interface if present. 2013-04-17 Janus Weil ja...@gcc.gnu.org PR fortran/56814 * gfortran.dg/proc_ptr_42.f90: New.
Re: RFA: enable LRA for rs6000 [patch for WRF]
On 13-04-16 6:56 PM, Michael Meissner wrote:

I tracked down the bug with the spec 2006 benchmark WRF using the LRA register allocator. At one point LRA has decided to use the CTR to hold a CCmode value:

(insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
        (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1}
     (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
        (nil)))

Now movcc_internal1 has moves from r-h (which includes ctr/lr) and ctr/lr-r, but it doesn't have a move to cover the nop move of moving the ctr to the ctr. IMHO, LRA should not be generating NOP moves that are later deleted.

There are two ways to solve the problem. One is not to let anything but int modes into CTR/LR, which will also stop the register allocator from spilling floating point values there (which we've seen in the past, but the last time I tried to eliminate it I couldn't). The following patch does this, and also changes the assertion to call fatal_insn_not_found to make it clearer what the error is.

I imagine I could add a NOP move insn to movcc_internal1, but that just strikes me as wrong.

Note, this does not fix the 32-bit failure in dealII, and I also noticed that I can't bootstrap the compiler using --with-cpu=power7, which I will get to tomorrow.

2013-04-16  Michael Meissner  meiss...@linux.vnet.ibm.com

    * config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
    control whether we only allow int modes to go in the CTR, LR,
    VRSAVE, VSCR registers.
    * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
    (rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
    constrained.
    (rs6000_option_override_internal): Set -mconstrain-regs if we are
    using the LRA register allocator.
    * lra.c (check_rtl): Use fatal_insn_not_found to report constraint
    does not match.

Mike, thanks for the patch and all the SPEC2006 data (which are very useful, as I have no access to a Power machine that can be used for benchmarking).
I guess that some benchmark scores may be lower because LRA lacks some micro-optimizations which reload implements through many power hooks (e.g. LRA does not use push reload). Sometimes that is not a bad thing, though (e.g. LRA does not use SECONDARY_MEMORY_NEEDED_RTX, which permits reusing the stack slots for other useful things).

In general I got the impression that power7 is the most difficult port for LRA. If we manage to port it, LRA ports for other targets will be easier.

I also reproduced the bootstrap failure with --with-cpu=power7. I am going to work on this, and after that on the SPEC2006 issues you wrote about.
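The first option described in the quoted mail (only integer modes may live in CTR/LR when the switch is on) can be illustrated with a toy predicate. This is a minimal sketch under stated assumptions: the enums, the function name hard_regno_mode_ok_sketch and the constrain_sprs flag are hypothetical and do not reproduce rs6000_hard_regno_mode_ok or the real -mconstrain-regs implementation; the unconstrained branch simply accepts everything to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified register and mode classes. */
enum hard_reg   { REG_GPR, REG_FPR, REG_CR, REG_LR, REG_CTR };
enum mode_class { MODE_CLASS_INT, MODE_CLASS_CC, MODE_CLASS_FLOAT };

/* LR and CTR are the special-purpose registers of interest here. */
bool is_spr (enum hard_reg reg)
{
  return reg == REG_LR || reg == REG_CTR;
}

/* Toy version of the check: with CONSTRAIN_SPRS set, LR/CTR accept only
   integer modes, so the allocator can no longer place a CCmode or
   floating-point value there.  Without it, this sketch accepts anything. */
bool hard_regno_mode_ok_sketch (enum hard_reg reg, enum mode_class mode,
                                bool constrain_sprs)
{
  assert (reg >= REG_GPR && reg <= REG_CTR);
  if (constrain_sprs && is_spr (reg))
    return mode == MODE_CLASS_INT;
  return true;
}
```

Under this model the problematic insn above (a CCmode value in ctr) could not be generated once the switch is enabled, because the allocator would never be offered CTR for a CC-class pseudo.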
Re: [PATCH, x86] Use vector moves in memmove expanding
Bootstrap/make check/SPEC2k are passing on i686 and x86_64.

Thanks for returning to this! glibc has quite a comprehensive testsuite for stringops. It may be useful to test it with -minline-all-stringops -mstringop-strategy=vector_loop.

I tested the patch on my core notebook with my memcpy micro benchmark. The vector loop is not a win, since apparently we do not produce any SSE code for 64-bit compilation. What CPUs and block sizes is this intended for?

Also, the internal loop with -march=native seems to come out as:

.L7:
        movq    (%rsi,%r8), %rax
        movq    8(%rsi,%r8), %rdx
        movq    48(%rsi,%r8), %r9
        movq    56(%rsi,%r8), %r10
        movdqu  16(%rsi,%r8), %xmm3
        movdqu  32(%rsi,%r8), %xmm1
        movq    %rax, (%rdi,%r8)
        movq    %rdx, 8(%rdi,%r8)
        movdqa  %xmm3, 16(%rdi,%r8)
        movdqa  %xmm1, 32(%rdi,%r8)
        movq    %r9, 48(%rdi,%r8)
        movq    %r10, 56(%rdi,%r8)
        addq    $64, %r8
        cmpq    %r11, %r8

That is not much SSE enablement, since the RA seems to home the vars in integer regs. Could you please look into it?

Changelog entry:

2013-04-10  Michael Zolotukhin  michael.v.zolotuk...@gmail.com

    * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
    * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
    adjust_address instead of change_address to keep info about alignment.
    (emit_strmov): Remove.
    (emit_memmov): New function.
    (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
    (expand_movmem_epilogue): Likewise and return updated rtx for
    destination.
    (expand_constant_movmem_prologue): Likewise and return updated rtx for
    destination and source.
    (decide_alignment): Refactor, handle vector_loop.
    (ix86_expand_movmem): Likewise.
    (ix86_expand_setmem): Likewise.
    * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
    * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 73a59b5..edb59da 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1565,6 +1565,18 @@ get_mem_align_offset (rtx mem, unsigned int align)
           expr = inner;
         }
     }
+  else if (TREE_CODE (expr) == MEM_REF)
+    {
+      tree base = TREE_OPERAND (expr, 0);
+      tree byte_offset = TREE_OPERAND (expr, 1);
+      if (TREE_CODE (base) != ADDR_EXPR
+          || TREE_CODE (byte_offset) != INTEGER_CST)
+        return -1;
+      if (!DECL_P (TREE_OPERAND (base, 0))
+          || DECL_ALIGN (TREE_OPERAND (base, 0)) < align)

Can you use TYPE_ALIGN here? In general, can't we replace all the GIMPLE handling by get_object_alignment?

+        return -1;
+      offset += tree_low_cst (byte_offset, 1);
+    }
   else
     return -1;

This change ought to go in independently. I cannot review it.

I will take a first look over the patch shortly, but please send an updated patch fixing the problem with integer regs.

Honza
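The idea behind the MEM_REF branch in the hunk above (give up unless the underlying declaration is at least as aligned as requested, otherwise fold the constant byte offset into the result) can be sketched on plain integers. This is only an illustration under assumptions: mem_align_offset_sketch and its parameters are hypothetical names, alignments are given in bits and assumed to be powers of two of at least 8, and GCC's tree and rtx types are not modelled at all.

```c
#include <assert.h>

/* Toy model of the alignment-offset computation.
   DECL_ALIGN_BITS: known alignment of the underlying object, in bits.
   BYTE_OFFSET:     constant offset of the access from the object start.
   ALIGN_BITS:      alignment the caller is asking about, in bits.
   Returns the byte offset of the access from an ALIGN_BITS-aligned
   address, or -1 if the object's alignment is too weak to tell. */
int mem_align_offset_sketch (unsigned int decl_align_bits,
                             unsigned long byte_offset,
                             unsigned int align_bits)
{
  /* Assumption of this sketch: a power-of-two alignment of >= 1 byte. */
  assert (align_bits >= 8 && (align_bits & (align_bits - 1)) == 0);

  if (decl_align_bits < align_bits)
    return -1;                       /* alignment unknown at this level */

  /* Keep only the part of the offset below the requested alignment. */
  return (int) (byte_offset & (align_bits / 8 - 1));
}
```

For example, an access 20 bytes into a 128-bit-aligned object sits 4 bytes past a 16-byte boundary, while the same access into a 64-bit-aligned object cannot be classified for a 128-bit request.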