[Bug rtl-optimization/57915] [4.8/4.9 Regression] ICE in set_address_disp, at rtlanal.c:5537
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57915

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com ---
The address in question is

  (plus (symbol_ref ...) (const_int 4))

LRA finds two displacements in it (the symbol_ref and the const_int), although only one displacement is allowed. The correct canonical address should be:

  (const (plus (symbol_ref ...) (const_int 4)))

The non-canonical address is created from (reg ...) by the first constant propagation pass (cprop1). I believe the problem should be fixed there.

As for the reload pass, it has code transforming an address of the form (plus some const some const) into (const (plus some const some const)). That was probably a fix for the problem made in the wrong place. There is no need to complicate LRA further by implementing analogous code there. As I wrote, I believe it should be fixed in cprop1 by generating the correct canonical address.
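The canonical form discussed in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions: the `struct expr` type, the `K_*` tags, and the functions `count_displacements` and `wrap_constant_sum` are hypothetical names invented for illustration and are not GCC's RTL API. The sketch shows why a bare sum of two constants looks like two displacements to a naive address decomposer, and how wrapping the sum in a single CONST node leaves one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression kinds; hypothetical stand-ins for RTL codes. */
enum expr_kind { K_REG, K_SYMBOL_REF, K_CONST_INT, K_PLUS, K_CONST };

struct expr {
  enum expr_kind kind;
  const struct expr *op0;   /* first operand, NULL for leaves */
  const struct expr *op1;   /* second operand, NULL if unused */
};

/* A leaf counts as constant if it is a symbol or an integer. */
static int is_constant_leaf(const struct expr *e)
{
  return e->kind == K_SYMBOL_REF || e->kind == K_CONST_INT;
}

/* Count displacement terms the way a naive address decomposer would:
   each constant leaf reachable through PLUS is a separate displacement,
   while a CONST wrapper is treated as one opaque displacement.  */
static int count_displacements(const struct expr *e)
{
  if (e->kind == K_CONST || is_constant_leaf(e))
    return 1;
  if (e->kind == K_PLUS)
    return count_displacements(e->op0) + count_displacements(e->op1);
  return 0;   /* registers are base/index parts, not displacements */
}

/* If E is a PLUS of two constant leaves, fill *WRAPPER with (const E)
   and return it; otherwise return E unchanged.  */
static const struct expr *wrap_constant_sum(const struct expr *e,
                                            struct expr *wrapper)
{
  assert(e != NULL && wrapper != NULL);
  if (e->kind == K_PLUS
      && is_constant_leaf(e->op0) && is_constant_leaf(e->op1)) {
    wrapper->kind = K_CONST;
    wrapper->op0 = e;
    wrapper->op1 = NULL;
    return wrapper;
  }
  return e;
}
```

In this model the sum of a symbol and an integer has two displacements before wrapping and one after, while a sum involving a register is left alone.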
[Bug middle-end/58419] [4.9 Regression] wrong code at -O3 on x86_64-linux-gnu in 32-bit mode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58419

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com ---
(In reply to Zhendong Su from comment #2)
> (In reply to H.J. Lu from comment #1)
> > It is caused by r202468.
>
> So it may have been a dup of 58418?

Yes, it is a duplicate.
[Bug target/58166] ARMv5: poor register allocation in function containing smull instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58166

--- Comment #6 from Vladimir Makarov vmakarov at redhat dot com ---
On 13-08-22 10:11 AM, rearnsha at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58166
>
> --- Comment #5 from Richard Earnshaw rearnsha at gcc dot gnu.org ---
> (In reply to Jay Foad from comment #3)
> > I've bisected this to r191805:
> >
> > http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=191805
> > http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01764.html
>
> I suspect that is just exposing a latent problem.

Sorry, I am on vacation now. I'll look at this after my vacation (after Labor Day).
[Bug target/58110] Useless GPR push and pop when only xmm registers are used.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58110 --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com --- Thanks, Ondrej and Jan. GCC with reload generates code with the same problem. I mentioned at the RA BOF that we should look at postreload.c and postreload-gcse.c to figure out what should and can be removed as redundant and what can be integrated with IRA/LRA. This PR is a good illustration of why it should be done. I don't think this work will be done soon, but it is good to have the PR as a reminder.
[Bug rtl-optimization/58048] [4.8/4.9 Regression] internal compiler error: Max. number of generated reload insns per insn is achieved (90)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58048

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #9 from Vladimir Makarov vmakarov at redhat dot com ---
(In reply to Bernd Edlinger from comment #8)
> I see the same error with recent 4.9 i686-pc-linux-gnu in the following test case:
>
> gcc -O2 -msse -mno-avx -S testsuite/gcc.target/i386/intrinsics_4.c
> intrinsics_4.c: In function 'foo':
> intrinsics_4.c:14:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)
>  }
>  ^
> 0x849e4c3 lra_constraints(bool)
>         ../../gcc-4.9-20130728/gcc/lra-constraints.c:3724
> 0x849136c lra(_IO_FILE*)
>         ../../gcc-4.9-20130728/gcc/lra.c:2319
> 0x8456beb do_reload
>         ../../gcc-4.9-20130728/gcc/ira.c:4689
> 0x8456beb rest_of_handle_reload
>         ../../gcc-4.9-20130728/gcc/ira.c:4801
> Please submit a full bug report,
> with preprocessed source if appropriate.

It is the same diagnostic, but the reason for it is different. I guess it is not an LRA problem. This test should not be run for i686, as it tries to use both non-AVX and AVX insns (the latter are absent on the i686 architecture). The reload pass also finishes badly on this test, with an assert (internal error), as reload cannot find insns to generate correct code.

Still, GCC should have a better diagnostic for this case (maybe by checking for correct architecture/attribute pairs), although I have no idea how to do it right.
[Bug target/57293] [4.8/4.9 Regression] not needed frame pointers on IA-32 (performance regression?)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293 --- Comment #5 from Vladimir Makarov vmakarov at redhat dot com --- I've started this work. Unfortunately, I have too many things on my plate now. I was too optimistic. For now I can only say that I am planning to fix it in stage1 (so the fix should be in gcc4.9).
[Bug rtl-optimization/51041] g++ strange optimisation behaviour
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51041

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com ---
I guess RA is doing the right thing. Pseudo 84, corresponding to variable sum, lives through an insn throwing an exception when the second printf is uncommented.

The code affecting the p84 allocation (putting it into memory, as SSE_REGS has no callee-saved regs) is in ira-lives.c::process_bb_node_lives:

  if (can_throw_internal (insn))
    {
      IOR_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj), call_used_reg_set);
      IOR_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), call_used_reg_set);
    }

where insn is:

(call_insn 141 140 142 22 (call (mem:QI (symbol_ref:DI ("_ZdlPv") [flags 0x41] <function_decl 0x71ae2400 operator delete>) [0 operator delete S1 A8])
        (const_int 0 [0])) /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../include/c++/4.7.2/ext/new_allocator.h:100 648 {*call}
     (expr_list:REG_DEAD (reg:DI 5 di)
        (expr_list:REG_EH_REGION (const_int 0 [0])
            (nil)))
    (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
        (nil)))

It is a destructor in new_allocator.h:

  void deallocate(pointer __p, size_type) { ::operator delete(__p); }

The problem could be solved by splitting the live range of p84. By default IRA does live range splitting only when the register pressure is high. This is not the case for the test, where the max pressure for GENERAL_REGS and SSE_REGS is only 4.

We could modify the semantics of -fira-region=all to form a region for every loop, with live range splitting done on the region borders. I tried that, and with -fira-region=all the same speed is achieved for the test. Unfortunately, the new semantics permit overly aggressive spilling, and the generated code is about 0.5% worse on SPEC2000 for x86-64.

I guess our optimizations should pay more attention to code with EH regions, as C++ code has a lot of it. I'll think about what more I can do about the problem.
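The effect of the conflict update described in this comment can be sketched with plain bit masks. This is a toy model, not IRA code: the `hard_reg_set` typedef, the register layout behind `GENERAL_REGS_MASK` and `SSE_REGS_MASK`, and the functions `add_eh_conflicts` and `allocatable` are all assumptions made up for illustration. It shows how a pseudo that is live across a throwing insn ends up with no usable register in a class whose registers are all call-clobbered.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t hard_reg_set;   /* one bit per hard register (toy model) */

/* Hypothetical layout for illustration only: bits 0-7 are general
   registers, bits 8-15 are SSE registers.  */
#define GENERAL_REGS_MASK ((hard_reg_set) 0x00ffu)
#define SSE_REGS_MASK     ((hard_reg_set) 0xff00u)

/* Add every call-clobbered register to the conflict set of a pseudo
   that is live across an insn which can throw.  */
static hard_reg_set add_eh_conflicts(hard_reg_set conflicts,
                                     hard_reg_set call_used,
                                     int can_throw)
{
  return can_throw ? (conflicts | call_used) : conflicts;
}

/* Registers of CLASS_MASK that remain usable for the pseudo.  An empty
   result means the pseudo has to live in memory.  */
static hard_reg_set allocatable(hard_reg_set class_mask,
                                hard_reg_set conflicts)
{
  assert((class_mask & ~(GENERAL_REGS_MASK | SSE_REGS_MASK)) == 0);
  return class_mask & ~conflicts;
}
```

With every SSE register marked call-clobbered, the allocatable SSE set becomes empty once the conflicts are added, which mirrors the situation where p84 is placed in memory.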
[Bug rtl-optimization/57960] S/390: LRA ICE building glibc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57960

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com ---
(In reply to Andreas Krebbel from comment #2)
> (In reply to Marek Polacek from comment #1)
> > But this is s390x, right? (Judging from the movstrictsi.)
>
> Yes.

Thanks, Andrew. I've reproduced it. I guess a fix will be ready this week, as the bug is in a sensitive part of LRA and the fix will need a lot of testing on a few machines.
[Bug bootstrap/57604] LRA related bootstrap comparison failure on s390x --with-arch=zEC12
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57604 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #2 from Vladimir Makarov vmakarov at redhat dot com --- Andreas, thanks for checking it and doing the analysis. I'll try to make a patch fixing this sometime this week.
[Bug rtl-optimization/57462] ira-costs considers only a single register at a time
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57462

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #1 from Vladimir Makarov vmakarov at redhat dot com ---
Thanks for the analysis. That would be an interesting problem to solve, although I don't know when I could start work on it.

The code you mention is actually an adaptation of code from the old regclass pass, which has existed since day 1 of GCC. The optimal solution of the problem might be NP-complete (I am not sure about it, but a long time ago I at least tried to describe it as an ILP problem). I should say that even the current cost code is very expensive, and speeding it up is on my to-do list. A better solution (through better heuristics) will probably be even more expensive.

IMHO, it is also a GCC-specific problem, because GCC postpones code selection (compilers usually do complete code selection before RA, e.g. selecting the insn for the add in this case), and that is a consequence of GCC's machine description model. Doing complete code selection before RA is also a challenging task.

In any case, the problem is known and quite interesting. There are a lot of different approaches to solving it (some even require GCC architectural changes), and none of them are easy. So I don't think the problem will be solved soon. Sorry.
[Bug target/57293] [4.8/4.9 Regression] not needed frame pointers on IA-32 (performance regression?)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #1 from Vladimir Makarov vmakarov at redhat dot com --- The change was done because of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57018. For now, LRA lacks some functionality for this kind of code. There will be no quick fix (I mean within a few days or even 2 weeks) for this, but I am planning to fix it by the end of June. Sorry.
[Bug rtl-optimization/55278] [4.8/4.9 Regression] Botan performance regressions apparently due to LRA
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55278

--- Comment #11 from Vladimir Makarov vmakarov at redhat dot com ---
I don't see a code degradation because of LRA. Here is what I got using the gcc4.8 branch compiler with options -O3 -finline-functions -D_REENTRANT -Wno-long-long -W -Wall -fPIC -fvisibility=hidden on Xeon X5660 and i7-2600 (sandy bridge):

64-bit:
real=16.78 user=16.57 system=0.00
real=16.39 user=16.20 system=0.00
real=16.81 user=16.57 system=0.00
real=16.35 user=16.20 system=0.00
real=16.82 user=16.56 system=0.00
real=16.40 user=16.20 system=0.00

real=7.37 user=7.34 system=0.00
real=7.05 user=7.02 system=0.00
real=7.34 user=7.31 system=0.00
real=7.05 user=7.02 system=0.00
real=7.37 user=7.31 system=0.00
real=7.05 user=7.02 system=0.00

32-bit:
real=15.46 user=15.22 system=0.00 share=98%%
real=14.53 user=14.21 system=0.00 share=97%%
real=15.77 user=15.41 system=0.00 share=97%%
real=14.49 user=14.23 system=0.00 share=98%%
real=15.57 user=15.22 system=0.00 share=97%%
real=14.51 user=14.23 system=0.00 share=98%%

real=10.17 user=10.13 system=0.00
real=7.76 user=7.73 system=0.00
real=10.17 user=10.13 system=0.00
real=7.76 user=7.73 system=0.00
real=10.17 user=10.13 system=0.00
real=7.76 user=7.73 system=0.00

The first run is for gcc-4.8 with reload, the second run with LRA. This is repeated 3 times. LRA generates better code for this test on both CPUs in 32-bit and 64-bit mode.

It may be that LLVM's new register allocator generates better code than LRA or reload, or maybe there is another reason for this. To be honest, I don't know.

I looked at http://gcc.opensuse.org/c++bench/botan/botan-summary.txt-1-0.html and I see that KASUMI improved around October. I worked on botan after the LRA merge and, as I remember, some benchmarks became worse and some improved, but overall (run time for all algorithms) it was about the same.

I don't have LLVM 3.3, but using 3.2 I am getting on i7-2600 7.378s (64-bit) and 7.234s (32-bit) with the options above vs 7.02s and 7.73s for gcc4.8 (LRA).

So I cannot confirm the big difference on KASUMI reported on http://www.phoronix.com/scan.php?page=article&item=llvm_32_egging&num=2. It seems to me phoronix is very LLVM biased and that is not good for its credibility.
[Bug rtl-optimization/57131] [4.8/4.9 Regression] Wrong register assignment?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57131

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2013-05-02 03:03:34 UTC ---
(In reply to comment #2)
> Apparently went away with the http://gcc.gnu.org/r198432 fix, but it isn't clear whether that change was meant to fix this or just made the bug latent. Anyway, still reproducible on the 4.8 branch.
>
> What I'm seeing before that change is that extendsidi2_1 pattern with MEM destination LRA chooses %ebx as (clobber (scratch:SI)) register, even though %ebx is live across that instruction (there is
>
> (insn 14 74 68 2 (set (reg:SI 3 bx [orig:83 D.1395 ] [83])
>         (mem/v/c:SI (plus:SI (reg/f:SI 7 sp)
>                 (const_int 72 [0x48])) [0 x4+0 S4 A64])) pr57131.c:11 85 {*movsi_internal}
>      (nil))
> (insn 68 14 73 2 (set (reg:SI 3 bx [orig:83 D.1395 ] [83])
>         (reg:SI 3 bx [orig:83 D.1395 ] [83])) pr57131.c:11 85 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 3 bx [orig:83 D.1395 ] [83])
>         (nil)))
>
> some insns before it and:
>
> (insn 65 24 26 2 (set (reg:SI 5 di [orig:83 D.1395 ] [83])
>         (reg:SI 3 bx [orig:83 D.1395 ] [83])) pr57131.c:11 85 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 3 bx [orig:83 D.1395 ] [83])
>         (nil)))
>
> some insns after it. Not sure if the noop move with REG_DEAD has anything to do with that. Vlad, can you please have a look?

http://gcc.gnu.org/r198432 was the right solution for this bug. LRA doesn't pay attention to NO_REGS pseudos during assignment, although ebx was assigned to NO_REGS r95 (which is reflected in reg_renumber).

At some points of LRA's work, reg notes can be invalid. LRA brings them up to date after the live subpass (lra-lives.c). It needs only correct live info on bb borders.

So I'd close this PR.
[Bug rtl-optimization/57046] [4.8/4.9 Regression] wrong code generated by gcc 4.8.0 on i686
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57046

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #5 from Vladimir Makarov vmakarov at redhat dot com 2013-04-23 15:34:40 UTC ---
(In reply to comment #4)
> We have after the get_value call:
>
> (insn 73 30 32 6 (set (reg:SI 76 [ D.1441 ])
>         (reg:SI 0 ax)) pr57046.c:42 85 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 0 ax)
>         (nil)))
> (insn 32 73 33 6 (parallel [
>             (set (reg:SI 73 [ D.1443 ])
>                 (ashift:SI (subreg:SI (reg:DI 60 [ D.1441 ]) 0)
>                     (const_int 2 [0x2])))
>             (clobber (reg:CC 17 flags))
>         ]) 502 {*ashlsi3_1}
>      (expr_list:REG_DEAD (reg:DI 60 [ D.1441 ])
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
>
> and IRA decides to put pseudo 76 into %ebx and pseudo 60 into %ecx. Either it is an IRA bug, or IRA takes into account that we only really need the low 32-bits of pseudo 60 at that point. In any case, reload loads SImode %ecx from the stack and uses it in the shift, while LRA loads full DImode %ecx (i.e. %ecx and %ebx) from the stack, then uses just the low bits from that (i.e. %ecx) in the shift. So the LRA generated code clobbers the value in %ebx, and get_value call is even later on DCEd because of it.

It seems like a discrepancy between IRA, which allocates in terms of subregisters, and LRA splitting (including call save/restore, as in this case), which works in terms of pseudos.

I guess fixing this might take about a week.
[Bug target/57018] [4.8/4.9 Regression] Miscompilation of bison 2.7.1 under -Os -fomit-frame-pointer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57018

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #9 from Vladimir Makarov vmakarov at redhat dot com 2013-04-22 13:53:10 UTC ---
(In reply to comment #8)
> BTW, with reload on current trunk, bar has identical code, except for the right leal 32(%esp), %esi instead of the wrong leal 16(%esp), %esi. It seems that with reload, elimination_effects is called both during IRA costs analysis and later on during actual elimination, while with LRA only IRA costs analysis calls it. And I don't see code in lra-eliminations.c that would adjust ep-offset based on say sp adjustments in the code.

Yes, that is true. In such cases LRA just prevents frame pointer elimination (except for stack realignment). I omitted this functionality as I thought it was not that important for the majority of code. Maybe it is time to reconsider this decision.

I have a patch for the PR, which I'll commit later today after some testing.
[Bug rtl-optimization/56999] [4.8/4.9 Regression] LRA caused miscompilation of xulrunner
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56999 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2013-04-18 18:28:21 UTC --- The bug is in a complicated interaction between inheritance and coalescing across several inheritance/coalesce passes. I think the patch will be ready tomorrow. Jakub and Marek, thanks for working on extracting the testcase, which required a lot of effort on your part.
[Bug rtl-optimization/56847] [4.8/4.9 Regression] '-fpie' triggers - internal compiler error: in gen_add2_insn, at optabs.c:4705
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56847 --- Comment #9 from Vladimir Makarov vmakarov at redhat dot com 2013-04-18 20:10:34 UTC --- I am still working on this. I have a patch solving the problem but I'd like to try other solutions too.
[Bug rtl-optimization/56847] [4.8/4.9 Regression] '-fpie' triggers - internal compiler error: in gen_add2_insn, at optabs.c:4705
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56847 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #7 from Vladimir Makarov vmakarov at redhat dot com 2013-04-06 03:43:50 UTC --- It seems that reload systematically chooses a different alternative (4) than LRA (1) for movti_internal. This is a very tricky part of LRA, so I guess fixing this may take a few days, maybe a week.
[Bug middle-end/55889] [4.8 Regression] ICE: in move_op_ascend, at sel-sched.c:6153 with -fschedule-insns -fselective-scheduling
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55889

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #29 from Vladimir Makarov vmakarov at redhat dot com 2013-02-14 16:48:24 UTC ---
(In reply to comment #28)
> (In reply to comment #27)
> > (In reply to comment #26)
> > > You are right, your suggestion is what I sketched in comment #21 as choices 1 or 2. Sorry for my unclear explanation of what was actually happening. I don't have a problem with making sel-sched have extra checks when renaming registers before reload, which will make us notice a not obvious extra dependence and avoid renaming properly, as now we've figured out these dependences don't follow immediately from the RTL. I just want an extra opinion on whether such unexpected dependencies arising when a target (hard) register is replaced by a pseudo register should be normal within GCC, or do we attribute such dependencies only to the register pressure scheduling mode. FWIW, I would rather agree with the latter than with the former.
> >
> > I guess you can not fully assume that dependencies are created only from RTL data flow. There are cases (besides pressure sensitive scheduling case mentioned here) when dependencies are still created for other reasons different from RTL data flow. I'd look at the dependencies as constraints resulting in correct and *desirable* insn schedule. Although overwhelming majority of them are created from RTL data flow analysis.
>
> I agree with you in general, it's just this case of having extra dependencies because an LHS hard register was substituted to a pseudo is non-intuitive to me. I am not aware of other similar cases when the other dependency reasons you mention kick in after such transformation.

For example, to speed up insn scheduling in some pathological cases, additional dependencies can be created when queues are too long. The probability that this happens is small, but it still happens, and the selective scheduler can crash in this case too.

> So I'll try going with the minimal fix of tracking only this particular case (of newly created implicit clobbers) in the selective scheduler. Btw, was the code calculating implicit clobbers via ira_implicitly_set_insn_hard_regs planned just for the pressure sensitive scheduling or also for the general case? It looks like it is needed for the former but it is calculated for the latter.

It was done to solve (or at least decrease the probability of) reload crashes (reload cannot find a spill register) when the first insn scheduling pass is used.
[Bug inline-asm/56148] [4.8 Regression] inline asm matching constraint with different mode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56148

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #12 from Vladimir Makarov vmakarov at redhat dot com 2013-02-12 18:52:05 UTC ---
(In reply to comment #10)
> Vlad, could you please explain a bit how you figured out this issue so quickly? (I mean, apart from experience, of course.)

Actually, I worked on this for 2.5 days. The patch affects very sensitive LRA code. I think there is a very small probability that the patch affects other targets (so I am going to merge trunk into the lra branch at the end of the week).

The problem was in using the same pseudo for two input operands, only one of which matches the same pseudo, which is an earlyclobber.
[Bug rtl-optimization/56195] [4.8 Regression] Error: incorrect register `%rdi' used with `l' suffix (at -O2)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56195

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2013-02-07 19:24:58 UTC ---
(In reply to comment #3)
> I'd say the bug is in get_reload_reg.
>
>   Changing pseudo 118 in operand 0 of insn 90 on equiv 0
>   Changing address in insn 90 r59:DI -- no change
>   Changing pseudo 59 in address of insn 90 on equiv 0
>   Creating newreg=137, assigning class GENERAL_REGS to address r137
>   Choosing alt 1 in insn 90: (0) r (1) rm
>   Reuse r137 for reload 0, change to class INDEX_REGS for r137
>    90: flags:CCGC=cmp(r137:DI,[r137:DI])
>   Inserting insn reload before:
>   256: r137:DI=0
>
> 3065          if (get_reload_reg (type, mode, old, goal_alt[i], "", &new_reg)
> 3066              && type != OP_OUT)
>
> calls it with type=OP_IN, mode=SImode, original=const0_rtx, rclass=GENERAL_REGS but returns new_reg = (reg:DI 137). That is because:
>
>       if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
>           && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
>
> doesn't check any mode if original (and curr_insn_input_reloads[i].input) are VOIDmode as in this case. So, either this can be fixed by doing:
>
>        if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
> -          && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
> +          && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class)
> +          && GET_MODE (curr_insn_input_reloads[i].reg) == mode)
>
> , or we could try better, if the GET_MODE (curr_insn_input_reloads[i].reg) is wider than mode, see if we can create a lowpart subreg thereof and return that, and only give up (i.e. continue looping) if creation of the lowpart subreg for some reason failed.
>
> Vlad, what do you think?

I think the second solution, with the lowpart, is better. Would you like to make a patch, or would you prefer that I work on it?
[Bug rtl-optimization/56195] [4.8 Regression] Error: incorrect register `%rdi' used with `l' suffix (at -O2)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56195

--- Comment #7 from Vladimir Makarov vmakarov at redhat dot com 2013-02-07 20:08:47 UTC ---
(In reply to comment #6)
> Actually, that one doesn't really work, because we have pseudo rather than hard reg at that point, which will never simplify.
>
> With this:
>
> --- lra-constraints.c.jj    2013-02-07 18:34:39.0 +0100
> +++ lra-constraints.c       2013-02-07 20:58:25.558920536 +0100
> @@ -421,8 +421,20 @@ get_reload_reg (enum op_type type, enum
>        if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
>            && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
>          {
> -          *result_reg = curr_insn_input_reloads[i].reg;
> -          regno = REGNO (*result_reg);
> +          rtx reg = curr_insn_input_reloads[i].reg;
> +          regno = REGNO (reg);
> +          /* If input is equal to original and both are VOIDmode,
> +             GET_MODE (reg) might be still different from mode.
> +             Ensure we don't return *result_reg with wrong mode.  */
> +          if (GET_MODE (reg) != mode)
> +            {
> +              if (GET_MODE_SIZE (GET_MODE (reg)) < GET_MODE_SIZE (mode))
> +                continue;
> +              reg = lowpart_subreg (mode, reg, GET_MODE (reg));
> +              if (reg == NULL_RTX || GET_CODE (reg) != SUBREG)
> +                continue;
> +            }
> +          *result_reg = reg;
>            if (lra_dump_file != NULL)
>              {
>                fprintf (lra_dump_file, " Reuse r%d for reload ", regno);
>
> the assembly difference is:
>
> -       cmpl    (%rdi), %rdi
> +       cmpl    (%rdi), %edi
>
> which is desirable in this case, but not sure if all get_reload_reg callers will grok a SUBREG instead of REG returned in *result_reg.

This version of the patch looks OK to me. I am not worried about get_reload_reg callers. It should work fine (that is a difference from the reload pass, where you have to care about secondary reloads etc.).

Thanks for working on this, Jakub.
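The reuse rule in the patch quoted above can be modeled outside the compiler. What follows is a minimal sketch, not LRA code: `struct input_reload`, `enum reuse_kind`, and `find_reusable_reload` are hypothetical names, and modes are reduced to byte sizes. It shows the three outcomes discussed in the thread: reuse the register as-is when the sizes match, reuse its low part when the register is wider, and skip the entry when the register is narrower than the requested mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of an input reload already generated for the
   current insn.  */
struct input_reload {
  long value;      /* the mode-less constant that was reloaded */
  int regno;       /* register that holds it */
  int reg_bytes;   /* size in bytes of that register's mode */
};

enum reuse_kind { REUSE_NONE, REUSE_REG, REUSE_LOWPART };

/* Look for an earlier reload of VALUE that can serve a request for
   WANT_BYTES bytes.  On success store the register number in *REGNO.
   A register of the same size is reused directly, a wider one is
   reused through its low part, and a narrower one is skipped.  */
static enum reuse_kind find_reusable_reload(const struct input_reload *reloads,
                                            size_t n, long value,
                                            int want_bytes, int *regno)
{
  assert(regno != NULL && (reloads != NULL || n == 0));
  for (size_t i = 0; i < n; i++) {
    if (reloads[i].value != value)
      continue;
    if (reloads[i].reg_bytes < want_bytes)
      continue;                       /* too narrow: keep looking */
    *regno = reloads[i].regno;
    return reloads[i].reg_bytes == want_bytes ? REUSE_REG : REUSE_LOWPART;
  }
  return REUSE_NONE;
}
```

For the case in this PR, a request for a 4-byte copy of the constant 0 when an 8-byte register already holds it would come back as `REUSE_LOWPART` in this model, which corresponds to using %edi rather than %rdi.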
[Bug rtl-optimization/56069] [4.6/4.7/4.8 Regression] RA pessimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56069

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2013-01-22 21:01:21 UTC ---
It is definitely a drawback of the regmove pass. IRA can do nothing in this case. We have the following code before and after regmove:

    2: r62:DI=di:DI                           2: r63:DI=di:DI
      REG_DEAD di:DI                            REG_DEAD di:DI
    6: {r64:DI=r62:DI 00x3;clobber fla        6: {r63:DI=r63:DI 00x3;clobber fl
      REG_DEAD r62:DI
    7: r65:DI=0x1000                          7: r65:DI=0x1000
    8: {r63:DI=r64:DI|r65:DI;clobber fla      8: {r63:DI=r63:DI|r65:DI;clobber fl
      REG_DEAD r65:DI                           REG_DEAD r65:DI
      REG_DEAD r64:DI
   13: ax:DI=r63:DI                          13: ax:DI=r63:DI
      REG_DEAD r63:DI                           REG_DEAD r63:DI
   16: use ax:DI                             16: use ax:DI

Regmove changes r64 to r63. That creates two equal hard register preferences for r63: AX or DI. Choosing either one results in worse code.

The originally generated code can be achieved if regmove changes r65 to r63. In this case we have only one hard register preference for r63 (AX) and one for r62 (DI). That would require regmove to try all orders of commutative operands (currently the regmove pass uses the first order found), using additional heuristics (live range length and/or number of preferred hard regs) to choose the best order. It is not a trivial change and cannot be done for the current release (gcc4.8). Also, we now have LRA, and maybe such regmove transformations are no longer necessary. I am going to try this for gcc4.9 when I have more time.

Still, to assign ax to r65 and r63 we also need some hard register preference propagation in IRA, which is currently absent.

In any case, I think that older releases were just lucky and generated the best code because some passes before regmove put r65 as the first input operand.
[Bug rtl-optimization/55153] [4.8 Regression] ICE: in begin_move_insn, at sched-ebb.c:205 with -fsched2-use-superblocks and __builtin_prefetch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55153

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2013-01-14 19:44:36 UTC ---
(In reply to comment #2)
> Vlad, can you please have a look? Thanks.

OK, I have started to work on this.
[Bug rtl-optimization/55829] [4.8 Regression] ICE: in curr_insn_transform, at lra-constraints.c:3069 with -msse3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55829

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2013-01-08 16:09:58 UTC ---
(In reply to comment #2)
> I think this patch can be useful and does give the RA more freedom, but it is unclear whether it doesn't make some LRA bug latent. Vlad?

I am working on it on the LRA side. I hope the patch will be ready today.
[Bug rtl-optimization/55458] [4.8 Regression] ICE: in assign_by_spills, at lra-assigns.c:1212 with -fPIC -m32 and simple asm volatile
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55458 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2012-11-27 22:03:04 UTC --- Reload cannot compile the test either, but at least it gives a meaningful error. The problem is that the asm insn needs 6 reload regs and there are only 5 of them (one is reserved for PIC and sp is always reserved). With optimizations, the asm insn requires only 3 regs. I've just submitted a patch that reports the error the way reload does. In any case, I had wanted to add this code for some time.
[Bug rtl-optimization/55330] [4.8 Regression] ICE: Maximum number of LRA constraint passes is achieved (15) on gfortran.dg/actual_array_constructor_1.f90
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55330

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2012-11-16 16:39:02 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > I don't see it on x86_64-apple-darwin10 (revisions 193495+patches and 193329).
>
> Looks like a duplicate of 55122.

Both end the same way and with the same diagnostic, but the reasons are different.
[Bug rtl-optimization/55247] [4.8 Regression] internal compiler error: Max. number of generated reload insns per insn is achieved (90)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55247

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||ubizjak at gmail dot com

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2012-11-09 19:42:30 UTC ---
Here is the insn in question:

(insn 26 25 27 2 (set (reg:TI 115 [orig:100 *defsym_17 ] [100])
        (mem:TI (zero_extend:DI (reg:SI 98)) [7 *defsym_17+0 S16 A32])) h.i:54 61 {*movti_internal_rex64}

As I understand it, the first alternative has ! to strongly encourage the use of SSE instead of GENERAL registers.

(define_insn "*movti_internal_rex64"
  [(set (match_operand:TI 0 "nonimmediate_operand" "=!r ,o ,x,x ,m")
        (match_operand:TI 1 "general_operand"      "riFo,riF,C,xm,x"))]
  "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"

For some reason, the second alternative does not have !. I don't know why it is different from the first alternative. For reload it works, because reload has already substituted a hard register for the first operand and in this case it rejects the 2nd alternative.

(insn 26 25 27 2 (set (reg:TI 0 ax [orig:100 *defsym_17 ] [100])
        (mem:TI (zero_extend:DI (reg:SI 2 cx [98])) [7 *defsym_17+0 S16 A32])) h.i:54 61 {*movti_internal_rex64}

Adding ! to the second alternative (as I believe it should be) solves the problem.

(define_insn "*movti_internal_rex64"
  [(set (match_operand:TI 0 "nonimmediate_operand" "=!r ,!o ,x,x ,m")
        (match_operand:TI 1 "general_operand"      "riFo,riF,C,xm,x"))]
  "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"

Uros, is this change OK with you? If it is, I can commit the patch only on Wednesday (I'll be away for a few days).
[Bug rtl-optimization/55141] [4.8 Regression] wrong code with -fno-split-wide-types
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55141

Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2012-11-07 15:21:11 UTC ---
(In reply to comment #2)
> Yeah, argp is eliminated to rsp + 16 instead of the correct rsp + 32 (there are 2 64-bit call used registers saved to stack in the prologue, callq pushes 8 bytes and rsp is adjusted by 8 to maintain the required stack alignment). Vlad, can you please take a look?

Sure, I'll look at this. I just don't know exactly when (probably in a few days), because I am working on many LRA PRs these days.
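The arithmetic behind the rsp + 32 figure quoted above can be written out as a small worked example. This is a sketch only: the function `argp_offset_from_sp` and its parameter breakdown are hypothetical, chosen to mirror the three contributions named in the quoted comment (saved registers, the return address pushed by callq, and the alignment adjustment), and it is not how GCC computes elimination offsets.

```c
#include <assert.h>

/* Distance in bytes from the stack pointer after the prologue up to
   the incoming-argument pointer, for the simple frame layout described
   in the quoted comment.  All parameter names are illustrative.  */
static int argp_offset_from_sp(int saved_regs, int reg_bytes,
                               int return_addr_bytes, int alignment_adjust)
{
  assert(saved_regs >= 0 && reg_bytes > 0);
  assert(return_addr_bytes >= 0 && alignment_adjust >= 0);
  return saved_regs * reg_bytes + return_addr_bytes + alignment_adjust;
}
```

With the numbers from the comment, two 8-byte saved registers, an 8-byte return address, and an 8-byte alignment adjustment, `argp_offset_from_sp(2, 8, 8, 8)` gives 32.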
[Bug rtl-optimization/55092] [4.8 Regression] LRA doesn't scale
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55092 --- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2012-10-26 22:49:23 UTC --- LRA reuses stack memory much better than reload (in all modes but especially in -O0). Maybe that is the reason for the var-tracking problem.
[Bug rtl-optimization/55092] [4.8 Regression] LRA doesn't scale
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55092 --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2012-10-26 22:57:38 UTC ---

(In reply to comment #2)
> LRA reuses stack memory much better than reload (in all modes but especially in -O0). Maybe that is the reason for the var-tracking problem.

I forgot to say that LRA understands -fno-ira-share-spill-slots. In this case, each pseudo gets its own stack slot. I think it is worth trying.
[Bug debug/54402] [4.8 Regression] var-tracking does not scale
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54402 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #5 from Vladimir Makarov vmakarov at redhat dot com 2012-10-26 23:06:44 UTC --- Ok, I'll try to find a reason for this slow down.
[Bug regression/55050] Regression test failure slp-21.c on arm-linux-gnueabi
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55050 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2012-10-24 20:27:06 UTC --- I am not sure that this is an LRA merge problem. The LRA merge should not affect ARM because the old reload should still be used for ARM. There is a very small change in arm.c because I added a new argument to final.c::alter_subreg. There are a few changes in IRA too, but again I don't think they affect ARM. Could you provide a preprocessed file? I cannot reproduce the problem myself. Thanks.
[Bug rtl-optimization/53125] Very slow register allocation on SPARC
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53125 --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2012-05-10 18:30:19 UTC ---

I've tried a recent trunk on gcc63 of the compiler farm with -O0. The compilation takes about 300sec. I also checked gcc-4.3 (the last version with the old RA); it also takes about 300sec. The old RA itself is slower (it takes 150sec) than IRA (it takes 55sec), but the register information pass (more exactly regstat_compute_ri, which is a part of the DF infrastructure) takes more time in the trunk than in gcc-4.3. So my times are different from what you reported. Probably it depends on the machine (gcc63 is a relatively modern SPARC machine with Niagara processors).

After some investigation, I found that the trunk gcc calls regstat_compute_ri more often than gcc-4.3. That is a result of a recent addition to IRA to move some insns (Bernd's patch, about a month old). It is not worth doing for -O0. So I am going to switch it off and get the same number of regstat_compute_ri calls (2 of them) as in gcc-4.3, which means a compilation time of less than 200sec (65% of the previous time). I am going to submit a patch today.

A further reduction in the number of regstat_compute_ri calls is not possible because we need one call for IRA and one call after the reload transformations (for subsequent passes).

Speeding up IRA itself can have only a small impact, and I don't see how it would be possible. It is very simple and fast enough (3 times faster than the old RA). One might think that not doing RA at all (setting all reg_renumber elements to -1) could speed this case up. But that is not true. It increases reload's work enormously and generates 2-3 times more insns, which slows the compiler down even more.

So, Ian, if you need more speedup for -O0, regstat_compute_ri should be improved. But that is not my area of responsibility. To me, it is strange that such a simple task (which requires one pass over the RTL) takes so much time in this case.
[Bug rtl-optimization/53125] Very slow register allocation on SPARC
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53125 --- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2012-04-29 00:08:54 UTC --- I'll look at this PR in a week.
[Bug rtl-optimization/52208] [4.7 Regression] Useless store
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52208 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2012-02-15 19:34:06 UTC ---

(In reply to comment #3)
> The -1000 cost comes from scan_one_insn subtracting ira_memory_move_cost[][][] * frequency there (i.e. memory_cost becomes -4000), while on the plus side we add just 3000 to memory_cost. I wonder if we shouldn't limit this subtraction of mem_cost / setting of counted_mem e.g. to general_operand (SET_SRC (set), GET_MODE (SET_SRC (set))) and leave the specialized memory loads alone (I know, it would be a hack, but it works for this and shouldn't pessimize the cases for which this hunk has been added).

I would not call this a hack, Jakub. It is a heuristic :) This solution is ok for me. I checked SPEC2000 and did not find any effect of this patch on the generated code. So the patch is ok, but it would be great if you added a comment for the change.

> And it would at least a tiny bit model what reload will do with such non-standard mems - as on this testcase it doesn't use the original mem, but does the load, followed by a store to another mem, followed by a load from that mem.
>
> --- ira-costs.c.jj 2012-01-20 12:35:17.0 +0100
> +++ ira-costs.c 2012-02-14 14:54:52.297356053 +0100
> @@ -1313,7 +1313,8 @@ scan_one_insn (rtx insn)
>            || (CONSTANT_P (XEXP (note, 0))
>                && targetm.legitimate_constant_p (GET_MODE (SET_DEST (set)),
>                                                  XEXP (note, 0))
> -              && REG_N_SETS (REGNO (SET_DEST (set))) == 1)))
> +              && REG_N_SETS (REGNO (SET_DEST (set))) == 1))
> +      && general_operand (SET_SRC (set), GET_MODE (SET_SRC (set))))
>      {
>        enum reg_class cl = GENERAL_REGS;
>        rtx reg = SET_DEST (set);
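The cost arithmetic discussed in this comment can be illustrated with a small, self-contained sketch. This is not the code in ira-costs.c: the struct, its field names and both functions are hypothetical, and the split of the quoted 4000 into a frequency of 1000 and a move cost of 4 is an assumption made only so the numbers match the -4000, +3000 and -1000 figures above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one load insn as seen by the cost
   pass.  None of these names exist in GCC.  */
struct load_insn {
  int frequency;               /* assumed execution frequency of the insn */
  int memory_move_cost;        /* assumed cost of one memory move */
  int recorded_mem_cost;       /* memory cost added by normal operand scanning */
  bool src_is_general_operand; /* stand-in for general_operand (SET_SRC (set), ...) */
};

/* Behaviour described in the quoted analysis: the move cost is always
   subtracted, so the memory cost can go negative.  */
int memory_cost_unguarded(const struct load_insn *insn) {
  return insn->recorded_mem_cost
         - insn->memory_move_cost * insn->frequency;
}

/* Behaviour with the proposed heuristic: subtract only when the source
   is an ordinary operand, and leave specialized memory loads alone.  */
int memory_cost_guarded(const struct load_insn *insn) {
  if (!insn->src_is_general_operand)
    return insn->recorded_mem_cost;
  return memory_cost_unguarded(insn);
}
```

With a recorded cost of 3000 and a specialized (non-general) source, the unguarded version yields 3000 - 4000 = -1000, while the guarded version keeps the cost at 3000.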
[Bug rtl-optimization/49800] [4.7 Regression] segfault with -fsched-pressure -fdump-rtl-sched1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49800 --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2012-02-02 18:33:34 UTC --- I am working on it.
[Bug rtl-optimization/40761] IRA memory hog for insanely nested loops
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40761 --- Comment #17 from Vladimir Makarov vmakarov at redhat dot com 2012-01-19 20:42:57 UTC --- The problem was in building CFG loops, which took most of the time. CFG loops were built even when we don't use regional allocation, as for -O0. I'll send a patch soon. It is not small because IRA in any case uses one region with a CFG loop representing the whole function.
[Bug rtl-optimization/40761] IRA memory hog for insanely nested loops
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40761 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #16 from Vladimir Makarov vmakarov at redhat dot com 2012-01-18 22:01:11 UTC --- I'll work on it.
[Bug target/49865] [4.7 Regression] Unnecessary reload causes small bloat
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #5 from Vladimir Makarov vmakarov at redhat dot com 2011-12-13 19:38:16 UTC ---

I cannot reproduce it on the current trunk (rev. 182263). The recent IRA patches might have fixed it. The code generated on the current trunk is

        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset 5, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register 5
        pushl   %edi
        .cfi_offset 7, -12
        movl    $1024, %ecx
        xorl    %eax, %eax
        movl    8(%ebp), %edi
        rep stosl
        movl    8(%ebp), %eax
        movl    $0, 4096(%eax)
        popl    %edi
        .cfi_restore 7
        popl    %ebp
        .cfi_restore 5
        .cfi_def_cfa 4, 4
        ret
[Bug rtl-optimization/50176] [4.7 Regression] 4.7 generates spill-fill dealing with char-int conversion
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50176 --- Comment #9 from Vladimir Makarov vmakarov at redhat dot com 2011-12-13 20:04:04 UTC ---

(In reply to comment #0)
> Created attachment 25088 [details]

After expand, 4.7 contains:

(insn 52 51 53 6 (set (reg:QI 83 [ D.2723 ])
        (mem:QI (plus:SI (reg/v/f:SI 75 [ inptr1 ])
                (reg/v:SI 117 [ col ])) [0 MEM[base: inptr1_19, index: col_90, offset: 0B]+0 S1 A8])) test_4_6.c:42 -1
     (nil))

and 4.6 contains

(insn 52 51 53 6 (parallel [
            (set (reg/v:SI 86 [ cb ])
                (zero_extend:SI (mem:QI (plus:SI (reg/v/f:SI 76 [ inptr1 ])
                            (reg/v:SI 78 [ col ])) [0 MEM[base: inptr1_19, index: col_22, offset: 0B]+0 S1 A8])))
            (clobber (reg:CC 17 flags))
        ]) test_4_6.c:42 -1
     (nil))

The reason for the different outcome in RA is that p83, generated by 4.7, can use only q regs, whereas p86, generated by 4.6, can use general regs. This halves the number of possible hard regs for p83 and leads to a failure to assign p83 a hard register. More precisely, IRA assigns dx to p83, then reload spills p83 because it needs a hard register, then reload asks IRA to reassign a hard register to p83, and IRA fails.
[Bug rtl-optimization/21617] [4.4/4.5/4.6/4.7 Regression] CRC64 algorithm optimization problem on Intel 32-bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21617 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-12-09 19:09:52 UTC ---

There is a small difference in the code which results in such degradation. -O1 generates this insn in the major loop:

(insn 43 42 44 5 /home/cygnus/vmakarov/build1/trunk/crctest64.c:241 (parallel [
            (set (reg/v:SI 77 [ __tab_index ])
                (xor:SI (reg:SI 108)
                    (reg:SI 120)))
            (clobber (reg:CC 17 flags))
        ]) 395 {*xorsi_1} (expr_list:REG_DEAD (reg:SI 108)
        (expr_list:REG_DEAD (reg:SI 120)
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

-O2 generates an analogous insn:

(insn 39 38 40 5 /home/cygnus/vmakarov/build1/trunk/crctest64.c:241 (parallel [
            (set (reg/v:SI 83 [ __tab_index ])
                (xor:SI (reg/v:SI 83 [ __tab_index ])
                    (reg:SI 143)))
            (clobber (reg:CC 17 flags))
        ]) 395 {*xorsi_1} (expr_list:REG_DEAD (reg:SI 143)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

The difference is caused by the regmove optimization. The RTL insn in the second variant looks even better, but it makes pseudo 83 the most frequently used one, so it is assigned first, having been pushed last onto the coloring stack among a bunch of trivially colorable pseudos. The set of trivially colorable pseudos contains two double-word pseudos which need two adjacent hard registers each. Assigning pseudo 83 first (the case is complicated further because some pseudos cross calls) leaves only one pair of adjacent hard registers; there are still 2 free hard registers for the second double-word pseudo, but they are not adjacent. This results in spilling one double-word pseudo and in degraded code performance. For -O1, the analog of pseudo 83 (p77) is assigned last, after the two double-word pseudos, and spilling does not occur.

To solve the problem we should increase the probability of keeping free hard registers adjacent. It can be done by pushing multi-word pseudos last onto the coloring stack, and as a consequence assigning them first, by modifying function bucket_allocno_compare_func. I did this and the problem was solved. Unfortunately, it results in a 2% performance degradation of SPEC2000 perlbmk, although there is a small code size improvement on SPEC2000 with this heuristic.

On a general note, register allocation is all about heuristics. So it is always possible to find a test where one heuristic works worse than another. The most important thing is that RA works well overall (on a big, credible set of tests). From this point of view IRA is much better than the previous register allocator. But because the crc code is important, I'll continue working on tuning which does not degrade SPEC2000 and which does solve the problem.
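The effect of assignment order on register adjacency can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: five hard registers, first-fit assignment, and a single-word pseudo with a preference for register 1. None of the names come from ira-color.c, and the real allocator is far more involved (costs, call-crossing pseudos, register classes).

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_HARD_REGS 5 /* assumed size of the toy register file */

/* Hypothetical description of a pseudo waiting for a hard register.  */
struct pseudo {
  int nregs;     /* 1 for single-word, 2 for double-word (needs adjacent regs) */
  int preferred; /* preferred starting hard register, or -1 for none */
};

/* True when registers start .. start+nregs-1 exist and are all free.  */
static bool range_free(const bool *used, int start, int nregs) {
  if (start < 0 || start + nregs > NUM_HARD_REGS)
    return false;
  for (int i = 0; i < nregs; i++)
    if (used[start + i])
      return false;
  return true;
}

/* Assign pseudos in the given order (preferred register first, then
   first fit) and return how many of them had to be spilled.  */
int count_spills(const struct pseudo *order, int n) {
  bool used[NUM_HARD_REGS] = {false};
  int spills = 0;

  for (int p = 0; p < n; p++) {
    int start = -1;
    if (order[p].preferred >= 0
        && range_free(used, order[p].preferred, order[p].nregs))
      start = order[p].preferred;
    else
      for (int r = 0; r < NUM_HARD_REGS && start < 0; r++)
        if (range_free(used, r, order[p].nregs))
          start = r;

    if (start < 0) {
      spills++; /* no run of adjacent free registers is left */
      continue;
    }
    for (int i = 0; i < order[p].nregs; i++)
      used[start + i] = true;
  }
  return spills;
}
```

Assigning the single-word pseudo first takes register 1, which leaves registers 0 and 4 free but not adjacent once the first double-word pseudo has taken 2 and 3, so the second double-word pseudo is spilled. Assigning both double-word pseudos first gives them 0-1 and 2-3 and still leaves register 4 for the single-word pseudo.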
[Bug other/50775] Register allocator sets up frame and frame pointer with low register pressure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50775 --- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-12-04 04:09:06 UTC ---

(In reply to comment #5)
> (In reply to comment #4)
> > Wrong profitable hard regs calculation for register files requiring an aligned start register was a merging problem with a patch for allocation without cover classes. I'll try to make a patch this week to solve the problem.
>
> Thanks for taking care of this. Will it also improve the situation for 3-byte types as introduced in PR50931? 3-byte types also start in even registers.

I think it will improve. Sorry for the delay with the patch. The changes are big (the patch is about 1700 lines long), so I need thorough testing.
[Bug other/50775] Register allocator sets up frame and frame pointer with low register pressure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50775 --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-11-28 21:48:20 UTC ---

(In reply to comment #2)
> Also, I have a question about the following fields of `ira_allocno':
>
>   /* The number of objects tracked in the following array. */
>   int num_objects;
>   /* An array of structures describing conflict information and live ranges for each object associated with the allocno. There may be more than one such object in cases where the allocno represents a multi-word register. */
>   ira_object_t objects[2];
>   --^
>
> The SImode for AVR consists of 4 words, but only 2 objects in allocno structure. Is this right ?

Yes, that is right. IRA objects were introduced by Bernd Schmidt. Unfortunately, I did not review his patch. Probably, Bernd decided that an allocno of 2 hard regs covers most cases (and maybe he is right). Any other multi-register allocno is processed as one object (it means that all one-register parts conflict with all one-register parts of another allocno, even if in reality one part does not conflict with the other allocno's part).

Wrong profitable hard regs calculation for register files requiring an aligned start register was a merging problem with a patch for allocation without cover classes. I'll try to make a patch this week to solve the problem.

Dennis, thanks for the detailed analysis of the problem. It saved me time.
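The consequence of tracking at most two objects per allocno can be sketched as follows. This is a hedged illustration of the rule as described in the comment, not IRA code: tracked_objects and word_conflicts are hypothetical helpers, and the exact condition GCC uses to decide when to create two objects is not shown in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed rule taken from the comment: only two-register allocnos get
   one object per register; every other allocno is a single object.  */
int tracked_objects(int nwords) {
  return nwords == 2 ? 2 : 1;
}

/* word_busy[i] says whether word i of the allocno really overlaps
   something live in another allocno.  The result says whether word w
   is *treated* as conflicting under the rule above.  */
bool word_conflicts(const bool *word_busy, int nwords, int w) {
  assert(w >= 0 && w < nwords);

  /* One object per word: conflicts are tracked precisely.  */
  if (tracked_objects(nwords) == nwords)
    return word_busy[w];

  /* A single object for the whole allocno: a conflict in any word
     counts as a conflict for every word.  */
  for (int i = 0; i < nwords; i++)
    if (word_busy[i])
      return true;
  return false;
}
```

For a 4-register value such as SImode on AVR, a real conflict in word 0 makes word 3 look conflicting as well, whereas for a 2-register value the second word stays conflict free.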
[Bug rtl-optimization/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #5 from Vladimir Makarov vmakarov at redhat dot com 2011-11-24 03:29:09 UTC ---

The following code is generated before RA:

...
(insn 7 3 11 2 (set (reg:V4DF 63)
        (unspec:V4DF [
                (reg/v:V2DF 62 [ x ])
            ] UNSPEC_CAST)) ./include/avxintrin.h:1413 1960 {avx_pd256_pd}
     (nil))
(insn 11 7 17 2 (set (reg:V4DF 65)
        (vec_concat:V4DF (vec_select:V2DF (reg:V4DF 63)
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                    ]))
            (reg/v:V2DF 62 [ x ]))) ./include/avxintrin.h:715 1933 {vec_set_hi_v4df}
     (expr_list:REG_DEAD (reg:V4DF 63)
        (expr_list:REG_DEAD (reg/v:V2DF 62 [ x ])
            (nil))))
...

First of all, the unspec in insn 7 hides that 63 and 62 have the same value. But even if the unspec were absent, IRA, like most other RAs, finds conflicts based on live ranges, not on the values in the pseudos. Finding conflicts based on GVN is very expensive and gives nothing on most code (I did GVN-based conflict recognition about 8 years ago; it is described in the 2nd GCC summit proceedings, if I remember correctly). As 62 and 63 conflict, they get different hard registers.

I guess that the right RTL generation (using one pseudo for 62 and 63) should be done somewhere outside IRA.
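A conflict test based purely on live ranges can be sketched in a few lines. The struct and function below are hypothetical and are not IRA's representation; the program points used in the example (insn numbers 3, 7 and 11) are assumptions chosen to resemble the dump above, where pseudo 62 is still live at insn 11 and pseudo 63 is set at insn 7 and dies at insn 11.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical live range: the pseudo is live from start to finish,
   both inclusive, measured in program points.  */
struct live_range {
  int start;
  int finish;
};

/* Two pseudos conflict when their live ranges overlap.  The values
   held by the pseudos are never consulted, so two pseudos that carry
   the same value still conflict and get different hard registers.  */
bool ranges_conflict(struct live_range a, struct live_range b) {
  return a.start <= b.finish && b.start <= a.finish;
}
```

With pseudo 62 live over points 3..11 and pseudo 63 live over 7..11, the ranges overlap and the test reports a conflict, which matches the extra copy seen in this PR.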
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829 --- Comment #7 from Vladimir Makarov vmakarov at redhat dot com 2011-11-24 03:45:24 UTC --- As for stack allocation: crtl->stack_realign_needed == 1 results in frame_pointer_needed := 1 in ira.c::ira_setup_eliminable_regset. I don't remember the origin of the code. Probably it is from HJ's stack aligning work. Sorry if I am wrong. I guess we should re-evaluate frame_pointer_needed at the end of RA if we don't allocate any memory during the whole RA.
[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2011-08-24 16:02:57 UTC --- Yesterday I sent a patch http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01954.html which most probably solved the problem. Now I have code size 419 (gcc 4.6) vs 411 (gcc as of Aug 24) bytes for the test.
[Bug bootstrap/50146] [4.7 regression] unused variable saved_nregs in ira-color.c broke arm-linux-gnueabi bootstrap
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50146 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2011-08-22 03:32:18 UTC ---

(In reply to comment #0)
> gcc-4.7-20110820 fails to bootstrap on arm-linux-gnueabi with:
>
> The issue is that saved_nregs is declared unconditionally but used #ifndef HONOR_REG_ALLOC_ORDER. The ARM backend does define HONOR_REG_ALLOC_ORDER, so the warning is expected.
>
> I'm testing the following fix:
>
> --- gcc-4.7-20110820/gcc/ira-color.c.~1~ 2011-08-18 16:56:36.0 +0200
> +++ gcc-4.7-20110820/gcc/ira-color.c 2011-08-21 19:11:00.0 +0200
> @@ -1567,13 +1567,14 @@ static bool
>  assign_hard_reg (ira_allocno_t a, bool retry_p)
>  {
>    HARD_REG_SET conflicting_regs[2], profitable_hard_regs[2];
> -  int i, j, hard_regno, best_hard_regno, class_size, saved_nregs;
> +  int i, j, hard_regno, best_hard_regno, class_size;
>    int cost, mem_cost, min_cost, full_cost, min_full_cost, nwords, word;
>    int *a_costs;
>    enum reg_class aclass;
>    enum machine_mode mode;
>    static int costs[FIRST_PSEUDO_REGISTER], full_costs[FIRST_PSEUDO_REGISTER];
>  #ifndef HONOR_REG_ALLOC_ORDER
> +  int saved_nregs;
>    enum reg_class rclass;
>    int add_cost;
>  #endif

Sorry, my bad. It is from my patch for PR50107. The patch is ok, so you can commit it to the trunk. Thank you.
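The pattern behind this bootstrap failure can be shown in isolation. In the sketch below only the macro name HONOR_REG_ALLOC_ORDER and the variable name saved_nregs come from the thread; pick_cost and its arithmetic are made up for illustration. Declaring the variable inside the same conditional block that uses it means a target that defines the macro never sees an unused variable, which matters when warnings are treated as errors.

```c
#include <assert.h>

/* Hypothetical function; it only demonstrates where the declaration
   has to live.  */
int pick_cost(int base_cost, int nregs) {
  int cost = base_cost;

#ifndef HONOR_REG_ALLOC_ORDER
  /* Declared only in the configuration that actually uses it.  */
  int saved_nregs = nregs;
  cost += saved_nregs;
#else
  (void) nregs; /* parameter is intentionally unused on such targets */
#endif

  return cost;
}
```

Compiling this with and without -DHONOR_REG_ALLOC_ORDER should produce no unused-variable warning in either configuration.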
[Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107 --- Comment #14 from Vladimir Makarov vmakarov at redhat dot com 2011-08-19 16:12:48 UTC ---

(In reply to comment #11)
> (In reply to comment #10)
> > >         movq    %rdi, %rdx
> > >         mulx    %rsi, %rax, %rsi
> > >         movq    %rsi, %rdx
> > >         ret
> > >         .cfi_endproc
> > > .LFE0:
> > >         .size   test_mul_64, .-test_mul_64
> > >         .ident  "GCC: (GNU) 4.7.0 20110818 (experimental)"
> > >         .section        .note.GNU-stack,"",@progbits
> > > [hjl@gnu-6 pr50107]$
> > >
> > > I would expect
> > >
> > >         movq    %rdi, %rdx
> > >         mulx    %rsi, %rax, %rdx
> > >         ret
> >
> > I think it is a reload problem. IRA assigns dx to pseudo 71 (an insn output) but reload then spills it.
>
> uti-2.i.188r.asmcons has
>
> (insn 11 4 24 2 (parallel [
>             (set (reg:DI 72)
>                 (mult:DI (reg/v:DI 64 [ b ])
>                     (reg/v:DI 63 [ a ])))
>             (set (reg:DI 73 [+8 ])
>                 (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 64 [ b ]))
>                             (zero_extend:TI (reg/v:DI 63 [ a ])))
>                         (const_int 64 [0x40]))))
>         ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
>      (expr_list:REG_DEAD (reg/v:DI 64 [ b ])
>         (expr_list:REG_DEAD (reg/v:DI 63 [ a ])
>             (nil))))
>
> uti-2.i.191r.ira generates:
>
> (insn 11 28 25 2 (parallel [
>             (set (reg:DI 0 ax [72])
>                 (mult:DI (reg/v:DI 4 si [orig:64 b ] [64])
>                     (reg:DI 1 dx)))
>             (set (reg:DI 4 si [orig:73+8 ] [73])
>                 (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 4 si [orig:64 b ] [64]))
>                             (zero_extend:TI (reg:DI 1 dx)))
>                         (const_int 64 [0x40]))))
>         ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
>      (nil))
>
> Why does IRA/reload choose SI for pseudo 73?

IRA assigns dx to pseudo 73. Then the reload pass needs dx for pseudo 63, so reload spills 73 and reassigns it to si. The reload pass spills pseudo 73 because it believes that pseudos which live through the insn, or die or are set in it (pseudo 73 is set), conflict with the necessary reload.

Of course it is not really necessary to spill pseudo 73, but teaching the reload pass that is a big, error-prone project. I would not recommend starting it. I myself am not interested in working on the reload pass. Instead I prefer to work on LRA (local RA), which is a replacement for the reload pass.
[Bug rtl-optimization/49890] IRA spill with plenty of available registers
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49890 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2011-08-18 16:12:45 UTC --- IRA removes some classes from consideration on the 2nd pass to speed up cost calculation, which is very time consuming. IRA did this in too optimistic a way. That is the reason for the problem. I'll send a patch which removes classes in a more conservative way and fixes the problem.
[Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107 --- Comment #10 from Vladimir Makarov vmakarov at redhat dot com 2011-08-18 18:24:42 UTC ---

(In reply to comment #9)
> With revision 177865 + MULX change, I got
>
> [hjl@gnu-6 pr50107]$ cat uti-2.i
> unsigned __int128
> test_mul_64 (unsigned long long a, unsigned long long b)
> {
>   return (unsigned __int128) a*b;
> }
> [hjl@gnu-6 pr50107]$ make uti-2.s
> /export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -o uti-2.s -O2 -mbmi2 uti-2.i
> [hjl@gnu-6 pr50107]$ cat uti-2.s
>         .file   "uti-2.i"
>         .text
>         .p2align 4,,15
>         .globl  test_mul_64
>         .type   test_mul_64, @function
> test_mul_64:
> .LFB0:
>         .cfi_startproc
>         movq    %rdi, %rdx
>         mulx    %rsi, %rax, %rsi
>         movq    %rsi, %rdx
>         ret
>         .cfi_endproc
> .LFE0:
>         .size   test_mul_64, .-test_mul_64
>         .ident  "GCC: (GNU) 4.7.0 20110818 (experimental)"
>         .section        .note.GNU-stack,"",@progbits
> [hjl@gnu-6 pr50107]$
>
> I would expect
>
>         movq    %rdi, %rdx
>         mulx    %rsi, %rax, %rdx
>         ret

I think it is a reload problem. IRA assigns dx to pseudo 71 (an insn output) but reload then spills it.
[Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107 --- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2011-08-17 17:16:11 UTC --- I guess something is wrong with hard register preferencing for multi-register pseudos in ira-color.c::ira_assign. I believe it works fine for one-register pseudos. I'll look at this. Thanks for reporting. By the way, your patch is wrong. There should be TARGET_64BIT in the define_split instead of !TARGET_64BIT.
[Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107 --- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-08-17 22:21:13 UTC ---

(In reply to comment #4)
> Created attachment 25038 [details]
> A patch
>
> This patch generates:
>
>         movq    %rdi, %rdx
>         mulx    %rsi, %r10, %r9
>         addq    $3, %r9
>         adcq    $0, %r10
>         movq    %r9, k2(%rip)
>         movq    %r9, %rax
>         movq    %r10, k2+8(%rip)
>         movq    %r10, %rdx
>         ret

I don't think it is a good patch (changing the register allocation order) because it prefers the new x86-64 registers and results in longer insns and bigger code for many programs. I am working on a patch to fix it in IRA. I found a typo which is the reason for such behaviour. I think it will be ready tomorrow.
[Bug rtl-optimization/49936] [4.7 Regression] IRA handles CANNOT_CHANGE_MODE_CLASS poorly, + spills to memory on 4.7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936 --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-08-16 17:27:12 UTC ---

(In reply to comment #3)
> Hmmm. Is it possible to make the INT/memory/whatever decision based on move costs? Or use a target hook to supply a hint about what to do?

I think I can restore the 4.6 behaviour by assigning GR_REGS for accum. I'll try to do a patch this week. Such patches need a lot of testing, so I hope it will be on the trunk next week.
[Bug rtl-optimization/49936] [4.7 Regression] IRA handles CANNOT_CHANGE_MODE_CLASS poorly, + spills to memory on 4.7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936 --- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2011-08-16 02:05:02 UTC --- After thorough investigation of the problem I came to the conclusion that fixing it in IRA requires forming regions based on pseudo mode usage too (besides just register pressure). Allocnos for the pseudo in question should get different classes (FP class inside the loop and INT outside). The problem is that IRA was written on the assumption that the register class of all allocnos for a pseudo is the same. It needs a lot of changes besides new code for forming regions based on mode. I'll try to do this but it will take a long time. If it does not work, I could try to restore the 4.6 behaviour (assigning INT class instead of memory).
[Bug rtl-optimization/48633] [4.7 regression] IRA causes ICE in compensate_edge
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48633 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-05-13 15:26:57 UTC --- Michael, thanks for the analysis and the smaller test. It saved me a lot of time. I made a patch to fix the bug and will submit it after testing. I should say that the last big IRA patch did not create the bug; it just triggered it.
[Bug rtl-optimization/48971] [4.7 regression] ICE with -msoft-float -O2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48971 --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-05-13 15:34:39 UTC ---

(In reply to comment #3)
> (In reply to comment #2)
> > Vlad, this is an abort in setup_pressure_classes which apparently is totally broken for sparc -msoft-float.
>
> I found the wrong code. It is pretty simple but I need to check a few platforms because the fix might affect other platform builds. I hope I'll send the patch at the end of the day.

The SPARC ICC register is present only in the ALL_REGS class, which cannot be a pressure class. That is the reason for the problem. I also found a typo in the check code (it collected hard registers of all non-pressure classes although it should collect the hard registers of the pressure classes). I found a further complication with the check code on the MIPS target. So it took more time than I expected. Currently I am testing the patch and will submit it for approval soon.
[Bug rtl-optimization/48971] [4.7 regression] ICE with -msoft-float -O2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48971 --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-05-12 17:57:58 UTC ---

(In reply to comment #2)
> Vlad, this is an abort in setup_pressure_classes which apparently is totally broken for sparc -msoft-float.

I found the wrong code. It is pretty simple but I need to check a few platforms because the fix might affect other platform builds. I hope I'll send the patch at the end of the day.
[Bug rtl-optimization/48455] [4.7 Regression] Huge code size regression for all ARM configurations
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48455 --- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-04-13 15:21:23 UTC --- I found one problem with reg equivalences: they were simply ignored. It is a result of a bad merge of the big IRA patch with the changes made in IRA over the last half year. Fixing the problem improves the code size (at least for -O2). I'll send the patch today. But I guess it does not solve all of the code size degradation, so I will continue my work on the PR. Thanks for the small tests, Richard. They saved me a lot of time.
[Bug rtl-optimization/48496] [4.7 Regression] 'asm' operand requires impossible reload in libffi/src/ia64/ffi.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48496 --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-04-11 17:11:37 UTC ---

The new big IRA patch just triggered a latent reload bug. The code in question is in function reload_as_needed:

      /* If this was an ASM, make sure that all the reload insns
         we have generated are valid.  If not, give an error
         and delete them.  */
      if (asm_noperands (PATTERN (insn)) >= 0)
        for (p = NEXT_INSN (prev); p != next; p = NEXT_INSN (p))
          if (p != insn && INSN_P (p)
              && GET_CODE (PATTERN (p)) != USE
              && (recog_memoized (p) < 0
                  || (extract_insn (p), ! constrain_operands (1))))
            {
              error_for_asm (insn,
                             "%<asm%> operand requires "
                             "impossible reload");
              delete_insn (p);
            }
    }

A previous insn P has a spilled pseudo, and that results in the error because spilled pseudos are replaced by memory only later. I guess the above code is wrong if a previous insn has a spilled pseudo.

The bug did not occur before the big IRA patch because the pseudo in question happened not to be spilled. I should mention that it is more profitable to spill the pseudo, and the new IRA makes the right decision (which results in live range shrinkage and decreased register pressure).

I could make a patch (preventing the error if there are spilled pseudos in insn P), but I think that the reload maintainers would do it differently (e.g. by moving the check after spilled pseudos are replaced by memory) or would make a better patch.
[Bug middle-end/48464] [4.7 Regression] @171649: ICE in setup_pressure_classes, at ira.c:877
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48464 --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-04-11 18:44:59 UTC --- There is a typo in a loop condition which results in taking the hard registers of LIM_REG_CLASSES, which happen to be garbage for VAX. I'll send a patch soon.
[Bug rtl-optimization/48272] internal compiler error: in setup_insn_reg_pressure_info, at haifa-sched.c:1124
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48272 --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-04-07 21:22:49 UTC ---

(In reply to comment #2)
> Confirmed (nice non-sensical set of options, btw.)
>
> The problem is that register pressure code is not prepared for new insns created during scheduling (for ia64, this is speculation checks and recovery code). The ICE happens because we do not initialize register pressure structures.
>
> The below patch seems to fix it, but I am not sure it is correct. The patch calls setup_insn_reg_pressure_info (renamed to init_insn_reg_pressure_info because there is the function with the same name in haifa-sched.c) from within haifa_init_insn, where new instructions created during scheduling are initialized. The patch does not call setup_insn_reg_uses as sched_analyze_insn does, because there is no deps context at that point. If some processing of this kind is desired, I guess we need to amend the functions that copy/init dependencies for recovery code (that is, create_check_block_twin and add_to_speculative_block). Finally, better name for init_insn_reg_pressure_info should be devised.
>
> Vlad, it would be great if you can advise me on how to improve the patch.

It is good enough. You can commit it, of course with a proper changelog entry. Thanks, Andrey.
[Bug rtl-optimization/48455] [4.7 Regression] Huge code size regression for all ARM configurations
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48455 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-04-06 16:58:02 UTC --- That is a huge degradation. I am going to work on it. Could you provide me with a small test? I cannot even download CSiBE. Something is wrong with their web server.
[Bug inline-asm/48435] [4.7 Regression] Assertion failure during IRA (df_scan)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48435 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-04-06 19:42:56 UTC --- All pseudos got 0 available hard regs and therefore were spilled. Something is wrong with the calculation of the number of available hard regs for targets which can use reg pairs starting only on even/odd hard regs. The fix will need changes in a very sensitive part of the IRA code, and it needs some time to write, test, and benchmark. I hope it will be done by the end of the week. Sorry for the inconvenience.
[Bug target/48366] [4.7 Regression] ICE in extract_constrain_insn_cached, at recog.c:2024
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48366 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-04-03 18:18:03 UTC --- John, thanks for reporting the PR and working on it. I guess that the last patch I sent (for pr48380) should solve this problem too. Unfortunately, I have not got an approval for the patch yet. I'd recommend that you check that patch first; it might save you a lot of time because the problem occurs in reload, and reload is hard to analyze. But the real cause of the problem is wrong directions from IRA.
[Bug target/48380] [gcc-4.7 regression] ICE in postreload.c while building trunk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48380 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-04-01 20:47:39 UTC --- We have the following situation:
- a pseudo has an equivalent constant.
- a loop allocno corresponding to the pseudo got a hard reg and the subloop allocno got memory.
- the load generated by IRA on the loop/subloop border is not removed.
- the loop allocno is spilled in reload, transforming the load into a mem-mem move.
- reload skips processing the move because it sets up a regno with an equiv constant.
- gcc dies in postreload.
There are several possible solutions, but the best would be to remove, in reload, the load that was transformed into a mem-mem move. We need to add the load to the equiv init insns. I'll submit a patch solving the problem soon.
[Bug rtl-optimization/48381] [4.7 Regression] internal compiler error: in check_allocation, at ira.c:2094
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48381 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2011-03-31 19:06:57 UTC --- Jakub, thanks for reducing the test. It really saved me time. We have the following situation:
  Allocno a7r74 of CREG(1) has 1 avail. regs 2 (confl regs = 0 1 3-51 ) node: 2
  ...
  Allocno a10r80 of GENERAL_REGS(6) has 5 avail. regs obj 0 0-5 (confl regs = 6-51 ) node: 0-5, obj 1 0-5 (confl regs = 6-51 ) node: 0-5
  ...
  Popping a7(r74,l0) -- assign reg 2
  ..
  Popping a10(r80,l0) -- (memory is more profitable 7000 vs 2147483647) spill
  ...
  Spilling a7r74 for a10r80
  Assigning 1 to a10r80
  a7(r74,l0) -- assign hard reg 2
r74 got the C reg and r80 was spilled. Then function improve_allocation decides that spilling r74 and assigning a hard reg to r80 is more profitable. At the end of its work the function tries to assign another hard reg to r74 and assigns the C reg again, which is wrong because r80 needs two hard registers, DX and CX. The wrong assignment is caused by wrong code in the function (assign_hard_reg) searching for conflicting hard regs. The code deciding that conflicting allocnos really conflict looks like
  hard_regno = ALLOCNO_HARD_REGNO (conflict_a);
  if (hard_regno >= 0 && ira_class_hard_reg_index[aclass][hard_regno] >= 0)
where aclass is the class of r74 (CREG) and hard_regno is the hard regno of r80 (DX). This code was ok for cover classes. It should be different when different classes of allocnos can intersect. I'll send a patch to solve the PR soon.
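The failure mode described in this comment can be illustrated with a small model. This is only a sketch, not the code from ira-color.c: the function names, the bitmask representation of a register class, and the limit of 32 registers are assumptions made for illustration, and the register numbers (DX = 1, CX = 2) are read off the dump quoted in the comment. It contrasts a test that looks only at the first hard register of a conflicting allocno with one that looks at every hard register the allocno occupies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a register class is a bitmask of hard register
   numbers (bit N set means hard register N belongs to the class).  */
typedef unsigned int reg_mask;

/* Flawed test: the conflicting allocno is considered only if its FIRST
   hard register belongs to the class being allocated.  */
bool conflicts_first_reg_only(reg_mask aclass, int hard_regno)
{
  assert(hard_regno < 32);
  return hard_regno >= 0 && ((aclass >> hard_regno) & 1u) != 0;
}

/* Safer test: look at every hard register occupied by the conflicting
   allocno, i.e. hard_regno .. hard_regno + nregs - 1.  */
bool conflicts_any_reg(reg_mask aclass, int hard_regno, int nregs)
{
  assert(nregs > 0 && hard_regno + nregs <= 32);
  if (hard_regno < 0)
    return false;
  for (int i = 0; i < nregs; i++)
    if ((aclass >> (hard_regno + i)) & 1u)
      return true;
  return false;
}
```

With the class modelled as `1u << 2` (CX only) and the conflicting pseudo placed in DX (1) but occupying two registers, the first test reports no conflict while the second one does, which matches the wrong assignment described in the comment.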
[Bug middle-end/48367] [4.7 Regression] 200.sixtrack/301.apsi in SPEC CPU 2000 are miscompiled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48367 --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-03-30 23:17:58 UTC --- I've started to work on this. It will probably take a day or two to fix: it is hard to find wrong code in a program as big as apsi.
[Bug middle-end/48367] [4.7 Regression] 200.sixtrack/301.apsi in SPEC CPU 2000 are miscompiled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48367 --- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-03-31 01:05:33 UTC --- The problem was a typo in ira-costs.c which in some cases results in assigning INT_MAX to memory_cost and, as a consequence, ALL_REGS to some allocnos. After some optimizations, an allocno which got a hard reg, corresponds to a loop containing subloops, and is never referenced in its loop is spilled in function move_spill_restore; because it is never referenced in the loop, it gets zero costs for all hard regs. In reload, the allocno is assigned through IRA to an mmx hard register, which is corrupted by sse register usage elsewhere in the program. I'll send a patch soon to fix this.
[Bug rtl-optimization/48331] [4.7 Regression] gcc.c-torture/execute/built-in-setjmp.c FAILs with -O -fira-algorithm=priority -fPIC
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48331 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-03-29 15:07:46 UTC --- (In reply to comment #2) It started with http://gcc.gnu.org/viewcvs?view=revision&revision=171649 I don't know what's the status of this allocator (how near is its end), nor if there are any targets that have to use it as CB's allocator doesn't work for them. Thanks for reporting. The patch permits ports which had to use the priority allocator to use the CB allocator. The modified CB allocator is expected to perform better on those ports than the priority one. In the long run, priority coloring will be removed. I'd recommend that maintainers of the ports using priority coloring check CB coloring and plan to switch to it by default. The changes in IRA are big and complex and will probably cause problems for some ports for some time, because RA is the most machine-dependent part of the compiler. Therefore the patch was committed to the trunk at the beginning of stage1 to leave more time to fix all the problems. Meanwhile, I am going to work on fixing this PR.
[Bug rtl-optimization/48345] [4.7 Regression] [SH] Invalid float register allocated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48345 --- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2011-03-29 23:59:08 UTC --- (In reply to comment #0) It seems that ira-color.c:assign_hard_reg chooses a register of which corresponding bit of ira_prohibited_class_mode_regs [FP_REGS][DFmode] is set. The patch below looks to work for me, though I'm suspecting the real problem is in the target side.
--- ORIG/trunk/gcc/ira-color.c	2011-03-29 10:08:17.0 +0900
+++ LOCAL/trunk/gcc/ira-color.c	2011-03-29 15:09:06.0 +0900
@@ -1692,6 +1692,9 @@ assign_hard_reg (ira_allocno_t a, bool r
 	      && FIRST_STACK_REG <= hard_regno && hard_regno <= LAST_STACK_REG)
 	    continue;
 #endif
+	  if (TEST_HARD_REG_BIT (ira_prohibited_class_mode_regs[aclass][mode],
+				 hard_regno))
+	    continue;
 	  if (! check_hard_reg_p (a, hard_regno,
 				  conflicting_regs, profitable_hard_regs))
 	    continue;
The patch is ok for me. This code was accidentally lost on the ira-improv branch. Could you commit the patch (with a proper changelog entry, of course)? I am approving the patch. Thanks.
[Bug rtl-optimization/48345] [4.7 Regression] [SH] Invalid float register allocated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48345 --- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-03-30 00:49:50 UTC --- (In reply to comment #3) Thanks! Is
	PR rtl-optimization/48345
	* ira-color.c (assign_hard_reg): Skip prohibited hard registers for given class and mode.
OK for ChangeLog? Sorry, Kazumoto. Please do not commit the patch. The problem is a bit deeper than I thought. The profitable hard regs should exclude the hard regs prohibited for the given mode. That is true for the major allocation. The wrong register is assigned during secondary allocation (after flattening the IRA IR or during reload), where the profitable hard regs are not defined properly. So the fix should contain code setting the profitable hard regs properly. I'll create a patch soon. Sorry again for jumping to a wrong conclusion.
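The relationship stated in this comment (profitable hard regs must already exclude the regs prohibited for the mode, so that every later choice is safe) can be sketched with plain bitmasks. This is an illustrative model under assumptions, not IRA code: `hard_reg_set` as a 64-bit mask and all three function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per hard register, at most 64 registers.  */
typedef uint64_t hard_reg_set;

/* Return the set containing only hard register REGNO.  */
hard_reg_set hard_reg_bit(int regno)
{
  assert(regno >= 0 && regno < 64);
  return (hard_reg_set)1 << regno;
}

/* Profitable registers: members of the allocno class that are not
   prohibited for the allocno's mode.  */
hard_reg_set profitable_hard_regs(hard_reg_set class_regs,
                                  hard_reg_set prohibited_for_mode)
{
  return class_regs & ~prohibited_for_mode;
}

/* Pick the lowest-numbered profitable register that does not conflict;
   return -1 when none is left (the allocno would be spilled).  */
int pick_hard_reg(hard_reg_set profitable, hard_reg_set conflicting)
{
  hard_reg_set candidates = profitable & ~conflicting;
  for (int regno = 0; regno < 64; regno++)
    if ((candidates >> regno) & 1u)
      return regno;
  return -1;
}
```

If the profitable set is computed this way for both the major and the secondary allocation, a separate prohibited-register test inside the selection loop is no longer needed in this model.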
[Bug middle-end/48342] [4.7 Regression] Failures on powerpc-apple-darwin9 at revision 171653
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48342 --- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2011-03-30 01:26:26 UTC --- I've just submitted a patch for approval to solve the problem. I hope it will be fixed soon. Thanks for the report.
[Bug rtl-optimization/46920] suboptimal register allocation with local register variables
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46920 --- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2010-12-14 16:02:09 UTC --- (In reply to comment #2) To generate the proposed code, we should assign r12 to p63. IRA marks p63 as conflicting with r12 because the DF infrastructure reports that r12 has a live range intersecting with that of p63. It is possible to solve the problem if we have conflicts based on values (not live ranges). I would not recommend doing that, because it will slow down RA without visible improvement on the majority of benchmarks (I did such an experiment about 7 years ago and reported the results at the GCC summit in 2004). One alternative is to rematerialize values that have been copied to a hard register before their uses (by inserting an r12:DI=r63:DI before the use of r63). This breaks the live ranges of the pseudos and facilitates coalescing. I would not call it rematerialization. I think it is more like live range shrinking (LRS) of a hard register through additional copies. It is an interesting idea (I partially investigated LRS about 6 years ago). Probably I should think about this again. Thanks, Paolo. By the way, usage of implicit hard registers in RTL when it can be avoided is a very bad idea for RA (an example where hard registers can be avoided is their usage as call arguments). I see a lot of such code on x86-64. I'd recommend preventing optimizations before RA from abusing hard register usage. As I said, the improvement from hard register variable here is 25% on x86-64 and probably more (I can collect data) on i386. This testcase is distilled from a bytecode interpreter. Paolo, I did not mean that you should avoid using a hard register in this particular case. I just wrote that I have seen a lot of x86-64 code where hard registers were propagated, and that is bad for RA. I never had an opportunity to investigate which optimization does it. Again, by the way :).
My experience with implementing interpreters is that computed gotos do not work well with modern OOO processors (especially when there are a lot of such labels) because of worse branch prediction. I found that a switch statement works better. But I guess it is not your goal to rewrite the interpreter.
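For readers unfamiliar with the dispatch style recommended here, the following is a minimal sketch of a switch-based bytecode loop. It is not taken from the interpreter discussed in the PR; the opcode set, the stack size, and the function name `run` are all hypothetical, and the sketch makes no performance claim of its own.

```c
#include <assert.h>

/* Hypothetical opcodes for a tiny stack machine.  */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Execute CODE until OP_HALT and return the value on top of the stack.
   All dispatch goes through one switch statement instead of computed
   gotos.  */
int run(const int *code)
{
  int stack[STACK_MAX];
  int sp = 0;

  for (;;) {
    switch (*code++) {
    case OP_PUSH:               /* operand follows the opcode */
      assert(sp < STACK_MAX);
      stack[sp++] = *code++;
      break;
    case OP_ADD:
      assert(sp >= 2);
      sp--;
      stack[sp - 1] += stack[sp];
      break;
    case OP_MUL:
      assert(sp >= 2);
      sp--;
      stack[sp - 1] *= stack[sp];
      break;
    case OP_HALT:
      assert(sp >= 1);
      return stack[sp - 1];
    default:
      assert(!"unknown opcode");
      return -1;
    }
  }
}
```

A program such as `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` computes (2 + 3) * 4 with this loop.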
[Bug rtl-optimization/46920] suboptimal register allocation with local register variables
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46920 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2010-12-13 21:03:48 UTC --- Before IRA we have the following code
L43:
   25 NOTE_INSN_BASIC_BLOCK
   26 pc=r59:DI REG_DEAD: r59:DI
 i 27: barrier
L28:
   29 NOTE_INSN_BASIC_BLOCK
   30 r62:DI=r12:DI REG_DEAD: r12:DI
   31 {r63:DI=r62:DI+0x2;clobber flags:CC;} REG_UNUSED: flags:CC
   32 r12:DI=r63:DI
   33 flags:CCZ=cmp([r62:DI+0x2],0) REG_DEAD: r62:DI
   34 pc={(flags:CCZ==0)?L39:pc} REG_DEAD: flags:CCZ REG_BR_PROB: 0x1388
   35 NOTE_INSN_BASIC_BLOCK
   36 r76:DI=sign_extend(r65:SI) REG_DEAD: r65:SI
   37 {r63:DI=r63:DI+r76:DI;clobber flags:CC;} REG_DEAD: r76:DI REG_UNUSED: flags:CC
   38 r12:DI=r63:DI
L39:
   40 NOTE_INSN_BASIC_BLOCK
   41 r65:SI=sign_extend([r63:DI+0x1]) REG_DEAD: r63:DI
   42 r59:DI=[r71:DI]
   61 pc=L43
To generate the proposed code, we should assign r12 to p63. IRA marks p63 as conflicting with r12 because the DF infrastructure reports that r12 has a live range intersecting with that of p63. It is possible to solve the problem if we have conflicts based on values (not live ranges). I would not recommend doing that, because it will slow down RA without visible improvement on the majority of benchmarks (I did such an experiment about 7 years ago and reported the results at the GCC summit in 2004). By the way, usage of implicit hard registers in RTL when it can be avoided is a very bad idea for RA (an example where hard registers can be avoided is their usage as call arguments). I see a lot of such code on x86-64. I'd recommend preventing optimizations before RA from abusing hard register usage.
[Bug rtl-optimization/46829] ICE: in spill_failure, at reload1.c:2105 with -fschedule-insns -fsched-pressure and variadic function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829 --- Comment #10 from Vladimir Makarov vmakarov at redhat dot com 2010-12-10 15:45:33 UTC --- (In reply to comment #9) (In reply to comment #5) It should work for x86_64, not necessarily i?86. Do you mean -fsched-pressure should be able to solve the problem completely for x86-64? Vladimir: Do you have any idea which direction to go in order to solve this problem? The introduction of -fsched-pressure just decreased the probability of the bug when 1st insn scheduling is used. The patch introducing -fsched-pressure contained some code in reload to decrease the probability even more. Unfortunately, it did not eliminate it fully. This bug cannot be fixed in the scheduler (or the solution, such as not moving insns past an insn referring to a hard register, would be too conservative, especially for x86_64, and still would not fix it for x86) because the scheduler cannot see all the info handled by reload. IMHO, the right fix would be the ability to split live ranges of explicitly mentioned hard registers. Maybe Jeff Law's current work will provide such a feature. In any case, implementing live range splitting in reload is too big and complicated a job even for stage #1. There is no way to implement it at stage #3. It would also be very unreasonable, because any change in reload is usually very bug prone. I am sorry, but I don't see how it can be fixed in gcc4.6 for x86/x86-64, although it might be fixed in gcc4.7.
[Bug target/42536] [4.4/4.5/4.6 regression] ICE in spill_failure, at reload1.c:2141
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42536 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2010-11-29 20:39:39 UTC --- (In reply to comment #4) Jeff/Vlad, how hard would it be to try to split the insn into two insns instead of a spill failure (for insns using a MEM whose address uses more than one hard register) - one which forces the address into register (assuming it is supported) and the store (or load) which would use a simpler address form? If it is done in reload (and IMHO that is the right place to do it), I think it would be hard. It needs a person with a good knowledge of reload. It is also possible to do some splitting in other parts of the compiler, but that would be an approximate solution (meaning that not all such cases would be avoided and/or it would hurt performance in the general case).
[Bug rtl-optimization/44249] [4.4/4.5/4.6 Regression] IRA generates extra register move
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44249 --- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2010-11-24 17:40:56 UTC --- Reload creates an additional insn for insn
(insn 9 7 11 2 (parallel [
            (set (reg:DI 71)
                (lshiftrt:DI (reg/v:DI 60 [ tag ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) b.i:5 533 {*lshrdi3_1}
     (expr_list:REG_DEAD (reg/v:DI 60 [ tag ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
That is because r60 and r71 got different registers (0 and 1), even though there is a copy between r71 and r60 which should result in r71 getting hard register 0, the same as r60. It does not happen because r68 already got 0 and it conflicts with r71:
    r71: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS
    r68: preferred AREG, alternative GENERAL_REGS, cover GENERAL_REGS
    r60: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS
;; a0(r68,l0) conflicts: a1(r71,l0)
;; a4(r67,l0) conflicts:
  cp0:a1(r71)<->a3(r60)@1000:constraint
      Popping a0(r68,l0) -- assign reg 0
      Popping a3(r60,l0) -- assign reg 0
      Popping a1(r71,l0) -- assign reg 1
The analogous insn for gcc-4.3 looks like
(insn:HI 9 7 11 2 b.i:4 (parallel [
            (set (reg/v:DI 58 [ tag ])
                (lshiftrt:DI (reg/v:DI 58 [ tag ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) 514 {*lshrdi3_1_rex64} (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
It means gcc-4.3 has no such problem as gcc4.4+. Insn 9 for gcc-4.3 is the result of a regmove transformation. I have no idea why regmove (which is present in gcc4.4+) does not do the same for gcc4.4+ (probably because of some changes since 4.3). The problem could be fixed in regmove or in IRA (which is probably harder). But I don't know whether it is worth doing, because such transformations result in longer live ranges of pseudos and might result in worse code for other programs.
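The interaction between the copy preference and the conflict in this comment can be modelled in a few lines. This is only a sketch under assumptions, not IRA's coloring code: the function name `choose_hard_reg`, the conflict bitmask, and the lowest-free-register fallback are all hypothetical simplifications.

```c
#include <assert.h>

/* Hypothetical model of one coloring step.  PREFERRED is the hard register
   of the copy partner (or -1 if there is none), CONFLICT_MASK has a bit set
   for every hard register already taken by a conflicting allocno, and
   NREGS is the number of hard registers in the class (at most 32).  */
int choose_hard_reg(int preferred, unsigned int conflict_mask, int nregs)
{
  assert(nregs > 0 && nregs <= 32);

  /* Honor the copy preference when the register is free.  */
  if (preferred >= 0 && preferred < nregs
      && !((conflict_mask >> preferred) & 1u))
    return preferred;

  /* Otherwise fall back to the lowest-numbered free register.  */
  for (int regno = 0; regno < nregs; regno++)
    if (!((conflict_mask >> regno) & 1u))
      return regno;

  return -1;  /* nothing free: the allocno would be spilled */
}
```

In the situation from the dump, r71 prefers register 0 because of its copy with r60, but r68 conflicts with r71 and already holds register 0, so the model returns 1 and the move between the two registers remains.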
[Bug fortran/42169] [4.4/4.5/4.6 Regression] gfortran.dg/pr41928.f90:47: internal compiler error: in store_can_be_removed_p, at ira-emit.c:371
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42169 Vladimir Makarov vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com --- Comment #22 from Vladimir Makarov vmakarov at redhat dot com 2010-10-20 03:03:53 UTC --- Function store_can_be_removed_p was written on the assumption that the store is on a loop exit. Apparently that is not always true. In this case, it was actually a loop entry from 4 to 5 in loop tree: 0-1-2-3-4-5 | --6-7 A rare combination of conditions (one is that the pseudo is not changed in the whole program) is needed to reach gcc_unreachable for the loop entry. Therefore it is hard to reproduce. There is a very simple solution, which is to return false (preventing this optimization) instead of calling gcc_unreachable in the loop entry case. I'll send a patch soon.
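The shape of the proposed fix (answer conservatively for the unexpected case instead of aborting) can be sketched as follows. This is a hypothetical model and not the function from ira-emit.c: the enum, the parameter `memory_copy_is_current`, and the exit-case condition are stand-ins, since the comment does not show the real function body.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the region border where the store sits.  */
enum border_kind { LOOP_EXIT_BORDER, LOOP_ENTRY_BORDER };

/* Decide whether a store on a region border may be deleted.  The exit-case
   condition is a placeholder for the real analysis.  */
bool store_can_be_removed(enum border_kind kind, bool memory_copy_is_current)
{
  switch (kind) {
  case LOOP_EXIT_BORDER:
    /* The case the function was originally written for.  */
    return memory_copy_is_current;
  case LOOP_ENTRY_BORDER:
    /* Previously treated as unreachable; keeping the store is always
       safe, it only gives up the optimization.  */
    return false;
  }
  assert(!"invalid border kind");
  return false;
}
```

Returning false costs at most one extra store in a rare configuration, whereas the unreachable assertion turned the same configuration into an internal compiler error.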
[Bug middle-end/45312] [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel
--- Comment #23 from vmakarov at redhat dot com 2010-09-14 15:46 --- (In reply to comment #22) Fixed everywhere but on 4.3 branch. Maybe commit the patch there too? I think there is a smaller probability that this bug occurs in gcc4.3 because it is based on the old RA. IRA uses hard registers more effectively and frequently than the old RA and therefore it stresses the reload pass more and as the result reload bugs occur more frequently with IRA. But if it is present in gcc4.3, the patch should be applied too. Even more I guess that the patch is pretty safe and could be applied to gcc4.3 in any case. If you want you could apply it to gcc4.3-branch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45312
[Bug middle-end/40386] wrong code generation for several SPEC CPU2000 benchmarks (lucas, mgrid, face, applu, apsi) with -O1 -fno-ira-share-spill-slots
--- Comment #9 from vmakarov at redhat dot com 2010-09-08 17:44 --- The problem is that pseudos spilled by IRA (r121 in our case) are not added to live_throughout of the reload chain. As a result, pseudo_forbidden_regs are not set up for such pseudos, and they can get a hard register (42 in our case) even if they live through insns (insn 153 in our case) using a reload (the 0th in our case) with this register, when another pseudo is spilled and reload asks IRA to assign the corresponding hard register to another pseudo. Here are some parts of the IRA dump:
Spilling for insn 153.
Using reg 2 for reload 1
Using reg 42 for reload 0
...
Spilling for insn 238.
Using reg 2 for reload 0
Spill 117(a35), cost=5000
Spilled regs 117
Try assign 121(a6), cost=5000: reassign to 42
The fix is pretty simple. I'll send it soon. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40386
[Bug middle-end/44554] Stack space after sigsetjmp is reused
--- Comment #9 from vmakarov at redhat dot com 2010-09-08 20:06 --- (In reply to comment #8) (In reply to comment #7) Is this still a bug then? Should ira-share-spill-slots be automatically disabled for the caller function when a callee function can return twice? I've never tested with gcc-4.5.x, but in 4.4.x the problem is still present. Unfortunately -fno-ira-share-spill-slots seems to introduce another bug which leads to wrong computations (nearly at the same code position where I had the problems mentioned is this report). At this moment I can not provide a detailed report for this problem, but perhaps it's the same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40386. I've submitted a patch solving PR40386. So now we can solve this problem by preventing slot sharing when setjmp is used. I'll send a patch soon. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44554
[Bug middle-end/45312] [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel
--- Comment #17 from vmakarov at redhat dot com 2010-09-07 18:03 --- (In reply to comment #16) I just noticed that even in the complete absence of reload inheritance, the allocate_reload_reg routine performs free_for_value_p checks, and therefore implicitly takes reload ordering into account. This seems to imply that even if we'd do merge_assigned_reloads only if no inheritance has taken place, we'd still have a problem. Does anybody have any idea how much merge_assigned_reloads actually contributes to performance on i386, in particular now that we have a bit more post-reload optimizers that potentially clear up duplicate code of the type generated by unmerged reloads? I am thinking in the same direction. merge_assigned_reloads dates from 1993 and has practically not changed since then. I guess postreload can remove unnecessary loads if they are generated without merge_assigned_reloads. I've tried to compile SPEC2000 with gcc-4.4 with and without merge_assigned_reloads. I did not find any code difference. I've tried a lot of other programs with the same result. The only code difference I found is on this test case. So I'd remove merge_assigned_reloads altogether, as it became obsolete long ago. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45312
[Bug middle-end/45312] [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel
--- Comment #15 from vmakarov at redhat dot com 2010-09-03 20:45 --- (In reply to comment #14) Ulrich, I was just about to post the following when I found that you had already posted an analogous conclusion. I should have been on CC to see your comment right away. The problem is really fundamental. The code for merge_assigned_reloads completely ignores inheritance (and the dependencies between reloads that inheritance creates). Here is the post I wanted to add. After thoroughly examining the code for inheritance in reload1.c::choose_reload_regs, I cannot find where it can be wrong for this test case. After this function, we have the following reloads:
Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
Reload 1: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        DIREG, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
Reload 2: reload_in (SI) = (reg:SI 600 [ D.29693 ])
        BREG, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (reg:SI 600 [ D.29693 ])
Reload 3: reload_in (SI) = (reg:SI 356)
        CREG, RELOAD_FOR_INPUT (opnum = 3)
        reload_in_reg: (reg:SI 356)
Function reload1.c::merge_assigned_reloads, called after reload1.c::choose_reload_regs for targets with SMALL_REGISTER_CLASSES (the i686 case), merges the 0th and 1st reloads (merging results in nullifying reload_in in the 1st reload and changing the 0th to RELOAD_OTHER), producing
Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        GENERAL_REGS, RELOAD_OTHER (opnum = 0)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 1: DIREG, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 2: reload_in (SI) = (reg:SI 2 cx [501])
        BREG, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (reg:SI 600 [ D.29693 ])
        reload_reg_rtx: (reg:SI 3 bx)
Reload 3: reload_in (SI) = (reg:SI 356)
        CREG, RELOAD_FOR_INPUT (opnum = 3)
        reload_in_reg: (reg:SI 356)
        reload_reg_rtx: (reg:SI 2 cx)
So far everything is ok.
But after that, it changes the 3rd reload to RELOAD_OTHER, which means that it will be issued before the 2nd reload instead of after it as before. The change to RELOAD_OTHER is made because the code assumes (based on function reg_overlap_mentioned_for_reload_p) that changing the 3rd reload will affect the 0th reload. In this unfortunate case pseudo 132 (from the 0th reload) and pseudo 356 (from the 3rd reload) have equivalent memory, and reg_overlap_mentioned_for_reload_p is simplified code which in this case decides that changing the equivalent memory of p356 affects the equivalent memory of p132.
Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
        GENERAL_REGS, RELOAD_OTHER (opnum = 0)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 1: DIREG, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (reg/v/f:SI 132 [ kpte ])
        reload_reg_rtx: (reg:SI 5 di)
Reload 2: reload_in (SI) = (reg:SI 2 cx [501])
        BREG, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (reg:SI 600 [ D.29693 ])
        reload_reg_rtx: (reg:SI 3 bx)
Reload 3: reload_in (SI) = (reg:SI 356)
        CREG, RELOAD_OTHER (opnum = 3)
        reload_in_reg: (reg:SI 356)
        reload_reg_rtx: (reg:SI 2 cx)
I don't see a good and simple fix for the general case of this code when inheritance is used and there are dependencies between reloads, as between reloads 2 and 3 here (just fixing reg_overlap_mentioned_for_reload_p would be wrong and dangerous). -- vmakarov at redhat dot com changed: What|Removed |Added CC||vmakarov at redhat dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45312
[Bug middle-end/45312] [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel
--- Comment #12 from vmakarov at redhat dot com 2010-09-01 18:06 --- (In reply to comment #10)
(insn 1407 1405 1406 78 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72 (set (reg:SI 2 cx)
        (mem/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -28 [0xffe4])) [0 S4 A32])) 47 {*movsi_1} (nil))
(insn 1406 1407 675 78 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72 (set (reg:SI 3 bx)
        (reg:SI 2 cx [501])) 47 {*movsi_1} (nil))
If insn 1406 came right before insn 1407, it would be still correct. Yes, it would, but I think reload should still generate correct code for this particular order of insns. IMHO, fixing the order of the insns is not the right thing to do because there might be a cyclic situation (e.g. the value of p600 is inherited from 2 but should be reloaded into 3, and p356 is inherited from 3 and should be reloaded into 2). The problem is definitely in reload inheritance. Reg_last_reload_reg is not invalidated by insn #1407, which is generated by another reload of insn #675. Reload inheritance bug fixes result either in big code degradation or in the possibility of inducing new bugs. It could be ok to fix such a problem on the trunk, but fixing it on a release branch might be dangerous. Looking through all the reload patches after gcc4.4, I don't think the bug is fixed on the trunk (or in gcc 4.5). We are probably just lucky that it did not occur there. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45312
[Bug rtl-optimization/44174] [4.4/4.5/4.6 Regression] can't find a register in class 'CLOBBERED_REGS' while reloading 'asm'
--- Comment #1 from vmakarov at redhat dot com 2010-05-18 19:06 --- It will be fixed by IRA without cover classes which I am working on. The code is planned to be included in gcc4.6. For older versions, it should be fixed in reload because I believe it is a hidden reload bug. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44174
[Bug rtl-optimization/43332] valgrind warns about using uninitialized variable with -fsched-pressure -fschedule-insns
--- Comment #4 from vmakarov at redhat dot com 2010-05-18 21:40 --- Thanks for reporting the problem. The problem has no effect on the generated code whatever the initial value is. The code in question tries to get the basic block for a BARRIER, which is wrong. Whatever basic block it gets for the BARRIER, the code will still work correctly. In any case, it is really annoying to see such a valgrind diagnostic, so I'll send a patch to fix it soon. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43332
[Bug target/44031] [4.4/4.5/4.6 Regression] ice in subst_reloads, at reload.c:6327
--- Comment #3 from vmakarov at redhat dot com 2010-05-10 15:22 --- It is caused by revision 152533: http://gcc.gnu.org/ml/gcc-cvs/2009-10/msg00182.html If it is so, the patch triggered some reload bug IMO. The patch itself was very safe because it resulted in creation of additional conflicts. I hope that Jeff Law's work on new reload will fix it when it is in 4.6 finally. Otherwise, we should work on this PR. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44031
[Bug rtl-optimization/44012] [4.5/4.6 Regression] ICE: SIGSEGV in ira_merge_allocno_live_ranges
--- Comment #12 from vmakarov at redhat dot com 2010-05-07 17:49 --- When an allocno is finished, some of its info is propagated into the upper allocno. When several allocnos with the same regno are finished, the info can be propagated directly to the surviving upper allocno or through an allocno that will itself be finished. Which one happens depends on the region configuration and on the order of allocnos with the same regno in the corresponding list. The sigsegv occurs in the second case, when we remove an allocno and propagate its info through an already removed allocno. It happens because regno_allocno_map, which is used to find the allocno into which the info is propagated, is not nullified after removing an allocno. H.J.'s patch idea is right but the patch is complicated. I'll send a simpler patch soon. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44012
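The stale-map problem described in this comment is a general one: a lookup table keeps pointing at an object after the object has been freed. The sketch below models it with hypothetical names (`regno_allocno_map_model`, `create_allocno`, `remove_allocno`, `propagate_cost`); it is not the IRA data structure, only an illustration of why the map entry must be cleared on removal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_REGNO 16

/* Hypothetical, heavily simplified allocno.  */
struct allocno {
  int regno;
  int cost;
};

/* Model of a regno -> allocno lookup table.  */
struct allocno *regno_allocno_map_model[MAX_REGNO];

struct allocno *create_allocno(int regno, int cost)
{
  assert(regno >= 0 && regno < MAX_REGNO);
  assert(regno_allocno_map_model[regno] == NULL);
  struct allocno *a = malloc(sizeof *a);
  if (a == NULL)
    return NULL;
  a->regno = regno;
  a->cost = cost;
  regno_allocno_map_model[regno] = a;
  return a;
}

/* Remove the allocno for REGNO; return 1 if one existed.  Clearing the map
   entry is the step whose absence leaves a dangling pointer.  */
int remove_allocno(int regno)
{
  assert(regno >= 0 && regno < MAX_REGNO);
  if (regno_allocno_map_model[regno] == NULL)
    return 0;
  free(regno_allocno_map_model[regno]);
  regno_allocno_map_model[regno] = NULL;
  return 1;
}

/* Propagate COST into the allocno currently mapped for REGNO; return 0 when
   that allocno has already been removed.  */
int propagate_cost(int regno, int cost)
{
  assert(regno >= 0 && regno < MAX_REGNO);
  struct allocno *upper = regno_allocno_map_model[regno];
  if (upper == NULL)
    return 0;
  upper->cost += cost;
  return 1;
}
```

Without the assignment of NULL in `remove_allocno`, a later `propagate_cost` call would write through freed memory, which corresponds to the crash reported in this PR.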
[Bug rtl-optimization/43413] Powerpc generates worse code for -mvsx on gromacs even though there are no VSX instructions used
--- Comment #10 from vmakarov at redhat dot com 2010-03-23 18:45 --- (In reply to comment #5) Still I'll investigate a bit more why there are a lot of unexpected spills during assignment with -mvsx for the current code. The problem is that part of VSX_REGS (the altivec regs) cannot hold values of SFmode. The coloring algorithm does not take this into account. The problem can be solved if we check this in the available register calculation. The patch I will send soon decreases the number of stfs(x)/lfs(x) from 332 to 246. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43413
[Bug rtl-optimization/43413] Powerpc generates worse code for -mvsx on gromacs even though there are no VSX instructions used
--- Comment #5 from vmakarov at redhat dot com 2010-03-22 22:16 --- (In reply to comment #0) In the enclosed test case, it generates the following spills for the options:
-O3 -ffast-math -mcpu=power7 -mvsx -maltivec: 117 stfs, 139 lfs
-O3 -ffast-math -mcpu=power5 -maltivec: 80 stfs, 100 lfs
-O3 -ffast-math -mcpu=power5: 80 stfs, 100 lfs
Hi, Mike. I think the comparison should be done with the same -mcpu because there is the 1st insn scheduling, which increases register pressure differently for different architectures. But that is not so important. I see a lot of spills during assignment because memory is more profitable. Graph coloring pushes these allocnos on the stack, suggesting that they will get registers (and that does not happen during the assignment). On one of my branches, I got
-O3 -ffast-math -mcpu=power7 -mno-vsx -maltivec: 248 of stfs and lfs
-O3 -ffast-math -mcpu=power7 -mvsx -maltivec: 331 of stfs and lfs
-O3 -ffast-math -mcpu=power7 -mvsx -maltivec -fsched-pressure: 310
-O3 -ffast-math -mcpu=power7 -mvsx -maltivec -fsched-pressure -fexper: 179
where -fexper switches on the new graph coloring code without cover classes which I am working on. So I think that this new code and register pressure sensitive insn scheduling will help. Still I'll investigate a bit more why there are a lot of unexpected spills during assignment with -mvsx for the current code. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43413
[Bug rtl-optimization/43413] Powerpc generates worse code for -mvsx on gromacs even though there are no VSX instructions used
--- Comment #6 from vmakarov at redhat dot com 2010-03-22 22:20 --- (In reply to comment #4) FWIW, I seem to get considerably worse code from mainline than you -- for -O3 -ffast-math -mcpu=power7 -mvsx -maltivec I get 140 stfs and 192 lfs insns (compared to 117 and 139 respectively that you reported). I suspect the difference is because Mike counted only stfs/lfs and you counted stfs(x)/lfs(x). But maybe I am wrong. Just for fun, I ran the same code through the a ppc compiler with the LRS code from reload-v2 and get 133:178 stfs/lfs insns, so that code clearly is helping, but it's not enough to offset the badness shown by IRA. I couldn't reconcile how -fno-ira-share-spill-slots would be changing the number of load/store insns, so I poked at that a bit. Yes, I cannot understand that either. -fno-ira-share-spill-slots twiddles whether or not a pseudo which gets assigned a hard reg is put into live_throughout or dead_or_set_p in the reload chain structures, which in turn changes what pseudos get reassigned hard regs during reload. This is a somewhat odd effect and should be investigated further. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43413
[Bug middle-end/42973] [4.4/4.5 regression] IRA apparently systematically making reload too busy on 2 address instructions with 3 operands
--- Comment #10 from vmakarov at redhat dot com 2010-02-10 17:02 --- The big chunk of regmove which did the same things that IRA is capable of doing was removed when IRA was merged. There are still a lot of important transformations (like dealing with increments, sign/zero extensions, etc.) which IRA cannot do. As I remember, I benchmarked IRA with and without regmove on x86/x86_64 some time ago, and I got a clear impression that regmove is still important. It would be nice to see which regmove transformations are important and to try to rewrite it or move some of its functionality to IRA, but unfortunately I have no time for this. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973
[Bug middle-end/42973] [4.4/4.5 regression] IRA apparently systematically making reload too busy on 2 address instructions with 3 operands
--- Comment #11 from vmakarov at redhat dot com 2010-02-10 17:15 ---
(In reply to comment #8)
> Thanks, we should see if this solves the AMMP problem in a day or two. Are you going to look at the related PR42961? Without the regmove hunk it does not happen at AMMP but it likely happens elsewhere. I did some work on this years back on old RA so I can play with it too. (Simple fix would be to add ? penalizers to integer variant of FP moves, but I would like to see some solution where RA actually can use integers for mem-mem copies)

I am working on IRA without the use of cover classes. For example, IRA could assign an integer or a floating point register for mem-mem copies, whichever is possible and whichever is more profitable. This code is big and not ready yet. There are a lot of performance issues (besides IRA speed issues, which are a consequence of dealing with more classes). I am trying to solve the issues. But if the code is ok, it probably will solve the problem.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973
[Bug middle-end/42973] [4.4/4.5 regression] IRA apparently systematically making reload too busy on 2 address instructions with 3 operands
--- Comment #6 from vmakarov at redhat dot com 2010-02-09 19:56 ---
The patch which I'll send in a few minutes solves the problem. The patch avoids the creation of shuffle copies if an involved operand should be bound to some other operand in the current insn.

The test code generated with the patch looks like

.L2:
        movapd  %xmm0, %xmm8
        subsd   %xmm3, %xmm8
        movsd   a(%rax), %xmm6
        mulsd   %xmm8, %xmm8
        movsd   b(%rax), %xmm7
        subsd   %xmm8, %xmm7
        movsd   %xmm7, b(%rax)
        leaq    8(%rax), %r10
        movapd  %xmm0, %xmm5
        subsd   %xmm6, %xmm5
        movsd   a(%r10), %xmm3
        mulsd   %xmm5, %xmm5
        movsd   b(%r10), %xmm4
        subsd   %xmm5, %xmm4
        movsd   %xmm4, b(%r10)
        leaq    16(%rax), %r9
        movapd  %xmm0, %xmm1
        subsd   %xmm3, %xmm1
        movsd   a(%r9), %xmm15
        mulsd   %xmm1, %xmm1
        movsd   b(%r9), %xmm2
        subsd   %xmm1, %xmm2
        movsd   %xmm2, b(%r9)
        leaq    24(%rax), %r8

SPEC2000 benchmarking on x86/x86_64 (Core i7) shows that using the patch results in slightly better code.

x86: The code is different on gzip, vpr, gcc, crafty, perlbmk, gap, vortex, bzip2, twolf and mesa. With the patch the code is never bigger (on average about 0.02% smaller). The rate is a bit better with the patch but practically the same (the biggest improvement is on crafty and perlbmk, about 1%).

x86_64: The code is different on gzip, vpr, gcc, crafty, parser, perlbmk, gap, vortex, bzip2, twolf, mesa, art and ammp. With the patch the code is on average about 0.01% smaller. The rate is a bit better with the patch but practically the same (the biggest improvement is on vortex, 1.3%, and on crafty and bzip2, 0.7%).
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973
[Bug middle-end/42973] [4.5 regression] IRA apparently systematically making reload too busy on 2 address instructions with 3 operands
--- Comment #4 from vmakarov at redhat dot com 2010-02-06 00:57 --- I have a patch which solves this problem and an analogous problem that Jeff recently sent me. I just need some time to do some benchmarking. If everything is all right, I'll submit the patch, probably on Monday. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973
[Bug rtl-optimization/42941] -fsched-pressure -fschedule-insns - valgrind warns about using uninitialized variable
--- Comment #3 from vmakarov at redhat dot com 2010-02-03 18:57 --- This is a rare case when the algorithm works the same whatever values are in memory. Roughly speaking, if the value is not as expected (for example, garbage), the value is set to what is needed. If it is as expected, we do nothing and have the same result. Valgrind warns because the data is not initialized. I'll submit a patch soon to initialize the values. The compiler will work absolutely the same (maybe a bit slower because of the initialization), but there will be no valgrind warnings, which will simplify compiler debugging with valgrind. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42941
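The "same result whatever is in memory" argument in the comment above can be illustrated with a small model. This is only a sketch in Python, not the scheduler code: Python has no uninitialized memory, so garbage is modelled by arbitrary starting values, and the helper name `normalize` is hypothetical.

```python
def normalize(slots, index, expected):
    """Make slots[index] hold `expected`, whatever it held before."""
    if slots[index] != expected:
        # Garbage or a stale value: overwrite it with what is needed.
        slots[index] = expected
    # Otherwise the slot already holds the expected value: nothing to do.
    return slots[index]

# One array starts with arbitrary "garbage", the other is pre-initialized.
garbage_start = [0xDEADBEEF, 17, -1]
clean_start = [0, 0, 0]

for slots in (garbage_start, clean_start):
    for i in range(len(slots)):
        normalize(slots, i, expected=0)

print(garbage_start == clean_start)
```

Both arrays end up identical, which is why the output of the compiler does not change. The comparison still reads the slot before writing it, and that read of uninitialized data is the kind of access valgrind reports, so initializing the data up front removes the warning without changing the result.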
[Bug target/36539] Poor register allocation from IRA
--- Comment #10 from vmakarov at redhat dot com 2010-01-29 20:33 ---
Jeff, I saw an analogous problem when I worked on improving IRA performance. I checked the approach you are proposing, but it works considerably worse on SPEC2000. Finally, I found that the conflict cost technique works best when we change the cost only for one hard register, in the case when the pseudo's best cost is achieved on one hard register, e.g. the best cost is achieved on a register class containing one hard register, or assigning a particular hard register removes a copy.

Why does the technique you are proposing not work well on average for classes (like Q_REGS in this case) containing more than one register? This is just my speculation. If the # of conflicting pseudos is less than the size of QREGS, we should not modify the conflict costs of the pseudo for QREGS, because QREGS can be more profitable for the conflicting pseudos and we will still assign a QREG to the pseudo. Even if the # of conflicting pseudos is not less than the size of QREGS, they still might be assigned to hard registers which are only part of QREGS. It is hard to predict.

I am not saying that we should not work on this problem. I think we should try more sophisticated heuristics, although I don't know which ones (it could be conflict cost modifications only when register pressure for QREGS is high during the pseudo's live range, but such a heuristic will take some time to implement and I am still not sure that it will work better on average).

Unfortunately, there will always be cases when RA could work better, because RA algorithms are heuristic ones. What we should focus on is improving performance for credible benchmarks like SPEC2000/SPEC2006.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539
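The rule described in the comment above (change conflict costs only when the pseudo's best cost is achieved on exactly one hard register) can be sketched as a toy model. This is not IRA's actual data structure or code: the function names, the dictionary representation of costs, the register names and the penalty value are all assumptions made for illustration.

```python
def unique_best_hard_reg(costs):
    """Return the hard register with the strictly lowest cost, or None on a tie."""
    best = min(costs.values())
    winners = [reg for reg, cost in costs.items() if cost == best]
    return winners[0] if len(winners) == 1 else None

def update_conflict_costs(pseudo_costs, conflict_costs, penalty):
    """Penalize one hard register in a conflicting pseudo's costs, but only
    when the first pseudo clearly prefers exactly that register."""
    reg = unique_best_hard_reg(pseudo_costs)
    updated = dict(conflict_costs)  # leave the caller's costs untouched
    if reg is not None and reg in updated:
        updated[reg] += penalty
    return updated

PENALTY = 3  # arbitrary illustrative value
# Pseudo whose best cost is on one register (e.g. using it removes a copy).
prefers_ax = {"ax": 0, "bx": 4, "cx": 4, "dx": 4}
# Pseudo that is equally happy with any register of a four-register class.
any_q_reg = {"ax": 2, "bx": 2, "cx": 2, "dx": 2}
neighbour = {"ax": 1, "bx": 1, "cx": 1, "dx": 1}

print(update_conflict_costs(prefers_ax, neighbour, PENALTY))
print(update_conflict_costs(any_q_reg, neighbour, PENALTY))
```

In the first call only the cost of `ax` grows for the conflicting pseudo. In the second call nothing changes, which reflects the speculation in the comment: when several registers of the class are equally good, it is hard to predict which one the pseudo will actually get, so penalizing the whole class tends to hurt.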
[Bug target/41399] [4.5 Regression] Scheduler gives huge dependence graph compiling fortran/intrinsic.c on ARM
--- Comment #21 from vmakarov at redhat dot com 2010-01-29 21:54 ---
Thanks to everyone who works on the bug.

I am sorry that the bug was really introduced by my patch, more accurately by the part which should fix reload crashes when the 1st scheduling works for some targets. The patch creates a huge number of dependencies on the stack register (r13), which could be used for reloads according to *arm_movsi_insn. But pseudos cannot be assigned the stack register because the register is fixed, and we do not have to add dependencies for the pseudo to fix the reload crashes.

The following small fix will solve the PR.

Index: ../../gcc/gcc/sched-deps.c
===
--- ../../gcc/gcc/sched-deps.c (revision 155624)
+++ ../../gcc/gcc/sched-deps.c (working copy)
@@ -2623,6 +2623,7 @@ sched_analyze_insn (struct deps *deps, r
       extract_insn (insn);
       preprocess_constraints ();
       ira_implicitly_set_insn_hard_regs (&temp);
+      AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs);
       IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp);
     }

I'll submit the patch on Monday after some testing.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41399
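The effect of the added AND_COMPL_HARD_REG_SET line in the comment above can be modelled with plain integer bit sets. This is only a sketch in Python under the assumption that a hard register set behaves like a bit mask with one bit per register; the function name `add_implicit_clobbers` and the register constants are hypothetical.

```python
def add_implicit_clobbers(pending, implicit, no_alloc):
    """Return pending | (implicit & ~no_alloc) over integer bit sets."""
    # Models AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs):
    # drop registers that can never be allocated to a pseudo.
    allocatable = implicit & ~no_alloc
    # Models IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp).
    return pending | allocatable

# One bit per hard register; bit 13 stands for r13, the ARM stack pointer.
R0, R1, SP = 1 << 0, 1 << 1, 1 << 13

implicit = R0 | SP   # registers the insn constraints might implicitly set
no_alloc = SP        # fixed registers are not available for allocation
pending = R1         # clobbers collected so far

with_fix = add_implicit_clobbers(pending, implicit, no_alloc)
without_fix = pending | implicit
print(bin(with_fix), bin(without_fix))
```

With the mask applied, the stack pointer bit never reaches the pending clobbers, so no dependencies are created on it. Without the mask, the bit for r13 is set for every such insn, which is the source of the huge dependence graph described in the PR.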
[Bug rtl-optimization/41171] register allocator undoing optimal schedule
--- Comment #8 from vmakarov at redhat dot com 2009-10-30 21:57 --- Unfortunately, not yet because I had some failures after applying the patch. I postponed work on this but now I have time to continue the work. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41171