[Bug rtl-optimization/57915] [4.8/4.9 Regression] ICE in set_address_disp, at rtlanal.c:5537

2013-09-20 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57915

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com ---
The address in question is

(plus (symbol_ref ...) (const_int 4))

LRA finds two displacements (the symbol_ref and the const_int), although only
one displacement is allowed.

The correct canonical address should be:

(const (plus (symbol_ref ...) (const_int 4)))

The non-canonical address is created from

(reg ...)

by the first constant propagation pass (cprop1).

I believe the problem should be fixed there.

As for the reload pass, it has code transforming an address
(plus const1 const2) into (const (plus const1 const2)).  That was probably a
fix for the problem in the wrong place.  There is no need to complicate LRA
further by implementing analogous code in it.  As I wrote, I believe it should
be fixed in cprop1 by generating the correct canonical address.
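The difference between the two forms can be illustrated with a toy model. The C sketch below is a hypothetical simplification (the toy_rtx type and the toy_* helpers are invented for illustration and are not GCC's rtx API or cprop code): it counts the constant terms an address decomposer would treat as displacements, and wraps a sum of two constants in a CONST, which is the canonical form described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of an RTL address; not GCC's rtx API.  */
enum toy_code { TOY_REG, TOY_SYMBOL_REF, TOY_CONST_INT, TOY_CONST, TOY_PLUS };

struct toy_rtx {
  enum toy_code code;
  struct toy_rtx *op0;   /* first operand of PLUS, wrapped expr of CONST */
  struct toy_rtx *op1;   /* second operand of PLUS */
  long value;            /* value of CONST_INT */
};

/* Link-time constants: a symbol, an integer, or an already wrapped CONST.  */
static int toy_constant_p (const struct toy_rtx *x)
{
  return x->code == TOY_SYMBOL_REF || x->code == TOY_CONST_INT
         || x->code == TOY_CONST;
}

/* Rewrite (plus C1 C2) as (const (plus C1 C2)), using WRAPPER as storage for
   the new CONST node; any other address is returned unchanged.  */
static struct toy_rtx *toy_canonicalize_address (struct toy_rtx *addr,
                                                 struct toy_rtx *wrapper)
{
  assert (addr != NULL && wrapper != NULL);
  if (addr->code == TOY_PLUS
      && toy_constant_p (addr->op0) && toy_constant_p (addr->op1))
    {
      wrapper->code = TOY_CONST;
      wrapper->op0 = addr;
      wrapper->op1 = NULL;
      wrapper->value = 0;
      return wrapper;
    }
  return addr;
}

/* Count the constant terms an address decomposer would see as separate
   displacements; a (const ...) is one opaque term.  */
static int toy_count_displacements (const struct toy_rtx *addr)
{
  if (addr->code == TOY_PLUS)
    return toy_count_displacements (addr->op0)
           + toy_count_displacements (addr->op1);
  return toy_constant_p (addr) ? 1 : 0;
}
```

With (plus (symbol_ref) (const_int 4)) the count is 2, which is exactly the situation LRA rejects; after wrapping, the same address yields a single displacement.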


[Bug middle-end/58419] [4.9 Regression] wrong code at -O3 on x86_64-linux-gnu in 32-bit mode

2013-09-16 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58419

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com ---
(In reply to Zhendong Su from comment #2)
 (In reply to H.J. Lu from comment #1)
  It is caused by r202468.
 
 So it may have been a dup of 58418?

Yes, it is a duplicate.


[Bug target/58166] ARMv5: poor register allocation in function containing smull instruction

2013-08-25 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58166

--- Comment #6 from Vladimir Makarov vmakarov at redhat dot com ---
On 13-08-22 10:11 AM, rearnsha at gcc dot gnu.org wrote:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58166

 --- Comment #5 from Richard Earnshaw rearnsha at gcc dot gnu.org ---
 (In reply to Jay Foad from comment #3)
 I've bisected this to r191805:

 http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=191805
 http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01764.html
 I suspect that is just exposing a latent problem.

Sorry, I am on vacation now.  I'll look at this after my vacation (after
Labor Day).


[Bug target/58110] Useless GPR push and pop when only xmm registers are used.

2013-08-09 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58110

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com ---
Thanks, Ondrej and Jan.  GCC with reload generates code with the same problem.

I mentioned at the RA BOF that we should look at postreload.c and
postreload-gcse.c to figure out what should and can be removed as redundant
and what can be integrated into IRA/LRA.  This PR is just a good illustration
of why it should be done.

I don't think this work will be done soon, but it is good to have the PR as a
reminder.


[Bug rtl-optimization/58048] [4.8/4.9 Regression] internal compiler error: Max. number of generated reload insns per insn is achieved (90)

2013-08-08 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58048

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #9 from Vladimir Makarov vmakarov at redhat dot com ---
(In reply to Bernd Edlinger from comment #8)
 I see the same error with recent 4.9 i686-pc-linux-gnu in the following
 test case:
 
 gcc -O2 -msse -mno-avx -S testsuite/gcc.target/i386/intrinsics_4.c
 intrinsics_4.c: In function 'foo':
 intrinsics_4.c:14:1: internal compiler error: Max. number of generated
 reload insns per insn is achieved (90)
 
  }
  ^
 0x849e4c3 lra_constraints(bool)
 ../../gcc-4.9-20130728/gcc/lra-constraints.c:3724
 0x849136c lra(_IO_FILE*)
 ../../gcc-4.9-20130728/gcc/lra.c:2319
 0x8456beb do_reload
 ../../gcc-4.9-20130728/gcc/ira.c:4689
 0x8456beb rest_of_handle_reload
 ../../gcc-4.9-20130728/gcc/ira.c:4801
 Please submit a full bug report,
 with preprocessed source if appropriate.

It is the same diagnostic, but the reason for it is different.

I guess it is not an LRA problem.  This test should not be run for i686, as it
tries to use both non-AVX and AVX insns (which are absent for the i686
architecture).

The reload pass also fails on this test with an assert (internal error), as
reload cannot find insns to generate correct code.

Still, GCC should have a better diagnostic for this case (maybe by checking
for correct architecture/attribute pairs), although I have no idea how to do
it right.


[Bug target/57293] [4.8/4.9 Regression] not needed frame pointers on IA-32 (performance regression?)

2013-08-04 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293

--- Comment #5 from Vladimir Makarov vmakarov at redhat dot com ---
I've started this work.  But unfortunately, I have too many things on my plate
now.  I was too optimistic.  Now I can only say that I am planning to fix it in
stage1 (so the fix should be in gcc4.9).


[Bug rtl-optimization/51041] g++ strange optimisation behaviour

2013-07-29 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51041

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com ---
I guess RA is doing the right thing.  When the second printf is uncommented,
pseudo 84, corresponding to the variable sum, lives through an insn throwing
an exception.  The code affecting p84 allocation (putting it into memory, as
SSE_REGS have no callee-saved regs) is

ira-lives.c::process_bb_node_lives:

  if (can_throw_internal (insn))
    {
      IOR_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj),
                        call_used_reg_set);
      IOR_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj),
                        call_used_reg_set);
    }

where insn is:

(call_insn 141 140 142 22 (call (mem:QI (symbol_ref:DI ("_ZdlPv") [flags 0x41]
                <function_decl 0x71ae2400 operator delete>) [0 operator delete S1 A8])
            (const_int 0 [0]))
        /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../include/c++/4.7.2/ext/new_allocator.h:100
        648 {*call}
     (expr_list:REG_DEAD (reg:DI 5 di)
        (expr_list:REG_EH_REGION (const_int 0 [0])
            (nil)))
    (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
        (nil)))

It is the call to operator delete in new_allocator.h:

  void
  deallocate(pointer __p, size_type)
  { ::operator delete(__p); }
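The effect of the quoted ira-lives.c code can be sketched with hard register sets modelled as bitmasks. This is a toy under stated assumptions: the toy_* names, the 32-bit mask and the register layout (0-7 general, 8-15 SSE) are hypothetical, and only the OBJECT_CONFLICT_HARD_REGS side is modelled. The point is that once every register of a class is call-used, as the text says is the case for SSE_REGS, a pseudo living through a throwing insn has no SSE register left and ends up in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: a hard register set is a bitmask (GCC's HARD_REG_SET is richer). */
typedef uint32_t toy_hard_reg_set;

/* Hypothetical layout: regs 0-7 are "general", 8-15 are "SSE".  */
#define TOY_GENERAL_REGS 0x000000FFu
#define TOY_SSE_REGS     0x0000FF00u

struct toy_object {
  toy_hard_reg_set conflict_hard_regs;
};

/* Mirror of the quoted logic: if the pseudo lives through an insn that can
   throw internally, every call-used register becomes a conflict.  */
static void toy_note_throwing_insn (struct toy_object *obj,
                                    toy_hard_reg_set call_used_reg_set)
{
  obj->conflict_hard_regs |= call_used_reg_set;
}

/* Return the lowest register of CLASS_REGS not in conflict, or -1, which here
   stands for "no register available: the pseudo goes to memory".  */
static int toy_pick_hard_reg (const struct toy_object *obj,
                              toy_hard_reg_set class_regs)
{
  toy_hard_reg_set free_regs = class_regs & ~obj->conflict_hard_regs;
  for (int r = 0; r < 32; r++)
    if (free_regs & (1u << r))
      return r;
  return -1;
}
```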

The problem could be solved by splitting the live range of p84.  By default
IRA does live range splitting only when register pressure is high.  That is
not the case for this test, where the max pressure for GENERAL_REGS and
SSE_REGS is only 4.

We could modify the semantics of -fira-region=all to form a region for any
loop on whose border live range splitting is done.  I tried that, and with
-fira-region=all the same speed is achieved for the test.  Unfortunately,
because the new semantics permit too aggressive spilling, the generated code
is about 0.5% worse on SPEC2000 for x86-64.

I guess optimizations should pay more attention to code with EH regions, as
C++ code has a lot of it.

I'll think about what more I can do with the problem.


[Bug rtl-optimization/57960] S/390: LRA ICE building glibc

2013-07-24 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57960

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com ---
(In reply to Andreas Krebbel from comment #2)
 (In reply to Marek Polacek from comment #1)
  But this is s390x, right?  (Judging from the movstrictsi.)
 
 Yes.

Thanks, Andreas.  I've reproduced it.  I guess a fix will be ready later this
week, as the bug is in a sensitive part of LRA and the fix will need a lot of
testing on a few machines.


[Bug bootstrap/57604] LRA related bootstrap comparison failure on s390x --with-arch=zEC12

2013-06-17 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57604

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com ---
Andreas, thanks for checking it and doing the analysis.  I'll try to make a
patch fixing this during this week.


[Bug rtl-optimization/57462] ira-costs considers only a single register at a time

2013-05-31 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57462

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #1 from Vladimir Makarov vmakarov at redhat dot com ---
  Thanks for the analysis.  That would be an interesting problem to solve,
although I don't know when I could start working on it.

  The code you are mentioning is actually an adaptation of code from the old
regclass pass, which existed since day 1 of GCC.  The optimal solution of the
problem might be NP-complete (I am not sure about it, but at least a long time
ago I tried to describe it with ILP).

  I should say that even the current cost code is very expensive, and speeding
it up is on my to-do list.  A better solution (through better heuristics) will
probably be even more expensive.  IMHO, it is also a GCC-specific problem,
because GCC postpones code selection (usually compilers do complete code
selection before RA, e.g. selecting the insn for add in this case), and that
is a consequence of GCC's machine description model.  Doing complete code
selection before RA is also a challenging task.

  In any case, the problem is known and quite interesting.  There are a lot of
different approaches to solving it (some even require GCC architectural
changes), and none of them is easy.  So I don't think the problem will be
solved soon.  Sorry.


[Bug target/57293] [4.8/4.9 Regression] not needed frame pointers on IA-32 (performance regression?)

2013-05-16 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #1 from Vladimir Makarov vmakarov at redhat dot com ---
The change was made because of

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57018

LRA is missing some functionality for this kind of code for now.  There will
be no quick fix (I mean in a few days or even in 2 weeks) for this.  But I am
planning to fix it by the end of June.

Sorry.


[Bug rtl-optimization/55278] [4.8/4.9 Regression] Botan performance regressions apparently due to LRA

2013-05-09 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55278

--- Comment #11 from Vladimir Makarov vmakarov at redhat dot com ---
I don't see code degradation because of LRA.  Here is what I got using the
gcc4.8 branch compiler with options -O3  -finline-functions  -D_REENTRANT
-Wno-long-long -W -Wall -fPIC -fvisibility=hidden on Xeon X5660 and i7-2600
(Sandy Bridge):

64-bit:

real=16.78 user=16.57 system=0.00
real=16.39 user=16.20 system=0.00
real=16.81 user=16.57 system=0.00
real=16.35 user=16.20 system=0.00
real=16.82 user=16.56 system=0.00
real=16.40 user=16.20 system=0.00

real=7.37 user=7.34 system=0.00
real=7.05 user=7.02 system=0.00
real=7.34 user=7.31 system=0.00
real=7.05 user=7.02 system=0.00
real=7.37 user=7.31 system=0.00
real=7.05 user=7.02 system=0.00


32-bit:

real=15.46 user=15.22 system=0.00 share=98%
real=14.53 user=14.21 system=0.00 share=97%
real=15.77 user=15.41 system=0.00 share=97%
real=14.49 user=14.23 system=0.00 share=98%
real=15.57 user=15.22 system=0.00 share=97%
real=14.51 user=14.23 system=0.00 share=98%

real=10.17 user=10.13 system=0.00
real=7.76 user=7.73 system=0.00
real=10.17 user=10.13 system=0.00
real=7.76 user=7.73 system=0.00
real=10.17 user=10.13 system=0.00
real=7.76 user=7.73 system=0.00

The first run is for gcc-4.8 with reload, the second run with LRA.  This is
repeated 3 times.  LRA generates better code for this test on both CPUs in
32- and 64-bit mode.
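The averages behind that conclusion can be checked with a few lines of C. This sketch only does arithmetic on the user times listed above; it assumes the alternating order described in the text (reload first, then LRA) and, judging from the 7.02s and 7.73s LRA figures quoted later for the i7-2600, that the second block in each mode is the i7-2600 run. The helper names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of every second element of T starting at FIRST
   (0 = reload runs, 1 = LRA runs, following the alternating order).  */
static double mean_of_alternate (const double *t, size_t n, size_t first)
{
  double sum = 0.0;
  size_t count = 0;
  for (size_t i = first; i < n; i += 2)
    {
      sum += t[i];
      count++;
    }
  return count ? sum / count : 0.0;
}

/* Relative change of LRA versus reload, in percent (negative = LRA faster). */
static double lra_change_percent (const double *t, size_t n)
{
  double reload = mean_of_alternate (t, n, 0);
  double lra = mean_of_alternate (t, n, 1);
  assert (reload > 0.0);
  return (lra - reload) / reload * 100.0;
}
```

For the 32-bit i7-2600 block this gives roughly a 24% reduction in user time with LRA; for the 64-bit Xeon block the reduction is about 2%.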

LLVM's new register allocator might generate better code than LRA or reload,
or maybe there is another reason for the reported difference.  To be honest,
I don't know.

I looked at http://gcc.opensuse.org/c++bench/botan/botan-summary.txt-1-0.html
and I see that KASUMI was improved around October.  I worked on botan after
the LRA merge and, as I remember, some benchmarks became worse and some were
improved, but overall (run time for all algorithms) it was about the same.

I don't have LLVM 3.3, but using 3.2 I am getting 7.378s (64-bit) and 7.234s
(32-bit) on i7-2600 with the options above vs 7.02s and 7.73s for gcc4.8
(LRA).  So I cannot confirm the big difference on KASUMI reported on
http://www.phoronix.com/scan.php?page=article&item=llvm_32_egging&num=2.

It seems to me Phoronix is very LLVM-biased, and that is not good for its
credibility.


[Bug rtl-optimization/57131] [4.8/4.9 Regression] Wrong register assignment?

2013-05-01 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57131

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2013-05-02 03:03:34 UTC ---

(In reply to comment #2)
 Apparently went away with the http://gcc.gnu.org/r198432 fix, but it isn't
 clear whether that change was meant to fix this or just made the bug latent.
 Anyway, still reproducible on the 4.8 branch.
 What I'm seeing before that change is that for the extendsidi2_1 pattern with
 a MEM destination LRA chooses %ebx as the (clobber (scratch:SI)) register,
 even though %ebx is live across that instruction (there is
 (insn 14 74 68 2 (set (reg:SI 3 bx [orig:83 D.1395 ] [83])
         (mem/v/c:SI (plus:SI (reg/f:SI 7 sp)
                 (const_int 72 [0x48])) [0 x4+0 S4 A64])) pr57131.c:11 85 {*movsi_internal}
      (nil))
 (insn 68 14 73 2 (set (reg:SI 3 bx [orig:83 D.1395 ] [83])
         (reg:SI 3 bx [orig:83 D.1395 ] [83])) pr57131.c:11 85 {*movsi_internal}
      (expr_list:REG_DEAD (reg:SI 3 bx [orig:83 D.1395 ] [83])
         (nil)))
 some insns before it and:
 (insn 65 24 26 2 (set (reg:SI 5 di [orig:83 D.1395 ] [83])
         (reg:SI 3 bx [orig:83 D.1395 ] [83])) pr57131.c:11 85 {*movsi_internal}
      (expr_list:REG_DEAD (reg:SI 3 bx [orig:83 D.1395 ] [83])
         (nil)))
 some insns after it).  Not sure if the noop move with REG_DEAD has anything to
 do with that.  Vlad, can you please have a look?

http://gcc.gnu.org/r198432 was the right solution for this bug.  LRA did not
pay attention to NO_REGS pseudos during assignment, although ebx was assigned
to the NO_REGS pseudo r95 (which is reflected in reg_renumber).

At some points of LRA work, reg notes can be invalid.  LRA brings them up to
date after the live subpass (lra-lives.c).  It only needs correct live info on
BB borders.

So I'd close this PR.


[Bug rtl-optimization/57046] [4.8/4.9 Regression] wrong code generated by gcc 4.8.0 on i686

2013-04-23 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57046

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #5 from Vladimir Makarov vmakarov at redhat dot com 2013-04-23 15:34:40 UTC ---

(In reply to comment #4)
 We have after the get_value call:
 (insn 73 30 32 6 (set (reg:SI 76 [ D.1441 ])
         (reg:SI 0 ax)) pr57046.c:42 85 {*movsi_internal}
      (expr_list:REG_DEAD (reg:SI 0 ax)
         (nil)))
 (insn 32 73 33 6 (parallel [
             (set (reg:SI 73 [ D.1443 ])
                 (ashift:SI (subreg:SI (reg:DI 60 [ D.1441 ]) 0)
                     (const_int 2 [0x2])))
             (clobber (reg:CC 17 flags))
         ]) 502 {*ashlsi3_1}
      (expr_list:REG_DEAD (reg:DI 60 [ D.1441 ])
         (expr_list:REG_UNUSED (reg:CC 17 flags)
             (nil))))
 
 and IRA decides to put pseudo 76 into %ebx and pseudo 60 into %ecx.  Either it
 is an IRA bug, or IRA takes into account that we only really need the low
 32-bits of pseudo 60 at that point.  In any case, reload loads SImode %ecx from
 the stack and uses it in the shift, while LRA loads full DImode %ecx (i.e. %ecx
 and %ebx) from the stack, then uses just the low bits from that (i.e. %ecx) in
 the shift.  So the LRA generated code clobbers the value in %ebx, and get_value
 call is even later on DCEd because of it.

It seems like a discrepancy between IRA, which allocates in terms of
subregisters, and LRA splitting (including call save/restore as in this case),
which works in terms of pseudos.  I guess fixing this might take about a week.
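The overlap behind the clobbered %ebx can be checked with simple arithmetic on hard register numbers. The sketch below is hypothetical: it uses the i386 register numbers visible in the quoted dumps (cx is 2, bx is 3) and assumes, as the quoted comment states, that a DImode value starting in %ecx also occupies %ebx; toy_nregs is a stand-in for GCC's per-mode register count, not the real hook.

```c
#include <assert.h>

/* i386 hard register numbers as printed in the quoted RTL dumps.  */
enum { TOY_AX = 0, TOY_DX = 1, TOY_CX = 2, TOY_BX = 3 };

/* Toy stand-in for a per-mode register count on 32-bit x86: one 4-byte
   general register per 4 bytes of the value (assumption).  */
static int toy_nregs (int mode_size_bytes)
{
  assert (mode_size_bytes > 0);
  return (mode_size_bytes + 3) / 4;
}

/* Does a value of SIZE_A bytes starting at hard reg REGNO_A share any hard
   register with a value of SIZE_B bytes starting at REGNO_B?  */
static int toy_ranges_overlap (int regno_a, int size_a, int regno_b, int size_b)
{
  int end_a = regno_a + toy_nregs (size_a);
  int end_b = regno_b + toy_nregs (size_b);
  return regno_a < end_b && regno_b < end_a;
}
```

Restoring all 8 bytes of pseudo 60 into %ecx overlaps pseudo 76 in %ebx, while a 4-byte (SImode) restore, as reload did, does not.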


[Bug target/57018] [4.8/4.9 Regression] Miscompilation of bison 2.7.1 under -Os -fomit-frame-pointer

2013-04-22 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57018

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #9 from Vladimir Makarov vmakarov at redhat dot com 2013-04-22 13:53:10 UTC ---

(In reply to comment #8)
 BTW, with reload on current trunk, bar has identical code, except for the right
 leal 32(%esp), %esi instead of the wrong leal 16(%esp), %esi.
 
 It seems that with reload, elimination_effects is called both during IRA costs
 analysis and later on during actual elimination, while with LRA only IRA costs
 analysis calls it.  And I don't see code in lra-eliminations.c that would
 adjust ep->offset based on say sp adjustments in the code.

Yes, that is true.  In such cases LRA just prevents frame pointer elimination
(except for stack realignment).  I omitted this functionality as I thought it
was not that important for the majority of code.  Maybe it is time to
reconsider this decision.

I have a patch for the PR which I'll commit later today after some testing.
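The bookkeeping under discussion can be sketched in a few lines: when the frame pointer is eliminated in favour of the stack pointer, the offset used for each FP-relative reference has to follow every sp adjustment made before that point. The C sketch below is a toy model (toy_track_elimination_offsets and the FP = SP + offset convention are assumptions for illustration, not lra-eliminations.c code); the numbers are chosen only to echo the leal 16(%esp) vs leal 32(%esp) discrepancy, and the actual cause in bar is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy convention (assumption): FP = SP + offset.  SP_ADJUST[i] is the change
   insn I makes to the stack pointer, e.g. -16 for "subl $16, %esp".
   OFFSET_AT[i] receives the FP->SP offset to use when eliminating the frame
   pointer in insn I, i.e. the offset before that insn's own adjustment.  */
static void toy_track_elimination_offsets (const int *sp_adjust, size_t n,
                                           int initial_offset, int *offset_at)
{
  assert (n == 0 || (sp_adjust != NULL && offset_at != NULL));
  int offset = initial_offset;
  for (size_t i = 0; i < n; i++)
    {
      offset_at[i] = offset;
      /* If sp drops by 16, fp is now 16 bytes further above sp.  */
      offset -= sp_adjust[i];
    }
}
```

A pass that computes the offset once and never updates it would keep using 16 after the adjustment, which is the kind of wrong displacement the quoted report shows.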


[Bug rtl-optimization/56999] [4.8/4.9 Regression] LRA caused miscompilation of xulrunner

2013-04-18 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56999

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2013-04-18 18:28:21 UTC ---

The bug is in complicated interactions of inheritance and coalescing through
several inheritance/coalesce passes.

I think the patch will be ready tomorrow.

Jakub and Marek, thanks for working on extracting the testcase, which required
a lot of effort from you.


[Bug rtl-optimization/56847] [4.8/4.9 Regression] '-fpie' triggers - internal compiler error: in gen_add2_insn, at optabs.c:4705

2013-04-18 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56847

--- Comment #9 from Vladimir Makarov vmakarov at redhat dot com 2013-04-18 20:10:34 UTC ---

I am still working on this.  I have a patch solving the problem, but I'd like
to try other solutions too.


[Bug rtl-optimization/56847] [4.8/4.9 Regression] '-fpie' triggers - internal compiler error: in gen_add2_insn, at optabs.c:4705

2013-04-05 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56847

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #7 from Vladimir Makarov vmakarov at redhat dot com 2013-04-06 03:43:50 UTC ---

  It seems that reload systematically chooses a different alternative (4) than
LRA (1) for movti_internal.  This is a very tricky part of LRA, so I guess
fixing this can take a few days, maybe a week.


[Bug middle-end/55889] [4.8 Regression] ICE: in move_op_ascend, at sel-sched.c:6153 with -fschedule-insns -fselective-scheduling

2013-02-14 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55889

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #29 from Vladimir Makarov vmakarov at redhat dot com 2013-02-14 16:48:24 UTC ---

(In reply to comment #28)
 (In reply to comment #27)
  (In reply to comment #26)
   You are right, your suggestion is what I sketched in comment #21 as choices 1
   or 2.  Sorry for my unclear explanation of what was actually happening.
   
   I don't have a problem with making sel-sched have extra checks when renaming
   registers before reload, which will make us notice a non-obvious extra
   dependence and avoid renaming properly, as now we've figured out these
   dependences don't follow immediately from the RTL.  I just want an extra
   opinion on whether such unexpected dependencies arising when a target (hard)
   register is replaced by a pseudo register should be normal within GCC, or do
   we attribute such dependencies only to the register pressure scheduling
   mode.  FWIW, I would rather agree with the latter than with the former.
  
  I guess you cannot fully assume that dependencies are created only from RTL
  data flow.  There are cases (besides the pressure-sensitive scheduling case
  mentioned here) when dependencies are still created for reasons other than
  RTL data flow.  I'd look at the dependencies as constraints resulting in a
  correct and *desirable* insn schedule, although the overwhelming majority of
  them are created from RTL data flow analysis.
 
 I agree with you in general, it's just this case of having extra dependencies
 because an LHS hard register was substituted by a pseudo that is non-intuitive
 to me.  I am not aware of other similar cases when the other dependency
 reasons you mention kick in after such a transformation.

For example, additional dependencies can be created when queues are too long,
to speed up insn scheduling in some pathological cases.  The probability that
this happens is small, but it still happens, and the selective scheduler can
crash in this case too.

 So I'll try going with the minimal fix of tracking only this particular case
 (of newly created implicit clobbers) in the selective scheduler.
 
 Btw, was the code calculating implicit clobbers via
 ira_implicitly_set_insn_hard_regs planned just for the pressure-sensitive
 scheduling or also for the general case?  It looks like it is needed for the
 former but it is calculated for the latter.

It was done to solve (or at least decrease the probability of) reload crashes
(reload cannot find a spill register) when the first insn scheduling pass is
used.


[Bug inline-asm/56148] [4.8 Regression] inline asm matching constraint with different mode

2013-02-12 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56148

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #12 from Vladimir Makarov vmakarov at redhat dot com 2013-02-12 18:52:05 UTC ---

(In reply to comment #10)
 Vlad, could you please explain a bit how you figured out this issue
 so quickly? (I mean, apart from experience, of course.)

Actually, I worked on this for 2.5 days.  The patch affects very sensitive LRA
code.  I think there is a very small probability that the patch affects other
targets (so I am going to merge trunk into the lra branch at the end of the
week).

The problem was in using the same pseudo for two input operands, only one of
which matches an earlyclobber operand.


[Bug rtl-optimization/56195] [4.8 Regression] Error: incorrect register `%rdi' used with `l' suffix (at -O2)

2013-02-07 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56195

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2013-02-07 19:24:58 UTC ---

(In reply to comment #3)
 I'd say the bug is in get_reload_reg.
 Changing pseudo 118 in operand 0 of insn 90 on equiv 0
 Changing address in insn 90 r59:DI -- no change
 Changing pseudo 59 in address of insn 90 on equiv 0
   Creating newreg=137, assigning class GENERAL_REGS to address r137
  Choosing alt 1 in insn 90:  (0) r  (1) rm
  Reuse r137 for reload 0, change to class INDEX_REGS for r137
    90: flags:CCGC=cmp(r137:DI,[r137:DI])
 Inserting insn reload before:
   256: r137:DI=0
 
   if (get_reload_reg (type, mode, old, goal_alt[i], "", &new_reg)
       && type != OP_OUT)
 
 calls it with
 type=OP_IN, mode=SImode, original=const0_rtx, rclass=GENERAL_REGS
 but returns new_reg = (reg:DI 137).
 That is because:
   if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
       && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
 doesn't check any mode if original (and curr_insn_input_reloads[i].input) are
 VOIDmode as in this case.  So, either this can be fixed by doing:
       if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
 -         && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
 +         && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class)
 +         && GET_MODE (curr_insn_input_reloads[i].reg) == mode)
 , or we could try better, if the GET_MODE (curr_insn_input_reloads[i].reg)
 is wider than mode, see if we can create a lowpart subreg thereof and return
 that, and only give up (i.e. continue looping) if creation of the lowpart
 subreg for some reason failed.
 
 Vlad, what do you think?

I think the second solution, with lowpart, is better.

Would you like to make a patch, or maybe you prefer that I work on it?
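The lowpart approach discussed above can be summarized as a small decision procedure. In this C sketch (hypothetical: toy_reg and toy_try_reuse are invented, mode sizes in bytes stand in for machine modes, and lowpart_subreg itself is not modelled), an existing input reload register is reused as-is when its size matches, reused through its low part when it is wider, and skipped when it is narrower.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reload register: a register number plus its mode size in bytes.  */
struct toy_reg {
  int regno;
  int mode_size;
  int is_lowpart_subreg;   /* 1 if this is a lowpart view of a wider reg */
};

/* Try to reuse EXISTING for an input reload of MODE_SIZE bytes.
   Returns 1 and fills *RESULT on success, 0 if the caller should keep
   looking (or create a new reload register).  */
static int toy_try_reuse (struct toy_reg existing, int mode_size,
                          struct toy_reg *result)
{
  assert (result != NULL);
  if (existing.mode_size == mode_size)
    {
      *result = existing;          /* same mode: reuse as-is */
      return 1;
    }
  if (existing.mode_size < mode_size)
    return 0;                      /* too narrow: cannot reuse */
  result->regno = existing.regno;  /* wider: reuse its low part */
  result->mode_size = mode_size;
  result->is_lowpart_subreg = 1;
  return 1;
}
```

In the reported case, (reg:DI 137) requested in SImode would come back as a 4-byte lowpart view of r137 instead of the full DImode register that produced the bad cmpl.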


[Bug rtl-optimization/56195] [4.8 Regression] Error: incorrect register `%rdi' used with `l' suffix (at -O2)

2013-02-07 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56195

--- Comment #7 from Vladimir Makarov vmakarov at redhat dot com 2013-02-07 20:08:47 UTC ---

(In reply to comment #6)
 Actually, that one doesn't really work, because we have a pseudo rather than
 a hard reg at that point, which will never simplify.
 
 With this:
 
 --- lra-constraints.c.jj  2013-02-07 18:34:39.0 +0100
 +++ lra-constraints.c     2013-02-07 20:58:25.558920536 +0100
 @@ -421,8 +421,20 @@ get_reload_reg (enum op_type type, enum
        if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
            && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
          {
 -          *result_reg = curr_insn_input_reloads[i].reg;
 -          regno = REGNO (*result_reg);
 +          rtx reg = curr_insn_input_reloads[i].reg;
 +          regno = REGNO (reg);
 +          /* If input is equal to original and both are VOIDmode,
 +             GET_MODE (reg) might be still different from mode.
 +             Ensure we don't return *result_reg with wrong mode.  */
 +          if (GET_MODE (reg) != mode)
 +            {
 +              if (GET_MODE_SIZE (GET_MODE (reg)) < GET_MODE_SIZE (mode))
 +                continue;
 +              reg = lowpart_subreg (mode, reg, GET_MODE (reg));
 +              if (reg == NULL_RTX || GET_CODE (reg) != SUBREG)
 +                continue;
 +            }
 +          *result_reg = reg;
            if (lra_dump_file != NULL)
              {
                fprintf (lra_dump_file, " Reuse r%d for reload ", regno);
 
 the assembly difference is:
 
 -       cmpl    (%rdi), %rdi
 +       cmpl    (%rdi), %edi
 
 which is desirable in this case, but not sure if all get_reload_reg callers
 will grok a SUBREG instead of REG returned in *result_reg.

This version of the patch looks OK to me.  I have no worries about
get_reload_reg callers.  It should work fine (that is a difference from the
reload pass, where you have to care about secondary reloads, etc.).

Thanks for working on this, Jakub.


[Bug rtl-optimization/56069] [4.6/4.7/4.8 Regression] RA pessimization

2013-01-22 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56069

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2013-01-22 21:01:21 UTC ---

It is definitely a regmove pass drawback.  IRA can do nothing in this
case.

We have the following code before and after regmove:

before:

    2: r62:DI=di:DI
      REG_DEAD di:DI
    6: {r64:DI=r62:DI<<0x3;clobber flags:CC;}
      REG_DEAD r62:DI
    7: r65:DI=0x1000
    8: {r63:DI=r64:DI|r65:DI;clobber flags:CC;}
      REG_DEAD r65:DI
      REG_DEAD r64:DI
   13: ax:DI=r63:DI
      REG_DEAD r63:DI
   16: use ax:DI

after:

    2: r63:DI=di:DI
      REG_DEAD di:DI
    6: {r63:DI=r63:DI<<0x3;clobber flags:CC;}
    7: r65:DI=0x1000
    8: {r63:DI=r63:DI|r65:DI;clobber flags:CC;}
      REG_DEAD r65:DI
   13: ax:DI=r63:DI
      REG_DEAD r63:DI
   16: use ax:DI

Regmove changes r64 to r63.  This creates two equal hard reg preferences
for r63: AX or DI.  Choosing either one results in worse code.

The original generated code can be achieved if regmove changes r65 to
r63.  In this case we have only one hard register preference for r63
(AX) and one for r62 (DI).

It can be achieved if regmove tries all orders of commutative operands
(currently the regmove pass uses the first order found), using additional
heuristics (live range length and/or number of preferred hard regs) to
choose the best order.

It is not a trivial change and cannot be done for the given release
(gcc4.8).

Also, we now have LRA, and maybe such regmove transformations are not
necessary.  I am going to try this for gcc4.9 when I have more time.
Still, to assign ax to r65 and r63 we also need some hard register
preference propagation in IRA, which is currently absent.

In any case, I think that older releases were just lucky and generated
the best code because some passes before regmove put r65 as the first
input operand.
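The commutative-order heuristic suggested above can be sketched as follows. This is only an illustration under assumptions: hard register preferences are encoded as bitmasks, the score is just the number of distinct preferred hard registers (one of the two heuristics mentioned; live range length is not modelled), and toy_choose_tied_input is a hypothetical name, not a regmove function. With r63 preferring AX and r64 inheriting DI from r62, tying r63 with r65 wins, matching the analysis of insn 8.

```c
#include <assert.h>

/* Toy hard register preference sets as bitmasks (hypothetical encoding). */
#define PREF_AX 0x1u
#define PREF_DI 0x2u

static int toy_popcount (unsigned x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)   /* clear the lowest set bit each time */
    n++;
  return n;
}

/* For a commutative insn DEST = IN0 op IN1, regmove-style tying merges DEST
   with one input.  Instead of always taking the first order, try both and
   return the index (0 or 1) of the input whose merge leaves DEST with the
   fewest distinct hard register preferences.  Ties keep operand 0.  */
static int toy_choose_tied_input (unsigned dest_prefs, unsigned in0_prefs,
                                  unsigned in1_prefs)
{
  int cost0 = toy_popcount (dest_prefs | in0_prefs);
  int cost1 = toy_popcount (dest_prefs | in1_prefs);
  return cost1 < cost0 ? 1 : 0;
}
```

For insn 8 (r63 = r64 | r65), passing AX for r63, DI for r64 and no preference for r65 selects operand 1, i.e. r65.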


[Bug rtl-optimization/55153] [4.8 Regression] ICE: in begin_move_insn, at sched-ebb.c:205 with -fsched2-use-superblocks and __builtin_prefetch

2013-01-14 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55153

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2013-01-14 19:44:36 UTC ---

(In reply to comment #2)
  Vlad, can you please have a look?  Thanks.

OK, I have started working on this.


[Bug rtl-optimization/55829] [4.8 Regression] ICE: in curr_insn_transform, at lra-constraints.c:3069 with -msse3

2013-01-08 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55829

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2013-01-08 16:09:58 UTC ---

(In reply to comment #2)
 I think this patch can be useful and does give the RA more freedom, but it is
 unclear whether it doesn't make some LRA bug latent.  Vlad?

I am working on it on the LRA side.  I hope the patch will be ready today.


[Bug rtl-optimization/55458] [4.8 Regression] ICE: in assign_by_spills, at lra-assigns.c:1212 with -fPIC -m32 and simple asm volatile

2012-11-27 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55458

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2012-11-27 22:03:04 UTC ---

  Reload also cannot compile the test, but at least it gives a meaningful
error.

The problem is that the asm insn needs 6 reload regs and there are only 5 of
them (one is reserved for PIC and sp is always reserved).  With optimization,
the asm insn requires only 3 regs.

  I've just submitted a patch reporting the error as reload does.  In any
case, I had wanted to add this code for some time.


[Bug rtl-optimization/55330] [4.8 Regression] ICE: Maximum number of LRA constraint passes is achieved (15) on gfortran.dg/actual_array_constructor_1.f90

2012-11-16 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55330

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2012-11-16 16:39:02 UTC ---

(In reply to comment #2)
 (In reply to comment #1)
  I don't see it on x86_64-apple-darwin10 (revisions 193495+patches and
  193329).
 
 Looks like a duplicate of 55122.

Both have the same outcome and diagnostic, but the reasons for them are
different.


[Bug rtl-optimization/55247] [4.8 Regression] internal compiler error: Max. number of generated reload insns per insn is achieved (90)

2012-11-09 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55247

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||ubizjak at gmail dot com

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2012-11-09 
19:42:30 UTC ---
Here is the insn in question:

(insn 26 25 27 2 (set (reg:TI 115 [orig:100 *defsym_17 ] [100])
        (mem:TI (zero_extend:DI (reg:SI 98)) [7 *defsym_17+0 S16 A32])) h.i:54
     61 {*movti_internal_rex64}

As I understand it, the first alternative has ! to strongly encourage the use
of SSE registers instead of GENERAL registers.

(define_insn "*movti_internal_rex64"
  [(set (match_operand:TI 0 "nonimmediate_operand" "=!r ,o  ,x,x ,m")
        (match_operand:TI 1 "general_operand"      "riFo,riF,C,xm,x"))]
  "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"

For some reason, the second alternative does not have !.  I don't know why it
differs from the first alternative.

  Reload handles this because it has already substituted a hard register for
the first operand, and in that case it rejects the 2nd alternative:

(insn 26 25 27 2 (set (reg:TI 0 ax [orig:100 *defsym_17 ] [100])
        (mem:TI (zero_extend:DI (reg:SI 2 cx [98])) [7 *defsym_17+0 S16 A32]))
     h.i:54 61 {*movti_internal_rex64}

Adding ! to the second alternative (as I believe it should be) solves the
problem.

(define_insn "*movti_internal_rex64"
  [(set (match_operand:TI 0 "nonimmediate_operand" "=!r ,!o  ,x,x ,m")
        (match_operand:TI 1 "general_operand"      "riFo,riF,C,xm,x"))]
  "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))"

Uros, is this change OK with you?  If it is, I can commit the patch only on
Wednesday (I'll be away for a few days).
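The GCC internals manual documents `!` as severely disparaging the alternative it appears in: the alternative can still be used if it fits without reloading, but if reloading is needed, some other alternative is used. The toy selector below is a hypothetical model of only that documented rule (it is not reload's or LRA's cost code, and the `alt_t` type and all costs are invented), to show why putting `!` on the second alternative steers the choice away from it whenever it would need reloads.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of one insn alternative.  */
typedef struct {
    bool disparaged; /* constraint contains '!' */
    int reloads;     /* number of reloads this alternative would need */
    int cost;        /* invented cost if this alternative is chosen */
} alt_t;

/* Return the index of the cheapest usable alternative, or -1 if none.
   A '!' alternative is usable only when it needs no reloads at all.  */
static int pick_alternative(const alt_t *alts, int n)
{
    int best = -1;

    for (int i = 0; i < n; i++) {
        if (alts[i].disparaged && alts[i].reloads > 0)
            continue; /* '!' and reloading needed: use something else */
        if (best < 0 || alts[i].cost < alts[best].cost)
            best = i;
    }
    return best;
}
```

For example, with alternatives {`!`, 1 reload, cost 1}, {no `!`, 1 reload, cost 1} and {no `!`, 0 reloads, cost 5}, the selector picks alternative 1; once alternative 1 is also marked with `!`, it falls through to alternative 2.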


[Bug rtl-optimization/55141] [4.8 Regression] wrong code with -fno-split-wide-types

2012-11-07 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55141

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2012-11-07 
15:21:11 UTC ---
(In reply to comment #2)
 Yeah, argp is eliminated to rsp + 16 instead of the correct rsp + 32 (there are
 2 64-bit call used registers saved to stack in the prologue, callq pushes 8
 bytes and rsp is adjusted by 8 to maintain the required stack alignment).
 Vlad, can you please take a look?

Sure, I'll look at this.  I just don't know exactly when (probably in a few
days), because I am working on many LRA PRs these days.
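The correct offset in the quoted report follows from simple arithmetic over the prologue layout it describes: two saved 64-bit registers, the 8-byte return address pushed by callq, and the 8-byte alignment adjustment of rsp. The helper below is a hypothetical illustration of that sum, not GCC's elimination code.

```c
/* Hypothetical helper (not part of GCC): distance in bytes from %rsp after
   the prologue up to the incoming stack arguments (argp) on x86-64.  */
static int argp_offset_from_rsp(int n_saved_regs, int align_adjust)
{
    const int word_size = 8;      /* one pushed 64-bit register */
    const int return_address = 8; /* pushed by callq */

    return n_saved_regs * word_size + return_address + align_adjust;
}
```

With the values from the report, `argp_offset_from_rsp(2, 8)` gives 2 * 8 + 8 + 8 = 32, the correct elimination offset, rather than the 16 that was used.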


[Bug rtl-optimization/55092] [4.8 Regression] LRA doesn't scale

2012-10-26 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55092

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2012-10-26 
22:49:23 UTC ---
 LRA reuses stack memory much better than reload (in all modes, but especially
at -O0).  Maybe that is the reason for the var-tracking problem.


[Bug rtl-optimization/55092] [4.8 Regression] LRA doesn't scale

2012-10-26 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55092

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2012-10-26 
22:57:38 UTC ---
(In reply to comment #2)
  LRA reuses stack memory much better than reload (in all modes, but especially
 at -O0).  Maybe that is the reason for the var-tracking problem.

I forgot to say that LRA understands -fno-ira-share-spill-slots.  In this case,
each pseudo gets its own stack slot.

I think it is worth trying.


[Bug debug/54402] [4.8 Regression] var-tracking does not scale

2012-10-26 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54402

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #5 from Vladimir Makarov vmakarov at redhat dot com 2012-10-26 
23:06:44 UTC ---
OK, I'll try to find the reason for this slowdown.


[Bug regression/55050] Regression test failure slp-21.c on arm-linux-gnueabi

2012-10-24 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55050

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2012-10-24 
20:27:06 UTC ---
  I am not sure that this is an LRA merge problem.  The LRA merge should not
affect ARM, because ARM should still be using the old reload.  There is a very
small change to arm.c because I added a new argument to final.c::alter_subreg.
There are a few changes in IRA too, but again I don't think they affect ARM.

  Could you provide a preprocessed file?  I cannot reproduce the problem
myself.  Thanks.


[Bug rtl-optimization/53125] Very slow register allocation on SPARC

2012-05-10 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53125

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2012-05-10 
18:30:19 UTC ---
  I've tried a recent trunk on gcc63 of the compiler farm with -O0.  The
compilation takes about 300sec.  I also checked gcc-4.3 (the last version with
the old RA); it also takes about 300sec.  The old RA itself is slower (150sec)
than IRA (55sec), but the register information pass (more exactly
regstat_compute_ri, which is part of the DF infrastructure) takes more time in
the trunk than in gcc-4.3.  So my times differ from what you reported.
Probably it depends on the machine (gcc63 is a relatively modern SPARC machine
with NIAGARA processors).

  After some investigation, I found that the trunk gcc calls regstat_compute_ri
more often than gcc-4.3 does.  That is the result of a recent addition to IRA
that moves some insns (Bernd's month-old patch).  It is not worth doing at -O0.
So I am going to switch it off, which brings the number of regstat_compute_ri
calls back to 2, as in gcc-4.3, and should bring compilation time below 200sec
(65% of the previous time).  I am going to submit a patch today.

  Reducing the number of regstat_compute_ri calls further is not possible,
because we need one call for IRA and one call after the reload transformations
(for subsequent passes).  Speeding up IRA itself could have only a small
impact, and I don't see how to do it: IRA is very simple and fast enough here
(3 times faster than the old RA).

  One might think that not doing RA at all (setting -1 for all reg_renumber
elements) could speed this case up.  But this is not true: it increases reload
work enormously and generates 2-3 times more insns, which slows the compiler
down even more.

  So, Ian, if you need more speedup for -O0, regstat_compute_ri should be
improved.  But that is not my area of responsibility.  To me it is strange
that such a simple task (which requires one pass over the RTL) takes so much
time in this case.


[Bug rtl-optimization/53125] Very slow register allocation on SPARC

2012-04-28 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53125

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2012-04-29 
00:08:54 UTC ---
I'll look at this PR in a week.


[Bug rtl-optimization/52208] [4.7 Regression] Useless store

2012-02-15 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52208

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2012-02-15 
19:34:06 UTC ---
(In reply to comment #3)
 The -1000 costs comes from the scan_one_insn subtracting there
 ira_memory_move_cost[][][] * frequency (i.e. memory_cost becomes -4000) and
 on the plus we add just 3000 to memory_cost.
 I wonder if we shouldn't limit this subtraction of mem_cost / setting of
 counted_mem e.g. to general_operand (SET_SRC (set), GET_MODE (SET_SRC (set)))
 and leave the specialized memory loads alone (I know, it would be a hack, but
 works for this and shouldn't pessimize the cases for which this hunk has been
 added).

I would not call this a hack, Jakub.  It is a heuristic :)  This solution is OK
for me.  I checked SPEC2000 and did not find any effect of this patch on the
generated code.  So the patch is OK, but it would be great if you added a
comment explaining the change.

  And would at least tiny bit model what reload will do with such
 non-standard mems - as on this testcase it doesn't use the original mem, but
 does the load, followed by store to another mem, followed by load from that
 mem.
 
 --- ira-costs.c.jj 2012-01-20 12:35:17.0 +0100
 +++ ira-costs.c 2012-02-14 14:54:52.297356053 +0100
 @@ -1313,7 +1313,8 @@ scan_one_insn (rtx insn)
            || (CONSTANT_P (XEXP (note, 0))
                && targetm.legitimate_constant_p (GET_MODE (SET_DEST (set)),
                                                  XEXP (note, 0))
 -              && REG_N_SETS (REGNO (SET_DEST (set))) == 1)))
 +              && REG_N_SETS (REGNO (SET_DEST (set))) == 1))
 +      && general_operand (SET_SRC (set), GET_MODE (SET_SRC (set))))
      {
        enum reg_class cl = GENERAL_REGS;
        rtx reg = SET_DEST (set);
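The numbers in Jakub's analysis can be followed with a toy model. The function below is a hypothetical sketch of the bookkeeping only (the real logic in ira-costs.c is more involved): the equivalent-memory load credits memory_cost with ira_memory_move_cost[][][] * frequency, and a later step adds 3000 back. The split of 4000 into a move cost of 4 and a frequency of 1000 is invented for illustration, and the boolean parameter stands in for the general_operand guard that the patch adds.

```c
#include <stdbool.h>

/* Hypothetical toy model of the memory_cost arithmetic described above. */
static int memory_cost_after_scan(int mem_move_cost, int freq, int added_back,
                                  bool src_is_general_operand)
{
    int memory_cost = 0;

    /* scan_one_insn-style credit, taken only when the patched guard holds. */
    if (src_is_general_operand)
        memory_cost -= mem_move_cost * freq;

    /* Cost of the memory alternative that is added afterwards.  */
    memory_cost += added_back;
    return memory_cost;
}
```

`memory_cost_after_scan(4, 1000, 3000, true)` yields -1000, the negative cost from the report; with the guard false (a load that is not a general_operand, such as the specialized memory loads mentioned above), the credit is skipped and the result stays at 3000.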


[Bug rtl-optimization/49800] [4.7 Regression] segfault with -fsched-pressure -fdump-rtl-sched1

2012-02-02 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49800

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2012-02-02 
18:33:34 UTC ---
I am working on it.


[Bug rtl-optimization/40761] IRA memory hog for insanely nested loops

2012-01-19 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40761

--- Comment #17 from Vladimir Makarov vmakarov at redhat dot com 2012-01-19 
20:42:57 UTC ---
  The problem was building the CFG loops, which took most of the time.  CFG
loops were built even when regional allocation is not used, as at -O0.

  I'll send a patch soon.  It is not small, because IRA in any case uses one
region, with a CFG loop representing the whole function.


[Bug rtl-optimization/40761] IRA memory hog for insanely nested loops

2012-01-18 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40761

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #16 from Vladimir Makarov vmakarov at redhat dot com 2012-01-18 
22:01:11 UTC ---
I'll work on it.


[Bug target/49865] [4.7 Regression] Unnecessary reload causes small bloat

2011-12-13 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #5 from Vladimir Makarov vmakarov at redhat dot com 2011-12-13 
19:38:16 UTC ---
I cannot reproduce it on the current trunk (rev. 182263).  The recent IRA
patches might have fixed it.  The code generated by the current trunk is:
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset 5, -8
	movl	%esp, %ebp
	.cfi_def_cfa_register 5
	pushl	%edi
	.cfi_offset 7, -12
	movl	$1024, %ecx
	xorl	%eax, %eax
	movl	8(%ebp), %edi
	rep stosl
	movl	8(%ebp), %eax
	movl	$0, 4096(%eax)
	popl	%edi
	.cfi_restore 7
	popl	%ebp
	.cfi_restore 5
	.cfi_def_cfa 4, 4
	ret
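Read back into C, the assembly above zeroes 1024 32-bit words starting at the pointer argument (rep stosl with %ecx = 1024, %eax = 0 and %edi = the argument) and then stores 0 at byte offset 4096. The function below is a hypothetical C equivalent of that behaviour; the struct layout and names are assumptions, and it is not the test case from the PR.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout matching the offsets used by the assembly.  */
struct buf {
    uint32_t words[1024]; /* 4096 bytes, cleared by rep stosl */
    uint32_t tail;        /* byte offset 4096: movl $0, 4096(%eax) */
};

/* Same effect as the generated code: clear the array, then the tail.  */
void clear_buf(struct buf *b)
{
    memset(b->words, 0, sizeof b->words);
    b->tail = 0;
}
```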


[Bug rtl-optimization/50176] [4.7 Regression] 4.7 generates spill-fill dealing with char-int conversion

2011-12-13 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50176

--- Comment #9 from Vladimir Makarov vmakarov at redhat dot com 2011-12-13 
20:04:04 UTC ---
(In reply to comment #0)
 Created attachment 25088
 
 
  After expanding 4.7 contains:
 
 (insn 52 51 53 6 (set (reg:QI 83 [ D.2723 ])
 (mem:QI (plus:SI (reg/v/f:SI 75 [ inptr1 ])
 (reg/v:SI 117 [ col ])) [0 MEM[base: inptr1_19, index: col_90,
 offset: 0B]+0 S1 A8])) test_4_6.c:42 -1
  (nil))
 
  and 4.6 contains
 
 (insn 52 51 53 6 (parallel [
 (set (reg/v:SI 86 [ cb ])
 (zero_extend:SI (mem:QI (plus:SI (reg/v/f:SI 76 [ inptr1 ])
 (reg/v:SI 78 [ col ])) [0 MEM[base: inptr1_19, 
 index: col_22, offset: 0B]+0 S1 A8])))
 (clobber (reg:CC 17 flags))
 ]) test_4_6.c:42 -1
  (nil))
 
 

The reason for the different RA outcome is that for p83, generated by 4.7, we
can use only q regs, versus general regs for p86, generated by 4.6.  This halves
the number of possible hard regs for p83 and leads to the failure to assign p83
a hard register.  More precisely, IRA assigns dx to p83; then reload spills p83
because it needs that hard register; then reload asks IRA to reassign a hard
register to p83, and IRA fails.


[Bug rtl-optimization/21617] [4.4/4.5/4.6/4.7 Regression] CRC64 algorithm optimization problem on Intel 32-bit

2011-12-09 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21617

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-12-09 
19:09:52 UTC ---
A small difference in the code results in this degradation.

-O1 generates an insn in the major loop

(insn 43 42 44 5 /home/cygnus/vmakarov/build1/trunk/crctest64.c:241 (parallel [
(set (reg/v:SI 77 [ __tab_index ])
(xor:SI (reg:SI 108)
(reg:SI 120)))
(clobber (reg:CC 17 flags))
]) 395 {*xorsi_1} (expr_list:REG_DEAD (reg:SI 108)
(expr_list:REG_DEAD (reg:SI 120)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))))

-O2 generates analogous insn

(insn 39 38 40 5 /home/cygnus/vmakarov/build1/trunk/crctest64.c:241 (parallel [
(set (reg/v:SI 83 [ __tab_index ])
(xor:SI (reg/v:SI 83 [ __tab_index ])
(reg:SI 143)))
(clobber (reg:CC 17 flags))
]) 395 {*xorsi_1} (expr_list:REG_DEAD (reg:SI 143)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))

The difference is caused by the regmove optimization.

The RTL insn in the second variant looks even better, but it makes pseudo 83
the most frequently used pseudo, so it is pushed last onto the coloring stack
among a bunch of trivially colorable pseudos and is therefore assigned first.
The set of trivially colorable pseudos contains two double-word pseudos, which
need two adjacent hard registers each.  Assigning pseudo 83 first (the case is
further complicated because some pseudos cross calls) leaves only one pair of
adjacent free hard registers: there are still 2 free hard registers for the
second double-word pseudo, but they are not adjacent.  This results in
spilling one double-word pseudo and degraded code performance.

At -O1 the analogous pseudo (p77) is assigned last, after the two double-word
pseudos, and no spilling occurs.

To solve the problem, we should increase the probability of keeping free hard
registers adjacent.  This can be done by modifying bucket_allocno_compare_func
so that multi-word pseudos are pushed last onto the coloring stack and are
therefore assigned first.  I did this and the problem was solved; unfortunately,
it results in a 2% performance degradation on SPEC2000 perlbmk, although this
heuristic gives a small code size improvement on SPEC2000.

On a general note, register allocation is all about heuristics, so it is always
possible to find a test where one heuristic works worse than another.  The most
important thing is that the RA works well overall (on a big, credible set of
tests).  From this point of view, IRA is much better than the previous register
allocator.

But because CRC code is important, I'll continue working on tuning that solves
the problem without degrading SPEC2000.
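The effect of assignment order on double-word pseudos can be reproduced with a toy allocator. The sketch below is a hypothetical model, not IRA's coloring code: five free hard registers r0..r4, one single-word pseudo standing in for pseudo 83 that (by assumption) prefers r1, and two double-word pseudos that each need two adjacent registers. Pseudos are assigned in the order they are popped from the coloring stack, so "pushed last" means "assigned first".

```c
#include <stdbool.h>

#define NREGS 5 /* hypothetical: five free hard registers r0..r4 */

typedef struct {
    const char *name;
    int nwords;    /* 1 = single-word pseudo, 2 = needs two adjacent regs */
    int preferred; /* preferred start register, or -1 for no preference */
} pseudo_t;

/* True if registers start .. start+nwords-1 exist and are all free.  */
static bool range_free(const bool used[NREGS], int start, int nwords)
{
    if (start < 0 || start + nwords > NREGS)
        return false;
    for (int r = start; r < start + nwords; r++)
        if (used[r])
            return false;
    return true;
}

/* Take the preferred range if it is free, else the first free range.
   Marks the registers used; returns the start register or -1 (spill).  */
static int assign_one(bool used[NREGS], const pseudo_t *p)
{
    int start = -1;

    if (p->preferred >= 0 && range_free(used, p->preferred, p->nwords))
        start = p->preferred;
    for (int r = 0; start < 0 && r < NREGS; r++)
        if (range_free(used, r, p->nwords))
            start = r;
    if (start >= 0)
        for (int r = start; r < start + p->nwords; r++)
            used[r] = true;
    return start;
}

/* Assign pseudos in the given order; return how many were spilled.  */
static int count_spills(const pseudo_t *order[], int n)
{
    bool used[NREGS] = { false };
    int spills = 0;

    for (int i = 0; i < n; i++)
        if (assign_one(used, order[i]) < 0)
            spills++;
    return spills;
}
```

Assigning the single-word pseudo first leaves r0, r2, r3 and r4 free; the first pair takes r2-r3, and the second double-word pseudo finds only r0 and r4, which are free but not adjacent, so it spills. Assigning the double-word pseudos first (the bucket_allocno_compare_func change described above) gives r0-r1 and r2-r3, and the single-word pseudo still gets r4.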


[Bug other/50775] Register allocator sets up frame and frame pointer with low register pressure

2011-12-03 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50775

--- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-12-04 
04:09:06 UTC ---
(In reply to comment #5)
 (In reply to comment #4)
 
  Wrong profitable hard regs calculation for register files requiring aligned
  start register was a merging problem with a patch for allocation without
  cover classes.
  
  I'll try to make a patch this week to solve the problem.
 
 Thanks for taking care of this.  Will it also improve the situation for
 3-byte types as introduced in PR50931? 3-byte types also start in even
 registers.

I think it will improve.

Sorry for the delay with the patch.  The changes are big (the patch is about
1700 lines long), so it needs thorough testing.


[Bug other/50775] Register allocator sets up frame and frame pointer with low register pressure

2011-11-28 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50775

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-11-28 
21:48:20 UTC ---

(In reply to comment #2)
 
 Also, I have a question about the following fields of `ira_allocno':
   /* The number of objects tracked in the following array.  */
   int num_objects;
   /* An array of structures describing conflict information and live
  ranges for each object associated with the allocno.  There may be
  more than one such object in cases where the allocno represents a
  multi-word register.  */
   ira_object_t objects[2];
 --^
 The SImode for AVR consists of 4 words, but only 2 objects in allocno
 structure.
 Is this right ?
 
 

Yes, that is right.  IRA objects were introduced by Bernd Schmidt.
Unfortunately, I did not review his patch.  Probably Bernd decided that
handling allocnos of up to 2 hard regs covers most cases (and maybe he is
right).  An allocno with more hard regs is processed as one object.  This means
that all of its one-register parts conflict with all one-register parts of
another allocno, even if in reality some part does not conflict with a given
part of the other allocno.

Wrong profitable hard regs calculation for register files requiring an aligned
start register was a merging problem with a patch for allocation without cover
classes.

I'll try to make a patch this week to solve the problem.

Dennis, thanks for the detailed analysis of the problem.  It saved me time.
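The loss of precision from tracking a multi-register allocno as a single object can be shown with a toy conflict matrix. The code below is a hypothetical illustration, not IRA's data structures: `m[i][j]` records whether one-register part i of allocno A really conflicts with part j of allocno B, and the single-object view collapses that matrix to one bit.

```c
#include <stdbool.h>

#define MAX_PARTS 4 /* e.g. a 4-register SImode value on AVR */

/* Single-object view: if any part of A conflicts with any part of B,
   every part of A is treated as conflicting with every part of B.  */
static bool single_object_conflict(bool m[MAX_PARTS][MAX_PARTS],
                                   int na, int nb)
{
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (m[i][j])
                return true;
    return false;
}
```

If only part 0 of A conflicts with part 0 of B, the per-part matrix still says that part 3 of A does not conflict with part 3 of B, but `single_object_conflict` reports a conflict for the whole pair, which is the conservative behaviour described above.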


[Bug rtl-optimization/50829] avx extra copy for _mm256_insertf128_pd

2011-11-23 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #5 from Vladimir Makarov vmakarov at redhat dot com 2011-11-24 
03:29:09 UTC ---
The following code is generated before RA:
...
(insn 7 3 11 2 (set (reg:V4DF 63)
(unspec:V4DF [
(reg/v:V2DF 62 [ x ])
] UNSPEC_CAST)) ./include/avxintrin.h:1413 1960 {avx_pd256_pd}
 (nil))

(insn 11 7 17 2 (set (reg:V4DF 65)
(vec_concat:V4DF (vec_select:V2DF (reg:V4DF 63)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
]))
(reg/v:V2DF 62 [ x ]))) ./include/avxintrin.h:715 1933
{vec_set_hi_v4df}
 (expr_list:REG_DEAD (reg:V4DF 63)
(expr_list:REG_DEAD (reg/v:V2DF 62 [ x ])
(nil))))
...

First of all, the unspec in insn 7 hides the fact that 63 and 62 have the same
value.  But even if the unspec were absent, IRA, like most other RAs, finds
conflicts based on live ranges, not on the values in the pseudos.  Finding
conflicts based on GVN is very expensive and gives nothing on most code (I did
GVN-based conflict recognition about 8 years ago; it is described in the 2nd
GCC Summit proceedings, if I remember correctly).

As 62 and 63 conflict, they get different hard registers.

I guess that the right RTL generation (using one pseudo for 62 and 63) should
be done somewhere outside IRA.
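The point that conflicts come from live ranges rather than values can be illustrated with a minimal interference test. The sketch below is hypothetical: insn numbers stand in for program points, and the start point 3 for pseudo 62 is invented. Pseudo 63 is set in insn 7 while pseudo 62 stays live until insn 11, so the ranges overlap and the pseudos conflict even though they hold the same value.

```c
#include <stdbool.h>

/* Hypothetical live range: defined at insn FROM, last used at insn TO.  */
typedef struct {
    int from;
    int to;
} live_range;

/* Two pseudos interfere if one is defined while the other is still live
   afterwards.  A range ending exactly where the other starts does not
   interfere: the dying register can be reused for the new value.
   Nothing here looks at the values the pseudos hold.  */
static bool ranges_conflict(live_range a, live_range b)
{
    int later_start = a.from > b.from ? a.from : b.from;
    int earlier_end = a.to < b.to ? a.to : b.to;

    return later_start < earlier_end;
}
```

With p62 = {3, 11} and p63 = {7, 11} the function reports a conflict; ranges {3, 7} and {7, 11} do not conflict.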


[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2011-11-23 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

--- Comment #7 from Vladimir Makarov vmakarov at redhat dot com 2011-11-24 
03:45:24 UTC ---
As for the stack allocation: crtl->stack_realign_needed == 1 results in
frame_pointer_needed := 1 in ira.c::ira_setup_eliminable_regset.  I don't
remember the origin of this code.  Probably it comes from HJ's stack alignment
work.  Sorry if I am wrong.

I guess we should re-evaluate frame_pointer_needed at the end of RA if RA does
not allocate any stack memory at all.


[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count

2011-08-24 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2011-08-24 
16:02:57 UTC ---
Yesterday I sent a patch
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01954.html which most probably
solves the problem.

Now the code size for the test is 419 bytes with gcc 4.6 vs. 411 bytes with
gcc as of Aug 24.


[Bug bootstrap/50146] [4.7 regression] unused variable saved_nregs in ira-color.c broke arm-linux-gnueabi bootstrap

2011-08-21 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50146

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2011-08-22 
03:32:18 UTC ---
(In reply to comment #0)
 gcc-4.7-20110820 fails to bootstrap on arm-linux-gnueabi with:
 

 The issue is that saved_nregs is declared unconditionally but used #ifndef
 HONOR_REG_ALLOC_ORDER.  The ARM backend does define HONOR_REG_ALLOC_ORDER, so
 the warning is expected.
 
 I'm testing the following fix:
 
 --- gcc-4.7-20110820/gcc/ira-color.c.~1~	2011-08-18 16:56:36.0 +0200
 +++ gcc-4.7-20110820/gcc/ira-color.c	2011-08-21 19:11:00.0 +0200
 @@ -1567,13 +1567,14 @@ static bool
  assign_hard_reg (ira_allocno_t a, bool retry_p)
  {
HARD_REG_SET conflicting_regs[2], profitable_hard_regs[2];
 -  int i, j, hard_regno, best_hard_regno, class_size, saved_nregs;
 +  int i, j, hard_regno, best_hard_regno, class_size;
int cost, mem_cost, min_cost, full_cost, min_full_cost, nwords, word;
int *a_costs;
enum reg_class aclass;
enum machine_mode mode;
static int costs[FIRST_PSEUDO_REGISTER], full_costs[FIRST_PSEUDO_REGISTER];
  #ifndef HONOR_REG_ALLOC_ORDER
 +  int saved_nregs;
enum reg_class rclass;
int add_cost;
  #endif

Sorry, my bad.  It is from my patch for PR50107.

The patch is OK, so you can commit it to the trunk.

Thank you.


[Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way

2011-08-19 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #14 from Vladimir Makarov vmakarov at redhat dot com 2011-08-19 
16:12:48 UTC ---
(In reply to comment #11)
 (In reply to comment #10)
   movq	%rdi, %rdx
   mulx	%rsi, %rax, %rsi
   movq	%rsi, %rdx
   ret
   .cfi_endproc
   .LFE0:
   .size	test_mul_64, .-test_mul_64
   .ident	"GCC: (GNU) 4.7.0 20110818 (experimental)"
   .section	.note.GNU-stack,"",@progbits
   [hjl@gnu-6 pr50107]$ 
   
   I would expect
   
   movq	%rdi, %rdx
   mulx	%rsi, %rax, %rdx
   ret
  
  I think it is a reload problem.  IRA assigns dx to pseudo 71 (an insn output)
  but reload then spills it.
 
 uti-2.i.188r.asmcons has
 
 (insn 11 4 24 2 (parallel [
 (set (reg:DI 72)
 (mult:DI (reg/v:DI 64 [ b ])
 (reg/v:DI 63 [ a ])))
 (set (reg:DI 73 [+8 ])
 (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 64 [ b ]))
 (zero_extend:TI (reg/v:DI 63 [ a ])))
 (const_int 64 [0x40]))))
 ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
  (expr_list:REG_DEAD (reg/v:DI 64 [ b ])
 (expr_list:REG_DEAD (reg/v:DI 63 [ a ])
 (nil))))
 
 uti-2.i.191r.ira generates:
 
 (insn 11 28 25 2 (parallel [
 (set (reg:DI 0 ax [72])
 (mult:DI (reg/v:DI 4 si [orig:64 b ] [64])
 (reg:DI 1 dx)))
 (set (reg:DI 4 si [orig:73+8 ] [73])
 (truncate:DI (ashiftrt:TI (mult:TI (zero_extend:TI (reg/v:DI 4 si [orig:64 b ] [64]))
 (zero_extend:TI (reg:DI 1 dx)))
 (const_int 64 [0x40]))))
 ]) uti-2.i:3 339 {bmi2_mulxditi3_internal}
  (nil))
 
 Why does IRA/reload choose SI for pseudo 73?

IRA assigns dx to pseudo 73.  Then the reload pass needs dx for pseudo 63, so
reload spills 73 and reassigns si to 73.  The reload pass spills pseudo 73
because it believes that pseudos living through the insn, dying in it, or set
in it (pseudo 73 is set) conflict with the necessary reload.

Of course it is not really necessary to spill pseudo 73, but teaching the reload
pass that is a big, error-prone project.  I would not recommend starting it.

I myself am not interested in working on the reload pass.  Instead, I prefer to
work on LRA (local RA), which is a replacement for the reload pass.


[Bug rtl-optimization/49890] IRA spill with plenty of available registers

2011-08-18 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49890

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2011-08-18 
16:12:45 UTC ---
IRA removes some classes from consideration on the 2nd pass to speed up cost
calculation, which is very time-consuming.  IRA did this too optimistically.
That is the reason for the problem.

I'll send a patch which removes classes in a more conservative way and fixes
the problem.


[Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way

2011-08-18 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #10 from Vladimir Makarov vmakarov at redhat dot com 2011-08-18 
18:24:42 UTC ---
(In reply to comment #9)
 With revision 177865 + MULX change, I got
 
 [hjl@gnu-6 pr50107]$ cat uti-2.i
 unsigned __int128 test_mul_64 (unsigned long long a, unsigned long long b)
 {
   return (unsigned __int128) a*b;
 }
 [hjl@gnu-6 pr50107]$ make uti-2.s
 /export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/xgcc
 -B/export/build/gnu/gcc-hsw/build-x86_64-linux/gcc/ -S -o uti-2.s -O2 -mbmi2 uti-2.i
 [hjl@gnu-6 pr50107]$ cat uti-2.s
 	.file	"uti-2.i"
 	.text
 	.p2align 4,,15
 	.globl	test_mul_64
 	.type	test_mul_64, @function
 test_mul_64:
 .LFB0:
 	.cfi_startproc
 	movq	%rdi, %rdx
 	mulx	%rsi, %rax, %rsi
 	movq	%rsi, %rdx
 	ret
 	.cfi_endproc
 .LFE0:
 	.size	test_mul_64, .-test_mul_64
 	.ident	"GCC: (GNU) 4.7.0 20110818 (experimental)"
 	.section	.note.GNU-stack,"",@progbits
 [hjl@gnu-6 pr50107]$ 
 
 I would expect
 
 	movq	%rdi, %rdx
 	mulx	%rsi, %rax, %rdx
 	ret

I think it is a reload problem.  IRA assigns dx to pseudo 71 (an insn output),
but reload then spills it.
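For reference, the two halves HJ's expected sequence has to produce can be computed portably. On x86-64 an unsigned __int128 return value is passed back in %rax (low 64 bits) and %rdx (high 64 bits), and mulx multiplies %rdx by its source operand, writing the low half to its second operand and the high half to its third in AT&T syntax, so `mulx %rsi, %rax, %rdx` already leaves both halves where the ABI wants them. The function below only illustrates the arithmetic; it is not what GCC emits, and the name `mul_64x64` is an assumption.

```c
#include <stdint.h>

/* Portable 64x64 -> 128-bit unsigned multiply, split into the two 64-bit
   halves that x86-64 returns in %rax (low) and %rdx (high).  */
static void mul_64x64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo; /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi; /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo; /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi; /* contributes to bits  64..127 */

    /* Middle column: three values below 2^32 each, so no overflow.  */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

For example, `mul_64x64(UINT64_MAX, 2, &lo, &hi)` gives lo = UINT64_MAX - 1 and hi = 1.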


[Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way

2011-08-17 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2011-08-17 
17:16:11 UTC ---
I guess something is wrong with hard register preferencing for multi-register
pseudos in ira-color.c::ira_assign.  I believe it works fine for one-register
pseudos.  I'll look at this.  Thanks for reporting.

By the way, your patch is wrong.  It should be TARGET_64BIT in the define_split
instead of !TARGET_64BIT.


[Bug rtl-optimization/50107] [IRA, i386] allocates registers in very non-optimal way

2011-08-17 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50107

--- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-08-17 
22:21:13 UTC ---
(In reply to comment #4)
 Created attachment 25038
 A patch
 
 This patch generates:
 
 	movq	%rdi, %rdx
 	mulx	%rsi, %r10, %r9
 	addq	$3, %r9
 	adcq	$0, %r10
 	movq	%r9, k2(%rip)
 	movq	%r9, %rax
 	movq	%r10, k2+8(%rip)
 	movq	%r10, %rdx
 	ret

I don't think it is a good patch (changing the register allocation order),
because it prefers the new x86-64 registers, which results in longer insns and
bigger code for many programs.

I am working on a patch to fix it in IRA.  I found a typo which is the reason
for this behaviour.  I think it will be ready tomorrow.


[Bug rtl-optimization/49936] [4.7 Regression] IRA handles CANNOT_CHANGE_MODE_CLASS poorly, + spills to memory on 4.7

2011-08-16 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-08-16 
17:27:12 UTC ---
(In reply to comment #3)
 Hmmm.  Is it possible to make the INT/memory/whatever decision based on move
 costs?  Or use a target hook to supply a hint about what to do?

I think I can restore the 4.6 behaviour by assigning GR_REGS for accum.  I'll
try to do a patch this week.  Such patches need a lot of testing, so I hope it
will be on the trunk next week.


[Bug rtl-optimization/49936] [4.7 Regression] IRA handles CANNOT_CHANGE_MODE_CLASS poorly, + spills to memory on 4.7

2011-08-15 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2011-08-16 
02:05:02 UTC ---
After a thorough investigation of the problem, I came to the conclusion that
fixing it in IRA requires forming regions based on pseudo mode usage too
(besides just register pressure).  Allocnos for the pseudo in question should
get different classes (an FP class inside the loop and INT outside).

The problem is that IRA was written on the assumption that all allocnos of a
pseudo have the same register class.  It needs a lot of changes besides new
code for forming regions based on mode.

I'll try to do this, but it will take a long time.

If it does not work, I could try to restore the 4.6 behaviour (assigning the
INT class instead of memory).


[Bug rtl-optimization/48633] [4.7 regression] IRA causes ICE in compensate_edge

2011-05-13 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48633

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-05-13 
15:26:57 UTC ---
Michael, thanks for the analysis and the smaller test.  It saved me a lot of
time.

I have made a patch to fix the bug and will submit it after testing.

I should say that the last big IRA patch did not create the bug; it just
triggered it.


[Bug rtl-optimization/48971] [4.7 regression] ICE with -msoft-float -O2

2011-05-13 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48971

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-05-13 
15:34:39 UTC ---
(In reply to comment #3)
 (In reply to comment #2)
  Vlad, this is an abort in setup_pressure_classes which apparently is totally
  broken for sparc -msoft-float.
 
 
 I found the wrong code.  It is pretty simple but I need to check a few
 platforms because the fix might affect other platform builds.
 
 I hope I'll send the patch at the end of the day.

The SPARC ICC register is present only in the ALL_REGS class, which cannot be a
pressure class.  That is the reason for the problem.  I also found a typo in the
check code (it collected the hard registers of all non-pressure classes,
although it should collect the hard registers of the pressure classes).  I found
more complications with the check code on the MIPS target, so it took more
time than I expected.  I am currently testing the patch and will submit it for
approval soon.


[Bug rtl-optimization/48971] [4.7 regression] ICE with -msoft-float -O2

2011-05-12 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48971

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-05-12 
17:57:58 UTC ---
(In reply to comment #2)
 Vlad, this is an abort in setup_pressure_classes which apparently is totally
 broken for sparc -msoft-float.


I found the wrong code.  It is pretty simple, but I need to check a few
platforms because the fix might affect builds on other platforms.

I hope to send the patch by the end of the day.


[Bug rtl-optimization/48455] [4.7 Regression] Huge code size regression for all ARM configurations

2011-04-13 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48455

--- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-04-13 
15:21:23 UTC ---
I found one problem with reg equivalences: they were simply ignored.  It is a
result of badly merging the big IRA patch with the changes made in IRA over
the last half year.

I found that solving this problem improves the code size (at least for -O2).
I'll send the patch today.

But I guess it does not fix all of the code size degradation, so I am
continuing my work on the PR.

Thanks for the small tests, Richard.  They saved me a lot of time.


[Bug rtl-optimization/48496] [4.7 Regression] 'asm' operand requires impossible reload in libffi/src/ia64/ffi.c

2011-04-11 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48496

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-04-11 
17:11:37 UTC ---
The new big IRA patch just triggered a latent reload bug.

The code in question is in function reload_as_needed

  /* If this was an ASM, make sure that all the reload insns
 we have generated are valid.  If not, give an error
 and delete them.  */
  if (asm_noperands (PATTERN (insn)) >= 0)
    for (p = NEXT_INSN (prev); p != next; p = NEXT_INSN (p))
      if (p != insn && INSN_P (p)
          && GET_CODE (PATTERN (p)) != USE
          && (recog_memoized (p) < 0
              || (extract_insn (p), ! constrain_operands (1))))
        {
          error_for_asm (insn,
                         "%<asm%> operand requires "
                         "impossible reload");
          delete_insn (p);
        }
}

A previous insn P has a spilled pseudo, and that results in the error because
spilled pseudos are replaced by memory only later.

I guess the above code is wrong if a previous insn has a spilled pseudo.

The bug did not occur before the big IRA patch because the pseudo in question
happened not to be spilled.  I should mention that it is more profitable to
spill the pseudo, and the new IRA makes the right decision (which results in
live range shrinkage and decreased register pressure).

I could make a patch (preventing the error if there are spilled pseudos in
insn P), but I think that reload maintainers would do it differently (e.g.
by moving the check after spilled pseudos are replaced by memory) or would
make a better patch.
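The workaround mentioned above (skip the error while insn P still mentions spilled pseudos) can be sketched in C as a small model.  This is an illustration under assumed names, not reload's actual code: `FIRST_PSEUDO`, the operand array, and both functions are hypothetical, and a negative `reg_renumber` entry stands for a spilled pseudo.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not reload's data structures: register numbers
   >= FIRST_PSEUDO are pseudos, and reg_renumber[regno] < 0 means the
   pseudo was spilled and will only later be replaced by memory.  */
#define FIRST_PSEUDO 64

/* Return true if any operand of the insn is a spilled pseudo.  */
static bool
insn_mentions_spilled_pseudo (const int *operands, int n_operands,
                              const int *reg_renumber)
{
  assert (n_operands >= 0);
  for (int i = 0; i < n_operands; i++)
    {
      int regno = operands[i];
      if (regno >= FIRST_PSEUDO && reg_renumber[regno] < 0)
        return true;
    }
  return false;
}

/* Report "impossible reload" only for an insn that failed to match and
   has no spilled pseudos; an insn with spilled pseudos is not final yet,
   so failing to match now is not conclusive.  */
static bool
should_report_asm_reload_error (bool insn_matches, const int *operands,
                                int n_operands, const int *reg_renumber)
{
  return !insn_matches
         && !insn_mentions_spilled_pseudo (operands, n_operands,
                                           reg_renumber);
}
```

The alternative the comment prefers, moving the check after spilled pseudos are replaced by memory, would make such a helper unnecessary.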


[Bug middle-end/48464] [4.7 Regression] @171649: ICE in setup_pressure_classes, at ira.c:877

2011-04-11 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48464

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-04-11 
18:44:59 UTC ---
There is a typo in a loop condition which results in taking the hard registers
of LIM_REG_CLASS, which happen to be garbage for VAX.

I'll send a patch soon.


[Bug rtl-optimization/48272] internal compiler error: in setup_insn_reg_pressure_info, at haifa-sched.c:1124

2011-04-07 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48272

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-04-07 
21:22:49 UTC ---
(In reply to comment #2)
 Confirmed (nice non-sensical set of options, btw.)
 
 The problem is that register pressure code is not prepared for new insns
 created during scheduling (for ia64, this is speculation checks and recovery
 code).  The ICE happens because we do not initialize register pressure
 structures.  The below patch seems to fix it, but I am not sure it is
 correct.
 
 The patch calls setup_insn_reg_pressure_info (renamed to
 init_insn_reg_pressure_info because there is the function with the same name
 in haifa-sched.c) from within haifa_init_insn, where new instructions created
 during scheduling are initialized.  The patch does not call
 setup_insn_reg_uses as sched_analyze_insn does, because there is no deps
 context at that point.  If some processing of this kind is desired, I guess
 we need to amend the functions that copy/init dependencies for recovery code
 (that is, create_check_block_twin and add_to_speculative_block).  Finally,
 better name for init_insn_reg_pressure_info should be devised.
 
 Vlad, it would be great if you can advise me on how to improve the patch.
 


It is good enough.  You can commit it of course with a proper changelog entry.

Thanks, Andrey.


[Bug rtl-optimization/48455] [4.7 Regression] Huge code size regression for all ARM configurations

2011-04-06 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48455

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-04-06 
16:58:02 UTC ---
That is a huge degradation. I am going to work on it.

Could you provide me with a small test?  I cannot even download CSiBE;
something is wrong with their web server.


[Bug inline-asm/48435] [4.7 Regression] Assertion failure during IRA (df_scan)

2011-04-06 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48435

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-04-06 
19:42:56 UTC ---
All pseudos got 0 available hard regs and were therefore spilled.  Something
is wrong with the calculation of the number of available hard regs for targets
that can use register pairs starting only on even/odd hard regs.
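To make the even/odd restriction concrete, here is a small sketch of counting the usable start registers for a value that needs several consecutive hard registers.  It is an illustration under assumptions (64-bit register masks; `count_available_starts` and its parameters are hypothetical), not the calculation IRA performs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit r of a mask stands for hard register r.
   Count the hard registers at which a value needing NREGS consecutive
   registers can start, when starts are allowed only on registers that
   are a multiple of ALIGN (ALIGN == 2 models even-register pairs).  */
static int
count_available_starts (uint64_t class_regs, uint64_t conflict_regs,
                        int nregs, int align)
{
  assert (nregs > 0 && nregs < 64 && align > 0);
  uint64_t usable = class_regs & ~conflict_regs;
  int count = 0;
  for (int start = 0; start + nregs <= 64; start += align)
    {
      /* Registers start .. start + nregs - 1 must all be usable.  */
      uint64_t need = ((UINT64_C (1) << nregs) - 1) << start;
      if ((usable & need) == need)
        count++;
    }
  return count;
}
```

With registers 0-7 free, a two-register value aligned to even registers has 4 possible starts, not the 7 an unaligned count would give.  Miscounting them as 0 makes every such pseudo look uncolorable, which matches the symptom described above.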

The fix will need changes in a very sensitive part of the IRA code, and it will
take some time to write, test, and benchmark.  I hope it will be done by the
end of the week.

Sorry for the inconvenience.


[Bug target/48366] [4.7 Regression] ICE in extract_constrain_insn_cached, at recog.c:2024

2011-04-03 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48366

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-04-03 
18:18:03 UTC ---
John, thanks for reporting the PR and working on it.

I guess that the last patch I sent (for PR48380) should solve this problem too.
Unfortunately, it has not been approved yet.

I'd recommend trying that patch first; it might save you a lot of time, because
the problem shows up in reload and reload is hard to analyze.  But the real
cause of the problem is the wrong directions IRA gives to reload.


[Bug target/48380] [gcc-4.7 regression] ICE in postreload.c while building trunk

2011-04-01 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48380

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-04-01 
20:47:39 UTC ---
We have the following situation:
  - a pseudo has an equivalent constant.
  - a loop allocno corresponding to the pseudo got a hard reg and
the subloop allocno got memory.
  - the load generated by IRA on the loop/subloop border is not removed.
  - the loop allocno is spilled in reload, transforming the load into a mem-mem
move.
  - reload skips processing the move because it sets a regno with an
equivalent constant.
  - gcc dies in postreload.

There are several possible solutions, but the best one would be to remove, in
reload, the load transformed into a mem-mem move.  We need to add the load to
the equiv init insns.

I'll submit a patch solving the problem soon.


[Bug rtl-optimization/48381] [4.7 Regression] internal compiler error: in check_allocation, at ira.c:2094

2011-03-31 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48381

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2011-03-31 
19:06:57 UTC ---
Jakub, thanks for reducing the test.  It really saved my time.

We have the following situation:

  Allocno a7r74 of CREG(1) has 1 avail. regs 2 (confl regs =  0 1 3-51 )
node:  2
  ...
  Allocno a10r80 of GENERAL_REGS(6) has 5 avail. regs obj 0 0-5 (confl regs
=  6-51 ) node:  0-5,  obj 1 0-5 (confl regs =  6-51 ) node:  0-5
  ...
  Popping a7(r74,l0)  -- assign reg 2
  ..
  Popping a10(r80,l0)  -- (memory is more profitable 7000 vs 2147483647)
spill
...
Spilling a7r74 for a10r80
Assigning 1 to a10r80
   a7(r74,l0)  -- assign hard reg 2

r74 got the C reg and r80 was spilled.  Then function improve_allocation
decided that spilling r74 and assigning a hard reg to r80 is more profitable.
At the end of its work, the function tried to assign another hard reg to r74
and assigned the C reg again, which is wrong because r80 needs two hard
registers, DX and CX.

The wrong assignment comes from wrong code in function assign_hard_reg that
finds conflicting hard regs.  The code deciding that conflicting allocnos
really conflict looks like

  hard_regno = ALLOCNO_HARD_REGNO (conflict_a);
  if (hard_regno >= 0
      && ira_class_hard_reg_index[aclass][hard_regno] >= 0)

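where aclass is the class of r74 (CREG) and hard_regno is the hard regno of
r80 (DX).  Since DX is not in CREG, the index is negative and the conflict is
ignored, although r80 also occupies CX.  This code was ok for cover classes.
It should be different when allocno classes can intersect.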
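The following sketch models why checking only the start register misses the conflict.  It assumes simplified i386-style numbering (AX=0, DX=1, CX=2, BX=3), chosen to match the dump, where r80 got register 1 and r74's CREG contains register 2.  The helper names are hypothetical; this is not the patch that fixed the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register numbering used only in this sketch.  */
enum { AX = 0, DX = 1, CX = 2, BX = 3 };

#define REG_BIT(r) (UINT64_C (1) << (r))

/* Registers occupied by an allocno that got HARD_REGNO and needs NREGS
   consecutive hard registers.  */
static uint64_t
occupied_regs (int hard_regno, int nregs)
{
  assert (hard_regno >= 0 && nregs > 0 && hard_regno + nregs <= 64);
  uint64_t mask = 0;
  for (int i = 0; i < nregs; i++)
    mask |= REG_BIT (hard_regno + i);
  return mask;
}

/* Cover-class style test: the conflict counts only if the FIRST hard
   register of the conflicting allocno belongs to ACLASS.  */
static uint64_t
conflicts_start_only (uint64_t aclass, int hard_regno, int nregs)
{
  if (hard_regno >= 0 && (aclass & REG_BIT (hard_regno)))
    return occupied_regs (hard_regno, nregs);
  return 0;
}

/* Test that works when classes intersect: every register the conflicting
   allocno occupies is checked against ACLASS.  */
static uint64_t
conflicts_all_regs (uint64_t aclass, int hard_regno, int nregs)
{
  if (hard_regno < 0)
    return 0;
  return occupied_regs (hard_regno, nregs) & aclass;
}
```

For r74 (class CREG, i.e. only CX) against r80 in DX with two registers, the start-only test reports no conflict, while the all-registers test reports CX as taken.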

I'll send a patch to solve the PR soon.


[Bug middle-end/48367] [4.7 Regression] 200.sixtrack/301.apsi in SPEC CPU 2000 are miscompiled

2011-03-30 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48367

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-03-30 
23:17:58 UTC ---
I've started to work on this.  It will probably take a day or two to fix, as
it is hard to find wrong code in a program as big as apsi.


[Bug middle-end/48367] [4.7 Regression] 200.sixtrack/301.apsi in SPEC CPU 2000 are miscompiled

2011-03-30 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48367

--- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2011-03-31 
01:05:33 UTC ---
The problem was a typo in ira-costs.c which in some cases resulted in
assigning INT_MAX to memory_cost and, as a consequence, ALL_REGS to some
allocnos.  After some optimizations, an allocno which got a hard reg,
corresponds to a loop containing subloops, and is never referenced in its own
loop was spilled in function move_spill_restore; because it is never
referenced in the loop, it got zero costs for all hard regs.

In reload, IRA assigned the allocno an MMX hard register, which was corrupted
by SSE register usage elsewhere in the program.

I'll send a patch soon to fix this.


[Bug rtl-optimization/48331] [4.7 Regression] gcc.c-torture/execute/built-in-setjmp.c FAILs with -O -fira-algorithm=priority -fPIC

2011-03-29 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48331

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2011-03-29 
15:07:46 UTC ---
(In reply to comment #2)
 It started with http://gcc.gnu.org/viewcvs?view=revision&revision=171649
 
 I don't know what's the status of this allocator (how near is its end), nor if
 there are any targets that have to use it as CB's allocator doesn't work for
 them.

Thanks for reporting.  The patch permits ports that had to use the priority
allocator to use the CB allocator instead.  The modified CB allocator is
expected to perform better for those ports than the priority one.

Eventually, priority coloring will be removed.  I'd recommend that maintainers
of the ports using priority coloring check CB coloring and plan to switch to
it by default.

The changes in IRA are big and complex and will probably cause some port
problems for some time, because RA is the most machine-dependent part of the
compiler.  Therefore the patch was committed to the trunk at the beginning of
stage1, to have more time to fix all the problems.

Meanwhile, I am going to work on this PR and try to fix it.


[Bug rtl-optimization/48345] [4.7 Regression] [SH] Invalid float register allocated

2011-03-29 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48345

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2011-03-29 
23:59:08 UTC ---
(In reply to comment #0)
 
 It seems that ira-color.c:assign_hard_reg chooses a register
 of which corresponding bit of ira_prohibited_class_mode_regs
 [FP_REGS][DFmode] is set.  The patch below looks to work for me,
 though I'm suspecting the real problem is in the target side.
 
 --- ORIG/trunk/gcc/ira-color.c  2011-03-29 10:08:17.0 +0900
 +++ LOCAL/trunk/gcc/ira-color.c 2011-03-29 15:09:06.0 +0900
 @@ -1692,6 +1692,9 @@ assign_hard_reg (ira_allocno_t a, bool r
            && FIRST_STACK_REG <= hard_regno && hard_regno <= LAST_STACK_REG)
          continue;
  #endif
 +      if (TEST_HARD_REG_BIT (ira_prohibited_class_mode_regs[aclass][mode],
 +                             hard_regno))
 +        continue;
        if (! check_hard_reg_p (a, hard_regno,
                                conflicting_regs, profitable_hard_regs))
          continue;

The patch is OK with me.  This code was accidentally lost on the ira-improv
branch.

Could you commit the patch (of course with a proper ChangeLog entry)?  I am
approving the patch.

Thanks.


[Bug rtl-optimization/48345] [4.7 Regression] [SH] Invalid float register allocated

2011-03-29 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48345

--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2011-03-30 
00:49:50 UTC ---

(In reply to comment #3)
 Thanks!  Is
 
 PR rtl-optimization/48345
 * ira-color.c (assign_hard_reg): Skip prohibited hard registers
 for given class and mode.
 
 OK for ChangeLog?

Sorry, Kazumoto.  Please do not commit the patch.  The problem is a bit
deeper than I thought.

The profitable hard regs should exclude the hard regs prohibited for a given
mode.  That holds for the major allocation.  The wrong register is assigned
during secondary allocation (after flattening the IRA IR or during reload),
where the profitable hard registers are not defined properly.  So the fix
should set up the profitable hard regs properly.
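The rule that profitable hard regs must exclude the registers prohibited for the mode can be sketched with bitmasks.  For illustration only, assume registers 0-7 form an FP class and the odd ones are prohibited for a double-width mode (as with even-register pairs); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmask model: bit r stands for hard register r.  */
static uint64_t
profitable_hard_regs_sketch (uint64_t class_regs,
                             uint64_t prohibited_for_mode,
                             uint64_t conflict_regs)
{
  uint64_t profitable = class_regs & ~prohibited_for_mode & ~conflict_regs;
  /* By construction no register prohibited for the mode survives.  */
  assert ((profitable & prohibited_for_mode) == 0);
  return profitable;
}

/* Pick the lowest-numbered register in PROFITABLE, or -1 if none.  */
static int
pick_hard_reg (uint64_t profitable)
{
  for (int r = 0; r < 64; r++)
    if (profitable & (UINT64_C (1) << r))
      return r;
  return -1;
}
```

If the prohibited mask is left out (as in the secondary allocation described above), the lowest free register may be an odd one that the mode cannot use.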

I'll create a patch soon.

Sorry again for jumping to a wrong conclusion.


[Bug middle-end/48342] [4.7 Regression] Failures on powerpc-apple-darwin9 at revision 171653

2011-03-29 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48342

--- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2011-03-30 
01:26:26 UTC ---
I've just submitted a patch for approval to solve the problem.  I hope it will
be fixed soon.

Thanks for the report.


[Bug rtl-optimization/46920] suboptimal register allocation with local register variables

2010-12-14 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46920

--- Comment #3 from Vladimir Makarov vmakarov at redhat dot com 2010-12-14 
16:02:09 UTC ---
(In reply to comment #2)
  To generate the proposed code, we should assign r12 to p63.  IRA marks p63
  conflicting with r12 because DF-infrastructure reports r12 having intersected
  live ranges with p63.
 
  It is possible to solve the problem if we have conflicts based on values (not
  live ranges).  I'd not recommend to do that, because it will slow down RA
  without visible improvement on majority benchmarks (I did such experiment about
  7 years ago and reported about the results on GCC summit in 2004).
 
 One alternative is to rematerialize values that have been copied to a
 hard register before their uses (by inserting an r12:DI=r63:DI before
 the use of r63).  This breaks the live ranges of the pseudos and
 facilitates coalescing.
 

I wouldn't call it rematerialization.  I think it is more like live range
shrinking (LRS) of the hard register through additional copies.  It is an
interesting idea (I partially investigated LRS about 6 years ago).  Perhaps I
should think about this again.  Thanks, Paolo.

  By the way, usage of implicit hard registers in RTL (when it can be avoided.
  Example when hard registers can be avoided is their usage as call arguments) is
  very bad idea for RA.  I see it a lot such code in x86-64 code.  I'd recommend
  to prevent optimizations before RA to abuse hard register usage.
 
 As I said, the improvement from hard register variable here is 25% on
 x86-64 and probably more (I can collect data) on i386.  This testcase
 is distilled from a bytecode interpreter.

Paolo, I did not mean that you should avoid using a hard register in this
particular case.  I just wrote that I saw a lot of x86-64 code where hard
registers were propagated, and that is bad for RA.  I never had an opportunity
to investigate which optimization does it.

Again, by the way :).  My experience implementing interpreters shows that
computed gotos do not work well with modern OOO processors (especially when
there are a lot of such labels) because of worse branch prediction.  I found
that a switch statement works better.  But I guess it is not your goal to
rewrite the interpreter.
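For readers unfamiliar with the two dispatch styles, a minimal switch-based dispatch loop looks like the sketch below.  The opcodes and `run_switch` are hypothetical, and the sketch makes no claim about branch-prediction behavior on any particular processor; a computed-goto version would instead jump through a table of label addresses (a GNU C extension) at the end of each handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy bytecode, used only to show switch-based dispatch.  */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Run CODE on a small operand stack and return the top of the stack at
   OP_HALT.  OP_PUSH is followed by its immediate operand.  */
static int
run_switch (const int *code)
{
  int stack[16];
  size_t sp = 0;
  for (size_t pc = 0;;)
    {
      /* Every opcode is dispatched through this single switch.  */
      switch (code[pc++])
        {
        case OP_PUSH:
          assert (sp < 16);
          stack[sp++] = code[pc++];
          break;
        case OP_ADD:
          assert (sp >= 2);
          sp--;
          stack[sp - 1] += stack[sp];
          break;
        case OP_MUL:
          assert (sp >= 2);
          sp--;
          stack[sp - 1] *= stack[sp];
          break;
        case OP_HALT:
          assert (sp >= 1);
          return stack[sp - 1];
        default:
          assert (0 && "unknown opcode");
          return 0;
        }
    }
}
```

For example, the program `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT` computes (2 + 3) * 4.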


[Bug rtl-optimization/46920] suboptimal register allocation with local register variables

2010-12-13 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46920

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #1 from Vladimir Makarov vmakarov at redhat dot com 2010-12-13 
21:03:48 UTC ---
Before IRA we have the following code

 L43:
   25 NOTE_INSN_BASIC_BLOCK
   26 pc=r59:DI
  REG_DEAD: r59:DI
i  27: barrier
L28:
   29 NOTE_INSN_BASIC_BLOCK
   30 r62:DI=r12:DI
  REG_DEAD: r12:DI
   31 {r63:DI=r62:DI+0x2;clobber flags:CC;}
  REG_UNUSED: flags:CC
   32 r12:DI=r63:DI
   33 flags:CCZ=cmp([r62:DI+0x2],0)
  REG_DEAD: r62:DI
   34 pc={(flags:CCZ==0)?L39:pc}
  REG_DEAD: flags:CCZ
  REG_BR_PROB: 0x1388
   35 NOTE_INSN_BASIC_BLOCK
   36 r76:DI=sign_extend(r65:SI)
  REG_DEAD: r65:SI
   37 {r63:DI=r63:DI+r76:DI;clobber flags:CC;}
  REG_DEAD: r76:DI
  REG_UNUSED: flags:CC
   38 r12:DI=r63:DI
L39:
   40 NOTE_INSN_BASIC_BLOCK
   41 r65:SI=sign_extend([r63:DI+0x1])
  REG_DEAD: r63:DI
   42 r59:DI=[r71:DI]
   61 pc=L43

To generate the proposed code, we should assign r12 to p63.  IRA marks p63 as
conflicting with r12 because the DF infrastructure reports that the live
ranges of r12 and p63 intersect.

It is possible to solve the problem if conflicts are based on values (not live
ranges).  I would not recommend doing that, because it would slow down RA
without visible improvement on the majority of benchmarks (I did such an
experiment about 7 years ago and reported the results at the GCC Summit in
2004).

By the way, using implicit hard registers in RTL when it can be avoided (call
arguments are an example where hard registers can be avoided) is a very bad
idea for RA.  I see a lot of such code on x86-64.  I'd recommend preventing
optimizations before RA from abusing hard register usage.


[Bug rtl-optimization/46829] ICE: in spill_failure, at reload1.c:2105 with -fschedule-insns -fsched-pressure and variadic function

2010-12-10 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829

--- Comment #10 from Vladimir Makarov vmakarov at redhat dot com 2010-12-10 
15:45:33 UTC ---
(In reply to comment #9)
 (In reply to comment #5)
  It should work for x86_64, not necessarily i?86.
 
 Do you mean -fsched-pressure should be able to solve the problem completely
 for x86-64?
 
 Vladimir: Do you have any idea which direction to go in order to solve this
 problem?

The introduction of -fsched-pressure only decreased the probability of the bug
when the 1st insn scheduling pass is used.  The patch introducing
-fsched-pressure also contained some code in reload to decrease the
probability even more.

Unfortunately, it did not eliminate the bug fully.  This bug cannot be fixed in
the scheduler (or the solution, such as not moving insns across insns
referring to a hard register, would be too conservative, especially for
x86_64, and would still not fix it for x86), because the scheduler cannot see
all the info handled by reload.

IMHO, the right fix would be the ability to split live ranges of explicitly
mentioned hard registers.  Maybe Jeff Law's current work will provide such a
feature.

In any case, implementing live range splitting in reload is too big and
complicated a job even for stage 1.  There is no way to implement it in stage
3.  It would also be very unreasonable, because any change in reload is
usually very bug-prone.

I am sorry, but I don't see how it can be fixed for gcc4.6 for x86/x86-64,
although it might be fixed for gcc4.7.


[Bug target/42536] [4.4/4.5/4.6 regression] ICE in spill_failure, at reload1.c:2141

2010-11-29 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42536

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #6 from Vladimir Makarov vmakarov at redhat dot com 2010-11-29 
20:39:39 UTC ---
(In reply to comment #4)
 Jeff/Vlad, how hard would it be to try to split the insn into two insns 
 instead
 of a spill failure (for insns using a MEM whose address uses more than one 
 hard
 register) - one which forces the address into register (assuming it is
 supported) and the store (or load) which would use a simpler address form?

If it is done in reload (and IMHO that is the most appropriate place to do it),
I think it would be hard.  It needs someone with a good knowledge of reload.

It is also possible to do some splitting in other parts of the compiler, but it
would be an approximate solution (meaning not all such cases would be avoided
and/or it would hurt performance in the general case).


[Bug rtl-optimization/44249] [4.4/4.5/4.6 Regression] IRA generates extra register move

2010-11-24 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44249

--- Comment #2 from Vladimir Makarov vmakarov at redhat dot com 2010-11-24 
17:40:56 UTC ---
Reload creates additional insn for insn

(insn 9 7 11 2 (parallel [
(set (reg:DI 71)
(lshiftrt:DI (reg/v:DI 60 [ tag ])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) b.i:5 533 {*lshrdi3_1}
 (expr_list:REG_DEAD (reg/v:DI 60 [ tag ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil

That is because r60 and r71 got different registers (0 and 1) even though
there is a copy between r71 and r60, which should result in r71 getting hard
register 0 like r60.  It does not happen because r68 already got 0 and it
conflicts with r71:

r71: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS
r68: preferred AREG, alternative GENERAL_REGS, cover GENERAL_REGS
r60: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS

;; a0(r68,l0) conflicts: a1(r71,l0)

;; a4(r67,l0) conflicts:  cp0:a1(r71)-a3(r60)@1000:constraint

  Popping a0(r68,l0)  -- assign reg 0
  Popping a3(r60,l0)  -- assign reg 0
  Popping a1(r71,l0)  -- assign reg 1
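The popping order in the dump can be replayed with a toy coloring step that honors a copy preference only when the preferred register is still free.  This is a simplified model with hypothetical names and masks (GENERAL_REGS as registers 0-5, AREG as register 0), not IRA's assign_hard_reg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single coloring step: take PREFERRED (e.g. the register of
   a copy partner) if it is allowed and not used by a conflicting allocno,
   otherwise the lowest free register in ALLOWED; -1 means spill.  */
static int
assign_with_preference (uint64_t allowed, uint64_t conflicting_regs,
                        int preferred)
{
  assert (preferred < 64);
  uint64_t free_regs = allowed & ~conflicting_regs;
  if (preferred >= 0 && (free_regs & (UINT64_C (1) << preferred)))
    return preferred;
  for (int r = 0; r < 64; r++)
    if (free_regs & (UINT64_C (1) << r))
      return r;
  return -1;
}
```

Replaying the dump: r68 (AREG) gets 0, r60 gets 0, and r71 prefers 0 through copy cp0 but conflicts with r68, so it falls back to 1, which leaves reload to add the extra move.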

Analogous insn for gcc-4.3 looks like

(insn:HI 9 7 11 2 b.i:4 (parallel [
(set (reg/v:DI 58 [ tag ])
(lshiftrt:DI (reg/v:DI 58 [ tag ])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) 514 {*lshrdi3_1_rex64} (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

It means there is no such problem as in gcc4.4+.

Insn 9 for gcc-4.3 is a result of the regmove transformation.  I have no
idea why regmove (which is still present in gcc4.4+) does not do the same
in gcc4.4+ (probably because of some changes since 4.3).

The problem could be fixed in regmove or in IRA (which is probably
harder).  But I don't know whether it is worth doing, because such
transformations result in longer live ranges of pseudos and might
result in worse code for other programs.


[Bug fortran/42169] [4.4/4.5/4.6 Regression] gfortran.dg/pr41928.f90:47: internal compiler error: in store_can_be_removed_p, at ira-emit.c:371

2010-10-19 Thread vmakarov at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42169

Vladimir Makarov vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #22 from Vladimir Makarov vmakarov at redhat dot com 2010-10-20 
03:03:53 UTC ---
Function store_can_be_removed_p was written under the assumption that the store
is on a loop exit.  Apparently that is not true.  In this case, it was actually
a loop entry from 4 to 5 in the loop tree:

0-1-2-3-4-5
|
 --6-7

A rare combination of conditions (one being that the pseudo is not changed in
the whole program) is needed to reach gcc_unreachable for the loop entry.
Therefore it is hard to reproduce.

There is a very simple solution: return false (preventing this optimization)
instead of calling gcc_unreachable in the loop entry case.

I'll send a patch soon.


[Bug middle-end/45312] [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel

2010-09-14 Thread vmakarov at redhat dot com


--- Comment #23 from vmakarov at redhat dot com  2010-09-14 15:46 ---
(In reply to comment #22)
 Fixed everywhere but on 4.3 branch.
 
 Maybe commit the patch there too?
 

I think the probability that this bug occurs in gcc4.3 is smaller, because it
is based on the old RA.  IRA uses hard registers more effectively and more
frequently than the old RA, so it stresses the reload pass more; as a result,
reload bugs occur more frequently with IRA.

But if the bug is present in gcc4.3, the patch should be applied there too.
Moreover, I guess that the patch is pretty safe and could be applied to gcc4.3
in any case.

If you want, you could apply it to the gcc4.3 branch.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45312



[Bug middle-end/40386] wrong code generation for several SPEC CPU2000 benchmarks (lucas, mgrid, face, applu, apsi) with -O1 -fno-ira-share-spill-slots

2010-09-08 Thread vmakarov at redhat dot com


--- Comment #9 from vmakarov at redhat dot com  2010-09-08 17:44 ---
The problem is that pseudos spilled by IRA (r121 in our case) are not added to
live_throughout of the reload chain.  As a result, pseudo_forbidden_regs are
not set up for such pseudos, and they can get a hard register (42 in our case)
even if they live through insns (insn 153 in our case) that use a reload
(reload 0 in our case) with this register, when another pseudo is spilled and
reload asks IRA to assign the corresponding hard register to other pseudos.
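The mechanism can be modeled with bitmasks: a pseudo is forbidden the reload registers of every insn it is recorded as living through, so a spilled pseudo missing from that record is not forbidden reg 42.  The structure and function names below are hypothetical, not the reload1.c data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one reload chain element.  */
struct chain_sketch
{
  uint64_t reload_regs;       /* bit r set: hard reg r used by a reload */
  const int *live_through;    /* pseudos recorded as live through the insn */
  int n_live_through;
};

/* Hard regs PSEUDO must not get: the reload regs of every insn it is
   recorded as living through.  A pseudo missing from live_through gets
   no restriction from that insn.  */
static uint64_t
forbidden_regs_for (int pseudo, const struct chain_sketch *chain,
                    int n_insns)
{
  assert (n_insns >= 0);
  uint64_t forbidden = 0;
  for (int i = 0; i < n_insns; i++)
    for (int j = 0; j < chain[i].n_live_through; j++)
      if (chain[i].live_through[j] == pseudo)
        {
          forbidden |= chain[i].reload_regs;
          break;
        }
  return forbidden;
}
```

Modeling insn 153 with reload regs 2 and 42: when r121 is recorded as live through it, reg 42 is forbidden for r121; when the record omits r121, nothing is forbidden and the reassignment to 42 seen in the dump becomes possible.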

Here are some parts of IRA dump:

Spilling for insn 153.
Using reg 2 for reload 1
Using reg 42 for reload 0
...
Spilling for insn 238.
Using reg 2 for reload 0
  Spill 117(a35), cost=5000
  Spilled regs 117
Try assign 121(a6), cost=5000: reassign to 42


The fix is pretty simple.  I'll send it soon.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40386



[Bug middle-end/44554] Stack space after sigsetjmp is reused

2010-09-08 Thread vmakarov at redhat dot com


--- Comment #9 from vmakarov at redhat dot com  2010-09-08 20:06 ---
(In reply to comment #8)
 (In reply to comment #7)
  Is this still a bug then?  Should ira-share-spill-slots be automatically
  disabled for the caller function when a callee function can return twice?
  
 I've never tested with gcc-4.5.x, but in 4.4.x the problem is still present. 
 
 Unfortunately -fno-ira-share-spill-slots seems to introduce another bug which
 leads to wrong computations (nearly at the same code position where I had the
 problems mentioned is this report). 
 
 At this moment I can not provide a detailed report for this problem, but
 perhaps it's the same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40386.
 

I've submitted a patch solving PR40386.  So now we can solve this problem by
preventing slot sharing when setjmp is used.

I'll send a patch soon.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44554



[Bug middle-end/45312] [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel

2010-09-07 Thread vmakarov at redhat dot com


--- Comment #17 from vmakarov at redhat dot com  2010-09-07 18:03 ---
(In reply to comment #16)
 
 
 I just noticed that even in the complete absence of reload inheritance, the
 allocate_reload_reg routine performs free_for_value_p checks, and therefore
 implicitly takes reload ordering into account.  This seems to imply that even
 if we'd do merge_assigned_reloads only if no inheritance has taken place, we'd
 still have a problem.
 
 Does anybody have any idea how much merge_assigned_reloads actually 
 contributes
 to performance on i386, in particular now that we have a bit more post-reload
 optimizers that potentially clear up duplicate code of the type generated by
 unmerged reloads?
 

I am thinking in the same direction.  merge_assigned_reloads dates from 1993
and has practically not changed since then.  I guess postreload can remove
unnecessary loads if they are generated without merge_assigned_reloads.

I've tried compiling SPEC2000 with gcc-4.4 with and without
merge_assigned_reloads.  I did not find any code difference.  I've tried a lot
of other programs with the same result.  The only code difference I found
is on this test case.

So I'd remove merge_assigned_reloads altogether, as it became obsolete long
ago.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45312



[Bug middle-end/45312] [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel

2010-09-03 Thread vmakarov at redhat dot com


--- Comment #15 from vmakarov at redhat dot com  2010-09-03 20:45 ---
(In reply to comment #14)
Ulrich, I was just about to post the following when I found that you had
already posted an analogous conclusion.  I should have been on CC to see your
comment right away.  The problem is really fundamental.  The code for
merge_assigned_reloads ignores inheritance (and dependencies between reloads
caused by inheritance) entirely.  Here is the post I wanted to add.

After thoroughly examining the code for inheritance in
reload1.c::choose_reload_regs, I cannot find where it could be wrong
for this test case.  After this function, we have the following
reloads:

Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0)
reload_in_reg: (reg/v/f:SI 132 [ kpte ])
Reload 1: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
DIREG, RELOAD_FOR_INPUT (opnum = 1)
reload_in_reg: (reg/v/f:SI 132 [ kpte ])
Reload 2: reload_in (SI) = (reg:SI 600 [ D.29693 ])
BREG, RELOAD_FOR_INPUT (opnum = 2)
reload_in_reg: (reg:SI 600 [ D.29693 ])
Reload 3: reload_in (SI) = (reg:SI 356)
CREG, RELOAD_FOR_INPUT (opnum = 3)
reload_in_reg: (reg:SI 356)

Function reload1.c::merge_assigned_reloads, called after
reload1.c::choose_reload_regs for targets with SMALL_REGISTER_CLASSES
(the i686 case), merges the 0th and 1st reloads (merging nullifies
reload_in in the 1st reload and changes the 0th to RELOAD_OTHER),
producing

Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
GENERAL_REGS, RELOAD_OTHER (opnum = 0)
reload_in_reg: (reg/v/f:SI 132 [ kpte ])
reload_reg_rtx: (reg:SI 5 di)
Reload 1: DIREG, RELOAD_FOR_INPUT (opnum = 1)
reload_in_reg: (reg/v/f:SI 132 [ kpte ])
reload_reg_rtx: (reg:SI 5 di)
Reload 2: reload_in (SI) = (reg:SI 2 cx [501])
BREG, RELOAD_FOR_INPUT (opnum = 2)
reload_in_reg: (reg:SI 600 [ D.29693 ])
reload_reg_rtx: (reg:SI 3 bx)
Reload 3: reload_in (SI) = (reg:SI 356)
CREG, RELOAD_FOR_INPUT (opnum = 3)
reload_in_reg: (reg:SI 356)
reload_reg_rtx: (reg:SI 2 cx)

So far everything is ok.  But after that, it changes the 3rd reload to
RELOAD_OTHER, which means that it will be emitted before the 2nd reload
instead of after it as before.  The change to RELOAD_OTHER is
made because the code assumes (based on function
reg_overlap_mentioned_for_reload_p) that changing the 3rd reload will
affect the 0th reload.  In this unfortunate case, pseudo 132 (from the 0th
reload) and pseudo 356 (from the 3rd reload) have equivalent memory, and
reg_overlap_mentioned_for_reload_p is simplified code which in this
case decides that changing the equivalent memory of p356 affects
the equivalent memory of p132.

Reload 0: reload_in (SI) = (reg/v/f:SI 132 [ kpte ])
GENERAL_REGS, RELOAD_OTHER (opnum = 0)
reload_in_reg: (reg/v/f:SI 132 [ kpte ])
reload_reg_rtx: (reg:SI 5 di)
Reload 1: DIREG, RELOAD_FOR_INPUT (opnum = 1)
reload_in_reg: (reg/v/f:SI 132 [ kpte ])
reload_reg_rtx: (reg:SI 5 di)
Reload 2: reload_in (SI) = (reg:SI 2 cx [501])
BREG, RELOAD_FOR_INPUT (opnum = 2)
reload_in_reg: (reg:SI 600 [ D.29693 ])
reload_reg_rtx: (reg:SI 3 bx)
Reload 3: reload_in (SI) = (reg:SI 356)
CREG, RELOAD_OTHER (opnum = 3)
reload_in_reg: (reg:SI 356)
reload_reg_rtx: (reg:SI 2 cx)
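Why the order matters can be shown with a two-register model of reloads 2 and 3: reload 2 reads cx (where the value of p600 is inherited) and reload 3 overwrites cx with p356.  The functions are hypothetical; they model only the data flow between the two reloads, not how reload emits insns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-register model of the reloads in the dump above.  */
enum { REG_CX, REG_BX, N_RELOAD_REGS };

/* Reload 2: copy the value of p600, inherited in cx, into bx.  */
static void
emit_reload2 (int *regs)
{
  regs[REG_BX] = regs[REG_CX];
}

/* Reload 3: load p356 (from its memory equivalent) into cx.  */
static void
emit_reload3 (int *regs, int p356_value)
{
  regs[REG_CX] = p356_value;
}

/* Emit both reloads in the chosen order and return the value left in bx,
   i.e. what operand 2 sees.  */
static int
bx_after_reloads (int p600_value, int p356_value, bool reload3_first)
{
  int regs[N_RELOAD_REGS] = { 0 };
  regs[REG_CX] = p600_value;   /* value inherited from an earlier insn */
  if (reload3_first)
    {
      emit_reload3 (regs, p356_value);
      emit_reload2 (regs);
    }
  else
    {
      emit_reload2 (regs);
      emit_reload3 (regs, p356_value);
    }
  /* Either way cx ends up holding p356; only bx differs.  */
  assert (regs[REG_CX] == p356_value);
  return regs[REG_BX];
}
```

Emitting reload 3 first (as RELOAD_OTHER does) leaves p356 instead of p600 in bx, which is the miscompilation described above.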

I don't see a good and simple fix for the general case here (just fixing
reg_overlap_mentioned_for_reload_p would be wrong and dangerous), given
that inheritance is used and there are dependencies between reloads 2
and 3 in this case.
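To make the false positive concrete, here is a toy model in C (not GCC's code; the struct, the slot offsets and both functions are hypothetical) contrasting a conservative overlap test, which treats any two memory-resident pseudos as overlapping without comparing their slots, with a precise one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a pseudo that did not get a hard register.
   slot is the byte offset of its equivalent stack slot, or -1 if it
   has no memory equivalent. */
struct toy_pseudo {
  int regno;
  long slot;
};

/* Conservative test in the spirit described above: any two pseudos that
   live in memory are assumed to overlap, because their slots are not
   compared. */
static bool overlap_conservative(struct toy_pseudo a, struct toy_pseudo b)
{
  assert(a.regno >= 0 && b.regno >= 0);
  if (a.regno == b.regno)
    return true;
  return a.slot >= 0 && b.slot >= 0;
}

/* Precise test: distinct pseudos overlap only if they share a slot
   (all slots are assumed to have the same size and alignment). */
static bool overlap_precise(struct toy_pseudo a, struct toy_pseudo b)
{
  assert(a.regno >= 0 && b.regno >= 0);
  if (a.regno == b.regno)
    return true;
  return a.slot >= 0 && a.slot == b.slot;
}
```

With p132 and p356 placed in different (made-up) slots, overlap_conservative reports a conflict while overlap_precise does not.  A false positive of this kind is what moved the 3rd reload to RELOAD_OTHER.  As the comment says, tightening the real function is not a safe fix.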


-- 

vmakarov at redhat dot com changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45312



[Bug middle-end/45312] [4.4 Regression] GCC 4.4.4 miscompiles the Linux kernel

2010-09-01 Thread vmakarov at redhat dot com


--- Comment #12 from vmakarov at redhat dot com  2010-09-01 18:06 ---
(In reply to comment #10)

 (insn 1407 1405 1406 78 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72
     (set (reg:SI 2 cx)
         (mem/c:SI (plus:SI (reg/f:SI 6 bp)
                 (const_int -28 [0xffe4])) [0 S4 A32])) 47 {*movsi_1} (nil))
 
 (insn 1406 1407 675 78 /mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h:72
     (set (reg:SI 3 bx)
         (reg:SI 2 cx [501])) 47 {*movsi_1} (nil))
 
 If insn 1406 came right before insn 1407, it would be still correct.
 

Yes, it would, but I think reload should still generate correct code for
this particular order of insns.  IMHO, fixing the order of insns is not the
right thing to do because there might be a cyclic situation (e.g. the value
of p600 is inherited from hard register 2 but must be reloaded into 3, while
p356 is inherited from 3 and must be reloaded into 2).
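The cycle in the parenthetical is the classic parallel-move problem.  The value in hard register 2 (cx) must end up in 3 (bx), and the value in 3 must end up in 2.  No ordering of two plain moves is correct, so a scratch register (or an exchange) is needed.  The small C illustration below models registers as array slots and tags values with the pseudo numbers from the comment.  The model is ours, not reload's.

```c
#include <assert.h>

enum { CX = 2, BX = 3, NREGS = 8 };

/* Perform "bx <- cx" and "cx <- bx" one after the other.  Whichever
   order is chosen, the second move reads a value the first one already
   overwrote. */
static void sequential_moves(int regs[NREGS], int bx_first)
{
  if (bx_first) {
    regs[BX] = regs[CX];   /* old bx is lost */
    regs[CX] = regs[BX];   /* reads the new bx */
  } else {
    regs[CX] = regs[BX];   /* old cx is lost */
    regs[BX] = regs[CX];   /* reads the new cx */
  }
}

/* Breaking the cycle through a scratch register gives the intended
   "both moves at once" semantics. */
static void moves_with_scratch(int regs[NREGS], int scratch)
{
  assert(scratch >= 0 && scratch < NREGS && scratch != CX && scratch != BX);
  regs[scratch] = regs[CX];
  regs[CX] = regs[BX];
  regs[BX] = regs[scratch];
}
```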

The problem is definitely in reload inheritance: reg_last_reload_reg is not
invalidated by insn #1407, which is generated by another reload of insn #675.
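The invariant being broken can be sketched with a toy inheritance cache.  The names are hypothetical.  Real reload keeps this information in reg_last_reload_reg and related arrays, under many more conditions.  The rule is that whenever any insn writes a hard register, every cached "pseudo P is still in hard register R" entry for that R must be dropped before a later reload inherits from it.

```c
#include <assert.h>
#include <stdbool.h>

#define NPSEUDOS 1024
#define NO_REG (-1)

/* Toy inheritance cache: last_reload_reg[p] is the hard register that
   still holds pseudo p's value, or NO_REG.  Hypothetical model. */
static int last_reload_reg[NPSEUDOS];

static void cache_reset(void)
{
  for (int p = 0; p < NPSEUDOS; p++)
    last_reload_reg[p] = NO_REG;
}

/* Must be called for every insn that writes hard register HARD,
   including insns emitted for other reloads of the same insn. */
static void note_hard_reg_set(int hard)
{
  for (int p = 0; p < NPSEUDOS; p++)
    if (last_reload_reg[p] == hard)
      last_reload_reg[p] = NO_REG;
}

/* Record that a reload loaded pseudo P into hard register HARD. */
static void note_reload(int p, int hard)
{
  assert(p >= 0 && p < NPSEUDOS);
  note_hard_reg_set(hard);   /* whatever HARD held before is gone */
  last_reload_reg[p] = hard;
}

/* A later reload may reuse HARD for P only if the cache still says so. */
static bool can_inherit(int p, int hard)
{
  assert(p >= 0 && p < NPSEUDOS);
  return last_reload_reg[p] == hard;
}
```

In the failing case, loading p356 into cx (insn #1407) effectively skipped the equivalent of note_hard_reg_set.  As a result, insn 1406 still treated cx as holding p501.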

Fixes for reload inheritance bugs tend to result either in significant code
degradation or in the possibility of introducing new bugs.  It could be OK to
fix such a problem on trunk, but fixing it on a release branch might be
dangerous.

Having looked through all reload patches since gcc 4.4, I don't think the
bug is fixed on trunk (or in gcc 4.5).  We are probably just lucky that it
has not shown up there.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45312



[Bug rtl-optimization/44174] [4.4/4.5/4.6 Regression] can't find a register in class 'CLOBBERED_REGS' while reloading 'asm'

2010-05-18 Thread vmakarov at redhat dot com


--- Comment #1 from vmakarov at redhat dot com  2010-05-18 19:06 ---
  It will be fixed by the IRA without cover classes which I am working on.  The
code is planned to be included in gcc 4.6.

  For older versions, it should be fixed in reload, because I believe it is a
hidden reload bug.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44174



[Bug rtl-optimization/43332] valgrind warns about using uninitialized variable with -fsched-pressure -fschedule-insns

2010-05-18 Thread vmakarov at redhat dot com


--- Comment #4 from vmakarov at redhat dot com  2010-05-18 21:40 ---
  Thanks for reporting the problem.  It has no effect on generated code,
whatever initial value is used.  The code in question tries to get the basic
block of a BARRIER, which is wrong, but whatever basic block it gets for the
BARRIER, the code still works correctly.

  In any case, it is really annoying to see such valgrind diagnostics, so I'll
send a patch to fix it soon.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43332



[Bug target/44031] [4.4/4.5/4.6 Regression] ice in subst_reloads, at reload.c:6327

2010-05-10 Thread vmakarov at redhat dot com


--- Comment #3 from vmakarov at redhat dot com  2010-05-10 15:22 ---

 It is caused by revision 152533:
 
 http://gcc.gnu.org/ml/gcc-cvs/2009-10/msg00182.html
 

If so, IMO the patch just exposed some reload bug.  The patch itself was
very safe because it only results in the creation of additional conflicts.

I hope that Jeff Law's work on the new reload will fix it once it finally
gets into 4.6.  Otherwise, we should work on this PR.
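Why extra conflicts are "safe" can be stated as a simple property.  An assignment that respects every edge of a larger conflict graph also respects every edge of any subgraph.  Added conflicts can therefore cost registers (more spills) but never correctness.  An illustrative C check over a small adjacency matrix (not IRA code):

```c
#include <assert.h>
#include <stdbool.h>

#define N 4

/* true if no two conflicting pseudos share a hard register;
   hard_reg[i] < 0 means pseudo i was spilled to memory. */
static bool assignment_ok(bool conflict[N][N], const int hard_reg[N])
{
  for (int i = 0; i < N; i++)
    for (int j = i + 1; j < N; j++) {
      assert(conflict[i][j] == conflict[j][i]);   /* conflicts are symmetric */
      if (conflict[i][j] && hard_reg[i] >= 0 && hard_reg[i] == hard_reg[j])
        return false;
    }
  return true;
}
```

Any assignment accepted for a graph with extra conflicts is also accepted for the original graph.  The converse does not hold, and that gap is the price of the additional conflicts.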


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44031



[Bug rtl-optimization/44012] [4.5/4.6 Regression] ICE: SIGSEGV in ira_merge_allocno_live_ranges

2010-05-07 Thread vmakarov at redhat dot com


--- Comment #12 from vmakarov at redhat dot com  2010-05-07 17:49 ---
  When an allocno is finished, some of its info is propagated into the upper
allocno.  When several allocnos with the same regno are finished, the info can
be propagated either directly to the surviving upper allocno or through an
allocno which will itself be finished.  Which case happens depends on the
region configuration and on the order of allocnos with the same regno in the
corresponding list.  The SIGSEGV occurs in the second case, when we remove an
allocno and propagate its info through an already removed allocno.  It happens
because regno_allocno_map, which is used to find the allocno that should
receive the info, is not nullified after an allocno is removed.

H.J.'s patch has the right idea, but the patch is complicated.  I'll send a
simpler patch soon.
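The idea of the fix can be sketched on a simplified region tree.  All names here are hypothetical, and IRA's real data structures are much richer.  When an allocno is removed, its entry in the region's regno-to-allocno map is cleared.  Info propagation then walks up to the nearest surviving ancestor instead of going through a removed allocno.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGNO 16

/* Hypothetical model of an allocno: one pseudo (regno) in one region,
   with some accumulated info to be propagated upward. */
struct toy_allocno {
  int regno;
  int info;
};

/* Hypothetical region (loop tree node) with a parent link and a
   regno -> allocno map for that region. */
struct toy_region {
  struct toy_region *parent;
  struct toy_allocno *map[MAX_REGNO];
};

/* Nearest enclosing region's allocno for REGNO.  This is only safe if
   removed allocnos have been cleared from every map. */
static struct toy_allocno *upper_allocno(struct toy_region *r, int regno)
{
  for (r = r->parent; r != NULL; r = r->parent)
    if (r->map[regno] != NULL)
      return r->map[regno];
  return NULL;
}

/* Remove allocno A from region R: propagate its info to the nearest
   surviving upper allocno, then clear the map entry so later removals
   never propagate through A. */
static void remove_allocno(struct toy_region *r, struct toy_allocno *a)
{
  struct toy_allocno *up;

  assert(a->regno >= 0 && a->regno < MAX_REGNO && r->map[a->regno] == a);
  up = upper_allocno(r, a->regno);
  if (up != NULL)
    up->info += a->info;
  r->map[a->regno] = NULL;   /* the step whose absence left a dangling entry */
}
```

Removing the middle allocno first and then the innermost one sends both contributions to the top allocno.  Without the clearing step, the second removal would propagate into the already removed middle allocno.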



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44012



[Bug rtl-optimization/43413] Powerpc generates worse code for -mvsx on gromacs even though there are no VSX instructions used

2010-03-23 Thread vmakarov at redhat dot com


--- Comment #10 from vmakarov at redhat dot com  2010-03-23 18:45 ---
(In reply to comment #5)
 
 Still I'll investigate a bit more why there are a lot of unexpected spills
 during assignment with -mvsx for the current code.
 

The problem is that part of VSX_REGS (the Altivec registers) cannot hold
SFmode values, and the coloring algorithm does not take this into account.
The problem can be solved by checking this when calculating the available
registers.  The patch I will send soon decreases the number of stfs(x)/lfs(x)
insns from 332 to 246.
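The idea can be illustrated with hard register sets as 64-bit masks.  The registers available to an allocno should be counted over those registers of its class that can actually hold its mode, not over the whole class.  The layout below (32 FPRs plus 32 Altivec registers, with only the FPRs able to hold SFmode) is a simplified assumption for illustration.  In GCC, the per-mode answer comes from the target's HARD_REGNO_MODE_OK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-register file: bits 0-31 model FPRs, bits 32-63 model
   Altivec registers; together they form a VSX-like class. */
#define FPR_MASK     UINT64_C(0x00000000FFFFFFFF)
#define ALTIVEC_MASK UINT64_C(0xFFFFFFFF00000000)
#define VSX_MASK     (FPR_MASK | ALTIVEC_MASK)

static int popcount64(uint64_t x)
{
  int n = 0;
  while (x != 0) {
    x &= x - 1;   /* clear the lowest set bit */
    n++;
  }
  return n;
}

/* Number of registers in CLASS_MASK that can hold the mode whose valid
   registers are given by MODE_OK. */
static int available_regs(uint64_t class_mask, uint64_t mode_ok)
{
  assert(class_mask != 0);
  return popcount64(class_mask & mode_ok);
}
```

Counting 64 registers for an SFmode allocno when only 32 can hold it makes coloring too optimistic.  That over-optimism then surfaces as unexpected spills during assignment.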


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43413



[Bug rtl-optimization/43413] Powerpc generates worse code for -mvsx on gromacs even though there are no VSX instructions used

2010-03-22 Thread vmakarov at redhat dot com


--- Comment #5 from vmakarov at redhat dot com  2010-03-22 22:16 ---
(In reply to comment #0)
 
 In the enclosed test case, it generates the following spills for the options:
 -O3 -ffast-math -mcpu=power7 -mvsx -maltivec: 117 stfs, 139 lfs
 -O3 -ffast-math -mcpu=power5 -maltivec: 80 stfs, 100 lfs
 -O3 -ffast-math -mcpu=power5: 80 stfs, 100 lfs

Hi, Mike.  I think the comparison should be done with the same -mcpu, because
the first insn scheduling pass increases register pressure differently for
different architectures.  But that is not so important.  I see a lot of spills
during assignment because memory is more profitable.  Graph coloring pushes
these pseudos onto the stack expecting them to get registers (and that does
not happen during assignment).

On one of my branches, I got:
-O3 -ffast-math -mcpu=power7 -mno-vsx -maltivec: 248 stfs and lfs
-O3 -ffast-math -mcpu=power7 -mvsx -maltivec: 331 stfs and lfs
-O3 -ffast-math -mcpu=power7 -mvsx -maltivec -fsched-pressure: 310
-O3 -ffast-math -mcpu=power7 -mvsx -maltivec -fsched-pressure -fexper: 179

where -fexper switches on the new graph coloring code without cover classes
which I am working on.

So I think that this new code and register-pressure-sensitive insn scheduling
will help.

Still, I'll investigate a bit more why there are so many unexpected spills
during assignment with -mvsx for the current code.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43413



[Bug rtl-optimization/43413] Powerpc generates worse code for -mvsx on gromacs even though there are no VSX instructions used

2010-03-22 Thread vmakarov at redhat dot com


--- Comment #6 from vmakarov at redhat dot com  2010-03-22 22:20 ---
(In reply to comment #4)
 FWIW, I seem to get considerably worse code from mainline than you -- for -O3
 -ffast-math -mcpu=power7 -mvsx -maltivec I get 140 stfs and 192 lfs insns
 (compared to 117 and 139 respectively that you reported).
 

I suspect the difference is because Mike counted only stfs/lfs while you
counted stfs(x)/lfs(x).  But maybe I am wrong.

 Just for fun, I ran the same code through a ppc compiler with the LRS code
 from reload-v2 and get 133:178 stfs/lfs insns, so that code clearly is helping,
 but it's not enough to offset the badness shown by IRA.
 
 
 I couldn't reconcile how -fno-ira-share-spill-slots would be changing the
 number of load/store insns, so I poked at that a bit.

I cannot understand that either.

 -fno-ira-share-spill-slots twiddles whether or not a pseudo which gets assigned
 a hard reg is put into live_throughout or dead_or_set_p in the reload chain
 structures, which in turn changes what pseudos get reassigned hard regs during
 reload.  This is a somewhat odd effect and should be investigated further.
 
 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43413



[Bug middle-end/42973] [4.4/4.5 regression] IRA apparently systematically making reload too busy on 2 address instructions with 3 operands

2010-02-10 Thread vmakarov at redhat dot com


--- Comment #10 from vmakarov at redhat dot com  2010-02-10 17:02 ---
  The big chunk of regmove which did the same thing IRA is capable of doing
was removed when IRA was merged.

  There are still a lot of important transformations (like dealing with
increments, sign/zero extensions, etc.) which IRA cannot do.

  As I remember, I benchmarked IRA with and without regmove on x86/x86_64
some time ago, and I got a clear impression that regmove is still important.

  It would be nice to find out which regmove transformations are important and
to rewrite them or move some of that functionality to IRA, but unfortunately
I have no time for this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973



[Bug middle-end/42973] [4.4/4.5 regression] IRA apparently systematically making reload too busy on 2 address instructions with 3 operands

2010-02-10 Thread vmakarov at redhat dot com


--- Comment #11 from vmakarov at redhat dot com  2010-02-10 17:15 ---

(In reply to comment #8)
 
 Thanks, we should see if this solves the AMMP problem in a day or two.
 Are you going to look at the related PR42961?  Without the regmove hunk
 it does not happen at AMMP but it likely happens elsewhere.  I did some
 work on this years back on old RA so I can play with it too.
 (Simple fix would be to add ? penalizers to integer variant of FP moves,
 but I would like to see some solution where RA actually can use integers
 for mem-mem copies)

  I am working on IRA without cover classes.  With it, for example, IRA could
assign an integer or a floating point register for mem-mem copies, whichever
is possible and more profitable.  This code is big and not ready yet.  There
are a lot of performance issues (besides IRA speed issues, which are a
consequence of dealing with more classes), and I am trying to solve them.  But
if the code works out, it will probably solve the problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973



[Bug middle-end/42973] [4.4/4.5 regression] IRA apparently systematically making reload too busy on 2 address instructions with 3 operands

2010-02-09 Thread vmakarov at redhat dot com


--- Comment #6 from vmakarov at redhat dot com  2010-02-09 19:56 ---
  The patch which I'll send in a few minutes solves the problem.  The patch
avoids the creation of shuffle copies if an involved operand should be bound to
some other operand in the current insn.  The test code generated with the patch
looks like this:

.L2:
movapd  %xmm0, %xmm8
subsd   %xmm3, %xmm8
movsd   a(%rax), %xmm6
mulsd   %xmm8, %xmm8
movsd   b(%rax), %xmm7
subsd   %xmm8, %xmm7
movsd   %xmm7, b(%rax)
leaq    8(%rax), %r10
movapd  %xmm0, %xmm5
subsd   %xmm6, %xmm5
movsd   a(%r10), %xmm3
mulsd   %xmm5, %xmm5
movsd   b(%r10), %xmm4
subsd   %xmm5, %xmm4
movsd   %xmm4, b(%r10)
leaq    16(%rax), %r9
movapd  %xmm0, %xmm1
subsd   %xmm3, %xmm1
movsd   a(%r9), %xmm15
mulsd   %xmm1, %xmm1
movsd   b(%r9), %xmm2
subsd   %xmm1, %xmm2
movsd   %xmm2, b(%r9)
leaq    24(%rax), %r8
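The rule the patch introduces can be sketched as a check over an insn's operands.  This is a loose, hypothetical model: the field names are ours, and real IRA derives "bound" from matching constraints and "dying" from REG_DEAD notes in RTL.  A shuffle copy between a dying input and an output is only a hint that both should get the same hard register.  Consider a two-address subsd, whose output is tied to its first input.  Because that output is already bound, no shuffle copy toward it is created for the dying second input.

```c
#include <assert.h>
#include <stdbool.h>

enum op_type { OP_IN, OP_OUT };

/* Hypothetical per-insn operand summary. */
struct toy_operand {
  enum op_type type;
  bool is_reg;
  bool bound;   /* output tied to another operand by a matching constraint */
};

/* Count the shuffle copies that would be created for dying input
   operand OP_NUM: one per register output operand that is not bound. */
static int count_shuffle_copies(const struct toy_operand *ops, int n_ops,
                                int op_num)
{
  int copies = 0;

  assert(op_num >= 0 && op_num < n_ops && ops[op_num].type == OP_IN);
  for (int i = 0; i < n_ops; i++) {
    if (i == op_num || !ops[i].is_reg || ops[i].type != OP_OUT)
      continue;
    if (ops[i].bound)   /* the check described in the comment above */
      continue;
    copies++;
  }
  return copies;
}
```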


SPEC2000 benchmarking on x86/x86_64 (Core i7) shows that the patch
results in slightly better code.

  x86: The code is different on gzip, vpr, gcc, crafty, perlbmk, gap,
vortex, bzip2, twolf and mesa.  The code with the patch is never bigger
(on average about 0.02% smaller).  The rate is a bit better with the
patch but practically the same (the biggest improvements are on crafty
and perlbmk, about 1%).

  x86_64: The code is different on gzip, vpr, gcc, crafty, parser, perlbmk,
gap, vortex, bzip2, twolf, mesa, art and ammp.  The patch results in
about 0.01% smaller code on average.  The rate is a bit better with the
patch but practically the same (the biggest improvements are on vortex,
1.3%, and on crafty and bzip2, 0.7%).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973



[Bug middle-end/42973] [4.5 regression] IRA apparently systematically making reload too busy on 2 address instructions with 3 operands

2010-02-05 Thread vmakarov at redhat dot com


--- Comment #4 from vmakarov at redhat dot com  2010-02-06 00:57 ---
I have a patch which solves the problem and an analogous problem that Jeff
recently sent me.

I just need time to do some benchmarking.  If everything is all right, I'll
probably submit the patch on Monday.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42973



[Bug rtl-optimization/42941] -fsched-pressure -fschedule-insns - valgrind warns about using uninitialized variable

2010-02-03 Thread vmakarov at redhat dot com


--- Comment #3 from vmakarov at redhat dot com  2010-02-03 18:57 ---
  This is a rare case where the algorithm works the same whatever values are
in memory.  Roughly speaking, if a value is not as expected (for example, it
is garbage), it is set to what is needed.  If it is as expected, we do nothing
and get the same result.  Valgrind warns because the data is not
initialized.

  I'll submit a patch soon to initialize the values.  The compiler will work
exactly the same (maybe a bit slower because of the initialization), but
there will be no valgrind warnings, which will simplify debugging the
compiler with valgrind.
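The pattern described here is a conditional store whose final result does not depend on the previous contents.  It can be sketched in C (the function names are hypothetical).  If the buffer were never written, Memcheck would report the comparison even though the outcome is identical.  Initializing the buffer up front, which is the spirit of the planned patch, removes the report at the cost of an extra clearing pass.  Here calloc merely stands in for the initialization of GCC's own data structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Make every element of V equal to WANTED.  The comparison reads the old
   contents only to decide whether a store is needed, so the final values
   are the same whatever was there before.  If V was never initialized,
   valgrind still reports the conditional jump on uninitialized data. */
static void set_all(int *v, size_t n, int wanted)
{
  for (size_t i = 0; i < n; i++)
    if (v[i] != wanted)
      v[i] = wanted;
}

/* The fix in spirit: allocate the buffer zero-initialized (calloc), so
   no uninitialized value is ever read. */
static int *alloc_initialized(size_t n)
{
  int *v = calloc(n, sizeof *v);
  assert(v != NULL);
  return v;
}
```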


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42941



[Bug target/36539] Poor register allocation from IRA

2010-01-29 Thread vmakarov at redhat dot com


--- Comment #10 from vmakarov at redhat dot com  2010-01-29 20:33 ---
Jeff, I saw an analogous problem when I was working on improving IRA
performance.  I tried the approach you are proposing, but it works
considerably worse on SPEC2000.  In the end, I found that modifying conflict
costs works best when we change them for only one hard register, namely when
the pseudo's best cost is achieved on a single hard register, e.g. when the
best cost is achieved on a register class containing one hard register, or
when assigning a particular hard register removes a copy.
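The rule above can be sketched as a small decision function.  The names and inputs are hypothetical, and IRA's real cost code is considerably more involved.  Conflict costs are adjusted for a hard register only when the pseudo's best cost is tied to exactly one hard register.

```c
#include <assert.h>

/* Hypothetical summary of where a pseudo's best cost comes from. */
struct toy_pref {
  int best_class_size;          /* hard regs in the class giving the best cost */
  int copy_removing_hard_reg;   /* hard reg whose use removes a copy, or -1 */
};

/* Return the single hard register whose cost should be raised for the
   conflicting pseudos, or -1 to leave their costs alone.  ONLY_REG is
   the class's register when the class has exactly one. */
static int hard_reg_to_penalize(const struct toy_pref *p, int only_reg)
{
  assert(p->best_class_size > 0);
  if (p->best_class_size == 1)
    return only_reg;
  if (p->copy_removing_hard_reg >= 0)
    return p->copy_removing_hard_reg;
  return -1;   /* e.g. a multi-register class like Q_REGS: leave costs alone */
}
```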

Why doesn't the technique you are proposing work well on average for classes
containing more than one register (like Q_REGS in this case)?  This is just my
speculation.  If the number of conflicting pseudos is less than the size of
Q_REGS, we should not modify the pseudo's conflict costs for Q_REGS, because
Q_REGS can be more profitable for the conflicting pseudos and we will still
assign a Q_REGS register to the pseudo.  Even if the number of conflicting
pseudos is greater than or equal to the size of Q_REGS, they might still be
assigned hard registers covering only part of Q_REGS.  It is hard to predict.

I am not saying that we should not work on this problem.  I think we should
try more sophisticated heuristics, although I don't know which one (it could
be modifying conflict costs only when register pressure for Q_REGS is high
during the pseudo's live range, but such a heuristic will take some time to
implement and I am still not sure that it will work better on average).

Unfortunately, there will always be cases where RA could work better, because
RA algorithms are heuristic.  What we should focus on is improving
performance on credible benchmarks like SPEC2000/SPEC2006.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug target/41399] [4.5 Regression] Scheduler gives huge dependence graph compiling fortran/intrinsic.c on ARM

2010-01-29 Thread vmakarov at redhat dot com


--- Comment #21 from vmakarov at redhat dot com  2010-01-29 21:54 ---
Thanks to everyone who worked on the bug.

  I am sorry that the bug was really introduced by my patch, more precisely by
the part which should fix reload crashes when the first scheduling pass is
enabled for some targets.  The patch creates a huge number of dependencies on
the stack register (r13), which could be used for reloads according to
*arm_movsi_insn.  But pseudos cannot be assigned the stack register because
the register is fixed, so there is no need to add dependencies on it to fix
the reload crashes.

The following small fix solves the PR.

Index: ../../gcc/gcc/sched-deps.c
===
--- ../../gcc/gcc/sched-deps.c  (revision 155624)
+++ ../../gcc/gcc/sched-deps.c  (working copy)
@@ -2623,6 +2623,7 @@ sched_analyze_insn (struct deps *deps, r
   extract_insn (insn);
   preprocess_constraints ();
   ira_implicitly_set_insn_hard_regs (temp);
+  AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs);
   IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp);
 }

I'll submit the patch on Monday after some testing.
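The effect of the added AND_COMPL_HARD_REG_SET line can be illustrated with a single 64-bit word standing in for HARD_REG_SET.  This is a simplification, since the real type may span several words.  Implicitly set registers that are in ira_no_alloc_regs, such as the fixed stack pointer r13 on ARM, are dropped before being added to the implicit clobbers, so no dependencies are created on them.  The register numbers in the usage below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One-word stand-in for HARD_REG_SET: bit N set means hard register N. */
typedef uint64_t toy_hard_reg_set;

static toy_hard_reg_set reg_bit(int regno)
{
  assert(regno >= 0 && regno < 64);
  return (toy_hard_reg_set) 1 << regno;
}

/* One-word equivalent of the patched sequence:
     AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs);
     IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp);
   Returns the updated pending-clobbers set. */
static toy_hard_reg_set add_implicit_clobbers(toy_hard_reg_set pending,
                                              toy_hard_reg_set temp,
                                              toy_hard_reg_set no_alloc)
{
  temp &= ~no_alloc;   /* drop fixed / non-allocatable registers */
  return pending | temp;
}
```

With r13 in the no-alloc set, only the allocatable registers from the implicit set reach the pending clobbers.  Before the fix, r13 would have been included as well.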


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41399



[Bug rtl-optimization/41171] register allocator undoing optimal schedule

2009-10-30 Thread vmakarov at redhat dot com


--- Comment #8 from vmakarov at redhat dot com  2009-10-30 21:57 ---
Unfortunately, not yet, because I got some failures after applying the patch.
I postponed the work, but now I have time to continue it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41171


