Re: [llvm-commits] [llvm] r47046 - in /llvm/trunk: include/llvm/CodeGen/LiveIntervalAnalysis.h lib/CodeGen/SimpleRegisterCoalescing.cpp lib/CodeGen/SimpleRegisterCoalescing.h

2008-02-13 Thread Evan Cheng

On Feb 12, 2008, at 9:56 PM, Chris Lattner wrote:

 On Feb 12, 2008, at 7:01 PM, Evan Cheng wrote:

 Author: evancheng
 Date: Tue Feb 12 21:01:43 2008
 New Revision: 47046

 URL: http://llvm.org/viewvc/llvm-project?rev=47046view=rev
 Log:
 Initial support for copy elimination by commuting its definition MI.

 Yay, thanks for tackling this Evan!


 This speeds up FreeBench/neural by 29%, Olden/bh by 12%, oopack_v1p8
 by 53%.

 Very nice, does it also help shootout/fib?

bh speedup doesn't seem real. It doesn't help fib.


 +  // Get the location that B is defined at.  Two options: either
 this value has
 +  // an unknown definition point or it is defined at CopyIdx.  If
 unknown, we
 +  // can't process it.
 +  if (!BValNo-reg) return false;
 +  assert(BValNo-def == CopyIdx  Copy doesn't define the  
 value?);
 +
 +  // AValNo is the value number in A that defines the copy, A3 in
 the example.
 +  LiveInterval::iterator ALR =
 IntA.FindLiveRangeContaining(CopyIdx-1);
 +  VNInfo *AValNo = ALR-valno;

 This idiom happens a lot in the preexisting code.  Please use:

 VNInfo *AValNo = IntA.FindLiveRangeContaining(CopyIdx-1)-valno;

ALR is also used below.



 Or better yet, add a new method that returns valno directly.



 +  int Idx = -1;

 Idx is too generic of a name for a variable with this long of
 lifetime, please name it something more descriptive.  What type of idx
 is it?


 +  for (unsigned i = 0, e = DefMI-getNumOperands(); i != e; ++i) {

 This loop needs a comment. What are you trying to find with this
 loop?  Maybe it should be moved out to its own function which just
 returns idx?  This would force you to name it, which would describe
 what it does :)

There is a reason I said initial in the commit message. This is not  
done. I need a target hook to tell me what is the new destination  
register after commuting. Right now this is basically assuming x86  
commutable instructions: A = A + B


 Another random question unrelated to this code: now that we have
 efficient RAUW, can we eliminate the 'rep'  mapping stuff and just
 RAUW vregs as they are coallesced?

Well, actually it's related. I think it's required for correctness.  
See below.


 +  MachineBasicBlock *MBB = DefMI-getParent();
 +  MachineInstr *NewMI = tii_-commuteInstruction(DefMI);
 +  if (NewMI != DefMI) {
 +li_-ReplaceMachineInstrInMaps(DefMI, NewMI);
 +MBB-insert(DefMI, NewMI);
 +MBB-erase(DefMI);
 +  }
 +  unsigned OpIdx = NewMI-findRegisterUseOperandIdx(IntA.reg);
 +  NewMI-getOperand(OpIdx).setIsKill();
 +
 +  // Update uses of IntA of the specific Val# with IntB.
 +  bool BHasPHIKill = BValNo-hasPHIKill;
 +  SmallVectorVNInfo*, 4 BDeadValNos;
 +  SmallVectorunsigned, 4 BKills;
 +  std::mapunsigned, unsigned BExtend;
 +  for (MachineRegisterInfo::use_iterator UI = mri_-
 use_begin(IntA.reg),
 + UE = mri_-use_end(); UI != UE;) {

 Yay for use_iterators :)

Except I don't think this is quite right. Iterating through uses of  
IntA.reg means we end up not updating other uses that have already  
been coalesced to it.




 +MachineOperand UseMO = UI.getOperand();
 +++UI;
 +MachineInstr *UseMI = UseMO.getParent();

 It would be more clear to use (before the increment):
 MachineInstr *UseMI = *UI;

 Note that if an instruction uses a register multiple times,
 incrementing the iterator could point into another operand of the same
 instruction.  This is rare, but the code needs to handle it
 correctly.  I'm concerned about cases like:

   r1 = or r2, r2

 (which is the copy operation on some targets).

 Where the use iterator could point to the first r2, and incrementing
 it could point to the next r2.  This could be avoided by not zapping
 instructions in this loop: move the copy zapping to a walk over a
 smallset or something.

Nothing is being zapped in the loop. Copies that would be zapped are  
put into a set JoinedCopies. In fact, all uses have to be updated to  
the new register.




 +unsigned UseIdx = li_-getInstructionIndex(UseMI);
 +LiveInterval::iterator ULR =
 IntA.FindLiveRangeContaining(UseIdx);
 +if (ULR-valno != AValNo)
 +  continue;
 +UseMO.setReg(NewReg);
 +if (UseMO.isKill())
 +  BKills.push_back(li_-getUseIndex(UseIdx)+1);
 +if (UseMI != CopyMI) {

 if (UseMI == CopyMI) continue;


 +  unsigned SrcReg, DstReg;
 +  if (!tii_-isMoveInstr(*UseMI, SrcReg, DstReg))
 +continue;


 +  unsigned repDstReg = rep(DstReg);
 +  if (repDstReg != IntB.reg) {
 +// Update dst register interval val# since its source
 register has
 +// changed.
 +LiveInterval DLI = li_-getInterval(repDstReg);
 +LiveInterval::iterator DLR =
 +  DLI.FindLiveRangeContaining(li_-getDefIndex(UseIdx));
 +DLR-valno-reg = NewReg;
 +ChangedCopies.insert(UseMI);
 +  } else {
 +// This copy will become a noop. If it's defining a new  
 val#,
 +// remove that val# as well. However this live range is  

Re: [llvm-commits] [llvm] r47046 - in /llvm/trunk: include/llvm/CodeGen/LiveIntervalAnalysis.h lib/CodeGen/SimpleRegisterCoalescing.cpp lib/CodeGen/SimpleRegisterCoalescing.h

2008-02-13 Thread Chris Lattner
On Feb 13, 2008, at 1:47 AM, Evan Cheng wrote:
 Very nice, does it also help shootout/fib?

 bh speedup doesn't seem real. It doesn't help fib.

Ok, once things have settled, please take a look at the comments in  
the bugzilla to see if there are other cases being missed.

 This idiom happens a lot in the preexisting code.  Please use:

 VNInfo *AValNo = IntA.FindLiveRangeContaining(CopyIdx-1)-valno;

 ALR is also used below.

Ok!

 +  for (unsigned i = 0, e = DefMI-getNumOperands(); i != e; ++i) {

 This loop needs a comment. What are you trying to find with this
 loop?  Maybe it should be moved out to its own function which just
 returns idx?  This would force you to name it, which would describe
 what it does :)

 There is a reason I said initial in the commit message. This is not
 done. I need a target hook to tell me what is the new destination
 register after commuting. Right now this is basically assuming x86
 commutable instructions: A = A + B

Ok.

 Another random question unrelated to this code: now that we have
 efficient RAUW, can we eliminate the 'rep'  mapping stuff and just
 RAUW vregs as they are coallesced?

 Well, actually it's related. I think it's required for correctness.
 See below.

Ah, I see.  Well this seems like an independently useful change that  
will both speed up and simplify coalescing.  Are you interested in  
tackling it as part of this project?

 Yay for use_iterators :)

 Except I don't think this is quite right. Iterating through uses of
 IntA.reg means we end up not updating other uses that have already
 been coalesced to it.

Yeah, that's bad.

 Where the use iterator could point to the first r2, and incrementing
 it could point to the next r2.  This could be avoided by not zapping
 instructions in this loop: move the copy zapping to a walk over a
 smallset or something.

 Nothing is being zapped in the loop. Copies that would be zapped are
 put into a set JoinedCopies. In fact, all uses have to be updated to
 the new register.

Ok.

 It isn't clear to me what this code is doing.  It looks like it is
 looking for the copy that has now becomes an identity copy.  Is there
 some way to factor this with other code that is coallescing copies
 away?  It seems duplicated from somewhere else.  It would be nicer to

 Not really. A number of things are happening here. We are trying to
 determine which copies are now identity copies and make sure their
 valno# will be eliminated. We also need to transfer the live ranges
 defined by the (now dead) copies to the new val# defined by the
 commuted definition MI. It's also tracking kill information.

Ok, please comment it a bit better.  Is there any way to avoid this  
work or factor this out into functions that can be shared with other  
code?

 just add all copies to a small vector and then zap them later or
 something, to keep the logic simple.

 That's essentially what's happening. This has gone through a number of
 iterations already. Apart from some cosmetic changes, it's not going
 to get much simpler than this.

ok.

 Does this need to be an ivar?  Can it just be passed by-ref between
 the methods that need it?

 This will be cleaned up later as part of RAUW change.

Ok.  Thanks Evan!

-Chris
___
llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits


Re: [llvm-commits] [llvm] r47046 - in /llvm/trunk: include/llvm/CodeGen/LiveIntervalAnalysis.h lib/CodeGen/SimpleRegisterCoalescing.cpp lib/CodeGen/SimpleRegisterCoalescing.h

2008-02-13 Thread Evan Cheng

On Feb 13, 2008, at 9:31 AM, Chris Lattner wrote:

 On Feb 13, 2008, at 1:47 AM, Evan Cheng wrote:
 Very nice, does it also help shootout/fib?

 bh speedup doesn't seem real. It doesn't help fib.

 Ok, once things have settled, please take a look at the comments in
 the bugzilla to see if there are other cases being missed.

 This idiom happens a lot in the preexisting code.  Please use:

 VNInfo *AValNo = IntA.FindLiveRangeContaining(CopyIdx-1)-valno;

 ALR is also used below.

 Ok!

 +  for (unsigned i = 0, e = DefMI-getNumOperands(); i != e; ++i) {

 This loop needs a comment. What are you trying to find with this
 loop?  Maybe it should be moved out to its own function which just
 returns idx?  This would force you to name it, which would describe
 what it does :)

 There is a reason I said initial in the commit message. This is not
 done. I need a target hook to tell me what is the new destination
 register after commuting. Right now this is basically assuming x86
 commutable instructions: A = A + B

 Ok.

 Another random question unrelated to this code: now that we have
 efficient RAUW, can we eliminate the 'rep'  mapping stuff and just
 RAUW vregs as they are coallesced?

 Well, actually it's related. I think it's required for correctness.
 See below.

 Ah, I see.  Well this seems like an independently useful change that
 will both speed up and simplify coalescing.  Are you interested in
 tackling it as part of this project?

That's the plan. Without RAUW change, I can't enable this optimization.

It's *shouldn't* take long.

Evan



 Yay for use_iterators :)

 Except I don't think this is quite right. Iterating through uses of
 IntA.reg means we end up not updating other uses that have already
 been coalesced to it.

 Yeah, that's bad.

 Where the use iterator could point to the first r2, and incrementing
 it could point to the next r2.  This could be avoided by not zapping
 instructions in this loop: move the copy zapping to a walk over a
 smallset or something.

 Nothing is being zapped in the loop. Copies that would be zapped are
 put into a set JoinedCopies. In fact, all uses have to be updated to
 the new register.

 Ok.

 It isn't clear to me what this code is doing.  It looks like it is
 looking for the copy that has now becomes an identity copy.  Is  
 there
 some way to factor this with other code that is coallescing copies
 away?  It seems duplicated from somewhere else.  It would be nicer  
 to

 Not really. A number of things are happening here. We are trying to
 determine which copies are now identity copies and make sure their
 valno# will be eliminated. We also need to transfer the live ranges
 defined by the (now dead) copies to the new val# defined by the
 commuted definition MI. It's also tracking kill information.

 Ok, please comment it a bit better.  Is there any way to avoid this
 work or factor this out into functions that can be shared with other
 code?

 just add all copies to a small vector and then zap them later or
 something, to keep the logic simple.

 That's essentially what's happening. This has gone through a number  
 of
 iterations already. Apart from some cosmetic changes, it's not going
 to get much simpler than this.

 ok.

 Does this need to be an ivar?  Can it just be passed by-ref between
 the methods that need it?

 This will be cleaned up later as part of RAUW change.

 Ok.  Thanks Evan!

 -Chris
 ___
 llvm-commits mailing list
 llvm-commits@cs.uiuc.edu
 http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

___
llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits


Re: [llvm-commits] [llvm] r47046 - in /llvm/trunk: include/llvm/CodeGen/LiveIntervalAnalysis.h lib/CodeGen/SimpleRegisterCoalescing.cpp lib/CodeGen/SimpleRegisterCoalescing.h

2008-02-13 Thread Chris Lattner

On Feb 13, 2008, at 10:58 AM, Evan Cheng wrote:

 Ah, I see.  Well this seems like an independently useful change that
 will both speed up and simplify coalescing.  Are you interested in
 tackling it as part of this project?

 That's the plan. Without RAUW change, I can't enable this  
 optimization.

Awesome, thanks.

 It's *shouldn't* take long.

*crosses fingers* :)

-Chris

___
llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits


[llvm-commits] [llvm] r47046 - in /llvm/trunk: include/llvm/CodeGen/LiveIntervalAnalysis.h lib/CodeGen/SimpleRegisterCoalescing.cpp lib/CodeGen/SimpleRegisterCoalescing.h

2008-02-12 Thread Evan Cheng
Author: evancheng
Date: Tue Feb 12 21:01:43 2008
New Revision: 47046

URL: http://llvm.org/viewvc/llvm-project?rev=47046view=rev
Log:
Initial support for copy elimination by commuting its definition MI.
PR1877.
A3 = op A2 B0kill 

   
..  

   
B1 = A3  - this copy   

   
..  


   = op A3   - more uses   

   


 
== 





B2 = op B0 A2kill 

   
..  


B1 = B2  - now an identify copy

   
..  


   = op B2   - more uses

This speeds up FreeBench/neural by 29%, Olden/bh by 12%, oopack_v1p8 by 53%.

Modified:
llvm/trunk/include/llvm/CodeGen/LiveIntervalAnalysis.h
llvm/trunk/lib/CodeGen/SimpleRegisterCoalescing.cpp
llvm/trunk/lib/CodeGen/SimpleRegisterCoalescing.h

Modified: llvm/trunk/include/llvm/CodeGen/LiveIntervalAnalysis.h
URL: 
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/LiveIntervalAnalysis.h?rev=47046r1=47045r2=47046view=diff

==
--- llvm/trunk/include/llvm/CodeGen/LiveIntervalAnalysis.h (original)
+++ llvm/trunk/include/llvm/CodeGen/LiveIntervalAnalysis.h Tue Feb 12 21:01:43 
2008
@@ -213,6 +213,20 @@
   }
 }
 
+/// ReplaceMachineInstrInMaps - Replacing a machine instr with a new one in
+/// maps used by register allocator.
+void ReplaceMachineInstrInMaps(MachineInstr *MI, MachineInstr *NewMI) {
+  Mi2IndexMap::iterator mi2i = mi2iMap_.find(MI);
+  if (mi2i != mi2iMap_.end()) {
+i2miMap_[mi2i-second/InstrSlots::NUM] = NewMI;
+Mi2IndexMap::const_iterator it = mi2iMap_.find(MI);
+assert(it != mi2iMap_.end()  Invalid instruction!);
+unsigned Index = it-second;
+mi2iMap_.erase(MI);
+mi2iMap_[NewMI] = Index;
+  }
+}
+
 BumpPtrAllocator getVNInfoAllocator() { return VNInfoAllocator; }
 
 virtual void getAnalysisUsage(AnalysisUsage AU) const;

Modified: llvm/trunk/lib/CodeGen/SimpleRegisterCoalescing.cpp
URL: 
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SimpleRegisterCoalescing.cpp?rev=47046r1=47045r2=47046view=diff

==
--- llvm/trunk/lib/CodeGen/SimpleRegisterCoalescing.cpp (original)
+++ llvm/trunk/lib/CodeGen/SimpleRegisterCoalescing.cpp Tue Feb 12 21:01:43 2008
@@ -36,6 +36,8 @@
 using namespace llvm;
 
 STATISTIC(numJoins, Number of interval joins performed);
+STATISTIC(numCommutes , Number of instruction commuting performed);
+STATISTIC(numExtends  , Number of copies extended);
 STATISTIC(numPeep , Number of identity moves eliminated after 
coalescing);
 STATISTIC(numAborts   , Number of times interval joining aborted);
 
@@ -51,6 +53,10 @@
 cl::desc(Use new coalescer heuristic),
 cl::init(false));
 
+  static cl::optbool
+  CommuteDef(coalescer-commute-instrs,
+ cl::init(false), 

Re: [llvm-commits] [llvm] r47046 - in /llvm/trunk: include/llvm/CodeGen/LiveIntervalAnalysis.h lib/CodeGen/SimpleRegisterCoalescing.cpp lib/CodeGen/SimpleRegisterCoalescing.h

2008-02-12 Thread Chris Lattner
On Feb 12, 2008, at 7:01 PM, Evan Cheng wrote:

 Author: evancheng
 Date: Tue Feb 12 21:01:43 2008
 New Revision: 47046

 URL: http://llvm.org/viewvc/llvm-project?rev=47046view=rev
 Log:
 Initial support for copy elimination by commuting its definition MI.

Yay, thanks for tackling this Evan!


 This speeds up FreeBench/neural by 29%, Olden/bh by 12%, oopack_v1p8  
 by 53%.

Very nice, does it also help shootout/fib?


 +/// ReplaceMachineInstrInMaps - Replacing a machine instr with  
 a new one in
 +/// maps used by register allocator.
 +void ReplaceMachineInstrInMaps(MachineInstr *MI, MachineInstr  
 *NewMI) {
 +  Mi2IndexMap::iterator mi2i = mi2iMap_.find(MI);
 +  if (mi2i != mi2iMap_.end()) {

Please change this to:

  if (mi2i == mi2iMap_.end())
return;
  ...

Just to avoid indentation.


 +i2miMap_[mi2i-second/InstrSlots::NUM] = NewMI;


 +Mi2IndexMap::const_iterator it = mi2iMap_.find(MI);
 +assert(it != mi2iMap_.end()  Invalid instruction!);
 +unsigned Index = it-second;
 +mi2iMap_.erase(MI);

This would avoid another lookup if you used .erase(it);

 +bool  
 SimpleRegisterCoalescing::RemoveCopyByCommutingDef(LiveInterval IntA,
 + 
 LiveInterval IntB,
 + 
 MachineInstr *CopyMI) {

...

 +  // Get the location that B is defined at.  Two options: either  
 this value has
 +  // an unknown definition point or it is defined at CopyIdx.  If  
 unknown, we
 +  // can't process it.
 +  if (!BValNo-reg) return false;
 +  assert(BValNo-def == CopyIdx  Copy doesn't define the value?);
 +
 +  // AValNo is the value number in A that defines the copy, A3 in  
 the example.
 +  LiveInterval::iterator ALR =  
 IntA.FindLiveRangeContaining(CopyIdx-1);
 +  VNInfo *AValNo = ALR-valno;

This idiom happens a lot in the preexisting code.  Please use:

VNInfo *AValNo = IntA.FindLiveRangeContaining(CopyIdx-1)-valno;

Or better yet, add a new method that returns valno directly.



 +  int Idx = -1;

Idx is too generic of a name for a variable with this long of  
lifetime, please name it something more descriptive.  What type of idx  
is it?


 +  for (unsigned i = 0, e = DefMI-getNumOperands(); i != e; ++i) {

This loop needs a comment. What are you trying to find with this  
loop?  Maybe it should be moved out to its own function which just  
returns idx?  This would force you to name it, which would describe  
what it does :)


 +MachineOperand MO = DefMI-getOperand(i);
 +if (!MO.isRegister()) continue;
 +unsigned Reg = MO.getReg();
 +if (Reg  TargetRegisterInfo::isVirtualRegister(Reg)) {

How about:

  if (Reg == 0 || !isvreg) continue;

 +  if (rep(Reg) == IntA.reg) {
 +// If the dest register comes from an interval other than  
 IntA, we
 +// can't handle this.
 +if (Reg != IntA.reg)
 +  return false;

I must be missing something here, how do you know this is a def operand?

Another random question unrelated to this code: now that we have  
efficient RAUW, can we eliminate the 'rep'  mapping stuff and just  
RAUW vregs as they are coallesced?


 +continue;
 +  }
 +  if (Idx != -1)
 +// FIXME: Being overly careful here. We just need to figure  
 out the
 +// which register operand will become the new def.
 +return false;
 +  Idx = i;
 +}
 +  }
 +  if (Idx == -1)
 +// Something like %reg1024 = add %reg1024, %reg1024
 +return false;
 +
 +  MachineOperand MO = DefMI-getOperand(Idx);
 +  unsigned NewReg = MO.getReg();
 +  if (rep(NewReg) != IntB.reg || !MO.isKill())
 +return false;

This logic would make more sense if I knew what Idx indicates and why  
you're rejecting these cases :).  Also, MO has a long lifetime, please  
name it DefMO or something else more descriptive.


 +  // Make sure there are no other definitions of IntB that would  
 reach the
 +  // uses which the new definition can reach.
 +  for (LiveInterval::iterator AI = IntA.begin(), AE = IntA.end();
 +   AI != AE; ++AI) {
 +if (AI-valno != AValNo) continue;


 +LiveInterval::Ranges::iterator BI =
 +  std::upper_bound(IntB.ranges.begin(), IntB.ranges.end(), AI- 
 start);
 +if (BI != IntB.ranges.begin())
 +  --BI;
 +for (; BI != IntB.ranges.end()  AI-end = BI-start; ++BI) {
 +  if (BI-valno == BLR-valno)
 +continue;
 +  if (BI-start = AI-start  BI-end  AI-start)
 +return false;
 +  if (BI-start  AI-start  BI-start  AI-end)
 +return false;
 +}

Please split this out into a static function or a method on  
liveinterval.


 +  }
 +
 +  // Commute def machine instr.

How about: at this point we have decided that it is legal to do this  
transformation.  Start by commuting the instruction.


 +  MachineBasicBlock *MBB = DefMI-getParent();
 +  MachineInstr *NewMI = tii_-commuteInstruction(DefMI);
 +  if (NewMI