[m5-dev] IsOn using a vector

2008-09-23 Thread Gabe Black
I just finished stepping through some code having to do with PCs in the 
simple CPU, and I noticed that not printing DPRINTFs is actually a 
fairly involved process, considering that you're not actually doing 
anything. Part of the issue, I think, is that whether or not a trace 
flag is on is stored in a vector of bools. Since the size of the vector 
won't change often (ever?), would it make sense to just make it a 
char [] and use something like the following?

flags[t >> 3] & (1 << (t & 3));


I realize when you've got tracing on you're not going for blazing speed 
in the first place, but if it's easy to tighten it up a bit that's 
probably a good idea. The other possibility is that it's actually not 
doing a whole lot, but calling through a bunch of functions that gdb 
stops at one at a time. That would look like a lot of work to someone 
stepping through with gdb but could be just as fast.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] IsOn using a vector

2008-09-23 Thread Ali Saidi
You're talking about replacing return flags[t]; with a space-optimized  
bit vector? I imagine it would help performance some, if for no other  
reason than that the trace flags would fit in a single cache block  
rather than spanning multiple as they do now.

Ali


On Sep 23, 2008, at 2:42 AM, Gabe Black wrote:

I just finished stepping through some code having to do with PCs in
 simple CPU, and I noticed that not printing DPRINTFs is actually a
 fairly involved process, considering that you're not actually doing
 anything. Part of the issue, I think, is that whether or not a  
 traceflag
 is on is stored in a vector of Bools. Since the size of the vector  
 won't
 change often (ever?) would it make sense to just make it a char [] and
 use something like the following?

 flags[t >> 3] & (1 << (t & 3));


 I realize when you've got tracing on you're not going for blazing  
 speed
 in the first place, but if it's easy to tighten it up a bit that's
 probably a good idea. The other possibility is that it's actually not
 doing a whole lot but calling through a bunch of functions gdb stops  
 at
 one at a time. That would look like a lot of work to someone stepping
 through with gdb but could be just as fast.

 Gabe




Re: [m5-dev] IsOn using a vector

2008-09-23 Thread Kevin Lim
I think I probably started using vector<bool> but wound up using bitset 
eventually.  Likely anything I used vector<bool> for could be easily 
cleaned up to use bitset instead.  It's probably not going to make a 
difference performance-wise where I used it, but at least it would be 
more consistent.

Regarding Gabe's question, this probably doesn't help clear anything up, 
but: http://www.sgi.com/tech/stl/bit_vector.html


  Description

A bit_vector is essentially a vector<bool>: it is a Sequence that has 
the same interface as vector. The main difference is that bit_vector is 
optimized for space efficiency. A vector always requires at least one 
byte per element, but a bit_vector only requires one bit per element.

*Warning*: The name bit_vector will be removed in a future release of 
the STL. The only reason that bit_vector is a separate class, instead of 
a template specialization of vector<bool>, is that this would require 
partial specialization of templates. On compilers that support partial 
specialization, bit_vector is a specialization of vector<bool>. The name 
bit_vector is a typedef. This typedef is not defined in the C++ 
standard, and is retained only for backward compatibility.



[EMAIL PROTECTED] wrote:
 I generally agree, except that I'd be a bit surprised (but not shocked) if
 vector<bool> was specialized to use bits. I'll check into this more at some
 point but my vote would be to go with bitset since it sounds like that would
 get us where we'd want to be.

 Gabe

 Quoting Steve Reinhardt [EMAIL PROTECTED]:

   
 I thought vector<bool> was supposed to be a space-optimized bit vector.

 I think there are two things going on here:

 1. It's an STL type, which means that the implementation probably is a
 nightmare of layered abstractions that in theory the compiler can figure out
 and flatten to an efficient piece of code in the common case.  If you're
 single-stepping through the debug version, I'm not surprised that it's a
 mess, but I also would not be surprised if in the opt version it boils down
 to roughly equivalent to the optimized code you're proposing.

 2. We probably should be using std::bitset rather than std::vector<bool>
 when possible... the former should be faster since it's non-resizable and
 thus might have less bounds-checking code.  I've been trying to use bitset
 for this type of thing everywhere I can in m5 (packet flags), and it looks
 like Kevin is using it in a few places in o3 (though he uses vector<bool>
 also... not sure if that's intentional).  Plus even if vector<bool> is no
 longer space-optimized I'm sure that bitset is.

 In any case, we definitely don't want to write a one-off piece of code just
 for trace flags.  If someone can absolutely prove that neither vector<bool>
 nor bitset is adequate when compiled with optimization, we can consider
 writing a replacement class for all our bit vectors, but I highly doubt that
 this is the case.

 Steve



 On Tue, Sep 23, 2008 at 11:33 AM, Ali Saidi [EMAIL PROTECTED] wrote:

 
  You're talking about replacing return flags[t]; with a space-optimized
  bit vector? I imagine it would help performance some, if for no other
  reason than that the trace flags would fit in a single cache block
  rather than spanning multiple as they do now.

 Ali


 On Sep 23, 2008, at 2:42 AM, Gabe Black wrote:

   
I just finished stepping through some code having to do with PCs in
 simple CPU, and I noticed that not printing DPRINTFs is actually a
 fairly involved process, considering that you're not actually doing
 anything. Part of the issue, I think, is that whether or not a
 traceflag
 is on is stored in a vector of Bools. Since the size of the vector
 won't
 change often (ever?) would it make sense to just make it a char [] and
 use something like the following?

  flags[t >> 3] & (1 << (t & 3));


 I realize when you've got tracing on you're not going for blazing
 speed
 in the first place, but if it's easy to tighten it up a bit that's
 probably a good idea. The other possibility is that it's actually not
 doing a whole lot but calling through a bunch of functions gdb stops
 at
 one at a time. That would look like a lot of work to someone stepping
 through with gdb but could be just as fast.

 Gabe

   






   



Re: [m5-dev] IsOn using a vector

2008-09-23 Thread nathan binkert
I think vector<bool> is indeed a vector of bools because &x[0] is
supposed to return a bool * that you can mess with.  I think that's
why bitset exists.  I'd be happy to see us move to bitset.  The size
of the bitset is easy to get since the code is autogenerated anyway.

  Nate

On Tue, Sep 23, 2008 at 2:16 PM,  [EMAIL PROTECTED] wrote:
 I generally agree, except that I'd be a bit surprised (but not shocked) if
 vector<bool> was specialized to use bits. I'll check into this more at some
 point but my vote would be to go with bitset since it sounds like that would
 get us where we'd want to be.

 Gabe

 Quoting Steve Reinhardt [EMAIL PROTECTED]:

 I thought vector<bool> was supposed to be a space-optimized bit vector.

 I think there are two things going on here:

 1. It's an STL type, which means that the implementation probably is a
 nightmare of layered abstractions that in theory the compiler can figure out
 and flatten to an efficient piece of code in the common case.  If you're
 single-stepping through the debug version, I'm not surprised that it's a
 mess, but I also would not be surprised if in the opt version it boils down
 to roughly equivalent to the optimized code you're proposing.

 2. We probably should be using std::bitset rather than std::vector<bool>
 when possible... the former should be faster since it's non-resizable and
 thus might have less bounds-checking code.  I've been trying to use bitset
 for this type of thing everywhere I can in m5 (packet flags), and it looks
 like Kevin is using it in a few places in o3 (though he uses vector<bool>
 also... not sure if that's intentional).  Plus even if vector<bool> is no
 longer space-optimized I'm sure that bitset is.

 In any case, we definitely don't want to write a one-off piece of code just
 for trace flags.  If someone can absolutely prove that neither vector<bool>
 nor bitset is adequate when compiled with optimization, we can consider
 writing a replacement class for all our bit vectors, but I highly doubt that
 this is the case.

 Steve



 On Tue, Sep 23, 2008 at 11:33 AM, Ali Saidi [EMAIL PROTECTED] wrote:

  You're talking about replacing return flags[t]; with a space-optimized
  bit vector? I imagine it would help performance some, if for no other
  reason than that the trace flags would fit in a single cache block
  rather than spanning multiple as they do now.
 
  Ali
 
 
  On Sep 23, 2008, at 2:42 AM, Gabe Black wrote:
 
  I just finished stepping through some code having to do with PCs in
   simple CPU, and I noticed that not printing DPRINTFs is actually a
   fairly involved process, considering that you're not actually doing
   anything. Part of the issue, I think, is that whether or not a
   traceflag
   is on is stored in a vector of Bools. Since the size of the vector
   won't
   change often (ever?) would it make sense to just make it a char [] and
   use something like the following?
  
    flags[t >> 3] & (1 << (t & 3));
  
  
   I realize when you've got tracing on you're not going for blazing
   speed
   in the first place, but if it's easy to tighten it up a bit that's
   probably a good idea. The other possibility is that it's actually not
   doing a whole lot but calling through a bunch of functions gdb stops
   at
   one at a time. That would look like a lot of work to someone stepping
   through with gdb but could be just as fast.
  
   Gabe
 







Re: [m5-dev] ISA specific decoder state

2008-09-23 Thread gblack
Unfortunately there's still one missing component here. If you switch to the ROM
and change the micropc, you may just be starting up the part of the macroop
stored in the ROM, or you may be, for instance, trying to enter an interrupt
handler. In the former case you want a macroop and in the latter you don't, but
it isn't clear how to differentiate between the two.

Most of the time the problem isn't that big of a deal because you can just
ignore the macroop, but in cases where the macroop doesn't work at all, like on
a page fault in instruction memory, that can effectively block an interrupt from
happening.

One possible solution is to keep the macroop around effectively until you know
for sure you finished a macroop, or if the thread context tells you to
artificially kick it out with some sort of function. A possible name is
endMacroop.

Gabe

Quoting Gabe Black [EMAIL PROTECTED]:

 I've considered it, and I think this is a good solution. I'm going to
 pass the current macroop into the rom when getting microops, and then
 when this gets to o3 at some point, keep the current macroop in the
 DynamicInst.

 Gabe

 [EMAIL PROTECTED] wrote:
  I'll have to think about it more, but at first pass that sounds pretty
 good. It
  might also help generate better/more descriptive disassembly which is
 something
  I've wanted to do for a while.
 
  Gabe
 
  Quoting Steve Reinhardt [EMAIL PROTECTED]:
 
 
  How about having a current macroop pointer in the execution context (so
  there would be a single pointer for SimpleCPU, and it would live in the
  DynamicInst object in O3)?
 
  On Sun, Sep 21, 2008 at 11:25 PM, Gabe Black [EMAIL PROTECTED]
 wrote:
 
 
   To break this into more digestible chunks, I need a way to get important
   information from the most recent macroop to the ROM so it can tune its
   microops accordingly.
 
  I'd like to have two different types of regions in the ROM, those that
  are extensions of combinational macroops and need the specifics of the
  original macroop to do their job, and those that use a predefined set of
  parameters so they always work in predictable ways (and don't require a
  macroop).
 
  When the ROM was just an abstraction internal to the macroops, really
  just a special range of micropcs, the macroop itself could ferry that
  information along when it got microops from the ROM to give to the
  decoder. Now that the decoder gets the microops directly from the ROM,
  the information needs a new way to get to there.
 
  Two possibilities are that the decoder could expose the last macroop to
  the ROM so the ROM can pull out what it needs itself. The other is that
  the macroop sets up state maintained in the decoder which is exposed to
  the ROM for the same purpose. Exposing the macroop would probably be
  easier, but then it's easier to do something bad if there's no macroop
  but the ROM expects one. Liberal application of assert(ptr != NULL)
  would probably help mitigate that.
 
  One additional complication is if you mispredict or otherwise need to
  reset to a particular microop in the ROM. It could be hard to figure out
  if you need to get a macroop so the ROM can get the state it needs, or
  if the PC is nonsense and the ROM doesn't need the macroop. In this
  case, if the macroop is needed, fetch/decode should generate it, and if
  it causes a fault the fault should be handled. If the macroop isn't
  needed, fetch/decode shouldn't try to get it, and if they do and a fault
  happens, the fault should be ignored. I'm not sure what the best way to
  differentiate these is. One option would be to add more state which says
  whether or not the microops are stand alone and not intended to be part
  of a macroop. I feel like the decoder might be trying to keep track of
  too many things already, so some way to work that in there without just
  layering it on top would be best.
 
  Gabe
 
  Gabe Black wrote:
 
  I'm getting pretty close to starting with the interrupt entering
  microcode, but one annoying issue I'm not sure how to deal with is
  passing an environment to the microops in the ROM. For the interrupt
  code it's not as important because for a particular mode, the code
  should always behave the same way, generally outside the scope of any
  instruction. In the case of macroops being partially stored in the ROM,
  it becomes more important since the registers used, the various widths
  involved, etc. become more important. The ROM itself as it stands spits
  out instructions which correspond to a particular micropc but aren't
  otherwise specialized. If this was just x86, the obvious/easy solution
  would be to pass the emulation environment and extended machine
  instruction being used with the current macroop through to the ROM as
  the instructions are generated. Since it isn't, I need some way to get
  that information, whose existence is unknown to the decoder, out of the
  macroop, and then pass it into the ROM when a microop is requested. Any
  ideas 

[m5-dev] taking out RegContext code

2008-09-23 Thread Gabe Black
I'd like to take out the old RegContext code which is still floating 
around from my original implementation of register windows. Nothing 
calls it, and given the way things are set up right now (specifically, 
that o3 doesn't use the ISA-defined int and float reg files), it's 
fundamentally broken. I'm having visitors until early next week, so if 
no one has protested by then I'll assume no one will miss it and get 
rid of it.

Gabe


Re: [m5-dev] ISA specific decoder state

2008-09-23 Thread Ali Saidi

On Sep 23, 2008, at 9:28 PM, Steve Reinhardt wrote:
 I believe it's impossible to fault on a microop in the middle of a 
 macroop and then resume where you left off in the middle of the 
 macroop flow.  If you fault you will roll back to the state at the 
 beginning of the macroop and then on resuming from the fault you 
 either have to reexecute the macroop from the start or skip it 
 entirely.

 So it seems like the two cases where you're in the middle of a  
 macroop flow and you end the macroop are either (1) the final,  
 specially marked microop of the normal flow hits retire or (2) a  
 microop causes a fault and the whole macroop is discarded and the  
 fault handler is invoked.  Interrupts don't count because you'd  
 never take an interrupt in the middle of a macroop.
I think x86's string copy instruction is a case where you have to be 
able to take a fault (TLB) in the middle of the macroop and then 
resume. Otherwise it would be possible to prevent the machine from 
making forward progress with a string copy that spanned more pages 
than the TLB holds.

Ali
