[m5-dev] Cron [EMAIL PROTECTED] /z/m5/regression/do-regression quick

2008-09-20 Thread Cron Daemon
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-atomic passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-timing passed.
* build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed.
* build/X86_SE/tests/fast/quick/00.hello/x86/linux/simple-atomic passed.

See /z/m5/regression/regress-2008-09-20-03:00:01 for details.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] CPUID implementation

2008-09-20 Thread Gabe Black
Now that I'm making the branch microop always use a fixed absolute
micropc, the only place I wasn't already using one, the CPUID
instruction, needs to change. The problem is that, as things are
implemented, it really has to be able to compute its target. The CPUID
instruction basically queries a mostly but not completely static pool of
information about the CPU it's run on. For instance, it can tell you the
size of various caches, what version the CPU is, who the manufacturer
was, what instruction extensions are supported (that's partially where
the info in /proc/cpuinfo comes from), and so on. It's not completely
static for two reasons. First, sometimes certain extensions are
implemented in only some modes. I believe some CPUs turn off the feature
bits for instructions that won't work in the current mode, although I'm
not sure of that and I think it's done inconsistently among processors.
Second, I believe Intel now allows you to tamper with the values
returned by CPUID in order to allow a virtualized guest to query freely
and not see capabilities that wouldn't work or that it shouldn't use.

Right now, my implementation of CPUID does a little munging on the
function code, which specifies what information you want, and then goes
into what is essentially a big case statement/computed branch that puts
the right values in the right registers and then returns. As I
mentioned, since I'll no longer be able to do computed branches, this
will no longer work. There are other limitations too, like needing lots
of microops to function as a basic lookup table, and the fact that the
information is static and completely unconfigurable. For instance, the
cache would always be reported as the same size, and if some benchmark
tried to use that value to tune its behavior, it wouldn't do what it was
supposed to. What I'm thinking I'd want to do is one of two things.
Either the CPUID instruction should do a series of loads out of an
actual lookup ROM/RAM somewhere outside of the CPU, or there could be a
CPUID device, which would allow it to respond in intelligent ways
depending on the CPU mode, for instance. I'm favoring sticking a ROM in
the memory system somewhere. Also, I'd like to put in some sort of
configuration interface that would allow the configs to program in what
CPUID should say, for instance if it needs to reflect the actual
hardware or someone wants to add a new function.

What do you guys think?

Gabe


Re: [m5-dev] another microcode design decision

2008-09-20 Thread Steve Reinhardt
OK, I see... the combination of only branching to the start of a quad and
not being able to generate more than one quad combinationally makes
branching in a combinational sequence meaningless on the real machine.

Is there an upper bound on the number of microops you can generate in m5
through the combinational path?  Even though it's more artificial, would
it make sense to keep the restriction that if you need a microbranch you
have to go to the ROM?

At an even higher level, my impression is that you were making everything
combinational because until you hit interrupt handling there was no
absolute need for a microcode ROM... but now that you have one, should you
revisit that, and start putting the more complex instruction sequences in
the ROM as well?  Would that allow you to simplify some of the sequencing
logic?

Steve

On Thu, Sep 18, 2008 at 10:36 AM, [EMAIL PROTECTED] wrote:

 There is conceptually no upper bound, but really it's because you simply
 can't branch within the group of microops generated by the combinational
 decoder. It's generating what amounts to an atomic VLIW vector of
 operations where control happens between entire vectors. Only one comes
 out of the combinational decoder, so it's like you get one instruction at
 that point. Either that one instruction does the trick, or you need to go
 to the ROM where a micropc conceptually exists.

 Gabe

 Quoting Steve Reinhardt [EMAIL PROTECTED]:

  I see... the only reason I saw to switch to relative branches is that it
  avoids the need to distinguish between ROM and non-ROM targets. I guess
  the argument for their approach is that if your microcode flow is
  complex enough to require a branch then it's probably complex enough to
  need to come out of ROM anyway. I'm guessing the difference with what
  you're doing is that there's no hard upper bound on the number of
  microops you can generate via the combinational decoder; is that true?

  Steve

  On Thu, Sep 18, 2008 at 12:52 AM, Gabe Black [EMAIL PROTECTED] wrote:

   This email is a minor informational update on the
   microcode/micropc/branching/ROM stuff.

   I started working on making the microbranches relative, but I had a
   hard time getting it to work because of how the microcode listing is
   processed and needing to know the current micropc in order to compute
   the argument for the branch microop. I went to check the patent just
   to make sure I was thinking about things the right way, and it turns
   out I was wrong about two things. First, the branches are absolute and
   not relative. This makes a lot of sense because you eliminate the need
   for an adder to compute your target, and also one big win of relative
   branches, relocatable code, is moot in a ROM like this. Second,
   branches are not register operations. They come in one or two parts
   and are centered around generating quads of operations from the ROM on
   each read. At the end of a quad, there's a field called OpSeq which
   directs you to the next quad to fetch and is similar to what I'd call
   a branch. It has a 12 bit field which encodes its target, and since it
   deals with a quad of microops, I'd say it's 14 effective bits. If you
   want a conditional branch, one of the operations in the quad says what
   the condition is and what the fall through address is, and there are
   17 bits for that. Since that's all tied to having quads and that's a
   little too specific to a particular implementation for M5, I think it
   seems more appropriate to have a conditional branch microop that just
   does everything at once. What I did before was make the branch
   instruction a register operation which only supports 8 bits of
   immediate because that made it easy to make it conditional. What I
   should really do is move the branch instruction to the right category
   and extend the immediate field to 16 bits, a handy number which
   approximates what's in the patent.

   Also, as far as different address spaces for the ROM and the macroop,
   the patent handles that by making -all- branches go to the ROM, and by
   making the unit of execution a quad. That way, if you're not finished
   when you execute the one and only combinational quad associated with
   an instruction (which effectively has no micropc), you always go to
   someplace in the ROM with a branch. This again doesn't work for M5
   because we're not architected around quads and all microops have a
   micropc.

   Gabe
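
The bit-width arithmetic in the quoted message can be checked with a short Python sketch. The constants 12 and 16 and the quad size of four come from the message; the function names and the relative-offset convention (offset from the microop after the branch) are assumptions made only for illustration.

```python
# Illustrative only: names and the relative-branch convention are assumed.
QUAD_SIZE = 4    # microops per quad, per the quoted message
OPSEQ_BITS = 12  # width of the OpSeq target field, per the quoted message
IMM_BITS = 16    # immediate width proposed for the M5 branch microop


def effective_bits() -> int:
    """A 12-bit quad index plus the bits needed to pick a slot in a quad."""
    slot_bits = (QUAD_SIZE - 1).bit_length()  # 2 bits for 4 slots
    return OPSEQ_BITS + slot_bits


def quad_to_micropc(quad_index: int, slot: int = 0) -> int:
    """Flatten a (quad, slot) pair into a single microop address."""
    if not 0 <= quad_index < (1 << OPSEQ_BITS):
        raise ValueError("quad index does not fit in the OpSeq field")
    if not 0 <= slot < QUAD_SIZE:
        raise ValueError("slot out of range")
    return quad_index * QUAD_SIZE + slot


def encode_absolute(target: int) -> int:
    """An absolute branch stores the target directly; no adder is needed."""
    if not 0 <= target < (1 << IMM_BITS):
        raise ValueError("target does not fit in the immediate field")
    return target


def encode_relative(target: int, micropc: int) -> int:
    """A relative branch needs the branch's own micropc at assembly time.
    Assumed convention: offset from the microop after the branch."""
    return target - (micropc + 1)
```

Note that encode_relative takes the current micropc as an argument while encode_absolute does not, which is the assembler-side difficulty the message describes.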
  
 






[m5-dev] Undelivered Mail Returned to Sender

2008-09-20 Thread Mail Delivery System
This is the mail system at host daystrom.m5sim.org.

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to postmaster.

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

   The mail system

[EMAIL PROTECTED]: Host or domain name not found. Name service error for
name=rywzrd.com type=A: Host not found
Reporting-MTA: dns; daystrom.m5sim.org
X-Postfix-Queue-ID: C3F6715826B
X-Postfix-Sender: rfc822; m5-dev@m5sim.org
Arrival-Date: Sat, 20 Sep 2008 12:39:40 -0400 (EDT)

Final-Recipient: rfc822; vxnfyq@rywzrd.com
Action: failed
Status: 5.4.4
Diagnostic-Code: X-Postfix; Host or domain name not found. Name service error
for name=rywzrd.com type=A: Host not found
---BeginMessage---
 

This address has been used to register a Flyspray account.  If you
were not expecting this message, please ignore and delete it.  Go to
the following URL to complete your registration:

http://www.m5sim.org/flyspray/index.php?do=registermagic=6a7aa7170f4bf6498c9aef6a04210068


Your confirmation code is:

7af2j.eGoO14M

---End Message---


Re: [m5-dev] CPUID implementation

2008-09-20 Thread Steve Reinhardt
If it's that complicated, why not just do it in C++ inside of M5, and have a
special microop that just calls that function and lets it do the dirty
work?  I don't think performance fidelity is an issue here, and even if it
were, we could always just make that single microop take longer.
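
A rough sketch of that idea, written in Python for brevity even though the real thing would be C++ inside M5. CallMicroop, cpuid_handler and the register dictionary are hypothetical, and the values returned for function 0 are placeholders, not real CPUID data.

```python
# Hypothetical sketch: CallMicroop and cpuid_handler are not M5 names.
from dataclasses import dataclass
from typing import Callable, Dict

Registers = Dict[str, int]


@dataclass
class CallMicroop:
    """A single microop that hands all the work to a host-side function."""
    handler: Callable[[Registers], None]
    latency: int = 1  # cycles charged for the whole operation

    def execute(self, regs: Registers) -> int:
        """Run the handler on the register state and report the latency."""
        self.handler(regs)
        return self.latency


def cpuid_handler(regs: Registers) -> None:
    """Select a result based on the function code in EAX."""
    function = regs["eax"]
    if function == 0:
        # Placeholder values; a real handler would return actual CPU info.
        regs.update(eax=1, ebx=0x11111111, ecx=0x22222222, edx=0x33333333)
    else:
        regs.update(eax=0, ebx=0, ecx=0, edx=0)


# Timing fidelity is handled by making the one microop take longer.
cpuid_uop = CallMicroop(cpuid_handler, latency=20)
regs = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}
cycles = cpuid_uop.execute(regs)
```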

Steve



Re: [m5-dev] CPUID implementation

2008-09-20 Thread Ali Saidi
I kind of ran into a similar thing with sparc. There is configuration  
code that needed to inform the system about the speed/size/type of  
various objects. It would be good to have a C++ interface to easily  
query the object tree to be able to make those determinations.
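
As a rough illustration of what such a query could look like, here is a toy object tree with a find-by-type walk. It is written in Python only to keep it short; SimNode, its fields, and the example parameters are all made up and do not correspond to M5's SimObject classes.

```python
# Toy model: SimNode and its parameters are invented for illustration.
from typing import Any, Iterator, List


class SimNode:
    """A named object with a type tag, parameters, and child objects."""

    def __init__(self, name: str, kind: str, **params: Any) -> None:
        self.name = name
        self.kind = kind
        self.params = params
        self.children: List["SimNode"] = []

    def add(self, child: "SimNode") -> "SimNode":
        """Attach a child and return it so calls can be chained."""
        self.children.append(child)
        return child

    def find(self, kind: str) -> Iterator["SimNode"]:
        """Depth-first walk yielding every node of the requested kind."""
        if self.kind == kind:
            yield self
        for child in self.children:
            yield from child.find(kind)


system = SimNode("system", "System")
cpu = system.add(SimNode("cpu0", "CPU", clock_mhz=2000))
cpu.add(SimNode("icache", "Cache", size_kb=32))
cpu.add(SimNode("dcache", "Cache", size_kb=64))

# Configuration code can now ask the tree instead of hard-coding sizes.
cache_sizes = {c.name: c.params["size_kb"] for c in system.find("Cache")}
```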

Ali



Re: [m5-dev] CPUID implementation

2008-09-20 Thread Steve Reinhardt
Collecting the necessary information in Python up front seems reasonable to
me.

As far as where the C++ code would go, it seems to me it could either be a
method on the BaseCPU object or a global function in the x86 ISA namespace.
The latter makes sense if we're trying not to pollute the CPU model with a
lot of ISA-specific stuff.  I don't see the advantage of making the ISA a
SimObject instead of a namespace unless we want to do something like make it
a template parameter of the CPU objects (to enable heterogeneous ISA
support).

Steve


Re: [m5-dev] another microcode design decision

2008-09-20 Thread Gabe Black
I have the microbranches ready to go, so now we need to figure out the 
details of how you switch to and from the ROM. I think something like 
fromRom and nextFromRom would work, although the names aren't that 
great. If anyone has a suggestion for a different mechanism or better 
names, please let me know.
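
One possible reading of that mechanism, sketched in Python. The semantics here are an assumption, since the message only names the flags: from_rom says where the current microop is fetched from, and next_from_rom on a branch says whether its absolute target lives in the ROM. Only unconditional branches are modeled, and Microop and run are hypothetical names.

```python
# Assumed semantics for fromRom / nextFromRom; not taken from M5 source.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Microop:
    name: str
    target: Optional[int] = None  # absolute micropc if this is a branch
    next_from_rom: bool = False   # assumed: branch target is in the ROM
    last: bool = False            # ends the macroop


def run(macroop: List[Microop], rom: List[Microop]) -> List[str]:
    """Step through microops, switching sources when a branch says so."""
    micropc = 0
    from_rom = False  # execution starts in the macroop's own microops
    trace: List[str] = []
    while True:
        uop = (rom if from_rom else macroop)[micropc]
        trace.append(("rom:" if from_rom else "macroop:") + uop.name)
        if uop.last:
            return trace
        if uop.target is not None:
            micropc = uop.target
            from_rom = uop.next_from_rom
        else:
            micropc += 1


macroop = [Microop("ld"), Microop("br", target=2, next_from_rom=True)]
rom = [Microop("unused0"), Microop("unused1"),
       Microop("fixup"), Microop("done", last=True)]
trace = run(macroop, rom)
```

The only extra state the sequencer carries beyond the micropc is the single from_rom bit.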

Gabe

Gabe Black wrote:
 There is no limit on what you can do combinationally. The problem with 
 making every branch go to the ROM, or really the reason that doesn't 
 actually buy us anything, is that the micropc changes all the time in 
 the middle. If you made micropcs only relevant in the ROM, then you'd 
 need the combinational microops to be fed in a strict sequence out of 
 the macroop. If the decoder keeps track of that, it needs a way to tell 
 the macroop what's next and we've reintroduced a micropc. If the macroop 
 itself keeps track of that, it's not a static StaticInst anymore. Then 
 at some point you'd still need to switch the state from combinational to 
 ROM based when you'd reached a branch, so you'd still need to mark that 
 microop specially and you'd still need to keep some state and a 
 transition mechanism in the CPU.

 You're right about making everything combinational. I had planned on 
 supporting a ROM from the beginning and built support into the 
 microassembler for it, but in the interests of getting running microcode 
 I just made everything combinational to get it working. I will 
 definitely move the more complex instructions into the ROM, hopefully 
 making all branches into the ROM so that there's only one version of 
 branch. That would just simplify the implementation of my microops, 
 though, and wouldn't really do anything at the level of the decoder or CPU.

 Gabe

 Steve Reinhardt wrote:
   
 OK, I see... the combination of only branching to the start of a quad 
 and not being able to generate more than one quad combinationally 
 makes branching in a combinational sequence meaningless on the real 
 machine.

 Is there an upper bound on the number of microops you can generate in 
 m5 through the combinational path?  Even though it's more 
 artificial, would it make sense to keep the restriction that if you 
 need a microbranch you have to go to the ROM?

 At an even higher level, my impression is that you were making 
 everything combinational because until you hit interrupt handling 
 there was no absolute need for a microcode ROM... but now that you 
 have one, should you revisit that, and start putting the more complex 
 instruction sequences in the ROM as well?  Would that allow you to 
 simplify some of the sequencing logic?

 Steve

 On Thu, Sep 18, 2008 at 10:36 AM, [EMAIL PROTECTED] wrote:

 There is conceptually no upper bound, but really it's because you
 simply can't branch within the group of microops generated by the
 combinational decoder. It's generating what amounts to an atomic VLIW
 vector of operations where control happens between entire vectors.
 Only one comes out of the combinational decoder, so it's like you get
 one instruction at that point. Either that one instruction does the
 trick, or you need to go to the ROM where a micropc conceptually
 exists.

 Gabe

 Quoting Steve Reinhardt [EMAIL PROTECTED]:

  I see... the only reason I saw to switch to relative branches is that
  it avoids the need to distinguish between ROM and non-ROM targets. I
  guess the argument for their approach is that if your microcode flow
  is complex enough to require a branch then it's probably complex
  enough to need to come out of ROM anyway. I'm guessing the difference
  with what you're doing is that there's no hard upper bound on the
  number of microops you can generate via the combinational decoder; is
  that true?
 
  Steve
 
  On Thu, Sep 18, 2008 at 12:52 AM, Gabe Black [EMAIL PROTECTED] wrote:

   This email is a minor informational update on the
   microcode/micropc/branching/ROM stuff.

   I started working on making the microbranches relative, but I had a
   hard time getting it to work because of how the microcode listing is
   processed and needing to know the current micropc in order to compute
   the argument for the branch microop. I went to check the patent just
   to make sure I was thinking about things the right way, and it turns
   out I was wrong about two things. First, the branches are absolute and
   not relative. This makes a lot of sense because you eliminate the need
   for an adder to compute your target, and also one big win of relative
   branches, relocatable code, is moot in a ROM like this. Second,
   branches are not register operations. They come in one or two parts
   and are centered around