[m5-dev] Cron [EMAIL PROTECTED] /z/m5/regression/do-regression quick
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-atomic passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-timing passed.
* build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed.
* build/X86_SE/tests/fast/quick/00.hello/x86/linux/simple-atomic passed.

See /z/m5/regression/regress-2008-09-20-03:00:01 for details.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
[m5-dev] CPUID implementation
Now that I'm making the branch microop always use a fixed absolute micropc, the only place I wasn't already using it, the CPUID instruction, needs to change. The problem is that, as things are implemented, it really has to be able to compute its target.

The CPUID instruction basically queries a mostly but not completely static pool of information about the CPU it's run on. For instance, it can tell you the size of various caches, what version the CPU is, who the manufacturer was, what instruction extensions are supported (that's partially where the info in /proc/cpuinfo comes from), and so on. It's not completely static for two reasons. First, sometimes certain extensions are implemented in only some modes. I believe some CPUs turn off the bits for instructions that won't work in the current mode, although I'm not sure of that and I think it's done inconsistently among processors. Second, I believe Intel now allows you to tamper with the values returned by CPUID in order to allow a virtualized guest to query freely and not see capabilities that wouldn't work or that it shouldn't use.

Right now, my implementation of CPUID does a little munging on the function code, which specifies what information you want, and then goes into what is essentially a big case statement/computed branch that puts the right values in the right registers and then returns. As I mentioned, since I'll no longer be able to do computed branches, this will no longer work. There are lots of other limitations too, like needing lots of microops to function as a basic lookup table, and the fact that the information is static and completely unconfigurable. For instance, the cache would always be reported as the same size, and if some benchmark tried to use that value to behave in a certain way, it wouldn't do what it was supposed to.

What I'm thinking I'd want to do is one of two things. Either the CPUID instruction should do a series of loads out of an actual lookup ROM/RAM somewhere outside of the CPU, or there could be a CPUID device which would allow it to respond in intelligent ways depending on the CPU mode, for instance. I'm favoring sticking a ROM in the memory system somewhere. Also, I'd like to put in some sort of configuration interface that would allow the configs to program in what CPUID should say if it needs to reflect the actual hardware or someone wants to add a new function, for instance. What do you guys think?

Gabe
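To make the lookup ROM/RAM idea a bit more concrete, here is a minimal Python sketch of a table that a config script could program and that a CPUID implementation could read with plain lookups instead of a computed branch. The `CpuidTable` class, its `program` and `lookup` methods, and the `eax=0x1` value are hypothetical and not part of M5. The one detail taken from the x86 architecture is that function 0 returns the 12-character vendor string in EBX, EDX, ECX, in that order.

```python
import struct


def pack_vendor(vendor):
    """Split a 12-character vendor string into (ebx, edx, ecx) for function 0."""
    raw = vendor.encode("ascii")
    if len(raw) != 12:
        raise ValueError("vendor string must be exactly 12 ASCII characters")
    # Three little-endian 32-bit words, in the order EBX, EDX, ECX.
    ebx, edx, ecx = struct.unpack("<III", raw)
    return ebx, edx, ecx


class CpuidTable:
    """Hypothetical configurable table: function code -> (eax, ebx, ecx, edx)."""

    def __init__(self):
        self._entries = {}

    def program(self, function, eax=0, ebx=0, ecx=0, edx=0):
        # Called from configuration code to set what CPUID should report.
        self._entries[function] = (eax, ebx, ecx, edx)

    def lookup(self, function):
        # Assumption: unprogrammed functions read as zeros in this sketch.
        return self._entries.get(function, (0, 0, 0, 0))


table = CpuidTable()
ebx, edx, ecx = pack_vendor("GenuineIntel")
# eax=0x1 is an arbitrary illustrative "highest supported function" value.
table.program(0x0, eax=0x1, ebx=ebx, ecx=ecx, edx=edx)
```

Because the table is data rather than microcode, changing the reported cache size or adding a new function would be a configuration change rather than a change to the instruction's microops.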
Re: [m5-dev] another microcode design decision
OK, I see... the combination of only branching to the start of a quad and not being able to generate more than one quad combinationally makes branching in a combinational sequence meaningless on the real machine. Is there an upper bound on the number of microops you can generate in m5 through the combinational path? Even though it's more artificial, would it make sense to keep the restriction that if you need a microbranch you have to go to the ROM?

At an even higher level, my impression is that you were making everything combinational because until you hit interrupt handling there was no absolute need for a microcode ROM... but now that you have one, should you revisit that, and start putting the more complex instruction sequences in the ROM as well? Would that allow you to simplify some of the sequencing logic?

Steve

On Thu, Sep 18, 2008 at 10:36 AM, [EMAIL PROTECTED] wrote:

There is conceptually no upper bound, but really it's because you simply can't branch within the set of microops generated by the combinational decoder. It's generating what amounts to an atomic VLIW vector of operations where control happens between entire vectors. Only one comes out of the combinational decoder, so it's like you get one instruction at that point. Either that one instruction does the trick, or you need to go to the ROM where a micropc conceptually exists.

Gabe

Quoting Steve Reinhardt [EMAIL PROTECTED]:

I see... the only reason I saw to switch to relative branches is that it avoids the need to distinguish between ROM and non-ROM targets. I guess the argument for their approach is that if your microcode flow is complex enough to require a branch then it's probably complex enough to need to come out of ROM anyway. I'm guessing the difference with what you're doing is that there's no hard upper bound on the number of microops you can generate via the combinational decoder; is that true?

Steve

On Thu, Sep 18, 2008 at 12:52 AM, Gabe Black [EMAIL PROTECTED] wrote:

This email is a minor informational update on the microcode/micropc/branching/ROM stuff. I started working on making the microbranches relative, but I had a hard time getting it to work because of how the microcode listing is processed and needing to know the current micropc in order to compute the argument for the branch microop. I went to check the patent just to make sure I was thinking about things the right way, and it turns out I was wrong about two things.

First, the branches are absolute and not relative. This makes a lot of sense because you eliminate the need for an adder to compute your target, and also one big win of relative branches, relocatable code, is moot in a ROM like this.

Second, branches are not register operations. They come in one or two parts and are centered around generating quads of operations from the ROM on each read. At the end of a quad, there's a field called OpSeq which directs you to the next quad to fetch and is similar to what I'd call a branch. It has a 12-bit field which encodes its target, and since it deals with a quad of microops, I'd say it's 14 effective bits. If you want a conditional branch, one of the operations in the quad says what the condition is and what the fall-through address is, and there are 17 bits for that. Since that's all tied to having quads and that's a little too specific to a particular implementation for M5, I think, it seems more appropriate to have a conditional branch microop that just does everything at once.

What I did before was make the branch instruction a register operation which only supports 8 bits of immediate because that made it easy to make it conditional. What I should really do is move the branch instruction to the right category and extend the immediate field to 16 bits, a handy number which approximates what's in the patent.

Also, as far as different address spaces for the ROM and the macroop, the patent handles that by making -all- branches go to the ROM, and by making the unit of execution a quad. That way, if you're not finished when you execute the one and only combinational quad associated with an instruction (which effectively has no micropc), you always go to someplace in the ROM with a branch. This again doesn't work for M5 because we're not architected around quads and all microops have a micropc.

Gabe
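The two numeric points in this message can be illustrated with a small Python sketch. It is a toy, not the M5 microassembler: `quad_to_micropc`, `assemble`, and the `(label, mnemonic, target)` listing format are all made up for illustration. It shows why a 12-bit quad address covers 14 bits of microop address, and why an absolute target only needs a label table, whereas a relative one would also need the branch's own micropc.

```python
QUAD_SIZE = 4   # microops per quad, as described in the message
IMM_BITS = 16   # width proposed for the branch immediate


def quad_to_micropc(quad_index):
    """Map a 12-bit quad index to the micropc of the quad's first microop."""
    if not 0 <= quad_index < (1 << 12):
        raise ValueError("quad index does not fit in 12 bits")
    # 12 bits of quad index plus 2 bits of position within the quad = 14 bits.
    return quad_index * QUAD_SIZE


def assemble(listing):
    """Toy two-pass assembler over (label, mnemonic, target_label) tuples."""
    # Pass 1: record the absolute micropc of every label.
    labels = {}
    for upc, (label, _mnemonic, _target) in enumerate(listing):
        if label is not None:
            labels[label] = upc

    # Pass 2: an absolute branch immediate is just the label's micropc.
    out = []
    for _label, mnemonic, target in listing:
        imm = None
        if target is not None:
            imm = labels[target]
            if imm >= (1 << IMM_BITS):
                raise ValueError("target does not fit in the immediate field")
        out.append((mnemonic, imm))
    return out


listing = [
    ("start", "ld", None),
    (None, "br", "end"),
    (None, "add", None),
    ("end", "st", None),
]
program = assemble(listing)
```

Here the `br` microop ends up with the immediate 3, the micropc of `end`, regardless of where the branch itself sits in the listing.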
Re: [m5-dev] CPUID implementation
If it's that complicated, why not just do it in C++ inside of M5, and have a special microop that just calls that function and lets it do the dirty work? I don't think performance fidelity is an issue here, and even if it were, we could always just make that single microop take longer.

Steve
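The "special microop that just calls a function" suggestion might look roughly like the sketch below. It is written in Python only to keep it short and runnable; the suggestion in the thread is for C++ inside M5, and `CpuidMicroop`, `toy_cpuid`, the dictionary register file, and the latency of 20 are all hypothetical. The function 0 values are the standard "GenuineIntel" vendor words; everything else is made up.

```python
class CpuidMicroop:
    """Hypothetical microop that delegates all of CPUID to a plain function."""

    def __init__(self, handler, latency=1):
        self.handler = handler
        self.latency = latency  # cycles to charge for this single microop

    def execute(self, regs):
        # The function code arrives in EAX; results go to EAX, EBX, ECX, EDX.
        eax, ebx, ecx, edx = self.handler(regs["eax"])
        regs.update(eax=eax, ebx=ebx, ecx=ecx, edx=edx)
        return self.latency


def toy_cpuid(function):
    """Stand-in for the simulator-side function that does the dirty work."""
    if function == 0:
        # EAX value is illustrative; EBX, ECX, EDX spell "GenuineIntel".
        return (1, 0x756E6547, 0x6C65746E, 0x49656E69)
    return (0, 0, 0, 0)


regs = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}
op = CpuidMicroop(toy_cpuid, latency=20)
cycles = op.execute(regs)
```

Making the microop "take longer" is then just a matter of the `latency` value, which is one way to read the remark about performance fidelity.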
Re: [m5-dev] CPUID implementation
I kind of ran into a similar thing with sparc. There is configuration code that needed to inform the system about the speed/size/type of various objects. It would be good to have a C++ interface to easily query the object tree to be able to make those determinations.

Ali
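As a rough illustration of what "querying the object tree" could mean, here is a Python sketch that walks a toy hierarchy and collects every object of a given kind. The `Node` class, `find_by_kind`, and the example tree with its cache sizes are invented for this sketch; they are not the M5 SimObject API, and the request in the thread is for a C++ interface.

```python
class Node:
    """Hypothetical stand-in for an object in the simulated system's tree."""

    def __init__(self, name, kind, children=(), **params):
        self.name = name
        self.kind = kind
        self.children = list(children)
        self.params = params


def find_by_kind(root, kind):
    """Return all nodes of the given kind, in depth-first, left-to-right order."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.kind == kind:
            found.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found


system = Node("system", "System", children=[
    Node("cpu", "CPU", children=[
        Node("icache", "Cache", size=32 * 1024),
        Node("dcache", "Cache", size=64 * 1024),
    ]),
    Node("l2", "Cache", size=2 * 1024 * 1024),
])

caches = find_by_kind(system, "Cache")
sizes = {c.name: c.params["size"] for c in caches}
```

A result like `sizes` is the kind of information that could then be fed into whatever reports cache sizes to the guest.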
Re: [m5-dev] CPUID implementation
Collecting the necessary information in Python up front seems reasonable to me. As far as where the C++ code would go, it seems to me it could either be a method on the BaseCPU object or a global function in the x86 ISA namespace. The latter makes sense if we're trying not to pollute the CPU model with a lot of ISA-specific stuff. I don't see the advantage of making the ISA a SimObject instead of a namespace unless we want to do something like make it a template parameter of the CPU objects (to enable heterogeneous ISA support).

Steve
Re: [m5-dev] another microcode design decision
I have the microbranches ready to go, so now we need to figure out the details of how you switch to and from the ROM. I think something like fromRom and nextFromRom would work, although the names aren't that great. If anyone has a suggestion for a different mechanism or better names, please let me know.

Gabe

Gabe Black wrote:

There is no limit on what you can do combinationally. The problem with making every branch go to the ROM, or really the reason that doesn't actually buy us anything, is that the micropc changes all the time in the middle. If you made micropcs only relevant in the ROM, then you'd need the combinational microops to be fed in a strict sequence out of the macroop. If the decoder keeps track of that, it needs a way to tell the macroop what's next and we've reintroduced a micropc. If the macroop itself keeps track of that, it's not a static StaticInst anymore. Then at some point you'd still need to switch the state from combinational to ROM based when you'd reached a branch, so you'd still need to mark that microop specially and you'd still need to keep some state and a transition mechanism in the CPU.

You're right about making everything combinational. I had planned on supporting a ROM from the beginning and built support into the microassembler for it, but in the interests of getting running microcode I just made everything combinational to get it working. I will definitely move the more complex instructions into the ROM, hopefully making all branches into the ROM so that there's only one version of branch. That would just simplify the implementation of my microops, though, and wouldn't really do anything at the level of the decoder or CPU.

Gabe
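One possible reading of the fromRom / nextFromRom idea is sketched below in Python. This is a guess at the mechanism, not the M5 implementation: the microop dictionaries, the `rom_target` and `last` keys, and the `run` function are hypothetical, and only unconditional branches from the macroop into the ROM are modeled. The sequencer keeps a flag saying where the current microop came from and computes the flag for the next one as it goes.

```python
def run(macroop, rom):
    """Return a trace of (from_rom, micropc, name) for each microop executed."""
    from_rom = False  # the first microop always comes from the macroop
    upc = 0
    trace = []
    while True:
        source = rom if from_rom else macroop
        op = source[upc]
        trace.append((from_rom, upc, op["name"]))
        if op.get("last"):
            # End of the macroop: the next instruction starts outside the ROM.
            return trace
        target = op.get("rom_target")
        if target is not None:
            # A microbranch: the next microop is fetched from the ROM.
            next_from_rom = True
            upc = target
        else:
            # Fall through within whichever store we are already reading.
            next_from_rom = from_rom
            upc += 1
        from_rom = next_from_rom


macroop = [
    {"name": "m0"},
    {"name": "m1", "rom_target": 2},
]
rom = [
    {"name": "r0", "last": True},
    {"name": "r1", "last": True},
    {"name": "r2"},
    {"name": "r3", "last": True},
]
trace = run(macroop, rom)
```

In this sketch the switch back out of the ROM is implicit in the `last` marker; a fuller version would also need conditional branches and some way to return into a macroop, if that is ever wanted.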