[m5-dev] IsOn using a vector
I just finished stepping through some code having to do with PCs in the simple CPU, and I noticed that not printing DPRINTFs is actually a fairly involved process, considering that you're not actually doing anything. Part of the issue, I think, is that whether or not a traceflag is on is stored in a vector of bools. Since the size of the vector won't change often (ever?), would it make sense to just make it a char [] and use something like the following?

flags[t >> 3] & (1 << (t & 3))

I realize that when you've got tracing on you're not going for blazing speed in the first place, but if it's easy to tighten it up a bit, that's probably a good idea. The other possibility is that it's actually not doing a whole lot, but calling through a bunch of functions, which gdb stops at one at a time. That would look like a lot of work to someone stepping through with gdb but could be just as fast.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] IsOn using a vector
You're talking about replacing return flags[t]; with a space-optimized bit vector? I imagine it would help performance some, if for no other reason than that the trace flags would fit in a single cache block rather than spanning multiple as they do now.

Ali

On Sep 23, 2008, at 2:42 AM, Gabe Black wrote:
[earlier message quoted in full; snipped]
Re: [m5-dev] IsOn using a vector
I think I probably started using vector<bool> but wound up using bitset eventually. Likely anything I used vector<bool> for could be easily cleaned up to use bitset instead. It's probably not going to make a difference performance-wise where I used it, but at least it would be more consistent.

Regarding Gabe's question, this probably doesn't help clear anything up, but from http://www.sgi.com/tech/stl/bit_vector.html:

Description: A bit_vector is essentially a vector<bool>: it is a Sequence that has the same interface as vector. The main difference is that bit_vector is optimized for space efficiency. A vector always requires at least one byte per element, but a bit_vector only requires one bit per element. *Warning*: The name bit_vector will be removed in a future release of the STL. The only reason that bit_vector is a separate class, instead of a template specialization of vector<bool>, is that this would require partial specialization of templates. On compilers that support partial specialization, bit_vector is a specialization of vector<bool>. The name bit_vector is a typedef. This typedef is not defined in the C++ standard, and is retained only for backward compatibility.

[EMAIL PROTECTED] wrote:
I generally agree, except that I'd be a bit surprised (but not shocked) if vector<bool> was specialized to use bits. I'll check into this more at some point, but my vote would be to go with bitset, since it sounds like that would get us where we'd want to be.

Gabe

Quoting Steve Reinhardt [EMAIL PROTECTED]:
I thought vector<bool> was supposed to be a space-optimized bit vector. I think there are two things going on here:

1. It's an STL type, which means that the implementation is probably a nightmare of layered abstractions that, in theory, the compiler can figure out and flatten to an efficient piece of code in the common case. If you're single-stepping through the debug version, I'm not surprised that it's a mess, but I also would not be surprised if in the opt version it boils down to something roughly equivalent to the optimized code you're proposing.

2. We probably should be using std::bitset rather than std::vector<bool> when possible... the former should be faster since it's non-resizable and thus might have less bounds-checking code. I've been trying to use bitset for this type of thing everywhere I can in m5 (packet flags), and it looks like Kevin is using it in a few places in o3 (though he uses vector<bool> also... not sure if that's intentional). Plus, even if vector<bool> is no longer space-optimized, I'm sure that bitset is.

In any case, we definitely don't want to write a one-off piece of code just for trace flags. If someone can absolutely prove that neither vector<bool> nor bitset is adequate when compiled with optimization, we can consider writing a replacement class for all our bit vectors, but I highly doubt that this is the case.

Steve

On Tue, Sep 23, 2008 at 11:33 AM, Ali Saidi [EMAIL PROTECTED] wrote:
[earlier messages quoted in full; snipped]
Re: [m5-dev] IsOn using a vector
I think vector<bool> is indeed a vector of bools, because &x[0] is supposed to return a bool * that you can mess with. I think that's why bitset exists. I'd be happy to see us move to bitset. The size of the bitset is easy to get since the code is autogenerated anyway.

Nate

On Tue, Sep 23, 2008 at 2:16 PM, [EMAIL PROTECTED] wrote:
[earlier messages quoted in full; snipped]
Re: [m5-dev] ISA specific decoder state
Unfortunately, there's still one missing component here. If you switch to the ROM and change the micropc, you may just be starting up the part of the macroop stored in the ROM, or you may be, for instance, trying to enter an interrupt handler. In the former case you want a macroop and in the latter you don't, but it isn't clear how to differentiate between the two. Most of the time the problem isn't that big of a deal, because you can just ignore the macroop, but in cases where the macroop doesn't work at all, like on a page fault in instruction memory, that can effectively block an interrupt from happening. One possible solution is to keep the macroop around until you know for sure you've finished it, or until the thread context tells you to artificially kick it out with some sort of function. A possible name is endMacroop.

Gabe

Quoting Gabe Black [EMAIL PROTECTED]:
I've considered it, and I think this is a good solution. I'm going to pass the current macroop into the ROM when getting microops, and then, when this gets to o3 at some point, keep the current macroop in the DynamicInst.

Gabe

[EMAIL PROTECTED] wrote:
I'll have to think about it more, but at first pass that sounds pretty good. It might also help generate better/more descriptive disassembly, which is something I've wanted to do for a while.

Gabe

Quoting Steve Reinhardt [EMAIL PROTECTED]:
How about having a current macroop pointer in the execution context (so there would be a single pointer for SimpleCPU, and it would live in the DynamicInst object in O3)?

On Sun, Sep 21, 2008 at 11:25 PM, Gabe Black [EMAIL PROTECTED] wrote:
To break this into more digestible chunks: I need a way to get important information from the most recent macroop to the ROM so it can tune its microops accordingly. I'd like to have two different types of regions in the ROM: those that are extensions of combinational macroops and need the specifics of the original macroop to do their job, and those that use a predefined set of parameters so they always work in predictable ways (and don't require a macroop).

When the ROM was just an abstraction internal to the macroops, really just a special range of micropcs, the macroop itself could ferry that information along when it got microops from the ROM to give to the decoder. Now that the decoder gets the microops directly from the ROM, the information needs a new way to get there. One possibility is that the decoder exposes the last macroop to the ROM, so the ROM can pull out what it needs itself. The other is that the macroop sets up state maintained in the decoder, which is exposed to the ROM for the same purpose. Exposing the macroop would probably be easier, but then it's easier to do something bad if there's no macroop but the ROM expects one. Liberal application of assert(ptr != NULL) would probably help mitigate that.

One additional complication is if you mispredict or otherwise need to reset to a particular microop in the ROM. It could be hard to figure out whether you need to get a macroop so the ROM can get the state it needs, or whether the PC is nonsense and the ROM doesn't need the macroop. In this case, if the macroop is needed, fetch/decode should generate it, and if it causes a fault the fault should be handled. If the macroop isn't needed, fetch/decode shouldn't try to get it, and if they do and a fault happens, the fault should be ignored. I'm not sure what the best way to differentiate these is. One option would be to add more state which says whether or not the microops are stand-alone and not intended to be part of a macroop. I feel like the decoder might be trying to keep track of too many things already, so some way to work that in there without just layering it on top would be best.

Gabe

Gabe Black wrote:
I'm getting pretty close to starting on the interrupt-entering microcode, but one annoying issue I'm not sure how to deal with is passing an environment to the microops in the ROM. For the interrupt code it's not as important, because for a particular mode the code should always behave the same way, generally outside the scope of any instruction. In the case of macroops being partially stored in the ROM, it matters more, since the registers used, the various widths involved, etc. come into play. The ROM itself, as it stands, spits out instructions which correspond to a particular micropc but aren't otherwise specialized. If this were just x86, the obvious/easy solution would be to pass the emulation environment and extended machine instruction being used with the current macroop through to the ROM as the instructions are generated. Since it isn't, I need some way to get that information, whose existence is unknown to the decoder, out of the macroop, and then pass it into the ROM when a microop is requested. Any ideas?
[m5-dev] taking out RegContext code
I'd like to take out the old RegContext code which is still floating around from my original implementation of register windows. Nothing calls it, and the way things are set up right now, specifically that o3 doesn't use the ISA-defined int and float reg files, it's fundamentally broken. I'm having visitors until early next week, so if no one has protested by then, I'll assume no one will miss it and get rid of it.

Gabe
Re: [m5-dev] ISA specific decoder state
On Sep 23, 2008, at 9:28 PM, Steve Reinhardt wrote:
I believe it's impossible to fault on a microop in the middle of a macroop and then resume where you left off in the middle of the macroop flow. If you fault, you will roll back to the state at the beginning of the macroop, and then on resuming from the fault you either have to re-execute the macroop from the start or skip it entirely. So it seems like the two cases where you're in the middle of a macroop flow and you end the macroop are either (1) the final, specially marked microop of the normal flow hits retire, or (2) a microop causes a fault, the whole macroop is discarded, and the fault handler is invoked. Interrupts don't count because you'd never take an interrupt in the middle of a macroop.

I think x86's string copy instruction is a case where you have to be able to take a fault (TLB) in the middle of the macroop and then resume. Otherwise it would be possible to prevent the machine from making forward progress with a string copy that spanned more pages than the TLB holds.

Ali