Re: [m5-dev] local APIC timer and bus frequency
To follow up on this, I'm using the CPU frequency divided by 16 as the bus frequency used by the local APIC. If anybody thinks that's unreasonable, let me know.

Gabe

Steve Reinhardt wrote:
> On Thu, May 22, 2008 at 11:45 AM, Gabe Black [EMAIL PROTECTED] wrote:
> > One problem is that this isn't a parent/child relationship. For instance, there could be four CPUs and four local APICs all on the same bus, and they need to know which one goes with which CPU. In that case it would be arbitrary.
>
> [...]
>
> > Also, all the local APICs appear at the same address but are only visible to their CPU. I was going to try to deal with that by sticking them into their own address space multiplexed on some address bits, but I don't like the scalability of that.
>
> Yeah, that's the kind of thing I meant by fiddling with address mappings. I wouldn't disagree that it's a bit hackish, but it seems plenty scalable to me. If you do that, it also solves the all-the-APICs-on-one-local-bus problem.
>
> > Another idea is a bus local to the CPU which has the CPU and APIC on it. That would have to connect to the external interconnect somehow. That could also be a home for the page table walker and TLB eventually if those go off chip and into the memory system.
>
> The tricky part here is doing that without adding latency to non-APIC memory accesses (and appropriately forwarding flow-control state through from the cache, etc.). It's probably doable, but it would be nicer to have a more general solution that didn't require this (IMO).
>
> > If the interconnect isn't a bus, then you can't just stick the APICs anywhere since they're supposed to be tightly integrated on die. You'd need to make sure they ended up very close to their CPUs, and having that over a network hop would be substantially unrealistic.
>
> Don't forget that part of the point of m5 is to be able to model things that are substantially unrealistic, like putting a PCI NIC next to the L2 cache :-). I think a solution that lets you put the APIC wherever you want is a good idea, even if most of those places are obviously stupid.
>
> > I don't think it matters much whether the CPU can check the interconnect frequency, although it probably can through some mechanism somewhere like CPUID or the stuff the processor sticks into PCI config space according to the AMD kernel/BIOS developer's guide. The timer is being set up and used for something, and I would imagine whatever is using it might break if the frequency is way off. Then again, maybe not. I'd rather make it do what it's supposed to do than try to figure out and fix the problems it may cause later.
>
> I think the key is separating the stuff that needs to be a certain way to get things to work from the stuff that is done a certain way just because that's easy/traditional/whatever. So clearly the APIC timer needs to run at a frequency within some range, such that the CPU can figure out what that frequency is, but I'm guessing that as long as that frequency is reasonable it doesn't matter exactly what its relationship is to the interconnect. So just having a parameter that lets you set the frequency sounds fine to me; if in the longer term we find that there really is a need to have it tied to the interconnect frequency, we can always set that up in the Python.
>
> Steve

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
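As a rough illustration of the arithmetic being discussed, here is a minimal Python sketch of deriving the local APIC timer's tick period from the CPU clock. The divide-by-16 ratio is the value Gabe proposes above, and making it a parameter follows Steve's suggestion. The function names, the 2 GHz example clock, and the extra `timer_divide` factor (standing in for any divider the APIC applies itself) are assumptions for illustration only; none of this is m5 code.

```python
from fractions import Fraction


def apic_timer_period_ps(cpu_freq_hz, bus_divisor=16, timer_divide=1):
    """Return the APIC timer tick period in picoseconds as an exact Fraction.

    bus_divisor: CPU clock / bus clock ratio (16 is the value proposed above).
    timer_divide: hypothetical extra divider applied by the APIC itself.
    """
    if cpu_freq_hz <= 0 or bus_divisor <= 0 or timer_divide <= 0:
        raise ValueError("frequencies and divisors must be positive")
    timer_freq_hz = Fraction(cpu_freq_hz, bus_divisor * timer_divide)
    return Fraction(10**12) / timer_freq_hz


def time_until_expiry_ps(initial_count, cpu_freq_hz, bus_divisor=16, timer_divide=1):
    """Picoseconds until a countdown of initial_count timer ticks reaches zero."""
    return initial_count * apic_timer_period_ps(cpu_freq_hz, bus_divisor, timer_divide)


if __name__ == "__main__":
    # Assumed example: a 2 GHz CPU with the /16 ratio gives a 125 MHz timer clock.
    print(apic_timer_period_ps(2_000_000_000))
    print(time_until_expiry_ps(1000, 2_000_000_000))
```

Because the ratio is just a parameter here, tying it to a real interconnect clock later would only change how the caller computes `bus_divisor`.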
Re: [m5-dev] local APIC timer and bus frequency
For now, I'm going to make the miscregfile own the event and cause an interrupt when the timer goes off. This really sounds crappy to me, but otherwise I'd have to add new functions to get at the interrupt object, like there are for the TLB. That wouldn't be hard, but I wanted to point out that I'd be adding a new function all over the place in the CPU before I went and did it, in case that doesn't sound like a great idea. I personally think there are too many functions in too many places in the CPU, which is probably unavoidable, but I'd prefer not to add another one.

Also, while I was deciding what to do about the local APIC, I tried to boil down what exactly an ISA is in M5, what it should and shouldn't be doing, and whether that suggested anything we should be doing differently. That's basically what follows, so if you don't care, go ahead and stop reading.

Basically, the ISA is a set of policies and a collection of state which directs them. The policies are layered on top of and direct the mechanisms of the CPU. This is different from how I think the ISAs have been built up to this point, which was as a collection of objects which implemented the ISA's policies and which were plugged into the CPU. Essentially, the difference is that there would be a base TLB class which would contain the behavior of all TLBs (they translate addresses, they have entries, they evict things, blah blah), and the ISA would describe the policies that behavior happened under and what state (the entries and control registers) guided it. The TLB would call into the ISA to make decisions, rather than the ISA defining a TLB which would automatically make the right decisions.

Another important distinction here is that the state which controls the policies of the ISA, most of the MiscRegs, is part of the ISA, but the other registers are not. The policy directing how those registers are used, namely how they're indexed, is, but the actual registers themselves are not.
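To make the "TLB calls into the ISA" idea concrete, here is a minimal Python sketch under stated assumptions. `IsaPolicy`, `ToyIsa`, `BaseTLB`, the single "writable" flag, and the oldest-first eviction are all hypothetical names and choices invented for this illustration; they do not correspond to actual M5 classes.

```python
class IsaPolicy:
    """Hypothetical ISA policy interface: decisions only, no TLB mechanics."""

    def page_shift(self):
        raise NotImplementedError

    def may_access(self, entry_flags, is_write):
        raise NotImplementedError


class ToyIsa(IsaPolicy):
    """Made-up ISA with 4 KiB pages and a single 'writable' flag bit."""

    WRITABLE = 0x1

    def page_shift(self):
        return 12

    def may_access(self, entry_flags, is_write):
        return (not is_write) or bool(entry_flags & self.WRITABLE)


class BaseTLB:
    """Generic mechanism: holds entries, evicts, and asks the ISA what to do."""

    def __init__(self, isa, capacity=4):
        self.isa = isa
        self.capacity = capacity
        self.entries = {}  # virtual page number -> (physical page number, flags)

    def insert(self, vpn, ppn, flags):
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            # Evict the oldest inserted entry (dicts keep insertion order).
            del self.entries[next(iter(self.entries))]
        self.entries[vpn] = (ppn, flags)

    def translate(self, vaddr, is_write=False):
        shift = self.isa.page_shift()
        vpn, offset = vaddr >> shift, vaddr & ((1 << shift) - 1)
        if vpn not in self.entries:
            raise LookupError("TLB miss")
        ppn, flags = self.entries[vpn]
        if not self.isa.may_access(flags, is_write):
            raise PermissionError("access denied by ISA policy")
        return (ppn << shift) | offset
```

The TLB class knows nothing about page sizes or permission bits; swapping in a different `IsaPolicy` changes its behavior without touching the mechanism, which is the split described above.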
Also, the way they're currently indexed, as integer/floating point/misc, should really be set up and managed by the ISA and not the CPU. The CPU would probably want to know what type a register is so it can have its own policies layered on top, like, for instance, separate pools of resources for ints and floats, but it shouldn't force the ISA to glob those together. Even that might be too much, because at least for x86 there are registers which can be both floating point and integer depending on the instruction that uses them.

So really, the CPU should just expect some number of local storage banks. In x86, those could correspond to the general (ha) purpose registers, x87/MMX registers, XMM registers, and control register backing storage. The control register backing storage could be broken down further into the CR control registers, debug registers, performance counter registers, the LAPIC control registers, the MSRs, etc. Like now, the CPU would be responsible for providing the ISA the illusion that that was how things worked, but O3, for example, could keep track of the actual storage however it liked.

Then, the ISA-defined control register file(s) really are just register file(s) which have their accesses intercepted and acted on in some way. They could conceptually live in the TLB, Interrupt object, whatever, with the CPU actually keeping track of the state, i.e. what a read would return. The objects the ISA controls would be able to keep their own local cached versions of the control state in whatever way was convenient, for instance as a lookup table for register windows. The objects would need to be able to refresh and reread the underlying control state to update their caching data structures in cases where the underlying storage was updated directly, like what NoEffect does right now. This takes care of some odd circumstances where you can have trouble bootstrapping the control state in, for instance, SE mode.
You have to make sure everything actually uses the control state you're writing, but it has to do that when not all of the state has actually been set. Really, all that needs to be refreshed is the control state which is part of the ISA, not of the objects in the CPU. In that way, you don't have to refresh the TLBs, blah blah. You only have to make the ISA configure itself based on the (to the CPU) generic array of values stored in some of the register files.

There also would be objects with state and policy which are dynamically generated, like faults and instructions. The faults are policy with maybe a little bubble of controlling state in them, and the instructions are policy (how to execute) strapped onto generic objects.

So now, all the policies and state particular to the ISA have been localized into one place, an abstract concept called the ISA. That might as well be an object since it has methods and fields just like an object would. Now that the ISA is an object, which might as well be
Re: [m5-dev] local APIC timer and bus frequency
Oh, and one thing I forgot: registers can be like faults, where they're little islands of the ISA. They know how to translate indexes, and they could use bitunions, which, if it works (I don't remember if it does), could be inherited from (or inherit, with some modifications) to be able to pull out the bitfields easily while having methods and whatnot.

I keep thinking about how to get the parser to figure out inputs and outputs better, but really I'm thinking that might be better done explicitly and then propagated using the formats. If the parser goes from treating the ISA description as input to offering the same functionality as a Python library that goes into x86.py (for instance), which I'd like to do someday as well, it could be based on Python classes and inheritance. I think generally if you have a class of instructions like IntOps, you know there will be Ra and Rb as inputs and Rc as an output all the time, and if not you did something wrong. Anyway, that's an entirely different discussion.

Gabe

Gabe Black wrote:
> For now, I'm going to make the miscregfile own the event and cause an interrupt when the timer goes off. This really sounds crappy to me, but otherwise I'd have to add new functions to get at the interrupt object, like there are for the TLB. That wouldn't be hard, but I wanted to point out that I'd be adding a new function all over the place in the CPU before I went and did it, in case that doesn't sound like a great idea. I personally think there are too many functions in too many places in the CPU, which is probably unavoidable, but I'd prefer not to add another one.
>
> Also, while I was deciding what to do about the local APIC, I tried to boil down what exactly an ISA is in M5, what it should and shouldn't be doing, and whether that suggested anything we should be doing differently. That's basically what follows, so if you don't care, go ahead and stop reading.
>
> Basically, the ISA is a set of policies and a collection of state which directs them.
>
> [...]
Re: [m5-dev] local APIC timer and bus frequency
Thanks for the email... can't say I really follow all the nuances after a quick read, but I'm glad you're thinking about it. Just a few comments off the top of my head:

The common indexing scheme across all register types is something we inherited from SimpleScalar. It's not ideal for actually indexing into register files, but the main benefit is that it gives a single flat namespace for tracking register dependencies, which simplifies things. (At least it simplified the old FullCPU model; I assume it helps in O3 as well.)

While I think doing heterogeneous ISAs could be useful, I don't want to lose performance for it, especially in the common homogeneous case. My thought was that we'd always keep the ISA as a static parameter, but then compile and link in multiple instantiations of the CPU model for each ISA we cared about.

I also don't think the ISA itself should have any state. CPUs or other components can have ISA-specific state, and the ISA can certainly have a lot of constants associated with it, but there shouldn't be any dynamic state associated solely with the ISA. Thus I don't see where making it an object has any advantages over the status quo.

Steve

On Sat, May 24, 2008 at 12:01 AM, Gabe Black [EMAIL PROTECTED] wrote:
> For now, I'm going to make the miscregfile own the event and cause an interrupt when the timer goes off. This really sounds crappy to me, but otherwise I'd have to add new functions to get at the interrupt object, like there are for the TLB. That wouldn't be hard, but I wanted to point out that I'd be adding a new function all over the place in the CPU before I went and did it, in case that doesn't sound like a great idea. I personally think there are too many functions in too many places in the CPU, which is probably unavoidable, but I'd prefer not to add another one.
>
> [...]
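The two views in this exchange, ISA-defined storage banks and a single flat namespace for dependency tracking, are not mutually exclusive. The Python sketch below shows one way they could coexist, assuming the ISA lists its banks in a fixed order. The class name `RegisterBanks`, the bank names, and the bank sizes are all made up for illustration and are not taken from M5.

```python
from itertools import accumulate


class RegisterBanks:
    """Hypothetical ISA-defined banks exposed through one flat index space."""

    def __init__(self, banks):
        # banks: list of (name, size) pairs in a fixed order chosen by the ISA.
        self.names = [name for name, _ in banks]
        self.sizes = [size for _, size in banks]
        # Starting flat index of each bank: 0, size0, size0 + size1, ...
        self.bases = [0] + list(accumulate(self.sizes))[:-1]

    def flatten(self, bank, index):
        """Map (bank name, index within bank) to a flat dependency-tracking index."""
        b = self.names.index(bank)
        if not 0 <= index < self.sizes[b]:
            raise IndexError(f"{bank}[{index}] is out of range")
        return self.bases[b] + index

    def unflatten(self, flat):
        """Inverse of flatten: recover (bank name, index) from a flat index."""
        for name, base, size in zip(self.names, self.bases, self.sizes):
            if base <= flat < base + size:
                return name, flat - base
        raise IndexError(f"flat index {flat} is out of range")


# Assumed example sizes, loosely following the x86 banks named in the thread.
x86_banks = RegisterBanks([("gpr", 16), ("x87_mmx", 8), ("xmm", 16), ("control", 32)])
```

A CPU model could track dependencies on the flat index while the ISA continues to think in terms of its own banks; whether the translation cost is acceptable in a real simulator is a separate question.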
Re: [m5-dev] local APIC timer and bus frequency
On Wed, May 21, 2008 at 5:44 PM, Gabe Black [EMAIL PROTECTED] wrote:
> The kernel is now getting to a point where it's trying to calibrate the timer in the local APIC against the TSC register. In order to mimic that, I'm going to need to create an event to fire when the timer is supposed to go off. This is enough of an impetus to separate the local APIC into its own device on the bus just outside of the CPU. That means I need to solve some issues I've been putting off, namely making sure there's exactly one local APIC per CPU and that they know about each other. One topology is that the local APIC acts as an intermediary between the CPU and the interconnect, and the other is with the APIC as a peer. I don't think the latter will work so well with non-bus interconnects.

Why do you say that? Is there some aspect to how it's addressed that makes it different from a regular memory-mapped device? (In particular, some aspect that's not easily worked around by fiddling with address mappings?)

> Also, each APIC has to know what CPU it's associated with so it can return the right ID number and so it has a pointer to the CPU it interrupts.

Seems like that should be relatively easy to take care of in Python with a Parent.any parameter.

> Also, the APIC needs to know the frequency of the interconnect it's connected to, since it runs its timer off of a divided version of that clock. What do people think?

Is it really architecturally visible that the timer's clock is related to the interconnect clock?

Steve
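The requirement that there be exactly one local APIC per CPU, and that each knows its partner, can be sketched as follows. This is plain Python rather than m5's configuration API, so it does not show how a `Parent.any` parameter would behave; the class names, `build_system`, and the choice to make the APIC ID mirror the CPU ID are assumptions for illustration.

```python
class Cpu:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.pending = []  # interrupt vectors posted by the local APIC

    def post_interrupt(self, vector):
        self.pending.append(vector)


class LocalApic:
    def __init__(self, cpu):
        self.cpu = cpu  # the single CPU this APIC interrupts

    @property
    def apic_id(self):
        # Assumption for this sketch: the APIC ID simply mirrors the CPU ID.
        return self.cpu.cpu_id

    def timer_expired(self, vector):
        self.cpu.post_interrupt(vector)


def build_system(num_cpus):
    """Create CPUs and exactly one local APIC per CPU, paired explicitly."""
    cpus = [Cpu(i) for i in range(num_cpus)]
    apics = [LocalApic(cpu) for cpu in cpus]
    return cpus, apics
```

Pairing explicitly at construction time sidesteps any ambiguity about which APIC belongs to which CPU when several of each sit on the same bus.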
Re: [m5-dev] local APIC timer and bus frequency
One problem is that this isn't a parent/child relationship. For instance, there could be four CPUs and four local APICs all on the same bus, and they need to know which one goes with which CPU. In that case it would be arbitrary. If the interconnect isn't a bus, then you can't just stick the APICs anywhere since they're supposed to be tightly integrated on die. You'd need to make sure they ended up very close to their CPUs, and having that over a network hop would be substantially unrealistic.

Also, all the local APICs appear at the same address but are only visible to their CPU. I was going to try to deal with that by sticking them into their own address space multiplexed on some address bits, but I don't like the scalability of that. Another idea is a bus local to the CPU which has the CPU and APIC on it. That would have to connect to the external interconnect somehow. That could also be a home for the page table walker and TLB eventually if those go off chip and into the memory system.

I don't think it matters much whether the CPU can check the interconnect frequency, although it probably can through some mechanism somewhere like CPUID or the stuff the processor sticks into PCI config space according to the AMD kernel/BIOS developer's guide. The timer is being set up and used for something, and I would imagine whatever is using it might break if the frequency is way off. Then again, maybe not. I'd rather make it do what it's supposed to do than try to figure out and fix the problems it may cause later.

Gabe

Steve Reinhardt wrote:
> On Wed, May 21, 2008 at 5:44 PM, Gabe Black [EMAIL PROTECTED] wrote:
> > The kernel is now getting to a point where it's trying to calibrate the timer in the local APIC against the TSC register. In order to mimic that, I'm going to need to create an event to fire when the timer is supposed to go off. This is enough of an impetus to separate the local APIC into its own device on the bus just outside of the CPU.
>
> [...]
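The "own address space multiplexed on some address bits" idea can be sketched as a small address-rewriting function. `APIC_BASE` is the default x86 local APIC base and the register window is one 4 KiB page; everything else here (the `route` name, `APIC_SPACE_BASE`, and the layout that places each CPU's window at a CPU-indexed offset) is an assumption for illustration, not how M5 does it.

```python
APIC_BASE = 0xFEE00000       # default x86 local APIC base address
APIC_SIZE = 0x1000           # one 4 KiB page of APIC registers
APIC_SPACE_BASE = 1 << 61    # assumed private region outside normal physical memory


def route(cpu_id, paddr):
    """Rewrite an access to the shared APIC window into a per-CPU private address.

    Every CPU issues the same address for its local APIC; the CPU ID selects
    which private page the access lands on. Other addresses pass through.
    """
    if cpu_id < 0:
        raise ValueError("cpu_id must be non-negative")
    if APIC_BASE <= paddr < APIC_BASE + APIC_SIZE:
        offset = paddr - APIC_BASE
        return APIC_SPACE_BASE + cpu_id * APIC_SIZE + offset
    return paddr
```

With this layout the CPU ID occupies the address bits just above the page offset, so the number of CPUs is limited only by the size of the private region. Whether that counts as scalable enough is the point Gabe and Steve disagree on.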