Re: [m5-dev] local APIC timer and bus frequency

2008-06-07 Thread Gabe Black
To follow up on this, I'm using the CPU frequency divided by 16 as the
bus frequency used by the local APIC. If anybody thinks that's
unreasonable let me know.
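
As a rough illustration of what the divide-by-16 choice implies, here is a minimal sketch of the arithmetic. The function names and the timer_divide parameter are made up for this example and are not taken from the M5 source.

```python
from fractions import Fraction

# From the message above: APIC bus clock = CPU clock / 16.
APIC_BUS_DIVIDER = 16


def apic_bus_frequency_hz(cpu_frequency_hz: int) -> Fraction:
    """Bus frequency seen by the local APIC, kept exact as a fraction."""
    return Fraction(cpu_frequency_hz, APIC_BUS_DIVIDER)


def timer_expiry_cpu_cycles(initial_count: int, timer_divide: int = 1) -> int:
    """CPU cycles until a timer loaded with initial_count reaches zero.

    timer_divide is a hypothetical stand-in for the APIC's own divide
    setting: the timer decrements once every timer_divide bus cycles.
    """
    return initial_count * timer_divide * APIC_BUS_DIVIDER
```

With a 2 GHz CPU this gives a 125 MHz APIC bus clock, and a timer loaded with 1000 expires after 16000 CPU cycles.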

Gabe

Steve Reinhardt wrote:
 
 
 On Thu, May 22, 2008 at 11:45 AM, Gabe Black wrote:
 
 One problem is that this isn't a parent/child relationship. For
 instance, there could be four CPUs and four local APICs all on the
 same bus, and they need to know which one goes with which CPU. In
 that case it would be arbitrary. [...] Also, all the local APICs
 appear at the same address but are only visible to their CPU. I was
 going to try to deal with that by sticking them into their own
 address space multiplexed on some address bits, but I don't like the
 scalability of that.
 
 
 Yeah, that's the kind of thing I meant by fiddling with address
 mappings.  I wouldn't disagree that it's a bit hackish, but it seems
 plenty scalable to me.  If you do that, it also solves the problem of
 all the APICs sitting on one local bus.
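
 One way to picture the address-mapping idea is the sketch below. It assumes the conventional x86 local APIC base of 0xFEE00000; the choice of bit 40 for the CPU ID and the function names are purely hypothetical.

```python
APIC_BASE = 0xFEE00000  # conventional x86 local APIC base address
APIC_SIZE = 0x1000      # one 4 KiB page of APIC registers
CPU_ID_SHIFT = 40       # hypothetical: CPU ID is placed in high address bits


def to_private_address(cpu_id: int, addr: int) -> int:
    """Rewrite an access to the shared APIC window into a per-CPU address.

    Accesses outside the APIC window pass through unchanged.
    """
    if APIC_BASE <= addr < APIC_BASE + APIC_SIZE:
        return addr | (cpu_id << CPU_ID_SHIFT)
    return addr


def owner_of(private_addr: int) -> int:
    """Recover the CPU ID encoded in a remapped APIC address."""
    return private_addr >> CPU_ID_SHIFT
```

 Every CPU still issues accesses to the same architectural address, but each one lands in a distinct range on the interconnect, so the APICs can all sit on one bus.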
  
 
 Another idea is a bus local to the CPU which has the CPU and APIC on
 it. That would have to connect to the external interconnect somehow.
 That could also be a home for the page table walker and TLB
 eventually if those go off chip and into the memory system.
 
 
 The tricky part here is doing that without adding latency to non-APIC
 memory accesses (and appropriately forwarding flow-control state through
 from the cache, etc.).  It's probably doable but it would be nicer to
 have a more general solution that didn't require this (IMO).
 
 If the interconnect isn't a bus, then you can't just stick the APICs
 anywhere since they're supposed to be tightly integrated on die.
 You'd need to make sure they ended up very close to their CPUs, and
 having that over a network hop would be substantially unrealistic.
 
 
 Don't forget that part of the point of m5 is to be able to model things
 that are substantially unrealistic, like putting a PCI NIC next to the
 L2 cache :-).  I think a solution that lets you put the APIC wherever
 you want is a good idea, even if most of those places are obviously stupid.
  
 
 I don't think it matters much whether the CPU can check the
 interconnect frequency, although it probably can through some
 mechanism somewhere like CPUID or the stuff the processor sticks
 into PCI config space according to the AMD BIOS and Kernel
 Developer's Guide. The timer is being set up and used for something,
 and I would imagine whatever is using it might break if the frequency
 is way off. Then again, maybe not. I'd rather make it do what it's
 supposed to do than try to figure out and fix the problems it may
 cause later.
 
 
 I think the key is separating the stuff that needs to be a certain way
 to get things to work from the stuff that is done a certain way just
 because that's easy/traditional/whatever.  So clearly the APIC timer
 needs to run at a frequency within some range, such that the CPU can
 figure out what that frequency is, but I'm guessing that as long as
 that frequency is reasonable it doesn't matter exactly what its
 relationship is to the interconnect.  So just having a parameter that
 lets you set the frequency sounds fine to me; if in the longer term we
 find that there really is a need to have it tied to the interconnect
 frequency we can always set that up in the Python.
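
 A minimal sketch of that kind of parameter, in plain Python rather than the actual M5 configuration classes. The class and field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApicTimerConfig:
    """Hypothetical configuration record for the APIC timer clock."""

    clock_hz: Optional[int] = None         # explicit timer frequency, if set
    interconnect_hz: Optional[int] = None  # used only when clock_hz is unset
    divider: int = 1                       # applied to interconnect_hz

    def resolved_clock_hz(self) -> int:
        """Prefer the explicit value, else derive it from the interconnect."""
        if self.clock_hz is not None:
            return self.clock_hz
        if self.interconnect_hz is None:
            raise ValueError("set clock_hz or interconnect_hz")
        return self.interconnect_hz // self.divider
```

 The explicit parameter is the simple case; the derived path is only there to show that tying the timer to the interconnect later would not need a different interface.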
 
 Steve
 
 
 
 
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev



Re: [m5-dev] local APIC timer and bus frequency

2008-05-24 Thread Gabe Black
For now, I'm going to make the miscregfile have the event and cause an 
interrupt when the timer goes off. This really sounds crappy to me, but 
I'd have to add new functions to get at the interrupt object like there 
are for the TLB. That wouldn't be hard, but I wanted to point out I'd be 
adding a new function all over the place in the CPU before I went and 
did it in case that doesn't sound like a great idea. I personally think 
there are too many functions in too many places in the CPU which is 
probably unavoidable, but I'd prefer not to add another one. Also, while 
I was deciding what to do about the local APIC, I tried to boil down 
what exactly an ISA is in M5, what it should and shouldn't be doing, and 
if that suggested anything we should be doing differently. That's 
basically what follows, so if you don't care go ahead and stop reading.



Basically, the ISA is a set of policies and a collection of state which
directs them. The policies are layered on top of and direct the
mechanisms of the CPU. This is different from how I think the ISAs have
been built up to this point, which was as a collection of objects which
implemented the ISA's policies and which were plugged into the CPU.
Essentially, the difference is that there would be a base TLB class
which would contain the behavior of all TLBs (they translate addresses,
they have entries, they evict things, blah blah), and the ISA would
describe the policies it operated under and what state, the entries
and control registers, guided it. The TLB would call into the ISA
to make decisions rather than the ISA defining a TLB which would
automatically make the right decisions.
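
To make the mechanism/policy split concrete, here is a small sketch of a generic TLB that asks an ISA policy object for its decisions. All class names, the FIFO eviction, and the single WRITABLE flag are assumptions for illustration, not existing M5 code.

```python
from collections import OrderedDict
from typing import Optional


class ToyIsaPolicy:
    """Hypothetical ISA policy: page size and permission rules."""

    page_shift = 12   # 4 KiB pages
    WRITABLE = 0x1    # made-up flag bit

    def may_access(self, flags: int, is_write: bool) -> bool:
        return (not is_write) or bool(flags & self.WRITABLE)


class BaseTlb:
    """Generic mechanism: holds entries, evicts, and translates."""

    def __init__(self, policy, capacity: int = 4):
        self.policy = policy
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> (ppn, flags)

    def insert(self, vpn: int, ppn: int, flags: int) -> None:
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[vpn] = (ppn, flags)

    def translate(self, vaddr: int, is_write: bool) -> Optional[int]:
        shift = self.policy.page_shift
        hit = self.entries.get(vaddr >> shift)
        if hit is None:
            return None  # miss; a real model would start a table walk
        ppn, flags = hit
        if not self.policy.may_access(flags, is_write):
            raise PermissionError("access denied by ISA policy")
        return (ppn << shift) | (vaddr & ((1 << shift) - 1))
```

The TLB never knows what the flag bits mean; it only asks the policy whether the access is allowed.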


Another important distinction here is that the state which controls the
policies of the ISA, most of the MiscRegs, is part of the ISA, but the
other registers are not. The policy directing how those registers are
used, namely how they're indexed, is, but the actual registers
themselves are not. Also, the way they're currently indexed, as
integer/floating point/misc, should really be set up and managed by the
ISA and not the CPU. The CPU would probably want to know what type a
register is so it can have its own policies layered on top, like for
instance separate pools of resources for ints and floats, but it
shouldn't force the ISA to glob those together. Even that might be too
much, because at least for x86 there are registers which can be both
floating point and integer depending on the instruction that uses them.
So really, the CPU should just expect some number of local storage
banks. In x86, those could correspond to the general (ha) purpose
registers, x87/mmx registers, xmm registers, and control register
backing storage. The control register backing storage could be broken
down further into the CR control registers, debug registers, performance
counter registers, the LAPIC control registers, the MSRs, etc, etc. Like
now, the CPU would be responsible for providing the ISA the illusion
that that was how things worked, but the o3, for example, could keep
track of the actual storage however it liked.
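
A rough sketch of the storage-bank idea: the ISA describes the banks and the CPU supplies untyped storage behind them. The names are hypothetical, and the size of the control bank is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bank:
    """ISA-side description of one bank of local storage."""

    name: str
    count: int


class BankedStorage:
    """CPU-side storage: just arrays of values, with no notion of type."""

    def __init__(self, banks):
        self._values = {bank.name: [0] * bank.count for bank in banks}

    def read(self, bank: str, index: int) -> int:
        return self._values[bank][index]

    def write(self, bank: str, index: int, value: int) -> None:
        self._values[bank][index] = value


# Illustrative x86-like layout; the control bank size is made up.
X86_LIKE_BANKS = [
    Bank("gpr", 16),
    Bank("x87_mmx", 8),
    Bank("xmm", 16),
    Bank("control", 32),
]
```

A simple CPU model could use BankedStorage directly, while something like o3 would keep the same read/write interface and rename behind it.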


Then, the ISA-defined control register file(s) really are just register
file(s) which have their accesses intercepted and acted on in some way.
They could conceptually live in the TLB, Interrupt object, whatever,
with the CPU actually keeping track of the state, aka what a read
would return. The objects the ISA controls would be able to keep their
own local cached versions of the control state in whatever way was
convenient, like for instance as a lookup table for register windows.
The objects would need to be able to refresh and reread the underlying
control state to update their caching data structures in cases where the
underlying storage was updated directly, like what NoEffect does right
now. This takes care of some odd circumstances where you can have
trouble bootstrapping the control state in, for instance, SE mode. You
have to make sure everything actually uses the control state you're
writing, but it has to do that when not all of the state has actually
been set. Really, all that needs to be refreshed is the control state
which is part of the ISA, not of the objects in the CPU. In that way,
you don't have to refresh the TLBs, blah blah. You only have to make the
ISA configure itself based on the (to the CPU) generic array of values
stored in some of the register files.
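
Here is one way the intercept-and-refresh scheme could look, as a sketch. The class names, the hook mechanism, and the register index are all assumptions; write_no_effect only mimics the spirit of the NoEffect accessors mentioned above.

```python
class ControlRegFile:
    """Plain backing storage whose writes can be intercepted by hooks."""

    def __init__(self, size: int):
        self._raw = [0] * size
        self._hooks = {}  # register index -> callable(value)

    def on_write(self, index: int, hook) -> None:
        self._hooks[index] = hook

    def read(self, index: int) -> int:
        return self._raw[index]

    def write(self, index: int, value: int) -> None:
        """Normal write: update storage, then let the owner react."""
        self._raw[index] = value
        hook = self._hooks.get(index)
        if hook is not None:
            hook(value)

    def write_no_effect(self, index: int, value: int) -> None:
        """Direct write: storage changes, cached copies are left stale."""
        self._raw[index] = value

    def refresh(self) -> None:
        """Replay current values so cached state matches the storage."""
        for index, hook in self._hooks.items():
            hook(self._raw[index])


class WindowUnit:
    """Hypothetical consumer that caches a derived register-window index."""

    CWP = 0  # made-up index of the window pointer register

    def __init__(self, regs: ControlRegFile, num_windows: int = 8):
        self.num_windows = num_windows
        self.current_window = 0
        regs.on_write(self.CWP, self._update)

    def _update(self, value: int) -> None:
        self.current_window = value % self.num_windows
```

When bootstrapping, the state can be filled in with write_no_effect in any order, followed by a single refresh() once everything is in place.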


There also would be objects with state and policy which are dynamically 
generated like faults and instructions. The faults are policy with maybe 
a little bubble of controlling state in them, and the instructions are 
policy (how to execute) strapped onto generic objects.


So now, all the policies and state particular to the ISA have been 
localized into one place, an abstract concept called the ISA. That 
might as well be an object since it has methods and fields just like an 
object would. Now that the ISA is an object, which might as well be 

Re: [m5-dev] local APIC timer and bus frequency

2008-05-24 Thread Gabe Black
Oh, and one thing I forgot: registers can be like faults, where they're
little islands of the ISA. They know how to translate indexes, and they
could use bitunions, which if it works (I don't remember if it does)
could be inherited from (or inherit, with some modifications) to be able
to pull out the bitfields easily while having methods and whatnot. I
keep thinking about how to get the parser to figure out inputs and
outputs better, but really I'm thinking that might be better done
explicitly and then propagated using the formats. If the parser stops
treating the ISA description as input and instead keeps the same
functionality but becomes a python library to go into x86.py (for
instance), which I'd like to do someday as well, it could be based on
python classes and inheritance. I think generally if you have a class of
instructions like IntOps, you know there will be Ra and Rb as inputs and
Rc as an output all the time, and if not you did something wrong. Anyway,
that's an entirely different discussion.
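
A sketch of what explicit inputs and outputs propagated through Python inheritance might look like. This is not the ISA parser's actual interface; the class and attribute names are invented.

```python
class Inst:
    """Base class: subclasses declare operand names explicitly."""

    sources = ()
    dests = ()

    def __init__(self, **operands):
        expected = set(self.sources) | set(self.dests)
        if set(operands) != expected:
            raise TypeError(
                f"{type(self).__name__} expects operands {sorted(expected)}"
            )
        self.operands = operands

    def source_regs(self):
        return [self.operands[name] for name in self.sources]

    def dest_regs(self):
        return [self.operands[name] for name in self.dests]


class IntOp(Inst):
    """Every integer op reads Ra and Rb and writes Rc."""

    sources = ("Ra", "Rb")
    dests = ("Rc",)


class Add(IntOp):
    """Inherits its operand list from IntOp without restating it."""
```

An instruction that doesn't supply exactly Ra, Rb and Rc fails at construction time, which is the "you did something wrong" case.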


Gabe


Re: [m5-dev] local APIC timer and bus frequency

2008-05-24 Thread Steve Reinhardt
Thanks for the email... can't say I really follow all the nuances
after a quick read, but I'm glad you're thinking about it.  Just a few
comments off the top of my head:

The common indexing scheme across all register types is something we
inherited from SimpleScalar.  It's not ideal for actually indexing
into register files, but the main benefit is that it gives a single
flat namespace for tracking register dependencies, which simplifies
things.  (At least it simplified the old FullCPU model; I assume it
helps in O3 as well.)
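
For anyone unfamiliar with the scheme, a flat dependency namespace can be sketched like this. The register class sizes are illustrative and do not correspond to any particular ISA or to the real M5 constants.

```python
# Illustrative register classes and sizes.
REG_CLASSES = (("int", 32), ("float", 32), ("misc", 64))


def build_bases(classes):
    """Assign each register class a contiguous range of flat indices."""
    bases = {}
    offset = 0
    for name, count in classes:
        bases[name] = (offset, count)
        offset += count
    return bases


BASES = build_bases(REG_CLASSES)


def flatten(reg_class: str, index: int) -> int:
    """Map (class, index) to a single flat dependency index."""
    base, count = BASES[reg_class]
    if not 0 <= index < count:
        raise IndexError(f"{reg_class} register {index} out of range")
    return base + index


def unflatten(flat: int):
    """Recover (class, index) from a flat dependency index."""
    for name, (base, count) in BASES.items():
        if base <= flat < base + count:
            return name, flat - base
    raise IndexError(f"flat index {flat} out of range")
```

Dependency tracking only ever sees the flat index, so one scoreboard or rename map covers every register type.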

While I think doing heterogeneous ISAs could be useful, I don't want
to lose performance for it, esp. in the common homogeneous case.  My
thought was that we'd always keep the ISA as a static parameter, but
then compile and link in multiple instantiations of the CPU model for
each ISA we cared about.

I also don't think the ISA itself should have any state... CPUs or
other components can have ISA-specific state, and the ISA can
certainly have a lot of constants associated with it, but there
shouldn't be any dynamic state associated solely with the ISA.  Thus I
don't see where making it an object has any advantages over the status
quo.

Steve



Re: [m5-dev] local APIC timer and bus frequency

2008-05-22 Thread Steve Reinhardt
On Wed, May 21, 2008 at 5:44 PM, Gabe Black wrote:

   The kernel is now getting to a point where it's trying to calibrate the
 timer in the local APIC against the TSC register. In order to mimic that,
 I'm going to need to create an event to fire when the timer is supposed to
 go off. This is enough of an impetus to separate the local APIC into its
 own device on the bus just outside of the CPU. That means I need to solve
 some issues I've been putting off, namely making sure there's exactly one
 local APIC per CPU and that they know about each other. One topology is that
 the local APIC acts as an intermediary between the CPU and the interconnect,
 and the other is with the APIC as a peer. I don't think the latter will work
 so well with non-bus interconnects.


Why do you say that?  Is there some aspect to how it's addressed that makes
it different from a regular memory-mapped device?  (In particular, some
aspect that's not easily worked around by fiddling with address mappings?)


 Also, each APIC has to know what CPU it's associated with so it can return
 the right ID number and to have a pointer to make it interrupt.


Seems like that should be relatively easy to take care of in python with a
Parent.any parameter.


 Also, the APIC needs to know what the frequency is of the interconnect it's
 connected to since it runs its timer off of a divided version of that
 clock. What do people think?


Is it really architecturally visible that the timer's clock is related to
the interconnect clock?

Steve


Re: [m5-dev] local APIC timer and bus frequency

2008-05-22 Thread Gabe Black
One problem is that this isn't a parent/child relationship. For
instance, there could be four CPUs and four local APICs all on the same
bus, and they need to know which one goes with which CPU. In that case
it would be arbitrary. If the interconnect isn't a bus, then you can't
just stick the APICs anywhere since they're supposed to be tightly
integrated on die. You'd need to make sure they ended up very close to
their CPUs, and having that over a network hop would be substantially
unrealistic. Also, all the local APICs appear at the same address but
are only visible to their CPU. I was going to try to deal with that by
sticking them into their own address space multiplexed on some address
bits, but I don't like the scalability of that. Another idea is a bus
local to the CPU which has the CPU and APIC on it. That would have to
connect to the external interconnect somehow. That could also be a home
for the page table walker and TLB eventually if those go off chip and
into the memory system. I don't think it matters much whether the CPU
can check the interconnect frequency, although it probably can through
some mechanism somewhere like CPUID or the stuff the processor sticks
into PCI config space according to the AMD BIOS and Kernel Developer's
Guide. The timer is being set up and used for something, and I would
imagine whatever is using it might break if the frequency is way off.
Then again, maybe not. I'd rather make it do what it's supposed to do
than try to figure out and fix the problems it may cause later.
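
To spell out the pairing requirement, here is a sketch that wires each APIC to its CPU explicitly instead of inferring the match from bus topology. The classes are stand-ins invented for this example, not the real M5 objects.

```python
class Cpu:
    """Stand-in CPU that just records posted interrupt vectors."""

    def __init__(self, cpu_id: int):
        self.cpu_id = cpu_id
        self.pending = []

    def post_interrupt(self, vector: int) -> None:
        self.pending.append(vector)


class LocalApic:
    """Stand-in local APIC bound to exactly one CPU."""

    def __init__(self, cpu: Cpu):
        self.cpu = cpu

    @property
    def apic_id(self) -> int:
        # The ID the APIC reports comes from the CPU it is paired with.
        return self.cpu.cpu_id

    def timer_expired(self, vector: int) -> None:
        # The pointer back to the CPU is what lets the timer interrupt it.
        self.cpu.post_interrupt(vector)


def pair_apics(cpus):
    """Create one APIC per CPU, keyed by CPU ID; reject duplicate IDs."""
    ids = [cpu.cpu_id for cpu in cpus]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate CPU IDs")
    return {cpu.cpu_id: LocalApic(cpu) for cpu in cpus}
```

With the pairing made explicit, four CPUs and four APICs can share one bus without the match being arbitrary.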


Gabe
