On 1/1/08, Lourens Veen <[EMAIL PROTECTED]> wrote:

> Okay, I'm actually starting to understand more and more of this (this
> hardware stuff is fun! :-)), but the memory subsystem and the addressing
> has me mystified somewhat (I'm not the only one, it looks like :-)).
> Also, there appear to be discrepancies between the code in SVN and the
> documentation on the mailinglist. Below is how I think it works.

It's hard to keep it all straight.

> Comments please?
>
>
> We have four pairs of 16Mx16 memory chips. To address 16M words, you
> need 24 address bits: 2 bits bank select, 13 bits row address, 9 bits
> column address. Each pair of two memory chips takes a 24 bit address
> through which you access a 32-bit word.

Since it's a DDR memory, the low bit of the column address is
"ignored".  (Not really, but that's not important; we set it to zero.)
 And the burst length has to be an even number (and we do only 2).
Two 16-bit chips make a 32-bit word, and with the burst of 2, that's a
minimum granularity of 64 bits (not counting that we have byte
enables).

As a result, our column address is effectively only 8 bits.  If we
were to assign roles to bits for Bank, Row, Column, and
Memory-controller, then we have:

BBRRRRRRRRRRRRRCCCCCCCCMM

That's a 25-bit address to uniquely identify a 64-bit word (qword).
Using the lower two bits to select a controller, that leaves us with a
23-bit qword address used by each memory controller.

8 bytes/qword * 2**25 qwords = 268'435'456 bytes

So the math works out right.
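To make the bit assignment concrete, here's a small Python sketch of
that split (field names are mine, not taken from the RTL):

```python
# Split a 25-bit qword address per the layout above:
# BBRRRRRRRRRRRRRCCCCCCCCMM (bank, row, column, controller select).
# Field names are illustrative, not from the actual Verilog.

def split_qword_addr(addr):
    assert 0 <= addr < 1 << 25
    ctrl = addr & 0x3              # MM: 2-bit memory-controller select
    col  = (addr >> 2) & 0xFF      # CC: 8-bit column address
    row  = (addr >> 10) & 0x1FFF   # RR: 13-bit row address
    bank = (addr >> 23) & 0x3      # BB: 2-bit bank select
    return bank, row, col, ctrl

# 2 + 13 + 8 + 2 = 25 bits; 8 bytes/qword * 2**25 qwords = 256 MiB.
assert 8 * 2**25 == 268_435_456
```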

> Each of these pairs has a memory controller attached to it. To clients
> it looks (for future expandability?) like a 32Mx64 memory. So, it takes
> a (24+2-1=)25-bit address to access a 64-bit word, but the topmost two
> bits of the address must be zero since we only actually have 8Mx64 bits
> installed. The memory controller has to do a two-word burst to access
> 64 bits, since its chip pair only does 32 bits per access.

Yeah.  We expanded the memory space to 1GiB.  Then we just throw away
the top two bits inside the arbiter.  Also, for PCI, we'll likely
reserve only 256MiB or less.  We're just doing this as a
future-proofing measure of sorts.
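A quick sketch of that windowing (the numbers are from this thread,
the code is mine):

```python
# The reserved qword space is 2**27 qwords * 8 bytes = 1 GiB, but only
# the lowest 2**25 qwords (256 MiB) are populated; the arbiter simply
# drops the top two bits of the qword address.  Sketch only.

RESERVED_BYTES  = 2**27 * 8   # 1 GiB of address space
INSTALLED_BYTES = 2**25 * 8   # 256 MiB actually present

def drop_top_bits(qword_addr_27):
    """Model the arbiter discarding bits 26:25 of the qword address."""
    assert 0 <= qword_addr_27 < 1 << 27
    return qword_addr_27 & ((1 << 25) - 1)

assert RESERVED_BYTES == 1 << 30
assert INSTALLED_BYTES == 256 * 2**20
```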

> Next, we get to the arbiter. Terminology is a bit confusing here: there
> is conceptually one arbiter, but it consists of four instantiations of
> arbiter.v if I understand correctly, each wired to one memory
> controller.

Since there are four independent sets of RAM, we might as well control
them separately.  Any agent doing random access can therefore
benefit from each controller being on a different memory row.  (For
the moment, there is no benefit, because the video controllers request
the same addresses from all four controllers, and PCI access has very
low throughput.  With a GPU in there, we'll see benefits.)

> The arbiter multiplexes memory access requests from various
> sources. The whole arbiter takes 27-bit addresses (upper two bits zero)
> to access 64-bit data words, and uses the lowest two bits to select a
> memory controller to pass the upper 25 bits of the address on to when
> it's time for the request to be serviced.

I tend to think of there being four arbiters.  Each one makes its own
scheduling decisions independently of the others.  But looking at it
from your perspective, you're right.  Each memory controller gets
(sort of) a 25-bit address, with the upper two bits thrown away,
leaving 23.
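The routing just described, as a sketch (names are mine, not from
arbiter.v):

```python
# A 27-bit qword address: the low two bits pick one of the four memory
# controllers, the upper 25 bits go to that controller, and the
# controller discards its own top two bits, leaving 23.  Sketch only.

def route(qword_addr_27):
    assert 0 <= qword_addr_27 < 1 << 27
    ctrl_sel  = qword_addr_27 & 0x3           # which controller
    per_ctrl  = qword_addr_27 >> 2            # 25-bit address handed over
    effective = per_ctrl & ((1 << 23) - 1)    # controller keeps 23 bits
    return ctrl_sel, effective
```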

Note that the decision as to which arbiter to talk to, when doing PCI
writes, is being made at the top level.  The amount of logic is
trivial, but it's probably inappropriate to do it this way, from a
logical standpoint.

> The bridge between the FPGAs has 32 physical data lines. Requesting
> access to memory across the bridge therefore goes at 32 bits at a time,
> requiring a 28-bit address with the upper two bits zero.

Yes, although we pass a byte address, so we see a 30-bit address, then
throw away the two lower bits to get a 28-bit dword address.
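Or in code (sketch only):

```python
# The bridge carries a 30-bit byte address; dropping the two lowest
# bits yields the 28-bit dword (32-bit word) address.  Sketch only.

def byte_to_dword_addr(byte_addr_30):
    assert 0 <= byte_addr_30 < 1 << 30
    return byte_addr_30 >> 2
```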

> Finally, if you want to refer to a single byte within this memory space,
> then you need two more bits at the low end, for a 30-bit address and a
> 1GB overall memory space of which only the lowest 256MB is populated.

Yes and no.  The lower two bits don't really contain a byte offset.
They do for I/O space.  But for memory space, they contain some
control flags that we ignore.  Instead, we pay attention to the byte
enable flags.  The memory controller is fed 8 of them for each qword.
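To illustrate what the byte enables do, here's a behavioral model in
Python (illustrative only, not the controller's actual logic):

```python
# Model a write with byte enables: each of the 8 enable bits decides
# whether the corresponding byte of the 64-bit qword is written or
# kept.  Illustrative model only.

def masked_write(old_qword, new_qword, byte_enables):
    """byte_enables is an 8-bit mask; bit i controls byte i."""
    result = 0
    for i in range(8):
        src = new_qword if (byte_enables >> i) & 1 else old_qword
        result |= src & (0xFF << (8 * i))
    return result
```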

> Summarising, here is a diagram. Each line describes the input address of
> the named module, and how it uses it. For example, the arbiter gets a
> 27-bit address of which it uses the last two bits to select one of the
> four memory controllers. The remaining bits are sent to the memory
> controller in the line below.
>
> Note how the bits are consumed top to bottom. The chips get an
> additional bit at the end of the 8-bit column address the memory
> controller receives, which is generated by the memory controller.
>
> bits    |31|30|29|28|27|26|25|24|23|22|21|20|19|18|17|16|
> --------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
> PCI     |00|00|00|00|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|
> 1GB     |  |  |00|00|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|
> Bridge  |  |  |00|00|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|
> Arbiter |  |  |00|00|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|
> Memctl  |  |  |00|00|BB|BB|RR|RR|RR|RR|RR|RR|RR|RR|RR|RR|
>
>
> bits    |15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|
> --------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
> PCI     |XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|
> 1GB     |XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|
> Bridge  |XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|WW|  |  |
> Arbiter |XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|XX|MM|MM|  |  |  |
> Memctl  |RR|RR|RR|CC|CC|CC|CC|CC|CC|CC|CC|  |  |  |  |  |
>
> XX = some value
> 00 = zero because we only have 256MB of DRAM
> WW = select upper or lower 32-bit word of arbiter output
> MM = memory controller select
> BB = bank select
> RR = row select
> CC = column select

This all looks correct to me.
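One way to double-check the diagram is to restate it as code.  Here's
a sketch decoding a full 30-bit byte address (field names are mine):

```python
# Decode a 30-bit byte address using the bit positions in the diagram:
# 29:28 zero (only 256MB installed), 27:26 bank, 25:13 row,
# 12:5 column, 4:3 memory-controller select, 2 word select, 1:0 low
# bits.  This just restates the table as code; names are mine.

def decode(addr):
    assert 0 <= addr < 1 << 30
    assert (addr >> 28) == 0, "top two bits must be zero"
    return {
        "bank": (addr >> 26) & 0x3,
        "row":  (addr >> 13) & 0x1FFF,
        "col":  (addr >> 5) & 0xFF,
        "mctl": (addr >> 3) & 0x3,
        "word": (addr >> 2) & 0x1,
        "low":  addr & 0x3,
    }
```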

>
> NOTE:
>
> In [1], Timothy specifies a 64-bit interface with separate read and
> write data lines between the memory controller and the arbiter, but in
> SVN mem_ctl.v, there is a 32-bit interface with combined read and write
> data lines. That doesn't match the spec, nor the arbiter, so something
> has to be adjusted here. The key question here is who does the 2-word
> burst to convert from a 32-bit to a 64-bit interface? I'd say that that
> is up to the memory controller (as I described above); then the arbiter
> can worry about scheduling and not have to do address/datawidth
> conversion as well.

I'm not sure what you're looking at.  See
mem_ctl/tims/memctl_syn_top.v and mem_ctl/tims/memctl_fsm_200712.v.
(There are two other files in there, but they're not so important.)
The controller has a 32-bit interface to the memories, but that's
because they're double data rate (data is transferred on both the
rising and falling edges of the clock).
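So the 32-bit data path still moves a full qword per clock.  A sketch
of the reassembly (which beat is the low half is my assumption, not
from the RTL):

```python
# With DDR, the 32-bit data path moves two 32-bit beats per clock
# (one per edge), so a burst of 2 carries one 64-bit qword.  The
# beat ordering here is an assumption, not taken from the RTL.

def assemble_qword(beat0, beat1):
    """Combine two 32-bit DDR beats into a 64-bit qword (beat0 low)."""
    assert beat0 < 1 << 32 and beat1 < 1 << 32
    return (beat1 << 32) | beat0
```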

>
> NOTE:
>
> The arbiter currently has
>
>     output [12:0] col_mem,
>
> If I understand correctly and the diagram above is right, then that 12
> should be a 7. It could also be 10, 9 or 8, but not 12 I think :-).

Yes and no.  Different memory sizes have different numbers of rows and
columns.  How the address gets broken up into column/row, etc. needs
to be made configurable in the arbiter.
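Something like this parameterized split, in Python for illustration (a
hypothetical sketch; the real arbiter's interface may differ):

```python
# Different DRAM parts split the address into different row/column
# widths, so the split should be parameterized rather than hard-coded
# at 13 row / 8 column bits.  Hypothetical sketch only.

def split(addr, col_bits=8, row_bits=13, bank_bits=2):
    col  = addr & ((1 << col_bits) - 1)
    row  = (addr >> col_bits) & ((1 << row_bits) - 1)
    bank = (addr >> (col_bits + row_bits)) & ((1 << bank_bits) - 1)
    return bank, row, col

# The defaults match the 23-bit per-controller qword address above.
assert 2 + 13 + 8 == 23
```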



-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
