This is some of the email I had mentioned earlier wrt HQ.  I had
intended to reply to Petter a while back, so I'm doing it now.  I've
included a lot of stuff relevant to the structure of the VGA project,
so if some people can help me keep track of it, and extract and paste
it into the wiki, that would be great!
that would be great!

Note that all of this is very important.  We need to keep track of
both the microcode's view of this and the module interface.  They're
_slightly_ different, because they come from different perspectives
about what's going on, even though they resolve to the same functions.

On 9/9/07, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> On 2007-09-08, Timothy Normand Miller wrote:
> > On 9/8/07, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> > > Here is my attempt to refine the I/O ports.  For PCI I'm making wild
> > > guesses, so nailing down the ports is just an easy way to expose my
> > > misunderstandings.  The point is to gain some understanding of how the
> > > nanocontroller will interact with the PCI controller (and memory).
> > >
> > > Memory read
> > >     in   MEM_READREQ_FREE               Free slots in command pipe.
> > >     out  MEM_READREQ_ADDR               First address to read.
> > >     out  MEM_READREQ_COUNT              Number of words to read.
> > >     in   MEM_READREPLY_DATA             Data stream from memory.
> > >     in   MEM_READREPLY_AVAIL            Number of words in FIFO.
> >
> > Perfect.
> >
> > BTW, in the symbol name, we may want to add a reminder to the human
> > programmer as to which ones are "trigger" writes.  For instance,
> > MEM_READREQ_COUNT triggers the read at the address that was programmed
> > in.
>

Annotation:  This is about the set of I/O ports connected to HQ that
are involved in making a read request from main memory, via the
bridge.

We need to define the bridge interface a bit first.  I'm not going to
get this exactly right, but I'm doing interfaces today, and this is
where we need to get started.

// This is the part of the bridge in the XP10 that is inward-facing to PCI and HQ
// We'll include the out-facing pins soon enough.
module bridge_xp10 (
...
// Common for requests
input [24:0] req_addr,  // might need separate read and write addresses
output busy,   // cannot accept a request

// Read requests
input [6:0] read_req_count,  // not sure on the max count; 64 for now
input do_read,

// Write requests
input [31:0] write_data,
input [3:0] write_bytes,  // byte enable flags
input do_write,

// Read return path
output [31:0] read_data,  // return of read data
output read_valid
...
);
endmodule


I think the bridge will not have queues.  This is one-at-a-time.
Other logic will have queues.
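To make the one-at-a-time rule concrete, here's a rough glue-logic
sketch (have_request, pending_addr, and pending_count are placeholder
names, not decided): hold the request lines and pulse do_read only
when the bridge isn't busy.

```verilog
// Sketch only: hypothetical glue issuing one read request at a time.
always @(posedge clk) begin
    if (reset) begin
        do_read <= 1'b0;
    end else begin
        do_read <= 1'b0;                      // do_read is a one-cycle pulse
        if (have_request && !busy) begin
            req_addr       <= pending_addr;   // first address to read
            read_req_count <= pending_count;  // up to 64 words for now
            do_read        <= 1'b1;           // trigger the request
        end
    end
end
```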

> We could use MEM_READREQ_ADDR so that we can allow skipping the count if
> it's the same as the last one, then
>
> Memory read
>     in   MEM_READREQ_FREE               Free slots in command pipe.
>     out  MEM_READREQ_COUNT              Number of words to read.
>     out  MEM_READREQ_ADDR_TRIG          First address to read.
>     in   MEM_READREPLY_DATA             Data stream from memory.
>     in   MEM_READREPLY_AVAIL            Number of words in FIFO.

So, on the top-level wrapper for HQ, these will look something like:

// Memory read request queue
input [3:0] mem_readreq_free;  // Available req entries, used by microcode
output [6:0] mem_readreq_count;   // Part of the request
output [24:0] mem_readreq_addr;   // Address of first req
output mem_readreq_enq;  // Enqueue into request queue
input mem_readreq_full;   // read the fifo interface doc!

// Memory read return queue
input [31:0] mem_readreply_data;
input mem_readreply_valid;
input [6:0] mem_readreply_count;   // How many read words in queue; width is a guess
output mem_read_deq;
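For the dequeue side, something like this (want_word is a placeholder
for whatever consumes the data):

```verilog
// Sketch: dequeue one reply word when the queue is non-empty and the
// consumer asks for one.  mem_readreply_data is sampled on the cycle
// the dequeue is accepted.
assign mem_read_deq = mem_readreply_valid && want_word;
```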

> >
> > I'm not sure that the count has much meaning.  Writes are nice in that
> > in some cases like this one, we can just "fire and forget."  :)
>
> All this great new terminology ;-)
>
> So now we have,
>
> Memory write
>     out  MEM_WRITE_ADDR                 Start address.
>     in   MEM_WRITE_FREE                 Free slots in output FIFO.
>     out  MEM_WRITE_DATA                 Data stream to memory.
>

Annotation:  Write requests to main memory via bridge

// Memory write requests
input mem_write_full;
output mem_write_enq;
output [24:0] mem_write_addr;
output [31:0] mem_write_data;
input [6:0] mem_write_free;  // number of requests that can be queued


Note that since HQ's pipeline cannot be held up, the busy signal
doesn't really do anything for us.  It's part of a normal fifo
interface, but what we'll really be doing is finding out how many free
req slots there are and then filling them.  If we overfill, we'll lose
a request, but that's a coding error.
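Since overfill is a coding error rather than a condition we handle, a
simulation-only check along these lines might be worth having (sketch,
using the mem_write_* names above):

```verilog
// Sketch: flag the coding error the text describes -- enqueueing into
// a full request queue silently loses the request.
always @(posedge clk) begin
    if (mem_write_enq && mem_write_full) begin
        $display("ERROR: mem_write enqueue while full at %0t", $time);
    end
end
```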

> And, this is unchanged:
>
> Master read
>     in   PCI_MASTER_READREQ_FREE        Free slots in command pipe.
>     out  PCI_MASTER_READREQ_ADDR        Host-mapped address to read.
>     out  PCI_MASTER_READREQ_COUNT       Number of words to receive.
>     in   PCI_MASTER_READREPLY_DATA      Data stream from host.
>     in   PCI_MASTER_READREPLY_AVAIL     Number of words in FIFO.

This should be put into the wiki, but we're not going to implement it
yet.  In fact, that's the case for all of the stuff we've defined in
this post.

> > Now that I think about it, I rather prefer the idea of indicating
> > count up front, rather than having to tag the last word.  This way,
> > nothing (besides the master, which really needs to know the most and
> > soonest) has to keep track of when the last word is going to come
> > through.
>
> I thought the master needed to know in advance, but now that you mention
> it I remember the PCI allows us to just terminate at will.  Is there an
> advantage with not using the count, that we can continue streaming past
> the 2^6 or so limit if we wish?

We don't want to do this in case the target decides to terminate with
retry.  If it does, we can't get the address right.

> And for the number of instructions, it
> does not matter.  We write to a count before, or we write to another
> register after.  If we want to optimise, we replace the count port with
> PCI_MASTER_WRITE_DATA_FINAL.

I think we should go with the count.  This gives us more freedom in
the master to predict when a last entry is going to appear.

> > And moreover, I'm also thinking that perhaps the DMA master should
> > have separate command and write data fifos.  This way, some other
> > agent can be filling the data fifo asynchronously.  For instance, some
> > data words come in from the memory system, but the master doesn't know
> > what to do with them, so it doesn't do anything, and then the
> > nanocontroller gets around to sending a command to the master, and
> > then it can do something with the data.  More opportunity to make
> > things asynchronous.
>
> Great idea!  Let's put in a reminder (PCI_MASTER_WRITE_ROUTE).  Now, if
> we also replace the count, we have
>
> Master write
>     out  PCI_MASTER_WRITE_ADDR          Host-mapped address to write.
>     out  PCI_MASTER_WRITE_ROUTE         FIFO routing commands.
>     in   PCI_MASTER_WRITE_FREE          Free words in output FIFO.
>     out  PCI_MASTER_WRITE_DATA          Data stream to host.
>     out  PCI_MASTER_WRITE_DATA_FINAL    Final data word to host.
>
> Maybe using PCI_MASTER_WRITE_ROUTE should be mandatory, also when
> filling the pipeline with PCI_MASTER_WRITE_DATA.  That way, we can make
> sure there's enough data queued to avoid a PCI timeout before triggering
> the DMA.

I would just call it PCI_MASTER_CMD, because all it does is tell the
master to do something.  These are the ports I would use:

Master write data:
out PCI_MASTER_WRITE_DATA    Data word
out PCI_MASTER_WRITE_BYTES   Byte enables (optional, default to all)
in PCI_MASTER_WRITE_DATA_FREE  How many data words we can write out

Master command:
out PCI_MASTER_CMD  Includes direction (read/write) and count
in PCI_MASTER_CMD_FREE  Number of commands we can write

Master read data:
in PCI_MASTER_READ_DATA  Data word
in PCI_MASTER_READ_COUNT   Number of words available to read

So, this is three queues.  Two for data that can be filled as
necessary, one for commands that control the state machine.  If you
put extra write data in the write queue, for instance, only the amount
you request in the command will get written.  This means you can do
some things more asynchronously.  (If you underfill the queue, or you
don't empty the read queue, the state machine will insert wait states
on the bus.  That's good, but avoid it.)
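Just to pin down what I mean by the command including direction and
count, one possible packing (field positions are assumptions, not a
decided format):

```verilog
// Sketch: hypothetical encoding of a master command word.
// Bit 7 = direction (1 = write to host, 0 = read), bits 6:0 = word count.
wire [7:0] pci_master_cmd;
wire       cmd_is_write = pci_master_cmd[7];
wire [6:0] cmd_count    = pci_master_cmd[6:0];
```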

> > > Target of write (we're reading)
> > >     in   PCI_TARGET_WRITEREQ_ADDR       Target address of write.
> > >     in   PCI_TARGET_WRITEREQ_COUNT      Number of words to receive.
> > >     in   PCI_TARGET_WRITEREPLY_AVAIL    Number of words in FIFO.
> > >     out  PCI_TARGET_WRITEREPLY_DATA     Data stream from host.
> >
> > This one's tricky.  With the target, we have absolutely no control
> > here.  For one thing, we have "config" ports that set whether or not we
> > take PIO accesses.  Either they go directly over to the Spartan, or
> > they all come to us.  Or perhaps we want to select by BAR.  Not sure
> > exactly yet.
> >
> > Now, the only time we do intercept PIO transactions is when we're
> > really going to process them somehow, so the flow control can be as
> > complex as we can afford in the time available.
> >
> > So, basically, I think we could make do with one physical fifo.  The
> > target keeps track of addresses for PIO bursts, so we could just push
> > 64-bit entries into a fifo.  4 bits are byte flags.  28 bits are a

Maybe address words above should be [27:0] instead of [24:0].

> > word address (1 GiB max space).  32 bits are data.  One way to handle
> > this is to have one I/O port sample (but not dequeue) the
> > flags/address word.  The other I/O port grabs the data and dequeues.
> > This way, in the unlikely event that you KNEW what the next address
> > would be, you could just ignore it and grab the data in one cycle.
> >
> > These would be the I/O ports:
> >
> > in PCI_TARGET_WRITE_COUNT   The number of words in the write queue
> > in PCI_TARGET_WRITE_ADDRFLAGS   Address and flags for a write data word
> > in PCI_TARGET_WRITE_DATA   Data of write word
>

Note:  We need to discuss fifo counts.  I want PCI's address
granularity to be 64-word, but queues are best being 16-entry.  In
some cases, it doesn't matter.  If the queue fills, wait states get
inserted while we catch up.

The ports in HQ for target write:

input [6:0] pci_target_write_count;
input [31:0] pci_target_write_data;
input [3:0] pci_target_write_bytes;
input pci_target_write_valid;  // queue contains >0 entries
output pci_target_write_deq;
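And to make the 64-bit FIFO entry layout from above concrete, one
possible field ordering (the ordering is an assumption):

```verilog
// Sketch: one 64-bit target-write FIFO entry, per the discussion above.
// Assumed layout: {byte flags, word address, data}.
wire [63:0] entry;
wire [3:0]  entry_bytes = entry[63:60];  // byte enable flags
wire [27:0] entry_waddr = entry[59:32];  // word address (1 GiB max space)
wire [31:0] entry_data  = entry[31:0];   // write data
```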

> Looks good to me.
>
> But this raises a question.  I assume we can ignore byte-enables for
> master mode, since we're deciding.  For target read it also does not
> matter for non-config space, since we would just be filling in redundant
> information.  So is target write the only non-config mode where we care
> about byte-enables?

For master writes, we CAN care about byte enables.  For target writes,
we MUST care.  For reads, we never need to care.

> That also brings up a general question about all addresses.  In the
> nanocontroller we have a 32 bit granularity on scratch space.  Would it
> not save us code if address ports are all right-shifted 2 bits, and in
> the case we need byte-enable, they are on a separate port?

Doesn't matter too much.  If we right-shift the address, that frees up
two bits.  What do we do with them?  It's not enough space for byte
enables.  Maybe we should keep everything byte-oriented (even though
we ignore the lower two bits), unless we discover that there's some
major performance advantage, like how we don't have to left-shift
counts to add to addresses.  There's a tradeoff between getting the
programmer confused and other kinds of efficiency.  I can buy both
approaches, so let's discuss it.
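For concreteness, the shift I mean (widths are placeholders):

```verilog
// Sketch: with byte addresses, advancing past 'count' words costs a
// left shift; with word addresses it doesn't.
wire [27:0] byte_addr_next = byte_addr + {count, 2'b00};  // count << 2
wire [25:0] word_addr_next = word_addr + count;           // no shift
```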

> > You know what?  We'll never be able to keep up with that.  It's too
> > complicated.  There's absolutely no reason why PIO reads have to be
> > fast, ESPECIALLY in the cases where we would actually intercept
> > requests.  PIO reads suck, and we should not put gobs of logic into
> > trying to make it not suck.  So, no, I think we should handle one at a
> > time, and each individual transaction should be only one word.  In
> > this state, the target would be in a mode where it always asserts STOP
> > at the same time as TRDY, on top of the usual timeout mechanism.
> >
> > So here are the ports:
> >
> > in PCI_TARGET_READ_PENDING   Is a read pending?
> > in PCI_TARGET_READ_ADDR   The one address that is pending
> > out PCI_TARGET_READ_DATA   Where we write the one data word
>
> I like this simple solution if PIO efficiency is secondary.  We'd
> probably need interrupts to handle them without timing out.

Alas, interrupts too would be a problem.  Controlling the bridge also
involves sequences of things.  Interrupting that would be a race
condition.  So we're stuck with having to poll PCI at regular
intervals.

Here's the interface section on HQ:

input pci_target_read_pending;
input [27:0] pci_target_read_addr;
output pci_target_read_deq;
output [31:0] pci_target_read_data;
output pci_target_read_valid;   // pulse this with valid data
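Roughly, the response path would behave like this (result_ready and
result_word are placeholders for whatever the microcode produces; the
polling itself happens in nanocode):

```verilog
// Sketch: when a read is pending and the microcode has produced a
// result, return it.  deq/valid are one-cycle pulses.
always @(posedge clk) begin
    pci_target_read_deq   <= 1'b0;
    pci_target_read_valid <= 1'b0;
    if (pci_target_read_pending && result_ready) begin
        pci_target_read_data  <= result_word;
        pci_target_read_valid <= 1'b1;  // response data is valid
        pci_target_read_deq   <= 1'b1;  // removes the pending request
    end
end
```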

> > So, if the read times out, the controller is smart enough to recognize
> > that the address of a later retry is the same as the posted one and
> > automatically returns the data (if we've supplied any) or times out
> > (if we haven't supplied any).
>
> That sounded easy at first, but isn't there a race condition here?  The
> PCI target sets the address.  The nanocode sees the address and start
> computing the reply.  The PCI target has timed out and gotten a new
> request, then updates the address.  The nanocode writes the reply to the
> previous address.  Hmm, we'll need to let the PCI target also detect
> when the nanocontroller reads the address, and discard any data which
> the nanocode writes prior to reading the last address.

It will always be the same address.  This is a requirement of PCI
protocol.  I don't think it's allowed that some other agent come in
and make a request while this posted transaction is going on.

> > Note that we could probably combine PENDING and ADDR.  It's not a
> > queue.  Most of the bits will be the address, and one is a flag
> > indicating if it's valid.  The PENDING flag will be cleared whenever
> > we write to the DATA port (which is also not a queue).
>
> I suggest we let ADDR be negative if there is nothing pending, since we
> don't need addresses above 2 GiB.  That way, we can use the bneg
> instruction.
>
>     in PCI_TARGET_READ_ADDR    The one address that is pending or -1.
>     out PCI_TARGET_READ_DATA   Where we write the one data word

Do we have a bpos instruction?  I'd rather keep the flag high
assertion for valid.  Putting the flag in the high bit is a good idea.
Also, the interface I provided doesn't change for this, although the
deq and valid are probably the same signal, so we should combine them.

input pci_target_read_pending;
input [27:0] pci_target_read_addr;
output pci_target_read_deq;  // Removes request, asserts response; pulse with valid data
output [31:0] pci_target_read_data;


> > In the microcode we would do well to do some sort of caching, so that
> > we can return data before timeout (by my design, we have only 8 PCI
> > cycles, though).  But that's all an implementation detail.
>
> Okay, let's postpone that, since it doesn't affect the ports.  It'll
> probably be more clear as we start to write the code.
>
> > Oh, and don't forget PCI_TARGET_INTERCEPT_CONFIG or whatever.
>
> Will this be unified with target-write ports?  Can you type it out?

This is just a configuration register that indicates routing for glue
logic to HQ.  For instance, a bit will indicate whether some/all PCI
transactions go directly to the bridge or route through HQ.  This
register should be accessible by HQ and also via a PCI config space
register (extended cfg space).

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
