I'm still working on simulating the PCI target, and I have a master
I've written as part of that test.  The master will go on to be
rewritten to be the master that we put into OGD1, so I'm making sure
it works properly as well.

I encountered a situation where the master violated bus protocol.  One
of the things I'm testing is situations where either the master or
target inserts wait states.  For those of you interested in the
protocol, I can go into detail, but in a nutshell: I was simulating a
"slow" bus master.  That is, the master was inserting wait state
cycles.  This corner case is a little hard to fully test, but I think
I have come up with an appropriate solution, and it appears to work
for reads.  Contriving a case for writes may be difficult.

There are potential problems with this, and what I'd like to do is
make sure that the situation simply never happens.  That means making
the microcontroller able to keep up.  With 33MHz PCI, that shouldn't
be a problem, but with 66MHz, I think it will be.

I don't expect the microcontroller to run at more than 100MHz (in the
FPGA).  That also poses a problem, because there are numerous
situations where it literally needs to be doing two things at once,
and the branch instruction overhead for a single-threaded processor
would be far too much.  Instead, I think it would be better to make it
interleave two threads.  This gives us, effectively, two
tightly-coupled 50MHz processors.  (Everything but the program counter
would be shared.)
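
To illustrate the idea, here's a toy C model of that barrel-style
interleaving (illustrative only, not the actual design): the core
alternates threads every cycle, each thread keeps only its own program
counter, and each thread effectively sees half the clock rate.

```c
#include <assert.h>

/* Toy model of two-thread interleaving: on each cycle, the core
 * issues one instruction for the thread selected by the low bit of
 * the cycle count.  Only the program counters are per-thread;
 * everything else would be shared.  All names are illustrative. */
void run_interleaved(unsigned cycles, unsigned pc[2]) {
    for (unsigned cycle = 0; cycle < cycles; cycle++) {
        unsigned thread = cycle & 1;  /* even cycles: thread 0; odd: thread 1 */
        pc[thread]++;                 /* that thread issues one instruction */
    }
}
```

Running this for 100 cycles advances each thread's PC by 50, which is
the "two tightly-coupled 50MHz processors" effect.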

So, now, we have a situation where we cannot supply or consume data as
fast as the PCI controller can move it.

For reads, the simplest solution is to have a big (maybe 64 entries)
return data fifo.  For other reasons, we don't want to request overly
long bursts anyhow, so that's settled.  To keep things moving
reasonably well, we'd request bursts of 32 reads: request two bursts,
then grab the data for the first one, then request another burst, then
grab the data from the second burst, and so on.
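
That double-buffered pattern can be sketched in C.  This is a model,
not real firmware; request_burst() and consume_burst() are placeholders
I made up for the real master-request interface, and the counters just
track how much has been requested versus consumed.

```c
#include <assert.h>

#define FIFO_DEPTH  64   /* return data fifo entries */
#define BURST_WORDS 32   /* words per requested burst */

unsigned words_requested, words_consumed;

/* Placeholder for issuing a 32-word read burst to the master. */
void request_burst(void) {
    /* never request more than the return fifo can hold */
    assert(words_requested - words_consumed + BURST_WORDS <= FIFO_DEPTH);
    words_requested += BURST_WORDS;
}

/* Placeholder for draining one burst's worth of data from the fifo. */
void consume_burst(void) {
    words_consumed += BURST_WORDS;
}

/* Read total_words using the keep-two-bursts-in-flight pattern. */
void pipelined_read(unsigned total_words) {
    while (words_requested < total_words &&
           words_requested - words_consumed < 2 * BURST_WORDS)
        request_burst();             /* prime the pipe: two bursts in flight */
    while (words_consumed < total_words) {
        consume_burst();             /* drain the oldest burst */
        if (words_requested < total_words)
            request_burst();         /* immediately refill the pipe */
    }
}
```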

Writes are a little harder.  Certainly, one solution would be to fill
an out-going data fifo with all of the writes for the request before
allowing the controller to start.  But that means that we can't
pipeline the writes: generate the write data, filling the fifo, then
tell it to go, and then wait around for it to finish.  Or, rather than
waiting, we could throw in another burst.  But it would be much more
efficient to be able to push data into the fifo as we're getting it:
when the first word is available, start the transaction, and then just
dump words in as we get them.  I'm not entirely sure of a good way to
solve this problem.
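
A cycle-level toy model makes the hazard concrete.  This is purely
illustrative (names and rates are assumptions, not the real design):
the producer pushes one word every N cycles, the master drains one word
per cycle once the burst starts, and we check whether the fifo ever
runs dry mid-burst -- exactly the wait-state case we want to avoid.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the outgoing fifo ever ran dry while the burst was
 * active.  The burst starts as soon as the first word is available.
 * Illustrative model only; rates are parameters, not measurements. */
bool stream_write(unsigned total_words, unsigned produce_every_n_cycles) {
    unsigned fifo = 0, produced = 0, drained = 0;
    bool started = false, underrun = false;
    for (unsigned cycle = 0; drained < total_words; cycle++) {
        if (produced < total_words && cycle % produce_every_n_cycles == 0) {
            fifo++;                       /* microcontroller pushes a word */
            produced++;
        }
        if (!started && fifo > 0)
            started = true;               /* first word: kick off the burst */
        if (started) {
            if (fifo > 0) { fifo--; drained++; }  /* master takes a word/cycle */
            else if (drained < total_words)
                underrun = true;          /* fifo empty mid-burst: trouble */
        }
    }
    return underrun;
}
```

With production at the full drain rate there's no underrun; at half
rate the fifo goes empty mid-burst, which is where the flow-control
question comes in.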

But it gets even worse.  There's something I haven't bothered to tell
you: the master state machine isn't smart enough to restart a
transaction after whoever we're writing to has decided to terminate
the transaction early.  For speed reasons, our own target breaks
transactions at 256-byte boundaries.  This is primarily so that we
don't have to have a counter of more than 6 bits (we may enlarge that
later).  It is convenient to end the transaction (and 64-word bursts
are quite long anyhow) and let the other master begin again.  Well, we
have to do the same.

The basic pattern our "software" code in the microcontroller has to
employ goes something like this:

- request a transaction at a certain address (read or write)
- request a certain number of words to be transferred
- check back later to find out if the right number of words got moved
  - if so, we're done and can do something else
- otherwise, wipe out everything in the master request fifo
- recompute a new starting address, at the location where we have to begin again
- request the remaining number of words
- continue the pattern until all of the data is transferred
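
The retry loop above might look something like this in C.  This is a
rough model of the "software" pattern, not the real code;
start_transfer() is a stand-in for the master request fifo interface,
and here it's stubbed with a target that disconnects at 256-byte
(64-word) boundaries.

```c
#include <assert.h>
#include <stdint.h>

#define BOUNDARY_WORDS 64   /* target disconnects at 256-byte boundaries */

/* Stub for the master: transfers up to `count` words starting at
 * `addr_words`, but stops at the next 256-byte boundary.  Returns
 * how many words actually moved. */
unsigned start_transfer(uint32_t addr_words, unsigned count) {
    unsigned to_boundary = BOUNDARY_WORDS - (addr_words % BOUNDARY_WORDS);
    return count < to_boundary ? count : to_boundary;
}

/* The retry pattern: reissue from a recomputed restart address until
 * all of the data is transferred.  Returns the number of restarts. */
unsigned transfer_all(uint32_t start_addr_words, unsigned total_words) {
    uint32_t addr = start_addr_words;
    unsigned remaining = total_words, restarts = 0;
    while (remaining > 0) {
        unsigned moved = start_transfer(addr, remaining);
        if (moved < remaining)
            restarts++;          /* terminated early: flush fifo, retry */
        addr += moved;           /* recompute the restart address */
        remaining -= moved;
    }
    return restarts;
}
```

In the pathological case where the target disconnects after every word,
this loop runs once per word, which is the babysitting cost described
below.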

We can run into a pathological situation where the target of our
accesses terminates after every word.  This is legal, but it sucks
because we spend all of our time babysitting the master, AND we cannot
do any kind of streamlining.  The reason to do it this way is that we
consider this to be a corner case.  It shouldn't happen very often, so
it's better to dedicate some ROM space to some code than to dedicate
gates in the chip.  We especially want to avoid computing addresses
anywhere but in the microcontroller.

There needs to be some discussion about this.  There may be good
reason to move the restart logic into hardware, for instance: not
because it's time-critical, but because a restart is so disruptive to
efficient dataflow, and the real cost there is all of the code we have
to write to make it work smoothly.

Consider the situation where we're fetching GPU commands from a buffer
in the host.  One thread would spend all its time babysitting the PCI
master.  The other thread would be busy processing the read data
coming in.  This should work out okay, but we need to discuss some
fine details to make sure that everything keeps up.

One thing I haven't solved is how to deal with an empty data fifo.  I
absolutely don't want to design into the microcontroller anything that
allows the pipeline to stall.  Somehow, an empty fifo has to affect
flow control with low overhead.  But we don't necessarily know about
it until several stages down in the pipeline, making it a massive flow
control hazard.

--
Timothy Miller
http://www.cse.ohio-state.edu/~millerti
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
