On Dec 8, 2007 10:01 AM, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> On 2007-12-06, Timothy Normand Miller wrote:
> > Note that all of this is very important.  We need to keep track of
> > both the microcode's view of this and the module interface.  They're
> > _slightly_ different, because they come from different perspectives
> > about what's going on, even though they resolve to the same functions.
> >
> > [...]
>
> Let me just focus on the memory for the time being.  I haven't done I/O
> ports before, so I first of all want to get the approach right.
>
> First, I think the I/O port is best put in a completely separate module
> from the HQ, since it's very OGA1 specific.

Makes sense.

>
> I attach a sketch without the PCI bits.  It will basically be a big
> "case" routing data between a HQ input/output word and multiple external
> signals dependent on the I/O address.  Now, I'm uncertain about timing
> issues.  The attached sketch registers its outputs.  My thought is that
> a more combinatorial I/O unit is infeasible given the big "case".  Here
> is how I imagine the timing and data flow:

Big muxes CAN be bad, but these FPGAs are pretty good with them.
Here's what I had envisioned:

// Read I/O ports
always @* begin
    data = 0;  // default assignment so no latch gets inferred
    case (address)
        0: data = free_req_fifo_entries;
        1: data = available_return_fifo_entries;
        2: data = return_fifo_data;
        // ...
    endcase
end

Ignore the address numbers and the names of the ports.  This is just
the general idea.  It's also probably what you had in mind.  :)


For writes, some of the ports push things into fifos.  Let's say we
have one where enqueueing a single fifo entry requires two I/O
writes.  Then it'll be something like this:


always @(posedge clock) begin
    // when the queue has accepted the data, clear the enqueue
    if (!fifo_full) enqueue_out <= 0;

    if (do_write) begin
        case (address)
            0: data0 <= cpu_data;
            1: begin
                // provide data
                data1 <= cpu_data;
                // signal enqueue
                enqueue_out <= 1;
            end
        endcase
    end
end


This mostly conforms to the fifo protocol, except that the microcode
is responsible for making sure that there's enough room before
enqueueing.  It won't be held up by a full queue, so it can screw it
up.
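
By the way, here's a rough sketch of the kind of fifo I'm assuming on
the other side of those enqueue signals.  All the names and widths
here are invented for illustration, not the real module:

// Sketch of the fifo assumed by the write block above (hypothetical
// names and widths).  The microcode must read free_entries and skip
// the enqueue when it's zero; an enqueue while full is dropped here.
module io_fifo (
    input             clock,
    input             enqueue,       // driven by enqueue_out
    input      [63:0] wdata,         // {data1, data0}
    input             dequeue,
    output     [63:0] rdata,
    output            full,
    output      [3:0] free_entries   // readable through an I/O port
);
    reg [63:0] mem [0:7];
    reg  [2:0] wptr = 0, rptr = 0;
    reg  [3:0] count = 0;

    assign full         = (count == 8);
    assign free_entries = 8 - count;
    assign rdata        = mem[rptr];

    always @(posedge clock) begin
        if (enqueue && !full) begin
            mem[wptr] <= wdata;
            wptr      <= wptr + 1;
        end
        if (dequeue && count != 0)
            rptr <= rptr + 1;
        count <= count + (enqueue && !full) - (dequeue && count != 0);
    end
endmodule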

>   * cycle 0
>       * The operating mode (store/fetch), address, and optional data is
>         computed by the ALU and registered at the end of the cycle.
>   * cycle 1
>       * The request is now available to the HQ memory stage, and is also
>         decoded by the I/O unit if applicable since that's just a
>         combinatorial path from the ALU output.
>       * For port-write, the I/O unit registers the data on the
>         appropriate external port at the end of the cycle.

In reality, there's only one data register leading out to all of the
receivers.  In cases like the above, where there's an extra word
involved, we may or may not want to combine it with some other.  The
primary thing that distinguishes writes between I/O ports is which
enqueue signal we assert.  For reads, there are multiple sources that
have to be multiplexed.
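
Concretely, I picture something like this (signal names invented),
where one registered data bus feeds every receiver and the address
decode only picks which strobe to pulse:

// One shared write-data register fans out to every receiver; the
// address decode only chooses which enqueue strobe gets pulsed.
always @(posedge clock) begin
    enq_req   <= 0;
    enq_other <= 0;
    if (do_write) begin
        io_wdata <= cpu_data;    // the single outgoing data register
        case (address)
            0: enq_req   <= 1;   // request fifo latches io_wdata
            1: enq_other <= 1;   // different receiver, same data bus
        endcase
    end
end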

Note that for semantic clarity, we may give the same bus multiple
names.  We just assign the same signal to multiple output ports.  The
synthesizer knows that they're all the same sets of signals and will
do the right thing (unless we do something to prevent it).
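
For example (made-up names):

assign mem_addr_out = hq_addr;  // same wires...
assign dbg_addr_out = hq_addr;  // ...under a second name for clarity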

>       * For port-read the I/O unit registers the data from the appropriate
>         external port at the end of the cycle.

Yup.  You had in mind what I was thinking.  :)

>   * cycle 2
>       * For port-read, the HQ memory stage MUXes the data to its output
>         port.  This is further routed to the register file and to the
>         appropriate destination for register-forwarding.
>
> Some minor points:
>
>   * Is there anything to gain from arranging the code of the I/O unit
>     differently?  E.g. dispatch between {read, write} before address?

We don't know the proper address until we're in the MEM stage.  The
only change we might make is to add an extra cycle to the mem stage,
but I don't want to do that.

I'm not sure what you mean by dispatch here.  Do you mean to split
them up into separate blocks?  Sure, but that won't change the
underlying logic.  Reads are just one big MUX, and writes resolve to a
handful of output registers and an address decode.

Let's get it logically correct, then identify where the long delays
are.  Some thought now can save us some work later, but we first need
to nail down what we want to implement.  Then we can find a faster way
to implement it.

>   * We could let read-only ports and write-only ports overlap
>     (arbitrarily) in the address space and add an input which decides
>     whether HQ is reading or writing.  (Given an appropriate
>     rearrangement of the port numbers, this is equivalent in terms of
>     gate-count of the I/O unit: Just bundle/unbundle the is-reading
>     signal with the address signals.)

Yes, we could indeed do that.  This is one of those areas where
someone could get confused, but they can use symbols in the assembler
to name them so they don't mix them up.  The fewer bits we have to
decode, the better.
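
For instance (again with invented names and widths), the read decode
would just concatenate the is-reading bit onto the address:

// Bundling the is_read bit into the decode lets a read-only port and
// a write-only port share the same address for free.
always @* begin
    rdata = 0;
    case ({is_read, address})
        {1'b1, 4'd0}: rdata = free_req_fifo_entries;  // reads of port 0
        {1'b1, 4'd1}: rdata = return_fifo_data;       // reads of port 1
        // {1'b0, 4'd0}, etc. are decoded on the write side
        default: ;
    endcase
end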

Sorry if I don't quite address your points.  Christmas party last
night.  :)  But please do ask me to clarify.


-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)