On Wed, Jul 30, 2008 at 2:19 PM, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> On 2008-07-29, Timothy Normand Miller wrote:
>> There's some weirdness we're going to have to deal with.  The async
>> fifos didn't used to have a used-entry count output.  I added a new
>> fifo type to the fifos directory that is a modification of
>> async_fifo_16, but with a count output.  Even though the fifo could
>> hold 16 entries, the count output is only 4 bits.  It can also report
>> valid output, empty, and zero count at the same time.  That's because
>> the output gets dequeued into a register.  What I did, as a hack
>> internally, was logically OR the "next" valid signal with the low bit
>> of the "next" count.  This makes the count either zero or odd and
>> potentially an underestimate while guaranteeing a non-zero count if
>> there is something that can be dequeued.  [...]
>
> If I understand correctly, the correct count is the internal FIFO count
> plus valid_out (or !is_empty, if it's registered), but we don't want to
> waste gates and timing on the increment.  How about ORing 32{!valid_out}
> into the MEM_READQ_AVAIL port and document the fact?  Then, -1 means
> empty and n ≥ 0 means there are (n + 1) entries in the queue.  This can
> be dealt with in code by replacing tests for zero (jzero) with test for
> negativity (jneg).

This is a good idea.  As I said, though, I'm not sure it buys us much
either way, because the case where it matters most, 16 entries ready
to be processed, is exceedingly rare.

My idea is to read the fifo count and use it to jump into an unrolled
loop of 16 identical moves.  Each move transfers one word from one
fifo to the other.  If we detect that there are, say, 6 words to be
moved, we jump into the unrolled sequence at the point where only the
last 6 moves get executed.  Doing it your way would work just as well,
since it's a trivial change in the math that computes the jump
address.  Without the mod, there's less logic, but you'd have to
repeat the check perhaps a bit more often.

On the other hand, we also have to make sure that we don't try to pull
more words than we can push: we read the free count from one fifo and
the used count from the other, then move the min of the two.  That
makes being able to move 16 even less likely.  Since the fifo reports
only odd numbers (or zero), I'm not sure it matters if now and then we
move 7 when we could have moved 8, because on the next pass we'll move
7 when we would otherwise have moved 6.  So what's the difference?
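
To make the jump-into-an-unrolled-loop idea concrete, here is a rough
sketch in C rather than in our own instruction set.  Everything in it
is made up for illustration: the names move_burst and
entries_from_avail are hypothetical, and plain arrays stand in for the
two fifos.  The switch with fall-through plays the role of the
computed jump, and entries_from_avail shows the "trivial change in the
math" needed for the encoding where -1 means empty and n >= 0 means
n + 1 entries.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BURST 16u  /* length of the unrolled move sequence */

/* Hypothetical helper for the proposed encoding:
 * -1 means empty, n >= 0 means n + 1 entries are available. */
static unsigned entries_from_avail(int32_t avail)
{
    return (unsigned)(avail + 1);
}

static unsigned min_u(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Move min(used, free_slots) words, capped at 16, from src to dst.
 * The switch jumps into the middle of 16 identical moves so that only
 * the last n of them execute.  Returns the number of words moved. */
static unsigned move_burst(const uint32_t *src, uint32_t *dst,
                           unsigned used, unsigned free_slots)
{
    unsigned n = min_u(min_u(used, free_slots), MAX_BURST);

    assert(n <= MAX_BURST);

    switch (n) {
    case 16: *dst++ = *src++; /* fall through */
    case 15: *dst++ = *src++; /* fall through */
    case 14: *dst++ = *src++; /* fall through */
    case 13: *dst++ = *src++; /* fall through */
    case 12: *dst++ = *src++; /* fall through */
    case 11: *dst++ = *src++; /* fall through */
    case 10: *dst++ = *src++; /* fall through */
    case 9:  *dst++ = *src++; /* fall through */
    case 8:  *dst++ = *src++; /* fall through */
    case 7:  *dst++ = *src++; /* fall through */
    case 6:  *dst++ = *src++; /* fall through */
    case 5:  *dst++ = *src++; /* fall through */
    case 4:  *dst++ = *src++; /* fall through */
    case 3:  *dst++ = *src++; /* fall through */
    case 2:  *dst++ = *src++; /* fall through */
    case 1:  *dst++ = *src++; /* fall through */
    default: break;           /* n == 0: nothing to move */
    }
    return n;
}
```

With the odd-or-zero count, the value passed as "used" may be one less
than what is really there.  The sketch then simply moves one word
fewer on this pass and picks the remaining word up on the next.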

When we get to the point where we want to profile this code to improve
performance, we can see whether we have a problem here.  I suspect
we'll find an even bigger demand for adding instructions for things
that are too slow to do as a sequence of instructions.  On a 33 MHz
bus, we have about 3 CPU cycles per PCI cycle.  At 66 MHz it's about
1.5, and 66 MHz is the speed we'll use on a PCIe product with an
additional off-the-shelf PCIe to PCI-66 bridge chip.

This may seem minor, but I'd like to leave it open on a list of other
questions, big and small, to be answered.

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
