On 2007-12-06, Timothy Normand Miller wrote:
> On Nov 30, 2007 4:25 PM, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> > > At the moment, the primary goal is correctness. We can put off SOME
> > > performance issues until somewhat later in the process. In
> > > particular, if you optimize too early, you obscure the semantics of
> > > what you're doing, making it much harder to debug.
> >
> > I've been worried that we'd still have to change the semantics in order
> > to reach the timing goal. If you think it looks realisable, then let's
> > carry on under that assumption.
>
> The nice thing about this situation is that we CAN. As long as we
> keep the assembler and the code up to date, it'll be fine.
About that, I have a few patches which I'm ready to commit. What I did
was completely remove the multiplication logic. After that, there were
only 8 instructions left, allowing us to remove one operator bit. The
good news is that immediates are back to 16 bits, which I think is
"more than one bit" better than 15 bits. OTOH, this will "close" the
instruction set, but I think we could squeeze in another instruction if
we have to by merging the signed and unsigned shifts.
The ALU operations after the commit will be:

    move
    and
    or
    xor
    shiftu (lsl with signed shift)
    shift (asl with signed shift)
    add
    rsub (sub with reversed operands)

from which we can derive noop, neg, not, sub. For each there are
register, fetch, and store modes. Branches are unchanged. For a
preview before I commit see
http://git.eideticdew.org/?p=ogp-pu.git;a=commitdiff;h=e7951eb3ea79df398b72ac1457badf3fee550806
Okay to commit?
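
To make the "derive noop, neg, not, sub" claim concrete, here is a rough Python model of the eight operations. It is only a sketch: the 32-bit word width, the operand order, and the reading of "signed shift" as "a negative shift count shifts right" are my assumptions for illustration, and alu(), to_signed() and the derived helpers are hypothetical names, not taken from the assembler or the Verilog.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit word; the width is not stated in this thread


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Hypothetical model of the eight remaining ALU operations."""
    a &= MASK
    b &= MASK
    if op == "move":
        r = b
    elif op == "and":
        r = a & b
    elif op == "or":
        r = a | b
    elif op == "xor":
        r = a ^ b
    elif op in ("shiftu", "shift"):
        # Assumed meaning of "signed shift": the count is signed,
        # positive shifts left and negative shifts right.
        n = max(-32, min(32, to_signed(b)))
        v = a if op == "shiftu" else to_signed(a)  # logical vs arithmetic
        r = v << n if n >= 0 else v >> -n
    elif op == "add":
        r = a + b
    elif op == "rsub":
        r = b - a  # sub with reversed operands
    else:
        raise ValueError("unknown op: " + op)
    return r & MASK


# Derived operations, built only from the eight above.
def noop(a):
    return alu("move", a, a)      # move a value onto itself


def neg(a):
    return alu("rsub", a, 0)      # 0 - a


def not_(a):
    return alu("xor", a, MASK)    # flip every bit


def sub(a, b):
    return alu("rsub", b, a)      # a - b, by swapping the operands
```

With this model, sub(7, 3) gives 4 and neg(5) gives the two's-complement pattern 0xFFFFFFFB. Whether "not" fits in a single instruction depends on how the 16-bit immediate is extended, which this model does not cover.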
> Optimizing too early just makes it an incomprehensible mess. Now, PCI
> is an extreme case, because I had to force restructuring of the logic
> in ways that ISE would not have done voluntarily. Xilinx even told
> me, when I opened a web case, that my manual MUX approach was about
> the best I was going to do. But it's nasty. Instead of burying
> selections in the logic in a way that makes semantic sense.... well,
> let me be clearer....
>
> There are, I believe, 3 or 4 different inputs from PCI that we cannot
> register because we need to use the signals directly on a given cycle.
> These include #STOP and #IRDY. Let's say we have an internal state
> or output or whatever that depends on those two. What I had to do was
> design logic to generate what the state would be for each of the four
> possible combinations of those signals, force ISE to generate logic
> for those, and then MUX those based on the inputs. It's vital to use
> those inputs at the absolute last stage of the logic. Since they're
> coming in from the PCI bus, there's already a huge delay on them.
> (Out from the driver, delay on the bus, and input buffers in our FPGA.)
> There were cases where I had to MUX 8 different possibilities. IIRC,
> it wasn't any worse than that, although it will be when we do the
> Master.
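
The trick described above can be modelled in a few lines of Python. This is a behavioural sketch only, not the actual PCI target logic: next_state() is a made-up stand-in, and stop/irdy are treated as plain "asserted" booleans rather than active-low pins. The point is that every candidate result is computed from registered state alone, and the late-arriving inputs are used only in the final selection.

```python
from itertools import product


def next_state(state, stop, irdy):
    """Hypothetical next-state function standing in for the real logic."""
    if stop:
        return 0
    if irdy:
        return (state + 1) % 4
    return state


def precompute(state):
    """Build one candidate per combination of the two late inputs.

    Only the registered state is needed here, so in hardware this part
    can run while the bus signals are still settling.
    """
    return {
        (stop, irdy): next_state(state, stop, irdy)
        for stop, irdy in product((False, True), repeat=2)
    }


def late_mux(candidates, stop, irdy):
    """Final stage: the late inputs do nothing except select a candidate."""
    return candidates[(stop, irdy)]
```

The result is the same as calling next_state() directly; what changes is that the late signals only pass through the final 4:1 selection. With three unregistered inputs the table grows to eight entries, which matches the 8-way case mentioned above.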
I see what you mean now, and I recall you pointed out this trick for use
in HQ.
Since this thread is getting pedagogical, let me point out the HQ
analogues to others. These are only binary, though.
* In the ALU (stage 3) the output of add is MUXed with the output from
the other operations, since add is the most expensive operation in
terms of propagation delay. Thus add is allowed to utilise the full
cycle, independent of the other operations. The price is a short
delay on the output of the ALU, which spills into the next stage.
* In the register fetch (stage 2) and memory (stage 4) it was
necessary, since the outputs of the register file and the block RAM
are already registered.
All cases are easy to find, as the inputs to the MUXes are named
<signal>_if_<cond> and <signal>_if_not_<cond>.