On Sat, September 1, 2007 1:23 pm, Timothy Normand Miller said:
> On 9/1/07, Petter Urkedal <[EMAIL PROTECTED]> wrote:
>
>> So, let's consider integrating Farhan's version in the nanocontroller.
>
> http://wiki.opengraphics.org/tiki-index.php?page=Subversion+Commit+Policy
>
> Farhan would need to officially give us (Traversal specifically) rights
> to use his work.

No problem. How do I do this officially? Just include the copyright and
license statement in my files? I realize that right now I don't have a
proper header and just have comments all over the place, but changes are
being made quite often, so I'm too lazy to write a decent one at the
moment. I will do this once the spec is more or less settled.

>> code? (Does DMA require multiply at all, other than powers of 2?)
>
> Doubtful. But if I'm wrong, we maybe should reserve an opcode or two for
> some instruction we don't yet know about.
>
>> I'd go with the non-blocking out-of-band approach. That is, the
>> programmer will count instructions before fetching the result.
>
> I generally prefer this myself.
>
>> As a slight variant, we can hard-code the multiplication result to r31
>> and drop the fetch-product instruction. That's just as easy to
>> implement, and it saves one cycle, since it means the product can be
>> directly used as an operand to the ALU.
>
> I'm not sure we want to add additional MUXing after the REG stage. It
> might be better to move it into the MEM stage. This is especially not a
> problem since we have gobs of time to schedule when the product is
> grabbed.
>
> Having a special instruction to initiate the multiply would save us one
> cycle (worth it?). Otherwise, there would be two moves into the
> scratch/io space. But the product is only a single word fetch. Putting
> it into r31 would save a cycle, because we wouldn't have to move it into
> a register first before using it as an operand to another instruction.
>
> My main concerns are the extra multiplexing logic hurting our max clock
> rate.
>
>> The introduction of interrupts, if needed, will not cause problems as
>> long as interrupt handlers don't use the multiplier. Moreover, if an
>> interrupt handler needs to use the multiplier, this is also possible:
>> When the interrupt handler is sure any pending multiplication is
>> finished, it can save the result R. Then it can do its own
>> multiplication. Before returning to normal code, it must perform a
>> multiply R*1 and wait long enough for the result to be available.
>
> I think we may in fact need interrupts, and I'm struggling with it. The
> problem is VGA graphics modes. In 640x480x16 and such, framebuffer reads
> and writes are not simple accesses.
> You can apply raster operators to writes, and you can make reads fill a
> blt buffer larger than your word size so that when you write, it causes
> more than a word size to get written out. This way, you can bitblt
> faster than you can move data over the bus.
>
> Now, for VGA mode, mostly what the controller does is read VGA text or
> pixels and convert them in the background into pixels suitable for our
> video controller. At the same time, we want the controller to handle the
> extra smarts of VGA. One way to do this is to support interrupts; when a
> PCI access comes in, we can intercept it and do the extra stuff. While
> writes could be queued for us to process periodically, reads have to be
> processed as soon as possible.
>
> Interrupts won't stall lower parts of the pipeline, but they would divert
> the instruction flow. We need to determine how this will affect our
> static instruction scheduling.
>
> Correct me if I'm wrong, but a subroutine call stores the return address
> into r31, right? Of course, since that's under main program control, no
> problem! But with interrupts, I think we should dump the return address
> into a predefined address in the scratch memory.
>
> What about context switches? Should we require the ISR to copy registers
> to the scratch memory? That's a fair amount of overhead, depending on
> how many we need to clobber. How about doubling the size of the register
> file? The lower half for normal execution, the upper half for
> interrupts. (Like how the Z80 did it.) (In this case, the interrupt
> return address appears in what we might internally call r63.) Oh, and
> don't forget the delayed branch issue and how it'll affect interrupt--one
> extra instruction from the main program will get executed, so the return
> PC must account for that, and be sure to consider the situation where the
> interrupt arrives at the same time as a branch instruction is being
> fetched in the main program.
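
To make the doubled register file idea concrete, here is a rough
behavioural sketch in Python. It is not taken from any OGP source: the
class and method names are made up for illustration, and it assumes 32
visible registers with an "in interrupt" mode bit selecting the upper
half. It only models the bank switch and the return address landing in
physical r63. It does not model the delayed-branch case; the caller is
assumed to pass in a return PC that has already been adjusted for that.

```python
# Behavioural model only. All names here are hypothetical and are not
# taken from the nanocontroller sources.
NUM_VISIBLE = 32  # registers an instruction can name: r0..r31


class BankedRegisterFile:
    def __init__(self):
        # Lower half: normal execution. Upper half: interrupt handlers.
        self.regs = [0] * (2 * NUM_VISIBLE)
        self.in_interrupt = False

    def _index(self, r):
        """Map a visible register number to a physical register index."""
        if not 0 <= r < NUM_VISIBLE:
            raise ValueError("register number out of range")
        return r + NUM_VISIBLE if self.in_interrupt else r

    def read(self, r):
        return self.regs[self._index(r)]

    def write(self, r, value):
        self.regs[self._index(r)] = value

    def enter_interrupt(self, return_pc):
        """Switch to the upper bank and store the return PC in physical r63.

        return_pc is assumed to already account for any delay slot.
        """
        self.in_interrupt = True
        self.write(NUM_VISIBLE - 1, return_pc)

    def leave_interrupt(self):
        """Fetch the saved return PC and switch back to the lower bank."""
        return_pc = self.read(NUM_VISIBLE - 1)
        self.in_interrupt = False
        return return_pc


rf = BankedRegisterFile()
rf.write(31, 0x100)                  # main program's subroutine return address
rf.write(5, 42)                      # some live value in the main program
rf.enter_interrupt(return_pc=0x200)  # interrupt arrives
rf.write(5, 7)                       # the ISR uses "r5" in its own bank
resume_pc = rf.leave_interrupt()
```

After the handler returns, the main program's r5 and r31 are untouched,
so the ISR needs no save/restore code for registers. The cost is a
register file twice the size and one extra address bit driven by the
mode flag.
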
>
> --
> Timothy Normand Miller
> http://www.cse.ohio-state.edu/~millerti
> Open Graphics Project
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)
