Re: [Open-graphics] Multipliers in oga1hq

Patrick McNamara Tue, 04 Sep 2007 05:03:42 -0700

Nicholas S-A wrote:
>
> On Sep 3, 2007, at 10:09 PM, Patrick McNamara wrote:
>> As a starting point for having a single memory space for registers and
>> RAM take for example the ATtiny45.  This controller has 32 general
>> purpose registers, 64 I/O registers, and 256 bytes of RAM.  The memory
>> map looks something like this.
>>
>> 0x0000-0x001F:  general purpose registers
>> 0x0020-0x005F:  I/O registers
>> 0x0060-0x015F:  RAM
>>
>> I won't go into the AVR instruction set, but I can access any location
>> within the memory map with a single instruction type.  The AVR ISA does
>> still preserve register syntax in a number of different instruction
>> mnemonics, and we could as well.  Nothing says we couldn't map several
>> mnemonics to a single instruction.  All instructions now comprise
>> two source and one destination address fields.  If we allow for
>> immediates, they replace one or both of the source addresses.  To go
>> with Petter's example, the high bit selects IO space or memory space.
>> Assuming we allow for more memory space than we need for IO space, it
>> would be quite ok to mirror the IO space.  Say for example you have a
>> 128 byte memory space but only need 32 bytes of IO space, you can
>> ignore bits 5 and 6 in the IO address and effectively
>> replicate the IO space four times.  I'm afraid I haven't been paying
>> close enough attention lately to have a good feel for how big of a
>> scratch RAM space is needed.
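
The mirroring described above can be sketched in a few lines of Python.
This is only an illustration under assumptions not stated in the thread:
an 8-bit operand address, with bit 7 as the "high bit" that selects IO
space, and the names decode, IO_SELECT_BIT, IO_INDEX_MASK and
MEM_INDEX_MASK are made up for the example.

```python
# Hypothetical 8-bit operand address: bit 7 selects IO space, bits 0-6 index
# a 128-byte window. Only 32 bytes of IO exist, so bits 5 and 6 are ignored.
IO_SELECT_BIT = 0x80   # assumed position of the "high bit"
IO_INDEX_MASK = 0x1F   # keep bits 0-4 (32 IO locations)
MEM_INDEX_MASK = 0x7F  # keep bits 0-6 (128 memory locations)


def decode(addr):
    """Return (space, index) for an 8-bit operand address."""
    if addr & IO_SELECT_BIT:
        return ("io", addr & IO_INDEX_MASK)
    return ("mem", addr & MEM_INDEX_MASK)


# Every IO register shows up at four addresses because bits 5 and 6 are dropped.
mirrors = [a for a in range(0x80, 0x100) if decode(a) == ("io", 3)]
print([hex(a) for a in mirrors])
```

Ignoring the two bits costs nothing in decode logic, which is the appeal:
the IO block simply never looks at them.
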
>
> What is stopping us from just having 512 registers? Is it the
> instruction size?
> If so, don't we have 36 bits, not 32?
Only instruction size.  The instruction width and the number of opcodes
needed determine how much address space is available to each instruction.
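
As a rough budget, assuming the three-address format discussed above (two
sources, one destination) and fields just wide enough to address every
location, the bits left over for the opcode work out as below.  The
function name opcode_bits and the choice of three operand fields are
assumptions for illustration, not a settled encoding.

```python
def opcode_bits(word_bits, locations, operand_fields=3):
    """Bits left for the opcode after the operand address fields are allocated."""
    addr_bits = (locations - 1).bit_length()  # bits needed to address `locations`
    return word_bits - operand_fields * addr_bits


for word_bits, locations in [(36, 512), (32, 512), (32, 256)]:
    print(word_bits, locations, opcode_bits(word_bits, locations))
```

With 36-bit words and 512 locations, each field takes 9 bits, leaving 9
bits for the opcode; with 32-bit words only 5 bits would remain.
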
>
>> IIRC, a BRAM is 512x36, correct?  Since the BRAM is dual ported, we are
>> allowed two reads and two writes per cycle, assuming you read on one clock
>> edge and write on the other.  We could break the BRAM in two, using the
>> lower half for memory/registers and the upper half as dedicated stack
>> space.  Even
>> if you only get one read and write per cycle, appropriately designing
>> the pipeline could work around this.
>>
>> Something else I was thinking about relates to using the same controller
>> core for both PCI and VGA duties.  We effectively have to be able to
>> context switch to do this, and we have to be able to do it quickly to
>> meet PCI timing requirements.  I don't know what our BRAM budget is
>> right now, but could we effectively have two sets of memory/registers
>> and stack for the core?  When we need to context switch, we switch BRAMs.
>> You could actually expand this to as many BRAMs as you want to use.  To
>> keep from having to flush the pipe, add a context pipeline that marches
>> in step with the normal processor pipeline.  For two contexts this is
>> just an n-bit shift register (where n is the number of stages in the
>> pipeline).  The value of the bit at any given stage in the pipeline
>> selects the target BRAM for that stage.  More than two hardware contexts
>> means expanding the width of course.
>
> Interesting. We only have 24 BRAMs on the device, and need some for
> buffers,
> but I think that this might work.
Also consider that the FPGA version only needs to be a proof of
concept.  If we only allow for two contexts but get them working, then
with the ASIC, where we may not be as constrained on real estate, we
can expand.
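
For the two-context case, the context pipeline quoted above can be modelled
as a shift register with one bit per stage.  The sketch below is a software
model only; the class name ContextPipeline, the four-stage depth and the
bank numbering are assumptions for illustration.

```python
class ContextPipeline:
    """Shift register carrying one context bit per pipeline stage."""

    def __init__(self, stages):
        self.bits = [0] * stages  # bits[0] belongs to the first stage

    def step(self, context):
        """Advance one clock: the entering instruction is tagged with `context`."""
        self.bits = [context] + self.bits[:-1]

    def bram_for_stage(self, stage):
        """BRAM bank (0 or 1) that the given stage should access this cycle."""
        return self.bits[stage]


pipe = ContextPipeline(stages=4)  # hypothetical 4-stage pipeline
for ctx in (0, 0, 1, 1):          # switch to context 1 on the third clock
    pipe.step(ctx)
print(pipe.bits)
```

After the switch, the early stages already select bank 1 while the
instructions issued before the switch drain through the later stages
against bank 0, so nothing has to be flushed.
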


>
>> Context switching does of course bring us back to the problem of the
>> multiplier.  If multiplying doesn't stall the pipe waiting for the
>> answer, then we really don't want to context switch (or interrupt) in
>> the middle of a multiply.  This causes all sorts of problems though, since
>> we are effectively working in a realtime environment.  If we need to go
>> service a PCI transaction, we can't wait 10-20 cycles for a pending
>> multiply to finish.  This means that we have to have the output be
>> context (or interrupt) aware.  If the multiplier is context aware then
>> the answer could be written to a separate output as necessary.
>
> Hmm, that is a pretty big problem. Even just ignoring the context switch,
> we are not going to ever need a multiply for PCI, right?
Not sure.  I'm not even sure we have to have a multiply to do the VGA
conversion routines, mainly because I haven't sat down to think real
hard about what is necessary.  The actual question is "Do we need
variable multiplication?"  If all the multiplication we need to do is by
a fixed amount, say the line number times screen width, then using a
multiply instruction is probably slower than using shifts and adds.
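
As an illustration of the shift-and-add approach, assume a screen width of
640 (my choice for the example, not a number from this thread).  Since
640 = 512 + 128, the product needs two shifts and one add.  The helper
names shift_amounts and times_640 are made up for the sketch.

```python
def shift_amounts(constant):
    """Shift counts whose shifted copies sum to value * constant."""
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def times_640(line):
    """line * 640 using only shifts and one add (640 = 512 + 128)."""
    return (line << 9) + (line << 7)


print(shift_amounts(640))
print(times_640(479))
```

The number of adds is one less than the number of set bits in the
constant, so widths with few set bits are the cheapest.
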

>
>> Which brings me to a question that has been tickling the back of my head
>> for a bit.  Why aren't we using the multipliers embedded in the FPGA?  I
>> know there are limitations on how the BRAMs can be configured and still
>> use the multipliers, but I couldn't find anything quickly in my archive
>> of list messages.
>
> The basic problem is that we are using the Lattice XP FPGA instead of the
> Spartan for the nanocontroller to give us more room for the OpenGL
> Pipeline.
> They have distributed (9K) RAM but no multipliers.
Doh!  That makes sense.
>
> Cheers!
> Nicholas
>

