Re: [Open-graphics] Multipliers in oga1hq

Patrick McNamara Tue, 04 Sep 2007 16:33:41 -0700

Timothy Normand Miller wrote:
> On 9/4/07, Patrick McNamara <[EMAIL PROTECTED]> wrote:
>
>   
>> Obviously a concern.  One of the problems to consider here is we don't
>> know how we are going to make VGA work.  For example, how are we going
>> to handle reads from VGA memory?
>>     
>
> We'll map part of the graphics memory into the lower address space.
> Note that this may require a slight mod to the PCI target so that we
> can alter the size of the aperture.  And we'll have to store the
> original mapping somewhere so we can restore it later.
>
> Anyhow, in text mode there's no special hardware, so we'll just
> let reads and writes pass through to the memory controller.
>
> In a VGA graphics mode that requires more smarts, the read or write
> raises an interrupt with the nanocontroller that then makes its own
> modified request to the memory system.
>


My question was centered around VGA reads, since the VGA interface
expects to spit out data formatted in "funky" VGA formats.  I suspect we
are going to have to shadow VGA memory so that reads can come directly
from the shadow memory, and writes get written to the shadow memory and
also to the framebuffer as 24-bit pixels.
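
A minimal sketch of the shadowing idea, just to make it concrete.  It
assumes a linear 256-colour mode where one byte is one pixel, and the
class name, the palette table, and the framebuffer layout are all
hypothetical, not anything we have designed yet:

```python
class ShadowedVga:
    """Hypothetical model: a VGA-format shadow copy plus a 24-bit framebuffer."""

    def __init__(self, size, palette):
        self.shadow = bytearray(size)   # raw bytes exactly as the host wrote them
        self.framebuffer = [0] * size   # one 0xRRGGBB value per pixel (assumes 1 byte = 1 pixel)
        self.palette = palette          # assumed 256-entry table of 0xRRGGBB values

    def write(self, addr, value):
        # A write updates both copies: the shadow keeps the VGA format,
        # the framebuffer gets the translated 24-bit pixel.
        value &= 0xFF
        self.shadow[addr] = value
        self.framebuffer[addr] = self.palette[value]

    def read(self, addr):
        # A read never touches the framebuffer, so no reverse translation is needed.
        return self.shadow[addr]
```

The planar 16-colour modes would need a more involved translation on
the write side, but the read side would presumably stay this simple.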

>   
>> What about font bitmaps?
>>     
>
> Those are just stored in graphics memory.  The text mode has a
> standard way to store those, and we'll just have the VGA controller
> use them to convert text to graphics.
>   
They are effectively stored as bitmaps anyway.
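
As a rough sketch of that text-to-graphics conversion, assuming one
byte per glyph row with the most significant bit as the leftmost pixel
(the function name and the colour defaults are made up for
illustration):

```python
def render_glyph(rows, fg=0xFFFFFF, bg=0x000000, width=8):
    """Expand a glyph bitmap (one integer per row) into rows of 24-bit pixels.

    Assumes the most significant bit of each row is the leftmost pixel.
    """
    return [
        [fg if (row >> (width - 1 - x)) & 1 else bg for x in range(width)]
        for row in rows
    ]
```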
>   
>> There are a
>> lot of details surrounding the VGA pipeline that will need to be worked
>> out too.  We don't want to get stuck in an architecture that can't
>> support our final needs either.
>>     
>
> I agree.  But I suspect that rather than finding our architecture
> inadequate, we'll find it to be inefficient in some respects.  We may
> add specialized instructions to make up for that.
>
>   
>>> One thing about the BRAM.  If we were to try to use it as the primary
>>> register file, I'm not sure we could double-pump it like we need to.
>>> Routing to/from the RAM may impose too much delay.  One of the
>>> advantages of the current architecture is that it is effectively
>>> triple-ported.  We can write one reg and read two at the same time.
>>> If we use the BRAM as you describe, we serialize it, making any
>>> instruction that requires access to three operands take 3 cycles.
>>>
>>>       
>> That is definitely a concern.  Allowing for only one read and write per
>> cycle would require addition of a second fetch stage in the pipeline.
>>     
>
> How would this help?  I don't know what you mean.
>   
Given that you have two source registers and a target register, with
two read ports you can fetch both register contents in a single
pipeline stage.  In a tri-port setup like ours, that same stage can
also accept a write.  Assuming we don't allow ALU operations on
non-register locations (indirect addressing), you would normally
follow with the ALU/MEM stage.  If you can only do one read and one
write per register access, then you need two register fetch
stages, one for each source register, prior to the ALU/MEM stage.

tri-port:
instruction fetch
instruction decode
register fetch
ALU/memory
write back

dual-port:
instruction fetch
instruction decode
register fetch
register fetch
ALU/memory
write back

Or something like that.  It's been 10 years since my processor design
class and I sold the book because I was a poor college student at the time.
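
The two stage lists above can be sketched as a small model.  This is
only an illustration of the counting argument; the function name and
parameters are hypothetical, and it ignores forwarding, hazards, and
everything else a real pipeline has to deal with:

```python
def pipeline_stages(read_ports, source_operands=2):
    """List the stages for a register file with the given number of read ports."""
    # Ceiling division: how many cycles it takes to read all source operands.
    fetch_stages = -(-source_operands // read_ports)
    return (
        ["instruction fetch", "instruction decode"]
        + ["register fetch"] * fetch_stages
        + ["ALU/memory", "write back"]
    )


tri_port = pipeline_stages(read_ports=2)   # two reads per cycle, plus the write port
dual_port = pipeline_stages(read_ports=1)  # one read per cycle, plus the write port
```

With two read ports this gives five stages; with one it gives six,
which is the extra register fetch stage I was referring to.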
>   
>>> Did you mean DMA and VGA?  We'd never be doing DMA at the same time as VGA.
>>>
>>> If you're referring to the fact that it has to handle VGA translation
>>> at the same time as intercepting PCI transactions so it can do the
>>> rest of the VGA stuff, then you're right.
>>>
>>>       
>> Are you positive we won't be doing DMA at the same time as VGA
>> transactions?  We won't be initiating DMA, but we may very well be the
>> target.  But, yes, I was referring to PCI and VGA.
>>     
>
> I'm positive.  Only the 3D GPU needs DMA.  Upon starting X11, we'll
> have software load the DMA program.  On exit from X11, we'll have it
> (or the kernel or whatever) reload the VGA program.  We can only be in
> one graphics mode (well, one per head, but ignore that) at a time, so
> there's no issue with 640x480x16.  And we also won't be in text mode
> and graphics mode at the same time.
>
>   

What does the nanocontroller handle in the way of DMA?  What happens
if I initiate a DMA transaction from main memory targeting the VGA
memory space?  I don't actually know if that is allowed with standard
VGA; I will need to do some research.

We do have to provide text-based output in graphics mode.  You can make a
BIOS interrupt call on an x86 system to print text, even in graphics
mode.  Obviously this is different from standard graphics mode from our
perspective, but what we have to do is very similar.

Once we have a basic 3D pipeline available, we could use it to assist
with scaling and text.  If the VGA screen is simply a poly and the video
memory is the texture for that poly, then we don't have to handle scaling
at all, just format translation.  Likewise, if an 80x25 text screen is
simply 2000 polys and the character is the texture, then text mode
becomes quite easy too.  When a character is changed, all we need to do
is change the texture.
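
A quick sketch of the text-as-polys idea, assuming an 8x16 pixel cell
(a guess that I've left as a parameter) and treating each character
code as a texture id.  All of the names here are hypothetical:

```python
COLS, ROWS = 80, 25  # 80 * 25 = 2000 character cells, one quad each


def cell_quad(index, cell_w=8, cell_h=16):
    """Return (x0, y0, x1, y1) pixel corners of the quad for a cell index."""
    row, col = divmod(index, COLS)
    x0, y0 = col * cell_w, row * cell_h
    return (x0, y0, x0 + cell_w, y0 + cell_h)


class TextScreen:
    """Geometry never changes; only the texture bound to each quad does."""

    def __init__(self):
        self.texture_ids = [ord(" ")] * (COLS * ROWS)

    def put_char(self, col, row, ch):
        # Changing a character is just swapping the texture on one quad.
        self.texture_ids[row * COLS + col] = ord(ch)
```

The quads are computed once; writing a character only touches one
entry in the texture table.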

An interesting side effect of this is the capability to dump the text
console into a window after the window manager starts, or to allow a VM
direct access to the VGA hardware while the 3d pipe is handling normal
display.  These are obviously just neat little things that could be done
and not at all necessary.  But there are valid reasons to consider
supporting DMA, PCI, and VGA (or another context that we haven't thought
of yet) if at all possible.

>> My concern is that on interrupt or context switch, a multiply is needed
>> early on in the execution.  Two independent threads of execution should
>> not have to know what the other was doing, especially since interrupts
>> cannot know what the other was doing.  If the code has to check whether
>> a multiply is pending from another execution path before submitting its
>> own, this has a further impact on multiply performance.
>>     
>
> Only the interrupt needs to worry about this.  That helps a bit.  But
> as I say, if the multiplier is pipelined, then it's a non-issue.  If
> it's not pipelined, then we will indeed have to query it before using
> it, unless we ensure that the ISR doesn't issue a multiply too early.
>
>
>   
The first stage of the VGA pipeline is a barrel shift.  Being able to
use the multiplier for this would be very useful.  Otherwise it will
take 8 processor cycles plus branch overhead in the worst case.  Though
that may be faster than the multiplier can work as a barrel shifter so
it may be a moot point.
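
For reference, the two approaches look roughly like this.  This is only
a sketch in Python of the arithmetic, not nanocontroller code: the 8-bit
width and the function names are assumptions, and in hardware the
power-of-two operand would presumably come from a small lookup table
rather than from another shift.

```python
def shift_left_loop(value, amount, width=8):
    """Shift one bit per iteration; returns (result, iterations used)."""
    mask = (1 << width) - 1
    steps = 0
    for _ in range(amount):
        value = (value << 1) & mask
        steps += 1
    return value, steps


def shift_left_multiply(value, amount, width=8):
    """Shift left by multiplying by 2**amount and truncating to the width."""
    mask = (1 << width) - 1
    return (value * (1 << amount)) & mask


def rotate_left_multiply(value, amount, width=8):
    """Rotate left: OR the low and high halves of the double-width product."""
    mask = (1 << width) - 1
    product = (value & mask) * (1 << (amount % width))
    return (product & mask) | (product >> width)
```

The loop costs one iteration per bit position, while the multiply is a
single operation whose cost depends entirely on the multiplier's
latency.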
