One thing I think we are missing here is that we are thinking about a
GPU design, not a full-blown CPU design. Has anyone here ever written
a shader before?

Here is my vote: keep it simple. If you go with CISC or anything with
a longer pipeline, you are going to have problems with data
dependencies, hazards, and stalls.

A MISC design is going to need only two, maybe three, pipeline
stages: fetch and execute, and possibly decode. Data dependencies are
not going to be an issue. It would be a blast writing a compiler for
this sort of GPU; you could optimize the shaders to death.
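To make the "two stages, no hazards" point concrete, here is a toy
sketch (Python; the opcodes and the fetch/execute loop are invented
for illustration, not an OGP proposal) of a zero-operand stack machine
where fetch and execute are essentially the whole pipeline:

```python
# Toy sketch of a two-stage MISC-style stack machine.
# Opcodes and encoding are invented for illustration only.

def run(program, stack=None):
    """Fetch/execute loop for a tiny zero-operand stack machine."""
    stack = stack or []
    pc = 0
    while pc < len(program):
        op = program[pc]            # "fetch" stage
        pc += 1
        if isinstance(op, int):     # "execute" stage: literal push
            stack.append(op)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "DUP":
            stack.append(stack[-1])
    return stack

# (2 + 3) * 4 -- operands always come from the stack top, so there
# are no register names, and no register-dependency hazards to track.
print(run([2, 3, "ADD", 4, "MUL"]))  # -> [20]
```

Since every instruction's inputs are implicitly the stack top, there
is nothing for a hazard-detection unit to do, which is exactly why the
pipeline can stay so short.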

We have to stick with what is practical and what will work well.
Plus, we are limited by the following restrictions:

Low clock rate (200-300 MHz?)
Small transistor budget

Whatever we make must fit within these two restrictions.

I do have a question, though: does the GPU in the current OGP design
have direct access to the memory, or does it reach video memory
through a memory controller of sorts?

If we could somehow give the GPU direct access to video memory,
basically 64MB of registers, then we would have a design with some
powerful performance benefits. We could then design the MISC modules
to accept memory locations directly, so you could say, "multiply
0x0004 with 0x01004, placing the result in 0x02004, executing it
0x0010 times."
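As a sketch of what that instruction could mean (Python; the helper
name, the flat-array model of video memory, and the fill values are
all assumptions for illustration):

```python
# Sketch of "multiply 0x0004 with 0x01004 into 0x02004, 0x0010 times",
# treating video memory as one big register file. The instruction
# format and the mul_block helper are invented for illustration.

vram = [0] * 0x10000  # model a small slice of video memory

def mul_block(mem, src_a, src_b, dst, count):
    """mem[dst+i] = mem[src_a+i] * mem[src_b+i] for i in 0..count-1."""
    for i in range(count):
        mem[dst + i] = mem[src_a + i] * mem[src_b + i]

# Fill the two operand blocks, then fire the "instruction" once:
for i in range(0x0010):
    vram[0x0004 + i] = i    # 0, 1, 2, ...
    vram[0x01004 + i] = 2   # constant 2
mul_block(vram, 0x0004, 0x01004, 0x02004, 0x0010)
print(vram[0x02004:0x02004 + 4])  # -> [0, 2, 4, 6]
```

One such instruction replaces a load/load/multiply/store/loop-branch
sequence, which is where the performance benefit would come from.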

We find ourselves in a catch-22 here. I'm afraid that a RISC design
is not going to be fast enough; we would be trying to push too many
instructions through the chip too fast. However, a CISC design is not
going to be much better. We cannot go with out-of-order execution
because of the complexity, but performance is going to suffer unless
we can execute more than one instruction at a time.
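The quoted mail below makes the throughput side of this concrete: a
ten-cycle multiplier is fine as long as you can feed it a new,
independent pair of operands every cycle. Here is a rough model of
that (Python; the ten-stage depth comes from the quoted example, the
rest is an invented illustration):

```python
from collections import deque

DEPTH = 10  # pipeline depth, matching the ten-cycle multiply example

def simulate(pairs):
    """Feed operand pairs into a DEPTH-stage multiplier, one per cycle."""
    pipe = deque([None] * DEPTH)   # pipeline stages, oldest at the left
    pairs = deque(pairs)
    results, cycles = [], 0
    while pairs or any(s is not None for s in pipe):
        done = pipe.popleft()      # whatever entered DEPTH cycles ago
        if done is not None:
            results.append(done[0] * done[1])
        pipe.append(pairs.popleft() if pairs else None)
        cycles += 1
    return results, cycles

# 100 independent multiplies finish in about 110 cycles, not 100 * 10:
res, cycles = simulate([(i, 2) for i in range(100)])
print(len(res), cycles)  # -> 100 110
```

The catch, of course, is that this only works if the operand pairs
really are independent, for example because each one belongs to a
different fragment.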

But what someone said here was right: we won't know how it works
until we start trying to program it. That's the wonderful thing about
OGP, right? So when we get the first prototypes out, those of us who
feel like it can program our own GPUs on them.

Timothy

On 4/17/06, Timothy Miller <[EMAIL PROTECTED]> wrote:
> On 4/17/06, Lourens Veen <[EMAIL PROTECTED]> wrote:
>
> >
> > You can still get high throughput with pipelined functional units. It
> > doesn't matter much if it takes ten cycles to multiply two numbers (or
> > vectors of numbers), as long as you can provide two new numbers to
> > multiply every cycle, and read out the result of the calculation that
> > started ten cycles ago. Throughput will still be ok (or at least as
> > good as it gets at the given clock rate).
> >
>
> One of the things we're forgetting is that static scheduling is way
> behind the curve, but dynamic scheduling requires lots of extra
> hardware.  Unless we hand-code most of what we run on this or have
> some massive peephole optimizer library, we're always going to get
> sub-optimal code.
>
> The only way to keep the computing units busy with a new fragment
> every cycle is to avoid data dependency hazards.  We can only do that
> if we can overlap the processing for different fragments (like
> threads).  Then we have to keep track of multiple processor states.
>
> Only slightly related, the statistics I have on branch delay slots say
> that they're only fillable about 60% of the time and they're only
> useful to the computation about 80% of the time when they're filled,
> making delay slots only useful about 50% of the time.
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)
>


--
I think computer viruses should count as life. I think it says
something about human nature that the only form of life we have
created so far is purely destructive. We've created life in our own
image. (Stephen Hawking)
