Re: [Open-graphics] [Discussion] Larrabee: Intel's multi-core graphics computing architecture

James Richard Tyrer Wed, 26 Aug 2009 15:05:14 -0700

Timothy Normand Miller wrote:

Today, I'm leading a round-table discussion at OSU regarding Intel's
Larrabee architecture.  I thought that perhaps people on this list
might be interested in engaging in a separate discussion.  Larrabee is
a multicore processor that has several in-order x86 cores enhanced
with special vector processing units, specialized cache architecture,
and other things that optimize it for graphics.  Most things that OGA
will do in dedicated hardware, they do in software, with the exception
of texture filtering, which is just too slow to do in software.  This
paper points out a number of things that are relevant even to our
fixed-function design, such as avoiding wasted bandwidth caused by
over-draw.  But even more, it covers a lot of issues we'll have to
deal with should we ever decide to do a programmable GPU.

IIRC, I mentioned this idea a while ago: a graphics board based on several CPUs or DSPs, with a display controller to drive the display.


IIUC, from the Wikipedia article:

http://en.wikipedia.org/wiki/Larrabee_(GPU)

and Intel's website,

http://www.intel.com/technology/visual/microarch.htm

and the paper (I also read a trade-paper article that had the architecture somewhat confused, so the paper is the best source):

the Intel chip does not use standard CPUs. It appears that the CPUs have a 16-wide vector processing unit that will handle 64-bit floats, as compared with SSE, which had 128 bits that can be partitioned into different widths for different-size data objects. It appears to me that if this is only for display, 64-bit float is overkill. 32-bit is more than sufficient, and I wonder if 16-bit (half precision or integer) would be sufficient.
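To make the width comparison concrete, here is a small sketch (plain Python, function name hypothetical) of how many lanes a fixed-width SIMD register gives for each element size, and how wide a register a 16-lane unit would need at each precision. It is just the arithmetic register width = lanes x element width, not a description of Larrabee's actual register file.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of SIMD lanes a register of register_bits holds for element_bits-wide data."""
    if register_bits % element_bits:
        raise ValueError("element size must evenly divide the register width")
    return register_bits // element_bits


# SSE: one 128-bit register partitioned for different data sizes.
for bits in (64, 32, 16, 8):
    print(f"SSE 128-bit, {bits}-bit elements: {lanes(128, bits)} lanes")

# Register width a 16-lane unit needs at each element size.
for bits in (64, 32, 16):
    print(f"16 lanes x {bits}-bit = {16 * bits}-bit register")
```

Halving the element size doubles the lane count for the same register, which is why 32-bit or 16-bit data would get more work per instruction than 64-bit floats.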

It appears to me that the major advantage is the large vector processors, which can handle a 4x4 matrix multiply with a single data load (but probably with multiple instructions, or in microcode). So what we are really talking about here is not what I mentioned previously, but rather multiple 16-wide vector processors, each having a CPU to control it. It is clearly the number of MAC operations per clock that is important, no matter how you accomplish it.
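A rough sketch of why a 4x4 transform fits a 16-wide unit so well: a 4x4 matrix times a 4-vector is exactly 16 multiply-accumulates. Plain Python stands in for the vector hardware here; the lane layout (row-major matrix, vector replicated across the four rows) and the helper names are assumptions for illustration, not Larrabee's actual instruction sequence.

```python
def mat4_vec4_16wide(m, v):
    """Transform 4-vector v by 4x4 matrix m (row-major list of 16 values),
    emulating one 16-lane multiply followed by a 4-lane sum per row."""
    assert len(m) == 16 and len(v) == 4
    # "Load" all 16 matrix elements as one vector; replicate v so lane i holds v[i % 4].
    v_rep = [v[i % 4] for i in range(16)]
    # One (emulated) 16-wide multiply: 16 products at once.
    products = [a * b for a, b in zip(m, v_rep)]
    # Sum each group of 4 lanes to get one output component per matrix row.
    return [sum(products[4 * r:4 * r + 4]) for r in range(4)]


def macs_per_mat_mul(n=4):
    """MACs for an n x n matrix-matrix multiply: n*n outputs, n MACs each."""
    return n ** 3


identity = [1 if r == c else 0 for r in range(4) for c in range(4)]
print(mat4_vec4_16wide(identity, [1, 2, 3, 4]))
print(macs_per_mat_mul(), "MACs, i.e.", macs_per_mat_mul() // 16, "passes of a 16-wide unit")
```

A full 4x4 matrix-matrix multiply is 64 MACs, i.e. four passes of a 16-wide unit, so it is the MAC count per clock, not the instruction count, that sets the speed.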


I noticed that it didn't say how much multiplication hardware each CPU has.

As we have discussed, having this much hardware available means that it is often wasted, since 4x4 matrix multiplies are not often used when executing GLSL, but they do exist.
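The waste can be put in numbers. This sketch assumes one pixel or vertex is processed per vector instruction and lists a few illustrative operation types (not measured from real shaders), showing what fraction of a 16-wide unit each one fills.

```python
VECTOR_WIDTH = 16  # lanes in the vector unit discussed above

# Lanes each operation fills when one pixel/vertex is handled per instruction.
# Illustrative operation list only; not taken from real GLSL workloads.
OPS = {
    "scalar multiply": 1,
    "vec3 dot product": 3,
    "vec4 multiply-add": 4,
    "mat4 * vec4 transform": 16,
}


def utilization(lanes_used, width=VECTOR_WIDTH):
    """Fraction of the vector unit's lanes doing useful work."""
    return min(lanes_used, width) / width


for name, used in OPS.items():
    print(f"{name:24s} {utilization(used):6.1%}")
```

Under that assumption, typical vec4 shader arithmetic keeps only a quarter of the lanes busy; only the full 4x4 transform uses all of them.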

I wonder if it would be practical to have multiple chips and achieve the same thing. That would make an extendable and upgradeable board.

TI has announced a new DSP, the TMS320C6748, that does both fixed-point and floating-point operations:


http://focus.ti.com/docs/prod/folders/print/tms320c6748.html

that could handle both pixel operations and geometry operations. It has only a 16-bit memory interface, but it has 128K of internal SRAM, and IIUC, DSPs normally work on internal memory and use the internal DMA controller to move data in and out. It will do two 32-bit x 32-bit floating-point multiplies (32-bit result) per clock at 300 MHz.
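A back-of-the-envelope sketch of what those figures give, and of the multi-chip idea above. It uses only the numbers quoted here (2 float multiplies per clock, 300 MHz) and assumes that multiplies are the bottleneck, that the adds and data movement are fully overlapped, and that chips scale perfectly linearly. None of that is guaranteed, so treat these as upper bounds. The function name is hypothetical.

```python
def peak_mults_per_second(mults_per_clock, clock_hz, chips=1):
    """Idealized peak multiply rate: ignores memory stalls, control overhead,
    and inter-chip traffic, and assumes perfectly linear scaling with chips."""
    return mults_per_clock * clock_hz * chips


MULTS_PER_MAT4_VEC4 = 16  # 4 rows x 4 multiplies per 4x4 vertex transform

one_chip = peak_mults_per_second(2, 300e6)
print(f"one C6748: {one_chip / 1e6:.0f} M float multiplies/s")
print(f"  at most {one_chip / MULTS_PER_MAT4_VEC4 / 1e6:.1f} M vertex transforms/s")

for n in (2, 4, 8):
    total = peak_mults_per_second(2, 300e6, chips=n)
    print(f"{n} chips: {total / 1e9:.1f} G multiplies/s (ideal linear scaling)")
```

With only a 16-bit external memory interface, keeping the multipliers fed from the 128K of internal SRAM would likely matter as much as the raw multiply rate.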


--
James Tyrer

Linux (mostly) From Scratch
_______________________________________________
Open-graphics mailing list
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
