On Sat, 17 Mar 2007 10:06:14 +0100
Dieter <[EMAIL PROTECTED]> wrote:

> > >   1) fully decode in TRV(sp?) chip?
> > >           adds complexity, chip takes longer to design
> > 
> > Infeasible with our budget.
> 
> Economic budget or time budget?  Do we have an idea of how much
> larger the die would be?  I suspect the main problem would be
> design time.

Both. It will use too much real estate on the FPGA/ASIC.
I assume that if we want somewhat complete h.264 support,
we'll need about half as much space as the rest of the GPU, possibly as much.

> > >   2) Partly in CPU, partly in TRV chip?
> > >           from previous discussion, this may be difficult
> > >           or impossible?
> > 
> > The only way IMHO.
> 
> But I keep reading that large amounts of data have to go back and
> forth between CPU and GPU?

Yes, that's not feasible, that's why I said we have to move
the SW/HW border from right to left.

> Would PCIe x16 be enough?

Definitely. I think even one or two lanes should be enough.
The question is only whether we can make an architecture that works
well enough with this, and it would have to incorporate the video player too.
But having such an architecture wouldn't do us any good. Nobody will support
it because it's too complicated to use (unless we are talking about
embedded systems).
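The "one or two lanes" figure is easy to sanity-check with back-of-the-envelope numbers. The sketch below is mine, not from any OGA code; it assumes a decoded 1080p stream in YUV 4:2:0 and the nominal 250 MB/s of a PCIe 1.x lane (real protocol overhead reduces the usable rate somewhat).

```python
# Rough bandwidth check behind the "one or two lanes" claim: how much
# data a decoded 1080p stream pushes versus what a PCIe 1.x lane carries.

def video_bandwidth_mb_s(width, height, fps, bytes_per_pixel=1.5):
    """Raw bandwidth of an uncompressed video stream in MB/s.
    1.5 bytes/pixel corresponds to YUV 4:2:0 subsampling."""
    return width * height * bytes_per_pixel * fps / 1e6

# Nominal per-lane throughput of PCIe 1.x (2.5 GT/s, 8b/10b encoding).
PCIE1_LANE_MB_S = 250.0

# 1080p30 in 4:2:0 works out to roughly 93 MB/s, well under one lane.
bw_1080p30 = video_bandwidth_mb_s(1920, 1080, 30)
```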

> There
> are other ways, like putting the GPU on the hypertransport bus,
> but that would *greatly* reduce the mainboards you could use.
> You could put a CPU on the OGC card, but that would increase
> the economic cost and power&heat cost significantly.

It would make it impossible to use OGC on the biggest market
we target: embedded systems.


> > >   3) wait for AMD to sell a AMD64 30,000+ x16 CPU?
> > 
> > Impossible, light isn't that fast.
> 
> IIRC they now have 6000+ x2, so they only have to find a 5x
> speed improvement. 

Yes, but 4 GHz is about the limit at which chips measuring
2x2 cm can be clocked, and even that needs very careful
design of the clock distribution and signal routing.
Keep in mind that 2 cm is about the distance a signal
can travel within one clock cycle (it depends a lot on
the wire model actually used). And then you still
need some safety margin to operate the chip reliably
under all specified conditions.


> (And of course I just made up the 30,000
> number, I don't know what the real requirement would be.)

I think a 3 GHz CPU should be enough for 1080p content,
even without DMA, but I don't have such a system available.
(3 GHz real clock, not AMD's rating.)

> I read that Barcelona will have 128 bit SSE instructions
> and x4.  I assume they will eventually have a CPU that is fast
> enough to decode video and also do other things at the same time.

CPUs are fast enough to decode video. You just have to use
the proper software architecture and the already available
techniques. As I said, none of today's video players
use DMA; that alone should bring a performance boost of 10-40%.

> > >   4) wait for TI to sell a faster DSP?
> > 
> > DSPs won't help as much as you think. DSPs are for signal
> > processing, not for video decoding. Yes, video decoding
> > is to a certain extent signal processing, but not in a
> > way that maps easily onto DSP-like systems.
> 
> If TI can bump the speed up 2x or so (just a guess) it should
> be fast enough.  I have no idea when or if we will see this.

I don't know the underlying architecture TI uses for
their DSPs, so I cannot comment on this.

 
> > >   5) find out what the standalone Blu-Ray and HD-DVD
> > >      players use, and see if we can use that
> > 
> > Forget that... unless you want to deal with NDAs and
> > closed-source libraries.
> 
> I assume someone makes a chip that decodes in hardware that
> these things use?  If so, then it is possible to do in hardware.

I never said it's impossible. It's just infeasible to do
it with OGA, or more precisely with OGA1.

> What really matters is that we have documentation on how to talk
> to the board, allowing device drivers to be written.  If that is
> possible, we could hold our nose and use a NDA'd chip.  Depends
> on what information they will make available.  Some companies
> will not tell you what you need to know even with a NDA.

And that's usually the case with set-top-box chipsets. You don't
even get the information on how to use the video decoders
or the video output, just some binaries. If they don't work, too bad.

> 
> > >   6) something else?
> > 
> > Keep the current approach.
> 
> Which is basically the same as 3 - wait for a super fast CPU.
> And kiss a big part of the market goodbye.

No. Implement fast DMA, upsampling and YUV->RGB support in
hardware. Make those things easily available to user-space
programs. Make use of them and you'll get the needed performance.
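For illustration, the per-pixel math a hardware YUV->RGB stage would implement could look like the sketch below. The coefficients are the standard BT.601 full-range ones; the function itself is my sketch, not OGA code.

```python
# Per-pixel YUV -> RGB conversion (BT.601, full range, U/V centred on 128).
# In hardware this would be a small fixed-point matrix multiply per pixel;
# floating point is used here only for readability.

def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV pixel to an (r, g, b) tuple of ints."""
    d = u - 128.0                                # chroma offsets
    e = v - 128.0
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    clamp = lambda x: max(0, min(255, round(x))) # saturate to 8 bits
    return clamp(r), clamp(g), clamp(b)
```

Doing this (plus the chroma upsampling that feeds it) per pixel for every frame is exactly the kind of regular, branch-free work that is cheap in hardware but eats CPU cycles in software.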

I also plan to have a look at whether a deinterlacer and an inverse
telecine filter could be implemented reasonably well in hardware. Both
would allow skipping a filter stage that is currently performed in
software, leaving it to the hardware.

I think that, properly designed, a deinterlacer should be doable
in OGA1 (it needs to access a few lines in parallel, but I
think it can be pipelined). I'm not so sure whether an inverse
telecine filter can be done. But then again, inverse telecine
isn't as CPU-intensive as deinterlacing.
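To illustrate why a deinterlacer only needs a few lines at a time (and so pipelines well), here is a minimal line-doubling ("bob") deinterlacer in software. It is the simplest possible variant, shown only as a sketch of the data-access pattern, not as a proposal for the OGA implementation.

```python
# Minimal "bob" deinterlacer: expands one field to a full frame by
# averaging neighbouring field lines for the missing rows. At any
# moment only two field lines are live, which is why the loop maps
# naturally onto a pipelined hardware implementation.

def bob_deinterlace(field):
    """field: list of pixel rows (one interlaced field).
    Returns a frame with twice as many rows."""
    frame = []
    n = len(field)
    for i, line in enumerate(field):
        frame.append(line)                      # original field line
        nxt = field[min(i + 1, n - 1)]          # next line, clamped at bottom
        # missing line = average of the lines above and below
        frame.append([(a + b) // 2 for a, b in zip(line, nxt)])
    return frame
```

Motion-adaptive deinterlacers do considerably more work per pixel, but the line-buffer access pattern stays similarly local.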


                        Attila Kinali
-- 
Linux is... when you can solve even simple things with a cryptic
post-fix language
                        -- Daniel Hottinger
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
