Tim Schmidt wrote:
On 4/30/07, James Richard Tyrer <[EMAIL PROTECTED]> wrote:
Perhaps you are correct. When we can buy a motherboard with two quad-core
CPUs running at 3 GHz, decoding H.264 should be no problem. Actually,
you will be able to do so in a few months, but the price will be rather
high.
But ... (tm)
The question is how much of the system's power will be required to
do this. If it takes over 50%, this is still going to slow down
whatever else you are doing on your system.
So, no matter how powerful your PC (within reason), it is still going
to be useful to have hardware that will do DCT/IDCT and YUV <> RGB
conversion, since such hardware will always be more efficient at these
specialized tasks. The same technology that makes the 2 x 4-core
computer affordable will also allow an 8x8 DCT/IDCT and YUV <> RGB with
a latency of a few clocks (getting it down to the theoretical minimum).
You should be able to get YUV <> RGB down to 3 clocks since it is a
[3 vector] * [3 x 3 matrix] + a [3 vector] -- 3 multipliers and 3 adders
per color (you can do this today, but it is too expensive). I still
haven't found my DSP textbook, so I don't know what the theoretical
minimum would be for the DCT/IDCT.
If wavelets become the standard method of compression, then hardware
that is good at performing a 2D FIR filter will be the next thing.
All sounds good... I agree that specialized hardware can be more
efficient - it's certainly more efficient than doing the exact same
thing with generalized hardware. But we need the generalized hardware
anyway, so specialized hardware needs to be a LOT more efficient to be
worth including. Depending on how the system is used, it may be dead
weight most of the time.
Yes, I agree. It appears that the two things which I mentioned would be
used often enough and would be sufficiently faster. The DCT/IDCT is used
in many still and video compression/decompression methods. AFAIK (and
please correct me if I am wrong), all currently used video compression
is done in YUV (or the slightly different HD equivalent), so the
YUV -> RGB would also be used a lot. Actually, the YUV <> RGB unit could
do any similar transform (it is fixed point and only 8 bits [in and
out]); all you need to do is reload the coefficient and constant
registers -- and the fact that it is only 8 bits means that it wouldn't
be that expensive.
Implementing enough hardware to decode a specific type of video
stream (and ostensibly other types as well) seems a bit much.
I would not suggest specialized hardware for only a specific type of
video stream. But the DCT/IDCT is used in many compression/decompression
methods, and in the ones that use it, it is a large part of the
computation.
It seems the sort of problem best attacked by implementing hardware
that accelerates the tough part of the decoding as a general-purpose
instruction, and allows us to support newer (and older!) codecs with
a little bit of software/firmware. MMX / 3DNow! / SSE attempt to do
this on the CPU side; a GPU equivalent seems warranted.
The 128-bit SIMD processor is very useful, but it is floating point, and
it tends to be a waste of resources to compute a fixed-point DCT/IDCT or
the 8-bit YUV <> RGB on it, yet these operations require a lot of
multiplies. Note that if you have the full array (9 multipliers and 9
adders) for YUV <> RGB, it can run at a slower clock speed.
For a 4 x 32-bit float SIMD processor, AltiVec appears to be the best.
It is only available in the e600 PowerPC core, but I wondered if it
could be used without the processor. Currently, Freescale's processors
using the e600 core appear to be too expensive. OTOH, the (now) AMD
(formerly NSM) Geode processors have MMX & 3DNow!.
--
JRT
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)