Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-11 Thread Ewald Snel

[...]

> BTW - does anyone know why the mga driver internally converts to 422
> format ? It seems to me that mga 400 and 450 chips do support 420
> planar format... (I saw some sample code using it, I can probably find
> it back if needed). I think XFree would benefit from using this
> feature instead of converting to nonplanar 422.

I also wrote a patch for this several months ago (even before XFree86-4.1.0).
If you're interested, I've uploaded it here :

http://rambo.its.tudelft.nl/~ewald/XFree86-4.0.99.3-mga-xv-planar-data.patch

Decoding DVD movies on a PII-350 is about 13% faster using the planar format 
instead of converting to YUY2. Unfortunately, the Matrox hardware is not 
capable of filtering the chrominance component in the vertical direction, so 
you can't have planar output and vertical chroma filtering at the same time.
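For reference, here is a minimal C sketch of the planar 4:2:0 to packed YUY2 step being discussed. It assumes separate Y, U and V plane pointers (so YV12 vs I420 plane order does not matter), even width and height, and plain chroma line doubling with no vertical filtering. The function name and signature are hypothetical; this is not the mga driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack a planar 4:2:0 frame into packed YUY2
 * (byte order Y0 U Y1 V). Each chroma row is reused for two luma rows
 * (plain line doubling, no vertical chroma filtering).
 * Assumes width and height are even. */
static void planar420_to_yuy2(uint8_t *dst, size_t dst_pitch,
                              const uint8_t *y, size_t y_pitch,
                              const uint8_t *u, const uint8_t *v,
                              size_t c_pitch, int width, int height)
{
    for (int row = 0; row < height; row++) {
        const uint8_t *yl = y + (size_t)row * y_pitch;
        /* row / 2: two consecutive luma rows share one chroma row */
        const uint8_t *ul = u + (size_t)(row / 2) * c_pitch;
        const uint8_t *vl = v + (size_t)(row / 2) * c_pitch;
        uint8_t *d = dst + (size_t)row * dst_pitch;

        for (int x = 0; x < width; x += 2) {
            *d++ = yl[x];      /* Y0 */
            *d++ = ul[x / 2];  /* U shared by the pixel pair */
            *d++ = yl[x + 1];  /* Y1 */
            *d++ = vl[x / 2];  /* V shared by the pixel pair */
        }
    }
}
```

The repeated chroma row (`row / 2`) is the "doubling the scanlines" that comes up later in the thread.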

> Cheers,

bye,

ewald
___
Xpert mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/xpert



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-11 Thread Ewald Snel

> At 11:26 AM 4/01/02 +0100, Ewald Snel wrote:

(sorry for the duplicate message, it was delayed for one week (see date))

[...]

> It would be interesting to see if the same could be achieved with 3DNow!
> instructions, as this would provide a welcome boost for anyone with an AMD
> K6-2 or K6-3 or any of the other 3DNow! capable CPUs. I'm sure there are

Using MMX will benefit any CPU capable of MMX instructions, including AMD K6, 
K6-2, K6-3 and Athlon/Duron processors. That's why I did not use SSE or 
3DNow!.

> also a number of other platforms that could use in-line assembly to do the
> same (eg: PPC/Altivec).
>
> Out of interest, how much in-line assembly code are you referring to?
> Anywhere some of us can get a look-see?

Here's an image of what it looks like ...
http://rambo.its.tudelft.nl/~ewald/xfree86-chrominance-filter.jpg

And here are some patches ...
http://rambo.its.tudelft.nl/~ewald/

bye,

ewald



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-10 Thread Stuart Young

At 11:26 AM 4/01/02 +0100, Ewald Snel wrote:
>Hi,
>
>Could I use MMX assembly for improving the mga video driver? I wrote a
>vertical chrominance filter (*) for the XVideo module using inline MMX
>assembly. This allows me to improve output quality without any speed penalty.

It would be interesting to see if the same could be achieved with 3DNow! 
instructions, as this would provide a welcome boost for anyone with an AMD 
K6-2 or K6-3 or any of the other 3DNow! capable CPUs. I'm sure there are 
also a number of other platforms that could use in-line assembly to do the 
same (eg: PPC/Altivec).

Out of interest, how much in-line assembly code are you referring to? 
Anywhere some of us can get a look-see?


Stuart Young - [EMAIL PROTECTED]
(aka Cefiar) - [EMAIL PROTECTED]

[All opinions expressed in the above message are my]
[own and not necessarily the views of my employer..]




Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-10 Thread Michel Lespinasse

On Fri, Jan 04, 2002 at 11:26:02AM +0100, Ewald Snel wrote:
> Could I use MMX assembly for improving the mga video driver? I wrote a 
> vertical chrominance filter (*) for the XVideo module using inline MMX 
> assembly. This allows me to improve output quality without any speed penalty.

BTW - does anyone know why the mga driver internally converts to 422
format ? It seems to me that mga 400 and 450 chips do support 420
planar format... (I saw some sample code using it, I can probably find
it back if needed). I think XFree would benefit from using this
feature instead of converting to nonplanar 422.

Cheers,

-- 
Michel "Walken" LESPINASSE
Is this the best that god can do ? Then I'm not impressed.



[Xpert]Using MMX assembly (for video card drivers)

2002-01-10 Thread Ewald Snel

Hi,

Could I use MMX assembly for improving the mga video driver? I wrote a 
vertical chrominance filter (*) for the XVideo module using inline MMX 
assembly. This allows me to improve output quality without any speed penalty.

Of course, I'm using "#ifdef USE_MMX_ASM" and the original C code as an 
alternative for other CPU architectures. Runtime detection of MMX support is 
not included yet, but will be added if MMX is allowed.
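A minimal sketch of how runtime detection could sit next to the `#ifdef USE_MMX_ASM` path, assuming a GCC- or Clang-style compiler (their `<cpuid.h>` provides `__get_cpuid`). The names `row_filter_c`, `row_filter_mmx` and `pick_row_filter` are hypothetical, and the MMX variant is only declared here; it would be the inline-assembly version. The C fallback uses the 0.75/0.25 weights discussed elsewhere in the thread, with +2 rounding added as an assumption.

```c
#include <stdint.h>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

#define CPUID1_EDX_MMX (1u << 23) /* CPUID leaf 1, EDX bit 23 = MMX */

/* 1 if the CPU reports MMX, 0 otherwise (always 0 on non-x86 builds). */
static int cpu_has_mmx(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx & CPUID1_EDX_MMX) != 0;
#else
    return 0;
#endif
}

typedef void (*row_filter_fn)(uint8_t *dst, const uint8_t *nearest,
                              const uint8_t *second, int n);

/* Portable fallback: dst = 0.75 * nearest + 0.25 * second, rounded. */
static void row_filter_c(uint8_t *dst, const uint8_t *nearest,
                         const uint8_t *second, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((3 * nearest[i] + second[i] + 2) >> 2);
}

#ifdef USE_MMX_ASM
/* Hypothetical: implemented elsewhere with inline MMX assembly. */
void row_filter_mmx(uint8_t *dst, const uint8_t *nearest,
                    const uint8_t *second, int n);
#endif

/* Pick the implementation once, e.g. when the XVideo adaptor is set up. */
static row_filter_fn pick_row_filter(void)
{
#ifdef USE_MMX_ASM
    if (cpu_has_mmx())
        return row_filter_mmx;
#endif
    return row_filter_c;
}
```

Choosing a function pointer once keeps the per-pixel loop free of feature checks; builds without `USE_MMX_ASM` always get the C version.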

Thanks in advance,

ewald

(*) This fixes red blockiness (2x2 pixels) for DVD/MPEG movies (attachment).


Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Billy Biggs

  To reply to my own mail  :)

Billy Biggs ([EMAIL PROTECTED]):

> > It's actually 0.5 pixel (my mistake :)) using the following filter :
> > 
> > o   o   (c=c1)
> >  c1
> > o   o   (c=.5*c1 + .5*c2)
> > 
> > o   o   (c=c2)
> >  c2
> > o   o   (c=.5*c2 + .5*c3)
> 
>   I don't think this is right for MPEG2.

  I sent this and realized I might look like an asshole.  :)  This
should read:

  Thanks, I see what you mean now, and yeah, I think this filter is
wrong for filtering chroma from MPEG2.  :)

  Apologies.

-- 
Billy Biggs
[EMAIL PROTECTED]



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Billy Biggs

Ewald Snel ([EMAIL PROTECTED]):

> > Please, please correct me if I'm wrong here.  In MPEG sampling, the
> > chrominance sample is halfway between the two luminance samples on
> > the same vertical scanline (per ISO/IEC 13818-2):
> 
> I think you're right, my interpolation looks like this :
> 
> o   o   (c=.75*c1 + .25*c0)
>  c1
> o   o   (c=.75*c1 + .25*c2)
> 
> o   o   (c=.75*c2 + .25*c1)
>  c2
> o   o   (c=.75*c2 + .25*c3)

  You mean you think I'm wrong. :)  My picture was wrong, yours makes
total sense, and I now believe your filter is reasonable.

> It's actually 0.5 pixel (my mistake :)) using the following filter :
> 
> o   o   (c=c1)
>  c1
> o   o   (c=.5*c1 + .5*c2)
> 
> o   o   (c=c2)
>  c2
> o   o   (c=.5*c2 + .5*c3)

  I don't think this is right for MPEG2.

-- 
Billy Biggs
[EMAIL PROTECTED]



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Ewald Snel

Hi,

[...]

> > Something like that, the filter uses 0.75x nearest chrominance sample
> > and 0.25x second nearest chrominance sample. This is more accurate as
> > it doesn't shift the chrominance signal by 1 pixel.
>
>   Please, please correct me if I'm wrong here.  In MPEG sampling, the
> chrominance sample is halfway between the two luminance samples on the
> same vertical scanline (per ISO/IEC 13818-2):

I think you're right, my interpolation looks like this :

o   o   (c=.75*c1 + .25*c0)
 c1
o   o   (c=.75*c1 + .25*c2)

o   o   (c=.75*c2 + .25*c1)
 c2
o   o   (c=.75*c2 + .25*c3)
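The diagram translates into a per-plane loop along these lines. This is a plain C sketch, not the code from the patch: the function name is hypothetical, the +2 rounding is an assumption, and at the top and bottom of the picture, where a neighbouring chroma row would fall outside the image, the current row is simply reused.

```c
#include <stddef.h>
#include <stdint.h>

/* Vertical chroma upsampling for one chroma plane: src has src_rows rows,
 * dst receives 2 * src_rows rows.
 *   output row 2k   = .75 * c[k] + .25 * c[k-1]
 *   output row 2k+1 = .75 * c[k] + .25 * c[k+1]
 * Missing neighbours at the edges fall back to c[k] (an assumption);
 * +2 before the shift rounds to nearest. */
static void upsample_chroma_3_1(uint8_t *dst, size_t dst_pitch,
                                const uint8_t *src, size_t src_pitch,
                                int width, int src_rows)
{
    for (int k = 0; k < src_rows; k++) {
        const uint8_t *cur = src + (size_t)k * src_pitch;
        const uint8_t *above = src + (size_t)(k > 0 ? k - 1 : k) * src_pitch;
        const uint8_t *below =
            src + (size_t)(k + 1 < src_rows ? k + 1 : k) * src_pitch;
        uint8_t *even = dst + (size_t)(2 * k) * dst_pitch;
        uint8_t *odd = even + dst_pitch;

        for (int x = 0; x < width; x++) {
            even[x] = (uint8_t)((3 * cur[x] + above[x] + 2) >> 2);
            odd[x] = (uint8_t)((3 * cur[x] + below[x] + 2) >> 2);
        }
    }
}
```

For a single column with chroma values 0 and 100 this produces 0, 25, 75, 100, instead of the 0, 0, 100, 100 that plain line doubling gives.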

[...]

>   So, are not the chroma samples above and below the same distance away?
> I thought this was the purpose of MPEG sampling, that is, it's
> reasonable to convert to 4:2:2 sampling by doubling the scanlines.

It's reasonable, but doubling the scanlines will make the image look a little 
blocky as both scanlines use the same chrominance values. That's why you 
should use filtering.

>   Are you sure that maybe the images where you see that nasty chroma
> artifact aren't from when the DVD is using interlaced encoding?  In this
> case, each second chroma sample is from a different field, and you can
> get blocky errors because you don't correlate samples correctly.

The source was a non-interlaced MPEG-1 video file. The red blocks are very 
small for (high resolution) DVD movies, but they are still visible.

>   What do you mean by shifting the chroma by one pixel?

It's actually 0.5 pixel (my mistake :)) using the following filter :

o   o   (c=c1)
 c1
o   o   (c=.5*c1 + .5*c2)

o   o   (c=c2)
 c2
o   o   (c=.5*c2 + .5*c3)
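The half-pixel figure can be checked with a little arithmetic. Assume, as in the diagrams in this thread, that chroma row $k$ sits halfway between luma rows $2k$ and $2k+1$, so output row $y$ should sample chroma at coordinate $(y - \tfrac12)/2$. A filter with weight $w_i$ on chroma row $i$ effectively samples at $\sum_i w_i\, i$:

```latex
\begin{aligned}
\text{target:}\quad & y = 2k \mapsto k - \tfrac14, & y = 2k+1 \mapsto k + \tfrac14 \\
0.75/0.25\text{ filter:}\quad & 0.75k + 0.25(k-1) = k - \tfrac14, & 0.75k + 0.25(k+1) = k + \tfrac14 \\
0.5\text{ filter:}\quad & k, & 0.5k + 0.5(k+1) = k + \tfrac12
\end{aligned}
```

The 0.75/0.25 filter lands exactly on the target, while the 0.5 filter is off by $\tfrac14$ chroma row on every output row; one chroma row spans two luma rows, so that is the 0.5 pixel shift. If chroma row $k$ were instead co-sited with luma row $2k$ (target $y/2$, i.e. $k$ and $k + \tfrac12$), the 0.5 filter would be the exact one, which is the dependence on chroma siting that Erik points out.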

bye,

ewald



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Erik Walthinsen

On Fri, 4 Jan 2002, Billy Biggs wrote:

>   Please, please correct me if I'm wrong here.  In MPEG sampling, the
> chrominance sample is halfway between the two luminance samples on the
> same vertical scanline (per ISO/IEC 13818-2):
>
>    o   o      where   o == luma sample
>    x                  x == chroma sample
>    o   o
Note that this depends on which version of MPEG you're talking about.  I
forget which (I can look it up if anyone's interested), but one of the
MPEG standards specifies that the chroma samples are located between the
lumas in both dimensions, i.e.:

o   o
  x
o   o

>   So, are not the chroma samples above and below the same distance away?
> I thought this was the purpose of MPEG sampling, that is, it's
> reasonable to convert to 4:2:2 sampling by doubling the scanlines.
Possibly, but you have to beware of what the chroma position is for the 4:2:2
as well.  If the 4:2:2 specifies colocated first luma and chroma, it will
work nicely for the first form (above).  If in the middle, it'll work for
the second form.

>   What do you mean by shifting the chroma by one pixel?
If a chroma sample is colocated with a luma sample (in either dimension),
you get the following:

ooooo
 x x
|^|^|

Where a single chroma sample impacts three adjacent pixels (note the
difference between pixel and sample...), and the luma samples in the
middle actually get chroma from two different chroma samples.  In this
case you have to give differing amounts to each new (resampled) sample,
according to the percentages mentioned previously.
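The "differing amounts" can be computed from the siting rather than hard-coded. A sketch, assuming plain linear interpolation between the two nearest chroma rows; `chroma_taps_for_row` and its `offset` parameter are hypothetical names, where `offset` is the position of chroma row 0 measured in luma rows: 0.5 for the halfway siting in Billy's diagram, 0.0 for chroma co-sited with the first luma row.

```c
/* Two-tap vertical chroma interpolation for 2:1 upsampling.
 * For output (luma) row y, row0/row1 are the source chroma rows and
 * w1 is the weight of row1 (row0 gets 1 - w1). Rows outside the
 * picture (e.g. -1) are left for the caller to clamp. */
typedef struct {
    int row0;
    int row1;
    double w1;
} chroma_taps;

static chroma_taps chroma_taps_for_row(int y, double offset)
{
    double pos = (y - offset) / 2.0; /* position in chroma-row units */
    int base = (int)pos;             /* truncates toward zero ...     */
    if (pos < base)                  /* ... so step down for negatives */
        base--;

    chroma_taps t;
    t.row0 = base;
    t.row1 = base + 1;
    t.w1 = pos - base;
    return t;
}
```

With `offset` 0.5 this reproduces Ewald's 0.75/0.25 table (row 2: 0.25*c0 + 0.75*c1, row 3: 0.75*c1 + 0.25*c2); with `offset` 0.0 it gives the c1, 0.5*c1 + 0.5*c2 pattern.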

  Erik Walthinsen <[EMAIL PROTECTED]> - System Administrator
__
   /  \GStreamer - The only way to stream!
  || M E G A* http://gstreamer.net/ *
  _\  /_




Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Billy Biggs

Ewald Snel ([EMAIL PROTECTED]):

> > > I wrote a vertical chrominance filter (*) for the XVideo module
> > > using inline MMX assembly. This allows me to improve output
> > > quality without any speed penalty.
> >
> > Do you mean for upsampling to 4:2:2 ?  How do you filter?  Do you
> > average to create the new chroma line?
> 
> Something like that, the filter uses 0.75x nearest chrominance sample
> and 0.25x second nearest chrominance sample. This is more accurate as
> it doesn't shift the chrominance signal by 1 pixel.

  Please, please correct me if I'm wrong here.  In MPEG sampling, the
chrominance sample is halfway between the two luminance samples on the
same vertical scanline (per ISO/IEC 13818-2):

   o   o      where   o == luma sample
   x                  x == chroma sample
   o   o

  So, if we look vertically down a 2-pixel wide line, we see:

   o1  o
   x1
   o2  o        o == luma sample
   x2           x == chroma sample
   o3  o
   x3
   o4  o

  So, are not the chroma samples above and below the same distance away?
I thought this was the purpose of MPEG sampling, that is, it's
reasonable to convert to 4:2:2 sampling by doubling the scanlines.

  Are you sure that maybe the images where you see that nasty chroma
artifact aren't from when the DVD is using interlaced encoding?  In this
case, each second chroma sample is from a different field, and you can
get blocky errors because you don't correlate samples correctly.

  What do you mean by shifting the chroma by one pixel?

-- 
Billy Biggs
[EMAIL PROTECTED]



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Ewald Snel

Hi,

> > I wrote a vertical chrominance filter (*) for the XVideo module using
> > inline MMX assembly. This allows me to improve output quality without
> > any speed penalty.
>
>   Do you mean for upsampling to 4:2:2 ?  How do you filter?  Do you
> average to create the new chroma line?

Something like that, the filter uses 0.75x nearest chrominance sample and 
0.25x second nearest chrominance sample. This is more accurate as it doesn't 
shift the chrominance signal by 1 pixel.

Here are the patches, the second one is for enabling the horizontal filtering 
in hardware:

http://rambo.its.tudelft.nl/~ewald/XFree86-4.1.99.4-mga-xv-mmx-chromafilter.patch
http://rambo.its.tudelft.nl/~ewald/XFree86-4.2.0-mga-xv-uvfilter.patch

These are not paired for Pentium MMX, but performance is already better than 
the C version (which compiles to slow "movzx" instructions). It's nearly 
optimal for AMD Athlon though (about 2 IPC using L1-cache).
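For comparison, the same 3:1 blend can be written with the compiler's MMX intrinsics from `<mmintrin.h>` instead of inline assembly. This is a sketch, not Ewald's patch: it widens eight bytes to 16-bit words, computes (3*nearest + second + 2) >> 2 (the rounding is an assumption), packs the result back with unsigned saturation, and falls back to plain C for the tail or when the compiler does not define `__MMX__`. The name `blend_3_1` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* dst[i] = (3 * nearest[i] + second[i] + 2) >> 2, eight pixels per MMX step. */
static void blend_3_1(uint8_t *dst, const uint8_t *nearest,
                      const uint8_t *second, size_t n)
{
    size_t i = 0;
#if defined(__MMX__)
    const __m64 zero = _mm_setzero_si64();
    const __m64 two = _mm_set1_pi16(2);

    for (; i + 8 <= n; i += 8) {
        __m64 a, b;
        memcpy(&a, nearest + i, 8); /* alignment-safe 8-byte loads */
        memcpy(&b, second + i, 8);

        /* Zero-extend the bytes to 16-bit words (low and high halves). */
        __m64 a_lo = _mm_unpacklo_pi8(a, zero), a_hi = _mm_unpackhi_pi8(a, zero);
        __m64 b_lo = _mm_unpacklo_pi8(b, zero), b_hi = _mm_unpackhi_pi8(b, zero);

        /* 3a + b + 2 = (a << 1) + a + b + 2; at most 1022, fits in a word. */
        __m64 lo = _mm_add_pi16(_mm_add_pi16(_mm_slli_pi16(a_lo, 1), a_lo),
                                _mm_add_pi16(b_lo, two));
        __m64 hi = _mm_add_pi16(_mm_add_pi16(_mm_slli_pi16(a_hi, 1), a_hi),
                                _mm_add_pi16(b_hi, two));

        /* Divide by 4 and pack the words back into eight bytes. */
        __m64 r = _mm_packs_pu16(_mm_srli_pi16(lo, 2), _mm_srli_pi16(hi, 2));
        memcpy(dst + i, &r, 8);
    }
    _mm_empty(); /* clear MMX state so later x87 floating point works */
#endif
    for (; i < n; i++) /* scalar tail (or the whole row without MMX) */
        dst[i] = (uint8_t)((3 * nearest[i] + second[i] + 2) >> 2);
}
```

The widening to words avoids the per-byte `movzx` loads of the scalar loop; it makes no attempt at Pentium MMX pairing.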

bye,

ewald



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Billy Biggs

> > I've also been playing with some mmx-ification of the XVideo
> > routines, for example I also did an SSE-4:2:0-to-4:2:2 function.
> 
> I just did this too, MMX only though. How many cycles/pixel did you
> end up with? What percentage of pairing did you achieve?

  I'll get some numbers in a sec.

> > There was some discussion on #xfree86 about efforts to have a nice
> > runtime detection mechanism somewhere.  Has anyone got any code for
> > this already done?  If not I might also have a go at it.
> 
> there are plenty of samples of this on Intel's site.

 And in many nice abstracted open source modules.  :)  Specifically I
meant code to put this somewhere appropriate in the X tree.

-- 
Billy Biggs
[EMAIL PROTECTED]



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Erik Walthinsen

On Fri, 4 Jan 2002, greg wright wrote:

> I just did this too, MMX only though. How many cycles/pixel did you
> end up with? What percentage of pairing did you achieve?
Note that only P5-core chips care about pairing, per se.  There are much
nastier issues involved in modern P6 cores.  I haven't thought about them
for quite a while, so it'd take me a while to dig out the stuff and put it
back into main memory, but I think I have a pretty good understanding of
how the P6 really works...

> there are plenty of samples of this on Intel's site.
Unfortunately that just isn't very useful outside Intel's world.  There
are about a half-dozen manufacturers of x86 chips that matter, and they
all have all sorts of bizarre quirks.  I ran across a SourceForge project a
few days ago (x86info I think) that tries to deal with that, but I didn't
look at the code.
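A sketch of vendor-neutral detection along these lines, again assuming a GCC- or Clang-style `<cpuid.h>`. It reads the vendor string from leaf 0, MMX and SSE from leaf 1 (EDX bits 23 and 25), and 3DNow! from AMD's extended leaf 0x80000001 (EDX bit 31); `__get_cpuid` returns 0 when a leaf is not supported, so such chips just report the feature as absent. The struct and function names are hypothetical, and this does not attempt to handle the per-vendor quirks mentioned above.

```c
#include <string.h>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

struct cpu_features {
    char vendor[13]; /* e.g. "GenuineIntel", "AuthenticAMD"; "" if unknown */
    int mmx;
    int sse;
    int amd_3dnow;
};

static struct cpu_features detect_cpu_features(void)
{
    struct cpu_features f;
    memset(&f, 0, sizeof f); /* everything "absent" by default */
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        /* The vendor string is spread over EBX, EDX, ECX in that order. */
        memcpy(f.vendor, &ebx, 4);
        memcpy(f.vendor + 4, &edx, 4);
        memcpy(f.vendor + 8, &ecx, 4);
    }
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.mmx = (edx >> 23) & 1;
        f.sse = (edx >> 25) & 1;
    }
    /* AMD extended leaf; unsupported leaves leave the flag at 0. */
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        f.amd_3dnow = (edx >> 31) & 1;
#endif
    return f;
}
```

A driver would call something like this once at start-up and pick between MMX, 3DNow! and plain C code paths; on non-x86 builds every flag stays 0.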

There's a larger issue when it comes to other architectures.  There are
similar but in some cases nastier problems on things like PPC and Alpha.
This is why I want to gather all this into a single library.  It would go
closely with my other projects, SpeciaLib and libcodec, which focus on
run-time specialization of time-critical kernels, such as the
motion-compensation code in an MPEG decoder, or color-space
conversion/transliterations, etc. (as in the 4:2:0 to 4:2:2 problem).

You can see a lot of this stuff at http://codecs.org/, though specialib
itself isn't there because it's not anywhere near formed enough for CVS.

  Erik Walthinsen <[EMAIL PROTECTED]> - System Administrator



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread greg wright

> I've also been playing with some mmx-ification of the XVideo routines,
> for example I also did an SSE-4:2:0-to-4:2:2 function.

I just did this too, MMX only though. How many cycles/pixel did you
end up with? What percentage of pairing did you achieve?

>   There was some discussion on #xfree86 about efforts to have a nice
> runtime detection mechanism somewhere.  Has anyone got any code for this
> already done?  If not I might also have a go at it.


there are plenty of samples of this on Intel's site.

--greg

 





Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Billy Biggs

Ewald Snel ([EMAIL PROTECTED]):

> Of course, I'm using "#ifdef USE_MMX_ASM" and the original C code as
> an alternative for other CPU architectures. Runtime detection of MMX
> support is not included yet, but will be added if MMX is allowed.

  I've also been playing with some mmx-ification of the XVideo routines,
for example I also did an SSE-4:2:0-to-4:2:2 function.

  There was some discussion on #xfree86 about efforts to have a nice
runtime detection mechanism somewhere.  Has anyone got any code for this
already done?  If not I might also have a go at it.

-- 
Billy Biggs
[EMAIL PROTECTED]



[Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Ewald Snel

Hi,

Could I use MMX assembly for improving the mga video driver? I wrote a 
vertical chrominance filter (*) for the XVideo module using inline MMX 
assembly. This allows me to improve output quality without any speed penalty.

Of course, I'm using "#ifdef USE_MMX_ASM" and the original C code as an 
alternative for other CPU architectures. Runtime detection of MMX support is 
not included yet, but will be added if MMX is allowed.

Thanks in advance,

ewald

(*) This fixes red blockiness (2x2 pixels) for DVD/MPEG movies
http://rambo.its.tudelft.nl/~ewald/xfree86-chrominance-filter.jpg