Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-26 Thread Carl Eugen Hoyos
2018-11-21 18:19 GMT+01:00, Lauri Kasanen :

> First I ran with THREADS=3, baseline blew up in
> fate-h264-conformance-frext-hpcafl_bcrm_c

I opened ticket #7570, thank you for the report!

Carl Eugen
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-26 Thread Carl Eugen Hoyos
2018-11-27 0:17 GMT+01:00, Carl Eugen Hoyos :
> 2018-11-17 9:12 GMT+01:00, Lauri Kasanen :
>> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt
>> yuv420p \
>> -f null -vframes 100 -v error -nostats -
>>
>> 1158 UNITS in planar1,   65528 runs,  8 skips
>>
>> -cpuflags 0
>>
>> 19082 UNITS in planar1,   65533 runs,  3 skips
>>
>> 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
>> takes as many cycles as the x86 SSE2 version, yikes it's fast.
>>
>> Note that this function uses VSX instructions, but is not marked so.
>> This is because several existing functions also make that mistake.
>> I'll submit a patch moving them once this is reviewed.
>>
>> v2: Remove !BE check
>> Signed-off-by: Lauri Kasanen 
>> ---
>>  libswscale/ppc/swscale_altivec.c | 53
>> 
>>  1 file changed, 53 insertions(+)
>>
>> diff --git a/libswscale/ppc/swscale_altivec.c
>> b/libswscale/ppc/swscale_altivec.c
>> index 2fb2337..8c6056d 100644
>> --- a/libswscale/ppc/swscale_altivec.c
>> +++ b/libswscale/ppc/swscale_altivec.c
>> @@ -324,6 +324,53 @@ static void hScale_altivec_real(SwsContext *c,
>> int16_t
>> *dst, int dstW,
>>  }
>>  }
>>  }
>> +
>> +static void yuv2plane1_8_u(const int16_t *src, uint8_t *dest, int dstW,
>> +   const uint8_t *dither, int offset, int start)
>> +{
>> +int i;
>> +for (i = start; i < dstW; i++) {
>> +int val = (src[i] + dither[(i + offset) & 7]) >> 7;
>> +dest[i] = av_clip_uint8(val);
>> +}
>> +}
>> +
>> +static void yuv2plane1_8_altivec(const int16_t *src, uint8_t *dest, int
>> dstW,
>> +   const uint8_t *dither, int offset)
>> +{
>> +const int dst_u = -(uintptr_t)dest & 15;
>> +int i, j;
>> +LOCAL_ALIGNED(16, int16_t, val, [16]);
>
>> +const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7,
>> 7,
>> 7};
>
> The patch breaks compilation with xlc, sorry for not testing earlier:
> libswscale/ppc/swscale_altivec.c:344:11: error: unknown type name 'vector'
> const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7, 7, 7};

In case this error does not make much sense to you, don't worry too
much, the following change was necessary to make xlc pass rv20-1239:
diff --git a/fftools/ffmpeg_filter.c b/fftools/ffmpeg_filter.c
index 6518d50..fb749c5 100644
--- a/fftools/ffmpeg_filter.c
+++ b/fftools/ffmpeg_filter.c
@@ -744,6 +744,7 @@ static int configure_input_video_filter
 InputFile *f = input_files[ist->file_index];
 AVRational tb = ist->framerate.num ? av_inv_q(ist->framerate) :
  ist->st->time_base;
+if(!ist->framerate.num)tb = ist->st->time_base;
 AVRational fr = ist->framerate;
 AVRational sar;
 AVBPrint args;

;-)

(As expected, other tests also fail.)

Carl Eugen
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-26 Thread Carl Eugen Hoyos
2018-11-17 9:12 GMT+01:00, Lauri Kasanen :
> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt
> yuv420p \
> -f null -vframes 100 -v error -nostats -
>
> 1158 UNITS in planar1,   65528 runs,  8 skips
>
> -cpuflags 0
>
> 19082 UNITS in planar1,   65533 runs,  3 skips
>
> 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
> takes as many cycles as the x86 SSE2 version, yikes it's fast.
>
> Note that this function uses VSX instructions, but is not marked so.
> This is because several existing functions also make that mistake.
> I'll submit a patch moving them once this is reviewed.
>
> v2: Remove !BE check
> Signed-off-by: Lauri Kasanen 
> ---
>  libswscale/ppc/swscale_altivec.c | 53
> 
>  1 file changed, 53 insertions(+)
>
> diff --git a/libswscale/ppc/swscale_altivec.c
> b/libswscale/ppc/swscale_altivec.c
> index 2fb2337..8c6056d 100644
> --- a/libswscale/ppc/swscale_altivec.c
> +++ b/libswscale/ppc/swscale_altivec.c
> @@ -324,6 +324,53 @@ static void hScale_altivec_real(SwsContext *c, int16_t
> *dst, int dstW,
>  }
>  }
>  }
> +
> +static void yuv2plane1_8_u(const int16_t *src, uint8_t *dest, int dstW,
> +   const uint8_t *dither, int offset, int start)
> +{
> +int i;
> +for (i = start; i < dstW; i++) {
> +int val = (src[i] + dither[(i + offset) & 7]) >> 7;
> +dest[i] = av_clip_uint8(val);
> +}
> +}
> +
> +static void yuv2plane1_8_altivec(const int16_t *src, uint8_t *dest, int
> dstW,
> +   const uint8_t *dither, int offset)
> +{
> +const int dst_u = -(uintptr_t)dest & 15;
> +int i, j;
> +LOCAL_ALIGNED(16, int16_t, val, [16]);

> +const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7, 7,
> 7};

The patch breaks compilation with xlc, sorry for not testing earlier:
libswscale/ppc/swscale_altivec.c:344:11: error: unknown type name 'vector'
const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7, 7, 7};

Carl Eugen
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-26 Thread Lauri Kasanen
On Mon, 26 Nov 2018 11:03:55 +0300
Michael Kostylev  wrote:

> 
> http://fate.xffm.org/?sort=arch
> /ppc

Yeah, mentioned in the commit message. Follow-up patch coming today.

- Lauri
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-25 Thread Michael Niedermayer
On Sat, Nov 24, 2018 at 11:33:01AM +0200, Lauri Kasanen wrote:
> On Fri, 23 Nov 2018 23:01:02 +0100
> Michael Niedermayer  wrote:
> 
> > On Fri, Nov 23, 2018 at 10:38:13AM +0200, Lauri Kasanen wrote:
> > > I mean, if my patch adds no failures, is that enough to apply it?
> > 
> > yes that and the tests failing should still fail the same way with the
> > same checksums
> > This of course assumes noone finds an issue in the patch
> 
> Okay, ran both with -k. No new failures, and fate-rv20-1239 failed with
> the same checksums in both cases. That was the only failing test, did
> not try with THREADS.

ok, will apply

thanks


[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Old school: Use the lowest level language in which you can solve the problem
conveniently.
New school: Use the highest level language in which the latest supercomputer
can solve the problem without the user falling asleep waiting.


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-24 Thread Lauri Kasanen
On Fri, 23 Nov 2018 23:01:02 +0100
Michael Niedermayer  wrote:

> On Fri, Nov 23, 2018 at 10:38:13AM +0200, Lauri Kasanen wrote:
> > I mean, if my patch adds no failures, is that enough to apply it?
> 
> yes that and the tests failing should still fail the same way with the
> same checksums
> This of course assumes noone finds an issue in the patch

Okay, ran both with -k. No new failures, and fate-rv20-1239 failed with
the same checksums in both cases. That was the only failing test, did
not try with THREADS.

Curiously "make CPUFLAGS=0 fate-rv20-1239" also fails, so it's not
Altivec code that breaks that test, but C (?).

- Lauri
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-23 Thread Michael Niedermayer
On Fri, Nov 23, 2018 at 10:38:13AM +0200, Lauri Kasanen wrote:
> On Fri, 23 Nov 2018 03:26:50 +0100
> Michael Niedermayer  wrote:
> 
> > On Wed, Nov 21, 2018 at 07:19:45PM +0200, Lauri Kasanen wrote:
> > > On Wed, 21 Nov 2018 17:22:36 +0100
> > > Michael Niedermayer  wrote:
> > > > the full fate tests must be run, many of these tests use swscale without
> > > > having "scale" in their name
> > > > and yes on lower end hardware 20min and longer is possible
> > > 
> > > I get failures on the baseline, without my patch. What is the procedure
> > > here? Is there a var to skip those tests, or?
> > 
> > procedure ?
> > First i try to convince you to attempt to fix some of these failures ;)
> > because well, everyone would benefit if they are fixed ...
> 
> I mean, if my patch adds no failures, is that enough to apply it?

yes that and the tests failing should still fail the same way with the
same checksums
This of course assumes noone finds an issue in the patch

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-23 Thread Lauri Kasanen
On Fri, 23 Nov 2018 03:26:50 +0100
Michael Niedermayer  wrote:

> On Wed, Nov 21, 2018 at 07:19:45PM +0200, Lauri Kasanen wrote:
> > On Wed, 21 Nov 2018 17:22:36 +0100
> > Michael Niedermayer  wrote:
> > > the full fate tests must be run, many of these tests use swscale without
> > > having "scale" in their name
> > > and yes on lower end hardware 20min and longer is possible
> > 
> > I get failures on the baseline, without my patch. What is the procedure
> > here? Is there a var to skip those tests, or?
> 
> procedure ?
> First i try to convince you to attempt to fix some of these failures ;)
> because well, everyone would benefit if they are fixed ...

I mean, if my patch adds no failures, is that enough to apply it?

- Lauri
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-22 Thread Michael Niedermayer
On Wed, Nov 21, 2018 at 07:19:45PM +0200, Lauri Kasanen wrote:
> On Wed, 21 Nov 2018 17:22:36 +0100
> Michael Niedermayer  wrote:
> 
> > the full fate tests must be run, many of these tests use swscale without
> > having "scale" in their name
> > and yes on lower end hardware 20min and longer is possible
> 
> I get failures on the baseline, without my patch. What is the procedure
> here? Is there a var to skip those tests, or?

procedure ?
First i try to convince you to attempt to fix some of these failures ;)
because well, everyone would benefit if they are fixed ...


> 
> First I ran with THREADS=3, baseline blew up in
> fate-h264-conformance-frext-hpcafl_bcrm_c
> 
> Then I ran without THREADS, it got further, but blew up in
> fate-rv20-1239

you can use make -k  to continue testing in presence of failures
this is not guranteed to run all tests, it may skip dependant tests
which dependent on a failing target.

Thanks

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-21 Thread Lauri Kasanen
On Wed, 21 Nov 2018 17:22:36 +0100
Michael Niedermayer  wrote:

> the full fate tests must be run, many of these tests use swscale without
> having "scale" in their name
> and yes on lower end hardware 20min and longer is possible

I get failures on the baseline, without my patch. What is the procedure
here? Is there a var to skip those tests, or?

First I ran with THREADS=3, baseline blew up in
fate-h264-conformance-frext-hpcafl_bcrm_c

Then I ran without THREADS, it got further, but blew up in
fate-rv20-1239

- Lauri
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-21 Thread Michael Niedermayer
On Wed, Nov 21, 2018 at 02:35:32PM +0200, Lauri Kasanen wrote:
> On Wed, 21 Nov 2018 13:21:58 +0100
> Michael Niedermayer  wrote:
> 
> > On Wed, Nov 21, 2018 at 10:12:48AM +0200, Lauri Kasanen wrote:
> > > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt 
> > > > yuv420p \
> > > > -f null -vframes 100 -v error -nostats -
> > > > 
> > > > 1158 UNITS in planar1,   65528 runs,  8 skips
> > > > 
> > > > -cpuflags 0
> > > > 
> > > > 19082 UNITS in planar1,   65533 runs,  3 skips
> > > > 
> > > > 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
> > > > takes as many cycles as the x86 SSE2 version, yikes it's fast.
> > > > 
> > > > Note that this function uses VSX instructions, but is not marked so.
> > > > This is because several existing functions also make that mistake.
> > > > I'll submit a patch moving them once this is reviewed.
> > > > 
> > > > v2: Remove !BE check
> > > > Signed-off-by: Lauri Kasanen 
> > > 
> > > Ping. Seems not many ffmpeg devs interested in ppc.
> > 
> > have you tried "make fate" with this patch (note you need to configure with
> > fate samples" so all tests are run
> 
> I ran those fate tests containing "scale" in the name, I gather the
> full suite takes > 20min. Otherwise I tested with a PNG to video
> conversion on LE, and Carl Eugen Hoyos tested with Lena on BE.

the full fate tests must be run, many of these tests use swscale without
having "scale" in their name
and yes on lower end hardware 20min and longer is possible

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you think the mosad wants you dead since a long time then you are either
wrong or dead since a long time.


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-21 Thread Lauri Kasanen
On Wed, 21 Nov 2018 13:21:58 +0100
Michael Niedermayer  wrote:

> On Wed, Nov 21, 2018 at 10:12:48AM +0200, Lauri Kasanen wrote:
> > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt 
> > > yuv420p \
> > > -f null -vframes 100 -v error -nostats -
> > > 
> > > 1158 UNITS in planar1,   65528 runs,  8 skips
> > > 
> > > -cpuflags 0
> > > 
> > > 19082 UNITS in planar1,   65533 runs,  3 skips
> > > 
> > > 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
> > > takes as many cycles as the x86 SSE2 version, yikes it's fast.
> > > 
> > > Note that this function uses VSX instructions, but is not marked so.
> > > This is because several existing functions also make that mistake.
> > > I'll submit a patch moving them once this is reviewed.
> > > 
> > > v2: Remove !BE check
> > > Signed-off-by: Lauri Kasanen 
> > 
> > Ping. Seems not many ffmpeg devs interested in ppc.
> 
> have you tried "make fate" with this patch (note you need to configure with
> fate samples" so all tests are run

I ran those fate tests containing "scale" in the name, I gather the
full suite takes > 20min. Otherwise I tested with a PNG to video
conversion on LE, and Carl Eugen Hoyos tested with Lena on BE.

- Lauri
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-21 Thread Michael Niedermayer
On Wed, Nov 21, 2018 at 10:12:48AM +0200, Lauri Kasanen wrote:
> > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt 
> > yuv420p \
> > -f null -vframes 100 -v error -nostats -
> > 
> > 1158 UNITS in planar1,   65528 runs,  8 skips
> > 
> > -cpuflags 0
> > 
> > 19082 UNITS in planar1,   65533 runs,  3 skips
> > 
> > 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
> > takes as many cycles as the x86 SSE2 version, yikes it's fast.
> > 
> > Note that this function uses VSX instructions, but is not marked so.
> > This is because several existing functions also make that mistake.
> > I'll submit a patch moving them once this is reviewed.
> > 
> > v2: Remove !BE check
> > Signed-off-by: Lauri Kasanen 
> 
> Ping. Seems not many ffmpeg devs interested in ppc.

have you tried "make fate" with this patch (note you need to configure with
fate samples" so all tests are run

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-21 Thread Lauri Kasanen
> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p 
> \
> -f null -vframes 100 -v error -nostats -
> 
> 1158 UNITS in planar1,   65528 runs,  8 skips
> 
> -cpuflags 0
> 
> 19082 UNITS in planar1,   65533 runs,  3 skips
> 
> 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
> takes as many cycles as the x86 SSE2 version, yikes it's fast.
> 
> Note that this function uses VSX instructions, but is not marked so.
> This is because several existing functions also make that mistake.
> I'll submit a patch moving them once this is reviewed.
> 
> v2: Remove !BE check
> Signed-off-by: Lauri Kasanen 

Ping. Seems not many ffmpeg devs interested in ppc.

- Lauri
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-17 Thread Michael Niedermayer
On Sat, Nov 17, 2018 at 10:12:14AM +0200, Lauri Kasanen wrote:
> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p 
> \
> -f null -vframes 100 -v error -nostats -
> 
> 1158 UNITS in planar1,   65528 runs,  8 skips
> 
> -cpuflags 0
> 
> 19082 UNITS in planar1,   65533 runs,  3 skips
> 
> 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
> takes as many cycles as the x86 SSE2 version, yikes it's fast.
> 
> Note that this function uses VSX instructions, but is not marked so.
> This is because several existing functions also make that mistake.
> I'll submit a patch moving them once this is reviewed.
> 
> v2: Remove !BE check
> Signed-off-by: Lauri Kasanen 
> ---
>  libswscale/ppc/swscale_altivec.c | 53 
> 
>  1 file changed, 53 insertions(+)

iam no altivec guy, just wanted to reply primarly so people dont wait for
a review from me as i normally review swscale patches.
if someone who knows altivec ok-es this then its ok with me too.
assuming this was tested (fate and some real world test)

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

2018-11-17 Thread Lauri Kasanen
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
-f null -vframes 100 -v error -nostats -

1158 UNITS in planar1,   65528 runs,  8 skips

-cpuflags 0

19082 UNITS in planar1,   65533 runs,  3 skips

16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
takes as many cycles as the x86 SSE2 version, yikes it's fast.

Note that this function uses VSX instructions, but is not marked so.
This is because several existing functions also make that mistake.
I'll submit a patch moving them once this is reviewed.

v2: Remove !BE check
Signed-off-by: Lauri Kasanen 
---
 libswscale/ppc/swscale_altivec.c | 53 
 1 file changed, 53 insertions(+)

diff --git a/libswscale/ppc/swscale_altivec.c b/libswscale/ppc/swscale_altivec.c
index 2fb2337..8c6056d 100644
--- a/libswscale/ppc/swscale_altivec.c
+++ b/libswscale/ppc/swscale_altivec.c
@@ -324,6 +324,53 @@ static void hScale_altivec_real(SwsContext *c, int16_t 
*dst, int dstW,
 }
 }
 }
+
+static void yuv2plane1_8_u(const int16_t *src, uint8_t *dest, int dstW,
+   const uint8_t *dither, int offset, int start)
+{
+int i;
+for (i = start; i < dstW; i++) {
+int val = (src[i] + dither[(i + offset) & 7]) >> 7;
+dest[i] = av_clip_uint8(val);
+}
+}
+
+static void yuv2plane1_8_altivec(const int16_t *src, uint8_t *dest, int dstW,
+   const uint8_t *dither, int offset)
+{
+const int dst_u = -(uintptr_t)dest & 15;
+int i, j;
+LOCAL_ALIGNED(16, int16_t, val, [16]);
+const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7, 7, 7};
+vector int16_t vi, vileft, ditherleft, ditherright;
+vector uint8_t vd;
+
+for (j = 0; j < 16; j++) {
+val[j] = dither[(dst_u + offset + j) & 7];
+}
+
+ditherleft = vec_ld(0, val);
+ditherright = vec_ld(0, [8]);
+
+yuv2plane1_8_u(src, dest, dst_u, dither, offset, 0);
+
+for (i = dst_u; i < dstW - 15; i += 16) {
+
+vi = vec_vsx_ld(0, [i]);
+vi = vec_adds(ditherleft, vi);
+vileft = vec_sra(vi, shifts);
+
+vi = vec_vsx_ld(0, [i + 8]);
+vi = vec_adds(ditherright, vi);
+vi = vec_sra(vi, shifts);
+
+vd = vec_packsu(vileft, vi);
+vec_st(vd, 0, [i]);
+}
+
+yuv2plane1_8_u(src, dest, dstW, dither, offset, i);
+}
+
 #endif /* HAVE_ALTIVEC */
 
 av_cold void ff_sws_init_swscale_ppc(SwsContext *c)
@@ -367,6 +414,12 @@ av_cold void ff_sws_init_swscale_ppc(SwsContext *c)
 c->yuv2packedX = ff_yuv2rgb24_X_altivec;
 break;
 }
+
+switch (c->dstBpc) {
+case 8:
+c->yuv2plane1 = yuv2plane1_8_altivec;
+break;
+}
 }
 #endif /* HAVE_ALTIVEC */
 }
-- 
2.6.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel