Re: [FFmpeg-devel] [PATCH] checkasm/af_afir: relax the max allowed absolute difference

2019-01-12 Thread James Almer
On 1/11/2019 12:17 PM, Derek Buitenhuis wrote:
> On 11/01/2019 13:28, Hendrik Leppkes wrote:
>> Because the computation accumulates more inaccuracy than FLT_EPSILON
>> allows for. That value is really not of that great use. If you have
>> two accurate numbers and do one calculation, it may work, but if you
>> do a whole bunch of them, the error accumulates and eventually gets
>> bigger than FLT_EPSILON.
>> x86_32 floating point is for $reasons a tad bit less accurate than on
>> x86_64, for example, resulting in the test failing. We have some other
>> float tests that do (or used to) fail sporadically due to inaccuracy
>> problems, which sometimes were fixed by similar means - or
>> multiplying FLT_EPSILON to make it bigger.
> 
> OK.
> 
> Two things:
> 
> 1. That should be in the commit messages.
> 2. Would some multiple of FLT_EPSILON make more sense?

Michael suggested 1000*FLT_EPSILON but IMO that's too big and may hide
errors in future implementations.
The value I used is the smallest value I found that didn't fail after
several runs. 6.1e-05 for example fails.
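
For readers following the tolerance discussion, here is a minimal sketch of
an absolute-difference check and of how rounding error builds up over many
operations. It is not the checkasm code; float_near_abs, sum_in_float and
sum_in_double are hypothetical names made up for this illustration.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: nonzero if a and b differ by at most max_abs_diff. */
static int float_near_abs(float a, float b, float max_abs_diff)
{
    return fabsf(a - b) <= max_abs_diff;
}

/* Sum the same value n times with a float accumulator. */
static float sum_in_float(float v, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += v;
    return acc;
}

/* Same sum with a double accumulator, used as a more precise reference. */
static double sum_in_double(float v, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += v;
    return acc;
}
```

Comparing sum_in_float(0.1f, 10000) with (float)sum_in_double(0.1f, 10000)
will typically show a difference far above FLT_EPSILON, although the exact
amount depends on the platform and on how intermediate results are kept
(which is the x86_32 versus x86_64 point made above). For scale, with IEEE
single precision 1000 * FLT_EPSILON is roughly 1.19e-4.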

> 
> - Derek
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH] avcodec: fix some docstrings

2019-01-12 Thread Nicolas Granger
Hello,

This fixes an erroneous reference and missing links in the API documentation.

Best regards,

Nicolas Granger

---
libavcodec/avcodec.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 4414853e84..64ba039be2 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -4892,7 +4892,8 @@ int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);
* @param frame This will be set to a reference-counted video or audio
* frame (depending on the decoder type) allocated by the
* decoder. Note that the function will always call
- * av_frame_unref(frame) before doing anything else.
+ * @ref av_frame_unref "av_frame_unref(frame)" before doing
+ * anything else.
*
* @return
* 0: success, a frame was returned
@@ -4948,7 +4949,8 @@ int avcodec_send_frame(AVCodecContext *avctx, const AVFrame *frame);
* @param avctx codec context
* @param avpkt This will be set to a reference-counted packet allocated by the
* encoder. Note that the function will always call
- * av_frame_unref(frame) before doing anything else.
+ * @ref av_packet_unref "av_packet_unref(avpkt)" before doing
+ * anything else.
* @return 0 on success, otherwise negative error code:
* AVERROR(EAGAIN): output is not available in the current state - user
* must try to send input
-- 
2.20.1




Re: [FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-12 Thread Michael Niedermayer
On Sat, Jan 12, 2019 at 06:25:57PM +0200, Lauri Kasanen wrote:
> On Sat, 12 Jan 2019 14:52:07 +0100
> Michael Niedermayer  wrote:
> 
> > On Sat, Jan 12, 2019 at 10:47:50AM +0200, Lauri Kasanen wrote:
> > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
> > > -s 1920x1728 -f null -vframes 100 -v error -nostats -
> > > 
> > > 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
> > > Fate passes, each format tested with an image to video conversion.
> > > 
> > > Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
> > > of the 16-bit function. This includes the vec_mulo/mule functions too,
> > > not just vmuluwm.
> > > 
> > > yuv420p9le
> > >   12341 UNITS in planarX,  130976 runs, 96 skips
> > >   73752 UNITS in planarX,  131066 runs,  6 skips
> > > yuv420p9be
> > >   12364 UNITS in planarX,  131025 runs, 47 skips
> > >   73001 UNITS in planarX,  131055 runs, 17 skips
> > > yuv420p10le
> > >   12386 UNITS in planarX,  131042 runs, 30 skips
> > >   72735 UNITS in planarX,  131062 runs, 10 skips
> > > yuv420p10be
> > >   12337 UNITS in planarX,  131045 runs, 27 skips
> > >   72734 UNITS in planarX,  131057 runs, 15 skips
> > > yuv420p12le
> > >   12236 UNITS in planarX,  131058 runs, 14 skips
> > >   73029 UNITS in planarX,  131062 runs, 10 skips
> > > yuv420p12be
> > >   12218 UNITS in planarX,  130973 runs, 99 skips
> > >   72402 UNITS in planarX,  131069 runs,  3 skips
> > > yuv420p14le
> > >   12168 UNITS in planarX,  131067 runs,  5 skips
> > >   72480 UNITS in planarX,  131069 runs,  3 skips
> > > yuv420p14be
> > >   12358 UNITS in planarX,  130948 runs,124 skips
> > >   73772 UNITS in planarX,  131063 runs,  9 skips
> > > yuv420p16le
> > >   10439 UNITS in planarX,  130911 runs,161 skips
> > >  157923 UNITS in planarX,  131068 runs,  4 skips
> > > yuv420p16be
> > >   10463 UNITS in planarX,  130874 runs,198 skips
> > >  154405 UNITS in planarX,  131061 runs, 11 skips
> > 
> > The number of skips in the benchmark is much larger on one
> > side. That way the numbers become hard to compare as
> > more cases are skipped on one side
> > 
> > please adjust the parameters so the skip counts are comparable
> > or redo the tests until the numbers are more similar
> > thanks
> 
> How do I do that? It's a VM, so there are going to be pauses no matter
> what, when other VMs run. Or should I take the largest run count with
> about the same skips?

I would try to adjust TIMER_REPORT so that VM switches are either
reliably skipped on both sides of the test or never skipped at all.
The idea is to treat both sides the same, so there is no asymmetry
from differently successful skips.
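
To make the skip asymmetry concrete, here is a rough sketch of a benchmark
accumulator that rejects outliers relative to its running mean. It only
illustrates the general idea behind this kind of timer report; BenchStats,
bench_add, bench_mean and the skip_factor parameter are hypothetical and
are not the actual macro.

```c
#include <stdint.h>

/* Hypothetical accumulator; names and thresholds are illustrative only. */
typedef struct BenchStats {
    uint64_t sum;    /* total of accepted samples */
    unsigned runs;   /* number of accepted samples */
    unsigned skips;  /* number of rejected outliers */
} BenchStats;

/* Accept a sample unless it reaches skip_factor times the running mean.
 * The first two samples are always accepted; skip_factor == 0 disables
 * skipping entirely. */
static void bench_add(BenchStats *s, uint64_t sample, unsigned skip_factor)
{
    if (skip_factor == 0 || s->runs < 2 ||
        sample < (uint64_t)skip_factor * s->sum / s->runs) {
        s->sum += sample;
        s->runs++;
    } else {
        s->skips++;
    }
}

/* Mean of the accepted samples, or 0 if there are none. */
static uint64_t bench_mean(const BenchStats *s)
{
    return s->runs ? s->sum / s->runs : 0;
}
```

With a threshold that scales with the mean, a pause of a given length is a
much larger multiple of a fast function's mean than of a slow one's, so it
may be dropped on one side and counted on the other. That is one plausible
reading of the uneven skip counts above; using the same absolute cutoff on
both sides, or none at all, avoids it.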

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data




Re: [FFmpeg-devel] [PATCH] doc/developer: require transparency about sponshorships.

2019-01-12 Thread Paul B Mahol
On 1/12/19, Nicolas George  wrote:
> Hendrik Leppkes (12019-01-11):
>> It's everyone's right to keep their finances private. Would I be forced
>> to disclose my hourly wages and then determine how long I worked on a
>> patch, just because I did it during my day job? That's not going to
>> happen.
>>
>> To take a line from your post:
>> Are you against privacy?
>
> I grant you these were cheap theatricals. But to answer your question
> seriously: I am against absolute unconditional privacy, yes. Some things
> deserve privacy, some things do not; I personally believe that economic
> matters rather fall in the second category.
>
> In the particular instance you are evoking, the commit message could
> just say "developed as part of my regular job at $company", I consider that
> enough disclosure for the purpose. And I wonder why you would want to
> keep that much hidden.
>
>> Patches should generally be considered on their own merit.
>
> That is true. And patches should be reviewed and discussed until they
> are of top quality. You know as well as me that it is not what is
> happening: there are too many patches and too little time available from
> competent developers; as a result, some code of mediocre quality has
> been pushed, and some committers have explicitly stated they would
> bypass technical objections to their patches. And now it appears that
> was the result of sponsorships...

Citation needed.


[FFmpeg-devel] [PATCH 3/4] avcodec/prosumer: Reduce lut size

2019-01-12 Thread Michael Niedermayer
Signed-off-by: Michael Niedermayer 
---
 libavcodec/prosumer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/prosumer.c b/libavcodec/prosumer.c
index 3fa9986a38..9143bb1bf4 100644
--- a/libavcodec/prosumer.c
+++ b/libavcodec/prosumer.c
@@ -38,7 +38,7 @@ typedef struct ProSumerContext {
 
 unsigned stride;
 unsigned size;
-uint32_t lut[0x1];
+uint32_t lut[0x2000];
 uint8_t *initial_line;
 uint8_t *decbuffer;
 } ProSumerContext;
-- 
2.20.1



[FFmpeg-devel] [PATCH 2/4] avcodec/prosumer: Simplify code slightly in decompress()

2019-01-12 Thread Michael Niedermayer
Signed-off-by: Michael Niedermayer 
---
 libavcodec/prosumer.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/libavcodec/prosumer.c b/libavcodec/prosumer.c
index 505de71980..3fa9986a38 100644
--- a/libavcodec/prosumer.c
+++ b/libavcodec/prosumer.c
@@ -84,16 +84,13 @@ static int decompress(GetByteContext *gb, int size, PutByteContext *pb, const ui
 if (bytestream2_get_bytes_left(gb) <= 0) {
 if (!a)
 return 0;
-cnt = 4;
 } else {
-pos = bytestream2_tell(gb) ^ 2;
-bytestream2_seek(gb, pos, SEEK_SET);
+pos = bytestream2_tell(gb);
+bytestream2_seek(gb, pos ^ 2, SEEK_SET);
 AV_WN16(&a, bytestream2_peek_le16(gb));
-pos = pos ^ 2;
-bytestream2_seek(gb, pos, SEEK_SET);
-bytestream2_skip(gb, 2);
-cnt = 4;
+bytestream2_seek(gb, pos + 2, SEEK_SET);
 }
+cnt = 4;
 }
 c--;
 }
@@ -117,12 +114,10 @@ static int decompress(GetByteContext *gb, int size, PutByteContext *pb, const ui
 }
 return 0;
 }
-pos = bytestream2_tell(gb) ^ 2;
-bytestream2_seek(gb, pos, SEEK_SET);
+pos = bytestream2_tell(gb);
+bytestream2_seek(gb, pos ^ 2, SEEK_SET);
 AV_WN16(&a, bytestream2_peek_le16(gb));
-pos = pos ^ 2;
-bytestream2_seek(gb, pos, SEEK_SET);
-bytestream2_skip(gb, 2);
+bytestream2_seek(gb, pos + 2, SEEK_SET);
 cnt = 4;
 idx--;
 }
-- 
2.20.1
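
As a reading aid for the hunks above, the following sketch checks that the
old and the new seek sequences read the same 16-bit word and leave the
read position in the same place. It uses a plain byte buffer and a size_t
cursor instead of GetByteContext; peek_le16, read_swapped_old and
read_swapped_new are hypothetical names, and the sketch assumes an even
cursor that stays inside the buffer (no clamping is modeled).

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value at byte offset pos. */
static uint16_t peek_le16(const uint8_t *buf, size_t pos)
{
    return (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
}

/* Pre-patch sequence: seek to pos ^ 2, peek, seek back, skip 2 bytes. */
static uint16_t read_swapped_old(const uint8_t *buf, size_t *cursor)
{
    size_t pos = *cursor ^ 2;
    uint16_t v = peek_le16(buf, pos);
    pos ^= 2;              /* back to the original position */
    *cursor = pos + 2;     /* the explicit skip of 2 bytes */
    return v;
}

/* Post-patch sequence: remember pos, peek at pos ^ 2, seek to pos + 2. */
static uint16_t read_swapped_new(const uint8_t *buf, size_t *cursor)
{
    size_t pos = *cursor;
    uint16_t v = peek_le16(buf, pos ^ 2);
    *cursor = pos + 2;
    return v;
}
```

Both variants visit the two 16-bit halves of each 32-bit unit in swapped
order; the new form just needs one position update less.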



[FFmpeg-devel] [PATCH 4/4] avcodec/prosumer: Error out if decompress() stops reading data

2019-01-12 Thread Michael Niedermayer
If 0 is encountered in the LUT, then decompress() will continue to output
0 bytes but never read more data.
Without a specification it is impossible to say whether this is invalid or
a feature. None of the valid prosumer files tested cause a 0 to be read,
so it is likely not an intended feature.

Fixes: Timeout
Fixes: 11266/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PROSUMER_fuzzer-5681827423977472

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavcodec/prosumer.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/libavcodec/prosumer.c b/libavcodec/prosumer.c
index 9143bb1bf4..ce3cbdbb73 100644
--- a/libavcodec/prosumer.c
+++ b/libavcodec/prosumer.c
@@ -96,6 +96,8 @@ static int decompress(GetByteContext *gb, int size, PutByteContext *pb, const ui
 }
 idx = a >> 20;
 b = lut[2 * idx];
+if (!b)
+return AVERROR_INVALIDDATA;
 continue;
 }
 idx = 2;
@@ -154,8 +156,9 @@ static int decode_frame(AVCodecContext *avctx, void *data,
 memset(s->decbuffer, 0, s->size);
 bytestream2_init(&s->gb, avpkt->data, avpkt->size);
 bytestream2_init_writer(&s->pb, s->decbuffer, s->size);
-
-decompress(&s->gb, AV_RL32(avpkt->data + 28) >> 1, &s->pb, s->lut);
+ret = decompress(&s->gb, AV_RL32(avpkt->data + 28) >> 1, &s->pb, s->lut);
+if (ret < 0)
+return ret;
 vertical_predict((uint32_t *)s->decbuffer, 0, (uint32_t *)s->initial_line, s->stride, 1);
 vertical_predict((uint32_t *)s->decbuffer, s->stride, (uint32_t *)s->decbuffer, s->stride, avctx->height - 1);
 
-- 
2.20.1
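
The general pattern behind this change, rejecting a lookup result that
would let the loop continue without consuming input, can be sketched on a
toy decoder. This is not the prosumer algorithm: toy_decompress, the
256-entry step table and TOY_ERROR_INVALID (a stand-in for
AVERROR_INVALIDDATA) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ERROR_INVALID (-1)  /* stand-in for AVERROR_INVALIDDATA */

/* Toy decoder: each input byte is a symbol that is copied to the output,
 * and lut[symbol] says how many input bytes to advance afterwards.
 * A table entry of 0 would never advance the input, so it is treated as
 * invalid data instead of being processed.
 * Returns the number of bytes written, or a negative error code. */
static int toy_decompress(const uint8_t *in, size_t in_size,
                          uint8_t *out, size_t out_size,
                          const uint8_t lut[256])
{
    size_t ip = 0, op = 0;

    while (op < out_size && ip < in_size) {
        uint8_t sym   = in[ip];
        unsigned step = lut[sym];

        if (!step)
            return TOY_ERROR_INVALID;  /* no forward progress possible */

        out[op++] = sym;
        ip += step;
    }
    return (int)op;
}
```

The caller then propagates the negative return value, in the same way the
patched decode_frame() checks ret before running the prediction step.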



[FFmpeg-devel] [PATCH 1/4] avcodec/tiff: Check for 12bit gray fax

2019-01-12 Thread Michael Niedermayer
Fixes: Assertion failure
Fixes: 11898/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TIFF_fuzzer-5759794191794176

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavcodec/tiff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/tiff.c b/libavcodec/tiff.c
index 570b3cbd01..5cf2a4417e 100644
--- a/libavcodec/tiff.c
+++ b/libavcodec/tiff.c
@@ -619,7 +619,7 @@ static int tiff_unpack_strip(TiffContext *s, AVFrame *p, uint8_t *dst, int strid
 if (s->compr == TIFF_CCITT_RLE ||
 s->compr == TIFF_G3||
 s->compr == TIFF_G4) {
-if (is_yuv)
+if (is_yuv || p->format == AV_PIX_FMT_GRAY12)
 return AVERROR_INVALIDDATA;
 
 return tiff_unpack_fax(s, dst, stride, src, size, width, lines);
-- 
2.20.1



Re: [FFmpeg-devel] [PATCH] doc/developer: require transparency about sponshorships.

2019-01-12 Thread Nicolas George
Kyle Swanson (12019-01-11):
> If someone sends a bad patch, we have no obligation to merge it.

Except if they push it themselves after a few hours without review (or
after being rude to somebody who made a review requiring more work).

The disclosure (and review) requirement is especially important for
people who have commit rights, but it would be unjust to apply to only
us.

Regards,

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH] doc/developer: require transparency about sponshorships.

2019-01-12 Thread Nicolas George
Hendrik Leppkes (12019-01-11):
> It's everyone's right to keep their finances private. Would I be forced
> to disclose my hourly wages and then determine how long I worked on a
> patch, just because I did it during my day job? That's not going to
> happen.
> 
> To take a line from your post:
> Are you against privacy?

I grant you these were cheap theatricals. But to answer your question
seriously: I am against absolute unconditional privacy, yes. Some things
deserve privacy, some things do not; I personally believe that economic
matters rather fall in the second category.

In the particular instance you are evoking, the commit message could
just say "developed as part of my regular job at $company", I consider that
enough disclosure for the purpose. And I wonder why you would want to
keep that much hidden.

> Patches should generally be considered on their own merit.

That is true. And patches should be reviewed and discussed until they
are of top quality. You know as well as me that it is not what is
happening: there are too many patches and too little time available from
competent developers; as a result, some code of mediocre quality has
been pushed, and some committers have explicitly stated they would
bypass technical objections to their patches. And now it appears that
was the result of sponsorships...

Regards,

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH] avfilter: add maskfun filter

2019-01-12 Thread Paul B Mahol
On 11/19/18, Paul B Mahol  wrote:
> Signed-off-by: Paul B Mahol 
> ---
>  doc/filters.texi |  27 
>  libavfilter/Makefile |   1 +
>  libavfilter/allfilters.c |   1 +
>  libavfilter/vf_maskfun.c | 279 +++
>  4 files changed, 308 insertions(+)
>  create mode 100644 libavfilter/vf_maskfun.c

Will apply.


Re: [FFmpeg-devel] [PATCH]lavc: Allow very high bitrates in AVCPBProperties after next version bump

2019-01-12 Thread Carl Eugen Hoyos
2019-01-11 1:07 GMT+01:00, James Almer :
> On 1/10/2019 6:27 PM, Carl Eugen Hoyos wrote:
>> Hi!
>>
>> I don't know how urgent this is and how easily this can be triggered
>> with AVCPBProperties but we had issues with bitrates > INT32_MAX in the
>> past, so looking at this code before realizing the qsv bitrate issue
>> is Intel-related I thought this patch cannot hurt.
>>
>> Please comment, Carl Eugen
>
> Probably correct. bitrate fields in AVCodecContext are all int64_t, and
> AVCPBProperties fields are usually set to those.

Patch applied.

Thank you, Carl Eugen


Re: [FFmpeg-devel] [PATCH]lavc/tiff: Support some CMYK samples

2019-01-12 Thread Carl Eugen Hoyos
2019-01-11 16:36 GMT+01:00, Derek Buitenhuis :
> On 11/01/2019 14:54, Carl Eugen Hoyos wrote:

>> Attached patch that fixes the sample from ticket #3459 cannot be
>> factorized with the code in mjpegdec (and psd), the representation is
>> different.
>
> Is there a good reason this is RGB0 instead of RGB24?

It would make the code much more complicated for no obvious
benefit (imo). Once somebody implements 5-component cmyk
in tiff, this can be changed as well.

> Other than that, seems OK, if tested

Patch applied, thank you!

> (is there a FATE sample we can add?)

Will look into a test.

Thank you, Carl Eugen


Re: [FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-12 Thread Lauri Kasanen
On Sat, 12 Jan 2019 14:52:07 +0100
Michael Niedermayer  wrote:

> On Sat, Jan 12, 2019 at 10:47:50AM +0200, Lauri Kasanen wrote:
> > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
> > -s 1920x1728 -f null -vframes 100 -v error -nostats -
> > 
> > 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
> > Fate passes, each format tested with an image to video conversion.
> > 
> > Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
> > of the 16-bit function. This includes the vec_mulo/mule functions too,
> > not just vmuluwm.
> > 
> > yuv420p9le
> >   12341 UNITS in planarX,  130976 runs, 96 skips
> >   73752 UNITS in planarX,  131066 runs,  6 skips
> > yuv420p9be
> >   12364 UNITS in planarX,  131025 runs, 47 skips
> >   73001 UNITS in planarX,  131055 runs, 17 skips
> > yuv420p10le
> >   12386 UNITS in planarX,  131042 runs, 30 skips
> >   72735 UNITS in planarX,  131062 runs, 10 skips
> > yuv420p10be
> >   12337 UNITS in planarX,  131045 runs, 27 skips
> >   72734 UNITS in planarX,  131057 runs, 15 skips
> > yuv420p12le
> >   12236 UNITS in planarX,  131058 runs, 14 skips
> >   73029 UNITS in planarX,  131062 runs, 10 skips
> > yuv420p12be
> >   12218 UNITS in planarX,  130973 runs, 99 skips
> >   72402 UNITS in planarX,  131069 runs,  3 skips
> > yuv420p14le
> >   12168 UNITS in planarX,  131067 runs,  5 skips
> >   72480 UNITS in planarX,  131069 runs,  3 skips
> > yuv420p14be
> >   12358 UNITS in planarX,  130948 runs,124 skips
> >   73772 UNITS in planarX,  131063 runs,  9 skips
> > yuv420p16le
> >   10439 UNITS in planarX,  130911 runs,161 skips
> >  157923 UNITS in planarX,  131068 runs,  4 skips
> > yuv420p16be
> >   10463 UNITS in planarX,  130874 runs,198 skips
> >  154405 UNITS in planarX,  131061 runs, 11 skips
> 
> The number of skips in the benchmark is much larger on one
> side. That way the numbers become hard to compare as
> more cases are skipped on one side
> 
> please adjust the parameters so the skip counts are comparable
> or redo the tests until the numbers are more similar
> thanks

How do I do that? It's a VM, so there are going to be pauses no matter
what, when other VMs run. Or should I take the largest run count with
about the same skips?

- Lauri


Re: [FFmpeg-devel] [PATCH]lavc/psd: Support CMYK images

2019-01-12 Thread Carl Eugen Hoyos
2019-01-11 7:17 GMT+01:00, Peter Ross :
> On Fri, Jan 11, 2019 at 03:23:49AM +0100, Carl Eugen Hoyos wrote:
>> 2019-01-11 2:55 GMT+01:00, Carl Eugen Hoyos :
>> > Hi!
>> >
>> > Attached patch fixes ticket #6797, please comment.
>>
>> New patch with 16bit support attached.
>>
>> Please comment, Carl Eugen
>
>> From 5f879539ee7fecd57bd3de9f7c6363d9b7779b5b Mon Sep 17 00:00:00 2001
>> From: Carl Eugen Hoyos 
>> Date: Fri, 11 Jan 2019 03:20:38 +0100
>> Subject: [PATCH] lavc/psd: Support CMYK images.
>>
>> Based on a05635e by Michael Niedermayer.
>>
>> Fixes ticket #6797.
>> ---
>>  libavcodec/psd.c |   84
>> ++
>>  1 file changed, 84 insertions(+)
>>
>> diff --git a/libavcodec/psd.c b/libavcodec/psd.c
>
>> +if (s->channel_depth == 8) {
>> +for (y = 0; y < s->height; y++) {
>> +for (x = 0; x < s->width; x++) {
>> +int k = src[3][x];
>> +int r = src[0][x] * k;
>> +int g = src[1][x] * k;
>> +int b = src[2][x] * k;
>> +dst[0][x] = g * 257 >> 16;
>> +dst[1][x] = b * 257 >> 16;
>> +dst[2][x] = r * 257 >> 16;
>> +}
>
> the same algorithm exists in libavcodec/mjpegdec.c, with alpha channel
> support.
> i guess it is trivial enough to be duplicated here.
>
> otherwise looks good.

Patch applied.

Thanks everybody, Carl Eugen
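
A side note on the arithmetic in the quoted 8-bit loop: multiplying by 257
and shifting right by 16 stands in for a division by 255. The sketch below
compares the two over all 8-bit inputs. scale_approx, scale_exact and
max_shortfall are hypothetical helper names, not functions from psd.c or
mjpegdec.c, and the plane reordering done by the patch is left out.

```c
#include <stdint.h>

/* Same arithmetic as the quoted loop: scale c by k, then approximate the
 * division by 255 with a multiply by 257 and a shift by 16. */
static uint8_t scale_approx(uint8_t c, uint8_t k)
{
    return (uint8_t)((c * k * 257) >> 16);
}

/* Reference: exact integer division by 255. */
static uint8_t scale_exact(uint8_t c, uint8_t k)
{
    return (uint8_t)(c * k / 255);
}

/* Largest value of exact - approx over all pairs of 8-bit inputs. */
static int max_shortfall(void)
{
    int worst = 0;
    for (int c = 0; c < 256; c++) {
        for (int k = 0; k < 256; k++) {
            int d = scale_exact((uint8_t)c, (uint8_t)k) -
                    scale_approx((uint8_t)c, (uint8_t)k);
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

Since 255 * 257 is 65535, one less than 65536, the shift form never
exceeds the exact quotient and falls short by at most one; in particular
255 * 255 maps to 254 rather than 255.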


Re: [FFmpeg-devel] [PATCH 1/2] avcodec/vc1: shuffle calculation of MV predictor candidates

2019-01-12 Thread Carl Eugen Hoyos
2019-01-11 15:36 GMT+01:00, Jerome Borsboom :
> The B predictor for 4-MV macroblocks is only out of bounds when
> the A predictor is also out of bounds.

Patch applied.

Thank you, Carl Eugen


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/gdv: Optimize and factorize scaling loops

2019-01-12 Thread Carl Eugen Hoyos
2019-01-12 16:46 GMT+01:00, Michael Niedermayer :
> On Sat, Jan 12, 2019 at 04:07:42PM +0100, Carl Eugen Hoyos wrote:
>> 2019-01-04 20:22 GMT+01:00, Michael Niedermayer :
>>
>> > +static void scaledown(uint8_t *dst, const uint8_t *src, int w)
>> > +{
>> > +int x;
>> > +for (x = 0; x < w - 7; x+=8) {
>> > +dst[x + 0] = src[2*x + 0];
>> > +dst[x + 1] = src[2*x + 2];
>> > +dst[x + 2] = src[2*x + 4];
>> > +dst[x + 3] = src[2*x + 6];
>> > +dst[x + 4] = src[2*x + 8];
>> > +dst[x + 5] = src[2*x +10];
>> > +dst[x + 6] = src[2*x +12];
>> > +dst[x + 7] = src[2*x +14];
>>
>> Could you add to the commit message the information
>> which compiler is able to optimize this?
>> (Assuming this is a reason for the speedup)
>
> if what you ask for is "which compiler turns this into SIMD"
> i do not know, and i suspect mine does not from the limited
> increase in performance
> I think the speedup is primarily from simply unrolling the trivial loop
>
> is there something you want me to change in the commit message still ?

No, I am a little surprised that unrolling without SIMD makes
a difference.

Thank you for the explanation, Carl Eugen


Re: [FFmpeg-devel] [PATCH 2/2 v3] avcodec/vc1: fix decoding of old WMV3 format

2019-01-12 Thread Carl Eugen Hoyos
2019-01-12 16:14 GMT+01:00, Jerome Borsboom :
> The position of the second MV predictor candidate is slightly different
> for the old WMV3 format indicated by RES_RTM_FLAG. This patch fixes
> decoding of niceday.wmv on the samples server.
>
> Fixes: #6641
>
> Signed-off-by: Jerome Borsboom 
> ---
> This revision removes a spurious whitespace that was left behind.

Patch applied.

Thank you, Carl Eugen


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/gdv: Optimize and factorize scaling loops

2019-01-12 Thread Michael Niedermayer
On Sat, Jan 12, 2019 at 04:07:42PM +0100, Carl Eugen Hoyos wrote:
> 2019-01-04 20:22 GMT+01:00, Michael Niedermayer :
> 
> > +static void scaledown(uint8_t *dst, const uint8_t *src, int w)
> > +{
> > +int x;
> > +for (x = 0; x < w - 7; x+=8) {
> > +dst[x + 0] = src[2*x + 0];
> > +dst[x + 1] = src[2*x + 2];
> > +dst[x + 2] = src[2*x + 4];
> > +dst[x + 3] = src[2*x + 6];
> > +dst[x + 4] = src[2*x + 8];
> > +dst[x + 5] = src[2*x +10];
> > +dst[x + 6] = src[2*x +12];
> > +dst[x + 7] = src[2*x +14];
> 
> Could you add to the commit message the information
> which compiler is able to optimize this?
> (Assuming this is a reason for the speedup)

if what you ask for is "which compiler turns this into SIMD"
i do not know, and i suspect mine does not from the limited
increase in performance
I think the speedup is primarily from simply unrolling the trivial loop

is there something you want me to change in the commit message still ?
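
For reference, the unrolling under discussion can be written out as a
complete function next to the plain loop it replaces. The quoted hunk is
cut off before the end of the function, so the scalar tail loop here is an
assumption, and scaledown_simple and scaledown_unrolled are names chosen
for this sketch only.

```c
#include <stdint.h>

/* Plain reference loop: keep every second source byte. */
static void scaledown_simple(uint8_t *dst, const uint8_t *src, int w)
{
    for (int x = 0; x < w; x++)
        dst[x] = src[2 * x];
}

/* Unrolled by 8 as in the quoted hunk, followed by a scalar tail for
 * widths that are not a multiple of 8 (the tail is assumed here). */
static void scaledown_unrolled(uint8_t *dst, const uint8_t *src, int w)
{
    int x;

    for (x = 0; x < w - 7; x += 8) {
        dst[x + 0] = src[2 * x +  0];
        dst[x + 1] = src[2 * x +  2];
        dst[x + 2] = src[2 * x +  4];
        dst[x + 3] = src[2 * x +  6];
        dst[x + 4] = src[2 * x +  8];
        dst[x + 5] = src[2 * x + 10];
        dst[x + 6] = src[2 * x + 12];
        dst[x + 7] = src[2 * x + 14];
    }
    for (; x < w; x++)
        dst[x] = src[2 * x];
}
```

Both functions produce the same output; the unrolled one simply executes
fewer loop-condition checks and branches per output byte, which fits the
explanation that the gain comes from unrolling rather than from SIMD.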


> 
> Sorry for the late comment, Carl Eugen

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Dictatorship: All citizens are under surveillance, all their steps and
actions recorded, for the politicians to enforce control.
Democracy: All politicians are under surveillance, all their steps and
actions recorded, for the citizens to enforce control.




Re: [FFmpeg-devel] [PATCH 2/2 v3] avcodec/vc1: fix decoding of old WMV3 format

2019-01-12 Thread Jerome Borsboom
The position of the second MV predictor candidate is slightly different
for the old WMV3 format indicated by RES_RTM_FLAG. This patch fixes
decoding of niceday.wmv on the samples server.

Fixes: #6641

Signed-off-by: Jerome Borsboom 
---
This revision removes a spurious whitespace that was left behind.

 libavcodec/vc1.c      | 5 -----
 libavcodec/vc1_pred.c | 5 ++++-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/libavcodec/vc1.c b/libavcodec/vc1.c
index 3581d87b57..e102b931d8 100644
--- a/libavcodec/vc1.c
+++ b/libavcodec/vc1.c
@@ -379,11 +379,6 @@ int ff_vc1_decode_sequence_header(AVCodecContext *avctx, VC1Context *v, GetBitCo
 } else {
 v->res_rtm_flag = get_bits1(gb); //reserved
 }
-if (!v->res_rtm_flag) {
-av_log(avctx, AV_LOG_ERROR,
-   "Old WMV3 version detected, some frames may be decoded 
incorrectly\n");
-//return -1;
-}
 //TODO: figure out what they mean (always 0x402F)
 if (!v->res_fasttx)
 skip_bits(gb, 16);
diff --git a/libavcodec/vc1_pred.c b/libavcodec/vc1_pred.c
index 0b22d9916c..e1758a3817 100644
--- a/libavcodec/vc1_pred.c
+++ b/libavcodec/vc1_pred.c
@@ -275,7 +275,10 @@ void ff_vc1_pred_mv(VC1Context *v, int n, int dmv_x, int dmv_y,
 //in 4-MV mode different blocks have different B predictor position
 switch (n) {
 case 0:
-off = (s->mb_x > 0) ? -1 : 1;
+if (v->res_rtm_flag)
+off = s->mb_x ? -1 : 1;
+else
+off = s->mb_x ? -1 : 2 * s->mb_width - wrap - 1;
 break;
 case 1:
 off = (s->mb_x == (s->mb_width - 1)) ? -1 : 1;
-- 
2.13.6




Re: [FFmpeg-devel] Video codec design for very low-end decoder

2019-01-12 Thread Carl Eugen Hoyos
2019-01-12 1:46 GMT+01:00, Ronald S. Bultje :
> Hi,
>
> On Thu, Jan 10, 2019 at 2:41 PM Carl Eugen Hoyos  wrote:
>
>> 2019-01-07 18:37 GMT+01:00, Ronald S. Bultje :
>> > Hi,
>> >
>> > On Mon, Jan 7, 2019 at 12:22 PM Lauri Kasanen  wrote:
>> >
>> >> On Mon, 7 Jan 2019 12:02:58 -0500
>> >> "Ronald S. Bultje"  wrote:
>> >>
>> >> > Have you considered vp8? It may sound weird but this is basically
>> >> > what vp8 was great at: being really simple to decode.
>> >>
>> >> VP8 has a reputation of being slow, so I didn't consider it. Benchmarks
>> >> show it as decoding slower than h264.
>> >>
>> >
>> > It is faster than h264 when comparing ffh264 vs. ffvp8:
>> >
>> > https://blogs.gnome.org/rbultje/files/2014/02/sintel_decspeed.png
>>
>> Are the relations identical without asm optimizations?
>>
>
> I believe so, yes. The theory behind it would be that lack of per-symbol
> probability adaptations in CABAC and bidirectional prediction were missing
> in VP8, both of which incur a significant runtime overhead. Then, if you
> start disabling tools (e.g. CABAC -> CAVLC) this difference would probably
> diminish quite quickly.

Thank you for the clarification!

Carl Eugen


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/gdv: Optimize and factorize scaling loops

2019-01-12 Thread Carl Eugen Hoyos
2019-01-04 20:22 GMT+01:00, Michael Niedermayer :

> +static void scaledown(uint8_t *dst, const uint8_t *src, int w)
> +{
> +int x;
> +for (x = 0; x < w - 7; x+=8) {
> +dst[x + 0] = src[2*x + 0];
> +dst[x + 1] = src[2*x + 2];
> +dst[x + 2] = src[2*x + 4];
> +dst[x + 3] = src[2*x + 6];
> +dst[x + 4] = src[2*x + 8];
> +dst[x + 5] = src[2*x +10];
> +dst[x + 6] = src[2*x +12];
> +dst[x + 7] = src[2*x +14];

Could you add to the commit message the information
which compiler is able to optimize this?
(Assuming this is a reason for the speedup)

Sorry for the late comment, Carl Eugen


Re: [FFmpeg-devel] [PATCH] avcodec/tests/rangecoder: initialize array to avoid valgrind warning

2019-01-12 Thread Michael Niedermayer
On Fri, Jan 04, 2019 at 02:46:29AM +0100, Michael Niedermayer wrote:
> Found-by: jamrial
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/tests/rangecoder.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

i intend to apply this soon unless there are more comments, i did not
understand the only comment :(

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

"I am not trying to be anyone's saviour, I'm trying to think about the
 future and not be sad" - Elon Musk





Re: [FFmpeg-devel] [PATCH 3/3] avcodec/h264_slice: Fix integer overflow in implicit_weight_table()

2019-01-12 Thread Michael Niedermayer
On Fri, Jan 04, 2019 at 08:22:36PM +0100, Michael Niedermayer wrote:
> Fixes: signed integer overflow: 2 * 2132811760 cannot be represented in type 'int'
> Fixes: 11156/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_H264_fuzzer-6237685933408256
> 
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/h264_slice.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

will apply
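
For context on the reported overflow, 2 * 2132811760 does not fit in a
32-bit int, and signed overflow is undefined behaviour in C. A common way
to sidestep it is to widen before multiplying, sketched below. This is not
the actual h264_slice.c change (the diff is not shown in this message);
double_checked is a hypothetical helper and the sketch assumes a 32-bit
int.

```c
#include <limits.h>
#include <stdint.h>

/* Compute 2 * v without signed overflow by widening to 64 bits first.
 * Returns 1 and stores the result in *out if it fits in an int,
 * otherwise returns 0 and leaves *out untouched. */
static int double_checked(int v, int *out)
{
    int64_t wide = 2 * (int64_t)v;

    if (wide < INT_MIN || wide > INT_MAX)
        return 0;
    *out = (int)wide;
    return 1;
}
```

Whether to reject such values or just carry the wider type through the
comparison depends on the surrounding code; the sketch only shows the
widening step.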

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras




Re: [FFmpeg-devel] [PATCH 2/3] avcodec/gdv: Optimize and factorize scaling loops

2019-01-12 Thread Michael Niedermayer
On Fri, Jan 04, 2019 at 08:22:35PM +0100, Michael Niedermayer wrote:
> Fixes: Timeout
> Fixes: 11067/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_GDV_fuzzer-5686623711264768
> 
> Before change: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_GDV_fuzzer-5686623711264768 in 34386 ms
> After  change: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_GDV_fuzzer-5686623711264768 in 24327 ms
> 
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/gdv.c | 87 +++-
>  1 file changed, 64 insertions(+), 23 deletions(-)

will apply

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even known they had done so. -- Xenophanes




Re: [FFmpeg-devel] [PATCH 3/3] avcodec/exr: set layer_match in all branches

2019-01-12 Thread Michael Niedermayer
On Tue, Dec 25, 2018 at 11:15:22PM +0100, Michael Niedermayer wrote:
> Otherwise it is left to the value from the previous iteration
> 
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/exr.c | 1 +
>  1 file changed, 1 insertion(+)

will apply
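
The pattern described in the commit message (a flag that is only assigned in some branches and therefore keeps its value from the previous loop iteration) can be shown with a minimal sketch. Nothing below is taken from exr.c; count_prefixed and its variable match, which stands in for layer_match, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical example: count the names that start with a given prefix.
 * "match" lives outside the loop body, so it survives from one iteration
 * to the next unless every branch assigns it.
 */
static int count_prefixed(const char *const *names, size_t n, const char *prefix)
{
    size_t plen = strlen(prefix);
    int match = 0;
    int count = 0;
    size_t i;

    assert(names != NULL && prefix != NULL);
    for (i = 0; i < n; i++) {
        if (!strncmp(names[i], prefix, plen)) {
            match = 1;
        } else {
            match = 0;   /* without this branch a stale 1 would be reused */
        }
        if (match)
            count++;
    }
    return count;
}
```

If the else branch is dropped, every name after the first match is counted as well, which is the kind of carry-over the patch title refers to.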

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data




Re: [FFmpeg-devel] [PATCH 2/3] avcodec/exr: Check for duplicate channel index

2019-01-12 Thread Michael Niedermayer
On Tue, Dec 25, 2018 at 11:15:21PM +0100, Michael Niedermayer wrote:
> Fixes: Out of memory
> Fixes: 11582/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_fuzzer-5730204559867904
> 
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/exr.c | 5 +
>  1 file changed, 5 insertions(+)

will apply

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato




Re: [FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-12 Thread Michael Niedermayer
On Sat, Jan 12, 2019 at 10:47:50AM +0200, Lauri Kasanen wrote:
> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
> -s 1920x1728 -f null -vframes 100 -v error -nostats -
> 
> 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
> Fate passes, each format tested with an image to video conversion.
> 
> Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
> of the 16-bit function. This includes the vec_mulo/mule functions too,
> not just vmuluwm.
> 
> yuv420p9le
>   12341 UNITS in planarX,  130976 runs, 96 skips
>   73752 UNITS in planarX,  131066 runs,  6 skips
> yuv420p9be
>   12364 UNITS in planarX,  131025 runs, 47 skips
>   73001 UNITS in planarX,  131055 runs, 17 skips
> yuv420p10le
>   12386 UNITS in planarX,  131042 runs, 30 skips
>   72735 UNITS in planarX,  131062 runs, 10 skips
> yuv420p10be
>   12337 UNITS in planarX,  131045 runs, 27 skips
>   72734 UNITS in planarX,  131057 runs, 15 skips
> yuv420p12le
>   12236 UNITS in planarX,  131058 runs, 14 skips
>   73029 UNITS in planarX,  131062 runs, 10 skips
> yuv420p12be
>   12218 UNITS in planarX,  130973 runs, 99 skips
>   72402 UNITS in planarX,  131069 runs,  3 skips
> yuv420p14le
>   12168 UNITS in planarX,  131067 runs,  5 skips
>   72480 UNITS in planarX,  131069 runs,  3 skips
> yuv420p14be
>   12358 UNITS in planarX,  130948 runs,124 skips
>   73772 UNITS in planarX,  131063 runs,  9 skips
> yuv420p16le
>   10439 UNITS in planarX,  130911 runs,161 skips
>  157923 UNITS in planarX,  131068 runs,  4 skips
> yuv420p16be
>   10463 UNITS in planarX,  130874 runs,198 skips
>  154405 UNITS in planarX,  131061 runs, 11 skips

The number of skips in the benchmark is much larger on one
side. That makes the numbers hard to compare, as
more cases are skipped on one side.

Please adjust the parameters so the skip counts are comparable,
or redo the tests until the numbers are more similar.
Thanks
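
To make the comparison concrete, here is a small sketch that computes the speedup and the skipped fraction from one pair of lines in the quoted output. The struct and function names are hypothetical, and it assumes that the smaller UNITS value in each pair is the VSX run and the larger one the C reference; the quoted output does not label them.

```c
#include <assert.h>

/* One line of timer output: average units, accepted runs, skipped runs. */
struct timer_line {
    double units;
    long   runs;
    long   skips;
};

/* Ratio of the two averages; a value above 1 means "fast" is faster. */
static double speedup(struct timer_line fast, struct timer_line slow)
{
    assert(fast.units > 0);
    return slow.units / fast.units;
}

/* Fraction of all samples that were discarded as skips. */
static double skip_fraction(struct timer_line t)
{
    long total = t.runs + t.skips;

    assert(total > 0);
    return (double)t.skips / (double)total;
}
```

For the yuv420p9le pair this gives a speedup just under 6, matching the "about 6x" claim, while 96 samples were skipped on one side and 6 on the other out of the same total of 131072. That asymmetry is the point raised above.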

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Modern terrorism, a quick summary: Need oil, start war with country that
has oil, kill hundred thousand in war. Let country fall into chaos,
be surprised about rise of fundamentalists. Drop more bombs, kill more
people, be surprised about them taking revenge and drop even more bombs
and strip your own citizens of their rights and freedoms. to be continued




[FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-12 Thread Lauri Kasanen
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -

9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.

Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
of the 16-bit function. This includes the vec_mulo/mule functions too,
not just vmuluwm.

yuv420p9le
  12341 UNITS in planarX,  130976 runs, 96 skips
  73752 UNITS in planarX,  131066 runs,  6 skips
yuv420p9be
  12364 UNITS in planarX,  131025 runs, 47 skips
  73001 UNITS in planarX,  131055 runs, 17 skips
yuv420p10le
  12386 UNITS in planarX,  131042 runs, 30 skips
  72735 UNITS in planarX,  131062 runs, 10 skips
yuv420p10be
  12337 UNITS in planarX,  131045 runs, 27 skips
  72734 UNITS in planarX,  131057 runs, 15 skips
yuv420p12le
  12236 UNITS in planarX,  131058 runs, 14 skips
  73029 UNITS in planarX,  131062 runs, 10 skips
yuv420p12be
  12218 UNITS in planarX,  130973 runs, 99 skips
  72402 UNITS in planarX,  131069 runs,  3 skips
yuv420p14le
  12168 UNITS in planarX,  131067 runs,  5 skips
  72480 UNITS in planarX,  131069 runs,  3 skips
yuv420p14be
  12358 UNITS in planarX,  130948 runs,124 skips
  73772 UNITS in planarX,  131063 runs,  9 skips
yuv420p16le
  10439 UNITS in planarX,  130911 runs,161 skips
 157923 UNITS in planarX,  131068 runs,  4 skips
yuv420p16be
  10463 UNITS in planarX,  130874 runs,198 skips
 154405 UNITS in planarX,  131061 runs, 11 skips

Signed-off-by: Lauri Kasanen 
---
 libswscale/ppc/swscale_ppc_template.c |   4 +-
 libswscale/ppc/swscale_vsx.c  | 186 +-
 2 files changed, 184 insertions(+), 6 deletions(-)

v2: Separate macros so that yuv2plane1_16_vsx remains available for power7
v3: Remove accidental tabs, switch to HAVE_POWER8 from configure + runtime check
v4: #if HAVE_POWER8
v5: Get rid of the mul #if, turns out gcc vec_mul works
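
As a reading aid for the patch that follows, this is a scalar sketch of what one output sample goes through: accumulate the filter taps, add a rounding term, shift, and clip. The shift, rounding constant and clip limit use the same expressions as the patch; the function name filter_sample is hypothetical, the byte-order handling of the store is left out, and the sketch only covers output depths below 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar sketch of one output sample at column i.
 * shift, the rounding term and clip follow the definitions in the patch;
 * the big/little endian store is deliberately omitted.
 */
static uint16_t filter_sample(const int16_t *filter, int filterSize,
                              const int16_t **src, int i, int output_bits)
{
    const int shift = 11 + 16 - output_bits;
    const int clip  = (1 << output_bits) - 1;
    int val = 1 << (shift - 1);   /* rounding term */
    int j;

    assert(output_bits > 0 && output_bits < 16);
    for (j = 0; j < filterSize; j++)
        val += src[j][i] * filter[j];

    if (val < 0)                  /* clip low before shifting a negative */
        return 0;
    val >>= shift;
    return (uint16_t)(val > clip ? clip : val);
}
```

The vector version in the patch does the same work on several samples at once, which is why it needs 32-bit vector multiplies for the accumulation.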

diff --git a/libswscale/ppc/swscale_ppc_template.c b/libswscale/ppc/swscale_ppc_template.c
index 00e4b99..11decab 100644
--- a/libswscale/ppc/swscale_ppc_template.c
+++ b/libswscale/ppc/swscale_ppc_template.c
@@ -21,7 +21,7 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
-static void FUNC(yuv2planeX_16)(const int16_t *filter, int filterSize,
+static void FUNC(yuv2planeX_8_16)(const int16_t *filter, int filterSize,
   const int16_t **src, uint8_t *dest,
   const uint8_t *dither, int offset, int x)
 {
@@ -88,7 +88,7 @@ static void FUNC(yuv2planeX)(const int16_t *filter, int filterSize,
 yuv2planeX_u(filter, filterSize, src, dest, dst_u, dither, offset, 0);
 
 for (i = dst_u; i < dstW - 15; i += 16)
-FUNC(yuv2planeX_16)(filter, filterSize, src, dest + i, dither,
+FUNC(yuv2planeX_8_16)(filter, filterSize, src, dest + i, dither,
   offset, i);
 
 yuv2planeX_u(filter, filterSize, src, dest, dstW, dither, offset, i);
diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c
index 70da6ae..f6c7f1d 100644
--- a/libswscale/ppc/swscale_vsx.c
+++ b/libswscale/ppc/swscale_vsx.c
@@ -83,6 +83,8 @@
 #include "swscale_ppc_template.c"
 #undef FUNC
 
+#undef vzero
+
 #endif /* !HAVE_BIGENDIAN */
 
 static void yuv2plane1_8_u(const int16_t *src, uint8_t *dest, int dstW,
@@ -180,6 +182,76 @@ static void yuv2plane1_nbps_vsx(const int16_t *src, uint16_t *dest, int dstW,
 yuv2plane1_nbps_u(src, dest, dstW, big_endian, output_bits, i);
 }
 
+static void yuv2planeX_nbps_u(const int16_t *filter, int filterSize,
+  const int16_t **src, uint16_t *dest, int dstW,
+  int big_endian, int output_bits, int start)
+{
+int i;
+int shift = 11 + 16 - output_bits;
+
+for (i = start; i < dstW; i++) {
+int val = 1 << (shift - 1);
+int j;
+
+for (j = 0; j < filterSize; j++)
+val += src[j][i] * filter[j];
+
+output_pixel(&dest[i], val);
+}
+}
+
+static void yuv2planeX_nbps_vsx(const int16_t *filter, int filterSize,
+const int16_t **src, uint16_t *dest, int dstW,
+int big_endian, int output_bits)
+{
+const int dst_u = -(uintptr_t)dest & 7;
+const int shift = 11 + 16 - output_bits;
+const int add = (1 << (shift - 1));
+const int clip = (1 << output_bits) - 1;
+const uint16_t swap = big_endian ? 8 : 0;
+const vector uint32_t vadd = (vector uint32_t) {add, add, add, add};
+const vector uint32_t vshift = (vector uint32_t) {shift, shift, shift, shift};
+const vector uint16_t vswap = (vector uint16_t) {swap, swap, swap, swap, swap, swap, swap, swap};
+const vector uint16_t vlargest = (vector uint16_t) {clip, clip, clip, clip, clip, clip, clip, 

Re: [FFmpeg-devel] [PATCH v4] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-12 Thread Lauri Kasanen
On Sat, 12 Jan 2019 01:03:09 +0100
Michael Niedermayer  wrote:

> On Fri, Jan 11, 2019 at 11:16:20AM +0200, Lauri Kasanen wrote:
> > On Fri, 11 Jan 2019 09:56:15 +0100
> > Michael Niedermayer  wrote:
> > 
> > > > +#ifdef __GNUC__
> > > > +// GCC does not support vmuluwm yet. Bug open.
> > > 
> > > this should probably be tested by configure similar to how other
> > > compiler limitations are tested
> > 
> > We can't really test for it, because there is no standard name for it. I
> > don't know what name the gcc devs will pick for it, it could be vec_mul,
> > vec_vmuluwm or something different.
> 
> the code contains a #if and a #else case
> so i thought there was something else than the __GNUC__ case and gcc
> would follow that

It's second-hand info from libsimdpp. I don't know where they got it.

However, I found out yesterday that gcc docs are wrong, and vec_mul for
gcc does use the correct instruction on power8. Respinning.
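
For context, vmuluwm is a modulo multiply on 32-bit unsigned words: each lane keeps only the low 32 bits of its product. A portable scalar emulation of that behavior, useful for checking results on a machine without VSX, might look like the sketch below. The function name is hypothetical and this is not how the patch implements it; the patch relies on vec_mul.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable emulation of a 4-lane 32-bit modulo multiply: every lane keeps
 * only the low 32 bits of its 64-bit product.
 */
static void mul_u32x4_modulo(const uint32_t a[4], const uint32_t b[4],
                             uint32_t out[4])
{
    int lane;

    assert(a && b && out);
    for (lane = 0; lane < 4; lane++)
        out[lane] = (uint32_t)((uint64_t)a[lane] * b[lane]);
}
```

This differs from vec_mule and vec_mulo, which multiply only the even or odd elements and keep the full double-width products.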

- Lauri