Re: [FFmpeg-devel] What new instructions would you like?

2020-02-01 Thread Lauri Kasanen
On Sat, 1 Feb 2020 12:53:28 +0100 James Darnley wrote: > On 30/12/2019, Lauri Kasanen wrote: > > For the Libre RISC-V project, I'm going to research the popular codecs > > and design new instructions to help speed them up. With ffmpeg being > > home to lots of asm f

[FFmpeg-devel] What new instructions would you like?

2019-12-30 Thread Lauri Kasanen
Hi, For the Libre RISC-V project, I'm going to research the popular codecs and design new instructions to help speed them up. With ffmpeg being home to lots of asm folks for many platforms, I also want to ask your opinion. What new instructions would you like? Anything particular you find missing

Re: [FFmpeg-devel] [PATCH v2 0/2] AltiVec/VSX fixes in swscale

2019-10-03 Thread Lauri Kasanen
On Tue, 1 Oct 2019 18:26:20 +0300 Lauri Kasanen wrote: > Hi, > > I'll apply these in a couple days if no objections. Works ok in my > tests. Applying. - Lauri ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org

Re: [FFmpeg-devel] [PATCH v2 0/2] AltiVec/VSX fixes in swscale

2019-10-01 Thread Lauri Kasanen
Hi, I'll apply these in a couple days if no objections. Works ok in my tests. - Lauri

Re: [FFmpeg-devel] [PATCH V3] swscale/ppc/yuv2rgb_altivec: Fixes compiler bug - replace vec_lvsl/vec_perm with vec_xl

2019-08-24 Thread Lauri Kasanen
Hi, This change uses VSX code in a function marked Altivec, aka it makes it not work on pre-power7 (macs, etc). As such I would've NAK'd it. - Lauri

Re: [FFmpeg-devel] [PATCH v2 0/2] AltiVec/VSX fixes in swscale

2019-08-24 Thread Lauri Kasanen
Hi, I approve of this series, but being in the middle of a move, I can't test it. - Lauri

Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-10 Thread Lauri Kasanen
On Mon, 10 Jun 2019 17:42:00 -0700 Adrian Tong wrote: > I have been trying to implement yuv420_to_bgr24 using SSE2 instruction. I > ran into the case where the output of C implemented yuv420_to_bgr24 has > slightly different resulting bgr24 image from MMX implemented > yuv420_to_bgr24. Is this ex

Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-08 Thread Lauri Kasanen
On Sat, 8 Jun 2019 06:51:51 -0700 Adrian Tong wrote: > Hi Lauri. > > Thanks for the reply, any reason why this has not been implemented before ? > it seems to me that this would be a pretty important/hot function. Just the usual, nobody has had the interest. There are other places too where the

Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-07 Thread Lauri Kasanen
On Fri, 7 Jun 2019 08:38:35 -0700 Adrian Tong wrote: > Hi > > I have a workload which spends a significant amount of time (~10%) in > the yuv420_bgr24_mmxext function in FFmpeg. > > I looked at the assembly and profile and see MMX (64 bit) registers are > used. I wonder whether we can have a SSE2

Re: [FFmpeg-devel] [PATCH] swscale: Add support for NV24 and NV42

2019-05-10 Thread Lauri Kasanen
On Fri, 10 May 2019 10:08:57 -0700 Philip Langdale wrote: > On 2019-05-10 08:12, Lauri Kasanen wrote: > > On Fri, 10 May 2019 08:07:45 -0700 > > Philip Langdale wrote: > > > >> On Fri, 10 May 2019 09:35:40 +0300 > >> Lauri Kasanen wrote: > >>

Re: [FFmpeg-devel] [PATCH] swscale: Add support for NV24 and NV42

2019-05-10 Thread Lauri Kasanen
On Fri, 10 May 2019 08:07:45 -0700 Philip Langdale wrote: > On Fri, 10 May 2019 09:35:40 +0300 > Lauri Kasanen wrote: > > > > > I'm having trouble making out what formats exactly isSemiPlanarYUV() > > matches. Are you sure it's an equivalent check? >

Re: [FFmpeg-devel] [PATCH] swscale: Add support for NV24 and NV42

2019-05-09 Thread Lauri Kasanen
On Thu, 9 May 2019 22:59:12 -0700 Philip Langdale wrote: > I don't think this is terribly useful, as the only thing out there that > can even handle NV24 content is VDPAU and the only time you have to > deal with it is when doing VDPAU OpenGL interop where swscale is > irrelevant. In the other c

Re: [FFmpeg-devel] [PATCH 1/4] swscale/ppc: VSX-optimize hScale8To19_vsx

2019-05-07 Thread Lauri Kasanen
On Tue, 30 Apr 2019 14:43:52 +0300 Lauri Kasanen wrote: > ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ > -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats > test.raw > > 2.26 speedup (x86 SSE2 is 2.32): > 23772 UNITS in hscale,4096

Re: [FFmpeg-devel] [PATCH V5 1/2] configure: sort decoder/encoder/filter/... names in alphabet order

2019-05-01 Thread Lauri Kasanen
On Wed, 1 May 2019 22:57:47 +0200 Carl Eugen Hoyos wrote: > 2019-04-28 3:18 GMT+02:00, Alexander Strasser : > > > What do you think about using awk instead of shell? > > Do we only use awk for --enable-random and the dependency > files so far? Does configure also work without awk now and > would

Re: [FFmpeg-devel] [PATCH 1/4] swscale/ppc: VSX-optimize hScale8To19_vsx

2019-04-30 Thread Lauri Kasanen
Copy-paste thinko in the title I see. Will remove the _vsx suffix from the title. - Lauri

Re: [FFmpeg-devel] [PATCH 3/4] swscale/ppc: VSX-optimize hScale16To*

2019-04-30 Thread Lauri Kasanen
hScale8To19_vsx (x86 SSE2 is 2.37): 30896 UNITS in hscale,8192 runs, 0 skips 63956 UNITS in hscale,8192 runs, 0 skips 2.06 for hScale16To15_vsx: 30531 UNITS in hscale,8192 runs, 0 skips 63161 UNITS in hscale,8192 runs, 0 skips Signed-off-by: Lauri Kasanen

[FFmpeg-devel] [PATCH 3/4] swscale/ppc: VSX-optimize hScale16To*

2019-04-30 Thread Lauri Kasanen
hScale8To19_vsx (x86 SSE2 is 2.37): 30896 UNITS in hscale,8192 runs, 0 skips 63956 UNITS in hscale,8192 runs, 0 skips 2.06 for hScale16To15_vsx: 30531 UNITS in hscale,8192 runs, 0 skips 63161 UNITS in hscale,8192 runs, 0 skips Signed-off-by: Lauri Kasanen

[FFmpeg-devel] [PATCH 4/4] swscale/ppc: Shorten power8 tests via a var

2019-04-30 Thread Lauri Kasanen
Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c index 31d3ba2..a617f76 100644 --- a/libswscale/ppc/swscale_vsx.c +++ b

[FFmpeg-devel] [PATCH 2/4] swscale/ppc: Indent

2019-04-30 Thread Lauri Kasanen
Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c index a82cf95..17c15a2 100644 --- a/libswscale/ppc/swscale_vsx.c +++ b/libswscale/ppc

[FFmpeg-devel] [PATCH 1/4] swscale/ppc: VSX-optimize hScale8To19_vsx

2019-04-30 Thread Lauri Kasanen
: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 64 +++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c index 2e20ab3..a82cf95 100644 --- a/libswscale/ppc/swscale_vsx.c +++ b/libswscale

Re: [FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize hscale_fast

2019-04-30 Thread Lauri Kasanen
On Wed, 24 Apr 2019 14:02:16 +0300 Lauri Kasanen wrote: > ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags > fast_bilinear \ > -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw > > 4.27 speedup for hyscale_fast: > 24796 UNI

[FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize hscale_fast

2019-04-24 Thread Lauri Kasanen
, 0 skips 4.48 speedup for hcscale_fast: 19911 UNITS in hcscale_fast,4095 runs, 1 skips 4437 UNITS in hcscale_fast,4096 runs, 0 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 196 +++ 1 file changed, 196
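
The speedup quoted in this snippet can be checked from the two UNITS figures it lists. A minimal sketch in C, assuming the larger figure is the unoptimized path and the smaller one the VSX path; the helper name `speedup` is hypothetical and not part of FFmpeg.

```c
#include <assert.h>

/* Hypothetical helper: ratio of baseline cost to optimized cost.
 * Both arguments are "UNITS" figures as printed in the benchmark output. */
static double speedup(double baseline_units, double optimized_units)
{
    assert(baseline_units > 0 && optimized_units > 0);
    return baseline_units / optimized_units;
}
```

For the hcscale_fast figures, speedup(19911, 4437) comes to about 4.487, which is consistent with the quoted 4.48 if that figure was truncated rather than rounded.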

Re: [FFmpeg-devel] [PATCH]lavc/alac: Make a variable unsigned

2019-04-18 Thread Lauri Kasanen
On Thu, 18 Apr 2019 15:07:03 +0200 Hendrik Leppkes wrote: > On Thu, Apr 18, 2019 at 2:54 PM Lauri Kasanen wrote: > > > > On Thu, 18 Apr 2019 13:53:37 +0200 > > Carl Eugen Hoyos wrote: > > > > > Hi! > > > > > > Attached patch silences a wa

Re: [FFmpeg-devel] [PATCH]lavc/alac: Make a variable unsigned

2019-04-18 Thread Lauri Kasanen
On Thu, 18 Apr 2019 13:53:37 +0200 Carl Eugen Hoyos wrote: > Hi! > > Attached patch silences a warning that is shown with some gcc versions. It pokes my style sense to have different things in the sizeof() and the var. How about uint32_t in both? - Lauri

Re: [FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2

2019-04-10 Thread Lauri Kasanen
On Fri, 5 Apr 2019 11:41:19 +0300 Lauri Kasanen wrote: > ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags > fast_bilinear \ > -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ > -cpuflags 0 -v error - > > 32-bit mul, power8 onl

Re: [FFmpeg-devel] [PATCH v2] Added XV Support

2019-04-07 Thread Lauri Kasanen
On Mon, 8 Apr 2019 06:39:27 +0800 Steven Liu wrote: > >+.long_name = NULL_IF_CONFIG_SMALL("Xunlie Video File"), XV is a video output format, so please make the title something like "flv: Add XV (Xunlie Video) support". - Lauri

Re: [FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize yuv2rgb_full_X

2019-04-06 Thread Lauri Kasanen
On Mon, 1 Apr 2019 13:37:32 +0300 Lauri Kasanen wrote: > ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ > -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ > -cpuflags 0 -v error - > > 32-bit mul, power8 only. > > ~6.

Re: [FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize yuv2rgb_full_2

2019-04-06 Thread Lauri Kasanen
On Mon, 1 Apr 2019 13:13:59 +0300 Lauri Kasanen wrote: > ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \ > -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ > -cpuflags 0 -v error - > > 32-bit mul, power8 only. > > ~

Re: [FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1

2019-04-06 Thread Lauri Kasanen
On Sun, 31 Mar 2019 17:18:47 +0300 Lauri Kasanen wrote: > ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags > fast_bilinear \ > -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \ > -cpuflags 0 -v error - > > 32-bit mul, power8 only.

[FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2

2019-04-05 Thread Lauri Kasanen
yuv2packed2, 16384 runs, 0 skips This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version is also heavily inaccurate, while the vsx version has high accuracy. Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 188

[FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize yuv2rgb_full_X

2019-04-01 Thread Lauri Kasanen
yuv2packedX, 16384 runs, 0 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 160 +++ 1 file changed, 160 insertions(+) diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c index 6ff8b62..e05f9ec 100644 --- a

[FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize yuv2rgb_full_2

2019-04-01 Thread Lauri Kasanen
yuv2packed2, 16384 runs, 0 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 166 +++ 1 file changed, 166 insertions(+) diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c index 0ac8cac..6ff8b62 100644 --- a

Re: [FFmpeg-devel] [PATCH] This patch addresses Trac ticket #5570. The optimized functions are in file libswscale/ppc/input_vsx.c. Each optimized function name is a concatenation of the corresponding

2019-03-31 Thread Lauri Kasanen
On Mon, 1 Apr 2019 09:07:48 +0300 slava wrote: > Sorry for title. It is my first experience in git send-email. Can I make > a benchmark with handwritten tests or have some standard tool in ffmpeg? > And will the benchmark on x86-64 be informative? We have standard bench macros, START_TIMER and ST

[FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1

2019-03-31 Thread Lauri Kasanen
UNITS in yuv2packed1, 32764 runs, 4 skips This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version is also heavily inaccurate, while the vsx version has high accuracy. Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 425

Re: [FFmpeg-devel] [PATCH 1/3] swscale/ppc: VSX-optimize yuv2422_1

2019-03-31 Thread Lauri Kasanen
On Sun, 24 Mar 2019 15:10:35 +0200 Lauri Kasanen wrote: > ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ > -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \ > -cpuflags 0 -v error - > > 15.3x speedup: > > yuyv422 > 14513 UNITS

Re: [FFmpeg-devel] [PATCH] This patch addresses Trac ticket #5570. The optimized functions are in file libswscale/ppc/input_vsx.c. Each optimized function name is a concatenation of the corresponding

2019-03-29 Thread Lauri Kasanen
On Fri, 29 Mar 2019 17:00:38 +0300 Вячеслав wrote: > --- > libswscale/ppc/Makefile |3 +- > libswscale/ppc/input_vsx.c| 3801 > + > libswscale/swscale.c |3 + > libswscale/swscale_internal.h |1 + > 4 files changed, 3807 ins

Re: [FFmpeg-devel] [PATCH v2 resend] swscale/ppc: VSX-optimize yuv2rgb_full

2019-03-27 Thread Lauri Kasanen
On Thu, 21 Mar 2019 09:54:17 +0200 Lauri Kasanen wrote: > ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ > -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \ > -cpuflags 0 -v error - > > This uses 32-bit mul, so POWER8 only. > > The followi

Re: [FFmpeg-devel] [PATCH v2] swscale: Remove duplicated code

2019-03-26 Thread Lauri Kasanen
On Tue, 26 Mar 2019 22:00:54 +0100 Michael Niedermayer wrote: > On Tue, Mar 26, 2019 at 08:58:34AM +0200, Lauri Kasanen wrote: > > In this function, the exact same clamping happens both in the if and > > unconditionally. > > > > Signed-off-by: Lauri Kasanen >

[FFmpeg-devel] [PATCH v2] swscale: Remove duplicated code

2019-03-25 Thread Lauri Kasanen
In this function, the exact same clamping happens both in the if and unconditionally. Signed-off-by: Lauri Kasanen --- libswscale/output.c | 10 -- 1 file changed, 10 deletions(-) v2: Remove the unconditional instead of the if'd clipping. I'll leave changing the bit pattern
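
Why a second identical clamp is removable: clamping is idempotent, so applying the same clamp to an already clamped value changes nothing. A small sketch in C; `demo_clip_u8` and `demo_second_clamp_is_noop` are hypothetical names, and the 8-bit range is an assumption for illustration, not taken from libswscale/output.c.

```c
#include <assert.h>

/* Hypothetical 8-bit clamp, standing in for whatever clip the real code uses. */
static int demo_clip_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Returns 1 if clamping an already clamped value changes nothing for every
 * input in [lo, hi], i.e. a second identical clamp is redundant. */
static int demo_second_clamp_is_noop(int lo, int hi)
{
    assert(lo <= hi);
    for (int v = lo; v <= hi; v++)
        if (demo_clip_u8(demo_clip_u8(v)) != demo_clip_u8(v))
            return 0;
    return 1;
}
```

Either copy of a duplicated clamp can therefore go without changing the result; which one to keep is a question of cost on the common path.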

Re: [FFmpeg-devel] [PATCH] swscale: Remove duplicated code

2019-03-25 Thread Lauri Kasanen
On Mon, 25 Mar 2019 11:17:38 +0100 Michael Niedermayer wrote: > On Sun, Mar 24, 2019 at 01:04:51PM +0200, Lauri Kasanen wrote: > > In this function, the exact same clamping happens both in the if and > > unconditionally. > > > > Signed-off-by: Lauri Kasanen >

[FFmpeg-devel] [PATCH 3/3 v2] swscale/ppc: VSX-optimize yuv2422_X

2019-03-25 Thread Lauri Kasanen
yvyu422 117669 UNITS in yuv2packedX, 16384 runs, 0 skips 16271 UNITS in yuv2packedX, 16379 runs, 5 skips uyvy422 117310 UNITS in yuv2packedX, 16384 runs, 0 skips 16226 UNITS in yuv2packedX, 16382 runs, 2 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc

[FFmpeg-devel] [PATCH 2/3] swscale/ppc: VSX-optimize yuv2422_2

2019-03-24 Thread Lauri Kasanen
, 16383 runs, 1 skips yvyu422 19438 UNITS in yuv2packed2, 16384 runs, 0 skips 3800 UNITS in yuv2packed2, 16380 runs, 4 skips uyvy422 19128 UNITS in yuv2packed2, 16384 runs, 0 skips 3721 UNITS in yuv2packed2, 16380 runs, 4 skips Signed-off-by: Lauri Kasanen

[FFmpeg-devel] [PATCH 3/3] swscale/ppc: VSX-optimize yuv2422_X

2019-03-24 Thread Lauri Kasanen
runs, 2 skips yvyu422 117669 UNITS in yuv2packedX, 16384 runs, 0 skips 16271 UNITS in yuv2packedX, 16379 runs, 5 skips uyvy422 117310 UNITS in yuv2packedX, 16384 runs, 0 skips 16226 UNITS in yuv2packedX, 16382 runs, 2 skips Signed-off-by: Lauri Kasanen

[FFmpeg-devel] [PATCH 1/3] swscale/ppc: VSX-optimize yuv2422_1

2019-03-24 Thread Lauri Kasanen
skips yvyu422 14516 UNITS in yuv2packed1, 32767 runs, 1 skips 943 UNITS in yuv2packed1, 32767 runs, 1 skips uyvy422 14530 UNITS in yuv2packed1, 32767 runs, 1 skips 941 UNITS in yuv2packed1, 32766 runs, 2 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc

[FFmpeg-devel] [PATCH] swscale: Remove duplicated code

2019-03-24 Thread Lauri Kasanen
In this function, the exact same clamping happens both in the if and unconditionally. Signed-off-by: Lauri Kasanen --- libswscale/output.c | 14 -- 1 file changed, 14 deletions(-) diff --git a/libswscale/output.c b/libswscale/output.c index d7c53e6..8441ddd 100644 --- a/libswscale

Re: [FFmpeg-devel] [PATCH]lavf: Constify the probe function argument

2019-03-21 Thread Lauri Kasanen
On Thu, 21 Mar 2019 01:20:21 +0100 Carl Eugen Hoyos wrote: > Hi! > > Attached patch makes the only argument to the common probe() function const. > > Please comment, Carl Eugen LGTM - Lauri

[FFmpeg-devel] [PATCH v2 resend] swscale/ppc: VSX-optimize yuv2rgb_full

2019-03-21 Thread Lauri Kasanen
, 0 skips 8659 UNITS in yuv2packed1, 32767 runs, 1 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 291 +++ 1 file changed, 291 insertions(+) v2: HAVE_POWER8 from ifdef to if Resending due to mail client troubles diff

Re: [FFmpeg-devel] [PATCH v2] swscale/ppc: VSX-optimize yuv2rgb_full

2019-03-20 Thread Lauri Kasanen
On Wed, 20 Mar 2019 16:31:57 +0100 Carl Eugen Hoyos wrote: > 2019-03-20 16:06 GMT+01:00, Lauri Kasanen : > > On Wed, 20 Mar 2019 15:51:20 +0100 > > Carl Eugen Hoyos wrote: > > > >> 2019-03-20 15:06 GMT+01:00, Lauri Kasanen : > >>

Re: [FFmpeg-devel] [PATCH v2] swscale/ppc: VSX-optimize yuv2rgb_full

2019-03-20 Thread Lauri Kasanen
On Wed, 20 Mar 2019 15:51:20 +0100 Carl Eugen Hoyos wrote: > 2019-03-20 15:06 GMT+01:00, Lauri Kasanen : > > > +case AV_PIX_FMT_BGRA: > > +if (HAVE_POWER8 && cpu_flags & AV_CPU_FLAG_POWER8) { > > +if (!c->need
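
The quoted condition combines a build-time constant with a runtime CPU flag in a plain `if`. A rough sketch of that pattern in C11, assuming the build system always defines the macro to 0 or 1; every `DEMO_`/`demo_` name below is a hypothetical stand-in, not FFmpeg API.

```c
#include <assert.h>

/* Hypothetical stand-ins for a configure-time macro (always defined, 0 or 1)
 * and a runtime CPU capability bit. */
#define DEMO_HAVE_POWER8     1
#define DEMO_CPU_FLAG_POWER8 0x4u

static_assert(DEMO_HAVE_POWER8 == 0 || DEMO_HAVE_POWER8 == 1,
              "build-time macro must be defined to 0 or 1");

typedef enum { DEMO_PATH_C, DEMO_PATH_POWER8 } DemoPath;

/* With a plain `if`, the guarded branch is always parsed and type-checked;
 * when the macro is 0 the compiler can discard it as dead code. */
static DemoPath demo_select_path(unsigned cpu_flags)
{
    if (DEMO_HAVE_POWER8 && (cpu_flags & DEMO_CPU_FLAG_POWER8))
        return DEMO_PATH_POWER8;
    return DEMO_PATH_C;
}
```

Compared with `#ifdef`, this keeps both branches visible to the compiler on every platform, so the guarded code is less likely to rot unnoticed on builds where it is disabled.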

[FFmpeg-devel] [PATCH v2] swscale/ppc: VSX-optimize yuv2rgb_full

2019-03-20 Thread Lauri Kasanen
, 0 skips 8659 UNITS in yuv2packed1, 32767 runs, 1 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 291 +++ 1 file changed, 291 insertions(+) v2: HAVE_POWER8 from ifdef to if diff --git a/libswscale/ppc/swscale_vsx.c b

Re: [FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize yuv2rgb_full

2019-03-20 Thread Lauri Kasanen
On Wed, 20 Mar 2019 14:41:27 +0100 Carl Eugen Hoyos wrote: > 2019-03-20 13:37 GMT+01:00, Lauri Kasanen : > > > @@ -480,5 +722,66 @@ av_cold void ff_sws_init_swscale_vsx(SwsContext *c) > > Are there followup patches? > Or why is the following hunk so convoluted? I plan

[FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize yuv2rgb_full

2019-03-20 Thread Lauri Kasanen
, 0 skips 8659 UNITS in yuv2packed1, 32767 runs, 1 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 303 +++ 1 file changed, 303 insertions(+) diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c index

Re: [FFmpeg-devel] [PATCH 2/2] swscale/ppc: Add av_unused to template vars only used in one includer

2019-03-20 Thread Lauri Kasanen
On Mon, 18 Mar 2019 13:56:52 +0200 Lauri Kasanen wrote: > Signed-off-by: Lauri Kasanen > --- > libswscale/ppc/swscale_ppc_template.c | 21 +++-- > 1 file changed, 11 insertions(+), 10 deletions(-) Applying these t

Re: [FFmpeg-devel] [PATCH 1/2] swscale/ppc: Clean up some mixed decl warnings

2019-03-18 Thread Lauri Kasanen
On Mon, 18 Mar 2019 14:06:15 +0100 Carl Eugen Hoyos wrote: > > This looks good to me if you tested it and it reduces the number of warnings. Tested on power8. With these two patches, swscale/ppc has no warnings. - Lauri

[FFmpeg-devel] [PATCH 2/2] swscale/ppc: Add av_unused to template vars only used in one includer

2019-03-18 Thread Lauri Kasanen
Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_ppc_template.c | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/libswscale/ppc/swscale_ppc_template.c b/libswscale/ppc/swscale_ppc_template.c index 3964a7a..aff2dd7 100644 --- a/libswscale/ppc

[FFmpeg-devel] [PATCH 1/2] swscale/ppc: Clean up some mixed decl warnings

2019-03-18 Thread Lauri Kasanen
Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_altivec.c | 6 +++--- libswscale/ppc/swscale_ppc_template.c | 9 + libswscale/ppc/swscale_vsx.c | 6 +++--- 3 files changed, 11 insertions(+), 10 deletions(-) diff --git a/libswscale/ppc/swscale_altivec.c b/libswscale

Re: [FFmpeg-devel] [PATCH] avcodec/tiff: Add support for recognizing DNG files

2019-03-18 Thread Lauri Kasanen
On Mon, 18 Mar 2019 09:13:01 +0100 Moritz Barsnick wrote: > On Sun, Mar 17, 2019 at 23:05:01 +0100, Paul B Mahol wrote: > > Still wrong, You can decode images you linked just fine (albeit with > > incorrect colors) with command: > > > > ffmpeg -subimage 1 -i IMAGE.dng rest of command. > > S

Re: [FFmpeg-devel] [PATCH 2/2] avcodec/pnm: Avoid structure pointer dereferences in inner loop in pnm_get()

2019-02-21 Thread Lauri Kasanen
On Thu, 21 Feb 2019 20:34:29 +0100 Michael Niedermayer wrote: > Improves speed from 5.4 to 4.2 seconds > Fixes: > 13149/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PGM_fuzzer-5760833622114304 LGTM Though, I really would expect the compiler to detect and optimize that. I wonder if "PNMCon
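
One general reason a compiler may not do this hoisting by itself is aliasing: a store through a `char` pointer may legally modify the struct, so fields read in the loop may have to be reloaded. A sketch of the manual version in C follows; `DemoReader` and `demo_get_token` are hypothetical and only loosely modelled on a token reader, not the actual pnm_get() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader context, loosely modelled on a byte-stream parser. */
typedef struct DemoReader {
    const unsigned char *cur;
    const unsigned char *end;
} DemoReader;

/* Copies one space-delimited token into dst (NUL-terminated, truncated to
 * fit). cur/end are cached in locals for the loops and written back once. */
static size_t demo_get_token(DemoReader *r, char *dst, size_t dst_size)
{
    const unsigned char *cur = r->cur;
    const unsigned char *end = r->end;
    size_t n = 0;

    assert(dst_size > 0);

    /* Skip leading spaces. */
    while (cur < end && *cur == ' ')
        cur++;

    /* dst is a char pointer and may alias *r; if r->cur were used directly
     * here, the compiler might have to reload it after each store. */
    while (cur < end && *cur != ' ') {
        if (n + 1 < dst_size)
            dst[n++] = (char)*cur;
        cur++;
    }
    dst[n] = '\0';

    r->cur = cur; /* single write-back */
    return n;
}
```

Whether this helps in practice depends on the compiler and the surrounding code, so it is worth measuring rather than assuming.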

Re: [FFmpeg-devel] [PATCH v6] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-02-04 Thread Lauri Kasanen
On Sun, 13 Jan 2019 10:26:20 +0200 Lauri Kasanen wrote: > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt > yuv420p16be \ > -s 1920x1728 -f null -vframes 100 -v error -nostats - > > 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x. > Fate p

Re: [FFmpeg-devel] [PATCH] avutil/ppc/cpu: Fix power8 linux detection

2019-02-04 Thread Lauri Kasanen
On Tue, 8 Jan 2019 11:08:04 +0200 Lauri Kasanen wrote: > The existing code was in no released kernel that I can see. The corrected code > was added in 3.9. > > Signed-off-by: Lauri Kasanen > --- > libavutil/ppc/cpu.c | 10 +- > 1 file changed, 5 insertions(+), 5 d

[FFmpeg-devel] [PATCH] MAINTAINERS: add myself to the PPC section

2019-01-27 Thread Lauri Kasanen
Signed-off-by: Lauri Kasanen --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) Ref http://ffmpeg.org/pipermail/ffmpeg-devel/2019-January/239357.html Requesting commit access so I don't have to constantly bug Michael. diff --git a/MAINTAINERS b/MAINTAINERS index bc2ae13..e3a80e9 1

Re: [FFmpeg-devel] [PATCH] avutil/ppc/cpu: Fix power8 linux detection

2019-01-27 Thread Lauri Kasanen
On Thu, 17 Jan 2019 09:40:09 +0200 Lauri Kasanen wrote: > On Tue, 8 Jan 2019 11:08:04 +0200 > Lauri Kasanen wrote: > > > The existing code was in no released kernel that I can see. The corrected > > code > > was added in 3.9. > > > > Signed-off-by: La

Re: [FFmpeg-devel] [PATCH v6] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-27 Thread Lauri Kasanen
On Mon, 14 Jan 2019 16:13:52 +0100 Michael Niedermayer wrote: > On Sun, Jan 13, 2019 at 10:26:20AM +0200, Lauri Kasanen wrote: > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt > > yuv420p16be \ > > -s 1920x1728 -f null -vframes 100 -v error -nostat

Re: [FFmpeg-devel] [PATCH] avutil/ppc/cpu: Fix power8 linux detection

2019-01-16 Thread Lauri Kasanen
On Tue, 8 Jan 2019 11:08:04 +0200 Lauri Kasanen wrote: > The existing code was in no released kernel that I can see. The corrected code > was added in 3.9. > > Signed-off-by: Lauri Kasanen > --- > libavutil/ppc/cpu.c | 10 +- > 1 file changed, 5 insertions(+),

Re: [FFmpeg-devel] Video codec design for very low-end decoder

2019-01-13 Thread Lauri Kasanen
On Mon, 7 Jan 2019 12:37:01 -0500 "Ronald S. Bultje" wrote: > On Mon, Jan 7, 2019 at 12:22 PM Lauri Kasanen wrote: > > "Ronald S. Bultje" wrote: > > > > > Have you considered vp8? It may sound weird but this is basically what > > > vp8 was

[FFmpeg-devel] [PATCH v6] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-13 Thread Lauri Kasanen
, 131072 runs, 0 skips yuv420p16be 10634 UNITS in planarX, 131072 runs, 0 skips 150959 UNITS in planarX, 131072 runs, 0 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_ppc_template.c | 4 +- libswscale/ppc/swscale_vsx.c | 186

Re: [FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-12 Thread Lauri Kasanen
On Sat, 12 Jan 2019 14:52:07 +0100 Michael Niedermayer wrote: > On Sat, Jan 12, 2019 at 10:47:50AM +0200, Lauri Kasanen wrote: > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt > > yuv420p16be \ > > -s 1920x1728 -f null -vframes 100 -v error -nostat

[FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-12 Thread Lauri Kasanen
yuv420p16be 10463 UNITS in planarX, 130874 runs,198 skips 154405 UNITS in planarX, 131061 runs, 11 skips Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_ppc_template.c | 4 +- libswscale/ppc/swscale_vsx.c | 186 +- 2 files changed, 184

Re: [FFmpeg-devel] [PATCH v4] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-12 Thread Lauri Kasanen
On Sat, 12 Jan 2019 01:03:09 +0100 Michael Niedermayer wrote: > On Fri, Jan 11, 2019 at 11:16:20AM +0200, Lauri Kasanen wrote: > > On Fri, 11 Jan 2019 09:56:15 +0100 > > Michael Niedermayer wrote: > > > > > > +#ifdef __GNUC__ > > > > +

Re: [FFmpeg-devel] [PATCH v4] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-11 Thread Lauri Kasanen
On Fri, 11 Jan 2019 09:56:15 +0100 Michael Niedermayer wrote: > > +#ifdef __GNUC__ > > +// GCC does not support vmuluwm yet. Bug open. > > this should probably be tested by configure similar to how other > compiler limitations are tested We can't really test for it, because there is

Re: [FFmpeg-devel] [PATCH] avutil/ppc/cpu: Fix power8 linux detection

2019-01-10 Thread Lauri Kasanen
On Thu, 10 Jan 2019 18:09:21 +0100 Carl Eugen Hoyos wrote: > >> > -goto out; > >> > >> This seems like an unrelated change. > > > > It's necessary. HWCAP appears before HWCAP2 in the array, so if the > > code jumps out in HWCAP, it never gets to checking the CAP2 bits like > > pow
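
The ordering argument can be illustrated with a mock auxiliary vector. This is a sketch in C under stated assumptions: the entry layout, the `DEMO_` constants and the feature bit values are hypothetical stand-ins, not the real kernel definitions or the code in libavutil/ppc/cpu.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for auxv entry types and feature bits. */
enum { DEMO_AT_NULL = 0, DEMO_AT_HWCAP = 16, DEMO_AT_HWCAP2 = 26 };
#define DEMO_HWCAP_ALTIVEC 0x1u
#define DEMO_HWCAP2_POWER8 0x2u

#define DEMO_FLAG_ALTIVEC 0x1
#define DEMO_FLAG_POWER8  0x2

typedef struct DemoAuxEntry {
    unsigned long type;
    unsigned long value;
} DemoAuxEntry;

/* Scans every entry up to the terminator. There is deliberately no early
 * exit after HWCAP: HWCAP2 comes later in the array and must still be seen. */
static int demo_scan_auxv(const DemoAuxEntry *auxv)
{
    int flags = 0;

    assert(auxv != NULL);
    for (size_t i = 0; auxv[i].type != DEMO_AT_NULL; i++) {
        if (auxv[i].type == DEMO_AT_HWCAP &&
            (auxv[i].value & DEMO_HWCAP_ALTIVEC))
            flags |= DEMO_FLAG_ALTIVEC;
        else if (auxv[i].type == DEMO_AT_HWCAP2 &&
                 (auxv[i].value & DEMO_HWCAP2_POWER8))
            flags |= DEMO_FLAG_POWER8;
    }
    return flags;
}
```

With a `goto out` right after the HWCAP entry, the loop in this sketch would return only the AltiVec flag even on a machine whose HWCAP2 entry advertises POWER8.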

[FFmpeg-devel] [PATCH v4] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-10 Thread Lauri Kasanen
yuv420p16be 10463 UNITS in planarX, 130874 runs,198 skips 154405 UNITS in planarX, 131061 runs, 11 skips Signed-off-by: Lauri Kasanen --- v2: Separate macros so that yuv2plane1_16_vsx remains available for power7 v3: Remove accidental tabs, switch to HAVE_POWER8 from configure + runtime check

Re: [FFmpeg-devel] [PATCH v3] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-10 Thread Lauri Kasanen
On Wed, 9 Jan 2019 22:26:25 +0100 Carl Eugen Hoyos wrote: > > +#ifdef __GNUC__ > > +// GCC does not support vmuluwm yet. Bug open. > > +__asm__("vmuluwm %0, %1, %2" : "=v"(vtmp) : "v"(vin32l), > > "v"(vfilter[j])); > > +vleft = vec_add(vleft, vtmp); > > +

Re: [FFmpeg-devel] [PATCH] avutil/ppc/cpu: Fix power8 linux detection

2019-01-10 Thread Lauri Kasanen
On Wed, 9 Jan 2019 21:55:30 +0100 Carl Eugen Hoyos wrote: > 2019-01-08 10:08 GMT+01:00, Lauri Kasanen : > > The existing code was in no released kernel that I can see. The corrected > > code > > was added in 3.9. > > > > Signed-off-by: Lauri Kasanen >

Re: [FFmpeg-devel] Armada 370 problem causes ffmpeg segmentation fault

2019-01-09 Thread Lauri Kasanen
On Tue, 08 Jan 2019 21:32:30 + Simon Nash wrote: > I have encountered a problem with ffmpeg (a segmentation fault) that > occurs only when running ffmpeg on the Marvell Armada 370 processor. ... > When the 32-bit floating-point multiply instruction > 0x0018a8f2 : vmla.f32s12

[FFmpeg-devel] [PATCH v3] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-08 Thread Lauri Kasanen
yuv420p16be 10463 UNITS in planarX, 130874 runs,198 skips 154405 UNITS in planarX, 131061 runs, 11 skips Signed-off-by: Lauri Kasanen --- v2: Separate macros so that yuv2plane1_16_vsx remains available for power7 v3: Remove accidental tabs, switch to HAVE_POWER8 from configure + runtime check

[FFmpeg-devel] [PATCH] avutil/ppc/cpu: Fix power8 linux detection

2019-01-08 Thread Lauri Kasanen
The existing code was in no released kernel that I can see. The corrected code was added in 3.9. Signed-off-by: Lauri Kasanen --- libavutil/ppc/cpu.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/libavutil/ppc/cpu.c b/libavutil/ppc/cpu.c index 7bb7cd8..b022149

Re: [FFmpeg-devel] Video codec design for very low-end decoder

2019-01-07 Thread Lauri Kasanen
On Mon, 7 Jan 2019 17:42:58 +0100 Michael Niedermayer wrote: > > According to a 2010 comparison > > https://keyj.emphy.de/video-encoder-comparison/ > > x264 constrained baseline (everything off) takes something like 30% > > longer to decode vs xvid at the same rate. Probably more because that > >

Re: [FFmpeg-devel] Video codec design for very low-end decoder

2019-01-07 Thread Lauri Kasanen
On Mon, 7 Jan 2019 13:44:56 +0100 Michael Niedermayer wrote: > > The modern approaches, DCT, FFT, wavelets and such transforms, are all > > likely too slow to decode. > > you said it can do mpeg1 and xvid, these are DCT based > have you tried H.264 ? (i imagine that might with asm optimizations

[FFmpeg-devel] Video codec design for very low-end decoder

2019-01-07 Thread Lauri Kasanen
Hi, If you were to design a video codec for a very low-end decoder, what would it look like? My target is MIPS 100MHz, and it should decode 320x240x30 in full speed in software, with headroom for audio too. Seems all the codec research in last 20 years has been more quality with more overhead, no
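
The stated target implies a tight per-pixel budget. A back-of-the-envelope sketch in C, assuming the whole clock is available to the decoder and ignoring memory stalls and audio, so the result is an upper bound; the helper name `cycles_per_pixel` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: whole CPU cycles available per output pixel. */
static uint64_t cycles_per_pixel(uint64_t clock_hz, uint32_t width,
                                 uint32_t height, uint32_t fps)
{
    uint64_t pixels_per_second = (uint64_t)width * height * fps;

    assert(pixels_per_second > 0);
    return clock_hz / pixels_per_second;
}
```

For 100 MHz and 320x240 at 30 fps this is 100,000,000 / 2,304,000, i.e. 43 whole cycles per pixel before any headroom for audio is set aside.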

[FFmpeg-devel] [PATCH v2] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

2019-01-06 Thread Lauri Kasanen
yuv420p16be 10463 UNITS in planarX, 130874 runs,198 skips 154405 UNITS in planarX, 131061 runs, 11 skips Signed-off-by: Lauri Kasanen --- v2: Separate macros so that yuv2plane1_16_vsx remains available for power7 libswscale/ppc/swscale_ppc_template.c | 4 +- libswscale/ppc/swscale_vsx.c

Re: [FFmpeg-devel] [PATCH] swscale/output: VSX-optimize 9-16 bit yuv2planeX

2019-01-06 Thread Lauri Kasanen
On Sun, 6 Jan 2019 13:23:43 +0100 Carl Eugen Hoyos wrote: > 2019-01-04 20:43 GMT+01:00, Lauri Kasanen : > > +#ifdef __POWER8_VECTOR__ > > If this is correct, I assume it fixes a bug in the current code > and should be a separate patch, no? > > > case 16: >

[FFmpeg-devel] [PATCH] swscale/output: VSX-optimize 9-16 bit yuv2planeX

2019-01-04 Thread Lauri Kasanen
yuv420p16be 10463 UNITS in planarX, 130874 runs,198 skips 154405 UNITS in planarX, 131061 runs, 11 skips Signed-off-by: Lauri Kasanen --- The existing VSX yuv2plane1 is also ifdefed out for POWER7, even though it works there. This is for cleanliness mainly, separating the macros would be a

Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1

2018-12-24 Thread Lauri Kasanen
On Sun, 16 Dec 2018 11:06:53 +0200 Lauri Kasanen wrote: > This function wouldn't benefit from VSX instructions, so I put it > under altivec. > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt > grayf32le \ > -f null -vframes 100 -v error -nosta

Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1

2018-12-17 Thread Lauri Kasanen
On Mon, 17 Dec 2018 14:52:49 +0100 Carl Eugen Hoyos wrote: > >> Note that this function / this pix_fmt currently has no real use-case > >> afaict. > > > > Is there a list of which pix fmts are useful? Of course I don't want to > > waste both my and reviewers' time, if the format is considered for

Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1

2018-12-16 Thread Lauri Kasanen
On Mon, 17 Dec 2018 01:03:36 +0100 Carl Eugen Hoyos wrote: > 2018-12-16 10:06 GMT+01:00, Lauri Kasanen : > > This function wouldn't benefit from VSX instructions, so I put it > > under altivec. > > > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero

[FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1

2018-12-16 Thread Lauri Kasanen
ff-by: Lauri Kasanen --- Tested on POWER8 LE. Testing on earlier ppc and/or BE appreciated. v2: Added #undef vzero, that define broke the build on older gcc. Thanks Michael libswscale/ppc/swscale_altivec.c | 141 ++- 1 file changed, 139 insertions(+), 2 dele

Re: [FFmpeg-devel] [PATCH] swscale/output: Altivec-optimize float yuv2plane1

2018-12-16 Thread Lauri Kasanen
On Sun, 16 Dec 2018 00:22:00 +0100 Michael Niedermayer wrote: > On Sat, Dec 15, 2018 at 06:32:31PM +0200, Lauri Kasanen wrote: > > Tested on POWER8 LE. Testing on earlier ppc and/or BE appreciated. > > > > libswscale/ppc/sw

[FFmpeg-devel] [PATCH] swscale/output: Altivec-optimize float yuv2plane1

2018-12-15 Thread Lauri Kasanen
ge to video conversion. Signed-off-by: Lauri Kasanen --- Tested on POWER8 LE. Testing on earlier ppc and/or BE appreciated. libswscale/ppc/swscale_altivec.c | 139 ++- 1 file changed, 137 insertions(+), 2 deletions(-) diff --git a/libswscale/ppc/swscale_alti

[FFmpeg-devel] [PATCH v2] swscale/output: VSX-optimize 16-bit yuv2plane1

2018-12-13 Thread Lauri Kasanen
format tested with an image to video conversion. Signed-off-by: Lauri Kasanen --- v2: Copy-pasted rows were flipped. libswscale/ppc/swscale_vsx.c | 59 1 file changed, 59 insertions(+) diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc

[FFmpeg-devel] [PATCH] swscale/output: VSX-optimize 16-bit yuv2plane1

2018-12-13 Thread Lauri Kasanen
format tested with an image to video conversion. Signed-off-by: Lauri Kasanen --- libswscale/ppc/swscale_vsx.c | 59 1 file changed, 59 insertions(+) diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c index 6462c11..70da6ae 100644

Re: [FFmpeg-devel] [PATCH] swscale/ppc: Move VSX-using code to its own file

2018-12-10 Thread Lauri Kasanen
On Thu, 6 Dec 2018 21:47:18 +0100 Michael Niedermayer wrote: > On Tue, Dec 04, 2018 at 02:27:22PM +0100, Michael Niedermayer wrote: > > > > > On Mon, Dec 03, 2018 at 09:24:47AM +0200, Lauri Kasanen wrote: > > > > > > Also ping on "swscale/output:

Re: [FFmpeg-devel] [PATCH] swscale/output: VSX-optimize nbps yuv2plane1

2018-12-07 Thread Lauri Kasanen
On Fri, 7 Dec 2018 13:50:12 +0100 Carl Eugen Hoyos wrote: > > Carl Eugen Hoyos wrote: > >> 2018-11-27 14:26 GMT+01:00, Lauri Kasanen : > >> > Fate passes, each format tested with an image to video conversion. > >> > > >> > Depends o

Re: [FFmpeg-devel] [PATCH] swscale/output: VSX-optimize nbps yuv2plane1

2018-12-06 Thread Lauri Kasanen
On Thu, 6 Dec 2018 22:36:01 +0100 Carl Eugen Hoyos wrote: > 2018-11-27 14:26 GMT+01:00, Lauri Kasanen : > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt > > yuv420p9le \ > > -f null -vframes 100 -v error -nostats - > > > > Speedups

Re: [FFmpeg-devel] [PATCH] swscale/ppc: Move VSX-using code to its own file

2018-12-03 Thread Lauri Kasanen
On Tue, 4 Dec 2018 03:21:30 +0100 Michael Niedermayer wrote: > On Mon, Dec 03, 2018 at 09:24:47AM +0200, Lauri Kasanen wrote: > > Also ping on "swscale/output: VSX-optimize > > nbps yuv2plane1". > > This IIUC has not been tested on BE yet > > my ppc emula

Re: [FFmpeg-devel] [PATCH] swscale/ppc: Move VSX-using code to its own file

2018-12-02 Thread Lauri Kasanen
On Fri, 30 Nov 2018 14:05:26 +0200 Lauri Kasanen wrote: > On Fri, 30 Nov 2018 12:30:58 +0300 > Michael Kostylev wrote: > > > > >> Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" > > >> applied). Can anyone test BE? > >

Re: [FFmpeg-devel] [PATCH] swscale/ppc: Move VSX-using code to its own file

2018-11-30 Thread Lauri Kasanen
On Fri, 30 Nov 2018 12:30:58 +0300 Michael Kostylev wrote: > > >> Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" > >> applied). Can anyone test BE? > > > > Ping. > > FATE becomes green as much as possible, I haven't performed any benchmarking > though. Thanks for testing

Re: [FFmpeg-devel] [PATCH] swscale/ppc: Move VSX-using code to its own file

2018-11-29 Thread Lauri Kasanen
On Mon, 26 Nov 2018 14:24:15 +0200 Lauri Kasanen wrote: > Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied). > Can anyone test BE? Ping. - Lauri

[FFmpeg-devel] [PATCH] swscale/output: VSX-optimize nbps yuv2plane1

2018-11-27 Thread Lauri Kasanen
yuv2plane1_12LE_vsx 11.0404 yuv2plane1_14BE_vsx 10.1763 yuv2plane1_14LE_vsx 11.2728 Fate passes, each format tested with an image to video conversion. Depends on "swscale/ppc: Move VSX-using code to its own file". Only tested on LE. Signed-off-by: Lauri Kasanen --- libs

[FFmpeg-devel] [PATCH] swscale/ppc: Move VSX-using code to its own file

2018-11-26 Thread Lauri Kasanen
Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied). Can anyone test BE? Signed-off-by: Lauri Kasanen --- libswscale/ppc/Makefile | 1 + libswscale/ppc/swscale_altivec.c | 291 ++ libs
