Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-28 Thread Martin Vignali
2018-01-14 14:31 GMT+01:00 Martin Vignali :

> Hello,
>
> new patch in attach
> no more segfault if i remove movsxdifnidn, after change in the 
> "declare_func_emms"
> part
>
> I also modify the checkasm test, following your comments
>
> Martin
>
> Pushed

Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-14 Thread Martin Vignali
Hello,

new patch in attach
no more segfault if i remove movsxdifnidn, after change in the
"declare_func_emms"
part

I also modify the checkasm test, following your comments

Martin


0001-avcodec-utvideoenc-add-SIMD-avx-for-sub_left_predict.patch
Description: Binary data


0002-checkasm-add-test-for-losslessvideoencdsp-for-diff.patch
Description: Binary data
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-14 Thread James Almer
On 1/14/2018 7:46 AM, Henrik Gramner wrote:
> On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali
>  wrote:
>> +#define randomize_buffers(buf, size) \
>> +do { \
>> +int j;   \
>> +uint8_t *tmp_buf = (uint8_t *)buf;\
>> +for (j = 0; j < size; j++)   \
>> +tmp_buf[j] = rnd() & 0xFF;   \
>> +} while (0)
> 
> Typecast is redundant since everything is already uint8_t (unless you
> plan to add more things that aren't).
> 
> & 0xFF is redundant for bytes.

AV_WN32 should be used here to write all four bytes instead of wasting
24 bits from every random value generated.

> 
>> +static void check_diff_bytes(LLVidEncDSPContext c)
> [...]
>> +static void check_sub_left_pred(LLVidEncDSPContext c)
> 
> Struct arguments should generallly be passed by reference (pointer).
> 
>> +if (!dst0 || !dst1)
>> +fail();
> [...]
>> +if (!dst0 || !dst1 || !src0)
>> +fail();
> 
> fail() is used to signal that the currently tested function has failed
> and shouldn't be used outside of the check_func() scope. In this case
> it wouldn't even prevent segfaults by dereferencing the NULL pointers
> anyway in case of malloc failure.
> 
> Actually, you could get rid of malloc and just use fixed-size stack
> buffers to simplify things.
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-14 Thread Henrik Gramner
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali
 wrote:
> +#define randomize_buffers(buf, size) \
> +do { \
> +int j;   \
> +uint8_t *tmp_buf = (uint8_t *)buf;\
> +for (j = 0; j < size; j++)   \
> +tmp_buf[j] = rnd() & 0xFF;   \
> +} while (0)

Typecast is redundant since everything is already uint8_t (unless you
plan to add more things that aren't).

& 0xFF is redundant for bytes.

> +static void check_diff_bytes(LLVidEncDSPContext c)
[...]
> +static void check_sub_left_pred(LLVidEncDSPContext c)

Struct arguments should generallly be passed by reference (pointer).

> +if (!dst0 || !dst1)
> +fail();
[...]
> +if (!dst0 || !dst1 || !src0)
> +fail();

fail() is used to signal that the currently tested function has failed
and shouldn't be used outside of the check_func() scope. In this case
it wouldn't even prevent segfaults by dereferencing the NULL pointers
anyway in case of malloc failure.

Actually, you could get rid of malloc and just use fixed-size stack
buffers to simplify things.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-13 Thread Henrik Gramner
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali
 wrote:
> i try to change int width -> ptrdiff_t width to remove movsxdifnidn
> but i have a segfault if height > 1

I'm guessing due to

> +declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *dst, const uint8_t 
> *src,
> +  ptrdiff_t stride, int width, int height);

where it's still specified as int.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-13 Thread Martin Vignali
Hello,

Following Henrik Gramner's comments,
new patch in attach

i try to change int width -> ptrdiff_t width to remove movsxdifnidn
but i have a segfault if height > 1

pass fate test for me.

Martin


0001-avcodec-utvideoenc-add-SIMD-avx-for.patch
Description: Binary data


0002-checkasm-add-test-for-losslessvideoencdsp-for-diff.patch
Description: Binary data
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-12 Thread Michael Niedermayer
On Fri, Jan 12, 2018 at 09:57:03AM +0100, Martin Vignali wrote:
> >
> > this changes the output:
> > make -j12 && ./ffmpeg -i ~/videos/matrixbench_mpeg2.mpg -an -vcodec
> > utvideo -t 1 -pix_fmt yuv420p -pred left  -t 1 test2.avi
> >
> > -rw-r- 1 michael michael 3744402 Jan 12 04:20 test2.avi
> > -rw-r- 1 michael michael 3753358 Jan 12 04:19 test.avi
> >
> >
> > Hello,
> 
> Thanks for testing.
> Is it possible to have the file ?

the matrixbench file ?
thats here:
https://samples.ffmpeg.org/benchmark/testsuite1/matrixbench_mpeg2.mpg

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-12 Thread Henrik Gramner
On Thu, Jan 11, 2018 at 9:45 PM, Martin Vignali
 wrote:
> +if (check_func(c.sub_left_predict, "sub_left_predict")) {
> +call_ref(dst0, src0, stride, width, height);
> +call_new(dst1, src0, stride, width, height);
> +if (memcmp(dst0, dst1, width))
> +fail();
> +bench_new(dst1, src0, stride, width, height);
> +}

You're only verifying the results of the first row here. Changing it
to test all rows results in test failures.

> +int width = av_clip(rnd(), 16, 128);
> +int height = av_clip(rnd(), 16, 128);

This kind of clipping will result in the values being 128 almost every
run. You should also use constant sizes instead of random ones because
random ones will make benchmarking inconsistent since you'll measure
different things for the C and asm versions.

You could do something along the lines of

static const struct { uint8_t w, h, s; } planes[] = {
{16,16,16}, {21,23,25}, {32,17,48}, {15,128,16}, {128,127,128}
};

and just test all of those every run.

> +%if ARCH_X86_64
> +INIT_XMM ssse3
> +cglobal sub_left_predict, 4,5,5, dst, src, stride, width, height, x
> +mova m0, [pb_15] ; shuffle for last byte
> +mova m1, [pb_80] ; prev initial
> +.nextrow:
> +xor  xq, xq
> +
> +.loop:
> +movu   m2, [srcq + xq]
> +psubb  m1, m2 ; - prev
> +pslldq m3, m1, 1
> +psubb  m3, m1
> +movu[dstq+xq], m3
> +pshufb m1, m2, m0
> +addxq, mmsize
> +cmpxd, widthd
> +jl .loop
> +
> +addsrcq, strideq
> +adddstq, widthq
> +sub heightq, 1
> +jg .nextrow
> +REP_RET
> +%endif

There's no need to restrict this to x86-64 only.

The register specification is wrong and will fail on Windows (and 32-bit).

Using a constant 15 for pshufb will be be wrong for the first byte of
every row except for the first with non-mod16 widths.

Try something like this:

INIT_XMM avx
cglobal sub_left_predict, 5,6,5, dst, src, stride, width, height, x
movsxdifnidn widthq, widthd ; Change width from int to ptrdiff_t
to get rid of this
mova m1, [pb_80] ; prev
adddstq, widthq
addsrcq, widthq
lea  xd, [widthq-1]
neg  widthq
and  xd, 15
pinsrb   m4, m1, xd, 15
mov  xq, widthq

.loop:
movu m0, [srcq+widthq]
palignr  m2, m0, m1, 15
movu m1, [srcq+widthq+16]
palignr  m3, m1, m0, 15
psubbm2, m0, m2
psubbm3, m1, m3
movu [dstq+widthq], m2
movu [dstq+widthq+16], m3
add  widthq, 2*16
jl .loop

addsrcq, strideq
subdstq, xq
test xd, 16
jz .mod32
mova m1, m0
.mod32:
pshufb   m1, m4
mov  widthq, xq
dec heightd
jg .loop
RET
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-12 Thread Martin Vignali
>
> this changes the output:
> make -j12 && ./ffmpeg -i ~/videos/matrixbench_mpeg2.mpg -an -vcodec
> utvideo -t 1 -pix_fmt yuv420p -pred left  -t 1 test2.avi
>
> -rw-r- 1 michael michael 3744402 Jan 12 04:20 test2.avi
> -rw-r- 1 michael michael 3753358 Jan 12 04:19 test.avi
>
>
> Hello,

Thanks for testing.
Is it possible to have the file ?

Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-11 Thread Michael Niedermayer
On Thu, Jan 11, 2018 at 09:45:07PM +0100, Martin Vignali wrote:
> Hello,
> 
> in attach patch to add SIMD for sub_left_pred in utvideoenc
> 
> 001 : add SIMD for utvideoenc
> 002 : add checkasm for llviddspenc (diff bytes and sub_left_pred)
> 
> Encoding result :
> ./ffmpeg -i utvideo_file.avi -c:v utvideo -pred left res.avi
> 
> Without
> 
> frame= 3316 fps=194 q=-0.0 Lsize= 3613675kB time=00:02:12.64
> bitrate=223184.8kbits/s speed=7.74x
> 
> With
> frame= 3316 fps=206 q=-0.0 Lsize= 3613675kB time=00:02:12.64
> bitrate=223184.8kbits/s speed=8.24x
> 
> 
> Checkasm result :
> sub_left_predict_c: 2497.2
> sub_left_predict_ssse3: 455.7
> 
> 
> pass fate test for me (os X, x86_64)

this changes the output:
make -j12 && ./ffmpeg -i ~/videos/matrixbench_mpeg2.mpg -an -vcodec utvideo -t 
1 -pix_fmt yuv420p -pred left  -t 1 test2.avi

-rw-r- 1 michael michael 3744402 Jan 12 04:20 test2.avi
-rw-r- 1 michael michael 3753358 Jan 12 04:19 test.avi


[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel