Re: [FFmpeg-devel] [PATCH v4 2/2][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c

2024-05-21 Thread Ronald S. Bultje
Hi,

On Sun, May 19, 2024 at 8:55 PM Stone Chen  wrote:

> Adds checkasm for DMVR SAD AVX2 implementation.
>
> Benchmarks ( AMD 7940HS )
> vvc_sad_8x8_c: 70.0
> vvc_sad_8x8_avx2: 10.0
> vvc_sad_16x16_c: 280.0
> vvc_sad_16x16_avx2: 20.0
> vvc_sad_32x32_c: 1020.0
> vvc_sad_32x32_avx2: 70.0
> vvc_sad_64x64_c: 3560.0
> vvc_sad_64x64_avx2: 270.0
> vvc_sad_128x128_c: 13760.0
> vvc_sad_128x128_avx2: 1070.0
> ---
>  tests/checkasm/vvc_mc.c | 38 ++
>  1 file changed, 38 insertions(+)
>

It appears Remi's performance concerns have been addressed separately, so
this patch is good to go.

Ronald
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH v4 2/2][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c

2024-05-21 Thread Martin Storsjö

On Tue, 21 May 2024, Rémi Denis-Courmont wrote:




Le 21 mai 2024 09:37:18 GMT+03:00, "Martin Storsjö"  a écrit :

On Tue, 21 May 2024, Rémi Denis-Courmont wrote:


Hi,

VVC benchmarks have increased checksam runtime by at least an order of 
magnitude. It's become so prohibitively slow that I could not even get 
to the end.


This is not an acceptable situation and impedes non-VVC assembler work


I don't quite understand; whenever benchmarking anything in checkasm, I 
would always run e.g. "checkasm --test=ac3dsp 
--bench=ac3_sum_square_bufferfly_float", limiting the total running of 
tests to a specific module, and only benchmarking a subset of the run 
functions. (The --bench parameter specifies a prefix; only functions 
matching that prefix gets benchmarked.)


Sure that's how you do it when you're working on a specific new 
optimisation. Now we're trying to compare 128-bit and 256-bit vectors 
for *all* existing functions to see which ones need to be reworked.


That used to work (in 30 minutes on K230, 5 minutes on Zen 2, IIRC). Now 
it's effectively broken and that's not acceptable'


Ah, I see. Ok, that's a reasonable thing to do I guess.

(It's of course possible to speed it up further by only testing specific 
--test=foo cases where you know you have riscv assembly worth 
benchmarking, but if it was doable in a tolerable amount of time before, 
that shouldn't be needed.)


Without limiting the scope with a --test parameter, checkasm 
benchmarking has always been prohibitively slow for me - so I don't 
think there's anything new here?


As said, it seems to be literally an order of magnitude slower than 
before if not worse.


That said I'm not familiar with the VVC tests in checkasm, perhaps they 
benchmark things excessively. But I don't see how that would impede 
work on other DSP functions in any way?


James also complained about the same thing before I.


Indeed, the tests in vvc_alf group seem to do excessive benchmarking 
(benchmarking every width/height combination between 4 and 128, in 
increments of 4). I sent a patch to cut this down to a reasonable amount.


Overall, I would expect the vvc checkasm tests to take a notable amount of 
time. Dav1d's checkasm takes twice as long to run as ffmpeg's, and it's 
probably a reasonable to assume that vvc is roughly of the same level of 
complexity as av1, so it's probably expected that ffmpeg's checkasm 
runtime at least doubles, once all vvc routines are integrated in 
checkasm.


But the tests in vvc_alf indeed had an entirely unreasonable amount of 
benchmarking hooked up, and that should indeed be fixed, e.g. with the 
patch I just sent.


// Martin

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH v4 2/2][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c

2024-05-21 Thread Rémi Denis-Courmont


Le 21 mai 2024 09:37:18 GMT+03:00, "Martin Storsjö"  a écrit :
>On Tue, 21 May 2024, Rémi Denis-Courmont wrote:
>
>> Hi,
>> 
>> Le 20 mai 2024 03:42:03 GMT+03:00, Stone Chen  a 
>> écrit :
>>> Adds checkasm for DMVR SAD AVX2 implementation.
>>> 
>>> Benchmarks ( AMD 7940HS )
>>> vvc_sad_8x8_c: 70.0
>>> vvc_sad_8x8_avx2: 10.0
>>> vvc_sad_16x16_c: 280.0
>>> vvc_sad_16x16_avx2: 20.0
>>> vvc_sad_32x32_c: 1020.0
>>> vvc_sad_32x32_avx2: 70.0
>>> vvc_sad_64x64_c: 3560.0
>>> vvc_sad_64x64_avx2: 270.0
>>> vvc_sad_128x128_c: 13760.0
>>> vvc_sad_128x128_avx2: 1070.0
>>> ---
>>> tests/checkasm/vvc_mc.c | 38 ++
>>> 1 file changed, 38 insertions(+)
>> 
>> VVC benchmarks have increased checksam runtime by at least an order of 
>> magnitude. It's become so prohibitively slow that I could not even get to 
>> the end.
>> 
>> This is not an acceptable situation and impedes non-VVC assembler work
>
>I don't quite understand; whenever benchmarking anything in checkasm, I would 
>always run e.g. "checkasm --test=ac3dsp 
>--bench=ac3_sum_square_bufferfly_float", limiting the total running of tests 
>to a specific module, and only benchmarking a subset of the run functions. 
>(The --bench parameter specifies a prefix; only functions matching that prefix 
>gets benchmarked.)

Sure that's how you do it when you're working on a specific new optimisation. 
Now we're trying to compare 128-bit and 256-bit vectors for *all* existing 
functions to see which ones need to be reworked.

That used to work (in 30 minutes on K230, 5 minutes on Zen 2, IIRC). Now it's 
effectively broken and that's not acceptable'

>
>Without limiting the scope with a --test parameter, checkasm benchmarking has 
>always been prohibitively slow for me - so I don't think there's anything new 
>here?

As said, it seems to be literally an order of magnitude slower than before if 
not worse.

>That said I'm not familiar with the VVC tests in checkasm, perhaps they 
>benchmark things excessively. But I don't see how that would impede work on 
>other DSP functions in any way?

James also complained about the same thing before I.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH v4 2/2][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c

2024-05-20 Thread Martin Storsjö

On Tue, 21 May 2024, Rémi Denis-Courmont wrote:


Hi,

Le 20 mai 2024 03:42:03 GMT+03:00, Stone Chen  a 
écrit :

Adds checkasm for DMVR SAD AVX2 implementation.

Benchmarks ( AMD 7940HS )
vvc_sad_8x8_c: 70.0
vvc_sad_8x8_avx2: 10.0
vvc_sad_16x16_c: 280.0
vvc_sad_16x16_avx2: 20.0
vvc_sad_32x32_c: 1020.0
vvc_sad_32x32_avx2: 70.0
vvc_sad_64x64_c: 3560.0
vvc_sad_64x64_avx2: 270.0
vvc_sad_128x128_c: 13760.0
vvc_sad_128x128_avx2: 1070.0
---
tests/checkasm/vvc_mc.c | 38 ++
1 file changed, 38 insertions(+)


VVC benchmarks have increased checksam runtime by at least an order of 
magnitude. It's become so prohibitively slow that I could not even get 
to the end.


This is not an acceptable situation and impedes non-VVC assembler work


I don't quite understand; whenever benchmarking anything in checkasm, I 
would always run e.g. "checkasm --test=ac3dsp 
--bench=ac3_sum_square_bufferfly_float", limiting the total running of 
tests to a specific module, and only benchmarking a subset of the run 
functions. (The --bench parameter specifies a prefix; only functions 
matching that prefix gets benchmarked.)


Without limiting the scope with a --test parameter, checkasm benchmarking 
has always been prohibitively slow for me - so I don't think there's 
anything new here? If you were lucky enough to be able to do a full run of 
checkasm with benchmarks of all functions before, that sounds like an 
exception to me, not a reason to limit adding new tests?


That said I'm not familiar with the VVC tests in checkasm, perhaps they 
benchmark things excessively. But I don't see how that would impede work 
on other DSP functions in any way?


// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH v4 2/2][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c

2024-05-20 Thread Rémi Denis-Courmont
Hi,

Le 20 mai 2024 03:42:03 GMT+03:00, Stone Chen  a 
écrit :
>Adds checkasm for DMVR SAD AVX2 implementation.
>
>Benchmarks ( AMD 7940HS )
>vvc_sad_8x8_c: 70.0
>vvc_sad_8x8_avx2: 10.0
>vvc_sad_16x16_c: 280.0
>vvc_sad_16x16_avx2: 20.0
>vvc_sad_32x32_c: 1020.0
>vvc_sad_32x32_avx2: 70.0
>vvc_sad_64x64_c: 3560.0
>vvc_sad_64x64_avx2: 270.0
>vvc_sad_128x128_c: 13760.0
>vvc_sad_128x128_avx2: 1070.0
>---
> tests/checkasm/vvc_mc.c | 38 ++
> 1 file changed, 38 insertions(+)

VVC benchmarks have increased checksam runtime by at least an order of 
magnitude. It's become so prohibitively slow that I could not even get to the 
end.

This is not an acceptable situation and impedes non-VVC assembler work 

Please fix this before you add any new VVC tests. In the mean time:

-1 / Nack all VVC checksam from my behalf.


>diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
>index 97f57cb401..e251400bfc 100644
>--- a/tests/checkasm/vvc_mc.c
>+++ b/tests/checkasm/vvc_mc.c
>@@ -322,8 +322,46 @@ static void check_avg(void)
> report("avg");
> }
> 
>+static void check_vvc_sad(void)
>+{
>+const int bit_depth = 10;
>+VVCDSPContext c;
>+LOCAL_ALIGNED_32(uint16_t, src0, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
>+LOCAL_ALIGNED_32(uint16_t, src1, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
>+declare_func(int, const int16_t *src0, const int16_t *src1, int dx, int 
>dy, int block_w, int block_h);
>+
>+ff_vvc_dsp_init(&c, bit_depth);
>+memset(src0, 0, MAX_CTU_SIZE * MAX_CTU_SIZE * 4);
>+memset(src1, 0, MAX_CTU_SIZE * MAX_CTU_SIZE * 4);
>+
>+randomize_pixels(src0, src1, MAX_CTU_SIZE * MAX_CTU_SIZE * 2);
>+ for (int h = 8; h <= MAX_CTU_SIZE; h *= 2) {
>+for (int w = 8; w <= MAX_CTU_SIZE; w *= 2) {
>+for(int offy = 0; offy <= 4; offy++) {
>+for(int offx = 0; offx <= 4; offx++) {
>+if(check_func(c.inter.sad, "vvc_sad_%dx%d", w, h)) {
>+int result0;
>+int result1;
>+
>+result0 =  call_ref(src0 + PIXEL_STRIDE * 2 + 2, src1 
>+ PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
>+result1 =  call_new(src0 + PIXEL_STRIDE * 2 + 2, src1 
>+ PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
>+
>+if (result1 != result0)
>+fail();
>+if(w == h && offx == 0 && offy == 0)
>+bench_new(src0 + PIXEL_STRIDE * 2 + 2, src1 + 
>PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
>+}
>+}
>+}
>+}
>+ }
>+
>+report("check_vvc_sad");
>+}
>+
> void checkasm_check_vvc_mc(void)
> {
>+check_vvc_sad();
> check_put_vvc_luma();
> check_put_vvc_luma_uni();
> check_put_vvc_chroma();
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH v4 2/2][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c

2024-05-19 Thread Stone Chen
Adds checkasm for DMVR SAD AVX2 implementation.

Benchmarks ( AMD 7940HS )
vvc_sad_8x8_c: 70.0
vvc_sad_8x8_avx2: 10.0
vvc_sad_16x16_c: 280.0
vvc_sad_16x16_avx2: 20.0
vvc_sad_32x32_c: 1020.0
vvc_sad_32x32_avx2: 70.0
vvc_sad_64x64_c: 3560.0
vvc_sad_64x64_avx2: 270.0
vvc_sad_128x128_c: 13760.0
vvc_sad_128x128_avx2: 1070.0
---
 tests/checkasm/vvc_mc.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index 97f57cb401..e251400bfc 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -322,8 +322,46 @@ static void check_avg(void)
 report("avg");
 }
 
+static void check_vvc_sad(void)
+{
+const int bit_depth = 10;
+VVCDSPContext c;
+LOCAL_ALIGNED_32(uint16_t, src0, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
+LOCAL_ALIGNED_32(uint16_t, src1, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
+declare_func(int, const int16_t *src0, const int16_t *src1, int dx, int 
dy, int block_w, int block_h);
+
+ff_vvc_dsp_init(&c, bit_depth);
+memset(src0, 0, MAX_CTU_SIZE * MAX_CTU_SIZE * 4);
+memset(src1, 0, MAX_CTU_SIZE * MAX_CTU_SIZE * 4);
+
+randomize_pixels(src0, src1, MAX_CTU_SIZE * MAX_CTU_SIZE * 2);
+ for (int h = 8; h <= MAX_CTU_SIZE; h *= 2) {
+for (int w = 8; w <= MAX_CTU_SIZE; w *= 2) {
+for(int offy = 0; offy <= 4; offy++) {
+for(int offx = 0; offx <= 4; offx++) {
+if(check_func(c.inter.sad, "vvc_sad_%dx%d", w, h)) {
+int result0;
+int result1;
+
+result0 =  call_ref(src0 + PIXEL_STRIDE * 2 + 2, src1 
+ PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
+result1 =  call_new(src0 + PIXEL_STRIDE * 2 + 2, src1 
+ PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
+
+if (result1 != result0)
+fail();
+if(w == h && offx == 0 && offy == 0)
+bench_new(src0 + PIXEL_STRIDE * 2 + 2, src1 + 
PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
+}
+}
+}
+}
+ }
+
+report("check_vvc_sad");
+}
+
 void checkasm_check_vvc_mc(void)
 {
+check_vvc_sad();
 check_put_vvc_luma();
 check_put_vvc_luma_uni();
 check_put_vvc_chroma();
-- 
2.45.0

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH v4 2/2][GSoC 2024] tests/checkasm: Add check_vvc_sad to vvc_mc.c

2024-05-19 Thread Stone Chen
Adds checkasm for DMVR SAD AVX2 implementation.

Benchmarks ( AMD 7940HS )
vvc_sad_8x8_c: 70.0
vvc_sad_8x8_avx2: 10.0
vvc_sad_16x16_c: 280.0
vvc_sad_16x16_avx2: 20.0
vvc_sad_32x32_c: 1020.0
vvc_sad_32x32_avx2: 70.0
vvc_sad_64x64_c: 3560.0
vvc_sad_64x64_avx2: 270.0
vvc_sad_128x128_c: 13760.0
vvc_sad_128x128_avx2: 1070.0
---
 tests/checkasm/vvc_mc.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index 97f57cb401..e251400bfc 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -322,8 +322,46 @@ static void check_avg(void)
 report("avg");
 }
 
+static void check_vvc_sad(void)
+{
+const int bit_depth = 10;
+VVCDSPContext c;
+LOCAL_ALIGNED_32(uint16_t, src0, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
+LOCAL_ALIGNED_32(uint16_t, src1, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
+declare_func(int, const int16_t *src0, const int16_t *src1, int dx, int 
dy, int block_w, int block_h);
+
+ff_vvc_dsp_init(&c, bit_depth);
+memset(src0, 0, MAX_CTU_SIZE * MAX_CTU_SIZE * 4);
+memset(src1, 0, MAX_CTU_SIZE * MAX_CTU_SIZE * 4);
+
+randomize_pixels(src0, src1, MAX_CTU_SIZE * MAX_CTU_SIZE * 2);
+ for (int h = 8; h <= MAX_CTU_SIZE; h *= 2) {
+for (int w = 8; w <= MAX_CTU_SIZE; w *= 2) {
+for(int offy = 0; offy <= 4; offy++) {
+for(int offx = 0; offx <= 4; offx++) {
+if(check_func(c.inter.sad, "vvc_sad_%dx%d", w, h)) {
+int result0;
+int result1;
+
+result0 =  call_ref(src0 + PIXEL_STRIDE * 2 + 2, src1 
+ PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
+result1 =  call_new(src0 + PIXEL_STRIDE * 2 + 2, src1 
+ PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
+
+if (result1 != result0)
+fail();
+if(w == h && offx == 0 && offy == 0)
+bench_new(src0 + PIXEL_STRIDE * 2 + 2, src1 + 
PIXEL_STRIDE * 2 + 2, offx, offy, w, h);
+}
+}
+}
+}
+ }
+
+report("check_vvc_sad");
+}
+
 void checkasm_check_vvc_mc(void)
 {
+check_vvc_sad();
 check_put_vvc_luma();
 check_put_vvc_luma_uni();
 check_put_vvc_chroma();
-- 
2.45.0

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".