Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-27 Thread flow gg
] fcmul_add_c: 4.2 fcmul_add_rvv_f32: 4.2 - af_afir.fcmul_add [OK] fcmul_add_c: 4.5 fcmul_add_rvv_f32: 4.2 - af_afir.fcmul_add [OK] fcmul_add_c: 4.7 fcmul_add_rvv_f32: 3.5 Rémi Denis-Courmont wrote on Thu, 28 Sep 2023 at 00:41: > On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote: > >

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread flow gg
wrote on Wed, 27 Sep 2023 at 02:44: > On Tuesday, 26 September 2023, 21:40:12 EEST, Paul B Mahol wrote: > > On Tue, Sep 26, 2023 at 8:35 PM Rémi Denis-Courmont > wrote: > > > On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote: > > > > benchmark: > >

[FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread flow gg
benchmark: fcmul_add_c: 19.7 fcmul_add_rvv_f32: 6.7 From 6bef2523728a472bb803ce085a1aafdfd624e212 Mon Sep 17 00:00:00 2001 From: h Date: Tue, 26 Sep 2023 15:03:12 +0800 Subject: [PATCH] af_afir: RISC-V V fcmul_add fcmul_add_c: 19.7 fcmul_add_rvv_f32: 6.7 --- libavfilter/af_afirdsp.h |

Re: [FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

2024-02-08 Thread flow gg
signal 7: Bus error) Because it can only load with e8, it seems there's no way to use larger group multipliers. Rémi Denis-Courmont wrote on Fri, 9 Feb 2024 at 03:41: > On Wednesday, 7 February 2024, 2:01:23 EET, flow gg wrote: > > I think in most cases it is like this, but spe

Re: [FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

2024-02-09 Thread flow gg
The issue here is that any load wider than e8 will fail the test (Bus error), so it cannot use vlse64 or similar methods... Rémi Denis-Courmont wrote on Fri, 9 Feb 2024 at 18:32: > > > On 9 February 2024 00:39:38 GMT+02:00, flow gg > wrote: > >From my understanding, to use larger grou

Re: [FFmpeg-devel] [PATCH 2/4] lavc/rv34dsp: R-V V rv34_inv_transform_dc

2024-02-09 Thread flow gg
Okay, I have updated them in the reply. Rémi Denis-Courmont wrote on Sat, 10 Feb 2024 at 05:14: > On Wednesday, 7 February 2024, 2:12:22 EET, flow gg wrote: > > My carelessness.. fixed it in the reply. > > I know I said to avoid scalar multiplications, but this may be taking it a

Re: [FFmpeg-devel] [PATCH 2/2] lavc/blockdsp: R-V V clear_blocks

2024-02-12 Thread flow gg
OK, updated it in the reply. Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:49: > On Friday, 2 February 2024, 3:14:39 EET, flow gg wrote: > > Ok, updated it in the reply > > Sorry I meant directive, not macro. .rept is just fine here. > > -- > レミ・デニ-クールモン

Re: [FFmpeg-devel] [PATCH 4/4] lavc/rv34dsp: R-V V rv34_idct_dc_add

2024-02-12 Thread flow gg
I tested this in '[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans'. The logic here is the same: using vext reduces the number of vset instructions, making it a bit faster. Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:46: > On Wednesday, 31 January 2024, 19:58:55 EET, flow gg wrote: > > Fixed the r

Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V vp8_idct_dc_add

2024-02-12 Thread flow gg
xxx_idct_dc_add is quite similar: vext reduces the number of vset instructions, so it is a bit faster than using vwadd. This was tested in '[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans'. Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:53: > Hi, > > I think you can use vwadd here? > > -- > Rémi Denis-Courmont >
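
For readers following the vext versus vwadd question, here is a minimal scalar sketch in C of what an idct_dc_add-style routine has to do: widen each 8-bit pixel, add a signed DC value, and saturate back to 8 bits. The names `clip_u8` and `dc_add_sketch` are hypothetical, and the DC value is passed in directly rather than derived from a coefficient block, so this is an illustration of the arithmetic only, not the FFmpeg code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add dc to every pixel of a w x h block, saturating to 8 bits.
 * Hypothetical helper: dc is assumed to be already computed. */
static void dc_add_sketch(uint8_t *dst, ptrdiff_t stride, int dc, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* dst[x] is promoted to int here: the scalar analogue of
             * zero-extending the pixels before a signed add. */
            dst[x] = clip_u8(dst[x] + dc);
        }
        dst += stride;
    }
}
```

In vector code the widening step is where the two approaches differ: either the pixels are zero-extended first, or a widening add does both steps at once. Which is faster is a measurement question, as the thread notes.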

Re: [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add rv34_inv_transform_dc test

2024-02-13 Thread flow gg
It was due to the testing, not MMX. Fixed it in this reply. flow gg wrote on Tue, 13 Feb 2024 at 10:37: > I sent "[FFmpeg-devel] [PATCH] x86: Remove MMX assembly > rv34_inv_transform_dc in rv34dsp" > > Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:37: > >> Le perjantaina 2. helmi

Re: [FFmpeg-devel] [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp

2024-02-13 Thread flow gg
I made a mistake. It can be fixed your way. Please ignore this reply. flow gg wrote on Tue, 13 Feb 2024 at 17:47: > Thank you for your guidance. Do you mean that the test should be modified > like this? > > - declare_func(void, uint8_t *dst, ptrdiff_t stride, int dc); > + declare_func_emms(

Re: [FFmpeg-devel] [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp

2024-02-13 Thread flow gg
Thank you for your guidance. Do you mean that the test should be modified like this? - declare_func(void, uint8_t *dst, ptrdiff_t stride, int dc); + declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *, ptrdiff_t, int); I tried to do it this way, but the test still failed. Not sure why ...

Re: [FFmpeg-devel] [PATCH 2/4] lavc/rv34dsp: R-V V rv34_inv_transform_dc

2024-02-10 Thread flow gg
Happy new year ~ Yes, I've tried reordering. Rémi Denis-Courmont wrote on Sat, 10 Feb 2024 at 17:18: > Happy new year, > > The gains are -unsurprisingly- modest here. Did you try to reorder > instructions to improve scheduling? > > -- > Rémi Denis-Courmont > http://www.remlab.net/ > > > >

[FFmpeg-devel] [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp

2024-02-12 Thread flow gg
checkasm in [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add rv34_inv_transform_dc test From 1aa51d60def8d4313c1b11a50528662ec832530e Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 13 Feb 2024 08:41:20 +0800 Subject: [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp This asm

Re: [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add rv34_inv_transform_dc test

2024-02-12 Thread flow gg
I sent "[FFmpeg-devel] [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp" Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:37: > On Friday, 2 February 2024, 2:47:16 EET, flow gg wrote: > > It seems to be caused by movd m0, r1d in libavcodec/x86/rv34dsp.asm

Re: [FFmpeg-devel] [PATCH 2/3] lavc/vp8dsp: R-V V vp8_idct_dc_add4y

2024-02-12 Thread flow gg
Okay, updated it in the reply. Rémi Denis-Courmont wrote on Tue, 13 Feb 2024 at 03:54: > Hi, > > To avoid repeating the code, you can either use .rept or .irp. You can > even > use assembler conditionals to elide the redundant code on the last > iteration. > > -- > レミ・デニ-クールモン > http://www.remlab.net/ >

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_vp8_pixels

2024-02-21 Thread flow gg
llo, > > On Monday, 19 February 2024, 13:13:43 EET, flow gg wrote: > > The reason for using m1+le8 instead of stride load + larger group > > multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: > R-V > > V pix_abs." > > > >

Re: [FFmpeg-devel] [PATCH 5/7] lavc/me_cmp: R-V V vsse vsad

2024-02-21 Thread flow gg
=917745 c=3865 Rémi Denis-Courmont wrote on Thu, 22 Feb 2024 at 02:07: > On Tuesday, 6 February 2024, 17:56:32 EET, flow gg wrote: > > > > Did you try to compute integral absolute values with the ad-hoc (floating > point) instruction instead of vneg/vmax? It should work since the sign

[FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_vp8_pixels

2024-02-19 Thread flow gg
The reason for using m1+le8 instead of stride load + larger group multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs." In the test, there is #define src (buf + 2 * SRC_BUF_STRIDE + 2 + 1) Therefore, not using e8 results in: (fatal signal 7: Bus error). From
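
The alignment argument can be checked with a small C sketch. It assumes a hypothetical 8-byte-aligned buffer and a made-up SRC_BUF_STRIDE of 32 (the real value used by the checkasm test is not shown in this message); `misalignment` and `test_src` are illustrative names only. Whether a misaligned wide load actually traps depends on the hardware and execution environment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SRC_BUF_STRIDE 32  /* assumed value, for illustration only */

/* Returns how far p is from the previous multiple of elem_size bytes. */
static size_t misalignment(const void *p, size_t elem_size)
{
    assert(elem_size != 0);
    return (size_t)((uintptr_t)p % elem_size);
}

/* Hypothetical stand-in for the test buffer, aligned to 8 bytes. */
static alignas(8) uint8_t buf[8 * SRC_BUF_STRIDE];

/* Same shape as the test's src macro: the "+ 2 + 1" makes the offset odd. */
static const uint8_t *test_src(void)
{
    return buf + 2 * SRC_BUF_STRIDE + 2 + 1;
}
```

With an even stride the offset is always odd, so the pointer is aligned for 1-byte elements only; any e16, e32 or e64 access starting there is misaligned.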

[FFmpeg-devel] [PATCH 6/7] lavc/me_cmp: R-V V vsse vsad intra

2024-02-06 Thread flow gg
From b4abb039f8f769104a29819a1d709f5a00bf84d5 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 6 Feb 2024 23:28:08 +0800 Subject: [PATCH 6/7] lavc/me_cmp: R-V V vsse vsad intra C908: vsad_4_c: 681.0 vsad_4_rvv_i32: 182.2 vsad_5_c: 278.0 vsad_5_rvv_i32: 145.2 vsse_4_c: 595.0 vsse_4_rvv_i32:

[FFmpeg-devel] [PATCH 7/7] lavc/me_cmp: R-V V nsse

2024-02-06 Thread flow gg
From 31635394e89318c554a9653bd22791336309951e Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 6 Feb 2024 22:51:47 +0800 Subject: [PATCH 7/7] lavc/me_cmp: R-V V nsse C908: nsse_0_c: 1990.0 nsse_0_rvv_i32: 572.0 nsse_1_c: 910.0 nsse_1_rvv_i32: 456.0 --- libavcodec/riscv/me_cmp_init.c | 30

[FFmpeg-devel] [PATCH 5/7] lavc/me_cmp: R-V V vsse vsad

2024-02-06 Thread flow gg
From 67f2a662be1533e52a28971152bff670f78544fd Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 6 Feb 2024 23:18:51 +0800 Subject: [PATCH 5/7] lavc/me_cmp: R-V V vsse vsad C908: vsad_0_c: 936.0 vsad_0_rvv_i32: 236.2 vsad_1_c: 424.0 vsad_1_rvv_i32: 190.2 vsse_0_c: 877.0 vsse_0_rvv_i32: 204.2

[FFmpeg-devel] [PATCH 4/7] lavc/me_cmp: R-V V sse

2024-02-06 Thread flow gg
From 7d153e6b166d53c94db57be4f024986d38290042 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 6 Feb 2024 21:55:07 +0800 Subject: [PATCH 4/7] lavc/me_cmp: R-V V sse C908: sse_0_c: 614.7 sse_0_rvv_i32: 138.2 sse_1_c: 302.7 sse_1_rvv_i32: 107.2 sse_2_c: 175.7 sse_2_rvv_i32: 104.2 ---

[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

2024-02-06 Thread flow gg
From d4d6b3ea040f3f7997463b4452813bc75d1c9f9d Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Sat, 3 Feb 2024 10:58:13 +0800 Subject: [PATCH 1/7] lavc/me_cmp: R-V V pix_abs C908: pix_abs_0_0_c: 534.0 pix_abs_0_0_rvv_i32: 136.2 pix_abs_1_0_c: 287.7 pix_abs_1_0_rvv_i32: 125.2 sad_0_c: 534.0

[FFmpeg-devel] [PATCH 2/7] lavc/me_cmp: R-V V pix_abs_x2

2024-02-06 Thread flow gg
From ea0cf15e43c9a3e1b56c1a43d50f0701d42c7e9f Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 6 Feb 2024 21:41:35 +0800 Subject: [PATCH 2/7] lavc/me_cmp: R-V V pix_abs_x2 C908: pix_abs_0_1_c: 767.0 pix_abs_0_1_rvv_i32: 196.2 pix_abs_1_1_c: 388.0 pix_abs_1_1_rvv_i32: 185.2 ---

[FFmpeg-devel] [PATCH 3/7] lavc/me_cmp: R-V V pix_abs_y2

2024-02-06 Thread flow gg
From 01cdfde56c4a88022f0ed8c12a2442e6bebb6a60 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 6 Feb 2024 21:46:07 +0800 Subject: [PATCH 3/7] lavc/me_cmp: R-V V pix_abs_y2 C908: pix_abs_0_2_c: 904.0 pix_abs_0_2_rvv_i32: 172.2 pix_abs_1_2_c: 460.0 pix_abs_1_2_rvv_i32: 168.2 ---

Re: [FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

2024-02-06 Thread flow gg
I think in most cases it is like this, but specifically for this function, using Reduction only once would be slower. The currently submitted version roughly takes: pix_abs_0_0_rvv_i32: 136.2 The version that uses Reduction only once takes: pix_abs_0_0_rvv_i32: 169.2 Here is the implementation

Re: [FFmpeg-devel] [PATCH 2/4] lavc/rv34dsp: R-V V rv34_inv_transform_dc

2024-02-06 Thread flow gg
My carelessness.. fixed it in the reply. Rémi Denis-Courmont wrote on Wed, 7 Feb 2024 at 01:26: > Hi, > > I'm not sure why you're mixing element sizes this way, but the code should > not > even compile due to mismatched extensions. > > -- > Rémi Denis-Courmont > http://www.remlab.net/ > > > >

Re: [FFmpeg-devel] Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym

2024-02-18 Thread flow gg
ping flow gg wrote on Tue, 30 Jan 2024 at 00:22: > > I expect that it would be faster to make one large load, and then 4 small > > stores, but that might work only for exactly 128-bit vectors? > > This seems to require vle128, so I didn't modify it. > > > That's not needed. Y

[FFmpeg-devel] [PATCH 4/6] lavc/takdsp: R-V V decorrelate_ls

2023-12-18 Thread flow gg
C908: decorrelate_ls_c: 69.7 decorrelate_ls_rvv_i32: 27.2 From 03fad46e6db1846596c31918fc4e34b58246efc4 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 18 Dec 2023 22:49:21 +0800 Subject: [PATCH 4/6] lavc/takdsp: R-V V decorrelate_ls C908: decorrelate_ls_c: 69.7 decorrelate_ls_rvv_i32: 27.2

[FFmpeg-devel] [PATCH 3/6] checkasm/takdsp: add decorrelate_sm test

2023-12-18 Thread flow gg
From 9e09f52403058e1bc87653bfd9980c7d5a6ce33c Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 18 Dec 2023 22:48:09 +0800 Subject: [PATCH 3/6] checkasm/takdsp: add decorrelate_sm test --- tests/checkasm/takdsp.c | 29 + 1 file changed, 29 insertions(+) diff

[FFmpeg-devel] [PATCH 5/6] lavc/takdsp: R-V V decorrelate_sr

2023-12-18 Thread flow gg
C908: decorrelate_sr_c: 95.5 decorrelate_sr_rvv_i32: 28.2 From fa1a84337a7cd2a62c26a9d5f8d707a97e917f77 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 18 Dec 2023 22:52:20 +0800 Subject: [PATCH 5/6] lavc/takdsp: R-V V decorrelate_sr C908: decorrelate_sr_c: 95.5 decorrelate_sr_rvv_i32: 28.2

Re: [FFmpeg-devel] [PATCH 4/6] lavc/takdsp: R-V V decorrelate_ls

2023-12-18 Thread flow gg
A 'shnadd' should be moved to the front, updated in this reply. flow gg wrote on Mon, 18 Dec 2023 at 23:15: > C908: > decorrelate_ls_c: 69.7 > decorrelate_ls_rvv_i32: 27.2 > From fdee02eae64ced9a65781fbbeef32c6b8ee2fdce Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 18 Dec 2023 22:49:21 +

[FFmpeg-devel] [PATCH 2/6] checkasm/takdsp: add decorrelate_sr test

2023-12-18 Thread flow gg
From 9254ae1f72498568857357059eb514e8cb90b5f1 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 18 Dec 2023 22:47:29 +0800 Subject: [PATCH 2/6] checkasm/takdsp: add decorrelate_sr test --- tests/checkasm/takdsp.c | 27 +++ 1 file changed, 27 insertions(+) diff --git

[FFmpeg-devel] [PATCH 1/6] checkasm/takdsp: add decorrelate_ls test

2023-12-18 Thread flow gg
From 960f70964521e1dc94647d70e2631351c0bb51bb Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 18 Dec 2023 22:39:13 +0800 Subject: [PATCH 1/6] checkasm/takdsp: add decorrelate_ls test --- tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 ++ tests/checkasm/checkasm.h | 1 +

[FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

2023-12-18 Thread flow gg
C908: decorrelate_sm_c: 130.0 decorrelate_sm_rvv_i32: 43.7 From 3dc613feaa6c38a7df47a3fc385e2140716e0ae2 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 18 Dec 2023 22:53:39 +0800 Subject: [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm C908: decorrelate_sm_c: 130.0 decorrelate_sm_rvv_i32:

[FFmpeg-devel] [PATCH] libavfilter/af_afir: R-V V dcmul_add

2023-12-18 Thread flow gg
c908: dcmul_add_c: 88.0 dcmul_add_rvv_f64: 46.2 Did not use vlseg2e64, because it is much slower than vlse64 Did not use vsseg2e64, because it is slightly slower than vsse64 From 80b6694bc29ed1c37852dc079a6d91a24dd6f18e Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 19 Dec 2023 09:11:28
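
As background for the vlse64 versus vlseg2e64 remark, the sketch below shows in plain C the access pattern a strided load implements for interleaved complex data: real parts sit at even indices and imaginary parts at odd indices, so each component is read with a stride of two doubles (16 bytes). The names `strided_get` and `cmul_add_sketch` are hypothetical, and this is a simplified complex multiply-accumulate, not a copy of FFmpeg's dcmul_add.

```c
#include <assert.h>
#include <stddef.h>

/* Strided view: element i of a component sits 2 doubles (16 bytes) after
 * element i - 1, which is the pattern a stride-16 vlse64 would read. */
static double strided_get(const double *base, ptrdiff_t i)
{
    return base[2 * i];
}

/* sum[i] += t[i] * c[i] for len interleaved complex numbers (re, im). */
static void cmul_add_sketch(double *sum, const double *t, const double *c,
                            ptrdiff_t len)
{
    assert(len >= 0);
    for (ptrdiff_t i = 0; i < len; i++) {
        const double tre = strided_get(t, i),     cre = strided_get(c, i);
        const double tim = strided_get(t + 1, i), cim = strided_get(c + 1, i);

        sum[2 * i]     += tre * cre - tim * cim;  /* real part */
        sum[2 * i + 1] += tre * cim + tim * cre;  /* imaginary part */
    }
}
```

A segment load would fetch both components in one instruction instead of two strided passes; the message reports that on the C908 the strided form measured faster.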

Re: [FFmpeg-devel] [PATCH 4/6] lavc/takdsp: R-V V decorrelate_ls

2023-12-18 Thread flow gg
Okay, updated in the reply. Rémi Denis-Courmont wrote on Tue, 19 Dec 2023 at 00:25: > On Monday, 18 December 2023, 17:26:58 EET, flow gg wrote: > > A 'shnadd' should be moved to the front, updated in this reply. > > Indeed, but please try to interleave scalar and vector instructi

Re: [FFmpeg-devel] Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym

2023-12-20 Thread flow gg
There are only three emails in my Sent Items, but there are six at ffmpeg-devel.. I'm not quite sure why, please ignore the three duplicates. flow gg wrote on Wed, 20 Dec 2023 at 16:41: > C908: > get_pixels_8x4_sym_c: 297.2 > get_pixels_8x4_sym_rvv_

[FFmpeg-devel] Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym

2023-12-20 Thread flow gg
C908: get_pixels_8x4_sym_c: 297.2 get_pixels_8x4_sym_rvv_i64: 52.7 From 6fe4dbe9af39af50a1bf2069e91dfa542d83fee3 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 20 Dec 2023 16:28:33 +0800 Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym C908: get_pixels_8x4_sym_c: 297.2

[FFmpeg-devel] Subject: [PATCH 2/3] checkasm/dnxhdenc: add get_pixels_8x4_sym test

2023-12-20 Thread flow gg
From 2f17a594805615a93f3f475246d61d61cc0aa43b Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 20 Dec 2023 16:21:38 +0800 Subject: [PATCH 2/3] checkasm/dnxhdenc: add get_pixels_8x4_sym test --- tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 ++ tests/checkasm/checkasm.h |

[FFmpeg-devel] Subject: [PATCH 1/3] lvac/dnxhdenc: add ff_dnxhdenc_init

2023-12-20 Thread flow gg
From 3f8adabeac408ada6048a1e2ac472534f970364e Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 20 Dec 2023 16:17:32 +0800 Subject: [PATCH 1/3] lvac/dnxhdenc: add ff_dnxhdenc_init This is for clarity and use in testing, consistent with other parts of the code. --- libavcodec/dnxhdenc.c | 6

Re: [FFmpeg-devel] Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym

2023-12-20 Thread flow gg
Because the format of [PATCH 1/3] was modified, this patch needs to be changed, and it has been modified in this reply. flow gg wrote on Wed, 20 Dec 2023 at 16:41: > C908: > get_pixels_8x4_sym_c: 297.2 > get_pixels_8x4_sym_rvv_i64: 52.7 >

Re: [FFmpeg-devel] Subject: [PATCH 1/3] lvac/dnxhdenc: add ff_dnxhdenc_init

2023-12-20 Thread flow gg
> typo in 'lavc' fixed. > Brace should be on its own line fixed > Shouldn't it actually replace the existing ff_dnxhdenc_init_x86() call in dnxhdenc.c? Sorry, I missed this part, it's fixed in this reply Anton Khirnov wrote on Wed, 20 Dec 2023 at 17:51: > Quoting flow gg (2023-12

[FFmpeg-devel] [PATCH 2/3] lavc/h264dsp: R-V V h264_add_pixels4_clear

2023-12-24 Thread flow gg
C908 h264_add_pixels4_clear_c: 96.0 h264_add_pixels4_clear_rvv_i64: 30.2 From 8b2838516915c27aa2831e797c2c41ad1d1bae1b Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 25 Dec 2023 00:06:28 +0800 Subject: [PATCH 2/3] lavc/h264dsp: R-V V h264_add_pixels4_clear C908 h264_add_pixels4_clear_c:

[FFmpeg-devel] [PATCH 1/3] checkasm/h264dsp: add h264_add_pixels_clear test

2023-12-24 Thread flow gg
From 39a9d1728cd867f5a4bfc39232167e9769247bf6 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 21 Dec 2023 20:02:11 +0800 Subject: [PATCH 1/3] checkasm/h264dsp: add h264_add_pixels_clear test --- tests/checkasm/h264dsp.c | 55 1 file changed, 55

[FFmpeg-devel] [PATCH 3/3] lavc/h264dsp: R-V V h264_add_pixels8_clear

2023-12-24 Thread flow gg
C908 h264_add_pixels8_clear_c: 262.0 h264_add_pixels8_clear_rvv_i64: 59.0 From 11218f9067566fa3ace8821b4b890457d6ea17f9 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Mon, 25 Dec 2023 00:07:09 +0800 Subject: [PATCH 3/3] lavc/h264dsp: R-V V h264_add_pixels8_clear C908 h264_add_pixels8_clear_c:

Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

2023-12-21 Thread flow gg
: > On Monday, 18 December 2023, 17:16:27 EET, flow gg wrote: > > C908: > > decorrelate_sm_c: 130.0 > > decorrelate_sm_rvv_i32: 43.7 > > + > +func ff_decorrelate_sm_rvv, zve32x > +1: > +vsetvli t0, a2, e32, m8, ta, ma > +vle32

Re: [FFmpeg-devel] [PATCH] libavfilter/af_afir: R-V V dcmul_add

2023-12-21 Thread flow gg
uta 2023, 4:53:12 EET, flow gg wrote: > > c908: > > dcmul_add_c: 88.0 > > dcmul_add_rvv_f64: 46.2 > > > > Did not use vlseg2e64, because it is much slower than vlse64 > > Did not use vsseg2e64, because it is slightly slower than vsse64 > > Is this about C

Re: [FFmpeg-devel] [PATCH 2/2] lavc/aacencdsp: R-V V abs_pow34

2023-12-09 Thread flow gg
Updated the patch to resolve conflicts, updated m4 to m8, using c908's benchmark. flow gg wrote on Wed, 29 Nov 2023 at 01:00: > c910: > abs_pow34_c: 24610.7 > abs_pow34_rvv_f32: 6177.7 > > (need use "[FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-12-09 Thread flow gg
To be clear, I mean removing libavcodec/aacenc.c:1429 FF_CODEC_ENCODE_CB(aac_encode_frame) ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-12-09 Thread flow gg
s:0kB If I remove the line 1429 with FF_CODEC_ENCODE_CB(aac_encode_frame), there is no error on k230, but I am unsure of the reason. flow gg wrote on Tue, 5 Dec 2023 at 05:46: > Because there was a conflict, the patch was updated in the reply > > flow gg wrote on Fri, 1 Dec 2023 at 04:25: >

Re: [FFmpeg-devel] [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16

2023-12-30 Thread flow gg
I misunderstood, taking "vector length" to mean the length of the vector register. I have modified it in this reply. Rémi Denis-Courmont wrote on Sat, 30 Dec 2023 at 20:15: > > > On 29 December 2023 12:57:20 GMT+01:00, flow gg > wrote: > >C908 > >ssd_int8_vs_int16_c: 207.7 >

Re: [FFmpeg-devel] [PATCH 2/3] checkasm/svqenc: add ssd_int8_vs_int16 test

2023-12-30 Thread flow gg
Thank you, I learned this and updated it in this reply. James Almer wrote on Sat, 30 Dec 2023 at 22:46: > On 12/30/2023 10:59 AM, flow gg wrote: > > Okay, it has been modified in this reply. > > > From d62f363e3aad534c7ead5f3015029b3e7cbbff46 Mon Sep 17 00:00:00 2001 > > From: suny

Re: [FFmpeg-devel] [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16

2023-12-30 Thread flow gg
flow gg wrote on Sat, 30 Dec 2023 at 22:00: > > At a quick glance, it won't work if the input length is not a multiple > of the vector length. > > Why? I tried 1024, 32*3, 32*7 and all passed the test. > > > Also do you really need to extend accumulators to 32 bits? > > It
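
The review concern in this exchange is the loop tail: a vector loop must still be correct when the input length is not a multiple of the number of elements handled per pass. A minimal C sketch of the usual strip-mining pattern follows. It assumes a hypothetical VLMAX of 32 elements and a hypothetical function name; the clamp on `vl` plays the role that vsetvli plays in the assembly. It is not the submitted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLMAX 32  /* assumed number of elements per vector pass */

/* Sum of squared differences between int8 and int16 samples, processed in
 * chunks of at most VLMAX elements, the way a vsetvli loop would. */
static int ssd_strip_mined(const int8_t *pix1, const int16_t *pix2,
                           ptrdiff_t size)
{
    int score = 0;

    assert(size >= 0);
    while (size > 0) {
        /* vsetvli-style clamp: the last pass may be shorter than VLMAX. */
        const ptrdiff_t vl = size < VLMAX ? size : VLMAX;

        for (ptrdiff_t i = 0; i < vl; i++) {
            const int d = pix1[i] - pix2[i];
            score += d * d;
        }
        pix1 += vl;
        pix2 += vl;
        size -= vl;
    }
    return score;
}
```

Test sizes such as 1024, 32*3 and 32*7 are all multiples of 32, so they would not exercise a short final pass on hardware whose per-pass element count divides them; a length such as 35 would.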

Re: [FFmpeg-devel] [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16

2023-12-30 Thread flow gg
I have modified it in this reply. Rémi Denis-Courmont wrote on Sat, 30 Dec 2023 at 20:15: > > > On 29 December 2023 12:57:20 GMT+01:00, flow gg > wrote: > >C908 > >ssd_int8_vs_int16_c: 207.7 > >ssd_int8_vs_int16_rvv_i32: 28.0 > > At a quick glance, it won't work if the inpu

Re: [FFmpeg-devel] [PATCH 2/3] checkasm/svqenc: add ssd_int8_vs_int16 test

2023-12-30 Thread flow gg
Okay, it has been modified in this reply. Martin Storsjö wrote on Fri, 29 Dec 2023 at 22:35: > On Fri, 29 Dec 2023, James Almer wrote: > > > On 12/29/2023 9:16 AM, Martin Storsjö wrote: > >> On Fri, 29 Dec 2023, flow gg wrote: > >> > >>> Tests on x86 might f

[FFmpeg-devel] [PATCH 1/3] lvac/svqenc: add ff_svq1enc_init

2023-12-29 Thread flow gg
From 55fe9e001545ed3ae1f2c64666d07aebaeb83a2a Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 29 Dec 2023 13:08:25 +0800 Subject: [PATCH 1/3] lvac/svqenc: add ff_svq1enc_init This is for clarity and use in testing, consistent with other parts of the code --- libavcodec/svq1enc.c| 18

[FFmpeg-devel] [PATCH 2/3] checkasm/svqenc: add ssd_int8_vs_int16 test

2023-12-29 Thread flow gg
Tests on x86 might fail, possibly due to a 16-bit sub overflow From 8bde7750ec7adc2437843e14d4be85fb900d1b16 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 29 Dec 2023 13:09:21 +0800 Subject: [PATCH 2/3] checkasm/svqenc: add ssd_int8_vs_int16 test --- tests/checkasm/Makefile | 1 +

[FFmpeg-devel] [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16

2023-12-29 Thread flow gg
C908 ssd_int8_vs_int16_c: 207.7 ssd_int8_vs_int16_rvv_i32: 28.0 From 0fd1b7a34ab8794868d80233c35f70c8ad42b9fa Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 29 Dec 2023 13:27:31 +0800 Subject: [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16 C908 ssd_int8_vs_int16_c: 207.7

Re: [FFmpeg-devel] [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16

2024-01-04 Thread flow gg
One vset can be removed, but vwsub should not be used in this case. I modified it in this reply. Rémi Denis-Courmont wrote on Fri, 5 Jan 2024 at 00:00: > On Saturday, 30 December 2023, 18:20:15 EET, flow gg wrote: > > I mistook it, seeing the vector length as the length of the vector

Re: [FFmpeg-devel] [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16

2024-01-06 Thread flow gg
a2, a2, t0 + vsetvli zero, t0, e8, m2, tu, ma + vle8.v v0, (a0) + vwsub.wv v16, v8, v0 Rémi Denis-Courmont wrote on Sat, 6 Jan 2024 at 23:05: > On Friday, 5 January 2024, 2:56:18 EET, flow gg wrote: > > One vset can be reduced, but vwsub should not be used in thi

Re: [FFmpeg-devel] [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16

2024-01-07 Thread flow gg
Alright, I learned a bit more, so should we not consider the internal implementation? I've added this version that removes one vset in this reply. Rémi Denis-Courmont wrote on Sun, 7 Jan 2024 at 16:03: > On Sunday, 7 January 2024, 3:33:39 EET, flow gg wrote: > > I tested it, and indeed us

Re: [FFmpeg-devel] [PATCH 3/3] lavc/h264dsp: R-V V h264_add_pixels8_clear

2024-01-11 Thread flow gg
ping flow gg wrote on Mon, 25 Dec 2023 at 12:01: > C908 > h264_add_pixels8_clear_c: 262.0 > h264_add_pixels8_clear_rvv_i64: 59.0 >

Re: [FFmpeg-devel] [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16

2024-01-16 Thread flow gg
Okay, I updated it in the reply. Rémi Denis-Courmont wrote on Wed, 17 Jan 2024 at 02:04: > +vsetvli t0, a2, e8, m2, tu, ma > +vle8.v v0, (a0) > +sub a2, a2, t0 > +vsetvli zero, t0, e16, m4, tu, ma > +vle16.v v8, (a1) > +vsetvli

[FFmpeg-devel] [PATCH 1/3] lavc/h264pred: R-V V pred16x16_vertical_8

2024-01-16 Thread flow gg
From eaac50d41b3398ef39d1026a7d84480860a1c41e Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 16 Jan 2024 23:55:33 +0800 Subject: [PATCH 1/3] lavc/h264pred: R-V V pred16x16_vertical_8 C908 pred16x16_vertical_8_c: 1.5 pred16x16_vertical_8_rvv_i32: 1.0 --- libavcodec/h264pred.c|

[FFmpeg-devel] [PATCH 2/3] lavc/h264pred: R-V V pred16x16_horizontal_8

2024-01-16 Thread flow gg
From 806f84ea5557c4652e48451decc4c679c9485472 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 16 Jan 2024 23:56:33 +0800 Subject: [PATCH 2/3] lavc/h264pred: R-V V pred16x16_horizontal_8 C908 pred16x16_horizontal_8_c: 3.0 pred16x16_horizontal_8_rvv_i32: 2.5 ---

[FFmpeg-devel] [PATCH 3/3] lavc/h264pred: R-V V pred16x16_dc_8

2024-01-16 Thread flow gg
From 8c5fdbfea42e9ad6ba6e1df5e4ea3c583d59537a Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 16 Jan 2024 23:57:53 +0800 Subject: [PATCH 3/3] lavc/h264pred: R-V V pred16x16_dc_8 C908 pred16x16_dc_8_c: 2.5 pred16x16_dc_8_rvv_i32: 1.7 --- libavcodec/riscv/h264pred_init.c | 2 ++

[FFmpeg-devel] [PATCH 2/2] lavc/aacencdsp: R-V V abs_pow34

2023-11-28 Thread flow gg
c910: abs_pow34_c: 24610.7 abs_pow34_rvv_f32: 6177.7 (need use "[FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34" first) From 86577c2d40d29422c4b769c854df99a88c7b3c77 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 28 Nov 2023 20:14:14 +0800 Subject: [PATCH 2/2]

[FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-11-28 Thread flow gg
From 85e60d75554894964825f5718d14591294ec4e88 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Tue, 28 Nov 2023 14:08:12 +0800 Subject: [PATCH 1/2] checkasm: test for abs_pow34 --- libavcodec/aacenc.c| 24 +++-- libavcodec/aacenc.h| 1 + tests/checkasm/Makefile| 1 +

Re: [FFmpeg-devel] [PATCH] checkasm: add test for dcmul_add

2023-11-26 Thread flow gg
This is a bit confusing for me.. I tried pulling the latest code, and then used `git am checkasm-test-for-dcmul_add.patch` without any patch corruption. Rémi Denis-Courmont wrote on Mon, 27 Nov 2023 at 03:36: > On Sunday, 19 November 2023, 0:28:10 EET, flow gg wrote: >

[FFmpeg-devel] [PATCH] checkasm/ac3dsp: add float_to_fixed24 test

2023-11-22 Thread flow gg
From 02dd534bd602ba3ec79e51070934949a98f780e2 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 22 Nov 2023 14:57:29 +0800 Subject: [PATCH] checkasm/ac3dsp: add float_to_fixed24 test --- tests/checkasm/Makefile | 1 + tests/checkasm/ac3dsp.c | 71 +++

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
I modified the temporary test and sent it in "[FFmpeg-devel] [PATCH] checkasm/ac3dsp: add float_to_fixed24 test". So the test time results have changed, and I updated them in the patch. c910 float_to_fixed24_c: 2207.2 float_to_fixed24_rvv_f32: 696.2 flow gg wrote on Wed, 22 Nov 2023 at 20:00

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread flow gg
> FWIW CanMV-K230 boards are on sale for under 500 RMB. I just made a payment ~ (I saw you mention in IRC that you're going to write about K230+Debian. Looking forward to it) Rémi Denis-Courmont wrote on Wed, 6 Dec 2023 at 04:11: > On Tuesday, 5 December 2023, 21:25:12 EET, flow gg

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread flow gg
Changed. Rémi Denis-Courmont wrote on Wed, 6 Dec 2023 at 04:11: > On Tuesday, 5 December 2023, 21:25:12 EET, flow gg wrote: > > > This block can be folded into the next. You don't need to check VLENB > > > > twice. > > > > Changed. > > > > > Instruction

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread flow gg
> This block can be folded into the next. You don't need to check VLENB twice. Changed. > Instruction scheduling could be better, especially on in-order CPUs. I put the vload at the front, and then proceeded with the t2 operation, but I'm not sure... > You don't need to reset the AVL here,

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-12-01 Thread flow gg
Okay, changed and attached. Rémi Denis-Courmont wrote on Sat, 2 Dec 2023 at 02:38: > On Friday, 1 December 2023, 20:35:10 EET, Rémi Denis-Courmont > wrote: > > On Friday, 24 November 2023, 0:39:39 EET, flow gg wrote: > > > Okay, changed > > > > s

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-12-01 Thread flow gg
I forgot to modify the Makefile; I've made the changes in this reply. flow gg wrote on Sat, 2 Dec 2023 at 03:50: > Okay, changed and attached > > Rémi Denis-Courmont wrote on Sat, 2 Dec 2023 at 02:38: > >> On Friday, 1 December 2023, 20:35:10 EET, Rémi Denis-Courmont >> wrote

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-23 Thread flow gg
Okay, changed. Rémi Denis-Courmont wrote on Fri, 24 Nov 2023 at 01:09: > On Thursday, 23 November 2023, 1:17:03 EET, flow gg wrote: > > Hello, I saw the new commit "avcodec/ac3dsp: make len a size_t in > > float_to_fixed24." > > > > So I removed the part #if (__ris

Re: [FFmpeg-devel] [PATCH] checkasm/ac3dsp: add float_to_fixed24 test

2023-11-23 Thread flow gg
> You should probably add the test case to tests/fate/checkasm.mak > This one is not necessary. You can reuse dst or dst2 for the bench() as it's write only. > Changed BUF_SIZE instead of 10. Okay, changed. James Almer wrote on Fri, 24 Nov 2023 at 01:11: > On 11/23/2023 4:08 AM, f

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
Wow, thank you for reviewing this. I just wanted to see if the function was working properly. There are so many bugs in the test code ...

Re: [FFmpeg-devel] [PATCH] ac3dsp: RISC-V V float_to_fixed24

2023-11-22 Thread flow gg
Hello, I saw the new commit "avcodec/ac3dsp: make len a size_t in float_to_fixed24." So I removed the part #if (__riscv_xlen == 64) and restored the patch. From 3e790fdccd780257f464aa8f8a56a37321ddd429 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 22 Nov 2023 14:57:29 +0800 Subject:

[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-03 Thread flow gg
c910 vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0 vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0 vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2 vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5 vc1dsp.vc1_inv_trans_8x4_dc_c: 129.0 vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 75.7

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread flow gg
I found that in the case of nosplat, an additional vset can be removed, and the time is basically the same, so I updated the patch. Rémi Denis-Courmont wrote on Mon, 4 Dec 2023 at 23:15: > On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote: > > > Probably missing VLENB checks. >

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-12-04 Thread flow gg
Because there was a conflict, the patch was updated in the reply. flow gg wrote on Fri, 1 Dec 2023 at 04:25: > Okay, I split it and attached it > > > > Rémi Denis-Courmont wrote on Thu, 30 Nov 2023 at 23:31: > >> On Tuesday, 28 November 2023, 18:59:38 EET, flow gg wrote: >> > >

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread flow gg
zero, zero, e64, m4, ta, ma + vsetivli zero, 8, e8, mf2, ta, ma ``` And ISCAS seems to have no announcement about getting an RVV 1.0 board. I plan to ask about it from time to time. Rémi Denis-Courmont wrote on Mon, 4 Dec 2023 at 01:17: > On Sunday, 3 December 2023, 16.40.

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread flow gg
Okay, after using zext, two vset instructions can be deleted, which is better than splat. I have updated the patch in this reply. Rémi Denis-Courmont wrote on Mon, 4 Dec 2023 at 23:15: > On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote: > > > Probably missing VLENB checks. > > > > Ch

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-07 Thread flow gg
023, 16:40:08 EET, flow gg wrote: > > c910 > > vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0 > > vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0 > > vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2 > > vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5 > >

Re: [FFmpeg-devel] [PATCH] checkasm: add test for dcmul_add

2023-11-27 Thread flow gg
also posed no problems. (I am using the Gmail web page.) Rémi Denis-Courmont wrote on Mon, Nov 27, 2023 at 20:17: > > > On 26 November 2023 22:54:28 GMT+02:00, flow gg > wrote: > >This is a bit confusing for me. I tried pulling the latest code, and then > >used `git am checkasm-

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-11-30 Thread flow gg
Okay, I split it and attached the result. Rémi Denis-Courmont wrote on Thu, Nov 30, 2023 at 23:31: > On Tuesday, 28 November 2023, 18.59.38 EET, flow gg wrote: > > > > Since nobody else commented, I shall note that you should probably split > the > underlying lavc changes into a separ

Re: [FFmpeg-devel] [PATCH 2/3] lavc/h264dsp: R-V V h264_add_pixels4_clear

2024-01-25 Thread flow gg
ping flow gg wrote on Mon, Dec 25, 2023 at 12:01: > > C908 > h264_add_pixels4_clear_c: 96.0 > h264_add_pixels4_clear_rvv_i64: 30.2 > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscr

Re: [FFmpeg-devel] [PATCH 2/4] lavc/rv34dsp: R-V V rv34_inv_transform_dc

2024-01-31 Thread flow gg
> Also, the fractional multiplier should never be smaller than the ratio of the > specified element size to the largest element size used in the function. Here > it is largely inconsequential, but for instance "e32, mf4" and "e64, mf2" are > invalid. Thanks, I indeed almost forgot about this part > I
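
The rule quoted in this message can be checked mechanically. The following is only a minimal sketch, not code from FFmpeg or from the patch: the helper name `vtype_is_valid` and its parameters are hypothetical, and it assumes that the largest element size used in the function is 64 bits unless another value is passed.

```python
from fractions import Fraction

def vtype_is_valid(sew: int, lmul: Fraction, max_sew: int = 64) -> bool:
    """Hypothetical helper illustrating the review comment.

    sew     -- element size selected in the vset instruction (8, 16, 32, 64)
    lmul    -- group multiplier, e.g. Fraction(1, 2) for mf2, Fraction(4) for m4
    max_sew -- largest element size used in the function (assumed 64 here)

    The multiplier must not be smaller than sew / max_sew.
    """
    return lmul >= Fraction(sew, max_sew)

# The two combinations named as invalid in the review:
print(vtype_is_valid(32, Fraction(1, 4)))  # e32, mf4
print(vtype_is_valid(64, Fraction(1, 2)))  # e64, mf2

# A combination that satisfies the rule under the same assumption:
print(vtype_is_valid(32, Fraction(1, 2)))  # e32, mf2
```

With the assumed 64-bit maximum, the first two calls report the settings as invalid and the third as valid, which matches the examples given in the review.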

Re: [FFmpeg-devel] [PATCH 4/4] lavc/rv34dsp: R-V V rv34_idct_dc_add

2024-01-31 Thread flow gg
Fixed the rv32 breakage in this reply. flow gg wrote on Wed, Jan 31, 2024 at 20:01: > > From 0874f319e1c26aa0eeb5ed0d4e00d29aec4c5af8 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Wed, 31 Jan 2024 19:04:11 +0800 Subject: [PATCH 4/4] lavc/rv34dsp: R-V V rv34_idct_dc_add C908: rv34_idct_dc_add_c:

Re: [FFmpeg-devel] Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym

2024-01-29 Thread flow gg
I have slightly adjusted the RVV code and updated the patch in this reply. flow gg wrote on Wed, Dec 20, 2023 at 18:15: > Because the format of [PATCH 1/3] was modified, this patch needs to be > changed, and it has been modified in this reply. > > flow gg wrote on Wed, Dec 20, 2023 at 16:41: > >> C908:

Re: [FFmpeg-devel] Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym

2024-01-29 Thread flow gg
> I expect that it would be faster to make one large load, and then 4 small > stores, but that might work only for exactly 128-bit vectors? This seems to require vle128, so I didn't modify it. > That's not needed. You can use immediate values. > You can reorder to avoid immediate data

[FFmpeg-devel] [PATCH 1/2] lavc/blockdsp: R-V V clear_block

2024-02-01 Thread flow gg
From d545f5ccc1c5923cb38c25b18ca750ef0ee529f4 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 1 Feb 2024 15:12:49 +0800 Subject: [PATCH 1/2] lavc/blockdsp: R-V V clear_block C908: blockdsp.clear_block_c: 47.2 blockdsp.clear_block_rvv_i64: 28.5 --- libavcodec/blockdsp.c| 2 ++

[FFmpeg-devel] [PATCH 2/2] lavc/blockdsp: R-V V clear_blocks

2024-02-01 Thread flow gg
From 91236c12365de8a39250ceee07a6234a1735ae77 Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Thu, 1 Feb 2024 15:41:09 +0800 Subject: [PATCH 2/2] lavc/blockdsp: R-V V clear_blocks C908: blockdsp.clear_blocks_c: 128.2 blockdsp.clear_blocks_rvv_i64: 102.5 --- libavcodec/riscv/blockdsp_init.c | 2

Re: [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add rv34_inv_transform_dc test

2024-02-01 Thread flow gg
It seems to be caused by movd m0, r1d in libavcodec/x86/rv34dsp.asm? I'm not quite sure. Michael Niedermayer wrote on Fri, Feb 2, 2024 at 07:42: > On Wed, Jan 31, 2024 at 08:00:18PM +0800, flow gg wrote: > > > > > checkasm/Makefile |1 > > checkasm/checkasm.c |3 ++ >

Re: [FFmpeg-devel] [PATCH 2/2] lavc/blockdsp: R-V V clear_blocks

2024-02-01 Thread flow gg
OK, I have updated it in this reply. Rémi Denis-Courmont wrote on Fri, Feb 2, 2024 at 04:13: > You should probably use an assembler macro to repeat the code. > > > -- > レミ・デニ-クールモン > http://www.remlab.net/

[FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V vp8_idct_dc_add

2024-02-01 Thread flow gg
From 32fdf006a81da78bde29b5cc0c26446d0bb3390d Mon Sep 17 00:00:00 2001 From: sunyuechi Date: Fri, 2 Feb 2024 12:49:07 +0800 Subject: [PATCH 1/3] lavc/vp8dsp: R-V V vp8_idct_dc_add c908: vp8_idct_dc_add_c: 102.2 vp8_idct_dc_add_rvv_i32: 42.0 --- libavcodec/riscv/Makefile | 2 ++
