fcmul_add_c: 4.2
fcmul_add_rvv_f32: 4.2
- af_afir.fcmul_add [OK]
fcmul_add_c: 4.5
fcmul_add_rvv_f32: 4.2
- af_afir.fcmul_add [OK]
fcmul_add_c: 4.7
fcmul_add_rvv_f32: 3.5
Rémi Denis-Courmont wrote on Thu, Sep 28, 2023 at 00:41:
> On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> >
wrote on Wed, Sep 27, 2023 at 02:44:
> On Tuesday, 26 September 2023, 21:40:12 EEST, Paul B Mahol wrote:
> > On Tue, Sep 26, 2023 at 8:35 PM Rémi Denis-Courmont
> wrote:
> > > On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> > > > benchmark:
benchmark:
fcmul_add_c: 19.7
fcmul_add_rvv_f32: 6.7
From 6bef2523728a472bb803ce085a1aafdfd624e212 Mon Sep 17 00:00:00 2001
From: h
Date: Tue, 26 Sep 2023 15:03:12 +0800
Subject: [PATCH] af_afir: RISC-V V fcmul_add
fcmul_add_c: 19.7
fcmul_add_rvv_f32: 6.7
---
libavfilter/af_afirdsp.h |
signal 7: Bus error)
Because it can only load according to e8, it seems there's no way to use
larger group multipliers.
Rémi Denis-Courmont wrote on Fri, Feb 9, 2024 at 03:41:
> On Wednesday, 7 February 2024, 2:01:23 EET, flow gg wrote:
> > I think in most cases it is like this, but spe
The issue here is that any load wider than e8 fails the test (Bus
error), so vlse64 or similar methods cannot be used.
Rémi Denis-Courmont wrote on Fri, Feb 9, 2024 at 18:32:
>
>
> On 9 February 2024 00:39:38 GMT+02:00, flow gg
> wrote:
> >From my understanding, to use larger grou
Okay, I have updated them in the response
Rémi Denis-Courmont wrote on Sat, Feb 10, 2024 at 05:14:
> On Wednesday, 7 February 2024, 2:12:22 EET, flow gg wrote:
> > My carelessness.. fixed it in the reply.
>
> I know I said to avoid scalar multiplications, but this may be taking it a
ok, updated it in the reply
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:49:
> On Friday, 2 February 2024, 3:14:39 EET, flow gg wrote:
> > Ok, updated it in the reply
>
> Sorry I meant directive, not macro. .rept is just fine here.
>
> --
> レミ・デニ-クールモン
I tested this in '[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans'. The
logic here is the same: using vext reduces the number of vset instructions,
making it a bit faster.
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:46:
> On Wednesday, 31 January 2024, 19:58:55 EET, flow gg wrote:
> > Fixed the r
xxx_idct_dc_add is quite similar: vext reduces the number of vset
instructions, so it is a bit faster than using vwadd. This was tested in
'[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans'.
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:53:
> Hi,
>
> I think you can use vwadd here?
>
> --
> Rémi Denis-Courmont
>
It was a testing problem, not MMX. Fixed it in this reply.
flow gg wrote on Tue, Feb 13, 2024 at 10:37:
> I sent "[FFmpeg-devel] [PATCH] x86: Remove MMX assembly
> rv34_inv_transform_dc in rv34dsp"
>
> Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:37:
>
>> Le perjantaina 2. helmi
I made a mistake. It can be fixed your way. Please ignore this reply.
flow gg wrote on Tue, Feb 13, 2024 at 17:47:
> Thank you for your guidance. Do you mean that the test should be modified
> like this?
>
> - declare_func(void, uint8_t *dst, ptrdiff_t stride, int dc);
> + declare_func_emms(
Thank you for your guidance. Do you mean that the test should be modified
like this?
- declare_func(void, uint8_t *dst, ptrdiff_t stride, int dc);
+ declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *, ptrdiff_t, int);
I tried doing it this way, but the test still failed. I'm not sure why...
Happy new year ~
Yes, I've tried reordering.
Rémi Denis-Courmont wrote on Sat, Feb 10, 2024 at 17:18:
> Happy new year,
>
> The gains are -unsurprisingly- modest here. Did you try to reorder
> instructions to improve scheduling?
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/
>
>
>
>
checkasm in [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add
rv34_inv_transform_dc test
From 1aa51d60def8d4313c1b11a50528662ec832530e Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 13 Feb 2024 08:41:20 +0800
Subject: [PATCH] x86: Remove MMX assembly rv34_inv_transform_dc in rv34dsp
This asm
I sent "[FFmpeg-devel] [PATCH] x86: Remove MMX assembly
rv34_inv_transform_dc in rv34dsp"
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:37:
> On Friday, 2 February 2024, 2:47:16 EET, flow gg wrote:
> > It seems to be caused by movd m0, r1d in libavcodec/x86/rv34dsp.asm
Okay, updated it in the reply
Rémi Denis-Courmont wrote on Tue, Feb 13, 2024 at 03:54:
> Hi,
>
> To avoid repeating the code, you can either use .rept or .irp. You can
> even
> use assembler conditionals to elide the redundant code on the last
> iteration.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
>
llo,
>
> On Monday, 19 February 2024, 13:13:43 EET, flow gg wrote:
> > The reason for using m1+le8 instead of stride load + larger group
> > multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp:
> R-V
> > V pix_abs."
> >
> >
=917745 c=3865
Rémi Denis-Courmont wrote on Thu, Feb 22, 2024 at 02:07:
> On Tuesday, 6 February 2024, 17:56:32 EET, flow gg wrote:
> >
>
> Did you try to compute integral absolute values with the ad-hoc (floating
> point) instruction instead of vneg/vmax? It should work since the sign
The reason for using m1+le8 instead of stride load + larger group
multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V
V pix_abs."
In the test, there is
#define src (buf + 2 * SRC_BUF_STRIDE + 2 + 1)
Therefore, using anything other than e8 results in: (fatal signal 7: Bus error).
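To illustrate why that `src` definition rules out wider element loads, here is a minimal C sketch. It is not taken from the patch or from checkasm: `is_aligned` and `test_src` are hypothetical helpers, the stride value of 32 is an assumption (the real SRC_BUF_STRIDE is not shown in this thread), and the buffer is assumed to be at least 8-byte aligned. Since `2 * SRC_BUF_STRIDE + 2 + 1` is odd for any stride, `src` is only byte-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stride; the real SRC_BUF_STRIDE is not shown in this thread. */
#define SRC_BUF_STRIDE 32

/* Returns 1 if p is a multiple of `align` bytes (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Same offset as the `src` macro quoted above: always an odd byte offset. */
static const uint8_t *test_src(const uint8_t *buf)
{
    return buf + 2 * SRC_BUF_STRIDE + 2 + 1;
}
```

With an 8-byte-aligned `buf`, `is_aligned(test_src(buf), 1)` holds while the checks for 2, 4 and 8 bytes fail, which is consistent with e16/e32/e64 loads trapping on hardware that does not support misaligned vector accesses.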
From b4abb039f8f769104a29819a1d709f5a00bf84d5 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 6 Feb 2024 23:28:08 +0800
Subject: [PATCH 6/7] lavc/me_cmp: R-V V vsse vsad intra
C908:
vsad_4_c: 681.0
vsad_4_rvv_i32: 182.2
vsad_5_c: 278.0
vsad_5_rvv_i32: 145.2
vsse_4_c: 595.0
vsse_4_rvv_i32:
From 31635394e89318c554a9653bd22791336309951e Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 6 Feb 2024 22:51:47 +0800
Subject: [PATCH 7/7] lavc/me_cmp: R-V V nsse
C908:
nsse_0_c: 1990.0
nsse_0_rvv_i32: 572.0
nsse_1_c: 910.0
nsse_1_rvv_i32: 456.0
---
libavcodec/riscv/me_cmp_init.c | 30
From 67f2a662be1533e52a28971152bff670f78544fd Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 6 Feb 2024 23:18:51 +0800
Subject: [PATCH 5/7] lavc/me_cmp: R-V V vsse vsad
C908:
vsad_0_c: 936.0
vsad_0_rvv_i32: 236.2
vsad_1_c: 424.0
vsad_1_rvv_i32: 190.2
vsse_0_c: 877.0
vsse_0_rvv_i32: 204.2
From 7d153e6b166d53c94db57be4f024986d38290042 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 6 Feb 2024 21:55:07 +0800
Subject: [PATCH 4/7] lavc/me_cmp: R-V V sse
C908:
sse_0_c: 614.7
sse_0_rvv_i32: 138.2
sse_1_c: 302.7
sse_1_rvv_i32: 107.2
sse_2_c: 175.7
sse_2_rvv_i32: 104.2
---
From d4d6b3ea040f3f7997463b4452813bc75d1c9f9d Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 3 Feb 2024 10:58:13 +0800
Subject: [PATCH 1/7] lavc/me_cmp: R-V V pix_abs
C908:
pix_abs_0_0_c: 534.0
pix_abs_0_0_rvv_i32: 136.2
pix_abs_1_0_c: 287.7
pix_abs_1_0_rvv_i32: 125.2
sad_0_c: 534.0
From ea0cf15e43c9a3e1b56c1a43d50f0701d42c7e9f Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 6 Feb 2024 21:41:35 +0800
Subject: [PATCH 2/7] lavc/me_cmp: R-V V pix_abs_x2
C908:
pix_abs_0_1_c: 767.0
pix_abs_0_1_rvv_i32: 196.2
pix_abs_1_1_c: 388.0
pix_abs_1_1_rvv_i32: 185.2
---
From 01cdfde56c4a88022f0ed8c12a2442e6bebb6a60 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 6 Feb 2024 21:46:07 +0800
Subject: [PATCH 3/7] lavc/me_cmp: R-V V pix_abs_y2
C908:
pix_abs_0_2_c: 904.0
pix_abs_0_2_rvv_i32: 172.2
pix_abs_1_2_c: 460.0
pix_abs_1_2_rvv_i32: 168.2
---
I think that is true in most cases, but for this function specifically,
using a reduction only once would be slower.
The currently submitted version takes roughly:
pix_abs_0_0_rvv_i32: 136.2
The version that uses a reduction only once takes:
pix_abs_0_0_rvv_i32: 169.2
Here is the implementation
That was careless of me. Fixed it in the reply.
Rémi Denis-Courmont wrote on Wed, Feb 7, 2024 at 01:26:
> Hi,
>
> I'm not sure why you're mixing element sizes this way, but the code should
> not
> even compile due to mismatched extensions.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/
>
>
>
>
ping
flow gg wrote on Tue, Jan 30, 2024 at 00:22:
> > I expect that it would be faster to make one large load, and then 4 small
> > stores, but that might work only for exactly 128-bit vectors?
>
> This seems to require vle128, so I didn't modify it.
>
> > That's not needed. Y
C908:
decorrelate_ls_c: 69.7
decorrelate_ls_rvv_i32: 27.2
From 03fad46e6db1846596c31918fc4e34b58246efc4 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 18 Dec 2023 22:49:21 +0800
Subject: [PATCH 4/6] lavc/takdsp: R-V V decorrelate_ls
C908:
decorrelate_ls_c: 69.7
decorrelate_ls_rvv_i32: 27.2
From 9e09f52403058e1bc87653bfd9980c7d5a6ce33c Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 18 Dec 2023 22:48:09 +0800
Subject: [PATCH 3/6] checkasm/takdsp: add decorrelate_sm test
---
tests/checkasm/takdsp.c | 29 +
1 file changed, 29 insertions(+)
diff
C908:
decorrelate_sr_c: 95.5
decorrelate_sr_rvv_i32: 28.2
From fa1a84337a7cd2a62c26a9d5f8d707a97e917f77 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 18 Dec 2023 22:52:20 +0800
Subject: [PATCH 5/6] lavc/takdsp: R-V V decorrelate_sr
C908:
decorrelate_sr_c: 95.5
decorrelate_sr_rvv_i32: 28.2
A 'shnadd' should be moved to the front, updated in this reply.
flow gg wrote on Mon, Dec 18, 2023 at 23:15:
> C908:
> decorrelate_ls_c: 69.7
> decorrelate_ls_rvv_i32: 27.2
>
From fdee02eae64ced9a65781fbbeef32c6b8ee2fdce Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 18 Dec 2023 22:49:21 +
From 9254ae1f72498568857357059eb514e8cb90b5f1 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 18 Dec 2023 22:47:29 +0800
Subject: [PATCH 2/6] checkasm/takdsp: add decorrelate_sr test
---
tests/checkasm/takdsp.c | 27 +++
1 file changed, 27 insertions(+)
diff --git
From 960f70964521e1dc94647d70e2631351c0bb51bb Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 18 Dec 2023 22:39:13 +0800
Subject: [PATCH 1/6] checkasm/takdsp: add decorrelate_ls test
---
tests/checkasm/Makefile | 1 +
tests/checkasm/checkasm.c | 3 ++
tests/checkasm/checkasm.h | 1 +
C908:
decorrelate_sm_c: 130.0
decorrelate_sm_rvv_i32: 43.7
From 3dc613feaa6c38a7df47a3fc385e2140716e0ae2 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 18 Dec 2023 22:53:39 +0800
Subject: [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm
C908:
decorrelate_sm_c: 130.0
decorrelate_sm_rvv_i32:
c908:
dcmul_add_c: 88.0
dcmul_add_rvv_f64: 46.2
Did not use vlseg2e64, because it is much slower than vlse64
Did not use vsseg2e64, because it is slightly slower than vsse64
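For readers unfamiliar with the data layout behind the vlseg2e64/vlse64 choice, here is a rough C sketch of a complex multiply-add over interleaved (re, im) doubles. This is an illustration only, not FFmpeg's actual dcmul_add (which may handle extra trailing elements); the function name `cmul_add_interleaved` is hypothetical. A segment load would split the real and imaginary parts in one instruction, whereas two strided loads with a 16-byte stride each walk every other element.

```c
#include <stddef.h>

/*
 * Interleaved layout: re0, im0, re1, im1, ...
 * Reading index 2*n (or 2*n + 1) for successive n is the scalar
 * equivalent of a strided load with a 16-byte stride.
 */
static void cmul_add_interleaved(double *sum, const double *t,
                                 const double *c, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        const double tre = t[2 * n], tim = t[2 * n + 1];
        const double cre = c[2 * n], cim = c[2 * n + 1];

        /* (tre + i*tim) * (cre + i*cim), accumulated into sum */
        sum[2 * n]     += tre * cre - tim * cim;
        sum[2 * n + 1] += tre * cim + tim * cre;
    }
}
```

Which of the two load styles is faster depends on the implementation, as the C908 numbers above suggest.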
From 80b6694bc29ed1c37852dc079a6d91a24dd6f18e Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 19 Dec 2023 09:11:28
Okay, updated in the reply.
Rémi Denis-Courmont wrote on Tue, Dec 19, 2023 at 00:25:
> On Monday, 18 December 2023, 17:26:58 EET, flow gg wrote:
> > A 'shnadd' should be moved to the front, updated in this reply.
>
> Indeed, but please try to interleave scalar and vector instructi
There are only three emails in my Sent Items, but there are six at
ffmpeg-devel.. I'm not quite sure why, please ignore the three duplicates.
flow gg wrote on Wed, Dec 20, 2023 at 16:41:
> C908:
> get_pixels_8x4_sym_c: 297.2
> get_pixels_8x4_sym_rvv_
C908:
get_pixels_8x4_sym_c: 297.2
get_pixels_8x4_sym_rvv_i64: 52.7
From 6fe4dbe9af39af50a1bf2069e91dfa542d83fee3 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 20 Dec 2023 16:28:33 +0800
Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym
C908:
get_pixels_8x4_sym_c: 297.2
From 2f17a594805615a93f3f475246d61d61cc0aa43b Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 20 Dec 2023 16:21:38 +0800
Subject: [PATCH 2/3] checkasm/dnxhdenc: add get_pixels_8x4_sym test
---
tests/checkasm/Makefile | 1 +
tests/checkasm/checkasm.c | 3 ++
tests/checkasm/checkasm.h |
From 3f8adabeac408ada6048a1e2ac472534f970364e Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 20 Dec 2023 16:17:32 +0800
Subject: [PATCH 1/3] lvac/dnxhdenc: add ff_dnxhdenc_init
This is for clarity and use in testing, consistent with other parts of the code.
---
libavcodec/dnxhdenc.c | 6
Because the format of [PATCH 1/3] was modified, this patch needs to be
changed, and it has been modified in this reply.
flow gg wrote on Wed, Dec 20, 2023 at 16:41:
> C908:
> get_pixels_8x4_sym_c: 297.2
> get_pixels_8x4_sym_rvv_i64: 52.7
> typo in 'lavc'
fixed.
> Brace should be on its own line
fixed
> Shouldn't it actually replace the existing ff_dnxhdenc_init_x86() call
in dnxhdenc.c?
Sorry, I missed this part, it's fixed in this reply
Anton Khirnov wrote on Wed, Dec 20, 2023 at 17:51:
> Quoting flow gg (2023-12
C908
h264_add_pixels4_clear_c: 96.0
h264_add_pixels4_clear_rvv_i64: 30.2
From 8b2838516915c27aa2831e797c2c41ad1d1bae1b Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 25 Dec 2023 00:06:28 +0800
Subject: [PATCH 2/3] lavc/h264dsp: R-V V h264_add_pixels4_clear
C908
h264_add_pixels4_clear_c:
From 39a9d1728cd867f5a4bfc39232167e9769247bf6 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Dec 2023 20:02:11 +0800
Subject: [PATCH 1/3] checkasm/h264dsp: add h264_add_pixels_clear test
---
tests/checkasm/h264dsp.c | 55
1 file changed, 55
C908
h264_add_pixels8_clear_c: 262.0
h264_add_pixels8_clear_rvv_i64: 59.0
From 11218f9067566fa3ace8821b4b890457d6ea17f9 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 25 Dec 2023 00:07:09 +0800
Subject: [PATCH 3/3] lavc/h264dsp: R-V V h264_add_pixels8_clear
C908
h264_add_pixels8_clear_c:
:
> On Monday, 18 December 2023, 17:16:27 EET, flow gg wrote:
> > C908:
> > decorrelate_sm_c: 130.0
> > decorrelate_sm_rvv_i32: 43.7
>
> +
> +func ff_decorrelate_sm_rvv, zve32x
> +1:
> +vsetvli t0, a2, e32, m8, ta, ma
> +vle32
uta 2023, 4.53.12 EET flow gg a écrit :
> > c908:
> > dcmul_add_c: 88.0
> > dcmul_add_rvv_f64: 46.2
> >
> > Did not use vlseg2e64, because it is much slower than vlse64
> > Did not use vsseg2e64, because it is slightly slower than vsse64
>
> Is this about C
Updated the patch to resolve conflicts, changed m4 to m8, and used the C908
benchmark.
flow gg wrote on Wed, Nov 29, 2023 at 01:00:
> c910:
> abs_pow34_c: 24610.7
> abs_pow34_rvv_f32: 6177.7
>
> (need use "[FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_
To be clear, I mean removing
libavcodec/aacenc.c:1429 FF_CODEC_ENCODE_CB(aac_encode_frame)
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
If I remove the line 1429 with FF_CODEC_ENCODE_CB(aac_encode_frame), there
is no error on k230, but I am unsure of the reason.
flow gg wrote on Tue, Dec 5, 2023 at 05:46:
> Because there was a conflict, the patch was updated in the reply
>
> flow gg wrote on Fri, Dec 1, 2023 at 04:25:
>
I misunderstood: I took the vector length to mean the length of the vector
register.
I have modified it in this reply.
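As a scalar model of the point under discussion, the sketch below shows a strip-mined loop in C, where each pass handles `vl = min(remaining, VLMAX)` elements. This models the common behaviour of vsetvli and is only an illustration under assumptions: `VLMAX` is a hypothetical constant (the real value depends on VLEN, SEW and LMUL), `ssd_ref` and `ssd_stripmined` are made-up names, and a 64-bit accumulator is used for simplicity rather than to mirror the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical maximum element count per pass; hardware-dependent in practice. */
#define VLMAX 32

/* Plain reference: sum of squared differences between int8 and int16 inputs. */
static int64_t ssd_ref(const int8_t *a, const int16_t *b, size_t len)
{
    int64_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        int32_t d = a[i] - b[i];
        sum += (int64_t)d * d;
    }
    return sum;
}

/* Strip-mined version: the tail pass simply runs with a smaller vl. */
static int64_t ssd_stripmined(const int8_t *a, const int16_t *b, size_t len)
{
    int64_t sum = 0;
    while (len > 0) {
        size_t vl = len < VLMAX ? len : VLMAX;  /* models vsetvli */
        for (size_t i = 0; i < vl; i++) {       /* stands in for one vector op */
            int32_t d = a[i] - b[i];
            sum += (int64_t)d * d;
        }
        a += vl;
        b += vl;
        len -= vl;
    }
    return sum;
}
```

Because vl is recomputed on every pass, lengths that are not a multiple of VLMAX need no separate tail code, provided every instruction in the loop body runs under the vl that was set for that pass.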
Rémi Denis-Courmont wrote on Sat, Dec 30, 2023 at 20:15:
>
>
> On 29 December 2023 12:57:20 GMT+01:00, flow gg
> wrote:
> >C908
> >ssd_int8_vs_int16_c: 207.7
>
Thank you, I learned this and updated it in this reply.
James Almer wrote on Sat, Dec 30, 2023 at 22:46:
> On 12/30/2023 10:59 AM, flow gg wrote:
> > Okay, it has been modified in this reply.
>
> > From d62f363e3aad534c7ead5f3015029b3e7cbbff46 Mon Sep 17 00:00:00 2001
> > From: suny
flow gg wrote on Sat, Dec 30, 2023 at 22:00:
> > At a quick glance, it won't work if the input length is not a multiple
> of the vector length.
>
> Why? I tried 1024, 32*3, 32*7 and all passed the test.
>
> > Also do you really need to extend accumulators to 32 bits?
>
> It
I have modified it in this reply.
Rémi Denis-Courmont wrote on Sat, Dec 30, 2023 at 20:15:
>
>
> On 29 December 2023 12:57:20 GMT+01:00, flow gg
> wrote:
> >C908
> >ssd_int8_vs_int16_c: 207.7
> >ssd_int8_vs_int16_rvv_i32: 28.0
>
> At a quick glance, it won't work if the inpu
Okay, it has been modified in this reply.
Martin Storsjö wrote on Fri, Dec 29, 2023 at 22:35:
> On Fri, 29 Dec 2023, James Almer wrote:
>
> > On 12/29/2023 9:16 AM, Martin Storsjö wrote:
> >> On Fri, 29 Dec 2023, flow gg wrote:
> >>
> >>> Tests on x86 might f
From 55fe9e001545ed3ae1f2c64666d07aebaeb83a2a Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 29 Dec 2023 13:08:25 +0800
Subject: [PATCH 1/3] lvac/svqenc: add ff_svq1enc_init
This is for clarity and use in testing, consistent with other parts of the code
---
libavcodec/svq1enc.c| 18
Tests on x86 might fail, possibly due to a 16-bit sub overflow
From 8bde7750ec7adc2437843e14d4be85fb900d1b16 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 29 Dec 2023 13:09:21 +0800
Subject: [PATCH 2/3] checkasm/svqenc: add ssd_int8_vs_int16 test
---
tests/checkasm/Makefile | 1 +
C908
ssd_int8_vs_int16_c: 207.7
ssd_int8_vs_int16_rvv_i32: 28.0
From 0fd1b7a34ab8794868d80233c35f70c8ad42b9fa Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 29 Dec 2023 13:27:31 +0800
Subject: [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16
C908
ssd_int8_vs_int16_c: 207.7
One vset can be reduced, but vwsub should not be used in this case. I
modified it in this reply.
Rémi Denis-Courmont wrote on Fri, Jan 5, 2024 at 00:00:
> On Saturday, 30 December 2023, 18:20:15 EET, flow gg wrote:
> > I mistook it, seeing the vector length as the length of the vector
a2, a2, t0
+ vsetvli zero, t0, e8, m2, tu, ma
+ vle8.v v0, (a0)
+ vwsub.wv v16, v8, v0
Rémi Denis-Courmont wrote on Sat, Jan 6, 2024 at 23:05:
> On Friday, 5 January 2024, 2:56:18 EET, flow gg wrote:
> > One vset can be reduced, but vwsub should not be used in thi
Alright, I learned a bit more, so should we not consider the internal
implementation?
I've added this version that reduces one vset in this reply.
Rémi Denis-Courmont wrote on Sun, Jan 7, 2024 at 16:03:
> On Sunday, 7 January 2024, 3:33:39 EET, flow gg wrote:
> > I tested it, and indeed us
ping
flow gg wrote on Mon, Dec 25, 2023 at 12:01:
> C908
> h264_add_pixels8_clear_c: 262.0
> h264_add_pixels8_clear_rvv_i64: 59.0
>
Okay, I updated it in the reply
Rémi Denis-Courmont wrote on Wed, Jan 17, 2024 at 02:04:
> +vsetvli t0, a2, e8, m2, tu, ma
> +vle8.v v0, (a0)
> +sub a2, a2, t0
> +vsetvli zero, t0, e16, m4, tu, ma
> +vle16.v v8, (a1)
> +vsetvli
From eaac50d41b3398ef39d1026a7d84480860a1c41e Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 16 Jan 2024 23:55:33 +0800
Subject: [PATCH 1/3] lavc/h264pred: R-V V pred16x16_vertical_8
C908
pred16x16_vertical_8_c: 1.5
pred16x16_vertical_8_rvv_i32: 1.0
---
libavcodec/h264pred.c|
From 806f84ea5557c4652e48451decc4c679c9485472 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 16 Jan 2024 23:56:33 +0800
Subject: [PATCH 2/3] lavc/h264pred: R-V V pred16x16_horizontal_8
C908
pred16x16_horizontal_8_c: 3.0
pred16x16_horizontal_8_rvv_i32: 2.5
---
From 8c5fdbfea42e9ad6ba6e1df5e4ea3c583d59537a Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 16 Jan 2024 23:57:53 +0800
Subject: [PATCH 3/3] lavc/h264pred: R-V V pred16x16_dc_8
C908
pred16x16_dc_8_c: 2.5
pred16x16_dc_8_rvv_i32: 1.7
---
libavcodec/riscv/h264pred_init.c | 2 ++
c910:
abs_pow34_c: 24610.7
abs_pow34_rvv_f32: 6177.7
(need use "[FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34" first)
From 86577c2d40d29422c4b769c854df99a88c7b3c77 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 28 Nov 2023 20:14:14 +0800
Subject: [PATCH 2/2]
From 85e60d75554894964825f5718d14591294ec4e88 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 28 Nov 2023 14:08:12 +0800
Subject: [PATCH 1/2] checkasm: test for abs_pow34
---
libavcodec/aacenc.c| 24 +++--
libavcodec/aacenc.h| 1 +
tests/checkasm/Makefile| 1 +
This is a bit confusing for me.. I tried pulling the latest code, and then
used `git am checkasm-test-for-dcmul_add.patch` without any patch
corruption.
Rémi Denis-Courmont wrote on Mon, Nov 27, 2023 at 03:36:
> On Sunday, 19 November 2023, 0:28:10 EET, flow gg wrote:
>
From 02dd534bd602ba3ec79e51070934949a98f780e2 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 22 Nov 2023 14:57:29 +0800
Subject: [PATCH] checkasm/ac3dsp: add float_to_fixed24 test
---
tests/checkasm/Makefile | 1 +
tests/checkasm/ac3dsp.c | 71 +++
I modified the temporary test and sent it in "[FFmpeg-devel] [PATCH]
checkasm/ac3dsp: add float_to_fixed24 test".
So the test time results have changed, and I updated them in the patch.
c910
float_to_fixed24_c: 2207.2
float_to_fixed24_rvv_f32: 696.2
flow gg wrote on Wed, Nov 22, 2023 at 20:00:
> FWIW CanMV-K230 boards are on sale for under 500 RMB.
I just made a payment ~ (I saw you mention in IRC that you're going to
write about K230+Debian. Looking forward to it)
Rémi Denis-Courmont wrote on Wed, Dec 6, 2023 at 04:11:
> On Tuesday, 5 December 2023, 21:25:12 EET, flow gg
Changed.
Rémi Denis-Courmont wrote on Wed, Dec 6, 2023 at 04:11:
> On Tuesday, 5 December 2023, 21:25:12 EET, flow gg wrote:
> > > This block can be folded into the next. You don't need to check VLENB
> >
> > twice.
> >
> > Changed.
> >
> > > Instruction
> This block can be folded into the next. You don't need to check VLENB
twice.
Changed.
> Instruction scheduling could be better, especially on in-order CPUs.
I put the vload at the front, and then proceeded with the t2 operation, but
I'm not sure...
> You don't need to reset the AVL here,
Okay, changed and attached
Rémi Denis-Courmont wrote on Sat, Dec 2, 2023 at 02:38:
> On Friday, 1 December 2023, 20:35:10 EET, Rémi Denis-Courmont
> wrote:
> > On Friday, 24 November 2023, 0:39:39 EET, flow gg wrote:
> > > Okay, changed
> >
> > s
I forgot to modify the Makefile; I've made the changes in this reply.
flow gg wrote on Sat, Dec 2, 2023 at 03:50:
> Okay, changed and attached
>
> Rémi Denis-Courmont wrote on Sat, Dec 2, 2023 at 02:38:
>
>> On Friday, 1 December 2023, 20:35:10 EET, Rémi Denis-Courmont
>> wrote
Okay, changed
Rémi Denis-Courmont wrote on Fri, Nov 24, 2023 at 01:09:
> On Thursday, 23 November 2023, 1:17:03 EET, flow gg wrote:
> > Hello, I saw the new commit "avcodec/ac3dsp: make len a size_t in
> > float_to_fixed24."
> >
> > So I removed the part #if (__ris
> You should probably add the test case to tests/fate/checkasm.mak
> This one is not necessary. You can reuse dst or dst2 for the bench() as
it's write only.
> Changed BUF_SIZE instead of 10.
Okay, changed.
James Almer wrote on Fri, Nov 24, 2023 at 01:11:
> On 11/23/2023 4:08 AM, f
Wow, thank you for reviewing this. I just wanted to see if the function was
working properly. There are so many bugs in the test code ...
Hello, I saw the new commit "avcodec/ac3dsp: make len a size_t in
float_to_fixed24."
So I removed the part #if (__riscv_xlen == 64) and restored the patch.
From 3e790fdccd780257f464aa8f8a56a37321ddd429 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 22 Nov 2023 14:57:29 +0800
Subject:
c910
vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0
vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5
vc1dsp.vc1_inv_trans_8x4_dc_c: 129.0
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 75.7
I found that in the case of nosplat, an additional vset can be removed, and
the time is basically the same, so I updated the patch.
Rémi Denis-Courmont wrote on Mon, Dec 4, 2023 at 23:15:
> On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote:
> > > Probably missing VLENB checks.
>
Because there was a conflict, the patch was updated in the reply
flow gg wrote on Fri, Dec 1, 2023 at 04:25:
> Okay, I split it and attached
>
>
>
> Rémi Denis-Courmont wrote on Thu, Nov 30, 2023 at 23:31:
>
>> On Tuesday, 28 November 2023, 18:59:38 EET, flow gg wrote:
>> >
>
zero, zero, e64, m4, ta, ma
+ vsetivli zero, 8, e8, mf2, ta, ma
```
And ISCAS seems to have no announcement about getting an RVV 1.0 board. I
plan to ask about it from time to time.
Rémi Denis-Courmont wrote on Mon, Dec 4, 2023 at 01:17:
> On Sunday, 3 December 2023, 16.40.
Okay, after using zext, two vset instructions can be removed, which is
better than splat. I have updated the patch in this reply.
Rémi Denis-Courmont wrote on Mon, Dec 4, 2023 at 23:15:
> On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote:
> > > Probably missing VLENB checks.
> >
> > Ch
023, 16.40.08 EET flow gg a écrit :
> > c910
> > vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0
> > vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0
> > vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2
> > vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5
> >
also posed no problems.
(I am using the Gmail web page.)
Rémi Denis-Courmont wrote on Mon, Nov 27, 2023 at 20:17:
>
>
> On 26 November 2023 22:54:28 GMT+02:00, flow gg
> wrote:
> >This is a bit confusing for me.. I tried pulling the latest code, and then
> >used `git am checkasm-
Okay, I split it and attached
Rémi Denis-Courmont wrote on Thu, Nov 30, 2023 at 23:31:
> On Tuesday, 28 November 2023, 18:59:38 EET, flow gg wrote:
> >
>
> Since nobody else commented, I shall note that you should probably split
> the
> underlying lavc changes into a separ
ping
flow gg wrote on Mon, Dec 25, 2023 at 12:01:
>
> C908
> h264_add_pixels4_clear_c: 96.0
> h264_add_pixels4_clear_rvv_i64: 30.2
>
> Also, the fractional multiplier should never be smaller than the ratio of the
> specified element size to the largest element size used in the function. Here
> it is largely inconsequential, but for instance "e32, mf4" and "e64, mf2" are
> invalid.
Thanks, I indeed almost forgot about this part
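The rule quoted above can be written down as a small check. This is only a sketch under assumptions: `frac_lmul_ok` is a hypothetical helper, `den` is the denominator of the fractional multiplier (2, 4 or 8 for mf2, mf4, mf8), and `max_sew` is the largest element size in bits used in the function (64 in the examples from the review). The condition 1/den >= sew/max_sew is rearranged to avoid division.

```c
#include <assert.h>

/*
 * Returns 1 if "e<sew>, mf<den>" respects the rule from the review:
 * the fractional multiplier 1/den must not be smaller than sew / max_sew,
 * i.e. sew * den <= max_sew.
 */
static int frac_lmul_ok(unsigned sew, unsigned den, unsigned max_sew)
{
    assert(den == 2 || den == 4 || den == 8);
    return sew * den <= max_sew;
}
```

With `max_sew` set to 64, `frac_lmul_ok(32, 4, 64)` and `frac_lmul_ok(64, 2, 64)` are both false, matching the two invalid combinations named in the review, while "e32, mf2" and "e16, mf4" pass.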
> I
Fixed the rv32 break in this reply
flow gg wrote on Wed, Jan 31, 2024 at 20:01:
>
>
From 0874f319e1c26aa0eeb5ed0d4e00d29aec4c5af8 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 31 Jan 2024 19:04:11 +0800
Subject: [PATCH 4/4] lavc/rv34dsp: R-V V rv34_idct_dc_add
C908:
rv34_idct_dc_add_c:
I have slightly adjusted the rvv and updated patch in this reply.
flow gg wrote on Wed, Dec 20, 2023 at 18:15:
> Because the format of [PATCH 1/3] was modified, this patch needs to be
> changed, and it has been modified in this reply.
>
> flow gg wrote on Wed, Dec 20, 2023 at 16:41:
>
>> C908:
> I expect that it would be faster to make one large load, and then 4 small
> stores, but that might work only for exactly 128-bit vectors?
This seems to require vle128, so I didn't modify it.
> That's not needed. You can use immediate values.
> You can reorder to avoid immediate data
From d545f5ccc1c5923cb38c25b18ca750ef0ee529f4 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 1 Feb 2024 15:12:49 +0800
Subject: [PATCH 1/2] lavc/blockdsp: R-V V clear_block
C908:
blockdsp.clear_block_c: 47.2
blockdsp.clear_block_rvv_i64: 28.5
---
libavcodec/blockdsp.c| 2 ++
From 91236c12365de8a39250ceee07a6234a1735ae77 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 1 Feb 2024 15:41:09 +0800
Subject: [PATCH 2/2] lavc/blockdsp: R-V V clear_blocks
C908:
blockdsp.clear_blocks_c: 128.2
blockdsp.clear_blocks_rvv_i64: 102.5
---
libavcodec/riscv/blockdsp_init.c | 2
It seems to be caused by movd m0, r1d in libavcodec/x86/rv34dsp.asm? I'm
not quite sure.
Michael Niedermayer wrote on Fri, Feb 2, 2024 at 07:42:
> On Wed, Jan 31, 2024 at 08:00:18PM +0800, flow gg wrote:
> >
>
> > checkasm/Makefile |1
> > checkasm/checkasm.c |3 ++
Ok, updated it in the reply
Rémi Denis-Courmont wrote on Fri, Feb 2, 2024 at 04:13:
> You should probably use an assembler macro to repeat the code.
>
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
From 32fdf006a81da78bde29b5cc0c26446d0bb3390d Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 2 Feb 2024 12:49:07 +0800
Subject: [PATCH 1/3] lavc/vp8dsp: R-V V vp8_idct_dc_add
c908:
vp8_idct_dc_add_c: 102.2
vp8_idct_dc_add_rvv_i32: 42.0
---
libavcodec/riscv/Makefile | 2 ++