Okay, I have modified them to 64 and added some descriptions.
Rémi Denis-Courmont wrote on Wed, Nov 15, 2023 at 23:06:
> On Wednesday, 15 November 2023, 10:59:55 EET, flow gg wrote:
> > Okay, I have updated these issues in the patch.
>
> It does not assemble but I can fix it locally. The narrowing s
On Wednesday, 15 November 2023, 10:59:55 EET, flow gg wrote:
> Okay, I have updated these issues in the patch.
It does not assemble but I can fix it locally. The narrowing shift trickery
requires Zve64x, or rather Zve64f in this case.
The performance improvement is much better on newer h
Okay, I have updated these issues in the patch.
Rémi Denis-Courmont wrote on Mon, Nov 13, 2023 at 23:35:
>Hi,
>
> On Monday, 13 November 2023, 11:43:01 EET, flow gg wrote:
> > Sorry for the long delay in responding.
>
> No problem. Working with T-Head C910 (or C920?) cores is very tedious. I
>
On Mon, Nov 13, 2023 at 4:35 PM Rémi Denis-Courmont wrote:
>Hi,
>
> On Monday, 13 November 2023, 11:43:01 EET, flow gg wrote:
> > Sorry for the long delay in responding.
>
> No problem. Working with T-Head C910 (or C920?) cores is very tedious. I
> gave
> up on that and switched ove
Hi,
On Monday, 13 November 2023, 11:43:01 EET, flow gg wrote:
> Sorry for the long delay in responding.
No problem. Working with T-Head C910 (or C920?) cores is very tedious. I gave
up on that and switched over to Kendryte K230 (based on C908) now.
> How is the modified patch now?
Sorry for the long delay in responding.
How is the modified patch now?
It no longer uses register stride (learned from your code) and has switched to
shNadd instead.
(It uses m4 and m2, as they are slightly faster than m8 and m4.)
benchmark:
fcmul_add_c: 2179
fcmul_add_rvv_f32: 1652
Rémi Denis-Courmon
On 28 September 2023 08:45:44 GMT+03:00, flow gg wrote:
>Okay, I reverted the volatile in ff_read_time
>
>How about this version?
It's still using register stride which is all but guaranteed to be slow on any
hardware and should only be used as a last resort.
The code is also missing schedu
Okay, I reverted the volatile in ff_read_time.
How about this version?
It uses vls instead of vlseg, and uses vfmacc.
The benchmark is sometimes better, sometimes the same:
fcmul_add_c: 3.5
fcmul_add_rvv_f32: 3.5
- af_afir.fcmul_add [OK]
fcmul_add_c: 4.5
fcmul_add_rvv_f32: 4.2
- af_afir.fcmul_add [OK]
fcm
On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7
With optimisations enabled and the benchmarking fix, I get this (on the same
hardware, I believe):
fcmul_add_c: 3.5
fcmul_add_rvv_f32: 6.7
For sure unfortunate design limit
On Wednesday, 27 September 2023, 4:47:26 EEST, flow gg wrote:
> >>> please pad mnemonics to at least 8 columns for consistency
>
> okay, changed
>
> >>> It seems that you could just as well use vlseg2 without register
>
> stride, no?
>
> yes, vlseg will be better; changed
>
> >>> Note that
On Wednesday, 27 September 2023, 4:47:26 EEST, flow gg wrote:
> ```
> tests/checkasm/checkasm --bench --test=aacpsdsp
> tests/checkasm/checkasm --bench --test=alacdsp
> tests/checkasm/checkasm --bench --test=audiodsp
> tests/checkasm/checkasm --bench --test=g722dsp
> tests/checkasm/checkasm -
>>> please pad mnemonics to at least 8 columns for consistency
okay, changed
>>> It seems that you could just as well use vlseg2 without register
stride, no?
yes, vlseg will be better; changed
>>> Note that you could do the double versions with very little extra
effort.
okay
>>> But really, DO
On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7
+li t1, 4
+vsetvli t0, t1, e32, m1, ta, ma
vsetivli t0, 4, ...
But really, DO NOT use a fixed vector length here. At best, you're wasting half
the vector width. Yo
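The advice above about not fixing the vector length can be illustrated with a
scalar model of a strip-mined loop. This is only a sketch in C, not the patch's
code or FFmpeg's implementation: model_vsetvl, fcmul_add_ref,
fcmul_add_stripmined and the vlmax parameter are hypothetical names, and
model_vsetvl is a simplified imitation of how vsetvli grants up to VLMAX
elements per iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vsetvli: grants at most vlmax elements
 * out of the remaining count (simplified model, an assumption). */
static size_t model_vsetvl(size_t remaining, size_t vlmax)
{
    return remaining < vlmax ? remaining : vlmax;
}

/* Scalar reference: sum += t * c over len interleaved complex floats. */
static void fcmul_add_ref(float *sum, const float *t, const float *c,
                          size_t len)
{
    for (size_t n = 0; n < len; n++) {
        float tre = t[2 * n], tim = t[2 * n + 1];
        float cre = c[2 * n], cim = c[2 * n + 1];

        sum[2 * n]     += tre * cre - tim * cim;
        sum[2 * n + 1] += tre * cim + tim * cre;
    }
}

/* Strip-mined variant: each outer iteration handles as many complex
 * elements as the modelled hardware grants, whatever vlmax is. */
static void fcmul_add_stripmined(float *sum, const float *t, const float *c,
                                 size_t len, size_t vlmax)
{
    assert(vlmax > 0);

    while (len > 0) {
        size_t vl = model_vsetvl(len, vlmax);

        /* Inner loop stands in for one batch of vector instructions. */
        for (size_t i = 0; i < vl; i++) {
            float tre = t[2 * i], tim = t[2 * i + 1];
            float cre = c[2 * i], cim = c[2 * i + 1];

            sum[2 * i]     += tre * cre - tim * cim;
            sum[2 * i + 1] += tre * cim + tim * cre;
        }

        /* Advance by vl complex elements (2 floats each). */
        sum += 2 * vl;
        t   += 2 * vl;
        c   += 2 * vl;
        len -= vl;
    }
}
```

Because the loop requests the remaining length on every iteration, the same
code produces the same results for any vlmax; only the iteration count changes.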
On Tuesday, 26 September 2023, 21:40:12 EEST, Paul B Mahol wrote:
> On Tue, Sep 26, 2023 at 8:35 PM Rémi Denis-Courmont wrote:
> > On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> > > benchmark:
> > > fcmul_add_c: 19.7
> > > fcmul_add_rvv_f32: 6.7
> >
> > Nit: please pad mne
On Tue, Sep 26, 2023 at 8:35 PM Rémi Denis-Courmont wrote:
> On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> > benchmark:
> > fcmul_add_c: 19.7
> > fcmul_add_rvv_f32: 6.7
>
> Nit: please pad mnemonics to at least 8 columns for consistency.
>
> I'm a bit surprised that the perfo
On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7
Nit: please pad mnemonics to at least 8 columns for consistency.
I'm a bit surprised that the performance improves this much, considering that
the C910 is notoriously bad at
benchmark:
fcmul_add_c: 19.7
fcmul_add_rvv_f32: 6.7
From 6bef2523728a472bb803ce085a1aafdfd624e212 Mon Sep 17 00:00:00 2001
From: h
Date: Tue, 26 Sep 2023 15:03:12 +0800
Subject: [PATCH] af_afir: RISC-V V fcmul_add
fcmul_add_c: 19.7
fcmul_add_rvv_f32: 6.7
---
libavfilter/af_afirdsp.h | 3