Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-07 Thread flow gg
Hello, I have received the K230 and installed Debian following your method, so I have added K230 benchmarks to the patch in this reply.

k230
vc1dsp.vc1_inv_trans_4x4_dc_c: 125.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 53.5
vc1dsp.vc1_inv_trans_4x8_dc_c: 230.7
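For reference, the one complete C/RVV pair in this excerpt (4x4 DC) works out as follows, assuming these are benchmark timings where lower is better:

```latex
\text{speedup}_{4\times4,\ \mathrm{K230}} = \frac{125.7}{53.5} \approx 2.35
```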

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread flow gg
> FWIW CanMV-K230 boards are on sale for under 500 RMB.

I have just paid for one. (I saw you mention on IRC that you're going to write about K230 + Debian. Looking forward to it.)

Rémi Denis-Courmont wrote on Wednesday, 6 December 2023 at 04:11:
> On Tuesday, 5 December 2023, 21:25:12 EET, flow gg wrote:
> > > This

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread flow gg
I'm sorry for my carelessness. I used to build and run everything manually, but I have switched to a script, and I accidentally missed this error. I will fix the script to avoid this kind of issue in the future.

libavcodec/riscv/vc1dsp_rvv.S:35: Error: improper CSRxI immediate
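The message doesn't say what line 35 of vc1dsp_rvv.S contained. For background only: this diagnostic appears to come from the GNU assembler's check on 5-bit unsigned immediates, such as the operand of the CSRxI forms (csrrwi/csrrsi/csrrci and aliases like csrwi), where only 0 to 31 can be encoded. The sketch below is a hypothetical encoder (`encode_csrrwi` is not part of any toolchain) showing where that limit comes from:

```c
#include <stdint.h>

/* Hypothetical encoder for "csrrwi rd, csr, uimm" ("csrwi csr, uimm" is
 * the rd = x0 alias). The immediate sits in the 5-bit rs1 field, so only
 * 0..31 can be encoded; anything else is rejected, as an assembler must.
 * Returns 0 and stores the instruction word on success, -1 otherwise. */
static int encode_csrrwi(unsigned rd, unsigned csr, long uimm, uint32_t *out)
{
    if (uimm < 0 || uimm > 31 || rd > 31 || csr > 0xFFF)
        return -1;
    *out = ((uint32_t)csr << 20)   /* csr[11:0]  -> bits 31:20 */
         | ((uint32_t)uimm << 15)  /* zimm[4:0]  -> bits 19:15 */
         | (5u << 12)              /* funct3 = 101 (CSRRWI) */
         | ((uint32_t)rd << 7)     /* rd         -> bits 11:7 */
         | 0x73u;                  /* SYSTEM major opcode */
    return 0;
}
```

For example, `csrwi vxrm, 0` (vxrm is CSR 0x00A) encodes as 0x00A05073, while an immediate of 32 has no encoding at all.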

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread Rémi Denis-Courmont
On Tuesday, 5 December 2023, 21:25:12 EET, flow gg wrote:
> > This block can be folded into the next. You don't need to check VLENB
> > twice.
>
> Changed.
>
> > Instruction scheduling could be better, especially on in-order CPUs.
>
> I put the vload at the front, and then proceeded with

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread flow gg
> This block can be folded into the next. You don't need to check VLENB twice.

Changed.

> Instruction scheduling could be better, especially on in-order CPUs.

I put the vload at the front and then did the t2 operation, but I'm not sure...

> You don't need to reset the AVL here,
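Folding the two init blocks so that VLENB is tested only once might look like the following. This is a stand-alone sketch: the flag macros, `SketchDSP` and the stub kernels are hypothetical stand-ins rather than FFmpeg's real identifiers, and the 16-byte VLENB minimum is an assumption. The i32/i64 split mirrors the `_rvv_i32`/`_rvv_i64` suffixes in the benchmarks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPU flags; not FFmpeg's names. */
#define SKETCH_FLAG_RVV_I32 (1 << 0)
#define SKETCH_FLAG_RVV_I64 (1 << 1)

typedef void (*dc_fn)(uint8_t *dest, ptrdiff_t stride, int16_t *block);

typedef struct SketchDSP {
    dc_fn inv_trans_4x4_dc; /* needs 32-bit vector elements */
    dc_fn inv_trans_8x4_dc; /* assumed to need 64-bit vector elements */
} SketchDSP;

/* Placeholder kernels so the sketch links; they do nothing. */
static void rvv_i32_stub(uint8_t *dest, ptrdiff_t stride, int16_t *block)
{
    (void)dest; (void)stride; (void)block;
}

static void rvv_i64_stub(uint8_t *dest, ptrdiff_t stride, int16_t *block)
{
    (void)dest; (void)stride; (void)block;
}

/* VLENB is tested once; the I64 assignment is nested in the same block
 * instead of living in a second block that repeats the VLENB check. */
static void sketch_dsp_init(SketchDSP *c, int flags, size_t vlenb)
{
    if ((flags & SKETCH_FLAG_RVV_I32) && vlenb >= 16) {
        c->inv_trans_4x4_dc = rvv_i32_stub;
        if (flags & SKETCH_FLAG_RVV_I64)
            c->inv_trans_8x4_dc = rvv_i64_stub;
    }
}
```

With a too-small VLENB neither pointer is touched, and without the I64 flag only the I32 kernel is installed.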

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-05 Thread Rémi Denis-Courmont
Hi,

> diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
> index 2d0e6c19c8..442c5961ea 100644
> --- a/libavcodec/riscv/Makefile
> +++ b/libavcodec/riscv/Makefile
> @@ -39,5 +39,7 @@ OBJS-$(CONFIG_PIXBLOCKDSP) += riscv/pixblockdsp_init.o \
> RVV-OBJS-$(CONFIG_PIXBLOCKDSP)

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread flow gg
Okay, using zext lets me remove two vset instructions, which is better than splatting. I have updated the patch in this reply.

Rémi Denis-Courmont wrote on Monday, 4 December 2023 at 23:15:
> On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote:
> > > Probably missing VLENB checks.
> >
> > Changed.
> >
> > > You can
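The per-pixel work behind these DC-only kernels can be modelled in scalar C, assuming (as FFmpeg's C reference appears to do) that every pixel gets the same DC term added and is then clipped to 0..255. In RVV the widening step would plausibly be `vzext.vf2`, presumably the zext referred to above. `add_dc_row` and `clip_uint8` are hypothetical names for this sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar model of one row: zero-extend each uint8 pixel to 16 bits
 * (the job vzext.vf2 would do in RVV), add the DC term, then clip the
 * result back down to 8 bits. */
static void add_dc_row(uint8_t *row, size_t width, int dc)
{
    for (size_t i = 0; i < width; i++) {
        int16_t wide = (int16_t)row[i]; /* zero-extended: 0..255 */
        row[i] = clip_uint8(wide + dc);
    }
}
```

Because the source pixels are unsigned, zero-extension (not sign-extension) is the correct widening here.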

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread flow gg
I found that in the nosplat version one more vset can be removed, with essentially the same timing, so I have updated the patch.

Rémi Denis-Courmont wrote on Monday, 4 December 2023 at 23:15:
> On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote:
> > > Probably missing VLENB checks.
> >
> >

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread Rémi Denis-Courmont
On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote:
> > Probably missing VLENB checks.
>
> Changed.
>
> > You can multiply by 3, 5 or 9 with shift-and-add. By 12 with shift-and-add
> > then shift, and by 17 with shift then add. You don't need multiplications.
>
> Changed.
>
> >
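The shift-and-add suggestion can be checked in scalar C: 3x = 2x + x, 5x = 4x + x, 9x = 8x + x, 12x = (2x + x) << 2, and 17x = 16x + x. On scalar RISC-V code, Zba's `sh1add`/`sh2add`/`sh3add` cover the 3/5/9 cases in one instruction each; in vector code each case is a shift plus an add. The helper names are hypothetical, and shifts go through `uint32_t` because left-shifting a negative signed value is undefined in C:

```c
#include <stdint.h>

/* Left shift via uint32_t to avoid UB on negative inputs; converting
 * back assumes the usual two's-complement behaviour for int32_t. */
static int32_t shl(int32_t x, unsigned n)
{
    return (int32_t)((uint32_t)x << n);
}

static int32_t mul3(int32_t x)  { return shl(x, 1) + x; }   /* 2x + x   */
static int32_t mul5(int32_t x)  { return shl(x, 2) + x; }   /* 4x + x   */
static int32_t mul9(int32_t x)  { return shl(x, 3) + x; }   /* 8x + x   */
static int32_t mul12(int32_t x) { return shl(mul3(x), 2); } /* (3x) * 4 */
static int32_t mul17(int32_t x) { return shl(x, 4) + x; }   /* 16x + x  */
```

Each helper matches the plain multiplication for any input whose product fits in 32 bits, including negative DC values.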

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-04 Thread flow gg
> Probably missing VLENB checks.

Changed.

> You can multiply by 3, 5 or 9 with shift-and-add. By 12 with shift-and-add
> then shift, and by 17 with shift then add. You don't need multiplications.

Changed.

> Do you really need to splat? Can't .vx or .wx be used instead?

Okay, for example in
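The splat question comes down to whether a scalar is first broadcast into a vector register (`vmv.v.x`, then a `.vv` operation) or consumed directly by a `.vx` form such as `vadd.vx` (`.wx` forms like `vwadd.wx` do the same for widening operations). The scalar C model below, with hypothetical function names, shows that both give identical results while the `.vx` style skips the broadcast and the temporary:

```c
#include <stddef.h>
#include <stdint.h>

/* "Splat" style: broadcast the scalar into a temporary vector first
 * (vmv.v.x in RVV), then do a vector-vector add (vadd.vv). */
static void add_splat(int16_t *v, size_t n, int16_t s, int16_t *tmp)
{
    for (size_t i = 0; i < n; i++)
        tmp[i] = s;                       /* the extra broadcast step */
    for (size_t i = 0; i < n; i++)
        v[i] = (int16_t)(v[i] + tmp[i]);
}

/* ".vx" style: the scalar is used directly (vadd.vx), so there is no
 * temporary vector and no broadcast instruction. */
static void add_vx(int16_t *v, size_t n, int16_t s)
{
    for (size_t i = 0; i < n; i++)
        v[i] = (int16_t)(v[i] + s);
}
```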

Re: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-03 Thread Rémi Denis-Courmont
On Sunday, 3 December 2023, 16:40:08 EET, flow gg wrote:
> c910
> vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0
> vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0
> vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2
> vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5
> vc1dsp.vc1_inv_trans_8x4_dc_c: 129.0
>

[FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V inv_trans

2023-12-03 Thread flow gg
c910
vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0
vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5
vc1dsp.vc1_inv_trans_8x4_dc_c: 129.0
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 75.7
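The C910 figures translate into speedups as C time divided by RVV time, assuming lower figures are better as in checkasm's benchmark output. The struct and helper are illustrative only:

```c
/* C910 figures from the benchmark above (lower is better). */
typedef struct BenchPair {
    const char *name;
    double c;    /* C reference */
    double rvv;  /* RVV version */
} BenchPair;

static const BenchPair c910[] = {
    { "vc1_inv_trans_4x4_dc",  84.0, 74.0 },
    { "vc1_inv_trans_4x8_dc", 150.2, 83.5 },
    { "vc1_inv_trans_8x4_dc", 129.0, 75.7 },
};

/* Speedup = C time / RVV time; a value above 1 means RVV is faster. */
static double speedup(const BenchPair *p)
{
    return p->c / p->rvv;
}
```

This gives roughly 1.14× for 4x4, 1.80× for 4x8 and 1.70× for 8x4, so the larger blocks benefit most.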