Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-04-29 Thread flow gg
Happy to see you back :)

Rémi Denis-Courmont wrote on Mon, Apr 29, 2024 at 02:06:
> On Sunday, 7 April 2024, 8.38.54 EEST, flow gg wrote:
> > ping
>
> I have been away for a while, and catching up takes time, sorry.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
>

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-04-28 Thread Rémi Denis-Courmont
On Sunday, 7 April 2024, 8.38.54 EEST, flow gg wrote:
> ping

I have been away for a while, and catching up takes time, sorry.

--
レミ・デニ-クールモン
http://www.remlab.net/
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-04-06 Thread flow gg
ping

flow gg wrote on Fri, Mar 8, 2024 at 17:46:
> Alright, using m8, but for now don't add code to address dependencies in
> loops that have a minor impact. Updated in the reply
>
> Rémi Denis-Courmont wrote on Fri, Mar 8, 2024 at 17:08:
>
>> On 8 March 2024 02:45:46 GMT+02:00, flow gg wrote:
>> >> Isn't it

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-03-08 Thread flow gg
Alright, using m8, but for now don't add code to address dependencies in loops that have a minor impact. Updated in the reply

Rémi Denis-Courmont wrote on Fri, Mar 8, 2024 at 17:08:
> On 8 March 2024 02:45:46 GMT+02:00, flow gg wrote:
> >> Isn't it also faster to max LMUL for the adds here?
> >

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-03-08 Thread Rémi Denis-Courmont
On 8 March 2024 02:45:46 GMT+02:00, flow gg wrote:
>> Isn't it also faster to max LMUL for the adds here?
>
>It requires the use of one more vset, making the time slightly longer:
>147.7 (m1), 148.7 (m8 + vset).

A variation of 0.6% on a single set of kernels will end up below measurement
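
For reference, the relative gap between the two timings quoted above can be checked with a few lines of Python. This is a minimal sketch and not part of the original thread: the helper name `relative_change` is hypothetical, and the two figures are taken as posted, in whatever unit the benchmark reported them.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Return the change from baseline to candidate as a percentage of baseline."""
    return (candidate - baseline) / baseline * 100.0


# Timings as quoted in the thread (unit not stated in the snippet).
m1_time = 147.7        # adds done with LMUL = m1
m8_vset_time = 148.7   # adds done with LMUL = m8 plus one extra vset

slowdown = relative_change(m1_time, m8_vset_time)
print(f"m8 + vset vs. m1: {slowdown:+.2f}%")
```

The result lands well under one percent, which is the order of magnitude the reply refers to.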

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-03-07 Thread flow gg
> Isn't it also faster to max LMUL for the adds here?

It requires the use of one more vset, making the time slightly longer: 147.7 (m1), 148.7 (m8 + vset).

Also this might not be much noticeable on C908, but avoiding sequential dependencies on the address registers may help. I mean, avoid using

Re: [FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-03-07 Thread Rémi Denis-Courmont
On Saturday, 2 March 2024, 14.06.13 EET, flow gg wrote:
> Here adjusting the order, rather than simply using .rept, will be 13%-24%
> faster.

Isn't it also faster to max LMUL for the adds here?

Also this might not be very noticeable on C908, but avoiding sequential dependencies on the

[FFmpeg-devel] [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels

2024-03-02 Thread flow gg
Here adjusting the order, rather than simply using .rept, will be 13%-24% faster.

From 07aa3e2eff0fe1660ac82dec5d06d50fa4c433a4 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 28 Feb 2024 16:32:39 +0800
Subject: [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels