I want to update the VP9 bilin load, just like you did with VP8, but it
seems that this patch ([PATCH v2 1/5] lavc/vp9dsp: R-V V mc avg) was merged
from the previous version rather than from the current update here, so
the subsequent patches will have conflicts.
flow gg wrote on Wed, May 22, 2024 at 01:15:
Unfortunately I only test to obtain benchmarks and basic correctness. I
always feel the need for a professional to write the tests.
Rémi Denis-Courmont wrote on Thu, May 23, 2024 at 04:35:
>
>
> On 22 May 2024 23:28:54 GMT+03:00, "Rémi Denis-Courmont"
> wrote:
> >This removes one stray LI and reworks the
Reordered some here.
wrote on Wed, May 22, 2024 at 03:24:
> From: sunyuechi
>
> C908 X60
> avg_8_2x2_c: 1.0 1.0
> avg_8_2x2_rvv_i32: 0.7 0.7
> avg_8_2x4_c
Do macro definitions also need commas? I noticed that much of my old code
and SiFive's code doesn't have commas.
Rémi Denis-Courmont wrote on Wed, May 22, 2024 at 02:29:
> On Tuesday, 21 May 2024, 20:13:16 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
>
> > diff --git
> I would expect that you can get better performance by interleaving scalar
> and vector stuff, and possibly also vector loads and vector arithmetic.
Okay, I will try
> These labels lead to nowhere? If you actually mean to implicitly fall
> through to the next function, you can use the function name
> Please put commas between operands.
> This should probably be ff_avg_vp9 or something slightly more specific.
Updated here.
wrote on Wed, May 22, 2024 at 01:14:
> From: sunyuechi
>
> C908:
> vp9_avg4_8bpp_c: 1.2
> vp9_avg4_8bpp_rvv_i64: 1.0
> vp9_avg8_8bpp_c: 3.7
> vp9_avg8_8bpp_rvv_i64: 1.5
>
> Please put commas between operands.
Okay
> This should probably be ff_avg_vp9 or something slightly more specific.
Is it necessary here? Many macros in the C file are copied from MIPS, where
it is called ff_avg4_msa. Here, it has been simply changed to ff_avg4_rvv.
Rémi Denis-Courmont
There are three unused lines which I forgot to delete before submitting. I
have updated them here.
wrote on Tue, May 21, 2024 at 15:47:
> From: sunyuechi
>
> C908 X60
> avg_8_2x2_c: 1.0 1.0
>
To obtain test results, the if (w == h) check in
tests/checkasm/vvc_mc.c needs to be commented out.
Because vset needs to be used inside the loop, I wrote a cumbersome
vset macro by hand.
wrote on Tue, May 21, 2024 at 15:38:
> From: sunyuechi
>
> C908 X60
>
fix .irp use
wrote on Sun, May 19, 2024 at 16:18:
> From: sunyuechi
>
> C908:
> vp8_put_epel4_h4v4_c: 20.0
> vp8_put_epel4_h4v4_rvv_i32: 11.0
> vp8_put_epel4_h4v6_c: 25.2
> vp8_put_epel4_h4v6_rvv_i32: 13.5
> vp8_put_epel4_h6v4_c: 22.2
> vp8_put_epel4_h6v4_rvv_i32: 14.5
> vp8_put_epel4_h6v6_c: 29.0
>
fixed in v4
Rémi Denis-Courmont wrote on Sat, May 18, 2024 at 23:56:
> On Monday, 13 May 2024, 19:59:23 EEST, u...@foxmail.com
> wrote:
> > From: sunyuechi
> >
> > C908:
> > vp9_avg_bilin_4h_8bpp_c: 5.2
> > vp9_avg_bilin_4h_8bpp_rvv_i64: 2.2
> > vp9_avg_bilin_4v_8bpp_c: 5.5
> >
Fixed issues with .irp and comma, as well as the ifc issue (same
modifications as previously done for vp8).
wrote on Sun, May 19, 2024 at 02:16:
> From: sunyuechi
>
> C908:
> vp9_avg4_8bpp_c: 1.2
> vp9_avg4_8bpp_rvv_i64: 1.0
> vp9_avg8_8bpp_c: 3.7
> vp9_avg8_8bpp_rvv_i64: 1.5
> vp9_avg16_8bpp_c: 14.7
>
yeah, updated it in the reply
Rémi Denis-Courmont wrote on Fri, May 17, 2024 at 23:11:
> On Monday, 13 May 2024, 19:59:22 EEST, u...@foxmail.com
> wrote:
> > From: sunyuechi
> >
> > C908:
> > vp9_avg4_8bpp_c: 1.2
> > vp9_avg4_8bpp_rvv_i64: 1.0
> > vp9_avg8_8bpp_c: 3.7
> >
Is the test result missing here?
Rémi Denis-Courmont wrote on Thu, May 16, 2024 at 01:11:
> ---
> libavcodec/riscv/Makefile| 1 +
> libavcodec/riscv/h264dsp_init.c | 5
> libavcodec/riscv/startcode_rvv.S | 44
> libavcodec/riscv/vc1dsp_init.c | 16
updated for clean code
wrote on Wed, May 15, 2024 at 11:56:
> From: sunyuechi
>
> C908:
> vp9_tm_4x4_8bpp_c: 116.5
> vp9_tm_4x4_8bpp_rvv_i32: 43.5
> vp9_tm_8x8_8bpp_c: 416.2
> vp9_tm_8x8_8bpp_rvv_i32: 86.0
> vp9_tm_16x16_8bpp_c: 1665.5
> vp9_tm_16x16_8bpp_rvv_i32: 187.2
> vp9_tm_32x32_8bpp_c: 6974.2
>
in the reply
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 02:08:
> On Tuesday, 14 May 2024, 20:57:17 EEST, flow gg wrote:
> > Why is it unnecessary to reset the vector configuration every time? I
> > think it is necessary to reset e16/e8 each time.
>
> I misread the p
Why is it unnecessary to reset the vector configuration every time? I think
it is necessary to reset e16/e8 each time.
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 01:46:
> On Monday, 13 May 2024, 19:59:21 EEST, u...@foxmail.com
> wrote:
> > From: sunyuechi
> >
> > C908:
> >
Okay, learned it
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 01:00:
> On Tuesday, 14 May 2024, 7:45:29 EEST, flow gg wrote:
> > I am locally using:
> > if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags &
> > AV_CPU_FLAG_RVB_ADDR)) {
>
Using this will give output `if (bpp == 8 && (flags & AV_CPU_FLAG_RVI)) {`
Did you comment out the MISALIGNED flag check but not add RVI, resulting in
no output?
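To make the difference between the checks quoted in this thread concrete, here is a minimal C sketch. The `FLAG_*` constants are hypothetical stand-ins for `AV_CPU_FLAG_RVI`, `AV_CPU_FLAG_RVB_ADDR` and `AV_CPU_FLAG_RV_MISALIGNED` (the bit values are placeholders, not FFmpeg's), and the `use_rvi_*` helper names are invented for illustration. It assumes a CPU that reports RVI but not the misaligned-access flag, in which case the third check never selects the `_rvi` functions, which would be consistent with seeing no output for them.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values: hypothetical stand-ins, not FFmpeg's definitions. */
enum {
    FLAG_RVI           = 1 << 0, /* stands in for AV_CPU_FLAG_RVI */
    FLAG_RVB_ADDR      = 1 << 1, /* stands in for AV_CPU_FLAG_RVB_ADDR */
    FLAG_RV_MISALIGNED = 1 << 2  /* stands in for AV_CPU_FLAG_RV_MISALIGNED */
};

/* Check used locally in one message: bpp == 8 and RVI. */
static bool use_rvi_plain(int bpp, int flags)
{
    return bpp == 8 && (flags & FLAG_RVI);
}

/* Check used locally in another message: bpp == 8, RVI and RVB_ADDR. */
static bool use_rvi_addr(int bpp, int flags)
{
    return bpp == 8 && (flags & FLAG_RVI) && (flags & FLAG_RVB_ADDR);
}

/* Check sent by email: bpp == 8 and the misaligned-access flag. */
static bool use_rvi_misaligned(int bpp, int flags)
{
    return bpp == 8 && (flags & FLAG_RV_MISALIGNED);
}
```

With `flags = FLAG_RVI` only, `use_rvi_plain(8, flags)` is true while `use_rvi_misaligned(8, flags)` is false, so only the first variant would install the optimised functions on such a CPU.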
Rémi Denis-Courmont wrote on Wed, May 15, 2024 at 01:02:
> On Tuesday, 14 May 2024, 7:44:55 EEST, flow gg wrote:
I am locally using:
if (bpp == 8 && (flags & AV_CPU_FLAG_RVI) && (flags &
AV_CPU_FLAG_RVB_ADDR)) {
this performs better on k230/banana_f3 than C.
For email, refer to [FFmpeg-devel] [PATCH 2/2] lavc/vp8dsp: restrict RVI
optimisations and change it to
if (bpp == 8 && (flags &
I am locally using:
if (bpp == 8 && (flags & AV_CPU_FLAG_RVI)) {
this performs better on k230/banana_f3 than C.
For email, refer to [FFmpeg-devel] [PATCH 2/2] lavc/vp8dsp: restrict RVI
optimisations and change it to
if (bpp == 8 && (flags & AV_CPU_FLAG_RV_MISALIGNED)) {
So no output, but I
just rebase
wrote on Tue, May 14, 2024 at 01:00:
> From: sunyuechi
>
> C908:
> vp9_vert_8x8_8bpp_c: 22.0
> vp9_vert_8x8_8bpp_rvi: 15.7
> vp9_vert_16x16_8bpp_c: 71.2
> vp9_vert_16x16_8bpp_rvi: 39.0
> vp9_vert_32x32_8bpp_c: 300.2
> vp9_vert_32x32_8bpp_rvi: 135.2
> ---
> libavcodec/riscv/Makefile| 1
It seems like it can't... Updated to use AV_CPU_FLAG_RV_MISALIGNED.
Rémi Denis-Courmont wrote on Sun, May 12, 2024 at 19:48:
> On Friday, 10 May 2024, 11:21:14 EEST, u...@foxmail.com
> wrote:
> > From: sunyuechi
> >
> > C908 X60
> >
> It should be possible to improve ordering to avoid immediate dependency
> from ADD to SD
Okay, updated it.
Additionally improved the mc-tap_64 on vlen>=256 and something
wrote on Sun, May 12, 2024 at 18:04:
> From: sunyuechi
>
> C908:
> vp9_vert_8x8_8bpp_c: 22.0
> vp9_vert_8x8_8bpp_rvi: 15.7
>
Wow, got it
Rémi Denis-Courmont wrote on Sat, May 11, 2024 at 22:39:
> On Monday, 6 May 2024, 6:38:01 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp8_put_pixels4_c: 78.0
> > vp8_put_pixels4_rvi: 33.7
> > vp8_put_pixels8_c: 278.0
> > vp8_put_pixels8_rvi: 55.0
> >
On banana_f3, further reducing the value of mf resulted in another
performance improvement. I think in the end we might need to use different
functions depending on vlen in init.
Rémi Denis-Courmont wrote on Sat, May 11, 2024 at 18:24:
> On Saturday, 11 May 2024, 13:02:02 EEST, flow gg wrote:
Okay, updated it in the reply
Rémi Denis-Courmont wrote on Fri, May 10, 2024 at 23:41:
> On Tuesday, 7 May 2024, 19:54:09 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp8_put_epel4_h4v4_c: 20.0
> > vp8_put_epel4_h4v4_rvv_i32: 11.0
> > vp8_put_epel4_h4v6_c: 25.2
> >
The patch `lavc/vp9dsp: R-V ipred vert` needs to add `#if HAVE_RV`. How
about I modify these `#if HAVE_RVV` indentations together in this patch?
Rémi Denis-Courmont wrote on Sat, May 11, 2024 at 00:39:
> ---
> libavcodec/riscv/vp9dsp_init.c | 50 +-
> 1 file changed, 25
:
> On Friday, 10 May 2024, 11:22:53 EEST, flow gg wrote:
> > Hi, I got BananaPi F3, made some fixes, updated in reply
>
> So... Does it benefit from halving the logical multiplier to process
> fixed-sized block as compared to C908, or can we stick to the same code r
Hi, I got BananaPi F3, made some fixes, updated in reply
Rémi Denis-Courmont wrote on Mon, May 6, 2024 at 03:26:
> On Sunday, 5 May 2024, 12:18:56 EEST, flow gg wrote:
> > > Does MF2 actually improve perfs over M1 here?
> >
> > The difference here seems very small,
> Do you gain much by unrolling all the way to 16x? Given that you have the
> counter value already in t0, it should not make much difference to just
> unroll 2x or maybe 4x and then loop.
I chose this simple method because I think the effect is about the same.
Do I need to change it?
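As an illustration of the reviewer's suggestion (unroll a few times, then loop on the counter), here is a C analogy. The assembly under review is not shown in this snippet, so everything below is an assumption: the row width of 8 bytes, the function names, and the requirement that `h` is a positive multiple of 4 are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference: copy one 8-byte row per loop iteration. */
static void copy8_rows(uint8_t *dst, const uint8_t *src, ptrdiff_t stride, int h)
{
    for (int y = 0; y < h; y++)
        memcpy(dst + y * stride, src + y * stride, 8);
}

/* Unrolled 4x, then looping on the remaining row count.
 * Assumes h is a positive multiple of 4. */
static void copy8_rows_unroll4(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int h)
{
    do {
        memcpy(dst,              src,              8);
        memcpy(dst + stride,     src + stride,     8);
        memcpy(dst + 2 * stride, src + 2 * stride, 8);
        memcpy(dst + 3 * stride, src + 3 * stride, 8);
        dst += 4 * stride;
        src += 4 * stride;
        h   -= 4;
    } while (h > 0);
}

/* Returns 1 if both variants produce identical output for 16 rows. */
static int unroll4_matches_reference(void)
{
    uint8_t src[16 * 8], a[16 * 8] = {0}, b[16 * 8] = {0};
    for (int i = 0; i < 16 * 8; i++)
        src[i] = (uint8_t)(i * 7);
    copy8_rows(a, src, 8, 16);
    copy8_rows_unroll4(b, src, 8, 16);
    return memcmp(a, b, sizeof(a)) == 0 && memcmp(b, src, sizeof(b)) == 0;
}
```

Both variants do the same work; the partially unrolled one trades a branch every four rows for a much smaller body than a full 16x unroll, which is the trade-off being discussed.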
> It
> h is not a number so that's not a valid condition.
Fixed two instances of this issue.
wrote on Wed, May 8, 2024 at 00:55:
> From: sunyuechi
>
> C908:
> vp8_put_bilin4_h_c: 367.0
> vp8_put_bilin4_h_rvv_i32: 137.7
> vp8_put_bilin4_v_c: 377.0
> vp8_put_bilin4_v_rvv_i32: 137.7
> vp8_put_bilin8_h_c: 1431.0
>
I didn't understand what you mean... What does judging whether the type is
'h' or 'v' have to do with the number?
Rémi Denis-Courmont wrote on Wed, May 8, 2024 at 00:00:
> On Monday, 6 May 2024, 6:38:02 EEST, u...@foxmail.com wrote:
> > From: sunyuechi
> >
> > C908:
> > vp8_put_bilin4_h_c:
Fixed issues similar to vp8
wrote on Tue, May 7, 2024 at 15:36:
> From: sunyuechi
>
> C908:
> vp9_vert_8x8_8bpp_c: 22.0
> vp9_vert_8x8_8bpp_rvi: 15.7
> vp9_vert_16x16_8bpp_c: 71.2
> vp9_vert_16x16_8bpp_rvi: 39.0
> vp9_vert_32x32_8bpp_c: 300.2
> vp9_vert_32x32_8bpp_rvi: 135.2
> ---
>
> IMO, passing a complete register name, if you really need to vary it,
> would be simpler and more flexible than an ABI register type prefix.
If the full register name is passed here, some require four parameters,
some require six parameters, and there is often repetition.
I feel it's easy to get
> Doesn't this effectively discard the last element, t5?
> Can't we skip the slide and just load the vector at a2+1? Also then, we
> can keep VL=len and halve the multiplier.
Yes, this is better. I remember that using slide1down was better when
testing the initial version, but now that has changed.
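The two approaches from the review comment can be modelled in scalar C. This is only a sketch: the two-tap weights `(8 - mx)` and `mx` with rounding `+ 4` and shift `>> 3` are an assumption chosen for illustration, `LEN` and the function names are hypothetical, and the arrays merely stand in for vector registers. The first variant loads `len + 1` elements and derives the neighbour by shifting down one element (modelling a slide-down by one); the second loads `len` elements twice, the second time from `src + 1`, so the vector length can stay at `len`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LEN = 8 }; /* hypothetical block width */

/* Variant 1: one load of len + 1 elements, then slide down by one. */
static void two_tap_slide(uint8_t *dst, const uint8_t *src, int len, int mx)
{
    uint8_t v0[LEN + 1], v1[LEN];
    memcpy(v0, src, (size_t)len + 1);  /* load len + 1 elements */
    memcpy(v1, v0 + 1, (size_t)len);   /* neighbour = register slid down by one */
    for (int x = 0; x < len; x++)
        dst[x] = (uint8_t)(((8 - mx) * v0[x] + mx * v1[x] + 4) >> 3);
}

/* Variant 2: two loads of len elements, the second from src + 1. */
static void two_tap_offset_load(uint8_t *dst, const uint8_t *src, int len, int mx)
{
    uint8_t v0[LEN], v1[LEN];
    memcpy(v0, src, (size_t)len);      /* load at src */
    memcpy(v1, src + 1, (size_t)len);  /* load at src + 1, no slide needed */
    for (int x = 0; x < len; x++)
        dst[x] = (uint8_t)(((8 - mx) * v0[x] + mx * v1[x] + 4) >> 3);
}

/* Returns 1 if both variants agree for every mx in 0..7. */
static int two_tap_variants_agree(void)
{
    const uint8_t src[LEN + 1] = { 0, 255, 17, 200, 3, 128, 64, 9, 250 };
    uint8_t a[LEN], b[LEN];
    for (int mx = 0; mx < 8; mx++) {
        two_tap_slide(a, src, LEN, mx);
        two_tap_offset_load(b, src, LEN, mx);
        if (memcmp(a, b, LEN) != 0)
            return 0;
    }
    return 1;
}
```

Both variants read the same `len + 1` source bytes in total; the difference is that the second never needs a register wider than `len` elements.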
I
Made these changes according to the previous review:
moved func into macro, added macro vset to reduce if else, used rvi,
supplemented __riscv_xlen
wrote on Mon, May 6, 2024 at 00:45:
> From: sunyuechi
>
> C908:
> vp8_put_pixels4_c: 78.0
> vp8_put_pixels4_rvi: 33.7
> vp8_put_pixels8_c: 278.0
>
> Is it not faster to compute the address ahead of time, e.g.:
> Ditto below and in other patches.
Yes, updated here, and I will check the other patches.
> Copying 64-bit quantities should not need RVV at all. Maybe the C version
> needs to be improved instead, but if that is not possible, then an RVI
the github link: https://github.com/hleft/FFmpeg/tree/vp9
wrote on Sat, May 4, 2024 at 23:03:
> From: sunyuechi
>
> C908:
> vp9_vert_8x8_8bpp_c: 22.0
> vp9_vert_8x8_8bpp_rvv_i64: 18.5
> vp9_vert_16x16_8bpp_c: 71.2
> vp9_vert_16x16_8bpp_rvv_i32: 50.7
> vp9_vert_32x32_8bpp_c: 300.2
>
I've reorganized it, and the github link is at :
https://github.com/hleft/FFmpeg/tree/vp8
wrote on Sat, May 4, 2024 at 22:49:
> From: sunyuechi
>
> C908:
> vp8_put_pixels4_c: 87.5
> vp8_put_pixels4_rvv_i32: 42.7
> vp8_put_pixels8_c: 284.5
> vp8_put_pixels8_rvv_i32: 77.7
> vp8_put_pixels16_c: 1087.7
>
Hi, it's me. I accidentally repeated it but it seems to be correct.
wrote on Sat, May 4, 2024 at 18:01:
> From: sunyuechi
>
> vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c: 869.7
> vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 148.7
> vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c: 220.5
>
I saw the discussion comparing email with GitLab/GitHub. I do not fully
understand their advantages and disadvantages, but I want to say that I
support the change to GitLab/GitHub.
Simple reason:
If git-send-email is required, I may not be able to submit any code
If you do not need to use
Sorry, this is because a 'bpp == 8' was missed. It has been fixed in this
link
Rémi Denis-Courmont wrote on Thu, May 2, 2024 at 22:11:
> On Tuesday, 30 April 2024, 2:36:22 EEST, flow gg wrote:
> > updated it in the reply and https://github.com/hleft/FFmpeg/tree/vp8vp9
>
> VP9 checkas
From 3e66b2bbe257cc91a4c2169362163e92aba6760b Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 30 Apr 2024 18:24:00 +0800
Subject: [PATCH 2/2] lavc/rv40dsp: R-V V chroma_mc
This is similar to h264, but here we use manual_avg instead of vaaddu
because rv40's OP differs from h264. If we use
From 07c0b8a26b76e31c46ecabddb251f317c48c73a3 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 30 Apr 2024 12:43:57 +0800
Subject: [PATCH 1/2] checkasm/rv40dsp: add chroma_mc test
This is similar to h264.
---
tests/checkasm/Makefile | 1 +
tests/checkasm/checkasm.c | 3 ++
Since the number of stores is controlled by a3 and not by zero, it doesn't
have to be exactly 16 bytes?
Rémi Denis-Courmont wrote on Tue, Apr 30, 2024 at 14:40:
>
>
> On 30 April 2024 03:26:25 GMT+03:00, flow gg
> wrote:
> >Hi, I initially used a loop, but according to lib
Since there is no 8x16, I no longer test 8x16, and I updated it in the reply
flow gg wrote on Mon, Apr 29, 2024 at 15:09:
>
>
From fc7c28cb78e0c90880f31c0b8d6f2fc16d0fe581 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 29 Apr 2024 14:18:23 +0800
Subject: [PATCH 1/2] checkasm/blockdsp: add fill_bloc
Since there is no 8x16, I changed m8 to m4, and updated it in the reply
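The sizes in this exchange follow from simple arithmetic: one vector register group spans VLENB × LMUL bytes. The sketch below checks the figures from the quoted message (VLENB of 16 with m8 gives 128 bytes) and the corresponding value for m4. The helper names are hypothetical, and the snippet makes no claim about which block sizes the actual functions handle.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes spanned by one vector register group: VLENB * LMUL. */
static size_t group_bytes(size_t vlenb, unsigned lmul)
{
    return vlenb * lmul;
}

/* Smallest integer LMUL (1, 2, 4 or 8) whose register group holds
 * `bytes` bytes; returns 0 if even LMUL = 8 is too small. */
static unsigned min_lmul(size_t vlenb, size_t bytes)
{
    for (unsigned lmul = 1; lmul <= 8; lmul *= 2)
        if (group_bytes(vlenb, lmul) >= bytes)
            return lmul;
    return 0;
}
```

For example, `group_bytes(16, 8)` is 128 and `group_bytes(16, 4)` is 64, so with a 16-byte VLENB a 64-byte fill fits in an m4 group while a 128-byte fill would need m8.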
flow gg wrote on Tue, Apr 30, 2024 at 08:26:
> Hi, I initially used a loop, but according to libavcodec/blockdsp.h,
>
> the maximum is 8x16 = 128 bytes, so using ff_get_rv_vlenb() >= 16 and m8
> does not
29 April 2024, 10:09:41 EEST, flow gg wrote:
> >
>
> Are you sure that this works with all vector lengths?
> The block8 code looks odd.
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/
> ___
> ffmpeg-devel mailing
updated it in the reply and https://github.com/hleft/FFmpeg/tree/vp8vp9
Rémi Denis-Courmont wrote on Tue, Apr 30, 2024 at 01:57:
> On Friday, 22 March 2024, 8:02:38 EEST, flow gg wrote:
> > Because the previous patch was updated, it was updated in this
> > response
>
> Seem
From 4315f4e4774e3006d7cc55b6d235cb80e0173cf9 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 6 Mar 2024 12:46:03 +0800
Subject: [PATCH 2/2] lavc/blockdsp: R-V V fill_block
C908:
blockdsp.fill_block_tab[0]_c: 550.0
blockdsp.fill_block_tab[0]_rvv_i64: 48.2
blockdsp.fill_block_tab[1]_c: 148.7
From 0c196a37cb4036d8c618c06c02a011b910cc56ce Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 29 Apr 2024 14:18:23 +0800
Subject: [PATCH 1/2] checkasm/blockdsp: add fill_block test
---
tests/checkasm/blockdsp.c | 32
1 file changed, 32 insertions(+)
diff
Happy to see you back :)
Rémi Denis-Courmont wrote on Mon, Apr 29, 2024 at 02:06:
> On Sunday, 7 April 2024, 8:38:54 EEST, flow gg wrote:
> > ping
>
> I have been away for a while, and catching up takes time, sorry.
>
> --
> レミ・デニ-クールモン
github link: https://github.com/hleft/FFmpeg/tree/vp8vp9
flow gg wrote on Sat, Apr 20, 2024 at 23:55:
>
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg
From cff79c9500b94f4c0abdd9cd68c91cc736366c78 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 20 Apr 2024 23:26:58 +0800
Subject: [PATCH 3/3] lavc/vp8dsp: R-V V loop_filter
C908:
vp8_loop_filter8uv_v_c: 745.5
vp8_loop_filter8uv_v_rvv_i32: 467.2
vp8_loop_filter16y_h_c: 674.2
From c033ab8d30135dc02b09b1747c0761baefdcbb4a Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 20 Apr 2024 23:13:07 +0800
Subject: [PATCH 2/3] lavc/vp8dsp: R-V V loop_filter_inner
C908:
vp8_loop_filter8uv_inner_v_c: 738.2
vp8_loop_filter8uv_inner_v_rvv_i32: 455.2
From 2f516e0236bd84d78ce6fd7e55c4b1a3c9d99baa Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 20 Apr 2024 23:32:10 +0800
Subject: [PATCH 1/3] lavc/vp8dsp: R-V V loop_filter_simple
C908:
vp8_loop_filter_simple_h_c: 416.0
vp8_loop_filter_simple_h_rvv_i32: 187.5
vp8_loop_filter_simple_v_c:
ping
flow gg wrote on Fri, Mar 8, 2024 at 17:46:
> Alright, using m8, but for now don't add code to address dependencies in
> loops that have a minor impact. Updated in the reply
>
> Rémi Denis-Courmont wrote on Fri, Mar 8, 2024 at 17:08:
>
>>
>>
>> On 8 March 2024 02:45:46 GMT+02:00,
Okay, updated it in the reply and github(
https://github.com/hleft/FFmpeg/tree/vp8vp9)
Rémi Denis-Courmont wrote on Thu, Apr 4, 2024 at 04:22:
> On Thursday, 28 March 2024, 4:44:33 EEST, flow gg wrote:
> > I don't quite understand, I think here 8x8 because zve64x is not suitable
:41 wrote:
> On Friday, 22 March 2024, 8:02:08 EET, flow gg wrote:
> > Using macros to shorten function definitions, updated in this response
>
> Did you try to share the common code after getdc and see how much slower
> it is? If an extra static branch has negligib
doesn't have enough)
Rémi Denis-Courmont wrote on Wed, Mar 27, 2024 at 23:36:
> On Friday, 22 March 2024, 8:01:21 EET, flow gg wrote:
> >
>
> IMO, you could just as well share the code and avoid most if's. Not like
> one additional `li a3, 1` per function call is going t
Alright, updated it in this reply
Rémi Denis-Courmont wrote on Wed, Mar 27, 2024 at 16:18:
> Hi,
>
> On 27 March 2024 04:37:02 GMT+02:00, flow gg
> wrote:
> >Okay, changed to use const, updated at this GitHub link (
> >https://github.com/hleft/FFmpeg/tree/vp8vp9)
>
> OK
Hi, here's the github link (https://github.com/hleft/FFmpeg/tree/vp8vp9)
Rémi Denis-Courmont wrote on Wed, Mar 27, 2024 at 02:30:
> Hi,
>
> On Friday, 22 March 2024, 8:12:41 EET, flow gg wrote:
> > It might be a bit inconvenient to find the patches related to vp8, vp9
>
Okay, changed to use const, updated at this GitHub link (
https://github.com/hleft/FFmpeg/tree/vp8vp9)
Rémi Denis-Courmont wrote on Wed, Mar 27, 2024 at 02:38:
> On Friday, 22 March 2024, 8:01:00 EET, flow gg wrote:
> > (This should be used after applying these 4 patches)
> >
>
It might be a bit inconvenient to find the patches related to vp8, vp9 that
were sent earlier. Here, I've placed them in a zip file in this reply
flow gg wrote on Fri, Mar 22, 2024 at 14:03:
> (This should be used after applying these patches)
>
> ```
> [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-
From 5d29de366bab4736b1e05e2167d976d344dd8c44 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 23:21:18 +0800
Subject: [PATCH 7/7] lavc/vp9dsp: R-V V mc tap hv
C908:
vp9_avg_8tap_smooth_4hv_8bpp_c: 32.2
vp9_avg_8tap_smooth_4hv_8bpp_rvv_i64: 15.2
vp9_avg_8tap_smooth_8hv_8bpp_c:
From 5df2835fd182378b78530e001669c65f3638946d Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 23:14:10 +0800
Subject: [PATCH 6/7] lavc/vp9dsp: R-V V mc bilin hv
C908:
vp9_avg_bilin_4hv_8bpp_c: 10.7
vp9_avg_bilin_4hv_8bpp_rvv_i64: 4.5
vp9_avg_bilin_8hv_8bpp_c: 38.7
From 94aacf6d1d49cc009669f89c91db71038a13285d Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 23:08:01 +0800
Subject: [PATCH 5/7] lavc/vp9dsp: R-V V mc tap v
C908:
vp9_avg_8tap_smooth_4v_8bpp_c: 13.7
vp9_avg_8tap_smooth_4v_8bpp_rvv_i64: 5.0
vp9_avg_8tap_smooth_8v_8bpp_c: 49.7
From eb004dcf5cc6a3c379cb6cb7b8592afa65626c5c Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 23:00:19 +0800
Subject: [PATCH 4/7] lavc/vp9dsp: R-V V mc bilin v
C908:
vp9_avg_bilin_4v_8bpp_c: 5.5
vp9_avg_bilin_4v_8bpp_rvv_i64: 2.2
vp9_avg_bilin_8v_8bpp_c: 20.7
The order of some instructions appears imperfect because, when len==32, the
registers for operations like hv can only just suffice, making it difficult
to adjust.
It's possible to create a separate function for len<32, but it likely won't
have a significant impact, so this hasn't been done yet.
From 7ad03f4bc70e4c334d8e52dce2ea2b6f09a9a244 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 22:11:26 +0800
Subject: [PATCH 2/7] lavc/vp9dsp: R-V V mc bilin h
C908:
vp9_avg_bilin_4h_8bpp_c: 5.5
vp9_avg_bilin_4h_8bpp_rvv_i64: 2.5
vp9_avg_bilin_8h_8bpp_c: 19.7
(This should be used after applying these patches)
```
[FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc
1-4
```
From ea81872215165ff859a0b5b2e003c5c678ea8ed0 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 22:01:18 +0800
Subject: [PATCH 1/7] lavc/vp9dsp: R-V mc copy_avg
Because the previous patch was updated, this patch is updated in this response
flow gg wrote on Sun, Mar 3, 2024 at 10:01:
> Due to the PATCH 1/4 update, updates are made here.
>
> flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>>
>>
From 9561d35be25c330a0be3a371269289ce21f5ada3 Mon Sep 17 00:00:00 20
Because the previous patch was updated, this patch is updated in this response
flow gg wrote on Sun, Mar 3, 2024 at 10:01:
>
>
> flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>>
>>
From a4672687a10a49702623449e8569d68913e91346 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 21:39:50 +
Because the previous patch was updated, this patch is updated in this response
flow gg wrote on Sun, Mar 3, 2024 at 10:01:
> Due to the PATCH 1/4 update, updates here.
>
> flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>>
>>
From 6feb148e9167e1f0cc6d8a0e9ca701d61222c03e Mon Sep 17 00:00:00 2001
From:
Using macros to shorten function definitions, updated in this response
flow gg wrote on Thu, Mar 7, 2024 at 19:20:
> updated it in the reply
>
> flow gg wrote on Sun, Mar 3, 2024 at 23:31:
>
>> > As noted earlier, I don't understand why you have two size parameters.
>> It seems tha
From 278e473681eddaf24977e47c88f715620105c6b3 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 17:50:58 +0800
Subject: [PATCH 3/3] lavc/vp8dsp: R-V V put_epel hv
C908:
vp8_put_epel4_h4v4_c: 20.0
vp8_put_epel4_h4v4_rvv_i32: 11.0
vp8_put_epel4_h4v6_c: 25.2
From a59509c554a319f8271ad4175da40788445f7a56 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 17:49:54 +0800
Subject: [PATCH 2/3] lavc/vp8dsp: R-V V put_epel v
C908:
vp8_put_epel4_v4_c: 11.0
vp8_put_epel4_v4_rvv_i32: 5.0
vp8_put_epel4_v6_c: 16.5
vp8_put_epel4_v6_rvv_i32: 6.2
(This should be used after applying these 4 patches)
```
[FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_vp8_pixels
[FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V put_bilin_h
1-3
```
From 201274b32ef49fdeb6782498634ed78491a9519a Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 9 Mar 2024
ping
flow gg wrote on Sun, Mar 3, 2024 at 23:03:
> Sorry, since I did not send the emails all at once, the 4 patches cannot
> all be applied together with git am *.patch. Instead, first apply the
> patch with 'git am '[PATCH] lavc/vp8dsp: R-V V put_vp8_pixels'', and then
> apply the
Alright, using m8, but for now don't add code to address dependencies in
loops that have a minor impact. Updated in the reply
Rémi Denis-Courmont wrote on Fri, Mar 8, 2024 at 17:08:
>
>
> On 8 March 2024 02:45:46 GMT+02:00, flow gg
> wrote:
> >> Isn't it also faster to max L
14:06:13 EET, flow gg wrote:
> > Here adjusting the order, rather than simply using .rept, will be 13%-24%
> > faster.
>
> Isn't it also faster to max LMUL for the adds here?
>
> Also this might not be much noticeable on C908, but avoiding sequential
> dependencies
updated it in the reply
flow gg wrote on Sun, Mar 3, 2024 at 23:31:
> > As noted earlier, I don't understand why you have two size parameters.
> It seems that \size is always either the same as (1 << (\size2 - 1)) a.k.a.
> ((1 << \size2) / 2), or unused. The asse
larly, you can use \restore as a truth value directly: `.if \restore`.
Okay
FWIW, it seems that you could just as well include func/endfunc inside the
macros.
Do you mean to generate func/endfunc using macros?
Rémi Denis-Courmont wrote on Sun, Mar 3, 2024 at 22:46:
> On Sunday, 3 March 2024, 3.59.0
-Courmont wrote on Sun, Mar 3, 2024 at 22:39:
> On Friday, 23 February 2024, 16:45:46 EET, flow gg wrote:
> >
>
> Looks like this needs rebasing, or otherwise does not apply.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/
>
>
>
> ___
Due to the PATCH 1/4 update, updates are made here.
flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>
From d7aa14940f52b627baf0ae4905e8af6038dc16fc Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 2 Mar 2024 09:35:22 +0800
Subject: [PATCH 4/4] lavc/vp9dsp: R-V V ipred tm
C908:
vp9_tm_4x4_8bpp_c:
flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>
From 006dcbe723592a3653bceb0d7f8cc3004e05cb05 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 2 Mar 2024 08:35:39 +0800
Subject: [PATCH 3/4] lavc/vp9dsp: R-V V ipred hor
C908:
vp9_hor_8x8_8bpp_c: 74.7
vp9_hor_8x8_8bpp_rvv_i32: 35.7
vp9_hor_16x16_
Due to the PATCH 1/4 update, updates here.
flow gg wrote on Sat, Mar 2, 2024 at 15:42:
>
>
From ed44215bff4cbf0372cd04f87f45a6ba25274564 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 1 Mar 2024 18:38:43 +0800
Subject: [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert
C908:
vp9_vert_8x8_8bpp_c
Updated with a small improvement in this reply
flow gg wrote on Sat, Mar 2, 2024 at 17:48:
> Okay, reduced if/else in the response.
>
> Rémi Denis-Courmont wrote on Sat, Mar 2, 2024 at 17:03:
>
>> On Saturday, 2 March 2024, 9:42:06 EET, flow gg wrote:
>> >
>>
>> You would
Here adjusting the order, rather than simply using .rept, will be 13%-24%
faster.
From 07aa3e2eff0fe1660ac82dec5d06d50fa4c433a4 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Wed, 28 Feb 2024 16:32:39 +0800
Subject: [PATCH 2/2] lavc/vc1dsp: R-V V mspel_pixels
From efcb91959cb373145f2fc9fcbfcc6659610172cc Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 1 Mar 2024 19:45:53 +0800
Subject: [PATCH 1/2] checkasm/vc1dsp: add mspel_pixels test
---
tests/checkasm/vc1dsp.c | 37 +
1 file changed, 37 insertions(+)
diff
Okay, reduced if/else in the response.
Rémi Denis-Courmont wrote on Sat, Mar 2, 2024 at 17:03:
> On Saturday, 2 March 2024, 9:42:06 EET, flow gg wrote:
> >
>
> You would need a lot fewer if/else if you passed the order/bit-width
> instead of the size as macro parameter.
From 3128765d298f5a44fd13be7b3da2ef88c96083f9 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 2 Mar 2024 09:35:22 +0800
Subject: [PATCH 4/4] lavc/vp9dsp: R-V V ipred tm
C908:
vp9_tm_4x4_8bpp_c: 116.5
vp9_tm_4x4_8bpp_rvv_i32: 43.5
vp9_tm_8x8_8bpp_c: 416.2
vp9_tm_8x8_8bpp_rvv_i32: 86.0
From 173072b33d3237b924f3fa342e20558d96a72457 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Sat, 2 Mar 2024 08:35:39 +0800
Subject: [PATCH 3/4] lavc/vp9dsp: R-V V ipred hor
C908:
vp9_hor_8x8_8bpp_c: 74.7
vp9_hor_8x8_8bpp_rvv_i32: 35.7
vp9_hor_16x16_8bpp_c: 175.5
vp9_hor_16x16_8bpp_rvv_i32:
From 7abd262daa281cee412a905ea75a5f10dd0b1fbe Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 1 Mar 2024 18:38:43 +0800
Subject: [PATCH 2/4] lavc/vp9dsp: R-V V ipred vert
C908:
vp9_vert_8x8_8bpp_c: 22.0
vp9_vert_8x8_8bpp_rvv_i64: 18.5
vp9_vert_16x16_8bpp_c: 71.2
vp9_vert_16x16_8bpp_rvv_i32:
From adaae06a3e18bccec1772a3134334cbea652ae77 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 26 Feb 2024 14:42:17 +0800
Subject: [PATCH 1/4] lavc/vp9dsp: R-V V ipred dc
C908:
vp9_dc_8x8_8bpp_c: 46.0
vp9_dc_8x8_8bpp_rvv_i64: 41.0
vp9_dc_16x16_8bpp_c: 109.2
vp9_dc_16x16_8bpp_rvv_i32: 72.7
please ignore this, updated in "[FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: R-V
V ipred dc"
flow gg wrote on Tue, Feb 27, 2024 at 00:19:
>
>
Found some problems.. I'll come back to modify this later. (to prevent
wasting time on this now)