Okay, I've updated it in this reply.
On Wed, 17 Jan 2024 at 02:04, Rémi Denis-Courmont wrote:
> +vsetvli t0, a2, e8, m2, tu, ma
> +vle8.v v0, (a0)
> +sub a2, a2, t0
> +vsetvli zero, t0, e16, m4, tu, ma
> +vle16.v v8, (a1)
> +vsetvli
On Sunday, 7 January 2024, 10:36:23 EET, flow gg wrote:
> Alright, I learned a bit more, so should we not consider the internal
> implementation?
You asked what the reason was for your counter-intuitive observations, and I
provided a plausible hypothesis. Nothing more, nothing less.
+vsetvli t0, a2, e8, m2, tu, ma
+vle8.v v0, (a0)
+sub a2, a2, t0
+vsetvli zero, t0, e16, m4, tu, ma
+vle16.v v8, (a1)
+vsetvli zero, t0, e8, m2, tu, ma
+vwsub.wv v16, v8, v0
+vsetvli zero,
Alright, I learned a bit more, so should we not consider the internal
implementation?
In this reply, I've included a version that uses one fewer vset.
On Sun, 7 Jan 2024 at 16:03, Rémi Denis-Courmont wrote:
> On Sunday, 7 January 2024, 3:33:39 EET, flow gg wrote:
> > I tested it, and indeed using vwsub
On Sunday, 7 January 2024, 3:33:39 EET, flow gg wrote:
> I tested it, and indeed using vwsub is faster. Updated it in the reply.
>
> ---
>
> I have a question: if I tweak the load order slightly so that it uses one
> fewer vset, it becomes slower (the patch I submitted is 13.2, if I make the
I tested it, and indeed using vwsub is faster. Updated it in the reply.
---
I have a question: if I tweak the load order slightly so that it uses one
fewer vset, it becomes slower (the patch I submitted benchmarks at 13.2; with
the following change, it takes 15.2).
But I thought it would be faster.
On Friday, 5 January 2024, 2:56:18 EET, flow gg wrote:
> One vset can be eliminated, but vwsub should not be used in this case. I
> modified it in this reply.
Fair enough, but are you sure that that's faster than keeping the vsetvli and
removing the sign extension?
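To make the trade-off concrete: with SEW=8, `vwsub.wv vd, vs2, vs1` subtracts the sign-extended 8-bit elements of `vs1` from the 16-bit elements of `vs2` in one instruction, whereas the other option sign-extends first (`vsext.vf2`) and then subtracts at SEW=16. The C sketch below models one element of each variant and checks that they agree bit for bit; the helper names are hypothetical, and it models the arithmetic only, not timing on any core.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate bit 7 into bits 8..15, as vsext.vf2 does for each element. */
static uint16_t sext8_to_16(uint8_t bits)
{
    return (bits & 0x80u) ? (uint16_t)(0xFF00u | bits) : (uint16_t)bits;
}

/* Hypothetical model of vsext.vf2 followed by vsub.vv at SEW=16:
 * sign-extend first, then subtract; the result wraps modulo 2^16. */
static uint16_t sub_after_sext(uint16_t wide_bits, uint8_t narrow_bits)
{
    return (uint16_t)(wide_bits - sext8_to_16(narrow_bits));
}

/* Hypothetical model of vwsub.wv at SEW=8: the 2*SEW operand minus the
 * sign-extended SEW operand in one step; also wraps modulo 2^16. */
static uint16_t vwsub_wv(int16_t wide, int8_t narrow)
{
    return (uint16_t)((int32_t)wide - (int32_t)narrow);
}

/* Compare both variants for one 16-bit operand against every int8 value. */
static void check_all_narrow(int16_t wide)
{
    for (int n = INT8_MIN; n <= INT8_MAX; n++)
        assert(sub_after_sext((uint16_t)wide, (uint8_t)n) ==
               vwsub_wv(wide, (int8_t)n));
}
```

Both variants wrap identically even at the int16 extremes, so the choice does not change results; whether dropping the sign extension is actually faster is a separate, hardware-dependent question, as the rest of the thread shows. In the patch, `vwsub.wv v16, v8, v0` computes pix2 minus pix1, which is harmless because the difference is squared afterwards.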
> Rémi Denis-Courmont
One vset can be eliminated, but vwsub should not be used in this case. I
modified it in this reply.
On Fri, 5 Jan 2024 at 00:00, Rémi Denis-Courmont wrote:
> On Saturday, 30 December 2023, 18:20:15 EET, flow gg wrote:
> > I mistook the vector length for the length of the vector register.
On Saturday, 30 December 2023, 18:20:15 EET, flow gg wrote:
> I mistook the vector length for the length of the vector register.
> I have modified it in this reply.
Setting element size to 8-bit is unnecessary, and a widening subtraction can
presumably avoid the sign
I mistook the vector length for the length of the vector register.
I have modified it in this reply.
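For reference, RVV defines VLMAX = LMUL × VLEN / SEW, and `vl` is an element count capped by VLMAX, not a register width in bits. Because e8/m2 and e16/m4 have the same SEW/LMUL ratio, they have the same VLMAX, which is what lets the count returned by the e8/m2 `vsetvli` be reused for e16/m4. The sketch below uses VLEN = 128 purely as an example value (the thread does not state the C908's VLEN), and `vlmax`/`model_vsetvli` are hypothetical helpers.

```c
#include <assert.h>

/* VLMAX = LMUL * VLEN / SEW: the maximum number of elements (not bits)
 * a vector instruction operates on. Integer LMUL only (1, 2, 4, 8). */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    assert(sew_bits != 0 && vlen_bits % sew_bits == 0);
    return lmul * vlen_bits / sew_bits;
}

/* Simplified vsetvli: vl = min(AVL, VLMAX). The specification also lets an
 * implementation pick a smaller vl when VLMAX < AVL < 2 * VLMAX; this model
 * ignores that latitude. */
static unsigned model_vsetvli(unsigned avl, unsigned vlen_bits,
                              unsigned sew_bits, unsigned lmul)
{
    unsigned max = vlmax(vlen_bits, sew_bits, lmul);
    return avl < max ? avl : max;
}
```

With the example VLEN of 128 bits, vl tops out at 32 elements for both e8/m2 and e16/m4; doubling VLEN doubles both, so the two settings stay in step on any VLEN.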
On Sat, 30 Dec 2023 at 20:15, Rémi Denis-Courmont wrote:
>
>
> On 29 December 2023 12:57:20 GMT+01:00, flow gg wrote:
> >C908
> >ssd_int8_vs_int16_c: 207.7
> >ssd_int8_vs_int16_rvv_i32:
On 30 December 2023 15:00:53 GMT+01:00, flow gg wrote:
>> At a quick glance, it won't work if the input length is not a multiple of
>the vector length.
>
>Why?
You're not handling tails as far as I see.
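The usual way to handle tails is to strip-mine: request up to the remaining count from `vsetvli` on every pass and advance by whatever `vl` comes back, so the final pass simply processes fewer elements. The C sketch below models that loop shape with scalar code; `VLMAX_MODEL`, `ssd_ref` and `ssd_stripmined` are hypothetical names, and the reference is assumed to compute the sum of squared differences, as the function name suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pass element count standing in for VLMAX
 * (32 would correspond to e8/m2 with VLEN = 128). */
#define VLMAX_MODEL 32

/* Scalar reference: sum of squared differences of int8 and int16 inputs. */
static int ssd_ref(const int8_t *pix1, const int16_t *pix2, ptrdiff_t size)
{
    int score = 0;
    for (ptrdiff_t i = 0; i < size; i++) {
        int d = pix1[i] - pix2[i];
        score += d * d;
    }
    return score;
}

/* Strip-mined model: each pass takes vl = min(remaining, VLMAX_MODEL),
 * like "vsetvli t0, a2, ...", so sizes that are not multiples of
 * VLMAX_MODEL are covered by a shorter final pass. */
static int ssd_stripmined(const int8_t *pix1, const int16_t *pix2,
                          ptrdiff_t size)
{
    int32_t acc[VLMAX_MODEL] = { 0 };   /* one accumulator per lane */
    ptrdiff_t remaining = size;

    while (remaining > 0) {
        ptrdiff_t vl = remaining < VLMAX_MODEL ? remaining : VLMAX_MODEL;
        assert(vl > 0 && vl <= VLMAX_MODEL);
        for (ptrdiff_t i = 0; i < vl; i++) {   /* emulated vector body */
            int d = pix1[i] - pix2[i];
            acc[i] += d * d;                   /* lanes >= vl keep their value (tu) */
        }
        pix1 += vl;
        pix2 += vl;
        remaining -= vl;
    }

    int32_t sum = 0;                    /* final reduction across lanes */
    for (int i = 0; i < VLMAX_MODEL; i++)
        sum += acc[i];
    return sum;
}
```

Lengths such as 1024, 32*3 and 32*7 never exercise the short final pass when VLMAX is 32; testing a size like 33 or 100 does. Note that the accumulators must be left tail-undisturbed on the short pass, otherwise the lanes beyond `vl` would lose their partial sums.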
> I tried 1024, 32*3, 32*7 and all passed the test.
They're all multiples of the
On Sat, 30 Dec 2023 at 22:00, flow gg wrote:
> > At a quick glance, it won't work if the input length is not a multiple
> of the vector length.
>
> Why? I tried 1024, 32*3, 32*7 and all passed the test.
>
> > Also do you really need to extend accumulators to 32 bits?
>
> It won't overflow after the test is
> At a quick glance, it won't work if the input length is not a multiple of
the vector length.
Why? I tried 1024, 32*3, 32*7 and all passed the test.
> Also do you really need to extend accumulators to 32 bits?
It won't overflow after the test is changed, so it's not needed anymore.
I have
On 29 December 2023 12:57:20 GMT+01:00, flow gg wrote:
>C908
>ssd_int8_vs_int16_c: 207.7
>ssd_int8_vs_int16_rvv_i32: 28.0
At a quick glance, it won't work if the input length is not a multiple of the
vector length.
Also do you really need to extend accumulators to 32 bits?
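Whether 32-bit accumulators are needed depends on the value ranges the function actually receives, which the thread does not state. As a worked bound: if |pix1 − pix2| ≤ D, each term is at most D², so an accumulator with maximum value M absorbs at most floor(M / D²) worst-case terms. If pix1 spans the full int8 range and pix2 the full int16 range, D = 127 + 32768 = 32895 and D² = 1,082,081,025, so only one such term fits in an int32; with both inputs limited to int8, D = 255 and 33025 terms fit. The helper `max_safe_terms` below is hypothetical, and the ranges are illustrative rather than SVQ1's real ones.

```c
#include <assert.h>
#include <stdint.h>

/* Number of worst-case terms d*d (with |d| <= max_abs_diff) that fit into an
 * accumulator whose largest representable value is acc_max. */
static int64_t max_safe_terms(int64_t max_abs_diff, int64_t acc_max)
{
    assert(max_abs_diff > 0 && acc_max > 0);
    return acc_max / (max_abs_diff * max_abs_diff);
}
```

In a vectorised loop each lane only accumulates about size / vl terms, so the per-lane budget can be compared against that; the final reduction across lanes still has to fit the full sum.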
C908
ssd_int8_vs_int16_c: 207.7
ssd_int8_vs_int16_rvv_i32: 28.0
From 0fd1b7a34ab8794868d80233c35f70c8ad42b9fa Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Fri, 29 Dec 2023 13:27:31 +0800
Subject: [PATCH 3/3] lavc/svq1enc: R-V V ssd_int8_vs_int16
C908
ssd_int8_vs_int16_c: 207.7