Re: [FFmpeg-devel] [PATCH] avcodec/proresenc_anatoliy: change quantization scaling to floating point to utilize vectorization

2018-02-27 Thread Rostislav Pehlivanov
On 27 February 2018 at 21:22, David Murmann wrote:

>
> On 2/27/2018 9:58 PM, Hendrik Leppkes wrote:
> > On Tue, Feb 27, 2018 at 9:35 PM, David Murmann wrote:
> >> Quantization scaling seems to be a slight bottleneck;
> >> this change allows the compiler to more easily vectorize
> >> the loop. This improves total encoding performance in my
> >> tests by about 10-20%.
> >>
> >> Signed-off-by: David Murmann 
> >> ---
> >>   libavcodec/proresenc_anatoliy.c | 12 
> >>   1 file changed, 8 insertions(+), 4 deletions(-)
> >>
> [...]
> >> +for (j = 0; j < blocks_per_slice; j++) {
> >> +for (i = 0; i < 64; i++) {
> >> +block[i] = (float)in[(j << 6) + i] / (float)qmat[i];
> >> +}
> >> +
> >> +for (i = 1; i < 64; i++) {
> >> +int val = block[progressive_scan[i]];
> >>   if (val) {
> >>   encode_codeword(pb, run, run_to_cb[FFMIN(prev_run, 15)]);
> >
> > Usually, using float is best avoided. Did you test re-factoring the
> > loop structure without changing it to float?
>
> Yes, the vector instructions don't have integer division, AFAIK, and the
> compiler just generates a loop with idivs. This is quite a bit slower
> than converting to float, dividing and converting back, if the compiler
> uses vector instructions. In the general case this wouldn't be exact,
> but since the input values are int16 they should losslessly fit into
> float32. On platforms where this auto-vectorization fails this might
> actually be quite a bit slower, but I have not seen that in my tests
> (though I have only tested on x86_64).
>
> --
> David Murmann
>
> da...@btf.de
> Telefon +49 (0) 221 82008710
> Fax +49 (0) 221 82008799
>
> http://btf.de/
>
> --
> btf GmbH | Leyendeckerstr. 27, 50825 Köln | +49 (0) 221 82 00 87 10
> Geschäftsführer: Philipp Käßbohrer & Matthias Murmann | HR Köln | HRB 74707
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

No, you're going about it the wrong way. Floats should most definitely be
avoided in encoders/decoders. Non-deterministic output across platforms is a
smaller issue than how they can obliterate performance if compilers emit an
actual div instruction.

Instead, here's what you can do to make it even faster: replace the
division with a multiply + a shift. It keeps the output identical too. I've
just sent an old patch of mine (for a different but similar codec) that you can
work from: take the last bit of code there, run it at init to generate the
LUTs for all quantizers, and then multiply and shift by looking up the
tables you generate. Here's the link:
http://ffmpeg.org/pipermail/ffmpeg-devel/2018-February/225867.html
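As a rough illustration of the multiply-plus-shift idea, here is a sketch in C. It is not the code from the linked patch; it uses the textbook round-up reciprocal construction, with multiplier m = ceil(2^s / d) and s = 16 + ceil(log2 d). Writing m*d = 2^s + e with 0 <= e < d, the error term x*e / 2^s stays below 1 because x*e < 2^16 * 2^ceil(log2 d) = 2^s, so (x * m) >> s equals x / d exactly for every 0 <= x < 2^16, which covers the magnitude of any int16_t coefficient. The names RecipEntry, make_recip, build_recip_lut and div_recip are hypothetical; the sign is handled separately so the result truncates toward zero like C's `/`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-divisor entry: x / d == (x * mul) >> shift for 0 <= x < 2^16. */
typedef struct RecipEntry {
    uint32_t mul;
    uint32_t shift;
} RecipEntry;

/* Build the entry for a divisor 1 <= d <= 2^31 (computed once, e.g. at init). */
static RecipEntry make_recip(uint32_t d)
{
    RecipEntry r;
    uint32_t l = 0;
    while (((uint64_t)1 << l) < d)   /* l = ceil(log2(d)) */
        l++;
    r.shift = 16 + l;
    /* mul = ceil(2^shift / d); it never exceeds 2^17, so it fits in 32 bits */
    r.mul = (uint32_t)((((uint64_t)1 << r.shift) + d - 1) / d);
    return r;
}

/* Precompute one entry per coefficient position of a 64-entry quant matrix. */
static void build_recip_lut(const int *qmat, RecipEntry lut[64])
{
    for (int i = 0; i < 64; i++)
        lut[i] = make_recip((uint32_t)qmat[i]);
}

/* Divide an int16_t value, truncating toward zero like C's '/' operator. */
static int div_recip(int16_t x, RecipEntry r)
{
    uint32_t ax = x < 0 ? (uint32_t)(-(int32_t)x) : (uint32_t)x; /* |x| <= 32768 */
    uint32_t q  = (uint32_t)(((uint64_t)ax * r.mul) >> r.shift);
    return x < 0 ? -(int)q : (int)q;
}
```

In the encoder loop, `QSCALE(qmat, indp, val)` would then become something like `div_recip(val, lut[indp])`, with one `lut` per quantizer built at init. The 64-bit product keeps the sketch obviously correct; a tuned version would pick widths that suit the target's SIMD multiply instructions.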


Re: [FFmpeg-devel] [PATCH] avcodec/proresenc_anatoliy: change quantization scaling to floating point to utilize vectorization

2018-02-27 Thread David Murmann


On 2/27/2018 9:58 PM, Hendrik Leppkes wrote:
> On Tue, Feb 27, 2018 at 9:35 PM, David Murmann wrote:

>> Quantization scaling seems to be a slight bottleneck;
>> this change allows the compiler to more easily vectorize
>> the loop. This improves total encoding performance in my
>> tests by about 10-20%.
>>
>> Signed-off-by: David Murmann 
>> ---
>>   libavcodec/proresenc_anatoliy.c | 12 
>>   1 file changed, 8 insertions(+), 4 deletions(-)
>>
[...]
>> +for (j = 0; j < blocks_per_slice; j++) {
>> +for (i = 0; i < 64; i++) {
>> +block[i] = (float)in[(j << 6) + i] / (float)qmat[i];
>> +}
>> +
>> +for (i = 1; i < 64; i++) {
>> +int val = block[progressive_scan[i]];
>>   if (val) {
>>   encode_codeword(pb, run, run_to_cb[FFMIN(prev_run, 15)]);

>
> Usually, using float is best avoided. Did you test re-factoring the
> loop structure without changing it to float?

Yes, the vector instructions don't have integer division, AFAIK, and the
compiler just generates a loop with idivs. This is quite a bit slower
than converting to float, dividing and converting back, if the compiler
uses vector instructions. In the general case this wouldn't be exact,
but since the input values are int16 they should losslessly fit into
float32. On platforms where this auto-vectorization fails this might
actually be quite a bit slower, but I have not seen that in my tests
(though I have only tested on x86_64).
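A small C sketch of the exactness claim above: for int16_t inputs and positive integer divisors below 2^24, converting to float32, dividing, and truncating back gives the same result as C integer division. The reasoning: when a/b is not an integer it lies at least 1/b away from every integer, while the float32 rounding error is at most |a/b| * 2^-24, which is smaller than 1/b whenever |a| < 2^24, so rounding can never carry the quotient onto or across an integer. The function names are hypothetical, and the sketch assumes the build does not enable -ffast-math or -freciprocal-math, which allow the compiler to replace the division with a multiply by an approximate reciprocal. Whether the float loop actually vectorizes depends on the compiler and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the integer division the original QSCALE-based loop performs. */
static void quant_block_int(const int16_t *in, const int *qmat, int16_t *out)
{
    for (int i = 0; i < 64; i++)
        out[i] = (int16_t)(in[i] / qmat[i]);
}

/* Float variant in the style of the patch: divide in float32, then let the
 * float-to-integer conversion truncate toward zero. A simple loop of this
 * shape is a candidate for auto-vectorization. */
static void quant_block_float(const int16_t *in, const int *qmat, int16_t *out)
{
    for (int i = 0; i < 64; i++)
        out[i] = (int16_t)((float)in[i] / (float)qmat[i]);
}
```

Running both functions over every int16_t value is a cheap way to confirm the two paths agree for a given quantizer matrix.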

--
David Murmann



Re: [FFmpeg-devel] [PATCH] avcodec/proresenc_anatoliy: change quantization scaling to floating point to utilize vectorization

2018-02-27 Thread Hendrik Leppkes
On Tue, Feb 27, 2018 at 9:35 PM, David Murmann wrote:
> Quantization scaling seems to be a slight bottleneck;
> this change allows the compiler to more easily vectorize
> the loop. This improves total encoding performance in my
> tests by about 10-20%.
>
> Signed-off-by: David Murmann 
> ---
>  libavcodec/proresenc_anatoliy.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/libavcodec/proresenc_anatoliy.c b/libavcodec/proresenc_anatoliy.c
> index 0516066163..8b296f6f1b 100644
> --- a/libavcodec/proresenc_anatoliy.c
> +++ b/libavcodec/proresenc_anatoliy.c
> @@ -232,14 +232,18 @@ static const uint8_t lev_to_cb[10] = { 0x04, 0x0A, 0x05, 0x06, 0x04, 0x28,
>  static void encode_ac_coeffs(AVCodecContext *avctx, PutBitContext *pb,
>  int16_t *in, int blocks_per_slice, int *qmat)
>  {
> +int16_t block[64];
>  int prev_run = 4;
>  int prev_level = 2;
>   int run = 0, level, code, i, j;
> -for (i = 1; i < 64; i++) {
> -int indp = progressive_scan[i];
> -for (j = 0; j < blocks_per_slice; j++) {
> -int val = QSCALE(qmat, indp, in[(j << 6) + indp]);
> +for (j = 0; j < blocks_per_slice; j++) {
> +for (i = 0; i < 64; i++) {
> +block[i] = (float)in[(j << 6) + i] / (float)qmat[i];
> +}
> +
> +for (i = 1; i < 64; i++) {
> +int val = block[progressive_scan[i]];
>  if (val) {
>  encode_codeword(pb, run, run_to_cb[FFMIN(prev_run, 15)]);

Usually, using float is best avoided. Did you test re-factoring the
loop structure without changing it to float?
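One way to read the question above, sketched in C: split quantization into its own pass over contiguous memory, still using integer division, and keep the coding pass in the original order (scan position outer, block index inner, as in the removed lines of the diff). This alone does not remove the scalar integer division, since x86 SIMD has no integer divide; it only isolates the division so it can later be replaced (by float or by a multiply and shift) without touching the coding loop. The names quantize_slice and gather_ac are hypothetical; gather_ac merely collects the values that the real loop would hand to the run/level coder.

```c
#include <assert.h>
#include <stdint.h>

/* Pass 1: quantize every coefficient of every block in memory order. */
static void quantize_slice(const int16_t *in, int blocks_per_slice,
                           const int *qmat, int16_t *q)
{
    for (int j = 0; j < blocks_per_slice; j++)
        for (int i = 0; i < 64; i++)
            q[(j << 6) + i] = (int16_t)(in[(j << 6) + i] / qmat[i]);
}

/* Pass 2: visit AC coefficients in the same order as the original loop
 * (scan position outer, block inner) and collect them; a real encoder would
 * feed zero runs and nonzero levels to its entropy coder here instead. */
static int gather_ac(const int16_t *q, int blocks_per_slice,
                     const uint8_t *scan, int16_t *out)
{
    int n = 0;
    for (int i = 1; i < 64; i++)
        for (int j = 0; j < blocks_per_slice; j++)
            out[n++] = q[(j << 6) + scan[i]];
    return n;
}
```

Keeping the scan-outer, block-inner order in the second pass matters because it determines the order in which coefficients reach the bitstream.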

- Hendrik


[FFmpeg-devel] [PATCH] avcodec/proresenc_anatoliy: change quantization scaling to floating point to utilize vectorization

2018-02-27 Thread David Murmann

Quantization scaling seems to be a slight bottleneck;
this change allows the compiler to more easily vectorize
the loop. This improves total encoding performance in my
tests by about 10-20%.

Signed-off-by: David Murmann 
---
 libavcodec/proresenc_anatoliy.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libavcodec/proresenc_anatoliy.c b/libavcodec/proresenc_anatoliy.c
index 0516066163..8b296f6f1b 100644
--- a/libavcodec/proresenc_anatoliy.c
+++ b/libavcodec/proresenc_anatoliy.c
@@ -232,14 +232,18 @@ static const uint8_t lev_to_cb[10] = { 0x04, 0x0A, 0x05, 0x06, 0x04, 0x28,
 static void encode_ac_coeffs(AVCodecContext *avctx, PutBitContext *pb,
 int16_t *in, int blocks_per_slice, int *qmat)
 {
+int16_t block[64];
 int prev_run = 4;
 int prev_level = 2;
  int run = 0, level, code, i, j;
-for (i = 1; i < 64; i++) {
-int indp = progressive_scan[i];
-for (j = 0; j < blocks_per_slice; j++) {
-int val = QSCALE(qmat, indp, in[(j << 6) + indp]);
+for (j = 0; j < blocks_per_slice; j++) {
+for (i = 0; i < 64; i++) {
+block[i] = (float)in[(j << 6) + i] / (float)qmat[i];
+}
+
+for (i = 1; i < 64; i++) {
+int val = block[progressive_scan[i]];
 if (val) {
 encode_codeword(pb, run, run_to_cb[FFMIN(prev_run, 15)]);
-- 
2.16.2

