Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-28 Thread Rostislav Pehlivanov
On 28 April 2018 at 12:34, Carl Eugen Hoyos  wrote:

> 2018-04-28 13:11 GMT+02:00, Rostislav Pehlivanov :
>
> > I'd say just remove it, it'll bitrot otherwise since no one will use it.
>
> Not all mpeg decoders are bit-exact and the date of the functions
> indicate they are used for such decoders.
>
> Carl Eugen
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

Not all mpeg codecs are specified as being bitexact, so they'd be okay.
This isn't.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-28 Thread Carl Eugen Hoyos
2018-04-28 13:11 GMT+02:00, Rostislav Pehlivanov :

> I'd say just remove it, it'll bitrot otherwise since no one will use it.

Not all mpeg decoders are bit-exact and the date of the functions
indicate they are used for such decoders.

Carl Eugen
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-28 Thread Rostislav Pehlivanov
On 28 April 2018 at 10:42, Hendrik Leppkes  wrote:

> On Sat, Apr 28, 2018 at 10:44 AM, Jerome Borsboom
>  wrote:
> >> This patch is not correct.
> >>
> >> this code is not used if AV_CODEC_FLAG_BITEXACT is set, because it is
> >> not bit exact ...
> >>
> >> Also the case where the off by 1 error occurs is a rare corner case,
> >> Compared to the errors introduced by the IDCT this is not significant
> >>
> >> If you want to optimize the bit exact version, feel free to do so.
> >> It may be faster to use xor -1 before and after avg for this though
> >
> > Thank you for the review. VC-1 spec is defined bit exact, including the
> > inverse transform. This code is used by the VC-1 decoder and as I was
> under
> > the impression that the VC-1 decoder ought to be bit exact without any
> additional
> > command line options, I tried to resolve the issue.
> >
> > As this code is guarded by AV_CODEC_FLAG_BITEXACT, i agree that the
> > proposed solution is not appropriate. The underlying question remains
> though. Should
> > the VC-1 decoder fully conform to spec or do we allow small deviations?
>
> If the spec is bitexact, then our decoder should be by default as
> well. Perhaps such things should be flipped around and use bitexact by
> default unless the "fast" flag is specified.
>
> - Hendrik
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

I'd say just remove it, it'll bitrot otherwise since no one will use it.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-28 Thread Hendrik Leppkes
On Sat, Apr 28, 2018 at 10:44 AM, Jerome Borsboom
 wrote:
>> This patch is not correct.
>>
>> this code is not used if AV_CODEC_FLAG_BITEXACT is set, because it is
>> not bit exact ...
>>
>> Also the case where the off by 1 error occurs is a rare corner case,
>> Compared to the errors introduced by the IDCT this is not significant
>>
>> If you want to optimize the bit exact version, feel free to do so.
>> It may be faster to use xor -1 before and after avg for this though
>
> Thank you for the review. VC-1 spec is defined bit exact, including the
> inverse transform. This code is used by the VC-1 decoder and as I was under
> the impression that the VC-1 decoder ought to be bit exact without any 
> additional
> command line options, I tried to resolve the issue.
>
> As this code is guarded by AV_CODEC_FLAG_BITEXACT, i agree that the
> proposed solution is not appropriate. The underlying question remains though. 
> Should
> the VC-1 decoder fully conform to spec or do we allow small deviations?

If the spec is bitexact, then our decoder should be by default as
well. Perhaps such things should be flipped around and use bitexact by
default unless the "fast" flag is specified.

- Hendrik
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-28 Thread Michael Niedermayer
On Sat, Apr 28, 2018 at 10:44:54AM +0200, Jerome Borsboom wrote:
> > This patch is not correct.
> > 
> > this code is not used if AV_CODEC_FLAG_BITEXACT is set, because it is
> > not bit exact ...
> > 
> > Also the case where the off by 1 error occurs is a rare corner case,
> > Compared to the errors introduced by the IDCT this is not significant
> > 
> > If you want to optimize the bit exact version, feel free to do so.
> > It may be faster to use xor -1 before and after avg for this though
> 
> Thank you for the review. VC-1 spec is defined bit exact, including the
> inverse transform. This code is used by the VC-1 decoder and as I was under
> the impression that the VC-1 decoder ought to be bit exact without any 
> additional
> command line options, I tried to resolve the issue.

yes, thats a bug, this should not be used if the whole decoder is bitexactly 
defined


> 
> As this code is guarded by AV_CODEC_FLAG_BITEXACT, i agree that the
> proposed solution is not appropriate. The underlying question remains though. 
> Should
> the VC-1 decoder fully conform to spec or do we allow small deviations?

i think it should fully conform, this needs to be fixed
can you fix it so the decoder uses only bitexact functions ?
(i guess this should be easy by passing the appropriate flag to the init code,
 but didnt look if the API passes it as seperate argument of passes a context,
 with a context it may be cleanest to change the internal API ...)

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-28 Thread Jerome Borsboom
> This patch is not correct.
> 
> this code is not used if AV_CODEC_FLAG_BITEXACT is set, because it is
> not bit exact ...
> 
> Also the case where the off by 1 error occurs is a rare corner case,
> Compared to the errors introduced by the IDCT this is not significant
> 
> If you want to optimize the bit exact version, feel free to do so.
> It may be faster to use xor -1 before and after avg for this though

Thank you for the review. VC-1 spec is defined bit exact, including the
inverse transform. This code is used by the VC-1 decoder and as I was under
the impression that the VC-1 decoder ought to be bit exact without any 
additional
command line options, I tried to resolve the issue.

As this code is guarded by AV_CODEC_FLAG_BITEXACT, i agree that the
proposed solution is not appropriate. The underlying question remains though. 
Should
the VC-1 decoder fully conform to spec or do we allow small deviations?

In addition, the VC-1 fate tests put the -bitexact option after the input file 
(-i). I
could only get the bitexeact working if it was put before the input file. Thus, 
I
think the current vc-1 fate-tests do not use the bit exact version.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-27 Thread Michael Niedermayer
On Fri, Apr 27, 2018 at 04:47:13PM +0200, Jerome Borsboom wrote:
> The assembly optimized half pel interpolation in some cases rounds the
> interpolated value when no rounding is requested. The result is a off by one
> error when one of the pixel values is zero.
> 
> Signed-off-by: Jerome Borsboom 
> ---
> In the put_no_rnd_pixels functions, the psubusb instruction subtracts one 
> from each
> unsigned byte to correct for the rouding that the PAVGB instruction performs. 
> The psubusb
> instruction, however, uses saturation when the value does not fit in the 
> operand type,
> i.e. an unsigned byte. In this particular case, this means that when the 
> value of a pixel
> is 0, the psubusb instruction will return 0 instead of -1 as this value does 
> not fit in
> an unsigned byte and is saturated to 0. The result is that the interpolated 
> value is not
> corrected for the rounding that PAVGB performs and that the result will be 
> off by one.
> 
> The corrections below solved the issues for me, but I do not a lot of 
> experience in optimizing
> assembly. A good check for the correctness of the solution might be 
> advisable. Furthermore,
> I have not checked the other assembly, but there may be more cases where the 
> psubusb
> instruction does not provide the desired results. A good check by the 
> owner/maintainer of
> the assembly code might be appropriate.
> 
>  libavcodec/x86/hpeldsp.asm | 38 --
>  1 file changed, 32 insertions(+), 6 deletions(-)

This patch is not correct.

this code is not used if AV_CODEC_FLAG_BITEXACT is set, because it is
not bit exact ...

Also the case where the off by 1 error occurs is a rare corner case,
Compared to the errors introduced by the IDCT this is not significant

If you want to optimize the bit exact version, feel free to do so.
It may be faster to use xor -1 before and after avg for this though

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Into a blind darkness they enter who follow after the Ignorance,
they as if into a greater darkness enter who devote themselves
to the Knowledge alone. -- Isha Upanishad


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-27 Thread Henrik Gramner
On Fri, Apr 27, 2018 at 4:47 PM, Jerome Borsboom
 wrote:
> In the put_no_rnd_pixels functions, the psubusb instruction subtracts one 
> from each
> unsigned byte to correct for the rouding that the PAVGB instruction performs. 
> The psubusb
> instruction, however, uses saturation when the value does not fit in the 
> operand type,
> i.e. an unsigned byte. In this particular case, this means that when the 
> value of a pixel
> is 0, the psubusb instruction will return 0 instead of -1 as this value does 
> not fit in
> an unsigned byte and is saturated to 0. The result is that the interpolated 
> value is not
> corrected for the rounding that PAVGB performs and that the result will be 
> off by one.

Looks correct. You could put it in a macro though instead of
duplicating the code, e.g.

%macro PAVGB_NO_RND 4 ; dst/src1, src2, tmp, pb_1
pxor %3, %1, %2
pand %3, %4
PAVGB%1, %2
psubb%1, %3
%endmacro
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-27 Thread Carl Eugen Hoyos
2018-04-27 16:47 GMT+02:00, Jerome Borsboom :
> The assembly optimized half pel interpolation in some cases rounds the
> interpolated value when no rounding is requested. The result is a off by one
> error when one of the pixel values is zero.

See also:
http://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=92415510

Carl Eugen
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-27 Thread Jerome Borsboom
The assembly optimized half pel interpolation in some cases rounds the
interpolated value when no rounding is requested. The result is a off by one
error when one of the pixel values is zero.

Signed-off-by: Jerome Borsboom 
---
In the put_no_rnd_pixels functions, the psubusb instruction subtracts one from 
each
unsigned byte to correct for the rouding that the PAVGB instruction performs. 
The psubusb
instruction, however, uses saturation when the value does not fit in the 
operand type,
i.e. an unsigned byte. In this particular case, this means that when the value 
of a pixel
is 0, the psubusb instruction will return 0 instead of -1 as this value does 
not fit in
an unsigned byte and is saturated to 0. The result is that the interpolated 
value is not
corrected for the rounding that PAVGB performs and that the result will be off 
by one.

The corrections below solved the issues for me, but I do not a lot of 
experience in optimizing
assembly. A good check for the correctness of the solution might be advisable. 
Furthermore,
I have not checked the other assembly, but there may be more cases where the 
psubusb
instruction does not provide the desired results. A good check by the 
owner/maintainer of
the assembly code might be appropriate.

 libavcodec/x86/hpeldsp.asm | 38 --
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
index ce5d7a4e28..bae2ba9880 100644
--- a/libavcodec/x86/hpeldsp.asm
+++ b/libavcodec/x86/hpeldsp.asm
@@ -145,10 +145,16 @@ cglobal put_no_rnd_pixels8_x2, 4,5
 mova m1, [r1+1]
 mova m3, [r1+r2+1]
 add  r1, r4
-psubusb  m0, m6
-psubusb  m2, m6
+mova m4, m0
+pxor m4, m1
+pand m4, m6
 PAVGBm0, m1
+psubbm0, m4
+mova m4, m2
+pxor m4, m3
+pand m4, m6
 PAVGBm2, m3
+psubbm2, m4
 mova   [r0], m0
 mova[r0+r2], m2
 mova m0, [r1]
@@ -157,10 +163,16 @@ cglobal put_no_rnd_pixels8_x2, 4,5
 mova m3, [r1+r2+1]
 add  r0, r4
 add  r1, r4
-psubusb  m0, m6
-psubusb  m2, m6
+mova m4, m0
+pxor m4, m1
+pand m4, m6
 PAVGBm0, m1
+psubbm0, m4
+mova m4, m2
+pxor m4, m3
+pand m4, m6
 PAVGBm2, m3
+psubbm2, m4
 mova   [r0], m0
 mova[r0+r2], m2
 add  r0, r4
@@ -227,18 +239,32 @@ cglobal put_no_rnd_pixels8_y2, 4,5
 mova m1, [r1+r2]
 mova m2, [r1+r4]
 add  r1, r4
-psubusb  m1, m6
+mova m3, m0
+pxor m3, m1
+pand m3, m6
 PAVGBm0, m1
+psubbm0, m3
+mova m3, m1
+pxor m3, m2
+pand m3, m6
 PAVGBm1, m2
+psubbm1, m3
 mova[r0+r2], m0
 mova[r0+r4], m1
 mova m1, [r1+r2]
 mova m0, [r1+r4]
 add  r0, r4
 add  r1, r4
-psubusb  m1, m6
+mova m3, m2
+pxor m3, m1
+pand m3, m6
 PAVGBm2, m1
+psubbm2, m3
+mova m3, m1
+pxor m3, m0
+pand m3, m6
 PAVGBm1, m0
+psubbm1, m3
 mova[r0+r2], m2
 mova[r0+r4], m1
 add  r0, r4
-- 
2.13.6
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel