Re: [libav-devel] [FFmpeg-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-26 Thread Ronald S. Bultje
Hi,

On Mon, Dec 26, 2016 at 4:53 AM, Henrik Gramner <hen...@gramner.com> wrote:

> On Mon, Dec 26, 2016 at 2:32 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> > I know I'm terribly nitpicking here for the limited scope of the comment,
> > but this only matters for functions that have a return value. Do you think
> > it makes sense to allow functions to opt out of this requirement if they
> > explicitly state to not have a return value?
>
> An opt-out would only be relevant on 64-bit Windows when the following
> criteria are true for a function:
>
> * Reserves exactly 6 registers
> * Reserves stack space with the original stack pointer stored in a
> register (as opposed to the stack)
> * Requires >16 byte stack alignment (e.g. spilling ymm registers to the
> stack)
> * Does not have a return value
>
> If and only if all of those are true this would result in one register
> being unnecessarily saved (the cost of which would likely be hidden by
> OoE). On other systems than WIN64 or if any of the conditions above is
> false an opt-out doesn't make any sense.
>
> Considering how rare that corner case is in combination with how
> fairly insignificant the downside is I'm not sure it makes that much
> sense to complicate the x86inc API further with an opt-out just for
> that specific scenario.
>

Hm, OK, I think it affects unix64/x86-32 also when using 32-byte
alignment. We do use the stack pointer then. But let's ignore that for a
second, I think it's beside the point.

I think my hesitation comes from how I view x86inc.asm. There are two ways to
see it:
- it's a universal tool, like a compiler, to assist writing assembly
(combined with yasm/nasm as actual assembler);
or
- it's a local tool for ffmpeg/libav/x26[5], like libavutil/attributes.h,
to assist writing assembly.

If x86inc.asm were like a compiler, every micro-optimization, no matter the
benefit, would be important. If it were a local tool, we indeed wouldn't
care because ffmpeg spends most runtime for important use cases in other
areas. (There's obviously a grayscale in this black/white range that I'm
drawing out.) So having said that, patch is OK. If someone would later come
in to add something to take return value type (void vs. non-void) into
account, I would still find that helpful. :)

Ronald
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-25 Thread Ronald S. Bultje
Hi,

On Sun, Dec 25, 2016 at 2:24 PM, Henrik Gramner  wrote:

> When allocating stack space with an alignment requirement that is larger
> than the current stack alignment we need to store a copy of the original
> stack pointer in order to be able to restore it later.
>
> If we chose to use another register for this purpose we should not pick
> eax/rax since it can be overwritten as a return value.
> ---
>  libavutil/x86/x86inc.asm | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
> index b2e9c60..128ddc1 100644
> --- a/libavutil/x86/x86inc.asm
> +++ b/libavutil/x86/x86inc.asm
> @@ -385,7 +385,14 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
>  %ifnum %1
>  %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
>  %if %1 > 0
> +; Reserve an additional register for storing the original stack pointer, but avoid using
> +; eax/rax for this purpose since it can potentially get overwritten as a return value.
>  %assign regs_used (regs_used + 1)
> +%if ARCH_X86_64 && regs_used == 7
> +%assign regs_used 8
> +%elif ARCH_X86_64 == 0 && regs_used == 1
> +%assign regs_used 2
> +%endif
>  %endif
>  %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
>  ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)
> --
> 2.7.4


I know I'm terribly nitpicking here for the limited scope of the comment,
but this only matters for functions that have a return value. Do you think
it makes sense to allow functions to opt out of this requirement if they
explicitly state to not have a return value?
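
For reference, the register-count adjustment in the hunk above can be modeled in plain C. This is only a sketch of the added preprocessor logic, not x86inc itself: the function name `reserve_stack_pointer_reg` and the `arch_x86_64` flag are made up for illustration, and it assumes the stack pointer copy ends up in the last reserved register (index `regs_used - 1`), with r6 being rax on x86-64 and r0 being eax on x86-32.

```c
#include <assert.h>

/* Hypothetical model of the hunk added to x86inc.asm.
 * regs_used:   number of registers the function asked for
 * arch_x86_64: nonzero on x86-64, zero on x86-32
 * Returns the register count after reserving one more register
 * for the copy of the original stack pointer. */
static int reserve_stack_pointer_reg(int regs_used, int arch_x86_64)
{
    assert(regs_used >= 0);
    regs_used += 1;
    if (arch_x86_64 && regs_used == 7)
        regs_used = 8;   /* last register would be r6 (rax): move on to r7 */
    else if (!arch_x86_64 && regs_used == 1)
        regs_used = 2;   /* last register would be r0 (eax): move on to r1 */
    return regs_used;
}
```

Under those assumptions, a function asking for 6 registers on x86-64 ends up with 8, and one asking for 0 on x86-32 ends up with 2; every other count is just incremented by one.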

Ronald


Re: [libav-devel] [PATCH 2/3] vp9: Add bsf to recombine frames/superframes

2016-11-30 Thread Ronald S. Bultje
Hi,

On Wed, Nov 30, 2016 at 6:48 AM, Mark Thompson <s...@jkqxz.net> wrote:

> On 30/11/16 03:37, Ronald S. Bultje wrote:
> > Hi,
> >
> > On Mon, Nov 28, 2016 at 6:50 PM, Mark Thompson <s...@jkqxz.net> wrote:
> >
> >> ---
> >> Incomplete, but enough to work with the encoder in patch 3.
> >>
> >> Todo:
> >> * Superframe splitting.
> >>
> >
> > From what I understand, Anton ported the vp9 parser into a BSF, which
> > resolves this.
> >
> >
> >> * More options to control recombination.
> >> * Better error handling, especially for slightly broken streams.
> >>
> >>
> >>  libavcodec/Makefile            |   1 +
> >>  libavcodec/bitstream_filters.c |   1 +
> >>  libavcodec/vp9_recombine_bsf.c | 557 +++++
> >>  3 files changed, 559 insertions(+)
> >>  create mode 100644 libavcodec/vp9_recombine_bsf.c
> >
> >
> > Can you please import the one from ffmpeg?
> >
> > http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/vp9_superframe_bsf.c;h=b686adbe1673f564d252a30cff11c5895a9a3b55;hb=HEAD
>
> Sure, that's probably a good idea in any case to keep the projects in sync.
>
> Would people prefer that this filter only handles the reordering (with a
> different name, I guess), and then use reorder,merge as separate bsfs for
> the VAAPI case?


Personally I would say yes, obviously depending on this being technically
feasible.

Can you explain the reordering in the docs? (I'm still not 100% sure what
that means.)

Ronald


Re: [libav-devel] [PATCH 2/3] vp9: Add bsf to recombine frames/superframes

2016-11-29 Thread Ronald S. Bultje
Hi,

On Mon, Nov 28, 2016 at 6:50 PM, Mark Thompson  wrote:

> ---
> Incomplete, but enough to work with the encoder in patch 3.
>
> Todo:
> * Superframe splitting.
>

From what I understand, Anton ported the vp9 parser into a BSF, which
resolves this.


> * More options to control recombination.
> * Better error handling, especially for slightly broken streams.
>
>
>  libavcodec/Makefile            |   1 +
>  libavcodec/bitstream_filters.c |   1 +
>  libavcodec/vp9_recombine_bsf.c | 557 +++++
>  3 files changed, 559 insertions(+)
>  create mode 100644 libavcodec/vp9_recombine_bsf.c


Can you please import the one from ffmpeg?

http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/vp9_superframe_bsf.c;h=b686adbe1673f564d252a30cff11c5895a9a3b55;hb=HEAD

Ronald


Re: [libav-devel] [PATCH] checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately

2016-11-17 Thread Ronald S. Bultje
Hi,

On Thu, Nov 17, 2016 at 7:37 AM, Martin Storsjö <mar...@martin.st> wrote:

> On Thu, 17 Nov 2016, Ronald S. Bultje wrote:
>
>> Hi,
>>
>> On Mon, Nov 14, 2016 at 4:46 PM, Martin Storsjö <mar...@martin.st> wrote:
>>
>>> The dc-only mode is already checked to work correctly above, but this
>>> allows benchmarking this mode for performance tuning, and allows making
>>> sure that it actually is correctly hooked up.
>>> ---
>>>  tests/checkasm/vp9dsp.c | 6 ++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/tests/checkasm/vp9dsp.c b/tests/checkasm/vp9dsp.c
>>> index 690e0cf..b9d1c73 100644
>>> --- a/tests/checkasm/vp9dsp.c
>>> +++ b/tests/checkasm/vp9dsp.c
>>> @@ -297,6 +297,12 @@ static void check_itxfm(void)
>>>  }
>>>  bench_new(dst, sz * SIZEOF_PIXEL, coef, sz * sz);
>>>  }
>>> +if (txtp == 0 && tx != 4) {
>>> +if (check_func(dsp.itxfm_add[tx][txtp], "vp9_inv_%s_%dx%d_dc_add",
>>> +   txtp_types[txtp], sz, sz)) {
>>> +bench_new(dst, sz * SIZEOF_PIXEL, coef, 1);
>>> +}
>>> +}
>>>  }
>>>  }
>>>  report("itxfm");
>>> --
>>> 2.7.4
>>>
>>
>>
>> I had a different local modification that allows tuning all the relevant
>> sub-IDCTs, basically re-arranging the loops so check_func is inside the
>> sub-IDCT loop and we bench each sub-IDCT separately. That's more generic
>> and probably more useful.
>>
>
> Right, that's probably more useful. Would you care to finish that
> modification to get it upstreamed in either project?


Sure, no problem.

Ronald

Re: [libav-devel] [PATCH] checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately

2016-11-17 Thread Ronald S. Bultje
Hi,

On Mon, Nov 14, 2016 at 4:46 PM, Martin Storsjö  wrote:

> The dc-only mode is already checked to work correctly above, but this
> allows benchmarking this mode for performance tuning, and allows making
> sure that it actually is correctly hooked up.
> ---
>  tests/checkasm/vp9dsp.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/tests/checkasm/vp9dsp.c b/tests/checkasm/vp9dsp.c
> index 690e0cf..b9d1c73 100644
> --- a/tests/checkasm/vp9dsp.c
> +++ b/tests/checkasm/vp9dsp.c
> @@ -297,6 +297,12 @@ static void check_itxfm(void)
>  }
>  bench_new(dst, sz * SIZEOF_PIXEL, coef, sz * sz);
>  }
> +if (txtp == 0 && tx != 4) {
> +if (check_func(dsp.itxfm_add[tx][txtp], "vp9_inv_%s_%dx%d_dc_add",
> +   txtp_types[txtp], sz, sz)) {
> +bench_new(dst, sz * SIZEOF_PIXEL, coef, 1);
> +}
> +}
>  }
>  }
>  report("itxfm");
> --
> 2.7.4


I had a different local modification that allows tuning all the relevant
sub-IDCTs, basically re-arranging the loops so check_func is inside the
sub-IDCT loop and we bench each sub-IDCT separately. That's more generic
and probably more useful.

Ronald

Re: [libav-devel] [PATCH] checkasm: Add tests for vp8dsp

2016-07-09 Thread Ronald S. Bultje
Hi,

On Fri, Jul 8, 2016 at 3:00 PM, Martin Storsjö <mar...@martin.st> wrote:

> On Fri, 8 Jul 2016, Ronald S. Bultje wrote:
>
>> Hi,
>>
>> On Thu, Jul 7, 2016 at 7:08 PM, Janne Grunau <janne-li...@jannau.net> wrote:
>>
>>>> +#define SRC_BUF_STRIDE 32
>>>> +#define SRC_BUF_SIZE ((size + 5) * SRC_BUF_STRIDE)
>>>> +// The 2 * stride + 2 offset is necessary to avoid reading out of bounds,
>>>
>>> that sounds a little misleading. just stating that the mc sub pixel
>>> interpolation filter needs 2 previous pixels in either direction would
>>> be clearer. Feel free to ignore
>>
>>
>> It needs 2 before and 3 after, not 2+2.
>
> What I mostly refer to here is that it needs 2 before both horizontally
> and vertically, explaining the "+2 * SRC_BUF_STRIDE + 2".


Right, I misread Janne's comment, sorry about that.
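
To spell out the arithmetic behind the buffer macros quoted above, here is a minimal C sketch. It assumes a filter that reads 2 samples before and 3 samples after each output position, both horizontally and vertically, as discussed in this thread. `TAPS_BEFORE`, `TAPS_AFTER` and the helper functions are illustrative names, not part of checkasm.

```c
#include <assert.h>
#include <stddef.h>

#define SRC_BUF_STRIDE 32
#define TAPS_BEFORE    2   /* samples read before the block, per direction */
#define TAPS_AFTER     3   /* samples read after the block, per direction  */

/* Same value as SRC_BUF_SIZE in the patch: (size + 5) rows of one stride. */
static size_t src_buf_size(int size)
{
    assert(size > 0);
    return (size_t)(size + TAPS_BEFORE + TAPS_AFTER) * SRC_BUF_STRIDE;
}

/* Offset of the block's top-left sample: 2 * stride + 2. */
static size_t src_offset(void)
{
    return (size_t)TAPS_BEFORE * SRC_BUF_STRIDE + TAPS_BEFORE;
}

/* Returns 1 if every sample touched for a size x size block is in bounds. */
static int reads_in_bounds(int size)
{
    /* First sample read: 2 rows up and 2 columns left of the block start. */
    long first    = (long)src_offset()
                  - TAPS_BEFORE * SRC_BUF_STRIDE - TAPS_BEFORE;
    /* Rightmost column read within a row of the buffer. */
    long last_col = TAPS_BEFORE + (size - 1) + TAPS_AFTER;
    /* Last sample read: 3 rows below and 3 columns right of the block end. */
    long last     = (long)src_offset()
                  + (long)(size - 1 + TAPS_AFTER) * SRC_BUF_STRIDE
                  + (size - 1 + TAPS_AFTER);

    return first >= 0 && last_col < SRC_BUF_STRIDE
        && last < (long)src_buf_size(size);
}
```

The "+ 5" in the size covers the 2 + 3 extra rows, while the "2 * stride + 2" offset only has to account for the 2 samples needed before the block in each direction.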

Ronald

Re: [libav-devel] [PATCH] checkasm: Add tests for vp8dsp

2016-07-08 Thread Ronald S. Bultje
Hi,

On Thu, Jul 7, 2016 at 7:08 PM, Janne Grunau  wrote:

> > +#define SRC_BUF_STRIDE 32
> > +#define SRC_BUF_SIZE ((size + 5) * SRC_BUF_STRIDE)
> > +// The 2 * stride + 2 offset is necessary to avoid reading out of bounds,
>
> that sounds a little misleading. just stating that the mc sub pixel
> interpolation filter needs 2 previous pixels in either direction would
> be clearer. Feel free to ignore


It needs 2 before and 3 after, not 2+2.

Ronald


Re: [libav-devel] [PATCH 2/6] hevc: Separate adding residual to prediction from IDCT

2016-07-07 Thread Ronald S. Bultje
Hi,

On Thu, Jul 7, 2016 at 10:53 AM, Ronald S. Bultje <rsbul...@gmail.com>
wrote:

> Hi,
>
> On Thu, Jul 7, 2016 at 9:52 AM, Alexandra Hájková <alexandra.khirn...@gmail.com> wrote:
>
>> On Thu, Jul 7, 2016 at 1:53 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>> > On Thu, Jul 7, 2016 at 5:25 AM, Alexandra Hájková <alexandra.khirn...@gmail.com> wrote:
>> > > +s->hevcdsp.add_residual[log2_trafo_size - 2](dst, coeffs, stride);
>> >
>> > Won't this be slower since there's a memory store intermediate?
>> >
>> > (I know it's faster now because you don't have inverse transform simd, but
>> > you should fix that by writing inverse transform simd, not by splitting the
>> > transform and the add.)
>>
>> Separating adding residual from the transform seems to cause certain
>> slow down but is needed to separate dc from idct which is faster overall,
>> which I consider a good reason to do this.
>
>
> I'm not sure I understand why, could you elaborate on this?
>
>> Sure, simd IDCT is needed and I'm working on it.
>
>
> Great!
>

Btw I'm just noticing that all my comments apply to the ffmpeg codebase
also, so perhaps you should just ignore my comments and we can fix that
later on...

Ronald

Re: [libav-devel] [PATCH 2/6] hevc: Separate adding residual to prediction from IDCT

2016-07-07 Thread Ronald S. Bultje
Hi,

On Thu, Jul 7, 2016 at 9:52 AM, Alexandra Hájková <
alexandra.khirn...@gmail.com> wrote:

> On Thu, Jul 7, 2016 at 1:53 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> > On Thu, Jul 7, 2016 at 5:25 AM, Alexandra Hájková <alexandra.khirn...@gmail.com> wrote:
> > > +s->hevcdsp.add_residual[log2_trafo_size - 2](dst, coeffs, stride);
> >
> > Won't this be slower since there's a memory store intermediate?
> >
> > (I know it's faster now because you don't have inverse transform simd, but
> > you should fix that by writing inverse transform simd, not by splitting the
> > transform and the add.)
>
> Separating adding residual from the transform seems to cause certain
> slow down but is needed to separate dc from idct which is faster overall,
> which I consider a good reason to do this.


I'm not sure I understand why, could you elaborate on this?

> Sure, simd IDCT is needed and I'm working on it.


Great!

Ronald

Re: [libav-devel] [PATCH 2/6] hevc: Separate adding residual to prediction from IDCT

2016-07-07 Thread Ronald S. Bultje
Hi,

On Thu, Jul 7, 2016 at 5:25 AM, Alexandra Hájková <
alexandra.khirn...@gmail.com> wrote:

>  else if (lc->cu.pred_mode == MODE_INTRA && c_idx == 0 &&
>   log2_trafo_size == 2)
> -s->hevcdsp.transform_4x4_luma_add(dst, coeffs, stride);
> +s->hevcdsp.idct_4x4_luma(coeffs);
>

This is not an idct.


> +s->hevcdsp.add_residual[log2_trafo_size - 2](dst, coeffs, stride);


Won't this be slower since there's a memory store intermediate?

(I know it's faster now because you don't have inverse transform simd, but
you should fix that by writing inverse transform simd, not by splitting the
transform and the add.)
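
To make the "memory store intermediate" concrete, here is a toy C sketch of the two shapes being compared. It is not the HEVC code: `toy_transform_sample` is a stand-in that just halves each coefficient, and all `toy_*` names are invented for this illustration. The only point is that the split version writes residuals to the coefficient buffer and reads them back, while the fused version does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 4  /* block size used for this illustration */

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Stand-in for a real inverse transform: it just halves each coefficient.
 * This is NOT the HEVC transform, only a placeholder with the same shape. */
static int toy_transform_sample(int c)
{
    return c / 2;
}

/* Split version, step 1: transform in place, storing residuals to memory. */
static void toy_transform(int16_t *coeffs)
{
    for (int i = 0; i < N * N; i++)
        coeffs[i] = (int16_t)toy_transform_sample(coeffs[i]);
}

/* Split version, step 2: load the residuals again and add to prediction. */
static void toy_add_residual(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
{
    assert(stride >= N);
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            dst[y * stride + x] = clip_u8(dst[y * stride + x] + res[y * N + x]);
}

/* Fused version: the residual never goes through the coefficient buffer. */
static void toy_transform_add(uint8_t *dst, const int16_t *coeffs, ptrdiff_t stride)
{
    assert(stride >= N);
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            dst[y * stride + x] = clip_u8(dst[y * stride + x] +
                                          toy_transform_sample(coeffs[y * N + x]));
}

/* Returns 1 if both paths give the same pixels for a sample block. */
static int split_matches_fused(void)
{
    int16_t coeffs[N * N], tmp[N * N];
    uint8_t a[N * N], b[N * N];

    for (int i = 0; i < N * N; i++) {
        coeffs[i] = (int16_t)(i * 37 - 250);
        a[i] = b[i] = (uint8_t)(i * 16);
    }
    memcpy(tmp, coeffs, sizeof(tmp));
    toy_transform(tmp);
    toy_add_residual(a, tmp, N);
    toy_transform_add(b, coeffs, N);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Both paths produce the same pixels; the difference is only in how often the residual travels through memory, which is the cost I'm asking about.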

Ronald

Re: [libav-devel] [PATCH] vp8dsp: Clarify the first dimension of the mc function tables

2016-07-05 Thread Ronald S. Bultje
Hi,

one more thing...

On Tue, Jul 5, 2016 at 6:36 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:

> On Tue, Jul 5, 2016 at 5:58 AM, Martin Storsjö <mar...@martin.st> wrote:
>
>> On Fri, 1 Jul 2016, Martin Storsjö wrote:
>>
>>> Index 0 is 16x16, 1 is 8x8, 2 is 4x4.
>>
>>
This isn't accurate either - since we're nitpicking over technical
correctness. vp8_mc_func has a height argument, so height is not part of
the index, only width. index 0: w=16, index 1: w=8, index 2: w=4. Since
this is apparently dubious, it may help to add this (within braces or so)
as additional information to the documentation if you think it helps.

Ronald

Re: [libav-devel] [PATCH] vp8dsp: Clarify the first dimension of the mc function tables

2016-07-05 Thread Ronald S. Bultje
Hi,

On Tue, Jul 5, 2016 at 5:58 AM, Martin Storsjö  wrote:

> On Fri, 1 Jul 2016, Martin Storsjö wrote:
>
>> Index 0 is 16x16, 1 is 8x8, 2 is 4x4.
>> ---
>> libavcodec/vp8dsp.h | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/libavcodec/vp8dsp.h b/libavcodec/vp8dsp.h
>> index 4864cf7..b8a30f9 100644
>> --- a/libavcodec/vp8dsp.h
>> +++ b/libavcodec/vp8dsp.h
>> @@ -70,12 +70,12 @@ typedef struct VP8DSPContext {
>> void (*vp8_h_loop_filter_simple)(uint8_t *dst, ptrdiff_t stride, int flim);
>>
>> /**
>> - * first dimension: width>>3, height is assumed equal to width
>> + * first dimension: 2-(width>>3), height is assumed equal to width
>
> FWIW; 2-(width>>3) might give the right values, but isn't technically
> quite right either. The right expression would be 4-log2(width) (or
> 4-av_ctz(width)) - should we go with that instead?


Since it's documentation, I'd go with 4-log2(width).
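
For anyone wanting to check the two expressions against each other, here is a small C sketch. `log2_pow2`, `mc_index_shift` and `mc_index_log2` are names made up for this example; in the real tree av_ctz() would take the place of the loop.

```c
#include <assert.h>

/* Portable log2 for powers of two (what av_ctz() would give for them). */
static int log2_pow2(int width)
{
    int n = 0;
    assert(width > 0 && (width & (width - 1)) == 0);
    while (width > 1) {
        width >>= 1;
        n++;
    }
    return n;
}

/* Expression from the patch: 2-(width>>3). */
static int mc_index_shift(int width)
{
    return 2 - (width >> 3);
}

/* Expression suggested in the review: 4-log2(width). */
static int mc_index_log2(int width)
{
    return 4 - log2_pow2(width);
}
```

Both give index 0, 1, 2 for widths 16, 8, 4, so either describes the table correctly for the sizes that exist; they stop agreeing outside that range (for width 32 the first gives -2 and the second -1).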

Ronald

Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3

2016-05-26 Thread Ronald S. Bultje
Hi,

On Wed, May 25, 2016 at 7:38 PM, Kieran Kunhya <kier...@obe.tv> wrote:

> On Wed, 25 May 2016 at 19:37 Vittorio Giovara <vittorio.giov...@gmail.com> wrote:
>
> > On Wed, May 25, 2016 at 10:49 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> > > Hi,
> > >
> > > On Wed, May 25, 2016 at 9:54 AM, Luca Barbato <lu_z...@gentoo.org> wrote:
> > >
> > >> On 25/05/16 15:32, Ronald S. Bultje wrote:
> > >> > I agree, ARM 32bit results would be very interesting.
> > >>
> > >> The odroid images I have are arm64 only (and I'm still figuring out how
> > >> to set it up properly), if you have a fast arm32 to try you are welcome
> > >> to beat me at it =)
> > >
> > >
> > > Hm, sorry, x86-32 was the best I could do ;-). I can try to help by finding
> > > other decoders that became slower (on x86-32) maybe? I can also try to
> > > debug why decoders that became slower, actually are slower (on x86-32 - but
> > > then again you already did quite some work in that area). But I also kind
> > > of agree with Anton that x86-32 isn't exactly top priority (although
> > > Chromebooks...), arm32 is more interesting.
> >
> > Are there people processing dnxhd or prores on arm32?
> > If so, they should volunteer to run the benchmarks.
> >
>
> VLC on some tablets I guess.


I think this misses the point. I tested 3 decoders, and 2 were slower. It
looks like something between 10 and 100 decoders were changed. I could blow
this up and say that this suggests that there may, in fact, be a
performance problem that might significantly affect the majority of the
decoders on a significant subset of systems.

I'm not going to make that claim just yet because I feel like I don't have
enough data. But claiming that this only affects dnxhd/prores decoding on
VLC on "some tablets" misses the point. I prefer Luca's approach (thanks!)
of testing arm32, even if it takes a little bit of effort.

Ronald


Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3

2016-05-25 Thread Ronald S. Bultje
Hi,

On Wed, May 25, 2016 at 10:59 AM, Diego Biurrun <di...@biurrun.de> wrote:

> On Wed, May 25, 2016 at 10:49:12AM -0400, Ronald S. Bultje wrote:
> > It's not like this is to be committed tomorrow, right? So take your time,
> > not urgent...
>
> Ummm, no; it's overdue already and was slated for pushing (after far too
> many delays) a few days ago.  So if you want to chip in (which you are
> welcome to), you should do it sooner rather than later.


OK, so it seems Luca is still volunteering to do the arm32 stuff. So, what
can I best help with to "chip in"? I can help figuring out what change in
the bitstream reader makes it slower for these decoders that became slower
on x86-32, and alternatively I can try to test more decoders and find more
that became slower (on x86-32). Which of these is more useful?

Ronald


Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3

2016-05-25 Thread Ronald S. Bultje
Hi,

On Wed, May 25, 2016 at 9:54 AM, Luca Barbato <lu_z...@gentoo.org> wrote:

> On 25/05/16 15:32, Ronald S. Bultje wrote:
> > I agree, ARM 32bit results would be very interesting.
>
> The odroid images I have are arm64 only (and I'm still figuring out how
> to set it up properly), if you have a fast arm32 to try you are welcome
> to beat me at it =)


Hm, sorry, x86-32 was the best I could do ;-). I can try to help by finding
other decoders that became slower (on x86-32) maybe? I can also try to
debug why the decoders that became slower are actually slower (on x86-32 - but
then again you already did quite some work in that area). But I also kind
of agree with Anton that x86-32 isn't exactly top priority (although
Chromebooks...), arm32 is more interesting.

It's not like this is to be committed tomorrow, right? So take your time,
not urgent...

Ronald


Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3

2016-05-25 Thread Ronald S. Bultje
Hi,

On Wed, May 25, 2016 at 8:20 AM, Luca Barbato  wrote:

> On 25/05/16 12:31, Anton Khirnov wrote:
> > All the results quoted here seem to be from x86 and 32bit x86 is not
> > really all that relevant these days. Did anyone do any ARM tests?
>
> On power8 I'm seeing a 17% speedup for huffyuv, 10% speedup for prores,
> 4% speedup for dnxhd.
>
> Testing on ARM is going to be more contrived than for x86_32: high
> bitrate over tiny platforms mixes not so well.
>
> I'll try to get proper results from the odroid since it is the fastest
> system I have access to, but will take more time.


I agree, ARM 32bit results would be very interesting.

Thanks,
Ronald


Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3

2016-05-24 Thread Ronald S. Bultje
Hi,

On Tue, May 24, 2016 at 3:47 PM, Luca Barbato <lu_z...@gentoo.org> wrote:

> On 23/05/16 17:01, Ronald S. Bultje wrote:
> > Howdy,
>
> Interesting. I spent a bit of time on it myself.
>
> I ran some benchmarks using a yuv422 file of the right size from
> Tim's collection [directly][1] and looped/cut to have a length that
> works fine (1 minute and 10 minutes), and I used `perf stat -r 30` on a
> server whose CPU is not burdened by random processes, so it does not
> have random quirks like a laptop one.
>
> The benchmark showed that force-inlining bitstream_read_vlc is not
> exactly helpful on the poor, constrained x86_32, and its implementation
> could spare a few branches.
>
> With that change in, looks like the gains for x86_64 get even larger.
>
> I get the dnxhd to be about 3% slower on x86_32 and 20% faster on x86_64.

[..]

> And with that I guess we are set =)


But 2 out of 3 are still slower. I can try to look somewhat more into this,
but I think that moving ahead without understanding what makes it slower is
fundamentally flawed. If we understand why it's slower and we decide that
that's OK, that's an entirely different thing.

Ronald


Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3

2016-05-23 Thread Ronald S. Bultje
Howdy,

On Sun, May 22, 2016 at 5:27 AM, Alexandra Hájková <
alexandra.khirn...@gmail.com> wrote:

> > Do you have a tree for testing somewhere?
>
> Yes, there's github branch:
> https://github.com/sasshka/libav/tree/get_bits3.
>

Thanks!

> If I find decoders that are slower on 32bit after the patch, will you fix
> > it?
> >
> If you'll find something it will be discussed with the other
> developers and potentially fixed.


So, I tested a few decoders for which it's easy to generate test files. It
looks like dnxhd got about 20% slower. avconv-new is
5ab5ff1f0783daf0924fdbd25333ea63a7faeb54 (i.e. tip of your get_bits3
branch), and avconv is 3399a26d3f57d462e839c0ee51223ae9aca20852 (branch
point of get_bits3 branch from upstream). Both are compiled using
"../configure --arch=i386 --extra-cflags='-arch i386'
--extra-ldflags='-arch i386' --enable-gpl && make -j4". Input file was
generated from [1], downsampled to 720p30/yuv420p [2] and then encoded
using [3]. This is decoding time (single-threaded) of the two binaries:

bash-4.3$ for n in {1..5}; do ( time ./avconv -threads 1 -i
/tmp/sat-dnxhd.mov -f null -v 0 -nostats - ) 2>&1|grep user; done
user 0m3.138s
user 0m3.057s
user 0m3.122s
user 0m3.120s
user 0m3.095s
bash-4.3$ for n in {1..5}; do ( time ./avconv-new  -threads 1 -i
/tmp/sat-dnxhd.mov -f null -v 0 -nostats - ) 2>&1|grep user; done
user 0m3.769s
user 0m3.767s
user 0m3.761s
user 0m3.711s
user 0m3.745s

I also tested prores (which looks like it got about 5% faster), and
huffyuv, which seems to be about 10% slower (input generated using [4]):

bash-4.3$ for n in {1..5}; do ( time ./avconv -threads 1 -i
/tmp/sat-huvvyuv.avi -f null -v 0 -nostats - ) 2>&1|grep user; done
user 0m3.782s
user 0m3.776s
user 0m3.780s
user 0m3.835s
user 0m3.773s
bash-4.3$ for n in {1..5}; do ( time ./avconv-new -threads 1 -i
/tmp/sat-huvvyuv.avi -f null -v 0 -nostats - ) 2>&1|grep user; done
user 0m4.127s
user 0m4.162s
user 0m4.159s
user 0m4.134s
user 0m4.124s

I think the speed regression in these 2 decoders (dnxhd/huffyuv) should be
addressed, since this might go beyond just x86-32 and affect other 32-bit
platforms also.
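
For the record, the percentages above come from averaging the five "user" times per binary. A minimal C sketch of that arithmetic, with the timings copied from the runs above (the function and array names are just for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timings. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Percentage by which `after` is slower than `before` (negative = faster). */
static double slowdown_percent(const double *before, const double *after,
                               size_t n)
{
    return (mean(after, n) / mean(before, n) - 1.0) * 100.0;
}

/* "user" times in seconds, copied from the runs above. */
static const double dnxhd_old[5]   = { 3.138, 3.057, 3.122, 3.120, 3.095 };
static const double dnxhd_new[5]   = { 3.769, 3.767, 3.761, 3.711, 3.745 };
static const double huffyuv_old[5] = { 3.782, 3.776, 3.780, 3.835, 3.773 };
static const double huffyuv_new[5] = { 4.127, 4.162, 4.159, 4.134, 4.124 };
```

With these inputs, `slowdown_percent(dnxhd_old, dnxhd_new, 5)` comes out a little under 21 and `slowdown_percent(huffyuv_old, huffyuv_new, 5)` a little over 9, which is where the "about 20%" and "about 10%" come from.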

Ronald

[1]
https://media.xiph.org/video/derf/ElFuente/Netflix_SquareAndTimelapse_4096x2160_60fps_10bit_420.y4m
[2] ffmpeg -i Netflix_SquareAndTimelapse_4096x2160_60fps_10bit_420.y4m -vf
framestep=2 -s 1280x720 -pix_fmt yuv420p -c:v ffv1
SquareAndTimelapse.ffv1.mkv
[3] ffmpeg -i SquareAndTimelapse.ffv1.mkv -pix_fmt yuv422p -b:v 75M -c:v
dnxhd /tmp/sat-dnxhd.mov
[4] ffmpeg -i SquareAndTimelapse.ffv1.mkv -c:v huffyuv /tmp/sat-huffyuv.avi

(PS ffmpeg in [2-4] is whatever ships by default in the latest macports,
seems to be 2.8.6.)

Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3

2016-05-21 Thread Ronald S. Bultje
Hi,

On Sat, May 21, 2016 at 2:39 AM, Alexandra Hájková <
alexandra.khirn...@gmail.com> wrote:

> >
> > I noticed proresdec (for example) is not converted to the new bitstream
> > reader. Is there a reason for that?
>
> Not all the sets are sent yet, I'm sending them gradually to make the
> reviewing
> easier.
> >
> > Also, since this patch basically converts the bitstream reader to 64bits,
> > do people think it would be useful to do some speed tests on 32bit as
> well?
> > I feel that on 32bits, the 64bit emulation might actually slow the thing
> > down considerably, even if it's faster on 64bits.
> We did benchmarks for 32 bits for several decoders and the new bitreader is
> faster or as fast as the old one for the 32 bit CPU.


Do you have a tree for testing somewhere?

If I find decoders that are slower on 32bit after the patch, will you fix
it?

Ronald

Re: [libav-devel] [PATCH 00/38] Convert to the new bitstream reader, set 3

2016-05-20 Thread Ronald S. Bultje
Hi,

On Fri, May 20, 2016 at 4:11 PM, Alexandra Hájková <
alexandra.khirn...@gmail.com> wrote:

> This set is compilable together only.


I noticed proresdec (for example) is not converted to the new bitstream
reader. Is there a reason for that?

Also, since this patch basically converts the bitstream reader to 64bits,
do people think it would be useful to do some speed tests on 32bit as well?
I feel that on 32bits, the 64bit emulation might actually slow the thing
down considerably, even if it's faster on 64bits.

(That doesn't mean the patch doesn't have merit, but rather it might mean
that you might want a state size that depends on the bit width of the
architecture. While I agree 32bit x86 is on its way out and possibly
somewhat irrelevant, some targets (Chromebook and x86 Android, for example)
still care about it, and on non-x86, 32bit may actually be a more
predominant target.)
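
To illustrate what I mean by a state size that depends on the architecture, here is a toy MSB-first bit reader in C whose cache is 64 bits wide on 64-bit targets and 32 bits wide otherwise. This is a sketch, not the reader from the patch set: the `Toy*` names, the `UINTPTR_MAX` check and the 24-bit read limit are all choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick the cache width from the pointer size (an assumption of this sketch). */
#if UINTPTR_MAX > 0xFFFFFFFFu
typedef uint64_t cache_t;
#else
typedef uint32_t cache_t;
#endif
#define CACHE_BITS ((unsigned)(sizeof(cache_t) * 8))

typedef struct ToyBitReader {
    const uint8_t *ptr, *end;
    cache_t cache;       /* unread bits, left-aligned (MSB first) */
    unsigned bits_left;  /* number of valid bits in cache */
} ToyBitReader;

static void toy_init(ToyBitReader *br, const uint8_t *buf, size_t size)
{
    br->ptr       = buf;
    br->end       = buf + size;
    br->cache     = 0;
    br->bits_left = 0;
}

/* Top up the cache one byte at a time while there is room and input. */
static void toy_refill(ToyBitReader *br)
{
    while (br->bits_left <= CACHE_BITS - 8 && br->ptr < br->end) {
        br->cache |= (cache_t)*br->ptr++ << (CACHE_BITS - 8 - br->bits_left);
        br->bits_left += 8;
    }
}

/* Read n bits (1..24). Past the end of the input, zeros are returned. */
static unsigned toy_read(ToyBitReader *br, unsigned n)
{
    unsigned v;
    assert(n >= 1 && n <= 24);
    if (br->bits_left < n)
        toy_refill(br);
    v = (unsigned)(br->cache >> (CACHE_BITS - n));
    br->cache <<= n;
    br->bits_left = br->bits_left >= n ? br->bits_left - n : 0;
    return v;
}

/* Example: read 3, 5 and 8 bits from the bytes A5 3C. Returns 1 on success. */
static int toy_demo(void)
{
    static const uint8_t buf[2] = { 0xA5, 0x3C };
    ToyBitReader br;
    toy_init(&br, buf, sizeof(buf));
    return toy_read(&br, 3) == 5 && toy_read(&br, 5) == 5 &&
           toy_read(&br, 8) == 0x3C;
}
```

The calling code is identical for both widths; only the refill frequency and the cost of the shifts change, which is the part I would expect to matter on 32-bit targets.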

Btw don't get my comments wrong, I'm not criticizing the direction you guys
take, work in this area is good and seems to have merit (as measured on
64bits), so thanks!

Ronald

Re: [libav-devel] [RFC PATCH 04/12] add the new bitstream reader

2016-04-27 Thread Ronald S. Bultje
Hi,

On Wed, Apr 27, 2016 at 7:37 AM, Alexandra Hájková <
alexandra.khirn...@gmail.com> wrote:

> ---
>  libavcodec/bitstream.h | 475 +
>  create mode 100644 libavcodec/bitstream.h
>
> diff --git a/libavcodec/bitstream.h b/libavcodec/bitstream.h
> new file mode 100644
> index 000..8793556
> --- /dev/null
> +++ b/libavcodec/bitstream.h
>

So, have you considered just changing the implementation in get_bits.h
instead of this new API? I mean, the API looks largely identical - other
than the prefix. It might even be 100% identical, but I can't be bothered
to check. Large parts of the implementation look like they're copied also.

Ronald

Re: [libav-devel] [PATCH 02/15] lavc: add a new bitstream filtering API

2016-02-25 Thread Ronald S. Bultje
Hi,

On Thu, Feb 25, 2016 at 2:40 PM, Anton Khirnov <an...@khirnov.net> wrote:

> Quoting Ronald S. Bultje (2016-02-25 19:48:10)
> > Hi,
> >
> > given I'm writing a BSF right now...
> >
> > On Thu, Feb 25, 2016 at 10:05 AM, Anton Khirnov <an...@khirnov.net> wrote:
> >
> > > +int (*filter)(AVBSFContext *ctx, AVPacket *pkt);
> > >
> >
> > How do we skip packets? Like, say we have a BSF that merges two packets
> > together into one, how does that work? Maybe it should return a packet like
> > avcodec_decode_video2() and return whether it's got an output packet in a
> > separate parameter.
> >
> > I understand how splitting send/receive allows this in the API side, but
> > the internal vfuncs don't expose this bit of functionality yet.
>
> Look at the actual conversion patches. The filter callback calls
> ff_bsf_get_packet() to get input packets from an internal FIFO and
> may or may not put some output to pkt.


What if the BSF splits the packet in two?

(I'm basically describing VP9 superframe packing/unpacking here. This is
not theoretical.)

Ronald


Re: [libav-devel] [PATCH 02/15] lavc: add a new bitstream filtering API

2016-02-25 Thread Ronald S. Bultje
Hi,

given I'm writing a BSF right now...

On Thu, Feb 25, 2016 at 10:05 AM, Anton Khirnov  wrote:

> +int (*filter)(AVBSFContext *ctx, AVPacket *pkt);
>

How do we skip packets? Like, say we have a BSF that merges two packets
together into one, how does that work? Maybe it should return a packet like
avcodec_decode_video2() and return whether it's got an output packet in a
separate parameter.

I understand how splitting send/receive allows this in the API side, but
the internal vfuncs don't expose this bit of functionality yet.
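
To make the merging case concrete, here is a self-contained toy model in C of a filter callback that may consume an input packet without producing output. Nothing here is the real lavc API: `ToyBSF`, `ToyPacket`, the `toy_*` functions and the `TOY_*` codes are all hypothetical, and the input FIFO is just one possible way such a callback could get at its packets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_EAGAIN   (-1)  /* "no output packet available right now" */
#define TOY_EINVAL   (-2)  /* merged packet would not fit */
#define TOY_FIFO_LEN 8
#define TOY_PKT_MAX  32

typedef struct ToyPacket {
    unsigned char data[TOY_PKT_MAX];
    size_t size;
} ToyPacket;

typedef struct ToyBSF {
    ToyPacket fifo[TOY_FIFO_LEN];  /* queued input packets */
    size_t head, count;
    ToyPacket pending;             /* first half of a merge, if any */
    int has_pending;
} ToyBSF;

/* Caller side: queue one input packet. */
static int toy_send(ToyBSF *b, const ToyPacket *pkt)
{
    assert(pkt->size <= TOY_PKT_MAX);
    if (b->count == TOY_FIFO_LEN)
        return TOY_EAGAIN;
    b->fifo[(b->head + b->count) % TOY_FIFO_LEN] = *pkt;
    b->count++;
    return 0;
}

/* Filter side: fetch the next queued input packet, if there is one. */
static int toy_get_packet(ToyBSF *b, ToyPacket *pkt)
{
    if (b->count == 0)
        return TOY_EAGAIN;
    *pkt = b->fifo[b->head];
    b->head = (b->head + 1) % TOY_FIFO_LEN;
    b->count--;
    return 0;
}

/* Filter callback: merges every two input packets into one output packet. */
static int toy_merge_filter(ToyBSF *b, ToyPacket *out)
{
    ToyPacket in;

    if (toy_get_packet(b, &in) < 0)
        return TOY_EAGAIN;
    if (!b->has_pending) {
        b->pending     = in;
        b->has_pending = 1;
        return TOY_EAGAIN;  /* input consumed, nothing to output yet */
    }
    if (b->pending.size + in.size > TOY_PKT_MAX)
        return TOY_EINVAL;
    memcpy(out->data, b->pending.data, b->pending.size);
    memcpy(out->data + b->pending.size, in.data, in.size);
    out->size      = b->pending.size + in.size;
    b->has_pending = 0;
    return 0;
}

/* Example: two packets in, one merged packet out. Returns 1 on success. */
static int toy_merge_demo(void)
{
    ToyBSF b;
    ToyPacket p1 = { "ab", 2 }, p2 = { "cde", 3 }, out;

    memset(&b, 0, sizeof(b));
    toy_send(&b, &p1);
    toy_send(&b, &p2);
    if (toy_merge_filter(&b, &out) != TOY_EAGAIN)
        return 0;
    if (toy_merge_filter(&b, &out) != 0)
        return 0;
    return out.size == 5 && memcmp(out.data, "abcde", 5) == 0;
}
```

In this model the first call swallows a packet and reports "nothing yet", and the second call returns the merged result; that is the kind of behaviour I'd like the internal callback to be able to express.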

Ronald


Re: [libav-devel] [FFmpeg-devel] [PATCHv3] avcodec: Cineform HD Decoder

2016-01-25 Thread Ronald S. Bultje
Hi,

On Sun, Jan 24, 2016 at 7:34 PM, Kieran Kunhya  wrote:

> +static inline void filter(int16_t *output, ptrdiff_t out_stride, int16_t
> *low, ptrdiff_t low_stride,
> +  int16_t *high, ptrdiff_t high_stride, int len,
> uint8_t clip)


Should this be a DSP function? (That is, the functions calling this.)

They seem very SIMD'able.

Ronald


Re: [libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space

2016-01-18 Thread Ronald S. Bultje
Hi,

On Mon, Jan 18, 2016 at 9:37 AM, Henrik Gramner <hen...@gramner.com> wrote:

> On Mon, Jan 18, 2016 at 2:35 PM, Ronald S. Bultje <rsbul...@gmail.com>
> wrote:
> > On Sun, Jan 17, 2016 at 6:21 PM, Henrik Gramner <hen...@gramner.com>
> wrote:
> >> @@ -386,8 +386,10 @@ DECLARE_REG_TMP_SIZE
> >> 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
> >>  %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
> >>  %if %1 > 0
> >>  %assign regs_used (regs_used + 1)
> >> -%elif ARCH_X86_64 && regs_used == num_args && num_args <=
> 4 +
> >> UNIX64 * 2
> >> -%warning "Stack pointer will overwrite register
> argument"
> >> +%endif
> >> +%if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
> >> +; Ensure that we don't clobber any registers containing
> >> arguments
> >> +%assign regs_used 5 + UNIX64 * 3
> >
> > Why 5 + unix * 3 and not 5 +unix * 2? Isn't unix64 6 regs and win64 4
> regs?
>
> Because in the System V ABI, r6 (rax) is used to specify the number of
> arguments passed in vector registers in vararg functions so we use r7
> instead of potentially clobbering it. It's certainly unlikely for it
> to actually be relevant in handwritten assembly functions, but there's
> not really any drawback of supporting that use case here (both r6 and
> r7 are volatile).


Ok. How about we document that with a comment?

Ronald


Re: [libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space

2016-01-18 Thread Ronald S. Bultje
Hi,

On Sun, Jan 17, 2016 at 6:21 PM, Henrik Gramner  wrote:

> When allocating stack space with a larger alignment than the known stack
> alignment a temporary register is used for storing the stack pointer.
> Ensure that this isn't one of the registers used for passing arguments.
> ---
>  libavutil/x86/x86inc.asm | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
> index fc58b74..c355ee7 100644
> --- a/libavutil/x86/x86inc.asm
> +++ b/libavutil/x86/x86inc.asm
> @@ -386,8 +386,10 @@ DECLARE_REG_TMP_SIZE
> 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
>  %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
>  %if %1 > 0
>  %assign regs_used (regs_used + 1)
> -%elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 +
> UNIX64 * 2
> -%warning "Stack pointer will overwrite register argument"
> +%endif
> +%if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
> +; Ensure that we don't clobber any registers containing
> arguments
> +%assign regs_used 5 + UNIX64 * 3


Why 5 + unix * 3 and not 5 +unix * 2? Isn't unix64 6 regs and win64 4 regs?

Ronald


Re: [libav-devel] [PATCH 8/8] hevcdsp: add x86 SIMD for MC

2015-08-21 Thread Ronald S. Bultje
Hi,

On Wed, Aug 19, 2015 at 7:31 PM, James Almer jamr...@gmail.com wrote:

 On 19/08/15 8:23 PM, Ronald S. Bultje wrote:
  Hi,
 
  On Wed, Aug 19, 2015 at 6:34 PM, James Almer jamr...@gmail.com wrote:
 
  On 19/08/15 4:43 PM, Anton Khirnov wrote:
  ---
   libavcodec/hevc.c |   6 +-
   libavcodec/hevc.h |   2 +-
   libavcodec/hevcdsp.c  |  24 +-
   libavcodec/hevcdsp.h  |   5 +-
   libavcodec/hevcdsp_template.c |   8 +-
   libavcodec/x86/Makefile   |   3 +-
   libavcodec/x86/hevc_mc.asm| 816
  ++
   libavcodec/x86/hevcdsp_init.c | 405 +
   8 files changed, 1258 insertions(+), 11 deletions(-)
   create mode 100644 libavcodec/x86/hevc_mc.asm
 
  I'm getting segmentation faults with quite a few of samples.
  For example
 http://www.elecard.com/assets/files/other/clips/bbb_1080p_c.ts
 
 
  So, at the risk of godwin, why was this reimplemented from scratch,
 rather
  than basing it on what ffmpeg has? How could this possibly be an
 advantage
  to our users?

 Or OpenHEVC for that matter, which is the source of almost every hevc asm
 optimization, x86 or otherwise, and a project that afaik branched off
 libav.


Guys, please, this situation is awful enough as it is, can you please
consider this concern? Ignoring me does not make it better.

Ronald


Re: [libav-devel] [PATCH 8/8] hevcdsp: add x86 SIMD for MC

2015-08-21 Thread Ronald S. Bultje
Hi,

On Fri, Aug 21, 2015 at 10:02 AM, Anton Khirnov an...@khirnov.net wrote:

 Quoting Ronald S. Bultje (2015-08-21 14:24:47)
  Hi,
 
  On Wed, Aug 19, 2015 at 7:31 PM, James Almer jamr...@gmail.com wrote:
 
   On 19/08/15 8:23 PM, Ronald S. Bultje wrote:
Hi,
   
On Wed, Aug 19, 2015 at 6:34 PM, James Almer jamr...@gmail.com
 wrote:
   
On 19/08/15 4:43 PM, Anton Khirnov wrote:
---
 libavcodec/hevc.c |   6 +-
 libavcodec/hevc.h |   2 +-
 libavcodec/hevcdsp.c  |  24 +-
 libavcodec/hevcdsp.h  |   5 +-
 libavcodec/hevcdsp_template.c |   8 +-
 libavcodec/x86/Makefile   |   3 +-
 libavcodec/x86/hevc_mc.asm| 816
++
 libavcodec/x86/hevcdsp_init.c | 405 +
 8 files changed, 1258 insertions(+), 11 deletions(-)
 create mode 100644 libavcodec/x86/hevc_mc.asm
   
I'm getting segmentation faults with quite a few of samples.
For example
   http://www.elecard.com/assets/files/other/clips/bbb_1080p_c.ts
   
   
So, at the risk of godwin, why was this reimplemented from scratch,
   rather
than basing it on what ffmpeg has? How could this possibly be an
   advantage
to our users?
  
   Or OpenHEVC for that matter, which is the source of almost every hevc
 asm
   optimization, x86 or otherwise, and a project that afaik branched off
   libav.
 
 
  Guys, please, this situation is awful enough as it is, can you please
  consider this concern? Ignoring me does not make it better.

 There's a couple of reasons
 - dealing with the openhevc code has proven to be massive pain in the
   past, requiring significant rewrites to be readable
 - the existing mc asm in ffmpeg taken from openhevc seems to conform to
   this as well
 - it does a bunch of changes to the c code, which smart people told me
   are better not done
 - the codebase in ffmpeg is quite different, so extracting the changes
   is yet more pain
 - I had almost no experience with writing SIMD before this, so I
   wouldn't have been able to review the code properly
 - the result is smaller, more readable (IMO) and faster


There's no avx2. Will you write that too? I'd like to prevent ffmpeg/libav
from massively diverging w.r.t. big features, and no avx2 will mean
significant speed loss if I dump the current code on the ffmpeg side.

Ronald


Re: [libav-devel] [PATCH 8/8] hevcdsp: add x86 SIMD for MC

2015-08-19 Thread Ronald S. Bultje
Hi,

On Wed, Aug 19, 2015 at 6:34 PM, James Almer jamr...@gmail.com wrote:

 On 19/08/15 4:43 PM, Anton Khirnov wrote:
  ---
   libavcodec/hevc.c |   6 +-
   libavcodec/hevc.h |   2 +-
   libavcodec/hevcdsp.c  |  24 +-
   libavcodec/hevcdsp.h  |   5 +-
   libavcodec/hevcdsp_template.c |   8 +-
   libavcodec/x86/Makefile   |   3 +-
   libavcodec/x86/hevc_mc.asm| 816
 ++
   libavcodec/x86/hevcdsp_init.c | 405 +
   8 files changed, 1258 insertions(+), 11 deletions(-)
   create mode 100644 libavcodec/x86/hevc_mc.asm

 I'm getting segmentation faults with quite a few of samples.
 For example http://www.elecard.com/assets/files/other/clips/bbb_1080p_c.ts


So, at the risk of godwin, why was this reimplemented from scratch, rather
than basing it on what ffmpeg has? How could this possibly be an advantage
to our users?

Ronald


Re: [libav-devel] [PATCH 5/6] Postpone API-incompatible changes until the next bump

2015-08-16 Thread Ronald S. Bultje
Hi,

On Sat, Aug 8, 2015 at 7:37 AM, Andreas Cadhalpun 
andreas.cadhal...@googlemail.com wrote:

 Signed-off-by: Andreas Cadhalpun andreas.cadhal...@googlemail.com
 ---
  libavcodec/version.h  | 54
 +--
  libavfilter/version.h | 10 +-
  libavformat/version.h |  6 +++---
  libavutil/version.h   | 10 +-
  4 files changed, 40 insertions(+), 40 deletions(-)

 diff --git a/libavcodec/version.h b/libavcodec/version.h
 index c903d2f..7eedf08 100644
 --- a/libavcodec/version.h
 +++ b/libavcodec/version.h
 @@ -85,31 +85,31 @@
  #define FF_API_MISSING_SAMPLE    (LIBAVCODEC_VERSION_MAJOR < 57)
  #endif
  #ifndef FF_API_LOWRES
 -#define FF_API_LOWRES            (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_LOWRES            (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_CAP_VDPAU
 -#define FF_API_CAP_VDPAU         (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_CAP_VDPAU         (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_BUFS_VDPAU
 -#define FF_API_BUFS_VDPAU        (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_BUFS_VDPAU        (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_VOXWARE
 -#define FF_API_VOXWARE           (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_VOXWARE           (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_SET_DIMENSIONS
 -#define FF_API_SET_DIMENSIONS    (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_SET_DIMENSIONS    (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_DEBUG_MV
 -#define FF_API_DEBUG_MV          (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_DEBUG_MV          (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_AC_VLC
 -#define FF_API_AC_VLC            (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_AC_VLC            (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_OLD_MSMPEG4
 -#define FF_API_OLD_MSMPEG4       (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_OLD_MSMPEG4       (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_ASPECT_EXTENDED
 -#define FF_API_ASPECT_EXTENDED   (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_ASPECT_EXTENDED   (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_THREAD_OPAQUE
  #define FF_API_THREAD_OPAQUE     (LIBAVCODEC_VERSION_MAJOR < 57)
 @@ -118,58 +118,58 @@
  #define FF_API_CODEC_PKT         (LIBAVCODEC_VERSION_MAJOR < 57)
  #endif
  #ifndef FF_API_ARCH_ALPHA
 -#define FF_API_ARCH_ALPHA        (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_ARCH_ALPHA        (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_XVMC
 -#define FF_API_XVMC              (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_XVMC              (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_ERROR_RATE
 -#define FF_API_ERROR_RATE        (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_ERROR_RATE        (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_QSCALE_TYPE
 -#define FF_API_QSCALE_TYPE       (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_QSCALE_TYPE       (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_MB_TYPE
 -#define FF_API_MB_TYPE           (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_MB_TYPE           (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_MAX_BFRAMES
 -#define FF_API_MAX_BFRAMES       (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_MAX_BFRAMES       (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_NEG_LINESIZES
 -#define FF_API_NEG_LINESIZES     (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_NEG_LINESIZES     (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_EMU_EDGE
 -#define FF_API_EMU_EDGE          (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_EMU_EDGE          (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_ARCH_SH4
 -#define FF_API_ARCH_SH4          (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_ARCH_SH4          (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_ARCH_SPARC
 -#define FF_API_ARCH_SPARC        (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_ARCH_SPARC        (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_UNUSED_MEMBERS
 -#define FF_API_UNUSED_MEMBERS    (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_UNUSED_MEMBERS    (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_IDCT_XVIDMMX
 -#define FF_API_IDCT_XVIDMMX      (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_IDCT_XVIDMMX      (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_INPUT_PRESERVED
 -#define FF_API_INPUT_PRESERVED   (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_INPUT_PRESERVED   (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_NORMALIZE_AQP
 -#define FF_API_NORMALIZE_AQP     (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_NORMALIZE_AQP     (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_GMC
 -#define FF_API_GMC               (LIBAVCODEC_VERSION_MAJOR < 57)
 +#define FF_API_GMC               (LIBAVCODEC_VERSION_MAJOR < 58)
  #endif
  #ifndef FF_API_MV0
 -#define FF_API_MV0               (LIBAVCODEC_VERSION_MAJOR < 57)
 

Re: [libav-devel] [FFmpeg-devel] [PATCH 3/6] avutil: delay removal of the PIX_FMT_* flags

2015-08-09 Thread Ronald S. Bultje
Hi,

On Sun, Aug 9, 2015 at 11:54 AM, Andreas Cadhalpun 
andreas.cadhal...@googlemail.com wrote:

 Hi,

 On 09.08.2015 12:57, Ronald S. Bultje wrote:
  Yeah, I'm with this. Andreas, the correct fix is to update applications,
  even if that means vendor-specific patches in Debian. These are
  exceptionally trivial patches that you can generate using fairly trivial
  sed scripting.

 The problem is not that creating patches for these two API changes was
 difficult, but that these affect the majority of API users.

  The same goes for other easily scriptable changes. These APIs are gone
 and
  I don't want them back.

 What is your problem with keeping the trivial compatibility layers for
 another year?


We decided several years ago that this API would go. This API is gone
now. Good riddance. We're going to make pancakes after breaking eggs. It's
going to be yummy. I'll help you decrease the pain caused by the breaking
of the eggs. That's life.


  Scriptable API changes are out and stay out. Just script a patch in
 Debian.

 I can help you scripting it together if you can't figure it out yourself
  (something like find . -name *.[ch] -exec sed -i -e
  's|PIX_FMT_|AV_PIX_FMT|g' {} \; and then a similar version for this
 change)
  should do the trick.

 This script will miss some cases (*.cpp) and wreck all sorts of havoc
 (AV_AV_PIX_FMT_*). But don't waste your time improving this script.

 If you want to help, please document how avpicture_deinterlace
 can be replaced in practice, that is in a project that doesn't use
 libavfilter yet.


OK. Do you want a documented example on something like stackoverflow, or a
deinterlace-example.c file, or something on the wiki?

Ronald


Re: [libav-devel] Remote participation options for IETF session on MKV/FFV1 at July 22 @ 9 CEST

2015-07-21 Thread Ronald S. Bultje
Hi,

On Tue, Jul 21, 2015 at 12:58 PM, Kostya Shishkov kostya.shish...@gmail.com
 wrote:

 On Tue, Jul 21, 2015 at 11:52:55AM -0400, Dave Rice wrote:
  Hi all,
 [...]
  The FFV1 specification work may also be reviewed at github [5] with
 recent rendering in HTML [6] and PDF [7] available. To participate in the
 current standardization efforts of FFV1 please visit the ffmpeg-devel
 mailing list [8] or the #ffmpeg-devel [8] IRC channel on freenode.

 I'd suggest that any standardisation includes not only specification but
 also an independent implementation - it helps to figure out what's wrong
 with
 the specification and maybe gives a small standalone library instead of
 something spread out on half a dozen files in a large software project.


+1. I can't stress enough how important this is. In addition, the spec should be
the master, not any one implementation (because then the bugs in that one
implementation will be the spec, regardless of what the bug is).

Thank you Kostya.

Ronald


Re: [libav-devel] [FFmpeg-devel] [PATCH] wmavoice: limit wmavoice_decode_packet return value to packet size

2015-06-28 Thread Ronald S. Bultje
Hi,

On Sun, Jun 28, 2015 at 5:28 AM, Andreas Cadhalpun 
andreas.cadhal...@googlemail.com wrote:

 On 27.06.2015 23:01, Michael Niedermayer wrote:
  On Sat, Jun 27, 2015 at 08:36:15PM +0200, Andreas Cadhalpun wrote:
  Claiming to have decoded more bytes than the packet size is wrong.
 
  Signed-off-by: Andreas Cadhalpun andreas.cadhal...@googlemail.com
  ---
   libavcodec/wmavoice.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/libavcodec/wmavoice.c b/libavcodec/wmavoice.c
  index ae88d4e..6cd407a 100644
  --- a/libavcodec/wmavoice.c
  +++ b/libavcodec/wmavoice.c
  @@ -1982,7 +1982,7 @@ static int wmavoice_decode_packet(AVCodecContext
 *ctx, void *data,
   *got_frame_ptr) {
   cnt += s->spillover_nbits;
   s->skip_bits_next = cnt & 7;
  -return cnt >> 3;
  +return FFMIN(cnt >> 3, avpkt->size);
   } else
   skip_bits_long (gb, s->spillover_nbits - cnt +
   get_bits_count(gb)); // resync
  @@ -2001,7 +2001,7 @@ static int wmavoice_decode_packet(AVCodecContext
 *ctx, void *data,
   } else if (*got_frame_ptr) {
   int cnt = get_bits_count(gb);
   s->skip_bits_next = cnt & 7;
  -return cnt >> 3;
  +return FFMIN(cnt >> 3, avpkt->size);
   } else if ((s->sframe_cache_size = pos) > 0) {
   /* rewind bit reader to start of last (incomplete)
 superframe... */
   init_get_bits(gb, avpkt->data, size << 3);
 
  am i assuming correct that gb was read beyond its end ?

 That only happens in the second case, not in the first.

  if so this maybe should be treated as an error instead of cliping

 Treating one like an error, but not the other seems strange as well.
 One could add an explode mode for both. Would that be better?


In the first case, it's an error. If the frame size is 2 bits, the header
is 1 bit, and it specifies 2 spillover bits, then the frame is clearly
corrupt. Returning an error is fine. The FFMIN() isn't necessary. I also
don't think an explode mode check is necessary here; it's a clear error
that is unrecoverable for this frame.

In the second case, does that actually happen? Wmavoice is one of the
limited number of decoders that internally checks for overreads.
get_bits_count() should never overread. Do you have samples for which this
happens? We currently basically return an error on any possible overread
signified in the bitstream (without actually overreading), so doing so here
also would make sense (if it really happens at all).

(We could also remove all the overread checks in the decoder, make it use
the safe bitstream reader mode, and then check for overreads at the end of
synth_superframe or in the caller, and then return an error. I have no
specific preference, and this may lead to less code overall.)
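
As a rough sketch of the "return an error instead of clipping" option: compare the number of bits the reader reports as consumed against the packet size and fail if it lies beyond the end. The function name and the error code below are placeholders, not the wmavoice code; bits_consumed plays the role of get_bits_count().

```c
#include <assert.h>

#define TOY_ERR_INVALIDDATA (-1)  /* placeholder for an AVERROR-style code */

/* Returns the number of whole bytes consumed, or a negative error if the
 * bit position lies beyond the end of the packet. On success,
 * *skip_bits_next receives the leftover bits of the last partial byte. */
static int consumed_bytes(int bits_consumed, int pkt_size, int *skip_bits_next)
{
    assert(bits_consumed >= 0 && pkt_size >= 0 && skip_bits_next);

    /* 8LL avoids int overflow for very large packet sizes. */
    if (bits_consumed > 8LL * pkt_size)
        return TOY_ERR_INVALIDDATA;   /* corrupt: report it, don't clip */

    *skip_bits_next = bits_consumed & 7;
    return bits_consumed >> 3;
}
```

Unlike clamping the byte count, this also catches the case where only the last few bits run past the packet.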

Ronald


Re: [libav-devel] [PATCH] h264_cabac: Break infinite loops

2015-01-14 Thread Ronald S. Bultje
Hi,

On Tue, Jan 13, 2015 at 8:43 AM, Luca Barbato lu_z...@gentoo.org wrote:

 On 13/01/15 13:43, Martin Storsjö wrote:
  From: Michael Niedermayer michae...@gmx.at
 
  This fixes out of array reads and/or infinite loops.
 
  CC: libav-sta...@libav.org
  Found-by: Mateusz j00ru Jurczyk and Gynvael Coldwind
  ---
  Not sure exactly which of the fuzzed samples this fixes, I ran
  into other, unrelated, broken samples that triggered this issue
  and found this fix for it.
  ---
   libavcodec/h264_cabac.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/libavcodec/h264_cabac.c b/libavcodec/h264_cabac.c
  index 1e91626..0ad8ac0 100644
  --- a/libavcodec/h264_cabac.c
  +++ b/libavcodec/h264_cabac.c
  @@ -1712,7 +1712,7 @@ decode_cabac_residual_internal(H264Context *h,
int16_t *block,
   \
   if( coeff_abs >= 15 ) { \
   int j = 0; \
  -while( get_cabac_bypass( CC ) ) { \
  +while (get_cabac_bypass(CC) && j < 30) { \
   j++; \
   } \
   \
 

 Probably ok, not sure why 30 though.


1707 int coeff_abs = 2; \

So coeff_abs is of type signed int.

[..]
1717 while(get_cabac_bypass( CC ) && j<30) { \
1718 j++; \
1719 } \
[..]
1721 coeff_abs=1; \
1722 while( j-- ) { \
1723 coeff_abs += coeff_abs + get_cabac_bypass( CC ); \
1724 } \
1725 coeff_abs+= 14; \

Let's rewrite this small block into a different form (for readability in
this particular case):

coeff_abs = 1 << j;
while (j--) {
    coeff_abs |= get_cabac_bypass(CC) << j;
}
coeff_abs += 14;

And you'll see why 30 is the max. 1 << 31 is undefined for signed integers.
There is no particular reason why this would be the largest coefficient
that we want to support (really, our storage type is int16_t for 8bit
content so we can't store these coefficients anyway).
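
The bound can be checked with a small standalone sketch. It follows the rewritten form above, with a hypothetical read_bit() standing in for get_cabac_bypass(); the bit source, the names, and the assumption of a 32-bit int are mine and not taken from the decoder. The final + 14 is done in a wider type here, because the largest 30-bit-prefix value already equals INT_MAX.

```c
#include <assert.h>
#include <limits.h>

#define MAX_PREFIX 30   /* 1 << 30 still fits in a 32-bit signed int */

typedef struct BitSrc {
    const unsigned char *bits;  /* one bit per element, 0 or 1 */
    int len, pos;
} BitSrc;

/* Hypothetical stand-in for get_cabac_bypass(); returns 1 once the
 * input runs out, which is the case that used to loop without bound. */
static int read_bit(BitSrc *s)
{
    return s->pos < s->len ? s->bits[s->pos++] : 1;
}

/* Escape-coded magnitude: unary prefix of length j, then j suffix bits. */
static long long decode_escape(BitSrc *s)
{
    int j = 0, coeff_abs;

    while (read_bit(s) && j < MAX_PREFIX)
        j++;

    coeff_abs = 1 << j;               /* defined because j <= 30 */
    while (j--)
        coeff_abs |= read_bit(s) << j;

    return (long long)coeff_abs + 14; /* widened to avoid int overflow */
}
```

With the cap in place, a stream of endless 1 bits terminates after 30 prefix bits instead of spinning or shifting out of range.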

HTH,
Ronald

Re: [libav-devel] [RFC] jpeg2000: MCT in SSE

2014-09-29 Thread Ronald S. Bultje
Hi,

On Mon, Sep 29, 2014 at 7:07 AM, Luca Barbato lu_z...@gentoo.org wrote:

 On 29/09/14 11:35, Hendrik Leppkes wrote:

 On Mon, Sep 29, 2014 at 9:33 AM, Nicolas Bertrand 
 nicoinatte...@gmail.com
 wrote:

  Finally a 1st optimization patch for jpeg2000!
 To start: an easy one.

 The main help I need: is where and how to put the SSE optimzation in the
 x86 directory.


  We don't allow x86 intrinsics in avcodec. You should write this as a
 yasm
 function.


 Do you have a tutorial about x86inc or something close to that we could
 import in the wiki?


Better yet, x264 has one: https://wiki.videolan.org/X264asm/

Ronald


Re: [libav-devel] livestream.com us-local news channels

2014-09-22 Thread Ronald S. Bultje
Hi,

On Mon, Sep 22, 2014 at 12:35 PM, Georg Stein georg_st...@t-online.de
wrote:

 when i call

 int result = avformat_open_input(&this->avFormatContext, fileNameChar, fmt,
 NULL);

 the result is 0 but the AVFormatContext does not have any codec/stream in
 it. The command line tool from ffmpeg can download the stream fine with the
 URL I use in my app


avformat_find_stream_info().

Ronald


Re: [libav-devel] Common mailing-list for API evolutions

2014-08-29 Thread Ronald S. Bultje
Hi,

On Fri, Aug 29, 2014 at 1:30 PM, Kieran Kunhya kier...@obe.tv wrote:

 You're throwing a blob at me without individual author's
 contribution, I don't think that's
 appropriate.

 You've decided to selectively ignore the second part of the message.
 As I read that it says he is not happy with those changes being
 attributed to him.


This is both about being able to selectively and critically (from a
technical PoV) analyze diffs, as well as have legally correct attribution.
I believe both are important, not just because Debian likes it, but because
it's the correct thing to do for whatever happens 10 years from now.
Remember JBs pains to get pieces of VideoLan relicensed under LGPL? Let's
prevent that, it's easy.

I understand that one of the authors (Diego) of these changes on top of
Clement's/my decoder now says that he rescinds his authorship over his
changes (to who? To me? Clement? Both? Some other entity? Copyright doesn't
just vanish into thin air; maybe he wanted to publish it under the WTFPL?),
but I'm not sure that has any legal weight (I certainly don't have written,
signed paperwork on this), nor am I aware of what other authors would need
to rescind their right for me and Clement to have full ownership over this.
And then there's the obvious IANAL problem.

Blobbing stuff together is wrong. It's a mistake, I'm sure people threw out
their git history, so let's just admit there was a mistake and make sure we
don't repeat it. Poor Clement spent considerable time going through your
changes, splitting cosmetics from functional patches. We even found bugs in
your changes as a result of this. It's for the better. Let's get along.

Ronald
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] Common mailing-list for API evolutions

2014-08-29 Thread Ronald S. Bultje
Hi,

On Fri, Aug 29, 2014 at 3:51 PM, Diego Biurrun di...@biurrun.de wrote:


 Am 29.08.2014 20:03 schrieb Ronald S. Bultje rsbul...@gmail.com:
  On Fri, Aug 29, 2014 at 1:30 PM, Kieran Kunhya kier...@obe.tv wrote:
   You're throwing a blob at me without individual author's
   contribution, I don't think that's
   appropriate.
  
   You've decided to selectively ignore the second part of the message.
   As I read that it says he is not happy with those changes being
   attributed to him.
 
  This is both about being able to selectively and critically (from a
  technical PoV) analyze diffs, as well as have legally correct
 attribution.
  I believe both are important, not just because Debian likes it, but
 because
  it's the correct thing to do for whatever happens 10 years from now.
  Remember JBs pains to get pieces of VideoLan relicensed under LGPL? Let's
  prevent that, it's easy.

 Let me get this straight - for a decoder with two authors and no separate
 attribution you are complaining about further changes that do not carry
 separate attribution?


https://github.com/rbultje/ffmpeg/tree/vp9

Yes.

Ronald


Re: [libav-devel] [PATCH 2/2] lavc: Add a VP9 decoder

2013-11-14 Thread Ronald S. Bultje
Hi,

On Thu, Nov 14, 2013 at 7:58 PM, Luca Barbato lu_z...@gentoo.org wrote:

 From: Ronald S. Bultje rsbul...@gmail.com

 Originally written by Ronald S. Bultje rsbul...@gmail.com and
 Clément Bœsch u...@pkh.me

 Further contributions by:


Please define the further contributions. I am the original author,
together with Clement, I'd like to know what you changed. You're throwing a
blob at me without individual author's contribution, I don't think that's
appropriate.

Ronald


Re: [libav-devel] [PATCH] lavc: Add a VP9 decoder

2013-11-04 Thread Ronald S. Bultje
Hi,

On Mon, Nov 4, 2013 at 11:27 AM, Luca Barbato lu_z...@gentoo.org wrote:

 From: Ronald S. Bultje rsbul...@gmail.com

 Originally written by Ronald S. Bultje rsbul...@gmail.com with the
 help of Clément Bœsch ubi...@gmail.com.


No. The decoder was written by A and B, not A with the help of B.

Further contributions by:
 Anton Khirnov an...@khirnov.net
 Luca Barbato lu_z...@gentoo.org


As one of the actual authors, I'm very interested in these further
contributions. What are they?

Ronald


Re: [libav-devel] FFV1.3 released: How is libav's implementation status?

2013-10-25 Thread Ronald S. Bultje
Hi,

On Fri, Oct 25, 2013 at 4:35 AM, Peter B. p...@das-werkstatt.com wrote:

 On 10/24/2013 11:52 PM, Ronald S. Bultje wrote:
  Hi,
 
  On Thu, Oct 24, 2013 at 5:29 PM, Peter B. p...@das-werkstatt.com wrote:
 
  On 10/21/2013 12:43 AM, Luca Barbato wrote:
  Basically you should make sample files (so 1 frame or 3 frame sample)
  that sports specific corner cases or common cases.
 
  e.g. all the kind of capture you want to make using it.
  Like, e.g. VHS, DigiBeta, etc?
  If so, then how many different samples should I provide?
  The ones I have are usually in yuv422p.
 
  People typically use more standardized existing content, such as:
  http://media.xiph.org/video/derf/

 Hm...
 So, it seems that I didn't yet fully understand the purpose of the
 provided samples.


Right, so this is not about doing quality comparisons (you already did
that, and that's great!) - this is about conformance testing.

Ronald


Re: [libav-devel] FFV1.3 released: How is libav's implementation status?

2013-10-24 Thread Ronald S. Bultje
Hi,

On Thu, Oct 24, 2013 at 5:29 PM, Peter B. p...@das-werkstatt.com wrote:

 On 10/21/2013 12:43 AM, Luca Barbato wrote:
  Basically you should make sample files (so 1 frame or 3 frame sample)
  that sports specific corner cases or common cases.
 
  e.g. all the kind of capture you want to make using it.

 Like, e.g. VHS, DigiBeta, etc?
 If so, then how many different samples should I provide?
 The ones I have are usually in yuv422p.


People typically use more standardized existing content, such as:
http://media.xiph.org/video/derf/

Ronald


Re: [libav-devel] [PATCH] h263dec: use init_get_bits8() and check its return code

2013-10-22 Thread Ronald S. Bultje
Hi,

On Tue, Oct 22, 2013 at 7:19 AM, Derek Buitenhuis 
derek.buitenh...@gmail.com wrote:

 On 10/22/2013 12:09 PM, Luca Barbato wrote:
  -init_get_bits(&gb, s->avctx->extradata,
 s->avctx->extradata_size*8);
  -ret = ff_mpeg4_decode_picture_header(s, &gb);
  +if (init_get_bits8(&gb, s->avctx->extradata,
 s->avctx->extradata_size) >= 0 )
  +ret = ff_mpeg4_decode_picture_header(s, &gb);
 
  This looks wrong.

 Care to elaborate?


if ((ret = init_get_bits8(...)) >= 0)
ret = ff_mpeg4_decode_...(..);
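
Spelled out with stand-in functions, the point is to keep the return value of the init call and only run the parser if it succeeded, so that a failure reaches the caller through ret. init_reader() and parse_header() below are hypothetical placeholders for init_get_bits8() and ff_mpeg4_decode_picture_header(); their behaviour is simplified for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ERR (-22)  /* placeholder negative error code */

/* Stand-in for init_get_bits8(): rejects a missing buffer or bad size. */
static int init_reader(const unsigned char *buf, int size)
{
    return (buf && size >= 0) ? 0 : TOY_ERR;
}

/* Stand-in for the header parser; counts how often it is called. */
static int parse_calls;
static int parse_header(const unsigned char *buf)
{
    (void)buf;
    parse_calls++;
    return 0;
}

static int decode_extradata(const unsigned char *buf, int size)
{
    int ret;

    if ((ret = init_reader(buf, size)) >= 0)
        ret = parse_header(buf);
    return ret;   /* either the init error or the parser's result */
}
```

What the caller then does with a negative ret (abort the frame or carry on) is a separate decision; the sketch only makes sure the error is not silently dropped.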

Ronald


Re: [libav-devel] [PATCH] h263dec: use init_get_bits8() and check its return code

2013-10-22 Thread Ronald S. Bultje
Hi,

On Tue, Oct 22, 2013 at 10:06 AM, Derek Buitenhuis 
derek.buitenh...@gmail.com wrote:

 On 10/22/2013 2:47 PM, Ronald S. Bultje wrote:
  Right. So the thinking might be, should we perhaps take action based on
  this hypothetical error-like value of ret, such as perhaps aborting
  decoding the rest of the frame, and instead returning an error?

 I have a sample that actually triggers this error case, and it seems
 to abort fine... I don't know *where* it catches it in this mess of
 MPEG code though...


And that supposed magic doesn't concern you?

Ronald


Re: [libav-devel] [PATCH] h263dec: use init_get_bits8() and check its return code

2013-10-22 Thread Ronald S. Bultje
Hi,

On Tue, Oct 22, 2013 at 10:17 AM, Derek Buitenhuis 
derek.buitenh...@gmail.com wrote:

 On 10/22/2013 3:08 PM, Ronald S. Bultje wrote:
  And that supposed magic doesn't concern you?

 Definitely does.

 Guess the consensus is to fail immediately?


That sounds like an outstanding proposal.

Ronald


Re: [libav-devel] [PATCH] lavc: Edge emulation with dst/src linesize

2013-10-14 Thread Ronald S. Bultje
Hi,

On Sun, Oct 13, 2013 at 11:55 PM, Luca Barbato lu_z...@gentoo.org wrote:

 -static av_noinline void emulated_edge_mc_sse(uint8_t *buf, const uint8_t
 *src,
 - ptrdiff_t linesize,
 +static av_noinline void emulated_edge_mc_sse(uint8_t * buf,const uint8_t
 *src,
 + ptrdiff_t buf_stride,
 + ptrdiff_t src_stride,
   int block_w, int block_h,
   int src_x, int src_y, int w,
 int h)


I don't believe my original patch had this argument ordering. Why did you
change it?

Ronald


Re: [libav-devel] [PATCH] lavc: Edge emulation with dst/src linesize

2013-10-14 Thread Ronald S. Bultje
Hi,

On Mon, Oct 14, 2013 at 2:48 PM, Luca Barbato lu_z...@gentoo.org wrote:

 On 14/10/13 15:54, Ronald S. Bultje wrote:
  Hi,
 
  On Sun, Oct 13, 2013 at 11:55 PM, Luca Barbato lu_z...@gentoo.org
 wrote:
 
  -static av_noinline void emulated_edge_mc_sse(uint8_t *buf, const
 uint8_t
  *src,
  - ptrdiff_t linesize,
  +static av_noinline void emulated_edge_mc_sse(uint8_t * buf,const
 uint8_t
  *src,
  + ptrdiff_t buf_stride,
  + ptrdiff_t src_stride,
int block_w, int block_h,
int src_x, int src_y, int
 w,
  int h)
 
 
  I don't believe my original patch had this argument ordering. Why did you
  change it?
 

 Because Kostya liked it better and matches the other functions in dsputil.


Shouldn't my opinion - as author of this code - matter a little? I prefer
that all silly forks keep code identical where it makes sense. That makes
my hobbyist life a lot easier.

Ronald


Re: [libav-devel] [PATCH 2/5] x86: lpc: simd av_evaluate_lls

2013-06-16 Thread Ronald S. Bultje
Hi Loren,

On Sat, Jun 15, 2013 at 5:53 PM, Loren Merritt lor...@u.washington.edu wrote:

 1.5x-1.8x faster on sandybridge
 ---
  libavutil/lls.c  |  3 +++
  libavutil/lls.h  |  1 +
  libavutil/x86/lls.asm| 31 +++
  libavutil/x86/lls_init.c |  6 +-
  4 files changed, 40 insertions(+), 1 deletion(-)

 diff --git a/libavutil/lls.c b/libavutil/lls.c
 index eb500af..8f1aff1 100644
 --- a/libavutil/lls.c
 +++ b/libavutil/lls.c
 @@ -119,6 +119,9 @@ double avpriv_evaluate_lls(LLSModel *m, double *param,
 int order)
  int i;
  double out = 0;

 +if (m->evaluate_lls)
 +return m->evaluate_lls(m->coeff[order], param, order);


Is there a special reason you didn't assign the default code as default
implementation for evaluate_lls (as in: evaluate_lls_c), as is commonly
done?
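
To illustrate the pattern I mean, here is a rough sketch with hypothetical names and a simplified one-dimensional coefficient array (not the actual lls.c layout): the C version is installed as the default, and arch-specific init code only overrides the pointer, so callers never need a NULL check.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VARS 4

/* Hypothetical, simplified model; the real LLSModel layout differs. */
typedef struct SketchModel {
    double coeff[SKETCH_MAX_VARS];
    double (*evaluate)(const struct SketchModel *m, const double *param,
                       int order);
} SketchModel;

/* Plain C implementation, used unless something faster is installed. */
static double sketch_evaluate_c(const SketchModel *m, const double *param,
                                int order)
{
    double out = 0;
    int i;

    for (i = 0; i <= order; i++)
        out += param[i] * m->coeff[i];
    return out;
}

static void sketch_model_init(SketchModel *m)
{
    int i;

    assert(m != NULL);
    for (i = 0; i < SKETCH_MAX_VARS; i++)
        m->coeff[i] = 0;
    m->evaluate = sketch_evaluate_c; /* default implementation */
    /* An arch-specific init would overwrite m->evaluate here when a
     * faster version is available for the running CPU. */
}

/* Public entry point: always dispatches through the pointer. */
static double sketch_evaluate(const SketchModel *m, const double *param,
                              int order)
{
    assert(order >= 0 && order < SKETCH_MAX_VARS);
    return m->evaluate(m, param, order);
}
```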

Ronald


Re: [libav-devel] [PATCH] x86: h264: Don't use redzone in AVX h264_deblock on Win64

2013-02-22 Thread Ronald S. Bultje
Hi,

On Thu, Feb 21, 2013 at 1:11 AM, Martin Storsjö mar...@martin.st wrote:
 +%if WIN64
 +cglobal deblock_%1_luma_intra_8, 4,6,16,0x10
 +%else
  cglobal deblock_%1_luma_intra_8, 4,6,16,ARCH_X86_64*0x50-0x50
 +%endif

I believe this doesn't need a stack pointer, so you can use -0x10.

Ronald


Re: [libav-devel] [PATCH] x86: h264: Don't use redzone in AVX h264_deblock on Win64

2013-02-22 Thread Ronald S. Bultje
Hi,

On Fri, Feb 22, 2013 at 11:25 AM, Ronald S. Bultje rsbul...@gmail.com wrote:
 On Thu, Feb 21, 2013 at 1:11 AM, Martin Storsjö mar...@martin.st wrote:
 +%if WIN64
 +cglobal deblock_%1_luma_intra_8, 4,6,16,0x10
 +%else
  cglobal deblock_%1_luma_intra_8, 4,6,16,ARCH_X86_64*0x50-0x50
 +%endif

 I believe this doesn't need a stack pointer, so you can use -0x10.

Oh actually that's 32bit only, so that never makes a functional
difference, so nevermind.

Ronald


Re: [libav-devel] [PATCH] hpeldsp: x86: Convert dsputil_rnd_template to yasm

2013-02-14 Thread Ronald S. Bultje
Hi,

On Feb 14, 2013 4:59 AM, Diego Biurrun di...@biurrun.de wrote:
 On Wed, Feb 13, 2013 at 05:53:36PM -0500, Daniel Kang wrote:
  @@ -56,6 +107,44 @@ PUT_PIXELS8_X2
 
  +%macro PUT_PIXELS8_X2_MMX 0-1
  +%if %0 == 1
  +cglobal put%1_pixels8_x2, 4,4
  +%else
  +cglobal put_pixels8_x2, 4,4
  +%endif

 IIRC you don't need the %if, but you can just pass an empty
 first parameter and it should do the right thing.
 .. more below ..

MACRO 0-1  sets an empty string by default.

Ronald

[libav-devel] [PATCH] h264: integrate clear_blocks calls with IDCT.

2013-02-09 Thread Ronald S. Bultje
From: Ronald S. Bultje rsbul...@gmail.com

In case of no-transform, integrate it with put_pixels4/8(). Intra PCM
is changed to not use h->mb anymore (saves a memcpy). The one during
update_thread_context() init is removed by removing the memcpy() that
clobbered it in the first place. Together, this makes the H264 decoder
almost-independent of dsputil.

Arm assembly changes untested.
---
 libavcodec/arm/h264idct_neon.S |  20 +--
 libavcodec/get_bits.h  |   3 +-
 libavcodec/h264.c  |   7 ++-
 libavcodec/h264.h  |   1 +
 libavcodec/h264_cabac.c|   4 +-
 libavcodec/h264_cavlc.c|  10 ++--
 libavcodec/h264_mb_template.c  |  20 +++
 libavcodec/h264idct_template.c |  16 --
 libavcodec/h264pred.h  |   8 +--
 libavcodec/h264pred_template.c |  28 ++
 libavcodec/ppc/h264_altivec.c  |   3 ++
 libavcodec/svq3.c  |   2 +
 libavcodec/x86/h264_idct.asm   | 108 -
 libavcodec/x86/h264_idct_10bit.asm |  53 --
 14 files changed, 204 insertions(+), 79 deletions(-)

diff --git a/libavcodec/arm/h264idct_neon.S b/libavcodec/arm/h264idct_neon.S
index 1b349ce..73b2260 100644
--- a/libavcodec/arm/h264idct_neon.S
+++ b/libavcodec/arm/h264idct_neon.S
@@ -22,9 +22,12 @@
 
 function ff_h264_idct_add_neon, export=1
 vld1.64 {d0-d3},  [r1,:128]
+vmov.i16 q15, #0
 
 vswp d1,  d2
+vst1.16 {q15},[r1,:128]!
 vadd.i16 d4,  d0,  d1
+vst1.16 {q15},[r1,:128]!
 vshr.s16 q8,  q1,  #1
 vsub.i16 d5,  d0,  d1
 vadd.i16 d6,  d2,  d17
@@ -69,7 +72,9 @@ function ff_h264_idct_add_neon, export=1
 endfunc
 
 function ff_h264_idct_dc_add_neon, export=1
+mov r3,   #0
 vld1.16 {d2[],d3[]}, [r1,:16]
+strh r3,   [r1]
 vrshr.s16   q1,  q1,  #6
 vld1.32 {d0[0]},  [r0,:32], r2
 vld1.32 {d0[1]},  [r0,:32], r2
@@ -180,7 +185,8 @@ endfunc
 qb  .req q14
 vshr.s16 q2,  q10, #1
 vadd.i16 q0,  q8,  q12
-vld1.16 {q14-q15},[r1,:128]!
+vld1.16 {q14-q15},[r1,:128]
+vst1.16 {q3}, [r1,:128]!
 vsub.i16 q1,  q8,  q12
 vshr.s16 q3,  q14, #1
 vsub.i16 q2,  q2,  q14
@@ -259,9 +265,13 @@ endfunc
 .endm
 
 function ff_h264_idct8_add_neon, export=1
-vld1.16 {q8-q9},  [r1,:128]!
-vld1.16 {q10-q11},[r1,:128]!
-vld1.16 {q12-q13},[r1,:128]!
+vmov.i16 q7,   #0
+vld1.16 {q8-q9},  [r1,:128]
+vst1.16 {q3}, [r1,:128]!
+vld1.16 {q10-q11},[r1,:128]
+vst1.16 {q3}, [r1,:128]!
+vld1.16 {q12-q13},[r1,:128]
+vst1.16 {q3}, [r1,:128]!
 
 idct8x8_cols0
 idct8x8_cols1
@@ -313,7 +323,9 @@ function ff_h264_idct8_add_neon, export=1
 endfunc
 
 function ff_h264_idct8_dc_add_neon, export=1
+mov r3,   #0
 vld1.16 {d30[],d31[]},[r1,:16]
+strh r3,   [r1]
 vld1.32 {d0}, [r0,:64], r2
 vrshr.s16   q15, q15, #6
 vld1.32 {d1}, [r0,:64], r2
diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index 7129b17..f16a508 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -415,11 +415,12 @@ static inline int init_get_bits8(GetBitContext *s, const 
uint8_t *buffer,
 return init_get_bits(s, buffer, byte_size * 8);
 }
 
-static inline void align_get_bits(GetBitContext *s)
+static inline const uint8_t *align_get_bits(GetBitContext *s)
 {
 int n = -get_bits_count(s) & 7;
 if (n)
 skip_bits(s, n);
+return s->buffer + (s->index >> 3);
 }
 
 #define init_vlc(vlc, nb_bits, nb_codes,\
diff --git a/libavcodec/h264.c b/libavcodec/h264.c
index cfcb552..a0bf031 100644
--- a/libavcodec/h264.c
+++ b/libavcodec/h264.c
@@ -1249,7 +1249,9 @@ static int decode_update_thread_context(AVCodecContext 
*dst,
 
 // copy all fields after MpegEnc
 memcpy(&h->s + 1, &h1->s + 1,
-   sizeof(H264Context) - sizeof(MpegEncContext));
+   offsetof(H264Context, intra_gb) - sizeof(MpegEncContext));
+memcpy(&h->cabac, &h1->cabac,
+   sizeof(H264Context) - offsetof(H264Context, cabac));
 memset(h->sps_buffers, 0, sizeof(h->sps_buffers));
 memset(h->pps_buffers, 0, sizeof(h->pps_buffers));
 
@@ -1269,9 +1271,6 @@ static int decode_update_thread_context(AVCodecContext 
*dst,
 h->bipred_scratchpad = NULL;
 
 h->thread_context[0] = h;
-
-s->dsp.clear_blocks(h->mb);
-s->dsp.clear_blocks(h->mb + (24 * 16 << h->pixel_shift
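
The split memcpy() in decode_update_thread_context() above can be sketched in isolation. This is only an illustration under the assumption that the intent is "copy everything except a per-thread range of fields"; the struct and field names are hypothetical and much smaller than H264Context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context; stands in for a large decoder struct. */
typedef struct SketchCtx {
    int base;            /* handled separately by the caller */
    int width, height;   /* per-stream state: copied */
    int scratch[4];      /* per-thread state: must NOT be copied */
    int tail_a, tail_b;  /* per-stream state: copied */
} SketchCtx;

/* Copy all fields after `base` except the `scratch` range, using
 * offsetof() to delimit the two copied regions. */
static void sketch_copy_shared(SketchCtx *dst, const SketchCtx *src)
{
    const size_t head_start = offsetof(SketchCtx, width);
    const size_t skip_start = offsetof(SketchCtx, scratch);
    const size_t tail_start = offsetof(SketchCtx, tail_a);

    assert(head_start < skip_start && skip_start < tail_start);

    memcpy((char *)dst + head_start, (const char *)src + head_start,
           skip_start - head_start);
    memcpy((char *)dst + tail_start, (const char *)src + tail_start,
           sizeof(SketchCtx) - tail_start);
}
```

Because the skipped range is never overwritten, whatever the destination thread already holds there (for example an already-cleared block buffer) survives the update, so no clearing pass is needed afterwards.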

Re: [libav-devel] [PATCH] hpel: split off halfpel MC from dsputil into new DSP context.

2013-01-29 Thread Ronald S. Bultje
Hi,

On Jan 28, 2013 11:44 PM, Luca Barbato lu_z...@gentoo.org wrote:

 On 28/01/13 20:32, Ronald S. Bultje wrote:
  From: Ronald S. Bultje rsbul...@gmail.com
 
  This allows objects to use just halfpel MC without depending on all
  of dsputil. E.g. indeo3, interplayvideo and svq1dec become dsputil-
  independent. The fine-grained HAVE_HPEL_* flags allow only compiling
  a subset of the HPEL functions if a codec only uses a subset of them.
  Currently, only the C code uses this, I'll add this to the assembly
  at a later point.
 

 PPC had a mismatch in the headers int vs ptrdiff_t fixed locally.

 It needs a small rebase to fit the bfin changes,

 The rest builds fine on all the arches I could try.

Send a patch please, Anton had some more modification requests, so it can't
be entirely applied as-is.

Ronald

Re: [libav-devel] [PATCH] bfin: vp3: Separate VP3 initialization code

2013-01-28 Thread Ronald S. Bultje
Hi,

On Mon, Jan 28, 2013 at 11:01 AM, Luca Barbato lu_z...@gentoo.org wrote:
 On 28/01/13 20:00, Diego Biurrun wrote:
 On Thu, Jan 24, 2013 at 11:44:00AM +0100, Diego Biurrun wrote:
 On Wed, Jan 23, 2013 at 04:45:22PM +0100, Luca Barbato wrote:
 From: Diego Biurrun di...@biurrun.de

 Signed-off-by: Luca Barbato lu_z...@gentoo.org
 ---

 Rebased after the move to int16_t.

 En passant, adding the missing memsets as pointed by Ronald.

 Please separate those, patch LGTM otherwise.

 .. ping ..

 This conflicts with and thus holds up hpeldsp.


 I'll split it now, first memsets then the rest, ok?

No, just commit it. I'm really bright enough to do a trivial rebase.
Please review each patch on its own merits.

Ronald


Re: [libav-devel] [PATCH] x86: Simplify some arch conditionals

2013-01-28 Thread Ronald S. Bultje
Hi,

On Mon, Jan 28, 2013 at 10:53 AM, Diego Biurrun di...@biurrun.de wrote:
 ---
  libavcodec/x86/h264_qpel.c  |2 +-
  libavcodec/x86/idct_sse2_xvid.c |2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

OK.

Ronald


[libav-devel] [PATCH] dsputil: remove unused functions copy_block{2, 4, 8, 16}.

2013-01-28 Thread Ronald S. Bultje
From: Ronald S. Bultje rsbul...@gmail.com

---
 libavcodec/dsputil_template.c | 48 ---
 1 file changed, 48 deletions(-)

diff --git a/libavcodec/dsputil_template.c b/libavcodec/dsputil_template.c
index f5811c1..a8e4e0b 100644
--- a/libavcodec/dsputil_template.c
+++ b/libavcodec/dsputil_template.c
@@ -29,54 +29,6 @@
 
 #include "bit_depth_template.c"
 
-static inline void FUNC(copy_block2)(uint8_t *dst, const uint8_t *src, int 
dstStride, int srcStride, int h)
-{
-int i;
-for(i=0; i<h; i++)
-{
-AV_WN2P(dst   , AV_RN2P(src   ));
-dst+=dstStride;
-src+=srcStride;
-}
-}
-
-static inline void FUNC(copy_block4)(uint8_t *dst, const uint8_t *src, int 
dstStride, int srcStride, int h)
-{
-int i;
-for(i=0; i<h; i++)
-{
-AV_WN4P(dst   , AV_RN4P(src   ));
-dst+=dstStride;
-src+=srcStride;
-}
-}
-
-static inline void FUNC(copy_block8)(uint8_t *dst, const uint8_t *src, int 
dstStride, int srcStride, int h)
-{
-int i;
-for(i=0; i<h; i++)
-{
-AV_WN4P(dst, AV_RN4P(src));
-AV_WN4P(dst+4*sizeof(pixel), AV_RN4P(src+4*sizeof(pixel)));
-dst+=dstStride;
-src+=srcStride;
-}
-}
-
-static inline void FUNC(copy_block16)(uint8_t *dst, const uint8_t *src, int 
dstStride, int srcStride, int h)
-{
-int i;
-for(i=0; i<h; i++)
-{
-AV_WN4P(dst , AV_RN4P(src ));
-AV_WN4P(dst+ 4*sizeof(pixel), AV_RN4P(src+ 4*sizeof(pixel)));
-AV_WN4P(dst+ 8*sizeof(pixel), AV_RN4P(src+ 8*sizeof(pixel)));
-AV_WN4P(dst+12*sizeof(pixel), AV_RN4P(src+12*sizeof(pixel)));
-dst+=dstStride;
-src+=srcStride;
-}
-}
-
 /* draw the edges of width 'w' of an image of size width, height */
 //FIXME check that this is ok for mpeg4 interlaced
 static void FUNCC(draw_edges)(uint8_t *_buf, int _wrap, int width, int height, 
int w, int h, int sides)
-- 
1.7.11.3



Re: [libav-devel] [PATCH] dsputil: Fix error by not using redzone and register name

2013-01-27 Thread Ronald S. Bultje
Hi,

On Sun, Jan 27, 2013 at 1:28 PM, Derek Buitenhuis
derek.buitenh...@gmail.com wrote:
 From: Daniel Kang daniel.d.k...@gmail.com

 Signed-off-by: Derek Buitenhuis derek.buitenh...@gmail.com
 ---
  libavcodec/x86/hpeldsp.asm   |6 +++---
  libavcodec/x86/mpeg4qpel.asm |6 +++---
  2 files changed, 6 insertions(+), 6 deletions(-)

OK.

Ronald


Re: [libav-devel] [PATCH 1/5] sws: GBRP9, GBRP10 GBRP12 GBRP14 output support

2013-01-27 Thread Ronald S. Bultje
Hi,

On Sun, Jan 27, 2013 at 1:25 PM, Derek Buitenhuis
derek.buitenh...@gmail.com wrote:
 From: Michael Niedermayer michae...@gmx.at

 Signed-off-by: Michael Niedermayer michae...@gmx.at
 Signed-off-by: Derek Buitenhuis derek.buitenh...@gmail.com
 ---
  libswscale/output.c   |   28 +---
  libswscale/utils.c|   16 
  tests/ref/lavfi/pixdesc   |6 ++
  tests/ref/lavfi/pixfmts_copy  |6 ++
  tests/ref/lavfi/pixfmts_null  |6 ++
  tests/ref/lavfi/pixfmts_scale |6 ++
  tests/ref/lavfi/pixfmts_vflip |6 ++
  7 files changed, 63 insertions(+), 11 deletions(-)

OK if you fix title (no 14/12 bit bgrp here).

Ronald


Re: [libav-devel] [PATCH 2/5] sws: use planarRgbToRgbWrapper only for 8bit per component

2013-01-27 Thread Ronald S. Bultje
Hi,

On Sun, Jan 27, 2013 at 1:25 PM, Derek Buitenhuis
derek.buitenh...@gmail.com wrote:
 From: Michael Niedermayer michae...@gmx.at

 The function doesn't support >8bit currently

 Signed-off-by: Michael Niedermayer michae...@gmx.at
 Signed-off-by: Derek Buitenhuis derek.buitenh...@gmail.com
 ---
  libswscale/swscale_unscaled.c |   10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

Fine.

Ronald


Re: [libav-devel] [PATCH 3/5] sws: dont enable chrSrcHSubSample for planar RGB

2013-01-27 Thread Ronald S. Bultje
Hi,

On Sun, Jan 27, 2013 at 1:25 PM, Derek Buitenhuis
derek.buitenh...@gmail.com wrote:
 From: Michael Niedermayer michae...@gmx.at

 This code path is not implemented and makes not much sense to implement
 either.

 Signed-off-by: Michael Niedermayer michae...@gmx.at
 Signed-off-by: Derek Buitenhuis derek.buitenh...@gmail.com
 ---
  libswscale/utils.c |3 +++
  1 file changed, 3 insertions(+)

ok.

Ronald


Re: [libav-devel] [PATCH 5/5] sws: disable yuv2rgb warning for planar rgb.

2013-01-27 Thread Ronald S. Bultje
Hi,

On Sun, Jan 27, 2013 at 1:25 PM, Derek Buitenhuis
derek.buitenh...@gmail.com wrote:
 From: Michael Niedermayer michae...@gmx.at

 planar rgb formats do not use the table

 Signed-off-by: Michael Niedermayer michae...@gmx.at
 Signed-off-by: Derek Buitenhuis derek.buitenh...@gmail.com
 ---
  libswscale/yuv2rgb.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

OK.

Ronald


Re: [libav-devel] [PATCH 4/5] sws: add planar RGB formats to isAnyRGB

2013-01-27 Thread Ronald S. Bultje
Hi,

On Sun, Jan 27, 2013 at 1:25 PM, Derek Buitenhuis
derek.buitenh...@gmail.com wrote:
 From: Michael Niedermayer michae...@gmx.at

 We have to make some symmetric changes elsewhere as this increases
 the precision with which samples are stored.

 Signed-off-by: Michael Niedermayer michae...@gmx.at
 Signed-off-by: Derek Buitenhuis derek.buitenh...@gmail.com
 ---
  libswscale/input.c|6 +++---
  libswscale/swscale_internal.h |   10 +-
  tests/ref/lavfi/pixfmts_scale |   12 ++--
  3 files changed, 18 insertions(+), 10 deletions(-)

 diff --git a/libswscale/input.c b/libswscale/input.c
 index 2e8d43f..64ab0b9 100644
 --- a/libswscale/input.c
 +++ b/libswscale/input.c
 @@ -579,7 +579,7 @@ static av_always_inline void planar_rgb16_to_y(uint8_t 
 *_dst, const uint8_t *_sr
  int b = rdpx(src[1] + i);
  int r = rdpx(src[2] + i);

 -dst[i] = ((RY * r + GY * g + BY * b + (33 << (RGB2YUV_SHIFT + bpc - 
 9))) >> RGB2YUV_SHIFT);
 +dst[i] = ((RY * r + GY * g + BY * b + (33 << (RGB2YUV_SHIFT + bpc - 
 9))) >> (RGB2YUV_SHIFT + bpc - 14));

This looks wrong. The original code also btw. We're doing
(cast_to_fixed_point)(r*ry + g*gy + b*by + 16 + 0.5). I don't think
this does the correct thing now.
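
To spell out what I mean by the bias and the shift having to agree, here is a rough sketch for 8-bit input. The function name is hypothetical and the weights are BT.601 limited-range values I scaled by 2^15 for illustration, not constants copied from swscale. The bias is the +16 offset at full fixed-point scale plus half of one output LSB, so it depends on the shift that follows.

```c
#include <assert.h>

#define SKETCH_SHIFT 15

/* Illustrative BT.601 limited-range luma weights, scaled by 2^15. */
#define SKETCH_RY  8414
#define SKETCH_GY 16519
#define SKETCH_BY  3208

/* 8-bit RGB in, luma with out_bits of precision out (8..14). */
static int sketch_rgb8_to_y(int r, int g, int b, int out_bits)
{
    /* Fewer bits are shifted away when more output precision is kept. */
    const int shift = SKETCH_SHIFT - (out_bits - 8);
    /* +16 offset at full scale, plus 0.5 output LSB for rounding. */
    const int bias  = (16 << SKETCH_SHIFT) + (1 << (shift - 1));

    assert(out_bits >= 8 && out_bits <= 14);
    return (SKETCH_RY * r + SKETCH_GY * g + SKETCH_BY * b + bias) >> shift;
}
```

For out_bits == 8 the bias works out to 33 << 14, i.e. the familiar (16 + 0.5) scaled by 2^15. Changing only the shift while keeping that bias would leave a rounding term that no longer equals half an output LSB.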

 @@ -626,8 +626,8 @@ static av_always_inline void planar_rgb16_to_uv(uint8_t 
 *_dstU, uint8_t *_dstV,
  int b = rdpx(src[1] + i);
  int r = rdpx(src[2] + i);

 -dstU[i] = (RU * r + GU * g + BU * b + (257 << (RGB2YUV_SHIFT + bpc - 
 9))) >> RGB2YUV_SHIFT;
 -dstV[i] = (RV * r + GV * g + BV * b + (257 << (RGB2YUV_SHIFT + bpc - 
 9))) >> RGB2YUV_SHIFT;
 +dstU[i] = (RU * r + GU * g + BU * b + (257 << (RGB2YUV_SHIFT + bpc - 
 9))) >> (RGB2YUV_SHIFT + bpc - 14);
 +dstV[i] = (RV * r + GV * g + BV * b + (257 << (RGB2YUV_SHIFT + bpc - 
 9))) >> (RGB2YUV_SHIFT + bpc - 14);

Same. The round and shift should correspond to each other in both
cases, and now they don't.

  #define isAnyRGB(x)\
  (isRGBinInt(x)  || \
 - isBGRinInt(x))
 + isBGRinInt(x)  || \
 + (x)==AV_PIX_FMT_GBRP9LE|| \
 + (x)==AV_PIX_FMT_GBRP9BE|| \
 + (x)==AV_PIX_FMT_GBRP10LE   || \
 + (x)==AV_PIX_FMT_GBRP10BE   || \
 + (x)==AV_PIX_FMT_GBRP16LE   || \
 + (x)==AV_PIX_FMT_GBRP16BE   || \
 + (x)==AV_PIX_FMT_GBRP  \
 +)

|| isPlanarRGB(x)?
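
Roughly what I have in mind, as a self-contained sketch with a made-up format enum and flag table (these are not the swscale definitions): derive "planar RGB" from per-format flags, so new GBRP variants are covered without extending the macro each time.

```c
#include <assert.h>

/* Hypothetical format list and flags, for illustration only. */
enum SketchPixFmt {
    SKETCH_FMT_YUV420P,
    SKETCH_FMT_RGB24,
    SKETCH_FMT_GBRP,
    SKETCH_FMT_GBRP9,
    SKETCH_FMT_GBRP10,
    SKETCH_FMT_GBRP16,
    SKETCH_FMT_NB
};

#define SKETCH_FLAG_RGB    1u
#define SKETCH_FLAG_PLANAR 2u

static const unsigned sketch_fmt_flags[SKETCH_FMT_NB] = {
    [SKETCH_FMT_YUV420P] = SKETCH_FLAG_PLANAR,
    [SKETCH_FMT_RGB24]   = SKETCH_FLAG_RGB,
    [SKETCH_FMT_GBRP]    = SKETCH_FLAG_RGB | SKETCH_FLAG_PLANAR,
    [SKETCH_FMT_GBRP9]   = SKETCH_FLAG_RGB | SKETCH_FLAG_PLANAR,
    [SKETCH_FMT_GBRP10]  = SKETCH_FLAG_RGB | SKETCH_FLAG_PLANAR,
    [SKETCH_FMT_GBRP16]  = SKETCH_FLAG_RGB | SKETCH_FLAG_PLANAR,
};

static unsigned sketch_flags(enum SketchPixFmt f)
{
    assert((int)f >= 0 && (int)f < SKETCH_FMT_NB);
    return sketch_fmt_flags[f];
}

static int sketch_is_planar_rgb(enum SketchPixFmt f)
{
    const unsigned both = SKETCH_FLAG_RGB | SKETCH_FLAG_PLANAR;
    return (sketch_flags(f) & both) == both;
}

static int sketch_is_packed_rgb(enum SketchPixFmt f)
{
    const unsigned fl = sketch_flags(f);
    return (fl & SKETCH_FLAG_RGB) && !(fl & SKETCH_FLAG_PLANAR);
}

/* Counterpart of isAnyRGB(): packed RGB or any planar RGB variant. */
static int sketch_is_any_rgb(enum SketchPixFmt f)
{
    return sketch_is_packed_rgb(f) || sketch_is_planar_rgb(f);
}
```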

Ronald


Re: [libav-devel] [PATCH] x86: hpeldsp: Fix a typo, use the right register

2013-01-27 Thread Ronald S. Bultje
Hi,

On Sun, Jan 27, 2013 at 2:03 PM, Martin Storsjö mar...@martin.st wrote:
 From: Michael Niedermayer michae...@gmx.at

 This makes the code actually work.

 ---
 Allegedly.
 ---
  libavcodec/x86/hpeldsp.asm |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
 index 8afd955..12d42c7 100644
 --- a/libavcodec/x86/hpeldsp.asm
 +++ b/libavcodec/x86/hpeldsp.asm
 @@ -452,7 +452,7 @@ cglobal avg_pixels8_xy2, 4,5
  pavgb m2, [r0]
  pavgb m1, [r0+r2]
  mova   [r0], m2
 -mova [r0+r2], m2
 +mova [r0+r2], m1
  add  r0, r4
  sub r3d, 4
  jne .loop

OK.

Ronald


Re: [libav-devel] [PATCH] swscale: GBRP output support

2013-01-26 Thread Ronald S. Bultje
Hi,

On Sat, Jan 26, 2013 at 10:59 AM, Derek Buitenhuis
derek.buitenh...@gmail.com wrote:
 On 2013-01-26 1:57 PM, Diego Biurrun wrote:
 Indentation is off, see the other function declarations in that file
 for how it should look like.

 Right. Fixed locally.

 Will re-send after more review.

Functionally lgtm.

Ronald


Re: [libav-devel] [PATCH v2] dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm

2013-01-26 Thread Ronald S. Bultje
Hi,

On Sat, Jan 26, 2013 at 10:01 AM, Daniel Kang daniel.d.k...@gmail.com wrote:
 On Sat, Jan 26, 2013 at 3:23 AM, Diego Biurrun di...@biurrun.de wrote:
 On Sat, Jan 26, 2013 at 12:32:16AM -0500, Daniel Kang wrote:
 --- a/libavcodec/x86/dsputil.asm
 +++ b/libavcodec/x86/dsputil.asm
 @@ -879,3 +884,984 @@ cglobal avg_pixels16, 4,5,4
 +
 +; HPEL mmxext
 +%macro PAVGB_OP 2
 +%if cpuflag(3dnow)
 +pavgusb %1, %2
 +%else
 +pavgb   %1, %2
 +%endif
 +%endmacro

 We have a macro for this in x86util.asm and it works the other way around.
 I'm very suspicious of this doing the right thing on CPUs with mmxext and
 3dnow ...

 You're probably right. Fixed.

 +; mpeg4 qpel
 +
 +%macro MPEG4_QPEL16_H_LOWPASS 1
 +cglobal %1_mpeg4_qpel16_h_lowpass, 5, 5, 0, 8

 So it seems like dsputil.asm is becoming the new dumping ground for
 functions of all kind.  It doubles in size after your patch and at
 around 2k lines it starts to work against our current efforts of
 splitting dsputil into sensibly-sized pieces.  If you continue your
 porting efforts, it will probably end up around 5k lines or so.

 Whenever there is an opportunity to make dsputil less monolithic comes
 up, we should exploit it.  That seems to be the case here.

 I was trying to avoid drama and bikeshedding re: file names and save
 that for another patch. I guess I could split it in this patch if you
 want.

While at it, please split hpel functions to a new file called
hpeldsp.asm. This will make my life slightly easier later on.

Ronald


Re: [libav-devel] [PATCH] dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm

2013-01-26 Thread Ronald S. Bultje
Hi,

On Sat, Jan 26, 2013 at 4:58 PM, Daniel Kang daniel.d.k...@gmail.com wrote:
 ---
 Make hpeldsp.asm
 ---
  libavcodec/x86/Makefile   |2 +
  libavcodec/x86/dsputil.asm|  142 ++
  libavcodec/x86/dsputil_avg_template.c |  789 ++---
  libavcodec/x86/dsputil_mmx.c  |  874 
 -
  libavcodec/x86/hpeldsp.asm|  465 ++
  libavcodec/x86/mpeg4qpel.asm  |  422 
  libavcodec/x86/vc1dsp_mmx.c   |4 +
  7 files changed, 1386 insertions(+), 1312 deletions(-)
  create mode 100644 libavcodec/x86/hpeldsp.asm
  create mode 100644 libavcodec/x86/mpeg4qpel.asm

 diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
 index 9b8b653..1feb060 100644
 --- a/libavcodec/x86/Makefile
 +++ b/libavcodec/x86/Makefile
 @@ -71,3 +71,5 @@ YASM-OBJS-$(CONFIG_VP8_DECODER)+= x86/vp8dsp.o
  YASM-OBJS  += x86/dsputil.o \
x86/deinterlace.o \
x86/fmtconvert.o  \
 +  x86/hpeldsp.o \
 +  x86/mpeg4qpel.o   \
 diff --git a/libavcodec/x86/dsputil.asm b/libavcodec/x86/dsputil.asm
 index 65f4b37..7953a5d 100644
 --- a/libavcodec/x86/dsputil.asm
 +++ b/libavcodec/x86/dsputil.asm
 @@ -22,6 +22,11 @@
  %include "libavutil/x86/x86util.asm"

  SECTION_RODATA
 +cextern pb_1
 +cextern pw_3
 +cextern pw_15
 +cextern pw_16
 +cextern pw_20
  pb_f: times 16 db 15
  pb_: times 8 db -1
  pb_7: times 8 db 7
 @@ -879,3 +884,140 @@ cglobal avg_pixels16, 4,5,4
  lea  r0, [r0+r2*4]
  jnz   .loop
  REP_RET
 +
 +
 +; put_no_rnd_pixels8_l2(uint8_t *dst, uint8_t *src1, uint8_t *src2, int 
 dstStride, int src1Stride, int h)
 +%macro PUT_NO_RND_PIXELS8_L2 0
 +cglobal put_no_rnd_pixels8_l2, 6,6

I believe these are only used in mpeg4 qpel, so they can also be moved
to mpeg4qpel.asm.

Otherwise looks pretty good.

Ronald


Re: [libav-devel] [PATCH 1/1] h264: copy h264qpel dsp context to slice thread copies

2013-01-24 Thread Ronald S. Bultje
Hi,

On Thu, Jan 24, 2013 at 7:56 AM, Janne Grunau janne-li...@jannau.net wrote:
 ---
  libavcodec/h264.c | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/libavcodec/h264.c b/libavcodec/h264.c
 index 9e9384b..38a6f5e 100644
 --- a/libavcodec/h264.c
 +++ b/libavcodec/h264.c
 @@ -2556,6 +2556,7 @@ static int h264_slice_header_init(H264Context *h, int 
 reinit)
  memcpy(c, h->s.thread_context[i], sizeof(MpegEncContext));
  memset(&c->s + 1, 0, sizeof(H264Context) - 
 sizeof(MpegEncContext));
  c->h264dsp = h->h264dsp;
 +c->h264qpel = h->h264qpel;

OK.

Ronald


[libav-devel] [PATCH] Move H264/QPEL specific asm from dsputil.asm to h264_qpel_*.asm.

2013-01-24 Thread Ronald S. Bultje
From: Ronald S. Bultje rsbul...@gmail.com

---
 libavcodec/x86/dsputil.asm| 188 --
 libavcodec/x86/h264_qpel_8bit.asm | 169 ++
 2 files changed, 169 insertions(+), 188 deletions(-)

diff --git a/libavcodec/x86/dsputil.asm b/libavcodec/x86/dsputil.asm
index 9bc6e3f..8002779 100644
--- a/libavcodec/x86/dsputil.asm
+++ b/libavcodec/x86/dsputil.asm
@@ -648,191 +648,3 @@ BSWAP32_BUF
 
 INIT_XMM ssse3
 BSWAP32_BUF
-
-%macro op_avgh 3
-movh   %3, %2
-pavgb  %1, %3
-movh   %2, %1
-%endmacro
-
-%macro op_avg 2
-pavgb  %1, %2
-mova   %2, %1
-%endmacro
-
-%macro op_puth 2-3
-movh   %2, %1
-%endmacro
-
-%macro op_put 2
-mova   %2, %1
-%endmacro
-
-; void pixels4_l2_mmxext(uint8_t *dst, uint8_t *src1, uint8_t *src2, int 
dstStride, int src1Stride, int h)
-%macro PIXELS4_L2 1
-%define OP op_%1h
-cglobal %1_pixels4_l2, 6,6
-movsxdifnidn r3, r3d
-movsxdifnidn r4, r4d
-test r5d, 1
-je .loop
-movd m0, [r1]
-movd m1, [r2]
-add  r1, r4
-add  r2, 4
-pavgb m0, m1
-OP   m0, [r0], m3
-add  r0, r3
-dec r5d
-.loop:
-mova m0, [r1]
-mova m1, [r1+r4]
-lea  r1, [r1+2*r4]
-pavgb m0, [r2]
-pavgb m1, [r2+4]
-OP   m0, [r0], m3
-OP   m1, [r0+r3], m3
-lea  r0, [r0+2*r3]
-mova m0, [r1]
-mova m1, [r1+r4]
-lea  r1, [r1+2*r4]
-pavgb m0, [r2+8]
-pavgb m1, [r2+12]
-OP   m0, [r0], m3
-OP   m1, [r0+r3], m3
-lea  r0, [r0+2*r3]
-add  r2, 16
-sub r5d, 4
-jne   .loop
-REP_RET
-%endmacro
-
-INIT_MMX mmxext
-PIXELS4_L2 put
-PIXELS4_L2 avg
-
-; void pixels8_l2_mmxext(uint8_t *dst, uint8_t *src1, uint8_t *src2, int 
dstStride, int src1Stride, int h)
-%macro PIXELS8_L2 1
-%define OP op_%1
-cglobal %1_pixels8_l2, 6,6
-movsxdifnidn r3, r3d
-movsxdifnidn r4, r4d
-test r5d, 1
-je .loop
-mova m0, [r1]
-mova m1, [r2]
-add  r1, r4
-add  r2, 8
-pavgb m0, m1
-OP   m0, [r0]
-add  r0, r3
-dec r5d
-.loop:
-mova m0, [r1]
-mova m1, [r1+r4]
-lea  r1, [r1+2*r4]
-pavgb m0, [r2]
-pavgb m1, [r2+8]
-OP   m0, [r0]
-OP   m1, [r0+r3]
-lea  r0, [r0+2*r3]
-mova m0, [r1]
-mova m1, [r1+r4]
-lea  r1, [r1+2*r4]
-pavgb m0, [r2+16]
-pavgb m1, [r2+24]
-OP   m0, [r0]
-OP   m1, [r0+r3]
-lea  r0, [r0+2*r3]
-add  r2, 32
-sub r5d, 4
-jne   .loop
-REP_RET
-%endmacro
-
-INIT_MMX mmxext
-PIXELS8_L2 put
-PIXELS8_L2 avg
-
-; void pixels16_l2_mmxext(uint8_t *dst, uint8_t *src1, uint8_t *src2, int 
dstStride, int src1Stride, int h)
-%macro PIXELS16_L2 1
-%define OP op_%1
-cglobal %1_pixels16_l2, 6,6
-movsxdifnidn r3, r3d
-movsxdifnidn r4, r4d
-test r5d, 1
-je .loop
-mova m0, [r1]
-mova m1, [r1+8]
-pavgb m0, [r2]
-pavgb m1, [r2+8]
-add  r1, r4
-add  r2, 16
-OP   m0, [r0]
-OP   m1, [r0+8]
-add  r0, r3
-dec r5d
-.loop:
-mova m0, [r1]
-mova m1, [r1+8]
-add  r1, r4
-pavgb m0, [r2]
-pavgb m1, [r2+8]
-OP   m0, [r0]
-OP   m1, [r0+8]
-add  r0, r3
-mova m0, [r1]
-mova m1, [r1+8]
-add  r1, r4
-pavgb m0, [r2+16]
-pavgb m1, [r2+24]
-OP   m0, [r0]
-OP   m1, [r0+8]
-add  r0, r3
-add  r2, 32
-sub r5d, 2
-jne   .loop
-REP_RET
-%endmacro
-
-INIT_MMX mmxext
-PIXELS16_L2 put
-PIXELS16_L2 avg
-
-INIT_MMX mmxext
-; void pixels(uint8_t *block, const uint8_t *pixels, int line_size, int h)
-%macro PIXELS48 2
-%if %2 == 4
-%define OP movh
-%else
-%define OP mova
-%endif
-cglobal %1_pixels%2, 4,5
-movsxdifnidn r2, r2d
-lea  r4, [r2*3]
-.loop:
-OP   m0, [r1]
-OP   m1, [r1+r2]
-OP   m2, [r1+r2*2]
-OP   m3, [r1+r4]
-lea  r1, [r1+r2*4]
-%ifidn %1, avg
-pavgb m0, [r0]
-pavgb m1, [r0+r2]
-pavgb m2, [r0+r2*2]
-pavgb m3, [r0+r4]
-%endif
-OP [r0], m0
-OP  [r0+r2], m1
-OP [r0+r2*2], m2
-OP  [r0+r4], m3
-sub r3d, 4
-lea  r0, [r0+r2*4]
-jne   .loop
-RET
-%endmacro
-
-PIXELS48 put, 4
-PIXELS48 avg, 4
-PIXELS48 put, 8
-PIXELS48 avg, 8
diff --git a/libavcodec/x86/h264_qpel_8bit.asm 
b/libavcodec/x86

Re: [libav-devel] [PATCH] bfin: vp3: Separate VP3 initialization code

2013-01-23 Thread Ronald S. Bultje
Hi,

On Wed, Jan 23, 2013 at 7:45 AM, Luca Barbato lu_z...@gentoo.org wrote:
 From: Diego Biurrun di...@biurrun.de

 Signed-off-by: Luca Barbato lu_z...@gentoo.org
 ---

 Rebased after the move to int16_t.

 En passant, adding the missing memsets as pointed by Ronald.

  libavcodec/bfin/Makefile   |  4 ++--
  libavcodec/bfin/dsputil_bfin.c |  8 +---
  libavcodec/bfin/vp3_bfin.c | 13 -
  libavcodec/vp3dsp.c|  2 ++
  libavcodec/vp3dsp.h|  1 +
  5 files changed, 18 insertions(+), 10 deletions(-)

TY, should be good now.

Ronald


Re: [libav-devel] [PATCH v2] dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm

2013-01-23 Thread Ronald S. Bultje
Hi Daniel,

On Tue, Jan 22, 2013 at 11:19 PM, Daniel Kang daniel.d.k...@gmail.com wrote:
 @@ -1330,10 +1087,12 @@ static void OPNAME ## qpel8_mc12_ ## MMX(uint8_t 
 *dst, uint8_t *src,\
  {   \
  uint64_t half[8 + 9];   \
  uint8_t * const halfH = ((uint8_t*)half);   \
 -put ## RND ## mpeg4_qpel8_h_lowpass_ ## MMX(halfH, src, 8,  \
 -stride, 9); \
 -put ## RND ## pixels8_l2_ ## MMX(halfH, src, halfH, 8, stride, 9);  \
 -OPNAME ## mpeg4_qpel8_v_lowpass_ ## MMX(dst, halfH, stride, 8); \
 +ff_put ## RND ## mpeg4_qpel8_h_lowpass_ ## MMX(halfH, src, 8,   \
 +   stride, 9);  \
 +ff_put ## RND ## pixels8_l2_ ## MMX(halfH, src, halfH,  \
 +8, stride, 9);  \
 +ff_ ## OPNAME ## mpeg4_qpel8_v_lowpass_ ## MMX(dst, halfH,  \
 +   stride, 8);  \
  }   \

So, for all cases like this, does this actually affect speed? I mean,
previously this could be inlined, now it no longer can be. I wonder if
that has any effect on speed (i.e. was it ever inlined previously?).

Ronald


Re: [libav-devel] [PATCH v2] dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm

2013-01-23 Thread Ronald S. Bultje
Hi,

On Wed, Jan 23, 2013 at 1:16 PM, Daniel Kang daniel.d.k...@gmail.com wrote:
 On Wed, Jan 23, 2013 at 4:14 PM, Daniel Kang daniel.d.k...@gmail.com wrote:
 On Wed, Jan 23, 2013 at 12:36 PM, Ronald S. Bultje rsbul...@gmail.com 
 wrote:
 Hi Daniel,

 On Tue, Jan 22, 2013 at 11:19 PM, Daniel Kang daniel.d.k...@gmail.com 
 wrote:
 @@ -1330,10 +1087,12 @@ static void OPNAME ## qpel8_mc12_ ## MMX(uint8_t 
 *dst, uint8_t *src,\
  {   \
  uint64_t half[8 + 9];   \
  uint8_t * const halfH = ((uint8_t*)half);   \
 -put ## RND ## mpeg4_qpel8_h_lowpass_ ## MMX(halfH, src, 8,  \
 -stride, 9); \
 -put ## RND ## pixels8_l2_ ## MMX(halfH, src, halfH, 8, stride, 9);  \
 -OPNAME ## mpeg4_qpel8_v_lowpass_ ## MMX(dst, halfH, stride, 8); \
 +ff_put ## RND ## mpeg4_qpel8_h_lowpass_ ## MMX(halfH, src, 8,   \
 +   stride, 9);  \
 +ff_put ## RND ## pixels8_l2_ ## MMX(halfH, src, halfH,  \
 +8, stride, 9);  \
 +ff_ ## OPNAME ## mpeg4_qpel8_v_lowpass_ ## MMX(dst, halfH,  \
 +   stride, 8);  \
  }   \

 So, for all cases like this, does this actually affect speed? I mean,
 previously this could be inlined, now it no longer can be. I wonder if
 that has any effect on speed (i.e. was it ever inlined previously?).

 Depending on the architecture (??) the functions are inlined, but are
 often not. I suspect GCC's insane method of reordering registers
 swallows any overhead from calling these functions, but due to macro
 hell, I'm not sure of the best way to test this.

 Sorry, this was not very clear. I think the yasm version is faster
 despite calling overhead, because GCC uses some ridiculous method of
 reordering registers for the inline assembly.

Do you have numbers?

Ronald


Re: [libav-devel] [PATCH] dsputil: remove avg_no_rnd_pixels8.

2013-01-22 Thread Ronald S. Bultje
Hi,

On Tue, Jan 22, 2013 at 4:00 AM, Diego Biurrun di...@biurrun.de wrote:
 On Mon, Jan 21, 2013 at 06:02:38PM -0800, Ronald S. Bultje wrote:

 --- a/libavcodec/dsputil.h
 +++ b/libavcodec/dsputil.h
 @@ -281,15 +281,15 @@ typedef struct DSPContext {

  /**
   * Halfpel motion compensation with no rounding (a+b)1.
 - * this is an array[2][4] of motion compensation functions for 2
 - * horizontal blocksizes (8,16) and the 4 halfpel positions<br>
 - * *pixels_tab[ 0->16xH 1->8xH ][ xhalfpel + 2*yhalfpel ]
 + * this is an array[4] of motion compensation functions for 1
 + * horizontal blocksizes (16) and the 4 halfpel positions<br>
 + * *pixels_tab[0][ xhalfpel + 2*yhalfpel ]

 one horizontal blocksize_

 -op_pixels_func avg_no_rnd_pixels_tab[4][4];
 +op_pixels_func avg_no_rnd_pixels_tab[1][4];

 Why do you keep this array two-dimensional?

This is currently stuck in dsputil's macro mess. I'm looking into ways
of fixing that (while also fixing some other oddities) but I'm not
quite ready with that yet. Basically, it will be fixed in a later
commit.
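
For readers who don't know the macro mess in question, here is a minimal sketch of why a table with only one block size can end up staying two-dimensional. It is modelled loosely on the dspfunc1/hpel_funcs style of init macro; Ctx, FILL_TAB and the dummy functions are hypothetical names for illustration, not dsputil code.

```c
#include <assert.h>

typedef int (*op_func)(void);

/* Hypothetical stand-ins for pixel functions; each just returns an id. */
static int put_pixels16_c(void)           { return 160;  }
static int put_pixels16_x2_c(void)        { return 161;  }
static int put_pixels8_c(void)            { return 80;   }
static int put_pixels8_x2_c(void)         { return 81;   }
static int avg_no_rnd_pixels16_c(void)    { return 1160; }
static int avg_no_rnd_pixels16_x2_c(void) { return 1161; }

typedef struct Ctx {
    op_func put_pixels_tab[2][2];        /* two block sizes (16, 8) */
    op_func avg_no_rnd_pixels_tab[1][2]; /* one block size, still 2-D */
} Ctx;

/* One macro fills every table by pasting the prefix and block size into
 * both the table name and the function name. Because it always writes
 * tab[IDX][n], every table needs the [IDX] dimension, even if IDX is
 * only ever 0. */
#define FILL_TAB(c, PFX, IDX, NUM)                                        \
    do {                                                                  \
        (c)->PFX ## _pixels_tab[IDX][0] = PFX ## _pixels ## NUM ## _c;    \
        (c)->PFX ## _pixels_tab[IDX][1] = PFX ## _pixels ## NUM ## _x2_c; \
    } while (0)

static void ctx_init(Ctx *c)
{
    FILL_TAB(c, put,        0, 16);
    FILL_TAB(c, put,        1,  8);
    FILL_TAB(c, avg_no_rnd, 0, 16);
}
```

Flattening avg_no_rnd_pixels_tab to [2] in this sketch would mean either a second macro or passing the index expression as a macro argument.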

Ronald


Re: [libav-devel] [PATCH] Drop DCTELEM typedef

2013-01-22 Thread Ronald S. Bultje
Hi,

On Tue, Jan 22, 2013 at 1:53 AM, Diego Biurrun di...@biurrun.de wrote:
 On Mon, Jan 21, 2013 at 06:18:22PM -0800, Ronald S. Bultje wrote:
 On Mon, Jan 21, 2013 at 4:04 PM, Diego Biurrun di...@biurrun.de wrote:
  It does not help as an abstraction and adds dsputil dependencies.

 I like the commit. I do want to add, though, that you're not actually
 practically removing the dsputil dependency from a lot of files (at
 build time), even though the dependency is (in a code-sense) no longer
 there. Examples are in vp3.c or vp8.c, but there's likely more.

 Your comment puzzles me.  vp3.c directly uses DSPContext, vp8.c has no
 dependency on dsputil, before or after my patch ...
[..blah..]

$ grep dsputil\.h ../libavcodec/vp*dsp.h
../libavcodec/vp3dsp.h:#include "dsputil.h"
../libavcodec/vp8dsp.h:#include "dsputil.h"

Ronald


Re: [libav-devel] [PATCH] dsputil: remove avg_no_rnd_pixels8.

2013-01-22 Thread Ronald S. Bultje
Hi,

On Tue, Jan 22, 2013 at 8:59 AM, Diego Elio Pettenò
flamee...@flameeyes.eu wrote:
 On 22/01/2013 03:02, Ronald S. Bultje wrote:

 This is never used.

 This has a strange effect on the other avg_pixels8_* functions, me and
 Luca have been looking into it today — it's not bad, but if we can
 stagger this a moment, we might be able to figure it out properly.

Can you explain what you mean by "strange effect"?

Ronald


Re: [libav-devel] [PATCH] Drop DCTELEM typedef

2013-01-22 Thread Ronald S. Bultje
Hi,

On Tue, Jan 22, 2013 at 7:49 AM, Diego Biurrun di...@biurrun.de wrote:
 On Tue, Jan 22, 2013 at 07:12:21AM -0800, Ronald S. Bultje wrote:
 On Tue, Jan 22, 2013 at 1:53 AM, Diego Biurrun di...@biurrun.de wrote:
  On Mon, Jan 21, 2013 at 06:18:22PM -0800, Ronald S. Bultje wrote:
  On Mon, Jan 21, 2013 at 4:04 PM, Diego Biurrun di...@biurrun.de wrote:
   It does not help as an abstraction and adds dsputil dependencies.
 
  I like the commit. I do want to add, though, that you're not actually
  practically removing the dsputil dependency from a lot of files (at
  build time), even though the dependency is (in a code-sense) no longer
  there. Examples are in vp3.c or vp8.c, but there's likely more.
 
  Your comment puzzles me.  vp3.c directly uses DSPContext, vp8.c has no
  dependency on dsputil, before or after my patch ...
 [..blah..]

 $ grep dsputil\.h ../libavcodec/vp*dsp.h
 ../libavcodec/vp3dsp.h:#include "dsputil.h"
 ../libavcodec/vp8dsp.h:#include "dsputil.h"

 So it's a game of showing shell output; here's mine:

 $ grep dsputil\.h libavcodec/vp*dsp.h
 libavcodec/vp3dsp.h:#include "dsputil.h"
 libavcodec/vp8dsp.h:#include "dsputil.h"
 $ git cherry-pick c2567e6c6771a6a5bd66762e486eae0cd608f7a4
 Finished one cherry-pick.
 [test a10ebd3] Drop DCTELEM typedef
  163 files changed, 835 insertions(+), 812 deletions(-)
 $ grep dsputil\.h libavcodec/vp*dsp.h
 $ git log -n 1 --oneline c2567e6c6771a6a5bd66762e486eae0cd608f7a4 | cat
 c2567e6 Drop DCTELEM typedef

Very well then.

Ronald


[libav-devel] [PATCH] dsputil: remove 9/10 bits hpel functions.

2013-01-22 Thread Ronald S. Bultje
From: Ronald S. Bultje rsbul...@gmail.com

These are never used.
---
 libavcodec/dsputil.c  | 31 -
 libavcodec/dsputil_template.c | 64 ---
 2 files changed, 54 insertions(+), 41 deletions(-)

diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
index 7bead1d..a306583 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/dsputil.c
@@ -2689,6 +2689,24 @@ av_cold void ff_dsputil_init(DSPContext* c, 
AVCodecContext *avctx)
 c->shrink[2]= ff_shrink44;
 c->shrink[3]= ff_shrink88;
 
+#define hpel_funcs(prefix, idx, num) \
+c->prefix ## _pixels_tab idx [0] = prefix ## _pixels ## num ## _8_c; \
+c->prefix ## _pixels_tab idx [1] = prefix ## _pixels ## num ## _x2_8_c; \
+c->prefix ## _pixels_tab idx [2] = prefix ## _pixels ## num ## _y2_8_c; \
+c->prefix ## _pixels_tab idx [3] = prefix ## _pixels ## num ## _xy2_8_c
+
+hpel_funcs(put, [0], 16);
+hpel_funcs(put, [1],  8);
+hpel_funcs(put, [2],  4);
+hpel_funcs(put, [3],  2);
+hpel_funcs(put_no_rnd, [0], 16);
+hpel_funcs(put_no_rnd, [1],  8);
+hpel_funcs(avg, [0], 16);
+hpel_funcs(avg, [1],  8);
+hpel_funcs(avg, [2],  4);
+hpel_funcs(avg, [3],  2);
+hpel_funcs(avg_no_rnd,[0], 16);
+
 #undef FUNC
 #undef FUNCC
 #define FUNC(f, depth) f ## _ ## depth
@@ -2718,7 +2736,6 @@ av_cold void ff_dsputil_init(DSPContext* c, 
AVCodecContext *avctx)
 c->PFX ## _pixels_tab[IDX][14] = FUNCC(PFX ## NUM ## _mc23, depth);\
 c->PFX ## _pixels_tab[IDX][15] = FUNCC(PFX ## NUM ## _mc33, depth)
 
-
 #define BIT_DEPTH_FUNCS(depth, dct)\
 c->get_pixels= FUNCC(get_pixels   ## dct   , depth);\
 c->draw_edges= FUNCC(draw_edges, depth);\
@@ -2734,18 +2751,6 @@ av_cold void ff_dsputil_init(DSPContext* c, 
AVCodecContext *avctx)
 c->avg_h264_chroma_pixels_tab[1] = FUNCC(avg_h264_chroma_mc4   , depth);\
 c->avg_h264_chroma_pixels_tab[2] = FUNCC(avg_h264_chroma_mc2   , depth);\
 \
-dspfunc1(put   , 0, 16, depth);\
-dspfunc1(put   , 1,  8, depth);\
-dspfunc1(put   , 2,  4, depth);\
-dspfunc1(put   , 3,  2, depth);\
-dspfunc1(put_no_rnd, 0, 16, depth);\
-dspfunc1(put_no_rnd, 1,  8, depth);\
-dspfunc1(avg   , 0, 16, depth);\
-dspfunc1(avg   , 1,  8, depth);\
-dspfunc1(avg   , 2,  4, depth);\
-dspfunc1(avg   , 3,  2, depth);\
-dspfunc1(avg_no_rnd, 0, 16, depth);\
-\
 dspfunc2(put_h264_qpel, 0, 16, depth);\
 dspfunc2(put_h264_qpel, 1,  8, depth);\
 dspfunc2(put_h264_qpel, 2,  4, depth);\
diff --git a/libavcodec/dsputil_template.c b/libavcodec/dsputil_template.c
index bd5c48b..c1199db 100644
--- a/libavcodec/dsputil_template.c
+++ b/libavcodec/dsputil_template.c
@@ -197,15 +197,7 @@ DCTELEM_FUNCS(DCTELEM, _16)
 DCTELEM_FUNCS(dctcoef, _32)
 #endif
 
-#define PIXOP2(OPNAME, OP) \
-static void FUNCC(OPNAME ## _pixels2)(uint8_t *block, const uint8_t *pixels, 
int line_size, int h){\
-int i;\
-for(i=0; i<h; i++){\
-OP(*((pixel2*)(block  )), AV_RN2P(pixels  ));\
-pixels+=line_size;\
-block +=line_size;\
-}\
-}\
+#define PIXOP3(OPNAME, OP) \
 static void FUNCC(OPNAME ## _pixels4)(uint8_t *block, const uint8_t *pixels, 
int line_size, int h){\
 int i;\
 for(i=0; i<h; i++){\
@@ -227,20 +219,6 @@ static inline void FUNCC(OPNAME ## 
_no_rnd_pixels8)(uint8_t *block, const uint8_
 FUNCC(OPNAME ## _pixels8)(block, pixels, line_size, h);\
 }\
 \
-static inline void FUNC(OPNAME ## _no_rnd_pixels8_l2)(uint8_t *dst, const 
uint8_t *src1, const uint8_t *src2, int dst_stride, \
-int src_stride1, int 
src_stride2, int h){\
-int i;\
-for(i=0; i<h; i++){\
-pixel4 a,b;\
-a= AV_RN4P(&src1[i*src_stride1  ]);\
-b= AV_RN4P(&src2[i*src_stride2  ]);\
-OP(*((pixel4*)&dst[i*dst_stride  ]), no_rnd_avg_pixel4(a, b));\
-a= AV_RN4P(&src1[i*src_stride1+4*sizeof(pixel)]);\
-b= AV_RN4P(&src2[i*src_stride2+4*sizeof(pixel)]);\
-OP(*((pixel4*)&dst[i*dst_stride+4*sizeof(pixel)]), no_rnd_avg_pixel4(a, b));\
-}\
-}\
-\
 static inline void FUNC(OPNAME ## _pixels8_l2)(uint8_t *dst, const uint8_t 
*src1, const uint8_t *src2, int dst_stride, \
 int src_stride1, int 
src_stride2, int h){\
 int i;\
@@ -283,6 +261,36 @@ static inline void FUNC(OPNAME ## _pixels16_l2)(uint8_t 
*dst, const uint8_t *src
 FUNC(OPNAME ## _pixels8_l2)(dst+8*sizeof(pixel), src1+8*sizeof(pixel), 
src2+8*sizeof(pixel), dst_stride, src_stride1, src_stride2, h);\
 }\
 \
+CALL_2X_PIXELS(FUNCC(OPNAME ## _pixels16), FUNCC(OPNAME ## _pixels8), 
8*sizeof(pixel))
+
+#define PIXOP4(OPNAME, OP) \
+static void FUNCC(OPNAME ## _pixels2)(uint8_t *block, const uint8_t *pixels, 
int line_size, int h){\
+int i;\
+for(i=0; i<h; i++){\
+OP(*((pixel2*)(block  )), AV_RN2P(pixels

[libav-devel] [PATCH] vp3dsp: don't do aligned reads on input.

2013-01-22 Thread Ronald S. Bultje
From: Ronald S. Bultje rsbul...@gmail.com

The input is not guaranteed to be aligned.
---
 libavcodec/vp3dsp.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
index 1883099..0ce6b81 100644
--- a/libavcodec/vp3dsp.c
+++ b/libavcodec/vp3dsp.c
@@ -282,11 +282,11 @@ static void put_no_rnd_pixels_l2(uint8_t *dst, const 
uint8_t *src1,
 for (i = 0; i < h; i++) {
 uint32_t a, b;
 
-a = AV_RN32A(&src1[i * stride]);
-b = AV_RN32A(&src2[i * stride]);
+a = AV_RN32(&src1[i * stride]);
+b = AV_RN32(&src2[i * stride]);
 AV_WN32A(&dst[i * stride], no_rnd_avg32(a, b));
-a = AV_RN32A(&src1[i * stride + 4]);
-b = AV_RN32A(&src2[i * stride + 4]);
+a = AV_RN32(&src1[i * stride + 4]);
+b = AV_RN32(&src2[i * stride + 4]);
 AV_WN32A(&dst[i * stride + 4], no_rnd_avg32(a, b));
 }
 }
-- 
1.8.0
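
As a rough illustration of what the patch above is about, here is a standalone sketch of an 8xH no-rounding average whose source reads make no alignment assumption. It uses memcpy for the loads and stores, which is a swapped-in technique and not how AV_RN32/AV_WN32A are implemented. The helper names (load32, store32, the _sketch suffixes) are made up for this example. The averaging expression is the usual packed-byte floor average.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned-safe 32-bit load: memcpy makes no alignment assumption. */
static uint32_t load32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Unaligned-safe 32-bit store (the real code may assume dst is aligned). */
static void store32(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* Per-byte floor((a + b) / 2) on four packed bytes. Masking off the low
 * bit of each byte before the shift keeps bits from leaking between
 * byte lanes. */
static uint32_t no_rnd_avg32_sketch(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* 8xH average of two sources with no rounding; src1/src2 may be unaligned. */
static void put_no_rnd_pixels_l2_sketch(uint8_t *dst, const uint8_t *src1,
                                        const uint8_t *src2,
                                        ptrdiff_t stride, int h)
{
    assert(h >= 0);
    for (int i = 0; i < h; i++) {
        for (int x = 0; x < 8; x += 4) {
            uint32_t a = load32(src1 + i * stride + x);
            uint32_t b = load32(src2 + i * stride + x);
            store32(dst + i * stride + x, no_rnd_avg32_sketch(a, b));
        }
    }
}
```

Since the average is computed lane by lane, the result does not depend on byte order.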



Re: [libav-devel] [PATCH 2/2] Separate h264 qpel from dsputil

2013-01-22 Thread Ronald S. Bultje
Hi,

On Fri, Jan 18, 2013 at 2:37 PM, Diego Biurrun di...@biurrun.de wrote:
[..]

This patch doesn't convert sh4 and ppc. I can do ppc, I don't have
access to a sh4 cross-compilation environment.

Ronald


Re: [libav-devel] [PATCH 2/2] Separate h264 qpel from dsputil

2013-01-22 Thread Ronald S. Bultje
Hi,

On Tue, Jan 22, 2013 at 8:49 PM, Ronald S. Bultje rsbul...@gmail.com wrote:
 Hi,

 On Fri, Jan 18, 2013 at 2:37 PM, Diego Biurrun di...@biurrun.de wrote:
 [..]

 This patch doesn't convert sh4 and ppc. I can do ppc, I don't have
 access to a sh4 cross-compilation environment.

And arm also. I've just fixed ppc, I'll fix arm in a little while. I don't
know what to do with sh4. Someone appears to be hosting a qemu-based
sh4 fate instance, so it is possible to test it without owning the
proper hardware. Anyone fancy trying to make that one work?

Ronald


Re: [libav-devel] [PATCH 2/2] Separate h264 qpel from dsputil

2013-01-22 Thread Ronald S. Bultje
Hi,

On Tue, Jan 22, 2013 at 10:26 PM, Ronald S. Bultje rsbul...@gmail.com wrote:
 Hi,

 On Tue, Jan 22, 2013 at 8:49 PM, Ronald S. Bultje rsbul...@gmail.com wrote:
 Hi,

 On Fri, Jan 18, 2013 at 2:37 PM, Diego Biurrun di...@biurrun.de wrote:
 [..]

 This patch doesn't convert sh4 and ppc. I can do ppc, I don't have
 access to a sh4 cross-compilation environment.

 And arm also. I've just fixed ppc, I'll fix arm in a little while. I don't
 know what to do with sh4. Someone appears to be hosting a qemu-based
 sh4 fate instance, so it is possible to test it without owning the
 proper hardware. Anyone fancy trying to make that one work?

Top patch in https://github.com/rbultje/ffmpeg/commits/wmv2dsp has ppc
(runtime-tested w/ and w/o altivec) and arm (compiletime-tested w/ and
w/o neon), and of course also tested on x86-32/64.

As for sh4, I had a look, and I don't get it. It's an almost literal
copy of some ages-old copy of the qpel C functions with some slight
modifications to do aligned reads and minor other tricks. Doesn't the
C code do some of this itself nowadays (AV_RN32A vs AV_RN32)? Some
code in sh4/qpel.c even still has _c suffixes (such as, no really,
gmc1_c, some mspel functions, etc.).

I guess what I'm saying is, it can be made to work, but I can't test
it and I'm not sure I see the point.

Ronald


Re: [libav-devel] [PATCH] bfin: vp3: Separate VP3 initialization code from general dsputil code

2013-01-21 Thread Ronald S. Bultje
Hi,

On Mon, Jan 21, 2013 at 1:01 AM, Diego Biurrun di...@biurrun.de wrote:
 ---

 This is untested due to lack of a bfin cross-compilation environment.

  libavcodec/bfin/Makefile   |4 ++--
  libavcodec/bfin/dsputil_bfin.c |8 +---
  libavcodec/bfin/vp3_bfin.c |6 ++
  libavcodec/vp3dsp.c|2 ++
  libavcodec/vp3dsp.h|1 +
  5 files changed, 12 insertions(+), 9 deletions(-)

OK.

Ronald


[libav-devel] [PATCH] Move put_no_rnd_pixels_l2 function to new VP35MCDSPContext.

2013-01-21 Thread Ronald S. Bultje
From: Ronald S. Bultje rsbul...@gmail.com

---
 configure |  7 ---
 libavcodec/Makefile   |  1 +
 libavcodec/dsputil.c  |  1 -
 libavcodec/dsputil.h  |  2 --
 libavcodec/dsputil_template.c |  4 
 libavcodec/vp3.c  |  5 -
 libavcodec/vp35mcdsp.c| 43 +++
 libavcodec/vp35mcdsp.h| 43 +++
 libavcodec/vp56.c |  3 ++-
 libavcodec/vp56.h |  2 ++
 10 files changed, 99 insertions(+), 12 deletions(-)
 create mode 100644 libavcodec/vp35mcdsp.c
 create mode 100644 libavcodec/vp35mcdsp.h

diff --git a/configure b/configure
index 144ec2d..fc2be8e 100755
--- a/configure
+++ b/configure
@@ -1333,6 +1333,7 @@ CONFIG_EXTRA="
 sinewin
 videodsp
 vp3dsp
+vp35mcdsp
 "
 
 CMDLINE_SELECT="
@@ -1575,9 +1576,9 @@ vc1_decoder_select="h263_decoder h264chroma h264qpel"
 vc1image_decoder_select="vc1_decoder"
 vorbis_decoder_select="mdct"
 vorbis_encoder_select="mdct"
-vp3_decoder_select="vp3dsp videodsp"
-vp5_decoder_select="vp3dsp videodsp"
-vp6_decoder_select="huffman vp3dsp videodsp"
+vp3_decoder_select="vp3dsp vp35mcdsp videodsp"
+vp5_decoder_select="vp3dsp vp35mcdsp videodsp"
+vp6_decoder_select="huffman vp3dsp vp35mcdsp videodsp"
 vp6a_decoder_select="vp6_decoder"
 vp6f_decoder_select="vp6_decoder"
 vp8_decoder_select="h264pred videodsp"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 3f8f280..d49af79 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -68,6 +68,7 @@ OBJS-$(CONFIG_VAAPI)   += vaapi.o
 OBJS-$(CONFIG_VDPAU)   += vdpau.o
 OBJS-$(CONFIG_VIDEODSP)+= videodsp.o
 OBJS-$(CONFIG_VP3DSP)  += vp3dsp.o
+OBJS-$(CONFIG_VP35MCDSP)   += vp35mcdsp.o
 
 # decoders/encoders/hardware accelerators
 OBJS-$(CONFIG_A64MULTI_ENCODER)+= a64multienc.o elbg.o
diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
index 32a56df..caf1b07 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/dsputil.c
@@ -2726,7 +2726,6 @@ av_cold void ff_dsputil_init(DSPContext* c, 
AVCodecContext *avctx)
 c->clear_blocks  = FUNCC(clear_blocks ## dct   , depth);\
 c->add_pixels8   = FUNCC(add_pixels8  ## dct   , depth);\
 c->add_pixels4   = FUNCC(add_pixels4  ## dct   , depth);\
-c->put_no_rnd_pixels_l2  = FUNCC(put_no_rnd_pixels8_l2 , depth);\
 \
 c->put_h264_chroma_pixels_tab[0] = FUNCC(put_h264_chroma_mc8   , depth);\
 c->put_h264_chroma_pixels_tab[1] = FUNCC(put_h264_chroma_mc4   , depth);\
diff --git a/libavcodec/dsputil.h b/libavcodec/dsputil.h
index e6cc1c0..9b88058 100644
--- a/libavcodec/dsputil.h
+++ b/libavcodec/dsputil.h
@@ -291,8 +291,6 @@ typedef struct DSPContext {
  */
 op_pixels_func avg_no_rnd_pixels_tab[4][4];
 
-void (*put_no_rnd_pixels_l2)(uint8_t *block/*align 8*/, const uint8_t 
*a/*align 1*/, const uint8_t *b/*align 1*/, int line_size, int h);
-
 /**
  * Thirdpel motion compensation with rounding (a+b+1)>>1.
  * this is an array[12] of motion compensation functions for the 9 thirdpe
diff --git a/libavcodec/dsputil_template.c b/libavcodec/dsputil_template.c
index b9d5e97..bd5c48b 100644
--- a/libavcodec/dsputil_template.c
+++ b/libavcodec/dsputil_template.c
@@ -582,10 +582,6 @@ PIXOP2(put, op_put)
 #define put_no_rnd_pixels8_c  put_pixels8_c
 #define put_no_rnd_pixels16_c put_pixels16_c
 
-static void FUNCC(put_no_rnd_pixels8_l2)(uint8_t *dst, const uint8_t *a, const 
uint8_t *b, int stride, int h){
-FUNC(put_no_rnd_pixels8_l2)(dst, a, b, stride, stride, stride, h);
-}
-
 #define H264_CHROMA_MC(OPNAME, OP)\
 static void FUNCC(OPNAME ## h264_chroma_mc2)(uint8_t *_dst/*align 8*/, uint8_t 
*_src/*align 1*/, int stride, int h, int x, int y){\
 pixel *dst = (pixel*)_dst;\
diff --git a/libavcodec/vp3.c b/libavcodec/vp3.c
index 58db890..35cfd24 100644
--- a/libavcodec/vp3.c
+++ b/libavcodec/vp3.c
@@ -41,6 +41,7 @@
 #include "videodsp.h"
 #include "vp3data.h"
 #include "vp3dsp.h"
+#include "vp35mcdsp.h"
 #include "xiph.h"
 #include "thread.h"
 
@@ -138,6 +139,7 @@ typedef struct Vp3DecodeContext {
 DSPContext dsp;
 VideoDSPContext vdsp;
 VP3DSPContext vp3dsp;
+VP35MCDSPContext vp35mcdsp;
 DECLARE_ALIGNED(16, DCTELEM, block)[64];
 int flipped_image;
 int last_slice_end;
@@ -1564,7 +1566,7 @@ static void render_slice(Vp3DecodeContext *s, int slice)
 motion_source, stride, 8);
 }else{
 int d= (motion_x ^ motion_y)>>31; // d is 0 if motion_x and _y have the same sign, else -1
-s->dsp.put_no_rnd_pixels_l2(
+s->vp35mcdsp.put_no_rnd_pixels_l2(
 output_plane + first_pixel,
 motion_source - d,
 motion_source + stride + 1 + d
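
The `d` computation in the last hunk is a sign-bit trick: XOR-ing the two motion components leaves the top bit set only when their signs differ, and shifting right by 31 turns that into 0 or -1. With d = 0 the two sources are motion_source and motion_source + stride + 1. With d = -1 they are motion_source + 1 and motion_source + stride. A minimal sketch follows, with hypothetical function names and assuming 32-bit int. Note that right-shifting a negative signed value is implementation-defined in C, so a variant that avoids it is included as well.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 0 when x and y have the same sign bit, -1 otherwise.
 * Assumes >> on a negative value is an arithmetic shift, which is
 * implementation-defined in C but holds on common compilers. */
static int sign_differs_mask(int32_t x, int32_t y)
{
    return (x ^ y) >> 31;
}

/* Same result without shifting a negative value: do the XOR and shift
 * on unsigned values, then negate the resulting 0/1. */
static int sign_differs_mask_portable(int32_t x, int32_t y)
{
    return -(int)(((uint32_t)x ^ (uint32_t)y) >> 31);
}
```

Zero counts as non-negative here, so (0, 5) gives 0 and (0, -5) gives -1.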

Re: [libav-devel] [PATCH] Move put_no_rnd_pixels_l2 function to new VP35MCDSPContext.

2013-01-21 Thread Ronald S. Bultje
Hi,

On Mon, Jan 21, 2013 at 10:53 AM, Luca Barbato lu_z...@gentoo.org wrote:
 On 21/01/13 19:45, Ronald S. Bultje wrote:
 From: Ronald S. Bultje rsbul...@gmail.com


 Change the subject to match our guidelines.

 lavc: Move put_no_rnd_pixels_l2 function to new VP35MCDSPContext

 Why the MC btw, I doubt there is a need to split _so_ much that we have
 MC and non-MC functions in different contexts.

Hm, I just realized VP3DSPContext is already shared between vp3 and
vp56, so I'll merge it back in there, sorry for that.

Ronald


[libav-devel] [PATCH] Move put_no_rnd_pixels_l2 function to VP3DSPContext.

2013-01-21 Thread Ronald S. Bultje
From: Ronald S. Bultje rsbul...@gmail.com

The function is only used in VP3 and VP5, so no need to have it in
DSPContext.
---
 libavcodec/dsputil.c  |  1 -
 libavcodec/dsputil.h  |  2 --
 libavcodec/dsputil_template.c |  4 
 libavcodec/vp3.c  |  2 +-
 libavcodec/vp3dsp.c   | 19 +++
 libavcodec/vp3dsp.h   | 15 +++
 libavcodec/vp56.c |  2 +-
 7 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
index 32a56df..caf1b07 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/dsputil.c
@@ -2726,7 +2726,6 @@ av_cold void ff_dsputil_init(DSPContext* c, 
AVCodecContext *avctx)
 c->clear_blocks  = FUNCC(clear_blocks ## dct   , depth);\
 c->add_pixels8   = FUNCC(add_pixels8  ## dct   , depth);\
 c->add_pixels4   = FUNCC(add_pixels4  ## dct   , depth);\
-c->put_no_rnd_pixels_l2  = FUNCC(put_no_rnd_pixels8_l2 , depth);\
 \
 c->put_h264_chroma_pixels_tab[0] = FUNCC(put_h264_chroma_mc8   , depth);\
 c->put_h264_chroma_pixels_tab[1] = FUNCC(put_h264_chroma_mc4   , depth);\
diff --git a/libavcodec/dsputil.h b/libavcodec/dsputil.h
index e6cc1c0..9b88058 100644
--- a/libavcodec/dsputil.h
+++ b/libavcodec/dsputil.h
@@ -291,8 +291,6 @@ typedef struct DSPContext {
  */
 op_pixels_func avg_no_rnd_pixels_tab[4][4];
 
-void (*put_no_rnd_pixels_l2)(uint8_t *block/*align 8*/, const uint8_t 
*a/*align 1*/, const uint8_t *b/*align 1*/, int line_size, int h);
-
 /**
  * Thirdpel motion compensation with rounding (a+b+1)>>1.
  * this is an array[12] of motion compensation functions for the 9 thirdpe
diff --git a/libavcodec/dsputil_template.c b/libavcodec/dsputil_template.c
index b9d5e97..bd5c48b 100644
--- a/libavcodec/dsputil_template.c
+++ b/libavcodec/dsputil_template.c
@@ -582,10 +582,6 @@ PIXOP2(put, op_put)
 #define put_no_rnd_pixels8_c  put_pixels8_c
 #define put_no_rnd_pixels16_c put_pixels16_c
 
-static void FUNCC(put_no_rnd_pixels8_l2)(uint8_t *dst, const uint8_t *a, const 
uint8_t *b, int stride, int h){
-FUNC(put_no_rnd_pixels8_l2)(dst, a, b, stride, stride, stride, h);
-}
-
 #define H264_CHROMA_MC(OPNAME, OP)\
 static void FUNCC(OPNAME ## h264_chroma_mc2)(uint8_t *_dst/*align 8*/, uint8_t 
*_src/*align 1*/, int stride, int h, int x, int y){\
 pixel *dst = (pixel*)_dst;\
diff --git a/libavcodec/vp3.c b/libavcodec/vp3.c
index 58db890..33cfc8c 100644
--- a/libavcodec/vp3.c
+++ b/libavcodec/vp3.c
@@ -1564,7 +1564,7 @@ static void render_slice(Vp3DecodeContext *s, int slice)
 motion_source, stride, 8);
 }else{
 int d= (motion_x ^ motion_y)>>31; // d is 0 if motion_x and _y have the same sign, else -1
-s->dsp.put_no_rnd_pixels_l2(
+s->vp3dsp.put_no_rnd_pixels_l2(
 output_plane + first_pixel,
 motion_source - d,
 motion_source + stride + 1 + d,
diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
index 9e6209d..1883099 100644
--- a/libavcodec/vp3dsp.c
+++ b/libavcodec/vp3dsp.c
@@ -274,8 +274,27 @@ static void vp3_h_loop_filter_c(uint8_t *first_pixel, int 
stride,
 }
 }
 
+static void put_no_rnd_pixels_l2(uint8_t *dst, const uint8_t *src1,
+ const uint8_t *src2, ptrdiff_t stride, int h)
+{
+int i;
+
+for (i = 0; i < h; i++) {
+uint32_t a, b;
+
+a = AV_RN32A(&src1[i * stride]);
+b = AV_RN32A(&src2[i * stride]);
+AV_WN32A(&dst[i * stride], no_rnd_avg32(a, b));
+a = AV_RN32A(&src1[i * stride + 4]);
+b = AV_RN32A(&src2[i * stride + 4]);
+AV_WN32A(&dst[i * stride + 4], no_rnd_avg32(a, b));
+}
+}
+
 av_cold void ff_vp3dsp_init(VP3DSPContext *c, int flags)
 {
+c->put_no_rnd_pixels_l2 = put_no_rnd_pixels_l2;
+
 c->idct_put  = vp3_idct_put_c;
 c->idct_add  = vp3_idct_add_c;
 c->idct_dc_add   = vp3_idct_dc_add_c;
diff --git a/libavcodec/vp3dsp.h b/libavcodec/vp3dsp.h
index feb3000..3e53f0a 100644
--- a/libavcodec/vp3dsp.h
+++ b/libavcodec/vp3dsp.h
@@ -23,6 +23,21 @@
 #include "dsputil.h"
 
 typedef struct VP3DSPContext {
+/**
+ * Copy 8xH pixels from source to destination buffer using a bilinear
+ * filter with no rounding (i.e. *dst = (*a + *b) >> 1).
+ *
+ * @param dst destination buffer, aligned by 8
+ * @param a first source buffer, no alignment
+ * @param b second source buffer, no alignment
+ * @param stride distance between two lines in source/dest buffers
+ * @param h height
+ */
+void (*put_no_rnd_pixels_l2)(uint8_t *dst,
+ const uint8_t *a,
+ const uint8_t *b,
+ ptrdiff_t stride, int h);
+
 void (*idct_put

[libav-devel] [PATCH] vp3/5: move put_no_rnd_pixels_l2 from dsputil to VP3DSPContext.

2013-01-21 Thread Ronald S. Bultje
From: Ronald S. Bultje rsbul...@gmail.com

The function is only used in VP3 and VP5, so no need to have it in
DSPContext.
---
 libavcodec/dsputil.c  |  1 -
 libavcodec/dsputil.h  |  2 --
 libavcodec/dsputil_template.c |  4 
 libavcodec/vp3.c  |  2 +-
 libavcodec/vp3dsp.c   | 19 +++
 libavcodec/vp3dsp.h   | 16 
 libavcodec/vp56.c |  6 +++---
 7 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
index 32a56df..caf1b07 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/dsputil.c
@@ -2726,7 +2726,6 @@ av_cold void ff_dsputil_init(DSPContext* c, 
AVCodecContext *avctx)
 c->clear_blocks  = FUNCC(clear_blocks ## dct   , depth);\
 c->add_pixels8   = FUNCC(add_pixels8  ## dct   , depth);\
 c->add_pixels4   = FUNCC(add_pixels4  ## dct   , depth);\
-c->put_no_rnd_pixels_l2  = FUNCC(put_no_rnd_pixels8_l2 , depth);\
 \
 c->put_h264_chroma_pixels_tab[0] = FUNCC(put_h264_chroma_mc8   , depth);\
 c->put_h264_chroma_pixels_tab[1] = FUNCC(put_h264_chroma_mc4   , depth);\
diff --git a/libavcodec/dsputil.h b/libavcodec/dsputil.h
index e6cc1c0..9b88058 100644
--- a/libavcodec/dsputil.h
+++ b/libavcodec/dsputil.h
@@ -291,8 +291,6 @@ typedef struct DSPContext {
  */
 op_pixels_func avg_no_rnd_pixels_tab[4][4];
 
-void (*put_no_rnd_pixels_l2)(uint8_t *block/*align 8*/, const uint8_t 
*a/*align 1*/, const uint8_t *b/*align 1*/, int line_size, int h);
-
 /**
  * Thirdpel motion compensation with rounding (a+b+1)>>1.
  * this is an array[12] of motion compensation functions for the 9 thirdpe
diff --git a/libavcodec/dsputil_template.c b/libavcodec/dsputil_template.c
index b9d5e97..bd5c48b 100644
--- a/libavcodec/dsputil_template.c
+++ b/libavcodec/dsputil_template.c
@@ -582,10 +582,6 @@ PIXOP2(put, op_put)
 #define put_no_rnd_pixels8_c  put_pixels8_c
 #define put_no_rnd_pixels16_c put_pixels16_c
 
-static void FUNCC(put_no_rnd_pixels8_l2)(uint8_t *dst, const uint8_t *a, const 
uint8_t *b, int stride, int h){
-FUNC(put_no_rnd_pixels8_l2)(dst, a, b, stride, stride, stride, h);
-}
-
 #define H264_CHROMA_MC(OPNAME, OP)\
 static void FUNCC(OPNAME ## h264_chroma_mc2)(uint8_t *_dst/*align 8*/, uint8_t 
*_src/*align 1*/, int stride, int h, int x, int y){\
 pixel *dst = (pixel*)_dst;\
diff --git a/libavcodec/vp3.c b/libavcodec/vp3.c
index 58db890..33cfc8c 100644
--- a/libavcodec/vp3.c
+++ b/libavcodec/vp3.c
@@ -1564,7 +1564,7 @@ static void render_slice(Vp3DecodeContext *s, int slice)
 motion_source, stride, 8);
 }else{
 int d= (motion_x ^ motion_y)>>31; // d is 0 if motion_x and _y have the same sign, else -1
-s->dsp.put_no_rnd_pixels_l2(
+s->vp3dsp.put_no_rnd_pixels_l2(
 output_plane + first_pixel,
 motion_source - d,
 motion_source + stride + 1 + d,
diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
index 9e6209d..1883099 100644
--- a/libavcodec/vp3dsp.c
+++ b/libavcodec/vp3dsp.c
@@ -274,8 +274,27 @@ static void vp3_h_loop_filter_c(uint8_t *first_pixel, int 
stride,
 }
 }
 
+static void put_no_rnd_pixels_l2(uint8_t *dst, const uint8_t *src1,
+ const uint8_t *src2, ptrdiff_t stride, int h)
+{
+int i;
+
+for (i = 0; i < h; i++) {
+uint32_t a, b;
+
+a = AV_RN32A(&src1[i * stride]);
+b = AV_RN32A(&src2[i * stride]);
+AV_WN32A(&dst[i * stride], no_rnd_avg32(a, b));
+a = AV_RN32A(&src1[i * stride + 4]);
+b = AV_RN32A(&src2[i * stride + 4]);
+AV_WN32A(&dst[i * stride + 4], no_rnd_avg32(a, b));
+}
+}
+
 av_cold void ff_vp3dsp_init(VP3DSPContext *c, int flags)
 {
+c->put_no_rnd_pixels_l2 = put_no_rnd_pixels_l2;
+
 c->idct_put  = vp3_idct_put_c;
 c->idct_add  = vp3_idct_add_c;
 c->idct_dc_add   = vp3_idct_dc_add_c;
diff --git a/libavcodec/vp3dsp.h b/libavcodec/vp3dsp.h
index feb3000..d28c847 100644
--- a/libavcodec/vp3dsp.h
+++ b/libavcodec/vp3dsp.h
@@ -19,10 +19,26 @@
 #ifndef AVCODEC_VP3DSP_H
 #define AVCODEC_VP3DSP_H
 
+#include <stddef.h>
 #include <stdint.h>
 #include "dsputil.h"
 
 typedef struct VP3DSPContext {
+/**
+ * Copy 8xH pixels from source to destination buffer using a bilinear
+ * filter with no rounding (i.e. *dst = (*a + *b) >> 1).
+ *
+ * @param dst destination buffer, aligned by 8
+ * @param a first source buffer, no alignment
+ * @param b second source buffer, no alignment
+ * @param stride distance between two lines in source/dest buffers
+ * @param h height
+ */
+void (*put_no_rnd_pixels_l2)(uint8_t *dst,
+ const uint8_t *a,
+ const

Re: [libav-devel] [PATCH] Move put_no_rnd_pixels_l2 function to new VP35MCDSPContext.

2013-01-21 Thread Ronald S. Bultje
Hi,

On Mon, Jan 21, 2013 at 10:58 AM, Diego Biurrun di...@biurrun.de wrote:
 On Mon, Jan 21, 2013 at 10:45:44AM -0800, Ronald S. Bultje wrote:
 From: Ronald S. Bultje rsbul...@gmail.com

 ---
  configure |  7 ---
  libavcodec/Makefile   |  1 +
  libavcodec/dsputil.c  |  1 -
  libavcodec/dsputil.h  |  2 --
  libavcodec/dsputil_template.c |  4 
  libavcodec/vp3.c  |  5 -
  libavcodec/vp35mcdsp.c| 43 
 +++
  libavcodec/vp35mcdsp.h| 43 
 +++
  libavcodec/vp56.c |  3 ++-
  libavcodec/vp56.h |  2 ++
  10 files changed, 99 insertions(+), 12 deletions(-)
  create mode 100644 libavcodec/vp35mcdsp.c
  create mode 100644 libavcodec/vp35mcdsp.h

 You don't touch the optimized version below libavcodec/x86/ - why?

Which one? I don't see any.

Ronald


Re: [libav-devel] [PATCH] Move put_no_rnd_pixels_l2 function to new VP35MCDSPContext.

2013-01-21 Thread Ronald S. Bultje
Hi,

On Jan 21, 2013 11:59 AM, Diego Biurrun di...@biurrun.de wrote:

 On Mon, Jan 21, 2013 at 11:04:25AM -0800, Ronald S. Bultje wrote:
  On Mon, Jan 21, 2013 at 10:58 AM, Diego Biurrun di...@biurrun.de
wrote:
   On Mon, Jan 21, 2013 at 10:45:44AM -0800, Ronald S. Bultje wrote:
   From: Ronald S. Bultje rsbul...@gmail.com
  
   ---
configure |  7 ---
libavcodec/Makefile   |  1 +
libavcodec/dsputil.c  |  1 -
libavcodec/dsputil.h  |  2 --
libavcodec/dsputil_template.c |  4 
libavcodec/vp3.c  |  5 -
libavcodec/vp35mcdsp.c| 43
+++
libavcodec/vp35mcdsp.h| 43
+++
libavcodec/vp56.c |  3 ++-
libavcodec/vp56.h |  2 ++
10 files changed, 99 insertions(+), 12 deletions(-)
create mode 100644 libavcodec/vp35mcdsp.c
create mode 100644 libavcodec/vp35mcdsp.h
  
   You don't touch the optimized version below libavcodec/x86/ - why?
 
  Which one? I don't see any.

 put_no_rnd_pixels8_l2 in libavcodec/x86/dsputil_avg_template.c

Where is that assigned?

Ronald

[libav-devel] [PATCH] dsputil: remove avg_no_rnd_pixels8.

2013-01-21 Thread Ronald S. Bultje
From: Ronald S. Bultje rsbul...@gmail.com

This is never used.
---
 libavcodec/alpha/dsputil_alpha.c  |   5 -
 libavcodec/arm/dsputil_init_neon.c|   7 -
 libavcodec/arm/dsputil_neon.S |   6 +-
 libavcodec/dsputil.c  |   1 -
 libavcodec/dsputil.h  |   8 +-
 libavcodec/sh4/dsputil_align.c|   9 -
 libavcodec/sparc/dsputil_vis.c| 461 --
 libavcodec/x86/dsputil_mmx.c  |   3 +-
 libavcodec/x86/dsputil_rnd_template.c |   4 +
 9 files changed, 13 insertions(+), 491 deletions(-)

diff --git a/libavcodec/alpha/dsputil_alpha.c b/libavcodec/alpha/dsputil_alpha.c
index ce7cecb..cf1077b 100644
--- a/libavcodec/alpha/dsputil_alpha.c
+++ b/libavcodec/alpha/dsputil_alpha.c
@@ -308,11 +308,6 @@ void ff_dsputil_init_alpha(DSPContext* c, AVCodecContext 
*avctx)
 c->avg_pixels_tab[1][2] = avg_pixels_y2_axp;
 c->avg_pixels_tab[1][3] = avg_pixels_xy2_axp;
 
-c->avg_no_rnd_pixels_tab[1][0] = avg_no_rnd_pixels_axp;
-c->avg_no_rnd_pixels_tab[1][1] = avg_no_rnd_pixels_x2_axp;
-c->avg_no_rnd_pixels_tab[1][2] = avg_no_rnd_pixels_y2_axp;
-c->avg_no_rnd_pixels_tab[1][3] = avg_no_rnd_pixels_xy2_axp;
-
 c->clear_blocks = clear_blocks_axp;
 }
 
diff --git a/libavcodec/arm/dsputil_init_neon.c 
b/libavcodec/arm/dsputil_init_neon.c
index f27aee4..1c5181c 100644
--- a/libavcodec/arm/dsputil_init_neon.c
+++ b/libavcodec/arm/dsputil_init_neon.c
@@ -58,9 +58,6 @@ void ff_avg_pixels8_xy2_neon(uint8_t *, const uint8_t *, int, 
int);
 void ff_avg_pixels16_x2_no_rnd_neon(uint8_t *, const uint8_t *, int, int);
 void ff_avg_pixels16_y2_no_rnd_neon(uint8_t *, const uint8_t *, int, int);
 void ff_avg_pixels16_xy2_no_rnd_neon(uint8_t *, const uint8_t *, int, int);
-void ff_avg_pixels8_x2_no_rnd_neon(uint8_t *, const uint8_t *, int, int);
-void ff_avg_pixels8_y2_no_rnd_neon(uint8_t *, const uint8_t *, int, int);
-void ff_avg_pixels8_xy2_no_rnd_neon(uint8_t *, const uint8_t *, int, int);
 
 void ff_add_pixels_clamped_neon(const DCTELEM *, uint8_t *, int);
 void ff_put_pixels_clamped_neon(const DCTELEM *, uint8_t *, int);
@@ -203,10 +200,6 @@ void ff_dsputil_init_neon(DSPContext *c, AVCodecContext 
*avctx)
 c->avg_no_rnd_pixels_tab[0][1] = ff_avg_pixels16_x2_no_rnd_neon;
 c->avg_no_rnd_pixels_tab[0][2] = ff_avg_pixels16_y2_no_rnd_neon;
 c->avg_no_rnd_pixels_tab[0][3] = ff_avg_pixels16_xy2_no_rnd_neon;
-c->avg_no_rnd_pixels_tab[1][0] = ff_avg_pixels8_neon;
-c->avg_no_rnd_pixels_tab[1][1] = ff_avg_pixels8_x2_no_rnd_neon;
-c->avg_no_rnd_pixels_tab[1][2] = ff_avg_pixels8_y2_no_rnd_neon;
-c->avg_no_rnd_pixels_tab[1][3] = ff_avg_pixels8_xy2_no_rnd_neon;
 }
 
 c->add_pixels_clamped = ff_add_pixels_clamped_neon;
diff --git a/libavcodec/arm/dsputil_neon.S b/libavcodec/arm/dsputil_neon.S
index cf92817..f33fa33 100644
--- a/libavcodec/arm/dsputil_neon.S
+++ b/libavcodec/arm/dsputil_neon.S
@@ -421,9 +421,9 @@ function ff_avg_h264_qpel8_mc00_neon, export=1
 endfunc
 
 pixfunc avg_, pixels8, avg=1
-pixfunc2avg_, pixels8_x2,  avg=1
-pixfunc2avg_, pixels8_y2,  avg=1
-pixfunc2avg_, pixels8_xy2, avg=1
+pixfunc avg_, pixels8_x2,  avg=1
+pixfunc avg_, pixels8_y2,  avg=1
+pixfunc avg_, pixels8_xy2, avg=1
 
 function ff_put_pixels_clamped_neon, export=1
 vld1.16 {d16-d19}, [r0,:128]!
diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
index caf1b07..7bead1d 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/dsputil.c
@@ -2745,7 +2745,6 @@ av_cold void ff_dsputil_init(DSPContext* c, AVCodecContext *avctx)
 dspfunc1(avg   , 2,  4, depth);\
 dspfunc1(avg   , 3,  2, depth);\
 dspfunc1(avg_no_rnd, 0, 16, depth);\
-dspfunc1(avg_no_rnd, 1,  8, depth);\
 \
 dspfunc2(put_h264_qpel, 0, 16, depth);\
 dspfunc2(put_h264_qpel, 1,  8, depth);\
diff --git a/libavcodec/dsputil.h b/libavcodec/dsputil.h
index 9b88058..b01c912 100644
--- a/libavcodec/dsputil.h
+++ b/libavcodec/dsputil.h
@@ -281,15 +281,15 @@ typedef struct DSPContext {
 
 /**
  * Halfpel motion compensation with no rounding (a+b)>>1.
- * this is an array[2][4] of motion compensation functions for 2
- * horizontal blocksizes (8,16) and the 4 halfpel positions<br>
- * *pixels_tab[ 0->16xH 1->8xH ][ xhalfpel + 2*yhalfpel ]
+ * this is an array[4] of motion compensation functions for 1
+ * horizontal blocksizes (16) and the 4 halfpel positions<br>
+ * *pixels_tab[0][ xhalfpel + 2*yhalfpel ]
  * @param block destination into which the result is averaged (a+b)>>1
  * @param pixels source
  * @param line_size number of bytes in a horizontal line of block
  * @param h height
  */
-op_pixels_func avg_no_rnd_pixels_tab[4][4];
+op_pixels_func avg_no_rnd_pixels_tab[1][4];
 
 /**
  * Thirdpel motion compensation with rounding (a+b+1)>>1
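
As an illustrative sketch (not part of the patch), the table index and the two averaging formulas named in the doc comments can be written in C roughly as follows. The helper names are hypothetical, and the formulas assume the shift operators that the comments appear to describe.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: slot in a 4-entry halfpel table, following the
 * "xhalfpel + 2*yhalfpel" rule from the doc comment. */
static int halfpel_index(int xhalfpel, int yhalfpel)
{
    assert((xhalfpel == 0 || xhalfpel == 1) &&
           (yhalfpel == 0 || yhalfpel == 1));
    return xhalfpel + 2 * yhalfpel;
}

/* Average of two pixels without rounding: (a + b) >> 1. */
static uint8_t avg_no_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* Average of two pixels with rounding: (a + b + 1) >> 1. */
static uint8_t avg_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}
```

The two variants differ only when a + b is odd, which is why the no-rounding functions need their own table entries.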

Re: [libav-devel] [PATCH] bfin: vp3: Separate VP3 initialization code

2013-01-21 Thread Ronald S. Bultje
Hi,

On Mon, Jan 21, 2013 at 1:13 PM, Luca Barbato <lu_z...@gentoo.org> wrote:
> On 21/01/13 20:38, Luca Barbato wrote:
> > +void ff_vp3dsp_init_bfin(VP3DSPContext *c, int flags)
> > +{
> > +    c->idct_add  = ff_bfin_vp3_idct_add;
> > +    c->idct_put  = ff_bfin_vp3_idct_put;
>
> c->idct missing.

vp3dsp has no idct.

Ronald
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] bfin: vp3: Separate VP3 initialization code

2013-01-21 Thread Ronald S. Bultje
Hi,

On Mon, Jan 21, 2013 at 2:31 PM, Luca Barbato <lu_z...@gentoo.org> wrote:
> On 21/01/13 22:43, Luca Barbato wrote:
> > Actually the idct_dc_add not sure if we can safely mix and match them,
> > probably we can.
>
> I'd push tomorrow the updated patch, idct_dc_add is compatible from what
> I can see.

Please make sure that they set their coefficients to zero after the
idct; that was a recent modification to the API (like VP8), which the
asm previously (when this was written) didn't do.
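
A minimal sketch of that requirement, assuming a hypothetical DC-only routine: the names `idct_dc_add_sketch` and `clip_u8` are made up for illustration, and the DC scaling is only an example, not necessarily what the bfin asm does. The point is the last line, which clears the coefficient the function consumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical 8x8 DC-only "IDCT add": adds a constant derived from
 * block[0] to every pixel, then zeroes the coefficient it used. */
static void idct_dc_add_sketch(uint8_t *dst, int stride, int16_t *block)
{
    int dc = (block[0] + 15) >> 5;   /* illustrative scaling */
    int x, y;

    assert(stride >= 8);
    for (y = 0; y < 8; y++) {
        for (x = 0; x < 8; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
    block[0] = 0;                    /* leave the block cleared for the caller */
}
```

A full IDCT would clear all 64 coefficients in the same way, so the caller no longer has to reset the block itself.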

Ronald


Re: [libav-devel] [PATCH] Drop DCTELEM typedef

2013-01-21 Thread Ronald S. Bultje
Hi,

On Mon, Jan 21, 2013 at 4:04 PM, Diego Biurrun <di...@biurrun.de> wrote:
> It does not help as an abstraction and adds dsputil dependencies.

I like the commit. I do want to add, though, that in practice you're not
removing the dsputil dependency from a lot of files (at build time), even
though the dependency is (in a code sense) no longer there. Examples are
vp3.c and vp8.c, but there are likely more.

Will you remove these in a later commit?

Ronald


[libav-devel] [PATCH] dsputil: remove some never-assigned function pointers from the struct.

2013-01-21 Thread Ronald S. Bultje
From: Ronald S. Bultje <rsbul...@gmail.com>

---
 libavcodec/dsputil.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libavcodec/dsputil.h b/libavcodec/dsputil.h
index b01c912..e4bbd46 100644
--- a/libavcodec/dsputil.h
+++ b/libavcodec/dsputil.h
@@ -277,7 +277,7 @@ typedef struct DSPContext {
  * @param line_size number of bytes in a horizontal line of block
  * @param h height
  */
-op_pixels_func put_no_rnd_pixels_tab[4][4];
+op_pixels_func put_no_rnd_pixels_tab[2][4];
 
 /**
  * Halfpel motion compensation with no rounding (a+b)>>1.
@@ -307,7 +307,6 @@ typedef struct DSPContext {
 qpel_mc_func put_qpel_pixels_tab[2][16];
 qpel_mc_func avg_qpel_pixels_tab[2][16];
 qpel_mc_func put_no_rnd_qpel_pixels_tab[2][16];
-qpel_mc_func avg_no_rnd_qpel_pixels_tab[2][16];
 qpel_mc_func put_mspel_pixels_tab[8];
 
 /**
@@ -317,7 +316,7 @@ typedef struct DSPContext {
 h264_chroma_mc_func avg_h264_chroma_pixels_tab[3];
 
 qpel_mc_func put_h264_qpel_pixels_tab[4][16];
-qpel_mc_func avg_h264_qpel_pixels_tab[4][16];
+qpel_mc_func avg_h264_qpel_pixels_tab[3][16];
 
 me_cmp_func pix_abs[2][4];
 
-- 
1.7.11.3



Re: [libav-devel] [PATCH 2/3] Move wmv idct to its own DSP context.

2013-01-20 Thread Ronald S. Bultje
Hi,

On Sun, Jan 20, 2013 at 5:12 AM, Diego Biurrun <di...@biurrun.de> wrote:
> On Sat, Jan 19, 2013 at 01:52:24PM -0800, Ronald S. Bultje wrote:
> > From: Ronald S. Bultje <rsbul...@gmail.com>
> >
> > This allows us to remove FF_IDCT_WMV2, which serves no practical purpose
> > other than to be able to select the WMV2 IDCT for MPEG (or vice versa)
> > and get corrupt output.
> > ---
> >  libavcodec/Makefile   |   4 +-
> >  libavcodec/dsputil.c  |  89 -
> >  libavcodec/wmv2.c |  22 ++--
> >  libavcodec/wmv2.h |   2 +
> >  libavcodec/wmv2dec.c  |   4 --
> >  libavcodec/wmv2dsp.c  | 135 ++
> >  libavcodec/wmv2dsp.h  |  34 +
> >  libavcodec/wmv2enc.c  |   4 --
> >  tests/fate-run.sh |   3 +-
> >  tests/fate/vcodec.mak |   5 +-
> >  10 files changed, 195 insertions(+), 107 deletions(-)
> >  create mode 100644 libavcodec/wmv2dsp.c
> >  create mode 100644 libavcodec/wmv2dsp.h
>
> fate-seek-vsynth2-wmv2 fails with this patch applied (and 1/3 also applied).
> Does it work for you?

#1 includes a ref update for it as well, #2 doesn't change it afaik. Old #1?

Ronald


Re: [libav-devel] [PATCH] Move vector_fmul_add from dsputil to avfloatdsp.

2013-01-20 Thread Ronald S. Bultje
Hi,

On Sun, Jan 20, 2013 at 1:30 AM, Luca Barbato <lu_z...@gentoo.org> wrote:
> On 20/01/13 07:36, Ronald S. Bultje wrote:
> > From: Ronald S. Bultje <rsbul...@gmail.com>
> >
> > ---
> >  libavcodec/aacsbr.c | 10 +-
> >  libavcodec/arm/dsputil_init_neon.c  |  3 ---
> >  libavcodec/arm/dsputil_neon.S   | 27 ---
> >  libavcodec/dsputil.c|  7 ---
> >  libavcodec/dsputil.h|  2 --
> >  libavcodec/ppc/float_altivec.c  | 25 -
> >  libavcodec/wmadec.c |  8 
> >  libavcodec/x86/dsputil.asm  | 28 
> >  libavcodec/x86/dsputil_mmx.c|  7 ---
> >  libavutil/arm/float_dsp_init_neon.c |  4 
> >  libavutil/arm/float_dsp_neon.S  | 27 +++
> >  libavutil/float_dsp.c   |  8 
> >  libavutil/float_dsp.h   | 18 ++
> >  libavutil/ppc/float_dsp_altivec.c   | 24 
> >  libavutil/ppc/float_dsp_altivec.h   |  4 
> >  libavutil/ppc/float_dsp_init.c  |  1 +
> >  libavutil/x86/float_dsp.asm | 28 
> >  libavutil/x86/float_dsp_init.c  |  7 +++
> >  18 files changed, 130 insertions(+), 108 deletions(-)
>
> For those wondering, it seems a rebase.

No, it moves the functions to the end of the files and struct, to
maintain ABI. The previous patch added it somewhere in the middle.
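
To illustrate why the position matters, here is a small sketch with hypothetical structs (not the real AVFloatDSPContext, and with a simplified function-pointer type). Appending a member leaves the offsets of existing members unchanged, while inserting one in the middle shifts everything after it, which breaks callers compiled against the old layout.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical function-pointer type. */
typedef void (*vec_fn)(float *dst, const float *src, int len);

/* Layout as shipped in an earlier release (hypothetical). */
typedef struct CtxV1 {
    vec_fn fmul;
    vec_fn fmac_scalar;
} CtxV1;

/* New member appended at the end. */
typedef struct CtxAppended {
    vec_fn fmul;
    vec_fn fmac_scalar;
    vec_fn fmul_add;
} CtxAppended;

/* New member inserted in the middle. */
typedef struct CtxInserted {
    vec_fn fmul;
    vec_fn fmul_add;
    vec_fn fmac_scalar;
} CtxInserted;

/* Returns 1 if every old member keeps its offset in the appended layout. */
static int offsets_preserved_appended(void)
{
    return offsetof(CtxV1, fmul) == offsetof(CtxAppended, fmul) &&
           offsetof(CtxV1, fmac_scalar) == offsetof(CtxAppended, fmac_scalar);
}

/* Returns 1 if every old member keeps its offset in the inserted layout. */
static int offsets_preserved_inserted(void)
{
    return offsetof(CtxV1, fmul) == offsetof(CtxInserted, fmul) &&
           offsetof(CtxV1, fmac_scalar) == offsetof(CtxInserted, fmac_scalar);
}
```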

Ronald


[libav-devel] [PATCH] floatdsp: move vector_fmul_add from dsputil to avfloatdsp.

2013-01-20 Thread Ronald S. Bultje
From: Ronald S. Bultje <rsbul...@gmail.com>

---
 libavcodec/aacsbr.c | 10 +-
 libavcodec/arm/dsputil_init_neon.c  |  3 ---
 libavcodec/arm/dsputil_neon.S   | 27 ---
 libavcodec/dsputil.c|  7 ---
 libavcodec/dsputil.h|  2 --
 libavcodec/ppc/float_altivec.c  | 25 -
 libavcodec/wmadec.c |  8 
 libavcodec/x86/dsputil.asm  | 28 
 libavcodec/x86/dsputil_mmx.c|  7 ---
 libavutil/arm/float_dsp_init_neon.c |  4 
 libavutil/arm/float_dsp_neon.S  | 27 +++
 libavutil/float_dsp.c   |  9 +
 libavutil/float_dsp.h   | 18 ++
 libavutil/ppc/float_dsp_altivec.c   | 24 
 libavutil/ppc/float_dsp_altivec.h   |  4 
 libavutil/ppc/float_dsp_init.c  |  1 +
 libavutil/x86/float_dsp.asm | 28 
 libavutil/x86/float_dsp_init.c  |  7 +++
 18 files changed, 131 insertions(+), 108 deletions(-)

diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
index add9f18..0b96abb 100644
--- a/libavcodec/aacsbr.c
+++ b/libavcodec/aacsbr.c
@@ -1172,8 +1172,8 @@ static void sbr_qmf_analysis(DSPContext *dsp, FFTContext *mdct,
  * Synthesis QMF Bank (14496-3 sp04 p206) and Downsampled Synthesis QMF Bank
  * (14496-3 sp04 p206)
  */
-static void sbr_qmf_synthesis(DSPContext *dsp, FFTContext *mdct,
-  SBRDSPContext *sbrdsp, AVFloatDSPContext *fdsp,
+static void sbr_qmf_synthesis(FFTContext *mdct,
+  SBRDSPContext *sbrdsp, AVFloatDSPContext *dsp,
   float *out, float X[2][38][64],
   float mdct_buf[2][64],
   float *v0, int *v_off, const unsigned int div)
@@ -1204,7 +1204,7 @@ static void sbr_qmf_synthesis(DSPContext *dsp, FFTContext *mdct,
 mdct->imdct_half(mdct, mdct_buf[1], X[1][i]);
 sbrdsp->qmf_deint_bfly(v, mdct_buf[1], mdct_buf[0]);
 }
-fdsp->vector_fmul   (out, v, sbr_qmf_window, 64 >> div);
+dsp->vector_fmul(out, v, sbr_qmf_window, 64 >> div);
 dsp->vector_fmul_add(out, v + ( 192 >> div), sbr_qmf_window + ( 64 >> div), out   , 64 >> div);
 dsp->vector_fmul_add(out, v + ( 256 >> div), sbr_qmf_window + (128 >> div), out   , 64 >> div);
 dsp->vector_fmul_add(out, v + ( 448 >> div), sbr_qmf_window + (192 >> div), out   , 64 >> div);
@@ -1702,13 +1702,13 @@ void ff_sbr_apply(AACContext *ac, SpectralBandReplication *sbr, int id_aac,
 nch = 2;
 }
 
-sbr_qmf_synthesis(&ac->dsp, &sbr->mdct, &sbr->dsp, &ac->fdsp,
+sbr_qmf_synthesis(&sbr->mdct, &sbr->dsp, &ac->fdsp,
   L, sbr->X[0], sbr->qmf_filter_scratch,
   sbr->data[0].synthesis_filterbank_samples,
   &sbr->data[0].synthesis_filterbank_samples_offset,
   downsampled);
 if (nch == 2)
-sbr_qmf_synthesis(&ac->dsp, &sbr->mdct, &sbr->dsp, &ac->fdsp,
+sbr_qmf_synthesis(&sbr->mdct, &sbr->dsp, &ac->fdsp,
   R, sbr->X[1], sbr->qmf_filter_scratch,
   sbr->data[1].synthesis_filterbank_samples,
   &sbr->data[1].synthesis_filterbank_samples_offset,
diff --git a/libavcodec/arm/dsputil_init_neon.c b/libavcodec/arm/dsputil_init_neon.c
index ee0e9af..0d23b26 100644
--- a/libavcodec/arm/dsputil_init_neon.c
+++ b/libavcodec/arm/dsputil_init_neon.c
@@ -146,8 +146,6 @@ void ff_butterflies_float_neon(float *v1, float *v2, int len);
 float ff_scalarproduct_float_neon(const float *v1, const float *v2, int len);
 void ff_vector_fmul_reverse_neon(float *dst, const float *src0,
  const float *src1, int len);
-void ff_vector_fmul_add_neon(float *dst, const float *src0, const float *src1,
- const float *src2, int len);
 
 void ff_vector_clipf_neon(float *dst, const float *src, float min, float max,
   int len);
@@ -301,7 +299,6 @@ void ff_dsputil_init_neon(DSPContext *c, AVCodecContext *avctx)
 c->butterflies_float  = ff_butterflies_float_neon;
 c->scalarproduct_float = ff_scalarproduct_float_neon;
 c->vector_fmul_reverse = ff_vector_fmul_reverse_neon;
-c->vector_fmul_add = ff_vector_fmul_add_neon;
 c->vector_clipf   = ff_vector_clipf_neon;
 c->vector_clip_int32  = ff_vector_clip_int32_neon;
 
diff --git a/libavcodec/arm/dsputil_neon.S b/libavcodec/arm/dsputil_neon.S
index ebc70ac..5e512a7 100644
--- a/libavcodec/arm/dsputil_neon.S
+++ b/libavcodec/arm/dsputil_neon.S
@@ -580,33 +580,6 @@ function ff_vector_fmul_reverse_neon, export=1
 bx  lr
 endfunc
 
-function ff_vector_fmul_add_neon, export=1

[libav-devel] [PATCH] floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.

2013-01-20 Thread Ronald S. Bultje
From: Ronald S. Bultje <rsbul...@gmail.com>

Now, nellymoserenc and aacenc no longer depend on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
---
 libavcodec/aacdec.c |  10 ++--
 libavcodec/aacenc.c |  19 +++
 libavcodec/aacenc.h |   1 -
 libavcodec/aacsbr.c |   4 +-
 libavcodec/arm/Makefile |   3 -
 libavcodec/arm/dsputil_init_arm.c   |   1 -
 libavcodec/arm/dsputil_init_neon.c  |   3 -
 libavcodec/arm/dsputil_init_vfp.c   |  30 --
 libavcodec/arm/dsputil_neon.S   |  24 
 libavcodec/arm/dsputil_vfp.S| 106 
 libavcodec/dsputil.c|   8 ---
 libavcodec/dsputil.h|   2 -
 libavcodec/nellymoserenc.c  |  10 ++--
 libavcodec/ppc/Makefile |   1 -
 libavcodec/ppc/dsputil_ppc.c|   1 -
 libavcodec/ppc/float_altivec.c  |  57 ---
 libavcodec/wmadec.c |   4 +-
 libavcodec/wmaenc.c |   2 +-
 libavcodec/wmaprodec.c  |   2 -
 libavcodec/x86/dsputil.asm  |  37 -
 libavcodec/x86/dsputil_mmx.c|   8 ---
 libavutil/arm/float_dsp_init_neon.c |   4 ++
 libavutil/arm/float_dsp_init_vfp.c  |   4 ++
 libavutil/arm/float_dsp_neon.S  |  24 
 libavutil/arm/float_dsp_vfp.S   |  69 +++
 libavutil/float_dsp.c   |  11 
 libavutil/float_dsp.h   |  19 +++
 libavutil/ppc/float_dsp_altivec.c   |  29 ++
 libavutil/ppc/float_dsp_altivec.h   |   3 +
 libavutil/ppc/float_dsp_init.c  |   1 +
 libavutil/x86/float_dsp.asm |  37 +
 libavutil/x86/float_dsp_init.c  |   7 +++
 32 files changed, 231 insertions(+), 310 deletions(-)
 delete mode 100644 libavcodec/arm/dsputil_init_vfp.c
 delete mode 100644 libavcodec/arm/dsputil_vfp.S
 delete mode 100644 libavcodec/ppc/float_altivec.c

diff --git a/libavcodec/aacdec.c b/libavcodec/aacdec.c
index d59dea4..0c4e356 100644
--- a/libavcodec/aacdec.c
+++ b/libavcodec/aacdec.c
@@ -2067,9 +2067,9 @@ static void windowing_and_mdct_ltp(AACContext *ac, float *out,
 ac->fdsp.vector_fmul(in + 448, in + 448, swindow_prev, 128);
 }
 if (ics->window_sequence[0] != LONG_START_SEQUENCE) {
-ac->dsp.vector_fmul_reverse(in + 1024, in + 1024, lwindow, 1024);
+ac->fdsp.vector_fmul_reverse(in + 1024, in + 1024, lwindow, 1024);
 } else {
-ac->dsp.vector_fmul_reverse(in + 1024 + 448, in + 1024 + 448, swindow, 128);
+ac->fdsp.vector_fmul_reverse(in + 1024 + 448, in + 1024 + 448, swindow, 128);
 memset(in + 1024 + 576, 0, 448 * sizeof(float));
 }
 ac->mdct_ltp.mdct_calc(&ac->mdct_ltp, out, in);
@@ -2122,17 +2122,17 @@ static void update_ltp(AACContext *ac, SingleChannelElement *sce)
 if (ics->window_sequence[0] == EIGHT_SHORT_SEQUENCE) {
 memcpy(saved_ltp,   saved, 512 * sizeof(float));
 memset(saved_ltp + 576, 0, 448 * sizeof(float));
-ac->dsp.vector_fmul_reverse(saved_ltp + 448, ac->buf_mdct + 960, &swindow[64],  64);
+ac->fdsp.vector_fmul_reverse(saved_ltp + 448, ac->buf_mdct + 960, &swindow[64],  64);
 for (i = 0; i < 64; i++)
 saved_ltp[i + 512] = ac->buf_mdct[1023 - i] * swindow[63 - i];
 } else if (ics->window_sequence[0] == LONG_START_SEQUENCE) {
 memcpy(saved_ltp,   ac->buf_mdct + 512, 448 * sizeof(float));
 memset(saved_ltp + 576, 0,  448 * sizeof(float));
-ac->dsp.vector_fmul_reverse(saved_ltp + 448, ac->buf_mdct + 960, &swindow[64],  64);
+ac->fdsp.vector_fmul_reverse(saved_ltp + 448, ac->buf_mdct + 960, &swindow[64],  64);
 for (i = 0; i < 64; i++)
 saved_ltp[i + 512] = ac->buf_mdct[1023 - i] * swindow[63 - i];
 } else { // LONG_STOP or ONLY_LONG
-ac->dsp.vector_fmul_reverse(saved_ltp,   ac->buf_mdct + 512, &lwindow[512], 512);
+ac->fdsp.vector_fmul_reverse(saved_ltp,   ac->buf_mdct + 512, &lwindow[512], 512);
 for (i = 0; i < 512; i++)
 saved_ltp[i + 512] = ac->buf_mdct[1023 - i] * lwindow[511 - i];
 }
diff --git a/libavcodec/aacenc.c b/libavcodec/aacenc.c
index 6f582ca..00a6d03 100644
--- a/libavcodec/aacenc.c
+++ b/libavcodec/aacenc.c
@@ -183,7 +183,7 @@ static void put_audio_specific_config(AVCodecContext *avctx)
 }
 
 #define WINDOW_FUNC(type) \
-static void apply_ ##type ##_window(DSPContext *dsp, AVFloatDSPContext *fdsp, \
+static void apply_ ##type ##_window(AVFloatDSPContext *fdsp, \
 SingleChannelElement *sce, \
 const float *audio)
 
@@ -193,8 +193,8 @@ WINDOW_FUNC(only_long)
 const float *pwindow = sce->ics.use_kb_window[1] ? ff_aac_kbd_long_1024 : ff_sine_1024;
 float *out = sce->ret_buf;
 
-fdsp

[libav-devel] [PATCH] dsputil: remove butterflies_float_interleave.

2013-01-20 Thread Ronald S. Bultje
From: Ronald S. Bultje <rsbul...@gmail.com>

The function is unused.
---
 libavcodec/dsputil.c | 13 -
 libavcodec/dsputil.h | 17 -
 libavcodec/x86/dsputil.asm   | 44 
 libavcodec/x86/dsputil_mmx.c |  7 ---
 4 files changed, 81 deletions(-)

diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
index 6a0c4cf..3903eeb 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/dsputil.c
@@ -2364,18 +2364,6 @@ static void butterflies_float_c(float *restrict v1, float *restrict v2,
 }
 }
 
-static void butterflies_float_interleave_c(float *dst, const float *src0,
-   const float *src1, int len)
-{
-int i;
-for (i = 0; i < len; i++) {
-float f1 = src0[i];
-float f2 = src1[i];
-dst[2*i] = f1 + f2;
-dst[2*i + 1] = f1 - f2;
-}
-}
-
 float ff_scalarproduct_float_c(const float *v1, const float *v2, int len)
 {
 float p = 0.0;
@@ -2719,7 +2707,6 @@ av_cold void ff_dsputil_init(DSPContext* c, AVCodecContext *avctx)
 c->vector_clip_int32 = vector_clip_int32_c;
 c->scalarproduct_float = ff_scalarproduct_float_c;
 c->butterflies_float = butterflies_float_c;
-c->butterflies_float_interleave = butterflies_float_interleave_c;
 
 c->shrink[0]= av_image_copy_plane;
 c->shrink[1]= ff_shrink22;
diff --git a/libavcodec/dsputil.h b/libavcodec/dsputil.h
index 7d2a332..5d49083 100644
--- a/libavcodec/dsputil.h
+++ b/libavcodec/dsputil.h
@@ -359,23 +359,6 @@ typedef struct DSPContext {
  */
 void (*butterflies_float)(float *restrict v1, float *restrict v2, int len);
 
-/**
- * Calculate the sum and difference of two vectors of floats and interleave
- * results into a separate output vector of floats, with each sum
- * positioned before the corresponding difference.
- *
- * @param dst  output vector
- * constraints: 16-byte aligned
- * @param src0 first input vector
- * constraints: 32-byte aligned
- * @param src1 second input vector
- * constraints: 32-byte aligned
- * @param len  number of elements in the input
- * constraints: multiple of 8
- */
-void (*butterflies_float_interleave)(float *dst, const float *src0,
- const float *src1, int len);
-
 /* (I)DCT */
 void (*fdct)(DCTELEM *block/* align 16*/);
 void (*fdct248)(DCTELEM *block/* align 16*/);
diff --git a/libavcodec/x86/dsputil.asm b/libavcodec/x86/dsputil.asm
index f22fb19..27e77d5 100644
--- a/libavcodec/x86/dsputil.asm
+++ b/libavcodec/x86/dsputil.asm
@@ -567,50 +567,6 @@ VECTOR_CLIP_INT32 11, 1, 1, 0
 VECTOR_CLIP_INT32 6, 1, 0, 0
 %endif
 
-;-
-; void ff_butterflies_float_interleave(float *dst, const float *src0,
-;  const float *src1, int len);
-;-
-
-%macro BUTTERFLIES_FLOAT_INTERLEAVE 0
-cglobal butterflies_float_interleave, 4,4,3, dst, src0, src1, len
-%if ARCH_X86_64
-movsxd lenq, lend
-%endif
-test  lenq, lenq
-jz .end
-shl   lenq, 2
-lea  src0q, [src0q +   lenq]
-lea  src1q, [src1q +   lenq]
-lea   dstq, [ dstq + 2*lenq]
-neg   lenq
-.loop:
-mova m0, [src0q + lenq]
-mova m1, [src1q + lenq]
-subps   m2, m0, m1
-addps   m0, m0, m1
-unpcklps m1, m0, m2
-unpckhps m0, m0, m2
-%if cpuflag(avx)
-vextractf128 [dstq + 2*lenq ], m1, 0
-vextractf128 [dstq + 2*lenq + 16], m0, 0
-vextractf128 [dstq + 2*lenq + 32], m1, 1
-vextractf128 [dstq + 2*lenq + 48], m0, 1
-%else
-mova [dstq + 2*lenq ], m1
-mova [dstq + 2*lenq + mmsize], m0
-%endif
-add   lenq, mmsize
-jl .loop
-.end:
-REP_RET
-%endmacro
-
-INIT_XMM sse
-BUTTERFLIES_FLOAT_INTERLEAVE
-INIT_YMM avx
-BUTTERFLIES_FLOAT_INTERLEAVE
-
 ; %1 = aligned/unaligned
 %macro BSWAP_LOOPS  1
 mov  r3, r2
diff --git a/libavcodec/x86/dsputil_mmx.c b/libavcodec/x86/dsputil_mmx.c
index fb1a801..5ac18f3 100644
--- a/libavcodec/x86/dsputil_mmx.c
+++ b/libavcodec/x86/dsputil_mmx.c
@@ -1854,11 +1854,6 @@ void ff_vector_clip_int32_int_sse2(int32_t *dst, const int32_t *src,
 void ff_vector_clip_int32_sse4(int32_t *dst, const int32_t *src,
int32_t min, int32_t max, unsigned int len);
 
-extern void ff_butterflies_float_interleave_sse(float *dst, const float *src0,
-const float *src1, int len);
-extern void ff_butterflies_float_interleave_avx(float *dst, const float *src0,
-const float *src1, int len);
-
 #define SET_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX)  \
 do

Re: [libav-devel] [PATCH 06/14] x32 yasm: x86inc add ptrsize and p-suffix

2013-01-20 Thread Ronald S. Bultje
Hi,

On Sun, Jan 20, 2013 at 2:32 PM, Matthias Räncker
<theonetruecam...@gmx.de> wrote:
> - new ptrsize macro
>   equals 8 on x64, 4 otherwise
> - added p suffix to registers and function arguments
>   referring to pointer sized registers and arguments
>
> Signed-off-by: Matthias Räncker <theonetruecam...@gmx.de>
> ---
>  libavutil/x86/x86inc.asm | 34 ++
>  1 file changed, 34 insertions(+)

x86inc.asm patches go to x264. We only merge x264 patches.

Ronald


[libav-devel] [PATCH] floatdsp: move scalarproduct_float from dsputil to avfloatdsp.

2013-01-20 Thread Ronald S. Bultje
From: Ronald S. Bultje <rsbul...@gmail.com>

This makes the aac decoder and all voice codecs independent of dsputil.
---
 libavcodec/aac.h|  1 -
 libavcodec/aacdec.c |  3 +--
 libavcodec/acelp_pitch_delay.c  |  4 ++--
 libavcodec/acelp_vectors.c  |  6 +++---
 libavcodec/amrnbdec.c   | 20 ++--
 libavcodec/amrwbdec.c   | 33 +
 libavcodec/arm/dsputil_init_neon.c  |  3 ---
 libavcodec/arm/dsputil_neon.S   | 13 -
 libavcodec/dsputil.c| 12 
 libavcodec/dsputil.h| 18 --
 libavcodec/qcelpdec.c   | 17 -
 libavcodec/ra288.c  |  4 ++--
 libavcodec/sipr.c   | 15 ---
 libavcodec/sipr16k.c|  8 
 libavcodec/wmavoice.c   | 16 +---
 libavcodec/x86/dsputil.asm  | 26 --
 libavcodec/x86/dsputil_mmx.c|  6 --
 libavutil/arm/float_dsp_init_neon.c |  3 +++
 libavutil/arm/float_dsp_neon.S  | 13 +
 libavutil/float_dsp.c   | 12 
 libavutil/float_dsp.h   | 22 ++
 libavutil/x86/float_dsp.asm | 25 +
 libavutil/x86/float_dsp_init.c  |  3 +++
 23 files changed, 142 insertions(+), 141 deletions(-)

diff --git a/libavcodec/aac.h b/libavcodec/aac.h
index 6c5d962..dd337a0 100644
--- a/libavcodec/aac.h
+++ b/libavcodec/aac.h
@@ -291,7 +291,6 @@ typedef struct AACContext {
 FFTContext mdct;
 FFTContext mdct_small;
 FFTContext mdct_ltp;
-DSPContext dsp;
 FmtConvertContext fmt_conv;
 AVFloatDSPContext fdsp;
 int random_state;
diff --git a/libavcodec/aacdec.c b/libavcodec/aacdec.c
index b016611..5afc9b8 100644
--- a/libavcodec/aacdec.c
+++ b/libavcodec/aacdec.c
@@ -895,7 +895,6 @@ static av_cold int aac_decode_init(AVCodecContext *avctx)
 
 ff_aac_sbr_init();
 
-ff_dsputil_init(&ac->dsp, avctx);
 ff_fmt_convert_init(&ac->fmt_conv, avctx);
 avpriv_float_dsp_init(&ac->fdsp, avctx->flags & CODEC_FLAG_BITEXACT);
 
@@ -1358,7 +1357,7 @@ static int decode_spectrum_and_dequant(AACContext *ac, float coef[1024],
 cfo[k] = ac->random_state;
 }
 
-band_energy = ac->dsp.scalarproduct_float(cfo, cfo, off_len);
+band_energy = ac->fdsp.scalarproduct_float(cfo, cfo, off_len);
 scale = sf[idx] / sqrtf(band_energy);
 ac->fdsp.vector_fmul_scalar(cfo, cfo, scale, off_len);
 }
diff --git a/libavcodec/acelp_pitch_delay.c b/libavcodec/acelp_pitch_delay.c
index a9668fa..ab09bdb 100644
--- a/libavcodec/acelp_pitch_delay.c
+++ b/libavcodec/acelp_pitch_delay.c
@@ -21,9 +21,9 @@
  */
 
 #include "libavutil/common.h"
+#include "libavutil/float_dsp.h"
 #include "libavutil/mathematics.h"
 #include "avcodec.h"
-#include "dsputil.h"
 #include "acelp_pitch_delay.h"
 #include "celp_math.h"
 
@@ -120,7 +120,7 @@ float ff_amr_set_fixed_gain(float fixed_gain_factor, float fixed_mean_energy,
 // Note 10^(0.05 * -10log(average x2)) = 1/sqrt((average x2)).
 float val = fixed_gain_factor *
 exp2f(M_LOG2_10 * 0.05 *
-  (ff_scalarproduct_float_c(pred_table, prediction_error, 4) +
+  (avpriv_scalarproduct_float_c(pred_table, prediction_error, 4) +
energy_mean)) /
 sqrtf(fixed_mean_energy);
 
diff --git a/libavcodec/acelp_vectors.c b/libavcodec/acelp_vectors.c
index b50c5f3..a85e45f 100644
--- a/libavcodec/acelp_vectors.c
+++ b/libavcodec/acelp_vectors.c
@@ -23,8 +23,8 @@
 #include <inttypes.h>
 
 #include "libavutil/common.h"
+#include "libavutil/float_dsp.h"
 #include "avcodec.h"
-#include "dsputil.h"
 #include "acelp_vectors.h"
 
 const uint8_t ff_fc_2pulses_9bits_track1[16] =
@@ -183,7 +183,7 @@ void ff_adaptive_gain_control(float *out, const float *in, float speech_energ,
   int size, float alpha, float *gain_mem)
 {
 int i;
-float postfilter_energ = ff_scalarproduct_float_c(in, in, size);
+float postfilter_energ = avpriv_scalarproduct_float_c(in, in, size);
 float gain_scale_factor = 1.0;
 float mem = *gain_mem;
 
@@ -204,7 +204,7 @@ void ff_scale_vector_to_given_sum_of_squares(float *out, const float *in,
  float sum_of_squares, const int n)
 {
 int i;
-float scalefactor = ff_scalarproduct_float_c(in, in, n);
+float scalefactor = avpriv_scalarproduct_float_c(in, in, n);
 if (scalefactor)
 scalefactor = sqrt(sum_of_squares / scalefactor);
 for (i = 0; i < n; i++)
diff --git a/libavcodec/amrnbdec.c b/libavcodec/amrnbdec.c
index 5c359a8..7db12dd 100644
--- a/libavcodec/amrnbdec.c
+++ b/libavcodec/amrnbdec.c
@@ -44,8 +44,8 @@
 #include <math.h>
 
 #include "libavutil/channel_layout.h"
+#include libavutil
