Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-12-12 Thread Michael Niedermayer
On Fri, Dec 11, 2015 at 12:09:57PM -0500, Ganesh Ajjanagadde wrote:
> On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpun
>  wrote:
> > On 11.12.2015 17:21, Ganesh Ajjanagadde wrote:
> >> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun
> >>  wrote:
> >>> On 19.11.2015 14:17, Michael Niedermayer wrote:
>  From: Michael Niedermayer 
> 
>  Signed-off-by: Michael Niedermayer 
>  ---
>   libavcodec/aacsbr.c |1 +
>   1 file changed, 1 insertion(+)
> 
>  diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
>  index d1e3a91..e014646 100644
>  --- a/libavcodec/aacsbr.c
>  +++ b/libavcodec/aacsbr.c
>  @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, 
>  int id_aac)
>   {
>   int k, e;
>   int ch;
>  +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer 
>  and have a small range
> 
>   if (id_aac == TYPE_CPE && sbr->bs_coupling) {
>   float alpha  = sbr->data[0].bs_amp_res ?  1.0f :  0.5f;
> 
> >>>
> >>> This shouldn't hurt, with or without the clarification requested by 
> >>> Ganesh.
> >>
> >> I am doing related work cleaning up and optimizing usages of slow libm
> >> functions such as pow and exp2. Do you know the exact possible range
> >> of the inputs x, and if so, can it be added to the comment? That will
> >> be very helpful for me to come up with a patch. Thanks.
> >
> > The exp2f expressions are:
> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
> >
> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6.
> > After patch 3 of this series, env_facs_q is in the range from 0 to 127 and
> > noise_facs_q is already limited to the range from 0 to 30.
> >
> > So x should always be in the range -300..300, or so.
> 
> Very good, thanks a lot.
> 
> Based on the above range, my idea is to not even use a LUT, but use
> something like exp2fi followed by multiplication by M_SQRT2 depending
> on even or odd.

conditional operations can due to branch misprediction be potentially
rather slow

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-12-12 Thread Ganesh Ajjanagadde
On Sat, Dec 12, 2015 at 12:58 PM, Michael Niedermayer  wrote:
> On Fri, Dec 11, 2015 at 12:09:57PM -0500, Ganesh Ajjanagadde wrote:
>> On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpun
>>  wrote:
>> > On 11.12.2015 17:21, Ganesh Ajjanagadde wrote:
>> >> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun
>> >>  wrote:
>> >>> On 19.11.2015 14:17, Michael Niedermayer wrote:
>>  From: Michael Niedermayer 
>> 
>>  Signed-off-by: Michael Niedermayer 
>>  ---
>>   libavcodec/aacsbr.c |1 +
>>   1 file changed, 1 insertion(+)
>> 
>>  diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
>>  index d1e3a91..e014646 100644
>>  --- a/libavcodec/aacsbr.c
>>  +++ b/libavcodec/aacsbr.c
>>  @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, 
>>  int id_aac)
>>   {
>>   int k, e;
>>   int ch;
>>  +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer 
>>  and have a small range
>> 
>>   if (id_aac == TYPE_CPE && sbr->bs_coupling) {
>>   float alpha  = sbr->data[0].bs_amp_res ?  1.0f :  0.5f;
>> 
>> >>>
>> >>> This shouldn't hurt, with or without the clarification requested by 
>> >>> Ganesh.
>> >>
>> >> I am doing related work cleaning up and optimizing usages of slow libm
>> >> functions such as pow and exp2. Do you know the exact possible range
>> >> of the inputs x, and if so, can it be added to the comment? That will
>> >> be very helpful for me to come up with a patch. Thanks.
>> >
>> > The exp2f expressions are:
>> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
>> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
>> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
>> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
>> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
>> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
>> >
>> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6.
>> > After patch 3 of this series, env_facs_q is in the range from 0 to 127 and
>> > noise_facs_q is already limited to the range from 0 to 30.
>> >
>> > So x should always be in the range -300..300, or so.
>>
>> Very good, thanks a lot.
>>
>> Based on the above range, my idea is to not even use a LUT, but use
>> something like exp2fi followed by multiplication by M_SQRT2 depending
>> on even or odd.
>
> conditional operations can due to branch misprediction be potentially
> rather slow

I think it will still be far faster than exp2f, and in the absence of
hard numbers, I view this a far better approach than a large (~300
element) lut. Of course, the proof and extent of this will need to
wait for actual benches.

>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Many that live deserve death. And some that die deserve life. Can you give
> it to them? Then do not be too eager to deal out death in judgement. For
> even the very wise cannot see all ends. -- Gandalf
>
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-12-12 Thread Michael Niedermayer
On Sat, Dec 12, 2015 at 01:08:01PM -0500, Ganesh Ajjanagadde wrote:
> On Sat, Dec 12, 2015 at 12:58 PM, Michael Niedermayer  
> wrote:
> > On Fri, Dec 11, 2015 at 12:09:57PM -0500, Ganesh Ajjanagadde wrote:
> >> On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpun
> >>  wrote:
> >> > On 11.12.2015 17:21, Ganesh Ajjanagadde wrote:
> >> >> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun
> >> >>  wrote:
> >> >>> On 19.11.2015 14:17, Michael Niedermayer wrote:
> >>  From: Michael Niedermayer 
> >> 
> >>  Signed-off-by: Michael Niedermayer 
> >>  ---
> >>   libavcodec/aacsbr.c |1 +
> >>   1 file changed, 1 insertion(+)
> >> 
> >>  diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
> >>  index d1e3a91..e014646 100644
> >>  --- a/libavcodec/aacsbr.c
> >>  +++ b/libavcodec/aacsbr.c
> >>  @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication 
> >>  *sbr, int id_aac)
> >>   {
> >>   int k, e;
> >>   int ch;
> >>  +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all 
> >>  integer and have a small range
> >> 
> >>   if (id_aac == TYPE_CPE && sbr->bs_coupling) {
> >>   float alpha  = sbr->data[0].bs_amp_res ?  1.0f :  0.5f;
> >> 
> >> >>>
> >> >>> This shouldn't hurt, with or without the clarification requested by 
> >> >>> Ganesh.
> >> >>
> >> >> I am doing related work cleaning up and optimizing usages of slow libm
> >> >> functions such as pow and exp2. Do you know the exact possible range
> >> >> of the inputs x, and if so, can it be added to the comment? That will
> >> >> be very helpful for me to come up with a patch. Thanks.
> >> >
> >> > The exp2f expressions are:
> >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
> >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
> >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
> >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
> >> >
> >> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6.
> >> > After patch 3 of this series, env_facs_q is in the range from 0 to 127 
> >> > and
> >> > noise_facs_q is already limited to the range from 0 to 30.
> >> >
> >> > So x should always be in the range -300..300, or so.
> >>
> >> Very good, thanks a lot.
> >>
> >> Based on the above range, my idea is to not even use a LUT, but use
> >> something like exp2fi followed by multiplication by M_SQRT2 depending
> >> on even or odd.
> >
> > conditional operations can due to branch misprediction be potentially
> > rather slow
> 
> I think it will still be far faster than exp2f, and in the absence of
> hard numbers, I view this a far better approach than a large (~300
> element) lut. Of course, the proof and extent of this will need to
> wait for actual benches.

alternatively one could do a
if (x+A < (unsigned)B)
LUT[x+A]
else
exp2whatever(x)

the range in practice should be much smaller than +-300

also the LUT can possibly be shared between codecs

or that code could be in a exp_sqrt2i() or something

just some random ideas...

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-12-12 Thread Ganesh Ajjanagadde
On Sat, Dec 12, 2015 at 1:17 PM, Michael Niedermayer  wrote:
[...]

>> >> >
>> >> > The exp2f expressions are:
>> >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
>> >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
>> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
>> >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
>> >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
>> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
>> >> >
>> >> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6.
>> >> > After patch 3 of this series, env_facs_q is in the range from 0 to 127 
>> >> > and
>> >> > noise_facs_q is already limited to the range from 0 to 30.
>> >> >
>> >> > So x should always be in the range -300..300, or so.
>> >>
>> >> Very good, thanks a lot.
>> >>
>> >> Based on the above range, my idea is to not even use a LUT, but use
>> >> something like exp2fi followed by multiplication by M_SQRT2 depending
>> >> on even or odd.
>> >
>> > conditional operations can due to branch misprediction be potentially
>> > rather slow
>>
>> I think it will still be far faster than exp2f, and in the absence of
>> hard numbers, I view this a far better approach than a large (~300
>> element) lut. Of course, the proof and extent of this will need to
>> wait for actual benches.
>
> alternatively one could do a
> if (x+A < (unsigned)B)
> LUT[x+A]
> else
> exp2whatever(x)
>
> the range in practice should be much smaller than +-300

That still uses a branch, so unless for whatever reason the numbers
tend to concentrate in an interval (which you believe but I am
agnostic about since I don't know AAC), this is code complexity for
little gain. Furthermore, that B then becomes a "voodoo" constant, and
performance may vary across files depending on the degree of
concentration of the inputs. I personally don't like such things
unless they are very well justified after all other easy, uniform
methods of optimization are exhausted.

>
> also the LUT can possibly be shared between codecs

This is something for which there is plenty of low hanging fruit
across FFmpeg. The best example I know of is the sin/cos tables used
across dct, fft, and other areas: a lot of them can be derived from
sin(2*pi*i/65536) for 0 <= i <= 65536/4, and cheap runtime derivation
via indexing and symmetry exploitation (eg 16*i, 32*i, flip the sign).
I will work on this only after other things are sorted out; so in the
meantime it is up for grabs.

Complications come from threading issues for the case of dynamic init,
since one needs to place a lock of some sort to avoid write
contention. And --enable-hardcoded-tables does not help at all, as it
distracts from the key issues since one needs to reason about both
cases.

Unfortunately, the question of static vs dynamic tables is something
that can easily get bikeshedded to death. I am trying to do
improvements to the default dynamic initialization to help resolve
such things, see e.g the aac_tablegen work. My goal is to first
improve the runtime generation code, and once it is done, ask the ML
for decisions on this when the answer is relatively clear cut.

I might be too optimistic here regarding removal of
--enable-hardcoded-tables entirely via hard decisions in respective
usages of static vs runtime across the codebase. This is beacuse
things like mpegaudio_tablegen are hard to do on the order of 10^5 or
below cycles at runtime, leading to ambiguity and stalemating the
removal of hardcoded tables hackery.

>
> or that code could be in a exp_sqrt2i() or something

Maybe, I just thought by keeping common code, one avoids essentially
duplicate functions and it possibly keeps binary size smaller. This of
course depends on the linker, header vs source implementation, etc and
is a minor consideration.

>
> just some random ideas...

Thanks for the ideas. I had thought about most of these things before
settling on the idea proposed above. In any case, the goal should be
solid incremental improvement: last mile tweaks to get rid of some
branch penalties, optimize for common inputs, or other things can
always be done later, and should not bog down work now.

>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Complexity theory is the science of finding the exact solution to an
> approximation. Benchmarking OTOH is finding an approximation of the exact
>
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-12-12 Thread Michael Niedermayer
On Sat, Dec 12, 2015 at 05:24:34PM -0500, Ganesh Ajjanagadde wrote:
> On Sat, Dec 12, 2015 at 1:17 PM, Michael Niedermayer  wrote:
> [...]
> 
> >> >> >
> >> >> > The exp2f expressions are:
> >> >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
> >> >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
> >> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
> >> >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
> >> >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
> >> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
> >> >> >
> >> >> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 
> >> >> > 6.
> >> >> > After patch 3 of this series, env_facs_q is in the range from 0 to 
> >> >> > 127 and
> >> >> > noise_facs_q is already limited to the range from 0 to 30.
> >> >> >
> >> >> > So x should always be in the range -300..300, or so.
> >> >>
> >> >> Very good, thanks a lot.
> >> >>
> >> >> Based on the above range, my idea is to not even use a LUT, but use
> >> >> something like exp2fi followed by multiplication by M_SQRT2 depending
> >> >> on even or odd.
> >> >
> >> > conditional operations can due to branch misprediction be potentially
> >> > rather slow
> >>
> >> I think it will still be far faster than exp2f, and in the absence of
> >> hard numbers, I view this a far better approach than a large (~300
> >> element) lut. Of course, the proof and extent of this will need to
> >> wait for actual benches.
> >
> > alternatively one could do a
> > if (x+A < (unsigned)B)
> > LUT[x+A]
> > else
> > exp2whatever(x)
> >
> > the range in practice should be much smaller than +-300
> 
> That still uses a branch, so unless for whatever reason the numbers
> tend to concentrate in an interval (which you believe but I am
> agnostic about since I don't know AAC), this is code complexity for

theres an easy way to find out, just print the numbers and use
sort -n | uniq -c
i didnt try but i expect most of the range to have 0 hits


> little gain. Furthermore, that B then becomes a "voodoo" constant, and
> performance may vary across files depending on the degree of
> concentration of the inputs. I personally don't like such things
> unless they are very well justified after all other easy, uniform
> methods of optimization are exhausted.
> 
> >
> > also the LUT can possibly be shared between codecs
> 
> This is something for which there is plenty of low hanging fruit
> across FFmpeg. The best example I know of is the sin/cos tables used
> across dct, fft, and other areas: a lot of them can be derived from
> sin(2*pi*i/65536) for 0 <= i <= 65536/4, and cheap runtime derivation
> via indexing and symmetry exploitation (eg 16*i, 32*i, flip the sign).
> I will work on this only after other things are sorted out; so in the
> meantime it is up for grabs.
> 
> Complications come from threading issues for the case of dynamic init,
> since one needs to place a lock of some sort to avoid write
> contention. And --enable-hardcoded-tables does not help at all, as it
> distracts from the key issues since one needs to reason about both
> cases.

you do not "need" a lock

the most standard way to add a table is to type the numbers of its
entries in the a source file and commit that.
if the table is to be generated at build time then some complexity is
added to do that with a generator run at build time
if it is build at runtime then it no longer can be shared between
processes and needs to be written to disk in case it is paged out
a "hardcoded" table resides in read only memory and on page out
is discarded, the kernel can read it directly from the excutable
again when it gets accessed again.
and one can support both runtime and buildtime tables, so the user
can choose
a manually hardcoded table without generator is harder to update
though and it and the build generated ones result in larger object and
excutable files


> 
> Unfortunately, the question of static vs dynamic tables is something
> that can easily get bikeshedded to death. I am trying to do

yes, everyone has a different use case in mind and a different variant
is optimal for each.
In the end there is not a big difference between them i think,
especially not for small tables. For big tables there could be more
significant disadvantages in each solution relative to what is optimal
for each others use case

anyway, you could try exp2whatever(x>>1) * tab[x&1]
or something
that may be faster IFF the compiler builds the  if(x&1) into bad
code and if the values actually fluctuate enough

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have often repented speaking, but never of holding my tongue.
-- Xenocrates


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-12-11 Thread Andreas Cadhalpun
On 11.12.2015 17:21, Ganesh Ajjanagadde wrote:
> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun
>  wrote:
>> On 19.11.2015 14:17, Michael Niedermayer wrote:
>>> From: Michael Niedermayer 
>>>
>>> Signed-off-by: Michael Niedermayer 
>>> ---
>>>  libavcodec/aacsbr.c |1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
>>> index d1e3a91..e014646 100644
>>> --- a/libavcodec/aacsbr.c
>>> +++ b/libavcodec/aacsbr.c
>>> @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, int 
>>> id_aac)
>>>  {
>>>  int k, e;
>>>  int ch;
>>> +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and 
>>> have a small range
>>>
>>>  if (id_aac == TYPE_CPE && sbr->bs_coupling) {
>>>  float alpha  = sbr->data[0].bs_amp_res ?  1.0f :  0.5f;
>>>
>>
>> This shouldn't hurt, with or without the clarification requested by Ganesh.
> 
> I am doing related work cleaning up and optimizing usages of slow libm
> functions such as pow and exp2. Do you know the exact possible range
> of the inputs x, and if so, can it be added to the comment? That will
> be very helpful for me to come up with a patch. Thanks.

The exp2f expressions are:
exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);

Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6.
After patch 3 of this series, env_facs_q is in the range from 0 to 127 and
noise_facs_q is already limited to the range from 0 to 30.

So x should always be in the range -300..300, or so.

Best regards,
Andreas
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-12-11 Thread Ganesh Ajjanagadde
On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpun
 wrote:
> On 11.12.2015 17:21, Ganesh Ajjanagadde wrote:
>> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun
>>  wrote:
>>> On 19.11.2015 14:17, Michael Niedermayer wrote:
 From: Michael Niedermayer 

 Signed-off-by: Michael Niedermayer 
 ---
  libavcodec/aacsbr.c |1 +
  1 file changed, 1 insertion(+)

 diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
 index d1e3a91..e014646 100644
 --- a/libavcodec/aacsbr.c
 +++ b/libavcodec/aacsbr.c
 @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, 
 int id_aac)
  {
  int k, e;
  int ch;
 +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and 
 have a small range

  if (id_aac == TYPE_CPE && sbr->bs_coupling) {
  float alpha  = sbr->data[0].bs_amp_res ?  1.0f :  0.5f;

>>>
>>> This shouldn't hurt, with or without the clarification requested by Ganesh.
>>
>> I am doing related work cleaning up and optimizing usages of slow libm
>> functions such as pow and exp2. Do you know the exact possible range
>> of the inputs x, and if so, can it be added to the comment? That will
>> be very helpful for me to come up with a patch. Thanks.
>
> The exp2f expressions are:
> exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
> exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
> exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
> exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
> exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
> exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
>
> Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6.
> After patch 3 of this series, env_facs_q is in the range from 0 to 127 and
> noise_facs_q is already limited to the range from 0 to 30.
>
> So x should always be in the range -300..300, or so.

Very good, thanks a lot.

Based on the above range, my idea is to not even use a LUT, but use
something like exp2fi followed by multiplication by M_SQRT2 depending
on even or odd. This will not bloat the binary, but is still very fast
and avoids huge variability in performance. That should provide a good
baseline (see jpeg2000 for this idea), further tweaks can be done (e.g
using an exp2i, i.e double precision to avoid possible branching for
the overflow/underflow cases).

Maybe exp2i and/or exp2fi could be moved to avutil/internal or more
appropriately avcodec/internal as they have utility in this, jpeg2000,
and at least one other place in avcodec (which I can't recall).

Will be addressed in a week or so, unless someone does it before then.
This is very quick to do, and so this patch may not be needed.

>
> Best regards,
> Andreas
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-12-11 Thread Andreas Cadhalpun
On 19.11.2015 14:17, Michael Niedermayer wrote:
> From: Michael Niedermayer 
> 
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/aacsbr.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
> index d1e3a91..e014646 100644
> --- a/libavcodec/aacsbr.c
> +++ b/libavcodec/aacsbr.c
> @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, int 
> id_aac)
>  {
>  int k, e;
>  int ch;
> +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and 
> have a small range
>  
>  if (id_aac == TYPE_CPE && sbr->bs_coupling) {
>  float alpha  = sbr->data[0].bs_amp_res ?  1.0f :  0.5f;
> 

This shouldn't hurt, with or without the clarification requested by Ganesh.

Best regards,
Andreas
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-12-11 Thread Andreas Cadhalpun
On 11.12.2015 18:09, Ganesh Ajjanagadde wrote:
> On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpun
>  wrote:
>> On 11.12.2015 17:21, Ganesh Ajjanagadde wrote:
>>> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun
>>>  wrote:
 On 19.11.2015 14:17, Michael Niedermayer wrote:
> From: Michael Niedermayer 
>
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/aacsbr.c |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
> index d1e3a91..e014646 100644
> --- a/libavcodec/aacsbr.c
> +++ b/libavcodec/aacsbr.c
> @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, 
> int id_aac)
>  {
>  int k, e;
>  int ch;
> +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer 
> and have a small range
>
>  if (id_aac == TYPE_CPE && sbr->bs_coupling) {
>  float alpha  = sbr->data[0].bs_amp_res ?  1.0f :  0.5f;
>

 This shouldn't hurt, with or without the clarification requested by Ganesh.
>>>
>>> I am doing related work cleaning up and optimizing usages of slow libm
>>> functions such as pow and exp2. Do you know the exact possible range
>>> of the inputs x, and if so, can it be added to the comment? That will
>>> be very helpful for me to come up with a patch. Thanks.
>>
>> The exp2f expressions are:
>> exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
>> exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
>> exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
>> exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
>> exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
>> exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
>>
>> Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6.
>> After patch 3 of this series, env_facs_q is in the range from 0 to 127 and
>> noise_facs_q is already limited to the range from 0 to 30.
>>
>> So x should always be in the range -300..300, or so.
> 
> Very good, thanks a lot.
> 
> Based on the above range, my idea is to not even use a LUT, but use
> something like exp2fi followed by multiplication by M_SQRT2 depending
> on even or odd. This will not bloat the binary, but is still very fast
> and avoids huge variability in performance. That should provide a good
> baseline (see jpeg2000 for this idea), further tweaks can be done (e.g
> using an exp2i, i.e double precision to avoid possible branching for
> the overflow/underflow cases).

That sounds good. :)

> Maybe exp2i and/or exp2fi could be moved to avutil/internal or more
> appropriately avcodec/internal as they have utility in this, jpeg2000,
> and at least one other place in avcodec (which I can't recall).

Moving to avcodec/internal should be fine.

> Will be addressed in a week or so, unless someone does it before then.
> This is very quick to do, and so this patch may not be needed.

Indeed, no need to add the comment, if it's removed a week later.

Best regards,
Andreas

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-11-19 Thread Ganesh Ajjanagadde
On Thu, Nov 19, 2015 at 8:17 AM, Michael Niedermayer  wrote:
> From: Michael Niedermayer 
>
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/aacsbr.c |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
> index d1e3a91..e014646 100644
> --- a/libavcodec/aacsbr.c
> +++ b/libavcodec/aacsbr.c
> @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, int 
> id_aac)
>  {
>  int k, e;
>  int ch;
> +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and 
> have a small range

Entirely possible I missed something, but exp2 gives 2^x, so if inputs
are integers, can't one simply do the necessary bit shifts and cast
implicitly to float?
I think what you meant was all inputs are half-integers or integers.

>
>  if (id_aac == TYPE_CPE && sbr->bs_coupling) {
>  float alpha  = sbr->data[0].bs_amp_res ?  1.0f :  0.5f;
> --
> 1.7.9.5
>
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-11-19 Thread Ganesh Ajjanagadde
On Thu, Nov 19, 2015 at 9:18 AM, Michael Niedermayer  wrote:
> On Thu, Nov 19, 2015 at 09:03:03AM -0500, Ganesh Ajjanagadde wrote:
>> On Thu, Nov 19, 2015 at 8:17 AM, Michael Niedermayer  
>> wrote:
>> > From: Michael Niedermayer 
>> >
>> > Signed-off-by: Michael Niedermayer 
>> > ---
>> >  libavcodec/aacsbr.c |1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
>> > index d1e3a91..e014646 100644
>> > --- a/libavcodec/aacsbr.c
>> > +++ b/libavcodec/aacsbr.c
>> > @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, 
>> > int id_aac)
>> >  {
>> >  int k, e;
>> >  int ch;
>> > +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and 
>> > have a small range
>>
>> Entirely possible I missed something, but exp2 gives 2^x, so if inputs
>> are integers, can't one simply do the necessary bit shifts and cast
>> implicitly to float?
>> I think what you meant was all inputs are half-integers or integers.
>
> i meant all arguments to function f(x) are integers, f(x) = exp2(0.5*x)

Since we had this back and forth, suggested clarifying reword:

//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and
have a small range ->
//TODO: Replace exp2f(0.5*x) by a LUT, the inputs x are all integer
and have a small range

>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Let us carefully observe those good qualities wherein our enemies excel us
> and endeavor to excel them, by avoiding what is faulty, and imitating what
> is excellent in them. -- Plutarch
>
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

2015-11-19 Thread Michael Niedermayer
On Thu, Nov 19, 2015 at 09:03:03AM -0500, Ganesh Ajjanagadde wrote:
> On Thu, Nov 19, 2015 at 8:17 AM, Michael Niedermayer  wrote:
> > From: Michael Niedermayer 
> >
> > Signed-off-by: Michael Niedermayer 
> > ---
> >  libavcodec/aacsbr.c |1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
> > index d1e3a91..e014646 100644
> > --- a/libavcodec/aacsbr.c
> > +++ b/libavcodec/aacsbr.c
> > @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, int 
> > id_aac)
> >  {
> >  int k, e;
> >  int ch;
> > +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and 
> > have a small range
> 
> Entirely possible I missed something, but exp2 gives 2^x, so if inputs
> are integers, can't one simply do the necessary bit shifts and cast
> implicitly to float?
> I think what you meant was all inputs are half-integers or integers.

i meant all arguments to function f(x) are integers, f(x) = exp2(0.5*x)

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Let us carefully observe those good qualities wherein our enemies excel us
and endeavor to excel them, by avoiding what is faulty, and imitating what
is excellent in them. -- Plutarch


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel