Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On Fri, Dec 11, 2015 at 12:09:57PM -0500, Ganesh Ajjanagadde wrote: > On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpun >wrote: > > On 11.12.2015 17:21, Ganesh Ajjanagadde wrote: > >> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun > >> wrote: > >>> On 19.11.2015 14:17, Michael Niedermayer wrote: > From: Michael Niedermayer > > Signed-off-by: Michael Niedermayer > --- > libavcodec/aacsbr.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c > index d1e3a91..e014646 100644 > --- a/libavcodec/aacsbr.c > +++ b/libavcodec/aacsbr.c > @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, > int id_aac) > { > int k, e; > int ch; > +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer > and have a small range > > if (id_aac == TYPE_CPE && sbr->bs_coupling) { > float alpha = sbr->data[0].bs_amp_res ? 1.0f : 0.5f; > > >>> > >>> This shouldn't hurt, with or without the clarification requested by > >>> Ganesh. > >> > >> I am doing related work cleaning up and optimizing usages of slow libm > >> functions such as pow and exp2. Do you know the exact possible range > >> of the inputs x, and if so, can it be added to the comment? That will > >> be very helpful for me to come up with a patch. Thanks. > > > > The exp2f expressions are: > > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f); > > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha); > > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1); > > exp2f(12 - sbr->data[1].noise_facs_q[e][k]); > > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f); > > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]); > > > > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6. > > After patch 3 of this series, env_facs_q is in the range from 0 to 127 and > > noise_facs_q is already limited to the range from 0 to 30. > > > > So x should always be in the range -300..300, or so. > > Very good, thanks a lot. > > Based on the above range, my idea is to not even use a LUT, but use > something like exp2fi followed by multiplication by M_SQRT2 depending > on even or odd. conditional operations can due to branch misprediction be potentially rather slow [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Many that live deserve death. And some that die deserve life. Can you give it to them? Then do not be too eager to deal out death in judgement. For even the very wise cannot see all ends. -- Gandalf signature.asc Description: Digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On Sat, Dec 12, 2015 at 12:58 PM, Michael Niedermayerwrote: > On Fri, Dec 11, 2015 at 12:09:57PM -0500, Ganesh Ajjanagadde wrote: >> On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpun >> wrote: >> > On 11.12.2015 17:21, Ganesh Ajjanagadde wrote: >> >> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun >> >> wrote: >> >>> On 19.11.2015 14:17, Michael Niedermayer wrote: >> From: Michael Niedermayer >> >> Signed-off-by: Michael Niedermayer >> --- >> libavcodec/aacsbr.c |1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c >> index d1e3a91..e014646 100644 >> --- a/libavcodec/aacsbr.c >> +++ b/libavcodec/aacsbr.c >> @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, >> int id_aac) >> { >> int k, e; >> int ch; >> +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer >> and have a small range >> >> if (id_aac == TYPE_CPE && sbr->bs_coupling) { >> float alpha = sbr->data[0].bs_amp_res ? 1.0f : 0.5f; >> >> >>> >> >>> This shouldn't hurt, with or without the clarification requested by >> >>> Ganesh. >> >> >> >> I am doing related work cleaning up and optimizing usages of slow libm >> >> functions such as pow and exp2. Do you know the exact possible range >> >> of the inputs x, and if so, can it be added to the comment? That will >> >> be very helpful for me to come up with a patch. Thanks. >> > >> > The exp2f expressions are: >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f); >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha); >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1); >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]); >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f); >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]); >> > >> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6. >> > After patch 3 of this series, env_facs_q is in the range from 0 to 127 and >> > noise_facs_q is already limited to the range from 0 to 30. >> > >> > So x should always be in the range -300..300, or so. >> >> Very good, thanks a lot. >> >> Based on the above range, my idea is to not even use a LUT, but use >> something like exp2fi followed by multiplication by M_SQRT2 depending >> on even or odd. > > conditional operations can due to branch misprediction be potentially > rather slow I think it will still be far faster than exp2f, and in the absence of hard numbers, I view this a far better approach than a large (~300 element) lut. Of course, the proof and extent of this will need to wait for actual benches. > > [...] > -- > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB > > Many that live deserve death. And some that die deserve life. Can you give > it to them? Then do not be too eager to deal out death in judgement. For > even the very wise cannot see all ends. -- Gandalf > > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On Sat, Dec 12, 2015 at 01:08:01PM -0500, Ganesh Ajjanagadde wrote: > On Sat, Dec 12, 2015 at 12:58 PM, Michael Niedermayer> wrote: > > On Fri, Dec 11, 2015 at 12:09:57PM -0500, Ganesh Ajjanagadde wrote: > >> On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpun > >> wrote: > >> > On 11.12.2015 17:21, Ganesh Ajjanagadde wrote: > >> >> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun > >> >> wrote: > >> >>> On 19.11.2015 14:17, Michael Niedermayer wrote: > >> From: Michael Niedermayer > >> > >> Signed-off-by: Michael Niedermayer > >> --- > >> libavcodec/aacsbr.c |1 + > >> 1 file changed, 1 insertion(+) > >> > >> diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c > >> index d1e3a91..e014646 100644 > >> --- a/libavcodec/aacsbr.c > >> +++ b/libavcodec/aacsbr.c > >> @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication > >> *sbr, int id_aac) > >> { > >> int k, e; > >> int ch; > >> +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all > >> integer and have a small range > >> > >> if (id_aac == TYPE_CPE && sbr->bs_coupling) { > >> float alpha = sbr->data[0].bs_amp_res ? 1.0f : 0.5f; > >> > >> >>> > >> >>> This shouldn't hurt, with or without the clarification requested by > >> >>> Ganesh. > >> >> > >> >> I am doing related work cleaning up and optimizing usages of slow libm > >> >> functions such as pow and exp2. Do you know the exact possible range > >> >> of the inputs x, and if so, can it be added to the comment? That will > >> >> be very helpful for me to come up with a patch. Thanks. > >> > > >> > The exp2f expressions are: > >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f); > >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha); > >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1); > >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]); > >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f); > >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]); > >> > > >> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6. > >> > After patch 3 of this series, env_facs_q is in the range from 0 to 127 > >> > and > >> > noise_facs_q is already limited to the range from 0 to 30. > >> > > >> > So x should always be in the range -300..300, or so. > >> > >> Very good, thanks a lot. > >> > >> Based on the above range, my idea is to not even use a LUT, but use > >> something like exp2fi followed by multiplication by M_SQRT2 depending > >> on even or odd. > > > > conditional operations can due to branch misprediction be potentially > > rather slow > > I think it will still be far faster than exp2f, and in the absence of > hard numbers, I view this a far better approach than a large (~300 > element) lut. Of course, the proof and extent of this will need to > wait for actual benches. alternatively one could do a if (x+A < (unsigned)B) LUT[x+A] else exp2whatever(x) the range in practice should be much smaller than +-300 also the LUT can possibly be shared between codecs or that code could be in a exp_sqrt2i() or something just some random ideas... [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Complexity theory is the science of finding the exact solution to an approximation. Benchmarking OTOH is finding an approximation of the exact signature.asc Description: Digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On Sat, Dec 12, 2015 at 1:17 PM, Michael Niedermayerwrote: [...] >> >> > >> >> > The exp2f expressions are: >> >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f); >> >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha); >> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1); >> >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]); >> >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f); >> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]); >> >> > >> >> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6. >> >> > After patch 3 of this series, env_facs_q is in the range from 0 to 127 >> >> > and >> >> > noise_facs_q is already limited to the range from 0 to 30. >> >> > >> >> > So x should always be in the range -300..300, or so. >> >> >> >> Very good, thanks a lot. >> >> >> >> Based on the above range, my idea is to not even use a LUT, but use >> >> something like exp2fi followed by multiplication by M_SQRT2 depending >> >> on even or odd. >> > >> > conditional operations can due to branch misprediction be potentially >> > rather slow >> >> I think it will still be far faster than exp2f, and in the absence of >> hard numbers, I view this a far better approach than a large (~300 >> element) lut. Of course, the proof and extent of this will need to >> wait for actual benches. > > alternatively one could do a > if (x+A < (unsigned)B) > LUT[x+A] > else > exp2whatever(x) > > the range in practice should be much smaller than +-300 That still uses a branch, so unless for whatever reason the numbers tend to concentrate in an interval (which you believe but I am agnostic about since I don't know AAC), this is code complexity for little gain. Furthermore, that B then becomes a "voodoo" constant, and performance may vary across files depending on the degree of concentration of the inputs. I personally don't like such things unless they are very well justified after all other easy, uniform methods of optimization are exhausted. > > also the LUT can possibly be shared between codecs This is something for which there is plenty of low hanging fruit across FFmpeg. The best example I know of is the sin/cos tables used across dct, fft, and other areas: a lot of them can be derived from sin(2*pi*i/65536) for 0 <= i <= 65536/4, and cheap runtime derivation via indexing and symmetry exploitation (eg 16*i, 32*i, flip the sign). I will work on this only after other things are sorted out; so in the meantime it is up for grabs. Complications come from threading issues for the case of dynamic init, since one needs to place a lock of some sort to avoid write contention. And --enable-hardcoded-tables does not help at all, as it distracts from the key issues since one needs to reason about both cases. Unfortunately, the question of static vs dynamic tables is something that can easily get bikeshedded to death. I am trying to do improvements to the default dynamic initialization to help resolve such things, see e.g the aac_tablegen work. My goal is to first improve the runtime generation code, and once it is done, ask the ML for decisions on this when the answer is relatively clear cut. I might be too optimistic here regarding removal of --enable-hardcoded-tables entirely via hard decisions in respective usages of static vs runtime across the codebase. This is beacuse things like mpegaudio_tablegen are hard to do on the order of 10^5 or below cycles at runtime, leading to ambiguity and stalemating the removal of hardcoded tables hackery. > > or that code could be in a exp_sqrt2i() or something Maybe, I just thought by keeping common code, one avoids essentially duplicate functions and it possibly keeps binary size smaller. This of course depends on the linker, header vs source implementation, etc and is a minor consideration. > > just some random ideas... Thanks for the ideas. I had thought about most of these things before settling on the idea proposed above. In any case, the goal should be solid incremental improvement: last mile tweaks to get rid of some branch penalties, optimize for common inputs, or other things can always be done later, and should not bog down work now. > > [...] > -- > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB > > Complexity theory is the science of finding the exact solution to an > approximation. Benchmarking OTOH is finding an approximation of the exact > > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On Sat, Dec 12, 2015 at 05:24:34PM -0500, Ganesh Ajjanagadde wrote: > On Sat, Dec 12, 2015 at 1:17 PM, Michael Niedermayerwrote: > [...] > > >> >> > > >> >> > The exp2f expressions are: > >> >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f); > >> >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha); > >> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1); > >> >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]); > >> >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f); > >> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]); > >> >> > > >> >> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is > >> >> > 6. > >> >> > After patch 3 of this series, env_facs_q is in the range from 0 to > >> >> > 127 and > >> >> > noise_facs_q is already limited to the range from 0 to 30. > >> >> > > >> >> > So x should always be in the range -300..300, or so. > >> >> > >> >> Very good, thanks a lot. > >> >> > >> >> Based on the above range, my idea is to not even use a LUT, but use > >> >> something like exp2fi followed by multiplication by M_SQRT2 depending > >> >> on even or odd. > >> > > >> > conditional operations can due to branch misprediction be potentially > >> > rather slow > >> > >> I think it will still be far faster than exp2f, and in the absence of > >> hard numbers, I view this a far better approach than a large (~300 > >> element) lut. Of course, the proof and extent of this will need to > >> wait for actual benches. > > > > alternatively one could do a > > if (x+A < (unsigned)B) > > LUT[x+A] > > else > > exp2whatever(x) > > > > the range in practice should be much smaller than +-300 > > That still uses a branch, so unless for whatever reason the numbers > tend to concentrate in an interval (which you believe but I am > agnostic about since I don't know AAC), this is code complexity for theres an easy way to find out, just print the numbers and use sort -n | uniq -c i didnt try but i expect most of the range to have 0 hits > little gain. Furthermore, that B then becomes a "voodoo" constant, and > performance may vary across files depending on the degree of > concentration of the inputs. I personally don't like such things > unless they are very well justified after all other easy, uniform > methods of optimization are exhausted. > > > > > also the LUT can possibly be shared between codecs > > This is something for which there is plenty of low hanging fruit > across FFmpeg. The best example I know of is the sin/cos tables used > across dct, fft, and other areas: a lot of them can be derived from > sin(2*pi*i/65536) for 0 <= i <= 65536/4, and cheap runtime derivation > via indexing and symmetry exploitation (eg 16*i, 32*i, flip the sign). > I will work on this only after other things are sorted out; so in the > meantime it is up for grabs. > > Complications come from threading issues for the case of dynamic init, > since one needs to place a lock of some sort to avoid write > contention. And --enable-hardcoded-tables does not help at all, as it > distracts from the key issues since one needs to reason about both > cases. you do not "need" a lock the most standard way to add a table is to type the numbers of its entries in the a source file and commit that. if the table is to be generated at build time then some complexity is added to do that with a generator run at build time if it is build at runtime then it no longer can be shared between processes and needs to be written to disk in case it is paged out a "hardcoded" table resides in read only memory and on page out is discarded, the kernel can read it directly from the excutable again when it gets accessed again. and one can support both runtime and buildtime tables, so the user can choose a manually hardcoded table without generator is harder to update though and it and the build generated ones result in larger object and excutable files > > Unfortunately, the question of static vs dynamic tables is something > that can easily get bikeshedded to death. I am trying to do yes, everyone has a different use case in mind and a different variant is optimal for each. In the end there is not a big difference between them i think, especially not for small tables. For big tables there could be more significant disadvantages in each solution relative to what is optimal for each others use case anyway, you could try exp2whatever(x>>1) * tab[x&1] or something that may be faster IFF the compiler builds the if(x&1) into bad code and if the values actually fluctuate enough [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB I have often repented speaking, but never of holding my tongue. -- Xenocrates signature.asc Description: Digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On 11.12.2015 17:21, Ganesh Ajjanagadde wrote: > On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun >wrote: >> On 19.11.2015 14:17, Michael Niedermayer wrote: >>> From: Michael Niedermayer >>> >>> Signed-off-by: Michael Niedermayer >>> --- >>> libavcodec/aacsbr.c |1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c >>> index d1e3a91..e014646 100644 >>> --- a/libavcodec/aacsbr.c >>> +++ b/libavcodec/aacsbr.c >>> @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, int >>> id_aac) >>> { >>> int k, e; >>> int ch; >>> +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and >>> have a small range >>> >>> if (id_aac == TYPE_CPE && sbr->bs_coupling) { >>> float alpha = sbr->data[0].bs_amp_res ? 1.0f : 0.5f; >>> >> >> This shouldn't hurt, with or without the clarification requested by Ganesh. > > I am doing related work cleaning up and optimizing usages of slow libm > functions such as pow and exp2. Do you know the exact possible range > of the inputs x, and if so, can it be added to the comment? That will > be very helpful for me to come up with a patch. Thanks. The exp2f expressions are: exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f); exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha); exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1); exp2f(12 - sbr->data[1].noise_facs_q[e][k]); exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f); exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]); Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6. After patch 3 of this series, env_facs_q is in the range from 0 to 127 and noise_facs_q is already limited to the range from 0 to 30. So x should always be in the range -300..300, or so. Best regards, Andreas ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpunwrote: > On 11.12.2015 17:21, Ganesh Ajjanagadde wrote: >> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun >> wrote: >>> On 19.11.2015 14:17, Michael Niedermayer wrote: From: Michael Niedermayer Signed-off-by: Michael Niedermayer --- libavcodec/aacsbr.c |1 + 1 file changed, 1 insertion(+) diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c index d1e3a91..e014646 100644 --- a/libavcodec/aacsbr.c +++ b/libavcodec/aacsbr.c @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, int id_aac) { int k, e; int ch; +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and have a small range if (id_aac == TYPE_CPE && sbr->bs_coupling) { float alpha = sbr->data[0].bs_amp_res ? 1.0f : 0.5f; >>> >>> This shouldn't hurt, with or without the clarification requested by Ganesh. >> >> I am doing related work cleaning up and optimizing usages of slow libm >> functions such as pow and exp2. Do you know the exact possible range >> of the inputs x, and if so, can it be added to the comment? That will >> be very helpful for me to come up with a patch. Thanks. > > The exp2f expressions are: > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f); > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha); > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1); > exp2f(12 - sbr->data[1].noise_facs_q[e][k]); > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f); > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]); > > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6. > After patch 3 of this series, env_facs_q is in the range from 0 to 127 and > noise_facs_q is already limited to the range from 0 to 30. > > So x should always be in the range -300..300, or so. Very good, thanks a lot. Based on the above range, my idea is to not even use a LUT, but use something like exp2fi followed by multiplication by M_SQRT2 depending on even or odd. This will not bloat the binary, but is still very fast and avoids huge variability in performance. That should provide a good baseline (see jpeg2000 for this idea), further tweaks can be done (e.g using an exp2i, i.e double precision to avoid possible branching for the overflow/underflow cases). Maybe exp2i and/or exp2fi could be moved to avutil/internal or more appropriately avcodec/internal as they have utility in this, jpeg2000, and at least one other place in avcodec (which I can't recall). Will be addressed in a week or so, unless someone does it before then. This is very quick to do, and so this patch may not be needed. > > Best regards, > Andreas > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On 19.11.2015 14:17, Michael Niedermayer wrote: > From: Michael Niedermayer> > Signed-off-by: Michael Niedermayer > --- > libavcodec/aacsbr.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c > index d1e3a91..e014646 100644 > --- a/libavcodec/aacsbr.c > +++ b/libavcodec/aacsbr.c > @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, int > id_aac) > { > int k, e; > int ch; > +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and > have a small range > > if (id_aac == TYPE_CPE && sbr->bs_coupling) { > float alpha = sbr->data[0].bs_amp_res ? 1.0f : 0.5f; > This shouldn't hurt, with or without the clarification requested by Ganesh. Best regards, Andreas ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On 11.12.2015 18:09, Ganesh Ajjanagadde wrote: > On Fri, Dec 11, 2015 at 11:36 AM, Andreas Cadhalpun >wrote: >> On 11.12.2015 17:21, Ganesh Ajjanagadde wrote: >>> On Fri, Dec 11, 2015 at 11:16 AM, Andreas Cadhalpun >>> wrote: On 19.11.2015 14:17, Michael Niedermayer wrote: > From: Michael Niedermayer > > Signed-off-by: Michael Niedermayer > --- > libavcodec/aacsbr.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c > index d1e3a91..e014646 100644 > --- a/libavcodec/aacsbr.c > +++ b/libavcodec/aacsbr.c > @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, > int id_aac) > { > int k, e; > int ch; > +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer > and have a small range > > if (id_aac == TYPE_CPE && sbr->bs_coupling) { > float alpha = sbr->data[0].bs_amp_res ? 1.0f : 0.5f; > This shouldn't hurt, with or without the clarification requested by Ganesh. >>> >>> I am doing related work cleaning up and optimizing usages of slow libm >>> functions such as pow and exp2. Do you know the exact possible range >>> of the inputs x, and if so, can it be added to the comment? That will >>> be very helpful for me to come up with a patch. Thanks. >> >> The exp2f expressions are: >> exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f); >> exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha); >> exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1); >> exp2f(12 - sbr->data[1].noise_facs_q[e][k]); >> exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f); >> exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]); >> >> Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6. >> After patch 3 of this series, env_facs_q is in the range from 0 to 127 and >> noise_facs_q is already limited to the range from 0 to 30. >> >> So x should always be in the range -300..300, or so. > > Very good, thanks a lot. > > Based on the above range, my idea is to not even use a LUT, but use > something like exp2fi followed by multiplication by M_SQRT2 depending > on even or odd. This will not bloat the binary, but is still very fast > and avoids huge variability in performance. That should provide a good > baseline (see jpeg2000 for this idea), further tweaks can be done (e.g > using an exp2i, i.e double precision to avoid possible branching for > the overflow/underflow cases). That sounds good. :) > Maybe exp2i and/or exp2fi could be moved to avutil/internal or more > appropriately avcodec/internal as they have utility in this, jpeg2000, > and at least one other place in avcodec (which I can't recall). Moving to avcodec/internal should be fine. > Will be addressed in a week or so, unless someone does it before then. > This is very quick to do, and so this patch may not be needed. Indeed, no need to add the comment, if it's removed a week later. Best regards, Andreas ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On Thu, Nov 19, 2015 at 8:17 AM, Michael Niedermayerwrote: > From: Michael Niedermayer > > Signed-off-by: Michael Niedermayer > --- > libavcodec/aacsbr.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c > index d1e3a91..e014646 100644 > --- a/libavcodec/aacsbr.c > +++ b/libavcodec/aacsbr.c > @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, int > id_aac) > { > int k, e; > int ch; > +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and > have a small range Entirely possible I missed something, but exp2 gives 2^x, so if inputs are integers, can't one simply do the necessary bit shifts and cast implicitly to float? I think what you meant was all inputs are half-integers or integers. > > if (id_aac == TYPE_CPE && sbr->bs_coupling) { > float alpha = sbr->data[0].bs_amp_res ? 1.0f : 0.5f; > -- > 1.7.9.5 > > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On Thu, Nov 19, 2015 at 9:18 AM, Michael Niedermayerwrote: > On Thu, Nov 19, 2015 at 09:03:03AM -0500, Ganesh Ajjanagadde wrote: >> On Thu, Nov 19, 2015 at 8:17 AM, Michael Niedermayer >> wrote: >> > From: Michael Niedermayer >> > >> > Signed-off-by: Michael Niedermayer >> > --- >> > libavcodec/aacsbr.c |1 + >> > 1 file changed, 1 insertion(+) >> > >> > diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c >> > index d1e3a91..e014646 100644 >> > --- a/libavcodec/aacsbr.c >> > +++ b/libavcodec/aacsbr.c >> > @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, >> > int id_aac) >> > { >> > int k, e; >> > int ch; >> > +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and >> > have a small range >> >> Entirely possible I missed something, but exp2 gives 2^x, so if inputs >> are integers, can't one simply do the necessary bit shifts and cast >> implicitly to float? >> I think what you meant was all inputs are half-integers or integers. > > i meant all arguments to function f(x) are integers, f(x) = exp2(0.5*x) Since we had this back and forth, suggested clarifying reword: //TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and have a small range -> //TODO: Replace exp2f(0.5*x) by a LUT, the inputs x are all integer and have a small range > > [...] > -- > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB > > Let us carefully observe those good qualities wherein our enemies excel us > and endeavor to excel them, by avoiding what is faulty, and imitating what > is excellent in them. -- Plutarch > > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()
On Thu, Nov 19, 2015 at 09:03:03AM -0500, Ganesh Ajjanagadde wrote: > On Thu, Nov 19, 2015 at 8:17 AM, Michael Niedermayerwrote: > > From: Michael Niedermayer > > > > Signed-off-by: Michael Niedermayer > > --- > > libavcodec/aacsbr.c |1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c > > index d1e3a91..e014646 100644 > > --- a/libavcodec/aacsbr.c > > +++ b/libavcodec/aacsbr.c > > @@ -73,6 +73,7 @@ static void sbr_dequant(SpectralBandReplication *sbr, int > > id_aac) > > { > > int k, e; > > int ch; > > +//TODO: Replace exp2f(0.5*x) by a LUT, the inputs are all integer and > > have a small range > > Entirely possible I missed something, but exp2 gives 2^x, so if inputs > are integers, can't one simply do the necessary bit shifts and cast > implicitly to float? > I think what you meant was all inputs are half-integers or integers. i meant all arguments to function f(x) are integers, f(x) = exp2(0.5*x) [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Let us carefully observe those good qualities wherein our enemies excel us and endeavor to excel them, by avoiding what is faulty, and imitating what is excellent in them. -- Plutarch signature.asc Description: Digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel