Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Sun, Jan 03, 2016 at 09:49:33PM -0800, Ganesh Ajjanagadde wrote:
> On Sun, Jan 3, 2016 at 1:30 PM, Carl Eugen Hoyos wrote:
> > Carl Eugen Hoyos writes:
> >
> >> Ganesh Ajjanagadde writes:
> >>
> >> > No one has told me what is interesting
> >>
> >> Did you look at tickets #4441 or #4085?
> >
> > Or ticket #4829 or a j2k issue?
>
> Thanks a lot for taking the time to point these out. They have all
> been noted. Unfortunately, I have too many things on my list now. 4829
> may be what I tackle first; it may take a while though.
>
> I hope the following is helpful.
> Generally, my strengths are in algorithmic/mathematical/numerical
> improvements.

but not interested in merging pre-filter and RLB-filter in EBU R128 like I
pointed out? :(

More seriously, maybe try to write a filter? I'm thinking of the Eulerian
magnification filter¹², which I unfortunately haven't time to work on.

You may also enjoy studying motion interpolation for many applications.

Anyway, the point is, you have a very large range of possibilities to
enjoy yourself on the project wrt image processing.

[1]: http://people.csail.mit.edu/mrub/vidmag/
[2]: https://www.youtube.com/watch?v=ONZcjs1Pjmk

> I have a strong interest in security (both its
> "practical" and "theoretical" variants), but with nowhere near the
> same level of knowledge.
>
> Clarifications: by algorithmic improvements, I do not usually count
> asm code, but make exceptions in some cases. In particular, I have
> minimal knowledge of assembly and minimal motivation in learning it.
> However, I may examine at some point cases where I am convinced that a
> compiler can't do the needful.
> By theoretical aspects of security, I refer to defensive programming,
> some forms of undefined behavior (e.g rint64_clip, many ubsan
> failures), and other such things such as those flagged by Coverity. By
> practical aspects of security, I refer to things like fuzzing crashes,
> other ubsan failures, and other things flagged by Coverity or found by
> audit.

Well, I have a challenging suggestion then... How about looking at
threading? Look for Helgrind (or DRD) on http://fate.ffmpeg.org. I know
many of the reports are false positives... but are they? Do we have real
issues spotted here? You might want to study why we have so many of them,
whether int reads/writes really are atomic on all platforms, ... that
sort of stuff :)

--
Clément B.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Sun, Jan 03, 2016 at 09:11:28PM -0800, Ganesh Ajjanagadde wrote:
> On Sun, Jan 3, 2016 at 7:32 PM, Michael Niedermayer wrote:
> > On Mon, Jan 04, 2016 at 04:04:02AM +0100, Michael Niedermayer wrote:
> >> On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
> >> > This gets rid of some branches to speed up table generation slightly
> >> > (impact higher on mulaw than alaw). Tables are identical to before,
> >> > tested with FATE.
> >> >
> >> > Sample benchmark (Haswell, GNU/Linux+gcc):
> >> > old:
> >> > 313494 decicycles in build_alaw_table,4094 runs, 2 skips
> >> > 315959 decicycles in build_alaw_table,8190 runs, 2 skips
> >> >
> >> > 323599 decicycles in build_ulaw_table,4095 runs, 1 skips
> >> > 318849 decicycles in build_ulaw_table,8188 runs, 4 skips
> >> >
> >> > new:
> >> > 261902 decicycles in build_alaw_table,4096 runs, 0 skips
> >> > 266519 decicycles in build_alaw_table,8192 runs, 0 skips
> >> >
> >> > 209657 decicycles in build_ulaw_table,4096 runs, 0 skips
> >> > 232656 decicycles in build_ulaw_table,8192 runs, 0 skips
> >> >
> >> > Signed-off-by: Ganesh Ajjanagadde
> >> > ---
> >> >  libavcodec/pcm_tablegen.h | 24
> >> >  1 file changed, 12 insertions(+), 12 deletions(-)
> >> >
> >> > diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
> >> > index 1387210..7269977 100644
> >> > --- a/libavcodec/pcm_tablegen.h
> >> > +++ b/libavcodec/pcm_tablegen.h
> >> > @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t
> >> > *linear_to_xlaw,
> >> >  {
> >> >      int i, j, v, v1, v2;
> >> >
> >> > -    j = 0;
> >> > -    for(i=0;i<128;i++) {
> >> > -        if (i != 127) {
> >> > -            v1 = xlaw2linear(i ^ mask);
> >> > -            v2 = xlaw2linear((i + 1) ^ mask);
> >> > -            v = (v1 + v2 + 4) >> 3;
> >> > -        } else {
> >> > -            v = 8192;
> >> > -        }
> >> > -        for(;j<v;j++) {
> >> > +    j = 1;
> >> > +    linear_to_xlaw[8192] = mask;
> >> > +    for(i=0;i<127;i++) {
> >> > +        v1 = xlaw2linear(i ^ mask);
> >> > +        v2 = xlaw2linear((i + 1) ^ mask);
> >> > +        v = (v1 + v2 + 4) >> 3;
> >> > +        for(;j<v;j++) {
> >> > +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
> >> >              linear_to_xlaw[8192 + j] = (i ^ mask);
> >> > -            if (j > 0)
> >> > -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
> >> >          }
> >> >      }
> >> > +    for(;j<8192;j++) {
> >> > +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
> >> > +        linear_to_xlaw[8192 + j] = (127 ^ mask);
> >> > +    }
> >> >      linear_to_xlaw[0] = linear_to_xlaw[1];
> >>
> >> i think you can make the tables 8 times smaller
> >
> > forget this, i should have checked the whole table or looked when i
> > am awake ...
>
> ha ha. By the way, both changes are needed to get this level of
> speedup, with only the j change which you acked, the speedup is much
> smaller. But then also note that the other parts of the patch also
> increase the binary size more.

hmm, ok if it's needed to get the speedup then LGTM

thanks

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Mon, Jan 4, 2016 at 2:45 AM, Michael Niedermayer wrote:
> On Sun, Jan 03, 2016 at 09:11:28PM -0800, Ganesh Ajjanagadde wrote:
>> On Sun, Jan 3, 2016 at 7:32 PM, Michael Niedermayer wrote:
>> > On Mon, Jan 04, 2016 at 04:04:02AM +0100, Michael Niedermayer wrote:
>> >> On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
>> >> > This gets rid of some branches to speed up table generation slightly
>> >> > (impact higher on mulaw than alaw). Tables are identical to before,
>> >> > tested with FATE.
>> >> >
>> >> > Sample benchmark (Haswell, GNU/Linux+gcc):
>> >> > old:
>> >> > 313494 decicycles in build_alaw_table,4094 runs, 2 skips
>> >> > 315959 decicycles in build_alaw_table,8190 runs, 2 skips
>> >> >
>> >> > 323599 decicycles in build_ulaw_table,4095 runs, 1 skips
>> >> > 318849 decicycles in build_ulaw_table,8188 runs, 4 skips
>> >> >
>> >> > new:
>> >> > 261902 decicycles in build_alaw_table,4096 runs, 0 skips
>> >> > 266519 decicycles in build_alaw_table,8192 runs, 0 skips
>> >> >
>> >> > 209657 decicycles in build_ulaw_table,4096 runs, 0 skips
>> >> > 232656 decicycles in build_ulaw_table,8192 runs, 0 skips
>> >> >
>> >> > Signed-off-by: Ganesh Ajjanagadde
>> >> > ---
>> >> >  libavcodec/pcm_tablegen.h | 24
>> >> >  1 file changed, 12 insertions(+), 12 deletions(-)
>> >> >
>> >> > diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
>> >> > index 1387210..7269977 100644
>> >> > --- a/libavcodec/pcm_tablegen.h
>> >> > +++ b/libavcodec/pcm_tablegen.h
>> >> > @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t
>> >> > *linear_to_xlaw,
>> >> >  {
>> >> >      int i, j, v, v1, v2;
>> >> >
>> >> > -    j = 0;
>> >> > -    for(i=0;i<128;i++) {
>> >> > -        if (i != 127) {
>> >> > -            v1 = xlaw2linear(i ^ mask);
>> >> > -            v2 = xlaw2linear((i + 1) ^ mask);
>> >> > -            v = (v1 + v2 + 4) >> 3;
>> >> > -        } else {
>> >> > -            v = 8192;
>> >> > -        }
>> >> > -        for(;j<v;j++) {
>> >> > +    j = 1;
>> >> > +    linear_to_xlaw[8192] = mask;
>> >> > +    for(i=0;i<127;i++) {
>> >> > +        v1 = xlaw2linear(i ^ mask);
>> >> > +        v2 = xlaw2linear((i + 1) ^ mask);
>> >> > +        v = (v1 + v2 + 4) >> 3;
>> >> > +        for(;j<v;j++) {
>> >> > +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>> >> >              linear_to_xlaw[8192 + j] = (i ^ mask);
>> >> > -            if (j > 0)
>> >> > -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>> >> >          }
>> >> >      }
>> >> > +    for(;j<8192;j++) {
>> >> > +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
>> >> > +        linear_to_xlaw[8192 + j] = (127 ^ mask);
>> >> > +    }
>> >> >      linear_to_xlaw[0] = linear_to_xlaw[1];
>> >>
>> >> i think you can make the tables 8 times smaller
>> >
>> > forget this, i should have checked the whole table or looked when i
>> > am awake ...
>>
>> ha ha. By the way, both changes are needed to get this level of
>> speedup, with only the j change which you acked, the speedup is much
>> smaller. But then also note that the other parts of the patch also
>> increase the binary size more.
>
> hmm, ok if it's needed to get the speedup then LGTM
>
> thanks

pushed, thanks

>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Concerning the gods, I have no means of knowing whether they exist or not
> or of what sort they may be, because of the obscurity of the subject, and
> the brevity of human life -- Protagoras
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
Hi Ganesh,

On 04.01.2016 06:49, Ganesh Ajjanagadde wrote:
> Generally, my strengths are in algorithmic/mathematical/numerical
> improvements. I have a strong interest in security (both its
> "practical" and "theoretical" variants), but with nowhere near the
> same level of knowledge.
>
> Clarifications: by algorithmic improvements, I do not usually count
> asm code, but make exceptions in some cases. In particular, I have
> minimal knowledge of assembly and minimal motivation in learning it.
> However, I may examine at some point cases where I am convinced that a
> compiler can't do the needful.
> By theoretical aspects of security, I refer to defensive programming,
> some forms of undefined behavior (e.g rint64_clip, many ubsan
> failures), and other such things such as those flagged by Coverity. By
> practical aspects of security, I refer to things like fuzzing crashes,
> other ubsan failures, and other things flagged by Coverity or found by
> audit.

If you're interested in fixing some forms of undefined behavior (which?)
I can send you a couple of samples triggering them.
I have far too many of these to be able to fix them any time soon.

Best regards,
Andreas
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Mon, Jan 4, 2016 at 1:29 AM, Clément Bœsch wrote:
> On Sun, Jan 03, 2016 at 09:49:33PM -0800, Ganesh Ajjanagadde wrote:
>> On Sun, Jan 3, 2016 at 1:30 PM, Carl Eugen Hoyos wrote:
>> > Carl Eugen Hoyos writes:
>> >
>> >> Ganesh Ajjanagadde writes:
>> >>
>> >> > No one has told me what is interesting
>> >>
>> >> Did you look at tickets #4441 or #4085?
>> >
>> > Or ticket #4829 or a j2k issue?
>>
>> Thanks a lot for taking the time to point these out. They have all
>> been noted. Unfortunately, I have too many things on my list now. 4829
>> may be what I tackle first; it may take a while though.
>>
>> I hope the following is helpful.
>> Generally, my strengths are in algorithmic/mathematical/numerical
>> improvements.
>
> but not interested in merging pre-filter and RLB-filter in EBU R128 like I
> pointed out? :(

That was a long time back, completely forgotten and never noted down
explicitly. Thanks for reminding me :).

>
> More seriously, maybe try to write a filter? I'm thinking of the Eulerian
> magnification filter¹², which I unfortunately haven't time to work on.

I recall seeing this at MIT a year back, noted. Thanks.

>
> You may also enjoy studying motion interpolation for many applications.
>
> Anyway, the point is, you have a very large range of possibilities to
> enjoy yourself on the project wrt image processing.
>
> [1]: http://people.csail.mit.edu/mrub/vidmag/
> [2]: https://www.youtube.com/watch?v=ONZcjs1Pjmk
>
>> I have a strong interest in security (both its
>> "practical" and "theoretical" variants), but with nowhere near the
>> same level of knowledge.
>>
>> Clarifications: by algorithmic improvements, I do not usually count
>> asm code, but make exceptions in some cases. In particular, I have
>> minimal knowledge of assembly and minimal motivation in learning it.
>> However, I may examine at some point cases where I am convinced that a
>> compiler can't do the needful.
>> By theoretical aspects of security, I refer to defensive programming,
>> some forms of undefined behavior (e.g rint64_clip, many ubsan
>> failures), and other such things such as those flagged by Coverity. By
>> practical aspects of security, I refer to things like fuzzing crashes,
>> other ubsan failures, and other things flagged by Coverity or found by
>> audit.
>
> Well, I have a challenging suggestion then... How about looking at
> threading? Look for Helgrind (or DRD) on http://fate.ffmpeg.org. I know
> many of the reports are false positives... but are they? Do we have real
> issues spotted here? You might want to study why we have so many of them,
> whether int reads/writes really are atomic on all platforms, ... that
> sort of stuff :)

Ah, threading - this was always a pain, and I deliberately did not
study it as well as I should have during my undergrad years. This will
unfortunately go far down my list, as I need to study it and find
the necessary time.

@all: thanks very much for the suggestions.

>
> --
> Clément B.
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
> This gets rid of some branches to speed up table generation slightly
> (impact higher on mulaw than alaw). Tables are identical to before,
> tested with FATE.
>
> Sample benchmark (Haswell, GNU/Linux+gcc):
> old:
> 313494 decicycles in build_alaw_table,4094 runs, 2 skips
> 315959 decicycles in build_alaw_table,8190 runs, 2 skips
>
> 323599 decicycles in build_ulaw_table,4095 runs, 1 skips
> 318849 decicycles in build_ulaw_table,8188 runs, 4 skips
>
> new:
> 261902 decicycles in build_alaw_table,4096 runs, 0 skips
> 266519 decicycles in build_alaw_table,8192 runs, 0 skips
>
> 209657 decicycles in build_ulaw_table,4096 runs, 0 skips
> 232656 decicycles in build_ulaw_table,8192 runs, 0 skips
>
> Signed-off-by: Ganesh Ajjanagadde
> ---
>  libavcodec/pcm_tablegen.h | 24
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
> index 1387210..7269977 100644
> --- a/libavcodec/pcm_tablegen.h
> +++ b/libavcodec/pcm_tablegen.h
> @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t
> *linear_to_xlaw,
>  {
>      int i, j, v, v1, v2;
>
> -    j = 0;
> -    for(i=0;i<128;i++) {
> -        if (i != 127) {
> -            v1 = xlaw2linear(i ^ mask);
> -            v2 = xlaw2linear((i + 1) ^ mask);
> -            v = (v1 + v2 + 4) >> 3;
> -        } else {
> -            v = 8192;
> -        }
> -        for(;j<v;j++) {
> +    j = 1;
> +    linear_to_xlaw[8192] = mask;
> +    for(i=0;i<127;i++) {
> +        v1 = xlaw2linear(i ^ mask);
> +        v2 = xlaw2linear((i + 1) ^ mask);
> +        v = (v1 + v2 + 4) >> 3;
> +        for(;j<v;j++) {
> +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>              linear_to_xlaw[8192 + j] = (i ^ mask);
> -            if (j > 0)
> -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>          }
>      }
> +    for(;j<8192;j++) {
> +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
> +        linear_to_xlaw[8192 + j] = (127 ^ mask);
> +    }
>      linear_to_xlaw[0] = linear_to_xlaw[1];

i think you can make the tables 8 times smaller
the points in the table where values transition seemed to be always
a multiple of 8 apart
so just adjusting the offset in pcm_encode_frame() would allow
decreasing the >> 2 to >> 5
if that works out it would make the table generation 8 times faster,
reduce memory needed and speed up the code runtime due to lower
pressure on L1/L2 caches

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

During times of universal deceit, telling the truth becomes a
revolutionary act. -- George Orwell
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
Carl Eugen Hoyos writes:

> Ganesh Ajjanagadde writes:
>
> > No one has told me what is interesting
>
> Did you look at tickets #4441 or #4085?

Or ticket #4829 or a j2k issue?

Carl Eugen
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
Ganesh Ajjanagadde writes:

> No one has told me what is interesting

Did you look at tickets #4441 or #4085?
(Careful about the license for the second one.)

But you can only decide for yourself what you find interesting...

Carl Eugen
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Sun, Jan 3, 2016 at 9:35 AM, Kieran Kunhya wrote:
>> It is still "speed critical" enough for people to retain
>> CONFIG_HARDCODED_TABLES. My goal here is simple: I want to get cycle
>> count down enough so that hardcoded tables can be removed here.
>
> How are you going to guarantee this across all arches?

First of all, I have no idea what "guarantee" you are looking for
here. What I guarantee is this: on every architecture, this change
results in speedup (not to a mathematical level of guarantee, but to a
practical level of guarantee). What I don't know is at what point one
considers hardcoded tables, associated ifdefry, etc useful or not.
That is inherently subjective. Same goes for code "readability" - also
subjective. I also don't know the relative gains across architectures.

What I do know is that there are clear inconsistencies in opinions
here regarding this (e.g not "speed critical" yet needs a configure
option), and I want to understand the heart of the matter. The thread
I created is a step towards a scientific understanding of the actual
impact of hardcoded tables, something that seems to not have been done
in the past, when it should have been done at the time of
introduction.

Furthermore, since when has FFmpeg "guaranteed" speedups across all
arches? Impacts of even a single line change can vary across arches.
Does that mean that all patches under review get benched on each and
every architecture that we support? Of course not.

[...]
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Mon, Jan 04, 2016 at 04:04:02AM +0100, Michael Niedermayer wrote:
> On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
> > This gets rid of some branches to speed up table generation slightly
> > (impact higher on mulaw than alaw). Tables are identical to before,
> > tested with FATE.
> >
> > Sample benchmark (Haswell, GNU/Linux+gcc):
> > old:
> > 313494 decicycles in build_alaw_table,4094 runs, 2 skips
> > 315959 decicycles in build_alaw_table,8190 runs, 2 skips
> >
> > 323599 decicycles in build_ulaw_table,4095 runs, 1 skips
> > 318849 decicycles in build_ulaw_table,8188 runs, 4 skips
> >
> > new:
> > 261902 decicycles in build_alaw_table,4096 runs, 0 skips
> > 266519 decicycles in build_alaw_table,8192 runs, 0 skips
> >
> > 209657 decicycles in build_ulaw_table,4096 runs, 0 skips
> > 232656 decicycles in build_ulaw_table,8192 runs, 0 skips
> >
> > Signed-off-by: Ganesh Ajjanagadde
> > ---
> >  libavcodec/pcm_tablegen.h | 24
> >  1 file changed, 12 insertions(+), 12 deletions(-)
> >
> > diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
> > index 1387210..7269977 100644
> > --- a/libavcodec/pcm_tablegen.h
> > +++ b/libavcodec/pcm_tablegen.h
> > @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t
> > *linear_to_xlaw,
> >  {
> >      int i, j, v, v1, v2;
> >
> > -    j = 0;
> > -    for(i=0;i<128;i++) {
> > -        if (i != 127) {
> > -            v1 = xlaw2linear(i ^ mask);
> > -            v2 = xlaw2linear((i + 1) ^ mask);
> > -            v = (v1 + v2 + 4) >> 3;
> > -        } else {
> > -            v = 8192;
> > -        }
> > -        for(;j<v;j++) {
> > +    j = 1;
> > +    linear_to_xlaw[8192] = mask;
> > +    for(i=0;i<127;i++) {
> > +        v1 = xlaw2linear(i ^ mask);
> > +        v2 = xlaw2linear((i + 1) ^ mask);
> > +        v = (v1 + v2 + 4) >> 3;
> > +        for(;j<v;j++) {
> > +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
> >              linear_to_xlaw[8192 + j] = (i ^ mask);
> > -            if (j > 0)
> > -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
> >          }
> >      }
> > +    for(;j<8192;j++) {
> > +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
> > +        linear_to_xlaw[8192 + j] = (127 ^ mask);
> > +    }
> >      linear_to_xlaw[0] = linear_to_xlaw[1];
>
> i think you can make the tables 8 times smaller

forget this, i should have checked the whole table or looked when i
am awake ...

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Sun, Jan 3, 2016 at 1:30 PM, Carl Eugen Hoyos wrote:
> Carl Eugen Hoyos writes:
>
>> Ganesh Ajjanagadde writes:
>>
>> > No one has told me what is interesting
>>
>> Did you look at tickets #4441 or #4085?
>
> Or ticket #4829 or a j2k issue?

Thanks a lot for taking the time to point these out. They have all
been noted. Unfortunately, I have too many things on my list now. 4829
may be what I tackle first; it may take a while though.

I hope the following is helpful.
Generally, my strengths are in algorithmic/mathematical/numerical
improvements. I have a strong interest in security (both its
"practical" and "theoretical" variants), but with nowhere near the
same level of knowledge.

Clarifications: by algorithmic improvements, I do not usually count
asm code, but make exceptions in some cases. In particular, I have
minimal knowledge of assembly and minimal motivation in learning it.
However, I may examine at some point cases where I am convinced that a
compiler can't do the needful.
By theoretical aspects of security, I refer to defensive programming,
some forms of undefined behavior (e.g rint64_clip, many ubsan
failures), and other such things such as those flagged by Coverity. By
practical aspects of security, I refer to things like fuzzing crashes,
other ubsan failures, and other things flagged by Coverity or found by
audit.

>
> Carl Eugen
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Sun, Jan 3, 2016 at 7:32 PM, Michael Niedermayer wrote:
> On Mon, Jan 04, 2016 at 04:04:02AM +0100, Michael Niedermayer wrote:
>> On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
>> > This gets rid of some branches to speed up table generation slightly
>> > (impact higher on mulaw than alaw). Tables are identical to before,
>> > tested with FATE.
>> >
>> > Sample benchmark (Haswell, GNU/Linux+gcc):
>> > old:
>> > 313494 decicycles in build_alaw_table,4094 runs, 2 skips
>> > 315959 decicycles in build_alaw_table,8190 runs, 2 skips
>> >
>> > 323599 decicycles in build_ulaw_table,4095 runs, 1 skips
>> > 318849 decicycles in build_ulaw_table,8188 runs, 4 skips
>> >
>> > new:
>> > 261902 decicycles in build_alaw_table,4096 runs, 0 skips
>> > 266519 decicycles in build_alaw_table,8192 runs, 0 skips
>> >
>> > 209657 decicycles in build_ulaw_table,4096 runs, 0 skips
>> > 232656 decicycles in build_ulaw_table,8192 runs, 0 skips
>> >
>> > Signed-off-by: Ganesh Ajjanagadde
>> > ---
>> >  libavcodec/pcm_tablegen.h | 24
>> >  1 file changed, 12 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
>> > index 1387210..7269977 100644
>> > --- a/libavcodec/pcm_tablegen.h
>> > +++ b/libavcodec/pcm_tablegen.h
>> > @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t
>> > *linear_to_xlaw,
>> >  {
>> >      int i, j, v, v1, v2;
>> >
>> > -    j = 0;
>> > -    for(i=0;i<128;i++) {
>> > -        if (i != 127) {
>> > -            v1 = xlaw2linear(i ^ mask);
>> > -            v2 = xlaw2linear((i + 1) ^ mask);
>> > -            v = (v1 + v2 + 4) >> 3;
>> > -        } else {
>> > -            v = 8192;
>> > -        }
>> > -        for(;j<v;j++) {
>> > +    j = 1;
>> > +    linear_to_xlaw[8192] = mask;
>> > +    for(i=0;i<127;i++) {
>> > +        v1 = xlaw2linear(i ^ mask);
>> > +        v2 = xlaw2linear((i + 1) ^ mask);
>> > +        v = (v1 + v2 + 4) >> 3;
>> > +        for(;j<v;j++) {
>> > +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>> >              linear_to_xlaw[8192 + j] = (i ^ mask);
>> > -            if (j > 0)
>> > -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>> >          }
>> >      }
>> > +    for(;j<8192;j++) {
>> > +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
>> > +        linear_to_xlaw[8192 + j] = (127 ^ mask);
>> > +    }
>> >      linear_to_xlaw[0] = linear_to_xlaw[1];
>>
>> i think you can make the tables 8 times smaller
>
> forget this, i should have checked the whole table or looked when i
> am awake ...

ha ha. By the way, both changes are needed to get this level of
speedup, with only the j change which you acked, the speedup is much
smaller. But then also note that the other parts of the patch also
increase the binary size more.

>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> I do not agree with what you have to say, but I'll defend to the death your
> right to say it. -- Voltaire
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Sun, Jan 3, 2016 at 6:13 AM, Michael Niedermayer wrote:
> On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
>> This gets rid of some branches to speed up table generation slightly
>> (impact higher on mulaw than alaw). Tables are identical to before,
>> tested with FATE.
>>
>> Sample benchmark (Haswell, GNU/Linux+gcc):
>> old:
>> 313494 decicycles in build_alaw_table,4094 runs, 2 skips
>> 315959 decicycles in build_alaw_table,8190 runs, 2 skips
>>
>> 323599 decicycles in build_ulaw_table,4095 runs, 1 skips
>> 318849 decicycles in build_ulaw_table,8188 runs, 4 skips
>>
>> new:
>> 261902 decicycles in build_alaw_table,4096 runs, 0 skips
>> 266519 decicycles in build_alaw_table,8192 runs, 0 skips
>>
>> 209657 decicycles in build_ulaw_table,4096 runs, 0 skips
>> 232656 decicycles in build_ulaw_table,8192 runs, 0 skips
>>
>> Signed-off-by: Ganesh Ajjanagadde
>> ---
>>  libavcodec/pcm_tablegen.h | 24
>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
>> index 1387210..7269977 100644
>> --- a/libavcodec/pcm_tablegen.h
>> +++ b/libavcodec/pcm_tablegen.h
>> @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t
>> *linear_to_xlaw,
>>  {
>>      int i, j, v, v1, v2;
>>
>> -    j = 0;
>> -    for(i=0;i<128;i++) {
>> -        if (i != 127) {
>> -            v1 = xlaw2linear(i ^ mask);
>> -            v2 = xlaw2linear((i + 1) ^ mask);
>> -            v = (v1 + v2 + 4) >> 3;
>> -        } else {
>> -            v = 8192;
>> -        }
>> -        for(;j<v;j++) {
>> +    j = 1;
>> +    linear_to_xlaw[8192] = mask;
>> +    for(i=0;i<127;i++) {
>> +        v1 = xlaw2linear(i ^ mask);
>> +        v2 = xlaw2linear((i + 1) ^ mask);
>> +        v = (v1 + v2 + 4) >> 3;
>> +        for(;j<v;j++) {
>> +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>>              linear_to_xlaw[8192 + j] = (i ^ mask);
>> -            if (j > 0)
>> -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>>          }
>>      }
>> +    for(;j<8192;j++) {
>> +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
>> +        linear_to_xlaw[8192 + j] = (127 ^ mask);
>> +    }
>
> removing the if(j>0) and replacing it by the direct init before
> is fine.
> do the other changes have any significant speed effect ?
> i think they make the code harder to read and this is not really
> speed critical code

It is still "speed critical" enough for people to retain
CONFIG_HARDCODED_TABLES. My goal here is simple: I want to get cycle
count down enough so that hardcoded tables can be removed here.

If patch 2 is fine as is, i.e if the current code is fast enough, then
I will just commit with the removal of if(j > 0).

>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Avoid a single point of failure, be that a person or equipment.
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
> This gets rid of some branches to speed up table generation slightly
> (impact higher on mulaw than alaw). Tables are identical to before,
> tested with FATE.
>
> Sample benchmark (Haswell, GNU/Linux+gcc):
> old:
> 313494 decicycles in build_alaw_table,4094 runs, 2 skips
> 315959 decicycles in build_alaw_table,8190 runs, 2 skips
>
> 323599 decicycles in build_ulaw_table,4095 runs, 1 skips
> 318849 decicycles in build_ulaw_table,8188 runs, 4 skips
>
> new:
> 261902 decicycles in build_alaw_table,4096 runs, 0 skips
> 266519 decicycles in build_alaw_table,8192 runs, 0 skips
>
> 209657 decicycles in build_ulaw_table,4096 runs, 0 skips
> 232656 decicycles in build_ulaw_table,8192 runs, 0 skips
>
> Signed-off-by: Ganesh Ajjanagadde
> ---
>  libavcodec/pcm_tablegen.h | 24
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
> index 1387210..7269977 100644
> --- a/libavcodec/pcm_tablegen.h
> +++ b/libavcodec/pcm_tablegen.h
> @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t
> *linear_to_xlaw,
>  {
>      int i, j, v, v1, v2;
>
> -    j = 0;
> -    for(i=0;i<128;i++) {
> -        if (i != 127) {
> -            v1 = xlaw2linear(i ^ mask);
> -            v2 = xlaw2linear((i + 1) ^ mask);
> -            v = (v1 + v2 + 4) >> 3;
> -        } else {
> -            v = 8192;
> -        }
> -        for(;j<v;j++) {
> +    j = 1;
> +    linear_to_xlaw[8192] = mask;
> +    for(i=0;i<127;i++) {
> +        v1 = xlaw2linear(i ^ mask);
> +        v2 = xlaw2linear((i + 1) ^ mask);
> +        v = (v1 + v2 + 4) >> 3;
> +        for(;j<v;j++) {
> +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>              linear_to_xlaw[8192 + j] = (i ^ mask);
> -            if (j > 0)
> -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>          }
>      }
> +    for(;j<8192;j++) {
> +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
> +        linear_to_xlaw[8192 + j] = (127 ^ mask);
> +    }

removing the if(j>0) and replacing it by the direct init before
is fine.
do the other changes have any significant speed effect ?
i think they make the code harder to read and this is not really
speed critical code

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Avoid a single point of failure, be that a person or equipment.
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
> It is still "speed critical" enough for people to retain
> CONFIG_HARDCODED_TABLES. My goal here is simple: I want to get cycle
> count down enough so that hardcoded tables can be removed here.

How are you going to guarantee this across all arches?

Whilst by all means feel free to work on what you want, there are way
more interesting things out there.

Kieran
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
Hi,

On Sun, Jan 3, 2016 at 11:21 AM, Ganesh Ajjanagadde wrote:
> It is still "speed critical" enough for people to retain
> CONFIG_HARDCODED_TABLES. My goal here is simple: I want to get cycle
> count down enough so that hardcoded tables can be removed here.

Can you explain why? Does CONFIG_HARDCODED_TABLES hurt your eyes? Or is it
morally corrupt? Or something else?

Ronald
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Sun, Jan 3, 2016 at 9:35 AM, Kieran Kunhya wrote:
>> It is still "speed critical" enough for people to retain
>> CONFIG_HARDCODED_TABLES. My goal here is simple: I want to get cycle
>> count down enough so that hardcoded tables can be removed here.
>
> How are you going to guarantee this across all arches?

I don't. But what really matters is the static vs runtime cost, see
e.g. the thread I created. The ratio will be far more similar across
arches.

> Whilst by all means feel free to work on what you want, there are way
> more interesting things out there.

No one has told me what is interesting, and in the last 6 months I
have not seen a commit that I find interesting either, to get an idea of
what can be done for the project. This is nothing against the authors,
who are all fantastic people, just my opinion.

I am here to serve the project, not because I find it "interesting",
but because it lacks manpower, and I find its goals worthy. This
philosophy has already been mentioned:
https://ffmpeg.org/pipermail/ffmpeg-devel/2015-October/182508.html

>
> Kieran
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Sun, Jan 3, 2016 at 9:43 AM, Ronald S. Bultje wrote:
> Hi,
>
> On Sun, Jan 3, 2016 at 11:21 AM, Ganesh Ajjanagadde wrote:
>
>> It is still "speed critical" enough for people to retain
>> CONFIG_HARDCODED_TABLES. My goal here is simple: I want to get cycle
>> count down enough so that hardcoded tables can be removed here.
>
> Can you explain why? Does CONFIG_HARDCODED_TABLES hurt your eyes? Or is it
> morally corrupt? Or something else?

Please refrain from hyperbole, it has nothing to do with my eyes or
"moral corruption".

More seriously, I have mentioned this already: wm4 said it is a worthy
goal. wm4, being a lead of mpv (a main client of FFmpeg), is someone
whose opinion I take seriously and think hard about, even if I don't
agree with it personally in some cases.

Many things I did in the past were not liked by many here, and are
still not liked by many, going by recent IRC logs. I wanted to find
common ground, and here was something where I actually agreed with
wm4 even from my own convictions.

Again, this goes back to what I said: I do things not because I find
them interesting, but because someone whose needs are greater than mine
benefits from them.

More generally, I find something very inconsistent here: table
generation is claimed to not be "speed-critical", yet there are a few
people here who still think it is "critical enough" to justify
retaining hardcoded tables, and the associated complexity of the
configure/build system.

>
> Ronald
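The "static vs runtime cost" trade-off being argued over can be illustrated with a toy table. This is only a sketch of the general idea: `DEMO_HARDCODED_TABLES`, `demo_square_table`, `demo_init_table` and `demo_lookup` are hypothetical names, and the real CONFIG_HARDCODED_TABLES mechanism (generating the table at build time) is more involved than a hand-written initializer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compile-time switch standing in for CONFIG_HARDCODED_TABLES. */
#ifndef DEMO_HARDCODED_TABLES
#define DEMO_HARDCODED_TABLES 0
#endif

#if DEMO_HARDCODED_TABLES
/* Hardcoded variant: larger binary, no work at startup. */
static const uint8_t demo_square_table[8] = { 0, 1, 4, 9, 16, 25, 36, 49 };

static void demo_init_table(void)
{
    /* nothing to do, the table is already in the read-only data */
}
#else
/* Runtime variant: smaller binary, the table is filled once before use. */
static uint8_t demo_square_table[8];

static void demo_init_table(void)
{
    for (int i = 0; i < 8; i++)
        demo_square_table[i] = (uint8_t)(i * i);
}
#endif

/* Callers see the same interface whichever variant was compiled in. */
static int demo_lookup(int i)
{
    assert(i >= 0 && i < 8);
    return demo_square_table[i];
}
```

Either way the caller runs `demo_init_table()` once and then uses `demo_lookup()`. The faster the runtime fill, the weaker the case for keeping the second code path and its build-system support, which is the argument made above.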
Re: [FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
On Wed, Dec 30, 2015 at 8:34 PM, Ganesh Ajjanagadde wrote:
> This gets rid of some branches to speed up table generation slightly
> (impact higher on mulaw than alaw). Tables are identical to before,
> tested with FATE.
>
> Sample benchmark (Haswell, GNU/Linux+gcc):
> old:
> 313494 decicycles in build_alaw_table,4094 runs, 2 skips
> 315959 decicycles in build_alaw_table,8190 runs, 2 skips
>
> 323599 decicycles in build_ulaw_table,4095 runs, 1 skips
> 318849 decicycles in build_ulaw_table,8188 runs, 4 skips
>
> new:
> 261902 decicycles in build_alaw_table,4096 runs, 0 skips
> 266519 decicycles in build_alaw_table,8192 runs, 0 skips
>
> 209657 decicycles in build_ulaw_table,4096 runs, 0 skips
> 232656 decicycles in build_ulaw_table,8192 runs, 0 skips
>
> Signed-off-by: Ganesh Ajjanagadde
> ---
>  libavcodec/pcm_tablegen.h | 24
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
> index 1387210..7269977 100644
> --- a/libavcodec/pcm_tablegen.h
> +++ b/libavcodec/pcm_tablegen.h
> @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t *linear_to_xlaw,
>  {
>      int i, j, v, v1, v2;
>
> -    j = 0;
> -    for(i=0;i<128;i++) {
> -        if (i != 127) {
> -            v1 = xlaw2linear(i ^ mask);
> -            v2 = xlaw2linear((i + 1) ^ mask);
> -            v = (v1 + v2 + 4) >> 3;
> -        } else {
> -            v = 8192;
> -        }
> -        for(;j<v;j++) {
> +    j = 1;
> +    linear_to_xlaw[8192] = mask;
> +    for(i=0;i<127;i++) {
> +        v1 = xlaw2linear(i ^ mask);
> +        v2 = xlaw2linear((i + 1) ^ mask);
> +        v = (v1 + v2 + 4) >> 3;
> +        for(;j<v;j+=1) {
> +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>              linear_to_xlaw[8192 + j] = (i ^ mask);
> -            if (j > 0)
> -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>          }
>      }
> +    for(;j<8192;j++) {
> +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
> +        linear_to_xlaw[8192 + j] = (127 ^ mask);
> +    }
>      linear_to_xlaw[0] = linear_to_xlaw[1];
>  }
>
> --
> 2.6.4
>

ping
[FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
This gets rid of some branches to speed up table generation slightly
(impact higher on mulaw than alaw). Tables are identical to before,
tested with FATE.

Sample benchmark (Haswell, GNU/Linux+gcc):
old:
313494 decicycles in build_alaw_table,4094 runs, 2 skips
315959 decicycles in build_alaw_table,8190 runs, 2 skips

323599 decicycles in build_ulaw_table,4095 runs, 1 skips
318849 decicycles in build_ulaw_table,8188 runs, 4 skips

new:
261902 decicycles in build_alaw_table,4096 runs, 0 skips
266519 decicycles in build_alaw_table,8192 runs, 0 skips

209657 decicycles in build_ulaw_table,4096 runs, 0 skips
232656 decicycles in build_ulaw_table,8192 runs, 0 skips

Signed-off-by: Ganesh Ajjanagadde
---
 libavcodec/pcm_tablegen.h | 24
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
index 1387210..7269977 100644
--- a/libavcodec/pcm_tablegen.h
+++ b/libavcodec/pcm_tablegen.h
@@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t *linear_to_xlaw,
 {
     int i, j, v, v1, v2;

-    j = 0;
-    for(i=0;i<128;i++) {
-        if (i != 127) {
-            v1 = xlaw2linear(i ^ mask);
-            v2 = xlaw2linear((i + 1) ^ mask);
-            v = (v1 + v2 + 4) >> 3;
-        } else {
-            v = 8192;
-        }
-        for(;j<v;j++) {
+    j = 1;
+    linear_to_xlaw[8192] = mask;
+    for(i=0;i<127;i++) {
+        v1 = xlaw2linear(i ^ mask);
+        v2 = xlaw2linear((i + 1) ^ mask);
+        v = (v1 + v2 + 4) >> 3;
+        for(;j<v;j+=1) {
+            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
             linear_to_xlaw[8192 + j] = (i ^ mask);
-            if (j > 0)
-                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
         }
     }
+    for(;j<8192;j++) {
+        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
+        linear_to_xlaw[8192 + j] = (127 ^ mask);
+    }
     linear_to_xlaw[0] = linear_to_xlaw[1];
 }

--
2.6.4
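To put the "slight speedup" in relative terms, the decicycle counts quoted above can be turned into a percentage saved. The helper below is a sketch for that arithmetic only; `percent_saved` is a hypothetical name and not part of FFmpeg.

```c
#include <assert.h>

/* Percentage of cycles saved when going from old_cycles to new_cycles. */
static double percent_saved(double old_cycles, double new_cycles)
{
    assert(old_cycles > 0.0);   /* avoid dividing by zero */
    return 100.0 * (old_cycles - new_cycles) / old_cycles;
}
```

Applied to the rows with about 4096 runs, `percent_saved(313494, 261902)` comes to roughly 16.5% for build_alaw_table and `percent_saved(323599, 209657)` to roughly 35.2% for build_ulaw_table. The rows with about 8192 runs give roughly 15.6% and 27.0%. These are single-machine samples, so they indicate the size of the effect rather than a guarantee on other architectures.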