Re: [music-dsp] Efficiency of clear/copy/offset buffers
On 3/7/13 10:11 PM, Alan Wolfe wrote:
> Quick 2 cents of my own to re-emphasize a point that Ross made - profile to find out which is fastest if you aren't sure (although it's good to ask too, in case different systems have different oddities you don't know about). Also, if in the future you have performance issues, profile before acting for maximum efficiency... oftentimes what we suspect to be the bottleneck of our application is in fact not the bottleneck at all. Happens to everyone :P
>
> lastly, copying buffers is an important thing to get right, but in case you haven't heard this enough, when hitting performance problems it's often better to do MACRO optimization instead of MICRO optimization. Macro optimization means changing your algorithm, being smarter with the resources you have, etc. Micro optimization means turning multiplications into bit shifts, breaking out the assembly, and things like that.

one thing that made sense to me, when i was worrying about this, was to try to do a few different tasks together in the same operation at a system level. here's a case in point: in some previous product that will go unnamed, because i don't want anyone pissed at me for "revealing state secrets", the product had multichannel in and multichannel out. the samples in the A/D and D/A DMA buffers were interlaced, fixed point, and scaled for the I/O device. but we wanted the different channel buffers to not be interlaced for the internal algs, and we wanted the data to be converted to floating point (i don't like floating point so much, but the processor was float and the decision was made by bigger people than me that all the algs were to be floating point), and there were user-definable global gains going in and coming out of the box. so i wrote (in assembly) a simple de-interlace, copy, scale, and convert-to-float of the samples going in, and the reverse of all of that for the samples going out.

doing all four operations together cost about the same as just copying the data when done in assembly. maybe some setup overhead, but the sample was yanked from one buffer, converted to float, multiplied by the global gain, and stored into one of multiple other buffers. and going out was the reverse. in between, the sorta user-defined algs were mono or multichannel, but looked at each channel as just another mono signal in a block or buffer that didn't have any confusing interleaving (no "stride" needed, unless it was a crude down-sampler and that was part of the alg definition, but the algs never had to think about skipping over other channels' samples).

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."

--
dupswapdrop -- the music-dsp mailing list and website: subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
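[Editor's note: a minimal C++ sketch of the combined de-interlace / scale / convert-to-float pass described above, and its reverse. All names, the channel layout, and the Q15 scaling are illustrative assumptions, not the original assembly.]

```cpp
#include <cstdint>

// De-interlace 16-bit DMA samples into per-channel float buffers,
// folding the global input gain into the fixed-to-float scale so the
// whole pass is one multiply and one store per sample.
void deinterlaceToFloat(const int16_t* interleaved, float** chans,
                        int numChans, int numFrames, float globalGain)
{
    const float scale = globalGain / 32768.0f;  // gain folded into conversion
    for (int f = 0; f < numFrames; ++f)
        for (int c = 0; c < numChans; ++c)
            chans[c][f] = (float)interleaved[f * numChans + c] * scale;
}

// The reverse pass: scale by the global output gain, convert back to
// fixed point, and re-interlace. Clipping here is an added safety
// assumption, not something the original post specifies.
void interlaceToFixed(float* const* chans, int16_t* interleaved,
                      int numChans, int numFrames, float globalGain)
{
    const float scale = globalGain * 32768.0f;
    for (int f = 0; f < numFrames; ++f)
        for (int c = 0; c < numChans; ++c) {
            float s = chans[c][f] * scale;
            if (s > 32767.0f)  s = 32767.0f;    // clip rather than wrap
            if (s < -32768.0f) s = -32768.0f;
            interleaved[f * numChans + c] = (int16_t)s;
        }
}
```

The point of the anecdote is that the inner loop still does one load and one store per sample; the extra multiply and conversion ride along almost for free on most processors.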
Re: [music-dsp] Efficiency of clear/copy/offset buffers
Quick 2 cents of my own to re-emphasize a point that Ross made - profile to find out which is fastest if you aren't sure (although it's good to ask too, in case different systems have different oddities you don't know about). Also, if in the future you have performance issues, profile before acting for maximum efficiency... oftentimes what we suspect to be the bottleneck of our application is in fact not the bottleneck at all. Happens to everyone :P

lastly, copying buffers is an important thing to get right, but in case you haven't heard this enough, when hitting performance problems it's often better to do MACRO optimization instead of MICRO optimization. Macro optimization means changing your algorithm, being smarter with the resources you have, etc. Micro optimization means turning multiplications into bit shifts, breaking out the assembly, and things like that. Oftentimes macro optimizations will get you a bigger win (don't optimize a crappy sorting algorithm, just use a better sorting algorithm and it'll be way better) and also will result in more maintainable, portable code, so you should prefer going that route first.

Hope this helps!

On Thu, Mar 7, 2013 at 2:48 PM, Ross Bencina wrote:
> Stephen,
>
> On 8/03/2013 9:29 AM, ChordWizard Software wrote:
>> a) additive mixing of audio buffers
>> b) clearing to zero before additive processing
>
> You could also consider writing (rather than adding) the first signal to the buffer. That way you don't have to zero it first. It requires having a "write" and an "add" version of your generators. Depending on your code this may or may not be worth the trouble vs zeroing first.
>
> In the past I've sometimes used C++ templates to parameterise by the output operation (write/add) so you only have to write the code that generates the signals once.
>
>> c) copying from one buffer to another
>
> Of course you should avoid this wherever possible. Consider using (reference counted) buffer objects so you can share them instead of copying data. You could use reference counting, or just reclaim everything at the end of every cycle.
>
>> d) converting between short and float formats
>>
>> No surprises to any of you there I'm sure. My question is, can you give me a few pointers about making them as efficient as possible within that critical realtime loop?
>>
>> For example, how does the efficiency of memset, or ZeroMemory, compare to a simple for loop?
>
> Usually memset has a special case for writing zeros, so you shouldn't see too much difference between memset and ZeroMemory.
>
> memset vs a simple loop will depend on your compiler.
>
> The usual wisdom is:
>
> 1) use memset vs writing your own. the library implementation will use SSE/whatever and will be fast. Of course this depends on the runtime.
>
> 2) always profile and compare if you care.
>
>> Or using HeapAlloc with the HEAP_ZERO_MEMORY flag when the buffer is created (I know buffers shouldn't be allocated in a realtime callback, but just out of interest, I assume an initial zeroing must come at a cost compared to not using that flag)?
>
> It could happen in a few ways, but I'm not sure how it *does* happen on Windows and OS X.
>
> For example the MMU could map all the pages to a single zero page and then allocate+zero only when there is a write to the page.
>
>> I'm using Win32 but intend to port to OSX as well, so comments on the merits of cross-platform options like the C RTL would be particularly helpful. I realise some of those I mention above are Win-specific.
>>
>> Also for converting sample formats, are there more efficient options than simply using
>>
>> nFloat = (float)nShort / 32768.0
>
> Unless you have a good reason not to, you should prefer multiplication by the reciprocal for the first one:
>
> const float scale = (float)(1. / 32768.0);
> nFloat = (float)nShort * scale;
>
> You can do 4 at once if you use SSE/intrinsics.
>
>> nShort = (short)(nFloat * 32768.0)
>
> Float => int conversion can be expensive depending on your compiler settings and supported processor architectures. There are various ways around this.
>
> Take a look at pa_converters.c and pa_x86_plain_converters.c in PortAudio. But you can do better with SSE.
>
>> for every sample?
>>
>> Are there any articles on this type of optimisation that can give me some insight into what is happening behind the various memory management calls?
>
> Probably. I would make sure you allocate aligned memory, maybe lock it in physical memory, and then use it -- and generally avoid OS-level memory calls from then on.
>
> I would use memset() / memcpy(). These are optimised and the compiler may even inline an even more optimal version.
>
> The alternative is to go low-level, benchmark everything, and write your own code in SSE (and learn how to optimise it).
>
> If you really care you need a good profiler.
>
> That's my 2c.
>
> HTH
>
> Ross.
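[Editor's note: a small sketch of the C++ template trick Ross mentions, parameterising a generator by its output operation so one body serves both "write" (no pre-zeroing needed) and "add". The policy and function names are illustrative, not from the original post.]

```cpp
// Output policies: "write" overwrites the destination, "add" accumulates.
struct WriteOp { static void apply(float& dst, float x) { dst = x;  } };
struct AddOp   { static void apply(float& dst, float x) { dst += x; } };

// One generator body, instantiated per policy. A real generator would
// synthesize a signal; a constant source keeps the sketch short.
template <typename Op>
void constantGenerator(float* out, int n, float value)
{
    for (int i = 0; i < n; ++i)
        Op::apply(out[i], value);  // write or accumulate, per policy
}
```

Usage: render the first source with `constantGenerator<WriteOp>` so the buffer never needs to be zeroed, then mix subsequent sources in with `constantGenerator<AddOp>`.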
Re: [music-dsp] Efficiency of clear/copy/offset buffers
Stephen,

On 8/03/2013 9:29 AM, ChordWizard Software wrote:
> a) additive mixing of audio buffers
> b) clearing to zero before additive processing

You could also consider writing (rather than adding) the first signal to the buffer. That way you don't have to zero it first. It requires having a "write" and an "add" version of your generators. Depending on your code this may or may not be worth the trouble vs zeroing first.

In the past I've sometimes used C++ templates to parameterise by the output operation (write/add) so you only have to write the code that generates the signals once.

> c) copying from one buffer to another

Of course you should avoid this wherever possible. Consider using (reference counted) buffer objects so you can share them instead of copying data. You could use reference counting, or just reclaim everything at the end of every cycle.

> d) converting between short and float formats
>
> No surprises to any of you there I'm sure. My question is, can you give me a few pointers about making them as efficient as possible within that critical realtime loop?
>
> For example, how does the efficiency of memset, or ZeroMemory, compare to a simple for loop?

Usually memset has a special case for writing zeros, so you shouldn't see too much difference between memset and ZeroMemory.

memset vs a simple loop will depend on your compiler.

The usual wisdom is:

1) use memset vs writing your own. the library implementation will use SSE/whatever and will be fast. Of course this depends on the runtime.

2) always profile and compare if you care.

> Or using HeapAlloc with the HEAP_ZERO_MEMORY flag when the buffer is created (I know buffers shouldn't be allocated in a realtime callback, but just out of interest, I assume an initial zeroing must come at a cost compared to not using that flag)?

It could happen in a few ways, but I'm not sure how it *does* happen on Windows and OS X.

For example the MMU could map all the pages to a single zero page and then allocate+zero only when there is a write to the page.

> I'm using Win32 but intend to port to OSX as well, so comments on the merits of cross-platform options like the C RTL would be particularly helpful. I realise some of those I mention above are Win-specific.
>
> Also for converting sample formats, are there more efficient options than simply using
>
> nFloat = (float)nShort / 32768.0

Unless you have a good reason not to, you should prefer multiplication by the reciprocal for the first one:

const float scale = (float)(1. / 32768.0);
nFloat = (float)nShort * scale;

You can do 4 at once if you use SSE/intrinsics.

> nShort = (short)(nFloat * 32768.0)

Float => int conversion can be expensive depending on your compiler settings and supported processor architectures. There are various ways around this.

Take a look at pa_converters.c and pa_x86_plain_converters.c in PortAudio. But you can do better with SSE.

> for every sample?
>
> Are there any articles on this type of optimisation that can give me some insight into what is happening behind the various memory management calls?

Probably. I would make sure you allocate aligned memory, maybe lock it in physical memory, and then use it -- and generally avoid OS-level memory calls from then on.

I would use memset() / memcpy(). These are optimised and the compiler may even inline an even more optimal version.

The alternative is to go low-level, benchmark everything, and write your own code in SSE (and learn how to optimise it).

If you really care you need a good profiler.

That's my 2c.

HTH

Ross.
Regards,

Stephen Clarke
Managing Director
ChordWizard Software Pty Ltd
corpor...@chordwizard.com
http://www.chordwizard.com
ph: (+61) 2 4960 9520
fax: (+61) 2 4960 9580
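[Editor's note: the reciprocal-multiply conversion Ross suggests, as self-contained C++ helpers. The clipping in `shortFromFloat` is an added safety assumption; the original snippet converts without it.]

```cpp
#include <cstdint>

// short -> float: multiply by a precomputed reciprocal instead of
// dividing per sample. The compiler hoists the constant; a division
// per sample would be much slower on most architectures.
inline float floatFromShort(int16_t s)
{
    const float scale = (float)(1.0 / 32768.0);
    return (float)s * scale;
}

// float -> short: scale, clip to the representable range, truncate.
// (The cast itself can be expensive depending on compiler flags; see
// pa_converters.c in PortAudio for platform-specific alternatives.)
inline int16_t shortFromFloat(float x)
{
    float s = x * 32768.0f;
    if (s > 32767.0f)  s = 32767.0f;
    if (s < -32768.0f) s = -32768.0f;
    return (int16_t)s;
}
```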
[music-dsp] Efficiency of clear/copy/offset buffers
Greetings, and apologies in advance for bringing up what must be a well-covered topic on this list; I just couldn't find it in the archives anywhere.

I'm in the final stages of building an audio host/synth engine in C++, and of course a large part of its realtime workload is building and transferring audio buffers:

a) additive mixing of audio buffers
b) clearing to zero before additive processing
c) copying from one buffer to another
d) converting between short and float formats

No surprises to any of you there I'm sure. My question is, can you give me a few pointers about making them as efficient as possible within that critical realtime loop?

For example, how does the efficiency of memset, or ZeroMemory, compare to a simple for loop? Or using HeapAlloc with the HEAP_ZERO_MEMORY flag when the buffer is created (I know buffers shouldn't be allocated in a realtime callback, but just out of interest, I assume an initial zeroing must come at a cost compared to not using that flag)?

I'm using Win32 but intend to port to OSX as well, so comments on the merits of cross-platform options like the C RTL would be particularly helpful. I realise some of those I mention above are Win-specific.

Also for converting sample formats, are there more efficient options than simply using

nFloat = (float)nShort / 32768.0
nShort = (short)(nFloat * 32768.0)

for every sample?

Are there any articles on this type of optimisation that can give me some insight into what is happening behind the various memory management calls?

Regards,

Stephen Clarke
Managing Director
ChordWizard Software Pty Ltd
corpor...@chordwizard.com
http://www.chordwizard.com
ph: (+61) 2 4960 9520
fax: (+61) 2 4960 9580
Re: [music-dsp] measuring the difference
On 3/7/13 1:41 PM, volker böhm wrote:
>>> now i'm looking for something to quantify the error signal. from statistics i know there is something like the "mean squared error". so i'm squaring the error signal and take the (running) average.
>>>
>>> mostly i'm getting some numbers very close to zero
>>
>> wow! what are the "equivalent but not identical processes"? except for maybe differences in methods of rounding and quantization, or for a simple filter, different structures or forms (like Direct Form 1 vs. State-Variable vs. Gold-Rader vs. Lattice/Ladder), i would not expect an error signal (i presume this is the difference of outputs) to be very close to zero. is this what your process is?
>>
>> i s'pose you could have a compressor with identical compression curve and where the delays are lined up very well and where the differences are in how compression levels are computed. even then, when the "equivalent but not identical" compressors pump, i would expect significant amplitude in the difference signal.
>>
>> i am curious what the "equivalent but not identical processes" are.
>
> fair enough. i see that my question was probably too vague in this respect. right now i'm doing simple stuff, yes, i'm comparing filter structures. i'm talking about differences that you hardly hear or might even not hear at all.

with perfectly linear, time-invariant (LTI) systems, the output signal is the theoretically-perfect output plus some error signal due to the internal quantization errors. so then you can line things up so that for two "equivalent" filters (both have the same H(z)), you can subtract one output from the other and you should get something that is very small and well-behaved. but this does not extend well to more sophisticated algs that are not LTI.

> and i'm interested if this "i don't hear a difference" maybe correlates to a measured error function. or put differently: how big can the error/difference be, without being perceivable (concerning a specific algorithm)?

>>> and a gut feeling tells me i want to see those on a dB scale. so i'm taking the logarithm and multiply by 10, as i have already squared the values before.
>>
>> if it's the base-10 logarithm, that gets you dB. what makes 0 dB depends on how the signal is scaled before the log.
>
> yes.
>
>>> (as far as i can see, this is equivalent to an RMS measurement).
>>
>> how are you doing the "M"?
>
> i'm summing the samples and divide by the number. as my first attempts were in realtime, i was using a sliding window.

that is, BTW, a simple "moving average" and it's a linear, time-invariant filter that happens to have a DC gain of 1 (or 0 dB). any LTI filter with a DC gain of 1 will work. a simple 1-pole LPF will work for computing a sliding mean.

>>> is there a correct/better/preferred way of doing this?
>>
>> all depends on the alg.
>
>>> next to a listening test, in the end i want to have a simple measure of the difference of the two processes which is close to our perception of the difference. does that make sense?
>>
>> it does, but a general analytic method that "is close to our perception of the difference" is a sorta holy grail for "equivalent but not identical processes" that are more sophisticated than just a filter or tapped delay or similar.
>
> yes, it's all about the holy grail! but i would be satisfied to find it for simple algos (right now).

i think that someone in the AES has published ideas. dunno who exactly. i might check it out, but if it requires too much work, i might give up early.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."
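[Editor's note: a sketch of the running mean-square meter the thread describes: a one-pole lowpass with DC gain 1 smooths the squared error, and 10*log10 puts it on a dB scale. The class name and coefficient value are illustrative assumptions.]

```cpp
#include <cmath>

// Running mean-square of an error signal, smoothed by a one-pole LPF
// (y[n] = y[n-1] + a*(x[n]^2 - y[n-1]), DC gain exactly 1), reported in dB.
struct MeanSquareMeter {
    float state = 0.0f;
    float a;  // smoothing coefficient, 0 < a < 1 (smaller = longer window)

    explicit MeanSquareMeter(float coeff) : a(coeff) {}

    float process(float err)          // feed one error sample, get mean square
    {
        state += a * (err * err - state);
        return state;
    }

    float dB() const                  // 10*log10 because input is already squared
    {
        return 10.0f * std::log10(state + 1e-30f);  // epsilon guards log(0)
    }
};
```

A constant error of amplitude 1 converges to a mean square of 1, i.e. 0 dB; the absolute reference depends on how the signals are scaled before the meter, as noted above.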
Re: [music-dsp] measuring the difference
On 07.03.2013, at 16:32, robert bristow-johnson wrote:
>> now i'm looking for something to quantify the error signal. from statistics i know there is something like the "mean squared error". so i'm squaring the error signal and take the (running) average.
>>
>> mostly i'm getting some numbers very close to zero
>
> wow! what are the "equivalent but not identical processes"? except for maybe differences in methods of rounding and quantization, or for a simple filter, different structures or forms (like Direct Form 1 vs. State-Variable vs. Gold-Rader vs. Lattice/Ladder), i would not expect an error signal (i presume this is the difference of outputs) to be very close to zero. is this what your process is?
>
> i s'pose you could have a compressor with identical compression curve and where the delays are lined up very well and where the differences are in how compression levels are computed. even then, when the "equivalent but not identical" compressors pump, i would expect significant amplitude in the difference signal.
>
> i am curious what the "equivalent but not identical processes" are.

fair enough. i see that my question was probably too vague in this respect. right now i'm doing simple stuff, yes, i'm comparing filter structures. i'm talking about differences that you hardly hear or might even not hear at all. and i'm interested if this "i don't hear a difference" maybe correlates to a measured error function. or put differently: how big can the error/difference be, without being perceivable (concerning a specific algorithm)?

>> and a gut feeling tells me i want to see those on a dB scale. so i'm taking the logarithm and multiply by 10, as i have already squared the values before.
>
> if it's the base-10 logarithm, that gets you dB. what makes 0 dB depends on how the signal is scaled before the log.

yes.

>> (as far as i can see, this is equivalent to an RMS measurement).
>
> how are you doing the "M"?

i'm summing the samples and divide by the number. as my first attempts were in realtime, i was using a sliding window.

>> is there a correct/better/preferred way of doing this?
>
> all depends on the alg.
>
>> next to a listening test, in the end i want to have a simple measure of the difference of the two processes which is close to our perception of the difference. does that make sense?
>
> it does, but a general analytic method that "is close to our perception of the difference" is a sorta holy grail for "equivalent but not identical processes" that are more sophisticated than just a filter or tapped delay or similar.

yes, it's all about the holy grail! but i would be satisfied to find it for simple algos (right now).

thanks for your comments.

volker.
Re: [music-dsp] measuring the difference
On 07.03.2013, at 16:27, Thomas Young wrote:
> Your mean square error procedure is slightly incorrect. You should take the final signals from both processes, say A[1..n] and B[1..n], subtract them to get your error signal E[1..n]; then the mean square error is the sum of the squared error over n.
>
> Sum( E[1..n]^2 ) / n

that's what i'm doing, no?

> This (MSE) is a statistical approach though and isn't necessarily a great way of measuring perceived acoustical differences.

yes, this is what i'm suspecting.

> It depends on the nature of your signal but you may want to check the error in the frequency domain (weighted to a specific frequency band if appropriate) rather than the time domain.

thanks. will have to think a little bit about it.

volker
Re: [music-dsp] measuring the difference
On 3/7/13 10:10 AM, volker böhm wrote:
> dear all, i'm trying to measure the difference between two equivalent but not identical processes.

i sorta know what you mean by this, maybe... but it would be interesting to see an articulated definition of what makes processes "equivalent", assuming we know what "identical" is. if "equivalent" is "sounds the same" ...

> right now i'm feeding some test signals to both algorithms at the same time and subtract the output signals.

... then i don't think this will work at all. a millisecond difference in delay will sound the same, but your difference signal will not be anywhere close to zero. there are all sorts of processes that are not simple filters, like reverbs, pitch-shifters, dynamics (AGC, compressor, limiter, gate), fuzz/distortion, that may sound equivalent but the outputs are quite different.

> now i'm looking for something to quantify the error signal. from statistics i know there is something like the "mean squared error". so i'm squaring the error signal and take the (running) average.
>
> mostly i'm getting some numbers very close to zero

wow! what are the "equivalent but not identical processes"? except for maybe differences in methods of rounding and quantization, or for a simple filter, different structures or forms (like Direct Form 1 vs. State-Variable vs. Gold-Rader vs. Lattice/Ladder), i would not expect an error signal (i presume this is the difference of outputs) to be very close to zero. is this what your process is?

i s'pose you could have a compressor with identical compression curve and where the delays are lined up very well and where the differences are in how compression levels are computed. even then, when the "equivalent but not identical" compressors pump, i would expect significant amplitude in the difference signal.

i am curious what the "equivalent but not identical processes" are.

> and a gut feeling tells me i want to see those on a dB scale. so i'm taking the logarithm and multiply by 10, as i have already squared the values before.

if it's the base-10 logarithm, that gets you dB. what makes 0 dB depends on how the signal is scaled before the log.

> (as far as i can see, this is equivalent to an RMS measurement).

how are you doing the "M"?

> is there a correct/better/preferred way of doing this?

all depends on the alg.

> next to a listening test, in the end i want to have a simple measure of the difference of the two processes which is close to our perception of the difference. does that make sense?

it does, but a general analytic method that "is close to our perception of the difference" is a sorta holy grail for "equivalent but not identical processes" that are more sophisticated than just a filter or tapped delay or similar.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."
Re: [music-dsp] measuring the difference
Your mean square error procedure is slightly incorrect. You should take the final signals from both processes, say A[1..n] and B[1..n], subtract them to get your error signal E[1..n]; then the mean square error is the sum of the squared error over n.

Sum( E[1..n]^2 ) / n

This (MSE) is a statistical approach though and isn't necessarily a great way of measuring perceived acoustical differences.

It depends on the nature of your signal but you may want to check the error in the frequency domain (weighted to a specific frequency band if appropriate) rather than the time domain.

-----Original Message-----
From: music-dsp-boun...@music.columbia.edu [mailto:music-dsp-boun...@music.columbia.edu] On Behalf Of volker böhm
Sent: 07 March 2013 15:10
To: A discussion list for music-related DSP
Subject: [music-dsp] measuring the difference

dear all,

i'm trying to measure the difference between two equivalent but not identical processes. right now i'm feeding some test signals to both algorithms at the same time and subtract the output signals.

now i'm looking for something to quantify the error signal. from statistics i know there is something like the "mean squared error". so i'm squaring the error signal and take the (running) average. mostly i'm getting some numbers very close to zero, and a gut feeling tells me i want to see those on a dB scale. so i'm taking the logarithm and multiply by 10, as i have already squared the values before. (as far as i can see, this is equivalent to an RMS measurement.)

is there a correct/better/preferred way of doing this?

next to a listening test, in the end i want to have a simple measure of the difference of the two processes which is close to our perception of the difference. does that make sense?

thanks for any comments, volker.
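[Editor's note: Thomas's block-form MSE, Sum( E[1..n]^2 ) / n with E = A - B, as a C++ function. Array and function names are illustrative.]

```cpp
#include <cstddef>

// Mean squared error between two signals A and B of length n.
// Accumulate in double so long blocks of small errors don't lose
// precision to float summation.
double meanSquaredError(const float* A, const float* B, std::size_t n)
{
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double e = (double)A[i] - (double)B[i];  // error signal E[i]
        acc += e * e;
    }
    return acc / (double)n;
}
```

As the thread notes, 10*log10 of this value puts it on a dB scale; what 0 dB means depends on how the inputs are scaled.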
[music-dsp] measuring the difference
dear all,

i'm trying to measure the difference between two equivalent but not identical processes. right now i'm feeding some test signals to both algorithms at the same time and subtract the output signals.

now i'm looking for something to quantify the error signal. from statistics i know there is something like the "mean squared error". so i'm squaring the error signal and take the (running) average. mostly i'm getting some numbers very close to zero, and a gut feeling tells me i want to see those on a dB scale. so i'm taking the logarithm and multiply by 10, as i have already squared the values before. (as far as i can see, this is equivalent to an RMS measurement.)

is there a correct/better/preferred way of doing this?

next to a listening test, in the end i want to have a simple measure of the difference of the two processes which is close to our perception of the difference. does that make sense?

thanks for any comments, volker.
[music-dsp] Thesis topic on procedural-audio in video games?
Talking about procedural audio for games, we recently posted this on the auditory-list, which may be of interest to music-dsp members:

People working with environmental sounds may be interested in downloading our synthesizer, recently made available for research purposes. The synthesizer was designed to simulate wind, rain, footsteps, waves and fire sounds, with real-time control of source physics (e.g., rain intensity, wave size, etc.) and spatial properties (3D position and spatial extension). It is compatible with standard loudspeaker setups and headphones (binaural rendering). The synthesizer is available as a standalone application for Mac and Windows.

We hope it can be useful to some of you, e.g., for creating stimuli in virtual environments, testing sound classification/processing algorithms, interactive sound installations, etc. The application and related publications are available here: www.charlesverron.com/spad

Regards,

Charles Verron
postdoctoral researcher at CNRS-LMA
www.charlesverron.com