Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Jul 13, 2011, at 9:29 AM, Olli Niemitalo wrote:

> On Sat, Jul 9, 2011 at 10:53 PM, robert bristow-johnson r...@audioimagination.com wrote:
>> On Dec 7, 2010, at 5:27 AM, Olli Niemitalo wrote:
>>> [I] chose that the ratio a(t)/a(-t) [...] should be preserved
>>
>> by preserved, do you mean constant over all t?
>
> Constant over all r.

i think i figgered that out after hitting the Send button.

>> what is the fundamental reason for preserving a(t)/a(-t) ?
>
> I'm thinking outside your application of automatic finding of splice points. Think of crossfades between clips in a multi-track sample editor. For a cross-fade in which one signal is faded in using a volume envelope that is the time-reverse of the volume envelope with which the other signal is faded out, a(t)/a(-t) describes in what proportions the two signals are mixed at each t. The fundamental reason then is that I think it is a rather good description of the shape of the fade, to a user, as it will describe how the second signal swallows the first over time.

okay, i get it. so instead of expressing the crossfade envelope as

    a(t) = e(t) + o(t)

i think we could describe it as a constant-voltage crossfade (those used for splicing perfectly correlated snippets) bumped up a little by an overall loudness function. an envelope acting on the envelope.

and, as you correctly observed, for constant-voltage crossfades, the even component is always

    e(t) = 1/2

so, pulling another couple of letters outa the alfabet, we can represent the crossfade function as

    a(t) = e(t) + o(t) = g(t)*( 1/2 + p(t) )

where

    g(-t) = g(t)      is even
and
    p(-t) = -p(t)     is odd

g(t) = 1 for constant-voltage crossfades, when r=1. for constant-power crossfades, r=0, we know that g(0) = sqrt(2) > 1

the shape p(t) is preserved for different values of r and we want to solve for g(t) given a specified correlation value r and a given shape family p(t). indeed

    a(t)/a(-t) = (1/2 + p(t))/(1/2 - p(t))

and remains preserved over r if p(t) remains unchanged.
p(t) can be spec'd initially exactly like o(t) (linear crossfade, Hann, Flattened Hann, or whatever odd function your heart desires). i think it should be easy to solve for g(t). we know that

    e(t) = 1/2 * g(t)
    o(t) = g(t) * p(t)

and recall the result

    e(t) = sqrt( (1/2)/(1+r) - (1-r)/(1+r)*(o(t))^2 )

which comes from

    (1+r)*( e(t) )^2 + (1-r)*( o(t) )^2 = 1/2

so

    (1+r)*( 1/2*g(t) )^2 + (1-r)*( g(t)*p(t) )^2 = 1/2

    ( g(t) )^2 * ( (1+r)/4 + (1-r)*(p(t))^2 ) = 1/2

and picking the positive square root for g(t) yields

    g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )

might this result match what you have? (assemble a(t) from g(t) and p(t) just as we had previously from e(t) and o(t).)

remember that p(t) is odd so p(0)=0 so when

    r=1  --->  g(t) = 1          (constant-voltage crossfade)
and
    r=0  --->  g(0) = sqrt(2)    (constant-power crossfade)

> The user might choose one shape for a particular crossfade. Then, depending on the correlation between the superimposed signals, an appropriate symmetrical volume envelope could be applied to the mixed signal to ensure that there is no peak or dip in the contour of the mixed signal. Because the envelope is symmetrical, applying it preserves a(t)/a(-t). It can also be incorporated directly into a(t). All that is not so far off from the application you describe.
>
>> but i don't think it is necessary to deal with lags where Rxx(tau) < 0. why splice a waveform to another part of the same waveform that has opposite polarity? that would create an even bigger glitch.
>
> Splicing at quiet regions with negative correlation can give a smaller glitch than splicing at louder regions with positive correlation.

okay. i would still like to hunt for a splice displacement around that quiet region that would have correlation better than zero. and, if both x(t) and y(t) have no DC, it should be possible to find something.

> This applies particularly to rhythmic material like drum loops, where the time lag between the splice points is constrained, and it may make most sense to look for quiet spots. However, if it's already so quiet in there, I don't know how much it matters what you use for a cross-fade.
>
> Apart from "it's so quiet it doesn't matter", I can think of one other objection against using cross-fades tailored for r < 0: For example, let's imagine that our signal is white noise generated from a Gaussian distribution, and we are dealing with given splice points for which Rxx(tau) < 0 (slightly).

but you should also be able to find a tau where Rxx(tau) is slightly greater than zero because Rxx(tau) should be DC free (if x(t) is DC free). if it were true noise, it should not be far from zero so you would likely use the r=0 crossfade function.

> Now, while the samples of the signal were generated independently, there is by accident a bit of
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Thu, Jul 14, 2011 at 9:22 PM, robert bristow-johnson r...@audioimagination.com wrote:

> g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )
>
> might this result match what you have?

Yes! I only derived the formula for the linear ramp, p(t) = t/2, because one can get the other shapes by warping time and I didn't want to bloat the cumbersome equations. With the linear ramp our results match exactly.

> okay. i would still like to hunt for a splice displacement around that quiet region that would have correlation better than zero

Sometimes you are stuck with a certain displacement. Think drum loops; changing tau would change tempo.

> i think it's better to define p(t) (with the same restrictions as o(t)) and find g(t) as a function of r than it is to do it with o(t) and e(t).

I agree, even though the theory was quite elegant with o(t) and e(t)...

-olli
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
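
A minimal numerical sketch of the g(t) formula quoted in this message, assuming the linear ramp p(t) = t/2 on -1 <= t <= 1, clamped to +/-1/2 outside that range. The function names are illustrative and do not come from the thread. The check rewrites (1+r)*e(t)^2 + (1-r)*o(t)^2 = 1/2 as a(t)^2 + a(-t)^2 + 2*r*a(t)*a(-t) = 1, which is the power of the mix of two unit-power signals with correlation r.

```python
import math

def p_linear(t):
    """Odd shape function p(t) = t/2, clamped to +/-1/2 outside -1 <= t <= 1."""
    return max(-0.5, min(0.5, t / 2.0))

def g(t, r, p=p_linear):
    """Even gain g(t) = 1/sqrt((1+r)/2 + 2*(1-r)*p(t)^2).

    Singular at t = 0 when r = -1, so callers are assumed to pass r > -1.
    """
    pt = p(t)
    return 1.0 / math.sqrt((1.0 + r) / 2.0 + 2.0 * (1.0 - r) * pt * pt)

def a(t, r, p=p_linear):
    """Crossfade envelope a(t) = g(t) * (1/2 + p(t))."""
    return g(t, r, p) * (0.5 + p(t))

def mix_power(t, r, p=p_linear):
    """Power of a(t)*x + a(-t)*y for unit-power x and y with correlation r."""
    fwd, rev = a(t, r, p), a(-t, r, p)
    return fwd * fwd + rev * rev + 2.0 * r * fwd * rev

if __name__ == "__main__":
    for r in (1.0, 0.5, 0.0, -0.5):
        # Largest deviation from unit power across the whole fade.
        worst = max(abs(mix_power(k / 50.0, r) - 1.0) for k in range(-50, 51))
        print(f"r={r:+.1f}  g(0)={g(0.0, r):.4f}  max power error={worst:.2e}")
```

Passing a different odd function for p (a Hann shape, say) changes the shape of the fade but leaves the unit-power property intact, since the derivation never uses the specific form of p(t).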
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Jul 14, 2011, at 5:36 PM, Olli Niemitalo wrote:

> On Thu, Jul 14, 2011 at 9:22 PM, robert bristow-johnson r...@audioimagination.com wrote:
>> g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )
>>
>> might this result match what you have?
>
> Yes! I only derived the formula for the linear ramp, p(t) = t/2, because one can get the other shapes by warping time and I didn't want to bloat the cumbersome equations. With the linear ramp our results match exactly.
>
>> okay. i would still like to hunt for a splice displacement around that quiet region that would have correlation better than zero
>
> Sometimes you are stuck with a certain displacement. Think drum loops; changing tau would change tempo.
>
>> i think it's better to define p(t) (with the same restrictions as o(t)) and find g(t) as a function of r than it is to do it with o(t) and e(t).
>
> I agree, even though the theory was quite elegant with o(t) and e(t)...

do you have any of this in a document? i wonder if one of us should put this down in a pdf and put it in the music-dsp code archive.

--
r b-j    r...@audioimagination.com

Imagination is more important than knowledge.
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Sat, Jul 9, 2011 at 10:53 PM, robert bristow-johnson r...@audioimagination.com wrote:

> On Dec 7, 2010, at 5:27 AM, Olli Niemitalo wrote:
>> [I] chose that the ratio a(t)/a(-t) [...] should be preserved
>
> by preserved, do you mean constant over all t?

Constant over all r.

> what is the fundamental reason for preserving a(t)/a(-t) ?

I'm thinking outside your application of automatic finding of splice points. Think of crossfades between clips in a multi-track sample editor. For a cross-fade in which one signal is faded in using a volume envelope that is the time-reverse of the volume envelope with which the other signal is faded out, a(t)/a(-t) describes in what proportions the two signals are mixed at each t. The fundamental reason then is that I think it is a rather good description of the shape of the fade, to a user, as it will describe how the second signal swallows the first over time.

The user might choose one shape for a particular crossfade. Then, depending on the correlation between the superimposed signals, an appropriate symmetrical volume envelope could be applied to the mixed signal to ensure that there is no peak or dip in the contour of the mixed signal. Because the envelope is symmetrical, applying it preserves a(t)/a(-t). It can also be incorporated directly into a(t). All that is not so far off from the application you describe.

> but i don't think it is necessary to deal with lags where Rxx(tau) < 0. why splice a waveform to another part of the same waveform that has opposite polarity? that would create an even bigger glitch.

Splicing at quiet regions with negative correlation can give a smaller glitch than splicing at louder regions with positive correlation. This applies particularly to rhythmic material like drum loops, where the time lag between the splice points is constrained, and it may make most sense to look for quiet spots. However, if it's already so quiet in there, I don't know how much it matters what you use for a cross-fade.

Apart from "it's so quiet it doesn't matter", I can think of one other objection against using cross-fades tailored for r < 0: For example, let's imagine that our signal is white noise generated from a Gaussian distribution, and we are dealing with given splice points for which Rxx(tau) < 0 (slightly). Now, while the samples of the signal were generated independently, there is by accident a bit of negative correlation in the instantiation of the noise, between those splice points. Knowing all this, shouldn't we simply use a constant-power fade, rather than a fade tailored for r < 0, because random deviations in noise power are to be expected, and only a constant-power fade will produce noise that is statistically identical to the original? I would imagine that noise with long-time non-zero autocorrelation (all the way across the splice points) is a very rare occurrence. Then again, do we really know all this, or even that we are dealing with noise?

I should note that Rxx(tau) < 0 does not imply opposite polarity, in the fullest sense of the adjective. Two equal sinusoids that have phases 91 degrees apart have a correlation coefficient of about -0.017.

RBJ, I'd like to return the favor and let you know that I have great respect for you in these matters (and absolutely no disrespect in any others :-) ). Hey, I wonder if you missed also my other post in the parent thread? You can search for AANLkTim=eM_kgPeibOqFGEr2FdKyL5uCCB_wJhz1Vne

-olli
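
As a rough numerical check of the point that a slightly negative correlation is far from opposite polarity, the sketch below estimates the normalized correlation coefficient of two equal-amplitude sinusoids 91 degrees apart, sampled over exactly one period. The sample count and function name are arbitrary choices made for this illustration. Over whole periods the normalized coefficient equals the cosine of the phase difference, while the unnormalized mean product of two unit-amplitude sinusoids is half of that.

```python
import math

def correlation_coefficient(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

N = 3600                      # samples in one full period (arbitrary)
phase = math.radians(91.0)    # phase offset between the two sinusoids

x = [math.sin(2.0 * math.pi * k / N) for k in range(N)]
y = [math.sin(2.0 * math.pi * k / N + phase) for k in range(N)]

r = correlation_coefficient(x, y)                      # close to cos(91 deg)
mean_product = sum(xi * yi for xi, yi in zip(x, y)) / N  # close to cos(91 deg)/2

if __name__ == "__main__":
    print(f"normalized coefficient: {r:.4f}")
    print(f"mean product:           {mean_product:.4f}")
```

Either way the magnitude is tiny compared with the value of -1 that true polarity inversion would give.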
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
OK, so explain a bit more.

On 21 Jan 2011, at 22:55, Sampo Syreeni wrote:

> My best bet? Go into the cepstral domain to find the most likely loop duration
[music-dsp] A theory of optimal splicing of audio in the time domain.
a few mistakes are spotted and corrected before i forget

This is a continuation of the thread started by Element Green titled: Algorithms for finding seamless loops in audio

As far as I know, it is not published anywhere. A few years ago, I was thinking of writing this up and publishing it (or submitting it for publication, probably to JAES), and had let it fall by the wayside. I'm publishing the main ideas here on music-dsp because of some possible interest here (and the hope it might be helpful to somebody), and so that prior art is established in case anyone like IVL is thinking of claiming it as their own. I really do not know how useful it will be in practice. It might not make any difference. It's just a theory.

__

Section 0:

This is about the generalization of the different ways we can splice and crossfade audio that has these two extremes:

    (1) Splicing perfectly coherent and correlated signals
    (2) Splicing completely uncorrelated signals

I sometimes call the first case the constant-voltage crossfade because the crossfade envelopes of the two signals being spliced add up to one. The two envelopes meet when both have a value of 1/2. In the second case, we use a constant-power crossfade: the squares of the two envelopes add to one and they meet when both have a value of sqrt(1/2)=0.707.

The questions I wanted to answer are: What does one do for cases in between, and how does one know from the audio which crossfade function to use? How does one quantify the answers to these questions? How much can we generalize the answer?

__

Section 1: Set up the problem.

We have two continuous-time audio signals, x(t) and y(t), and we want to splice from one to the other at time t=0. In pitch-shifting or time-scaling or any other looping, y(t) can be some delayed or advanced version of x(t), e.g.

    y(t) = x(t-P)

where P is a period length or some other good splice displacement. We get that value from an algorithm we call a pitch detector.

Also, it doesn't matter whether x(t) is getting spliced to y(t) or the other way around; it should work just as well for the audio played in reverse. And it should be no loss of generality that the splice happens at t=0; we define our coordinate system any damn way we damn well please.

The signal resulting from the splice is

    v(t) = a(t)*x(t) + a(-t)*y(t)

By restricting our result to be equivalent if run either forward or backward in time, we can conclude that the fade-out function (say that's a(t)) is the time-reversed copy of the fade-in function, a(-t).

For the correlated case (1):

    a(t) + a(-t) = 1                for all t

For the uncorrelated case (2):

    (a(t))^2 + (a(-t))^2 = 1        for all t

This crossfade function, a(t), has well-defined even and odd symmetry components:

    a(t) = e(t) + o(t)

where

    even part:  e(t) =  e(-t) = ( a(t) + a(-t) )/2
    odd part:   o(t) = -o(-t) = ( a(t) - a(-t) )/2

And it's clear that

    a(-t) = e(t) - o(t) .

For example, if it's a simple linear crossfade (equivalent to splicing analog tape with a diagonally-oriented razor blade):

           { 0              for  t <= -1
    a(t) = { 1/2 + t/2      for  -1 < t < 1
           { 1              for  t >= 1

This is represented simply, in the even and odd components, as:

    e(t) = 1/2

           { t/2            for  |t| < 1
    o(t) = {
           { sgn(t)/2       for  |t| >= 1

where sgn(t) is the sign function: sgn(t) = t/|t| .

This is a constant-voltage crossfade, appropriate for perfectly correlated signals x(t) and y(t). There is no loss of generality in defining the crossfade to take place around t=0 and to be two time units in length. Both are simply a matter of offset and scaling of time.

Another constant-voltage crossfade would be what I might call a Hann crossfade (after the Hann window):

    e(t) = 1/2

           { (1/2)*sin(pi/2 * t)    for  |t| < 1
    o(t) = {
           { sgn(t)/2               for  |t| >= 1

Some might like that better because the derivative is continuous everywhere.
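
The two constant-voltage crossfades described so far can be sketched in a few lines. This is only an illustration under the conventions of this post (fade centred on t=0, two time units long); the function names are made up here, and x and y are passed in as plain functions of t.

```python
import math

def o_linear(t):
    """Odd part of the linear crossfade: t/2 inside |t| < 1, sgn(t)/2 outside."""
    return t / 2.0 if abs(t) < 1.0 else math.copysign(0.5, t)

def o_hann(t):
    """Odd part of the Hann crossfade: (1/2)*sin(pi/2 * t) inside |t| < 1."""
    if abs(t) < 1.0:
        return 0.5 * math.sin(math.pi / 2.0 * t)
    return math.copysign(0.5, t)

def envelope(t, odd=o_linear):
    """Constant-voltage envelope a(t) = e(t) + o(t) with e(t) = 1/2."""
    return 0.5 + odd(t)

def splice(x, y, t, odd=o_linear):
    """Spliced signal v(t) = a(t)*x(t) + a(-t)*y(t)."""
    return envelope(t, odd) * x(t) + envelope(-t, odd) * y(t)

if __name__ == "__main__":
    # Tabulate both envelopes across and just beyond the fade region.
    for t in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
        print(f"t={t:+.1f}  linear={envelope(t, o_linear):.4f}  "
              f"hann={envelope(t, o_hann):.4f}")
```

Because a(t) + a(-t) = 1 for both shapes, splicing a signal to an identical copy of itself returns that signal unchanged, which is the perfectly correlated case (1).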

Extending this idea, one more constant-voltage crossfade is what I might call a Flattened Hann crossfade:

    e(t) = 1/2

           { (9/16)*sin(pi/2 * t) + (1/16)*sin(3*pi/2 * t)    for  |t| < 1
    o(t) = {
           { sgn(t)/2                                         for  |t| >= 1

This splice is everywhere continuous in the zeroth, first, and second derivative. A very smooth crossfade.

As another example, a constant-power crossfade would be the same as any of the above, but where the above a(t) is square rooted:

           { 0                    for  t <= -1
    a(t) = { sqrt(1/2 + t/2)      for  -1 < t < 1
           { 1                    for  t >= 1

This
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On 06.12.2010 08:59, robert bristow-johnson wrote:

> This is a continuation of the thread started by Element Green titled:
> Algorithms for finding seamless loops in audio

I suspect it works better to *construct* a seamless loop instead of trying to find one where there is none.

Stefan