Re: [PD] Making a Realtime Convolution External
In response to my comment about Acoustic Mirror sounding muddy: I think that most of the DirectX effects sounded muddy at that time.

___ Pd-list@iem.at mailing list UNSUBSCRIBE and account-management -> http://lists.puredata.info/listinfo/pd-list
Re: [PD] Making a Realtime Convolution External
--- On Thu, 4/7/11, Mathieu Bouchard wrote:

> On Wed, 6 Apr 2011, IOhannes m zmoelnig wrote:
>
>> using "threats" does not mean that things have to get non-deterministic,
>> and indeed a lot of software uses threads and stays completely
>> deterministic.
>
> Suppose that you launch a second fred on another cpu. How do you
> synchronise the main fred and the second fred together so that the main
> fred doesn't have to wait? Sounds to me like a big issue with
> multi-fredded applications. You can't guarantee that the second cpu will
> run the fred when the first fred will want to, because fredding is
> dependent on the cpu's availability and the OS's scheduler's decisions.

I think Tim Blechmann addresses this with Supernova:
http://lac.linuxaudio.org/2010/recordings/day4_1215_Supernova.ogv
(start maybe 5 min. in...)

-Jonathan
Re: [PD] Making a Realtime Convolution External
If you're not realtime, your main process could get yanked by the scheduler too. Seems more like "realtime vs not-realtime" than "threads: yes/no".

On Thu, Apr 7, 2011 at 8:15 AM, Mathieu Bouchard wrote:

> On Wed, 6 Apr 2011, IOhannes m zmoelnig wrote:
>
>> using "threats" does not mean that things have to get non-deterministic,
>> and indeed a lot of software uses threads and stays completely
>> deterministic.
>
> Suppose that you launch a second fred on another cpu. How do you
> synchronise the main fred and the second fred together so that the main
> fred doesn't have to wait? Sounds to me like a big issue with
> multi-fredded applications. You can't guarantee that the second cpu will
> run the fred when the first fred will want to, because fredding is
> dependent on the cpu's availability and the OS's scheduler's decisions.
Re: [PD] Making a Realtime Convolution External
On Wed, 6 Apr 2011, IOhannes m zmoelnig wrote:

> using "threats" does not mean that things have to get non-deterministic,
> and indeed a lot of software uses threads and stays completely
> deterministic.

Suppose that you launch a second fred on another cpu. How do you synchronise the main fred and the second fred together so that the main fred doesn't have to wait? Sounds to me like a big issue with multi-fredded applications. You can't guarantee that the second cpu will run the fred when the first fred will want to, because fredding is dependent on the cpu's availability and the OS's scheduler's decisions.

| Mathieu Bouchard tél: +1.514.383.3801 Villeray, Montréal, QC
Re: [PD] Making a Realtime Convolution External
SONOFA$#*&! I keep posting from the wrong email address and getting bounced ;-) Sorry to Henry & IOhannes for the dupes...

> In the context of threading/part. conv, I had an idea to compute ahead.
> Most of the calculations for a given block can be computed ahead. Only
> the most recent block of samples needs to be actually convolved, right
> away.
>
> Then, once you've summed the most recent block with the other
> partitions, you'd start a thread to "compute ahead" the next cycle's
> partitions. If the load is low enough, it would complete by the next
> cycle--of course, you'd need to wait/join the background thread to make
> sure that it completes.

That's what it does, give or take :-)

1) Blocks that are due sooner are prioritized: whenever a worker thread is signalled to wake up, it does the block next due.

2) The main thread pre-empts worker threads as soon as it's called. Worker threads check constantly for a signal to suspend their current task, and hand the unfinished work back to the main thread.

3) If the main thread needs a block that hasn't been completed, it has to work on it -> usually this means you aren't going to make your schedule and the CPU is maxed out.

-Seth
Re: [PD] Making a Realtime Convolution External
On Apr 6, 2011, at 2:52 PM, IOhannes m zmoelnig wrote:

> On 2011-04-06 20:26, Hans-Christoph Steiner wrote:
>
>> Pd has its own scheduling system which is best to stick to as long as
>> you can so that you can keep the deterministic operation intact. For
>> convolution, I can't see a reason to use a thread. It adds complexity
>> and more code to run, but if the CPU is overtaxed by realtime
>> convolution processing, you are going to get an interruption in the
>> audio regardless of whether the processing is in a thread or not.
>
> partitioned convolutions can gain massively from parallelisation. given
> that we have more and more CPUs available, i think it is a good thing
> to try and do a multicore convolution. otoh, if there is only a single
> thread doing the convolution, then there is no parallelisation, and
> thus the only thing gained is complexity.
>
> using "threats" does not mean that things have to get
> non-deterministic, and indeed a lot of software uses threads and stays
> completely deterministic.

Yes, you can make things deterministic using threads. Coding without threads, it's basically automatically deterministic, but when using threads, you have to code things right to have it deterministic. Having multiple threads to support multiple cores definitely makes sense, so I guess this multi-threaded Pd object would just need to wait for the results of all threads before letting the DSP tick complete, thereby ensuring deterministic behavior.

.hc

Terrorism is not an enemy. It cannot be defeated. It's a tactic. It's about as sensible to say we declare war on night attacks and expect we're going to win that war. We're not going to win the war on terrorism. - retired U.S. Army general, William Odom
Re: [PD] Making a Realtime Convolution External
On Wed, Apr 6, 2011 at 2:08 PM, IOhannes m zmoelnig wrote:

> On 2011-04-06 21:04, Seth Nickell wrote:
>> I use a thread per core, it does parallelize nicely.
>
> that's what i thought.
>
> please don't let yourself turn down by all those misers :-)
>
> fgmasdr
> IOhannes

As a young curmudgeon myself, I might *grumble* seem discouraging. But really, I'd encourage you to take on convolution externals... but don't create a monster.

In the context of threading/part. conv, I had an idea to compute ahead. Most of the calculations for a given block can be computed ahead. Only the most recent block of samples needs to be actually convolved, right away.

Then, once you've summed the most recent block with the other partitions, you'd start a thread to "compute ahead" the next cycle's partitions. If the load is low enough, it would complete by the next cycle--of course, you'd need to wait/join the background thread to make sure that it completes.
Re: [PD] Making a Realtime Convolution External
On 2011-04-06 21:04, Seth Nickell wrote:
> I use a thread per core, it does parallelize nicely.

that's what i thought.

please don't let yourself turn down by all those misers :-)

fgmasdr
IOhannes
Re: [PD] Making a Realtime Convolution External
Hi Charles,

I have a few partitioning methods. I used to do profiling when you first load the plugin, to determine the optimal partitioning, but found that on intel/amd cpus with sse3 it didn't vary much, and just hardcoded a simple rule set for when to use each partitioning style. In the more cross-platform context of pd, I think that profiling code might make sense again; I'll see if I can resurrect it.

-Seth

On Tue, Apr 5, 2011 at 5:04 PM, Charles Henry wrote:
> On Tue, Apr 5, 2011 at 2:33 PM, Seth Nickell wrote:
>> Hi Mathieu,
>>
>> Thanks, I assumed (without checking :-P) that the dsp call happened
>> every time, didn't realize it was a setup/patching call that registers
>> my "_perform" function with a call graph. Exactly what I need.
>>
>> I think the difference in approach comes from the needs of the
>> external. fiddle~ probably needs much larger blocks than typical to
>> discriminate between low frequencies. In my case, I can run at 64
>> sample sizes, but I'll take your whole CPU to do it. It might be smart
>> to default to some internal buffering (say 512), and let people order
>> the external to do really really low latency if they need it and are
>> willing to pay in CPU.
>
> Here's where your users' choice of block sizes comes in--if your user
> puts a partitioned convolution external into a canvas with block size
> 64, it means to be low-latency. If the user puts it in with
> [block~ 1024], then the buffering is defined.
>
> Pd means to be ~user~programmable and modular. The more you try to
> monolith your externals, the worse they work (I've done this). I know
> I'm not expressing it well, but I hope the point comes through.
>
>> That said, Peter reminded me of an optimization that I hadn't
>> implemented yet. AudioUnits are rarely asked to run below 128 sample
>> block sizes, so it didn't make sense for the AU, and I forgot that it
>> was on the TODO list from 2 years ago. ;-) By convolving very small
>> blocks in the time domain, and switching to frequency domain for
>> larger blocks, I think we can get excellent CPU usage at very small
>> block sizes too.
>
> It sounds like you'd have a bit of a problem without first profiling
> the system or having known profiles for different hardware. Can you
> tell me more about your partitioning method (just the math)?
>
>> On Tue, Apr 5, 2011 at 8:49 AM, Mathieu Bouchard wrote:
>>> On Mon, 4 Apr 2011, Seth Nickell wrote:
>>>
>>>> Are the DSP calls liable to vary t_signal->s_n (block size) without
>>>> notification? 64 samples, apparently the default on pd-extended, is
>>>> doable without buffering for partitioned convolution on a modern
>>>> computer, but it exacts a pretty high CPU toll, and if I have to
>>>> handle random blocksize changes, it gets more expensive.
>>>>
>>>> Also, since convolution is much more efficient around block sizes of
>>>> 256 or 512, perhaps I should default to one of these, buffer a
>>>> little, and have a "runatpdblocksize" message or somesuch?
>>>
>>> There's always a notification. Any change of s_n will result in a new
>>> call to the dsp-function.
>>>
>>> Note that it's best to make sure that the dsp-function is fairly fast
>>> most of the times, because any patching may retrigger the
>>> dsp-function in order to recompile the graph.
>>>
>>> dsp objects working with some kind of blocks don't have to be using
>>> s_n as a setting. I mean that you can accumulate several dsp-blocks
>>> in order to make your own kind of bigger block. This is what
>>> [fiddle~] and [env~] do, for example.
>>>
>>> But some other object classes use s_n as a setting. For example,
>>> [fft~] does. I don't know why this is not consistent across all of
>>> pd. (I'm not saying either approach is better than the other.)
Re: [PD] Making a Realtime Convolution External
Hi Hans,

The thread in question here would be invoked when a "set" message is sent to the object. In this case, I need to load the Impulse Response from the disk and optionally do a test convolution and normalize it. I'm assuming (yeah, I should just check ;-) that if I block on an inlet, I'm blocking the whole audio thread?

Convolution with an increasing block size naturally involves different work-loads on different sample blocks (some sample blocks finish a large block that can then be processed, some don't). If the scheduler is picky/precise enough (AudioUnits is), you can't get away with this "ragged work load". It's not a matter of decreasing CPU usage - of course scheduling things to run in a worker thread increases CPU usage a little - it's a matter of keeping the _perform call CPU cycles consistent.

-Seth

On Wed, Apr 6, 2011 at 11:26 AM, Hans-Christoph Steiner wrote:
> On Apr 4, 2011, at 10:48 PM, Seth Nickell wrote:
>
>>>> 2) Anyone have requests for features/api? It's currently simplistic:
>>>> - takes a "read FILENAME" message, loads the file, does a test
>>>> convolution against pink noise to normalize the gain to something
>>>> sane
>>>
>>> Is this done within the main Pd audio thread?
>>
>> The convolution engine has support for doing it either on the calling
>> thread, or a background thread. I'm thinking of defaulting to a
>> background thread. That seem like the right move?
>
> Pd has its own scheduling system which is best to stick to as long as
> you can so that you can keep the deterministic operation intact. For
> convolution, I can't see a reason to use a thread. It adds complexity
> and more code to run, but if the CPU is overtaxed by realtime
> convolution processing, you are going to get an interruption in the
> audio regardless of whether the processing is in a thread or not.
>
> .hc
>
>>>> - caches the last N impulse responses, as the test convolution takes
>>>> a little time
>>>> - allows setting the cache size with a "cachesize N" message
>>>
>>> To make sure I understood this: cachesize is not the size of the
>>> first partition of the partitioned convolution, but the cache that
>>> tries to avoid audio dropouts when performing the test convolution?
>>
>> The convolution engine can swap in a pre-loaded ('cached') IR in
>> realtime without glitching... but it means keeping 2x the Impulse
>> Response data in RAM. To keep the default API simple but useful, I'm
>> defaulting to caching only the last 5 impulse responses in RAM.
>> "cachesize N" lets you increase that number - let's say in a
>> performance you wanted to use 30 different impulse responses and you
>> have 2GB of ram... should be nbd.
>>
>>>> - disable normalization with "normalize 0" or "normalize 1"
>>>
>>> Yes, disabling this could be a good idea! You could also add a
>>> "gain 0-1" message for manual control.
>>
>> It's worth noting that impulse responses are usually whack without
>> gain normalization - like factors of hundreds to millions off a usable
>> signal.
>>
>>>> Features I'm considering (let me know if they sound useful):
>>>> - load from an array instead of from disk (no gain normalization?)
>>>
>>> Very good.
>>>
>>>> - It wouldn't be hard to enable MxN convolution if that floats
>>>> somebody's boat.
>>>
>>> I am sure if you come up with a convolution as efficient and flexible
>>> as jconv by Fons within Pd, then soon a multichannel use and hence
>>> request will come up fast.
>>
>> I'd be interested in what flexibility means in this context, it might
>> give me some good ideas for features to add. Efficiency-wise, last
>> time I benchmarked it's more efficient than jconv, but the difference
>> is offset by less graceful degradation under CPU load (I convolve in
>> background threads to preserve realtime in the main thread while
>> avoiding an irritating patent that's going to expire soon...).
>>
>> WRT Pd's audio scheduling... are Pd signal externals held to realtime,
>> or can my dsp call vary the number of cycles it takes by 100% from
>> call to call? VST seems to do ok with this, but AudioUnits get
>> scheduled to run at the very last instant they possibly could. If Pd
>> can have some variance, I can drop the threads and improve the
>> external's degradation under high CPU load.
>>
>> thanks for the feedback (also, is this the best list for this kind of
>> feedback?),
>>
>> -Seth
>
> As we enjoy great advantages from inventions of others, we should be
> glad of an opportunity to serve others by any invention of ours; and
> this we should do freely and generously. - Benjamin Franklin
Re: [PD] Making a Realtime Convolution External
On 2011-04-06 20:26, Hans-Christoph Steiner wrote:

> Pd has its own scheduling system which is best to stick to as long as
> you can so that you can keep the deterministic operation intact. For
> convolution, I can't see a reason to use a thread. It adds complexity
> and more code to run, but if the CPU is overtaxed by realtime
> convolution processing, you are going to get an interruption in the
> audio regardless of whether the processing is in a thread or not.

partitioned convolutions can gain massively from parallelisation. given that we have more and more CPUs available, i think it is a good thing to try and do a multicore convolution. otoh, if there is only a single thread doing the convolution, then there is no parallelisation, and thus the only thing gained is complexity.

using "threats" does not mean that things have to get non-deterministic, and indeed a lot of software uses threads and stays completely deterministic.

gjasdr
IOhannes
Re: [PD] Making a Realtime Convolution External
On Apr 4, 2011, at 10:48 PM, Seth Nickell wrote:

>>> 2) Anyone have requests for features/api? Its currently simplistic:
>>> - takes a "read FILENAME" message, loads the file, does a test
>>> convolution against pink noise to normalize the gain to something sane
>>
>> Is this done within the main Pd audio thread?
>
> The convolution engine has support for doing it either on the calling
> thread, or a background thread. I'm thinking of default to a background
> thread. That seem like the right move?

Pd has its own scheduling system which is best to stick to as long as you can so that you can keep the deterministic operation intact. For convolution, I can't see a reason to use a thread. It adds complexity and more code to run, but if the CPU is overtaxed by realtime convolution processing, you are going to get an interruption in the audio regardless of whether the processing is in a thread or not.

.hc

>>> - caches the last N impulse responses, as the test convolution takes
>>> a little time
>>> - allows setting the cache size with a "cachesize N" message
>>
>> To make sure I understood this: cachesize is not the size of the first
>> partition of the partitioned convolution, but the cache that tries to
>> avoid audio dropouts when performing the test convolution?
>
> The convolution engine can swap-in a pre-loaded ('cached') IR in
> realtime without glitching... but it means keeping 2x the Impulse
> Response data in RAM. To keep the default API simple but useful, I'm
> defaulting to caching only the last 5 impulse responses in RAM.
> "cachesize N" lets you increase that number - let's say in a
> performance you wanted to use 30 different impulse responses and you
> have 2GB of ram... should be nbd.
>
>>> - disable normalization with "normalize 0" or "normalize 1"
>>
>> Yes, disabling this could be a good idea! You could also add a
>> "gain 0-1" message for manual control.
>
> It's worth noting that impulse responses are usually whack without gain
> normalization - like factors of hundreds to millions off a usable
> signal.
>
>>> Features I'm considering (let me know if they sound useful):
>>> - load from an array instead of from disk (no gain normalization?)
>>
>> Very good.
>>
>>> - It wouldn't be hard to enable MxN convolution if that floats
>>> somebody's boat.
>>
>> I am sure if you come up with a convolution as efficient and flexible
>> as jconv by Fons within Pd, then soon a multichannel use and hence
>> request will come up fast.
>
> I'd be interested in what flexibility means in this context, it might
> give me some good ideas for features to add. Efficiency-wise, last time
> I benchmarked it's more efficient than jconv, but the difference is
> offset by less graceful degradation under CPU load (I convolve in
> background threads to preserve realtime in the main thread while
> avoiding an irritating patent that's going to expire soon...).
>
> WRT Pd's audio scheduling... are Pd signal externals held to realtime,
> or can my dsp call vary the number of cycles it takes by 100% from call
> to call? VST seems to do ok with this, but AudioUnits get scheduled to
> run at the very last instant they possibly could. If Pd can have some
> variance, I can drop the threads and improve the external's degradation
> under high CPU load.
>
> thanks for the feedback (also, is this the best list for this kind of
> feedback?),
>
> -Seth

As we enjoy great advantages from inventions of others, we should be glad of an opportunity to serve others by any invention of ours; and this we should do freely and generously. - Benjamin Franklin
Re: [PD] Making a Realtime Convolution External
Hi Seth,

In terms of feature requests, since you are doing all the work already, it would be nice to have a 1x4 mode, meaning one input -> 4 convolutions -> 4 outputs. That would be great for ambisonic (b-format) 4-channel room impulse responses.

Regards,
Archontis

On 4/5/11 3:54 AM, Seth Nickell wrote:

> I'm planning to release our realtime convolution engine (extracted from
> http://meatscience.net/pages/convolution-reverb) as a GPLed Pd external.
> It currently accepts 4-channel ('true stereo'), two channel or mono
> impulse responses, with stereo or mono output. Performance is excellent
> if you have SSE3 and has a fallback in case you don't, and it aims for
> accuracy (basically that means multi-stage scaling to keep floats
> within healthy sizes).
>
> 1) I'd love to swipe the convolve~ external name, currently installed
> by mjlib as part of pd-extended. convolve~ from mjlib appears to be a
> copy of pin~? so I think it could be taken? Maybe I mis-read the code.
> I've cc'ed mark who can probably clarify.
>
> 2) Anyone have requests for features/api? It's currently simplistic:
> - takes a "read FILENAME" message, loads the file, does a test
>   convolution against pink noise to normalize the gain to something sane
> - caches the last N impulse responses, as the test convolution takes a
>   little time
> - allows setting the cache size with a "cachesize N" message
> - disable normalization with "normalize 0" or "normalize 1"
>
> Features I'm considering (let me know if they sound useful):
> - load from an array instead of from disk (no gain normalization?)
> - It wouldn't be hard to enable MxN convolution if that floats
>   somebody's boat.
>
> 3) I can compile/test on Mac & Linux, anyone up for helping me with
> Windows?
>
> 4) Would this be of interest for Pd-extended?
>
> 5) I'd love to build a granular convolution engine that takes two
> real-time signals, and extracts grains from one to convolve against the
> other. Anyone have ideas about this?
>
> thanks all,
>
> -Seth
Re: [PD] Making a Realtime Convolution External
> the calculation as well, so that you could deliberately stagger the
> blocks and more evenly distribute the calculation in cpu-intensive
> situations. I'm imagining something like two 4096 blocks running, say,
> 64 samples apart so that one does its calculation while the other is
> still collecting samples.
>
> Matt

Maybe a master to sequence the starting of the blocks: instead of using [block~], use [switch~]. If the master's blocksize is 64, set up a counter that sends a bang every iteration; trigger the first switch~ at 4096, next iteration trigger the second, do a mod 64, and repeat the process.

I had no idea that convolution and fft vocoder were that much different. I will have to look up convolution now; my only experience with it was in Cool Edit Pro. After hearing Aphex Twin's Bucephalus Bouncing Ball I recorded a ball bouncing, extracted the timing in Cakewalk, made some drum sounds trigger at the ball bounce, imported to Cool Edit, then convolved with different sounds like female voice. Then there was Sonic Foundry's Acoustic Mirror, which sounded really muddy.

Actually my use of the fft vocoder could almost be done with an envelope follower.
Re: [PD] Making a Realtime Convolution External
> Just scanned the source... big difference would be performance, and if
> you're picky (you have to be pretty picky, honestly), some difference
> in accuracy due to floating point's reduced precision at large/small
> values. Convolution is still expensive enough for performance to really
> matter.
>
> the biggies:
> - partconv implements a single fixed block size, but freq domain
>   convolution is faster by far on bigger blocks (peak on a core duo is
>   near 4k sample blocks). implementing growing block sizes makes a big
>   difference to low latency performance (e.g. 64 64 128 128 256 256 512
>   512 1024 1024 2048 2048 4096 4096), as you can get low latency while
>   most of your convolutions operate on the ideal high-performance block
>   size.

I was putting one of these together in Pd vanilla with dynamic patching as an exercise a few years back, but there were some problems I had. I think you can just do a simple 64 128 256 512 etc. and let the block delay take care of the timing automatically, but I actually found the kind you posted here to work a little better. Another one that worked even better was something like 64 32 32 64 64 128 128 256 256 etc., which seemed to front-load some of the calculation a little (and with this one and the one you posted, if Pd's block size were 1, you could do the first block as a direct convolution for extreme low latency).

Anyway, this brings up a problem I've been wondering about with Pd. If you have lots of reblocking going on -- say, one patch blocked at 64, another at 128, and others at 256, 512, 1024, 2048 and 4096 -- I have been assuming that at the end of the 4096 block all 7 patches will have just finished a block cycle, and there will therefore be a CPU spike relative to other places between the beginning and end of the 4096 block as the calculation for all 7 is done. Is there a way in Pd to offset larger blocks by a given number of samples so that the calculation for that block happens at a different time?

It's easy enough to delay the samples -- that's not what I want. I want to delay the calculation as well, so that you could deliberately stagger the blocks and more evenly distribute the calculation in cpu-intensive situations. I'm imagining something like two 4096 blocks running, say, 64 samples apart so that one does its calculation while the other is still collecting samples.

Matt

> - vectorization (sse/altivec) of partconv would give a 2-3.5x
>   performance boost
>
> -seth
Re: [PD] Making a Realtime Convolution External
On Tue, Apr 5, 2011 at 2:33 PM, Seth Nickell wrote:
> Hi Mathieu,
>
> Thanks, I assumed (without checking :-P) that the dsp call happened
> every time, didn't realize it was a setup/patching call that registers
> my "_perform" function with a call graph. Exactly what I need.
>
> I think the difference in approach comes from the needs of the
> external. fiddle~ probably needs much larger blocks than typical to
> discriminate between low frequencies. In my case, I can run at 64
> sample sizes, but I'll take your whole CPU to do it. It might be smart
> to default to some internal buffering (say 512), and let people order
> the external to do really really low latency if they need it and are
> willing to pay in CPU.

Here's where your users' choice of block sizes comes in--if your user puts a partitioned convolution external into a canvas with block size 64, it means to be low-latency. If the user puts it in with [block~ 1024], then the buffering is defined.

Pd means to be ~user~programmable and modular. The more you try to monolith your externals, the worse they work (I've done this). I know I'm not expressing it well, but I hope the point comes through.

> That said, Peter reminded me of an optimization that I hadn't
> implemented yet. AudioUnits are rarely asked to run below 128 sample
> block sizes, so it didn't make sense for the AU, and I forgot that it
> was on the TODO list from 2 years ago. ;-) By convolving very small
> blocks in the time domain, and switching to frequency domain for larger
> blocks, I think we can get excellent CPU usage at very small block
> sizes too.

It sounds like you'd have a bit of a problem without first profiling the system or having known profiles for different hardware. Can you tell me more about your partitioning method (just the math)?

> -Seth
>
> On Tue, Apr 5, 2011 at 8:49 AM, Mathieu Bouchard wrote:
>> On Mon, 4 Apr 2011, Seth Nickell wrote:
>>
>>> Are the DSP calls liable to vary t_signal->s_n (block size) without
>>> notification? 64 samples, apparently the default on pd-extended, is
>>> doable without buffering for partitioned convolution on a modern
>>> computer, but it exacts a pretty high CPU toll, and if I have to
>>> handle random blocksize changes, it gets more expensive.
>>>
>>> Also, since convolution is much more efficient around block sizes of
>>> 256 or 512, perhaps I should default to one of these, buffer a
>>> little, and have a "runatpdblocksize" message or somesuch?
>>
>> There's always a notification. Any change of s_n will result in a new
>> call to the dsp-function.
>>
>> Note that it's best to make sure that the dsp-function is fairly fast
>> most of the times, because any patching may retrigger the dsp-function
>> in order to recompile the graph.
>>
>> dsp objects working with some kind of blocks don't have to be using
>> s_n as a setting. I mean that you can accumulate several dsp-blocks in
>> order to make your own kind of bigger block. This is what [fiddle~]
>> and [env~] do, for example.
>>
>> But some other object classes use s_n as a setting. For example,
>> [fft~] does. I don't know why this is not consistent across all of pd.
>> (I'm not saying either approach is better than the other.)
Re: [PD] Making a Realtime Convolution External
Hi Mathieu, Thanks, I assumed (without checking :-P) that the dsp call happened every time, didn't realize it was a setup/patching call that registers my "_perform" function with a call graph. Exactly what I need. I think the difference in approach comes from the needs of the external. fiddle~ probably needs much larger blocks than typical to discriminate between low frequencies. In my case, I can run at 64 sample sizes, but I'll take your whole CPU to do it. It might be smart to default to some internal buffering (say 512), and let people order the external to do really really low latency if they need it and are willing to pay in CPU. That said, Peter reminded me of an optimization that I hadn't implemented yet. AudioUnits are rarely asked to run below 128 sample block sizes, so it didn't make sense for the AU, and I forgot that it was on the TODO list from 2 years ago. ;-) By convolving very small blocks in the time domain, and switching to frequency domain for larger blocks, I think we can get excellent CPU usage at very small block sizes too. -Seth On Tue, Apr 5, 2011 at 8:49 AM, Mathieu Bouchard wrote: > On Mon, 4 Apr 2011, Seth Nickell wrote: > >> Are the DSP calls liable to vary t_signal->s_n (block size) without >> notification? 64 samples, apparently the default on pd-extended, is >> doable without buffering for partitioned convolution on a modern >> computer, but it exacts a pretty high CPU toll, and if I have to >> handle random blocksize changes, it gets more expensive. >> >> Also, since convolution is much more efficient around block sizes of 256 >> or 512, perhaps I should default to one of these, buffer a little, and have >> a "runatpdblocksize" message or somesuch? > > There's always a notification. Any change of s_n will result in a new call > to the dsp-function. > > Note that it's best to make sure that the dsp-function is fairly fast most > of the times, because any patching may retrigger the dsp-function in order > to recompile the graph. 
> > dsp objects working with some kind of blocks don't have to be using s_n as a > setting. I mean that you can accumulate several dsp-blocks in order to make > your own kind of bigger block. This is what [fiddle~] and [env~] do, for > example. > > But some other object classes use s_n as a setting. For example, [fft~] > does. I don't know why this is not consistent across all of pd. (I'm not > saying either approach is better than the other.) > > ___ > | Mathieu Bouchard tél: +1.514.383.3801 Villeray, Montréal, QC
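Seth's point about convolving very small blocks in the time domain can be made concrete. For a tiny partition, a direct FIR loop avoids the fixed cost of a forward FFT, spectrum multiply, and inverse FFT; past a few hundred taps the frequency-domain path wins. A minimal sketch (names are illustrative, not taken from Seth's engine):

```c
#include <stddef.h>

/* Direct (time-domain) convolution of one input block with a short
 * impulse-response partition.  Cost is O(nblock * ntaps), which beats
 * the FFT round-trip only when ntaps is small.
 * out must hold nblock + ntaps - 1 samples and is overwritten. */
static void direct_convolve(const float *in, size_t nblock,
                            const float *ir, size_t ntaps,
                            float *out)
{
    for (size_t i = 0; i < nblock + ntaps - 1; i++)
        out[i] = 0.0f;
    for (size_t i = 0; i < nblock; i++)
        for (size_t j = 0; j < ntaps; j++)
            out[i + j] += in[i] * ir[j];
}
```

For example, convolving the block {1, 0, 0, 0} with the two-tap response {0.5, 0.25} gives {0.5, 0.25, 0, 0, 0}; the tail past the block boundary is what a partitioned scheme overlap-adds into the next block.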
Re: [PD] Making a Realtime Convolution External
On Tue, Apr 5, 2011 at 8:38 AM, Mathieu Bouchard wrote: > On Mon, 4 Apr 2011, Seth Nickell wrote: > >> 5) I'd love to build a granular convolution engine that takes two real-time >> signals, and extracts grains from one to convolve against the other. Anyone >> have ideas about this? > > What's the fundamental difference between this and a windowed FFT > convolution engine ? The big difference would be stochastic grain selection (with inputs/control over the selection tendencies), but it'd definitely start as a straight-up windowed FFT convolution engine. E.g. one parameter that could make for interesting selections is to hunt for decaying peaks, and favor using those to get a "crisper output" instead of the haze that results from windowed FFT convolution. -seth
Re: [PD] Making a Realtime Convolution External
Hi Jamie, Just scanned the source... the big difference would be performance, and if you're picky (you have to be pretty picky, honestly), some difference in accuracy due to floating point's reduced precision at large/small values. Convolution is still expensive enough for performance to really matter. The biggies: - partconv implements a single fixed block size, but frequency-domain convolution is faster by far on bigger blocks (the peak on a Core Duo is near 4k-sample blocks). Implementing growing block sizes makes a big difference to low-latency performance (e.g. 64 64 128 128 256 256 512 512 1024 1024 2048 2048 4096 4096), as you can get low latency while most of your convolutions operate on the ideal high-performance block size. - vectorization (SSE/AltiVec) of partconv would give a 2-3.5x performance boost -seth On Tue, Apr 5, 2011 at 8:26 AM, Jamie Bullock wrote: > > Hi Seth, > > > On 5 Apr 2011, at 01:54, Seth Nickell wrote: > >> I'm planning to release our realtime convolution engine (extracted >> from http://meatscience.net/pages/convolution-reverb) as a GPLed Pd >> external. >> > > What is the advantage of this over Ben Saylor's [partconv~] external, which > provides partitioned convolution? > > Jamie > >
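Seth's doubled-pairs layout (64 64 128 128 256 256 ...) is easy to sketch: each partition size appears twice before doubling, so the early, cheap partitions cover the head of the impulse response while the large, efficient ones handle the tail. A hypothetical generator (not his actual code) might look like:

```c
#include <stddef.h>

/* Build a doubled-pairs partition schedule: each size occurs twice,
 * then doubles, until the impulse response is covered.  Returns the
 * number of partitions written into sizes[].  Illustrative sketch. */
static size_t make_schedule(size_t first, size_t ir_len,
                            size_t *sizes, size_t max_parts)
{
    size_t n = 0, covered = 0, cur = first;
    while (covered < ir_len && n < max_parts) {
        sizes[n++] = cur;
        covered += cur;
        /* double only after a size has appeared twice */
        if (n >= 2 && sizes[n - 1] == sizes[n - 2])
            cur *= 2;
    }
    return n;
}
```

With a first partition of 64 and a 384-sample impulse response this yields 64, 64, 128, 128; only the first partition's size sets the latency, which is the tradeoff the thread keeps circling around.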
Re: [PD] Making a Realtime Convolution External
On Mon, 4 Apr 2011, Seth Nickell wrote: Are the DSP calls liable to vary t_signal->s_n (block size) without notification? 64 samples, apparently the default on pd-extended, is doable without buffering for partitioned convolution on a modern computer, but it exacts a pretty high CPU toll, and if I have to handle random blocksize changes, it gets more expensive. Also, since convolution is much more efficient around block sizes of 256 or 512, perhaps I should default to one of these, buffer a little, and have a "runatpdblocksize" message or somesuch? There's always a notification. Any change of s_n will result in a new call to the dsp-function. Note that it's best to make sure that the dsp-function is fairly fast most of the time, because any patching may retrigger the dsp-function in order to recompile the graph. DSP objects working with some kind of blocks don't have to be using s_n as a setting. I mean that you can accumulate several dsp-blocks in order to make your own kind of bigger block. This is what [fiddle~] and [env~] do, for example. But some other object classes use s_n as a setting. For example, [fft~] does. I don't know why this is not consistent across all of pd. (I'm not saying either approach is better than the other.) ___ | Mathieu Bouchard tél: +1.514.383.3801 Villeray, Montréal, QC
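Mathieu's "accumulate several dsp-blocks" approach (what [fiddle~] and [env~] do) amounts to copying each s_n-sized block into a larger internal buffer and running the expensive processing only when it fills. A minimal sketch, with hypothetical names; a real external would keep this state in its object struct and call it from its perform routine:

```c
#include <stddef.h>
#include <string.h>

/* Accumulate s_n-sized dsp blocks into one larger internal block.
 * Returns 1 on the call that completes the big block (and resets
 * the write position), 0 otherwise. */
typedef struct {
    float  buf[1024];   /* internal block, larger than s_n */
    size_t bigsize;     /* e.g. 256 or 1024 while s_n might be 64 */
    size_t pos;
} t_accum;

static int accum_block(t_accum *a, const float *in, size_t s_n)
{
    memcpy(a->buf + a->pos, in, s_n * sizeof(float));
    a->pos += s_n;
    if (a->pos >= a->bigsize) {
        a->pos = 0;     /* big block complete: process a->buf here */
        return 1;
    }
    return 0;
}
```

With a 256-sample internal block and s_n = 64, every fourth call reports a complete block; the object's output then lags its input by the difference, which is the internal buffering Seth was weighing against CPU cost.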
Re: [PD] Making a Realtime Convolution External
On Mon, 4 Apr 2011, Peter Plessas wrote: This would be of interest for all Pd users, no matter whether they like their externals included in a distribution of Pd ('extended') or prefer manually adding them to their vanilla Pd. But pd-extended is not merely a bundling of externals. For example, the [initbang] internal class is not in vanilla and is not possible as an external. There are also differences in the rendering of boxes and fonts, if you've seen any screenshots. ___ | Mathieu Bouchard tél: +1.514.383.3801 Villeray, Montréal, QC
Re: [PD] Making a Realtime Convolution External
On Mon, 4 Apr 2011, Seth Nickell wrote: 5) I'd love to build a granular convolution engine that takes two real-time signals, and extracts grains from one to convolve against the other. Anyone have ideas about this? What's the fundamental difference between this and a windowed FFT convolution engine ? ___ | Mathieu Bouchard tél: +1.514.383.3801 Villeray, Montréal, QC
Re: [PD] Making a Realtime Convolution External
Hi Seth, On 5 Apr 2011, at 01:54, Seth Nickell wrote: > I'm planning to release our realtime convolution engine (extracted > from http://meatscience.net/pages/convolution-reverb) as a GPLed Pd > external. > What is the advantage of this over Ben Saylor's [partconv~] external, which provides partitioned convolution? Jamie
Re: [PD] Making a Realtime Convolution External
On Tue, 5 Apr 2011, Billy Stiltner wrote: I remember there were lots of tricks that could be done with graphics and integer math as well as binary bit twiddling before math coprocessors were in every machine. Look at Fractint. Example of circle code seems like I optimized this further but can't remember. Yeah, I remember the Bresenham techniques, but the whole concept of the convolution theorem and the FFT is a lot deeper than that... it's a really deep optimisation. BTW, the only Bresenham I'm aware of in Pd is [#draw_polygon]. I used it for making stuff like this : http://gridflow.ca/gallery/koch_polygon_3a.png http://gridflow.ca/gallery/koch_polygon_2d.png http://gridflow.ca/gallery/bezier.png http://gridflow.ca/gallery/supercycloid.mov etc. see the rest of the koch series on http://gridflow.ca/gallery ___ | Mathieu Bouchard tél: +1.514.383.3801 Villeray, Montréal, QC
Re: [PD] Making a Realtime Convolution External
I just looked, and with a 512-sample buffer Reason defaults to 15ms output latency at 44.1k and 48k. At 96k it is 9ms output latency.
Re: [PD] Making a Realtime Convolution External
From a user's standpoint, here's my 2 cents. I have 2 tunes that use Reason's FFT vocoder. In FFT mode it has 32 frequency bands. The carrier input on these tunes is water recordings at 96k, and the modulator is drum beats whose samples were recorded at 44.1k. I usually work in 48kHz. So being able to work with different sample rates and have the audio play at the sample rate of the file would be great. http://sites.google.com/site/chuxlingchaxrazauralarielz/bp1.mp3 http://sites.google.com/site/chuxlingchaxrazauralarielz/bp2.mp3 Performance of Reason and latency were good enough to record live MIDI input while vocoding on a 2.2GHz AMD with 1GB of RAM. I'm not sure of the latency at the time of those recordings, but I was using Pd along with Reason. Pd was used to retune MIDI notes with pitchbend. I don't know how they do it, but I would sure like to know. Fast convolution is a feature I would love to be able to use in Pd. I remember there were lots of tricks that could be done with graphics and integer math as well as binary bit twiddling before math coprocessors were in every machine. Look at Fractint. Example of circle code; seems like I optimized this further but can't remember.
void bcircle(int x0, int y0, int radius, int c)
{
    int x, y;
    long a, asquared, twoasquared;
    long b, bsquared, twobsquared;
    long d, dx, dy;
    int Aspecty, Aspectx;

    getaspectratio(&Aspectx, &Aspecty);
    x = 0;
    y = radius;
    /* a = radius*Aspecty/Aspectx; */
    a = radius*1.;
    asquared = a*a;
    twoasquared = 2*asquared;
    b = radius;
    bsquared = b*b;
    twobsquared = 2*bsquared;
    d = bsquared - asquared*b + asquared/4L;
    dx = 0;
    dy = twoasquared*b;

    while (dx < dy) {
        if (d > 0) {
            y = y - 1;
            dy = dy - twoasquared;
            d = d - dy;
            putpixel(x0+x, y0+y, c);
            putpixel(x0-x, y0+y, c);
            putpixel(x0+x, y0-y, c);
            putpixel(x0-x, y0-y, c);
            x = x + 1;
            dx = dx + twobsquared;
            d = d + bsquared + dx;
        } else {
            x = x + 1;
            dx = dx + twobsquared;
            d = d + bsquared + dx;
            putpixel(x0+x, y0+y, c);
            putpixel(x0-x, y0+y, c);
            putpixel(x0+x, y0-y, c);
            putpixel(x0-x, y0-y, c);
        }
    }

    d = d + (3L*(asquared - bsquared)/2L - (dx + dy))/2L;
    while (y > 0) {
        if (d < 0) {
            x = x + 1;
            dx = dx + twobsquared;
            d = d + dx;
        }
        y = y - 1;
        putpixel(x0+x, y0+y, c);
        putpixel(x0-x, y0+y, c);
        putpixel(x0+x, y0-y, c);
        putpixel(x0-x, y0-y, c);
        dy = dy - twoasquared;
        d = d + asquared - dy;
    }
}

Could something like this be done with audio to speed up operations?
Re: [PD] Making a Realtime Convolution External
>> Also, since convolution is much more efficient around block sizes of >> 256 or 512, perhaps I should default to one of these, buffer a little, >> and have a "runatpdblocksize" message or somesuch? > > I still have not understood if/how the user can set the duration of the > first partition of your partitioned convolution, and how these partitions are > structured in their (possibly increasing) sizes. Since this first parameter > will define the latency-vs-CPU tradeoff it should not be preset by the > developers. I guess this is what I was asking. I support a few "block pattern" partitioning schemes (they're pluggable, it's very easy to add a new one). I could export the choice of these to the end-user, including the option of what block size to start with, the minimum block size of course being Pd's current block size. My guess is that, in the wild, most "pd users" are using Pd-extended, which ships with a 20msec default delay (dunno if this is inherited from vanilla, or overridden by the distro, but either way, same effect: most pd installs probably run at 20msec). I'm all for allowing configuration of these important parameters, but I want the external to do something sane out of the box. My guess is 64-sample blocks are more abusive CPU-wise than most people expect out-of-the-box, so I'm probably going to default to a partitioning that looks like: 256, 512, 1024, 2048, 4096, 4096, ..., 4096 And allow people to set a different partitioning scheme, including reducing the initial partition size, if they want. That make good sense? -Seth > > P. > > PS: Pd and Pd-extended use the same core audio engine. You might want to > consider Pd-extended as vanilla Pd with a folder full of precompiled > externals. > >> >> On Mon, Apr 4, 2011 at 7:48 PM, Seth Nickell wrote: > > 2) Anyone have requests for features/api? 
Its currently simplistic: > - takes a "read FILENAME" message, loads the file, does a test > convolution against pink noise to normalize the gain to something sane Is this done within the main Pd audio thread? >>> >>> The convolution engine has support for doing it either on the calling >>> thread, or a background thread. I'm thinking of default to a >>> background thread. That seem like the right move? >>> > - caches the last N impulse responses, as the test convolution > takes a little time > - allows setting the cache size with a "cachesize N" message To make sure I understood this: cachesize is not the size of the first partition of the partitioned convolution, but the cache that tries to avoid audio dropouts when performing the test convolution? >>> >>> The convolution engine can swap-in a pre-loaded ('cached') IR in >>> realtime without glitching... but it means keeping 2x the Impulse >>> Response data in RAM. To keep the default API simple but useful, I'm >>> defaulting to caching only the last 5 impulse responses in RAM. >>> "cachesize N" lets you increase that number lets say in a >>> performance you wanted to use 30 different impulse responses and you >>> have 2GB of ram... should be nbd. >>> > - disable normalization with "normalize 0" or "normalize 1" Yes, disabling this could be a good idea! You could also add a "gain 0-1" message for manual control. >>> >>> Its worth noting that impulse responses are usually whack without gain >>> normalization like factors of hundreds to millions off a usable >>> signal. >>> > Features I'm considering (let me know if they sound useful): > - load from an array instead of from disk (no gain normalization?) Very good. > > - It wouldn't be hard to enable MxN convolution if that floats > somebody's boat. I am sure if you come up with a convolution as efficient and flexible as jconv by Fons within Pd, then soon a multichannel use and hence request will come up fast. 
>>> >>> I'd be interested in what flexibility means in this context, it might >>> give me some good ideas for features to add. Efficiency-wise, last >>> time I benchmarked its more efficient than jconv, but the difference >>> is offset by less graceful degradation under CPU load (I convolve in >>> background threads to preserve realtime in the main thread while >>> avoiding an irritating patent that's going to expire soon...). >>> >>> WRT to Pd's audio scheduling... are Pd signal externals held to >>> realtime or can my dsp call vary the number of cycles it takes by 100% >>> from call to call? VST seems to do ok with this, but AudioUnits get >>> scheduled to run at the very last instant they possibly could. If Pd >>> can have some variance, I can drop the threads and improve the >>> external's degradation under high CPU load. >>> >>> thanks for the feedback (also, is the best list for this kind of >>> feedback?), >>> >>> -Seth >>> > ___ Pd-list@iem.at mailing list UNSUBSCRIBE and account
Re: [PD] Making a Realtime Convolution External
Dear Seth, Seth Nickell wrote: Another question on similar lines... Are the DSP calls liable to vary t_signal->s_n (block size) without notification? 64 samples, apparently the default on pd-extended, is doable without buffering for partitioned convolution on a modern computer, but it exacts a pretty high CPU toll, and if I have to handle random blocksize changes, it gets more expensive. They cannot vary by themselves, but what is usually done (e.g. with FFTs) is to place a signal (tilde ~) object in a subpatch and change the blocksize for that subpatch using the [switch~] or [block~] objects. You might consider using this very approach. Also, since convolution is much more efficient around block sizes of 256 or 512, perhaps I should default to one of these, buffer a little, and have a "runatpdblocksize" message or somesuch? I still have not understood if/how the user can set the duration of the first partition of your partitioned convolution, and how these partitions are structured in their (possibly increasing) sizes. Since this first parameter will define the latency-vs-CPU tradeoff it should not be preset by the developers. P. PS: Pd and Pd-extended use the same core audio engine. You might want to consider Pd-extended as vanilla Pd with a folder full of precompiled externals. On Mon, Apr 4, 2011 at 7:48 PM, Seth Nickell wrote: 2) Anyone have requests for features/api? It's currently simplistic: - takes a "read FILENAME" message, loads the file, does a test convolution against pink noise to normalize the gain to something sane Is this done within the main Pd audio thread? The convolution engine has support for doing it either on the calling thread, or a background thread. I'm thinking of defaulting to a background thread. That seem like the right move? 
- caches the last N impulse responses, as the test convolution takes a little time - allows setting the cache size with a "cachesize N" message To make sure I understood this: cachesize is not the size of the first partition of the partitioned convolution, but the cache that tries to avoid audio dropouts when performing the test convolution? The convolution engine can swap-in a pre-loaded ('cached') IR in realtime without glitching... but it means keeping 2x the Impulse Response data in RAM. To keep the default API simple but useful, I'm defaulting to caching only the last 5 impulse responses in RAM. "cachesize N" lets you increase that number lets say in a performance you wanted to use 30 different impulse responses and you have 2GB of ram... should be nbd. - disable normalization with "normalize 0" or "normalize 1" Yes, disabling this could be a good idea! You could also add a "gain 0-1" message for manual control. Its worth noting that impulse responses are usually whack without gain normalization like factors of hundreds to millions off a usable signal. Features I'm considering (let me know if they sound useful): - load from an array instead of from disk (no gain normalization?) Very good. - It wouldn't be hard to enable MxN convolution if that floats somebody's boat. I am sure if you come up with a convolution as efficient and flexible as jconv by Fons within Pd, then soon a multichannel use and hence request will come up fast. I'd be interested in what flexibility means in this context, it might give me some good ideas for features to add. Efficiency-wise, last time I benchmarked its more efficient than jconv, but the difference is offset by less graceful degradation under CPU load (I convolve in background threads to preserve realtime in the main thread while avoiding an irritating patent that's going to expire soon...). WRT to Pd's audio scheduling... 
are Pd signal externals held to realtime or can my dsp call vary the number of cycles it takes by 100% from call to call? VST seems to do ok with this, but AudioUnits get scheduled to run at the very last instant they possibly could. If Pd can have some variance, I can drop the threads and improve the external's degradation under high CPU load. thanks for the feedback (also, is the best list for this kind of feedback?), -Seth ___ Pd-list@iem.at mailing list UNSUBSCRIBE and account-management -> http://lists.puredata.info/listinfo/pd-list
Re: [PD] Making a Realtime Convolution External
Another question on similar lines... Are the DSP calls liable to vary t_signal->s_n (block size) without notification? 64 samples, apparently the default on pd-extended, is doable without buffering for partitioned convolution on a modern computer, but it exacts a pretty high CPU toll, and if I have to handle random blocksize changes, it gets more expensive. Also, since convolution is much more efficient around block sizes of 256 or 512, perhaps I should default to one of these, buffer a little, and have a "runatpdblocksize" message or somesuch? On Mon, Apr 4, 2011 at 7:48 PM, Seth Nickell wrote: >>> 2) Anyone have requests for features/api? Its currently simplistic: >>> - takes a "read FILENAME" message, loads the file, does a test >>> convolution against pink noise to normalize the gain to something sane >> >> Is this done within the main Pd audio thread? > > The convolution engine has support for doing it either on the calling > thread, or a background thread. I'm thinking of default to a > background thread. That seem like the right move? > >>> >>> - caches the last N impulse responses, as the test convolution >>> takes a little time >>> - allows setting the cache size with a "cachesize N" message >> >> To make sure I understood this: cachesize is not the size of the first >> partition of the partitioned convolution, but the cache that tries to avoid >> audio dropouts when performing the test convolution? > > The convolution engine can swap-in a pre-loaded ('cached') IR in > realtime without glitching... but it means keeping 2x the Impulse > Response data in RAM. To keep the default API simple but useful, I'm > defaulting to caching only the last 5 impulse responses in RAM. > "cachesize N" lets you increase that number lets say in a > performance you wanted to use 30 different impulse responses and you > have 2GB of ram... should be nbd. > >>> >>> - disable normalization with "normalize 0" or "normalize 1" >> >> Yes, disabling this could be a good idea! 
You could also add a "gain 0-1" >> message for manual control. > > Its worth noting that impulse responses are usually whack without gain > normalization like factors of hundreds to millions off a usable > signal. > >>> Features I'm considering (let me know if they sound useful): >>> - load from an array instead of from disk (no gain normalization?) >> >> Very good. >>> >>> - It wouldn't be hard to enable MxN convolution if that floats >>> somebody's boat. >> >> I am sure if you come up with a convolution as efficient and flexible as >> jconv by Fons within Pd, then soon a multichannel use and hence request will >> come up fast. > > I'd be interested in what flexibility means in this context, it might > give me some good ideas for features to add. Efficiency-wise, last > time I benchmarked its more efficient than jconv, but the difference > is offset by less graceful degradation under CPU load (I convolve in > background threads to preserve realtime in the main thread while > avoiding an irritating patent that's going to expire soon...). > > WRT to Pd's audio scheduling... are Pd signal externals held to > realtime or can my dsp call vary the number of cycles it takes by 100% > from call to call? VST seems to do ok with this, but AudioUnits get > scheduled to run at the very last instant they possibly could. If Pd > can have some variance, I can drop the threads and improve the > external's degradation under high CPU load. > > thanks for the feedback (also, is the best list for this kind of feedback?), > > -Seth > ___ Pd-list@iem.at mailing list UNSUBSCRIBE and account-management -> http://lists.puredata.info/listinfo/pd-list
Re: [PD] Making a Realtime Convolution External
>> 2) Anyone have requests for features/api? It's currently simplistic: >> - takes a "read FILENAME" message, loads the file, does a test >> convolution against pink noise to normalize the gain to something sane > > Is this done within the main Pd audio thread? The convolution engine has support for doing it either on the calling thread, or a background thread. I'm thinking of defaulting to a background thread. That seem like the right move? >> >> - caches the last N impulse responses, as the test convolution >> takes a little time >> - allows setting the cache size with a "cachesize N" message > > To make sure I understood this: cachesize is not the size of the first > partition of the partitioned convolution, but the cache that tries to avoid > audio dropouts when performing the test convolution? The convolution engine can swap in a pre-loaded ('cached') IR in realtime without glitching... but it means keeping 2x the impulse response data in RAM. To keep the default API simple but useful, I'm defaulting to caching only the last 5 impulse responses in RAM. "cachesize N" lets you increase that number: let's say in a performance you wanted to use 30 different impulse responses and you have 2GB of RAM... should be nbd. >> >> - disable normalization with "normalize 0" or "normalize 1" > > Yes, disabling this could be a good idea! You could also add a "gain 0-1" > message for manual control. It's worth noting that impulse responses are usually whack without gain normalization: factors of hundreds to millions off a usable signal. >> Features I'm considering (let me know if they sound useful): >> - load from an array instead of from disk (no gain normalization?) > > Very good. >> >> - It wouldn't be hard to enable MxN convolution if that floats >> somebody's boat. > > I am sure if you come up with a convolution as efficient and flexible as > jconv by Fons within Pd, then soon a multichannel use and hence request will > come up fast. 
I'd be interested in what flexibility means in this context; it might give me some good ideas for features to add. Efficiency-wise, last time I benchmarked it's more efficient than jconv, but the difference is offset by less graceful degradation under CPU load (I convolve in background threads to preserve realtime in the main thread while avoiding an irritating patent that's going to expire soon...). WRT Pd's audio scheduling... are Pd signal externals held to realtime, or can my dsp call vary the number of cycles it takes by 100% from call to call? VST seems to do ok with this, but AudioUnits get scheduled to run at the very last instant they possibly could. If Pd can have some variance, I can drop the threads and improve the external's degradation under high CPU load. Thanks for the feedback (also, is this the best list for this kind of feedback?), -Seth
Re: [PD] Making a Realtime Convolution External
Seth Nickell wrote: I'm planning to release our realtime convolution engine (extracted from http://meatscience.net/pages/convolution-reverb) as a GPLed Pd external. This is a good idea! It currently accepts 4-channel ('true stereo'), two channel or mono impulse responses, with stereo or mono output. Performance is What is 'true stereo' with four channels by the way? excellent if you have SSE3 and has a fallback in case you don't, and it aims for accuracy (basically that means multi-stage scaling to keep floats within healthy sizes). 1) I'd love to swipe the convolve~ external name, currently installed by mjlib as part of pd-extended. convolve~ from mjlib appears to be a copy of pin~ ? so I think it could be taken? Maybe I mis-read the code. I've cc'ed mark who can probably clarify. 2) Anyone have requests for features/api? Its currently simplistic: - takes a "read FILENAME" message, loads the file, does a test convolution against pink noise to normalize the gain to something sane Is this done within the main Pd audio thread? - caches the last N impulse responses, as the test convolution takes a little time - allows setting the cache size with a "cachesize N" message To make sure I understood this: cachesize is not the size of the first partition of the partitioned convolution, but the cache that tries to avoid audio dropouts when performing the test convolution? - disable normalization with "normalize 0" or "normalize 1" Yes, disabling this could be a good idea! You could also add a "gain 0-1" message for manual control. Features I'm considering (let me know if they sound useful): - load from an array instead of from disk (no gain normalization?) Very good. - It wouldn't be hard to enable MxN convolution if that floats somebody's boat. I am sure if you come up with a convolution as efficient and flexible as jconv by Fons within Pd, then soon a multichannel use and hence request will come up fast. [...] 4) Would this be of interest for Pd-extended? 
This would be of interest for all Pd users, no matter whether they like their externals included in a distribution of Pd ('extended') or prefer manually adding them to their vanilla Pd. best, P
[PD] Making a Realtime Convolution External
I'm planning to release our realtime convolution engine (extracted from http://meatscience.net/pages/convolution-reverb) as a GPLed Pd external. It currently accepts 4-channel ('true stereo'), two-channel or mono impulse responses, with stereo or mono output. Performance is excellent if you have SSE3, there's a fallback in case you don't, and it aims for accuracy (basically that means multi-stage scaling to keep floats within healthy sizes). 1) I'd love to swipe the convolve~ external name, currently installed by mjlib as part of pd-extended. convolve~ from mjlib appears to be a copy of pin~ ? so I think it could be taken? Maybe I misread the code. I've cc'ed Mark, who can probably clarify. 2) Anyone have requests for features/api? It's currently simplistic: - takes a "read FILENAME" message, loads the file, does a test convolution against pink noise to normalize the gain to something sane - caches the last N impulse responses, as the test convolution takes a little time - allows setting the cache size with a "cachesize N" message - disables normalization with "normalize 0" or "normalize 1" Features I'm considering (let me know if they sound useful): - load from an array instead of from disk (no gain normalization?) - It wouldn't be hard to enable MxN convolution if that floats somebody's boat. 3) I can compile/test on Mac & Linux; anyone up for helping me with Windows? 4) Would this be of interest for Pd-extended? 5) I'd love to build a granular convolution engine that takes two real-time signals, and extracts grains from one to convolve against the other. Anyone have ideas about this? thanks all, -Seth