Re: [PD] how to iterate over left and right channel separately in one Pd class?
Yeah, that makes sense. With all the auto-vectorization and SIMD support in recent versions of gcc, it seems a better approach is to tailor the C code to work well with SIMD-aware compilers.

.hc

On 01/12/2013 04:45 PM, katja wrote:
> It's interesting, but rather compiler-and-processor-specific. Such
> code is maintenance-intensive. At the moment, ARM processors are
> screaming loudest for optimization. Best thing for a community project
> is probably plain C code which reckons with parallel processing,
> because that won't go away for the next few decades. Functions like
> copy_perform8(), times_perform8() etc. can profit from SIMD
> instructions without a need for compiler intrinsics and asm code.
> Well-structured data storage and access can make a 50 % or more
> performance gain, in my experience.
>
> Another important thing: avoid float precision conversions. Throughout
> Pd there are many untyped float defines and literal constants which
> default to double, and I have introduced more when making libs
> double-ready. Not good. I'll come back to this in another thread.
>
> Katja
>
> On Sat, Jan 12, 2013 at 8:14 PM, Hans-Christoph Steiner wrote:
>> If you are interested, there is still the hand-coded SIMD stuff from
>> pd-devel:
>> https://pure-data.svn.sourceforge.net/svnroot/pure-data/branches/pd-devel/v0-39
>>
>> .hc
>>
>> On 01/12/2013 09:34 AM, katja wrote:
>>> Function copy_perform8() is also eligible for SIMD processing. I used
>>> memcpy() because it is straightforward to use, while Pd's functions
>>> pointed to the wrong locations for this case. On the reverb's total
>>> load there is no significant performance difference.
>>>
>>> Katja
>>>
>>> On Sat, Jan 12, 2013 at 1:00 AM, Hans-Christoph Steiner wrote:
>>>> I recently learned that libc's memcpy actually uses things like SSE2
>>>> or SSSE3, so it can be quite fast on CPUs from the past 10 years,
>>>> especially of the last 5 years.
>>>>
>>>> It would be worth profiling to see if that's noticeable.
>>>>
>>>> .hc
>>>>
>>>> On 01/11/2013 05:12 PM, katja wrote:
>>>>> Ok so I did the ugly thing with the right channel input and output
>>>>> pointers:
>>>>>
>>>>> memcpy(outR, inR, vectorsize * sizeof(t_float));
>>>>> inR = outR;
>>>>>
>>>>> Works like a charm, thanks again.
>>>>>
>>>>> Katja
>>>>>
>>>>> On Fri, Jan 11, 2013 at 10:05 PM, Miller Puckette wrote:
>>>>>> copy_perform assumes the data is 4-byte aligned so might save a test
>>>>>> or two compared to memcpy() - but I really don't know. I never
>>>>>> benchmarked the two against each other :)
>>>>>>
>>>>>> M
>>>>>>
>>>>>> On Fri, Jan 11, 2013 at 09:36:41PM +0100, katja wrote:
>>>>>>> Hi Miller,
>>>>>>>
>>>>>>> Thanks for the solution. The routines are in place so copying the
>>>>>>> right channel input to output should do it. Is there any reason to
>>>>>>> prefer copy_perform() over memcpy()? I'm trying to make the most
>>>>>>> efficient reverb for RPi & Co.
>>>>>>>
>>>>>>> Katja
>>>>>>>
>>>>>>> On Fri, Jan 11, 2013 at 7:57 PM, Miller Puckette wrote:
>>>>>>>> Hi Katja -
>>>>>>>>
>>>>>>>> There's one example of this in sigfft_dspx() - a complex FFT that
>>>>>>>> 'natively' works on 2 signals in-place but has to deal with various
>>>>>>>> cases in which buffers get re-used. It's ugly but the basic idea is
>>>>>>>> first to get the inputs copied to the outputs (unless they're
>>>>>>>> already there in the correct order in which case nothing needs to
>>>>>>>> be done) and then run the in-place algorithm.
>>>>>>>>
>>>>>>>> If the algo only works out-of-place (i.e. you need 4 distinct
>>>>>>>> buffers, 2 in and 2 out) the only way out is to (at least
>>>>>>>> conditionally) allocate temporary copies of the inputs before
>>>>>>>> writing to any outputs.
>>>>>>>>
>>>>>>>> I may be able to add an optional way tilde objects can request that
>>>>>>>> output buffers be distinct from input ones sometime in the future -
>>>>>>>> but this is a couple of steps away for me right now :)
>>>>>>>>
>>>>>>>> M
>>>>>>>>
>>>>>>>> On Fri, Jan 11, 2013 at 03:32:09PM +0100, katja wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'm working on a Pd class with stereo channels (reverb), and the
>>>>>>>>> routine happens to be most efficient when iterating over the
>>>>>>>>> samples per channel, instead of left and right together in the
>>>>>>>>> perform loop. However, when doing two while loops in one object,
>>>>>>>>> one for left and one for right, the right channel samples get
>>>>>>>>> overwritten because of sample-wise in-place computation. Is this
>>>>>>>>> an inescapable truth? I mean, I could write a left channel class
>>>>>>>>> and a right channel class (actually did that to verify that it
>>>>>>>>> works), but it's inconvenient to use. What could be an efficient
>>>>>>>>> way to get them in one object?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Katja
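Editor's sketch: the approach Miller describes in this thread (get the inputs copied to the outputs unless they are already there, then run the in-place routine per channel) could look roughly like the plain C below. This is an illustration only and is not taken from Pd's sources. The names swap_blocks(), copy_if_distinct(), inputs_to_outputs(), process_in_place() and stereo_perform() are hypothetical, t_float is redefined locally as float so the block compiles on its own, and it is assumed that any two buffers are either the very same block or do not overlap at all.

```c
#include <string.h>

typedef float t_float;  /* local stand-in for Pd's t_float (assumed float) */

/* Exchange the contents of two distinct, non-overlapping blocks. */
static void swap_blocks(t_float *a, t_float *b, int n)
{
    for (int i = 0; i < n; i++) {
        t_float tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

/* Copy n samples unless source and destination are the same block. */
static void copy_if_distinct(t_float *dst, const t_float *src, int n)
{
    if (dst != src)
        memcpy(dst, src, (size_t)n * sizeof(t_float));
}

/* Hypothetical helper: bring each input into its own output buffer,
   whatever way the host has reused the input buffers as outputs. */
static void inputs_to_outputs(t_float *inL, t_float *inR,
                              t_float *outL, t_float *outR, int n)
{
    if (outL == inR && outR == inL) {
        swap_blocks(outL, outR, n);       /* buffers are crossed */
    } else if (outL == inR) {
        copy_if_distinct(outR, inR, n);   /* rescue right input first */
        copy_if_distinct(outL, inL, n);
    } else {
        copy_if_distinct(outL, inL, n);   /* safe: outL is not inR */
        copy_if_distinct(outR, inR, n);
    }
}

/* Hypothetical per-channel routine standing in for one reverb channel. */
static void process_in_place(t_float *buf, int n, t_float gain)
{
    for (int i = 0; i < n; i++)
        buf[i] *= gain;
}

/* One loop per channel, as in the original question. */
static void stereo_perform(t_float *inL, t_float *inR,
                           t_float *outL, t_float *outR, int n)
{
    inputs_to_outputs(inL, inR, outL, outR, n);
    process_in_place(outL, n, 0.5f);
    process_in_place(outR, n, 0.5f);
}
```

Katja's two-line fix quoted in this message handles the case she ran into; the sketch only adds the other buffer-reuse cases Miller mentions. In a real external the copies might instead be scheduled from the dsp method, as Miller indicates sigfft_dspx() does.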
Re: [PD] how to iterate over left and right channel separately in one Pd class?
It's interesting, but rather compiler-and-processor-specific. Such code is maintenance-intensive. At the moment, ARM processors are screaming loudest for optimization. Best thing for a community project is probably plain C code which reckons with parallel processing, because that won't go away for the next few decades. Functions like copy_perform8(), times_perform8() etc. can profit from SIMD instructions without a need for compiler intrinsics and asm code. Well-structured data storage and access can make a 50 % or more performance gain, in my experience.

Another important thing: avoid float precision conversions. Throughout Pd there are many untyped float defines and literal constants which default to double, and I have introduced more when making libs double-ready. Not good. I'll come back to this in another thread.

Katja

___
Pd-list@iem.at mailing list
UNSUBSCRIBE and account-management -> http://lists.puredata.info/listinfo/pd-list
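Editor's sketch: the two points in Katja's message (plain loops that a vectorizing compiler can handle, and constants that silently default to double) can be illustrated with a small, self-contained C fragment. The macro and function names are hypothetical and nothing here is taken from Pd's sources; t_float is defined locally as float for the sake of the example.

```c
typedef float t_float;  /* local stand-in for Pd's t_float (assumed float) */

/* Untyped constant: the literal has type double, so the multiply below is
   carried out in double precision and converted back to float on store. */
#define GAIN_UNTYPED 0.5

/* Typed constant: the 'f' suffix keeps the arithmetic in single precision. */
#define GAIN_TYPED 0.5f

/* Simple counted loop over one block; involves float/double conversions. */
static void scale_block_untyped(t_float *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] = buf[i] * GAIN_UNTYPED;
}

/* Same loop without precision conversions; a plain loop like this is the
   kind of code an auto-vectorizing compiler can map to SIMD instructions. */
static void scale_block_typed(t_float *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] = buf[i] * GAIN_TYPED;
}
```

Both functions give the same result for a gain of 0.5; the difference is only in the conversions the compiler has to emit. With gcc, auto-vectorization is enabled at -O3 (via -ftree-vectorize). For code that must also build with a double t_float, writing the constant as ((t_float)0.5) would be one possible alternative, offered here as a suggestion rather than as something proposed in the thread.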
Re: [PD] Echo detection and autocepstrum
Sounds like you're looking for William Brent's work: http://williambrent.conflations.com/pages/research.html

.hc

On 01/12/2013 06:58 AM, oguz gurler wrote:
> Hi,
>
> I'm working on echo detection and looking for a way to compute the
> autocepstrum with Pd. Is there any detailed information about the
> autocepstrum/cepstrum and its output data, so that I can interpret it
> to find echoes?
Re: [PD] how to iterate over left and right channel separately in one Pd class?
If you are interested, there is still the hand-coded SIMD stuff from pd-devel:
https://pure-data.svn.sourceforge.net/svnroot/pure-data/branches/pd-devel/v0-39

.hc
Re: [PD] Automating startup
Hi Rick,

You can start by reading this page in the FLOSS manual, especially the last paragraphs. Once you get how it works, it's no big deal to start Pd from a command line with all the settings you need.

Cheers,
Pierre.

2013/1/12 Rick Bragg
> Hi,
>
> I would like to set up my system to set up my patch automatically with
> jack and all the right connections when the system boots. I am using
> Ubuntu Studio at the moment.
>
> I currently have it set so that qjackctl starts when I log in, and I set
> qjackctl to automatically start up the jack server when it opens, and
> then to open my pd file after it starts.
>
> I have a few problems to overcome.
>
> First, every time pd starts, I need to go into the "Media" menu and
> change from "Default MIDI" to "Alsa MIDI". Why do I have to change this
> every time? Shouldn't this setting be saved?
>
> Second problem:
> After pd opens, I always need to go back to qjackctl, open up the
> "connections" and change the MIDI connections. Can't I save this as a
> patchbay setting somehow? I tried that, but it doesn't work because the
> patchbay settings need to load AFTER pd starts, otherwise the pd
> connections are not available.
>
> Is there any good documentation somewhere that discusses automating all
> this kind of stuff?
>
> Thanks!
> Rick
[PD] Automating startup
Hi,

I would like to set up my system to set up my patch automatically with jack and all the right connections when the system boots. I am using Ubuntu Studio at the moment.

I currently have it set so that qjackctl starts when I log in, and I set qjackctl to automatically start up the jack server when it opens, and then to open my pd file after it starts.

I have a few problems to overcome.

First, every time pd starts, I need to go into the "Media" menu and change from "Default MIDI" to "Alsa MIDI". Why do I have to change this every time? Shouldn't this setting be saved?

Second problem:
After pd opens, I always need to go back to qjackctl, open up the "connections" and change the MIDI connections. Can't I save this as a patchbay setting somehow? I tried that, but it doesn't work because the patchbay settings need to load AFTER pd starts, otherwise the pd connections are not available.

Is there any good documentation somewhere that discusses automating all this kind of stuff?

Thanks!
Rick
Re: [PD] how to iterate over left and right channel separately in one Pd class?
Function copy_perform8() is also eligible for SIMD processing. I used memcpy() because it is straightforward to use, while Pd's functions pointed to the wrong locations for this case. On the reverb's total load there is no significant performance difference.

Katja
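Editor's sketch: the two copying styles compared in this message can be illustrated as below. copy_block8() is a hypothetical eight-samples-per-iteration loop in the spirit of an "8" perform routine; it is not Pd's copy_perform8(), and copy_block_memcpy() is simply a thin wrapper around the C library call Katja used. t_float is defined locally as float so that the fragment is self-contained.

```c
#include <assert.h>
#include <string.h>

typedef float t_float;  /* local stand-in for Pd's t_float (assumed float) */

/* Hypothetical block copy handling eight samples per iteration.
   n must be a positive multiple of 8 and the blocks must not overlap. */
static void copy_block8(const t_float *in, t_float *out, int n)
{
    assert(n > 0 && n % 8 == 0);
    for (; n; n -= 8, in += 8, out += 8) {
        out[0] = in[0]; out[1] = in[1]; out[2] = in[2]; out[3] = in[3];
        out[4] = in[4]; out[5] = in[5]; out[6] = in[6]; out[7] = in[7];
    }
}

/* The same copy done through the C library. */
static void copy_block_memcpy(const t_float *in, t_float *out, int n)
{
    memcpy(out, in, (size_t)n * sizeof(t_float));
}
```

Both produce identical output; which one is faster depends on the compiler, the libc and the CPU, so it would have to be profiled on the target machine, as suggested elsewhere in the thread.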
[PD] Echo detection and autocepstrum
Hi,

I'm working on echo detection and looking for a way to compute the autocepstrum with Pd. Is there any detailed information about the autocepstrum/cepstrum and its output data, so that I can interpret it to find echoes?