Re: [PD] Making a Realtime Convolution External

2011-04-08 Thread Billy Stiltner
In response to my comment about Acoustic Mirror sounding muddy.

I think that most of the directx effects sounded muddy at that time.

___
Pd-list@iem.at mailing list
UNSUBSCRIBE and account-management -> 
http://lists.puredata.info/listinfo/pd-list


Re: [PD] Making a Realtime Convolution External

2011-04-07 Thread Jonathan Wilkes


--- On Thu, 4/7/11, Mathieu Bouchard  wrote:

> From: Mathieu Bouchard 
> Subject: Re: [PD] Making a Realtime Convolution External
> To: "IOhannes m zmoelnig" 
> Cc: pd-list@iem.at
> Date: Thursday, April 7, 2011, 5:15 PM
> On Wed, 6 Apr 2011, IOhannes m
> zmoelnig wrote:
> 
> > using "threats" does not mean that things have to get
> non-deterministic,
> > and indeed a lot of software uses threads and stays
> completely
> > deterministic.
> 
> Suppose that you launch a second fred on another cpu. How
> do you synchronise the main fred and the second fred
> together so that the main fred doesn't have to wait ?
> Sounds to me like a big issue with multi-fredded
> applications. You can't guarantee that the second cpu will
> run the fred when the first fred will want to, because
> fredding is dependent on the cpu's availability and the OS's
> scheduler's decisions.

I think Tim Blechmann addresses this with Supernova:
http://lac.linuxaudio.org/2010/recordings/day4_1215_Supernova.ogv
(start maybe 5 min. in...)

-Jonathan




Re: [PD] Making a Realtime Convolution External

2011-04-07 Thread Seth Nickell
If you're not realtime, your main process could get yanked by the
scheduler too. Seems more like "realtime vs not-realtime" than
"threads: yes/no".

On Thu, Apr 7, 2011 at 8:15 AM, Mathieu Bouchard  wrote:
> On Wed, 6 Apr 2011, IOhannes m zmoelnig wrote:
>
>> using "threats" does not mean that things have to get non-deterministic,
>> and indeed a lot of software uses threads and stays completely
>> deterministic.
>
> Suppose that you launch a second fred on another cpu. How do you synchronise
> the main fred and the second fred together so that the main fred doesn't
> have to wait ? Sounds to me like a big issue with multi-fredded
> applications. You can't guarantee that the second cpu will run the fred when
> the first fred will want to, because fredding is dependent on the cpu's
> availability and the OS's scheduler's decisions.
>



Re: [PD] Making a Realtime Convolution External

2011-04-07 Thread Mathieu Bouchard

On Wed, 6 Apr 2011, IOhannes m zmoelnig wrote:


using "threats" does not mean that things have to get non-deterministic,
and indeed a lot of software uses threads and stays completely
deterministic.


Suppose that you launch a second fred on another cpu. How do you 
synchronise the main fred and the second fred together so that the main 
fred doesn't have to wait ? Sounds to me like a big issue with 
multi-fredded applications. You can't guarantee that the second cpu will 
run the fred when the first fred will want to, because fredding is 
dependent on the cpu's availability and the OS's scheduler's decisions.


 ___
| Mathieu Bouchard  tél: +1.514.383.3801  Villeray, Montréal, QC


Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread Seth Nickell
SONOFA$#*&! I keep posting from the wrong email address and getting
bounced ;-) Sorry to Henry & IOhannes for the dupes...

> In the context of threading/part. conv, I had an idea to compute ahead.
> Most of the calculations for a given block can be computed ahead.  Only the
> most recent block of samples needs to be actually convolved, right away.
>
> Then, once you've summed the most recent block with the other partitions,
> you'd start a thread to "compute ahead" the next cycle's partitions.  If the
> load is low enough, it would complete by the next cycle--of course, you'd
> need to wait/join the background thread to make sure that it completes.

that's what it does, give or take :-)

1) Blocks that are due sooner are prioritized: whenever a worker
thread is signalled to wake up, it takes the block that is due next.
2) The main thread pre-empts worker threads as soon as it's called.
Worker threads check constantly for a signal to suspend their current
task, and hand the unfinished work back to the main thread.
3) If the main thread needs a block that hasn't been completed, it has
to work on it -> usually this means you aren't going to make your
schedule and the CPU is maxed out.

-Seth



Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread Hans-Christoph Steiner


On Apr 6, 2011, at 2:52 PM, IOhannes m zmoelnig wrote:



On 2011-04-06 20:26, Hans-Christoph Steiner wrote:


Pd has its own scheduling system which is best to stick to as long as
you can so that you can keep the deterministic operation intact.  For
convolution, I can't see a reason to use a thread.  It adds complexity
and more code to run, but if the CPU is overtaxed by realtime
convolution processing, you are going to get an interruption in the
audio regardless of whether the processing is in a thread or not.


partitioned convolutions can gain massively from parallelisation.
given that we have more and more CPUs available, i think it is a good
thing to try and do a multicore convolution.

otoh, if there is only a single thread doing the convolution, then there
is no parallelisation, and thus the only thing gained is complexity.

using "threats" does not mean that things have to get non-deterministic,
and indeed a lot of software uses threads and stays completely
deterministic.




Yes, you can make things deterministic using threads.  Coding without
threads, it's basically automatically deterministic, but when using
threads, you have to code things right to have it deterministic.
Having multiple threads to support multiple cores definitely makes
sense, so I guess this multi-threaded pd object would just need to
wait for the results of all threads before letting the DSP tick
complete, thereby ensuring deterministic behavior.


.hc




Terrorism is not an enemy.  It cannot be defeated.  It's a tactic.
It's about as sensible to say we declare war on night attacks and
expect we're going to win that war.  We're not going to win the war on
terrorism. - retired U.S. Army general, William Odom






Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread Charles Henry
On Wed, Apr 6, 2011 at 2:08 PM, IOhannes m zmoelnig  wrote:

>
> On 2011-04-06 21:04, Seth Nickell wrote:
> > I use a thread per core, it does parallelize nicely.
> >
>
> that's what i thought.
>
> please don't let yourself be discouraged by all those misers :-)
>
> fgmasdr
> IOhannes
>

As a young curmudgeon myself, I might *grumble* seem discouraging.  But
really, I'd encourage you to take on convolution externals... but don't
create a monster.

In the context of threading/part. conv, I had an idea to compute ahead.
Most of the calculations for a given block can be computed ahead.  Only the
most recent block of samples needs to be actually convolved, right away.

Then, once you've summed the most recent block with the other partitions,
you'd start a thread to "compute ahead" the next cycle's partitions.  If the
load is low enough, it would complete by the next cycle--of course, you'd
need to wait/join the background thread to make sure that it completes.


Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread IOhannes m zmoelnig

On 2011-04-06 21:04, Seth Nickell wrote:
> I use a thread per core, it does parallelize nicely.
> 

that's what i thought.

please don't let yourself be discouraged by all those misers :-)

fgmasdr
IOhannes


Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread Seth Nickell
Hi Charles,

I have a few partitioning methods. I used to do profiling when you
first load the plugin, to determine the optimal partitioning, but
found that on intel/amd cpus with sse3, it didn't vary much, and just
hardcoded a simple rule set for when to use each partitioning style.
In the more cross-platform context of pd, I think that profiling code
might make sense again, I'll see if I can resurrect it.

-Seth

On Tue, Apr 5, 2011 at 5:04 PM, Charles Henry  wrote:
>
>
> On Tue, Apr 5, 2011 at 2:33 PM, Seth Nickell  wrote:
>>
>> Hi Mathieu,
>>
>> Thanks, I assumed (without checking :-P) that the dsp call happened
>> every time, didn't realize it was a setup/patching call that registers
>> my "_perform" function with a call graph. Exactly what I need.
>>
>> I think the difference in approach comes from the needs of the
>> external. fiddle~ probably needs much larger blocks than typical to
>> discriminate between low frequencies. In my case, I can run at 64
>> sample sizes, but I'll take your whole CPU to do it. It might be smart
>> to default to some internal buffering (say 512), and let people order
>> the external to do really really low latency if they need it and are
>> willing to pay in CPU.
>
> Here's where your users' choice of block sizes comes in--if your user puts a
> partitioned convolution external into a canvas with block size 64, it means
> to be low-latency.  If the user puts it in with [block~ 1024], then the
> buffering is defined.
>
> Pd means to be ~user~programmable and modular.  The more you try to monolith
> your externals, the worse they work (I've done this).  I know I'm not
> expressing it well, but I hope the point comes through.
>
>>
>> That said, Peter reminded me of an optimization that I hadn't
>> implemented yet. AudioUnits are rarely asked to run below 128 sample
>> block sizes, so it didn't make sense for the AU, and I forgot that it
>> was on the TODO list from 2 years ago. ;-) By convolving very small
>> blocks in the time domain, and switching to frequency domain for
>> larger blocks, I think we can get excellent CPU usage at very small
>> block sizes too.
>
> It sounds like you'd have a bit of a problem without first profiling the
> system or having known profiles for different hardware.  Can you tell me
> more about your partitioning method (just the math)?
>
>>
>> -Seth
>>
>> On Tue, Apr 5, 2011 at 8:49 AM, Mathieu Bouchard 
>> wrote:
>> > On Mon, 4 Apr 2011, Seth Nickell wrote:
>> >
>> >> Are the DSP calls liable to vary t_signal->s_n (block size) without
>> >> notification? 64 samples, apparently the default on pd-extended, is
>> >> doable without buffering for partitioned convolution on a modern
>> >> computer, but it exacts a pretty high CPU toll, and if I have to
>> >> handle random blocksize changes, it gets more expensive.
>> >>
>> >> Also, since convolution is much more efficient around block sizes of
>> >> 256
>> >> or 512, perhaps I should default to one of these, buffer a little, and
>> >> have
>> >> a "runatpdblocksize" message or somesuch?
>> >
>> > There's always a notification. Any change of s_n will result in a new
>> > call
>> > to the dsp-function.
>> >
>> > Note that it's best to make sure that the dsp-function is fairly fast
>> > most
>> > of the times, because any patching may retrigger the dsp-function in
>> > order
>> > to recompile the graph.
>> >
>> > dsp objects working with some kind of blocks don't have to be using s_n
>> > as a
>> > setting. I mean that you can accumulate several dsp-blocks in order to
>> > make
>> > your own kind of bigger block. This is what [fiddle~] and [env~] do, for
>> > example.
>> >
>> > But some other object classes use s_n as a setting. For example, [fft~]
>> > does. I don't know why this is not consistent across all of pd. (I'm not
>> > saying either approach is better than the other.)
>> >
>> >  ___
>> > | Mathieu Bouchard  tél: +1.514.383.3801  Villeray, Montréal, QC
>>


Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread Seth Nickell
Hi Hans,

The thread in question here would be invoked when a "set" message is
sent to the object. In this case, I need to load the Impulse Response
from the disk and optionally do a test convolution and normalize it.
I'm assuming (yeah, I should just check ;-) if I block on an inlet,
I'm blocking the whole audio thread?

Convolution with an increasing block size naturally involves different
work-loads on different sample blocks (some sample blocks finish a
large block that can then be processed, some don't). If the scheduler
is picky/precise enough (AudioUnits is), you can't get away with this
"ragged work load". It's not a matter of decreasing CPU usage - of
course scheduling things to run in a worker thread increases CPU usage
a little - it's a matter of keeping the _perform call CPU cycles
consistent.

-Seth

On Wed, Apr 6, 2011 at 11:26 AM, Hans-Christoph Steiner  wrote:
>
> On Apr 4, 2011, at 10:48 PM, Seth Nickell wrote:
>
 2) Anyone have requests for features/api? Its currently simplistic:
  - takes a "read FILENAME" message, loads the file, does a test
 convolution against pink noise to normalize the gain to something sane
>>>
>>> Is this done within the main Pd audio thread?
>>
>> The convolution engine has support for doing it either on the calling
>> thread, or a background thread. I'm thinking of defaulting to a
>> background thread. Does that seem like the right move?
>
> Pd has its own scheduling system which is best to stick to as long as you
> can so that you can keep the deterministic operation intact.  For
> convolution, I can't see a reason to use a thread.  It adds complexity and
> more code to run, but if the CPU is overtaxed by realtime convolution
> processing, you are going to get an interruption in the audio regardless of
> whether the processing is in a thread or not.
>
> .hc
>
>
  - caches the last N impulse responses, as the test convolution
 takes a little time
  - allows setting the cache size with a "cachesize N" message
>>>
>>> To make sure I understood this: cachesize is not the size of the first
>>> partition of the partitioned convolution, but the cache that tries to
>>> avoid
>>> audio dropouts when performing the test convolution?
>>
>> The convolution engine can swap-in a pre-loaded ('cached') IR in
>> realtime without glitching... but it means keeping 2x the Impulse
>> Response data in RAM. To keep the default API simple but useful, I'm
>> defaulting to caching only the last 5 impulse responses in RAM.
>> "cachesize N" lets you increase that number. Let's say in a
>> performance you wanted to use 30 different impulse responses and you
>> have 2GB of RAM... should be nbd.
>>

  - disable normalization with "normalize 0" or "normalize 1"
>>>
>>> Yes, disabling this could be a good idea! You could also add a "gain 0-1"
>>> message for manual control.
>>
>> It's worth noting that impulse responses are usually whack without
>> gain normalization: often factors of hundreds to millions off a
>> usable signal.
>>
  Features I'm considering (let me know if they sound useful):
   - load from an array instead of from disk (no gain normalization?)
>>>
>>> Very good.

   - It wouldn't be hard to enable MxN convolution if that floats
 somebody's boat.
>>>
>>> I am sure if you come up with a convolution as efficient and flexible as
>>> jconv by Fons within Pd, then soon a multichannel use and hence request
>>> will
>>> come up fast.
>>
>> I'd be interested in what flexibility means in this context, it might
>> give me some good ideas for features to add. Efficiency-wise, last
>> time I benchmarked its more efficient than jconv, but the difference
>> is offset by less graceful degradation under CPU load (I convolve in
>> background threads to preserve realtime in the main thread while
>> avoiding an irritating patent that's going to expire soon...).
>>
>> WRT Pd's audio scheduling... are Pd signal externals held to
>> realtime or can my dsp call vary the number of cycles it takes by 100%
>> from call to call? VST seems to do ok with this, but AudioUnits get
>> scheduled to run at the very last instant they possibly could. If Pd
>> can have some variance, I can drop the threads and improve the
>> external's degradation under high CPU load.
>>
>> thanks for the feedback (also, is this the best list for this kind of
>> feedback?),
>>
>> -Seth
>>


Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread IOhannes m zmoelnig

On 2011-04-06 20:26, Hans-Christoph Steiner wrote:
>
> Pd has its own scheduling system which is best to stick to as long as
> you can so that you can keep the deterministic operation intact.  For
> convolution, I can't see a reason to use a thread.  It adds complexity
> and more code to run, but if the CPU is overtaxed by realtime
> convolution processing, you are going to get an interruption in the
> audio regardless of whether the processing is in a thread or not.
> 

partitioned convolutions can gain massively from parallelisation.
given that we have more and more CPUs available, i think it is a good
thing to try and do a multicore convolution.

otoh, if there is only a single thread doing the convolution, then there
is no parallelisation, and thus the only thing gained is complexity.

using "threats" does not mean that things have to get non-deterministic,
and indeed a lot of software uses threads and stays completely
deterministic.

gjasdr
IOhannes


Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread Hans-Christoph Steiner


On Apr 4, 2011, at 10:48 PM, Seth Nickell wrote:


2) Anyone have requests for features/api? It's currently simplistic:
   - takes a "read FILENAME" message, loads the file, does a test
convolution against pink noise to normalize the gain to something sane


Is this done within the main Pd audio thread?


The convolution engine has support for doing it either on the calling
thread, or a background thread. I'm thinking of defaulting to a
background thread. Does that seem like the right move?


Pd has its own scheduling system which is best to stick to as long as  
you can so that you can keep the deterministic operation intact.  For  
convolution, I can't see a reason to use a thread.  It adds complexity  
and more code to run, but if the CPU is overtaxed by realtime  
convolution processing, you are going to get an interruption in the  
audio regardless of whether the processing is in a thread or not.


.hc



  - caches the last N impulse responses, as the test convolution
takes a little time
  - allows setting the cache size with a "cachesize N" message


To make sure I understood this: cachesize is not the size of the first
partition of the partitioned convolution, but the cache that tries to
avoid audio dropouts when performing the test convolution?


The convolution engine can swap-in a pre-loaded ('cached') IR in
realtime without glitching... but it means keeping 2x the Impulse
Response data in RAM. To keep the default API simple but useful, I'm
defaulting to caching only the last 5 impulse responses in RAM.
"cachesize N" lets you increase that number. Let's say in a
performance you wanted to use 30 different impulse responses and you
have 2GB of RAM... should be nbd.



  - disable normalization with "normalize 0" or "normalize 1"


Yes, disabling this could be a good idea! You could also add a
"gain 0-1" message for manual control.


It's worth noting that impulse responses are usually whack without gain
normalization: often factors of hundreds to millions off a usable
signal.


 Features I'm considering (let me know if they sound useful):
   - load from an array instead of from disk (no gain  
normalization?)


Very good.


   - It wouldn't be hard to enable MxN convolution if that floats
somebody's boat.


I am sure if you come up with a convolution as efficient and flexible as
jconv by Fons within Pd, then soon a multichannel use and hence request
will come up fast.


I'd be interested in what flexibility means in this context, it might
give me some good ideas for features to add. Efficiency-wise, last
time I benchmarked its more efficient than jconv, but the difference
is offset by less graceful degradation under CPU load (I convolve in
background threads to preserve realtime in the main thread while
avoiding an irritating patent that's going to expire soon...).

WRT Pd's audio scheduling... are Pd signal externals held to
realtime or can my dsp call vary the number of cycles it takes by 100%
from call to call? VST seems to do ok with this, but AudioUnits get
scheduled to run at the very last instant they possibly could. If Pd
can have some variance, I can drop the threads and improve the
external's degradation under high CPU load.

thanks for the feedback (also, is this the best list for this kind of
feedback?),


-Seth







As we enjoy great advantages from inventions of others, we should be  
glad of an opportunity to serve others by any invention of ours; and  
this we should do freely and generously. - Benjamin Franklin






Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread Archontis Politis

Hi Seth,

In terms of feature requests, since you are doing all the work already,
it would be nice to have a 1x4 mode, meaning one input -> 4 convolutions
-> 4 outputs. That would be great for ambisonic (B-format) 4-channel
room impulse responses.


Regards,
Archontis

On 4/5/11 3:54 AM, Seth Nickell wrote:

I'm planning to release our realtime convolution engine (extracted
from http://meatscience.net/pages/convolution-reverb) as a GPLed Pd
external.

It currently accepts 4-channel ('true stereo'), two channel or mono
impulse responses, with stereo or mono output. Performance is
excellent if you have SSE3 and has a fallback in case you don't, and
it aims for accuracy (basically that means multi-stage scaling to keep
floats within healthy sizes).

1) I'd love to swipe the convolve~ external name, currently installed
by mjlib as part of pd-extended. convolve~ from mjlib appears to be a
copy of pin~ ? so I think it could be taken? Maybe I mis-read the
code. I've cc'ed mark who can probably clarify.

2) Anyone have requests for features/api? It's currently simplistic:
- takes a "read FILENAME" message, loads the file, does a test
convolution against pink noise to normalize the gain to something sane
- caches the last N impulse responses, as the test convolution
takes a little time
- allows setting the cache size with a "cachesize N" message
- disable normalization with "normalize 0" or "normalize 1"

   Features I'm considering (let me know if they sound useful):
 - load from an array instead of from disk (no gain normalization?)
 - It wouldn't be hard to enable MxN convolution if that floats
somebody's boat.

3) I can compile/test on Mac&  Linux, anyone up for helping me with Windows?

4) Would this be of interest for Pd-extended?

5) I'd love to build a granular convolution engine takes two
real-time signals, and extracts grains from one to convolve against
the other. Anyone have ideas about this?

thanks all,

-Seth






Re: [PD] Making a Realtime Convolution External

2011-04-06 Thread Billy Stiltner
> the calculation as well, so that you could deliberately stagger the
> blocks and more evenly distribute the calculation in cpu-intensive
> situations. I'm imagining something like two 4096 blocks running say,
> 64 samples apart so that one does its calculation while the other
> is still collecting samples.
>
> Matt
>

Maybe a master to sequence the starting of the blocks: instead of
using [block~], use [switch~].
If the master's blocksize is 64, set up a counter that sends a bang
every iteration; trigger the first switch at 4096, next iteration
trigger the second, do a mod 64, and repeat the process.


I had no idea that convolution and fft vocoder were that much
different. I will have to look up convolution now; my only experience
with it was in Cool Edit Pro. After hearing Aphex Twin's Bucephalus
Bouncing Ball I recorded a ball bouncing, extracted the timing in
Cakewalk, made some drum sounds trigger at the ball bounce, imported
to Cool Edit, then convolved with different sounds like female voice.
Then there was Sonic Foundry's Acoustic Mirror, which sounded really
muddy.  Actually my use of the fft vocoder could almost be done with
an envelope follower.



Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Matt Barber
> Just scanned the source... big difference would be performance, and if
> you're picky (you have to be pretty picky, honestly), some difference
> in accuracy due to floating point's reduced precision at large/small
> values. Convolution is still expensive enough for performance to
> really matter.
>
> the biggies:
> - partconv implements a single fixed block size, but freq domain
> convolution is faster by far on bigger blocks (peak on a core duo is
> near 4k sample blocks). implementing growing block sizes makes a big
> difference to low latency performance (e.g. 64 64 128 128 256 256 512
> 512 1024 1024 2048 2048 4096 4096), as you can get low latency while
> most of your convolutions operate on the ideal high-performance
> block size.


I was putting one of these together in Pd vanilla with dynamic
patching as an exercise a few years back, but there were some problems
I had. I think you can just do a simple 64 128 256 512 etc. and let
the block delay take care of the timing automatically, but I actually
found the kind you posted here to work a little better. Another one
that worked even better was something like 64 32 32 64 64 128 128 256
256 etc., which seemed to front-load some of the calculation a little
(and with this one and the one you posted, if Pd's block size were 1,
you could do the first block as a direct convolution for extreme
low-latency).

Anyway, this brings up a problem I've been wondering about with Pd --
If you have lots of reblocking going on I have been assuming that if
you had, say, one patch blocked at 64, another at 128, and others at
256 512 1024 2048 and 4096, that at the end of the 4096 block all 7
patches will have just finished a block cycle and there will therefore
be a CPU spike relative to other places between the beginning and end
of the 4096 block as the calculation for all 7 is done. Is there a way
in Pd to offset larger blocks by a given number of samples so that the
calculation for that block happens at a different time? It's easy
enough to delay the samples -- that's not what I want. I want to delay
the calculation as well, so that you could deliberately stagger the
blocks and more evenly distribute the calculation in cpu-intensive
situations. I'm imagining something like two 4096 blocks running say,
64 samples apart so that one does its calculation while the other
is still collecting samples.

Matt


> - vectorization (sse/altivec) of partconv would give a 2-3.5x performance 
> boost
>
> -seth



Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Charles Henry
On Tue, Apr 5, 2011 at 2:33 PM, Seth Nickell  wrote:

> Hi Mathieu,
>
> Thanks, I assumed (without checking :-P) that the dsp call happened
> every time, didn't realize it was a setup/patching call that registers
> my "_perform" function with a call graph. Exactly what I need.
>
> I think the difference in approach comes from the needs of the
> external. fiddle~ probably needs much larger blocks than typical to
> discriminate between low frequencies. In my case, I can run at 64
> sample sizes, but I'll take your whole CPU to do it. It might be smart
> to default to some internal buffering (say 512), and let people order
> the external to do really really low latency if they need it and are
> willing to pay in CPU.
>

Here's where your users' choice of block sizes comes in--if your user puts a
partitioned convolution external into a canvas with block size 64, it means
to be low-latency.  If the user puts it in with [block~ 1024], then the
buffering is defined.

Pd means to be ~user~programmable and modular.  The more you try to monolith
your externals, the worse they work (I've done this).  I know I'm not
expressing it well, but I hope the point comes through.


>
> That said, Peter reminded me of an optimization that I hadn't
> implemented yet. AudioUnits are rarely asked to run below 128 sample
> block sizes, so it didn't make sense for the AU, and I forgot that it
> was on the TODO list from 2 years ago. ;-) By convolving very small
> blocks in the time domain, and switching to frequency domain for
> larger blocks, I think we can get excellent CPU usage at very small
> block sizes too.
>

It sounds like you'd have a bit of a problem without first profiling the
system or having known profiles for different hardware.  Can you tell me
more about your partitioning method (just the math)?


>
> -Seth
>
> On Tue, Apr 5, 2011 at 8:49 AM, Mathieu Bouchard 
> wrote:
> > On Mon, 4 Apr 2011, Seth Nickell wrote:
> >
> >> Are the DSP calls liable to vary t_signal->s_n (block size) without
> >> notification? 64 samples, apparently the default on pd-extended, is
> >> doable without buffering for partitioned convolution on a modern
> >> computer, but it exacts a pretty high CPU toll, and if I have to
> >> handle random blocksize changes, it gets more expensive.
> >>
> >> Also, since convolution is much more efficient around block sizes of 256
> >> or 512, perhaps I should default to one of these, buffer a little, and
> have
> >> a "runatpdblocksize" message or somesuch?
> >
> > There's always a notification. Any change of s_n will result in a new
> call
> > to the dsp-function.
> >
> > Note that it's best to make sure that the dsp-function is fairly fast
> most
> > of the times, because any patching may retrigger the dsp-function in
> order
> > to recompile the graph.
> >
> > dsp objects working with some kind of blocks don't have to be using s_n
> as a
> > setting. I mean that you can accumulate several dsp-blocks in order to
> make
> > your own kind of bigger block. This is what [fiddle~] and [env~] do, for
> > example.
> >
> > But some other object classes use s_n as a setting. For example, [fft~]
> > does. I don't know why this is not consistent across all of pd. (I'm not
> > saying either approach is better than the other.)
> >
> >  ___
> > | Mathieu Bouchard  tél: +1.514.383.3801  Villeray, Montréal, QC
>
> ___
> Pd-list@iem.at mailing list
> UNSUBSCRIBE and account-management ->
> http://lists.puredata.info/listinfo/pd-list
>
___
Pd-list@iem.at mailing list
UNSUBSCRIBE and account-management -> 
http://lists.puredata.info/listinfo/pd-list


Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Seth Nickell
Hi Mathieu,

Thanks, I assumed (without checking :-P) that the dsp call happened
every time, didn't realize it was a setup/patching call that registers
my "_perform" function with a call graph. Exactly what I need.

I think the difference in approach comes from the needs of the
external. fiddle~ probably needs much larger blocks than typical to
discriminate between low frequencies. In my case, I can run at 64
sample sizes, but I'll take your whole CPU to do it. It might be smart
to default to some internal buffering (say 512), and let people order
the external to do really really low latency if they need it and are
willing to pay in CPU.

That said, Peter reminded me of an optimization that I hadn't
implemented yet. AudioUnits are rarely asked to run below 128 sample
block sizes, so it didn't make sense for the AU, and I forgot that it
was on the TODO list from 2 years ago. ;-) By convolving very small
blocks in the time domain, and switching to frequency domain for
larger blocks, I think we can get excellent CPU usage at very small
block sizes too.
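The time-domain path mentioned here is just the textbook direct form: a multiply-accumulate loop per output sample, which typically wins over FFT partitions only for very short partitions (roughly under 64-128 taps). A minimal sketch of that direct form, with hypothetical names and not the engine's actual code:

```c
#include <stddef.h>

/* Direct (time-domain) convolution of one block of input against a
 * short impulse-response partition. "hist" holds the previous
 * (ir_len - 1) input samples so successive blocks chain correctly. */
void direct_conv_block(const float *in, float *out, size_t n,
                       const float *ir, size_t ir_len, float *hist)
{
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < ir_len; k++) {
            /* sample at time (i - k): from this block if in range,
             * otherwise from the tail of the previous blocks */
            float s = (i >= k) ? in[i - k] : hist[ir_len - 1 + i - k];
            acc += ir[k] * s;
        }
        out[i] = acc;
    }
    /* save the last (ir_len - 1) samples of the stream for the next call */
    for (size_t k = 0; k + 1 < ir_len; k++)
        hist[k] = (n + k + 1 >= ir_len) ? in[n + k + 1 - ir_len] : hist[n + k];
}
```

The quadratic cost is why this only makes sense for the first, shortest partitions; everything longer goes through the frequency domain.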

-Seth

On Tue, Apr 5, 2011 at 8:49 AM, Mathieu Bouchard  wrote:
> On Mon, 4 Apr 2011, Seth Nickell wrote:
>
>> Are the DSP calls liable to vary t_signal->s_n (block size) without
>> notification? 64 samples, apparently the default on pd-extended, is
>> doable without buffering for partitioned convolution on a modern
>> computer, but it exacts a pretty high CPU toll, and if I have to
>> handle random blocksize changes, it gets more expensive.
>>
>> Also, since convolution is much more efficient around block sizes of 256
>> or 512, perhaps I should default to one of these, buffer a little, and have
>> a "runatpdblocksize" message or somesuch?
>
> There's always a notification. Any change of s_n will result in a new call
> to the dsp-function.
>
> Note that it's best to make sure that the dsp-function is fairly fast most
> of the times, because any patching may retrigger the dsp-function in order
> to recompile the graph.
>
> dsp objects working with some kind of blocks don't have to be using s_n as a
> setting. I mean that you can accumulate several dsp-blocks in order to make
> your own kind of bigger block. This is what [fiddle~] and [env~] do, for
> example.
>
> But some other object classes use s_n as a setting. For example, [fft~]
> does. I don't know why this is not consistent across all of pd. (I'm not
> saying either approach is better than the other.)
>
>  ___
> | Mathieu Bouchard  tél: +1.514.383.3801  Villeray, Montréal, QC



Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Seth Nickell
On Tue, Apr 5, 2011 at 8:38 AM, Mathieu Bouchard  wrote:
> On Mon, 4 Apr 2011, Seth Nickell wrote:
>
>> 5) I'd love to build a granular convolution engine that takes two real-time
>> signals, and extracts grains from one to convolve against the other. Anyone
>> have ideas about this?
>
> What's the fundamental difference between this and a windowed FFT
> convolution engine ?

Big difference would be stochastic grain selection (with
inputs/control over the selection tendencies), but it'd definitely
start as a straight-up windowed fft convolution engine. E.g. one
parameter that could make for interesting selections is to hunt for
decaying peaks, and favor using those to get a "crisper output"
instead of the haze that results from windowed fft convolution.

-seth



Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Seth Nickell
Hi Jamie,

Just scanned the source... big difference would be performance, and if
you're picky (you have to be pretty picky, honestly), some difference
in accuracy due to floating point's reduced precision at large/small
values. Convolution is still expensive enough for performance to
really matter.

the biggies:
- partconv implements a single fixed block size, but freq domain
convolution is faster by far on bigger blocks (peak on a core duo is
near 4k sample blocks). implementing growing block sizes makes a big
difference to low latency performance (e.g. 64 64 128 128 256 256 512
512 1024 1024 2048 2048 4096 4096), as you can get low latency while
most of your convolutions operate on the ideal high-performance
block size.
- vectorization (sse/altivec) of partconv would give a 2-3.5x performance boost
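The growing-block pattern above (each size appearing twice, doubling up to a cap) is easy to generate programmatically. A hedged sketch, with hypothetical names, of one way to build such a schedule:

```c
#include <stddef.h>

/* Build a doubling partition schedule like the one described above:
 * each size appears twice, doubling until it reaches max_size, after
 * which max_size repeats until the impulse response is covered.
 * Returns the number of partitions written (at most cap). */
size_t partition_schedule(size_t first, size_t max_size,
                          size_t ir_len, size_t *sizes, size_t cap)
{
    size_t n = 0, covered = 0, cur = first;
    int repeated = 0; /* has this size been emitted once already? */
    while (covered < ir_len && n < cap) {
        sizes[n++] = cur;
        covered += cur;
        if (cur < max_size) {
            if (repeated) { cur *= 2; repeated = 0; }
            else repeated = 1;
        }
    }
    return n;
}
```

With first=64 and a 4096 cap this reproduces the 64 64 128 128 256 256 ... pattern, and the first entry alone determines the latency.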

-seth

On Tue, Apr 5, 2011 at 8:26 AM, Jamie Bullock  wrote:
>
> Hi Seth,
>
>
> On 5 Apr 2011, at 01:54, Seth Nickell wrote:
>
>> I'm planning to release our realtime convolution engine (extracted
>> from http://meatscience.net/pages/convolution-reverb) as a GPLed Pd
>> external.
>>
>
> What is the advantage of this over Ben Saylor's [partconv~] external, which 
> provides partitioned convolution?
>
> Jamie
>
>



Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Mathieu Bouchard

On Mon, 4 Apr 2011, Seth Nickell wrote:


Are the DSP calls liable to vary t_signal->s_n (block size) without
notification? 64 samples, apparently the default on pd-extended, is
doable without buffering for partitioned convolution on a modern
computer, but it exacts a pretty high CPU toll, and if I have to
handle random blocksize changes, it gets more expensive.

Also, since convolution is much more efficient around block sizes of 256 
or 512, perhaps I should default to one of these, buffer a little, and 
have a "runatpdblocksize" message or somesuch?


There's always a notification. Any change of s_n will result in a new call 
to the dsp-function.


Note that it's best to make sure that the dsp-function is fairly fast most 
of the times, because any patching may retrigger the dsp-function in order 
to recompile the graph.


dsp objects working with some kind of blocks don't have to be using s_n as 
a setting. I mean that you can accumulate several dsp-blocks in order to 
make your own kind of bigger block. This is what [fiddle~] and [env~] do, 
for example.


But some other object classes use s_n as a setting. For example, [fft~] 
does. I don't know why this is not consistent across all of pd. (I'm not 
saying either approach is better than the other.)


 ___
| Mathieu Bouchard  tél: +1.514.383.3801  Villeray, Montréal, QC


Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Mathieu Bouchard

On Mon, 4 Apr 2011, Peter Plessas wrote:

This would be of interest for all Pd users, no matter if they like their 
externals included in a distribution of Pd ('extended') or prefer manually 
adding them to their vanilla Pd.


But pd-extended is not merely a bundling of externals.

For example, the [initbang] internal class is not in vanilla and is not 
possible as an external.


There are also differences about rendering of boxes and fonts, if you've 
seen any screenshots.


 ___
| Mathieu Bouchard  tél: +1.514.383.3801  Villeray, Montréal, QC


Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Mathieu Bouchard

On Mon, 4 Apr 2011, Seth Nickell wrote:

5) I'd love to build a granular convolution engine that takes two 
real-time signals, and extracts grains from one to convolve against the 
other. Anyone have ideas about this?


What's the fundamental difference between this and a windowed FFT 
convolution engine ?


 ___
| Mathieu Bouchard  tél: +1.514.383.3801  Villeray, Montréal, QC


Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Jamie Bullock

Hi Seth,


On 5 Apr 2011, at 01:54, Seth Nickell wrote:

> I'm planning to release our realtime convolution engine (extracted
> from http://meatscience.net/pages/convolution-reverb) as a GPLed Pd
> external.
> 

What is the advantage of this over Ben Saylor's [partconv~] external, which 
provides partitioned convolution?

Jamie




Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Mathieu Bouchard

On Tue, 5 Apr 2011, Billy Stiltner wrote:

I remember there were lots of tricks that could be done with graphics 
and integer math as well as binary bit twiddling before math coprocessors 
were in every machine. Look at fractint. Example of circle code seems 
like I optimized this further but can't remember.


Yeah, I remember the Bresenham techniques, but the whole concept of the 
convolution theorem and the FFT is a lot deeper than that... it's a really 
deep optimisation.


BTW, the only Bresenham I'm aware of in Pd is [#draw_polygon]. I used it 
for making stuff like this :


  http://gridflow.ca/gallery/koch_polygon_3a.png
  http://gridflow.ca/gallery/koch_polygon_2d.png
  http://gridflow.ca/gallery/bezier.png
  http://gridflow.ca/gallery/supercycloid.mov

etc. see the rest of the koch series on http://gridflow.ca/gallery

 ___
| Mathieu Bouchard  tél: +1.514.383.3801  Villeray, Montréal, QC


Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Billy Stiltner
I just looked, and with a 512 sample buffer Reason defaults to 15ms
output latency at 44.1k and 48k. At 96k it is 9ms output latency.



Re: [PD] Making a Realtime Convolution External

2011-04-05 Thread Billy Stiltner
From a user's standpoint, here's my 2 cents.
I have 2 tunes that use Reason's FFT vocoder. In FFT mode it has 32
frequency bands.
The carrier inputs on these tunes are water recordings at 96k and the
modulator is drum beats whose samples were recorded at 44.1k.
I usually work in 48kHz. So being able to work with different sample
rates and have the audio playing at the sample rate of the file would be
great.

http://sites.google.com/site/chuxlingchaxrazauralarielz/bp1.mp3
http://sites.google.com/site/chuxlingchaxrazauralarielz/bp2.mp3

Performance of Reason and latency was good enough to record live MIDI
input while vocoding on a 2.2GHz AMD with 1GB of RAM. I'm not sure of
the latency at the time of those recordings, but I was using Pd along
with Reason. Pd was used to retune MIDI notes with pitchbend. I don't
know how they do it, but would sure like to know.
Fast convolution is a feature I would love to be able to use in Pd.

I remember there were lots of tricks that could be done with graphics
and integer math as well as binary bit twiddling before math
coprocessors were in every machine. Look at Fractint.

Example of circle code; seems like I optimized this further but can't remember.

/
void bcircle(int x0,int y0,int radius,int c)
{
int x,y;
long a, asquared, twoasquared;
long b, bsquared, twobsquared;
long d, dx, dy;
int Aspecty,Aspectx;
getaspectratio(&Aspectx,&Aspecty);
x=0;
y=radius;
//a=radius*Aspecty/Aspectx;
a=radius*1.;
asquared=a*a;
twoasquared=2*asquared;
b=radius;
bsquared=b*b;
twobsquared=2*bsquared;
d=bsquared-asquared*b+asquared/4L;
dx=0;
dy=twoasquared*b;
while(dx<dy)
{
 if(d>0)
   {
y=y-1;
dy=dy-twoasquared;
d=d-dy;
putpixel(x0+x,y0+y,c);
putpixel(x0-x,y0+y,c);
putpixel(x0+x,y0-y,c);
putpixel(x0-x,y0-y,c);
x=x+1;
dx=dx+twobsquared;
d=d+bsquared+dx;
   }else{
 x=x+1;
 dx=dx+twobsquared;
 d=d+bsquared+dx;
  putpixel(x0+x,y0+y,c);
putpixel(x0-x,y0+y,c);
putpixel(x0+x,y0-y,c);
putpixel(x0-x,y0-y,c);
 };
};
 d=d+(3L*(asquared-bsquared)/2L-(dx+dy))/2L;
 while(y>0)
 {
  if(d<0)
{
 x=x+1;
 dx=dx+twobsquared;
 d=d+dx;
};
  y=y-1;
  putpixel(x0+x,y0+y,c);
  putpixel(x0-x,y0+y,c);
  putpixel(x0+x,y0-y,c);
  putpixel(x0-x,y0-y,c);

  dy=dy-twoasquared;
  d=d+asquared-dy;
 };
};
/


Could something like this be done with audio to speed up operations?



Re: [PD] Making a Realtime Convolution External

2011-04-04 Thread Seth Nickell
>> Also, since convolution is much more efficient around block sizes of
>> 256 or 512, perhaps I should default to one of these, buffer a little,
>> and have a "runatpdblocksize" message or somesuch?
>
> I still have not understood if/how the user can set the duration of the
> first partition of your partitioned convolution, and how these partitions are
> structured in their (possibly increasing) sizes. Since this first parameter
> will define the latency-vs-CPU tradeoff it should not be preset by the
> developers.

I guess this is what I was asking. I support a few "block pattern"
partitioning schemes (they're pluggable, its very easy to add a new
one), I could export the choice of these to the end-user, including
the option of what block size to start with - the minimuum block size
of course being Pd's current block size.

My guess is, in the wild, most "pd users" are using Pd-extended, which
ships with a 20msec default delay (dunno if this is inherited from
vanilla, or overridden by the distro, but either way, same effect:
most pd installs probably run at 20msec).

I'm all for allowing configuration of these important parameters, but
I want the external to do something sane out of the box. My guess is
64 sample blocks (~20msec) is more abusive CPU-wise than most people
expect out-of-the-box, so I'm probably going to default to a
partitioning that looks like:

256, 512, 1024, 2048, 4096, 4096, ..., 4096

And allow people to set a different partitioning scheme, including
reducing the initial partition size, if they want. That make good
sense?

-Seth

>
> P.
>
> PS: Pd and Pd-extended use the same core, audio engine. You might want to
> consider Pd-extended as vanilla Pd with a folder full of precompiled
> externals.
>
>>
>> On Mon, Apr 4, 2011 at 7:48 PM, Seth Nickell  wrote:
>
> 2) Anyone have requests for features/api? Its currently simplistic:
>  - takes a "read FILENAME" message, loads the file, does a test
> convolution against pink noise to normalize the gain to something sane

 Is this done within the main Pd audio thread?
>>>
>>> The convolution engine has support for doing it either on the calling
>>> thread, or a background thread. I'm thinking of default to a
>>> background thread. That seem like the right move?
>>>
>  - caches the last N impulse responses, as the test convolution
> takes a little time
>  - allows setting the cache size with a "cachesize N" message

 To make sure I understood this: cachesize is not the size of the first
 partition of the partitioned convolution, but the cache that tries to
 avoid
 audio dropouts when performing the test convolution?
>>>
>>> The convolution engine can swap-in a pre-loaded ('cached') IR in
>>> realtime without glitching... but it means keeping 2x the Impulse
>>> Response data in RAM. To keep the default API simple but useful, I'm
>>> defaulting to caching only the last 5 impulse responses in RAM.
>>> "cachesize N" lets you increase that number lets say in a
>>> performance you wanted to use 30 different impulse responses and you
>>> have 2GB of ram... should be nbd.
>>>
>  - disable normalization with "normalize 0" or "normalize 1"

 Yes, disabling this could be a good idea! You could also add a "gain
 0-1"
 message for manual control.
>>>
>>> Its worth noting that impulse responses are usually whack without gain
>>>  normalization like factors of hundreds to millions off a usable
>>> signal.
>>>
>  Features I'm considering (let me know if they sound useful):
>   - load from an array instead of from disk (no gain normalization?)

 Very good.
>
>   - It wouldn't be hard to enable MxN convolution if that floats
> somebody's boat.

 I am sure if you come up with a convolution as efficient and flexible as
 jconv by Fons within Pd, then soon a multichannel use and hence request
 will
 come up fast.
>>>
>>> I'd be interested in what flexibility means in this context, it might
>>> give me some good ideas for features to add. Efficiency-wise, last
>>> time I benchmarked its more efficient than jconv, but the difference
>>> is offset by less graceful degradation under CPU load (I convolve in
>>> background threads to preserve realtime in the main thread while
>>> avoiding an irritating patent that's going to expire soon...).
>>>
>>> WRT to Pd's audio scheduling... are Pd signal externals held to
>>> realtime or can my dsp call vary the number of cycles it takes by 100%
>>> from call to call? VST seems to do ok with this, but AudioUnits get
>>> scheduled to run at the very last instant they possibly could. If Pd
>>> can have some variance, I can drop the threads and improve the
>>> external's degradation under high CPU load.
>>>
>>> thanks for the feedback (also, is the best list for this kind of
>>> feedback?),
>>>
>>> -Seth
>>>
>


Re: [PD] Making a Realtime Convolution External

2011-04-04 Thread Peter Plessas

Dear Seth,

Seth Nickell wrote:

Another question on similar lines...

Are the DSP calls liable to vary t_signal->s_n (block size) without
notification? 64 samples, apparently the default on pd-extended, is
doable without buffering for partitioned convolution on a modern
computer, but it exacts a pretty high CPU toll, and if I have to
handle random blocksize changes, it gets more expensive.
They cannot vary by themselves, but what is usually done (e.g. with 
FFTs) is to place a signal (tilde ~) object in a subpatch and resize 
the blocksize for that subpatch using the [switch~] or [block~] 
objects. You might consider using this very approach.


Also, since convolution is much more efficient around block sizes of
256 or 512, perhaps I should default to one of these, buffer a little,
and have a "runatpdblocksize" message or somesuch?
I still have not understood if/how the user can set the duration of the 
first partition of your partitioned convolution, and how these partitions 
are structured in their (possibly increasing) sizes. Since this first 
parameter will define the latency-vs-CPU tradeoff, it should not be preset 
by the developers.


P.

PS: Pd and Pd-extended use the same core, audio engine. You might want 
to consider Pd-extended as vanilla Pd with a folder full of precompiled 
externals.




On Mon, Apr 4, 2011 at 7:48 PM, Seth Nickell  wrote:

2) Anyone have requests for features/api? Its currently simplistic:
  - takes a "read FILENAME" message, loads the file, does a test
convolution against pink noise to normalize the gain to something sane

Is this done within the main Pd audio thread?

The convolution engine has support for doing it either on the calling
thread, or a background thread. I'm thinking of default to a
background thread. That seem like the right move?


  - caches the last N impulse responses, as the test convolution
takes a little time
  - allows setting the cache size with a "cachesize N" message

To make sure I understood this: cachesize is not the size of the first
partition of the partitioned convolution, but the cache that tries to avoid
audio dropouts when performing the test convolution?

The convolution engine can swap-in a pre-loaded ('cached') IR in
realtime without glitching... but it means keeping 2x the Impulse
Response data in RAM. To keep the default API simple but useful, I'm
defaulting to caching only the last 5 impulse responses in RAM.
"cachesize N" lets you increase that number; let's say in a
performance you wanted to use 30 different impulse responses and you
have 2GB of RAM... should be nbd.


  - disable normalization with "normalize 0" or "normalize 1"

Yes, disabling this could be a good idea! You could also add a "gain 0-1"
message for manual control.

It's worth noting that impulse responses are usually whack without gain
normalization: factors of hundreds to millions off a usable
signal.


 Features I'm considering (let me know if they sound useful):
   - load from an array instead of from disk (no gain normalization?)

Very good.

   - It wouldn't be hard to enable MxN convolution if that floats
somebody's boat.

I am sure if you come up with a convolution as efficient and flexible as
jconv by Fons within Pd, then soon a multichannel use and hence request will
come up fast.

I'd be interested in what flexibility means in this context, it might
give me some good ideas for features to add. Efficiency-wise, last
time I benchmarked its more efficient than jconv, but the difference
is offset by less graceful degradation under CPU load (I convolve in
background threads to preserve realtime in the main thread while
avoiding an irritating patent that's going to expire soon...).

WRT Pd's audio scheduling... are Pd signal externals held to
realtime or can my dsp call vary the number of cycles it takes by 100%
from call to call? VST seems to do ok with this, but AudioUnits get
scheduled to run at the very last instant they possibly could. If Pd
can have some variance, I can drop the threads and improve the
external's degradation under high CPU load.

thanks for the feedback (also, is the best list for this kind of feedback?),

-Seth





Re: [PD] Making a Realtime Convolution External

2011-04-04 Thread Seth Nickell
Another question on similar lines...

Are the DSP calls liable to vary t_signal->s_n (block size) without
notification? 64 samples, apparently the default on pd-extended, is
doable without buffering for partitioned convolution on a modern
computer, but it exacts a pretty high CPU toll, and if I have to
handle random blocksize changes, it gets more expensive.

Also, since convolution is much more efficient around block sizes of
256 or 512, perhaps I should default to one of these, buffer a little,
and have a "runatpdblocksize" message or somesuch?

On Mon, Apr 4, 2011 at 7:48 PM, Seth Nickell  wrote:
>>> 2) Anyone have requests for features/api? Its currently simplistic:
>>>   - takes a "read FILENAME" message, loads the file, does a test
>>> convolution against pink noise to normalize the gain to something sane
>>
>> Is this done within the main Pd audio thread?
>
> The convolution engine has support for doing it either on the calling
> thread, or a background thread. I'm thinking of default to a
> background thread. That seem like the right move?
>
>>>
>>>   - caches the last N impulse responses, as the test convolution
>>> takes a little time
>>>   - allows setting the cache size with a "cachesize N" message
>>
>> To make sure I understood this: cachesize is not the size of the first
>> partition of the partitioned convolution, but the cache that tries to avoid
>> audio dropouts when performing the test convolution?
>
> The convolution engine can swap-in a pre-loaded ('cached') IR in
> realtime without glitching... but it means keeping 2x the Impulse
> Response data in RAM. To keep the default API simple but useful, I'm
> defaulting to caching only the last 5 impulse responses in RAM.
> "cachesize N" lets you increase that number lets say in a
> performance you wanted to use 30 different impulse responses and you
> have 2GB of ram... should be nbd.
>
>>>
>>>   - disable normalization with "normalize 0" or "normalize 1"
>>
>> Yes, disabling this could be a good idea! You could also add a "gain 0-1"
>> message for manual control.
>
> Its worth noting that impulse responses are usually whack without gain
>  normalization like factors of hundreds to millions off a usable
> signal.
>
>>>  Features I'm considering (let me know if they sound useful):
>>>    - load from an array instead of from disk (no gain normalization?)
>>
>> Very good.
>>>
>>>    - It wouldn't be hard to enable MxN convolution if that floats
>>> somebody's boat.
>>
>> I am sure if you come up with a convolution as efficient and flexible as
>> jconv by Fons within Pd, then soon a multichannel use and hence request will
>> come up fast.
>
> I'd be interested in what flexibility means in this context, it might
> give me some good ideas for features to add. Efficiency-wise, last
> time I benchmarked its more efficient than jconv, but the difference
> is offset by less graceful degradation under CPU load (I convolve in
> background threads to preserve realtime in the main thread while
> avoiding an irritating patent that's going to expire soon...).
>
> WRT to Pd's audio scheduling... are Pd signal externals held to
> realtime or can my dsp call vary the number of cycles it takes by 100%
> from call to call? VST seems to do ok with this, but AudioUnits get
> scheduled to run at the very last instant they possibly could. If Pd
> can have some variance, I can drop the threads and improve the
> external's degradation under high CPU load.
>
> thanks for the feedback (also, is the best list for this kind of feedback?),
>
> -Seth
>



Re: [PD] Making a Realtime Convolution External

2011-04-04 Thread Seth Nickell
>> 2) Anyone have requests for features/api? Its currently simplistic:
>>   - takes a "read FILENAME" message, loads the file, does a test
>> convolution against pink noise to normalize the gain to something sane
>
> Is this done within the main Pd audio thread?

The convolution engine has support for doing it either on the calling
thread, or a background thread. I'm thinking of default to a
background thread. That seem like the right move?

>>
>>   - caches the last N impulse responses, as the test convolution
>> takes a little time
>>   - allows setting the cache size with a "cachesize N" message
>
> To make sure I understood this: cachesize is not the size of the first
> partition of the partitioned convolution, but the cache that tries to avoid
> audio dropouts when performing the test convolution?

The convolution engine can swap-in a pre-loaded ('cached') IR in
realtime without glitching... but it means keeping 2x the Impulse
Response data in RAM. To keep the default API simple but useful, I'm
defaulting to caching only the last 5 impulse responses in RAM.
"cachesize N" lets you increase that number; let's say in a
performance you wanted to use 30 different impulse responses and you
have 2GB of RAM... should be nbd.
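For what it's worth, the usual way to make such a swap glitch-free is to prepare the new partitioned IR entirely off the audio thread and publish it with a single atomic pointer store, so the audio thread never blocks. A hedged sketch with hypothetical names, not necessarily the engine's actual mechanism:

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct { const float *data; size_t len; } ir_t;

/* Published IR; the audio thread loads it, a control thread stores it. */
static _Atomic(ir_t *) current_ir;

/* Control thread: publish a fully prepared IR. The old IR must only be
 * freed once the audio thread can no longer be using it, e.g. after a
 * grace period or a reference-count check. */
void publish_ir(ir_t *prepared)
{
    atomic_store_explicit(&current_ir, prepared, memory_order_release);
}

/* Audio thread: grab the current IR without locking or blocking. */
ir_t *acquire_ir(void)
{
    return atomic_load_explicit(&current_ir, memory_order_acquire);
}
```

The release/acquire pair guarantees the audio thread sees the IR's contents fully written before it sees the new pointer.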

>>
>>   - disable normalization with "normalize 0" or "normalize 1"
>
> Yes, disabling this could be a good idea! You could also add a "gain 0-1"
> message for manual control.

It's worth noting that impulse responses are usually whack without gain
normalization: factors of hundreds to millions off a usable
signal.

>>  Features I'm considering (let me know if they sound useful):
>>    - load from an array instead of from disk (no gain normalization?)
>
> Very good.
>>
>>    - It wouldn't be hard to enable MxN convolution if that floats
>> somebody's boat.
>
> I am sure if you come up with a convolution as efficient and flexible as
> jconv by Fons within Pd, then soon a multichannel use and hence request will
> come up fast.

I'd be interested in what flexibility means in this context, it might
give me some good ideas for features to add. Efficiency-wise, last
time I benchmarked its more efficient than jconv, but the difference
is offset by less graceful degradation under CPU load (I convolve in
background threads to preserve realtime in the main thread while
avoiding an irritating patent that's going to expire soon...).

WRT Pd's audio scheduling... are Pd signal externals held to
realtime or can my dsp call vary the number of cycles it takes by 100%
from call to call? VST seems to do ok with this, but AudioUnits get
scheduled to run at the very last instant they possibly could. If Pd
can have some variance, I can drop the threads and improve the
external's degradation under high CPU load.

thanks for the feedback (also, is the best list for this kind of feedback?),

-Seth



Re: [PD] Making a Realtime Convolution External

2011-04-04 Thread Peter Plessas

Seth Nickell wrote:

I'm planning to release our realtime convolution engine (extracted
from http://meatscience.net/pages/convolution-reverb) as a GPLed Pd
external.

This is a good idea!


It currently accepts 4-channel ('true stereo'), two channel or mono
impulse responses, with stereo or mono output. Performance is

What is 'true stereo' with four channels by the way?


excellent if you have SSE3 and has a fallback in case you don't, and
it aims for accuracy (basically that means multi-stage scaling to keep
floats within healthy sizes).

1) I'd love to swipe the convolve~ external name, currently installed
by mjlib as part of pd-extended. convolve~ from mjlib appears to be a
copy of pin~ ? so I think it could be taken? Maybe I mis-read the
code. I've cc'ed mark who can probably clarify.

2) Anyone have requests for features/api? Its currently simplistic:
   - takes a "read FILENAME" message, loads the file, does a test
convolution against pink noise to normalize the gain to something sane

Is this done within the main Pd audio thread?

   - caches the last N impulse responses, as the test convolution
takes a little time
   - allows setting the cache size with a "cachesize N" message
To make sure I understood this: cachesize is not the size of the first 
partition of the partitioned convolution, but the cache that tries to 
avoid audio dropouts when performing the test convolution?

   - disable normalization with "normalize 0" or "normalize 1"
Yes, disabling this could be a good idea! You could also add a "gain 
0-1" message for manual control.



  Features I'm considering (let me know if they sound useful):
- load from an array instead of from disk (no gain normalization?)

Very good.

- It wouldn't be hard to enable MxN convolution if that floats
somebody's boat.
I am sure that if you come up with a convolution within Pd as efficient 
and flexible as Fons's jconv, requests for multichannel use will come 
up fast.
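For reference, MxN convolution generalizes the true-stereo case: each of the N outputs is the sum of the M inputs, each convolved through its own impulse response. A plain sketch (equal-length IRs per output assumed; names invented, not from the external):

```python
def convolve(x, h):
    """Direct-form convolution of two float lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def convolve_mxn(inputs, ir_matrix):
    """inputs: M signals; ir_matrix[m][n]: IR from input m to output n.
    Returns N output signals, each summing the M convolved input paths."""
    n_out = len(ir_matrix[0])
    outs = []
    for n in range(n_out):
        acc = None
        for m, sig in enumerate(inputs):
            y = convolve(sig, ir_matrix[m][n])
            acc = y if acc is None else [a + b for a, b in zip(acc, y)]
        outs.append(acc)
    return outs
```

True stereo is simply the M=2, N=2 instance of this, which is presumably why enabling MxN "wouldn't be hard".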


[...]

4) Would this be of interest for Pd-extended?
This would be of interest to all Pd users, whether they like their 
externals included in a distribution of Pd ('extended') or prefer to 
add them manually to their vanilla Pd.


best, P



[PD] Making a Realtime Convolution External

2011-04-04 Thread Seth Nickell
I'm planning to release our realtime convolution engine (extracted
from http://meatscience.net/pages/convolution-reverb) as a GPLed Pd
external.

It currently accepts 4-channel ('true stereo'), two-channel, or mono
impulse responses, with stereo or mono output. Performance is
excellent if you have SSE3, with a fallback in case you don't, and
it aims for accuracy (basically that means multi-stage scaling to keep
floats within healthy sizes).
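The post doesn't say how the engine is structured internally, but realtime convolution with long impulse responses is typically done with partitioned convolution: the input is processed in fixed-size blocks and the convolution tail is carried across block boundaries (overlap-add), with the per-block multiply done in the FFT domain. A time-domain sketch of just the overlap-add bookkeeping, illustrative only:

```python
def overlap_add_convolve(blocks, ir, block_size):
    """Process fixed-size input blocks against an IR, carrying the
    convolution tail across block boundaries (overlap-add).
    Time-domain for clarity; real engines do the per-block
    multiplication in the FFT domain."""
    tail = [0.0] * (len(ir) - 1)
    out_blocks = []
    for blk in blocks:
        full = [0.0] * (block_size + len(ir) - 1)
        for i, x in enumerate(blk):
            for j, h in enumerate(ir):
                full[i + j] += x * h
        # fold in the tail left over from previous blocks
        for i, t in enumerate(tail):
            full[i] += t
        out_blocks.append(full[:block_size])
        tail = full[block_size:]
    return out_blocks
```

Concatenating the output blocks reproduces the full convolution of the concatenated input, which is what lets such an engine run long IRs at a fixed per-block latency.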

1) I'd love to swipe the convolve~ external name, currently installed
by mjlib as part of pd-extended. convolve~ from mjlib appears to be a
copy of pin~, so I think it could be taken? Maybe I misread the
code. I've cc'ed Mark, who can probably clarify.

2) Anyone have requests for features/API? It's currently simplistic:
   - takes a "read FILENAME" message, loads the file, does a test
convolution against pink noise to normalize the gain to something sane
   - caches the last N impulse responses, as the test convolution
takes a little time
   - allows setting the cache size with a "cachesize N" message
   - disable normalization with "normalize 0" or "normalize 1"

  Features I'm considering (let me know if they sound useful):
- load from an array instead of from disk (no gain normalization?)
- It wouldn't be hard to enable MxN convolution if that floats
somebody's boat.

3) I can compile/test on Mac & Linux; anyone up for helping me with Windows?

4) Would this be of interest for Pd-extended?

5) I'd love to build a granular convolution engine that takes two
real-time signals and extracts grains from one to convolve against
the other. Anyone have ideas about this?
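One way to read idea 5, purely as a sketch of the concept and not a proposal for the actual engine: slice a windowed grain from signal A and use it as a short, time-varying impulse response for signal B. All names here are invented:

```python
import math

def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / max(n - 1, 1))
            for i in range(n)]

def grain_at(sig, start, size):
    """Windowed grain of `sig` starting at `start`, usable as a short IR.
    Wraps around so the sketch never runs off the end."""
    w = hann(size)
    return [sig[(start + i) % len(sig)] * w[i] for i in range(size)]

def granular_convolve(sig_a, sig_b, grain_size, block_size):
    """Time-varying convolution: each block of sig_b is convolved with
    the grain of sig_a that is 'current' at that time, with the grain
    position advancing in lockstep; tails overlap-add across blocks."""
    out = [0.0] * (len(sig_b) + grain_size - 1)
    for b0 in range(0, len(sig_b), block_size):
        grain = grain_at(sig_a, b0, grain_size)
        for i in range(b0, min(b0 + block_size, len(sig_b))):
            for j, g in enumerate(grain):
                out[i + j] += sig_b[i] * g
    return out
```

A realtime version would need to cross-fade between grains when they swap to avoid clicks, which is probably where the interesting design questions are.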

thanks all,

-Seth
