It looks like I have not had much luck getting attention, so let me try to
re-up this. Can we implement a threaded [block~]? See details below.
From: Giulio Moro <[email protected]>
To: Pd-List <[email protected]>
Sent: Sunday, 18 September 2016, 2:23
Subject: Threading in Pd/libpd
Hi all,
If I understand correctly, using the [block~] and [switch~] objects to
increase the blocksize for a given subpatch means that the DSP computation for
that subpatch is delayed until enough input samples have been collected, at
which point the entire DSP stack for the subpatch is performed at once and the
outputs are written to the output buffer.
This means that the DSP load is not spread over time; rather, it is
concentrated in the single audio driver callback in which the buffer for that
subpatch happens to be ready to be processed.
Now, if what I say makes sense, then this approach has the disadvantage that
the CPU load is not spread evenly across audio callbacks, eventually causing
dropouts if whatever computation takes too long in that one callback, forcing
you to increase the internal buffering of Pd (``Delay'') to cope with this. At
the same time, though, the CPU will be pretty much idle in all the other audio
callbacks.
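To put rough numbers on this, here is a small back-of-the-envelope sketch. The 44100 Hz sample rate, the 64-sample callback and the 5 ms computation time are assumptions picked for illustration, and the buffering model is deliberately simplified (it only counts how far a single long callback overruns its period):

```python
def callback_period_ms(callback_size, sample_rate):
    """Time available to compute one audio callback, in milliseconds."""
    return 1000.0 * callback_size / sample_rate


def extra_buffering_ms(compute_ms, callback_size, sample_rate):
    """Simplified model: buffering needed to absorb one long callback."""
    return max(0.0, compute_ms - callback_period_ms(callback_size, sample_rate))


# Assumed figures: 64-sample callbacks at 44100 Hz, one 5 ms computation.
period = callback_period_ms(64, 44100)      # about 1.45 ms per callback
extra = extra_buffering_ms(5.0, 64, 44100)  # overrun that buffering must hide
print(f"period: {period:.2f} ms, extra buffering: {extra:.2f} ms")
```

Under these assumptions, any computation that fits within the callback period needs no extra buffering at all, which is the situation the rest of this message tries to get back to.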
If we could spread the load of the expensive, but occasional, computation (say
an FFT) over multiple audio callbacks, then the CPU load would be more even,
with no spikes, and there would be no need to increase Pd's internal buffering.
This would require making the output of the FFT available a few processing
blocks after the one where it was started, whereas the current approach makes
it available immediately. Some fine-tuning of the system would be required to
understand how much this latency should be, and worst case it would be the
number of overlap samples as set by [block~] (as in: if the system cannot
process these blocks fast enough, then you should lower your requirements, as
your system cannot provide the required throughput). Now this may seem a
downside, but the actual overall roundtrip latency of the Pd subpatch would
not be much larger than the one currently achievable (if at all larger), with
the added advantage that the rest of Pd could work at smaller blocksizes, and
with a ``Delay'' set to 0.
The ultimate advantage would be a more responsive system, in terms of I/O
roundtrip, for most of the patch, except those subpatches where a longer
latency is imposed anyway by the algorithm. Think, for instance, of a patch
processing the live sound of an instrument, which also uses [sigmund~] to
detect its pitch and apply some adaptive effect. A low roundtrip latency could
be used for the processed instrument, while the latency imposed by [sigmund~]
would only affect e.g. the parameters of the effect. I can see this approach
being useful in many cases.
Multi-core hardware would take extra advantage of this way of spreading the
CPU usage.
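The trade-off between peak load and added latency can be illustrated with a toy model. This is only a sketch: `spread_costs` is a hypothetical helper, the cost of one large block is normalised to 1.0, overlap is ignored, and the work is assumed to divide evenly across callbacks, which real DSP code will not do exactly.

```python
def spread_costs(callback_size, block_size, n_callbacks, spread, cost_per_block=1.0):
    """Per-callback DSP cost when each large block's work is divided
    evenly over `spread` callbacks, starting when the block is complete.
    spread=1 models the current all-at-once behaviour."""
    costs = [0.0] * n_callbacks
    collected = 0
    for i in range(n_callbacks):
        collected += callback_size
        if collected >= block_size:
            collected -= block_size
            # Divide the block's cost over this and the following callbacks.
            for j in range(i, min(i + spread, n_callbacks)):
                costs[j] += cost_per_block / spread
    return costs


at_once = spread_costs(64, 256, 8, spread=1)
spread4 = spread_costs(64, 256, 8, spread=4)
print("peak, all at once:", max(at_once))
print("peak, spread over 4 callbacks:", max(spread4))
```

In this model the total work is unchanged, but the peak per-callback cost drops by the spreading factor, at the price of the result arriving `spread - 1` callbacks later.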
I am in the situation where I hacked together a threaded version of [sigmund~]
for use with libpd on Bela, which works fine, and I am wondering whether it is
worth going down the route of making threaded versions of all objects with
similar requirements (which I really would not want to do), or whether I should
rather try to create some higher-level object (say [blockThread~]) that
performs the threading strategy mentioned above.
It may be that [pd~] could perhaps provide the solution requested, but it seems
to me there is a lot of overhead associated with it, and I do not see how to
easily integrate it with our use of libpd.
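As a rough illustration of what such a higher-level object might do, here is a minimal sketch of the hand-off between the audio callback and a worker thread. It is not Pd or libpd code: `ThreadedBlock`, `process` and `close` are hypothetical names, Python queues stand in for the lock-free FIFOs a real-time implementation would need, and overlap is ignored.

```python
import queue
import threading


class ThreadedBlock:
    """Hypothetical sketch: run `fn` on large blocks in a worker thread."""

    def __init__(self, fn, block_size, callback_size):
        assert block_size % callback_size == 0
        self.fn = fn
        self.block_size = block_size
        self.callback_size = callback_size
        self.pending = []             # input samples collected so far
        self.out = []                 # processed samples not yet delivered
        self.jobs = queue.Queue()     # full input blocks for the worker
        self.results = queue.Queue()  # processed blocks from the worker
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            block = self.jobs.get()
            if block is None:         # sentinel posted by close()
                break
            self.results.put(self.fn(block))
            self.jobs.task_done()

    def process(self, samples):
        """Called once per audio callback; never waits for the worker."""
        # Pick up whatever the worker has finished, without blocking.
        try:
            while True:
                self.out.extend(self.results.get_nowait())
        except queue.Empty:
            pass
        # Collect input and hand a full block over to the worker.
        self.pending.extend(samples)
        if len(self.pending) >= self.block_size:
            self.jobs.put(self.pending[:self.block_size])
            self.pending = self.pending[self.block_size:]
        # Deliver processed samples, or silence if none are ready yet.
        if len(self.out) >= self.callback_size:
            chunk = self.out[:self.callback_size]
            self.out = self.out[self.callback_size:]
            return chunk
        return [0.0] * self.callback_size

    def close(self):
        self.jobs.put(None)
        self.worker.join()
```

A quick usage example, where the doubling function stands in for an expensive computation:

```python
tb = ThreadedBlock(lambda block: [2 * x for x in block], block_size=4, callback_size=2)
print(tb.process([1, 2]))  # nothing processed yet, so silence
tb.close()
```

In this sketch the output simply starts whenever the first result happens to be ready. A real object would more likely fix the latency in advance and treat a late result as an underrun, which is the fine-tuning mentioned above.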
So, probably this point has been discussed previously; I'd like to know:
- are there any existing objects doing this already?
- what are the pitfalls that prevented such an approach from making its way
  into Pd?
- how can I help?
Best,
Giulio