> Yeah, that's just the way that Windows rolls for any program that has
> a message loop.
Where did you get that information? A message loop does not create
threads; it's the other way round: you create a message loop on an
existing thread, e.g. by creating a window (typically on the main thread).
Not to mention that Pd does not contain any Windows event loops (except
for some externals).
> z_ringbuffer appears to be deployed in pretty much only one context
> which appears to be pretty much isolated to always being in the same
> thread.
What do you mean by "in the same thread"? The ringbuffer is used for
queuing outgoing messages; the message is sent on the audio thread and
received on some other thread (e.g. the UI thread).
Just to be clear: both the portaudio and libpd ringbuffer are
single-producer-single-consumer, i.e. there must only be a single writer
thread and a single reader thread at a time.
> As I said above, I am pretty sure that z_ringbuffer's use of atomics
> is actually incorrect.
It's only "wrong" in the sense that it uses more complex atomics and
stronger memory order than required, but it does not make the code
incorrect.
To be more specific: the libpd ringbuffer uses atomic read-modify-write
operations (with dummy arguments) instead of atomic loads and stores.
Again, these are hacks from pre-C11 times. Unfortunately, the C11
version follows this pattern instead of using the more appropriate
atomic_load[_explicit] and atomic_store[_explicit] functions. Also, it
implicitly uses the default memory ordering (memory_order_seq_cst) for
all atomic operations, which is much stronger than what we actually need.
> I haven't yet taken the time to prove this as the z_ringbuffer call
> sites are very limited, and it clearly works in context.
SPSC ringbuffers are a solved problem and one of the most basic lockfree
datastructures there is. If you understand how atomics work you can just
look at the code and immediately tell if it's correct or not. There's
not much to "prove". Here's how a lockfree SPSC ring buffer works, using
the libpd ringbuffer API as an example:
1. rb_available_to_write: atomically load and compare both the read and
write pointers. Actually, the loads can be relaxed (the proof is left as
an exercise to the reader :), but it does not hurt to use
memory_order_acquire (or higher).
2. rb_available_to_read: same as 1.
3. rb_write_to_buffer: load the write head and write the data, then
increment it and store it back atomically with memory_order_release (or
higher).
4. rb_read_from_buffer: load the read head and read the data, then
increment it and store it back atomically with memory_order_release (or
higher).
Now where do you think the libpd ringbuffer implementation is incorrect
in the sense that it would cause race conditions?
For comparison, here's my own little implementation in C++11:
https://github.com/Spacechild1/vstplugin/blob/3f0ed8a800ea238bf204a2ead940b2d1324ac909/vst/Lockfree.h#L10-L58
(which you can assume to be correct :)
If you still don't trust the libpd ringbuffer, feel free to use that
instead. I hope you are not using C, but if you do, the code can be
easily adapted to the equivalent C11 functions (see
https://en.cppreference.com/w/c/thread).
Christof
On 11.04.2024 14:08, Caoimhe &co wrote:
On Thu, 11 Apr 2024 at 07:52, Christof Ressi <i...@christofressi.com>
wrote:
> I get at least four threads. Just for some context: here on Windows
> I get 10 threads when I open Pd and start DSP, but only 2 of these are
> active and the remaining 8 are idle.
Yeah, that's just the way that Windows rolls for any program that has
a message loop. It's also different when you run under a debugger than
when you run standalone. Why yes, I do not get paid enough to do deep
Windows debugging, why do you ask?
> The Pd core itself does not spawn any threads, only the audio
> backend and certain objects/externals do (notably [readsf~] and
> [writesf~]).
I was aware of that as the standard story, which is why I was
surprised to see four threads, and that the count was (relatively)
invariant even when you switched to more 'primitive' back-ends like
OSS. It seems like this is a bit of lore that should be known, even if
it isn't particularly documented anywhere.
> But why do you care about the number of threads in the first place?
Because I am working on code which is trying to handle some of the
*other* JACK data streams. Ambiguity in thread functionality makes for
ambiguity in debugging.
>> As an aside: is the code in z_ringbuffer.{c,h} considered
>> trustworthy? I note that the other code in PD
>> appears to use the sys_ringbuffer* API, which seems to be built on
>> the PA ringbuffer.
> Is the PA ringbuffer considered trustworthy?
Well it is *in use*, which means that *somebody* considers it
trustworthy (in a multi-threaded context). z_ringbuffer appears to be
deployed in pretty much only one context which appears to be pretty
much isolated to always being in the same thread. But see my above
question about threading. I had debugging artifacts which looked like
z_ringbuffer was behaving badly under thread race conditions, so I
looked at the code, and I am pretty sure that z_ringbuffer's use of
atomics is actually incorrect. I haven't yet taken the time to prove
this as the z_ringbuffer call sites are very limited.
> Note that the ringbuffer code in "s_audio_ringbuf.c" - for whatever
> reason - is missing all the memory
> barriers from the original PA implementation. This happens to work
> as the implementation is in another
> source file and (non-inline) function calls act as compiler barriers
> and Intel has a strong memory model,
> but if compiled with LTO this code may very well fail on other
> platforms, particularly on ARM.
Yeah, I saw that. It's actually *worse* because the header file
multi-include protection uses the same preprocessor symbol as the JACK
ringbuffer implementation. I fixed that in my local git repo. How do I
know this? After a light code read, I switched to using the JACK
ringbuffer implementation, which I *do* trust.
>> I ask because I had some problems with z_ringbuffer.c and after a
>> code read, there are some
>> bits which look sketchy enough to me that I decided to stop using it.
> Which problems did you have? And which bits look sketchy? There are
> some things that could be
> improved. The original code has been written before C11, i.e. before
> C/C++ got an official memory
> model. As a consequence, the platform specific atomic instructions /
> memory barriers are stronger
> than required. In general, SYNC_FETCH should really be called
> SYNC_LOAD and SYNC_COMPARE_AND_SWAP
> should be called SYNC_STORE. With C11, SYNC_LOAD could be just an
> atomic_load (with
> memory_order_acquire) and SYNC_STORE could be an atomic_store (with
> memory_order_release).
> Apart from that, the code looks fine to me.
As I said above, I am pretty sure that z_ringbuffer's use of atomics
is actually incorrect. I haven't yet taken the time to prove this as
the z_ringbuffer call sites are very limited, and it clearly works in
context. I don't think it will work in a more difficult context. The
last time I worked with atomics, I ended up writing an automated proof
checker to make sure that all of the cases worked correctly. I work
with PD when I want to make music more than I want to do advanced CS.
And then there's the question of the marginal utility of PD having its
own implementation of a ringbuffer, but I will leave that for all of
you who have dedicated far more time than I to the maintenance of this
project.
- c&co
_______________________________________________
Pd-dev mailing list
Pd-dev@lists.iem.at
https://lists.puredata.info/listinfo/pd-dev