> Yeah, that's just the way that Windows rolls for any program that has a message loop.
Where did you get that information? A message loop does not create threads, it's the other way round: you create a message loop on existing threads, e.g. by creating a window (typically on the main thread).
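
For illustration, here is a bare-bones Win32 message pump (standard Windows API, nothing Pd-specific). Note that it simply runs on whichever thread calls it - typically the thread that created the window - and it never spawns a thread by itself:

#include <windows.h>

/* Classic Win32 message loop: it processes messages for the calling
 * thread's windows and creates no threads of its own. */
void run_message_loop(void)
{
    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0) > 0) {
        TranslateMessage(&msg);   /* translate raw keyboard input */
        DispatchMessage(&msg);    /* hand the message to the window procedure */
    }
}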

Not to mention that Pd does not contain any Windows event loops (except for some externals).


> z_ringbuffer appears to be deployed in pretty much only one context which appears to be pretty much isolated to always being in the same thread.
What do you mean by "in the same thread"? The ringbuffer is used for queuing outgoing messages; the message is sent on the audio thread and received on some other thread (e.g. the UI thread).

Just to be clear: both the portaudio and libpd ringbuffer are single-producer-single-consumer, i.e. there must only be a single writer thread and a single reader thread at a time.
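
To make the constraint concrete, here is a self-contained toy example in C11 (my own sketch, not the PA/libpd/JACK code): a one-slot "mailbox" handed from exactly one producer thread to exactly one consumer thread. With a second producer storing into the same slot you would immediately have a data race; that is the restriction SPSC designs rely on.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int payload;        /* the data being handed over (non-atomic) */
static atomic_int ready;   /* 1 = payload has been published          */

static void *producer(void *arg)   /* e.g. the audio thread */
{
    (void)arg;
    payload = 42;                                           /* write the data */
    atomic_store_explicit(&ready, 1, memory_order_release); /* then publish   */
    return NULL;
}

static void *consumer(void *arg)   /* e.g. the UI thread */
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                               /* spin until the producer publishes */
    printf("got %d\n", payload);        /* guaranteed to see 42              */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}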


> As I said above, I am pretty sure that z_ringbuffer's use of atomics is actually incorrect.
It's only "wrong" in the sense that it uses more complex atomics and stronger memory order than required, but it does not make the code incorrect.

To be more specific: the libpd ringbuffer uses atomic read-modify-write operations (with dummy arguments) instead of atomic loads and stores. Again, these are hacks from pre-C11 times. Unfortunately, the C11 version follows this pattern instead of using the more appropriate atomic_load[_explicit] and atomic_store[_explicit] functions. Also, it implicitly uses the default memory ordering (memory_order_seq_cst) for all atomic operations, which is much stronger than what we actually need.
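
For illustration, this is roughly what it could look like with C11 loads and stores and explicit memory orders (a sketch only: 'head' stands in for a ring buffer index, it is not the actual libpd code):

#include <stdatomic.h>

static atomic_int head;   /* stand-in for a read or write index */

static inline int load_head(void)
{
    /* acquire: everything the other thread wrote before its matching
     * release store is visible after this load */
    return atomic_load_explicit(&head, memory_order_acquire);
}

static inline void store_head(int value)
{
    /* release: all preceding writes to the buffer become visible to the
     * thread that acquire-loads this index */
    atomic_store_explicit(&head, value, memory_order_release);
}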


> I haven't yet taken the time to prove this as the z_ringbuffer call sites are very limited, and it clearly works in context
SPSC ringbuffers are a solved problem and one of the most basic lockfree data structures there is. If you understand how atomics work, you can just look at the code and immediately tell if it's correct or not. There's not much to "prove". Here's how a lockfree SPSC ring buffer works, using the libpd ringbuffer API as an example (a minimal C11 sketch follows the list):

1. rb_available_to_write: atomically load and compare both the read and write pointer. Actually, the loads can be relaxed (the proof is left as an exercise to the reader :), but it does not hurt to use memory_order_acquire (or higher).

2. rb_available_to_read: same as 1.

3. rb_write_to_buffer: first load the write head and write the data, then increment it and store it back atomically with memory_order_release (or higher).

4. rb_read_from_buffer: first load the read head and read the data, then increment it and store it back atomically with memory_order_release (or higher).
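
To make steps 1-4 concrete, here is a minimal C11 sketch. The function names mirror the libpd API from the list above, but the signatures and layout (power-of-two size, one byte kept free to tell "full" from "empty") are my own choices, not the actual libpd code:

#include <stdatomic.h>

#define RB_SIZE 1024u   /* power of two, so '& (RB_SIZE - 1)' wraps the indices */

typedef struct {
    char data[RB_SIZE];
    atomic_uint readhead;   /* only ever stored by the consumer */
    atomic_uint writehead;  /* only ever stored by the producer */
} spsc_rb;

/* step 1: load and compare both indices (acquire used here; relaxed may do, as argued above) */
static unsigned rb_available_to_write(spsc_rb *rb)
{
    unsigned r = atomic_load_explicit(&rb->readhead, memory_order_acquire);
    unsigned w = atomic_load_explicit(&rb->writehead, memory_order_acquire);
    return (r - w - 1u) & (RB_SIZE - 1u);   /* one byte is kept free */
}

/* step 2: same as step 1 */
static unsigned rb_available_to_read(spsc_rb *rb)
{
    unsigned r = atomic_load_explicit(&rb->readhead, memory_order_acquire);
    unsigned w = atomic_load_explicit(&rb->writehead, memory_order_acquire);
    return (w - r) & (RB_SIZE - 1u);
}

/* step 3: producer only; the caller must check rb_available_to_write() first */
static void rb_write_to_buffer(spsc_rb *rb, const char *src, unsigned n)
{
    unsigned w = atomic_load_explicit(&rb->writehead, memory_order_relaxed);
    for (unsigned i = 0; i < n; i++)
        rb->data[(w + i) & (RB_SIZE - 1u)] = src[i];
    /* the release store publishes the data written above */
    atomic_store_explicit(&rb->writehead, (w + n) & (RB_SIZE - 1u),
                          memory_order_release);
}

/* step 4: consumer only; the caller must check rb_available_to_read() first */
static void rb_read_from_buffer(spsc_rb *rb, char *dst, unsigned n)
{
    unsigned r = atomic_load_explicit(&rb->readhead, memory_order_relaxed);
    for (unsigned i = 0; i < n; i++)
        dst[i] = rb->data[(r + i) & (RB_SIZE - 1u)];
    /* the release store hands the consumed space back to the producer */
    atomic_store_explicit(&rb->readhead, (r + n) & (RB_SIZE - 1u),
                          memory_order_release);
}

Note that only the producer ever stores writehead and only the consumer ever stores readhead; a second writer (or reader) thread would break the scheme, which is exactly the SPSC restriction mentioned above.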

Now where do you think the libpd ringbuffer implementation is incorrect in the sense that it would cause race conditions?

For comparison, here's my own little implementation in C++11: https://github.com/Spacechild1/vstplugin/blob/3f0ed8a800ea238bf204a2ead940b2d1324ac909/vst/Lockfree.h#L10-L58 (which you can assume to be correct :)

If you still don't trust the libpd ringbuffer, feel free to use that instead. I hope you are not using C, but if you are, the code can easily be adapted to the equivalent C11 functions (see https://en.cppreference.com/w/c/thread).

Christof

On 11.04.2024 14:08, Caoimhe &co wrote:
On Thu, 11 Apr 2024 at 07:52, Christof Ressi <i...@christofressi.com> wrote:

>> I get at least four threads.

> Just for some context: here on Windows I get 10 threads when I open Pd and start DSP, but only 2 of these are active and the remaining 8 are idle.

Yeah, that's just the way that Windows rolls for any program that has a message loop. It's also different when you run under a debugger than when you run standalone. Why yes, I don't get paid enough to do deep Windows debugging, why do you ask?

> The Pd core itself does not spawn any threads, only the audio backend and certain objects/externals do (notably [readsf~] and [writesf~]).

I was aware of that as the standard story, which is why I was surprised to see four threads, and that the count was (relatively) invariant even when switching to more 'primitive' back-ends like OSS. It seems like this is a bit of lore that should be known, if not particularly documented anywhere.

> But why do you care about the number of threads in the first place?

Because I am working on code which is trying to handle some of the *other* JACK data streams. Ambiguity in thread functionality makes for ambiguity in debugging.

>> As an aside: is the code in z_ringbuffer.{c,h} considered trustworthy? I note that the other code in PD appears to use the sys_ringbuffer* API, which seems to be built on the PA ringbuffer.

> Is the PA ringbuffer considered trustworthy?

Well it is *in use*, which means that *somebody* considers it trustworthy (in a multi-threaded context). z_ringbuffer appears to be deployed in pretty much only one context which appears to be pretty much isolated to always being in the same thread. But see my above question about threading. I had debugging artifacts which looked like z_ringbuffer was behaving badly under thread race conditions, so I looked at the code, and I am pretty sure that z_ringbuffer's use of atomics is actually incorrect. I haven't yet taken the time to prove this as the z_ringbuffer call sites are very limited.

> Note that the ringbuffer code in "s_audio_ringbuf.c" - for whatever reason - is missing all the memory barriers from the original PA implementation. This happens to work as the implementation is in another source file and (non-inline) function calls act as compiler barriers and Intel has a strong memory model, but if compiled with LTO this code may very well fail on other platforms, particularly on ARM.

Yeah, I saw that. It's actually *worse* because the header file's multi-include protection uses the same preprocessor symbol as the JACK ringbuffer implementation. I fixed that in my local git repo. How do I know this? After a light code read, I switched to using the JACK ringbuffer implementation, which I *do* trust.

>> I ask because I had some problems with z_ringbuffer.c and after a code read, there are some
>> bits which look sketchy enough to me that I decided to stop using it.

> Which problems did you have? And which bits look sketchy? There are some things that could be improved. The original code has been written before C11, i.e. before C/C++ got an official memory model. As a consequence, the platform specific atomic instructions / memory barriers are stronger than required. In general, SYNC_FETCH should really be called SYNC_LOAD and SYNC_COMPARE_AND_SWAP should be called SYNC_STORE. With C11, SYNC_LOAD could be just an atomic_load (with memory_order_acquire) and SYNC_STORE could be an atomic_store (with memory_order_release).
> Apart from that, the code looks fine to me.

As I said above, I am pretty sure that z_ringbuffer's use of atomics is actually incorrect. I haven't yet taken the time to prove this as the z_ringbuffer call sites are very limited, and it clearly works in context. I don't think it will work in a more difficult context. The last time I worked with atomics, I ended up writing an automated proof checker to make sure that all of the cases worked correctly. I work with PD when I want to make music more than I want to do advanced CS. And then there's the question of the marginal utility of PD having its own implementation of a ringbuffer, but I will leave that for all of you who have dedicated far more time than I to the maintenance of this project.

- c&co

_______________________________________________
Pd-dev mailing list
Pd-dev@lists.iem.at
https://lists.puredata.info/listinfo/pd-dev