> Yeah, that's just the way that Windows rolls for any program that has a message loop.
Where did you get that information? A message loop does not create threads, it's the other way round: you create a message loop on existing threads, e.g. by creating a window (typically on the main thread).
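
For illustration, here is a bare-bones Win32 message pump (standard Windows API, nothing Pd-specific). Note that it simply runs on whichever thread calls it - typically the thread that created the window - and it never spawns a thread by itself:

#include <windows.h>

/* Classic Win32 message loop: it processes messages for the calling
 * thread's windows and creates no threads of its own. */
void run_message_loop(void)
{
    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0) > 0) {
        TranslateMessage(&msg);   /* translate raw keyboard input */
        DispatchMessage(&msg);    /* hand the message to the window procedure */
    }
}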

Not to mention that Pd does not contain any Windows event loops (except for some externals).


> z_ringbuffer appears to be deployed in pretty much only one context which appears to be pretty much isolated to always being in the same thread.
What do you mean by "in the same thread"? The ringbuffer is used for queuing outgoing messages; the message is sent on the audio thread and received on some other thread (e.g. the UI thread).

Just to be clear: both the portaudio and libpd ringbuffer are single-producer-single-consumer, i.e. there must only be a single writer thread and a single reader thread at a time.
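
To make the constraint concrete, here is a self-contained toy example in C11 (my own sketch, not the PA/libpd/JACK code): a one-slot "mailbox" handed from exactly one producer thread to exactly one consumer thread. With a second producer storing into the same slot you would immediately have a data race; that is the restriction SPSC designs rely on.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int payload;        /* the data being handed over (non-atomic) */
static atomic_int ready;   /* 1 = payload has been published          */

static void *producer(void *arg)   /* e.g. the audio thread */
{
    (void)arg;
    payload = 42;                                           /* write the data */
    atomic_store_explicit(&ready, 1, memory_order_release); /* then publish   */
    return NULL;
}

static void *consumer(void *arg)   /* e.g. the UI thread */
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                               /* spin until the producer publishes */
    printf("got %d\n", payload);        /* guaranteed to see 42              */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}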


> As I said above, I am pretty sure that z_ringbuffer's use of atomics is actually incorrect.
It's only "wrong" in the sense that it uses more complex atomics and stronger memory order than required, but it does not make the code incorrect.

To be more specific: the libpd ringbuffer uses atomic read-modify-write operations (with dummy arguments) instead of atomic loads and stores. Again, these are hacks from pre-C11 times. Unfortunately, the C11 version follows this pattern instead of using the more appropriate atomic_load[_explicit] and atomic_store[_explicit] functions. Also, it implicitly uses the default memory ordering (memory_order_seq_cst) for all atomic operations, which is much stronger than what we actually need.
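
For illustration, this is roughly what it could look like with C11 loads and stores and explicit memory orders (a sketch only: 'head' stands in for a ring buffer index, it is not the actual libpd code):

#include <stdatomic.h>

static atomic_int head;   /* stand-in for a read or write index */

static inline int load_head(void)
{
    /* acquire: everything the other thread wrote before its matching
     * release store is visible after this load */
    return atomic_load_explicit(&head, memory_order_acquire);
}

static inline void store_head(int value)
{
    /* release: all preceding writes to the buffer become visible to the
     * thread that acquire-loads this index */
    atomic_store_explicit(&head, value, memory_order_release);
}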


> I haven't yet taken the time to prove this as the z_ringbuffer call sites are very limited, and it clearly works in context
SPSC ringbuffers are a solved problem and one of the most basic lockfree data structures there is. If you understand how atomics work, you can just look at the code and immediately tell if it's correct or not. There's not much to "prove". Here's how a lockfree SPSC ring buffer works, using the libpd ringbuffer API as an example (a minimal C11 sketch follows the list):

1. rb_available_to_write: atomically load and compare both the read and write pointer. Actually, the loads can be relaxed (the proof is left as an exercise to the reader :), but it does not hurt to use memory_order_acquire (or higher).

2. rb_available_to_read: same as 1.

3. rb_write_to_buffer: first load the write head and write the data, then increment it and store it back atomically with memory_order_release (or higher).

4. rb_read_from_buffer: first load the read head and read the data, then increment it and store it back atomically with memory_order_release (or higher).
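
To make steps 1-4 concrete, here is a minimal C11 sketch. The function names mirror the libpd API from the list above, but the signatures and layout (power-of-two size, one byte kept free to tell "full" from "empty") are my own choices, not the actual libpd code:

#include <stdatomic.h>

#define RB_SIZE 1024u   /* power of two, so '& (RB_SIZE - 1)' wraps the indices */

typedef struct {
    char data[RB_SIZE];
    atomic_uint readhead;   /* only ever stored by the consumer */
    atomic_uint writehead;  /* only ever stored by the producer */
} spsc_rb;

/* step 1: load and compare both indices (acquire used here; relaxed may do, as argued above) */
static unsigned rb_available_to_write(spsc_rb *rb)
{
    unsigned r = atomic_load_explicit(&rb->readhead, memory_order_acquire);
    unsigned w = atomic_load_explicit(&rb->writehead, memory_order_acquire);
    return (r - w - 1u) & (RB_SIZE - 1u);   /* one byte is kept free */
}

/* step 2: same as step 1 */
static unsigned rb_available_to_read(spsc_rb *rb)
{
    unsigned r = atomic_load_explicit(&rb->readhead, memory_order_acquire);
    unsigned w = atomic_load_explicit(&rb->writehead, memory_order_acquire);
    return (w - r) & (RB_SIZE - 1u);
}

/* step 3: producer only; the caller must check rb_available_to_write() first */
static void rb_write_to_buffer(spsc_rb *rb, const char *src, unsigned n)
{
    unsigned w = atomic_load_explicit(&rb->writehead, memory_order_relaxed);
    for (unsigned i = 0; i < n; i++)
        rb->data[(w + i) & (RB_SIZE - 1u)] = src[i];
    /* the release store publishes the data written above */
    atomic_store_explicit(&rb->writehead, (w + n) & (RB_SIZE - 1u),
                          memory_order_release);
}

/* step 4: consumer only; the caller must check rb_available_to_read() first */
static void rb_read_from_buffer(spsc_rb *rb, char *dst, unsigned n)
{
    unsigned r = atomic_load_explicit(&rb->readhead, memory_order_relaxed);
    for (unsigned i = 0; i < n; i++)
        dst[i] = rb->data[(r + i) & (RB_SIZE - 1u)];
    /* the release store hands the consumed space back to the producer */
    atomic_store_explicit(&rb->readhead, (r + n) & (RB_SIZE - 1u),
                          memory_order_release);
}

Note that only the producer ever stores writehead and only the consumer ever stores readhead; a second writer (or reader) thread would break the scheme, which is exactly the SPSC restriction mentioned above.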

Now where do you think the libpd ringbuffer implementation is incorrect in the sense that it would cause race conditions?

For comparison, here's my own little implementation in C++11: https://github.com/Spacechild1/vstplugin/blob/3f0ed8a800ea238bf204a2ead940b2d1324ac909/vst/Lockfree.h#L10-L58 (which you can assume to be correct :)

If you still don't trust the libpd ringbuffer, feel free to use that instead. I hope you are not using C, but if you are, the code can easily be adapted to the equivalent C11 functions (see https://en.cppreference.com/w/c/thread).

Christof

On 11.04.2024 14:08, Caoimhe &co wrote:
On Thu, 11 Apr 2024 at 07:52, Christof Ressi <i...@christofressi.com> wrote:

>> I get at least four threads.

> Just for some context: here on Windows I get 10 threads when I open Pd and start DSP, but only 2 of these are active and the remaining 8 are idle.

Yeah, that's just the way that Windows rolls for any program that has a message loop. It's also different when you run under a debugger than when you run standalone. Why yes, I don't get paid enough to do deep Windows debugging, why do you ask?

> The Pd core itself does not spawn any threads, only the audio backend and certain objects/externals do (notably [readsf~] and [writesf~]).

I was aware of that as the standard story, which is why I was surprised to see four threads, and that the count was (relatively) invariant even when switching to more 'primitive' back-ends like OSS. It seems like this is a bit of lore that should be known, if not particularly documented anywhere.

> But why do you care about the number of threads in the first place?

Because I am working on code which is trying to handle some of the *other* JACK data streams. Ambiguity in thread functionality makes for ambiguity in debugging.

>> As an aside: is the code in z_ringbuffer.{c,h} considered trustworthy? I note that the other code in PD appears to use the sys_ringbuffer* API, which seems to be built on the PA ringbuffer.

> Is the PA ringbuffer considered trustworthy?

Well it is *in use*, which means that *somebody* considers it trustworthy (in a multi-threaded context). z_ringbuffer appears to be deployed in pretty much only one context which appears to be pretty much isolated to always being in the same thread. But see my above question about threading. I had debugging artifacts which looked like z_ringbuffer was behaving badly under thread race conditions, so I looked at the code, and I am pretty sure that z_ringbuffer's use of atomics is actually incorrect. I haven't yet taken the time to prove this as the z_ringbuffer call sites are very limited.

> Note that the ringbuffer code in "s_audio_ringbuf.c" - for whatever reason - is missing all the memory barriers from the original PA implementation. This happens to work as the implementation is in another source file and (non-inline) function calls act as compiler barriers and Intel has a strong memory model, but if compiled with LTO this code may very well fail on other platforms, particularly on ARM.

Yeah, I saw that. It's actually *worse* because the header file's multi-include protection uses the same preprocessor symbol as the JACK ringbuffer implementation. I fixed that in my local git repo. How do I know this? After a light code read, I switched to using the JACK ringbuffer implementation, which I *do* trust.

>> I ask because I had some problems with z_ringbuffer.c and after a code read, there are some
>> bits which look sketchy enough to me that I decided to stop using it.

> Which problems did you have? And which bits look sketchy? There are some things that could be improved. The original code has been written before C11, i.e. before C/C++ got an official memory model. As a consequence, the platform specific atomic instructions / memory barriers are stronger than required. In general, SYNC_FETCH should really be called SYNC_LOAD and SYNC_COMPARE_AND_SWAP should be called SYNC_STORE. With C11, SYNC_LOAD could be just an atomic_load (with memory_order_acquire) and SYNC_STORE could be an atomic_store (with memory_order_release).
> Apart from that, the code looks fine to me.

As I said above, I am pretty sure that z_ringbuffer's use of atomics is actually incorrect. I haven't yet taken the time to prove this as the z_ringbuffer call sites are very limited, and it clearly works in context. I don't think it will work in a more difficult context. The last time I worked with atomics, I ended up writing an automated proof checker to make sure that all of the cases worked correctly. I work with PD when I want to make music more than I want to do advanced CS. And then there's the question of the marginal utility of PD having its own implementation of a ringbuffer, but I will leave that for all of you who have dedicated far more time than I to the maintenance of this project.

- c&co

_______________________________________________
Pd-dev mailing list
Pd-dev@lists.iem.at
https://lists.puredata.info/listinfo/pd-dev