Tim Blechmann wrote:
> hi niklas,
>
> i'm curious about your implementation:
> - have you been doing some profiling of the scheduling overhead?
> - which latency settings have you been using? it would be great to know
>   the worst-case response times of the locking synchronization ...
>
Hey tim&chuck,
as i said, i just did it for fun, so i did no profiling after everything
looked promising enough to become uninteresting again. i hope to get some
time in the next week(s) to write the basic workings down. the idea behind
it isn't really complicated, which might imply several inconsistencies.
i don't think sharing the code will do any good since it is a complete
mess, but i do think that i can write it down in a more formal way. this
will also allow for a better discussion and analysis.

so long...
Niklas

> in general, the expressive power of dataflow languages in terms of
> parallelism is really amazing, however neither pd nor nova are
> general-purpose programming languages, but low-latency soft-realtime
> audio programming languages, which makes a usable implementation rather
> complex ...
>
> cheers, tim
>
> On Thu, 2007-05-31 at 02:19 +0200, Niklas Klügel wrote:
>> Tim Blechmann wrote:
>>> On Wed, 2007-05-30 at 12:13 +0200, Niklas Klügel wrote:
>>>>> I think it depends on the application.... for the most part, we
>>>>> can't get a generic speedup from using multiple cores (forgive me
>>>>> if wrong) that would apply to every single pd program..... but some
>>>>> types of computations such as large ffts can be performed faster
>>>>> when distributed to different cores, in which case, the code for
>>>>> the fft has to be parallelized a priori. Plus, the memory is
>>>>> tricky. You can have a memory access bottleneck, when using a
>>>>> shared memory resource between multiple processors.
>>>>> It's definitely a problem that is worth solving, but I'm not
>>>>> suggesting to do anything about it soon. It sounds like something
>>>>> that would require a complete top-down re-design to be successful.
>>>>> yikes
>>>>>
>>>>> Chuck
>>>>>
>>>> I once wrote such a toolset that does automatically scale up
>>>> with multiple threads throughout the whole network.
>>>> it worked by detecting cycles in the graph and splits of the signals
>>>> while segmenting the graph in autonomous sequential parts and
>>>> essentially adding some smart and lightweight locks everywhere the
>>>> signals split or merged. it even reassigned threads on the lock-level
>>>> to "balance" the workload in the graph and prevent deadlocks.
>>>> the code is/was around 2.5k lines of c++ code and a bloody mess :)
>>>> so, i don't know much about the internals of pd but it'd be probably
>>>> possible.
>>>>
>>> detaching ffts (i.e. canvases with larger blocksizes than 64) should be
>>> rather trivial ...
>>>
>>> distributing a synchronous dsp graph to several threads is not trivial,
>>> especially when it comes to a huge number of nodes. for small numbers of
>>> nodes the approach of jackdmp, using a dynamic dataflow scheduling, is
>>> probably usable, but when it comes to huge dsp graphs, the
>>> synchronization overhead is probably too big, so the graph would have to
>>> be split to parallel chunks which are then scheduled ...
>>>
>> true, i didn't try big graphs, so i can't really say how it would behave.
>> it was more a fun project to see if it was doable. at that time i had
>> the impression that the locking and the re-assignment of threads
>> was quite efficient and done only on demand, if the graph
>> has more sequential parts than the number of created threads;
>> i am curious how it can be achieved in a lock-free way.
>>
>> about the issues of explicitly threading parts of the graph (that came
>> up in the discussion later on), i must say i don't get why you would
>> want to do it. seeing how the number of cores is about to increase,
>> i'd say that it is counterproductive in relation to the technological
>> development of hardware and the software running on top of it lagging
>> behind as well as the steady implicit maintenance of the software
>> involved.
>> from my point of view a graphical dataflow language has the perfect
>> semantics to express the parallelisms of a program in an intuitive way.
>> therefore i'd say that, rather than adding constructs for explicit
>> parallelism to the language that is able to express them anyhow,
>> adding constructs for explicit serialization of a process makes more
>> sense. maybe i'm talking nonsense here, please correct me.
>>
>> so long...
>> Niklas
>>
>> _______________________________________________
>> [email protected] mailing list
>> UNSUBSCRIBE and account-management ->
>> http://lists.puredata.info/listinfo/pd-list
>
> --
> [EMAIL PROTECTED] ICQ: 96771783
> http://tim.klingt.org
>
> Every word is like an unnecessary stain on silence and nothingness
>   Samuel Beckett
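
for illustration only: the quoted description of "segmenting the graph in autonomous sequential parts" wherever signals split or merge can be sketched roughly as below. this is not the original c++ toolset (which was never posted), just a minimal python sketch under stated assumptions: the graph is assumed to be acyclic already (the cycle detection mentioned in the thread is left out), and the names `segment_chains`, `edges` and `nodes` as well as the example node names are hypothetical. locking and thread re-assignment are not modelled at all; the sketch only finds the chain boundaries where such synchronization would have to sit.

```python
from collections import defaultdict


def segment_chains(nodes, edges):
    """Split an acyclic signal graph into maximal sequential chains.

    A chain is cut wherever a signal splits (a node has more than one
    successor) or merges (a node has more than one predecessor).
    Assumption: the graph has no cycles.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    for src, dst in edges:
        succ[src].append(dst)
        pred[dst].append(src)

    def starts_chain(n):
        # A node continues its predecessor's chain only if it has exactly
        # one predecessor and that predecessor has exactly one successor.
        return len(pred[n]) != 1 or len(succ[pred[n][0]]) != 1

    chains = []
    for n in nodes:
        if not starts_chain(n):
            continue
        chain = [n]
        # Extend the chain while the connection stays strictly one-to-one.
        while len(succ[chain[-1]]) == 1 and not starts_chain(succ[chain[-1]][0]):
            chain.append(succ[chain[-1]][0])
        chains.append(chain)
    return chains


# Hypothetical example graph: the signal splits after "filt" and
# merges again at "mix".
nodes = ["osc", "filt", "a", "a2", "b", "mix", "dac"]
edges = [
    ("osc", "filt"),
    ("filt", "a"), ("filt", "b"),   # split
    ("a", "a2"),
    ("a2", "mix"), ("b", "mix"),    # merge
    ("mix", "dac"),
]
chains = segment_chains(nodes, edges)
for chain in chains:
    print(" -> ".join(chain))
```

each resulting chain can run on one thread without internal synchronization; only the boundaries between chains (the split after "filt" and the merge at "mix" in the example) would need some kind of lock or lock-free handoff, which is the part the thread is actually debating.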
