Thanks all for your plentiful replies to my question and the many practical
considerations. I suppose it's always nice to know what people are working on.
About the reasons for my inquiry: mostly, the whole of the modern-day PC as a Complex
Instruction Set Computer has become so convoluted that dealing with its pipelines
(instruction (pre-)fetch, memory access delay (depending on address- and data-flow access
times, memory page reuse, clock frequency matching between (frequency-governed/turbo-
boosted) cores/threads and the memory access and DMA control units, etc.)), the cache
filling, write-through and hierarchical access times, and the contention of threads/cores
accessing memory and devices, is pretty hard, and looks almost impossible for software
engineers to reason about deterministically.
Add to that the hardware- and network-related multilevel buffering and the
difficulty of executing kernel activities like memory segment and bank assignment,
process/thread scheduling, check-pointing, and virtual multiprocessing state preservation,
and it's hard to know what constitutes efficient programming and what gives reliable and
repeatable interaction times, unless you trim the number of processes, run a real-time
Linux kernel (which of course isn't a scientifically RT finite state machine yet), use a
machine with little variation in its load, stay well under the full load of the CPU and
memory such that they hardly rise in temperature as they would when you'd use a
significant portion of their actual processing power, or maybe resort to a simpler
processor with simpler heat management and build your own OS+software from the ground up
without a claim to general usefulness.
In practice, task switching, which can be related to thread instructions, as well as
memory management, can get in the way of the fine-grained real-time responsiveness you
may want, and the many pipelines in the modern PC (say an i7 machine), together with the
many caches and the access granularity to main memory, can make it very hard even to
decide which small computation should follow the other and then to execute a small number
of computations efficiently.
An FPGA like the cheap but powerful Zynq 7010 I use can, when running at 1/3 of a GHz,
compute fast logical sequences very efficiently, and can for instance theoretically run
certain filters at up to 10 gigaops per second, which isn't necessarily easy on a PC; it
can then still connect up signal parts with almost no buffering in between and very
little pipelining. Of course, if you want to make good use of your virtual CPU's ALU or
even FPU, you need to run more samples through it than simply one per clock cycle. For
cases of straightforward logic resulting from optimized silicon compilation of a C
program with the latest Xilinx Vivado HLx, it is possible to run computations in 1 (one)
clock cycle of the 333 MHz FPGA fabric. That means at CD rate you should feed about
333 MHz / 44.1 kHz ~ 7551 samples per computation unit to make full use of that hardware
instance's abilities.
I myself regularly use a "jackd" (Linux/ALSA) process frame size of 8192, as this makes
the system very stable when a lot of computations are to be done by various pipelines of
various cores while running 192 kHz audio. That's an empirical figure, and not on an
optimized machine (for instance, it still doubles as a web server and TV, and I prefer to
run Firefox as well).
Anyhow, it's an interesting subject, which I, as a very advanced musician, appreciate
as it becomes more accurate and responsive!
Theo V.
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp