On 2014-04-15, Ross Bencina wrote:
How are you proposing to handle the digital side? USB3 -> USB3 class audio protocol stack -> multiple SPI channels
Personally I'd go with whatever the current Ethernet fabric is, starting with 100 Mbps and ending up at 10 Gbps. Crucially, I'd always use a single contention domain for this sort of work, and try to keep the data flow one-way-only unless there was a problem; then I'd go with tens to tens of thousands of parallel contention domains, on both the source and the target side of the data, because obviously you can't afford a bottleneck on either side in the end.
I'd use the simplest kind of protocol you can imagine over Ethernet: an L2 packet with a protocol number reserved for just this purpose, containing a minimalistic inner utility packet comprising 1) a linear frame number as a 32-bit uint, 2) the multichannel PCM frame width as a 32-bit uint, and 3) your chosen stretch of multichannel 24-bit/3-byte fixed-point PCM data, sent within the packet as-is. Nothing else. No calibration/metadata either, because it's always nominally calibrated to FS == ~130 dB SPL(A), and it's 24-bit linear. Every sample, even individually, covers the whole range of human hearing, and it does so in a fully linear fashion; no need to fuck around with any options there.
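To make the layout concrete, here's a minimal sketch of that inner utility packet in Python. The 8-byte header and 3-byte big-endian samples are straight from the description above; the function names and the signed/big-endian choice for the raw PCM bytes are my assumptions, not anything the protocol pins down.

```python
import struct

SAMPLE_BYTES = 3  # 24-bit linear fixed-point PCM, sent as-is

def pack_frame(frame_number: int, channels: int, samples: list[int]) -> bytes:
    """Pack one frame: 32-bit frame number, 32-bit channel count,
    then one 3-byte sample per channel. Nothing else."""
    assert len(samples) == channels
    header = struct.pack("!II", frame_number & 0xFFFFFFFF, channels)
    body = b"".join(s.to_bytes(SAMPLE_BYTES, "big", signed=True)
                    for s in samples)
    return header + body

def unpack_frame(payload: bytes) -> tuple[int, int, list[int]]:
    """Inverse of pack_frame: recover frame number, width and samples."""
    frame_number, channels = struct.unpack("!II", payload[:8])
    raw = payload[8:8 + channels * SAMPLE_BYTES]
    samples = [int.from_bytes(raw[i:i + SAMPLE_BYTES], "big", signed=True)
               for i in range(0, len(raw), SAMPLE_BYTES)]
    return frame_number, channels, samples
```

That whole payload then rides inside an ordinary L2 frame under the reserved EtherType; nothing in it is optional or variable beyond the channel count.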
If you bit-interleaved the samples, you could derive even lower latency. But why go there nowadays? Even the baud rate of the age-old 100 Mbps Ethernet I run is already pretty steep compared to the minimum length of a perceptually transparent 48 kHz anti-aliasing filter. So...
How do you make that into a proper protocol? Well, first you make your D/A-converter bank synchronize itself via a much more exacting, out-of-band protocol, right down to nano- or picoseconds, referenced to the frame numbers. Those numbers will be relative in time, but after physical-level sync acquisition they will refer to the same instant in time to extreme accuracy. Those are the numbers you will be using when you send samples to your D/A bank to be converted; the relative numbers will be recorded at the top of each packet going over your Ethernet connection, and the various parts of your D/A bank will always do one of a) discard, b) delay, or c) (most often) D/A-convert the samples you gave them, respecting the invariant that any sequence of samples with a relative starting time in the immediate relative future with respect to the local timestamp must be converted, and all else must be discarded.
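That discard/delay/convert invariant can be sketched as a three-way decision against the converter's local timestamp. The buffering horizon below is an assumed tuning parameter of my own, not part of the protocol; everything else follows the rule as stated.

```python
from enum import Enum

class Action(Enum):
    DISCARD = "discard"
    DELAY = "delay"
    CONVERT = "convert"

def classify(frame_number: int, local_time: int, horizon: int) -> Action:
    """Decide what a converter does with an incoming frame, all times in
    frame-number units on the shared relative clock."""
    if frame_number <= local_time:
        return Action.DISCARD   # already at or behind the local timestamp: stale
    if frame_number > local_time + horizon:
        return Action.DELAY     # farther ahead than we can buffer right now
    return Action.CONVERT       # in the immediate relative future: D/A it
```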
(Then you run a discrete, stochastic side-protocol PLL to synchronize with the D/A bank's current timestamp. That is trivial to do, thanks to NTP and similar codebases and protocols. Eventually you'll end up transmitting a continuous stream of straight linear multichannel sample frames to the converter bank, with the guarantee that they will always be converted in sync, or not at all.)
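The NTP-style part of that side protocol reduces to the classic four-timestamp offset/delay estimate, plus some filtering. A minimal sketch, with the EWMA gain as an assumed stand-in for the stochastic PLL's loop filter:

```python
def ntp_offset(t1: float, t2: float, t3: float, t4: float):
    """Classic NTP offset/delay estimate from one request-response pair:
    t1 = client send, t2 = server receive, t3 = server send,
    t4 = client receive, all in the same (here: frame-count) units."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

def smoothed_offset(offsets: list[float], alpha: float = 0.125) -> float:
    """Exponentially weighted moving average over repeated offset
    estimates; alpha is an assumed loop gain, not a protocol constant."""
    est = offsets[0]
    for o in offsets[1:]:
        est += alpha * (o - est)
    return est
```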
There are many other, better-behaved protocols out there, but this is the simplest and lowest-latency one you can do over commodity Ethernet while retaining synchronization over multiple D/A converters in different collision domains/across a switch. And the point really is that you run multiple Ethernet connections in parallel, too, over perhaps quite a number of switches; this is what enables this kind of thing to scale. What I'm talking about here is the kind of fabric/architecture where you could easily have tens of thousands of gigabit Ethernet connections going in parallel, leading to something like millions or tens of millions of D/A channels in the end. All presumably synchronized, because the clock distribution we can do out of band even in physically distributed D/A systems, and via the timestamp we can also exactly reference the samples to be converted to that exact, synced, relative time.
That kind of an architecture needs very little beyond what I outlined above. It requires a simplistic, distributed clock-sync algorithm in the analog domain, a scalable transport for the audio data proper, and some way of referencing the packetized samples to the real, relative time. All of that is already present in the above. The rest of the routing functionality can then be provided by a well-managed Ethernet switching backbone. All routing can be provided via static ingress-egress pairs within the Ethernet fabric, at O(1) cost per static connection, so that the core switching fabric doesn't need any extra capability either, and will easily scale even to the kind of redundancy you need for high MTBFs at these kinds of part counts. (Optimally you'll need O(log(depth)) in redundancy, times O(probability of failure) + c in time.) Using nothing but the kind of lowest-common-denominator commodity hardware Google uses for its datacenters, all of that is eminently doable from very low to very high channel counts.
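The static ingress-egress pairing amounts to nothing more than a lookup table, O(1) per forwarded packet. A toy sketch; the class and method names are illustrative, not any real switch's management API:

```python
class StaticFabric:
    """Statically configured ingress->egress routing: each connection is
    one table entry, set up once by the fabric manager and then looked
    up in O(1) per packet."""

    def __init__(self):
        self._routes = {}  # (ingress_port, stream_id) -> egress_port

    def connect(self, ingress: str, stream_id: int, egress: str) -> None:
        """Install one static connection."""
        self._routes[(ingress, stream_id)] = egress

    def route(self, ingress: str, stream_id: int):
        """O(1) dictionary lookup; None means 'no such connection'."""
        return self._routes.get((ingress, stream_id))
```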
Not to mention that when you design the basic protocol as I just did, you can also do punch-through (i.e. cut-through) routing. In fact most higher-grade Ethernet switches do that already. So, as soon as you can somehow derive an upper bound on their switching delay under these highly advantageous, dedicated-line conditions...
The ultimately optimized path from a static Ethernet sender to a static receiver using this kind of protocol need not be much beyond lightpath delay, plus the low-level modulation and framing overhead, plus the overhead of interpreting the numbers I posited above in the header. In toto, using current chip speeds and single-mode fiber, even the static one-way delay between sender and receiver could be pushed below the propagation latency of a copper wire spanning a typical living room.
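Some back-of-envelope arithmetic makes that plausible. The specific numbers below are my assumptions (a minimal 84-byte wire frame including preamble and inter-frame gap, a 10 Gbit/s link, ~0.66 c velocity factor in copper, a ~15 m cable run), not claims from the post:

```python
C = 299_792_458  # speed of light in vacuum, m/s

def serialization_ns(frame_bytes: int, link_bps: float) -> float:
    """Time to clock a frame onto the wire, in nanoseconds."""
    return frame_bytes * 8 / link_bps * 1e9

def propagation_ns(metres: float, velocity_factor: float) -> float:
    """One-way propagation delay in a cable, in nanoseconds."""
    return metres / (velocity_factor * C) * 1e9

frame_ns = serialization_ns(84, 10e9)    # ~67 ns to serialize the frame
copper_ns = propagation_ns(15.0, 0.66)   # ~76 ns down 15 m of copper
```

So even the full serialization of a minimal frame at 10 Gbit/s is already in the same ballpark as, or below, the propagation delay of a modest copper run, before any cut-through gains.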
So, summa summarum, if you really do it right, the kind of architecture I'm thinking about is easily scalable to hundreds of thousands or even millions of D/A channels. Using commodity hardware. While keeping to the proven e2e architecture. While keeping to exact synchronization in A/D/A conversion. And, being a fully distributed architecture, it can accept (static) signal routings from any input to any output. (Those numbers would call for telephone-switch kinds of algorithms, but to my knowledge those are already being applied within Google too.) And it especially scales down to the level of individual converter chips/boards as well, just as soon as they have a simple sync pin.
What's not to like? ;)
--
Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
Sursound mailing list
[email protected]
https://mail.music.vt.edu/mailman/listinfo/sursound
