On 2014-04-15, Ross Bencina wrote:
How are you proposing to handle the digital side? USB3 -> USB3 class audio protocol stack -> multiple SPI channels
Personally I'd go with whatever the current Ethernet fabric is, starting with 100 Mbps and ending up at 10 Gbps. Crucially, I'd always use a single contention domain for this sort of work, and try to keep the data flow one-way-only unless there was a problem; then I'd go with tens to tens of thousands of parallel contention domains, on both the source and the target side of the data, because obviously you can't afford a bottleneck on either side in the end.
I'd use the simplest kind of protocol you can imagine over Ethernet: an L2 packet with a protocol number reserved for just this purpose, containing a minimalistic inner utility packet comprising 1) a linear frame number as a 32-bit uint, 2) the multichannel PCM frame width as a 32-bit uint, and 3) your chosen stretch of multichannel 24-bit/3-byte fixed-point PCM data, sent within the packet as-is. Nothing else. No calibration/metadata either, because it's always nominally calibrated to FS == ~130 dB SPL(A), and it's 24-bit linear. Every sample, even individually, covers the whole range of human hearing, and it does so in a fully linear fashion; no need to fuck around with any options there.
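To make the layout concrete, here's a minimal sketch of that inner utility packet in Python. The 8-byte header and 3-byte big-endian samples are straight from the description above; the function names and the signed/big-endian choice for the raw PCM bytes are my assumptions, not anything the protocol pins down.

```python
import struct

SAMPLE_BYTES = 3  # 24-bit linear fixed-point PCM, sent as-is

def pack_frame(frame_number: int, channels: int, samples: list[int]) -> bytes:
    """Pack one frame: 32-bit frame number, 32-bit channel count,
    then one 3-byte sample per channel. Nothing else."""
    assert len(samples) == channels
    header = struct.pack("!II", frame_number & 0xFFFFFFFF, channels)
    body = b"".join(s.to_bytes(SAMPLE_BYTES, "big", signed=True)
                    for s in samples)
    return header + body

def unpack_frame(payload: bytes) -> tuple[int, int, list[int]]:
    """Inverse of pack_frame: recover frame number, width and samples."""
    frame_number, channels = struct.unpack("!II", payload[:8])
    raw = payload[8:8 + channels * SAMPLE_BYTES]
    samples = [int.from_bytes(raw[i:i + SAMPLE_BYTES], "big", signed=True)
               for i in range(0, len(raw), SAMPLE_BYTES)]
    return frame_number, channels, samples
```

That whole payload then rides inside an ordinary L2 frame under the reserved EtherType; nothing in it is optional or variable beyond the channel count.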
If you bit-interleaved the samples, you could derive even lower latency. But why go there nowadays? Even the baud rate of the age-old 100 Mbps Ethernet I run is already pretty steep compared to the minimum length of a perceptually transparent 48 kHz anti-aliasing filter. So...
How do you make that into a proper protocol? Well, first you make your D/A-converter bank synchronize itself via a much more exacting, out-of-band protocol, right down to nano- or picoseconds, referenced to the frame numbers. Those numbers will be relative in time, but after physical-level sync acquisition they will refer to the same instant in time to extreme accuracy. Those are the numbers you will be using when you send samples to your D/A bank to be converted; the relative numbers will be recorded at the top of each packet going over your Ethernet connection, and the various parts of your D/A bank will always do one of a) discard, b) delay, or c) (most often) D/A-convert the samples you gave them, respecting the invariant that any sequence of samples with a relative starting time in the immediate relative future with respect to the local timestamp must be converted, and all else must be discarded.
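That discard/delay/convert invariant can be sketched as a three-way decision against the converter's local timestamp. The buffering horizon below is an assumed tuning parameter of my own, not part of the protocol; everything else follows the rule as stated.

```python
from enum import Enum

class Action(Enum):
    DISCARD = "discard"
    DELAY = "delay"
    CONVERT = "convert"

def classify(frame_number: int, local_time: int, horizon: int) -> Action:
    """Decide what a converter does with an incoming frame, all times in
    frame-number units on the shared relative clock."""
    if frame_number <= local_time:
        return Action.DISCARD   # already at or behind the local timestamp: stale
    if frame_number > local_time + horizon:
        return Action.DELAY     # farther ahead than we can buffer right now
    return Action.CONVERT       # in the immediate relative future: D/A it
```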
(Then you run a discrete, stochastic side-protocol PLL to synchronize with the D/A bank's current timestamp. That is trivial to do, thanks to NTP and similar codebases and protocols. Eventually you'll end up transmitting a continuous stream of straight linear multichannel sample frames to the converter bank, with the guarantee that they will always be converted in sync, or not at all.)
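The NTP-style part of that side protocol reduces to the classic four-timestamp offset/delay estimate, plus some filtering. A minimal sketch, with the EWMA gain as an assumed stand-in for the stochastic PLL's loop filter:

```python
def ntp_offset(t1: float, t2: float, t3: float, t4: float):
    """Classic NTP offset/delay estimate from one request-response pair:
    t1 = client send, t2 = server receive, t3 = server send,
    t4 = client receive, all in the same (here: frame-count) units."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

def smoothed_offset(offsets: list[float], alpha: float = 0.125) -> float:
    """Exponentially weighted moving average over repeated offset
    estimates; alpha is an assumed loop gain, not a protocol constant."""
    est = offsets[0]
    for o in offsets[1:]:
        est += alpha * (o - est)
    return est
```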
There are many other, better-behaved protocols out there, but this is the simplest and lowest-latency one you can do over commodity Ethernet while retaining synchronization over multiple D/A converters in different collision domains/across a switch. And the point really is that you run multiple Ethernet connections in parallel, too, over perhaps quite a number of switches; this is what enables this kind of thing to scale. What I'm talking about here is the kind of fabric/architecture where you could easily have tens of thousands of gigabit Ethernet connections going in parallel, leading to something like millions or tens of millions of D/A channels in the end. All presumably synchronized, because the clock distribution we can do out of band even in physically distributed D/A systems, and via the timestamp we can also exactly reference the samples to be converted to that exact, synced, relative time.
That kind of an architecture needs very little beyond what I outlined above. It requires a simplistic, distributed clock-sync algorithm in the analog domain, a scalable transport for the audio data proper, and some way of referencing the packetized samples to the real, relative time. All of that is already present in the above. The rest of the routing functionality can then be provided by a well-managed Ethernet switching backbone. All routing can be provided via static ingress-egress pairs within the Ethernet fabric, at O(1) cost per static connection, so that the core switching fabric doesn't need any extra capability either, and will easily scale even to the kind of redundancy you need for high MTBFs at these kinds of part counts. (Optimally you'll need O(log(depth)) in redundancy, times O(probability of failure) + c in time.) Using nothing but the kind of lowest-common-denominator commodity hardware Google uses for its datacenters, all of that is eminently doable from very low to very high channel counts.
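The static ingress-egress pairing amounts to nothing more than a lookup table, O(1) per forwarded packet. A toy sketch; the class and method names are illustrative, not any real switch's management API:

```python
class StaticFabric:
    """Statically configured ingress->egress routing: each connection is
    one table entry, set up once by the fabric manager and then looked
    up in O(1) per packet."""

    def __init__(self):
        self._routes = {}  # (ingress_port, stream_id) -> egress_port

    def connect(self, ingress: str, stream_id: int, egress: str) -> None:
        """Install one static connection."""
        self._routes[(ingress, stream_id)] = egress

    def route(self, ingress: str, stream_id: int):
        """O(1) dictionary lookup; None means 'no such connection'."""
        return self._routes.get((ingress, stream_id))
```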
Not to mention that when you design the basic protocol as I just did, you can also do punch-through (i.e. cut-through) routing. In fact most higher-grade Ethernet switches do that already. So, as soon as you can somehow derive an upper bound on their switching delay under these highly advantageous, dedicated-line conditions...
The ultimately optimized path from a static Ethernet sender to a static receiver using this kind of protocol need not be much beyond lightpath delay, plus the low-level modulation and framing overhead, plus the overhead of interpreting the numbers I posited above in the header. In toto, using current chip speeds and single-mode fiber, even the static one-way delay between sender and receiver could be pushed below the propagation latency of a copper wire spanning a typical living room.
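Some back-of-envelope arithmetic makes that plausible. The specific numbers below are my assumptions (a minimal 84-byte wire frame including preamble and inter-frame gap, a 10 Gbit/s link, ~0.66 c velocity factor in copper, a ~15 m cable run), not claims from the post:

```python
C = 299_792_458  # speed of light in vacuum, m/s

def serialization_ns(frame_bytes: int, link_bps: float) -> float:
    """Time to clock a frame onto the wire, in nanoseconds."""
    return frame_bytes * 8 / link_bps * 1e9

def propagation_ns(metres: float, velocity_factor: float) -> float:
    """One-way propagation delay in a cable, in nanoseconds."""
    return metres / (velocity_factor * C) * 1e9

frame_ns = serialization_ns(84, 10e9)    # ~67 ns to serialize the frame
copper_ns = propagation_ns(15.0, 0.66)   # ~76 ns down 15 m of copper
```

So even the full serialization of a minimal frame at 10 Gbit/s is already in the same ballpark as, or below, the propagation delay of a modest copper run, before any cut-through gains.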
So, summa summarum, if you really do it right, the kind of architecture I'm thinking about is easily scalable to hundreds of thousands or even millions of D/A channels. Using commodity hardware. While keeping to the proven e2e architecture. While keeping to exact synchronization in A/D/A conversion. And, being a fully distributed architecture, it can accept (static) signal routings from any input to any output. (Those numbers would call for telephone-switch kinds of algorithms, but to my knowledge those are already being applied within Google too.) And it especially scales down to the level of individual converter chips/boards as well, just as soon as they have a simple sync pin.
What's not to like? ;)
--
Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
Sursound mailing list
[email protected]
https://mail.music.vt.edu/mailman/listinfo/sursound
