Re: [wsjt-devel] Audio input from RTP stream?

Phil Karn Wed, 07 Mar 2018 00:40:15 -0800

On 3/6/18 08:05, Bill Somerville wrote:

> not very hard at all. The audio in WSJT-X is via a Qt I/O Device
> abstraction and making a UDP server that joins a multicast group to
> receive and send audio would only require a thin layer to manage
> datagram reception and transmission converting into/out of a stream
> in/out of respective QIODevice sub-classes is about the whole
> requirement. The Modulator and Detector classes in WSJT-X are pretty
> much agnostic about which QIODevice derivative they talk to for PCM
> sample streams.


This is great news!

> Can you point me to somewhere I can get information about gathering RTP
> datagrams and combining as a continuous stream? Do I need to work with
> RTCP as well?

RTP is really very simple. It's documented in RFC 3550 ("RTP: A Transport
Protocol for Real-Time Applications"), which also defines RTCP.

I implemented RTP in about a page of C code, not counting all of the
UNIX/Linux system calls needed to set up multicast sockets. That's
actually the only hard part. With RTP I just check that the next packet
is in sequence, drop any old duplicates, and play out silence in place
of lost packets to maintain timing, which is much more important for
digital demodulators than for human speech.
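
As an illustration only (this is Python, not the C code mentioned above),
the sequence check could be sketched roughly as follows. The names
parse_rtp and SequenceTracker are hypothetical, and the sketch ignores
header extensions and padding.

```python
import struct

# Fixed 12-byte RTP header: V/P/X/CC, M/PT, sequence, timestamp, SSRC
RTP_HEADER = struct.Struct("!BBHII")


def parse_rtp(packet: bytes):
    """Return (sequence, timestamp, ssrc, payload) from one RTP datagram."""
    b0, _b1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = b0 & 0x0F
    # Skip the fixed header plus any CSRC entries (4 bytes each).
    return seq, timestamp, ssrc, packet[12 + 4 * csrc_count:]


class SequenceTracker:
    """Classify each packet by its 16-bit sequence number."""

    def __init__(self):
        self.expected = None  # next sequence number we hope to see

    def accept(self, seq: int):
        """Return the count of lost packets before this one, or None to drop it."""
        if self.expected is None:
            self.expected = seq
        # Signed distance modulo 2**16, so the wrap 65535 -> 0 counts as +1.
        delta = (seq - self.expected + 0x8000) % 0x10000 - 0x8000
        if delta < 0:
            return None  # old or duplicate packet
        self.expected = (seq + 1) % 0x10000
        return delta  # caller plays out this many packets of silence first
```

A receiver loop would call accept() on every datagram, emit that many
packets' worth of zero samples when the result is positive, and then
emit the payload.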

Where RTP can get tricky is in managing timing jitter. This isn't a
problem on most LANs unless they're severely overloaded. Over a longer
path the problem is always picking the right size for the playout
buffer, trading off latency against the risk of dropping packets that
arrive just a little too late. This is only a problem for interactive
voice; it's a non-problem for a program like WSJT that processes signals
in large chunks. For example, with WSPR your "playout buffer" could be
a full two minutes long; just drop each RTP packet into the right
place in the buffer when it arrives and start the decoder on the even
minute as usual.
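
A minimal sketch of that idea, assuming a 12 kHz decoder input rate and
assuming the RTP timestamp of the first sample of the period (start_ts)
is already known from somewhere. The function names are hypothetical.

```python
SAMPLE_RATE = 12000                 # assumed decoder input rate, samples/s
PERIOD_SAMPLES = 120 * SAMPLE_RATE  # one two-minute WSPR period


def new_period_buffer():
    """A whole period of silence; lost packets simply stay zero."""
    return [0] * PERIOD_SAMPLES


def place_packet(buffer, start_ts, rtp_ts, samples):
    """Copy samples into buffer at the offset implied by the RTP timestamp.

    start_ts is the RTP timestamp of the first sample of the period.
    Returns the number of samples actually stored.
    """
    offset = (rtp_ts - start_ts) % (1 << 32)  # 32-bit timestamp wraparound
    if offset >= len(buffer):
        return 0  # packet belongs to another period
    n = min(len(samples), len(buffer) - offset)
    buffer[offset:offset + n] = samples[:n]
    return n
```

Because each packet lands at its own offset, arrival order and jitter
stop mattering as long as the packet shows up before the decoder starts.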

I haven't implemented RTCP yet but I will soon. The receiver statistics
aren't very useful on a LAN, but RTCP looks like the best way for a
multicast sender to tell who's listening so it can stop sending
when nobody is. With IGMP snooping the multicast streams don't
really go anywhere when nobody is listening, but the sending computer
could still save some CPU cycles.

> Our experience of users using the Elecraft remote kit using IP streaming
> is that latency and delay are a problem, this being because of our
> dependence on external wall clock synchronization. Can RTP provide
> absolute time-stamp information that we can leverage to capture UTC time
> sync relative to the source? Is there a recovery mechanism to "unbuffer"
> when delays accumulate?

RTP also provides a 32-bit relative timestamp. Its exact meaning depends
on the codec; in general it's supposed to count audio samples at the
encoder input or decoder output (or frames for video). For 48 kHz PCM
(any number of channels and any number of bits per sample) the timestamp
increments every 1/48000 sec of real time. So if a PCM packet contains
960 samples (20 ms @ 48 kHz) the RTP timestamp increases by 960 with each
packet. It's the same even with compressed signals; for example, if you
use Opus with 20 ms frames the RTP timestamp still increases by 960 for
each packet even though the actual number of (compressed) bytes in the
packet varies (and is considerably less than 960).
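
The arithmetic can be checked with a few lines of Python. The helper
names are made up for illustration; the point is that the timestamp step
depends only on sample rate and frame duration, while the PCM payload
size also depends on channels and sample width.

```python
def timestamp_step(sample_rate_hz: int, frame_ms: int) -> int:
    """RTP timestamp increment per packet: sample periods in one frame."""
    return sample_rate_hz * frame_ms // 1000


def pcm_payload_bytes(sample_rate_hz, frame_ms, channels, bits_per_sample):
    """Payload size for linear PCM, which does depend on the sample format."""
    frames = timestamp_step(sample_rate_hz, frame_ms)
    return frames * channels * bits_per_sample // 8


step = timestamp_step(48000, 20)               # 960
mono = pcm_payload_bytes(48000, 20, 1, 16)     # 1920 bytes
stereo = pcm_payload_bytes(48000, 20, 2, 16)   # 3840 bytes, same step of 960
```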

This is a *relative* timestamp unrelated to clock time. I've thought a
lot about the absolute timestamp problem as I also have it with the
AFSK/FM demod, HDLC receiver and APRS data extractor I wrote to process
my receiver output so our satellite antennas can automatically track the
high altitude balloons we fly at the high school ham club.

For I/Q streams I insert my own metadata header between RTP and the data
that carries the receiver LO frequency, sample rate and analog gain
settings. I just recently added absolute clock time represented as a
64-bit integer count of nanoseconds since the GPS epoch of 6 January
1980 00:00:00 UTC (don't get me started about leap seconds).
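
For illustration, such a count could be derived from a UTC datetime as
sketched below. Whether the count should include the leap seconds
inserted since 1980 is not stated above, so the sketch leaves that as an
explicit parameter; the function name is hypothetical.

```python
from datetime import datetime, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def gps_nanoseconds(utc: datetime, leap_seconds: int = 0) -> int:
    """Nanoseconds from the GPS epoch to the given UTC time.

    leap_seconds is the GPS-UTC offset to add (18 s at the date of this
    message); pass 0 to ignore leap seconds entirely.
    """
    delta = utc - GPS_EPOCH
    seconds = delta.days * 86400 + delta.seconds + leap_seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1000


now_ns = gps_nanoseconds(datetime(2018, 3, 7, 8, 40, 15, tzinfo=timezone.utc))
```

The result is well below 2**63, so it fits a signed 64-bit integer with
a couple of centuries to spare.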

But doing that on receiver audio output would be incompatible with audio
players like VLC that can play multicast 16-bit linear PCM. Listening
directly to a raw I/Q stream doesn't seem very interesting, so there I
don't mind the incompatibility.

So the clock time problem is currently unsolved on the receiver output,
but I'm thinking of periodically sending the clock time in a separate
metadata packet. It would give the wall clock time (or the original wall
clock time for an I/Q playback) that corresponds to a particular RTP
timestamp. The sample rate ought to be accurate enough to interpolate
between those metadata updates. Remember I already fill in any missing
I/Q frames with zeroes to maintain synchronization.
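
A rough sketch of that interpolation, assuming each metadata packet
carries a pair (RTP timestamp, wall clock time in nanoseconds). The pair
format and the function name are assumptions, since the scheme is only
an idea at this point.

```python
def wall_clock_ns(rtp_ts, ref_rtp_ts, ref_wall_ns, sample_rate_hz):
    """Estimate the wall clock time (ns) of rtp_ts from a reference pair."""
    # Signed 32-bit difference, so timestamps slightly before the
    # reference and timestamps across a wraparound both work.
    delta = (rtp_ts - ref_rtp_ts + (1 << 31)) % (1 << 32) - (1 << 31)
    return ref_wall_ns + delta * 1_000_000_000 // sample_rate_hz
```

At 48 kHz, a packet whose timestamp is 960 past the reference comes out
20 ms later than the reference wall clock time.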

IMHO, many network audio streaming schemes have huge latency because
they use TCP, which was never designed for real-time traffic. That's
what RTP/UDP is for.

I work to keep the delays in my own SDR as small as possible. Here's an
analysis:

1. The time to fill an Ethernet frame with 240 pairs of 16-bit I/Q samples
@ 192 kHz is 1.25 ms.

2. I use fast correlation with overlap-and-save to do predetection
filtering. The numbers are all configurable, but I typically use an 8K
point FFT with 3840 new data samples, which fill in 20 ms @ 192 kHz. This
overlaps with the 16 Ethernet I/Q sample transfers so the buffer still
fills 20+epsilon ms after the A/D converter produced the first sample in
the block. (My Ethernet is all gigabit so I'm ignoring its
store-and-forward delays. I'm also not counting CPU processing time,
mainly because I don't know exactly what it is, but also because it's
very small.)

3. The filter impulse response is centered in the other 4352 FFT points,
which is another 4352/2 = 2176 samples or 11.3 ms of delay. So the total
is 20+11.3 = 31.3 ms.

4. I decimate the output to 48 kHz, producing 960 samples/20 ms that takes
two back-to-back Ethernet frames (more if it's stereo). Again I ignore
the Ethernet store-and-forward delay because it's so small at a gigabit.

5. Ordinarily I'll listen to Opus-compressed audio with a 20 ms frame time,
so that adds 20 ms plus whatever the codec adds internally. But that's
only for my ears; digital decoders get the PCM stream directly.
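
The numbers in the analysis above can be reproduced with a short
calculation. The variable names are just labels for the figures quoted
in the text.

```python
IQ_RATE = 192_000    # I/Q sample rate, Hz
FRAME_PAIRS = 240    # I/Q sample pairs per Ethernet frame
FFT_SIZE = 8192      # the "8K point" FFT
NEW_SAMPLES = 3840   # new data samples per FFT block


def ms(samples, rate=IQ_RATE):
    """Duration of a run of samples in milliseconds."""
    return 1000.0 * samples / rate


frame_fill_ms = ms(FRAME_PAIRS)                # 1.25 ms per Ethernet frame
block_fill_ms = ms(NEW_SAMPLES)                # 20 ms per FFT block
frames_per_block = NEW_SAMPLES // FRAME_PAIRS  # 16 frames per block
overlap = FFT_SIZE - NEW_SAMPLES               # 4352 points for the filter
filter_delay_ms = ms(overlap // 2)             # about 11.3 ms
total_ms = block_fill_ms + filter_delay_ms     # about 31.3 ms
```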

I often verify latency by simply tuning in WWV and comparing the ticks
to the NTP-synchronized clock on my laptop. The delay is only slightly
perceptible.  Is that good enough?

> I assume the RTP payload formats are usually compressed, can we use
> uncompressed PCM at 48000Hz 16-bit (actually 12000Hz 16-bit is all we
> require unless we go to wider bandwidths) and still expect timely
> delivery? If not, are we heading for a world of pain with proprietary
> codecs?

RTP can handle any codec, compressed or uncompressed, at any sample rate
you want -- and no, I won't touch proprietary codecs with a 10-meter
barge pole. Check out Opus; it's an IETF standard, unpatented and
available as open source. It's a hybrid of several techniques that
handles hi-fi audio at 510 kb/s down to speech at 6 kb/s. The API is
very easy to use, and it sounds pretty good too. I'd like to see it used
over the air to replace those awful AMBE codecs in DMR/Fusion/D*Star.

I currently use 48 kHz (decimated 4:1 from 192 kHz) for several reasons.
I need that bandwidth for NBFM demodulation (though not for SSB); it's
the preferred rate of most computer sound systems; it's the preferred
rate of the Opus codec even on low bit rate speech; I've got plenty of
Ethernet capacity in my house; and I rarely send uncompressed PCM out of
the local machine anyway. I'll usually run the decoder (or the Opus
encoder) on the same machine as the receiver producing the PCM stream.
But if you'd rather get your PCM at 12 kHz or some other rate, that's
very easy to do. Just make the decimation ratio an integer, please.
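
To make the integer-ratio request concrete, here is a small sketch. The
boxcar average stands in for a proper anti-alias filter and is only an
assumption for illustration, not how the receiver actually decimates.

```python
def decimation_ratio(input_rate_hz: int, output_rate_hz: int) -> int:
    """Return input/output as an integer, or raise if it isn't one."""
    ratio, remainder = divmod(input_rate_hz, output_rate_hz)
    if remainder or ratio < 1:
        raise ValueError(
            f"{input_rate_hz} Hz -> {output_rate_hz} Hz is not an integer ratio"
        )
    return ratio


def decimate(samples, ratio):
    """Crude decimator: average each group of `ratio` samples (boxcar)."""
    usable = len(samples) - len(samples) % ratio  # drop any partial group
    return [sum(samples[i:i + ratio]) // ratio for i in range(0, usable, ratio)]
```

For example, 192 kHz to 48 kHz and 48 kHz to 12 kHz are both 4:1, while
48 kHz to 11025 Hz is rejected.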

> I have some experience as I used to work on a set top box application
> that streamed MPEG-4 over ADSL Internet, we used the Intel IPP libraries
> for codecs rendering to a Linux frame buffer and sending either
> web-camera or Ethernet based stand-alone camera streams.

In the US, I know only one large-scale application for high rate real
time audio and video streams over multicast RTP/UDP: AT&T Uverse, which
provides a "cable TV" service over VDSL2. When I visited Munich in 2013
I saw the same technology in use under a different brand name, so it is
being used elsewhere.

Phil



