Greetings to All:

When it comes to surround sound coding/decoding, I never make a peep because 
I'm ignorant on the topic. However, a friend who heads the Dept. of Audiology 
at a children's hospital had asked a question regarding MP3s. Although the MP3 
format may be nothing more than a distant relative to surround formats, the 
thought of using "lossy" file types in research studies utilizing 
surround-sound stimuli does concern me. I answered my friend's question (re 
MP3s) as best I could; my answer is shown below, copied and pasted 
verbatim--sorry for its length. Some of the concerns outlined below may or 
may not apply to surround sound.

Has anyone experienced odd artifacts while doing hybrid mixing (sounds from 
monaural sources added to actual, or live, Ambisonic recordings) in which 
sound files stored in lossy formats were converted to wav files? Re surround 
sound for research: Are there file formats that should be avoided as far as 
psychoacoustic research goes? Are all lossless formats more-or-less equal in 
terms of 'purity'?

Thanks in advance for any insights.
Eric C.


---original email and response re MP3s and audiology follow---

Hi Eric –
I hope you’re doing well. I’d like to pick your brain, if you don’t mind. What 
do you think about the use of MP3 or MP4 recordings for speech audiometry? I’m 
thinking of possible pitfalls in the compression and the bandwidth of the 
signal compared to, say, FLAC or standard wav files. Of course, audiologists 
used vinyl LPs and tape recordings for decades without any worry. Thanks,
Bob

Hi Bob,

You ask a good question and one that should be examined from more than a 
“fidelity” point of view. But before I dive into this, please allow me to make 
my first disclaimer: I’m writing this off the cuff, so I won’t give any 
references to peer-reviewed studies (but then, who needs peer review when the 
answer comes from Eric Carmichel?). Second disclaimer: I assume you already 
know a lot of what I wrote below--if I explain something that is either 
“obvious” or well known, it’s only to help me communicate my thoughts.

Researchers [ref?] have shown that the majority of listeners cannot tell the 
difference between a 44.1 kHz (or kS/s), 16-bit wav file and an MP3 derived 
from the same wav file. I don’t know what program material was used in the 
studies, but let’s assume music. If we can’t tell the difference between music 
MP3s and CDs, then “surely” we can’t hear a difference between speech MP3s and 
CDs. This might be one argument in favor of using MP3s for speech audiometry.

I believe most MP3s have a 32 kS/s sampling rate (MPEG-1 Layer III allows 32, 
44.1, or 48 kS/s), which isn’t by itself much of a size reduction from 44.1 
kS/s files. The compression scheme used to create MP3s is (or was) proprietary 
and largely based on psychoacoustic principles. Sounds that can’t be heard 
because of energy masking are “removed” at the times they would otherwise be 
masked. MP3s, unlike FLAC (Free Lossless Audio Codec), use a “lossy” 
compression scheme--what is lost isn’t brought back--it just doesn’t 
contribute (perceptually) to the sound. I’d guess that both forward and 
backward masking are taken into account as well. The usual bit rate for MP3s 
is 128 kilobits/s (kbps)--note kilobits, not kilobytes--though some files use 
a variable bit rate*. An MP3 encoded at 128 kbps is often called “radio” 
quality, while higher rates are probably indistinguishable from CD-quality 
wav files, at least in terms of fidelity.

[Side notes: CD quality refers to a 44.1 kHz sampling rate and 16-bit 
resolution. Sixteen bits is, well, a mix-n-match of 16 zeroes and ones, which 
yields 2^16 unique combinations (0 to 65535 represented digitally). MP3s also 
typically decode to 16 bits. A byte is 8 bits--basic computer nomenclature 
that goes back to caveman days and ASCII standards--so 16 bits (lower-case b) 
is the same as two bytes (upper-case B). Unlike kilohertz (kHz), kilobytes are 
usually written with a capital K (KB), while kilobits are lower-case (kb). If 
the sampling rate is 32,000 samples per second and our resolution is 2 bytes, 
then we’re “streaming” 32,000 * 2 = 64,000 bytes per second, or 64 KB/s, per 
channel. Because we’re (generally) dealing with two interleaved channels 
(L + R), the rate doubles: 32 kHz * 2 channels * 16 bits / 8 bits/byte = 
128 KB/s. Be careful, though: that 128 KB/s is the uncompressed PCM rate, 
equal to 1,024 kilobits/s. The “128” quoted for MP3s is 128 kilobits/s, so a 
128 kbps MP3 carries roughly one-eighth of the uncompressed data.]
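
If you’d rather let the computer do that arithmetic, here’s a quick sanity 
check in Python (plain arithmetic--nothing audio-specific is assumed):

    # Data-rate arithmetic for 32 kS/s, 16-bit, stereo PCM.
    sample_rate = 32_000   # samples per second, per channel
    bit_depth = 16         # bits per sample
    channels = 2           # interleaved stereo (L + R)

    bits_per_second = sample_rate * bit_depth * channels  # 1,024,000 b/s
    print(bits_per_second / 1000)      # 1024.0 kilobits/s, uncompressed
    print(bits_per_second / 8 / 1000)  # 128.0 kilobytes/s, uncompressed
    # A 128 kilobit/s MP3 thus carries roughly 1/8 of the uncompressed data.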

MP3s, like FLAC or wav files, are NOT limited to a fixed sample rate or bit 
depth. And if frequency response were our only concern, the Nyquist theorem 
tells us that the highest reproducible frequency without aliasing is half of 
the sample rate (the Nyquist, or “foldover,” frequency). So, bit depth 
(= resolution) and upper frequency limit should NOT be our concerns when 
using MP3s. So why be against MP3s? Read on...
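
If you want to see foldover in action, here’s a minimal Python/NumPy sketch 
(NumPy is my assumption--use whatever math tool you like). A 20 kHz tone 
sampled at 32 kS/s produces exactly the same sample values as a 12 kHz tone, 
because 32 - 20 = 12 kHz folds back below the 16 kHz Nyquist frequency:

    import numpy as np

    fs = 32_000                       # sample rate (S/s); Nyquist = 16 kHz
    t = np.arange(0, 0.01, 1 / fs)    # 10 ms of sample times
    tone_20k = np.sin(2 * np.pi * 20_000 * t)          # above Nyquist
    alias_12k = np.sin(2 * np.pi * (20_000 - fs) * t)  # i.e., a -12 kHz tone
    print(np.allclose(tone_20k, alias_12k))            # True: identical samples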

When it comes to perception, we really don’t know what the hearing-impaired, 
autistic (non-neurotypical), or “not-so-average” listener hears. Perhaps the 
“missing” information in lossy compression schemes provides useful or subtle 
information to those whose perception isn’t normal or average. Furthermore, we 
don’t know how adding masking noise (speech-shaped or weighted noise) to 
material reproduced from MP3s might affect an outcome. Here’s an interesting 
experiment: Convert a stereo MP3 to wav (you’re not gaining anything... yet), 
flip the polarity of one channel (i.e., a 180-degree “phase” change but 
without moving the time line), and mix the channels to create a 50/50 mono 
mixdown. In many instances, you’ll hear odd artifacts that aren’t explained by 
simple phase cancellation. In other words, mixing the original source material 
(the master tracks, not the MP3) down to mono won’t give rise to the 
artifacts. So, there’s something about the encoding or decoding that affects 
files in unpredictable ways when converting back to a “lossless” (e.g., wav) 
format. Because there are audiometric test protocols that rely on phase 
flipping or on combining signal and noise, I’d most certainly avoid lossy 
compression schemes. If the tests were as simple as speech detection 
thresholds, I don’t foresee any harm in using MP3 files. But for differential 
diagnoses, research, etc., stick with lossless files, whether analog, digital, 
wav, or FLAC.
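
If you’d like to try that experiment, here’s a rough Python sketch using the 
soundfile package (the package choice and file names are mine, not gospel):

    import soundfile as sf

    # "decoded.wav" stands in for a stereo MP3 already decoded to wav.
    stereo, fs = sf.read("decoded.wav")      # array shape: (n_samples, 2)
    left, right = stereo[:, 0], stereo[:, 1]

    # Flip the polarity of one channel (180-degree inversion, no time
    # shift) and mix 50/50 down to mono. Whatever survives here--but
    # not in the same mixdown of the master tracks--was introduced by
    # the lossy encode/decode cycle.
    mono = 0.5 * (left - right)
    sf.write("difference.wav", mono, fs)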

In summary, my reasons for not recommending MP3s are that they’re already 
“psychoacoustically tainted” and not equivalent to the actual stimuli, even if 
normal-hearing listeners perceive them as equivalent. Frequency response isn’t 
the culprit. And with today’s technology, there’s very little reason to 
“conserve” memory in order to accommodate speech recordings--uncompressed 
(wav) speech files are small to begin with.

*Additional notes

MP3 processing may not entirely remove a sound that is otherwise masked; 
instead, the resolution, or bit depth, can be greatly reduced for “unheard” 
sounds. Simply re-sampling a file to lower the sampling rate, by contrast, is 
a linear reduction: re-sampling a wav file from 44.1 kHz to 32 kHz shrinks it 
to 32/44.1 ≈ 0.726 of its original size. Discussions of re-sampling among 
audio geeks, by the way, get into the ultra-boring topics of dithering (ever 
heard “dithering down” used by recording engineers?), noise shaping, filter 
types, blah blah, but only a small percentage of the people who like to toss 
these words around can do the math.
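
For the few who can (or want to see it done), here’s what a re-sampling step 
looks like in Python with SciPy--assuming you have SciPy handy. resample_poly 
applies the anti-aliasing (low-pass) filter for you, and 44,100 * 320/441 = 
32,000, so the sample count shrinks by the same 0.726 factor as the file size:

    import numpy as np
    from scipy.signal import resample_poly

    fs_in, fs_out = 44_100, 32_000
    x = np.random.randn(fs_in)               # one second of stand-in audio
    y = resample_poly(x, up=320, down=441)   # 32000/44100 in lowest terms
    print(len(y) / len(x))                   # ~0.7256, i.e., 32/44.1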

I can’t state that I’ve generated a pure tone, saved it as a wav file, 
converted it to MP3 Pro, and then examined how many bits were actually used to 
create the sinusoid. Opening the MP3 in a wav editor such as Sound Forge or 
Audition probably doesn’t let one “see” how the MP3 is operated on in order to 
edit or play the file (again, MP3 compression is proprietary as well as 
lossy). Zooming in on the decoded MP3 will probably reveal 16 bits per sample, 
yet the file-size reduction is considerable (approximately 10x smaller than 
the wav file it was derived from).
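
If someone does want to run that test, here’s one way to estimate the 
effective bits of a decoded test tone in Python (NumPy and soundfile assumed; 
"tone_decoded.wav" is a hypothetical mono file--a near-full-scale pure tone 
encoded to MP3 with any external tool and decoded back). It uses the standard 
SNR = 6.02N + 1.76 dB relation for a full-scale sine:

    import numpy as np
    import soundfile as sf

    x, fs = sf.read("tone_decoded.wav")        # mono test tone, decoded MP3
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    k = spec.argmax()                          # bin holding the tone
    signal = spec[max(k - 2, 0):k + 3].sum()   # tone power (plus leakage)
    noise = spec.sum() - signal                # everything else
    snr_db = 10 * np.log10(signal / noise)
    print((snr_db - 1.76) / 6.02)              # rough effective bits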

When it comes to bit depth and sample rate, one of the biggest reasons for not 
using mega-fidelity files (24-bit, 96 kS/s) isn’t one of memory allocation, but 
one of battery use. Yep, the processing power needed for super audio files is 
greater than for lower-fidelity files. Apple (so I’m told) limits sample rate 
based on power consumption, not memory used. If you really want to open a can 
of worms, get the audio geeks to argue over 16- and 24-bit audio files. I put 
more merit in bit depth than sampling rate, but mostly for reasons having to do 
with dynamic range.
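
The dynamic-range arithmetic behind that opinion is simple: each bit buys 
roughly 6.02 dB (20*log10(2) per bit). A quick check in Python:

    import math

    for bits in (16, 24):
        dr = 20 * math.log10(2 ** bits)    # ~6.02 dB per bit of resolution
        print(f"{bits}-bit: ~{dr:.0f} dB dynamic range")
    # Prints: 16-bit: ~96 dB; 24-bit: ~144 dB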

Lossless compression codecs require processing power, too, but unless you’re 
doing audiometry in the field where battery power is at a premium, this 
shouldn’t be a problem. There may be an intrinsic latency when presenting 
material, but this would be on the order of milliseconds (or microseconds). 
Latency would only be a problem if other time-sensitive processing were 
involved (e.g., the use of VST or RTAS plug-ins for research). I really can’t 
think of practical reasons not to use FLAC files. A lot of this gets back to 
the quality of the master material and what software was used to convert to 
FLAC or whatever.

Hope this isn’t too confusing.

Best,
Eric C.