Greetings to All:

When it comes to surround sound coding/decoding, I never make a peep because 
I'm ignorant on the topic. However, a friend who heads the Dept. of Audiology 
at a children's hospital had asked a question regarding MP3s. Although the MP3 
format may be nothing more than a distant relative to surround formats, the 
thought of using "lossy" file types in research studies utilizing 
surround-sound stimuli does concern me. I answered my friend's question (re 
MP3s) as best I could; my answer is shown below, copied and pasted 
verbatim--sorry for its length. Some of the concerns outlined below may or 
may not apply to surround sound.

Has anyone experienced odd artifacts while doing hybrid mixing (sounds from 
monaural sources added to actual, or live, Ambisonic recordings) in which 
sound files stored in lossy formats were converted to wav files? Re surround 
sound for research: Are there file formats that should be avoided as far as 
psychoacoustic research goes? Are all lossless formats more-or-less equal in 
terms of 'purity'?

Thanks in advance for any insights.
Eric C.


---original email and response re MP3s and audiology follow---

Hi Eric –
I hope you’re doing well. I’d like to pick your brain, if you don’t mind. What 
do you think about the use of MP3 or MP4 recordings for speech audiometry? I’m 
thinking of possible pitfalls in the compression and the bandwidth of the 
signal compared to, say, FLAC or standard wav files. Of course, audiologists 
used vinyl LPs and tape recordings for decades without any worry. Thanks,
Bob

Hi Bob,

You ask a good question and one that should be examined from more than a 
“fidelity” point of view. But before I dive into this, please allow me to make 
my first disclaimer: I’m writing this off the cuff, so I won’t give any 
references to peer-reviewed studies (but then, who needs peer review when the 
answer comes from Eric Carmichel?). Second disclaimer: I assume you already 
know a lot of what I wrote below--if I explain something that is either 
“obvious” or well known, it’s only to help me communicate my thoughts.

Researchers [ref?] have shown that the majority of listeners cannot tell the 
difference between a 44.1 kHz (or kS/s), 16-bit wav file and an MP3 derived 
from the same wav file. I don’t know what program material was used in the 
studies, but let’s assume music. If we can’t tell the difference between music 
MP3s and CDs, then “surely” we can’t hear a difference between speech MP3s and 
CDs. This might be one argument in favor of using MP3s for speech audiometry.

I believe most MP3s have a 32 kS/s sampling rate (MPEG-1 Layer III allows 32, 
44.1, or 48 kS/s), which isn’t by itself much of a size reduction from 44.1 
kS/s files. The compression scheme used to create MP3s is (or was) proprietary 
and largely based on psychoacoustic principles. Sounds that can’t be heard 
because of energy masking are “removed” at the times they would otherwise be 
masked. MP3s, unlike FLAC (Free Lossless Audio Codec), use a “lossy” 
compression scheme--what is lost isn’t brought back--it just doesn’t 
contribute (perceptually) to the sound. I’d guess that both forward and 
backward masking are taken into account as well. The usual bit rate for MP3s 
is 128 kilobits/s (kbps)--note kilobits, not kilobytes--though some files use 
a variable bit rate*. An MP3 encoded at 128 kbps is often called “radio” 
quality, while higher rates are probably indistinguishable from CD-quality 
wav files, at least in terms of fidelity.

[Side notes: CD quality refers to a 44.1 kHz sampling rate and 16-bit 
resolution. Sixteen bits is, well, a mix-n-match of 16 zeroes and ones, which 
yields 2^16 unique combinations (0 to 65535 represented digitally). MP3s also 
typically decode to 16 bits. A byte is 8 bits--basic computer nomenclature 
that goes back to caveman days and ASCII standards--so 16 bits (lower-case b) 
is the same as two bytes (upper-case B). Unlike kilohertz (kHz), kilobytes are 
usually written with a capital K (KB), while kilobits are lower-case (kb). If 
the sampling rate is 32,000 samples per second and our resolution is 2 bytes, 
then we’re “streaming” 32,000 * 2 = 64,000 bytes per second, or 64 KB/s, per 
channel. Because we’re (generally) dealing with two interleaved channels 
(L + R), the rate doubles: 32 kHz * 2 channels * 16 bits / 8 bits/byte = 
128 KB/s. Be careful, though: that 128 KB/s is the uncompressed PCM rate, 
equal to 1,024 kilobits/s. The “128” quoted for MP3s is 128 kilobits/s, so a 
128 kbps MP3 carries roughly one-eighth of the uncompressed data.]
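
If you’d rather let the computer do that arithmetic, here’s a quick sanity 
check in Python (plain arithmetic--nothing audio-specific is assumed):

    # Data-rate arithmetic for 32 kS/s, 16-bit, stereo PCM.
    sample_rate = 32_000   # samples per second, per channel
    bit_depth = 16         # bits per sample
    channels = 2           # interleaved stereo (L + R)

    bits_per_second = sample_rate * bit_depth * channels  # 1,024,000 b/s
    print(bits_per_second / 1000)      # 1024.0 kilobits/s, uncompressed
    print(bits_per_second / 8 / 1000)  # 128.0 kilobytes/s, uncompressed
    # A 128 kilobit/s MP3 thus carries roughly 1/8 of the uncompressed data.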

MP3s, like FLAC or wav files, are NOT limited to a fixed sample rate or bit 
depth. And if frequency response were our only concern, the Nyquist theorem 
tells us that the highest reproducible frequency without aliasing is half of 
the sample rate (the Nyquist, or “foldover,” frequency). So, bit depth 
(= resolution) and upper frequency limit should NOT be our concerns when 
using MP3s. So why be against MP3s? Read on...
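
If you want to see foldover in action, here’s a minimal Python/NumPy sketch 
(NumPy is my assumption--use whatever math tool you like). A 20 kHz tone 
sampled at 32 kS/s produces exactly the same sample values as a 12 kHz tone, 
because 32 - 20 = 12 kHz folds back below the 16 kHz Nyquist frequency:

    import numpy as np

    fs = 32_000                       # sample rate (S/s); Nyquist = 16 kHz
    t = np.arange(0, 0.01, 1 / fs)    # 10 ms of sample times
    tone_20k = np.sin(2 * np.pi * 20_000 * t)          # above Nyquist
    alias_12k = np.sin(2 * np.pi * (20_000 - fs) * t)  # i.e., a -12 kHz tone
    print(np.allclose(tone_20k, alias_12k))            # True: identical samples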

When it comes to perception, we really don’t know what the hearing-impaired, 
autistic (non-neurotypical), or “not-so-average” listener hears. Perhaps the 
“missing” information in lossy compression schemes provides useful or subtle 
information to those whose perception isn’t normal or average. Furthermore, we 
don’t know how adding masking noise (speech-shaped or weighted noise) to 
material reproduced from MP3s might affect an outcome. Here’s an interesting 
experiment: Convert a stereo MP3 to wav (you’re not gaining anything... yet), 
flip the polarity of one channel (i.e., a 180-degree “phase” change but 
without moving the time line), and mix the channels to create a 50/50 mono 
mixdown. In many instances, you’ll hear odd artifacts that aren’t explained by 
simple phase cancellation. In other words, mixing the original source material 
(the master tracks, not the MP3) down to mono won’t give rise to the 
artifacts. So, there’s something about the encoding or decoding that affects 
files in unpredictable ways when converting back to a “lossless” (e.g., wav) 
format. Because there are audiometric test protocols that rely on phase 
flipping or on combining signal and noise, I’d most certainly avoid lossy 
compression schemes. If the tests were as simple as speech detection 
thresholds, I don’t foresee any harm in using MP3 files. But for differential 
diagnoses, research, etc., stick with lossless files, whether analog, digital, 
wav, or FLAC.
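
If you’d like to try that experiment, here’s a rough Python sketch using the 
soundfile package (the package choice and file names are mine, not gospel):

    import soundfile as sf

    # "decoded.wav" stands in for a stereo MP3 already decoded to wav.
    stereo, fs = sf.read("decoded.wav")      # array shape: (n_samples, 2)
    left, right = stereo[:, 0], stereo[:, 1]

    # Flip the polarity of one channel (180-degree inversion, no time
    # shift) and mix 50/50 down to mono. Whatever survives here--but
    # not in the same mixdown of the master tracks--was introduced by
    # the lossy encode/decode cycle.
    mono = 0.5 * (left - right)
    sf.write("difference.wav", mono, fs)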

In summary, my reasons for not recommending MP3s are that they’re already 
“psychoacoustically tainted” and not equivalent to the actual stimuli, even if 
normal-hearing listeners perceive them as equivalent. Frequency response isn’t 
the culprit. And with today’s technology, there’s very little reason to 
“conserve” memory in order to accommodate speech recordings--uncompressed 
(wav) speech files are small to begin with.

*Additional notes

MP3 processing may not entirely remove a sound that is otherwise masked; 
instead, the resolution, or bit depth, can be greatly reduced for “unheard” 
sounds. Simply re-sampling a file to lower the sampling rate, by contrast, is 
a linear reduction: re-sampling a wav file from 44.1 kHz to 32 kHz shrinks it 
to 32/44.1 ≈ 0.726 of its original size. Discussions of re-sampling among 
audio geeks, by the way, get into the ultra-boring topics of dithering (ever 
heard “dithering down” used by recording engineers?), noise shaping, filter 
types, blah blah, but only a small percentage of the people who like to toss 
these words around can do the math.
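
For the few who can (or want to see it done), here’s what a re-sampling step 
looks like in Python with SciPy--assuming you have SciPy handy. resample_poly 
applies the anti-aliasing (low-pass) filter for you, and 44,100 * 320/441 = 
32,000, so the sample count shrinks by the same 0.726 factor as the file size:

    import numpy as np
    from scipy.signal import resample_poly

    fs_in, fs_out = 44_100, 32_000
    x = np.random.randn(fs_in)               # one second of stand-in audio
    y = resample_poly(x, up=320, down=441)   # 32000/44100 in lowest terms
    print(len(y) / len(x))                   # ~0.7256, i.e., 32/44.1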

I can’t state that I’ve generated a pure tone, saved it as a wav file, 
converted it to MP3 Pro, and then examined how many bits were actually used to 
create the sinusoid. Opening the MP3 in a wav editor such as Sound Forge or 
Audition probably doesn’t let one “see” how the MP3 is operated on in order to 
edit or play the file (again, MP3 compression is proprietary as well as 
lossy). Zooming in on the decoded MP3 will probably reveal 16 bits per sample, 
yet the file-size reduction is considerable (approximately 10x smaller than 
the wav file it was derived from).
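
If someone does want to run that test, here’s one way to estimate the 
effective bits of a decoded test tone in Python (NumPy and soundfile assumed; 
"tone_decoded.wav" is a hypothetical mono file--a near-full-scale pure tone 
encoded to MP3 with any external tool and decoded back). It uses the standard 
SNR = 6.02N + 1.76 dB relation for a full-scale sine:

    import numpy as np
    import soundfile as sf

    x, fs = sf.read("tone_decoded.wav")        # mono test tone, decoded MP3
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    k = spec.argmax()                          # bin holding the tone
    signal = spec[max(k - 2, 0):k + 3].sum()   # tone power (plus leakage)
    noise = spec.sum() - signal                # everything else
    snr_db = 10 * np.log10(signal / noise)
    print((snr_db - 1.76) / 6.02)              # rough effective bits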

When it comes to bit depth and sample rate, one of the biggest reasons for not 
using mega-fidelity files (24-bit, 96 kS/s) isn’t one of memory allocation, but 
one of battery use. Yep, the processing power needed for super audio files is 
greater than for lower-fidelity files. Apple (so I’m told) limits sample rate 
based on power consumption, not memory used. If you really want to open a can 
of worms, get the audio geeks to argue over 16- and 24-bit audio files. I put 
more merit in bit depth than sampling rate, but mostly for reasons having to do 
with dynamic range.
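
The dynamic-range arithmetic behind that opinion is simple: each bit buys 
roughly 6.02 dB (20*log10(2) per bit). A quick check in Python:

    import math

    for bits in (16, 24):
        dr = 20 * math.log10(2 ** bits)    # ~6.02 dB per bit of resolution
        print(f"{bits}-bit: ~{dr:.0f} dB dynamic range")
    # Prints: 16-bit: ~96 dB; 24-bit: ~144 dB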

Lossless compression codecs require processing power, too, but unless you’re 
doing audiometry in the field where battery power is at a premium, this 
shouldn’t be a problem. There may be an intrinsic latency when presenting 
material, but this would be on the order of milliseconds (or microseconds). 
Latency would only be a problem if other time-sensitive processing were 
involved (e.g., the use of VST or RTAS plug-ins for research). I really can’t 
think of practical reasons not to use FLAC files. A lot of this gets back to 
the quality of the master material and what software was used to convert to 
FLAC or whatever.

Hope this isn’t too confusing.

Best,
Eric C.