On Wed, May 11, 2011 at 02:50:36AM +0300, Sviatoslav Chagaev wrote:
> I'm sitting at work, listening to music, debugging a web-application
> with JavaScript alert()s. Each time an alert window pops up, the
> browser plays a sound. For a brief moment, the volume drops twicefold
> then goes back to normal. This is annoying and doesn't make sense.

I agree, this is annoying.

> In real life, if you are surrounded by multiple sound sources, their
> sound volumes will not be divided by the total amount of sound sources.
> Their sounds will add up until they blur and you can't distinguish
> anything anymore. Other operating systems, such as Macrohard Doors, do
> mixing by modeling this real world behaviour.

My physics lessons say that pressure is additive, so the resulting
pressure of two sources close to each other is the sum of their
respective pressures. And there's no clipping in nature, so no need to
test against any MIN and MAX value.

A simple addition is what our ears expect.

On the other hand, DACs operate on a limited dynamic range, so there's
a MIN and a MAX value. Physics has no such limits: there are no MIN
and MAX values for pressure.

So keeping the full dynamic range of the DAC and doing the physics
correctly at the same time is simply mathematically impossible.

What options do we have?

 (1) prescale streams => lose a few dB of dynamic range
 (2) clipping => not natural, unless nothing actually clips
 (3) using (x + y - x * y) => distortion, similar to (2)
 (4) do (1) but with DACs with larger dynamic range => ok
 (5) ...

The choice behind aucat is to never add distortion, clipping or
whatever. So (1) and (4) are the only options, afaics.
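
To make options (1) to (3) concrete, here is a minimal sketch for two
streams. It is not aucat code: the function names, the 16-bit sample
format and the SAMPLE_MIN/SAMPLE_MAX limits are assumptions made for
illustration, and (3) is written for samples normalized to [0, 1].

```c
#include <stdint.h>

/* Assumption: signed 16-bit samples; these stand in for the DAC's MIN/MAX. */
#define SAMPLE_MIN (-32768)
#define SAMPLE_MAX 32767

/* Option (1): halve each stream first, so the sum always fits. */
int16_t
mix_prescale(int16_t x, int16_t y)
{
	return (int16_t)(x / 2 + y / 2);
}

/* Option (2): add at full scale, then clamp to the representable range. */
int16_t
mix_clip(int16_t x, int16_t y)
{
	int32_t s = (int32_t)x + (int32_t)y;

	if (s > SAMPLE_MAX)
		return SAMPLE_MAX;
	if (s < SAMPLE_MIN)
		return SAMPLE_MIN;
	return (int16_t)s;
}

/*
 * Option (3): x + y - x * y, assuming x and y are normalized to [0, 1].
 * The result stays in [0, 1] but is no longer a linear sum.
 */
double
mix_soft(double x, double y)
{
	return x + y - x * y;
}
```

With two loud samples of 30000 each, mix_prescale() gives exactly the
half-scale sum, while mix_clip() flattens the result at SAMPLE_MAX.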

> In this sense, aucat violates the principle of least surprise.
> I'm used to how sound interacts in real world and then aucat steps in
> and introduces its own laws of physics.
> 
> To remedy this, aucat has an option -v, which lets you pre-divide the
> volume of inputs. This results in loss of dynamic range (quiet sounds
> might disappear and the maximum volume that you can set decreases). And
> also, if during usage the count of inputs rises above what I
> predicted, the volume starts to jump up and down again.

If you have N streams, the relative jump is N / (N + 1), so there's
almost no step if N is large enough (it tends to 1). My experience is
that for N > 3, I hear no step, except if I pay special attention
and/or I use particular recordings.
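
As a rough illustration of that ratio, assuming each of N streams is
scaled by 1/N (the helper name is made up for this sketch):

```c
/*
 * Ratio between the per-stream gain after and before one more stream
 * joins n existing ones, assuming every stream is scaled by 1/count:
 * (1 / (n + 1)) / (1 / n) = n / (n + 1).
 */
double
rel_step(int n)
{
	return (double)n / (n + 1);
}
```

rel_step(1) is 0.5, a drop of about 6 dB and the halving described in
the original report; rel_step(3) is 0.75, about 2.5 dB.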

> 
> Experimentally, I've found that if you do a saturating addition between
> inputs, it sounds very much how it might have sounded in real world

I don't agree. Sound doesn't saturate in the real world. When two
people are speaking around me at the same time, I don't hear any
clipping/distortion.

Human ears might saturate at very high sound levels, but at such
levels they are being damaged.

> and
> how Macrohard Doors, among others, sounds like when playing
> multiple sounds.
> 

I bet it prescales, but nobody noticed it because it prescales all the
time. I bet that if "-v 100" was the aucat default, we wouldn't have
this discussion. We would be discussing aucat defaults being
impractical for conversions, or the volume being too low when a
single stream is playing.

> 
> So, why is what I'm proposing better than what currently exists:
> 
> * Resembles how sound behaves in real world more closely;
> * Doesn't violate the principle of least surprise;
> * No more annoying volume jumps up and down;
> * No need to use the -v option anymore / less stuff to remember / "it
> just works";
> * No more choosing between being annoyed by volume jumps or losing
> dynamic range.
> 

I guess this works well with your recordings by accident, as it would
with mine. I bet they are pre-divided, so you almost never hit the
ADATA_MIN and ADATA_MAX boundaries, and there's almost no clipping, is
there?

If so, for such streams you could do:

        int
        adata_sadd(int x, int y)
        {
                return x + y;
        }

and the result would be almost the same.
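
One way to check that guess on real recordings is to count how many
sample pairs would actually cross the boundaries. This is only a
sketch: count_clipped() is a hypothetical helper, and the 16-bit values
given to ADATA_MIN and ADATA_MAX here are an assumption.

```c
#include <stddef.h>

/* Assumed 16-bit limits; the real values depend on the sample format. */
#define ADATA_MIN (-32768)
#define ADATA_MAX 32767

/*
 * Count the positions where a plain x[i] + y[i] falls outside
 * [ADATA_MIN, ADATA_MAX], i.e. where a saturating add would clip.
 */
size_t
count_clipped(const short *x, const short *y, size_t n)
{
	size_t i, clipped = 0;

	for (i = 0; i < n; i++) {
		int s = (int)x[i] + (int)y[i];

		if (s < ADATA_MIN || s > ADATA_MAX)
			clipped++;
	}
	return clipped;
}
```

If the count stays at or near zero for typical material, the saturating
add and the plain add above produce nearly the same output.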

-- Alexandre
