On 20.11.2015 16:18, Tanu Kaskinen wrote:
On Fri, 2015-11-20 at 08:03 +0100, Georg Chini wrote:
On 20.11.2015 01:03, Tanu Kaskinen wrote:
On Wed, 2015-02-25 at 19:43 +0100, Georg Chini wrote:
as well as the way of reacting to sink input or source output moving.

The goal is to make sure that the initial latency of the system matches
the configured one.

While at it, allow setting adjust_time to 0, which means "no adjustments
at all".
---
   src/modules/module-loopback.c | 329 +++++++++++++++++++++++++++++++++++-------
   1 file changed, 275 insertions(+), 54 deletions(-)

diff --git a/src/modules/module-loopback.c b/src/modules/module-loopback.c
index 79bd106..09f2f58 100644
--- a/src/modules/module-loopback.c
+++ b/src/modules/module-loopback.c
@@ -59,7 +59,9 @@ PA_MODULE_USAGE(
 #define DEFAULT_LATENCY_MSEC 200

-#define MEMBLOCKQ_MAXLENGTH (1024*1024*16)
+#define MEMBLOCKQ_MAXLENGTH (1024*1024*32)
Why is this change done? Whatever the reason is, shouldn't this be done
in a separate patch?
The queue is just too small if you want to run long delays (30 s) together
with high sample rates (192000 Hz). I can put it in a separate patch.
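
For the record, the arithmetic (assuming S16LE stereo; float samples or
more channels need proportionally more):

#include <stdio.h>

int main(void) {
    const double bytes_per_sec = 192000.0 * 2 /* channels */ * 2 /* bytes/sample */;
    printf("16 MiB holds %.1f s\n", 1024.0 * 1024 * 16 / bytes_per_sec); /* ~21.8 s */
    printf("32 MiB holds %.1f s\n", 1024.0 * 1024 * 32 / bytes_per_sec); /* ~43.7 s */
    return 0;
}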

+
+#define DEFAULT_BUFFER_MARGIN_MSEC 20
 #define DEFAULT_ADJUST_TIME_USEC (10*PA_USEC_PER_SEC)
@@ -80,11 +82,21 @@ struct userdata {
      int64_t recv_counter;
       int64_t send_counter;
+    uint32_t sink_adjust_counter;
+    uint32_t source_adjust_counter;
-    size_t skip;
       pa_usec_t latency;
+    pa_usec_t buffer_latency;
+    pa_usec_t initial_buffer_latency;
+    pa_usec_t configured_sink_latency;
+    pa_usec_t configured_source_latency;
It would be nice to have comments about what each of these different
latency variables means.

I'm trying to understand what buffer_latency is... I suppose it's used
to configure the buffering so that the sink, source and buffer_latency
together equal u->latency (i.e. the user-configured loopback latency).
Yes, the buffer latency is the part of the latency that is under direct
control of the module (mainly what is in the memblockq). It is the
parameter that is adjusted during the regulation process.
Correct me if I'm wrong, but the parameter that is adjusted is the sink
input rate. The regulation process doesn't touch the buffer_latency
variable. The sink input rate affects the amount that is buffered in
the memblockq, but the buffer_latency variable stays the same (except
when moving the streams).

You are right, I was confusing the variable buffer_latency with "the
latency that is caused by the current number of samples in the buffer".
It has been too long since I wrote this patch ...
buffer_latency is the minimum amount of audio that must be kept
in the buffer to avoid underruns.
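
In terms of the new fields, the intended relation is roughly this (a
sketch for illustration, not code from the patch):

#include <stdint.h>

typedef uint64_t pa_usec_t; /* as in the PulseAudio headers */

/* Only the fields relevant here, from the patch's struct userdata. */
struct userdata {
    pa_usec_t latency;                  /* user-requested end-to-end latency */
    pa_usec_t buffer_latency;           /* minimum kept in the memblockq */
    pa_usec_t configured_sink_latency;
    pa_usec_t configured_source_latency;
};

/* The three parts are meant to add up to the requested latency; the
 * regulation then adjusts the sink input rate so that the measured
 * end-to-end latency converges to u->latency. */
static pa_usec_t expected_total_latency(const struct userdata *u) {
    return u->configured_source_latency + u->buffer_latency + u->configured_sink_latency;
}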


It's 1/4 of u->latency by default, so I guess you try to configure the
sink and source latencies to be 3/8 of u->latency each? buffer_latency
can't be less than 1.667 ms, however. (Where does that number come
from?)

Hmm... When configuring the sink and source latencies, you divide
u->latency by 3, which is consistent with the old code, so maybe it was a
typo to divide u->latency by 4 when setting the default buffer_latency?
I'll assume it was a typo, and you meant all three latency components
to be one third of u->latency by default.
No, it is not a typo. The intention was to set the initial buffer_latency
to something smaller than one third. The value is fixed up later.
So does the initial value have any effect on anything? It seems very
weird that buffer_latency defaults to 1/4 of the full latency and sink
and source latencies are 1/3.

Taking a closer look today I cannot find any reason why I set it
to a quarter of the latency. It should be set to something like
"requested latency - expected source/sink latency - some msec"
because I use it later in the regulation if there are too many
underruns. This is currently only done correctly if you request a
latency that cannot be satisfied. I will fix it in the next version.
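
Expressed as code, the fixed-up initialization would be something like
this (a sketch of what I plan, not what the current patch does; reusing
DEFAULT_BUFFER_MARGIN_MSEC for the "some msec" part is just one option):

/* requested latency - expected source/sink latency - some msec */
static pa_usec_t initial_buffer_latency(pa_usec_t requested,
                                        pa_usec_t expected_source_latency,
                                        pa_usec_t expected_sink_latency) {
    pa_usec_t reserved = expected_source_latency + expected_sink_latency +
                         DEFAULT_BUFFER_MARGIN_MSEC * PA_USEC_PER_MSEC;
    /* Guard against underflow when the requested latency is too small. */
    return requested > reserved ? requested - reserved : 0;
}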



buffer_latency can't be less than 3/4 of the sink or source latency
(where does that 3/4 come from?), so if the sink or source latency
exceeds that limit, buffer_latency is raised accordingly. That's with
Like many other values in this patch, the 3/4 is the result of lots of
experiments with the module. If it is less than 3/4 of the sink or source
latency you will start seeing underruns. You can go below that value in
certain situations; it's a safety margin.
I think it should be possible to reason about the minimum safe value.
We want to avoid underruns, but we also want to minimize the latency.
I'm afraid experimental values will fail in cases that you haven't
tested.

I can't intuitively figure out what the minimum safe value is, but
maybe some examples will show.

Let's start with zero buffer_latency and 1 second sink and source
latencies. Let's assume that both device buffers are full in the
beginning, and that the sink requests for more data when its buffer is
empty, and the source pushes out data when its buffer becomes full.
Let's also assume that all operations are instantaneous. The sink and
source run at the same rate.

time = 0 s

buffered:
source: 1 s    memblockq: 0 s    sink: 1 s

The source is full, so the audio moves to the memblockq.

source: 0 s    memblockq: 1 s    sink: 1 s

----

time = 1 s

The source is again full, so it will push data to the memblockq. The
sink is empty, so it will pull data from the memblockq. These two
things can be serialized in two ways. Either the source pushes first:

source: 0 s    memblockq: 2 s    sink: 0 s
source: 0 s    memblockq: 1 s    sink: 1 s

or the sink pulls first:

source: 1 s    memblockq: 0 s    sink: 1 s
source: 0 s    memblockq: 1 s    sink: 1 s

In neither case underruns happened.

We're back in the same situation as in the beginning, so in this
example buffer_latency can safely be zero.


What if we have different latencies in the sink and source? Let's set
source latency to 2 s and sink latency to 3 s. Let's keep other
parameters and assumptions the same. In the beginning both devices have
full buffers again:

time = 0 s

source: 2 s    memblockq: 0 s    sink: 3 s

The source is full, so it pushes.

source: 0 s    memblockq: 2 s    sink: 3 s

----

time = 2 s

The source is full, so it pushes.

source: 0 s    memblockq: 4 s    sink: 1 s

----

time = 3 s

The sink is empty, so it pulls.

source: 1 s    memblockq: 1 s    sink: 3 s

----

time = 4 s

The source is full, so it pushes.

source: 0 s    memblockq: 3 s    sink: 2 s

----

time = 6 s

The source is full and the sink is empty, so they push and pull.

source: 0 s    memblockq: 5 s    sink: 0 s
source: 0 s    memblockq: 2 s    sink: 3 s

or

source: 2 s    memblockq: 0 s    sink: 3 s
source: 0 s    memblockq: 2 s    sink: 3 s

We're now back to where we started. Again, no underruns. To me it seems
that buffer_latency doesn't need to be big. Underruns happen when the
memblockq doesn't have enough data to satisfy the pull from the sink.
In these two examples the memblockq got empty in some cases, but those
were cases where the sink had just been satisfied and the source was
just about to push more data to the queue, so in some sense an underrun
was not even close.
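
To double-check, the two walkthroughs can be mechanized. Here is a
throwaway C program (not PulseAudio code; same simplifying assumptions:
instantaneous operations, equal rates, and the source pushes before the
sink pulls) that reproduces both tables and flags any underrun:

#include <stdio.h>

/* Time advances in whole seconds, the source pushes its whole buffer
 * when full, the sink pulls a full buffer when empty. Prints the state
 * after the events of each second, including uneventful seconds. */
static void simulate(int source_cap, int sink_cap, int steps) {
    int source = source_cap, memblockq = 0, sink = sink_cap;

    for (int t = 0; t <= steps; t++) {
        if (source == source_cap) {     /* source full: push */
            memblockq += source;
            source = 0;
        }
        if (sink == 0) {                /* sink empty: pull */
            if (memblockq < sink_cap)
                printf("t=%2d s: UNDERRUN (memblockq=%d s, need %d s)\n",
                       t, memblockq, sink_cap);
            memblockq -= sink_cap;
            sink = sink_cap;
        }
        printf("t=%2d s: source=%d s  memblockq=%d s  sink=%d s\n",
               t, source, memblockq, sink);
        source++;                       /* one second passes */
        sink--;
    }
}

int main(void) {
    simulate(1, 1, 4);    /* first example: 1 s source, 1 s sink  */
    simulate(2, 3, 12);   /* second example: 2 s source, 3 s sink */
    return 0;
}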

In the cases where the memblockq got empty, a rate mismatch could have
caused an underrun: if the source is just about to get full, and the
sink consumes a bit too much data from the memblockq, a small underrun
will occur. To mitigate that, a small margin is needed, which I suppose
should be proportional to the rate error and the sink buffer size (or
max_request to be more accurate). Does the source buffer size play any
role? I'm not sure. Anyway, 3/4 of the sink buffer seems excessive,
because the rate error won't be anywhere close to 75%.
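
As a sketch (my suggestion, nothing from the patch; the 0.5% below is
only an example of a worst-case rate error):

/* The queue needs a cushion so that a pull of max_request cannot drain
 * it even when the producer side is slow by the worst-case relative
 * rate error. */
static pa_usec_t rate_error_margin(double max_rate_error,      /* e.g. 0.005 = 0.5% */
                                   pa_usec_t max_request_usec) /* sink max_request */
{
    return (pa_usec_t) (max_rate_error * (double) max_request_usec + 0.5);
}

With a 0.5% rate error and a 100 ms max_request that is only 0.5 ms of
margin, far below 3/4 of the sink latency.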

Then there are scheduling delays and other overhead that break the
assumption of instantaneous operations. Those have very little effect
with big latencies, but with very low latencies they become more
important, and your 3/4 rule isn't necessarily excessive.

If buffer_latency only needs to cover the rate errors and cpu time,
then its proportion of the whole latency should be big with low
latencies and small with high latencies. Does this theory match with
your experiments?

The point is that the assumption that the source output and sink input
rates are the same is not valid. As long as they are the same, you will
not hit a problem.
Once you are in a steady state you only have to care about jitter.
I cannot clearly remember how I derived that value, probably
experiments, but I still believe that 0.75 is a "good" estimate. If
you look at the 4 msec case, the buffer_latency is slightly lower
than 3/4 of the sink latency (1.667 versus 1.75 ms) but this is
also already slightly unstable.
In a more general case the resulting latency will be
1.75 * minimum_sink_latency, which I would consider small enough.
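
For concreteness (illustrative numbers; treating the achievable total as
sink latency plus the 3/4 margin is a simplification that ignores the
source side):

#include <stdio.h>

int main(void) {
    unsigned sink_latency_us = 4000;               /* 4 ms minimum sink latency */
    unsigned margin_us = sink_latency_us * 3 / 4;  /* 3 ms safety margin */
    printf("total >= %u us\n", sink_latency_us + margin_us); /* 7000 us = 1.75 * sink */
    return 0;
}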

Regarding your concern that we want to keep latency down: the old
module-loopback starts to throw errors in the log when you go down
to 7 msec latency on Intel HDA. The new module will run happily at
4 msec, so it is still an improvement.
With the controller I am currently developing together with Alexander
it might even be possible to go further down because it is better at
keeping the latency constant.

dynamic latency support. If the sink or source doesn't support dynamic
latency, buffer_latency is raised to default_fragment_size_msec + 20
ms. I don't think it makes sense to use default_fragment_size_msec.
That variable is not guaranteed to have any relation to the sink/source
behaviour. Something derived from max_request for sinks would probably
be appropriate. I'm not sure about sources, maybe the fixed_latency
variable could be used.
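
Roughly what I have in mind, as an untested sketch (the calls are
existing core APIs; whether this combination gives the right floor is
exactly the open question):

/* Derive the no-dynamic-latency floor from what the devices actually
 * do, instead of from default_fragment_size_msec. */
size_t mr = pa_sink_input_get_max_request(u->sink_input);
pa_usec_t sink_part = pa_bytes_to_usec(mr, &u->sink_input->sample_spec);
pa_usec_t source_part = pa_source_get_fixed_latency(u->source_output->source);
u->buffer_latency = PA_MAX(u->buffer_latency, sink_part + source_part);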
Mh, that is also a value derived from experiments and a safety margin to
avoid underruns. I can say that for USB cards there is a clear connection
between default_fragment_size_msec and the necessary buffer_latency
to avoid underruns. Don't know if max_request is somehow connected
to default_fragment_size.
Yes, max_request is connected to default_fragment_size. For the alsa
sink, max_request is the same as the configured latency, which
default_fragment_size affects.

To me it looks like max_request and fixed latency are derived from
default_fragment_size * default_fragments. The probability of underruns
only depends on default_fragment_size; the number of fragments is
irrelevant. The question is how large the chunks are that are transferred
between the memblockq and the source/sink (or between the sound card and
pulse?). So neither max_request nor fixed latency seems to be the right
indicator. I remember that it took me quite a while to figure out which
values are safe for batch and non-batch cards.
My tests have been done with USB, Intel HDA and bluetooth.

I'm going to sleep now. I can continue reviewing this patch tomorrow,
or I can do it when you send v2, if you split this into more manageable
pieces (I'd prefer the latter). Which one do you think is better?

I'll split it in the next version, but I would appreciate it if you could
at least check the rest of the patch and point out anything that is
obviously wrong on first inspection.
Ok, I'll continue with this patch still. I said I'd do it today, but it
will have to wait until tomorrow, sorry.

--
Tanu

_______________________________________________
pulseaudio-discuss mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss
