On 08.02.2015 19:34, Alexander E. Patrakov wrote:
01.02.2015 03:43, Georg Chini wrote:
This is the final version of my patch for module-loopback. It is on
top of the
patch I sent about an hour ago and contains a lot more changes than
the previous
versions:
- Honor the specified latency if possible; if not, adjust to the lowest possible value
- Smooth switching from a fixed latency to a dynamic latency source or sink and vice versa
- Good rate and latency stability, no rate oscillation
- Adjusts latency as well as your setup allows
- Fast regulation of latency offsets, adjusts a 100 ms offset within 22 seconds (adjust_time=1) to 60 seconds (adjust_time=10)
- Usable latency range 4 - 30000 ms
- Avoids rewinds and "cannot peek into queue" messages during startup and switching
- Works with rates between 200 and 190000 Hz
- Maximum latency offset after a source/sink switch or at startup is around 200 ms
I also introduced a new parameter, buffer_latency_msec, which can be used together with latency_msec. If buffer_latency_msec is specified, the resulting latency will be latency_msec + buffer_latency_msec. latency_msec then refers only to the source/sink latency, while buffer_latency_msec specifies the buffer part.
This can be used to save a lot of CPU at low latencies: running 10 ms latency with latency_msec=6 buffer_latency_msec=4 gives 8% CPU on my system, compared to 12% when I only specify latency_msec=10.
Additionally, you can go beyond the built-in safe-guard limits: you can access the range 1 - 3 ms or lower the buffer latency for fixed latency devices. Some of my USB devices run fine at a buffer latency of fragment size + 4 ms instead of the default fragment size + 20 ms.
I tested it all with Intel HDA, USB and bluetooth sound devices. I
would like to
see some test results from other people.
While attempting to split up this patch and add comments, I came up with some remarks and questions.
+    pa_log_debug("Loopback overall latency is %0.2f ms + %0.2f ms + %0.2f ms = %0.2f ms, latency difference: %0.2f ms, rate difference: %i Hz",
                 (double) u->latency_snapshot.sink_latency / PA_USEC_PER_MSEC,
-                (double) buffer_latency / PA_USEC_PER_MSEC,
+                (double) current_buffer_latency / PA_USEC_PER_MSEC,
                 (double) u->latency_snapshot.source_latency / PA_USEC_PER_MSEC,
-                ((double) u->latency_snapshot.sink_latency + buffer_latency + u->latency_snapshot.source_latency) / PA_USEC_PER_MSEC);
I am not sure whether this split of latency accounting makes sense
anymore, because it is not possible to attribute these latencies to
any particular point in time. Especially current_buffer_latency, which
(for me) is just a meaningless-by-itself intermediate quantity.
I don't care, I left it in because it was already there. If you would
like to delete it, no problem.
But maybe it makes sense to print out the corrected latency at this point.
Also, here is my line of thought (an alternative derivation of
current_buffer_latency, which does not, however, yield exactly the
same), in some pseudocode.
At the moment source_timestamp was taken, the source had already given us receive_counter bytes of data, had source_output_buffer bytes of data buffered at the source output level, and had source_latency microseconds of data still sitting in the soundcard buffer. So, at that moment, we have been recording for this amount of time, according to the source clock:

recording_duration_at_source_timestamp = source_latency + bytes_to_usec(receive_counter + source_output_buffer, base_rate)
If we knew that base_rate is accurate (i.e. that the source clock and
wall clock are exactly the same), we could add the timestamp
difference to see for how long we have been recording at sink_timestamp:
recording_duration_at_sink_timestamp =
recording_duration_at_source_timestamp + sink_timestamp -
source_timestamp
We don't know that, because base_rate is in fact not accurate
according to the wall clock. But we don't have an estimate of the
actual source sample rate (according to the wall clock), and thus
cannot translate the timestamp difference from the wallclock domain to
the source clock domain any better. So we have to live with the above
formula, and accept that it gives us the absolute error of this order:
recording_duration_error = (sink_timestamp - source_timestamp) * abs(1
- real_base_rate / base_rate)
i.e. less than 0.75% of error if we accept that the real sample rate
never deviates from the nominal one by more than 0.75%.
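In C, those two steps might look like this (a sketch only; bytes_to_usec() is a hypothetical helper standing in for the pseudocode, while the real pa_bytes_to_usec() works on a pa_sample_spec rather than a raw rate and frame size):

    #include <stdint.h>

    /* Hypothetical stand-in for the pseudocode's bytes_to_usec() */
    static uint64_t bytes_to_usec(uint64_t bytes, uint32_t rate, uint32_t frame_size) {
        return bytes / frame_size * 1000000ULL / rate;
    }

    static uint64_t recording_duration_at_sink_timestamp(uint64_t source_latency,
                                                          uint64_t receive_counter,
                                                          uint64_t source_output_buffer,
                                                          uint32_t base_rate,
                                                          uint32_t frame_size,
                                                          uint64_t source_timestamp,
                                                          uint64_t sink_timestamp) {
        /* Recording duration at source_timestamp, in the source clock domain */
        uint64_t at_source = source_latency +
            bytes_to_usec(receive_counter + source_output_buffer, base_rate, frame_size);

        /* Carry forward to sink_timestamp, pretending the source clock equals the
         * wall clock; the error of this step is bounded by
         * (sink_timestamp - source_timestamp) * |1 - real_base_rate/base_rate|. */
        return at_source + (sink_timestamp - source_timestamp);
    }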
Using similar arguments, let's calculate how long the sink input has been playing at sink_timestamp. The sink input, according to the source clock, has received send_counter bytes of data, but still has sink_input_buffer bytes buffered in the sink input and sink_latency microseconds of data (according to the sink clock) buffered in the sink. So:

playback_duration = bytes_to_usec(send_counter, ???) - bytes_to_usec(sink_input_buffer, !!!) - sink_latency

...with an obvious source of error: we didn't convert the sink latency to the source clock domain. But this error is of the same order as the recording duration error that we already accepted (because both the sink latency and the worst-case duration between the message being sent and being processed in the pop callback are of the same order), so it's pointless to correct.
Let's see what we should put instead of the "???". Obviously, the
actual rate with which the sink consumed samples. But we have
previously controlled the rate at which it consumes samples, with the
aim of keeping the latency constant. So "???" is just base_rate.
Now let's think about which rate should be put instead of the "!!!". Intuitively, it would appear that it is old_rate, because that is the rate associated with the sink input.
But there is a counterargument: that rate is constantly being manipulated in order to make the sink input consume samples faster or slower than it normally would, and thus it does not represent the true sample rate of the sink input.
I cannot follow this argument. The rate has been constant for at least 1
second, so why would
the rate then not represent the real rate of the sink?
Also, due to these manipulations, old_rate might contain jitter, and
thus base_rate is a better quantity to put instead of the "!!!", with
the same "we have already accepted a similar error" argument.
Not at all! Consider a latency of 30000 ms. Deliberately introducing an error of 0.75% here means accepting 225 ms of deviation that is not necessary! This is something I observed in reality, and it is why I needed the corrected latency at all. You will massively over-correct if you do what you propose. You will not see a lot of difference for small latencies, however.
After all, the wall clock is not too important in that case; most of it happens either in the source or the sink domain, and the wall clock is just the domain you translate to so that you can do the calculations. If the rates are not exact it does not matter; you will then see some drift that the controller will correct.
Let's see how I arrived at the final equation (and I am quite sure I am correct here):
The basis of it all is the equation L = n/r, where L is latency, n is the number of samples and r the rate.
Let's call the base rate rb, the current ("old") rate ro, the sink latency Lsi, the source latency Lso and the buffer latency Lbu. Then at any given time the number of samples in the whole system is
n = rb*Lso + ro*(Lbu + Lsi),
which means that at base rate we have the latency
L = n/rb = Lso + ro/rb*(Lbu + Lsi) - that is my corrected latency. And this is the value I want to control, so I put this into the controller.
The final number of samples I want to reach is nf = Lfi*rb (Lfi being the final or requested latency).
The expectation value for the next latency is calculated similarly: the current number of samples (= L*rb) plus the samples that are added during adjust_time minus the samples that are played during adjust_time, which gives
LNext = (L*rb + adjust_time*rb - adjust_time*rn)/rn
(Oops, I just noticed I made a mistake in calculating the current latency, which is n/ro = rb/ro*Lso + Lbu + Lsi and not just Lso + Lbu + Lsi, but I think that is not too important because it is only used to calculate the error. Can you nevertheless correct it please when you split the patch?)
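Transcribed literally into a small C sketch (just the formulas above with the same shorthand names, everything in seconds and samples; these helper functions are illustrative, not the patch code):

    /* Samples currently in the whole system: n = rb*Lso + ro*(Lbu + Lsi) */
    static double samples_in_system(double rb, double ro, double Lso, double Lsi, double Lbu) {
        return rb * Lso + ro * (Lbu + Lsi);
    }

    /* Corrected latency: L = n/rb = Lso + ro/rb*(Lbu + Lsi) */
    static double corrected_latency(double rb, double ro, double Lso, double Lsi, double Lbu) {
        return samples_in_system(rb, ro, Lso, Lsi, Lbu) / rb;
    }

    /* Number of samples corresponding to the requested latency: nf = Lfi*rb */
    static double target_samples(double Lfi, double rb) {
        return Lfi * rb;
    }

    /* Expected latency after one adjust_time interval at the new rate rn:
     * LNext = (L*rb + adjust_time*rb - adjust_time*rn) / rn */
    static double expected_next_latency(double L, double rb, double rn, double adjust_time) {
        return (L * rb + adjust_time * rb - adjust_time * rn) / rn;
    }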
I hope this explains a bit what the basis of the patch is and why I
think your
modification is wrong.
The total latency is, obviously,
latency = recording_duration - playback_duration
which, after expansion, is exactly your formula for current_latency,
with some instances of old_rate replaced with base_rate. As I said, I
think this replacement may be beneficial for reducing self-inflicted
jitter while working outside of the deadband.
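For completeness, here is the playback side and the total in the same sketch style, reusing the hypothetical bytes_to_usec() helper from above. Which rate goes into the sink_input_buffer term (old_rate or base_rate) is exactly the disputed point, so it is left as a parameter:

    /* playback_duration = bytes_to_usec(send_counter, base_rate)
     *                     - bytes_to_usec(sink_input_buffer, disputed_rate)
     *                     - sink_latency */
    static uint64_t playback_duration(uint64_t send_counter, uint64_t sink_input_buffer,
                                      uint64_t sink_latency, uint32_t base_rate,
                                      uint32_t disputed_rate, uint32_t frame_size) {
        return bytes_to_usec(send_counter, base_rate, frame_size)
             - bytes_to_usec(sink_input_buffer, disputed_rate, frame_size)
             - sink_latency;
    }

    /* Total latency = recording_duration - playback_duration */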
A wrong-and-hackish (not sure about thread safety) patch is attached that does this replacement in as many places as possible (including the message processing), in the hope of reducing jitter, and also removes corrected_latency because it is no longer needed. For me, in webcam->HDA and bluetooth->HDA scenarios, it works just as well as your original patch - but you have USB playback devices, so your results may be different. Could you please apply it on top of my older patch (the one that moves capturing the timestamps) and test? A log similar to what you have already sent, but with this patch and with both the 0.75% and the 2‰ constraints commented out, would be useful.
I did not try the patch, I do not think it is correct, see above.
+    u->latency_error = (4 * u->latency_error + (double)abs((int32_t)(current_latency - u->next_latency)) / final_latency) / 5;
OK, so latency_error is a dimensionless quantity representing the
relative (to final_latency) error. But then I can't make sense of this:
+    /* Adjust as good as physics allows (with some safety margin) */
+    if (abs(latency_difference) <= 2.5 * u->latency_error * final_latency + u->adjust_time / 2 / base_rate + 100)
+        new_rate = base_rate;
abs(latency_difference) is obviously in microseconds.
2.5 * u->latency_error * final_latency is also in microseconds, good.
100 microseconds as a fudge factor are understandable, too.
But u->adjust_time / 2 / base_rate is something strange, not microseconds. Obviously, you meant something different. Besides, this, if evaluated, would also yield at most 100 (with an adjust_time of 10 seconds), and thus would be of the same order as the fudge factor. So, since the whole deadband, according to your own testing, works fine almost without this term, maybe it is a good idea to delete it?
This is also in µs (microseconds). There is a factor of 1 second that you do not see (because it is 1). The term accounts for the minimum adjustment you can make. The minimum adjustment is 1 Hz, so there is a minimum of 1/base_rate (actually 1/(base_rate +/- 1)) seconds PER SECOND that you can add or remove.
So it does not make sense to adjust if you are less than 1/(2*base_rate) per second times adjust_time away from the requested latency. I would not want to delete it; if you consider a rate of 8000 Hz it is already 625 µs at 10 seconds adjust_time, and maybe there are people using even lower rates.
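Just to spell the arithmetic out (a sketch; adjust_time here is in microseconds, matching the numbers above):

    #include <stdint.h>

    /* The latency deviation that the minimum rate step of 1 Hz can correct over
     * one adjust_time interval is adjust_time/base_rate; below half of that,
     * adjusting cannot help. */
    static uint64_t min_useful_deviation(uint64_t adjust_time_usec, uint32_t base_rate) {
        return adjust_time_usec / 2 / base_rate;
    }

    /* adjust_time = 10 s = 10000000 us:
     *   base_rate = 48000 Hz -> ~104 us (about the size of the 100 us fudge factor)
     *   base_rate =  8000 Hz ->  625 us (no longer negligible) */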