Re: [pulseaudio-discuss] [PATCH v4] Make module loopback honor requested latency

Alexander E. Patrakov Sun, 08 Feb 2015 10:35:12 -0800

01.02.2015 03:43, Georg Chini wrote:

This is the final version of my patch for module-loopback. It is on top of the
patch I sent about an hour ago and contains a lot more changes than the previous
versions:


- Honor the specified latency if possible; if not, adjust to the lowest possible value
- Smooth switching from a fixed-latency to a dynamic-latency source or sink and vice versa
- Good rate and latency stability, no rate oscillation
- Adjusts latency as well as your setup allows
- Fast regulation of latency offsets: adjusts a 100 ms offset within 22 seconds (adjust_time=1) to 60 seconds (adjust_time=10)
- Usable latency range 4 - 30000 ms
- Avoid rewinds and "cannot peek into queue" messages during startup and switching
- Works with rates between 200 and 190000 Hz
- Maximum latency offset after a source/sink switch or at startup is around 200 ms

I also introduced a new parameter, buffer_latency_msec, which can be used together with latency_msec. If buffer_latency_msec is specified, the resulting latency will be latency_msec + buffer_latency_msec. In that case, latency_msec refers only to the source/sink latency, while buffer_latency_msec specifies the buffer part. This can save a lot of CPU at low latencies: running 10 ms latency with latency_msec=6 buffer_latency_msec=4 gives 8% CPU on my system, compared to 12% when I only specify latency_msec=10.
Additionally, you can go beyond the built-in safeguard limits: you can access the range 1 - 3 ms or lower the buffer latency for fixed latency devices. Some of my USB devices run fine at a buffer latency of fragment size + 4 ms instead of the default fragment size + 20 ms.
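A minimal sketch of how the two parameters combine, as described above. requested_total_latency_usec() is a hypothetical helper, not code from module-loopback; when buffer_latency_msec is absent it simply treats latency_msec as the total target and makes no claim about how the module splits that total internally.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_MSEC 1000ULL

/* Hypothetical helper: total requested loopback latency in microseconds.
 * With buffer_latency_msec given, latency_msec covers only the source/sink
 * part and the buffer part is added on top; otherwise latency_msec alone
 * is the end-to-end target. */
static uint64_t requested_total_latency_usec(uint32_t latency_msec,
                                             bool has_buffer_latency,
                                             uint32_t buffer_latency_msec) {
    uint64_t total = (uint64_t) latency_msec * USEC_PER_MSEC;
    if (has_buffer_latency)
        total += (uint64_t) buffer_latency_msec * USEC_PER_MSEC;
    return total;
}
```

Both configurations compared above (latency_msec=6 buffer_latency_msec=4 versus latency_msec=10 alone) target the same 10 ms total; they differ only in how that total is divided.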

I tested it all with Intel HDA, USB and Bluetooth sound devices. I would like to
see some test results from other people.

After attempting to split up this patch and to add comments, I have some remarks and questions.

+    pa_log_debug("Loopback overall latency is %0.2f ms + %0.2f ms + %0.2f ms = %0.2f ms, latency difference: %0.2f ms, rate difference: %i Hz",
                  (double) u->latency_snapshot.sink_latency / PA_USEC_PER_MSEC,
-                (double) buffer_latency / PA_USEC_PER_MSEC,
+                (double) current_buffer_latency / PA_USEC_PER_MSEC,
                  (double) u->latency_snapshot.source_latency / PA_USEC_PER_MSEC,
-                ((double) u->latency_snapshot.sink_latency + buffer_latency + u->latency_snapshot.source_latency) / PA_USEC_PER_MSEC);

I am not sure whether this split of latency accounting still makes sense, because it is not possible to attribute these latencies to any particular point in time. This is especially true for current_buffer_latency, which, to me, is just an intermediate quantity that is meaningless by itself.

Also, here is my line of thought (an alternative derivation of current_buffer_latency, which, however, does not yield exactly the same result), in some pseudocode.

At the moment source_timestamp was taken, the source had already given us receive_counter bytes of data, and had source_output_buffer bytes of data buffered at the source output level and source_latency microseconds of data still sitting in the soundcard buffer. So, at that moment, we had been recording for this amount of time, according to the source clock:

recording_duration_at_source_timestamp = source_latency + bytes_to_usec(receive_counter + source_output_buffer, base_rate)

If we knew that base_rate is accurate (i.e. that the source clock and the wall clock run at exactly the same rate), we could add the timestamp difference to see how long we have been recording at sink_timestamp:

recording_duration_at_sink_timestamp = recording_duration_at_source_timestamp + sink_timestamp - source_timestamp

We don't know that, because base_rate is in fact not accurate according to the wall clock. But we don't have an estimate of the actual source sample rate (according to the wall clock), and thus cannot translate the timestamp difference from the wall clock domain to the source clock domain any better. So we have to live with the above formula and accept that it gives an absolute error of this order:

recording_duration_error = (sink_timestamp - source_timestamp) * abs(1 - real_base_rate / base_rate)

i.e. an error of less than 0.75% of the timestamp difference, if we accept that the real sample rate never deviates from the nominal one by more than 0.75%.
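The recording-side formulas and the error bound can be sketched in C as follows. This is an illustration under stated assumptions, not code from the patch: bytes_to_usec() is a hypothetical stand-in for pa_bytes_to_usec() that takes an explicit frame size and rate instead of a pa_sample_spec (frame_size is an added parameter), and the other names follow the pseudocode above.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Hypothetical stand-in for pa_bytes_to_usec(): whole frames, then to usec. */
static uint64_t bytes_to_usec(uint64_t bytes, uint32_t frame_size, uint32_t rate) {
    return (bytes / frame_size) * USEC_PER_SEC / rate;
}

/* recording_duration_at_source_timestamp, in the source clock domain. */
static uint64_t recording_at_source_ts(uint64_t source_latency, uint64_t receive_counter,
                                       uint64_t source_output_buffer,
                                       uint32_t frame_size, uint32_t base_rate) {
    return source_latency +
           bytes_to_usec(receive_counter + source_output_buffer, frame_size, base_rate);
}

/* recording_duration_at_sink_timestamp: adds the wall-clock timestamp
 * difference as if the source clock ran exactly at base_rate. */
static uint64_t recording_at_sink_ts(uint64_t at_source_ts,
                                     uint64_t source_timestamp, uint64_t sink_timestamp) {
    return at_source_ts + (sink_timestamp - source_timestamp);
}

/* Absolute error accepted by doing so:
 * (sink_timestamp - source_timestamp) * |1 - real_base_rate / base_rate| */
static double recording_duration_error(uint64_t source_timestamp, uint64_t sink_timestamp,
                                       double real_base_rate, double base_rate) {
    double deviation = 1.0 - real_base_rate / base_rate;
    if (deviation < 0.0)
        deviation = -deviation;
    return (double) (sink_timestamp - source_timestamp) * deviation;
}
```

For example, with 16-bit stereo (4-byte frames) at 48 kHz, 192000 + 1920 bytes plus 5 ms of source latency give 1015 ms of recording at source_timestamp. A 10 ms gap between the timestamps with a 0.75% rate deviation then contributes at most 75 microseconds of error.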

Using similar arguments, let's calculate how long the sink input has been playing at sink_timestamp. The sink input, according to the source clock, has received send_counter bytes of data, but has sink_input_buffer bytes buffered in the sink input and sink_latency microseconds of data (according to the sink clock) buffered in the sink. So:

playback_duration = bytes_to_usec(send_counter, ???) - bytes_to_usec(sink_input_buffer, !!!) - sink_latency

...with an obvious source of error: we didn't convert the sink latency to the source clock domain. But this error is of the same order as the recording duration error that we already accepted (because both the sink latency and the worst-case duration between the message being sent and being processed in the pop callback are of the same order), so it's pointless to correct.

Let's see what we should put instead of the "???". Obviously, the actual rate at which the sink consumed samples. But we have previously controlled the rate at which it consumes samples, with the aim of keeping the latency constant. So "???" is just base_rate.

Now let's think about which rate should be put instead of the "!!!". Intuitively, it would appear to be old_rate, because that's the rate associated with the sink input. But there is a counterargument: that rate is constantly being manipulated in order to make the sink input consume samples faster or slower than it normally would, and thus does not represent the true sample rate of the sink input. Also, due to these manipulations, old_rate might contain jitter, and thus base_rate is a better quantity to put instead of the "!!!", with the same "we have already accepted a similar error" argument.


The total latency is, obviously,

latency = recording_duration - playback_duration

which, after expansion, is exactly your formula for current_latency, with some instances of old_rate replaced by base_rate. As I said, I think this replacement may help reduce self-inflicted jitter while working outside of the deadband.
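The playback side and the total can be sketched the same way, with base_rate substituted for both "???" and "!!!" as argued above. Again this is an illustration, not the patch itself: bytes_to_usec() is a hypothetical helper mirroring pa_bytes_to_usec() arithmetic with an explicit frame size, made signed here so that differences may go negative, and all names besides those in the pseudocode are assumptions.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000LL

/* Hypothetical helper: whole frames, then frames to microseconds (signed). */
static int64_t bytes_to_usec(int64_t bytes, int64_t frame_size, int64_t rate) {
    return (bytes / frame_size) * USEC_PER_SEC / rate;
}

/* playback_duration = bytes_to_usec(send_counter, base_rate)
 *                   - bytes_to_usec(sink_input_buffer, base_rate)
 *                   - sink_latency
 * base_rate replaces both the jittery old_rate and the unknown true rate. */
static int64_t playback_duration_usec(int64_t send_counter, int64_t sink_input_buffer,
                                      int64_t sink_latency_usec,
                                      int64_t frame_size, int64_t base_rate) {
    return bytes_to_usec(send_counter, frame_size, base_rate)
         - bytes_to_usec(sink_input_buffer, frame_size, base_rate)
         - sink_latency_usec;
}

/* latency = recording_duration - playback_duration */
static int64_t loopback_latency_usec(int64_t recording_duration_usec,
                                     int64_t playback_duration) {
    return recording_duration_usec - playback_duration;
}
```

With 4-byte frames at 48 kHz, 192000 bytes sent, 3840 bytes buffered and 8 ms of sink latency give 972 ms of playback; against 1017 ms of recording at sink_timestamp, that is a loopback latency of 45 ms.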

A wrong-and-hackish patch (I am not sure about thread safety) is attached that does this replacement in as many places as possible (including the message processing) in the hope of reducing jitter, and also removes corrected_latency because it is no longer needed. For me, in webcam->HDA and Bluetooth->HDA scenarios, it works just as well as your original patch, but you have USB playback devices, so your results may be different. Could you please apply it on top of my older patch (the one that moves capturing the timestamps) and test? A log similar to what you have already sent, but with this patch and with both the 0.75% and the 2‰ limits commented out, would be useful.

+      u->latency_error = (4 * u->latency_error + (double)abs((int32_t)(current_latency - u->next_latency)) / final_latency) / 5;
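The quoted update is a first-order low-pass filter, i.e. an exponential moving average that gives weight 1/5 to the newest sample. A minimal sketch of one filter step follows; update_latency_error() is a hypothetical name, and it uses a signed 64-bit difference instead of the patch's cast to int32_t.

```c
#include <stdint.h>

/* One step of the low-pass filter: new = (4 * old + sample) / 5, where the
 * sample is |current_latency - next_latency| / final_latency, a
 * dimensionless prediction error relative to the target latency. */
static double update_latency_error(double latency_error,
                                   int64_t current_latency, int64_t next_latency,
                                   int64_t final_latency) {
    int64_t diff = current_latency - next_latency;
    if (diff < 0)
        diff = -diff;
    double sample = (double) diff / (double) final_latency;
    return (4.0 * latency_error + sample) / 5.0;
}
```

Starting from zero, a constant 1 ms prediction error on a 20 ms target (sample 0.05) yields 0.01 after one step and converges toward 0.05, since the remaining distance shrinks by a factor of 4/5 per adjust cycle.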

OK, so latency_error is a dimensionless quantity representing the error relative to final_latency. But then I can't make sense of this:

+    /* Adjust as good as physics allows (with some safety margin) */
+    if (abs(latency_difference) <= 2.5 * u->latency_error * final_latency + u->adjust_time / 2 / base_rate + 100)
+       new_rate = base_rate;


abs(latency_difference) is obviously in microseconds.

2.5 * u->latency_error * final_latency is also in microseconds, good.

100 microseconds as a fudge factor are understandable, too.

But u->adjust_time / 2 / base_rate is something strange, not microseconds. Obviously, you meant something different. Besides, if evaluated, this term would yield only roughly 100 (104 with adjust_time of 10 seconds at a 48 kHz base rate), and thus would be of the same order as the fudge factor. So, according to your own testing, the whole deadband works fine almost without this term; maybe it is a good idea to delete it?
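A small sketch of the unit problem, assuming (as the patch's arithmetic implies) that adjust_time is stored in microseconds as a pa_usec_t, i.e. an unsigned 64-bit value, and base_rate in Hz. Microseconds divided by Hz give microsecond-seconds, not microseconds. questionable_term() and within_deadband() are hypothetical helpers; the second one is the deadband test with the term removed, as in the attached patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Integer evaluation of u->adjust_time / 2 / base_rate, as in the quoted code. */
static uint64_t questionable_term(uint64_t adjust_time_usec, uint32_t base_rate) {
    return adjust_time_usec / 2 / base_rate;
}

/* Deadband without that term: keep base_rate when |latency_difference| is
 * within 2.5 * latency_error * final_latency + 100 usec. */
static bool within_deadband(int32_t latency_difference, double latency_error,
                            uint64_t final_latency_usec) {
    int64_t d = latency_difference < 0 ? -(int64_t) latency_difference
                                       : (int64_t) latency_difference;
    return (double) d <= 2.5 * latency_error * (double) final_latency_usec + 100.0;
}
```

With adjust_time of 10 seconds the term evaluates to 104 at 48000 Hz and 113 at 44100 Hz, the same order as the 100 microsecond fudge factor.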


--
Alexander E. Patrakov

From 3fc2d409dcee3d1e9a7ad5df5ee06d3e1457f7e0 Mon Sep 17 00:00:00 2001
From: "Alexander E. Patrakov" <[email protected]>
Date: Sun, 8 Feb 2015 23:14:12 +0500
Subject: [PATCH] Use a less jittery sample rate in calculations

The sink input sample rate is constantly being manipulated, so
multiplying or dividing by it would introduce jitter. Use the source
output sample rate as a somewhat biased, but less jittery, estimate
of the average sink input sample rate.
---
 src/modules/module-loopback.c | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/src/modules/module-loopback.c b/src/modules/module-loopback.c
index 1b76ef5..33012e9 100644
--- a/src/modules/module-loopback.c
+++ b/src/modules/module-loopback.c
@@ -187,7 +187,7 @@ static void teardown(struct userdata *u) {
 static void adjust_rates(struct userdata *u) {
     size_t buffer;
     uint32_t old_rate, base_rate, new_rate, hours, cut_off_frequency;
-    pa_usec_t final_latency, source_sink_latency, current_buffer_latency, current_latency, corrected_latency;
+    pa_usec_t final_latency, source_sink_latency, current_buffer_latency, current_latency;
     double min_cycles;
     int32_t latency_difference;
     pa_usec_t snapshot_delay;
@@ -228,7 +228,7 @@ static void adjust_rates(struct userdata *u) {
         buffer += (size_t) (u->latency_snapshot.send_counter - u->latency_snapshot.recv_counter);
     else
         buffer = PA_CLIP_SUB(buffer, (size_t) (u->latency_snapshot.recv_counter - u->latency_snapshot.send_counter));
-    current_buffer_latency = pa_bytes_to_usec(buffer, &u->sink_input->sample_spec);
+    current_buffer_latency = pa_bytes_to_usec(buffer, &u->source_output->sample_spec);
 
     snapshot_delay = u->latency_snapshot.sink_timestamp - u->latency_snapshot.source_timestamp;
     current_latency = u->latency_snapshot.sink_latency + current_buffer_latency + u->latency_snapshot.source_latency - snapshot_delay;
@@ -240,14 +240,6 @@ static void adjust_rates(struct userdata *u) {
        final_latency += u->initial_buffer_latency;
     final_latency = PA_MAX(final_latency, source_sink_latency + u->buffer_latency);
 
-    pa_log_debug("Loopback overall latency is %0.2f ms + %0.2f ms + %0.2f ms = %0.2f ms, latency difference: %0.2f ms, rate difference: %i Hz",
-                (double) u->latency_snapshot.sink_latency / PA_USEC_PER_MSEC,
-                (double) current_buffer_latency / PA_USEC_PER_MSEC,
-                (double) u->latency_snapshot.source_latency / PA_USEC_PER_MSEC,
-                (double) current_latency / PA_USEC_PER_MSEC,
-                (double) (int32_t)(current_latency - final_latency) / PA_USEC_PER_MSEC,
-                (int32_t)(old_rate - base_rate));
-
    /* Low pass filtered difference between expectation value and observed latency */
    if (!u->source_sink_changed)
       u->latency_error = (4 * u->latency_error + (double)abs((int32_t)(current_latency - u->next_latency)) / final_latency) / 5;
@@ -255,8 +247,14 @@ static void adjust_rates(struct userdata *u) {
       u->source_sink_changed = false;
 
     /* Latency and latency difference at base rate */
-    corrected_latency = u->latency_snapshot.source_latency + (u->latency_snapshot.sink_latency + current_buffer_latency) * old_rate / base_rate - snapshot_delay;
-    latency_difference = (int32_t)(corrected_latency - final_latency);
+    latency_difference = (int32_t)(current_latency - final_latency);
+
+    pa_log_debug("Loopback overall latency is %0.2f ms, latency difference: %0.2f ms, allowed error: %0.2f ms, rate difference: %i Hz",
+                (double) current_latency / PA_USEC_PER_MSEC,
+                (double) latency_difference / PA_USEC_PER_MSEC,
+                (double) (2.5 * u->latency_error * final_latency + 100) / PA_USEC_PER_MSEC,
+                (int32_t)(old_rate - base_rate));
+
 
     /* Minimum number of adjust times + 1 needed to adjust at 0.75% deviation from base rate */
     min_cycles = (double)abs(latency_difference) / u->adjust_time / 0.0075 + 1;
@@ -265,7 +263,7 @@ static void adjust_rates(struct userdata *u) {
     new_rate = base_rate * (1.0 + latency_difference / min_cycles / u->adjust_time) + 0.5;
 
     /* Adjust as good as physics allows (with some safety margin) */
-    if (abs(latency_difference) <= 2.5 * u->latency_error * final_latency + u->adjust_time / 2 / base_rate + 100)
+    if (abs(latency_difference) <= 2.5 * u->latency_error * final_latency + 100)
        new_rate = base_rate;
 
     /* Do the adjustment in small steps; 2‰ of base rate can be considered inaudible, use 1 Hz below 500 Hz base rate */
@@ -276,7 +274,7 @@ static void adjust_rates(struct userdata *u) {
     }
 
     /* Predictor */
-    u->next_latency = (corrected_latency * base_rate + (int32_t)(base_rate - new_rate) * (int64_t)u->adjust_time) / new_rate;
+    u->next_latency = (current_latency * base_rate + (int32_t)(base_rate - new_rate) * (int64_t)u->adjust_time) / new_rate;
 
     /* Set rate */
     pa_sink_input_set_rate(u->sink_input, new_rate);
@@ -351,7 +349,7 @@ static void memblockq_adjust(struct userdata *u, int32_t offset, bool allow_push
     else
        requested_buffer_latency = requested_buffer_latency - offset;
 
-    requested_buffer_bytes = pa_usec_to_bytes(requested_buffer_latency, &u->sink_input->sample_spec);
+    requested_buffer_bytes = pa_usec_to_bytes(requested_buffer_latency, &u->source_output->sample_spec);
     memblock_bytes = pa_memblockq_get_length(u->memblockq);
 
     /* Drop audio from queue */
@@ -553,7 +551,7 @@ static void source_output_moving_cb(pa_source_output *o, pa_source *dest) {
 
     pa_sink_input_get_latency(u->sink_input, &sink_latency);
     if (u->send_counter > u->recv_counter)
-       sink_latency += pa_bytes_to_usec(u->send_counter - u->recv_counter, &u->sink_input->sample_spec);
+       sink_latency += pa_bytes_to_usec(u->send_counter - u->recv_counter, &u->source_output->sample_spec);
     if (dest->flags & PA_SOURCE_DYNAMIC_LATENCY)
        sink_latency += pa_source_get_latency(dest);
     else
@@ -709,12 +707,13 @@ static int sink_input_process_msg_cb(pa_msgobject *obj, int code, void *data, in
 
         case SINK_INPUT_MESSAGE_LATENCY_SNAPSHOT: {
             size_t length;
+            size_t resampled_length;
 
             length = pa_memblockq_get_length(u->sink_input->thread_info.render_memblockq);
+            resampled_length = length * u->source_output->sample_spec.rate / u->sink_input->sink->sample_spec.rate;
 
             u->latency_snapshot.recv_counter = u->recv_counter;
-            u->latency_snapshot.sink_input_buffer = pa_memblockq_get_length(u->memblockq) +
-                                                    (u->sink_input->thread_info.resampler ? pa_resampler_request(u->sink_input->thread_info.resampler, length) : length);
+            u->latency_snapshot.sink_input_buffer = pa_memblockq_get_length(u->memblockq) + resampled_length;
             u->latency_snapshot.sink_latency = pa_sink_get_latency_within_thread(u->sink_input->sink);
             u->latency_snapshot.sink_timestamp = pa_rtclock_now();
 
@@ -874,7 +873,7 @@ static void sink_input_moving_cb(pa_sink_input *i, pa_sink *dest) {
 
     pa_source_output_get_latency(u->source_output, &source_latency);
     if (u->send_counter > u->recv_counter)
-       source_latency += pa_bytes_to_usec(u->send_counter - u->recv_counter, &u->sink_input->sample_spec);
+       source_latency += pa_bytes_to_usec(u->send_counter - u->recv_counter, &u->source_output->sample_spec);
     memblockq_adjust(u, source_latency, true);
 
     u->latency_error = 400.0 / get_requested_latency(u);
-- 
2.2.1

_______________________________________________
pulseaudio-discuss mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss
