On 08.02.2015 19:34, Alexander E. Patrakov wrote:
01.02.2015 03:43, Georg Chini wrote:
This is the final version of my patch for module-loopback. It is on
top of the
patch I sent about an hour ago and contains a lot more changes than
the previous
versions:
- Honor the specified latency if possible; if not, adjust to the lowest possible value
- Smooth switching from a fixed latency to a dynamic latency source or sink and vice versa
- Good rate and latency stability, no rate oscillation
- Adjusts latency as well as your setup allows
- Fast regulation of latency offsets, adjusts a 100 ms offset within 22 seconds (adjust_time=1) to 60 seconds (adjust_time=10)
- Usable latency range 4 - 30000 ms
- Avoids rewinds and "cannot peek into queue" messages during startup and switching
- Works with rates between 200 and 190000 Hz
- Maximum latency offset after a source/sink switch or at startup is around 200 ms
I also introduced a new parameter, buffer_latency_msec, which can be used together with latency_msec. If buffer_latency_msec is specified, the resulting latency will be latency_msec + buffer_latency_msec. latency_msec then refers only to the source/sink latency, while buffer_latency_msec specifies the buffer part.
This can be used to save a lot of CPU at low latencies: running 10 ms latency with latency_msec=6 buffer_latency_msec=4 gives 8% CPU on my system, compared to 12% when I only specify latency_msec=10.
Additionally, you can go beyond the built-in safe-guard limits: you can access the range 1 - 3 ms or lower the buffer latency for fixed latency devices. Some of my USB devices run fine at a buffer latency of fragment size + 4 ms instead of the default fragment size + 20 ms.
I tested it all with Intel HDA, USB and bluetooth sound devices. I
would like to
see some test results from other people.
While attempting to split up this patch and add comments, I came up with some remarks and questions.
+    pa_log_debug("Loopback overall latency is %0.2f ms + %0.2f ms + %0.2f ms = %0.2f ms, latency difference: %0.2f ms, rate difference: %i Hz",
                 (double) u->latency_snapshot.sink_latency / PA_USEC_PER_MSEC,
-                (double) buffer_latency / PA_USEC_PER_MSEC,
+                (double) current_buffer_latency / PA_USEC_PER_MSEC,
                 (double) u->latency_snapshot.source_latency / PA_USEC_PER_MSEC,
-                ((double) u->latency_snapshot.sink_latency + buffer_latency + u->latency_snapshot.source_latency) / PA_USEC_PER_MSEC);
I am not sure whether this split of latency accounting makes sense
anymore, because it is not possible to attribute these latencies to
any particular point in time. Especially current_buffer_latency, which
(for me) is just a meaningless-by-itself intermediate quantity.
I don't care, I left it in because it was already there. If you would
like to delete it, no problem.
But maybe it makes sense to print out the corrected latency at this point.
Also, here is my line of thought (an alternative derivation of
current_buffer_latency, which does not, however, yield exactly the
same), in some pseudocode.
At the moment source_timestamp was taken, the source had already given us receive_counter bytes of data, had source_output_buffer bytes of data buffered at the source output level, and had source_latency microseconds of data still sitting in the soundcard buffer. So, at that moment, we have been recording for this amount of time, according to the source clock:

recording_duration_at_source_timestamp = source_latency + bytes_to_usec(receive_counter + source_output_buffer, base_rate)
If we knew that base_rate is accurate (i.e. that the source clock and
wall clock are exactly the same), we could add the timestamp
difference to see for how long we have been recording at sink_timestamp:
recording_duration_at_sink_timestamp =
recording_duration_at_source_timestamp + sink_timestamp -
source_timestamp
We don't know that, because base_rate is in fact not accurate
according to the wall clock. But we don't have an estimate of the
actual source sample rate (according to the wall clock), and thus
cannot translate the timestamp difference from the wallclock domain to
the source clock domain any better. So we have to live with the above
formula, and accept that it gives us the absolute error of this order:
recording_duration_error = (sink_timestamp - source_timestamp) * abs(1
- real_base_rate / base_rate)
i.e. less than 0.75% of error if we accept that the real sample rate
never deviates from the nominal one by more than 0.75%.
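In C, those two steps might look like this (a sketch only; bytes_to_usec() is a hypothetical helper standing in for the pseudocode, while the real pa_bytes_to_usec() works on a pa_sample_spec rather than a raw rate and frame size):

    #include <stdint.h>

    /* Hypothetical stand-in for the pseudocode's bytes_to_usec() */
    static uint64_t bytes_to_usec(uint64_t bytes, uint32_t rate, uint32_t frame_size) {
        return bytes / frame_size * 1000000ULL / rate;
    }

    static uint64_t recording_duration_at_sink_timestamp(uint64_t source_latency,
                                                          uint64_t receive_counter,
                                                          uint64_t source_output_buffer,
                                                          uint32_t base_rate,
                                                          uint32_t frame_size,
                                                          uint64_t source_timestamp,
                                                          uint64_t sink_timestamp) {
        /* Recording duration at source_timestamp, in the source clock domain */
        uint64_t at_source = source_latency +
            bytes_to_usec(receive_counter + source_output_buffer, base_rate, frame_size);

        /* Carry forward to sink_timestamp, pretending the source clock equals the
         * wall clock; the error of this step is bounded by
         * (sink_timestamp - source_timestamp) * |1 - real_base_rate/base_rate|. */
        return at_source + (sink_timestamp - source_timestamp);
    }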
Using similar arguments, let's calculate how long the sink input has been playing at sink_timestamp. The sink input, according to the source clock, has received send_counter bytes of data, but still has sink_input_buffer bytes buffered in the sink input and sink_latency microseconds of data (according to the sink clock) buffered in the sink. So:

playback_duration = bytes_to_usec(send_counter, ???) - bytes_to_usec(sink_input_buffer, !!!) - sink_latency

...with an obvious source of error: we didn't convert the sink latency to the source clock domain. But this error is of the same order as the recording duration error that we already accepted (because both the sink latency and the worst-case duration between the message being sent and being processed in the pop callback are of the same order), so it's pointless to correct.
Let's see what we should put instead of the "???". Obviously, the
actual rate with which the sink consumed samples. But we have
previously controlled the rate at which it consumes samples, with the
aim of keeping the latency constant. So "???" is just base_rate.
Now let's think about which rate should be put instead of the "!!!". Intuitively, it would appear that it is old_rate, because that is the rate associated with the sink input.
But there is a counterargument: that rate is constantly being manipulated in order to make the sink input consume samples faster or slower than it normally would, and thus it does not represent the true sample rate of the sink input.
I cannot follow this argument. The rate has been constant for at least 1
second, so why would
the rate then not represent the real rate of the sink?
Also, due to these manipulations, old_rate might contain jitter, and
thus base_rate is a better quantity to put instead of the "!!!", with
the same "we have already accepted a similar error" argument.
Not at all! Consider a latency of 30000 ms. Deliberately introducing an error of 0.75% here means accepting 225 ms of deviation that is not necessary! This is something I observed in reality, and it is why I needed the corrected latency at all. You will massively over-correct if you do what you propose. You will not see a lot of difference for small latencies, however.
After all, the wall clock is not too important in that case; most of it happens either in the source or the sink domain, and the wall clock is just the domain you translate to so that you can do the calculations. If the rates are not exact it does not matter; you will then see some drift that the controller will correct.
Let's see how I arrived at the final equation (and I am quite sure I am correct here):
The basis of it all is the equation L = n/r, where L is latency, n is the number of samples and r the rate.
Let's call the base rate rb, the current ("old") rate ro, the sink latency Lsi, the source latency Lso and the buffer latency Lbu. Then at any given time the number of samples in the whole system is
n = rb*Lso + ro*(Lbu + Lsi),
which means that at base rate we have the latency
L = n/rb = Lso + ro/rb*(Lbu + Lsi) - that is my corrected latency. And this is the value I want to control, so I put this into the controller.
The final number of samples I want to reach is nf = Lfi*rb (Lfi being the final or requested latency).
The expectation value for the next latency is calculated similarly: the current number of samples (= L*rb) plus the samples that are added during adjust_time minus the samples that are played during adjust_time, which gives
LNext = (L*rb + adjust_time*rb - adjust_time*rn)/rn
(Oops, I just noticed I made a mistake in calculating the current latency, which is n/ro = rb/ro*Lso + Lbu + Lsi and not just Lso + Lbu + Lsi, but I think that is not too important because it is only used to calculate the error. Can you nevertheless correct it please when you split the patch?)
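Transcribed literally into a small C sketch (just the formulas above with the same shorthand names, everything in seconds and samples; these helper functions are illustrative, not the patch code):

    /* Samples currently in the whole system: n = rb*Lso + ro*(Lbu + Lsi) */
    static double samples_in_system(double rb, double ro, double Lso, double Lsi, double Lbu) {
        return rb * Lso + ro * (Lbu + Lsi);
    }

    /* Corrected latency: L = n/rb = Lso + ro/rb*(Lbu + Lsi) */
    static double corrected_latency(double rb, double ro, double Lso, double Lsi, double Lbu) {
        return samples_in_system(rb, ro, Lso, Lsi, Lbu) / rb;
    }

    /* Number of samples corresponding to the requested latency: nf = Lfi*rb */
    static double target_samples(double Lfi, double rb) {
        return Lfi * rb;
    }

    /* Expected latency after one adjust_time interval at the new rate rn:
     * LNext = (L*rb + adjust_time*rb - adjust_time*rn) / rn */
    static double expected_next_latency(double L, double rb, double rn, double adjust_time) {
        return (L * rb + adjust_time * rb - adjust_time * rn) / rn;
    }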
I hope this explains a bit what the basis of the patch is and why I
think your
modification is wrong.
The total latency is, obviously,
latency = recording_duration - playback_duration
which, after expansion, is exactly your formula for current_latency,
with some instances of old_rate replaced with base_rate. As I said, I
think this replacement may be beneficial for reducing self-inflicted
jitter while working outside of the deadband.
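For completeness, here is the playback side and the total in the same sketch style, reusing the hypothetical bytes_to_usec() helper from above. Which rate goes into the sink_input_buffer term (old_rate or base_rate) is exactly the disputed point, so it is left as a parameter:

    /* playback_duration = bytes_to_usec(send_counter, base_rate)
     *                     - bytes_to_usec(sink_input_buffer, disputed_rate)
     *                     - sink_latency */
    static uint64_t playback_duration(uint64_t send_counter, uint64_t sink_input_buffer,
                                      uint64_t sink_latency, uint32_t base_rate,
                                      uint32_t disputed_rate, uint32_t frame_size) {
        return bytes_to_usec(send_counter, base_rate, frame_size)
             - bytes_to_usec(sink_input_buffer, disputed_rate, frame_size)
             - sink_latency;
    }

    /* Total latency = recording_duration - playback_duration */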
A wrong-and-hackish (not sure about thread safety) patch is attached that does this replacement in as many places as possible (including the message processing), in the hope of reducing jitter, and also removes corrected_latency because it is no longer needed. For me, in webcam->HDA and bluetooth->HDA scenarios, it works just as well as your original patch - but you have USB playback devices, so your results may be different. Could you please apply it on top of my older patch (the one that moves capturing the timestamps) and test? A log similar to what you have already sent, but with this patch and with both the 0.75% and the 2‰ constraints commented out, would be useful.
I did not try the patch, I do not think it is correct, see above.
+    u->latency_error = (4 * u->latency_error + (double)abs((int32_t)(current_latency - u->next_latency)) / final_latency) / 5;
OK, so latency_error is a dimensionless quantity representing the
relative (to final_latency) error. But then I can't make sense of this:
+    /* Adjust as good as physics allows (with some safety margin) */
+    if (abs(latency_difference) <= 2.5 * u->latency_error * final_latency + u->adjust_time / 2 / base_rate + 100)
+        new_rate = base_rate;
abs(latency_difference) is obviously in microseconds.
2.5 * u->latency_error * final_latency is also in microseconds, good.
100 microseconds as a fudge factor are understandable, too.
But u->adjust_time / 2 / base_rate is something strange, not microseconds. Obviously, you meant something different. Besides, this, if evaluated, would also yield at most 100 (with an adjust_time of 10 seconds), and thus would be of the same order as the fudge factor. So, since the whole deadband, according to your own testing, works fine almost without this term, maybe it is a good idea to delete it?
This is also in µs (microseconds). There is a factor of 1 second that you do not see (because it is 1). The term accounts for the minimum adjustment you can make. The minimum adjustment is 1 Hz, so there is a minimum of 1/base_rate (actually 1/(base_rate +/- 1)) seconds PER SECOND that you can add or remove.
So it does not make sense to adjust if you are less than 1/(2*base_rate) per second times adjust_time away from the requested latency. I would not want to delete it; if you consider a rate of 8000 Hz it is already 625 µs at 10 seconds adjust_time, and maybe there are people using even lower rates.
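Just to spell the arithmetic out (a sketch; adjust_time here is in microseconds, matching the numbers above):

    #include <stdint.h>

    /* The latency deviation that the minimum rate step of 1 Hz can correct over
     * one adjust_time interval is adjust_time/base_rate; below half of that,
     * adjusting cannot help. */
    static uint64_t min_useful_deviation(uint64_t adjust_time_usec, uint32_t base_rate) {
        return adjust_time_usec / 2 / base_rate;
    }

    /* adjust_time = 10 s = 10000000 us:
     *   base_rate = 48000 Hz -> ~104 us (about the size of the 100 us fudge factor)
     *   base_rate =  8000 Hz ->  625 us (no longer negligible) */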