On 08.04.2016 18:01, Tanu Kaskinen wrote:
On Wed, 2016-04-06 at 21:17 +0200, Georg Chini wrote:
On 06.04.2016 15:55, Tanu Kaskinen wrote:
On Tue, 2016-04-05 at 21:54 +0200, Georg Chini wrote:
On 05.04.2016 17:42, Tanu Kaskinen wrote:
I can't follow that line of reasoning. In the beginning the ring buffer
is filled to max, and once you call snd_pcm_start(), data starts to
move from the ring buffer to other buffers (I'll call the other buffers
the "not-ring-buffer"). Apparently the driver "sees" the not-ring-
buffer only partially, since it reports a larger latency than just the
ring buffer fill level, but it still doesn't report the full latency.
The time between snd_pcm_start() and the point where the reported delay
no longer equals the written amount tells the size of the visible part
of the not-ring-buffer - it's the time it took for the first sample to
travel from the ring buffer to the invisible part of the not-ring-
buffer. I don't understand how that time could say anything about the
size of the invisible part of the not-ring-buffer. Your logic "works"
only if the visible and invisible parts happen to be of the same size.
You should get the same results by calculating

    adjusted delay = ring buffer fill level
                     + 2 * (reported delay - ring buffer fill level)
That formula doesn't make sense, but that's how I understand your logic
to work, with the difference that your fix is based on one measurement
only, so it's constant over time, while my formula recalculates the
adjustment every time the delay is queried, so the adjustment size
varies somewhat depending on the granularity at which audio moves to
and from the visible part of the not-ring-buffer.
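In code, that recalculation would look something like this (a sketch,
names hypothetical):

    #include <stdint.h>

    /* Bakes in the assumption questioned above: the invisible part of
     * the not-ring-buffer is exactly as large as the visible part. */
    static uint64_t adjusted_delay(uint64_t reported_delay,
                                   uint64_t ring_fill) {
        uint64_t visible_not_ring = reported_delay - ring_fill;

        /* Count the visible not-ring-buffer part twice; the second
         * copy stands in for the invisible part of (assumed) equal
         * size. */
        return ring_fill + 2 * visible_not_ring;
    }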
In any case, even if your logic actually makes sense and I'm just
misunderstanding something, I don't see why the correction should be
done in pulseaudio instead of the alsa driver.
Well, now I don't understand what you mean. The logic is very simple:
if there is an unreported delay between the time snd_pcm_start() is
called and the time when the first sample is delivered to the DAC, then
this delay will persist and become part of the continuous latency.
That's all; what causes the delay is completely irrelevant.
The code can't know when the first sample hits the DAC. The delay
reported by alsa is supposed to tell that, but if the reported delay is
wrong, I don't think you have any way to know the real delay.
Yes, the code can know when the first sample hits the DAC. I explained
it already. Before the first sample hits the DAC, the delay is growing
and is larger than or equal to the number of samples you have written
to the buffer.
At the moment the delay becomes smaller than the write count, you can
be sure that at least some audio has been delivered. Since the delay is
decreased by the amount of audio that has been delivered to the DAC,
you can work back in time to the moment when the first sample was
played.
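As a sketch of that detection (my own illustration; all names are
hypothetical, this is not the actual patch):

    #include <stdbool.h>
    #include <stdint.h>

    /* written: total frames written since snd_pcm_start()
     * delay:   current delay in frames, as reported by snd_pcm_delay()
     * Returns true once some audio has demonstrably been played and
     * stores the unreported "extra delay" in *extra_usec. */
    static bool detect_extra_delay(uint64_t written, uint64_t delay,
                                   unsigned rate, int64_t now_usec,
                                   int64_t start_usec,
                                   int64_t *extra_usec) {
        if (delay >= written)
            return false; /* nothing has reached the DAC yet */

        /* The delay dropped by (written - delay) frames, so the first
         * sample was played that many frame periods before "now". */
        int64_t played_usec =
            (int64_t)((written - delay) * 1000000ULL / rate);
        int64_t first_sample_usec = now_usec - played_usec;

        /* Extra delay: the time from snd_pcm_start() until the first
         * sample hit the DAC; it may even come out negative. */
        *extra_usec = first_sample_usec - start_usec;
        return true;
    }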
Yes, you explained that already, but you didn't give a convincing
explanation of why the point in time when the delay stops growing would
indicate the point when the first sample hit the DAC.
See below. The precondition for my reasoning is naturally that no
samples vanish from the latency reports; maybe that is where
we are thinking differently.
Maybe what I said above was not complete. At the point in time when
the first audio is played, there are two delays: the one reported by
alsa, and the difference between the time stamps minus the played
audio. If these two delays don't match, then there is an
"extra delay" that has to be taken into account.
The difference between the time stamps is not related to how big the
invisible part of the buffer is. I'll try to illustrate:
In the beginning, pulseaudio has written 10 ms of audio to the ring
buffer, and snd_pcm_start() hasn't been called:
DAC <- ssssssssss|sss|dddddddddd <- pulseaudio
Here "ssssssssss|sss|ddddddddd" is the whole buffer between the DAC and
pulseaudio. It's divided into three parts; the pipe characters separate
the different parts. Each letter represents 1 ms of data. "s" stands
for silence and "d" stands for data. The first part of the buffer is
the invisible part that is not included in the delay reports. I've put
10 ms of data there, but it's unknown to the driver how big the
invisible part is. The middle part of the buffer is the "send buffer"
that the driver maintains, its size is 3 ms in this example. It's
filled with silence in the beginning. The third part is the ring
buffer, containing 10 ms of data from pulseaudio.
At this point the driver reports 10 ms latency. It knows it has 3 ms of
silence buffered too, which it should include in its latency report,
but it's stupid, so it only reports the data in the ring buffer. The
driver has no idea how big the invisible part is, so it doesn't include
it in the report.
Now pulseaudio calls snd_pcm_start(), which causes data to start moving
from the ring buffer to the send buffer. After 1 ms the situation looks
like this:
DAC <- ssssssssss|ssd|ddddddddd <- pulseaudio
There's 2 ms of silence in the send buffer and 1 ms of data. The driver
again ignores the silence in the send buffer, and reports that the
delay is 10 ms, which consists of 1 ms of data in the send buffer and 9
ms of data in the ring buffer.
After 2 ms:
DAC <- ssssssssss|sdd|dddddddd <- pulseaudio
Reported delay: 10 ms
After 3 ms:
DAC <- ssssssssss|ddd|ddddddd <- pulseaudio
Reported delay: 10 ms
Let's say pulseaudio refills the ring buffer now.
DAC <- ssssssssss|ddd|dddddddddd <- pulseaudio
Reported delay: 13 ms
After 4 ms:
DAC <- sssssssssd|ddd|ddddddddd <- pulseaudio
The first data chunk has now entered the invisible part of the buffer,
but it will still take 9 ms before it hits the DAC. At this point
pulseaudio has written 13 ms of audio, and the reported delay is 12 ms.
According to your logic, the adjusted delay is 12 + (4 - 1) = 15 ms,
while in reality the latency is 22 ms.
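For what it's worth, this walkthrough can be reproduced with a toy
simulation of the model (a sketch of my own; 1 ms units, numbers as in
the example):

    #include <stdio.h>

    int main(void) {
        /* FIFO counters for the three buffer parts, in 1 ms units.
         * The initial silence always sits in front of any data. */
        int inv_sil = 10, inv_dat = 0; /* invisible part, starts silent */
        int snd_sil = 3,  snd_dat = 0; /* send buffer, starts silent */
        int ring = 10;                 /* ring buffer, data only */

        for (int t = 1; t <= 4; t++) {
            /* the DAC consumes the oldest unit of the invisible part */
            if (inv_sil > 0) inv_sil--; else inv_dat--;
            /* the oldest send buffer unit moves to the invisible part */
            if (snd_sil > 0) { snd_sil--; inv_sil++; }
            else if (snd_dat > 0) { snd_dat--; inv_dat++; }
            /* one unit moves from the ring buffer to the send buffer */
            if (ring > 0) { ring--; snd_dat++; }

            if (t == 3)
                ring += 3; /* pulseaudio refills the ring buffer */

            /* the driver reports only the data it can see */
            printf("t=%d ms: reported=%d ms, real=%d ms\n", t,
                   snd_dat + ring,
                   inv_sil + inv_dat + snd_sil + snd_dat + ring);
        }
        return 0;
    }

It prints reported delays of 10, 10, 13 and 12 ms while the real
latency stays at 21-23 ms, matching the walkthrough.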
At this point, no audio has been played yet. You still have silence in
the buffer, so alsa would not report back that samples have been played.
But the reported delay stopped growing! That's the point where you
claim the first sample hits the DAC, but as my example illustrates,
that doesn't seem to be true.
In your example it is not true, that's right. But for the USB devices
it is. They only start decreasing the delay when real audio has been
played, and they increase the delay when you write to the buffer;
I have checked that in the code.
And I think any driver that makes samples vanish is so severely screwed
that we can't do anything about it. If the driver reports complete
moonshine numbers, you can't fix it; I agree with you in that respect.
But that is not the case with USB. There is only some missing latency
that is not reported - call it transport delay or whatever - and I
suspect a similar delay can be found in other alsa drivers. There is no
need to figure out the reason for it; it just takes some time after
snd_pcm_start() was called until the first sample is played - without
making samples vanish.
And in that case the delay can be detected and used by the code.
I choose the point where the first "d" hits the DAC and that is
reported back by alsa (see above). I've tried to put it all together in
a document. I hope I can finish the part that deals with the smoother
code today. If so, I will send it to you privately because the part
about module-loopback is still missing.
Anyway, even if you think it is wrong, I am still measuring the correct
end-to-end latency with my code, so something I am doing must be
right ...
From what I can tell, that's a coincidence.
No, it definitely isn't. If you accept the precondition that samples do
not simply vanish from the latency reports, it's physics.
I would tend to agree that I had overlooked something if the "extra
delay" were the same every time and if I could not write down the math
for it.
But it isn't completely constant (just in the same range), and I can
write down the math and it matches my measurements. So I am fairly sure
that I am right. Did you have a look at my document?
I don't know how well this model reflects the reality of how the usb
audio driver works, but this model seems like a plausible explanation
for why the driver reports delays equalling the amount of written data
in the beginning, and why the real latency is higher than the reported
latency at later times.
I hope this also clarifies why I don't buy your argument that the time
stamp difference is somehow related to the unreported latency.
No, in fact it doesn't.
Trying to fix up that delay on every iteration does not make any sense
at all; it is there from the start and it is constant.
Commenting on "it is constant": The playback latency is the sum of data
in various buffers. The DAC consumes one sample at a time from the very
last buffer, but I presume that all other places move data in bigger
chunks than one sample. The unreported delay can only be constant if
data moves to the invisible part of the buffering in one-sample chunks.
Otherwise the latency goes down every time the DAC reads a sample, and
then when the buffer is refilled at the other end, the latency jumps up
by the refill amount.
I only said the "extra latency" is constant, not the latency as such.
Your own example above shows why your argument is wrong: even
if the audio is moved in chunks through your invisible buffer part,
that part still has the same length all the time. When one "d" is
moved forward, another one will replace it.
No, the invisible part is not constant, even though my presentation
didn't show the variance. The DAC consumes data from the invisible
buffer one sample at a time, and each time it does that, the extra
latency decreases by one sample. Data moves from the visible part of
the buffer to the invisible part in bigger chunks. I didn't specify the
chunk size, but if we assume 1 ms chunks, the extra latency grows by 1
ms every time a chunk is transferred from the visible part to the
invisible part.
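Under that assumption the extra latency follows a sawtooth. A toy
simulation (my own sketch; 48 kHz and 1 ms chunks assumed):

    #include <stdio.h>

    int main(void) {
        const int chunk = 48; /* 1 ms worth of frames at 48 kHz */
        int invisible = 480;  /* invisible part, initially 10 ms */

        for (int frame = 0; frame < 480; frame++) {
            invisible--; /* the DAC consumes one frame at a time */
            if (invisible <= 480 - chunk) {
                /* a whole chunk arrives from the visible part at once */
                invisible += chunk;
                printf("refill after frame %d: %d -> %d frames\n",
                       frame, invisible - chunk, invisible);
            }
        }
        return 0;
    }

The invisible part, and with it the extra latency, keeps bouncing
between 432 and 480 frames, i.e. it varies by one full chunk.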
Then take any part of the buffer but the last or the first bit. All the
chunks are always full, so it's constant. The moving bit is dealt with
elsewhere (in the smoother), but there is a lot of buffer that is
always full.
And when you take USB, the driver sees only chunks. The sample-by-
sample consumption of the DAC is never seen by the driver; it gets a
notification from USB that a chunk has been played.
I'm not sure how it is with HDA, but it is probably similar.
This is not a negative delay reported by alsa, but my "extra latency"
is getting negative, which means playback must have started
before snd_pcm_start().
According to Raymond Yau, playback does in fact seem to be started
before snd_pcm_start() for HDA devices, at least if I read his last
mail on that topic correctly. Then the negative delays would even make
sense, since data is written to the buffer before snd_pcm_start().
I had a look at the code to verify the claim that we configure alsa to
start playback already before we call snd_pcm_start(). If we really do
that intentionally, then it doesn't make sense to call snd_pcm_start()
explicitly.
This is what we do:
snd_pcm_sw_params_set_start_threshold(pcm, swparams, (snd_pcm_uframes_t) -1)
Note the casting of -1 to an unsigned integer. It seems that the
intention is to set as high a threshold as possible to avoid automatic
starting. However, alsa-lib casts the threshold back to a signed value
when it's used, and I believe the end result is indeed that playback
starts immediately after the first write. I don't know if that matters,
since we do the manual snd_pcm_start() call immediately after the first
write anyway, but it seems like a bug in any case.
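For comparison, the overflow could be avoided by querying the boundary
value, which I believe is the conventional way to say "never start
automatically" (a sketch, error handling omitted):

    #include <alsa/asoundlib.h>

    /* Sketch: use the boundary value instead of casting -1, so the
     * threshold can never be exceeded by the ring buffer fill level. */
    static void disable_auto_start(snd_pcm_t *pcm,
                                   snd_pcm_sw_params_t *swparams) {
        snd_pcm_uframes_t boundary = 0;

        snd_pcm_sw_params_get_boundary(swparams, &boundary);
        snd_pcm_sw_params_set_start_threshold(pcm, swparams, boundary);
        snd_pcm_sw_params(pcm, swparams); /* apply the software params */
    }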
Not very important, but I'll clarify one thing: I had another look, and
I'm no longer sure that the code where I saw the casting back to a
signed integer is actually used by pulseaudio. The function is
snd_pcm_write_areas(), but pulseaudio doesn't call that, at least not
directly, and I did some searching in alsa-lib too, and I didn't find a
call path that would cause snd_pcm_write_areas() to be used by
pulseaudio. Even if snd_pcm_write_areas() isn't used, though, it's
entirely possible that there's some other code that does a similar
cast. I don't know where the code is that triggers the snd_pcm_start()
call when the ring buffer fill level exceeds the configured threshold.
It might be in the kernel.
OK, this is why I measure an "extra latency" of -60 to -20 usec.
So again, if I can measure it and even detect a bug that way,
don't you think there must be some truth in what I'm saying?
Do I understand correctly that your "extra latency" is affected by
whether snd_pcm_start() is called implicitly in mmap_write() or
explicitly after mmap_write()? The time when mmap_write() is called
doesn't affect the latency in the long term.
It does. It isn't much, but if playback starts earlier, the delay
will be exactly that amount less, even after 10 hours of playback.
Let's assume you have 10 ms of audio to write to the buffer.
During the time when you write, samples are coming in.
Let's say it takes 100 usec to write the buffer. If you start
playback after the write, this will be 100 usec of additional delay;
5 samples have accumulated.
If you start playback immediately after the first bit of data is
written, this might take much less time, say 20 usec.
So your delay is four samples less, and it will remain that way
until the sink is stopped. There is nothing that would take away
the delay.
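(For those numbers to add up, the sample rate is presumably around
48 kHz: 100 usec at 48 kHz is about 4.8, rounded to 5 samples, while
20 usec is about 1 sample, leaving the permanent difference of roughly
4 samples.)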
The smoother will produce
wrong values if it's not started at the same time as snd_pcm_start() is
called, but I presume the smoother is able to fix such inaccuracies
over time, so it doesn't matter that much when snd_pcm_start() is
called. So isn't it a bad thing if your "extra latency" permanently
includes something that doesn't have any real effect after some time?
Yes, it is affected by it, and it should be, because the "extra delay"
is the time between snd_pcm_start() and the first sample being
played. So if the first samples are played before snd_pcm_start(),
the "extra latency" will become negative. And as explained above,
it has a permanent effect. Somehow you seem to be of the opinion
that all delays that are not controlled by the pulseaudio code
vanish magically, but they don't.
For the reported latency, it just means that it will become slightly
smaller. As I said, the smoother does not use the "extra delay"
for anything; it is only calculated once, when the origin for the
smoother is set, and added later as an offset when get_latency()
is called.
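In code terms, roughly (names hypothetical, not the actual patch):

    #include <stdint.h>

    /* Hypothetical helper standing in for the smoother's estimate. */
    extern int64_t smoother_based_latency(void);

    /* Measured once, when the origin for the smoother is set. */
    static int64_t extra_delay_usec;

    /* The smoother never sees the extra delay; it is only added as a
     * constant offset on every latency query. */
    int64_t get_latency(void) {
        return smoother_based_latency() + extra_delay_usec;
    }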
Have you investigated why there are so many memblocks active that the
pool gets full? It seems fishy to me. The limit could be triggered e.g.
by a thousand 1 ms blocks in a 1 second buffer, but why would you do
that?
I have no idea at all. It always happens when you run a sink at
very low latencies, and I have observed this also with the code
from git. It is not something that has been introduced by my
changes, so I did not investigate further and just accepted that
there seems to be a lower limit to the sink latency at around 2.3 ms
for HDA. Now that USB devices also use timer-based scheduling,
I am seeing it there as well at around 8 ms configured latency.
The message might indicate some bug somewhere, but I think it's
sufficient if it's printed just once. If it's printed once, it's likely
to be printed many more times, and those additional messages don't
really add any value. So you could add some flag that allows us to
print the message only once.
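Something like this generic sketch (the message text is invented; real
code would use pulseaudio's own logging functions):

    #include <stdbool.h>
    #include <stdio.h>

    static void log_pool_full_once(void) {
        static bool warned = false; /* flag so we print only once */

        if (warned)
            return;
        warned = true;
        fprintf(stderr, "memblock pool is full\n");
    }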
Well, I'll probably accept it as it is for the moment. Maybe I'll come
back to it later.