RE: [PATCH] ALSA: pcm: fix incorrect hw_base increase

2020-05-17 Thread Lu, Brent
> 
> I tried to imagine a negative impact for this hw_ptr_jiffies update when the
> DMA position is not updated from the driver and I haven't found any so far.
> 
> Let's apply this and we'll see in future :-)
> 
> And yes, the patch description should be improved (DMA ptr is not updated /
> streaming is inactive).
> 
> Reviewed-by: Jaroslav Kysela 

Hi Jaroslav and Takashi,

Thank you for the review. Patch V2 has been uploaded.


Regards,
Brent
> 
> >
> >
> > thanks,
> >
> > Takashi
> >
> 
> 
> --
> Jaroslav Kysela 
> Linux Sound Maintainer; ALSA Project; Red Hat, Inc.


Re: [PATCH] ALSA: pcm: fix incorrect hw_base increase

2020-05-15 Thread Jaroslav Kysela

Dne 15. 05. 20 v 12:39 Takashi Iwai napsal(a):

On Fri, 15 May 2020 11:30:54 +0200,
Jaroslav Kysela wrote:


Dne 15. 05. 20 v 11:04 Lu, Brent napsal(a):


Is this a bugfix needed for older kernels as well?  When did this issue show
up?

thanks,

greg k-h


It happens when DMA stop moving data from host to DSP/DAI for a long time
(> half of buffer time). I know host driver should do something about it. But if
not, the HWSYNC will keep increasing the hw_base and hw_ptr and confuses
user space program.


I'm afraid, but with this code, you turn off the hw_ptr jiffies
code. It would be better to fix the driver in this case (return the
updated / estimated DMA pointer, increase DMA buffer size etc.). This
"lag" is unacceptable.


The problem is obviously in the driver's side and it's best to be
addressed there.  But, I think it's still worth to apply this change.

The hw_ptr jiffies check is performed basically in two places: one is
snd_pcm_period_elapsed() call from ISR, and another is with the
no_period_wakeup flag.  In both cases, it calculates the diff of
jiffies from the previous update, and corrects the hw_ptr_base if that
exceeds the threshold.

And the bug here is that the "previous" jiffies is kept as long as the
hwptr itself is updated.  What we need is the correction of the base
when it really has processed the period size; i.e. hwptr got the same
value (with no_period_wakeup) and yet the jiffies diff is big.  For
this check, it's correct to update hw_ptr_jiffies at each call no
matter whether hwptr moved or not; we need to evaluate from the
previous update, after all.

But I might overlook something.  Jaroslav, could you check it again?
The jiffies check code is your black magic :)


I tried to imagine a negative impact for this hw_ptr_jiffies update when the 
DMA position is not updated from the driver and I haven't found any so far.


Let's apply this and we'll see in future :-)

And yes, the patch description should be improved (DMA ptr is not updated / 
streaming is inactive).


Reviewed-by: Jaroslav Kysela 




thanks,

Takashi




--
Jaroslav Kysela 
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.


Re: [PATCH] ALSA: pcm: fix incorrect hw_base increase

2020-05-15 Thread Takashi Iwai
On Fri, 15 May 2020 11:36:19 +0200,
Lu, Brent wrote:
> 
> > 
> > Updating hw_ptr_jiffies at that code path looks correct, but it still 
> > leaves the
> > question why this condition happens.  It means that the actual hwptr isn't
> > changed and yet only jiffies increase significantly; it means that the 
> > hardware
> > can't report proper pointer, and it should have set
> > SNDRV_PCM_INFO_BATCH flag, then the jiffies check is skipped.
> > 
> > With which hardware and under which situation did it happen (and the patch
> > fixed)?
> > 
> > 
> > thanks,
> > 
> > Takashi
> > 
> 
> >From time to time we got questions from google about why sometimes the
> snd_pcm_avail() returns a value larger than buffer size. Recently we finally
> found reliable reproduce steps: it's on Intel GLK Chromebook Fleex with
> SOF firmware. There is a 1/20 chance the audio playback to HDMI fails.
> 
> >From the FW side we observe a buffer runderrun, the FW is not able to
> recover it for some reason and stops the pipe.
> 
> >From the Linux side we see the pos does not increase because the FW stops
> receiving data but suddenly the hw_prt is increased by buffer_size (16368).
> It could make snd_pcm_avail() reporting a false underrun to the caller like
> following log:
> 
> 2020-05-09T16:08:32.712042+08:00 DEBUG kernel: [  418.510086] sound pcmC0D5p: 
> pos 96 hw_ptr 96 appl_ptr 4096 avail 12368
> 2020-05-09T16:08:32.712043+08:00 DEBUG kernel: [  418.510149] sound pcmC0D5p: 
> pos 96 hw_ptr 96 appl_ptr 6910 avail 9554
> ...
> 2020-05-09T16:08:32.883095+08:00 DEBUG kernel: [  418.680868] sound pcmC0D5p: 
> pos 96 hw_ptr 96 appl_ptr 15102 avail 1362
> 2020-05-09T16:08:32.883104+08:00 DEBUG kernel: [  418.681052] sound pcmC0D5p: 
> pos 96 hw_ptr 96 appl_ptr 15102 avail 1362
> 2020-05-09T16:08:32.883109+08:00 DEBUG kernel: [  418.681130] sound pcmC0D5p: 
> pos 96 hw_ptr 96 appl_ptr 16464 avail 0
> 2020-05-09T16:08:32.929330+08:00 DEBUG kernel: [  418.726515] sound pcmC0D5p: 
> pos 96 hw_ptr 16464 appl_ptr 16464 avail 16368
> 2020-05-09T16:08:32.929512+08:00 DEBUG kernel: [  418.727041] sound pcmC0D5p: 
> pos 96 hw_ptr 16464 appl_ptr 16464 avail 16368
> 
> Or it could make snd_pcm_avail() returns an invalid number and confuses the
> Caller like following log:
> 
> 2020-05-09T16:08:33.054039+08:00 DEBUG kernel: [  418.851755] sound pcmC0D5p: 
> pos 96 hw_ptr 16464 appl_ptr 27390 avail 5442
> 2020-05-09T16:08:33.129552+08:00 DEBUG kernel: [  418.926491] sound pcmC0D5p: 
> pos 96 hw_ptr 32832 appl_ptr 27390 avail 21810
> 2020-05-09T16:08:33.131746+08:00 ERR cras_server[1907]: pcm_avail returned 
> frames larger than buf_size: sof-glkda7219max: :0,5: 21810 > 16368
> 
> I've submitted a patch to fix the issue in SOF side. However, I think it's 
> also good
> to fix the incorrect hw_base increasement in Linux side.
> 
> https://github.com/thesofproject/sof/pull/2926

Oh this whole information (at least some digested version) should have
been included in the patch description.  Otherwise we have no idea why
it's needed and what actually means.

Let's wait for Jaroslav's review, and if it's OK, could you resubmit
with more description?


thanks,

Takashi


Re: [PATCH] ALSA: pcm: fix incorrect hw_base increase

2020-05-15 Thread Takashi Iwai
On Fri, 15 May 2020 11:30:54 +0200,
Jaroslav Kysela wrote:
> 
> Dne 15. 05. 20 v 11:04 Lu, Brent napsal(a):
> >>
> >> Is this a bugfix needed for older kernels as well?  When did this issue 
> >> show
> >> up?
> >>
> >> thanks,
> >>
> >> greg k-h
> >
> > It happens when DMA stop moving data from host to DSP/DAI for a long time
> > (> half of buffer time). I know host driver should do something about it. 
> > But if
> > not, the HWSYNC will keep increasing the hw_base and hw_ptr and confuses
> > user space program.
> 
> I'm afraid, but with this code, you turn off the hw_ptr jiffies
> code. It would be better to fix the driver in this case (return the
> updated / estimated DMA pointer, increase DMA buffer size etc.). This
> "lag" is unacceptable.

The problem is obviously in the driver's side and it's best to be
addressed there.  But, I think it's still worth to apply this change.

The hw_ptr jiffies check is performed basically in two places: one is
snd_pcm_period_elapsed() call from ISR, and another is with the
no_period_wakeup flag.  In both cases, it calculates the diff of
jiffies from the previous update, and corrects the hw_ptr_base if that
exceeds the threshold.

And the bug here is that the "previous" jiffies is kept as long as the
hwptr itself is updated.  What we need is the correction of the base
when it really has processed the period size; i.e. hwptr got the same
value (with no_period_wakeup) and yet the jiffies diff is big.  For
this check, it's correct to update hw_ptr_jiffies at each call no
matter whether hwptr moved or not; we need to evaluate from the
previous update, after all.

But I might overlook something.  Jaroslav, could you check it again?
The jiffies check code is your black magic :)


thanks,

Takashi


Re: [PATCH] ALSA: pcm: fix incorrect hw_base increase

2020-05-15 Thread Jaroslav Kysela

Dne 15. 05. 20 v 11:04 Lu, Brent napsal(a):


Is this a bugfix needed for older kernels as well?  When did this issue show
up?

thanks,

greg k-h


It happens when DMA stop moving data from host to DSP/DAI for a long time
(> half of buffer time). I know host driver should do something about it. But if
not, the HWSYNC will keep increasing the hw_base and hw_ptr and confuses
user space program.


I'm afraid, but with this code, you turn off the hw_ptr jiffies code. It would 
be better to fix the driver in this case (return the updated / estimated DMA 
pointer, increase DMA buffer size etc.). This "lag" is unacceptable.


Jaroslav



Regards,
Brent




--
Jaroslav Kysela 
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.


RE: [PATCH] ALSA: pcm: fix incorrect hw_base increase

2020-05-15 Thread Lu, Brent
> 
> Updating hw_ptr_jiffies at that code path looks correct, but it still leaves 
> the
> question why this condition happens.  It means that the actual hwptr isn't
> changed and yet only jiffies increase significantly; it means that the 
> hardware
> can't report proper pointer, and it should have set
> SNDRV_PCM_INFO_BATCH flag, then the jiffies check is skipped.
> 
> With which hardware and under which situation did it happen (and the patch
> fixed)?
> 
> 
> thanks,
> 
> Takashi
> 

>From time to time we got questions from google about why sometimes the
snd_pcm_avail() returns a value larger than buffer size. Recently we finally
found reliable reproduce steps: it's on Intel GLK Chromebook Fleex with
SOF firmware. There is a 1/20 chance the audio playback to HDMI fails.

>From the FW side we observe a buffer runderrun, the FW is not able to
recover it for some reason and stops the pipe.

>From the Linux side we see the pos does not increase because the FW stops
receiving data but suddenly the hw_prt is increased by buffer_size (16368).
It could make snd_pcm_avail() reporting a false underrun to the caller like
following log:

2020-05-09T16:08:32.712042+08:00 DEBUG kernel: [  418.510086] sound pcmC0D5p: 
pos 96 hw_ptr 96 appl_ptr 4096 avail 12368
2020-05-09T16:08:32.712043+08:00 DEBUG kernel: [  418.510149] sound pcmC0D5p: 
pos 96 hw_ptr 96 appl_ptr 6910 avail 9554
...
2020-05-09T16:08:32.883095+08:00 DEBUG kernel: [  418.680868] sound pcmC0D5p: 
pos 96 hw_ptr 96 appl_ptr 15102 avail 1362
2020-05-09T16:08:32.883104+08:00 DEBUG kernel: [  418.681052] sound pcmC0D5p: 
pos 96 hw_ptr 96 appl_ptr 15102 avail 1362
2020-05-09T16:08:32.883109+08:00 DEBUG kernel: [  418.681130] sound pcmC0D5p: 
pos 96 hw_ptr 96 appl_ptr 16464 avail 0
2020-05-09T16:08:32.929330+08:00 DEBUG kernel: [  418.726515] sound pcmC0D5p: 
pos 96 hw_ptr 16464 appl_ptr 16464 avail 16368
2020-05-09T16:08:32.929512+08:00 DEBUG kernel: [  418.727041] sound pcmC0D5p: 
pos 96 hw_ptr 16464 appl_ptr 16464 avail 16368

Or it could make snd_pcm_avail() returns an invalid number and confuses the
Caller like following log:

2020-05-09T16:08:33.054039+08:00 DEBUG kernel: [  418.851755] sound pcmC0D5p: 
pos 96 hw_ptr 16464 appl_ptr 27390 avail 5442
2020-05-09T16:08:33.129552+08:00 DEBUG kernel: [  418.926491] sound pcmC0D5p: 
pos 96 hw_ptr 32832 appl_ptr 27390 avail 21810
2020-05-09T16:08:33.131746+08:00 ERR cras_server[1907]: pcm_avail returned 
frames larger than buf_size: sof-glkda7219max: :0,5: 21810 > 16368

I've submitted a patch to fix the issue in SOF side. However, I think it's also 
good
to fix the incorrect hw_base increasement in Linux side.

https://github.com/thesofproject/sof/pull/2926


Regards,
Brent



RE: [PATCH] ALSA: pcm: fix incorrect hw_base increase

2020-05-15 Thread Lu, Brent
> 
> Is this a bugfix needed for older kernels as well?  When did this issue show
> up?
> 
> thanks,
> 
> greg k-h

It happens when DMA stop moving data from host to DSP/DAI for a long time
(> half of buffer time). I know host driver should do something about it. But if
not, the HWSYNC will keep increasing the hw_base and hw_ptr and confuses
user space program.

Regards,
Brent


Re: [PATCH] ALSA: pcm: fix incorrect hw_base increase

2020-05-15 Thread Takashi Iwai
On Fri, 15 May 2020 06:09:39 +0200,
Brent Lu wrote:
> 
> The hw_base will be increased by runtime->buffer_size frames
> unconditionally if the runtime->status->hw_ptr is not updated for over
> half of buffer time. As the hw_base increases, so does the
> runtime->status->hw_ptr which could lead to invalid return value when
> user space program calls snd_pcm_avail() function.
> 
> By updating runtime->hw_ptr_jiffies each time the HWSYNC is called,
> the hw_base will keep the same when buffer stall happens, so does the
> hw_ptr.

Updating hw_ptr_jiffies at that code path looks correct, but it still
leaves the question why this condition happens.  It means that the
actual hwptr isn't changed and yet only jiffies increase
significantly; it means that the hardware can't report proper pointer,
and it should have set SNDRV_PCM_INFO_BATCH flag, then the jiffies
check is skipped.

With which hardware and under which situation did it happen (and the
patch fixed)?


thanks,

Takashi

> 
> Signed-off-by: Brent Lu 
> ---
>  sound/core/pcm_lib.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/sound/core/pcm_lib.c b/sound/core/pcm_lib.c
> index 872a852..d531e1b 100644
> --- a/sound/core/pcm_lib.c
> +++ b/sound/core/pcm_lib.c
> @@ -433,6 +433,7 @@ static int snd_pcm_update_hw_ptr0(struct 
> snd_pcm_substream *substream,
>  
>   no_delta_check:
>   if (runtime->status->hw_ptr == new_hw_ptr) {
> + runtime->hw_ptr_jiffies = curr_jiffies;
>   update_audio_tstamp(substream, &curr_tstamp, &audio_tstamp);
>   return 0;
>   }
> -- 
> 2.7.4
> 


Re: [PATCH] ALSA: pcm: fix incorrect hw_base increase

2020-05-15 Thread Greg Kroah-Hartman
On Fri, May 15, 2020 at 12:09:39PM +0800, Brent Lu wrote:
> The hw_base will be increased by runtime->buffer_size frames
> unconditionally if the runtime->status->hw_ptr is not updated for over
> half of buffer time. As the hw_base increases, so does the
> runtime->status->hw_ptr which could lead to invalid return value when
> user space program calls snd_pcm_avail() function.
> 
> By updating runtime->hw_ptr_jiffies each time the HWSYNC is called,
> the hw_base will keep the same when buffer stall happens, so does the
> hw_ptr.
> 
> Signed-off-by: Brent Lu 
> ---
>  sound/core/pcm_lib.c | 1 +
>  1 file changed, 1 insertion(+)

Is this a bugfix needed for older kernels as well?  When did this issue
show up?

thanks,

greg k-h