Re: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited

2023-09-22 Thread Rafał Miłecki

On 21.09.2023 21:52, Deucher, Alexander wrote:
>> backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
>> potential unused fence pointers") to stable kernels resulted in lots of
>> WARNINGs on some devices. In my case I was getting 3 WARNINGs per
>> second (~150 lines logged every second). Commit ended up being reverted for
>> stable but it exposed a potential problem. My messages log size was reaching
>> gigabytes and was running my /tmp/ out of space.
>>
>> Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
>> and make sure its logging is rate limited to avoid such situations in the
>> future, please?
>>
>> Revert in linux-5.15.x:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
>>
>> openSUSE bug report:
>> https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
>
> These patches were never intended for stable.  They were picked up by Sasha's
> stable autoselect tools and automatically applied to stable kernels.

Are you saying massive WARNINGs in dma_fence_is_later() can't happen
in any other case? I understand it was an incorrect backport action but
I thought we may learn from it and still add some rate limit.


Re: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited

2023-09-21 Thread Christian König

On 21.09.23 at 23:30, Alex Deucher wrote:
> On Thu, Sep 21, 2023 at 4:21 PM Rafał Miłecki wrote:
> >
> > On 21.09.2023 21:52, Deucher, Alexander wrote:
> > >> backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
> > >> potential unused fence pointers") to stable kernels resulted in lots of
> > >> WARNINGs on some devices. In my case I was getting 3 WARNINGs per
> > >> second (~150 lines logged every second). Commit ended up being reverted for
> > >> stable but it exposed a potential problem. My messages log size was reaching
> > >> gigabytes and was running my /tmp/ out of space.
> > >>
> > >> Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
> > >> and make sure its logging is rate limited to avoid such situations in the
> > >> future, please?
> > >>
> > >> Revert in linux-5.15.x:
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
> > >>
> > >> openSUSE bug report:
> > >> https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
> > >
> > > These patches were never intended for stable.  They were picked up by
> > > Sasha's stable autoselect tools and automatically applied to stable kernels.
> >
> > Are you saying massive WARNINGs in dma_fence_is_later() can't happen
> > in any other case? I understand it was an incorrect backport action but
> > I thought we may learn from it and still add some rate limit.
>
> All of the current places where that function is used check the
> contexts before calling it so it should be safe as is in the tree.
> That said, something like this could potentially happen again.  I
> don't think using WARN_ON_RATELIMIT() would be a problem.

Yeah, but it also shouldn't be necessary.

When this triggers you have a major driver bug at hand, spamming the
logs is then the least of your problems.

Christian.

> Alex

Re: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited

2023-09-21 Thread Alex Deucher
On Thu, Sep 21, 2023 at 4:21 PM Rafał Miłecki  wrote:
>
> On 21.09.2023 21:52, Deucher, Alexander wrote:
> >> backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
> >> potential unused fence pointers") to stable kernels resulted in lots of
> >> WARNINGs on some devices. In my case I was getting 3 WARNINGs per
> >> second (~150 lines logged every second). Commit ended up being reverted for
> >> stable but it exposed a potential problem. My messages log size was reaching
> >> gigabytes and was running my /tmp/ out of space.
> >>
> >> Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
> >> and make sure its logging is rate limited to avoid such situations in the
> >> future, please?
> >>
> >> Revert in linux-5.15.x:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
> >>
> >> openSUSE bug report:
> >> https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
> >
> > These patches were never intended for stable.  They were picked up by 
> > Sasha's stable autoselect tools and automatically applied to stable kernels.
>
> Are you saying massive WARNINGs in dma_fence_is_later() can't happen
> in any other case? I understand it was an incorrect backport action but
> I thought we may learn from it and still add some rate limit.

All of the current places where that function is used check the
contexts before calling it so it should be safe as is in the tree.
That said, something like this could potentially happen again.  I
don't think using WARN_ON_RATELIMIT() would be a problem.

Alex
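
For readers unfamiliar with the mechanism being discussed: in the kernel, WARN_ON_RATELIMIT(condition, state) from include/linux/ratelimit.h takes a struct ratelimit_state (typically declared with DEFINE_RATELIMIT_STATE(name, interval, burst)) and only emits the warning while the burst budget for the current interval is not used up. The block below is only a userspace sketch of that interval/burst idea, not the kernel implementation and not a proposed patch. All names in it (rl_state, rl_allow, fake_fence, fake_fence_is_later, warnings_logged) are hypothetical, time is passed in explicitly instead of using jiffies, and the sequence-number comparison ignores the wraparound handling the real dma_fence code has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct ratelimit_state. */
struct rl_state {
	uint64_t interval;  /* window length, in seconds */
	unsigned burst;     /* messages allowed per window */
	uint64_t begin;     /* start of the current window */
	unsigned printed;   /* messages emitted in the current window */
	unsigned missed;    /* messages suppressed in the current window */
};

/* Return true if the caller may log at time `now` (seconds). */
static bool rl_allow(struct rl_state *rs, uint64_t now)
{
	assert(rs->interval > 0);

	/* Start a new window once the old one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Hypothetical, heavily simplified fence: only the fields needed here. */
struct fake_fence {
	uint64_t context;
	uint64_t seqno;
};

static unsigned warnings_logged;

/*
 * Simplified analogue of dma_fence_is_later(): fences from different
 * contexts cannot be ordered, so warn (rate limited) and return false.
 */
static bool fake_fence_is_later(const struct fake_fence *f1,
				const struct fake_fence *f2,
				struct rl_state *rs, uint64_t now)
{
	if (f1->context != f2->context) {
		if (rl_allow(rs, now)) {
			warnings_logged++;
			fprintf(stderr, "WARNING: fence context mismatch\n");
		}
		return false;
	}
	/* Assumption: no seqno wraparound in this sketch. */
	return f1->seqno > f2->seqno;
}
```

With an interval of 5 seconds and a burst of 10, a caller that hits the mismatch 150 times within one window would log 10 warnings and suppress the other 140. The function still returns false every time, so rate limiting changes only the log volume, not the behaviour, which is consistent with Christian's point that the underlying driver bug remains.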


RE: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited

2023-09-21 Thread Deucher, Alexander
[Public]

> -----Original Message-----
> From: Rafał Miłecki 
> Sent: Thursday, September 21, 2023 3:41 PM
> To: Deucher, Alexander ; Koenig, Christian
> ; Pan, Xinhui ; amd-
> g...@lists.freedesktop.org; dri-devel ; Yu,
> Lang 
> Subject: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should
> be rate limited
>
> Hi,
>
> backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
> potential unused fence pointers") to stable kernels resulted in lots of
> WARNINGs on some devices. In my case I was getting 3 WARNINGs per
> second (~150 lines logged every second). Commit ended up being reverted for
> stable but it exposed a potential problem. My messages log size was reaching
> gigabytes and was running my /tmp/ out of space.
>
> Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
> and make sure its logging is rate limited to avoid such situations in the
> future, please?
>
> Revert in linux-5.15.x:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
>
> openSUSE bug report:
> https://bugzilla.opensuse.org/show_bug.cgi?id=1215523

These patches were never intended for stable.  They were picked up by Sasha's 
stable autoselect tools and automatically applied to stable kernels.

Alex