Re: ✗ Fi.CI.IGT: failure for Resolve suspend-resume racing with GuC destroy-context-worker (rev13)

2024-01-09 Thread Rodrigo Vivi
On Thu, Jan 04, 2024 at 05:39:16PM +, Teres Alexis, Alan Previn wrote:
> On Thu, 2024-01-04 at 10:57 +, Patchwork wrote:
> > Patch Details
> > Series: Resolve suspend-resume racing with GuC destroy-context-worker 
> > (rev13)
> > URL:https://patchwork.freedesktop.org/series/121916/
> > State:  failure
> > Details:
> > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121916v13/index.html
> > CI Bug Log - changes from CI_DRM_14076_full -> Patchwork_121916v13_full
> > Summary
> > 
> > FAILURE
> alan:snip
> 
> 
> > Here are the unknown changes that may have been introduced in 
> > Patchwork_121916v13_full:
> > 
> > IGT changes
> > Possible regressions
> > 
> >   *   igt@gem_eio@wait-wedge-immediate:
> >  *   shard-mtlp: PASS -> ABORT
> > 
> alan: From the code and dmesg, this is unrelated to the GuC context
> destruction flows: it's an MCR register read that times out. Additionally,
> I believe this error is occurring during the post-reset-init flow, so it's
> definitely not doing any context destruction at this point (a reset would
> have happened sooner).

Yeap, it is indeed happening once in a while:

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7659/shard-mtlp-4/igt@gem_eio@wait-wedge-immediate.html

I was going to merge the series now, but then I noticed that Matt had taken 
care of that.
Thank you all.

> > Known issues
> > 
> 

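For readers following the thread from outside: the race the series resolves
is between the suspend path and a deferred worker that deregisters
already-destroyed GuC contexts. A minimal sketch of the general pattern
(guc_state, destroyed_worker_func and guc_suspend are illustrative names,
not the exact i915 symbols):

#include <linux/workqueue.h>

/* Illustrative container; the real driver keeps this in its submission state. */
struct guc_state {
	struct work_struct destroyed_worker;
};

/* Ask the GuC to drop its handles, then free the host-side contexts. */
static void deregister_destroyed_contexts(struct guc_state *guc)
{
	/* ... */
}

static void destroyed_worker_func(struct work_struct *w)
{
	struct guc_state *guc = container_of(w, struct guc_state,
					     destroyed_worker);

	deregister_destroyed_contexts(guc);
}

/*
 * Suspend path: the worker may still be queued or running, and would
 * otherwise touch hardware after the GT has powered down.  Flushing
 * (rather than cancelling) lets an in-flight deregistration finish
 * before suspend proceeds.
 */
static void guc_suspend(struct guc_state *guc)
{
	flush_work(&guc->destroyed_worker);
	/* ... continue with the normal suspend sequence ... */
}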

Re: ✗ Fi.CI.IGT: failure for Resolve suspend-resume racing with GuC destroy-context-worker (rev13)

2024-01-09 Thread Matt Roper
On Thu, Jan 04, 2024 at 05:39:16PM +, Teres Alexis, Alan Previn wrote:
> On Thu, 2024-01-04 at 10:57 +, Patchwork wrote:
> > Patch Details
> > Series: Resolve suspend-resume racing with GuC destroy-context-worker 
> > (rev13)
> > URL:https://patchwork.freedesktop.org/series/121916/
> > State:  failure
> > Details:
> > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121916v13/index.html
> > CI Bug Log - changes from CI_DRM_14076_full -> Patchwork_121916v13_full
> > Summary
> > 
> > FAILURE
> alan:snip
> 
> 
> > Here are the unknown changes that may have been introduced in 
> > Patchwork_121916v13_full:
> > 
> > IGT changes
> > Possible regressions
> > 
> >   *   igt@gem_eio@wait-wedge-immediate:
> >  *   shard-mtlp: PASS -> ABORT
> > 
> alan: From the code and dmesg, this is unrelated to the GuC context
> destruction flows: it's an MCR register read that times out. Additionally,
> I believe this error is occurring during the post-reset-init flow, so it's
> definitely not doing any context destruction at this point (a reset would
> have happened sooner).

Yeah, the MCR timeouts are due to these CI machines running an outdated
IFWI, so they're missing an important workaround in the firmware.

Series applied to drm-intel-gt-next.  Thanks for the patches and
reviews.


Matt

> > Known issues
> > 
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

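On the MCR point Matt raises: on MTL-class hardware, steered (MCR) register
accesses are roughly serialized against firmware through a steering
semaphore that the driver polls with a timeout, so a firmware-side bug such
as a missing IFWI workaround surfaces as exactly this kind of timeout. A
rough sketch of the poll-with-timeout shape (STEER_SEMAPHORE here is a
placeholder offset, not the real MTL register):

#include <linux/iopoll.h>
#include <linux/io.h>
#include <linux/types.h>

#define STEER_SEMAPHORE	0x1234	/* placeholder offset for illustration */

/*
 * Before touching a multicast (MCR) register, take the semaphore shared
 * with firmware.  If firmware never releases it, e.g. because of a
 * missing IFWI workaround, the poll below times out and the access is
 * treated as failed.
 */
static int mcr_lock(void __iomem *regs)
{
	u32 val;

	/* Poll every 10 us until the semaphore reads back as free (1), 10 ms max. */
	return readl_poll_timeout(regs + STEER_SEMAPHORE, val,
				  val == 1, 10, 10 * 1000);
}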

Re: ✗ Fi.CI.IGT: failure for Resolve suspend-resume racing with GuC destroy-context-worker (rev13)

2024-01-04 Thread Teres Alexis, Alan Previn
On Thu, 2024-01-04 at 10:57 +, Patchwork wrote:
> Patch Details
> Series: Resolve suspend-resume racing with GuC destroy-context-worker (rev13)
> URL:https://patchwork.freedesktop.org/series/121916/
> State:  failure
> Details:
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121916v13/index.html
> CI Bug Log - changes from CI_DRM_14076_full -> Patchwork_121916v13_full
> Summary
> 
> FAILURE
alan:snip


> Here are the unknown changes that may have been introduced in 
> Patchwork_121916v13_full:
> 
> IGT changes
> Possible regressions
> 
>   *   igt@gem_eio@wait-wedge-immediate:
>  *   shard-mtlp: PASS -> ABORT
> 
alan: From the code and dmesg, this is unrelated to the GuC context
destruction flows: it's an MCR register read that times out. Additionally,
I believe this error is occurring during the post-reset-init flow, so it's
definitely not doing any context destruction at this point (a reset would
have happened sooner).
> Known issues
>
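
Alan's ordering argument can be made concrete: after a wedge and reset, the
GT is re-initialized (which is where the steered register accesses happen)
before submission, and therefore any context teardown, becomes possible
again. A sketch of that ordering with placeholder names (gt_state,
init_workarounds and enable_submission are illustrative, not the driver's
actual symbols):

struct gt_state;

static int init_workarounds(struct gt_state *gt);
static int enable_submission(struct gt_state *gt);

/*
 * Post-reset flow, sketched.  Steered (MCR) register accesses happen
 * during re-init, before any context can be created, let alone
 * destroyed, so a timeout here cannot come from the destroy-context
 * worker.
 */
static int gt_post_reset_init(struct gt_state *gt)
{
	int err;

	err = init_workarounds(gt);	/* MCR reads/writes happen here */
	if (err)
		return err;		/* surfaces as the IGT abort */

	return enable_submission(gt);	/* contexts exist only after this */
}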