Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-20 Thread Mikhail Gavrilov
On Wed, 20 Feb 2019 at 20:39, Grodzovsky, Andrey wrote: > No, we only fixed the original deadlock with display driver during GPU > reset. I still didn't have time to go over your captures for the GPU > page fault. > > The deadlock we see here is another deadlock, different from the one > already f

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-20 Thread Grodzovsky, Andrey
On 2/20/19 12:28 AM, Mikhail Gavrilov wrote: > On Tue, 19 Feb 2019 at 20:24, Grodzovsky, Andrey > wrote: >> Just pull in latest drm-next from here - >> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next >> >> Andrey > Tested this kernel and result not good for me. > 1) "amdgpu

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-19 Thread Grodzovsky, Andrey
Just pull in latest drm-next from here - https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next Andrey On 2/14/19 11:18 PM, Mikhail Gavrilov wrote: > On Thu, 14 Feb 2019 at 20:51, Grodzovsky, Andrey > wrote: >> Got it. >> >> Andrey >> > Cool, please don't forget give me patch for

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-15 Thread Mikhail Gavrilov via amd-gfx
On Thu, 14 Feb 2019 at 20:51, Grodzovsky, Andrey wrote: > > Got it. > > Andrey > Cool, please don't forget to give me a patch for testing. -- Best Regards, Mike Gavrilov.

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-14 Thread Grodzovsky, Andrey
Got it. Andrey On 2/14/19 4:32 AM, Christian König wrote: Hey Andrey, this is on Vega10, so the ASIC always stops after it sees the first fault. I'm actually working on implementing that it should continue without interruption. Regards, Christian. Am 13.02.19 um 22:47 schrieb Grodzovsky, And

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-14 Thread Christian König via amd-gfx
Hey Andrey, this is on Vega10, so the ASIC always stops after it sees the first fault. I'm actually working on implementing that it should continue without interruption. Regards, Christian. Am 13.02.19 um 22:47 schrieb Grodzovsky, Andrey: Looks like you are still running this without the l

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-14 Thread Michel Dänzer
[ Puts on list administrator hat ] On 2019-02-14 5:16 a.m., Mikhail Gavrilov via amd-gfx wrote: > > Just in case, I duplicated all the files on the file sharing service Mega: > https://mega.nz/#F!pgYCjYrS!NkeTFIja_qwmxqLoSEUyzA Please only share such large files via an external service, don't

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-13 Thread Grodzovsky, Andrey
Looks like you are still running this without the latest hang fix, since I see the deadlock again, but actually what I forgot to ask you is to load amdgpu with vm_fault_stop=2 to freeze the ASIC once VM_FAULT is encountered - sorry about that. So please retest with amdgpu.vm_fault_stop=2 paramete

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-13 Thread Grodzovsky, Andrey
OK, just apply the following to your amdgpu_dm_do_flip function and see if GPU reset does proceed after you experience the hang. diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index d59bafc..586301f 100644 --- a/drivers/gpu/drm/

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-12 Thread Mikhail Gavrilov via amd-gfx
On Tue, 12 Feb 2019 at 20:23, Grodzovsky, Andrey wrote: > > It should recover you - so this looks like a bug. I noticed in one of > the call traces this - drm_atomic_helper_suspend which points to system > going into sleep mode, is it what happened, did it hang when system > tried to sleep ? > It

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-12 Thread Grodzovsky, Andrey
Sure, that probably would be the solution. One missing detail here (besides confirming with the debug prints that this is the scenario we are hitting) is WHY we are even stuck in reservation_object_wait_timeout_rcu. In amdgpu_device_pre_asic_reset (during GPU reset) we are first forcing all outstan

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-12 Thread Nicholas Kazlauskas
The MAX_SCHEDULE_TIMEOUT is probably not a good idea on the wait in DM. I wonder if we could just do a shorter wait and skip the FB update/programming if it fails after some reasonable amount of time. This would still allow recovery to happen at least even if the display isn't showing the right b
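
A minimal userspace sketch of the bounded-wait idea suggested in this message. It is not the DM code and not a proposed patch. All names (fake_fence, wait_fence_bounded, try_flip) are hypothetical, and loop ticks stand in for jiffies. The return convention (remaining time on success, 0 on timeout) is assumed to mirror the kernel's timeout waits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GPU fence; NOT the kernel's fence type. */
struct fake_fence {
    long signals_after_ticks; /* negative: never signals, e.g. after a VM fault */
};

/*
 * Wait at most max_ticks for the fence.
 * Returns the remaining ticks (> 0) if the fence signaled in time,
 * or 0 if the wait timed out.
 */
static long wait_fence_bounded(const struct fake_fence *f, long max_ticks)
{
    assert(max_ticks > 0); /* a bounded wait needs a finite, positive budget */

    for (long t = 0; t < max_ticks; t++) {
        if (f->signals_after_ticks >= 0 && t >= f->signals_after_ticks)
            return max_ticks - t;
    }
    return 0;
}

/*
 * Returns true if the (simulated) FB update would be programmed,
 * false if it is skipped because the fence did not signal in time.
 */
static bool try_flip(const struct fake_fence *f, long max_ticks)
{
    if (wait_fence_bounded(f, max_ticks) == 0)
        return false; /* timed out: skip the update so recovery can proceed */

    return true;      /* fence signaled: safe to program the new FB */
}
```

With an unbounded wait, a fence that never signals blocks the caller forever. With the bounded version, try_flip returns false for such a fence and the caller can carry on, at the cost of possibly skipping one framebuffer update.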

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-12 Thread Grodzovsky, Andrey
On 2/12/19 7:34 AM, Mikhail Gavrilov wrote: > Hi folks. Sorry for the noise. > But I really don't know whether it is enough to send my logs or not. > As I understand it, different sequences may cause "ring gfx timeout". > I also haven't heard which version I need to wait for or which patch I need to > apply before testin

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-12 Thread Grodzovsky, Andrey
They are useful. I am gonna take a look later. Andrey On 2/12/19 10:49 AM, Mikhail Gavrilov wrote: > On Tue, 12 Feb 2019 at 20:23, Grodzovsky, Andrey > wrote: >> It should recover you - so this looks like a bug. I noticed in one of >> the call traces this - drm_atomic_helper_suspend which points

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-12 Thread Grodzovsky, Andrey
I suspect the issue is that amdgpu_dm_do_flip is holding the BO reserved and is then stuck waiting for fences to signal in reservation_object_wait_timeout_rcu (which won't signal because there was a VM_FAULT). Then when we try to shut down the display block during reset recovery from drm_atomic_helper_