Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-07 Thread Huang Rui
On Tue, Apr 07, 2020 at 11:43:20PM +0800, Kuehling, Felix wrote:
> Sorry, I missed this email thread because the subject seemed irrelevant
> to me. I still don't get why this is causing a problem with
> suspend/resume with video playback.
> 
> The functions you're changing are mostly used when running without HWS.
> This should only be the case during bring-ups or while debugging HWS
> issues. Otherwise they're only used for setting up the HIQ. That means
> in normal operation, these functions should not be used for user mode
> queue mapping, which is handled by the HWS.

The issue is caused by destroying the queue via MMIO while CGPG/GFXOFF is
enabled.

When we suspend to S3, the HIQ is destroyed while CGPG/GFXOFF is still
enabled. At that time there are no commands under HWS, because no ROCm
application is running.

In this case, we have three ways to fix the issue:

1. Disable CGPG/GFXOFF before doing the kfd suspend.

https://lists.freedesktop.org/archives/amd-gfx/2020-April/048181.html

2. Destroy the HIQ under RLC safe mode.
3. Use the UNMAP packet to unmap the HIQ with KIQ instead of MMIO.

I think #1 is more straightforward. For the long term, I think we should use
KIQ to map/unmap all CP/SDMA queues.
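
To make the ordering argument behind option #1 concrete, here is a minimal
user-space sketch. It is not driver code: struct mock_dev and every mock_*
function are hypothetical stand-ins, and the "failure" is only a counter,
whereas the thread reports a hang on real hardware. It only shows why the
ungate step has to run before the kfd suspend step that touches GFX registers
over MMIO.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for device state; not the real struct amdgpu_device. */
struct mock_dev {
    bool gfx_gated;   /* true while CGPG/GFXOFF may have powered GFX down */
    int failed_mmio;  /* MMIO accesses attempted while GFX was gated */
};

/* Models an MMIO read of a GFX register such as CP_HQD_ACTIVE. */
static int mock_mmio_read(struct mock_dev *d)
{
    if (d->gfx_gated) {
        d->failed_mmio++;
        return -1;    /* assumption: stands in for the reported hang */
    }
    return 0;
}

/* Models ungating PG/CG, i.e. disabling CGPG/GFXOFF. */
static void mock_ungate(struct mock_dev *d)
{
    d->gfx_gated = false;
}

/* Models the kfd suspend step destroying the HIQ via MMIO. */
static int mock_kfd_suspend(struct mock_dev *d)
{
    return mock_mmio_read(d);
}

/* Ordering that triggers the problem: MMIO first, ungate afterwards. */
int suspend_kfd_first(struct mock_dev *d)
{
    int r = mock_kfd_suspend(d);

    mock_ungate(d);
    return r;
}

/* Ordering proposed as option #1: ungate first, then the kfd suspend. */
int suspend_ungate_first(struct mock_dev *d)
{
    mock_ungate(d);
    return mock_kfd_suspend(d);
}

/* Small self-check that exercises both orderings on a gated device. */
void check_orderings(void)
{
    struct mock_dev a = { .gfx_gated = true };
    struct mock_dev b = { .gfx_gated = true };

    assert(suspend_kfd_first(&a) == -1 && a.failed_mmio == 1);
    assert(suspend_ungate_first(&b) == 0 && b.failed_mmio == 0);
}
```

Calling check_orderings() from a small main() is enough to run the sketch.
Options #2 and #3 would change mock_kfd_suspend() itself rather than the
call order, which is why they are not modelled here.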

> 
> Ray, I vaguely remember we discussed using KIQ for mapping the HIQ at
> some point. Did anyone ever propose a patch for that?
> 

Yes, that patch is already upstream. However, this issue is caused by
destroying the queue. (Sorry, I should have covered this case before.)

commit 35cd89d5a658dc26687a7a6909d35fee19a6b173
Author: Aaron Liu 
Date:   Wed Dec 25 15:50:51 2019 +0800

drm/amdkfd: use kiq to load the mqd of hiq queue for gfx v9 (v6)

There is an issue that CP will check the HIQ queue to be configured and mapped
with KIQ ring, otherwise, it will be unable to read back the secure buffer while
the gfxoff is enabled even with trusted IP blocks.

v1 -> v2:
- Fix to remove surplus set_resources packets.
- Fill the whole configuration in MQD.
- Change the author as Aaron because he addressed the key point of this issue.
- Add kiq ring lock.

v2 -> v3:
- Free the lock while in error return case.
- Remove the programming only needed by the queue is unmapped.

v3 -> v4:
- Remove doorbell programming because it's used for restarting queue.
- Remove CP scheduler programming because map_queue packet will handle this.

v4 -> v5:
- Remove cp_hqd_active because mec ucode will enable it while use map_queues.
- Revise goto out_unlock.
- Correct the right doorbell offset for HIQ that kfd driver assigned in the
  packet.

v5 -> v6:
- Merge Arcturus fix into this patch because it will get oops in Arcturus
  platform.

Reported-by: Lisa Saturday 
Signed-off-by: Aaron Liu 
Signed-off-by: Huang Rui 
Reviewed-and-Tested-by: Aaron Liu 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 

Thanks,
Ray

> Thanks,
>   Felix
> 
> Am 2020-04-03 um 12:07 a.m. schrieb Prike Liang:
> > The system will be hang up during S3 as SMU is pending at GC not
> > respose the register CP_HQD_ACTIVE access request and this issue
> > can be fixed by adding RLC safe mode guard before each HQD
> > map/unmap retrive opt.
> >
> > Signed-off-by: Prike Liang 
> > Tested-by: Mengbing Wang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > index df841c2..e265063 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, 
> > uint32_t pipe_id,
> > uint32_t *mqd_hqd;
> > uint32_t reg, hqd_base, data;
> >  
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > m = get_mqd(mqd);
> >  
> > acquire_queue(kgd, pipe_id, queue_id);
> > @@ -299,6 +300,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, 
> > uint32_t pipe_id,
> >  
> > release_queue(kgd);
> >  
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return 0;
> >  }
> >  
> > @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, 
> > uint64_t queue_address,
> > bool retval = false;
> > uint32_t low, high;
> >  
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > acquire_queue(kgd, pipe_id, queue_id);
> > act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
> > if (act) {
> > @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, 
> > uint64_t queue_address,
> > retval = true;
> > }
> > release_queue(kgd);
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return retval;
> >  }
> >  
> > @@ -541,6 +545,7 @@ int kgd_gfx_v9_hqd_destroy(struct 

RE: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-07 Thread Liang, Prike


> -Original Message-
> From: Kuehling, Felix 
> Sent: Tuesday, April 7, 2020 11:43 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org;
> Huang, Ray 
> Cc: Deucher, Alexander ; Quan, Evan
> 
> Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> playback
> 
> Sorry, I missed this email thread because the subject seemed irrelevant to
> me. I still don't get why this is causing a problem with suspend/resume with
> video playback.
> 
> The functions you're changing are mostly used when running without HWS.
> This should only be the case during bring-ups or while debugging HWS issues.
> Otherwise they're only used for setting up the HIQ. That means in normal
> operation, these functions should not be used for user mode queue mapping,
> which is handled by the HWS.
[Prike]  This issue is caused by improperly accessing the register CP_HQD_ACTIVE
after GFX has entered CGPG, while destroying the MQD at the amdkfd suspend stage.

This solution may add an excessive guard around some of the MQD setup and
occupancy checks.
It is likely a potential common issue, so I have drafted a v2 patch that
disables GFX CGPG directly before performing the amdgpu suspend operation.
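
One detail of the guard approach can be sketched in isolation. In the quoted
kgd_gfx_v9_hqd_destroy hunk, the enter call is placed before the early
"return -EIO", so that path returns without the matching exit. The mock below
is a minimal user-space illustration under stated assumptions: struct mock_dev
and the mock_* helpers are hypothetical, and the depth counter is only a way
to make the imbalance visible. Whether the imbalance matters on real hardware
is not established in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mock; the real driver uses struct amdgpu_device. */
struct mock_dev {
    bool in_gpu_reset;
    int safe_mode_depth;  /* +1 on enter, -1 on exit; 0 means balanced */
};

static void mock_enter_safe_mode(struct mock_dev *d)
{
    d->safe_mode_depth++;
}

static void mock_exit_safe_mode(struct mock_dev *d)
{
    assert(d->safe_mode_depth > 0);  /* an exit needs a prior enter */
    d->safe_mode_depth--;
}

/* Mirrors the ordering in the quoted hunk: enter, then an early return. */
int destroy_unbalanced(struct mock_dev *d)
{
    mock_enter_safe_mode(d);
    if (d->in_gpu_reset)
        return -EIO;                 /* returns with safe mode still entered */

    /* ... MMIO teardown of the queue would go here ... */

    mock_exit_safe_mode(d);
    return 0;
}

/* One possible alternative: test first, so every enter has an exit. */
int destroy_balanced(struct mock_dev *d)
{
    if (d->in_gpu_reset)
        return -EIO;

    mock_enter_safe_mode(d);

    /* ... MMIO teardown of the queue would go here ... */

    mock_exit_safe_mode(d);
    return 0;
}
```

With in_gpu_reset set, destroy_unbalanced() returns -EIO and leaves
safe_mode_depth at 1, while destroy_balanced() returns -EIO with the counter
still at 0. Disabling CGPG before the suspend, as in the drafted v2, avoids
the per-function guard altogether.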

Thanks,
Prike

> Ray, I vaguely remember we discussed using KIQ for mapping the HIQ at
> some point. Did anyone ever propose a patch for that?
> 
> Thanks,
>   Felix
> 
> Am 2020-04-03 um 12:07 a.m. schrieb Prike Liang:
> > The system will be hang up during S3 as SMU is pending at GC not
> > respose the register CP_HQD_ACTIVE access request and this issue can
> > be fixed by adding RLC safe mode guard before each HQD map/unmap
> > retrive opt.
> >
> > Signed-off-by: Prike Liang 
> > Tested-by: Mengbing Wang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > index df841c2..e265063 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void
> *mqd, uint32_t pipe_id,
> > uint32_t *mqd_hqd;
> > uint32_t reg, hqd_base, data;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > m = get_mqd(mqd);
> >
> > acquire_queue(kgd, pipe_id, queue_id); @@ -299,6 +300,7 @@ int
> > kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> >
> > release_queue(kgd);
> >
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return 0;
> >  }
> >
> > @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev
> *kgd, uint64_t queue_address,
> > bool retval = false;
> > uint32_t low, high;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > acquire_queue(kgd, pipe_id, queue_id);
> > act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
> > if (act) {
> > @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev
> *kgd, uint64_t queue_address,
> > retval = true;
> > }
> > release_queue(kgd);
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return retval;
> >  }
> >
> > @@ -541,6 +545,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd,
> void *mqd,
> > uint32_t temp;
> > struct v9_mqd *m = get_mqd(mqd);
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > if (adev->in_gpu_reset)
> > return -EIO;
> >
> > @@ -577,6 +582,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd,
> void *mqd,
> > }
> >
> > release_queue(kgd);
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return 0;
> >  }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index 1fea077..ee107d9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -3533,6 +3533,7 @@ static int gfx_v9_0_kiq_init_register(struct
> amdgpu_ring *ring)
> > struct v9_mqd *mqd = ring->mqd_ptr;
> > int j;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > /* disable wptr polling */
> > WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
> >
> > @@ -3629,6 +3630,7 @@ static int gfx_v9_0_kiq_init_register(struct
> amdgpu_ring *ring)
> > if (ring->use_doorbell)
> > WR

Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-07 Thread Felix Kuehling
Sorry, I missed this email thread because the subject seemed irrelevant
to me. I still don't get why this is causing a problem with
suspend/resume with video playback.

The functions you're changing are mostly used when running without HWS.
This should only be the case during bring-ups or while debugging HWS
issues. Otherwise they're only used for setting up the HIQ. That means
in normal operation, these functions should not be used for user mode
queue mapping, which is handled by the HWS.

Ray, I vaguely remember we discussed using KIQ for mapping the HIQ at
some point. Did anyone ever propose a patch for that?

Thanks,
  Felix

Am 2020-04-03 um 12:07 a.m. schrieb Prike Liang:
> The system will be hang up during S3 as SMU is pending at GC not
> respose the register CP_HQD_ACTIVE access request and this issue
> can be fixed by adding RLC safe mode guard before each HQD
> map/unmap retrive opt.
>
> Signed-off-by: Prike Liang 
> Tested-by: Mengbing Wang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
>  2 files changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index df841c2..e265063 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, 
> uint32_t pipe_id,
>   uint32_t *mqd_hqd;
>   uint32_t reg, hqd_base, data;
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   m = get_mqd(mqd);
>  
>   acquire_queue(kgd, pipe_id, queue_id);
> @@ -299,6 +300,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, 
> uint32_t pipe_id,
>  
>   release_queue(kgd);
>  
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return 0;
>  }
>  
> @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, 
> uint64_t queue_address,
>   bool retval = false;
>   uint32_t low, high;
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   acquire_queue(kgd, pipe_id, queue_id);
>   act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
>   if (act) {
> @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, 
> uint64_t queue_address,
>   retval = true;
>   }
>   release_queue(kgd);
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return retval;
>  }
>  
> @@ -541,6 +545,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd, void *mqd,
>   uint32_t temp;
>   struct v9_mqd *m = get_mqd(mqd);
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   if (adev->in_gpu_reset)
>   return -EIO;
>  
> @@ -577,6 +582,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd, void *mqd,
>   }
>  
>   release_queue(kgd);
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 1fea077..ee107d9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3533,6 +3533,7 @@ static int gfx_v9_0_kiq_init_register(struct 
> amdgpu_ring *ring)
>   struct v9_mqd *mqd = ring->mqd_ptr;
>   int j;
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   /* disable wptr polling */
>   WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>  
> @@ -3629,6 +3630,7 @@ static int gfx_v9_0_kiq_init_register(struct 
> amdgpu_ring *ring)
>   if (ring->use_doorbell)
>   WREG32_FIELD15(GC, 0, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
>  
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return 0;
>  }
>  
> @@ -3637,6 +3639,7 @@ static int gfx_v9_0_kiq_fini_register(struct 
> amdgpu_ring *ring)
>   struct amdgpu_device *adev = ring->adev;
>   int j;
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   /* disable the queue if it's active */
>   if (RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1) {
>  
> @@ -3668,6 +3671,7 @@ static int gfx_v9_0_kiq_fini_register(struct 
> amdgpu_ring *ring)
>   WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_WPTR_HI, 0);
>   WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_WPTR_LO, 0);
>  
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return 0;
>  }
>  
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-07 Thread Liang, Prike



> -Original Message-
> From: Huang, Ray 
> Sent: Tuesday, April 7, 2020 4:03 PM
> To: Liang, Prike 
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Quan, Evan ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> playback
> 
> On Tue, Apr 07, 2020 at 01:49:43PM +0800, Liang, Prike wrote:
> >
> > > -Original Message-
> > > From: Huang, Ray 
> > > Sent: Friday, April 3, 2020 6:29 PM
> > > To: Liang, Prike 
> > > Cc: Deucher, Alexander ; Kuehling, Felix
> > > ; Quan, Evan ; amd-
> > > g...@lists.freedesktop.org
> > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with
> > > video playback
> > >
> > > On Fri, Apr 03, 2020 at 06:05:55PM +0800, Huang Rui wrote:
> > > > On Fri, Apr 03, 2020 at 05:22:28PM +0800, Liang, Prike wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Huang, Ray 
> > > > > > Sent: Friday, April 3, 2020 2:27 PM
> > > > > > To: Liang, Prike 
> > > > > > Cc: amd-gfx@lists.freedesktop.org; Quan, Evan
> > > ;
> > > > > > Deucher, Alexander ; Kuehling,
> > > > > > Felix 
> > > > > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend
> > > > > > with video playback
> > > > > >
> > > > > > (+ Felix)
> > > > > >
> > > > > > On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > > > > > > The system will be hang up during S3 as SMU is pending at GC
> > > > > > > not respose the register CP_HQD_ACTIVE access request and
> > > > > > > this issue can be fixed by adding RLC safe mode guard before
> > > > > > > each HQD map/unmap retrive opt.
> > > > > >
> > > > > > We need more information for the issue, does the map/unmap is
> > > > > > required for MAP_QUEUES/UNMAP_QUEUES packets or writing with
> > > MMIO or both?
> > > > > >
> > > > > [Prike]  The issue hang up at MP1 was trying to read register
> > > > > RSMU_RESIDENCY_COUNTER_GC but did not get response from GFX,
> > > since GFX was busy at reading register CP_HQD_ACTIVE.
> > > > > Moreover, when disabled GFXOFF this issue also can't see so
> > > > > there is likely to perform register accessed at GFXOFF CGPG/CGCG
> enter stage.
> > > > > As for only  this issue, that seems just MMIO  access failed
> > > > > case which
> > > occurred under QUEUE map/unmap status check.
> > > > >
> > > >
> > > > While we start to do S3, we will disable gfxoff at start of suspend.
> > > > Then in this point, the gfx should be always in "on" state.
> > > >
> > > > > > From your patch, you just protect the kernel kiq and user queue.
> > > > > > What about other kernel compute queues? HIQ?
> > > > > >
> > > > > [Prike] So far just find the KIQ/CPQ/DIQ map/unmap will inquire
> > > > > the CP_HQD_ACTIVE status by MMIO accessing, therefore just guard
> > > > > the KIQ
> > > and some type user queue now. Regarding HIQ map and ummap which
> used
> > > the method of submitting configuration packet.
> > > > >
> > > >
> > > > KIQ itself init/unit should be always under gfx on state. Can you
> > > > give a check the result if not add enter/exit rlc safe mode around it?
> > >
> > > Wait... In your case, the system didn't load any user queues because
> > > no ROCm based application is running. So the issue is probably
> > > caused by KIQ itself init/unit, can you confirm?
> > [Prike]  This  improper register access is under performing MQD
> > destroy during amdkfd suspend period. For the KIQ UNI process may not
> > need the RLC guard as GFX CGPG has been disabled at the early suspend
> period.
> 
> How about move below gfxoff/cgpg disabling ahead of
> amdgpu_amdkfd_suspend?
> 
> amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> 
> amdgpu_amdkfd_suspend(adev, !fbcon);
> 
> We should disable the gfxoff/cgpg at first to avoid mmio access.
> 
[Prike]  Generally speaking, it's fine to un-gate CGPG before each GFX
MMIO access.
 That's should be no diff

Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-07 Thread Huang Rui
On Tue, Apr 07, 2020 at 01:49:43PM +0800, Liang, Prike wrote:
> 
> > -Original Message-
> > From: Huang, Ray 
> > Sent: Friday, April 3, 2020 6:29 PM
> > To: Liang, Prike 
> > Cc: Deucher, Alexander ; Kuehling, Felix
> > ; Quan, Evan ; amd-
> > g...@lists.freedesktop.org
> > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> > playback
> > 
> > On Fri, Apr 03, 2020 at 06:05:55PM +0800, Huang Rui wrote:
> > > On Fri, Apr 03, 2020 at 05:22:28PM +0800, Liang, Prike wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Huang, Ray 
> > > > > Sent: Friday, April 3, 2020 2:27 PM
> > > > > To: Liang, Prike 
> > > > > Cc: amd-gfx@lists.freedesktop.org; Quan, Evan
> > ;
> > > > > Deucher, Alexander ; Kuehling, Felix
> > > > > 
> > > > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with
> > > > > video playback
> > > > >
> > > > > (+ Felix)
> > > > >
> > > > > On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > > > > > The system will be hang up during S3 as SMU is pending at GC not
> > > > > > respose the register CP_HQD_ACTIVE access request and this issue
> > > > > > can be fixed by adding RLC safe mode guard before each HQD
> > > > > > map/unmap retrive opt.
> > > > >
> > > > > We need more information for the issue, does the map/unmap is
> > > > > required for MAP_QUEUES/UNMAP_QUEUES packets or writing with
> > MMIO or both?
> > > > >
> > > > [Prike]  The issue hang up at MP1 was trying to read register
> > > > RSMU_RESIDENCY_COUNTER_GC but did not get response from GFX,
> > since GFX was busy at reading register CP_HQD_ACTIVE.
> > > > Moreover, when disabled GFXOFF this issue also can't see so there is
> > > > likely to perform register accessed at GFXOFF CGPG/CGCG enter stage.
> > > > As for only  this issue, that seems just MMIO  access failed case which
> > occurred under QUEUE map/unmap status check.
> > > >
> > >
> > > While we start to do S3, we will disable gfxoff at start of suspend.
> > > Then in this point, the gfx should be always in "on" state.
> > >
> > > > > From your patch, you just protect the kernel kiq and user queue.
> > > > > What about other kernel compute queues? HIQ?
> > > > >
> > > > [Prike] So far just find the KIQ/CPQ/DIQ map/unmap will inquire the
> > > > CP_HQD_ACTIVE status by MMIO accessing, therefore just guard the KIQ
> > and some type user queue now. Regarding HIQ map and ummap which used
> > the method of submitting configuration packet.
> > > >
> > >
> > > KIQ itself init/unit should be always under gfx on state. Can you give
> > > a check the result if not add enter/exit rlc safe mode around it?
> > 
> > Wait... In your case, the system didn't load any user queues because no
> > ROCm based application is running. So the issue is probably caused by KIQ
> > itself init/unit, can you confirm?
> [Prike]  This  improper register access is under performing MQD destroy
> during amdkfd suspend period. For the KIQ UNI process may not need the RLC
> guard as GFX CGPG has been disabled at the early suspend period.  

How about moving the gfxoff/cgpg disabling below ahead of amdgpu_amdkfd_suspend?

amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);

amdgpu_amdkfd_suspend(adev, !fbcon);

We should disable gfxoff/cgpg first, so that no MMIO access happens while GFX
is gated.

Thanks,
Ray

> 
> If have concern the other case over guard will send a patch for simplify it.
> > 
> > Thanks,
> > Ray
> > 
> > >
> > > Hi Felix, maybe we need to use packets with kiq to map all user queues.
> > >
> > > Thanks,
> > > Ray
> > >
> > > > > Thanks,
> > > > > Ray
> > > > >
> > > > > >
> > > > > > Signed-off-by: Prike Liang 
> > > > > > Tested-by: Mengbing Wang 
> > > > > > ---
> > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6
> > ++
> > > > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> > > > > >  2 files changed, 10 insertions(+)
> > > > > >
> > > > &g

RE: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-06 Thread Liang, Prike


> -Original Message-
> From: Huang, Ray 
> Sent: Friday, April 3, 2020 6:29 PM
> To: Liang, Prike 
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Quan, Evan ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> playback
> 
> On Fri, Apr 03, 2020 at 06:05:55PM +0800, Huang Rui wrote:
> > On Fri, Apr 03, 2020 at 05:22:28PM +0800, Liang, Prike wrote:
> > >
> > > > -Original Message-
> > > > From: Huang, Ray 
> > > > Sent: Friday, April 3, 2020 2:27 PM
> > > > To: Liang, Prike 
> > > > Cc: amd-gfx@lists.freedesktop.org; Quan, Evan
> ;
> > > > Deucher, Alexander ; Kuehling, Felix
> > > > 
> > > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with
> > > > video playback
> > > >
> > > > (+ Felix)
> > > >
> > > > On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > > > > The system will be hang up during S3 as SMU is pending at GC not
> > > > > respose the register CP_HQD_ACTIVE access request and this issue
> > > > > can be fixed by adding RLC safe mode guard before each HQD
> > > > > map/unmap retrive opt.
> > > >
> > > > We need more information for the issue, does the map/unmap is
> > > > required for MAP_QUEUES/UNMAP_QUEUES packets or writing with
> MMIO or both?
> > > >
> > > [Prike]  The issue hang up at MP1 was trying to read register
> > > RSMU_RESIDENCY_COUNTER_GC but did not get response from GFX,
> since GFX was busy at reading register CP_HQD_ACTIVE.
> > > Moreover, when disabled GFXOFF this issue also can't see so there is
> > > likely to perform register accessed at GFXOFF CGPG/CGCG enter stage.
> > > As for only  this issue, that seems just MMIO  access failed case which
> occurred under QUEUE map/unmap status check.
> > >
> >
> > While we start to do S3, we will disable gfxoff at start of suspend.
> > Then in this point, the gfx should be always in "on" state.
> >
> > > > From your patch, you just protect the kernel kiq and user queue.
> > > > What about other kernel compute queues? HIQ?
> > > >
> > > [Prike] So far just find the KIQ/CPQ/DIQ map/unmap will inquire the
> > > CP_HQD_ACTIVE status by MMIO accessing, therefore just guard the KIQ
> and some type user queue now. Regarding HIQ map and ummap which used
> the method of submitting configuration packet.
> > >
> >
> > KIQ itself init/unit should be always under gfx on state. Can you give
> > a check the result if not add enter/exit rlc safe mode around it?
> 
> Wait... In your case, the system didn't load any user queues because no
> ROCm based application is running. So the issue is probably caused by KIQ
> itself init/unit, can you confirm?
[Prike]  This improper register access happens while performing the MQD destroy
during the amdkfd suspend period. The KIQ UNI process may not need the RLC
guard, as GFX CGPG has already been disabled early in the suspend period.

If there is concern about over-guarding the other cases, I will send a patch to
simplify it.
> 
> Thanks,
> Ray
> 
> >
> > Hi Felix, maybe we need to use packets with kiq to map all user queues.
> >
> > Thanks,
> > Ray
> >
> > > > Thanks,
> > > > Ray
> > > >
> > > > >
> > > > > Signed-off-by: Prike Liang 
> > > > > Tested-by: Mengbing Wang 
> > > > > ---
> > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6
> ++
> > > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> > > > >  2 files changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > index df841c2..e265063 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd,
> > > > > void
> > > > *mqd, uint32_t pipe_id,
> > > > >   uint32_t *mqd_hqd;
> > > > >   uint32_t reg, hqd_base, data;
> > > > >
> > > > > + amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > > >   m = get_mqd(mqd);
> > > > >
> > > > >   acquire_qu

Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-03 Thread Huang Rui
On Fri, Apr 03, 2020 at 06:05:55PM +0800, Huang Rui wrote:
> On Fri, Apr 03, 2020 at 05:22:28PM +0800, Liang, Prike wrote:
> > 
> > > -Original Message-
> > > From: Huang, Ray 
> > > Sent: Friday, April 3, 2020 2:27 PM
> > > To: Liang, Prike 
> > > Cc: amd-gfx@lists.freedesktop.org; Quan, Evan ;
> > > Deucher, Alexander ; Kuehling, Felix
> > > 
> > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> > > playback
> > > 
> > > (+ Felix)
> > > 
> > > On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > > > The system will be hang up during S3 as SMU is pending at GC not
> > > > respose the register CP_HQD_ACTIVE access request and this issue can
> > > > be fixed by adding RLC safe mode guard before each HQD map/unmap
> > > > retrive opt.
> > > 
> > > We need more information for the issue, does the map/unmap is required
> > > for MAP_QUEUES/UNMAP_QUEUES packets or writing with MMIO or both?
> > > 
> > [Prike]  The issue hang up at MP1 was trying to read register 
> > RSMU_RESIDENCY_COUNTER_GC 
> > but did not get response from GFX, since GFX was busy at reading register 
> > CP_HQD_ACTIVE.
> > Moreover, when disabled GFXOFF this issue also can't see so there is likely 
> > to perform 
> > register accessed at GFXOFF CGPG/CGCG enter stage.  As for only  this 
> > issue, that seems just 
> > MMIO  access failed case which occurred under QUEUE map/unmap status check. 
> > 
> 
> While we start to do S3, we will disable gfxoff at start of suspend. Then
> in this point, the gfx should be always in "on" state. 
> 
> > > From your patch, you just protect the kernel kiq and user queue. What 
> > > about
> > > other kernel compute queues? HIQ?
> > > 
> > [Prike] So far just find the KIQ/CPQ/DIQ map/unmap will inquire the 
> > CP_HQD_ACTIVE status by MMIO accessing,
> > therefore just guard the KIQ  and some type user queue now. Regarding HIQ 
> > map and ummap which used the method of submitting configuration packet.  
> > 
> 
> KIQ itself init/unit should be always under gfx on state. Can you give a
> check the result if not add enter/exit rlc safe mode around it?

Wait... In your case, the system didn't load any user queues because no
ROCm based application is running. So the issue is probably caused by KIQ
itself init/unit, can you confirm?

Thanks,
Ray

> 
> Hi Felix, maybe we need to use packets with kiq to map all user queues.
> 
> Thanks,
> Ray
> 
> > > Thanks,
> > > Ray
> > > 
> > > >
> > > > Signed-off-by: Prike Liang 
> > > > Tested-by: Mengbing Wang 
> > > > ---
> > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
> > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> > > >  2 files changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > index df841c2..e265063 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void
> > > *mqd, uint32_t pipe_id,
> > > > uint32_t *mqd_hqd;
> > > > uint32_t reg, hqd_base, data;
> > > >
> > > > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > > m = get_mqd(mqd);
> > > >
> > > > acquire_queue(kgd, pipe_id, queue_id); @@ -299,6 +300,7 @@ int
> > > > kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> > > >
> > > > release_queue(kgd);
> > > >
> > > > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > > > return 0;
> > > >  }
> > > >
> > > > @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev
> > > *kgd, uint64_t queue_address,
> > > > bool retval = false;
> > > > uint32_t low, high;
> > > >
> > > > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > > acquire_queue(kgd, pipe_id, queue_id);
> > > > act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
> > > > if (act) {
> > > > @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd

Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-03 Thread Huang Rui
On Fri, Apr 03, 2020 at 05:22:28PM +0800, Liang, Prike wrote:
> 
> > -Original Message-
> > From: Huang, Ray 
> > Sent: Friday, April 3, 2020 2:27 PM
> > To: Liang, Prike 
> > Cc: amd-gfx@lists.freedesktop.org; Quan, Evan ;
> > Deucher, Alexander ; Kuehling, Felix
> > 
> > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> > playback
> > 
> > (+ Felix)
> > 
> > On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > > The system will be hang up during S3 as SMU is pending at GC not
> > > respose the register CP_HQD_ACTIVE access request and this issue can
> > > be fixed by adding RLC safe mode guard before each HQD map/unmap
> > > retrive opt.
> > 
> > We need more information for the issue, does the map/unmap is required
> > for MAP_QUEUES/UNMAP_QUEUES packets or writing with MMIO or both?
> > 
> [Prike]  The issue hang up at MP1 was trying to read register 
> RSMU_RESIDENCY_COUNTER_GC 
> but did not get response from GFX, since GFX was busy at reading register 
> CP_HQD_ACTIVE.
> Moreover, when disabled GFXOFF this issue also can't see so there is likely 
> to perform 
> register accessed at GFXOFF CGPG/CGCG enter stage.  As for only  this issue, 
> that seems just 
> MMIO  access failed case which occurred under QUEUE map/unmap status check. 
> 

When we start S3, we disable gfxoff at the start of suspend. So at this
point, gfx should always be in the "on" state.

> > From your patch, you just protect the kernel kiq and user queue. What about
> > other kernel compute queues? HIQ?
> > 
> [Prike] So far just find the KIQ/CPQ/DIQ map/unmap will inquire the 
> CP_HQD_ACTIVE status by MMIO accessing,
> therefore just guard the KIQ  and some type user queue now. Regarding HIQ map 
> and ummap which used the method of submitting configuration packet.  
> 

KIQ init/unit itself should always run with gfx in the on state. Can you
check the result without adding the enter/exit RLC safe mode around it?

Hi Felix, maybe we need to use packets with KIQ to map all user queues.

Thanks,
Ray

> > Thanks,
> > Ray
> > 
> > >
> > > Signed-off-by: Prike Liang 
> > > Tested-by: Mengbing Wang 
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
> > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> > >  2 files changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > index df841c2..e265063 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void
> > *mqd, uint32_t pipe_id,
> > >   uint32_t *mqd_hqd;
> > >   uint32_t reg, hqd_base, data;
> > >
> > > + amdgpu_gfx_rlc_enter_safe_mode(adev);
> > >   m = get_mqd(mqd);
> > >
> > > >   acquire_queue(kgd, pipe_id, queue_id);
> > > > @@ -299,6 +300,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> > >
> > >   release_queue(kgd);
> > >
> > > + amdgpu_gfx_rlc_exit_safe_mode(adev);
> > >   return 0;
> > >  }
> > >
> > > @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev
> > *kgd, uint64_t queue_address,
> > >   bool retval = false;
> > >   uint32_t low, high;
> > >
> > > + amdgpu_gfx_rlc_enter_safe_mode(adev);
> > >   acquire_queue(kgd, pipe_id, queue_id);
> > >   act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
> > >   if (act) {
> > > @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev
> > *kgd, uint64_t queue_address,
> > >   retval = true;
> > >   }
> > >   release_queue(kgd);
> > > + amdgpu_gfx_rlc_exit_safe_mode(adev);
> > >   return retval;
> > >  }
> > >
> > > @@ -541,6 +545,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd,
> > void *mqd,
> > >   uint32_t temp;
> > >   struct v9_mqd *m = get_mqd(mqd);
> > >
> > > + amdgpu_gfx_rlc_enter_safe_mode(adev);
> > >   if (adev->in_gpu_reset)
> > >   return -EIO;
> > >
> > > @@ -577,6 +582,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd,
> > void *mqd,
> > >   }
> > >
> > >   release_queue(kgd);
> > > + amdgpu_gfx_rlc_exit_safe_mode(ad

RE: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-03 Thread Liang, Prike


> -----Original Message-----
> From: Huang, Ray 
> Sent: Friday, April 3, 2020 2:27 PM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Quan, Evan ;
> Deucher, Alexander ; Kuehling, Felix
> 
> Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> playback
> 
> (+ Felix)
> 
> On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > The system hangs during S3 because the SMU is left waiting on GC,
> > which does not respond to the CP_HQD_ACTIVE register access request.
> > This can be fixed by adding an RLC safe mode guard around each HQD
> > map/unmap and retrieve operation.
> 
> We need more information about the issue: is the map/unmap done with
> MAP_QUEUES/UNMAP_QUEUES packets, by writing with MMIO, or both?
> 
[Prike] The hang occurs when MP1 tries to read the register
RSMU_RESIDENCY_COUNTER_GC but gets no response from GFX, because GFX is
busy reading the register CP_HQD_ACTIVE.
Moreover, the issue does not reproduce with GFXOFF disabled, so the
register access is likely performed while GFXOFF CGPG/CGCG is being
entered. For this issue specifically, it appears to be only an MMIO
access failure that occurs during the queue map/unmap status check.
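
As a rough illustration of the guard described above, here is a minimal
user-space model in C. It is only a sketch: struct fake_gpu and every
fake_* function are hypothetical stand-ins, not amdgpu APIs, and the model
assumes that holding safe mode keeps GFX out of the gated state so the
register read can complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device; not the real struct amdgpu_device. */
struct fake_gpu {
	bool power_gated;    /* models GFX having entered CGPG/GFXOFF */
	bool in_safe_mode;   /* models RLC safe mode being held */
	uint32_t hqd_active; /* models the CP_HQD_ACTIVE register value */
};

/* Model assumption: entering safe mode brings GFX out of the gated state. */
static void fake_enter_safe_mode(struct fake_gpu *gpu)
{
	gpu->in_safe_mode = true;
	gpu->power_gated = false;
}

static void fake_exit_safe_mode(struct fake_gpu *gpu)
{
	assert(gpu->in_safe_mode);
	gpu->in_safe_mode = false;
}

/*
 * Model of an MMIO read. Returning false stands in for the access that
 * never gets a response while the block is gated.
 */
static bool fake_mmio_read(const struct fake_gpu *gpu, uint32_t *val)
{
	if (gpu->power_gated)
		return false;
	*val = gpu->hqd_active;
	return true;
}

/* Status check with the enter/exit guard wrapped around the register read. */
static int fake_hqd_is_active(struct fake_gpu *gpu, bool *active)
{
	uint32_t act = 0;
	int ret = 0;

	fake_enter_safe_mode(gpu);
	if (!fake_mmio_read(gpu, &act))
		ret = -1;
	else
		*active = (act & 1) != 0;
	fake_exit_safe_mode(gpu);

	return ret;
}
```

In this model an unguarded fake_mmio_read() on a gated device fails, while
fake_hqd_is_active() succeeds and leaves safe mode released afterwards.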

> From your patch, you just protect the kernel kiq and user queue. What about
> other kernel compute queues? HIQ?
> 
[Prike] So far I have only found that the KIQ/CPQ/DIQ map/unmap paths
query the CP_HQD_ACTIVE status through MMIO, so the patch only guards the
KIQ and some types of user queue for now. The HIQ map and unmap instead
use the method of submitting a configuration packet.

> Thanks,
> Ray
> 
> >
> > Signed-off-by: Prike Liang 
> > Tested-by: Mengbing Wang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > index df841c2..e265063 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> > uint32_t *mqd_hqd;
> > uint32_t reg, hqd_base, data;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > m = get_mqd(mqd);
> >
> > acquire_queue(kgd, pipe_id, queue_id);
> > @@ -299,6 +300,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> >
> > release_queue(kgd);
> >
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return 0;
> >  }
> >
> > @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, uint64_t queue_address,
> > bool retval = false;
> > uint32_t low, high;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > acquire_queue(kgd, pipe_id, queue_id);
> > act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
> > if (act) {
> > @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, uint64_t queue_address,
> > retval = true;
> > }
> > release_queue(kgd);
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return retval;
> >  }
> >
> > @@ -541,6 +545,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd, void *mqd,
> > uint32_t temp;
> > struct v9_mqd *m = get_mqd(mqd);
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > if (adev->in_gpu_reset)
> > return -EIO;
> >
> > @@ -577,6 +582,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd, void *mqd,
> > }
> >
> > release_queue(kgd);
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return 0;
> >  }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index 1fea077..ee107d9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -3533,6 +3533,7 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring)
> > struct v9_mqd *mqd = ring->mqd_ptr;
> > int j;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > /* disable wptr polling */
> > WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
> >
> > @@ -3629,6 +3630,7 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring)
> > if (ring->use_doorbell)
> > WREG32_FIELD15(GC, 0, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
> &

Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-03 Thread Huang Rui
(+ Felix)

On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> The system hangs during S3 because the SMU is left waiting on GC,
> which does not respond to the CP_HQD_ACTIVE register access request.
> This can be fixed by adding an RLC safe mode guard around each HQD
> map/unmap and retrieve operation.

We need more information about the issue: is the map/unmap done with
MAP_QUEUES/UNMAP_QUEUES packets, by writing with MMIO, or both?

From your patch, you just protect the kernel kiq and user queue. What about
other kernel compute queues? HIQ?

Thanks,
Ray

> 
> Signed-off-by: Prike Liang 
> Tested-by: Mengbing Wang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
>  2 files changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index df841c2..e265063 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, 
> uint32_t pipe_id,
>   uint32_t *mqd_hqd;
>   uint32_t reg, hqd_base, data;
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   m = get_mqd(mqd);
>  
>   acquire_queue(kgd, pipe_id, queue_id);
> @@ -299,6 +300,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, 
> uint32_t pipe_id,
>  
>   release_queue(kgd);
>  
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return 0;
>  }
>  
> @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, 
> uint64_t queue_address,
>   bool retval = false;
>   uint32_t low, high;
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   acquire_queue(kgd, pipe_id, queue_id);
>   act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
>   if (act) {
> @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev *kgd, 
> uint64_t queue_address,
>   retval = true;
>   }
>   release_queue(kgd);
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return retval;
>  }
>  
> @@ -541,6 +545,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd, void *mqd,
>   uint32_t temp;
>   struct v9_mqd *m = get_mqd(mqd);
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   if (adev->in_gpu_reset)
>   return -EIO;
>  
> @@ -577,6 +582,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd, void *mqd,
>   }
>  
>   release_queue(kgd);
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 1fea077..ee107d9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3533,6 +3533,7 @@ static int gfx_v9_0_kiq_init_register(struct 
> amdgpu_ring *ring)
>   struct v9_mqd *mqd = ring->mqd_ptr;
>   int j;
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   /* disable wptr polling */
>   WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>  
> @@ -3629,6 +3630,7 @@ static int gfx_v9_0_kiq_init_register(struct 
> amdgpu_ring *ring)
>   if (ring->use_doorbell)
>   WREG32_FIELD15(GC, 0, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
>  
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return 0;
>  }
>  
> @@ -3637,6 +3639,7 @@ static int gfx_v9_0_kiq_fini_register(struct 
> amdgpu_ring *ring)
>   struct amdgpu_device *adev = ring->adev;
>   int j;
>  
> + amdgpu_gfx_rlc_enter_safe_mode(adev);
>   /* disable the queue if it's active */
>   if (RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1) {
>  
> @@ -3668,6 +3671,7 @@ static int gfx_v9_0_kiq_fini_register(struct 
> amdgpu_ring *ring)
>   WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_WPTR_HI, 0);
>   WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_WPTR_LO, 0);
>  
> + amdgpu_gfx_rlc_exit_safe_mode(adev);
>   return 0;
>  }
>  
> -- 
> 2.7.4
> 
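
One detail in the kgd_gfx_v9_hqd_destroy() hunk above: safe mode is entered
before the in_gpu_reset check, so as far as the quoted context shows, the
-EIO return path would leave safe mode held. The following user-space
sketch in C models that imbalance and one possible ordering that avoids
it. struct fake_dev, the depth counter and all function names are
hypothetical and only stand in for the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device; not the real struct amdgpu_device. */
struct fake_dev {
	bool in_gpu_reset;
	int safe_mode_depth; /* counts enter calls not yet matched by an exit */
};

static void dev_enter_safe_mode(struct fake_dev *d)
{
	d->safe_mode_depth++;
}

static void dev_exit_safe_mode(struct fake_dev *d)
{
	assert(d->safe_mode_depth > 0);
	d->safe_mode_depth--;
}

/* Shape of the quoted hunk: the guard is taken before the reset check. */
static int destroy_as_posted(struct fake_dev *d)
{
	dev_enter_safe_mode(d);
	if (d->in_gpu_reset)
		return -EIO; /* returns with safe mode still held */

	/* ... queue teardown would go here ... */

	dev_exit_safe_mode(d);
	return 0;
}

/* One possible ordering: bail out first, then take the guard. */
static int destroy_balanced(struct fake_dev *d)
{
	if (d->in_gpu_reset)
		return -EIO;

	dev_enter_safe_mode(d);

	/* ... queue teardown would go here ... */

	dev_exit_safe_mode(d);
	return 0;
}
```

With in_gpu_reset set, destroy_as_posted() returns -EIO with the counter at
1, while destroy_balanced() returns -EIO with the counter back at 0. Any
other early return inside the guarded region would need the same care.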
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx