RE: [PATCH] drm/amdgpu: update suspend status for aborting from deeper suspend

2024-09-09 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only]

Yes, thank you for the suggestion. There will be a separate patch for cleaning 
up the setting and checking of the suspend_complete flag.

Thanks,
Prike

> -Original Message-
> From: Alex Deucher 
> Sent: Monday, September 9, 2024 11:23 PM
> To: Liang, Prike 
> Cc: Deucher, Alexander ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: update suspend status for aborting from
> deeper suspend
>
> On Mon, Sep 9, 2024 at 8:58 AM Liang, Prike  wrote:
> >
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> >
> > Previously, the S3 process aborted before calling the noirq suspend, and
> > that case was handled by checking the suspend_complete flag. However,
> > there are now some S3 suspend cases, such as pm_test platform/core mode,
> > which abort the S3 process after the noirq suspend. Those aborts cannot be
> > handled by setting the suspend_complete flag in the noirq suspend
> > callback, and it is fine to use the MP0 SOL register directly to determine
> > whether to reset the GPU on resume. However, on the GFX9 series the driver
> > still needs the suspend_complete flag to decide whether to skip
> > reprogramming the clear-state register values when resuming from a
> > suspend abort, so it appears the flag cannot be removed entirely.
>
> Can we just set the suspend_complete flag based on the SOL register rather
> than based on what functions have been called?  Maybe as a future cleanup?
> This logic seems fragile and I'm worried it will get accidentally broken.
> For now the patch is:
> Acked-by: Alex Deucher 
>
> Alex
>
> >
> >
> >
> > Thanks,
> >
> > Prike
> >
> >
> >
> > From: Deucher, Alexander 
> > Sent: Saturday, September 7, 2024 1:34 AM
> > To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> > Subject: Re: [PATCH] drm/amdgpu: update suspend status for aborting
> > from deeper suspend
> >
> >
> >
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> >
> >
> > Can you elaborate on how this fails?  Seems like maybe we should just get
> rid of adev->suspend_complete and just check the MP0 SOL register to
> determine whether or not we need to reset the GPU on resume.
> >
> >
> >
> > Alex
> >
> >
> >
> > 
> >
> > From: Liang, Prike 
> > Sent: Thursday, September 5, 2024 3:36 AM
> > To: amd-gfx@lists.freedesktop.org 
> > Cc: Deucher, Alexander 
> > Subject: RE: [PATCH] drm/amdgpu: update suspend status for aborting
> > from deeper suspend
> >
> >
> >
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > According to the ChromeOS team's testing, this patch resolves the S3
> > suspend abort from deeper sleep, which occurs when the suspend aborts
> > after calling the noirq suspend and before executing _S3 and turning off
> > the power rail.
> >
> > Could this patch get a review or acknowledgment?
> >
> > Thanks,
> > Prike
> >
> > > -Original Message-
> > > From: Liang, Prike 
> > > Sent: Monday, September 2, 2024 4:13 PM
> > > To: amd-gfx@lists.freedesktop.org
> > > Cc: Deucher, Alexander ; Liang, Prike
> > > 
> > > Subject: [PATCH] drm/amdgpu: update suspend status for aborting from
> > > deeper suspend
> > >
> > > There are other suspend abort cases in which the noirq suspend is
> > > called but the _S3 method is never executed. Those cases need to be
> > > treated as an incomplete suspension.
> > >
> > > Signed-off-by: Prike Liang 
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/soc15.c | 10 ++
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > index 8d16dacdc172..cf701bb8fc79 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > @@ -587,11 +587,13 @@ static bool soc15_need_reset_on_resume(struct
> > > amdgpu_device *adev)
> > >* 2) S3 suspend abort and TOS already launched.
> > >*/
> > >   if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > > - !adev->suspend_complete &&
> > > - sol_reg)
> > > + sol_reg) {
> > > + adev->suspend_complete = false;
> > >   return true;
> > > -
> > > - return false;
> > > + } else {
> > > + adev->suspend_complete = true;
> > > + return false;
> > > + }
> > >  }
> > >
> > >  static int soc15_asic_reset(struct amdgpu_device *adev)
> > > --
> > > 2.34.1


RE: [PATCH] drm/amdgpu: update suspend status for aborting from deeper suspend

2024-09-09 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only]

Previously, the S3 process aborted before calling the noirq suspend, and that 
case was handled by checking the suspend_complete flag. However, there are now 
some S3 suspend cases, such as pm_test platform/core mode, which abort the S3 
process after the noirq suspend. Those aborts cannot be handled by setting the 
suspend_complete flag in the noirq suspend callback, and it is fine to use the 
MP0 SOL register directly to determine whether to reset the GPU on resume. 
However, on the GFX9 series the driver still needs the suspend_complete flag 
to decide whether to skip reprogramming the clear-state register values when 
resuming from a suspend abort, so it appears the flag cannot be removed 
entirely.

Thanks,
Prike

From: Deucher, Alexander 
Sent: Saturday, September 7, 2024 1:34 AM
To: Liang, Prike ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: update suspend status for aborting from deeper 
suspend


[AMD Official Use Only - AMD Internal Distribution Only]

Can you elaborate on how this fails?  Seems like maybe we should just get rid 
of adev->suspend_complete and just check the MP0 SOL register to determine 
whether or not we need to reset the GPU on resume.

Alex

____
From: Liang, Prike 
Sent: Thursday, September 5, 2024 3:36 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander 
Subject: RE: [PATCH] drm/amdgpu: update suspend status for aborting from deeper 
suspend

[AMD Official Use Only - AMD Internal Distribution Only]

According to the ChromeOS team's testing, this patch resolves the S3 suspend 
abort from deeper sleep, which occurs when the suspend aborts after calling 
the noirq suspend and before executing _S3 and turning off the power rail.

Could this patch get a review or acknowledgment?

Thanks,
Prike

> -Original Message-
> From: Liang, Prike 
> Sent: Monday, September 2, 2024 4:13 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> 
> Subject: [PATCH] drm/amdgpu: update suspend status for aborting from
> deeper suspend
>
> There are other suspend abort cases in which the noirq suspend is called
> but the _S3 method is never executed. Those cases need to be treated as
> an incomplete suspension.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 8d16dacdc172..cf701bb8fc79 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -587,11 +587,13 @@ static bool soc15_need_reset_on_resume(struct
> amdgpu_device *adev)
>* 2) S3 suspend abort and TOS already launched.
>*/
>   if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> - !adev->suspend_complete &&
> - sol_reg)
> + sol_reg) {
> + adev->suspend_complete = false;
>   return true;
> -
> - return false;
> + } else {
> + adev->suspend_complete = true;
> + return false;
> + }
>  }
>
>  static int soc15_asic_reset(struct amdgpu_device *adev)
> --
> 2.34.1


RE: [PATCH] drm/amdgpu: update suspend status for aborting from deeper suspend

2024-09-05 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only]

According to the ChromeOS team's testing, this patch resolves the S3 suspend 
abort from deeper sleep, which occurs when the suspend aborts after calling 
the noirq suspend and before executing _S3 and turning off the power rail.

Could this patch get a review or acknowledgment?

Thanks,
Prike

> -Original Message-
> From: Liang, Prike 
> Sent: Monday, September 2, 2024 4:13 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> 
> Subject: [PATCH] drm/amdgpu: update suspend status for aborting from
> deeper suspend
>
> There are other suspend abort cases in which the noirq suspend is called
> but the _S3 method is never executed. Those cases need to be treated as
> an incomplete suspension.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 8d16dacdc172..cf701bb8fc79 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -587,11 +587,13 @@ static bool soc15_need_reset_on_resume(struct
> amdgpu_device *adev)
>* 2) S3 suspend abort and TOS already launched.
>*/
>   if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> - !adev->suspend_complete &&
> - sol_reg)
> + sol_reg) {
> + adev->suspend_complete = false;
>   return true;
> -
> - return false;
> + } else {
> + adev->suspend_complete = true;
> + return false;
> + }
>  }
>
>  static int soc15_asic_reset(struct amdgpu_device *adev)
> --
> 2.34.1



RE: [PATCH v5] drm/amdgpu/gfx9.4.3: Implement compute pipe reset

2024-08-29 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only]

Thanks for the detailed review; I will update those before pushing the commit.

Thanks,
Prike

> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, August 29, 2024 4:13 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Ma, Le
> 
> Subject: Re: [PATCH v5] drm/amdgpu/gfx9.4.3: Implement compute pipe reset
>
>
>
> On 8/29/2024 9:17 AM, Prike Liang wrote:
> > Implement the compute pipe reset, and the driver will fallback to pipe
> > reset when queue reset fails.
> > The pipe reset only deactivates the queue which is scheduled in the
> > pipe, and meanwhile the MEC engine will be reset to the firmware
> > _start pointer. So,
>
> May refine this to indicate that the reset to _start applies to the
> specific pipe and is not applicable to the whole MEC engine.
>
> > it seems pipe reset will cost more cycles than the queue reset;
> > therefore, the driver tries to recover by doing queue reset first.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 127
> > 
> >  1 file changed, 108 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > index 2067f26d3a9d..26ae62d2a752 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > @@ -3466,6 +3466,98 @@ static void gfx_v9_4_3_emit_wave_limit(struct
> amdgpu_ring *ring, bool enable)
> > }
> >  }
> >
> > +static int gfx_v9_4_3_unmap_done(struct amdgpu_device *adev, uint32_t
> me,
> > +   uint32_t pipe, uint32_t queue,
> > +   uint32_t xcc_id)
> > +{
> > +   int i, r;
> > +   /* make sure dequeue is complete*/
> > +   gfx_v9_4_3_xcc_set_safe_mode(adev, xcc_id);
> > +   mutex_lock(&adev->srbm_mutex);
> > +   soc15_grbm_select(adev, me, pipe, queue, 0, GET_INST(GC, xcc_id));
> > +   for (i = 0; i < adev->usec_timeout; i++) {
> > +   if (!(RREG32_SOC15(GC, GET_INST(GC, xcc_id),
> regCP_HQD_ACTIVE) & 1))
> > +   break;
> > +   udelay(1);
> > +   }
> > +   if (i >= adev->usec_timeout)
> > +   r = -ETIMEDOUT;
> > +   else
> > +   r = 0;
> > +   soc15_grbm_select(adev, 0, 0, 0, 0, GET_INST(GC, xcc_id));
> > +   mutex_unlock(&adev->srbm_mutex);
> > +   gfx_v9_4_3_xcc_unset_safe_mode(adev, xcc_id);
> > +
> > +   return r;
> > +
> > +}
> > +
> > +static bool gfx_v9_4_3_pipe_reset_support(struct amdgpu_device *adev)
> > +{
> > +   /*TODO: Need check gfx9.4.4 mec fw whether supports pipe reset as
> well.*/
> > +   if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) &&
> > +   adev->gfx.mec_fw_version >= 0x009b)
> > +   return true;
> > +   else
> > +   dev_warn_once(adev->dev, "Please use the latest MEC version to see whether support pipe reset\n");
> > +
> > +   return false;
> > +}
> > +
> > +static int gfx_v9_4_3_kiq_reset_hw_pipe(struct amdgpu_ring *ring)
>
> Please drop the kiq name in this function to avoid confusion. It's not
> restricted to kiq.
>
> With those
>
>   Reviewed-by: Lijo Lazar 
>
> Thanks,
> Lijo
>
> > +{
> > +   struct amdgpu_device *adev = ring->adev;
> > +   uint32_t reset_pipe, clean_pipe;
> > +   int r;
> > +
> > +   if (!gfx_v9_4_3_pipe_reset_support(adev))
> > +   return -EINVAL;
> > +
> > +   gfx_v9_4_3_xcc_set_safe_mode(adev, ring->xcc_id);
> > +   mutex_lock(&adev->srbm_mutex);
> > +
> > +   reset_pipe = RREG32_SOC15(GC, GET_INST(GC, ring->xcc_id),
> regCP_MEC_CNTL);
> > +   clean_pipe = reset_pipe;
> > +
> > +   if (ring->me == 1) {
> > +   switch (ring->pipe) {
> > +   case 0:
> > +   reset_pipe = REG_SET_FIELD(reset_pipe,
> CP_MEC_CNTL,
> > +  MEC_ME1_PIPE0_RESET, 1);
> > +   break;
> > +   case 1:
> > +   reset_pipe = REG_SET_FIELD(reset_pipe,
> CP_MEC_CNTL,
> > +  MEC_ME1_PIPE1_RESET, 1);
> > +   break;
> > +   case 2:
> > +   reset_pipe = REG_SET_

RE: [PATCH v4] drm/amdgpu/gfx9.4.3: Implement compute pipe reset

2024-08-28 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only]

> From: Lazar, Lijo 
> Sent: Wednesday, August 28, 2024 2:45 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Ma, Le
> 
> Subject: Re: [PATCH v4] drm/amdgpu/gfx9.4.3: Implement compute pipe reset
>
>
>
> On 8/28/2024 10:50 AM, Prike Liang wrote:
> > Implement the compute pipe reset, and the driver will fallback to pipe
> > reset when queue reset fails.
> > The pipe reset only deactivates the queue which is scheduled in the
> > pipe, and meanwhile the MEC engine will be reset to the firmware
> > _start pointer. So, it seems pipe reset will cost more cycles than the
> > queue reset; therefore, the driver tries to recover by doing queue
> > reset first.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |   5 +
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 139
> > 
> >  2 files changed, 124 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index e28c1ebfa98f..d4d74ba2bc27 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -143,6 +143,11 @@ struct kiq_pm4_funcs {
> >uint32_t queue_type, uint32_t me_id,
> >uint32_t pipe_id, uint32_t queue_id,
> >uint32_t xcc_id, uint32_t vmid);
> > +   int (*kiq_reset_hw_pipe)(struct amdgpu_ring *kiq_ring,
> > +  uint32_t queue_type, uint32_t me,
> > +  uint32_t pipe, uint32_t queue,
> > +  uint32_t xcc_id);
>
> Missed the addition of this callback in earlier review.
>
> The implementation below -
>   Doesn't use kiq to do a pipe reset. It's looks like a direct hardware
> reset. Passing a kiq_ring here or defining a callback in kiq  functions 
> doesn't
> look required unless a pipe reset through kiq is available for other hardware
> generations.
>
>   Also, it uses pipe reset as a fallback when queue unmap fails. So the
> callback eventually is not used.
>
> Is this really needed? For the below implementation, it seems a private
> function like gfx_v9_4_3_reset_hw_pipe(struct amdgpu_ring *ring) is good
> enough.
>
> Thanks,
> Lijo
>
This KIQ callback is implemented following Alex's software design spec. Maybe 
the original design intent was to support the compute user queue. But IIRC the 
compute user queue pipe reset has a similar implementation in the KFD and may 
not reuse this callback.

Hi @Deucher, Alexander, could you comment here: do we need to implement the 
pipe reset in the KIQ callback?

Thanks,
Prike
> > +
> > /* Packet sizes */
> > int set_resources_size;
> > int map_queues_size;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > index 2067f26d3a9d..f47b55d6f673 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > @@ -166,6 +166,10 @@ static int gfx_v9_4_3_get_cu_info(struct
> amdgpu_device *adev,
> > struct amdgpu_cu_info *cu_info);
> >  static void gfx_v9_4_3_xcc_set_safe_mode(struct amdgpu_device *adev,
> > int xcc_id);  static void gfx_v9_4_3_xcc_unset_safe_mode(struct
> > amdgpu_device *adev, int xcc_id);
> > +static int gfx_v9_4_3_kiq_reset_hw_pipe(struct amdgpu_ring *kiq_ring,
> > +   uint32_t queue_type, uint32_t me,
> > +   uint32_t pipe, uint32_t queue,
> > +   uint32_t xcc_id);
> >
> >  static void gfx_v9_4_3_kiq_set_resources(struct amdgpu_ring *kiq_ring,
> > uint64_t queue_mask)
> > @@ -323,6 +327,7 @@ static const struct kiq_pm4_funcs
> gfx_v9_4_3_kiq_pm4_funcs = {
> > .kiq_query_status = gfx_v9_4_3_kiq_query_status,
> > .kiq_invalidate_tlbs = gfx_v9_4_3_kiq_invalidate_tlbs,
> > .kiq_reset_hw_queue = gfx_v9_4_3_kiq_reset_hw_queue,
> > +   .kiq_reset_hw_pipe = gfx_v9_4_3_kiq_reset_hw_pipe,
> > .set_resources_size = 8,
> > .map_queues_size = 7,
> > .unmap_queues_size = 6,
> > @@ -3466,6 +3471,101 @@ static void gfx_v9_4_3_emit_wave_limit(struct
> amdgpu_ring *ring, bool enable)
> > }
> >  }
> >
> > +static int gfx_v9_4_3_unmap_done(struct amdgpu_device *adev, uint3

RE: [PATCH v3] drm/amdgpu/gfx9.4.3: Implement compute pipe reset

2024-08-27 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only]

> From: Lazar, Lijo 
> Sent: Tuesday, August 27, 2024 4:02 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Ma, Le
> 
> Subject: Re: [PATCH v3] drm/amdgpu/gfx9.4.3: Implement compute pipe reset
>
>
>
> On 8/22/2024 3:08 PM, Prike Liang wrote:
> > Implement the compute pipe reset and driver will fallback to pipe
> > reset when queue reset failed.
> >
> > Signed-off-by: Prike Liang 
> > ---
> > v3: Use the dev log and filer out the gfx9.4.4 pipe reset support.
> > v2: Convert the GC logic instance to physical instance in the
> > register accessing process and use the dev_* print to specify
> > the failed device.
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |   5 +
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 154
> > +---
> >  2 files changed, 139 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index e28c1ebfa98f..d4d74ba2bc27 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -143,6 +143,11 @@ struct kiq_pm4_funcs {
> >uint32_t queue_type, uint32_t me_id,
> >uint32_t pipe_id, uint32_t queue_id,
> >uint32_t xcc_id, uint32_t vmid);
> > +   int (*kiq_reset_hw_pipe)(struct amdgpu_ring *kiq_ring,
> > +  uint32_t queue_type, uint32_t me,
> > +  uint32_t pipe, uint32_t queue,
> > +  uint32_t xcc_id);
> > +
> > /* Packet sizes */
> > int set_resources_size;
> > int map_queues_size;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > index 2067f26d3a9d..aa0c76eed452 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > @@ -166,6 +166,10 @@ static int gfx_v9_4_3_get_cu_info(struct
> amdgpu_device *adev,
> > struct amdgpu_cu_info *cu_info);
> >  static void gfx_v9_4_3_xcc_set_safe_mode(struct amdgpu_device *adev,
> > int xcc_id);  static void gfx_v9_4_3_xcc_unset_safe_mode(struct
> > amdgpu_device *adev, int xcc_id);
> > +static int gfx_v9_4_3_kiq_reset_hw_pipe(struct amdgpu_ring *kiq_ring,
> > +   uint32_t queue_type, uint32_t me,
> > +   uint32_t pipe, uint32_t queue,
> > +   uint32_t xcc_id);
> >
> >  static void gfx_v9_4_3_kiq_set_resources(struct amdgpu_ring *kiq_ring,
> > uint64_t queue_mask)
> > @@ -323,6 +327,7 @@ static const struct kiq_pm4_funcs
> gfx_v9_4_3_kiq_pm4_funcs = {
> > .kiq_query_status = gfx_v9_4_3_kiq_query_status,
> > .kiq_invalidate_tlbs = gfx_v9_4_3_kiq_invalidate_tlbs,
> > .kiq_reset_hw_queue = gfx_v9_4_3_kiq_reset_hw_queue,
> > +   .kiq_reset_hw_pipe = gfx_v9_4_3_kiq_reset_hw_pipe,
> > .set_resources_size = 8,
> > .map_queues_size = 7,
> > .unmap_queues_size = 6,
> > @@ -3466,6 +3471,116 @@ static void gfx_v9_4_3_emit_wave_limit(struct
> amdgpu_ring *ring, bool enable)
> > }
> >  }
> >
> > +static int gfx_v9_4_3_unmap_done(struct amdgpu_device *adev, uint32_t
> me,
> > +   uint32_t pipe, uint32_t queue,
> > +   uint32_t xcc_id)
> > +{
> > +   int i, r;
> > +   /* make sure dequeue is complete*/
> > +   gfx_v9_4_3_xcc_set_safe_mode(adev, xcc_id);
> > +   mutex_lock(&adev->srbm_mutex);
> > +   soc15_grbm_select(adev, me, pipe, queue, 0, GET_INST(GC, xcc_id));
> > +   for (i = 0; i < adev->usec_timeout; i++) {
> > +   if (!(RREG32_SOC15(GC, GET_INST(GC, xcc_id),
> regCP_HQD_ACTIVE) & 1))
> > +   break;
> > +   udelay(1);
> > +   }
> > +   if (i >= adev->usec_timeout)
> > +   r = -ETIMEDOUT;
> > +   else
> > +   r = 0;
> > +   soc15_grbm_select(adev, 0, 0, 0, 0, GET_INST(GC, xcc_id));
> > +   mutex_unlock(&adev->srbm_mutex);
> > +   gfx_v9_4_3_xcc_unset_safe_mode(adev, xcc_id);
> > +
> > +   return r;
> > +
> > +}
> > +
> > +static bool gfx_v9_4_3_pipe_reset_support(struct amdgpu_device *adev)
> > +{
> > +   /*TODO: Need check gfx

RE: [PATCH v2] drm/amdgpu/gfx9.4.3: Implement compute pipe reset

2024-08-26 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only]

> From: Alex Deucher 
> Sent: Tuesday, August 20, 2024 9:50 PM
> To: Lazar, Lijo 
> Cc: Liang, Prike ; amd-gfx@lists.freedesktop.org;
> Deucher, Alexander ; Ma, Le
> 
> Subject: Re: [PATCH v2] drm/amdgpu/gfx9.4.3: Implement compute pipe reset
>
> On Tue, Aug 20, 2024 at 8:43 AM Lazar, Lijo  wrote:
> >
> >
> >
> > On 8/20/2024 4:01 PM, Prike Liang wrote:
> > > Implement the compute pipe reset and driver will fallback to pipe
> > > reset when queue reset failed.
> > >
> > > Signed-off-by: Prike Liang 
> > > ---
> > > v2: Convert the GC logic instance to physical instance in the
> > > register accessing process and
> >
> > > use the dev_* print to specify the failed device.
> >
> > This is not fully done, marked below.
> >
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |   5 +
> > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 153
> > > 
> > >  2 files changed, 138 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > > index e28c1ebfa98f..d4d74ba2bc27 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > > @@ -143,6 +143,11 @@ struct kiq_pm4_funcs {
> > >  uint32_t queue_type, uint32_t me_id,
> > >  uint32_t pipe_id, uint32_t queue_id,
> > >  uint32_t xcc_id, uint32_t vmid);
> > > + int (*kiq_reset_hw_pipe)(struct amdgpu_ring *kiq_ring,
> > > +uint32_t queue_type, uint32_t me,
> > > +uint32_t pipe, uint32_t queue,
> > > +uint32_t xcc_id);
> > > +
> > >   /* Packet sizes */
> > >   int set_resources_size;
> > >   int map_queues_size;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > > index 2067f26d3a9d..ab9d5adbbfe8 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > > @@ -166,6 +166,10 @@ static int gfx_v9_4_3_get_cu_info(struct
> amdgpu_device *adev,
> > >   struct amdgpu_cu_info *cu_info);
> > > static void gfx_v9_4_3_xcc_set_safe_mode(struct amdgpu_device *adev,
> > > int xcc_id);  static void gfx_v9_4_3_xcc_unset_safe_mode(struct
> > > amdgpu_device *adev, int xcc_id);
> > > +static int gfx_v9_4_3_kiq_reset_hw_pipe(struct amdgpu_ring *kiq_ring,
> > > + uint32_t queue_type, uint32_t me,
> > > + uint32_t pipe, uint32_t queue,
> > > + uint32_t xcc_id);
> > >
> > >  static void gfx_v9_4_3_kiq_set_resources(struct amdgpu_ring *kiq_ring,
> > >   uint64_t queue_mask) @@ -323,6 +327,7
> > > @@ static const struct kiq_pm4_funcs gfx_v9_4_3_kiq_pm4_funcs = {
> > >   .kiq_query_status = gfx_v9_4_3_kiq_query_status,
> > >   .kiq_invalidate_tlbs = gfx_v9_4_3_kiq_invalidate_tlbs,
> > >   .kiq_reset_hw_queue = gfx_v9_4_3_kiq_reset_hw_queue,
> > > + .kiq_reset_hw_pipe = gfx_v9_4_3_kiq_reset_hw_pipe,
> > >   .set_resources_size = 8,
> > >   .map_queues_size = 7,
> > >   .unmap_queues_size = 6,
> > > @@ -3466,6 +3471,115 @@ static void
> gfx_v9_4_3_emit_wave_limit(struct amdgpu_ring *ring, bool enable)
> > >   }
> > >  }
> > >
> > > +static int gfx_v9_4_3_unmap_done(struct amdgpu_device *adev,
> uint32_t me,
> > > + uint32_t pipe, uint32_t queue,
> > > + uint32_t xcc_id) {
> > > + int i, r;
> > > + /* make sure dequeue is complete*/
> > > + gfx_v9_4_3_xcc_set_safe_mode(adev, xcc_id);
> > > + mutex_lock(&adev->srbm_mutex);
> > > + soc15_grbm_select(adev, me, pipe, queue, 0, GET_INST(GC, xcc_id));
> > > + for (i = 0; i < adev->usec_timeout; i++) {
> > > + if (!(RREG32_SOC15(GC, GET_INST(GC, xcc_id),
> regCP_HQD_ACTIVE) & 1))
> > > + break;
> > > + udelay(1);
> > > + }
> > > + 

RE: [PATCH] drm/amdgpu/gfx9.4.3: Implement compute pipe reset

2024-08-20 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only]

> From: Ma, Le 
> Sent: Tuesday, August 20, 2024 5:38 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: RE: [PATCH] drm/amdgpu/gfx9.4.3: Implement compute pipe reset
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> > -Original Message-
> > From: amd-gfx  On Behalf Of
> > Prike Liang
> > Sent: Tuesday, August 20, 2024 4:58 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Liang, Prike
> > 
> > Subject: [PATCH] drm/amdgpu/gfx9.4.3: Implement compute pipe reset
> >
> > Implement the compute pipe reset and driver will fallback to pipe
> > reset when queue reset failed.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |   5 +
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 153
> > 
> >  2 files changed, 138 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index e28c1ebfa98f..d4d74ba2bc27 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -143,6 +143,11 @@ struct kiq_pm4_funcs {
> >  uint32_t queue_type, uint32_t me_id,
> >  uint32_t pipe_id, uint32_t queue_id,
> >  uint32_t xcc_id, uint32_t vmid);
> > + int (*kiq_reset_hw_pipe)(struct amdgpu_ring *kiq_ring,
> > +uint32_t queue_type, uint32_t me,
> > +uint32_t pipe, uint32_t queue,
> > +uint32_t xcc_id);
> > +
> >   /* Packet sizes */
> >   int set_resources_size;
> >   int map_queues_size;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > index 2067f26d3a9d..09caa5a1842b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> > @@ -166,6 +166,10 @@ static int gfx_v9_4_3_get_cu_info(struct
> > amdgpu_device *adev,
> >   struct amdgpu_cu_info *cu_info);  static
> > void gfx_v9_4_3_xcc_set_safe_mode(struct amdgpu_device *adev, int
> > xcc_id);  static void gfx_v9_4_3_xcc_unset_safe_mode(struct
> > amdgpu_device *adev, int xcc_id);
> > +static int gfx_v9_4_3_kiq_reset_hw_pipe(struct amdgpu_ring *kiq_ring,
> > + uint32_t queue_type, uint32_t me,
> > + uint32_t pipe, uint32_t queue,
> > + uint32_t xcc_id);
> >
> >  static void gfx_v9_4_3_kiq_set_resources(struct amdgpu_ring *kiq_ring,
> >   uint64_t queue_mask) @@ -323,6 +327,7 @@
> > static const struct kiq_pm4_funcs gfx_v9_4_3_kiq_pm4_funcs = {
> >   .kiq_query_status = gfx_v9_4_3_kiq_query_status,
> >   .kiq_invalidate_tlbs = gfx_v9_4_3_kiq_invalidate_tlbs,
> >   .kiq_reset_hw_queue = gfx_v9_4_3_kiq_reset_hw_queue,
> > + .kiq_reset_hw_pipe = gfx_v9_4_3_kiq_reset_hw_pipe,
> >   .set_resources_size = 8,
> >   .map_queues_size = 7,
> >   .unmap_queues_size = 6,
> > @@ -3466,6 +3471,115 @@ static void gfx_v9_4_3_emit_wave_limit(struct
> > amdgpu_ring *ring, bool enable)
> >   }
> >  }
> >
> > +static int gfx_v9_4_3_unmap_done(struct amdgpu_device *adev, uint32_t
> me,
> > + uint32_t pipe, uint32_t queue,
> > + uint32_t xcc_id) {
> > + int i, r;
> > + /* make sure dequeue is complete*/
> > + gfx_v9_4_3_xcc_set_safe_mode(adev, xcc_id);
> > + mutex_lock(&adev->srbm_mutex);
> > + soc15_grbm_select(adev, me, pipe, queue, 0, GET_INST(GC, xcc_id));
> > + for (i = 0; i < adev->usec_timeout; i++) {
> > + if (!(RREG32_SOC15(GC, 0, regCP_HQD_ACTIVE) & 1))
>
> Should GET_INST be used to locate the target xcc_id here?
>
Thanks for pointing this out; this requires converting the GC logical instance 
to the physical instance on this ASIC, and I will address it in a later 
version.

> > + break;
> > + udelay(1);
> > + }
> > + if (i >= adev->usec_timeout)
> > + r = -ETIMEDOUT;
> > + else
> > + r = 0;
> > + soc15_grbm_select(adev, 0, 0, 0, 0, GET_INST(GC, xc

RE: [PATCH] drm/amdgpu: Use the slab allocator to reduce job allocation fragmentation

2024-05-14 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only]

> From: Koenig, Christian 
> Sent: Friday, May 10, 2024 5:31 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu: Use the slab allocator to reduce job
> allocation fragmentation
>
> Am 10.05.24 um 10:11 schrieb Prike Liang:
> > Using kzalloc() results in about 50% memory fragmentation, therefore
> > use the slab allocator to reduce memory fragmentation.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 26
> -
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  1 +
> >   3 files changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index ea14f1c8f430..3de1b42291b6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -3040,6 +3040,7 @@ static void __exit amdgpu_exit(void)
> > amdgpu_fence_slab_fini();
> > mmu_notifier_synchronize();
> > amdgpu_xcp_drv_release();
> > +   amdgpu_job_slab_fini();
> >   }
> >
> >   module_init(amdgpu_init);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index e4742b65032d..8327bf017a0e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -31,6 +31,8 @@
> >   #include "amdgpu_trace.h"
> >   #include "amdgpu_reset.h"
> >
> > +static struct kmem_cache *amdgpu_job_slab;
> > +
> >   static enum drm_gpu_sched_stat amdgpu_job_timedout(struct
> drm_sched_job *s_job)
> >   {
> > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched); @@ -
> 101,10
> > +103,19 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct
> amdgpu_vm *vm,
> > if (num_ibs == 0)
> > return -EINVAL;
> >
> > -   *job = kzalloc(struct_size(*job, ibs, num_ibs), GFP_KERNEL);
> > -   if (!*job)
> > +   amdgpu_job_slab = kmem_cache_create("amdgpu_job",
> > +   struct_size(*job, ibs, num_ibs), 0,
> > +   SLAB_HWCACHE_ALIGN, NULL);
>
> Well you are declaring a global slab cache for a dynamic job size, then try to
> set it up in the job allocation function which can be called concurrently with
> different number of IBs.
>
> To sum it up, this is completely racy and will go boom immediately in testing.
> As far as I can see this suggestion is utter nonsense.
>
> Regards,
> Christian.
>
Hi, Christian

The num_ibs value is calculated as 1 in amdgpu_cs_p1_ib(), and amdgpu_cs_pass1() 
passes num_ibs = 1 as the input parameter to amdgpu_job_alloc(). Moreover, 
num_ibs is only set from amdgpu_cs_p1_ib() and cannot be overwritten from the 
user-space driver side. I also checked a few GL and Vulkan applications and did 
not find multiple IBs within one amdgpu job submission.

If there are still concerns about the IB array size in the allocated amdgpu_job 
object, we can remove the ibs member and decouple the IBs from the job object. 
Then we can export and access the IBs as a parameter through a new interface 
like amdgpu_cs_patch_ibs(struct amdgpu_cs_parser *p, struct amdgpu_job *job, 
struct amdgpu_ib *ib).

Regarding this patch, using kmem_cache_zalloc() instead of kzalloc() saves 
about 448 bytes of memory for each amdgpu_job object allocated. Meanwhile, job 
object allocation takes almost the same time, so there should be no side effect 
on performance. If the idea sounds sensible, I will rework the patch to create 
the job slab during the driver probe period.

Thanks,
Prike
> > +   if (!amdgpu_job_slab) {
> > +   DRM_ERROR("create amdgpu_job cache failed\n");
> > return -ENOMEM;
> > +   }
> >
> > +   *job = kmem_cache_zalloc(amdgpu_job_slab, GFP_KERNEL);
> > +   if (!*job) {
> > +   kmem_cache_destroy(amdgpu_job_slab);
> > +   return -ENOMEM;
> > +   }
> > /*
> >  * Initialize the scheduler to at least some ring so that we always
> >  * have a pointer to adev.
> > @@ -138,7 +149,7 @@ int amdgpu_job_alloc_with_ib(struct
> amdgpu_device *adev,
> > if (r) {
> > if (entity)
> > drm_sched_job_cleanup(&(*job)->base);
> > -   kfree(*job);
> > +   kmem_cache_free(amdg

RE: [PATCH 3/3] drm/amdgpu: add the amdgpu buffer object move speed metrics

2024-04-23 Thread Liang, Prike
[Public]

Hi, Christian

The basic idea is to collect the following performance data and export the raw 
data through a centralized debugfs. This raw data may help with performance 
tuning on the AMDGPU kernel driver side. Additionally, it should be easy for 
tool libraries to consume, enhancing the tools' functionality.

- AMDGPU engine configuration dump
- GPU bus transaction speed metrics
- AMDGPU buffer move speed metrics
- AMDGPU performance counter
- AMDGPU driver sw information dump

Thanks,
Prike

> -Original Message-
> From: Christian König 
> Sent: Monday, April 22, 2024 11:01 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH 3/3] drm/amdgpu: add the amdgpu buffer object move
> speed metrics
>
> Am 16.04.24 um 10:51 schrieb Prike Liang:
> > Add the amdgpu buffer object move speed metrics.
>
> What should that be good for? It adds quite a bunch of complexity for a
> feature we actually want to deprecate.
>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Prike Liang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 78
> ++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  2 +-
> >   3 files changed, 61 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index 163d221b3bbd..2840f1536b51 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -502,7 +502,7 @@ void amdgpu_device_wb_free(struct
> amdgpu_device *adev, u32 wb);
> >   /*
> >* Benchmarking
> >*/
> > -int amdgpu_benchmark(struct amdgpu_device *adev, int test_number);
> > +int amdgpu_benchmark(struct amdgpu_device *adev, int test_number,
> > +struct seq_file *m);
> >
> >   int amdgpu_benchmark_dump(struct amdgpu_device *adev, struct
> seq_file *m);
> >   /*
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> > index f6848b574dea..fcd186ca088a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> > @@ -65,20 +65,27 @@ static void amdgpu_benchmark_log_results(struct
> amdgpu_device *adev,
> >  int n, unsigned size,
> >  s64 time_ms,
> >  unsigned sdomain, unsigned
> ddomain,
> > -char *kind)
> > +char *kind, struct seq_file *m)
> >   {
> > s64 throughput = (n * (size >> 10));
> >
> > throughput = div64_s64(throughput, time_ms);
> >
> > -   dev_info(adev->dev, "amdgpu: %s %u bo moves of %u kB from"
> > -" %d to %d in %lld ms, throughput: %lld Mb/s or %lld
> MB/s\n",
> > -kind, n, size >> 10, sdomain, ddomain, time_ms,
> > -throughput * 8, throughput);
> > +   if (m) {
> > +   seq_printf(m, "\tamdgpu: %s %u bo moves of %u kB from"
> > +" %d to %d in %lld ms, throughput: %lld Mb/s or %lld
> MB/s\n",
> > +   kind, n, size >> 10, sdomain, ddomain, time_ms,
> > +   throughput * 8, throughput);
> > +   } else {
> > +   dev_info(adev->dev, "amdgpu: %s %u bo moves of %u kB
> from"
> > +" %d to %d in %lld ms, throughput: %lld Mb/s or %lld
> MB/s\n",
> > +   kind, n, size >> 10, sdomain, ddomain, time_ms,
> > +   throughput * 8, throughput);
> > +   }
> >   }
> >
> >   static int amdgpu_benchmark_move(struct amdgpu_device *adev,
> unsigned size,
> > -unsigned sdomain, unsigned ddomain)
> > +unsigned sdomain, unsigned ddomain, struct
> seq_file *m)
> >   {
> > struct amdgpu_bo *dobj = NULL;
> > struct amdgpu_bo *sobj = NULL;
> > @@ -109,7 +116,7 @@ static int amdgpu_benchmark_move(struct
> amdgpu_device *adev, unsigned size,
> > goto out_cleanup;
> > else
> > amdgpu_benchmark_log_results(adev, n, size,
> time_ms,
> > -sdomain, ddomain,
> "dma");
> > +sdomain, ddomain, "dma

RE: [PATCH 3/3] drm/amdgpu: add the amdgpu buffer object move speed metrics

2024-04-22 Thread Liang, Prike
[AMD Official Use Only - General]

Soft ping for the series.

Thanks,
Prike

> -Original Message-
> From: Liang, Prike 
> Sent: Tuesday, April 16, 2024 4:52 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> 
> Subject: [PATCH 3/3] drm/amdgpu: add the amdgpu buffer object move
> speed metrics
>
> Add the amdgpu buffer object move speed metrics.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 78
> ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  2 +-
>  3 files changed, 61 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 163d221b3bbd..2840f1536b51 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -502,7 +502,7 @@ void amdgpu_device_wb_free(struct amdgpu_device
> *adev, u32 wb);
>  /*
>   * Benchmarking
>   */
> -int amdgpu_benchmark(struct amdgpu_device *adev, int test_number);
> +int amdgpu_benchmark(struct amdgpu_device *adev, int test_number,
> +struct seq_file *m);
>
>  int amdgpu_benchmark_dump(struct amdgpu_device *adev, struct seq_file
> *m);
>  /*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> index f6848b574dea..fcd186ca088a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> @@ -65,20 +65,27 @@ static void amdgpu_benchmark_log_results(struct
> amdgpu_device *adev,
>int n, unsigned size,
>s64 time_ms,
>unsigned sdomain, unsigned
> ddomain,
> -  char *kind)
> +  char *kind, struct seq_file *m)
>  {
>   s64 throughput = (n * (size >> 10));
>
>   throughput = div64_s64(throughput, time_ms);
>
> - dev_info(adev->dev, "amdgpu: %s %u bo moves of %u kB from"
> -  " %d to %d in %lld ms, throughput: %lld Mb/s or %lld
> MB/s\n",
> -  kind, n, size >> 10, sdomain, ddomain, time_ms,
> -  throughput * 8, throughput);
> + if (m) {
> + seq_printf(m, "\tamdgpu: %s %u bo moves of %u kB from"
> +  " %d to %d in %lld ms, throughput: %lld Mb/s or %lld
> MB/s\n",
> + kind, n, size >> 10, sdomain, ddomain, time_ms,
> + throughput * 8, throughput);
> + } else {
> + dev_info(adev->dev, "amdgpu: %s %u bo moves of %u kB
> from"
> +  " %d to %d in %lld ms, throughput: %lld Mb/s or %lld
> MB/s\n",
> + kind, n, size >> 10, sdomain, ddomain, time_ms,
> + throughput * 8, throughput);
> + }
>  }
>
>  static int amdgpu_benchmark_move(struct amdgpu_device *adev, unsigned
> size,
> -  unsigned sdomain, unsigned ddomain)
> +  unsigned sdomain, unsigned ddomain, struct
> seq_file *m)
>  {
>   struct amdgpu_bo *dobj = NULL;
>   struct amdgpu_bo *sobj = NULL;
> @@ -109,7 +116,7 @@ static int amdgpu_benchmark_move(struct
> amdgpu_device *adev, unsigned size,
>   goto out_cleanup;
>   else
>   amdgpu_benchmark_log_results(adev, n, size,
> time_ms,
> -  sdomain, ddomain,
> "dma");
> +  sdomain, ddomain, "dma",
> m);
>   }
>
>  out_cleanup:
> @@ -124,7 +131,7 @@ static int amdgpu_benchmark_move(struct
> amdgpu_device *adev, unsigned size,
>   return r;
>  }
>
> -int amdgpu_benchmark(struct amdgpu_device *adev, int test_number)
> +int amdgpu_benchmark(struct amdgpu_device *adev, int test_number,
> +struct seq_file *m)
>  {
>   int i, r;
>   static const int
> common_modes[AMDGPU_BENCHMARK_COMMON_MODES_N] = { @@ -
> 153,13 +160,16 @@ int amdgpu_benchmark(struct amdgpu_device *adev, int
> test_number)
>   dev_info(adev->dev,
>"benchmark test: %d (simple test, VRAM to GTT and
> GTT to VRAM)\n",
>test_number);
> + if (m)
> + seq_printf(m, "\tbenchmark test: %d (simple test,
> VRAM to GTT and GTT 

RE: [PATCH] drm/amdgpu: Fix the ring buffer size for queue VM flush

2024-04-07 Thread Liang, Prike
[AMD Official Use Only - General]

Thank you, Christian, for the review. I will remove the 
gfx_v9_0_ring_emit_vm_flush() IP type condition check before pushing the commit.

Thanks,
Prike

> -Original Message-
> From: Christian König 
> Sent: Thursday, April 4, 2024 8:24 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu: Fix the ring buffer size for queue VM flush
>
> Am 26.03.24 um 09:21 schrieb Prike Liang:
> > Here are the corrections needed for the queue ring buffer size
> > calculation for the following cases:
> > - Remove the KIQ VM flush ring usage.
> > - Add the invalidate TLBs packet for gfx10 and gfx11 queue.
> > - There's no VM flush and PFP sync, so remove the gfx9 real
> >ring and compute ring buffer usage.
> >
> > Signed-off-by: Prike Liang 
>
> Good catch, that was probably just copied over from the gfx implementation.
>
> When the function isn't used with the compute rings any more you can
> probably also remove this from gfx_v9_0_ring_emit_vm_flush():
>
>  /* compute doesn't have PFP */
>  if (ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
>
> With or without that the patch is Reviewed-by: Christian König
> .
>
> Thanks,
> Christian.
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +--
> >   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 +--
> >   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 5 -
> >   3 files changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > index d524f1a353ed..0c7312c0fa7f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > @@ -9186,7 +9186,7 @@ static const struct amdgpu_ring_funcs
> gfx_v10_0_ring_funcs_gfx = {
> > 7 + /* PIPELINE_SYNC */
> > SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > -   2 + /* VM_FLUSH */
> > +   4 + /* VM_FLUSH */
> > 8 + /* FENCE for VM_FLUSH */
> > 20 + /* GDS switch */
> > 4 + /* double SWITCH_BUFFER,
> > @@ -9276,7 +9276,6 @@ static const struct amdgpu_ring_funcs
> gfx_v10_0_ring_funcs_kiq = {
> > 7 + /* gfx_v10_0_ring_emit_pipeline_sync */
> > SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > -   2 + /* gfx_v10_0_ring_emit_vm_flush */
> > 8 + 8 + 8, /* gfx_v10_0_ring_emit_fence_kiq x3 for user fence,
> vm fence */
> > .emit_ib_size = 7, /* gfx_v10_0_ring_emit_ib_compute */
> > .emit_ib = gfx_v10_0_ring_emit_ib_compute, diff --git
> > a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > index 7a906318e451..5390dd2c51da 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > @@ -6191,7 +6191,7 @@ static const struct amdgpu_ring_funcs
> gfx_v11_0_ring_funcs_gfx = {
> > 7 + /* PIPELINE_SYNC */
> > SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > -   2 + /* VM_FLUSH */
> > +   4 + /* VM_FLUSH */
> > 8 + /* FENCE for VM_FLUSH */
> > 20 + /* GDS switch */
> > 5 + /* COND_EXEC */
> > @@ -6277,7 +6277,6 @@ static const struct amdgpu_ring_funcs
> gfx_v11_0_ring_funcs_kiq = {
> > 7 + /* gfx_v11_0_ring_emit_pipeline_sync */
> > SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > -   2 + /* gfx_v11_0_ring_emit_vm_flush */
> > 8 + 8 + 8, /* gfx_v11_0_ring_emit_fence_kiq x3 for user fence,
> vm fence */
> > .emit_ib_size = 7, /* gfx_v11_0_ring_emit_ib_compute */
> > .emit_ib = gfx_v11_0_ring_emit_ib_compute, diff --git
> > a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index 71b555993b7a..fce0b8238d13 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -6872,7 +6872,6 @@ static const struct amdgpu_ring_funcs
> gfx_v9_0_ring_funcs_gfx = {
> > 7 +  /* PIPELINE_SYNC */
> > SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> > SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > -   2 + /* VM_FLUSH */
> > 8 +  /* FENCE for VM_FLUSH

RE: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series

2024-03-13 Thread Liang, Prike
[AMD Official Use Only - General]

> From: Alex Deucher 
> Sent: Thursday, March 14, 2024 4:46 AM
> To: Kuehling, Felix 
> Cc: Sasha Levin ; linux-ker...@vger.kernel.org;
> sta...@vger.kernel.org; Liang, Prike ; Deucher,
> Alexander ; Koenig, Christian
> ; Pan, Xinhui ;
> airl...@gmail.com; dan...@ffwll.ch; Zhang, Hawking
> ; Lazar, Lijo ; Ma, Le
> ; Zhu, James ; Xiao, Shane
> ; Jiang, Sonny ; amd-
> g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org
> Subject: Re: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3
> abort cases on Raven series
>
> On Wed, Mar 13, 2024 at 4:12 PM Felix Kuehling 
> wrote:
> >
> > On 2024-03-11 11:14, Sasha Levin wrote:
> > > From: Prike Liang 
> > >
> > > [ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]
> > >
> > > Currently, GPU resets can now be performed successfully on the Raven
> > > series. GPU reset is required for the S3 suspend abort case, so it can
> > > now be enabled for S3 abort cases on the Raven series.
> >
> > This looks suspicious to me. I'm not sure what conditions made the GPU
> > reset successful. But unless all the changes involved were also
> > backported, this should probably not be applied to older kernel
> > branches. I'm speculating it may be related to the removal of AMD
> IOMMUv2.
> >
>
> We should get confirmation from Prike, but I think he tested this on older
> kernels as well.
>
> Alex
>
> > Regards,
> >Felix
> >

The Raven/Raven2 series GPU reset function was enabled in some older kernel 
versions such as 5.5 but filtered out in more recent kernel driver versions. 
Therefore, this patch only applies to the latest kernel version, and it should 
be safe, since it enables the Raven GPU reset only in the S3 suspend abort case 
without affecting other cases. The Chrome kernel log indicates that the AMD 
IOMMUv2 driver is loaded, and with this patch triggering the GPU reset before 
the AMDGPU device reinitialization, the S3 suspend abort resume problem on the 
Raven series is handled effectively.

Was the Raven GPU reset previously disabled due to the AMD IOMMUv2 driver? If 
so, based on the Chromebook's verification result, the Raven series GPU reset 
can probably be enabled with IOMMUv2 for other cases as well.

Thanks,
Prike
> >
> > >
> > > Signed-off-by: Prike Liang 
> > > Acked-by: Alex Deucher 
> > > Signed-off-by: Alex Deucher 
> > > Signed-off-by: Sasha Levin 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/soc15.c | 45 +--
> ---
> > >   1 file changed, 25 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > index 6a3486f52d698..ef5b3eedc8615 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > @@ -605,11 +605,34 @@ soc15_asic_reset_method(struct
> amdgpu_device *adev)
> > >   return AMD_RESET_METHOD_MODE1;
> > >   }
> > >
> > > +static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> > > +{
> > > + u32 sol_reg;
> > > +
> > > + sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > > +
> > > + /* Will reset for the following suspend abort cases.
> > > +  * 1) Only reset limit on APU side, dGPU hasn't checked yet.
> > > +  * 2) S3 suspend abort and TOS already launched.
> > > +  */
> > > + if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > > + !adev->suspend_complete &&
> > > + sol_reg)
> > > + return true;
> > > +
> > > + return false;
> > > +}
> > > +
> > >   static int soc15_asic_reset(struct amdgpu_device *adev)
> > >   {
> > >   /* original raven doesn't have full asic reset */
> > > - if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > > - (adev->apu_flags & AMD_APU_IS_RAVEN2))
> > > + /* On the latest Raven, the GPU reset can be performed
> > > +  * successfully. So now, temporarily enable it for the
> > > +  * S3 suspend abort case.
> > > +  */
> > > + if (((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > > + (adev->apu_flags & AMD_APU_IS_RAVEN2)) &&
> > > + !soc15_need_reset_on_resume(adev))
> > >   return 0

RE: [PATCH] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series

2024-02-22 Thread Liang, Prike
[AMD Official Use Only - General]

> From: Alex Deucher 
> Sent: Friday, February 23, 2024 1:03 AM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> 
> Subject: Re: [PATCH] drm/amdgpu: Enable gpu reset for S3 abort cases on
> Raven series
>
> On Thu, Feb 22, 2024 at 8:41 AM Prike Liang  wrote:
> >
> > Currently, GPU resets can now be performed successfully on the Raven
> > series. GPU reset is required for the S3 suspend abort case, so it can
> > now be enabled for S3 abort cases on the Raven series.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/soc15.c | 45
> > +-
> >  1 file changed, 25 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index e4012f53632b..f68ef0863cb0 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -574,11 +574,34 @@ soc15_asic_reset_method(struct amdgpu_device
> *adev)
> > return AMD_RESET_METHOD_MODE1;  }
> >
> > +static bool soc15_need_reset_on_resume(struct amdgpu_device *adev) {
> > +   u32 sol_reg;
> > +
> > +   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > +
> > +   /* Will reset for the following suspend abort cases.
> > +* 1) Only reset limit on APU side, dGPU hasn't checked yet.
> > +* 2) S3 suspend abort and TOS already launched.
> > +*/
> > +   if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > +   !adev->suspend_complete &&
> > +   sol_reg)
> > +   return true;
> > +
> > +   return false;
> > +}
> > +
> >  static int soc15_asic_reset(struct amdgpu_device *adev)  {
> > /* original raven doesn't have full asic reset */
> > -   if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > -   (adev->apu_flags & AMD_APU_IS_RAVEN2))
> > +   /* On the latest Raven, the GPU reset can be performed
> > +* successfully. So now, temporarily enable it for the
> > +* S3 suspend abort case.
> > +*/
> > +   if (!soc15_need_reset_on_resume(adev) &&
> > +   ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > +   (adev->apu_flags & AMD_APU_IS_RAVEN2)))
>
> Maybe check the apu flags first to avoid the MMIO read on chips where we
> don't need it.
>
> Alex
>
Yes, it's a good idea and I will address it in the next patch version.

> > return 0;
> >
> > switch (soc15_asic_reset_method(adev)) { @@ -1298,24 +1321,6
> > @@ static int soc15_common_suspend(void *handle)
> > return soc15_common_hw_fini(adev);  }
> >
> > -static bool soc15_need_reset_on_resume(struct amdgpu_device *adev) -{
> > -   u32 sol_reg;
> > -
> > -   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > -
> > -   /* Will reset for the following suspend abort cases.
> > -* 1) Only reset limit on APU side, dGPU hasn't checked yet.
> > -* 2) S3 suspend abort and TOS already launched.
> > -*/
> > -   if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > -   !adev->suspend_complete &&
> > -   sol_reg)
> > -   return true;
> > -
> > -   return false;
> > -}
> > -
> >  static int soc15_common_resume(void *handle)  {
> > struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > --
> > 2.34.1
> >


RE: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case

2024-01-30 Thread Liang, Prike
[AMD Official Use Only - General]

> From: Lazar, Lijo 
> Sent: Monday, January 29, 2024 2:48 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Sharma, Deepak
> 
> Subject: Re: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case
>
>
>
> On 1/26/2024 2:30 PM, Liang, Prike wrote:
> > [AMD Official Use Only - General]
> >
> >>
> >> On 1/25/2024 8:52 AM, Prike Liang wrote:
> >>> In the PM abort case, the gfx power rail does not turn off from the
> >>> FCH side, which leads to gfx reinitialization failing due to the
> >>> unknown gfx HW state, so let's reset the GPU to a known good power
> >>> state.
> >>>
> >>
> >> From the description, this an APU only problem (or this patch could
> >> only resolve APU abort sequence). However, there is no check for APU
> >> in the patch below.
> >>
> > [Prike] IIRC, there is also a similar problem on the dGPU side on
> > suspend abort; this patch is only drafted for a hot issue on the RV
> > series. If needed, we can add a TODO item for drafting a more generic
> > solution.
> >
>
> If this addresses a specific issue, then better to check the specific IP 
> revision
> before presenting this as a generic one. Presently the patch logic considers
> this as a generic for all soc15 asics.
>
Until someone can confirm whether there's a similar problem on the dGPU device 
side, I prefer to limit this quirk to some specific ASICs.

> >>
> >>> Signed-off-by: Prike Liang 
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +
> >>>  drivers/gpu/drm/amd/amdgpu/soc15.c | 8 +++-
> >>>  2 files changed, 12 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> index 56d9dfa61290..4c40ffaaa5c2 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> @@ -4627,6 +4627,11 @@ int amdgpu_device_resume(struct
> drm_device
> >> *dev, bool fbcon)
> >>> return r;
> >>> }
> >>>
> >>> +   if(amdgpu_asic_need_reset_on_init(adev)) {
> >>> +   DRM_INFO("PM abort case and let's reset asic \n");
> >>> +   amdgpu_asic_reset(adev);
> >>> +   }
> >>> +
> >>
> >> suspend_noirq is specific for suspend scenarios and not valid for
> freeze/thaw.
> >> I guess this could trigger reset for successful restore on APUs.
> >>
> > [Prike] If we don't run into noirq_suspend, we still need to further
> > check whether the PSP TOS is still alive before the GPU reset.
> >
>
> AFAIU, for a successful resume from hibernate on APUs, TOS will still be
> running. The patch will trigger a reset in such cases also.
>
> Thanks,
> Lijo
>
Yes, while the system is restoring the saved image, the TOS should still be 
running, so I will filter out the hibernate resume case in a later patch.

Thanks,
Prike
> >>> if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> >>> return 0;
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>> index 15033efec2ba..9329a00b6abc 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>> @@ -804,9 +804,16 @@ static bool soc15_need_reset_on_init(struct
> >> amdgpu_device *adev)
> >>> if (adev->asic_type == CHIP_RENOIR)
> >>> return true;
> >>>
> >>> +   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> >>> +
> >>> /* Just return false for soc15 GPUs.  Reset does not seem to
> >>>  * be necessary.
> >>>  */
> >>
> >> The comment now doesn't make sense.
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> +   if (adev->in_suspend && !adev->in_s0ix &&
> >>> +   !adev->pm_complete &&
> >>> +   sol_reg)
> >>> +   return true;
> >>> +
> >>> if (!amdgpu_passthrough(adev))
> >>> return false;
> >>>
> >>> @@ -816,7 +823,6 @@ static bool soc15_need_reset_on_init(struct
> >> amdgpu_device *adev)
> >>> /* Check sOS sign of life register to confirm sys driver and sOS
> >>>  * are already been loaded.
> >>>  */
> >>> -   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> >>> if (sol_reg)
> >>> return true;
> >>>


RE: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for PM abort case

2024-01-29 Thread Liang, Prike
[AMD Official Use Only - General]

> Sent: Friday, January 26, 2024 9:43 AM
> To: Alex Deucher 
> Cc: Deucher, Alexander ; Sharma, Deepak
> ; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for
> PM abort case
>
> [AMD Official Use Only - General]
>
> [AMD Official Use Only - General]
>
> > From: Alex Deucher 
> > Sent: Thursday, January 25, 2024 11:24 PM
> > To: Liang, Prike 
> > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> > ; Sharma, Deepak
> 
> > Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers
> > for PM abort case
> >
> > On Thu, Jan 25, 2024 at 10:22 AM Alex Deucher 
> > wrote:
> > >
> > > On Wed, Jan 24, 2024 at 9:39 PM Liang, Prike 
> > wrote:
> > > >
> > > > [AMD Official Use Only - General]
> > > >
> > > > Hi, Alex
> > > > > -Original Message-
> > > > > From: Alex Deucher 
> > > > > Sent: Wednesday, January 24, 2024 11:59 PM
> > > > > To: Liang, Prike 
> > > > > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> > > > > ; Sharma, Deepak
> > > > > 
> > > > > Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC
> > > > > registers for PM abort case
> > > > >
> > > > > On Wed, Jan 24, 2024 at 2:12 AM Prike Liang
> > > > > 
> > wrote:
> > > > > >
> > > > > > In the PM abort cases, the gfx power rail doesn't turn off so
> > > > > > some GFXDEC registers/CSB can't reset to their default values.
> > > > > > In order to avoid unexpected problems, skip programming the
> > > > > > GFXDEC registers and bypass issuing the CSB packet for the PM
> > > > > > abort case.
> > > > > >
> > > > > > Signed-off-by: Prike Liang 
> > > > > > ---
> > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
> > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> > > > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 4 
> > > > > >  3 files changed, 6 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > index c5f3859fd682..26d983eb831b 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > @@ -1079,6 +1079,7 @@ struct amdgpu_device {
> > > > > > boolin_s3;
> > > > > > boolin_s4;
> > > > > > boolin_s0ix;
> > > > > > +   boolpm_complete;
> > > > > >
> > > > > > enum pp_mp1_state   mp1_state;
> > > > > > struct amdgpu_doorbell_index doorbell_index; diff
> > > > > > --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > index 475bd59c9ac2..a01f9b0c2f30 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > @@ -2486,6 +2486,7 @@ static int
> > > > > > amdgpu_pmops_suspend_noirq(struct
> > > > > device *dev)
> > > > > > struct drm_device *drm_dev = dev_get_drvdata(dev);
> > > > > > struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > > > > >
> > > > > > +   adev->pm_complete = true;
> > > > >
> > > > > This needs to be cleared somewhere on resume.
> > > > [Liang, Prike] This flag is designed to indicate the status of the
> > > > amdgpu device suspend process; I will update the patch to clear it
> > > > at the beginning of amdgpu suspend.
> > > > >
> > > > > > if (amdgpu_acpi_should_gpu_reset(adev))
> > > > > > return amdgpu_asic_reset(adev);
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > index 57808be6e3ec..3bf51f18e13c 10064

RE: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case

2024-01-26 Thread Liang, Prike
[AMD Official Use Only - General]

>
> On 1/25/2024 8:52 AM, Prike Liang wrote:
> > In the PM abort case, the gfx power rail does not turn off from the FCH
> > side, which leads to gfx reinitialization failing due to the unknown gfx
> > HW state, so let's reset the GPU to a known good power state.
> >
>
> From the description, this an APU only problem (or this patch could only
> resolve APU abort sequence). However, there is no check for APU in the patch
> below.
>
[Prike] IIRC, there is also a similar problem on the dGPU side on suspend 
abort; this patch is only drafted for a hot issue on the RV series. If needed, 
we can add a TODO item for drafting a more generic solution.

>
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +
> >  drivers/gpu/drm/amd/amdgpu/soc15.c | 8 +++-
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 56d9dfa61290..4c40ffaaa5c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -4627,6 +4627,11 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool fbcon)
> > return r;
> > }
> >
> > +   if(amdgpu_asic_need_reset_on_init(adev)) {
> > +   DRM_INFO("PM abort case and let's reset asic \n");
> > +   amdgpu_asic_reset(adev);
> > +   }
> > +
>
> suspend_noirq is specific for suspend scenarios and not valid for freeze/thaw.
> I guess this could trigger reset for successful restore on APUs.
>
[Prike] If we don't run into noirq_suspend, we still need to further check 
whether the PSP TOS is still alive before the GPU reset.

> > if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> > return 0;
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 15033efec2ba..9329a00b6abc 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -804,9 +804,16 @@ static bool soc15_need_reset_on_init(struct
> amdgpu_device *adev)
> > if (adev->asic_type == CHIP_RENOIR)
> > return true;
> >
> > +   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > +
> > /* Just return false for soc15 GPUs.  Reset does not seem to
> >  * be necessary.
> >  */
>
> The comment now doesn't make sense.
>
> Thanks,
> Lijo
>
> > +   if (adev->in_suspend && !adev->in_s0ix &&
> > +   !adev->pm_complete &&
> > +   sol_reg)
> > +   return true;
> > +
> > if (!amdgpu_passthrough(adev))
> > return false;
> >
> > @@ -816,7 +823,6 @@ static bool soc15_need_reset_on_init(struct
> amdgpu_device *adev)
> > /* Check sOS sign of life register to confirm sys driver and sOS
> >  * are already been loaded.
> >  */
> > -   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > if (sol_reg)
> > return true;
> >


RE: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case

2024-01-26 Thread Liang, Prike
[AMD Official Use Only - General]

> From: Alex Deucher 
> Sent: Friday, January 26, 2024 6:57 AM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Sharma, Deepak
> 
> Subject: Re: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case
>
> On Wed, Jan 24, 2024 at 11:11 PM Prike Liang  wrote:
> >
> > In the pm abort case the gfx power rail not turn off from FCH side and
> > this will lead to the gfx reinitialized failed base on the unknown gfx
> > HW status, so let's reset the gpu to a known good power state.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +
> >  drivers/gpu/drm/amd/amdgpu/soc15.c | 8 +++-
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 56d9dfa61290..4c40ffaaa5c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -4627,6 +4627,11 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool fbcon)
> > return r;
> > }
> >
> > +   if(amdgpu_asic_need_reset_on_init(adev)) {
>
> space between if and (.
>
> Also, I think we need to check that we are not in S0ix as well otherwise I 
> think
> we'll always reset in S0ix.  We could probably do away with the GPU reset in
> the suspend_noirq callback with this change, but maybe make that a separate
> follow up patch.
>
> Alex
>
[Liang, Prike] Yes, there's no need to reset the GPU for the s0ix suspend abort case, 
and s0ix suspension is already
filtered out in need_reset_on_init() at the end of this patch. Regarding resetting 
the GPU in the suspend_noirq callback,
amdgpu_acpi_should_gpu_reset() already sorts out some cases that require a reset, and 
if we do the reset
in the noirq suspend phase then we will miss the first suspend abort case. IMO, 
we need to filter out the following two cases and update the
patch to be a more generic solution accordingly.

1) The PM suspend abort case, where the PM suspend process exits before running 
into the noirq suspend phase.
2) The case where PM suspend completes but the GPU doesn't power off and the PSP TOS is 
still alive at the very beginning of the amdgpu resume; this case
happens with passthrough device suspension on the Xen guest side.
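The two cases above can be sketched as one generic resume-time decision. This is an illustrative, self-contained mock (not the actual amdgpu code; the helper name and its scalar parameters are invented for the sketch), assuming both abort cases are exposed by the sOS sign-of-life (SOL) register still reading non-zero:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the generic reset decision: on resume, reset
 * the GPU when an S3-style suspend was entered but the GPU never
 * actually powered off. Both case 1 (abort before the noirq phase)
 * and case 2 (suspend completed but the GPU stayed powered, e.g. Xen
 * passthrough) leave the PSP TOS alive, so the SOL register is the
 * common indicator.
 */
static bool need_reset_on_resume(bool in_suspend, bool in_s0ix,
                                 uint32_t sol_reg)
{
    /* s0ix keeps gfx powered by design, so never reset there. */
    if (in_s0ix || !in_suspend)
        return false;

    /* Non-zero SOL means the sys driver / sOS are still loaded. */
    return sol_reg != 0;
}
```

This matches the point above: s0ix is filtered out up front, and the SOL read covers both abort flavors without needing to know which suspend phase was reached.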

> > +   DRM_INFO("PM abort case and let's reset asic \n");
> > +   amdgpu_asic_reset(adev);
> > +   }
> > +
> > if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> > return 0;
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 15033efec2ba..9329a00b6abc 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -804,9 +804,16 @@ static bool soc15_need_reset_on_init(struct
> amdgpu_device *adev)
> > if (adev->asic_type == CHIP_RENOIR)
> > return true;
> >
> > +   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > +
> > /* Just return false for soc15 GPUs.  Reset does not seem to
> >  * be necessary.
> >  */
> > +   if (adev->in_suspend && !adev->in_s0ix &&
> > +   !adev->pm_complete &&
> > +   sol_reg)
> > +   return true;
> > +
> > if (!amdgpu_passthrough(adev))
> > return false;
> >
> > @@ -816,7 +823,6 @@ static bool soc15_need_reset_on_init(struct
> amdgpu_device *adev)
> > /* Check sOS sign of life register to confirm sys driver and sOS
> >  * are already been loaded.
> >  */
> > -   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > if (sol_reg)
> > return true;
> >
> > --
> > 2.34.1
> >


RE: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for PM abort case

2024-01-25 Thread Liang, Prike
[AMD Official Use Only - General]

> From: Alex Deucher 
> Sent: Thursday, January 25, 2024 11:24 PM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Sharma, Deepak
> 
> Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for
> PM abort case
>
> On Thu, Jan 25, 2024 at 10:22 AM Alex Deucher 
> wrote:
> >
> > On Wed, Jan 24, 2024 at 9:39 PM Liang, Prike 
> wrote:
> > >
> > > [AMD Official Use Only - General]
> > >
> > > Hi, Alex
> > > > -Original Message-
> > > > From: Alex Deucher 
> > > > Sent: Wednesday, January 24, 2024 11:59 PM
> > > > To: Liang, Prike 
> > > > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> > > > ; Sharma, Deepak
> > > > 
> > > > Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC
> > > > registers for PM abort case
> > > >
> > > > On Wed, Jan 24, 2024 at 2:12 AM Prike Liang 
> wrote:
> > > > >
> > > > > In the PM abort cases, the gfx power rail doesn't turn off so
> > > > > some GFXDEC registers/CSB can't reset to default vaule. In order
> > > > > to avoid unexpected problem now need skip to program GFXDEC
> > > > > registers and bypass issue CSB packet for PM abort case.
> > > > >
> > > > > Signed-off-by: Prike Liang 
> > > > > ---
> > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
> > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> > > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 4 
> > > > >  3 files changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > index c5f3859fd682..26d983eb831b 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > @@ -1079,6 +1079,7 @@ struct amdgpu_device {
> > > > > boolin_s3;
> > > > > boolin_s4;
> > > > > boolin_s0ix;
> > > > > +   boolpm_complete;
> > > > >
> > > > > enum pp_mp1_state   mp1_state;
> > > > > struct amdgpu_doorbell_index doorbell_index; diff --git
> > > > > a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > index 475bd59c9ac2..a01f9b0c2f30 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > @@ -2486,6 +2486,7 @@ static int
> > > > > amdgpu_pmops_suspend_noirq(struct
> > > > device *dev)
> > > > > struct drm_device *drm_dev = dev_get_drvdata(dev);
> > > > > struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > > > >
> > > > > +   adev->pm_complete = true;
> > > >
> > > > This needs to be cleared somewhere on resume.
> > > [Liang, Prike] This flag is designed to indicate the status of the
> > > amdgpu device suspend process. I will update the patch to clear it
> > > at the beginning of the amdgpu suspend.
> > > >
> > > > > if (amdgpu_acpi_should_gpu_reset(adev))
> > > > > return amdgpu_asic_reset(adev);
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > index 57808be6e3ec..3bf51f18e13c 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > @@ -3034,6 +3034,10 @@ static int gfx_v9_0_cp_gfx_start(struct
> > > > > amdgpu_device *adev)
> > > > >
> > > > > gfx_v9_0_cp_gfx_enable(adev, true);
> > > > >
> > > > > +   if (adev->in_suspend && !adev->pm_complete) {
> > > > > +   DRM_INFO(" will skip the csb ring write\n");
> > > > > +   return 0;
> > > > > +   }
> > > >
> > > > We probably want a similar fix for other gfx generations as well.
> > > >
> > > > Alex
> > > >
> > > [Liang, Prike] IIRC, there's no issue on the Mendocino side even without
> > > the fix. How about keeping the other gfx generations unchanged first, and
> > > adding the quirk for each specific gfx generation after the failing cases
> > > are sorted out?
> >
> > Mendocino only supports S0i3 so we don't touch gfx on suspend/resume.
> > This would only happen on platforms that support S3.
>
> E.g., try an aborted suspend on Raphael or PHX2.
>
> Alex
>
[Liang, Prike] Thanks for the reminder, but Mendocino was also verified on 
a system with S3 enabled from the BIOS. I will double-check whether the 
quirk is needed on RPL or PHX2.

> >
> > Alex
> >
> > >
> > > > > r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + 
> > > > > 3);
> > > > > if (r) {
> > > > > DRM_ERROR("amdgpu: cp failed to lock ring
> > > > > (%d).\n", r);
> > > > > --
> > > > > 2.34.1
> > > > >


RE: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case

2024-01-24 Thread Liang, Prike
[AMD Official Use Only - General]

Hi, Alex

> -Original Message-
> From: Alex Deucher 
> Sent: Thursday, January 25, 2024 12:07 AM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Sharma, Deepak
> 
> Subject: Re: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case
>
> On Wed, Jan 24, 2024 at 2:17 AM Prike Liang  wrote:
> >
> > In the pm abort case the gfx power rail not turn off from FCH side and
> > this will lead to the gfx reinitialized failed base on the unknown gfx
> > HW status, so let's reset the gpu to a known good power state.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 +
> >  drivers/gpu/drm/amd/amdgpu/soc15.c  | 7 ++-
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 9153f69bad7f..14b66c49a536 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -2935,6 +2935,11 @@ static int psp_resume(void *handle)
> >
> > dev_info(adev->dev, "PSP is resuming...\n");
> >
> > +   if(amdgpu_asic_need_reset_on_init(adev)) {
> > +   DRM_INFO("PM abort case and let's reset asic \n");
> > +   amdgpu_asic_reset(adev);
> > +   }
> > +
>
> Seems like it would be better to put this in the resume callback.
[Liang, Prike] Yes, it makes sense to put the device-level function call in 
the device resume callback.
>
> > if (psp->mem_train_ctx.enable_mem_training) {
> > ret = psp_mem_training(psp, PSP_MEM_TRAIN_RESUME);
> > if (ret) {
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 15033efec2ba..6ec4f6958c4c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -804,9 +804,15 @@ static bool soc15_need_reset_on_init(struct
> amdgpu_device *adev)
> > if (adev->asic_type == CHIP_RENOIR)
> > return true;
> >
> > +   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
>
> Is this register consistent for all soc15 chips?
>
> Alex
>
Yes, it's confirmed by the PSP firmware team that this register can be used to 
indicate the TOS/SOS live status on the SOC15 series.

> > +
> > /* Just return false for soc15 GPUs.  Reset does not seem to
> >  * be necessary.
> >  */
> > +   if (adev->in_suspend && !adev->pm_complete &&
> > +   sol_reg)
> > +   return true;
> > +
> > if (!amdgpu_passthrough(adev))
> > return false;
> >
> > @@ -816,7 +822,6 @@ static bool soc15_need_reset_on_init(struct
> amdgpu_device *adev)
> > /* Check sOS sign of life register to confirm sys driver and sOS
> >  * are already been loaded.
> >  */
> > -   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > if (sol_reg)
> > return true;
> > --
> > 2.34.1
> >


RE: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for PM abort case

2024-01-24 Thread Liang, Prike
[AMD Official Use Only - General]

Hi, Alex
> -Original Message-
> From: Alex Deucher 
> Sent: Wednesday, January 24, 2024 11:59 PM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Sharma, Deepak
> 
> Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for
> PM abort case
>
> On Wed, Jan 24, 2024 at 2:12 AM Prike Liang  wrote:
> >
> > In the PM abort cases, the gfx power rail doesn't turn off so some
> > GFXDEC registers/CSB can't reset to default vaule. In order to avoid
> > unexpected problem now need skip to program GFXDEC registers and
> > bypass issue CSB packet for PM abort case.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 4 
> >  3 files changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index c5f3859fd682..26d983eb831b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -1079,6 +1079,7 @@ struct amdgpu_device {
> > boolin_s3;
> > boolin_s4;
> > boolin_s0ix;
> > +   boolpm_complete;
> >
> > enum pp_mp1_state   mp1_state;
> > struct amdgpu_doorbell_index doorbell_index; diff --git
> > a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 475bd59c9ac2..a01f9b0c2f30 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2486,6 +2486,7 @@ static int amdgpu_pmops_suspend_noirq(struct
> device *dev)
> > struct drm_device *drm_dev = dev_get_drvdata(dev);
> > struct amdgpu_device *adev = drm_to_adev(drm_dev);
> >
> > +   adev->pm_complete = true;
>
> This needs to be cleared somewhere on resume.
[Liang, Prike] This flag is designed to indicate the status of the amdgpu device 
suspend process. I will update the patch to clear it at the beginning of the 
amdgpu suspend.
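The lifecycle being described can be sketched with a small mock. This is hypothetical code (the real flag lives in struct amdgpu_device and is driven by the PM core callbacks, not these invented helpers), assuming the flag is cleared when suspend begins and set only once the noirq phase runs:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of the pm_complete flag lifecycle: cleared at suspend start,
 * set only in the noirq phase, so "suspend began but never reached
 * noirq" identifies an early-aborted suspend on resume. */
struct mock_adev {
	bool in_suspend;
	bool pm_complete;
};

static void mock_suspend_begin(struct mock_adev *adev)
{
	adev->in_suspend = true;
	adev->pm_complete = false;	/* cleared at the start of every suspend */
}

static void mock_suspend_noirq(struct mock_adev *adev)
{
	adev->pm_complete = true;	/* only reached when not aborted earlier */
}

static bool mock_suspend_aborted_early(const struct mock_adev *adev)
{
	return adev->in_suspend && !adev->pm_complete;
}
```

Clearing at the beginning of every suspend (rather than on resume) keeps the flag meaningful even when resume itself is cut short.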
>
> > if (amdgpu_acpi_should_gpu_reset(adev))
> > return amdgpu_asic_reset(adev);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index 57808be6e3ec..3bf51f18e13c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -3034,6 +3034,10 @@ static int gfx_v9_0_cp_gfx_start(struct
> > amdgpu_device *adev)
> >
> > gfx_v9_0_cp_gfx_enable(adev, true);
> >
> > +   if (adev->in_suspend && !adev->pm_complete) {
> > +   DRM_INFO(" will skip the csb ring write\n");
> > +   return 0;
> > +   }
>
> We probably want a similar fix for other gfx generations as well.
>
> Alex
>
[Liang, Prike] IIRC, there's no issue on the Mendocino side even without the 
fix. How about keeping the other gfx generations unchanged first, and adding 
the quirk for each specific gfx generation after the failing cases are sorted out?

> > r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + 3);
> > if (r) {
> > DRM_ERROR("amdgpu: cp failed to lock ring (%d).\n",
> > r);
> > --
> > 2.34.1
> >


RE: [PATCH] drm/amdgpu: correct the amdgpu runtime dereference usage count

2023-11-16 Thread Liang, Prike
[AMD Official Use Only - General]

Ping for the review.

Regards,
--Prike

> -Original Message-
> From: Liang, Prike 
> Sent: Tuesday, November 14, 2023 10:41 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> 
> Subject: [PATCH] drm/amdgpu: correct the amdgpu runtime dereference
> usage count
>
> Fix the amdgpu runpm dereference usage count.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index 0cacd0b9f8be..4737ada467cc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -340,12 +340,12 @@ int amdgpu_display_crtc_set_config(struct
> drm_mode_set *set,
>   adev->have_disp_power_ref = true;
>   return ret;
>   }
> - /* if we have no active crtcs, then drop the power ref
> -  * we got before
> + /* if we have no active crtcs, then go to
> +  * drop the power ref we got before
>*/
>   if (!active && adev->have_disp_power_ref) {
> - pm_runtime_put_autosuspend(dev->dev);
>   adev->have_disp_power_ref = false;
> + goto out;
>   }
>
>  out:
> --
> 2.34.1



RE: [PATCH] drm/amdgpu: needn't set aggregated doorbell for map queue

2023-11-13 Thread Liang, Prike
[Public]

Please ignore this patch; it's double-confirmed with the stakeholder that an MES-mapped 
queue may change to an unmapped queue after the doorbell is written.

Regards,
--Prike

> -Original Message-
> From: Liang, Prike 
> Sent: Thursday, November 9, 2023 3:36 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> 
> Subject: [PATCH] drm/amdgpu: needn't set aggregated doorbell for map
> queue
>
> Needn't set aggregated doorbell for map queue and remove the dead code.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 --
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 4 
>  2 files changed, 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index c8a3bf01743f..601bb6755bd3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -8220,9 +8220,6 @@ static void gfx_v10_0_ring_set_wptr_gfx(struct
> amdgpu_ring *ring)
>   WDOORBELL64(ring->doorbell_index, wptr_tmp);
>   } else {
>   WDOORBELL64(ring->doorbell_index, wptr_tmp);
> -
> - if (*is_queue_unmap)
> - WDOORBELL64(aggregated_db_index,
> wptr_tmp);
>   }
>   } else {
>   if (ring->use_doorbell) {
> @@ -8283,9 +8280,6 @@ static void
> gfx_v10_0_ring_set_wptr_compute(struct amdgpu_ring *ring)
>   WDOORBELL64(ring->doorbell_index, wptr_tmp);
>   } else {
>   WDOORBELL64(ring->doorbell_index, wptr_tmp);
> -
> - if (*is_queue_unmap)
> - WDOORBELL64(aggregated_db_index,
> wptr_tmp);
>   }
>   } else {
>   /* XXX check if swapping is necessary on BE */ diff --git
> a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> index c1ff5eda8961..14633e2ceac6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> @@ -358,10 +358,6 @@ static void sdma_v5_0_ring_set_wptr(struct
> amdgpu_ring *ring)
>   DRM_DEBUG("calling WDOORBELL64(0x%08x,
> 0x%016llx)\n",
>   ring->doorbell_index, ring->wptr <<
> 2);
>   WDOORBELL64(ring->doorbell_index, ring->wptr <<
> 2);
> -
> - if (*is_queue_unmap)
> - WDOORBELL64(aggregated_db_index,
> - ring->wptr << 2);
>   }
>   } else {
>   if (ring->use_doorbell) {
> --
> 2.34.1



RE: [PATCH 1/2] drm/amdgpu: correct the amdgpu runtime dereference usage count

2023-11-13 Thread Liang, Prike
[Public]

> -Original Message-
> From: Deucher, Alexander 
> Sent: Friday, November 10, 2023 5:46 AM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 1/2] drm/amdgpu: correct the amdgpu runtime
> dereference usage count
>
> [Public]
>
> > -Original Message-
> > From: Liang, Prike 
> > Sent: Thursday, November 9, 2023 2:37 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Liang, Prike
> > 
> > Subject: [PATCH 1/2] drm/amdgpu: correct the amdgpu runtime
> > dereference usage count
> >
> > Fix the amdgpu runpm dereference usage count.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
> > drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 1 +
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > index a53f436fa9f1..f6e5d9f7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > @@ -1992,7 +1992,7 @@ static int amdgpu_debugfs_sclk_set(void *data,
> > u64 val)
> >
> >   ret = amdgpu_dpm_set_soft_freq_range(adev, PP_SCLK,
> > (uint32_t)val, (uint32_t)val);
> >   if (ret)
> > - ret = -EINVAL;
> > + goto out;
>
> I think this hunk can be dropped.  It doesn't really change anything.  Or you
> could just drop the whole ret check since we just return ret at the end
> anyway.  Not sure if changing the error code is important here or not.
>
[Prike] Thanks for pointing it out. Revisiting this part, it seems OK to 
reassign the return value when the callee can't return a proper error 
type. I will keep this part unmodified since it causes no problem for 
dereferencing the runpm usage.
> >
> >  out:
> >   pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> > index 0cacd0b9f8be..ff1f42ae6d8e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> > @@ -346,6 +346,7 @@ int amdgpu_display_crtc_set_config(struct
> > drm_mode_set *set,
> >   if (!active && adev->have_disp_power_ref) {
> >   pm_runtime_put_autosuspend(dev->dev);
> >   adev->have_disp_power_ref = false;
> > + return ret;
> >   }
>
> I think it would be cleaner to just drop the runtime_put above and update
> the comment.  We'll just fall through to the end of the function.
>
> Alex
>
[Prike] Do you mean something like the following? I will submit a new patch 
for this.

-   /* if we have no active crtcs, then drop the power ref
-* we got before
+   /* if we have no active crtcs, then go to
+* drop the power ref we got before
 */
if (!active && adev->have_disp_power_ref) {
-   pm_runtime_put_autosuspend(dev->dev);
adev->have_disp_power_ref = false;
+   goto out;
}
> >
> >  out:
> > --
> > 2.34.1
>



RE: [PATCH 2/2] drm/amdgpu: add amdgpu runpm usage trace for separate funcs

2023-11-13 Thread Liang, Prike
[Public]

> -Original Message-
> From: Deucher, Alexander 
> Sent: Friday, November 10, 2023 5:49 AM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 2/2] drm/amdgpu: add amdgpu runpm usage trace for
> separate funcs
>
> [Public]
>
> > -Original Message-
> > From: Liang, Prike 
> > Sent: Thursday, November 9, 2023 2:37 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Liang, Prike
> > 
> > Subject: [PATCH 2/2] drm/amdgpu: add amdgpu runpm usage trace for
> > separate funcs
> >
> > Add trace for amdgpu runpm separate funcs usage and this will help
> > debugging on the case of runpm usage missed to dereference.
> > In the normal case the runpm usage count referred by one kind of
> > functionality pairwise and usage should be changed from 1 to 0,
> > otherwise there will be an issue in the amdgpu runpm usage dereference.
> >
> > Signed-off-by: Prike Liang 
>
> Looks good.  Not sure if you want to add tracepoints to the other call sites 
> as
> well.  These are probably the trickiest however.
>
> Acked-by: Alex Deucher 
>
[Prike] Thanks for the review. For now the trace points are only added to the amdgpu 
functions which reference and dereference the amdgpu runpm usage separately, and 
from my checking it seems no other such functions need the trace point at 
the moment.
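The invariant these trace points make visible can be modeled in a few lines. This is a toy sketch (the counter and helper names are illustrative, not the real runtime-PM API), assuming every reference within one functional unit is paired with a dereference:

```c
#include <assert.h>

/* Toy model of the pairwise runpm usage the patch traces: across one
 * functional unit the usage count should go 0 -> 1 -> 0. A count left
 * at 1 is exactly the leaked reference the trace points are meant to
 * catch. */
static int runpm_usage_count;

static int traced_runpm_get(void)
{
	return ++runpm_usage_count;	/* trace_amdgpu_runpm_reference_dumps(1, ...) side */
}

static int traced_runpm_put(void)
{
	return --runpm_usage_count;	/* trace_amdgpu_runpm_reference_dumps(0, ...) side */
}
```

In the trace log, a get that is never followed by a matching put from the same functionality points at the call site that missed its dereference.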

> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c |  4 
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c   |  7 ++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h   | 15 +++
> >  3 files changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > index e7e87a3b2601..decbbe3d4f06 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > @@ -42,6 +42,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include "amdgpu_trace.h"
> >
> >  /**
> >   * amdgpu_dma_buf_attach - &dma_buf_ops.attach implementation @@
> -
> > 63,6 +64,7 @@ static int amdgpu_dma_buf_attach(struct dma_buf
> *dmabuf,
> >   attach->peer2peer = false;
> >
> >   r = pm_runtime_get_sync(adev_to_drm(adev)->dev);
> > + trace_amdgpu_runpm_reference_dumps(1, __func__);
> >   if (r < 0)
> >   goto out;
> >
> > @@ -70,6 +72,7 @@ static int amdgpu_dma_buf_attach(struct dma_buf
> > *dmabuf,
> >
> >  out:
> >   pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> > + trace_amdgpu_runpm_reference_dumps(0, __func__);
> >   return r;
> >  }
> >
> > @@ -90,6 +93,7 @@ static void amdgpu_dma_buf_detach(struct dma_buf
> > *dmabuf,
> >
> >   pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
> >   pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> > + trace_amdgpu_runpm_reference_dumps(0, __func__);
> >  }
> >
> >  /**
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 709a2c1b9d63..1026a9fa0c0f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -183,6 +183,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring,
> > struct dma_fence **f, struct amd
> >   amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
> >  seq, flags | AMDGPU_FENCE_FLAG_INT);
> >   pm_runtime_get_noresume(adev_to_drm(adev)->dev);
> > + trace_amdgpu_runpm_reference_dumps(1, __func__);
> >   ptr = &ring->fence_drv.fences[seq & ring-
> > >fence_drv.num_fences_mask];
> >   if (unlikely(rcu_dereference_protected(*ptr, 1))) {
> >   struct dma_fence *old;
> > @@ -286,8 +287,11 @@ bool amdgpu_fence_process(struct amdgpu_ring
> > *ring)
> >   seq != ring->fence_drv.sync_seq)
> >   amdgpu_fence_schedule_fallback(ring);
> >
> > - if (unlikely(seq == last_seq))
> > + if (unlikely(seq == last_seq)) {
> > + pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> > + trace_amdgpu_runpm_reference_dumps(0, __func__);
> >   return false;
> > + }
> >
> >   last_seq &= drv->num_fences_mask;
> >   seq &= drv->num_fences_mask;
> > @@ -310,6 +314,7 @@ bool amdgpu_fence_process(struct amdgpu_ring
>

RE: [PATCH] drm/amdgpu: only save and restore GPU device config at GPU error

2023-07-27 Thread Liang, Prike
[Public]

Hi, Lijo

> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, July 27, 2023 6:49 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu: only save and restore GPU device config
> at GPU error
>
>
>
> On 7/27/2023 3:20 PM, Prike Liang wrote:
> > There's need a check on the GPU error state before save and restore
> > GPU device config space.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 358dcc1070c5..5ef3c5c49bee 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3946,7 +3946,8 @@ int amdgpu_device_init(struct amdgpu_device
> *adev,
> > dev_err(adev->dev, "amdgpu_pmu_init failed\n");
> >
> > /* Have stored pci confspace at hand for restore in sudden PCI error
> */
> > -   if (amdgpu_device_cache_pci_state(adev->pdev))
> > +   if (adev->pdev->error_state != pci_channel_io_normal &&
> > +   amdgpu_device_cache_pci_state(adev->pdev))
>
> We need the clean state to be cached, not the state when there is an error.
> This state is later used to restore later, say when a mode-2 reset happens.
>
> Thanks,
> Lijo
>
But the original code in amdgpu_device_init() is trying to restore the GPU 
config space immediately after it is saved, so
it looks like this saved GPU state is restored unconditionally, not only for 
the GPU error case. Meanwhile, for mode-2 or some other reset cases, the reset
function saves and restores the GPU state separately and doesn't use the GPU 
state saved in the amdgpu_device_init() process. In amdgpu_device_init(), how
about restoring the GPU state only for the GPU error case?

> > pci_restore_state(pdev);
> >
> > /* if we have > 1 VGA cards, then disable the amdgpu VGA resources
> > */


RE: [PATCH] drm/amdgpu: enable SDMA MGCG for SDMA 5.2.x

2023-07-25 Thread Liang, Prike
[AMD Official Use Only - General]

Thanks Alex for the input. Yes, it requires a check on the SDMA firmware version to 
handle driver packages already released in the wild, and I will update this in a new patch.

Regards,
--Prike

> -Original Message-
> From: Alex Deucher 
> Sent: Tuesday, July 25, 2023 9:16 PM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Guo, Shikai 
> Subject: Re: [PATCH] drm/amdgpu: enable SDMA MGCG for SDMA 5.2.x
>
> On Tue, Jul 25, 2023 at 5:20 AM Prike Liang  wrote:
> >
> > Now the SDMA firmware support SDMA MGCG properly, so let's enable it
> > from the driver side.
>
> If this is only supported on certain firmware versions, do we need a version
> check?
>
> Alex
>
>
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/nv.c| 6 --
> >  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 1 +
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
> > b/drivers/gpu/drm/amd/amdgpu/nv.c index 6853b93ac82e..9bf7872e260d
> > 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> > @@ -901,7 +901,8 @@ static int nv_common_early_init(void *handle)
> > AMD_CG_SUPPORT_ATHUB_LS |
> > AMD_CG_SUPPORT_IH_CG |
> > AMD_CG_SUPPORT_VCN_MGCG |
> > -   AMD_CG_SUPPORT_JPEG_MGCG;
> > +   AMD_CG_SUPPORT_JPEG_MGCG |
> > +   AMD_CG_SUPPORT_SDMA_MGCG;
> > adev->pg_flags = AMD_PG_SUPPORT_GFX_PG |
> > AMD_PG_SUPPORT_VCN |
> > AMD_PG_SUPPORT_VCN_DPG | @@ -962,7 +963,8 @@
> > static int nv_common_early_init(void *handle)
> > AMD_CG_SUPPORT_ATHUB_LS |
> > AMD_CG_SUPPORT_IH_CG |
> > AMD_CG_SUPPORT_VCN_MGCG |
> > -   AMD_CG_SUPPORT_JPEG_MGCG;
> > +   AMD_CG_SUPPORT_JPEG_MGCG |
> > +   AMD_CG_SUPPORT_SDMA_MGCG;
> > adev->pg_flags = AMD_PG_SUPPORT_VCN |
> > AMD_PG_SUPPORT_VCN_DPG |
> > AMD_PG_SUPPORT_JPEG | diff --git
> > a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > index 809eca54fc61..f8b6a2637d1d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > @@ -1653,6 +1653,7 @@ static int sdma_v5_2_set_clockgating_state(void
> *handle,
> > case IP_VERSION(5, 2, 5):
> > case IP_VERSION(5, 2, 6):
> > case IP_VERSION(5, 2, 3):
> > +   case IP_VERSION(5, 2, 7):
> > sdma_v5_2_update_medium_grain_clock_gating(adev,
> > state == AMD_CG_STATE_GATE);
> > sdma_v5_2_update_medium_grain_light_sleep(adev,
> > --
> > 2.34.1
> >


RE: drm/amdgpu: skip kfd-iommu suspend/resume for S0ix

2023-04-12 Thread Liang, Prike
[AMD Official Use Only - General]

Thanks for sorting this out. That makes sense, since the driver already skips KFD 
device suspend and this will skip kfd_iommu_suspend() as well.

Reviewed-by: Prike Liang 

Regards,
--Prike

> -Original Message-
> From: Limonciello, Mario 
> Sent: Thursday, April 13, 2023 9:25 AM
> To: Liu, Aaron ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Liang, Prike 
> Subject: Re: drm/amdgpu: skip kfd-iommu suspend/resume for S0ix
>
> On 4/5/2023 06:29, Aaron Liu wrote:
> > GFX is in gfxoff mode during s0ix so we shouldn't need to actually
> > execute kfd_iommu_suspend/kfd_iommu_resume operation.
> >
> > Signed-off-by: Aaron Liu 
> > Acked-by: Alex Deucher 
> Reviewed-by: Mario Limonciello 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +---
> >   1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 3b6b85d9e0be..5094be94fa06 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3304,9 +3304,11 @@ static int amdgpu_device_ip_resume(struct
> amdgpu_device *adev)
> >   {
> > int r;
> >
> > -   r = amdgpu_amdkfd_resume_iommu(adev);
> > -   if (r)
> > -   return r;
> > +   if (!adev->in_s0ix) {
> > +   r = amdgpu_amdkfd_resume_iommu(adev);
> > +   if (r)
> > +   return r;
> > +   }
> >
> > r = amdgpu_device_ip_resume_phase1(adev);
> > if (r)



RE: [PATCH] drm/amdgpu/sdma_v4_0: turn off SDMA ring buffer in the s2idle suspend

2022-11-30 Thread Liang, Prike
[Public]

-Original Message-
From: Lazar, Lijo 
Sent: Thursday, December 1, 2022 2:39 PM
To: Liang, Prike ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Limonciello, Mario 
; sta...@vger.kernel.org
Subject: Re: [PATCH] drm/amdgpu/sdma_v4_0: turn off SDMA ring buffer in the 
s2idle suspend



On 12/1/2022 11:52 AM, Prike Liang wrote:
> The SDMA s0ix save process requires turning off the SDMA ring buffer to
> avoid in-flight SDMA requests; otherwise, an SDMA page fault will occur,
> caused by page requests from the in-flight SDMA ring access during the
> SDMA restore phase.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2248
> Cc: sta...@vger.kernel.org # 6.0
> Fixes: f8f4e2a51834 ("drm/amdgpu: skipping SDMA hw_init and hw_fini
> for S0ix.")
>
> Signed-off-by: Prike Liang 
> ---
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 18 ++++++++++++------
>   1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index 1122bd4eae98..2b9fe9f00343 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -913,7 +913,7 @@ static void sdma_v4_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr, u64 se
>*
>* Stop the gfx async dma ring buffers (VEGA10).
>*/
> -static void sdma_v4_0_gfx_stop(struct amdgpu_device *adev)
> +static void sdma_v4_0_gfx_stop(struct amdgpu_device *adev, bool stop)

Better to rename as sdma_v4_0_gfx_enable(struct amdgpu_device *adev, bool 
enable).

Thanks,
Lijo

Ah, before this version I did use the sdma_v4_0_gfx_enable() name in the first 
draft, but chose sdma_v4_0_gfx_stop() to reuse the function name and comment 
info during the patch cleanup.
AFAICS, the _enable() name may match the function's job better, since it sets 
the SDMA ring enable bit. Is there any other reason to change the name here?

>   {
>   u32 rb_cntl, ib_cntl;
>   int i;
> @@ -922,10 +922,10 @@ static void sdma_v4_0_gfx_stop(struct amdgpu_device *adev)
>
>   for (i = 0; i < adev->sdma.num_instances; i++) {
>   rb_cntl = RREG32_SDMA(i, mmSDMA0_GFX_RB_CNTL);
> - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RB_ENABLE, 0);
> + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RB_ENABLE, stop ? 0 : 1);
>   WREG32_SDMA(i, mmSDMA0_GFX_RB_CNTL, rb_cntl);
>   ib_cntl = RREG32_SDMA(i, mmSDMA0_GFX_IB_CNTL);
> - ib_cntl = REG_SET_FIELD(ib_cntl, SDMA0_GFX_IB_CNTL, IB_ENABLE, 0);
> + ib_cntl = REG_SET_FIELD(ib_cntl, SDMA0_GFX_IB_CNTL, IB_ENABLE, stop ? 0 : 1);
>   WREG32_SDMA(i, mmSDMA0_GFX_IB_CNTL, ib_cntl);
>   }
>   }
> @@ -1044,7 +1044,7 @@ static void sdma_v4_0_enable(struct amdgpu_device *adev, bool enable)
>   int i;
>
>   if (!enable) {
> - sdma_v4_0_gfx_stop(adev);
> + sdma_v4_0_gfx_stop(adev, true);
>   sdma_v4_0_rlc_stop(adev);
>   if (adev->sdma.has_page_queue)
>   sdma_v4_0_page_stop(adev);
> @@ -1960,8 +1960,10 @@ static int sdma_v4_0_suspend(void *handle)
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
>   /* SMU saves SDMA state for us */
> - if (adev->in_s0ix)
> + if (adev->in_s0ix) {
> + sdma_v4_0_gfx_stop(adev, true);
>   return 0;
> + }
>
>   return sdma_v4_0_hw_fini(adev);
>   }
> @@ -1971,8 +1973,12 @@ static int sdma_v4_0_resume(void *handle)
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
>   /* SMU restores SDMA state for us */
> - if (adev->in_s0ix)
> + if (adev->in_s0ix) {
> + sdma_v4_0_enable(adev, true);
> + sdma_v4_0_gfx_stop(adev, false);
> + amdgpu_ttm_set_buffer_funcs_status(adev, true);
>   return 0;
> + }
>
>   return sdma_v4_0_hw_init(adev);
>   }
<>

RE: [PATCH v2] drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume

2022-10-24 Thread Liang, Prike
[Public]

-Original Message-
From: Quan, Evan 
Sent: Monday, October 24, 2022 2:31 PM
To: Liang, Prike ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Liang, Prike 

Subject: RE: [PATCH v2] drm/amdgpu: disallow gfxoff until GC IP blocks complete 
s2idle resume

[AMD Official Use Only - General]



> -Original Message-
> From: amd-gfx  On Behalf Of
> Prike Liang
> Sent: Friday, October 21, 2022 10:47 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> 
> Subject: [PATCH v2] drm/amdgpu: disallow gfxoff until GC IP blocks
> complete s2idle resume
>
> In the s2idle suspend/resume phase, gfxoff remains functional, so some
> IP blocks are likely to reinitialize at gfxoff entry, which will result
> in failing to program GC registers. Therefore, disallow gfxoff until the
> AMDGPU IP blocks have completely reinitialized.
[Quan, Evan] It seems the issue described here is not related to suspend. 
Instead, it happens during resume only. Right?
If so, I would suggest dropping the confusing "suspend" from the description.
Other than that, the patch is reviewed-by: Evan Quan 

Evan

[Prike] Yes, this problem is only observed during the s2idle resume period, but 
it's true that s2idle suspend and resume keep gfxoff functional.

>
> Signed-off-by: Prike Liang 
> ---
> -v2: Move the operation of exiting gfxoff from smu to a higher layer in
> amdgpu_device.c.
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5b8362727226..36c44625932e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3210,6 +3210,12 @@ static int amdgpu_device_ip_resume_phase2(struct amdgpu_device *adev)
>   return r;
>   }
>   adev->ip_blocks[i].status.hw = true;
> +
> + if (adev->in_s0ix && adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_SMC) {
> + amdgpu_gfx_off_ctrl(adev, false);
> + DRM_DEBUG("will disable gfxoff for re-initializing other blocks\n");
> + }
> +
>   }
>
>   return 0;
> @@ -4185,6 +4191,10 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
>   /* Make sure IB tests flushed */
>   flush_delayed_work(&adev->delayed_init_work);
>
> + if (adev->in_s0ix) {
> + amdgpu_gfx_off_ctrl(adev, true);
> + DRM_DEBUG("will enable gfxoff for the mission mode\n");
> + }
>   if (fbcon)
>
> drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, false);
>
> --
> 2.25.1


RE: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for property asic

2022-10-24 Thread Liang, Prike
[Public]

-Original Message-
From: Kuehling, Felix 
Sent: Saturday, October 22, 2022 4:53 AM
To: Liang, Prike ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan 
; Huang, Ray ; Liu, Aaron 

Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for property 
asic

On 2022-10-21 09:05, Liang, Prike wrote:
> [Public]
>
> -Original Message-
> From: Kuehling, Felix 
> Sent: Friday, October 21, 2022 1:11 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Huang, Ray ; Liu, Aaron
> 
> Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for
> property asic
>
> Am 2022-10-20 um 21:50 schrieb Liang, Prike:
>> [Public]
>>
>> -Original Message-
>> From: Kuehling, Felix 
>> Sent: Friday, October 21, 2022 12:03 AM
>> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander ; Zhang, Yifan
>> ; Huang, Ray ; Liu, Aaron
>> 
>> Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for
>> property asic
>>
>>
>> Am 2022-10-20 um 05:15 schrieb Prike Liang:
>>> This dummy cache info will enable kfd base function support.
>>>
>>> Signed-off-by: Prike Liang 
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 55 +--
>>> 1 file changed, 52 insertions(+), 3 deletions(-)
>>>
[snip]
>>> @@ -1528,7 +1574,10 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>> 
>>> kfd_fill_gpu_cache_info_from_gfx_config(kdev, pcache_info);
>>> break;
>>> default:
>>> - return -EINVAL;
>>> + pcache_info = dummy_cache_info;
>>> + num_of_cache_types = ARRAY_SIZE(dummy_cache_info);
>>> + pr_warn("dummy cache info is used temporarily and real cache info need update later.\n");
>>> + break;
>> Could we make this return an error if the amdgpu.exp_hw_support module 
>> parameter is not set?
>>
>> Regards,
>>  Felix
>>
>> [Prike] It's fine to protect this dummy info by checking the
>> amdgpu_exp_hw_support parameter, but adding a parameter may not be
>> friendly to end users, and some of them will still report KFD not being
>> enabled because of the missing parameter setting. The original idea was
>> that the end user would not be aware of the dummy cache info, and the
>> warning message would only alert developers.
> I thought the intention was to simplify bring-up but make sure that valid 
> cache info is available by the time a chip goes into production.
> Therefore, normal end users should never need to set the 
> amdgpu_exp_hw_support option. I think you're saying that we would go to 
> production with dummy info. That seems like a bad idea to me.
>
> Regards,
> Felix
>
> [Prike] Sorry for the confusion. In fact, this dummy cache info is only
> used before production, and meanwhile it doesn't require any parameter
> setting for CQE to run the KFD test. Anyway, if you still have concerns
> about this solution, I will add the check for amdgpu_exp_hw_support.

The idea to control this with a module parameter was to cause a more obvious 
failure if we don't have correct cache info before going to production. Just a 
warning in the log file is too easy to miss or ignore. Of course, if QA gets 
into the habit of testing with amdgpu_exp_hw_support, then this may not solve 
the problem. You need to have at least one test pass without 
amdgpu_exp_hw_support to catch missing cache info.

Regards,
   Felix

Got your point. KFD support on an NPI will be tracked by a ticket, which makes 
sure the real cache info is updated later, after the dummy cache info is 
assigned in the early bring-up phase.

Thanks,
Prike


RE: [PATCH] drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume

2022-10-21 Thread Liang, Prike
[Public]

-Original Message-
From: Alex Deucher 
Sent: Friday, October 21, 2022 9:39 PM
To: Liang, Prike 
Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; Huang, Ray 
Subject: Re: [PATCH] drm/amdgpu: disallow gfxoff until GC IP blocks complete 
s2idle resume

On Thu, Oct 20, 2022 at 10:30 PM Prike Liang  wrote:
>
> In the s2idle suspend/resume phase, gfxoff remains functional, so some
> IP blocks are likely to reinitialize at gfxoff entry, which will result
> in failing to program GC registers. Therefore, disallow gfxoff until the
> AMDGPU IP blocks have completely reinitialized.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  | 5 +++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e0445e8cc342..1624ed15fbc4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4185,6 +4185,10 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> /* Make sure IB tests flushed */
> flush_delayed_work(&adev->delayed_init_work);
>
> +   if (adev->in_s0ix) {
> +   amdgpu_gfx_off_ctrl(adev, true);
> +   DRM_DEBUG("will enable gfxoff for the mission mode\n");
> +   }
> if (fbcon)
>
> drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper,
> false);
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index 4fe75dd2b329..3948dc5b1d6a 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -1664,6 +1664,11 @@ static int smu_resume(void *handle)
>
> dev_info(adev->dev, "SMU is resumed successfully!\n");
>
> +   if (adev->in_s0ix) {
> +   amdgpu_gfx_off_ctrl(adev, false);
> +   dev_dbg(adev->dev, "will disable gfxoff for re-initializing other blocks\n");
> +   }
> +

I think it would be better to put this in amdgpu_device.c so it's clear where 
its match is.  Also for raven based boards this will get missed because they 
still use the powerplay power code.

Alex

[Prike] I missed that amdgpu_gfx_off_ctrl() is a generic upper-layer function, 
and it should also work on the Raven series, since Raven uses another message, 
PPSMC_MSG_GpuChangeState, to exit gfxoff. But it makes sense to unify the 
operation of exiting gfxoff at the upper layer in amdgpu_device.c, and I will 
update it in patch v2.

Thanks,
Prike
> return 0;
>  }
>
> --
> 2.25.1
>


RE: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for property asic

2022-10-21 Thread Liang, Prike
[Public]

-Original Message-
From: Kuehling, Felix 
Sent: Friday, October 21, 2022 1:11 PM
To: Liang, Prike ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan 
; Huang, Ray ; Liu, Aaron 

Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for property 
asic

Am 2022-10-20 um 21:50 schrieb Liang, Prike:
> [Public]
>
> -Original Message-
> From: Kuehling, Felix 
> Sent: Friday, October 21, 2022 12:03 AM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Huang, Ray ; Liu, Aaron
> 
> Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for
> property asic
>
>
> Am 2022-10-20 um 05:15 schrieb Prike Liang:
>> This dummy cache info will enable kfd base function support.
>>
>> Signed-off-by: Prike Liang 
>> ---
>>drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 55 +--
>>1 file changed, 52 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> index cd5f8b219bf9..960046e43b7a 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> @@ -795,6 +795,54 @@ static struct kfd_gpu_cache_info yellow_carp_cache_info[] = {
>>},
>>};
>>
>> +static struct kfd_gpu_cache_info dummy_cache_info[] = {
>> + {
>> + /* TCP L1 Cache per CU */
>> + .cache_size = 16,
>> + .cache_level = 1,
>> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
>> + CRAT_CACHE_FLAGS_DATA_CACHE |
>> + CRAT_CACHE_FLAGS_SIMD_CACHE),
>> + .num_cu_shared = 1,
>> + },
>> + {
>> + /* Scalar L1 Instruction Cache per SQC */
>> + .cache_size = 32,
>> + .cache_level = 1,
>> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
>> + CRAT_CACHE_FLAGS_INST_CACHE |
>> + CRAT_CACHE_FLAGS_SIMD_CACHE),
>> + .num_cu_shared = 2,
>> + },
>> + {
>> + /* Scalar L1 Data Cache per SQC */
>> + .cache_size = 16,
>> + .cache_level = 1,
>> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
>> + CRAT_CACHE_FLAGS_DATA_CACHE |
>> + CRAT_CACHE_FLAGS_SIMD_CACHE),
>> + .num_cu_shared = 2,
>> + },
>> + {
>> + /* GL1 Data Cache per SA */
>> + .cache_size = 128,
>> + .cache_level = 1,
>> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
>> + CRAT_CACHE_FLAGS_DATA_CACHE |
>> + CRAT_CACHE_FLAGS_SIMD_CACHE),
>> + .num_cu_shared = 6,
>> + },
>> + {
>> + /* L2 Data Cache per GPU (Total Tex Cache) */
>> + .cache_size = 2048,
>> + .cache_level = 2,
>> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
>> + CRAT_CACHE_FLAGS_DATA_CACHE |
>> + CRAT_CACHE_FLAGS_SIMD_CACHE),
>> + .num_cu_shared = 6,
>> + },
>> +};
>> +
>>static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
>>struct crat_subtype_computeunit *cu)
>>{
>> @@ -1514,8 +1562,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>num_of_cache_types = 
>> ARRAY_SIZE(beige_goby_cache_info);
>>break;
>>case IP_VERSION(10, 3, 3):
>> - case IP_VERSION(10, 3, 6): /* TODO: Double check these on production silicon */
>> - case IP_VERSION(10, 3, 7): /* TODO: Double check these on production silicon */
>>pcache_info = yellow_carp_cache_info;
>>num_of_cache_types = 
>> ARRAY_SIZE(yellow_carp_cache_info);
>>break;
>> @@ -1528,7 +1574,10 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>kfd_fill_gpu_cache_info_from_gfx_config(kdev, 
>> pcache_info);
>>break;
>>default:
>> - return -EINVAL;
>> + pcache_info = dummy_cache_info;
>> + num_of_cache_types = ARRAY_SIZE(dummy_cache_info);
>> +  

RE: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for property asic

2022-10-20 Thread Liang, Prike
[Public]

-Original Message-
From: Kuehling, Felix 
Sent: Friday, October 21, 2022 12:03 AM
To: Liang, Prike ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan 
; Huang, Ray ; Liu, Aaron 

Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for property 
asic


Am 2022-10-20 um 05:15 schrieb Prike Liang:
> This dummy cache info will enable kfd base function support.
>
> Signed-off-by: Prike Liang 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 55 +--
>   1 file changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index cd5f8b219bf9..960046e43b7a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -795,6 +795,54 @@ static struct kfd_gpu_cache_info yellow_carp_cache_info[] = {
>   },
>   };
>
> +static struct kfd_gpu_cache_info dummy_cache_info[] = {
> + {
> + /* TCP L1 Cache per CU */
> + .cache_size = 16,
> + .cache_level = 1,
> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
> + CRAT_CACHE_FLAGS_DATA_CACHE |
> + CRAT_CACHE_FLAGS_SIMD_CACHE),
> + .num_cu_shared = 1,
> + },
> + {
> + /* Scalar L1 Instruction Cache per SQC */
> + .cache_size = 32,
> + .cache_level = 1,
> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
> + CRAT_CACHE_FLAGS_INST_CACHE |
> + CRAT_CACHE_FLAGS_SIMD_CACHE),
> + .num_cu_shared = 2,
> + },
> + {
> + /* Scalar L1 Data Cache per SQC */
> + .cache_size = 16,
> + .cache_level = 1,
> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
> + CRAT_CACHE_FLAGS_DATA_CACHE |
> + CRAT_CACHE_FLAGS_SIMD_CACHE),
> + .num_cu_shared = 2,
> + },
> + {
> + /* GL1 Data Cache per SA */
> + .cache_size = 128,
> + .cache_level = 1,
> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
> + CRAT_CACHE_FLAGS_DATA_CACHE |
> + CRAT_CACHE_FLAGS_SIMD_CACHE),
> + .num_cu_shared = 6,
> + },
> + {
> + /* L2 Data Cache per GPU (Total Tex Cache) */
> + .cache_size = 2048,
> + .cache_level = 2,
> + .flags = (CRAT_CACHE_FLAGS_ENABLED |
> + CRAT_CACHE_FLAGS_DATA_CACHE |
> + CRAT_CACHE_FLAGS_SIMD_CACHE),
> + .num_cu_shared = 6,
> + },
> +};
> +
>   static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
>   struct crat_subtype_computeunit *cu)
>   {
> @@ -1514,8 +1562,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>   num_of_cache_types = ARRAY_SIZE(beige_goby_cache_info);
>   break;
>   case IP_VERSION(10, 3, 3):
> - case IP_VERSION(10, 3, 6): /* TODO: Double check these on production silicon */
> - case IP_VERSION(10, 3, 7): /* TODO: Double check these on production silicon */
>   pcache_info = yellow_carp_cache_info;
>   num_of_cache_types = ARRAY_SIZE(yellow_carp_cache_info);
>   break;
> @@ -1528,7 +1574,10 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>   kfd_fill_gpu_cache_info_from_gfx_config(kdev, 
> pcache_info);
>   break;
>   default:
> - return -EINVAL;
> + pcache_info = dummy_cache_info;
> + num_of_cache_types = ARRAY_SIZE(dummy_cache_info);
> + pr_warn("dummy cache info is used temporarily and real cache info need update later.\n");
> + break;

Could we make this return an error if the amdgpu.exp_hw_support module 
parameter is not set?

Regards,
   Felix

[Prike] It's fine to protect this dummy info by checking the 
amdgpu_exp_hw_support parameter, but adding a parameter may not be friendly to 
end users, and some of them will still report KFD not being enabled because of 
the missing parameter setting. The original idea was that the end user would 
not be aware of the dummy cache info, and the warning message would only alert 
developers.

>   }
>   }
>


RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp

2022-09-04 Thread Liang, Prike
If we're not sure whether the discovery table can provide the cache info for 
the upcoming NPI ASICs, maybe we need the dummy cache info in the default 
clause, rather than adding more and more entries that fake a specific HW 
configuration and may sometimes give misleading HW info. 


Regards,
--Prike

-Original Message-
From: Zhang, Yifan  
Sent: Monday, September 5, 2022 9:47 AM
To: Liang, Prike ; Liu, Aaron ; Huang, 
Tim ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Du, Xiaojian 
; Huang, Ray ; Kuehling, Felix 

Subject: RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp

[Public]

It is actually a bug in the discovery table which needs to be identified in the 
NPI phase. Hopefully we will need neither dummy nor Yellow Carp cache info in 
the future.

Best Regards,
Yifan

-Original Message-
From: Liang, Prike 
Sent: Monday, September 5, 2022 9:24 AM
To: Zhang, Yifan ; Liu, Aaron ; Huang, 
Tim ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Du, Xiaojian 
; Huang, Ray ; Kuehling, Felix 

Subject: RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp

How about adding dummy cache info for the NPI product in the default case and 
notifying the user that the dummy cache configuration is being used, to make 
sure we don't miss correcting the HW info in the future? 


Regards,
--Prike

-Original Message-
From: amd-gfx  On Behalf Of Zhang, Yifan
Sent: Friday, September 2, 2022 10:28 AM
To: Liu, Aaron ; Huang, Tim ; 
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Du, Xiaojian 

Subject: RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp

[Public]

Hi Aaron,

Yes, the cache info is different. But this diff is an intentional workaround 
for the discovery table lacking cache info on GC 11.0.1. The workaround will be 
removed after the discovery table finishes integrating cache info. Given they 
already have a test version, we can hold this patch until they finish the 
integration.

Best Regards,
Yifan

-Original Message-
From: Liu, Aaron 
Sent: Friday, September 2, 2022 9:44 AM
To: Huang, Tim ; Zhang, Yifan ; 
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Du, Xiaojian 

Subject: RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp

[Public]

Hi Yifan,

Yellow Carp's cache info cannot be duplicated for GC_11_0_1.

Differences for GC_11_0_1:
TCP L1 cache size is 32
GL1 data cache size per SA is 256

Others look good to me.

--
Best Regards
Aaron Liu

> -Original Message-
> From: amd-gfx  On Behalf Of 
> Huang, Tim
> Sent: Friday, September 2, 2022 6:44 AM
> To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Du, Xiaojian 
> 
> Subject: RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow 
> carp
> 
> [Public]
> 
> [Public]
> 
> Reviewed-by: Tim Huang 
> 
> Best Regards,
> Tim Huang
> 
> 
> 
> -Original Message-
> From: Zhang, Yifan 
> Sent: Thursday, September 1, 2022 3:30 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Tim 
> ; Du, Xiaojian ; Zhang, Yifan 
> 
> Subject: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp
> 
> The current discovery table doesn't have cache info for GC 11.0.1, so it
> can't be parsed like the other GC 11 parts; this patch matches the GC
> 11.0.1 cache info to Yellow Carp.
> 
> Signed-off-by: Yifan Zhang 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 24b414cff3ec..1c500bfb0b28 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -1516,11 +1516,11 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> case IP_VERSION(10, 3, 3):
> case IP_VERSION(10, 3, 6): /* TODO: Double check these on production silicon */
> case IP_VERSION(10, 3, 7): /* TODO: Double check these on production silicon */
> +   case IP_VERSION(11, 0, 1): /* TODO: Double check these on production silicon */
> pcache_info = yellow_carp_cache_info;
> num_of_cache_types = 
> ARRAY_SIZE(yellow_carp_cache_info);
> break;
> case IP_VERSION(11, 0, 0):
> -   case IP_VERSION(11, 0, 1):
> case IP_VERSION(11, 0, 2):
> case IP_VERSION(11, 0, 3):
> pcache_info = cache_info;
> --
> 2.37.1


RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp

2022-09-04 Thread Liang, Prike
How about adding dummy cache info for the NPI product in the default case and 
notifying the user that the dummy cache configuration is being used, to make 
sure we don't miss correcting the HW info in the future? 


Regards,
--Prike

-Original Message-
From: amd-gfx  On Behalf Of Zhang, Yifan
Sent: Friday, September 2, 2022 10:28 AM
To: Liu, Aaron ; Huang, Tim ; 
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Du, Xiaojian 

Subject: RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp

[Public]

Hi Aaron,

Yes, the cache info is different. But this diff is an intentional workaround 
for the discovery table lacking cache info on GC 11.0.1. The workaround will be 
removed after the discovery table finishes integrating cache info. Given they 
already have a test version, we can hold this patch until they finish the 
integration.

Best Regards,
Yifan

-Original Message-
From: Liu, Aaron 
Sent: Friday, September 2, 2022 9:44 AM
To: Huang, Tim ; Zhang, Yifan ; 
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Du, Xiaojian 

Subject: RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp

[Public]

Hi Yifan,

Yellow Carp's cache info cannot be duplicated for GC_11_0_1.

Differences for GC_11_0_1:
TCP L1 cache size is 32
GL1 data cache size per SA is 256

Others look good to me.

--
Best Regards
Aaron Liu

> -Original Message-
> From: amd-gfx  On Behalf Of 
> Huang, Tim
> Sent: Friday, September 2, 2022 6:44 AM
> To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Du, Xiaojian 
> 
> Subject: RE: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow 
> carp
> 
> [Public]
> 
> [Public]
> 
> Reviewed-by: Tim Huang 
> 
> Best Regards,
> Tim Huang
> 
> 
> 
> -Original Message-
> From: Zhang, Yifan 
> Sent: Thursday, September 1, 2022 3:30 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Tim 
> ; Du, Xiaojian ; Zhang, Yifan 
> 
> Subject: [PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp
> 
> The current discovery table doesn't have cache info for GC 11.0.1, so it
> can't be parsed like the other GC 11 parts; this patch matches the GC
> 11.0.1 cache info to Yellow Carp.
> 
> Signed-off-by: Yifan Zhang 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 24b414cff3ec..1c500bfb0b28 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -1516,11 +1516,11 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> case IP_VERSION(10, 3, 3):
> case IP_VERSION(10, 3, 6): /* TODO: Double check these on production silicon */
> case IP_VERSION(10, 3, 7): /* TODO: Double check these on production silicon */
> +   case IP_VERSION(11, 0, 1): /* TODO: Double check these on production silicon */
> pcache_info = yellow_carp_cache_info;
> num_of_cache_types = 
> ARRAY_SIZE(yellow_carp_cache_info);
> break;
> case IP_VERSION(11, 0, 0):
> -   case IP_VERSION(11, 0, 1):
> case IP_VERSION(11, 0, 2):
> case IP_VERSION(11, 0, 3):
> pcache_info = cache_info;
> --
> 2.37.1


RE: [PATCH] drm/amdkfd: Fix isa version for the GC 10.3.7

2022-08-25 Thread Liang, Prike


Thanks, it makes sense; I will refine the code before pushing. 


Regards,
--Prike

-Original Message-
From: Liu, Aaron  
Sent: Thursday, August 25, 2022 8:28 AM
To: Liang, Prike ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray 
; Zhang, Yifan ; Limonciello, Mario 

Subject: RE: [PATCH] drm/amdkfd: Fix isa version for the GC 10.3.7

[Public]

Because GC IP versions 10.3.6 and 10.3.7 both use the 1036 ISA version, one 
nit-pick as below. It looks better:
case IP_VERSION(10, 3, 6):
case IP_VERSION(10, 3, 7):
gfx_target_version = 100306;
if (!vf)
f2g = &gfx_v10_3_kfd2kgd;
break;


> -Original Message-
> From: Liu, Aaron
> Sent: Thursday, August 25, 2022 8:15 AM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray 
> ; Zhang, Yifan ; Limonciello, 
> Mario 
> Subject: RE: [PATCH] drm/amdkfd: Fix isa version for the GC 10.3.7
> 
> [Public]
> 
> Reviewed-by: Aaron Liu 
> 
> > -Original Message-
> > From: Liang, Prike 
> > Sent: Wednesday, August 24, 2022 8:40 PM
> > To: Liang, Prike ; 
> > amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Huang, Ray 
> > ; Zhang, Yifan ; Liu,
> Aaron
> > ; Limonciello, Mario 
> > Subject: RE: [PATCH] drm/amdkfd: Fix isa version for the GC 10.3.7
> >
> > [Public]
> >
> > Add more for the review and awareness.
> >
> > Regards,
> > --Prike
> >
> > -Original Message-
> > From: Prike Liang 
> > Sent: Wednesday, August 24, 2022 2:41 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Huang, Ray 
> > ; Zhang, Yifan ; Liang,
> Prike
> > 
> > Subject: [PATCH] drm/amdkfd: Fix isa version for the GC 10.3.7
> >
> > Correct the isa version for handling KFD test.
> >
> > Fixes: 7c4f4f197e0c ("drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD
> > definitions")
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > index fdad1415f8bd..5ebbeac61379 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > @@ -388,7 +388,7 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf)
> > f2g = &gfx_v10_3_kfd2kgd;
> > break;
> > case IP_VERSION(10, 3, 7):
> > -   gfx_target_version = 100307;
> > +   gfx_target_version = 100306;
> > if (!vf)
> > f2g = &gfx_v10_3_kfd2kgd;
> > break;
> > --
> > 2.25.1
> >


RE: [PATCH] drm/amdkfd: Fix isa version for the GC 10.3.7

2022-08-24 Thread Liang, Prike
[Public]

Add more for the review and awareness.

Regards,
--Prike

-Original Message-
From: Prike Liang 
Sent: Wednesday, August 24, 2022 2:41 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray 
; Zhang, Yifan ; Liang, Prike 

Subject: [PATCH] drm/amdkfd: Fix isa version for the GC 10.3.7

Correct the isa version for handling KFD test.

Fixes: 7c4f4f197e0c ("drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD definitions")
Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index fdad1415f8bd..5ebbeac61379 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -388,7 +388,7 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf)
f2g = &gfx_v10_3_kfd2kgd;
break;
case IP_VERSION(10, 3, 7):
-   gfx_target_version = 100307;
+   gfx_target_version = 100306;
if (!vf)
f2g = &gfx_v10_3_kfd2kgd;
break;
--
2.25.1



RE: [PATCH v2] drm/amd/pm: Add get_gfx_off_status interface for yellow carp

2022-07-26 Thread Liang, Prike
[Public]

Reviewed-by: Prike Liang 

> -Original Message-
> From: amd-gfx  On Behalf Of
> shikai@amd.com
> Sent: Tuesday, July 26, 2022 2:29 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> ; Quan, Evan ; Liu, Aaron
> 
> Subject: [PATCH v2] drm/amd/pm: Add get_gfx_off_status interface for
> yellow carp
>
> From: Shikai Guo 
>
> add get_gfx_off_status interface to yellow_carp_ppt_funcs structure.
>
> Signed-off-by: Shikai Guo 
> ---
>  .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c  | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> index 70cbc46341a3..04e56b0b3033 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> @@ -42,6 +42,11 @@
>  #undef pr_info
>  #undef pr_debug
>
> +#define regSMUIO_GFX_MISC_CNTL 0x00c5
> +#define regSMUIO_GFX_MISC_CNTL_BASE_IDX 0
> +#define SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS_MASK 0x0006L
> +#define SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS__SHIFT  0x1L
> +
>  #define FEATURE_MASK(feature) (1ULL << feature)  #define
> SMC_DPM_FEATURE ( \
>   FEATURE_MASK(FEATURE_CCLK_DPM_BIT) | \ @@ -587,6 +592,31
> @@ static ssize_t yellow_carp_get_gpu_metrics(struct smu_context *smu,
>   return sizeof(struct gpu_metrics_v2_1);  }
>
> +/**
> + * yellow_carp_get_gfxoff_status - get gfxoff status
> + *
> + * @smu: smu_context pointer
> + *
> + * This function will be used to get gfxoff status
> + *
> + * Returns 0=GFXOFF(default).
> + * Returns 1=Transition out of GFX State.
> + * Returns 2=Not in GFXOFF.
> + * Returns 3=Transition into GFXOFF.
> + */
> +static uint32_t yellow_carp_get_gfxoff_status(struct smu_context *smu)
> +{
> + uint32_t reg;
> + uint32_t gfxoff_status = 0;
> + struct amdgpu_device *adev = smu->adev;
> +
> + reg = RREG32_SOC15(SMUIO, 0, regSMUIO_GFX_MISC_CNTL);
> + gfxoff_status = (reg &
> SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS_MASK)
> + >> SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS__SHIFT;
> +
> + return gfxoff_status;
> +}
> +
>  static int yellow_carp_set_default_dpm_tables(struct smu_context *smu)  {
>   struct smu_table_context *smu_table = &smu->smu_table; @@ -
> 1186,6 +1216,7 @@ static const struct pptable_funcs yellow_carp_ppt_funcs
> = {
>   .get_pp_feature_mask = smu_cmn_get_pp_feature_mask,
>   .set_driver_table_location = smu_v13_0_set_driver_table_location,
>   .gfx_off_control = smu_v13_0_gfx_off_control,
> + .get_gfx_off_status = yellow_carp_get_gfxoff_status,
>   .post_init = yellow_carp_post_smu_init,
>   .mode2_reset = yellow_carp_mode2_reset,
>   .get_dpm_ultimate_freq = yellow_carp_get_dpm_ultimate_freq,
> --
> 2.25.1



RE: [PATCH] drm/admdgpu: Add get_gfx_off_status interface

2022-07-21 Thread Liang, Prike
[Public]

It looks like the amdgpu_device pointer declaration is unused; please clean it up.
Also, please correct the patch subject prefix to drm/amd/pm: instead of drm/admdgpu.

With the above fixed, the patch is Reviewed-by: Prike Liang 

> -Original Message-
> From: Guo, Shikai 
> Sent: Thursday, July 21, 2022 2:20 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Quan, Evan ; Deucher, Alexander
> ; Liang, Prike ; Guo,
> Shikai ; Guo, Shikai 
> Subject: [PATCH] drm/admdgpu: Add get_gfx_off_status interface
>
> From: Shikai Guo 
>
> add get_gfx_off_status interface to yellow_carp_ppt_funcs structure.
>
> Signed-off-by: Shikai Guo 
> ---
>  .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c  | 30
> +++
>  1 file changed, 30 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> index 70cbc46341a3..cac48121d72b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> @@ -31,6 +31,7 @@
>  #include "smu_v13_0_1_ppsmc.h"
>  #include "smu_v13_0_1_pmfw.h"
>  #include "smu_cmn.h"
> +#include "asic_reg/smuio/smuio_13_0_2_offset.h"
>
>  /*
>   * DO NOT use these for err/warn/info/debug messages.
> @@ -42,6 +43,9 @@
>  #undef pr_info
>  #undef pr_debug
>
> +#define SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS_MASK0x0006L
> +#define SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS__SHIFT  0x1
> +
>  #define FEATURE_MASK(feature) (1ULL << feature)  #define
> SMC_DPM_FEATURE ( \
>   FEATURE_MASK(FEATURE_CCLK_DPM_BIT) | \ @@ -587,6 +591,31
> @@ static ssize_t yellow_carp_get_gpu_metrics(struct smu_context *smu,
>   return sizeof(struct gpu_metrics_v2_1);  }
>
> +/**
> + * yellow_carp_get_gfxoff_status - get gfxoff status
> + *
> + * @smu: amdgpu_device pointer
> + *
> + * This function will be used to get gfxoff status
> + *
> + * Returns 0=GFXOFF(default).
> + * Returns 1=Transition out of GFX State.
> + * Returns 2=Not in GFXOFF.
> + * Returns 3=Transition into GFXOFF.
> + */
> +static uint32_t yellow_carp_get_gfxoff_status(struct smu_context *smu)
> +{
> + uint32_t reg;
> + uint32_t gfxOff_Status = 0;
> + struct amdgpu_device *adev = smu->adev;
> +
> + reg = RREG32_SOC15(SMUIO, 0, regSMUIO_GFX_MISC_CNTL);
> + gfxOff_Status = (reg &
> SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS_MASK)
> + >> SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS__SHIFT;
> +
> + return gfxOff_Status;
> +}
> +
>  static int yellow_carp_set_default_dpm_tables(struct smu_context *smu)  {
>   struct smu_table_context *smu_table = &smu->smu_table; @@ -
> 1186,6 +1215,7 @@ static const struct pptable_funcs yellow_carp_ppt_funcs
> = {
>   .get_pp_feature_mask = smu_cmn_get_pp_feature_mask,
>   .set_driver_table_location = smu_v13_0_set_driver_table_location,
>   .gfx_off_control = smu_v13_0_gfx_off_control,
> + .get_gfx_off_status = yellow_carp_get_gfxoff_status,
>   .post_init = yellow_carp_post_smu_init,
>   .mode2_reset = yellow_carp_mode2_reset,
>   .get_dpm_ultimate_freq = yellow_carp_get_dpm_ultimate_freq,
> --
> 2.25.1



RE: [PATCH] drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend

2022-04-27 Thread Liang, Prike
[AMD Official Use Only - General]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Wednesday, April 27, 2022 2:33 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> 
> Subject: Re: [PATCH] drm/amdgpu: keep mmhub clock gating being enabled
> during s2idle suspend
>
>
>
> On 4/27/2022 9:44 AM, Liang, Prike wrote:
> > [Public]
> >
> >> -Original Message-
> >> From: Lazar, Lijo 
> >> Sent: Tuesday, April 26, 2022 7:19 PM
> >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> >> Cc: Deucher, Alexander ; Huang, Ray
> >> 
> >> Subject: Re: [PATCH] drm/amdgpu: keep mmhub clock gating being
> >> enabled during s2idle suspend
> >>
> >>
> >>
> >> On 4/25/2022 12:22 PM, Prike Liang wrote:
> >>> Without MMHUB clock gating being enabled then MMHUB will not
> >>> disconnect from DF and will result in DF C-state entry can't be
> >>> accessed during S2idle suspend, and eventually s0ix entry will be blocked.
> >>>
> >>> Signed-off-by: Prike Liang 
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 10 ++
> >>>1 file changed, 10 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >>> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >>> index a455e59f41f4..20946bc7fc93 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >>> @@ -1151,6 +1151,16 @@ static int
> >>> gmc_v10_0_set_clockgating_state(void
> >> *handle,
> >>>  int r;
> >>>  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >>>
> >>> +   /*
> >>> +* The issue mmhub can't disconnect from DF with MMHUB clock
> >> gating being disabled
> >>> +* is a new problem observed at DF 3.0.3, however with the same
> >> suspend sequence not
> >>> +* seen any issue on the DF 3.0.2 series platform.
> >>> +*/
> >>> +   if (adev->in_s0ix && adev->ip_versions[DF_HWIP][0] >
> >>> + IP_VERSION(3,
> >> 0, 2)) {
> >>> +   dev_dbg(adev->dev, "keep mmhub clock gating being
> >> enabled for s0ix\n");
> >>> +   return 0;
> >>> +   }
> >>> +
> >>
> >> This only ignores clock gating requests as long as s0ix flag is set.
> >> As far as I see, s0ix flag is set to true even during resume and set
> >> to false only after resume is completed. Is that intention and is
> >> this tested to be working fine? I suggest to keep this specifically for
> suspend calls.
> >>
> >> Thanks,
> >> Lijo
> >>
> > It reasonable for also not reenable the clock gating on the s2ilde
> > resume since clock gating not disabled on the s2idle suspend.
>
> Generally, the CG setting registers are not in always-on domain and the
> register settings will be lost once it goes down. Not sure about the state of
> this particular IP rail during S0i3 cycle.
>
> If the CG settings are driver-enabled, we reprogram CG settings during
> resume - amdgpu_device_resume->amdgpu_device_ip_late_init ->
> amdgpu_device_set_cg_state. This logic prevents this. Maybe, it works fine
> during your testing because it's done by FW. If the settings are programmed
> by FW components, usually reprogramming is taken care by FW.
>
> Thanks,
> Lijo
>
During S0i3 entry the GFX power rail is turned off, but the MEM_S3 power rail stays 
on; the involved device/IP context is saved to memory and then each context is 
restored by the PM firmware during S0i3 resume.

We have merged the fix so s0ix support is not blocked for some upcoming ASICs; 
meanwhile we still need to dig into whether a difference in the DF C-state, MMHUB 
deep-sleep, or the BIOS MMHUB power-gate request on GC 10.3.7 introduces this 
issue, and then make a generic solution for it.
> >
> >>>  r = adev->mmhub.funcs->set_clockgating(adev, state);
> >>>  if (r)
> >>>  return r;
> >>>


RE: [PATCH] drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend

2022-04-26 Thread Liang, Prike
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Tuesday, April 26, 2022 7:19 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> 
> Subject: Re: [PATCH] drm/amdgpu: keep mmhub clock gating being enabled
> during s2idle suspend
>
>
>
> On 4/25/2022 12:22 PM, Prike Liang wrote:
> > Without MMHUB clock gating being enabled then MMHUB will not
> > disconnect from DF and will result in DF C-state entry can't be
> > accessed during S2idle suspend, and eventually s0ix entry will be blocked.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 10 ++
> >   1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > index a455e59f41f4..20946bc7fc93 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > @@ -1151,6 +1151,16 @@ static int gmc_v10_0_set_clockgating_state(void
> *handle,
> > int r;
> > struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >
> > +   /*
> > +* The issue mmhub can't disconnect from DF with MMHUB clock
> gating being disabled
> > +* is a new problem observed at DF 3.0.3, however with the same
> suspend sequence not
> > +* seen any issue on the DF 3.0.2 series platform.
> > +*/
> > +   if (adev->in_s0ix && adev->ip_versions[DF_HWIP][0] > IP_VERSION(3,
> 0, 2)) {
> > +   dev_dbg(adev->dev, "keep mmhub clock gating being
> enabled for s0ix\n");
> > +   return 0;
> > +   }
> > +
>
> This only ignores clock gating requests as long as s0ix flag is set. As far 
> as I
> see, s0ix flag is set to true even during resume and set to false only after
> resume is completed. Is that intention and is this tested to be working fine? 
> I
> suggest to keep this specifically for suspend calls.
>
> Thanks,
> Lijo
>
It is reasonable to also not re-enable clock gating on the s2idle resume, since 
clock gating was not disabled on the s2idle suspend. We have merged the fix so s0ix 
support is not blocked for some upcoming ASICs; meanwhile we still need to dig into 
whether a difference in the DF C-state, MMHUB deep-sleep, or the BIOS MMHUB 
power-gate request on GC 10.3.7 introduces this issue, and then make a generic 
solution for it.

> > r = adev->mmhub.funcs->set_clockgating(adev, state);
> > if (r)
> > return r;
> >


RE: [PATCH] drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend

2022-04-20 Thread Liang, Prike
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Wednesday, April 20, 2022 11:39 AM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> 
> Subject: Re: [PATCH] drm/amdgpu: keep mmhub clock gating being enabled
> during s2idle suspend
>
>
>
> On 4/20/2022 7:32 AM, Prike Liang wrote:
> > Without MMHUB clock gating being enabled then MMHUB will not
> > disconnect from DF and will result in DF C-state entry can't be
> > accessed during S2idle suspend, and eventually s0ix entry will be blocked.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c | 9 +
> >   1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c
> > b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c
> > index 1957fb098c4d..cb3dca4834b4 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c
> > @@ -568,6 +568,15 @@ static int mmhub_v2_3_set_clockgating(struct
> amdgpu_device *adev,
> >   {
> > if (amdgpu_sriov_vf(adev))
> > return 0;
> > +   /*
> > +* The issue mmhub can't disconnect from DF with clock gating being
> disabled
> > +* seems only observed at DF 3.0.3, with the same suspend sequence
> not seen
> > +* any issue on the DF 3.0.2 series platform.
> > +*/
> > +   if (adev->in_s0ix) {
> > +   dev_dbg(adev->dev, "keep mmhub clock gating being
> enabled for s0ix\n");
> > +   return 0;
> > +   }
> >
>
> A better fix would be to explicitly enable mmhub clockgating during s0ix
> suspend of gmc for this IP version.
>
> Thanks,
> Lijo
>
Yeah, that looks like a safer way to sort out the MMHUB clock gating setting 
without influencing older products.
However, I'm still not sure whether we really need to handle the case where MMHUB 
clock gating can be disabled for s0ix, and it sounds like having clock gating 
enabled is a common requirement for each DF client to be able to disconnect from DF.

> > mmhub_v2_3_update_medium_grain_clock_gating(adev,
> > state == AMD_CG_STATE_GATE);
> >


RE: [PATCH] drm/amdgpu: fix incorrect GCR_GENERAL_CNTL address

2022-03-28 Thread Liang, Prike
[Public]

Thanks for the correcting fix; I confirm that on gfx1037 the mmGCR_GENERAL_CNTL 
offset indeed needs to be updated to 0x1580.

Meanwhile, please confirm whether this change also applies to gfx1036. Either way, 
this patch is

Acked-by: Prike Liang 

> -Original Message-
> From: Ji, Ruili 
> Sent: Monday, March 28, 2022 11:47 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> ; Liu, Aaron ; Zhang, Yifan
> ; Liang, Prike ; Ji, Ruili
> 
> Subject: [PATCH] drm/amdgpu: fix incorrect GCR_GENERAL_CNTL address
>
> From: Ruili Ji 
>
> RMB shall use 0x1580 address for GCR_GENERAL_CNTL
>
> Signed-off-by: Ruili Ji 
> Change-Id: I10a85891986f31411f85fa3db46970aaa8a5bd03
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 99df18ae7316..e4c9d92ac381 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -3300,7 +3300,7 @@ static const struct soc15_reg_golden
> golden_settings_gc_10_3_3[] =
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmDB_DEBUG3, 0x,
> 0x0280),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmDB_DEBUG4, 0x,
> 0x0080),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGB_ADDR_CONFIG,
> 0x0c1807ff, 0x0242),
> - SOC15_REG_GOLDEN_VALUE(GC, 0, mmGCR_GENERAL_CNTL,
> 0x1ff1, 0x0500),
> + SOC15_REG_GOLDEN_VALUE(GC, 0,
> mmGCR_GENERAL_CNTL_Vangogh, 0x1ff1,
> +0x0500),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGL1_PIPE_STEER, 0x00ff,
> 0x00e4),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGL2_PIPE_STEER_0,
> 0x, 0x32103210),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGL2_PIPE_STEER_1,
> 0x, 0x32103210), @@ -3436,7 +3436,7 @@ static const struct
> soc15_reg_golden golden_settings_gc_10_3_6[] =
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmDB_DEBUG3, 0x,
> 0x0280),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmDB_DEBUG4, 0x,
> 0x0080),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGB_ADDR_CONFIG,
> 0x0c1807ff, 0x0042),
> - SOC15_REG_GOLDEN_VALUE(GC, 0, mmGCR_GENERAL_CNTL,
> 0x1ff1, 0x0500),
> + SOC15_REG_GOLDEN_VALUE(GC, 0,
> mmGCR_GENERAL_CNTL_Vangogh, 0x1ff1,
> +0x0500),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGL1_PIPE_STEER, 0x00ff,
> 0x0044),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGL2_PIPE_STEER_0,
> 0x, 0x32103210),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGL2_PIPE_STEER_1,
> 0x, 0x32103210), @@ -3461,7 +3461,7 @@ static const struct
> soc15_reg_golden golden_settings_gc_10_3_7[] = {
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmDB_DEBUG3, 0x,
> 0x0280),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmDB_DEBUG4, 0x,
> 0x0080),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGB_ADDR_CONFIG,
> 0x0c1807ff, 0x0041),
> - SOC15_REG_GOLDEN_VALUE(GC, 0, mmGCR_GENERAL_CNTL,
> 0x1ff1, 0x0500),
> + SOC15_REG_GOLDEN_VALUE(GC, 0,
> mmGCR_GENERAL_CNTL_Vangogh, 0x1ff1,
> +0x0500),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGL1_PIPE_STEER, 0x00ff,
> 0x00e4),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGL2_PIPE_STEER_0,
> 0x, 0x32103210),
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmGL2_PIPE_STEER_1,
> 0x, 0x32103210),
> --
> 2.25.1



RE: [PATCH] drm/amdgpu: set noretry for gfx 10.3.7

2022-03-23 Thread Liang, Prike
[Public]

Thanks for the review. Actually, the commit here means disabling XNACK on the 
gfx1036 ISA for the KFD test, but it seems clearer to consistently use the IP 
version gfx10.3.7, so I will update the commit message.

Thanks,
Prike
> -Original Message-
> From: Alex Deucher 
> Sent: Wednesday, March 23, 2022 9:21 PM
> To: Liang, Prike 
> Cc: amd-gfx list ; Deucher, Alexander
> ; Zhang, Yifan ;
> Huang, Ray 
> Subject: Re: [PATCH] drm/amdgpu: set noretry for gfx 10.3.7
>
> On Wed, Mar 23, 2022 at 4:15 AM Prike Liang  wrote:
> >
> > Disable xnack on the isa gfx10.3.6.
>
> typo: gfx10.3.6 -> gfx10.3.7
>
> With that fixed:
> Acked-by: Alex Deucher 
>
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > index e1635a3f2553..a66a0881a934 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > @@ -569,6 +569,7 @@ void amdgpu_gmc_noretry_set(struct
> amdgpu_device *adev)
> > case IP_VERSION(10, 3, 4):
> > case IP_VERSION(10, 3, 5):
> > case IP_VERSION(10, 3, 6):
> > +   case IP_VERSION(10, 3, 7):
> > /*
> >  * noretry = 0 will cause kfd page fault tests fail
> >  * for some ASICs, so set default to 1 for these ASICs.
> > --
> > 2.25.1
> >


RE: [PATCH] drm/amd/pm: validate SMU feature enable message for getting feature enabled mask

2022-02-17 Thread Liang, Prike
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Friday, February 18, 2022 1:29 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Quan, Evan
> ; Huang, Ray 
> Subject: Re: [PATCH] drm/amd/pm: validate SMU feature enable message for
> getting feature enabled mask
>
>
>
> On 2/18/2022 9:57 AM, Liang, Prike wrote:
> > [Public]
> >
> >> -Original Message-
> >> From: Lazar, Lijo 
> >> Sent: Friday, February 18, 2022 12:05 PM
> >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> >> Cc: Deucher, Alexander ; Quan, Evan
> >> ; Huang, Ray 
> >> Subject: Re: [PATCH] drm/amd/pm: validate SMU feature enable message
> >> for getting feature enabled mask
> >>
> >>
> >>
> >> On 2/18/2022 9:25 AM, Prike Liang wrote:
> >>> There's always miss the SMU feature enabled checked in the NPI
> >>> phase, so let validate the SMU feature enable message directly
> >>> rather than add more and more MP1 version check.
> >>>
> >>> Signed-off-by: Prike Liang 
> >>> ---
> >>>drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 28
> >>> ++---
> >> -
> >>>1 file changed, 6 insertions(+), 22 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>> index f24111d28290..da1ac70ed455 100644
> >>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>> @@ -552,10 +552,9 @@ bool smu_cmn_clk_dpm_is_enabled(struct
> >> smu_context *smu,
> >>>int smu_cmn_get_enabled_mask(struct smu_context *smu,
> >>>   uint64_t *feature_mask)
> >>>{
> >>> -   struct amdgpu_device *adev = smu->adev;
> >>>  uint32_t *feature_mask_high;
> >>>  uint32_t *feature_mask_low;
> >>> -   int ret = 0;
> >>> +   int ret = 0, index = 0;
> >>>
> >>>  if (!feature_mask)
> >>>  return -EINVAL;
> >>> @@ -563,12 +562,10 @@ int smu_cmn_get_enabled_mask(struct
> >> smu_context *smu,
> >>>  feature_mask_low = &((uint32_t *)feature_mask)[0];
> >>>  feature_mask_high = &((uint32_t *)feature_mask)[1];
> >>>
> >>> -   switch (adev->ip_versions[MP1_HWIP][0]) {
> >>> -   /* For Vangogh and Yellow Carp */
> >>> -   case IP_VERSION(11, 5, 0):
> >>> -   case IP_VERSION(13, 0, 1):
> >>> -   case IP_VERSION(13, 0, 3):
> >>> -   case IP_VERSION(13, 0, 8):
> >>> +   index = smu_cmn_to_asic_specific_index(smu,
> >>> +   CMN2ASIC_MAPPING_MSG,
> >>> +
> >>SMU_MSG_GetEnabledSmuFeatures);
> >>> +   if (index > 0) {
> >>>  ret = smu_cmn_send_smc_msg_with_param(smu,
> >>>
> >> SMU_MSG_GetEnabledSmuFeatures,
> >>>0, @@ -580,19
> >>> +577,7 @@ int smu_cmn_get_enabled_mask(struct
> >> smu_context *smu,
> >>>
> >> SMU_MSG_GetEnabledSmuFeatures,
> >>>1,
> >>>feature_mask_high);
> >>> -   break;
> >>> -   /*
> >>> -* For Cyan Skillfish and Renoir, there is no interface provided by
> >> PMFW
> >>> -* to retrieve the enabled features. So, we assume all features are
> >> enabled.
> >>> -* TODO: add other APU ASICs which suffer from the same issue here
> >>> -*/
> >>> -   case IP_VERSION(11, 0, 8):
> >>> -   case IP_VERSION(12, 0, 0):
> >>> -   case IP_VERSION(12, 0, 1):
> >>> -   memset(feature_mask, 0xff, sizeof(*feature_mask));
> >>> -   break;
> >>
> >> This is broken now as these ASICs don't support any message. It is
> >> best to take out smu_cmn_get_enabled_mask altogether and move to
> >> smu_v*.c or *_ppt.c as this is a callback function.
> >>
> >> Thanks,
> >> Lijo
> >>
> > Before this solution I also consider put the  smu_cmn_get_enabled_mask
> implementation in each *_ppt directly, but seems need some effort for
> implementing on each *_pp

RE: [PATCH] drm/amd/pm: validate SMU feature enable message for getting feature enabled mask

2022-02-17 Thread Liang, Prike
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Friday, February 18, 2022 12:05 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Quan, Evan
> ; Huang, Ray 
> Subject: Re: [PATCH] drm/amd/pm: validate SMU feature enable message for
> getting feature enabled mask
>
>
>
> On 2/18/2022 9:25 AM, Prike Liang wrote:
> > There's always miss the SMU feature enabled checked in the NPI phase,
> > so let validate the SMU feature enable message directly rather than
> > add more and more MP1 version check.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 28 ++---
> -
> >   1 file changed, 6 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > index f24111d28290..da1ac70ed455 100644
> > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > @@ -552,10 +552,9 @@ bool smu_cmn_clk_dpm_is_enabled(struct
> smu_context *smu,
> >   int smu_cmn_get_enabled_mask(struct smu_context *smu,
> >  uint64_t *feature_mask)
> >   {
> > -   struct amdgpu_device *adev = smu->adev;
> > uint32_t *feature_mask_high;
> > uint32_t *feature_mask_low;
> > -   int ret = 0;
> > +   int ret = 0, index = 0;
> >
> > if (!feature_mask)
> > return -EINVAL;
> > @@ -563,12 +562,10 @@ int smu_cmn_get_enabled_mask(struct
> smu_context *smu,
> > feature_mask_low = &((uint32_t *)feature_mask)[0];
> > feature_mask_high = &((uint32_t *)feature_mask)[1];
> >
> > -   switch (adev->ip_versions[MP1_HWIP][0]) {
> > -   /* For Vangogh and Yellow Carp */
> > -   case IP_VERSION(11, 5, 0):
> > -   case IP_VERSION(13, 0, 1):
> > -   case IP_VERSION(13, 0, 3):
> > -   case IP_VERSION(13, 0, 8):
> > +   index = smu_cmn_to_asic_specific_index(smu,
> > +   CMN2ASIC_MAPPING_MSG,
> > +
>   SMU_MSG_GetEnabledSmuFeatures);
> > +   if (index > 0) {
> > ret = smu_cmn_send_smc_msg_with_param(smu,
> >
> SMU_MSG_GetEnabledSmuFeatures,
> >   0,
> > @@ -580,19 +577,7 @@ int smu_cmn_get_enabled_mask(struct
> smu_context *smu,
> >
> SMU_MSG_GetEnabledSmuFeatures,
> >   1,
> >   feature_mask_high);
> > -   break;
> > -   /*
> > -* For Cyan Skillfish and Renoir, there is no interface provided by
> PMFW
> > -* to retrieve the enabled features. So, we assume all features are
> enabled.
> > -* TODO: add other APU ASICs which suffer from the same issue here
> > -*/
> > -   case IP_VERSION(11, 0, 8):
> > -   case IP_VERSION(12, 0, 0):
> > -   case IP_VERSION(12, 0, 1):
> > -   memset(feature_mask, 0xff, sizeof(*feature_mask));
> > -   break;
>
> This is broken now as these ASICs don't support any message. It is best to
> take out smu_cmn_get_enabled_mask altogether and move to smu_v*.c or
> *_ppt.c as this is a callback function.
>
> Thanks,
> Lijo
>
Before this solution I also considered putting the smu_cmn_get_enabled_mask 
implementation in each *_ppt directly, but that seems to require some effort for 
each *_ppt. How about we also check the SMU_MSG_GetEnabledSmuFeaturesHigh mapping 
index? For ASICs that support neither SMU_MSG_GetEnabledSmuFeatures nor 
SMU_MSG_GetEnabledSmuFeaturesHigh, we would hard-code the feature mask in that case.

> > -   /* other dGPU ASICs */
> > -   default:
> > +   } else {
> > ret = smu_cmn_send_smc_msg(smu,
> >
> SMU_MSG_GetEnabledSmuFeaturesHigh,
> >feature_mask_high);
> > @@ -602,7 +587,6 @@ int smu_cmn_get_enabled_mask(struct
> smu_context *smu,
> > ret = smu_cmn_send_smc_msg(smu,
> >
> SMU_MSG_GetEnabledSmuFeaturesLow,
> >feature_mask_low);
> > -   break;
> > }
> >
> > return ret;
> >


RE: [PATCH] drm/amd/pm: validate SMU feature enable message for getting feature enabled mask

2022-02-17 Thread Liang, Prike
[Public]

Please ignore this patch for the moment; it seems we need to take care of a special 
case for the RN series.

> -Original Message-
> From: Liang, Prike 
> Sent: Friday, February 18, 2022 11:55 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Quan, Evan
> ; Huang, Ray ; Liang, Prike
> 
> Subject: [PATCH] drm/amd/pm: validate SMU feature enable message for
> getting feature enabled mask
>
> There's always miss the SMU feature enabled checked in the NPI phase, so
> let validate the SMU feature enable message directly rather than add more
> and more MP1 version check.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 28 ++
>  1 file changed, 6 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index f24111d28290..da1ac70ed455 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -552,10 +552,9 @@ bool smu_cmn_clk_dpm_is_enabled(struct
> smu_context *smu,  int smu_cmn_get_enabled_mask(struct smu_context
> *smu,
>uint64_t *feature_mask)
>  {
> - struct amdgpu_device *adev = smu->adev;
>   uint32_t *feature_mask_high;
>   uint32_t *feature_mask_low;
> - int ret = 0;
> + int ret = 0, index = 0;
>
>   if (!feature_mask)
>   return -EINVAL;
> @@ -563,12 +562,10 @@ int smu_cmn_get_enabled_mask(struct
> smu_context *smu,
>   feature_mask_low = &((uint32_t *)feature_mask)[0];
>   feature_mask_high = &((uint32_t *)feature_mask)[1];
>
> - switch (adev->ip_versions[MP1_HWIP][0]) {
> - /* For Vangogh and Yellow Carp */
> - case IP_VERSION(11, 5, 0):
> - case IP_VERSION(13, 0, 1):
> - case IP_VERSION(13, 0, 3):
> - case IP_VERSION(13, 0, 8):
> + index = smu_cmn_to_asic_specific_index(smu,
> + CMN2ASIC_MAPPING_MSG,
> +
>   SMU_MSG_GetEnabledSmuFeatures);
> + if (index > 0) {
>   ret = smu_cmn_send_smc_msg_with_param(smu,
>
> SMU_MSG_GetEnabledSmuFeatures,
> 0,
> @@ -580,19 +577,7 @@ int smu_cmn_get_enabled_mask(struct
> smu_context *smu,
>
> SMU_MSG_GetEnabledSmuFeatures,
> 1,
> feature_mask_high);
> - break;
> - /*
> -  * For Cyan Skillfish and Renoir, there is no interface provided by
> PMFW
> -  * to retrieve the enabled features. So, we assume all features are
> enabled.
> -  * TODO: add other APU ASICs which suffer from the same issue here
> -  */
> - case IP_VERSION(11, 0, 8):
> - case IP_VERSION(12, 0, 0):
> - case IP_VERSION(12, 0, 1):
> - memset(feature_mask, 0xff, sizeof(*feature_mask));
> - break;
> - /* other dGPU ASICs */
> - default:
> + } else {
>   ret = smu_cmn_send_smc_msg(smu,
>
> SMU_MSG_GetEnabledSmuFeaturesHigh,
>  feature_mask_high);
> @@ -602,7 +587,6 @@ int smu_cmn_get_enabled_mask(struct smu_context
> *smu,
>   ret = smu_cmn_send_smc_msg(smu,
>
> SMU_MSG_GetEnabledSmuFeaturesLow,
>  feature_mask_low);
> - break;
>   }
>
>   return ret;
> --
> 2.17.1



RE: [PATCH] drm/amdgpu: enable TMZ option for onwards asic

2022-02-17 Thread Liang, Prike
[Public]

Ah, it's a typo; I will re-send a new patch.

> -Original Message-
> From: Liu, Aaron 
> Sent: Friday, February 18, 2022 9:56 AM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> ; Huang, Ray 
> Subject: RE: [PATCH] drm/amdgpu: enable TMZ option for onwards asic
>
> [AMD Official Use Only]
>
> ":" is expected behind "case CHIP_IP_DISCOVERY"
>
> --
> Best Regards
> Aaron Liu
>
> > -Original Message-
> > From: amd-gfx  On Behalf Of
> > Prike Liang
> > Sent: Friday, February 18, 2022 9:37 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Liang, Prike
> > ; Huang, Ray 
> > Subject: [PATCH] drm/amdgpu: enable TMZ option for onwards asic
> >
> > The TMZ is disabled by default and enable TMZ option for the IP
> > discovery based asic will help on the TMZ function verification.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > index 956cc994ca7d..d2dd526a4c80 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > @@ -581,6 +581,7 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device
> > *adev)
> > case CHIP_NAVI12:
> > case CHIP_VANGOGH:
> > case CHIP_YELLOW_CARP:
> > +   case CHIP_IP_DISCOVERY
> > /* Don't enable it by default yet.
> >  */
> > if (amdgpu_tmz < 1) {
> > --
> > 2.17.1


RE: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-10 Thread Liang, Prike


> -Original Message-
> From: Limonciello, Mario 
> Sent: Friday, February 11, 2022 12:09 AM
> To: Alex Deucher ; Liang, Prike
> 
> Cc: Mahapatra, Rajib ; Deucher, Alexander
> ; amd-gfx@lists.freedesktop.org; S, Shirish
> 
> Subject: RE: [PATCH v2] drm/amdgpu: skipping SDMA hw_init and hw_fini for
> S0ix.
> 
> [Public]
> 
> > > VG doesn't do s0i3 right?
> >
> > Right.
> >
> > > No, YC should not take a similar fix.YC had an architectural change 
> > > and
> to
> > > avoid a "similar" problem takes
> > 26db706a6d77b9e184feb11725e97e53b7a89519.
> >
> > Isn't that likely just a workaround for the same issue?  This seems cleaner.
> >
> 
> The SMU doesn't handle the restore of the SDMA registers for YC though,
> this explicitly changed.  So I don't believe we can do an identical fix there.
> 
> @Liang, Prike comments?

Yeah, in the gfx10 series it looks like the SMU doesn't handle SDMA save and 
restore in the PMFW anymore.


RE: [PATCH v4] drm/amd: Warn users about potential s0ix problems

2022-01-21 Thread Liang, Prike
The S2idle suspend/resume process also seems to depend on CONFIG_SUSPEND. 
Moreover, why does this check function still return true even when the BIOS/AMD PMC 
is not configured correctly? We are still looking into some S0ix abort issues, and 
the system will run into such problems when those misconfigured cases are also 
marked as s0ix.

Thanks,
Prike
> -Original Message-
> From: Limonciello, Mario 
> Sent: Friday, January 21, 2022 3:59 AM
> To: amd-gfx@lists.freedesktop.org; Lazar, Lijo ; Liang,
> Prike 
> Cc: Bjoren Dasse 
> Subject: RE: [PATCH v4] drm/amd: Warn users about potential s0ix problems
> 
> [Public]
> 
> Add back on Lijo and Prike, my mistake they got dropped from CC.
> 
> > -Original Message-
> > From: Limonciello, Mario 
> > Sent: Tuesday, January 18, 2022 21:41
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Limonciello, Mario ; Bjoren Dasse
> > 
> > Subject: [PATCH v4] drm/amd: Warn users about potential s0ix problems
> >
> > On some OEM setups users can configure the BIOS for S3 or S2idle.
> > When configured to S3 users can still choose 's2idle' in the kernel by
> > using `/sys/power/mem_sleep`.  Before commit 6dc8265f9803
> ("drm/amdgpu:
> > always reset the asic in suspend (v2)"), the GPU would crash.  Now
> > when configured this way, the system should resume but will use more
> power.
> >
> > As such, adjust the `amdpu_acpi_is_s0ix function` to warn users about
> > potential power consumption issues during their first attempt at
> > suspending.
> >
> > Reported-by: Bjoren Dasse 
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1824
> > Signed-off-by: Mario Limonciello 
> > ---
> > v3->v4:
> >  * Add back in CONFIG_SUSPEND check
> > v2->v3:
> >  * Better direct users how to recover in the bad cases
> > v1->v2:
> >  * Only show messages in s2idle cases
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 21 +++++++++++++++------
> >  1 file changed, 15 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > index 4811b0faafd9..2531da6cbec3 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > @@ -1040,11 +1040,20 @@ void amdgpu_acpi_detect(void)
> >   */
> >  bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
> >  {
> > -#if IS_ENABLED(CONFIG_AMD_PMC) && IS_ENABLED(CONFIG_SUSPEND)
> > -   if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) {
> > -   if (adev->flags & AMD_IS_APU)
> > -   return pm_suspend_target_state ==
> > PM_SUSPEND_TO_IDLE;
> > -   }
> > -#endif
> > +#if IS_ENABLED(CONFIG_SUSPEND)
> > +   if (!(adev->flags & AMD_IS_APU) ||
> > +   pm_suspend_target_state != PM_SUSPEND_TO_IDLE)
> > +   return false;
> > +#else
> > return false;
> > +#endif
> > +   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
> > +   dev_warn_once(adev->dev,
> > + "Power consumption will be higher as BIOS has
> not
> > been configured for suspend-to-idle.\n"
> > + "To use suspend-to-idle change the sleep mode in
> > BIOS setup.\n");
> > +#if !IS_ENABLED(CONFIG_AMD_PMC)
> > +   dev_warn_once(adev->dev,
> > + "Power consumption will be higher as the kernel has not
> > been compiled with CONFIG_AMD_PMC.\n");
> > +#endif
> > +   return true;
> >  }
> > --
> > 2.25.1
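The control flow of the v4 patch above can be modeled as a small standalone predicate. This is a sketch for illustration only, not the kernel implementation: the three booleans stand in for adev->flags & AMD_IS_APU, pm_suspend_target_state == PM_SUSPEND_TO_IDLE, and acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative model of the amdgpu_acpi_is_s0ix_active() logic in the
 * patch above: dGPUs and non-s2idle targets never use the s0ix path;
 * an APU targeting s2idle does, but warns when the BIOS is configured
 * for S3 because power consumption will be higher. */
static bool is_s0ix_active(bool is_apu, bool target_is_s2idle,
                           bool fadt_low_power_s0)
{
        if (!is_apu || !target_is_s2idle)
                return false;
        if (!fadt_low_power_s0)
                fprintf(stderr,
                        "warning: BIOS not configured for suspend-to-idle\n");
        return true;
}
```

Note that the misconfigured-BIOS case still returns true, which is exactly the behavior questioned in the reply above.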


RE: [PATCH v2] drm/amd: Warn users about potential s0ix problems

2022-01-18 Thread Liang, Prike
If the ACPI_FADT_LOW_POWER_S0 flag is not set or the AMD PMC driver is not 
built, that seems like it will mess up the suspend entry, and the system will 
be unable to enter either S3 or S2idle properly. For this S2idle 
misconfiguration case, how about adding some message telling the end user how 
to configure S2idle correctly?

Thanks,
Prike
From: amd-gfx  On Behalf Of Limonciello, 
Mario
Sent: Tuesday, January 18, 2022 1:26 AM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Bjoren Dasse 
Subject: RE: [PATCH v2] drm/amd: Warn users about potential s0ix problems


[Public]

Yes, that's part of why I want to make sure there are explicit warnings to 
users about using this flow.
When not enabled in ACPI then also the LPS0 device is not exported and AMD_PMC 
won't load or be used.

I think from amdgpu perspective it should behave relatively similar to an 
aborted suspend.

From: Lazar, Lijo mailto:lijo.la...@amd.com>>
Sent: Monday, January 17, 2022 11:20
To: Limonciello, Mario 
mailto:mario.limoncie...@amd.com>>; 
amd-gfx@lists.freedesktop.org
Cc: Bjoren Dasse mailto:bjoern.da...@gmail.com>>
Subject: Re: [PATCH v2] drm/amd: Warn users about potential s0ix problems

Any problem with PMFW sequence in the way Linux handles s2idle when it's not 
enabled in ACPI?

Thanks,
Lijo

From: Limonciello, Mario 
mailto:mario.limoncie...@amd.com>>
Sent: Monday, January 17, 2022 10:45:44 PM
To: amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>; Lazar, 
Lijo mailto:lijo.la...@amd.com>>
Cc: Bjoren Dasse mailto:bjoern.da...@gmail.com>>
Subject: RE: [PATCH v2] drm/amd: Warn users about potential s0ix problems

[Public]

This has been sitting a week or so.
Bump on review for this patch.

> -Original Message-
> From: Limonciello, Mario 
> mailto:mario.limoncie...@amd.com>>
> Sent: Tuesday, January 11, 2022 14:00
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario 
> mailto:mario.limoncie...@amd.com>>; Bjoren Dasse
> mailto:bjoern.da...@gmail.com>>
> Subject: [PATCH v2] drm/amd: Warn users about potential s0ix problems
>
> On some OEM setups users can configure the BIOS for S3 or S2idle.
> When configured to S3 users can still choose 's2idle' in the kernel by
> using `/sys/power/mem_sleep`.  Before commit 6dc8265f9803
> ("drm/amdgpu:
> always reset the asic in suspend (v2)"), the GPU would crash.  Now when
> configured this way, the system should resume but will use more power.
>
> As such, adjust the `amdpu_acpi_is_s0ix function` to warn users about
> potential power consumption issues during their first attempt at
> suspending.
>
> Reported-by: Bjoren Dasse 
> mailto:bjoern.da...@gmail.com>>
> Link: 
> https://gitlab.freedesktop.org/drm/amd/-/issues/1824
> Signed-off-by: Mario Limonciello 
> mailto:mario.limoncie...@amd.com>>
> ---
> v1->v2:
>  * Only show messages in s2idle cases
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index 4811b0faafd9..1295de6d6c30 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -1040,11 +1040,15 @@ void amdgpu_acpi_detect(void)
>   */
>  bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
>  {
> -#if IS_ENABLED(CONFIG_AMD_PMC) && IS_ENABLED(CONFIG_SUSPEND)
> - if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) {
> - if (adev->flags & AMD_IS_APU)
> - return pm_suspend_target_state ==
> PM_SUSPEND_TO_IDLE;
> - }
> + if (!(adev->flags & AMD_IS_APU) ||
> + pm_suspend_target_state != PM_SUSPEND_TO_IDLE)
> + return false;
> + if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
> + dev_warn_once(adev->dev,
> +   "BIOS is not configured for suspend-to-idle, power
> consumption will be higher\n");
> +#if !IS_ENABLED(CONFIG_AMD_PMC)
> + dev_warn_once(adev->dev,
> +   "amd-pmc is not enabled in the kernel, power
> consumption will be higher\n");
>  #endif
> - return false;
> + return true;
>  }
> --
> 2.25.1


RE: [v3] drm/amdgpu: reset asic after system-wide suspend aborted (v3)

2021-12-13 Thread Liang, Prike
[Public]

> -Original Message-
> From: Limonciello, Mario 
> Sent: Tuesday, December 14, 2021 5:48 AM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Lazar, Lijo
> ; Huang, Ray 
> Subject: Re: [v3] drm/amdgpu: reset asic after system-wide suspend aborted
> (v3)
>
> On 11/24/2021 23:48, Prike Liang wrote:
> > Do ASIC reset at the moment Sx suspend aborted behind of amdgpu
> > suspend to keep AMDGPU in a clean reset state and that can avoid
> > re-initialize device improperly error. Currently,we just always do
> > asic reset in the amdgpu resume until sort out the PM abort case.
> >
> > v2: Remove incomplete PM abort flag and add GPU hive case check for
> > GPU reset.
> >
> > v3: Some dGPU reset method not support at the early resume time and
> > temprorary skip the dGPU case.
>
> FYI to you that this was tested on an issue with S3 exit to show success that
> you will want to include in a Fixes tag for v4 when you change it to just run
> for S3 path, not S0i3 path.
>
> https://gitlab.freedesktop.org/drm/amd/-/issues/1822
>
Yeah, this patch needs to exclude the S0i3 case: in the S0i3 entry we skip 
suspending some blocks, so we may not resume successfully after doing a GPU 
reset. Furthermore, the S3 abort resume issue can also be found on the dGPU 
series, and Alex's following patch may handle this generic issue better if we 
ignore the PM abort check.

https://lore.kernel.org/all/dm6pr12mb26195f8e099407b4b6966febe4...@dm6pr12mb2619.namprd12.prod.outlook.com/T/

Thanks,
Prike
> >
> > Signed-off-by: Prike Liang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
> >   1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 7d4115d..f6e1a6a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3983,6 +3983,14 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool fbcon)
> > if (adev->in_s0ix)
> > amdgpu_gfx_state_change_set(adev,
> sGpuChangeState_D0Entry);
> >
> > +   /*TODO: In order to not let all-always asic reset affect resume
> latency
> > +* need sort out the case which really need asic reset in the resume
> process.
> > +* As to the known issue on the system suspend abort behind the
> AMDGPU suspend,
> > +* may can sort this case by checking struct suspend_stats which
> need exported
> > +* firstly.
> > +*/
> > +   if (adev->flags & AMD_IS_APU)
> > +   amdgpu_asic_reset(adev);
> > /* post card */
> > if (amdgpu_device_need_post(adev)) {
> > r = amdgpu_device_asic_init(adev);
> >
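Taken together, the constraints raised across this thread (APU-only because some dGPU reset methods need PMFW/PSP that is not loaded this early, skip S0i3 because its resume path re-initializes too little hardware, skip XGMI hives because the hive is not reinitialized yet) amount to a simple gate. This standalone sketch is illustrative only; the v3 patch itself checks only the APU flag:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the resume-time reset gating discussed in this
 * thread: reset only APUs, never during S0i3, and never for a GPU that
 * is part of an XGMI hive (num_physical_nodes > 1). */
static bool should_reset_on_resume(bool is_apu, bool in_s0ix,
                                   int xgmi_physical_nodes)
{
        if (!is_apu)
                return false;          /* dGPU reset methods unavailable here */
        if (in_s0ix)
                return false;          /* S0i3 resume cannot survive a reset */
        return xgmi_physical_nodes <= 1; /* skip hives */
}
```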



RE: [PATCH] drm/amd/pm: skip gfx cgpg in the s0ix suspend-resume

2021-12-12 Thread Liang, Prike
[Public]

> -Original Message-
> From: Quan, Evan 
> Sent: Friday, December 10, 2021 10:15 AM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> ; Huang, Ray ; Limonciello,
> Mario 
> Subject: RE: [PATCH] drm/amd/pm: skip gfx cgpg in the s0ix suspend-resume
>
> [AMD Official Use Only]
>
>
>
> > -Original Message-
> > From: amd-gfx  On Behalf Of
> > Prike Liang
> > Sent: Thursday, December 9, 2021 9:51 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Liang, Prike
> > ; Huang, Ray ; Limonciello,
> > Mario 
> > Subject: [PATCH] drm/amd/pm: skip gfx cgpg in the s0ix suspend-resume
> >
> > In the s0ix entry need retain gfx in the gfxoff state,we don't disable
> > gfx cgpg in the suspend so there is also needn't enable gfx cgpg in
> > the s0ix resume.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> > b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> > index 5839918..185269f 100644
> > --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> > +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> > @@ -1607,7 +1607,8 @@ static int smu_resume(void *handle)
> > return ret;
> > }
> >
> > -   if (smu->is_apu)
> > +   /* skip gfx cgpg in the s0ix suspend-resume case*/
> > +   if (smu->is_apu && !adev->in_s0ix)
> > smu_set_gfx_cgpg(&adev->smu, true);
> [Quan, Evan] I was wondering can we move the "!adev->in_s0ix" into the -
> >set_gfx_cgpg(for now, only smu_v12_0_set_gfx_cgpg() supported by Renoir)
> implementation?
> Also, considering this is only supported by Renoir, we may be able to drop
> the "smu->is_apu" check.
Yes, set_gfx_cgpg is only implemented in the SMU12 series, so we can move the 
S0ix protection into smu_v12_0_set_gfx_cgpg() and drop the S0ix flag check from 
the SMU suspend/resume process.

Thanks,
Prike
>
> BR
> Evan
> >
> > smu->disable_uclk_switch = 0;
> > --
> > 2.7.4
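Evan's suggestion (fold the in_s0ix check into the SMU12-only implementation, keeping smu_resume() unconditional) can be sketched as follows. Names mirror the kernel code, but this is a standalone illustrative model, not the driver:

```c
#include <assert.h>
#include <stdbool.h>

/* Tracks whether GFX CGPG was (re)programmed; stands in for the real
 * SMU message in smu_v12_0_set_gfx_cgpg(). */
static bool gfx_cgpg_applied;

/* In the s0ix entry GFX is retained in the gfxoff state, so CGPG was
 * never disabled on suspend and must not be re-enabled on resume:
 * the implementation simply bails out for S0ix. */
static void smu_v12_0_set_gfx_cgpg(bool in_s0ix, bool enable)
{
        if (in_s0ix)
                return;
        gfx_cgpg_applied = enable;
}
```

With this shape, the caller in smu_resume() no longer needs its own !adev->in_s0ix (or even smu->is_apu) check.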


RE: [v3] drm/amdgpu: reset asic after system-wide suspend aborted (v3)

2021-12-03 Thread Liang, Prike
[Public]

> -Original Message-
> From: Alex Deucher 
> Sent: Thursday, December 2, 2021 3:39 AM
> To: Limonciello, Mario 
> Cc: Liang, Prike ; amd-gfx list  g...@lists.freedesktop.org>; Deucher, Alexander
> ; Lazar, Lijo ; Huang,
> Ray 
> Subject: Re: [v3] drm/amdgpu: reset asic after system-wide suspend aborted
> (v3)
>
> On Wed, Dec 1, 2021 at 1:46 PM Limonciello, Mario
>  wrote:
> >
> > On 11/24/2021 23:48, Prike Liang wrote:
> > > Do ASIC reset at the moment Sx suspend aborted behind of amdgpu
> > > suspend to keep AMDGPU in a clean reset state and that can avoid
> > > re-initialize device improperly error. Currently,we just always do
> > > asic reset in the amdgpu resume until sort out the PM abort case.
> > >
> > > v2: Remove incomplete PM abort flag and add GPU hive case check for
> > > GPU reset.
> > >
> > > v3: Some dGPU reset method not support at the early resume time and
> > > temprorary skip the dGPU case.
> > >
> > > Signed-off-by: Prike Liang 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
> > >   1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index 7d4115d..f6e1a6a 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -3983,6 +3983,14 @@ int amdgpu_device_resume(struct
> drm_device *dev, bool fbcon)
> > >   if (adev->in_s0ix)
> > >   amdgpu_gfx_state_change_set(adev,
> > > sGpuChangeState_D0Entry);
> > >
> > > + /*TODO: In order to not let all-always asic reset affect resume 
> > > latency
> > > +  * need sort out the case which really need asic reset in the resume
> process.
> > > +  * As to the known issue on the system suspend abort behind the
> AMDGPU suspend,
> > > +  * may can sort this case by checking struct suspend_stats which 
> > > need
> exported
> > > +  * firstly.
> > > +  */
> > > + if (adev->flags & AMD_IS_APU)
> > > + amdgpu_asic_reset(adev);
> >
> > Ideally you only want this to happen on S3 right?  So shouldn't there
> > be an extra check for `amdgpu_acpi_is_s0ix_active`?
>
> Shouldn't matter on the resume side.  Only the suspend side.  If we reset in
> suspend, we'd end up disabling gfxoff.  On the resume side, it should safe,
> but the resume paths for various IPs probably are not adequate to deal with
> a reset for S0i3 since they don't re-init as much hardware.  So it's probably
> better to skip this for S0i3.
>
> Alex
>
There may be some IP suspend/resume sequences that are not programmed 
correctly: when s2idle is aborted after the AMDGPU suspend, we can see some 
other errors in the SMU/SDMA resume. However, that seems to be a different 
symptom from the S3 abort case, and we will handle it separately.

>
> >
> > >   /* post card */
> > >   if (amdgpu_device_need_post(adev)) {
> > >   r = amdgpu_device_asic_init(adev);
> > >
> >


RE: [PATCH v2] drm/amdgpu: reset asic after system-wide suspend aborted (v2)

2021-11-24 Thread Liang, Prike
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Wednesday, November 24, 2021 9:30 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> 
> Subject: Re: [PATCH v2] drm/amdgpu: reset asic after system-wide suspend
> aborted (v2)
>
>
>
> On 11/24/2021 6:13 PM, Prike Liang wrote:
> > Do ASIC reset at the moment Sx suspend aborted behind of amdgpu
> > suspend to keep AMDGPU in a clean reset state and that can avoid
> > re-initialize device improperly error. Currently,we just always do
> > asic reset in the amdgpu resume until sort out the PM abort case.
> >
> > v2: Remove incomplete PM abort flag and add GPU hive case check for
> > GPU reset.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
> >   1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 7d4115d..3fcd90d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3983,6 +3983,14 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool fbcon)
> > if (adev->in_s0ix)
> > amdgpu_gfx_state_change_set(adev,
> sGpuChangeState_D0Entry);
> >
> > +   /*TODO: In order to not let all-always asic reset affect resume
> latency
> > +* need sort out the case which really need asic reset in the resume
> process.
> > +* As to the known issue on the system suspend abort behind the
> AMDGPU suspend,
> > +* may can sort this case by checking struct suspend_stats which
> need exported
> > +* firstly.
> > +*/
> > +   if (adev->gmc.xgmi.num_physical_nodes <= 1)
> > +   amdgpu_asic_reset(adev);
>
> Newer dGPUs depend on PMFW to do reset and that is not loaded at this
> point. For some, there is a mini FW available which could technically handle a
> reset and some of the older ones depend on PSP. Strongly suggest to check
> all such cases before doing a reset here.
>
> Or, the safest at this point could be to do the reset only for APUs.
>
> Thanks,
> Lijo
>
Thanks for the input; it may take a lot of effort to sort out the reset method 
for the many dGPUs, so for now let's only handle APUs.

> > /* post card */
> > if (amdgpu_device_need_post(adev)) {
> > r = amdgpu_device_asic_init(adev);
> >


RE: [PATCH] drm/amdgpu: reset asic after system-wide suspend aborted

2021-11-22 Thread Liang, Prike
[Public]

> -Original Message-
> From: Alex Deucher 
> Sent: Monday, November 22, 2021 11:48 PM
> To: Liang, Prike 
> Cc: Lazar, Lijo ; Deucher, Alexander
> ; Christian König
> ; Huang, Ray ;
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide suspend
> aborted
>
> On Mon, Nov 22, 2021 at 9:23 AM Liang, Prike 
> wrote:
> >
> > [Public]
> >
> > > -Original Message-
> > > From: Alex Deucher 
> > > Sent: Friday, November 19, 2021 12:18 AM
> > > To: Lazar, Lijo 
> > > Cc: Deucher, Alexander ; Christian König
> > > ; Liang, Prike
> > > ; Huang, Ray ;
> > > amd-gfx@lists.freedesktop.org
> > > Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide
> > > suspend aborted
> > >
> > > On Thu, Nov 18, 2021 at 10:01 AM Lazar, Lijo 
> wrote:
> > > >
> > > > [Public]
> > > >
> > > >
> > > > BTW, I'm not sure if 'reset always' on resume is a good idea  for
> > > > GPUs in a
> > > hive (assuming those systems also get suspended and get hiccups). At
> > > this point the hive isn't reinitialized.
> > >
> > > Yeah, we should probably not reset if we are part of a hive.
> > >
> > > Alex
> > >
> > For the GPU hive reset in this suspend abort case need treat specially, does
> that because of GPU hive need take care each node reset dependence and
> synchronous reset? For this purpose, can we skip the hive reset case and
> only do GPU reset under adev->gmc.xgmi.num_physical_nodes == 0 ?
>
> Yes, exactly.  For the aborted suspend reset, we can check the value before
> doing a reset.  I think you want to check if
> adev->gmc.xgmi.num_physical_nodes <= 1.
>
> Alex
>
Thanks for the clarification; I will add this check for the GPU reset in 
amdgpu_device_resume().
> >
> > > >
> > > > Thanks,
> > > > Lijo


RE: [PATCH] drm/amdgpu: reset asic after system-wide suspend aborted

2021-11-22 Thread Liang, Prike
[Public]

> -Original Message-
> From: Alex Deucher 
> Sent: Friday, November 19, 2021 12:18 AM
> To: Lazar, Lijo 
> Cc: Deucher, Alexander ; Christian König
> ; Liang, Prike ;
> Huang, Ray ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide suspend
> aborted
>
> On Thu, Nov 18, 2021 at 10:01 AM Lazar, Lijo  wrote:
> >
> > [Public]
> >
> >
> > BTW, I'm not sure if 'reset always' on resume is a good idea  for GPUs in a
> hive (assuming those systems also get suspended and get hiccups). At this
> point the hive isn't reinitialized.
>
> Yeah, we should probably not reset if we are part of a hive.
>
> Alex
>
The GPU hive reset in this suspend abort case needs special treatment — is 
that because a hive has to account for per-node reset dependencies and 
synchronized reset? For that purpose, can we skip the hive reset case and only 
do a GPU reset when adev->gmc.xgmi.num_physical_nodes == 0?

> >
> > Thanks,
> > Lijo


RE: [PATCH] drm/amdgpu: reset asic after system-wide suspend aborted

2021-11-22 Thread Liang, Prike
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, November 18, 2021 10:28 PM
> To: Alex Deucher 
> Cc: Christian König ; Liang, Prike
> ; Deucher, Alexander
> ; Huang, Ray ;
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide suspend
> aborted
>
>
>
> On 11/18/2021 7:55 PM, Alex Deucher wrote:
> > On Thu, Nov 18, 2021 at 9:15 AM Lazar, Lijo  wrote:
> >>
> >>
> >>
> >> On 11/18/2021 7:41 PM, Christian König wrote:
> >>> Am 18.11.21 um 15:09 schrieb Lazar, Lijo:
> >>>> On 11/18/2021 7:36 PM, Alex Deucher wrote:
> >>>>> On Thu, Nov 18, 2021 at 8:11 AM Liang, Prike 
> >>>>> wrote:
> >>>>>>
> >>>>>> [Public]
> >>>>>>
> >>>>>>> -Original Message-
> >>>>>>> From: Lazar, Lijo 
> >>>>>>> Sent: Thursday, November 18, 2021 4:01 PM
> >>>>>>> To: Liang, Prike ;
> >>>>>>> amd-gfx@lists.freedesktop.org
> >>>>>>> Cc: Deucher, Alexander ; Huang,
> Ray
> >>>>>>> 
> >>>>>>> Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide
> >>>>>>> suspend aborted
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 11/18/2021 12:32 PM, Prike Liang wrote:
> >>>>>>>> Do ASIC reset at the moment Sx suspend aborted behind of
> amdgpu
> >>>>>>>> suspend to keep AMDGPU in a clean reset state and that can
> >>>>>>>> avoid re-initialize device improperly error.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Prike Liang 
> >>>>>>>> ---
> >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
> >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 
> >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 19
> >>>>>>> +++
> >>>>>>>> 3 files changed, 24 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>>>> index b85b67a..8bd9833 100644
> >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>>>> @@ -1053,6 +1053,7 @@ struct amdgpu_device {
> >>>>>>>>   boolin_s3;
> >>>>>>>>   boolin_s4;
> >>>>>>>>   boolin_s0ix;
> >>>>>>>> +   boolpm_completed;
> >>>>>>>
> >>>>>>> PM framework maintains separate flags, why not use the same?
> >>>>>>>
> >>>>>>>dev->power.is_suspended = false;
> >>>>>>>dev->power.is_noirq_suspended = false;
> >>>>>>>dev->power.is_late_suspended = false;
> >>>>>>>
> >>>>>>
> >>>>>> Thanks for pointing it out and I miss that flag. In this case we
> >>>>>> can use the PM flag is_noirq_suspended for checking AMDGPU
> device
> >>>>>> whether is finished suspend.
> >>>>>>
> >>>>>>> BTW, Alex posted a patch which does similar thing, though it
> >>>>>>> tries reset if suspend fails.
> >>>>>>>
> >>>>>>> https://lore.kernel.org/all/DM6PR12MB26195F8E099407B4B6966FEBE4999@DM6PR12MB2619.namprd12.prod.outlook.com/

RE: [PATCH] drm/amdgpu: reset asic after system-wide suspend aborted

2021-11-18 Thread Liang, Prike
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, November 18, 2021 4:01 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> 
> Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide suspend
> aborted
>
>
>
> On 11/18/2021 12:32 PM, Prike Liang wrote:
> > Do ASIC reset at the moment Sx suspend aborted behind of amdgpu
> > suspend to keep AMDGPU in a clean reset state and that can avoid
> > re-initialize device improperly error.
> >
> > Signed-off-by: Prike Liang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 19 +++++++++++++++++++
> >   3 files changed, 24 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index b85b67a..8bd9833 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -1053,6 +1053,7 @@ struct amdgpu_device {
> > boolin_s3;
> > boolin_s4;
> > boolin_s0ix;
> > +   boolpm_completed;
>
> PM framework maintains separate flags, why not use the same?
>
>  dev->power.is_suspended = false;
>  dev->power.is_noirq_suspended = false;
>  dev->power.is_late_suspended = false;
>

Thanks for pointing it out; I missed that flag. In this case we can use the PM 
flag is_noirq_suspended to check whether the AMDGPU device has finished 
suspending.

> BTW, Alex posted a patch which does similar thing, though it tries reset if
> suspend fails.
>
> https://lore.kernel.org/all/DM6PR12MB26195F8E099407B4B6966FEBE4999@DM6PR12MB2619.namprd12.prod.outlook.com/
>
> If that reset also failed, then no point in another reset, or keep it as part 
> of
> resume.
>

Alex's patch seems to always do the ASIC reset at the end of the AMDGPU device 
suspend, but that may not be needed for a normal suspend. However, this patch 
shows it can handle the case where the system suspend is aborted after the 
AMDGPU suspend, so it seems we only need to take care of the suspend abort 
case here.

> Thanks,
> Lijo
>
> >
> > atomic_tin_gpu_reset;
> > enum pp_mp1_state   mp1_state;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index ec42a6f..a12ed54 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3983,6 +3983,10 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool fbcon)
> > if (adev->in_s0ix)
> > amdgpu_gfx_state_change_set(adev,
> sGpuChangeState_D0Entry);
> >
> > +   if (!adev->pm_completed) {
> > +   dev_warn(adev->dev, "suspend aborted will do asic reset\n");
> > +   amdgpu_asic_reset(adev);
> > +   }
> > /* post card */
> > if (amdgpu_device_need_post(adev)) {
> > r = amdgpu_device_asic_init(adev);
> >
> > diff --git
> > a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index eee3cf8..9f90017 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2168,6 +2168,23 @@ static int amdgpu_pmops_suspend(struct
> device *dev)
> > return r;
> >   }
> >
> > +/*
> > + * Actually the PM suspend whether is completed should be confirmed
> > + * by checking the sysfs /sys/power/suspend_stats/failed_suspend. However,
> > + * in this function only check the AMDGPU device whether is suspended
> > + * completely in the system-wide suspend process.
> > + */
> > +static int amdgpu_pmops_noirq_suspend(struct device *dev)
> > +{
> > +   struct drm_device *drm_dev = dev_get_drvdata(dev);
> > +   struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > +
> > +   dev_dbg(dev, "amdgpu suspend completely.\n");
> > +   adev->pm_completed = true;
> > +
> > +   return 0;
> > +}
> > +
> >   static int amdgpu_pmops_resume(struct device *dev)
> >   {
> > struct drm_device *drm_dev = dev_get_drvdata(dev); @@ -2181,6
> > +2198,7 @@ static int amdgpu_pmops_resume(struct device *dev)
> > r = amdgpu_device_resume(drm_dev, true);
> > if (amdgpu_acpi_is_s0ix_active(adev))
> > adev->in_s0ix = false;
> > +   adev->pm_completed = false;
> > return r;
> >   }
> >
> > @@ -2397,6 +2415,7 @@ static const struct dev_pm_ops amdgpu_pm_ops
> = {
> > .runtime_suspend = amdgpu_pmops_runtime_suspend,
> > .runtime_resume = amdgpu_pmops_runtime_resume,
> > .runtime_idle = amdgpu_pmops_runtime_idle,
> > +   .suspend_noirq = amdgpu_pmops_noirq_suspend,
> >   };
> >
> >   static int amdgpu_flush(struct file *f, fl_owner_t id)
> >
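The pm_completed idea in the patch above relies on the noirq suspend callback running last in the suspend sequence, so an aborted suspend is simply one where that callback never ran. The standalone model below mirrors only the patch's logic; it is not the driver code:

```c
#include <assert.h>
#include <stdbool.h>

struct dev_state {
        bool pm_completed;  /* set by the noirq suspend callback */
        bool was_reset;     /* records whether resume reset the ASIC */
};

/* suspend_noirq is the last suspend phase, so reaching it means the
 * whole suspend sequence completed. */
static void pmops_noirq_suspend(struct dev_state *d)
{
        d->pm_completed = true;
}

/* If the flag was never set, the suspend was aborted somewhere before
 * the noirq phase and the ASIC is reset on resume. */
static void device_resume(struct dev_state *d)
{
        if (!d->pm_completed)
                d->was_reset = true;
        d->pm_completed = false;  /* re-arm for the next suspend cycle */
}
```

As noted later in this archive, this only covers aborts *before* the noirq phase; pm_test platform/core aborts happen after it, which is one reason the flag-based approach was eventually revisited.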


RE: [PATCH] drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix

2021-09-23 Thread Liang, Prike
[Public]

Hold on — we still need to further check the gfxoff control residency, and I 
will update the patch.

Thanks,
Prike
> -Original Message-
> From: Liang, Prike 
> Sent: Friday, September 24, 2021 1:18 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> ; Liang, Prike 
> Subject: [PATCH] drm/amdgpu: force exit gfxoff on sdma resume for rmb
> s0ix
>
> In the s2idle stress test sdma resume fail occasionally,in the failed case GPU
> is in the gfxoff state.This issue may introduce by FSDL miss handle doorbell
> S/R and now temporary fix the issue by forcing exit gfxoff for sdma resume.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 24b0195..af759ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -7608,6 +7608,14 @@ static int gfx_v10_0_suspend(void *handle)
>
>  static int gfx_v10_0_resume(void *handle)
>  {
> + struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +
> + /* TODO: check whether can submit a doorbell request to raise
> +  * a doorbell fence to exit gfxoff.
> +  */
> + if (adev->in_s0ix)
> + amdgpu_gfx_off_ctrl(adev, false);
> +
>   return gfx_v10_0_hw_init(handle);
>  }
>
> @@ -7819,6 +7827,9 @@ static int gfx_v10_0_late_init(void *handle)
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   int r;
>
> + if (adev->in_s0ix)
> +  amdgpu_gfx_off_ctrl(adev, true);
> +
>   r = amdgpu_irq_get(adev, &adev->gfx.priv_reg_irq, 0);
>   if (r)
>   return r;
> --
> 2.7.4
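The reason the patch above must pair the gfxoff disable in gfx_v10_0_resume() with an enable in gfx_v10_0_late_init() is that amdgpu_gfx_off_ctrl() is reference-counted: each disable must be balanced before gfxoff can engage again. The standalone counter below models that behavior for illustration (the field name is a stand-in, not the real driver structure):

```c
#include <assert.h>
#include <stdbool.h>

struct gfx_off_state {
        int disable_count;  /* stands in for the driver's gfxoff refcount */
};

/* Model of amdgpu_gfx_off_ctrl(): enable=false takes a disable
 * reference, enable=true drops one. */
static void gfx_off_ctrl(struct gfx_off_state *s, bool enable)
{
        if (enable)
                s->disable_count--;
        else
                s->disable_count++;
}

/* gfxoff may only engage once every disable has been balanced. */
static bool gfx_off_allowed(const struct gfx_off_state *s)
{
        return s->disable_count == 0;
}
```

An unbalanced disable (resume without the matching late_init enable) would leave gfxoff permanently blocked, which is the "residency" concern raised in the reply above.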



RE: [PATCH 1/2] drm/amd/pm: Add information about SMU11 firmware version

2021-07-21 Thread Liang, Prike
[Public]

> -Original Message-
> From: Limonciello, Mario 
> Sent: Wednesday, July 21, 2021 11:15 AM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 1/2] drm/amd/pm: Add information about SMU11
> firmware version
>
> On 7/20/2021 22:07, Liang, Prike wrote:
> > [Public]
> >
> > In the SMU issue troubleshooting process we also can check the SMU
> > version by reading MP1 scratch register and  from long term we may
> > need put the SMC version collection in the debug sysfs
> > amdgpu_firmware_info
> >
> > In this patch fashion, we better use dev_dbg instead of dev_info for only
> debug purpose.
>
> Actually SMUv13 files have it at info level, which is why it was modeled this
> way.  Perhaps v13 should also decrease this to debug then.
>
[Prike] OK, the print level setting here doesn't seem like a big deal. Besides, 
based on your patch we may also need to clean up the SMU version mismatch print 
info in the header check and just emit a warning message like the following.

if (if_version != smu->smc_driver_if_version) {
dev_warn(smu->adev->dev, "SMU driver if version not matched\n");
}

> >
> >> -Original Message-
> >> From: amd-gfx  On Behalf Of
> >> Mario Limonciello
> >> Sent: Wednesday, July 21, 2021 12:18 AM
> >> To: amd-gfx@lists.freedesktop.org
> >> Cc: Limonciello, Mario 
> >> Subject: [PATCH 1/2] drm/amd/pm: Add information about SMU11
> firmware
> >> version
> >>
> >> This information is useful for root causing issues with S0ix.
> >>
> >> Signed-off-by: Mario Limonciello 
> >> ---
> >>   drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> >> b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> >> index 0a5d46ac9ccd..626d7c2bdf66 100644
> >> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> >> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> >> @@ -272,6 +272,9 @@ int smu_v11_0_check_fw_version(struct
> smu_context
> >> *smu)
> >>break;
> >>}
> >>
> >> + dev_info(smu->adev->dev, "smu fw reported version = 0x%08x
> >> (%d.%d.%d)\n",
> >> +  smu_version, smu_major, smu_minor, smu_debug);
> >> +
> >>/*
> >> * 1. if_version mismatch is not critical as our fw is designed
> >> * to be backward compatible.
> >> --
> >> 2.25.1
> >>
> >> ___
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 1/2] drm/amd/pm: Add information about SMU11 firmware version

2021-07-21 Thread Liang, Prike
[Public]

When troubleshooting SMU issues, we can also check the SMU version by reading 
the MP1 scratch register, and in the long term we may need to expose the SMC 
version in the amdgpu_firmware_info debugfs entry.

For this patch, we had better use dev_dbg instead of dev_info since the message 
is for debug purposes only.

> -Original Message-
> From: amd-gfx  On Behalf Of
> Mario Limonciello
> Sent: Wednesday, July 21, 2021 12:18 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario 
> Subject: [PATCH 1/2] drm/amd/pm: Add information about SMU11 firmware
> version
>
> This information is useful for root causing issues with S0ix.
>
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> index 0a5d46ac9ccd..626d7c2bdf66 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> @@ -272,6 +272,9 @@ int smu_v11_0_check_fw_version(struct
> smu_context *smu)
>   break;
>   }
>
> + dev_info(smu->adev->dev, "smu fw reported version = 0x%08x
> (%d.%d.%d)\n",
> +  smu_version, smu_major, smu_minor, smu_debug);
> +
>   /*
>* 1. if_version mismatch is not critical as our fw is designed
>* to be backward compatible.
> --
> 2.25.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amd/pm: skip PrepareMp1ForUnload message in s0ix

2021-06-28 Thread Liang, Prike
[AMD Official Use Only]

Thanks for updating this patch and removing the APU flag to avoid 
over-protection in this case.

Reviewed-by: Prike Liang 

> -Original Message-
> From: amd-gfx  On Behalf Of
> Shyam Sundar S K
> Sent: Monday, June 28, 2021 3:55 PM
> To: Deucher, Alexander ; Koenig, Christian
> ; airl...@linux.ie; dan...@ffwll.ch; Huang, Ray
> ; Hou, Xiaomeng (Matthew)
> ; Liu, Aaron 
> Cc: S-k, Shyam-sundar ; dri-
> de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Subject: [PATCH] drm/amd/pm: skip PrepareMp1ForUnload message in s0ix
>
> The documentation around PrepareMp1ForUnload message says that
> anything sent to SMU after this command would be stalled as the PMFW
> would not be in a state to take further job requests.
>
> Technically this is right in case of S3 scenario. But, this might not be the 
> case
> during s0ix as the PMC driver would be the last to send the SMU on the
> OS_HINT. If SMU gets a PrepareMp1ForUnload message before the OS_HINT,
> this would stall the entire S0ix process.
>
> Results show that, this message to SMU is not required during S0ix and hence
> skip it.
>
> Signed-off-by: Shyam Sundar S K 
> Acked-by: Huang Rui 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> index 7664334d8144..18a1ffdca227 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
> @@ -189,10 +189,11 @@ static int yellow_carp_init_smc_tables(struct smu_context *smu)
>  static int yellow_carp_system_features_control(struct smu_context *smu, bool en)
>  {
>   struct smu_feature *feature = &smu->smu_feature;
> + struct amdgpu_device *adev = smu->adev;
>   uint32_t feature_mask[2];
>   int ret = 0;
>
> - if (!en)
> + if (!en && !adev->in_s0ix)
>   ret = smu_cmn_send_smc_msg(smu,
> SMU_MSG_PrepareMp1ForUnload, NULL);
>
>   bitmap_zero(feature->enabled, feature->feature_num);
> --
> 2.25.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: check whether s2idle is enabled to determine s0ix

2021-03-30 Thread Liang, Prike
[AMD Public Use]

This issue should occur on a hybrid s0i3 system which forces mem_sleep to the 
deep level on an s0i3-enabled platform. We may need to use 
acpi_target_system_state() to identify the system's target sleep level and then 
handle the AMDGPU Sx[0..5] suspend/resume paths respectively.

Reviewed-by: Prike Liang 

Thanks,
Prike
> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Tuesday, March 30, 2021 1:41 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu: check whether s2idle is enabled to determine
> s0ix
>
> For legacy S3, we need to use different handling in the driver.
> Suggested by Heiko Przybyl.
>
> Bug:
> https://gitlab.freedesktop.org/drm/amd/-/issues/1553
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index 2e9b16fb3fcd..b64c002b9aa3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> @@ -906,7 +907,7 @@ bool amdgpu_acpi_is_s0ix_supported(struct
> amdgpu_device *adev)  #if defined(CONFIG_AMD_PMC) ||
> defined(CONFIG_AMD_PMC_MODULE)
>  if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) {
>  if (adev->flags & AMD_IS_APU)
> -return true;
> +return pm_suspend_default_s2idle();
>  }
>  #endif
>  return false;
> --
> 2.30.2
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: fix S0ix handling when the CONFIG_AMD_PMC=m

2021-03-10 Thread Liang, Prike
Just found out that the config variable is formed by header_print_symbol() in 
confdata.c, and the set value can be double-confirmed in 
include/generated/autoconf.h. Maybe we can set AMD_PMC to 'y' by default in the 
platform Kconfig.

Thanks,
Prike
> -Original Message-
> From: Liang, Prike
> Sent: Wednesday, March 10, 2021 4:21 PM
> To: Alex Deucher ; amd-
> g...@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: RE: [PATCH] drm/amdgpu: fix S0ix handling when the
> CONFIG_AMD_PMC=m
> 
> I did not find any Kconfig script in the in-tree repo that strips the config
> variables and adds the _MODULE suffix to module config variables. Not sure
> whether this _MODULE config variable is introduced by some specific environment.
> 
> Acked-by: Prike Liang 
> 
> > -Original Message-
> > From: amd-gfx  On Behalf Of
> > Alex Deucher
> > Sent: Wednesday, March 10, 2021 12:01 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander 
> > Subject: [PATCH] drm/amdgpu: fix S0ix handling when the
> > CONFIG_AMD_PMC=m
> >
> > Need to check the module variant as well.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > index 36a741d63ddc..2e9b16fb3fcd 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > @@ -903,7 +903,7 @@ void amdgpu_acpi_fini(struct amdgpu_device
> *adev)
> >   */
> >  bool amdgpu_acpi_is_s0ix_supported(struct amdgpu_device *adev)
> >  {
> > -#if defined(CONFIG_AMD_PMC)
> > +#if defined(CONFIG_AMD_PMC) || defined(CONFIG_AMD_PMC_MODULE)
> > if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) {
> > if (adev->flags & AMD_IS_APU)
> > return true;
> > --
> > 2.29.2
> >
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 4/7] drm/amdgpu: track what pmops flow we are in

2021-03-10 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

Maybe we can use the acpi_target_system_state() interface to identify the 
system-wide suspend target level Sx, and then parse the return code against the 
following macro definitions. If we have the bandwidth, we will update and refine 
the AMDGPU Sx[0..5] suspend/resume sequence.

#define ACPI_STATE_S0   (u8) 0
#define ACPI_STATE_S1   (u8) 1
#define ACPI_STATE_S2   (u8) 2
#define ACPI_STATE_S3   (u8) 3
#define ACPI_STATE_S4   (u8) 4
#define ACPI_STATE_S5   (u8) 5

Thanks,
Prike
> -Original Message-
> From: amd-gfx  On Behalf Of
> Bhardwaj, Rajneesh
> Sent: Wednesday, March 10, 2021 1:25 AM
> To: Alex Deucher ; Lazar, Lijo
> 
> Cc: Deucher, Alexander ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH 4/7] drm/amdgpu: track what pmops flow we are in
>
> pm_message_t events seem to be the right thing to use here instead of
> driver's privately managed power states. Please have a look
>
> https://elixir.bootlin.com/linux/v4.7/source/drivers/gpu/drm/i915/i915_drv.c#L714
>
> https://elixir.bootlin.com/linux/v4.7/source/drivers/gpu/drm/drm_sysfs.c#L43
>
> Thanks,
>
> Rajneesh
>
>
> On 3/9/2021 10:47 AM, Alex Deucher wrote:
> > [CAUTION: External Email]
> >
> > On Tue, Mar 9, 2021 at 1:19 AM Lazar, Lijo  wrote:
> >> [AMD Public Use]
> >>
> >> This seems a duplicate of dev_pm_info states. Can't we reuse that?
> > Are you referring to the PM_EVENT_ messages in
> > dev_pm_info.power_state?  Maybe.  I was not able to find much
> > documentation on how those should be used.  Do you know?
> >
> > Alex
> >
> >
> >> Thanks,
> >> Lijo
> >>
> >> -Original Message-
> >> From: amd-gfx  On Behalf Of
> >> Alex Deucher
> >> Sent: Tuesday, March 9, 2021 9:40 AM
> >> To: amd-gfx@lists.freedesktop.org
> >> Cc: Deucher, Alexander 
> >> Subject: [PATCH 4/7] drm/amdgpu: track what pmops flow we are in
> >>
> >> We reuse the same suspend and resume functions for all of the pmops
> states, so flag what state we are in so that we can alter behavior deeper in
> the driver depending on the current flow.
> >>
> >> Signed-off-by: Alex Deucher 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 20 +++-
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   | 58
> +++
> >>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |  3 +-
> >>   3 files changed, 70 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> index d47626ce9bc5..4ddc5cc983c7 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> @@ -347,6 +347,24 @@ int amdgpu_device_ip_block_add(struct
> >> amdgpu_device *adev,  bool amdgpu_get_bios(struct amdgpu_device
> >> *adev);  bool amdgpu_read_bios(struct amdgpu_device *adev);
> >>
> >> +/*
> >> + * PM Ops
> >> + */
> >> +enum amdgpu_pmops_state {
> >> +   AMDGPU_PMOPS_NONE = 0,
> >> +   AMDGPU_PMOPS_PREPARE,
> >> +   AMDGPU_PMOPS_COMPLETE,
> >> +   AMDGPU_PMOPS_SUSPEND,
> >> +   AMDGPU_PMOPS_RESUME,
> >> +   AMDGPU_PMOPS_FREEZE,
> >> +   AMDGPU_PMOPS_THAW,
> >> +   AMDGPU_PMOPS_POWEROFF,
> >> +   AMDGPU_PMOPS_RESTORE,
> >> +   AMDGPU_PMOPS_RUNTIME_SUSPEND,
> >> +   AMDGPU_PMOPS_RUNTIME_RESUME,
> >> +   AMDGPU_PMOPS_RUNTIME_IDLE,
> >> +};
> >> +
> >>   /*
> >>* Clocks
> >>*/
> >> @@ -1019,8 +1037,8 @@ struct amdgpu_device {
> >>  u8  
> >> reset_magic[AMDGPU_RESET_MAGIC_NUM];
> >>
> >>  /* s3/s4 mask */
> >> +   enum amdgpu_pmops_state pmops_state;
> >>  boolin_suspend;
> >> -   boolin_hibernate;
> >>
> >>  /*
> >>   * The combination flag in_poweroff_reboot_com used to
> >> identify the poweroff diff --git
> >> a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >> index 3e6bb7d79652..0312c52bd39d 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >> @

RE: [PATCH] drm/amdgpu: fix S0ix handling when the CONFIG_AMD_PMC=m

2021-03-10 Thread Liang, Prike
I did not find any Kconfig script in the in-tree repo that strips the config 
variables and adds the _MODULE suffix to module config variables. Not sure 
whether this _MODULE config variable is introduced by some specific environment.

Acked-by: Prike Liang 

> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Wednesday, March 10, 2021 12:01 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu: fix S0ix handling when the
> CONFIG_AMD_PMC=m
> 
> Need to check the module variant as well.
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index 36a741d63ddc..2e9b16fb3fcd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -903,7 +903,7 @@ void amdgpu_acpi_fini(struct amdgpu_device *adev)
>   */
>  bool amdgpu_acpi_is_s0ix_supported(struct amdgpu_device *adev)
>  {
> -#if defined(CONFIG_AMD_PMC)
> +#if defined(CONFIG_AMD_PMC) || defined(CONFIG_AMD_PMC_MODULE)
>   if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) {
>   if (adev->flags & AMD_IS_APU)
>   return true;
> --
> 2.29.2
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: fix the hibernation suspend with s0ix

2021-03-08 Thread Liang, Prike



> -Original Message-
> From: Alex Deucher 
> Sent: Tuesday, March 9, 2021 12:07 PM
> To: Liang, Prike 
> Cc: amd-gfx list ; Deucher, Alexander
> ; Huang, Ray 
> Subject: Re: [PATCH] drm/amdgpu: fix the hibernation suspend with s0ix
> 
> On Mon, Mar 8, 2021 at 10:52 PM Prike Liang  wrote:
> >
> > During system hibernation, suspend still needs to un-gate GFX CG/PG first
> > to handle the HW status check before the HW resources are destroyed.
> >
> > Signed-off-by: Prike Liang 
> 
> This is fine for stable, but we should work on cleaning this up.  I have a 
> patch
> set to improve this, but it's more invasive.  We really need to sort out what
> specific parts of
> amdgpu_device_ip_suspend_phase2() are problematic and special case
> them.  We shouldn't just be skipping that function.
[Prike] Yeah, at this stage we're just trying to make s0ix functional and 
stable. The AMDGPU working mode is aligned with the Windows KMD s0ix sequence 
and only suspends the DCE and IH blocks for s0i3 entry. We will try to figure 
out each GNB IP's idle-off dependency and then improve the AMDGPU 
suspend/resume sequence for system-wide Sx entry/exit.

> Acked-by: Alex Deucher 
> 
> Alex
> 
> 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index e247c3a..7079bfc 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2683,7 +2683,7 @@ static int
> > amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)  {
> > int i, r;
> >
> > -   if (adev->in_poweroff_reboot_com ||
> > +   if (adev->in_poweroff_reboot_com || adev->in_hibernate ||
> > !amdgpu_acpi_is_s0ix_supported(adev) || amdgpu_in_reset(adev))
> {
> > amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> > amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> > @@ -3750,7 +3750,7 @@ int amdgpu_device_suspend(struct drm_device
> > *dev, bool fbcon)
> >
> > amdgpu_fence_driver_suspend(adev);
> >
> > -   if (adev->in_poweroff_reboot_com ||
> > +   if (adev->in_poweroff_reboot_com || adev->in_hibernate ||
> > !amdgpu_acpi_is_s0ix_supported(adev) || amdgpu_in_reset(adev))
> > r = amdgpu_device_ip_suspend_phase2(adev);
> > else
> > --
> > 2.7.4
> >
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: Only check for S0ix if AMD_PMC is configured

2021-02-28 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Prike Liang 

> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Saturday, February 27, 2021 6:28 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu: Only check for S0ix if AMD_PMC is configured
>
> The S0ix check only makes sense if the AMD PMC driver is present.  We need
> to use the legacy S3 pathes when the PMC driver is not present.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index 8155c54392c8..36a741d63ddc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -903,10 +903,11 @@ void amdgpu_acpi_fini(struct amdgpu_device
> *adev)
>   */
>  bool amdgpu_acpi_is_s0ix_supported(struct amdgpu_device *adev)  {
> +#if defined(CONFIG_AMD_PMC)
>  if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) {
>  if (adev->flags & AMD_IS_APU)
>  return true;
>  }
> -
> +#endif
>  return false;
>  }
> --
> 2.29.2
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: fix shutdown with s0ix

2021-02-18 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

Thanks for the fix, Alex. I have also drafted a local fix for poweroff and 
shutdown on the s0ix enablement.
Besides this shutdown fix for the reboot process, we also need a similar one 
for the poweroff path.

So how about creating a new combined flag for the legacy PM poweroff() and 
shutdown() operations?

Thanks,
Prike
> -Original Message-
> From: Alex Deucher 
> Sent: Friday, February 19, 2021 1:11 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> 
> Subject: [PATCH] drm/amdgpu: fix shutdown with s0ix
>
> Shutdown needs to be handled differently from s0ix.  Add a new flag for
> shutdown and use it to adjust behavior appropriately.
>
> Bug:
> https://gitlab.freedesktop.org/drm/amd/-/issues/1499
> Fixes: 628c36d7b238e2 ("drm/amdgpu: update amdgpu device
> suspend/resume sequence for s0i3 support")
> Signed-off-by: Alex Deucher 
> Cc: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 2 ++
>  3 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index da258331e86b..7f5500d8e8f4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1018,6 +1018,7 @@ struct amdgpu_device {
>  /* s3/s4 mask */
>  boolin_suspend;
>  boolin_hibernate;
> +boolin_shutdown;
>
>  atomic_t in_gpu_reset;
>  enum pp_mp1_state   mp1_state;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7052dc35d278..ecd0201050ac 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2678,7 +2678,8 @@ static int
> amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)  {
>  int i, r;
>
> -if (!amdgpu_acpi_is_s0ix_supported(adev) ||
> amdgpu_in_reset(adev)) {
> +if (adev->in_shutdown ||
> +!amdgpu_acpi_is_s0ix_supported(adev) || amdgpu_in_reset(adev))
> {
>  amdgpu_device_set_pg_state(adev,
> AMD_PG_STATE_UNGATE);
>  amdgpu_device_set_cg_state(adev,
> AMD_CG_STATE_UNGATE);
>  }
> @@ -3741,7 +3742,8 @@ int amdgpu_device_suspend(struct drm_device
> *dev, bool fbcon)
>
>  amdgpu_fence_driver_suspend(adev);
>
> -if (!amdgpu_acpi_is_s0ix_supported(adev) ||
> amdgpu_in_reset(adev))
> +if (adev->in_shutdown ||
> +!amdgpu_acpi_is_s0ix_supported(adev) || amdgpu_in_reset(adev))
>  r = amdgpu_device_ip_suspend_phase2(adev);
>  else
>  amdgpu_gfx_state_change_set(adev,
> sGpuChangeState_D3Entry); diff --git
> a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 2ddaa72437e3..b44358e8dc5b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1265,6 +1265,7 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>  if (amdgpu_ras_intr_triggered())
>  return;
>
> +adev->in_shutdown = true;
>  /* if we are running in a VM, make sure the device
>   * torn down properly on reboot/shutdown.
>   * unfortunately we can't detect certain @@ -1274,6 +1275,7 @@
> amdgpu_pci_shutdown(struct pci_dev *pdev)
>  adev->mp1_state = PP_MP1_STATE_UNLOAD;
>  amdgpu_device_ip_suspend(adev);
>  adev->mp1_state = PP_MP1_STATE_NONE;
> +adev->in_shutdown = false;
>  }
>
>  static int amdgpu_pmops_prepare(struct device *dev)
> --
> 2.29.2



RE: [PATCH] drm/amdgpu: remove gpu info firmware of green sardine

2021-01-18 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

> -Original Message-
> From: Huang, Ray 
> Sent: Tuesday, January 19, 2021 2:57 PM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> 
> Subject: Re: [PATCH] drm/amdgpu: remove gpu info firmware of green
> sardine
>
> On Tue, Jan 19, 2021 at 02:25:36PM +0800, Liang, Prike wrote:
> > [AMD Official Use Only - Internal Distribution Only]
> >
> > Thanks for helping clean this up. Generally that seems fine, but could we
> > keep the green sardine chip name to retrieve the GPU info FW when IP
> > discovery falls back to legacy mode?
>
> Do you want to only clean MODULE_FIRMWARE(gpu_info.bin)? That's fine
> for me.
[Prike]  Yeah, it seems enough to just remove the green sardine GPU info FW 
declaration from the amdgpu driver module.
>
> Thanks,
> Ray
>
> >
> > Anyway this patch is Reviewed-by: Prike Liang 
> >
> > Thanks,
> > Prike
> > > -Original Message-
> > > From: Huang, Ray 
> > > Sent: Tuesday, January 19, 2021 1:52 PM
> > > To: amd-gfx@lists.freedesktop.org
> > > Cc: Deucher, Alexander ; Liang, Prike
> > > ; Huang, Ray 
> > > Subject: [PATCH] drm/amdgpu: remove gpu info firmware of green
> > > sardine
> > >
> > > The ip discovery is supported on green sardine, it doesn't need gpu
> > > info firmware anymore.
> > >
> > > Signed-off-by: Huang Rui 
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index 4d434803fb49..f1a426d8861d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -81,7 +81,6 @@
> MODULE_FIRMWARE("amdgpu/navi10_gpu_info.bin");
> > >  MODULE_FIRMWARE("amdgpu/navi14_gpu_info.bin");
> > >  MODULE_FIRMWARE("amdgpu/navi12_gpu_info.bin");
> > >  MODULE_FIRMWARE("amdgpu/vangogh_gpu_info.bin");
> > > -MODULE_FIRMWARE("amdgpu/green_sardine_gpu_info.bin");
> > >
> > >  #define AMDGPU_RESUME_MS2000
> > >
> > > @@ -1825,7 +1824,7 @@ static int
> > > amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)  if
> > > (adev->apu_flags & AMD_APU_IS_RENOIR)  chip_name = "renoir";  else
> > > -chip_name = "green_sardine";
> > > +return 0;
> > >  break;
> > >  case CHIP_NAVI10:
> > >  chip_name = "navi10";
> > > --
> > > 2.25.1
> >


RE: [PATCH] drm/amdgpu: remove gpu info firmware of green sardine

2021-01-18 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

Thanks for helping clean this up. Generally that seems fine, but could we keep 
the green sardine chip name to retrieve the GPU info FW when IP discovery falls 
back to legacy mode?

Anyway this patch is Reviewed-by: Prike Liang 

Thanks,
Prike
> -Original Message-
> From: Huang, Ray 
> Sent: Tuesday, January 19, 2021 1:52 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Liang, Prike
> ; Huang, Ray 
> Subject: [PATCH] drm/amdgpu: remove gpu info firmware of green sardine
>
> The ip discovery is supported on green sardine, it doesn't need gpu info
> firmware anymore.
>
> Signed-off-by: Huang Rui 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 4d434803fb49..f1a426d8861d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -81,7 +81,6 @@ MODULE_FIRMWARE("amdgpu/navi10_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/navi14_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/navi12_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/vangogh_gpu_info.bin");
> -MODULE_FIRMWARE("amdgpu/green_sardine_gpu_info.bin");
>
>  #define AMDGPU_RESUME_MS2000
>
> @@ -1825,7 +1824,7 @@ static int
> amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
>  if (adev->apu_flags & AMD_APU_IS_RENOIR)
>  chip_name = "renoir";
>  else
> -chip_name = "green_sardine";
> +return 0;
>  break;
>  case CHIP_NAVI10:
>  chip_name = "navi10";
> --
> 2.25.1



RE: [PATCH 2/2] drm/amdgpu/powerplay/smu10: drop unused variable

2020-11-15 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

Thanks for handling this; it should also have been handled in my last patch 
series. Sorry for this rebase slip-up; I will take more care with this.

Thanks,
Prike
> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Saturday, November 14, 2020 4:41 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH 2/2] drm/amdgpu/powerplay/smu10: drop unused variable
>
> Never used so drop it.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
> b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
> index 50308a5573e4..04226b1544e4 100644
> --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
> @@ -1441,8 +1441,6 @@ static int smu10_set_fine_grain_clk_vol(struct
> pp_hwmgr *hwmgr,
>
>  static int smu10_gfx_state_change(struct pp_hwmgr *hwmgr, uint32_t state)
> {
> -struct amdgpu_device *adev = hwmgr->adev;
> -
>  smum_send_msg_to_smc_with_parameter(hwmgr,
> PPSMC_MSG_GpuChangeState, state, NULL);
>
>  return 0;
> --
> 2.25.4
>


RE: [PATCH 4/5] drm/amdgpu: fix reset support for s0i3 enablement

2020-11-12 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

> -Original Message-
> From: Alex Deucher 
> Sent: Friday, November 13, 2020 3:51 AM
> To: Liang, Prike 
> Cc: amd-gfx list ; Deucher, Alexander
> ; Huang, Ray 
> Subject: Re: [PATCH 4/5] drm/amdgpu: fix reset support for s0i3 enablement
>
> On Thu, Nov 12, 2020 at 2:06 AM Prike Liang  wrote:
> >
> > update amdgpu device suspend sequence for gpu reset during s0i3 enable.
> >
> > Signed-off-by: Prike Liang 
>
> Maybe squash this one into patch 3?
>
> Alex
>
[Prike]  Yes, this patch only handles the GPU reset based on the s0i3 enablement
and will be merged into patch 3.
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index cf6a1b9..2f60b70 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2650,7 +2650,7 @@ static int
> > amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)  {
> > int i, r;
> >
> > -   if (!amdgpu_acpi_is_s0ix_supported()) {
> > +   if (!amdgpu_acpi_is_s0ix_supported() || amdgpu_in_reset(adev))
> > + {
> > amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> > amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> > }
> > @@ -3708,7 +3708,7 @@ int amdgpu_device_suspend(struct drm_device
> > *dev, bool fbcon)
> >
> > amdgpu_fence_driver_suspend(adev);
> >
> > -   if (!amdgpu_acpi_is_s0ix_supported())
> > +   if (!amdgpu_acpi_is_s0ix_supported() || amdgpu_in_reset(adev))
> > r = amdgpu_device_ip_suspend_phase2(adev);
> > else
> > amdgpu_gfx_state_change_set(adev,
> > sGpuChangeState_D3Entry);
> > --
> > 2.7.4
> >


RE: [PATCH 2/4] drm/amdgpu: add amdgpu_gfx_state_change_set() set gfx power change entry

2020-10-18 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

> -Original Message-
> From: Alex Deucher 
> Sent: Saturday, October 17, 2020 2:10 AM
> To: Liang, Prike 
> Cc: amd-gfx list ; Deucher, Alexander
> ; Huang, Ray 
> Subject: Re: [PATCH 2/4] drm/amdgpu: add amdgpu_gfx_state_change_set()
> set gfx power change entry
>
> On Fri, Oct 16, 2020 at 5:21 AM Prike Liang  wrote:
> >
> > The new amdgpu_gfx_state_change_set() funtion can support set GFX
> > power change status to D0/D3.
> >
> > Signed-off-by: Prike Liang 
> > Acked-by: Huang Rui 
> > Reviewed-by: Alex Deucher 
>
> I presume we'll need something similar for renoir?  That can be a follow up
> patch.
[Prike]  Yeah, I have drafted the amdgpu_gfx_state_change_set() wrapper outside of
the powerplay driver for common use. However, s0i3 has not been checked on
Renoir/Cezanne yet; if needed, it will be implemented in the Renoir SMU driver.
>
> Alex
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   | 20
> 
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |  7 +++
> >  drivers/gpu/drm/amd/include/kgd_pp_interface.h|  1 +
> >  drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 20
> 
> >  drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c |  9 +
> >  drivers/gpu/drm/amd/powerplay/inc/hwmgr.h |  1 +
> >  drivers/gpu/drm/amd/powerplay/inc/rv_ppsmc.h  |  3 ++-
> >  7 files changed, 60 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > index d612033..e1d6c8a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > @@ -789,3 +789,23 @@ void amdgpu_kiq_wreg(struct amdgpu_device
> *adev,
> > uint32_t reg, uint32_t v)
> >  failed_kiq_write:
> > pr_err("failed to write reg:%x\n", reg);  }
> > +
> > +/* amdgpu_gfx_state_change_set - Handle gfx power state change set
> > + * @adev: amdgpu_device pointer
> > + * @state: gfx power state(1 -eGpuChangeState_D0Entry and 2
> > +-eGpuChangeState_D3Entry)
> > + *
> > + */
> > +
> > +void amdgpu_gfx_state_change_set(struct amdgpu_device *adev, enum
> > +gfx_change_state state) {
> > +
> > +   mutex_lock(&adev->pm.mutex);
> > +
> > +   if (adev->powerplay.pp_funcs &&
> > +   adev->powerplay.pp_funcs->gfx_state_change_set)
> > +   ((adev)->powerplay.pp_funcs->gfx_state_change_set(
> > +   (adev)->powerplay.pp_handle,
> > + state));
> > +
> > +   mutex_unlock(&adev->pm.mutex);
> > +
> > +}
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index d43c116..73942b2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -47,6 +47,12 @@ enum gfx_pipe_priority {
> > AMDGPU_GFX_PIPE_PRIO_MAX
> >  };
> >
> > +/* Argument for PPSMC_MSG_GpuChangeState */ enum
> gfx_change_state {
> > +   GpuChangeState_D0Entry = 1,
> > +   GpuChangeState_D3Entry,
> > +};
> > +
> >  #define AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM  0  #define
> > AMDGPU_GFX_QUEUE_PRIORITY_MAXIMUM  15
> >
> > @@ -387,4 +393,5 @@ int amdgpu_gfx_cp_ecc_error_irq(struct
> amdgpu_device *adev,
> >   struct amdgpu_iv_entry *entry);
> > uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg);
> > void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t reg,
> > uint32_t v);
> > +void amdgpu_gfx_state_change_set(struct amdgpu_device *adev, enum
> > +gfx_change_state state);
> >  #endif
> > diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > index a7f92d0..e7b69dd 100644
> > --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > @@ -284,6 +284,7 @@ struct amd_pm_funcs {
> > int (*odn_edit_dpm_table)(void *handle, uint32_t type, long *input,
> uint32_t size);
> > int (*set_mp1_state)(void *handle, enum pp_mp1_state mp1_state);
> > int (*smu_i2c_bus_access)(void *handle, bool acquire);
> > +   int (*gfx_state_change_set)(void *handle, uint32_t state);
> >  /* export to DC */
> > u32 (*get_sclk)(void *ha

RE: [PATCH] drm/amdgpu/swsmu/smu12: fix force clock handling for mclk

2020-09-28 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Prike Liang 

Thanks,
Prike
> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Tuesday, September 29, 2020 2:19 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu/swsmu/smu12: fix force clock handling for
> mclk
>
> The state array is in the reverse order compared to other asics (high to low
> rather than low to high).
>
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1313
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
> index 55a254be5ac2..66c1026489be 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
> @@ -222,14 +222,16 @@ static int renoir_get_profiling_clk_mask(struct
> smu_context *smu,
>  *sclk_mask = 0;
>  } else if (level == AMD_DPM_FORCED_LEVEL_PROFILE_MIN_MCLK) {
>  if (mclk_mask)
> -*mclk_mask = 0;
> +/* mclk levels are in reverse order */
> +*mclk_mask = NUM_MEMCLK_DPM_LEVELS - 1;
>  } else if (level == AMD_DPM_FORCED_LEVEL_PROFILE_PEAK) {
>  if(sclk_mask)
>  /* The sclk as gfxclk and has three level about
> max/min/current */
>  *sclk_mask = 3 - 1;
>
>  if(mclk_mask)
> -*mclk_mask = NUM_MEMCLK_DPM_LEVELS - 1;
> +/* mclk levels are in reverse order */
> +*mclk_mask = 0;
>
>  if(soc_mask)
>  *soc_mask = NUM_SOCCLK_DPM_LEVELS - 1; @@ -
> 323,7 +325,7 @@ static int renoir_get_dpm_ultimate_freq(struct
> smu_context *smu,
>  case SMU_UCLK:
>  case SMU_FCLK:
>  case SMU_MCLK:
> -ret = renoir_get_dpm_clk_limited(smu, clk_type, 0,
> min);
> +ret = renoir_get_dpm_clk_limited(smu, clk_type,
> +NUM_MEMCLK_DPM_LEVELS - 1, min);
>  if (ret)
>  goto failed;
>  break;
> --
> 2.25.4
>


RE: [PATCH] drm/amdgpu/soc15: fix using ip discovery tables on renoir (v2)

2020-06-08 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

Regarding the gpu info inquiry failure during X startup, I have sent a follow-up fix
for the issue.

drm/amdgpu/soc15: fix nullptr issue in soc15_read_register() for reg base 
accessing

> -Original Message-
> From: Liang, Prike
> Sent: Monday, June 8, 2020 2:00 PM
> To: Alex Deucher ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: RE: [PATCH] drm/amdgpu/soc15: fix using ip discovery tables on
> renoir (v2)
>
> According to the reg_offset assignment in amdgpu_discovery_reg_base_init(), the
> reg_offset is calculated as an IP base address pointer; therefore the PWR IP base
> should map to adev->reg_offset[SMUIO_HWIP][0] + 1. Moreover, I am not sure
> whether the mapped address can (or needs to) be used to access
> PWR_MISC_CNTL_STATUS for controlling the GFX CGPG on Renoir.
>
> Based on the above change, the PWR IP access nullptr issue should be fixed,
> but we should hold this patch, since starting X hits another nullptr issue
> during amdgpu_info_ioctl().
>
> Thanks,
> Prike
> > -Original Message-
> > From: Alex Deucher 
> > Sent: Friday, June 5, 2020 11:40 PM
> > To: amd-gfx@lists.freedesktop.org; Liang, Prike 
> > Cc: Deucher, Alexander 
> > Subject: [PATCH] drm/amdgpu/soc15: fix using ip discovery tables on
> > renoir
> > (v2)
> >
> > The PWR block moved into SMUIO, so the ip discovery table doesn't have
> > an entry for PWR, but the register has the same absolute offset, so
> > just patch up the offsets after updating the offsets from the IP discovery
> table.
> >
> > v2: PWR became SMUIO block 1.  fix the mapping.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/soc15.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 623745b2d8b3..dd17a8422111 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -686,6 +686,9 @@ int soc15_set_ip_blocks(struct amdgpu_device
> *adev)
> >  DRM_WARN("failed to init reg base from ip
> discovery table, "
> >   "fallback to legacy init method\n");
> >  vega10_reg_base_init(adev);
> > +} else {
> > +/* PWR block was merged into SMUIO on
> > renoir and became SMUIO block 1 */
> > +adev->reg_offset[PWR_HWIP][0] = adev-
> > >reg_offset[SMUIO_HWIP][1];
> >  }
> >  }
> >  break;
> > --
> > 2.25.4



RE: [PATCH] drm/amdgpu/soc15: fix using ip discovery tables on renoir (v2)

2020-06-07 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

According to the reg_offset assignment in amdgpu_discovery_reg_base_init(), the
reg_offset is calculated as an IP base address pointer; therefore the PWR IP base
should map to adev->reg_offset[SMUIO_HWIP][0] + 1. Moreover, I am not sure
whether the mapped address can (or needs to) be used to access
PWR_MISC_CNTL_STATUS for controlling the GFX CGPG on Renoir.

Based on the above change, the PWR IP access nullptr issue should be fixed,
but we should hold this patch, since starting X hits another nullptr issue
during amdgpu_info_ioctl().

Thanks,
Prike
> -Original Message-
> From: Alex Deucher 
> Sent: Friday, June 5, 2020 11:40 PM
> To: amd-gfx@lists.freedesktop.org; Liang, Prike 
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu/soc15: fix using ip discovery tables on renoir
> (v2)
>
> The PWR block moved into SMUIO, so the ip discovery table doesn't have an
> entry for PWR, but the register has the same absolute offset, so just patch up
> the offsets after updating the offsets from the IP discovery table.
>
> v2: PWR became SMUIO block 1.  fix the mapping.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 623745b2d8b3..dd17a8422111 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -686,6 +686,9 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
>  DRM_WARN("failed to init reg base from ip
> discovery table, "
>   "fallback to legacy init method\n");
>  vega10_reg_base_init(adev);
> +} else {
> +/* PWR block was merged into SMUIO on
> renoir and became SMUIO block 1 */
> +adev->reg_offset[PWR_HWIP][0] = adev-
> >reg_offset[SMUIO_HWIP][1];
>  }
>  }
>  break;
> --
> 2.25.4



RE: [PATCH] drm/amdgpu: enable renoir discovery for gc info retrieved

2020-06-02 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

Ah, I was not aware of the enabling patch before. However, that patch seems unable
to fall back to the legacy gpu_info firmware load method when discovery is not
supported, and it may also miss destroying the discovery_bin object when the driver
shuts down.

Thanks,
Prike
> -Original Message-
> From: Alex Deucher 
> Sent: Tuesday, June 2, 2020 9:35 PM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Huang, Ray 
> Subject: Re: [PATCH] drm/amdgpu: enable renoir discovery for gc info
> retrieved
>
> On Mon, Jun 1, 2020 at 10:14 PM Liang, Prike  wrote:
> >
> > [AMD Official Use Only - Internal Distribution Only]
> >
> > Ping...
>
> Already enabled:
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-next&id=e467ab869f5783cf93d4cf24c6ac647cc29d1fb5
>
> Alex
>
> >
> > Thanks,
> > > -Original Message-
> > > From: Liang, Prike 
> > > Sent: Friday, May 29, 2020 11:28 AM
> > > To: amd-gfx@lists.freedesktop.org
> > > Cc: Deucher, Alexander ; Huang, Ray
> > > ; Liang, Prike 
> > > Subject: [PATCH] drm/amdgpu: enable renoir discovery for gc info
> > > retrieved
> > >
> > > Use ip discovery GC table instead of gpu info firmware for exporting
> > > gpu info to inquire interface.As Renoir discovery has same version
> > > with Navi1x therefore just enable it same way as Navi1x.
> > >
> > > Signed-off-by: Prike.Liang 
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23
> > > ---
> > >  1 file changed, 20 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index 2f0e8da..bff740ccd 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -1528,7 +1528,7 @@ static int
> > > amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)  {
> > > const char *chip_name;  char fw_name[30]; -int err;
> > > +int err, r;
> > >  const struct gpu_info_firmware_header_v1_0 *hdr;
> > >
> > >  adev->firmware.gpu_info_fw = NULL;
> > > @@ -1578,6 +1578,23 @@ static int
> > > amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
> > > chip_name = "arcturus";  break;  case CHIP_RENOIR:
> > > +if (amdgpu_discovery) {
> > > +/**
> > > + * For RENOIR series seems needn't reinitialize the reg base
> > > again as it already set during
> > > + * early init,if any concern here will need export
> > > amdgpu_discovery_init() for this case.
> > > + */
> > > +r = amdgpu_discovery_reg_base_init(adev);
> > > +if (r) {
> > > +DRM_WARN("failed to get ip discovery table,
> > > "
> > > +"fallback to get gpu info in legacy
> > > method\n");
> > > +goto legacy_gpuinfo;
> > > +}
> > > +
> > > +amdgpu_discovery_get_gfx_info(adev);
> > > +
> > > +return 0;
> > > +}
> > > +legacy_gpuinfo:
> > >  chip_name = "renoir";
> > >  break;
> > >  case CHIP_NAVI10:
> > > @@ -1617,7 +1634,7 @@ static int
> > > amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)  (const
> > > struct gpu_info_firmware_v1_0 *)(adev-
> > > >firmware.gpu_info_fw->data +
> > >
> > > le32_to_cpu(hdr->header.ucode_array_offset_bytes));
> > >
> > > -if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10) {
> > > +if (amdgpu_discovery && adev->asic_type >= CHIP_RENOIR
> > > && !r) {
> > >  amdgpu_discovery_get_gfx_info(adev);
> > >  goto parse_soc_bounding_box;
> > >  }
> > > @@ -3364,7 +3381,7 @@ void amdgpu_device_fini(struct
> amdgpu_device
> > > *adev)
> > >  sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);  if
> > > (IS_ENABLED(CONFIG_PERF_EVENTS))  amdgpu_pmu_fini(adev); -if
> > > (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
> > > +if (amdgpu_discovery && adev->asic_type >= CHIP_RENOIR)
> > >  amdgpu_discovery_fini(adev);
> > >  }
> > >
> > > --
> > > 2.7.4
> >


RE: [PATCH] drm/amdgpu: enable renoir discovery for gc info retrieved

2020-06-01 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]

Ping...

Thanks,
> -Original Message-
> From: Liang, Prike 
> Sent: Friday, May 29, 2020 11:28 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> ; Liang, Prike 
> Subject: [PATCH] drm/amdgpu: enable renoir discovery for gc info retrieved
>
> Use ip discovery GC table instead of gpu info firmware for exporting gpu info
> to inquire interface.As Renoir discovery has same version with Navi1x
> therefore just enable it same way as Navi1x.
>
> Signed-off-by: Prike.Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23
> ---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 2f0e8da..bff740ccd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1528,7 +1528,7 @@ static int
> amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)  {
>  const char *chip_name;
>  char fw_name[30];
> -int err;
> +int err, r;
>  const struct gpu_info_firmware_header_v1_0 *hdr;
>
>  adev->firmware.gpu_info_fw = NULL;
> @@ -1578,6 +1578,23 @@ static int
> amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
>  chip_name = "arcturus";
>  break;
>  case CHIP_RENOIR:
> +if (amdgpu_discovery) {
> +/**
> + * For RENOIR series seems needn't reinitialize the reg base
> again as it already set during
> + * early init,if any concern here will need export
> amdgpu_discovery_init() for this case.
> + */
> +r = amdgpu_discovery_reg_base_init(adev);
> +if (r) {
> +DRM_WARN("failed to get ip discovery table,
> "
> +"fallback to get gpu info in legacy
> method\n");
> +goto legacy_gpuinfo;
> +}
> +
> +amdgpu_discovery_get_gfx_info(adev);
> +
> +return 0;
> +}
> +legacy_gpuinfo:
>  chip_name = "renoir";
>  break;
>  case CHIP_NAVI10:
> @@ -1617,7 +1634,7 @@ static int
> amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
>  (const struct gpu_info_firmware_v1_0 *)(adev-
> >firmware.gpu_info_fw->data +
>
> le32_to_cpu(hdr->header.ucode_array_offset_bytes));
>
> -if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10) {
> +if (amdgpu_discovery && adev->asic_type >= CHIP_RENOIR
> && !r) {
>  amdgpu_discovery_get_gfx_info(adev);
>  goto parse_soc_bounding_box;
>  }
> @@ -3364,7 +3381,7 @@ void amdgpu_device_fini(struct amdgpu_device
> *adev)
>  sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
>  if (IS_ENABLED(CONFIG_PERF_EVENTS))
>  amdgpu_pmu_fini(adev);
> -if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
> +if (amdgpu_discovery && adev->asic_type >= CHIP_RENOIR)
>  amdgpu_discovery_fini(adev);
>  }
>
> --
> 2.7.4



RE: [PATCH] drm/amdgpu: fix the hw hang during perform system reboot and reset

2020-04-13 Thread Liang, Prike
> 
> On Mon, Apr 13, 2020 at 2:17 PM Paul Menzel  g...@molgen.mpg.de> wrote:
> >
> > Dear Alex, dear Prike,
> >
> >
> > Am 13.04.20 um 17:14 schrieb Alex Deucher:
> > > On Mon, Apr 13, 2020 at 11:09 AM Prike Liang 
> wrote:
> > >>
> > >> Unify set device CGPG to ungate state before enter poweroff or reset.
> > >>
> > >> Signed-off-by: Prike Liang 
> > >> Tested-by: Mengbing Wang 
> > >
> > > Acked-by: Alex Deucher 
> >
> > First:
> >
> > Tested-by: Paul Menzel  (MSI B350M MORTAR
> > (MS-7A37) with an AMD Ryzen 3 2200G)
> >
> > Second, I am having trouble to understand, how you can add your
> > Acked-by tag to a commit with such a commit message?
> >
> > The problem is not described (apparently it only affected certain
> > devices), it is not mentioned that it’s a regression (Fixes: tag/line
> > is missing), and I am having a hard time to understand the commit
> > message at all (and the one from the commit introducing the
> > regression). Why is it more or less reverting part of the other
> > commit, while the issue was not reproducible on Prike’s system?
> 
> The original issue was that we were not properly ungating some of the hw
> blocks in the right order for S3 suspend on renoir.  So the fix was to add
> ungate calls to amdgpu_device_suspend() to handle that case.
> However, the original fix should not have removed the calls from
> amdgpu_device_ip_suspend_phase1() since that is called separately for
> some other use cases (e.g., pci shutdown).  It didn't matter for some asics as
> they don't have different levels of powergating functionality.  I'll add the 
> fixes
> tag before the patch goes upstream.
> 
> Alex
> 
[Prike]  Thanks, Alex, for helping clarify; I will give more detail in the commit message.
> >
> > >> ---
> > >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
> > >>   1 file changed, 2 insertions(+)
> > >>
> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >> index 87f7c12..bbe090a 100644
> > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >> @@ -2413,6 +2413,8 @@ static int
> amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
> > >>   {
> > >>  int i, r;
> > >>
> > >> +   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> > >> +   amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> > >>
> > >>  for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
> > >>  if (!adev->ip_blocks[i].status.valid)
> > >> --
> > >> 2.7.4
> >
> > Kind regards,
> >
> > Paul


RE: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video playback (v2)

2020-04-12 Thread Liang, Prike
Could you share the PCI sub-revision? I tried to check the issue on Vega10
(1002:687f) but could not reproduce the reboot hang.

Thanks,
Prike
From: Pan, Xinhui 
Sent: Sunday, April 12, 2020 2:58 PM
To: Johannes Hirte ; Liang, Prike 

Cc: Deucher, Alexander ; Huang, Ray 
; Quan, Evan ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video 
playback (v2)

Prike
I hit this issue too. reboot hung with my vega10.  it is ok with navi10.

From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 on behalf of Liang, Prike mailto:prike.li...@amd.com>>
Sent: Sunday, April 12, 2020 11:49:39 AM
To: Johannes Hirte 
mailto:johannes.hi...@datenkhaos.de>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; Huang, Ray 
mailto:ray.hu...@amd.com>>; Quan, Evan 
mailto:evan.q...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
mailto:amd-gfx@lists.freedesktop.org>>
Subject: RE: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video 
playback (v2)

Thanks for the update and verification. Could you give more detailed information
and the error log messages about the issue you observed?

Thanks,
Prike
> -Original Message-
> From: Johannes Hirte 
> mailto:johannes.hi...@datenkhaos.de>>
> Sent: Sunday, April 12, 2020 7:56 AM
> To: Liang, Prike mailto:prike.li...@amd.com>>
> Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; 
> Deucher, Alexander
> mailto:alexander.deuc...@amd.com>>; Huang, Ray 
> mailto:ray.hu...@amd.com>>;
> Quan, Evan mailto:evan.q...@amd.com>>
> Subject: Re: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video
> playback (v2)
>
> On 2020 Apr 07, Prike Liang wrote:
> > The system will be hang up during S3 suspend because of SMU is pending
> > for GC not respose the register CP_HQD_ACTIVE access request.This
> > issue root cause of accessing the GC register under enter GFX CGGPG
> > and can be fixed by disable GFX CGPG before perform suspend.
> >
> > v2: Use disable the GFX CGPG instead of RLC safe mode guard.
> >
> > Signed-off-by: Prike Liang mailto:prike.li...@amd.com>>
> > Tested-by: Mengbing Wang 
> > mailto:mengbing.w...@amd.com>>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 2e1f955..bf8735b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2440,8 +2440,6 @@ static int
> > amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)  {
> >  int i, r;
> >
> > -   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> > -   amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> >
> >  for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
> >  if (!adev->ip_blocks[i].status.valid)
> > @@ -3470,6 +3468,9 @@ int amdgpu_device_suspend(struct drm_device
> *dev, bool fbcon)
> >  }
> >  }
> >
> > +   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> > +   amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> > +
> >  amdgpu_amdkfd_suspend(adev, !fbcon);
> >
> >  amdgpu_ras_suspend(adev);
>
>
> This breaks shutdown/reboot on my system (Dell latitude 5495).
>
> --
> Regards,
>   Johannes Hirte



RE: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video playback (v2)

2020-04-11 Thread Liang, Prike
Thanks for the update and verification. Could you give more detailed information
and the error log messages about the issue you observed?

Thanks,
Prike
> -Original Message-
> From: Johannes Hirte 
> Sent: Sunday, April 12, 2020 7:56 AM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Huang, Ray ;
> Quan, Evan 
> Subject: Re: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video
> playback (v2)
> 
> On 2020 Apr 07, Prike Liang wrote:
> > The system will be hang up during S3 suspend because of SMU is pending
> > for GC not respose the register CP_HQD_ACTIVE access request.This
> > issue root cause of accessing the GC register under enter GFX CGGPG
> > and can be fixed by disable GFX CGPG before perform suspend.
> >
> > v2: Use disable the GFX CGPG instead of RLC safe mode guard.
> >
> > Signed-off-by: Prike Liang 
> > Tested-by: Mengbing Wang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 2e1f955..bf8735b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2440,8 +2440,6 @@ static int
> > amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)  {
> > int i, r;
> >
> > -   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> > -   amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> >
> > for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
> > if (!adev->ip_blocks[i].status.valid)
> > @@ -3470,6 +3468,9 @@ int amdgpu_device_suspend(struct drm_device
> *dev, bool fbcon)
> > }
> > }
> >
> > +   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> > +   amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> > +
> > amdgpu_amdkfd_suspend(adev, !fbcon);
> >
> > amdgpu_ras_suspend(adev);
> 
> 
> This breaks shutdown/reboot on my system (Dell latitude 5495).
> 
> --
> Regards,
>   Johannes Hirte



RE: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-07 Thread Liang, Prike


> -Original Message-
> From: Kuehling, Felix 
> Sent: Tuesday, April 7, 2020 11:43 PM
> To: Liang, Prike ; amd-gfx@lists.freedesktop.org;
> Huang, Ray 
> Cc: Deucher, Alexander ; Quan, Evan
> 
> Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> playback
> 
> Sorry, I missed this email thread because the subject seemed irrelevant to
> me. I still don't get why this is causing a problem with suspend/resume with
> video playback.
> 
> The functions you're changing are mostly used when running without HWS.
> This should only be the case during bring-ups or while debugging HWS issues.
> Otherwise they're only used for setting up the HIQ. That means in normal
> operation, these functions should not be used for user mode queue mapping,
> which is handled by the HWS.
[Prike]  This issue is caused by improperly accessing the CP_HQD_ACTIVE register
after GFX has entered CGPG, while destroying the MQD at the amdkfd suspend stage.

This solution may over-guard some of the MQD setup and occupancy checks.
It is likely a common issue, so I have drafted a v2 patch that disables GFX
CGPG directly before the amdgpu suspend operation.

Thanks,
Prike

> Ray, I vaguely remember we discussed using KIQ for mapping the HIQ at
> some point. Did anyone ever propose a patch for that?
> 
> Thanks,
>   Felix
> 
> Am 2020-04-03 um 12:07 a.m. schrieb Prike Liang:
> > The system hangs during S3 because the SMU is pending on GC, which does
> > not respond to the CP_HQD_ACTIVE register access request. This issue can
> > be fixed by adding an RLC safe mode guard around each HQD map/unmap
> > retrieve operation.
> >
> > Signed-off-by: Prike Liang 
> > Tested-by: Mengbing Wang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > index df841c2..e265063 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void
> *mqd, uint32_t pipe_id,
> > uint32_t *mqd_hqd;
> > uint32_t reg, hqd_base, data;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > m = get_mqd(mqd);
> >
> > acquire_queue(kgd, pipe_id, queue_id); @@ -299,6 +300,7 @@ int
> > kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> >
> > release_queue(kgd);
> >
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return 0;
> >  }
> >
> > @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev
> *kgd, uint64_t queue_address,
> > bool retval = false;
> > uint32_t low, high;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > acquire_queue(kgd, pipe_id, queue_id);
> > act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
> > if (act) {
> > @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev
> *kgd, uint64_t queue_address,
> > retval = true;
> > }
> > release_queue(kgd);
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return retval;
> >  }
> >
> > @@ -541,6 +545,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd,
> void *mqd,
> > uint32_t temp;
> > struct v9_mqd *m = get_mqd(mqd);
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > if (adev->in_gpu_reset)
> > return -EIO;
> >
> > @@ -577,6 +582,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd,
> void *mqd,
> > }
> >
> > release_queue(kgd);
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return 0;
> >  }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index 1fea077..ee107d9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -3533,6 +3533,7 @@ static int gfx_v9_0_kiq_init_register(struct
> amdgpu_ring *ring)
> > struct v9_mqd *mqd = ring->mqd_ptr;
> > int j;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > /* disable wptr polling */
> > WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
> >
> > @@ -3629,6 +3630,7 @@ static int gfx_v9_0_kiq_init_register(struct
> amdgpu_ring *ring)
> > if (ring->use_doorbell)
>

RE: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-07 Thread Liang, Prike



> -Original Message-
> From: Huang, Ray 
> Sent: Tuesday, April 7, 2020 4:03 PM
> To: Liang, Prike 
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Quan, Evan ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> playback
> 
> On Tue, Apr 07, 2020 at 01:49:43PM +0800, Liang, Prike wrote:
> >
> > > -Original Message-
> > > From: Huang, Ray 
> > > Sent: Friday, April 3, 2020 6:29 PM
> > > To: Liang, Prike 
> > > Cc: Deucher, Alexander ; Kuehling, Felix
> > > ; Quan, Evan ; amd-
> > > g...@lists.freedesktop.org
> > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with
> > > video playback
> > >
> > > On Fri, Apr 03, 2020 at 06:05:55PM +0800, Huang Rui wrote:
> > > > On Fri, Apr 03, 2020 at 05:22:28PM +0800, Liang, Prike wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Huang, Ray 
> > > > > > Sent: Friday, April 3, 2020 2:27 PM
> > > > > > To: Liang, Prike 
> > > > > > Cc: amd-gfx@lists.freedesktop.org; Quan, Evan
> > > ;
> > > > > > Deucher, Alexander ; Kuehling,
> > > > > > Felix 
> > > > > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend
> > > > > > with video playback
> > > > > >
> > > > > > (+ Felix)
> > > > > >
> > > > > > On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > > > > > > The system hangs during S3 because the SMU is pending on GC,
> > > > > > > which does not respond to the CP_HQD_ACTIVE register access
> > > > > > > request. This issue can be fixed by adding an RLC safe mode
> > > > > > > guard around each HQD map/unmap retrieve operation.
> > > > > >
> > > > > > We need more information on the issue: is the map/unmap done
> > > > > > with MAP_QUEUES/UNMAP_QUEUES packets, by writing MMIO, or both?
> > > > > >
> > > > > [Prike]  At the hang, MP1 was trying to read the
> > > > > RSMU_RESIDENCY_COUNTER_GC register but got no response from GFX,
> > > > > since GFX was busy reading the CP_HQD_ACTIVE register.
> > > > > Moreover, the issue is not seen when GFXOFF is disabled, so a
> > > > > register access is likely performed at the GFXOFF CGPG/CGCG entry
> > > > > stage. For this particular issue, it seems to be just an MMIO
> > > > > access failure occurring during the queue map/unmap status check.
> > > > >
> > > >
> > > > When we start S3, we disable gfxoff at the start of suspend. At
> > > > this point, the gfx should always be in the "on" state.
> > > >
> > > > > > From your patch, you just protect the kernel kiq and user queue.
> > > > > > What about other kernel compute queues? HIQ?
> > > > > >
> > > > > [Prike] So far, only the KIQ/CPQ/DIQ map/unmap paths query the
> > > > > CP_HQD_ACTIVE status via MMIO access, so for now only the KIQ and
> > > > > certain user queue types are guarded. HIQ map and unmap instead
> > > > > use the method of submitting a configuration packet.
> > > > >
> > > >
> > > > KIQ init/uninit itself should always be under the gfx-on state. Can
> > > > you check the result without adding enter/exit RLC safe mode around it?
> > >
> > > Wait... In your case, the system didn't load any user queues because
> > > no ROCm-based application is running. So the issue is probably
> > > caused by KIQ init/uninit itself, can you confirm?
> > [Prike]  This improper register access happens while performing MQD
> > destroy during the amdkfd suspend period. The KIQ init/uninit process may
> > not need the RLC guard, as GFX CGPG has already been disabled early in
> > the suspend period.
> 
> How about moving the gfxoff/cgpg disabling below ahead of
> amdgpu_amdkfd_suspend?
> 
> amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> 
> amdgpu_amdkfd_suspend(adev, !fbcon);
> 
> We should disable gfxoff/cgpg first to avoid the MMIO access.
> 
[Prike]  Generally speaking, it's fine to un-gate CGPG before each GFX MMIO
access.

RE: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-06 Thread Liang, Prike


> -Original Message-
> From: Huang, Ray 
> Sent: Friday, April 3, 2020 6:29 PM
> To: Liang, Prike 
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Quan, Evan ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> playback
> 
> On Fri, Apr 03, 2020 at 06:05:55PM +0800, Huang Rui wrote:
> > On Fri, Apr 03, 2020 at 05:22:28PM +0800, Liang, Prike wrote:
> > >
> > > > -Original Message-
> > > > From: Huang, Ray 
> > > > Sent: Friday, April 3, 2020 2:27 PM
> > > > To: Liang, Prike 
> > > > Cc: amd-gfx@lists.freedesktop.org; Quan, Evan
> ;
> > > > Deucher, Alexander ; Kuehling, Felix
> > > > 
> > > > Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with
> > > > video playback
> > > >
> > > > (+ Felix)
> > > >
> > > > On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > > > > The system hangs during S3 because the SMU is pending on GC, which
> > > > > does not respond to the CP_HQD_ACTIVE register access request. This
> > > > > issue can be fixed by adding an RLC safe mode guard around each HQD
> > > > > map/unmap retrieve operation.
> > > >
> > > > We need more information on the issue: is the map/unmap done with
> > > > MAP_QUEUES/UNMAP_QUEUES packets, by writing MMIO, or both?
> > > >
> > > [Prike]  At the hang, MP1 was trying to read the
> > > RSMU_RESIDENCY_COUNTER_GC register but got no response from GFX, since
> > > GFX was busy reading the CP_HQD_ACTIVE register.
> > > Moreover, the issue is not seen when GFXOFF is disabled, so a register
> > > access is likely performed at the GFXOFF CGPG/CGCG entry stage. For
> > > this particular issue, it seems to be just an MMIO access failure
> > > occurring during the queue map/unmap status check.
> > >
> >
> > When we start S3, we disable gfxoff at the start of suspend. At this
> > point, the gfx should always be in the "on" state.
> >
> > > > From your patch, you just protect the kernel kiq and user queue.
> > > > What about other kernel compute queues? HIQ?
> > > >
> > > [Prike] So far, only the KIQ/CPQ/DIQ map/unmap paths query the
> > > CP_HQD_ACTIVE status via MMIO access, so for now only the KIQ and
> > > certain user queue types are guarded. HIQ map and unmap instead use
> > > the method of submitting a configuration packet.
> > >
> >
> > KIQ init/uninit itself should always be under the gfx-on state. Can you
> > check the result without adding enter/exit RLC safe mode around it?
> 
> Wait... In your case, the system didn't load any user queues because no
> ROCm-based application is running. So the issue is probably caused by KIQ
> init/uninit itself, can you confirm?
[Prike]  This improper register access happens while performing MQD destroy
during the amdkfd suspend period. The KIQ init/uninit process may not need the
RLC guard, as GFX CGPG has already been disabled early in the suspend period.

If there is concern about over-guarding the other cases, I will send a patch
to simplify it.
> 
> Thanks,
> Ray
> 
> >
> > Hi Felix, maybe we need to use packets with kiq to map all user queues.
> >
> > Thanks,
> > Ray
> >
> > > > Thanks,
> > > > Ray
> > > >
> > > > >
> > > > > Signed-off-by: Prike Liang 
> > > > > Tested-by: Mengbing Wang 
> > > > > ---
> > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6
> ++
> > > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> > > > >  2 files changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > index df841c2..e265063 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > > > > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd,
> > > > > void
> > > > *mqd, uint32_t pipe_id,
> > > > >   uint32_t *mqd_hqd;
> > > > >   uint32_t reg, hqd_base, data;
> > > > >
> > > > > + amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > > >   m = get_mqd(mqd);
> > > > >
> > > > >   

RE: [PATCH] drm/amdgpu: fix gfx hang during suspend with video playback

2020-04-03 Thread Liang, Prike


> -Original Message-
> From: Huang, Ray 
> Sent: Friday, April 3, 2020 2:27 PM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Quan, Evan ;
> Deucher, Alexander ; Kuehling, Felix
> 
> Subject: Re: [PATCH] drm/amdgpu: fix gfx hang during suspend with video
> playback
> 
> (+ Felix)
> 
> On Fri, Apr 03, 2020 at 12:07:53PM +0800, Liang, Prike wrote:
> > The system hangs during S3 because the SMU is pending on GC, which does
> > not respond to the CP_HQD_ACTIVE register access request. This issue can
> > be fixed by adding an RLC safe mode guard around each HQD map/unmap
> > retrieve operation.
> 
> We need more information on the issue: is the map/unmap done with
> MAP_QUEUES/UNMAP_QUEUES packets, by writing MMIO, or both?
> 
[Prike]  At the hang, MP1 was trying to read the RSMU_RESIDENCY_COUNTER_GC
register but got no response from GFX, since GFX was busy reading the
CP_HQD_ACTIVE register.
Moreover, the issue is not seen when GFXOFF is disabled, so a register access
is likely performed at the GFXOFF CGPG/CGCG entry stage. For this particular
issue, it seems to be just an MMIO access failure occurring during the queue
map/unmap status check.

> From your patch, you just protect the kernel kiq and user queue. What about
> other kernel compute queues? HIQ?
> 
[Prike] So far, only the KIQ/CPQ/DIQ map/unmap paths query the CP_HQD_ACTIVE
status via MMIO access, so for now only the KIQ and certain user queue types
are guarded. HIQ map and unmap instead use the method of submitting a
configuration packet.

> Thanks,
> Ray
> 
> >
> > Signed-off-by: Prike Liang 
> > Tested-by: Mengbing Wang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > index df841c2..e265063 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> > @@ -232,6 +232,7 @@ int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void
> *mqd, uint32_t pipe_id,
> > uint32_t *mqd_hqd;
> > uint32_t reg, hqd_base, data;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > m = get_mqd(mqd);
> >
> > acquire_queue(kgd, pipe_id, queue_id); @@ -299,6 +300,7 @@ int
> > kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> >
> > release_queue(kgd);
> >
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return 0;
> >  }
> >
> > @@ -497,6 +499,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev
> *kgd, uint64_t queue_address,
> > bool retval = false;
> > uint32_t low, high;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > acquire_queue(kgd, pipe_id, queue_id);
> > act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
> > if (act) {
> > @@ -508,6 +511,7 @@ bool kgd_gfx_v9_hqd_is_occupied(struct kgd_dev
> *kgd, uint64_t queue_address,
> > retval = true;
> > }
> > release_queue(kgd);
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return retval;
> >  }
> >
> > @@ -541,6 +545,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd,
> void *mqd,
> > uint32_t temp;
> > struct v9_mqd *m = get_mqd(mqd);
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > if (adev->in_gpu_reset)
> > return -EIO;
> >
> > @@ -577,6 +582,7 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd,
> void *mqd,
> > }
> >
> > release_queue(kgd);
> > +   amdgpu_gfx_rlc_exit_safe_mode(adev);
> > return 0;
> >  }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index 1fea077..ee107d9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -3533,6 +3533,7 @@ static int gfx_v9_0_kiq_init_register(struct
> amdgpu_ring *ring)
> > struct v9_mqd *mqd = ring->mqd_ptr;
> > int j;
> >
> > +   amdgpu_gfx_rlc_enter_safe_mode(adev);
> > /* disable wptr polling */
> > WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
> >
> > @@ -3629,6 +3630,7 @@ static int gfx_v9_0_kiq_init_register(struct
> amdgpu_ring *ring)
> > if (ring->use_doorbell)
> > WREG32_FIELD15(GC, 0, CP_PQ_STATUS, DOORBELL_ENABLE,
> 1);

RE: [PATCH 1/2] drm/amd/powerplay: fix pre-check condition for setting clock range

2020-03-09 Thread Liang, Prike



> -Original Message-
> From: Bjorn Helgaas 
> Sent: Monday, March 9, 2020 9:11 PM
> To: Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Quan, Evan ;
> Huang, Ray ; linux-ker...@vger.org; Deucher,
> Alexander 
> Subject: Re: [PATCH 1/2] drm/amd/powerplay: fix pre-check condition for
> setting clock range
> 
> On Wed, Mar 04, 2020 at 10:55:37AM +0800, Prike Liang wrote:
> > This fix will handle some MP1 FW issue like as mclk dpm table in
> > renoir has a reverse dpm clock layout and a zero frequency dpm level
> > as following case.
> >
> > cat pp_dpm_mclk
> > 0: 1200Mhz
> > 1: 1200Mhz
> > 2: 800Mhz
> > 3: 0Mhz
> >
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 2 +-
> > drivers/gpu/drm/amd/powerplay/smu_v12_0.c  | 3 ---
> >  2 files changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > index e3398f9..d454493 100644
> > --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > @@ -214,7 +214,7 @@ int smu_set_soft_freq_range(struct smu_context
> > *smu, enum smu_clk_type clk_type,  {
> > int ret = 0;
> >
> > -   if (min <= 0 && max <= 0)
> > +   if (min < 0 && max < 0)
> 
> This change causes the following Coverity warning because min and max are
> both unsigned:
> 
> int smu_set_soft_freq_range(struct smu_context *smu, enum smu_clk_type
> clk_type,
> uint32_t min, uint32_t max)
> 
> >>> CID 1460516:  Integer handling issues  (NO_EFFECT)
> >>> This less-than-zero comparison of an unsigned value is never true.
> "min < 0U".
[Prike] Thanks, I will fix the Coverity warning.
> 225 if (min < 0 && max < 0)
> 226 return -EINVAL;
> 
> > return -EINVAL;
> >
> > if (!smu_clk_dpm_is_enabled(smu, clk_type))


RE: [PATCH] drm/amd/powerplay: suppress nonsupport profile mode overrun message

2019-12-18 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]



> -Original Message-
> From: Dai, Yuxian (David) 
> Sent: Thursday, December 19, 2019 3:39 PM
> To: Huang, Ray ; Liang, Prike 
> Cc: amd-gfx@lists.freedesktop.org; Quan, Evan 
> Subject: RE: [PATCH] drm/amd/powerplay: suppress nonsupport profile
> mode overrun message
> 
> As Ray pointed out, we should set SMU_MSG_SetWorkloadMask with a specified
> value to indicate unsupported. But currently the value is the system error
> value "-EINVAL", and the firmware may respond to the driver with an
> unexpected action.
[Prike] If an unsupported profile mode is requested, we exit without issuing
the error value you mentioned.
> 
> -Original Message-
> From: Huang, Ray 
> Sent: Thursday, December 19, 2019 3:17 PM
> To: Dai, Yuxian (David) 
> Cc: Liang, Prike ; amd-gfx@lists.freedesktop.org;
> Quan, Evan 
> Subject: Re: [PATCH] drm/amd/powerplay: suppress nonsupport profile
> mode overrun message
> 
> [AMD Official Use Only - Internal Distribution Only]
> 
> On Thu, Dec 19, 2019 at 03:04:12PM +0800, Dai, Yuxian (David) wrote:
> > Since we don't support the mode, we shouldn't print the error message or
> > treat it as an error. For the log message, "error" is a high severity
> > level; changing it from "error" to "warning" would be much better.
> >
> >
> > -Original Message-
> > From: Liang, Prike 
> > Sent: Thursday, December 19, 2019 2:46 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Quan, Evan ; Huang, Ray
> ;
> > Dai, Yuxian (David) ; Liang, Prike
> > 
> > Subject: [PATCH] drm/amd/powerplay: suppress nonsupport profile mode
> > overrun message
> >
> > SMU12 does not support WORKLOAD_DEFAULT_BIT and
> > WORKLOAD_PPLIB_POWER_SAVING_BIT.
> >
> 
> Probably the SMU firmware doesn't expose the feature mask to the driver.
> Can you confirm with the SMU firmware team whether this feature is really
> disabled in SMU12? If so, in my view, issuing the SMU_MSG_SetWorkloadMask
> message with an unsupported state doesn't make sense.
> 
> Just working around this with a one-time warning is not a good solution.
> 
> Thanks,
> Ray
> 
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/powerplay/renoir_ppt.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> > b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> > index 784903a3..f9a1817 100644
> > --- a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> > +++ b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> > @@ -550,14 +550,18 @@ static int renoir_set_power_profile_mode(struct
> smu_context *smu, long *input, u
> > /* conv PP_SMC_POWER_PROFILE* to WORKLOAD_PPLIB_*_BIT */
> > workload_type = smu_workload_get_type(smu, smu-
> >power_profile_mode);
> > if (workload_type < 0) {
> > -   pr_err("Unsupported power profile mode %d on
> RENOIR\n",smu->power_profile_mode);
> > +   /*
> > +* TODO: If some case need switch to powersave/default
> power mode
> > +* then can consider enter
> WORKLOAD_COMPUTE/WORKLOAD_CUSTOM for power saving.
> > +*/
> > +   pr_err_once("Unsupported power profile mode %d on
> > +RENOIR\n",smu->power_profile_mode);
> > return -EINVAL;
> > }
> >
> > ret = smu_send_smc_msg_with_param(smu,
> SMU_MSG_SetWorkloadMask,
> > 1 << workload_type);
> > if (ret) {
> > -   pr_err("Fail to set workload type %d\n", workload_type);
> > +   pr_err_once("Fail to set workload type %d\n",
> workload_type);
> > return ret;
> > }
> >
> > --
> > 2.7.4
> >


RE: [PATCH] drm/amd/powerplay: suppress nonsupport profile mode overrun message

2019-12-18 Thread Liang, Prike
[AMD Official Use Only - Internal Distribution Only]



> -Original Message-
> From: Huang, Ray 
> Sent: Thursday, December 19, 2019 3:17 PM
> To: Dai, Yuxian (David) 
> Cc: Liang, Prike ; amd-gfx@lists.freedesktop.org;
> Quan, Evan 
> Subject: Re: [PATCH] drm/amd/powerplay: suppress nonsupport profile
> mode overrun message
> 
> [AMD Official Use Only - Internal Distribution Only]
> 
> On Thu, Dec 19, 2019 at 03:04:12PM +0800, Dai, Yuxian (David) wrote:
> > Since we don't support the mode, we shouldn't print the error message or
> > treat it as an error. For the log message, "error" is a high severity
> > level; changing it from "error" to "warning" would be much better.
> >
> >
> > -Original Message-
> > From: Liang, Prike 
> > Sent: Thursday, December 19, 2019 2:46 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Quan, Evan ; Huang, Ray
> ;
> > Dai, Yuxian (David) ; Liang, Prike
> > 
> > Subject: [PATCH] drm/amd/powerplay: suppress nonsupport profile mode
> > overrun message
> >
> > SMU12 does not support WORKLOAD_DEFAULT_BIT and
> > WORKLOAD_PPLIB_POWER_SAVING_BIT.
> >
> 
> Probably the SMU firmware doesn't expose the feature mask to the driver.
> Can you confirm with the SMU firmware team whether this feature is really
> disabled in SMU12? If so, in my view, issuing the SMU_MSG_SetWorkloadMask
> message with an unsupported state doesn't make sense.
> 
> Just working around this with a one-time warning is not a good solution.
[Prike]  Yes, per the SMU firmware team (@Cai, Land), SMU12 does not support
the default and power-saving modes now. As the patch's TODO item says, we can
consider switching to the compute/custom profile modes for power saving if
needed.
> 
> Thanks,
> Ray
> 
> > Signed-off-by: Prike Liang 
> > ---
> >  drivers/gpu/drm/amd/powerplay/renoir_ppt.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> > b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> > index 784903a3..f9a1817 100644
> > --- a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> > +++ b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> > @@ -550,14 +550,18 @@ static int renoir_set_power_profile_mode(struct
> smu_context *smu, long *input, u
> > /* conv PP_SMC_POWER_PROFILE* to WORKLOAD_PPLIB_*_BIT */
> > workload_type = smu_workload_get_type(smu, smu-
> >power_profile_mode);
> > if (workload_type < 0) {
> > -   pr_err("Unsupported power profile mode %d on
> RENOIR\n",smu->power_profile_mode);
> > +   /*
> > +* TODO: If some case need switch to powersave/default
> power mode
> > +* then can consider enter
> WORKLOAD_COMPUTE/WORKLOAD_CUSTOM for power saving.
> > +*/
> > +   pr_err_once("Unsupported power profile mode %d on
> > +RENOIR\n",smu->power_profile_mode);
> > return -EINVAL;
> > }
> >
> > ret = smu_send_smc_msg_with_param(smu,
> SMU_MSG_SetWorkloadMask,
> > 1 << workload_type);
> > if (ret) {
> > -   pr_err("Fail to set workload type %d\n", workload_type);
> > +   pr_err_once("Fail to set workload type %d\n",
> workload_type);
> > return ret;
> > }
> >
> > --
> > 2.7.4
> >


RE: [PATCH] drm/amdgpu/powerplay: use local renoir array sizes for clock fetching

2019-10-17 Thread Liang, Prike
Reviewed-by: Prike Liang 

> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Friday, October 18, 2019 12:00 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu/powerplay: use local renoir array sizes for
> clock fetching
> 
> To avoid walking past the end of the arrays since the PP_SMU defines don't
> match the renoir defines.
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/powerplay/renoir_ppt.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> index fa314c275a82..f0c8d1ad2a80 100644
> --- a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> +++ b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> @@ -427,22 +427,22 @@ static int renoir_get_dpm_clock_table(struct
> smu_context *smu, struct dpm_clocks
>   if (!clock_table || !table)
>   return -EINVAL;
> 
> - for (i = 0; i < PP_SMU_NUM_DCFCLK_DPM_LEVELS; i++) {
> + for (i = 0; i < NUM_DCFCLK_DPM_LEVELS; i++) {
>   clock_table->DcfClocks[i].Freq = table->DcfClocks[i].Freq;
>   clock_table->DcfClocks[i].Vol = table->DcfClocks[i].Vol;
>   }
> 
> - for (i = 0; i < PP_SMU_NUM_SOCCLK_DPM_LEVELS; i++) {
> + for (i = 0; i < NUM_SOCCLK_DPM_LEVELS; i++) {
>   clock_table->SocClocks[i].Freq = table->SocClocks[i].Freq;
>   clock_table->SocClocks[i].Vol = table->SocClocks[i].Vol;
>   }
> 
> - for (i = 0; i < PP_SMU_NUM_FCLK_DPM_LEVELS; i++) {
> + for (i = 0; i < NUM_FCLK_DPM_LEVELS; i++) {
>   clock_table->FClocks[i].Freq = table->FClocks[i].Freq;
>   clock_table->FClocks[i].Vol = table->FClocks[i].Vol;
>   }
> 
> - for (i = 0; i<  PP_SMU_NUM_MEMCLK_DPM_LEVELS; i++) {
> + for (i = 0; i<  NUM_MEMCLK_DPM_LEVELS; i++) {
>   clock_table->MemClocks[i].Freq = table->MemClocks[i].Freq;
>   clock_table->MemClocks[i].Vol = table->MemClocks[i].Vol;
>   }
> --
> 2.23.0
> 

[PATCH] drm/amdgpu/powerplay: implement interface pp_power_profile_mode

2019-10-16 Thread Liang, Prike
Implement get_power_profile_mode for reporting the power profile mode status.

Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/powerplay/renoir_ppt.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c 
b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
index fa314c2..953e347 100644
--- a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
@@ -640,6 +640,39 @@ static int renoir_set_watermarks_table(
return ret;
 }
 
+static int renoir_get_power_profile_mode(struct smu_context *smu,
+  char *buf)
+{
+   static const char *profile_name[] = {
+   "BOOTUP_DEFAULT",
+   "3D_FULL_SCREEN",
+   "POWER_SAVING",
+   "VIDEO",
+   "VR",
+   "COMPUTE",
+   "CUSTOM"};
+   uint32_t i, size = 0;
+   int16_t workload_type = 0;
+
+   if (!smu->pm_enabled || !buf)
+   return -EINVAL;
+
+   for (i = 0; i <= PP_SMC_POWER_PROFILE_CUSTOM; i++) {
+   /*
+* Conv PP_SMC_POWER_PROFILE* to WORKLOAD_PPLIB_*_BIT
+* Not all profile modes are supported on renoir.
+*/
+   workload_type = smu_workload_get_type(smu, i);
+   if (workload_type < 0)
+   continue;
+
+   size += sprintf(buf + size, "%2d %14s%s\n",
+   i, profile_name[i], (i == smu->power_profile_mode) ? 
"*" : " ");
+   }
+
+   return size;
+}
+
 static const struct pptable_funcs renoir_ppt_funcs = {
.get_smu_msg_index = renoir_get_smu_msg_index,
.get_smu_table_index = renoir_get_smu_table_index,
@@ -658,6 +691,7 @@ static const struct pptable_funcs renoir_ppt_funcs = {
.set_performance_level = renoir_set_performance_level,
.get_dpm_clock_table = renoir_get_dpm_clock_table,
.set_watermarks_table = renoir_set_watermarks_table,
+   .get_power_profile_mode = renoir_get_power_profile_mode,
 };
 
 void renoir_set_ppt_funcs(struct smu_context *smu)
-- 
2.7.4


RE: [PATCH] drm/amdgpu/powerplay: add renoir funcs to support dc

2019-10-15 Thread Liang, Prike
OK, I see your new patch has aligned DCFCLK_DPM_LEVELS well.

Thanks,
 Prike
> -Original Message-
> From: amd-gfx  On Behalf Of Liang,
> Prike
> Sent: Wednesday, October 16, 2019 10:22 AM
> To: Wu, Hersen ; amd-gfx@lists.freedesktop.org
> Cc: Wentland, Harry ; Wang, Kevin(Yang)
> ; Wu, Hersen 
> Subject: RE: [PATCH] drm/amdgpu/powerplay: add renoir funcs to support
> dc
> 
> Regarding the inline comment: I saw you have fixed the potential compile
> issue when CONFIG_DRM_AMD_DC_DCN2_1 is not enabled.
> 
> BTW, would you help clarify why PP_SMU_NUM_DCFCLK_DPM_LEVELS is different
> from the NUM_DCFCLK_DPM_LEVELS define in smu12_driver_if.h? Is there a way
> to track the macro definition updates?
> 
> Thanks,
> Prike
> > -Original Message-
> > From: Liang, Prike
> > Sent: Friday, October 11, 2019 10:34 PM
> > To: Hersen Wu ; amd-gfx@lists.freedesktop.org
> > Cc: Wu, Hersen ; Wang, Kevin(Yang)
> > ; Wentland, Harry 
> > Subject: RE: [PATCH] drm/amdgpu/powerplay: add renoir funcs to support
> > dc
> >
> >
> >
> > > -Original Message-
> > > From: amd-gfx  On Behalf Of
> > > Hersen Wu
> > > Sent: Thursday, October 10, 2019 10:58 PM
> > > To: amd-gfx@lists.freedesktop.org
> > > Cc: Wu, Hersen ; Wang, Kevin(Yang)
> > > ; Wentland, Harry
> 
> > > Subject: [PATCH] drm/amdgpu/powerplay: add renoir funcs to support
> > > dc
> > >
> > > There are two paths for renoir DC to access the SMU:
> > > one where DC accesses the SMU directly using the BIOS SMC
> > > interface (set display, dprefclk, etc.), and
> > > another that goes through pplib to get the dpm clock table and set
> > > watermarks.
> > >
> > > Signed-off-by: Hersen Wu 
> > > ---
> > >  .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  | 16 +---
> > >  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 35 +++
> > >  .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h| 16 ++--
> > >  drivers/gpu/drm/amd/powerplay/renoir_ppt.c| 96
> > > +++
> > >  drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 39 
> > >  5 files changed, 141 insertions(+), 61 deletions(-)
> > >
> > > diff --git
> > > a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> > > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> > > index f4cfa0caeba8..95564b8de3ce 100644
> > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> > > +++
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> > > @@ -589,10 +589,9 @@ void pp_rv_set_wm_ranges(struct pp_smu *pp,
> > >   if (pp_funcs && pp_funcs->set_watermarks_for_clocks_ranges)
> > >   pp_funcs->set_watermarks_for_clocks_ranges(pp_handle,
> > >
> > > &wm_with_clock_ranges);
> > > - else if (adev->smu.funcs &&
> > > -  adev->smu.funcs->set_watermarks_for_clock_ranges)
> > > + else
> > >   smu_set_watermarks_for_clock_ranges(&adev->smu,
> > > - &wm_with_clock_ranges);
> > > + &wm_with_clock_ranges);
> > >  }
> > >
> > >  void pp_rv_set_pme_wa_enable(struct pp_smu *pp) @@ -665,7 +664,6
> > @@
> > > enum pp_smu_status pp_nv_set_wm_ranges(struct pp_smu *pp,  {
> > >   const struct dc_context *ctx = pp->dm;
> > >   struct amdgpu_device *adev = ctx->driver_context;
> > > - struct smu_context *smu = &adev->smu;
> > >   struct dm_pp_wm_sets_with_clock_ranges_soc15
> > > wm_with_clock_ranges;
> > >   struct dm_pp_clock_range_for_dmif_wm_set_soc15
> > > *wm_dce_clocks =
> > >   wm_with_clock_ranges.wm_dmif_clocks_ranges;
> > > @@ -708,15 +706,7 @@ enum pp_smu_status
> > pp_nv_set_wm_ranges(struct
> > > pp_smu *pp,
> > >   ranges->writer_wm_sets[i].min_drain_clk_mhz *
> > 1000;
> > >   }
> > >
> > > - if (!smu->funcs)
> > > - return PP_SMU_RESULT_UNSUPPORTED;
> > > -
> > > - /* 0: successful or smu.funcs->set_watermarks_for_clock_ranges =
> > > NULL;
> > > -  * 1: fail
> > > -  */
> > > - if (smu_set_watermarks_for_clock_ranges(&adev->smu,
> > > - &wm_with_clock_ranges))
> > > - return PP_SMU_RESULT_UNSUPPORTED;
> > > + smu_set_watermarks_for_clock_ranges(&adev->smu,
> > >   &wm_with_clock_ranges);
> &

RE: [PATCH] drm/amdgpu/powerplay: add renoir funcs to support dc

2019-10-15 Thread Liang, Prike
Regarding the comment inline: I see you have fixed the potential compile issue when CONFIG_DRM_AMD_DC_DCN2_1 is not enabled.

BTW, would you help clarify why PP_SMU_NUM_DCFCLK_DPM_LEVELS is different from
the NUM_DCFCLK_DPM_LEVELS define in smu12_driver_if.h?
Is there a way to track updates to the macro definition?

Thanks,
Prike
> -Original Message-
> From: Liang, Prike
> Sent: Friday, October 11, 2019 10:34 PM
> To: Hersen Wu ; amd-gfx@lists.freedesktop.org
> Cc: Wu, Hersen ; Wang, Kevin(Yang)
> ; Wentland, Harry 
> Subject: RE: [PATCH] drm/amdgpu/powerplay: add renoir funcs to support
> dc
> 
> 
> 
> > -Original Message-
> > From: amd-gfx  On Behalf Of
> > Hersen Wu
> > Sent: Thursday, October 10, 2019 10:58 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Wu, Hersen ; Wang, Kevin(Yang)
> > ; Wentland, Harry 
> > Subject: [PATCH] drm/amdgpu/powerplay: add renoir funcs to support dc
> >
> > there are two paths for renoir dc access smu.
> > one: dc accesses smu directly using the bios smc
> > interface: set display, dprefclk, etc.
> > another goes through pplib to get the dpm clock table and set watermarks.
> >
> > Signed-off-by: Hersen Wu 
> > ---
> >  .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  | 16 +---
> >  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 35 +++
> >  .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h| 16 ++--
> >  drivers/gpu/drm/amd/powerplay/renoir_ppt.c| 96
> > +++
> >  drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 39 
> >  5 files changed, 141 insertions(+), 61 deletions(-)
> >
> > diff --git
> > a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> > index f4cfa0caeba8..95564b8de3ce 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> > @@ -589,10 +589,9 @@ void pp_rv_set_wm_ranges(struct pp_smu *pp,
> > if (pp_funcs && pp_funcs->set_watermarks_for_clocks_ranges)
> > pp_funcs->set_watermarks_for_clocks_ranges(pp_handle,
> >
> > &wm_with_clock_ranges);
> > -   else if (adev->smu.funcs &&
> > -adev->smu.funcs->set_watermarks_for_clock_ranges)
> > +   else
> > smu_set_watermarks_for_clock_ranges(&adev->smu,
> > -   &wm_with_clock_ranges);
> > +   &wm_with_clock_ranges);
> >  }
> >
> >  void pp_rv_set_pme_wa_enable(struct pp_smu *pp) @@ -665,7 +664,6
> @@
> > enum pp_smu_status pp_nv_set_wm_ranges(struct pp_smu *pp,  {
> > const struct dc_context *ctx = pp->dm;
> > struct amdgpu_device *adev = ctx->driver_context;
> > -   struct smu_context *smu = &adev->smu;
> > struct dm_pp_wm_sets_with_clock_ranges_soc15
> > wm_with_clock_ranges;
> > struct dm_pp_clock_range_for_dmif_wm_set_soc15
> > *wm_dce_clocks =
> > wm_with_clock_ranges.wm_dmif_clocks_ranges;
> > @@ -708,15 +706,7 @@ enum pp_smu_status
> pp_nv_set_wm_ranges(struct
> > pp_smu *pp,
> > ranges->writer_wm_sets[i].min_drain_clk_mhz *
> 1000;
> > }
> >
> > -   if (!smu->funcs)
> > -   return PP_SMU_RESULT_UNSUPPORTED;
> > -
> > -   /* 0: successful or smu.funcs->set_watermarks_for_clock_ranges =
> > NULL;
> > -* 1: fail
> > -*/
> > -   if (smu_set_watermarks_for_clock_ranges(&adev->smu,
> > -   &wm_with_clock_ranges))
> > -   return PP_SMU_RESULT_UNSUPPORTED;
> > +   smu_set_watermarks_for_clock_ranges(&adev->smu,
> > &wm_with_clock_ranges);
> >
> > return PP_SMU_RESULT_OK;
> >  }
> > diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > index c9266ea70331..1b71c38cdf96 100644
> > --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > @@ -1834,6 +1834,41 @@ int smu_set_mp1_state(struct smu_context
> *smu,
> > return ret;
> >  }
> >
> > +int smu_write_watermarks_table(struct smu_context *smu) {
> > +   int ret = 0;
> > +   struct smu_table_context *smu_table = &smu->smu_table;
> > +   struct smu_table *table = NULL;
> > +
> > +   table = &smu_table->tables[SMU_TABLE_WATERMARKS];
> > +
> > +   if (!table->cpu_addr)
> > +   ret

RE: [PATCH] drm/amdgpu/display: fix build error caused by CONFIG_DRM_AMD_DC_DCN2_1

2019-10-15 Thread Liang, Prike
Reviewed-by: Prike Liang 

BTW, would you help clarify why PP_SMU_NUM_DCFCLK_DPM_LEVELS is different from
the NUM_DCFCLK_DPM_LEVELS define in smu12_driver_if.h in your other patch,
"drm/amdgpu/powerplay: add renoir funcs to support dc"?

Is there a way to track updates to the macro definition?

Thanks,
Prike
> -Original Message-
> From: amd-gfx  On Behalf Of
> Hersen Wu
> Sent: Wednesday, October 16, 2019 12:51 AM
> To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Cc: Li, Sun peng (Leo) ; Lakha, Bhawanpreet
> ; Wentland, Harry
> ; Wu, Hersen 
> Subject: [PATCH] drm/amdgpu/display: fix build error caused by
> CONFIG_DRM_AMD_DC_DCN2_1
> 
> when CONFIG_DRM_AMD_DC_DCN2_1 is not enabled in .config, there is a build
> error: struct dpm_clocks should not be guarded.
> 
> Signed-off-by: Hersen Wu 
> ---
>  drivers/gpu/drm/amd/display/dc/dm_pp_smu.h | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dm_pp_smu.h
> b/drivers/gpu/drm/amd/display/dc/dm_pp_smu.h
> index 24d65dbbd749..ef7df9ef6d7e 100644
> --- a/drivers/gpu/drm/amd/display/dc/dm_pp_smu.h
> +++ b/drivers/gpu/drm/amd/display/dc/dm_pp_smu.h
> @@ -249,8 +249,6 @@ struct pp_smu_funcs_nv {  };  #endif
> 
> -#if defined(CONFIG_DRM_AMD_DC_DCN2_1)
> -
>  #define PP_SMU_NUM_SOCCLK_DPM_LEVELS  8  #define
> PP_SMU_NUM_DCFCLK_DPM_LEVELS  8
>  #define PP_SMU_NUM_FCLK_DPM_LEVELS4
> @@ -288,7 +286,6 @@ struct pp_smu_funcs_rn {
>   enum pp_smu_status (*get_dpm_clock_table) (struct pp_smu *pp,
>   struct dpm_clocks *clock_table);
>  };
> -#endif
> 
>  struct pp_smu_funcs {
>   struct pp_smu ctx;
> --
> 2.17.1
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 2/2] drm/amdgpu: fix S3 failure as RLC safe mode entry stuck in polling gfx acq

2019-10-15 Thread Liang, Prike
Fix the gfx CGPG setting sequence for the RLC deadlock at safe mode entry while polling
the gfx response.
The patch fixes the VCN IB test failure and the DAL display count query failure.

Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 5 -
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 4 
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index de8f9d6..dd345fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4287,9 +4287,6 @@ static void gfx_v9_0_update_gfx_cg_power_gating(struct 
amdgpu_device *adev,
 {
amdgpu_gfx_rlc_enter_safe_mode(adev);
 
-   if (is_support_sw_smu(adev) && !enable)
-   smu_set_gfx_cgpg(&adev->smu, enable);
-
if ((adev->pg_flags & AMD_PG_SUPPORT_GFX_PG) && enable) {
gfx_v9_0_enable_gfx_cg_power_gating(adev, true);
if (adev->pg_flags & AMD_PG_SUPPORT_GFX_PIPELINE)
@@ -4566,8 +4563,6 @@ static int gfx_v9_0_set_powergating_state(void *handle,
gfx_v9_0_enable_cp_power_gating(adev, false);
 
/* update gfx cgpg state */
-   if (is_support_sw_smu(adev) && enable)
-   smu_set_gfx_cgpg(&adev->smu, enable);
gfx_v9_0_update_gfx_cg_power_gating(adev, enable);
 
/* update mgcg state */
diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 6cb5288..84d8aa2 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1188,6 +1188,7 @@ static int smu_hw_init(void *handle)
if (adev->flags & AMD_IS_APU) {
smu_powergate_sdma(&adev->smu, false);
smu_powergate_vcn(&adev->smu, false);
+   smu_set_gfx_cgpg(&adev->smu, true);
}
 
if (!smu->pm_enabled)
@@ -1350,6 +1351,9 @@ static int smu_resume(void *handle)
if (ret)
goto failed;
 
+   if (smu->is_apu)
+   smu_set_gfx_cgpg(&adev->smu, true);
+
mutex_unlock(&smu->mutex);
 
pr_info("SMU is resumed successfully!\n");
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/2] drm/amdgpu: add GFX_PIPELINE capability check for updating gfx cgpg

2019-10-15 Thread Liang, Prike
Before disabling gfx pipeline power gating, the driver needs to check the
AMD_PG_SUPPORT_GFX_PIPELINE flag.

Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index b577b69..de8f9d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4296,7 +4296,8 @@ static void gfx_v9_0_update_gfx_cg_power_gating(struct 
amdgpu_device *adev,
gfx_v9_0_enable_gfx_pipeline_powergating(adev, true);
} else {
gfx_v9_0_enable_gfx_cg_power_gating(adev, false);
-   gfx_v9_0_enable_gfx_pipeline_powergating(adev, false);
+   if (adev->pg_flags & AMD_PG_SUPPORT_GFX_PIPELINE)
+   gfx_v9_0_enable_gfx_pipeline_powergating(adev, false);
}
 
amdgpu_gfx_rlc_exit_safe_mode(adev);
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu/powerplay: add renoir funcs to support dc

2019-10-11 Thread Liang, Prike


> -Original Message-
> From: amd-gfx  On Behalf Of
> Hersen Wu
> Sent: Thursday, October 10, 2019 10:58 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Wu, Hersen ; Wang, Kevin(Yang)
> ; Wentland, Harry 
> Subject: [PATCH] drm/amdgpu/powerplay: add renoir funcs to support dc
> 
> there are two paths for renoir dc access smu.
> one: dc accesses smu directly using the bios smc
> interface: set display, dprefclk, etc.
> another goes through pplib to get the dpm clock table and set watermarks.
> 
> Signed-off-by: Hersen Wu 
> ---
>  .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  | 16 +---
>  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 35 +++
>  .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h| 16 ++--
>  drivers/gpu/drm/amd/powerplay/renoir_ppt.c| 96
> +++
>  drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 39 
>  5 files changed, 141 insertions(+), 61 deletions(-)
> 
> diff --git
> a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> index f4cfa0caeba8..95564b8de3ce 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> @@ -589,10 +589,9 @@ void pp_rv_set_wm_ranges(struct pp_smu *pp,
>   if (pp_funcs && pp_funcs->set_watermarks_for_clocks_ranges)
>   pp_funcs->set_watermarks_for_clocks_ranges(pp_handle,
> 
> &wm_with_clock_ranges);
> - else if (adev->smu.funcs &&
> -  adev->smu.funcs->set_watermarks_for_clock_ranges)
> + else
>   smu_set_watermarks_for_clock_ranges(&adev->smu,
> - &wm_with_clock_ranges);
> + &wm_with_clock_ranges);
>  }
> 
>  void pp_rv_set_pme_wa_enable(struct pp_smu *pp) @@ -665,7 +664,6
> @@ enum pp_smu_status pp_nv_set_wm_ranges(struct pp_smu *pp,  {
>   const struct dc_context *ctx = pp->dm;
>   struct amdgpu_device *adev = ctx->driver_context;
> - struct smu_context *smu = &adev->smu;
>   struct dm_pp_wm_sets_with_clock_ranges_soc15
> wm_with_clock_ranges;
>   struct dm_pp_clock_range_for_dmif_wm_set_soc15
> *wm_dce_clocks =
>   wm_with_clock_ranges.wm_dmif_clocks_ranges;
> @@ -708,15 +706,7 @@ enum pp_smu_status pp_nv_set_wm_ranges(struct
> pp_smu *pp,
>   ranges->writer_wm_sets[i].min_drain_clk_mhz *
> 1000;
>   }
> 
> - if (!smu->funcs)
> - return PP_SMU_RESULT_UNSUPPORTED;
> -
> - /* 0: successful or smu.funcs->set_watermarks_for_clock_ranges =
> NULL;
> -  * 1: fail
> -  */
> - if (smu_set_watermarks_for_clock_ranges(&adev->smu,
> - &wm_with_clock_ranges))
> - return PP_SMU_RESULT_UNSUPPORTED;
> + smu_set_watermarks_for_clock_ranges(&adev->smu,
>   &wm_with_clock_ranges);
> 
>   return PP_SMU_RESULT_OK;
>  }
> diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> index c9266ea70331..1b71c38cdf96 100644
> --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> @@ -1834,6 +1834,41 @@ int smu_set_mp1_state(struct smu_context *smu,
>   return ret;
>  }
> 
> +int smu_write_watermarks_table(struct smu_context *smu) {
> + int ret = 0;
> + struct smu_table_context *smu_table = &smu->smu_table;
> + struct smu_table *table = NULL;
> +
> + table = &smu_table->tables[SMU_TABLE_WATERMARKS];
> +
> + if (!table->cpu_addr)
> + return -EINVAL;
> +
> + ret = smu_update_table(smu, SMU_TABLE_WATERMARKS, 0, table-
> >cpu_addr,
> + true);
> +
> + return ret;
> +}
> +
> +int smu_set_watermarks_for_clock_ranges(struct smu_context *smu,
> + struct dm_pp_wm_sets_with_clock_ranges_soc15
> *clock_ranges) {
> + int ret = 0;
> + struct smu_table *watermarks = &smu-
> >smu_table.tables[SMU_TABLE_WATERMARKS];
> + void *table = watermarks->cpu_addr;
> +
> + if (!smu->disable_watermark &&
> + smu_feature_is_enabled(smu,
> SMU_FEATURE_DPM_DCEFCLK_BIT) &&
> + smu_feature_is_enabled(smu,
> SMU_FEATURE_DPM_SOCCLK_BIT)) {
> + smu_set_watermarks_table(smu, table, clock_ranges);
> + smu->watermarks_bitmap |= WATERMARKS_EXIST;
> + smu->watermarks_bitmap &= ~WATERMARKS_LOADED;
> + }
> +
> + return ret;
> +}
> +
>  const struct amd_ip_funcs smu_ip_funcs = {
>   .name = "smu",
>   .early_init = smu_early_init,
> diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> index ccf711c327c8..1469146da1aa 100644
> --- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> +++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> @@ -468,6 +468,7 @@ struct pptable_funcs {
>   int (*get_power_limit)(struct smu_context *smu, uint32_t *limit,
> bool asic_default)

RE: [PATCH] drm/amd/powerplay: initialize smu->is_apu to false by default

2019-09-27 Thread Liang, Prike
Is using the default value (false) of the Boolean is_apu variable not enough to
identify a dGPU?
Anyway, initializing is_apu during SMU early init is also fine, and the patch is

Reviewed-by: Prike Liang 

Thanks,
Prike
> -Original Message-
> From: Wang, Kevin(Yang) 
> Sent: Friday, September 27, 2019 2:58 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Huang, Ray ; Feng, Kenneth
> ; Liang, Prike ; Wang,
> Kevin(Yang) 
> Subject: [PATCH] drm/amd/powerplay: initialize smu->is_apu to false by
> default
> 
> the is_apu member in smu_context needs to be initialized by default.
> 
> set the default value to false (dGPU)
> 
> for patch:
>   drm/amd/powerplay: bypass dpm_context null pointer check guard
>   for some smu series
> 
> Signed-off-by: Kevin Wang 
> ---
>  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> index 7b995b0834eb..6a64f765fcd4 100644
> --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> @@ -712,6 +712,7 @@ static int smu_early_init(void *handle)
> 
>   smu->adev = adev;
>   smu->pm_enabled = !!amdgpu_dpm;
> + smu->is_apu = false;
>   mutex_init(&smu->mutex);
> 
>   return smu_set_funcs(adev);
> --
> 2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
