RE: [PATCH] drm/amdgpu: add SMU debug option support

2021-11-30 Thread Yu, Lang
[AMD Official Use Only]



>-Original Message-
>From: Lazar, Lijo 
>Sent: Wednesday, December 1, 2021 3:28 PM
>To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Huang, Ray
>; Koenig, Christian 
>Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support
>
>
>
>On 12/1/2021 12:37 PM, Yu, Lang wrote:
>> [AMD Official Use Only]
>>
>>
>>
>>> -Original Message-
>>> From: Lazar, Lijo 
>>> Sent: Wednesday, December 1, 2021 2:56 PM
>>> To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>>> Cc: Deucher, Alexander ; Huang, Ray
>>> ; Koenig, Christian 
>>> Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support
>>>
>>>
>>>
>>> On 12/1/2021 11:57 AM, Yu, Lang wrote:
 [AMD Official Use Only]

 Hi Lijo,

 Thanks for your comments.

   From my understanding, that just increases the timeout threshold
 and could hide some potential issues which should be exposed and solved.

 If the current timeout threshold is not enough for some corner cases,
 (1) Do we consider increasing the threshold to cover these cases?
 (2) Or do we just expose them and request the SMU FW team to optimize them?

 I think it doesn't make much sense to increase the threshold in debug mode.
 What do you think? Thanks!
>>>
>>> In normal cases, 2 secs would be more than enough. If we hang
>>> immediately and then check the FW registers later, the response would
>>> have come. I thought we just need to note those cases and not fail
>>> every time, just mark a red flag in the log to tell us that the FW
>>> is unexpectedly busy processing something else when the message is sent.
>>>
>>> There are some issues related to S0ix where we see the FW come back
>>> with a response only after an increased timeout under certain conditions.
>>
>> If these issues still exist, could we just blacklist the tests that
>> triggered them before solving them? Or do we just increase the threshold
>> to cover all the cases?
>>
>
>Actually, the timeout is message specific - like an i2c transfer from EEPROM
>could take a longer time.
>
>I am not sure if we should have more than 2 s as the timeout. Whenever this
>kind of issue happens, the FW team checks registers (by then they will have a
>proper value) and says they don't see anything abnormal :) Usually, those are
>just signs of a crack and it eventually breaks.
>
>The option is to just fail immediately (then again, not sure how useful that
>will be if the issue is of this sort) or wait to see how far it goes with an
>added timeout before it fails eventually.

Are smu_cmn_wait_for_response()/smu_cmn_send_msg_without_waiting() 
designed for long timeout cases? Is it fine if we don't fail here in the 
event of a timeout?

Thanks,
Lang 

>
>Thanks,
>Lijo
>
>> Regards,
>> Lang
>>
>>>
>>> Thanks,
>>> Lijo
>>>

 Regards,
 Lang

> -Original Message-
> From: Lazar, Lijo 
> Sent: Wednesday, December 1, 2021 1:44 PM
> To: Lazar, Lijo ; Yu, Lang ;
> amd- g...@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> ; Koenig, Christian 
> Subject: RE: [PATCH] drm/amdgpu: add SMU debug option support
>
> Just realized that the patch I pasted won't work. Outer loop exit
> needs to be like this.
>   (reg & MP1_C2PMSG_90__CONTENT_MASK) != 0 && extended_wait-- >= 0
>
> Anyway, that patch is only there to communicate what I really meant
> in the earlier comment.
>
> Thanks,
> Lijo
>
> -Original Message-
> From: amd-gfx  On Behalf Of
> Lazar, Lijo
> Sent: Wednesday, December 1, 2021 10:44 AM
> To: Yu, Lang ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> ; Koenig, Christian 
> Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support
>
>
>
> On 11/30/2021 10:47 AM, Lang Yu wrote:
>> To preserve the system error state when SMU errors occur, which
>> will aid in debugging SMU firmware issues, add SMU debug option support.
>>
>> It can be enabled or disabled via amdgpu_smu_debug debugfs file.
>> When enabled, it makes SMU errors fatal.
>> It is disabled by default.
>>
>> == Command Guide ==
>>
>> 1, enable SMU debug option
>>
>> # echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
>>
>> 2, disable SMU debug option
>>
>> # echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
>>
>> v2:
>> - Resend the command on timeout. (Lijo)
>> - Use debugfs file instead of module parameter.
>>
>> Signed-off-by: Lang Yu 
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 32
>>> +
>> drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 39
>>> +++-
> -
>> 2 files changed, 69 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> index 

Re: [PATCH] drm/amdgpu: add SMU debug option support

2021-11-30 Thread Lazar, Lijo




On 12/1/2021 12:37 PM, Yu, Lang wrote:

[AMD Official Use Only]




-Original Message-
From: Lazar, Lijo 
Sent: Wednesday, December 1, 2021 2:56 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray
; Koenig, Christian 
Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support



On 12/1/2021 11:57 AM, Yu, Lang wrote:

[AMD Official Use Only]

Hi Lijo,

Thanks for your comments.

  From my understanding, that just increases the timeout threshold and
could hide some potential issues which should be exposed and solved.

If the current timeout threshold is not enough for some corner cases,
(1) Do we consider increasing the threshold to cover these cases?
(2) Or do we just expose them and request the SMU FW team to optimize them?

I think it doesn't make much sense to increase the threshold in debug mode.
What do you think? Thanks!


In normal cases, 2 secs would be more than enough. If we hang immediately and
then check the FW registers later, the response would have come. I thought we
just need to note those cases and not fail every time, just mark a red flag in
the log to tell us that the FW is unexpectedly busy processing something else
when the message is sent.

There are some issues related to S0ix where we see the FW come back with a
response only after an increased timeout under certain conditions.


If these issues still exist, could we just blacklist the tests that triggered
them before solving them? Or do we just increase the threshold to cover all
the cases?



Actually, the timeout is message specific - like an i2c transfer from 
EEPROM could take a longer time.


I am not sure if we should have more than 2 s as the timeout. Whenever this 
kind of issue happens, the FW team checks registers (by then they will have 
a proper value) and says they don't see anything abnormal :) Usually, those 
are just signs of a crack and it eventually breaks.


The option is to just fail immediately (then again, not sure how useful that 
will be if the issue is of this sort) or wait to see how far it goes with 
an added timeout before it fails eventually.


Thanks,
Lijo


Regards,
Lang



Thanks,
Lijo



Regards,
Lang


-Original Message-
From: Lazar, Lijo 
Sent: Wednesday, December 1, 2021 1:44 PM
To: Lazar, Lijo ; Yu, Lang ;
amd- g...@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray
; Koenig, Christian 
Subject: RE: [PATCH] drm/amdgpu: add SMU debug option support

Just realized that the patch I pasted won't work. Outer loop exit
needs to be like this.
(reg & MP1_C2PMSG_90__CONTENT_MASK) != 0 && extended_wait-- >= 0

Anyway, that patch is only there to communicate what I really meant
in the earlier comment.

Thanks,
Lijo

-Original Message-
From: amd-gfx  On Behalf Of
Lazar, Lijo
Sent: Wednesday, December 1, 2021 10:44 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray
; Koenig, Christian 
Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support



On 11/30/2021 10:47 AM, Lang Yu wrote:

To preserve the system error state when SMU errors occur, which will
aid in debugging SMU firmware issues, add SMU debug option support.

It can be enabled or disabled via amdgpu_smu_debug debugfs file.
When enabled, it makes SMU errors fatal.
It is disabled by default.

== Command Guide ==

1, enable SMU debug option

# echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

2, disable SMU debug option

# echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

v2:
- Resend the command on timeout. (Lijo)
- Use debugfs file instead of module parameter.

Signed-off-by: Lang Yu 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 32

+

drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 39

+++-

-

2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 164d6a9e9fbb..f9412de86599 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -39,6 +39,8 @@

#if defined(CONFIG_DEBUG_FS)

+extern int amdgpu_smu_debug;
+
/**
 * amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
 *
@@ -1152,6 +1154,8 @@ static ssize_t
amdgpu_debugfs_gfxoff_read(struct

file *f, char __user *buf,

return result;
}

+
+
static const struct file_operations amdgpu_debugfs_regs2_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = amdgpu_debugfs_regs2_ioctl, @@ -1609,6
+1613,26 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL,
amdgpu_debugfs_sclk_set, "%llu\n");

+static int amdgpu_debugfs_smu_debug_get(void *data, u64 *val) {
+   *val = amdgpu_smu_debug;
+   return 0;
+}
+
+static int amdgpu_debugfs_smu_debug_set(void *data, u64 val) {
+   if (val != 0 && val != 1)
+   return -EINVAL;
+
+   amdgpu_smu_debug = val;
+   return 0;
+}
+

RE: [PATCH V2 13/17] drm/amd/pm: do not expose the smu_context structure used internally in power

2021-11-30 Thread Quan, Evan
[AMD Official Use Only]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Wednesday, December 1, 2021 2:39 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Feng, Kenneth 
> Subject: Re: [PATCH V2 13/17] drm/amd/pm: do not expose the
> smu_context structure used internally in power
> 
> 
> 
> On 12/1/2021 11:09 AM, Quan, Evan wrote:
> > [AMD Official Use Only]
> >
> >
> >
> >> -Original Message-
> >> From: Lazar, Lijo 
> >> Sent: Tuesday, November 30, 2021 9:58 PM
> >> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> >> Cc: Deucher, Alexander ; Koenig,
> Christian
> >> ; Feng, Kenneth 
> >> Subject: Re: [PATCH V2 13/17] drm/amd/pm: do not expose the
> >> smu_context structure used internally in power
> >>
> >>
> >>
> >> On 11/30/2021 1:12 PM, Evan Quan wrote:
> >>> This hides the power implementation details. And, as was done for the
> >>> powerplay framework, we hook the smu_context to
> >>> adev->powerplay.pp_handle.
> >>>
> >>> Signed-off-by: Evan Quan 
> >>> Change-Id: I3969c9f62a8b63dc6e4321a488d8f15022ffeb3d
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  6 --
> >>>.../gpu/drm/amd/include/kgd_pp_interface.h|  9 +++
> >>>drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 51 ++--
> >>>drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   | 11 +---
> >>>drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 60
> >> +--
> >>>.../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c |  9 +--
> >>>.../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   |  9 +--
> >>>.../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   |  9 +--
> >>>.../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c|  4 +-
> >>>.../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  9 +--
> >>>.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c|  8 +--
> >>>11 files changed, 111 insertions(+), 74 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>> index c987813a4996..fefabd568483 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>> @@ -99,7 +99,6 @@
> >>>#include "amdgpu_gem.h"
> >>>#include "amdgpu_doorbell.h"
> >>>#include "amdgpu_amdkfd.h"
> >>> -#include "amdgpu_smu.h"
> >>>#include "amdgpu_discovery.h"
> >>>#include "amdgpu_mes.h"
> >>>#include "amdgpu_umc.h"
> >>> @@ -950,11 +949,6 @@ struct amdgpu_device {
> >>>
> >>>   /* powerplay */
> >>>   struct amd_powerplaypowerplay;
> >>> -
> >>> - /* smu */
> >>> - struct smu_context  smu;
> >>> -
> >>> - /* dpm */
> >>>   struct amdgpu_pmpm;
> >>>   u32 cg_flags;
> >>>   u32 pg_flags;
> >>> diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> >>> b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> >>> index 7919e96e772b..da6a82430048 100644
> >>> --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> >>> +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> >>> @@ -25,6 +25,9 @@
> >>>#define __KGD_PP_INTERFACE_H__
> >>>
> >>>extern const struct amdgpu_ip_block_version pp_smu_ip_block;
> >>> +extern const struct amdgpu_ip_block_version smu_v11_0_ip_block;
> >>> +extern const struct amdgpu_ip_block_version smu_v12_0_ip_block;
> >>> +extern const struct amdgpu_ip_block_version smu_v13_0_ip_block;
> >>>
> >>>enum smu_event_type {
> >>>   SMU_EVENT_RESET_COMPLETE = 0,
> >>> @@ -244,6 +247,12 @@ enum pp_power_type
> >>>   PP_PWR_TYPE_FAST,
> >>>};
> >>>
> >>> +enum smu_ppt_limit_type
> >>> +{
> >>> + SMU_DEFAULT_PPT_LIMIT = 0,
> >>> + SMU_FAST_PPT_LIMIT,
> >>> +};
> >>> +
> >>
> >> This is a contradiction. If the entry point is dpm, this shouldn't be
> >> here and the external interface doesn't need to know about internal
> >> datatypes.
> > [Quan, Evan] This is needed by amdgpu_hwmon_show_power_label()
> > from amdgpu_pm.c.
> > So, it has to be put into some place which can be accessed from
> > outside (of power).
> > Then kgd_pp_interface.h is the right place.
> 
> The public data types are enum pp_power_type and enum
> pp_power_limit_level.
> 
> The first one tells about the type of power limits (fast/slow/sustained) and
> second one is about the min/max/default values for different limits.
> 
> To show the label, use the pp_power_type type.
[Quan, Evan] Thanks, I missed pp_power_type. It seems we defined two data 
structures for the same purpose.
Let me check and fix this.
> 
> >
> >>
> >>>#define PP_GROUP_MASK0xF000
> >>>#define PP_GROUP_SHIFT   28
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> >>> b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> >>> index 8f0ae58f4292..a5cbbf9367fe 100644
> >>> --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> >>> +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> >>> @@ -31,6 +31,7 @@
> >>>#include "amdgpu_display.h"
> 

Re: [PATCH v4] drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()

2021-11-30 Thread Christian König

Am 01.12.21 um 04:22 schrieb Zhou Qingyang:

In radeon_driver_open_kms(), the return value of radeon_vm_bo_add() is
assigned to vm->ib_bo_va and later used in radeon_vm_bo_set_addr(). In
radeon_vm_bo_set_addr(), there is a dereference of vm->ib_bo_va,
which could lead to a NULL pointer dereference on failure of
radeon_vm_bo_add().

Fix this bug by adding a check of vm->ib_bo_va.

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_DRM_RADEON=m show no new warnings,
and our static analyzer no longer warns about this code.

Fixes: cc9e67e3d700 ("drm/radeon: fix VM IB handling")
Reported-by: kernel test robot 
Signed-off-by: Zhou Qingyang 
---
Changes in v2:
   -  Initialize the variables to silence warning


What warning do you get? Double checking the code, that shouldn't be 
necessary and is usually rather frowned upon.


Thanks,
Christian.



Changes in v3:
   -  Fix the bug that good case will also be freed
   -  Improve code style

Changes in v2:
   -  Improve the error handling into goto style

  drivers/gpu/drm/radeon/radeon_kms.c | 37 -
  1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 482fb0ae6cb5..9d0f840286a1 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -648,7 +648,9 @@ void radeon_driver_lastclose_kms(struct drm_device *dev)
  int radeon_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
  {
struct radeon_device *rdev = dev->dev_private;
-   int r;
+   struct radeon_fpriv *fpriv = NULL;
+   struct radeon_vm *vm = NULL;
+   int r = 0;

file_priv->driver_priv = NULL;

@@ -660,8 +662,6 @@ int radeon_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
  
  	/* new gpu have virtual address space support */

if (rdev->family >= CHIP_CAYMAN) {
-   struct radeon_fpriv *fpriv;
-   struct radeon_vm *vm;
  
  		fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);

if (unlikely(!fpriv)) {
@@ -672,35 +672,38 @@ int radeon_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
if (rdev->accel_working) {
vm = &fpriv->vm;
r = radeon_vm_init(rdev, vm);
-   if (r) {
-   kfree(fpriv);
-   goto out_suspend;
-   }
+   if (r)
+   goto out_fpriv;
  
  			r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false);

-   if (r) {
-   radeon_vm_fini(rdev, vm);
-   kfree(fpriv);
-   goto out_suspend;
-   }
+   if (r)
+   goto out_vm_fini;
  
  			/* map the ib pool buffer read only into

 * virtual address space */
vm->ib_bo_va = radeon_vm_bo_add(rdev, vm,
rdev->ring_tmp_bo.bo);
+   if (!vm->ib_bo_va) {
+   r = -ENOMEM;
+   goto out_vm_fini;
+   }
+
r = radeon_vm_bo_set_addr(rdev, vm->ib_bo_va,
  RADEON_VA_IB_OFFSET,
  RADEON_VM_PAGE_READABLE |
  RADEON_VM_PAGE_SNOOPED);
-   if (r) {
-   radeon_vm_fini(rdev, vm);
-   kfree(fpriv);
-   goto out_suspend;
-   }
+   if (r)
+   goto out_vm_fini;
}
file_priv->driver_priv = fpriv;
}
  
+out_vm_fini:

+   if (r)
+   radeon_vm_fini(rdev, vm);
+out_fpriv:
+   if (r)
+   kfree(fpriv);
  out_suspend:
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);




RE: [PATCH] drm/amdgpu: add SMU debug option support

2021-11-30 Thread Yu, Lang
[AMD Official Use Only]



>-Original Message-
>From: Lazar, Lijo 
>Sent: Wednesday, December 1, 2021 2:56 PM
>To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Huang, Ray
>; Koenig, Christian 
>Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support
>
>
>
>On 12/1/2021 11:57 AM, Yu, Lang wrote:
>> [AMD Official Use Only]
>>
>> Hi Lijo,
>>
>> Thanks for your comments.
>>
>>  From my understanding, that just increases the timeout threshold and
>> could hide some potential issues which should be exposed and solved.
>>
>> If the current timeout threshold is not enough for some corner cases,
>> (1) Do we consider increasing the threshold to cover these cases?
>> (2) Or do we just expose them and request the SMU FW team to optimize them?
>>
>> I think it doesn't make much sense to increase the threshold in debug mode.
>> What do you think? Thanks!
>
>In normal cases, 2 secs would be more than enough. If we hang immediately and
>then check the FW registers later, the response would have come. I thought we
>just need to note those cases and not fail every time, just mark a red flag in
>the log to tell us that the FW is unexpectedly busy processing something else
>when the message is sent.
>
>There are some issues related to S0ix where we see the FW come back with a
>response only after an increased timeout under certain conditions.

If these issues still exist, could we just blacklist the tests that triggered
them before solving them? Or do we just increase the threshold to cover all
the cases?

Regards,
Lang

>
>Thanks,
>Lijo
>
>>
>> Regards,
>> Lang
>>
>>> -Original Message-
>>> From: Lazar, Lijo 
>>> Sent: Wednesday, December 1, 2021 1:44 PM
>>> To: Lazar, Lijo ; Yu, Lang ;
>>> amd- g...@lists.freedesktop.org
>>> Cc: Deucher, Alexander ; Huang, Ray
>>> ; Koenig, Christian 
>>> Subject: RE: [PATCH] drm/amdgpu: add SMU debug option support
>>>
>>> Just realized that the patch I pasted won't work. Outer loop exit
>>> needs to be like this.
>>> (reg & MP1_C2PMSG_90__CONTENT_MASK) != 0 && extended_wait-- >= 0
>>>
>>> Anyway, that patch is only there to communicate what I really meant
>>> in the earlier comment.
>>>
>>> Thanks,
>>> Lijo
>>>
>>> -Original Message-
>>> From: amd-gfx  On Behalf Of
>>> Lazar, Lijo
>>> Sent: Wednesday, December 1, 2021 10:44 AM
>>> To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>>> Cc: Deucher, Alexander ; Huang, Ray
>>> ; Koenig, Christian 
>>> Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support
>>>
>>>
>>>
>>> On 11/30/2021 10:47 AM, Lang Yu wrote:
 To preserve the system error state when SMU errors occur, which will
 aid in debugging SMU firmware issues, add SMU debug option support.

 It can be enabled or disabled via amdgpu_smu_debug debugfs file.
 When enabled, it makes SMU errors fatal.
 It is disabled by default.

 == Command Guide ==

 1, enable SMU debug option

# echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

 2, disable SMU debug option

# echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

 v2:
   - Resend the command on timeout. (Lijo)
- Use debugfs file instead of module parameter.

 Signed-off-by: Lang Yu 
 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 32
>+
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 39
>+++-
>>> -
2 files changed, 69 insertions(+), 2 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
 index 164d6a9e9fbb..f9412de86599 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
 @@ -39,6 +39,8 @@

#if defined(CONFIG_DEBUG_FS)

 +extern int amdgpu_smu_debug;
 +
/**
 * amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
 *
 @@ -1152,6 +1154,8 @@ static ssize_t
 amdgpu_debugfs_gfxoff_read(struct
>>> file *f, char __user *buf,
return result;
}

 +
 +
static const struct file_operations amdgpu_debugfs_regs2_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = amdgpu_debugfs_regs2_ioctl, @@ -1609,6
 +1613,26 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL,
amdgpu_debugfs_sclk_set, "%llu\n");

 +static int amdgpu_debugfs_smu_debug_get(void *data, u64 *val) {
 +  *val = amdgpu_smu_debug;
 +  return 0;
 +}
 +
 +static int amdgpu_debugfs_smu_debug_set(void *data, u64 val) {
 +  if (val != 0 && val != 1)
 +  return -EINVAL;
 +
 +  amdgpu_smu_debug = val;
 +  return 0;
 +}
 +
 +DEFINE_DEBUGFS_ATTRIBUTE(fops_smu_debug,
 +   amdgpu_debugfs_smu_debug_get,
 +   

RE: [PATCH V2 01/17] drm/amd/pm: do not expose implementation details to other blocks out of power

2021-11-30 Thread Quan, Evan
[AMD Official Use Only]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Wednesday, December 1, 2021 11:33 AM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Feng, Kenneth
> ; Koenig, Christian 
> Subject: Re: [PATCH V2 01/17] drm/amd/pm: do not expose implementation
> details to other blocks out of power
> 
> 
> 
> On 12/1/2021 7:29 AM, Quan, Evan wrote:
> > [AMD Official Use Only]
> >
> >
> >
> >> -Original Message-
> >> From: amd-gfx  On Behalf Of
> >> Lazar, Lijo
> >> Sent: Tuesday, November 30, 2021 4:10 PM
> >> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> >> Cc: Deucher, Alexander ; Feng, Kenneth
> >> ; Koenig, Christian
> 
> >> Subject: Re: [PATCH V2 01/17] drm/amd/pm: do not expose
> >> implementation details to other blocks out of power
> >>
> >>
> >>
> >> On 11/30/2021 1:12 PM, Evan Quan wrote:
> >>> Those implementation details (whether swsmu is supported, whether some
> >>> ppt_funcs are supported, accessing internal statistics, ...) should be
> >>> kept internal. It's not good practice, and even error prone, to expose
> >>> implementation details.
> >>>
> >>> Signed-off-by: Evan Quan 
> >>> Change-Id: Ibca3462ceaa26a27a9145282b60c6ce5deca7752
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/aldebaran.c|  2 +-
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 25 ++---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  6 +-
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   | 18 +---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |  7 --
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |  5 +-
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   |  5 +-
> >>>drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   |  2 +-
> >>>.../gpu/drm/amd/include/kgd_pp_interface.h|  4 +
> >>>drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 95
> >> +++
> >>>drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 25 -
> >>>drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   |  9 +-
> >>>drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 16 ++--
> >>>13 files changed, 155 insertions(+), 64 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> >>> b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> >>> index bcfdb63b1d42..a545df4efce1 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> >>> @@ -260,7 +260,7 @@ static int aldebaran_mode2_restore_ip(struct
> >> amdgpu_device *adev)
> >>>   adev->gfx.rlc.funcs->resume(adev);
> >>>
> >>>   /* Wait for FW reset event complete */
> >>> - r = smu_wait_for_event(adev, SMU_EVENT_RESET_COMPLETE, 0);
> >>> + r = amdgpu_dpm_wait_for_event(adev,
> >> SMU_EVENT_RESET_COMPLETE, 0);
> >>>   if (r) {
> >>>   dev_err(adev->dev,
> >>>   "Failed to get response from firmware after 
> >>> reset\n");
> >> diff --git
> >>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> >>> index 164d6a9e9fbb..0d1f00b24aae 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> >>> @@ -1585,22 +1585,25 @@ static int amdgpu_debugfs_sclk_set(void
> >>> *data,
> >> u64 val)
> >>>   return ret;
> >>>   }
> >>>
> >>> - if (is_support_sw_smu(adev)) {
> >>> - ret = smu_get_dpm_freq_range(&adev->smu, SMU_SCLK,
> >> &min_freq, &max_freq);
> >>> - if (ret || val > max_freq || val < min_freq)
> >>> - return -EINVAL;
> >>> - ret = smu_set_soft_freq_range(&adev->smu, SMU_SCLK,
> >> (uint32_t)val, (uint32_t)val);
> >>> - } else {
> >>> - return 0;
> >>> + ret = amdgpu_dpm_get_dpm_freq_range(adev, PP_SCLK,
> >> &min_freq, &max_freq);
> >>> + if (ret == -EOPNOTSUPP) {
> >>> + ret = 0;
> >>> + goto out;
> >>>   }
> >>> + if (ret || val > max_freq || val < min_freq) {
> >>> + ret = -EINVAL;
> >>> + goto out;
> >>> + }
> >>> +
> >>> + ret = amdgpu_dpm_set_soft_freq_range(adev, PP_SCLK,
> >> (uint32_t)val, (uint32_t)val);
> >>> + if (ret)
> >>> + ret = -EINVAL;
> >>>
> >>> +out:
> >>>   pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
> >>>   pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> >>>
> >>> - if (ret)
> >>> - return -EINVAL;
> >>> -
> >>> - return 0;
> >>> + return ret;
> >>>}
> >>>
> >>>DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL, diff --git
> >>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> index 1989f9e9379e..41cc1ffb5809 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> @@ -2617,7 +2617,7 @@ static int amdgpu_device_ip_late_init(struct
> >> amdgpu_device *adev)
> >>>   if (adev->asic_type == CHIP_ARCTURUS &&
> >>>   amdgpu_passthrough(adev) &&
> >>>   

Re: [PATCH v3] drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()

2021-11-30 Thread Christian König

Am 30.11.21 um 16:57 schrieb Zhou Qingyang:

In radeon_driver_open_kms(), the return value of radeon_vm_bo_add() is
assigned to vm->ib_bo_va and later used in radeon_vm_bo_set_addr(). In
radeon_vm_bo_set_addr(), there is a dereference of vm->ib_bo_va,
which could lead to a NULL pointer dereference on failure of
radeon_vm_bo_add().

Fix this bug by adding a check of vm->ib_bo_va.

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_DRM_RADEON=m show no new warnings,
and our static analyzer no longer warns about this code.

Fixes: cc9e67e3d700 ("drm/radeon: fix VM IB handling")
Signed-off-by: Zhou Qingyang 


Reviewed-by: Christian König 


---
Changes in v3:
   -  Fix the bug that good case will also be freed
   -  Improve code style

Changes in v2:
   -  Improve the error handling into goto style

  drivers/gpu/drm/radeon/radeon_kms.c | 35 -
  1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 482fb0ae6cb5..439f4d1fdd65 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -648,6 +648,8 @@ void radeon_driver_lastclose_kms(struct drm_device *dev)
  int radeon_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
  {
struct radeon_device *rdev = dev->dev_private;
+   struct radeon_fpriv *fpriv;
+   struct radeon_vm *vm;
int r;
  
  	file_priv->driver_priv = NULL;

@@ -660,8 +662,6 @@ int radeon_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
  
  	/* new gpu have virtual address space support */

if (rdev->family >= CHIP_CAYMAN) {
-   struct radeon_fpriv *fpriv;
-   struct radeon_vm *vm;
  
  		fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);

if (unlikely(!fpriv)) {
@@ -672,35 +672,38 @@ int radeon_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
if (rdev->accel_working) {
vm = &fpriv->vm;
r = radeon_vm_init(rdev, vm);
-   if (r) {
-   kfree(fpriv);
-   goto out_suspend;
-   }
+   if (r)
+   goto out_fpriv;
  
  			r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false);

-   if (r) {
-   radeon_vm_fini(rdev, vm);
-   kfree(fpriv);
-   goto out_suspend;
-   }
+   if (r)
+   goto out_vm_fini;
  
  			/* map the ib pool buffer read only into

 * virtual address space */
vm->ib_bo_va = radeon_vm_bo_add(rdev, vm,
rdev->ring_tmp_bo.bo);
+   if (!vm->ib_bo_va) {
+   r = -ENOMEM;
+   goto out_vm_fini;
+   }
+
r = radeon_vm_bo_set_addr(rdev, vm->ib_bo_va,
  RADEON_VA_IB_OFFSET,
  RADEON_VM_PAGE_READABLE |
  RADEON_VM_PAGE_SNOOPED);
-   if (r) {
-   radeon_vm_fini(rdev, vm);
-   kfree(fpriv);
-   goto out_suspend;
-   }
+   if (r)
+   goto out_vm_fini;
}
file_priv->driver_priv = fpriv;
}
  
+out_vm_fini:

+   if (r)
+   radeon_vm_fini(rdev, vm);
+out_fpriv:
+   if (r)
+   kfree(fpriv);
  out_suspend:
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);




Re: [PATCH] drm/amdgpu: add SMU debug option support

2021-11-30 Thread Lazar, Lijo




On 12/1/2021 11:57 AM, Yu, Lang wrote:

[AMD Official Use Only]

Hi Lijo,

Thanks for your comments.
  
 From my understanding, that just increases the timeout threshold and
could hide some potential issues which should be exposed and solved.

If the current timeout threshold is not enough for some corner cases,
(1) Do we consider increasing the threshold to cover these cases?
(2) Or do we just expose them and request the SMU FW team to optimize them?

I think it doesn't make much sense to increase the threshold in debug mode.
What do you think? Thanks!


In normal cases, 2 secs would be more than enough. If we hang immediately 
and then check the FW registers later, the response would have come. I 
thought we just need to note those cases and not fail every time, just mark 
a red flag in the log to tell us that the FW is unexpectedly busy processing 
something else when the message is sent.


There are some issues related to S0ix where we see the FW come back 
with a response only after an increased timeout under certain conditions.


Thanks,
Lijo



Regards,
Lang


-Original Message-
From: Lazar, Lijo 
Sent: Wednesday, December 1, 2021 1:44 PM
To: Lazar, Lijo ; Yu, Lang ; amd-
g...@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray
; Koenig, Christian 
Subject: RE: [PATCH] drm/amdgpu: add SMU debug option support

Just realized that the patch I pasted won't work. Outer loop exit needs to be 
like
this.
(reg & MP1_C2PMSG_90__CONTENT_MASK) != 0 && extended_wait-- >=
0

Anyway, that patch is only there to communicate what I really meant in the
earlier comment.

Thanks,
Lijo

-Original Message-
From: amd-gfx  On Behalf Of Lazar,
Lijo
Sent: Wednesday, December 1, 2021 10:44 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray
; Koenig, Christian 
Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support



On 11/30/2021 10:47 AM, Lang Yu wrote:

To maintain system error state when SMU errors occur, which will
aid in debugging SMU firmware issues, add SMU debug option support.

It can be enabled or disabled via amdgpu_smu_debug debugfs file. When
enabled, it makes SMU errors fatal.
It is disabled by default.

== Command Guide ==

1, enable SMU debug option

   # echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

2, disable SMU debug option

   # echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

v2:
   - Resend command when timeout.(Lijo)
   - Use debugfs file instead of module parameter.

Signed-off-by: Lang Yu 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 32 +
   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 39 +++-

-

   2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 164d6a9e9fbb..f9412de86599 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -39,6 +39,8 @@

   #if defined(CONFIG_DEBUG_FS)

+extern int amdgpu_smu_debug;
+
   /**
* amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
*
@@ -1152,6 +1154,8 @@ static ssize_t amdgpu_debugfs_gfxoff_read(struct

file *f, char __user *buf,

return result;
   }

+
+
   static const struct file_operations amdgpu_debugfs_regs2_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = amdgpu_debugfs_regs2_ioctl, @@ -1609,6 +1613,26
@@ DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
   DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL,
amdgpu_debugfs_sclk_set, "%llu\n");

+static int amdgpu_debugfs_smu_debug_get(void *data, u64 *val) {
+   *val = amdgpu_smu_debug;
+   return 0;
+}
+
+static int amdgpu_debugfs_smu_debug_set(void *data, u64 val) {
+   if (val != 0 && val != 1)
+   return -EINVAL;
+
+   amdgpu_smu_debug = val;
+   return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(fops_smu_debug,
+amdgpu_debugfs_smu_debug_get,
+amdgpu_debugfs_smu_debug_set,
+"%llu\n");
+
   int amdgpu_debugfs_init(struct amdgpu_device *adev)
   {
struct dentry *root = adev_to_drm(adev)->primary->debugfs_root;
@@ -1632,6 +1656,14 @@ int amdgpu_debugfs_init(struct amdgpu_device

*adev)

return PTR_ERR(ent);
}

+   ent = debugfs_create_file("amdgpu_smu_debug", 0600, root, adev,
+ _smu_debug);
+   if (IS_ERR(ent)) {
+   DRM_ERROR("unable to create amdgpu_smu_debug debugfs

file\n");

+   return PTR_ERR(ent);
+   }
+
+
/* Register debugfs entries for amdgpu_ttm */
amdgpu_ttm_debugfs_init(adev);
amdgpu_debugfs_pm_init(adev);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 048ca1673863..b3969d7933d3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -55,6 +55,14 

Re: [PATCH V2 13/17] drm/amd/pm: do not expose the smu_context structure used internally in power

2021-11-30 Thread Lazar, Lijo




On 12/1/2021 11:09 AM, Quan, Evan wrote:

[AMD Official Use Only]




-Original Message-
From: Lazar, Lijo 
Sent: Tuesday, November 30, 2021 9:58 PM
To: Quan, Evan ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian
; Feng, Kenneth 
Subject: Re: [PATCH V2 13/17] drm/amd/pm: do not expose the
smu_context structure used internally in power



On 11/30/2021 1:12 PM, Evan Quan wrote:

This can hide the power implementation details. And, as was done for the
powerplay framework, we hook the smu_context to adev->powerplay.pp_handle.

Signed-off-by: Evan Quan 
Change-Id: I3969c9f62a8b63dc6e4321a488d8f15022ffeb3d
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  6 --
   .../gpu/drm/amd/include/kgd_pp_interface.h|  9 +++
   drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 51 ++--
   drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   | 11 +---
   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 60

+--

   .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c |  9 +--
   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   |  9 +--
   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   |  9 +--
   .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c|  4 +-
   .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  9 +--
   .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c|  8 +--
   11 files changed, 111 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c987813a4996..fefabd568483 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -99,7 +99,6 @@
   #include "amdgpu_gem.h"
   #include "amdgpu_doorbell.h"
   #include "amdgpu_amdkfd.h"
-#include "amdgpu_smu.h"
   #include "amdgpu_discovery.h"
   #include "amdgpu_mes.h"
   #include "amdgpu_umc.h"
@@ -950,11 +949,6 @@ struct amdgpu_device {

/* powerplay */
struct amd_powerplaypowerplay;
-
-   /* smu */
-   struct smu_context  smu;
-
-   /* dpm */
struct amdgpu_pmpm;
u32 cg_flags;
u32 pg_flags;
diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 7919e96e772b..da6a82430048 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -25,6 +25,9 @@
   #define __KGD_PP_INTERFACE_H__

   extern const struct amdgpu_ip_block_version pp_smu_ip_block;
+extern const struct amdgpu_ip_block_version smu_v11_0_ip_block;
+extern const struct amdgpu_ip_block_version smu_v12_0_ip_block;
+extern const struct amdgpu_ip_block_version smu_v13_0_ip_block;

   enum smu_event_type {
SMU_EVENT_RESET_COMPLETE = 0,
@@ -244,6 +247,12 @@ enum pp_power_type
PP_PWR_TYPE_FAST,
   };

+enum smu_ppt_limit_type
+{
+   SMU_DEFAULT_PPT_LIMIT = 0,
+   SMU_FAST_PPT_LIMIT,
+};
+


This is a contradiction. If the entry point is dpm, this shouldn't be here and
the external interface doesn't need to know about internal datatypes.

[Quan, Evan] This is needed by amdgpu_hwmon_show_power_label() from amdgpu_pm.c.
So, it has to be put into some place which can be accessed from outside (of
power).
Then kgd_pp_interface.h is the right place.


The public data types are enum pp_power_type and enum pp_power_limit_level.

The first one describes the type of power limit (fast/slow/sustained)
and the second one the min/max/default values for different limits.


To show the label, use the pp_power_type type.
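Lijo's point can be illustrated with a small sketch of a label helper keyed off the public pp_power_type, so callers such as amdgpu_hwmon_show_power_label() never touch the internal smu_ppt_limit_type. The SUSTAINED enumerator value and the label strings below are assumptions for illustration, not the driver's exact definitions:

```c
/* Illustrative subset of the public power-type enum; PP_PWR_TYPE_FAST
 * appears in the patch, PP_PWR_TYPE_SUSTAINED is assumed here. */
enum pp_power_type {
	PP_PWR_TYPE_SUSTAINED = 0,
	PP_PWR_TYPE_FAST,
};

/* A hwmon-style label helper keyed off the public type only, keeping
 * the internal smu_ppt_limit_type private to the swsmu code. */
static const char *power_type_label(enum pp_power_type type)
{
	switch (type) {
	case PP_PWR_TYPE_FAST:
		return "fastPPT";
	case PP_PWR_TYPE_SUSTAINED:
		return "slowPPT";
	default:
		return "unknown";
	}
}
```

With this shape, the external interface in kgd_pp_interface.h needs only pp_power_type, and the limit-type enum can stay internal.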






   #define PP_GROUP_MASK0xF000
   #define PP_GROUP_SHIFT   28

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 8f0ae58f4292..a5cbbf9367fe 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -31,6 +31,7 @@
   #include "amdgpu_display.h"
   #include "hwmgr.h"
   #include 
+#include "amdgpu_smu.h"

   #define amdgpu_dpm_enable_bapm(adev, e) \

((adev)->powerplay.pp_funcs->enable_bapm((adev)-
powerplay.pp_handle,
(e))) @@ -213,7 +214,7 @@ int amdgpu_dpm_baco_reset(struct
amdgpu_device *adev)

   bool amdgpu_dpm_is_mode1_reset_supported(struct amdgpu_device

*adev)

   {
-   struct smu_context *smu = >smu;
+   struct smu_context *smu = adev->powerplay.pp_handle;

if (is_support_sw_smu(adev))
return smu_mode1_reset_is_support(smu); @@ -223,7

+224,7 @@ bool

amdgpu_dpm_is_mode1_reset_supported(struct amdgpu_device *adev)

   int amdgpu_dpm_mode1_reset(struct amdgpu_device *adev)
   {
-   struct smu_context *smu = >smu;
+   struct smu_context *smu = adev->powerplay.pp_handle;

if (is_support_sw_smu(adev))
return smu_mode1_reset(smu);
@@ -276,7 +277,7 @@ int amdgpu_dpm_set_df_cstate(struct

amdgpu_device

*adev,

   int amdgpu_dpm_allow_xgmi_power_down(struct amdgpu_device

*adev, bool en)

   {
-   struct smu_context 

RE: [PATCH] drm/amdgpu: add SMU debug option support

2021-11-30 Thread Yu, Lang
[AMD Official Use Only]

Hi Lijo,

Thanks for your comments.
 
From my understanding, that just increases the timeout threshold and
could hide some potential issues which should be exposed and solved.

If the current timeout threshold is not enough for some corner cases,
(1) Do we consider increasing the threshold to cover these cases?
(2) Or do we just expose them and request SMU FW to optimize them?

I think it doesn't make much sense to increase the threshold in debug mode.
What do you think? Thanks!

Regards,
Lang

>-Original Message-
>From: Lazar, Lijo 
>Sent: Wednesday, December 1, 2021 1:44 PM
>To: Lazar, Lijo ; Yu, Lang ; amd-
>g...@lists.freedesktop.org
>Cc: Deucher, Alexander ; Huang, Ray
>; Koenig, Christian 
>Subject: RE: [PATCH] drm/amdgpu: add SMU debug option support
>
>Just realized that the patch I pasted won't work. Outer loop exit needs to be 
>like
>this.
>   (reg & MP1_C2PMSG_90__CONTENT_MASK) != 0 && extended_wait-- >=
>0
>
>Anyway, that patch is only there to communicate what I really meant in the
>earlier comment.
>
>Thanks,
>Lijo
>
>-Original Message-
>From: amd-gfx  On Behalf Of Lazar,
>Lijo
>Sent: Wednesday, December 1, 2021 10:44 AM
>To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Huang, Ray
>; Koenig, Christian 
>Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support
>
>
>
>On 11/30/2021 10:47 AM, Lang Yu wrote:
>> To maintain system error state when SMU errors occur, which will
>> aid in debugging SMU firmware issues, add SMU debug option support.
>>
>> It can be enabled or disabled via amdgpu_smu_debug debugfs file. When
>> enabled, it makes SMU errors fatal.
>> It is disabled by default.
>>
>> == Command Guide ==
>>
>> 1, enable SMU debug option
>>
>>   # echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
>>
>> 2, disable SMU debug option
>>
>>   # echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
>>
>> v2:
>>   - Resend command when timeout.(Lijo)
>>   - Use debugfs file instead of module parameter.
>>
>> Signed-off-by: Lang Yu 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 32 +
>>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 39 +++-
>-
>>   2 files changed, 69 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> index 164d6a9e9fbb..f9412de86599 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> @@ -39,6 +39,8 @@
>>
>>   #if defined(CONFIG_DEBUG_FS)
>>
>> +extern int amdgpu_smu_debug;
>> +
>>   /**
>>* amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
>>*
>> @@ -1152,6 +1154,8 @@ static ssize_t amdgpu_debugfs_gfxoff_read(struct
>file *f, char __user *buf,
>>  return result;
>>   }
>>
>> +
>> +
>>   static const struct file_operations amdgpu_debugfs_regs2_fops = {
>>  .owner = THIS_MODULE,
>>  .unlocked_ioctl = amdgpu_debugfs_regs2_ioctl, @@ -1609,6 +1613,26
>> @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
>>   DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL,
>>  amdgpu_debugfs_sclk_set, "%llu\n");
>>
>> +static int amdgpu_debugfs_smu_debug_get(void *data, u64 *val) {
>> +*val = amdgpu_smu_debug;
>> +return 0;
>> +}
>> +
>> +static int amdgpu_debugfs_smu_debug_set(void *data, u64 val) {
>> +if (val != 0 && val != 1)
>> +return -EINVAL;
>> +
>> +amdgpu_smu_debug = val;
>> +return 0;
>> +}
>> +
>> +DEFINE_DEBUGFS_ATTRIBUTE(fops_smu_debug,
>> + amdgpu_debugfs_smu_debug_get,
>> + amdgpu_debugfs_smu_debug_set,
>> + "%llu\n");
>> +
>>   int amdgpu_debugfs_init(struct amdgpu_device *adev)
>>   {
>>  struct dentry *root = adev_to_drm(adev)->primary->debugfs_root;
>> @@ -1632,6 +1656,14 @@ int amdgpu_debugfs_init(struct amdgpu_device
>*adev)
>>  return PTR_ERR(ent);
>>  }
>>
>> +ent = debugfs_create_file("amdgpu_smu_debug", 0600, root, adev,
>> +  _smu_debug);
>> +if (IS_ERR(ent)) {
>> +DRM_ERROR("unable to create amdgpu_smu_debug debugfs
>file\n");
>> +return PTR_ERR(ent);
>> +}
>> +
>> +
>>  /* Register debugfs entries for amdgpu_ttm */
>>  amdgpu_ttm_debugfs_init(adev);
>>  amdgpu_debugfs_pm_init(adev);
>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>> index 048ca1673863..b3969d7933d3 100644
>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>> @@ -55,6 +55,14 @@
>>
>>   #undef __SMU_DUMMY_MAP
>>   #define __SMU_DUMMY_MAP(type)  #type
>> +
>> +/*
>> + * Used to enable SMU debug option. When enabled, it makes SMU errors
>fatal.
>> + * This will aid in debugging SMU firmware issues.
>> + * (0 = disabled (default), 1 = enabled)
>> + */
>> +int amdgpu_smu_debug;
>> +
>>   static const char * const 

RE: [PATCH V2 14/17] drm/amd/pm: relocate the power related headers

2021-11-30 Thread Quan, Evan
[AMD Official Use Only]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Tuesday, November 30, 2021 10:07 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Feng, Kenneth 
> Subject: Re: [PATCH V2 14/17] drm/amd/pm: relocate the power related
> headers
> 
> 
> 
> On 11/30/2021 1:12 PM, Evan Quan wrote:
> > Instead of centralizing all headers in the same folder, separate them
> > into different folders and place them next to the source files that
> > really need them.
> >
> > Signed-off-by: Evan Quan 
> > Change-Id: Id74cb4c7006327ca7ecd22daf17321e417c4aa71
> > ---
> >   drivers/gpu/drm/amd/pm/Makefile   | 10 +++---
> >   drivers/gpu/drm/amd/pm/legacy-dpm/Makefile| 32
> +++
> >   .../pm/{powerplay => legacy-dpm}/cik_dpm.h|  0
> >   .../amd/pm/{powerplay => legacy-dpm}/kv_dpm.c |  0
> >   .../amd/pm/{powerplay => legacy-dpm}/kv_dpm.h |  0
> >   .../amd/pm/{powerplay => legacy-dpm}/kv_smc.c |  0
> >   .../pm/{powerplay => legacy-dpm}/legacy_dpm.c |  0
> >   .../pm/{powerplay => legacy-dpm}/legacy_dpm.h |  0
> >   .../amd/pm/{powerplay => legacy-dpm}/ppsmc.h  |  0
> >   .../pm/{powerplay => legacy-dpm}/r600_dpm.h   |  0
> >   .../amd/pm/{powerplay => legacy-dpm}/si_dpm.c |  0
> >   .../amd/pm/{powerplay => legacy-dpm}/si_dpm.h |  0
> >   .../amd/pm/{powerplay => legacy-dpm}/si_smc.c |  0
> >   .../{powerplay => legacy-dpm}/sislands_smc.h  |  0
> >   drivers/gpu/drm/amd/pm/powerplay/Makefile |  6 +---
> >   .../pm/{ => powerplay}/inc/amd_powerplay.h|  0
> >   .../drm/amd/pm/{ => powerplay}/inc/cz_ppsmc.h |  0
> >   .../amd/pm/{ => powerplay}/inc/fiji_ppsmc.h   |  0
> >   .../pm/{ => powerplay}/inc/hardwaremanager.h  |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/hwmgr.h|  0
> >   .../{ => powerplay}/inc/polaris10_pwrvirus.h  |  0
> >   .../amd/pm/{ => powerplay}/inc/power_state.h  |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/pp_debug.h |  0
> >   .../amd/pm/{ => powerplay}/inc/pp_endian.h|  0
> >   .../amd/pm/{ => powerplay}/inc/pp_thermal.h   |  0
> >   .../amd/pm/{ => powerplay}/inc/ppinterrupt.h  |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/rv_ppsmc.h |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/smu10.h|  0
> >   .../pm/{ => powerplay}/inc/smu10_driver_if.h  |  0
> >   .../pm/{ => powerplay}/inc/smu11_driver_if.h  |  0
> >   .../gpu/drm/amd/pm/{ => powerplay}/inc/smu7.h |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/smu71.h|  0
> >   .../pm/{ => powerplay}/inc/smu71_discrete.h   |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/smu72.h|  0
> >   .../pm/{ => powerplay}/inc/smu72_discrete.h   |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/smu73.h|  0
> >   .../pm/{ => powerplay}/inc/smu73_discrete.h   |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/smu74.h|  0
> >   .../pm/{ => powerplay}/inc/smu74_discrete.h   |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/smu75.h|  0
> >   .../pm/{ => powerplay}/inc/smu75_discrete.h   |  0
> >   .../amd/pm/{ => powerplay}/inc/smu7_common.h  |  0
> >   .../pm/{ => powerplay}/inc/smu7_discrete.h|  0
> >   .../amd/pm/{ => powerplay}/inc/smu7_fusion.h  |  0
> >   .../amd/pm/{ => powerplay}/inc/smu7_ppsmc.h   |  0
> >   .../gpu/drm/amd/pm/{ => powerplay}/inc/smu8.h |  0
> >   .../amd/pm/{ => powerplay}/inc/smu8_fusion.h  |  0
> >   .../gpu/drm/amd/pm/{ => powerplay}/inc/smu9.h |  0
> >   .../pm/{ => powerplay}/inc/smu9_driver_if.h   |  0
> >   .../{ => powerplay}/inc/smu_ucode_xfer_cz.h   |  0
> >   .../{ => powerplay}/inc/smu_ucode_xfer_vi.h   |  0
> >   .../drm/amd/pm/{ => powerplay}/inc/smumgr.h   |  0
> >   .../amd/pm/{ => powerplay}/inc/tonga_ppsmc.h  |  0
> >   .../amd/pm/{ => powerplay}/inc/vega10_ppsmc.h |  0
> >   .../inc/vega12/smu9_driver_if.h   |  0
> >   .../amd/pm/{ => powerplay}/inc/vega12_ppsmc.h |  0
> >   .../amd/pm/{ => powerplay}/inc/vega20_ppsmc.h |  0
> >   .../amd/pm/{ => swsmu}/inc/aldebaran_ppsmc.h  |  0
> >   .../drm/amd/pm/{ => swsmu}/inc/amdgpu_smu.h   |  0
> >   .../amd/pm/{ => swsmu}/inc/arcturus_ppsmc.h   |  0
> >   .../inc/smu11_driver_if_arcturus.h|  0
> >   .../inc/smu11_driver_if_cyan_skillfish.h  |  0
> >   .../{ => swsmu}/inc/smu11_driver_if_navi10.h  |  0
> >   .../inc/smu11_driver_if_sienna_cichlid.h  |  0
> >   .../{ => swsmu}/inc/smu11_driver_if_vangogh.h |  0
> >   .../amd/pm/{ => swsmu}/inc/smu12_driver_if.h  |  0
> >   .../inc/smu13_driver_if_aldebaran.h   |  0
> >   .../inc/smu13_driver_if_yellow_carp.h |  0
> >   .../pm/{ => swsmu}/inc/smu_11_0_cdr_table.h   |  0
> >   .../drm/amd/pm/{ => swsmu}/inc/smu_types.h|  0
> >   .../drm/amd/pm/{ => swsmu}/inc/smu_v11_0.h|  0
> >   .../pm/{ => swsmu}/inc/smu_v11_0_7_ppsmc.h|  0
> >   .../pm/{ => swsmu}/inc/smu_v11_0_7_pptable.h  |  0
> >   .../amd/pm/{ => swsmu}/inc/smu_v11_0_ppsmc.h  |  0
> >   .../pm/{ => swsmu}/inc/smu_v11_0_pptable.h|  0
> >   .../amd/pm/{ => swsmu}/inc/smu_v11_5_pmfw.h   

RE: [PATCH] drm/amdgpu: add SMU debug option support

2021-11-30 Thread Lazar, Lijo
Just realized that the patch I pasted won't work. Outer loop exit needs to be 
like this. 
(reg & MP1_C2PMSG_90__CONTENT_MASK) != 0 && extended_wait-- >= 0

Anyway, that patch is only there to communicate what I really meant in the 
earlier comment.

Thanks,
Lijo
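What the pasted fix is driving at can be modeled outside the kernel as a poll loop with an extra wait budget past the normal timeout; the register mock, the budgets, and the error value below are illustrative assumptions, not the driver's actual code:

```c
#include <stdint.h>

#define MP1_C2PMSG_90__CONTENT_MASK 0xFFFFFFFFu /* assumed mask value */
#define ETIME_ERR 62                            /* stand-in for -ETIME */

/* Mock of the response register: reads as 0 until the "FW" answers. */
static int polls_until_response;
static uint32_t read_resp_reg(void)
{
	return (--polls_until_response <= 0) ? 0x1u : 0x0u;
}

/* Poll for a response; once the normal budget is exhausted, keep
 * polling for up to `extended_wait` extra iterations and report
 * (via *used_extension) that the red-flag path was taken. */
static int poll_with_extended_wait(int normal_budget, int extended_wait,
				   int *used_extension)
{
	uint32_t reg = 0;
	int i;

	*used_extension = 0;
	for (i = 0; i < normal_budget; i++) {
		reg = read_resp_reg();
		if (reg & MP1_C2PMSG_90__CONTENT_MASK)
			return 0; /* response arrived within the normal wait */
	}
	/* Normal timeout hit: extend the wait, continuing while the
	 * content bits are still 0 and extended budget remains. */
	*used_extension = 1;
	while ((reg & MP1_C2PMSG_90__CONTENT_MASK) == 0 && extended_wait-- > 0)
		reg = read_resp_reg();
	return (reg & MP1_C2PMSG_90__CONTENT_MASK) ? 0 : -ETIME_ERR;
}
```

A caller would treat success with *used_extension set as the "red flag" case worth logging, rather than a hard failure.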

-Original Message-
From: amd-gfx  On Behalf Of Lazar, Lijo
Sent: Wednesday, December 1, 2021 10:44 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray 
; Koenig, Christian 
Subject: Re: [PATCH] drm/amdgpu: add SMU debug option support



On 11/30/2021 10:47 AM, Lang Yu wrote:
> To maintain system error state when SMU errors occur, which will
> aid in debugging SMU firmware issues, add SMU debug option support.
> 
> It can be enabled or disabled via amdgpu_smu_debug debugfs file. When 
> enabled, it makes SMU errors fatal.
> It is disabled by default.
> 
> == Command Guide ==
> 
> 1, enable SMU debug option
> 
>   # echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
> 
> 2, disable SMU debug option
> 
>   # echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
> 
> v2:
>   - Resend command when timeout.(Lijo)
>   - Use debugfs file instead of module parameter.
> 
> Signed-off-by: Lang Yu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 32 +
>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 39 +++--
>   2 files changed, 69 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index 164d6a9e9fbb..f9412de86599 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -39,6 +39,8 @@
>   
>   #if defined(CONFIG_DEBUG_FS)
>   
> +extern int amdgpu_smu_debug;
> +
>   /**
>* amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
>*
> @@ -1152,6 +1154,8 @@ static ssize_t amdgpu_debugfs_gfxoff_read(struct file 
> *f, char __user *buf,
>   return result;
>   }
>   
> +
> +
>   static const struct file_operations amdgpu_debugfs_regs2_fops = {
>   .owner = THIS_MODULE,
>   .unlocked_ioctl = amdgpu_debugfs_regs2_ioctl, @@ -1609,6 +1613,26 
> @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
>   DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL,
>   amdgpu_debugfs_sclk_set, "%llu\n");
>   
> +static int amdgpu_debugfs_smu_debug_get(void *data, u64 *val) {
> + *val = amdgpu_smu_debug;
> + return 0;
> +}
> +
> +static int amdgpu_debugfs_smu_debug_set(void *data, u64 val) {
> + if (val != 0 && val != 1)
> + return -EINVAL;
> +
> + amdgpu_smu_debug = val;
> + return 0;
> +}
> +
> +DEFINE_DEBUGFS_ATTRIBUTE(fops_smu_debug,
> +  amdgpu_debugfs_smu_debug_get,
> +  amdgpu_debugfs_smu_debug_set,
> +  "%llu\n");
> +
>   int amdgpu_debugfs_init(struct amdgpu_device *adev)
>   {
>   struct dentry *root = adev_to_drm(adev)->primary->debugfs_root;
> @@ -1632,6 +1656,14 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
>   return PTR_ERR(ent);
>   }
>   
> + ent = debugfs_create_file("amdgpu_smu_debug", 0600, root, adev,
> +   _smu_debug);
> + if (IS_ERR(ent)) {
> + DRM_ERROR("unable to create amdgpu_smu_debug debugfs file\n");
> + return PTR_ERR(ent);
> + }
> +
> +
>   /* Register debugfs entries for amdgpu_ttm */
>   amdgpu_ttm_debugfs_init(adev);
>   amdgpu_debugfs_pm_init(adev);
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index 048ca1673863..b3969d7933d3 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -55,6 +55,14 @@
>   
>   #undef __SMU_DUMMY_MAP
>   #define __SMU_DUMMY_MAP(type)   #type
> +
> +/*
> + * Used to enable SMU debug option. When enabled, it makes SMU errors fatal.
> + * This will aid in debugging SMU firmware issues.
> + * (0 = disabled (default), 1 = enabled)
> + */
> +int amdgpu_smu_debug;
> +
>   static const char * const __smu_message_names[] = {
>   SMU_MESSAGE_TYPES
>   };
> @@ -272,6 +280,11 @@ int smu_cmn_send_msg_without_waiting(struct smu_context 
> *smu,
>   __smu_cmn_send_msg(smu, msg_index, param);
>   res = 0;
>   Out:
> + if (unlikely(amdgpu_smu_debug == 1) && res) {
> + mutex_unlock(>message_lock);
> + BUG();
> + }
> +
>   return res;
>   }
>   
> @@ -288,9 +301,17 @@ int smu_cmn_send_msg_without_waiting(struct smu_context 
> *smu,
>   int smu_cmn_wait_for_response(struct smu_context *smu)
>   {
>   u32 reg;
> + int res;
>   
>   reg = __smu_cmn_poll_stat(smu);
> - return __smu_cmn_reg2errno(smu, reg);
> + res = __smu_cmn_reg2errno(smu, reg);
> +
> + if (unlikely(amdgpu_smu_debug == 1) && res) {
> + mutex_unlock(>message_lock);
> + BUG();
> + }
> +
> + return res;
> 

RE: [PATCH V2 13/17] drm/amd/pm: do not expose the smu_context structure used internally in power

2021-11-30 Thread Quan, Evan
[AMD Official Use Only]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Tuesday, November 30, 2021 9:58 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Feng, Kenneth 
> Subject: Re: [PATCH V2 13/17] drm/amd/pm: do not expose the
> smu_context structure used internally in power
> 
> 
> 
> On 11/30/2021 1:12 PM, Evan Quan wrote:
> > This can hide the power implementation details. And, as was done for the
> > powerplay framework, we hook the smu_context to adev->powerplay.pp_handle.
> >
> > Signed-off-by: Evan Quan 
> > Change-Id: I3969c9f62a8b63dc6e4321a488d8f15022ffeb3d
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  6 --
> >   .../gpu/drm/amd/include/kgd_pp_interface.h|  9 +++
> >   drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 51 ++--
> >   drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   | 11 +---
> >   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 60
> +--
> >   .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c |  9 +--
> >   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   |  9 +--
> >   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   |  9 +--
> >   .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c|  4 +-
> >   .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  9 +--
> >   .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c|  8 +--
> >   11 files changed, 111 insertions(+), 74 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index c987813a4996..fefabd568483 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -99,7 +99,6 @@
> >   #include "amdgpu_gem.h"
> >   #include "amdgpu_doorbell.h"
> >   #include "amdgpu_amdkfd.h"
> > -#include "amdgpu_smu.h"
> >   #include "amdgpu_discovery.h"
> >   #include "amdgpu_mes.h"
> >   #include "amdgpu_umc.h"
> > @@ -950,11 +949,6 @@ struct amdgpu_device {
> >
> > /* powerplay */
> > struct amd_powerplaypowerplay;
> > -
> > -   /* smu */
> > -   struct smu_context  smu;
> > -
> > -   /* dpm */
> > struct amdgpu_pmpm;
> > u32 cg_flags;
> > u32 pg_flags;
> > diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > index 7919e96e772b..da6a82430048 100644
> > --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > @@ -25,6 +25,9 @@
> >   #define __KGD_PP_INTERFACE_H__
> >
> >   extern const struct amdgpu_ip_block_version pp_smu_ip_block;
> > +extern const struct amdgpu_ip_block_version smu_v11_0_ip_block;
> > +extern const struct amdgpu_ip_block_version smu_v12_0_ip_block;
> > +extern const struct amdgpu_ip_block_version smu_v13_0_ip_block;
> >
> >   enum smu_event_type {
> > SMU_EVENT_RESET_COMPLETE = 0,
> > @@ -244,6 +247,12 @@ enum pp_power_type
> > PP_PWR_TYPE_FAST,
> >   };
> >
> > +enum smu_ppt_limit_type
> > +{
> > +   SMU_DEFAULT_PPT_LIMIT = 0,
> > +   SMU_FAST_PPT_LIMIT,
> > +};
> > +
> 
> This is a contradiction. If the entry point is dpm, this shouldn't be here and
> the external interface doesn't need to know about internal datatypes.
[Quan, Evan] This is needed by amdgpu_hwmon_show_power_label() from
amdgpu_pm.c.
So, it has to be put into some place which can be accessed from outside (of
power).
Then kgd_pp_interface.h is the right place.

> 
> >   #define PP_GROUP_MASK0xF000
> >   #define PP_GROUP_SHIFT   28
> >
> > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > index 8f0ae58f4292..a5cbbf9367fe 100644
> > --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > @@ -31,6 +31,7 @@
> >   #include "amdgpu_display.h"
> >   #include "hwmgr.h"
> >   #include 
> > +#include "amdgpu_smu.h"
> >
> >   #define amdgpu_dpm_enable_bapm(adev, e) \
> >
> > ((adev)->powerplay.pp_funcs->enable_bapm((adev)-
> >powerplay.pp_handle,
> > (e))) @@ -213,7 +214,7 @@ int amdgpu_dpm_baco_reset(struct
> > amdgpu_device *adev)
> >
> >   bool amdgpu_dpm_is_mode1_reset_supported(struct amdgpu_device
> *adev)
> >   {
> > -   struct smu_context *smu = >smu;
> > +   struct smu_context *smu = adev->powerplay.pp_handle;
> >
> > if (is_support_sw_smu(adev))
> > return smu_mode1_reset_is_support(smu); @@ -223,7
> +224,7 @@ bool
> > amdgpu_dpm_is_mode1_reset_supported(struct amdgpu_device *adev)
> >
> >   int amdgpu_dpm_mode1_reset(struct amdgpu_device *adev)
> >   {
> > -   struct smu_context *smu = >smu;
> > +   struct smu_context *smu = adev->powerplay.pp_handle;
> >
> > if (is_support_sw_smu(adev))
> > return smu_mode1_reset(smu);
> > @@ -276,7 +277,7 @@ int amdgpu_dpm_set_df_cstate(struct
> amdgpu_device
> > *adev,
> >
> >   int amdgpu_dpm_allow_xgmi_power_down(struct amdgpu_device
> *adev, bool en)
> >   {
> > -   

Re: [PATCH] drm/amdgpu: add SMU debug option support

2021-11-30 Thread Lazar, Lijo




On 11/30/2021 10:47 AM, Lang Yu wrote:

To maintain system error state when SMU errors occur,
which will aid in debugging SMU firmware issues,
add SMU debug option support.

It can be enabled or disabled via amdgpu_smu_debug
debugfs file. When enabled, it makes SMU errors fatal.
It is disabled by default.

== Command Guide ==

1, enable SMU debug option

  # echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

2, disable SMU debug option

  # echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

v2:
  - Resend command when timeout.(Lijo)
  - Use debugfs file instead of module parameter.

Signed-off-by: Lang Yu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 32 +
  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 39 +++--
  2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 164d6a9e9fbb..f9412de86599 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -39,6 +39,8 @@
  
  #if defined(CONFIG_DEBUG_FS)
  
+extern int amdgpu_smu_debug;

+
  /**
   * amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
   *
@@ -1152,6 +1154,8 @@ static ssize_t amdgpu_debugfs_gfxoff_read(struct file *f, 
char __user *buf,
return result;
  }
  
+

+
  static const struct file_operations amdgpu_debugfs_regs2_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = amdgpu_debugfs_regs2_ioctl,
@@ -1609,6 +1613,26 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
  DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL,
amdgpu_debugfs_sclk_set, "%llu\n");
  
+static int amdgpu_debugfs_smu_debug_get(void *data, u64 *val)

+{
+   *val = amdgpu_smu_debug;
+   return 0;
+}
+
+static int amdgpu_debugfs_smu_debug_set(void *data, u64 val)
+{
+   if (val != 0 && val != 1)
+   return -EINVAL;
+
+   amdgpu_smu_debug = val;
+   return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(fops_smu_debug,
+amdgpu_debugfs_smu_debug_get,
+amdgpu_debugfs_smu_debug_set,
+"%llu\n");
+
  int amdgpu_debugfs_init(struct amdgpu_device *adev)
  {
struct dentry *root = adev_to_drm(adev)->primary->debugfs_root;
@@ -1632,6 +1656,14 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
return PTR_ERR(ent);
}
  
+	ent = debugfs_create_file("amdgpu_smu_debug", 0600, root, adev,

+ _smu_debug);
+   if (IS_ERR(ent)) {
+   DRM_ERROR("unable to create amdgpu_smu_debug debugfs file\n");
+   return PTR_ERR(ent);
+   }
+
+
/* Register debugfs entries for amdgpu_ttm */
amdgpu_ttm_debugfs_init(adev);
amdgpu_debugfs_pm_init(adev);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 048ca1673863..b3969d7933d3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -55,6 +55,14 @@
  
  #undef __SMU_DUMMY_MAP

  #define __SMU_DUMMY_MAP(type) #type
+
+/*
+ * Used to enable SMU debug option. When enabled, it makes SMU errors fatal.
+ * This will aid in debugging SMU firmware issues.
+ * (0 = disabled (default), 1 = enabled)
+ */
+int amdgpu_smu_debug;
+
  static const char * const __smu_message_names[] = {
SMU_MESSAGE_TYPES
  };
@@ -272,6 +280,11 @@ int smu_cmn_send_msg_without_waiting(struct smu_context 
*smu,
__smu_cmn_send_msg(smu, msg_index, param);
res = 0;
  Out:
+   if (unlikely(amdgpu_smu_debug == 1) && res) {
+   mutex_unlock(>message_lock);
+   BUG();
+   }
+
return res;
  }
  
@@ -288,9 +301,17 @@ int smu_cmn_send_msg_without_waiting(struct smu_context *smu,

  int smu_cmn_wait_for_response(struct smu_context *smu)
  {
u32 reg;
+   int res;
  
  	reg = __smu_cmn_poll_stat(smu);

-   return __smu_cmn_reg2errno(smu, reg);
+   res = __smu_cmn_reg2errno(smu, reg);
+
+   if (unlikely(amdgpu_smu_debug == 1) && res) {
+   mutex_unlock(>message_lock);
+   BUG();
+   }
+
+   return res;
  }
  
  /**

@@ -328,6 +349,7 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
uint32_t param,
uint32_t *read_arg)
  {
+   int retry_count = 0;
int res, index;
u32 reg;
  
@@ -349,15 +371,28 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,

__smu_cmn_reg_print_error(smu, reg, index, param, msg);
goto Out;
}
+retry:
__smu_cmn_send_msg(smu, (uint16_t) index, param);
reg = __smu_cmn_poll_stat(smu);
res = __smu_cmn_reg2errno(smu, reg);
-   if (res != 0)
+   if (res != 0) {
__smu_cmn_reg_print_error(smu, reg, index, param, msg);
+
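The retry: label this v2 hunk introduces amounts to a bounded resend-on-timeout loop. The sketch below models that behavior in plain C; the retry bound, the mock transport, and the error value are illustrative assumptions rather than the driver's actual code:

```c
#define ETIME_ERR 62 /* stand-in for -ETIME */
#define MAX_RETRY 2  /* assumed resend bound, not the driver's value */

/* Mock transport: times out `fail_count` times, then succeeds. */
static int fail_count;
static int send_and_wait(void)
{
	return (fail_count-- > 0) ? -ETIME_ERR : 0;
}

/* Send a message, resending on timeout up to MAX_RETRY extra times,
 * and report how many attempts were made in total. */
static int send_with_retry(int *attempts)
{
	int res, retry = 0;

	*attempts = 0;
	do {
		(*attempts)++;
		res = send_and_wait();
	} while (res == -ETIME_ERR && retry++ < MAX_RETRY);
	return res;
}
```

The key property is that the loop is bounded: a persistently silent FW still surfaces the timeout error after MAX_RETRY resends instead of spinning forever.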

Re: [PATCH V2 07/17] drm/amd/pm: create a new holder for those APIs used only by legacy ASICs(si/kv)

2021-11-30 Thread Lazar, Lijo




On 12/1/2021 8:43 AM, Quan, Evan wrote:

[AMD Official Use Only]




-Original Message-
From: Lazar, Lijo 
Sent: Tuesday, November 30, 2021 9:21 PM
To: Quan, Evan ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian
; Feng, Kenneth 
Subject: Re: [PATCH V2 07/17] drm/amd/pm: create a new holder for those
APIs used only by legacy ASICs(si/kv)



On 11/30/2021 1:12 PM, Evan Quan wrote:

Those APIs are used only by legacy ASICs(si/kv). They cannot be
shared by other ASICs. So, we create a new holder for them.

Signed-off-by: Evan Quan 
Change-Id: I555dfa37e783a267b1d3b3a7db5c87fcc3f1556f
--
v1->v2:
- move other APIs used by si/kv in amdgpu_atombios.c to the new
  holder also(Alex)
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  |  421 -
   drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.h  |   30 -
   .../gpu/drm/amd/include/kgd_pp_interface.h|1 +
   drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 1008 +---
   drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |   15 -
   drivers/gpu/drm/amd/pm/powerplay/Makefile |2 +-
   drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c |2 +
   drivers/gpu/drm/amd/pm/powerplay/legacy_dpm.c | 1453

+

   drivers/gpu/drm/amd/pm/powerplay/legacy_dpm.h |   70 +
   drivers/gpu/drm/amd/pm/powerplay/si_dpm.c |2 +
   10 files changed, 1534 insertions(+), 1470 deletions(-)
   create mode 100644 drivers/gpu/drm/amd/pm/powerplay/legacy_dpm.c
   create mode 100644 drivers/gpu/drm/amd/pm/powerplay/legacy_dpm.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c

index 12a6b1c99c93..f2e447212e62 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -1083,427 +1083,6 @@ int

amdgpu_atombios_get_clock_dividers(struct amdgpu_device *adev,

return 0;
   }

-int amdgpu_atombios_get_memory_pll_dividers(struct amdgpu_device

*adev,

-   u32 clock,
-   bool strobe_mode,
-   struct atom_mpll_param

*mpll_param)

-{
-   COMPUTE_MEMORY_CLOCK_PARAM_PARAMETERS_V2_1 args;
-   int index = GetIndexIntoMasterTable(COMMAND,

ComputeMemoryClockParam);

-   u8 frev, crev;
-
-   memset(&args, 0, sizeof(args));
-   memset(mpll_param, 0, sizeof(struct atom_mpll_param));
-
-   if (!amdgpu_atom_parse_cmd_header(adev->mode_info.atom_context, index, &frev, &crev))
-   return -EINVAL;
-
-   switch (frev) {
-   case 2:
-   switch (crev) {
-   case 1:
-   /* SI */
-   args.ulClock = cpu_to_le32(clock);  /* 10 khz */
-   args.ucInputFlag = 0;
-   if (strobe_mode)
-   args.ucInputFlag |=

MPLL_INPUT_FLAG_STROBE_MODE_EN;

-
-   amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args);
-
-   mpll_param->clkfrac =

le16_to_cpu(args.ulFbDiv.usFbDivFrac);

-   mpll_param->clkf =

le16_to_cpu(args.ulFbDiv.usFbDiv);

-   mpll_param->post_div = args.ucPostDiv;
-   mpll_param->dll_speed = args.ucDllSpeed;
-   mpll_param->bwcntl = args.ucBWCntl;
-   mpll_param->vco_mode =
-   (args.ucPllCntlFlag &

MPLL_CNTL_FLAG_VCO_MODE_MASK);

-   mpll_param->yclk_sel =
-   (args.ucPllCntlFlag &

MPLL_CNTL_FLAG_BYPASS_DQ_PLL) ? 1 : 0;

-   mpll_param->qdr =
-   (args.ucPllCntlFlag &

MPLL_CNTL_FLAG_QDR_ENABLE) ? 1 : 0;

-   mpll_param->half_rate =
-   (args.ucPllCntlFlag &

MPLL_CNTL_FLAG_AD_HALF_RATE) ? 1 : 0;

-   break;
-   default:
-   return -EINVAL;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
-   return 0;
-}
-
-void amdgpu_atombios_set_engine_dram_timings(struct amdgpu_device

*adev,

-u32 eng_clock, u32 mem_clock)
-{
-   SET_ENGINE_CLOCK_PS_ALLOCATION args;
-   int index = GetIndexIntoMasterTable(COMMAND,

DynamicMemorySettings);

-   u32 tmp;
-
-   memset(&args, 0, sizeof(args));
-
-   tmp = eng_clock & SET_CLOCK_FREQ_MASK;
-   tmp |= (COMPUTE_ENGINE_PLL_PARAM << 24);
-
-   args.ulTargetEngineClock = cpu_to_le32(tmp);
-   if (mem_clock)
-   args.sReserved.ulClock = cpu_to_le32(mem_clock &

SET_CLOCK_FREQ_MASK);

-
-   amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t *)&args);

-}
-
-void amdgpu_atombios_get_default_voltages(struct amdgpu_device

*adev,

- u16 *vddc, u16 

RE: [PATCH V2 11/17] drm/amd/pm: correct the usage for amdgpu_dpm_dispatch_task()

2021-11-30 Thread Quan, Evan
[AMD Official Use Only]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Tuesday, November 30, 2021 9:48 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Feng, Kenneth 
> Subject: Re: [PATCH V2 11/17] drm/amd/pm: correct the usage for
> amdgpu_dpm_dispatch_task()
> 
> 
> 
> On 11/30/2021 1:12 PM, Evan Quan wrote:
> > We should avoid having multi-function APIs. It should be up to the
> > caller to determine when or whether to call amdgpu_dpm_dispatch_task().
> >
> > Signed-off-by: Evan Quan 
> > Change-Id: I78ec4eb8ceb6e526a4734113d213d15a5fbaa8a4
> > ---
> >   drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 18 ++
> >   drivers/gpu/drm/amd/pm/amdgpu_pm.c  | 26
> --
> >   2 files changed, 26 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > index c6299e406848..8f0ae58f4292 100644
> > --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > @@ -558,8 +558,6 @@ void amdgpu_dpm_set_power_state(struct
> amdgpu_device *adev,
> > enum amd_pm_state_type state)
> >   {
> > adev->pm.dpm.user_state = state;
> > -
> > -   amdgpu_dpm_dispatch_task(adev, AMD_PP_TASK_ENABLE_USER_STATE, &state);
> >   }
> >
> >   enum amd_dpm_forced_level
> amdgpu_dpm_get_performance_level(struct
> > amdgpu_device *adev) @@ -727,13 +725,7 @@ int
> amdgpu_dpm_set_sclk_od(struct amdgpu_device *adev, uint32_t value)
> > if (!pp_funcs->set_sclk_od)
> > return -EOPNOTSUPP;
> >
> > -   pp_funcs->set_sclk_od(adev->powerplay.pp_handle, value);
> > -
> > -   amdgpu_dpm_dispatch_task(adev,
> > -AMD_PP_TASK_READJUST_POWER_STATE,
> > -NULL);
> > -
> > -   return 0;
> > +   return pp_funcs->set_sclk_od(adev->powerplay.pp_handle, value);
> >   }
> >
> >   int amdgpu_dpm_get_mclk_od(struct amdgpu_device *adev) @@ -
> 753,13
> > +745,7 @@ int amdgpu_dpm_set_mclk_od(struct amdgpu_device *adev,
> uint32_t value)
> > if (!pp_funcs->set_mclk_od)
> > return -EOPNOTSUPP;
> >
> > -   pp_funcs->set_mclk_od(adev->powerplay.pp_handle, value);
> > -
> > -   amdgpu_dpm_dispatch_task(adev,
> > -AMD_PP_TASK_READJUST_POWER_STATE,
> > -NULL);
> > -
> > -   return 0;
> > +   return pp_funcs->set_mclk_od(adev->powerplay.pp_handle, value);
> >   }
> >
> >   int amdgpu_dpm_get_power_profile_mode(struct amdgpu_device
> *adev,
> > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> > b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> > index fa2f4e11e94e..89e1134d660f 100644
> > --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> > +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> > @@ -187,6 +187,10 @@ static ssize_t
> amdgpu_set_power_dpm_state(struct
> > device *dev,
> >
> > amdgpu_dpm_set_power_state(adev, state);
> >
> > +   amdgpu_dpm_dispatch_task(adev,
> > +AMD_PP_TASK_ENABLE_USER_STATE,
> > +&state);
> > +
> 
> This is just the opposite of what has been done so far. The idea is to keep 
> the
> logic inside dpm_* calls and not to keep the logic in amdgpu_pm. This does
> the reverse. I guess this patch can be dropped.
[Quan, Evan] The situation here is:
1. In some cases amdgpu_dpm_dispatch_task() is included/integrated, e.g. in
amdgpu_dpm_set_mclk_od() and amdgpu_dpm_set_sclk_od().
2. In other cases amdgpu_dpm_dispatch_task() is called separately, e.g. by
amdgpu_set_pp_force_state() and amdgpu_set_pp_od_clk_voltage() from amdgpu_pm.c.
This inconsistency makes it tricky to add a unified lock protection around the
amdgpu_dpm_xxx() APIs. To resolve that, we could either
1. separate amdgpu_dpm_dispatch_task() from those
APIs (amdgpu_dpm_set_mclk_od() and amdgpu_dpm_set_sclk_od()), or
2. get amdgpu_dpm_dispatch_task() included also in
amdgpu_set_pp_force_state() and amdgpu_set_pp_od_clk_voltage().
After some consideration, I believe 1 is the more proper way, as the current
implementation of amdgpu_dpm_set_mclk_od() really combines two separate pieces
of logic together.
The amdgpu_dpm_dispatch_task() call should be split out.

BR
Evan
> 
> Thanks,
> Lijo
> 
> > pm_runtime_mark_last_busy(ddev->dev);
> > pm_runtime_put_autosuspend(ddev->dev);
> >
> > @@ -1278,7 +1282,16 @@ static ssize_t amdgpu_set_pp_sclk_od(struct
> device *dev,
> > return ret;
> > }
> >
> > -   amdgpu_dpm_set_sclk_od(adev, (uint32_t)value);
> > +   ret = amdgpu_dpm_set_sclk_od(adev, (uint32_t)value);
> > +   if (ret) {
> > +   pm_runtime_mark_last_busy(ddev->dev);
> > +   pm_runtime_put_autosuspend(ddev->dev);
> > +   return ret;
> > +   }
> > +
> > +   amdgpu_dpm_dispatch_task(adev,
> > +AMD_PP_TASK_READJUST_POWER_STATE,
> > +NULL);
> >
> > pm_runtime_mark_last_busy(ddev->dev);
> > 

Re: [PATCH V2 01/17] drm/amd/pm: do not expose implementation details to other blocks out of power

2021-11-30 Thread Lazar, Lijo




On 11/30/2021 1:39 PM, Lazar, Lijo wrote:



On 11/30/2021 1:12 PM, Evan Quan wrote:
Those implementation details (whether swsmu is supported, whether some
ppt_funcs are supported, accessing internal statistics ...) should be kept
internal. It's not a good practice, and even error prone, to expose
implementation details.

Signed-off-by: Evan Quan 
Change-Id: Ibca3462ceaa26a27a9145282b60c6ce5deca7752
---
  drivers/gpu/drm/amd/amdgpu/aldebaran.c    |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 25 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  6 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   | 18 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |  7 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |  5 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   |  5 +-
  drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   |  2 +-
  .../gpu/drm/amd/include/kgd_pp_interface.h    |  4 +
  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 95 +++
  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 25 -
  drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   |  9 +-
  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 16 ++--
  13 files changed, 155 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/aldebaran.c

index bcfdb63b1d42..a545df4efce1 100644
--- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
@@ -260,7 +260,7 @@ static int aldebaran_mode2_restore_ip(struct 
amdgpu_device *adev)

  adev->gfx.rlc.funcs->resume(adev);
  /* Wait for FW reset event complete */
-    r = smu_wait_for_event(adev, SMU_EVENT_RESET_COMPLETE, 0);
+    r = amdgpu_dpm_wait_for_event(adev, SMU_EVENT_RESET_COMPLETE, 0);
  if (r) {
  dev_err(adev->dev,
  "Failed to get response from firmware after reset\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 164d6a9e9fbb..0d1f00b24aae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1585,22 +1585,25 @@ static int amdgpu_debugfs_sclk_set(void *data, 
u64 val)

  return ret;
  }
-    if (is_support_sw_smu(adev)) {
-    ret = smu_get_dpm_freq_range(&adev->smu, SMU_SCLK, &min_freq, &max_freq);

-    if (ret || val > max_freq || val < min_freq)
-    return -EINVAL;
-    ret = smu_set_soft_freq_range(&adev->smu, SMU_SCLK, (uint32_t)val, (uint32_t)val);

-    } else {
-    return 0;
+    ret = amdgpu_dpm_get_dpm_freq_range(adev, PP_SCLK, &min_freq, &max_freq);

+    if (ret == -EOPNOTSUPP) {
+    ret = 0;
+    goto out;
  }
+    if (ret || val > max_freq || val < min_freq) {
+    ret = -EINVAL;
+    goto out;
+    }
+
+    ret = amdgpu_dpm_set_soft_freq_range(adev, PP_SCLK, 
(uint32_t)val, (uint32_t)val);

+    if (ret)
+    ret = -EINVAL;
+out:
  pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
  pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
-    if (ret)
-    return -EINVAL;
-
-    return 0;
+    return ret;
  }
  DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1989f9e9379e..41cc1ffb5809 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2617,7 +2617,7 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)

  if (adev->asic_type == CHIP_ARCTURUS &&
  amdgpu_passthrough(adev) &&
  adev->gmc.xgmi.num_physical_nodes > 1)
-    smu_set_light_sbr(&adev->smu, true);
+    amdgpu_dpm_set_light_sbr(adev, true);
  if (adev->gmc.xgmi.num_physical_nodes > 1) {
  mutex_lock(&mgpu_info.mutex);
@@ -2857,7 +2857,7 @@ static int 
amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)

  int i, r;
  if (adev->in_s0ix)
-    amdgpu_gfx_state_change_set(adev, sGpuChangeState_D3Entry);
+    amdgpu_dpm_gfx_state_change(adev, sGpuChangeState_D3Entry);
  for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
  if (!adev->ip_blocks[i].status.valid)
@@ -3982,7 +3982,7 @@ int amdgpu_device_resume(struct drm_device *dev, 
bool fbcon)

  return 0;
  if (adev->in_s0ix)
-    amdgpu_gfx_state_change_set(adev, sGpuChangeState_D0Entry);
+    amdgpu_dpm_gfx_state_change(adev, sGpuChangeState_D0Entry);
  /* post card */
  if (amdgpu_device_need_post(adev)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c

index 1916ec84dd71..3d8f82dc8c97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -615,7 +615,7 @@ int amdgpu_get_gfx_off_status(struct amdgpu_device 
*adev, uint32_t *value)

  mutex_lock(&adev->gfx.gfx_off_mutex);
-    r = smu_get_status_gfxoff(adev, value);
+    r = amdgpu_dpm_get_status_gfxoff(adev, value);
  mutex_unlock(&adev->gfx.gfx_off_mutex);

Re: [PATCH V2 01/17] drm/amd/pm: do not expose implementation details to other blocks out of power

2021-11-30 Thread Lazar, Lijo




On 12/1/2021 7:29 AM, Quan, Evan wrote:

[AMD Official Use Only]




-Original Message-
From: amd-gfx  On Behalf Of
Lazar, Lijo
Sent: Tuesday, November 30, 2021 4:10 PM
To: Quan, Evan ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Feng, Kenneth
; Koenig, Christian 
Subject: Re: [PATCH V2 01/17] drm/amd/pm: do not expose implementation
details to other blocks out of power



On 11/30/2021 1:12 PM, Evan Quan wrote:

Those implementation details (whether swsmu is supported, whether some
ppt_funcs are supported, accessing internal statistics ...) should be kept
internal. It's not a good practice, and even error prone, to expose
implementation details.


Signed-off-by: Evan Quan 
Change-Id: Ibca3462ceaa26a27a9145282b60c6ce5deca7752
---
   drivers/gpu/drm/amd/amdgpu/aldebaran.c|  2 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 25 ++---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  6 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   | 18 +---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |  7 --
   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |  5 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   |  5 +-
   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   |  2 +-
   .../gpu/drm/amd/include/kgd_pp_interface.h|  4 +
   drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 95

+++

   drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 25 -
   drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   |  9 +-
   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 16 ++--
   13 files changed, 155 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
index bcfdb63b1d42..a545df4efce1 100644
--- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
@@ -260,7 +260,7 @@ static int aldebaran_mode2_restore_ip(struct

amdgpu_device *adev)

adev->gfx.rlc.funcs->resume(adev);

/* Wait for FW reset event complete */
-   r = smu_wait_for_event(adev, SMU_EVENT_RESET_COMPLETE, 0);
+   r = amdgpu_dpm_wait_for_event(adev,

SMU_EVENT_RESET_COMPLETE, 0);

if (r) {
dev_err(adev->dev,
"Failed to get response from firmware after reset\n");

diff --git

a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 164d6a9e9fbb..0d1f00b24aae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1585,22 +1585,25 @@ static int amdgpu_debugfs_sclk_set(void *data,

u64 val)

return ret;
}

-   if (is_support_sw_smu(adev)) {
-   ret = smu_get_dpm_freq_range(&adev->smu, SMU_SCLK, &min_freq, &max_freq);

-   if (ret || val > max_freq || val < min_freq)
-   return -EINVAL;
-   ret = smu_set_soft_freq_range(&adev->smu, SMU_SCLK, (uint32_t)val, (uint32_t)val);

-   } else {
-   return 0;
+   ret = amdgpu_dpm_get_dpm_freq_range(adev, PP_SCLK, &min_freq, &max_freq);

+   if (ret == -EOPNOTSUPP) {
+   ret = 0;
+   goto out;
}
+   if (ret || val > max_freq || val < min_freq) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   ret = amdgpu_dpm_set_soft_freq_range(adev, PP_SCLK,

(uint32_t)val, (uint32_t)val);

+   if (ret)
+   ret = -EINVAL;

+out:
pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);

-   if (ret)
-   return -EINVAL;
-
-   return 0;
+   return ret;
   }

   DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL, diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1989f9e9379e..41cc1ffb5809 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2617,7 +2617,7 @@ static int amdgpu_device_ip_late_init(struct

amdgpu_device *adev)

if (adev->asic_type == CHIP_ARCTURUS &&
amdgpu_passthrough(adev) &&
adev->gmc.xgmi.num_physical_nodes > 1)
-   smu_set_light_sbr(&adev->smu, true);
+   amdgpu_dpm_set_light_sbr(adev, true);

if (adev->gmc.xgmi.num_physical_nodes > 1) {
mutex_lock(&mgpu_info.mutex);
@@ -2857,7 +2857,7 @@ static int

amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)

int i, r;

if (adev->in_s0ix)
-   amdgpu_gfx_state_change_set(adev,

sGpuChangeState_D3Entry);

+   amdgpu_dpm_gfx_state_change(adev,

sGpuChangeState_D3Entry);


for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
if (!adev->ip_blocks[i].status.valid)
@@ -3982,7 +3982,7 @@ int amdgpu_device_resume(struct drm_device

*dev, bool fbcon)

return 0;

if (adev->in_s0ix)
-   amdgpu_gfx_state_change_set(adev,

sGpuChangeState_D0Entry);

+   

RE: [PATCH V2 06/17] drm/amd/pm: do not expose the API used internally only in kv_dpm.c

2021-11-30 Thread Quan, Evan
[AMD Official Use Only]



> -Original Message-
> From: amd-gfx  On Behalf Of
> Lazar, Lijo
> Sent: Tuesday, November 30, 2021 8:28 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Feng, Kenneth
> ; Koenig, Christian 
> Subject: Re: [PATCH V2 06/17] drm/amd/pm: do not expose the API used
> internally only in kv_dpm.c
> 
> 
> 
> On 11/30/2021 1:12 PM, Evan Quan wrote:
> > Move it to kv_dpm.c instead.
> >
> > Signed-off-by: Evan Quan 
> > Change-Id: I554332b386491a79b7913f72786f1e2cb1f8165b
> > --
> > v1->v2:
> >- rename the API with "kv_" prefix(Alex)
> > ---
> >   drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 23 -
> >   drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  2 --
> >   drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c | 25
> ++-
> >   3 files changed, 24 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > index fbfc07a83122..ecaf0081bc31 100644
> > --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > @@ -209,29 +209,6 @@ static u32 amdgpu_dpm_get_vrefresh(struct
> amdgpu_device *adev)
> > return vrefresh;
> >   }
> >
> > -bool amdgpu_is_internal_thermal_sensor(enum amdgpu_int_thermal_type sensor)
> > -{
> > -   switch (sensor) {
> > -   case THERMAL_TYPE_RV6XX:
> > -   case THERMAL_TYPE_RV770:
> > -   case THERMAL_TYPE_EVERGREEN:
> > -   case THERMAL_TYPE_SUMO:
> > -   case THERMAL_TYPE_NI:
> > -   case THERMAL_TYPE_SI:
> > -   case THERMAL_TYPE_CI:
> > -   case THERMAL_TYPE_KV:
> > -   return true;
> > -   case THERMAL_TYPE_ADT7473_WITH_INTERNAL:
> > -   case THERMAL_TYPE_EMC2103_WITH_INTERNAL:
> > -   return false; /* need special handling */
> > -   case THERMAL_TYPE_NONE:
> > -   case THERMAL_TYPE_EXTERNAL:
> > -   case THERMAL_TYPE_EXTERNAL_GPIO:
> > -   default:
> > -   return false;
> > -   }
> > -}
> > -
> >   union power_info {
> > struct _ATOM_POWERPLAY_INFO info;
> > struct _ATOM_POWERPLAY_INFO_V2 info_2; diff --git
> > a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> > b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> > index f43b96dfe9d8..01120b302590 100644
> > --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> > +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> > @@ -374,8 +374,6 @@ u32 amdgpu_dpm_get_vblank_time(struct
> amdgpu_device *adev);
> >   int amdgpu_dpm_read_sensor(struct amdgpu_device *adev, enum
> amd_pp_sensors sensor,
> >void *data, uint32_t *size);
> >
> > -bool amdgpu_is_internal_thermal_sensor(enum
> amdgpu_int_thermal_type
> > sensor);
> > -
> >   int amdgpu_get_platform_caps(struct amdgpu_device *adev);
> >
> >   int amdgpu_parse_extended_power_table(struct amdgpu_device *adev);
> > diff --git a/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c
> > b/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c
> > index bcae42cef374..380a5336c74f 100644
> > --- a/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c
> > @@ -1256,6 +1256,29 @@ static void kv_dpm_enable_bapm(void *handle,
> bool enable)
> > }
> >   }
> >
> > +static bool kv_is_internal_thermal_sensor(enum amdgpu_int_thermal_type sensor)
> > +{
> > +   switch (sensor) {
> > +   case THERMAL_TYPE_RV6XX:
> > +   case THERMAL_TYPE_RV770:
> > +   case THERMAL_TYPE_EVERGREEN:
> > +   case THERMAL_TYPE_SUMO:
> > +   case THERMAL_TYPE_NI:
> > +   case THERMAL_TYPE_SI:
> > +   case THERMAL_TYPE_CI:
> > +   case THERMAL_TYPE_KV:
> > +   return true;
> > +   case THERMAL_TYPE_ADT7473_WITH_INTERNAL:
> > +   case THERMAL_TYPE_EMC2103_WITH_INTERNAL:
> > +   return false; /* need special handling */
> > +   case THERMAL_TYPE_NONE:
> > +   case THERMAL_TYPE_EXTERNAL:
> > +   case THERMAL_TYPE_EXTERNAL_GPIO:
> > +   default:
> > +   return false;
> > +   }
> > +}
> 
> All these names don't look like KV specific. Remove the family specifc ones
> like RV, SI, NI, CI etc., and keep KV and the generic ones like
> GPIO/EXTERNAL/NONE. Don't see a chance of external diodes being used for
> KV.
[Quan, Evan] Makes sense. I will create another patch to follow this.
Let's keep the change minimal here.

Thanks,
Evan
> 
> Thanks,
> Lijo
> 
> > +
> >   static int kv_dpm_enable(struct amdgpu_device *adev)
> >   {
> > struct kv_power_info *pi = kv_get_pi(adev); @@ -1352,7 +1375,7
> @@
> > static int kv_dpm_enable(struct amdgpu_device *adev)
> > }
> >
> > if (adev->irq.installed &&
> > -   amdgpu_is_internal_thermal_sensor(adev->pm.int_thermal_type)) {
> > +   kv_is_internal_thermal_sensor(adev->pm.int_thermal_type)) {
> > ret = kv_set_thermal_temperature_range(adev,
> KV_TEMP_RANGE_MIN, KV_TEMP_RANGE_MAX);
> > if (ret) {
> > DRM_ERROR("kv_set_thermal_temperature_range
> failed\n");
> >


Re: [PATCH] drm/amdgpu: add SMU debug option support

2021-11-30 Thread Lang Yu
On 11/30/ , Christian König wrote:
> On 30.11.21 at 06:17, Lang Yu wrote:
> > To maintain system error state when SMU errors occurred,
> > which will aid in debugging SMU firmware issues,
> > add SMU debug option support.
> > 
> > It can be enabled or disabled via amdgpu_smu_debug
> > debugfs file. When enabled, it makes SMU errors fatal.
> > It is disabled by default.
> > 
> > == Command Guide ==
> > 
> > 1, enable SMU debug option
> > 
> >   # echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
> > 
> > 2, disable SMU debug option
> > 
> >   # echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
> > 
> > v2:
> >   - Resend command when timeout.(Lijo)
> >   - Use debugfs file instead of module parameter.
> > 
> > Signed-off-by: Lang Yu 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 32 +
> >   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 39 +++--
> >   2 files changed, 69 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > index 164d6a9e9fbb..f9412de86599 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > @@ -39,6 +39,8 @@
> >   #if defined(CONFIG_DEBUG_FS)
> > +extern int amdgpu_smu_debug;
> > +
> >   /**
> >* amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
> >*
> > @@ -1152,6 +1154,8 @@ static ssize_t amdgpu_debugfs_gfxoff_read(struct file 
> > *f, char __user *buf,
> > return result;
> >   }
> > +
> > +
> 
> Unrelated change.
Will remove it.

> >   static const struct file_operations amdgpu_debugfs_regs2_fops = {
> > .owner = THIS_MODULE,
> > .unlocked_ioctl = amdgpu_debugfs_regs2_ioctl,
> > @@ -1609,6 +1613,26 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
> >   DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL,
> > amdgpu_debugfs_sclk_set, "%llu\n");
> > +static int amdgpu_debugfs_smu_debug_get(void *data, u64 *val)
> > +{
> > +   *val = amdgpu_smu_debug;
> > +   return 0;
> > +}
> > +
> > +static int amdgpu_debugfs_smu_debug_set(void *data, u64 val)
> > +{
> > +   if (val != 0 && val != 1)
> > +   return -EINVAL;
> > +
> > +   amdgpu_smu_debug = val;
> > +   return 0;
> > +}
> > +
> > +DEFINE_DEBUGFS_ATTRIBUTE(fops_smu_debug,
> > +amdgpu_debugfs_smu_debug_get,
> > +amdgpu_debugfs_smu_debug_set,
> > +"%llu\n");
> > +
> 
> That can be done much simpler. Take a look at the debugfs_create_bool()
> function.
Thanks for your advice. Will modify that.

> >   int amdgpu_debugfs_init(struct amdgpu_device *adev)
> >   {
> > struct dentry *root = adev_to_drm(adev)->primary->debugfs_root;
> > @@ -1632,6 +1656,14 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
> > return PTR_ERR(ent);
> > }
> > +   ent = debugfs_create_file("amdgpu_smu_debug", 0600, root, adev,
> > + &fops_smu_debug);
> > +   if (IS_ERR(ent)) {
> > +   DRM_ERROR("unable to create amdgpu_smu_debug debugfs file\n");
> > +   return PTR_ERR(ent);
> > +   }
> > +
> > +
> > /* Register debugfs entries for amdgpu_ttm */
> > amdgpu_ttm_debugfs_init(adev);
> > amdgpu_debugfs_pm_init(adev);
> > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
> > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > index 048ca1673863..b3969d7933d3 100644
> > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > @@ -55,6 +55,14 @@
> >   #undef __SMU_DUMMY_MAP
> >   #define __SMU_DUMMY_MAP(type) #type
> > +
> > +/*
> > + * Used to enable SMU debug option. When enabled, it makes SMU errors 
> > fatal.
> > + * This will aid in debugging SMU firmware issues.
> > + * (0 = disabled (default), 1 = enabled)
> > + */
> > +int amdgpu_smu_debug;
> 
> Probably better to put that into amdgpu_device or similar structure.
Ok. Thanks for your advice.

Regards,
Lang

> Regards,
> Christian.
> 
> > +
> >   static const char * const __smu_message_names[] = {
> > SMU_MESSAGE_TYPES
> >   };
> > @@ -272,6 +280,11 @@ int smu_cmn_send_msg_without_waiting(struct 
> > smu_context *smu,
> > __smu_cmn_send_msg(smu, msg_index, param);
> > res = 0;
> >   Out:
> > +   if (unlikely(amdgpu_smu_debug == 1) && res) {
> > +   mutex_unlock(&smu->message_lock);
> > +   BUG();
> > +   }
> > +
> > return res;
> >   }
> > @@ -288,9 +301,17 @@ int smu_cmn_send_msg_without_waiting(struct 
> > smu_context *smu,
> >   int smu_cmn_wait_for_response(struct smu_context *smu)
> >   {
> > u32 reg;
> > +   int res;
> > reg = __smu_cmn_poll_stat(smu);
> > -   return __smu_cmn_reg2errno(smu, reg);
> > +   res = __smu_cmn_reg2errno(smu, reg);
> > +
> > +   if (unlikely(amdgpu_smu_debug == 1) && res) {
> > +   mutex_unlock(&smu->message_lock);
> > +   BUG();
> > +   }
> > +
> > +   return res;
> >   }
> >   /**
> > @@ -328,6 +349,7 

RE: [PATCH V2 05/17] drm/amd/pm: do not expose those APIs used internally only in si_dpm.c

2021-11-30 Thread Quan, Evan
[AMD Official Use Only]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Tuesday, November 30, 2021 8:22 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Feng, Kenneth 
> Subject: Re: [PATCH V2 05/17] drm/amd/pm: do not expose those APIs used
> internally only in si_dpm.c
> 
> 
> 
> On 11/30/2021 1:12 PM, Evan Quan wrote:
> > Move them to si_dpm.c instead.
> >
> > Signed-off-by: Evan Quan 
> > Change-Id: I288205cfd7c6ba09cfb22626ff70360d61ff0c67
> > --
> > v1->v2:
> >- rename the API with "si_" prefix(Alex)
> > ---
> >   drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 25 ---
> >   drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 25 ---
> >   drivers/gpu/drm/amd/pm/powerplay/si_dpm.c | 54
> +++
> >   drivers/gpu/drm/amd/pm/powerplay/si_dpm.h |  7 +++
> >   4 files changed, 53 insertions(+), 58 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > index 52ac3c883a6e..fbfc07a83122 100644
> > --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> > @@ -894,31 +894,6 @@ void amdgpu_add_thermal_controller(struct
> amdgpu_device *adev)
> > }
> >   }
> >
> > -enum amdgpu_pcie_gen amdgpu_get_pcie_gen_support(struct
> amdgpu_device *adev,
> > -u32 sys_mask,
> > -enum amdgpu_pcie_gen
> asic_gen,
> > -enum amdgpu_pcie_gen
> default_gen)
> > -{
> > -   switch (asic_gen) {
> > -   case AMDGPU_PCIE_GEN1:
> > -   return AMDGPU_PCIE_GEN1;
> > -   case AMDGPU_PCIE_GEN2:
> > -   return AMDGPU_PCIE_GEN2;
> > -   case AMDGPU_PCIE_GEN3:
> > -   return AMDGPU_PCIE_GEN3;
> > -   default:
> > -   if ((sys_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
> &&
> > -   (default_gen == AMDGPU_PCIE_GEN3))
> > -   return AMDGPU_PCIE_GEN3;
> > -   else if ((sys_mask &
> CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2) &&
> > -(default_gen == AMDGPU_PCIE_GEN2))
> > -   return AMDGPU_PCIE_GEN2;
> > -   else
> > -   return AMDGPU_PCIE_GEN1;
> > -   }
> > -   return AMDGPU_PCIE_GEN1;
> > -}
> > -
> >   struct amd_vce_state*
> >   amdgpu_get_vce_clock_state(void *handle, u32 idx)
> >   {
> > diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> > b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> > index 6681b878e75f..f43b96dfe9d8 100644
> > --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> > +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> > @@ -45,19 +45,6 @@ enum amdgpu_int_thermal_type {
> > THERMAL_TYPE_KV,
> >   };
> >
> > -enum amdgpu_dpm_auto_throttle_src {
> > -   AMDGPU_DPM_AUTO_THROTTLE_SRC_THERMAL,
> > -   AMDGPU_DPM_AUTO_THROTTLE_SRC_EXTERNAL
> > -};
> > -
> > -enum amdgpu_dpm_event_src {
> > -   AMDGPU_DPM_EVENT_SRC_ANALOG = 0,
> > -   AMDGPU_DPM_EVENT_SRC_EXTERNAL = 1,
> > -   AMDGPU_DPM_EVENT_SRC_DIGITAL = 2,
> > -   AMDGPU_DPM_EVENT_SRC_ANALOG_OR_EXTERNAL = 3,
> > -   AMDGPU_DPM_EVENT_SRC_DIGIAL_OR_EXTERNAL = 4
> > -};
> > -
> >   struct amdgpu_ps {
> > u32 caps; /* vbios flags */
> > u32 class; /* vbios flags */
> > @@ -252,13 +239,6 @@ struct amdgpu_dpm_fan {
> > bool ucode_fan_control;
> >   };
> >
> > -enum amdgpu_pcie_gen {
> > -   AMDGPU_PCIE_GEN1 = 0,
> > -   AMDGPU_PCIE_GEN2 = 1,
> > -   AMDGPU_PCIE_GEN3 = 2,
> > -   AMDGPU_PCIE_GEN_INVALID = 0x
> > -};
> > -
> >   #define amdgpu_dpm_reset_power_profile_state(adev, request) \
> > ((adev)->powerplay.pp_funcs->reset_power_profile_state(\
> > (adev)->powerplay.pp_handle, request)) @@ -
> 403,11 +383,6 @@ void
> > amdgpu_free_extended_power_table(struct amdgpu_device *adev);
> >
> >   void amdgpu_add_thermal_controller(struct amdgpu_device *adev);
> >
> > -enum amdgpu_pcie_gen amdgpu_get_pcie_gen_support(struct
> amdgpu_device *adev,
> > -u32 sys_mask,
> > -enum amdgpu_pcie_gen
> asic_gen,
> > -enum amdgpu_pcie_gen
> default_gen);
> > -
> >   struct amd_vce_state*
> >   amdgpu_get_vce_clock_state(void *handle, u32 idx);
> >
> > diff --git a/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c
> > b/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c
> > index 81f82aa05ec2..4f84d8b893f1 100644
> > --- a/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c
> > @@ -96,6 +96,19 @@ union pplib_clock_info {
> > struct _ATOM_PPLIB_SI_CLOCK_INFO si;
> >   };
> >
> > +enum amdgpu_dpm_auto_throttle_src {
> > +   AMDGPU_DPM_AUTO_THROTTLE_SRC_THERMAL,
> > +   AMDGPU_DPM_AUTO_THROTTLE_SRC_EXTERNAL
> > +};
> > +
> > +enum amdgpu_dpm_event_src {
> > +   AMDGPU_DPM_EVENT_SRC_ANALOG = 0,
> > +   AMDGPU_DPM_EVENT_SRC_EXTERNAL = 1,
> > +   

RE: [PATCH V2 02/17] drm/amd/pm: do not expose power implementation details to amdgpu_pm.c

2021-11-30 Thread Quan, Evan
[Public]



> -Original Message-
> From: Chen, Guchun 
> Sent: Tuesday, November 30, 2021 9:05 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Lazar, Lijo
> ; Feng, Kenneth ; Koenig,
> Christian ; Quan, Evan 
> Subject: RE: [PATCH V2 02/17] drm/amd/pm: do not expose power
> implementation details to amdgpu_pm.c
> 
> [Public]
> 
> Two nit-picks.
> 
> 1. It's better to drop "return" in amdgpu_dpm_get_current_power_state.
[Quan, Evan] I can do that.
> 
> 2. In some functions, when function pointer is NULL, sometimes it returns 0,
> while in other cases, it returns -EOPNOTSUPP. Is there any cause for this?
[Quan, Evan] It is to stick with the original logic. We might update them later (by 
new patches).
For this patch series, I would like to keep the changes minimal (avoid real 
logic changes).

Thanks
Evan
> 
> Regards,
> Guchun
> 
> -Original Message-
> From: amd-gfx  On Behalf Of Evan
> Quan
> Sent: Tuesday, November 30, 2021 3:43 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Lazar, Lijo
> ; Feng, Kenneth ; Koenig,
> Christian ; Quan, Evan 
> Subject: [PATCH V2 02/17] drm/amd/pm: do not expose power
> implementation details to amdgpu_pm.c
> 
> amdgpu_pm.c holds all the user sysfs/hwmon interfaces. It's another
> client of our power APIs. It's not proper to spike into power
> implementation details there.
> 
> Signed-off-by: Evan Quan 
> Change-Id: I397853ddb13eacfce841366de2a623535422df9a
> ---
>  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 458 ++-
>  drivers/gpu/drm/amd/pm/amdgpu_pm.c| 519 --
>  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 160 +++
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |   3 -
>  4 files changed, 709 insertions(+), 431 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> index 9b332c8a0079..3c59f16c7a6f 100644
> --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> @@ -1453,7 +1453,9 @@ static void
> amdgpu_dpm_change_power_state_locked(struct amdgpu_device *adev)
>   if (equal)
>   return;
> 
> - amdgpu_dpm_set_power_state(adev);
> + if (adev->powerplay.pp_funcs->set_power_state)
> + adev->powerplay.pp_funcs->set_power_state(adev-
> >powerplay.pp_handle);
> +
>   amdgpu_dpm_post_set_power_state(adev);
> 
>   adev->pm.dpm.current_active_crtcs = adev-
> >pm.dpm.new_active_crtcs;
> @@ -1709,3 +1711,457 @@ int amdgpu_dpm_get_ecc_info(struct
> amdgpu_device *adev,
> 
>   return smu_get_ecc_info(&adev->smu, umc_ecc);
>  }
> +
> +struct amd_vce_state *amdgpu_dpm_get_vce_clock_state(struct
> amdgpu_device *adev,
> +  uint32_t idx)
> +{
> + const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
> +
> + if (!pp_funcs->get_vce_clock_state)
> + return NULL;
> +
> + return pp_funcs->get_vce_clock_state(adev->powerplay.pp_handle,
> +  idx);
> +}
> +
> +void amdgpu_dpm_get_current_power_state(struct amdgpu_device
> *adev,
> + enum amd_pm_state_type *state)
> +{
> + const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
> +
> + if (!pp_funcs->get_current_power_state) {
> + *state = adev->pm.dpm.user_state;
> + return;
> + }
> +
> + *state = pp_funcs->get_current_power_state(adev-
> >powerplay.pp_handle);
> + if (*state < POWER_STATE_TYPE_DEFAULT ||
> + *state > POWER_STATE_TYPE_INTERNAL_3DPERF)
> + *state = adev->pm.dpm.user_state;
> +
> + return;
> +}
> +
> +void amdgpu_dpm_set_power_state(struct amdgpu_device *adev,
> + enum amd_pm_state_type state)
> +{
> + adev->pm.dpm.user_state = state;
> +
> + if (adev->powerplay.pp_funcs->dispatch_tasks)
> + amdgpu_dpm_dispatch_task(adev,
> AMD_PP_TASK_ENABLE_USER_STATE, &state);
> + else
> + amdgpu_pm_compute_clocks(adev);
> +}
> +
> +enum amd_dpm_forced_level
> amdgpu_dpm_get_performance_level(struct amdgpu_device *adev)
> +{
> + const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
> + enum amd_dpm_forced_level level;
> +
> + if (pp_funcs->get_performance_level)
> + level = pp_funcs->get_performance_level(adev-
> >powerplay.pp_handle);
> + else
> + level = adev->pm.dpm.forced_level;
> +
> + return level;
> +}
> +
> +int amdgpu_dpm_force_performance_level(struct amdgpu_device *adev,
> +enum amd_dpm_forced_level level)
> +{
> + const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
> +
> + if (pp_funcs->force_performance_level) {
> + if (adev->pm.dpm.thermal_active)
> + return -EINVAL;
> +
> + if (pp_funcs->force_performance_level(adev-
> >powerplay.pp_handle,
> +

RE: [PATCH V2 01/17] drm/amd/pm: do not expose implementation details to other blocks out of power

2021-11-30 Thread Quan, Evan
[AMD Official Use Only]



> -Original Message-
> From: amd-gfx  On Behalf Of
> Lazar, Lijo
> Sent: Tuesday, November 30, 2021 4:10 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Feng, Kenneth
> ; Koenig, Christian 
> Subject: Re: [PATCH V2 01/17] drm/amd/pm: do not expose implementation
> details to other blocks out of power
> 
> 
> 
> On 11/30/2021 1:12 PM, Evan Quan wrote:
> > Those implementation details (whether swsmu is supported, which ppt_funcs
> > are supported, accessing internal statistics ...) should be kept
> > internal. It's not a good practice, and even error prone, to expose
> implementation details.
> >
> > Signed-off-by: Evan Quan 
> > Change-Id: Ibca3462ceaa26a27a9145282b60c6ce5deca7752
> > ---
> >   drivers/gpu/drm/amd/amdgpu/aldebaran.c|  2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 25 ++---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  6 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   | 18 +---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |  7 --
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |  5 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   |  5 +-
> >   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   |  2 +-
> >   .../gpu/drm/amd/include/kgd_pp_interface.h|  4 +
> >   drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 95
> +++
> >   drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 25 -
> >   drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   |  9 +-
> >   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 16 ++--
> >   13 files changed, 155 insertions(+), 64 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> > b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> > index bcfdb63b1d42..a545df4efce1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
> > @@ -260,7 +260,7 @@ static int aldebaran_mode2_restore_ip(struct
> amdgpu_device *adev)
> > adev->gfx.rlc.funcs->resume(adev);
> >
> > /* Wait for FW reset event complete */
> > -   r = smu_wait_for_event(adev, SMU_EVENT_RESET_COMPLETE, 0);
> > +   r = amdgpu_dpm_wait_for_event(adev,
> SMU_EVENT_RESET_COMPLETE, 0);
> > if (r) {
> > dev_err(adev->dev,
> > "Failed to get response from firmware after reset\n");
> diff --git
> > a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > index 164d6a9e9fbb..0d1f00b24aae 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > @@ -1585,22 +1585,25 @@ static int amdgpu_debugfs_sclk_set(void *data,
> u64 val)
> > return ret;
> > }
> >
> > -   if (is_support_sw_smu(adev)) {
> > -   ret = smu_get_dpm_freq_range(&adev->smu, SMU_SCLK,
> > &min_freq, &max_freq);
> > -   if (ret || val > max_freq || val < min_freq)
> > -   return -EINVAL;
> > -   ret = smu_set_soft_freq_range(&adev->smu, SMU_SCLK,
> (uint32_t)val, (uint32_t)val);
> > -   } else {
> > -   return 0;
> > +   ret = amdgpu_dpm_get_dpm_freq_range(adev, PP_SCLK,
> > &min_freq, &max_freq);
> > +   if (ret == -EOPNOTSUPP) {
> > +   ret = 0;
> > +   goto out;
> > }
> > +   if (ret || val > max_freq || val < min_freq) {
> > +   ret = -EINVAL;
> > +   goto out;
> > +   }
> > +
> > +   ret = amdgpu_dpm_set_soft_freq_range(adev, PP_SCLK,
> (uint32_t)val, (uint32_t)val);
> > +   if (ret)
> > +   ret = -EINVAL;
> >
> > +out:
> > pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
> > pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> >
> > -   if (ret)
> > -   return -EINVAL;
> > -
> > -   return 0;
> > +   return ret;
> >   }
> >
> >   DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL, diff --git
> > a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 1989f9e9379e..41cc1ffb5809 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2617,7 +2617,7 @@ static int amdgpu_device_ip_late_init(struct
> amdgpu_device *adev)
> > if (adev->asic_type == CHIP_ARCTURUS &&
> > amdgpu_passthrough(adev) &&
> > adev->gmc.xgmi.num_physical_nodes > 1)
> > -   smu_set_light_sbr(&adev->smu, true);
> > +   amdgpu_dpm_set_light_sbr(adev, true);
> >
> > if (adev->gmc.xgmi.num_physical_nodes > 1) {
> > mutex_lock(&mgpu_info.mutex);
> > @@ -2857,7 +2857,7 @@ static int
> amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)
> > int i, r;
> >
> > if (adev->in_s0ix)
> > -   amdgpu_gfx_state_change_set(adev,
> sGpuChangeState_D3Entry);
> > +   amdgpu_dpm_gfx_state_change(adev,
> sGpuChangeState_D3Entry);
> >
> > for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
> > if (!adev->ip_blocks[i].status.valid)
> > @@ -3982,7 +3982,7 @@ int amdgpu_device_resume(struct drm_device
> *dev, 

Re: [PATCH] drm/amdkfd: fix svm_bo release invalid wait context warning

2021-11-30 Thread Felix Kuehling
On 2021-11-30 at 3:13 p.m., Philip Yang wrote:
> svm_range_bo_release could be called from the atomic context of
> __do_munmap's put_page if the process release work has already finished
> freeing the pranges of the svm_bo. Schedule release_work to wait for the
> svm_bo eviction work to finish and then free the svm_bo.
>
> Signed-off-by: Philip Yang 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 36 +++-
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  1 +
>  2 files changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index f2db49c7a8fd..8af87a662a0d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -327,11 +327,33 @@ static bool svm_bo_ref_unless_zero(struct svm_range_bo 
> *svm_bo)
>   return true;
>  }
>  
> +static void svm_range_bo_wq_release(struct work_struct *work)
> +{
> + struct svm_range_bo *svm_bo;
> +
> + svm_bo = container_of(work, struct svm_range_bo, release_work);
> + pr_debug("svm_bo 0x%p\n", svm_bo);
> +
> + if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base)) {
> + /* We're not in the eviction worker.
> +  * Signal the fence and synchronize with any
> +  * pending eviction work.
> +  */

Now that this is its own worker, it's definitely never in the eviction
worker. So this doesn't need to be conditional here.

I'm thinking, in the eviction case, the extra scheduling is not actually
needed and only adds latency and overhead. The eviction already runs in
a worker thread and the svm_range_bo_unref there is definitely not in an
atomic context.

So I wonder if we should have two versions,
svm_range_bo_unref_sync/async and svm_range_bo_release_sync/async. The
synchronous version doesn't schedule a worker and is used when we're
sure we're not in atomic context. The asynchronous version we can use in
places that may be in atomic context.

Regards,
  Felix


> + dma_fence_signal(&svm_bo->eviction_fence->base);
> + cancel_work_sync(&svm_bo->eviction_work);
> + }
> + dma_fence_put(&svm_bo->eviction_fence->base);
> + amdgpu_bo_unref(&svm_bo->bo);
> + kfree(svm_bo);
> +}
> +
>  static void svm_range_bo_release(struct kref *kref)
>  {
>   struct svm_range_bo *svm_bo;
>  
>   svm_bo = container_of(kref, struct svm_range_bo, kref);
> + pr_debug("svm_bo 0x%p\n", svm_bo);
> +
>   spin_lock(&svm_bo->list_lock);
>   while (!list_empty(&svm_bo->range_list)) {
>   struct svm_range *prange =
> @@ -352,17 +374,9 @@ static void svm_range_bo_release(struct kref *kref)
>   spin_lock(&svm_bo->list_lock);
>   }
>   spin_unlock(&svm_bo->list_lock);
> - if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base)) {
> - /* We're not in the eviction worker.
> -  * Signal the fence and synchronize with any
> -  * pending eviction work.
> -  */
> - dma_fence_signal(&svm_bo->eviction_fence->base);
> - cancel_work_sync(&svm_bo->eviction_work);
> - }
> - dma_fence_put(&svm_bo->eviction_fence->base);
> - amdgpu_bo_unref(&svm_bo->bo);
> - kfree(svm_bo);
> +
> + INIT_WORK(&svm_bo->release_work, svm_range_bo_wq_release);
> + schedule_work(&svm_bo->release_work);
>  }
>  
>  void svm_range_bo_unref(struct svm_range_bo *svm_bo)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
> index 6dc91c33e80f..23478ae7a7d9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
> @@ -48,6 +48,7 @@ struct svm_range_bo {
>   struct work_struct  eviction_work;
>   struct svm_range_list   *svms;
>   uint32_tevicting;
> + struct work_struct  release_work;
>  };
>  
>  enum svm_work_list_ops {


Re: [PATCH 1/6] Documentation/gpu: Reorganize DC documentation

2021-11-30 Thread Yann Dirson
> On 2021-11-30 10:48 a.m., Harry Wentland wrote:
> > On 2021-11-30 10:46, Rodrigo Siqueira Jordao wrote:
> >>
> >>
> >> On 2021-11-29 7:06 a.m., Jani Nikula wrote:
> >>> On Fri, 26 Nov 2021, Daniel Vetter  wrote:
>  On Thu, Nov 25, 2021 at 10:38:25AM -0500, Rodrigo Siqueira
>  wrote:
> > Display core documentation is not well organized, and it is
> > hard to find
> > information due to the lack of sections. This commit
> > reorganizes the
> > documentation layout, and it is preparation work for future
> > changes.
> >
> > Signed-off-by: Rodrigo Siqueira 
> > ---
> >    Documentation/gpu/amdgpu-dc.rst   | 74
> >    ---
> >    .../gpu/amdgpu-dc/amdgpu-dc-debug.rst |  4 +
> >    Documentation/gpu/amdgpu-dc/amdgpu-dc.rst | 29 
> >    Documentation/gpu/amdgpu-dc/amdgpu-dm.rst | 42
> >    +++
> >    Documentation/gpu/drivers.rst |  2 +-
> >    5 files changed, 76 insertions(+), 75 deletions(-)
> >    delete mode 100644 Documentation/gpu/amdgpu-dc.rst
> >    create mode 100644
> >    Documentation/gpu/amdgpu-dc/amdgpu-dc-debug.rst
> >    create mode 100644 Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
> >    create mode 100644 Documentation/gpu/amdgpu-dc/amdgpu-dm.rst
> >
> > diff --git a/Documentation/gpu/amdgpu-dc.rst
> > b/Documentation/gpu/amdgpu-dc.rst
> > deleted file mode 100644
> > index f7ff7e1309de..
> > --- a/Documentation/gpu/amdgpu-dc.rst
> > +++ /dev/null
> > @@ -1,74 +0,0 @@
> > -===
> > -drm/amd/display - Display Core (DC)
> > -===
> > -
> > -*placeholder - general description of supported platforms,
> > what dc is, etc.*
> > -
> > -Because it is partially shared with other operating systems,
> > the Display Core
> > -Driver is divided in two pieces.
> > -
> > -1. **Display Core (DC)** contains the OS-agnostic components.
> > Things like
> > -   hardware programming and resource management are handled
> > here.
> > -2. **Display Manager (DM)** contains the OS-dependent
> > components. Hooks to the
> > -   amdgpu base driver and DRM are implemented here.
> > -
> > -It doesn't help that the entire package is frequently referred
> > to as DC. But
> > -with the context in mind, it should be clear.
> > -
> > -When CONFIG_DRM_AMD_DC is enabled, DC will be initialized by
> > default for
> > -supported ASICs. To force disable, set `amdgpu.dc=0` on kernel
> > command line.
> > -Likewise, to force enable on unsupported ASICs, set
> > `amdgpu.dc=1`.
> > -
> > -To determine if DC is loaded, search dmesg for the following
> > entry:
> > -
> > -``Display Core initialized with ``
> > -
> > -AMDgpu Display Manager
> > -==
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > -   :doc: overview
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> > -   :internal:
> > -
> > -Lifecycle
> > --
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > -   :doc: DM Lifecycle
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > -   :functions: dm_hw_init dm_hw_fini
> > -
> > -Interrupts
> > ---
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
> > -   :doc: overview
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
> > -   :internal:
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > -   :functions: register_hpd_handlers dm_crtc_high_irq
> > dm_pflip_high_irq
> > -
> > -Atomic Implementation
> > --
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > -   :doc: atomic
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > -   :functions: amdgpu_dm_atomic_check
> > amdgpu_dm_atomic_commit_tail
> > -
> > -Display Core
> > -
> > -
> > -**WIP**
> > -
> > -FreeSync Video
> > ---
> > -
> > -.. kernel-doc::
> > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > -   :doc: FreeSync Video
> > diff --git a/Documentation/gpu/amdgpu-dc/amdgpu-dc-debug.rst
> > b/Documentation/gpu/amdgpu-dc/amdgpu-dc-debug.rst
> > new file mode 100644
> > index ..bbb8c3fc8eee
> > --- /dev/null
> > +++ b/Documentation/gpu/amdgpu-dc/amdgpu-dc-debug.rst
> > @@ -0,0 +1,4 @@
> > +Display Core Debug 

[PATCH] drm/amdkfd: fix svm_bo release invalid wait context warning

2021-11-30 Thread Philip Yang
svm_range_bo_release could be called from the atomic context of
__do_munmap's put_page if the process release work has already finished
freeing the pranges of the svm_bo. Schedule release_work to wait for the
svm_bo eviction work to finish and then free the svm_bo.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 36 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  1 +
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index f2db49c7a8fd..8af87a662a0d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -327,11 +327,33 @@ static bool svm_bo_ref_unless_zero(struct svm_range_bo 
*svm_bo)
return true;
 }
 
+static void svm_range_bo_wq_release(struct work_struct *work)
+{
+   struct svm_range_bo *svm_bo;
+
+   svm_bo = container_of(work, struct svm_range_bo, release_work);
+   pr_debug("svm_bo 0x%p\n", svm_bo);
+
+   if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base)) {
+   /* We're not in the eviction worker.
+* Signal the fence and synchronize with any
+* pending eviction work.
+*/
+   dma_fence_signal(&svm_bo->eviction_fence->base);
+   cancel_work_sync(&svm_bo->eviction_work);
+   }
+   dma_fence_put(&svm_bo->eviction_fence->base);
+   amdgpu_bo_unref(&svm_bo->bo);
+   kfree(svm_bo);
+}
+
 static void svm_range_bo_release(struct kref *kref)
 {
struct svm_range_bo *svm_bo;
 
svm_bo = container_of(kref, struct svm_range_bo, kref);
+   pr_debug("svm_bo 0x%p\n", svm_bo);
+
	spin_lock(&svm_bo->list_lock);
	while (!list_empty(&svm_bo->range_list)) {
		struct svm_range *prange =
@@ -352,17 +374,9 @@ static void svm_range_bo_release(struct kref *kref)
	spin_lock(&svm_bo->list_lock);
	}
	spin_unlock(&svm_bo->list_lock);
-   if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base)) {
-   /* We're not in the eviction worker.
-* Signal the fence and synchronize with any
-* pending eviction work.
-*/
-   dma_fence_signal(&svm_bo->eviction_fence->base);
-   cancel_work_sync(&svm_bo->eviction_work);
-   }
-   dma_fence_put(&svm_bo->eviction_fence->base);
-   amdgpu_bo_unref(&svm_bo->bo);
-   kfree(svm_bo);
+
+   INIT_WORK(&svm_bo->release_work, svm_range_bo_wq_release);
+   schedule_work(&svm_bo->release_work);
 }
 
 void svm_range_bo_unref(struct svm_range_bo *svm_bo)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 6dc91c33e80f..23478ae7a7d9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -48,6 +48,7 @@ struct svm_range_bo {
struct work_struct  eviction_work;
struct svm_range_list   *svms;
uint32_tevicting;
+   struct work_struct  release_work;
 };
 
 enum svm_work_list_ops {
-- 
2.17.1



[PATCH] drm/amdgpu/sriov/vcn: add new vcn ip revision check case for Navi21 SRIOV

2021-11-30 Thread Bokun Zhang
Under SRIOV, there will be dynamic VCN assignment controlled by the host
driver. The VCN assignment information is passed using IP discovery data
and the VCN revision. The revision ID can be 64, 128, or 192.

The swIP code (vcn_v3_0.c) will handle the initialization according to the
revision ID of each VCN instance. In the common code path (amdgpu_discovery.c
and amdgpu_vcn.c), we will simply add the IP block for the above-mentioned
revision IDs.

Note that we actually support decode under SRIOV, but we decided to disable
it for now.

Signed-off-by: Bokun Zhang 
Change-Id: I2bb44d590fc01ed413efb0e689c491079628454b
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 1 +
 drivers/gpu/drm/amd/amdgpu/nv.c   | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index ee19548cf22c..02644abcfc06 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -915,9 +915,10 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct 
amdgpu_device *adev)
break;
case IP_VERSION(3, 0, 0):
case IP_VERSION(3, 0, 16):
-   case IP_VERSION(3, 0, 64):
case IP_VERSION(3, 1, 1):
case IP_VERSION(3, 0, 2):
+   case IP_VERSION(3, 0, 64):
+   case IP_VERSION(3, 0, 128):
case IP_VERSION(3, 0, 192):
amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
if (!amdgpu_sriov_vf(adev))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 585961c2f5f2..6d1eb7eabc90 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -135,6 +135,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
break;
case IP_VERSION(3, 0, 0):
case IP_VERSION(3, 0, 64):
+   case IP_VERSION(3, 0, 128):
case IP_VERSION(3, 0, 192):
if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 0))
fw_name = FIRMWARE_SIENNA_CICHLID;
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 2ec1ffb36b1f..f236fa233b9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -183,6 +183,7 @@ static int nv_query_video_codecs(struct amdgpu_device 
*adev, bool encode,
switch (adev->ip_versions[UVD_HWIP][0]) {
case IP_VERSION(3, 0, 0):
case IP_VERSION(3, 0, 64):
+   case IP_VERSION(3, 0, 128):
case IP_VERSION(3, 0, 192):
if (amdgpu_sriov_vf(adev)) {
if (encode)
-- 
2.25.1



Re: [PATCH 6/6] Documentation/gpu: Add DC glossary

2021-11-30 Thread Yann Dirson



- Original Message -
> From: "Rodrigo Siqueira Jordao" 
> To: ydir...@free.fr, "Rodrigo Siqueira" , "Christian 
> König" ,
> "Alex Deucher" 
> Cc: "Harry Wentland" , "Linux Doc Mailing List" 
> , "Mark Yacoub"
> , "Michel Dänzer" , "Bas 
> Nieuwenhuizen" ,
> "Roman Li" , "amd-gfx list" 
> , "Roman Gilg" ,
> "Marek Olšák" , "Pekka Paalanen" , 
> "Aurabindo Pillai"
> , "nicholas choi" , "Maling 
> list - DRI developers"
> , "Simon Ser" , "Alex 
> Deucher" , "Sean
> Paul" , "Qingqing Zhuo" , 
> "Bhawanpreet Lakha"
> , "Nicholas Kazlauskas" 
> 
> Sent: Tuesday, November 30, 2021 16:53:55
> Subject: Re: [PATCH 6/6] Documentation/gpu: Add DC glossary
> 
> 
> 
> On 2021-11-29 3:48 p.m., ydir...@free.fr wrote:
> > Hi Rodrigo,
> > 
> > That will really be helpful!
> > 
> > I know drawing the line is a difficult problem (and can even make
> > things
> > harder when searching), but maybe it would make sense to keep
> > generic
> > acronyms not specific to amdgpu in a separate list.  I bet a number
> > of
> > them would be useful in the scope of other drm drivers (e.g. CRTC,
> > DCC,
> > MST), and some are not restricted to the drm subsystem at all (e.g.
> > FEC,
> > LUT), but still have value as not necessarily easy to look up.
> > 
> > Maybe "DC glossary" should just be "Glossary", since quite some
> > entries
> > help to read adm/amdgpu/ too.  Which brings me to the result of my
> > recent
> > searches as suggested entries:
> > 
> >   KIQ (Kernel Interface Queue), MQD (memory queue descriptor), HQD
> >   (hardware
> >   queue descriptor), EOP (still no clue :)
> > 
> > Maybe some more specific ones just to be spelled out in clear where
> > they
> > are used ?  KCQ (compute queue?), KGQ (gfx queue?)
> > 
> > More suggestions inlined.
> > 
> > Best regards,
> > 
> 
> Hi all,
> 
> I'll address all the highlighted problems in the V2. Thanks a lot for
> all the feedback.
> 
> Yann,
> For the generic acronyms, how about keeping it in this patch for now?
> After it gets merged, I can prepare a new documentation patch that
> creates a glossary for DRM where I move the generic acronyms to the
> DRM
> documentation. I prefer this approach to keep the improvement small
> and
> manageable.

Sure, especially as the Right Solution(tm) is not necessarily obvious :)

One thing I thought about is that a context could be specified together
with terms.  Like "BPP (graphics)", "FEC (CS)", "DMCUB (amdgpu)".  Well,
"CS" may not be a good choice but you get the idea: that would keep all
terms together and keep it easy for the reader.

That way it could be easily be generalized at some point by just moving
it to a generic kernel level - provided the solution suits the doc
community at large.

Best regards,
-- 
Yann




Re: [PATCH] drm/amdkfd: Fix a wild pointer dereference in svm_range_add()

2021-11-30 Thread Felix Kuehling
On 2021-11-30 at 11:51 a.m., philip yang wrote:
>
>
> On 2021-11-30 6:26 a.m., Zhou Qingyang wrote:
>> In svm_range_add(), the return value of svm_range_new() is assigned
>> to prange and &prange->insert_list is used in list_add(). There is
>> a dereference of &prange->insert_list in list_add(), which could lead
>> to a wild pointer dereference on failure of svm_range_new() if
>> CONFIG_DEBUG_LIST is unset in the .config file.
>>
>> Fix this bug by adding a check of prange.
>>
>> This bug was found by a static analyzer. The analysis employs
>> differential checking to identify inconsistent security operations
>> (e.g., checks or kfrees) between two code paths and confirms that the
>> inconsistent operations are not recovered in the current function or
>> the callers, so they constitute bugs.
>>
>> Note that, as a bug found by static analysis, it can be a false
>> positive or hard to trigger. Multiple researchers have cross-reviewed
>> the bug.
>>
>> Builds with CONFIG_DRM_AMDGPU=m, CONFIG_HSA_AMD=y, and
>> CONFIG_HSA_AMD_SVM=y show no new warnings, and our static analyzer no
>> longer warns about this code.
>>
>> Fixes: 42de677f7999 ("drm/amdkfd: register svm range")
>> Signed-off-by: Zhou Qingyang 
> Reviewed-by: Philip Yang 

The patch looks good to me. It's an obvious bug and definitely not a
false positive. The patch description is a bit verbose. Is this
auto-generated output from the static checker? It could be replaced with
something more concise. Especially the comment about this possibly being
a false positive should not be in the final submission.

Regards,
  Felix


>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 58b89b53ebe6..e40c2211901d 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -2940,6 +2940,9 @@ svm_range_add(struct kfd_process *p, uint64_t start, 
>> uint64_t size,
>>  
>>  if (left) {
>>  prange = svm_range_new(svms, last - left + 1, last);
>> +if (!prange)
>> +return -ENOMEM;
>> +
>>  list_add(&prange->insert_list, insert_list);
>>  list_add(&prange->update_list, update_list);
>>  }


Re: [PATCH] drm/amdkfd: Fix a wild pointer dereference in svm_range_add()

2021-11-30 Thread philip yang

  


On 2021-11-30 6:26 a.m., Zhou Qingyang wrote:

In svm_range_add(), the return value of svm_range_new() is assigned
to prange and &prange->insert_list is used in list_add(). There is
a dereference of &prange->insert_list in list_add(), which could lead
to a wild pointer dereference on failure of svm_range_new() if
CONFIG_DEBUG_LIST is unset in the .config file.

Fix this bug by adding a check of prange.

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_DRM_AMDGPU=m, CONFIG_HSA_AMD=y, and
CONFIG_HSA_AMD_SVM=y show no new warnings, and our static analyzer no
longer warns about this code.

Fixes: 42de677f7999 ("drm/amdkfd: register svm range")
Signed-off-by: Zhou Qingyang 

Reviewed-by: Philip Yang 

  
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 58b89b53ebe6..e40c2211901d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2940,6 +2940,9 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
 
 	if (left) {
 		prange = svm_range_new(svms, last - left + 1, last);
+		if (!prange)
+			return -ENOMEM;
+
 		list_add(&prange->insert_list, insert_list);
 		list_add(&prange->update_list, update_list);
 	}


  



Re: [PATCH 1/6] Documentation/gpu: Reorganize DC documentation

2021-11-30 Thread Harry Wentland



On 2021-11-30 10:59, Rodrigo Siqueira Jordao wrote:
> 
> 
> On 2021-11-30 10:48 a.m., Harry Wentland wrote:
>> On 2021-11-30 10:46, Rodrigo Siqueira Jordao wrote:
>>>
>>>
>>> On 2021-11-29 7:06 a.m., Jani Nikula wrote:
 On Fri, 26 Nov 2021, Daniel Vetter  wrote:
> On Thu, Nov 25, 2021 at 10:38:25AM -0500, Rodrigo Siqueira wrote:
>> Display core documentation is not well organized, and it is hard to find
>> information due to the lack of sections. This commit reorganizes the
>> documentation layout, and it is preparation work for future changes.
>>
>> Signed-off-by: Rodrigo Siqueira 
>> ---
>>    Documentation/gpu/amdgpu-dc.rst   | 74 ---
>>    .../gpu/amdgpu-dc/amdgpu-dc-debug.rst |  4 +
>>    Documentation/gpu/amdgpu-dc/amdgpu-dc.rst | 29 
>>    Documentation/gpu/amdgpu-dc/amdgpu-dm.rst | 42 +++
>>    Documentation/gpu/drivers.rst |  2 +-
>>    5 files changed, 76 insertions(+), 75 deletions(-)
>>    delete mode 100644 Documentation/gpu/amdgpu-dc.rst
>>    create mode 100644 Documentation/gpu/amdgpu-dc/amdgpu-dc-debug.rst
>>    create mode 100644 Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
>>    create mode 100644 Documentation/gpu/amdgpu-dc/amdgpu-dm.rst
>>
>> diff --git a/Documentation/gpu/amdgpu-dc.rst 
>> b/Documentation/gpu/amdgpu-dc.rst
>> deleted file mode 100644
>> index f7ff7e1309de..
>> --- a/Documentation/gpu/amdgpu-dc.rst
>> +++ /dev/null
>> @@ -1,74 +0,0 @@
>> -===
>> -drm/amd/display - Display Core (DC)
>> -===
>> -
>> -*placeholder - general description of supported platforms, what dc is, 
>> etc.*
>> -
>> -Because it is partially shared with other operating systems, the 
>> Display Core
>> -Driver is divided in two pieces.
>> -
>> -1. **Display Core (DC)** contains the OS-agnostic components. Things 
>> like
>> -   hardware programming and resource management are handled here.
>> -2. **Display Manager (DM)** contains the OS-dependent components. Hooks 
>> to the
>> -   amdgpu base driver and DRM are implemented here.
>> -
>> -It doesn't help that the entire package is frequently referred to as 
>> DC. But
>> -with the context in mind, it should be clear.
>> -
>> -When CONFIG_DRM_AMD_DC is enabled, DC will be initialized by default for
>> -supported ASICs. To force disable, set `amdgpu.dc=0` on kernel command 
>> line.
>> -Likewise, to force enable on unsupported ASICs, set `amdgpu.dc=1`.
>> -
>> -To determine if DC is loaded, search dmesg for the following entry:
>> -
>> -``Display Core initialized with ``
>> -
>> -AMDgpu Display Manager
>> -==
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> -   :doc: overview
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>> -   :internal:
>> -
>> -Lifecycle
>> --
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> -   :doc: DM Lifecycle
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> -   :functions: dm_hw_init dm_hw_fini
>> -
>> -Interrupts
>> ---
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
>> -   :doc: overview
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
>> -   :internal:
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> -   :functions: register_hpd_handlers dm_crtc_high_irq dm_pflip_high_irq
>> -
>> -Atomic Implementation
>> --
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> -   :doc: atomic
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> -   :functions: amdgpu_dm_atomic_check amdgpu_dm_atomic_commit_tail
>> -
>> -Display Core
>> -
>> -
>> -**WIP**
>> -
>> -FreeSync Video
>> ---
>> -
>> -.. kernel-doc:: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> -   :doc: FreeSync Video
>> diff --git a/Documentation/gpu/amdgpu-dc/amdgpu-dc-debug.rst 
>> b/Documentation/gpu/amdgpu-dc/amdgpu-dc-debug.rst
>> new file mode 100644
>> index ..bbb8c3fc8eee
>> --- /dev/null
>> +++ b/Documentation/gpu/amdgpu-dc/amdgpu-dc-debug.rst
>> @@ -0,0 +1,4 @@
>> +Display Core Debug tools
>> +
>> +
>> +TODO
>> diff --git a/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst 
>> b/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
>> new file mode 100644
>> index ..3685b3b1ad64

Re: [PATCH 1/6] Documentation/gpu: Reorganize DC documentation

2021-11-30 Thread Rodrigo Siqueira Jordao




On 2021-11-30 10:48 a.m., Harry Wentland wrote:

On 2021-11-30 10:46, Rodrigo Siqueira Jordao wrote:



On 2021-11-29 7:06 a.m., Jani Nikula wrote:

On Fri, 26 Nov 2021, Daniel Vetter  wrote:

On Thu, Nov 25, 2021 at 10:38:25AM -0500, Rodrigo Siqueira wrote:

diff --git a/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst 
b/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
new file mode 100644
index ..3685b3b1ad64
--- /dev/null
+++ b/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst


While we bikeshed names, I think it would make sense to call this
overview.rst or intro.rst or similar, since it's meant to contain the
overall toctree for everything amdgpu related (maybe there will be more in
the future).


index.rst?




Hi,

Thanks a lot for the suggestions; I will prepare a V2 that addresses all your 
comments.

Ps.: If there is no objection, I'll rename amdgpu-dc to index as Jani suggested.



SGTM, you mean amdgpu/index.rst, right?


Yeah, but I'm also thinking about this new organization:

1. Create an amdgpu folder.
2. Inside amdgpu folder, I want to create a display folder.
3. Move all display documentation to the display folder and keep other 
amdgpu generic things under amdgpu.
4. Finally, inside the amdgpu folder, I'll create the index.rst for 
amdgpu, and inside the display folder, I will create a similar 

Re: [PATCH 6/6] Documentation/gpu: Add DC glossary

2021-11-30 Thread Rodrigo Siqueira Jordao




On 2021-11-29 3:48 p.m., ydir...@free.fr wrote:

Hi Rodrigo,

That will really be helpful!

I know drawing the line is a difficult problem (and can even make things
harder when searching), but maybe it would make sense to keep generic
acronyms not specific to amdgpu in a separate list.  I bet a number of
them would be useful in the scope of other drm drivers (e.g. CRTC, DCC,
MST), and some are not restricted to the drm subsystem at all (e.g. FEC,
LUT), but still have value as not necessarily easy to look up.

Maybe "DC glossary" should just be "Glossary", since quite a few entries
also help when reading amd/amdgpu/.  Which brings me to the result of my recent
searches as suggested entries:

  KIQ (Kernel Interface Queue), MQD (memory queue descriptor), HQD (hardware
  queue descriptor), EOP (still no clue :)

Maybe some more specific ones just to be spelled out in clear where they
are used ?  KCQ (compute queue?), KGQ (gfx queue?)

More suggestions inlined.

Best regards,



Hi all,

I'll address all the highlighted problems in the V2. Thanks a lot for 
all the feedback.


Yann,
For the generic acronyms, how about keeping it in this patch for now? 
After it gets merged, I can prepare a new documentation patch that 
creates a glossary for DRM where I move the generic acronyms to the DRM 
documentation. I prefer this approach to keep the improvement small and 
manageable.


Best Regards
Siqueira


Re: [PATCH 1/6] Documentation/gpu: Reorganize DC documentation

2021-11-30 Thread Harry Wentland
On 2021-11-30 10:46, Rodrigo Siqueira Jordao wrote:
> 
> 
> On 2021-11-29 7:06 a.m., Jani Nikula wrote:
>> On Fri, 26 Nov 2021, Daniel Vetter  wrote:
>>> On Thu, Nov 25, 2021 at 10:38:25AM -0500, Rodrigo Siqueira wrote:
 diff --git a/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst 
 b/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
 new file mode 100644
 index ..3685b3b1ad64
 --- /dev/null
 +++ b/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
>>>
>>> While we bikeshed names, I think it would make sense to call this
>>> overview.rst or intro.rst or similar, since it's meant to contain the
>>> overall toctree for everything amdgpu related (maybe there will be more in
>>> the future).
>>
>> index.rst?
>>
>>
> 
> Hi,
> 
> Thanks a lot for the 

Re: [PATCH 1/6] Documentation/gpu: Reorganize DC documentation

2021-11-30 Thread Rodrigo Siqueira Jordao




On 2021-11-29 7:06 a.m., Jani Nikula wrote:

On Fri, 26 Nov 2021, Daniel Vetter  wrote:

On Thu, Nov 25, 2021 at 10:38:25AM -0500, Rodrigo Siqueira wrote:

diff --git a/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst 
b/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
new file mode 100644
index ..3685b3b1ad64
--- /dev/null
+++ b/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst


While we bikeshed names, I think it would make sense to call this
overview.rst or intro.rst or similar, since it's meant to contain the
overall toctree for everything amdgpu related (maybe there will be more in
the future).


index.rst?




Hi,

Thanks a lot for the suggestions; I will prepare a V2 that addresses all 
your comments.


Ps.: If there is no objection, I'll rename amdgpu-dc to index as Jani 
suggested.


Thanks.



Re: [PATCH v2] drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()

2021-11-30 Thread Christian König




On 30.11.21 at 16:33, Zhou Qingyang wrote:

In radeon_driver_open_kms(), the return value of radeon_vm_bo_add() is
assigned to vm->ib_bo_va and later passed to radeon_vm_bo_set_addr(). In
radeon_vm_bo_set_addr(), vm->ib_bo_va is dereferenced, which could lead
to a NULL pointer dereference on failure of radeon_vm_bo_add().

Fix this bug by adding a check of vm->ib_bo_va.

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_DRM_RADEON=m show no new warnings,
and our static analyzer no longer warns about this code.

Fixes: cc9e67e3d700 ("drm/radeon: fix VM IB handling")
Signed-off-by: Zhou Qingyang 
---
Changes in v2:
   -  Improve the error handling into goto style

  drivers/gpu/drm/radeon/radeon_kms.c | 24 ++--
  1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 482fb0ae6cb5..e49a9d160e52 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -649,6 +649,8 @@ int radeon_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
  {
struct radeon_device *rdev = dev->dev_private;
int r;
+   struct radeon_fpriv *fpriv;
+   struct radeon_vm *vm;


Please keep variables like "i" or "r" declared last.

  
  	file_priv->driver_priv = NULL;
  
@@ -660,8 +662,6 @@ int radeon_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
  
  	/* new gpu have virtual address space support */

if (rdev->family >= CHIP_CAYMAN) {
-   struct radeon_fpriv *fpriv;
-   struct radeon_vm *vm;
  
  		fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);

if (unlikely(!fpriv)) {
@@ -673,34 +673,38 @@ int radeon_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
vm = &fpriv->vm;
r = radeon_vm_init(rdev, vm);
if (r) {
-   kfree(fpriv);
-   goto out_suspend;
+   goto out_fpriv;
}
  
  			r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false);

if (r) {
-   radeon_vm_fini(rdev, vm);
-   kfree(fpriv);
-   goto out_suspend;
+   goto out_vm_fini;
}
  
  			/* map the ib pool buffer read only into

 * virtual address space */
vm->ib_bo_va = radeon_vm_bo_add(rdev, vm,
rdev->ring_tmp_bo.bo);
+   if (!vm->ib_bo_va) {
+   r = -ENOMEM;
+   goto out_vm_fini;
+   }
+
r = radeon_vm_bo_set_addr(rdev, vm->ib_bo_va,
  RADEON_VA_IB_OFFSET,
  RADEON_VM_PAGE_READABLE |
  RADEON_VM_PAGE_SNOOPED);
if (r) {
-   radeon_vm_fini(rdev, vm);
-   kfree(fpriv);
-   goto out_suspend;
+   goto out_vm_fini;
}
}
file_priv->driver_priv = fpriv;
}
  


That here won't work.


+out_vm_fini:
+   radeon_vm_fini(rdev, vm);
+out_fpriv:
+   kfree(fpriv);


You are finishing the VM and freeing up the memory in the good case now 
as well.


Christian.


  out_suspend:
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);




RE: [PATCH] drm/amdgpu: adjust the kfd reset sequence in reset sriov function

2021-11-30 Thread Liu, Shaoyun
Thanks for the review. I changed the description as suggested and submitted the patch.

Shaoyun.liu

-Original Message-
From: Kuehling, Felix  
Sent: Tuesday, November 30, 2021 1:19 AM
To: amd-gfx@lists.freedesktop.org; Liu, Shaoyun 
Subject: Re: [PATCH] drm/amdgpu: adjust the kfd reset sequence in reset sriov 
function

Am 2021-11-29 um 9:40 p.m. schrieb shaoyunl:
> This change reverts the previous commits:
> 7079e7d5c6bf: drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
> cd547b93c62a: drm/amdgpu: move kfd post_reset out of reset_sriov 
> function

It looks like this is not a straight revert. It moves the 
amdgpu_amdkfd_pre_reset to an earlier place in amdgpu_device_reset_sriov, 
presumably to address the sequence issue that the first patch was originally 
meant to fix. The patch description should mention that.

With that fixed, the patch is

Reviewed-by: Felix Kuehling 


>
> Some register accesses (GRBM_GFX_CNTL) are only allowed in full access
> mode. Move kfd_pre_reset and kfd_post_reset back inside the reset_sriov
> function.
>
> Signed-off-by: shaoyunl 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1989f9e9379e..3c5afa45173c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4285,6 +4285,8 @@ static int amdgpu_device_reset_sriov(struct 
> amdgpu_device *adev,  {
>   int r;
>  
> + amdgpu_amdkfd_pre_reset(adev);
> +
>   if (from_hypervisor)
>   r = amdgpu_virt_request_full_gpu(adev, true);
>   else
> @@ -4312,6 +4314,7 @@ static int amdgpu_device_reset_sriov(struct 
> amdgpu_device *adev,
>  
>   amdgpu_irq_gpu_reset_resume_helper(adev);
>   r = amdgpu_ib_ring_tests(adev);
> + amdgpu_amdkfd_post_reset(adev);
>  
>  error:
>   if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) 
> { @@ -5026,7 +5029,8 @@ int amdgpu_device_gpu_recover(struct 
> amdgpu_device *adev,
>  
>   cancel_delayed_work_sync(&tmp_adev->delayed_init_work);
>  
> - amdgpu_amdkfd_pre_reset(tmp_adev);
> + if (!amdgpu_sriov_vf(tmp_adev))
> + amdgpu_amdkfd_pre_reset(tmp_adev);
>  
>   /*
>* Mark these ASICs to be reseted as untracked first @@ -5144,9 
> +5148,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>  
>  skip_sched_resume:
>   list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
> - /* unlock kfd */
> - if (!need_emergency_restart)
> - amdgpu_amdkfd_post_reset(tmp_adev);
> + /* unlock kfd: SRIOV would do it separately */
> + if (!need_emergency_restart && !amdgpu_sriov_vf(tmp_adev))
> + amdgpu_amdkfd_post_reset(tmp_adev);
>  
>   /* kfd_post_reset will do nothing if kfd device is not 
> initialized,
>* need to bring up kfd here if it's not be initialized before


Re: [PATCH] drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled

2021-11-30 Thread Harry Wentland
On 2021-11-30 09:53, Nicholas Kazlauskas wrote:
> [Why]
> PSR currently relies on the kernel's delayed vblank on/off mechanism
> as an implicit buffering mechanism to prevent excessive entry/exit.
> 
> Without this delay the user experience is impacted since it can take
> a few frames to enter/exit.
> 
> [How]
> Only allow vblank disable immediate for DC when psr is not supported.
> 
> Leave a TODO indicating that this support should be extended in the
> future to delay independent of the vblank interrupt.
> 
> Fixes: 3d1508b73ff1 ("drm/amdgpu/display: set vblank_disable_immediate for 
> DC")
> 
> Cc: Harry Wentland 
> Cc: Alex Deucher 
> Signed-off-by: Nicholas Kazlauskas 

Now I'm curious whether vblank_disable_immediate or PSR
save more power.

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 0747dc7922c2..d582d44c02ad 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -1599,9 +1599,6 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
>   adev_to_drm(adev)->mode_config.cursor_width = 
> adev->dm.dc->caps.max_cursor_size;
>   adev_to_drm(adev)->mode_config.cursor_height = 
> adev->dm.dc->caps.max_cursor_size;
>  
> - /* Disable vblank IRQs aggressively for power-saving */
> - adev_to_drm(adev)->vblank_disable_immediate = true;
> -
>   if (drm_vblank_init(adev_to_drm(adev), adev->dm.display_indexes_num)) {
>   DRM_ERROR(
>   "amdgpu: failed to initialize sw for display support.\n");
> @@ -4264,6 +4261,14 @@ static int amdgpu_dm_initialize_drm_device(struct 
> amdgpu_device *adev)
>  
>   }
>  
> + /*
> +  * Disable vblank IRQs aggressively for power-saving.
> +  *
> +  * TODO: Fix vblank control helpers to delay PSR entry to allow this 
> when PSR
> +  * is also supported.
> +  */
> + adev_to_drm(adev)->vblank_disable_immediate = !psr_feature_enabled;
> +
>   /* Software is initialized. Now we can register interrupt handlers. */
>   switch (adev->asic_type) {
>  #if defined(CONFIG_DRM_AMD_DC_SI)
> 



Re: [PATCH] drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()

2021-11-30 Thread Christian König

On 30.11.21 at 16:04, Zhou Qingyang wrote:

In radeon_driver_open_kms(), the return value of radeon_vm_bo_add() is
assigned to vm->ib_bo_va and later passed to radeon_vm_bo_set_addr(). In
radeon_vm_bo_set_addr(), vm->ib_bo_va is dereferenced, which could lead
to a NULL pointer dereference on failure of radeon_vm_bo_add().

Fix this bug by adding a check of vm->ib_bo_va.

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_DRM_RADEON=m show no new warnings,
and our static analyzer no longer warns about this code.

Fixes: cc9e67e3d700 ("drm/radeon: fix VM IB handling")
Signed-off-by: Zhou Qingyang 
---
  drivers/gpu/drm/radeon/radeon_kms.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 482fb0ae6cb5..ead015c055fb 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -688,6 +688,13 @@ int radeon_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
 * virtual address space */
vm->ib_bo_va = radeon_vm_bo_add(rdev, vm,
rdev->ring_tmp_bo.bo);
+   if (!vm->ib_bo_va) {
+   r = -ENOMEM;
+   radeon_vm_fini(rdev, vm);
+   kfree(fpriv);
+   goto out_suspend;
+   }
+


Impressive catch for an automated checker.

Please convert the error handling to goto style, since this adds the
fourth instance of the same error handling code. Apart from that, this
looks good to me.


Thanks,
Christian.


r = radeon_vm_bo_set_addr(rdev, vm->ib_bo_va,
  RADEON_VA_IB_OFFSET,
  RADEON_VM_PAGE_READABLE |




Re: [PATCH] drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled

2021-11-30 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: Nicholas Kazlauskas 
Sent: Tuesday, November 30, 2021 9:53 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Kazlauskas, Nicholas ; Wentland, Harry 
; Deucher, Alexander 
Subject: [PATCH] drm/amdgpu/display: Only set vblank_disable_immediate when PSR 
is not enabled

[Why]
PSR currently relies on the kernel's delayed vblank on/off mechanism
as an implicit buffering mechanism to prevent excessive entry/exit.

Without this delay the user experience is impacted since it can take
a few frames to enter/exit.

[How]
Only allow vblank disable immediate for DC when psr is not supported.

Leave a TODO indicating that this support should be extended in the
future to delay independent of the vblank interrupt.

Fixes: 3d1508b73ff1 ("drm/amdgpu/display: set vblank_disable_immediate for DC")

Cc: Harry Wentland 
Cc: Alex Deucher 
Signed-off-by: Nicholas Kazlauskas 



[PATCH AUTOSEL 5.15 59/68] drm/amdgpu: fix byteorder error in amdgpu discovery

2021-11-30 Thread Sasha Levin
From: Yang Wang 

[ Upstream commit fd08953b2de911f32c06aedbc8ad111c2fd0168b ]

fix some byteorder issues about amdgpu discovery.
This will result in running errors on the big end system. (e.g:MIPS)

Signed-off-by: Yang Wang 
Reviewed-by: Guchun Chen 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index ada7bc19118ac..a12272a0c8844 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -190,8 +190,8 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
 
offset = offsetof(struct binary_header, binary_checksum) +
sizeof(bhdr->binary_checksum);
-   size = bhdr->binary_size - offset;
-   checksum = bhdr->binary_checksum;
+   size = le16_to_cpu(bhdr->binary_size) - offset;
+   checksum = le16_to_cpu(bhdr->binary_checksum);
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
  size, checksum)) {
@@ -212,7 +212,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
}
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
- ihdr->size, checksum)) {
+ le16_to_cpu(ihdr->size), 
checksum)) {
DRM_ERROR("invalid ip discovery data table checksum\n");
r = -EINVAL;
goto out;
@@ -224,7 +224,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
ghdr = (struct gpu_info_header *)(adev->mman.discovery_bin + offset);
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
- ghdr->size, checksum)) {
+ le32_to_cpu(ghdr->size), 
checksum)) {
DRM_ERROR("invalid gc data table checksum\n");
r = -EINVAL;
goto out;
@@ -395,10 +395,10 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device 
*adev)
le16_to_cpu(bhdr->table_list[HARVEST_INFO].offset));
 
for (i = 0; i < 32; i++) {
-   if (le32_to_cpu(harvest_info->list[i].hw_id) == 0)
+   if (le16_to_cpu(harvest_info->list[i].hw_id) == 0)
break;
 
-   switch (le32_to_cpu(harvest_info->list[i].hw_id)) {
+   switch (le16_to_cpu(harvest_info->list[i].hw_id)) {
case VCN_HWID:
vcn_harvest_count++;
break;
-- 
2.33.0



[PATCH AUTOSEL 5.15 58/68] drm/amdkfd: handle VMA remove race

2021-11-30 Thread Sasha Levin
From: Philip Yang 

[ Upstream commit 0cc53cb450669cf1def4ff89e8cbcd8ec3c62380 ]

The VMA may be removed before the unmap notifier callback runs and the
deferred-list work removes the range. Return success for this special
case, as we are handling a stale retry fault.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 179080329af89..4e933fb0fc698 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2448,20 +2448,13 @@ svm_range_count_fault(struct amdgpu_device *adev, 
struct kfd_process *p,
 }
 
 static bool
-svm_fault_allowed(struct mm_struct *mm, uint64_t addr, bool write_fault)
+svm_fault_allowed(struct vm_area_struct *vma, bool write_fault)
 {
unsigned long requested = VM_READ;
-   struct vm_area_struct *vma;
 
if (write_fault)
requested |= VM_WRITE;
 
-   vma = find_vma(mm, addr << PAGE_SHIFT);
-   if (!vma || (addr << PAGE_SHIFT) < vma->vm_start) {
-   pr_debug("address 0x%llx VMA is removed\n", addr);
-   return true;
-   }
-
pr_debug("requested 0x%lx, vma permission flags 0x%lx\n", requested,
vma->vm_flags);
return (vma->vm_flags & requested) == requested;
@@ -2479,6 +2472,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
int32_t best_loc;
int32_t gpuidx = MAX_GPU_INSTANCE;
bool write_locked = false;
+   struct vm_area_struct *vma;
int r = 0;
 
if (!KFD_IS_SVM_API_SUPPORTED(adev->kfd.dev)) {
@@ -2552,7 +2546,17 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
goto out_unlock_range;
}
 
-   if (!svm_fault_allowed(mm, addr, write_fault)) {
+   /* __do_munmap removed VMA, return success as we are handling stale
+* retry fault.
+*/
+   vma = find_vma(mm, addr << PAGE_SHIFT);
+   if (!vma || (addr << PAGE_SHIFT) < vma->vm_start) {
+   pr_debug("address 0x%llx VMA is removed\n", addr);
+   r = 0;
+   goto out_unlock_range;
+   }
+
+   if (!svm_fault_allowed(vma, write_fault)) {
pr_debug("fault addr 0x%llx no %s permission\n", addr,
write_fault ? "write" : "read");
r = -EPERM;
-- 
2.33.0



[PATCH AUTOSEL 5.15 60/68] drm/amd/display: update bios scratch when setting backlight

2021-11-30 Thread Sasha Levin
From: Alex Deucher 

[ Upstream commit 692cd92e66ee10597676530573a495dc1d3bec6a ]

Update the bios scratch register when updating the backlight
level.  Some platforms apparently read this scratch register
and do additional operations in their hotkey handlers.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1518
Reviewed-by: Harry Wentland 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  | 12 
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.h  |  2 ++
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  4 
 3 files changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
index 96b7bb13a2dd9..12a6b1c99c93e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -1569,6 +1569,18 @@ void amdgpu_atombios_scratch_regs_engine_hung(struct 
amdgpu_device *adev,
WREG32(adev->bios_scratch_reg_offset + 3, tmp);
 }
 
+void amdgpu_atombios_scratch_regs_set_backlight_level(struct amdgpu_device 
*adev,
+ u32 backlight_level)
+{
+   u32 tmp = RREG32(adev->bios_scratch_reg_offset + 2);
+
+   tmp &= ~ATOM_S2_CURRENT_BL_LEVEL_MASK;
+   tmp |= (backlight_level << ATOM_S2_CURRENT_BL_LEVEL_SHIFT) &
+   ATOM_S2_CURRENT_BL_LEVEL_MASK;
+
+   WREG32(adev->bios_scratch_reg_offset + 2, tmp);
+}
+
 bool amdgpu_atombios_scratch_need_asic_init(struct amdgpu_device *adev)
 {
u32 tmp = RREG32(adev->bios_scratch_reg_offset + 7);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.h
index 8cc0222dba191..27e74b1fc260a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.h
@@ -185,6 +185,8 @@ bool amdgpu_atombios_has_gpu_virtualization_table(struct 
amdgpu_device *adev);
 void amdgpu_atombios_scratch_regs_lock(struct amdgpu_device *adev, bool lock);
 void amdgpu_atombios_scratch_regs_engine_hung(struct amdgpu_device *adev,
  bool hung);
+void amdgpu_atombios_scratch_regs_set_backlight_level(struct amdgpu_device 
*adev,
+ u32 backlight_level);
 bool amdgpu_atombios_scratch_need_asic_init(struct amdgpu_device *adev);
 
 void amdgpu_atombios_copy_swap(u8 *dst, u8 *src, u8 num_bytes, bool to_le);
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 084491afe5405..ea36f0fa59a9e 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -50,6 +50,7 @@
 #include 
 #endif
 #include "amdgpu_pm.h"
+#include "amdgpu_atombios.h"
 
 #include "amd_shared.h"
 #include "amdgpu_dm_irq.h"
@@ -3528,6 +3529,9 @@ static int amdgpu_dm_backlight_set_level(struct 
amdgpu_display_manager *dm,
caps = dm->backlight_caps[bl_idx];
 
dm->brightness[bl_idx] = user_brightness;
+   /* update scratch register */
+   if (bl_idx == 0)
+   amdgpu_atombios_scratch_regs_set_backlight_level(dm->adev, 
dm->brightness[bl_idx]);
brightness = convert_brightness_from_user(&caps, dm->brightness[bl_idx]);
link = (struct dc_link *)dm->backlight_link[bl_idx];
 
-- 
2.33.0



[PATCH AUTOSEL 5.15 55/68] drm/amdgpu: Fix MMIO HDP flush on SRIOV

2021-11-30 Thread Sasha Levin
From: Felix Kuehling 

[ Upstream commit d3a21f7e353dc8d6939383578f3bd45b4ae3a946 ]

Disable HDP register remapping on SRIOV and set rmmio_remap.reg_offset
to the fixed address of the VF register for hdp_v*_flush_hdp.

Signed-off-by: Felix Kuehling 
Tested-by: Bokun Zhang 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c | 4 
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 4 
 drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c | 4 
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/nv.c| 8 +---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 8 +---
 7 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
index b184b656b9b6b..a76b5e47e7cbe 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
@@ -328,6 +328,10 @@ static void nbio_v2_3_init_registers(struct amdgpu_device 
*adev)
 
if (def != data)
WREG32_PCIE(smnPCIE_CONFIG_CNTL, data);
+
+   if (amdgpu_sriov_vf(adev))
+   adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
+   mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 
2;
 }
 
 #define NAVI10_PCIE__LC_L0S_INACTIVITY_DEFAULT 0x // off by 
default, no gains over L1
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
index 0d2d629e2d6a2..4bbacf1be25a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
@@ -276,6 +276,10 @@ static void nbio_v6_1_init_registers(struct amdgpu_device 
*adev)
 
if (def != data)
WREG32_PCIE(smnPCIE_CI_CNTL, data);
+
+   if (amdgpu_sriov_vf(adev))
+   adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
+   mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 
2;
 }
 
 static void nbio_v6_1_program_ltr(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c
index 3c00666a13e16..37a4039fdfc53 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c
@@ -273,7 +273,9 @@ const struct nbio_hdp_flush_reg nbio_v7_0_hdp_flush_reg = {
 
 static void nbio_v7_0_init_registers(struct amdgpu_device *adev)
 {
-
+   if (amdgpu_sriov_vf(adev))
+   adev->rmmio_remap.reg_offset =
+   SOC15_REG_OFFSET(NBIO, 0, 
mmHDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 const struct amdgpu_nbio_funcs nbio_v7_0_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
index 8f2a315e7c73c..3444332ea1104 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
@@ -371,6 +371,10 @@ static void nbio_v7_2_init_registers(struct amdgpu_device 
*adev)
if (def != data)
WREG32_PCIE_PORT(SOC15_REG_OFFSET(NBIO, 0, 
regPCIE_CONFIG_CNTL), data);
}
+
+   if (amdgpu_sriov_vf(adev))
+   adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
+   regBIF_BX_PF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 const struct amdgpu_nbio_funcs nbio_v7_2_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index f50045cebd44c..9b3f64971a321 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -344,7 +344,9 @@ const struct nbio_hdp_flush_reg nbio_v7_4_hdp_flush_reg = {
 
 static void nbio_v7_4_init_registers(struct amdgpu_device *adev)
 {
-
+   if (amdgpu_sriov_vf(adev))
+   adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
+   mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 
2;
 }
 
 static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct 
amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 01efda4398e56..b739166b242a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -1034,8 +1034,10 @@ static int nv_common_early_init(void *handle)
#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-   adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
-   adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
+   if (!amdgpu_sriov_vf(adev)) {
+   adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
+   adev->rmmio_remap.bus_addr = adev->rmmio_base + 
MMIO_REG_HOLE_OFFSET;
+   }
adev->smc_rreg = NULL;
adev->smc_wreg = NULL;
adev->pcie_rreg = _pcie_rreg;
@@ -1333,7 +1335,7 @@ static int nv_common_hw_init(void *handle)
 * for the 

[PATCH AUTOSEL 5.15 56/68] drm/amdgpu: Fix double free of dmabuf

2021-11-30 Thread Sasha Levin
From: xinhui pan 

[ Upstream commit 4eb6bb649fe041472ddd00f94870c0b86ef49d34 ]

amdgpu_amdkfd_gpuvm_free_memory_of_gpu drop dmabuf reference increased in
amdgpu_gem_prime_export.
amdgpu_bo_destroy drop dmabuf reference increased in
amdgpu_gem_prime_import.

So remove this extra dma_buf_put to avoid double free.

Signed-off-by: xinhui pan 
Tested-by: Felix Kuehling 
Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index cdf46bd0d8d5b..3862470c7f1eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -644,12 +644,6 @@ kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct 
kgd_mem *mem,
if (IS_ERR(gobj))
return PTR_ERR(gobj);
 
-   /* Import takes an extra reference on the dmabuf. Drop it now to
-* avoid leaking it. We only need the one reference in
-* kgd_mem->dmabuf.
-*/
-   dma_buf_put(mem->dmabuf);
-
*bo = gem_to_amdgpu_bo(gobj);
(*bo)->flags |= AMDGPU_GEM_CREATE_PREEMPTIBLE;
(*bo)->parent = amdgpu_bo_ref(mem->bo);
-- 
2.33.0



[PATCH AUTOSEL 5.15 57/68] drm/amd/display: Fixed DSC would not PG after removing DSC stream

2021-11-30 Thread Sasha Levin
From: Yi-Ling Chen 

[ Upstream commit 2da8f0beece08a5c3c2e20c0e38e1a4bbc153f9e ]

[WHY]
Due to passing the wrong parameter down to enable_stream_gating(),
the DSC of the stream being removed would not be power gated (PG).

[HOW]
Pass the correct parameter down to enable_stream_gating().

Reviewed-by: Anthony Koo 
Acked-by: Qingqing Zhuo 
Signed-off-by: Yi-Ling Chen 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c| 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 3af49cdf89ebd..2e0fb2ead0a3a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -1566,7 +1566,7 @@ void dcn10_reset_hw_ctx_wrap(
 
dcn10_reset_back_end_for_pipe(dc, pipe_ctx_old, 
dc->current_state);
if (hws->funcs.enable_stream_gating)
-   hws->funcs.enable_stream_gating(dc, pipe_ctx);
+   hws->funcs.enable_stream_gating(dc, 
pipe_ctx_old);
if (old_clk)
old_clk->funcs->cs_power_down(old_clk);
}
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index a47ba1d45be92..027f221a6d7d5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -2262,7 +2262,7 @@ void dcn20_reset_hw_ctx_wrap(
 
dcn20_reset_back_end_for_pipe(dc, pipe_ctx_old, 
dc->current_state);
if (hws->funcs.enable_stream_gating)
-   hws->funcs.enable_stream_gating(dc, pipe_ctx);
+   hws->funcs.enable_stream_gating(dc, 
pipe_ctx_old);
if (old_clk)
old_clk->funcs->cs_power_down(old_clk);
}
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
index 3afa1159a5f7d..251414babffa3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
@@ -588,7 +588,7 @@ void dcn31_reset_hw_ctx_wrap(
 
dcn31_reset_back_end_for_pipe(dc, pipe_ctx_old, 
dc->current_state);
if (hws->funcs.enable_stream_gating)
-   hws->funcs.enable_stream_gating(dc, pipe_ctx);
+   hws->funcs.enable_stream_gating(dc, 
pipe_ctx_old);
if (old_clk)
old_clk->funcs->cs_power_down(old_clk);
}
-- 
2.33.0



[PATCH] drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled

2021-11-30 Thread Nicholas Kazlauskas
[Why]
PSR currently relies on the kernel's delayed vblank on/off mechanism
as an implicit buffering mechanism to prevent excessive entry/exit.

Without this delay the user experience is impacted since it can take
a few frames to enter/exit.

[How]
Only allow immediate vblank disable for DC when PSR is not supported.

Leave a TODO indicating that this support should be extended in the
future to delay independent of the vblank interrupt.

Fixes: 3d1508b73ff1 ("drm/amdgpu/display: set vblank_disable_immediate for DC")

Cc: Harry Wentland 
Cc: Alex Deucher 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 0747dc7922c2..d582d44c02ad 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1599,9 +1599,6 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
adev_to_drm(adev)->mode_config.cursor_width = 
adev->dm.dc->caps.max_cursor_size;
adev_to_drm(adev)->mode_config.cursor_height = 
adev->dm.dc->caps.max_cursor_size;
 
-   /* Disable vblank IRQs aggressively for power-saving */
-   adev_to_drm(adev)->vblank_disable_immediate = true;
-
if (drm_vblank_init(adev_to_drm(adev), adev->dm.display_indexes_num)) {
DRM_ERROR(
"amdgpu: failed to initialize sw for display support.\n");
@@ -4264,6 +4261,14 @@ static int amdgpu_dm_initialize_drm_device(struct 
amdgpu_device *adev)
 
}
 
+   /*
+* Disable vblank IRQs aggressively for power-saving.
+*
+* TODO: Fix vblank control helpers to delay PSR entry to allow this 
when PSR
+* is also supported.
+*/
+   adev_to_drm(adev)->vblank_disable_immediate = !psr_feature_enabled;
+
/* Software is initialized. Now we can register interrupt handlers. */
switch (adev->asic_type) {
 #if defined(CONFIG_DRM_AMD_DC_SI)
-- 
2.25.1



Re: [PATCH 6/6] Documentation/gpu: Add DC glossary

2021-11-30 Thread Alex Deucher
On Mon, Nov 29, 2021 at 3:48 PM  wrote:
>
> Hi Rodrigo,
>
> That will really be helpful!
>
> I know drawing the line is a difficult problem (and can even make things
> harder when searching), but maybe it would make sense to keep generic
> acronyms not specific to amdgpu in a separate list.  I bet a number of
> them would be useful in the scope of other drm drivers (e.g. CRTC, DCC,
> MST), and some are not restricted to the drm subsystem at all (e.g. FEC,
> LUT), but still have value as not necessarily easy to look up.
>
> Maybe "DC glossary" should just be "Glossary", since quite some entries
> help to read adm/amdgpu/ too.  Which brings me to the result of my recent
> searches as suggested entries:
>
>  KIQ (Kernel Interface Queue), MQD (memory queue descriptor), HQD (hardware
>  queue descriptor), EOP (still no clue :)
>
> Maybe some more specific ones just to be spelled out in clear where they
> are used ?  KCQ (compute queue?), KGQ (gfx queue?)

Kernel Compute Queue and Kernel Graphics Queue.

Alex

>
> More suggestions inlined.
>
> Best regards,
> --
> Yann
>
> > On Thu, Nov 25, 2021 at 10:40 AM Rodrigo Siqueira
> >  wrote:
> > >
> > > In the DC driver, we have multiple acronyms that are not obvious
> > > most of
> > > the time. This commit introduces a DC glossary in order to make it
> > > easier to navigate through our driver.
> > >
> > > Signed-off-by: Rodrigo Siqueira 
> > > ---
> > >  Documentation/gpu/amdgpu-dc/amdgpu-dc.rst   |   2 +-
> > >  Documentation/gpu/amdgpu-dc/dc-glossary.rst | 257
> > >  
> > >  2 files changed, 258 insertions(+), 1 deletion(-)
> > >  create mode 100644 Documentation/gpu/amdgpu-dc/dc-glossary.rst
> > >
> > > diff --git a/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
> > > b/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
> > > index 2e45e83d9a2a..15405c43786a 100644
> > > --- a/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
> > > +++ b/Documentation/gpu/amdgpu-dc/amdgpu-dc.rst
> > > @@ -26,4 +26,4 @@ table of content:
> > > amdgpu-dcn-overview.rst
> > > amdgpu-dm.rst
> > > amdgpu-dc-debug.rst
> > > -
> > > +   dc-glossary.rst
> > > diff --git a/Documentation/gpu/amdgpu-dc/dc-glossary.rst
> > > b/Documentation/gpu/amdgpu-dc/dc-glossary.rst
> > > new file mode 100644
> > > index ..48698fc1799f
> > > --- /dev/null
> > > +++ b/Documentation/gpu/amdgpu-dc/dc-glossary.rst
> > > @@ -0,0 +1,257 @@
> > > +===
> > > +DC Glossary
> > > +===
> > > +
> > > +.. glossary::
> > > +
> > > +ABM
> > > +  Adaptive Backlight Modulation
> > > +
> > > +APU
> > > +  Accelerated Processing Unit
> > > +
> > > +ASIC
> > > +  Application-Specific Integrated Circuit
> > > +
> > > +ASSR
> > > +  Alternate Scrambler Seed Reset
> > > +
> > > +AZ
> > > +  Azalia (HD audio DMA engine)
> > > +
> > > +BPC
> > > +  Bits Per Colour/Component
> > > +
> > > +BPP
> > > +  Bits Per Pixel
> > > +
> > > +Clocks
> > > +  * PCLK: Pixel Clock
> > > +  * SYMCLK: Symbol Clock
> > > +  * SOCCLK: GPU Engine Clock
> > > +  * DISPCLK: Display Clock
> > > +  * DPPCLK: DPP Clock
> > > +  * DCFCLK: Display Controller Fabric Clock
> > > +  * REFCLK: Real Time Reference Clock
> > > +  * PPLL: Pixel PLL
> > > +  * FCLK: Fabric Clock
> > > +  * MCLK: Memory Clock
> > > +  * CPLIB: Content Protection Library
> >
> > CPLIB is not a clock.  It should be split out as its own item.
> >
> > > +
> > > +CRC
> > > +  Cyclic Redundancy Check
> > > +
> > > +CRTC
> > > +  Cathode Ray Tube Controller - commonly called "Controller" -
> > > Generates
> > > +  raw stream of pixels, clocked at pixel clock
> > > +
> > > +CVT
> > > +  Coordinated Video Timings
> > > +
> > > +DAL
> > > +  Display Abstraction layer
>
> I recall this as the old name for DC, maybe this should be mentioned ?
>
> > > +
> > > +DC (Software)
> > > +  Display Core
> > > +
> > > +DC (Hardware)
> > > +  Display Controller
> > > +
> > > +DCC
> > > +  Delta Colour Compression
> > > +
> > > +DCE
> > > +  Display Controller Engine
> > > +
> > > +DCHUB
> > > +  Display Controller Hub
> > > +
> > > +ARB
> > > +  Arbiter
> > > +
> > > +VTG
> > > +  Vertical Timing Generator
> > > +
> > > +DCN
> > > +  Display Core Next
> > > +
> > > +DCCG
> > > +  Display Clock Generator block
> > > +
> > > +DDC
> > > +  Display Data Channel
> > > +
> > > +DFS
> > > +  Digital Frequency Synthesizer
> > > +
> > > +DIO
> > > +  Display IO
> > > +
> > > +DPP
> > > +  Display Pipes and Planes
> > > +
> > > +DSC
> > > +  Display Stream Compression (Reduce the amount of bits to
> > > represent pixel
> > > +  count while at the same pixel clock)
> > > +
> > > +dGPU
> > > +  discrete GPU
> > > +
> > > +DMIF
> > > +  Display Memory Interface
> > > +
> > > +DML
> > > +  Display Mode 

Re: [PATCH V2 14/17] drm/amd/pm: relocate the power related headers

2021-11-30 Thread Lazar, Lijo




On 11/30/2021 1:12 PM, Evan Quan wrote:

Instead of centralizing all headers in the same folder, separate them into
different folders and place them alongside the source files that really
need them.

Signed-off-by: Evan Quan 
Change-Id: Id74cb4c7006327ca7ecd22daf17321e417c4aa71
---
  drivers/gpu/drm/amd/pm/Makefile   | 10 +++---
  drivers/gpu/drm/amd/pm/legacy-dpm/Makefile| 32 +++
  .../pm/{powerplay => legacy-dpm}/cik_dpm.h|  0
  .../amd/pm/{powerplay => legacy-dpm}/kv_dpm.c |  0
  .../amd/pm/{powerplay => legacy-dpm}/kv_dpm.h |  0
  .../amd/pm/{powerplay => legacy-dpm}/kv_smc.c |  0
  .../pm/{powerplay => legacy-dpm}/legacy_dpm.c |  0
  .../pm/{powerplay => legacy-dpm}/legacy_dpm.h |  0
  .../amd/pm/{powerplay => legacy-dpm}/ppsmc.h  |  0
  .../pm/{powerplay => legacy-dpm}/r600_dpm.h   |  0
  .../amd/pm/{powerplay => legacy-dpm}/si_dpm.c |  0
  .../amd/pm/{powerplay => legacy-dpm}/si_dpm.h |  0
  .../amd/pm/{powerplay => legacy-dpm}/si_smc.c |  0
  .../{powerplay => legacy-dpm}/sislands_smc.h  |  0
  drivers/gpu/drm/amd/pm/powerplay/Makefile |  6 +---
  .../pm/{ => powerplay}/inc/amd_powerplay.h|  0
  .../drm/amd/pm/{ => powerplay}/inc/cz_ppsmc.h |  0
  .../amd/pm/{ => powerplay}/inc/fiji_ppsmc.h   |  0
  .../pm/{ => powerplay}/inc/hardwaremanager.h  |  0
  .../drm/amd/pm/{ => powerplay}/inc/hwmgr.h|  0
  .../{ => powerplay}/inc/polaris10_pwrvirus.h  |  0
  .../amd/pm/{ => powerplay}/inc/power_state.h  |  0
  .../drm/amd/pm/{ => powerplay}/inc/pp_debug.h |  0
  .../amd/pm/{ => powerplay}/inc/pp_endian.h|  0
  .../amd/pm/{ => powerplay}/inc/pp_thermal.h   |  0
  .../amd/pm/{ => powerplay}/inc/ppinterrupt.h  |  0
  .../drm/amd/pm/{ => powerplay}/inc/rv_ppsmc.h |  0
  .../drm/amd/pm/{ => powerplay}/inc/smu10.h|  0
  .../pm/{ => powerplay}/inc/smu10_driver_if.h  |  0
  .../pm/{ => powerplay}/inc/smu11_driver_if.h  |  0
  .../gpu/drm/amd/pm/{ => powerplay}/inc/smu7.h |  0
  .../drm/amd/pm/{ => powerplay}/inc/smu71.h|  0
  .../pm/{ => powerplay}/inc/smu71_discrete.h   |  0
  .../drm/amd/pm/{ => powerplay}/inc/smu72.h|  0
  .../pm/{ => powerplay}/inc/smu72_discrete.h   |  0
  .../drm/amd/pm/{ => powerplay}/inc/smu73.h|  0
  .../pm/{ => powerplay}/inc/smu73_discrete.h   |  0
  .../drm/amd/pm/{ => powerplay}/inc/smu74.h|  0
  .../pm/{ => powerplay}/inc/smu74_discrete.h   |  0
  .../drm/amd/pm/{ => powerplay}/inc/smu75.h|  0
  .../pm/{ => powerplay}/inc/smu75_discrete.h   |  0
  .../amd/pm/{ => powerplay}/inc/smu7_common.h  |  0
  .../pm/{ => powerplay}/inc/smu7_discrete.h|  0
  .../amd/pm/{ => powerplay}/inc/smu7_fusion.h  |  0
  .../amd/pm/{ => powerplay}/inc/smu7_ppsmc.h   |  0
  .../gpu/drm/amd/pm/{ => powerplay}/inc/smu8.h |  0
  .../amd/pm/{ => powerplay}/inc/smu8_fusion.h  |  0
  .../gpu/drm/amd/pm/{ => powerplay}/inc/smu9.h |  0
  .../pm/{ => powerplay}/inc/smu9_driver_if.h   |  0
  .../{ => powerplay}/inc/smu_ucode_xfer_cz.h   |  0
  .../{ => powerplay}/inc/smu_ucode_xfer_vi.h   |  0
  .../drm/amd/pm/{ => powerplay}/inc/smumgr.h   |  0
  .../amd/pm/{ => powerplay}/inc/tonga_ppsmc.h  |  0
  .../amd/pm/{ => powerplay}/inc/vega10_ppsmc.h |  0
  .../inc/vega12/smu9_driver_if.h   |  0
  .../amd/pm/{ => powerplay}/inc/vega12_ppsmc.h |  0
  .../amd/pm/{ => powerplay}/inc/vega20_ppsmc.h |  0
  .../amd/pm/{ => swsmu}/inc/aldebaran_ppsmc.h  |  0
  .../drm/amd/pm/{ => swsmu}/inc/amdgpu_smu.h   |  0
  .../amd/pm/{ => swsmu}/inc/arcturus_ppsmc.h   |  0
  .../inc/smu11_driver_if_arcturus.h|  0
  .../inc/smu11_driver_if_cyan_skillfish.h  |  0
  .../{ => swsmu}/inc/smu11_driver_if_navi10.h  |  0
  .../inc/smu11_driver_if_sienna_cichlid.h  |  0
  .../{ => swsmu}/inc/smu11_driver_if_vangogh.h |  0
  .../amd/pm/{ => swsmu}/inc/smu12_driver_if.h  |  0
  .../inc/smu13_driver_if_aldebaran.h   |  0
  .../inc/smu13_driver_if_yellow_carp.h |  0
  .../pm/{ => swsmu}/inc/smu_11_0_cdr_table.h   |  0
  .../drm/amd/pm/{ => swsmu}/inc/smu_types.h|  0
  .../drm/amd/pm/{ => swsmu}/inc/smu_v11_0.h|  0
  .../pm/{ => swsmu}/inc/smu_v11_0_7_ppsmc.h|  0
  .../pm/{ => swsmu}/inc/smu_v11_0_7_pptable.h  |  0
  .../amd/pm/{ => swsmu}/inc/smu_v11_0_ppsmc.h  |  0
  .../pm/{ => swsmu}/inc/smu_v11_0_pptable.h|  0
  .../amd/pm/{ => swsmu}/inc/smu_v11_5_pmfw.h   |  0
  .../amd/pm/{ => swsmu}/inc/smu_v11_5_ppsmc.h  |  0
  .../amd/pm/{ => swsmu}/inc/smu_v11_8_pmfw.h   |  0
  .../amd/pm/{ => swsmu}/inc/smu_v11_8_ppsmc.h  |  0
  .../drm/amd/pm/{ => swsmu}/inc/smu_v12_0.h|  0
  .../amd/pm/{ => swsmu}/inc/smu_v12_0_ppsmc.h  |  0
  .../drm/amd/pm/{ => swsmu}/inc/smu_v13_0.h|  0
  .../amd/pm/{ => swsmu}/inc/smu_v13_0_1_pmfw.h |  0
  .../pm/{ => swsmu}/inc/smu_v13_0_1_ppsmc.h|  0
  .../pm/{ => swsmu}/inc/smu_v13_0_pptable.h|  0
  .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c |  1 -
  .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  1 -
  87 files changed, 39 insertions(+), 11 deletions(-)
  

Re: [PATCH V2 13/17] drm/amd/pm: do not expose the smu_context structure used internally in power

2021-11-30 Thread Lazar, Lijo




On 11/30/2021 1:12 PM, Evan Quan wrote:

This hides the power implementation details. As was done for the
powerplay framework, we hook the smu_context to adev->powerplay.pp_handle.

Signed-off-by: Evan Quan 
Change-Id: I3969c9f62a8b63dc6e4321a488d8f15022ffeb3d
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  6 --
  .../gpu/drm/amd/include/kgd_pp_interface.h|  9 +++
  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 51 ++--
  drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   | 11 +---
  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 60 +--
  .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c |  9 +--
  .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   |  9 +--
  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   |  9 +--
  .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c|  4 +-
  .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  9 +--
  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c|  8 +--
  11 files changed, 111 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c987813a4996..fefabd568483 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -99,7 +99,6 @@
  #include "amdgpu_gem.h"
  #include "amdgpu_doorbell.h"
  #include "amdgpu_amdkfd.h"
-#include "amdgpu_smu.h"
  #include "amdgpu_discovery.h"
  #include "amdgpu_mes.h"
  #include "amdgpu_umc.h"
@@ -950,11 +949,6 @@ struct amdgpu_device {
  
  	/* powerplay */

	struct amd_powerplay		powerplay;
-
-   /* smu */
-   struct smu_context  smu;
-
-   /* dpm */
	struct amdgpu_pm		pm;
u32 cg_flags;
u32 pg_flags;
diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 7919e96e772b..da6a82430048 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -25,6 +25,9 @@
  #define __KGD_PP_INTERFACE_H__
  
  extern const struct amdgpu_ip_block_version pp_smu_ip_block;

+extern const struct amdgpu_ip_block_version smu_v11_0_ip_block;
+extern const struct amdgpu_ip_block_version smu_v12_0_ip_block;
+extern const struct amdgpu_ip_block_version smu_v13_0_ip_block;
  
  enum smu_event_type {

SMU_EVENT_RESET_COMPLETE = 0,
@@ -244,6 +247,12 @@ enum pp_power_type
PP_PWR_TYPE_FAST,
  };
  
+enum smu_ppt_limit_type

+{
+   SMU_DEFAULT_PPT_LIMIT = 0,
+   SMU_FAST_PPT_LIMIT,
+};
+


This is a contradiction. If the entry point is dpm, this shouldn't be 
here and the external interface doesn't need to know about internal 
datatypes.



  #define PP_GROUP_MASK0xF000
  #define PP_GROUP_SHIFT   28
  
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c

index 8f0ae58f4292..a5cbbf9367fe 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -31,6 +31,7 @@
  #include "amdgpu_display.h"
  #include "hwmgr.h"
  #include 
+#include "amdgpu_smu.h"
  
  #define amdgpu_dpm_enable_bapm(adev, e) \


((adev)->powerplay.pp_funcs->enable_bapm((adev)->powerplay.pp_handle, (e)))
@@ -213,7 +214,7 @@ int amdgpu_dpm_baco_reset(struct amdgpu_device *adev)
  
  bool amdgpu_dpm_is_mode1_reset_supported(struct amdgpu_device *adev)

  {
-   struct smu_context *smu = &adev->smu;
+   struct smu_context *smu = adev->powerplay.pp_handle;
  
  	if (is_support_sw_smu(adev))

return smu_mode1_reset_is_support(smu);
@@ -223,7 +224,7 @@ bool amdgpu_dpm_is_mode1_reset_supported(struct 
amdgpu_device *adev)
  
  int amdgpu_dpm_mode1_reset(struct amdgpu_device *adev)

  {
-   struct smu_context *smu = &adev->smu;
+   struct smu_context *smu = adev->powerplay.pp_handle;
  
  	if (is_support_sw_smu(adev))

return smu_mode1_reset(smu);
@@ -276,7 +277,7 @@ int amdgpu_dpm_set_df_cstate(struct amdgpu_device *adev,
  
  int amdgpu_dpm_allow_xgmi_power_down(struct amdgpu_device *adev, bool en)

  {
-   struct smu_context *smu = &adev->smu;
+   struct smu_context *smu = adev->powerplay.pp_handle;
  
  	if (is_support_sw_smu(adev))

return smu_allow_xgmi_power_down(smu, en);
@@ -341,7 +342,7 @@ void amdgpu_pm_acpi_event_handler(struct amdgpu_device 
*adev)
mutex_unlock(&adev->pm.mutex);
  
  		if (is_support_sw_smu(adev))

-   smu_set_ac_dc(&adev->smu);
+   smu_set_ac_dc(adev->powerplay.pp_handle);
}
  }
  
@@ -423,15 +424,16 @@ int amdgpu_pm_load_smu_firmware(struct amdgpu_device *adev, uint32_t *smu_versio
  
  int amdgpu_dpm_set_light_sbr(struct amdgpu_device *adev, bool enable)

  {
-   return smu_set_light_sbr(&adev->smu, enable);
+   return smu_set_light_sbr(adev->powerplay.pp_handle, enable);
  }
  
  int amdgpu_dpm_send_hbm_bad_pages_num(struct amdgpu_device *adev, uint32_t size)

  {
+   struct 

Re: [PATCH V2 11/17] drm/amd/pm: correct the usage for amdgpu_dpm_dispatch_task()

2021-11-30 Thread Lazar, Lijo




On 11/30/2021 1:12 PM, Evan Quan wrote:

We should avoid having multi-function APIs. It should be up to the caller
to determine when or whether to call amdgpu_dpm_dispatch_task().

Signed-off-by: Evan Quan 
Change-Id: I78ec4eb8ceb6e526a4734113d213d15a5fbaa8a4
---
  drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 18 ++
  drivers/gpu/drm/amd/pm/amdgpu_pm.c  | 26 --
  2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index c6299e406848..8f0ae58f4292 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -558,8 +558,6 @@ void amdgpu_dpm_set_power_state(struct amdgpu_device *adev,
enum amd_pm_state_type state)
  {
adev->pm.dpm.user_state = state;
-
-   amdgpu_dpm_dispatch_task(adev, AMD_PP_TASK_ENABLE_USER_STATE, &state);
  }
  
  enum amd_dpm_forced_level amdgpu_dpm_get_performance_level(struct amdgpu_device *adev)

@@ -727,13 +725,7 @@ int amdgpu_dpm_set_sclk_od(struct amdgpu_device *adev, 
uint32_t value)
if (!pp_funcs->set_sclk_od)
return -EOPNOTSUPP;
  
-	pp_funcs->set_sclk_od(adev->powerplay.pp_handle, value);

-
-   amdgpu_dpm_dispatch_task(adev,
-AMD_PP_TASK_READJUST_POWER_STATE,
-NULL);
-
-   return 0;
+   return pp_funcs->set_sclk_od(adev->powerplay.pp_handle, value);
  }
  
  int amdgpu_dpm_get_mclk_od(struct amdgpu_device *adev)

@@ -753,13 +745,7 @@ int amdgpu_dpm_set_mclk_od(struct amdgpu_device *adev, 
uint32_t value)
if (!pp_funcs->set_mclk_od)
return -EOPNOTSUPP;
  
-	pp_funcs->set_mclk_od(adev->powerplay.pp_handle, value);

-
-   amdgpu_dpm_dispatch_task(adev,
-AMD_PP_TASK_READJUST_POWER_STATE,
-NULL);
-
-   return 0;
+   return pp_funcs->set_mclk_od(adev->powerplay.pp_handle, value);
  }
  
  int amdgpu_dpm_get_power_profile_mode(struct amdgpu_device *adev,

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index fa2f4e11e94e..89e1134d660f 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -187,6 +187,10 @@ static ssize_t amdgpu_set_power_dpm_state(struct device 
*dev,
  
  	amdgpu_dpm_set_power_state(adev, state);
  
+	amdgpu_dpm_dispatch_task(adev,

+AMD_PP_TASK_ENABLE_USER_STATE,
+&state);
+


This is just the opposite of what has been done so far. The idea is to 
keep the logic inside dpm_* calls and not to keep the logic in 
amdgpu_pm. This does the reverse. I guess this patch can be dropped.


Thanks,
Lijo


pm_runtime_mark_last_busy(ddev->dev);
pm_runtime_put_autosuspend(ddev->dev);
  
@@ -1278,7 +1282,16 @@ static ssize_t amdgpu_set_pp_sclk_od(struct device *dev,

return ret;
}
  
-	amdgpu_dpm_set_sclk_od(adev, (uint32_t)value);

+   ret = amdgpu_dpm_set_sclk_od(adev, (uint32_t)value);
+   if (ret) {
+   pm_runtime_mark_last_busy(ddev->dev);
+   pm_runtime_put_autosuspend(ddev->dev);
+   return ret;
+   }
+
+   amdgpu_dpm_dispatch_task(adev,
+AMD_PP_TASK_READJUST_POWER_STATE,
+NULL);
  
  	pm_runtime_mark_last_busy(ddev->dev);

pm_runtime_put_autosuspend(ddev->dev);
@@ -1340,7 +1353,16 @@ static ssize_t amdgpu_set_pp_mclk_od(struct device *dev,
return ret;
}
  
-	amdgpu_dpm_set_mclk_od(adev, (uint32_t)value);

+   ret = amdgpu_dpm_set_mclk_od(adev, (uint32_t)value);
+   if (ret) {
+   pm_runtime_mark_last_busy(ddev->dev);
+   pm_runtime_put_autosuspend(ddev->dev);
+   return ret;
+   }
+
+   amdgpu_dpm_dispatch_task(adev,
+AMD_PP_TASK_READJUST_POWER_STATE,
+NULL);
  
  	pm_runtime_mark_last_busy(ddev->dev);

pm_runtime_put_autosuspend(ddev->dev);



Re: [PATCH V2 07/17] drm/amd/pm: create a new holder for those APIs used only by legacy ASICs(si/kv)

2021-11-30 Thread Lazar, Lijo




On 11/30/2021 1:12 PM, Evan Quan wrote:

Those APIs are used only by legacy ASICs(si/kv). They cannot be
shared by other ASICs. So, we create a new holder for them.

Signed-off-by: Evan Quan 
Change-Id: I555dfa37e783a267b1d3b3a7db5c87fcc3f1556f
--
v1->v2:
   - move other APIs used by si/kv in amdgpu_atombios.c to the new
 holder also(Alex)
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  |  421 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.h  |   30 -
  .../gpu/drm/amd/include/kgd_pp_interface.h|1 +
  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 1008 +---
  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |   15 -
  drivers/gpu/drm/amd/pm/powerplay/Makefile |2 +-
  drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c |2 +
  drivers/gpu/drm/amd/pm/powerplay/legacy_dpm.c | 1453 +
  drivers/gpu/drm/amd/pm/powerplay/legacy_dpm.h |   70 +
  drivers/gpu/drm/amd/pm/powerplay/si_dpm.c |2 +
  10 files changed, 1534 insertions(+), 1470 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/pm/powerplay/legacy_dpm.c
  create mode 100644 drivers/gpu/drm/amd/pm/powerplay/legacy_dpm.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
index 12a6b1c99c93..f2e447212e62 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -1083,427 +1083,6 @@ int amdgpu_atombios_get_clock_dividers(struct 
amdgpu_device *adev,
return 0;
  }
  
-int amdgpu_atombios_get_memory_pll_dividers(struct amdgpu_device *adev,

-   u32 clock,
-   bool strobe_mode,
-   struct atom_mpll_param *mpll_param)
-{
-   COMPUTE_MEMORY_CLOCK_PARAM_PARAMETERS_V2_1 args;
-   int index = GetIndexIntoMasterTable(COMMAND, ComputeMemoryClockParam);
-   u8 frev, crev;
-
-   memset(&args, 0, sizeof(args));
-   memset(mpll_param, 0, sizeof(struct atom_mpll_param));
-
-   if (!amdgpu_atom_parse_cmd_header(adev->mode_info.atom_context, index, 
&frev, &crev))
-   return -EINVAL;
-
-   switch (frev) {
-   case 2:
-   switch (crev) {
-   case 1:
-   /* SI */
-   args.ulClock = cpu_to_le32(clock);  /* 10 khz */
-   args.ucInputFlag = 0;
-   if (strobe_mode)
-   args.ucInputFlag |= 
MPLL_INPUT_FLAG_STROBE_MODE_EN;
-
-   amdgpu_atom_execute_table(adev->mode_info.atom_context, 
index, (uint32_t *)&args);
-
-   mpll_param->clkfrac = 
le16_to_cpu(args.ulFbDiv.usFbDivFrac);
-   mpll_param->clkf = le16_to_cpu(args.ulFbDiv.usFbDiv);
-   mpll_param->post_div = args.ucPostDiv;
-   mpll_param->dll_speed = args.ucDllSpeed;
-   mpll_param->bwcntl = args.ucBWCntl;
-   mpll_param->vco_mode =
-   (args.ucPllCntlFlag & 
MPLL_CNTL_FLAG_VCO_MODE_MASK);
-   mpll_param->yclk_sel =
-   (args.ucPllCntlFlag & 
MPLL_CNTL_FLAG_BYPASS_DQ_PLL) ? 1 : 0;
-   mpll_param->qdr =
-   (args.ucPllCntlFlag & 
MPLL_CNTL_FLAG_QDR_ENABLE) ? 1 : 0;
-   mpll_param->half_rate =
-   (args.ucPllCntlFlag & 
MPLL_CNTL_FLAG_AD_HALF_RATE) ? 1 : 0;
-   break;
-   default:
-   return -EINVAL;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
-   return 0;
-}
-
-void amdgpu_atombios_set_engine_dram_timings(struct amdgpu_device *adev,
-u32 eng_clock, u32 mem_clock)
-{
-   SET_ENGINE_CLOCK_PS_ALLOCATION args;
-   int index = GetIndexIntoMasterTable(COMMAND, DynamicMemorySettings);
-   u32 tmp;
-
-   memset(&args, 0, sizeof(args));
-
-   tmp = eng_clock & SET_CLOCK_FREQ_MASK;
-   tmp |= (COMPUTE_ENGINE_PLL_PARAM << 24);
-
-   args.ulTargetEngineClock = cpu_to_le32(tmp);
-   if (mem_clock)
-   args.sReserved.ulClock = cpu_to_le32(mem_clock & 
SET_CLOCK_FREQ_MASK);
-
-   amdgpu_atom_execute_table(adev->mode_info.atom_context, index, (uint32_t 
*)&args);
-}
-
-void amdgpu_atombios_get_default_voltages(struct amdgpu_device *adev,
- u16 *vddc, u16 *vddci, u16 *mvdd)
-{
-   struct amdgpu_mode_info *mode_info = &adev->mode_info;
-   int index = GetIndexIntoMasterTable(DATA, FirmwareInfo);
-   u8 frev, crev;
-   u16 data_offset;
-   union firmware_info *firmware_info;
-
-   *vddc = 0;
-   *vddci = 0;
-   *mvdd = 0;
-
-   if (amdgpu_atom_parse_data_header(mode_info->atom_context, index, NULL,
-  

RE: [PATCH V2 02/17] drm/amd/pm: do not expose power implementation details to amdgpu_pm.c

2021-11-30 Thread Chen, Guchun
[Public]

Two nit-picks.

1. It's better to drop "return" in amdgpu_dpm_get_current_power_state.

2. In some functions, when the function pointer is NULL, the wrapper returns 0, 
while in other cases it returns -EOPNOTSUPP. Is there a reason for this inconsistency?

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Tuesday, November 30, 2021 3:43 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Lazar, Lijo 
; Feng, Kenneth ; Koenig, Christian 
; Quan, Evan 
Subject: [PATCH V2 02/17] drm/amd/pm: do not expose power implementation 
details to amdgpu_pm.c

amdgpu_pm.c holds all the user sysfs/hwmon interfaces. It's another
client of our power APIs. It's not proper to spike into power
implementation details there.

Signed-off-by: Evan Quan 
Change-Id: I397853ddb13eacfce841366de2a623535422df9a
---
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 458 ++-
 drivers/gpu/drm/amd/pm/amdgpu_pm.c| 519 --
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 160 +++
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |   3 -
 4 files changed, 709 insertions(+), 431 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 9b332c8a0079..3c59f16c7a6f 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -1453,7 +1453,9 @@ static void amdgpu_dpm_change_power_state_locked(struct 
amdgpu_device *adev)
if (equal)
return;
 
-   amdgpu_dpm_set_power_state(adev);
+   if (adev->powerplay.pp_funcs->set_power_state)
+   
adev->powerplay.pp_funcs->set_power_state(adev->powerplay.pp_handle);
+
amdgpu_dpm_post_set_power_state(adev);
 
adev->pm.dpm.current_active_crtcs = adev->pm.dpm.new_active_crtcs;
@@ -1709,3 +1711,457 @@ int amdgpu_dpm_get_ecc_info(struct amdgpu_device *adev,
 
return smu_get_ecc_info(&adev->smu, umc_ecc);
 }
+
+struct amd_vce_state *amdgpu_dpm_get_vce_clock_state(struct amdgpu_device 
*adev,
+uint32_t idx)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+   if (!pp_funcs->get_vce_clock_state)
+   return NULL;
+
+   return pp_funcs->get_vce_clock_state(adev->powerplay.pp_handle,
+idx);
+}
+
+void amdgpu_dpm_get_current_power_state(struct amdgpu_device *adev,
+   enum amd_pm_state_type *state)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+   if (!pp_funcs->get_current_power_state) {
+   *state = adev->pm.dpm.user_state;
+   return;
+   }
+
+   *state = pp_funcs->get_current_power_state(adev->powerplay.pp_handle);
+   if (*state < POWER_STATE_TYPE_DEFAULT ||
+   *state > POWER_STATE_TYPE_INTERNAL_3DPERF)
+   *state = adev->pm.dpm.user_state;
+
+   return;
+}
+
+void amdgpu_dpm_set_power_state(struct amdgpu_device *adev,
+   enum amd_pm_state_type state)
+{
+   adev->pm.dpm.user_state = state;
+
+   if (adev->powerplay.pp_funcs->dispatch_tasks)
+   amdgpu_dpm_dispatch_task(adev, AMD_PP_TASK_ENABLE_USER_STATE, 
&state);
+   else
+   amdgpu_pm_compute_clocks(adev);
+}
+
+enum amd_dpm_forced_level amdgpu_dpm_get_performance_level(struct 
amdgpu_device *adev)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+   enum amd_dpm_forced_level level;
+
+   if (pp_funcs->get_performance_level)
+   level = 
pp_funcs->get_performance_level(adev->powerplay.pp_handle);
+   else
+   level = adev->pm.dpm.forced_level;
+
+   return level;
+}
+
+int amdgpu_dpm_force_performance_level(struct amdgpu_device *adev,
+  enum amd_dpm_forced_level level)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+   if (pp_funcs->force_performance_level) {
+   if (adev->pm.dpm.thermal_active)
+   return -EINVAL;
+
+   if (pp_funcs->force_performance_level(adev->powerplay.pp_handle,
+ level))
+   return -EINVAL;
+   }
+
+   adev->pm.dpm.forced_level = level;
+
+   return 0;
+}
+
+int amdgpu_dpm_get_pp_num_states(struct amdgpu_device *adev,
+struct pp_states_info *states)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+   if (!pp_funcs->get_pp_num_states)
+   return -EOPNOTSUPP;
+
+   return pp_funcs->get_pp_num_states(adev->powerplay.pp_handle, states);
+}
+
+int amdgpu_dpm_dispatch_task(struct amdgpu_device *adev,
+ enum amd_pp_task task_id,
+ enum amd_pm_state_type *user_state)
+{
+   const struct amd_pm_funcs 

Re: [PATCH V2 06/17] drm/amd/pm: do not expose the API used internally only in kv_dpm.c

2021-11-30 Thread Lazar, Lijo




On 11/30/2021 1:12 PM, Evan Quan wrote:

Move it to kv_dpm.c instead.

Signed-off-by: Evan Quan 
Change-Id: I554332b386491a79b7913f72786f1e2cb1f8165b
--
v1->v2:
   - rename the API with "kv_" prefix(Alex)
---
  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 23 -
  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  2 --
  drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c | 25 ++-
  3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index fbfc07a83122..ecaf0081bc31 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -209,29 +209,6 @@ static u32 amdgpu_dpm_get_vrefresh(struct amdgpu_device 
*adev)
return vrefresh;
  }
  
-bool amdgpu_is_internal_thermal_sensor(enum amdgpu_int_thermal_type sensor)

-{
-   switch (sensor) {
-   case THERMAL_TYPE_RV6XX:
-   case THERMAL_TYPE_RV770:
-   case THERMAL_TYPE_EVERGREEN:
-   case THERMAL_TYPE_SUMO:
-   case THERMAL_TYPE_NI:
-   case THERMAL_TYPE_SI:
-   case THERMAL_TYPE_CI:
-   case THERMAL_TYPE_KV:
-   return true;
-   case THERMAL_TYPE_ADT7473_WITH_INTERNAL:
-   case THERMAL_TYPE_EMC2103_WITH_INTERNAL:
-   return false; /* need special handling */
-   case THERMAL_TYPE_NONE:
-   case THERMAL_TYPE_EXTERNAL:
-   case THERMAL_TYPE_EXTERNAL_GPIO:
-   default:
-   return false;
-   }
-}
-
  union power_info {
struct _ATOM_POWERPLAY_INFO info;
struct _ATOM_POWERPLAY_INFO_V2 info_2;
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
index f43b96dfe9d8..01120b302590 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
@@ -374,8 +374,6 @@ u32 amdgpu_dpm_get_vblank_time(struct amdgpu_device *adev);
  int amdgpu_dpm_read_sensor(struct amdgpu_device *adev, enum amd_pp_sensors 
sensor,
   void *data, uint32_t *size);
  
-bool amdgpu_is_internal_thermal_sensor(enum amdgpu_int_thermal_type sensor);

-
  int amdgpu_get_platform_caps(struct amdgpu_device *adev);
  
  int amdgpu_parse_extended_power_table(struct amdgpu_device *adev);

diff --git a/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c 
b/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c
index bcae42cef374..380a5336c74f 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c
@@ -1256,6 +1256,29 @@ static void kv_dpm_enable_bapm(void *handle, bool enable)
}
  }
  
+static bool kv_is_internal_thermal_sensor(enum amdgpu_int_thermal_type sensor)

+{
+   switch (sensor) {
+   case THERMAL_TYPE_RV6XX:
+   case THERMAL_TYPE_RV770:
+   case THERMAL_TYPE_EVERGREEN:
+   case THERMAL_TYPE_SUMO:
+   case THERMAL_TYPE_NI:
+   case THERMAL_TYPE_SI:
+   case THERMAL_TYPE_CI:
+   case THERMAL_TYPE_KV:
+   return true;
+   case THERMAL_TYPE_ADT7473_WITH_INTERNAL:
+   case THERMAL_TYPE_EMC2103_WITH_INTERNAL:
+   return false; /* need special handling */
+   case THERMAL_TYPE_NONE:
+   case THERMAL_TYPE_EXTERNAL:
+   case THERMAL_TYPE_EXTERNAL_GPIO:
+   default:
+   return false;
+   }
+}


Not all of these names look KV specific. Remove the family-specific 
ones like RV, SI, NI, CI, etc., and keep KV and the generic ones like 
GPIO/EXTERNAL/NONE. I don't see a chance of external diodes being used for KV.


Thanks,
Lijo


+
  static int kv_dpm_enable(struct amdgpu_device *adev)
  {
struct kv_power_info *pi = kv_get_pi(adev);
@@ -1352,7 +1375,7 @@ static int kv_dpm_enable(struct amdgpu_device *adev)
}
  
  	if (adev->irq.installed &&

-   amdgpu_is_internal_thermal_sensor(adev->pm.int_thermal_type)) {
+   kv_is_internal_thermal_sensor(adev->pm.int_thermal_type)) {
ret = kv_set_thermal_temperature_range(adev, KV_TEMP_RANGE_MIN, 
KV_TEMP_RANGE_MAX);
if (ret) {
DRM_ERROR("kv_set_thermal_temperature_range failed\n");



Re: [PATCH V2 05/17] drm/amd/pm: do not expose those APIs used internally only in si_dpm.c

2021-11-30 Thread Lazar, Lijo




On 11/30/2021 1:12 PM, Evan Quan wrote:

Move them to si_dpm.c instead.

Signed-off-by: Evan Quan 
Change-Id: I288205cfd7c6ba09cfb22626ff70360d61ff0c67
--
v1->v2:
   - rename the API with "si_" prefix(Alex)
---
  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 25 ---
  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 25 ---
  drivers/gpu/drm/amd/pm/powerplay/si_dpm.c | 54 +++
  drivers/gpu/drm/amd/pm/powerplay/si_dpm.h |  7 +++
  4 files changed, 53 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 52ac3c883a6e..fbfc07a83122 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -894,31 +894,6 @@ void amdgpu_add_thermal_controller(struct amdgpu_device 
*adev)
}
  }
  
-enum amdgpu_pcie_gen amdgpu_get_pcie_gen_support(struct amdgpu_device *adev,

-u32 sys_mask,
-enum amdgpu_pcie_gen asic_gen,
-enum amdgpu_pcie_gen 
default_gen)
-{
-   switch (asic_gen) {
-   case AMDGPU_PCIE_GEN1:
-   return AMDGPU_PCIE_GEN1;
-   case AMDGPU_PCIE_GEN2:
-   return AMDGPU_PCIE_GEN2;
-   case AMDGPU_PCIE_GEN3:
-   return AMDGPU_PCIE_GEN3;
-   default:
-   if ((sys_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3) &&
-   (default_gen == AMDGPU_PCIE_GEN3))
-   return AMDGPU_PCIE_GEN3;
-   else if ((sys_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2) &&
-(default_gen == AMDGPU_PCIE_GEN2))
-   return AMDGPU_PCIE_GEN2;
-   else
-   return AMDGPU_PCIE_GEN1;
-   }
-   return AMDGPU_PCIE_GEN1;
-}
-
  struct amd_vce_state*
  amdgpu_get_vce_clock_state(void *handle, u32 idx)
  {
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
index 6681b878e75f..f43b96dfe9d8 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
@@ -45,19 +45,6 @@ enum amdgpu_int_thermal_type {
THERMAL_TYPE_KV,
  };
  
-enum amdgpu_dpm_auto_throttle_src {

-   AMDGPU_DPM_AUTO_THROTTLE_SRC_THERMAL,
-   AMDGPU_DPM_AUTO_THROTTLE_SRC_EXTERNAL
-};
-
-enum amdgpu_dpm_event_src {
-   AMDGPU_DPM_EVENT_SRC_ANALOG = 0,
-   AMDGPU_DPM_EVENT_SRC_EXTERNAL = 1,
-   AMDGPU_DPM_EVENT_SRC_DIGITAL = 2,
-   AMDGPU_DPM_EVENT_SRC_ANALOG_OR_EXTERNAL = 3,
-   AMDGPU_DPM_EVENT_SRC_DIGIAL_OR_EXTERNAL = 4
-};
-
  struct amdgpu_ps {
u32 caps; /* vbios flags */
u32 class; /* vbios flags */
@@ -252,13 +239,6 @@ struct amdgpu_dpm_fan {
bool ucode_fan_control;
  };
  
-enum amdgpu_pcie_gen {

-   AMDGPU_PCIE_GEN1 = 0,
-   AMDGPU_PCIE_GEN2 = 1,
-   AMDGPU_PCIE_GEN3 = 2,
-   AMDGPU_PCIE_GEN_INVALID = 0xffff
-};
-
  #define amdgpu_dpm_reset_power_profile_state(adev, request) \
((adev)->powerplay.pp_funcs->reset_power_profile_state(\
(adev)->powerplay.pp_handle, request))
@@ -403,11 +383,6 @@ void amdgpu_free_extended_power_table(struct amdgpu_device 
*adev);
  
  void amdgpu_add_thermal_controller(struct amdgpu_device *adev);
  
-enum amdgpu_pcie_gen amdgpu_get_pcie_gen_support(struct amdgpu_device *adev,

-u32 sys_mask,
-enum amdgpu_pcie_gen asic_gen,
-enum amdgpu_pcie_gen 
default_gen);
-
  struct amd_vce_state*
  amdgpu_get_vce_clock_state(void *handle, u32 idx);
  
diff --git a/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c b/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c

index 81f82aa05ec2..4f84d8b893f1 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c
@@ -96,6 +96,19 @@ union pplib_clock_info {
struct _ATOM_PPLIB_SI_CLOCK_INFO si;
  };
  
+enum amdgpu_dpm_auto_throttle_src {

+   AMDGPU_DPM_AUTO_THROTTLE_SRC_THERMAL,
+   AMDGPU_DPM_AUTO_THROTTLE_SRC_EXTERNAL
+};
+
+enum amdgpu_dpm_event_src {
+   AMDGPU_DPM_EVENT_SRC_ANALOG = 0,
+   AMDGPU_DPM_EVENT_SRC_EXTERNAL = 1,
+   AMDGPU_DPM_EVENT_SRC_DIGITAL = 2,
+   AMDGPU_DPM_EVENT_SRC_ANALOG_OR_EXTERNAL = 3,
+   AMDGPU_DPM_EVENT_SRC_DIGIAL_OR_EXTERNAL = 4
+};
+


Better to also rename the enums, including amdgpu_pcie_gen, if they are 
used only within si_dpm.


Thanks,
Lijo


  static const u32 r600_utc[R600_PM_NUMBER_OF_TC] =
  {
R600_UTC_DFLT_00,
@@ -4927,6 +4940,31 @@ static int si_populate_smc_initial_state(struct 
amdgpu_device *adev,
return 0;
  }
  
+static enum amdgpu_pcie_gen si_gen_pcie_gen_support(struct amdgpu_device *adev,

+   u32 sys_mask,
+   

Re: [PATCH] drm/amdgpu: add SMU debug option support

2021-11-30 Thread Christian König

On 30.11.21 at 06:17, Lang Yu wrote:

To preserve system error state when SMU errors occur,
which will aid in debugging SMU firmware issues,
add SMU debug option support.

It can be enabled or disabled via the amdgpu_smu_debug
debugfs file. When enabled, it makes SMU errors fatal.
It is disabled by default.

== Command Guide ==

1, enable SMU debug option

  # echo 1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

2, disable SMU debug option

  # echo 0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

v2:
  - Resend command when timeout.(Lijo)
  - Use debugfs file instead of module parameter.

Signed-off-by: Lang Yu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 32 +
  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 39 +++--
  2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 164d6a9e9fbb..f9412de86599 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -39,6 +39,8 @@
  
  #if defined(CONFIG_DEBUG_FS)
  
+extern int amdgpu_smu_debug;

+
  /**
   * amdgpu_debugfs_process_reg_op - Handle MMIO register reads/writes
   *
@@ -1152,6 +1154,8 @@ static ssize_t amdgpu_debugfs_gfxoff_read(struct file *f, 
char __user *buf,
return result;
  }
  
+

+


Unrelated change.


  static const struct file_operations amdgpu_debugfs_regs2_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = amdgpu_debugfs_regs2_ioctl,
@@ -1609,6 +1613,26 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
  DEFINE_DEBUGFS_ATTRIBUTE(fops_sclk_set, NULL,
amdgpu_debugfs_sclk_set, "%llu\n");
  
+static int amdgpu_debugfs_smu_debug_get(void *data, u64 *val)

+{
+   *val = amdgpu_smu_debug;
+   return 0;
+}
+
+static int amdgpu_debugfs_smu_debug_set(void *data, u64 val)
+{
+   if (val != 0 && val != 1)
+   return -EINVAL;
+
+   amdgpu_smu_debug = val;
+   return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(fops_smu_debug,
+amdgpu_debugfs_smu_debug_get,
+amdgpu_debugfs_smu_debug_set,
+"%llu\n");
+


That can be done much more simply. Take a look at the debugfs_create_bool() 
function.



  int amdgpu_debugfs_init(struct amdgpu_device *adev)
  {
struct dentry *root = adev_to_drm(adev)->primary->debugfs_root;
@@ -1632,6 +1656,14 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
return PTR_ERR(ent);
}
  
+	ent = debugfs_create_file("amdgpu_smu_debug", 0600, root, adev,

+ &fops_smu_debug);
+   if (IS_ERR(ent)) {
+   DRM_ERROR("unable to create amdgpu_smu_debug debugfs file\n");
+   return PTR_ERR(ent);
+   }
+
+
/* Register debugfs entries for amdgpu_ttm */
amdgpu_ttm_debugfs_init(adev);
amdgpu_debugfs_pm_init(adev);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 048ca1673863..b3969d7933d3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -55,6 +55,14 @@
  
  #undef __SMU_DUMMY_MAP

  #define __SMU_DUMMY_MAP(type) #type
+
+/*
+ * Used to enable SMU debug option. When enabled, it makes SMU errors fatal.
+ * This will aid in debugging SMU firmware issues.
+ * (0 = disabled (default), 1 = enabled)
+ */
+int amdgpu_smu_debug;


Probably better to put that into amdgpu_device or similar structure.

Regards,
Christian.


+
  static const char * const __smu_message_names[] = {
SMU_MESSAGE_TYPES
  };
@@ -272,6 +280,11 @@ int smu_cmn_send_msg_without_waiting(struct smu_context 
*smu,
__smu_cmn_send_msg(smu, msg_index, param);
res = 0;
  Out:
+   if (unlikely(amdgpu_smu_debug == 1) && res) {
+   mutex_unlock(&smu->message_lock);
+   BUG();
+   }
+
return res;
  }
  
@@ -288,9 +301,17 @@ int smu_cmn_send_msg_without_waiting(struct smu_context *smu,

  int smu_cmn_wait_for_response(struct smu_context *smu)
  {
u32 reg;
+   int res;
  
  	reg = __smu_cmn_poll_stat(smu);

-   return __smu_cmn_reg2errno(smu, reg);
+   res = __smu_cmn_reg2errno(smu, reg);
+
+   if (unlikely(amdgpu_smu_debug == 1) && res) {
+   mutex_unlock(&smu->message_lock);
+   BUG();
+   }
+
+   return res;
  }
  
  /**

@@ -328,6 +349,7 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
uint32_t param,
uint32_t *read_arg)
  {
+   int retry_count = 0;
int res, index;
u32 reg;
  
@@ -349,15 +371,28 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,

__smu_cmn_reg_print_error(smu, reg, index, param, msg);
goto Out;
}
+retry:
__smu_cmn_send_msg(smu, (uint16_t) index, param);
reg 

Re: [PATCH 6/6] Documentation/gpu: Add DC glossary

2021-11-30 Thread Christian König




On 29.11.21 at 21:48, ydir...@free.fr wrote:

Hi Rodrigo,

That will really be helpful!

I know drawing the line is a difficult problem (and can even make things
harder when searching), but maybe it would make sense to keep generic
acronyms not specific to amdgpu in a separate list.  I bet a number of
them would be useful in the scope of other drm drivers (e.g. CRTC, DCC,
MST), and some are not restricted to the drm subsystem at all (e.g. FEC,
LUT), but are still worth listing since they are not necessarily easy to look up.

Maybe "DC glossary" should just be "Glossary", since quite a few entries
also help when reading amd/amdgpu/.  Which brings me to the results of my
recent searches, as suggested entries:

  KIQ (Kernel Interface Queue), MQD (memory queue descriptor), HQD (hardware
  queue descriptor), EOP (still no clue :)


End Of Pipe/Pipeline :)

This means that calculations are done, all caches are flushed and all 
memory is coherent again.


It's usually tied to an interrupt being sent or a fence value being written.

Christian.



Maybe some more specific ones should just be spelled out where they
are used?  KCQ (compute queue?), KGQ (gfx queue?)

More suggestions inlined.

Best regards,




Re: [PATCH V2 00/17] Unified entry point for other blocks to interact with power

2021-11-30 Thread Christian König

On 30.11.21 at 08:42, Evan Quan wrote:

There are several problems with current power implementations:
1. Too many internal details are exposed to other blocks. Thus to interact with
power, they need to know which power framework is used (powerplay vs swsmu)
or even whether some API is implemented.
2. A lot of cross calls exist, which makes it hard to get a whole picture of
the code hierarchy. That makes any code change/increment error-prone.
3. Many different types of lock are used. In total, 13 different locks
are used within the power code. Some of them are even designed for
the same purpose.

To ease the problems above, this patch series tries to
1. provide unified entry point for other blocks to interact with power.
2. relocate some source code piece/headers to avoid cross callings.
3. enforce a unified lock protection on those entry point APIs above.
That makes the future optimization for unnecessary power locks possible.


I only skimmed over it, but it looks really good at first glance.

But you need to have Alex take a look as well since I only have a very 
high level understanding of power management.


Regards,
Christian.



Evan Quan (17):
   drm/amd/pm: do not expose implementation details to other blocks out
 of power
   drm/amd/pm: do not expose power implementation details to amdgpu_pm.c
   drm/amd/pm: do not expose power implementation details to display
   drm/amd/pm: do not expose those APIs used internally only in
 amdgpu_dpm.c
   drm/amd/pm: do not expose those APIs used internally only in si_dpm.c
   drm/amd/pm: do not expose the API used internally only in kv_dpm.c
   drm/amd/pm: create a new holder for those APIs used only by legacy
 ASICs(si/kv)
   drm/amd/pm: move pp_force_state_enabled member to amdgpu_pm structure
   drm/amd/pm: optimize the amdgpu_pm_compute_clocks() implementations
   drm/amd/pm: move those code piece used by Stoney only to smu8_hwmgr.c
   drm/amd/pm: correct the usage for amdgpu_dpm_dispatch_task()
   drm/amd/pm: drop redundant or unused APIs and data structures
   drm/amd/pm: do not expose the smu_context structure used internally in
 power
   drm/amd/pm: relocate the power related headers
   drm/amd/pm: drop unnecessary gfxoff controls
   drm/amd/pm: revise the performance level setting APIs
   drm/amd/pm: unified lock protections in amdgpu_dpm.c

  drivers/gpu/drm/amd/amdgpu/aldebaran.c|2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |7 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  |  421 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.h  |   30 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |   25 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|6 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   |   18 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |7 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |5 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   |5 +-
  drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   |2 +-
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |6 +-
  .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  |  246 +-
  .../gpu/drm/amd/include/kgd_pp_interface.h|   14 +
  drivers/gpu/drm/amd/pm/Makefile   |   12 +-
  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 2435 -
  drivers/gpu/drm/amd/pm/amdgpu_dpm_internal.c  |   94 +
  drivers/gpu/drm/amd/pm/amdgpu_pm.c|  568 ++--
  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  339 +--
  .../gpu/drm/amd/pm/inc/amdgpu_dpm_internal.h  |   32 +
  drivers/gpu/drm/amd/pm/legacy-dpm/Makefile|   32 +
  .../pm/{powerplay => legacy-dpm}/cik_dpm.h|0
  .../amd/pm/{powerplay => legacy-dpm}/kv_dpm.c |   47 +-
  .../amd/pm/{powerplay => legacy-dpm}/kv_dpm.h |0
  .../amd/pm/{powerplay => legacy-dpm}/kv_smc.c |0
  .../gpu/drm/amd/pm/legacy-dpm/legacy_dpm.c| 1510 ++
  .../gpu/drm/amd/pm/legacy-dpm/legacy_dpm.h|   71 +
  .../amd/pm/{powerplay => legacy-dpm}/ppsmc.h  |0
  .../pm/{powerplay => legacy-dpm}/r600_dpm.h   |0
  .../amd/pm/{powerplay => legacy-dpm}/si_dpm.c |  111 +-
  .../amd/pm/{powerplay => legacy-dpm}/si_dpm.h |7 +
  .../amd/pm/{powerplay => legacy-dpm}/si_smc.c |0
  .../{powerplay => legacy-dpm}/sislands_smc.h  |0
  drivers/gpu/drm/amd/pm/powerplay/Makefile |4 -
  .../gpu/drm/amd/pm/powerplay/amd_powerplay.c  |   51 +-
  .../drm/amd/pm/powerplay/hwmgr/smu8_hwmgr.c   |   10 +-
  .../pm/{ => powerplay}/inc/amd_powerplay.h|0
  .../drm/amd/pm/{ => powerplay}/inc/cz_ppsmc.h |0
  .../amd/pm/{ => powerplay}/inc/fiji_ppsmc.h   |0
  .../pm/{ => powerplay}/inc/hardwaremanager.h  |0
  .../drm/amd/pm/{ => powerplay}/inc/hwmgr.h|3 -
  .../{ => powerplay}/inc/polaris10_pwrvirus.h  |0
  .../amd/pm/{ => powerplay}/inc/power_state.h  |0
  .../drm/amd/pm/{ => powerplay}/inc/pp_debug.h |0
  .../amd/pm/{ => powerplay}/inc/pp_endian.h|0
  .../amd/pm/{ => 

Re: [PATCH V2 01/17] drm/amd/pm: do not expose implementation details to other blocks out of power

2021-11-30 Thread Lazar, Lijo




On 11/30/2021 1:12 PM, Evan Quan wrote:

Those implementation details (whether swsmu is supported, which ppt_funcs are
implemented, access to internal statistics, ...) should be kept internal. It's
not good practice, and even error prone, to expose implementation details.

Signed-off-by: Evan Quan 
Change-Id: Ibca3462ceaa26a27a9145282b60c6ce5deca7752
---
  drivers/gpu/drm/amd/amdgpu/aldebaran.c|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 25 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  6 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   | 18 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |  7 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |  5 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   |  5 +-
  drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   |  2 +-
  .../gpu/drm/amd/include/kgd_pp_interface.h|  4 +
  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 95 +++
  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 25 -
  drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   |  9 +-
  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 16 ++--
  13 files changed, 155 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
index bcfdb63b1d42..a545df4efce1 100644
--- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
@@ -260,7 +260,7 @@ static int aldebaran_mode2_restore_ip(struct amdgpu_device *adev)
 		adev->gfx.rlc.funcs->resume(adev);
 
 	/* Wait for FW reset event complete */
-	r = smu_wait_for_event(&adev->smu, SMU_EVENT_RESET_COMPLETE, 0);
+	r = amdgpu_dpm_wait_for_event(adev, SMU_EVENT_RESET_COMPLETE, 0);
 	if (r) {
 		dev_err(adev->dev,
 			"Failed to get response from firmware after reset\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 164d6a9e9fbb..0d1f00b24aae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1585,22 +1585,25 @@ static int amdgpu_debugfs_sclk_set(void *data, u64 val)
 		return ret;
 	}
 
-	if (is_support_sw_smu(adev)) {
-		ret = smu_get_dpm_freq_range(&adev->smu, SMU_SCLK, &min_freq, &max_freq);
-		if (ret || val > max_freq || val < min_freq)
-			return -EINVAL;
-		ret = smu_set_soft_freq_range(&adev->smu, SMU_SCLK, (uint32_t)val, (uint32_t)val);
-	} else {
-		return 0;
+	ret = amdgpu_dpm_get_dpm_freq_range(adev, PP_SCLK, &min_freq, &max_freq);
+	if (ret == -EOPNOTSUPP) {
+		ret = 0;
+		goto out;
 	}
+	if (ret || val > max_freq || val < min_freq) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = amdgpu_dpm_set_soft_freq_range(adev, PP_SCLK, (uint32_t)val, (uint32_t)val);
+	if (ret)
+		ret = -EINVAL;
 
+out:
 	pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
 	pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
 
-	if (ret)
-		return -EINVAL;
-
-	return 0;
+	return ret;
 }
  
  DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1989f9e9379e..41cc1ffb5809 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2617,7 +2617,7 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
if (adev->asic_type == CHIP_ARCTURUS &&
amdgpu_passthrough(adev) &&
adev->gmc.xgmi.num_physical_nodes > 1)
-   smu_set_light_sbr(>smu, true);
+   amdgpu_dpm_set_light_sbr(adev, true);
  
  	if (adev->gmc.xgmi.num_physical_nodes > 1) {

		mutex_lock(&mgpu_info.mutex);
@@ -2857,7 +2857,7 @@ static int amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)
int i, r;
  
  	if (adev->in_s0ix)

-   amdgpu_gfx_state_change_set(adev, sGpuChangeState_D3Entry);
+   amdgpu_dpm_gfx_state_change(adev, sGpuChangeState_D3Entry);
  
  	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {

if (!adev->ip_blocks[i].status.valid)
@@ -3982,7 +3982,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
return 0;
  
  	if (adev->in_s0ix)

-   amdgpu_gfx_state_change_set(adev, sGpuChangeState_D0Entry);
+   amdgpu_dpm_gfx_state_change(adev, sGpuChangeState_D0Entry);
  
  	/* post card */

if (amdgpu_device_need_post(adev)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 1916ec84dd71..3d8f82dc8c97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -615,7 +615,7 @@ int amdgpu_get_gfx_off_status(struct amdgpu_device *adev, uint32_t *value)