RE: [PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on resume

2021-03-25 Thread Liu, Zhan
[AMD Official Use Only - Internal Distribution Only]

> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: 2021/March/26, Friday 12:38 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on
> resume
>
> The SMU comes back up with DPM enabled by the sbios, but the driver still
> has to set up the SMU/driver mailbox, etc.
>
> Signed-off-by: Alex Deucher 

Reviewed-by: Zhan Liu 

> ---
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index d4b804c7b986..462917d4d5e2 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -1102,7 +1102,7 @@ static int smu_smc_hw_setup(struct smu_context
> *smu)
>  uint32_t pcie_gen = 0, pcie_width = 0;
>  int ret = 0;
>
> -if (adev->in_suspend && smu_is_dpm_running(smu)) {
> +if (!smu->is_apu && adev->in_suspend && smu_is_dpm_running(smu)) {
>  dev_info(adev->dev, "dpm has been enabled\n");
>  /* this is needed specifically */
>  if ((adev->asic_type >= CHIP_SIENNA_CICHLID) &&
> --
> 2.30.2
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on resume

2021-03-25 Thread Wang, Kevin(Yang)
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Kevin Wang 

Best Regards,
Kevin

From: amd-gfx  on behalf of Alex Deucher 

Sent: Friday, March 26, 2021 12:37 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander 
Subject: [PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on resume

The SMU comes back up with DPM enabled by the sbios, but the
driver still has to set up the SMU/driver mailbox, etc.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index d4b804c7b986..462917d4d5e2 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1102,7 +1102,7 @@ static int smu_smc_hw_setup(struct smu_context *smu)
 uint32_t pcie_gen = 0, pcie_width = 0;
 int ret = 0;

-   if (adev->in_suspend && smu_is_dpm_running(smu)) {
+   if (!smu->is_apu && adev->in_suspend && smu_is_dpm_running(smu)) {
 dev_info(adev->dev, "dpm has been enabled\n");
 /* this is needed specifically */
 if ((adev->asic_type >= CHIP_SIENNA_CICHLID) &&
--
2.30.2



[PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on resume

2021-03-25 Thread Alex Deucher
The SMU comes back up with DPM enabled by the sbios, but the
driver still has to set up the SMU/driver mailbox, etc.
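To make the condition's intent concrete, here is a standalone sketch of the resume fast-path check after this patch; the names mirror the one-line diff below, but this is illustrative logic only, not the real driver context:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of smu_smc_hw_setup()'s early-exit condition after this patch.
 * APUs must never take the fast path: even though the sbios re-enables
 * DPM on resume, the SMU/driver mailbox still has to be set up. */
static bool take_resume_fast_path(bool is_apu, bool in_suspend,
                                  bool dpm_running)
{
    return !is_apu && in_suspend && dpm_running;
}
```

A dGPU resuming with DPM already running may still bail early; an APU never does.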

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index d4b804c7b986..462917d4d5e2 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1102,7 +1102,7 @@ static int smu_smc_hw_setup(struct smu_context *smu)
uint32_t pcie_gen = 0, pcie_width = 0;
int ret = 0;
 
-   if (adev->in_suspend && smu_is_dpm_running(smu)) {
+   if (!smu->is_apu && adev->in_suspend && smu_is_dpm_running(smu)) {
dev_info(adev->dev, "dpm has been enabled\n");
/* this is needed specifically */
if ((adev->asic_type >= CHIP_SIENNA_CICHLID) &&
-- 
2.30.2



[PATCH] drm: Update MST First Link Slot Information Based on Encoding Format

2021-03-25 Thread Fangzhi Zuo
8b/10b encoding format requires reserving the first slot for
recording metadata. Real data transmission starts from the second slot,
with a total of 63 slots available.

In 128b/132b encoding format, metadata is transmitted separately
in LLCP packet before MTP. Real data transmission starts from
the first slot, with a total of 64 slots available.

Update the slot information after link detect.
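As a quick sanity check of the numbers above, the per-encoding slot bookkeeping can be sketched as follows (the DP_CAP_* values are assumed here for illustration; in the kernel they come from the DP helpers):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative encoding-cap values, assumed for this sketch. */
#define DP_CAP_ANSI_8B10B    (1 << 0)
#define DP_CAP_ANSI_128B132B (1 << 1)

struct first_link_slots {
    int total_avail_slots;  /* usable MTP time slots */
    int start_slot;         /* first slot carrying payload data */
};

/* Mirrors the update done after link detect in the patch below:
 * 8b/10b reserves slot 0 for the MTP header (63 usable, start at 1);
 * 128b/132b carries metadata separately (64 usable, start at 0). */
static struct first_link_slots slots_for_encoding(uint8_t encoding_format)
{
    struct first_link_slots s = { 63, 1 };  /* 8b/10b defaults */

    if (encoding_format == DP_CAP_ANSI_128B132B) {
        s.total_avail_slots = 64;
        s.start_slot = 0;
    }
    return s;
}
```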

Signed-off-by: Fangzhi Zuo 
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 48 ++-
 include/drm/drm_dp_mst_helper.h   |  8 +
 2 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
b/drivers/gpu/drm/drm_dp_mst_topology.c
index 42a0c6888c33..577ed4224778 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -3382,7 +3382,7 @@ int drm_dp_update_payload_part1(struct 
drm_dp_mst_topology_mgr *mgr)
struct drm_dp_payload req_payload;
struct drm_dp_mst_port *port;
int i, j;
-   int cur_slots = 1;
+   int cur_slots = mgr->first_link_start_slot;
 
	mutex_lock(&mgr->payload_lock);
for (i = 0; i < mgr->max_payloads; i++) {
@@ -4302,8 +4302,13 @@ int drm_dp_find_vcpi_slots(struct 
drm_dp_mst_topology_mgr *mgr,
 
num_slots = DIV_ROUND_UP(pbn, mgr->pbn_div);
 
-   /* max. time slots - one slot for MTP header */
-   if (num_slots > 63)
+   /**
+* first_link_total_avail_slots: max. time slots
+* first slot reserved for MTP header in 8b/10b,
+* but not required for 128b/132b
+*/
+
+   if (num_slots > mgr->first_link_total_avail_slots)
return -ENOSPC;
return num_slots;
 }
@@ -4314,8 +4319,12 @@ static int drm_dp_init_vcpi(struct 
drm_dp_mst_topology_mgr *mgr,
 {
int ret;
 
-   /* max. time slots - one slot for MTP header */
-   if (slots > 63)
+   /**
+* first_link_total_avail_slots: max. time slots
+* first slot reserved for MTP header in 8b/10b,
+* but not required for 128b/132b
+*/
+   if (slots > mgr->first_link_total_avail_slots)
return -ENOSPC;
 
vcpi->pbn = pbn;
@@ -4488,6 +4497,25 @@ int drm_dp_atomic_release_vcpi_slots(struct 
drm_atomic_state *state,
 }
 EXPORT_SYMBOL(drm_dp_atomic_release_vcpi_slots);
 
+/*
+ * drm_dp_mst_update_first_link_slot_info()
+ *  update the first link's total available slots and starting slot
+ * @mgr: manager to store the slot info.
+ * @encoding_format: detected link encoding format
+ */
+void drm_dp_mst_update_first_link_slot_info(
+   struct drm_dp_mst_topology_mgr *mgr, uint8_t encoding_format)
+{
+   if (encoding_format == DP_CAP_ANSI_128B132B) {
+   mgr->first_link_total_avail_slots = 64;
+   mgr->first_link_start_slot = 0;
+   }
+   DRM_DEBUG_KMS("%s encoding format on 0x%p -> total %d slots, start at 
slot %d\n",
+   (encoding_format == DP_CAP_ANSI_128B132B) ? "128b/132b":"8b/10b",
+   mgr, mgr->first_link_total_avail_slots, 
mgr->first_link_start_slot);
+}
+EXPORT_SYMBOL(drm_dp_mst_update_first_link_slot_info);
+
 /**
  * drm_dp_mst_allocate_vcpi() - Allocate a virtual channel
  * @mgr: manager for this port
@@ -4518,8 +4546,8 @@ bool drm_dp_mst_allocate_vcpi(struct 
drm_dp_mst_topology_mgr *mgr,
 
	ret = drm_dp_init_vcpi(mgr, &port->vcpi, pbn, slots);
if (ret) {
-   DRM_DEBUG_KMS("failed to init vcpi slots=%d max=63 ret=%d\n",
- DIV_ROUND_UP(pbn, mgr->pbn_div), ret);
+   DRM_DEBUG_KMS("failed to init vcpi slots=%d max=%d ret=%d\n",
+   DIV_ROUND_UP(pbn, mgr->pbn_div), 
mgr->first_link_total_avail_slots, ret);
drm_dp_mst_topology_put_port(port);
goto out;
}
@@ -5162,7 +5190,7 @@ drm_dp_mst_atomic_check_vcpi_alloc_limit(struct 
drm_dp_mst_topology_mgr *mgr,
 struct drm_dp_mst_topology_state 
*mst_state)
 {
struct drm_dp_vcpi_allocation *vcpi;
-   int avail_slots = 63, payload_count = 0;
+   int avail_slots = mgr->first_link_total_avail_slots, payload_count = 0;
 
	list_for_each_entry(vcpi, &mst_state->vcpis, next) {
/* Releasing VCPI is always OK-even if the port is gone */
@@ -5191,7 +5219,7 @@ drm_dp_mst_atomic_check_vcpi_alloc_limit(struct 
drm_dp_mst_topology_mgr *mgr,
}
DRM_DEBUG_ATOMIC("[MST MGR:%p] mst state %p VCPI avail=%d used=%d\n",
 mgr, mst_state, avail_slots,
-63 - avail_slots);
+mgr->first_link_total_avail_slots - avail_slots);
 
return 0;
 }
@@ -5455,6 +5483,8 @@ int drm_dp_mst_topology_mgr_init(struct 
drm_dp_mst_topology_mgr *mgr,
if (!mgr->proposed_vcpis)
return -ENOMEM;
	set_bit(0, &mgr->payload_mask);
+   mgr->first_link_total_avail_slots = 63;
+   

Re: [PATCH] drm/amdkfd: Fix cat debugfs hang_hws file causes system crash bug

2021-03-25 Thread Felix Kuehling

On 2021-03-23 at 10:56 a.m., Alex Deucher wrote:
> Applied.  Thanks!

Thanks. I thought we fixed this before by making the file write-only.
But I guess that's not sufficient to stop root from reading it:

commit 2bdac179e217a0c0b548a8c60524977586621b19
Author: Felix Kuehling 
Date:   Thu Dec 19 22:36:55 2019 -0500

drm/amdkfd: Fix permissions of hang_hws

Reading from /sys/kernel/debug/kfd/hang_hws would cause a kernel
oops because we didn't implement a read callback. Set the permission
to write-only to prevent that.

Signed-off-by: Felix Kuehling 
Reviewed-by: shaoyunl  
Signed-off-by: Alex Deucher 


Now that we have a sensible message in the file, I guess we should
officially make it readable again.

Regards,
  Felix

>
> Alex
>
> On Sun, Mar 21, 2021 at 5:33 AM Qu Huang  wrote:
>> Here is the system crash log:
>> [ 1272.884438] BUG: unable to handle kernel NULL pointer dereference at
>> (null)
>> [ 1272.88] IP: [<  (null)>]   (null)
>> [ 1272.884447] PGD 825b09067 PUD 8267c8067 PMD 0
>> [ 1272.884452] Oops: 0010 [#1] SMP
>> [ 1272.884509] CPU: 13 PID: 3485 Comm: cat Kdump: loaded Tainted: G
>> [ 1272.884515] task: 9a38dbd4d140 ti: 9a37cd3b8000 task.ti:
>> 9a37cd3b8000
>> [ 1272.884517] RIP: 0010:[<>]  [<  (null)>]
>> (null)
>> [ 1272.884520] RSP: 0018:9a37cd3bbe68  EFLAGS: 00010203
>> [ 1272.884522] RAX:  RBX:  RCX:
>> 00014d5f
>> [ 1272.884524] RDX: fff4 RSI: 0001 RDI:
>> 9a38aca4d200
>> [ 1272.884526] RBP: 9a37cd3bbed0 R08: 9a38dcd5f1a0 R09:
>> 9a31ffc07300
>> [ 1272.884527] R10: 9a31ffc07300 R11: addd5e9d R12:
>> 9a38b4e0fb00
>> [ 1272.884529] R13: 0001 R14: 9a37cd3bbf18 R15:
>> 9a38aca4d200
>> [ 1272.884532] FS:  7feccaa67740() GS:9a38dcd4()
>> knlGS:
>> [ 1272.884534] CS:  0010 DS:  ES:  CR0: 80050033
>> [ 1272.884536] CR2:  CR3: 0008267c CR4:
>> 003407e0
>> [ 1272.884537] Call Trace:
>> [ 1272.884544]  [] ? seq_read+0x130/0x440
>> [ 1272.884548]  [] vfs_read+0x9f/0x170
>> [ 1272.884552]  [] SyS_read+0x7f/0xf0
>> [ 1272.884557]  [] system_call_fastpath+0x22/0x27
>> [ 1272.884558] Code:  Bad RIP value.
>> [ 1272.884562] RIP  [<  (null)>]   (null)
>> [ 1272.884564]  RSP 
>> [ 1272.884566] CR2: 
>>
>> Signed-off-by: Qu Huang 
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
>> index 511712c..673d5e3 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
>> @@ -33,6 +33,11 @@ static int kfd_debugfs_open(struct inode *inode, struct 
>> file *file)
>>
>> return single_open(file, show, NULL);
>>  }
>> +static int kfd_debugfs_hang_hws_read(struct seq_file *m, void *data)
>> +{
>> +   seq_printf(m, "echo gpu_id > hang_hws\n");
>> +   return 0;
>> +}
>>
>>  static ssize_t kfd_debugfs_hang_hws_write(struct file *file,
>> const char __user *user_buf, size_t size, loff_t *ppos)
>> @@ -94,7 +99,7 @@ void kfd_debugfs_init(void)
>> debugfs_create_file("rls", S_IFREG | 0444, debugfs_root,
>> kfd_debugfs_rls_by_device, _debugfs_fops);
>> debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
>> -   NULL, _debugfs_hang_hws_fops);
>> +   kfd_debugfs_hang_hws_read, 
>> _debugfs_hang_hws_fops);
>>  }
>>
>>  void kfd_debugfs_fini(void)
>> --
>> 1.8.3.1
>>
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-25 Thread Zhang, Jack (Jian)
[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey,

>>how you handle non-guilty signaled jobs in drm_sched_stop; currently it looks
>>like you don't call put for them and just explicitly free them as before
Good point, I missed that place. Will cover that in my next patch.

>>Also sched->free_guilty seems useless with the new approach.
Yes, I agree.

>>Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this 
>>approach...
I am not quite sure about that for now, let me think about this topic today.

Hi, Christian,
should I add a fence and get/put that fence, rather than using an explicit
refcount?
Any other concerns?

Thanks,
Jack

-Original Message-
From: Grodzovsky, Andrey 
Sent: Friday, March 26, 2021 12:32 AM
To: Zhang, Jack (Jian) ; Christian König 
; dri-de...@lists.freedesktop.org; 
amd-gfx@lists.freedesktop.org; Koenig, Christian ; 
Liu, Monk ; Deng, Emily ; Rob Herring 
; Tomeu Vizoso ; Steven Price 

Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

There are a few issues here, like how you handle non-guilty signaled jobs in
drm_sched_stop; currently it looks like you don't call put for them and just
explicitly free them as before. Also sched->free_guilty seems useless with the
new approach. Do we even need the cleanup mechanism at
drm_sched_get_cleanup_job with this approach...

But first, we need Christian to express his opinion on this, since I think he
opposed refcounting jobs and felt we should concentrate on fences instead.

Christian - can you chime in here ?

Andrey

On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
>
> Hi, Andrey
>
> Thank you for your good opinions.
>
> I literally agree with you that the refcount could handle the
> get_cleanup_job concurrency gracefully, and there is no need to re-insert
> the job back anymore.
>
> I quickly made a draft for this idea as follows:
>
> How do you like it? I will start implementing it once I have your
> acknowledgement.
>
> Thanks,
>
> Jack
>
> +void drm_job_get(struct drm_sched_job *s_job)
> +{
> +   kref_get(&s_job->refcount);
> +}
> +
> +void drm_job_do_release(struct kref *ref)
> +{
> +   struct drm_sched_job *s_job;
> +   struct drm_gpu_scheduler *sched;
> +
> +   s_job = container_of(ref, struct drm_sched_job, refcount);
> +   sched = s_job->sched;
> +   sched->ops->free_job(s_job);
> +}
> +
> +void drm_job_put(struct drm_sched_job *s_job)
> +{
> +   kref_put(&s_job->refcount, drm_job_do_release);
> +}
> +
> static void drm_sched_job_begin(struct drm_sched_job *s_job)
> {
>  struct drm_gpu_scheduler *sched = s_job->sched;
> +   kref_init(&s_job->refcount);
> +   drm_job_get(s_job);
>  spin_lock(&sched->job_list_lock);
>  list_add_tail(&s_job->node, &sched->ring_mirror_list);
>  drm_sched_start_timeout(sched);
> @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>   * is parked at which point it's safe.
>   */
> -   list_del_init(&job->node);
> +   drm_job_get(job);
>  spin_unlock(&sched->job_list_lock);
>  job->sched->ops->timedout_job(job);
> -
> +   drm_job_put(job);
>  /*
>   * Guilty job did complete and hence needs to be manually removed
>   * See drm_sched_stop doc.
>   */
>  if (sched->free_guilty) {
> -   job->sched->ops->free_job(job);
>  sched->free_guilty = false;
>  }
>  } else {
> @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> -   /*
> -* Reinsert back the bad job here - now it's safe as
> -* drm_sched_get_cleanup_job cannot race against us and release the
> -* bad job at this point - we parked (waited for) any in progress
> -* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> -* now until the scheduler thread is unparked.
> -*/
> -   if (bad && bad->sched == sched)
> -   /*
> -* Add at the head of the queue to reflect it was the earliest
> -* job extracted.
> -*/
> -   list_add(&bad->node, &sched->ring_mirror_list);
> -
>  /*
>   * Iterate the job list from later to earlier one and either deactive
>   * their HW callbacks or remove them from mirror list if they already
> @@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
>   kthread_should_stop());
>  if (cleanup_job) {
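The lifetime rule in the draft above (the job is freed only when its last reference drops, so a concurrent cleanup cannot free it under the timeout handler) can be exercised with a small userspace analogue of kref; names here are illustrative, not the drm_sched API:

```c
#include <assert.h>

/* Userspace stand-in for the proposed drm_sched_job refcounting. */
struct fake_job {
    int refcount;
    int freed;   /* stands in for sched->ops->free_job() having run */
};

static void job_get(struct fake_job *j)
{
    j->refcount++;
}

static void job_put(struct fake_job *j)
{
    /* analogue of kref_put(): release only on the last reference */
    if (--j->refcount == 0)
        j->freed = 1;
}
```

With kref_init plus drm_job_get at submission, the timeout path's own get/put pair can never be the one that frees a job that is still referenced elsewhere.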

RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-25 Thread Zhang, Jack (Jian)
[AMD Official Use Only - Internal Distribution Only]

Hi, Steve,

Thank you for your detailed comments.

But currently the patch is not finalized.
We found some potential race conditions even with this patch. The solution is
under discussion, and hopefully we can find an ideal one.
After that, I will check whether the change affects other DRM drivers
(besides amdgpu).

Best,
Jack

-Original Message-
From: Steven Price 
Sent: Monday, March 22, 2021 11:29 PM
To: Zhang, Jack (Jian) ; dri-de...@lists.freedesktop.org; 
amd-gfx@lists.freedesktop.org; Koenig, Christian ; 
Grodzovsky, Andrey ; Liu, Monk ; 
Deng, Emily ; Rob Herring ; Tomeu Vizoso 

Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

On 15/03/2021 05:23, Zhang, Jack (Jian) wrote:
> [AMD Public Use]
>
> Hi, Rob/Tomeu/Steven,
>
> Would you please help to review this patch for panfrost driver?
>
> Thanks,
> Jack Zhang
>
> -Original Message-
> From: Jack Zhang 
> Sent: Monday, March 15, 2021 1:21 PM
> To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
> Koenig, Christian ; Grodzovsky, Andrey
> ; Liu, Monk ; Deng, Emily
> 
> Cc: Zhang, Jack (Jian) 
> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
> memleak
>
> re-insert Bailing jobs to avoid memory leak.
>
> V2: move re-insert step to drm/scheduler logic
> V3: add panfrost's return value for bailing jobs in case it hits the
> memleak issue.

This commit message could do with some work - it's really hard to decipher what 
the actual problem you're solving is.

>
> Signed-off-by: Jack Zhang 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 8 ++--
>   drivers/gpu/drm/panfrost/panfrost_job.c| 4 ++--
>   drivers/gpu/drm/scheduler/sched_main.c | 8 +++-
>   include/drm/gpu_scheduler.h| 1 +
>   5 files changed, 19 insertions(+), 6 deletions(-)
>
[...]
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
> b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 6003cfeb1322..e2cb4f32dae1 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>* spurious. Bail out.
>*/
>   if (dma_fence_is_signaled(job->done_fence))
> -return DRM_GPU_SCHED_STAT_NOMINAL;
> +return DRM_GPU_SCHED_STAT_BAILING;
>
>   dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>   js,
> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>
>   /* Scheduler is already stopped, nothing to do. */
>   if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
> -return DRM_GPU_SCHED_STAT_NOMINAL;
> +return DRM_GPU_SCHED_STAT_BAILING;
>
>   /* Schedule a reset if there's no reset in progress. */
>   if (!atomic_xchg(&pfdev->reset.pending, 1))

This looks correct to me - in these two cases drm_sched_stop() is not called on 
the sched_job, so it looks like currently the job will be leaked.

> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..a44f621fb5c4 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct 
> *work)
>   {
>   struct drm_gpu_scheduler *sched;
>   struct drm_sched_job *job;
> +int ret;
>
>   sched = container_of(work, struct drm_gpu_scheduler,
> work_tdr.work);
>
> @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   list_del_init(&job->list);
>   spin_unlock(&sched->job_list_lock);
>
> -job->sched->ops->timedout_job(job);
> +ret = job->sched->ops->timedout_job(job);
>
> +if (ret == DRM_GPU_SCHED_STAT_BAILING) {
> +spin_lock(&sched->job_list_lock);
> +list_add(&job->node, &sched->ring_mirror_list);
> +spin_unlock(&sched->job_list_lock);
> +}

I think we could really do with a comment somewhere explaining what "bailing" 
means in this context. For the Panfrost case we have two cases:

  * The GPU job actually finished while the timeout code was running 
(done_fence is signalled).

  * The GPU is already in the process of being reset (Panfrost has multiple
queues, so most likely a bad job in another queue).

I'm also not convinced that (for Panfrost) it makes sense to be adding the jobs 
back to the list. For the first case above clearly the job could just be freed 
(it's complete). The second case is more interesting and Panfrost currently 
doesn't handle this well. In theory the driver could try to rescue the job 
('soft stop' in Mali language) so that it could be resubmitted. Panfrost 
doesn't currently support that, so attempting to resubmit the job is almost 
certainly going to fail.

It's on my TODO list to look at improving Panfrost in this regard, but sadly 
still quite far down.

Steve

>   /*
>* Guilty 

Re: [PATCH v2] drm/mst: Enhance MST topology logging

2021-03-25 Thread Lyude Paul
Reviewed-by: Lyude Paul 

Let me know if you need me to push this to drm-misc-next for you

On Thu, 2021-03-25 at 14:06 -0400, Eryk Brol wrote:
> [why]
> MST topology print was missing fec logging and pdt printed
> as an int wasn't clear. vcpi and payload info was printed as an
> arbitrary series of ints, which requires the user to know the ordering
> of the prints, making the logs difficult to use.
> 
> [how]
> -add fec logging
> -add pdt parsing into strings
> -format vcpi and payload info into tables with headings
> -clean up topology prints
> 
> ---
> 
> v2: Addressed Lyude's comments
> -made helper function return const
> -fixed indentation and spacing issues
> 
> Signed-off-by: Eryk Brol 
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 59 ++-
>  1 file changed, 48 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 932c4641ec3e..de5124ce42cb 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -4720,6 +4720,28 @@ static void drm_dp_mst_kick_tx(struct
> drm_dp_mst_topology_mgr *mgr)
> queue_work(system_long_wq, &mgr->tx_work);
>  }
>  
> +/*
> + * Helper function for parsing DP device types into convenient strings
> + * for use with dp_mst_topology
> + */
> +static const char *pdt_to_string(u8 pdt)
> +{
> +   switch (pdt) {
> +   case DP_PEER_DEVICE_NONE:
> +   return "NONE";
> +   case DP_PEER_DEVICE_SOURCE_OR_SST:
> +   return "SOURCE OR SST";
> +   case DP_PEER_DEVICE_MST_BRANCHING:
> +   return "MST BRANCHING";
> +   case DP_PEER_DEVICE_SST_SINK:
> +   return "SST SINK";
> +   case DP_PEER_DEVICE_DP_LEGACY_CONV:
> +   return "DP LEGACY CONV";
> +   default:
> +   return "ERR";
> +   }
> +}
> +
>  static void drm_dp_mst_dump_mstb(struct seq_file *m,
>  struct drm_dp_mst_branch *mstb)
>  {
> @@ -4732,9 +4754,20 @@ static void drm_dp_mst_dump_mstb(struct seq_file *m,
> prefix[i] = '\t';
> prefix[i] = '\0';
>  
> -   seq_printf(m, "%smst: %p, %d\n", prefix, mstb, mstb->num_ports);
> +   seq_printf(m, "%smstb - [%p]: num_ports: %d\n", prefix, mstb, mstb-
> >num_ports);
> list_for_each_entry(port, >ports, next) {
> -   seq_printf(m, "%sport: %d: input: %d: pdt: %d, ddps: %d ldps:
> %d, sdp: %d/%d, %p, conn: %p\n", prefix, port->port_num, port->input, port-
> >pdt, port->ddps, port->ldps, port->num_sdp_streams, port-
> >num_sdp_stream_sinks, port, port->connector);
> +   seq_printf(m, "%sport %d - [%p] (%s - %s): ddps: %d, ldps: %d,
> sdp: %d/%d, fec: %s, conn: %p\n", 
> +  prefix,
> +  port->port_num,
> +  port,
> +  port->input ? "input" : "output",
> +  pdt_to_string(port->pdt),
> +  port->ddps,
> +  port->ldps,
> +  port->num_sdp_streams,
> +  port->num_sdp_stream_sinks,
> +  port->fec_capable ? "true" : "false",
> +  port->connector);
> if (port->mstb)
> drm_dp_mst_dump_mstb(m, port->mstb);
> }
> @@ -4787,33 +4820,37 @@ void drm_dp_mst_dump_topology(struct seq_file *m,
> mutex_unlock(&mgr->lock);
>  
> mutex_lock(&mgr->payload_lock);
> -   seq_printf(m, "vcpi: %lx %lx %d\n", mgr->payload_mask, mgr->vcpi_mask,
> -   mgr->max_payloads);
> +   seq_printf(m, "\n*** VCPI Info ***\n");
> +   seq_printf(m, "payload_mask: %lx, vcpi_mask: %lx, max_payloads: %d\n", mgr->payload_mask, mgr->vcpi_mask, mgr->max_payloads);
>  
> +   seq_printf(m, "\n|   idx   |  port # |  vcp_id | # slots | sink
> name |\n");
> for (i = 0; i < mgr->max_payloads; i++) {
> if (mgr->proposed_vcpis[i]) {
> char name[14];
>  
> port = container_of(mgr->proposed_vcpis[i], struct drm_dp_mst_port, vcpi);
> fetch_monitor_name(mgr, port, name, sizeof(name));
> -   seq_printf(m, "vcpi %d: %d %d %d sink name: %s\n", i,
> -  port->port_num, port->vcpi.vcpi,
> +   seq_printf(m, "%10d%10d%10d%10d%20s\n",
> +  i,
> +  port->port_num,
> +  port->vcpi.vcpi,
>    port->vcpi.num_slots,
> -  (*name != 0) ? name :  "Unknown");
> +  (*name != 0) ? name : "Unknown");
> } else
> -   seq_printf(m, "vcpi %d:unused\n", i);
> +   seq_printf(m, 

Re: [PATCH 2/2] drm/amdgpu: Introduce new SETUP_TMR interface

2021-03-25 Thread Zeng, Oak
Thank you, Lijo.

I added two macros to calculate a BO's physical address. I also used those
macros to simplify the driver a little bit. Please help review.

Regards,
Oak 
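The translation those macros perform (and that Lijo suggests wrapping in a small GMC helper) is the GPU MC-address to system-physical-address shift visible in psp_prep_tmr_cmd_buf in the quoted patch below. A minimal sketch, with a hypothetical helper name and the field meanings taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper of the kind suggested in the review below:
 * convert a buffer object's GPU (MC) offset into a system physical
 * address, using the same quantities the quoted patch reads
 * (vm_manager.vram_base_offset and gmc.vram_start). */
static uint64_t bo_gpu_offset_to_sys_phys(uint64_t bo_gpu_offset,
                                          uint64_t vram_base_offset,
                                          uint64_t vram_start)
{
    return bo_gpu_offset + vram_base_offset - vram_start;
}
```

Keeping this in one place means PSP code only asks for "the physical address of this BO" and never sees the calculation itself.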

 

On 2021-03-23, 10:15 AM, "Lazar, Lijo"  wrote:

[AMD Public Use]


-Original Message-
From: amd-gfx  On Behalf Of Zeng, Oak
Sent: Monday, March 22, 2021 7:33 PM
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix ; Zhang, Hawking 

Subject: Re: [PATCH 2/2] drm/amdgpu: Introduce new SETUP_TMR interface

[AMD Official Use Only - Internal Distribution Only]

[AMD Official Use Only - Internal Distribution Only]

Hello all,

Can someone help to review the patches below? We verified them with the
firmware team and want to check them in together with the PSP firmware

Regards,
Oak



On 2021-03-12, 4:24 PM, "Zeng, Oak"  wrote:

This new interface passes both the virtual and the physical address
to PSP. It is backward compatible with the old interface.

Signed-off-by: Oak Zeng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 13 ++---
 drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 11 ++-
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index cd3eda9..99e1a3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -328,8 +328,13 @@ psp_cmd_submit_buf(struct psp_context *psp,

 static void psp_prep_tmr_cmd_buf(struct psp_context *psp,
  struct psp_gfx_cmd_resp *cmd,
- uint64_t tmr_mc, uint32_t size)
+ uint64_t tmr_mc, struct amdgpu_bo *tmr_bo)
 {
+struct amdgpu_device *adev = psp->adev;
+uint32_t size = amdgpu_bo_size(tmr_bo);
+uint64_t tmr_pa = amdgpu_bo_gpu_offset(tmr_bo) +
+adev->vm_manager.vram_base_offset - adev->gmc.vram_start;
+

This looks like a candidate for a small inline function in gmc. PSP
doesn't need to know about the calculation.

Thanks,
Lijo

 if (amdgpu_sriov_vf(psp->adev))
 cmd->cmd_id = GFX_CMD_ID_SETUP_VMR;
 else
@@ -337,6 +342,9 @@ static void psp_prep_tmr_cmd_buf(struct psp_context 
*psp,
 cmd->cmd.cmd_setup_tmr.buf_phy_addr_lo = lower_32_bits(tmr_mc);
 cmd->cmd.cmd_setup_tmr.buf_phy_addr_hi = upper_32_bits(tmr_mc);
 cmd->cmd.cmd_setup_tmr.buf_size = size;
+cmd->cmd.cmd_setup_tmr.bitfield.virt_phy_addr = 1;
+cmd->cmd.cmd_setup_tmr.system_phy_addr_lo = lower_32_bits(tmr_pa);
+cmd->cmd.cmd_setup_tmr.system_phy_addr_hi = upper_32_bits(tmr_pa);
 }

 static void psp_prep_load_toc_cmd_buf(struct psp_gfx_cmd_resp *cmd,
@@ -456,8 +464,7 @@ static int psp_tmr_load(struct psp_context *psp)
 if (!cmd)
 return -ENOMEM;

-psp_prep_tmr_cmd_buf(psp, cmd, psp->tmr_mc_addr,
- amdgpu_bo_size(psp->tmr_bo));
+psp_prep_tmr_cmd_buf(psp, cmd, psp->tmr_mc_addr, psp->tmr_bo);
 DRM_INFO("reserve 0x%lx from 0x%llx for PSP TMR\n",
  amdgpu_bo_size(psp->tmr_bo), psp->tmr_mc_addr);

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h 
b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
index a41b054..604a1c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
+++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
@@ -170,10 +170,19 @@ struct psp_gfx_cmd_setup_tmr
 uint32_tbuf_phy_addr_lo;   /* bits [31:0] of GPU 
Virtual address of TMR buffer (must be 4 KB aligned) */
 uint32_tbuf_phy_addr_hi;   /* bits [63:32] of GPU 
Virtual address of TMR buffer */
 uint32_tbuf_size;  /* buffer size in bytes 
(must be multiple of 4 KB) */
+union {
+struct {
+uint32_tsriov_enabled:1; /* whether the device runs under SR-IOV*/
+uint32_tvirt_phy_addr:1; /* driver passes both virtual and physical 
address to PSP*/
+uint32_treserved:30;
+} bitfield;
+uint32_ttmr_flags;
+};
+uint32_tsystem_phy_addr_lo;/* bits [31:0] of 
system physical address of TMR buffer (must be 4 KB aligned) */
+uint32_tsystem_phy_addr_hi;/* bits [63:32] of 
system physical address of TMR buffer */

 };

-
 /* FW types for GFX_CMD_ID_LOAD_IP_FW command. Limit 31. */
 enum psp_gfx_fw_type {
 GFX_FW_TYPE_NONE= 0,/* */
--
2.7.4



[PATCH v2] drm/mst: Enhance MST topology logging

2021-03-25 Thread Eryk Brol
[why]
MST topology print was missing fec logging and pdt printed
as an int wasn't clear. vcpi and payload info was printed as an
arbitrary series of ints, which requires the user to know the ordering
of the prints, making the logs difficult to use.

[how]
-add fec logging
-add pdt parsing into strings
-format vcpi and payload info into tables with headings
-clean up topology prints

---

v2: Addressed Lyude's comments
-made helper function return const
-fixed indentation and spacing issues

Signed-off-by: Eryk Brol 
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 59 ++-
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
b/drivers/gpu/drm/drm_dp_mst_topology.c
index 932c4641ec3e..de5124ce42cb 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -4720,6 +4720,28 @@ static void drm_dp_mst_kick_tx(struct 
drm_dp_mst_topology_mgr *mgr)
	queue_work(system_long_wq, &mgr->tx_work);
 }
 
+/*
+ * Helper function for parsing DP device types into convenient strings
+ * for use with dp_mst_topology
+ */
+static const char *pdt_to_string(u8 pdt)
+{
+   switch (pdt) {
+   case DP_PEER_DEVICE_NONE:
+   return "NONE";
+   case DP_PEER_DEVICE_SOURCE_OR_SST:
+   return "SOURCE OR SST";
+   case DP_PEER_DEVICE_MST_BRANCHING:
+   return "MST BRANCHING";
+   case DP_PEER_DEVICE_SST_SINK:
+   return "SST SINK";
+   case DP_PEER_DEVICE_DP_LEGACY_CONV:
+   return "DP LEGACY CONV";
+   default:
+   return "ERR";
+   }
+}
+
 static void drm_dp_mst_dump_mstb(struct seq_file *m,
 struct drm_dp_mst_branch *mstb)
 {
@@ -4732,9 +4754,20 @@ static void drm_dp_mst_dump_mstb(struct seq_file *m,
prefix[i] = '\t';
prefix[i] = '\0';
 
-   seq_printf(m, "%smst: %p, %d\n", prefix, mstb, mstb->num_ports);
+	seq_printf(m, "%smstb - [%p]: num_ports: %d\n", prefix, mstb, mstb->num_ports);
	list_for_each_entry(port, &mstb->ports, next) {
-		seq_printf(m, "%sport: %d: input: %d: pdt: %d, ddps: %d ldps: %d, sdp: %d/%d, %p, conn: %p\n", prefix, port->port_num, port->input, port->pdt, port->ddps, port->ldps, port->num_sdp_streams, port->num_sdp_stream_sinks, port, port->connector);
+		seq_printf(m, "%sport %d - [%p] (%s - %s): ddps: %d, ldps: %d, sdp: %d/%d, fec: %s, conn: %p\n",
+  prefix,
+  port->port_num,
+  port,
+  port->input ? "input" : "output",
+  pdt_to_string(port->pdt),
+  port->ddps,
+  port->ldps,
+  port->num_sdp_streams,
+  port->num_sdp_stream_sinks,
+  port->fec_capable ? "true" : "false",
+  port->connector);
if (port->mstb)
drm_dp_mst_dump_mstb(m, port->mstb);
}
@@ -4787,33 +4820,37 @@ void drm_dp_mst_dump_topology(struct seq_file *m,
	mutex_unlock(&mgr->lock);
 
	mutex_lock(&mgr->payload_lock);
-   seq_printf(m, "vcpi: %lx %lx %d\n", mgr->payload_mask, mgr->vcpi_mask,
-   mgr->max_payloads);
+   seq_printf(m, "\n*** VCPI Info ***\n");
+	seq_printf(m, "payload_mask: %lx, vcpi_mask: %lx, max_payloads: %d\n", mgr->payload_mask, mgr->vcpi_mask, mgr->max_payloads);
 
+	seq_printf(m, "\n|   idx   |  port # |  vcp_id | # slots | sink name |\n");
for (i = 0; i < mgr->max_payloads; i++) {
if (mgr->proposed_vcpis[i]) {
char name[14];
 
			port = container_of(mgr->proposed_vcpis[i], struct drm_dp_mst_port, vcpi);
fetch_monitor_name(mgr, port, name, sizeof(name));
-   seq_printf(m, "vcpi %d: %d %d %d sink name: %s\n", i,
-  port->port_num, port->vcpi.vcpi,
+   seq_printf(m, "%10d%10d%10d%10d%20s\n",
+  i,
+  port->port_num,
+  port->vcpi.vcpi,
   port->vcpi.num_slots,
-  (*name != 0) ? name :  "Unknown");
+  (*name != 0) ? name : "Unknown");
} else
-   seq_printf(m, "vcpi %d:unused\n", i);
+   seq_printf(m, "%6d - Unused\n", i);
}
+   seq_printf(m, "\n*** Payload Info ***\n");
+   seq_printf(m, "|   idx   |  state  |  start slot  | # slots |\n");
for (i = 0; i < mgr->max_payloads; i++) {
-   seq_printf(m, "payload %d: %d, %d, %d\n",
+   seq_printf(m, "%10d%10d%15d%10d\n",
   i,
  

Re: [PATCH] drm/amd: Fix a typo in two different sentences

2021-03-25 Thread Alex Deucher
Applied.  Thanks!

Alex

On Mon, Mar 22, 2021 at 6:45 PM Randy Dunlap  wrote:
>
> On 3/22/21 2:06 PM, Bhaskar Chowdhury wrote:
> >
> > s/defintion/definition/ .two different places.
> >
> > Signed-off-by: Bhaskar Chowdhury 
>
> Acked-by: Randy Dunlap 
>
> > ---
> >  drivers/gpu/drm/amd/include/atombios.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/include/atombios.h 
> > b/drivers/gpu/drm/amd/include/atombios.h
> > index c1d7b1d0b952..47eb84598b96 100644
> > --- a/drivers/gpu/drm/amd/include/atombios.h
> > +++ b/drivers/gpu/drm/amd/include/atombios.h
> > @@ -1987,9 +1987,9 @@ typedef struct _PIXEL_CLOCK_PARAMETERS_V6
> >  #define PIXEL_CLOCK_V6_MISC_HDMI_BPP_MASK   0x0c
> >  #define PIXEL_CLOCK_V6_MISC_HDMI_24BPP  0x00
> >  #define PIXEL_CLOCK_V6_MISC_HDMI_36BPP  0x04
> > -#define PIXEL_CLOCK_V6_MISC_HDMI_36BPP_V6   0x08//for V6, the 
> > correct defintion for 36bpp should be 2 for 36bpp(2:1)
> > +#define PIXEL_CLOCK_V6_MISC_HDMI_36BPP_V6   0x08//for V6, the 
> > correct definition for 36bpp should be 2 for 36bpp(2:1)
> >  #define PIXEL_CLOCK_V6_MISC_HDMI_30BPP  0x08
> > -#define PIXEL_CLOCK_V6_MISC_HDMI_30BPP_V6   0x04//for V6, the 
> > correct defintion for 30bpp should be 1 for 36bpp(5:4)
> > +#define PIXEL_CLOCK_V6_MISC_HDMI_30BPP_V6   0x04//for V6, the 
> > correct definition for 30bpp should be 1 for 36bpp(5:4)
> >  #define PIXEL_CLOCK_V6_MISC_HDMI_48BPP  0x0c
> >  #define PIXEL_CLOCK_V6_MISC_REF_DIV_SRC 0x10
> >  #define PIXEL_CLOCK_V6_MISC_GEN_DPREFCLK0x40
> > --
>
>
> --
> ~Randy
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes

2021-03-25 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, Mar 25, 2021 at 5:26 AM Nirmoy  wrote:
>
>
> Reviewed-by: Nirmoy Das
>
> On 3/25/21 9:53 AM, Bhaskar Chowdhury wrote:
> > s/acccess/access/
> > s/inferface/interface/
> > s/sequnce/sequence/  .two different places.
> > s/retrive/retrieve/
> > s/sheduling/scheduling/
> > s/independant/independent/
> > s/wether/whether/ ..two different places.
> > s/emmit/emit/
> > s/synce/sync/
> >
> >
> > Signed-off-by: Bhaskar Chowdhury 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 22 +++---
> >   1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > index a368724c3dfc..4502b95ddf6b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > @@ -1877,7 +1877,7 @@ static void gfx_v7_0_init_compute_vmid(struct 
> > amdgpu_device *adev)
> >   mutex_unlock(>srbm_mutex);
> >
> >   /* Initialize all compute VMIDs to have no GDS, GWS, or OA
> > -acccess. These should be enabled by FW for target VMIDs. */
> > +access. These should be enabled by FW for target VMIDs. */
> >   for (i = adev->vm_manager.first_kfd_vmid; i < AMDGPU_NUM_VMID; i++) {
> >   WREG32(amdgpu_gds_reg_offset[i].mem_base, 0);
> >   WREG32(amdgpu_gds_reg_offset[i].mem_size, 0);
> > @@ -2058,7 +2058,7 @@ static void gfx_v7_0_constants_init(struct 
> > amdgpu_device *adev)
> >* @adev: amdgpu_device pointer
> >*
> >* Set up the number and offset of the CP scratch registers.
> > - * NOTE: use of CP scratch registers is a legacy inferface and
> > + * NOTE: use of CP scratch registers is a legacy interface and
> >* is not used by default on newer asics (r6xx+).  On newer asics,
> >* memory buffers are used for fences rather than scratch regs.
> >*/
> > @@ -2172,7 +2172,7 @@ static void gfx_v7_0_ring_emit_vgt_flush(struct 
> > amdgpu_ring *ring)
> >* @seq: sequence number
> >* @flags: fence related flags
> >*
> > - * Emits a fence sequnce number on the gfx ring and flushes
> > + * Emits a fence sequence number on the gfx ring and flushes
> >* GPU caches.
> >*/
> >   static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 
> > addr,
> > @@ -2215,7 +2215,7 @@ static void gfx_v7_0_ring_emit_fence_gfx(struct 
> > amdgpu_ring *ring, u64 addr,
> >* @seq: sequence number
> >* @flags: fence related flags
> >*
> > - * Emits a fence sequnce number on the compute ring and flushes
> > + * Emits a fence sequence number on the compute ring and flushes
> >* GPU caches.
> >*/
> >   static void gfx_v7_0_ring_emit_fence_compute(struct amdgpu_ring *ring,
> > @@ -2245,14 +2245,14 @@ static void gfx_v7_0_ring_emit_fence_compute(struct 
> > amdgpu_ring *ring,
> >* gfx_v7_0_ring_emit_ib - emit an IB (Indirect Buffer) on the ring
> >*
> >* @ring: amdgpu_ring structure holding ring information
> > - * @job: job to retrive vmid from
> > + * @job: job to retrieve vmid from
> >* @ib: amdgpu indirect buffer object
> >* @flags: options (AMDGPU_HAVE_CTX_SWITCH)
> >*
> >* Emits an DE (drawing engine) or CE (constant engine) IB
> >* on the gfx ring.  IBs are usually generated by userspace
> >* acceleration drivers and submitted to the kernel for
> > - * sheduling on the ring.  This function schedules the IB
> > + * scheduling on the ring.  This function schedules the IB
> >* on the gfx ring for execution by the GPU.
> >*/
> >   static void gfx_v7_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
> > @@ -2402,7 +2402,7 @@ static int gfx_v7_0_ring_test_ib(struct amdgpu_ring 
> > *ring, long timeout)
> >
> >   /*
> >* CP.
> > - * On CIK, gfx and compute now have independant command processors.
> > + * On CIK, gfx and compute now have independent command processors.
> >*
> >* GFX
> >* Gfx consists of a single ring and can process both gfx jobs and
> > @@ -2630,7 +2630,7 @@ static int gfx_v7_0_cp_gfx_resume(struct 
> > amdgpu_device *adev)
> >   ring->wptr = 0;
> >   WREG32(mmCP_RB0_WPTR, lower_32_bits(ring->wptr));
> >
> > - /* set the wb address wether it's enabled or not */
> > + /* set the wb address whether it's enabled or not */
> >   rptr_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
> >   WREG32(mmCP_RB0_RPTR_ADDR, lower_32_bits(rptr_addr));
> >   WREG32(mmCP_RB0_RPTR_ADDR_HI, upper_32_bits(rptr_addr) & 0xFF);
> > @@ -2985,7 +2985,7 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device 
> > *adev,
> >   mqd->cp_hqd_pq_wptr_poll_addr_lo = wb_gpu_addr & 0xfffc;
> >   mqd->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 
> > 0x;
> >
> > - /* set the wb address wether it's enabled or not */
> > + /* set the wb address whether it's enabled or not */
> >   wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
> >   

Re: [PATCH V2] drm/radeon/r600_cs: Few typo fixes

2021-03-25 Thread Alex Deucher
Applied.  Thanks!

Alex

On Wed, Mar 24, 2021 at 7:46 PM Randy Dunlap  wrote:
>
> On 3/24/21 4:29 PM, Bhaskar Chowdhury wrote:
> > s/miror/mirror/
> > s/needind/needing/
> > s/informations/information/
> >
> > Signed-off-by: Bhaskar Chowdhury 
>
> Acked-by: Randy Dunlap 
>
> Thanks.
>
> > ---
> >  Changes from V1:
> >  Randy's finding incorporated ,i.e in one place,informations->information
> >   Adjusted the subject line accordingly
> >
> >  drivers/gpu/drm/radeon/r600_cs.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/r600_cs.c 
> > b/drivers/gpu/drm/radeon/r600_cs.c
> > index 34b7c6f16479..8be4799a98ef 100644
> > --- a/drivers/gpu/drm/radeon/r600_cs.c
> > +++ b/drivers/gpu/drm/radeon/r600_cs.c
> > @@ -38,7 +38,7 @@ extern void r600_cs_legacy_get_tiling_conf(struct 
> > drm_device *dev, u32 *npipes,
> >
> >
> >  struct r600_cs_track {
> > - /* configuration we miror so that we use same code btw kms/ums */
> > + /* configuration we mirror so that we use same code btw kms/ums */
> >   u32 group_size;
> >   u32 nbanks;
> >   u32 npipes;
> > @@ -963,7 +963,7 @@ static int r600_cs_parse_packet0(struct 
> > radeon_cs_parser *p,
> >   *
> >   * This function will test against r600_reg_safe_bm and return 0
> >   * if register is safe. If register is not flag as safe this function
> > - * will test it against a list of register needind special handling.
> > + * will test it against a list of register needing special handling.
> >   */
> >  static int r600_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
> >  {
> > @@ -2336,7 +2336,7 @@ int r600_cs_parse(struct radeon_cs_parser *p)
> >  /**
> >   * r600_dma_cs_next_reloc() - parse next reloc
> >   * @p:   parser structure holding parsing context.
> > - * @cs_reloc:reloc informations
> > + * @cs_reloc:reloc information
> >   *
> >   * Return the next reloc, do bo validation and compute
> >   * GPU offset using the provided start.
> > --
>
>
> --
> ~Randy
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] amdgpu: securedisplay: simplify i2c hexdump output

2021-03-25 Thread Alex Deucher
Applied.  Thanks!

Alex

On Wed, Mar 24, 2021 at 9:37 AM Arnd Bergmann  wrote:
>
> From: Arnd Bergmann 
>
> A previous fix I did left a rather complicated loop in
> amdgpu_securedisplay_debugfs_write() for what could be expressed in a
> simple sprintf, as Rasmus pointed out.
>
> This drops the leading 0x for each byte, but is otherwise
> much nicer.
>
> Suggested-by: Rasmus Villemoes 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c | 11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> index 69d7f6bff5d4..fc3ddd7aa6f0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> @@ -92,9 +92,7 @@ static ssize_t amdgpu_securedisplay_debugfs_write(struct 
> file *f, const char __u
> struct drm_device *dev = adev_to_drm(adev);
> uint32_t phy_id;
> uint32_t op;
> -   int i;
> char str[64];
> -   char i2c_output[256];
> int ret;
>
> if (*pos || size > sizeof(str) - 1)
> @@ -136,12 +134,9 @@ static ssize_t amdgpu_securedisplay_debugfs_write(struct 
> file *f, const char __u
> ret = psp_securedisplay_invoke(psp, 
> TA_SECUREDISPLAY_COMMAND__SEND_ROI_CRC);
> if (!ret) {
> if (securedisplay_cmd->status == 
> TA_SECUREDISPLAY_STATUS__SUCCESS) {
> -   int pos = 0;
> -   memset(i2c_output,  0, sizeof(i2c_output));
> -   for (i = 0; i < 
> TA_SECUREDISPLAY_I2C_BUFFER_SIZE; i++)
> -   pos += sprintf(i2c_output + pos, " 
> 0x%X",
> -   
> securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf[i]);
> -   dev_info(adev->dev, "SECUREDISPLAY: I2C 
> buffer out put is :%s\n", i2c_output);
> +   dev_info(adev->dev, "SECUREDISPLAY: I2C 
> buffer out put is: %*ph\n",
> +TA_SECUREDISPLAY_I2C_BUFFER_SIZE,
> +
> securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf);
> } else {
> psp_securedisplay_parse_resp_status(psp, 
> securedisplay_cmd->status);
> }
> --
> 2.29.2
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/amdgpu: Ensure that the modifier requested is supported by plane.

2021-03-25 Thread Alex Deucher
On Wed, Mar 24, 2021 at 4:16 PM Mark Yacoub  wrote:
>
> From: Mark Yacoub 
>
> On initializing the framebuffer, call drm_any_plane_has_format to check
> whether the modifier is supported. drm_any_plane_has_format calls
> dm_plane_format_mod_supported, which is extended to validate that the
> modifier is on the list of the plane's supported modifiers.
> The bug was caught using igt-gpu-tools test: 
> kms_addfb_basic.addfb25-bad-modifier
>
> Tested on ChromeOS Zork by turning on the display, running an overlay
> test, and running a YT video.
>
> === Changes from v1 ===
> Explicitly handle DRM_FORMAT_MOD_INVALID modifier.
>
> Cc: Alex Deucher 
> Cc: Bas Nieuwenhuizen 
> Signed-off-by: Mark Yacoub 

Applied.  Thanks!

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c| 13 +
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 18 +++---
>  2 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index afa5f8ad0f563..a947b5aa420d2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -908,6 +908,19 @@ int amdgpu_display_gem_fb_verify_and_init(
>  _fb_funcs);
> if (ret)
> goto err;
> +   /* Verify that the modifier is supported. */
> +   if (!drm_any_plane_has_format(dev, mode_cmd->pixel_format,
> + mode_cmd->modifier[0])) {
> +   struct drm_format_name_buf format_name;
> +   drm_dbg_kms(dev,
> +   "unsupported pixel format %s / modifier 0x%llx\n",
> +   drm_get_format_name(mode_cmd->pixel_format,
> +   &format_name),
> +   mode_cmd->modifier[0]);
> +
> +   ret = -EINVAL;
> +   goto err;
> +   }
>
> ret = amdgpu_display_framebuffer_init(dev, rfb, mode_cmd, obj);
> if (ret)
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 961abf1cf040c..6fc746996c24f 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -3939,6 +3939,7 @@ static bool dm_plane_format_mod_supported(struct 
> drm_plane *plane,
>  {
> struct amdgpu_device *adev = drm_to_adev(plane->dev);
> const struct drm_format_info *info = drm_format_info(format);
> +   int i;
>
> enum dm_micro_swizzle microtile = 
> modifier_gfx9_swizzle_mode(modifier) & 3;
>
> @@ -3946,11 +3947,22 @@ static bool dm_plane_format_mod_supported(struct 
> drm_plane *plane,
> return false;
>
> /*
> -* We always have to allow this modifier, because core DRM still
> -* checks LINEAR support if userspace does not provide modifers.
> +* We always have to allow these modifiers:
> +* 1. Core DRM checks for LINEAR support if userspace does not 
> provide modifiers.
> +* 2. Not passing any modifiers is the same as explicitly passing 
> INVALID.
>  */
> -   if (modifier == DRM_FORMAT_MOD_LINEAR)
> +   if (modifier == DRM_FORMAT_MOD_LINEAR ||
> +   modifier == DRM_FORMAT_MOD_INVALID) {
> return true;
> +   }
> +
> +   /* Check that the modifier is on the list of the plane's supported 
> modifiers. */
> +   for (i = 0; i < plane->modifier_count; i++) {
> +   if (modifier == plane->modifiers[i])
> +   break;
> +   }
> +   if (i == plane->modifier_count)
> +   return false;
>
> /*
>  * The arbitrary tiling support for multiplane formats has not been 
> hooked
> --
> 2.31.0.291.g576ba9dcdaf-goog
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4/4] drm/amdgpu: Introduce new SETUP_TMR interface

2021-03-25 Thread Oak Zeng
This new interface passes both the virtual and physical address
to PSP. It is backward compatible with the old interface.

v2: use a macro to simplify tmr physical address calc (Lijo)

Signed-off-by: Oak Zeng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 12 +---
 drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 11 ++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 1005124..d224c53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -331,8 +331,12 @@ psp_cmd_submit_buf(struct psp_context *psp,
 
 static void psp_prep_tmr_cmd_buf(struct psp_context *psp,
 struct psp_gfx_cmd_resp *cmd,
-uint64_t tmr_mc, uint32_t size)
+uint64_t tmr_mc, struct amdgpu_bo *tmr_bo)
 {
+   struct amdgpu_device *adev = psp->adev;
+   uint32_t size = amdgpu_bo_size(tmr_bo);
+   uint64_t tmr_pa = amdgpu_gmc_gpu_pa(adev, tmr_bo);
+
if (amdgpu_sriov_vf(psp->adev))
cmd->cmd_id = GFX_CMD_ID_SETUP_VMR;
else
@@ -340,6 +344,9 @@ static void psp_prep_tmr_cmd_buf(struct psp_context *psp,
cmd->cmd.cmd_setup_tmr.buf_phy_addr_lo = lower_32_bits(tmr_mc);
cmd->cmd.cmd_setup_tmr.buf_phy_addr_hi = upper_32_bits(tmr_mc);
cmd->cmd.cmd_setup_tmr.buf_size = size;
+   cmd->cmd.cmd_setup_tmr.bitfield.virt_phy_addr = 1;
+   cmd->cmd.cmd_setup_tmr.system_phy_addr_lo = lower_32_bits(tmr_pa);
+   cmd->cmd.cmd_setup_tmr.system_phy_addr_hi = upper_32_bits(tmr_pa);
 }
 
 static void psp_prep_load_toc_cmd_buf(struct psp_gfx_cmd_resp *cmd,
@@ -459,8 +466,7 @@ static int psp_tmr_load(struct psp_context *psp)
if (!cmd)
return -ENOMEM;
 
-   psp_prep_tmr_cmd_buf(psp, cmd, psp->tmr_mc_addr,
-amdgpu_bo_size(psp->tmr_bo));
+   psp_prep_tmr_cmd_buf(psp, cmd, psp->tmr_mc_addr, psp->tmr_bo);
DRM_INFO("reserve 0x%lx from 0x%llx for PSP TMR\n",
 amdgpu_bo_size(psp->tmr_bo), psp->tmr_mc_addr);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h 
b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
index 8c1d0b5..edc5d75 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
+++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
@@ -170,10 +170,19 @@ struct psp_gfx_cmd_setup_tmr
 uint32_tbuf_phy_addr_lo;   /* bits [31:0] of GPU Virtual 
address of TMR buffer (must be 4 KB aligned) */
 uint32_tbuf_phy_addr_hi;   /* bits [63:32] of GPU Virtual 
address of TMR buffer */
 uint32_tbuf_size;  /* buffer size in bytes (must be 
multiple of 4 KB) */
+union {
+   struct {
+   uint32_tsriov_enabled:1; /* whether the device runs 
under SR-IOV*/
+   uint32_tvirt_phy_addr:1; /* driver passes both virtual 
and physical address to PSP*/
+   uint32_treserved:30;
+   } bitfield;
+   uint32_ttmr_flags;
+};
+uint32_tsystem_phy_addr_lo;/* bits [31:0] of system 
physical address of TMR buffer (must be 4 KB aligned) */
+uint32_tsystem_phy_addr_hi;/* bits [63:32] of system 
physical address of TMR buffer */
 
 };
 
-
 /* FW types for GFX_CMD_ID_LOAD_IP_FW command. Limit 31. */
 enum psp_gfx_fw_type {
GFX_FW_TYPE_NONE= 0,/* */
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/4] Revert "drm/amdgpu: workaround the TMR MC address issue (v2)"

2021-03-25 Thread Oak Zeng
This reverts commit 34a33d4683cba7ba63c894132efb1998c0217631.

Signed-off-by: Oak Zeng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  |  9 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c  | 10 --
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 21 ++---
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c  | 10 --
 4 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 7cd9d34..a9e0bba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -217,15 +217,6 @@ struct amdgpu_gmc {
 */
u64 fb_start;
u64 fb_end;
-   /* In the case of use GART table for vmid0 FB access, [fb_start, fb_end]
-* will be squeezed to GART aperture. But we have a PSP FW issue to fix
-* for now. To temporarily workaround the PSP FW issue, added below two
-* variables to remember the original fb_start/end to re-enable FB
-* aperture to workaround the PSP FW issue. Will delete it after we
-* get a proper PSP FW fix.
-*/
-   u64 fb_start_original;
-   u64 fb_end_original;
unsignedvram_width;
u64 real_vram_size;
int vram_mtrr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 5c71c5c..1005124 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -410,16 +410,6 @@ static int psp_tmr_init(struct psp_context *psp)
  AMDGPU_GEM_DOMAIN_VRAM,
  &psp->tmr_bo, &psp->tmr_mc_addr, pptr);
 
-   /* workaround the tmr_mc_addr:
-* PSP requires an address in FB aperture. Right now driver produce
-* tmr_mc_addr in the GART aperture. Convert it back to FB aperture
-* for PSP. Will revert it after we get a fix from PSP FW.
-*/
-   if (psp->adev->asic_type == CHIP_ALDEBARAN) {
-   psp->tmr_mc_addr -= psp->adev->gmc.fb_start;
-   psp->tmr_mc_addr += psp->adev->gmc.fb_start_original;
-   }
-
return ret;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 7beef4c..8c8f0d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -154,21 +154,12 @@ static void gfxhub_v1_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
 * FB aperture and AGP aperture. Disable them.
 */
if (adev->gmc.pdb0_bo) {
-   if (adev->asic_type == CHIP_ALDEBARAN) {
-   WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_TOP, 
adev->gmc.fb_end_original >> 24);
-   WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_BASE, 
adev->gmc.fb_start_original >> 24);
-   WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, 0);
-   WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, 0xFF);
-   WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR, 
adev->gmc.fb_start_original >> 18);
-   WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR, 
adev->gmc.fb_end_original >> 18);
-   } else {
-   WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_TOP, 0);
-   WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_BASE, 
0x00FF);
-   WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, 0);
-   WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, 0xFF);
-   WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR, 
0x3FFF);
-   WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR, 
0);
-   }
+   WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_TOP, 0);
+   WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_BASE, 0x00FF);
+   WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, 0);
+   WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, 0xFF);
+   WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR, 
0x3FFF);
+   WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR, 0);
}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c
index 8862ac2..8bb36d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c
@@ -51,8 +51,6 @@ static u64 mmhub_v1_7_get_fb_location(struct amdgpu_device 
*adev)
 
adev->gmc.fb_start = base;
adev->gmc.fb_end = top;
-   adev->gmc.fb_start_original = base;
-   adev->gmc.fb_end_original = top;
 
return base;
 }
@@ -148,10 +146,10 @@ static void mmhub_v1_7_init_system_aperture_regs(struct 
amdgpu_device *adev)
if (adev->gmc.pdb0_bo) {
WREG32_SOC15(MMHUB, 0, 

[PATCH 1/4] drm/amdgpu: Macros for vram physical addr calculation

2021-03-25 Thread Oak Zeng
Add one macro to calculate BO's GPU physical address.
And another one to calculate BO's CPU physical address.

Signed-off-by: Oak Zeng 
Suggested-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 2ee8d1b..7cd9d34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -283,6 +283,9 @@ struct amdgpu_gmc {
 #define amdgpu_gmc_get_vm_pde(adev, level, dst, flags) 
(adev)->gmc.gmc_funcs->get_vm_pde((adev), (level), (dst), (flags))
 #define amdgpu_gmc_get_vm_pte(adev, mapping, flags) 
(adev)->gmc.gmc_funcs->get_vm_pte((adev), (mapping), (flags))
 #define amdgpu_gmc_get_vbios_fb_size(adev) 
(adev)->gmc.gmc_funcs->get_vbios_fb_size((adev))
+#define amdgpu_gmc_gpu_va2pa(adev, va) (va - (adev)->gmc.vram_start + (adev)->vm_manager.vram_base_offset)
+#define amdgpu_gmc_gpu_pa(adev, bo) amdgpu_gmc_gpu_va2pa(adev, amdgpu_bo_gpu_offset(bo))
+#define amdgpu_gmc_cpu_pa(adev, bo) (amdgpu_bo_gpu_offset(bo) - (adev)->gmc.vram_start + (adev)->gmc.aper_base)
 
 /**
  * amdgpu_gmc_vram_full_visible - Check if full VRAM is visible through the BAR
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/4] drm/amdgpu: use macros to simplify codes

2021-03-25 Thread Oak Zeng
Use amdgpu_gmc_gpu_pa and amdgpu_gmc_cpu_pa
to simplify code. No logic change.

Signed-off-by: Oak Zeng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   | 4 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  | 4 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c  | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  | 4 +---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c   | 3 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 3 +--
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 3 +--
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c  | 3 +--
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c  | 3 +--
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c  | 3 +--
 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c  | 3 +--
 15 files changed, 15 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7357c1e..33581ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1831,8 +1831,7 @@ static int get_sg_table(struct amdgpu_device *adev,
goto out;
 
if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_VRAM) {
-   bus_addr = amdgpu_bo_gpu_offset(bo) - adev->gmc.vram_start
-  + adev->gmc.aper_base + offset;
+   bus_addr = amdgpu_gmc_cpu_pa(adev, bo) + offset;
aper_limit = adev->gmc.aper_base + adev->gmc.aper_size;
if (bus_addr + (chunks * page_size) > aper_limit) {
pr_err("sg: bus addr not inside pci aperture\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index fd04a3a..dbeb774 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -211,7 +211,6 @@ static int amdgpufb_create(struct drm_fb_helper *helper,
struct drm_gem_object *gobj = NULL;
struct amdgpu_bo *abo = NULL;
int ret;
-   unsigned long tmp;
 
memset(_cmd, 0, sizeof(mode_cmd));
mode_cmd.width = sizes->surface_width;
@@ -252,8 +251,7 @@ static int amdgpufb_create(struct drm_fb_helper *helper,
 
info->fbops = _ops;
 
-   tmp = amdgpu_bo_gpu_offset(abo) - adev->gmc.vram_start;
-   info->fix.smem_start = adev->gmc.aper_base + tmp;
+   info->fix.smem_start = amdgpu_gmc_cpu_pa(adev, abo);
info->fix.smem_len = amdgpu_bo_size(abo);
info->screen_base = amdgpu_bo_kptr(abo);
info->screen_size = amdgpu_bo_size(abo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 46a10c1..391d41f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -475,9 +475,7 @@ int amdgpu_gem_dgma_ioctl(struct drm_device *dev, void 
*data,
r = -EINVAL;
goto release_object;
}
-   args->addr = amdgpu_bo_gpu_offset(abo);
-   args->addr -= adev->gmc.vram_start;
-   args->addr += adev->gmc.aper_base;
+   args->addr = amdgpu_gmc_cpu_pa(adev, abo);
break;
default:
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index e6344a8..f02143a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -671,8 +671,7 @@ void amdgpu_gmc_init_pdb0(struct amdgpu_device *adev)
u64 vram_addr = adev->vm_manager.vram_base_offset -
adev->gmc.xgmi.physical_node_id * 
adev->gmc.xgmi.node_segment_size;
u64 vram_end = vram_addr + vram_size;
-   u64 gart_ptb_gpu_pa = amdgpu_bo_gpu_offset(adev->gart.bo) +
-   adev->vm_manager.vram_base_offset - adev->gmc.vram_start;
+   u64 gart_ptb_gpu_pa = amdgpu_gmc_gpu_pa(adev, adev->gart.bo);
 
flags |= AMDGPU_PTE_VALID | AMDGPU_PTE_READABLE;
flags |= AMDGPU_PTE_WRITEABLE;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d1211ef..9c18e00 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2234,9 +2234,7 @@ static int amdgpu_ssg_init(struct amdgpu_device *adev)
 
	init_completion(&adev->ssg.cmp);
 
-   res.start = adev->gmc.aper_base +
-   (amdgpu_bo_gpu_offset(adev->direct_gma.dgma_bo) -
-adev->gmc.vram_start);
+   res.start = amdgpu_gmc_cpu_pa(adev, adev->direct_gma.dgma_bo);
res.end = res.start + amdgpu_bo_size(adev->direct_gma.dgma_bo) - 1;
res.name = "DirectGMA";
 
diff --git 

Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-25 Thread Andrey Grodzovsky

There are a few issues here, like how you handle non-guilty signaled jobs
in drm_sched_stop: currently it looks like you don't call put for them
and just explicitly free them as before. Also, sched->free_guilty
seems useless with the new approach. Do we even need the cleanup
mechanism in drm_sched_get_cleanup_job with this approach...

But first - We need Christian to express his opinion on this since
I think he opposed refcounting jobs and that we should concentrate on
fences instead.

Christian - can you chime in here ?

Andrey

On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote:

[AMD Official Use Only - Internal Distribution Only]


Hi, Andrey

Thank you for your good opinions.

I agree with you that refcounting could solve the get_cleanup_job
concurrency gracefully, with no need to re-insert the
job back anymore.

I quickly made a draft for this idea as follows:

How do you like it? I will start implement to it after I got your 
acknowledge.


Thanks,

Jack

+void drm_job_get(struct drm_sched_job *s_job)
+{
+	kref_get(&s_job->refcount);
+}
+
+void drm_job_do_release(struct kref *ref)
+{
+	struct drm_sched_job *s_job;
+	struct drm_gpu_scheduler *sched;
+
+	s_job = container_of(ref, struct drm_sched_job, refcount);
+	sched = s_job->sched;
+	sched->ops->free_job(s_job);
+}
+
+void drm_job_put(struct drm_sched_job *s_job)
+{
+	kref_put(&s_job->refcount, drm_job_do_release);
+}
+
 static void drm_sched_job_begin(struct drm_sched_job *s_job)
 {
 	struct drm_gpu_scheduler *sched = s_job->sched;
 
+	kref_init(&s_job->refcount);
+	drm_job_get(s_job);
 	spin_lock(&sched->job_list_lock);
 	list_add_tail(&s_job->node, &sched->ring_mirror_list);
 	drm_sched_start_timeout(sched);
@@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
 		 * is parked at which point it's safe.
 		 */
-		list_del_init(&job->node);
+		drm_job_get(job);
 		spin_unlock(&sched->job_list_lock);
 
 		job->sched->ops->timedout_job(job);
-
+		drm_job_put(job);
 		/*
 		 * Guilty job did complete and hence needs to be manually removed
 		 * See drm_sched_stop doc.
 		 */
 		if (sched->free_guilty) {
-			job->sched->ops->free_job(job);
 			sched->free_guilty = false;
 		}
 	} else {
@@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 
-	/*
-	 * Reinsert back the bad job here - now it's safe as
-	 * drm_sched_get_cleanup_job cannot race against us and release the
-	 * bad job at this point - we parked (waited for) any in progress
-	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-	 * now until the scheduler thread is unparked.
-	 */
-	if (bad && bad->sched == sched)
-		/*
-		 * Add at the head of the queue to reflect it was the earliest
-		 * job extracted.
-		 */
-		list_add(&bad->node, &sched->ring_mirror_list);
-
 	/*
 	 * Iterate the job list from later to earlier one and either deactive
 	 * their HW callbacks or remove them from mirror list if they already
@@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
 					 kthread_should_stop());
 
 		if (cleanup_job) {
-			sched->ops->free_job(cleanup_job);
+			drm_job_put(cleanup_job);
 			/* queue timeout for next job */
 			drm_sched_start_timeout(sched);
 		}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 5a1f068af1c2..b80513eec90f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
  * to schedule the job.
  */
 struct drm_sched_job {
+	struct kref refcount;
 	struct spsc_node		queue_node;
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;
@@ -198,6 +199,7 @@ struct drm_sched_job {
 	enum drm_sched_priority		s_priority;
 	struct drm_sched_entity		*entity;
 	struct dma_fence_cb		cb;
+
 };
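
The draft wires a kref into drm_sched_job so the last put, not list ownership, decides when free_job() runs. A userspace sketch of that lifecycle, assuming the get/put placement in the draft (mock_ names stand in for the kernel structures; atomic_int stands in for struct kref):

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal stand-in for a refcounted scheduler job. */
struct mock_sched_job {
	atomic_int refcount;
	int freed; /* set by the release path, for observation only */
};

static void mock_job_free(struct mock_sched_job *job)
{
	job->freed = 1; /* the kernel would call sched->ops->free_job() here */
}

static void mock_job_init(struct mock_sched_job *job)
{
	atomic_init(&job->refcount, 1); /* kref_init() starts at 1 */
	job->freed = 0;
}

static void mock_job_get(struct mock_sched_job *job)
{
	atomic_fetch_add(&job->refcount, 1);
}

/* Drop one reference; free the job when the last one goes away,
 * mirroring kref_put() with a release callback. */
static void mock_job_put(struct mock_sched_job *job)
{
	if (atomic_fetch_sub(&job->refcount, 1) == 1)
		mock_job_free(job);
}
```

The point of the design is visible here: the timeout handler's get/put pair can race with the cleanup path's final put, and whoever drops the count to zero frees the job exactly once.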

*From:* Grodzovsky, Andrey 
*Sent:* Friday, March 19, 2021 12:17 AM
*To:* Zhang, Jack (Jian) ; Christian König 
; dri-de...@lists.freedesktop.org; 
amd-gfx@lists.freedesktop.org; Koenig, Christian 
; Liu, Monk ; Deng, Emily 
; Rob Herring ; Tomeu Vizoso 
; Steven Price 
*Subject:* Re: [PATCH v3] 

Re: Need to support mixed memory mappings with HMM

2021-03-25 Thread Felix Kuehling

Am 2021-03-25 um 12:22 p.m. schrieb Christian König:
>> My idea is to change the amdgpu_vm_update_mapping interface to use
>> some
>> high-bit in the pages_addr array to indicate the page type. For the
>> default page type (0) nothing really changes for the callers. The
>> "flags" parameter needs to become a pointer to an array that gets
>> indexed by the high bits from the pages_addr array. For existing
>> callers
>> it's as easy as changing flags to &flags (array of size 1). For
>> HMM we
>> would pass a pointer to a real array.
> Yeah, I've thought about stuff like that as well for a while.
>
> Problem is that this won't work that easily. We assume at many places
> that the flags don't change for the range in question.
 I think some lower level functions assume that the flags stay the same
 for physically contiguous ranges. But if you use the high-bits to
 encode
 the page type, the ranges won't be contiguous any more. So you can
 change page flags for different contiguous ranges.
>>> Yeah, but then you also get absolutely zero THP and fragment flags
>>> support.
>> As long as you have a contiguous 2MB page with the same page type, I
>> think you can still get a THP mapping in the GPU page table. But if one
>> page in the middle of a 2MB page has a different page type, that will
>> break the THP mapping, as it should.
>
> Yeah, but currently we detect that before we call down into the
> functions to update the tables.
>
> When you give a mixed list the chance for that is just completely gone.

Currently the detection of contiguous ranges is in amdgpu_vm_update_mapping:

> 	if (num_entries > AMDGPU_GPU_PAGES_IN_CPU_PAGE) {
> 		uint64_t pfn = cursor.start >> PAGE_SHIFT;
> 		uint64_t count;
>
> 		contiguous = pages_addr[pfn + 1] ==
> 			pages_addr[pfn] + PAGE_SIZE;
>
> 		tmp = num_entries / AMDGPU_GPU_PAGES_IN_CPU_PAGE;
> 		for (count = 2; count < tmp; ++count) {
> 			uint64_t idx = pfn + count;
>
> 			if (contiguous != (pages_addr[idx] ==
> 				pages_addr[idx - 1] + PAGE_SIZE))
> 				break;
> 		}
> 		num_entries = count * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
> 	}

If a page type changes from one page to the next, it will end a
contiguous range and call into the lower level (amdgpu_vm_update_ptes).
I don't think anything needs to change in amdgpu_vm_update_ptes, because
all pages contiguous passed to it use the same page type, so the same
flags. And the existing code in amdgpu_vm_update_mapping already does
the right thing. I honestly don't see the problem.
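
The argument above can be sketched in a few lines: if a page type lives in the high bits of pages_addr, a type change makes two neighbouring entries non-adjacent, so the same contiguity test that already exists splits the range at the boundary. This is a userspace model of the quoted loop, with an arbitrarily chosen bit 62 as the hypothetical type bit (not the real encoding):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 0x1000ULL
#define MOCK_PAGE_TYPE_BIT (1ULL << 62) /* hypothetical type encoding */

/* Return the length of the run starting at pfn over which the
 * contiguity state of the first pair holds, mirroring the detection
 * loop quoted above from amdgpu_vm_update_mapping(). */
static size_t mock_contig_run(const uint64_t *pages_addr, size_t pfn,
			      size_t n)
{
	int contiguous =
		pages_addr[pfn + 1] == pages_addr[pfn] + MOCK_PAGE_SIZE;
	size_t count;

	for (count = 2; count < n; ++count) {
		size_t idx = pfn + count;

		/* A type bit flipped in the high bits breaks address
		 * adjacency, so the run ends here automatically. */
		if (contiguous != (pages_addr[idx] ==
				   pages_addr[idx - 1] + MOCK_PAGE_SIZE))
			break;
	}
	return count;
}
```

Nothing below this function needs to know about page types at all, which is the crux of Felix's point.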

Regards,
  Felix


>
> Regards,
> Christian. 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Need to support mixed memory mappings with HMM

2021-03-25 Thread Christian König

Am 25.03.21 um 17:20 schrieb Felix Kuehling:

Am 2021-03-25 um 12:16 p.m. schrieb Christian König:

Am 25.03.21 um 17:14 schrieb Felix Kuehling:

Am 2021-03-25 um 12:10 p.m. schrieb Christian König:

Am 25.03.21 um 17:03 schrieb Felix Kuehling:

Hi,

This is a long one with a proposal for a pretty significant
redesign of
how we handle migrations and VRAM management with HMM. This is
based on
my own debugging and reading of the migrate_vma helpers, as well as
Alex's problems with migrations on A+A. I hope we can discuss this
next
Monday after you've had some time to digest it.

I did some debugging yesterday and found that migrations to VRAM can
fail for some pages. The current migration helpers have many corner
cases where a page cannot be migrated. Some of them may be fixable
(adding support for THP), others are not (locked pages are skipped to
avoid deadlocks). Therefore I think our current code is too inflexible
when it assumes that a range is entirely in one place.

Alex also ran into some funny issues with COW on A+A where some pages
get faulted back to system memory. I think a lot of the problems here
will get easier once we support mixed mappings.

Mixed GPU mappings
===

The idea is to remove any assumptions that an entire svm_range is in
one place. Instead hmm_range_fault gives us a list of pages, some of
which are system memory and others are device_private or
device_generic.

We will need an amdgpu_vm interface that lets us map mixed page arrays
where different pages use different PTE flags. We can have at least 3
different types of pages in one mapping:

    1. System memory (S-bit set)
    2. Local memory (S-bit cleared, MTYPE for local memory)
    3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory)

My idea is to change the amdgpu_vm_update_mapping interface to use
some
high-bit in the pages_addr array to indicate the page type. For the
default page type (0) nothing really changes for the callers. The
"flags" parameter needs to become a pointer to an array that gets
indexed by the high bits from the pages_addr array. For existing
callers
it's as easy as changing flags to &flags (array of size 1). For HMM we
would pass a pointer to a real array.

Yeah, I've thought about stuff like that as well for a while.

Problem is that this won't work that easily. We assume at many places
that the flags don't change for the range in question.

I think some lower level functions assume that the flags stay the same
for physically contiguous ranges. But if you use the high-bits to encode
the page type, the ranges won't be contiguous any more. So you can
change page flags for different contiguous ranges.

Yeah, but then you also get absolutely zero THP and fragment flags
support.

As long as you have a contiguous 2MB page with the same page type, I
think you can still get a THP mapping in the GPU page table. But if one
page in the middle of a 2MB page has a different page type, that will
break the THP mapping, as it should.


Yeah, but currently we detect that before we call down into the 
functions to update the tables.


When you give a mixed list the chance for that is just completely gone.

Regards,
Christian.



Regards,
   Felix



But I think we could also add those later on.

Regards,
Christian.


Regards,
    Felix



We would somehow need to change that to get the flags directly from
the low bits of the DMA address or something instead.

Christian.


Once this is done, it leads to a number of opportunities for
simplification and better efficiency in kfd_svm:

     * Support migration when cpages != npages
     * Migrate a part of an svm_range without splitting it. No more
   splitting of ranges in CPU page faults
     * Migrate a part of an svm_range in GPU page fault handler. No
more
   migrating the whole range for a single page fault
     * Simplified VRAM management (see below)

With that, svm_range will no longer have an "actual_loc" field. If
we're
not sure where the data is, we need to call migrate. If it's
already in
the right place, then cpages will be 0 and we can skip all the steps
after migrate_vma_setup.

Simplified VRAM management
==

VRAM BOs are no longer associated with pranges. Instead they are
"free-floating", allocated during migration to VRAM, with reference
count for each page that uses the BO. Ref is released in page-release
callback. When the ref count drops to 0, free the BO.

VRAM BO size should match the migration granularity, 2MB by default.
That way the BO can be freed when memory gets migrated out. If
migration
of some pages fails the BO may not be fully occupied. Also some pages
may be released individually on A+A due to COW or other events.

Eviction needs to migrate all the pages still using the BO. If the BO
struct keeps an array of page pointers, that's basically the
migrate.src
for the eviction. Migration calls "try_to_unmap", which has the best
chance of freeing the BO, even when shared by multiple processes.

If we cannot 

Re: [PATCH drm/amdgpu 0/2] Convert sysfs sprintf/snprintf family to sysfs_emit

2021-03-25 Thread Alex Deucher
Applied the series.  Thanks!

Alex

On Wed, Mar 24, 2021 at 5:17 AM Tian Tao  wrote:
>
> Use the generic sysfs_emit() function to take place of
> snprintf/scnprintf, to avoid buffer overrun.
>
> Tian Tao (2):
>   drm/amdgpu: Convert sysfs sprintf/snprintf family to sysfs_emit
>   drm/amd/pm: Convert sysfs sprintf/snprintf family to sysfs_emit
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   |  8 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  |  6 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c  |  8 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 32 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c |  4 +-
>  drivers/gpu/drm/amd/amdgpu/df_v3_6.c |  2 +-
>  drivers/gpu/drm/amd/pm/amdgpu_pm.c   | 88 
> ++--
>  9 files changed, 73 insertions(+), 79 deletions(-)
>
> --
> 2.7.4
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
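
The motivation for the series above: a sysfs show() callback writes into a single page, and sysfs_emit() bakes that contract into the helper instead of trusting every caller to pass PAGE_SIZE to snprintf(). A userspace sketch of the semantics, under the assumption of a 4 KiB page (mock_ names are illustrative; the real sysfs_emit() additionally checks page alignment and WARNs on misuse):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096

/* Model of sysfs_emit(): the buffer length is fixed inside the helper,
 * so no caller can pass a wrong size or overrun the page. Returns the
 * number of characters emitted, like the kernel helper. */
static int mock_sysfs_emit(char *buf, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf, MOCK_PAGE_SIZE, fmt, args);
	va_end(args);

	/* Truncation here would be a bug in the caller's format string;
	 * the real helper warns about it. */
	return len < MOCK_PAGE_SIZE ? len : MOCK_PAGE_SIZE - 1;
}
```

With snprintf() every call site repeats `buf, PAGE_SIZE`, and one wrong size argument is a buffer overrun; centralizing the bound is the whole point of the conversion.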


Re: Need to support mixed memory mappings with HMM

2021-03-25 Thread Felix Kuehling
Am 2021-03-25 um 12:16 p.m. schrieb Christian König:
> Am 25.03.21 um 17:14 schrieb Felix Kuehling:
>> Am 2021-03-25 um 12:10 p.m. schrieb Christian König:
>>>
>>> Am 25.03.21 um 17:03 schrieb Felix Kuehling:
 Hi,

 This is a long one with a proposal for a pretty significant
 redesign of
 how we handle migrations and VRAM management with HMM. This is
 based on
 my own debugging and reading of the migrate_vma helpers, as well as
 Alex's problems with migrations on A+A. I hope we can discuss this
 next
 Monday after you've had some time to digest it.

 I did some debugging yesterday and found that migrations to VRAM can
 fail for some pages. The current migration helpers have many corner
 cases where a page cannot be migrated. Some of them may be fixable
 (adding support for THP), others are not (locked pages are skipped to
 avoid deadlocks). Therefore I think our current code is too inflexible
 when it assumes that a range is entirely in one place.

 Alex also ran into some funny issues with COW on A+A where some pages
 get faulted back to system memory. I think a lot of the problems here
 will get easier once we support mixed mappings.

 Mixed GPU mappings
 ===

 The idea is to remove any assumptions that an entire svm_range is in
 one place. Instead hmm_range_fault gives us a list of pages, some of
 which are system memory and others are device_private or
 device_generic.

 We will need an amdgpu_vm interface that lets us map mixed page arrays
 where different pages use different PTE flags. We can have at least 3
 different types of pages in one mapping:

    1. System memory (S-bit set)
    2. Local memory (S-bit cleared, MTYPE for local memory)
    3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory)

 My idea is to change the amdgpu_vm_update_mapping interface to use
 some
 high-bit in the pages_addr array to indicate the page type. For the
 default page type (0) nothing really changes for the callers. The
 "flags" parameter needs to become a pointer to an array that gets
 indexed by the high bits from the pages_addr array. For existing
 callers
 it's as easy as changing flags to &flags (array of size 1). For HMM we
 would pass a pointer to a real array.
>>> Yeah, I've thought about stuff like that as well for a while.
>>>
>>> Problem is that this won't work that easily. We assume at many places
>>> that the flags don't change for the range in question.
>> I think some lower level functions assume that the flags stay the same
>> for physically contiguous ranges. But if you use the high-bits to encode
>> the page type, the ranges won't be contiguous any more. So you can
>> change page flags for different contiguous ranges.
>
> Yeah, but then you also get absolutely zero THP and fragment flags
> support.

As long as you have a contiguous 2MB page with the same page type, I
think you can still get a THP mapping in the GPU page table. But if one
page in the middle of a 2MB page has a different page type, that will
break the THP mapping, as it should.
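
This THP argument reduces to a simple predicate: a 2 MB run is eligible for a huge PTE only if all 512 backing pages are physically contiguous and carry the same page type. A userspace sketch, again using a hypothetical high type bit (mock_ names are assumptions, not amdgpu code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE  0x1000ULL
#define MOCK_HUGE_PAGES 512 /* 2MB / 4KB */
#define MOCK_PAGE_TYPE_BIT (1ULL << 62) /* hypothetical type encoding */

/* A huge PTE is possible only if every page follows its predecessor by
 * exactly one page. A different type in the high bits of any entry
 * breaks adjacency, and therefore breaks the THP, as it should. */
static int mock_huge_pte_ok(const uint64_t *pages_addr, size_t pfn)
{
	size_t i;

	for (i = 1; i < MOCK_HUGE_PAGES; ++i) {
		if (pages_addr[pfn + i] !=
		    pages_addr[pfn + i - 1] + MOCK_PAGE_SIZE)
			return 0;
	}
	return 1;
}
```

So a single differently-typed page in the middle of the 2 MB range falls back to 4 KB PTEs for that range, which matches the behaviour Felix describes.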

Regards,
  Felix


>
> But I think we could also add those later on.
>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>>
>>
>>> We would somehow need to change that to get the flags directly from
>>> the low bits of the DMA address or something instead.
>>>
>>> Christian.
>>>
 Once this is done, it leads to a number of opportunities for
 simplification and better efficiency in kfd_svm:

     * Support migration when cpages != npages
     * Migrate a part of an svm_range without splitting it. No more
   splitting of ranges in CPU page faults
     * Migrate a part of an svm_range in GPU page fault handler. No
 more
   migrating the whole range for a single page fault
     * Simplified VRAM management (see below)

 With that, svm_range will no longer have an "actual_loc" field. If
 we're
 not sure where the data is, we need to call migrate. If it's
 already in
 the right place, then cpages will be 0 and we can skip all the steps
 after migrate_vma_setup.

 Simplified VRAM management
 ==

 VRAM BOs are no longer associated with pranges. Instead they are
 "free-floating", allocated during migration to VRAM, with reference
 count for each page that uses the BO. Ref is released in page-release
 callback. When the ref count drops to 0, free the BO.

 VRAM BO size should match the migration granularity, 2MB by default.
 That way the BO can be freed when memory gets migrated out. If
 migration
 of some pages fails the BO may not be fully occupied. Also some pages
 may be released individually on A+A due to COW or other events.

 Eviction needs to migrate all the pages still using the BO. If the BO
 struct 

Re: Need to support mixed memory mappings with HMM

2021-03-25 Thread Christian König

Am 25.03.21 um 17:14 schrieb Felix Kuehling:

Am 2021-03-25 um 12:10 p.m. schrieb Christian König:


Am 25.03.21 um 17:03 schrieb Felix Kuehling:

Hi,

This is a long one with a proposal for a pretty significant redesign of
how we handle migrations and VRAM management with HMM. This is based on
my own debugging and reading of the migrate_vma helpers, as well as
Alex's problems with migrations on A+A. I hope we can discuss this next
Monday after you've had some time to digest it.

I did some debugging yesterday and found that migrations to VRAM can
fail for some pages. The current migration helpers have many corner
cases where a page cannot be migrated. Some of them may be fixable
(adding support for THP), others are not (locked pages are skipped to
avoid deadlocks). Therefore I think our current code is too inflexible
when it assumes that a range is entirely in one place.

Alex also ran into some funny issues with COW on A+A where some pages
get faulted back to system memory. I think a lot of the problems here
will get easier once we support mixed mappings.

Mixed GPU mappings
===

The idea is to remove any assumptions that an entire svm_range is in
one place. Instead hmm_range_fault gives us a list of pages, some of
which are system memory and others are device_private or device_generic.

We will need an amdgpu_vm interface that lets us map mixed page arrays
where different pages use different PTE flags. We can have at least 3
different types of pages in one mapping:

   1. System memory (S-bit set)
   2. Local memory (S-bit cleared, MTYPE for local memory)
   3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory)

My idea is to change the amdgpu_vm_update_mapping interface to use some
high-bit in the pages_addr array to indicate the page type. For the
default page type (0) nothing really changes for the callers. The
"flags" parameter needs to become a pointer to an array that gets
indexed by the high bits from the pages_addr array. For existing callers
it's as easy as changing flags to &flags (array of size 1). For HMM we
would pass a pointer to a real array.

Yeah, I've thought about stuff like that as well for a while.

Problem is that this won't work that easily. We assume at many places
that the flags don't change for the range in question.

I think some lower level functions assume that the flags stay the same
for physically contiguous ranges. But if you use the high-bits to encode
the page type, the ranges won't be contiguous any more. So you can
change page flags for different contiguous ranges.


Yeah, but then you also get absolutely zero THP and fragment flags support.

But I think we could also add those later on.

Regards,
Christian.



Regards,
   Felix



We would somehow need to change that to get the flags directly from
the low bits of the DMA address or something instead.

Christian.


Once this is done, it leads to a number of opportunities for
simplification and better efficiency in kfd_svm:

    * Support migration when cpages != npages
    * Migrate a part of an svm_range without splitting it. No more
  splitting of ranges in CPU page faults
    * Migrate a part of an svm_range in GPU page fault handler. No more
  migrating the whole range for a single page fault
    * Simplified VRAM management (see below)

With that, svm_range will no longer have an "actual_loc" field. If we're
not sure where the data is, we need to call migrate. If it's already in
the right place, then cpages will be 0 and we can skip all the steps
after migrate_vma_setup.

Simplified VRAM management
==

VRAM BOs are no longer associated with pranges. Instead they are
"free-floating", allocated during migration to VRAM, with reference
count for each page that uses the BO. Ref is released in page-release
callback. When the ref count drops to 0, free the BO.

VRAM BO size should match the migration granularity, 2MB by default.
That way the BO can be freed when memory gets migrated out. If migration
of some pages fails the BO may not be fully occupied. Also some pages
may be released individually on A+A due to COW or other events.

Eviction needs to migrate all the pages still using the BO. If the BO
struct keeps an array of page pointers, that's basically the migrate.src
for the eviction. Migration calls "try_to_unmap", which has the best
chance of freeing the BO, even when shared by multiple processes.

If we cannot guarantee eviction of pages, we cannot use TTM for VRAM
allocations. Need to use amdgpu_vram_mgr. Need a way to detect memory
pressure so we can start evicting memory.
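
The "free-floating" BO idea above boils down to per-page reference counting on a 2 MB allocation: migration in takes a reference per page, the page-release callback drops one, and the BO's VRAM is returned when the last page goes away. A userspace sketch with illustrative names (not the real amdgpu structures):

```c
#include <assert.h>
#include <string.h>

#define MOCK_BO_PAGES 512 /* 2MB migration granularity / 4KB pages */

/* Stand-in for a free-floating VRAM BO whose lifetime is the sum of
 * per-page references. */
struct mock_vram_bo {
	int page_ref[MOCK_BO_PAGES]; /* one count per resident page */
	int total_refs;
	int freed; /* stands in for actually releasing the VRAM */
};

/* Migration to VRAM: each successfully migrated page pins the BO. */
static void mock_bo_page_get(struct mock_vram_bo *bo, int page)
{
	bo->page_ref[page]++;
	bo->total_refs++;
}

/* Page-release callback: drop one page's reference and free the BO
 * when no page uses it any more. */
static void mock_bo_page_release(struct mock_vram_bo *bo, int page)
{
	bo->page_ref[page]--;
	if (--bo->total_refs == 0)
		bo->freed = 1;
}
```

This is what makes partial migration harmless: a page released early (COW, partial migration failure) just leaves the BO partially occupied until the remaining pages migrate out.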

Regards,
    Felix



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Need to support mixed memory mappings with HMM

2021-03-25 Thread Felix Kuehling
Am 2021-03-25 um 12:10 p.m. schrieb Christian König:
>
>
> Am 25.03.21 um 17:03 schrieb Felix Kuehling:
>> Hi,
>>
>> This is a long one with a proposal for a pretty significant redesign of
>> how we handle migrations and VRAM management with HMM. This is based on
>> my own debugging and reading of the migrate_vma helpers, as well as
>> Alex's problems with migrations on A+A. I hope we can discuss this next
>> Monday after you've had some time to digest it.
>>
>> I did some debugging yesterday and found that migrations to VRAM can
>> fail for some pages. The current migration helpers have many corner
>> cases where a page cannot be migrated. Some of them may be fixable
>> (adding support for THP), others are not (locked pages are skipped to
>> avoid deadlocks). Therefore I think our current code is too inflexible
>> when it assumes that a range is entirely in one place.
>>
>> Alex also ran into some funny issues with COW on A+A where some pages
>> get faulted back to system memory. I think a lot of the problems here
>> will get easier once we support mixed mappings.
>>
>> Mixed GPU mappings
>> ===
>>
>> The idea is to remove any assumptions that an entire svm_range is in
>> one place. Instead hmm_range_fault gives us a list of pages, some of
>> which are system memory and others are device_private or device_generic.
>>
>> We will need an amdgpu_vm interface that lets us map mixed page arrays
>> where different pages use different PTE flags. We can have at least 3
>> different types of pages in one mapping:
>>
>>   1. System memory (S-bit set)
>>   2. Local memory (S-bit cleared, MTYPE for local memory)
>>   3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory)
>>
>> My idea is to change the amdgpu_vm_update_mapping interface to use some
>> high-bit in the pages_addr array to indicate the page type. For the
>> default page type (0) nothing really changes for the callers. The
>> "flags" parameter needs to become a pointer to an array that gets
>> indexed by the high bits from the pages_addr array. For existing callers
>> it's as easy as changing flags to &flags (array of size 1). For HMM we
>> would pass a pointer to a real array.
>
> Yeah, I've thought about stuff like that as well for a while.
>
> Problem is that this won't work that easily. We assume at many places
> that the flags don't change for the range in question.

I think some lower level functions assume that the flags stay the same
for physically contiguous ranges. But if you use the high-bits to encode
the page type, the ranges won't be contiguous any more. So you can
change page flags for different contiguous ranges.

Regards,
  Felix


>
> We would somehow need to change that to get the flags directly from
> the low bits of the DMA address or something instead.
>
> Christian.
>
>>
>> Once this is done, it leads to a number of opportunities for
>> simplification and better efficiency in kfd_svm:
>>
>>    * Support migration when cpages != npages
>>    * Migrate a part of an svm_range without splitting it. No more
>>  splitting of ranges in CPU page faults
>>    * Migrate a part of an svm_range in GPU page fault handler. No more
>>  migrating the whole range for a single page fault
>>    * Simplified VRAM management (see below)
>>
>> With that, svm_range will no longer have an "actual_loc" field. If we're
>> not sure where the data is, we need to call migrate. If it's already in
>> the right place, then cpages will be 0 and we can skip all the steps
>> after migrate_vma_setup.
>>
>> Simplified VRAM management
>> ==
>>
>> VRAM BOs are no longer associated with pranges. Instead they are
>> "free-floating", allocated during migration to VRAM, with reference
>> count for each page that uses the BO. Ref is released in page-release
>> callback. When the ref count drops to 0, free the BO.
>>
>> VRAM BO size should match the migration granularity, 2MB by default.
>> That way the BO can be freed when memory gets migrated out. If migration
>> of some pages fails the BO may not be fully occupied. Also some pages
>> may be released individually on A+A due to COW or other events.
>>
>> Eviction needs to migrate all the pages still using the BO. If the BO
>> struct keeps an array of page pointers, that's basically the migrate.src
>> for the eviction. Migration calls "try_to_unmap", which has the best
>> chance of freeing the BO, even when shared by multiple processes.
>>
>> If we cannot guarantee eviction of pages, we cannot use TTM for VRAM
>> allocations. Need to use amdgpu_vram_mgr. Need a way to detect memory
>> pressure so we can start evicting memory.
>>
>> Regards,
>>    Felix
>>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Color mode exposed to user space?

2021-03-25 Thread Alex Deucher
+ dri-devel

I don't think it's currently exposed anywhere.

Alex

On Wed, Mar 24, 2021 at 5:11 AM Werner Sembach  wrote:
>
> Hello,
>
> is the information about which color mode is currently in use for a display
> (RGB, YCbCr444, or YCbCr420) exposed to user space somewhere?
>
> If no: Where would be the best place to put code to expose it to sysfs?
>
> Thanks in advance,
>
> Werner Sembach
>
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Need to support mixed memory mappings with HMM

2021-03-25 Thread Christian König



Am 25.03.21 um 17:03 schrieb Felix Kuehling:

Hi,

This is a long one with a proposal for a pretty significant redesign of
how we handle migrations and VRAM management with HMM. This is based on
my own debugging and reading of the migrate_vma helpers, as well as
Alex's problems with migrations on A+A. I hope we can discuss this next
Monday after you've had some time to digest it.

I did some debugging yesterday and found that migrations to VRAM can
fail for some pages. The current migration helpers have many corner
cases where a page cannot be migrated. Some of them may be fixable
(adding support for THP), others are not (locked pages are skipped to
avoid deadlocks). Therefore I think our current code is too inflexible
when it assumes that a range is entirely in one place.

Alex also ran into some funny issues with COW on A+A where some pages
get faulted back to system memory. I think a lot of the problems here
will get easier once we support mixed mappings.

Mixed GPU mappings
===

The idea is to remove any assumptions that an entire svm_range is in
one place. Instead hmm_range_fault gives us a list of pages, some of
which are system memory and others are device_private or device_generic.

We will need an amdgpu_vm interface that lets us map mixed page arrays
where different pages use different PTE flags. We can have at least 3
different types of pages in one mapping:

  1. System memory (S-bit set)
  2. Local memory (S-bit cleared, MTYPE for local memory)
  3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory)

My idea is to change the amdgpu_vm_update_mapping interface to use some
high-bit in the pages_addr array to indicate the page type. For the
default page type (0) nothing really changes for the callers. The
"flags" parameter needs to become a pointer to an array that gets
indexed by the high bits from the pages_addr array. For existing callers
it's as easy as changing flags to &flags (array of size 1). For HMM we
would pass a pointer to a real array.


Yeah, I've thought about stuff like that as well for a while.

Problem is that this won't work that easily. We assume at many places 
that the flags don't change for the range in question.


We would somehow need to change that to get the flags directly from the 
low bits of the DMA address or something instead.


Christian.



Once this is done, it leads to a number of opportunities for
simplification and better efficiency in kfd_svm:

   * Support migration when cpages != npages
   * Migrate a part of an svm_range without splitting it. No more
 splitting of ranges in CPU page faults
   * Migrate a part of an svm_range in GPU page fault handler. No more
 migrating the whole range for a single page fault
   * Simplified VRAM management (see below)

With that, svm_range will no longer have an "actual_loc" field. If we're
not sure where the data is, we need to call migrate. If it's already in
the right place, then cpages will be 0 and we can skip all the steps
after migrate_vma_setup.

Simplified VRAM management
==

VRAM BOs are no longer associated with pranges. Instead they are
"free-floating", allocated during migration to VRAM, with reference
count for each page that uses the BO. Ref is released in page-release
callback. When the ref count drops to 0, free the BO.

VRAM BO size should match the migration granularity, 2MB by default.
That way the BO can be freed when memory gets migrated out. If migration
of some pages fails the BO may not be fully occupied. Also some pages
may be released individually on A+A due to COW or other events.

Eviction needs to migrate all the pages still using the BO. If the BO
struct keeps an array of page pointers, that's basically the migrate.src
for the eviction. Migration calls "try_to_unmap", which has the best
chance of freeing the BO, even when shared by multiple processes.

If we cannot guarantee eviction of pages, we cannot use TTM for VRAM
allocations. Need to use amdgpu_vram_mgr. Need a way to detect memory
pressure so we can start evicting memory.

Regards,
   Felix



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/radeon/radeon_pm: Convert sysfs sprintf/snprintf family to sysfs_emit

2021-03-25 Thread Alex Deucher
On Wed, Mar 24, 2021 at 2:47 AM Tian Tao  wrote:
>
> Fix the following coccicheck warning:
> drivers/gpu//drm/radeon/radeon_pm.c:521:9-17: WARNING: use scnprintf or
> sprintf
> drivers/gpu//drm/radeon/radeon_pm.c:475:8-16: WARNING: use scnprintf or
> sprintf
> drivers/gpu//drm/radeon/radeon_pm.c:418:8-16: WARNING: use scnprintf or
> sprintf
> drivers/gpu//drm/radeon/radeon_pm.c:363:8-16: WARNING: use scnprintf or
> sprintf
> drivers/gpu//drm/radeon/radeon_pm.c:734:8-16: WARNING: use scnprintf or
> sprintf
> drivers/gpu//drm/radeon/radeon_pm.c:688:8-16: WARNING: use scnprintf or
> sprintf
> drivers/gpu//drm/radeon/radeon_pm.c:704:8-16: WARNING: use scnprintf or
> sprintf
> drivers/gpu//drm/radeon/radeon_pm.c:755:8-16: WARNING: use scnprintf or
> sprintf
>
> Signed-off-by: Tian Tao 

Applied.  Thanks!

Alex

> ---
>  drivers/gpu/drm/radeon/radeon_pm.c | 36 +++++++++++++++++-------------------
>  1 file changed, 17 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_pm.c 
> b/drivers/gpu/drm/radeon/radeon_pm.c
> index 1995dad..dd56fcd 100644
> --- a/drivers/gpu/drm/radeon/radeon_pm.c
> +++ b/drivers/gpu/drm/radeon/radeon_pm.c
> @@ -361,11 +361,10 @@ static ssize_t radeon_get_pm_profile(struct device *dev,
> struct radeon_device *rdev = ddev->dev_private;
> int cp = rdev->pm.profile;
>
> -   return snprintf(buf, PAGE_SIZE, "%s\n",
> -   (cp == PM_PROFILE_AUTO) ? "auto" :
> -   (cp == PM_PROFILE_LOW) ? "low" :
> -   (cp == PM_PROFILE_MID) ? "mid" :
> -   (cp == PM_PROFILE_HIGH) ? "high" : "default");
> +   return sysfs_emit(buf, "%s\n", (cp == PM_PROFILE_AUTO) ? "auto" :
> + (cp == PM_PROFILE_LOW) ? "low" :
> + (cp == PM_PROFILE_MID) ? "mid" :
> + (cp == PM_PROFILE_HIGH) ? "high" : "default");
>  }
>
>  static ssize_t radeon_set_pm_profile(struct device *dev,
> @@ -416,9 +415,8 @@ static ssize_t radeon_get_pm_method(struct device *dev,
> struct radeon_device *rdev = ddev->dev_private;
> int pm = rdev->pm.pm_method;
>
> -   return snprintf(buf, PAGE_SIZE, "%s\n",
> -   (pm == PM_METHOD_DYNPM) ? "dynpm" :
> -   (pm == PM_METHOD_PROFILE) ? "profile" : "dpm");
> +   return sysfs_emit(buf, "%s\n", (pm == PM_METHOD_DYNPM) ? "dynpm" :
> + (pm == PM_METHOD_PROFILE) ? "profile" : "dpm");
>  }
>
>  static ssize_t radeon_set_pm_method(struct device *dev,
> @@ -473,9 +471,9 @@ static ssize_t radeon_get_dpm_state(struct device *dev,
> struct radeon_device *rdev = ddev->dev_private;
> enum radeon_pm_state_type pm = rdev->pm.dpm.user_state;
>
> -   return snprintf(buf, PAGE_SIZE, "%s\n",
> -   (pm == POWER_STATE_TYPE_BATTERY) ? "battery" :
> -   (pm == POWER_STATE_TYPE_BALANCED) ? "balanced" : 
> "performance");
> +   return sysfs_emit(buf, "%s\n",
> + (pm == POWER_STATE_TYPE_BATTERY) ? "battery" :
> + (pm == POWER_STATE_TYPE_BALANCED) ? "balanced" : 
> "performance");
>  }
>
>  static ssize_t radeon_set_dpm_state(struct device *dev,
> @@ -519,11 +517,11 @@ static ssize_t 
> radeon_get_dpm_forced_performance_level(struct device *dev,
>
> if  ((rdev->flags & RADEON_IS_PX) &&
>  (ddev->switch_power_state != DRM_SWITCH_POWER_ON))
> -   return snprintf(buf, PAGE_SIZE, "off\n");
> +   return sysfs_emit(buf, "off\n");
>
> -   return snprintf(buf, PAGE_SIZE, "%s\n",
> -   (level == RADEON_DPM_FORCED_LEVEL_AUTO) ? "auto" :
> -   (level == RADEON_DPM_FORCED_LEVEL_LOW) ? "low" : 
> "high");
> +   return sysfs_emit(buf, "%s\n",
> + (level == RADEON_DPM_FORCED_LEVEL_AUTO) ? "auto" :
> + (level == RADEON_DPM_FORCED_LEVEL_LOW) ? "low" : 
> "high");
>  }
>
>  static ssize_t radeon_set_dpm_forced_performance_level(struct device *dev,
> @@ -686,7 +684,7 @@ static ssize_t radeon_hwmon_show_temp(struct device *dev,
> else
> temp = 0;
>
> -   return snprintf(buf, PAGE_SIZE, "%d\n", temp);
> +   return sysfs_emit(buf, "%d\n", temp);
>  }
>
>  static ssize_t radeon_hwmon_show_temp_thresh(struct device *dev,
> @@ -702,7 +700,7 @@ static ssize_t radeon_hwmon_show_temp_thresh(struct 
> device *dev,
> else
> temp = rdev->pm.dpm.thermal.max_temp;
>
> -   return snprintf(buf, PAGE_SIZE, "%d\n", temp);
> +   return sysfs_emit(buf, "%d\n", temp);
>  }
>
>  static SENSOR_DEVICE_ATTR(temp1_input, S_IRUGO, radeon_hwmon_show_temp, 
> NULL, 0);
> @@ -732,7 +730,7 @@ static ssize_t radeon_hwmon_show_sclk(struct device *dev,
>for hwmon */
> sclk *= 1;
>
> -   return snprintf(buf, PAGE_SIZE, "%u\n", sclk);
> +   return sysfs_emit(buf, 

Need to support mixed memory mappings with HMM

2021-03-25 Thread Felix Kuehling
Hi,

This is a long one with a proposal for a pretty significant redesign of
how we handle migrations and VRAM management with HMM. This is based on
my own debugging and reading of the migrate_vma helpers, as well as
Alex's problems with migrations on A+A. I hope we can discuss this next
Monday after you've had some time to digest it.

I did some debugging yesterday and found that migrations to VRAM can
fail for some pages. The current migration helpers have many corner
cases where a page cannot be migrated. Some of them may be fixable
(adding support for THP), others are not (locked pages are skipped to
avoid deadlocks). Therefore I think our current code is too inflexible
when it assumes that a range is entirely in one place.

Alex also ran into some funny issues with COW on A+A where some pages
get faulted back to system memory. I think a lot of the problems here
will get easier once we support mixed mappings.

Mixed GPU mappings
==================

The idea is to remove any assumptions that an entire svm_range is in
one place. Instead hmm_range_fault gives us a list of pages, some of
which are system memory and others are device_private or device_generic.

We will need an amdgpu_vm interface that lets us map mixed page arrays
where different pages use different PTE flags. We can have at least 3
different types of pages in one mapping:

 1. System memory (S-bit set)
 2. Local memory (S-bit cleared, MTYPE for local memory)
 3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory)

My idea is to change the amdgpu_vm_update_mapping interface to use some
high-bit in the pages_addr array to indicate the page type. For the
default page type (0) nothing really changes for the callers. The
"flags" parameter needs to become a pointer to an array that gets
indexed by the high bits from the pages_addr array. For existing callers
it's as easy as changing flags to &flags (an array of size 1). For HMM we
would pass a pointer to a real array.

Once this is done, it leads to a number of opportunities for
simplification and better efficiency in kfd_svm:

  * Support migration when cpages != npages
  * Migrate a part of an svm_range without splitting it. No more
splitting of ranges in CPU page faults
  * Migrate a part of an svm_range in GPU page fault handler. No more
migrating the whole range for a single page fault
  * Simplified VRAM management (see below)

With that, svm_range will no longer have an "actual_loc" field. If we're
not sure where the data is, we need to call migrate. If it's already in
the right place, then cpages will be 0 and we can skip all the steps
after migrate_vma_setup.

Simplified VRAM management
==========================

VRAM BOs are no longer associated with pranges. Instead they are
"free-floating", allocated during migration to VRAM, with reference
count for each page that uses the BO. Ref is released in page-release
callback. When the ref count drops to 0, free the BO.

VRAM BO size should match the migration granularity, 2MB by default.
That way the BO can be freed when memory gets migrated out. If migration
of some pages fails the BO may not be fully occupied. Also some pages
may be released individually on A+A due to COW or other events.

Eviction needs to migrate all the pages still using the BO. If the BO
struct keeps an array of page pointers, that's basically the migrate.src
for the eviction. Migration calls "try_to_unmap", which has the best
chance of freeing the BO, even when shared by multiple processes.

If we cannot guarantee eviction of pages, we cannot use TTM for VRAM
allocations. Need to use amdgpu_vram_mgr. Need a way to detect memory
pressure so we can start evicting memory.

Regards,
  Felix



Re: drm/ttm: switch to per device LRU lock

2021-03-25 Thread Christian König

Thanks! Just a copy & paste issue.

Patch to fix this is on the mailing list.

Christian.

On 25.03.21 at 16:00, Colin Ian King wrote:

Hi,

Static analysis with Coverity in linux-next has detected an issue in
drivers/gpu/drm/ttm/ttm_bo.c with the following commit:

commit a1f091f8ef2b680a5184db065527612247cb4cae
Author: Christian König 
Date:   Tue Oct 6 17:26:42 2020 +0200

 drm/ttm: switch to per device LRU lock

 Instead of having a global lock for potentially less contention.


The analysis is as follows:

617 int ttm_mem_evict_first(struct ttm_device *bdev,
618                        struct ttm_resource_manager *man,
619                        const struct ttm_place *place,
620                        struct ttm_operation_ctx *ctx,
621                        struct ww_acquire_ctx *ticket)
622 {
    1. assign_zero: Assigning: bo = NULL.

623        struct ttm_buffer_object *bo = NULL, *busy_bo = NULL;
624        bool locked = false;
625        unsigned i;
626        int ret;
627

    Explicit null dereferenced (FORWARD_NULL) 2. var_deref_op:
    Dereferencing null pointer bo.

628        spin_lock(&bo->bdev->lru_lock);
629        for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {

The spin_lock on bo is dereferencing a null bo pointer.

Colin




re: drm/ttm: switch to per device LRU lock

2021-03-25 Thread Colin Ian King
Hi,

Static analysis with Coverity in linux-next has detected an issue in
drivers/gpu/drm/ttm/ttm_bo.c with the following commit:

commit a1f091f8ef2b680a5184db065527612247cb4cae
Author: Christian König 
Date:   Tue Oct 6 17:26:42 2020 +0200

drm/ttm: switch to per device LRU lock

Instead of having a global lock for potentially less contention.


The analysis is as follows:

617 int ttm_mem_evict_first(struct ttm_device *bdev,
618                        struct ttm_resource_manager *man,
619                        const struct ttm_place *place,
620                        struct ttm_operation_ctx *ctx,
621                        struct ww_acquire_ctx *ticket)
622 {
    1. assign_zero: Assigning: bo = NULL.

623        struct ttm_buffer_object *bo = NULL, *busy_bo = NULL;
624        bool locked = false;
625        unsigned i;
626        int ret;
627

    Explicit null dereferenced (FORWARD_NULL) 2. var_deref_op:
    Dereferencing null pointer bo.

628        spin_lock(&bo->bdev->lru_lock);
629        for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {

The spin_lock on bo is dereferencing a null bo pointer.

Colin


Re: [PATCH] drm/amd/pm: no need to force MCLK to highest when no display connected

2021-03-25 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Alex Deucher 
Tested-by: Alex Deucher 


From: amd-gfx  on behalf of Evan Quan 

Sent: Thursday, March 25, 2021 12:18 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Quan, Evan 
Subject: [PATCH] drm/amd/pm: no need to force MCLK to highest when no display 
connected

Correct the check for vblank short.

Change-Id: I0eb3a6bd95b7f6d97029772914154324c8ccd2c0
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
index 7edafef095a3..301b6769f007 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
@@ -3330,7 +3330,8 @@ static int smu7_apply_state_adjust_rules(struct pp_hwmgr 
*hwmgr,

 	disable_mclk_switching_for_display = ((1 < hwmgr->display_config->num_display) &&
 					      !hwmgr->display_config->multi_monitor_in_sync) ||
-					      smu7_vblank_too_short(hwmgr, hwmgr->display_config->min_vblank_time);
+					      (hwmgr->display_config->num_display &&
+					      smu7_vblank_too_short(hwmgr, hwmgr->display_config->min_vblank_time));

 disable_mclk_switching = disable_mclk_switching_for_frame_lock ||
  disable_mclk_switching_for_display;
--
2.29.0



[PATCH] drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes

2021-03-25 Thread Bhaskar Chowdhury
s/acccess/access/
s/inferface/interface/
s/sequnce/sequence/  .two different places.
s/retrive/retrieve/
s/sheduling/scheduling/
s/independant/independent/
s/wether/whether/ ..two different places.
s/emmit/emit/
s/synce/sync/


Signed-off-by: Bhaskar Chowdhury 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index a368724c3dfc..4502b95ddf6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -1877,7 +1877,7 @@ static void gfx_v7_0_init_compute_vmid(struct 
amdgpu_device *adev)
 	mutex_unlock(&adev->srbm_mutex);

/* Initialize all compute VMIDs to have no GDS, GWS, or OA
-  acccess. These should be enabled by FW for target VMIDs. */
+  access. These should be enabled by FW for target VMIDs. */
for (i = adev->vm_manager.first_kfd_vmid; i < AMDGPU_NUM_VMID; i++) {
WREG32(amdgpu_gds_reg_offset[i].mem_base, 0);
WREG32(amdgpu_gds_reg_offset[i].mem_size, 0);
@@ -2058,7 +2058,7 @@ static void gfx_v7_0_constants_init(struct amdgpu_device 
*adev)
  * @adev: amdgpu_device pointer
  *
  * Set up the number and offset of the CP scratch registers.
- * NOTE: use of CP scratch registers is a legacy inferface and
+ * NOTE: use of CP scratch registers is a legacy interface and
  * is not used by default on newer asics (r6xx+).  On newer asics,
  * memory buffers are used for fences rather than scratch regs.
  */
@@ -2172,7 +2172,7 @@ static void gfx_v7_0_ring_emit_vgt_flush(struct 
amdgpu_ring *ring)
  * @seq: sequence number
  * @flags: fence related flags
  *
- * Emits a fence sequnce number on the gfx ring and flushes
+ * Emits a fence sequence number on the gfx ring and flushes
  * GPU caches.
  */
 static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr,
@@ -2215,7 +2215,7 @@ static void gfx_v7_0_ring_emit_fence_gfx(struct 
amdgpu_ring *ring, u64 addr,
  * @seq: sequence number
  * @flags: fence related flags
  *
- * Emits a fence sequnce number on the compute ring and flushes
+ * Emits a fence sequence number on the compute ring and flushes
  * GPU caches.
  */
 static void gfx_v7_0_ring_emit_fence_compute(struct amdgpu_ring *ring,
@@ -2245,14 +2245,14 @@ static void gfx_v7_0_ring_emit_fence_compute(struct 
amdgpu_ring *ring,
  * gfx_v7_0_ring_emit_ib - emit an IB (Indirect Buffer) on the ring
  *
  * @ring: amdgpu_ring structure holding ring information
- * @job: job to retrive vmid from
+ * @job: job to retrieve vmid from
  * @ib: amdgpu indirect buffer object
  * @flags: options (AMDGPU_HAVE_CTX_SWITCH)
  *
  * Emits an DE (drawing engine) or CE (constant engine) IB
  * on the gfx ring.  IBs are usually generated by userspace
  * acceleration drivers and submitted to the kernel for
- * sheduling on the ring.  This function schedules the IB
+ * scheduling on the ring.  This function schedules the IB
  * on the gfx ring for execution by the GPU.
  */
 static void gfx_v7_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
@@ -2402,7 +2402,7 @@ static int gfx_v7_0_ring_test_ib(struct amdgpu_ring 
*ring, long timeout)

 /*
  * CP.
- * On CIK, gfx and compute now have independant command processors.
+ * On CIK, gfx and compute now have independent command processors.
  *
  * GFX
  * Gfx consists of a single ring and can process both gfx jobs and
@@ -2630,7 +2630,7 @@ static int gfx_v7_0_cp_gfx_resume(struct amdgpu_device 
*adev)
ring->wptr = 0;
WREG32(mmCP_RB0_WPTR, lower_32_bits(ring->wptr));

-   /* set the wb address wether it's enabled or not */
+   /* set the wb address whether it's enabled or not */
rptr_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
WREG32(mmCP_RB0_RPTR_ADDR, lower_32_bits(rptr_addr));
WREG32(mmCP_RB0_RPTR_ADDR_HI, upper_32_bits(rptr_addr) & 0xFF);
@@ -2985,7 +2985,7 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
 	mqd->cp_hqd_pq_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
 	mqd->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;

-   /* set the wb address wether it's enabled or not */
+   /* set the wb address whether it's enabled or not */
wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
 	mqd->cp_hqd_pq_rptr_report_addr_lo = wb_gpu_addr & 0xfffffffc;
mqd->cp_hqd_pq_rptr_report_addr_hi =
@@ -3198,7 +3198,7 @@ static int gfx_v7_0_cp_resume(struct amdgpu_device *adev)
 /**
  * gfx_v7_0_ring_emit_vm_flush - cik vm flush using the CP
  *
- * @ring: the ring to emmit the commands to
+ * @ring: the ring to emit the commands to
  *
  * Sync the command pipeline with the PFP. E.g. wait for everything
  * to be completed.
@@ -3220,7 +3220,7 @@ static void gfx_v7_0_ring_emit_pipeline_sync(struct 
amdgpu_ring *ring)
amdgpu_ring_write(ring, 4); /* poll interval */

if 

RE: [PATCH 2/2] drm/amd/pm: unify the interface for gfx state setting

2021-03-25 Thread Lazar, Lijo
[AMD Public Use]

Series is 
Reviewed-by: Lijo Lazar 

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Thursday, March 25, 2021 4:12 PM
To: amd-gfx@lists.freedesktop.org
Cc: Quan, Evan 
Subject: [PATCH 2/2] drm/amd/pm: unify the interface for gfx state setting

No need to have special handling for swSMU supported ASICs.

Change-Id: I029414efa63c130a1a3aaba908bbf3857c5873ff
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   | 16 ++--
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   |  2 --
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |  5 -
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index de540c209023..12e8b527776b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -842,14 +842,10 @@ int amdgpu_gfx_get_num_kcq(struct amdgpu_device *adev)
 
 void amdgpu_gfx_state_change_set(struct amdgpu_device *adev, enum 
gfx_change_state state)  {
-   if (is_support_sw_smu(adev)) {
-   smu_gfx_state_change_set(&adev->smu, state);
-   } else {
-   mutex_lock(&adev->pm.mutex);
-   if (adev->powerplay.pp_funcs &&
-   adev->powerplay.pp_funcs->gfx_state_change_set)
-   ((adev)->powerplay.pp_funcs->gfx_state_change_set(
-   (adev)->powerplay.pp_handle, state));
-   mutex_unlock(&adev->pm.mutex);
-   }
+   mutex_lock(&adev->pm.mutex);
+   if (adev->powerplay.pp_funcs &&
+   adev->powerplay.pp_funcs->gfx_state_change_set)
+   ((adev)->powerplay.pp_funcs->gfx_state_change_set(
+   (adev)->powerplay.pp_handle, state));
+   mutex_unlock(&adev->pm.mutex);
 }
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 059d2c4e4c7f..ae05c16443c3 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1317,8 +1317,6 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, 
bool en);
 
 int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);
 
-int smu_gfx_state_change_set(struct smu_context *smu, uint32_t state);
-
 int smu_set_light_sbr(struct smu_context *smu, bool enable);
 
 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event, 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 0d1372d440ed..b2898f93a3ff 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2920,8 +2920,10 @@ static int smu_enable_mgpu_fan_boost(void *handle)
return ret;
 }
 
-int smu_gfx_state_change_set(struct smu_context *smu, uint32_t state)
+static int smu_gfx_state_change_set(void *handle,
+   uint32_t state)
 {
+   struct smu_context *smu = handle;
int ret = 0;
 
	mutex_lock(&smu->mutex);
@@ -2994,6 +2996,7 @@ static const struct amd_pm_funcs swsmu_pm_funcs = {
.display_disable_memory_clock_switch = 
smu_display_disable_memory_clock_switch,
.get_max_sustainable_clocks_by_dc= 
smu_get_max_sustainable_clocks_by_dc,
.load_firmware   = smu_load_microcode,
+   .gfx_state_change_set= smu_gfx_state_change_set,
 };
 
 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
--
2.29.0



RE: [PATCH 2/2] drm/amd/pm: fix missing static declarations

2021-03-25 Thread Lazar, Lijo
[AMD Public Use]

Series is 
Reviewed-by: Lijo Lazar 

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Thursday, March 25, 2021 9:51 AM
To: amd-gfx@lists.freedesktop.org
Cc: Quan, Evan 
Subject: [PATCH 2/2] drm/amd/pm: fix missing static declarations

Add "static" declarations for those APIs used internally.

Change-Id: I38bfa8a117b3056202935163935a867f03ebaefe
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 3bbe0278e50e..e136f071b4f2 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1467,7 +1467,7 @@ static int smu_hw_fini(void *handle)
return smu_smc_hw_cleanup(smu);
 }
 
-int smu_reset(struct smu_context *smu)
+static int smu_reset(struct smu_context *smu)
 {
struct amdgpu_device *adev = smu->adev;
int ret;
@@ -1715,10 +1715,10 @@ static int smu_adjust_power_state_dynamic(struct 
smu_context *smu,
return ret;
 }
 
-int smu_handle_task(struct smu_context *smu,
-   enum amd_dpm_forced_level level,
-   enum amd_pp_task task_id,
-   bool lock_needed)
+static int smu_handle_task(struct smu_context *smu,
+  enum amd_dpm_forced_level level,
+  enum amd_pp_task task_id,
+  bool lock_needed)
 {
int ret = 0;
 
@@ -2147,7 +2147,7 @@ static int smu_load_microcode(void *handle)
return ret;
 }
 
-int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled)
+static int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled)
 {
int ret = 0;
 
@@ -2466,7 +2466,7 @@ static u32 smu_get_fan_control_mode(void *handle)
return ret;
 }
 
-int smu_set_fan_control_mode(struct smu_context *smu, int value)
+static int smu_set_fan_control_mode(struct smu_context *smu, int value)
 {
int ret = 0;
 
-- 
2.29.0



[PATCH 1/2] drm/amd/pm: unify the interface for power gating

2021-03-25 Thread Evan Quan
No need to have special handling for swSMU supported ASICs.

Change-Id: I7292c1064c3a1c75dc9f91d7c5318eab4f407241
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 9 -
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   | 3 ---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 6 +++---
 3 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 464fc04fb334..03581d5b1836 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -927,7 +927,6 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device 
*adev, uint32_t block
 {
int ret = 0;
const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
-   bool swsmu = is_support_sw_smu(adev);
 
switch (block_type) {
case AMD_IP_BLOCK_TYPE_UVD:
@@ -968,15 +967,7 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device 
*adev, uint32_t block
case AMD_IP_BLOCK_TYPE_GFX:
case AMD_IP_BLOCK_TYPE_VCN:
case AMD_IP_BLOCK_TYPE_SDMA:
-   if (pp_funcs && pp_funcs->set_powergating_by_smu) {
-   ret = (pp_funcs->set_powergating_by_smu(
-   (adev)->powerplay.pp_handle, block_type, gate));
-   }
-   break;
case AMD_IP_BLOCK_TYPE_JPEG:
-   if (swsmu)
-   ret = smu_dpm_set_power_gate(>smu, block_type, 
gate);
-   break;
case AMD_IP_BLOCK_TYPE_GMC:
case AMD_IP_BLOCK_TYPE_ACP:
if (pp_funcs && pp_funcs->set_powergating_by_smu) {
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 4684eca7b37b..059d2c4e4c7f 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1305,9 +1305,6 @@ bool is_support_sw_smu(struct amdgpu_device *adev);
 bool is_support_cclk_dpm(struct amdgpu_device *adev);
 int smu_write_watermarks_table(struct smu_context *smu);
 
-/* smu to display interface */
-extern int smu_dpm_set_power_gate(void *handle, uint32_t block_type, bool 
gate);
-
 int smu_get_dpm_freq_range(struct smu_context *smu, enum smu_clk_type clk_type,
   uint32_t *min, uint32_t *max);
 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index e136f071b4f2..0d1372d440ed 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -268,9 +268,9 @@ static int smu_dpm_set_jpeg_enable(struct smu_context *smu,
  *Under this case, the smu->mutex lock protection is already enforced on
  *the parent API smu_force_performance_level of the call path.
  */
-int smu_dpm_set_power_gate(void *handle,
-  uint32_t block_type,
-  bool gate)
+static int smu_dpm_set_power_gate(void *handle,
+ uint32_t block_type,
+ bool gate)
 {
struct smu_context *smu = handle;
int ret = 0;
-- 
2.29.0



[PATCH 2/2] drm/amd/pm: unify the interface for gfx state setting

2021-03-25 Thread Evan Quan
No need to have special handling for swSMU supported ASICs.

Change-Id: I029414efa63c130a1a3aaba908bbf3857c5873ff
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   | 16 ++--
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   |  2 --
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |  5 -
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index de540c209023..12e8b527776b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -842,14 +842,10 @@ int amdgpu_gfx_get_num_kcq(struct amdgpu_device *adev)
 
 void amdgpu_gfx_state_change_set(struct amdgpu_device *adev, enum 
gfx_change_state state)
 {
-   if (is_support_sw_smu(adev)) {
-   smu_gfx_state_change_set(&adev->smu, state);
-   } else {
-   mutex_lock(&adev->pm.mutex);
-   if (adev->powerplay.pp_funcs &&
-   adev->powerplay.pp_funcs->gfx_state_change_set)
-   ((adev)->powerplay.pp_funcs->gfx_state_change_set(
-   (adev)->powerplay.pp_handle, state));
-   mutex_unlock(&adev->pm.mutex);
-   }
+   mutex_lock(&adev->pm.mutex);
+   if (adev->powerplay.pp_funcs &&
+   adev->powerplay.pp_funcs->gfx_state_change_set)
+   ((adev)->powerplay.pp_funcs->gfx_state_change_set(
+   (adev)->powerplay.pp_handle, state));
+   mutex_unlock(&adev->pm.mutex);
 }
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 059d2c4e4c7f..ae05c16443c3 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -1317,8 +1317,6 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, 
bool en);
 
 int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);
 
-int smu_gfx_state_change_set(struct smu_context *smu, uint32_t state);
-
 int smu_set_light_sbr(struct smu_context *smu, bool enable);
 
 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 0d1372d440ed..b2898f93a3ff 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2920,8 +2920,10 @@ static int smu_enable_mgpu_fan_boost(void *handle)
return ret;
 }
 
-int smu_gfx_state_change_set(struct smu_context *smu, uint32_t state)
+static int smu_gfx_state_change_set(void *handle,
+   uint32_t state)
 {
+   struct smu_context *smu = handle;
int ret = 0;
 
	mutex_lock(&smu->mutex);
@@ -2994,6 +2996,7 @@ static const struct amd_pm_funcs swsmu_pm_funcs = {
.display_disable_memory_clock_switch = 
smu_display_disable_memory_clock_switch,
.get_max_sustainable_clocks_by_dc= 
smu_get_max_sustainable_clocks_by_dc,
.load_firmware   = smu_load_microcode,
+   .gfx_state_change_set= smu_gfx_state_change_set,
 };
 
 int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event,
-- 
2.29.0



RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-25 Thread Zhang, Jack (Jian)
[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey

Thank you for your good opinions.
I agree with you that the refcount could handle the concurrent
get_cleanup_job case gracefully, with no need to re-insert the
job back anymore.

I quickly made a draft for this idea as follows:
How do you like it? I will start implementing it after I get your acknowledgement.

Thanks,
Jack

+void drm_job_get(struct drm_sched_job *s_job)
+{
+   kref_get(&s_job->refcount);
+}
+
+void drm_job_do_release(struct kref *ref)
+{
+   struct drm_sched_job *s_job;
+   struct drm_gpu_scheduler *sched;
+
+   s_job = container_of(ref, struct drm_sched_job, refcount);
+   sched = s_job->sched;
+   sched->ops->free_job(s_job);
+}
+
+void drm_job_put(struct drm_sched_job *s_job)
+{
+   kref_put(&s_job->refcount, drm_job_do_release);
+}
+
static void drm_sched_job_begin(struct drm_sched_job *s_job)
{
struct drm_gpu_scheduler *sched = s_job->sched;
kref_init(&s_job->refcount);
drm_job_get(s_job);
spin_lock(&sched->job_list_lock);
list_add_tail(&s_job->node, &sched->ring_mirror_list);
drm_sched_start_timeout(sched);
@@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
 * drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
 * is parked at which point it's safe.
 */
-   list_del_init(&job->node);
+   drm_job_get(job);
spin_unlock(&sched->job_list_lock);
job->sched->ops->timedout_job(job);
-
+   drm_job_put(job);
/*
 * Guilty job did complete and hence needs to be manually 
removed
 * See drm_sched_stop doc.
 */
if (sched->free_guilty) {
-   job->sched->ops->free_job(job);
sched->free_guilty = false;
}
} else {
@@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
struct drm_sched_job *bad)
-   /*
-* Reinsert back the bad job here - now it's safe as
-* drm_sched_get_cleanup_job cannot race against us and release the
-* bad job at this point - we parked (waited for) any in progress
-* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-* now until the scheduler thread is unparked.
-*/
-   if (bad && bad->sched == sched)
-   /*
-* Add at the head of the queue to reflect it was the earliest
-* job extracted.
-*/
-   list_add(&bad->node, &sched->ring_mirror_list);
-
/*
 * Iterate the job list from later to  earlier one and either deactive
 * their HW callbacks or remove them from mirror list if they already
@@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
 kthread_should_stop());
if (cleanup_job) {
-   sched->ops->free_job(cleanup_job);
+   drm_job_put(cleanup_job);
/* queue timeout for next job */
drm_sched_start_timeout(sched);
}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 5a1f068af1c2..b80513eec90f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence 
*f);
  * to schedule the job.
  */
struct drm_sched_job {
+   struct kref refcount;
struct spsc_node queue_node;
struct drm_gpu_scheduler*sched;
struct drm_sched_fence  *s_fence;
@@ -198,6 +199,7 @@ struct drm_sched_job {
enum drm_sched_priority s_priority;
struct drm_sched_entity  *entity;
struct dma_fence_cb cb;
+
};

From: Grodzovsky, Andrey 
Sent: Friday, March 19, 2021 12:17 AM
To: Zhang, Jack (Jian) ; Christian König 
; dri-de...@lists.freedesktop.org; 
amd-gfx@lists.freedesktop.org; Koenig, Christian ; 
Liu, Monk ; Deng, Emily ; Rob Herring 
; Tomeu Vizoso ; Steven Price 

Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak



On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:

[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey

Let me summarize the background of this patch:

In the TDR resubmit step, amdgpu_device_recheck_guilty_jobs submits the
first job of each ring and re-checks for the guilty job.
At that point, we have to make sure each job is in the mirror list (or has
already been re-inserted).

But we found the current code never re-inserts the job into the mirror list in
the 2nd/3rd job_timeout thread (the bailing TDR thread).
This will not only leak the bailing jobs; more importantly, the 1st TDR thread
can never iterate over the bailing job and set its guilty status to a correct

Re: [PATCH] drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes

2021-03-25 Thread Nirmoy



Reviewed-by: Nirmoy Das

On 3/25/21 9:53 AM, Bhaskar Chowdhury wrote:

s/acccess/access/
s/inferface/interface/
s/sequnce/sequence/  .two different places.
s/retrive/retrieve/
s/sheduling/scheduling/
s/independant/independent/
s/wether/whether/ ..two different places.
s/emmit/emit/
s/synce/sync/


Signed-off-by: Bhaskar Chowdhury 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 22 +++---
  1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index a368724c3dfc..4502b95ddf6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -1877,7 +1877,7 @@ static void gfx_v7_0_init_compute_vmid(struct 
amdgpu_device *adev)
mutex_unlock(&adev->srbm_mutex);

/* Initialize all compute VMIDs to have no GDS, GWS, or OA
-  acccess. These should be enabled by FW for target VMIDs. */
+  access. These should be enabled by FW for target VMIDs. */
for (i = adev->vm_manager.first_kfd_vmid; i < AMDGPU_NUM_VMID; i++) {
WREG32(amdgpu_gds_reg_offset[i].mem_base, 0);
WREG32(amdgpu_gds_reg_offset[i].mem_size, 0);
@@ -2058,7 +2058,7 @@ static void gfx_v7_0_constants_init(struct amdgpu_device 
*adev)
   * @adev: amdgpu_device pointer
   *
   * Set up the number and offset of the CP scratch registers.
- * NOTE: use of CP scratch registers is a legacy inferface and
+ * NOTE: use of CP scratch registers is a legacy interface and
   * is not used by default on newer asics (r6xx+).  On newer asics,
   * memory buffers are used for fences rather than scratch regs.
   */
@@ -2172,7 +2172,7 @@ static void gfx_v7_0_ring_emit_vgt_flush(struct 
amdgpu_ring *ring)
   * @seq: sequence number
   * @flags: fence related flags
   *
- * Emits a fence sequnce number on the gfx ring and flushes
+ * Emits a fence sequence number on the gfx ring and flushes
   * GPU caches.
   */
  static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr,
@@ -2215,7 +2215,7 @@ static void gfx_v7_0_ring_emit_fence_gfx(struct 
amdgpu_ring *ring, u64 addr,
   * @seq: sequence number
   * @flags: fence related flags
   *
- * Emits a fence sequnce number on the compute ring and flushes
+ * Emits a fence sequence number on the compute ring and flushes
   * GPU caches.
   */
  static void gfx_v7_0_ring_emit_fence_compute(struct amdgpu_ring *ring,
@@ -2245,14 +2245,14 @@ static void gfx_v7_0_ring_emit_fence_compute(struct 
amdgpu_ring *ring,
   * gfx_v7_0_ring_emit_ib - emit an IB (Indirect Buffer) on the ring
   *
   * @ring: amdgpu_ring structure holding ring information
- * @job: job to retrive vmid from
+ * @job: job to retrieve vmid from
   * @ib: amdgpu indirect buffer object
   * @flags: options (AMDGPU_HAVE_CTX_SWITCH)
   *
   * Emits an DE (drawing engine) or CE (constant engine) IB
   * on the gfx ring.  IBs are usually generated by userspace
   * acceleration drivers and submitted to the kernel for
- * sheduling on the ring.  This function schedules the IB
+ * scheduling on the ring.  This function schedules the IB
   * on the gfx ring for execution by the GPU.
   */
  static void gfx_v7_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
@@ -2402,7 +2402,7 @@ static int gfx_v7_0_ring_test_ib(struct amdgpu_ring 
*ring, long timeout)

  /*
   * CP.
- * On CIK, gfx and compute now have independant command processors.
+ * On CIK, gfx and compute now have independent command processors.
   *
   * GFX
   * Gfx consists of a single ring and can process both gfx jobs and
@@ -2630,7 +2630,7 @@ static int gfx_v7_0_cp_gfx_resume(struct amdgpu_device 
*adev)
ring->wptr = 0;
WREG32(mmCP_RB0_WPTR, lower_32_bits(ring->wptr));

-   /* set the wb address wether it's enabled or not */
+   /* set the wb address whether it's enabled or not */
rptr_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
WREG32(mmCP_RB0_RPTR_ADDR, lower_32_bits(rptr_addr));
WREG32(mmCP_RB0_RPTR_ADDR_HI, upper_32_bits(rptr_addr) & 0xFF);
@@ -2985,7 +2985,7 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
mqd->cp_hqd_pq_wptr_poll_addr_lo = wb_gpu_addr & 0xfffc;
mqd->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0x;

-   /* set the wb address wether it's enabled or not */
+   /* set the wb address whether it's enabled or not */
wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
mqd->cp_hqd_pq_rptr_report_addr_lo = wb_gpu_addr & 0xfffc;
mqd->cp_hqd_pq_rptr_report_addr_hi =
@@ -3198,7 +3198,7 @@ static int gfx_v7_0_cp_resume(struct amdgpu_device *adev)
  /**
   * gfx_v7_0_ring_emit_vm_flush - cik vm flush using the CP
   *
- * @ring: the ring to emmit the commands to
+ * @ring: the ring to emit the commands to
   *
   * Sync the command pipeline with the PFP. E.g. wait for everything
   * to be completed.
@@ -3220,7 +3220,7 @@ static void 

Re: [PATCH] drm/amdgpu/pm: bail on sysfs/debugfs queries during platform suspend

2021-03-25 Thread Christian König

Am 25.03.21 um 04:29 schrieb Quan, Evan:

[AMD Public Use]

Maybe we can have an API like is_hw_access_blocked(), so that we can put all
those checks below within it.
if (amdgpu_in_reset(adev))
return -EPERM;
+   if (adev->in_suspend && !adev->in_runpm)
+   return -EPERM;


Sounds like a good idea to me as well.

But my question is how the heck have we managed to access those files 
during suspend?



Anyway, patch is reviewed-by: Evan Quan 


Acked-by: Christian König 



-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, March 25, 2021 5:18 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH] drm/amdgpu/pm: bail on sysfs/debugfs queries during platform 
suspend

The GPU is in the process of being shutdown.  Spurious queries during
suspend and resume can put the SMU into a bad state.  Runtime PM is
handled dynamically so we check if we are in non-runtime suspend.

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/pm/amdgpu_pm.c | 98 ++
  1 file changed, 98 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 2627870a786e..3c1b5483688b 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -129,6 +129,8 @@ static ssize_t amdgpu_get_power_dpm_state(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	ret = pm_runtime_get_sync(ddev->dev);
 	if (ret < 0) {
@@ -162,6 +164,8 @@ static ssize_t amdgpu_set_power_dpm_state(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	if (strncmp("battery", buf, strlen("battery")) == 0)
 		state = POWER_STATE_TYPE_BATTERY;
@@ -268,6 +272,8 @@ static ssize_t amdgpu_get_power_dpm_force_performance_level(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	ret = pm_runtime_get_sync(ddev->dev);
 	if (ret < 0) {
@@ -310,6 +316,8 @@ static ssize_t amdgpu_set_power_dpm_force_performance_level(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	if (strncmp("low", buf, strlen("low")) == 0) {
 		level = AMD_DPM_FORCED_LEVEL_LOW;
@@ -408,6 +416,8 @@ static ssize_t amdgpu_get_pp_num_states(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	ret = pm_runtime_get_sync(ddev->dev);
 	if (ret < 0) {
@@ -448,6 +458,8 @@ static ssize_t amdgpu_get_pp_cur_state(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	ret = pm_runtime_get_sync(ddev->dev);
 	if (ret < 0) {
@@ -484,6 +496,8 @@ static ssize_t amdgpu_get_pp_force_state(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	if (adev->pp_force_state_enabled)
 		return amdgpu_get_pp_cur_state(dev, attr, buf);
@@ -504,6 +518,8 @@ static ssize_t amdgpu_set_pp_force_state(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	if (strlen(buf) == 1)
 		adev->pp_force_state_enabled = false;
@@ -564,6 +580,8 @@ static ssize_t amdgpu_get_pp_table(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	ret = pm_runtime_get_sync(ddev->dev);
 	if (ret < 0) {
@@ -602,6 +620,8 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	ret = pm_runtime_get_sync(ddev->dev);
 	if (ret < 0) {
@@ -764,6 +784,8 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	if (count > 127)
 		return -EINVAL;
@@ -865,6 +887,8 @@ static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
 
 	if (amdgpu_in_reset(adev))
 		return -EPERM;
+	if (adev->in_suspend && !adev->in_runpm)
+		return -EPERM;
 
 	ret = pm_runtime_get_sync(ddev->dev);
 	if