RE: [PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on resume
[AMD Official Use Only - Internal Distribution Only] > -Original Message- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: 2021/March/26, Friday 12:38 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: [PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on > resume > > The SMU comes back up with DPM enabled by the sbios, but the driver still > has to set up the SMU/driver mailbox, etc. > > Signed-off-by: Alex Deucher Reviewed-by: Zhan Liu > --- > drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > index d4b804c7b986..462917d4d5e2 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > @@ -1102,7 +1102,7 @@ static int smu_smc_hw_setup(struct smu_context > *smu) > uint32_t pcie_gen = 0, pcie_width = 0; > int ret = 0; > > -if (adev->in_suspend && smu_is_dpm_running(smu)) { > +if (!smu->is_apu && adev->in_suspend && > smu_is_dpm_running(smu)) { > dev_info(adev->dev, "dpm has been enabled\n"); > /* this is needed specifically */ > if ((adev->asic_type >= CHIP_SIENNA_CICHLID) && > -- > 2.30.2 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists. > freedesktop.org%2Fmailman%2Flistinfo%2Famd- > gfxdata=04%7C01%7Czhan.liu%40amd.com%7C500744d08f7946b2c5d > e08d8f010ec49%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C6375 > 23302768646367%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL > CJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=vN > JawxwfojJrxNOG5L8Y2BAWpGRRN6valpk6y00XIQw%3Dreserved=0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on resume
[AMD Official Use Only - Internal Distribution Only] Reviewed-by: Kevin Wang Best Regards, Kevin From: amd-gfx on behalf of Alex Deucher Sent: Friday, March 26, 2021 12:37 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on resume The SMU comes back up with DPM enabled by the sbios, but the driver still has to set up the SMU/driver mailbox, etc. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index d4b804c7b986..462917d4d5e2 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -1102,7 +1102,7 @@ static int smu_smc_hw_setup(struct smu_context *smu) uint32_t pcie_gen = 0, pcie_width = 0; int ret = 0; - if (adev->in_suspend && smu_is_dpm_running(smu)) { + if (!smu->is_apu && adev->in_suspend && smu_is_dpm_running(smu)) { dev_info(adev->dev, "dpm has been enabled\n"); /* this is needed specifically */ if ((adev->asic_type >= CHIP_SIENNA_CICHLID) && -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=04%7C01%7CKevin1.Wang%40amd.com%7C97142ed601fb40e5733308d8f010ecb4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637523302779786035%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=8ToBQ2CXNmbeA%2BH01D6frUUCvrsL4TzGiiId%2B2gVHi8%3Dreserved=0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu/swsmu: don't bail early on hw_setup on resume
The SMU comes back up with DPM enabled by the sbios, but the driver still has to set up the SMU/driver mailbox, etc. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index d4b804c7b986..462917d4d5e2 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -1102,7 +1102,7 @@ static int smu_smc_hw_setup(struct smu_context *smu) uint32_t pcie_gen = 0, pcie_width = 0; int ret = 0; - if (adev->in_suspend && smu_is_dpm_running(smu)) { + if (!smu->is_apu && adev->in_suspend && smu_is_dpm_running(smu)) { dev_info(adev->dev, "dpm has been enabled\n"); /* this is needed specifically */ if ((adev->asic_type >= CHIP_SIENNA_CICHLID) && -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm: Update MST First Link Slot Information Based on Encoding Format
8b/10b encoding format requires to reserve the first slot for recording metadata. Real data transmission starts from the second slot, with a total of available 63 slots available. In 128b/132b encoding format, metadata is transmitted separately in LLCP packet before MTP. Real data transmission starts from the first slot, with a total of 64 slots available. Update the slot information after link detect. Signed-off-by: Fangzhi Zuo --- drivers/gpu/drm/drm_dp_mst_topology.c | 48 ++- include/drm/drm_dp_mst_helper.h | 8 + 2 files changed, 47 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 42a0c6888c33..577ed4224778 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -3382,7 +3382,7 @@ int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr) struct drm_dp_payload req_payload; struct drm_dp_mst_port *port; int i, j; - int cur_slots = 1; + int cur_slots = mgr->first_link_start_slot; mutex_lock(>payload_lock); for (i = 0; i < mgr->max_payloads; i++) { @@ -4302,8 +4302,13 @@ int drm_dp_find_vcpi_slots(struct drm_dp_mst_topology_mgr *mgr, num_slots = DIV_ROUND_UP(pbn, mgr->pbn_div); - /* max. time slots - one slot for MTP header */ - if (num_slots > 63) + /** +* first_link_total_avail_slots: max. time slots +* first slot reserved for MTP header in 8b/10b, +* but not required for 128b/132b +*/ + + if (num_slots > mgr->first_link_total_avail_slots) return -ENOSPC; return num_slots; } @@ -4314,8 +4319,12 @@ static int drm_dp_init_vcpi(struct drm_dp_mst_topology_mgr *mgr, { int ret; - /* max. time slots - one slot for MTP header */ - if (slots > 63) + /** +* first_link_total_avail_slots: max. time slots +* first slot reserved for MTP header in 8b/10b, +* but not required for 128b/132b +*/ + if (slots > mgr->first_link_total_avail_slots) return -ENOSPC; vcpi->pbn = pbn; @@ -4488,6 +4497,25 @@ int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state, } EXPORT_SYMBOL(drm_dp_atomic_release_vcpi_slots); +/* + * drm_dp_mst_update_first_link_slot_info() + * update the first link's total available slots and starting slot + * @mgr: manager to store the slot info. + * @encoding_format: detected link encoding format + */ +void drm_dp_mst_update_first_link_slot_info( + struct drm_dp_mst_topology_mgr *mgr, uint8_t encoding_format) +{ + if (encoding_format == DP_CAP_ANSI_128B132B) { + mgr->first_link_total_avail_slots = 64; + mgr->first_link_start_slot = 0; + } + DRM_DEBUG_KMS("%s encoding format on 0x%p -> total %d slots, start at slot %d\n", + (encoding_format == DP_CAP_ANSI_128B132B) ? "128b/132b":"8b/10b", + mgr, mgr->first_link_total_avail_slots, mgr->first_link_start_slot); +} +EXPORT_SYMBOL(drm_dp_mst_update_first_link_slot_info); + /** * drm_dp_mst_allocate_vcpi() - Allocate a virtual channel * @mgr: manager for this port @@ -4518,8 +4546,8 @@ bool drm_dp_mst_allocate_vcpi(struct drm_dp_mst_topology_mgr *mgr, ret = drm_dp_init_vcpi(mgr, >vcpi, pbn, slots); if (ret) { - DRM_DEBUG_KMS("failed to init vcpi slots=%d max=63 ret=%d\n", - DIV_ROUND_UP(pbn, mgr->pbn_div), ret); + DRM_DEBUG_KMS("failed to init vcpi slots=%d max=%d ret=%d\n", + DIV_ROUND_UP(pbn, mgr->pbn_div), mgr->first_link_total_avail_slots, ret); drm_dp_mst_topology_put_port(port); goto out; } @@ -5162,7 +5190,7 @@ drm_dp_mst_atomic_check_vcpi_alloc_limit(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_topology_state *mst_state) { struct drm_dp_vcpi_allocation *vcpi; - int avail_slots = 63, payload_count = 0; + int avail_slots = mgr->first_link_total_avail_slots, payload_count = 0; list_for_each_entry(vcpi, _state->vcpis, next) { /* Releasing VCPI is always OK-even if the port is gone */ @@ -5191,7 +5219,7 @@ drm_dp_mst_atomic_check_vcpi_alloc_limit(struct drm_dp_mst_topology_mgr *mgr, } DRM_DEBUG_ATOMIC("[MST MGR:%p] mst state %p VCPI avail=%d used=%d\n", mgr, mst_state, avail_slots, -63 - avail_slots); +mgr->first_link_total_avail_slots - avail_slots); return 0; } @@ -5455,6 +5483,8 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr, if (!mgr->proposed_vcpis) return -ENOMEM; set_bit(0, >payload_mask); + mgr->first_link_total_avail_slots = 63; +
Re: [PATCH] drm/amdkfd: Fix cat debugfs hang_hws file causes system crash bug
Am 2021-03-23 um 10:56 a.m. schrieb Alex Deucher: > Applied. Thanks! Thanks. I thought we fixed this before by making the file write-only. But I guess that's not sufficient to stop root from reading it: commit 2bdac179e217a0c0b548a8c60524977586621b19 Author: Felix Kuehling Date: Thu Dec 19 22:36:55 2019 -0500 drm/amdkfd: Fix permissions of hang_hws Reading from /sys/kernel/debug/kfd/hang_hws would cause a kernel oops because we didn't implement a read callback. Set the permission to write-only to prevent that. Signed-off-by: Felix Kuehling Reviewed-by: shaoyunl Signed-off-by: Alex Deucher Now that we have a sensible message in the file, I guess we should officially make it readable again. Regards, Felix > > Alex > > On Sun, Mar 21, 2021 at 5:33 AM Qu Huang wrote: >> Here is the system crash log: >> [ 1272.884438] BUG: unable to handle kernel NULL pointer dereference at >> (null) >> [ 1272.88] IP: [< (null)>] (null) >> [ 1272.884447] PGD 825b09067 PUD 8267c8067 PMD 0 >> [ 1272.884452] Oops: 0010 [#1] SMP >> [ 1272.884509] CPU: 13 PID: 3485 Comm: cat Kdump: loaded Tainted: G >> [ 1272.884515] task: 9a38dbd4d140 ti: 9a37cd3b8000 task.ti: >> 9a37cd3b8000 >> [ 1272.884517] RIP: 0010:[<>] [< (null)>] >> (null) >> [ 1272.884520] RSP: 0018:9a37cd3bbe68 EFLAGS: 00010203 >> [ 1272.884522] RAX: RBX: RCX: >> 00014d5f >> [ 1272.884524] RDX: fff4 RSI: 0001 RDI: >> 9a38aca4d200 >> [ 1272.884526] RBP: 9a37cd3bbed0 R08: 9a38dcd5f1a0 R09: >> 9a31ffc07300 >> [ 1272.884527] R10: 9a31ffc07300 R11: addd5e9d R12: >> 9a38b4e0fb00 >> [ 1272.884529] R13: 0001 R14: 9a37cd3bbf18 R15: >> 9a38aca4d200 >> [ 1272.884532] FS: 7feccaa67740() GS:9a38dcd4() >> knlGS: >> [ 1272.884534] CS: 0010 DS: ES: CR0: 80050033 >> [ 1272.884536] CR2: CR3: 0008267c CR4: >> 003407e0 >> [ 1272.884537] Call Trace: >> [ 1272.884544] [] ? seq_read+0x130/0x440 >> [ 1272.884548] [] vfs_read+0x9f/0x170 >> [ 1272.884552] [] SyS_read+0x7f/0xf0 >> [ 1272.884557] [] system_call_fastpath+0x22/0x27 >> [ 1272.884558] Code: Bad RIP value. >> [ 1272.884562] RIP [< (null)>] (null) >> [ 1272.884564] RSP >> [ 1272.884566] CR2: >> >> Signed-off-by: Qu Huang >> --- >> drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c | 7 ++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c >> b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c >> index 511712c..673d5e3 100644 >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c >> @@ -33,6 +33,11 @@ static int kfd_debugfs_open(struct inode *inode, struct >> file *file) >> >> return single_open(file, show, NULL); >> } >> +static int kfd_debugfs_hang_hws_read(struct seq_file *m, void *data) >> +{ >> + seq_printf(m, "echo gpu_id > hang_hws\n"); >> + return 0; >> +} >> >> static ssize_t kfd_debugfs_hang_hws_write(struct file *file, >> const char __user *user_buf, size_t size, loff_t *ppos) >> @@ -94,7 +99,7 @@ void kfd_debugfs_init(void) >> debugfs_create_file("rls", S_IFREG | 0444, debugfs_root, >> kfd_debugfs_rls_by_device, _debugfs_fops); >> debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root, >> - NULL, _debugfs_hang_hws_fops); >> + kfd_debugfs_hang_hws_read, >> _debugfs_hang_hws_fops); >> } >> >> void kfd_debugfs_fini(void) >> -- >> 1.8.3.1 >> >> ___ >> dri-devel mailing list >> dri-de...@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
[AMD Official Use Only - Internal Distribution Only] Hi, Andrey, >>how u handle non guilty singnaled jobs in drm_sched_stop, currently looks >>like you don't call put for them and just explicitly free them as before Good point, I missed that place. Will cover that in my next patch. >>Also sched->free_guilty seems useless with the new approach. Yes, I agree. >>Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this >>approach... I am not quite sure about that for now, let me think about this topic today. Hi, Christian, should I add a fence and get/put to that fence rather than using an explicit refcount? And another concerns? Thanks, Jack -Original Message- From: Grodzovsky, Andrey Sent: Friday, March 26, 2021 12:32 AM To: Zhang, Jack (Jian) ; Christian König ; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian ; Liu, Monk ; Deng, Emily ; Rob Herring ; Tomeu Vizoso ; Steven Price Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak There are a few issues here like - how u handle non guilty singnaled jobs in drm_sched_stop, currently looks like you don't call put for them and just explicitly free them as before. Also sched->free_guilty seems useless with the new approach. Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach... But first - We need Christian to express his opinion on this since I think he opposed refcounting jobs and that we should concentrate on fences instead. Christian - can you chime in here ? Andrey On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote: > [AMD Official Use Only - Internal Distribution Only] > > > Hi, Andrey > > Thank you for your good opinions. > > I literally agree with you that the refcount could solve the > get_clean_up_up cocurrent job gracefully, and no need to re-insert the > > job back anymore. > > I quickly made a draft for this idea as follows: > > How do you like it? I will start implement to it after I got your > acknowledge. > > Thanks, > > Jack > > +void drm_job_get(struct drm_sched_job *s_job) > > +{ > > + kref_get(_job->refcount); > > +} > > + > > +void drm_job_do_release(struct kref *ref) > > +{ > > + struct drm_sched_job *s_job; > > + struct drm_gpu_scheduler *sched; > > + > > + s_job = container_of(ref, struct drm_sched_job, refcount); > > + sched = s_job->sched; > > + sched->ops->free_job(s_job); > > +} > > + > > +void drm_job_put(struct drm_sched_job *s_job) > > +{ > > + kref_put(_job->refcount, drm_job_do_release); > > +} > > + > > static void drm_sched_job_begin(struct drm_sched_job *s_job) > > { > > struct drm_gpu_scheduler *sched = s_job->sched; > > + kref_init(_job->refcount); > > + drm_job_get(s_job); > > spin_lock(>job_list_lock); > > list_add_tail(_job->node, >ring_mirror_list); > > drm_sched_start_timeout(sched); > > @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct > work_struct *work) > > * drm_sched_cleanup_jobs. It will be reinserted back > after sched->thread > > * is parked at which point it's safe. > > */ > > - list_del_init(>node); > > + drm_job_get(job); > > spin_unlock(>job_list_lock); > > job->sched->ops->timedout_job(job); > > - > > + drm_job_put(job); > > /* > > * Guilty job did complete and hence needs to be > manually removed > > * See drm_sched_stop doc. > > */ > > if (sched->free_guilty) { > > - job->sched->ops->free_job(job); > > sched->free_guilty = false; > > } > > } else { > > @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler > *sched, struct drm_sched_job *bad) > > - /* > > -* Reinsert back the bad job here - now it's safe as > > -* drm_sched_get_cleanup_job cannot race against us and > release the > > -* bad job at this point - we parked (waited for) any in > progress > > -* (earlier) cleanups and drm_sched_get_cleanup_job will not > be called > > -* now until the scheduler thread is unparked. > > -*/ > > - if (bad && bad->sched == sched) > > - /* > > -* Add at the head of the queue to reflect it was the > earliest > > -* job extracted. > > -*/ > > - list_add(>node, >ring_mirror_list); > > - > > /* > > * Iterate the job list from later to earlier one and either > deactive > > * their HW callbacks or remove them from mirror list if they > already > > @@ -774,7 +781,7 @@ static int drm_sched_main(void *param) > > kthread_should_stop()); > > if (cleanup_job) { >
RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
[AMD Official Use Only - Internal Distribution Only] Hi, Steve, Thank you for your detailed comments. But currently the patch is not finalized. We found some potential race condition even with this patch. The solution is under discussion and hopefully we could find an ideal one. After that, I will start to consider other drm-driver if it will influence other drivers(except for amdgpu). Best, Jack -Original Message- From: Steven Price Sent: Monday, March 22, 2021 11:29 PM To: Zhang, Jack (Jian) ; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian ; Grodzovsky, Andrey ; Liu, Monk ; Deng, Emily ; Rob Herring ; Tomeu Vizoso Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak On 15/03/2021 05:23, Zhang, Jack (Jian) wrote: > [AMD Public Use] > > Hi, Rob/Tomeu/Steven, > > Would you please help to review this patch for panfrost driver? > > Thanks, > Jack Zhang > > -Original Message- > From: Jack Zhang > Sent: Monday, March 15, 2021 1:21 PM > To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; > Koenig, Christian ; Grodzovsky, Andrey > ; Liu, Monk ; Deng, Emily > > Cc: Zhang, Jack (Jian) > Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid > memleak > > re-insert Bailing jobs to avoid memory leak. > > V2: move re-insert step to drm/scheduler logic > V3: add panfrost's return value for bailing jobs in case it hits the > memleak issue. This commit message could do with some work - it's really hard to decipher what the actual problem you're solving is. > > Signed-off-by: Jack Zhang > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 8 ++-- > drivers/gpu/drm/panfrost/panfrost_job.c| 4 ++-- > drivers/gpu/drm/scheduler/sched_main.c | 8 +++- > include/drm/gpu_scheduler.h| 1 + > 5 files changed, 19 insertions(+), 6 deletions(-) > [...] > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c > b/drivers/gpu/drm/panfrost/panfrost_job.c > index 6003cfeb1322..e2cb4f32dae1 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_job.c > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c > @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat > panfrost_job_timedout(struct drm_sched_job >* spurious. Bail out. >*/ > if (dma_fence_is_signaled(job->done_fence)) > -return DRM_GPU_SCHED_STAT_NOMINAL; > +return DRM_GPU_SCHED_STAT_BAILING; > > dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, > head=0x%x, tail=0x%x, sched_job=%p", > js, > @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat > panfrost_job_timedout(struct drm_sched_job > > /* Scheduler is already stopped, nothing to do. */ > if (!panfrost_scheduler_stop(>js->queue[js], sched_job)) > -return DRM_GPU_SCHED_STAT_NOMINAL; > +return DRM_GPU_SCHED_STAT_BAILING; > > /* Schedule a reset if there's no reset in progress. */ > if (!atomic_xchg(>reset.pending, 1)) This looks correct to me - in these two cases drm_sched_stop() is not called on the sched_job, so it looks like currently the job will be leaked. > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 92d8de24d0a1..a44f621fb5c4 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct > *work) > { > struct drm_gpu_scheduler *sched; > struct drm_sched_job *job; > +int ret; > > sched = container_of(work, struct drm_gpu_scheduler, > work_tdr.work); > > @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct > *work) > list_del_init(>list); > spin_unlock(>job_list_lock); > > -job->sched->ops->timedout_job(job); > +ret = job->sched->ops->timedout_job(job); > > +if (ret == DRM_GPU_SCHED_STAT_BAILING) { > +spin_lock(>job_list_lock); > +list_add(>node, >ring_mirror_list); > +spin_unlock(>job_list_lock); > +} I think we could really do with a comment somewhere explaining what "bailing" means in this context. For the Panfrost case we have two cases: * The GPU job actually finished while the timeout code was running (done_fence is signalled). * The GPU is already in the process of being reset (Panfrost has multiple queues, so mostly like a bad job in another queue). I'm also not convinced that (for Panfrost) it makes sense to be adding the jobs back to the list. For the first case above clearly the job could just be freed (it's complete). The second case is more interesting and Panfrost currently doesn't handle this well. In theory the driver could try to rescue the job ('soft stop' in Mali language) so that it could be resubmitted. Panfrost doesn't currently support that, so attempting to resubmit the job is almost certainly going to fail. It's on my TODO list to look at improving Panfrost in this regard, but sadly still quite far down. Steve > /* >* Guilty
Re: [PATCH v2] drm/mst: Enhance MST topology logging
Reviewed-by: Lyude Paul Let me know if you need me to push this to drm-misc-next for you On Thu, 2021-03-25 at 14:06 -0400, Eryk Brol wrote: > [why] > MST topology print was missing fec logging and pdt printed > as an int wasn't clear. vcpi and payload info was printed as an > arbitrary series of ints which requires user to know the ordering > of the prints, making the logs difficult to use. > > [how] > -add fec logging > -add pdt parsing into strings > -format vcpi and payload info into tables with headings > -clean up topology prints > > --- > > v2: Addressed Lyude's comments > -made helper function return const > -fixed indentation and spacing issues > > Signed-off-by: Eryk Brol > --- > drivers/gpu/drm/drm_dp_mst_topology.c | 59 ++- > 1 file changed, 48 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > b/drivers/gpu/drm/drm_dp_mst_topology.c > index 932c4641ec3e..de5124ce42cb 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -4720,6 +4720,28 @@ static void drm_dp_mst_kick_tx(struct > drm_dp_mst_topology_mgr *mgr) > queue_work(system_long_wq, >tx_work); > } > > +/* > + * Helper function for parsing DP device types into convenient strings > + * for use with dp_mst_topology > + */ > +static const char *pdt_to_string(u8 pdt) > +{ > + switch (pdt) { > + case DP_PEER_DEVICE_NONE: > + return "NONE"; > + case DP_PEER_DEVICE_SOURCE_OR_SST: > + return "SOURCE OR SST"; > + case DP_PEER_DEVICE_MST_BRANCHING: > + return "MST BRANCHING"; > + case DP_PEER_DEVICE_SST_SINK: > + return "SST SINK"; > + case DP_PEER_DEVICE_DP_LEGACY_CONV: > + return "DP LEGACY CONV"; > + default: > + return "ERR"; > + } > +} > + > static void drm_dp_mst_dump_mstb(struct seq_file *m, > struct drm_dp_mst_branch *mstb) > { > @@ -4732,9 +4754,20 @@ static void drm_dp_mst_dump_mstb(struct seq_file *m, > prefix[i] = '\t'; > prefix[i] = '\0'; > > - seq_printf(m, "%smst: %p, %d\n", prefix, mstb, mstb->num_ports); > + seq_printf(m, "%smstb - [%p]: num_ports: %d\n", prefix, mstb, mstb- > >num_ports); > list_for_each_entry(port, >ports, next) { > - seq_printf(m, "%sport: %d: input: %d: pdt: %d, ddps: %d ldps: > %d, sdp: %d/%d, %p, conn: %p\n", prefix, port->port_num, port->input, port- > >pdt, port->ddps, port->ldps, port->num_sdp_streams, port- > >num_sdp_stream_sinks, port, port->connector); > + seq_printf(m, "%sport %d - [%p] (%s - %s): ddps: %d, ldps: %d, > sdp: %d/%d, fec: %s, conn: %p\n", > + prefix, > + port->port_num, > + port, > + port->input ? "input" : "output", > + pdt_to_string(port->pdt), > + port->ddps, > + port->ldps, > + port->num_sdp_streams, > + port->num_sdp_stream_sinks, > + port->fec_capable ? "true" : "false", > + port->connector); > if (port->mstb) > drm_dp_mst_dump_mstb(m, port->mstb); > } > @@ -4787,33 +4820,37 @@ void drm_dp_mst_dump_topology(struct seq_file *m, > mutex_unlock(>lock); > > mutex_lock(>payload_lock); > - seq_printf(m, "vcpi: %lx %lx %d\n", mgr->payload_mask, mgr->vcpi_mask, > - mgr->max_payloads); > + seq_printf(m, "\n*** VCPI Info ***\n"); > + seq_printf(m, "payload_mask: %lx, vcpi_mask: %lx, max_payloads: %d\n", > mgr->payload_mask, mgr->vcpi_mask, mgr->max_payloads); > > + seq_printf(m, "\n| idx | port # | vcp_id | # slots | sink > name |\n"); > for (i = 0; i < mgr->max_payloads; i++) { > if (mgr->proposed_vcpis[i]) { > char name[14]; > > port = container_of(mgr->proposed_vcpis[i], struct > drm_dp_mst_port, vcpi); > fetch_monitor_name(mgr, port, name, sizeof(name)); > - seq_printf(m, "vcpi %d: %d %d %d sink name: %s\n", i, > - port->port_num, port->vcpi.vcpi, > + seq_printf(m, "%10d%10d%10d%10d%20s\n", > + i, > + port->port_num, > + port->vcpi.vcpi, > port->vcpi.num_slots, > - (*name != 0) ? name : "Unknown"); > + (*name != 0) ? name : "Unknown"); > } else > - seq_printf(m, "vcpi %d:unused\n", i); > + seq_printf(m,
Re: [PATCH 2/2] drm/amdgpu: Introduce new SETUP_TMR interface
Thank you Lijo. I added two macro to calculate a bo's physical address. I also used those macros to simplified driver a little bit. Please help review Regards, Oak On 2021-03-23, 10:15 AM, "Lazar, Lijo" wrote: [AMD Public Use] -Original Message- From: amd-gfx On Behalf Of Zeng, Oak Sent: Monday, March 22, 2021 7:33 PM To: amd-gfx@lists.freedesktop.org Cc: Kuehling, Felix ; Zhang, Hawking Subject: Re: [PATCH 2/2] drm/amdgpu: Introduce new SETUP_TMR interface [AMD Official Use Only - Internal Distribution Only] [AMD Official Use Only - Internal Distribution Only] Hello all, Can someone help to review below patches? We verified with firmware team and want to check-in together with psp firmware Regards, Oak On 2021-03-12, 4:24 PM, "Zeng, Oak" wrote: This new interface passes both virtual and physical address to PSP. It is backword compatible with old interface. Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 13 ++--- drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 11 ++- 2 files changed, 20 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index cd3eda9..99e1a3e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -328,8 +328,13 @@ psp_cmd_submit_buf(struct psp_context *psp, static void psp_prep_tmr_cmd_buf(struct psp_context *psp, struct psp_gfx_cmd_resp *cmd, - uint64_t tmr_mc, uint32_t size) + uint64_t tmr_mc, struct amdgpu_bo *tmr_bo) { +struct amdgpu_device *adev = psp->adev; +uint32_t size = amdgpu_bo_size(tmr_bo); +uint64_t tmr_pa = amdgpu_bo_gpu_offset(tmr_bo) + +adev->vm_manager.vram_base_offset - adev->gmc.vram_start; + <> This looks like a candidate for a small inline function in gmc. PSP doesn't need to know about the calculation. Thanks, Lijo if (amdgpu_sriov_vf(psp->adev)) cmd->cmd_id = GFX_CMD_ID_SETUP_VMR; else @@ -337,6 +342,9 @@ static void psp_prep_tmr_cmd_buf(struct psp_context *psp, cmd->cmd.cmd_setup_tmr.buf_phy_addr_lo = lower_32_bits(tmr_mc); cmd->cmd.cmd_setup_tmr.buf_phy_addr_hi = upper_32_bits(tmr_mc); cmd->cmd.cmd_setup_tmr.buf_size = size; +cmd->cmd.cmd_setup_tmr.bitfield.virt_phy_addr = 1; +cmd->cmd.cmd_setup_tmr.system_phy_addr_lo = lower_32_bits(tmr_pa); +cmd->cmd.cmd_setup_tmr.system_phy_addr_hi = upper_32_bits(tmr_pa); } static void psp_prep_load_toc_cmd_buf(struct psp_gfx_cmd_resp *cmd, @@ -456,8 +464,7 @@ static int psp_tmr_load(struct psp_context *psp) if (!cmd) return -ENOMEM; -psp_prep_tmr_cmd_buf(psp, cmd, psp->tmr_mc_addr, - amdgpu_bo_size(psp->tmr_bo)); +psp_prep_tmr_cmd_buf(psp, cmd, psp->tmr_mc_addr, psp->tmr_bo); DRM_INFO("reserve 0x%lx from 0x%llx for PSP TMR\n", amdgpu_bo_size(psp->tmr_bo), psp->tmr_mc_addr); diff --git a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h index a41b054..604a1c1 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h +++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h @@ -170,10 +170,19 @@ struct psp_gfx_cmd_setup_tmr uint32_tbuf_phy_addr_lo; /* bits [31:0] of GPU Virtual address of TMR buffer (must be 4 KB aligned) */ uint32_tbuf_phy_addr_hi; /* bits [63:32] of GPU Virtual address of TMR buffer */ uint32_tbuf_size; /* buffer size in bytes (must be multiple of 4 KB) */ +union { +struct { +uint32_tsriov_enabled:1; /* whether the device runs under SR-IOV*/ +uint32_tvirt_phy_addr:1; /* driver passes both virtual and physical address to PSP*/ +uint32_treserved:30; +} bitfield; +uint32_ttmr_flags; +}; +uint32_tsystem_phy_addr_lo;/* bits [31:0] of system physical address of TMR buffer (must be 4 KB aligned) */ +uint32_tsystem_phy_addr_hi;/* bits [63:32] of system physical address of TMR buffer */ }; - /* FW types for GFX_CMD_ID_LOAD_IP_FW command. Limit 31. */ enum psp_gfx_fw_type { GFX_FW_TYPE_NONE= 0,/* */ -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org
[PATCH v2] drm/mst: Enhance MST topology logging
[why] MST topology print was missing fec logging and pdt printed as an int wasn't clear. vcpi and payload info was printed as an arbitrary series of ints which requires user to know the ordering of the prints, making the logs difficult to use. [how] -add fec logging -add pdt parsing into strings -format vcpi and payload info into tables with headings -clean up topology prints --- v2: Addressed Lyude's comments -made helper function return const -fixed indentation and spacing issues Signed-off-by: Eryk Brol --- drivers/gpu/drm/drm_dp_mst_topology.c | 59 ++- 1 file changed, 48 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 932c4641ec3e..de5124ce42cb 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -4720,6 +4720,28 @@ static void drm_dp_mst_kick_tx(struct drm_dp_mst_topology_mgr *mgr) queue_work(system_long_wq, >tx_work); } +/* + * Helper function for parsing DP device types into convenient strings + * for use with dp_mst_topology + */ +static const char *pdt_to_string(u8 pdt) +{ + switch (pdt) { + case DP_PEER_DEVICE_NONE: + return "NONE"; + case DP_PEER_DEVICE_SOURCE_OR_SST: + return "SOURCE OR SST"; + case DP_PEER_DEVICE_MST_BRANCHING: + return "MST BRANCHING"; + case DP_PEER_DEVICE_SST_SINK: + return "SST SINK"; + case DP_PEER_DEVICE_DP_LEGACY_CONV: + return "DP LEGACY CONV"; + default: + return "ERR"; + } +} + static void drm_dp_mst_dump_mstb(struct seq_file *m, struct drm_dp_mst_branch *mstb) { @@ -4732,9 +4754,20 @@ static void drm_dp_mst_dump_mstb(struct seq_file *m, prefix[i] = '\t'; prefix[i] = '\0'; - seq_printf(m, "%smst: %p, %d\n", prefix, mstb, mstb->num_ports); + seq_printf(m, "%smstb - [%p]: num_ports: %d\n", prefix, mstb, mstb->num_ports); list_for_each_entry(port, >ports, next) { - seq_printf(m, "%sport: %d: input: %d: pdt: %d, ddps: %d ldps: %d, sdp: %d/%d, %p, conn: %p\n", prefix, port->port_num, port->input, port->pdt, port->ddps, port->ldps, port->num_sdp_streams, port->num_sdp_stream_sinks, port, port->connector); + seq_printf(m, "%sport %d - [%p] (%s - %s): ddps: %d, ldps: %d, sdp: %d/%d, fec: %s, conn: %p\n", + prefix, + port->port_num, + port, + port->input ? "input" : "output", + pdt_to_string(port->pdt), + port->ddps, + port->ldps, + port->num_sdp_streams, + port->num_sdp_stream_sinks, + port->fec_capable ? "true" : "false", + port->connector); if (port->mstb) drm_dp_mst_dump_mstb(m, port->mstb); } @@ -4787,33 +4820,37 @@ void drm_dp_mst_dump_topology(struct seq_file *m, mutex_unlock(>lock); mutex_lock(>payload_lock); - seq_printf(m, "vcpi: %lx %lx %d\n", mgr->payload_mask, mgr->vcpi_mask, - mgr->max_payloads); + seq_printf(m, "\n*** VCPI Info ***\n"); + seq_printf(m, "payload_mask: %lx, vcpi_mask: %lx, max_payloads: %d\n", mgr->payload_mask, mgr->vcpi_mask, mgr->max_payloads); + seq_printf(m, "\n| idx | port # | vcp_id | # slots | sink name |\n"); for (i = 0; i < mgr->max_payloads; i++) { if (mgr->proposed_vcpis[i]) { char name[14]; port = container_of(mgr->proposed_vcpis[i], struct drm_dp_mst_port, vcpi); fetch_monitor_name(mgr, port, name, sizeof(name)); - seq_printf(m, "vcpi %d: %d %d %d sink name: %s\n", i, - port->port_num, port->vcpi.vcpi, + seq_printf(m, "%10d%10d%10d%10d%20s\n", + i, + port->port_num, + port->vcpi.vcpi, port->vcpi.num_slots, - (*name != 0) ? name : "Unknown"); + (*name != 0) ? name : "Unknown"); } else - seq_printf(m, "vcpi %d:unused\n", i); + seq_printf(m, "%6d - Unused\n", i); } + seq_printf(m, "\n*** Payload Info ***\n"); + seq_printf(m, "| idx | state | start slot | # slots |\n"); for (i = 0; i < mgr->max_payloads; i++) { - seq_printf(m, "payload %d: %d, %d, %d\n", + seq_printf(m, "%10d%10d%15d%10d\n", i,
Re: [PATCH] drm/amd: Fix a typo in two different sentences
Applied. Thanks! Alex On Mon, Mar 22, 2021 at 6:45 PM Randy Dunlap wrote: > > On 3/22/21 2:06 PM, Bhaskar Chowdhury wrote: > > > > s/defintion/definition/ .two different places. > > > > Signed-off-by: Bhaskar Chowdhury > > Acked-by: Randy Dunlap > > > --- > > drivers/gpu/drm/amd/include/atombios.h | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/include/atombios.h > > b/drivers/gpu/drm/amd/include/atombios.h > > index c1d7b1d0b952..47eb84598b96 100644 > > --- a/drivers/gpu/drm/amd/include/atombios.h > > +++ b/drivers/gpu/drm/amd/include/atombios.h > > @@ -1987,9 +1987,9 @@ typedef struct _PIXEL_CLOCK_PARAMETERS_V6 > > #define PIXEL_CLOCK_V6_MISC_HDMI_BPP_MASK 0x0c > > #define PIXEL_CLOCK_V6_MISC_HDMI_24BPP 0x00 > > #define PIXEL_CLOCK_V6_MISC_HDMI_36BPP 0x04 > > -#define PIXEL_CLOCK_V6_MISC_HDMI_36BPP_V6 0x08//for V6, the > > correct defintion for 36bpp should be 2 for 36bpp(2:1) > > +#define PIXEL_CLOCK_V6_MISC_HDMI_36BPP_V6 0x08//for V6, the > > correct definition for 36bpp should be 2 for 36bpp(2:1) > > #define PIXEL_CLOCK_V6_MISC_HDMI_30BPP 0x08 > > -#define PIXEL_CLOCK_V6_MISC_HDMI_30BPP_V6 0x04//for V6, the > > correct defintion for 30bpp should be 1 for 36bpp(5:4) > > +#define PIXEL_CLOCK_V6_MISC_HDMI_30BPP_V6 0x04//for V6, the > > correct definition for 30bpp should be 1 for 36bpp(5:4) > > #define PIXEL_CLOCK_V6_MISC_HDMI_48BPP 0x0c > > #define PIXEL_CLOCK_V6_MISC_REF_DIV_SRC 0x10 > > #define PIXEL_CLOCK_V6_MISC_GEN_DPREFCLK0x40 > > -- > > > -- > ~Randy > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes
Applied. Thanks! Alex On Thu, Mar 25, 2021 at 5:26 AM Nirmoy wrote: > > > Reviewed-by: Nirmoy Das > > On 3/25/21 9:53 AM, Bhaskar Chowdhury wrote: > > s/acccess/access/ > > s/inferface/interface/ > > s/sequnce/sequence/ .two different places. > > s/retrive/retrieve/ > > s/sheduling/scheduling/ > > s/independant/independent/ > > s/wether/whether/ ..two different places. > > s/emmit/emit/ > > s/synce/sync/ > > > > > > Signed-off-by: Bhaskar Chowdhury > > --- > > drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 22 +++--- > > 1 file changed, 11 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > > b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > > index a368724c3dfc..4502b95ddf6b 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > > @@ -1877,7 +1877,7 @@ static void gfx_v7_0_init_compute_vmid(struct > > amdgpu_device *adev) > > mutex_unlock(>srbm_mutex); > > > > /* Initialize all compute VMIDs to have no GDS, GWS, or OA > > -acccess. These should be enabled by FW for target VMIDs. */ > > +access. These should be enabled by FW for target VMIDs. */ > > for (i = adev->vm_manager.first_kfd_vmid; i < AMDGPU_NUM_VMID; i++) { > > WREG32(amdgpu_gds_reg_offset[i].mem_base, 0); > > WREG32(amdgpu_gds_reg_offset[i].mem_size, 0); > > @@ -2058,7 +2058,7 @@ static void gfx_v7_0_constants_init(struct > > amdgpu_device *adev) > >* @adev: amdgpu_device pointer > >* > >* Set up the number and offset of the CP scratch registers. > > - * NOTE: use of CP scratch registers is a legacy inferface and > > + * NOTE: use of CP scratch registers is a legacy interface and > >* is not used by default on newer asics (r6xx+). On newer asics, > >* memory buffers are used for fences rather than scratch regs. > >*/ > > @@ -2172,7 +2172,7 @@ static void gfx_v7_0_ring_emit_vgt_flush(struct > > amdgpu_ring *ring) > >* @seq: sequence number > >* @flags: fence related flags > >* > > - * Emits a fence sequnce number on the gfx ring and flushes > > + * Emits a fence sequence number on the gfx ring and flushes > >* GPU caches. > >*/ > > static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 > > addr, > > @@ -2215,7 +2215,7 @@ static void gfx_v7_0_ring_emit_fence_gfx(struct > > amdgpu_ring *ring, u64 addr, > >* @seq: sequence number > >* @flags: fence related flags > >* > > - * Emits a fence sequnce number on the compute ring and flushes > > + * Emits a fence sequence number on the compute ring and flushes > >* GPU caches. > >*/ > > static void gfx_v7_0_ring_emit_fence_compute(struct amdgpu_ring *ring, > > @@ -2245,14 +2245,14 @@ static void gfx_v7_0_ring_emit_fence_compute(struct > > amdgpu_ring *ring, > >* gfx_v7_0_ring_emit_ib - emit an IB (Indirect Buffer) on the ring > >* > >* @ring: amdgpu_ring structure holding ring information > > - * @job: job to retrive vmid from > > + * @job: job to retrieve vmid from > >* @ib: amdgpu indirect buffer object > >* @flags: options (AMDGPU_HAVE_CTX_SWITCH) > >* > >* Emits an DE (drawing engine) or CE (constant engine) IB > >* on the gfx ring. IBs are usually generated by userspace > >* acceleration drivers and submitted to the kernel for > > - * sheduling on the ring. This function schedules the IB > > + * scheduling on the ring. This function schedules the IB > >* on the gfx ring for execution by the GPU. > >*/ > > static void gfx_v7_0_ring_emit_ib_gfx(struct amdgpu_ring *ring, > > @@ -2402,7 +2402,7 @@ static int gfx_v7_0_ring_test_ib(struct amdgpu_ring > > *ring, long timeout) > > > > /* > >* CP. > > - * On CIK, gfx and compute now have independant command processors. > > + * On CIK, gfx and compute now have independent command processors. > >* > >* GFX > >* Gfx consists of a single ring and can process both gfx jobs and > > @@ -2630,7 +2630,7 @@ static int gfx_v7_0_cp_gfx_resume(struct > > amdgpu_device *adev) > > ring->wptr = 0; > > WREG32(mmCP_RB0_WPTR, lower_32_bits(ring->wptr)); > > > > - /* set the wb address wether it's enabled or not */ > > + /* set the wb address whether it's enabled or not */ > > rptr_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4); > > WREG32(mmCP_RB0_RPTR_ADDR, lower_32_bits(rptr_addr)); > > WREG32(mmCP_RB0_RPTR_ADDR_HI, upper_32_bits(rptr_addr) & 0xFF); > > @@ -2985,7 +2985,7 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device > > *adev, > > mqd->cp_hqd_pq_wptr_poll_addr_lo = wb_gpu_addr & 0xfffc; > > mqd->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & > > 0x; > > > > - /* set the wb address wether it's enabled or not */ > > + /* set the wb address whether it's enabled or not */ > > wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4); > >
Re: [PATCH V2] drm/radeon/r600_cs: Few typo fixes
Applied. Thanks! Alex On Wed, Mar 24, 2021 at 7:46 PM Randy Dunlap wrote: > > On 3/24/21 4:29 PM, Bhaskar Chowdhury wrote: > > s/miror/mirror/ > > s/needind/needing/ > > s/informations/information/ > > > > Signed-off-by: Bhaskar Chowdhury > > Acked-by: Randy Dunlap > > Thanks. > > > --- > > Changes from V1: > > Randy's finding incorporated ,i.e in one place,informations->information > > Adjusted the subject line accordingly > > > > drivers/gpu/drm/radeon/r600_cs.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/radeon/r600_cs.c > > b/drivers/gpu/drm/radeon/r600_cs.c > > index 34b7c6f16479..8be4799a98ef 100644 > > --- a/drivers/gpu/drm/radeon/r600_cs.c > > +++ b/drivers/gpu/drm/radeon/r600_cs.c > > @@ -38,7 +38,7 @@ extern void r600_cs_legacy_get_tiling_conf(struct > > drm_device *dev, u32 *npipes, > > > > > > struct r600_cs_track { > > - /* configuration we miror so that we use same code btw kms/ums */ > > + /* configuration we mirror so that we use same code btw kms/ums */ > > u32 group_size; > > u32 nbanks; > > u32 npipes; > > @@ -963,7 +963,7 @@ static int r600_cs_parse_packet0(struct > > radeon_cs_parser *p, > > * > > * This function will test against r600_reg_safe_bm and return 0 > > * if register is safe. If register is not flag as safe this function > > - * will test it against a list of register needind special handling. > > + * will test it against a list of register needing special handling. > > */ > > static int r600_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) > > { > > @@ -2336,7 +2336,7 @@ int r600_cs_parse(struct radeon_cs_parser *p) > > /** > > * r600_dma_cs_next_reloc() - parse next reloc > > * @p: parser structure holding parsing context. > > - * @cs_reloc:reloc informations > > + * @cs_reloc:reloc information > > * > > * Return the next reloc, do bo validation and compute > > * GPU offset using the provided start. > > -- > > > -- > ~Randy > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] amdgpu: securedisplay: simplify i2c hexdump output
Applied. Thanks! Alex On Wed, Mar 24, 2021 at 9:37 AM Arnd Bergmann wrote: > > From: Arnd Bergmann > > A previous fix I did left a rather complicated loop in > amdgpu_securedisplay_debugfs_write() for what could be expressed in a > simple sprintf, as Rasmus pointed out. > > This drops the leading 0x for each byte, but is otherwise > much nicer. > > Suggested-by: Rasmus Villemoes > Signed-off-by: Arnd Bergmann > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c | 11 +++ > 1 file changed, 3 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c > index 69d7f6bff5d4..fc3ddd7aa6f0 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c > @@ -92,9 +92,7 @@ static ssize_t amdgpu_securedisplay_debugfs_write(struct > file *f, const char __u > struct drm_device *dev = adev_to_drm(adev); > uint32_t phy_id; > uint32_t op; > - int i; > char str[64]; > - char i2c_output[256]; > int ret; > > if (*pos || size > sizeof(str) - 1) > @@ -136,12 +134,9 @@ static ssize_t amdgpu_securedisplay_debugfs_write(struct > file *f, const char __u > ret = psp_securedisplay_invoke(psp, > TA_SECUREDISPLAY_COMMAND__SEND_ROI_CRC); > if (!ret) { > if (securedisplay_cmd->status == > TA_SECUREDISPLAY_STATUS__SUCCESS) { > - int pos = 0; > - memset(i2c_output, 0, sizeof(i2c_output)); > - for (i = 0; i < > TA_SECUREDISPLAY_I2C_BUFFER_SIZE; i++) > - pos += sprintf(i2c_output + pos, " > 0x%X", > - > securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf[i]); > - dev_info(adev->dev, "SECUREDISPLAY: I2C > buffer out put is :%s\n", i2c_output); > + dev_info(adev->dev, "SECUREDISPLAY: I2C > buffer out put is: %*ph\n", > +TA_SECUREDISPLAY_I2C_BUFFER_SIZE, > + > securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf); > } else { > psp_securedisplay_parse_resp_status(psp, > securedisplay_cmd->status); > } > -- > 2.29.2 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v2] drm/amdgpu: Ensure that the modifier requested is supported by plane.
On Wed, Mar 24, 2021 at 4:16 PM Mark Yacoub wrote: > > From: Mark Yacoub > > On initializing the framebuffer, call drm_any_plane_has_format to do a > check if the modifier is supported. drm_any_plane_has_format calls > dm_plane_format_mod_supported which is extended to validate that the > modifier is on the list of the plane's supported modifiers. > > The bug was caught using igt-gpu-tools test: > kms_addfb_basic.addfb25-bad-modifier > > Tested on ChromeOS Zork by turning on the display, running an overlay > test, and running a YT video. > > === Changes from v1 === > Explicitly handle DRM_FORMAT_MOD_INVALID modifier. > > Cc: Alex Deucher > Cc: Bas Nieuwenhuizen > Signed-off-by: default avatarMark Yacoub Applied. Thanks! Alex > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_display.c| 13 + > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 18 +++--- > 2 files changed, 28 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > index afa5f8ad0f563..a947b5aa420d2 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > @@ -908,6 +908,19 @@ int amdgpu_display_gem_fb_verify_and_init( > _fb_funcs); > if (ret) > goto err; > + /* Verify that the modifier is supported. */ > + if (!drm_any_plane_has_format(dev, mode_cmd->pixel_format, > + mode_cmd->modifier[0])) { > + struct drm_format_name_buf format_name; > + drm_dbg_kms(dev, > + "unsupported pixel format %s / modifier 0x%llx\n", > + drm_get_format_name(mode_cmd->pixel_format, > + _name), > + mode_cmd->modifier[0]); > + > + ret = -EINVAL; > + goto err; > + } > > ret = amdgpu_display_framebuffer_init(dev, rfb, mode_cmd, obj); > if (ret) > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 961abf1cf040c..6fc746996c24f 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -3939,6 +3939,7 @@ static bool dm_plane_format_mod_supported(struct > drm_plane *plane, > { > struct amdgpu_device *adev = drm_to_adev(plane->dev); > const struct drm_format_info *info = drm_format_info(format); > + int i; > > enum dm_micro_swizzle microtile = > modifier_gfx9_swizzle_mode(modifier) & 3; > > @@ -3946,11 +3947,22 @@ static bool dm_plane_format_mod_supported(struct > drm_plane *plane, > return false; > > /* > -* We always have to allow this modifier, because core DRM still > -* checks LINEAR support if userspace does not provide modifers. > +* We always have to allow these modifiers: > +* 1. Core DRM checks for LINEAR support if userspace does not > provide modifiers. > +* 2. Not passing any modifiers is the same as explicitly passing > INVALID. > */ > - if (modifier == DRM_FORMAT_MOD_LINEAR) > + if (modifier == DRM_FORMAT_MOD_LINEAR || > + modifier == DRM_FORMAT_MOD_INVALID) { > return true; > + } > + > + /* Check that the modifier is on the list of the plane's supported > modifiers. */ > + for (i = 0; i < plane->modifier_count; i++) { > + if (modifier == plane->modifiers[i]) > + break; > + } > + if (i == plane->modifier_count) > + return false; > > /* > * The arbitrary tiling support for multiplane formats has not been > hooked > -- > 2.31.0.291.g576ba9dcdaf-goog > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 4/4] drm/amdgpu: Introduce new SETUP_TMR interface
This new interface passes both virtual and physical address to PSP. It is backword compatible with old interface. v2: use a macro to simply tmr physical address calc (Lijo) Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 12 +--- drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 11 ++- 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index 1005124..d224c53 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -331,8 +331,12 @@ psp_cmd_submit_buf(struct psp_context *psp, static void psp_prep_tmr_cmd_buf(struct psp_context *psp, struct psp_gfx_cmd_resp *cmd, -uint64_t tmr_mc, uint32_t size) +uint64_t tmr_mc, struct amdgpu_bo *tmr_bo) { + struct amdgpu_device *adev = psp->adev; + uint32_t size = amdgpu_bo_size(tmr_bo); + uint64_t tmr_pa = amdgpu_gmc_gpu_pa(adev, tmr_bo); + if (amdgpu_sriov_vf(psp->adev)) cmd->cmd_id = GFX_CMD_ID_SETUP_VMR; else @@ -340,6 +344,9 @@ static void psp_prep_tmr_cmd_buf(struct psp_context *psp, cmd->cmd.cmd_setup_tmr.buf_phy_addr_lo = lower_32_bits(tmr_mc); cmd->cmd.cmd_setup_tmr.buf_phy_addr_hi = upper_32_bits(tmr_mc); cmd->cmd.cmd_setup_tmr.buf_size = size; + cmd->cmd.cmd_setup_tmr.bitfield.virt_phy_addr = 1; + cmd->cmd.cmd_setup_tmr.system_phy_addr_lo = lower_32_bits(tmr_pa); + cmd->cmd.cmd_setup_tmr.system_phy_addr_hi = upper_32_bits(tmr_pa); } static void psp_prep_load_toc_cmd_buf(struct psp_gfx_cmd_resp *cmd, @@ -459,8 +466,7 @@ static int psp_tmr_load(struct psp_context *psp) if (!cmd) return -ENOMEM; - psp_prep_tmr_cmd_buf(psp, cmd, psp->tmr_mc_addr, -amdgpu_bo_size(psp->tmr_bo)); + psp_prep_tmr_cmd_buf(psp, cmd, psp->tmr_mc_addr, psp->tmr_bo); DRM_INFO("reserve 0x%lx from 0x%llx for PSP TMR\n", amdgpu_bo_size(psp->tmr_bo), psp->tmr_mc_addr); diff --git a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h index 8c1d0b5..edc5d75 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h +++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h @@ -170,10 +170,19 @@ struct psp_gfx_cmd_setup_tmr uint32_tbuf_phy_addr_lo; /* bits [31:0] of GPU Virtual address of TMR buffer (must be 4 KB aligned) */ uint32_tbuf_phy_addr_hi; /* bits [63:32] of GPU Virtual address of TMR buffer */ uint32_tbuf_size; /* buffer size in bytes (must be multiple of 4 KB) */ +union { + struct { + uint32_tsriov_enabled:1; /* whether the device runs under SR-IOV*/ + uint32_tvirt_phy_addr:1; /* driver passes both virtual and physical address to PSP*/ + uint32_treserved:30; + } bitfield; + uint32_ttmr_flags; +}; +uint32_tsystem_phy_addr_lo;/* bits [31:0] of system physical address of TMR buffer (must be 4 KB aligned) */ +uint32_tsystem_phy_addr_hi;/* bits [63:32] of system physical address of TMR buffer */ }; - /* FW types for GFX_CMD_ID_LOAD_IP_FW command. Limit 31. */ enum psp_gfx_fw_type { GFX_FW_TYPE_NONE= 0,/* */ -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 3/4] Revert "drm/amdgpu: workaround the TMR MC address issue (v2)"
This reverts commit 34a33d4683cba7ba63c894132efb1998c0217631. Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 9 - drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 10 -- drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 21 ++--- drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 10 -- 4 files changed, 10 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index 7cd9d34..a9e0bba 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -217,15 +217,6 @@ struct amdgpu_gmc { */ u64 fb_start; u64 fb_end; - /* In the case of use GART table for vmid0 FB access, [fb_start, fb_end] -* will be squeezed to GART aperture. But we have a PSP FW issue to fix -* for now. To temporarily workaround the PSP FW issue, added below two -* variables to remember the original fb_start/end to re-enable FB -* aperture to workaround the PSP FW issue. Will delete it after we -* get a proper PSP FW fix. -*/ - u64 fb_start_original; - u64 fb_end_original; unsignedvram_width; u64 real_vram_size; int vram_mtrr; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index 5c71c5c..1005124 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -410,16 +410,6 @@ static int psp_tmr_init(struct psp_context *psp) AMDGPU_GEM_DOMAIN_VRAM, >tmr_bo, >tmr_mc_addr, pptr); - /* workaround the tmr_mc_addr: -* PSP requires an address in FB aperture. Right now driver produce -* tmr_mc_addr in the GART aperture. Convert it back to FB aperture -* for PSP. Will revert it after we get a fix from PSP FW. -*/ - if (psp->adev->asic_type == CHIP_ALDEBARAN) { - psp->tmr_mc_addr -= psp->adev->gmc.fb_start; - psp->tmr_mc_addr += psp->adev->gmc.fb_start_original; - } - return ret; } diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c index 7beef4c..8c8f0d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c @@ -154,21 +154,12 @@ static void gfxhub_v1_0_init_system_aperture_regs(struct amdgpu_device *adev) * FB aperture and AGP aperture. Disable them. */ if (adev->gmc.pdb0_bo) { - if (adev->asic_type == CHIP_ALDEBARAN) { - WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_TOP, adev->gmc.fb_end_original >> 24); - WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_BASE, adev->gmc.fb_start_original >> 24); - WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, 0); - WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, 0xFF); - WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR, adev->gmc.fb_start_original >> 18); - WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR, adev->gmc.fb_end_original >> 18); - } else { - WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_TOP, 0); - WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_BASE, 0x00FF); - WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, 0); - WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, 0xFF); - WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR, 0x3FFF); - WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR, 0); - } + WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_TOP, 0); + WREG32_SOC15(GC, 0, mmMC_VM_FB_LOCATION_BASE, 0x00FF); + WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, 0); + WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, 0xFF); + WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR, 0x3FFF); + WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR, 0); } } diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c index 8862ac2..8bb36d1 100644 --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c @@ -51,8 +51,6 @@ static u64 mmhub_v1_7_get_fb_location(struct amdgpu_device *adev) adev->gmc.fb_start = base; adev->gmc.fb_end = top; - adev->gmc.fb_start_original = base; - adev->gmc.fb_end_original = top; return base; } @@ -148,10 +146,10 @@ static void mmhub_v1_7_init_system_aperture_regs(struct amdgpu_device *adev) if (adev->gmc.pdb0_bo) { WREG32_SOC15(MMHUB, 0,
[PATCH 1/4] drm/amdgpu: Macros for vram physical addr calculation
Add one macro to calculate BO's GPU physical address. And another one to calculate BO's CPU physical address. Signed-off-by: Oak Zeng Suggested-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index 2ee8d1b..7cd9d34 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -283,6 +283,9 @@ struct amdgpu_gmc { #define amdgpu_gmc_get_vm_pde(adev, level, dst, flags) (adev)->gmc.gmc_funcs->get_vm_pde((adev), (level), (dst), (flags)) #define amdgpu_gmc_get_vm_pte(adev, mapping, flags) (adev)->gmc.gmc_funcs->get_vm_pte((adev), (mapping), (flags)) #define amdgpu_gmc_get_vbios_fb_size(adev) (adev)->gmc.gmc_funcs->get_vbios_fb_size((adev)) +#define amdgpu_gmc_gpu_va2pa(adev, va) (va - (adev)->gmc.vram_start + (adev)->vm_manager.vram_base_offset) +#define amdgpu_gmc_gpu_pa(adev, bo) amdgpu_gmc_gpu_va2pa(adev, amdgpu_bo_gpu_offset(bo)) +#define amdgpu_gmc_cpu_pa(adev, bo) (amdgpu_bo_gpu_offset(bo) - (adev)->gmc.vram_start + (adev)->gmc.aper_base) /** * amdgpu_gmc_vram_full_visible - Check if full VRAM is visible through the BAR -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 2/4] drm/amdgpu: use macros to simplify codes
Use amdgpu_gmc_gpu_pa and amdgpu_gmc_cpu_pa to simplify codes. No logic change. Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c | 4 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +--- drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 3 +-- drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 3 +-- drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c | 3 +-- drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c | 3 +-- drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c | 3 +-- 15 files changed, 15 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 7357c1e..33581ef 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1831,8 +1831,7 @@ static int get_sg_table(struct amdgpu_device *adev, goto out; if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_VRAM) { - bus_addr = amdgpu_bo_gpu_offset(bo) - adev->gmc.vram_start - + adev->gmc.aper_base + offset; + bus_addr = amdgpu_gmc_cpu_pa(adev, bo) + offset; aper_limit = adev->gmc.aper_base + adev->gmc.aper_size; if (bus_addr + (chunks * page_size) > aper_limit) { pr_err("sg: bus addr not inside pci aperture\n"); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c index fd04a3a..dbeb774 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c @@ -211,7 +211,6 @@ static int amdgpufb_create(struct drm_fb_helper *helper, struct drm_gem_object *gobj = NULL; struct amdgpu_bo *abo = NULL; int ret; - unsigned long tmp; memset(_cmd, 0, sizeof(mode_cmd)); mode_cmd.width = sizes->surface_width; @@ -252,8 +251,7 @@ static int amdgpufb_create(struct drm_fb_helper *helper, info->fbops = _ops; - tmp = amdgpu_bo_gpu_offset(abo) - adev->gmc.vram_start; - info->fix.smem_start = adev->gmc.aper_base + tmp; + info->fix.smem_start = amdgpu_gmc_cpu_pa(adev, abo); info->fix.smem_len = amdgpu_bo_size(abo); info->screen_base = amdgpu_bo_kptr(abo); info->screen_size = amdgpu_bo_size(abo); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 46a10c1..391d41f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -475,9 +475,7 @@ int amdgpu_gem_dgma_ioctl(struct drm_device *dev, void *data, r = -EINVAL; goto release_object; } - args->addr = amdgpu_bo_gpu_offset(abo); - args->addr -= adev->gmc.vram_start; - args->addr += adev->gmc.aper_base; + args->addr = amdgpu_gmc_cpu_pa(adev, abo); break; default: return -EINVAL; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index e6344a8..f02143a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c @@ -671,8 +671,7 @@ void amdgpu_gmc_init_pdb0(struct amdgpu_device *adev) u64 vram_addr = adev->vm_manager.vram_base_offset - adev->gmc.xgmi.physical_node_id * adev->gmc.xgmi.node_segment_size; u64 vram_end = vram_addr + vram_size; - u64 gart_ptb_gpu_pa = amdgpu_bo_gpu_offset(adev->gart.bo) + - adev->vm_manager.vram_base_offset - adev->gmc.vram_start; + u64 gart_ptb_gpu_pa = amdgpu_gmc_gpu_pa(adev, adev->gart.bo); flags |= AMDGPU_PTE_VALID | AMDGPU_PTE_READABLE; flags |= AMDGPU_PTE_WRITEABLE; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index d1211ef..9c18e00 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2234,9 +2234,7 @@ static int amdgpu_ssg_init(struct amdgpu_device *adev) init_completion(>ssg.cmp); - res.start = adev->gmc.aper_base + - (amdgpu_bo_gpu_offset(adev->direct_gma.dgma_bo) - -adev->gmc.vram_start); + res.start = amdgpu_gmc_cpu_pa(adev, adev->direct_gma.dgma_bo); res.end = res.start + amdgpu_bo_size(adev->direct_gma.dgma_bo) - 1; res.name = "DirectGMA"; diff --git
Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
There are a few issues here like - how u handle non guilty singnaled jobs in drm_sched_stop, currently looks like you don't call put for them and just explicitly free them as before. Also sched->free_guilty seems useless with the new approach. Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach... But first - We need Christian to express his opinion on this since I think he opposed refcounting jobs and that we should concentrate on fences instead. Christian - can you chime in here ? Andrey On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote: [AMD Official Use Only - Internal Distribution Only] Hi, Andrey Thank you for your good opinions. I literally agree with you that the refcount could solve the get_clean_up_up cocurrent job gracefully, and no need to re-insert the job back anymore. I quickly made a draft for this idea as follows: How do you like it? I will start implement to it after I got your acknowledge. Thanks, Jack +void drm_job_get(struct drm_sched_job *s_job) +{ + kref_get(_job->refcount); +} + +void drm_job_do_release(struct kref *ref) +{ + struct drm_sched_job *s_job; + struct drm_gpu_scheduler *sched; + + s_job = container_of(ref, struct drm_sched_job, refcount); + sched = s_job->sched; + sched->ops->free_job(s_job); +} + +void drm_job_put(struct drm_sched_job *s_job) +{ + kref_put(_job->refcount, drm_job_do_release); +} + static void drm_sched_job_begin(struct drm_sched_job *s_job) { struct drm_gpu_scheduler *sched = s_job->sched; + kref_init(_job->refcount); + drm_job_get(s_job); spin_lock(>job_list_lock); list_add_tail(_job->node, >ring_mirror_list); drm_sched_start_timeout(sched); @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct work_struct *work) * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread * is parked at which point it's safe. */ - list_del_init(>node); + drm_job_get(job); spin_unlock(>job_list_lock); job->sched->ops->timedout_job(job); - + drm_job_put(job); /* * Guilty job did complete and hence needs to be manually removed * See drm_sched_stop doc. */ if (sched->free_guilty) { - job->sched->ops->free_job(job); sched->free_guilty = false; } } else { @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) - /* - * Reinsert back the bad job here - now it's safe as - * drm_sched_get_cleanup_job cannot race against us and release the - * bad job at this point - we parked (waited for) any in progress - * (earlier) cleanups and drm_sched_get_cleanup_job will not be called - * now until the scheduler thread is unparked. - */ - if (bad && bad->sched == sched) - /* - * Add at the head of the queue to reflect it was the earliest - * job extracted. - */ - list_add(>node, >ring_mirror_list); - /* * Iterate the job list from later to earlier one and either deactive * their HW callbacks or remove them from mirror list if they already @@ -774,7 +781,7 @@ static int drm_sched_main(void *param) kthread_should_stop()); if (cleanup_job) { - sched->ops->free_job(cleanup_job); + drm_job_put(cleanup_job); /* queue timeout for next job */ drm_sched_start_timeout(sched); } diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 5a1f068af1c2..b80513eec90f 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); * to schedule the job. */ struct drm_sched_job { + struct kref refcount; struct spsc_node queue_node; struct drm_gpu_scheduler *sched; struct drm_sched_fence *s_fence; @@ -198,6 +199,7 @@ struct drm_sched_job { enum drm_sched_priority s_priority; struct drm_sched_entity *entity; struct dma_fence_cb cb; + }; *From:* Grodzovsky, Andrey *Sent:* Friday, March 19, 2021 12:17 AM *To:* Zhang, Jack (Jian) ; Christian König ; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian ; Liu, Monk ; Deng, Emily ; Rob Herring ; Tomeu Vizoso ; Steven Price *Subject:* Re: [PATCH v3]
Re: Need to support mixed memory mappings with HMM
Am 2021-03-25 um 12:22 p.m. schrieb Christian König: >> My idea is to change the amdgpu_vm_update_mapping interface to use >> some >> high-bit in the pages_addr array to indicate the page type. For the >> default page type (0) nothing really changes for the callers. The >> "flags" parameter needs to become a pointer to an array that gets >> indexed by the high bits from the pages_addr array. For existing >> callers >> it's as easy as changing flags to (array of size 1). For >> HMM we >> would pass a pointer to a real array. > Yeah, I've thought about stuff like that as well for a while. > > Problem is that this won't work that easily. We assume at many places > that the flags doesn't change for the range in question. I think some lower level functions assume that the flags stay the same for physically contiguous ranges. But if you use the high-bits to encode the page type, the ranges won't be contiguous any more. So you can change page flags for different contiguous ranges. >>> Yeah, but then you also get absolutely zero THP and fragment flags >>> support. >> As long as you have a contiguous 2MB page with the same page type, I >> think you can still get a THP mapping in the GPU page table. But if one >> page in the middle of a 2MB page has a different page type, that will >> break the THP mapping, as it should. > > Yeah, but currently we detect that before we call down into the > functions to update the tables. > > When you give a mixed list the chance for that is just completely gone. Currently the detection of contiguous ranges is in amdgpu_vm_update_mapping: > if (num_entries > AMDGPU_GPU_PAGES_IN_CPU_PAGE) { uint64_t pfn = > cursor.start >> PAGE_SHIFT; uint64_t count; contiguous = > pages_addr[pfn + 1] == pages_addr[pfn] + PAGE_SIZE; tmp = num_entries > / AMDGPU_GPU_PAGES_IN_CPU_PAGE; for (count = 2; count < tmp; ++count) > { uint64_t idx = pfn + count; if (contiguous != (pages_addr[idx] == > pages_addr[idx - 1] + PAGE_SIZE)) break; } num_entries = count * > AMDGPU_GPU_PAGES_IN_CPU_PAGE; } If a page type changes from one page to the next, it will end a contiguous range and call into the lower level (amdgpu_vm_update_ptes). I don't think anything needs to change in amdgpu_vm_update_ptes, because all pages contiguous passed to it use the same page type, so the same flags. And the existing code in amdgpu_vm_update_mapping already does the right thing. I honestly don't see the problem. Regards, Felix > > Regards, > Christian. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Need to support mixed memory mappings with HMM
Am 25.03.21 um 17:20 schrieb Felix Kuehling: Am 2021-03-25 um 12:16 p.m. schrieb Christian König: Am 25.03.21 um 17:14 schrieb Felix Kuehling: Am 2021-03-25 um 12:10 p.m. schrieb Christian König: Am 25.03.21 um 17:03 schrieb Felix Kuehling: Hi, This is a long one with a proposal for a pretty significant redesign of how we handle migrations and VRAM management with HMM. This is based on my own debugging and reading of the migrate_vma helpers, as well as Alex's problems with migrations on A+A. I hope we can discuss this next Monday after you've had some time do digest it. I did some debugging yesterday and found that migrations to VRAM can fail for some pages. The current migration helpers have many corner cases where a page cannot be migrated. Some of them may be fixable (adding support for THP), others are not (locked pages are skipped to avoid deadlocks). Therefore I think our current code is too inflexible when it assumes that a range is entirely in one place. Alex also ran into some funny issues with COW on A+A where some pages get faulted back to system memory. I think a lot of the problems here will get easier once we support mixed mappings. Mixed GPU mappings === The idea is, to remove any assumptions that an entire svm_range is in one place. Instead hmm_range_fault gives us a list of pages, some of which are system memory and others are device_private or device_generic. We will need an amdgpu_vm interface that lets us map mixed page arrays where different pages use different PTE flags. We can have at least 3 different types of pages in one mapping: 1. System memory (S-bit set) 2. Local memory (S-bit cleared, MTYPE for local memory) 3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory) My idea is to change the amdgpu_vm_update_mapping interface to use some high-bit in the pages_addr array to indicate the page type. For the default page type (0) nothing really changes for the callers. The "flags" parameter needs to become a pointer to an array that gets indexed by the high bits from the pages_addr array. For existing callers it's as easy as changing flags to (array of size 1). For HMM we would pass a pointer to a real array. Yeah, I've thought about stuff like that as well for a while. Problem is that this won't work that easily. We assume at many places that the flags doesn't change for the range in question. I think some lower level functions assume that the flags stay the same for physically contiguous ranges. But if you use the high-bits to encode the page type, the ranges won't be contiguous any more. So you can change page flags for different contiguous ranges. Yeah, but then you also get absolutely zero THP and fragment flags support. As long as you have a contiguous 2MB page with the same page type, I think you can still get a THP mapping in the GPU page table. But if one page in the middle of a 2MB page has a different page type, that will break the THP mapping, as it should. Yeah, but currently we detect that before we call down into the functions to update the tables. When you give a mixed list the chance for that is just completely gone. Regards, Christian. Regards, Felix But I think we could also add those later on. Regards, Christian. Regards, Felix We would somehow need to change that to get the flags directly from the low bits of the DMA address or something instead. Christian. Once this is done, it leads to a number of opportunities for simplification and better efficiency in kfd_svm: * Support migration when cpages != npages * Migrate a part of an svm_range without splitting it. No more splitting of ranges in CPU page faults * Migrate a part of an svm_range in GPU page fault handler. No more migrating the whole range for a single page fault * Simplified VRAM management (see below) With that, svm_range will no longer have an "actual_loc" field. If we're not sure where the data is, we need to call migrate. If it's already in the right place, then cpages will be 0 and we can skip all the steps after migrate_vma_setup. Simplified VRAM management == VRAM BOs are no longer associated with pranges. Instead they are "free-floating", allocated during migration to VRAM, with reference count for each page that uses the BO. Ref is released in page-release callback. When the ref count drops to 0, free the BO. VRAM BO size should match the migration granularity, 2MB by default. That way the BO can be freed when memory gets migrated out. If migration of some pages fails the BO may not be fully occupied. Also some pages may be released individually on A+A due to COW or other events. Eviction needs to migrate all the pages still using the BO. If the BO struct keeps an array of page pointers, that's basically the migrate.src for the eviction. Migration calls "try_to_unmap", which has the best chance of freeing the BO, even when shared by multiple processes. If we cannot
Re: [PATCH drm/amdgpu 0/2] Convert sysfs sprintf/snprintf family to sysfs_emit
Applied the series. Thanks! Alex On Wed, Mar 24, 2021 at 5:17 AM Tian Tao wrote: > > Use the generic sysfs_emit() function to take place of > snprintf/scnprintf, to avoid buffer overrun. > > Tian Tao (2): > drm/amdgpu: Convert sysfs sprintf/snprintf family to sysfs_emit > drm/amd/pm: Convert sysfs sprintf/snprintf family to sysfs_emit > > drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 6 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 32 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 +- > drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 2 +- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 88 > ++-- > 9 files changed, 73 insertions(+), 79 deletions(-) > > -- > 2.7.4 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Need to support mixed memory mappings with HMM
Am 2021-03-25 um 12:16 p.m. schrieb Christian König: > Am 25.03.21 um 17:14 schrieb Felix Kuehling: >> Am 2021-03-25 um 12:10 p.m. schrieb Christian König: >>> >>> Am 25.03.21 um 17:03 schrieb Felix Kuehling: Hi, This is a long one with a proposal for a pretty significant redesign of how we handle migrations and VRAM management with HMM. This is based on my own debugging and reading of the migrate_vma helpers, as well as Alex's problems with migrations on A+A. I hope we can discuss this next Monday after you've had some time do digest it. I did some debugging yesterday and found that migrations to VRAM can fail for some pages. The current migration helpers have many corner cases where a page cannot be migrated. Some of them may be fixable (adding support for THP), others are not (locked pages are skipped to avoid deadlocks). Therefore I think our current code is too inflexible when it assumes that a range is entirely in one place. Alex also ran into some funny issues with COW on A+A where some pages get faulted back to system memory. I think a lot of the problems here will get easier once we support mixed mappings. Mixed GPU mappings === The idea is, to remove any assumptions that an entire svm_range is in one place. Instead hmm_range_fault gives us a list of pages, some of which are system memory and others are device_private or device_generic. We will need an amdgpu_vm interface that lets us map mixed page arrays where different pages use different PTE flags. We can have at least 3 different types of pages in one mapping: 1. System memory (S-bit set) 2. Local memory (S-bit cleared, MTYPE for local memory) 3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory) My idea is to change the amdgpu_vm_update_mapping interface to use some high-bit in the pages_addr array to indicate the page type. For the default page type (0) nothing really changes for the callers. The "flags" parameter needs to become a pointer to an array that gets indexed by the high bits from the pages_addr array. For existing callers it's as easy as changing flags to (array of size 1). For HMM we would pass a pointer to a real array. >>> Yeah, I've thought about stuff like that as well for a while. >>> >>> Problem is that this won't work that easily. We assume at many places >>> that the flags doesn't change for the range in question. >> I think some lower level functions assume that the flags stay the same >> for physically contiguous ranges. But if you use the high-bits to encode >> the page type, the ranges won't be contiguous any more. So you can >> change page flags for different contiguous ranges. > > Yeah, but then you also get absolutely zero THP and fragment flags > support. As long as you have a contiguous 2MB page with the same page type, I think you can still get a THP mapping in the GPU page table. But if one page in the middle of a 2MB page has a different page type, that will break the THP mapping, as it should. Regards, Felix > > But I think we could also add those later on. > > Regards, > Christian. > >> >> Regards, >> Felix >> >> >>> We would somehow need to change that to get the flags directly from >>> the low bits of the DMA address or something instead. >>> >>> Christian. >>> Once this is done, it leads to a number of opportunities for simplification and better efficiency in kfd_svm: * Support migration when cpages != npages * Migrate a part of an svm_range without splitting it. No more splitting of ranges in CPU page faults * Migrate a part of an svm_range in GPU page fault handler. No more migrating the whole range for a single page fault * Simplified VRAM management (see below) With that, svm_range will no longer have an "actual_loc" field. If we're not sure where the data is, we need to call migrate. If it's already in the right place, then cpages will be 0 and we can skip all the steps after migrate_vma_setup. Simplified VRAM management == VRAM BOs are no longer associated with pranges. Instead they are "free-floating", allocated during migration to VRAM, with reference count for each page that uses the BO. Ref is released in page-release callback. When the ref count drops to 0, free the BO. VRAM BO size should match the migration granularity, 2MB by default. That way the BO can be freed when memory gets migrated out. If migration of some pages fails the BO may not be fully occupied. Also some pages may be released individually on A+A due to COW or other events. Eviction needs to migrate all the pages still using the BO. If the BO struct
Re: Need to support mixed memory mappings with HMM
Am 25.03.21 um 17:14 schrieb Felix Kuehling: Am 2021-03-25 um 12:10 p.m. schrieb Christian König: Am 25.03.21 um 17:03 schrieb Felix Kuehling: Hi, This is a long one with a proposal for a pretty significant redesign of how we handle migrations and VRAM management with HMM. This is based on my own debugging and reading of the migrate_vma helpers, as well as Alex's problems with migrations on A+A. I hope we can discuss this next Monday after you've had some time do digest it. I did some debugging yesterday and found that migrations to VRAM can fail for some pages. The current migration helpers have many corner cases where a page cannot be migrated. Some of them may be fixable (adding support for THP), others are not (locked pages are skipped to avoid deadlocks). Therefore I think our current code is too inflexible when it assumes that a range is entirely in one place. Alex also ran into some funny issues with COW on A+A where some pages get faulted back to system memory. I think a lot of the problems here will get easier once we support mixed mappings. Mixed GPU mappings === The idea is, to remove any assumptions that an entire svm_range is in one place. Instead hmm_range_fault gives us a list of pages, some of which are system memory and others are device_private or device_generic. We will need an amdgpu_vm interface that lets us map mixed page arrays where different pages use different PTE flags. We can have at least 3 different types of pages in one mapping: 1. System memory (S-bit set) 2. Local memory (S-bit cleared, MTYPE for local memory) 3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory) My idea is to change the amdgpu_vm_update_mapping interface to use some high-bit in the pages_addr array to indicate the page type. For the default page type (0) nothing really changes for the callers. The "flags" parameter needs to become a pointer to an array that gets indexed by the high bits from the pages_addr array. For existing callers it's as easy as changing flags to (array of size 1). For HMM we would pass a pointer to a real array. Yeah, I've thought about stuff like that as well for a while. Problem is that this won't work that easily. We assume at many places that the flags doesn't change for the range in question. I think some lower level functions assume that the flags stay the same for physically contiguous ranges. But if you use the high-bits to encode the page type, the ranges won't be contiguous any more. So you can change page flags for different contiguous ranges. Yeah, but then you also get absolutely zero THP and fragment flags support. But I think we could also add those later on. Regards, Christian. Regards, Felix We would somehow need to change that to get the flags directly from the low bits of the DMA address or something instead. Christian. Once this is done, it leads to a number of opportunities for simplification and better efficiency in kfd_svm: * Support migration when cpages != npages * Migrate a part of an svm_range without splitting it. No more splitting of ranges in CPU page faults * Migrate a part of an svm_range in GPU page fault handler. No more migrating the whole range for a single page fault * Simplified VRAM management (see below) With that, svm_range will no longer have an "actual_loc" field. If we're not sure where the data is, we need to call migrate. If it's already in the right place, then cpages will be 0 and we can skip all the steps after migrate_vma_setup. Simplified VRAM management == VRAM BOs are no longer associated with pranges. Instead they are "free-floating", allocated during migration to VRAM, with reference count for each page that uses the BO. Ref is released in page-release callback. When the ref count drops to 0, free the BO. VRAM BO size should match the migration granularity, 2MB by default. That way the BO can be freed when memory gets migrated out. If migration of some pages fails the BO may not be fully occupied. Also some pages may be released individually on A+A due to COW or other events. Eviction needs to migrate all the pages still using the BO. If the BO struct keeps an array of page pointers, that's basically the migrate.src for the eviction. Migration calls "try_to_unmap", which has the best chance of freeing the BO, even when shared by multiple processes. If we cannot guarantee eviction of pages, we cannot use TTM for VRAM allocations. Need to use amdgpu_vram_mgr. Need a way to detect memory pressure so we can start evicting memory. Regards, Felix ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Need to support mixed memory mappings with HMM
Am 2021-03-25 um 12:10 p.m. schrieb Christian König: > > > Am 25.03.21 um 17:03 schrieb Felix Kuehling: >> Hi, >> >> This is a long one with a proposal for a pretty significant redesign of >> how we handle migrations and VRAM management with HMM. This is based on >> my own debugging and reading of the migrate_vma helpers, as well as >> Alex's problems with migrations on A+A. I hope we can discuss this next >> Monday after you've had some time do digest it. >> >> I did some debugging yesterday and found that migrations to VRAM can >> fail for some pages. The current migration helpers have many corner >> cases where a page cannot be migrated. Some of them may be fixable >> (adding support for THP), others are not (locked pages are skipped to >> avoid deadlocks). Therefore I think our current code is too inflexible >> when it assumes that a range is entirely in one place. >> >> Alex also ran into some funny issues with COW on A+A where some pages >> get faulted back to system memory. I think a lot of the problems here >> will get easier once we support mixed mappings. >> >> Mixed GPU mappings >> === >> >> The idea is, to remove any assumptions that an entire svm_range is in >> one place. Instead hmm_range_fault gives us a list of pages, some of >> which are system memory and others are device_private or device_generic. >> >> We will need an amdgpu_vm interface that lets us map mixed page arrays >> where different pages use different PTE flags. We can have at least 3 >> different types of pages in one mapping: >> >> 1. System memory (S-bit set) >> 2. Local memory (S-bit cleared, MTYPE for local memory) >> 3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory) >> >> My idea is to change the amdgpu_vm_update_mapping interface to use some >> high-bit in the pages_addr array to indicate the page type. For the >> default page type (0) nothing really changes for the callers. The >> "flags" parameter needs to become a pointer to an array that gets >> indexed by the high bits from the pages_addr array. For existing callers >> it's as easy as changing flags to (array of size 1). For HMM we >> would pass a pointer to a real array. > > Yeah, I've thought about stuff like that as well for a while. > > Problem is that this won't work that easily. We assume at many places > that the flags doesn't change for the range in question. I think some lower level functions assume that the flags stay the same for physically contiguous ranges. But if you use the high-bits to encode the page type, the ranges won't be contiguous any more. So you can change page flags for different contiguous ranges. Regards, Felix > > We would somehow need to change that to get the flags directly from > the low bits of the DMA address or something instead. > > Christian. > >> >> Once this is done, it leads to a number of opportunities for >> simplification and better efficiency in kfd_svm: >> >> * Support migration when cpages != npages >> * Migrate a part of an svm_range without splitting it. No more >> splitting of ranges in CPU page faults >> * Migrate a part of an svm_range in GPU page fault handler. No more >> migrating the whole range for a single page fault >> * Simplified VRAM management (see below) >> >> With that, svm_range will no longer have an "actual_loc" field. If we're >> not sure where the data is, we need to call migrate. If it's already in >> the right place, then cpages will be 0 and we can skip all the steps >> after migrate_vma_setup. >> >> Simplified VRAM management >> == >> >> VRAM BOs are no longer associated with pranges. Instead they are >> "free-floating", allocated during migration to VRAM, with reference >> count for each page that uses the BO. Ref is released in page-release >> callback. When the ref count drops to 0, free the BO. >> >> VRAM BO size should match the migration granularity, 2MB by default. >> That way the BO can be freed when memory gets migrated out. If migration >> of some pages fails the BO may not be fully occupied. Also some pages >> may be released individually on A+A due to COW or other events. >> >> Eviction needs to migrate all the pages still using the BO. If the BO >> struct keeps an array of page pointers, that's basically the migrate.src >> for the eviction. Migration calls "try_to_unmap", which has the best >> chance of freeing the BO, even when shared by multiple processes. >> >> If we cannot guarantee eviction of pages, we cannot use TTM for VRAM >> allocations. Need to use amdgpu_vram_mgr. Need a way to detect memory >> pressure so we can start evicting memory. >> >> Regards, >> Felix >> > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Color mode exposed to user space?
+ dri-devel I don't think it's currently exposed anywhere. Alex On Wed, Mar 24, 2021 at 5:11 AM Werner Sembach wrote: > > Hello, > > is the information which color mode is currently in used for a display (RGB, > YCbCr444, or YCbCr420) exposed to user space somewhere? > > If no: Where would be the best place to put code to expose it to sysfs? > > Thanks in advance, > > Werner Sembach > > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Need to support mixed memory mappings with HMM
Am 25.03.21 um 17:03 schrieb Felix Kuehling: Hi, This is a long one with a proposal for a pretty significant redesign of how we handle migrations and VRAM management with HMM. This is based on my own debugging and reading of the migrate_vma helpers, as well as Alex's problems with migrations on A+A. I hope we can discuss this next Monday after you've had some time do digest it. I did some debugging yesterday and found that migrations to VRAM can fail for some pages. The current migration helpers have many corner cases where a page cannot be migrated. Some of them may be fixable (adding support for THP), others are not (locked pages are skipped to avoid deadlocks). Therefore I think our current code is too inflexible when it assumes that a range is entirely in one place. Alex also ran into some funny issues with COW on A+A where some pages get faulted back to system memory. I think a lot of the problems here will get easier once we support mixed mappings. Mixed GPU mappings === The idea is, to remove any assumptions that an entire svm_range is in one place. Instead hmm_range_fault gives us a list of pages, some of which are system memory and others are device_private or device_generic. We will need an amdgpu_vm interface that lets us map mixed page arrays where different pages use different PTE flags. We can have at least 3 different types of pages in one mapping: 1. System memory (S-bit set) 2. Local memory (S-bit cleared, MTYPE for local memory) 3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory) My idea is to change the amdgpu_vm_update_mapping interface to use some high-bit in the pages_addr array to indicate the page type. For the default page type (0) nothing really changes for the callers. The "flags" parameter needs to become a pointer to an array that gets indexed by the high bits from the pages_addr array. For existing callers it's as easy as changing flags to (array of size 1). For HMM we would pass a pointer to a real array. Yeah, I've thought about stuff like that as well for a while. Problem is that this won't work that easily. We assume at many places that the flags doesn't change for the range in question. We would somehow need to change that to get the flags directly from the low bits of the DMA address or something instead. Christian. Once this is done, it leads to a number of opportunities for simplification and better efficiency in kfd_svm: * Support migration when cpages != npages * Migrate a part of an svm_range without splitting it. No more splitting of ranges in CPU page faults * Migrate a part of an svm_range in GPU page fault handler. No more migrating the whole range for a single page fault * Simplified VRAM management (see below) With that, svm_range will no longer have an "actual_loc" field. If we're not sure where the data is, we need to call migrate. If it's already in the right place, then cpages will be 0 and we can skip all the steps after migrate_vma_setup. Simplified VRAM management == VRAM BOs are no longer associated with pranges. Instead they are "free-floating", allocated during migration to VRAM, with reference count for each page that uses the BO. Ref is released in page-release callback. When the ref count drops to 0, free the BO. VRAM BO size should match the migration granularity, 2MB by default. That way the BO can be freed when memory gets migrated out. If migration of some pages fails the BO may not be fully occupied. Also some pages may be released individually on A+A due to COW or other events. Eviction needs to migrate all the pages still using the BO. If the BO struct keeps an array of page pointers, that's basically the migrate.src for the eviction. Migration calls "try_to_unmap", which has the best chance of freeing the BO, even when shared by multiple processes. If we cannot guarantee eviction of pages, we cannot use TTM for VRAM allocations. Need to use amdgpu_vram_mgr. Need a way to detect memory pressure so we can start evicting memory. Regards, Felix ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/radeon/radeon_pm: Convert sysfs sprintf/snprintf family to sysfs_emit
On Wed, Mar 24, 2021 at 2:47 AM Tian Tao wrote: > > Fix the following coccicheck warning: > drivers/gpu//drm/radeon/radeon_pm.c:521:9-17: WARNING: use scnprintf or > sprintf > drivers/gpu//drm/radeon/radeon_pm.c:475:8-16: WARNING: use scnprintf or > sprintf > drivers/gpu//drm/radeon/radeon_pm.c:418:8-16: WARNING: use scnprintf or > sprintf > drivers/gpu//drm/radeon/radeon_pm.c:363:8-16: WARNING: use scnprintf or > sprintf > drivers/gpu//drm/radeon/radeon_pm.c:734:8-16: WARNING: use scnprintf or > sprintf > drivers/gpu//drm/radeon/radeon_pm.c:688:8-16: WARNING: use scnprintf or > sprintf > drivers/gpu//drm/radeon/radeon_pm.c:704:8-16: WARNING: use scnprintf or > sprintf > drivers/gpu//drm/radeon/radeon_pm.c:755:8-16: WARNING: use scnprintf or > sprintf > > Signed-off-by: Tian Tao Applied. Thanks! Alex > --- > drivers/gpu/drm/radeon/radeon_pm.c | 36 +--- > 1 file changed, 17 insertions(+), 19 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_pm.c > b/drivers/gpu/drm/radeon/radeon_pm.c > index 1995dad..dd56fcd 100644 > --- a/drivers/gpu/drm/radeon/radeon_pm.c > +++ b/drivers/gpu/drm/radeon/radeon_pm.c > @@ -361,11 +361,10 @@ static ssize_t radeon_get_pm_profile(struct device *dev, > struct radeon_device *rdev = ddev->dev_private; > int cp = rdev->pm.profile; > > - return snprintf(buf, PAGE_SIZE, "%s\n", > - (cp == PM_PROFILE_AUTO) ? "auto" : > - (cp == PM_PROFILE_LOW) ? "low" : > - (cp == PM_PROFILE_MID) ? "mid" : > - (cp == PM_PROFILE_HIGH) ? "high" : "default"); > + return sysfs_emit(buf, "%s\n", (cp == PM_PROFILE_AUTO) ? "auto" : > + (cp == PM_PROFILE_LOW) ? "low" : > + (cp == PM_PROFILE_MID) ? "mid" : > + (cp == PM_PROFILE_HIGH) ? "high" : "default"); > } > > static ssize_t radeon_set_pm_profile(struct device *dev, > @@ -416,9 +415,8 @@ static ssize_t radeon_get_pm_method(struct device *dev, > struct radeon_device *rdev = ddev->dev_private; > int pm = rdev->pm.pm_method; > > - return snprintf(buf, PAGE_SIZE, "%s\n", > - (pm == PM_METHOD_DYNPM) ? "dynpm" : > - (pm == PM_METHOD_PROFILE) ? "profile" : "dpm"); > + return sysfs_emit(buf, "%s\n", (pm == PM_METHOD_DYNPM) ? "dynpm" : > + (pm == PM_METHOD_PROFILE) ? "profile" : "dpm"); > } > > static ssize_t radeon_set_pm_method(struct device *dev, > @@ -473,9 +471,9 @@ static ssize_t radeon_get_dpm_state(struct device *dev, > struct radeon_device *rdev = ddev->dev_private; > enum radeon_pm_state_type pm = rdev->pm.dpm.user_state; > > - return snprintf(buf, PAGE_SIZE, "%s\n", > - (pm == POWER_STATE_TYPE_BATTERY) ? "battery" : > - (pm == POWER_STATE_TYPE_BALANCED) ? "balanced" : > "performance"); > + return sysfs_emit(buf, "%s\n", > + (pm == POWER_STATE_TYPE_BATTERY) ? "battery" : > + (pm == POWER_STATE_TYPE_BALANCED) ? "balanced" : > "performance"); > } > > static ssize_t radeon_set_dpm_state(struct device *dev, > @@ -519,11 +517,11 @@ static ssize_t > radeon_get_dpm_forced_performance_level(struct device *dev, > > if ((rdev->flags & RADEON_IS_PX) && > (ddev->switch_power_state != DRM_SWITCH_POWER_ON)) > - return snprintf(buf, PAGE_SIZE, "off\n"); > + return sysfs_emit(buf, "off\n"); > > - return snprintf(buf, PAGE_SIZE, "%s\n", > - (level == RADEON_DPM_FORCED_LEVEL_AUTO) ? "auto" : > - (level == RADEON_DPM_FORCED_LEVEL_LOW) ? "low" : > "high"); > + return sysfs_emit(buf, "%s\n", > + (level == RADEON_DPM_FORCED_LEVEL_AUTO) ? "auto" : > + (level == RADEON_DPM_FORCED_LEVEL_LOW) ? "low" : > "high"); > } > > static ssize_t radeon_set_dpm_forced_performance_level(struct device *dev, > @@ -686,7 +684,7 @@ static ssize_t radeon_hwmon_show_temp(struct device *dev, > else > temp = 0; > > - return snprintf(buf, PAGE_SIZE, "%d\n", temp); > + return sysfs_emit(buf, "%d\n", temp); > } > > static ssize_t radeon_hwmon_show_temp_thresh(struct device *dev, > @@ -702,7 +700,7 @@ static ssize_t radeon_hwmon_show_temp_thresh(struct > device *dev, > else > temp = rdev->pm.dpm.thermal.max_temp; > > - return snprintf(buf, PAGE_SIZE, "%d\n", temp); > + return sysfs_emit(buf, "%d\n", temp); > } > > static SENSOR_DEVICE_ATTR(temp1_input, S_IRUGO, radeon_hwmon_show_temp, > NULL, 0); > @@ -732,7 +730,7 @@ static ssize_t radeon_hwmon_show_sclk(struct device *dev, >for hwmon */ > sclk *= 1; > > - return snprintf(buf, PAGE_SIZE, "%u\n", sclk); > + return sysfs_emit(buf,
Need to support mixed memory mappings with HMM
Hi, This is a long one with a proposal for a pretty significant redesign of how we handle migrations and VRAM management with HMM. This is based on my own debugging and reading of the migrate_vma helpers, as well as Alex's problems with migrations on A+A. I hope we can discuss this next Monday after you've had some time do digest it. I did some debugging yesterday and found that migrations to VRAM can fail for some pages. The current migration helpers have many corner cases where a page cannot be migrated. Some of them may be fixable (adding support for THP), others are not (locked pages are skipped to avoid deadlocks). Therefore I think our current code is too inflexible when it assumes that a range is entirely in one place. Alex also ran into some funny issues with COW on A+A where some pages get faulted back to system memory. I think a lot of the problems here will get easier once we support mixed mappings. Mixed GPU mappings === The idea is, to remove any assumptions that an entire svm_range is in one place. Instead hmm_range_fault gives us a list of pages, some of which are system memory and others are device_private or device_generic. We will need an amdgpu_vm interface that lets us map mixed page arrays where different pages use different PTE flags. We can have at least 3 different types of pages in one mapping: 1. System memory (S-bit set) 2. Local memory (S-bit cleared, MTYPE for local memory) 3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory) My idea is to change the amdgpu_vm_update_mapping interface to use some high-bit in the pages_addr array to indicate the page type. For the default page type (0) nothing really changes for the callers. The "flags" parameter needs to become a pointer to an array that gets indexed by the high bits from the pages_addr array. For existing callers it's as easy as changing flags to (array of size 1). For HMM we would pass a pointer to a real array. Once this is done, it leads to a number of opportunities for simplification and better efficiency in kfd_svm: * Support migration when cpages != npages * Migrate a part of an svm_range without splitting it. No more splitting of ranges in CPU page faults * Migrate a part of an svm_range in GPU page fault handler. No more migrating the whole range for a single page fault * Simplified VRAM management (see below) With that, svm_range will no longer have an "actual_loc" field. If we're not sure where the data is, we need to call migrate. If it's already in the right place, then cpages will be 0 and we can skip all the steps after migrate_vma_setup. Simplified VRAM management == VRAM BOs are no longer associated with pranges. Instead they are "free-floating", allocated during migration to VRAM, with reference count for each page that uses the BO. Ref is released in page-release callback. When the ref count drops to 0, free the BO. VRAM BO size should match the migration granularity, 2MB by default. That way the BO can be freed when memory gets migrated out. If migration of some pages fails the BO may not be fully occupied. Also some pages may be released individually on A+A due to COW or other events. Eviction needs to migrate all the pages still using the BO. If the BO struct keeps an array of page pointers, that's basically the migrate.src for the eviction. Migration calls "try_to_unmap", which has the best chance of freeing the BO, even when shared by multiple processes. If we cannot guarantee eviction of pages, we cannot use TTM for VRAM allocations. Need to use amdgpu_vram_mgr. Need a way to detect memory pressure so we can start evicting memory. Regards, Felix ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: drm/ttm: switch to per device LRU lock
Thanks! Just a copy issue. Patch to fix this is on the mailing list. Christian. Am 25.03.21 um 16:00 schrieb Colin Ian King: Hi, Static analysis with Coverity in linux-next has detected an issue in drivers/gpu/drm/ttm/ttm_bo.c with the follow commit: commit a1f091f8ef2b680a5184db065527612247cb4cae Author: Christian König Date: Tue Oct 6 17:26:42 2020 +0200 drm/ttm: switch to per device LRU lock Instead of having a global lock for potentially less contention. The analysis is as follows: 617 int ttm_mem_evict_first(struct ttm_device *bdev, 618struct ttm_resource_manager *man, 619const struct ttm_place *place, 620struct ttm_operation_ctx *ctx, 621struct ww_acquire_ctx *ticket) 622 { 1. assign_zero: Assigning: bo = NULL. 623struct ttm_buffer_object *bo = NULL, *busy_bo = NULL; 624bool locked = false; 625unsigned i; 626int ret; 627 Explicit null dereferenced (FORWARD_NULL)2. var_deref_op: Dereferencing null pointer bo. 628spin_lock(>bdev->lru_lock); 629for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) { The spin_lock on bo is dereferencing a null bo pointer. Colin ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
re: drm/ttm: switch to per device LRU lock
Hi, Static analysis with Coverity in linux-next has detected an issue in drivers/gpu/drm/ttm/ttm_bo.c with the follow commit: commit a1f091f8ef2b680a5184db065527612247cb4cae Author: Christian König Date: Tue Oct 6 17:26:42 2020 +0200 drm/ttm: switch to per device LRU lock Instead of having a global lock for potentially less contention. The analysis is as follows: 617 int ttm_mem_evict_first(struct ttm_device *bdev, 618struct ttm_resource_manager *man, 619const struct ttm_place *place, 620struct ttm_operation_ctx *ctx, 621struct ww_acquire_ctx *ticket) 622 { 1. assign_zero: Assigning: bo = NULL. 623struct ttm_buffer_object *bo = NULL, *busy_bo = NULL; 624bool locked = false; 625unsigned i; 626int ret; 627 Explicit null dereferenced (FORWARD_NULL)2. var_deref_op: Dereferencing null pointer bo. 628spin_lock(>bdev->lru_lock); 629for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) { The spin_lock on bo is dereferencing a null bo pointer. Colin ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amd/pm: no need to force MCLK to highest when no display connected
[AMD Official Use Only - Internal Distribution Only] Reviewed-by: Alex Deucher Tested-by: Alex Deucher From: amd-gfx on behalf of Evan Quan Sent: Thursday, March 25, 2021 12:18 AM To: amd-gfx@lists.freedesktop.org Cc: Quan, Evan Subject: [PATCH] drm/amd/pm: no need to force MCLK to highest when no display connected Correct the check for vblank short. Change-Id: I0eb3a6bd95b7f6d97029772914154324c8ccd2c0 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c index 7edafef095a3..301b6769f007 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c @@ -3330,7 +3330,8 @@ static int smu7_apply_state_adjust_rules(struct pp_hwmgr *hwmgr, disable_mclk_switching_for_display = ((1 < hwmgr->display_config->num_display) && !hwmgr->display_config->multi_monitor_in_sync) || - smu7_vblank_too_short(hwmgr, hwmgr->display_config->min_vblank_time); + (hwmgr->display_config->num_display && + smu7_vblank_too_short(hwmgr, hwmgr->display_config->min_vblank_time)); disable_mclk_switching = disable_mclk_switching_for_frame_lock || disable_mclk_switching_for_display; -- 2.29.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=04%7C01%7Calexander.deucher%40amd.com%7C96ae73006f284dd59a5d08d8ef452656%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637522427565407677%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=3S%2FkGh8zi1uLEonF3G4kWmozbBPazZJeuwTNAFhRgdQ%3Dreserved=0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes
s/acccess/access/ s/inferface/interface/ s/sequnce/sequence/ .two different places. s/retrive/retrieve/ s/sheduling/scheduling/ s/independant/independent/ s/wether/whether/ ..two different places. s/emmit/emit/ s/synce/sync/ Signed-off-by: Bhaskar Chowdhury --- drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c index a368724c3dfc..4502b95ddf6b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c @@ -1877,7 +1877,7 @@ static void gfx_v7_0_init_compute_vmid(struct amdgpu_device *adev) mutex_unlock(>srbm_mutex); /* Initialize all compute VMIDs to have no GDS, GWS, or OA - acccess. These should be enabled by FW for target VMIDs. */ + access. These should be enabled by FW for target VMIDs. */ for (i = adev->vm_manager.first_kfd_vmid; i < AMDGPU_NUM_VMID; i++) { WREG32(amdgpu_gds_reg_offset[i].mem_base, 0); WREG32(amdgpu_gds_reg_offset[i].mem_size, 0); @@ -2058,7 +2058,7 @@ static void gfx_v7_0_constants_init(struct amdgpu_device *adev) * @adev: amdgpu_device pointer * * Set up the number and offset of the CP scratch registers. - * NOTE: use of CP scratch registers is a legacy inferface and + * NOTE: use of CP scratch registers is a legacy interface and * is not used by default on newer asics (r6xx+). On newer asics, * memory buffers are used for fences rather than scratch regs. */ @@ -2172,7 +2172,7 @@ static void gfx_v7_0_ring_emit_vgt_flush(struct amdgpu_ring *ring) * @seq: sequence number * @flags: fence related flags * - * Emits a fence sequnce number on the gfx ring and flushes + * Emits a fence sequence number on the gfx ring and flushes * GPU caches. */ static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, @@ -2215,7 +2215,7 @@ static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, * @seq: sequence number * @flags: fence related flags * - * Emits a fence sequnce number on the compute ring and flushes + * Emits a fence sequence number on the compute ring and flushes * GPU caches. */ static void gfx_v7_0_ring_emit_fence_compute(struct amdgpu_ring *ring, @@ -2245,14 +2245,14 @@ static void gfx_v7_0_ring_emit_fence_compute(struct amdgpu_ring *ring, * gfx_v7_0_ring_emit_ib - emit an IB (Indirect Buffer) on the ring * * @ring: amdgpu_ring structure holding ring information - * @job: job to retrive vmid from + * @job: job to retrieve vmid from * @ib: amdgpu indirect buffer object * @flags: options (AMDGPU_HAVE_CTX_SWITCH) * * Emits an DE (drawing engine) or CE (constant engine) IB * on the gfx ring. IBs are usually generated by userspace * acceleration drivers and submitted to the kernel for - * sheduling on the ring. This function schedules the IB + * scheduling on the ring. This function schedules the IB * on the gfx ring for execution by the GPU. */ static void gfx_v7_0_ring_emit_ib_gfx(struct amdgpu_ring *ring, @@ -2402,7 +2402,7 @@ static int gfx_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) /* * CP. - * On CIK, gfx and compute now have independant command processors. + * On CIK, gfx and compute now have independent command processors. * * GFX * Gfx consists of a single ring and can process both gfx jobs and @@ -2630,7 +2630,7 @@ static int gfx_v7_0_cp_gfx_resume(struct amdgpu_device *adev) ring->wptr = 0; WREG32(mmCP_RB0_WPTR, lower_32_bits(ring->wptr)); - /* set the wb address wether it's enabled or not */ + /* set the wb address whether it's enabled or not */ rptr_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4); WREG32(mmCP_RB0_RPTR_ADDR, lower_32_bits(rptr_addr)); WREG32(mmCP_RB0_RPTR_ADDR_HI, upper_32_bits(rptr_addr) & 0xFF); @@ -2985,7 +2985,7 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device *adev, mqd->cp_hqd_pq_wptr_poll_addr_lo = wb_gpu_addr & 0xfffc; mqd->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0x; - /* set the wb address wether it's enabled or not */ + /* set the wb address whether it's enabled or not */ wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4); mqd->cp_hqd_pq_rptr_report_addr_lo = wb_gpu_addr & 0xfffc; mqd->cp_hqd_pq_rptr_report_addr_hi = @@ -3198,7 +3198,7 @@ static int gfx_v7_0_cp_resume(struct amdgpu_device *adev) /** * gfx_v7_0_ring_emit_vm_flush - cik vm flush using the CP * - * @ring: the ring to emmit the commands to + * @ring: the ring to emit the commands to * * Sync the command pipeline with the PFP. E.g. wait for everything * to be completed. @@ -3220,7 +3220,7 @@ static void gfx_v7_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring) amdgpu_ring_write(ring, 4); /* poll interval */ if
RE: [PATCH 2/2] drm/amd/pm: unify the interface for gfx state setting
[AMD Public Use] Series is Reviewed-by: Lijo Lazar -Original Message- From: amd-gfx On Behalf Of Evan Quan Sent: Thursday, March 25, 2021 4:12 PM To: amd-gfx@lists.freedesktop.org Cc: Quan, Evan Subject: [PATCH 2/2] drm/amd/pm: unify the interface for gfx state setting No need to have special handling for swSMU supported ASICs. Change-Id: I029414efa63c130a1a3aaba908bbf3857c5873ff Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 16 ++-- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 2 -- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 5 - 3 files changed, 10 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index de540c209023..12e8b527776b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -842,14 +842,10 @@ int amdgpu_gfx_get_num_kcq(struct amdgpu_device *adev) void amdgpu_gfx_state_change_set(struct amdgpu_device *adev, enum gfx_change_state state) { - if (is_support_sw_smu(adev)) { - smu_gfx_state_change_set(>smu, state); - } else { - mutex_lock(>pm.mutex); - if (adev->powerplay.pp_funcs && - adev->powerplay.pp_funcs->gfx_state_change_set) - ((adev)->powerplay.pp_funcs->gfx_state_change_set( - (adev)->powerplay.pp_handle, state)); - mutex_unlock(>pm.mutex); - } + mutex_lock(>pm.mutex); + if (adev->powerplay.pp_funcs && + adev->powerplay.pp_funcs->gfx_state_change_set) + ((adev)->powerplay.pp_funcs->gfx_state_change_set( + (adev)->powerplay.pp_handle, state)); + mutex_unlock(>pm.mutex); } diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h index 059d2c4e4c7f..ae05c16443c3 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h @@ -1317,8 +1317,6 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, bool en); int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value); -int smu_gfx_state_change_set(struct smu_context *smu, uint32_t state); - int smu_set_light_sbr(struct smu_context *smu, bool enable); int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event, diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index 0d1372d440ed..b2898f93a3ff 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -2920,8 +2920,10 @@ static int smu_enable_mgpu_fan_boost(void *handle) return ret; } -int smu_gfx_state_change_set(struct smu_context *smu, uint32_t state) +static int smu_gfx_state_change_set(void *handle, + uint32_t state) { + struct smu_context *smu = handle; int ret = 0; mutex_lock(>mutex); @@ -2994,6 +2996,7 @@ static const struct amd_pm_funcs swsmu_pm_funcs = { .display_disable_memory_clock_switch = smu_display_disable_memory_clock_switch, .get_max_sustainable_clocks_by_dc= smu_get_max_sustainable_clocks_by_dc, .load_firmware = smu_load_microcode, + .gfx_state_change_set= smu_gfx_state_change_set, }; int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event, -- 2.29.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=04%7C01%7Clijo.lazar%40amd.com%7Cead6b76a1b794a3e28a308d8ef7abebc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C63752265776120%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=8k39%2B4CHfE4hYaGq1rci1Df6ATY%2B4mFWWy28h9fv8Sc%3Dreserved=0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH 2/2] drm/amd/pm: fix missing static declarations
[AMD Public Use] Series is Reviewed-by: Lijo Lazar -Original Message- From: amd-gfx On Behalf Of Evan Quan Sent: Thursday, March 25, 2021 9:51 AM To: amd-gfx@lists.freedesktop.org Cc: Quan, Evan Subject: [PATCH 2/2] drm/amd/pm: fix missing static declarations Add "static" declarations for those APIs used internally. Change-Id: I38bfa8a117b3056202935163935a867f03ebaefe Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index 3bbe0278e50e..e136f071b4f2 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -1467,7 +1467,7 @@ static int smu_hw_fini(void *handle) return smu_smc_hw_cleanup(smu); } -int smu_reset(struct smu_context *smu) +static int smu_reset(struct smu_context *smu) { struct amdgpu_device *adev = smu->adev; int ret; @@ -1715,10 +1715,10 @@ static int smu_adjust_power_state_dynamic(struct smu_context *smu, return ret; } -int smu_handle_task(struct smu_context *smu, - enum amd_dpm_forced_level level, - enum amd_pp_task task_id, - bool lock_needed) +static int smu_handle_task(struct smu_context *smu, + enum amd_dpm_forced_level level, + enum amd_pp_task task_id, + bool lock_needed) { int ret = 0; @@ -2147,7 +2147,7 @@ static int smu_load_microcode(void *handle) return ret; } -int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled) +static int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled) { int ret = 0; @@ -2466,7 +2466,7 @@ static u32 smu_get_fan_control_mode(void *handle) return ret; } -int smu_set_fan_control_mode(struct smu_context *smu, int value) +static int smu_set_fan_control_mode(struct smu_context *smu, int value) { int ret = 0; -- 2.29.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=04%7C01%7Clijo.lazar%40amd.com%7Cb4ea58727007476caa1008d8ef4580c7%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637522429088901237%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=bNDUB66tzrjOXwn8Prqe3QSo1kHIOxa3A%2BvrEjSUZl8%3Dreserved=0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 1/2] drm/amd/pm: unify the interface for power gating
No need to have special handling for swSMU supported ASICs. Change-Id: I7292c1064c3a1c75dc9f91d7c5318eab4f407241 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 9 - drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 3 --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 6 +++--- 3 files changed, 3 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c index 464fc04fb334..03581d5b1836 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c @@ -927,7 +927,6 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block { int ret = 0; const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs; - bool swsmu = is_support_sw_smu(adev); switch (block_type) { case AMD_IP_BLOCK_TYPE_UVD: @@ -968,15 +967,7 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block case AMD_IP_BLOCK_TYPE_GFX: case AMD_IP_BLOCK_TYPE_VCN: case AMD_IP_BLOCK_TYPE_SDMA: - if (pp_funcs && pp_funcs->set_powergating_by_smu) { - ret = (pp_funcs->set_powergating_by_smu( - (adev)->powerplay.pp_handle, block_type, gate)); - } - break; case AMD_IP_BLOCK_TYPE_JPEG: - if (swsmu) - ret = smu_dpm_set_power_gate(>smu, block_type, gate); - break; case AMD_IP_BLOCK_TYPE_GMC: case AMD_IP_BLOCK_TYPE_ACP: if (pp_funcs && pp_funcs->set_powergating_by_smu) { diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h index 4684eca7b37b..059d2c4e4c7f 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h @@ -1305,9 +1305,6 @@ bool is_support_sw_smu(struct amdgpu_device *adev); bool is_support_cclk_dpm(struct amdgpu_device *adev); int smu_write_watermarks_table(struct smu_context *smu); -/* smu to display interface */ -extern int smu_dpm_set_power_gate(void *handle, uint32_t block_type, bool gate); - int smu_get_dpm_freq_range(struct smu_context *smu, enum smu_clk_type clk_type, uint32_t *min, uint32_t *max); diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index e136f071b4f2..0d1372d440ed 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -268,9 +268,9 @@ static int smu_dpm_set_jpeg_enable(struct smu_context *smu, *Under this case, the smu->mutex lock protection is already enforced on *the parent API smu_force_performance_level of the call path. */ -int smu_dpm_set_power_gate(void *handle, - uint32_t block_type, - bool gate) +static int smu_dpm_set_power_gate(void *handle, + uint32_t block_type, + bool gate) { struct smu_context *smu = handle; int ret = 0; -- 2.29.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 2/2] drm/amd/pm: unify the interface for gfx state setting
No need to have special handling for swSMU supported ASICs. Change-Id: I029414efa63c130a1a3aaba908bbf3857c5873ff Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 16 ++-- drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 2 -- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 5 - 3 files changed, 10 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index de540c209023..12e8b527776b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -842,14 +842,10 @@ int amdgpu_gfx_get_num_kcq(struct amdgpu_device *adev) void amdgpu_gfx_state_change_set(struct amdgpu_device *adev, enum gfx_change_state state) { - if (is_support_sw_smu(adev)) { - smu_gfx_state_change_set(>smu, state); - } else { - mutex_lock(>pm.mutex); - if (adev->powerplay.pp_funcs && - adev->powerplay.pp_funcs->gfx_state_change_set) - ((adev)->powerplay.pp_funcs->gfx_state_change_set( - (adev)->powerplay.pp_handle, state)); - mutex_unlock(>pm.mutex); - } + mutex_lock(>pm.mutex); + if (adev->powerplay.pp_funcs && + adev->powerplay.pp_funcs->gfx_state_change_set) + ((adev)->powerplay.pp_funcs->gfx_state_change_set( + (adev)->powerplay.pp_handle, state)); + mutex_unlock(>pm.mutex); } diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h index 059d2c4e4c7f..ae05c16443c3 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h @@ -1317,8 +1317,6 @@ int smu_allow_xgmi_power_down(struct smu_context *smu, bool en); int smu_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value); -int smu_gfx_state_change_set(struct smu_context *smu, uint32_t state); - int smu_set_light_sbr(struct smu_context *smu, bool enable); int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event, diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index 0d1372d440ed..b2898f93a3ff 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -2920,8 +2920,10 @@ static int smu_enable_mgpu_fan_boost(void *handle) return ret; } -int smu_gfx_state_change_set(struct smu_context *smu, uint32_t state) +static int smu_gfx_state_change_set(void *handle, + uint32_t state) { + struct smu_context *smu = handle; int ret = 0; mutex_lock(>mutex); @@ -2994,6 +2996,7 @@ static const struct amd_pm_funcs swsmu_pm_funcs = { .display_disable_memory_clock_switch = smu_display_disable_memory_clock_switch, .get_max_sustainable_clocks_by_dc= smu_get_max_sustainable_clocks_by_dc, .load_firmware = smu_load_microcode, + .gfx_state_change_set= smu_gfx_state_change_set, }; int smu_wait_for_event(struct amdgpu_device *adev, enum smu_event_type event, -- 2.29.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
[AMD Official Use Only - Internal Distribution Only] Hi, Andrey Thank you for your good opinions. I literally agree with you that the refcount could solve the get_clean_up_up cocurrent job gracefully, and no need to re-insert the job back anymore. I quickly made a draft for this idea as follows: How do you like it? I will start implement to it after I got your acknowledge. Thanks, Jack +void drm_job_get(struct drm_sched_job *s_job) +{ + kref_get(_job->refcount); +} + +void drm_job_do_release(struct kref *ref) +{ + struct drm_sched_job *s_job; + struct drm_gpu_scheduler *sched; + + s_job = container_of(ref, struct drm_sched_job, refcount); + sched = s_job->sched; + sched->ops->free_job(s_job); +} + +void drm_job_put(struct drm_sched_job *s_job) +{ + kref_put(_job->refcount, drm_job_do_release); +} + static void drm_sched_job_begin(struct drm_sched_job *s_job) { struct drm_gpu_scheduler *sched = s_job->sched; + kref_init(_job->refcount); + drm_job_get(s_job); spin_lock(>job_list_lock); list_add_tail(_job->node, >ring_mirror_list); drm_sched_start_timeout(sched); @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct work_struct *work) * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread * is parked at which point it's safe. */ - list_del_init(>node); + drm_job_get(job); spin_unlock(>job_list_lock); job->sched->ops->timedout_job(job); - + drm_job_put(job); /* * Guilty job did complete and hence needs to be manually removed * See drm_sched_stop doc. */ if (sched->free_guilty) { - job->sched->ops->free_job(job); sched->free_guilty = false; } } else { @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) - /* -* Reinsert back the bad job here - now it's safe as -* drm_sched_get_cleanup_job cannot race against us and release the -* bad job at this point - we parked (waited for) any in progress -* (earlier) cleanups and drm_sched_get_cleanup_job will not be called -* now until the scheduler thread is unparked. -*/ - if (bad && bad->sched == sched) - /* -* Add at the head of the queue to reflect it was the earliest -* job extracted. -*/ - list_add(>node, >ring_mirror_list); - /* * Iterate the job list from later to earlier one and either deactive * their HW callbacks or remove them from mirror list if they already @@ -774,7 +781,7 @@ static int drm_sched_main(void *param) kthread_should_stop()); if (cleanup_job) { - sched->ops->free_job(cleanup_job); + drm_job_put(cleanup_job); /* queue timeout for next job */ drm_sched_start_timeout(sched); } diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 5a1f068af1c2..b80513eec90f 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); * to schedule the job. */ struct drm_sched_job { + struct kref refcount; struct spsc_nodequeue_node; struct drm_gpu_scheduler*sched; struct drm_sched_fence *s_fence; @@ -198,6 +199,7 @@ struct drm_sched_job { enum drm_sched_priority s_priority; struct drm_sched_entity *entity; struct dma_fence_cb cb; + }; From: Grodzovsky, Andrey Sent: Friday, March 19, 2021 12:17 AM To: Zhang, Jack (Jian) ; Christian König ; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian ; Liu, Monk ; Deng, Emily ; Rob Herring ; Tomeu Vizoso ; Steven Price Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote: [AMD Official Use Only - Internal Distribution Only] Hi, Andrey Let me summarize the background of this patch: In TDR resubmit step “amdgpu_device_recheck_guilty_jobs, It will submit first jobs of each ring and do guilty job re-check. At that point, We had to make sure each job is in the mirror list(or re-inserted back already). But we found the current code never re-insert the job to mirror list in the 2nd, 3rd job_timeout thread(Bailing TDR thread). This not only will cause memleak of the bailing jobs. What’s more important, the 1st tdr thread can never iterate the bailing job and set its guilty status to a correct
Re: [PATCH] drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes
Reviewed-by: Nirmoy Das On 3/25/21 9:53 AM, Bhaskar Chowdhury wrote: s/acccess/access/ s/inferface/interface/ s/sequnce/sequence/ .two different places. s/retrive/retrieve/ s/sheduling/scheduling/ s/independant/independent/ s/wether/whether/ ..two different places. s/emmit/emit/ s/synce/sync/ Signed-off-by: Bhaskar Chowdhury --- drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c index a368724c3dfc..4502b95ddf6b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c @@ -1877,7 +1877,7 @@ static void gfx_v7_0_init_compute_vmid(struct amdgpu_device *adev) mutex_unlock(>srbm_mutex); /* Initialize all compute VMIDs to have no GDS, GWS, or OA - acccess. These should be enabled by FW for target VMIDs. */ + access. These should be enabled by FW for target VMIDs. */ for (i = adev->vm_manager.first_kfd_vmid; i < AMDGPU_NUM_VMID; i++) { WREG32(amdgpu_gds_reg_offset[i].mem_base, 0); WREG32(amdgpu_gds_reg_offset[i].mem_size, 0); @@ -2058,7 +2058,7 @@ static void gfx_v7_0_constants_init(struct amdgpu_device *adev) * @adev: amdgpu_device pointer * * Set up the number and offset of the CP scratch registers. - * NOTE: use of CP scratch registers is a legacy inferface and + * NOTE: use of CP scratch registers is a legacy interface and * is not used by default on newer asics (r6xx+). On newer asics, * memory buffers are used for fences rather than scratch regs. */ @@ -2172,7 +2172,7 @@ static void gfx_v7_0_ring_emit_vgt_flush(struct amdgpu_ring *ring) * @seq: sequence number * @flags: fence related flags * - * Emits a fence sequnce number on the gfx ring and flushes + * Emits a fence sequence number on the gfx ring and flushes * GPU caches. */ static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, @@ -2215,7 +2215,7 @@ static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, * @seq: sequence number * @flags: fence related flags * - * Emits a fence sequnce number on the compute ring and flushes + * Emits a fence sequence number on the compute ring and flushes * GPU caches. */ static void gfx_v7_0_ring_emit_fence_compute(struct amdgpu_ring *ring, @@ -2245,14 +2245,14 @@ static void gfx_v7_0_ring_emit_fence_compute(struct amdgpu_ring *ring, * gfx_v7_0_ring_emit_ib - emit an IB (Indirect Buffer) on the ring * * @ring: amdgpu_ring structure holding ring information - * @job: job to retrive vmid from + * @job: job to retrieve vmid from * @ib: amdgpu indirect buffer object * @flags: options (AMDGPU_HAVE_CTX_SWITCH) * * Emits an DE (drawing engine) or CE (constant engine) IB * on the gfx ring. IBs are usually generated by userspace * acceleration drivers and submitted to the kernel for - * sheduling on the ring. This function schedules the IB + * scheduling on the ring. This function schedules the IB * on the gfx ring for execution by the GPU. */ static void gfx_v7_0_ring_emit_ib_gfx(struct amdgpu_ring *ring, @@ -2402,7 +2402,7 @@ static int gfx_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) /* * CP. - * On CIK, gfx and compute now have independant command processors. + * On CIK, gfx and compute now have independent command processors. * * GFX * Gfx consists of a single ring and can process both gfx jobs and @@ -2630,7 +2630,7 @@ static int gfx_v7_0_cp_gfx_resume(struct amdgpu_device *adev) ring->wptr = 0; WREG32(mmCP_RB0_WPTR, lower_32_bits(ring->wptr)); - /* set the wb address wether it's enabled or not */ + /* set the wb address whether it's enabled or not */ rptr_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4); WREG32(mmCP_RB0_RPTR_ADDR, lower_32_bits(rptr_addr)); WREG32(mmCP_RB0_RPTR_ADDR_HI, upper_32_bits(rptr_addr) & 0xFF); @@ -2985,7 +2985,7 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device *adev, mqd->cp_hqd_pq_wptr_poll_addr_lo = wb_gpu_addr & 0xfffc; mqd->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0x; - /* set the wb address wether it's enabled or not */ + /* set the wb address whether it's enabled or not */ wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4); mqd->cp_hqd_pq_rptr_report_addr_lo = wb_gpu_addr & 0xfffc; mqd->cp_hqd_pq_rptr_report_addr_hi = @@ -3198,7 +3198,7 @@ static int gfx_v7_0_cp_resume(struct amdgpu_device *adev) /** * gfx_v7_0_ring_emit_vm_flush - cik vm flush using the CP * - * @ring: the ring to emmit the commands to + * @ring: the ring to emit the commands to * * Sync the command pipeline with the PFP. E.g. wait for everything * to be completed. @@ -3220,7 +3220,7 @@ static void
Re: [PATCH] drm/amdgpu/pm: bail on sysfs/debugfs queries during platform suspend
Am 25.03.21 um 04:29 schrieb Quan, Evan: [AMD Public Use] Maybe we can have an API like is_hw_access_blocked(). So that we can put all those checks below within it. if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; Sounds like a good idea to me as well. But my question is how the heck have we managed to access those files during suspend? Anyway, patch is reviewed-by: Evan Quan Acked-by: Christian König -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Thursday, March 25, 2021 5:18 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH] drm/amdgpu/pm: bail on sysfs/debugfs queries during platform suspend The GPU is in the process of being shutdown. Spurious queries during suspend and resume can put the SMU into a bad state. Runtime PM is handled dynamically so we check if we are in non-runtime suspend. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/amdgpu_pm.c | 98 ++ 1 file changed, 98 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c index 2627870a786e..3c1b5483688b 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c @@ -129,6 +129,8 @@ static ssize_t amdgpu_get_power_dpm_state(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; ret = pm_runtime_get_sync(ddev->dev); if (ret < 0) { @@ -162,6 +164,8 @@ static ssize_t amdgpu_set_power_dpm_state(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; if (strncmp("battery", buf, strlen("battery")) == 0) state = POWER_STATE_TYPE_BATTERY; @@ -268,6 +272,8 @@ static ssize_t amdgpu_get_power_dpm_force_performance_level(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; ret = pm_runtime_get_sync(ddev->dev); if (ret < 0) { @@ -310,6 +316,8 @@ static ssize_t amdgpu_set_power_dpm_force_performance_level(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; if (strncmp("low", buf, strlen("low")) == 0) { level = AMD_DPM_FORCED_LEVEL_LOW; @@ -408,6 +416,8 @@ static ssize_t amdgpu_get_pp_num_states(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; ret = pm_runtime_get_sync(ddev->dev); if (ret < 0) { @@ -448,6 +458,8 @@ static ssize_t amdgpu_get_pp_cur_state(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; ret = pm_runtime_get_sync(ddev->dev); if (ret < 0) { @@ -484,6 +496,8 @@ static ssize_t amdgpu_get_pp_force_state(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; if (adev->pp_force_state_enabled) return amdgpu_get_pp_cur_state(dev, attr, buf); @@ -504,6 +518,8 @@ static ssize_t amdgpu_set_pp_force_state(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; if (strlen(buf) == 1) adev->pp_force_state_enabled = false; @@ -564,6 +580,8 @@ static ssize_t amdgpu_get_pp_table(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; ret = pm_runtime_get_sync(ddev->dev); if (ret < 0) { @@ -602,6 +620,8 @@ static ssize_t amdgpu_set_pp_table(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; ret = pm_runtime_get_sync(ddev->dev); if (ret < 0) { @@ -764,6 +784,8 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; if (count > 127) return -EINVAL; @@ -865,6 +887,8 @@ static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev, if (amdgpu_in_reset(adev)) return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; ret = pm_runtime_get_sync(ddev->dev); if