Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference
Did find a way to reproduce the issue consistently. After applying David's patch "0001-drm-amdgpu-fix-signaled-fence-isn-t-handled" with a minor change

-static struct dma_fence *drm_syncobj_get_stub_fence(void)
+struct dma_fence *drm_syncobj_get_stub_fence(void)

I was able to avoid the kernel panic due to the NULL pointer dereference, so it seems to resolve the problem I was trying to address. Can you please send a revision of this patch? Though one time I did get a reboot, with this dmesg:

[  213.714324] RIP: 0010:reservation_object_add_shared_fence+0x22e/0x245
[  213.714331] RSP: 0018:a03900b5ba80 EFLAGS: 00010246
[  213.714338] RAX: 0004 RBX: a03900b5bba8 RCX: 969421139d00
[  213.714343] RDX: RSI: 9693639662e0 RDI: 9694203db230
[  213.714349] RBP: a03900b5bad0 R08: 002a R09: 969363966280
[  213.714354] R10: 000180800079 R11: a3637f66 R12: 96941ea5cf00
[  213.714360] R13: R14: 9693639662e0 R15: 9694203db230
[  213.714366] FS: 786009e5f700() GS:96942ec0() knlGS:
[  213.714372] CS: 0010 DS: ES: CR0: 80050033
[  213.714377] CR2: 7860023a4000 CR3: 00011ea8a000 CR4: 001406f0
[  213.714382] Call Trace:
[  213.714397]  ttm_eu_fence_buffer_objects+0x53/0x89
[  213.714407]  amdgpu_cs_ioctl+0x194f/0x196c
[  213.714420]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[  213.714427]  drm_ioctl_kernel+0x98/0xac
[  213.714435]  drm_ioctl+0x235/0x357
[  213.714443]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[  213.714450]  ? __switch_to_asm+0x34/0x70
[  213.714456]  ? __switch_to_asm+0x40/0x70
[  213.714462]  ? __switch_to_asm+0x34/0x70
[  213.714468]  ? __switch_to_asm+0x40/0x70
[  213.714476]  amdgpu_drm_ioctl+0x46/0x78
[  213.714500]  vfs_ioctl+0x1b/0x30
[  213.714507]  do_vfs_ioctl+0x492/0x6c1
[  213.714530]  SyS_ioctl+0x52/0x77
[  213.714537]  do_syscall_64+0x6d/0x81
[  213.714544]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  213.714551] RIP: 0033:0x78600c9c6967
[  213.714556] RSP: 002b:786009e5ec08 EFLAGS: 0246 ORIG_RAX: 0010
[  213.714563] RAX: ffda RBX: c0186444 RCX: 78600c9c6967
[  213.714568] RDX: 786009e5ec60 RSI: c0186444 RDI: 000d
[  213.714573] RBP: 786009e5ec40 R08: 786009e5ed00 R09: 786009e5ec50
[  213.714578] R10: R11: 0246 R12: 000d
[  213.714583] R13: 3ba2573ff128 R14: R15: 786009e5ec60

It's worth mentioning that I am not using the latest kernel. Relevant git log:

71a0556255de125b7e3fc0dc6171fb31fab2b9ad drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
4034318d2afac8f7f43457a4b480b877a94ec651 drm/syncobj: Stop reusing the same struct file for all syncobj -> fd

So I have not taken any recent changes from the https://patchwork.kernel.org/patch/10575275/ series, only backported the two attached patches. I need to check if something is missing that can cause the issue.

Thanks,
Deepak

From: Sharma, Deepak
Sent: Tuesday, November 27, 2018 3:12 PM
To: Zhou, David(ChunMing); Koenig, Christian; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

Thanks David, I did backport the drm patch to my kernel; after that I am not getting -ENOMEM from amdgpu_cs_ioctl while running tests, so I have not been able to test the patch that handles the signaled fence. As this issue is hard to reproduce, I will give it some more tries. But yes, the problem is there, and we need to handle the case when the fence is NULL, which your patch seems to handle correctly.
-Deepak

From: Zhou, David(ChunMing)
Sent: Monday, November 26, 2018 6:40 PM
To: Sharma, Deepak; Zhou, David(ChunMing); Koenig, Christian; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

Yeah, you need another drm patch as well when you apply my patch. Attached.

-David

On 2018/11/27 08:40, Sharma, Deepak wrote:
>
> On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:
>>
>>> -----Original Message-----
>>> From: Christian König
>>> Sent: Monday, November 26, 2018 5:23 PM
>>> To: Sharma, Deepak; Zhou, David(ChunMing); Koenig, Christian; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference
>>>
>>> Am 26.11.18 um 02:59 schrieb Sharma, Deepak:
>>>> On 2018/11/24 2:10, Koenig, Christian wrote:
>>>>> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
>>>>>> On 2018/11/23 21:30, Koenig, Christian wrote:
>>>>>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
>>>>>>>> On 2018/11/22 19:25, Christian König wrote:
>>>>>>>>> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
>>>>>>>>>> when returned fence is not valid mostly due to userspace ignored
>>>>>>>>>> previous error causes NULL pointer dereference.
>>>>>>>>> Again, thi
Re: [Intel-gfx] [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices
On Wed, Nov 28, 2018 at 07:46:06PM +, Ho, Kenny wrote:
> On Wed, Nov 28, 2018 at 4:14 AM Joonas Lahtinen wrote:
> > So we can only choose the lowest common denominator, right?
> >
> > Any core count out of total core count should translate nicely into a
> > fraction, so what would be the problem with percentage amounts?
>
> I don't think having an abstracted resource necessarily equates to the
> 'lowest'. The issue with percentage is the lack of precision. If you
> look at the cpuset cgroup, you can see the specification can be very
> precise:
>
> # /bin/echo 1-4,6 > cpuset.cpus -> set cpus list to cpus 1,2,3,4,6
> (https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt)
>
> The driver can translate something like this to a core count and then to
> a percentage and handle it accordingly, while the reverse is not possible.
> (You can't tell which set of CUs/EUs a user wants from a percentage
> request.) It's also not clear to me, from a
> user/application/admin/resource-management perspective, how the base
> core count of a GPU is relevant to the workload (since percentage is
> a 'relative' quantity.) For example, say a workload wants to use
> 256 'cores'; does it matter if that workload is put on a GPU with 1024
> cores or a GPU with 4096 cores total?
>
> I am not dismissing the possible need for percentage. I just think
> there should be a way to accommodate more than just the 'lowest'.

As you noted, your proposal is similar to the cgroup-v1 "cpuset" controller, which is sort of a way of partitioning your underlying hardware resources; I think Joonas is describing something closer in design to the cgroup-v2 "cpu" controller, which partitions the general time/usage allocated via cgroup. AFAIU, "cpu" doesn't really care which specific core the tasks run on, just the relative weights that determine how much time they get to run on any of the cores.
It sounds like with your hardware, your kernel driver is able to specify exactly which subset of GPU EUs a specific GPU context winds up running on. However, I think there are a lot of platforms that don't allow that kind of low-level control. E.g., I don't think we can do that on Intel hardware; we have a handful of high-level GPU engines that we can submit different types of batchbuffers to (render, blitter, media, etc.). What we can do is use GPU preemption to limit how much time specific GPU contexts get to run on the render engine before the engine is reclaimed for use by a different context.

Using a %gputime approach like Joonas is suggesting could be handled in a driver by reserving specific subsets of EUs on hardware like yours that's capable of doing that, whereas it could be mostly handled on other types of hardware via GPU engine preemption.

I think either approach, "gpu_euset" or "%gputime", should map well to a cgroup controller implementation. Granted, neither one solves the specific use case I was working on earlier this year, where we need unfair (starvation-okay) scheduling that will run contexts strictly according to priority (i.e., lower-priority contexts will never run at all unless all higher-priority contexts have completed all of their submitted work), but that's a pretty specialized use case that we'll probably need to handle in a different manner anyway.

Matt

> Regards,
> Kennny
>
> > > > That combined with the "GPU memory usable" property should be a good
> > > > starting point to start subdividing the GPU resources for multiple
> > > > users.
> > > >
> > > > Regards, Joonas
> > > >
> > > > > Your feedback is highly appreciated.
> > > > > Best Regards,
> > > > > Harish
> > > > >
> > > > > From: amd-gfx on behalf of Tejun Heo
> > > > > Sent: Tuesday, November 20, 2018 5:30 PM
> > > > > To: Ho, Kenny
> > > > > Cc: cgro...@vger.kernel.org; intel-...@lists.freedesktop.org;
> > > > > y2ke...@gmail.com; amd-gfx@lists.freedesktop.org;
> > > > > dri-de...@lists.freedesktop.org
> > > > > Subject: Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor
> > > > > specific DRM devices
> > > > >
> > > > > Hello,
> > > > >
> > > > > On Tue, Nov 20, 2018 at 10:21:14PM +, Ho, Kenny wrote:
> > > > > > By this reply, are you suggesting that vendor specific resources
> > > > > > will never be acceptable to be managed under cgroup? Let say a user
> > > > >
> > > > > I wouldn't say never but whatever which gets included as a cgroup
> > > > > controller should have clearly defined resource abstractions and the
> > > > > control schemes around them including support for delegation. AFAICS,
> > > > > gpu side still seems to have a long way to go (and it's not clear
> > > > > whether that's somewhere it will or needs to end up).
> > > > >
> > > > > > want to have similar functionality as what cgroup is offering but to
> > > > > > manage vendor specific resources, what would you suggest as a
> > > > > > solution? When you say keeping vendor specific resourc
[PATCH v3 2/3] drm/amdgpu: Handle xgmi device removal.
The XGMI hive has some resources allocated on device init which need to be deallocated when the device is unregistered.

v2: Remove creation of dedicated wq for XGMI hive reset.
v3: Use the gmc.xgmi.supported flag

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 ++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c75badf..bfd286c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 {
 	int i, r;
 
+	if (adev->gmc.xgmi.num_physical_nodes > 1)
+		amdgpu_xgmi_remove_device(adev);
+
 	amdgpu_amdkfd_device_fini(adev);
 
 	amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index f8c86d0..1b15ff3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
 	mutex_unlock(&xgmi_mutex);
 	return ret;
 }
+
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
+{
+	struct amdgpu_hive_info *hive;
+
+	if (!adev->gmc.xgmi.supported)
+		return;
+
+	mutex_lock(&xgmi_mutex);
+
+	hive = amdgpu_get_xgmi_hive(adev);
+	if (!hive)
+		goto exit;
+
+	if (!(hive->number_devices--))
+		mutex_destroy(&hive->hive_lock);
+
+exit:
+	mutex_unlock(&xgmi_mutex);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6335bfd..6151eb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -35,5 +35,6 @@ struct amdgpu_hive_info {
 struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
 int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
 int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
 #endif
--
2.7.4
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH v3 1/3] drm/amdgpu/psp: Update waiting in psp mode1 reset.
No point in using mdelay unless running from interrupt context (which we are not); it is a busy wait which blocks the CPU for the entirety of the wait time. Also, reduce the wait time to 500ms as is done in the reference code, because 1s might cause PSP FW TO issues during XGMI hive reset.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index 082093a..9c1569b 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -557,7 +557,7 @@ static int psp_v11_0_mode1_reset(struct psp_context *psp)
 	/*send the mode 1 reset command*/
 	WREG32(offset, GFX_CTRL_CMD_ID_MODE1_RST);
 
-	mdelay(1000);
+	msleep(500);
 
 	offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
index 7efb823..7357fd5 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
@@ -592,7 +592,7 @@ static int psp_v3_1_mode1_reset(struct psp_context *psp)
 	/*send the mode 1 reset command*/
 	WREG32(offset, GFX_CTRL_CMD_ID_MODE1_RST);
 
-	mdelay(1000);
+	msleep(500);
 
 	offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
 
--
2.7.4
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH v3 3/3] drm/amdgpu: Implement concurrent asic reset for XGMI.
Use per hive wq to concurrently send reset commands to all nodes in the hive.

v2: Switch to system_highpri_wq after dropping dedicated queue.
    Fix non XGMI code path KASAN error.
    Stop the hive reset for-each-node loop if there is a reset failure on any of the nodes.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 44 ++++++++++++++++++++++++++----
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c8ad6bf..6fc023b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -910,7 +910,9 @@ struct amdgpu_device {
 	bool				in_gpu_reset;
 	struct mutex			lock_reset;
 	struct amdgpu_doorbell_index	doorbell_index;
+
 	int				asic_reset_res;
+	struct work_struct		xgmi_reset_work;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bfd286c..9fd9f63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2356,6 +2356,19 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
 	return amdgpu_device_asic_has_dc_support(adev->asic_type);
 }
 
+
+static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
+{
+	struct amdgpu_device *adev =
+		container_of(__work, struct amdgpu_device, xgmi_reset_work);
+
+	adev->asic_reset_res = amdgpu_asic_reset(adev);
+	if (adev->asic_reset_res)
+		DRM_WARN("ASIC reset failed with err r, %d for drm dev, %s",
+			 adev->asic_reset_res, adev->ddev->unique);
+}
+
+
 /**
  * amdgpu_device_init - initialize the driver
  *
@@ -2454,6 +2467,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
 			  amdgpu_device_delay_enable_gfx_off);
 
+	INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+
 	adev->gfx.gfx_off_req_count = 1;
 	adev->pm.ac_power = power_supply_is_system_supplied() > 0 ? true : false;
 
@@ -3331,10 +3346,31 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 	 */
 	if (need_full_reset) {
 		list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
-			r = amdgpu_asic_reset(tmp_adev);
-			if (r)
-				DRM_WARN("ASIC reset failed with err r, %d for drm dev, %s",
+			/* For XGMI run all resets in parallel to speed up the process */
+			if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
+				if (!queue_work(system_highpri_wq, &tmp_adev->xgmi_reset_work))
+					r = -EALREADY;
+			} else
+				r = amdgpu_asic_reset(tmp_adev);
+
+			if (r) {
+				DRM_ERROR("ASIC reset failed with err r, %d for drm dev, %s",
 					  r, tmp_adev->ddev->unique);
+				break;
+			}
+		}
+
+		/* For XGMI wait for all PSP resets to complete before proceed */
+		if (!r) {
+			list_for_each_entry(tmp_adev, device_list_handle,
+					    gmc.xgmi.head) {
+				if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
+					flush_work(&tmp_adev->xgmi_reset_work);
+					r = tmp_adev->asic_reset_res;
+					if (r)
+						break;
+				}
+			}
+		}
 	}
 
@@ -3521,8 +3557,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		if (tmp_adev == adev)
 			continue;
 
-		dev_info(tmp_adev->dev, "GPU reset begin for drm dev %s!\n", adev->ddev->unique);
-
 		amdgpu_device_lock_adev(tmp_adev);
 		r = amdgpu_device_pre_asic_reset(tmp_adev, NULL,
--
2.7.4
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH 03/11] drm/amdgpu: remove VM fault_credit handling
The credit was used to limit the number of VM (retry) faults processed for each VM. If this is removed, it is possible to get flooded by an interrupt storm. Even though you claim in the commit message that printk_ratelimit is a better solution, I don't see it implemented in this patch. Are you planning a future patch?

Thanks,
Oak

-----Original Message-----
From: amd-gfx On Behalf Of Christian König
Sent: Friday, November 30, 2018 7:36 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 03/11] drm/amdgpu: remove VM fault_credit handling

printk_ratelimit() is much better suited to limit the number of reported VM faults.

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 37 ---------------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  5 ----
 drivers/gpu/drm/amd/amdgpu/cik_ih.c     | 18 +-------------
 drivers/gpu/drm/amd/amdgpu/cz_ih.c      | 18 +-------------
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 18 +-------------
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 18 +-------------
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c  |  7 ++---
 7 files changed, 6 insertions(+), 115 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 2acb9838913e..a2f149da83f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3057,7 +3057,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	}
 
 	INIT_KFIFO(vm->faults);
-	vm->fault_credit = 16;
 
 	return 0;
 
@@ -3269,42 +3268,6 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 		amdgpu_vmid_free_reserved(adev, vm, i);
 }
 
-/**
- * amdgpu_vm_pasid_fault_credit - Check fault credit for given PASID
- *
- * @adev: amdgpu_device pointer
- * @pasid: PASID do identify the VM
- *
- * This function is expected to be called in interrupt context.
- *
- * Returns:
- * True if there was fault credit, false otherwise
- */
-bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
-				  unsigned int pasid)
-{
-	struct amdgpu_vm *vm;
-
-	spin_lock(&adev->vm_manager.pasid_lock);
-	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
-	if (!vm) {
-		/* VM not found, can't track fault credit */
-		spin_unlock(&adev->vm_manager.pasid_lock);
-		return true;
-	}
-
-	/* No lock needed. only accessed by IRQ handler */
-	if (!vm->fault_credit) {
-		/* Too many faults in this VM */
-		spin_unlock(&adev->vm_manager.pasid_lock);
-		return false;
-	}
-
-	vm->fault_credit--;
-	spin_unlock(&adev->vm_manager.pasid_lock);
-	return true;
-}
-
 /**
  * amdgpu_vm_manager_init - init the VM manager
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 2a8898d19c8b..e8dcfd59fc93 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -229,9 +229,6 @@ struct amdgpu_vm {
 	/* Up to 128 pending retry page faults */
 	DECLARE_KFIFO(faults, u64, 128);
 
-	/* Limit non-retry fault storms */
-	unsigned int		fault_credit;
-
 	/* Points to the KFD process VM info */
 	struct amdkfd_process_info *process_info;
 
@@ -299,8 +296,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			   unsigned int pasid);
 void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
-bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
-				  unsigned int pasid);
 void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm, struct list_head *validated,
 			 struct amdgpu_bo_list_entry *entry);
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
index b5775c6a857b..3e6c8c4067cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -237,23 +237,7 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
  */
 static bool cik_ih_prescreen_iv(struct amdgpu_device *adev)
 {
-	u32 ring_index = adev->irq.ih.rptr >> 2;
-	u16 pasid;
-
-	switch (le32_to_cpu(adev->irq.ih.ring[ring_index]) & 0xff) {
-	case 146:
-	case 147:
-		pasid = le32_to_cpu(adev->irq.ih.ring[ring_index + 2]) >> 16;
-		if (!pasid || amdgpu_vm_pasid_fault_credit(adev, pasid))
-			return true;
-		break;
-	default:
-		/* Not a VM fault */
-		return true;
-	}
-
-	adev->irq.ih.rptr += 16;
-	return false;
+	return true;
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c b/driv
RE: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2
See comments inline, marked [Oak].

Thanks,
Oak

-----Original Message-----
From: amd-gfx On Behalf Of Christian König
Sent: Friday, November 30, 2018 7:36 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2

This allows us to filter out VM faults in the GMC code.

v2: don't filter out all faults

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 +++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 6b6524f04ce0..6db4c58ddc13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct amdgpu_device *adev,
 	if (!amdgpu_ih_prescreen_iv(adev))
 		return;
 
-	/* Before dispatching irq to IP blocks, send it to amdkfd */
-	amdgpu_amdkfd_interrupt(adev, (const void *) &ih->ring[ring_index]);
-
 	entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
 	amdgpu_ih_decode_iv(adev, &entry);
 
@@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
 	unsigned client_id = entry->client_id;
 	unsigned src_id = entry->src_id;
 	struct amdgpu_irq_src *src;
+	bool handled = false;
 	int r;
 
 	trace_amdgpu_iv(entry);
 
 	if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
-		DRM_DEBUG("Invalid client_id in IV: %d\n", client_id);
+		DRM_ERROR("Invalid client_id in IV: %d\n", client_id);
 		return;
 	}
 
 	if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) {
-		DRM_DEBUG("Invalid src_id in IV: %d\n", src_id);
+		DRM_ERROR("Invalid src_id in IV: %d\n", src_id);
 		return;
 	}
 
 	if (adev->irq.virq[src_id]) {
 		generic_handle_irq(irq_find_mapping(adev->irq.domain, src_id));
-	} else {
-		if (!adev->irq.client[client_id].sources) {
-			DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n",
-				  client_id, src_id);
-			return;
-		}
+		return;
+	}
+
+	if (!adev->irq.client[client_id].sources) {
+		DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n",
+			  client_id, src_id);
+		return;
+	} else {

[Oak]: you probably don't need an else here, because of the return in the if above.

 		src = adev->irq.client[client_id].sources[src_id];
 		if (!src) {
 			DRM_DEBUG("Unhandled interrupt src_id: %d\n", src_id);
@@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
 		}
 
 		r = src->funcs->process(adev, src, entry);
-		if (r)
+		if (r < 0)
 			DRM_ERROR("error processing interrupt (%d)\n", r);
+		else if (r)
+			handled = true;
 	}
+
+	/* Send it to amdkfd as well if it isn't already handled */
+	if (!handled)

[Oak]: Kfd cares about the following interrupt sources/clients. How can we guarantee that those are not handled by a registered interrupt handler at the amdgpu level?

	source_id == SOC15_INTSRC_CP_END_OF_PIPE ||
	source_id == SOC15_INTSRC_SDMA_TRAP ||
	source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG ||
	source_id == SOC15_INTSRC_CP_BAD_OPCODE ||
	client_id == SOC15_IH_CLIENTID_VMC ||
	client_id == SOC15_IH_CLIENTID_UTCL2;

To completely avoid this, we should remove the amdgpu_amdkfd_interrupt function. Instead, kfd interrupt handling functions should be split up and registered to adev->irq.client[].sources[].

+		amdgpu_amdkfd_interrupt(adev, entry->iv_entry);
 }
 
 /**
--
2.17.1
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: add a xgmi supported flag
Reviewed-by: Andrey Grodzovsky

Andrey

On 11/30/2018 03:36 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 3:34 PM Grodzovsky, Andrey wrote:
>>
>> On 11/30/2018 03:30 PM, Alex Deucher wrote:
>>> Use this to track whether an asic supports xgmi rather than
>>> checking the asic type everywhere.
>>>
>>> Signed-off-by: Alex Deucher
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  | 1 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
>>>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    | 2 +-
>>>   drivers/gpu/drm/amd/amdgpu/soc15.c       | 3 +++
>>>   4 files changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>>> index 8c57924c075f..81e6070d255b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>>> @@ -99,6 +99,7 @@ struct amdgpu_xgmi {
>>>       unsigned num_physical_nodes;
>>>       /* gpu list in the same hive */
>>>       struct list_head head;
>>> +     bool supported;
>>>   };
>>>
>>>   struct amdgpu_gmc {
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> index fb37e69f1bba..f8c86d0593dd 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> @@ -94,9 +94,9 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>>>
>>>       int count = 0, ret = -EINVAL;
>>>
>>> -     if ((adev->asic_type < CHIP_VEGA20) ||
>>> -         (adev->flags & AMD_IS_APU) )
>>> +     if (!adev->gmc.xgmi.supported)
>>>           return 0;
>> What about the (adev->flags & AMD_IS_APU) part ?
> It's covered by the check. Only vega20 has the flag set.
>
> Alex
>
>> Andrey
>>
>>> +
>>>       adev->gmc.xgmi.node_id = psp_xgmi_get_node_id(&adev->psp);
>>>       adev->gmc.xgmi.hive_id = psp_xgmi_get_hive_id(&adev->psp);
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index 3a4e5d8d5162..ed3145b2a596 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -934,7 +934,7 @@ static int gmc_v9_0_sw_init(void *handle)
>>>       }
>>>       adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
>>>
>>> -     if (adev->asic_type == CHIP_VEGA20) {
>>> +     if (adev->gmc.xgmi.supported) {
>>>           r = gfxhub_v1_1_get_xgmi_info(adev);
>>>           if (r)
>>>               return r;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> index 83624e150ca7..8849b74078d6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> @@ -507,6 +507,9 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
>>>           return -EINVAL;
>>>       }
>>>
>>> +     if (adev->asic_type == CHIP_VEGA20)
>>> +         adev->gmc.xgmi.supported = true;
>>> +
>>>       if (adev->flags & AMD_IS_APU)
>>>           adev->nbio_funcs = &nbio_v7_0_funcs;
>>>       else if (adev->asic_type == CHIP_VEGA20)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: add a xgmi supported flag
On Fri, Nov 30, 2018 at 3:34 PM Grodzovsky, Andrey wrote:
>
> On 11/30/2018 03:30 PM, Alex Deucher wrote:
> > Use this to track whether an asic supports xgmi rather than
> > checking the asic type everywhere.
> >
> > Signed-off-by: Alex Deucher
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  | 1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
> >   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    | 2 +-
> >   drivers/gpu/drm/amd/amdgpu/soc15.c       | 3 +++
> >   4 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > index 8c57924c075f..81e6070d255b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > @@ -99,6 +99,7 @@ struct amdgpu_xgmi {
> >       unsigned num_physical_nodes;
> >       /* gpu list in the same hive */
> >       struct list_head head;
> > +     bool supported;
> >   };
> >
> >   struct amdgpu_gmc {
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> > index fb37e69f1bba..f8c86d0593dd 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> > @@ -94,9 +94,9 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
> >
> >       int count = 0, ret = -EINVAL;
> >
> > -     if ((adev->asic_type < CHIP_VEGA20) ||
> > -         (adev->flags & AMD_IS_APU) )
> > +     if (!adev->gmc.xgmi.supported)
> >           return 0;
>
> What about the (adev->flags & AMD_IS_APU) part ?

It's covered by the check. Only vega20 has the flag set.

Alex

> Andrey
>
> > +
> >       adev->gmc.xgmi.node_id = psp_xgmi_get_node_id(&adev->psp);
> >       adev->gmc.xgmi.hive_id = psp_xgmi_get_hive_id(&adev->psp);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > index 3a4e5d8d5162..ed3145b2a596 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > @@ -934,7 +934,7 @@ static int gmc_v9_0_sw_init(void *handle)
> >       }
> >       adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> >
> > -     if (adev->asic_type == CHIP_VEGA20) {
> > +     if (adev->gmc.xgmi.supported) {
> >           r = gfxhub_v1_1_get_xgmi_info(adev);
> >           if (r)
> >               return r;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 83624e150ca7..8849b74078d6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -507,6 +507,9 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
> >           return -EINVAL;
> >       }
> >
> > +     if (adev->asic_type == CHIP_VEGA20)
> > +         adev->gmc.xgmi.supported = true;
> > +
> >       if (adev->flags & AMD_IS_APU)
> >           adev->nbio_funcs = &nbio_v7_0_funcs;
> >       else if (adev->asic_type == CHIP_VEGA20)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: add a xgmi supported flag
On 11/30/2018 03:30 PM, Alex Deucher wrote:
> Use this to track whether an asic supports xgmi rather than
> checking the asic type everywhere.
>
> Signed-off-by: Alex Deucher
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  | 1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    | 2 +-
>   drivers/gpu/drm/amd/amdgpu/soc15.c       | 3 +++
>   4 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index 8c57924c075f..81e6070d255b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -99,6 +99,7 @@ struct amdgpu_xgmi {
>       unsigned num_physical_nodes;
>       /* gpu list in the same hive */
>       struct list_head head;
> +     bool supported;
>   };
>
>   struct amdgpu_gmc {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index fb37e69f1bba..f8c86d0593dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -94,9 +94,9 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>
>       int count = 0, ret = -EINVAL;
>
> -     if ((adev->asic_type < CHIP_VEGA20) ||
> -         (adev->flags & AMD_IS_APU) )
> +     if (!adev->gmc.xgmi.supported)
>           return 0;

What about the (adev->flags & AMD_IS_APU) part ?

Andrey

> +
>       adev->gmc.xgmi.node_id = psp_xgmi_get_node_id(&adev->psp);
>       adev->gmc.xgmi.hive_id = psp_xgmi_get_hive_id(&adev->psp);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 3a4e5d8d5162..ed3145b2a596 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -934,7 +934,7 @@ static int gmc_v9_0_sw_init(void *handle)
>       }
>       adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
>
> -     if (adev->asic_type == CHIP_VEGA20) {
> +     if (adev->gmc.xgmi.supported) {
>           r = gfxhub_v1_1_get_xgmi_info(adev);
>           if (r)
>               return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 83624e150ca7..8849b74078d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -507,6 +507,9 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
>           return -EINVAL;
>       }
>
> +     if (adev->asic_type == CHIP_VEGA20)
> +         adev->gmc.xgmi.supported = true;
> +
>       if (adev->flags & AMD_IS_APU)
>           adev->nbio_funcs = &nbio_v7_0_funcs;
>       else if (adev->asic_type == CHIP_VEGA20)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: add a xgmi supported flag
Use this to track whether an asic supports xgmi rather than checking the asic type everywhere.

Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    | 2 +-
 drivers/gpu/drm/amd/amdgpu/soc15.c       | 3 +++
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 8c57924c075f..81e6070d255b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -99,6 +99,7 @@ struct amdgpu_xgmi {
 	unsigned num_physical_nodes;
 	/* gpu list in the same hive */
 	struct list_head head;
+	bool supported;
 };
 
 struct amdgpu_gmc {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index fb37e69f1bba..f8c86d0593dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -94,9 +94,9 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
 
 	int count = 0, ret = -EINVAL;
 
-	if ((adev->asic_type < CHIP_VEGA20) ||
-	    (adev->flags & AMD_IS_APU) )
+	if (!adev->gmc.xgmi.supported)
 		return 0;
+
 	adev->gmc.xgmi.node_id = psp_xgmi_get_node_id(&adev->psp);
 	adev->gmc.xgmi.hive_id = psp_xgmi_get_hive_id(&adev->psp);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3a4e5d8d5162..ed3145b2a596 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -934,7 +934,7 @@ static int gmc_v9_0_sw_init(void *handle)
 	}
 	adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
 
-	if (adev->asic_type == CHIP_VEGA20) {
+	if (adev->gmc.xgmi.supported) {
 		r = gfxhub_v1_1_get_xgmi_info(adev);
 		if (r)
 			return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 83624e150ca7..8849b74078d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -507,6 +507,9 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
 		return -EINVAL;
 	}
 
+	if (adev->asic_type == CHIP_VEGA20)
+		adev->gmc.xgmi.supported = true;
+
 	if (adev->flags & AMD_IS_APU)
 		adev->nbio_funcs = &nbio_v7_0_funcs;
 	else if (adev->asic_type == CHIP_VEGA20)
-- 
2.13.6
___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
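The pattern in the patch above — decide a capability once at IP-block setup and have every consumer test a flag instead of repeating asic-type checks — can be sketched in plain userspace C. This is an illustration only, not amdgpu code: the `CHIP_*` values, struct names, and the `!is_apu` condition (which folds in the APU question Andrey raised) are simplified stand-ins.

```c
#include <stdbool.h>

/* Simplified stand-ins for the amdgpu structures. */
enum asic_type { CHIP_VEGA10, CHIP_VEGA12, CHIP_VEGA20, CHIP_RAVEN };

struct xgmi_info {
	bool supported;		/* set once during IP-block setup */
};

struct device_model {
	enum asic_type asic_type;
	bool is_apu;
	struct xgmi_info xgmi;
};

/* Analogue of soc15_set_ip_blocks(): the single place the flag is set.
 * Folding the APU check in here is an assumption for illustration. */
void set_ip_blocks(struct device_model *dev)
{
	if (dev->asic_type == CHIP_VEGA20 && !dev->is_apu)
		dev->xgmi.supported = true;
}

/* Analogue of amdgpu_xgmi_add_device(): consumers only test the flag. */
int xgmi_add_device(struct device_model *dev)
{
	if (!dev->xgmi.supported)
		return 0;	/* nothing to do on non-XGMI parts */
	return 1;		/* would join the hive here */
}
```

The payoff is the one Alex describes for new asics: supporting a new chip means updating one line in `set_ip_blocks()` rather than hunting down scattered type checks.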
Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.
On Fri, Nov 30, 2018 at 3:12 PM Grodzovsky, Andrey wrote: > > > On 11/30/2018 03:08 PM, Alex Deucher wrote: > > On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey > > wrote: > >> > >> > >> On 11/30/2018 02:49 PM, Alex Deucher wrote: > >>> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky > >>> wrote: > XGMI hive has some resources allocted on device init which > needs to be deallocated when the device is unregistered. > > v2: Remove creation of dedicated wq for XGMI hive reset. > > Signed-off-by: Andrey Grodzovsky > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++ > drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 20 > drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 + > 3 files changed, 24 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index c75badf..bfd286c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct > amdgpu_device *adev) > { > int i, r; > > + if (adev->gmc.xgmi.num_physical_nodes > 1) > + amdgpu_xgmi_remove_device(adev); > + > amdgpu_amdkfd_device_fini(adev); > > amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > index fb37e69..38e1599 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device > *adev) > mutex_unlock(&xgmi_mutex); > return ret; > } > + > +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev) > +{ > + struct amdgpu_hive_info *hive; > + > + if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & > AMD_IS_APU)) > + return; > >>> It would be nice to have something better here to check against. This > >>> seems kind of fragile. Can we check based on some xgmi related > >>> structure? 
> >>> > >>> Alex > >> This check is same as in amdgpu_xgmi_add_device, i guess we could > >> look if adev->gmc.xgmi.head is empty or not since it's only gets added to > >> hive->device_list if amdgpu_xgmi_add_device was called. > >> Sounds ok ? > > Sounds good. We should probably fix up the other case as well at some > > point. > > > > Thanks! > > > > Alex > > The other case is in amdgpu_xgmi_add_device - at the beginning - just > avoids XGMI code path for non VEGA20 dGPUs - how can we avoid this check ? > Maybe something like adev->gmc.xgmi.supported and set it in soc15_set_ip_blocks() for vega20. We should do something similar for ras. That way for new asics we only need to set the variable rather than updating asic checks scattered around the code. Alex > Andrey > > > > > >> Andrey > >> > + > + mutex_lock(&xgmi_mutex); > + > + hive = amdgpu_get_xgmi_hive(adev); > + if (!hive) > + goto exit; > + > + if (!(hive->number_devices--)) > + mutex_destroy(&hive->hive_lock); > + > +exit: > + mutex_unlock(&xgmi_mutex); > +} > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > index 6335bfd..6151eb9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > @@ -35,5 +35,6 @@ struct amdgpu_hive_info { > struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device > *adev); > int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct > amdgpu_device *adev); > int amdgpu_xgmi_add_device(struct amdgpu_device *adev); > +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev); > > #endif > -- > 2.7.4 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.
On 11/30/2018 03:08 PM, Alex Deucher wrote: > On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey > wrote: >> >> >> On 11/30/2018 02:49 PM, Alex Deucher wrote: >>> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky >>> wrote: XGMI hive has some resources allocted on device init which needs to be deallocated when the device is unregistered. v2: Remove creation of dedicated wq for XGMI hive reset. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 20 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 + 3 files changed, 24 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index c75badf..bfd286c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev) { int i, r; + if (adev->gmc.xgmi.num_physical_nodes > 1) + amdgpu_xgmi_remove_device(adev); + amdgpu_amdkfd_device_fini(adev); amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index fb37e69..38e1599 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev) mutex_unlock(&xgmi_mutex); return ret; } + +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev) +{ + struct amdgpu_hive_info *hive; + + if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU)) + return; >>> It would be nice to have something better here to check against. This >>> seems kind of fragile. Can we check based on some xgmi related >>> structure? >>> >>> Alex >> This check is same as in amdgpu_xgmi_add_device, i guess we could >> look if adev->gmc.xgmi.head is empty or not since it's only gets added to >> hive->device_list if amdgpu_xgmi_add_device was called. 
>> Sounds ok ? > Sounds good. We should probably fix up the other case as well at some point. > > Thanks! > > Alex The other case is in amdgpu_xgmi_add_device - at the beginning - just avoids XGMI code path for non VEGA20 dGPUs - how can we avoid this check ? Andrey > >> Andrey >> + + mutex_lock(&xgmi_mutex); + + hive = amdgpu_get_xgmi_hive(adev); + if (!hive) + goto exit; + + if (!(hive->number_devices--)) + mutex_destroy(&hive->hive_lock); + +exit: + mutex_unlock(&xgmi_mutex); +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h index 6335bfd..6151eb9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h @@ -35,5 +35,6 @@ struct amdgpu_hive_info { struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev); int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev); int amdgpu_xgmi_add_device(struct amdgpu_device *adev); +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev); #endif -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.
On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey wrote: > > > > On 11/30/2018 02:49 PM, Alex Deucher wrote: > > On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky > > wrote: > >> XGMI hive has some resources allocted on device init which > >> needs to be deallocated when the device is unregistered. > >> > >> v2: Remove creation of dedicated wq for XGMI hive reset. > >> > >> Signed-off-by: Andrey Grodzovsky > >> --- > >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++ > >> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 20 > >> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 + > >> 3 files changed, 24 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> index c75badf..bfd286c 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct > >> amdgpu_device *adev) > >> { > >> int i, r; > >> > >> + if (adev->gmc.xgmi.num_physical_nodes > 1) > >> + amdgpu_xgmi_remove_device(adev); > >> + > >> amdgpu_amdkfd_device_fini(adev); > >> > >> amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE); > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > >> index fb37e69..38e1599 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > >> @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev) > >> mutex_unlock(&xgmi_mutex); > >> return ret; > >> } > >> + > >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev) > >> +{ > >> + struct amdgpu_hive_info *hive; > >> + > >> + if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU)) > >> + return; > > It would be nice to have something better here to check against. This > > seems kind of fragile. Can we check based on some xgmi related > > structure? 
> > > > Alex > > This check is same as in amdgpu_xgmi_add_device, i guess we could > look if adev->gmc.xgmi.head is empty or not since it's only gets added to > hive->device_list if amdgpu_xgmi_add_device was called. > Sounds ok ? Sounds good. We should probably fix up the other case as well at some point. Thanks! Alex > > Andrey > > > > >> + > >> + mutex_lock(&xgmi_mutex); > >> + > >> + hive = amdgpu_get_xgmi_hive(adev); > >> + if (!hive) > >> + goto exit; > >> + > >> + if (!(hive->number_devices--)) > >> + mutex_destroy(&hive->hive_lock); > >> + > >> +exit: > >> + mutex_unlock(&xgmi_mutex); > >> +} > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > >> index 6335bfd..6151eb9 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > >> @@ -35,5 +35,6 @@ struct amdgpu_hive_info { > >> struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device > >> *adev); > >> int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct > >> amdgpu_device *adev); > >> int amdgpu_xgmi_add_device(struct amdgpu_device *adev); > >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev); > >> > >> #endif > >> -- > >> 2.7.4 > >> > >> ___ > >> amd-gfx mailing list > >> amd-gfx@lists.freedesktop.org > >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.
On 11/30/2018 02:49 PM, Alex Deucher wrote: > On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky > wrote: >> XGMI hive has some resources allocted on device init which >> needs to be deallocated when the device is unregistered. >> >> v2: Remove creation of dedicated wq for XGMI hive reset. >> >> Signed-off-by: Andrey Grodzovsky >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++ >> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 20 >> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 + >> 3 files changed, 24 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index c75badf..bfd286c 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device >> *adev) >> { >> int i, r; >> >> + if (adev->gmc.xgmi.num_physical_nodes > 1) >> + amdgpu_xgmi_remove_device(adev); >> + >> amdgpu_amdkfd_device_fini(adev); >> >> amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE); >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >> index fb37e69..38e1599 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >> @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev) >> mutex_unlock(&xgmi_mutex); >> return ret; >> } >> + >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev) >> +{ >> + struct amdgpu_hive_info *hive; >> + >> + if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU)) >> + return; > It would be nice to have something better here to check against. This > seems kind of fragile. Can we check based on some xgmi related > structure? > > Alex This check is same as in amdgpu_xgmi_add_device, i guess we could look if adev->gmc.xgmi.head is empty or not since it's only gets added to hive->device_list if amdgpu_xgmi_add_device was called. 
Sounds ok ? Andrey > >> + >> + mutex_lock(&xgmi_mutex); >> + >> + hive = amdgpu_get_xgmi_hive(adev); >> + if (!hive) >> + goto exit; >> + >> + if (!(hive->number_devices--)) >> + mutex_destroy(&hive->hive_lock); >> + >> +exit: >> + mutex_unlock(&xgmi_mutex); >> +} >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >> index 6335bfd..6151eb9 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >> @@ -35,5 +35,6 @@ struct amdgpu_hive_info { >> struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev); >> int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct >> amdgpu_device *adev); >> int amdgpu_xgmi_add_device(struct amdgpu_device *adev); >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev); >> >> #endif >> -- >> 2.7.4 >> >> ___ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
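Andrey's suggestion above — gate `amdgpu_xgmi_remove_device()` on whether `adev->gmc.xgmi.head` was ever linked into `hive->device_list`, rather than on the asic type — relies on the kernel list invariant that an initialized-but-never-added node points to itself. A minimal userspace sketch of that invariant (the `list_head` helpers mirror `<linux/list.h>` semantics; struct names are simplified stand-ins, not amdgpu code):

```c
#include <stdbool.h>

/* Minimal reimplementation of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* True only if the node still points at itself, i.e. was never added. */
bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

struct device_model {
	struct list_head xgmi_head;	/* stand-in for adev->gmc.xgmi.head */
};

struct hive_model {
	struct list_head device_list;	/* stand-in for hive->device_list */
};

/* Analogue of amdgpu_xgmi_add_device() linking the device into the hive. */
void xgmi_add_device(struct device_model *dev, struct hive_model *hive)
{
	list_add_tail(&dev->xgmi_head, &hive->device_list);
}

/* Proposed guard: bail out unless add_device actually linked us in. */
bool xgmi_remove_device(struct device_model *dev)
{
	if (list_empty(&dev->xgmi_head))
		return false;	/* never joined a hive, nothing to tear down */
	return true;		/* would drop hive resources here */
}
```

This only works if the node is initialized before use (and, in the kernel, removed with `list_del_init()` so the empty state is restored) — which is the fragility trade-off the thread is weighing against the asic-type check.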
Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.
On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky wrote: > > XGMI hive has some resources allocted on device init which > needs to be deallocated when the device is unregistered. > > v2: Remove creation of dedicated wq for XGMI hive reset. > > Signed-off-by: Andrey Grodzovsky > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++ > drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 20 > drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 + > 3 files changed, 24 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index c75badf..bfd286c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device > *adev) > { > int i, r; > > + if (adev->gmc.xgmi.num_physical_nodes > 1) > + amdgpu_xgmi_remove_device(adev); > + > amdgpu_amdkfd_device_fini(adev); > > amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > index fb37e69..38e1599 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c > @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev) > mutex_unlock(&xgmi_mutex); > return ret; > } > + > +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev) > +{ > + struct amdgpu_hive_info *hive; > + > + if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU)) > + return; It would be nice to have something better here to check against. This seems kind of fragile. Can we check based on some xgmi related structure? 
Alex > + > + mutex_lock(&xgmi_mutex); > + > + hive = amdgpu_get_xgmi_hive(adev); > + if (!hive) > + goto exit; > + > + if (!(hive->number_devices--)) > + mutex_destroy(&hive->hive_lock); > + > +exit: > + mutex_unlock(&xgmi_mutex); > +} > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > index 6335bfd..6151eb9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h > @@ -35,5 +35,6 @@ struct amdgpu_hive_info { > struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev); > int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct > amdgpu_device *adev); > int amdgpu_xgmi_add_device(struct amdgpu_device *adev); > +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev); > > #endif > -- > 2.7.4 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[pull] amdgpu, amdkfd, ttm, scheduler, radeon drm-next-4.21
Hi Dave, More new features for 4.21: amdgpu and amdkfd: - Freesync support - ABM support in DC - KFD support for vega12 and polaris12 - Add sdma paging queue support for vega - Use ACPI to query backlight range on supported platforms - Clean up doorbell handling - KFD fix for pasid handling under non-HWS - Misc cleanups and fixes scheduler: - Revert "fix timeout handling v2" radeon: - Fix possible overflow on 32 bit ttm: - Fix for LRU handling for ghost objects The following changes since commit 9235dd441af43599b9cdcce599a3da4083fcad3c: Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next (2018-11-19 11:07:52 +1000) are available in the git repository at: git://people.freedesktop.org/~agd5f/linux drm-next-4.21 for you to fetch changes up to 2c486cc4c2774df684d8a43ca7a20670c67ccd76: drm/amdgpu: wait for IB test on first device open (2018-11-30 12:01:35 -0500) Alex Deucher (4): drm/amdgpu/gfx: use proper offset define for MEC doorbells drm/amdgpu/psp: use define rather than magic number for mode1 reset drm/amdgpu: don't expose fan attributes on APUs drm/amdgpu: add VCN JPEG support amdgpu_ctx_num_entities Andrey Grodzovsky (3): drm/amdgpu: Refactor amdgpu_xgmi_add_device drm/amdgpu: Expose hive adev list and xgmi_mutex drm/amdgpu: Refactor GPU reset for XGMI hive case Bhawanpreet Lakha (2): drm/amd/display: Set RMX_ASPECT as default drm/amd/display: Fix Scaling (RMX_*) for DC driver Brajeswar Ghosh (7): drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c: Remove duplicate header drm/amd/amdgpu/vce_v3_0.c: Remove duplicate header drm/amd/amdgpu: Remove duplicate header drm/amd/display/amdgpu_dm/amdgpu_dm.c: Remove duplicate header drm/amd/amdgpu: Remove duplicate header drm/amd/amdkfd: Remove duplicate header drm/amd/display: Remove duplicate header Charlene Liu (1): drm/amd/display: expose surface confirm color function Chengming Gui (1): Revert "drm/amdgpu: use GMC v9 KIQ workaround only for the GFXHUB" (v2) Chris Wilson (1): drm/amdgpu: Reorder 
uvd ring init before uvd resume Christian König (3): drm/sched: revert "fix timeout handling v2" v2 drm/ttm: fix LRU handling in ttm_buffer_object_transfer drm/amdgpu: wait for IB test on first device open Colin Ian King (3): drm/amd/display: fix dereference of pointer fs_params before it is null checked drm/amdgpu: fix spelling mistake "Pramater" -> "Parameter" drm/amd/pp: fix spelling mistake "dependancy" -> "dependency" David Francis (10): drm/amd/display: Remove dc_stream_state->status drm/amd/display: Check for dmcu initialization before calling dmcu drm/amd/display: Clean up dp_blank functions drm/amd/display: Get backlight controller id from link drm/amd/display: Clean up DCN1 clock requests drm/amd/display: Load DMCU IRAM drm/amd: Add abm level drm property drm/amd: update ATIF functions in AMD ACPI header drm/amd: Query and use ACPI backlight caps drm/amd/display: Fix compile error with ACPI disabled Dmytro Laktyushkin (2): drm/amd/display: redesign scaling rotation math drm/amd/display: fix pipe interdependent hubp programming Emily Deng (1): drm/amd/amdgpu/sriov: Aligned the definition with libgv Eric Bernstein (1): drm/amd/display: get tail pipe before aquire free pipe Eric Huang (1): drm/amdkfd: change system memory overcommit limit Felix Kuehling (2): drm/amdkfd: Fix and simplify sync object handling for KFD drm/amdgpu: Fix KFD doorbell SG BO mapping Gang Ba (1): drm/amdkfd: Added Vega12 and Polaris12 for KFD. 
Guttula, Suresh (2): drm/amd/powerplay:add hwmgr callback to update nbpstate on Carrizo drm/amd:Enable/Disable NBPSTATE on On/OFF of UVD Harish Kasiviswanathan (2): drm/amdgpu: Remove explicit wait after VM validate drm/amdgpu: KFD Restore process: Optimize waiting Jerry (Fangzhi) Zuo (1): drm/amd/display: Fix NULL ptr when calculating refresh rate Joerg Roedel (1): drm/amd/powerplay: Ratelimit all "was not implemented" messages Joshua Aberback (1): drm/amd/display: Adjust stream enable sequence Jun Lei (2): drm/amd/display: make underflow status clear explicit drm/amd/display: clear underflow on optc unblank Murton Liu (1): drm/amd/display: fix gamma not being applied correctly Nevenko Stupar (1): drm/amd/display: expose dentist_get_divider_from_did Nicholas Kazlauskas (9): drm/amdgpu: Add amdgpu "max bpc" connector property (v2) drm/amd/display: Support amdgpu "max bpc" connector property (v2) drm/amd/display: Use private obj helpers for dm_atomic_state drm: Add vrr_capable property to the drm connector drm: Add vrr_enabled property to drm CRTC drm: Document varia
[PATCH v2 3/3] drm/amdgpu: Implement concurrent asic reset for XGMI.
Use a per-hive wq to concurrently send reset commands to all nodes in the hive.

v2: Switch to system_highpri_wq after dropping the dedicated queue.
    Fix non-XGMI code path KASAN error.
    Stop the per-node hive reset loop if there is a reset failure on any of the nodes.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 44 ++
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c8ad6bf..6fc023b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -910,7 +910,9 @@ struct amdgpu_device {
 	bool in_gpu_reset;
 	struct mutex  lock_reset;
 	struct amdgpu_doorbell_index doorbell_index;
+	int asic_reset_res;
 
+	struct work_struct xgmi_reset_work;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bfd286c..9fd9f63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2356,6 +2356,19 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
 	return amdgpu_device_asic_has_dc_support(adev->asic_type);
 }
 
+
+static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
+{
+	struct amdgpu_device *adev =
+		container_of(__work, struct amdgpu_device, xgmi_reset_work);
+
+	adev->asic_reset_res = amdgpu_asic_reset(adev);
+	if (adev->asic_reset_res)
+		DRM_WARN("ASIC reset failed with err r, %d for drm dev, %s",
+			 adev->asic_reset_res, adev->ddev->unique);
+}
+
+
 /**
  * amdgpu_device_init - initialize the driver
  *
@@ -2454,6 +2467,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
 			  amdgpu_device_delay_enable_gfx_off);
 
+	INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+
 	adev->gfx.gfx_off_req_count = 1;
 	adev->pm.ac_power = power_supply_is_system_supplied() > 0 ? true : false;
 
@@ -3331,10 +3346,31 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 	 */
 	if (need_full_reset) {
 		list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
-			r = amdgpu_asic_reset(tmp_adev);
-			if (r)
-				DRM_WARN("ASIC reset failed with err r, %d for drm dev, %s",
+			/* For XGMI run all resets in parallel to speed up the process */
+			if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
+				if (!queue_work(system_highpri_wq, &tmp_adev->xgmi_reset_work))
+					r = -EALREADY;
+			} else
+				r = amdgpu_asic_reset(tmp_adev);
+
+			if (r) {
+				DRM_ERROR("ASIC reset failed with err r, %d for drm dev, %s",
 					 r, tmp_adev->ddev->unique);
+				break;
+			}
+		}
+
+		/* For XGMI wait for all PSP resets to complete before proceed */
+		if (!r) {
+			list_for_each_entry(tmp_adev, device_list_handle,
+					    gmc.xgmi.head) {
+				if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
+					flush_work(&tmp_adev->xgmi_reset_work);
+					r = tmp_adev->asic_reset_res;
+					if (r)
+						break;
+				}
+			}
 		}
 	}
 
@@ -3521,8 +3557,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		if (tmp_adev == adev)
 			continue;
 
-		dev_info(tmp_adev->dev, "GPU reset begin for drm dev %s!\n", adev->ddev->unique);
-
 		amdgpu_device_lock_adev(tmp_adev);
 
 		r = amdgpu_device_pre_asic_reset(tmp_adev,
 						 NULL,
-- 
2.7.4
___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
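The control flow of the patch above — kick off one reset per node in parallel, then wait for all of them and collect a per-node result — can be mimicked in userspace with pthreads standing in for `queue_work()`/`flush_work()`, and a per-node field standing in for `adev->asic_reset_res`. This is an analogy under stated assumptions (the fake "reset" just records success), not amdgpu code:

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_NODES 4

struct node {
	int id;
	int reset_res;		/* analogue of adev->asic_reset_res */
	pthread_t worker;	/* analogue of xgmi_reset_work */
};

/* Analogue of amdgpu_device_xgmi_reset_func(): runs once per node,
 * records its own result instead of returning it. */
static void *xgmi_reset_func(void *arg)
{
	struct node *n = arg;
	/* a real driver would issue the PSP mode1 reset here */
	n->reset_res = 0;
	return NULL;
}

/* Analogue of the two loops in amdgpu_do_asic_reset(): first queue all
 * resets, then join each worker and propagate the first failure. */
int hive_reset(struct node *nodes, int count)
{
	int i, r = 0;

	for (i = 0; i < count; i++)	/* queue_work() loop */
		pthread_create(&nodes[i].worker, NULL, xgmi_reset_func,
			       &nodes[i]);

	for (i = 0; i < count; i++) {	/* flush_work() loop */
		pthread_join(nodes[i].worker, NULL);
		if (nodes[i].reset_res && !r)
			r = nodes[i].reset_res;
	}
	return r;
}
```

One difference from the patch: the kernel version breaks out of the flush loop on the first failure, whereas this sketch joins every worker before returning — in userspace you must join each thread anyway to avoid leaking it.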
[PATCH v2 1/3] drm/amdgpu/psp: Update waiting in psp mode1 reset.
There is no point in using mdelay unless running from interrupt context (which we are not); it is a busy wait which will block the CPU for the entirety of the wait time. Also, reduce the wait time to 500ms, as is done in the reference code, because the longer wait might cause PSP FW timeout issues during XGMI hive reset.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index 082093a..9c1569b 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -557,7 +557,7 @@ static int psp_v11_0_mode1_reset(struct psp_context *psp)
 	/*send the mode 1 reset command*/
 	WREG32(offset, GFX_CTRL_CMD_ID_MODE1_RST);
 
-	mdelay(1000);
+	msleep(500);
 
 	offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
index 7efb823..7357fd5 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
@@ -592,7 +592,7 @@ static int psp_v3_1_mode1_reset(struct psp_context *psp)
 	/*send the mode 1 reset command*/
 	WREG32(offset, GFX_CTRL_CMD_ID_MODE1_RST);
 
-	mdelay(1000);
+	msleep(500);
 
 	offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
 
-- 
2.7.4
___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
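The commit message's point — mdelay() burns CPU for the whole interval while msleep() yields it — can be demonstrated in userspace by comparing CPU time against wall-clock time for the same wait. `busy_delay()`/`sleep_delay()` below are illustrative stand-ins, not the kernel functions:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Wall-clock milliseconds. */
static double wall_ms(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* CPU milliseconds actually consumed by this process. */
static double cpu_ms(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
	return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* mdelay() analogue: spin until the interval elapses.
 * Returns CPU time burned, which is roughly the full interval. */
double busy_delay(int ms)
{
	double c0 = cpu_ms(), w0 = wall_ms();
	while (wall_ms() - w0 < ms)
		;	/* busy wait: the CPU does nothing useful */
	return cpu_ms() - c0;
}

/* msleep() analogue: block in the scheduler.
 * Returns CPU time burned, which is near zero. */
double sleep_delay(int ms)
{
	double c0 = cpu_ms();
	struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&req, NULL);
	return cpu_ms() - c0;
}
```

Both calls take the same wall-clock time, but only the busy variant charges that time to the CPU — which is exactly why a 1000ms mdelay() in a non-atomic kernel path is wasteful.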
[PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.
XGMI hive has some resources allocated on device init which need to be deallocated when the device is unregistered.

v2: Remove creation of dedicated wq for XGMI hive reset.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c75badf..bfd286c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 {
 	int i, r;
 
+	if (adev->gmc.xgmi.num_physical_nodes > 1)
+		amdgpu_xgmi_remove_device(adev);
+
 	amdgpu_amdkfd_device_fini(adev);
 
 	amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index fb37e69..38e1599 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
 	mutex_unlock(&xgmi_mutex);
 	return ret;
 }
+
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
+{
+	struct amdgpu_hive_info *hive;
+
+	if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
+		return;
+
+	mutex_lock(&xgmi_mutex);
+
+	hive = amdgpu_get_xgmi_hive(adev);
+	if (!hive)
+		goto exit;
+
+	if (!(hive->number_devices--))
+		mutex_destroy(&hive->hive_lock);
+
+exit:
+	mutex_unlock(&xgmi_mutex);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6335bfd..6151eb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -35,5 +35,6 @@ struct amdgpu_hive_info {
 struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
 int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
 int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
 
 #endif
-- 
2.7.4
___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2
Won't this break VM fault handling in KFD? I don't see a way with the current code that you can leave some VM faults for KFD to process. If we could consider VM faults with VMIDs 8-15 as not handled in amdgpu and leave them for KFD to process, then this could work.

As far as I can tell, the only code path that leaves IRQs unhandled and passes them to KFD prints an error message in the kernel log. We can't have the kernel log flooded with error messages every time there are IRQs for KFD. We can get extremely high-frequency interrupts for HSA signals.

Regards,
  Felix

-----Original Message-----
From: amd-gfx On Behalf Of Alex Deucher
Sent: Friday, November 30, 2018 10:03 AM
To: Christian König
Cc: amd-gfx list
Subject: Re: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2

On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > This allows us to filter out VM faults in the GMC code. > > v2: don't filter out all faults > > Signed-off-by: Christian König Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 > +++-- > 1 file changed, 17 insertions(+), 12 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > index 6b6524f04ce0..6db4c58ddc13 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > @@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct amdgpu_device > *adev, > if (!amdgpu_ih_prescreen_iv(adev)) > return; > > - /* Before dispatching irq to IP blocks, send it to amdkfd */ > - amdgpu_amdkfd_interrupt(adev, (const void *) &ih->ring[ring_index]); > - > entry.iv_entry = (const uint32_t *)&ih->ring[ring_index]; > amdgpu_ih_decode_iv(adev, &entry); > > @@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, > unsigned client_id = entry->client_id; > unsigned src_id = entry->src_id; > struct amdgpu_irq_src *src; > + bool handled = false; > int r; > > trace_amdgpu_iv(entry); > > if (client_id >=
AMDGPU_IRQ_CLIENTID_MAX) { > - DRM_DEBUG("Invalid client_id in IV: %d\n", client_id); > + DRM_ERROR("Invalid client_id in IV: %d\n", client_id); > return; > } > > if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) { > - DRM_DEBUG("Invalid src_id in IV: %d\n", src_id); > + DRM_ERROR("Invalid src_id in IV: %d\n", src_id); > return; > } > > if (adev->irq.virq[src_id]) { > generic_handle_irq(irq_find_mapping(adev->irq.domain, > src_id)); > - } else { > - if (!adev->irq.client[client_id].sources) { > - DRM_DEBUG("Unregistered interrupt client_id: %d > src_id: %d\n", > - client_id, src_id); > - return; > - } > + return; > + } > > + if (!adev->irq.client[client_id].sources) { > + DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n", > + client_id, src_id); > + return; > + } else { > src = adev->irq.client[client_id].sources[src_id]; > if (!src) { > DRM_DEBUG("Unhandled interrupt src_id: %d\n", > src_id); @@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct amdgpu_device > *adev, > } > > r = src->funcs->process(adev, src, entry); > - if (r) > + if (r < 0) > DRM_ERROR("error processing interrupt (%d)\n", > r); > + else if (r) > + handled = true; > } > + > + /* Send it to amdkfd as well if it isn't already handled */ > + if (!handled) > + amdgpu_amdkfd_interrupt(adev, entry->iv_entry); > } > > /** > -- > 2.17.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
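The dispatch change above boils down to a chain-of-responsibility convention: an IP block's process() callback returns a negative value on error, 0 when it did not handle the entry, and a positive value when it did, and only unhandled entries are forwarded to the KFD. A minimal userspace C sketch of that convention (the source IDs and function names here are hypothetical stand-ins, not the driver's real API):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical return convention mirroring the patch:
 * <0 = error, 0 = not handled, >0 = handled. */
static int gmc_process(int src_id)
{
	/* pretend the GMC block consumes VM-fault sources 146/147 */
	return (src_id == 146 || src_id == 147) ? 1 : 0;
}

static int kfd_received; /* counts IVs forwarded to the KFD fallback */

static void kfd_interrupt(int src_id)
{
	(void)src_id;
	kfd_received++;
}

/* Dispatch: forward to the KFD only if no IP block handled the entry. */
static void dispatch(int src_id)
{
	int r = gmc_process(src_id);

	if (r < 0)
		fprintf(stderr, "error processing interrupt (%d)\n", r);
	else if (r > 0)
		return; /* handled, the KFD never sees it */

	kfd_interrupt(src_id); /* unhandled: pass it on */
}
```

With this shape, the KFD only pays for entries nobody else claimed, which is the point of moving amdgpu_amdkfd_interrupt() after the IP-block processing.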
Re: [PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.
On 11/30/2018 10:53 AM, Koenig, Christian wrote:
> Am 30.11.18 um 16:14 schrieb Grodzovsky, Andrey:
>> On 11/30/2018 04:03 AM, Christian König wrote:
>>> Am 29.11.18 um 21:36 schrieb Andrey Grodzovsky:
>>>> XGMI hive has some resources allocated on device init which need to be
>>>> deallocated when the device is unregistered.
>>>> Add a per-hive wq to allow all the nodes in the hive to run resets
>>>> concurrently - this should speed up the total reset time to avoid
>>>> breaching the PSP FW timeout.
>>> Do you really want a workqueue? That sounds like the exact opposite of
>>> what you describe here.
>>>
>>> E.g. a workqueue is there to serialize all work items scheduled to it.
>>>
>>> What you most likely rather want is a work item per device which are
>>> scheduled, run in parallel and are joined back together before
>>> dropping the per hive lock.
>>>
>>> Christian.
>> As you can see in the second patch, that is exactly how I use this wq. I
>> define a per-adev work item, all of which I enqueue during the GPU reset
>> routine.
>> I believe the wq I created will process work items in parallel since it
>> is served by a per-CPU worker thread pool, while a serializing wq is
>> created using the 'create_singlethread_workqueue' interface, which
>> creates an ordered wq that processes at most one item simultaneously.
> That is correct, but you are still limited to one work item processed
> per CPU which is not necessarily what you want.

I was wrong to say it's a single worker thread per CPU; it's actually a
worker thread pool per CPU, so even with a single CPU in the system you
get concurrency of execution.
>
>> I did have my doubts about creating a dedicated wq instead of reusing an
>> existing system-wide wq such as system_unbound_wq, because this adds new
>> kernel threads, but eventually I opted for my own queue because there is
>> a hard requirement from the PSP FW to complete all ASIC resets in a
>> defined finite amount of time (1s I believe), otherwise some nodes in
>> the XGMI hive might switch to single GPU mode due to timeout.
>> If I rely on a system-wide wq then during stress times this queue might
>> fail to process my work items within that time limit because it's
>> overloaded by work items from other queue clients.
>> Maybe I should squash the 2 changes into one to make all of this more
>> clear?
> No, that won't help. You would rather need to document that properly.
>
> But creating a wq just for a rarely used device reset is completely
> overkill.
>
> Stuff like that should go to the system_highpri_wq work queue which
> should already make sure that it never hits the PSP timeout.
>
> Christian.

I agree in general with that, but it looks like system_highpri_wq is a
'BOUND' wq, meaning its items are limited to the thread pool of the CPU
from which they were queued. Anyway, since I mostly just wait in sleep
inside the PSP reset function, that should still be OK. Will try that.
Andrey > >> Andrey Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 3 +++ 2 files changed, 27 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index fb37e69..9ac2dc5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -61,6 +61,8 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) INIT_LIST_HEAD(&tmp->device_list); mutex_init(&tmp->hive_lock); + tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND | WQ_HIGHPRI, 0); + return tmp; } @@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev) mutex_unlock(&xgmi_mutex); return ret; } + +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev) +{ + struct amdgpu_hive_info *hive; + + if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU)) + return; + + mutex_lock(&xgmi_mutex); + + hive = amdgpu_get_xgmi_hive(adev); + if (!hive) + goto exit; + + if (!(hive->number_devices--)) { + mutex_destroy(&hive->hive_lock); + destroy_workqueue(hive->reset_queue); + } + +exit: + mutex_unlock(&xgmi_mutex); +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h index 6335bfd..285ab93 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h @@ -30,10 +30,13 @@ struct amdgpu_hive_info { struct psp_xgmi_topology_info topology_info; int number_devices; struct mutex hive_lock; + /* hive members reset wq */ + str
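Christian's suggestion in this thread, per-device work items that run in parallel and are joined back together before dropping the per-hive lock, can be sketched in userspace with pthreads (all names hypothetical; the real driver would use kernel work items, not threads):

```c
#include <assert.h>
#include <pthread.h>

#define NUM_NODES 4

static pthread_mutex_t hive_lock = PTHREAD_MUTEX_INITIALIZER;
static int reset_done[NUM_NODES];

/* Stand-in for the per-ASIC PSP reset of one hive node. */
static void *node_reset(void *arg)
{
	int idx = *(int *)arg;

	reset_done[idx] = 1;
	return NULL;
}

/* Reset all nodes concurrently; join them all before releasing the
 * hive lock, so the lock covers the whole collective reset. */
static void hive_reset(void)
{
	pthread_t threads[NUM_NODES];
	int ids[NUM_NODES];
	int i;

	pthread_mutex_lock(&hive_lock);
	for (i = 0; i < NUM_NODES; i++) {
		ids[i] = i;
		pthread_create(&threads[i], NULL, node_reset, &ids[i]);
	}
	for (i = 0; i < NUM_NODES; i++)
		pthread_join(threads[i], NULL); /* all resets complete here */
	pthread_mutex_unlock(&hive_lock);
}
```

The join-before-unlock shape is what keeps the total reset latency bounded by the slowest node rather than the sum of all nodes, which matters for the PSP FW time limit discussed above.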
Re: [PATCH 11/11] drm/amdgpu: disable IH ring 1 & 2 WPTR overflow on Vega10
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > That should add back pressure on the client. > > Signed-off-by: Christian König Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > index f5c5ea628fdf..dd7f52f08fd7 100644 > --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > @@ -193,6 +193,8 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev) > > ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1); > ih_rb_cntl = vega10_ih_rb_cntl(ih, ih_rb_cntl); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, > + WPTR_OVERFLOW_ENABLE, 0); > WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl); > > /* set rptr, wptr to 0 */ > @@ -207,6 +209,8 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev) > > ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1); > ih_rb_cntl = vega10_ih_rb_cntl(ih, ih_rb_cntl); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, > + WPTR_OVERFLOW_ENABLE, 0); > WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl); > > /* set rptr, wptr to 0 */ > -- > 2.17.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 10/11] drm/amdgpu: add support for self irq on Vega10
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > This finally enables processing of ring 1 & 2. > > Signed-off-by: Christian König Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 68 -- > 1 file changed, 63 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > index 562fd036bb9e..f5c5ea628fdf 100644 > --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > @@ -258,7 +258,7 @@ static void vega10_ih_irq_disable(struct amdgpu_device > *adev) > static u32 vega10_ih_get_wptr(struct amdgpu_device *adev, > struct amdgpu_ih_ring *ih) > { > - u32 wptr, tmp; > + u32 wptr, reg, tmp; > > wptr = le32_to_cpu(*ih->wptr_cpu); > > @@ -274,9 +274,18 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev, > wptr, ih->rptr, tmp); > ih->rptr = tmp; > > - tmp = RREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, > mmIH_RB_CNTL)); > + if (ih == &adev->irq.ih) > + reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL); > + else if (ih == &adev->irq.ih1) > + reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL); > + else if (ih == &adev->irq.ih2) > + reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL); > + else > + BUG(); > + > + tmp = RREG32_NO_KIQ(reg); > tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); > - WREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp); > + WREG32_NO_KIQ(reg, tmp); > } > return (wptr & ih->ptr_mask); > } > @@ -338,9 +347,52 @@ static void vega10_ih_set_rptr(struct amdgpu_device > *adev, > /* XXX check if swapping is necessary on BE */ > *ih->rptr_cpu = ih->rptr; > WDOORBELL32(ih->doorbell_index, ih->rptr); > - } else { > + } else if (ih == &adev->irq.ih) { > WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr); > + } else if (ih == &adev->irq.ih1) { > + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, ih->rptr); > + } else if (ih == &adev->irq.ih2) { > + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, ih->rptr); > + } > +} > + > 
+/** > + * vega10_ih_self_irq - dispatch work for ring 1 and 2 > + * > + * @adev: amdgpu_device pointer > + * @source: irq source > + * @entry: IV with WPTR update > + * > + * Update the WPTR from the IV and schedule work to handle the entries. > + */ > +static int vega10_ih_self_irq(struct amdgpu_device *adev, > + struct amdgpu_irq_src *source, > + struct amdgpu_iv_entry *entry) > +{ > + uint32_t wptr = cpu_to_le32(entry->src_data[0]); > + > + switch (entry->ring_id) { > + case 1: > + *adev->irq.ih1.wptr_cpu = wptr; > + schedule_work(&adev->irq.ih1_work); > + break; > + case 2: > + *adev->irq.ih2.wptr_cpu = wptr; > + schedule_work(&adev->irq.ih2_work); > + break; > + default: break; > } > + return 0; > +} > + > +static const struct amdgpu_irq_src_funcs vega10_ih_self_irq_funcs = { > + .process = vega10_ih_self_irq, > +}; > + > +static void vega10_ih_set_self_irq_funcs(struct amdgpu_device *adev) > +{ > + adev->irq.self_irq.num_types = 0; > + adev->irq.self_irq.funcs = &vega10_ih_self_irq_funcs; > } > > static int vega10_ih_early_init(void *handle) > @@ -348,13 +400,19 @@ static int vega10_ih_early_init(void *handle) > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > vega10_ih_set_interrupt_funcs(adev); > + vega10_ih_set_self_irq_funcs(adev); > return 0; > } > > static int vega10_ih_sw_init(void *handle) > { > - int r; > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > + int r; > + > + r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_IH, 0, > + &adev->irq.self_irq); > + if (r) > + return r; > > r = amdgpu_ih_ring_init(adev, &adev->irq.ih, 256 * 1024, true); > if (r) > -- > 2.17.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 09/11] drm/amdgpu: add support for processing IH ring 1 & 2
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > Previously we only added the ring buffer memory, now add the handling as > well. > > Signed-off-by: Christian König Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 33 + > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 4 ++- > 2 files changed, 36 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > index 127ed7de841d..64ec92bd74fb 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > @@ -176,6 +176,36 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg) > return ret; > } > > +/** > + * amdgpu_irq_handle_ih1 - kick of processing for IH1 > + * > + * @work: work structure in struct amdgpu_irq > + * > + * Kick of processing IH ring 1. > + */ > +static void amdgpu_irq_handle_ih1(struct work_struct *work) > +{ > + struct amdgpu_device *adev = container_of(work, struct amdgpu_device, > + irq.ih1_work); > + > + amdgpu_ih_process(adev, &adev->irq.ih1, amdgpu_irq_callback); > +} > + > +/** > + * amdgpu_irq_handle_ih2 - kick of processing for IH2 > + * > + * @work: work structure in struct amdgpu_irq > + * > + * Kick of processing IH ring 2. 
> + */ > +static void amdgpu_irq_handle_ih2(struct work_struct *work) > +{ > + struct amdgpu_device *adev = container_of(work, struct amdgpu_device, > + irq.ih2_work); > + > + amdgpu_ih_process(adev, &adev->irq.ih2, amdgpu_irq_callback); > +} > + > /** > * amdgpu_msi_ok - check whether MSI functionality is enabled > * > @@ -240,6 +270,9 @@ int amdgpu_irq_init(struct amdgpu_device *adev) > amdgpu_hotplug_work_func); > } > > + INIT_WORK(&adev->irq.ih1_work, amdgpu_irq_handle_ih1); > + INIT_WORK(&adev->irq.ih2_work, amdgpu_irq_handle_ih2); > + > adev->irq.installed = true; > r = drm_irq_install(adev->ddev, adev->ddev->pdev->irq); > if (r) { > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h > index 7e06fa64321a..c27decfda494 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h > @@ -89,7 +89,9 @@ struct amdgpu_irq { > > /* interrupt rings */ > struct amdgpu_ih_ring ih, ih1, ih2; > - const struct amdgpu_ih_funcs*ih_funcs; > + const struct amdgpu_ih_funcs*ih_funcs; > + struct work_struct ih1_work, ih2_work; > + struct amdgpu_irq_src self_irq; > > /* gen irq stuff */ > struct irq_domain *domain; /* GPU irq controller domain > */ > -- > 2.17.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
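The work handlers added in this patch recover the amdgpu_device from the embedded work_struct with container_of(). A minimal userspace sketch of that pattern (the struct names here are hypothetical stand-ins for amdgpu_device and work_struct):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace rendition of the kernel's container_of(): recover the
 * enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work {
	int pending;
};

struct device {
	int id;
	struct work ih1_work;
	struct work ih2_work;
};

/* A handler registered for ih2_work gets only the work pointer but
 * can still find its owning device, as amdgpu_irq_handle_ih2() does. */
static int handler_device_id(struct work *w)
{
	struct device *dev = container_of(w, struct device, ih2_work);

	return dev->id;
}
```

Because the offset of the member is known at compile time, this costs a single pointer subtraction, which is why the kernel passes bare work_struct pointers to handlers instead of carrying a back-pointer in every work item.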
Re: [PATCH 05/11] drm/amdgpu: add IH ring to ih_get_wptr/ih_set_rptr v2
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > Let's start to support multiple rings. > > v2: decode IV is needed as well > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 6 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 13 +++--- > drivers/gpu/drm/amd/amdgpu/cik_ih.c | 29 +++-- > drivers/gpu/drm/amd/amdgpu/cz_ih.c | 31 +++--- > drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 29 +++-- > drivers/gpu/drm/amd/amdgpu/si_ih.c | 31 +++--- > drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 43 ++- > drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 56 + > 8 files changed, 128 insertions(+), 110 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c > index 8af67f649660..fb8dd6179926 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c > @@ -137,7 +137,7 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct > amdgpu_ih_ring *ih, > if (!ih->enabled || adev->shutdown) > return IRQ_NONE; > > - wptr = amdgpu_ih_get_wptr(adev); > + wptr = amdgpu_ih_get_wptr(adev, ih); > > restart_ih: > /* is somebody else already processing irqs? 
*/ > @@ -154,11 +154,11 @@ int amdgpu_ih_process(struct amdgpu_device *adev, > struct amdgpu_ih_ring *ih, > ih->rptr &= ih->ptr_mask; > } > > - amdgpu_ih_set_rptr(adev); > + amdgpu_ih_set_rptr(adev, ih); > atomic_set(&ih->lock, 0); > > /* make sure wptr hasn't changed while processing */ > - wptr = amdgpu_ih_get_wptr(adev); > + wptr = amdgpu_ih_get_wptr(adev, ih); > if (wptr != ih->rptr) > goto restart_ih; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > index f877bb78d10a..d810fd73d574 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > @@ -50,15 +50,16 @@ struct amdgpu_ih_ring { > /* provided by the ih block */ > struct amdgpu_ih_funcs { > /* ring read/write ptr handling, called from interrupt context */ > - u32 (*get_wptr)(struct amdgpu_device *adev); > - void (*decode_iv)(struct amdgpu_device *adev, > + u32 (*get_wptr)(struct amdgpu_device *adev, struct amdgpu_ih_ring > *ih); > + void (*decode_iv)(struct amdgpu_device *adev, struct amdgpu_ih_ring > *ih, > struct amdgpu_iv_entry *entry); > - void (*set_rptr)(struct amdgpu_device *adev); > + void (*set_rptr)(struct amdgpu_device *adev, struct amdgpu_ih_ring > *ih); > }; > > -#define amdgpu_ih_get_wptr(adev) (adev)->irq.ih_funcs->get_wptr((adev)) > -#define amdgpu_ih_decode_iv(adev, iv) > (adev)->irq.ih_funcs->decode_iv((adev), (iv)) > -#define amdgpu_ih_set_rptr(adev) (adev)->irq.ih_funcs->set_rptr((adev)) > +#define amdgpu_ih_get_wptr(adev, ih) (adev)->irq.ih_funcs->get_wptr((adev), > (ih)) > +#define amdgpu_ih_decode_iv(adev, iv) \ > + (adev)->irq.ih_funcs->decode_iv((adev), (ih), (iv)) > +#define amdgpu_ih_set_rptr(adev, ih) (adev)->irq.ih_funcs->set_rptr((adev), > (ih)) > Could simplify the interface and skip the adev pointer. Just hang the adev off the ih if we need it. 
Either way: Acked-by: Alex Deucher > int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring > *ih, > unsigned ring_size, bool use_bus_addr); > diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c > b/drivers/gpu/drm/amd/amdgpu/cik_ih.c > index 8a8b4967a101..884aa9b81e86 100644 > --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c > @@ -183,11 +183,12 @@ static void cik_ih_irq_disable(struct amdgpu_device > *adev) > * Used by cik_irq_process(). > * Returns the value of the wptr. > */ > -static u32 cik_ih_get_wptr(struct amdgpu_device *adev) > +static u32 cik_ih_get_wptr(struct amdgpu_device *adev, > + struct amdgpu_ih_ring *ih) > { > u32 wptr, tmp; > > - wptr = le32_to_cpu(adev->wb.wb[adev->irq.ih.wptr_offs]); > + wptr = le32_to_cpu(adev->wb.wb[ih->wptr_offs]); > > if (wptr & IH_RB_WPTR__RB_OVERFLOW_MASK) { > wptr &= ~IH_RB_WPTR__RB_OVERFLOW_MASK; > @@ -196,13 +197,13 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev) > * this should allow us to catchup. > */ > dev_warn(adev->dev, "IH ring buffer overflow (0x%08X, 0x%08X, > 0x%08X)\n", > - wptr, adev->irq.ih.rptr, (wptr + 16) & > adev->irq.ih.ptr_mask); > - adev->irq.ih.rptr = (wptr + 16) & adev->irq.ih.ptr_mask; > +wptr, ih->rptr, (wptr + 16) & ih->ptr_mask); > + ih->rptr = (wptr + 16) & ih->ptr_mask; > tmp = RREG32(mmIH_RB_CNTL); > tmp |= IH_RB_CNTL__WP
Re: [PATCH 04/11] drm/amdgpu: move IV prescreening into the GMC code
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > The GMC/VM subsystem is causing the faults, so move the handling here as > well. > > Signed-off-by: Christian König Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 2 - > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 4 -- > drivers/gpu/drm/amd/amdgpu/cik_ih.c | 13 > drivers/gpu/drm/amd/amdgpu/cz_ih.c | 13 > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 59 ++ > drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 13 > drivers/gpu/drm/amd/amdgpu/si_ih.c | 14 - > drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 13 > drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 82 - > 9 files changed, 59 insertions(+), 154 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > index 9ce8c93ec19b..f877bb78d10a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > @@ -51,14 +51,12 @@ struct amdgpu_ih_ring { > struct amdgpu_ih_funcs { > /* ring read/write ptr handling, called from interrupt context */ > u32 (*get_wptr)(struct amdgpu_device *adev); > - bool (*prescreen_iv)(struct amdgpu_device *adev); > void (*decode_iv)(struct amdgpu_device *adev, > struct amdgpu_iv_entry *entry); > void (*set_rptr)(struct amdgpu_device *adev); > }; > > #define amdgpu_ih_get_wptr(adev) (adev)->irq.ih_funcs->get_wptr((adev)) > -#define amdgpu_ih_prescreen_iv(adev) > (adev)->irq.ih_funcs->prescreen_iv((adev)) > #define amdgpu_ih_decode_iv(adev, iv) > (adev)->irq.ih_funcs->decode_iv((adev), (iv)) > #define amdgpu_ih_set_rptr(adev) (adev)->irq.ih_funcs->set_rptr((adev)) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > index 6db4c58ddc13..318874952bc2 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > @@ -145,10 +145,6 @@ static void amdgpu_irq_callback(struct amdgpu_device > *adev, > u32 ring_index = ih->rptr >> 2; > struct amdgpu_iv_entry entry; > > 
- /* Prescreening of high-frequency interrupts */ > - if (!amdgpu_ih_prescreen_iv(adev)) > - return; > - > entry.iv_entry = (const uint32_t *)&ih->ring[ring_index]; > amdgpu_ih_decode_iv(adev, &entry); > > diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c > b/drivers/gpu/drm/amd/amdgpu/cik_ih.c > index 3e6c8c4067cb..8a8b4967a101 100644 > --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c > @@ -228,18 +228,6 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev) > * [127:96] - reserved > */ > > -/** > - * cik_ih_prescreen_iv - prescreen an interrupt vector > - * > - * @adev: amdgpu_device pointer > - * > - * Returns true if the interrupt vector should be further processed. > - */ > -static bool cik_ih_prescreen_iv(struct amdgpu_device *adev) > -{ > - return true; > -} > - > /** > * cik_ih_decode_iv - decode an interrupt vector > * > @@ -445,7 +433,6 @@ static const struct amd_ip_funcs cik_ih_ip_funcs = { > > static const struct amdgpu_ih_funcs cik_ih_funcs = { > .get_wptr = cik_ih_get_wptr, > - .prescreen_iv = cik_ih_prescreen_iv, > .decode_iv = cik_ih_decode_iv, > .set_rptr = cik_ih_set_rptr > }; > diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c > b/drivers/gpu/drm/amd/amdgpu/cz_ih.c > index 447b3cbc47e5..9d3ea298e116 100644 > --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c > @@ -207,18 +207,6 @@ static u32 cz_ih_get_wptr(struct amdgpu_device *adev) > return (wptr & adev->irq.ih.ptr_mask); > } > > -/** > - * cz_ih_prescreen_iv - prescreen an interrupt vector > - * > - * @adev: amdgpu_device pointer > - * > - * Returns true if the interrupt vector should be further processed. 
> - */ > -static bool cz_ih_prescreen_iv(struct amdgpu_device *adev) > -{ > - return true; > -} > - > /** > * cz_ih_decode_iv - decode an interrupt vector > * > @@ -426,7 +414,6 @@ static const struct amd_ip_funcs cz_ih_ip_funcs = { > > static const struct amdgpu_ih_funcs cz_ih_funcs = { > .get_wptr = cz_ih_get_wptr, > - .prescreen_iv = cz_ih_prescreen_iv, > .decode_iv = cz_ih_decode_iv, > .set_rptr = cz_ih_set_rptr > }; > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > index e329a23e1f99..8995fca47efb 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > @@ -244,6 +244,62 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct > amdgpu_device *adev, > return 0; > } > > +/** > + * vega10_ih_prescreen_iv - prescreen an interrupt vector > + * > + * @adev: amdgpu_device pointer > + * > + * Returns true if the interrupt vector should be further processed. > + */ > +static bool
Re: [PATCH 07/11] drm/amdgpu: enable IH ring 1 and ring 2 v2
Am 30.11.18 um 17:01 schrieb Alex Deucher: On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: The entries are ignored for now, but it at least stops crashing the hardware when somebody tries to push something to the other IH rings. v2: limit ring size, add TODO comment Signed-off-by: Christian König We may want to guard this with some flag so we can easily disable it (like we did for the paging queue) in case we run into issues on any asics. This should actually fix some issues since when you currently enable sources which go to ring 1 & 2 the hardware just crashes. But I can add some checking if the rings are actually allocated. This way we can disable them by just commenting out two lines. Christian. Alex --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 4 +- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 129 +++- 2 files changed, 107 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h index f6ce171cb8aa..7e06fa64321a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h @@ -87,8 +87,8 @@ struct amdgpu_irq { /* status, etc. 
*/ boolmsi_enabled; /* msi enabled */ - /* interrupt ring */ - struct amdgpu_ih_ring ih; + /* interrupt rings */ + struct amdgpu_ih_ring ih, ih1, ih2; const struct amdgpu_ih_funcs*ih_funcs; /* gen irq stuff */ diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c index 3e9ebb0de94d..562fd036bb9e 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c @@ -50,6 +50,16 @@ static void vega10_ih_enable_interrupts(struct amdgpu_device *adev) ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1); WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL, ih_rb_cntl); adev->irq.ih.enabled = true; + + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 1); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl); + adev->irq.ih1.enabled = true; + + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 1); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl); + adev->irq.ih2.enabled = true; } /** @@ -71,6 +81,47 @@ static void vega10_ih_disable_interrupts(struct amdgpu_device *adev) WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR, 0); adev->irq.ih.enabled = false; adev->irq.ih.rptr = 0; + + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 0); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl); + /* set rptr, wptr to 0 */ + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, 0); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING1, 0); + adev->irq.ih1.enabled = false; + adev->irq.ih1.rptr = 0; + + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 0); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl); + /* set rptr, wptr to 0 */ + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, 0); + WREG32_SOC15(OSSSYS, 0, 
mmIH_RB_WPTR_RING2, 0); + adev->irq.ih2.enabled = false; + adev->irq.ih2.rptr = 0; +} + +static uint32_t vega10_ih_rb_cntl(struct amdgpu_ih_ring *ih, uint32_t ih_rb_cntl) +{ + int rb_bufsz = order_base_2(ih->ring_size / 4); + + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, + MC_SPACE, ih->use_bus_addr ? 1 : 4); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, + WPTR_OVERFLOW_CLEAR, 1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, + WPTR_OVERFLOW_ENABLE, 1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_SIZE, rb_bufsz); + /* Ring Buffer write pointer writeback. If enabled, IH_RB_WPTR register +* value is written to memory +*/ + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, + WPTR_WRITEBACK_ENABLE, 1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_SNOOP, 1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_RO, 0); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_VMID, 0); + + return ih_rb_cntl; } /** @@ -86,9 +137,8 @@ static void vega10_ih_disable_interrupts(struct amdgpu_device *adev) */ static int vega10_ih_irq_init(struct amdgpu_device *adev) { - struct amdgpu_ih_ring *ih = &adev->irq.ih; + struct amdgpu_ih_ring *ih; int ret = 0; - int rb_bufsz;
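The vega10_ih_rb_cntl() helper above programs the RB_SIZE field as order_base_2(ih->ring_size / 4), i.e. the log2 of the ring size in 4-byte units. A userspace approximation of that computation (the kernel's order_base_2() lives in log2.h; this loop version is just a sketch of its behavior for positive inputs):

```c
#include <assert.h>

/* Smallest n such that (1 << n) >= x, i.e. a round-up log2. */
static int order_base_2(unsigned int x)
{
	int n = 0;

	while ((1u << n) < x)
		n++;
	return n;
}

/* The IH ring's RB_SIZE register field: order of the ring size
 * measured in 4-byte dwords. */
static int ih_rb_size_field(unsigned int ring_size_bytes)
{
	return order_base_2(ring_size_bytes / 4);
}
```

For the 256 KiB main ring allocated in vega10_ih_sw_init(), this yields an order of 16 (65536 dwords).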
Re: [PATCH 03/11] drm/amdgpu: remove VM fault_credit handling
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > printk_ratelimit() is much better suited to limit the number of reported > VM faults. > > Signed-off-by: Christian König Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 37 - > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 5 > drivers/gpu/drm/amd/amdgpu/cik_ih.c | 18 +--- > drivers/gpu/drm/amd/amdgpu/cz_ih.c | 18 +--- > drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 18 +--- > drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 18 +--- > drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 7 ++--- > 7 files changed, 6 insertions(+), 115 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index 2acb9838913e..a2f149da83f8 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -3057,7 +3057,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct > amdgpu_vm *vm, > } > > INIT_KFIFO(vm->faults); > - vm->fault_credit = 16; > > return 0; > > @@ -3269,42 +3268,6 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct > amdgpu_vm *vm) > amdgpu_vmid_free_reserved(adev, vm, i); > } > > -/** > - * amdgpu_vm_pasid_fault_credit - Check fault credit for given PASID > - * > - * @adev: amdgpu_device pointer > - * @pasid: PASID do identify the VM > - * > - * This function is expected to be called in interrupt context. > - * > - * Returns: > - * True if there was fault credit, false otherwise > - */ > -bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev, > - unsigned int pasid) > -{ > - struct amdgpu_vm *vm; > - > - spin_lock(&adev->vm_manager.pasid_lock); > - vm = idr_find(&adev->vm_manager.pasid_idr, pasid); > - if (!vm) { > - /* VM not found, can't track fault credit */ > - spin_unlock(&adev->vm_manager.pasid_lock); > - return true; > - } > - > - /* No lock needed. 
only accessed by IRQ handler */ > - if (!vm->fault_credit) { > - /* Too many faults in this VM */ > - spin_unlock(&adev->vm_manager.pasid_lock); > - return false; > - } > - > - vm->fault_credit--; > - spin_unlock(&adev->vm_manager.pasid_lock); > - return true; > -} > - > /** > * amdgpu_vm_manager_init - init the VM manager > * > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > index 2a8898d19c8b..e8dcfd59fc93 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > @@ -229,9 +229,6 @@ struct amdgpu_vm { > /* Up to 128 pending retry page faults */ > DECLARE_KFIFO(faults, u64, 128); > > - /* Limit non-retry fault storms */ > - unsigned intfault_credit; > - > /* Points to the KFD process VM info */ > struct amdkfd_process_info *process_info; > > @@ -299,8 +296,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct > amdgpu_vm *vm, > int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm, > unsigned int pasid); > void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct amdgpu_vm > *vm); > void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm); > -bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev, > - unsigned int pasid); > void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm, > struct list_head *validated, > struct amdgpu_bo_list_entry *entry); > diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c > b/drivers/gpu/drm/amd/amdgpu/cik_ih.c > index b5775c6a857b..3e6c8c4067cb 100644 > --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c > @@ -237,23 +237,7 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev) > */ > static bool cik_ih_prescreen_iv(struct amdgpu_device *adev) > { > - u32 ring_index = adev->irq.ih.rptr >> 2; > - u16 pasid; > - > - switch (le32_to_cpu(adev->irq.ih.ring[ring_index]) & 0xff) { > - case 146: > - case 147: > - pasid = le32_to_cpu(adev->irq.ih.ring[ring_index + 2]) >> 
16; > - if (!pasid || amdgpu_vm_pasid_fault_credit(adev, pasid)) > - return true; > - break; > - default: > - /* Not a VM fault */ > - return true; > - } > - > - adev->irq.ih.rptr += 16; > - return false; > + return true; > } > > /** > diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c > b/drivers/gpu/drm/amd/amdgpu/cz_ih.c > index df5ac4d85a00..447b3cbc47e5 100644 > --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c > @@ -216,23 +216,7 @@ static u32 cz_ih_get_wptr(struct amdgpu_de
Re: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > This allows us to filter out VM faults in the GMC code. > > v2: don't filter out all faults > > Signed-off-by: Christian König Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 +++-- > 1 file changed, 17 insertions(+), 12 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > index 6b6524f04ce0..6db4c58ddc13 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > @@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct amdgpu_device > *adev, > if (!amdgpu_ih_prescreen_iv(adev)) > return; > > - /* Before dispatching irq to IP blocks, send it to amdkfd */ > - amdgpu_amdkfd_interrupt(adev, (const void *) &ih->ring[ring_index]); > - > entry.iv_entry = (const uint32_t *)&ih->ring[ring_index]; > amdgpu_ih_decode_iv(adev, &entry); > > @@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, > unsigned client_id = entry->client_id; > unsigned src_id = entry->src_id; > struct amdgpu_irq_src *src; > + bool handled = false; > int r; > > trace_amdgpu_iv(entry); > > if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) { > - DRM_DEBUG("Invalid client_id in IV: %d\n", client_id); > + DRM_ERROR("Invalid client_id in IV: %d\n", client_id); > return; > } > > if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) { > - DRM_DEBUG("Invalid src_id in IV: %d\n", src_id); > + DRM_ERROR("Invalid src_id in IV: %d\n", src_id); > return; > } > > if (adev->irq.virq[src_id]) { > generic_handle_irq(irq_find_mapping(adev->irq.domain, > src_id)); > - } else { > - if (!adev->irq.client[client_id].sources) { > - DRM_DEBUG("Unregistered interrupt client_id: %d > src_id: %d\n", > - client_id, src_id); > - return; > - } > + return; > + } > > + if (!adev->irq.client[client_id].sources) { > + DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n", > + client_id, src_id); > + return; > + } else { > src = 
adev->irq.client[client_id].sources[src_id]; > if (!src) { > DRM_DEBUG("Unhandled interrupt src_id: %d\n", src_id); > @@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, > } > > r = src->funcs->process(adev, src, entry); > - if (r) > + if (r < 0) > DRM_ERROR("error processing interrupt (%d)\n", r); > + else if (r) > + handled = true; > } > + > + /* Send it to amdkfd as well if it isn't already handled */ > + if (!handled) > + amdgpu_amdkfd_interrupt(adev, entry->iv_entry); > } > > /** > -- > 2.17.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 08/11] drm/amdgpu: add the IH to the IV trace
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > To distinct on which IH ring an IV was found. > > Signed-off-by: Christian König Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 4 ++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 11 +++ > 2 files changed, 9 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > index 318874952bc2..127ed7de841d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > @@ -148,6 +148,8 @@ static void amdgpu_irq_callback(struct amdgpu_device > *adev, > entry.iv_entry = (const uint32_t *)&ih->ring[ring_index]; > amdgpu_ih_decode_iv(adev, &entry); > > + trace_amdgpu_iv(ih - &adev->irq.ih, &entry); > + > amdgpu_irq_dispatch(adev, &entry); > } > > @@ -367,8 +369,6 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, > bool handled = false; > int r; > > - trace_amdgpu_iv(entry); > - > if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) { > DRM_ERROR("Invalid client_id in IV: %d\n", client_id); > return; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > index 626abca770a0..6e388d7753c1 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > @@ -76,9 +76,10 @@ TRACE_EVENT(amdgpu_mm_wreg, > ); > > TRACE_EVENT(amdgpu_iv, > - TP_PROTO(struct amdgpu_iv_entry *iv), > - TP_ARGS(iv), > + TP_PROTO(unsigned ih, struct amdgpu_iv_entry *iv), > + TP_ARGS(ih, iv), > TP_STRUCT__entry( > +__field(unsigned, ih) > __field(unsigned, client_id) > __field(unsigned, src_id) > __field(unsigned, ring_id) > @@ -90,6 +91,7 @@ TRACE_EVENT(amdgpu_iv, > __array(unsigned, src_data, 4) > ), > TP_fast_assign( > + __entry->ih = ih; >__entry->client_id = iv->client_id; >__entry->src_id = iv->src_id; >__entry->ring_id = iv->ring_id; > @@ -103,8 +105,9 @@ TRACE_EVENT(amdgpu_iv, >__entry->src_data[2] = 
iv->src_data[2]; >__entry->src_data[3] = iv->src_data[3]; >), > - TP_printk("client_id:%u src_id:%u ring:%u vmid:%u timestamp: %llu > pasid:%u src_data: %08x %08x %08x %08x", > - __entry->client_id, __entry->src_id, > + TP_printk("ih: %u client_id:%u src_id:%u ring:%u vmid:%u " > + "timestamp: %llu pasid:%u src_data: %08x %08x %08x > %08x", > + __entry->ih, __entry->client_id, __entry->src_id, > __entry->ring_id, __entry->vmid, > __entry->timestamp, __entry->pasid, > __entry->src_data[0], __entry->src_data[1], > -- > 2.17.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 07/11] drm/amdgpu: enable IH ring 1 and ring 2 v2
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > The entries are ignored for now, but it at least stops crashing the > hardware when somebody tries to push something to the other IH rings. > > v2: limit ring size, add TODO comment > > Signed-off-by: Christian König We may want to guard this with some flag so we can easily disable it (like we did for the paging queue) in case we run into issues on any asics. Alex > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 4 +- > drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 129 +++- > 2 files changed, 107 insertions(+), 26 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h > index f6ce171cb8aa..7e06fa64321a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h > @@ -87,8 +87,8 @@ struct amdgpu_irq { > /* status, etc. */ > boolmsi_enabled; /* msi enabled */ > > - /* interrupt ring */ > - struct amdgpu_ih_ring ih; > + /* interrupt rings */ > + struct amdgpu_ih_ring ih, ih1, ih2; > const struct amdgpu_ih_funcs*ih_funcs; > > /* gen irq stuff */ > diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > index 3e9ebb0de94d..562fd036bb9e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > @@ -50,6 +50,16 @@ static void vega10_ih_enable_interrupts(struct > amdgpu_device *adev) > ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1); > WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL, ih_rb_cntl); > adev->irq.ih.enabled = true; > + > + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, > 1); > + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl); > + adev->irq.ih1.enabled = true; > + > + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, > 1); > + WREG32_SOC15(OSSSYS, 
0, mmIH_RB_CNTL_RING2, ih_rb_cntl); > + adev->irq.ih2.enabled = true; > } > > /** > @@ -71,6 +81,47 @@ static void vega10_ih_disable_interrupts(struct > amdgpu_device *adev) > WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR, 0); > adev->irq.ih.enabled = false; > adev->irq.ih.rptr = 0; > + > + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, > 0); > + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl); > + /* set rptr, wptr to 0 */ > + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, 0); > + WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING1, 0); > + adev->irq.ih1.enabled = false; > + adev->irq.ih1.rptr = 0; > + > + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, > 0); > + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl); > + /* set rptr, wptr to 0 */ > + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, 0); > + WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING2, 0); > + adev->irq.ih2.enabled = false; > + adev->irq.ih2.rptr = 0; > +} > + > +static uint32_t vega10_ih_rb_cntl(struct amdgpu_ih_ring *ih, uint32_t > ih_rb_cntl) > +{ > + int rb_bufsz = order_base_2(ih->ring_size / 4); > + > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, > + MC_SPACE, ih->use_bus_addr ? 1 : 4); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, > + WPTR_OVERFLOW_CLEAR, 1); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, > + WPTR_OVERFLOW_ENABLE, 1); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_SIZE, rb_bufsz); > + /* Ring Buffer write pointer writeback. 
If enabled, IH_RB_WPTR > register > +* value is written to memory > +*/ > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, > + WPTR_WRITEBACK_ENABLE, 1); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_SNOOP, 1); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_RO, 0); > + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_VMID, 0); > + > + return ih_rb_cntl; > } > > /** > @@ -86,9 +137,8 @@ static void vega10_ih_disable_interrupts(struct > amdgpu_device *adev) > */ > static int vega10_ih_irq_init(struct amdgpu_device *adev) > { > - struct amdgpu_ih_ring *ih = &adev->irq.ih; > + struct amdgpu_ih_ring *ih; > int ret = 0; > - int rb_bufsz; > u32 ih_rb_cntl, ih_doorbell_rtpr; > u32 tmp; > > @@ -97,26 +147,15 @@ static int vega10_ih_irq_init(s
Re: [PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.
On 30.11.18 at 16:14, Grodzovsky, Andrey wrote: > > On 11/30/2018 04:03 AM, Christian König wrote: >> On 29.11.18 at 21:36, Andrey Grodzovsky wrote: >>> XGMI hive has some resources allocated on device init which >>> need to be deallocated when the device is unregistered. >>> >>> Add a per-hive wq to allow all the nodes in the hive to run resets >>> concurrently - this should speed up the total reset time to avoid >>> breaching the PSP FW timeout. >> Do you really want a workqueue? That sounds like the exact opposite of >> what you describe here. >> >> E.g. a workqueue is there to serialize all work items scheduled to it. >> >> What you most likely want instead is one work item per device; these are >> scheduled, run in parallel and are joined back together before >> dropping the per-hive lock. >> >> Christian. > As you can see in the second patch, that is exactly how I use this wq. I define > a per-adev work item, all of which I enqueue during the GPU reset routine. > I believe the wq I created will process work items in parallel since > it is served by per-CPU worker threads (a thread pool), while a > serializing wq is > created using the 'create_singlethread_workqueue' interface, which > creates an ordered wq that processes at most one item simultaneously. That is correct, but you are still limited to one work item processed per CPU, which is not necessarily what you want. > I did have my doubts about creating a dedicated wq instead of reusing > an existing system-wide wq such as system_unbound_wq, because this adds new > kernel threads, but eventually I opted for my own queue because there is > a hard requirement from the PSP FW to complete all ASIC resets in > a defined finite amount > of time (1s, I believe), otherwise some nodes in the XGMI hive might > switch to single-GPU mode due to a timeout. > If I rely on a system-wide wq, then during stress times this queue might > fail to process my work items within that time limit because it is > overloaded by work items from other queue clients.
> Maybe I should squash the 2 changes into one to make all of this > clearer? No, that won't help. You would rather need to document that properly. But creating a wq just for a rarely used device reset is completely overkill. Stuff like that should go to the system_highpri_wq work queue, which should already make sure that it never hits the PSP timeout. Christian. > > Andrey >>> Signed-off-by: Andrey Grodzovsky >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 >>> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 3 +++ >>> 2 files changed, 27 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >>> index fb37e69..9ac2dc5 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >>> @@ -61,6 +61,8 @@ struct amdgpu_hive_info >>> *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) >>> INIT_LIST_HEAD(&tmp->device_list); >>> mutex_init(&tmp->hive_lock); >>> + tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND | >>> WQ_HIGHPRI, 0); >>> + >>> return tmp; >>> } >>> @@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct >>> amdgpu_device *adev) >>> mutex_unlock(&xgmi_mutex); >>> return ret; >>> } >>> + >>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev) >>> +{ >>> + struct amdgpu_hive_info *hive; >>> + >>> + if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU)) >>> + return; >>> + >>> + mutex_lock(&xgmi_mutex); >>> + >>> + hive = amdgpu_get_xgmi_hive(adev); >>> + if (!hive) >>> + goto exit; >>> + >>> + if (!(hive->number_devices--)) { >>> + mutex_destroy(&hive->hive_lock); >>> + destroy_workqueue(hive->reset_queue); >>> + } >>> + >>> +exit: >>> + mutex_unlock(&xgmi_mutex); >>> +} >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >>> index 6335bfd..285ab93 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >>>
@@ -30,10 +30,13 @@ struct amdgpu_hive_info { >>> struct psp_xgmi_topology_info topology_info; >>> int number_devices; >>> struct mutex hive_lock; >>> + /* hive members reset wq */ >>> + struct workqueue_struct *reset_queue; >>> }; >>> struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct >>> amdgpu_device *adev); >>> int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, >>> struct amdgpu_device *adev); >>> int amdgpu_xgmi_add_device(struct amdgpu_device *adev); >>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev); >>> #endif ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 06/11] drm/amdgpu: simplify IH programming
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > Calculate all the addresses and pointers in amdgpu_ih.c > > Signed-off-by: Christian König Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 34 +++ > drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 23 +--- > drivers/gpu/drm/amd/amdgpu/cik_ih.c | 9 +++ > drivers/gpu/drm/amd/amdgpu/cz_ih.c | 11 > drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 9 +++ > drivers/gpu/drm/amd/amdgpu/si_ih.c | 9 +++ > drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 27 +-- > drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 36 + > 8 files changed, 73 insertions(+), 85 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c > index fb8dd6179926..d0a5db777b6d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c > @@ -52,6 +52,8 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct > amdgpu_ih_ring *ih, > ih->use_bus_addr = use_bus_addr; > > if (use_bus_addr) { > + dma_addr_t dma_addr; > + > if (ih->ring) > return 0; > > @@ -59,21 +61,26 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, > struct amdgpu_ih_ring *ih, > * add them to the end of the ring allocation. 
> */ > ih->ring = dma_alloc_coherent(adev->dev, ih->ring_size + 8, > - &ih->rb_dma_addr, GFP_KERNEL); > + &dma_addr, GFP_KERNEL); > if (ih->ring == NULL) > return -ENOMEM; > > memset((void *)ih->ring, 0, ih->ring_size + 8); > - ih->wptr_offs = (ih->ring_size / 4) + 0; > - ih->rptr_offs = (ih->ring_size / 4) + 1; > + ih->gpu_addr = dma_addr; > + ih->wptr_addr = dma_addr + ih->ring_size; > + ih->wptr_cpu = &ih->ring[ih->ring_size / 4]; > + ih->rptr_addr = dma_addr + ih->ring_size + 4; > + ih->rptr_cpu = &ih->ring[(ih->ring_size / 4) + 1]; > } else { > - r = amdgpu_device_wb_get(adev, &ih->wptr_offs); > + unsigned wptr_offs, rptr_offs; > + > + r = amdgpu_device_wb_get(adev, &wptr_offs); > if (r) > return r; > > - r = amdgpu_device_wb_get(adev, &ih->rptr_offs); > + r = amdgpu_device_wb_get(adev, &rptr_offs); > if (r) { > - amdgpu_device_wb_free(adev, ih->wptr_offs); > + amdgpu_device_wb_free(adev, wptr_offs); > return r; > } > > @@ -82,10 +89,15 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, > struct amdgpu_ih_ring *ih, > &ih->ring_obj, &ih->gpu_addr, > (void **)&ih->ring); > if (r) { > - amdgpu_device_wb_free(adev, ih->rptr_offs); > - amdgpu_device_wb_free(adev, ih->wptr_offs); > + amdgpu_device_wb_free(adev, rptr_offs); > + amdgpu_device_wb_free(adev, wptr_offs); > return r; > } > + > + ih->wptr_addr = adev->wb.gpu_addr + wptr_offs * 4; > + ih->wptr_cpu = &adev->wb.wb[wptr_offs]; > + ih->rptr_addr = adev->wb.gpu_addr + rptr_offs * 4; > + ih->rptr_cpu = &adev->wb.wb[rptr_offs]; > } > return 0; > } > @@ -109,13 +121,13 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, > struct amdgpu_ih_ring *ih) > * add them to the end of the ring allocation. 
> */ > dma_free_coherent(adev->dev, ih->ring_size + 8, > - (void *)ih->ring, ih->rb_dma_addr); > + (void *)ih->ring, ih->gpu_addr); > ih->ring = NULL; > } else { > amdgpu_bo_free_kernel(&ih->ring_obj, &ih->gpu_addr, > (void **)&ih->ring); > - amdgpu_device_wb_free(adev, ih->wptr_offs); > - amdgpu_device_wb_free(adev, ih->rptr_offs); > + amdgpu_device_wb_free(adev, (ih->wptr_addr - ih->gpu_addr) / > 4); > + amdgpu_device_wb_free(adev, (ih->rptr_addr - ih->gpu_addr) / > 4); > } > } > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > index d810fd73d574..1ccb1831382a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h > @@ -31,20 +31,25 @@ struct amdgpu_iv_entry; > * R6xx+ IH ring > */ > struct amdgpu_ih_ring
Re: [PATCH 01/11] drm/amdgpu: add missing error handling
On Fri, Nov 30, 2018 at 7:36 AM Christian König wrote: > > We ignored the return code here. > > Signed-off-by: Christian König Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > index 3a4e5d8d5162..e329a23e1f99 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > @@ -902,6 +902,9 @@ static int gmc_v9_0_sw_init(void *handle) > /* This interrupt is VMC page fault.*/ > r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VMC, > VMC_1_0__SRCID__VM_FAULT, > &adev->gmc.vm_fault); > + if (r) > + return r; > + > r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_UTCL2, > UTCL2_1_0__SRCID__FAULT, > &adev->gmc.vm_fault); > > -- > 2.17.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Fix num_doorbell calculation issue
Reviewed-by: Alex Deucher From: amd-gfx on behalf of Oak Zeng Sent: Friday, November 30, 2018 10:39:21 AM To: amd-gfx@lists.freedesktop.org Cc: Yang, Philip; Zeng, Oak; Yin, Tianci (Rico) Subject: [PATCH] drm/amdgpu: Fix num_doorbell calculation issue When the paging queue is enabled, it uses the second page of the doorbell BAR. The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the kernel doorbells are in the first page. So with the paging queue enabled, the total kernel doorbell range should be the original num_doorbells plus one page (0x400 in dwords), not *2. Change-Id: I62023bb91da33ca5532b22b263771b587b796d59 Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 8eaa40e..c75badf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -539,10 +539,13 @@ static int amdgpu_device_doorbell_init(struct amdgpu_device *adev) return -EINVAL; /* For Vega, reserve and map two pages on doorbell BAR since SDMA -* paging queue doorbell use the second page +* paging queue doorbell use the second page. The +* AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the +* doorbells are in the first page. So with paging queue enabled, +* the max num_doorbells should + 1 page (0x400 in dword) */ if (adev->asic_type >= CHIP_VEGA10) - adev->doorbell.num_doorbells *= 2; + adev->doorbell.num_doorbells += 0x400; adev->doorbell.ptr = ioremap(adev->doorbell.base, adev->doorbell.num_doorbells * -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: Fix num_doorbell calculation issue
When the paging queue is enabled, it uses the second page of the doorbell BAR. The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the kernel doorbells are in the first page. So with the paging queue enabled, the total kernel doorbell range should be the original num_doorbells plus one page (0x400 in dwords), not *2. Change-Id: I62023bb91da33ca5532b22b263771b587b796d59 Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 8eaa40e..c75badf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -539,10 +539,13 @@ static int amdgpu_device_doorbell_init(struct amdgpu_device *adev) return -EINVAL; /* For Vega, reserve and map two pages on doorbell BAR since SDMA -* paging queue doorbell use the second page +* paging queue doorbell use the second page. The +* AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the +* doorbells are in the first page. So with paging queue enabled, +* the max num_doorbells should + 1 page (0x400 in dword) */ if (adev->asic_type >= CHIP_VEGA10) - adev->doorbell.num_doorbells *= 2; + adev->doorbell.num_doorbells += 0x400; adev->doorbell.ptr = ioremap(adev->doorbell.base, adev->doorbell.num_doorbells * -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amd/display: Fix NULL ptr deref for commit_planes_to_stream
On 2018-11-30 10:09 a.m., Nicholas Kazlauskas wrote: > [Why] > With scaling, underscan and abm changes we can end up calling > commit_planes_to_stream in commit_tail. This call uses dm_state->context > which can be NULL if the commit was a fast update. > > [How] > Use dc_state instead since that can't be NULL unless the system ran > out of memory. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108912 > Fixes: e64abff2f133 ("drm/amd/display: Use private obj helpers for > dm_atomic_state") > > Cc: Leo Li > Cc: Harry Wentland > Signed-off-by: Nicholas Kazlauskas Reviewed-by: Harry Wentland Harry > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 76b1aebdca0c..8ecd78657d43 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -5011,7 +5011,7 @@ static void amdgpu_dm_atomic_commit_tail(struct > drm_atomic_state *state) > status->plane_count, > dm_new_crtc_state, > to_dm_crtc_state(old_crtc_state), > - dm_state->context)) > + dc_state)) > dm_error("%s: Failed to update stream scaling!\n", > __func__); > } > > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amd/display: Fix NULL ptr deref for commit_planes_to_stream
On 2018-11-30 10:13 a.m., Deucher, Alexander wrote: > Acked-by: Alex Deucher Reviewed-by: Leo Li > > > *From:* amd-gfx on behalf of > Nicholas Kazlauskas > *Sent:* Friday, November 30, 2018 10:09:28 AM > *To:* amd-gfx@lists.freedesktop.org > *Cc:* Li, Sun peng (Leo); Wentland, Harry; Kazlauskas, Nicholas > *Subject:* [PATCH] drm/amd/display: Fix NULL ptr deref for > commit_planes_to_stream > [Why] > With scaling, underscan and abm changes we can end up calling > commit_planes_to_stream in commit_tail. This call uses dm_state->context > which can be NULL if the commit was a fast update. > > [How] > Use dc_state instead since that can't be NULL unless the system ran > out of memory. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108912 > Fixes: e64abff2f133 ("drm/amd/display: Use private obj helpers for > dm_atomic_state") > > Cc: Leo Li > Cc: Harry Wentland > Signed-off-by: Nicholas Kazlauskas > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 76b1aebdca0c..8ecd78657d43 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -5011,7 +5011,7 @@ static void amdgpu_dm_atomic_commit_tail(struct > drm_atomic_state *state) > status->plane_count, > dm_new_crtc_state, > to_dm_crtc_state(old_crtc_state), > - dm_state->context)) > + dc_state)) > dm_error("%s: Failed to update stream > scaling!\n", __func__); > } > > -- > 2.17.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.
On 11/30/2018 04:03 AM, Christian König wrote: > On 29.11.18 at 21:36, Andrey Grodzovsky wrote: >> XGMI hive has some resources allocated on device init which >> need to be deallocated when the device is unregistered. >> >> Add a per-hive wq to allow all the nodes in the hive to run resets >> concurrently - this should speed up the total reset time to avoid >> breaching the PSP FW timeout. > > Do you really want a workqueue? That sounds like the exact opposite of > what you describe here. > > E.g. a workqueue is there to serialize all work items scheduled to it. > > What you most likely want instead is one work item per device; these are > scheduled, run in parallel and are joined back together before > dropping the per-hive lock. > > Christian. As you can see in the second patch, that is exactly how I use this wq. I define a per-adev work item, all of which I enqueue during the GPU reset routine. I believe the wq I created will process work items in parallel since it is served by per-CPU worker threads (a thread pool), while a serializing wq is created using the 'create_singlethread_workqueue' interface, which creates an ordered wq that processes at most one item simultaneously. I did have my doubts about creating a dedicated wq instead of reusing an existing system-wide wq such as system_unbound_wq, because this adds new kernel threads, but eventually I opted for my own queue because there is a hard requirement from the PSP FW to complete all ASIC resets in a defined finite amount of time (1s, I believe), otherwise some nodes in the XGMI hive might switch to single-GPU mode due to a timeout. If I rely on a system-wide wq, then during stress times this queue might fail to process my work items within that time limit because it is overloaded by work items from other queue clients. Maybe I should squash the 2 changes into one to make all of this clearer?
Andrey >> >> Signed-off-by: Andrey Grodzovsky >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 >> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 3 +++ >> 2 files changed, 27 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >> index fb37e69..9ac2dc5 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c >> @@ -61,6 +61,8 @@ struct amdgpu_hive_info >> *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) >> INIT_LIST_HEAD(&tmp->device_list); >> mutex_init(&tmp->hive_lock); >> + tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND | >> WQ_HIGHPRI, 0); >> + >> return tmp; >> } >> @@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct >> amdgpu_device *adev) >> mutex_unlock(&xgmi_mutex); >> return ret; >> } >> + >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev) >> +{ >> + struct amdgpu_hive_info *hive; >> + >> + if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU)) >> + return; >> + >> + mutex_lock(&xgmi_mutex); >> + >> + hive = amdgpu_get_xgmi_hive(adev); >> + if (!hive) >> + goto exit; >> + >> + if (!(hive->number_devices--)) { >> + mutex_destroy(&hive->hive_lock); >> + destroy_workqueue(hive->reset_queue); >> + } >> + >> +exit: >> + mutex_unlock(&xgmi_mutex); >> +} >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >> index 6335bfd..285ab93 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h >> @@ -30,10 +30,13 @@ struct amdgpu_hive_info { >> struct psp_xgmi_topology_info topology_info; >> int number_devices; >> struct mutex hive_lock; >> + /* hive members reset wq */ >> + struct workqueue_struct *reset_queue; >> }; >> struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct >> amdgpu_device *adev); >> int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, >> struct amdgpu_device *adev); >> int 
amdgpu_xgmi_add_device(struct amdgpu_device *adev); >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev); >> #endif > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amd/display: Fix NULL ptr deref for commit_planes_to_stream
Acked-by: Alex Deucher From: amd-gfx on behalf of Nicholas Kazlauskas Sent: Friday, November 30, 2018 10:09:28 AM To: amd-gfx@lists.freedesktop.org Cc: Li, Sun peng (Leo); Wentland, Harry; Kazlauskas, Nicholas Subject: [PATCH] drm/amd/display: Fix NULL ptr deref for commit_planes_to_stream [Why] With scaling, underscan and abm changes we can end up calling commit_planes_to_stream in commit_tail. This call uses dm_state->context which can be NULL if the commit was a fast update. [How] Use dc_state instead since that can't be NULL unless the system ran out of memory. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108912 Fixes: e64abff2f133 ("drm/amd/display: Use private obj helpers for dm_atomic_state") Cc: Leo Li Cc: Harry Wentland Signed-off-by: Nicholas Kazlauskas --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 76b1aebdca0c..8ecd78657d43 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -5011,7 +5011,7 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state) status->plane_count, dm_new_crtc_state, to_dm_crtc_state(old_crtc_state), - dm_state->context)) + dc_state)) dm_error("%s: Failed to update stream scaling!\n", __func__); } -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: remove amdgpu_bo_backup_to_shadow
Reviewed-by: Alex Deucher From: amd-gfx on behalf of Christian König Sent: Friday, November 30, 2018 7:45:17 AM To: amd-gfx@lists.freedesktop.org Subject: [PATCH] drm/amdgpu: remove amdgpu_bo_backup_to_shadow It is unused. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 47 -- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 5 --- 2 files changed, 52 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index cf768acb51dc..cc50cb65c212 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -607,53 +607,6 @@ int amdgpu_bo_create(struct amdgpu_device *adev, return r; } -/** - * amdgpu_bo_backup_to_shadow - Backs up an &amdgpu_bo buffer object - * @adev: amdgpu device object - * @ring: amdgpu_ring for the engine handling the buffer operations - * @bo: &amdgpu_bo buffer to be backed up - * @resv: reservation object with embedded fence - * @fence: dma_fence associated with the operation - * @direct: whether to submit the job directly - * - * Copies an &amdgpu_bo buffer object to its shadow object. - * Not used for now. - * - * Returns: - * 0 for success or a negative error code on failure. 
- */ -int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev, - struct amdgpu_ring *ring, - struct amdgpu_bo *bo, - struct reservation_object *resv, - struct dma_fence **fence, - bool direct) - -{ - struct amdgpu_bo *shadow = bo->shadow; - uint64_t bo_addr, shadow_addr; - int r; - - if (!shadow) - return -EINVAL; - - bo_addr = amdgpu_bo_gpu_offset(bo); - shadow_addr = amdgpu_bo_gpu_offset(bo->shadow); - - r = reservation_object_reserve_shared(bo->tbo.resv, 1); - if (r) - goto err; - - r = amdgpu_copy_buffer(ring, bo_addr, shadow_addr, - amdgpu_bo_size(bo), resv, fence, - direct, false); - if (!r) - amdgpu_bo_fence(bo, *fence, true); - -err: - return r; -} - /** * amdgpu_bo_validate - validate an &amdgpu_bo buffer object * @bo: pointer to the buffer object diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 7d3312d0da11..9291c2f837e9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -267,11 +267,6 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo); void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence, bool shared); u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo); -int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev, - struct amdgpu_ring *ring, - struct amdgpu_bo *bo, - struct reservation_object *resv, - struct dma_fence **fence, bool direct); int amdgpu_bo_validate(struct amdgpu_bo *bo); int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence); -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 04/16 v2] drm/amd/display: Add tracing to dc
[Why] Tracing is a useful and cheap debug functionality [How] This creates a new trace system amdgpu_dm, currently with three trace events amdgpu_dc_rreg and amdgpu_dc_wreg report the address and value of any dc register reads and writes amdgpu_dc_performance requires at least one of those two to be enabled. It counts the register reads and writes since the last entry v2: Don't check for NULL before kfree Signed-off-by: David Francis Reviewed-by: Harry Wentland Acked-by: Leo Li --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 + .../amd/display/amdgpu_dm/amdgpu_dm_trace.h | 104 ++ drivers/gpu/drm/amd/display/dc/core/dc.c | 19 drivers/gpu/drm/amd/display/dc/dc_types.h | 8 ++ .../amd/display/dc/dcn10/dcn10_cm_common.c| 4 +- drivers/gpu/drm/amd/display/dc/dm_services.h | 12 +- 6 files changed, 146 insertions(+), 4 deletions(-) create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_trace.h diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 76b1aebdca0c..376927c8bcc6 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -23,6 +23,9 @@ * */ +/* The caprices of the preprocessor require that this be declared right here */ +#define CREATE_TRACE_POINTS + #include "dm_services_types.h" #include "dc.h" #include "dc/inc/core_types.h" diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_trace.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_trace.h new file mode 100644 index ..d898981684d5 --- /dev/null +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_trace.h @@ -0,0 +1,104 @@ +/* + * Copyright 2018 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ * + * Authors: AMD + * + */ + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM amdgpu_dm + +#if !defined(_AMDGPU_DM_TRACE_H) || defined(TRACE_HEADER_MULTI_READ) +#define _AMDGPU_DM_TRACE_H_ + +#include + +TRACE_EVENT(amdgpu_dc_rreg, + TP_PROTO(unsigned long *read_count, uint32_t reg, uint32_t value), + TP_ARGS(read_count, reg, value), + TP_STRUCT__entry( + __field(uint32_t, reg) + __field(uint32_t, value) + ), + TP_fast_assign( + __entry->reg = reg; + __entry->value = value; + *read_count = *read_count + 1; + ), + TP_printk("reg=0x%08lx, value=0x%08lx", + (unsigned long)__entry->reg, + (unsigned long)__entry->value) +); + +TRACE_EVENT(amdgpu_dc_wreg, + TP_PROTO(unsigned long *write_count, uint32_t reg, uint32_t value), + TP_ARGS(write_count, reg, value), + TP_STRUCT__entry( + __field(uint32_t, reg) + __field(uint32_t, value) + ), + TP_fast_assign( + __entry->reg = reg; + __entry->value = value; + *write_count = *write_count + 1; + ), + TP_printk("reg=0x%08lx, value=0x%08lx", + (unsigned long)__entry->reg, + (unsigned long)__entry->value) +); + + +TRACE_EVENT(amdgpu_dc_performance, + TP_PROTO(unsigned long read_count, unsigned long write_count, + unsigned long *last_read, unsigned long *last_write, + const char *func, unsigned int line), + TP_ARGS(read_count, write_count, last_read, last_write, func, line), + TP_STRUCT__entry( + __field(uint32_t, reads) + __field(uint32_t, writes) + __field(uint32_t, read_delta) + __field(uint32_t, write_delta) + __string(func, func) + __field(uint3
[PATCH 05/11] drm/amdgpu: add IH ring to ih_get_wptr/ih_set_rptr v2
Let's start to support multiple rings. v2: decode IV is needed as well Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 6 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 13 +++--- drivers/gpu/drm/amd/amdgpu/cik_ih.c | 29 +++-- drivers/gpu/drm/amd/amdgpu/cz_ih.c | 31 +++--- drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 29 +++-- drivers/gpu/drm/amd/amdgpu/si_ih.c | 31 +++--- drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 43 ++- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 56 + 8 files changed, 128 insertions(+), 110 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c index 8af67f649660..fb8dd6179926 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c @@ -137,7 +137,7 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, if (!ih->enabled || adev->shutdown) return IRQ_NONE; - wptr = amdgpu_ih_get_wptr(adev); + wptr = amdgpu_ih_get_wptr(adev, ih); restart_ih: /* is somebody else already processing irqs? 
*/ @@ -154,11 +154,11 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, ih->rptr &= ih->ptr_mask; } - amdgpu_ih_set_rptr(adev); + amdgpu_ih_set_rptr(adev, ih); atomic_set(&ih->lock, 0); /* make sure wptr hasn't changed while processing */ - wptr = amdgpu_ih_get_wptr(adev); + wptr = amdgpu_ih_get_wptr(adev, ih); if (wptr != ih->rptr) goto restart_ih; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h index f877bb78d10a..d810fd73d574 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h @@ -50,15 +50,16 @@ struct amdgpu_ih_ring { /* provided by the ih block */ struct amdgpu_ih_funcs { /* ring read/write ptr handling, called from interrupt context */ - u32 (*get_wptr)(struct amdgpu_device *adev); - void (*decode_iv)(struct amdgpu_device *adev, + u32 (*get_wptr)(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih); + void (*decode_iv)(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, struct amdgpu_iv_entry *entry); - void (*set_rptr)(struct amdgpu_device *adev); + void (*set_rptr)(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih); }; -#define amdgpu_ih_get_wptr(adev) (adev)->irq.ih_funcs->get_wptr((adev)) -#define amdgpu_ih_decode_iv(adev, iv) (adev)->irq.ih_funcs->decode_iv((adev), (iv)) -#define amdgpu_ih_set_rptr(adev) (adev)->irq.ih_funcs->set_rptr((adev)) +#define amdgpu_ih_get_wptr(adev, ih) (adev)->irq.ih_funcs->get_wptr((adev), (ih)) +#define amdgpu_ih_decode_iv(adev, iv) \ + (adev)->irq.ih_funcs->decode_iv((adev), (ih), (iv)) +#define amdgpu_ih_set_rptr(adev, ih) (adev)->irq.ih_funcs->set_rptr((adev), (ih)) int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, unsigned ring_size, bool use_bus_addr); diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c index 8a8b4967a101..884aa9b81e86 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c @@ 
-183,11 +183,12 @@ static void cik_ih_irq_disable(struct amdgpu_device *adev) * Used by cik_irq_process(). * Returns the value of the wptr. */ -static u32 cik_ih_get_wptr(struct amdgpu_device *adev) +static u32 cik_ih_get_wptr(struct amdgpu_device *adev, + struct amdgpu_ih_ring *ih) { u32 wptr, tmp; - wptr = le32_to_cpu(adev->wb.wb[adev->irq.ih.wptr_offs]); + wptr = le32_to_cpu(adev->wb.wb[ih->wptr_offs]); if (wptr & IH_RB_WPTR__RB_OVERFLOW_MASK) { wptr &= ~IH_RB_WPTR__RB_OVERFLOW_MASK; @@ -196,13 +197,13 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev) * this should allow us to catchup. */ dev_warn(adev->dev, "IH ring buffer overflow (0x%08X, 0x%08X, 0x%08X)\n", - wptr, adev->irq.ih.rptr, (wptr + 16) & adev->irq.ih.ptr_mask); - adev->irq.ih.rptr = (wptr + 16) & adev->irq.ih.ptr_mask; +wptr, ih->rptr, (wptr + 16) & ih->ptr_mask); + ih->rptr = (wptr + 16) & ih->ptr_mask; tmp = RREG32(mmIH_RB_CNTL); tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK; WREG32(mmIH_RB_CNTL, tmp); } - return (wptr & adev->irq.ih.ptr_mask); + return (wptr & ih->ptr_mask); } /*CIK IV Ring @@ -237,16 +238,17 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev) * position and also advance the position. */ static void cik_ih_decode_iv(struct amdgpu_device *adev, +struct am
[PATCH 09/11] drm/amdgpu: add support for processing IH ring 1 & 2
Previously we only added the ring buffer memory, now add the handling as well. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 33 + drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 4 ++- 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 127ed7de841d..64ec92bd74fb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -176,6 +176,36 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg) return ret; } +/** + * amdgpu_irq_handle_ih1 - kick off processing for IH1 + * + * @work: work structure in struct amdgpu_irq + * + * Kick off processing IH ring 1. + */ +static void amdgpu_irq_handle_ih1(struct work_struct *work) +{ + struct amdgpu_device *adev = container_of(work, struct amdgpu_device, + irq.ih1_work); + + amdgpu_ih_process(adev, &adev->irq.ih1, amdgpu_irq_callback); +} + +/** + * amdgpu_irq_handle_ih2 - kick off processing for IH2 + * + * @work: work structure in struct amdgpu_irq + * + * Kick off processing IH ring 2. 
+ */ +static void amdgpu_irq_handle_ih2(struct work_struct *work) +{ + struct amdgpu_device *adev = container_of(work, struct amdgpu_device, + irq.ih2_work); + + amdgpu_ih_process(adev, &adev->irq.ih2, amdgpu_irq_callback); +} + /** * amdgpu_msi_ok - check whether MSI functionality is enabled * @@ -240,6 +270,9 @@ int amdgpu_irq_init(struct amdgpu_device *adev) amdgpu_hotplug_work_func); } + INIT_WORK(&adev->irq.ih1_work, amdgpu_irq_handle_ih1); + INIT_WORK(&adev->irq.ih2_work, amdgpu_irq_handle_ih2); + adev->irq.installed = true; r = drm_irq_install(adev->ddev, adev->ddev->pdev->irq); if (r) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h index 7e06fa64321a..c27decfda494 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h @@ -89,7 +89,9 @@ struct amdgpu_irq { /* interrupt rings */ struct amdgpu_ih_ring ih, ih1, ih2; - const struct amdgpu_ih_funcs*ih_funcs; + const struct amdgpu_ih_funcs*ih_funcs; + struct work_struct ih1_work, ih2_work; + struct amdgpu_irq_src self_irq; /* gen irq stuff */ struct irq_domain *domain; /* GPU irq controller domain */ -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 06/11] drm/amdgpu: simplify IH programming
Calculate all the addresses and pointers in amdgpu_ih.c Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 34 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 23 +--- drivers/gpu/drm/amd/amdgpu/cik_ih.c | 9 +++ drivers/gpu/drm/amd/amdgpu/cz_ih.c | 11 drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 9 +++ drivers/gpu/drm/amd/amdgpu/si_ih.c | 9 +++ drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 27 +-- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 36 + 8 files changed, 73 insertions(+), 85 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c index fb8dd6179926..d0a5db777b6d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c @@ -52,6 +52,8 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, ih->use_bus_addr = use_bus_addr; if (use_bus_addr) { + dma_addr_t dma_addr; + if (ih->ring) return 0; @@ -59,21 +61,26 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, * add them to the end of the ring allocation. 
*/ ih->ring = dma_alloc_coherent(adev->dev, ih->ring_size + 8, - &ih->rb_dma_addr, GFP_KERNEL); + &dma_addr, GFP_KERNEL); if (ih->ring == NULL) return -ENOMEM; memset((void *)ih->ring, 0, ih->ring_size + 8); - ih->wptr_offs = (ih->ring_size / 4) + 0; - ih->rptr_offs = (ih->ring_size / 4) + 1; + ih->gpu_addr = dma_addr; + ih->wptr_addr = dma_addr + ih->ring_size; + ih->wptr_cpu = &ih->ring[ih->ring_size / 4]; + ih->rptr_addr = dma_addr + ih->ring_size + 4; + ih->rptr_cpu = &ih->ring[(ih->ring_size / 4) + 1]; } else { - r = amdgpu_device_wb_get(adev, &ih->wptr_offs); + unsigned wptr_offs, rptr_offs; + + r = amdgpu_device_wb_get(adev, &wptr_offs); if (r) return r; - r = amdgpu_device_wb_get(adev, &ih->rptr_offs); + r = amdgpu_device_wb_get(adev, &rptr_offs); if (r) { - amdgpu_device_wb_free(adev, ih->wptr_offs); + amdgpu_device_wb_free(adev, wptr_offs); return r; } @@ -82,10 +89,15 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, &ih->ring_obj, &ih->gpu_addr, (void **)&ih->ring); if (r) { - amdgpu_device_wb_free(adev, ih->rptr_offs); - amdgpu_device_wb_free(adev, ih->wptr_offs); + amdgpu_device_wb_free(adev, rptr_offs); + amdgpu_device_wb_free(adev, wptr_offs); return r; } + + ih->wptr_addr = adev->wb.gpu_addr + wptr_offs * 4; + ih->wptr_cpu = &adev->wb.wb[wptr_offs]; + ih->rptr_addr = adev->wb.gpu_addr + rptr_offs * 4; + ih->rptr_cpu = &adev->wb.wb[rptr_offs]; } return 0; } @@ -109,13 +121,13 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) * add them to the end of the ring allocation. 
*/ dma_free_coherent(adev->dev, ih->ring_size + 8, - (void *)ih->ring, ih->rb_dma_addr); + (void *)ih->ring, ih->gpu_addr); ih->ring = NULL; } else { amdgpu_bo_free_kernel(&ih->ring_obj, &ih->gpu_addr, (void **)&ih->ring); - amdgpu_device_wb_free(adev, ih->wptr_offs); - amdgpu_device_wb_free(adev, ih->rptr_offs); + amdgpu_device_wb_free(adev, (ih->wptr_addr - ih->gpu_addr) / 4); + amdgpu_device_wb_free(adev, (ih->rptr_addr - ih->gpu_addr) / 4); } } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h index d810fd73d574..1ccb1831382a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h @@ -31,20 +31,25 @@ struct amdgpu_iv_entry; * R6xx+ IH ring */ struct amdgpu_ih_ring { - struct amdgpu_bo*ring_obj; - volatile uint32_t *ring; - unsignedrptr; unsignedring_size; - uint64_tgpu_addr; uint32_tptr_mask; - atomic_tlock; - b
[PATCH 11/11] drm/amdgpu: disable IH ring 1 & 2 WPTR overflow on Vega10
That should add back pressure on the client. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c index f5c5ea628fdf..dd7f52f08fd7 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c @@ -193,6 +193,8 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev) ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1); ih_rb_cntl = vega10_ih_rb_cntl(ih, ih_rb_cntl); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, + WPTR_OVERFLOW_ENABLE, 0); WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl); /* set rptr, wptr to 0 */ @@ -207,6 +209,8 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev) ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1); ih_rb_cntl = vega10_ih_rb_cntl(ih, ih_rb_cntl); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, + WPTR_OVERFLOW_ENABLE, 0); WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl); /* set rptr, wptr to 0 */ -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 07/11] drm/amdgpu: enable IH ring 1 and ring 2 v2
The entries are ignored for now, but it at least stops crashing the hardware when somebody tries to push something to the other IH rings. v2: limit ring size, add TODO comment Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 4 +- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 129 +++- 2 files changed, 107 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h index f6ce171cb8aa..7e06fa64321a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h @@ -87,8 +87,8 @@ struct amdgpu_irq { /* status, etc. */ boolmsi_enabled; /* msi enabled */ - /* interrupt ring */ - struct amdgpu_ih_ring ih; + /* interrupt rings */ + struct amdgpu_ih_ring ih, ih1, ih2; const struct amdgpu_ih_funcs*ih_funcs; /* gen irq stuff */ diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c index 3e9ebb0de94d..562fd036bb9e 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c @@ -50,6 +50,16 @@ static void vega10_ih_enable_interrupts(struct amdgpu_device *adev) ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1); WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL, ih_rb_cntl); adev->irq.ih.enabled = true; + + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 1); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl); + adev->irq.ih1.enabled = true; + + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 1); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl); + adev->irq.ih2.enabled = true; } /** @@ -71,6 +81,47 @@ static void vega10_ih_disable_interrupts(struct amdgpu_device *adev) WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR, 0); adev->irq.ih.enabled = false; adev->irq.ih.rptr = 0; + + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, 
mmIH_RB_CNTL_RING1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 0); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl); + /* set rptr, wptr to 0 */ + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, 0); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING1, 0); + adev->irq.ih1.enabled = false; + adev->irq.ih1.rptr = 0; + + ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 0); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl); + /* set rptr, wptr to 0 */ + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, 0); + WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING2, 0); + adev->irq.ih2.enabled = false; + adev->irq.ih2.rptr = 0; +} + +static uint32_t vega10_ih_rb_cntl(struct amdgpu_ih_ring *ih, uint32_t ih_rb_cntl) +{ + int rb_bufsz = order_base_2(ih->ring_size / 4); + + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, + MC_SPACE, ih->use_bus_addr ? 1 : 4); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, + WPTR_OVERFLOW_CLEAR, 1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, + WPTR_OVERFLOW_ENABLE, 1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_SIZE, rb_bufsz); + /* Ring Buffer write pointer writeback. 
If enabled, IH_RB_WPTR register +* value is written to memory +*/ + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, + WPTR_WRITEBACK_ENABLE, 1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_SNOOP, 1); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_RO, 0); + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_VMID, 0); + + return ih_rb_cntl; } /** @@ -86,9 +137,8 @@ static void vega10_ih_disable_interrupts(struct amdgpu_device *adev) */ static int vega10_ih_irq_init(struct amdgpu_device *adev) { - struct amdgpu_ih_ring *ih = &adev->irq.ih; + struct amdgpu_ih_ring *ih; int ret = 0; - int rb_bufsz; u32 ih_rb_cntl, ih_doorbell_rtpr; u32 tmp; @@ -97,26 +147,15 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev) adev->nbio_funcs->ih_control(adev); - ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL); + ih = &adev->irq.ih; /* Ring Buffer base. [39:8] of 40-bit address of the beginning of the ring buffer*/ - WREG32_SOC15(OSSSYS, 0, mmIH_RB_BASE, adev->irq.ih.gpu_addr >> 8); - WREG32_SOC15(OSSSYS, 0, mmIH_RB_BASE_HI, -(adev->irq.ih.gpu_addr >> 40) & 0xff);
[PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2
This allows us to filter out VM faults in the GMC code. v2: don't filter out all faults Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 +++-- 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 6b6524f04ce0..6db4c58ddc13 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct amdgpu_device *adev, if (!amdgpu_ih_prescreen_iv(adev)) return; - /* Before dispatching irq to IP blocks, send it to amdkfd */ - amdgpu_amdkfd_interrupt(adev, (const void *) &ih->ring[ring_index]); - entry.iv_entry = (const uint32_t *)&ih->ring[ring_index]; amdgpu_ih_decode_iv(adev, &entry); @@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, unsigned client_id = entry->client_id; unsigned src_id = entry->src_id; struct amdgpu_irq_src *src; + bool handled = false; int r; trace_amdgpu_iv(entry); if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) { - DRM_DEBUG("Invalid client_id in IV: %d\n", client_id); + DRM_ERROR("Invalid client_id in IV: %d\n", client_id); return; } if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) { - DRM_DEBUG("Invalid src_id in IV: %d\n", src_id); + DRM_ERROR("Invalid src_id in IV: %d\n", src_id); return; } if (adev->irq.virq[src_id]) { generic_handle_irq(irq_find_mapping(adev->irq.domain, src_id)); - } else { - if (!adev->irq.client[client_id].sources) { - DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n", - client_id, src_id); - return; - } + return; + } + if (!adev->irq.client[client_id].sources) { + DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n", + client_id, src_id); + return; + } else { src = adev->irq.client[client_id].sources[src_id]; if (!src) { DRM_DEBUG("Unhandled interrupt src_id: %d\n", src_id); @@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, } r = src->funcs->process(adev, 
src, entry); - if (r) + if (r < 0) DRM_ERROR("error processing interrupt (%d)\n", r); + else if (r) + handled = true; } + + /* Send it to amdkfd as well if it isn't already handled */ + if (!handled) + amdgpu_amdkfd_interrupt(adev, entry->iv_entry); } /** -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 10/11] drm/amdgpu: add support for self irq on Vega10
This finally enables processing of ring 1 & 2. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 68 -- 1 file changed, 63 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c index 562fd036bb9e..f5c5ea628fdf 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c @@ -258,7 +258,7 @@ static void vega10_ih_irq_disable(struct amdgpu_device *adev) static u32 vega10_ih_get_wptr(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) { - u32 wptr, tmp; + u32 wptr, reg, tmp; wptr = le32_to_cpu(*ih->wptr_cpu); @@ -274,9 +274,18 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev, wptr, ih->rptr, tmp); ih->rptr = tmp; - tmp = RREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL)); + if (ih == &adev->irq.ih) + reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL); + else if (ih == &adev->irq.ih1) + reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING1); + else if (ih == &adev->irq.ih2) + reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING2); + else + BUG(); + + tmp = RREG32_NO_KIQ(reg); tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); - WREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp); + WREG32_NO_KIQ(reg, tmp); } return (wptr & ih->ptr_mask); } @@ -338,9 +347,52 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev, /* XXX check if swapping is necessary on BE */ *ih->rptr_cpu = ih->rptr; WDOORBELL32(ih->doorbell_index, ih->rptr); - } else { + } else if (ih == &adev->irq.ih) { WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr); + } else if (ih == &adev->irq.ih1) { + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, ih->rptr); + } else if (ih == &adev->irq.ih2) { + WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, ih->rptr); + } +} + +/** + * vega10_ih_self_irq - dispatch work for ring 1 and 2 + * + * @adev: amdgpu_device pointer + * @source: irq source + * @entry: IV with WPTR update + * + * Update the WPTR from the IV and schedule
work to handle the entries. + */ +static int vega10_ih_self_irq(struct amdgpu_device *adev, + struct amdgpu_irq_src *source, + struct amdgpu_iv_entry *entry) +{ + uint32_t wptr = cpu_to_le32(entry->src_data[0]); + + switch (entry->ring_id) { + case 1: + *adev->irq.ih1.wptr_cpu = wptr; + schedule_work(&adev->irq.ih1_work); + break; + case 2: + *adev->irq.ih2.wptr_cpu = wptr; + schedule_work(&adev->irq.ih2_work); + break; + default: break; } + return 0; +} + +static const struct amdgpu_irq_src_funcs vega10_ih_self_irq_funcs = { + .process = vega10_ih_self_irq, +}; + +static void vega10_ih_set_self_irq_funcs(struct amdgpu_device *adev) +{ + adev->irq.self_irq.num_types = 0; + adev->irq.self_irq.funcs = &vega10_ih_self_irq_funcs; } static int vega10_ih_early_init(void *handle) @@ -348,13 +400,19 @@ static int vega10_ih_early_init(void *handle) struct amdgpu_device *adev = (struct amdgpu_device *)handle; vega10_ih_set_interrupt_funcs(adev); + vega10_ih_set_self_irq_funcs(adev); return 0; } static int vega10_ih_sw_init(void *handle) { - int r; struct amdgpu_device *adev = (struct amdgpu_device *)handle; + int r; + + r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_IH, 0, + &adev->irq.self_irq); + if (r) + return r; r = amdgpu_ih_ring_init(adev, &adev->irq.ih, 256 * 1024, true); if (r) -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 01/11] drm/amdgpu: add missing error handling
We ignored the return code here. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 3a4e5d8d5162..e329a23e1f99 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -902,6 +902,9 @@ static int gmc_v9_0_sw_init(void *handle) /* This interrupt is VMC page fault.*/ r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VMC, VMC_1_0__SRCID__VM_FAULT, &adev->gmc.vm_fault); + if (r) + return r; + r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_UTCL2, UTCL2_1_0__SRCID__FAULT, &adev->gmc.vm_fault); -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 04/11] drm/amdgpu: move IV prescreening into the GMC code
The GMC/VM subsystem is causing the faults, so move the handling here as well. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 2 - drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 4 -- drivers/gpu/drm/amd/amdgpu/cik_ih.c | 13 drivers/gpu/drm/amd/amdgpu/cz_ih.c | 13 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 59 ++ drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 13 drivers/gpu/drm/amd/amdgpu/si_ih.c | 14 - drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 13 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 82 - 9 files changed, 59 insertions(+), 154 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h index 9ce8c93ec19b..f877bb78d10a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h @@ -51,14 +51,12 @@ struct amdgpu_ih_ring { struct amdgpu_ih_funcs { /* ring read/write ptr handling, called from interrupt context */ u32 (*get_wptr)(struct amdgpu_device *adev); - bool (*prescreen_iv)(struct amdgpu_device *adev); void (*decode_iv)(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry); void (*set_rptr)(struct amdgpu_device *adev); }; #define amdgpu_ih_get_wptr(adev) (adev)->irq.ih_funcs->get_wptr((adev)) -#define amdgpu_ih_prescreen_iv(adev) (adev)->irq.ih_funcs->prescreen_iv((adev)) #define amdgpu_ih_decode_iv(adev, iv) (adev)->irq.ih_funcs->decode_iv((adev), (iv)) #define amdgpu_ih_set_rptr(adev) (adev)->irq.ih_funcs->set_rptr((adev)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 6db4c58ddc13..318874952bc2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -145,10 +145,6 @@ static void amdgpu_irq_callback(struct amdgpu_device *adev, u32 ring_index = ih->rptr >> 2; struct amdgpu_iv_entry entry; - /* Prescreening of high-frequency interrupts */ - if (!amdgpu_ih_prescreen_iv(adev)) - return; - entry.iv_entry = (const uint32_t *)&ih->ring[ring_index]; 
amdgpu_ih_decode_iv(adev, &entry); diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c index 3e6c8c4067cb..8a8b4967a101 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c @@ -228,18 +228,6 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev) * [127:96] - reserved */ -/** - * cik_ih_prescreen_iv - prescreen an interrupt vector - * - * @adev: amdgpu_device pointer - * - * Returns true if the interrupt vector should be further processed. - */ -static bool cik_ih_prescreen_iv(struct amdgpu_device *adev) -{ - return true; -} - /** * cik_ih_decode_iv - decode an interrupt vector * @@ -445,7 +433,6 @@ static const struct amd_ip_funcs cik_ih_ip_funcs = { static const struct amdgpu_ih_funcs cik_ih_funcs = { .get_wptr = cik_ih_get_wptr, - .prescreen_iv = cik_ih_prescreen_iv, .decode_iv = cik_ih_decode_iv, .set_rptr = cik_ih_set_rptr }; diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c b/drivers/gpu/drm/amd/amdgpu/cz_ih.c index 447b3cbc47e5..9d3ea298e116 100644 --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c @@ -207,18 +207,6 @@ static u32 cz_ih_get_wptr(struct amdgpu_device *adev) return (wptr & adev->irq.ih.ptr_mask); } -/** - * cz_ih_prescreen_iv - prescreen an interrupt vector - * - * @adev: amdgpu_device pointer - * - * Returns true if the interrupt vector should be further processed. 
- */ -static bool cz_ih_prescreen_iv(struct amdgpu_device *adev) -{ - return true; -} - /** * cz_ih_decode_iv - decode an interrupt vector * @@ -426,7 +414,6 @@ static const struct amd_ip_funcs cz_ih_ip_funcs = { static const struct amdgpu_ih_funcs cz_ih_funcs = { .get_wptr = cz_ih_get_wptr, - .prescreen_iv = cz_ih_prescreen_iv, .decode_iv = cz_ih_decode_iv, .set_rptr = cz_ih_set_rptr }; diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index e329a23e1f99..8995fca47efb 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -244,6 +244,62 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct amdgpu_device *adev, return 0; } +/** + * gmc_v9_0_prescreen_iv - prescreen an interrupt vector + * + * @adev: amdgpu_device pointer + * + * Returns true if the interrupt vector should be further processed. + */ +static bool gmc_v9_0_prescreen_iv(struct amdgpu_device *adev, + struct amdgpu_iv_entry *entry, + uint64_t addr) + { + struct amdgpu_vm *vm; + u64 key; + int r; + + /* No PASID, can't identify faulting process */ + if (!entry->pasid) + return true;
[PATCH 03/11] drm/amdgpu: remove VM fault_credit handling
printk_ratelimit() is much better suited to limit the number of reported VM faults. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 37 - drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 5 drivers/gpu/drm/amd/amdgpu/cik_ih.c | 18 +--- drivers/gpu/drm/amd/amdgpu/cz_ih.c | 18 +--- drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 18 +--- drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 18 +--- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 7 ++--- 7 files changed, 6 insertions(+), 115 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 2acb9838913e..a2f149da83f8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -3057,7 +3057,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, } INIT_KFIFO(vm->faults); - vm->fault_credit = 16; return 0; @@ -3269,42 +3268,6 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm) amdgpu_vmid_free_reserved(adev, vm, i); } -/** - * amdgpu_vm_pasid_fault_credit - Check fault credit for given PASID - * - * @adev: amdgpu_device pointer - * @pasid: PASID do identify the VM - * - * This function is expected to be called in interrupt context. - * - * Returns: - * True if there was fault credit, false otherwise - */ -bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev, - unsigned int pasid) -{ - struct amdgpu_vm *vm; - - spin_lock(&adev->vm_manager.pasid_lock); - vm = idr_find(&adev->vm_manager.pasid_idr, pasid); - if (!vm) { - /* VM not found, can't track fault credit */ - spin_unlock(&adev->vm_manager.pasid_lock); - return true; - } - - /* No lock needed. 
only accessed by IRQ handler */ - if (!vm->fault_credit) { - /* Too many faults in this VM */ - spin_unlock(&adev->vm_manager.pasid_lock); - return false; - } - - vm->fault_credit--; - spin_unlock(&adev->vm_manager.pasid_lock); - return true; -} - /** * amdgpu_vm_manager_init - init the VM manager * diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 2a8898d19c8b..e8dcfd59fc93 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -229,9 +229,6 @@ struct amdgpu_vm { /* Up to 128 pending retry page faults */ DECLARE_KFIFO(faults, u64, 128); - /* Limit non-retry fault storms */ - unsigned intfault_credit; - /* Points to the KFD process VM info */ struct amdkfd_process_info *process_info; @@ -299,8 +296,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm, unsigned int pasid); void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm); void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm); -bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev, - unsigned int pasid); void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm, struct list_head *validated, struct amdgpu_bo_list_entry *entry); diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c index b5775c6a857b..3e6c8c4067cb 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c @@ -237,23 +237,7 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev) */ static bool cik_ih_prescreen_iv(struct amdgpu_device *adev) { - u32 ring_index = adev->irq.ih.rptr >> 2; - u16 pasid; - - switch (le32_to_cpu(adev->irq.ih.ring[ring_index]) & 0xff) { - case 146: - case 147: - pasid = le32_to_cpu(adev->irq.ih.ring[ring_index + 2]) >> 16; - if (!pasid || amdgpu_vm_pasid_fault_credit(adev, pasid)) - return true; - break; - default: - /* Not a VM 
fault */ - return true; - } - - adev->irq.ih.rptr += 16; - return false; + return true; } /** diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c b/drivers/gpu/drm/amd/amdgpu/cz_ih.c index df5ac4d85a00..447b3cbc47e5 100644 --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c @@ -216,23 +216,7 @@ static u32 cz_ih_get_wptr(struct amdgpu_device *adev) */ static bool cz_ih_prescreen_iv(struct amdgpu_device *adev) { - u32 ring_index = adev->irq.ih.rptr >> 2; - u16 pasid; - - switch (le32_to_cpu(adev->irq.ih.ring[ring_index]) & 0xff) { - case 146: - case 147: - pasid = le32_to_cpu(adev->irq.ih.ring[ring_index + 2]) >> 16; -
[PATCH 08/11] drm/amdgpu: add the IH to the IV trace
To distinguish on which IH ring an IV was found. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 11 +++ 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 318874952bc2..127ed7de841d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -148,6 +148,8 @@ static void amdgpu_irq_callback(struct amdgpu_device *adev, entry.iv_entry = (const uint32_t *)&ih->ring[ring_index]; amdgpu_ih_decode_iv(adev, &entry); + trace_amdgpu_iv(ih - &adev->irq.ih, &entry); + amdgpu_irq_dispatch(adev, &entry); } @@ -367,8 +369,6 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, bool handled = false; int r; - trace_amdgpu_iv(entry); - if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) { DRM_ERROR("Invalid client_id in IV: %d\n", client_id); return; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h index 626abca770a0..6e388d7753c1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h @@ -76,9 +76,10 @@ TRACE_EVENT(amdgpu_mm_wreg, ); TRACE_EVENT(amdgpu_iv, - TP_PROTO(struct amdgpu_iv_entry *iv), - TP_ARGS(iv), + TP_PROTO(unsigned ih, struct amdgpu_iv_entry *iv), + TP_ARGS(ih, iv), TP_STRUCT__entry( +__field(unsigned, ih) __field(unsigned, client_id) __field(unsigned, src_id) __field(unsigned, ring_id) @@ -90,6 +91,7 @@ TRACE_EVENT(amdgpu_iv, __array(unsigned, src_data, 4) ), TP_fast_assign( + __entry->ih = ih; __entry->client_id = iv->client_id; __entry->src_id = iv->src_id; __entry->ring_id = iv->ring_id; @@ -103,8 +105,9 @@ TRACE_EVENT(amdgpu_iv, __entry->src_data[2] = iv->src_data[2]; __entry->src_data[3] = iv->src_data[3]; ), - TP_printk("client_id:%u src_id:%u ring:%u vmid:%u timestamp: %llu pasid:%u src_data: %08x %08x %08x %08x", - __entry->client_id, __entry->src_id, + 
TP_printk("ih: %u client_id:%u src_id:%u ring:%u vmid:%u " + "timestamp: %llu pasid:%u src_data: %08x %08x %08x %08x", + __entry->ih, __entry->client_id, __entry->src_id, __entry->ring_id, __entry->vmid, __entry->timestamp, __entry->pasid, __entry->src_data[0], __entry->src_data[1], -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: enlarge maximum waiting time of KIQ
On 30.11.18 at 10:48, wentalou wrote: SWDEV-171843: KIQ in VF’s init delayed by another VF’s reset. late_init failed occasionally if overlapped with another VF’s reset. Enlarging MAX_KIQ_REG_TRY from 20 to 80 fixes this issue. Change-Id: I841774bdd9ebf125c5aa2046b1dcebd65e07 Signed-off-by: wentalou --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index c8ad6bf..26e2455 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -235,7 +235,7 @@ enum amdgpu_kiq_irq { #define MAX_KIQ_REG_WAIT 5000 /* in usecs, 5ms */ #define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */ -#define MAX_KIQ_REG_TRY 20 +#define MAX_KIQ_REG_TRY 80 /* SWDEV-171843: 20 -> 80 */ Please don't add any internal ticket numbers to public source code. Christian. int amdgpu_device_ip_set_clockgating_state(void *dev, enum amd_ip_block_type block_type, ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: enlarge maximum waiting time of KIQ
SWDEV-171843: KIQ in VF’s init delayed by another VF’s reset. late_init failed occasionally if overlapped with another VF’s reset. Enlarging MAX_KIQ_REG_TRY from 20 to 80 fixes this issue. Change-Id: I841774bdd9ebf125c5aa2046b1dcebd65e07 Signed-off-by: wentalou --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index c8ad6bf..26e2455 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -235,7 +235,7 @@ enum amdgpu_kiq_irq { #define MAX_KIQ_REG_WAIT 5000 /* in usecs, 5ms */ #define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */ -#define MAX_KIQ_REG_TRY 20 +#define MAX_KIQ_REG_TRY 80 /* SWDEV-171843: 20 -> 80 */ int amdgpu_device_ip_set_clockgating_state(void *dev, enum amd_ip_block_type block_type, -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for amdgpu v3
> -Original Message- > From: Christian König > Sent: Friday, November 30, 2018 5:15 PM > To: Zhou, David(ChunMing) ; dri- > de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for > amdgpu v3 > [snip] > >> +drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev, > >> + uint32_t *handles, uint64_t *points, > > This interfaces is public to umd, I think they like "uint64_t > > **points" for batch query, I've verified before, it works well and > > more convenience. > > If removing num_handles, that means only one syncobj to query, I agree > > with "uint64_t *point". > > "handles" as well as "points" are an array of objects. If the UMD wants to > write the points to separate locations it can do so manually after calling the > function. Ok, it doesn't matter. -David > > It doesn't make any sense that libdrm or the kernel does the extra > indirection, the transferred pointers are 64bit as well (even on a 32bit > system) so the overhead is identical. > > Adding another indirection just makes the implementation unnecessary > complex. > > Christian. > > > > > -David > >> + unsigned num_handles) { > >> + if (NULL == dev) > >> + return -EINVAL; > >> + > >> + return drmSyncobjQuery(dev->fd, handles, points, num_handles); } > >> + > >> drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev, > >> uint32_t handle, > >> int *shared_fd) > > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for amdgpu v3
On 30.11.18 at 08:35, zhoucm1 wrote: On 2018-11-28 22:50, Christian König wrote: From: Chunming Zhou v2: symbols are stored in lexical order. v3: drop export/import and extra query indirection Signed-off-by: Chunming Zhou Signed-off-by: Christian König --- amdgpu/amdgpu-symbol-check | 2 ++ amdgpu/amdgpu.h | 39 +++ amdgpu/amdgpu_cs.c | 23 +++ 3 files changed, 64 insertions(+) diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check index 6f5e0f95..4553736f 100755 --- a/amdgpu/amdgpu-symbol-check +++ b/amdgpu/amdgpu-symbol-check @@ -49,8 +49,10 @@ amdgpu_cs_submit amdgpu_cs_submit_raw amdgpu_cs_syncobj_export_sync_file amdgpu_cs_syncobj_import_sync_file +amdgpu_cs_syncobj_query amdgpu_cs_syncobj_reset amdgpu_cs_syncobj_signal +amdgpu_cs_syncobj_timeline_wait amdgpu_cs_syncobj_wait amdgpu_cs_wait_fences amdgpu_cs_wait_semaphore diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h index dc51659a..330658a0 100644 --- a/amdgpu/amdgpu.h +++ b/amdgpu/amdgpu.h @@ -1489,6 +1489,45 @@ int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev, int64_t timeout_nsec, unsigned flags, uint32_t *first_signaled); +/** + * Wait for one or all sync objects on their points to signal. + * + * \param dev - \c [in] self-explanatory + * \param handles - \c [in] array of sync object handles + * \param points - \c [in] array of sync points to wait + * \param num_handles - \c [in] self-explanatory + * \param timeout_nsec - \c [in] self-explanatory + * \param flags - \c [in] a bitmask of DRM_SYNCOBJ_WAIT_FLAGS_* + * \param first_signaled - \c [in] self-explanatory + * + * \return 0 on success\n + * -ETIME - Timeout + * <0 - Negative POSIX Error code + * + */ +int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev, + uint32_t *handles, uint64_t *points, + unsigned num_handles, + int64_t timeout_nsec, unsigned flags, + uint32_t *first_signaled); +/** + * Query sync object payloads. 
+ * + * \param dev - \c [in] self-explanatory + * \param handles - \c [in] array of sync object handles + * \param points - \c [out] array of sync points returned, which represents + * syncobj payload. + * \param num_handles - \c [in] self-explanatory + * + * \return 0 on success\n + * -ETIME - Timeout + * <0 - Negative POSIX Error code + * + */ +int amdgpu_cs_syncobj_query(amdgpu_device_handle dev, + uint32_t *handles, uint64_t *points, + unsigned num_handles); + /** * Export kernel sync object to shareable fd. * diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c index 3b8231aa..e4a547c6 100644 --- a/amdgpu/amdgpu_cs.c +++ b/amdgpu/amdgpu_cs.c @@ -661,6 +661,29 @@ drm_public int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev, flags, first_signaled); } +drm_public int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev, + uint32_t *handles, uint64_t *points, + unsigned num_handles, + int64_t timeout_nsec, unsigned flags, + uint32_t *first_signaled) +{ + if (NULL == dev) + return -EINVAL; + + return drmSyncobjTimelineWait(dev->fd, handles, points, num_handles, + timeout_nsec, flags, first_signaled); +} + +drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev, + uint32_t *handles, uint64_t *points, This interface is public to UMDs; I think they would like "uint64_t **points" for batch query. I've verified this before; it works well and is more convenient. If num_handles were removed, meaning only one syncobj to query, I would agree with "uint64_t *point". "handles" as well as "points" are an array of objects. If the UMD wants to write the points to separate locations it can do so manually after calling the function. It doesn't make any sense that libdrm or the kernel does the extra indirection, the transferred pointers are 64bit as well (even on a 32bit system) so the overhead is identical. Adding another indirection just makes the implementation unnecessarily complex. Christian. 
-David + unsigned num_handles) +{ + if (NULL == dev) + return -EINVAL; + + return drmSyncobjQuery(dev->fd, handles, points, num_handles); +} + drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev, uint32_t handle, int *shared_fd) ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: add VCN JPEG support amdgpu_ctx_num_entities
Am 30.11.18 um 03:00 schrieb Alex Deucher: Looks like it was missed when setting support was added. Signed-off-by: Alex Deucher Reviewed-by: Christian König --- This is a legit bug fix. the rest of this series needs more work. drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c index f9b54236102d..95f4c4139fc6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -39,6 +39,7 @@ const unsigned int amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] = { [AMDGPU_HW_IP_UVD_ENC] = 1, [AMDGPU_HW_IP_VCN_DEC] = 1, [AMDGPU_HW_IP_VCN_ENC] = 1, + [AMDGPU_HW_IP_VCN_JPEG] = 1, }; static int amdgput_ctx_total_num_entities(void) ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.
On 29.11.18 at 21:36, Andrey Grodzovsky wrote: XGMI hive has some resources allocated on device init which need to be deallocated when the device is unregistered. Add a per-hive wq to allow all the nodes in the hive to run resets concurrently - this should speed up the total reset time to avoid breaching the PSP FW timeout. Do you really want a workqueue? That sounds like the exact opposite of what you describe here. E.g. a workqueue is there to serialize all work items scheduled to it. What you most likely rather want is a work item per device which are scheduled, run in parallel and are joined back together before dropping the per hive lock. Christian. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 3 +++ 2 files changed, 27 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index fb37e69..9ac2dc5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -61,6 +61,8 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) INIT_LIST_HEAD(&tmp->device_list); mutex_init(&tmp->hive_lock); + tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND | WQ_HIGHPRI, 0); + return tmp; } @@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev) mutex_unlock(&xgmi_mutex); return ret; } + +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev) +{ + struct amdgpu_hive_info *hive; + + if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU)) + return; + + mutex_lock(&xgmi_mutex); + + hive = amdgpu_get_xgmi_hive(adev); + if (!hive) + goto exit; + + if (!(hive->number_devices--)) { + mutex_destroy(&hive->hive_lock); + destroy_workqueue(hive->reset_queue); + } + +exit: + mutex_unlock(&xgmi_mutex); +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h index 6335bfd..285ab93 100644 --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h @@ -30,10 +30,13 @@ struct amdgpu_hive_info { struct psp_xgmi_topology_info topology_info; int number_devices; struct mutex hive_lock; + /* hive members reset wq */ + struct workqueue_struct *reset_queue; }; struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev); int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev); int amdgpu_xgmi_add_device(struct amdgpu_device *adev); +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev); #endif ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx