Re: Changes to let KFD use render node VMs
On 2018-03-02 01:07 PM, Christian König wrote:
> Am 02.03.2018 um 17:38 schrieb Felix Kuehling:
>> On 2018-03-02 02:43 AM, Christian König wrote:
>>> Hi Felix,
>>>
>>> Patch #3 looks good to me in general, but why do you need to cache
>>> the pd_phys_addr?
>>>
>>> The BO already does that anyway, so you can just use
>>> amdgpu_bo_gpu_addr() for this, which also makes some additional
>>> checks on debug builds.
>> Do you mean amdgpu_bo_gpu_offset? That one requires the reservation
>> to be locked unless it's pinned.
>
> Correct, well, that's why I suggested it.
>
>> We are caching the PD physical address here to get around a tricky
>> circular locking issue we ran into under memory pressure with
>> evictions. I don't remember all the details, but I do remember that
>> the deadlock involved fences, which aren't tracked by the lock
>> dependency checker, so it was particularly tricky to nail down.
>> Avoiding the need to lock the page directory reservation to find out
>> its physical address breaks the lock cycle.
>
> OK, so what you do is get the PD address after you have validated the
> BOs and cache it until you need it in the hardware setup?

Correct. There is a KFD-KGD interface that queries the PD address from
the VM. The address gets put into the MAP_PROCESS packet in the HWS
runlist. After that, the runlist effectively caches the same address
until an eviction causes a preemption.

> If yes, then that would be valid, because we do exactly the same in
> the job during command submission.
>
>>> Patch #5: mixing power management into the VM functions is a clear
>>> NAK.
>> This part won't be in my initial upstream push for GPUVM.
>>
>>> That certainly doesn't belong there, but we can have a counter for
>>> how many compute VMs we have in the manager.
>>> amdgpu_vm_make_compute() can then return whether this was the first
>>> VM to become a compute VM or not.
>> We currently trigger compute power profile switching based on the
>> existence of compute VMs. It was a convenient hook where amdgpu finds
>> out about the existence of compute work. If that's not acceptable, we
>> can look into moving it elsewhere, possibly using a new KFD2KGD
>> interface.
>
> As I said, the VM code can still count how many compute VMs there are;
> the issue is that it should not make power management decisions based
> on that.
>
> The caller which triggers the switch of the VM to a compute VM should
> make that decision.

Makes sense.

Regards,
  Felix

> Christian.
>
>>> The rest of the patch looks good to me.
>> Thanks,
>>   Felix
>>
>>> Regards,
>>>   Christian.
>>>
>>> Am 01.03.2018 um 23:58 schrieb Felix Kuehling:
>>>> Hi Christian,
>>>>
>>>> I have a working patch series against amd-kfd-staging that lets KFD
>>>> use VMs from render node FDs, as we discussed. There are two
>>>> patches in that series that touch amdgpu_vm.[ch] that I'd like your
>>>> feedback on before I commit the changes to amd-kfd-staging and
>>>> include them in my upstream patch series for KFD GPUVM support.
>>>> See attached.
>>>>
>>>> Thanks,
>>>>   Felix

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Changes to let KFD use render node VMs
Am 02.03.2018 um 17:38 schrieb Felix Kuehling:
> On 2018-03-02 02:43 AM, Christian König wrote:
>> Hi Felix,
>>
>> Patch #3 looks good to me in general, but why do you need to cache
>> the pd_phys_addr?
>>
>> The BO already does that anyway, so you can just use
>> amdgpu_bo_gpu_addr() for this, which also makes some additional
>> checks on debug builds.
> Do you mean amdgpu_bo_gpu_offset? That one requires the reservation
> to be locked unless it's pinned.

Correct, well, that's why I suggested it.

> We are caching the PD physical address here to get around a tricky
> circular locking issue we ran into under memory pressure with
> evictions. I don't remember all the details, but I do remember that
> the deadlock involved fences, which aren't tracked by the lock
> dependency checker, so it was particularly tricky to nail down.
> Avoiding the need to lock the page directory reservation to find out
> its physical address breaks the lock cycle.

OK, so what you do is get the PD address after you have validated the
BOs and cache it until you need it in the hardware setup?

If yes, then that would be valid, because we do exactly the same in the
job during command submission.

>> Patch #5: mixing power management into the VM functions is a clear
>> NAK.
> This part won't be in my initial upstream push for GPUVM.
>
>> That certainly doesn't belong there, but we can have a counter for
>> how many compute VMs we have in the manager.
>> amdgpu_vm_make_compute() can then return whether this was the first
>> VM to become a compute VM or not.
> We currently trigger compute power profile switching based on the
> existence of compute VMs. It was a convenient hook where amdgpu finds
> out about the existence of compute work. If that's not acceptable, we
> can look into moving it elsewhere, possibly using a new KFD2KGD
> interface.

As I said, the VM code can still count how many compute VMs there are;
the issue is that it should not make power management decisions based
on that.

The caller which triggers the switch of the VM to a compute VM should
make that decision.

Christian.

>> The rest of the patch looks good to me.
> Thanks,
>   Felix
>
>> Regards,
>>   Christian.
>>
>> Am 01.03.2018 um 23:58 schrieb Felix Kuehling:
>>> Hi Christian,
>>>
>>> I have a working patch series against amd-kfd-staging that lets KFD
>>> use VMs from render node FDs, as we discussed. There are two patches
>>> in that series that touch amdgpu_vm.[ch] that I'd like your feedback
>>> on before I commit the changes to amd-kfd-staging and include them
>>> in my upstream patch series for KFD GPUVM support. See attached.
>>>
>>> Thanks,
>>>   Felix
Re: Changes to let KFD use render node VMs
On 2018-03-02 02:43 AM, Christian König wrote:
> Hi Felix,
>
> Patch #3 looks good to me in general, but why do you need to cache
> the pd_phys_addr?
>
> The BO already does that anyway, so you can just use
> amdgpu_bo_gpu_addr() for this, which also makes some additional
> checks on debug builds.

Do you mean amdgpu_bo_gpu_offset? That one requires the reservation to
be locked unless it's pinned.

We are caching the PD physical address here to get around a tricky
circular locking issue we ran into under memory pressure with
evictions. I don't remember all the details, but I do remember that the
deadlock involved fences, which aren't tracked by the lock dependency
checker, so it was particularly tricky to nail down. Avoiding the need
to lock the page directory reservation to find out its physical address
breaks the lock cycle.

> Patch #5: mixing power management into the VM functions is a clear
> NAK.

This part won't be in my initial upstream push for GPUVM.

> That certainly doesn't belong there, but we can have a counter for
> how many compute VMs we have in the manager. amdgpu_vm_make_compute()
> can then return whether this was the first VM to become a compute VM
> or not.

We currently trigger compute power profile switching based on the
existence of compute VMs. It was a convenient hook where amdgpu finds
out about the existence of compute work. If that's not acceptable, we
can look into moving it elsewhere, possibly using a new KFD2KGD
interface.

> The rest of the patch looks good to me.

Thanks,
  Felix

> Regards,
>   Christian.
>
> Am 01.03.2018 um 23:58 schrieb Felix Kuehling:
>> Hi Christian,
>>
>> I have a working patch series against amd-kfd-staging that lets KFD
>> use VMs from render node FDs, as we discussed. There are two patches
>> in that series that touch amdgpu_vm.[ch] that I'd like your feedback
>> on before I commit the changes to amd-kfd-staging and include them
>> in my upstream patch series for KFD GPUVM support. See attached.
>>
>> Thanks,
>>   Felix
Re: Changes to let KFD use render node VMs
Hi Felix,

Patch #3 looks good to me in general, but why do you need to cache the
pd_phys_addr?

The BO already does that anyway, so you can just use
amdgpu_bo_gpu_addr() for this, which also makes some additional checks
on debug builds.

Patch #5: mixing power management into the VM functions is a clear NAK.

That certainly doesn't belong there, but we can have a counter for how
many compute VMs we have in the manager. amdgpu_vm_make_compute() can
then return whether this was the first VM to become a compute VM or
not.

The rest of the patch looks good to me.

Regards,
  Christian.

Am 01.03.2018 um 23:58 schrieb Felix Kuehling:
> Hi Christian,
>
> I have a working patch series against amd-kfd-staging that lets KFD
> use VMs from render node FDs, as we discussed. There are two patches
> in that series that touch amdgpu_vm.[ch] that I'd like your feedback
> on before I commit the changes to amd-kfd-staging and include them in
> my upstream patch series for KFD GPUVM support. See attached.
>
> Thanks,
>   Felix
Re: Changes to let KFD use render node VMs
Am 02.03.2018 um 07:59 schrieb zhoucm1:
> On 2018-03-02 14:19, Kuehling, Felix wrote:
>> Hi David,
>>
>> Compute contexts enable the compute power profile,
> I heard from Rex that he would prefer the power profile to depend on
> ring busy/idle, rather than always enabling it.
>> and can use a different VM update mode from graphics contexts (SDMA
>> vs. CPU).
> If we can identify which is faster and more efficient, then we can
> unify them, I guess.

Well, SDMA is a bit faster in some cases, can access all of VRAM, and
is pipelined. CPU updates have lower latency, but the CPU can't access
all of VRAM and isn't pipelined. Lower latency is important for KFD,
and being pipelined is very important for good GFX performance.

Regards,
Christian.

> Anyway, thanks for explaining.
>
> David Zhou
>
>> These differences are not going away. Not yet, anyway. If we start
>> sharing VM address space between graphics and compute in the future,
>> this will have to change. That is, these features would have to be
>> controlled by other means than by the VM context.
>>
>> Regards,
>>   Felix
>>
>> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of
>> zhoucm1 <zhou...@amd.com>
>> Sent: Thursday, March 1, 2018 10:18:15 PM
>> To: Kuehling, Felix; amd-...@freedesktop.org; Koenig, Christian
>> Subject: Re: Changes to let KFD use render node VMs
>>
>> Hi Felix,
>>
>> Apart from the patches, could I ask what the difference between
>> AMDGPU_VM_CONTEXT_COMPUTE and AMDGPU_VM_CONTEXT_GFX is? After you
>> merge kfd_vm into amdgpu_vm, does the difference still exist?
>>
>> Thanks,
>> David Zhou
>>
>> On 2018-03-02 06:58, Felix Kuehling wrote:
>>> Hi Christian,
>>>
>>> I have a working patch series against amd-kfd-staging that lets KFD
>>> use VMs from render node FDs, as we discussed. There are two patches
>>> in that series that touch amdgpu_vm.[ch] that I'd like your feedback
>>> on before I commit the changes to amd-kfd-staging and include them
>>> in my upstream patch series for KFD GPUVM support. See attached.
>>>
>>> Thanks,
>>>   Felix
Re: Changes to let KFD use render node VMs
On 2018-03-02 14:19, Kuehling, Felix wrote:
> Hi David,
>
> Compute contexts enable the compute power profile,

I heard from Rex that he would prefer the power profile to depend on
ring busy/idle, rather than always enabling it.

> and can use a different VM update mode from graphics contexts (SDMA
> vs. CPU).

If we can identify which is faster and more efficient, then we can
unify them, I guess.

Anyway, thanks for explaining.

David Zhou

> These differences are not going away. Not yet, anyway. If we start
> sharing VM address space between graphics and compute in the future,
> this will have to change. That is, these features would have to be
> controlled by other means than by the VM context.
>
> Regards,
>   Felix
>
> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of
> zhoucm1 <zhou...@amd.com>
> Sent: Thursday, March 1, 2018 10:18:15 PM
> To: Kuehling, Felix; amd-...@freedesktop.org; Koenig, Christian
> Subject: Re: Changes to let KFD use render node VMs
>
> Hi Felix,
>
> Apart from the patches, could I ask what the difference between
> AMDGPU_VM_CONTEXT_COMPUTE and AMDGPU_VM_CONTEXT_GFX is? After you
> merge kfd_vm into amdgpu_vm, does the difference still exist?
>
> Thanks,
> David Zhou
>
> On 2018-03-02 06:58, Felix Kuehling wrote:
>> Hi Christian,
>>
>> I have a working patch series against amd-kfd-staging that lets KFD
>> use VMs from render node FDs, as we discussed. There are two patches
>> in that series that touch amdgpu_vm.[ch] that I'd like your feedback
>> on before I commit the changes to amd-kfd-staging and include them
>> in my upstream patch series for KFD GPUVM support. See attached.
>>
>> Thanks,
>>   Felix
Re: Changes to let KFD use render node VMs
Hi David,

Compute contexts enable the compute power profile, and can use a
different VM update mode from graphics contexts (SDMA vs. CPU).

These differences are not going away. Not yet, anyway. If we start
sharing VM address space between graphics and compute in the future,
this will have to change. That is, these features would have to be
controlled by other means than by the VM context.

Regards,
  Felix

________________________________
From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of
zhoucm1 <zhou...@amd.com>
Sent: Thursday, March 1, 2018 10:18:15 PM
To: Kuehling, Felix; amd-...@freedesktop.org; Koenig, Christian
Subject: Re: Changes to let KFD use render node VMs

Hi Felix,

Apart from the patches, could I ask what the difference between
AMDGPU_VM_CONTEXT_COMPUTE and AMDGPU_VM_CONTEXT_GFX is? After you merge
kfd_vm into amdgpu_vm, does the difference still exist?

Thanks,
David Zhou

On 2018-03-02 06:58, Felix Kuehling wrote:
> Hi Christian,
>
> I have a working patch series against amd-kfd-staging that lets KFD
> use VMs from render node FDs, as we discussed. There are two patches
> in that series that touch amdgpu_vm.[ch] that I'd like your feedback
> on before I commit the changes to amd-kfd-staging and include them in
> my upstream patch series for KFD GPUVM support. See attached.
>
> Thanks,
>   Felix
Re: Changes to let KFD use render node VMs
Hi Felix,

Apart from the patches, could I ask what the difference between
AMDGPU_VM_CONTEXT_COMPUTE and AMDGPU_VM_CONTEXT_GFX is? After you merge
kfd_vm into amdgpu_vm, does the difference still exist?

Thanks,
David Zhou

On 2018-03-02 06:58, Felix Kuehling wrote:
> Hi Christian,
>
> I have a working patch series against amd-kfd-staging that lets KFD
> use VMs from render node FDs, as we discussed. There are two patches
> in that series that touch amdgpu_vm.[ch] that I'd like your feedback
> on before I commit the changes to amd-kfd-staging and include them in
> my upstream patch series for KFD GPUVM support. See attached.
>
> Thanks,
>   Felix
Changes to let KFD use render node VMs
Hi Christian,

I have a working patch series against amd-kfd-staging that lets KFD use
VMs from render node FDs, as we discussed. There are two patches in that
series that touch amdgpu_vm.[ch] that I'd like your feedback on before I
commit the changes to amd-kfd-staging and include them in my upstream
patch series for KFD GPUVM support. See attached.

Thanks,
  Felix

--
Felix Kuehling
PMTS Software Development Engineer | Vertical Workstation/Compute
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
facebook.com/AMD | amd.com

From 45efed055bcd186bcc92035433457fc39c6227dd Mon Sep 17 00:00:00 2001
From: Felix Kuehling
Date: Fri, 23 Feb 2018 19:51:28 -0500
Subject: [PATCH 3/7] drm/amdgpu: Move KFD-specific fields into struct
 amdgpu_vm

Remove struct amdkfd_vm and move the fields into struct amdgpu_vm. This
will allow turning a VM created by a DRM render node into a KFD VM.

Change-Id: I34112b358e29cdebc8c6af6ce1ffb62d3f22c884
Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  20 ----
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 116 +++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h           |   9 ++
 3 files changed, 64 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 4b64899..d5f6c48 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -102,26 +102,6 @@ struct amdkfd_process_info {
 	struct pid *pid;
 };
 
-/* struct amdkfd_vm -
- * For Memory Eviction KGD requires a mechanism to keep track of all KFD BOs
- * belonging to a KFD process. All the VMs belonging to the same process point
- * to the same amdkfd_process_info.
- */
-struct amdkfd_vm {
-	/* Keep base as the first parameter for pointer compatibility between
-	 * amdkfd_vm and amdgpu_vm.
-	 */
-	struct amdgpu_vm base;
-
-	/* List node in amdkfd_process_info.vm_list_head */
-	struct list_head vm_list_node;
-
-	/* Points to the KFD process VM info */
-	struct amdkfd_process_info *process_info;
-
-	uint64_t pd_phys_addr;
-};
-
 int amdgpu_amdkfd_init(void);
 void amdgpu_amdkfd_fini(void);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 08b19ca..1df7beb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -356,21 +356,20 @@ static int amdgpu_amdkfd_validate(void *param, struct amdgpu_bo *bo)
 	return amdgpu_amdkfd_bo_validate(bo, p->domain, p->wait);
 }
 
-static u64 get_vm_pd_gpu_offset(struct amdkfd_vm *kvm, bool reserve)
+static u64 get_vm_pd_gpu_offset(struct amdgpu_vm *vm, bool reserve)
 {
-	struct amdgpu_vm *avm = &kvm->base;
 	struct amdgpu_device *adev =
-		amdgpu_ttm_adev(avm->root.base.bo->tbo.bdev);
+		amdgpu_ttm_adev(vm->root.base.bo->tbo.bdev);
 	u64 offset;
 	uint64_t flags = AMDGPU_PTE_VALID;
 
 	if (reserve)
-		amdgpu_bo_reserve(avm->root.base.bo, false);
+		amdgpu_bo_reserve(vm->root.base.bo, false);
 
-	offset = amdgpu_bo_gpu_offset(avm->root.base.bo);
+	offset = amdgpu_bo_gpu_offset(vm->root.base.bo);
 
 	if (reserve)
-		amdgpu_bo_unreserve(avm->root.base.bo);
+		amdgpu_bo_unreserve(vm->root.base.bo);
 
 	/* On some ASICs the FB doesn't start at 0. Adjust FB offset
 	 * to an actual MC address.
@@ -387,9 +386,9 @@ static u64 get_vm_pd_gpu_offset(struct amdkfd_vm *kvm, bool reserve)
  * again. Page directories are only updated after updating page
  * tables.
  */
-static int vm_validate_pt_pd_bos(struct amdkfd_vm *vm)
+static int vm_validate_pt_pd_bos(struct amdgpu_vm *vm)
 {
-	struct amdgpu_bo *pd = vm->base.root.base.bo;
+	struct amdgpu_bo *pd = vm->root.base.bo;
 	struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
 	struct amdgpu_vm_parser param;
 	int ret;
@@ -397,7 +396,7 @@ static int vm_validate_pt_pd_bos(struct amdkfd_vm *vm)
 	param.domain = AMDGPU_GEM_DOMAIN_VRAM;
 	param.wait = false;
 
-	ret = amdgpu_vm_validate_pt_bos(adev, &vm->base, amdgpu_amdkfd_validate,
+	ret = amdgpu_vm_validate_pt_bos(adev, vm, amdgpu_amdkfd_validate,
 					&param);
 	if (ret) {
 		pr_err("amdgpu: failed to validate PT BOs\n");
@@ -412,7 +411,7 @@ static int vm_validate_pt_pd_bos(struct amdkfd_vm *vm)
 	vm->pd_phys_addr = get_vm_pd_gpu_offset(vm, false);
 
-	if (vm->base.use_cpu_for_update) {
+	if (vm->use_cpu_for_update) {
 		ret = amdgpu_bo_kmap(pd, NULL);
 		if (ret) {
 			pr_err("amdgpu: failed to kmap PD, ret=%d\n", ret);
@@ -466,14 +465,12 @@ static int vm_update_pds(struct amdgpu_vm *vm, struct amdgpu_sync *sync)
  * 4a. Validate new page tables and directories
  */
 static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
-			struct amdgpu_vm *avm, bool is_aql,
+			struct amdgpu_vm