Re: Preparation for unpinned DMA-buf handling

On Wed, May 15, 2019 at 4:52 PM Daniel Vetter wrote:
> For the direct access/xgmi case I can see a bunch of options, one of
> them would be to forgo the import and just directly upcast to the
> other driver's amdgpu_bo and access everything directly. That seems
> to be what you're doing here. The other extreme would be to somehow
> augment the sg table to be able to handle xgmi addresses, I think.

Right. For xgmi, all connected gpus can access the entire vram
aperture of all connected gpus via their local vram aperture, like a
giant vram NUMA array. Well, assuming all driver instances have given
permission to allow access from their xgmi peers. At the moment we
always do.

> Not sure where we should land on this spectrum for upstream.

For dma-buf, we can check and upcast based on the dma-buf pointers.
For ROCm, we have a single device node for all GPUs in the system, all
allocations go through that, and we can use it to get back to the
right amdgpu driver instance.

Alex
Re: Preparation for unpinned DMA-buf handling

On Wed, May 15, 2019 at 10:08 PM Alex Deucher wrote:
> xgmi is an optimization in the driver. If a dma-buf comes in, we'd
> check the dma-buf pointers to verify that it is an amdgpu buffer,
> then we'd check if the other GPU is connected via xgmi or not. If so,
> we'd use the internal xgmi address when setting up our GPUVM address
> space; otherwise, we'd use the dma address.
>
> Here's the patch to enable xgmi rather than p2p when it's available:
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-19.10=7a443319db2e614114b119015daa4e2c1ed9474d

Hm, I think I looked at this, but didn't realize how it all fits. I'll
grab the rocm kernel tree again and browse a bit more.

> We just treat the remote BAR like we would for system memory mapped
> into the GPUVM. This will obviously need some updating to properly
> deal with IOMMU, etc. once dma-buf lands.

Yeah, the base address adjusting to pick the right window looks as
expected. I think how you get at the list of offsets for all the pages
of a bo is the more interesting thing. For p2p or system memory you
get an sg table with dma addresses (which you then need to correct for
where that is in gpu pa and stuff into the gpu vm ptes, ofc). For the
direct access/xgmi case I can see a bunch of options: one of them
would be to forgo the import and just directly upcast to the other
driver's amdgpu_bo and access everything directly. That seems to be
what you're doing here. The other extreme would be to somehow augment
the sg table to be able to handle xgmi addresses, I think.

Not sure where we should land on this spectrum for upstream.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Preparation for unpinned DMA-buf handling

On Wed, May 15, 2019 at 4:54 AM Daniel Vetter wrote:
> For the big plan I've seen the dynamic dma-buf and p2p stuff, I think
> it'd be good if we also make sure xgmi and related things integrate
> somewhat neatly. Of course xgmi requires that we upcast the
> dma_buf_attachment to something driver specific (I don't think we
> should make that generic, but maybe Jerome has different ideas with
> HMM). Question I have is how much of the map/unmap/invalidate and
> passing sg tables around we can reuse for this.

xgmi is an optimization in the driver. If a dma-buf comes in, we'd
check the dma-buf pointers to verify that it is an amdgpu buffer, then
we'd check if the other GPU is connected via xgmi or not. If so, we'd
use the internal xgmi address when setting up our GPUVM address space;
otherwise, we'd use the dma address.

The code to actually enable sharing over xgmi (or p2p for that matter)
is not upstream yet. It's pretty trivial though once the prerequisites
are in place.

Here's the patch to enable xgmi rather than p2p when it's available:
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-19.10=7a443319db2e614114b119015daa4e2c1ed9474d

And the original patch to add p2p support is mixed into this giant
squashed commit (I'll see if I can dig up the original patch):
https://cgit.freedesktop.org/~agd5f/linux/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c?h=amd-19.10=67f5976ee4842d01af79144a8355a040ed36b6d2

The relevant change is this hunk here:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index d4d3424..49dd501 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1674,9 +1677,15 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
 		exclusive = reservation_object_get_excl(bo->tbo.resv);
 	}

-	if (bo)
+	if (bo) {
 		flags = amdgpu_ttm_tt_pte_flags(adev, bo->tbo.ttm, mem);
-	else
+		bo_adev = amdgpu_ttm_adev(bo->tbo.bdev);
+		if (mem && mem->mem_type == TTM_PL_VRAM &&
+		    adev != bo_adev) {
+			flags |= AMDGPU_PTE_SYSTEM;
+			vram_base_offset = bo_adev->gmc.aper_base;
+		}
+	} else
 		flags = 0x0;

 	if (clear || (bo && bo->tbo.resv == vm->root.base.bo->tbo.resv))

We just treat the remote BAR like we would for system memory mapped
into the GPUVM. This will obviously need some updating to properly
deal with IOMMU, etc. once dma-buf lands.

Alex
Re: Preparation for unpinned DMA-buf handling

On Tue, May 14, 2019 at 04:05:13PM +0200, Christian König wrote:
> Hi Daniel,
>
> since we are not moving forward with this I've separated the change
> out of the larger patchset.
>
> Any objections to this now? It's basically a 1:1 move of the
> functionality from DRM to DMA-buf.

For my big picture understanding, and because I spotted in the latest
rocm release that apparently the fancy interconnect is now supported:
I looked around a bit in upstream, and basic xgmi support seems to be
there already, but I haven't seen anything that looks like xgmi buffer
sharing. Am I missing something, or am I looking at the wrong thing?
(xgmi seems to be the engineering name for that fancy interconnect, at
least.)

For the big plan I've seen the dynamic dma-buf and p2p stuff; I think
it'd be good if we also make sure xgmi and related things integrate
somewhat neatly. Of course xgmi requires that we upcast the
dma_buf_attachment to something driver specific (I don't think we
should make that generic, but maybe Jerome has different ideas with
HMM). The question I have is how much of the map/unmap/invalidate
machinery and passing sg tables around we can reuse for this.

If you have patches/pointers it would be really interesting to read
them a bit.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch