Re: [Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()
On Wed, Feb 06, 2019 at 05:46:51PM +0100, Noralf Trønnes wrote: > > > Den 06.02.2019 16.26, skrev Daniel Vetter: > > On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote: > >> > >> > >> Den 05.02.2019 17.31, skrev Daniel Vetter: > >>> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote: > > > Den 05.02.2019 10.11, skrev Daniel Vetter: > > On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote: > >> > >> > >> Den 04.02.2019 16.41, skrev Daniel Vetter: > >>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote: > The only thing now that makes drm_dev_unplug() special is that it > sets > drm_device->unplugged. Move this code to drm_dev_unregister() so > that we > can remove drm_dev_unplug(). > > Signed-off-by: Noralf Trønnes > --- > >> > >> [...] > >> > drivers/gpu/drm/drm_drv.c | 27 +++ > include/drm/drm_drv.h | 10 -- > 2 files changed, 19 insertions(+), 18 deletions(-) > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > index 05bbc2b622fc..e0941200edc6 100644 > --- a/drivers/gpu/drm/drm_drv.c > +++ b/drivers/gpu/drm/drm_drv.c > @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit); > */ > void drm_dev_unplug(struct drm_device *dev) > { > -/* > - * After synchronizing any critical read section is guaranteed > to see > - * the new value of ->unplugged, and any critical section which > might > - * still have seen the old value of ->unplugged is guaranteed > to have > - * finished. > - */ > -dev->unplugged = true; > -synchronize_srcu(_unplug_srcu); > - > drm_dev_unregister(dev); > drm_dev_put(dev); > } > @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register); > * drm_dev_register() but does not deallocate the device. The > caller must call > * drm_dev_put() to drop their final reference. > * > - * A special form of unregistering for hotpluggable devices is > drm_dev_unplug(), > - * which can be called while there are still open users of @dev. 
> + * This function can be called while there are still open users of > @dev as long > + * as the driver protects its device resources using > drm_dev_enter() and > + * drm_dev_exit(). > * > * This should be called first in the device teardown code to make > sure > - * userspace can't access the device instance any more. > + * userspace can't access the device instance any more. Drivers > that support > + * device unplug will probably want to call > drm_atomic_helper_shutdown() first > >>> > >>> Read once more with a bit more coffee, spotted this: > >>> > >>> s/first/afterwards/ - shutting down the hw before we've taken it away > >>> from > >>> userspace is kinda the wrong way round. It should be the inverse of > >>> driver > >>> load, which is 1) allocate structures 2) prep hw 3) register driver > >>> with > >>> the world (simplified ofc). > >>> > >> > >> The problem is that drm_dev_unregister() sets the device as unplugged > >> and if drm_atomic_helper_shutdown() is called afterwards it's not > >> allowed to touch hardware. > >> > >> I know it's the wrong order, but the only way to do it in the right > >> order is to have a separate function that sets unplugged: > >> > >>drm_dev_unregister(); > >>drm_atomic_helper_shutdown(); > >>drm_dev_set_unplugged(); > > > > Annoying ... but yeah calling _shutdown() before we stopped userspace is > > also not going to work. Because userspace could quickly re-enable > > something, and then the refcounts would be all wrong again and leaking > > objects. > > > > What happens with a USB device that is unplugged with open userspace, > will that leak objects? > >>> > >>> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run > >>> as normal, the only thing that should be skipped is actually touching the > >>> hw (as long as the driver doesn't protect too much with > >>> drm_dev_enter/exit). So all the software updates (including refcounting > >>> updates) will still be done. 
Ofc current udl is not yet atomic, so in > >>> reality something else happens. > >>> > >>> And we ofc still have the same issue that if you just unload the driver, > >>> then the hw will stay on (which might really confuse the driver on next > >>> load, when it assumes that it only gets loaded
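The drm_dev_enter()/drm_dev_exit() protection being discussed can be sketched in plain C. This is an illustrative userspace analogue only: the real helpers use SRCU, so that the `synchronize_srcu()` in the teardown path waits for in-flight critical sections to drain; here a plain flag and the `fake_*` names stand in for the kernel machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace sketch of the drm_dev_enter()/drm_dev_exit() guard
 * pattern from the thread above. The real kernel code uses SRCU
 * (dev->unplugged plus synchronize_srcu()) so that readers already
 * inside a critical section finish before teardown proceeds; a plain
 * flag is used here purely for illustration. */
struct fake_drm_device {
	bool unplugged;
};

/* Returns true if it is safe to touch the hardware; the caller must
 * pair it with fake_dev_exit() (a no-op in this sketch). */
static bool fake_dev_enter(struct fake_drm_device *dev)
{
	return !dev->unplugged;
}

static void fake_dev_exit(struct fake_drm_device *dev)
{
	(void)dev; /* real code: srcu_read_unlock() */
}

/* Driver-side hardware access, skipped once the device is gone. */
static int fake_commit_hw(struct fake_drm_device *dev, int *hw_writes)
{
	if (!fake_dev_enter(dev))
		return -19; /* -ENODEV */
	(*hw_writes)++;    /* stand-in for register writes */
	fake_dev_exit(dev);
	return 0;
}
```

This also shows why calling drm_atomic_helper_shutdown() after the device is marked unplugged is a no-op on the hardware side: every register access sits behind the enter/exit guard, while the software-state updates outside the guard still run.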
Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
I see. I will drop this. How about the other two patches? Regards, Yong From: Christian König Sent: Wednesday, February 6, 2019 1:20 PM To: Alex Deucher; Zhao, Yong Cc: Kuehling, Felix; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() Am 06.02.19 um 19:17 schrieb Alex Deucher: > On Wed, Feb 6, 2019 at 12:01 PM Zhao, Yong wrote: >> Not too sure about this, but it looks strange in the code when all other >> similar code uses 64-bit. > It's worth double checking, but Felix is right. A number of IPs still > used 32 bit doorbells on vega10. E.g., multi-media for example. I agree, the IH is certainly a 32bit doorbell on Vega10. Christian. > > Alex > >> "Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell >> indexes". You mean the alternative is to multiply all the current 64 >> doorbell index constants with 2, right? That might be easier and cleaner, >> and we need to make sure that the *2 and << 1 conversions from 64-bit index >> to dword index are all removed. >> >> Regards, >> Yong >> >> From: Kuehling, Felix >> Sent: Wednesday, February 6, 2019 11:26 AM >> To: Zhao, Yong; amd-gfx@lists.freedesktop.org >> Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() >> >> Are you sure about this? Typically 64-bit doorbells don't wrap around. But >> this one does. If the IH doorbell wraps around, there is no reason why it >> needs to be 64-bit, so I suspect it may still be a 32-bit doorbell. >> >> AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. >> Therefore, maybe some of your other renaming changes should be reconsidered >> if they assume that all doorbells are 64-bit. Maybe it makes more sense to >> think of 64-bit doorbells as using 2 doorbell indexes.
>> >> Regards, >> Felix >> >> >> From: amd-gfx on behalf of Zhao, >> Yong >> Sent: Wednesday, February 6, 2019 10:49 AM >> To: amd-gfx@lists.freedesktop.org >> Cc: Zhao, Yong >> Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() >> >> Clearly, it should be a 64-bit doorbell operation. >> >> Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c >> Signed-off-by: Yong Zhao >> --- >> drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c >> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c >> index 796004896661..36f0e3cada30 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c >> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c >> @@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device >> *adev, >> if (ih->use_doorbell) { >> /* XXX check if swapping is necessary on BE */ >> *ih->rptr_cpu = ih->rptr; >> - WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr); >> + WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr); >> } else if (ih == &adev->irq.ih) { >> WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr); >> } else if (ih == &adev->irq.ih1) { >> -- >> 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
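Felix's point that a 64-bit doorbell effectively "uses 2 doorbell indexes" is just dword addressing, and the `*2` / `<< 1` conversions Yong mentions reduce to the sketch below (function name is made up for the example, not amdgpu code):

```c
#include <assert.h>
#include <stdint.h>

/* Doorbell apertures are addressed in 32-bit dwords, so one 64-bit
 * doorbell occupies two consecutive dword slots. Converting a 64-bit
 * doorbell index to a dword index is therefore a shift by one - the
 * "*2 and << 1 conversions" discussed in the thread. */
static uint32_t doorbell64_to_dw_index(uint32_t idx64)
{
	return idx64 << 1; /* each 64-bit slot spans two 32-bit slots */
}
```

This is also why mixing WDOORBELL32 and WDOORBELL64 on the same index scheme is error-prone: a 32-bit write only touches the first of the two dword slots a 64-bit doorbell would cover.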
RE: [PATCH 2/5] drm/amdgpu: Fix a bug in setting CP_MEC_DOORBELL_RANGE_UPPER on SOC15
Hey Oak, Interesting you should ask since this has been discussed a lot here recently. So I didn't know until recently that the nbif routes doorbells only using 11:2. So in each 4k page there is an area for the various doorbell recipients including cp gfx and compute. The doorbell upper/lower in the CP only control the setting of a doorbell updated bit and currently uses the full 27:2. The CP also uses the full 27:2 to match the doorbell to a queue slot and initiate work. So for your example if you set the ranges to 4k then the CP would not consider a 6k address to match the MEC range and would not actually trigger this doorbell updated logic. This logic is only needed for waking up an idle scheduler and for power gating support. We are changing the hardware in the doorbell range check to only look at 11:2 to match the nbif routing logic. With the current hardware I am not even sure how you configure the range registers if you are using PASID in bits 27:12 of the doorbell. The default for those registers says that compute gets everything from gfx (0x12) up to 3f. How do you actually configure these currently, do you leave these defaults? That seems like the only way to have it pseudo work without the coming hardware change. Thanks Clint -Original Message- From: Zeng, Oak Sent: Wednesday, February 06, 2019 10:23 AM To: Zhao, Yong ; amd-gfx@lists.freedesktop.org; Knott, William ; Yang, Alice (SRDC 3D) Cc: Zhao, Yong Subject: RE: [PATCH 2/5] drm/amdgpu: Fix a bug in setting CP_MEC_DOORBELL_RANGE_UPPER on SOC15 + Clint/Alice Hi Clint, We think CP_MEC_DOORBELL_RANGE_LOWER/UPPER registers are used by MEC to check whether a doorbell routed to MEC belongs to MEC, is this understanding correct? From the register spec, those registers are 27 bits. Does this mean MEC use all 27 bits to determine? For example, if we set lower/upper to [0, 4k], will a doorbell ring at 6K address be ignored by MEC? 
Thanks, Oak -Original Message- From: amd-gfx On Behalf Of Zhao, Yong Sent: Tuesday, February 5, 2019 3:31 PM To: amd-gfx@lists.freedesktop.org Cc: Zhao, Yong Subject: [PATCH 2/5] drm/amdgpu: Fix a bug in setting CP_MEC_DOORBELL_RANGE_UPPER on SOC15 Because CP can use all doorbells outside the ones reserved for SDMA, IH, and VCN, so the value set to CP_MEC_DOORBELL_RANGE_UPPER should be the maximal index possible in a page. Change-Id: I402a56ce9a80e6c2ed2f96be431ae71ca88e73a4 Signed-off-by: Yong Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 2 +- drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c | 3 +++ drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c | 3 +++ 4 files changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h index 5c8d04c353d0..90eca63605ea 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h @@ -73,6 +73,7 @@ struct amdgpu_doorbell_index { } uvd_vce; }; uint32_t max_assignment; + uint32_t last_idx; /* Per engine SDMA doorbell size in dword */ uint32_t dw_range_per_sdma_eng; }; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 262ee3cf6f1c..0278e3ab6b94 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -2998,7 +2998,7 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring) WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER, (adev->doorbell_index.kiq * 2) << 2); WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER, - (adev->doorbell_index.userqueue_end * 2) << 2); + (adev->doorbell_index.last_idx * 2) << 2); } WREG32_SOC15(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL, diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c index d2409df2dde9..9eb8c9209231 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c +++ 
b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c @@ -88,5 +88,8 @@ void vega10_doorbell_index_init(struct amdgpu_device *adev) (adev->doorbell_index.sdma_engine[1] - adev->doorbell_index.sdma_engine[0]) * adev->doorbell_index.entry_dw_size; + + adev->doorbell_index.last_idx = PAGE_SIZE + / (sizeof(uint32_t) * adev->doorbell_index.entry_dw_size) - 1; } diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c index b28c5999d8f0..aa8c7699c689 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c +++ b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c @@ -91,5 +91,8 @@ void vega20_doorbell_index_init(struct amdgpu_device *adev)
Re: [PATCH] drm: enable uncached DMA optimization for ARM and arm64
Am 06.02.19 um 18:23 schrieb Ard Biesheuvel: On Fri, 25 Jan 2019 at 11:35, Ard Biesheuvel wrote: On Fri, 25 Jan 2019 at 12:30, Christian König wrote: Am 25.01.19 um 09:43 schrieb Ard Biesheuvel: On Thu, 24 Jan 2019 at 15:01, Alex Deucher wrote: On Thu, Jan 24, 2019 at 9:00 AM Ard Biesheuvel wrote: On Thu, 24 Jan 2019 at 13:31, Koenig, Christian wrote: Am 24.01.19 um 13:06 schrieb Ard Biesheuvel: The DRM driver stack is designed to work with cache coherent devices only, but permits an optimization to be enabled in some cases, where for some buffers, both the CPU and the GPU use uncached mappings, removing the need for DMA snooping and allocation in the CPU caches. The use of uncached GPU mappings relies on the correct implementation of the PCIe NoSnoop TLP attribute by the platform, otherwise the GPU will use cached mappings nonetheless. On x86 platforms, this does not seem to matter, as uncached CPU mappings will snoop the caches in any case. However, on ARM and arm64, enabling this optimization on a platform where NoSnoop is ignored results in loss of coherency, which breaks correct operation of the device. Since we have no way of detecting whether NoSnoop works or not, just disable this optimization entirely for ARM and arm64. Cc: Christian Koenig Cc: Alex Deucher Cc: David Zhou Cc: Huang Rui Cc: Junwei Zhang Cc: Michel Daenzer Cc: David Airlie Cc: Daniel Vetter Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Sean Paul Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Will Deacon Cc: Christoph Hellwig Cc: Robin Murphy Cc: amd-gfx list Cc: dri-devel Reported-by: Carsten Haitzler Signed-off-by: Ard Biesheuvel The subject line should probably read "disable uncached...". Ugh, of course ... With that fixed the patch is Reviewed-by: Christian König . Same: Reviewed-by: Alex Deucher Thanks all Should I resend the patch with the subject corrected? I will update the subject line and push it upstream through drm-misc-next if nobody objects. Wonderful, thanks. 
Hi Christian, Are you still planning to merge this for v5.1? My bad, only pushed this to our internal branch, but forgot our drm-misc-next. Fixed now, thanks for the reminder. Christian. Thanks, Ard. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
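The policy the patch implements can be sketched as a runtime check (the real kernel helper is a static inline doing preprocessor checks on CONFIG_* symbols in the DRM headers; the enum and names below are illustrative only):

```c
#include <assert.h>
#include <stdbool.h>

/* Runtime sketch of the write-combine policy discussed above. Uncached
 * (WC) GPU buffers are only safe where PCIe NoSnoop is known to behave
 * or where uncached CPU mappings snoop anyway (x86), so ARM and arm64
 * opt out entirely since NoSnoop may be silently ignored there. */
enum fake_arch { ARCH_X86, ARCH_PPC_COHERENT, ARCH_ARM, ARCH_ARM64 };

static bool can_use_wc_memory(enum fake_arch arch)
{
	switch (arch) {
	case ARCH_ARM:
	case ARCH_ARM64:
		return false; /* NoSnoop may be ignored: coherency would break */
	case ARCH_PPC_COHERENT:
		return false; /* cache-coherent PPC already opted out */
	default:
		return true;  /* x86: uncached CPU mappings snoop the caches */
	}
}
```

The key asymmetry the commit message calls out is that a wrong answer here is harmless on x86 (just slower) but breaks coherency outright on ARM platforms where the GPU keeps using cached mappings.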
Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
Am 06.02.19 um 19:17 schrieb Alex Deucher: On Wed, Feb 6, 2019 at 12:01 PM Zhao, Yong wrote: Not too sure about this, but it looks strange in the code when all other similar code uses 64-bit. It's worth double checking, but Felix is right. A number of IPs still used 32 bit doorbells on vega10. E.g., multi-media for example. I agree, the IH is certainly a 32bit doorbell on Vega10. Christian. Alex "Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell indexes". You mean the alternative is to multiply all the current 64 doorbell index constants with 2, right? That might be easier and cleaner, and we need to make sure that the *2 and << 1 conversions from 64-bit index to dword index are all removed. Regards, Yong From: Kuehling, Felix Sent: Wednesday, February 6, 2019 11:26 AM To: Zhao, Yong; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() Are you sure about this? Typically 64-bit doorbells don't wrap around. But this one does. If the IH doorbell wraps around, there is no reason why it needs to be 64-bit, so I suspect it may still be a 32-bit doorbell. AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. Therefore, maybe some of your other renaming changes should be reconsidered if they assume that all doorbells are 64-bit. Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell indexes. Regards, Felix From: amd-gfx on behalf of Zhao, Yong Sent: Wednesday, February 6, 2019 10:49 AM To: amd-gfx@lists.freedesktop.org Cc: Zhao, Yong Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() Clearly, it should be a 64-bit doorbell operation. 
Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c Signed-off-by: Yong Zhao --- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c index 796004896661..36f0e3cada30 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c @@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev, if (ih->use_doorbell) { /* XXX check if swapping is necessary on BE */ *ih->rptr_cpu = ih->rptr; - WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr); + WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr); } else if (ih == &adev->irq.ih) { WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr); } else if (ih == &adev->irq.ih1) { -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
On Wed, Feb 6, 2019 at 12:01 PM Zhao, Yong wrote: > > Not too sure about this, but it looks strange in the code when all other > similar code uses 64-bit. It's worth double checking, but Felix is right. A number of IPs still used 32 bit doorbells on vega10. E.g., multi-media for example. Alex > > "Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell > indexes". You mean the alternative is to multiply all the current 64 doorbell > index constants with 2, right? That might be easier and cleaner, and we need > to make sure that the *2 and << 1 conversions from 64-bit index to dword > index are all removed. > > Regards, > Yong > > From: Kuehling, Felix > Sent: Wednesday, February 6, 2019 11:26 AM > To: Zhao, Yong; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() > > Are you sure about this? Typically 64-bit doorbells don't wrap around. But > this one does. If the IH doorbell wraps around, there is no reason why it > needs to be 64-bit, so I suspect it may still be a 32-bit doorbell. > > AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. > Therefore, maybe some of your other renaming changes should be reconsidered > if they assume that all doorbells are 64-bit. Maybe it makes more sense to > think of 64-bit doorbells as using 2 doorbell indexes. > > Regards, > Felix > > > From: amd-gfx on behalf of Zhao, Yong > > Sent: Wednesday, February 6, 2019 10:49 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhao, Yong > Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() > > Clearly, it should be a 64-bit doorbell operation. 
> > Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c > Signed-off-by: Yong Zhao > --- > drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > index 796004896661..36f0e3cada30 100644 > --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c > @@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev, > if (ih->use_doorbell) { > /* XXX check if swapping is necessary on BE */ > *ih->rptr_cpu = ih->rptr; > - WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr); > + WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr); > } else if (ih == &adev->irq.ih) { > WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr); > } else if (ih == &adev->irq.ih1) { > -- > 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm: enable uncached DMA optimization for ARM and arm64
On Fri, 25 Jan 2019 at 11:35, Ard Biesheuvel wrote: > > On Fri, 25 Jan 2019 at 12:30, Christian König > wrote: > > > > Am 25.01.19 um 09:43 schrieb Ard Biesheuvel: > > > On Thu, 24 Jan 2019 at 15:01, Alex Deucher wrote: > > >> On Thu, Jan 24, 2019 at 9:00 AM Ard Biesheuvel > > >> wrote: > > >>> On Thu, 24 Jan 2019 at 13:31, Koenig, Christian > > >>> wrote: > > Am 24.01.19 um 13:06 schrieb Ard Biesheuvel: > > > The DRM driver stack is designed to work with cache coherent devices > > > only, but permits an optimization to be enabled in some cases, where > > > for some buffers, both the CPU and the GPU use uncached mappings, > > > removing the need for DMA snooping and allocation in the CPU caches. > > > > > > The use of uncached GPU mappings relies on the correct implementation > > > of the PCIe NoSnoop TLP attribute by the platform, otherwise the GPU > > > will use cached mappings nonetheless. On x86 platforms, this does not > > > seem to matter, as uncached CPU mappings will snoop the caches in any > > > case. However, on ARM and arm64, enabling this optimization on a > > > platform where NoSnoop is ignored results in loss of coherency, which > > > breaks correct operation of the device. Since we have no way of > > > detecting whether NoSnoop works or not, just disable this > > > optimization entirely for ARM and arm64. > > > > > > Cc: Christian Koenig > > > Cc: Alex Deucher > > > Cc: David Zhou > > > Cc: Huang Rui > > > Cc: Junwei Zhang > > > Cc: Michel Daenzer > > > Cc: David Airlie > > > Cc: Daniel Vetter > > > Cc: Maarten Lankhorst > > > Cc: Maxime Ripard > > > Cc: Sean Paul > > > Cc: Michael Ellerman > > > Cc: Benjamin Herrenschmidt > > > Cc: Will Deacon > > > Cc: Christoph Hellwig > > > Cc: Robin Murphy > > > Cc: amd-gfx list > > > Cc: dri-devel > > > Reported-by: Carsten Haitzler > > > Signed-off-by: Ard Biesheuvel > > The subject line should probably read "disable uncached...". > > > > >>> Ugh, of course ... 
> > >>> > > With that fixed the patch is Reviewed-by: Christian König > > . > > > > >> Same: > > >> Reviewed-by: Alex Deucher > > >> > > > Thanks all > > > > > > Should I resend the patch with the subject corrected? > > > > I will update the subject line and push it upstream through > > drm-misc-next if nobody objects. > > > > Wonderful, thanks. Hi Christian, Are you still planning to merge this for v5.1? Thanks, Ard. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()
Daniel Vetter writes: > > Zooming out more looking at the big picture I'd say all your work in the > past few years has enormously simplified drm for simple drivers already. > If we can't resolve this one here right now that just means you "only" > made drm 98% simpler instead of maybe 99%. It's still an epic win :-) I'd like to second this. So many of Noralf's cleanups make me think "oof, that's a lot of work for a little cleanup here". But we've benefited immensely from them accumulating over the years. Thanks again! ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
4.20 regression: short time display corruption during resume
Hi, I use an HP EliteBook 745 G5 with a Ryzen 5 PRO 2500U. With kernels 4.20 and 5.0.0-rc3 I see display corruption during resume from RAM. It's a short-lived corruption (a black screen only); after less than a second the expected image appears, so nothing critical. It's a regression though, so I somehow hope it may be easy to fix. It's caused by commit 009d9ed6c4b7 ("drm/amdgpu: Change AI gfx/sdma/smu init sequence"). I've created a bugzilla report for it: https://bugs.freedesktop.org/show_bug.cgi?id=109554 (includes a corruption photo). Could someone look at this, please? I'm happy to test any patches. -- Rafał ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
Not too sure about this, but it looks strange in the code when all other similar code uses 64-bit. "Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell indexes". You mean the alternative is to multiply all the current 64 doorbell index constants with 2, right? That might be easier and cleaner, and we need to make sure that the *2 and << 1 conversions from 64-bit index to dword index are all removed. Regards, Yong From: Kuehling, Felix Sent: Wednesday, February 6, 2019 11:26 AM To: Zhao, Yong; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() Are you sure about this? Typically 64-bit doorbells don't wrap around. But this one does. If the IH doorbell wraps around, there is no reason why it needs to be 64-bit, so I suspect it may still be a 32-bit doorbell. AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. Therefore, maybe some of your other renaming changes should be reconsidered if they assume that all doorbells are 64-bit. Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell indexes. Regards, Felix From: amd-gfx on behalf of Zhao, Yong Sent: Wednesday, February 6, 2019 10:49 AM To: amd-gfx@lists.freedesktop.org Cc: Zhao, Yong Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() Clearly, it should be a 64-bit doorbell operation. 
Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c Signed-off-by: Yong Zhao --- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c index 796004896661..36f0e3cada30 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c @@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev, if (ih->use_doorbell) { /* XXX check if swapping is necessary on BE */ *ih->rptr_cpu = ih->rptr; - WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr); + WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr); } else if (ih == &adev->irq.ih) { WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr); } else if (ih == &adev->irq.ih1) { -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()
Den 06.02.2019 16.26, skrev Daniel Vetter: > On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote: >> >> >> Den 05.02.2019 17.31, skrev Daniel Vetter: >>> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote: Den 05.02.2019 10.11, skrev Daniel Vetter: > On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote: >> >> >> Den 04.02.2019 16.41, skrev Daniel Vetter: >>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote: The only thing now that makes drm_dev_unplug() special is that it sets drm_device->unplugged. Move this code to drm_dev_unregister() so that we can remove drm_dev_unplug(). Signed-off-by: Noralf Trønnes --- >> >> [...] >> drivers/gpu/drm/drm_drv.c | 27 +++ include/drm/drm_drv.h | 10 -- 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 05bbc2b622fc..e0941200edc6 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit); */ void drm_dev_unplug(struct drm_device *dev) { - /* - * After synchronizing any critical read section is guaranteed to see - * the new value of ->unplugged, and any critical section which might - * still have seen the old value of ->unplugged is guaranteed to have - * finished. - */ - dev->unplugged = true; - synchronize_srcu(_unplug_srcu); - drm_dev_unregister(dev); drm_dev_put(dev); } @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register); * drm_dev_register() but does not deallocate the device. The caller must call * drm_dev_put() to drop their final reference. * - * A special form of unregistering for hotpluggable devices is drm_dev_unplug(), - * which can be called while there are still open users of @dev. + * This function can be called while there are still open users of @dev as long + * as the driver protects its device resources using drm_dev_enter() and + * drm_dev_exit(). 
* * This should be called first in the device teardown code to make sure - * userspace can't access the device instance any more. + * userspace can't access the device instance any more. Drivers that support + * device unplug will probably want to call drm_atomic_helper_shutdown() first >>> >>> Read once more with a bit more coffee, spotted this: >>> >>> s/first/afterwards/ - shutting down the hw before we've taken it away >>> from >>> userspace is kinda the wrong way round. It should be the inverse of >>> driver >>> load, which is 1) allocate structures 2) prep hw 3) register driver with >>> the world (simplified ofc). >>> >> >> The problem is that drm_dev_unregister() sets the device as unplugged >> and if drm_atomic_helper_shutdown() is called afterwards it's not >> allowed to touch hardware. >> >> I know it's the wrong order, but the only way to do it in the right >> order is to have a separate function that sets unplugged: >> >> drm_dev_unregister(); >> drm_atomic_helper_shutdown(); >> drm_dev_set_unplugged(); > > Annoying ... but yeah calling _shutdown() before we stopped userspace is > also not going to work. Because userspace could quickly re-enable > something, and then the refcounts would be all wrong again and leaking > objects. > What happens with a USB device that is unplugged with open userspace, will that leak objects? >>> >>> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run >>> as normal, the only thing that should be skipped is actually touching the >>> hw (as long as the driver doesn't protect too much with >>> drm_dev_enter/exit). So all the software updates (including refcounting >>> updates) will still be done. Ofc current udl is not yet atomic, so in >>> reality something else happens. 
>>> >>> And we ofc still have the same issue that if you just unload the driver, >>> then the hw will stay on (which might really confuse the driver on next >>> load, when it assumes that it only gets loaded from cold boot where >>> everything is off - which usually is the case on an arm soc at least). >>> > I get a bit the feeling we're over-optimizing here with trying to devm-ize > drm_dev_register. Just getting drm_device correctly devm-ized is a big > step forward already, and will open up a lot of TODO
[PATCH 2/3] drm/amdkfd: avoid HMM change cause circular lock dependency v2
There is circular lock between gfx and kfd path with HMM change: lock(dqm) -> bo::reserve -> amdgpu_mn_lock To avoid this, move init/uninit_mqd() out of lock(dqm), to remove nested locking between mmap_sem and bo::reserve. The locking order is: bo::reserve -> amdgpu_mn_lock(p->mn) Change-Id: I2ec09a47571f6b4c8eaef93f22c0a600f5f70153 Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Acked-by: Christian König --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 32 ++- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 8372556b52eb..efe0d3c0215b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1156,21 +1156,17 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, int retval; struct mqd_manager *mqd_mgr; - retval = 0; - - dqm_lock(dqm); - if (dqm->total_queue_count >= max_num_of_queues_per_device) { pr_warn("Can't create new usermode queue because %d queues were already created\n", dqm->total_queue_count); retval = -EPERM; - goto out_unlock; + goto out; } if (q->properties.type == KFD_QUEUE_TYPE_SDMA) { retval = allocate_sdma_queue(dqm, &q->sdma_id); if (retval) - goto out_unlock; + goto out; q->properties.sdma_queue_id = q->sdma_id / get_num_sdma_engines(dqm); q->properties.sdma_engine_id = @@ -1181,6 +1177,9 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, if (retval) goto out_deallocate_sdma_queue; + /* Do init_mqd before dqm_lock(dqm) to avoid circular locking order: +* lock(dqm) -> bo::reserve +*/ mqd_mgr = dqm->ops.get_mqd_manager(dqm, get_mqd_type_from_queue_type(q->properties.type)); @@ -1188,6 +1187,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, retval = -ENOMEM; goto out_deallocate_doorbell; } + /* * Eviction state logic: we only mark active queues as 
evicted * to avoid the overhead of restoring inactive queues later @@ -1196,9 +1196,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, q->properties.is_evicted = (q->properties.queue_size > 0 && q->properties.queue_percent > 0 && q->properties.queue_address != 0); - dqm->asic_ops.init_sdma_vm(dqm, q, qpd); - q->properties.tba_addr = qpd->tba_addr; q->properties.tma_addr = qpd->tma_addr; retval = mqd_mgr->init_mqd(mqd_mgr, >mqd, >mqd_mem_obj, @@ -1206,6 +1204,8 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, if (retval) goto out_deallocate_doorbell; + dqm_lock(dqm); + list_add(>list, >queues_list); qpd->queue_count++; if (q->properties.is_active) { @@ -1233,9 +1233,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, out_deallocate_sdma_queue: if (q->properties.type == KFD_QUEUE_TYPE_SDMA) deallocate_sdma_queue(dqm, q->sdma_id); -out_unlock: - dqm_unlock(dqm); - +out: return retval; } @@ -1398,8 +1396,6 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm, qpd->reset_wavefronts = true; } - mqd_mgr->uninit_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj); - /* * Unconditionally decrement this counter, regardless of the queue's * type @@ -1410,6 +1406,9 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm, dqm_unlock(dqm); + /* Do uninit_mqd after dqm_unlock(dqm) to avoid circular locking */ + mqd_mgr->uninit_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj); + return retval; failed: @@ -1631,7 +1630,11 @@ static int process_termination_cpsch(struct device_queue_manager *dqm, qpd->reset_wavefronts = false; } - /* lastly, free mqd resources */ + dqm_unlock(dqm); + + /* Lastly, free mqd resources. +* Do uninit_mqd() after dqm_unlock to avoid circular locking. 
+*/ list_for_each_entry_safe(q, next, >queues_list, list) { mqd_mgr = dqm->ops.get_mqd_manager(dqm, get_mqd_type_from_queue_type(q->properties.type)); @@ -1645,7 +1648,6 @@ static int process_termination_cpsch(struct device_queue_manager *dqm, } out: - dqm_unlock(dqm); return
[PATCH 1/3] drm/amdgpu: use HMM mirror callback to replace mmu notifier v7
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/Kconfig | 6 +- drivers/gpu/drm/amd/amdgpu/Makefile| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 154 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h | 2 +- 4 files changed, 70 insertions(+), 94 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 9221e5489069..960a63355705 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK config DRM_AMDGPU_USERPTR bool "Always enable userptr write support" depends on DRM_AMDGPU - select MMU_NOTIFIER + select HMM_MIRROR help - This option selects CONFIG_MMU_NOTIFIER if it isn't already - selected to enabled full userptr support. + This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it + isn't already selected to enabled full userptr support. 
config DRM_AMDGPU_GART_DEBUGFS bool "Allow GART access through debugfs" diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 466da5954a68..851001ced5e8 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -172,7 +172,7 @@ endif amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o -amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o +amdgpu-$(CONFIG_HMM_MIRROR) += amdgpu_mn.o include $(FULL_AMD_PATH)/powerplay/Makefile diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index 3e6823fdd939..e356867d2308 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -45,7 +45,7 @@ #include #include -#include +#include #include #include #include @@ -58,7 +58,6 @@ * * @adev: amdgpu device pointer * @mm: process address space - * @mn: MMU notifier structure * @type: type of MMU notifier * @work: destruction work item * @node: hash table node to find structure by adev and mn @@ -66,6 +65,7 @@ * @objects: interval tree containing amdgpu_mn_nodes * @read_lock: mutex for recursive locking of @lock * @recursion: depth of recursion + * @mirror: HMM mirror function support * * Data for each amdgpu device and process address space. 
*/ @@ -73,7 +73,6 @@ struct amdgpu_mn { /* constant after initialisation */ struct amdgpu_device*adev; struct mm_struct*mm; - struct mmu_notifier mn; enum amdgpu_mn_type type; /* only used on destruction */ @@ -85,8 +84,9 @@ struct amdgpu_mn { /* objects protected by lock */ struct rw_semaphore lock; struct rb_root_cached objects; - struct mutexread_lock; - atomic_trecursion; + + /* HMM mirror */ + struct hmm_mirror mirror; }; /** @@ -103,7 +103,7 @@ struct amdgpu_mn_node { }; /** - * amdgpu_mn_destroy - destroy the MMU notifier + * amdgpu_mn_destroy - destroy the HMM mirror * * @work: previously sheduled work item * @@ -129,28 +129,26 @@ static void amdgpu_mn_destroy(struct work_struct *work) } up_write(>lock); mutex_unlock(>mn_lock); - mmu_notifier_unregister_no_release(>mn, amn->mm); + + hmm_mirror_unregister(>mirror); kfree(amn); } /** - * amdgpu_mn_release - callback to notify about mm destruction + * amdgpu_hmm_mirror_release - callback to notify about mm destruction * - * @mn: our notifier - * @mm: the mm this callback is about + * @mirror: the HMM mirror (mm) this callback is about * - * Shedule a work item to lazy destroy our notifier. + * Shedule a work item to lazy destroy HMM mirror. */ -static void amdgpu_mn_release(struct mmu_notifier *mn, - struct mm_struct *mm) +static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror) { - struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn); + struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror); INIT_WORK(>work, amdgpu_mn_destroy); schedule_work(>work); } - /** * amdgpu_mn_lock - take the write side lock for this notifier * @@ -181,14 +179,10 @@ void amdgpu_mn_unlock(struct amdgpu_mn *mn) static int amdgpu_mn_read_lock(struct amdgpu_mn *amn, bool blockable) { if (blockable) - mutex_lock(>read_lock); - else if (!mutex_trylock(>read_lock)) + down_read(>lock); + else if
[PATCH 0/3] Use HMM to replace get_user_pages
Hi Christian, Resending patches 1/3 and 2/3 with the Reviewed-by tags added in the comments. Patch 3/3 is changed so that amdgpu_cs_submit()/amdgpu_cs_ioctl() return -EAGAIN to user space, which then retries the CS ioctl. Regards, Philip Philip Yang (3): drm/amdgpu: use HMM mirror callback to replace mmu notifier v7 drm/amdkfd: avoid HMM change cause circular lock dependency v2 drm/amdgpu: replace get_user_pages with HMM address mirror helpers v8 drivers/gpu/drm/amd/amdgpu/Kconfig| 6 +- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 - .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 95 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 138 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 14 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c| 179 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h| 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 178 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 32 ++-- 12 files changed, 269 insertions(+), 387 deletions(-) -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
Are you sure about this? Typically 64-bit doorbells don't wrap around. But this one does. If the IH doorbell wraps around, there is no reason why it needs to be 64-bit, so I suspect it may still be a 32-bit doorbell. AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. Therefore, maybe some of your other renaming changes should be reconsidered if they assume that all doorbells are 64-bit. Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell indexes. Regards, Felix From: amd-gfx on behalf of Zhao, Yong Sent: Wednesday, February 6, 2019 10:49 AM To: amd-gfx@lists.freedesktop.org Cc: Zhao, Yong Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr() Clearly, it should be a 64-bit doorbell operation. Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c Signed-off-by: Yong Zhao --- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c index 796004896661..36f0e3cada30 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c @@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev, if (ih->use_doorbell) { /* XXX check if swapping is necessary on BE */ *ih->rptr_cpu = ih->rptr; - WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr); + WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr); } else if (ih == >irq.ih) { WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr); } else if (ih == >irq.ih1) { -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v8
Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart from scratch, for kfd, restore worker is rescheduled to retry. HMM simplify the CPU page table concurrent update check, so remove guptasklock, mmu_invalidations, last_set_pages fields from amdgpu_ttm_tt struct. HMM does not pin the page (increase page ref count), so remove related operations like release_pages(), put_page(), mark_page_dirty(). Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67 Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 - .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 95 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 138 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 14 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c| 25 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h| 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 178 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 +- 9 files changed, 182 insertions(+), 278 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 0b31a1859023..0e1711a75b68 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -61,7 +61,6 @@ struct kgd_mem { atomic_t invalid; struct amdkfd_process_info *process_info; - struct page **user_pages; struct amdgpu_sync sync; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index d7b10d79f1de..ae2d838d31ea 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, struct 
mm_struct *mm, goto out; } - /* If no restore worker is running concurrently, user_pages -* should not be allocated -*/ - WARN(mem->user_pages, "Leaking user_pages array"); - - mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages, - sizeof(struct page *), - GFP_KERNEL | __GFP_ZERO); - if (!mem->user_pages) { - pr_err("%s: Failed to allocate pages array\n", __func__); - ret = -ENOMEM; - goto unregister_out; - } - - ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages); + ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages); if (ret) { pr_err("%s: Failed to get user pages: %d\n", __func__, ret); - goto free_out; + goto unregister_out; } - amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages); - ret = amdgpu_bo_reserve(bo, true); if (ret) { pr_err("%s: Failed to reserve BO\n", __func__); @@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, struct mm_struct *mm, amdgpu_bo_unreserve(bo); release_out: - if (ret) - release_pages(mem->user_pages, bo->tbo.ttm->num_pages); -free_out: - kvfree(mem->user_pages); - mem->user_pages = NULL; + amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); unregister_out: if (ret) amdgpu_mn_unregister(bo); @@ -679,7 +659,6 @@ static int reserve_bo_and_vm(struct kgd_mem *mem, ctx->kfd_bo.priority = 0; ctx->kfd_bo.tv.bo = >tbo; ctx->kfd_bo.tv.num_shared = 1; - ctx->kfd_bo.user_pages = NULL; list_add(>kfd_bo.tv.head, >list); amdgpu_vm_get_pd_bo(vm, >list, >vm_pd[0]); @@ -743,7 +722,6 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem, ctx->kfd_bo.priority = 0; ctx->kfd_bo.tv.bo = >tbo; ctx->kfd_bo.tv.num_shared = 1; - ctx->kfd_bo.user_pages = NULL; list_add(>kfd_bo.tv.head, >list); i = 0; @@ -1371,15 +1349,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( list_del(_list_entry->head); mutex_unlock(_info->lock); - /* Free user pages if necessary */ - if (mem->user_pages) { - pr_debug("%s: Freeing user_pages array\n", __func__); - if (mem->user_pages[0]) - release_pages(mem->user_pages, - 
mem->bo->tbo.ttm->num_pages); - kvfree(mem->user_pages); - } - ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, ); if (unlikely(ret)) return ret; @@ -1855,25 +1824,11 @@ static int update_invalid_user_pages(struct
Re: [PATCH v2 2/2] drm/amd/powerplay: add override pcie parameters for Vega20 (v2)
Reviewed-by: Eric Huang On 2019-02-05 4:37 p.m., Kasiviswanathan, Harish wrote: > v2: Fix SMU message format > Send override message after SMU enable features > > Change-Id: Ib880c83bc7aa12be370cf6619acfe66e12664c9c > Signed-off-by: Harish Kasiviswanathan > --- > drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 45 > +- > 1 file changed, 27 insertions(+), 18 deletions(-) > > diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > index da022ca..e1b1656 100644 > --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > @@ -771,40 +771,49 @@ static int vega20_init_smc_table(struct pp_hwmgr *hwmgr) > return 0; > } > > +/* > + * Override PCIe link speed and link width for DPM Level 1. PPTable entries > + * reflect the ASIC capabilities and not the system capabilities. For e.g. > + * Vega20 board in a PCI Gen3 system. In this case, when SMU's tries to > switch > + * to DPM1, it fails as system doesn't support Gen4. 
> + */ > static int vega20_override_pcie_parameters(struct pp_hwmgr *hwmgr) > { > struct amdgpu_device *adev = (struct amdgpu_device *)(hwmgr->adev); > - uint32_t pcie_speed = 0, pcie_width = 0, pcie_arg; > + uint32_t pcie_gen = 0, pcie_width = 0, smu_pcie_arg; > int ret; > > if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN4) > - pcie_speed = 16; > + pcie_gen = 3; > else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3) > - pcie_speed = 8; > + pcie_gen = 2; > else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2) > - pcie_speed = 5; > + pcie_gen = 1; > else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN1) > - pcie_speed = 2; > + pcie_gen = 0; > > if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X32) > - pcie_width = 32; > + pcie_width = 7; > else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X16) > - pcie_width = 16; > + pcie_width = 6; > else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X12) > - pcie_width = 12; > + pcie_width = 5; > else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X8) > - pcie_width = 8; > - else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X4) > pcie_width = 4; > + else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X4) > + pcie_width = 3; > else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X2) > pcie_width = 2; > else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X1) > pcie_width = 1; > > - pcie_arg = pcie_width | (pcie_speed << 8); > - > + /* Bit 31:16: LCLK DPM level. 
0 is DPM0, and 1 is DPM1 > + * Bit 15:8: PCIE GEN, 0 to 3 corresponds to GEN1 to GEN4 > + * Bit 7:0: PCIE lane width, 1 to 7 corresponds is x1 to x32 > + */ > + smu_pcie_arg = (1 << 16) | (pcie_gen << 8) | pcie_width; > ret = smum_send_msg_to_smc_with_parameter(hwmgr, > - PPSMC_MSG_OverridePcieParameters, pcie_arg); > + PPSMC_MSG_OverridePcieParameters, smu_pcie_arg); > PP_ASSERT_WITH_CODE(!ret, > "[OverridePcieParameters] Attempt to override pcie params > failed!", > return ret); > @@ -1611,11 +1620,6 @@ static int vega20_enable_dpm_tasks(struct pp_hwmgr > *hwmgr) > "[EnableDPMTasks] Failed to initialize SMC table!", > return result); > > - result = vega20_override_pcie_parameters(hwmgr); > - PP_ASSERT_WITH_CODE(!result, > - "[EnableDPMTasks] Failed to override pcie parameters!", > - return result); > - > result = vega20_run_btc(hwmgr); > PP_ASSERT_WITH_CODE(!result, > "[EnableDPMTasks] Failed to run btc!", > @@ -1631,6 +1635,11 @@ static int vega20_enable_dpm_tasks(struct pp_hwmgr > *hwmgr) > "[EnableDPMTasks] Failed to enable all smu features!", > return result); > > + result = vega20_override_pcie_parameters(hwmgr); > + PP_ASSERT_WITH_CODE(!result, > + "[EnableDPMTasks] Failed to override pcie parameters!", > + return result); > + > result = vega20_notify_smc_display_change(hwmgr); > PP_ASSERT_WITH_CODE(!result, > "[EnableDPMTasks] Failed to notify smc display change!", ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v7
Am 06.02.19 um 16:53 schrieb Yang, Philip: > >> +/* If userptr are updated after amdgpu_cs_parser_bos(), restart > >> cs */ > >> amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) { > >> struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo); > >> -if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) { > >> -r = -ERESTARTSYS; > >> -goto error_abort; > >> -} > >> +r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); > > > > Just abort when you see the first one with a problem here. > > > > There is no value in checking all of them. > > > No, amdgpu_ttm_tt_get_user_pages_done calls hmm_vma_range_done to stop > HMM track and free range structure memory, this needs go through all > userptr BOs. Well that actually sounds like an ugliness in the hmm_vma_range_done interface. Need to double check the code and maybe sync up with Jerome if that is really a good idea. But that won't affect you work, so feel free to go ahead for now. > > >> + > >> +if (r == -EAGAIN) { > >> +if (!--tries) { > >> +DRM_ERROR("Possible deadlock? Retry too many times\n"); > >> +return -EDEADLK; > >> +} > >> +goto restart; > >> +} > >> + > > > > I would still say to better just return to userspace here. > > > > Because of the way HMM works 10 retries might not be sufficient any more. > > > Yes, it looks better to handle retry from user space. The extra sys call > overhead can be ignored because this does not happen all the time. I > will submit new patch for review. Thanks, apart from the issues mentioned so far this looks good to me. Feel free to add an Acked-by: Christian König to the patch. I still need to double check if we don't have any locking problems (inversions, missing locks etc...). When this is done I can also give you and rb for the patch. Christian. 
> > Thanks, > Philip > > On 2019-02-06 4:20 a.m., Christian König wrote: >> Am 05.02.19 um 23:00 schrieb Yang, Philip: >>> Use HMM helper function hmm_vma_fault() to get physical pages backing >>> userptr and start CPU page table update track of those pages. Then use >>> hmm_vma_range_done() to check if those pages are updated before >>> amdgpu_cs_submit for gfx or before user queues are resumed for kfd. >>> >>> If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart >>> from scratch, for kfd, restore worker is rescheduled to retry. >>> >>> HMM simplify the CPU page table concurrent update check, so remove >>> guptasklock, mmu_invalidations, last_set_pages fields from >>> amdgpu_ttm_tt struct. >>> >>> HMM does not pin the page (increase page ref count), so remove related >>> operations like release_pages(), put_page(), mark_page_dirty(). >>> >>> Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67 >>> Signed-off-by: Philip Yang >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 - >>> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 95 +++--- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h | 2 +- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 158 +++- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 14 +- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 25 ++- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h | 4 +- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 178 +++--- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 +- >>> 9 files changed, 198 insertions(+), 282 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >>> index 0b31a1859023..0e1711a75b68 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >>> @@ -61,7 +61,6 @@ struct kgd_mem { >>> atomic_t invalid; >>> struct amdkfd_process_info *process_info; >>> - struct page **user_pages; >>> struct amdgpu_sync sync; >>> diff --git 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> index d7b10d79f1de..ae2d838d31ea 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> @@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, >>> struct mm_struct *mm, >>> goto out; >>> } >>> - /* If no restore worker is running concurrently, user_pages >>> - * should not be allocated >>> - */ >>> - WARN(mem->user_pages, "Leaking user_pages array"); >>> - >>> - mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages, >>> - sizeof(struct page *), >>> - GFP_KERNEL | __GFP_ZERO); >>> - if (!mem->user_pages) { >>> - pr_err("%s: Failed to allocate pages array\n", __func__); >>> - ret = -ENOMEM; >>> - goto unregister_out; >>> - } >>> - >>> - ret
Re: [PATCH 1/3] drm/amdgpu: Improve doorbell variable names
Hi Alex, I only add the corresponding changes for IH, but not for GFX as GFX does not use the similar range functions. Regards, Yong From: Zhao, Yong Sent: Wednesday, February 6, 2019 10:49 AM To: amd-gfx@lists.freedesktop.org Cc: Zhao, Yong Subject: [PATCH 1/3] drm/amdgpu: Improve doorbell variable names Indicate that the doorbell offset and range is in dwords. Change-Id: Ib0f2564ffa7b1940ffb8725cdc03f662184f5436 Signed-off-by: Yong Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 2 +- drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 10 +- drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c | 10 +- drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 10 +- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/soc15.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/tonga_ih.c| 6 +++--- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 +++--- drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c | 6 +- drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c | 6 +- 12 files changed, 40 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index d67f8b1dfe80..88b3bbcea756 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -642,13 +642,13 @@ struct amdgpu_nbio_funcs { void (*hdp_flush)(struct amdgpu_device *adev, struct amdgpu_ring *ring); u32 (*get_memsize)(struct amdgpu_device *adev); void (*sdma_doorbell_range)(struct amdgpu_device *adev, int instance, - bool use_doorbell, int doorbell_index, int doorbell_size); + bool use_doorbell, int index_in_dw, int range_dw_size); void (*enable_doorbell_aperture)(struct amdgpu_device *adev, bool enable); void (*enable_doorbell_selfring_aperture)(struct amdgpu_device *adev, bool enable); void (*ih_doorbell_range)(struct amdgpu_device *adev, - bool use_doorbell, int doorbell_index); + bool use_doorbell, int index_in_dw); void (*update_medium_grain_clock_gating)(struct 
amdgpu_device *adev, bool enable); void (*update_medium_grain_light_sleep)(struct amdgpu_device *adev, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h index 1cfec06f81d4..5c8d04c353d0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h @@ -39,6 +39,7 @@ struct amdgpu_doorbell { * can be 64-bit, so the index defined is in qword. */ struct amdgpu_doorbell_index { + uint32_t entry_dw_size; uint32_t kiq; uint32_t mec_ring0; uint32_t mec_ring1; @@ -73,7 +74,7 @@ struct amdgpu_doorbell_index { }; uint32_t max_assignment; /* Per engine SDMA doorbell size in dword */ - uint32_t sdma_doorbell_range; + uint32_t dw_range_per_sdma_eng; }; typedef enum _AMDGPU_DOORBELL_ASSIGNMENT diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h index 1ccb1831382a..2572191b394a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h @@ -33,7 +33,7 @@ struct amdgpu_iv_entry; struct amdgpu_ih_ring { unsignedring_size; uint32_tptr_mask; - u32 doorbell_index; + u32 doorbell_idx_in_dw; booluse_doorbell; booluse_bus_addr; diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c index cc967dbfd631..bcc41c957b24 100644 --- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c @@ -68,7 +68,7 @@ static u32 nbio_v6_1_get_memsize(struct amdgpu_device *adev) } static void nbio_v6_1_sdma_doorbell_range(struct amdgpu_device *adev, int instance, - bool use_doorbell, int doorbell_index, int doorbell_size) + bool use_doorbell, int index_in_dw, int range_dw_size) { u32 reg = instance == 0 ? 
SOC15_REG_OFFSET(NBIO, 0, mmBIF_SDMA0_DOORBELL_RANGE) : SOC15_REG_OFFSET(NBIO, 0, mmBIF_SDMA1_DOORBELL_RANGE); @@ -76,8 +76,8 @@ static void nbio_v6_1_sdma_doorbell_range(struct amdgpu_device *adev, int instan u32 doorbell_range = RREG32(reg); if (use_doorbell) { - doorbell_range = REG_SET_FIELD(doorbell_range, BIF_SDMA0_DOORBELL_RANGE, OFFSET, doorbell_index); - doorbell_range = REG_SET_FIELD(doorbell_range,
Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v7
>> +/* If userptr are updated after amdgpu_cs_parser_bos(), restart >> cs */ >> amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) { >> struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo); >> -if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) { >> -r = -ERESTARTSYS; >> -goto error_abort; >> -} >> +r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); > > Just abort when you see the first one with a problem here. > > There is no value in checking all of them. > No, amdgpu_ttm_tt_get_user_pages_done calls hmm_vma_range_done to stop HMM track and free range structure memory, this needs go through all userptr BOs. >> + >> +if (r == -EAGAIN) { >> +if (!--tries) { >> +DRM_ERROR("Possible deadlock? Retry too many times\n"); >> +return -EDEADLK; >> +} >> +goto restart; >> +} >> + > > I would still say to better just return to userspace here. > > Because of the way HMM works 10 retries might not be sufficient any more. > Yes, it looks better to handle retry from user space. The extra sys call overhead can be ignored because this does not happen all the time. I will submit new patch for review. Thanks, Philip On 2019-02-06 4:20 a.m., Christian König wrote: > Am 05.02.19 um 23:00 schrieb Yang, Philip: >> Use HMM helper function hmm_vma_fault() to get physical pages backing >> userptr and start CPU page table update track of those pages. Then use >> hmm_vma_range_done() to check if those pages are updated before >> amdgpu_cs_submit for gfx or before user queues are resumed for kfd. >> >> If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart >> from scratch, for kfd, restore worker is rescheduled to retry. >> >> HMM simplify the CPU page table concurrent update check, so remove >> guptasklock, mmu_invalidations, last_set_pages fields from >> amdgpu_ttm_tt struct. >> >> HMM does not pin the page (increase page ref count), so remove related >> operations like release_pages(), put_page(), mark_page_dirty(). 
>> >> Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67 >> Signed-off-by: Philip Yang >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 - >> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 95 +++--- >> drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h | 2 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 158 +++- >> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 14 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 25 ++- >> drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h | 4 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 178 +++--- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 +- >> 9 files changed, 198 insertions(+), 282 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >> index 0b31a1859023..0e1711a75b68 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >> @@ -61,7 +61,6 @@ struct kgd_mem { >> atomic_t invalid; >> struct amdkfd_process_info *process_info; >> - struct page **user_pages; >> struct amdgpu_sync sync; >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> index d7b10d79f1de..ae2d838d31ea 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> @@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, >> struct mm_struct *mm, >> goto out; >> } >> - /* If no restore worker is running concurrently, user_pages >> - * should not be allocated >> - */ >> - WARN(mem->user_pages, "Leaking user_pages array"); >> - >> - mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages, >> - sizeof(struct page *), >> - GFP_KERNEL | __GFP_ZERO); >> - if (!mem->user_pages) { >> - pr_err("%s: Failed to allocate pages array\n", __func__); >> - ret = -ENOMEM; >> - goto unregister_out; >> - } >> - >> - ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages); >> + ret = 
amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages); >> if (ret) { >> pr_err("%s: Failed to get user pages: %d\n", __func__, ret); >> - goto free_out; >> + goto unregister_out; >> } >> - amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages); >> - >> ret = amdgpu_bo_reserve(bo, true); >> if (ret) { >> pr_err("%s: Failed to reserve BO\n", __func__); >> @@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, >> struct mm_struct *mm, >> amdgpu_bo_unreserve(bo); >> release_out: >> - if (ret) >> - release_pages(mem->user_pages, bo->tbo.ttm->num_pages); >> -free_out: >> -
[PATCH 3/3] drm/amdgpu: Add dword term to the doorbell index in rings
When doorbells are 8 bytes wide on SOC15, doorbell_index in rings no longer
reflects its true unit. So we should indicate its dword granularity
explicitly, as a generic way to accommodate different doorbell sizes.

Change-Id: I053c69498af5d68df1783a7eb03106dd68f5e8cc
Signed-off-by: Yong Zhao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c    |  6 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c    | 16 ++++++++--------
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 20 ++++++++++----------
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   |  6 +++---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 16 ++++++++--------
 drivers/gpu/drm/amd/amdgpu/soc15.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c    |  8 ++++----
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c    |  8 ++++----
 11 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 97a60da62004..6389ef068f4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -250,7 +250,7 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
 	ring->adev = NULL;
 	ring->ring_obj = NULL;
 	ring->use_doorbell = true;
-	ring->doorbell_index = adev->doorbell_index.kiq;
+	ring->doorbell_dw_idx = adev->doorbell_index.kiq;
 
 	r = amdgpu_gfx_kiq_acquire(adev, ring);
 	if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index d7fae2676269..0c6fa7576a9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -198,7 +198,7 @@ struct amdgpu_ring {
 	uint64_t		mqd_gpu_addr;
 	void			*mqd_ptr;
 	uint64_t		eop_gpu_addr;
-	u32			doorbell_index;
+	u32			doorbell_dw_idx;
 	bool			use_doorbell;
 	bool			use_pollmem;
 	unsigned		wptr_offs;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 305276c7e4bf..a6aff3f9ab9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -3109,7 +3109,7 @@ static int gfx_v6_0_sw_init(void *handle)
 		ring = &adev->gfx.compute_ring[i];
 		ring->ring_obj = NULL;
 		ring->use_doorbell = false;
-		ring->doorbell_index = 0;
+		ring->doorbell_dw_idx = 0;
 		ring->me = 1;
 		ring->pipe = i;
 		ring->queue = i;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 7984292f9282..3f11f0ac43fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2641,7 +2641,7 @@ static void gfx_v7_0_ring_set_wptr_compute(struct amdgpu_ring *ring)
 	/* XXX check if swapping is necessary on BE */
 	adev->wb.wb[ring->wptr_offs] = lower_32_bits(ring->wptr);
-	WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr));
+	WDOORBELL32(ring->doorbell_dw_idx, lower_32_bits(ring->wptr));
 }
 
 /**
@@ -2957,7 +2957,7 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
 		mqd->cp_hqd_pq_doorbell_control &=
 			~CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET_MASK;
 		mqd->cp_hqd_pq_doorbell_control |=
-			(ring->doorbell_index <<
+			(ring->doorbell_dw_idx <<
 			 CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT);
 		mqd->cp_hqd_pq_doorbell_control |=
 			CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
@@ -4363,7 +4363,7 @@ static int gfx_v7_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
 	ring->ring_obj = NULL;
 	ring->use_doorbell = true;
-	ring->doorbell_index = adev->doorbell_index.mec_ring0 + ring_id;
+	ring->doorbell_dw_idx = adev->doorbell_index.mec_ring0 + ring_id;
 	sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
 	irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index a26747681ed6..a2ae1446e7a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1890,7 +1890,7 @@ static int gfx_v8_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
 	ring->ring_obj = NULL;
 	ring->use_doorbell = true;
-	ring->doorbell_index = adev->doorbell_index.mec_ring0 + ring_id;
+	ring->doorbell_dw_idx = adev->doorbell_index.mec_ring0 + ring_id;
 	ring->eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + (ring_id *
[PATCH 1/3] drm/amdgpu: Improve doorbell variable names
Indicate that the doorbell offset and range are in dwords.

Change-Id: Ib0f2564ffa7b1940ffb8725cdc03f662184f5436
Signed-off-by: Yong Zhao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h          |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h       |  2 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c       | 10 +++++-----
 drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c       | 10 +++++-----
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c       | 10 +++++-----
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c       |  3 ++-
 drivers/gpu/drm/amd/amdgpu/soc15.c           |  4 ++--
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c        |  6 +++---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c       |  6 +++---
 drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c |  6 +++++-
 drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c |  6 +++++-
 12 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d67f8b1dfe80..88b3bbcea756 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -642,13 +642,13 @@ struct amdgpu_nbio_funcs {
 	void (*hdp_flush)(struct amdgpu_device *adev, struct amdgpu_ring *ring);
 	u32 (*get_memsize)(struct amdgpu_device *adev);
 	void (*sdma_doorbell_range)(struct amdgpu_device *adev, int instance,
-			bool use_doorbell, int doorbell_index, int doorbell_size);
+			bool use_doorbell, int index_in_dw, int range_dw_size);
 	void (*enable_doorbell_aperture)(struct amdgpu_device *adev,
 					 bool enable);
 	void (*enable_doorbell_selfring_aperture)(struct amdgpu_device *adev,
 						  bool enable);
 	void (*ih_doorbell_range)(struct amdgpu_device *adev,
-				  bool use_doorbell, int doorbell_index);
+				  bool use_doorbell, int index_in_dw);
 	void (*update_medium_grain_clock_gating)(struct amdgpu_device *adev,
 						 bool enable);
 	void (*update_medium_grain_light_sleep)(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index 1cfec06f81d4..5c8d04c353d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -39,6 +39,7 @@ struct amdgpu_doorbell {
  * can be 64-bit, so the index defined is in qword.
  */
 struct amdgpu_doorbell_index {
+	uint32_t entry_dw_size;
 	uint32_t kiq;
 	uint32_t mec_ring0;
 	uint32_t mec_ring1;
@@ -73,7 +74,7 @@ struct amdgpu_doorbell_index {
 	};
 	uint32_t max_assignment;
 	/* Per engine SDMA doorbell size in dword */
-	uint32_t sdma_doorbell_range;
+	uint32_t dw_range_per_sdma_eng;
 };
 
 typedef enum _AMDGPU_DOORBELL_ASSIGNMENT
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
index 1ccb1831382a..2572191b394a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
@@ -33,7 +33,7 @@ struct amdgpu_iv_entry;
 struct amdgpu_ih_ring {
 	unsigned		ring_size;
 	uint32_t		ptr_mask;
-	u32			doorbell_index;
+	u32			doorbell_idx_in_dw;
 	bool			use_doorbell;
 	bool			use_bus_addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
index cc967dbfd631..bcc41c957b24 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
@@ -68,7 +68,7 @@ static u32 nbio_v6_1_get_memsize(struct amdgpu_device *adev)
 }
 
 static void nbio_v6_1_sdma_doorbell_range(struct amdgpu_device *adev, int instance,
-					  bool use_doorbell, int doorbell_index, int doorbell_size)
+					  bool use_doorbell, int index_in_dw, int range_dw_size)
 {
 	u32 reg = instance == 0 ?
 		SOC15_REG_OFFSET(NBIO, 0, mmBIF_SDMA0_DOORBELL_RANGE) :
 		SOC15_REG_OFFSET(NBIO, 0, mmBIF_SDMA1_DOORBELL_RANGE);
@@ -76,8 +76,8 @@ static void nbio_v6_1_sdma_doorbell_range(struct amdgpu_device *adev, int instan
 	u32 doorbell_range = RREG32(reg);
 
 	if (use_doorbell) {
-		doorbell_range = REG_SET_FIELD(doorbell_range, BIF_SDMA0_DOORBELL_RANGE, OFFSET, doorbell_index);
-		doorbell_range = REG_SET_FIELD(doorbell_range, BIF_SDMA0_DOORBELL_RANGE, SIZE, doorbell_size);
+		doorbell_range = REG_SET_FIELD(doorbell_range, BIF_SDMA0_DOORBELL_RANGE, OFFSET, index_in_dw);
+		doorbell_range = REG_SET_FIELD(doorbell_range, BIF_SDMA0_DOORBELL_RANGE, SIZE, range_dw_size);
 	} else
 		doorbell_range = REG_SET_FIELD(doorbell_range,
[PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
Clearly, it should be a 64-bit doorbell operation.

Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c
Signed-off-by: Yong Zhao
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 796004896661..36f0e3cada30 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev,
 	if (ih->use_doorbell) {
 		/* XXX check if swapping is necessary on BE */
 		*ih->rptr_cpu = ih->rptr;
-		WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr);
+		WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr);
 	} else if (ih == &adev->irq.ih) {
 		WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
 	} else if (ih == &adev->irq.ih1) {
-- 
2.17.1
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()
On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote: > > > Den 05.02.2019 17.31, skrev Daniel Vetter: > > On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote: > >> > >> > >> Den 05.02.2019 10.11, skrev Daniel Vetter: > >>> On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote: > > > Den 04.02.2019 16.41, skrev Daniel Vetter: > > On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote: > >> The only thing now that makes drm_dev_unplug() special is that it sets > >> drm_device->unplugged. Move this code to drm_dev_unregister() so that > >> we > >> can remove drm_dev_unplug(). > >> > >> Signed-off-by: Noralf Trønnes > >> --- > > [...] > > >> drivers/gpu/drm/drm_drv.c | 27 +++ > >> include/drm/drm_drv.h | 10 -- > >> 2 files changed, 19 insertions(+), 18 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > >> index 05bbc2b622fc..e0941200edc6 100644 > >> --- a/drivers/gpu/drm/drm_drv.c > >> +++ b/drivers/gpu/drm/drm_drv.c > >> @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit); > >> */ > >> void drm_dev_unplug(struct drm_device *dev) > >> { > >> - /* > >> - * After synchronizing any critical read section is guaranteed > >> to see > >> - * the new value of ->unplugged, and any critical section which > >> might > >> - * still have seen the old value of ->unplugged is guaranteed > >> to have > >> - * finished. > >> - */ > >> - dev->unplugged = true; > >> - synchronize_srcu(_unplug_srcu); > >> - > >>drm_dev_unregister(dev); > >>drm_dev_put(dev); > >> } > >> @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register); > >> * drm_dev_register() but does not deallocate the device. The caller > >> must call > >> * drm_dev_put() to drop their final reference. > >> * > >> - * A special form of unregistering for hotpluggable devices is > >> drm_dev_unplug(), > >> - * which can be called while there are still open users of @dev. 
> >> + * This function can be called while there are still open users of > >> @dev as long > >> + * as the driver protects its device resources using drm_dev_enter() > >> and > >> + * drm_dev_exit(). > >> * > >> * This should be called first in the device teardown code to make > >> sure > >> - * userspace can't access the device instance any more. > >> + * userspace can't access the device instance any more. Drivers that > >> support > >> + * device unplug will probably want to call > >> drm_atomic_helper_shutdown() first > > > > Read once more with a bit more coffee, spotted this: > > > > s/first/afterwards/ - shutting down the hw before we've taken it away > > from > > userspace is kinda the wrong way round. It should be the inverse of > > driver > > load, which is 1) allocate structures 2) prep hw 3) register driver with > > the world (simplified ofc). > > > > The problem is that drm_dev_unregister() sets the device as unplugged > and if drm_atomic_helper_shutdown() is called afterwards it's not > allowed to touch hardware. > > I know it's the wrong order, but the only way to do it in the right > order is to have a separate function that sets unplugged: > > drm_dev_unregister(); > drm_atomic_helper_shutdown(); > drm_dev_set_unplugged(); > >>> > >>> Annoying ... but yeah calling _shutdown() before we stopped userspace is > >>> also not going to work. Because userspace could quickly re-enable > >>> something, and then the refcounts would be all wrong again and leaking > >>> objects. > >>> > >> > >> What happens with a USB device that is unplugged with open userspace, > >> will that leak objects? > > > > Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run > > as normal, the only thing that should be skipped is actually touching the > > hw (as long as the driver doesn't protect too much with > > drm_dev_enter/exit). So all the software updates (including refcounting > > updates) will still be done. 
Ofc current udl is not yet atomic, so in > > reality something else happens. > > > > And we ofc still have the same issue that if you just unload the driver, > > then the hw will stay on (which might really confuse the driver on next > > load, when it assumes that it only gets loaded from cold boot where > > everything is off - which usually is the case on an arm soc at least). > > > >>> I get a bit the feeling we're over-optimizing here with trying to devm-ize > >>> drm_dev_register. Just getting drm_device correctly devm-ized is a big > >>> step forward already, and will open up a lot of TODO items across a lot of > >>> drivers. E.g. we
RE: [PATCH 2/5] drm/amdgpu: Fix a bug in setting CP_MEC_DOORBELL_RANGE_UPPER on SOC15
+ Clint/Alice

Hi Clint,

We think the CP_MEC_DOORBELL_RANGE_LOWER/UPPER registers are used by MEC to
check whether a doorbell routed to MEC belongs to MEC. Is this understanding
correct? From the register spec, those registers are 27 bits. Does this mean
MEC uses all 27 bits for the check? For example, if we set lower/upper to
[0, 4K], will a doorbell ring at a 6K address be ignored by MEC?

Thanks,
Oak

-----Original Message-----
From: amd-gfx On Behalf Of Zhao, Yong
Sent: Tuesday, February 5, 2019 3:31 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Yong
Subject: [PATCH 2/5] drm/amdgpu: Fix a bug in setting CP_MEC_DOORBELL_RANGE_UPPER on SOC15

Because CP can use all doorbells outside the ones reserved for SDMA, IH,
and VCN, the value set in CP_MEC_DOORBELL_RANGE_UPPER should be the maximal
index possible in a page.

Change-Id: I402a56ce9a80e6c2ed2f96be431ae71ca88e73a4
Signed-off-by: Yong Zhao
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h | 1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c        | 2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c | 3 +++
 drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c | 3 +++
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index 5c8d04c353d0..90eca63605ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -73,6 +73,7 @@ struct amdgpu_doorbell_index {
 	} uvd_vce;
 	};
 	uint32_t max_assignment;
+	uint32_t last_idx;
 	/* Per engine SDMA doorbell size in dword */
 	uint32_t dw_range_per_sdma_eng;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 262ee3cf6f1c..0278e3ab6b94 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2998,7 +2998,7 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring)
 		WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER,
 					(adev->doorbell_index.kiq * 2) << 2);
 		WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER,
-					(adev->doorbell_index.userqueue_end * 2) << 2);
+					(adev->doorbell_index.last_idx * 2) << 2);
 	}
 
 	WREG32_SOC15(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
index d2409df2dde9..9eb8c9209231 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
@@ -88,5 +88,8 @@ void vega10_doorbell_index_init(struct amdgpu_device *adev)
 		(adev->doorbell_index.sdma_engine[1] -
 		adev->doorbell_index.sdma_engine[0]) *
 		adev->doorbell_index.entry_dw_size;
+
+	adev->doorbell_index.last_idx = PAGE_SIZE
+		/ (sizeof(uint32_t) * adev->doorbell_index.entry_dw_size) - 1;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
index b28c5999d8f0..aa8c7699c689 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
@@ -91,5 +91,8 @@ void vega20_doorbell_index_init(struct amdgpu_device *adev)
 		(adev->doorbell_index.sdma_engine[1] -
 		adev->doorbell_index.sdma_engine[0]) *
 		adev->doorbell_index.entry_dw_size;
+
+	adev->doorbell_index.last_idx = PAGE_SIZE
+		/ (sizeof(uint32_t) * adev->doorbell_index.entry_dw_size) - 1;
 }
-- 
2.17.1
BUG - unable to handle null pointer, bisected - drm/amd/display: add gpio lock/unlock
Good morning, on my Lenovo G50-45 a6310 APU with R4 Mullins commit e261568f94d6c37ebb94d3c4b3f8a3085375dd9d is causing kernel Oops (unable to handle NULL pointer). Cross-checked by reverting troublesome commit and machine without it is working fine. Here is a part of the Oops message from pstore: <1>[ 13.200310] BUG: unable to handle kernel NULL pointer dereference at 0008 <1>[ 13.200323] #PF error: [normal kernel read fault] <6>[ 13.200328] PGD 0 P4D 0 <4>[ 13.200335] Oops: [#1] PREEMPT SMP <4>[ 13.200342] CPU: 2 PID: 2961 Comm: udevd Not tainted 5.0.0-rc1+ #47 <4>[ 13.200347] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 08/04/2016 <4>[ 13.200450] RIP: 0010:dal_gpio_open_ex+0x0/0x30 [amdgpu] <4>[ 13.200456] Code: d6 48 89 de 48 89 ef e8 6e f8 ff ff 84 c0 74 c7 48 89 e8 5b 5d c3 0f 0b 31 ed 5b 48 89 e8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 <48> 83 7f 08 00 74 08 0f 0b b8 05 00 00 00 c3 89 77 18 8b 57 14 4c <4>[ 13.200466] RSP: 0018:b78e82bb7650 EFLAGS: 00010282 <4>[ 13.200471] RAX: RBX: b78e82bb76a4 RCX: <4>[ 13.200476] RDX: 0006 RSI: 0004 RDI: <4>[ 13.200480] RBP: a1d695e93300 R08: 0003 R09: a1d692456600 <4>[ 13.200485] R10: f7dc88574dc0 R11: b78e82bb75b8 R12: a1d695c68700 <4>[ 13.200490] R13: c07ef5a0 R14: b78e82bb79b8 R15: a1d692456600 <4>[ 13.200495] FS: 7f9c3fcac300() GS:a1d697b0() knlGS: <4>[ 13.200501] CS: 0010 DS: ES: CR0: 80050033 <4>[ 13.200506] CR2: 0008 CR3: 0002124a CR4: 000406e0 <4>[ 13.200510] Call Trace: <4>[ 13.200605] construct+0x15f/0x710 [amdgpu] <4>[ 13.200710] link_create+0x2e/0x48 [amdgpu] <4>[ 13.200803] dc_create+0x2c0/0x5f0 [amdgpu] <4>[ 13.200899] dm_hw_init+0xe0/0x150 [amdgpu] <4>[ 13.200990] amdgpu_device_init.cold.38+0xe06/0xf67 [amdgpu] <4>[ 13.201002] ? kmalloc_order+0x13/0x38 <4>[ 13.201102] amdgpu_driver_load_kms+0x60/0x210 [amdgpu] <4>[ 13.201112] drm_dev_register+0x10e/0x150 <4>[ 13.201207] amdgpu_pci_probe+0xb8/0x118 [amdgpu] <4>[ 13.201217] ? 
_raw_spin_unlock_irqrestore+0xf/0x28 <4>[ 13.201226] pci_device_probe+0xd1/0x158 <4>[ 13.201234] really_probe+0xee/0x2a0 <4>[ 13.201241] driver_probe_device+0x4a/0xb0 <4>[ 13.201247] __driver_attach+0xaf/0xc8 <4>[ 13.201253] ? driver_probe_device+0xb0/0xb0 <4>[ 13.201258] bus_for_each_dev+0x6f/0xb8 <4>[ 13.201265] bus_add_driver+0x197/0x1d8 <4>[ 13.201271] ? 0xc0933000 <4>[ 13.201276] driver_register+0x66/0xa8 <4>[ 13.201281] ? 0xc0933000 <4>[ 13.201287] do_one_initcall+0x41/0x1e2 <4>[ 13.201294] ? wake_up_page_bit+0x21/0x100 <4>[ 13.201301] ? kmem_cache_alloc_trace+0x2e/0x1a0 <4>[ 13.201308] ? do_init_module+0x1d/0x1e0 <4>[ 13.201315] do_init_module+0x55/0x1e0 <4>[ 13.201321] load_module+0x205c/0x2488 <4>[ 13.201329] ? vfs_read+0x10e/0x138 <4>[ 13.201337] ? __do_sys_finit_module+0xba/0xd8 <4>[ 13.201342] __do_sys_finit_module+0xba/0xd8 <4>[ 13.201350] do_syscall_64+0x50/0x168 <4>[ 13.201357] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <4>[ 13.201364] RIP: 0033:0x7f9c3fdcf409 <4>[ 13.201371] Code: 18 c3 e8 3a 98 01 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 47 6a 0c 00 f7 d8 64 89 01 48 <4>[ 13.201381] RSP: 002b:7fff9b4824f8 EFLAGS: 0246 ORIG_RAX: 0139 <4>[ 13.201389] RAX: ffda RBX: 559d56fe1780 RCX: 7f9c3fdcf409 <4>[ 13.201394] RDX: RSI: 559d570385c0 RDI: 000e <4>[ 13.201399] RBP: R08: R09: 7fff9b482610 <4>[ 13.201404] R10: 000e R11: 0246 R12: 559d56ff2120 <4>[ 13.201409] R13: 0002 R14: 559d570385c0 R15: 559d56fe1780 <4>[ 13.201416] Modules linked in: kvm_amd kvm ath9k irqbypass crc32_pclmul ghash_clmulni_intel serio_raw ath9k_common ath9k_hw sdhci_pci cqhci sdhci amdgpu(+) mmc_core mac80211 ath mfd_core chash cfg80211 gpu_sched ttm xhci_pci ehci_pci xhci_hcd ehci_hcd sp5100_tco <4>[ 13.201448] CR2: 0008 <4>[ 13.206222] ---[ end trace 2244da3024c5ad93 ]--- Here is a full git bisect log on amd-staging-drm-next branch synced today: git bisect start # good: 
[e1be4cb583800db36ed7f6303f7a8c205be24ceb] drm/amd/display: Use memset to initialize variables in fill_plane_dcc_attributes git bisect good e1be4cb583800db36ed7f6303f7a8c205be24ceb # bad: [25fa5507b06b8cfbec6db7933615ae603516bb7b] drm/amd/display: Disconnect mpcc when changing tg git bisect bad 25fa5507b06b8cfbec6db7933615ae603516bb7b # good: [e7b4cc9edcbe9c07e5bae2dbdebb04b054e3ff5b] drm/amd/display: Remove FreeSync timing changed debug output git bisect good
Re: [PATCH v2 1/2] drm/amdgpu: Fix pci platform speed and width
Am 06.02.19 um 02:13 schrieb Alex Deucher: On Tue, Feb 5, 2019 at 4:38 PM Kasiviswanathan, Harish wrote: The new Vega series GPU cards have in-built bridges. To get the pcie speed and width supported by the platform walk the hierarchy and get the slowest link. Change-Id: I3196d158b0c614cbb5d7a34c793a58cb95322d32 Signed-off-by: Harish Kasiviswanathan --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 58 +++--- 1 file changed, 46 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index d7dddb9..fcab1fe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3618,6 +3618,38 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, return r; } +static void amdgpu_device_get_min_pci_speed_width(struct amdgpu_device *adev, + enum pci_bus_speed *speed, + enum pcie_link_width *width) +{ + struct pci_dev *pdev = adev->pdev; + enum pci_bus_speed cur_speed; + enum pcie_link_width cur_width; + + *speed = PCI_SPEED_UNKNOWN; + *width = PCIE_LNK_WIDTH_UNKNOWN; + + while (pdev) { + cur_speed = pcie_get_speed_cap(pdev); + cur_width = pcie_get_width_cap(pdev); + + if (cur_speed != PCI_SPEED_UNKNOWN) { + if (*speed == PCI_SPEED_UNKNOWN) + *speed = cur_speed; + else if (cur_speed < *speed) + *speed = cur_speed; + } + + if (cur_width != PCIE_LNK_WIDTH_UNKNOWN) { + if (*width == PCIE_LNK_WIDTH_UNKNOWN) + *width = cur_width; + else if (cur_width < *width) + *width = cur_width; + } + pdev = pci_upstream_bridge(pdev); + } +} + That should probably rather be some helper function in the PCIe subsystem. But for now it should be ok to have it here instead. Feel free to add an Acked-by: Christian König as well. Christian. 
/** * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot * @@ -3630,8 +3662,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, static void amdgpu_device_get_pcie_info(struct amdgpu_device *adev) { struct pci_dev *pdev; - enum pci_bus_speed speed_cap; - enum pcie_link_width link_width; + enum pci_bus_speed speed_cap, platform_speed_cap; + enum pcie_link_width platform_link_width; if (amdgpu_pcie_gen_cap) adev->pm.pcie_gen_mask = amdgpu_pcie_gen_cap; @@ -3648,6 +3680,12 @@ static void amdgpu_device_get_pcie_info(struct amdgpu_device *adev) return; } + if (adev->pm.pcie_gen_mask && adev->pm.pcie_mlw_mask) + return; I think you can drop this check, but I guess it doesn't hurt. Either way, series is: Reviewed-by: Alex Deucher + + amdgpu_device_get_min_pci_speed_width(adev, _speed_cap, + _link_width); + if (adev->pm.pcie_gen_mask == 0) { /* asic caps */ pdev = adev->pdev; @@ -3673,22 +3711,20 @@ static void amdgpu_device_get_pcie_info(struct amdgpu_device *adev) adev->pm.pcie_gen_mask |= CAIL_ASIC_PCIE_LINK_SPEED_SUPPORT_GEN1; } /* platform caps */ - pdev = adev->ddev->pdev->bus->self; - speed_cap = pcie_get_speed_cap(pdev); - if (speed_cap == PCI_SPEED_UNKNOWN) { + if (platform_speed_cap == PCI_SPEED_UNKNOWN) { adev->pm.pcie_gen_mask |= (CAIL_PCIE_LINK_SPEED_SUPPORT_GEN1 | CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2); } else { - if (speed_cap == PCIE_SPEED_16_0GT) + if (platform_speed_cap == PCIE_SPEED_16_0GT) adev->pm.pcie_gen_mask |= (CAIL_PCIE_LINK_SPEED_SUPPORT_GEN1 | CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2 | CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3 | CAIL_PCIE_LINK_SPEED_SUPPORT_GEN4); - else if (speed_cap == PCIE_SPEED_8_0GT) + else if (platform_speed_cap == PCIE_SPEED_8_0GT) adev->pm.pcie_gen_mask |= (CAIL_PCIE_LINK_SPEED_SUPPORT_GEN1 | CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2 |
Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v7
Am 05.02.19 um 23:00 schrieb Yang, Philip: Use HMM helper function hmm_vma_fault() to get physical pages backing userptr and start CPU page table update track of those pages. Then use hmm_vma_range_done() to check if those pages are updated before amdgpu_cs_submit for gfx or before user queues are resumed for kfd. If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart from scratch, for kfd, restore worker is rescheduled to retry. HMM simplify the CPU page table concurrent update check, so remove guptasklock, mmu_invalidations, last_set_pages fields from amdgpu_ttm_tt struct. HMM does not pin the page (increase page ref count), so remove related operations like release_pages(), put_page(), mark_page_dirty(). Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67 Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 - .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 95 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 158 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 14 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c| 25 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h| 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 178 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 +- 9 files changed, 198 insertions(+), 282 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 0b31a1859023..0e1711a75b68 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -61,7 +61,6 @@ struct kgd_mem { atomic_t invalid; struct amdkfd_process_info *process_info; - struct page **user_pages; struct amdgpu_sync sync; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index d7b10d79f1de..ae2d838d31ea 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -582,28 +582,12 @@ static int 
init_user_pages(struct kgd_mem *mem, struct mm_struct *mm, goto out; } - /* If no restore worker is running concurrently, user_pages -* should not be allocated -*/ - WARN(mem->user_pages, "Leaking user_pages array"); - - mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages, - sizeof(struct page *), - GFP_KERNEL | __GFP_ZERO); - if (!mem->user_pages) { - pr_err("%s: Failed to allocate pages array\n", __func__); - ret = -ENOMEM; - goto unregister_out; - } - - ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages); + ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages); if (ret) { pr_err("%s: Failed to get user pages: %d\n", __func__, ret); - goto free_out; + goto unregister_out; } - amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages); - ret = amdgpu_bo_reserve(bo, true); if (ret) { pr_err("%s: Failed to reserve BO\n", __func__); @@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, struct mm_struct *mm, amdgpu_bo_unreserve(bo); release_out: - if (ret) - release_pages(mem->user_pages, bo->tbo.ttm->num_pages); -free_out: - kvfree(mem->user_pages); - mem->user_pages = NULL; + amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); unregister_out: if (ret) amdgpu_mn_unregister(bo); @@ -679,7 +659,6 @@ static int reserve_bo_and_vm(struct kgd_mem *mem, ctx->kfd_bo.priority = 0; ctx->kfd_bo.tv.bo = >tbo; ctx->kfd_bo.tv.num_shared = 1; - ctx->kfd_bo.user_pages = NULL; list_add(>kfd_bo.tv.head, >list); amdgpu_vm_get_pd_bo(vm, >list, >vm_pd[0]); @@ -743,7 +722,6 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem, ctx->kfd_bo.priority = 0; ctx->kfd_bo.tv.bo = >tbo; ctx->kfd_bo.tv.num_shared = 1; - ctx->kfd_bo.user_pages = NULL; list_add(>kfd_bo.tv.head, >list); i = 0; @@ -1371,15 +1349,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( list_del(_list_entry->head); mutex_unlock(_info->lock); - /* Free user pages if necessary */ - if (mem->user_pages) { - pr_debug("%s: Freeing user_pages array\n", __func__); - if (mem->user_pages[0]) 
- release_pages(mem->user_pages, - mem->bo->tbo.ttm->num_pages); - kvfree(mem->user_pages); - } - ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, ); if (unlikely(ret)) return ret; @@ -1855,25 +1824,11 @@ static int
Re: [PATCH 6/8] drm/amdgpu: allocate VM PDs/PTs on demand
Am 06.02.19 um 00:51 schrieb Kuehling, Felix: > On 2019-02-05 11:00 a.m., Koenig, Christian wrote: >> Ah! The initial clear of the PDs is syncing to everything! > Right. Using amdgpu_sync_resv(... AMDGPU_FENCE_OWNER_VM ...) in > amdgpu_vm_clear_bo seems to help. But if I also change the > amdgpu_job_submit to use AMDGPU_FENCE_OWNER_VM, I get a VM fault and CP > hang very early in my test. Not sure what's happening there. This is > what I changed: > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index 8b240f9..06c28ac 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -826,17 +826,18 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device > *adev, > > WARN_ON(job->ibs[0].length_dw > 64); > r = amdgpu_sync_resv(adev, >sync, bo->tbo.resv, > -AMDGPU_FENCE_OWNER_UNDEFINED, false); > +AMDGPU_FENCE_OWNER_VM, false); That's correct. > if (r) > goto error_free; > > - r = amdgpu_job_submit(job, >entity, AMDGPU_FENCE_OWNER_UNDEFINED, > + r = amdgpu_job_submit(job, >entity, AMDGPU_FENCE_OWNER_VM, This change most likely doesn't make a difference. > ); > if (r) > goto error_free; > > amdgpu_bo_fence(bo, fence, true); > - dma_fence_put(fence); > + dma_fence_put(vm->last_update); > + vm->last_update = fence; This is most likely incorrect. > > if (bo->shadow) > return amdgpu_vm_clear_bo(adev, vm, bo->shadow, > > Basically I tried to do the synchronization exactly like > amdgpu_vm_update_directories. > > The only way this clear could cause a VM fault is, if a subsequent > PTE/PDS update happens too early and gets overwritten by the pending > clear. But don't the clear and the next PTE/PDE update go to the same > queue (vm->entity) and are implicitly synchronized? Need to check where this is coming from. > Answer after another hour of pouring over code and history: No, sched > entities are no longer equivalent to queues. 
The scheduler can itself > load-balance VM updates using multiple SDMA queues. Sigh. That is unproblematic. Load balancing happens only when the entity is idle. E.g. submissions to the same entity never run in parallel. > OK, so page table updates to different PTEs don't usually need to > synchronize with each other. But how do page table updates to the same > address get synchronized? How do you guarantee that two updates of the > same PTE are processed by the hardware in the correct order, if > AMDGPU_FENCE_OWNER_VM fences don't synchronize with each other? Mhm, need to double check but that explanation doesn't seem to fit. There must be something else going on. > > >> Ok, that is easy to fix. > Not that easy. See above. ;) Well need to take a closer look why you run into VM faults, but basically replacing the parameter in amdgpu_sync_resv should already do the trick. Regards, Christian. > > Regards, > Felix > > > >> Christian. >> >> Am 05.02.19 um 16:49 schrieb Kuehling, Felix: >>> I saw a backtrace from the GPU scheduler when I was debugging this >>> yesterday, but those backtraces never tell us where the command submission >>> came from. It could be memory initialization, or some migration at >>> buffer-validation. Basically, any command submission triggered by page >>> table allocation that synchronizes with the PD reservation object is a >>> problem. >>> >>> Regards, >>> Felix >>> >>> -Original Message- >>> From: Christian König >>> Sent: Tuesday, February 05, 2019 10:40 AM >>> To: Kuehling, Felix ; Koenig, Christian >>> ; amd-gfx@lists.freedesktop.org >>> Subject: Re: [PATCH 6/8] drm/amdgpu: allocate VM PDs/PTs on demand >>> >>> Am 05.02.19 um 16:20 schrieb Kuehling, Felix: >> This may cause regressions in KFD if PT BO allocation can trigger >> eviction fences. > I don't think that can happen, but need to double check as well. 
> > Otherwise allocating page tables could try to evict other page tables > from the same process and that seriously doesn't make much sense. Eviction fences don't cause actual memory evictions. They are the result of memory evictions. They cause KFD processes to be preempted in order to allow the fence to signal so memory can be evicted. The problem is that a page table allocation waits for the fences on the PD reservation, which looks like an eviction to KFD and triggers an unnecessary preemption. There is no actual memory eviction happening here. >>> Yeah, but where is that actually coming from? >>> >>> It sounds like we still unnecessary synchronize page table updates >>> somewhere. >>> >>> Do you have a call chain? >>> >>> Regards, >>> Christian. >>> Regards, Felix -Original Message-
Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6
Am 05.02.19 um 22:56 schrieb Yang, Philip:
> Hi Christian,
>
> I will submit a new patch for review, my comments embedded inline
> below.
>
> Thanks,
> Philip
>
> On 2019-02-05 1:09 p.m., Koenig, Christian wrote:
>> Am 05.02.19 um 18:25 schrieb Yang, Philip:
>>> [SNIP]
>>> +
>>> +	if (r == -ERESTARTSYS) {
>>> +		if (!--tries) {
>>> +			DRM_ERROR("Possible deadlock? Retry too many times\n");
>>> +			return -EDEADLK;
>>> +		}
>>> +		goto restart;
>>
>> You really need to restart the IOCTL to potentially handle signals.
>>
>> But since the calling code correctly handles this already you can
>> just drop this change as far as I can see.
>
>>> I agree that we should return -ERESTARTSYS to the upper layer to
>>> handle signals. But I do not find the upper layers handling
>>> -ERESTARTSYS in the entire calling path, ksys_ioctl ->
>>> do_vfs_ioctl -> amdgpu_drm_ioctl -> drm_ioctl -> drm_ioctl_kernel
>>> -> amdgpu_cs_ioctl. The error code returns to the application.
>>>
>>> I confirmed it: the libdrm userptr test application calling
>>> amdgpu_cs_ioctl gets return code -512, which is -ERESTARTSYS. So
>>> the application should handle -ERESTARTSYS to restart the ioctl,
>>> but the libdrm userptr test application doesn't handle this. This
>>> causes the test to fail.
>>
>> This is a bug in the test cases then.
>>
>> -ERESTARTSYS can happen at any time during interruptible waiting and
>> it is mandatory for the upper layer to handle it correctly.
>
> -ERESTARTSYS can be returned only when a signal is pending; the
> signal handling code will translate ERESTARTSYS to EINTR, and
> drmIoctl in libdrm does handle EINTR and restart the ioctl. The test
> cases are ok.

Interesting, I thought that this was handled unconditionally whether a
signal is pending or not. But that could as well be architecture
specific.

> Driver fail paths should not return ERESTARTSYS to user space. In the
> new patch, I change amdgpu_cs_submit to return -EAGAIN if the userptr
> is updated, and amdgpu_cs_ioctl redoes the ioctl only if the error
> code is -EAGAIN. The ERESTARTSYS error code returns to user space for
> signal handling as before.

Good idea, that is probably cleaner.
EAGAIN and EINTR should have the same effect in drmIoctl IIRC.

> Below are details of the userptr path difference. For the previous
> path, the libdrm test always goes to step 2, step 3 never triggers.
> So it never returns -ERESTARTSYS, but in theory, this could happen.
> For the HMM path, the test always goes to step 3, so we have to
> handle this case inside amdgpu_cs_ioctl.
>
> Maybe I can change amdgpu_cs_submit to return -EBUSY, then restart
> the ioctl inside amdgpu_cs_ioctl. I will submit a new patch.

Clearly a NAK, this won't work correctly.

> I don't understand your concern, may you explain the reason?

When some underlying code returns ERESTARTSYS we *MUST* forward that
to the upper layer. Handling it in a loop or replacing it manually
with EINTR or EAGAIN will break signal handling.

This is suggested quite often, but can end up in an endless loop
inside the kernel which is quite ugly.

Regards,
Christian.