Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
Am 28.03.2018 um 06:36 schrieb Liu, Monk:
>> The SDMA is not directly connected to the GFXHUB, so even if the SDMA
>> would provide a single command for this the write/wait would still be
>> executed as two operations.
>
> I don't understand this point, could you give more details?
>
> From the v148 ucode on, the SDMA ignores the PREEMPT command while it is
> doing SRBM_WRITE and POLL_MEM_REG on registers, so as long as the SDMA
> is doing the vm invalidate the world switch will not interrupt it.

Ah! Good to know, I was assuming that the GFX block might actually switch
anyway in this case.

Christian.

> /Monk
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: March 28, 2018 0:30
> To: Alex Deucher <alexdeuc...@gmail.com>
> Cc: Deng, Emily <emily.d...@amd.com>; Liu, Monk <monk@amd.com>; amd-gfx list <amd-gfx@lists.freedesktop.org>
> Subject: Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
>
> [SNIP]
___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
> The SDMA is not directly connected to the GFXHUB, so even if the SDMA
> would provide a single command for this the write/wait would still be
> executed as two operations.

I don't understand this point, could you give more details?

From the v148 ucode on, the SDMA ignores the PREEMPT command while it is
doing SRBM_WRITE and POLL_MEM_REG on registers, so as long as the SDMA is
doing the vm invalidate the world switch will not interrupt it.

/Monk

-----Original Message-----
From: Koenig, Christian
Sent: March 28, 2018 0:30
To: Alex Deucher <alexdeuc...@gmail.com>
Cc: Deng, Emily <emily.d...@amd.com>; Liu, Monk <monk@amd.com>; amd-gfx list <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV

[SNIP]
RE: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
Thank you all. Let's keep pushing on the hardware team ... hope the MC or
RLCV side could discover something interesting.

/Monk

-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: March 28, 2018 1:18
To: Alex Deucher <alexdeuc...@gmail.com>; Koenig, Christian <christian.koe...@amd.com>
Cc: Deng, Emily <emily.d...@amd.com>; amd-gfx list <amd-gfx@lists.freedesktop.org>; Liu, Monk <monk....@amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV

[SNIP]
Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
Am 27.03.2018 um 18:56 schrieb Alex Deucher:
> On Tue, Mar 27, 2018 at 12:30 PM, Christian König wrote:
>> Am 27.03.2018 um 17:52 schrieb Alex Deucher:
>>> [SNIP]
>>>>> 2. add the new callback implementation to gfx9 and gfx8 (I think
>>>>> gfx8 will need this as well since we support sr-iov there too)
>>>>
>>>> gfx8 doesn't have the hardware bug which seems to make this
>>>> necessary, nor does it have the same VMHUB design as gfx9.
>>> Oh, right, in this case it's the req/ack engines which were new for
>>> soc15. We may want the same fix for sdma4 though.
>>
>> And exactly that is one of the reasons why this workaround doesn't
>> work correctly.
>>
>> The SDMA is not directly connected to the GFXHUB, so even if the SDMA
>> would provide a single command for this the write/wait would still be
>> executed as two operations.
> I'm not sure I follow. I think there are two issues: the hw bug you
> are referring to and the SR-IOV requirement that the req and the ack
> can't be split by a world switch. I believe the world switch happens
> at at least packet granularity, so I think for the SR-IOV requirement
> using a single packet should handle it.

The problem is that to me it looks like there is no SR-IOV requirement to
not split the req and ack. The hardware is duplicated per VF, and I
suggested to Emily to test my multiple-write workaround.

Since I didn't hear back I strongly assume that this worked as well, and
that can only mean that we are indeed running into the same hw issue
again. That in turn means that not only the GFXHUB is affected, but ANY
(GRBM?) register write could be silently dropped.

I can't imagine how we want to build a stable driver around this. I
unfortunately can't reliably reproduce the issue on bare metal any more.

It would probably be best if we could set up a call with some of the
hardware guys to come up with a plan to narrow down this issue further.

Regards,
Christian.

>> In other words we can again run into the problem and the same thing
>> applies for CPU based updates.
> yeah, CPU based updates could indeed be an issue for the SR-IOV
> requirement, but in that case it's easier to read back and retry.
>
> Alex
>
>> The only real workaround would be to write the request, read the
>> register back and if the write didn't succeed write it again.
>>
>> But seriously, remember that this issue is not limited to the VMHUB
>> registers. Do you want to write and read back every register to make
>> sure the write succeeded?
>>
>> Regards,
>> Christian.
Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
On Tue, Mar 27, 2018 at 12:30 PM, Christian König wrote:
> Am 27.03.2018 um 17:52 schrieb Alex Deucher:
>> [SNIP]
>>>> 2. add the new callback implementation to gfx9 and gfx8 (I think
>>>> gfx8 will need this as well since we support sr-iov there too)
>>>
>>> gfx8 doesn't have the hardware bug which seems to make this
>>> necessary, nor does it have the same VMHUB design as gfx9.
>> Oh, right, in this case it's the req/ack engines which were new for
>> soc15. We may want the same fix for sdma4 though.
>
> And exactly that is one of the reasons why this workaround doesn't work
> correctly.
>
> The SDMA is not directly connected to the GFXHUB, so even if the SDMA
> would provide a single command for this the write/wait would still be
> executed as two operations.

I'm not sure I follow. I think there are two issues: the hw bug you
are referring to and the SR-IOV requirement that the req and the ack
can't be split by a world switch. I believe the world switch happens
at at least packet granularity, so I think for the SR-IOV requirement
using a single packet should handle it.

> In other words we can again run into the problem and the same thing
> applies for CPU based updates.

yeah, CPU based updates could indeed be an issue for the SR-IOV
requirement, but in that case it's easier to read back and retry.

Alex

> The only real workaround would be to write the request, read the
> register back and if the write didn't succeed write it again.
>
> But seriously, remember that this issue is not limited to the VMHUB
> registers. Do you want to write and read back every register to make
> sure the write succeeded?
>
> Regards,
> Christian.
Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
Am 27.03.2018 um 17:52 schrieb Alex Deucher:
> [SNIP]
>>> 2. add the new callback implementation to gfx9 and gfx8 (I think
>>> gfx8 will need this as well since we support sr-iov there too)
>>
>> gfx8 doesn't have the hardware bug which seems to make this
>> necessary, nor does it have the same VMHUB design as gfx9.
> Oh, right, in this case it's the req/ack engines which were new for
> soc15. We may want the same fix for sdma4 though.

And exactly that is one of the reasons why this workaround doesn't work
correctly.

The SDMA is not directly connected to the GFXHUB, so even if the SDMA
would provide a single command for this the write/wait would still be
executed as two operations.

In other words we can again run into the problem, and the same thing
applies for CPU based updates.

The only real workaround would be to write the request, read the register
back and if the write didn't succeed write it again.

But seriously, remember that this issue is not limited to the VMHUB
registers. Do you want to write and read back every register to make sure
the write succeeded?

Regards,
Christian.
Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
On Tue, Mar 27, 2018 at 11:43 AM, Christian König wrote:
> Am 27.03.2018 um 17:37 schrieb Alex Deucher:
>> On Tue, Mar 27, 2018 at 11:15 AM, Alex Deucher wrote:
>>> On Tue, Mar 27, 2018 at 1:58 AM, Emily Deng wrote:
>>>> issue:
>>>> the vmflush in KCQ could be preempted (not like the GFX ring which
>>>> doesn't allow preemption in the ring buffer) and this leads to vm
>>>> flush failure when there is a world switch during the vm flush
>>>> procedure (between write invalidate request and query invalidate ack)
>>>>
>>>> fix:
>>>> separate vm flush for gfx and compute ring, and use the new format
>>>> command in compute's vm flush which uses only one packet so no
>>>> preemption is allowed
>>>>
>>>> Signed-off-by: Monk Liu
>>>> Signed-off-by: Emily Deng
>>>> ---
>>>> [SNIP]
>>>> +       void (*emit_reg_wait1)(struct amdgpu_ring *ring, uint32_t reg0,
>>>> +                              uint32_t reg1, uint32_t val, uint32_t mask);
>>>
>>> This is pretty ugly. We don't need a new callback for this
>>> workaround. It will just confuse things for other IPs. Just call
>>> gfx_v9_0_wait_reg_mem() directly if you need it in
>>> gmc_v9_0_emit_flush_gpu_tlb().
>>
>> Nevermind, I misread the patch and thought the change was in the gfx9
>> module. In this case, I'd rather split this into three patches.
>>
>> 1. add the new ring callback with a better name. E.g.,
>> emit_reg_wait_oneshot()
>
> Maybe "emit_reg_write_wait()", but I still think that is a bad idea.

Well apparently the issue is that world switches can't happen between the
req and the ack, so this would consolidate the operation to one packet
which should avoid that issue.

>> 2. add the new callback implementation to gfx9 and gfx8 (I think gfx8
>> will need this as well since we support sr-iov there too)
>
> gfx8 doesn't have the hardware bug which seems to make this necessary,
> nor does it have the same VMHUB design as gfx9.

Oh, right, in this case it's the req/ack engines which were new for
soc15. We may want the same fix for sdma4 though.

>> 3. in the gmc8 and 9 code tlb_flush callbacks, check if the ring
>> supports the new callback and if so, use that, otherwise, fall back to
>> the existing code.
>
> Actually I would rather say we provide a stub in amdgpu_ring.c which
> just uses the separate write and wait callbacks.

yeah, that may be cleaner.

Alex

> Christian.
>
> [SNIP]
Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
Am 27.03.2018 um 17:37 schrieb Alex Deucher:
> On Tue, Mar 27, 2018 at 11:15 AM, Alex Deucher wrote:
>> On Tue, Mar 27, 2018 at 1:58 AM, Emily Deng wrote:
>>> issue:
>>> the vmflush in KCQ could be preempted (not like the GFX ring which
>>> doesn't allow preemption in the ring buffer) and this leads to vm
>>> flush failure when there is a world switch during the vm flush
>>> procedure (between write invalidate request and query invalidate ack)
>>>
>>> fix:
>>> separate vm flush for gfx and compute ring, and use the new format
>>> command in compute's vm flush which uses only one packet so no
>>> preemption is allowed
>>>
>>> [SNIP]
>> This is pretty ugly. We don't need a new callback for this
>> workaround. It will just confuse things for other IPs. Just call
>> gfx_v9_0_wait_reg_mem() directly if you need it in
>> gmc_v9_0_emit_flush_gpu_tlb().
>
> Nevermind, I misread the patch and thought the change was in the gfx9
> module. In this case, I'd rather split this into three patches.
>
> 1. add the new ring callback with a better name. E.g.,
> emit_reg_wait_oneshot()

Maybe "emit_reg_write_wait()", but I still think that is a bad idea.

> 2. add the new callback implementation to gfx9 and gfx8 (I think gfx8
> will need this as well since we support sr-iov there too)

gfx8 doesn't have the hardware bug which seems to make this necessary,
nor does it have the same VMHUB design as gfx9.

> 3. in the gmc8 and 9 code tlb_flush callbacks, check if the ring
> supports the new callback and if so, use that, otherwise, fall back to
> the existing code.

Actually I would rather say we provide a stub in amdgpu_ring.c which
just uses the separate write and wait callbacks.

Christian.

> [SNIP]
Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
On Tue, Mar 27, 2018 at 11:15 AM, Alex Deucher wrote:
> On Tue, Mar 27, 2018 at 1:58 AM, Emily Deng wrote:
>> issue:
>> the vmflush in KCQ could be preempted (not like the GFX ring which
>> doesn't allow preemption in the ring buffer) and this leads to vm
>> flush failure when there is a world switch during the vm flush
>> procedure (between write invalidate request and query invalidate ack)
>>
>> fix:
>> separate vm flush for gfx and compute ring, and use the new format
>> command in compute's vm flush which uses only one packet so no
>> preemption is allowed
>>
>> Signed-off-by: Monk Liu
>> Signed-off-by: Emily Deng
>> ---
>> [SNIP]
>> +       void (*emit_reg_wait1)(struct amdgpu_ring *ring, uint32_t reg0,
>> +                              uint32_t reg1, uint32_t val, uint32_t mask);
>
> This is pretty ugly. We don't need a new callback for this
> workaround. It will just confuse things for other IPs. Just call
> gfx_v9_0_wait_reg_mem() directly if you need it in
> gmc_v9_0_emit_flush_gpu_tlb().

Nevermind, I misread the patch and thought the change was in the gfx9
module. In this case, I'd rather split this into three patches.

1. add the new ring callback with a better name. E.g.,
emit_reg_wait_oneshot()

2. add the new callback implementation to gfx9 and gfx8 (I think gfx8
will need this as well since we support sr-iov there too)

3. in the gmc8 and 9 code tlb_flush callbacks, check if the ring
supports the new callback and if so, use that, otherwise, fall back to
the existing code.

> Alex
>
> [SNIP]
Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
On Tue, Mar 27, 2018 at 11:15 AM, Alex Deucher wrote:
> On Tue, Mar 27, 2018 at 1:58 AM, Emily Deng wrote:
>> issue:
>> the vmflush in KCQ could be preempted (not like the GFX ring which
>> doesn't allow preemption in the ring buffer) and this leads to vm
>> flush failure when there is a world switch during the vm flush
>> procedure (between write invalidate request and query invalidate ack)
>>
>> fix:
>> separate vm flush for gfx and compute ring, and use the new format
>> command in compute's vm flush which uses only one packet so no
>> preemption is allowed
>>
>> Signed-off-by: Monk Liu
>> Signed-off-by: Emily Deng
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 ++
>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 10 +-
>>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    | 18 +-
>>  4 files changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index a7e2229..986659f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1790,6 +1790,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
>>  #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d))
>>  #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
>>  #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))
>> +#define amdgpu_ring_emit_reg_wait1(r, d0, d1, v, m) (r)->funcs->emit_reg_wait1((r), (d0), (d1), (v), (m))
>>  #define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b))
>>  #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib)))
>>  #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r))
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index 1d0d250..d85df5d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -152,6 +152,8 @@ struct amdgpu_ring_funcs {
>>         void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
>>         void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
>>                               uint32_t val, uint32_t mask);
>> +       void (*emit_reg_wait1)(struct amdgpu_ring *ring, uint32_t reg0,
>> +                              uint32_t reg1, uint32_t val, uint32_t mask);
>
> This is pretty ugly. We don't need a new callback for this
> workaround. It will just confuse things for other IPs. Just call
> gfx_v9_0_wait_reg_mem() directly if you need it in
> gmc_v9_0_emit_flush_gpu_tlb().
>
> Alex
>
>>         void (*emit_tmz)(struct amdgpu_ring *ring, bool start);
>>         /* priority functions */
>>         void (*set_priority) (struct amdgpu_ring *ring,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 1ae3de1..509c9d2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -4078,6 +4078,13 @@ static void gfx_v9_0_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg,
>>         gfx_v9_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20);
>>  }
>>
>> +static void gfx_v9_0_ring_emit_reg_wait_compute(struct amdgpu_ring *ring,
>> +                                               uint32_t reg0, uint32_t reg1,
>> +                                               uint32_t val, uint32_t mask)
>> +{
>> +       gfx_v9_0_wait_reg_mem(ring, 0, 0, 1, reg0, reg1, val, mask, 0x20);
>> +}
>> +
>>  static void gfx_v9_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev,
>>                                                  enum amdgpu_interrupt_state state)
>>  {
>> @@ -4415,7 +4422,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
>>         7 + /* gfx_v9_0_ring_emit_hdp_flush */
>>         5 + /* hdp invalidate */
>>         7 + /* gfx_v9_0_ring_emit_pipeline_sync */
>> -       SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
>> +       (SOC15_FLUSH_GPU_TLB_NUM_WREG - 1) * 5 +
>>         SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
>>         2 + /* gfx_v9_0_ring_emit_vm_flush */
>>         8 + 8 + 8, /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */
>> @@ -4433,6 +4440,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
>>         .set_priority = gfx_v9_0_ring_set_priority_compute,
>>         .emit_wreg = gfx_v9_0_ring_emit_wreg,
>>         .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
>> +       .emit_reg_wait1 = gfx_v9_0_ring_emit_reg_wait_compute,
>>  };
>>
>>  static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index e687363..968447d 100644
>> ---
RE: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
Hi Christian, I only found the issue on SRIOV environment, does it also occurs on other cases, for example, thanks. Best Wishes, Emily Deng > -Original Message- > From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] > Sent: Tuesday, March 27, 2018 4:27 PM > To: Deng, Emily <emily.d...@amd.com>; Koenig, Christian > <christian.koe...@amd.com>; amd-gfx@lists.freedesktop.org > Cc: Liu, Monk <monk....@amd.com> > Subject: Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV > > Hi Emily, > > Am 27.03.2018 um 10:23 schrieb Deng, Emily: > > Hi Christian, > > I think the root cause has been found out. It is the hardware > > restriction that couldn't do world switch between invalidate request, > > and ack. We couldn't control world switch in guest driver, but could replace > the origin two command with one command to avoid world switch. > > Unfortunately the problem was reproduced completely without world switch. > > So the world switch is not the root cause, but only another trigger to the > issue. > > We need to find the root cause or otherwise will not fix the real problem but > only mitigate the effects. > > Regards, > Christian. > > > > > For the multiple times writing, it is also the same root cause, > > when the world switch happened between the invalidate request, and > > ack, it will have problem. 
So we could give SRIOV more chances to > > issue the invalidate request, to work around the case where a world switch > > happens between the invalidate request and the ack. > > > > Best Wishes, > > Emily Deng > > > >> -----Original Message----- > >> From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] > >> Sent: Tuesday, March 27, 2018 3:48 PM > >> To: Deng, Emily <emily.d...@amd.com>; amd-gfx@lists.freedesktop.org > >> Cc: Liu, Monk <monk@amd.com> > >> Subject: Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV > >> > >> On 27.03.2018 at 07:58, Emily Deng wrote: > >>> issue: > >>> the vmflush in a KCQ can be preempted (unlike the GFX ring, which > >>> doesn't allow preemption inside the ring buffer), and this leads to a VM > >>> flush failure when there is a world switch during the VM flush procedure > >>> (between writing the invalidate request and querying the invalidate ack) > >>> > >>> fix: > >>> separate the VM flush for the gfx and compute rings, and use the new-format > >>> command in the compute VM flush, which uses only one packet so no > >>> preemption can occur in between > >> NAK, as already discussed multiple times now: that only circumvents > >> the problem, but doesn't really fix it. > >> > >> Just executing the "amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req > >> + eng, req);" multiple times has the same effect, and we need to figure out > why. > >> > >> Regards, > >> Christian.
> >> > >>> Signed-off-by: Monk Liu <monk@amd.com> > >>> Signed-off-by: Emily Deng <emily.d...@amd.com> > >>> --- > >>>drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + > >>>drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++ > >>>drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 10 +- > >>>drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 18 +- > >>>4 files changed, 25 insertions(+), 6 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>> index a7e2229..986659f 100644 > >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>> @@ -1790,6 +1790,7 @@ amdgpu_get_sdma_instance(struct > >> amdgpu_ring *ring) > >>>#define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d)) > >>>#define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), > (v)) > >>>#define amdgpu_ring_emit_reg_wait(r, d, v, m) > >>> (r)->funcs->emit_reg_wait((r), (d), (v), (m)) > >>> +#define amdgpu_ring_emit_reg_wait1(r, d0, d1, v, m) > >>> +(r)->funcs->emit_reg_wait1((r), (d0), (d1), (v), (m)) > >>>#define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b)) > >>>#define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib))) > >>>#define amdgpu_ring_init_cond_exec(r) > >>> (r)->funcs->init_cond_exec((r)) diff --git > >>> a/drivers/
Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
Hi Emily, On 27.03.2018 at 10:23, Deng, Emily wrote: Hi Christian, I think the root cause has been found out. It is the hardware restriction that a world switch cannot happen between the invalidate request and the ack. We can't control the world switch from the guest driver, but we can replace the original two commands with a single command to avoid the world switch. Unfortunately the problem was reproduced completely without world switch. So the world switch is not the root cause, but only another trigger for the issue. We need to find the root cause, or otherwise we will not fix the real problem but only mitigate the effects. Regards, Christian. As for the multiple writes, they have the same root cause: when a world switch happens between the invalidate request and the ack, there will be a problem. So we could give SRIOV more chances to issue the invalidate request, to work around the case where a world switch happens between the invalidate request and the ack. Best Wishes, Emily Deng -----Original Message----- From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] Sent: Tuesday, March 27, 2018 3:48 PM To: Deng, Emily <emily.d...@amd.com>; amd-gfx@lists.freedesktop.org Cc: Liu, Monk <monk@amd.com> Subject: Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV On 27.03.2018 at 07:58, Emily Deng wrote: issue: the vmflush in a KCQ can be preempted (unlike the GFX ring, which doesn't allow preemption inside the ring buffer), and this leads to a VM flush failure when there is a world switch during the VM flush procedure (between writing the invalidate request and querying the invalidate ack) fix: separate the VM flush for the gfx and compute rings, and use the new-format command in the compute VM flush, which uses only one packet so no preemption can occur in between NAK, as already discussed multiple times now: that only circumvents the problem, but doesn't really fix it. Just executing the "amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);" multiple times has the same effect, and we need to figure out why. Regards, Christian.
Signed-off-by: Monk Liu <monk@amd.com> Signed-off-by: Emily Deng <emily.d...@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 10 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 18 +- 4 files changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index a7e2229..986659f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1790,6 +1790,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring) #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d)) #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v)) #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m)) +#define amdgpu_ring_emit_reg_wait1(r, d0, d1, v, m) +(r)->funcs->emit_reg_wait1((r), (d0), (d1), (v), (m)) #define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b)) #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib))) #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 1d0d250..d85df5d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -152,6 +152,8 @@ struct amdgpu_ring_funcs { void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val); void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val, uint32_t mask); + void (*emit_reg_wait1)(struct amdgpu_ring *ring, uint32_t reg0, + uint32_t reg1, uint32_t val, uint32_t mask); void (*emit_tmz)(struct amdgpu_ring *ring, bool start); /* priority functions */ void (*set_priority) (struct amdgpu_ring *ring, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 1ae3de1..509c9d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -4078,6 +4078,13 @@ static void gfx_v9_0_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg, gfx_v9_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20); } +static void gfx_v9_0_ring_emit_reg_wait_compute(struct amdgpu_ring *ring, + uint32_t reg0, uint32_t reg1, + uint32_t val, uint32_t mask) +{ + gfx_v9_0_wait_reg_mem(ring, 0, 0, 1, reg0, reg1, val, mask, 0x20); } + static void gfx_v9_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev, enum amdgpu_interrupt_state state) { @@ -4415,
RE: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
Hi Christian, I think the root cause has been found out. It is the hardware restriction that a world switch cannot happen between the invalidate request and the ack. We can't control the world switch from the guest driver, but we can replace the original two commands with a single command to avoid the world switch. As for the multiple writes, they have the same root cause: when a world switch happens between the invalidate request and the ack, there will be a problem. So we could give SRIOV more chances to issue the invalidate request, to work around the case where a world switch happens between the invalidate request and the ack. Best Wishes, Emily Deng > -----Original Message----- > From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] > Sent: Tuesday, March 27, 2018 3:48 PM > To: Deng, Emily <emily.d...@amd.com>; amd-gfx@lists.freedesktop.org > Cc: Liu, Monk <monk@amd.com> > Subject: Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV > > On 27.03.2018 at 07:58, Emily Deng wrote: > > issue: > > the vmflush in a KCQ can be preempted (unlike the GFX ring, which doesn't > > allow preemption inside the ring buffer), and this leads to a VM flush failure when > > there is a world switch during the VM flush procedure (between writing the > > invalidate request and querying the invalidate ack) > > > > fix: > > separate the VM flush for the gfx and compute rings, and use the new-format > > command in the compute VM flush, which uses only one packet so no > > preemption can occur in between > > NAK, as already discussed multiple times now: that only circumvents the > problem, but doesn't really fix it. > > Just executing the "amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + > eng, req);" multiple times has the same effect, and we need to figure out why. > > Regards, > Christian.
> > > > > Signed-off-by: Monk Liu <monk@amd.com> > > Signed-off-by: Emily Deng <emily.d...@amd.com> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + > > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++ > > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 10 +- > > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 18 +- > > 4 files changed, 25 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > > index a7e2229..986659f 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > > @@ -1790,6 +1790,7 @@ amdgpu_get_sdma_instance(struct > amdgpu_ring *ring) > > #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d)) > > #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), > > (v)) > > #define amdgpu_ring_emit_reg_wait(r, d, v, m) > > (r)->funcs->emit_reg_wait((r), (d), (v), (m)) > > +#define amdgpu_ring_emit_reg_wait1(r, d0, d1, v, m) > > +(r)->funcs->emit_reg_wait1((r), (d0), (d1), (v), (m)) > > #define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b)) > > #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib))) > > #define amdgpu_ring_init_cond_exec(r) > > (r)->funcs->init_cond_exec((r)) diff --git > > a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > > index 1d0d250..d85df5d 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > > @@ -152,6 +152,8 @@ struct amdgpu_ring_funcs { > > void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t > val); > > void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg, > > uint32_t val, uint32_t mask); > > + void (*emit_reg_wait1)(struct amdgpu_ring *ring, uint32_t reg0, > > + uint32_t reg1, uint32_t val, uint32_t mask); > > void (*emit_tmz)(struct amdgpu_ring *ring, bool start); > > /* priority functions */ > > void (*set_priority) (struct amdgpu_ring *ring, diff --git > > 
a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > > index 1ae3de1..509c9d2 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > > @@ -4078,6 +4078,13 @@ static void gfx_v9_0_ring_emit_reg_wait(struct > amdgpu_ring *ring, uint32_t reg, > > gfx_v9_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20); > > } > > > > +static void gfx_v9_0_ring_emit_reg_wait_compute(struct amdgpu_ring > *ring, > > + uint32_t reg0, uint32_t reg1, > > +
Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
On 27.03.2018 at 07:58, Emily Deng wrote: issue: the vmflush in a KCQ can be preempted (unlike the GFX ring, which doesn't allow preemption inside the ring buffer), and this leads to a VM flush failure when there is a world switch during the VM flush procedure (between writing the invalidate request and querying the invalidate ack) fix: separate the VM flush for the gfx and compute rings, and use the new-format command in the compute VM flush, which uses only one packet so no preemption can occur in between NAK, as already discussed multiple times now: that only circumvents the problem, but doesn't really fix it. Just executing the "amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);" multiple times has the same effect, and we need to figure out why. Regards, Christian. Signed-off-by: Monk Liu Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 10 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 18 +- 4 files changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index a7e2229..986659f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1790,6 +1790,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring) #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d)) #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v)) #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m)) +#define amdgpu_ring_emit_reg_wait1(r, d0, d1, v, m) (r)->funcs->emit_reg_wait1((r), (d0), (d1), (v), (m)) #define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b)) #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib))) #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 1d0d250..d85df5d 100644 ---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -152,6 +152,8 @@ struct amdgpu_ring_funcs { void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val); void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val, uint32_t mask); + void (*emit_reg_wait1)(struct amdgpu_ring *ring, uint32_t reg0, + uint32_t reg1, uint32_t val, uint32_t mask); void (*emit_tmz)(struct amdgpu_ring *ring, bool start); /* priority functions */ void (*set_priority) (struct amdgpu_ring *ring, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 1ae3de1..509c9d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -4078,6 +4078,13 @@ static void gfx_v9_0_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg, gfx_v9_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20); } +static void gfx_v9_0_ring_emit_reg_wait_compute(struct amdgpu_ring *ring, + uint32_t reg0, uint32_t reg1, + uint32_t val, uint32_t mask) +{ + gfx_v9_0_wait_reg_mem(ring, 0, 0, 1, reg0, reg1, val, mask, 0x20); +} + static void gfx_v9_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev, enum amdgpu_interrupt_state state) { @@ -4415,7 +4422,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { 7 + /* gfx_v9_0_ring_emit_hdp_flush */ 5 + /* hdp invalidate */ 7 + /* gfx_v9_0_ring_emit_pipeline_sync */ - SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + + (SOC15_FLUSH_GPU_TLB_NUM_WREG - 1) * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + 2 + /* gfx_v9_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */ @@ -4433,6 +4440,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { .set_priority = gfx_v9_0_ring_set_priority_compute, .emit_wreg = gfx_v9_0_ring_emit_wreg, .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait, + .emit_reg_wait1 = gfx_v9_0_ring_emit_reg_wait_compute, }; static const struct 
amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = { diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index e687363..968447d 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -385,11 +385,19 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring, amdgpu_ring_emit_wreg(ring, hub->ctx0_ptb_addr_hi32 + (2 * vmid), upper_32_bits(pd_addr)); -
[PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
issue: the vmflush in a KCQ can be preempted (unlike the GFX ring, which doesn't allow preemption inside the ring buffer), and this leads to a VM flush failure when there is a world switch during the VM flush procedure (between writing the invalidate request and querying the invalidate ack) fix: separate the VM flush for the gfx and compute rings, and use the new-format command in the compute VM flush, which uses only one packet so no preemption can occur in between Signed-off-by: Monk Liu Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 10 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 18 +- 4 files changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index a7e2229..986659f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1790,6 +1790,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring) #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d)) #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v)) #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m)) +#define amdgpu_ring_emit_reg_wait1(r, d0, d1, v, m) (r)->funcs->emit_reg_wait1((r), (d0), (d1), (v), (m)) #define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b)) #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib))) #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 1d0d250..d85df5d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -152,6 +152,8 @@ struct amdgpu_ring_funcs { void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val); void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val, uint32_t mask); + void (*emit_reg_wait1)(struct amdgpu_ring
*ring, uint32_t reg0, + uint32_t reg1, uint32_t val, uint32_t mask); void (*emit_tmz)(struct amdgpu_ring *ring, bool start); /* priority functions */ void (*set_priority) (struct amdgpu_ring *ring, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 1ae3de1..509c9d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -4078,6 +4078,13 @@ static void gfx_v9_0_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg, gfx_v9_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20); } +static void gfx_v9_0_ring_emit_reg_wait_compute(struct amdgpu_ring *ring, + uint32_t reg0, uint32_t reg1, + uint32_t val, uint32_t mask) +{ + gfx_v9_0_wait_reg_mem(ring, 0, 0, 1, reg0, reg1, val, mask, 0x20); +} + static void gfx_v9_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev, enum amdgpu_interrupt_state state) { @@ -4415,7 +4422,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { 7 + /* gfx_v9_0_ring_emit_hdp_flush */ 5 + /* hdp invalidate */ 7 + /* gfx_v9_0_ring_emit_pipeline_sync */ - SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + + (SOC15_FLUSH_GPU_TLB_NUM_WREG - 1) * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + 2 + /* gfx_v9_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */ @@ -4433,6 +4440,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { .set_priority = gfx_v9_0_ring_set_priority_compute, .emit_wreg = gfx_v9_0_ring_emit_wreg, .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait, + .emit_reg_wait1 = gfx_v9_0_ring_emit_reg_wait_compute, }; static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = { diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index e687363..968447d 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -385,11 +385,19 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring, 
amdgpu_ring_emit_wreg(ring, hub->ctx0_ptb_addr_hi32 + (2 * vmid), upper_32_bits(pd_addr)); - amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req); - - /* wait for the invalidate to complete */ - amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng, - 1 << vmid, 1 << vmid); +/* The world switch cannot be allowed to occur while + some invalidation controller code is waiting for an
[PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV
issue: the vmflush in a KCQ can be preempted (unlike the GFX ring, which doesn't allow preemption inside the ring buffer), and this leads to a VM flush failure when there is a world switch during the VM flush procedure (between writing the invalidate request and querying the invalidate ack) fix: separate the VM flush for the gfx and compute rings, and use the new-format command in the compute VM flush, which uses only one packet so no preemption can occur in between Signed-off-by: Monk Liu Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 10 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 14 +- 4 files changed, 21 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index a7e2229..986659f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1790,6 +1790,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring) #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d)) #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v)) #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m)) +#define amdgpu_ring_emit_reg_wait1(r, d0, d1, v, m) (r)->funcs->emit_reg_wait1((r), (d0), (d1), (v), (m)) #define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b)) #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib))) #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 1d0d250..d85df5d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -152,6 +152,8 @@ struct amdgpu_ring_funcs { void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val); void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val, uint32_t mask); + void (*emit_reg_wait1)(struct amdgpu_ring
*ring, uint32_t reg0, + uint32_t reg1, uint32_t val, uint32_t mask); void (*emit_tmz)(struct amdgpu_ring *ring, bool start); /* priority functions */ void (*set_priority) (struct amdgpu_ring *ring, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index d1d2c27..d36b29e 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -4014,6 +4014,13 @@ static void gfx_v9_0_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg, gfx_v9_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20); } +static void gfx_v9_0_ring_emit_reg_wait_compute(struct amdgpu_ring *ring, + uint32_t reg0, uint32_t reg1, + uint32_t val, uint32_t mask) +{ + gfx_v9_0_wait_reg_mem(ring, 0, 0, 1, reg0, reg1, val, mask, 0x20); +} + static void gfx_v9_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev, enum amdgpu_interrupt_state state) { @@ -4351,7 +4358,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { 7 + /* gfx_v9_0_ring_emit_hdp_flush */ 5 + /* hdp invalidate */ 7 + /* gfx_v9_0_ring_emit_pipeline_sync */ - SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + + (SOC15_FLUSH_GPU_TLB_NUM_WREG - 1) * 5 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 + 2 + /* gfx_v9_0_ring_emit_vm_flush */ 8 + 8 + 8, /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */ @@ -4369,6 +4376,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { .set_priority = gfx_v9_0_ring_set_priority_compute, .emit_wreg = gfx_v9_0_ring_emit_wreg, .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait, + .emit_reg_wait1 = gfx_v9_0_ring_emit_reg_wait_compute, }; static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = { diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 517712b..60be1a8 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -380,7 +380,6 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device *adev, } } - static uint64_t 
gmc_v9_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring, unsigned vmid, uint64_t pd_addr) { @@ -399,11 +398,16 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring, amdgpu_ring_emit_wreg(ring, hub->ctx0_ptb_addr_hi32 + (2 * vmid), upper_32_bits(pd_addr)); - amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req); - /* wait for the invalidate to complete