Re: [PATCH] drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq rings test sequence
Acked-by: Alex Deucher

From: amd-gfx on behalf of Zhou, Tiecheng
Sent: Friday, December 28, 2018 12:36:17 AM
To: Zhou, Tiecheng; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq rings test sequence

Ping...

-----Original Message-----
From: amd-gfx On Behalf Of Tiecheng Zhou
Sent: Thursday, December 27, 2018 4:15 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhou, Tiecheng
Subject: [PATCH] drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq rings test sequence

The kiq ring and the very first compute ring may fail occasionally
if they are tested directly following kiq_kcq_enable.

Inserting the gfx ring test before the kiq ring test delays the kiq
and kcq ring tests, which fixes the issue.

Signed-off-by: Tiecheng Zhou
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 48 +--
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 381f593b..164ffc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4278,9 +4278,8 @@ static int gfx_v8_0_cp_gfx_resume(struct amdgpu_device *adev)
 	amdgpu_ring_clear_ring(ring);
 	gfx_v8_0_cp_gfx_start(adev);
 	ring->sched.ready = true;
-	r = amdgpu_ring_test_helper(ring);
 
-	return r;
+	return 0;
 }
 
 static void gfx_v8_0_cp_compute_enable(struct amdgpu_device *adev, bool enable)
@@ -4369,10 +4368,9 @@ static int gfx_v8_0_kiq_kcq_enable(struct amdgpu_device *adev)
 		amdgpu_ring_write(kiq_ring, upper_32_bits(wptr_addr));
 	}
 
-	r = amdgpu_ring_test_helper(kiq_ring);
-	if (r)
-		DRM_ERROR("KCQ enable failed\n");
-	return r;
+	amdgpu_ring_commit(kiq_ring);
+
+	return 0;
 }
 
 static int gfx_v8_0_deactivate_hqd(struct amdgpu_device *adev, u32 req)
@@ -4709,16 +4707,32 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device *adev)
 	if (r)
 		goto done;
 
-	/* Test KCQs - reversing the order of rings seems to fix ring test failure
-	 * after GPU reset
-	 */
-	for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) {
+done:
+	return r;
+}
+
+static int gfx_v8_0_cp_test_all_rings(struct amdgpu_device *adev) {
+	int r, i;
+	struct amdgpu_ring *ring;
+
+	/* collect all the ring_tests here, gfx, kiq, compute */
+	ring = &adev->gfx.gfx_ring[0];
+	r = amdgpu_ring_test_helper(ring);
+	if (r)
+		return r;
+
+	ring = &adev->gfx.kiq.ring;
+	r = amdgpu_ring_test_helper(ring);
+	if (r)
+		return r;
+
+	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 		ring = &adev->gfx.compute_ring[i];
-		r = amdgpu_ring_test_helper(ring);
+		amdgpu_ring_test_helper(ring);
 	}
 
-done:
-	return r;
+	return 0;
 }
 
 static int gfx_v8_0_cp_resume(struct amdgpu_device *adev)
@@ -4739,6 +4753,11 @@ static int gfx_v8_0_cp_resume(struct amdgpu_device *adev)
 	r = gfx_v8_0_kcq_resume(adev);
 	if (r)
 		return r;
+
+	r = gfx_v8_0_cp_test_all_rings(adev);
+	if (r)
+		return r;
+
 	gfx_v8_0_enable_gui_idle_interrupt(adev, true);
 
 	return 0;
@@ -5056,6 +5075,7 @@ static int gfx_v8_0_post_soft_reset(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 	u32 grbm_soft_reset = 0;
+	struct amdgpu_ring *ring;
 
 	if ((!adev->gfx.grbm_soft_reset) &&
 	    (!adev->gfx.srbm_soft_reset))
@@ -5086,6 +5106,8 @@ static int gfx_v8_0_post_soft_reset(void *handle)
 	    REG_GET_FIELD(grbm_soft_reset, GRBM_SOFT_RESET, SOFT_RESET_GFX))
 		gfx_v8_0_cp_gfx_resume(adev);
 
+	gfx_v8_0_cp_test_all_rings(adev);
+
 	adev->gfx.rlc.funcs->start(adev);
 
 	return 0;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
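
For readers who want to reason about the new ordering outside the kernel, the control flow of gfx_v8_0_cp_test_all_rings() can be modelled in user space. This is only a minimal sketch under stated assumptions: every mock_* name below is a hypothetical stand-in, not part of amdgpu, and mock_ring_test() merely records which ring was tested and returns a preset result in place of amdgpu_ring_test_helper().

```c
#include <assert.h>

/* Hypothetical user-space stand-ins; none of these are real amdgpu types. */
#define MOCK_MAX_COMPUTE_RINGS 8
#define MOCK_MAX_LOG (2 + MOCK_MAX_COMPUTE_RINGS)

enum mock_ring_kind { MOCK_RING_GFX, MOCK_RING_KIQ, MOCK_RING_COMPUTE };

struct mock_ring {
	enum mock_ring_kind kind;
	int test_result;	/* 0 = pass, negative errno-style value = fail */
};

struct mock_gfx {
	struct mock_ring gfx_ring;
	struct mock_ring kiq_ring;
	struct mock_ring compute_ring[MOCK_MAX_COMPUTE_RINGS];
	int num_compute_rings;
	enum mock_ring_kind log[MOCK_MAX_LOG];	/* order in which rings were tested */
	int log_len;
};

/* Stand-in for amdgpu_ring_test_helper(): log the call, return the preset result. */
static int mock_ring_test(struct mock_gfx *gfx, const struct mock_ring *ring)
{
	assert(gfx->log_len < MOCK_MAX_LOG);
	gfx->log[gfx->log_len++] = ring->kind;
	return ring->test_result;
}

/* Same ordering as the patch: gfx first, then kiq, then each compute ring. */
static int mock_test_all_rings(struct mock_gfx *gfx)
{
	int r, i;

	r = mock_ring_test(gfx, &gfx->gfx_ring);
	if (r)
		return r;

	r = mock_ring_test(gfx, &gfx->kiq_ring);
	if (r)
		return r;

	/* As in the patch, a compute ring failure is not propagated. */
	for (i = 0; i < gfx->num_compute_rings; i++)
		mock_ring_test(gfx, &gfx->compute_ring[i]);

	return 0;
}
```

Note one behavioural detail the model makes visible: a failing gfx or kiq test aborts the sequence, while compute ring results are ignored by the caller, exactly as the hunk above is written.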
Re: Regression with kernel 4.20 on armhf
Hi Alex,

First of all, happy holidays and a happy new year!

- Okay, so it looks like the driver is sometimes able to enter
graphical mode with the Polaris card, but most of the time it fails
before that with:
[ 49.762704] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2, emitted seq=3
- This also happens sporadically, though less often, on the 4.17, 4.18
and 4.19 kernels, so it is not actually a regression but an existing
issue, which the patch "drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and
kcq ring tests sequence" may solve. I tried to backport it to 4.20, but
saw no improvement. I still need to try the git version, or rc1.
- The hang happens after the console is displayed on the screen, but
before switching to graphical mode with X.
- However, if X is entered, the driver is stable and can be used for
long periods.

Regards,
Luís Mendes

On Tue, Dec 18, 2018 at 11:16 PM Luís Mendes wrote:
>
> Hi Alex,
>
> I am already using drm_arch_can_wc_memory() set to false.
> I will try to bisect...
>
> Regards,
> Luís
>
> On Tue, Dec 18, 2018 at 7:03 PM Alex Deucher wrote:
> >
> > On Tue, Dec 18, 2018 at 8:58 AM Luís Mendes wrote:
> > >
> > > Hi Christian,
> > >
> > > I've been using a Sapphire RX 550 and a Sapphire RX 460 on a custom
> > > armhf board that runs well with Linux 4.19.9 at least, but now
> > > starting with Linux kernel 4.20, I'm having a gpu hang, right after
> > > the console being displayed, but before entering in graphical mode,
> > > when starting X session.
> > > I'm only reporting this now, because there was a PCI commit for mvebu
> > > that also entered for linux-4.20 that caused a kernel oops during
> > > pci_map_rom call in amdgpu initialization code. I've reverted that
> > > patch, but now amdgpu is hanging.
> >
> > It would be useful if you could bisect. This is the first I've heard
> > of amdgpu working on an ARM board without write combining (WC)
> > disabled. You might check to see if disabling WC helps. Return false
> > in drm_arch_can_wc_memory().
> >
> > Alex
> >
> > >
> > > [ 24.801861] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> > > timeout, signaled seq=2, emitted seq=3
> > >
> > > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> > > [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
> > >         Subsystem: Sapphire Technology Limited Baffin [Radeon RX 560]
> > >         Flags: bus master, fast devsel, latency 0, IRQ 51
> > >         Memory at d000 (64-bit, prefetchable) [size=256M]
> > >         Memory at e000 (64-bit, prefetchable) [size=2M]
> > >         I/O ports at 1 [size=256]
> > >         Memory at e020 (32-bit, non-prefetchable) [size=256K]
> > >         Expansion ROM at e024 [disabled] [size=128K]
> > >         Capabilities:
> > >         Kernel driver in use: amdgpu
> > >         Kernel modules: amdgpu
> > >
> > > dmesg follows in attachment.
> > >
> > > Regards,
> > > Luís

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
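
As an aid to reading the "ring gfx timeout, signaled seq=2, emitted seq=3" line quoted in this thread: the two numbers are fence sequence counters, and the message says that the driver had submitted work up to sequence 3 while the GPU had only reported completion up to sequence 2 when the job timer expired. The following is a minimal user-space sketch of that comparison, not the driver's code; struct mock_fence_state and both mock_* functions are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one ring's fence counters. */
struct mock_fence_state {
	uint32_t signaled_seq;	/* last sequence number the GPU reported complete */
	uint32_t emitted_seq;	/* last sequence number the driver submitted */
};

/* Submitted jobs not yet signaled; unsigned subtraction stays correct on wrap. */
static uint32_t mock_pending_jobs(const struct mock_fence_state *s)
{
	return s->emitted_seq - s->signaled_seq;
}

/* A ring is suspect when the timeout has expired and work is still outstanding. */
static bool mock_ring_looks_hung(const struct mock_fence_state *s,
				 bool timeout_expired)
{
	return timeout_expired && mock_pending_jobs(s) != 0;
}
```

With the values from the log (signaled 2, emitted 3), exactly one job is outstanding, which is consistent with the hang occurring on one of the first submissions after the ring is brought up.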