[PATCH] drm/amdgpu: avoid clearing freed bo with sdma in gpu reset
WHY: For V320 passthrough and "modprobe amdgpu lockup_timeout=500", there will be kernel NULL pointer when using quark ~ BACO reset, for instance: hang_vm_compute0_bad_cs_dispatch.lua hang_vm_dma0_corrupted_header.lua etc. - [ 884.792885] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.0.0 timeout, signaled seq=3, emitted seq=4 [ 884.793772] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process quark pid 16939 thread quark pid 16940 [ 884.859979] amdgpu: [powerplay] set virtualization GFX DPM policy success [ 884.861003] amdgpu: [powerplay] activate virtualization GFX DPM policy success [ 884.861065] amdgpu: [powerplay] set virtualization VCE DPM policy success [ 885.693554] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 885.694682] [drm] schedpage0 is not ready, skipping [ 885.694682] [drm] schedpage1 is not ready, skipping [ 885.694720] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2) [ 885.695328] BUG: unable to handle kernel NULL pointer dereference at 0008 [ 885.695909] PGD 0 P4D 0 [ 885.696104] Oops: [#1] SMP PTI [ 885.696368] CPU: 2 PID: 16940 Comm: quark Tainted: G OE 4.19.52+ #6 [ 885.696945] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 885.697593] RIP: 0010:amdgpu_vm_sdma_commit+0x59/0x130 [amdgpu] ... [ 885.705042] Call Trace: [ 885.705251] ? amdgpu_vm_bo_update_mapping+0xdf/0xf0 [amdgpu] [ 885.705696] ? amdgpu_vm_clear_freed+0xcc/0x1b0 [amdgpu] [ 885.706112] ? amdgpu_gem_va_ioctl+0x4a1/0x510 [amdgpu] [ 885.706493] ? __radix_tree_delete+0x7e/0xa0 [ 885.706822] ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu] [ 885.707220] ? drm_ioctl_kernel+0xaa/0xf0 [drm] [ 885.707568] ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu] [ 885.707962] ? drm_ioctl_kernel+0xaa/0xf0 [drm] [ 885.708294] ? drm_ioctl+0x3a7/0x3f0 [drm] [ 885.708632] ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu] [ 885.709032] ? unmap_region+0xd9/0x120 [ 885.709328] ? amdgpu_drm_ioctl+0x49/0x80 [amdgpu] [ 885.709684] ? do_vfs_ioctl+0xa1/0x620 [ 885.709971] ? do_munmap+0x32e/0x430 [ 885.710232] ? ksys_ioctl+0x66/0x70 [ 885.710513] ? __x64_sys_ioctl+0x16/0x20 [ 885.710806] ? do_syscall_64+0x55/0x100 [ 885.711092] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ... [ 885.719408] ---[ end trace 7ee3180f42e9f572 ]--- [ 885.719766] RIP: 0010:amdgpu_vm_sdma_commit+0x59/0x130 [amdgpu] ... - the NULL pointer (entity->rq == NULL in amdgpu_vm_sdma_commit()) as follows: 1. quark sends bad job that triggers job timeout; 2. guest KMD detects the job timeout and goes to gpu recovery, and it goes to ip_suspend for SDMA, and it sets sdma[].sched.ready to false; 3. quark sends UNMAP operation through amdgpu_gem_va_ioctl, and guest KMD goes through amdgpu_gem_va_update_vm and finally goes to amdgpu_vm_sdma_commit, it goes to amdgpu_job_submit to drm_sched_job_init 4. drm_sched_job_init fails at drm_sched_pick_best() since sdma[].sched.ready is set to false; in the meanwhile entity->rq becomes NULL; 5. quark sends other UNMAP operations through amdgpu_gem_va_ioctl, while this time there will be NULL pointer because entity->rq is NULL; the above sequence occurs only when "modprobe amdgpu lockup_timeout=500". it does not occur when lockup_timeout=1 (default) because step 2. KMD detects job timeout will be sometime after quark sends UNMAP operations; i.e. quark UNMAP opeartions are finished before sdma ip suspend. HOW: here is to add mutex_lock to wait to avoid using sdma during gpu reset. Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index e205ecc75a21..018b88f3b6da 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2047,6 +2047,8 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev, struct dma_fence *f = NULL; int r; + mutex_lock(>lock_reset); + while (!list_empty(>freed)) { mapping = list_first_entry(>freed, struct amdgpu_bo_va_mapping, list); @@ -2062,6 +2064,7 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev, amdgpu_vm_free_mapping(adev, vm, mapping, f); if (r) { dma_fence_put(f); + mutex_unlock(>lock_reset); return r; } } @@ -2073,6 +2076,7 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev, dma_fence_put(f); } + mutex_unlock(>lock_reset); return 0; } -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 2/2] drm/amd/powerplay: avoid using pm_en before it is initialized revised
hwmgr->pm_en is initialized at hwmgr_hw_init. during amdgpu_device_init, there is amdgpu_asic_reset that calls to soc15_asic_reset (for V320 usecase, Vega10 asic), in which: 1) soc15_asic_reset_method calls to pp_get_asic_baco_capability (pm_en) 2) soc15_asic_baco_reset calls to pp_set_asic_baco_state (pm_en) pm_en is used in the above two cases while it has not yet been initialized So avoid using pm_en in the above two functions for V320 passthrough. Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c index 71b843f542d8..fc31499c2e5c 100644 --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c @@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool *cap) if (!hwmgr) return -EINVAL; - if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability) + if (!(hwmgr->not_vf && amdgpu_dpm) || + !hwmgr->hwmgr_func->get_asic_baco_capability) return 0; mutex_lock(>smu_lock); @@ -1472,7 +1473,8 @@ static int pp_set_asic_baco_state(void *handle, int state) if (!hwmgr) return -EINVAL; - if (!hwmgr->pm_en || !hwmgr->hwmgr_func->set_asic_baco_state) + if (!(hwmgr->not_vf && amdgpu_dpm) || + !hwmgr->hwmgr_func->set_asic_baco_state) return 0; mutex_lock(>smu_lock); -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 1/2] Revert "drm/amd/powerplay: avoid using pm_en before it is initialized"
This reverts commit 764a21cb085b8d7d754b5d74e2ecc6adc064e3e7. The commit being reverted changed the wrong place, it should have changed in func get_asic_baco_capability. Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c index fdff3e1c5e95..71b843f542d8 100644 --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c @@ -1455,8 +1455,7 @@ static int pp_get_asic_baco_state(void *handle, int *state) if (!hwmgr) return -EINVAL; - if (!(hwmgr->not_vf && amdgpu_dpm) || - !hwmgr->hwmgr_func->get_asic_baco_state) + if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_state) return 0; mutex_lock(>smu_lock); -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 1/2] Revert "drm/amd/powerplay: avoid using pm_en before it is initialized"
This reverts commit 764a21cb085b8d7d754b5d74e2ecc6adc064e3e7. The commit being reverted changed the wrong place, it should have changed in func get_asic_baco_capability. Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c index fdff3e1c5e95..71b843f542d8 100644 --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c @@ -1455,8 +1455,7 @@ static int pp_get_asic_baco_state(void *handle, int *state) if (!hwmgr) return -EINVAL; - if (!(hwmgr->not_vf && amdgpu_dpm) || - !hwmgr->hwmgr_func->get_asic_baco_state) + if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_state) return 0; mutex_lock(>smu_lock); -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 2/2] drm/amd/powerplay: avoid using pm_en before it is initialized revised
hwmgr->pm_en is initialized at hwmgr_hw_init. during amdgpu_device_init, there is amdgpu_asic_reset that calls to pp_get_asic_baco_capability, while hwmgr->pm_en has not yet been initialized. this is to avoid using pm_en in pp_get_asic_baco_capability Signed-off-by: Tiecheng Zhou Signed-off-by: Yintian Tao --- drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c index 71b843f542d8..fb4ca614f6e3 100644 --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c @@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool *cap) if (!hwmgr) return -EINVAL; - if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability) + if (!(hwmgr->not_vf && amdgpu_dpm) || + !hwmgr->hwmgr_func->get_asic_baco_capability) return 0; mutex_lock(>smu_lock); -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amd/powerplay: avoid using pm_en before it is initialized 2nd
hwmgr->pm_en is initialized at hwmgr_hw_init. during amdgpu_device_init, there is amdgpu_asic_reset that calls to pp_get_asic_baco_capability, while hwmgr->pm_en has not yet been initialized. this is a second patch that avoid using pm_en in pp_get_asic_baco_capability Signed-off-by: Tiecheng Zhou Signed-off-by: Yintian Tao --- drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c index fdff3e1c5e95..b27f71c75550 100644 --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c @@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool *cap) if (!hwmgr) return -EINVAL; - if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability) + if (!(hwmgr->not_vf && amdgpu_dpm) || + !hwmgr->hwmgr_func->get_asic_baco_capability) return 0; mutex_lock(>smu_lock); -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amd/powerplay: avoid using pm_en before it is initialized
hwmgr->pm_en is initialized at hwmgr_hw_init. during amdgpu_device_init, there is amdgpu_asic_reset that calls to pp_get_asic_baco_capability, while hwmgr->pm_en has not yet been initialized. so avoid using pm_en in pp_get_asic_baco_capability. Signed-off-by: Tiecheng Zhou Signed-off-by: Yintian Tao --- drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c index 71b843f542d8..fdff3e1c5e95 100644 --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c @@ -1455,7 +1455,8 @@ static int pp_get_asic_baco_state(void *handle, int *state) if (!hwmgr) return -EINVAL; - if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_state) + if (!(hwmgr->not_vf && amdgpu_dpm) || + !hwmgr->hwmgr_func->get_asic_baco_state) return 0; mutex_lock(>smu_lock); -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amd/powerplay: determine pm_en at amd_powerplay_create
Need to determine pm_en at amd_powerplay_create of early_init stage. Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 +++ drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c | 3 --- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c index 71b843f542d8..a37dc37dfe49 100644 --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c @@ -48,6 +48,9 @@ static int amd_powerplay_create(struct amdgpu_device *adev) hwmgr->adev = adev; hwmgr->not_vf = !amdgpu_sriov_vf(adev); + hwmgr->pp_one_vf = amdgpu_sriov_is_pp_one_vf(adev); + hwmgr->pm_en = (amdgpu_dpm && (hwmgr->not_vf || hwmgr->pp_one_vf)) + ? true : false; hwmgr->device = amdgpu_cgs_create_device(adev); mutex_init(>smu_lock); mutex_init(>msg_lock); diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c index f48fdc7f0382..7aee382fc1f9 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c @@ -221,9 +221,6 @@ int hwmgr_hw_init(struct pp_hwmgr *hwmgr) { int ret = 0; - hwmgr->pp_one_vf = amdgpu_sriov_is_pp_one_vf((struct amdgpu_device *)hwmgr->adev); - hwmgr->pm_en = (amdgpu_dpm && (hwmgr->not_vf || hwmgr->pp_one_vf)) - ? true : false; if (!hwmgr->pm_en) return 0; -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu/sriov: skip programing some regs with new L1 policy
With new L1 policy, some regs are blocked at guest and they are programed at host side. So skip programing the regs under sriov. the regs are: GCMC_VM_FB_LOCATION_TOP GCMC_VM_FB_LOCATION_BASE MMMC_VM_FB_LOCATION_TOP MMMC_VM_FB_LOCATION_BASE GCMC_VM_SYSTEM_APERTURE_HIGH_ADDR GCMC_VM_SYSTEM_APERTURE_LOW_ADDR MMMC_VM_SYSTEM_APERTURE_HIGH_ADDR MMMC_VM_SYSTEM_APERTURE_LOW_ADDR HDP_NONSURFACE_BASE HDP_NONSURFACE_BASE_HI GCMC_VM_AGP_TOP GCMC_VM_AGP_BOT GCMC_VM_AGP_BASE Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 55 +++- drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c | 29 ++--- 2 files changed, 37 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c index e0654a216ab5..cc866c367939 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c @@ -81,24 +81,31 @@ static void gfxhub_v2_0_init_system_aperture_regs(struct amdgpu_device *adev) { uint64_t value; - /* Disable AGP. */ - WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_BASE, 0); - WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_TOP, 0); - WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_BOT, 0x00FF); - - /* Program the system aperture low logical page number. */ - WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_LOW_ADDR, -adev->gmc.vram_start >> 18); - WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_HIGH_ADDR, -adev->gmc.vram_end >> 18); - - /* Set default page address. */ - value = adev->vram_scratch.gpu_addr - adev->gmc.vram_start - + adev->vm_manager.vram_base_offset; - WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_LSB, -(u32)(value >> 12)); - WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB, -(u32)(value >> 44)); + if (!amdgpu_sriov_vf(adev)) { + /* +* the new L1 policy will block SRIOV guest from writing +* these regs, and they will be programed at host. +* so skip programing these regs. +*/ + /* Disable AGP. */ + WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_BASE, 0); + WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_TOP, 0); + WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_BOT, 0x00FF); + + /* Program the system aperture low logical page number. */ + WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_LOW_ADDR, +adev->gmc.vram_start >> 18); + WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_HIGH_ADDR, +adev->gmc.vram_end >> 18); + + /* Set default page address. */ + value = adev->vram_scratch.gpu_addr - adev->gmc.vram_start + + adev->vm_manager.vram_base_offset; + WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_LSB, +(u32)(value >> 12)); + WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB, +(u32)(value >> 44)); + } /* Program "protection fault". */ WREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_DEFAULT_ADDR_LO32, @@ -260,18 +267,6 @@ static void gfxhub_v2_0_program_invalidation(struct amdgpu_device *adev) int gfxhub_v2_0_gart_enable(struct amdgpu_device *adev) { - if (amdgpu_sriov_vf(adev)) { - /* -* GCMC_VM_FB_LOCATION_BASE/TOP is NULL for VF, becuase they are -* VF copy registers so vbios post doesn't program them, for -* SRIOV driver need to program them -*/ - WREG32_SOC15(GC, 0, mmGCMC_VM_FB_LOCATION_BASE, -adev->gmc.vram_start >> 24); - WREG32_SOC15(GC, 0, mmGCMC_VM_FB_LOCATION_TOP, -adev->gmc.vram_end >> 24); - } - /* GART Enable. */ gfxhub_v2_0_init_gart_aperture_regs(adev); gfxhub_v2_0_init_system_aperture_regs(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c index bde189680521..fb3f228458e5 100644 --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c @@ -72,11 +72,18 @@ static void mmhub_v2_0_init_system_aperture_regs(struct amdgpu_device *adev) WREG32_SOC15(MMHUB, 0, mmMMMC_VM_AGP_TOP, 0); WREG32_SOC15(MMHUB, 0, mmMMMC_VM_AGP_BOT, 0x00FF); - /* Program the system aperture low logical page number. */ - WREG32_SOC15(MMHUB, 0, mmMMMC_VM_SYSTEM_APERTURE_LOW_ADDR, -adev->gmc.vram_start >> 18); - WREG32_SOC15(MMHUB, 0, mmMMMC_VM_SYSTEM_APERTURE_HIGH_ADDR, -adev-
[PATCH] drm/amdgpu/sriov: workaround on rev_id for Navi12 under sriov
guest vm gets 0x when reading RCC_DEV0_EPF0_STRAP0, as a consequence, the rev_id and external_rev_id are wrong. workaround it by hardcoding the rev_id to 0, which is the default value. Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/amdgpu/nv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c index b0229543e887..63d54604ace6 100644 --- a/drivers/gpu/drm/amd/amdgpu/nv.c +++ b/drivers/gpu/drm/amd/amdgpu/nv.c @@ -726,6 +726,8 @@ static int nv_common_early_init(void *handle) AMD_PG_SUPPORT_VCN_DPG | AMD_PG_SUPPORT_JPEG | AMD_PG_SUPPORT_ATHUB; + if (amdgpu_sriov_vf(adev)) + adev->rev_id = 0; adev->external_rev_id = adev->rev_id + 0xa; break; default: -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu/sriov: Need to initialize the HDP_NONSURFACE_BAStE
it requires to initialize HDP_NONSURFACE_BASE, so as to avoid using the value left by a previous VM under sriov scenario. v2: it should not hurt baremetal, generalize it for both sriov and baremetal Signed-off-by: Emily Deng Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index be729e7..c221570 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1181,6 +1181,9 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev) tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL); WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp); + WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE, (adev->gmc.vram_start >> 8)); + WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE_HI, (adev->gmc.vram_start >> 40)); + /* After HDP is initialized, flush HDP.*/ adev->nbio_funcs->hdp_flush(adev, NULL); -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu/sriov: Need to initialize the HDP_NONSURFACE_BAStE
it requires to initialize HDP_NONSURFACE_BASE, so as to avoid using the value left by a previous VM under sriov scenario. Signed-off-by: Emily Deng Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index be729e7..e96684e 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1181,6 +1181,11 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev) tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL); WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp); + if (amdgpu_sriov_vf(adev)) { + WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE, (adev->gmc.vram_start >> 8)); + WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE_HI, (adev->gmc.vram_start >> 40)); + } + /* After HDP is initialized, flush HDP.*/ adev->nbio_funcs->hdp_flush(adev, NULL); -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu/sriov: Need to initialize the HDP_NONSURFACE_BAStE
it requires to initialize HDP_NONSURFACE_BASE, so as to avoid using the value left by a previous VM under sriov scenario. Signed-off-by: Emily Deng Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index be729e7..e96684e 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1181,6 +1181,11 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev) tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL); WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp); + if (amdgpu_sriov_vf(adev)) { + WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE, (adev->gmc.vram_start >> 8)); + WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE_HI, (adev->gmc.vram_start >> 40)); + } + /* After HDP is initialized, flush HDP.*/ adev->nbio_funcs->hdp_flush(adev, NULL); -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq rings test sequence
The kiq ring and the very first compute ring may fail occasionally if they are tested directly following kiq_kcq_enable. Insert the gfx ring test before kiq ring test to delay the kiq and kcq ring tests will fix the issue. Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 48 +-- 1 file changed, 35 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c index 381f593b..164ffc9 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c @@ -4278,9 +4278,8 @@ static int gfx_v8_0_cp_gfx_resume(struct amdgpu_device *adev) amdgpu_ring_clear_ring(ring); gfx_v8_0_cp_gfx_start(adev); ring->sched.ready = true; - r = amdgpu_ring_test_helper(ring); - return r; + return 0; } static void gfx_v8_0_cp_compute_enable(struct amdgpu_device *adev, bool enable) @@ -4369,10 +4368,9 @@ static int gfx_v8_0_kiq_kcq_enable(struct amdgpu_device *adev) amdgpu_ring_write(kiq_ring, upper_32_bits(wptr_addr)); } - r = amdgpu_ring_test_helper(kiq_ring); - if (r) - DRM_ERROR("KCQ enable failed\n"); - return r; + amdgpu_ring_commit(kiq_ring); + + return 0; } static int gfx_v8_0_deactivate_hqd(struct amdgpu_device *adev, u32 req) @@ -4709,16 +4707,32 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device *adev) if (r) goto done; - /* Test KCQs - reversing the order of rings seems to fix ring test failure -* after GPU reset -*/ - for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) { +done: + return r; +} + +static int gfx_v8_0_cp_test_all_rings(struct amdgpu_device *adev) +{ + int r, i; + struct amdgpu_ring *ring; + + /* collect all the ring_tests here, gfx, kiq, compute */ + ring = >gfx.gfx_ring[0]; + r = amdgpu_ring_test_helper(ring); + if (r) + return r; + + ring = >gfx.kiq.ring; + r = amdgpu_ring_test_helper(ring); + if (r) + return r; + + for (i = 0; i < adev->gfx.num_compute_rings; i++) { ring = >gfx.compute_ring[i]; - r = amdgpu_ring_test_helper(ring); + amdgpu_ring_test_helper(ring); } -done: - return r; + return 0; } static int gfx_v8_0_cp_resume(struct amdgpu_device *adev) @@ -4739,6 +4753,11 @@ static int gfx_v8_0_cp_resume(struct amdgpu_device *adev) r = gfx_v8_0_kcq_resume(adev); if (r) return r; + + r = gfx_v8_0_cp_test_all_rings(adev); + if (r) + return r; + gfx_v8_0_enable_gui_idle_interrupt(adev, true); return 0; @@ -5056,6 +5075,7 @@ static int gfx_v8_0_post_soft_reset(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; u32 grbm_soft_reset = 0; + struct amdgpu_ring *ring; if ((!adev->gfx.grbm_soft_reset) && (!adev->gfx.srbm_soft_reset)) @@ -5086,6 +5106,8 @@ static int gfx_v8_0_post_soft_reset(void *handle) REG_GET_FIELD(grbm_soft_reset, GRBM_SOFT_RESET, SOFT_RESET_GFX)) gfx_v8_0_cp_gfx_resume(adev); + gfx_v8_0_cp_test_all_rings(adev); + adev->gfx.rlc.funcs->start(adev); return 0; -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu/sriov: sriov won't support gfx off
Signed-off-by: Tiecheng Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index 790fd54..e67ab25 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -389,6 +389,8 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev) void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable) { + if (amdgpu_sriov_vf(adev)) + return; if (!(adev->powerplay.pp_feature & PP_GFXOFF_MASK)) return; -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx