[PATCH] drm/amdgpu: avoid clearing freed bo with sdma in gpu reset

2020-05-06 Thread Tiecheng Zhou
WHY:
For V320 passthrough with "modprobe amdgpu lockup_timeout=500", a kernel NULL
pointer dereference occurs when quark tests trigger a BACO reset, for instance:
  hang_vm_compute0_bad_cs_dispatch.lua
  hang_vm_dma0_corrupted_header.lua
  etc.
-
[  884.792885] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.0.0 timeout, signaled seq=3, emitted seq=4
[  884.793772] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process quark pid 16939 thread quark pid 16940
[  884.859979] amdgpu: [powerplay] set virtualization GFX DPM policy success
[  884.861003] amdgpu: [powerplay] activate virtualization GFX DPM policy success
[  884.861065] amdgpu: [powerplay] set virtualization VCE DPM policy success
[  885.693554] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[  885.694682] [drm] schedpage0 is not ready, skipping
[  885.694682] [drm] schedpage1 is not ready, skipping
[  885.694720] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
[  885.695328] BUG: unable to handle kernel NULL pointer dereference at 0008
[  885.695909] PGD 0 P4D 0
[  885.696104] Oops:  [#1] SMP PTI
[  885.696368] CPU: 2 PID: 16940 Comm: quark Tainted: G   OE 4.19.52+ #6
[  885.696945] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  885.697593] RIP: 0010:amdgpu_vm_sdma_commit+0x59/0x130 [amdgpu]
...
[  885.705042] Call Trace:
[  885.705251]  ? amdgpu_vm_bo_update_mapping+0xdf/0xf0 [amdgpu]
[  885.705696]  ? amdgpu_vm_clear_freed+0xcc/0x1b0 [amdgpu]
[  885.706112]  ? amdgpu_gem_va_ioctl+0x4a1/0x510 [amdgpu]
[  885.706493]  ? __radix_tree_delete+0x7e/0xa0
[  885.706822]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
[  885.707220]  ? drm_ioctl_kernel+0xaa/0xf0 [drm]
[  885.707568]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
[  885.707962]  ? drm_ioctl_kernel+0xaa/0xf0 [drm]
[  885.708294]  ? drm_ioctl+0x3a7/0x3f0 [drm]
[  885.708632]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
[  885.709032]  ? unmap_region+0xd9/0x120
[  885.709328]  ? amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[  885.709684]  ? do_vfs_ioctl+0xa1/0x620
[  885.709971]  ? do_munmap+0x32e/0x430
[  885.710232]  ? ksys_ioctl+0x66/0x70
[  885.710513]  ? __x64_sys_ioctl+0x16/0x20
[  885.710806]  ? do_syscall_64+0x55/0x100
[  885.711092]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
[  885.719408] ---[ end trace 7ee3180f42e9f572 ]---
[  885.719766] RIP: 0010:amdgpu_vm_sdma_commit+0x59/0x130 [amdgpu]
...
-

The NULL pointer dereference (entity->rq == NULL in amdgpu_vm_sdma_commit())
arises as follows:
1. quark submits a bad job that triggers a job timeout;
2. the guest KMD detects the job timeout and starts GPU recovery, which runs
   ip_suspend for SDMA and sets sdma[].sched.ready to false;
3. quark issues an UNMAP operation through amdgpu_gem_va_ioctl; the guest KMD
   goes through amdgpu_gem_va_update_vm to amdgpu_vm_sdma_commit, which calls
   amdgpu_job_submit and then drm_sched_job_init;
4. drm_sched_job_init fails in drm_sched_pick_best() because
   sdma[].sched.ready is false, and in the meantime entity->rq becomes NULL;
5. quark issues further UNMAP operations through amdgpu_gem_va_ioctl, and this
   time the NULL entity->rq is dereferenced.

This sequence occurs only with "modprobe amdgpu lockup_timeout=500".
It does not occur with lockup_timeout=1 (default), because in that case the KMD
detects the job timeout (step 2) only some time after quark has sent its UNMAP
operations; i.e. the UNMAP operations finish before the SDMA ip_suspend.

HOW:
Take the reset mutex (lock_reset) in amdgpu_vm_clear_freed() so that it waits
for a GPU reset in progress and does not use SDMA during the reset.
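
The serialization described under HOW can be sketched in userspace C. This is
only an illustration and not the driver code: struct fake_dev,
fake_submit_clear, fake_gpu_reset and fake_clear_freed are hypothetical names,
and a pthread mutex stands in for the kernel mutex. The clear path takes the
same lock as the reset path, so it cannot run while the engine is suspended.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the device state; not the amdgpu structs. */
struct fake_dev {
	pthread_mutex_t lock_reset; /* plays the role of the reset lock */
	bool sdma_ready;            /* plays the role of sdma[].sched.ready */
	int freed;                  /* number of freed mappings still to clear */
};

/* Stand-in for one SDMA submission: fails while the engine is not ready. */
static int fake_submit_clear(struct fake_dev *dev)
{
	if (!dev->sdma_ready)
		return -2;
	dev->freed--;
	return 0;
}

/* Reset path: the engine is marked not ready only while the lock is held. */
static void fake_gpu_reset(struct fake_dev *dev)
{
	pthread_mutex_lock(&dev->lock_reset);
	dev->sdma_ready = false; /* suspend step */
	dev->sdma_ready = true;  /* resume step */
	pthread_mutex_unlock(&dev->lock_reset);
}

/* Clear path: waits for any reset in flight, then drains the freed list. */
static int fake_clear_freed(struct fake_dev *dev)
{
	int r = 0;

	pthread_mutex_lock(&dev->lock_reset);
	while (dev->freed > 0) {
		r = fake_submit_clear(dev);
		if (r)
			break; /* single exit keeps lock and unlock paired */
	}
	pthread_mutex_unlock(&dev->lock_reset);
	return r;
}
```

Leaving the loop through a single unlock, instead of unlocking before each
early return, is a choice of this sketch; either form keeps the lock balanced.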

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e205ecc75a21..018b88f3b6da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2047,6 +2047,8 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
struct dma_fence *f = NULL;
int r;
 
+   mutex_lock(&adev->lock_reset);
+
while (!list_empty(&vm->freed)) {
mapping = list_first_entry(&vm->freed,
struct amdgpu_bo_va_mapping, list);
@@ -2062,6 +2064,7 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
amdgpu_vm_free_mapping(adev, vm, mapping, f);
if (r) {
dma_fence_put(f);
+   mutex_unlock(&adev->lock_reset);
return r;
}
}
@@ -2073,6 +2076,7 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
dma_fence_put(f);
}
 
+   mutex_unlock(&adev->lock_reset);
return 0;
 
 }
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] drm/amd/powerplay: avoid using pm_en before it is initialized revised

2020-04-26 Thread Tiecheng Zhou
hwmgr->pm_en is initialized in hwmgr_hw_init.

During amdgpu_device_init, amdgpu_asic_reset calls
soc15_asic_reset (for the V320 use case, a Vega10 ASIC), in which:
1) soc15_asic_reset_method calls pp_get_asic_baco_capability (pm_en)
2) soc15_asic_baco_reset calls pp_set_asic_baco_state (pm_en)

pm_en is used in both cases before it has been initialized.

So avoid using pm_en in these two functions for V320 passthrough.
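
A minimal sketch of the guard this patch switches to, assuming simplified
types: struct fake_hwmgr, fake_dpm and the fake_* functions are hypothetical
stand-ins, not the powerplay API. The point is that the check depends only on
state set at create time (not_vf and the dpm parameter), so it gives the same
answer whether or not pm_en has been initialized yet.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of the hardware manager state. */
struct fake_hwmgr {
	bool not_vf; /* known at create time */
	bool pm_en;  /* only valid after hw_init; deliberately not read below */
	int (*get_baco_capability)(bool *cap); /* optional backend hook */
};

/* Stand-in for the amdgpu_dpm module parameter. */
static int fake_dpm = 1;

/* Example backend hook that reports BACO as supported. */
static int fake_backend_cap(bool *cap)
{
	*cap = true;
	return 0;
}

/* Query the capability without consulting pm_en. */
static int fake_get_baco_capability(struct fake_hwmgr *hwmgr, bool *cap)
{
	if (!hwmgr)
		return -EINVAL;

	/* Same shape as the new condition in the diff. */
	if (!(hwmgr->not_vf && fake_dpm) || !hwmgr->get_baco_capability)
		return 0; /* *cap is left untouched, as in the original code */

	return hwmgr->get_baco_capability(cap);
}
```

Callers should initialize *cap before the call, since the early return
reports success without writing it.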

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 71b843f542d8..fc31499c2e5c 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool 
*cap)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->get_asic_baco_capability)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
@@ -1472,7 +1473,8 @@ static int pp_set_asic_baco_state(void *handle, int state)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->set_asic_baco_state)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->set_asic_baco_state)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
-- 
2.17.1



[PATCH 1/2] Revert "drm/amd/powerplay: avoid using pm_en before it is initialized"

2020-04-26 Thread Tiecheng Zhou
This reverts commit 764a21cb085b8d7d754b5d74e2ecc6adc064e3e7.

The reverted commit changed the wrong place; the change belongs in
pp_get_asic_baco_capability.

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index fdff3e1c5e95..71b843f542d8 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1455,8 +1455,7 @@ static int pp_get_asic_baco_state(void *handle, int 
*state)
if (!hwmgr)
return -EINVAL;
 
-   if (!(hwmgr->not_vf && amdgpu_dpm) ||
-   !hwmgr->hwmgr_func->get_asic_baco_state)
+   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_state)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
-- 
2.17.1



[PATCH 2/2] drm/amd/powerplay: avoid using pm_en before it is initialized revised

2020-04-26 Thread Tiecheng Zhou
hwmgr->pm_en is initialized in hwmgr_hw_init.
During amdgpu_device_init, amdgpu_asic_reset calls
pp_get_asic_baco_capability while hwmgr->pm_en has not yet been initialized.

This patch avoids using pm_en in pp_get_asic_baco_capability.

Signed-off-by: Tiecheng Zhou 
Signed-off-by: Yintian Tao 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 71b843f542d8..fb4ca614f6e3 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool 
*cap)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->get_asic_baco_capability)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
-- 
2.17.1



[PATCH] drm/amd/powerplay: avoid using pm_en before it is initialized 2nd

2020-04-26 Thread Tiecheng Zhou
hwmgr->pm_en is initialized in hwmgr_hw_init.
During amdgpu_device_init, amdgpu_asic_reset calls
pp_get_asic_baco_capability while hwmgr->pm_en has not yet been initialized.

This is a second patch, which avoids using pm_en in pp_get_asic_baco_capability.

Signed-off-by: Tiecheng Zhou 
Signed-off-by: Yintian Tao 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index fdff3e1c5e95..b27f71c75550 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool 
*cap)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->get_asic_baco_capability)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
-- 
2.17.1



[PATCH] drm/amd/powerplay: avoid using pm_en before it is initialized

2020-04-02 Thread Tiecheng Zhou
hwmgr->pm_en is initialized in hwmgr_hw_init.
During amdgpu_device_init, amdgpu_asic_reset calls
pp_get_asic_baco_capability while hwmgr->pm_en has not yet been initialized.

So avoid using pm_en in pp_get_asic_baco_capability.

Signed-off-by: Tiecheng Zhou 
Signed-off-by: Yintian Tao 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 71b843f542d8..fdff3e1c5e95 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1455,7 +1455,8 @@ static int pp_get_asic_baco_state(void *handle, int 
*state)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_state)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->get_asic_baco_state)
return 0;
 
mutex_lock(&hwmgr->smu_lock);
-- 
2.17.1



[PATCH] drm/amd/powerplay: determine pm_en at amd_powerplay_create

2020-04-01 Thread Tiecheng Zhou
pm_en needs to be determined in amd_powerplay_create, at the early_init stage.

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 +++
 drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c   | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 71b843f542d8..a37dc37dfe49 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -48,6 +48,9 @@ static int amd_powerplay_create(struct amdgpu_device *adev)
 
hwmgr->adev = adev;
hwmgr->not_vf = !amdgpu_sriov_vf(adev);
+   hwmgr->pp_one_vf = amdgpu_sriov_is_pp_one_vf(adev);
+   hwmgr->pm_en = (amdgpu_dpm && (hwmgr->not_vf || hwmgr->pp_one_vf))
+   ? true : false;
hwmgr->device = amdgpu_cgs_create_device(adev);
mutex_init(&hwmgr->smu_lock);
mutex_init(&hwmgr->msg_lock);
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index f48fdc7f0382..7aee382fc1f9 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -221,9 +221,6 @@ int hwmgr_hw_init(struct pp_hwmgr *hwmgr)
 {
int ret = 0;
 
-   hwmgr->pp_one_vf = amdgpu_sriov_is_pp_one_vf((struct amdgpu_device 
*)hwmgr->adev);
-   hwmgr->pm_en = (amdgpu_dpm && (hwmgr->not_vf || hwmgr->pp_one_vf))
-   ? true : false;
if (!hwmgr->pm_en)
return 0;
 
-- 
2.17.1



[PATCH] drm/amdgpu/sriov: skip programing some regs with new L1 policy

2020-03-01 Thread Tiecheng Zhou
With the new L1 policy, some registers are blocked in the guest and are
programmed on the host side, so skip programming them under SR-IOV.

The registers are:
GCMC_VM_FB_LOCATION_TOP
GCMC_VM_FB_LOCATION_BASE
MMMC_VM_FB_LOCATION_TOP
MMMC_VM_FB_LOCATION_BASE
GCMC_VM_SYSTEM_APERTURE_HIGH_ADDR
GCMC_VM_SYSTEM_APERTURE_LOW_ADDR
MMMC_VM_SYSTEM_APERTURE_HIGH_ADDR
MMMC_VM_SYSTEM_APERTURE_LOW_ADDR
HDP_NONSURFACE_BASE
HDP_NONSURFACE_BASE_HI
GCMC_VM_AGP_TOP
GCMC_VM_AGP_BOT
GCMC_VM_AGP_BASE
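
The skip can be pictured with a small register-file model. This is a sketch
under stated assumptions: struct fake_dev, enum fake_reg and fake_wreg are
hypothetical, and only the two system aperture registers are modeled, using
the >> 18 shift that appears in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register indices; not the real register offsets. */
enum fake_reg {
	FAKE_SYS_APERTURE_LOW,
	FAKE_SYS_APERTURE_HIGH,
	FAKE_REG_COUNT
};

/* Hypothetical device model with a tiny register file. */
struct fake_dev {
	bool is_vf;                    /* true for an SR-IOV guest */
	uint64_t vram_start;
	uint64_t vram_end;
	uint32_t regs[FAKE_REG_COUNT];
	int writes;                    /* counts register writes */
};

/* Stand-in for a 32-bit register write. */
static void fake_wreg(struct fake_dev *dev, enum fake_reg reg, uint32_t val)
{
	dev->regs[reg] = val;
	dev->writes++;
}

/* Program the system aperture only when not running as a guest. */
static void fake_init_system_aperture(struct fake_dev *dev)
{
	if (dev->is_vf)
		return; /* the host programs these registers */

	fake_wreg(dev, FAKE_SYS_APERTURE_LOW,
		  (uint32_t)(dev->vram_start >> 18));
	fake_wreg(dev, FAKE_SYS_APERTURE_HIGH,
		  (uint32_t)(dev->vram_end >> 18));
}
```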

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 55 +++-
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c  | 29 ++---
 2 files changed, 37 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
index e0654a216ab5..cc866c367939 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
@@ -81,24 +81,31 @@ static void gfxhub_v2_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
 {
uint64_t value;
 
-   /* Disable AGP. */
-   WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_BASE, 0);
-   WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_TOP, 0);
-   WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_BOT, 0x00FF);
-
-   /* Program the system aperture low logical page number. */
-   WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_LOW_ADDR,
-adev->gmc.vram_start >> 18);
-   WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_HIGH_ADDR,
-adev->gmc.vram_end >> 18);
-
-   /* Set default page address. */
-   value = adev->vram_scratch.gpu_addr - adev->gmc.vram_start
-   + adev->vm_manager.vram_base_offset;
-   WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_LSB,
-(u32)(value >> 12));
-   WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB,
-(u32)(value >> 44));
+   if (!amdgpu_sriov_vf(adev)) {
+   /*
+* the new L1 policy will block SRIOV guest from writing
+* these regs, and they will be programed at host.
+* so skip programing these regs.
+*/
+   /* Disable AGP. */
+   WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_BASE, 0);
+   WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_TOP, 0);
+   WREG32_SOC15(GC, 0, mmGCMC_VM_AGP_BOT, 0x00FF);
+
+   /* Program the system aperture low logical page number. */
+   WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_LOW_ADDR,
+adev->gmc.vram_start >> 18);
+   WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_HIGH_ADDR,
+adev->gmc.vram_end >> 18);
+
+   /* Set default page address. */
+   value = adev->vram_scratch.gpu_addr - adev->gmc.vram_start
+   + adev->vm_manager.vram_base_offset;
+   WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_LSB,
+(u32)(value >> 12));
+   WREG32_SOC15(GC, 0, mmGCMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB,
+(u32)(value >> 44));
+   }
 
/* Program "protection fault". */
WREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_DEFAULT_ADDR_LO32,
@@ -260,18 +267,6 @@ static void gfxhub_v2_0_program_invalidation(struct 
amdgpu_device *adev)
 
 int gfxhub_v2_0_gart_enable(struct amdgpu_device *adev)
 {
-   if (amdgpu_sriov_vf(adev)) {
-   /*
-* GCMC_VM_FB_LOCATION_BASE/TOP is NULL for VF, becuase they are
-* VF copy registers so vbios post doesn't program them, for
-* SRIOV driver need to program them
-*/
-   WREG32_SOC15(GC, 0, mmGCMC_VM_FB_LOCATION_BASE,
-adev->gmc.vram_start >> 24);
-   WREG32_SOC15(GC, 0, mmGCMC_VM_FB_LOCATION_TOP,
-adev->gmc.vram_end >> 24);
-   }
-
/* GART Enable. */
gfxhub_v2_0_init_gart_aperture_regs(adev);
gfxhub_v2_0_init_system_aperture_regs(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c
index bde189680521..fb3f228458e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c
@@ -72,11 +72,18 @@ static void mmhub_v2_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
WREG32_SOC15(MMHUB, 0, mmMMMC_VM_AGP_TOP, 0);
WREG32_SOC15(MMHUB, 0, mmMMMC_VM_AGP_BOT, 0x00FF);
 
-   /* Program the system aperture low logical page number. */
-   WREG32_SOC15(MMHUB, 0, mmMMMC_VM_SYSTEM_APERTURE_LOW_ADDR,
-adev->gmc.vram_start >> 18);
-   WREG32_SOC15(MMHUB, 0, mmMMMC_VM_SYSTEM_APERTURE_HIGH_ADDR,
-adev-

[PATCH] drm/amdgpu/sriov: workaround on rev_id for Navi12 under sriov

2020-01-07 Thread Tiecheng Zhou
The guest VM gets 0x when reading RCC_DEV0_EPF0_STRAP0;
as a consequence, rev_id and external_rev_id are wrong.

Work around it by hardcoding rev_id to 0, which is the default value.
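
A short sketch of the effect, assuming a reduced device struct: struct
fake_dev and fake_early_init are hypothetical, while the + 0xa offset is the
one used in the diff below. Under SR-IOV the strap value is ignored, so
external_rev_id always comes out as 0xa.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced device state. */
struct fake_dev {
	bool is_vf;               /* true for an SR-IOV guest */
	uint32_t rev_id;
	uint32_t external_rev_id;
};

/* Derive the revision IDs; strap_rev is whatever the strap read returned. */
static void fake_early_init(struct fake_dev *dev, uint32_t strap_rev)
{
	dev->rev_id = strap_rev;

	/* The guest cannot trust the strap read, so use the default. */
	if (dev->is_vf)
		dev->rev_id = 0;

	dev->external_rev_id = dev->rev_id + 0xa;
}
```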

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index b0229543e887..63d54604ace6 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -726,6 +726,8 @@ static int nv_common_early_init(void *handle)
AMD_PG_SUPPORT_VCN_DPG |
AMD_PG_SUPPORT_JPEG |
AMD_PG_SUPPORT_ATHUB;
+   if (amdgpu_sriov_vf(adev))
+   adev->rev_id = 0;
adev->external_rev_id = adev->rev_id + 0xa;
break;
default:
-- 
2.17.1



[PATCH] drm/amdgpu/sriov: Need to initialize the HDP_NONSURFACE_BASE

2019-05-14 Thread Tiecheng Zhou
HDP_NONSURFACE_BASE must be initialized, to avoid using the value
left by a previous VM in the SR-IOV scenario.

v2: it should not hurt bare metal, so generalize it for both SR-IOV
and bare metal
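
The two register values are the VRAM start address split at bit 8 and bit 40,
as in the diff below. The arithmetic can be checked with a small sketch; the
helper names are hypothetical, and it assumes each register write keeps only
the low 32 bits of the value it is given.

```c
#include <stdint.h>

/* Value for the BASE register: address bits 8..39. */
static uint32_t fake_nonsurface_base(uint64_t vram_start)
{
	return (uint32_t)(vram_start >> 8);
}

/* Value for the BASE_HI register: address bits 40 and up. */
static uint32_t fake_nonsurface_base_hi(uint64_t vram_start)
{
	return (uint32_t)(vram_start >> 40);
}

/* Rebuild the address from both halves; exact for 256-byte aligned input. */
static uint64_t fake_rebuild(uint32_t base, uint32_t base_hi)
{
	return ((uint64_t)base_hi << 40) | ((uint64_t)base << 8);
}
```

For a 256-byte aligned address the round trip is lossless, which is why the
two shifts differ by exactly 32.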

Signed-off-by: Emily Deng 
Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index be729e7..c221570 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1181,6 +1181,9 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device 
*adev)
tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL);
WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp);
 
+   WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE, (adev->gmc.vram_start >> 8));
+   WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE_HI, (adev->gmc.vram_start >> 40));
+
/* After HDP is initialized, flush HDP.*/
adev->nbio_funcs->hdp_flush(adev, NULL);
 
-- 
2.7.4


[PATCH] drm/amdgpu/sriov: Need to initialize the HDP_NONSURFACE_BASE

2019-05-14 Thread Tiecheng Zhou
HDP_NONSURFACE_BASE must be initialized, to avoid using the value
left by a previous VM in the SR-IOV scenario.

Signed-off-by: Emily Deng 
Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index be729e7..e96684e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1181,6 +1181,11 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device 
*adev)
tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL);
WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp);
 
+   if (amdgpu_sriov_vf(adev)) {
+   WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE, (adev->gmc.vram_start >> 8));
+   WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE_HI, (adev->gmc.vram_start >> 40));
+   }
+
/* After HDP is initialized, flush HDP.*/
adev->nbio_funcs->hdp_flush(adev, NULL);
 
-- 
2.7.4


[PATCH] drm/amdgpu/sriov: Need to initialize the HDP_NONSURFACE_BASE

2019-05-13 Thread Tiecheng Zhou
HDP_NONSURFACE_BASE must be initialized, to avoid using the value
left by a previous VM in the SR-IOV scenario.

Signed-off-by: Emily Deng 
Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index be729e7..e96684e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1181,6 +1181,11 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device 
*adev)
tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL);
WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp);
 
+   if (amdgpu_sriov_vf(adev)) {
+   WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE, (adev->gmc.vram_start >> 8));
+   WREG32_SOC15(HDP, 0, mmHDP_NONSURFACE_BASE_HI, (adev->gmc.vram_start >> 40));
+   }
+
/* After HDP is initialized, flush HDP.*/
adev->nbio_funcs->hdp_flush(adev, NULL);
 
-- 
2.7.4


[PATCH] drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq rings test sequence

2018-12-27 Thread Tiecheng Zhou
The kiq ring and the very first compute ring may occasionally fail their
ring tests if they are tested directly after kiq_kcq_enable.

Inserting the gfx ring test before the kiq ring test delays the kiq and kcq
ring tests, which fixes the issue.
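
The resulting test order can be sketched as follows. This is an illustration
with hypothetical names (struct fake_ring, fake_ring_test,
fake_test_all_rings), not the gfx_v8_0 code. It mirrors the shape of the
helper added in the diff below: gfx first, then kiq, then the compute rings,
with compute test results not propagated to the caller.

```c
#include <stddef.h>

/* Hypothetical ring model that records when it was tested. */
struct fake_ring {
	int healthy;   /* nonzero if the ring test should pass */
	int tested_at; /* 1-based position in the test order, 0 if untested */
};

/* Running count of ring tests, used only to record the order. */
static int fake_test_counter;

/* Stand-in for a single ring test. */
static int fake_ring_test(struct fake_ring *ring)
{
	ring->tested_at = ++fake_test_counter;
	return ring->healthy ? 0 : -1;
}

/* Test gfx, then kiq, then every compute ring. */
static int fake_test_all_rings(struct fake_ring *gfx, struct fake_ring *kiq,
			       struct fake_ring *compute, size_t num_compute)
{
	size_t i;
	int r;

	r = fake_ring_test(gfx);
	if (r)
		return r;

	r = fake_ring_test(kiq);
	if (r)
		return r;

	/* Compute ring failures are not returned, as in the patch. */
	for (i = 0; i < num_compute; i++)
		fake_ring_test(&compute[i]);

	return 0;
}
```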

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 48 +--
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 381f593b..164ffc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4278,9 +4278,8 @@ static int gfx_v8_0_cp_gfx_resume(struct amdgpu_device 
*adev)
amdgpu_ring_clear_ring(ring);
gfx_v8_0_cp_gfx_start(adev);
ring->sched.ready = true;
-   r = amdgpu_ring_test_helper(ring);
 
-   return r;
+   return 0;
 }
 
 static void gfx_v8_0_cp_compute_enable(struct amdgpu_device *adev, bool enable)
@@ -4369,10 +4368,9 @@ static int gfx_v8_0_kiq_kcq_enable(struct amdgpu_device 
*adev)
amdgpu_ring_write(kiq_ring, upper_32_bits(wptr_addr));
}
 
-   r = amdgpu_ring_test_helper(kiq_ring);
-   if (r)
-   DRM_ERROR("KCQ enable failed\n");
-   return r;
+   amdgpu_ring_commit(kiq_ring);
+
+   return 0;
 }
 
 static int gfx_v8_0_deactivate_hqd(struct amdgpu_device *adev, u32 req)
@@ -4709,16 +4707,32 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device 
*adev)
if (r)
goto done;
 
-   /* Test KCQs - reversing the order of rings seems to fix ring test 
failure
-* after GPU reset
-*/
-   for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) {
+done:
+   return r;
+}
+
+static int gfx_v8_0_cp_test_all_rings(struct amdgpu_device *adev)
+{
+   int r, i;
+   struct amdgpu_ring *ring;
+
+   /* collect all the ring_tests here, gfx, kiq, compute */
+   ring = &adev->gfx.gfx_ring[0];
+   r = amdgpu_ring_test_helper(ring);
+   if (r)
+   return r;
+
+   ring = &adev->gfx.kiq.ring;
+   r = amdgpu_ring_test_helper(ring);
+   if (r)
+   return r;
+
+   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
ring = &adev->gfx.compute_ring[i];
-   r = amdgpu_ring_test_helper(ring);
+   amdgpu_ring_test_helper(ring);
}
 
-done:
-   return r;
+   return 0;
 }
 
 static int gfx_v8_0_cp_resume(struct amdgpu_device *adev)
@@ -4739,6 +4753,11 @@ static int gfx_v8_0_cp_resume(struct amdgpu_device *adev)
r = gfx_v8_0_kcq_resume(adev);
if (r)
return r;
+
+   r = gfx_v8_0_cp_test_all_rings(adev);
+   if (r)
+   return r;
+
gfx_v8_0_enable_gui_idle_interrupt(adev, true);
 
return 0;
@@ -5056,6 +5075,7 @@ static int gfx_v8_0_post_soft_reset(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
u32 grbm_soft_reset = 0;
+   struct amdgpu_ring *ring;
 
if ((!adev->gfx.grbm_soft_reset) &&
(!adev->gfx.srbm_soft_reset))
@@ -5086,6 +5106,8 @@ static int gfx_v8_0_post_soft_reset(void *handle)
REG_GET_FIELD(grbm_soft_reset, GRBM_SOFT_RESET, SOFT_RESET_GFX))
gfx_v8_0_cp_gfx_resume(adev);
 
+   gfx_v8_0_cp_test_all_rings(adev);
+
adev->gfx.rlc.funcs->start(adev);
 
return 0;
-- 
2.7.4



[PATCH] drm/amdgpu/sriov: sriov won't support gfx off

2018-08-13 Thread Tiecheng Zhou
Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 790fd54..e67ab25 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -389,6 +389,8 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device 
*adev)
 
 void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
 {
+   if (amdgpu_sriov_vf(adev))
+   return;
if (!(adev->powerplay.pp_feature & PP_GFXOFF_MASK))
return;
 
-- 
2.7.4
