RE: [PATCH 4/4] drm/amdgpu: Move ras resume into SRIOV function

2024-04-28 Thread Deng, Emily
[AMD Official Use Only - General]

Reviewed-by: Emily Deng 

Emily Deng
Best Wishes



>-Original Message-
>From: Li, Yunxiang (Teddy) 
>Sent: Friday, April 26, 2024 11:58 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Koenig, Christian
>; Lazar, Lijo ; Kuehling,
>Felix ; Deng, Emily ; Li,
>Yunxiang (Teddy) 
>Subject: [PATCH 4/4] drm/amdgpu: Move ras resume into SRIOV function
>
>This is part of the reset; move it into the reset function.
>
>Signed-off-by: Yunxiang Li 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +---
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 3c4755f3c116..8f2c1f71ed9a 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -5119,6 +5119,11 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
>   amdgpu_amdkfd_post_reset(adev);
>   amdgpu_virt_release_full_gpu(adev, true);
>
>+  /* Aldebaran and gfx_11_0_3 support ras in SRIOV, so need resume ras during reset */
>+  if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 2) ||
>+  amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) ||
>+  amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(11, 0, 3))
>+  amdgpu_ras_resume(adev);
>   return 0;
> }
>
>@@ -5823,13 +5828,6 @@ int amdgpu_device_gpu_recover(struct
>amdgpu_device *adev,
>   goto retry;
>   if (r)
>   adev->asic_reset_res = r;
>-
>-  /* Aldebaran and gfx_11_0_3 support ras in SRIOV, so need resume ras during reset */
>-  if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 2) ||
>-  amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) ||
>-  amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(11, 0, 3))
>-  amdgpu_ras_resume(adev);
>   } else {
>   r = amdgpu_do_asic_reset(device_list_handle,
>reset_context);
>   if (r && r == -EAGAIN)
>--
>2.34.1



RE: [PATCH v4 2/4] drm/amdgpu: Add reset_context flag for host FLR

2024-04-28 Thread Deng, Emily

Reviewed-by: Emily Deng 

Emily Deng
Best Wishes



>-Original Message-
>From: Li, Yunxiang (Teddy) 
>Sent: Saturday, April 27, 2024 2:27 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Koenig, Christian
>; Lazar, Lijo ; Kuehling,
>Felix ; Deng, Emily ; Li,
>Yunxiang (Teddy) 
>Subject: [PATCH v4 2/4] drm/amdgpu: Add reset_context flag for host FLR
>
>There are other reset sources that pass NULL as the job pointer, such as
>amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if the
>FLR comes from the host does not work.
>
>Add a flag in reset_context to explicitly mark host triggered reset, and set
>this flag when we receive host reset notification.
>
>Signed-off-by: Yunxiang Li 
>---
>v2: fix typo
>v3: pass reset_context directly
>v4: clear the flag in case we retry
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 -
>drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  1 +
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  |  1 +
> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c  |  1 +
> drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c  |  1 +
> 5 files changed, 12 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 8befd10bf007..33c889c027a5 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -5055,13 +5055,13 @@ static int amdgpu_device_recover_vram(struct
>amdgpu_device *adev)
>  * amdgpu_device_reset_sriov - reset ASIC for SR-IOV vf
>  *
>  * @adev: amdgpu_device pointer
>- * @from_hypervisor: request from hypervisor
>+ * @reset_context: amdgpu reset context pointer
>  *
>  * do VF FLR and reinitialize Asic
>  * return 0 means succeeded otherwise failed
>  */
> static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
>-   bool from_hypervisor)
>+   struct amdgpu_reset_context
>*reset_context)
> {
>   int r;
>   struct amdgpu_hive_info *hive = NULL;
>@@ -5070,12 +5070,15 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
> retry:
>   amdgpu_amdkfd_pre_reset(adev);
>
>-  if (from_hypervisor)
>+  if (test_bit(AMDGPU_HOST_FLR, &reset_context->flags)) {
>+  clear_bit(AMDGPU_HOST_FLR, &reset_context->flags);
>   r = amdgpu_virt_request_full_gpu(adev, true);
>-  else
>+  } else {
>   r = amdgpu_virt_reset_gpu(adev);
>+  }
>   if (r)
>   return r;
>+
>   amdgpu_ras_set_fed(adev, false);
>   amdgpu_irq_gpu_reset_resume_helper(adev);
>
>@@ -5826,7 +5829,7 @@ int amdgpu_device_gpu_recover(struct
>amdgpu_device *adev,
>   /* Actual ASIC resets if needed.*/
>   /* Host driver will handle XGMI hive reset for SRIOV */
>   if (amdgpu_sriov_vf(adev)) {
>-  r = amdgpu_device_reset_sriov(adev, job ? false : true);
>+  r = amdgpu_device_reset_sriov(adev, reset_context);
>   if (r)
>   adev->asic_reset_res = r;
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>index b11d190ece53..5a9cc043b858 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>@@ -33,6 +33,7 @@ enum AMDGPU_RESET_FLAGS {
>   AMDGPU_NEED_FULL_RESET = 0,
>   AMDGPU_SKIP_HW_RESET = 1,
>   AMDGPU_SKIP_COREDUMP = 2,
>+  AMDGPU_HOST_FLR = 3,
> };
>
> struct amdgpu_reset_context {
>diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>index c5ba9c4757a8..f4c47492e0cd 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>@@ -292,6 +292,7 @@ static void xgpu_ai_mailbox_flr_work(struct
>work_struct *work)
>   reset_context.method = AMD_RESET_METHOD_NONE;
>   reset_context.reset_req_dev = adev;
>   clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
>+  set_bit(AMDGPU_HOST_FLR, &reset_context.flags);
>
>   amdgpu_device_gpu_recover(adev, NULL, &reset_context);
>   }
>diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>index fa9d1b02f391..14cc7910e5cf 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>@@ -328,6 +328,7 @@ static void xgpu_nv_mailbox_flr_work(struct
>work_struct *work)
>   reset_context.method = AMD_RESET_METHOD_NONE;
>   reset_context.reset_req_dev = adev;
>   clear_bit(AMD

RE: [PATCH v2 3/4] drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic

2024-04-28 Thread Deng, Emily

Reviewed-by: Emily Deng 

Emily Deng
Best Wishes



>-Original Message-
>From: Li, Yunxiang (Teddy) 
>Sent: Saturday, April 27, 2024 2:29 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Koenig, Christian
>; Lazar, Lijo ; Kuehling,
>Felix ; Deng, Emily ; Li,
>Yunxiang (Teddy) 
>Subject: [PATCH v2 3/4] drm/amdgpu: Fix amdgpu_device_reset_sriov retry
>logic
>
>The retry loop for SRIOV reset has refcount and memory leak issues.
>Depending on which function call fails, it can potentially call
>amdgpu_amdkfd_pre/post_reset a different number of times and cause the
>kfd_locked count to be wrong. This will block all future attempts at opening
>/dev/kfd. The retry loop also leaks resources by calling
>amdgpu_virt_init_data_exchange multiple times without calling the
>corresponding fini function.
>
>Align with the bare-metal reset path, which doesn't have these issues.
>This means taking the amdgpu_amdkfd_pre/post_reset functions out of the
>reset loop and calling amdgpu_device_pre_asic_reset on each retry, which
>properly frees the resources from the previous try by calling
>amdgpu_virt_fini_data_exchange.
>
>Signed-off-by: Yunxiang Li 
>---
>v2: put back release full access and the missed return
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 47 ++
> 1 file changed, 22 insertions(+), 25 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 33c889c027a5..b23645f23a2e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -5065,10 +5065,6 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,  {
>   int r;
>   struct amdgpu_hive_info *hive = NULL;
>-  int retry_limit = 0;
>-
>-retry:
>-  amdgpu_amdkfd_pre_reset(adev);
>
>   if (test_bit(AMDGPU_HOST_FLR, &reset_context->flags)) {
>   clear_bit(AMDGPU_HOST_FLR, &reset_context->flags);
>@@ -5088,7 +5084,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
>   /* Resume IP prior to SMC */
>   r = amdgpu_device_ip_reinit_early_sriov(adev);
>   if (r)
>-  goto error;
>+  return r;
>
>   amdgpu_virt_init_data_exchange(adev);
>
>@@ -5099,38 +5095,35 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
>   /* now we are okay to resume SMC/CP/SDMA */
>   r = amdgpu_device_ip_reinit_late_sriov(adev);
>   if (r)
>-  goto error;
>+  return r;
>
>   hive = amdgpu_get_xgmi_hive(adev);
>   /* Update PSP FW topology after reset */
>   if (hive && adev->gmc.xgmi.num_physical_nodes > 1)
>   r = amdgpu_xgmi_update_topology(hive, adev);
>-
>   if (hive)
>   amdgpu_put_xgmi_hive(hive);
>+  if (r)
>+  return r;
>
>-  if (!r) {
>-  r = amdgpu_ib_ring_tests(adev);
>-
>-  amdgpu_amdkfd_post_reset(adev);
>-  }
>+  r = amdgpu_ib_ring_tests(adev);
>+  if (r)
>+  return r;
>
>-error:
>-  if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
>+  if (adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
>   amdgpu_inc_vram_lost(adev);
>   r = amdgpu_device_recover_vram(adev);
>   }
>-  amdgpu_virt_release_full_gpu(adev, true);
>+  if (r)
>+  return r;
>
>-  if (AMDGPU_RETRY_SRIOV_RESET(r)) {
>-  if (retry_limit < AMDGPU_MAX_RETRY_LIMIT) {
>-  retry_limit++;
>-  goto retry;
>-  } else
>-  DRM_ERROR("GPU reset retry is beyond the retry limit\n");
>-  }
>+  /* need to be called during full access so we can't do it later like
>+   * bare-metal does.
>+   */
>+  amdgpu_amdkfd_post_reset(adev);
>+  amdgpu_virt_release_full_gpu(adev, true);
>
>-  return r;
>+  return 0;
> }
>
> /**
>@@ -5689,6 +5682,7 @@ int amdgpu_device_gpu_recover(struct
>amdgpu_device *adev,
>   int i, r = 0;
>   bool need_emergency_restart = false;
>   bool audio_suspended = false;
>+  int retry_limit = AMDGPU_MAX_RETRY_LIMIT;
>
>   /*
>* Special case: RAS triggered and full reset isn't supported @@ -
>5770,8 +5764,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device
>*adev,
>
>   cancel_delayed_work_sync(_adev->delayed_init_work);
>
>-  if (!amdgpu_sriov_vf(tmp_adev))

RE: [PATCH 3/4] drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic

2024-04-26 Thread Deng, Emily

>-Original Message-
>From: Li, Yunxiang (Teddy) 
>Sent: Friday, April 26, 2024 11:58 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Koenig, Christian
>; Lazar, Lijo ; Kuehling,
>Felix ; Deng, Emily ; Li,
>Yunxiang (Teddy) 
>Subject: [PATCH 3/4] drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic
>
>The retry loop for SRIOV reset has refcount and memory leak issues.
>Depending on which function call fails, it can potentially call
>amdgpu_amdkfd_pre/post_reset a different number of times and cause the
>kfd_locked count to be wrong. This will block all future attempts at opening
>/dev/kfd. The retry loop also leaks resources by calling
>amdgpu_virt_init_data_exchange multiple times without calling the
>corresponding fini function.
>
>Align with the bare-metal reset path, which doesn't have these issues.
>This means taking the amdgpu_amdkfd_pre/post_reset functions out of the
>reset loop and calling amdgpu_device_pre_asic_reset on each retry, which
>properly frees the resources from the previous try by calling
>amdgpu_virt_fini_data_exchange.
>
>Signed-off-by: Yunxiang Li 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 50 ++
> 1 file changed, 22 insertions(+), 28 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 1fd9637daafc..3c4755f3c116 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -5063,19 +5063,14 @@ static int amdgpu_device_recover_vram(struct
>amdgpu_device *adev)  static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
>struct amdgpu_reset_context
>*reset_context)  {
>-  int r;
>+  int r = 0;
>   struct amdgpu_hive_info *hive = NULL;
>-  int retry_limit = 0;
>-
>-retry:
>-  amdgpu_amdkfd_pre_reset(adev);
>
>   if (test_bit(AMDGPU_HOST_FLR, &reset_context->flags))
>   r = amdgpu_virt_request_full_gpu(adev, true);
>   else
>   r = amdgpu_virt_reset_gpu(adev);
>-  if (r)
>-  return r;
Why remove this?

Emily Deng
Best Wishes


>+
>   amdgpu_ras_set_fed(adev, false);
>   amdgpu_irq_gpu_reset_resume_helper(adev);
>
>@@ -5085,7 +5080,7 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
>   /* Resume IP prior to SMC */
>   r = amdgpu_device_ip_reinit_early_sriov(adev);
>   if (r)
>-  goto error;
>+  return r;
Need to call amdgpu_virt_release_full_gpu(adev, true) before retry, and the 
same as below.

Emily Deng
Best Wishes

>   amdgpu_virt_init_data_exchange(adev);
>
>@@ -5096,38 +5091,35 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
>   /* now we are okay to resume SMC/CP/SDMA */
>   r = amdgpu_device_ip_reinit_late_sriov(adev);
>   if (r)
>-  goto error;
>+  return r;
>
>   hive = amdgpu_get_xgmi_hive(adev);
>   /* Update PSP FW topology after reset */
>   if (hive && adev->gmc.xgmi.num_physical_nodes > 1)
>   r = amdgpu_xgmi_update_topology(hive, adev);
>-
>   if (hive)
>   amdgpu_put_xgmi_hive(hive);
>+  if (r)
>+  return r;
>
>-  if (!r) {
>-  r = amdgpu_ib_ring_tests(adev);
>-
>-  amdgpu_amdkfd_post_reset(adev);
>-  }
>+  r = amdgpu_ib_ring_tests(adev);
>+  if (r)
>+  return r;
>
>-error:
>-  if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
>+  if (adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
>   amdgpu_inc_vram_lost(adev);
>   r = amdgpu_device_recover_vram(adev);
>   }
>-  amdgpu_virt_release_full_gpu(adev, true);
>+  if (r)
>+  return r;
>
>-  if (AMDGPU_RETRY_SRIOV_RESET(r)) {
>-  if (retry_limit < AMDGPU_MAX_RETRY_LIMIT) {
>-  retry_limit++;
>-  goto retry;
>-  } else
>-  DRM_ERROR("GPU reset retry is beyond the retry limit\n");
>-  }
>+  /* need to be called during full access so we can't do it later like
>+   * bare-metal does.
>+   */
>+  amdgpu_amdkfd_post_reset(adev);
>+  amdgpu_virt_release_full_gpu(adev, true);
>
>-  return r;
>+  return 0;
> }
>
> /**
>@@ -5686,6 +5678,7 @@ int amdgpu_device_gpu_recover(struct
>amdgpu_device *adev,
>   int i, r = 0;
>   bool need_emergency_restart = f

RE: [PATCH v2 2/2] drm/amd/amdgpu: SRIOV full reset issue with VCN

2023-11-30 Thread Deng, Emily

Series Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Bokun
>Zhang
>Sent: Thursday, November 30, 2023 8:21 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhang, Bokun 
>Subject: [PATCH v2 2/2] drm/amd/amdgpu: SRIOV full reset issue with VCN
>
>- After a full reset, the VF's FB will be cleared. This
>  includes the VCN's fw_shared memory.
>
>  However, there is no suspend-resume routine for
>  an SRIOV VF. Therefore, the data in the fw_shared
>  memory will be lost forever and it causes an engine
>  hang later on.
>
>  We must repopulate the data in fw_shared during
>  SRIOV hw_init.
>
>Signed-off-by: Bokun Zhang 
>---
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>index 54b03df63a51..b71590b67e20 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>@@ -1280,6 +1280,9 @@ static int vcn_v4_0_start_sriov(struct
>amdgpu_device *adev)
>   if (adev->vcn.harvest_config & (1 << i))
>   continue;
>
>+  // Must re/init fw_shared at beginning
>+  vcn_v4_0_fw_shared_init(adev, i);
>+
>   table_size = 0;
>
>
>   MMSCH_V4_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VC
>N, i,
>--
>2.34.1



RE: [PATCH v3] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-24 Thread Deng, Emily

Tested-by: Emily Deng 

>-Original Message-
>From: Kuehling, Felix 
>Sent: Friday, November 24, 2023 6:55 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily ; Pan, Xinhui
>; Koenig, Christian 
>Subject: [PATCH v3] drm/amdkfd: Run restore_workers on freezable WQs
>
>Make restore workers freezable so we don't have to explicitly flush them in
>suspend and GPU reset code paths, and we don't accidentally try to restore
>BOs while the GPU is suspended. Not having to flush restore_work also helps
>avoid lock/fence dependencies in the GPU reset case where we're not allowed
>to wait for fences.
>
>A side effect of this is that we can now have multiple concurrent threads
>trying
>to signal the same eviction fence. Rework eviction fence signaling and
>replacement to account for that.
>
>The GPU reset path can no longer rely on restore_process_worker to resume
>queues because evict/restore workers can run independently of it. Instead call
>a new restore_process_helper directly.
>
>This is an RFC and request for testing.
>
>v2:
>- Reworked eviction fence signaling
>- Introduced restore_process_helper
>
>v3:
>- Handle unsignaled eviction fences in restore_process_bos
>
>Signed-off-by: Felix Kuehling 
>Acked-by: Christian König 
>---
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 68 +++
> drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 87 +++
> drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |  4 +-
> 3 files changed, 104 insertions(+), 55 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>index 2e302956a279..bdec88713a09 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>@@ -1431,7 +1431,6 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void
>**process_info,
> amdgpu_amdkfd_restore_userptr_worker);
>
>   *process_info = info;
>-  *ef = dma_fence_get(&info->eviction_fence->base);
>   }
>
>   vm->process_info = *process_info;
>@@ -1462,6 +1461,8 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void
>**process_info,
>   list_add_tail(>vm_list_node,
>   &(vm->process_info->vm_list_head));
>   vm->process_info->n_vms++;
>+
>+  *ef = dma_fence_get(&vm->process_info->eviction_fence->base);
>   mutex_unlock(&vm->process_info->lock);
>
>   return 0;
>@@ -1473,10 +1474,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void
>**process_info,
> reserve_pd_fail:
>   vm->process_info = NULL;
>   if (info) {
>-  /* Two fence references: one in info and one in *ef */
>-  dma_fence_put(&info->eviction_fence->base);
>-  dma_fence_put(*ef);
>-  *ef = NULL;
>   *process_info = NULL;
>   put_pid(info->pid);
> create_evict_fence_fail:
>@@ -1670,7 +1668,8 @@ int amdgpu_amdkfd_criu_resume(void *p)
>   goto out_unlock;
>   }
>   WRITE_ONCE(pinfo->block_mmu_notifications, false);
>-  schedule_delayed_work(&pinfo->restore_userptr_work, 0);
>+  queue_delayed_work(system_freezable_wq,
>+ &pinfo->restore_userptr_work, 0);
>
> out_unlock:
>   mutex_unlock(&pinfo->lock);
>@@ -2475,7 +2474,8 @@ int amdgpu_amdkfd_evict_userptr(struct
>mmu_interval_notifier *mni,
>
>KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
>   if (r)
>   pr_err("Failed to quiesce KFD\n");
>-  schedule_delayed_work(&process_info->restore_userptr_work,
>+  queue_delayed_work(system_freezable_wq,
>+  &process_info->restore_userptr_work,
>
>   msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>   }
>   mutex_unlock(&process_info->notifier_lock);
>@@ -2810,7 +2810,8 @@ static void
>amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>
>   /* If validation failed, reschedule another attempt */
>   if (evicted_bos) {
>-  schedule_delayed_work(&process_info->restore_userptr_work,
>+  queue_delayed_work(system_freezable_wq,
>+  &process_info->restore_userptr_work,
>
>   msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>
>   kfd_smi_event_queue_restore_rescheduled(mm);
>@@ -2819,6 +2820,23 @@ static void
>amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>   put_task_struct(usertask);
> }
>
>+static void replace_eviction_fence(struct dma_fence **ef,
>+ struct 

RE: [PATCH 2/2] drm/amdgpu: handle the return for sync wait

2023-10-20 Thread Deng, Emily

OK, will send this one first.

Emily Deng
Best Wishes

>-Original Message-
>From: Christian König 
>Sent: Friday, October 20, 2023 3:30 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 2/2] drm/amdgpu: handle the return for sync wait
>
>Am 20.10.23 um 08:13 schrieb Emily Deng:
>
>You need a patch description and this patch here needs to come first and not
>second.
>
>Christian.
>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  | 6 +-
>>   2 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 54f31a420229..3011c191d7dd 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -2668,7 +2668,7 @@ static int validate_invalid_user_pages(struct
>> amdkfd_process_info *process_info)
>>
>>   unreserve_out:
>>  ttm_eu_backoff_reservation(, _list);
>> -amdgpu_sync_wait(, false);
>> +ret = amdgpu_sync_wait(, false);
>>  amdgpu_sync_free();
>>   out_free:
>>  kfree(pd_bo_list_entries);
>> @@ -2939,8 +2939,11 @@ int
>amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
>**ef)
>>  }
>>
>>  /* Wait for validate and PT updates to finish */
>> -amdgpu_sync_wait(&sync_obj, false);
>> -
>> +ret = amdgpu_sync_wait(&sync_obj, false);
>> +if (ret) {
>> +pr_err("Failed to wait for validate and PT updates to 
>> finish\n");
>> +goto validate_map_fail;
>> +}
>>  /* Release old eviction fence and create new one, because fence only
>>   * goes from unsignaled to signaled, fence cannot be reused.
>>   * Use context and mm from the old fence.
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> index 70fe3b39c004..a63139277583 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> @@ -1153,7 +1153,11 @@ int amdgpu_mes_ctx_map_meta_data(struct
>amdgpu_device *adev,
>>  }
>>  amdgpu_sync_fence(, vm->last_update);
>>
>> -amdgpu_sync_wait(, false);
>> +r = amdgpu_sync_wait(, false);
>> +if (r) {
>> +DRM_ERROR("failed to wait sync\n");
>> +goto error;
>> +}
>>  ttm_eu_backoff_reservation(, );
>>
>>  amdgpu_sync_free();



RE: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait

2023-10-20 Thread Deng, Emily

Hi Christian,
 The issue is hit when running a compute hang quark that triggers a compute job
timeout. For compute, the timeout setting is 60s, but for gfx and sdma it is
10s.
So getting the timeout from the sched is reasonable for the different scheds.
And if the wait times out, it will print an error, so it won't hide real
issues. And even if there is a real issue, waiting forever is a bad user
experience, and the driver couldn't work anymore.

Emily Deng
Best Wishes



>-Original Message-
>From: Christian König 
>Sent: Friday, October 20, 2023 3:29 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 1/2] drm/amdgpu: Add timeout for sync wait
>
>Am 20.10.23 um 08:13 schrieb Emily Deng:
>> Issue: Deadlock happens during gpu recover; the call sequence is as below:
>>
>> amdgpu_device_gpu_recover->amdgpu_amdkfd_pre_reset->flush_delayed_work
>> -> amdgpu_amdkfd_gpuvm_restore_process_bos->amdgpu_sync_wait
>>
>> It is because the amdgpu_sync_wait is waiting for the bad job's fence,
>> and never returns, so the recovery couldn't continue.
>>
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 11 +--
>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> index dcd8c066bc1f..6253d6aab7f8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> @@ -406,8 +406,15 @@ int amdgpu_sync_wait(struct amdgpu_sync *sync,
>bool intr)
>>  int i, r;
>>
>>  hash_for_each_safe(sync->fences, i, tmp, e, node) {
>> -r = dma_fence_wait(e->fence, intr);
>> -if (r)
>> +struct drm_sched_fence *s_fence = to_drm_sched_fence(e->fence);
>> +long timeout = msecs_to_jiffies(1);
>
>That handling doesn't make much sense. If you need a timeout then you need
>a timeout for the whole function.
>
>In addition to that, timeouts often just hide real problems which need fixing.
>
>So this here needs a much better justification otherwise it's a pretty clear 
>NAK.
>
>Regards,
>Christian.
>
>> +
>> +if (s_fence)
>> +timeout = s_fence->sched->timeout;
>> +
>> +r = dma_fence_wait_timeout(e->fence, intr, timeout);
>> +if (r == 0)
>> +r = -ETIMEDOUT;
>> +if (r < 0)
>>  return r;
>>
>>  amdgpu_sync_entry_free(e);



RE: [PATCH] drm/amdgpu: Add timeout for sync wait

2023-10-19 Thread Deng, Emily

Hi Felix,
 Yes, will correct the description. Will add another patch to handle the 
return for sync wait.

Emily Deng
Best Wishes



>-Original Message-
>From: Kuehling, Felix 
>Sent: Friday, October 20, 2023 12:05 AM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Add timeout for sync wait
>
>On 2023-10-19 05:31, Emily Deng wrote:
>> Issue: Deadlock happens during gpu recover
>>
>> [56433.829492] amdgpu :04:00.0: amdgpu: GPU reset begin!
>> [56550.499625] INFO: task kworker/u80:0:10 blocked for more than 120
>seconds.
>> [56550.520215]   Tainted: G   OE  6.2.0-34-generic 
>> #34~22.04.1-
>Ubuntu
>> [56550.542883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>disables this message.
>> [56550.566313] task:kworker/u80:0   state:D stack:0 pid:10ppid:2
>flags:0x4000
>> [56550.591318] Workqueue: kfd_restore_wq restore_process_worker
>> [amdgpu] [56550.611391] Call Trace:
>> [56550.618698]  
>> [56550.624968]  __schedule+0x2b7/0x5f0 [56550.635416]
>> schedule+0x68/0x110 [56550.645090]  schedule_timeout+0x151/0x160
>> [56550.657096]  ? amdgpu_vm_bo_update+0x46e/0x660 [amdgpu]
>> [56550.673245]  dma_fence_default_wait+0x1a2/0x1e0
>> [56550.686818]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
>> [56550.702728]  dma_fence_wait_timeout+0x117/0x140
>> [56550.716301]  amdgpu_sync_wait+0x62/0xa0 [amdgpu] [56550.730654]
>> amdgpu_amdkfd_gpuvm_restore_process_bos+0x59e/0x770 [amdgpu]
>> [56550.751668]  ? newidle_balance+0x298/0x490 [56550.763936]
>> restore_process_worker+0x42/0x270 [amdgpu] [56550.780183]
>> process_one_work+0x21f/0x440 [56550.792193]  worker_thread+0x50/0x3f0
>> [56550.803165]  ? __pfx_worker_thread+0x10/0x10 [56550.815934]
>> kthread+0xee/0x120 [56550.825342]  ? __pfx_kthread+0x10/0x10
>> [56550.836548]  ret_from_fork+0x2c/0x50 [56550.847262]   [
>> 1935.215502] Call Trace:
>> [ 1935.222827]  
>> [ 1935.229121]  __schedule+0x23d/0x5a0 [ 1935.239583]
>> schedule+0x4e/0xc0 [ 1935.248983]  schedule_timeout+0x103/0x140 [
>> 1935.261002]  __wait_for_common+0xae/0x150 [ 1935.273008]  ?
>> usleep_range_state+0x90/0x90 [ 1935.285546]
>> wait_for_completion+0x24/0x30 [ 1935.297813]
>> __flush_work.isra.0+0x175/0x280 [ 1935.310611]  ?
>> worker_detach_from_pool+0xc0/0xc0 [ 1935.324436]
>> flush_delayed_work+0x31/0x50 [ 1935.336455]
>> kfd_suspend_all_processes+0x96/0x150 [amdgpu] [ 1935.353429]
>> kgd2kfd_suspend+0xb8/0xe0 [amdgpu] [ 1935.367469]
>> kgd2kfd_pre_reset+0x81/0xf0 [amdgpu] [ 1935.382036]
>> amdgpu_amdkfd_pre_reset+0x1a/0x30 [amdgpu] [ 1935.398156]
>> amdgpu_device_gpu_recover.cold+0x210/0xcf2 [amdgpu] [ 1935.416722]
>> amdgpu_job_timedout+0x19f/0x1e0 [amdgpu] [ 1935.432367]
>> drm_sched_job_timedout+0x6f/0x120 [amd_sched] [ 1935.448792]
>> process_one_work+0x22b/0x3d0 [ 1935.460806]
>worker_thread+0x53/0x420
>> [ 1935.471777]  ? process_one_work+0x3d0/0x3d0 [ 1935.484307]
>> kthread+0x12a/0x150 [ 1935.493993]  ? set_kthread_struct+0x50/0x50 [
>> 1935.506513]  ret_from_fork+0x22/0x30
>
>Looking at the time stamps, this seems to be a mash-up of two different logs. I
>think you're trying to show how a restore_processes worker is stuck on a fence,
>and that's causing kgd2kfd_pre_reset to hang when it tries to flush the work.
>
>The fence it's hanging on is probably something related to a page table update
>that got caught up in the GPU hang. Adding a timeout here seems reasonable.
>There may be another problem, because
>amdgpu_amdkfd_gpuvm_restore_process_bos ignores the return value of
>amdgpu_sync_wait. We should probably handle the timeout gracefully with a
>"goto validate_map_fail".
>
>Regards,
>   Felix
>
>
>>
>> It is because the amdgpu_sync_wait is waiting for the bad job's fence,
>> and never returns, so the recovery couldn't continue.
>>
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 16 +---
>>   1 file changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> index dcd8c066bc1f..c922867c5675 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> @@ -406,9 +406,19 @@ int amdgpu_sync_wait(struct amdgpu_sync *sync,
>bool intr)
>>  int i, r;
>>
>>  hash_for_each_safe(sync->fences, i, tmp, e, node) {
>> -r = dma_fence_wait(e->fence, intr);
&

RE: [PATCH] drm/amdgpu/irq: Move irq resume to the beginning

2023-08-08 Thread Deng, Emily

Ping.

>-Original Message-
>From: Emily Deng 
>Sent: Monday, August 7, 2023 1:11 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu/irq: Move irq resume to the beginning
>
>Need to move irq resume to the beginning of SRIOV reset; otherwise, if an
>interrupt occurs before irq resume, the irq won't work anymore.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 1338489b0b2f..8b304fdfe6db 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -4617,6 +4617,7 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
>   r = amdgpu_virt_reset_gpu(adev);
>   if (r)
>   return r;
>+  amdgpu_irq_gpu_reset_resume_helper(adev);
>
>   /* some sw clean up VF needs to do before recover */
>   amdgpu_virt_post_reset(adev);
>@@ -4646,7 +4647,6 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
>   amdgpu_put_xgmi_hive(hive);
>
>   if (!r) {
>-  amdgpu_irq_gpu_reset_resume_helper(adev);
>   r = amdgpu_ib_ring_tests(adev);
>
>   amdgpu_amdkfd_post_reset(adev);
>--
>2.36.1



RE: [PATCH] drm/amdgpu: Clear VCN cache when hw_init

2023-06-21 Thread Deng, Emily

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Horace
>Chen
>Sent: Tuesday, June 20, 2023 9:30 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Andrey Grodzovsky ; Xiao, Jack
>; Xu, Feifei ; Chen, Horace
>; Chang, HaiJun ;
>Deucher, Alexander ; Quan, Evan
>; Koenig, Christian ; Liu,
>Monk ; Zhang, Hawking 
>Subject: [PATCH] drm/amdgpu: Clear VCN cache when hw_init
>
>[Why]
>VCN uses some framebuffer space as its cache. It needs to be cleared when a
>reset happens, such as an FLR. Otherwise some errors may be kept after the reset.
>
>Signed-off-by: Horace Chen 
>---
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>index b48bb5212488..2db73a964031 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>@@ -1292,6 +1292,7 @@ static int vcn_v4_0_start_sriov(struct
>amdgpu_device *adev)
>   cache_size);
>
>   cache_addr = adev->vcn.inst[i].gpu_addr + offset;
>+  memset(adev->vcn.inst[i].cpu_addr + offset, 0,
>+AMDGPU_VCN_STACK_SIZE);
>
>   MMSCH_V4_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCN, i,
>   regUVD_LMI_VCPU_CACHE1_64BIT_BAR_LOW),
>   lower_32_bits(cache_addr));
>@@ -1307,6 +1308,8 @@ static int vcn_v4_0_start_sriov(struct
>amdgpu_device *adev)
>
>   cache_addr = adev->vcn.inst[i].gpu_addr + offset +
>   AMDGPU_VCN_STACK_SIZE;
>+  memset(adev->vcn.inst[i].cpu_addr + offset +
>AMDGPU_VCN_STACK_SIZE, 0,
>+  AMDGPU_VCN_STACK_SIZE);
>
>   MMSCH_V4_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCN, i,
>   regUVD_LMI_VCPU_CACHE2_64BIT_BAR_LOW),
>   lower_32_bits(cache_addr));
>--
>2.34.1
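The layout the patch touches can be sketched as a standalone model: a firmware code cache followed by two stack-cache regions (CACHE1 and CACHE2), each cleared with a memset. The sizes and offsets below are stand-ins, not the driver's real `AMDGPU_VCN_STACK_SIZE` or cache sizes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FW_CACHE_SIZE 64U   /* stand-in for the firmware code cache size */
#define STACK_SIZE    32U   /* stand-in for AMDGPU_VCN_STACK_SIZE */

/* Clear the two stack-cache regions that follow the firmware code cache,
 * mirroring the two memset() calls added by the patch. */
static void clear_vcn_stack_caches(unsigned char *cpu_addr, size_t offset)
{
	memset(cpu_addr + offset, 0, STACK_SIZE);              /* CACHE1 */
	memset(cpu_addr + offset + STACK_SIZE, 0, STACK_SIZE); /* CACHE2 */
}
```

The point is that only the stack regions are cleared; the firmware code cache ahead of them is left intact.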



RE: [PATCH] drm/amdgpu/vcn: Need to unpause dpg before stop dpg

2023-06-21 Thread Deng, Emily
[AMD Official Use Only - General]

Ping.

Best wishes
Emily Deng



>-Original Message-
>From: Emily Deng 
>Sent: Wednesday, June 21, 2023 9:30 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu/vcn: Need to unpause dpg before stop dpg
>
>Need to unpause dpg first, or it will hit the following error during stop dpg:
>"[drm] Register(1) [regUVD_POWER_STATUS] failed to reach value
>0x0001 != 0xn"
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>index b48bb5212488..259795098173 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>@@ -1424,8 +1424,10 @@ static int vcn_v4_0_start_sriov(struct
>amdgpu_device *adev)
>  */
> static void vcn_v4_0_stop_dpg_mode(struct amdgpu_device *adev, int
>inst_idx)  {
>+  struct dpg_pause_state state = {.fw_based =
>VCN_DPG_STATE__UNPAUSE};
>   uint32_t tmp;
>
>+  vcn_v4_0_pause_dpg_mode(adev, inst_idx, &state);
>   /* Wait for power status to be 1 */
>   SOC15_WAIT_ON_RREG(VCN, inst_idx, regUVD_POWER_STATUS, 1,
>   UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>--
>2.36.1
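The ordering the patch enforces can be illustrated with a toy state machine: while dpg is paused, the power-status register does not read back 1, so a stop path that polls for 1 without un-pausing first times out. The register values and names below are illustrative only, not the real UVD_POWER_STATUS semantics:

```c
#include <assert.h>

/* Toy model: paused dpg reports power status 2; un-paused reports 1. */
enum dpg_state { DPG_PAUSED, DPG_UNPAUSED };

struct vcn_model { enum dpg_state fw_based; };

static int power_status(const struct vcn_model *v)
{
	return v->fw_based == DPG_PAUSED ? 2 : 1;
}

static void pause_dpg_mode(struct vcn_model *v, enum dpg_state s)
{
	v->fw_based = s;
}

/* Returns 0 on success, -1 on the "failed to reach value 0x0001" timeout. */
static int stop_dpg_mode(struct vcn_model *v, int unpause_first)
{
	if (unpause_first)
		pause_dpg_mode(v, DPG_UNPAUSED);
	return power_status(v) == 1 ? 0 : -1;
}
```

This is the shape of the fix: un-pause before waiting for power status, rather than waiting while still paused.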



RE: [PATCH] drm/amdgpu/vcn: Need to pause dpg before stop dpg

2023-06-20 Thread Deng, Emily
[AMD Official Use Only - General]

Hi Leo,
Sorry, I need to correct the commit message: it should be "unpause". During 
driver unload, I haven't seen any place that calls un-pause dpg for vcn4.0 in 
hw fini. It will report "[drm] Register(1) [regUVD_POWER_STATUS] failed to 
reach value 0x0001 != 0xn", and with the un-pause dpg call added, the 
issue disappears.

Best wishes
Emily Deng

>-Original Message-
>From: Liu, Leo 
>Sent: Monday, June 19, 2023 9:27 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH] drm/amdgpu/vcn: Need to pause dpg before stop dpg
>
>[AMD Official Use Only - General]
>
>Hi Emily,
>
>Do you want to pause or un-pause dpg mode based on and change and commit
>message?
>
>With bare metal, before calling the stop, the state of dpg should be un-paused
>within the call the of amdgpu_vcn_idle_work_handler, is it not the case for
>SRIOV?
>
>Regards,
>Leo
>
>
>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Monday, June 19, 2023 6:24 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu/vcn: Need to pause dpg before stop dpg
>
>Need to pause dpg first, or it will hit the following error during stop dpg:
>"[drm] Register(1) [regUVD_POWER_STATUS] failed to reach value
>0x0001 != 0xn"
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>index b48bb5212488..259795098173 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>@@ -1424,8 +1424,10 @@ static int vcn_v4_0_start_sriov(struct
>amdgpu_device *adev)
>  */
> static void vcn_v4_0_stop_dpg_mode(struct amdgpu_device *adev, int
>inst_idx)  {
>+   struct dpg_pause_state state = {.fw_based =
>VCN_DPG_STATE__UNPAUSE};
>uint32_t tmp;
>
>+   vcn_v4_0_pause_dpg_mode(adev, inst_idx, &state);
>/* Wait for power status to be 1 */
>SOC15_WAIT_ON_RREG(VCN, inst_idx, regUVD_POWER_STATUS, 1,
>UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
>--
>2.36.1
>



RE: [PATCH] drm/amdgpu/mmsch: Correct the definition for mmsch init header

2023-06-06 Thread Deng, Emily
[AMD Official Use Only - General]

Ping..

Best wishes
Emily Deng



>-Original Message-
>From: Emily Deng 
>Sent: Tuesday, June 6, 2023 2:52 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu/mmsch: Correct the definition for mmsch init
>header
>
>For the header, it is version related, so it shouldn't use AMDGPU_MAX_VCN_INSTANCES.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/mmsch_v3_0.h | 4 +++-
>drivers/gpu/drm/amd/amdgpu/mmsch_v4_0.h | 4 +++-
> drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c   | 2 +-
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c   | 2 +-
> 4 files changed, 8 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/mmsch_v3_0.h
>b/drivers/gpu/drm/amd/amdgpu/mmsch_v3_0.h
>index 3e4e858a6965..a773ef61b78c 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mmsch_v3_0.h
>+++ b/drivers/gpu/drm/amd/amdgpu/mmsch_v3_0.h
>@@ -30,6 +30,8 @@
> #define MMSCH_VERSION_MINOR   0
> #define MMSCH_VERSION (MMSCH_VERSION_MAJOR << 16 |
>MMSCH_VERSION_MINOR)
>
>+#define MMSCH_V3_0_VCN_INSTANCES 0x2
>+
> enum mmsch_v3_0_command_type {
>   MMSCH_COMMAND__DIRECT_REG_WRITE = 0,
>   MMSCH_COMMAND__DIRECT_REG_POLLING = 2, @@ -47,7 +49,7
>@@ struct mmsch_v3_0_table_info {  struct mmsch_v3_0_init_header {
>   uint32_t version;
>   uint32_t total_size;
>-  struct mmsch_v3_0_table_info inst[AMDGPU_MAX_VCN_INSTANCES];
>+  struct mmsch_v3_0_table_info inst[MMSCH_V3_0_VCN_INSTANCES];
> };
>
> struct mmsch_v3_0_cmd_direct_reg_header { diff --git
>a/drivers/gpu/drm/amd/amdgpu/mmsch_v4_0.h
>b/drivers/gpu/drm/amd/amdgpu/mmsch_v4_0.h
>index 83653a50a1a2..796d4f8791e5 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mmsch_v4_0.h
>+++ b/drivers/gpu/drm/amd/amdgpu/mmsch_v4_0.h
>@@ -43,6 +43,8 @@
> #define MMSCH_VF_MAILBOX_RESP__OK 0x1
> #define MMSCH_VF_MAILBOX_RESP__INCOMPLETE 0x2
>
>+#define MMSCH_V4_0_VCN_INSTANCES 0x2
>+
> enum mmsch_v4_0_command_type {
>   MMSCH_COMMAND__DIRECT_REG_WRITE = 0,
>   MMSCH_COMMAND__DIRECT_REG_POLLING = 2, @@ -60,7 +62,7
>@@ struct mmsch_v4_0_table_info {  struct mmsch_v4_0_init_header {
>   uint32_t version;
>   uint32_t total_size;
>-  struct mmsch_v4_0_table_info inst[AMDGPU_MAX_VCN_INSTANCES];
>+  struct mmsch_v4_0_table_info inst[MMSCH_V4_0_VCN_INSTANCES];
>   struct mmsch_v4_0_table_info jpegdec;
> };
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
>b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
>index 70fefbf26c48..c8f63b3c6f69 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
>@@ -1313,7 +1313,7 @@ static int vcn_v3_0_start_sriov(struct
>amdgpu_device *adev)
>
>   header.version = MMSCH_VERSION;
>   header.total_size = sizeof(struct mmsch_v3_0_init_header) >> 2;
>-  for (i = 0; i < AMDGPU_MAX_VCN_INSTANCES; i++) {
>+  for (i = 0; i < MMSCH_V3_0_VCN_INSTANCES; i++) {
>   header.inst[i].init_status = 0;
>   header.inst[i].table_offset = 0;
>   header.inst[i].table_size = 0;
>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>index 60c3fd20e8ce..8d371faaa2b3 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
>@@ -1239,7 +1239,7 @@ static int vcn_v4_0_start_sriov(struct
>amdgpu_device *adev)
>
>   header.version = MMSCH_VERSION;
>   header.total_size = sizeof(struct mmsch_v4_0_init_header) >> 2;
>-  for (i = 0; i < AMDGPU_MAX_VCN_INSTANCES; i++) {
>+  for (i = 0; i < MMSCH_V4_0_VCN_INSTANCES; i++) {
>   header.inst[i].init_status = 0;
>   header.inst[i].table_offset = 0;
>   header.inst[i].table_size = 0;
>--
>2.36.1
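The point of the fix shows up in the struct sizes alone. A minimal sketch (the MMSCH v4 two-instance count comes from the patch; the driver-wide maximum of 4 is purely illustrative, and the jpegdec field is omitted):

```c
#include <assert.h>
#include <stdint.h>

struct mmsch_table_info {
	uint32_t init_status;
	uint32_t table_offset;
	uint32_t table_size;
};

#define MMSCH_V4_0_VCN_INSTANCES 2   /* defined by the MMSCH v4 layout */
#define AMDGPU_MAX_VCN_INSTANCES 4   /* illustrative driver-wide maximum */

/* Header sized by the MMSCH version's own instance count (the fix)... */
struct mmsch_init_header_versioned {
	uint32_t version;
	uint32_t total_size;
	struct mmsch_table_info inst[MMSCH_V4_0_VCN_INSTANCES];
};

/* ...vs. sized by the driver-wide maximum (the bug): this header grows
 * whenever AMDGPU_MAX_VCN_INSTANCES does, diverging from the layout the
 * MMSCH firmware expects. */
struct mmsch_init_header_oversized {
	uint32_t version;
	uint32_t total_size;
	struct mmsch_table_info inst[AMDGPU_MAX_VCN_INSTANCES];
};
```

Since `header.total_size` is reported in dwords (`sizeof(...) >> 2`), tying the array to the driver-wide maximum also silently inflates the size handed to the firmware.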



RE: [PATCH] drm/amd/amdgpu: Enable gfx pipe1 and fix related issues

2022-11-04 Thread Deng, Emily
[AMD Official Use Only - General]

>-Original Message-
>From: Christian König 
>Sent: Friday, November 4, 2022 5:26 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org;
>Michel Dänzer 
>Subject: Re: [PATCH] drm/amd/amdgpu: Enable gfx pipe1 and fix related
>issues
>
>Am 04.11.22 um 01:32 schrieb Emily Deng:
>> Starting from SIENNA CICHLID, the asic supports two gfx pipes; enable
>> two graphics queues for the sake of performance.
>
>With the printk still in the patch I assume that this is just a debugging 
>patch?
>
Sorry, will delete it.
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 +-
>>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 43 +
>>   2 files changed, 23 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> index 331aa191910c..0072f36b44d1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> @@ -33,7 +33,7 @@
>>  container_of((e), struct amdgpu_ctx_entity, entity)
>>
>>   const unsigned int amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] = {
>> -[AMDGPU_HW_IP_GFX]  =   1,
>> +[AMDGPU_HW_IP_GFX]  =   2,
>
>That's an absolutely clear NAK and as far as I can see also unnecessary.
>
>We don't want to expose the GFX queues as separate queues to userspace.
>
>Instead the queues have separate priorities which userspace can select.
>
>Regards,
>Christian.
>
>>  [AMDGPU_HW_IP_COMPUTE]  =   4,
>>  [AMDGPU_HW_IP_DMA]  =   2,
>>  [AMDGPU_HW_IP_UVD]  =   1,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 49d34c7bbf20..9219cd29acd3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -4606,7 +4606,7 @@ static int gfx_v10_0_sw_init(void *handle)
>>  case IP_VERSION(10, 3, 3):
>>  case IP_VERSION(10, 3, 7):
>>  adev->gfx.me.num_me = 1;
>> -adev->gfx.me.num_pipe_per_me = 1;
>> +adev->gfx.me.num_pipe_per_me = 2;
>>  adev->gfx.me.num_queue_per_pipe = 1;
>>  adev->gfx.mec.num_mec = 2;
>>  adev->gfx.mec.num_pipe_per_mec = 4; @@ -6008,6 +6008,25
>@@ static
>> int gfx_v10_0_cp_gfx_load_microcode(struct amdgpu_device *adev)
>>  return 0;
>>   }
>>
>> +static int gfx_v10_0_wait_for_idle(void *handle) {
>> +unsigned i;
>> +u32 tmp;
>> +struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> +
>> +for (i = 0; i < adev->usec_timeout; i++) {
>> +/* read MC_STATUS */
>> +tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS) &
>> +GRBM_STATUS__GUI_ACTIVE_MASK;
>> +
>> +if (!REG_GET_FIELD(tmp, GRBM_STATUS, GUI_ACTIVE))
>> +return 0;
>> +udelay(1);
>> +}
>> +printk("Emily:gfx_v10_0_wait_for_idle\n");
>> +return -ETIMEDOUT;
>> +}
>> +
>>   static int gfx_v10_0_cp_gfx_start(struct amdgpu_device *adev)
>>   {
>>  struct amdgpu_ring *ring;
>> @@ -6069,7 +6088,7 @@ static int gfx_v10_0_cp_gfx_start(struct
>amdgpu_device *adev)
>>  amdgpu_ring_write(ring, 0x8000);
>>
>>  amdgpu_ring_commit(ring);
>> -
>> +gfx_v10_0_wait_for_idle(adev);
>>  /* submit cs packet to copy state 0 to next available state */
>>  if (adev->gfx.num_gfx_rings > 1) {
>>  /* maximum supported gfx ring is 2 */ @@ -7404,24 +7423,6
>@@
>> static bool gfx_v10_0_is_idle(void *handle)
>>  return true;
>>   }
>>
>> -static int gfx_v10_0_wait_for_idle(void *handle) -{
>> -unsigned i;
>> -u32 tmp;
>> -struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> -
>> -for (i = 0; i < adev->usec_timeout; i++) {
>> -/* read MC_STATUS */
>> -tmp = RREG32_SOC15(GC, 0, mmGRBM_STATUS) &
>> -GRBM_STATUS__GUI_ACTIVE_MASK;
>> -
>> -if (!REG_GET_FIELD(tmp, GRBM_STATUS, GUI_ACTIVE))
>> -return 0;
>> -udelay(1);
>> -}
>> -return -ETIMEDOUT;
>> -}
>> -
>>   static int gfx_v10_0_soft_reset(void *handle)
>>   {
>>  u32 grbm_soft_reset = 0;
>> @@ -8466,7 +8467,7 @@ static void gfx_v10_0_ring_emit_hdp_flush(struct
>amdgpu_ring *ring)
>>  }
>>  reg_mem_engine = 0;
>>  } else {
>> -ref_and_mask = nbio_hf_reg->ref_and_mask_cp0;
>> +ref_and_mask = nbio_hf_reg->ref_and_mask_cp0 << ring-
>>pipe;
>>  reg_mem_engine = 1; /* pfp */
>>  }
>>



RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2021-12-24 Thread Deng, Emily
These patches look good to me. JingWen will pull them, run some basic TDR 
tests in an SRIOV environment, and give feedback.

Best wishes
Emily Deng



>-Original Message-
>From: Liu, Monk 
>Sent: Thursday, December 23, 2021 6:14 PM
>To: Koenig, Christian ; Grodzovsky, Andrey
>; dri-de...@lists.freedesktop.org; amd-
>g...@lists.freedesktop.org; Chen, Horace ; Chen,
>JingWen ; Deng, Emily 
>Cc: dan...@ffwll.ch
>Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
>for SRIOV
>
>[AMD Official Use Only]
>
>@Chen, Horace @Chen, JingWen @Deng, Emily
>
>Please take a review on Andrey's patch
>
>Thanks
>---
>Monk Liu | Cloud GPU & Virtualization Solution | AMD
>---
>we are hiring software manager for CVS core team
>---
>
>-Original Message-
>From: Koenig, Christian 
>Sent: Thursday, December 23, 2021 4:42 PM
>To: Grodzovsky, Andrey ; dri-
>de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>Cc: dan...@ffwll.ch; Liu, Monk ; Chen, Horace
>
>Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
>for SRIOV
>
>Am 22.12.21 um 23:14 schrieb Andrey Grodzovsky:
>> Since now flr work is serialized against  GPU resets there is no need
>> for this.
>>
>> Signed-off-by: Andrey Grodzovsky 
>
>Acked-by: Christian König 
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 ---
>>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 ---
>>   2 files changed, 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> index 487cd654b69e..7d59a66e3988 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> @@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct
>work_struct *work)
>>  struct amdgpu_device *adev = container_of(virt, struct
>amdgpu_device, virt);
>>  int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>>
>> -/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>> - * otherwise the mailbox msg will be ruined/reseted by
>> - * the VF FLR.
>> - */
>> -if (!down_write_trylock(&adev->reset_sem))
>> -return;
>> -
>>  amdgpu_virt_fini_data_exchange(adev);
>> -atomic_set(&adev->in_gpu_reset, 1);
>>
>>  xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
>>
>> @@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct
>work_struct *work)
>>  } while (timeout > 1);
>>
>>   flr_done:
>> -atomic_set(&adev->in_gpu_reset, 0);
>> -up_write(&adev->reset_sem);
>> -
>>  /* Trigger recovery for world switch failure if no TDR */
>>  if (amdgpu_device_should_recover_gpu(adev)
>>  && (!amdgpu_device_has_job_running(adev) || diff --git
>> a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> index e3869067a31d..f82c066c8e8d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> @@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct
>work_struct *work)
>>  struct amdgpu_device *adev = container_of(virt, struct
>amdgpu_device, virt);
>>  int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
>>
>> -/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>> - * otherwise the mailbox msg will be ruined/reseted by
>> - * the VF FLR.
>> - */
>> -if (!down_write_trylock(&adev->reset_sem))
>> -return;
>> -
>>  amdgpu_virt_fini_data_exchange(adev);
>> -atomic_set(&adev->in_gpu_reset, 1);
>>
>>  xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
>>
>> @@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct
>work_struct *work)
>>  } while (timeout > 1);
>>
>>   flr_done:
>> -atomic_set(&adev->in_gpu_reset, 0);
>> -up_write(&adev->reset_sem);
>> -
>>  /* Trigger recovery for world switch failure if no TDR */
>>  if (amdgpu_device_should_recover_gpu(adev)
>>  && (!amdgpu_device_has_job_running(adev) ||


RE: [PATCH] drm/ttm: add workaround for some arm hardware issue

2021-12-22 Thread Deng, Emily
[AMD Official Use Only]

Currently, this issue has only been found on Ampere, but it is hard to detect 
an Ampere board, especially in an ARM passthrough environment.

Best wishes
Emily Deng



>-Original Message-
>From: amd-gfx  On Behalf Of
>Christian König
>Sent: Wednesday, December 22, 2021 4:11 PM
>To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/ttm: add workaround for some arm hardware issue
>
>Am 22.12.21 um 06:51 schrieb Victor Zhao:
>> Some Arm based platform has hardware issue which may generate
>> incorrect addresses when receiving writes from the CPU with a
>> discontiguous set of byte enables. This affects the writes with write
>> combine property.
>
>Can you point out which arm platforms are that exactly?
>
>Work around this by changing PROT_NORMAL_NC to PROT_DEVICE_nGnRE on arm.
>As this is an issue with some specific arm-based CPUs, add a ttm
>parameter to control it.
>
>Something as fundamental as this should not be made controllable by a
>module parameter.
>
>Write combining is very important for good performance and so we should
>only disable it on boards where we know that this won't work correctly.
>
>Regards,
>Christian.
>
>>
>> Signed-off-by: Victor Zhao 
>> ---
>>   drivers/gpu/drm/ttm/ttm_module.c | 8 +++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_module.c
>> b/drivers/gpu/drm/ttm/ttm_module.c
>> index e87f40674a4d..b27473cbbd52 100644
>> --- a/drivers/gpu/drm/ttm/ttm_module.c
>> +++ b/drivers/gpu/drm/ttm/ttm_module.c
>> @@ -41,6 +41,12 @@
>>
>>   #include "ttm_module.h"
>>
>> +static int enable_use_wc = 1;
>> +
>> +MODULE_PARM_DESC(enable_use_wc,
>> +"control write combine usage on arm platform due to hardware issue
>> +with write combine found on some specific arm cpu (1 =
>> +enable(default), 0 = disable)"); module_param(enable_use_wc, int,
>> +0644);
>> +
>>   /**
>>* ttm_prot_from_caching - Modify the page protection according to the
>>* ttm cacing mode
>> @@ -63,7 +69,7 @@ pgprot_t ttm_prot_from_caching(enum ttm_caching
>caching, pgprot_t tmp)
>>   #endif
>>   #if defined(__ia64__) || defined(__arm__) || defined(__aarch64__) || \
>>  defined(__powerpc__) || defined(__mips__)
>> -if (caching == ttm_write_combined)
>> +if (caching == ttm_write_combined && enable_use_wc != 0)
>>  tmp = pgprot_writecombine(tmp);
>>  else
>>  tmp = pgprot_noncached(tmp);



RE: [PATCH] drm/amd/amdgpu: Enable some sysnodes for guest smi

2021-09-07 Thread Deng, Emily
[AMD Official Use Only]

Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Roy
>Sun
>Sent: Monday, September 6, 2021 8:59 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Roy 
>Subject: [PATCH] drm/amd/amdgpu: Enable some sysnodes for guest smi
>
>Enable sysnode vclk and dclk on Navi21 asic for guest smi
>
>Signed-off-by: Roy Sun 
>---
> drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
>b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
>index 249cb0aeb5ae..c255b4b8e685 100644
>--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
>+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
>@@ -2087,10 +2087,10 @@ static int default_attr_update(struct
>amdgpu_device *adev, struct amdgpu_device_
>   if (asic_type < CHIP_VEGA12)
>   *states = ATTR_STATE_UNSUPPORTED;
>   } else if (DEVICE_ATTR_IS(pp_dpm_vclk)) {
>-  if (!(asic_type == CHIP_VANGOGH))
>+  if (!(asic_type == CHIP_VANGOGH || asic_type ==
>CHIP_SIENNA_CICHLID))
>   *states = ATTR_STATE_UNSUPPORTED;
>   } else if (DEVICE_ATTR_IS(pp_dpm_dclk)) {
>-  if (!(asic_type == CHIP_VANGOGH))
>+  if (!(asic_type == CHIP_VANGOGH || asic_type ==
>CHIP_SIENNA_CICHLID))
>   *states = ATTR_STATE_UNSUPPORTED;
>   }
>
>--
>2.32.0



RE: [PATCH] drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF get FLR notification.

2021-08-12 Thread Deng, Emily
Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Zhou,
>Peng Ju
>Sent: Thursday, August 12, 2021 4:28 PM
>To: Zhou, Peng Ju ; amd-gfx@lists.freedesktop.org
>Cc: Zhao, Jiange 
>Subject: RE: [PATCH] drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET
>response when VF get FLR notification.
>
>[AMD Official Use Only]
>
>ping
>
>> -Original Message-
>> From: Peng Ju Zhou 
>> Sent: Monday, August 9, 2021 5:37 PM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Zhao, Jiange ; Zhou, Peng Ju
>> 
>> Subject: [PATCH] drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET
>response
>> when VF get FLR notification.
>>
>> From: Jiange Zhao 
>>
>> When the guest receives an FLR notification from the host, it locks the
>> adapter into the reset state. There will be no more job submission or
>> hardware access after that.
>>
>> Then it should send a response to the host that it is prepared for the
>> host reset.
>>
>> Signed-off-by: Jiange Zhao 
>> Signed-off-by: Peng Ju Zhou 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 ++
>> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h | 3 ++-
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> index b48e68f46a5c..a35e6d87e537 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> @@ -287,6 +287,8 @@ static void xgpu_nv_mailbox_flr_work(struct
>> work_struct *work)
>>  amdgpu_virt_fini_data_exchange(adev);
>>  atomic_set(&adev->in_gpu_reset, 1);
>>
>> +xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
>> +
>>  do {
>>  if (xgpu_nv_mailbox_peek_msg(adev) ==
>> IDH_FLR_NOTIFICATION_CMPL)
>>  goto flr_done;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h
>> index 9f5808616174..73887b0aa1d6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h
>> @@ -37,7 +37,8 @@ enum idh_request {
>>  IDH_REQ_GPU_RESET_ACCESS,
>>  IDH_REQ_GPU_INIT_DATA,
>>
>> -IDH_LOG_VF_ERROR   = 200,
>> +IDH_LOG_VF_ERROR= 200,
>> +IDH_READY_TO_RESET  = 201,
>>  };
>>
>>  enum idh_event {
>> --
>> 2.17.1

RE: [PATCH] drm/amd/amdgpu: skip locking delayed work if not initialized.

2021-08-09 Thread Deng, Emily
[AMD Official Use Only]

Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of
>YuBiao Wang
>Sent: Thursday, August 5, 2021 10:38 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Grodzovsky, Andrey ; Quan, Evan
>; Chen, Horace ; Tuikov,
>Luben ; Koenig, Christian
>; Deucher, Alexander
>; Xiao, Jack ; Zhang,
>Hawking ; Liu, Monk ; Xu,
>Feifei ; Wang, Kevin(Yang) ;
>Wang, YuBiao 
>Subject: [PATCH] drm/amd/amdgpu: skip locking delayed work if not
>initialized.
>
>When init fails in the early init stage, amdgpu_object has not been initialized,
>and neither have the ttm delayed queue functions.
>
>Signed-off-by: YuBiao Wang 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 9e53ff851496..4c33985542ed 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3825,7 +3825,8 @@ void amdgpu_device_fini_hw(struct
>amdgpu_device *adev)  {
>   dev_info(adev->dev, "amdgpu: finishing device.\n");
>   flush_delayed_work(&adev->delayed_init_work);
>-  ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
>+  if (adev->mman.initialized)
>+  ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
>   adev->shutdown = true;
>
>   /* make sure IB test finished before entering exclusive mode
>--
>2.25.1



RE: [PATCH] drm/amdgpu: enable more pm sysfs under SRIOV 1-VF mode

2021-08-05 Thread Deng, Emily
Acked-by: Emily.Deng 

>-Original Message-
>From: Gu, JiaWei (Will) 
>Sent: Thursday, August 5, 2021 2:32 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Nieto, David M ; Deng, Emily
>; Deucher, Alexander
>
>Subject: RE: [PATCH] drm/amdgpu: enable more pm sysfs under SRIOV 1-VF
>mode
>
>[AMD Official Use Only]
>
>Ping.
>
>-Original Message-
>From: Gu, JiaWei (Will) 
>Sent: Wednesday, August 4, 2021 4:08 PM
>To: Gu, JiaWei (Will) ; amd-gfx@lists.freedesktop.org
>Cc: Nieto, David M ; Deng, Emily
>; Deucher, Alexander
>
>Subject: RE: [PATCH] drm/amdgpu: enable more pm sysfs under SRIOV 1-VF
>mode
>
>[AMD Official Use Only]
>
>Add Alex.
>
>-Original Message-
>From: Jiawei Gu 
>Sent: Wednesday, August 4, 2021 3:50 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Nieto, David M ; Deng, Emily
>; Gu, JiaWei (Will) 
>Subject: [PATCH] drm/amdgpu: enable more pm sysfs under SRIOV 1-VF
>mode
>
>Enable pp_num_states, pp_cur_state, pp_force_state, pp_table sysfs under
>SRIOV 1-VF scenario.
>
>Signed-off-by: Jiawei Gu 
>---
> drivers/gpu/drm/amd/pm/amdgpu_pm.c | 8 
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
>b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
>index 769f58d5ae1a..04c7d82f8b89 100644
>--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
>+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
>@@ -2005,10 +2005,10 @@ static int ss_bias_attr_update(struct
>amdgpu_device *adev, struct amdgpu_device_  static struct
>amdgpu_device_attr amdgpu_device_attrs[] = {
>   AMDGPU_DEVICE_ATTR_RW(power_dpm_state,
>   ATTR_FLAG_BASIC|ATTR_FLAG_ONEVF),
>   AMDGPU_DEVICE_ATTR_RW(power_dpm_force_performance_level,
>   ATTR_FLAG_BASIC|ATTR_FLAG_ONEVF),
>-  AMDGPU_DEVICE_ATTR_RO(pp_num_states,
>   ATTR_FLAG_BASIC),
>-  AMDGPU_DEVICE_ATTR_RO(pp_cur_state,
>   ATTR_FLAG_BASIC),
>-  AMDGPU_DEVICE_ATTR_RW(pp_force_state,
>   ATTR_FLAG_BASIC),
>-  AMDGPU_DEVICE_ATTR_RW(pp_table,
>   ATTR_FLAG_BASIC),
>+  AMDGPU_DEVICE_ATTR_RO(pp_num_states,
>   ATTR_FLAG_BASIC|ATTR_FLAG_ONEVF),
>+  AMDGPU_DEVICE_ATTR_RO(pp_cur_state,
>   ATTR_FLAG_BASIC|ATTR_FLAG_ONEVF),
>+  AMDGPU_DEVICE_ATTR_RW(pp_force_state,
>   ATTR_FLAG_BASIC|ATTR_FLAG_ONEVF),
>+  AMDGPU_DEVICE_ATTR_RW(pp_table,
>   ATTR_FLAG_BASIC|ATTR_FLAG_ONEVF),
>   AMDGPU_DEVICE_ATTR_RW(pp_dpm_sclk,
>   ATTR_FLAG_BASIC|ATTR_FLAG_ONEVF),
>   AMDGPU_DEVICE_ATTR_RW(pp_dpm_mclk,
>   ATTR_FLAG_BASIC|ATTR_FLAG_ONEVF),
>   AMDGPU_DEVICE_ATTR_RW(pp_dpm_socclk,
>   ATTR_FLAG_BASIC|ATTR_FLAG_ONEVF),
>--
>2.17.1


RE: [PATCH] drm/amdgpu: Correct the irq numbers for virtual crtc

2021-07-07 Thread Deng, Emily
[AMD Official Use Only]

Ping ..

>-Original Message-
>From: Emily Deng 
>Sent: Tuesday, July 6, 2021 10:14 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu: Correct the irq numbers for virtual crtc
>
>The irq number should be decided by num_crtc, and num_crtc can be changed
>by a module parameter.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index 33324427b555..7e0d8c092c7e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -766,7 +766,7 @@ static const struct amdgpu_irq_src_funcs
>dce_virtual_crtc_irq_funcs = {
>
> static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev)  {
>-  adev->crtc_irq.num_types = AMDGPU_CRTC_IRQ_VBLANK6 + 1;
>+  adev->crtc_irq.num_types = adev->mode_info.num_crtc;
>   adev->crtc_irq.funcs = &dce_virtual_crtc_irq_funcs;  }
>
>--
>2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: Correct the irq numbers for virtual ctrc

2021-07-06 Thread Deng, Emily
Thanks, corrected.

Best wishes
Emily Deng



>-Original Message-
>From: Chen, Guchun 
>Sent: Tuesday, July 6, 2021 9:52 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH] drm/amdgpu: Correct the irq numbers for virtual ctrc
>
>[Public]
>
>A spelling typo in subject.
>
>s/ctrc/crtc
>
>Regards,
>Guchun
>
>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Tuesday, July 6, 2021 4:23 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu: Correct the irq numbers for virtual ctrc
>
>The irq number should be decided by num_crtc, and num_crtc can be changed
>by a module parameter.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index 33324427b555..7e0d8c092c7e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -766,7 +766,7 @@ static const struct amdgpu_irq_src_funcs
>dce_virtual_crtc_irq_funcs = {
>
> static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev)  {
>-  adev->crtc_irq.num_types = AMDGPU_CRTC_IRQ_VBLANK6 + 1;
>+  adev->crtc_irq.num_types = adev->mode_info.num_crtc;
>   adev->crtc_irq.funcs = &dce_virtual_crtc_irq_funcs;  }
>
>--
>2.25.1
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: Correct the irq numbers for virtual ctrc

2021-07-06 Thread Deng, Emily
[AMD Official Use Only]

Hi Nirmoy,
Thanks, already send out another patch with updating the commit.

Best wishes
Emily Deng

>-Original Message-
>From: Das, Nirmoy 
>Sent: Friday, July 2, 2021 5:03 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Zhao, Victor 
>Subject: Re: [PATCH] drm/amdgpu: Correct the irq numbers for virtual ctrc
>
>Please describe it a bit more in the commit message.
>
>On 7/1/2021 6:37 AM, Emily Deng wrote:
>> Signed-off-by: Emily Deng 
>> Signed-off-by: Victor 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> index 33324427b555..7e0d8c092c7e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> @@ -766,7 +766,7 @@ static const struct amdgpu_irq_src_funcs
>dce_virtual_crtc_irq_funcs = {
>>
>>   static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev)
>>   {
>> -adev->crtc_irq.num_types = AMDGPU_CRTC_IRQ_VBLANK6 + 1;
>> +adev->crtc_irq.num_types = adev->mode_info.num_crtc;
>>  adev->crtc_irq.funcs = &dce_virtual_crtc_irq_funcs;
>>   }
>>


RE: [PATCH] drm/amdgpu: Correct the irq numbers for virtual ctrc

2021-07-01 Thread Deng, Emily
[AMD Official Use Only]

Ping..

>-Original Message-
>From: Emily Deng 
>Sent: Thursday, July 1, 2021 12:38 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily ; Zhao, Victor 
>Subject: [PATCH] drm/amdgpu: Correct the irq numbers for virtual ctrc
>
>Signed-off-by: Emily Deng 
>Signed-off-by: Victor 
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index 33324427b555..7e0d8c092c7e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -766,7 +766,7 @@ static const struct amdgpu_irq_src_funcs
>dce_virtual_crtc_irq_funcs = {
>
> static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev)  {
>-  adev->crtc_irq.num_types = AMDGPU_CRTC_IRQ_VBLANK6 + 1;
>+  adev->crtc_irq.num_types = adev->mode_info.num_crtc;
>   adev->crtc_irq.funcs = &dce_virtual_crtc_irq_funcs;  }
>
>--
>2.25.1



RE: [PATCH] drm/amdgpu: Fixing "Indirect register access for Navi12 sriov" for vega10

2021-06-07 Thread Deng, Emily
[AMD Official Use Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Peng Ju
>Zhou
>Sent: Monday, June 7, 2021 1:55 PM
>To: amd-gfx@lists.freedesktop.org
>Subject: [PATCH] drm/amdgpu: Fixing "Indirect register access for Navi12 sriov"
>for vega10
>
>The NV12 and VEGA10 share the same W/RREG32_SOC15* interface, but the
>callback functions used in these macros may not be defined, so the pointers
>must be NULL-checked; macro __WREG32_SOC15_RLC__ did not do this. Fix the
>lack of the NULL pointer check.
>
>Signed-off-by: Peng Ju Zhou 
>---
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/soc15_common.h | 4 ++--
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>index fe5908f708cc..044076ec1d03 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>@@ -790,7 +790,8 @@ static void gfx_v9_0_rlcg_w(struct amdgpu_device
>*adev, u32 offset, u32 v, u32 f  static void gfx_v9_0_rlcg_wreg(struct
>amdgpu_device *adev, u32 offset,
>  u32 v, u32 acc_flags, u32 hwip)  {
>-  if (amdgpu_sriov_fullaccess(adev)) {
>+  if ((acc_flags & AMDGPU_REGS_RLC) &&
>+  amdgpu_sriov_fullaccess(adev)) {
>   gfx_v9_0_rlcg_w(adev, offset, v, acc_flags);
>
>   return;
>diff --git a/drivers/gpu/drm/amd/amdgpu/soc15_common.h
>b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
>index f6cf70e69cce..0eeb5e073be8 100644
>--- a/drivers/gpu/drm/amd/amdgpu/soc15_common.h
>+++ b/drivers/gpu/drm/amd/amdgpu/soc15_common.h
>@@ -28,12 +28,12 @@
> #define SOC15_REG_OFFSET(ip, inst, reg)   (adev-
>>reg_offset[ip##_HWIP][inst][reg##_BASE_IDX] + reg)
>
> #define __WREG32_SOC15_RLC__(reg, value, flag, hwip) \
>-  ((amdgpu_sriov_runtime(adev) && adev->gfx.rlc.funcs->rlcg_wreg) ? \
>+  ((amdgpu_sriov_vf(adev) && adev->gfx.rlc.funcs && adev->gfx.rlc.funcs->rlcg_wreg) ? \
>adev->gfx.rlc.funcs->rlcg_wreg(adev, reg, value, flag, hwip) : \
>WREG32(reg, value))
>
> #define __RREG32_SOC15_RLC__(reg, flag, hwip) \
>-  ((amdgpu_sriov_runtime(adev) && adev->gfx.rlc.funcs->rlcg_rreg) ? \
>+  ((amdgpu_sriov_vf(adev) && adev->gfx.rlc.funcs && adev->gfx.rlc.funcs->rlcg_rreg) ? \
>adev->gfx.rlc.funcs->rlcg_rreg(adev, reg, flag, hwip) : \
>RREG32(reg))
>
>--
>2.17.1
>
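The guard described in the commit message above can be sketched in isolation. This is a minimal user-space model, not the real amdgpu code: `rlc_funcs`, `rreg32_direct`, and `rreg32_rlcg` are illustrative stand-ins for `adev->gfx.rlc.funcs`, `RREG32`, and the RLCG read path.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for adev->gfx.rlc.funcs: the table itself, or the
 * callback inside it, may be NULL on ASICs without RLCG register access. */
struct rlc_funcs {
	uint32_t (*rlcg_rreg)(uint32_t reg);
};

static uint32_t rreg32_direct(uint32_t reg) { return reg + 1; } /* stand-in for RREG32 */
static uint32_t rreg32_rlcg(uint32_t reg)   { return reg + 2; } /* stand-in for the RLCG path */

/* The fixed macro logic: dispatch through the callback only when both the
 * table and the callback are non-NULL, else fall back to a direct read. */
static uint32_t soc15_rlc_read(const struct rlc_funcs *funcs, uint32_t reg)
{
	if (funcs && funcs->rlcg_rreg)
		return funcs->rlcg_rreg(reg);
	return rreg32_direct(reg);
}

static const struct rlc_funcs with_cb = { rreg32_rlcg };
static const struct rlc_funcs without_cb = { NULL };
```

With a NULL table or a NULL callback the direct path is taken; only a fully populated table routes through the RLCG callback, which is the point of the fix.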


RE: [PATCH 1/5] drm/amdgpu: remove sriov vf checking from getting fb location

2021-06-03 Thread Deng, Emily
Do we need to consider backward compatibility?


Best wishes
Emily Deng


>-Original Message-
>From: amd-gfx  On Behalf Of Liu,
>Shaoyun
>Sent: Thursday, June 3, 2021 11:10 PM
>To: Luo, Zhigang ; amd-gfx@lists.freedesktop.org
>Cc: Luo, Zhigang 
>Subject: RE: [PATCH 1/5] drm/amdgpu: remove sriov vf checking from getting fb
>location
>
>[AMD Official Use Only]
>
>Looks ok to me .
>
>Reviewed-By : Shaoyun.liu 
>
>-Original Message-
>From: amd-gfx  On Behalf Of Zhigang
>Luo
>Sent: Thursday, June 3, 2021 10:13 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Luo, Zhigang 
>Subject: [PATCH 1/5] drm/amdgpu: remove sriov vf checking from getting fb
>location
>
>The host driver programs the fb location registers for the vf, so there is no
>need to check anymore.
>
>Signed-off-by: Zhigang Luo 
>---
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 +
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>index ceb3968d8326..1c2d9fde9021 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>@@ -1292,10 +1292,7 @@ static int gmc_v9_0_late_init(void *handle)  static
>void gmc_v9_0_vram_gtt_location(struct amdgpu_device *adev,
>   struct amdgpu_gmc *mc)
> {
>-  u64 base = 0;
>-
>-  if (!amdgpu_sriov_vf(adev))
>-  base = adev->mmhub.funcs->get_fb_location(adev);
>+  u64 base = adev->mmhub.funcs->get_fb_location(adev);
>
>   /* add the xgmi offset of the physical node */
>   base += adev->gmc.xgmi.physical_node_id * adev->gmc.xgmi.node_segment_size;
>--
>2.17.1
>


RE: [PATCH] drm/amdgpu: Fix Gstreamer api vaapih264enc missing

2021-05-30 Thread Deng, Emily
Hi Monk,
 Yes, this patch disables the decode ring but still lets gstreamer
continue running. Gstreamer checks both encode and decode capability; if
either is missing, it cannot run.
 For navi12 SRIOV the decode ring is disabled, so
adev->vcn.inst[i].ring_dec.sched.ready will be false; the extra code below is
added to still report decode capability.

+   if (adev->vcn.inst[i].ring_dec.sched.ready ||
+   (adev->asic_type == CHIP_NAVI12 &&
+   amdgpu_sriov_vf(adev)))
 ++num_rings;
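The check Emily describes can be modeled as a small capability-count sketch. All types and names here are illustrative stand-ins for the amdgpu structures, assuming the logic mirrors the quoted snippet:

```c
#include <stdbool.h>

/* Illustrative stand-ins for the amdgpu types involved. */
enum asic_type { CHIP_NAVI10, CHIP_NAVI12 };

struct vcn_inst {
	bool ring_dec_ready; /* models adev->vcn.inst[i].ring_dec.sched.ready */
};

/* Count reported decode rings; Navi12 under SRIOV reports one per
 * instance even with the decode ring disabled, so gstreamer's
 * capability probe succeeds. */
static int count_dec_rings(const struct vcn_inst *inst, int n,
			   enum asic_type asic, bool sriov_vf)
{
	int num_rings = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (inst[i].ring_dec_ready ||
		    (asic == CHIP_NAVI12 && sriov_vf))
			++num_rings;
	}
	return num_rings;
}

static const struct vcn_inst insts_off[2]; /* both decode rings disabled */
```

Only the Navi12-plus-SRIOV combination reports rings when the decode scheduler is not ready; every other ASIC reports its real count.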

Best wishes
Emily Deng
From: amd-gfx  On Behalf Of Li, Xin 
(Justin)
Sent: Thursday, May 27, 2021 3:18 PM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Min, Frank 
; Koenig, Christian 
Subject: Re: [PATCH] drm/amdgpu: Fix Gstreamer api vaapih264enc missing

Hi, friends.

My apologize for this patch.

I've ported this patch from another branch to fix gstreamer's lack of 
"vaapih264enc", and the ported patch does fix that issue. However, since the 
patch comes from another branch, I may need to make some alterations and 
audits, mainly in its commit message. I will file another review right after 
my fixes.

Thank you all for you time.

BR,
Justin

From: Liu, Monk mailto:monk@amd.com>>
Date: Thursday, May 27, 2021 at 07:57
To: Li, Xin (Justin) mailto:xin2...@amd.com>>, 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Li, Xin (Justin) mailto:xin2...@amd.com>>, Deucher, 
Alexander mailto:alexander.deuc...@amd.com>>, Min, 
Frank mailto:frank@amd.com>>, Koenig, Christian 
mailto:christian.koe...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: Fix Gstreamer api vaapih264enc missing
[AMD Official Use Only]

Looks it lack enough background for people to review:


-   if (adev->vcn.inst[i].ring_dec.sched.ready)
+   if (adev->vcn.inst[i].ring_dec.sched.ready ||
+   (adev->asic_type == CHIP_NAVI12 &&
+   amdgpu_sriov_vf(adev)))
 ++num_rings;

[ml] why is SRIOV navi12 forced to have those DEC rings? Since SRIOV 
navi12 has no decode capability, can you explain here?


-   if (amdgpu_is_tmz(adev))
-   dev_info->ids_flags |= AMDGPU_IDS_FLAGS_TMZ;
[ML] why is this removed? Is it related to your issue?


Thanks

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Li, Xin (Justin)
Sent: Wednesday, May 26, 2021 6:13 PM
To: amd-gfx@lists.freedesktop.org
Cc: Li, Xin (Justin) mailto:xin2...@amd.com>>; Deucher, 
Alexander mailto:alexander.deuc...@amd.com>>; Min, 
Frank mailto:frank@amd.com>>; Koenig, Christian 
mailto:christian.koe...@amd.com>>
Subject: [PATCH] drm/amdgpu: Fix Gstreamer api vaapih264enc missing

Since the vcn decode ring is not required, just disable it.

Cc: Alex.Deucher mailto:alexander.deuc...@amd.com>>
Cc: Christian.Konig mailto:christian.koe...@amd.com>>
Signed-off-by: Li.Xin.Justin mailto:xin2...@amd.com>>
Signed-off-by: Frank.Min mailto:frank@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  6 +++---
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c   | 25 ++---
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 524e4fe5efe8..614e6b06e94e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -427,7 +427,9 @@ static int amdgpu_hw_ip_info(struct amdgpu_device *adev,
 if (adev->uvd.harvest_config & (1 << i))
 continue;

-   if (adev->vcn.inst[i].ring_dec.sched.ready)
+   if (adev->vcn.inst[i].ring_dec.sched.ready ||
+   (adev->asic_type == CHIP_NAVI12 &&
+   amdgpu_sriov_vf(adev)))
 ++num_rings;
 }
 ib_start_alignment = 16;
@@ -770,8 +772,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
 dev_info->ids_flags |= AMDGPU_IDS_FLAGS_FUSION;
 if (amdgpu_mcbp || amdgpu_sriov_vf(adev))
 dev_info->ids_flags |= AMDGPU_IDS_FLAGS_PREEMPTION;
-   if (amdgpu_is_tmz(adev))
-   dev_info->ids_flags |= AMDGPU_IDS_FLAGS_TMZ;

 vm_size = adev->vm_manager.max_pfn * AMDGPU_GPU_PAGE_SIZE;
 vm_size -= AMDGPU_VA_RESERVED_SIZE;
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c 

RE: [PATCH] drm/amd/amdgpu:save psp ring wptr in SRIOV to avoid attack

2021-05-26 Thread Deng, Emily
[AMD Official Use Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Jingwen
>Chen
>Sent: Wednesday, May 26, 2021 2:55 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Chen, JingWen ; Zhao, Victor
>; Liu, Monk 
>Subject: [PATCH] drm/amd/amdgpu:save psp ring wptr in SRIOV to avoid attack
>
>From: Victor Zhao 
>
>Save the psp ring wptr in SRIOV to avoid an attack via extra changes to the
>MP0_SMN_C2PMSG_102 reg
>
>Change-Id: Idee78e8c1c781463048f2f6311fdc70488ef05b2
>Signed-off-by: Victor Zhao 
>Signed-off-by: Jingwen Chen 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 1 +
>drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 1 +
>drivers/gpu/drm/amd/amdgpu/psp_v11_0.c  | 3 ++-
> drivers/gpu/drm/amd/amdgpu/psp_v3_1.c   | 3 ++-
> 4 files changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>index 55378c6b9722..20e06b3ec686 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>@@ -2701,6 +2701,7 @@ int psp_ring_cmd_submit(struct psp_context *psp,
>   /* Update the write Pointer in DWORDs */
>   psp_write_ptr_reg = (psp_write_ptr_reg + rb_frame_size_dw) % ring_size_dw;
>   psp_ring_set_wptr(psp, psp_write_ptr_reg);
>+  ring->ring_wptr = psp_write_ptr_reg;
>   return 0;
> }
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
>index 46a5328e00e0..60aa99a39a74 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
>@@ -76,6 +76,7 @@ struct psp_ring
>   uint64_tring_mem_mc_addr;
>   void*ring_mem_handle;
>   uint32_tring_size;
>+  uint32_tring_wptr;
> };
>
> /* More registers may will be supported */ diff --git
>a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
>b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
>index 1f2e7e35c91e..4a32b0c84ef4 100644
>--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
>@@ -474,6 +474,7 @@ static int psp_v11_0_ring_create(struct psp_context
>*psp,
>   return ret;
>   }
>
>+  ring->ring_wptr = 0;
>   /* Write low address of the ring to C2PMSG_102 */
>   psp_ring_reg = lower_32_bits(ring->ring_mem_mc_addr);
>   WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_102,
>psp_ring_reg); @@ -733,7 +734,7 @@ static uint32_t
>psp_v11_0_ring_get_wptr(struct psp_context *psp)
>   struct amdgpu_device *adev = psp->adev;
>
>   if (amdgpu_sriov_vf(adev))
>-  data = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_102);
>+  data = psp->km_ring.ring_wptr;
>   else
>   data = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_67);
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
>b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
>index f2e725f72d2f..160f78eb6403 100644
>--- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
>+++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
>@@ -237,6 +237,7 @@ static int psp_v3_1_ring_create(struct psp_context *psp,
>   return ret;
>   }
>
>+  ring->ring_wptr = 0;
>   /* Write low address of the ring to C2PMSG_102 */
>   psp_ring_reg = lower_32_bits(ring->ring_mem_mc_addr);
>   WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_102,
>psp_ring_reg); @@ -379,7 +380,7 @@ static uint32_t
>psp_v3_1_ring_get_wptr(struct psp_context *psp)
>   struct amdgpu_device *adev = psp->adev;
>
>   if (amdgpu_sriov_vf(adev))
>-  data = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_102);
>+  data = psp->km_ring.ring_wptr;
>   else
>   data = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_67);
>   return data;
>--
>2.25.1
>
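A minimal model of the hardening in the patch above, assuming the cached write pointer simply shadows what would otherwise be read back from the guest-visible C2PMSG_102 register (types and names are illustrative, not the real PSP code):

```c
#include <stdint.h>

/* Illustrative model: the driver keeps its own copy of the PSP ring
 * write pointer rather than re-reading the guest-visible C2PMSG_102
 * register, which an attacker could have changed behind its back. */
struct psp_ring_model {
	uint32_t ring_size; /* ring size in dwords */
	uint32_t ring_wptr; /* cached write pointer in dwords */
};

/* On submit, advance the cached wptr with wraparound (mirrors the
 * ring->ring_wptr update added to psp_ring_cmd_submit above). */
static uint32_t psp_ring_submit(struct psp_ring_model *ring, uint32_t frame_dw)
{
	ring->ring_wptr = (ring->ring_wptr + frame_dw) % ring->ring_size;
	return ring->ring_wptr;
}

/* SRIOV get_wptr: return the cached value, never the register. */
static uint32_t psp_ring_get_wptr(const struct psp_ring_model *ring)
{
	return ring->ring_wptr;
}

static struct psp_ring_model demo_ring = { 64, 0 };
```

Because `get_wptr` never touches the register under SRIOV, a hostile write to C2PMSG_102 cannot redirect where the driver thinks the ring ends.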


RE: [PATCH v5 09/10] drm/amdgpu: Use PSP to program IH_RB_CNTL* registers

2021-05-21 Thread Deng, Emily
Hi Pengju,
 You'd better only switch the order for SRIOV.

Best wishes
Emily Deng

>-Original Message-
>From: Zhou, Peng Ju 
>Sent: Friday, May 21, 2021 5:58 PM
>To: Alex Deucher ; Zhao, Victor
>; Deng, Emily 
>Cc: amd-gfx list 
>Subject: RE: [PATCH v5 09/10] drm/amdgpu: Use PSP to program IH_RB_CNTL*
>registers
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Hi @Zhao, Victor/@Deng, Emily
>
>Can you help to answer Alex's question?
>Because this patch originally came from @Zhao, Victor, it's hard for me to
>explain the question.
>
>Alex's question:
>> > --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> > @@ -845,8 +845,8 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
>> > case CHIP_NAVI12:
>> > amdgpu_device_ip_block_add(adev, _common_ip_block);
>> > amdgpu_device_ip_block_add(adev, _v10_0_ip_block);
>> > -   amdgpu_device_ip_block_add(adev, _ih_ip_block);
>> > amdgpu_device_ip_block_add(adev,
>> > _v11_0_ip_block);
>> > +   amdgpu_device_ip_block_add(adev,
>> > + _ih_ip_block);
>>
>> Is it safe to change the order like this on bare metal?  Please look
>> at what was done for vega and sienna cichlid.  Something like that is 
>> probably
>a better bet.
>
>
>--
>BW
>Pengju Zhou
>
>
>
>
>> -Original Message-
>> From: Alex Deucher 
>> Sent: Thursday, May 20, 2021 11:47 AM
>> To: Zhou, Peng Ju 
>> Cc: amd-gfx list ; Zhao, Victor
>> 
>> Subject: Re: [PATCH v5 09/10] drm/amdgpu: Use PSP to program
>> IH_RB_CNTL* registers
>>
>> On Mon, May 17, 2021 at 10:39 AM Peng Ju Zhou 
>> wrote:
>> >
>> > use psp to program IH_RB_CNTL* if indirect access for ih enabled in
>> > SRIOV environment.
>> >
>> > Signed-off-by: Victor 
>> > Signed-off-by: Peng Ju Zhou 
>> > ---
>> >  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 19 +--
>> >  drivers/gpu/drm/amd/amdgpu/nv.c|  2 +-
>> >  2 files changed, 18 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>> > b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>> > index f4e4040bbd25..2e69cf8db072 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>> > @@ -151,7 +151,14 @@ static int navi10_ih_toggle_ring_interrupts(struct
>> amdgpu_device *adev,
>> > /* enable_intr field is only valid in ring0 */
>> > if (ih == &adev->irq.ih)
>> > tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, ENABLE_INTR, (enable ?
>> 1 : 0));
>> > -   WREG32(ih_regs->ih_rb_cntl, tmp);
>> > +   if (amdgpu_sriov_vf(adev) && amdgpu_sriov_reg_indirect_ih(adev)) {
>> > +   if (psp_reg_program(&adev->psp, ih_regs->psp_reg_id, tmp)) {
>> > {
>> > +   DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
>> > +   return -ETIMEDOUT;
>> > +   }
>> > +   } else {
>> > +   WREG32(ih_regs->ih_rb_cntl, tmp);
>> > +   }
>> >
>> > if (enable) {
>> > ih->enabled = true;
>> > @@ -261,7 +268,15 @@ static int navi10_ih_enable_ring(struct
>> amdgpu_device *adev,
>> > tmp = REG_SET_FIELD(tmp, IH_RB_CNTL,
>> WPTR_OVERFLOW_ENABLE, 0);
>> > tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RB_FULL_DRAIN_ENABLE,
>> 1);
>> > }
>> > -   WREG32(ih_regs->ih_rb_cntl, tmp);
>> > +
>> > +   if (amdgpu_sriov_vf(adev) && amdgpu_sriov_reg_indirect_ih(adev)) {
>> > +   if (psp_reg_program(&adev->psp, ih_regs->psp_reg_id, tmp)) {
>> > {
>> > +   DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
>> > +   return -ETIMEDOUT;
>> > +   }
>> > +   } else {
>> > +   WREG32(ih_regs->ih_rb_cntl, tmp);
>> > +   }
>> >
>> > if (ih == &adev->irq.ih) {
>> > /* set the ih ring 0 writeback address whether it's
>> > enabled or not */ diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>> > b/drivers/gpu/drm/amd/amdgpu/nv.c index a9ad28fb55b3..b9c9c4d4606c
>> > 100644
>> > 
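The dual programming path the patch introduces can be sketched as follows. This is a toy model, not the navi10_ih code: the register arrays and the `psp_reg_program_model` stand-in are assumptions for illustration, with -1 standing in for `-ETIMEDOUT`.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model of the dual path: fake register files, nothing amdgpu. */
static uint32_t mmio_regs[4]; /* models direct WREG32 targets */
static uint32_t psp_regs[4];  /* models registers programmed via PSP */

/* Stand-in for psp_reg_program(); out-of-range ids model a PSP failure
 * (the real code returns -ETIMEDOUT, here just -1). */
static int psp_reg_program_model(int reg_id, uint32_t val)
{
	if (reg_id < 0 || reg_id >= 4)
		return -1;
	psp_regs[reg_id] = val;
	return 0;
}

/* The pattern from the patch: route IH_RB_CNTL through PSP when SRIOV
 * indirect IH access is enabled, else do a direct MMIO write. */
static int program_ih_rb_cntl(bool sriov_indirect_ih, int reg, uint32_t tmp)
{
	if (sriov_indirect_ih) {
		if (psp_reg_program_model(reg, tmp))
			return -1; /* propagate the PSP failure */
	} else {
		mmio_regs[reg] = tmp; /* stand-in for WREG32 */
	}
	return 0;
}
```

The caller sees the same return convention on both paths, but only the PSP path can fail and must be checked, as the patch does.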

RE: [PATCH] drm/amd/amdgpu: Cancel the hrtimer in sw_fini

2021-05-09 Thread Deng, Emily
Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Sun, Roy
>Sent: Saturday, May 8, 2021 12:35 PM
>To: Sun, Roy ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amd/amdgpu: Cancel the hrtimer in sw_fini
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping
>
>-Original Message-
>From: Roy Sun 
>Sent: Tuesday, April 6, 2021 8:21 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Roy 
>Subject: [PATCH] drm/amd/amdgpu: Cancel the hrtimer in sw_fini
>
>Move the process of cancelling hrtimer to sw_fini
>
>Signed-off-by: Roy Sun 
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 12 +---
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index 5c11144da051..33324427b555 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -421,6 +421,11 @@ static int dce_virtual_sw_init(void *handle)
> static int dce_virtual_sw_fini(void *handle)
> {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>+  int i = 0;
>+
>+  for (i = 0; i < adev->mode_info.num_crtc; i++)
>+  if (adev->mode_info.crtcs[i])
>+  hrtimer_cancel(&adev->mode_info.crtcs[i]->vblank_timer);
>
>   kfree(adev->mode_info.bios_hardcoded_edid);
>
>@@ -480,13 +485,6 @@ static int dce_virtual_hw_init(void *handle)
>
> static int dce_virtual_hw_fini(void *handle)  {
>-  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>-  int i = 0;
>-
>-  for (i = 0; i < adev->mode_info.num_crtc; i++)
>-  if (adev->mode_info.crtcs[i])
>-  hrtimer_cancel(&adev->mode_info.crtcs[i]->vblank_timer);
>-
>   return 0;
> }
>
>--
>2.29.0


RE: [PATCH] drm/amdgpu: Rename the flags to eliminate ambiguity v2

2021-04-29 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Peng
>Ju Zhou
>Sent: Thursday, April 29, 2021 2:31 PM
>To: amd-gfx@lists.freedesktop.org
>Subject: [PATCH] drm/amdgpu: Rename the flags to eliminate ambiguity v2
>
>The flags vf_reg_access_* may cause confusion, rename the flags to make it
>more clear.
>
>Signed-off-by: Peng Ju Zhou 
>---
> drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
>b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
>index 1a8f6d4baab2..befd0b4b7bea 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
>@@ -98,9 +98,9 @@ union amd_sriov_msg_feature_flags {
>
> union amd_sriov_reg_access_flags {
> struct {
>-uint32_t vf_reg_access_ih: 1;
>-uint32_t vf_reg_access_mmhub : 1;
>-uint32_t vf_reg_access_gc: 1;
>+uint32_t vf_reg_psp_access_ih: 1;
>+uint32_t vf_reg_rlc_access_mmhub : 1;
>+uint32_t vf_reg_rlc_access_gc: 1;
> uint32_t reserved: 29;
> } flags;
> uint32_t all;
>--
>2.17.1
>


RE: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-28 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
Good to know, thanks very much.

Best wishes
Emily Deng
From: Christian König 
Sent: Wednesday, April 28, 2021 5:07 PM
To: Deng, Emily ; Deucher, Alexander 

Cc: Sun, Roy ; amd-gfx list ; 
Nieto, David M 
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

Well none. As I said I will push this upstream through drm-misc-next.

Christian.
Am 28.04.21 um 10:32 schrieb Deng, Emily:

[AMD Official Use Only - Internal Distribution Only]

Hi Alex and Christian,
What extra work Roy need to do about this patch? And fdinfo?

Best wishes
Emily Deng
From: amd-gfx 
<mailto:amd-gfx-boun...@lists.freedesktop.org>
 On Behalf Of Deucher, Alexander
Sent: Tuesday, April 27, 2021 3:52 AM
To: Christian König 
<mailto:ckoenig.leichtzumer...@gmail.com>
Cc: Sun, Roy <mailto:roy@amd.com>; amd-gfx list 
<mailto:amd-gfx@lists.freedesktop.org>; Nieto, 
David M <mailto:david.ni...@amd.com>
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track


[AMD Official Use Only - Internal Distribution Only]



Fair point.  Either way works for me.

Alex

From: Christian König 
mailto:ckoenig.leichtzumer...@gmail.com>>
Sent: Monday, April 26, 2021 3:48 PM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Cc: amd-gfx list 
mailto:amd-gfx@lists.freedesktop.org>>; Sun, Roy 
mailto:roy@amd.com>>; Nieto, David M 
mailto:david.ni...@amd.com>>
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

My concern is more to get this tested from more people than just AMD.

Christian.
Am 26.04.21 um 21:40 schrieb Deucher, Alexander:

[AMD Official Use Only - Internal Distribution Only]

That said, it would be easier for me to merge through the AMD tree since a 
relatively big AMD feature depends on it.  Not sure how much conflict potential 
there is if this goes through the AMD tree.

Alex


From: amd-gfx 
<mailto:amd-gfx-boun...@lists.freedesktop.org>
 on behalf of Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Sent: Monday, April 26, 2021 3:24 PM
To: Christian König 
<mailto:ckoenig.leichtzumer...@gmail.com>
Cc: amd-gfx list 
<mailto:amd-gfx@lists.freedesktop.org>; Sun, Roy 
<mailto:roy@amd.com>; Nieto, David M 
<mailto:david.ni...@amd.com>
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track


[AMD Official Use Only - Internal Distribution Only]



No objections from me.

Thanks!

Alex


From: Christian König 
<mailto:ckoenig.leichtzumer...@gmail.com>
Sent: Monday, April 26, 2021 2:49 AM
To: Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Cc: Nieto, David M <mailto:david.ni...@amd.com>; Sun, Roy 
<mailto:roy@amd.com>; amd-gfx list 
<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

Hey Alex,

any objections that we merge those two patches through drm-misc-next?

Thanks,
Christian.

Am 26.04.21 um 08:27 schrieb Roy Sun:
> Update the timestamp of scheduled fence on HW
> completion of the previous fences
>
> This allow more accurate tracking of the fence
> execution in HW
>
> Signed-off-by: David M Nieto <mailto:david.ni...@amd.com>
> Signed-off-by: Roy Sun <mailto:roy@amd.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 12 ++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..f8e39ab0c41b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler 
> *sched)
>   EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>
>   /**
> - * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs 
> from mirror ring list
> + * drm_sched_resubmit_jobs_ext - helper to relaunch certain number of jobs 
> from pending list
>*
>* @sched: scheduler instance
>* @max: job numbers to relaunch
> @@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   static struct drm_sched_job *
>   drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>   {
> - struct drm_sched_job *job;
> + struct drm_sched_job *job, *next;
>
>/*
> * Don't destroy jobs while the timeout worker is running  OR thread
> @@ -690,6 +690,14 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler 
> *sched)
>if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>/* rem

RE: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-28 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Alex and Christian,
What extra work Roy need to do about this patch? And fdinfo?

Best wishes
Emily Deng
From: amd-gfx  On Behalf Of Deucher, 
Alexander
Sent: Tuesday, April 27, 2021 3:52 AM
To: Christian König 
Cc: Sun, Roy ; amd-gfx list ; 
Nieto, David M 
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track


[AMD Official Use Only - Internal Distribution Only]



Fair point.  Either way works for me.

Alex

From: Christian König 
mailto:ckoenig.leichtzumer...@gmail.com>>
Sent: Monday, April 26, 2021 3:48 PM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Cc: amd-gfx list 
mailto:amd-gfx@lists.freedesktop.org>>; Sun, Roy 
mailto:roy@amd.com>>; Nieto, David M 
mailto:david.ni...@amd.com>>
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

My concern is more to get this tested from more people than just AMD.

Christian.
Am 26.04.21 um 21:40 schrieb Deucher, Alexander:

[AMD Official Use Only - Internal Distribution Only]

That said, it would be easier for me to merge through the AMD tree since a 
relatively big AMD feature depends on it.  Not sure how much conflict potential 
there is if this goes through the AMD tree.

Alex


From: amd-gfx 

 on behalf of Deucher, Alexander 

Sent: Monday, April 26, 2021 3:24 PM
To: Christian König 

Cc: amd-gfx list 
; Sun, Roy 
; Nieto, David M 

Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track


[AMD Official Use Only - Internal Distribution Only]



No objections from me.

Thanks!

Alex


From: Christian König 

Sent: Monday, April 26, 2021 2:49 AM
To: Deucher, Alexander 

Cc: Nieto, David M ; Sun, Roy 
; amd-gfx list 

Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

Hey Alex,

any objections that we merge those two patches through drm-misc-next?

Thanks,
Christian.

Am 26.04.21 um 08:27 schrieb Roy Sun:
> Update the timestamp of scheduled fence on HW
> completion of the previous fences
>
> This allow more accurate tracking of the fence
> execution in HW
>
> Signed-off-by: David M Nieto 
> Signed-off-by: Roy Sun 
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 12 ++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..f8e39ab0c41b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler 
> *sched)
>   EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>
>   /**
> - * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs 
> from mirror ring list
> + * drm_sched_resubmit_jobs_ext - helper to relaunch certain number of jobs 
> from pending list
>*
>* @sched: scheduler instance
>* @max: job numbers to relaunch
> @@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   static struct drm_sched_job *
>   drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>   {
> - struct drm_sched_job *job;
> + struct drm_sched_job *job, *next;
>
>/*
> * Don't destroy jobs while the timeout worker is running  OR thread
> @@ -690,6 +690,14 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler 
> *sched)
>if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>/* remove job from pending_list */
>list_del_init(&job->list);
> We just need to record the scheduled time of the next job, so we
> do not need to check the rest of the jobs.
> + /* account for the next fence in the queue */
> + next = list_first_entry_or_null(&sched->pending_list,
> + struct drm_sched_job, list);
> + if (next && test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> + &next->s_fence->finished.flags)) {
> + next->s_fence->scheduled.timestamp =
> + job->s_fence->finished.timestamp;
> + }
>} else {
>job = NULL;
>/* queue timeout for next job */

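The timestamp propagation in this patch can be reduced to a small model. The structures below are illustrative stand-ins for `drm_sched_job` and its fences, assuming the "scheduled" time of the next pending job is backdated to the previous job's "finished" time, as the diff does for `next->s_fence->scheduled.timestamp`:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative stand-in for drm_sched_job and its fence timestamps. */
struct sched_job_model {
	uint64_t finished_ts;  /* when HW signalled this job finished */
	uint64_t scheduled_ts; /* when the job is considered started on HW */
};

/* When a job's finished fence signals, backdate the next pending job's
 * scheduled timestamp to that finish time. */
static void account_next_job(const struct sched_job_model *done,
			     struct sched_job_model *next)
{
	if (next) /* the pending list may be empty */
		next->scheduled_ts = done->finished_ts;
}

static struct sched_job_model done_job = { 1000, 0 };
static struct sched_job_model next_job;

static uint64_t demo_account(void)
{
	account_next_job(&done_job, &next_job);
	return next_job.scheduled_ts;
}
```

This is what makes per-job HW runtime accounting accurate: the next job's measured start coincides with the moment the hardware actually freed up.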


RE: [PATCH] drm/amd/amdgpu/sriov disable all ip hw status by default

2021-04-27 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: Jack Zhang 
>Sent: Tuesday, April 27, 2021 6:03 PM
>To: amd-gfx@lists.freedesktop.org; Liu, Monk ; Deng,
>Emily ; Chen, JingWen 
>Cc: Zhang, Jack (Jian) 
>Subject: [PATCH] drm/amd/amdgpu/sriov disable all ip hw status by default
>
>Disable all ip's hw status to false before any hw_init.
>Only set it to true until its hw_init is executed.
>
>The old 5.9 branch has this change but somehow the 5.11 kernel does not
>have this fix.
>
>Without this change, sriov tdr have gfx IB test fail.
>
>Signed-off-by: Jack Zhang 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index eef54b265ffd..5cb171c2273c 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2843,7 +2843,7 @@ static int
>amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
> AMD_IP_BLOCK_TYPE_IH,
> };
>
>-for (i = 0; i < ARRAY_SIZE(ip_order); i++) {
>+for (i = 0; i < adev->num_ip_blocks; i++) {
> int j;
> struct amdgpu_ip_block *block;
>
>--
>2.25.1
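The commit message's rule can be shown in miniature: every IP block's hw flag is forced off before the SRIOV early re-init loop, and a block only flips back on after its own hw_init succeeds. A toy sketch of that invariant (`IPBlock` and its fields are illustrative stand-ins, not the `amdgpu_ip_block` structures):

```python
# Toy sketch of "disable all ip's hw status before any hw_init; only set
# it to true once its hw_init executed". Not the driver code.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class IPBlock:
    name: str
    hw_init: Callable[[], bool]
    hw_ready: bool = True          # stale "ready" state left from before reset

def reinit_early(blocks: List[IPBlock]) -> None:
    for b in blocks:               # disable every block first ...
        b.hw_ready = False
    for b in blocks:               # ... re-enable only on successful init
        if b.hw_init():
            b.hw_ready = True
```

The one-line fix above achieves the same thing by walking all of `adev->num_ip_blocks` instead of only the blocks named in the fixed `ip_order` array.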



RE: [PATCH] drm/amdgpu/sriov: Remove clear vf fw support

2021-04-26 Thread Deng, Emily

Reviewed-by: Emily Deng 

>-Original Message-
>From: Victor Zhao 
>Sent: Thursday, April 22, 2021 6:02 PM
>To: amd-gfx@lists.freedesktop.org; Deng, Emily 
>Cc: Zhao, Victor 
>Subject: [PATCH] drm/amdgpu/sriov: Remove clear vf fw support
>
>PSP clear_vf_fw feature is outdated and has been removed.
>Remove the related functions.
>
>Signed-off-by: Victor Zhao 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 32 -
>drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h |  1 -
> 2 files changed, 33 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>index 9311dcc94cb6..623044414bb5 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>@@ -417,26 +417,6 @@ static int psp_tmr_init(struct psp_context *psp)
> return ret;
> }
>
>-static int psp_clear_vf_fw(struct psp_context *psp) -{
>-int ret;
>-struct psp_gfx_cmd_resp *cmd;
>-
>-if (!amdgpu_sriov_vf(psp->adev) || psp->adev->asic_type !=
>CHIP_NAVI12)
>-return 0;
>-
>-cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
>-if (!cmd)
>-return -ENOMEM;
>-
>-cmd->cmd_id = GFX_CMD_ID_CLEAR_VF_FW;
>-
>-ret = psp_cmd_submit_buf(psp, NULL, cmd, psp-
>>fence_buf_mc_addr);
>-kfree(cmd);
>-
>-return ret;
>-}
>-
> static bool psp_skip_tmr(struct psp_context *psp)  {
> switch (psp->adev->asic_type) {
>@@ -1925,12 +1905,6 @@ static int psp_hw_start(struct psp_context *psp)
> return ret;
> }
>
>-ret = psp_clear_vf_fw(psp);
>-if (ret) {
>-DRM_ERROR("PSP clear vf fw!\n");
>-return ret;
>-}
>-
> ret = psp_boot_config_set(adev);
> if (ret) {
> DRM_WARN("PSP set boot config@\n");
>@@ -2439,7 +2413,6 @@ static int psp_hw_fini(void *handle)  {
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> struct psp_context *psp = &adev->psp;
>-int ret;
>
> if (psp->adev->psp.ta_fw) {
> psp_ras_terminate(psp);
>@@ -2450,11 +2423,6 @@ static int psp_hw_fini(void *handle)
> }
>
> psp_asd_unload(psp);
>-ret = psp_clear_vf_fw(psp);
>-if (ret) {
>-DRM_ERROR("PSP clear vf fw!\n");
>-return ret;
>-}
>
> psp_tmr_terminate(psp);
> psp_ring_destroy(psp, PSP_RING_TYPE__KM); diff --git
>a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>index 96064c343163..f6d3180febc4 100644
>--- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>+++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>@@ -97,7 +97,6 @@ enum psp_gfx_cmd_id
> GFX_CMD_ID_SETUP_VMR  = 0x0009,   /* setup VMR region */
> GFX_CMD_ID_DESTROY_VMR= 0x000A,   /* destroy VMR region
>*/
> GFX_CMD_ID_PROG_REG   = 0x000B,   /* program regs */
>-GFX_CMD_ID_CLEAR_VF_FW= 0x000D,   /* Clear VF FW, to be
>used on VF shutdown. */
> GFX_CMD_ID_GET_FW_ATTESTATION = 0x000F,   /* Query GPUVA of
>the Fw Attestation DB */
> /* IDs upto 0x1F are reserved for older programs (Raven, Vega 10/12/20)
>*/
> GFX_CMD_ID_LOAD_TOC   = 0x0020,   /* Load TOC and obtain
>TMR size */
>--
>2.25.1



RE: [PATCH 2/2] drm/amdgpu: Add show_fdinfo() interface

2021-04-20 Thread Deng, Emily
Hi Christian,
 Could you help to review these patches again, thanks.

Best wishes
Emily Deng
>-Original Message-
>From: amd-gfx  On Behalf Of Sun, Roy
>Sent: Tuesday, April 20, 2021 4:54 PM
>To: Sun, Roy ; amd-gfx@lists.freedesktop.org
>Cc: Nieto, David M 
>Subject: RE: [PATCH 2/2] drm/amdgpu: Add show_fdinfo() interface
>
>
>Ping.
>Could you help review this patch again?
>
>BR
>Roy
>
>-Original Message-
>From: Roy Sun 
>Sent: Monday, April 19, 2021 2:26 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Roy ; Nieto, David M 
>Subject: [PATCH 2/2] drm/amdgpu: Add show_fdinfo() interface
>
>Tracking devices, process info and fence info using /proc/pid/fdinfo
>
>Signed-off-by: David M Nieto 
>Signed-off-by: Roy Sun 
>---
> drivers/gpu/drm/amd/amdgpu/Makefile|  2 +
> drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 61 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h|  5 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  5 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 95
>++  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h |
>43 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  4 +
> 10 files changed, 239 insertions(+), 2 deletions(-)  create mode 100644
>drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
>b/drivers/gpu/drm/amd/amdgpu/Makefile
>index ee85e8aba636..d216b7ecb5d1 100644
>--- a/drivers/gpu/drm/amd/amdgpu/Makefile
>+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>@@ -58,6 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>   amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o
>amdgpu_rap.o \
>   amdgpu_fw_attestation.o amdgpu_securedisplay.o
>
>+amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>+
> amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o
>
> # add asic specific block
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>index 125b25a5ce5b..3365feae15e1 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>@@ -107,6 +107,7 @@
> #include "amdgpu_gfxhub.h"
> #include "amdgpu_df.h"
> #include "amdgpu_smuio.h"
>+#include "amdgpu_fdinfo.h"
>
> #define MAX_GPU_INSTANCE  16
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>index 0350205c4897..01fe60fedcbe 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>@@ -651,3 +651,64 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr
>*mgr)
>   idr_destroy(&mgr->ctx_handles);
>   mutex_destroy(&mgr->lock);
> }
>+
>+void amdgpu_ctx_fence_time(struct amdgpu_ctx *ctx, struct
>amdgpu_ctx_entity *centity,
>+  ktime_t *total, ktime_t *max)
>+{
>+  ktime_t now, t1;
>+  uint32_t i;
>+
>+  now = ktime_get();
>+  for (i = 0; i < amdgpu_sched_jobs; i++) {
>+  struct dma_fence *fence;
>+  struct drm_sched_fence *s_fence;
>+
>+  spin_lock(&ctx->ring_lock);
>+  fence = dma_fence_get(centity->fences[i]);
>+  spin_unlock(&ctx->ring_lock);
>+  if (!fence)
>+  continue;
>+  s_fence = to_drm_sched_fence(fence);
>+  if (!dma_fence_is_signaled(&s_fence->scheduled))
>+  continue;
>+  t1 = s_fence->scheduled.timestamp;
>+  if (t1 >= now)
>+  continue;
>+  if (dma_fence_is_signaled(&s_fence->finished) &&
>+  s_fence->finished.timestamp < now)
>+  *total += ktime_sub(s_fence->finished.timestamp, t1);
>+  else
>+  *total += ktime_sub(now, t1);
>+  t1 = ktime_sub(now, t1);
>+  dma_fence_put(fence);
>+  *max = max(t1, *max);
>+  }
>+}
>+
>+ktime_t amdgpu_ctx_mgr_fence_usage(struct amdgpu_ctx_mgr *mgr,
>uint32_t hwip,
>+  uint32_t idx, uint64_t *elapsed)
>+{
>+  struct idr *idp;
>+  struct amdgpu_ctx *ctx;
>+  uint32_t id;
>+  struct amdgpu_ctx_entity *centity;
>+  ktime_t total = 0, max = 0;
>+
>+  if (idx >= AMDGPU_MAX_ENTITY_NUM)
>+  return 0;
>+  idp = &mgr->ctx_handles;
>+  mutex_lock(&mgr->lock);
>+  idr_for_each_entry(idp, ctx, id) {
>+  if (!ctx->entities[hwip][idx])
>+  continue;
>+
>+  centity = ctx->entities[hwip][idx];
>+  amdgpu_ctx_fence_time(ctx, centity, &total, &max);
>+  }
>+
>+  mutex_unlock(&mgr->lock);
>+  if (elapsed)
>+  *elapsed = max;
>+
>+  return total;
>+}
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>index 
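The accounting loop in `amdgpu_ctx_fence_time()` above boils down to: for each fence that has started, add its runtime (capped at "now" if it has not finished) to the total, and remember the longest elapsed-since-scheduled interval. A small sketch of that arithmetic, with plain floats standing in for `ktime_t`:

```python
# Sketch of the fence-time arithmetic (floats stand in for ktime_t; the
# real code walks dma_fence objects under the ring lock).
from typing import Iterable, Optional, Tuple

def fence_usage(fences: Iterable[Tuple[float, Optional[float]]],
                now: float) -> Tuple[float, float]:
    """Return (total busy time, longest elapsed-since-scheduled)."""
    total = 0.0
    longest = 0.0
    for scheduled, finished in fences:
        if scheduled >= now:                   # has not started yet, skip
            continue
        end = finished if (finished is not None and finished < now) else now
        total += end - scheduled               # runtime so far
        longest = max(longest, now - scheduled)
    return total, longest
```

`amdgpu_ctx_mgr_fence_usage()` then sums this over every context that owns an entity for the given hwip/index, which is what the fdinfo output reports per file descriptor.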

RE: [PATCH] drm/amdgpu: fix gfx9 rlc modprobe rlcg program timeout issue

2021-04-06 Thread Deng, Emily

Reviewed-by: Emily Deng 

>-Original Message-
>From: Zhu, Changfeng 
>Sent: Tuesday, April 6, 2021 3:30 PM
>To: amd-gfx@lists.freedesktop.org; Huang, Ray ;
>Zhou, Peng Ju ; Deng, Emily 
>Cc: Zhu, Changfeng 
>Subject: [PATCH] drm/amdgpu: fix gfx9 rlc modprobe rlcg program timeout
>issue
>
>From: changzhu 
>
>From: Changfeng 
>
>It needs to add amdgpu_sriov_fullaccess judgement as gfx_v10_rlcg_wreg
>when doing gfx_v9_0_rlcg_wreg.
>Or it will cause modprobe issue as below:
>kernel: [   59.992843] amdgpu: timeout: rlcg program reg:0x02984 failed!
>
>Fix for patch:
>drm/amdgpu: indirect register access for nv12 sriov
>
>Change-Id: I971804e4e8dbd83e4179beefa8ae8a06bd52913b
>Signed-off-by: Changfeng 
>---
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 16 +++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>index 2111e4c46a52..06811a1f4625 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>@@ -734,7 +734,7 @@ static const u32
>GFX_RLC_SRM_INDEX_CNTL_DATA_OFFSETS[] =
> mmRLC_SRM_INDEX_CNTL_DATA_7 -
>mmRLC_SRM_INDEX_CNTL_DATA_0,  };
>
>-void gfx_v9_0_rlcg_wreg(struct amdgpu_device *adev, u32 offset, u32 v, u32
>flag)
>+static void gfx_v9_0_rlcg_rw(struct amdgpu_device *adev, u32 offset,
>+u32 v, u32 flag)
> {
> static void *scratch_reg0;
> static void *scratch_reg1;
>@@ -787,6 +787,20 @@ void gfx_v9_0_rlcg_wreg(struct amdgpu_device
>*adev, u32 offset, u32 v, u32 flag)
>
> }
>
>+static void gfx_v9_0_rlcg_wreg(struct amdgpu_device *adev, u32 offset,
>+u32 v, u32 flag) {
>+if (amdgpu_sriov_fullaccess(adev)) {
>+gfx_v9_0_rlcg_rw(adev, offset, v, flag);
>+
>+return;
>+}
>+
>+if (flag & AMDGPU_REGS_NO_KIQ)
>+WREG32_NO_KIQ(offset, v);
>+else
>+WREG32(offset, v);
>+}
>+
> #define VEGA10_GB_ADDR_CONFIG_GOLDEN 0x2a114042  #define
>VEGA12_GB_ADDR_CONFIG_GOLDEN 0x24104041  #define
>RAVEN_GB_ADDR_CONFIG_GOLDEN 0x2442
>--
>2.17.1



RE: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov

2021-04-01 Thread Deng, Emily

Hi Monk,
 Could you help to review this patch?

Best wishes
Emily Deng

>-Original Message-
>From: Deng, Emily 
>Sent: Wednesday, March 31, 2021 5:02 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov
>
>
>Ping ..
>
>>-Original Message-
>>From: Emily Deng 
>>Sent: Tuesday, March 30, 2021 5:43 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov
>>
>>From: "Emily.Deng" 
>>
>>For vf assigned to guest VM, after FLR, the msix table will be reset.
>>As the flr is done on host driver. The qemu and vfio driver don't know
>>this, and the msix is still enable from qemu and vfio driver side.
>>So if want to  re-setup the msix table, first need to disable and
>>re-enable the msix from guest VM side or the qemu will do nothing as it
>>thought the msix is already enabled.
>>
>>v2:
>>Change name with amdgpu_irq prefix, remove #ifdef.
>>
>>Signed-off-by: Emily.Deng 
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 14 ++
>> 1 file changed, 14 insertions(+)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>index 03412543427a..3045f52e613d 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>@@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct amdgpu_device
>>*adev)
>> return true;
>> }
>>
>>+static void amdgpu_irq_restore_msix(struct amdgpu_device *adev) {
>>+u16 ctrl;
>>+
>>+pci_read_config_word(adev->pdev, adev->pdev->msix_cap +
>>PCI_MSIX_FLAGS, &ctrl);
>>+ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
>>+pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>>PCI_MSIX_FLAGS, ctrl);
>>+ctrl |= PCI_MSIX_FLAGS_ENABLE;
>>+pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>>+PCI_MSIX_FLAGS, ctrl); }
>>+
>> /**
>>  * amdgpu_irq_init - initialize interrupt handling
>>  *
>>@@ -558,6 +569,9 @@ void amdgpu_irq_gpu_reset_resume_helper(struct
>>amdgpu_device *adev)  {
>> int i, j, k;
>>
>>+if (amdgpu_sriov_vf(adev))
>>+amdgpu_irq_restore_msix(adev);
>>+
>> for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {  if
>>(!adev->irq.client[i].sources)  continue;
>>--
>>2.25.1
>
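The disable/re-enable dance the commit message describes can be modeled in miniature: because QEMU/vfio still believe MSI-X is enabled after the host-side FLR, writing "enable" alone is a no-op, so the guest clears `PCI_MSIX_FLAGS_ENABLE` first and then sets it again, producing two observable config writes. A hypothetical sketch with a dict standing in for PCI config space:

```python
# Hypothetical model of amdgpu_irq_restore_msix(): cfg stands in for the
# device's PCI config space, and writes records each config write so the
# disable-then-enable sequence stays visible.
PCI_MSIX_FLAGS = 0x02           # message-control word in the MSI-X capability
PCI_MSIX_FLAGS_ENABLE = 0x8000

def restore_msix(cfg: dict, writes: list) -> None:
    ctrl = cfg[PCI_MSIX_FLAGS]
    for value in (ctrl & ~PCI_MSIX_FLAGS_ENABLE,   # force-disable first
                  ctrl | PCI_MSIX_FLAGS_ENABLE):   # then re-enable
        cfg[PCI_MSIX_FLAGS] = value
        writes.append(value)
```

The intermediate disabled write is the whole point: it is the state change that lets the hypervisor notice and re-program the (reset) MSI-X table on the re-enable.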



RE: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for navi12 sriov

2021-04-01 Thread Deng, Emily

Hi Monk,
 Could you help to review this patch?

Best wishes
Emily Deng

>-Original Message-
>From: Deng, Emily 
>Sent: Wednesday, March 31, 2021 5:01 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from
>vram for navi12 sriov
>
>
>Ping .
>
>>-Original Message-
>>From: Emily Deng 
>>Sent: Tuesday, March 30, 2021 12:42 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram
>>for
>>navi12 sriov
>>
>>To fix the board disappear issue.
>>
>>Signed-off-by: Emily Deng 
>>---
>> drivers/gpu/drm/amd/amdgpu/nv.c | 4 
>> 1 file changed, 4 insertions(+)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>>b/drivers/gpu/drm/amd/amdgpu/nv.c index 46d4bbabce75..48dc171bc759
>>100644
>>--- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>@@ -693,6 +693,10 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
> adev->nbio.funcs = &nbio_v2_3_funcs;
> adev->nbio.hdp_flush_reg = &nbio_v2_3_hdp_flush_reg;
>> }
>>+
>>+if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_NAVI12)
>>+amdgpu_discovery = 0;
>>+
> adev->hdp.funcs = &hdp_v5_0_funcs;
>>
>> if (adev->asic_type >= CHIP_SIENNA_CICHLID)
>>--
>>2.25.1
>



RE: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc

2021-04-01 Thread Deng, Emily

Hi Monk,
 Could you help to review this patch?

Best wishes
Emily Deng

>-Original Message-
>From: Deng, Emily 
>Sent: Wednesday, March 31, 2021 5:01 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc
>
>
>Ping..
>
>>-Original Message-
>>From: Emily Deng 
>>Sent: Tuesday, March 30, 2021 12:42 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual
>>ctrc
>>
>>Set the num_types equal to the enabled num_crtc.
>>
>>Signed-off-by: Emily Deng 
>>---
>> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>index 5c11144da051..c03a83a2b7cd 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>@@ -768,7 +768,7 @@ static const struct amdgpu_irq_src_funcs
>>dce_virtual_crtc_irq_funcs = {
>>
>> static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev)  {
>>-adev->crtc_irq.num_types = AMDGPU_CRTC_IRQ_VBLANK6 + 1;
>>+adev->crtc_irq.num_types = adev->mode_info.num_crtc;
> adev->crtc_irq.funcs = &dce_virtual_crtc_irq_funcs;  }
>>
>>--
>>2.25.1
>



RE: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12

2021-04-01 Thread Deng, Emily

Hi Monk,
 Could you help to review this patch?

Best wishes
Emily Deng


>-Original Message-
>From: Deng, Emily 
>Sent: Wednesday, March 31, 2021 5:01 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Min, Frank 
>Subject: RE: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov
>navi12
>
>
>Ping..
>
>>-Original Message-
>>From: Emily Deng 
>>Sent: Tuesday, March 30, 2021 12:42 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily ; Min, Frank 
>>Subject: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov
>>navi12
>>
>>Since vcn decoding ring is not required, so just disable it.
>>
>>Signed-off-by: Frank.Min 
>>Signed-off-by: Emily Deng 
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 +++-
>> drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c   | 29 -
>> 2 files changed, 17 insertions(+), 16 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>index 8844f650b17f..5d5c41c9d5aa 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>@@ -427,7 +427,9 @@ static int amdgpu_hw_ip_info(struct amdgpu_device
>>*adev,  if (adev->uvd.harvest_config & (1 << i))  continue;
>>
>>-if (adev->vcn.inst[i].ring_dec.sched.ready)
>>+if (adev->vcn.inst[i].ring_dec.sched.ready || (adev->asic_type ==
>>+CHIP_NAVI12 &&
>>+amdgpu_sriov_vf(adev)))
>> ++num_rings;
>> }
>> ib_start_alignment = 16;
>>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>index 116b9643d5ba..e4b61f3a45fb 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>@@ -220,21 +220,20 @@ static int vcn_v2_0_hw_init(void *handle)  {
>>struct amdgpu_device *adev = (struct amdgpu_device *)handle;  struct
>>amdgpu_ring *ring = &adev->vcn.inst->ring_dec; -int i, r;
>>+int i, r = -1;
>>
>> adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
>>  ring->doorbell_index, 0);
>>
>>-if (amdgpu_sriov_vf(adev))
>>+if (amdgpu_sriov_vf(adev)) {
>> vcn_v2_0_start_sriov(adev);
>>-
>>-r = amdgpu_ring_test_helper(ring);
>>-if (r)
>>-goto done;
>>-
>>-//Disable vcn decode for sriov
>>-if (amdgpu_sriov_vf(adev))
>>-ring->sched.ready = false;
>>+if (adev->asic_type == CHIP_NAVI12)
>>+ring->sched.ready = false;
>>+} else {
>>+r = amdgpu_ring_test_helper(ring);
>>+if (r)
>>+goto done;
>>+}
>>
>> for (i = 0; i < adev->vcn.num_enc_rings; ++i) {  ring =
>>&adev->vcn.inst->ring_enc[i]; @@ -245,8 +244,11 @@ static int
>>vcn_v2_0_hw_init(void *handle)
>>
>> done:
>> if (!r)
>>-DRM_INFO("VCN decode and encode initialized successfully(under
>>%s).\n", -(adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)?"DPG
>Mode":"SPG
>>Mode");
>>+DRM_INFO("VCN %s encode initialized
>>successfully(under %s).\n",
>>+(adev->asic_type == CHIP_NAVI12 &&
>>+amdgpu_sriov_vf(adev))?"":"decode and", (adev->pg_flags &
>>+AMD_PG_SUPPORT_VCN_DPG)?"DPG
>>Mode":"SPG Mode");
>>
>> return r;
>> }
>>@@ -1719,9 +1721,6 @@ int vcn_v2_0_dec_ring_test_ring(struct
>>amdgpu_ring *ring)
>> unsigned i;
>> int r;
>>
>>-if (amdgpu_sriov_vf(adev))
>>-return 0;
>>-
>> WREG32(adev->vcn.inst[ring->me].external.scratch9, 0xCAFEDEAD);  r =
>>amdgpu_ring_alloc(ring, 4);  if (r)
>>--
>>2.25.1
>



RE: [PATCH 4/4] drm/amdgpu: indirect register access for nv12 sriov

2021-04-01 Thread Deng, Emily

Series Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Peng
>Ju Zhou
>Sent: Wednesday, March 31, 2021 1:20 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhao, Jiange 
>Subject: [PATCH 4/4] drm/amdgpu: indirect register access for nv12 sriov
>
>1. expand rlcg interface for gc & mmhub indirect access 2. add rlcg interface
>for no kiq
>
>Signed-off-by: Peng Ju Zhou 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h|   3 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 131 ++---
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |   2 +-
> drivers/gpu/drm/amd/amdgpu/soc15_common.h  |  75 ++--
> 5 files changed, 150 insertions(+), 63 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 060d0ae99453..438e2f732377 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -490,7 +490,7 @@ void amdgpu_mm_wreg_mmio_rlc(struct
>amdgpu_device *adev,
> adev->gfx.rlc.funcs &&
> adev->gfx.rlc.funcs->is_rlcg_access_range) {
> if (adev->gfx.rlc.funcs->is_rlcg_access_range(adev, reg))
>-return adev->gfx.rlc.funcs->rlcg_wreg(adev, reg, v);
>+return adev->gfx.rlc.funcs->rlcg_wreg(adev, reg, v, 0);
> } else {
> writel(v, ((void __iomem *)adev->rmmio) + (reg * 4));
> }
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
>index aeaaae713c59..4fc2ce8ce8ab 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
>@@ -127,7 +127,8 @@ struct amdgpu_rlc_funcs {
> void (*reset)(struct amdgpu_device *adev);
> void (*start)(struct amdgpu_device *adev);
> void (*update_spm_vmid)(struct amdgpu_device *adev, unsigned
>vmid);
>-void (*rlcg_wreg)(struct amdgpu_device *adev, u32 offset, u32 v);
>+void (*rlcg_wreg)(struct amdgpu_device *adev, u32 offset, u32 v, u32
>flag);
>+u32 (*rlcg_rreg)(struct amdgpu_device *adev, u32 offset, u32 flag);
> bool (*is_rlcg_access_range)(struct amdgpu_device *adev, uint32_t
>reg);  };
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>index b4fd0394cd08..85a6a10e048f 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>@@ -177,6 +177,11 @@
> #define mmGC_THROTTLE_CTRL_Sienna_Cichlid  0x2030
> #define mmGC_THROTTLE_CTRL_Sienna_Cichlid_BASE_IDX 0
>
>+#define GFX_RLCG_GC_WRITE_OLD  (0x8 << 28)
>+#define GFX_RLCG_GC_WRITE      (0x0 << 28)
>+#define GFX_RLCG_GC_READ       (0x1 << 28)
>+#define GFX_RLCG_MMHUB_WRITE   (0x2 << 28)
>+
> MODULE_FIRMWARE("amdgpu/navi10_ce.bin");
> MODULE_FIRMWARE("amdgpu/navi10_pfp.bin");
> MODULE_FIRMWARE("amdgpu/navi10_me.bin");
>@@ -1422,38 +1427,127 @@ static const struct soc15_reg_golden
>golden_settings_gc_10_1_2[] =
> SOC15_REG_GOLDEN_VALUE(GC, 0, mmUTCL1_CTRL, 0x,
>0x0080)  };
>
>-static void gfx_v10_rlcg_wreg(struct amdgpu_device *adev, u32 offset, u32 v)
>+static bool gfx_v10_is_rlcg_rw(struct amdgpu_device *adev, u32 offset,
>+uint32_t *flag, bool write) {
>+/* always programmed by rlcg, only for gc */
>+if (offset == SOC15_REG_OFFSET(GC, 0, mmRLC_CSIB_ADDR_HI) ||
>+offset == SOC15_REG_OFFSET(GC, 0, mmRLC_CSIB_ADDR_LO) ||
>+offset == SOC15_REG_OFFSET(GC, 0, mmRLC_CSIB_LENGTH) ||
>+offset == SOC15_REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL) ||
>+offset == SOC15_REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX) ||
>+offset == SOC15_REG_OFFSET(GC, 0, mmCP_ME_CNTL)) {
>+if (!amdgpu_sriov_reg_indirect_gc(adev))
>+*flag = GFX_RLCG_GC_WRITE_OLD;
>+else
>+*flag = write ? GFX_RLCG_GC_WRITE :
>GFX_RLCG_GC_READ;
>+
>+return true;
>+}
>+
>+/* currently support gc read/write, mmhub write */
>+if (offset >= SOC15_REG_OFFSET(GC, 0, mmSDMA0_DEC_START) &&
>+offset <= SOC15_REG_OFFSET(GC, 0, mmRLC_GTS_OFFSET_MSB)) {
>+if (amdgpu_sriov_reg_indirect_gc(adev))
>+*flag = write ? GFX_RLCG_GC_WRITE :
>GFX_RLCG_GC_READ;
>+else
>+return false;
>+} else {
>+if (amdgpu_sriov_reg_indirect_mmhub(adev))
>+*flag = GFX_RLCG_MMHUB_WRITE;
>+else
>+return false;
>+}
>+
>+return true;
>+}
>+
>+static u32 gfx_v10_rlcg_rw(struct amdgpu_device *adev, u32 offset, u32
>+v, uint32_t flag)
> {
> static void *scratch_reg0;
> static void *scratch_reg1;
>+static void *scratch_reg2;
>+static void *scratch_reg3;
> static void *spare_int;
>+static uint32_t grbm_cntl;
>+static uint32_t grbm_idx;
> uint32_t i = 0;
> uint32_t retries = 5;
>+u32 ret = 0;
>+
>+scratch_reg0 = adev->rmmio +
>+   (adev-
>>reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] +
>mmSCRATCH_REG0) * 4;
>+scratch_reg1 = adev->rmmio +
>+   (adev-
>>reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] +
>mmSCRATCH_REG1) * 4;
>+scratch_reg2 = adev->rmmio +
>+   (adev-
>>reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] +
>mmSCRATCH_REG2) * 4;
>+scratch_reg3 = adev->rmmio +
>+  

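The flag-selection logic in `gfx_v10_is_rlcg_rw()` above decides, per register offset, whether the access goes through the RLCG indirect path (encoding the access type in the top nibble of a flag) or falls back to direct MMIO. A hypothetical sketch of that decision tree; the offsets and ranges are illustrative placeholders, not the real GC/MMHUB register map:

```python
# Hypothetical sketch of the RLCG flag selection. A return of None means
# "use a direct MMIO access"; otherwise the flag encodes the access type.
GC_WRITE_OLD = 0x8 << 28
GC_WRITE     = 0x0 << 28
GC_READ      = 0x1 << 28
MMHUB_WRITE  = 0x2 << 28

ALWAYS_RLCG = {0x100, 0x104}       # stand-ins for RLC_CSIB_*/GRBM_GFX_* regs
GC_RANGE = range(0x1000, 0x2000)   # stand-in for the GC register range

def rlcg_flag(offset, write, indirect_gc=True, indirect_mmhub=True):
    """Return the RLCG flag for this access, or None for direct MMIO."""
    if offset in ALWAYS_RLCG:              # always programmed by RLCG
        if not indirect_gc:
            return GC_WRITE_OLD            # legacy single-flavor write path
        return GC_WRITE if write else GC_READ
    if offset in GC_RANGE:                 # GC supports read and write
        return (GC_WRITE if write else GC_READ) if indirect_gc else None
    return MMHUB_WRITE if indirect_mmhub else None  # MMHUB: write only
```

This mirrors the patch's structure: a small set of registers is always routed through RLCG, the GC range gains indirect read/write, and MMHUB gains indirect write, each gated by its own `amdgpu_sriov_reg_indirect_*` switch.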
RE: [PATCH 2/2] drm/amdgpu: Revert "SWDEV-238407 Add clear vf fw support"

2021-03-31 Thread Deng, Emily

Ping ..

>-Original Message-
>From: Emily Deng 
>Sent: Wednesday, March 31, 2021 2:34 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 2/2] drm/amdgpu: Revert "SWDEV-238407 Add clear vf fw
>support"
>
>As already moved the support to host driver, so revert this in guest driver.
>This reverts commit 8d5e6f45df5f9073760dea0ab94321615cea16ec.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 36 ++---
>drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h |  8 --
> 2 files changed, 2 insertions(+), 42 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>index c36c8fca1f64..aa2f8fc4aac8 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>@@ -291,9 +291,8 @@ psp_cmd_submit_buf(struct psp_context *psp,
> amdgpu_asic_invalidate_hdp(psp->adev, NULL);
> }
>
>-/* We allow TEE_ERROR_NOT_SUPPORTED for VMR command and
>PSP_ERR_UNKNOWN_COMMAND in SRIOV */
>-skip_unsupport = (psp->cmd_buf_mem->resp.status ==
>TEE_ERROR_NOT_SUPPORTED ||
>-psp->cmd_buf_mem->resp.status ==
>PSP_ERR_UNKNOWN_COMMAND) && amdgpu_sriov_vf(psp->adev);
>+/* We allow TEE_ERROR_NOT_SUPPORTED for VMR command in
>SRIOV */
>+skip_unsupport = (psp->cmd_buf_mem->resp.status == 0x000a)
>&&
>+amdgpu_sriov_vf(psp->adev);
>
> memcpy((void*)&cmd->resp, (void*)&psp->cmd_buf_mem->resp,
>sizeof(struct psp_gfx_resp));
>
>@@ -420,26 +419,6 @@ static int psp_tmr_init(struct psp_context *psp)
> return ret;
> }
>
>-static int psp_clear_vf_fw(struct psp_context *psp) -{
>-int ret;
>-struct psp_gfx_cmd_resp *cmd;
>-
>-if (!amdgpu_sriov_vf(psp->adev) || psp->adev->asic_type !=
>CHIP_NAVI12)
>-return 0;
>-
>-cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
>-if (!cmd)
>-return -ENOMEM;
>-
>-cmd->cmd_id = GFX_CMD_ID_CLEAR_VF_FW;
>-
>-ret = psp_cmd_submit_buf(psp, NULL, cmd, psp-
>>fence_buf_mc_addr);
>-kfree(cmd);
>-
>-return ret;
>-}
>-
> static bool psp_skip_tmr(struct psp_context *psp)  {
> switch (psp->adev->asic_type) {
>@@ -1924,12 +1903,6 @@ static int psp_hw_start(struct psp_context *psp)
> return ret;
> }
>
>-ret = psp_clear_vf_fw(psp);
>-if (ret) {
>-DRM_ERROR("PSP clear vf fw!\n");
>-return ret;
>-}
>-
> ret = psp_boot_config_set(adev);
> if (ret) {
> DRM_WARN("PSP set boot config@\n");
>@@ -2448,11 +2421,6 @@ static int psp_hw_fini(void *handle)
> }
>
> psp_asd_unload(psp);
>-ret = psp_clear_vf_fw(psp);
>-if (ret) {
>-DRM_ERROR("PSP clear vf fw!\n");
>-return ret;
>-}
>
> psp_tmr_terminate(psp);
> psp_ring_destroy(psp, PSP_RING_TYPE__KM); diff --git
>a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>index dd4d65f7e0f0..b5b1feaa259e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>+++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>@@ -97,7 +97,6 @@ enum psp_gfx_cmd_id
> GFX_CMD_ID_SETUP_VMR  = 0x0009,   /* setup VMR region */
> GFX_CMD_ID_DESTROY_VMR= 0x000A,   /* destroy VMR region
>*/
> GFX_CMD_ID_PROG_REG   = 0x000B,   /* program regs */
>-GFX_CMD_ID_CLEAR_VF_FW= 0x000D,   /* Clear VF FW, to be
>used on VF shutdown. */
> GFX_CMD_ID_GET_FW_ATTESTATION = 0x000F,   /* Query GPUVA of
>the Fw Attestation DB */
> /* IDs upto 0x1F are reserved for older programs (Raven, Vega 10/12/20)
>*/
> GFX_CMD_ID_LOAD_TOC   = 0x0020,   /* Load TOC and obtain
>TMR size */
>@@ -401,11 +400,4 @@ struct psp_gfx_rb_frame
> /* total 64 bytes */
> };
>
>-#define PSP_ERR_UNKNOWN_COMMAND 0x0100
>-
>-enum tee_error_code {
>-TEE_SUCCESS = 0x,
>-TEE_ERROR_NOT_SUPPORTED = 0x000A,
>-};
>-
> #endif /* _PSP_TEE_GFX_IF_H_ */
>--
>2.25.1



RE: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov

2021-03-31 Thread Deng, Emily

Ping ..

>-Original Message-
>From: Emily Deng 
>Sent: Tuesday, March 30, 2021 5:43 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov
>
>From: "Emily.Deng" 
>
>For vf assigned to guest VM, after FLR, the msix table will be reset.
>As the flr is done on host driver. The qemu and vfio driver don't know this,
>and the msix is still enable from qemu and vfio driver side.
>So if want to  re-setup the msix table, first need to disable and re-enable the
>msix from guest VM side or the qemu will do nothing as it thought the msix is
>already enabled.
>
>v2:
>Change name with amdgpu_irq prefix, remove #ifdef.
>
>Signed-off-by: Emily.Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 14 ++
> 1 file changed, 14 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>index 03412543427a..3045f52e613d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>@@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct amdgpu_device
>*adev)
> return true;
> }
>
>+static void amdgpu_irq_restore_msix(struct amdgpu_device *adev) {
>+u16 ctrl;
>+
>+pci_read_config_word(adev->pdev, adev->pdev->msix_cap +
>PCI_MSIX_FLAGS, &ctrl);
>+ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
>+pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>PCI_MSIX_FLAGS, ctrl);
>+ctrl |= PCI_MSIX_FLAGS_ENABLE;
>+pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>+PCI_MSIX_FLAGS, ctrl); }
>+
> /**
>  * amdgpu_irq_init - initialize interrupt handling
>  *
>@@ -558,6 +569,9 @@ void amdgpu_irq_gpu_reset_resume_helper(struct
>amdgpu_device *adev)  {
> int i, j, k;
>
>+if (amdgpu_sriov_vf(adev))
>+amdgpu_irq_restore_msix(adev);
>+
> for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
> if (!adev->irq.client[i].sources)
> continue;
>--
>2.25.1



RE: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for navi12 sriov

2021-03-31 Thread Deng, Emily

Ping .

>-Original Message-
>From: Emily Deng 
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for
>navi12 sriov
>
>To fix the board disappear issue.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/nv.c | 4 
> 1 file changed, 4 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>b/drivers/gpu/drm/amd/amdgpu/nv.c index 46d4bbabce75..48dc171bc759
>100644
>--- a/drivers/gpu/drm/amd/amdgpu/nv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>@@ -693,6 +693,10 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
> adev->nbio.funcs = &nbio_v2_3_funcs;
> adev->nbio.hdp_flush_reg = &nbio_v2_3_hdp_flush_reg;
> }
>+
>+if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_NAVI12)
>+amdgpu_discovery = 0;
>+
> adev->hdp.funcs = &hdp_v5_0_funcs;
>
> if (adev->asic_type >= CHIP_SIENNA_CICHLID)
>--
>2.25.1



RE: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc

2021-03-31 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping..

>-Original Message-
>From: Emily Deng 
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual crtc
>
>Set the num_types equal to the enabled num_crtc.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index 5c11144da051..c03a83a2b7cd 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -768,7 +768,7 @@ static const struct amdgpu_irq_src_funcs
>dce_virtual_crtc_irq_funcs = {
>
> static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev)
> {
>-adev->crtc_irq.num_types = AMDGPU_CRTC_IRQ_VBLANK6 + 1;
>+adev->crtc_irq.num_types = adev->mode_info.num_crtc;
> adev->crtc_irq.funcs = &dce_virtual_crtc_irq_funcs;
> }
>
>--
>2.25.1



RE: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12

2021-03-31 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping..

>-Original Message-
>From: Emily Deng 
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily ; Min, Frank 
>Subject: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12
>
>Since the VCN decode ring is not required, just disable it.
>
>Signed-off-by: Frank.Min 
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 +++-
> drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c   | 29 -
> 2 files changed, 17 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>index 8844f650b17f..5d5c41c9d5aa 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>@@ -427,7 +427,9 @@ static int amdgpu_hw_ip_info(struct amdgpu_device
>*adev,
> if (adev->uvd.harvest_config & (1 << i))
> continue;
>
>-if (adev->vcn.inst[i].ring_dec.sched.ready)
>+if (adev->vcn.inst[i].ring_dec.sched.ready ||
>+(adev->asic_type == CHIP_NAVI12 &&
>+amdgpu_sriov_vf(adev)))
> ++num_rings;
> }
> ib_start_alignment = 16;
>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>index 116b9643d5ba..e4b61f3a45fb 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>@@ -220,21 +220,20 @@ static int vcn_v2_0_hw_init(void *handle)
> {
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> struct amdgpu_ring *ring = &adev->vcn.inst->ring_dec;
>-int i, r;
>+int i, r = -1;
>
> adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
>  ring->doorbell_index, 0);
>
>-if (amdgpu_sriov_vf(adev))
>+if (amdgpu_sriov_vf(adev)) {
> vcn_v2_0_start_sriov(adev);
>-
>-r = amdgpu_ring_test_helper(ring);
>-if (r)
>-goto done;
>-
>-//Disable vcn decode for sriov
>-if (amdgpu_sriov_vf(adev))
>-ring->sched.ready = false;
>+if (adev->asic_type == CHIP_NAVI12)
>+ring->sched.ready = false;
>+} else {
>+r = amdgpu_ring_test_helper(ring);
>+if (r)
>+goto done;
>+}
>
> for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
> ring = &adev->vcn.inst->ring_enc[i];
>@@ -245,8 +244,11 @@ static int vcn_v2_0_hw_init(void *handle)
>
> done:
> if (!r)
>-DRM_INFO("VCN decode and encode initialized
>successfully(under %s).\n",
>-(adev->pg_flags &
>AMD_PG_SUPPORT_VCN_DPG)?"DPG Mode":"SPG Mode");
>+DRM_INFO("VCN %s encode initialized
>successfully(under %s).\n",
>+(adev->asic_type == CHIP_NAVI12 &&
>+amdgpu_sriov_vf(adev))?"":"decode and",
>+(adev->pg_flags &
>+AMD_PG_SUPPORT_VCN_DPG)?"DPG
>Mode":"SPG Mode");
>
> return r;
> }
>@@ -1719,9 +1721,6 @@ int vcn_v2_0_dec_ring_test_ring(struct
>amdgpu_ring *ring)
> unsigned i;
> int r;
>
>-if (amdgpu_sriov_vf(adev))
>-return 0;
>-
> WREG32(adev->vcn.inst[ring->me].external.scratch9, 0xCAFEDEAD);
> r = amdgpu_ring_alloc(ring, 4);
> if (r)
>--
>2.25.1



RE: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov

2021-03-30 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping ..

>-Original Message-
>From: Emily Deng 
>Sent: Tuesday, March 30, 2021 5:43 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov
>
>From: "Emily.Deng" 
>
>For a VF assigned to a guest VM, the MSI-X table will be reset after FLR.
>Since the FLR is done by the host driver, QEMU and the vfio driver don't
>know this, and MSI-X is still enabled from their side.
>So to re-set up the MSI-X table, we first need to disable and re-enable
>MSI-X from the guest VM side, or QEMU will do nothing because it thinks
>MSI-X is already enabled.
>
>v2:
>Change name with amdgpu_irq prefix, remove #ifdef.
>
>Signed-off-by: Emily.Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 14 ++
> 1 file changed, 14 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>index 03412543427a..3045f52e613d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>@@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct amdgpu_device
>*adev)
> return true;
> }
>
>+static void amdgpu_irq_restore_msix(struct amdgpu_device *adev)
>+{
>+u16 ctrl;
>+
>+pci_read_config_word(adev->pdev, adev->pdev->msix_cap +
>PCI_MSIX_FLAGS, &ctrl);
>+ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
>+pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>PCI_MSIX_FLAGS, ctrl);
>+ctrl |= PCI_MSIX_FLAGS_ENABLE;
>+pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>+PCI_MSIX_FLAGS, ctrl);
>+}
>+
> /**
>  * amdgpu_irq_init - initialize interrupt handling
>  *
>@@ -558,6 +569,9 @@ void amdgpu_irq_gpu_reset_resume_helper(struct
>amdgpu_device *adev)
> {
> int i, j, k;
>
>+if (amdgpu_sriov_vf(adev))
>+amdgpu_irq_restore_msix(adev);
>+
> for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
> if (!adev->irq.client[i].sources)
> continue;
>--
>2.25.1



RE: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov

2021-03-30 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Das, Nirmoy 
>Sent: Tuesday, March 30, 2021 5:34 PM
>To: Deng, Emily ; Das, Nirmoy
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov
>
>
>On 3/30/21 11:29 AM, Deng, Emily wrote:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -Original Message-
>>> From: Das, Nirmoy 
>>> Sent: Tuesday, March 30, 2021 4:59 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov
>>>
>>>
>>> On 3/30/21 10:14 AM, Emily Deng wrote:
>>>> From: "Emily.Deng" 
>>>>
>>>> After FLR, the msix will be cleared, so need to toggle it for sriov.
>>>>
>>>> v2:
>>>> Change name with amdgpu_irq prefix, remove #ifdef.
>>>>
>>>> Signed-off-by: Emily.Deng 
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 14 ++
>>>>1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> index 03412543427a..3045f52e613d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> @@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct
>amdgpu_device
>>> *adev)
>>>>return true;
>>>>}
>>>>
>>>> +static void amdgpu_irq_restore_msix(struct amdgpu_device *adev) {
>>>> +u16 ctrl;
>>>> +
>>>> +pci_read_config_word(adev->pdev, adev->pdev->msix_cap +
>>> PCI_MSIX_FLAGS, &ctrl);
>>>> +ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
>>>> +pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>>> PCI_MSIX_FLAGS, ctrl);
>>>> +ctrl |= PCI_MSIX_FLAGS_ENABLE;
>>>> +pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>>>> +PCI_MSIX_FLAGS, ctrl);
>>>
>>> Why write 1st clear and then set the msix flag if we know that msix
>>> is already cleared
>> For a VF assigned to a guest VM, the MSI-X table will be reset after FLR.
>> Since the FLR is done by the host driver, QEMU and the vfio driver don't
>> know this, and MSI-X is still enabled from their side. So to re-set up the
>> MSI-X table, we first need to disable and re-enable MSI-X from the guest VM
>> side, or QEMU will do nothing because it thinks MSI-X is already enabled.
>
>
>Thanks for the detailed explanation, Emily. Please add a comment so that we
>know/remember why we are doing this.
Ok, will do this. Thanks.
>
>
>Nirmoy
>
>
>>>
>>>
>>>> +}
>>>> +
>>>>/**
>>>> * amdgpu_irq_init - initialize interrupt handling
>>>> *
>>>> @@ -558,6 +569,9 @@ void
>amdgpu_irq_gpu_reset_resume_helper(struct
>>> amdgpu_device *adev)
>>>>{
>>>>int i, j, k;
>>>>
>>>> +if (amdgpu_sriov_vf(adev))
>>>> +amdgpu_irq_restore_msix(adev);
>>>
>>> Is it possible to load amdgpu on guest without msix ? If so then we need
>>> to probe if msix is enabled.
It is decided by the host driver, not the guest driver.
>>>
>>>
>>> Nirmoy
>>>
>>>
>>>> +
>>>>for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>>>>if (!adev->irq.client[i].sources)
>>>>continue;


RE: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov

2021-03-30 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Das, Nirmoy 
>Sent: Tuesday, March 30, 2021 4:59 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Toggle msix after FLR for sriov
>
>
>On 3/30/21 10:14 AM, Emily Deng wrote:
>> From: "Emily.Deng" 
>>
>> After FLR, the msix will be cleared, so need to toggle it for sriov.
>>
>> v2:
>> Change name with amdgpu_irq prefix, remove #ifdef.
>>
>> Signed-off-by: Emily.Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 14 ++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index 03412543427a..3045f52e613d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct amdgpu_device
>*adev)
>>   return true;
>>   }
>>
>> +static void amdgpu_irq_restore_msix(struct amdgpu_device *adev) {
>> +u16 ctrl;
>> +
>> +pci_read_config_word(adev->pdev, adev->pdev->msix_cap +
>PCI_MSIX_FLAGS, &ctrl);
>> +ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
>> +pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>PCI_MSIX_FLAGS, ctrl);
>> +ctrl |= PCI_MSIX_FLAGS_ENABLE;
>> +pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>> +PCI_MSIX_FLAGS, ctrl);
>
>
>Why write 1st clear and then set the msix flag if we know that msix is already
>cleared
For a VF assigned to a guest VM, the MSI-X table will be reset after FLR. As
the FLR is done by the host driver, QEMU and the vfio driver don't know
this, and MSI-X is still enabled from their side. So to re-set up the MSI-X
table, we first need to disable and re-enable MSI-X from the guest VM side,
or QEMU will do nothing because it thinks MSI-X is already enabled.
>
>
>
>> +}
>> +
>>   /**
>>* amdgpu_irq_init - initialize interrupt handling
>>*
>> @@ -558,6 +569,9 @@ void amdgpu_irq_gpu_reset_resume_helper(struct
>amdgpu_device *adev)
>>   {
>>   int i, j, k;
>>
>> +if (amdgpu_sriov_vf(adev))
>> +amdgpu_irq_restore_msix(adev);
>
>
>Is it possible to load amdgpu on guest without msix ? If so then we need
>to probe if msix is enabled.
>
>
>Nirmoy
>
>
>> +
>>   for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>>   if (!adev->irq.client[i].sources)
>>   continue;


RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue

2021-03-30 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
 OK, will investigate the memory leak further. But even if I fix this leak
now, that doesn't guarantee there will be no leaks in the future. A memory
leak shouldn't crash the kernel; the leaked memory just can't be used
anymore.

Best wishes
Emily Deng



>-Original Message-
>From: Christian König 
>Sent: Tuesday, March 30, 2021 4:38 PM
>To: Deng, Emily ; Chen, Jiansong (Simon)
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>Hi Emily,
>
>as I said add a WARN_ON() and look at the backtrace.
>
>It could be that the backtrace then just shows the general cleanup functions,
>but it is at least a start.
>
>On the other hand if you only see this sometimes then we have some kind of
>race condition and need to dig deeper.
>
>Christian.
>
>Am 30.03.21 um 10:19 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi Christian,
>>   Yes, I agree with you both. But the issue occurs randomly during
>> driver unload, and at a fairly low rate. It is hard to debug where the
>> memory leak is. Could you give some suggestions on how to debug this issue?
>>
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Tuesday, March 30, 2021 3:11 PM
>>> To: Deng, Emily ; Chen, Jiansong (Simon)
>>> ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>
>>> Good morning,
>>>
>>> yes Jiansong is right that patch is really not a good idea.
>>>
>>> Moving buffers can indeed happen during shutdown while some memory
>is
>>> still referenced.
>>>
>>> Just ignoring the move is not the right approach, you need to find
>>> out why the memory is moved in the first place.
>>>
>>> You could add something like WARN_ON(adev->shutdown);
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 30.03.21 um 09:05 schrieb Deng, Emily:
>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>> Hi Jiansong,
>>>>It does happen; maybe there is a race condition?
>>>>
>>>>
>>>> Best wishes
>>>> Emily Deng
>>>>
>>>>
>>>>
>>>>> -Original Message-
>>>>> From: Chen, Jiansong (Simon) 
>>>>> Sent: Tuesday, March 30, 2021 2:49 PM
>>>>> To: Deng, Emily ; amd-
>g...@lists.freedesktop.org
>>>>> Cc: Deng, Emily 
>>>>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>>
>>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>>
>>>>> I still wonder how the issue takes place? According to my humble
>>>>> knowledge in driver model, the reference count of the kobject for
>>>>> the device will not reach zero when there is still some device mem
>>>>> access, and shutdown should not happen.
>>>>>
>>>>> Regards,
>>>>> Jiansong
>>>>> -Original Message-
>>>>> From: amd-gfx  On Behalf Of
>>>>> Emily Deng
>>>>> Sent: Tuesday, March 30, 2021 12:42 PM
>>>>> To: amd-gfx@lists.freedesktop.org
>>>>> Cc: Deng, Emily 
>>>>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>>
>>>>> During driver unloading, don't need to copy mem, or it will
>>>>> introduce some call trace, such as when sa_manager is freed, it
>>>>> will introduce warn call trace in amdgpu_sa_bo_new.
>>>>>
>>>>> Signed-off-by: Emily Deng 
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index e00263bcc88b..f0546a489e0d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct
>>>>> amdgpu_device *adev,  struct dma_fence *fence = NULL;  int r = 0;
>>>>>
>>>>> +if (adev->shutdown)
>>>>> +return 0;
>>>>> +

RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue

2021-03-30 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
 Yes, I agree with you both. But the issue occurs randomly during driver
unload, and at a fairly low rate. It is hard to debug where the memory leak
is. Could you give some suggestions on how to debug this issue?


Best wishes
Emily Deng



>-Original Message-
>From: Christian König 
>Sent: Tuesday, March 30, 2021 3:11 PM
>To: Deng, Emily ; Chen, Jiansong (Simon)
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>Good morning,
>
>yes Jiansong is right that patch is really not a good idea.
>
>Moving buffers can indeed happen during shutdown while some memory is
>still referenced.
>
>Just ignoring the move is not the right approach, you need to find out why the
>memory is moved in the first place.
>
>You could add something like WARN_ON(adev->shutdown);
>
>Regards,
>Christian.
>
>Am 30.03.21 um 09:05 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi Jiansong,
>>   It does happen; maybe there is a race condition?
>>
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -Original Message-
>>> From: Chen, Jiansong (Simon) 
>>> Sent: Tuesday, March 30, 2021 2:49 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Cc: Deng, Emily 
>>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>> I still wonder how the issue takes place? According to my humble
>>> knowledge in driver model, the reference count of the kobject for the
>>> device will not reach zero when there is still some device mem
>>> access, and shutdown should not happen.
>>>
>>> Regards,
>>> Jiansong
>>> -Original Message-
>>> From: amd-gfx  On Behalf Of
>>> Emily Deng
>>> Sent: Tuesday, March 30, 2021 12:42 PM
>>> To: amd-gfx@lists.freedesktop.org
>>> Cc: Deng, Emily 
>>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>
>>> During driver unloading, don't need to copy mem, or it will introduce
>>> some call trace, such as when sa_manager is freed, it will introduce
>>> warn call trace in amdgpu_sa_bo_new.
>>>
>>> Signed-off-by: Emily Deng 
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index e00263bcc88b..f0546a489e0d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct
>>> amdgpu_device *adev,
>>> struct dma_fence *fence = NULL;
>>> int r = 0;
>>>
>>> +if (adev->shutdown)
>>> +return 0;
>>> +
>>> if (!adev->mman.buffer_funcs_enabled) {
>>> DRM_ERROR("Trying to move memory with ring turned off.\n");
>>> return -EINVAL;
>>> --
>>> 2.25.1
>>>



RE: [PATCH 3/6] drm/amdgpu: Restore msix after FLR

2021-03-30 Thread Deng, Emily
Hi Guchun,
OK, will make it a static function.

>-Original Message-
>From: Chen, Guchun 
>Sent: Tuesday, March 30, 2021 1:38 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH 3/6] drm/amdgpu: Restore msix after FLR
>
>[AMD Public Use]
>
>amdgpu_irq_restore_msix should be one static function?
>
>Regards,
>Guchun
>
>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 3/6] drm/amdgpu: Restore msix after FLR
>
>From: "Emily.Deng" 
>
>After FLR, the msix will be cleared, so need to re-enable it.
>
>v2:
>Change name with amdgpu_irq prefix, remove #ifdef.
>
>Signed-off-by: Emily.Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 12 
> 1 file changed, 12 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>index 03412543427a..8936589bd7f9 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>@@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct amdgpu_device
>*adev)
>   return true;
> }
>
>+void amdgpu_irq_restore_msix(struct amdgpu_device *adev)
>+{
>+  u16 ctrl;
>+
>+  pci_read_config_word(adev->pdev, adev->pdev->msix_cap +
>PCI_MSIX_FLAGS, &ctrl);
>+  ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
>+  pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>PCI_MSIX_FLAGS, ctrl);
>+  ctrl |= PCI_MSIX_FLAGS_ENABLE;
>+  pci_write_config_word(adev->pdev, adev->pdev->msix_cap +
>+PCI_MSIX_FLAGS, ctrl);
>+}
>+
> /**
>  * amdgpu_irq_init - initialize interrupt handling
>  *
>@@ -558,6 +569,7 @@ void amdgpu_irq_gpu_reset_resume_helper(struct
>amdgpu_device *adev)
> {
>   int i, j, k;
>
>+  amdgpu_irq_restore_msix(adev);
>   for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>   if (!adev->irq.client[i].sources)
>   continue;
>--
>2.25.1
>


RE: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12

2021-03-30 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Christian König 
>Sent: Tuesday, March 30, 2021 3:24 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org;
>Deucher, Alexander 
>Subject: Re: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
>
>Am 30.03.21 um 09:20 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Tuesday, March 30, 2021 3:13 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for
>>> navi12
>>>
>>>
>>>
>>> Am 30.03.21 um 06:41 schrieb Emily Deng:
>>>> It will randomly hit an SDMA hang, pending on UTCL2 address
>>>> translation when accessing the RPTR polling address.
>>>>
>>>> According to the SDMA firmware team, the RPTR writeback is done
>>>> by hardware automatically, and hits an issue when clock gating occurs.
>>>> So stop using the RPTR writeback for SDMA 5.0.
>>>>
>>>> Signed-off-by: Emily Deng 
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 18 --
>>>>1 file changed, 12 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>>> index 920fc6d4a127..63e4a78181b8 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>>> @@ -298,13 +298,19 @@ static void
>>> sdma_v5_0_ring_patch_cond_exec(struct amdgpu_ring *ring,
>>>> */
>>>>static uint64_t sdma_v5_0_ring_get_rptr(struct amdgpu_ring *ring)
>>>>{
>>>> -u64 *rptr;
>>>> +struct amdgpu_device *adev = ring->adev;
>>>> +u64 rptr;
>>>> +u32 lowbit, highbit;
>>>> +
>>>> +lowbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me,
>>> mmSDMA0_GFX_RB_RPTR));
>>>> +highbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me,
>>>> +mmSDMA0_GFX_RB_RPTR_HI));
>>> That won't work like this.
>>>
>>> We have the readpointer writeback because we otherwise can't
>>> guarantee that the two 32bit values read from the registers are coherent.
>>>
>>> In other words it can be that the hi rptr is already wrapped around
>>> while the lo is still the old value.
>>>
>>> Why exactly doesn't the writeback work?
>>>
>>> Christian.
>> The issue occurs when clock gating happens at the same time as the RPTR
>> writeback; at that point the UTCL2 translation hangs.
>
>Mhm, crap. Alex are you up to date on this bug?
>
>I'm not an expert on the SDMA, but my last status is that writeback is
>mandatory when we use 64bit rptr/wptr.
>
>Otherwise we need a workaround how to read a consistent 64bit rptr from
>two 32bit registers.
>
>Can you check the register documentation if there is any double buffering or
>stuff like that?
>
>Christian.
Hi Christian,
 Thanks for pointing out the consistency issue with the 64-bit register.
Please ignore this patch; we will try to fix the issue in the SDMA firmware.

Best wishes
Emily Deng


>
>>>> -/* XXX check if swapping is necessary on BE */
>>>> -rptr = ((u64 *)&ring->adev->wb.wb[ring->rptr_offs]);
>>>> +rptr = highbit;
>>>> +rptr = rptr << 32;
>>>> +rptr |= lowbit;
>>>>
>>>> -DRM_DEBUG("rptr before shift == 0x%016llx\n", *rptr);
>>>> -return ((*rptr) >> 2);
>>>> +DRM_DEBUG("rptr before shift == 0x%016llx\n", rptr);
>>>> +return (rptr >> 2);
>>>>}
>>>>
>>>>/**
>>>> @@ -702,7 +708,7 @@ static int sdma_v5_0_gfx_resume(struct
>>> amdgpu_device *adev)
>>>>WREG32(sdma_v5_0_get_reg_offset(adev, i,
>>> mmSDMA0_GFX_RB_RPTR_ADDR_LO),
>>>>   lower_32_bits(adev->wb.gpu_addr + wb_offset) &
>>> 0xFFFC);
>>>> -rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL,
>>> RPTR_WRITEBACK_ENABLE, 1);
>>>> +rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL,
>>>> +RPTR_WRITEBACK_ENABLE, 0);
>>>>
>>>>WREG32(sdma_v5_0_get_reg_offset(adev, i,
>>> mmSDMA0_GFX_RB_BASE), ring->gpu_addr >> 8);
>>>>WREG32(sdma_v5_0_get_reg_offset(adev, i,
>>> mmSDMA0_GFX_RB_BASE_HI),
>>>> ring->gpu_addr >> 40);



RE: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12

2021-03-30 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Christian König 
>Sent: Tuesday, March 30, 2021 3:13 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
>
>
>
>Am 30.03.21 um 06:41 schrieb Emily Deng:
>> It will randomly hit an SDMA hang, pending on UTCL2 address
>> translation when accessing the RPTR polling address.
>>
>> According to the SDMA firmware team, the RPTR writeback is done by
>> hardware automatically, and hits an issue when clock gating occurs.
>> So stop using the RPTR writeback for SDMA 5.0.
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 18 --
>>   1 file changed, 12 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> index 920fc6d4a127..63e4a78181b8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> @@ -298,13 +298,19 @@ static void
>sdma_v5_0_ring_patch_cond_exec(struct amdgpu_ring *ring,
>>*/
>>   static uint64_t sdma_v5_0_ring_get_rptr(struct amdgpu_ring *ring)
>>   {
>> -u64 *rptr;
>> +struct amdgpu_device *adev = ring->adev;
>> +u64 rptr;
>> +u32 lowbit, highbit;
>> +
>> +lowbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me,
>mmSDMA0_GFX_RB_RPTR));
>> +highbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me,
>> +mmSDMA0_GFX_RB_RPTR_HI));
>
>That won't work like this.
>
>We have the readpointer writeback because we otherwise can't guarantee
>that the two 32bit values read from the registers are coherent.
>
>In other words it can be that the hi rptr is already wrapped around while the
>lo is still the old value.
>
>Why exactly doesn't the writeback work?
>
>Christian.
The issue occurs when clock gating happens at the same time as the RPTR
writeback; at that point the UTCL2 translation hangs.
>
>>
>> -/* XXX check if swapping is necessary on BE */
>> -rptr = ((u64 *)&ring->adev->wb.wb[ring->rptr_offs]);
>> +rptr = highbit;
>> +rptr = rptr << 32;
>> +rptr |= lowbit;
>>
>> -DRM_DEBUG("rptr before shift == 0x%016llx\n", *rptr);
>> -return ((*rptr) >> 2);
>> +DRM_DEBUG("rptr before shift == 0x%016llx\n", rptr);
>> +return (rptr >> 2);
>>   }
>>
>>   /**
>> @@ -702,7 +708,7 @@ static int sdma_v5_0_gfx_resume(struct
>amdgpu_device *adev)
>>   WREG32(sdma_v5_0_get_reg_offset(adev, i,
>mmSDMA0_GFX_RB_RPTR_ADDR_LO),
>>  lower_32_bits(adev->wb.gpu_addr + wb_offset) &
>0xFFFC);
>>
>> -rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL,
>RPTR_WRITEBACK_ENABLE, 1);
>> +rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL,
>> +RPTR_WRITEBACK_ENABLE, 0);
>>
>>   WREG32(sdma_v5_0_get_reg_offset(adev, i,
>mmSDMA0_GFX_RB_BASE), ring->gpu_addr >> 8);
>>   WREG32(sdma_v5_0_get_reg_offset(adev, i,
>mmSDMA0_GFX_RB_BASE_HI),
>> ring->gpu_addr >> 40);



RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue

2021-03-30 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Jiansong,
 It does happen; maybe there is a race condition?


Best wishes
Emily Deng



>-Original Message-
>From: Chen, Jiansong (Simon) 
>Sent: Tuesday, March 30, 2021 2:49 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>[AMD Official Use Only - Internal Distribution Only]
>
>I still wonder how the issue takes place. According to my humble knowledge
>of the driver model, the reference count of the device's kobject will not
>reach zero while there is still device memory access, so shutdown should
>not happen.
>
>Regards,
>Jiansong
>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>During driver unloading there is no need to copy memory; otherwise it will
>introduce call traces, e.g. when the sa_manager is freed, a WARN call trace
>appears in amdgpu_sa_bo_new.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>index e00263bcc88b..f0546a489e0d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>@@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct
>amdgpu_device *adev,  struct dma_fence *fence = NULL;  int r = 0;
>
>+if (adev->shutdown)
>+return 0;
>+
> if (!adev->mman.buffer_funcs_enabled) {
> DRM_ERROR("Trying to move memory with ring turned off.\n");
> return -EINVAL;
>--
>2.25.1
>

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 3/6] drm/amdgpu: Restore msix after FLR

2021-03-29 Thread Deng, Emily



>-Original Message-
>From: Deucher, Alexander 
>Sent: Monday, March 29, 2021 10:41 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH 3/6] drm/amdgpu: Restore msix after FLR
>
>[AMD Public Use]
>
>> -Original Message-
>> From: amd-gfx  On Behalf Of
>> Emily Deng
>> Sent: Monday, March 29, 2021 3:50 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Deng, Emily 
>> Subject: [PATCH 3/6] drm/amdgpu: Restore msix after FLR
>>
>> From: "Emily.Deng" 
>>
>> After FLR, the msix will be cleared, so need to re-enable it.
>>
>> Signed-off-by: Emily.Deng 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index 03412543427a..f24263120f3a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -277,6 +277,18 @@ static bool amdgpu_msi_ok(struct amdgpu_device
>> *adev)
>>  return true;
>>  }
>>
>> +void amdgpu_restore_msix(struct amdgpu_device *adev)
>> +{
>> +#ifdef PCI_IRQ_MSIX
>
>This should be static.  Also please use the amdgpu_irq_ prefix for consistency.
>Additionally, the #ifdef should be on its own line.  Moreover, can we just drop
>the #ifdef?
>
>Alex
Hi Alex,
Thanks for your suggestion, will modify and send out a v2 patch for review.
>
>> +u16 ctrl;
>> +
>> +pci_read_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
>> +ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
>> +pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
>> +ctrl |= PCI_MSIX_FLAGS_ENABLE;
>> +pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
>> +#endif
>> +}
>> +
>>  /**
>>   * amdgpu_irq_init - initialize interrupt handling
>>   *
>> @@ -558,6 +570,7 @@ void amdgpu_irq_gpu_reset_resume_helper(struct
>> amdgpu_device *adev)  {
>>  int i, j, k;
>>
>> +amdgpu_restore_msix(adev);
>>  for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>>  if (!adev->irq.client[i].sources)
>>  continue;
>> --
>> 2.25.1
>>
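The disable/re-enable sequence in the hunk above can be exercised in isolation; here the MSI-X control word is modeled as a plain u16, and `FAKE_MSIX_FLAGS_ENABLE` is an illustrative constant standing in for the kernel's `PCI_MSIX_FLAGS_ENABLE` bit:

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_MSIX_FLAGS_ENABLE 0x8000	/* stand-in for PCI_MSIX_FLAGS_ENABLE */

/* Toggle the enable bit off and back on, as the patch does through
 * pci_read_config_word()/pci_write_config_word(); the "config space"
 * is just a u16 here so the sequence can be tested on its own. */
static uint16_t restore_msix_ctrl(uint16_t ctrl)
{
	ctrl &= ~FAKE_MSIX_FLAGS_ENABLE;	/* disable */
	ctrl |= FAKE_MSIX_FLAGS_ENABLE;		/* re-enable */
	return ctrl;
}
```

Note the other flag bits are preserved; only the enable bit is pulsed, which is what re-arms MSI-X after the FLR cleared it.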


RE: [PATCH] drm/amdgpu: Fix the page fault issue in amdgpu_irq_fini

2021-03-18 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Christian König 
>Sent: Thursday, March 18, 2021 7:52 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the page fault issue in amdgpu_irq_fini
>
>Am 18.03.21 um 12:48 schrieb Emily Deng:
>> For some sources, the same source object is shared by several client IDs
>> and source IDs. To fix the page fault issue, set all of those entries to NULL.
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 16 +++++++++++++---
>>   1 file changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index af026109421a..623b1ac6231d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -359,7 +359,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>*/
>>   void amdgpu_irq_fini(struct amdgpu_device *adev)
>>   {
>> -unsigned i, j;
>> +unsigned i, j, m, n;
>>
>>   if (adev->irq.installed) {
>>   drm_irq_uninstall(adev_to_drm(adev));
>> @@ -380,12 +380,22 @@ void amdgpu_irq_fini(struct amdgpu_device
>*adev)
>>   if (!src)
>>   continue;
>>
>> -kfree(src->enabled_types);
>> +if (src->enabled_types)
>> +kfree(src->enabled_types);
>
>A NULL check before kfree() is unecessary and will be complained about by the
>static checkers.
Sorry, will remove this.
>
>> +
>>   src->enabled_types = NULL;
>> +
>
>Unrelated white space change.
Sorry, will remove this also.
>
>>   if (src->data) {
>>   kfree(src->data);
>>   kfree(src);
>> -adev->irq.client[i].sources[j] = NULL;
>> +}
>> +
>> +for (m = 0; m < AMDGPU_IRQ_CLIENTID_MAX; ++m) {
>> +if (!adev->irq.client[m].sources)
>> +continue;
>> +for (n = 0; n < AMDGPU_MAX_IRQ_SRC_ID; ++n)
>> +if (adev->irq.client[m].sources[n] == src)
>> +adev->irq.client[m].sources[n] = NULL;
>
>Hui what? The memory you set to NULL here is freed on the line below.
>
>Accessing it after that would be illegal, so why do you want to set it to NULL?
[Emily] It is in the loop "for (j = 0; j < AMDGPU_MAX_IRQ_SRC_ID; ++j) {", so 
the source hasn't been freed yet at this point. Only setting 
"adev->irq.client[i].sources[j] = NULL;" is not enough, because other client 
IDs and source IDs may share the same src; those entries also need to be set 
to NULL.
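The aliasing situation being discussed can be sketched in plain C: one allocated source referenced from several (client, src-id) slots must have every alias cleared before the single free. The names and table shape below are illustrative, not the real amdgpu structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NCLIENT 4	/* stand-in for AMDGPU_IRQ_CLIENTID_MAX */
#define NSRC    4	/* stand-in for AMDGPU_MAX_IRQ_SRC_ID */

/* Clear every slot that aliases src, then free the object exactly once.
 * Clearing only one slot would leave dangling pointers behind. */
static void free_shared_source(int *table[NCLIENT][NSRC], int *src)
{
	for (int m = 0; m < NCLIENT; ++m)
		for (int n = 0; n < NSRC; ++n)
			if (table[m][n] == src)
				table[m][n] = NULL;
	free(src);
}
```

This is the same ordering the patch enforces: null out all aliases first, free once, so a later walk over the table never dereferences freed memory.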
>
>Christian.
>
>>   }
>>   }
>>   kfree(adev->irq.client[i].sources);



RE: [PATCH] drm/amdgpu: Fix some unload driver issues

2021-03-05 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Christian König 
>Sent: Friday, March 5, 2021 4:52 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues
>
>
>
>Am 05.03.21 um 09:43 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Friday, March 5, 2021 3:55 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues
>>>
>>> Am 05.03.21 um 02:20 schrieb Emily Deng:
>>>> When unloading driver after killing some applications, it will hit
>>>> sdma flush tlb job timeout which is called by ttm_bo_delay_delete.
>>>> So to avoid the job submit after fence driver fini, call
>>>> ttm_bo_lock_delayed_workqueue before fence driver fini. And also put
>>> drm_sched_fini before waiting fence.
>>>
>>> Good catch, Reviewed-by: Christian König 
>>> for this part.
>>>
>>>> Set adev->gart.ptr to null to fix null pointer when calling
>>>> amdgpu_gart_unbind in amdgpu_bo_fini which is after gart_fini.
>>> For that one I'm wondering if we shouldn't rather rework or double
>>> check the tear down order.
>>>
>>> On the other hand the hardware should be idle by now (this comes
>>> after the fence_driver_fini, doesn't it?) and it looks cleaner on its own.
>> Yes, you are right; without considering memory leaks, with the above patch
>> the BOs are all cleaned up, so there won't be an issue. But if there is a
>> memory leak, maybe it will have an issue in
>> ttm_bo_force_list_clean->ttm_mem_evict_first?
>
>Yeah, that is a good argument and part of what I mean with it looks cleaner on
>its own.
>
>Maybe write that into the commit message like this. With that done the full
>patch has my rb.
>
>Regards,
>Christian.
Ok, thanks.
>
>>
>>> Regards,
>>> Christian.
>>>
>>>> Signed-off-by: Emily Deng 
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 5 +++--
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 1 +
>>>>3 files changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index a11760ec3924..de0597d34588 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -3594,6 +3594,7 @@ void amdgpu_device_fini(struct amdgpu_device
>>> *adev)
>>>>{
>>>>dev_info(adev->dev, "amdgpu: finishing device.\n");
>>>>flush_delayed_work(>delayed_init_work);
>>>> +ttm_bo_lock_delayed_workqueue(>mman.bdev);
>>>>adev->shutdown = true;
>>>>
>>>>kfree(adev->pci_state);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> index 143a14f4866f..6d16f58ac91e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> @@ -531,6 +531,8 @@ void amdgpu_fence_driver_fini(struct
>>> amdgpu_device
>>>> *adev)
>>>>
>>>>if (!ring || !ring->fence_drv.initialized)
>>>>continue;
>>>> +if (!ring->no_scheduler)
>>>> +drm_sched_fini(>sched);
>>>>r = amdgpu_fence_wait_empty(ring);
>>>>if (r) {
>>>>/* no need to trigger GPU reset as we are unloading */
>>> @@ -539,8
>>>> +541,7 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>>>>if (ring->fence_drv.irq_src)
>>>>amdgpu_irq_put(adev, ring->fence_drv.irq_src,
>>>>   ring->fence_drv.irq_type); -if (!ring->no_scheduler)
>>>> -drm_sched_fini(>sched);
>>>> +
>>>>del_timer_sync(>fence_drv.fallback_timer);
>>>>for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>>>>dma_fence_put(ring->fence_drv.fences[j]);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>>> index 23823a57374f..f1ede4b43d07 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>>> @@ -202,6 +202,7 @@ void amdgpu_gart_table_vram_free(struct
>>> amdgpu_device *adev)
>>>>return;
>>>>}
>>>>amdgpu_bo_unref(>gart.bo);
>>>> +adev->gart.ptr = NULL;
>>>>}
>>>>
>>>>/*



RE: [PATCH] drm/amdgpu: Fix some unload driver issues

2021-03-05 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Christian König 
>Sent: Friday, March 5, 2021 3:55 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues
>
>Am 05.03.21 um 02:20 schrieb Emily Deng:
>> When unloading driver after killing some applications, it will hit
>> sdma flush tlb job timeout which is called by ttm_bo_delay_delete. So
>> to avoid the job submit after fence driver fini, call
>> ttm_bo_lock_delayed_workqueue before fence driver fini. And also put
>drm_sched_fini before waiting fence.
>
>Good catch, Reviewed-by: Christian König  for
>this part.
>
>> Set adev->gart.ptr to null to fix null pointer when calling
>> amdgpu_gart_unbind in amdgpu_bo_fini which is after gart_fini.
>
>For that one I'm wondering if we shouldn't rather rework or double check the
>tear down order.
>
>On the other hand the hardware should be idle by now (this comes after the
>fence_driver_fini, doesn't it?) and it looks cleaner on its own.
Yes, you are right; without considering memory leaks, with the above patch the 
BOs are all cleaned up, so there won't be an issue. But if there is a memory 
leak, maybe it will have an issue in ttm_bo_force_list_clean->ttm_mem_evict_first?

>
>Regards,
>Christian.
>
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 5 +++--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 1 +
>>   3 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index a11760ec3924..de0597d34588 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3594,6 +3594,7 @@ void amdgpu_device_fini(struct amdgpu_device
>*adev)
>>   {
>>   dev_info(adev->dev, "amdgpu: finishing device.\n");
>>   flush_delayed_work(>delayed_init_work);
>> +ttm_bo_lock_delayed_workqueue(>mman.bdev);
>>   adev->shutdown = true;
>>
>>   kfree(adev->pci_state);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index 143a14f4866f..6d16f58ac91e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -531,6 +531,8 @@ void amdgpu_fence_driver_fini(struct
>amdgpu_device
>> *adev)
>>
>>   if (!ring || !ring->fence_drv.initialized)
>>   continue;
>> +if (!ring->no_scheduler)
>> +drm_sched_fini(>sched);
>>   r = amdgpu_fence_wait_empty(ring);
>>   if (r) {
>>   /* no need to trigger GPU reset as we are unloading */
>@@ -539,8
>> +541,7 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>>   if (ring->fence_drv.irq_src)
>>   amdgpu_irq_put(adev, ring->fence_drv.irq_src,
>>  ring->fence_drv.irq_type);
>> -if (!ring->no_scheduler)
>> -drm_sched_fini(>sched);
>> +
>>   del_timer_sync(>fence_drv.fallback_timer);
>>   for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>>   dma_fence_put(ring->fence_drv.fences[j]);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> index 23823a57374f..f1ede4b43d07 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> @@ -202,6 +202,7 @@ void amdgpu_gart_table_vram_free(struct
>amdgpu_device *adev)
>>   return;
>>   }
>>   amdgpu_bo_unref(>gart.bo);
>> +adev->gart.ptr = NULL;
>>   }
>>
>>   /*



RE: [PATCH] drm/amdgpu: extend MAX_KIQ_REG_TRY to 1000

2021-02-08 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily.Deng 

>-Original Message-
>From: Christian König 
>Sent: Monday, February 8, 2021 6:05 PM
>To: Gu, JiaWei (Will) ; Koenig, Christian
>; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: Re: [PATCH] drm/amdgpu: extend MAX_KIQ_REG_TRY to 1000
>
>Hi Jiawei,
>
>ok in this case it's fine with me.
>
>Just please also get a reviewed-by from somebody which has more KIQ
>background than I have.
>
>Thanks,
>Christian.
>
>Am 08.02.21 um 11:00 schrieb Gu, JiaWei (Will):
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi Christian,
>>
>> That's how many times it would retry before giving up.
>> And we always skip this retry routine if we are in interrupt, so it's fine 
>> for
>interrupt condition.
>>
>> Best regards,
>> Jiawei
>>
>> -Original Message-
>> From: Christian König 
>> Sent: Monday, February 8, 2021 5:28 PM
>> To: Gu, JiaWei (Will) ;
>> amd-gfx@lists.freedesktop.org
>> Cc: Deng, Emily 
>> Subject: Re: [PATCH] drm/amdgpu: extend MAX_KIQ_REG_TRY to 1000
>>
>> Am 08.02.21 um 06:45 schrieb Jiawei Gu:
>>> Extend the KIQ retry count to avoid a starvation situation caused by
>>> another VF holding full access of the GPU for a long time.
>> In what units is that? We also need the KIQ during interrupt handling and
>> that looks like *way* too big for that.
>>
>> Christian.
>>
>>> Signed-off-by: Jiawei Gu 
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
>>>1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 639db32c1383..e0c797a5f739 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -288,7 +288,7 @@ enum amdgpu_kiq_irq {
>>>
>>>#define MAX_KIQ_REG_WAIT   5000 /* in usecs, 5ms */
>>>#define MAX_KIQ_REG_BAILOUT_INTERVAL   5 /* in msecs, 5ms */
>>> -#define MAX_KIQ_REG_TRY 80 /* 20 -> 80 */
>>> +#define MAX_KIQ_REG_TRY 1000
>>>
>>>int amdgpu_device_ip_set_clockgating_state(void *dev,
>>>   enum amd_ip_block_type
>block_type,
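The bounded-retry idea under review can be sketched generically. `FAKE_MAX_KIQ_REG_TRY` mirrors the patch's constant, and the read callback is hypothetical; the real code also sleeps between tries, which is omitted here:

```c
#include <assert.h>
#include <stdbool.h>

#define FAKE_MAX_KIQ_REG_TRY 1000	/* mirrors the patched MAX_KIQ_REG_TRY */

/* Poll a register via a callback until it reports success, giving up after
 * a fixed number of tries instead of spinning forever. */
static bool kiq_read_with_retry(bool (*try_read)(void *ctx), void *ctx)
{
	for (int i = 0; i < FAKE_MAX_KIQ_REG_TRY; ++i)
		if (try_read(ctx))
			return true;
	return false;	/* starvation: another VF held the GPU too long */
}

/* Test helper: succeeds once the countdown reaches zero. */
static bool succeed_after(void *ctx)
{
	int *left = ctx;
	return --(*left) <= 0;
}
```

The trade-off debated in the thread is exactly the loop bound: large enough to survive long full-access windows of other VFs, yet still finite so callers are never stuck forever.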



RE: [PATCH] drm/amdgpu: change the fence ring wait timeout

2021-01-18 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: amd-gfx  On Behalf Of
>Christian König
>Sent: Tuesday, January 19, 2021 12:04 AM
>To: Deng, Emily ; Koenig, Christian
>; Sun, Roy ; amd-
>g...@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait timeout
>
>Am 18.01.21 um 12:56 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sent: Monday, January 18, 2021 3:49 PM
>>> To: Deng, Emily ; Sun, Roy ;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait timeout
>>>
>>> Mhm, we could change amdgpu_fence_wait_empty() to timeout. But I
>>> think that waiting forever here is intentional and the right thing to do.
>>>
>>> What happens is that we wait for the hardware to make sure that
>>> nothing is writing to any memory before we unload the driver.
>>>
>>> Now the VCN block has crashed and doesn't respond, but we can't
>>> guarantee that it is not accidentally writing anywhere.
>>>
>>> The only alternative we have is to time out and proceed with the
>>> driver unload, risking corrupting the memory we free during that
>>> should the hardware continue to do something.
>> Hi Christian,
>> Thanks for your suggestion, but it is still not quite clear to me; could
>> you detail the solution to avoid the kernel lockup?
>
>Well as I said that the kernel locks up is intentional here.
So you think the lockup is better than some memory corruption? We could give it 
more time, such as 60s, to wait; I don't think the fence would fail to be 
signaled within 60s if the engine is good. So when the engine is OK, the 
timeout won't cause memory corruption. When the engine is bad, the fence will 
never be signaled, so even if we force completion it still won't cause memory 
corruption. As for SRIOV, when the engine is bad we can still do a recovery and 
reload the driver to make it work again, so we don't want the kernel to lock up.
>
>Regards,
>Christian.
>
>>> Regards,
>>> Christian.
>>>
>>> Am 18.01.21 um 03:01 schrieb Deng, Emily:
>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>>> -Original Message-
>>>>> From: Koenig, Christian 
>>>>> Sent: Thursday, January 14, 2021 9:50 PM
>>>>> To: Deng, Emily ; Sun, Roy
>;
>>>>> amd-gfx@lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait timeout
>>>>>
>>>>> Am 14.01.21 um 03:00 schrieb Deng, Emily:
>>>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>>>
>>>>>>> -Original Message-
>>>>>>> From: amd-gfx  On Behalf
>>>>>>> Of Christian König
>>>>>>> Sent: Wednesday, January 13, 2021 10:03 PM
>>>>>>> To: Sun, Roy ; amd-gfx@lists.freedesktop.org
>>>>>>> Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait
>>>>>>> timeout
>>>>>>>
>>>>>>> Am 13.01.21 um 07:36 schrieb Roy Sun:
>>>>>>>> This fix bug where when the engine hang, the fence ring will
>>>>>>>> wait without quit and cause kernel crash
>>>>>>> NAK, this blocking is intentional unlimited because otherwise we
>>>>>>> will cause a memory corruption.
>>>>>>>
>>>>>>> What is the actual bug you are trying to fix here?
>>>>>> When some engine hang during initialization, the IB test will
>>>>>> fail, and fence will never come back, then never could wait the fence
>back.
>>>>>> Why we need to wait here forever? We'd better not use forever wait
>>>>>> which
>>>>> will cause kernel crash and lockup. And we have
>>>>> amdgpu_fence_driver_force_completion will let memory free. We
>>>>> should remove all those forever wait, and replace them with one
>>>>> timeout, and do the correct error process if timeout.
>>>>>
>>>>> This wait here is to make sure we never overwrite the software
>>>>> fence ring buffer. Without it we would not signal all fences in
>>>>> amdgpu_fence_driver_force_completion() and cause either memory
>leak
>>>>> or corruption.
>>>>>
>>>

RE: [PATCH] drm/amdgpu: change the fence ring wait timeout

2021-01-18 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Koenig, Christian 
>Sent: Monday, January 18, 2021 3:49 PM
>To: Deng, Emily ; Sun, Roy ;
>amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait timeout
>
>Mhm, we could change amdgpu_fence_wait_empty() to timeout. But I think
>that waiting forever here is intentional and the right thing to do.
>
>What happens is that we wait for the hardware to make sure that nothing is
>writing to any memory before we unload the driver.
>
>Now the VCN block has crashed and doesn't respond, but we can't guarantee
>that it is not accidentally writing anywhere.
>
>The only alternative we have is to time out and proceed with the driver unload,
>risking corrupting the memory we free during that should the hardware
>continue to do something.
Hi Christian,
Thanks for your suggestion, but it is still not quite clear to me; could you 
detail the solution to avoid the kernel lockup?
>
>Regards,
>Christian.
>
>Am 18.01.21 um 03:01 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sent: Thursday, January 14, 2021 9:50 PM
>>> To: Deng, Emily ; Sun, Roy ;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait timeout
>>>
>>> Am 14.01.21 um 03:00 schrieb Deng, Emily:
>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>>> -Original Message-
>>>>> From: amd-gfx  On Behalf Of
>>>>> Christian König
>>>>> Sent: Wednesday, January 13, 2021 10:03 PM
>>>>> To: Sun, Roy ; amd-gfx@lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait timeout
>>>>>
>>>>> Am 13.01.21 um 07:36 schrieb Roy Sun:
>>>>>> This fix bug where when the engine hang, the fence ring will wait
>>>>>> without quit and cause kernel crash
>>>>> NAK, this blocking is intentional unlimited because otherwise we
>>>>> will cause a memory corruption.
>>>>>
>>>>> What is the actual bug you are trying to fix here?
>>>> When some engine hang during initialization, the IB test will fail,
>>>> and fence will never come back, then never could wait the fence back.
>>>> Why we need to wait here forever? We'd better not use forever wait
>>>> which
>>> will cause kernel crash and lockup. And we have
>>> amdgpu_fence_driver_force_completion will let memory free. We should
>>> remove all those forever wait, and replace them with one timeout,
>>> and do the correct error process if timeout.
>>>
>>> This wait here is to make sure we never overwrite the software fence
>>> ring buffer. Without it we would not signal all fences in
>>> amdgpu_fence_driver_force_completion() and cause either memory leak
>>> or corruption.
>>>
>>> Waiting here forever is the right thing to do even when that means
>>> that the submission thread gets stuck forever, cause that is still
>>> better than memory corruption.
>>>
>>> But this should never happen in practice and is only here for
>>> precaution. So do you really see that in practice?
>> Yes, we hit the issue when the VCN IB test fails. Could you give some
>> suggestions about how to fix this?
>> [  958.301685] failed to read reg:1a6c0
>>
>> [  959.036645] gmc_v10_0_process_interrupt: 42 callbacks suppressed
>>
>> [  959.036653] amdgpu :00:07.0: [mmhub] page fault (src_id:0
>> ring:0 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
>>
>> [  959.038043] amdgpu :00:07.0:   in page starting at address
>0x00567000 from client 18
>>
>> [  959.039014] amdgpu :00:07.0: [mmhub] page fault (src_id:0
>> ring:0 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
>>
>> [  959.040202] amdgpu :00:07.0:   in page starting at address
>0x00567000 from client 18
>>
>> [  959.041174] amdgpu :00:07.0: [mmhub] page fault (src_id:0
>> ring:0 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
>>
>> [  959.042353] amdgpu :00:07.0:   in page starting at address
>0x00567000 from client 18
>>
>> [  959.043325] amdgpu :00:07.0: [mmhub] page fault (src_id:0
>> ring:0 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
>>
>> [  959.044508] amdgpu :00:07.0:   in page starting at address
>0x00567000 from clie

RE: [PATCH] drm/amdgpu: change the fence ring wait timeout

2021-01-17 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Koenig, Christian 
>Sent: Thursday, January 14, 2021 9:50 PM
>To: Deng, Emily ; Sun, Roy ;
>amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait timeout
>
>Am 14.01.21 um 03:00 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -Original Message-
>>> From: amd-gfx  On Behalf Of
>>> Christian König
>>> Sent: Wednesday, January 13, 2021 10:03 PM
>>> To: Sun, Roy ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait timeout
>>>
>>> Am 13.01.21 um 07:36 schrieb Roy Sun:
>>>> This fix bug where when the engine hang, the fence ring will wait
>>>> without quit and cause kernel crash
>>> NAK, this blocking is intentional unlimited because otherwise we will
>>> cause a memory corruption.
>>>
>>> What is the actual bug you are trying to fix here?
>> When some engine hang during initialization, the IB test will fail,
>> and fence will never come back, then never could wait the fence back.
>> Why we need to wait here forever? We'd better not use forever wait which
>will cause kernel crash and lockup. And we have
>amdgpu_fence_driver_force_completion will let memory free. We should
>remove all those forever wait, and replace them with one timeout,  and do
>the correct error process if timeout.
>
>This wait here is to make sure we never overwrite the software fence ring
>buffer. Without it we would not signal all fences in
>amdgpu_fence_driver_force_completion() and cause either memory leak or
>corruption.
>
>Waiting here forever is the right thing to do even when that means that the
>submission thread gets stuck forever, cause that is still better than memory
>corruption.
>
>But this should never happen in practice and is only here for precaution. So do
>you really see that in practice?
Yes, we hit the issue when the VCN IB test fails. Could you give some 
suggestions about how to fix this?
[  958.301685] failed to read reg:1a6c0

[  959.036645] gmc_v10_0_process_interrupt: 42 callbacks suppressed

[  959.036653] amdgpu :00:07.0: [mmhub] page fault (src_id:0 ring:0 vmid:0 
pasid:0, for process  pid 0 thread  pid 0)

[  959.038043] amdgpu :00:07.0:   in page starting at address 
0x00567000 from client 18

[  959.039014] amdgpu :00:07.0: [mmhub] page fault (src_id:0 ring:0 vmid:0 
pasid:0, for process  pid 0 thread  pid 0)

[  959.040202] amdgpu :00:07.0:   in page starting at address 
0x00567000 from client 18

[  959.041174] amdgpu :00:07.0: [mmhub] page fault (src_id:0 ring:0 vmid:0 
pasid:0, for process  pid 0 thread  pid 0)

[  959.042353] amdgpu :00:07.0:   in page starting at address 
0x00567000 from client 18

[  959.043325] amdgpu :00:07.0: [mmhub] page fault (src_id:0 ring:0 vmid:0 
pasid:0, for process  pid 0 thread  pid 0)

[  959.044508] amdgpu :00:07.0:   in page starting at address 
0x00567000 from client 18

[  959.045480] amdgpu :00:07.0: [mmhub] page fault (src_id:0 ring:0 vmid:0 
pasid:0, for process  pid 0 thread  pid 0)

[  959.046659] amdgpu :00:07.0:   in page starting at address 
0x00567000 from client 18

[  959.047631] amdgpu :00:07.0: [mmhub] page fault (src_id:0 ring:0 vmid:0 
pasid:0, for process  pid 0 thread  pid 0)

[  959.048815] amdgpu :00:07.0:   in page starting at address 
0x00567000 from client 18

[  959.049787] amdgpu :00:07.0: [mmhub] page fault (src_id:0 ring:0 vmid:0 
pasid:0, for process  pid 0 thread  pid 0)

[  959.050973] amdgpu :00:07.0:   in page starting at address 
0x00567000 from client 18

[  959.051950] amdgpu :00:07.0: [mmhub] page fault (src_id:0 ring:0 vmid:0 
pasid:0, for process  pid 0 thread  pid 0)

[  959.053123] amdgpu :00:07.0:   in page starting at address 
0x00567000 from client 18

[  967.208705] amdgpu :00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* 
IB test failed on vcn_enc0 (-110).

[  967.209879] [drm:amdgpu_device_init [amdgpu]] *ERROR* ib ring test failed 
(-110).



[ 1209.384668] INFO: task modprobe:23957 blocked for more than 120 seconds.

[ 1209.385605]   Tainted: G   OE 5.4.0-58-generic 
#64~18.04.1-Ubuntu

[ 1209.386451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.

[ 1209.387342] modprobeD0 23957   1221 0x80004006

[ 1209.387344] Call Trace:

[ 1209.387354]  __schedule+0x293/0x720

[ 1209.387356]  schedule+0x33/0xa0

[ 1209.387357]  schedule_timeout+0x1d3/0x320

[ 1209.387359]  ? schedule+0x33/0xa0

[ 1209.387360]  ? schedule_timeout+0x1d3/0x320

[ 1209.387363]  dma_fence_default_wait+0x1b2/0x1

RE: [PATCH] drm/amdgpu: change the fence ring wait timeout

2021-01-13 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: amd-gfx  On Behalf Of
>Christian König
>Sent: Wednesday, January 13, 2021 10:03 PM
>To: Sun, Roy ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: change the fence ring wait timeout
>
>Am 13.01.21 um 07:36 schrieb Roy Sun:
>> This fix bug where when the engine hang, the fence ring will wait
>> without quit and cause kernel crash
>
>NAK, this blocking is intentional unlimited because otherwise we will cause a
>memory corruption.
>
>What is the actual bug you are trying to fix here?
When some engine hangs during initialization, the IB test will fail and the 
fence will never come back, so the wait for the fence can never finish. Why do 
we need to wait here forever? We'd better not use a forever wait, which will 
cause a kernel crash and lockup. And we have 
amdgpu_fence_driver_force_completion to let the memory be freed. We should 
remove all those forever waits, replace them with one timeout, and do the 
correct error handling on timeout.
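The per-ring timeout selection in the patch below can be sketched with plain milliseconds instead of jiffies. The ring types and values here are illustrative, shaped like the patch's amdgpu_fence_wait_timeout() but not taken from the real driver:

```c
#include <assert.h>

/* Illustrative ring types; the real code switches on ring->funcs->type. */
enum ring_type { RING_GFX, RING_VCN_DEC, RING_VCN_ENC, RING_UVD };

/* Pick a finite wait: multimedia rings under SRIOV get extra slack,
 * graphics gets more under SRIOV runtime or a doubled XGMI timeout. */
static long fence_wait_timeout_ms(enum ring_type type, int is_sriov_vf,
				  int is_sriov_runtime, int has_xgmi)
{
	long base = 1000;	/* stands in for AMDGPU_FENCE_TIMEOUT */
	long tmo_mm = base, tmo_gfx = base;

	if (is_sriov_vf)
		tmo_mm = 8 * base;
	if (is_sriov_runtime)
		tmo_gfx = 8 * base;
	else if (has_xgmi)
		tmo_gfx = 2 * base;	/* AMDGPU_FENCE_GFX_XGMI_TIMEOUT */

	switch (type) {
	case RING_UVD:
	case RING_VCN_DEC:
	case RING_VCN_ENC:
		return tmo_mm;
	default:
		return tmo_gfx;
	}
}
```

The point of contention in the thread is not the selection logic but whether any finite value is safe, since a fence signaled after the timeout could corrupt freed memory.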

>
>Regards,
>Christian.
>
>>
>> Signed-off-by: Roy Sun 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 48 +++++++++++++++++++++++---
>>   1 file changed, 43 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index 6b0aeee61b8b..738ea65077ea 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -41,6 +41,8 @@
>>   #include "amdgpu.h"
>>   #include "amdgpu_trace.h"
>>
>> +#define AMDGPU_FENCE_TIMEOUT  msecs_to_jiffies(1000)
>> +#define AMDGPU_FENCE_GFX_XGMI_TIMEOUT msecs_to_jiffies(2000)
>> +
>>   /*
>>* Fences
>>* Fences mark an event in the GPUs pipeline and are used
>> @@ -104,6 +106,38 @@ static void amdgpu_fence_write(struct amdgpu_ring *ring, u32 seq)
>>   *drv->cpu_addr = cpu_to_le32(seq);
>>   }
>>
>> +/**
>> + * amdgpu_fence_wait_timeout - get the fence wait timeout
>> + *
>> + * @ring: ring the fence is associated with
>> + *
>> + * Returns the value of the fence wait timeout.
>> + */
>> +long amdgpu_fence_wait_timeout(struct amdgpu_ring *ring)
>> +{
>> +long tmo_gfx, tmo_mm, tmo;
>> +struct amdgpu_device *adev = ring->adev;
>> +tmo_mm = tmo_gfx = AMDGPU_FENCE_TIMEOUT;
>> +if (amdgpu_sriov_vf(adev)) {
>> +tmo_mm = 8 * AMDGPU_FENCE_TIMEOUT;
>> +}
>> +if (amdgpu_sriov_runtime(adev)) {
>> +tmo_gfx = 8 * AMDGPU_FENCE_TIMEOUT;
>> +} else if (adev->gmc.xgmi.hive_id) {
>> +tmo_gfx = AMDGPU_FENCE_GFX_XGMI_TIMEOUT;
>> +}
>> +if (ring->funcs->type == AMDGPU_RING_TYPE_UVD ||
>> +ring->funcs->type == AMDGPU_RING_TYPE_VCE ||
>> +ring->funcs->type == AMDGPU_RING_TYPE_UVD_ENC ||
>> +ring->funcs->type == AMDGPU_RING_TYPE_VCN_DEC ||
>> +ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC ||
>> +ring->funcs->type == AMDGPU_RING_TYPE_VCN_JPEG)
>> +tmo = tmo_mm;
>> +else
>> +tmo = tmo_gfx;
>> +return tmo;
>> +}
>> +
>>   /**
>>* amdgpu_fence_read - read a fence value
>>*
>> @@ -166,10 +200,12 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring,
>struct dma_fence **f,
>>   rcu_read_unlock();
>>
>>   if (old) {
>> -r = dma_fence_wait(old, false);
>> +long timeout;
>> +timeout = amdgpu_fence_wait_timeout(ring);
>> +r = dma_fence_wait_timeout(old, false, timeout);
>>   dma_fence_put(old);
>>   if (r)
>> -return r;
>> +return r < 0 ? r : 0;
>>   }
>>   }
>>
>> @@ -343,10 +379,12 @@ int amdgpu_fence_wait_empty(struct
>amdgpu_ring *ring)
>>   return 0;
>>   }
>>   rcu_read_unlock();
>> -
>> -r = dma_fence_wait(fence, false);
>> +
>> +long timeout;
>> +timeout = amdgpu_fence_wait_timeout(ring);
>> +r = dma_fence_wait_timeout(fence, false, timeout);
>>   dma_fence_put(fence);
>> -return r;
>> +return r < 0 ? r : 0;
>>   }
>>
>>   /**
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: change the fence ring wait timeout

2021-01-12 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: amd-gfx  On Behalf Of Roy
>Sun
>Sent: Wednesday, January 13, 2021 2:36 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Roy 
>Subject: [PATCH] drm/amdgpu: change the fence ring wait timeout
>
>This fix bug where when the engine hang, the fence ring will wait without quit
>and cause kernel crash
>
>Signed-off-by: Roy Sun 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 48
>---
> 1 file changed, 43 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>index 6b0aeee61b8b..738ea65077ea 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>@@ -41,6 +41,8 @@
> #include "amdgpu.h"
> #include "amdgpu_trace.h"
>
>+#define AMDGPU_FENCE_TIMEOUT  msecs_to_jiffies(1000)
>+#define AMDGPU_FENCE_GFX_XGMI_TIMEOUT msecs_to_jiffies(2000)
Please check the format.
> /*
>  * Fences
>  * Fences mark an event in the GPUs pipeline and are used @@ -104,6
>+106,38 @@ static void amdgpu_fence_write(struct amdgpu_ring *ring, u32
>seq)
> *drv->cpu_addr = cpu_to_le32(seq);
> }
>
>+/**
>+ * amdgpu_fence_wait_timeout - get the fence wait timeout
>+ *
>+ * @ring: ring the fence is associated with
>+ *
>+ * Returns the value of the fence wait timeout.
>+ */
>+long amdgpu_fence_wait_timeout(struct amdgpu_ring *ring) {
>+long tmo_gfx, tmo_mm, tmo;
>+struct amdgpu_device *adev = ring->adev;
>+tmo_mm = tmo_gfx = AMDGPU_FENCE_TIMEOUT;
>+if (amdgpu_sriov_vf(adev)) {
>+tmo_mm = 8 * AMDGPU_FENCE_TIMEOUT;
>+}
>+if (amdgpu_sriov_runtime(adev)) {
>+tmo_gfx = 8 * AMDGPU_FENCE_TIMEOUT;
>+} else if (adev->gmc.xgmi.hive_id) {
>+tmo_gfx = AMDGPU_FENCE_GFX_XGMI_TIMEOUT;
>+}
>+if (ring->funcs->type == AMDGPU_RING_TYPE_UVD ||
>+ring->funcs->type == AMDGPU_RING_TYPE_VCE ||
>+ring->funcs->type == AMDGPU_RING_TYPE_UVD_ENC ||
>+ring->funcs->type == AMDGPU_RING_TYPE_VCN_DEC ||
>+ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC ||
>+ring->funcs->type == AMDGPU_RING_TYPE_VCN_JPEG)
>+tmo = tmo_mm;
>+else
>+tmo = tmo_gfx;
>+return tmo;
>+}
>+
> /**
>  * amdgpu_fence_read - read a fence value
>  *
>@@ -166,10 +200,12 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring,
>struct dma_fence **f,
> rcu_read_unlock();
>
> if (old) {
>-r = dma_fence_wait(old, false);
>+long timeout;
>+timeout = amdgpu_fence_wait_timeout(ring);
>+r = dma_fence_wait_timeout(old, false, timeout);
> dma_fence_put(old);
> if (r)
>-return r;
>+return r < 0 ? r : 0;
> }
> }
>
>@@ -343,10 +379,12 @@ int amdgpu_fence_wait_empty(struct amdgpu_ring
>*ring)
> return 0;
> }
> rcu_read_unlock();
>-
>-r = dma_fence_wait(fence, false);
>+
>+long timeout;
>+timeout = amdgpu_fence_wait_timeout(ring);
>+r = dma_fence_wait_timeout(fence, false, timeout);
> dma_fence_put(fence);
>-return r;
>+return r < 0 ? r : 0;
> }
>
> /**
>--
>2.28.0
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display

2021-01-11 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Michel and Alex,
OK, let us abandon the patch; hopefully the user mode driver can fix those 
issues encountered with resolutions bigger than 16384.

Best wishes
Emily Deng



>-Original Message-
>From: Michel Dänzer 
>Sent: Monday, January 11, 2021 7:52 PM
>To: Deng, Emily ; Alex Deucher
>
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display
>
>On 2021-01-11 12:50 p.m., Michel Dänzer wrote:
>> On 2021-01-11 5:55 a.m., Deng, Emily wrote:
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>> Yes, it just won't report bigger than 16384 mode to user mode, as it
>>> won't work properly.
>>
>> ... with Xorg / glamor. It doesn't affect other user-space.
>
>Let me rephrase: This would artificially limit other user-space.
>
>
>--
>Earthling Michel Dänzer   |
>https://redhat.com
>Libre software enthusiast | Mesa and X developer


RE: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display

2021-01-10 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Yes, it just won't report modes bigger than 16384 to user mode, as they won't 
work properly.

Best wishes
Emily Deng



>-Original Message-
>From: Alex Deucher 
>Sent: Friday, January 8, 2021 11:14 PM
>To: Deng, Emily 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display
>
>On Thu, Jan 7, 2021 at 8:45 PM Deng, Emily  wrote:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Ping ..
>>
>
>It's not clear what problem this solves.
>
>Alex
>
>
>> Best wishes
>> Emily Deng
>>
>>
>>
>> >-Original Message-
>> >From: Emily Deng 
>> >Sent: Thursday, January 7, 2021 11:29 AM
>> >To: amd-gfx@lists.freedesktop.org
>> >Cc: Deng, Emily 
>> >Subject: [PATCH v2] drm/amdgpu:Limit the resolution for
>> >virtual_display
>> >
>> >From: "Emily.Deng" 
>> >
>> >Limit the resolution not bigger than 16384, which means
>> >dev->mode_info.num_crtc * common_modes[i].w not bigger than 16384.
>> >
>> >v2:
>> >  Refine the code
>> >
>> >Signed-off-by: Emily.Deng 
>> >---
>> > drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
>> > 1 file changed, 5 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >index 2b16c8faca34..fd2b3a6dfd60 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >@@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector
>> >*connector)  static int dce_virtual_get_modes(struct drm_connector
>> >*connector)  {
>> > struct drm_device *dev = connector->dev;
>> >+struct amdgpu_device *adev = dev->dev_private;
>> > struct drm_display_mode *mode = NULL;  unsigned i;  static const
>> >struct mode_size { @@ -350,8 +351,10 @@ static int
>> >dce_virtual_get_modes(struct drm_connector *connector)  };
>> >
>> > for (i = 0; i < ARRAY_SIZE(common_modes); i++) { -mode =
>> >drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false,
>> >false, false); -drm_mode_probed_add(connector, mode);
>> >+if (adev->mode_info.num_crtc * common_modes[i].w <=
>> >16384) {
>> >+mode = drm_cvt_mode(dev, common_modes[i].w,
>> >common_modes[i].h, 60, false, false, false);
>> >+drm_mode_probed_add(connector, mode); }
>> > }
>> >
>> > return 0;
>> >--
>> >2.25.1
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>>
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH v2] drm/amdgpu: Decrease compute timeout to 10 s for sriov multiple VF

2021-01-10 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping .

>-Original Message-
>From: Emily Deng 
>Sent: Thursday, January 7, 2021 10:51 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH v2] drm/amdgpu: Decrease compute timeout to 10 s for sriov
>multiple VF
>
>From: "Emily.Deng" 
>
>For multiple VF, after engine hang,as host driver will first encounter FLR, so
>has no meanning to set compute to 60s.
>
>v2:
>   Refine the patch and comment
>
>Signed-off-by: Emily.Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 5527c549db82..35edf58c825d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3133,7 +3133,10 @@ static int
>amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>  */
> adev->gfx_timeout = msecs_to_jiffies(10000);
> adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
>-if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
>+if (amdgpu_sriov_vf(adev))
>+adev->compute_timeout =
>amdgpu_sriov_is_pp_one_vf(adev) ?
>+msecs_to_jiffies(60000) :
>msecs_to_jiffies(10000);
>+else if (amdgpu_passthrough(adev))
> adev->compute_timeout =  msecs_to_jiffies(60000);
> else
> adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
>--
>2.25.1



RE: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display

2021-01-07 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping ..

Best wishes
Emily Deng



>-Original Message-
>From: Emily Deng 
>Sent: Thursday, January 7, 2021 11:29 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display
>
>From: "Emily.Deng" 
>
>Limit the resolution not bigger than 16384, which means
>dev->mode_info.num_crtc * common_modes[i].w not bigger than 16384.
>
>v2:
>  Refine the code
>
>Signed-off-by: Emily.Deng 
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index 2b16c8faca34..fd2b3a6dfd60 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector
>*connector)  static int dce_virtual_get_modes(struct drm_connector
>*connector)  {
> struct drm_device *dev = connector->dev;
>+struct amdgpu_device *adev = dev->dev_private;
> struct drm_display_mode *mode = NULL;
> unsigned i;
> static const struct mode_size {
>@@ -350,8 +351,10 @@ static int dce_virtual_get_modes(struct
>drm_connector *connector)
> };
>
> for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
>-mode = drm_cvt_mode(dev, common_modes[i].w,
>common_modes[i].h, 60, false, false, false);
>-drm_mode_probed_add(connector, mode);
>+if (adev->mode_info.num_crtc * common_modes[i].w <=
>16384) {
>+mode = drm_cvt_mode(dev, common_modes[i].w,
>common_modes[i].h, 60, false, false, false);
>+drm_mode_probed_add(connector, mode);
>+}
> }
>
> return 0;
>--
>2.25.1



RE: [PATCH] drm/amdgpu: Decrease compute timeout to 10 s for sriov multiple VF

2021-01-07 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping ..

Best wishes
Emily Deng



>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Thursday, January 7, 2021 10:50 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu: Decrease compute timeout to 10 s for sriov
>multiple VF
>
>From: "Emily.Deng" 
>
>For multiple VF, after engine hang,as host driver will first encounter FLR, so
>has no meanning to set compute to 60s.
>
>Signed-off-by: Emily.Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 5527c549db82..ce07b9b975ff 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3133,7 +3133,10 @@ static int
>amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>  */
> adev->gfx_timeout = msecs_to_jiffies(10000);
> adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
>-if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
>+if (amdgpu_sriov_vf(adev))
>+adev->compute_timeout =
>amdgpu_sriov_is_pp_one_vf(adev) ?
>+msecs_to_jiffies(60000) :
>msecs_to_jiffies(10000)
>+else if (amdgpu_passthrough(adev))
> adev->compute_timeout =  msecs_to_jiffies(60000);
> else
> adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
>--
>2.25.1
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display

2021-01-07 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Michel Dänzer 
>Sent: Thursday, January 7, 2021 4:42 PM
>To: Deng, Emily ; Alex Deucher
>
>Cc: amd-gfx list 
>Subject: Re: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display
>
>On 2021-01-07 3:28 a.m., Deng, Emily wrote:
>>> From: Michel Dänzer  On 2021-01-06 11:40 a.m.,
>>> Deng, Emily wrote:
>>>>> From: Alex Deucher  On Tue, Jan 5, 2021 at
>>>>> 3:37 AM Emily.Deng  wrote:
>>>>>>
>>>>>> Limit the resolution not bigger than 16384, which means
>>>>>> dev->mode_info.num_crtc * common_modes[i].w not bigger than
>16384.
>>>>>>
>>>>>> Signed-off-by: Emily.Deng 
>>>>>> ---
>>>>>>drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
>>>>>>1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>>>>> index 2b16c8faca34..c23d37b02fd7 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>>>>> @@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector
>>>>>> *connector)  static int dce_virtual_get_modes(struct drm_connector
>>>>>> *connector)  {
>>>>>>   struct drm_device *dev = connector->dev;
>>>>>> +   struct amdgpu_device *adev = dev->dev_private;
>>>>>>   struct drm_display_mode *mode = NULL;
>>>>>>   unsigned i;
>>>>>>   static const struct mode_size { @@ -350,8 +351,10 @@
>>>>>> static int dce_virtual_get_modes(struct drm_connector *connector)
>>>>>>   };
>>>>>>
>>>>>>   for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
>>>>>> -   mode = drm_cvt_mode(dev, common_modes[i].w,
>>>>> common_modes[i].h, 60, false, false, false);
>>>>>> -   drm_mode_probed_add(connector, mode);
>>>>>> +   if (adev->mode_info.num_crtc <= 4 ||
>>>>>> + common_modes[i].w <= 2560) {
>>>>>
>>>>> You are also limiting the number of crtcs here.  Intended?  Won't
>>>>> this break 5 or 6 crtc configs?
>>>>>
>>>>> Alex
>>>> Yes, it is intended,  for num_crtc bigger then 4, don't support
>>>> resolution
>>> bigger then 2560, because of the max supported width is 16384 for xcb
>>> protocol.
>>>
>>> There's no such limitation with Wayland. I'd recommend against
>>> artificially imposing limits from X11 to the kernel.
>>>
>>>
>>> (As a side note, the X11 protocol limit should actually be 32768; the
>>> 16384 limit exposed in the RANDR extension comes from the kernel
>>> driver, specifically drmModeGetResources's max_width/height)
>> It is our test and debug result, that the follow variable only have 16bit. 
>> Will
>limit the resolution to 16384.
>> glamor_pixmap_from_fd(ScreenPtr screen,
>>int fd,
>>CARD16 width,
>>CARD16 height,
>>CARD16 stride, CARD8 depth, CARD8 bpp)
>
>I assume you're referring to the stride parameter, which is in bytes.
>
>This function is only used for pixmaps created from a dma-buf via DRI3.
>It does not limit the size of other pixmaps, so it does not limit the size of 
>the
>screen pixmap (which corresponds to the framebuffer size in the RANDR
>extension) in general.
>
>Also, this is an implementation detail, the limitation could be lifted by
>changing the type of the parameter (though that would be an ABI break for
>Xorg).
>
>Xwayland isn't affected by this:
>
>Screen 0: minimum 16 x 16, current 1920 x 1200, maximum 32767 x 32767
Yes, the OpenGL driver will refer to the stride. When we tried resolutions 
bigger than 16384, the screen did not display well, and it seems nobody has 
verified this. So we want to limit the maximum supported modes to no bigger 
than 16384.
>
>
>--
>Earthling Michel Dänzer   |
>https://redhat.com
>Libre software enthusiast | Mesa and X developer


RE: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Sorry, I replied to the wrong message.
Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Deng,
>Emily
>Sent: Thursday, January 7, 2021 2:26 PM
>To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
>Cc: Zhao, Victor 
>Subject: RE: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds
>
>[AMD Official Use Only - Internal Distribution Only]
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Reviewed-by: Evan Quan 
>
>>-Original Message-
>>From: amd-gfx  On Behalf Of
>>Victor Zhao
>>Sent: Tuesday, January 5, 2021 3:51 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Zhao, Victor 
>>Subject: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds
>>
>>psp GFX_CTRL_CMD_ID_CONSUME_CMD different for windows and linux,
>>according to psp, linux cmds are not correct.
>>
>>v2: only correct GFX_CTRL_CMD_ID_CONSUME_CMD.
>>
>>Signed-off-by: Victor Zhao 
>>---
>> drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>>b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>>index d65a5339d354..3ba7bdfde65d 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>>+++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>>@@ -47,7 +47,7 @@ enum psp_gfx_crtl_cmd_id
>> GFX_CTRL_CMD_ID_DISABLE_INT = 0x0006,   /* disable PSP-to-Gfx
>>interrupt */
>> GFX_CTRL_CMD_ID_MODE1_RST   = 0x0007,   /* trigger the Mode
>1
>>reset */
>> GFX_CTRL_CMD_ID_GBR_IH_SET  = 0x0008,   /* set Gbr
>>IH_RB_CNTL registers */
>>-GFX_CTRL_CMD_ID_CONSUME_CMD = 0x000A,   /* send interrupt
>>to psp for updating write pointer of vf */
>>+GFX_CTRL_CMD_ID_CONSUME_CMD = 0x0009,   /* send interrupt
>>to psp for updating write pointer of vf */
>> GFX_CTRL_CMD_ID_DESTROY_GPCOM_RING = 0x000C, /* destroy
>GPCOM
>>ring */
>>
>> GFX_CTRL_CMD_ID_MAX = 0x000F,   /* max command ID */
>>--
>>2.25.1
>>
>>___
>>amd-gfx mailing list
>>amd-gfx@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan 

>-Original Message-
>From: amd-gfx  On Behalf Of Victor
>Zhao
>Sent: Tuesday, January 5, 2021 3:51 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhao, Victor 
>Subject: [PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds
>
>psp GFX_CTRL_CMD_ID_CONSUME_CMD different for windows and linux,
>according to psp, linux cmds are not correct.
>
>v2: only correct GFX_CTRL_CMD_ID_CONSUME_CMD.
>
>Signed-off-by: Victor Zhao 
>---
> drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>index d65a5339d354..3ba7bdfde65d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>+++ b/drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h
>@@ -47,7 +47,7 @@ enum psp_gfx_crtl_cmd_id
> GFX_CTRL_CMD_ID_DISABLE_INT = 0x0006,   /* disable PSP-to-Gfx
>interrupt */
> GFX_CTRL_CMD_ID_MODE1_RST   = 0x0007,   /* trigger the Mode 1
>reset */
> GFX_CTRL_CMD_ID_GBR_IH_SET  = 0x0008,   /* set Gbr
>IH_RB_CNTL registers */
>-GFX_CTRL_CMD_ID_CONSUME_CMD = 0x000A,   /* send interrupt
>to psp for updating write pointer of vf */
>+GFX_CTRL_CMD_ID_CONSUME_CMD = 0x0009,   /* send interrupt
>to psp for updating write pointer of vf */
> GFX_CTRL_CMD_ID_DESTROY_GPCOM_RING = 0x000C, /* destroy
>GPCOM ring */
>
> GFX_CTRL_CMD_ID_MAX = 0x000F,   /* max command ID */
>--
>2.25.1
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: For sriov multiple VF, set compute timeout to 10s

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Paul Menzel 
>Sent: Wednesday, January 6, 2021 8:54 PM
>To: Deng, Emily 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: For sriov multiple VF, set compute timeout
>to 10s
>
>Dear Emily,
>
>
>On 06.01.21 at 12:41, Emily.Deng wrote:
>
>Could you please remove the dot your name in your git configuration?
>
> git config --global user.name "Emily Deng"
Ok, will do this.
>
>For the summary, maybe amend it to:
>
> Decrease compute timeout to 10 s for sriov multiple VF
Ok, thanks, good suggestion.
>
>> For multiple VF, after engine hang,as host driver will first
>
>Nit: Please add a space after the comma.
>
>> encounter FLR, so has no meanning to set compute to 60s.
>
>meaning
>
>How can this be tested?
Set up the environment for SR-IOV.
>
>> Signed-off-by: Emily.Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index b69c34074d8d..ed36bf97df29 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3117,8 +3117,10 @@ static int
>amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>>*/
>>   adev->gfx_timeout = msecs_to_jiffies(10000);
>>   adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
>> -if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
>> +if ((amdgpu_sriov_vf(adev) && amdgpu_sriov_is_pp_one_vf(adev)) ||
>> +amdgpu_passthrough(adev))
>>   adev->compute_timeout =  msecs_to_jiffies(60000);
>> +else if (amdgpu_sriov_vf(adev))
>> +adev->compute_timeout =  msecs_to_jiffies(10000);
>
>Maybe split up the first if condition to group the condition and not he timeout
>values. At least for me that would be less confusing:
>
> if (amdgpu_sriov_vf(adev))
> adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
>msecs_to_jiffies(60000) : msecs_to_jiffies(10000)
> else if (amdgpu_passthrough(adev))
> adev->compute_timeout =  msecs_to_jiffies(60000);
>
>>   else
>>   adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
>
Good suggestion, will send out a v2 patch.
>
>Kind regards,
>
>Paul


RE: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Michel Dänzer 
>Sent: Wednesday, January 6, 2021 11:25 PM
>To: Deng, Emily ; Alex Deucher
>
>Cc: amd-gfx list 
>Subject: Re: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display
>
>On 2021-01-06 11:40 a.m., Deng, Emily wrote:
>>> From: Alex Deucher  On Tue, Jan 5, 2021 at
>>> 3:37 AM Emily.Deng  wrote:
>>>>
>>>> Limit the resolution not bigger than 16384, which means
>>>> dev->mode_info.num_crtc * common_modes[i].w not bigger than 16384.
>>>>
>>>> Signed-off-by: Emily.Deng 
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
>>>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>>> b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>>> index 2b16c8faca34..c23d37b02fd7 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>>> @@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector
>>>> *connector)  static int dce_virtual_get_modes(struct drm_connector
>>>> *connector)  {
>>>>  struct drm_device *dev = connector->dev;
>>>> +   struct amdgpu_device *adev = dev->dev_private;
>>>>  struct drm_display_mode *mode = NULL;
>>>>  unsigned i;
>>>>  static const struct mode_size { @@ -350,8 +351,10 @@ static
>>>> int dce_virtual_get_modes(struct drm_connector *connector)
>>>>  };
>>>>
>>>>  for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
>>>> -   mode = drm_cvt_mode(dev, common_modes[i].w,
>>> common_modes[i].h, 60, false, false, false);
>>>> -   drm_mode_probed_add(connector, mode);
>>>> +   if (adev->mode_info.num_crtc <= 4 ||
>>>> + common_modes[i].w <= 2560) {
>>>
>>> You are also limiting the number of crtcs here.  Intended?  Won't
>>> this break 5 or 6 crtc configs?
>>>
>>> Alex
>> Yes, it is intended,  for num_crtc bigger then 4, don't support resolution
>bigger then 2560, because of the max supported width is 16384 for xcb
>protocol.
>
>There's no such limitation with Wayland. I'd recommend against artificially
>imposing limits from X11 to the kernel.
>
>
>(As a side note, the X11 protocol limit should actually be 32768; the
>16384 limit exposed in the RANDR extension comes from the kernel driver,
>specifically drmModeGetResources's max_width/height)
It is our test and debug result that the following parameters only have 
16 bits, which will limit the resolution to 16384.
glamor_pixmap_from_fd(ScreenPtr screen,
  int fd,
  CARD16 width,
  CARD16 height,
  CARD16 stride, CARD8 depth, CARD8 bpp)
>
>
>--
>Earthling Michel Dänzer   |
>https://redhat.com
>Libre software enthusiast | Mesa and X developer


RE: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display

2021-01-06 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Alex Deucher 
>Sent: Wednesday, January 6, 2021 1:23 AM
>To: Deng, Emily 
>Cc: amd-gfx list 
>Subject: Re: [PATCH 3/3] drm/amdgpu:Limit the resolution for virtual_display
>
>On Tue, Jan 5, 2021 at 3:37 AM Emily.Deng  wrote:
>>
>> Limit the resolution not bigger than 16384, which means
>> dev->mode_info.num_crtc * common_modes[i].w not bigger than 16384.
>>
>> Signed-off-by: Emily.Deng 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> index 2b16c8faca34..c23d37b02fd7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> @@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector
>> *connector)  static int dce_virtual_get_modes(struct drm_connector
>> *connector)  {
>> struct drm_device *dev = connector->dev;
>> +   struct amdgpu_device *adev = dev->dev_private;
>> struct drm_display_mode *mode = NULL;
>> unsigned i;
>> static const struct mode_size { @@ -350,8 +351,10 @@ static
>> int dce_virtual_get_modes(struct drm_connector *connector)
>> };
>>
>> for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
>> -   mode = drm_cvt_mode(dev, common_modes[i].w,
>common_modes[i].h, 60, false, false, false);
>> -   drm_mode_probed_add(connector, mode);
>> +   if (adev->mode_info.num_crtc <= 4 ||
>> + common_modes[i].w <= 2560) {
>
>You are also limiting the number of crtcs here.  Intended?  Won't this break 5
>or 6 crtc configs?
>
>Alex
Yes, it is intended. For num_crtc bigger than 4, we don't support resolutions 
wider than 2560, because the max supported width is 16384 for the xcb protocol.
>
>> +   mode = drm_cvt_mode(dev, common_modes[i].w,
>common_modes[i].h, 60, false, false, false);
>> +   drm_mode_probed_add(connector, mode);
>> +   }
>> }
>>
>> return 0;
>> --
>> 2.25.1
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/3] drm/amdgpu: Correct the read sclk for navi10

2021-01-05 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Quan, Evan 
>Sent: Tuesday, January 5, 2021 5:07 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH 2/3] drm/amdgpu: Correct the read sclk for navi10
>
>[AMD Official Use Only - Internal Distribution Only]
>
>What's the issue with original implementation?
> And does other clock domains(e.g uclk) need this fix also?
According to the SMU team, navi10 and later use DFLL mode, so the sclk read from 
CurrClock is not correct; it needs to be read from AverageGfxclkFrequency in 
SmuMetrics_t. Will add this as a comment.
>
>-Original Message-
>From: amd-gfx  On Behalf Of
>Emily.Deng
>Sent: Tuesday, January 5, 2021 4:37 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 2/3] drm/amdgpu: Correct the read sclk for navi10
>
>Signed-off-by: Emily.Deng 
>---
> drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
>b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
>index 51e83123f72a..7ebf9588983f 100644
>--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
>+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
>@@ -1673,7 +1673,7 @@ static int navi10_read_sensor(struct smu_context *smu,
> *size = 4;
> break;
> case AMDGPU_PP_SENSOR_GFX_SCLK:
>-ret = navi10_get_current_clk_freq_by_table(smu, SMU_GFXCLK, (uint32_t
>*)data);
>+ret = navi10_get_smu_metrics_data(smu, METRICS_AVERAGE_GFXCLK,
>+(uint32_t *)data);
> *(uint32_t *)data *= 100;
> *size = 4;
> break;
>--
>2.25.1
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx



RE: [PATCH] drm/amdgpu: do optimization for psp command submit

2020-12-24 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Peng
>Ju Zhou
>Sent: Friday, December 25, 2020 3:02 PM
>To: amd-gfx@lists.freedesktop.org
>Subject: [PATCH] drm/amdgpu: do optimization for psp command submit
>
>From: pengzhou 
>
>In the psp command submit logic,
>the function msleep(1) delayed too long. Change it to usleep_range(10, 100)
>for better performance.
>
>Signed-off-by: Peng Ju Zhou 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>index 523d22db094b..8d11b34fe40e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>@@ -249,7 +249,7 @@ psp_cmd_submit_buf(struct psp_context *psp,  {
> int ret;
> int index;
>-int timeout = 2000;
>+int timeout = 2;
> bool ras_intr = false;
> bool skip_unsupport = false;
>
>@@ -282,7 +282,7 @@ psp_cmd_submit_buf(struct psp_context *psp,
> ras_intr = amdgpu_ras_intr_triggered();
> if (ras_intr)
> break;
>-msleep(1);
>+usleep_range(10, 100);
> amdgpu_asic_invalidate_hdp(psp->adev, NULL);
> }
>
>--
>2.17.1
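The reasoning behind this kind of change can be sketched numerically. msleep(1) rounds up to jiffies (so at HZ=100 a nominal 1 ms sleep can take ~10 ms), while usleep_range uses hrtimers and stays close to the requested microsecond range; a poll loop that shortens its per-iteration sleep must therefore scale its iteration count to keep a similar overall timeout budget. The numbers below are illustrative arithmetic, not values asserted by the patch.

```python
# Budget arithmetic for the psp poll loop (illustrative).
OLD_ITERS, OLD_SLEEP_US = 2000, 1000      # original: 2000 x msleep(1)
NEW_SLEEP_MAX_US = 100                    # new: usleep_range(10, 100)

old_budget_us = OLD_ITERS * OLD_SLEEP_US  # nominal 2 s budget
new_iters = old_budget_us // NEW_SLEEP_MAX_US

assert old_budget_us == 2_000_000
# To preserve the same worst-case budget, the loop count must grow:
assert new_iters == 20_000
```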
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: reduce the full access time by about 50ms

2020-12-24 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of
>pengzhou
>Sent: Thursday, December 24, 2020 2:05 PM
>To: amd-gfx@lists.freedesktop.org
>Subject: [PATCH] drm/amdgpu: reduce the full access time by about 50ms
>
>The function msleep(1) can sometimes be delayed to 10+ ms, which contributes
>a big delay during the full access time.
>
>Changing msleep(1) to usleep_range(10, 100) can reduce the full access time
>by about 50 ms.
>
>Signed-off-by: pengzhou 
>Change-Id: I151a07c55068d5c429553ef0e6668f024c0c0f3d
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>index 523d22db094b..ef69051681cf 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>@@ -282,7 +282,7 @@ psp_cmd_submit_buf(struct psp_context *psp,
> ras_intr = amdgpu_ras_intr_triggered();
> if (ras_intr)
> break;
>-msleep(1);
>+usleep_range(10, 100);
> amdgpu_asic_invalidate_hdp(psp->adev, NULL);
> }
>
>--
>2.17.1
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/2] drm/amd: Skip not used microcode loading in SRIOV

2020-09-23 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Series is Reviewed-by:  Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of
>Jingwen Chen
>Sent: Wednesday, September 23, 2020 6:07 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Chen, JingWen 
>Subject: [PATCH 2/2] drm/amd: Skip not used microcode loading in SRIOV
>
>smc, sdma, sos, ta and asd fw are not used in SRIOV. Skip them to accelerate
>sw_init for navi12.
>
>v2: skip above fw in SRIOV for vega10 and sienna_cichlid
>v3: directly skip psp fw loading in SRIOV
>Signed-off-by: Jingwen Chen 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c  | 10 ++
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   |  3 +++
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   |  3 +++
> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c   |  3 +++
> .../gpu/drm/amd/pm/powerplay/smumgr/vega10_smumgr.c  | 12 +++
>-
> drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c| 11 +++
> 6 files changed, 29 insertions(+), 13 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>index 2c66e20b2ed9..18be544d8c1e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>@@ -161,10 +161,12 @@ static int psp_sw_init(void *handle)
> struct psp_context *psp = &adev->psp;
> int ret;
>
>-ret = psp_init_microcode(psp);
>-if (ret) {
>-DRM_ERROR("Failed to load psp firmware!\n");
>-return ret;
>+if (!amdgpu_sriov_vf(adev)) {
>+ret = psp_init_microcode(psp);
>+if (ret) {
>+DRM_ERROR("Failed to load psp firmware!\n");
>+return ret;
>+}
> }
>
> ret = psp_memory_training_init(psp);
>diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>index 810635cbf4c1..86fb1eddf5a6 100644
>--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>@@ -592,6 +592,9 @@ static int sdma_v4_0_init_microcode(struct
>amdgpu_device *adev)
> struct amdgpu_firmware_info *info = NULL;
> const struct common_firmware_header *header = NULL;
>
>+if (amdgpu_sriov_vf(adev))
>+return 0;
>+
> DRM_DEBUG("\n");
>
> switch (adev->asic_type) {
>diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>index 48c95a78a173..9c72b95b7463 100644
>--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>@@ -203,6 +203,9 @@ static int sdma_v5_0_init_microcode(struct
>amdgpu_device *adev)
> const struct common_firmware_header *header = NULL;
> const struct sdma_firmware_header_v1_0 *hdr;
>
>+if (amdgpu_sriov_vf(adev))
>+return 0;
>+
> DRM_DEBUG("\n");
>
> switch (adev->asic_type) {
>diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
>b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
>index 34ccf376ee45..9f3952723c63 100644
>--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
>+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
>@@ -148,6 +148,9 @@ static int sdma_v5_2_init_microcode(struct
>amdgpu_device *adev)
> struct amdgpu_firmware_info *info = NULL;
> const struct common_firmware_header *header = NULL;
>
>+if (amdgpu_sriov_vf(adev))
>+return 0;
>+
> DRM_DEBUG("\n");
>
> switch (adev->asic_type) {
>diff --git a/drivers/gpu/drm/amd/pm/powerplay/smumgr/vega10_smumgr.c
>b/drivers/gpu/drm/amd/pm/powerplay/smumgr/vega10_smumgr.c
>index 1e222c5d91a4..daf122f24f23 100644
>--- a/drivers/gpu/drm/amd/pm/powerplay/smumgr/vega10_smumgr.c
>+++ b/drivers/gpu/drm/amd/pm/powerplay/smumgr/vega10_smumgr.c
>@@ -209,11 +209,13 @@ static int vega10_smu_init(struct pp_hwmgr
>*hwmgr)
> int ret;
> struct cgs_firmware_info info = {0};
>
>-ret = cgs_get_firmware_info(hwmgr->device,
>-CGS_UCODE_ID_SMU,
>-&info);
>-if (ret || !info.kptr)
>-return -EINVAL;
>+if (!amdgpu_sriov_vf((struct amdgpu_device *)hwmgr->adev)) {
>+ret = cgs_get_firmware_info(hwmgr->device,
>+CGS_UCODE_ID_SMU,
>+&info);
>+if (ret || !info.kptr)
>+return -EINVAL;
>+}
>
> priv = kzalloc(sizeof(struct vega10_smumgr), GFP_KERNEL);
>
>diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
>b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
>index 538e6f5e19eb..3010cb31324a 100644
>--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
>+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
>@@ -832,10 +832,13 @@ static int smu_sw_init(void *handle)
>
> smu->smu_dpm.dpm_level = AMD_DPM_FORCED_LEVEL_AUTO;
> smu->smu_dpm.requested_dpm_level =
>AMD_DPM_FORCED_LEVEL_AUTO;
>-ret = smu_init_microcode(smu);
>-if (ret) {
>-dev_err(adev->dev, "Failed to load smu firmware!\n");
>-return ret;
>+
>+if (!amdgpu_sriov_vf(adev)) {
>+ret = smu_init_microcode(smu);
>+if (ret) {
>+dev_err(adev->dev, "Failed to load smu firmware!\n");
>+return ret;
>+}
> }
>
> ret = smu_smc_table_sw_init(smu);
>--
>2.25.1
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.fre
>edesktop.org%2Fmailman%2Flistinfo%2Famd-

RE: [PATCH 2/2] drm/amd: Skip not used microcode loading in SRIOV

2020-09-22 Thread Deng, Emily
[AMD Public Use]

Hi Kevin and Hawking,
I think you are both right. But currently we don't have a good way to handle 
this; it seems to require re-architecting the whole driver, not just this patch. 
As far as this patch alone is concerned, I think it is OK.

Best wishes
Emily Deng
From: amd-gfx  On Behalf Of Wang, 
Kevin(Yang)
Sent: Tuesday, September 22, 2020 3:38 PM
To: Zhang, Hawking ; Chen, JingWen 
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/amd: Skip not used microcode loading in SRIOV


[AMD Public Use]


[AMD Public Use]

Embedding these SRIOV checks into the underlying functions in many places is 
not conducive to subsequent code optimization and maintenance.
It took a long time to clean up the SMU code before, but now some new checks 
have been introduced into the SMU code.
I think a new method should be adopted to solve this problem unless there's a 
special reason.

Best Regards,
Kevin

From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 on behalf of Zhang, Hawking 
mailto:hawking.zh...@amd.com>>
Sent: Tuesday, September 22, 2020 3:25 PM
To: Chen, JingWen mailto:jingwen.ch...@amd.com>>; 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Chen, JingWen mailto:jingwen.ch...@amd.com>>
Subject: RE: [PATCH 2/2] drm/amd: Skip not used microcode loading in SRIOV

[AMD Public Use]

1. Please do not add the amdgpu_sriov_vf check in every psp fw init_microcode 
function. psp_init_microcode is the entry point for all kinds of psp fw 
microcode initialization.
2. I'd like to get a whole picture of all the sequences you want to skip from 
the guest side so that we can have a more organized/reasonable approach to 
excluding those programming sequences for SRIOV, instead of having the 
amdgpu_sriov_vf checks patched in case by case...

Regards,
Hawking
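Hawking's first point, checking SR-IOV once at the entry point instead of inside every per-firmware init function, can be sketched like this. This is an illustrative Python model under assumed names (`init_psp_microcode`, the dict-based `adev`), not the actual driver API.

```python
# Sketch of the review feedback: a single SR-IOV check at the entry
# point guards all per-firmware init functions, so none of them need
# their own amdgpu_sriov_vf() check.

def init_psp_microcode(adev, init_fns):
    if adev.get("sriov_vf"):      # single check at the entry point
        return 0                  # host owns firmware loading under SR-IOV
    for fn in init_fns:
        ret = fn(adev)
        if ret:
            return ret
    return 0

calls = []
fns = [lambda a: calls.append("asd") or 0,
       lambda a: calls.append("sos") or 0]

# Under SR-IOV nothing is loaded; on bare metal every init runs.
assert init_psp_microcode({"sriov_vf": True}, fns) == 0 and calls == []
assert init_psp_microcode({"sriov_vf": False}, fns) == 0
assert calls == ["asd", "sos"]
```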

-Original Message-
From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Jingwen Chen
Sent: Tuesday, September 22, 2020 15:09
To: amd-gfx@lists.freedesktop.org
Cc: Chen, JingWen mailto:jingwen.ch...@amd.com>>
Subject: [PATCH 2/2] drm/amd: Skip not used microcode loading in SRIOV

smc, sdma, sos, ta and asd fw are not used in SRIOV. Skip them to accelerate 
sw_init for navi12.

v2: skip above fw in SRIOV for vega10 and sienna_cichlid
Signed-off-by: Jingwen Chen 
mailto:jingwen.ch...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c  |  9 +
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   |  3 +++
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   |  3 +++
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c   |  3 +++
 .../gpu/drm/amd/pm/powerplay/smumgr/vega10_smumgr.c  | 12 +++-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c| 11 +++
 6 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 2c66e20b2ed9..9e2038de6ea7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -2385,6 +2385,9 @@ int psp_init_asd_microcode(struct psp_context *psp,
 const struct psp_firmware_header_v1_0 *asd_hdr;
 int err = 0;

+   if (amdgpu_sriov_vf(adev))
+   return 0;
+
 if (!chip_name) {
 dev_err(adev->dev, "invalid chip name for asd microcode\n");
 return -EINVAL;
@@ -2424,6 +2427,9 @@ int psp_init_sos_microcode(struct psp_context *psp,
 const struct psp_firmware_header_v1_3 *sos_hdr_v1_3;
 int err = 0;

+   if (amdgpu_sriov_vf(adev))
+   return 0;
+
 if (!chip_name) {
 dev_err(adev->dev, "invalid chip name for sos microcode\n");
 return -EINVAL;
@@ -2558,6 +2564,9 @@ int psp_init_ta_microcode(struct psp_context *psp,
 int err = 0;
 int ta_index = 0;

+   if (amdgpu_sriov_vf(adev))
+   return 0;
+
 if (!chip_name) {
 dev_err(adev->dev, "invalid chip name for ta microcode\n");
 return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 810635cbf4c1..86fb1eddf5a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -592,6 +592,9 @@ static int sdma_v4_0_init_microcode(struct amdgpu_device 
*adev)
 struct amdgpu_firmware_info *info = NULL;
 const struct common_firmware_header *header = NULL;

+   if (amdgpu_sriov_vf(adev))
+   return 0;
+
 DRM_DEBUG("\n");

 switch (adev->asic_type) {
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 48c95a78a173..9c72b95b7463 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -203,6 +203,9 @@ static int sdma_v5_0_init_microcode(struct 

RE: [PATCH] drm/amdgpu/sriov: Enable the mcbp parameter for sriov

2020-09-21 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Monk,
Just for debugging, we don't need to remove those amdgpu_sriov_vf checks; it 
won't affect disabling mcbp.

Best wishes
Emily Deng



>-Original Message-
>From: Liu, Monk 
>Sent: Monday, September 21, 2020 4:02 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH] drm/amdgpu/sriov: Enable the mcbp parameter for sriov
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Looks like you missed many places, e.g.:
>
>866 if (amdgpu_mcbp || amdgpu_sriov_vf(adev)) {
>   867 bo_va = fpriv->csa_va;
>   868 BUG_ON(!bo_va);
>   869 r = amdgpu_vm_bo_update(adev, bo_va, false);
>   870 if (r)
>   871 return r;
>   872
>   873 r = amdgpu_sync_vm_fence(&p->job->sync, bo_va->last_pt_update);
>   874 if (r)
>   875 return r;
>   876 }
>
>
>   949 if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
>   950 (amdgpu_mcbp || amdgpu_sriov_vf(adev))) {
>   951 if (chunk_ib->flags & AMDGPU_IB_FLAG_PREEMPT) {
>   952 if (chunk_ib->flags & AMDGPU_IB_FLAG_CE)
>   953 ce_preempt++;
>   954 else
>   955 de_preempt++;
>   956 }
>   957
>   958 /* each GFX command submit allows 0 or 1 IB preemptible for 
> CE
>& DE */
>   959 if (ce_preempt > 1 || de_preempt > 1)
>   960 return -EINVAL;
>   961 }
>
>
>  2029 r = amdgpu_device_wb_init(adev);
>  2030 if (r) {
>  2031 DRM_ERROR("amdgpu_device_wb_init failed %d\n", r);
>  2032 goto init_failed;
>  2033 }
>  2034 adev->ip_blocks[i].status.hw = true;
>  2035
>  2036 /* right after GMC hw init, we create CSA */
>  2037 if (amdgpu_mcbp || amdgpu_sriov_vf(adev)) {
>  2038 r = amdgpu_allocate_static_csa(adev, &adev->virt.csa_obj,
> >virt.csa_obj,
>  2039 AMDGPU_GEM_DOMAIN_VRAM,
>  2040 AMDGPU_CSA_SIZE);
>  2041 if (r) {
>  2042 DRM_ERROR("allocate CSA failed %d\n", r);
>  2043 goto init_failed;
>  2044 }
>  2045 }
>  2046 }
>  2047 }
>
>
>  4587 if ((amdgpu_sriov_vf(ring->adev) || amdgpu_mcbp) && (ib->flags &
>AMDGPU_IB_FLAG_PREEMPT)) {
>  4588 control |= INDIRECT_BUFFER_PRE_ENB(1);
>  4589
>  4590 if (flags & AMDGPU_IB_PREEMPTED)
>  4591 control |= INDIRECT_BUFFER_PRE_RESUME(1);
>  4592
>  4593 if (!(ib->flags & AMDGPU_IB_FLAG_CE) && vmid)
>  4594 gfx_v10_0_ring_emit_de_meta(ring,
>  4595 (!amdgpu_sriov_vf(ring->adev) && flags &
>AMDGPU_IB_PREEMPTED) ? true : false);
>  4596 }
>
>
>
>  4742 static void gfx_v10_0_ring_emit_cntxcntl(struct amdgpu_ring *ring,
>  4743  uint32_t flags)
>  4744 {
>  4745 uint32_t dw2 = 0;
>  4746
>  4747 if (amdgpu_mcbp || amdgpu_sriov_vf(ring->adev))
>  4748 gfx_v10_0_ring_emit_ce_meta(ring,
>  4749 (!amdgpu_sriov_vf(ring->adev) && flags &
>AMDGPU_IB_PREEMPTED) ? true : false);
>  4750
>  4751 dw2 |= 0x8000; /* set load_enable otherwise this package is just
>NOPs */
>  4752 if (flags & AMDGPU_HAVE_CTX_SWITCH) {
>
>72
> 73 /* don't enable OS preemption on SDMA under SRIOV */
> 74 if (amdgpu_sriov_vf(adev) || vmid == 0 || !amdgpu_mcbp)
> 75 return 0;
> 76
> 77 r = amdgpu_sdma_get_index_from_ring(ring, &index);
> 78
> 79 if (r || index > 31)
> 80 csa_mc_addr = 0;
>
>
>You need to change all the places that refer to "amdgpu_mcbp" and remove the
>condition " || amdgpu_sriov_vf()"
>
>_
>Monk Liu|GPU Virtualization Team |AMD
>
>
>-Original Message-
>From: amd-gfx  On Behalf Of
>Emily.Deng
>Sent: Monday, September 21, 2020 3:55 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu/sriov: Enable the mcbp parameter for sriov
>
>For debugging convenience, reuse the mcbp parameter for sriov mcbp
>
>Signed-off-by: Emily.Deng 
>Change-Id: If1222b2c050376feefb8fed4be58b4b87d36bd77
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 ++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 5 +++--
> driver

RE: [PATCH 1/2] drm/amdgpu/sriov: Add one parameter for mcbp debug

2020-09-21 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Monk,
Good suggestion, will send out patch again.

Best wishes
Emily Deng



>-Original Message-
>From: Liu, Monk 
>Sent: Monday, September 21, 2020 1:37 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH 1/2] drm/amdgpu/sriov: Add one parameter for mcbp
>debug
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Hi Emily
>
>There is already a amdgpu_mcbp parameter there, can you try to leverage that
>one ?
>
>e.g.:
>we refactor our driver's code and reduce the checking logic  from  "if
>(amdgpu_mcbp || amdgpu_sriov_vf(adev))" to something like
>"if( amdgpu_mcbp) "
>
>therefore:
>1) You need to force set "amdgpu_mcbp" to true in the driver's init stage once
>the "SRIOV" is detected *and* "amdgpu_mcbp" is not set to "0";
>2) for Bare-metal, we just leave "amdgpu_mcbp" as the value it was
>3) we interpret  "amdgpu_mcbp"  as:
>0: force disable, it will be "disable" for both BM and SRIOV
>1:  force enable, auto (default), it will be "enable" for both BM and SRIOV
>
>This way you can disable MCBP in both SRIOV and BM via that existing
>parameter instead of introducing a duplicated one ...
>
>_
>Monk Liu|GPU Virtualization Team |AMD
>
>
>-Original Message-
>From: amd-gfx  On Behalf Of
>Emily.Deng
>Sent: Friday, September 18, 2020 11:27 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 1/2] drm/amdgpu/sriov: Add one parameter for mcbp debug
>
>For debugging convenience, add the sriov_mcbp parameter.
>
>Signed-off-by: Emily.Deng 
>Change-Id: I84019eb4344e00d85b2ecc853145aabb312412fe
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +
>drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 3 ++-
>drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 +-
> 4 files changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>index 13f92dea182a..a255fbf4d370 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>@@ -183,6 +183,7 @@ extern uint amdgpu_ras_mask;
> extern int amdgpu_bad_page_threshold;
> extern int amdgpu_async_gfx_ring;
> extern int amdgpu_mcbp;
>+extern int amdgpu_sriov_mcbp;
> extern int amdgpu_discovery;
> extern int amdgpu_mes;
> extern int amdgpu_noretry;
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>index 3f07d1475bd2..b0b2f0f7be94 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>@@ -145,6 +145,7 @@ uint amdgpu_dc_feature_mask = 0;
> uint amdgpu_dc_debug_mask = 0;
> int amdgpu_async_gfx_ring = 1;
> int amdgpu_mcbp = 0;
>+int amdgpu_sriov_mcbp = 1;
> int amdgpu_discovery = -1;
> int amdgpu_mes = 0;
> int amdgpu_noretry;
>@@ -578,6 +579,14 @@ MODULE_PARM_DESC(mcbp,  "Enable Mid-command
>buffer preemption (0 = disabled (default), 1 = enabled)");
>module_param_named(mcbp, amdgpu_mcbp, int, 0444);
>
>+/**
>+ * DOC: sriov_mcbp (int)
>+ * It is used to enable mid command buffer preemption. (0 = disabled, 1
>+= enabled(default))  */ MODULE_PARM_DESC(sriov_mcbp, "Enable sriov
>+Mid-command buffer preemption (0 = disabled (default), 1 = enabled)");
>+module_param_named(sriov_mcbp, amdgpu_sriov_mcbp, int, 0444);
>+
> /**
>  * DOC: discovery (int)
>  * Allow driver to discover hardware IP information from IP Discovery table at
>the top of VRAM.
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>index 2f53fa0ae9a6..ca0e17688bdf 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>@@ -236,7 +236,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring,
>unsigned num_ibs,
>
> for (i = 0; i < num_ibs; ++i) {
> ib = [i];
>-
>+if (!amdgpu_sriov_mcbp)
>+ib->flags &= ~AMDGPU_IB_FLAG_PREEMPT;
> /* drop preamble IBs if we don't have a context switch */  if ((ib->flags &
>AMDGPU_IB_FLAG_PREAMBLE) &&
> skip_preamble &&
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>index d7f37cb92a97..156e76a5a6e0 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>@@ -742,7 +742,7 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
> dev_info.ids_flags = 0;
> if (adev->flags & AMD_IS_APU)
> dev_i

RE: [PATCH 1/2] drm/amdgpu/sriov: Add one parameter for mcbp debug

2020-09-20 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping .

>-Original Message-
>From: Emily.Deng 
>Sent: Friday, September 18, 2020 11:27 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 1/2] drm/amdgpu/sriov: Add one parameter for mcbp debug
>
>For debugging convenience, add the sriov_mcbp parameter.
>
>Signed-off-by: Emily.Deng 
>Change-Id: I84019eb4344e00d85b2ecc853145aabb312412fe
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +
>drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 3 ++-
>drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 +-
> 4 files changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>index 13f92dea182a..a255fbf4d370 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>@@ -183,6 +183,7 @@ extern uint amdgpu_ras_mask;
> extern int amdgpu_bad_page_threshold;
> extern int amdgpu_async_gfx_ring;
> extern int amdgpu_mcbp;
>+extern int amdgpu_sriov_mcbp;
> extern int amdgpu_discovery;
> extern int amdgpu_mes;
> extern int amdgpu_noretry;
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>index 3f07d1475bd2..b0b2f0f7be94 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>@@ -145,6 +145,7 @@ uint amdgpu_dc_feature_mask = 0;
> uint amdgpu_dc_debug_mask = 0;
> int amdgpu_async_gfx_ring = 1;
> int amdgpu_mcbp = 0;
>+int amdgpu_sriov_mcbp = 1;
> int amdgpu_discovery = -1;
> int amdgpu_mes = 0;
> int amdgpu_noretry;
>@@ -578,6 +579,14 @@ MODULE_PARM_DESC(mcbp,
> "Enable Mid-command buffer preemption (0 = disabled (default), 1 =
>enabled)");  module_param_named(mcbp, amdgpu_mcbp, int, 0444);
>
>+/**
>+ * DOC: sriov_mcbp (int)
>+ * It is used to enable mid command buffer preemption. (0 = disabled, 1
>+= enabled(default))  */ MODULE_PARM_DESC(sriov_mcbp,
>+"Enable sriov Mid-command buffer preemption (0 = disabled (default),
>1
>+= enabled)"); module_param_named(sriov_mcbp, amdgpu_sriov_mcbp, int,
>+0444);
>+
> /**
>  * DOC: discovery (int)
>  * Allow driver to discover hardware IP information from IP Discovery table at
>the top of VRAM.
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>index 2f53fa0ae9a6..ca0e17688bdf 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>@@ -236,7 +236,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring,
>unsigned num_ibs,
>
> for (i = 0; i < num_ibs; ++i) {
> ib = [i];
>-
>+if (!amdgpu_sriov_mcbp)
>+ib->flags &= ~AMDGPU_IB_FLAG_PREEMPT;
> /* drop preamble IBs if we don't have a context switch */
> if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> skip_preamble &&
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>index d7f37cb92a97..156e76a5a6e0 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>@@ -742,7 +742,7 @@ static int amdgpu_info_ioctl(struct drm_device *dev,
>void *data, struct drm_file
> dev_info.ids_flags = 0;
> if (adev->flags & AMD_IS_APU)
> dev_info.ids_flags |= AMDGPU_IDS_FLAGS_FUSION;
>-if (amdgpu_mcbp || amdgpu_sriov_vf(adev))
>+if (amdgpu_mcbp || (amdgpu_sriov_vf(adev) &&
>amdgpu_sriov_mcbp))
> dev_info.ids_flags |=
>AMDGPU_IDS_FLAGS_PREEMPTION;
> if (amdgpu_is_tmz(adev))
> dev_info.ids_flags |= AMDGPU_IDS_FLAGS_TMZ;
>--
>2.25.1



RE: [PATCH] drm/amdgpu: Fix dead lock issue for vblank

2020-09-20 Thread Deng, Emily
Hi Hawking,
It already exists in the original file. In the previous patch I moved it to 
another place; for this V2 patch, I just use:
static int dce_virtual_pageflip(struct amdgpu_device *adev,
>+  unsigned crtc_id);


>-Original Message-
>From: Zhang, Hawking 
>Sent: Friday, September 18, 2020 8:12 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH] drm/amdgpu: Fix dead lock issue for vblank
>
>[AMD Public Use]
>
>Hi Emily,
>
>I can't find the implementation of dce_virtual_pageflip in the patch. Is it
>dropped by accident?
>
>Regards,
>Hawking
>
>-Original Message-
>From: amd-gfx  On Behalf Of
>Emily.Deng
>Sent: Friday, September 18, 2020 18:13
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu: Fix dead lock issue for vblank
>
>Always start the vblank timer, but only call the vblank function when vblank
>is enabled.
>
>This is used to fix the dead lock issue.
>When drm_crtc_vblank_off wants to disable vblank, it first takes event_lock and
>then calls hrtimer_cancel, but hrtimer_cancel has to wait for the timer handler
>function to finish.
>The timer handler also wants to acquire event_lock in drm_handle_vblank.
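The shape of the fix can be illustrated with a small Python model: the timer keeps running for the lifetime of the CRTC and its handler merely checks an enabled flag under the lock, so the disable path no longer has to cancel the timer while holding event_lock (the cancel moves to hw_fini, outside that lock). Names and structure here are illustrative, not DRM API.

```python
import threading

event_lock = threading.Lock()
state = {"enabled": True, "handled": 0}

def timer_handler():
    # Old scheme: the disable path held event_lock and waited for this
    # handler, which also wants event_lock -> deadlock. New scheme:
    # the handler just checks the flag and skips work when disabled.
    with event_lock:
        if state["enabled"]:
            state["handled"] += 1

def disable_vblank():
    with event_lock:
        state["enabled"] = False   # no timer cancel under the lock

timer_handler()                    # vblank enabled: work is done
disable_vblank()
timer_handler()                    # timer still fires, but does nothing
assert state["handled"] == 1
```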
>
>Signed-off-by: Emily.Deng 
>Change-Id: I7d3cfb1202cd030fdcdec3e7483fcc4c9fa8db70
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 35 
> 1 file changed, 17 insertions(+), 18 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index cc93577dee03..469c05fd43d5 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -47,6 +47,9 @@ static void dce_virtual_set_display_funcs(struct
>amdgpu_device *adev);  static void dce_virtual_set_irq_funcs(struct
>amdgpu_device *adev);  static int dce_virtual_connector_encoder_init(struct
>amdgpu_device *adev,
> int index);
>+static int dce_virtual_pageflip(struct amdgpu_device *adev,
>+  unsigned crtc_id);
>+static enum hrtimer_restart dce_virtual_vblank_timer_handle(struct
>+hrtimer *vblank_timer);
> static void dce_virtual_set_crtc_vblank_interrupt_state(struct amdgpu_device
>*adev,
>   int crtc,
>   enum
>amdgpu_interrupt_state state); @@ -247,6 +250,11 @@ static int
>dce_virtual_crtc_init(struct amdgpu_device *adev, int index)
>   amdgpu_crtc->vsync_timer_enabled = AMDGPU_IRQ_STATE_DISABLE;
>   drm_crtc_helper_add(_crtc->base,
>_virtual_crtc_helper_funcs);
>
>+  hrtimer_init(_crtc->vblank_timer, CLOCK_MONOTONIC,
>HRTIMER_MODE_REL);
>+  hrtimer_set_expires(_crtc->vblank_timer,
>DCE_VIRTUAL_VBLANK_PERIOD);
>+  amdgpu_crtc->vblank_timer.function =
>dce_virtual_vblank_timer_handle;
>+  hrtimer_start(_crtc->vblank_timer,
>+DCE_VIRTUAL_VBLANK_PERIOD, HRTIMER_MODE_REL);
>   return 0;
> }
>
>@@ -476,7 +484,7 @@ static int dce_virtual_hw_fini(void *handle)
>
>   for (i = 0; i < adev->mode_info.num_crtc; i++)
>   if (adev->mode_info.crtcs[i])
>-  dce_virtual_set_crtc_vblank_interrupt_state(adev, i,
>AMDGPU_IRQ_STATE_DISABLE);
>+  hrtimer_cancel(&adev->mode_info.crtcs[i]-
>>vblank_timer);
>
>   return 0;
> }
>@@ -698,9 +706,15 @@ static enum hrtimer_restart
>dce_virtual_vblank_timer_handle(struct hrtimer *vbla
>  struct amdgpu_crtc,
>vblank_timer);
>   struct drm_device *ddev = amdgpu_crtc->base.dev;
>   struct amdgpu_device *adev = drm_to_adev(ddev);
>+  struct amdgpu_irq_src *source = adev-
>>irq.client[AMDGPU_IRQ_CLIENTID_LEGACY].sources
>+  [VISLANDS30_IV_SRCID_SMU_DISP_TIMER2_TRIGGER];
>+  int irq_type = amdgpu_display_crtc_idx_to_irq_type(adev,
>+  amdgpu_crtc->crtc_id);
>
>-  drm_handle_vblank(ddev, amdgpu_crtc->crtc_id);
>-  dce_virtual_pageflip(adev, amdgpu_crtc->crtc_id);
>+  if (amdgpu_irq_enabled(adev, source, irq_type)) {
>+  drm_handle_vblank(ddev, amdgpu_crtc->crtc_id);
>+  dce_virtual_pageflip(adev, amdgpu_crtc->crtc_id);
>+  }
>   hrtimer_start(vblank_timer, DCE_VIRTUAL_VBLANK_PERIOD,
> HRTIMER_MODE_REL);
>
>@@ -716,21 +730,6 @@ static void
>dce_virtual_set_crtc_vblank_interrupt_state(struct amdgpu_device *ad
>   return;
>   }
>
>-  if (state && !adev->mode_info.crtcs[crtc]->vsync_timer_enabled) {

RE: [PATCH] drm/amdgpu: Fix dead lock issue for vblank

2020-09-18 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Hawking,
You are right, my base is not the tip. I have updated now. Please help to 
review again.

Best wishes
Emily Deng



>-Original Message-
>From: Emily.Deng 
>Sent: Friday, September 18, 2020 6:13 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu: Fix dead lock issue for vblank
>
>Always start the vblank timer, but only call the vblank functions when vblank
>is enabled.
>
>This fixes a deadlock:
>when drm_crtc_vblank_off wants to disable vblank, it first takes event_lock and
>then calls hrtimer_cancel, but hrtimer_cancel has to wait for the timer handler
>to finish.
>The timer handler also wants to acquire event_lock in drm_handle_vblank.
>
>Signed-off-by: Emily.Deng 
>Change-Id: I7d3cfb1202cd030fdcdec3e7483fcc4c9fa8db70
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 35 
> 1 file changed, 17 insertions(+), 18 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index cc93577dee03..469c05fd43d5 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -47,6 +47,9 @@ static void dce_virtual_set_display_funcs(struct
>amdgpu_device *adev);  static void dce_virtual_set_irq_funcs(struct
>amdgpu_device *adev);  static int dce_virtual_connector_encoder_init(struct
>amdgpu_device *adev,
>   int index);
>+static int dce_virtual_pageflip(struct amdgpu_device *adev,
>+unsigned crtc_id);
>+static enum hrtimer_restart dce_virtual_vblank_timer_handle(struct
>+hrtimer *vblank_timer);
> static void dce_virtual_set_crtc_vblank_interrupt_state(struct amdgpu_device
>*adev,
> int crtc,
> enum
>amdgpu_interrupt_state state); @@ -247,6 +250,11 @@ static int
>dce_virtual_crtc_init(struct amdgpu_device *adev, int index)
> amdgpu_crtc->vsync_timer_enabled = AMDGPU_IRQ_STATE_DISABLE;
> drm_crtc_helper_add(&amdgpu_crtc->base,
>&dce_virtual_crtc_helper_funcs);
>
>+hrtimer_init(&amdgpu_crtc->vblank_timer, CLOCK_MONOTONIC,
>HRTIMER_MODE_REL);
>+hrtimer_set_expires(&amdgpu_crtc->vblank_timer,
>DCE_VIRTUAL_VBLANK_PERIOD);
>+amdgpu_crtc->vblank_timer.function =
>dce_virtual_vblank_timer_handle;
>+hrtimer_start(&amdgpu_crtc->vblank_timer,
>+  DCE_VIRTUAL_VBLANK_PERIOD, HRTIMER_MODE_REL);
> return 0;
> }
>
>@@ -476,7 +484,7 @@ static int dce_virtual_hw_fini(void *handle)
>
> for (i = 0; i < adev->mode_info.num_crtc; i++)
> if (adev->mode_info.crtcs[i])
>-dce_virtual_set_crtc_vblank_interrupt_state(adev, i,
>AMDGPU_IRQ_STATE_DISABLE);
>+hrtimer_cancel(&adev->mode_info.crtcs[i]-
>>vblank_timer);
>
> return 0;
> }
>@@ -698,9 +706,15 @@ static enum hrtimer_restart
>dce_virtual_vblank_timer_handle(struct hrtimer *vbla
>struct amdgpu_crtc,
>vblank_timer);
> struct drm_device *ddev = amdgpu_crtc->base.dev;
> struct amdgpu_device *adev = drm_to_adev(ddev);
>+struct amdgpu_irq_src *source = adev-
>>irq.client[AMDGPU_IRQ_CLIENTID_LEGACY].sources
>+[VISLANDS30_IV_SRCID_SMU_DISP_TIMER2_TRIGGER];
>+int irq_type = amdgpu_display_crtc_idx_to_irq_type(adev,
>+amdgpu_crtc->crtc_id);
>
>-drm_handle_vblank(ddev, amdgpu_crtc->crtc_id);
>-dce_virtual_pageflip(adev, amdgpu_crtc->crtc_id);
>+if (amdgpu_irq_enabled(adev, source, irq_type)) {
>+drm_handle_vblank(ddev, amdgpu_crtc->crtc_id);
>+dce_virtual_pageflip(adev, amdgpu_crtc->crtc_id);
>+}
> hrtimer_start(vblank_timer, DCE_VIRTUAL_VBLANK_PERIOD,
>   HRTIMER_MODE_REL);
>
>@@ -716,21 +730,6 @@ static void
>dce_virtual_set_crtc_vblank_interrupt_state(struct amdgpu_device *ad
> return;
> }
>
>-if (state && !adev->mode_info.crtcs[crtc]->vsync_timer_enabled) {
>-DRM_DEBUG("Enable software vsync timer\n");
>-hrtimer_init(&adev->mode_info.crtcs[crtc]->vblank_timer,
>- CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>-hrtimer_set_expires(&adev->mode_info.crtcs[crtc]-
>>vblank_timer,
>-DCE_VIRTUAL_VBLANK_PERIOD);
>-adev->mode_info.crtcs[crtc]->vblank_timer.function =
>-dce_virtual_vblank_timer_handle;
>-hrtimer_start(&adev->mode_info.crtcs[crtc]->vblank_timer,
>-  DCE_VIRTUAL_VBLANK_PERIOD,
>HRTIMER_MODE_REL);
>-} else if (!state && adev->mode_info.crtcs[crtc]->vsync_timer_enabled)
>{
>-DRM_DEBUG("Disable software vsync timer\n");
>-hrtimer_cancel(&adev->mode_info.crtcs[crtc]->vblank_timer);
>-}
>-
> adev->mode_info.crtcs[crtc]->vsync_timer_enabled = state;
> DRM_DEBUG("[FM]set crtc %d vblank interrupt state %d\n", crtc,
>state);  }
>--
>2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/2] drm/amdgpu: Fix dead lock issue for vblank

2020-09-18 Thread Deng, Emily
Thanks, will double check.

Best wishes
Emily Deng



>-Original Message-
>From: Zhang, Hawking 
>Sent: Friday, September 18, 2020 4:20 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH 2/2] drm/amdgpu: Fix dead lock issue for vblank
>
>[AMD Public Use]
>
>+  spin_lock_irqsave(&adev->ddev->event_lock, flags);
>
>Are you sure you used the latest code base? I think recently we already switch
>to adev_to_drm(adev).
>
>Could you please double check?
>
>Regards,
>Hawking
>
>-Original Message-
>From: amd-gfx  On Behalf Of
>Emily.Deng
>Sent: Friday, September 18, 2020 11:27
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 2/2] drm/amdgpu: Fix dead lock issue for vblank
>
>Always start the vblank timer, but only call the vblank functions when vblank
>is enabled.
>
>This fixes a deadlock:
>when drm_crtc_vblank_off wants to disable vblank, it first takes event_lock and
>then calls hrtimer_cancel, but hrtimer_cancel has to wait for the timer handler
>to finish.
>The timer handler also wants to acquire event_lock in drm_handle_vblank.
>
>Signed-off-by: Emily.Deng 
>Change-Id: I7d3cfb1202cd030fdcdec3e7483fcc4c9fa8db70
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 155 +++
> 1 file changed, 77 insertions(+), 78 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index cc93577dee03..8c02ab74c1de 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -226,6 +226,74 @@ static const struct drm_crtc_helper_funcs
>dce_virtual_crtc_helper_funcs = {
>   .get_scanout_position = amdgpu_crtc_get_scanout_position,  };
>
>+static int dce_virtual_pageflip(struct amdgpu_device *adev,
>+  unsigned crtc_id)
>+{
>+  unsigned long flags;
>+  struct amdgpu_crtc *amdgpu_crtc;
>+  struct amdgpu_flip_work *works;
>+
>+  if (crtc_id >= adev->mode_info.num_crtc) {
>+  DRM_ERROR("invalid pageflip crtc %d\n", crtc_id);
>+  return -EINVAL;
>+  }
>+
>+  amdgpu_crtc = adev->mode_info.crtcs[crtc_id];
>+
>+  /* IRQ could occur when in initial stage */
>+  if (amdgpu_crtc == NULL)
>+  return 0;
>+
>+  spin_lock_irqsave(&adev->ddev->event_lock, flags);
>+  works = amdgpu_crtc->pflip_works;
>+  if (amdgpu_crtc->pflip_status != AMDGPU_FLIP_SUBMITTED) {
>+  DRM_DEBUG_DRIVER("amdgpu_crtc->pflip_status = %d != "
>+  "AMDGPU_FLIP_SUBMITTED(%d)\n",
>+  amdgpu_crtc->pflip_status,
>+  AMDGPU_FLIP_SUBMITTED);
>+  spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
>+  return 0;
>+  }
>+
>+  /* page flip completed. clean up */
>+  amdgpu_crtc->pflip_status = AMDGPU_FLIP_NONE;
>+  amdgpu_crtc->pflip_works = NULL;
>+
>+  /* wake up userspace */
>+  if (works->event)
>+  drm_crtc_send_vblank_event(&amdgpu_crtc->base, works-
>>event);
>+
>+  spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
>+
>+  drm_crtc_vblank_put(&amdgpu_crtc->base);
>+  amdgpu_bo_unref(&works->old_abo);
>+  kfree(works->shared);
>+  kfree(works);
>+
>+  return 0;
>+}
>+
>+static enum hrtimer_restart dce_virtual_vblank_timer_handle(struct
>+hrtimer *vblank_timer) {
>+  struct amdgpu_crtc *amdgpu_crtc = container_of(vblank_timer,
>+ struct amdgpu_crtc,
>vblank_timer);
>+  struct drm_device *ddev = amdgpu_crtc->base.dev;
>+  struct amdgpu_device *adev = ddev->dev_private;
>+  struct amdgpu_irq_src *source = adev-
>>irq.client[AMDGPU_IRQ_CLIENTID_LEGACY].sources
>+  [VISLANDS30_IV_SRCID_SMU_DISP_TIMER2_TRIGGER];
>+  int irq_type = amdgpu_display_crtc_idx_to_irq_type(adev,
>+  amdgpu_crtc->crtc_id);
>+
>+  if (amdgpu_irq_enabled(adev, source, irq_type)) {
>+  drm_handle_vblank(ddev, amdgpu_crtc->crtc_id);
>+  dce_virtual_pageflip(adev, amdgpu_crtc->crtc_id);
>+  }
>+  hrtimer_start(vblank_timer, ktime_set(0,
>DCE_VIRTUAL_VBLANK_PERIOD),
>+HRTIMER_MODE_REL);
>+
>+  return HRTIMER_NORESTART;
>+}
>+
> static int dce_virtual_crtc_init(struct amdgpu_device *adev, int index)  {
>   struct amdgpu_crtc *amdgpu_crtc;
>@@ -247,6 +315,14 @@ static int dce_virtual_crtc_in

RE: [PATCH] drm/amd/pm: Skip smu_post_init in SRIOV

2020-09-17 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of
>Jingwen Chen
>Sent: Thursday, September 17, 2020 5:43 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Chen, JingWen 
>Subject: [PATCH] drm/amd/pm: Skip smu_post_init in SRIOV
>
>smu_post_init needs to enable SMU features, which requires virtualization to
>be off. Skip it since this feature is not used in SRIOV.
>
>v2: move the check to the early stage of smu_post_init.
>
>v3: fix typo
>
>Signed-off-by: Jingwen Chen 
>---
> drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
>b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
>index a027c7fdad56..05cb1fdd15ce 100644
>--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
>+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
>@@ -2631,6 +2631,9 @@ static int navi10_post_smu_init(struct smu_context
>*smu)
> uint64_t feature_mask = 0;
> int ret = 0;
>
>+if (amdgpu_sriov_vf(adev))
>+return 0;
>+
> /* For Naiv1x, enable these features only after DAL initialization */
> if (adev->pm.pp_feature & PP_SOCCLK_DPM_MASK)
> feature_mask |= FEATURE_MASK(FEATURE_DPM_SOCCLK_BIT);
>--
>2.25.1
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 1/1] drm/amdgpu: rework ip block reinit for sriov

2020-08-27 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Nirmoy,
I still think the original logic is clearer.

Best wishes
Emily Deng



>-Original Message-
>From: Das, Nirmoy 
>Sent: Thursday, August 27, 2020 11:19 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Liu, Monk
>; Gu, JiaWei (Will) ; Deng, Emily
>; Das, Nirmoy 
>Subject: [PATCH 1/1] drm/amdgpu: rework ip block reinit for sriov
>
>This patch removes some unwanted code duplication and simplifies sriov's ip
>block reinit logic.
>
>Signed-off-by: Nirmoy Das 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 117 +++--
> 1 file changed, 60 insertions(+), 57 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 696a61cc3ac6..0db6db03bcd3 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2587,77 +2587,80 @@ int amdgpu_device_ip_suspend(struct
>amdgpu_device *adev)
> return r;
> }
>
>-static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>+/** amdgpu_device_is_early_ip_block_sriov - check for early ip_blocks
>+ *
>+ * @ip_block: ip block that need to be check
>+ *
>+ * Returns a tri-state value for a given ip block.
>+ * If an ip block requires early reinit under sriov, return 1; otherwise 0.
>+ * Return -1 on an invalid ip block.
>+ *
>+ */
>+
>+static int
>+amdgpu_device_is_early_ip_block_sriov(const enum amd_ip_block_type
>+ip_block)
> {
>-int i, r;
>+switch (ip_block) {
>+/* early ip blocks */
>+case AMD_IP_BLOCK_TYPE_GMC:
>+case AMD_IP_BLOCK_TYPE_COMMON:
>+case AMD_IP_BLOCK_TYPE_PSP:
>+case AMD_IP_BLOCK_TYPE_IH:
>+return 1;
>+/* late ip blocks */
>+case AMD_IP_BLOCK_TYPE_SMC:
>+case AMD_IP_BLOCK_TYPE_DCE:
>+case AMD_IP_BLOCK_TYPE_GFX:
>+case AMD_IP_BLOCK_TYPE_SDMA:
>+case AMD_IP_BLOCK_TYPE_UVD:
>+case AMD_IP_BLOCK_TYPE_VCE:
>+case AMD_IP_BLOCK_TYPE_VCN:
>+return 0;
>+/* invalid ip block */
>+default:
>+return -1;
>+}
>+}
>
>-static enum amd_ip_block_type ip_order[] = {
>-AMD_IP_BLOCK_TYPE_GMC,
>-AMD_IP_BLOCK_TYPE_COMMON,
>-AMD_IP_BLOCK_TYPE_PSP,
>-AMD_IP_BLOCK_TYPE_IH,
>-};
>+static int amdgpu_device_ip_reinit_sriov(struct amdgpu_device *adev,
>+ const int is_early)
>+{
>+int i;
>
> for (i = 0; i < adev->num_ip_blocks; i++) {
>-int j;
>+int r = 0;
>+bool init_ip;
> struct amdgpu_ip_block *block;
>+enum amd_ip_block_type ip_block;
>
> block = &adev->ip_blocks[i];
> block->status.hw = false;
>+ip_block = block->version->type;
>+init_ip = (is_early ==
>+   amdgpu_device_is_early_ip_block_sriov(ip_block));
>
>-for (j = 0; j < ARRAY_SIZE(ip_order); j++) {
>-
>-if (block->version->type != ip_order[j] ||
>-!block->status.valid)
>-continue;
>+if (!init_ip ||
>+(!is_early && block->status.hw) ||
>+!block->status.valid)
>+continue;
>
>-r = block->version->funcs->hw_init(adev);
>-DRM_INFO("RE-INIT-early: %s %s\n", block->version-
>>funcs->name, r?"failed":"succeeded");
>-if (r)
>-return r;
>-block->status.hw = true;
>+if (init_ip && (ip_block == AMD_IP_BLOCK_TYPE_SMC)) {
>+r = block->version->funcs->resume(adev);
>+goto show_log;
> }
>-}
>-
>-return 0;
>-}
>
>-static int amdgpu_device_ip_reinit_late_sriov(struct amdgpu_device *adev) -{
>-int i, r;
>-
>-static enum amd_ip_block_type ip_order[] = {
>-AMD_IP_BLOCK_TYPE_SMC,
>-AMD_IP_BLOCK_TYPE_DCE,
>-AMD_IP_BLOCK_TYPE_GFX,
>-AMD_IP_BLOCK_TYPE_SDMA,
>-AMD_IP_BLOCK_TYPE_UVD,
>-AMD_IP_BLOCK_TYPE_VCE,
>-AMD_IP_BLOCK_TYPE_VCN
>-};
>-
>-for (i = 0; i < ARRAY_SIZE(ip_order); i++) {
>-int j;
>-struct amdgpu_ip_block *block;
>+if (init_ip)
>+r = block->version->funcs->hw_init(adev);
>
>-for (j = 0; j < adev->num_ip_blocks; j++) {
>-block = &adev->ip_blocks[j];
>+show_log:
>+DRM_INFO("RE-INIT-%s: %s %s\n", is_early ? "early":"late",
>+ block->version->funcs->name, r ?
>"failed":"succeeded");
>
>-if (block->version->type != ip_order[i] ||
>-!block->status.valid ||
>-block->status.hw)
>-continue;
>+if (r)
>+return r;
>
>-if (block->version->type ==
>AMD_IP_BLOCK_TYPE_SMC)
>-r = block->version->funcs->resume(adev);
>-else
>-r = block->version->funcs->hw_init(adev);
>+block->status.hw = true;
>
>-DRM_INFO("RE-INIT-late: %s %s\n", block->version-
>>funcs->name, r?"failed":"succeeded");
>-if (r)
>-return 

RE: [PATCH] SWDEV-220451 - Query guest's information by VF2PF message - Guest side - part 1

2020-08-27 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Bokun
>Zhang
>Sent: Wednesday, August 5, 2020 11:32 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhang, Bokun 
>Subject: [PATCH] SWDEV-220451 - Query guest's information by VF2PF
>message - Guest side - part 1
>
>- Add guest side change to support VF2PF message
>- Fix coding style
>
>Change-Id: I82e5518cb10ec0b19fecaba7e05b02f4b7f2b409
>Signed-off-by: Bokun Zhang 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h|  29 +-
> drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 276
>
> 2 files changed, 285 insertions(+), 20 deletions(-)  create mode 100644
>drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>index b0b2bdc750df..ad2b2628ab67 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>@@ -24,6 +24,8 @@
> #ifndef AMDGPU_VIRT_H
> #define AMDGPU_VIRT_H
>
>+#include "amdgv_sriovmsg.h"
>+
> #define AMDGPU_SRIOV_CAPS_SRIOV_VBIOS  (1 << 0) /* vBIOS is sr-iov ready
>*/
> #define AMDGPU_SRIOV_CAPS_ENABLE_IOV   (1 << 1) /* sr-iov is enabled on
>this GPU */
> #define AMDGPU_SRIOV_CAPS_IS_VF(1 << 2) /* this GPU is a virtual
>function */
>@@ -69,7 +71,10 @@ struct amdgpu_virt_fw_reserve {
> struct amd_sriov_msg_vf2pf_info_header *p_vf2pf;
> unsigned int checksum_key;
> };
>+
> /*
>+ * Legacy GIM header
>+ *
>  * Definition between PF and VF
>  * Structures forcibly aligned to 4 to keep the same style as PF.
>  */
>@@ -89,15 +94,7 @@ enum AMDGIM_FEATURE_FLAG {
> AMDGIM_FEATURE_HW_PERF_SIMULATION = (1 << 3),  };
>
>-struct amd_sriov_msg_pf2vf_info_header {
>-/* the total structure size in byte. */
>-uint32_t size;
>-/* version of this structure, written by the GIM */
>-uint32_t version;
>-/* reserved */
>-uint32_t reserved[2];
>-} __aligned(4);
>-struct  amdgim_pf2vf_info_v1 {
>+struct amdgim_pf2vf_info_v1 {
> /* header contains size and version */
> struct amd_sriov_msg_pf2vf_info_header header;
> /* max_width * max_height */
>@@ -116,6 +113,7 @@ struct  amdgim_pf2vf_info_v1 {
> unsigned int checksum;
> } __aligned(4);
>
>+/* TODO: below struct is duplicated to amd_sriov_msg_pf2vf_info */
> struct  amdgim_pf2vf_info_v2 {
> /* header contains size and version */
> struct amd_sriov_msg_pf2vf_info_header header; @@ -146,16 +144,6
>@@ struct  amdgim_pf2vf_info_v2 {
> uint32_t reserved[AMDGIM_GET_STRUCTURE_RESERVED_SIZE(256, 0,
>0, (9 + sizeof(struct amd_sriov_msg_pf2vf_info_header)/sizeof(uint32_t)), 3)]; 
> }
>__aligned(4);
>
>-
>-struct amd_sriov_msg_vf2pf_info_header {
>-/* the total structure size in byte. */
>-uint32_t size;
>-/*version of this structure, written by the guest */
>-uint32_t version;
>-/* reserved */
>-uint32_t reserved[2];
>-} __aligned(4);
>-
> struct amdgim_vf2pf_info_v1 {
> /* header contains size and version */
> struct amd_sriov_msg_vf2pf_info_header header; @@ -217,8 +205,9
>@@ struct amdgim_vf2pf_info_v2 {
> uint32_t reserved[AMDGIM_GET_STRUCTURE_RESERVED_SIZE(256,
>64, 0, (12 + sizeof(struct amd_sriov_msg_vf2pf_info_header)/sizeof(uint32_t)),
>0)];  } __aligned(4);
>
>+/* TODO: below macro and typedef will cause compile error, need to
>+remove */
> #define AMDGPU_FW_VRAM_VF2PF_VER 2
>-typedef struct amdgim_vf2pf_info_v2 amdgim_vf2pf_info ;
>+typedef struct amd_sriov_msg_vf2pf_info amdgim_vf2pf_info;
>
> #define AMDGPU_FW_VRAM_VF2PF_WRITE(adev, field, val) \
> do { \
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
>b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
>new file mode 100644
>index ..5355827ed0ae
>--- /dev/null
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
>@@ -0,0 +1,276 @@
>+/*
>+ * Copyright 2018-2019 Advanced Micro Devices, Inc.
>+ *
>+ * Permission is hereby granted, free of charge, to any person
>+obtaining a
>+ * copy of this software and associated documentation files (the
>+"Software"),
>+ * to deal in the Software without restriction, including without
>+limitation
>+ * the rights to use, copy, modify, merge, publish, distribute,
>+sublicense,
>+ * and/or sell copies of the Software, and to permit persons to whom
>+the
>+ * Software is furnished to do so, subject to the following conditions:
>+ *
>+ * The above copyright notice and this permission notice shall be
>+included in
>+ * all copies or substantial portions of the Software.
>+ *
>+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>+EXPRESS OR
>+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>+MERCHANTABILITY,
>+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
>EVENT
>+SHALL
>+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>+DAMAGES OR
>+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>+OTHERWISE,
>+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
>THE USE
>+OR
>+ * OTHER DEALINGS IN THE SOFTWARE.
>+ *
>+ 

RE: [PATCH] drm/amdgpu: simplify hw status clear/set logic

2020-08-27 Thread Deng, Emily
Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Gu,
>JiaWei (Will)
>Sent: Thursday, August 27, 2020 2:50 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Gu, JiaWei (Will) 
>Subject: RE: [PATCH] drm/amdgpu: simplify hw status clear/set logic
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping...
>
>-Original Message-
>From: Jiawei 
>Sent: Thursday, August 27, 2020 10:32 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Gu, JiaWei (Will) 
>Subject: [PATCH] drm/amdgpu: simplify hw status clear/set logic
>
>Optimize code to iterate less loops in
>amdgpu_device_ip_reinit_early_sriov()
>
>Signed-off-by: Jiawei 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 ++---
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 8f37f9f99105..696a61cc3ac6 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2598,17 +2598,16 @@ static int
>amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>   AMD_IP_BLOCK_TYPE_IH,
>   };
>
>-  for (i = 0; i < adev->num_ip_blocks; i++)
>-  adev->ip_blocks[i].status.hw = false;
>-
>-  for (i = 0; i < ARRAY_SIZE(ip_order); i++) {
>+  for (i = 0; i < adev->num_ip_blocks; i++) {
>   int j;
>   struct amdgpu_ip_block *block;
>
>-  for (j = 0; j < adev->num_ip_blocks; j++) {
>-  block = &adev->ip_blocks[j];
>+  block = &adev->ip_blocks[i];
>+  block->status.hw = false;
>
>-  if (block->version->type != ip_order[i] ||
>+  for (j = 0; j < ARRAY_SIZE(ip_order); j++) {
>+
>+  if (block->version->type != ip_order[j] ||
>   !block->status.valid)
>   continue;
>
>--
>2.17.1
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/2] drm/amdgpu: Limit the error info print rate

2020-08-18 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping ...
What about this patch?

>-Original Message-
>From: Emily.Deng 
>Sent: Tuesday, August 18, 2020 5:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 2/2] drm/amdgpu: Limit the error info print rate
>
>From: jqdeng 
>
>Use function printk_ratelimit to limit the print rate.
>
>Signed-off-by: jqdeng 
>Change-Id: Ief05debe30d975cbcf88e473c9f486d70b5a202c
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>index a94b3f862fc2..727b909b4b9e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>@@ -1296,7 +1296,8 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void
>*data, struct drm_file *filp)
>
> r = amdgpu_cs_parser_init(, data);
> if (r) {
>-DRM_ERROR("Failed to initialize parser %d!\n", r);
>+if (printk_ratelimit())
>+DRM_ERROR("Failed to initialize parser %d!\n", r);
> goto out;
> }
>
>--
>2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: Fix repeatly flr issue

2020-08-18 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Das, Nirmoy 
>Sent: Tuesday, August 18, 2020 4:22 PM
>To: Deng, Emily ; Das, Nirmoy
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix repeatly flr issue
>
>
>On 8/18/20 4:48 AM, Deng, Emily wrote:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -Original Message-
>>> From: Das, Nirmoy 
>>> Sent: Wednesday, August 12, 2020 8:18 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix repeatly flr issue
>>>
>>>
>>> On 8/12/20 11:19 AM, Emily.Deng wrote:
>>>> From: jqdeng 
>>>>
>>>> Only the no-job-running test case needs to do recovery in the FLR
>>>> notification.
>>>> If there is a job in the mirror list, let the guest driver hit the job
>>>> timeout and then do the recovery.
>>>>
>>>> Signed-off-by: jqdeng 
>>>> Change-Id: Ic6234fce46fa1655ba81c4149235eeac75e75868
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 20
>+++-
>>>>drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 22
>-
>>> -
>>>>2 files changed, 39 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>> index fe31cbeccfe9..12fe5164aaf3 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>> @@ -238,6 +238,9 @@ static void xgpu_ai_mailbox_flr_work(struct
>>> work_struct *work)
>>>>struct amdgpu_virt *virt = container_of(work, struct amdgpu_virt,
>>> flr_work);
>>>>struct amdgpu_device *adev = container_of(virt, struct
>>> amdgpu_device, virt);
>>>>int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>>>> +int i;
>>>> +bool need_do_recover = true;
>>>
>>> We should find a better name for "need_do_recover", may be
>>> "need_to_recover" ?
>> Thanks, will modify later.
>>>
>>>> +struct drm_sched_job *job;
>>>>
>>>>/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>>>> * otherwise the mailbox msg will be ruined/reseted by @@ -258,10
>>>> +261,25 @@ static void xgpu_ai_mailbox_flr_work(struct
>>> work_struct *work)
>>>>flr_done:
>>>>up_read(&adev->reset_sem);
>>>> +for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { struct amdgpu_ring *ring =
>>>> +adev->rings[i];
>>>> +
>>>> +if (!ring || !ring->sched.thread)
>>>> +continue;
>>>> +
>>>> +spin_lock(&ring->sched.job_list_lock);
>>>> +job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
>>>> +struct drm_sched_job, node);
>>>> +spin_unlock(&ring->sched.job_list_lock);
>>>> +if (job) {
>>>> +need_do_recover = false;
>>>> +break;
>>>> +}
>>>> +}
>>>
>>> This 1st job retrieval logic can move to a function as there are two
>>> instance of it.
>>> Sorry, I didn't get your point.
>
>
>xgpu_ai_mailbox_flr_work() and xgpu_nv_mailbox_flr_work() are using same
>logic under "flr_done:"  label trying to retrieve 1st job entry to determine if
>we should do recover or not.
>
>We could move that logic into a function like:
>
>
>bool function_name ()
>{
>for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>struct amdgpu_ring *ring = adev->rings[i];
>
>if (!ring || !ring->sched.thread)
>continue;
>
>spin_lock(&ring->sched.job_list_lock);
>job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
>struct drm_sched_job, node);
>spin_unlock(&ring->sched.job_list_lock);
>if (job)
>return true;
>
>}
>
>return false;
>}
>
>and use that in xgpu_ai_mailbox_flr_work() and
>xgpu_nv_mailbox_flr_work() instead of
>
>having two copy of that logic.
Understand completely now. Thanks.
>
>
>
>Nirmoy
>
>>>
>>>>/* Trigger recovery for world switch failure if no TDR */
>>>>if (amdgpu_device_should_recover_gpu(adev)
>>>> -&& adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT)
>>>> +&& (need_do_recover || adev->sdma_timeout ==
>>> MAX_SCHEDULE_TIMEOUT))
>>>>amdgpu_device_gpu_recover(adev, NULL);
>>>>}
>>>>
>>>> diff --gi

RE: [PATCH] drm/amdgpu: Fix repeatly flr issue

2020-08-17 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Das, Nirmoy 
>Sent: Wednesday, August 12, 2020 8:18 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix repeatly flr issue
>
>
>On 8/12/20 11:19 AM, Emily.Deng wrote:
>> From: jqdeng 
>>
>> Only the no-job-running test case needs to do recovery in the FLR
>> notification.
>> If there is a job in the mirror list, let the guest driver hit the job
>> timeout and then do the recovery.
>>
>> Signed-off-by: jqdeng 
>> Change-Id: Ic6234fce46fa1655ba81c4149235eeac75e75868
>> ---
>>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 20 +++-
>>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 22 -
>-
>>   2 files changed, 39 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> index fe31cbeccfe9..12fe5164aaf3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> @@ -238,6 +238,9 @@ static void xgpu_ai_mailbox_flr_work(struct
>work_struct *work)
>>   struct amdgpu_virt *virt = container_of(work, struct amdgpu_virt,
>flr_work);
>>   struct amdgpu_device *adev = container_of(virt, struct
>amdgpu_device, virt);
>>   int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>> +int i;
>> +bool need_do_recover = true;
>
>
>We should find a better name for "need_do_recover", may be
>"need_to_recover" ?
Thanks, will modify later.
>
>
>> +struct drm_sched_job *job;
>>
>>   /* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>>* otherwise the mailbox msg will be ruined/reseted by
>> @@ -258,10 +261,25 @@ static void xgpu_ai_mailbox_flr_work(struct
>work_struct *work)
>>
>>   flr_done:
>>   up_read(&adev->reset_sem);
>> +for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>> +struct amdgpu_ring *ring = adev->rings[i];
>> +
>> +if (!ring || !ring->sched.thread)
>> +continue;
>> +
>> +spin_lock(&ring->sched.job_list_lock);
>> +job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
>> +struct drm_sched_job, node);
>> +spin_unlock(&ring->sched.job_list_lock);
>> +if (job) {
>> +need_do_recover = false;
>> +break;
>> +}
>> +}
>
>
>This 1st job retrieval logic can move to a function as there are two
>instance of it.
>Sorry, I didn't get your point.
>
>>
>>   /* Trigger recovery for world switch failure if no TDR */
>>   if (amdgpu_device_should_recover_gpu(adev)
>> -&& adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT)
>> +&& (need_do_recover || adev->sdma_timeout ==
>MAX_SCHEDULE_TIMEOUT))
>>   amdgpu_device_gpu_recover(adev, NULL);
>>   }
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> index 6f55172e8337..fc92c494df0b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> @@ -259,6 +259,9 @@ static void xgpu_nv_mailbox_flr_work(struct
>work_struct *work)
>>   struct amdgpu_virt *virt = container_of(work, struct amdgpu_virt,
>flr_work);
>>   struct amdgpu_device *adev = container_of(virt, struct
>amdgpu_device, virt);
>>   int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
>> +int i;
>> +bool need_do_recover = true;
>> +struct drm_sched_job *job;
>>
>>   /* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>>* otherwise the mailbox msg will be ruined/reseted by
>> @@ -279,10 +282,25 @@ static void xgpu_nv_mailbox_flr_work(struct
>work_struct *work)
>>
>>   flr_done:
>>   up_read(&adev->reset_sem);
>> +for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>> +struct amdgpu_ring *ring = adev->rings[i];
>> +
>> +if (!ring || !ring->sched.thread)
>> +continue;
>> +
>> +spin_lock(&ring->sched.job_list_lock);
>> +job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
>> +struct drm_sched_job, node);
>> +spin_unlock(&ring->sched.job_list_lock);
>> +if (job) {
>> +need_do_recover = false;
>> +break;
>> +}
>> +}
>>
>>   /* Trigger recovery for world switch failure if no TDR */
>> -if (amdgpu_device_should_recover_gpu(adev)
>> -&& (adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
>> +if (amdgpu_device_should_recover_gpu(adev) && (need_do_recover ||
>> +adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
>>   adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
>>   adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
>>   adev->video_timeout == MAX_SCHEDULE_TIMEOUT))
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: fix reload KMD hang on GFX10 KIQ

2020-08-09 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Monk
>Liu
>Sent: Monday, August 10, 2020 11:59 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH] drm/amdgpu: fix reload KMD hang on GFX10 KIQ
>
>GFX10 KIQ will hang if we try the below steps:
>modprobe amdgpu
>rmmod amdgpu
>modprobe amdgpu sched_hw_submission=4
>
>Because the KIQ stays alive even after the KMD is unloaded, on reload the
>KIQ will crash when its registers are programmed with values that differ
>from the previous load (configuration such as the HQD address and ring
>size easily changes if we alter sched_hw_submission).
>
>The fix is that we must deactivate the KIQ before touching any of its
>registers.
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 4 
> 1 file changed, 4 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>index 622f442..0702c94 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>@@ -6435,6 +6435,10 @@ static int gfx_v10_0_kiq_init_register(struct
>amdgpu_ring *ring)
> struct v10_compute_mqd *mqd = ring->mqd_ptr;
> int j;
>
>+/* inactivate the queue */
>+if (amdgpu_sriov_vf(adev))
>+WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
>+
> /* disable wptr polling */
> WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>
>--
>2.7.4
>


RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

2020-08-04 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: amd-gfx  On Behalf Of Liu,
>Monk
>Sent: Tuesday, August 4, 2020 2:31 PM
>To: amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
>
>[AMD Official Use Only - Internal Distribution Only]
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping ... this is a severe bug fix
>
>_
>Monk Liu|GPU Virtualization Team |AMD
>
>
>-Original Message-
>From: amd-gfx  On Behalf Of Liu,
>Monk
>Sent: Monday, August 3, 2020 9:55 AM
>To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
>
>[AMD Official Use Only - Internal Distribution Only]
>
>[AMD Official Use Only - Internal Distribution Only]
>
>>>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the
>>>right place to stop the KIQ
>
>The KIQ (CPC) is never stopped for SRIOV in SW_FINI (the DISABLE on CPC is
>skipped for SRIOV) because SRIOV relies on the KIQ to do the world switch.
>
>But this is really a weird bug, because the same approach does not make the
>KIQ (CP) hang on GFX9; only GFX10 needs this patch.
>
>For now I do not have another good explanation or a better fix than this one.
>
>_
>Monk Liu|GPU Virtualization Team |AMD
>
>
>-Original Message-
>From: Kuehling, Felix 
>Sent: Friday, July 31, 2020 9:57 PM
>To: Liu, Monk ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
>
>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the right place
>to stop the KIQ? Otherwise KIQ will hang as soon as someone allocates the
>memory that was previously used for the KIQ ring buffer and overwrites it with
>something that's not a valid PM4 packet.
>
>Regards,
>  Felix
>
>Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
>> KIQ will hang if we try the below steps:
>> modprobe amdgpu
>> rmmod amdgpu
>> modprobe amdgpu sched_hw_submission=4
>>
>> The cause is that the KIQ stays alive even after we unload the KMD, so
>> on reload of the KMD the KIQ will crash when its registers are programmed
>> with values that differ from the previous configuration (settings such
>> as the HQD address and ring size easily change if we alter
>> sched_hw_submission).
>>
>> The fix is that we must deactivate the KIQ before touching any of its
>> registers.
>>
>> Signed-off-by: Monk Liu 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index db9f1e8..f571e25 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -6433,6 +6433,9 @@ static int gfx_v10_0_kiq_init_register(struct amdgpu_ring *ring)
>>  struct v10_compute_mqd *mqd = ring->mqd_ptr;
>>  int j;
>>
>> +/* inactivate the queue */
>> +WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
>> +
Could we move the following to here?
	if (RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1) {
		WREG32_SOC15(GC, 0, mmCP_HQD_DEQUEUE_REQUEST, 1);
		for (j = 0; j < adev->usec_timeout; j++) {
			if (!(RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1))
				break;
			udelay(1);
		}
	}
>>  /* disable wptr polling */
>>  WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>>
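Taken together (the unconditional deactivate from the patch plus the
dequeue-and-poll drain suggested in this thread), the start of
gfx_v10_0_kiq_init_register() might look like the following sketch
(kernel context assumed, not compiled here):

	/* Sketch: make sure the KIQ is inactive and drained before its
	 * registers are reprogrammed; uses the gfx_v10_0 register macros. */
	if (RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1) {
		/* request a dequeue and poll until the queue goes inactive */
		WREG32_SOC15(GC, 0, mmCP_HQD_DEQUEUE_REQUEST, 1);
		for (j = 0; j < adev->usec_timeout; j++) {
			if (!(RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1))
				break;
			udelay(1);
		}
	}
	WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);

	/* disable wptr polling */
	WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);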


RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov

2020-06-10 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Evan,
The multiple-VF detection is done in amdgpu_device_ip_init, so it is not
available yet at smu_early_init time.

Best wishes
Emily Deng



>-Original Message-
>From: Quan, Evan 
>Sent: Thursday, June 11, 2020 12:41 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Can this be moved to smu_early_init()? Or just do not adding the SMU ip for
>multiple vf sriov?
>
>Evan
>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Tuesday, June 2, 2020 8:40 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>index 5294aa7..8ed6c90 100644
>--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>@@ -1311,8 +1311,10 @@ static int smu_hw_init(void *handle)
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> struct smu_context *smu = &adev->smu;
>
>-if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
>+if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev)) {
>+smu->pm_enabled = false;
> return 0;
>+}
>
> ret = smu_start_smc_engine(smu);
> if (ret) {
>--
>2.7.4
>



RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov

2020-06-10 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Monk,
Could you help review this patch for multiple VF?

Best wishes
Emily Deng



>-Original Message-
>From: Deng, Emily 
>Sent: Wednesday, June 10, 2020 7:01 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Min, Frank 
>Subject: RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>
>[AMD Official Use Only - Internal Distribution Only]
>
>>-Original Message-
>>From: Emily Deng 
>>Sent: Tuesday, June 2, 2020 8:40 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>>
>>Signed-off-by: Emily Deng 
>>---
>> drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>>b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>>index 5294aa7..8ed6c90 100644
>>--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>>+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>>@@ -1311,8 +1311,10 @@ static int smu_hw_init(void *handle)
>> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> struct smu_context *smu = &adev->smu;
>>
>>-if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
>>+if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev)) {
>>+smu->pm_enabled = false;
>> return 0;
>>+}
>>
>> ret = smu_start_smc_engine(smu);
>> if (ret) {
>>--
>>2.7.4
>
