Re: [PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

2020-01-30 Thread Yong Zhao

True. It is a bug too. I am looking into it.

Yong

On 2020-01-30 5:51 p.m., Felix Kuehling wrote:

On 2020-01-30 17:29, Yong Zhao wrote:

The sdma_queue_count increment should be done before
execute_queues_cpsch(), which calls pm_calc_rlib_size() where
sdma_queue_count is used to calculate whether over_subscription is
triggered.

With the previous code, when a SDMA queue is created,
compute_queue_count in pm_calc_rlib_size() is one more than the
actual compute queue number, because the queue_count has been
incremented while sdma_queue_count has not. This patch fixes that.

Change-Id: I20353e657efd505353d0dd9f7eb2fab5085e7202
Signed-off-by: Yong Zhao 


Reviewed-by: Felix Kuehling 

But I took a look at pm_calc_rlib_size. I don't think subtracting 
dqm->sdma_queue_count from dqm->queue_count is not quite correct, 
because sdma_queue_count counts all SDMA queues, while queue_count 
only counts active queues. So an application that creates inactive 
SDMA queues will also create errors here. We probably need to count 
active compute and active SDMA queues separately in DQM to fix this 
properly.


Regards,
  Felix



---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index 2870553a2ce0..80d22bf702e8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1237,16 +1237,18 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,

    list_add(>list, >queues_list);
  qpd->queue_count++;
+
+    if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
+    dqm->sdma_queue_count++;
+    else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
+    dqm->xgmi_sdma_queue_count++;
+
  if (q->properties.is_active) {
  dqm->queue_count++;
  retval = execute_queues_cpsch(dqm,
  KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
  }
  -    if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
-    dqm->sdma_queue_count++;
-    else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
-    dqm->xgmi_sdma_queue_count++;
  /*
   * Unconditionally increment this counter, regardless of the 
queue's

   * type or whether the queue is active.

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

2020-01-30 Thread Felix Kuehling

On 2020-01-30 17:29, Yong Zhao wrote:

The sdma_queue_count increment should be done before
execute_queues_cpsch(), which calls pm_calc_rlib_size() where
sdma_queue_count is used to calculate whether over_subscription is
triggered.

With the previous code, when a SDMA queue is created,
compute_queue_count in pm_calc_rlib_size() is one more than the
actual compute queue number, because the queue_count has been
incremented while sdma_queue_count has not. This patch fixes that.

Change-Id: I20353e657efd505353d0dd9f7eb2fab5085e7202
Signed-off-by: Yong Zhao 


Reviewed-by: Felix Kuehling 

But I took a look at pm_calc_rlib_size. I don't think subtracting 
dqm->sdma_queue_count from dqm->queue_count is not quite correct, 
because sdma_queue_count counts all SDMA queues, while queue_count only 
counts active queues. So an application that creates inactive SDMA 
queues will also create errors here. We probably need to count active 
compute and active SDMA queues separately in DQM to fix this properly.


Regards,
  Felix



---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 2870553a2ce0..80d22bf702e8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1237,16 +1237,18 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
  
  	list_add(>list, >queues_list);

qpd->queue_count++;
+
+   if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
+   dqm->sdma_queue_count++;
+   else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
+   dqm->xgmi_sdma_queue_count++;
+
if (q->properties.is_active) {
dqm->queue_count++;
retval = execute_queues_cpsch(dqm,
KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
}
  
-	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)

-   dqm->sdma_queue_count++;
-   else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
-   dqm->xgmi_sdma_queue_count++;
/*
 * Unconditionally increment this counter, regardless of the queue's
 * type or whether the queue is active.

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

2020-01-30 Thread Yong Zhao
The sdma_queue_count increment should be done before
execute_queues_cpsch(), which calls pm_calc_rlib_size() where
sdma_queue_count is used to calculate whether over_subscription is
triggered.

With the previous code, when a SDMA queue is created,
compute_queue_count in pm_calc_rlib_size() is one more than the
actual compute queue number, because the queue_count has been
incremented while sdma_queue_count has not. This patch fixes that.

Change-Id: I20353e657efd505353d0dd9f7eb2fab5085e7202
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 2870553a2ce0..80d22bf702e8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1237,16 +1237,18 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
 
list_add(>list, >queues_list);
qpd->queue_count++;
+
+   if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
+   dqm->sdma_queue_count++;
+   else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
+   dqm->xgmi_sdma_queue_count++;
+
if (q->properties.is_active) {
dqm->queue_count++;
retval = execute_queues_cpsch(dqm,
KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
}
 
-   if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
-   dqm->sdma_queue_count++;
-   else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
-   dqm->xgmi_sdma_queue_count++;
/*
 * Unconditionally increment this counter, regardless of the queue's
 * type or whether the queue is active.
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx