Re: [PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode
True. It is a bug too. I am looking into it. Yong On 2020-01-30 5:51 p.m., Felix Kuehling wrote: On 2020-01-30 17:29, Yong Zhao wrote: The sdma_queue_count increment should be done before execute_queues_cpsch(), which calls pm_calc_rlib_size() where sdma_queue_count is used to calculate whether over_subscription is triggered. With the previous code, when a SDMA queue is created, compute_queue_count in pm_calc_rlib_size() is one more than the actual compute queue number, because the queue_count has been incremented while sdma_queue_count has not. This patch fixes that. Change-Id: I20353e657efd505353d0dd9f7eb2fab5085e7202 Signed-off-by: Yong Zhao Reviewed-by: Felix Kuehling But I took a look at pm_calc_rlib_size. I don't think subtracting dqm->sdma_queue_count from dqm->queue_count is not quite correct, because sdma_queue_count counts all SDMA queues, while queue_count only counts active queues. So an application that creates inactive SDMA queues will also create errors here. We probably need to count active compute and active SDMA queues separately in DQM to fix this properly. Regards, Felix --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 2870553a2ce0..80d22bf702e8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1237,16 +1237,18 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, list_add(>list, >queues_list); qpd->queue_count++; + + if (q->properties.type == KFD_QUEUE_TYPE_SDMA) + dqm->sdma_queue_count++; + else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) + dqm->xgmi_sdma_queue_count++; + if (q->properties.is_active) { dqm->queue_count++; retval = execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0); } - if (q->properties.type == KFD_QUEUE_TYPE_SDMA) - dqm->sdma_queue_count++; - else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) - dqm->xgmi_sdma_queue_count++; /* * Unconditionally increment this counter, regardless of the queue's * type or whether the queue is active. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode
On 2020-01-30 17:29, Yong Zhao wrote: The sdma_queue_count increment should be done before execute_queues_cpsch(), which calls pm_calc_rlib_size() where sdma_queue_count is used to calculate whether over_subscription is triggered. With the previous code, when a SDMA queue is created, compute_queue_count in pm_calc_rlib_size() is one more than the actual compute queue number, because the queue_count has been incremented while sdma_queue_count has not. This patch fixes that. Change-Id: I20353e657efd505353d0dd9f7eb2fab5085e7202 Signed-off-by: Yong Zhao Reviewed-by: Felix Kuehling But I took a look at pm_calc_rlib_size. I don't think subtracting dqm->sdma_queue_count from dqm->queue_count is not quite correct, because sdma_queue_count counts all SDMA queues, while queue_count only counts active queues. So an application that creates inactive SDMA queues will also create errors here. We probably need to count active compute and active SDMA queues separately in DQM to fix this properly. Regards, Felix --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 2870553a2ce0..80d22bf702e8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1237,16 +1237,18 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, list_add(>list, >queues_list); qpd->queue_count++; + + if (q->properties.type == KFD_QUEUE_TYPE_SDMA) + dqm->sdma_queue_count++; + else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) + dqm->xgmi_sdma_queue_count++; + if (q->properties.is_active) { dqm->queue_count++; retval = execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0); } - if (q->properties.type == KFD_QUEUE_TYPE_SDMA) - dqm->sdma_queue_count++; - else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) - dqm->xgmi_sdma_queue_count++; /* * Unconditionally increment this counter, regardless of the queue's * type or whether the queue is active. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode
The sdma_queue_count increment should be done before execute_queues_cpsch(), which calls pm_calc_rlib_size() where sdma_queue_count is used to calculate whether over_subscription is triggered. With the previous code, when a SDMA queue is created, compute_queue_count in pm_calc_rlib_size() is one more than the actual compute queue number, because the queue_count has been incremented while sdma_queue_count has not. This patch fixes that. Change-Id: I20353e657efd505353d0dd9f7eb2fab5085e7202 Signed-off-by: Yong Zhao --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 2870553a2ce0..80d22bf702e8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1237,16 +1237,18 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, list_add(>list, >queues_list); qpd->queue_count++; + + if (q->properties.type == KFD_QUEUE_TYPE_SDMA) + dqm->sdma_queue_count++; + else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) + dqm->xgmi_sdma_queue_count++; + if (q->properties.is_active) { dqm->queue_count++; retval = execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0); } - if (q->properties.type == KFD_QUEUE_TYPE_SDMA) - dqm->sdma_queue_count++; - else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) - dqm->xgmi_sdma_queue_count++; /* * Unconditionally increment this counter, regardless of the queue's * type or whether the queue is active. -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx