Re: [PATCH v2] drm/amdkfd: disable kfd debugfs node of hang_hws on vf mode

2021-05-13 Thread Wang, Kevin(Yang)
[AMD Official Use Only - Internal Distribution Only]

thanks @Kuehling, Felix<mailto:felix.kuehl...@amd.com>,
I have also noticed this problem, in the multi-GPU environment, there is no 
working well.

Best Regards,
Kevin

From: Kuehling, Felix 
Sent: Friday, May 14, 2021 12:01 AM
To: Wang, Kevin(Yang) ; amd-gfx@lists.freedesktop.org 

Cc: Zhang, Hawking ; Min, Frank 
Subject: Re: [PATCH v2] drm/amdkfd: disable kfd debugfs node of hang_hws on vf 
mode

This won't work. the kfd_debugfs directory is system-wide. So you cannot
have a per-GPU criteria for creating it. You may have one GPU that
probes successfully, another that fails. You still need the debugfs. If
you have multiple GPUs probing successfully, you only want to create the
debugfs node once.

The hang_hws file requires writing a GPU-ID to it. So if a card doesn't
probe it won't have a GPU ID, so you won't be able to hang that card
through the hang_hws interface. So there is no need to hide the file
altogether.

Can you explain why hang_hws should be disabled for VFs?

Thanks,
  Felix

Am 2021-05-13 um 4:08 a.m. schrieb Kevin Wang:

> v1:
> the kfd debugfs node is rely on kgd2kfd probe success,
> if not, the kfd_debugfs should not be created,
> and the node of "hang_hws" should be disabled on vf mode.
>
> v2:
> also move kfd_debugfs_fini() into kgd2kfd_device_exit() function.
>
> 1. move kfd_debugfs_init() function into kgd2kfd_probe() function.
> 2. disable "hang_hws" debugfs node on vf mode.
>
> Signed-off-by: Kevin Wang 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c | 7 ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c  | 3 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c  | 3 ---
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 4 ++--
>  4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
> index 673d5e34f213..f9a81f34d09e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
> @@ -88,7 +88,7 @@ static const struct file_operations 
> kfd_debugfs_hang_hws_fops = {
>.release = single_release,
>  };
>
> -void kfd_debugfs_init(void)
> +void kfd_debugfs_init(bool is_vf)
>  {
>debugfs_root = debugfs_create_dir("kfd", NULL);
>
> @@ -98,8 +98,9 @@ void kfd_debugfs_init(void)
>kfd_debugfs_hqds_by_device, _debugfs_fops);
>debugfs_create_file("rls", S_IFREG | 0444, debugfs_root,
>kfd_debugfs_rls_by_device, _debugfs_fops);
> - debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
> - kfd_debugfs_hang_hws_read, 
> _debugfs_hang_hws_fops);
> + if (!is_vf)
> + debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
> + kfd_debugfs_hang_hws_read, 
> _debugfs_hang_hws_fops);
>  }
>
>  void kfd_debugfs_fini(void)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index dedb8e33b953..aa9154a8410f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -649,6 +649,8 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
>
>ida_init(>doorbell_ida);
>
> + kfd_debugfs_init(vf);
> +
>return kfd;
>  }
>
> @@ -884,6 +886,7 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
>amdgpu_amdkfd_free_gws(kfd->kgd, kfd->gws);
>}
>
> + kfd_debugfs_fini();
>kfree(kfd);
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> index 5e90fe642192..6b9f735c55ea 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> @@ -61,8 +61,6 @@ static int kfd_init(void)
> */
>kfd_procfs_init();
>
> - kfd_debugfs_init();
> -
>return 0;
>
>  err_create_wq:
> @@ -76,7 +74,6 @@ static int kfd_init(void)
>
>  static void kfd_exit(void)
>  {
> - kfd_debugfs_fini();
>kfd_process_destroy_wq();
>kfd_procfs_shutdown();
>kfd_topology_shutdown();
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index daa9d47514c6..f3ddd8c5b11e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1174,7 +1174,7 @@ static inline int kfd_devcgroup_check_permission(struct 
> kfd_dev *kfd)
>  /* Debugfs */
>  #if defined(CONFIG_DEBUG_FS)
>
> -void kfd_debugfs_init(void);
> +void kfd

Re: [PATCH v2] drm/amdkfd: disable kfd debugfs node of hang_hws on vf mode

2021-05-13 Thread Felix Kuehling
This won't work. the kfd_debugfs directory is system-wide. So you cannot
have a per-GPU criteria for creating it. You may have one GPU that
probes successfully, another that fails. You still need the debugfs. If
you have multiple GPUs probing successfully, you only want to create the
debugfs node once.

The hang_hws file requires writing a GPU-ID to it. So if a card doesn't
probe it won't have a GPU ID, so you won't be able to hang that card
through the hang_hws interface. So there is no need to hide the file
altogether.

Can you explain why hang_hws should be disabled for VFs?

Thanks,
  Felix

Am 2021-05-13 um 4:08 a.m. schrieb Kevin Wang:

> v1:
> the kfd debugfs node is rely on kgd2kfd probe success,
> if not, the kfd_debugfs should not be created,
> and the node of "hang_hws" should be disabled on vf mode.
>
> v2:
> also move kfd_debugfs_fini() into kgd2kfd_device_exit() function.
>
> 1. move kfd_debugfs_init() function into kgd2kfd_probe() function.
> 2. disable "hang_hws" debugfs node on vf mode.
>
> Signed-off-by: Kevin Wang 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c | 7 ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c  | 3 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c  | 3 ---
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 4 ++--
>  4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
> index 673d5e34f213..f9a81f34d09e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
> @@ -88,7 +88,7 @@ static const struct file_operations 
> kfd_debugfs_hang_hws_fops = {
>   .release = single_release,
>  };
>  
> -void kfd_debugfs_init(void)
> +void kfd_debugfs_init(bool is_vf)
>  {
>   debugfs_root = debugfs_create_dir("kfd", NULL);
>  
> @@ -98,8 +98,9 @@ void kfd_debugfs_init(void)
>   kfd_debugfs_hqds_by_device, _debugfs_fops);
>   debugfs_create_file("rls", S_IFREG | 0444, debugfs_root,
>   kfd_debugfs_rls_by_device, _debugfs_fops);
> - debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
> - kfd_debugfs_hang_hws_read, 
> _debugfs_hang_hws_fops);
> + if (!is_vf)
> + debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
> + kfd_debugfs_hang_hws_read, 
> _debugfs_hang_hws_fops);
>  }
>  
>  void kfd_debugfs_fini(void)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index dedb8e33b953..aa9154a8410f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -649,6 +649,8 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
>  
>   ida_init(>doorbell_ida);
>  
> + kfd_debugfs_init(vf);
> +
>   return kfd;
>  }
>  
> @@ -884,6 +886,7 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
>   amdgpu_amdkfd_free_gws(kfd->kgd, kfd->gws);
>   }
>  
> + kfd_debugfs_fini();
>   kfree(kfd);
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> index 5e90fe642192..6b9f735c55ea 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> @@ -61,8 +61,6 @@ static int kfd_init(void)
>*/
>   kfd_procfs_init();
>  
> - kfd_debugfs_init();
> -
>   return 0;
>  
>  err_create_wq:
> @@ -76,7 +74,6 @@ static int kfd_init(void)
>  
>  static void kfd_exit(void)
>  {
> - kfd_debugfs_fini();
>   kfd_process_destroy_wq();
>   kfd_procfs_shutdown();
>   kfd_topology_shutdown();
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index daa9d47514c6..f3ddd8c5b11e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1174,7 +1174,7 @@ static inline int kfd_devcgroup_check_permission(struct 
> kfd_dev *kfd)
>  /* Debugfs */
>  #if defined(CONFIG_DEBUG_FS)
>  
> -void kfd_debugfs_init(void);
> +void kfd_debugfs_init(bool is_vf);
>  void kfd_debugfs_fini(void);
>  int kfd_debugfs_mqds_by_process(struct seq_file *m, void *data);
>  int pqm_debugfs_mqds(struct seq_file *m, void *data);
> @@ -1189,7 +1189,7 @@ int dqm_debugfs_execute_queues(struct 
> device_queue_manager *dqm);
>  
>  #else
>  
> -static inline void kfd_debugfs_init(void) {}
> +static inline void kfd_debugfs_init(bool is_vf) {}
>  static inline void kfd_debugfs_fini(void) {}
>  
>  #endif
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2] drm/amdkfd: disable kfd debugfs node of hang_hws on vf mode

2021-05-13 Thread Kevin Wang
v1:
the kfd debugfs node is rely on kgd2kfd probe success,
if not, the kfd_debugfs should not be created,
and the node of "hang_hws" should be disabled on vf mode.

v2:
also move kfd_debugfs_fini() into kgd2kfd_device_exit() function.

1. move kfd_debugfs_init() function into kgd2kfd_probe() function.
2. disable "hang_hws" debugfs node on vf mode.

Signed-off-by: Kevin Wang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c | 7 ---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  | 3 +++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c  | 3 ---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 4 ++--
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
index 673d5e34f213..f9a81f34d09e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
@@ -88,7 +88,7 @@ static const struct file_operations kfd_debugfs_hang_hws_fops 
= {
.release = single_release,
 };
 
-void kfd_debugfs_init(void)
+void kfd_debugfs_init(bool is_vf)
 {
debugfs_root = debugfs_create_dir("kfd", NULL);
 
@@ -98,8 +98,9 @@ void kfd_debugfs_init(void)
kfd_debugfs_hqds_by_device, _debugfs_fops);
debugfs_create_file("rls", S_IFREG | 0444, debugfs_root,
kfd_debugfs_rls_by_device, _debugfs_fops);
-   debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
-   kfd_debugfs_hang_hws_read, 
_debugfs_hang_hws_fops);
+   if (!is_vf)
+   debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
+   kfd_debugfs_hang_hws_read, 
_debugfs_hang_hws_fops);
 }
 
 void kfd_debugfs_fini(void)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index dedb8e33b953..aa9154a8410f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -649,6 +649,8 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
 
ida_init(>doorbell_ida);
 
+   kfd_debugfs_init(vf);
+
return kfd;
 }
 
@@ -884,6 +886,7 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
amdgpu_amdkfd_free_gws(kfd->kgd, kfd->gws);
}
 
+   kfd_debugfs_fini();
kfree(kfd);
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 5e90fe642192..6b9f735c55ea 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -61,8 +61,6 @@ static int kfd_init(void)
 */
kfd_procfs_init();
 
-   kfd_debugfs_init();
-
return 0;
 
 err_create_wq:
@@ -76,7 +74,6 @@ static int kfd_init(void)
 
 static void kfd_exit(void)
 {
-   kfd_debugfs_fini();
kfd_process_destroy_wq();
kfd_procfs_shutdown();
kfd_topology_shutdown();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index daa9d47514c6..f3ddd8c5b11e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1174,7 +1174,7 @@ static inline int kfd_devcgroup_check_permission(struct 
kfd_dev *kfd)
 /* Debugfs */
 #if defined(CONFIG_DEBUG_FS)
 
-void kfd_debugfs_init(void);
+void kfd_debugfs_init(bool is_vf);
 void kfd_debugfs_fini(void);
 int kfd_debugfs_mqds_by_process(struct seq_file *m, void *data);
 int pqm_debugfs_mqds(struct seq_file *m, void *data);
@@ -1189,7 +1189,7 @@ int dqm_debugfs_execute_queues(struct 
device_queue_manager *dqm);
 
 #else
 
-static inline void kfd_debugfs_init(void) {}
+static inline void kfd_debugfs_init(bool is_vf) {}
 static inline void kfd_debugfs_fini(void) {}
 
 #endif
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx