Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU
Control: notfound -1 5.7.1-1

[Cordell Bloor]
> reportbug noted that I had 5.7.1-1 installed, but my report was based on
> the rocm-hipamd autopkgtests run on the DebCI. I didn't notice that the
> DebCI was using a different version of rocminfo than me, as I'd assumed
> it used packages from Unstable.

OK. Updated issue metadata.

-- 
Happy hacking
Petter Reinholdtsen
Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU
Hi Petter,

On 2024-03-15 02:39, Petter Reinholdtsen wrote:
> [Cordell Bloor]
>> Thanks Petter. After inspecting the code and reviewing both your report
>> and the buildd logs, my conclusion is that this issue was fixed by
>> upstream and included in 5.7.1-1.
>
> But your original report claimed it was present in version 5.7.1-1?
> Which version was used to trigger the error?

reportbug noted that I had 5.7.1-1 installed, but my report was based on
the rocm-hipamd autopkgtests run on the DebCI. I didn't notice that the
DebCI was using a different version of rocminfo than me, as I'd assumed
it used packages from Unstable.

In retrospect, I suppose it makes sense that Testing is used as the base
for the autopkgtests, since they're used to gate migration to Testing.

Sincerely,
Cory Bloor
Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU
[Cordell Bloor]
> Thanks Petter. After inspecting the code and reviewing both your report
> and the buildd logs, my conclusion is that this issue was fixed by
> upstream and included in 5.7.1-1.

But your original report claimed it was present in version 5.7.1-1?
Which version was used to trigger the error?

-- 
Happy hacking
Petter Reinholdtsen
Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU
On Sun, 10 Mar 2024 06:56:56 +0100 Petter Reinholdtsen wrote:
>
> In my sid chroot, on a laptop with no AMD GPU, I get this:
>
>   root@minerva:/# rocminfo
>   ROCk module is NOT loaded, possibly no GPU devices
>   root@minerva:/# rocm_agent_enumerator
>   gfx000
>   root@minerva:/#

Thanks Petter. After inspecting the code and reviewing both your report
and the buildd logs, my conclusion is that this issue was fixed by
upstream and included in 5.7.1-1.

Sincerely,
Cory Bloor
Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU
In my sid chroot, on a laptop with no AMD GPU, I get this:

  root@minerva:/# rocminfo
  ROCk module is NOT loaded, possibly no GPU devices
  root@minerva:/# rocm_agent_enumerator
  gfx000
  root@minerva:/#

-- 
Happy hacking
Petter Reinholdtsen
Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU
Control: found -1 5.2.3-3

Hi Cory,

On 2024-03-09 07:20, Cordell Bloor wrote:
> On systems, the rocm_agent_enumerator command may crash with an error:
>
>     Traceback (most recent call last):
>       File "/usr/bin/rocm_agent_enumerator", line 260, in <module>
>         main()
>       File "/usr/bin/rocm_agent_enumerator", line 244, in main
>         target_list = readFromKFD()
>                       ^
>       File "/usr/bin/rocm_agent_enumerator", line 193, in readFromKFD
>         for node in sorted(os.listdir(topology_dir)):
>
>     FileNotFoundError: [Errno 2] No such file or directory:
>     '/sys/class/kfd/kfd/topology/nodes/'

I've been seeing this one for a long time in package builds, but it
didn't occur to me that this is a user-visible issue, too.

Seen here [1], for example.

Best,
Christian

[1] https://buildd.debian.org/status/fetch.php?pkg=rocblas&arch=amd64&ver=5.3.3%2Bdfsg-2&stamp=1685955323&raw=0
Bug#1065701: rocm_agent_enumerator: crash on systems without AMD GPU
Package: rocminfo
Version: 5.7.1-1
Severity: normal
X-Debbugs-Cc: c...@slerp.xyz

Dear Maintainer,

On systems, the rocm_agent_enumerator command may crash with an error:

    Traceback (most recent call last):
      File "/usr/bin/rocm_agent_enumerator", line 260, in <module>
        main()
      File "/usr/bin/rocm_agent_enumerator", line 244, in main
        target_list = readFromKFD()
                      ^
      File "/usr/bin/rocm_agent_enumerator", line 193, in readFromKFD
        for node in sorted(os.listdir(topology_dir)):
    FileNotFoundError: [Errno 2] No such file or directory: '/sys/class/kfd/kfd/topology/nodes/'

It's not clear to me exactly why this error is emitted. Perhaps it's
because the system does not have an AMD GPU at all. In that case, the
expected output would be "gfx000\n". The purpose of rocm_agent_enumerator
is to list all AMD GPUs on a system. If there are no AMD GPUs, then it
should be an empty list.

This behaviour can be seen in the rocm-hipamd autopkgtests [1]. While
hipcc should probably not be calling rocm_agent_enumerator when the
offload architecture has been manually specified, the
rocm_agent_enumerator shouldn't be emitting any output on stderr.

Sincerely,
Cory Bloor

[1]: https://ci.debian.net/data/autopkgtest/testing/amd64/r/rocm-hipamd/43752739/log.gz

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.6.15-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages rocminfo depends on:
ii  kmod                31+20240202-2
ii  libc6               2.37-15.1
ii  libgcc-s1           14-20240303-1
ii  libhsa-runtime64-1  5.7.1-1
ii  libstdc++6          14-20240303-1
ii  pciutils            1:3.11.1-1
ii  python3             3.11.8-1

rocminfo recommends no packages.

rocminfo suggests no packages.

-- no debconf information
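
For illustration only, here is a minimal sketch of the behaviour the report
asks for: a missing topology directory is treated as "no AMD GPUs" rather
than raising, so the caller can still print just "gfx000". This is not the
upstream fix and not the actual rocm_agent_enumerator code. The names
list_topology_nodes, enumerate_targets and read_target are hypothetical;
only the sysfs path and the "gfx000" placeholder come from the report.

```python
import os

# Path taken from the traceback in the report.
KFD_TOPOLOGY_DIR = "/sys/class/kfd/kfd/topology/nodes/"


def list_topology_nodes(topology_dir=KFD_TOPOLOGY_DIR):
    """Return the sorted node names, or [] if the directory is absent.

    Hypothetical helper: an absent directory (e.g. the kernel module is
    not loaded) is treated as "no devices" instead of an error.
    """
    try:
        return sorted(os.listdir(topology_dir))
    except FileNotFoundError:
        return []


def enumerate_targets(topology_dir=KFD_TOPOLOGY_DIR, read_target=None):
    """Return a list of target names, always starting with "gfx000".

    read_target is a hypothetical callback mapping a node path to a
    target name (or None); how the real tool derives the name from a
    node is deliberately not modelled here.
    """
    targets = ["gfx000"]
    for node in list_topology_nodes(topology_dir):
        if read_target is None:
            continue
        target = read_target(os.path.join(topology_dir, node))
        if target and target not in targets:
            targets.append(target)
    return targets


if __name__ == "__main__":
    # Nothing is written to stderr when the directory is missing.
    print("\n".join(enumerate_targets()))
```

Catching FileNotFoundError around os.listdir, rather than checking
os.path.isdir first, avoids a race between the check and the listing. Either
approach would give the quiet "gfx000"-only output described above.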