Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hi Fred,

On 2025-05-29 02:06, PICCA Frederic-Emmanuel wrote:
> So here is a new question: what about ROCm on the CPU? Is it possible to
> run the ROCm OpenCL on the CPU (at least it would allow us to test this
> part) without all this black magic 🙂.

I don't know. I imagine it is probably device-only, but I've emailed the development team to ask.

With that said, I'm not sure that the test results would be very interesting if executing on the host. That would not exercise any of the underlying ROCm compiler, runtime or driver components.

Sincerely,
Cory Bloor
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Another question: I am still investigating how to get a working unshare setup.

I was told that unshare is meant to isolate from the hardware. I can understand this point of view. Nevertheless, at some point we need a CPU in order to make all this work :). In my opinion, GPUs should be available in the same way.

So here is a new question: what about ROCm on the CPU? Is it possible to run the ROCm OpenCL on the CPU (at least it would allow us to test this part) without all this black magic :).

Cheers

Fred
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hi Fred, Étienne,

On 2025-05-26 20:29, Étienne Mollier wrote:
> PICCA Frederic-Emmanuel, on 2025-05-26:
>> do you know what should be done in order to prepare the chroot in our case.
>> a sort of bind mount but when we tried to bind mount something under /dev/,
>> it ended up with this error

I'm not yet familiar with sbuild's unshare backend, but I assume that just like the autopkgtest-virt-unshare backend, it only supports bind mounts of directories at most. So you could bind mount the necessary directory /dev/dri, but not the device /dev/kfd.

I took a brief look at adding this a short while ago but shelved it for now as we are in hard freeze. For now, you'd have to use the podman backend to run tests.

> I'm not sure yet how to expose devices the right way in unshare
> mode. I would expect any method involving bind-mount to require
> running the pre-build commands as root, and any method involving
> mknod to require work with capabilities(7), user_namespaces(7)
> and setcap(8), but implementation details are evading me so far.

I think it should work just with user_namespaces, because rootless podman does it like that. This PoC seemed to work:

  christian@host$ unshare --user --map-root-user --mount
  root@host$ touch /tmp/kfd && mount --bind /dev/kfd /tmp/kfd && ls -l /tmp/kfd
  crw-rw 1 nobody nogroup 239, 0 Mai 26 22:56 /tmp/kfd

Though I have to say, without some web searches pointing me to the solution of touching the file to bind-mount the device over, I would not have found this.

(I strongly suspect that chown'ing won't be possible or won't fix the issue entirely. For podman, I had to jump through a few hoops; search for "setgroups" in [1].)

Best,
Christian

[1]: https://salsa.debian.org/rocm-team/community/team-project/-/blob/master/doc/rocm-autopkgtests-in-containers.md
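The "touch first, then bind-mount" step from the PoC above can be written as a small helper. This is only a sketch: the function name prepare_bind_target is made up for illustration, and it only creates the mount point. The mount --bind itself still has to be run inside the user namespace, as in the PoC.

```shell
# Hypothetical helper: create a mount point suitable for a later
# "mount --bind SRC DST". Directory sources get a directory; anything
# else (regular files, device nodes) gets an empty regular file, which
# is the "touch" trick from the PoC.
prepare_bind_target() {
    src=$1
    dst=$2
    if [ -d "$src" ]; then
        mkdir -p "$dst"
    else
        mkdir -p "$(dirname "$dst")" && touch "$dst"
    fi
}
```

Inside the shell started by unshare --user --map-root-user --mount, one would then run something like: prepare_bind_target /dev/kfd /tmp/kfd && mount --bind /dev/kfd /tmp/kfd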
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hi Fred,

PICCA Frederic-Emmanuel, on 2025-05-26:
> We are just using sbuild in order to run the autopkgtest with the unshare
> system
>
> It ends up with this error
>
> autopkgtest [13:11:41]: test command1: rocm-test-launcher reconstruct
> autopkgtest [13:11:41]: test command1: [---
> /dev/kfd not present, system either lacks AMD GPU or 'amdgpu' driver is not
> loaded.
> Skipping tests.
[…]
> do you know what should be done in order to prepare the chroot in our case.

If getting back to schroot mode is not a problem for you, there is the option to follow these instructions [1] I wrapped up a couple of years ago. They should still apply, modulo the fact that it is necessary to also pass --chroot-mode=schroot to the sbuild command, or adjust the sbuildrc to change the default mode.

[1]: https://lists.debian.org/debian-ai/2022/06/msg0.html

> a sort of bind mount but when we tried to bind mount something under /dev/,
> it ended up with this error
>
> I: Placed new chroot tarball at /home/picca/.cache/sbuild/unstable-amd64.tar
> I: Unpacking /home/picca/.cache/sbuild/unstable-amd64.tar to
> /tmp/tmp.sbuild.xDhCW0XnvX...
> mount: /tmp/tmp.sbuild.xDhCW0XnvX/dev/kfd: wrong fs type, bad option, bad
> superblock on /dev/kfd, missing codepage or helper program, or other error.
> dmesg(1) may have more information after failed mount system call.
> mount failed: No such file or directory at /usr/libexec/sbuild-usernsexec
> line 387.
> E: Running 'true' inside the chroot failed. Please file a bug against sbuild.
> E: Error creating chroot session: skipping ufo-filters

I'm not sure yet how to expose devices the right way in unshare mode. I would expect any method involving bind-mount to require running the pre-build commands as root, and any method involving mknod to require work with capabilities(7), user_namespaces(7) and setcap(8), but implementation details are evading me so far.

In hope this helps,
-- 
Étienne Mollier
pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
sent from /dev/pts/3, please excuse my verbosity
on air: Agents of Mercy - The Black Forest
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hello Christian

> This requires package rocm-podman-support. That package also provides
> you with
> * rocm-podman-setup(1) to prepare your system for GPU-in-container use
> * rocm-podman-create(1) to create an image.

We are just using sbuild in order to run the autopkgtest with the unshare system.

It ends up with this error:

autopkgtest [13:11:41]: test command1: rocm-test-launcher reconstruct
autopkgtest [13:11:41]: test command1: [---
/dev/kfd not present, system either lacks AMD GPU or 'amdgpu' driver is not loaded.
Skipping tests.

But this computer does have an AMD graphics computing unit:

*** Agent 2 ***
  Name:              gfx90c
  Uuid:              GPU-XX
  Marketing Name:    AMD Radeon Graphics
  Vendor Name:       AMD
  Feature:           KERNEL_DISPATCH
  Profile:           BASE_PROFILE
  Float Round Mode:  NEAR
  Max Queue Number:  128(0x80)
  Queue Min Size:    64(0x40)
  Queue Max Size:    131072(0x20000)
  Queue Type:        MULTI
  Node:              1
  Device Type:       GPU
  Cache Info:
    L1:              16(0x10) KB
    L2:              1024(0x400) KB
  Chip ID:           5686(0x1636)

Do you know what should be done in order to prepare the chroot in our case? Some sort of bind mount, presumably, but when we tried to bind mount something under /dev/, it ended up with this error:

I: Placed new chroot tarball at /home/picca/.cache/sbuild/unstable-amd64.tar
I: Unpacking /home/picca/.cache/sbuild/unstable-amd64.tar to /tmp/tmp.sbuild.xDhCW0XnvX...
mount: /tmp/tmp.sbuild.xDhCW0XnvX/dev/kfd: wrong fs type, bad option, bad superblock on /dev/kfd, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.
mount failed: No such file or directory at /usr/libexec/sbuild-usernsexec line 387.
E: Running 'true' inside the chroot failed. Please file a bug against sbuild.
E: Error creating chroot session: skipping ufo-filters

Thanks for your help.

Fred
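For readers wondering why the log above says "Skipping tests" rather than failing, the following is a minimal sketch of a "skip if the device is missing" check. It is not the actual rocm-test-launcher code. The function name require_kfd and the KFD_DEV override are invented for illustration. The only outside fact it relies on is that autopkgtest treats exit status 77 as a skip for tests that declare the skippable restriction.

```shell
# Sketch only, not the real rocm-test-launcher logic.
# KFD_DEV is a hypothetical override so the check can be exercised on
# machines without an AMD GPU; by default it looks at /dev/kfd.
require_kfd() {
    dev=${KFD_DEV:-/dev/kfd}
    if [ ! -c "$dev" ]; then
        echo "$dev not present, skipping tests." >&2
        # 77 is the status autopkgtest maps to "skipped" for tests
        # marked with "Restrictions: skippable".
        return 77
    fi
}
```

A test script could start with "require_kfd || exit $?" so that a testbed without the device node (such as the unshare chroot described above) reports a skip instead of an error.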
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
On 2025-05-13 23:23, Cordell Bloor wrote:
> Typically there is also the option of using the sadt tool from
> devscripts [1]. However, it doesn't seem to recognize the Architecture
> field in d/t/control

That warrants a bug report.

I took a brief look. At first glance, the trivial patch below should fix this by ignoring the field's contents, which wouldn't be an entirely unreasonable workaround. Parsing the field would be a bit more work, but perhaps the maintainers would be willing to implement it.

Best,
Christian

> [1]: https://manpages.debian.org/testing/devscripts/sadt.1.en.html

$ diff -u /usr/bin/sadt sadt
--- /usr/bin/sadt 2025-04-29 05:20:06.0 +0200
+++ /tmp/sadt 2025-05-14 22:45:56.169844645 +0200
@@ -453,6 +453,9 @@
     def add_tests_directory(self, path):
         self.tests_directory = path
 
+    def add_architecture(self, architecture):
+        pass
+
 
 def copy_build_tree():
     rw_build_tree = tempfile.mkdtemp(prefix="sadt-rwbt.")
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hi Christian,

On 2025-05-13 14:05, Christian Kastner wrote:
> If you mean "run the autopkgtest I'm defining in debian/tests/*
> manually", then you can simply run the test command from that
> directory, as long as you have the packages installed.

Typically there is also the option of using the sadt tool from devscripts [1]. However, it doesn't seem to recognize the Architecture field in d/t/control, which prevents it from functioning with the ROCm autopkgtests (as I believe they all specify that field), e.g.

root@da6262403eec:~/rocrand/rocrand# sadt rocrand
cannot parse package relationship "", returning it raw
sadt: warning: unknown field Architecture, skipping the whole paragraph
sadt: warning: unknown field Architecture, skipping the whole paragraph
OK (tests=0)

Sincerely,
Cory Bloor

[1]: https://manpages.debian.org/testing/devscripts/sadt.1.en.html
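To see in advance how many paragraphs sadt would skip for this reason, one could count the stanzas in debian/tests/control that carry an Architecture field. The helper below is a rough sketch with a made-up name (count_arch_stanzas). It assumes stanzas are separated by empty lines and that the field name is spelled with a capital A.

```shell
# Hypothetical helper: count stanzas of a debian/tests/control file
# that contain an Architecture field. awk's paragraph mode (RS = "")
# treats each blank-line-separated stanza as one record.
count_arch_stanzas() {
    awk 'BEGIN { RS = ""; n = 0 }
         /(^|\n)Architecture:/ { n++ }
         END { print n }' "$1"
}
```

Run as "count_arch_stanzas debian/tests/control" from the package source tree. A result equal to the total number of stanzas would be consistent with the "OK (tests=0)" outcome shown above.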
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hi Clement,

On 2025-05-13 16:41, LONGEAC Clement wrote:
> Thank you for all your answers ! However I was wondering, is it possible to
> do autopkgtest without backend, here (podman+rocm)?

It depends on what you mean. autopkgtest(1) is a testing frontend, and it has many backends of the form autopkgtest-virt-*. The Debian ROCm Team ships an autopkgtest-virt-podman+rocm, which provides the "podman+rocm" backend.

So if with "do[ing] autopkgtest without backend", you mean using the autopkgtest(1) *frontend*, then no: a backend is required.

If you mean "run the autopkgtest I'm defining in debian/tests/* manually", then you can simply run the test command from that directory, as long as you have the packages installed.

> If we don't have a GPU, how do we go about using the backend without a
> container?

Sorry, I don't quite follow. Without a GPU, you wouldn't run a test at all, right? And a container is just something that some of the autopkgtest-virt-* backends use (be it docker, podman, or lxc). There are other options as well.

> Schroot is gradually falling into disuse, can we use the unshare backend
> instead of schroot when ordering autopkgtest?

In general: sure, see autopkgtest-virt-unshare(1). However, it won't work with a GPU because there is no way to pass in the GPU device. The backend does have a --bind option, but it only works on directories, not devices.

Adding device support looks to be quite easy, I'll file an MR over the weekend.

Best,
Christian
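Since backends are just executables named autopkgtest-virt-*, a quick way to see which ones are available on a machine is to list them. The helper name list_backends is invented for this sketch, and the default of /usr/bin is an assumption about where the packages install the backends.

```shell
# Hypothetical helper: print the backend names found as
# autopkgtest-virt-* executables in a directory (default: /usr/bin).
list_backends() {
    dir=${1:-/usr/bin}
    for f in "$dir"/autopkgtest-virt-*; do
        # Skips non-executables and the unexpanded glob when nothing matches.
        [ -x "$f" ] || continue
        echo "${f##*/autopkgtest-virt-}"
    done
}
```

Each name printed is what goes after the "--" on the autopkgtest command line, for example "podman+rocm" once rocm-podman-support is installed.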
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hi Christian,

Thank you for all your answers! However, I was wondering: is it possible to do autopkgtest without a backend, here (podman+rocm)?

If we don't have a GPU, how do we go about using the backend without a container?

Schroot is gradually falling into disuse; can we use the unshare backend instead of schroot when running autopkgtest?

Best regards,
Clément LONGEAC

----- Original message -----
From: "Christian Kastner"
To: "debian-science"
Sent: Monday, 12 May 2025 22:56:44
Subject: Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

On 2025-05-12 13:06, LONGEAC Clement wrote:
> We have several NVIDIA cards and one AMD

I wasn't paying attention: as there is only *one* AMD card, you can of course use the podman+rocm backend, as it will pick up all AMD cards. As there is only one, the "all" is not a problem here.

So you can simply autopkgtest your package, with rocm-test-launcher enabled, with

  $ autopkgtest -B -- podman+rocm

This requires package rocm-podman-support. That package also provides you with
  * rocm-podman-setup(1) to prepare your system for GPU-in-container use
  * rocm-podman-create(1) to create an image.

Best,
Christian
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
On 2025-05-12 13:06, LONGEAC Clement wrote:
> We have several NVIDIA cards and one AMD

I wasn't paying attention: as there is only *one* AMD card, you can of course use the podman+rocm backend, as it will pick up all AMD cards. As there is only one, the "all" is not a problem here.

So you can simply autopkgtest your package, with rocm-test-launcher enabled, with

  $ autopkgtest -B -- podman+rocm

This requires package rocm-podman-support. That package also provides you with
  * rocm-podman-setup(1) to prepare your system for GPU-in-container use
  * rocm-podman-create(1) to create an image.

Best,
Christian
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
On 2025-05-12 22:32, Cordell Bloor wrote:
> On 2025-05-12 10:54, Christian Kastner wrote:
>> [3]: Sadly, after much trying, it seems that the analog for [1] in
>> rootless containers, using the 'podman+rocm' backend, is not
>> possible due to some cgroupsv2 restriction. However, I still
>> have the code for that, and I guess I could ship it for people who
>> want to try it in rootful containers.
>
> I take it you are referring to setting environment variables in podman
> workers? The ROCR_VISIBLE_DEVICES variable can isolate the GPU at a
> fairly low level in the ROCm user land [2]. Or, do you mean only passing
> through a subset of devices at all? I forget how you were approaching this.

I was referring to passing in the devices, which seems to be the approach preferred by the official documentation [5], but not possible in rootless containers because of [6].

I did try to switch over to ROCR_VISIBLE_DEVICES at some point, but ran into some impediment. I don't immediately recall what it was.

That reminds me though, ROCR_VISIBLE_DEVICES can be passed as-is, without needing magic. IOW, the following should work (though untested):

  autopkgtest -B --env=ROCR_VISIBLE_DEVICES=xxx -- podman+rocm

Best,
Christian

[5]: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#restricting-gpu-access
[6]: https://github.com/containers/podman/issues/21454#issuecomment-1920058541
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hi Christian,

On 2025-05-12 10:54, Christian Kastner wrote:
> [3]: Sadly, after much trying, it seems that the analog for [1] in
> rootless containers, using the 'podman+rocm' backend, is not
> possible due to some cgroupsv2 restriction. However, I still
> have the code for that, and I guess I could ship it for people who
> want to try it in rootful containers.

I take it you are referring to setting environment variables in podman workers? The ROCR_VISIBLE_DEVICES variable can isolate the GPU at a fairly low level in the ROCm user land [2]. Or, do you mean only passing through a subset of devices at all? I forget how you were approaching this.

In any case, isolation via rooted containers would probably be useful as an option. I'd like to limit Pinwheel and Arctophylax to a single GPU [3]. They're getting a fair bit of interactive use now and that would make it easier to share them. It's up to you, though.

Sincerely,
Cory Bloor

[2]: https://rocm.docs.amd.com/en/docs-6.4.0/conceptual/gpu-isolation.html
[3]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Continuous-integration-workers
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hi Clement,

On 2025-05-12 13:06, LONGEAC Clement wrote:
> We would have to hide NVIDIA in order to test only on AMD Radeon RX
> 6400 . Rather than making a specific command for testing on the AMD or a
> specific card, would it be possible to add an option to the rocm-test-
> launcher /debian/test/opencl command, allowing specific selection of one
> or more cards for the test phases? Should we select the specific card
> we're interested in?

Hm, interesting problem.

rocm-test-launcher was designed to be a helper for tests driven by the autopkgtest command, rather than being invoked directly. Among other things, the utility checks for GPU presence only so that it can skip, rather than error out, when no GPU is present, which is what needs to happen in the official CI (as opposed to [1]).

I saw GPU selection as part of the testbed setup. To this end, packages 'rocm-qemu-support' and 'rocm-podman-support' provide suitable autopkgtest backends [2, 3].

Given the above, adding this feature to rocm-test-launcher would seem like a layer violation, so Cory's intuition was right.

However, it also shouldn't be needed. When designing tests, if you want to run them in your shell, you might as well invoke the tests directly, i.e.:

  OCL_ICD_VENDORS=/etc/OpenCL/vendors/amdocl64.icd debian/tests/script*

And in d/tests/control, you would just use

  rocm-test-launcher debian/tests/script*

and you could rely on the fact that our team's CI [1] provides properly set up testbeds.

Side note: The synopsis of rocm-test-launcher is

  rocm-test-launcher CMD [ARGS]

so if your intention was for debian/tests/script* to expand to multiple tests, you'll need to invoke rocm-test-launcher for each of them, or create a small wrapper like this one [4].

On 2025-05-12 17:56, Cordell Bloor wrote:
> You should direct that question towards the pkg-rocm-tools
> maintainer (or author). I've CC'd Christian Kastner

Thanks. In any case, I'm subscribed to the list :)

Best,
Christian

[1]: https://ci.rocm.debian.net

[2]: For example, with the 'qemu+rocm' backend, you could run an autopkgtest with the one GPU in PCI slot 09:00.0 like so:

  $ autopkgtest -B -- qemu+rocm --gpu 09:00.0

[3]: Sadly, after much trying, it seems that the analog for [1] in rootless containers, using the 'podman+rocm' backend, is not possible due to some cgroupsv2 restriction. However, I still have the code for that, and I guess I could ship it for people who want to try it in rootful containers.

[4]: https://sources.debian.org/src/rocrand/5.7.1-6/debian/tests/run-testdir/
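Because the synopsis is rocm-test-launcher CMD [ARGS], a glob that matches several scripts would hand the second and later scripts to the first one as arguments. A per-test loop avoids that. The sketch below is not the run-testdir wrapper linked in [4]. The function name run_each_test and the LAUNCHER override are made up for illustration.

```shell
# Hypothetical wrapper: run every executable regular file in a directory
# through the launcher, one invocation per test.
# LAUNCHER can be overridden (e.g. for a dry run without a GPU).
run_each_test() {
    dir=$1
    status=0
    for t in "$dir"/*; do
        [ -f "$t" ] && [ -x "$t" ] || continue
        # Left unquoted on purpose so LAUNCHER may carry extra arguments.
        ${LAUNCHER:-rocm-test-launcher} "$t" || status=1
    done
    return "$status"
}
```

The function keeps going after a failing test and returns non-zero at the end if any test failed. Stopping at the first failure instead would be an equally reasonable design.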
Re: ROCm test launcher modifications when there is other non-AMD GPU cards.
Hi Clement,

On May 12, 2025 5:06:28 a.m. MDT, LONGEAC Clement wrote:
> We have several NVIDIA cards and one AMD and want to test only on one,
> the AMD Radeon RX 6400 using this command rocm-test-launcher
> /debian/test/opencl. We encounter a problem when we want to run the test
> only on the AMD. The command takes all available cards, including NVIDIA
> and AMD. We were able to solve the problem by launching this command:
>
> *OCL_ICD_VENDORS=/etc/OpenCL/vendors/amdocl64.icd rocm-test-launcher debian/tests/script*
>
> We would have to hide NVIDIA in order to test only on AMD Radeon RX
> 6400 . Rather than making a specific command for testing on the AMD or a
> specific card, would it be possible to add an option to the
> rocm-test-launcher /debian/test/opencl command, allowing specific
> selection of one or more cards for the test phases? Should we select the
> specific card we're interested in?

You should direct that question towards the pkg-rocm-tools maintainer (or author). I've CC'd Christian Kastner, as he may be able to provide more perspective. I don't know if managing visible devices is in-scope for the rocm-test-launcher, but Christian has probably considered the question before.

Sincerely,
Cory Bloor
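The OCL_ICD_VENDORS workaround quoted above can be kept in a small shell helper so that it does not have to be retyped for each command. This is only a sketch. The function name run_amd_only and the AMD_ICD override are invented here, and the default path is simply the one from the message.

```shell
# Hypothetical helper: run a command with only the AMD OpenCL ICD
# visible to the ICD loader. AMD_ICD defaults to the path used in the
# message above; set it if the ICD file lives elsewhere.
run_amd_only() {
    env OCL_ICD_VENDORS="${AMD_ICD:-/etc/OpenCL/vendors/amdocl64.icd}" "$@"
}
```

For example: run_amd_only rocm-test-launcher debian/tests/script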