Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-06-02 Thread Cordell Bloor

Hi Fred,

On 2025-05-29 02:06, PICCA Frederic-Emmanuel wrote:

> So here is a new question: what about ROCm on the CPU?
>
> Is it possible to run the ROCm OpenCL on the CPU (at least that would allow
> testing this part) without all this black magic 🙂?


I don't know. I imagine it is probably device-only, but I've emailed the 
development team to ask.


With that said, I'm not sure that the test results would be very 
interesting if executing on the host. That would not exercise any of the 
underlying ROCm compiler, runtime or driver components.


Sincerely,
Cory Bloor



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-29 Thread PICCA Frederic-Emmanuel
Another question:

I am still investigating how to get a working unshare setup. I was told
that unshare is meant to isolate from the hardware. I can understand this
point of view.
Nevertheless, at some point we need a CPU in order to make all this work :).
In my opinion, GPUs should be available in the same way.

So here is a new question: what about ROCm on the CPU?

Is it possible to run the ROCm OpenCL on the CPU (at least that would allow
testing this part) without all this black magic :)?

Cheers

Fred



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-26 Thread Christian Kastner
Hi Fred, Étienne,

On 2025-05-26 20:29, Étienne Mollier wrote:
> PICCA Frederic-Emmanuel, on 2025-05-26:
>> Do you know what should be done to prepare the chroot in our case?

>> We tried a sort of bind mount, but when we tried to bind mount something
>> under /dev/, it ended up with this error

I'm not yet familiar with sbuild's unshare backend, but I assume that
just like the autopkgtest-virt-unshare backend, it only supports bind
mount of directories at most.

So you could bind mount the necessary directory /dev/dri, but not the
device /dev/kfd.

I took a brief look at adding this a short while ago but shelved it
for now as we are in hard freeze. For now, you'd have to use the
podman backend to run tests.

> I'm not sure yet how to expose devices the right way in unshare
> mode.  I would expect any method involving bind-mount to require
> running the pre-build commands as root, and any method involving
> mknod to require work with capabilities(7), user_namespaces(7)
> and setcap(8), but implementation details are evading me so far.

I think it should work just with user_namespaces, because rootless
podman does it like that.

This PoC seemed to work:

  christian@host$ unshare --user --map-root-user --mount
  root@host$ touch /tmp/kfd && mount --bind /dev/kfd /tmp/kfd && ls -l /tmp/kfd
  crw-rw 1 nobody nogroup 239, 0 Mai 26 22:56 /tmp/kfd

Though I have to say, without some web searches pointing me to the
solution of touching the file to bind-mount the device over, I
would not have found this.

(I strongly suspect that chown'ing won't be possible or won't fix the
issue entirely. For podman, I had to jump through a few hoops, search
for "setgroups" in [1]).

Best,
Christian

[1]: https://salsa.debian.org/rocm-team/community/team-project/-/blob/master/doc/rocm-autopkgtests-in-containers.md



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-26 Thread Étienne Mollier
Hi Fred,

PICCA Frederic-Emmanuel, on 2025-05-26:
> We are just using sbuild to run the autopkgtest in unshare mode.
>
> It ends up with this error:
> 
> autopkgtest [13:11:41]: test command1: rocm-test-launcher reconstruct
> autopkgtest [13:11:41]: test command1: [---
> /dev/kfd not present, system either lacks AMD GPU or 'amdgpu' driver is not 
> loaded.
> Skipping tests.
[…]
> Do you know what should be done to prepare the chroot in our case?

If getting back to schroot mode is not a problem for you, there
is the option to follow these instructions[1] I wrapped up a
couple of years ago.  They should still apply, modulo the fact
that it is necessary to also pass --chroot-mode=schroot to the
sbuild command, or adjust the sbuildrc to change the default
mode.

[1]: https://lists.debian.org/debian-ai/2022/06/msg0.html

> We tried a sort of bind mount, but when we tried to bind mount something
> under /dev/, it ended up with this error:
> 
> I: Placed new chroot tarball at /home/picca/.cache/sbuild/unstable-amd64.tar
> I: Unpacking /home/picca/.cache/sbuild/unstable-amd64.tar to 
> /tmp/tmp.sbuild.xDhCW0XnvX...
> mount: /tmp/tmp.sbuild.xDhCW0XnvX/dev/kfd: wrong fs type, bad option, bad 
> superblock on /dev/kfd, missing codepage or helper program, or other error.
>    dmesg(1) may have more information after failed mount system call.
> mount failed: No such file or directory at /usr/libexec/sbuild-usernsexec 
> line 387.
> E: Running 'true' inside the chroot failed. Please file a bug against sbuild.
> E: Error creating chroot session: skipping ufo-filters

I'm not sure yet how to expose devices the right way in unshare
mode.  I would expect any method involving bind-mount to require
running the pre-build commands as root, and any method involving
mknod to require work with capabilities(7), user_namespaces(7)
and setcap(8), but implementation details are evading me so far.

Hoping this helps,
-- 
  .''`.  Étienne Mollier 
 : :' :  pgp: 8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
 `. `'   sent from /dev/pts/3, please excuse my verbosity
   `-on air: Agents of Mercy - The Black Forest




Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-26 Thread PICCA Frederic-Emmanuel
Hello Christian

> This requires package rocm-podman-support. That package also provides
> you with
>  * rocm-podman-setup(1) to prepare your system for GPU-in-container use
>  * rocm-podman-create(1) to create an image.

We are just using sbuild to run the autopkgtest in unshare mode.

It ends up with this error:

autopkgtest [13:11:41]: test command1: rocm-test-launcher reconstruct
autopkgtest [13:11:41]: test command1: [---
/dev/kfd not present, system either lacks AMD GPU or 'amdgpu' driver is not 
loaded.
Skipping tests.


But this computer does have an AMD graphics compute unit:

***
Agent 2
***
  Name:                    gfx90c
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      1024(0x400) KB
  Chip ID:                 5686(0x1636)


Do you know what should be done to prepare the chroot in our case?

We tried a sort of bind mount, but when we tried to bind mount something under
/dev/, it ended up with this error:

I: Placed new chroot tarball at /home/picca/.cache/sbuild/unstable-amd64.tar
I: Unpacking /home/picca/.cache/sbuild/unstable-amd64.tar to 
/tmp/tmp.sbuild.xDhCW0XnvX...
mount: /tmp/tmp.sbuild.xDhCW0XnvX/dev/kfd: wrong fs type, bad option, bad 
superblock on /dev/kfd, missing codepage or helper program, or other error.
   dmesg(1) may have more information after failed mount system call.
mount failed: No such file or directory at /usr/libexec/sbuild-usernsexec line 
387.
E: Running 'true' inside the chroot failed. Please file a bug against sbuild.
E: Error creating chroot session: skipping ufo-filters


thanks for your help.

Fred



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-14 Thread Christian Kastner
On 2025-05-13 23:23, Cordell Bloor wrote:
> Typically there is also the option of using the sadt tool from
> devscripts [1]. However, it doesn't seem to recognize the Architecture
> field in d/t/control

That warrants a bug report.

I took a brief look. At first glance, the trivial patch below should fix
this by ignoring the field's contents, which wouldn't be an entirely
unreasonable workaround.

Parsing the field would be a bit more work, but perhaps the maintainers
would be willing to implement it.

Best,
Christian

> [1]: https://manpages.debian.org/testing/devscripts/sadt.1.en.html

 $ diff -u /usr/bin/sadt sadt
--- /usr/bin/sadt   2025-04-29 05:20:06.0 +0200
+++ /tmp/sadt   2025-05-14 22:45:56.169844645 +0200
@@ -453,6 +453,9 @@
     def add_tests_directory(self, path):
         self.tests_directory = path
 
+    def add_architecture(self, architecture):
+        pass
+
 
 def copy_build_tree():
     rw_build_tree = tempfile.mkdtemp(prefix="sadt-rwbt.")
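
For comparison, parsing the field rather than ignoring it might look roughly like the sketch below. It is only an illustration in Python (the language sadt is written in), not a patch against sadt. The function name architecture_matches is hypothetical, and the matching is deliberately simplified: exact names, "any", "!arch" negation, and the "<os>-any" / "any-<cpu>" wildcards, with "linux" assumed as the OS for bare architecture names. A real implementation would more likely delegate to dpkg-architecture --is.

```python
def architecture_matches(field, host_arch, host_os="linux"):
    """Return True if a test with this Architecture field should run on host_arch.

    Simplified sketch: host_os is an assumption used for bare names like 'amd64'.
    """
    entries = field.split()
    if not entries:
        return True  # no restriction given

    def matches_one(pattern):
        # Exact architecture name or the catch-all wildcard.
        if pattern in ("any", host_arch):
            return True
        # OS or CPU wildcards such as 'linux-any' or 'any-amd64'.
        return pattern in (f"{host_os}-any", f"any-{host_arch}")

    negated = [e[1:] for e in entries if e.startswith("!")]
    positive = [e for e in entries if not e.startswith("!")]
    if any(matches_one(p) for p in negated):
        return False
    # A field made only of negations allows everything that was not excluded.
    return any(matches_one(p) for p in positive) if positive else True
```

With something like this, a paragraph restricted to other architectures could be skipped on its own, instead of every paragraph carrying the field being dropped.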



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-13 Thread Cordell Bloor

Hi Christian,

On 2025-05-13 14:05, Christian Kastner wrote:

> If you mean "run the autopkgtest I'm defining in debian/tests/*
> manually", then you can simply run the test command from that directory,
> as long as you have the packages installed.


Typically there is also the option of using the sadt tool from 
devscripts [1]. However, it doesn't seem to recognize the Architecture 
field in d/t/control, which prevents it from functioning with the ROCm 
autopkgtests (as I believe they all specify that field). e.g.


root@da6262403eec:~/rocrand/rocrand# sadt rocrand
cannot parse package relationship "", returning it raw
sadt: warning: unknown field Architecture, skipping the whole paragraph
sadt: warning: unknown field Architecture, skipping the whole paragraph


OK (tests=0)

Sincerely,
Cory Bloor

[1]: https://manpages.debian.org/testing/devscripts/sadt.1.en.html



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-13 Thread Christian Kastner
Hi Clement,

On 2025-05-13 16:41, LONGEAC Clement wrote:
> Thank you for all your answers! However, I was wondering: is it possible to
> do autopkgtest without a backend (here, podman+rocm)?

It depends on what you mean.

autopkgtest(1) is a testing frontend, and it has many backends of the
form autopkgtest-virt-*. The Debian ROCm Team ships an
autopkgtest-virt-podman+rocm, which provides the "podman+rocm" backend.

So if with "do[ing] autopkgtest without backend", you mean using the
autopkgtest(1) *frontend*, then no: a backend is required.

If you mean "run the autopkgtest I'm defining in debian/tests/*
manually", then you can simply run the test command from that directory,
as long as you have the packages installed.

> If we don't have a GPU, how do we go about using the backend without a 
> container?

Sorry, I don't quite follow. Without a GPU, you wouldn't run a test at
all, right?

And a container is just something that some of the autopkgtest-virt-*
backends use (be it docker, podman, or lxc). There are other options as
well.

> Schroot is gradually falling into disuse. Can we use the unshare backend
> instead of schroot when running autopkgtest?

In general: sure, see autopkgtest-virt-unshare(1). However, it won't
work with a GPU because there is no way to pass in the GPU device. The
backend does have a --bind option, but it only works on directories, not
devices.

Adding device support looks to be quite easy, I'll file an MR over the
weekend.

Best,
Christian



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-13 Thread LONGEAC Clement
Hi Christian,

Thank you for all your answers! However, I was wondering: is it possible to do
autopkgtest without a backend (here, podman+rocm)? If we don't have a GPU, how do
we go about using the backend without a container? Schroot is gradually falling
into disuse. Can we use the unshare backend instead of schroot when running
autopkgtest?

Best regards,
Clément LONGEAC




Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-12 Thread Christian Kastner
On 2025-05-12 13:06, LONGEAC Clement wrote:
> We have several NVIDIA cards and one AMD

I wasn't paying attention: as there is only *one* AMD card, you can of
course use the podman+rocm backend, as it will pick up all AMD cards. As
there is only one, the "all" is not a problem here.

So you can simply autopkgtest your package, with rocm-test-launcher
enabled, with

  $ autopkgtest -B  -- podman+rocm 

This requires package rocm-podman-support. That package also provides
you with
  * rocm-podman-setup(1) to prepare your system for GPU-in-container use
  * rocm-podman-create(1) to create an image.

Best,
Christian



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-12 Thread Christian Kastner
On 2025-05-12 22:32, Cordell Bloor wrote:
> On 2025-05-12 10:54, Christian Kastner wrote:
>> [3]: Sadly, after much trying, it seems that the analog for [1] in
>>   rootless containers, using the 'podman+rocm' backend, is not
>>   possible due to some cgroups v2 restriction. However, I still
>>   have the code for that, and I guess I could ship it for people who
>>   want to try it in rootful containers.
> 
> I take it you are referring to setting environment variables in podman
> workers? The ROCR_VISIBLE_DEVICES variable can isolate the GPU at a
> fairly low level in the ROCm user land [2]. Or, do you mean only passing
> through a subset of devices at all? I forget how you were approaching this.

I was referring to passing in the devices, which seems to be the
approach preferred by the official documentation [5], but not possible
in rootless containers because of [6].

I did try to switch over to ROCR_VISIBLE_DEVICES at some point, but ran
into some impediment. Don't immediately recall what it was.

That reminds me though, ROCR_VISIBLE_DEVICES can be passed as-is, without
needing magic. IOW, the following should work (though untested):

autopkgtest -B  --env=ROCR_VISIBLE_DEVICES=xxx -- podman+rocm

Best,
Christian


[5]: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#restricting-gpu-access
[6]: https://github.com/containers/podman/issues/21454#issuecomment-1920058541



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-12 Thread Cordell Bloor

Hi Christian,

On 2025-05-12 10:54, Christian Kastner wrote:

> [3]: Sadly, after much trying, it seems that the analog for [1] in
>   rootless containers, using the 'podman+rocm' backend, is not
>   possible due to some cgroups v2 restriction. However, I still
>   have the code for that, and I guess I could ship it for people who
>   want to try it in rootful containers.


I take it you are referring to setting environment variables in podman 
workers? The ROCR_VISIBLE_DEVICES variable can isolate the GPU at a 
fairly low level in the ROCm user land [2]. Or, do you mean only passing 
through a subset of devices at all? I forget how you were approaching this.


In any case, isolation via rootful containers would probably be useful as 
an option. I'd like to limit Pinwheel and Arctophylax to a single GPU 
[3]. They're getting a fair bit of interactive use now and that would 
make it easier to share them. It's up to you, though.


Sincerely,
Cory Bloor

[2]: https://rocm.docs.amd.com/en/docs-6.4.0/conceptual/gpu-isolation.html
[3]: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Continuous-integration-workers




Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-12 Thread Christian Kastner
Hi Clement,

On 2025-05-12 13:06, LONGEAC Clement wrote:
> We would have to hide NVIDIA in order to test only on AMD Radeon RX
> 6400. Rather than making a specific command for testing on the AMD or a
> specific card, would it be possible to add an option to the
> rocm-test-launcher /debian/test/opencl command, allowing specific
> selection of one or more cards for the test phases? Should we select the
> specific card we're interested in?

Hm, interesting problem. rocm-test-launcher was designed to be a helper
for tests driven by the autopkgtest command, rather than being invoked
directly. Among other things, the utility checks for GPU presence only
so that it can skip, rather than error out, on a lack thereof, which is
what needs to happen in the official CI (as opposed to [1]).
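
The skip-rather-than-fail behaviour can be pictured with a short sketch. This is not the actual rocm-test-launcher code. It assumes only what appears in this thread (the /dev/kfd check and the "Skipping tests." message), plus autopkgtest's convention that a test declared with the "skippable" restriction is reported as skipped when it exits with status 77. The function name launch and its kfd_path parameter are hypothetical.

```python
import os
import subprocess

SKIP_EXIT_CODE = 77  # reported as "skipped" by autopkgtest for skippable tests


def launch(cmd, kfd_path="/dev/kfd"):
    """Run cmd if the AMD compute device node exists, otherwise request a skip."""
    if not os.path.exists(kfd_path):
        # No device node: either there is no AMD GPU or amdgpu is not loaded.
        print(f"{kfd_path} not present. Skipping tests.")
        return SKIP_EXIT_CODE
    # Device present: the test's own exit status decides pass or fail.
    return subprocess.run(cmd).returncode
```

Under these assumptions, a worker without an AMD GPU yields a skip in the results rather than a spurious failure.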

I saw GPU selection as part of the testbed setup. To this end, packages
'rocm-qemu-support' and 'rocm-podman-support' provide suitable
autopkgtest backends [2, 3].

Given the above, adding this feature to rocm-test-launcher would seem
like a layer violation, so Cory's intuition was right.

However, it also shouldn't be needed. When designing tests, if you want
to run them in your shell, you might as well invoke the tests directly,
i.e.:

  OCL_ICD_VENDORS=/etc/OpenCL/vendors/amdocl64.icd debian/tests/script*

And in d/tests/control, you would just use

  rocm-test-launcher debian/tests/script*

and you could rely on the fact that our team's CI [1] provides properly
set up testbeds.
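
For running a test by hand while restricting OpenCL to the AMD ICD, the environment-variable approach can also be scripted. This is a minimal Python sketch, assuming the ICD path quoted above and a hypothetical helper name run_with_amd_icd. OCL_ICD_VENDORS is interpreted by the OpenCL ICD loader (e.g. ocl-icd), so it only has an effect when such a loader is in use.

```python
import os
import subprocess

AMD_ICD = "/etc/OpenCL/vendors/amdocl64.icd"  # path taken from the command above


def run_with_amd_icd(cmd, icd=AMD_ICD):
    """Run cmd with OCL_ICD_VENDORS set, leaving the caller's environment untouched."""
    # Copy the current environment and override only the ICD selection.
    env = dict(os.environ, OCL_ICD_VENDORS=icd)
    return subprocess.run(cmd, env=env).returncode
```

Because only the child's environment is changed, other tests in the same session still see every installed ICD.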

Side note: The synopsis of rocm-test-launcher is

   rocm-test-launcher CMD [ARGS]

so if your intention was for debian/tests/script* to expand to multiple
tests, you'll need to invoke rocm-test-launcher for each of them, or
create a small wrapper like this one [4].
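
Such a wrapper can be very small. The following Python sketch is not the run-testdir script referenced in [4], just an illustration of the idea: it invokes a launcher once per test script, since the launcher takes a single CMD. The function name run_each and its launcher parameter are hypothetical.

```python
import subprocess


def run_each(scripts, launcher=("rocm-test-launcher",)):
    """Invoke the launcher once per script and collect (script, exit status) pairs."""
    results = []
    for script in scripts:
        # One launcher process per test, because the launcher takes a single CMD.
        completed = subprocess.run([*launcher, script])
        results.append((script, completed.returncode))
    return results
```

It could be fed with, for example, sorted(glob.glob("debian/tests/script*")), leaving the caller to decide how the collected exit statuses map to an overall result.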

On 2025-05-12 17:56, Cordell Bloor wrote:
> You should direct that question towards the pkg-rocm-tools
> maintainer (or author). I've CC'd Christian Kastner

Thanks. In any case, I'm subscribed to the list :)

Best,
Christian

[1]: https://ci.rocm.debian.net

[2]: For example, with the 'qemu+rocm' backend, you could run an
 autopkgtest with the one GPU in PCI slot 09:00.0 like so:

 $ autopkgtest -B  -- qemu+rocm --gpu 09:00.0 

[3]: Sadly, after much trying, it seems that the analog for [1] in
 rootless containers, using the 'podman+rocm' backend, is not
 possible due to some cgroups v2 restriction. However, I still
 have the code for that, and I guess I could ship it for people who
 want to try it in rootful containers.

[4]: https://sources.debian.org/src/rocrand/5.7.1-6/debian/tests/run-testdir/



Re: ROCm test launcher modifications when there is other non-AMD GPU cards.

2025-05-12 Thread Cordell Bloor

Hi Clement,

On May 12, 2025 5:06:28 a.m. MDT, LONGEAC Clement wrote:


> We have several NVIDIA cards and one AMD and want to test only on
> one, the AMD Radeon RX 6400, using this command: rocm-test-launcher
> /debian/test/opencl. We encounter a problem when we want to run the
> test only on the AMD. The command takes all available cards,
> including NVIDIA and AMD. We were able to solve the problem by
> launching this command:
>
>   OCL_ICD_VENDORS=/etc/OpenCL/vendors/amdocl64.icd rocm-test-launcher debian/tests/script*
>
> We would have to hide NVIDIA in order to test only on AMD Radeon RX
> 6400. Rather than making a specific command for testing on the AMD
> or a specific card, would it be possible to add an option to the
> rocm-test-launcher /debian/test/opencl command, allowing specific
> selection of one or more cards for the test phases? Should we select
> the specific card we're interested in?


You should direct that question towards the pkg-rocm-tools maintainer 
(or author). I've CC'd Christian Kastner, as he may be able to provide 
more perspective. I don't know if managing visible devices is in-scope 
for the rocm-test-launcher, but Christian has probably considered the 
question before.


Sincerely,
Cory Bloor