Re: Building Magma with AMD GPU support
On 2/17/25 20:07, Cordell Bloor wrote:
> If the documentation is identical for both the ROCm and CUDA versions of magma, do we want to just move libmagma-doc from contrib into main instead of creating a new libmagma-rocm-doc package? I suppose if the ROCm and CUDA versions of the package are different versions, you might want to be able to install both versions of the docs, but it does seem a bit odd to have two practically identical packages. I could see a case for doing it either way.

I'd prefer separate documentation packages, even if they are largely duplicates. This gives flexibility, whereas intertwined packages may lead to inconsistency or blockage. For example, suppose I encounter a build issue with the CUDA version and cannot fix it in a timely manner. In that case the ROCm variant can still be updated independently, and you are not forced to fix the CUDA build to make the ROCm variant complete or consistent. Different variants can be prepared asynchronously. Such a case also leads to a version difference between the doc packages, and you do not want a doc package for a mismatched older or newer version.

Similarly, by separating src:pytorch and src:pytorch-cuda, I do not have to worry about potential CUDA build issues, and they do not block updates to the CPU version. Isolation makes things safer and easier to prepare than a giant update that needs everything to be ready at once.
Re: Building Magma with AMD GPU support
Hi Mo,

I've opened an MR on Salsa with an extremely rough initial draft of the package update for magma-rocm [1]. I'm still a little fuzzy on some conventions, so I'm sure there's lots of stuff to change. Nevertheless, I think it's a useful starting place for further discussion.

On 2024-12-19 16:40, Mo Zhou wrote:
> As you may have noticed, src:pytorch (main) and src:pytorch-cuda (contrib) are the same source, uploaded twice because their sections differ. I have found that this minimizes my effort compared to maintaining two separate sources, especially when I need to apply the same logic to many other packages like src:gloo, src:tensorpipe, etc.
>
> For magma I'd personally prefer my own approach. Maybe you can just refer to the debian/cudabuild.sh and debian/rocmbuild.sh from src:pytorch, and see whether this works for you. That way we can avoid a duplicated working repository, which does nothing but double the human effort. Namely, a debian/rocmbuild.sh conversion script and a control.rocm file targeting Section: main should be good to go.

If the documentation is identical for both the ROCm and CUDA versions of magma, do we want to just move libmagma-doc from contrib into main instead of creating a new libmagma-rocm-doc package? I suppose if the ROCm and CUDA versions of the package are different versions, you might want to be able to install both versions of the docs, but it does seem a bit odd to have two practically identical packages. I could see a case for doing it either way.

Sincerely,
Cory Bloor

[1]: https://salsa.debian.org/science-team/magma/-/merge_requests/1
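For readers unfamiliar with the conversion-script idea discussed in this message, here is a minimal sketch of what such a step might look like. It is not the actual debian/cudabuild.sh or debian/rocmbuild.sh from src:pytorch, which are not reproduced in this thread. The scratch directory, the placeholder control and changelog contents, the Section values, and the version string are all assumptions made for illustration; only the names control.rocm and magma-rocm come from the discussion.

```shell
#!/bin/sh
# Hypothetical sketch of a debian/rocmbuild.sh-style conversion step.
# NOT the script from src:pytorch; file contents and layout are assumptions.
set -eu

# Work in a scratch tree so the sketch is safe to run anywhere.
TREE=$(mktemp -d)
mkdir "$TREE/debian"

# Placeholder packaging files (contents invented for illustration only).
printf 'Source: magma\nSection: contrib/science\n' > "$TREE/debian/control"
printf 'Source: magma-rocm\nSection: science\n' > "$TREE/debian/control.rocm"
printf 'magma (2.8.0-1) unstable; urgency=medium\n' > "$TREE/debian/changelog"

# Step 1: swap in the control file prepared for the ROCm variant.
cp "$TREE/debian/control.rocm" "$TREE/debian/control"

# Step 2: rename the source package on the changelog header line.
sed -i 's/^magma (/magma-rocm (/' "$TREE/debian/changelog"

# Show the converted files.
cat "$TREE/debian/control" "$TREE/debian/changelog"
```

The point of this design is that one working repository holds both variants, and the conversion is a small mechanical step run just before building the variant in question.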
Re: Building Magma with AMD GPU support
Hi Cordell,

Thanks for working on this.

As you may have noticed, src:pytorch (main) and src:pytorch-cuda (contrib) are the same source, uploaded twice because their sections differ. I have found that this minimizes my effort compared to maintaining two separate sources, especially when I need to apply the same logic to many other packages like src:gloo, src:tensorpipe, etc.

For magma I'd personally prefer my own approach. Maybe you can just refer to the debian/cudabuild.sh and debian/rocmbuild.sh from src:pytorch, and see whether this works for you. That way we can avoid a duplicated working repository, which does nothing but double the human effort. Namely, a debian/rocmbuild.sh conversion script and a control.rocm file targeting Section: main should be good to go.

That said, anybody is welcome to comment if there is a better approach to reducing human effort in such cases.

On 12/18/24 23:53, Cordell Bloor wrote:
> Hi Mo,
>
> I was building PyTorch and noticed that Magma [1] is a dependency for some configurations. There is a magma package with NVIDIA GPU support in contrib [2], but we don't have an AMD GPU version packaged for Debian. It took a bit of trial and error to successfully build the library, so I thought I'd share instructions for building magma with AMD GPU support:
>
>     sudo apt-get -y install git build-essential libopenblas-dev gfortran hipcc librocblas-dev libhipblas-dev librocsparse-dev libhipsparse-dev
>     git clone https://github.com/icl-utk-edu/magma.git
>     cd magma
>     git checkout v2.8.0
>     echo -e 'BACKEND = hip\nFORT = true\nGPU_TARGET = gfx803 gfx900 gfx906 gfx908 gfx90a gfx1010 gfx1030 gfx1100 gfx1101 gfx1102' > make.inc
>     sed -i '1s/python$/python3/' tools/codegen.py
>     sed -i 's/hip::host/hip::device/' CMakeLists.txt
>     make generate
>     CXX=hipcc cmake -S. -Bbuild -DBLA_VENDOR=OpenBLAS -DAMDGPU_TARGETS="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" -DMAGMA_ENABLE_HIP=ON
>     make -j16 -C build
>
> I've only tested on my local workstation, but the above commands _should_ result in a magma library that runs on any discrete AMD GPU since Vega (excluding MI300).
>
> This AMD GPU build takes a long time so it would be nice to provide a binary package. I'd be happy to help maintain the magma package, but I think I will need your help to get it started. In particular, it's not clear to me how to organize the package sources to minimize duplicate work between the NVIDIA and AMD variants. I'm also unsure of what conventions to follow for package naming.
>
> Sincerely,
> Cory Bloor
>
> [1]: https://github.com/icl-utk-edu/magma
> [2]: https://tracker.debian.org/pkg/magma
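The build instructions above spell out the GPU architecture list twice, once space-separated for make.inc and once semicolon-separated for CMake. A minimal sketch of deriving both from one definition might look like the following. The variable names GFX_TARGETS and MAKE_INC are hypothetical and not part of magma or its build system, and the sketch only prints the results rather than writing make.inc or invoking cmake.

```shell
#!/bin/sh
# Sketch: define the GPU architecture list once and derive both forms from it.
# GFX_TARGETS and MAKE_INC are illustrative names, not magma conventions.
set -eu

# Architectures copied from the build instructions in this thread.
GFX_TARGETS="gfx803 gfx900 gfx906 gfx908 gfx90a gfx1010 gfx1030 gfx1100 gfx1101 gfx1102"

# CMake takes a semicolon-separated list.
AMDGPU_TARGETS=$(printf '%s' "$GFX_TARGETS" | tr ' ' ';')

# make.inc takes a space-separated list; printf avoids relying on 'echo -e'.
MAKE_INC=$(printf 'BACKEND = hip\nFORT = true\nGPU_TARGET = %s\n' "$GFX_TARGETS")

# Print what would go into make.inc and onto the cmake command line.
printf '%s\n' "$MAKE_INC"
printf '%s\n' "-DAMDGPU_TARGETS=\"$AMDGPU_TARGETS\""
```

Keeping a single list means that adding or dropping an architecture cannot leave the two build steps out of sync.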