Re: Building Magma with AMD GPU support

2025-02-17 Thread Mo Zhou

On 2/17/25 20:07, Cordell Bloor wrote:


If the documentation is identical for both the ROCm and CUDA versions 
of magma, do we want to just move libmagma-doc from contrib into main 
instead of creating a new libmagma-rocm-doc package? I suppose if the 
ROCm and CUDA versions of the package are different versions, you 
might want to be able to install both versions of the docs, but it 
does seem a bit odd to have two practically identical packages. I 
could see a case for doing it either way.


I'd prefer separate documentation packages, even if they are largely 
duplicates. This gives flexibility, whereas intertwined packages may lead 
to inconsistency or blockage.


For example, suppose I encounter a build issue with the CUDA version and 
cannot fix it in a timely manner. In that case the ROCm variant can still 
be updated independently, and you are not forced to fix the CUDA build to 
make the ROCm variant complete or consistent. Different variants can be 
prepared asynchronously. Such a case also leads to a version difference 
between the doc packages, and you do not want a doc package for a 
mismatched older/newer version.


Similarly, because src:pytorch and src:pytorch-cuda are separate, I do not 
have to worry that a CUDA build issue will block updates to the CPU 
version. Isolation makes things safer and easier to prepare than a giant 
update that needs everything ready at once.




Re: Building Magma with AMD GPU support

2025-02-17 Thread Cordell Bloor

Hi Mo,

I've opened an MR on Salsa with an extremely rough initial draft of the 
package update for magma-rocm [1]. I'm still a little fuzzy on some 
conventions, so I'm sure there's lots of stuff to change. Nevertheless, 
I think it's a useful starting place for further discussion.


On 2024-12-19 16:40, Mo Zhou wrote:
As you may have noticed, src:pytorch (main) and src:pytorch-cuda 
(contrib) are the identical source, uploaded twice because they belong 
to different sections. I have found that this minimizes my effort 
compared to maintaining two separate sources, especially when I need 
to apply the same logic to many other packages like src:gloo, 
src:tensorpipe, etc.


For magma I'd personally prefer my own approach. Maybe you can just 
refer to the debian/cudabuild.sh and debian/rocmbuild.sh from 
src:pytorch, and see whether this works for you. That way we can 
avoid a duplicated working repository, which does nothing but require 
double the human effort. Namely, a debian/rocmbuild.sh conversion script 
and a control.rocm file targeting Section: main should be good to go.


If the documentation is identical for both the ROCm and CUDA versions of 
magma, do we want to just move libmagma-doc from contrib into main 
instead of creating a new libmagma-rocm-doc package? I suppose if the 
ROCm and CUDA versions of the package are different versions, you might 
want to be able to install both versions of the docs, but it does seem a 
bit odd to have two practically identical packages. I could see a case 
for doing it either way.


Sincerely,
Cory Bloor

[1]: https://salsa.debian.org/science-team/magma/-/merge_requests/1



Re: Building Magma with AMD GPU support

2024-12-19 Thread Mo Zhou

Hi Cordell,

Thanks for working on this.

As you may have noticed, src:pytorch (main) and src:pytorch-cuda 
(contrib) are the identical source, uploaded twice because they belong 
to different sections. I have found that this minimizes my effort 
compared to maintaining two separate sources, especially when I need to 
apply the same logic to many other packages like src:gloo, 
src:tensorpipe, etc.


For magma I'd personally prefer my own approach. Maybe you can just 
refer to the debian/cudabuild.sh and debian/rocmbuild.sh from 
src:pytorch, and see whether this works for you. That way we can 
avoid a duplicated working repository, which does nothing but require 
double the human effort. Namely, a debian/rocmbuild.sh conversion script 
and a control.rocm file targeting Section: main should be good to go.
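
A minimal sketch of what such a conversion step might look like. This is not the actual debian/rocmbuild.sh from src:pytorch; the function name convert_to_rocm, the variant source name magma-rocm, and the exact renaming logic are assumptions made for illustration only. The idea is that one shared packaging tree is turned into the ROCm variant by swapping in control.rocm and renaming the source package in the changelog:

```shell
#!/bin/sh
# Hypothetical sketch only; NOT the real debian/rocmbuild.sh from src:pytorch.
# Assumptions: the variant control file lives at debian/control.rocm and the
# variant source package is called magma-rocm.
convert_to_rocm() {
    srcdir=$1

    # Make the ROCm control file the active debian/control.
    cp "$srcdir/debian/control.rocm" "$srcdir/debian/control"

    # Rename the source package in the newest changelog entry (line 1 only),
    # leaving older entries untouched.
    sed -i '1s/^magma /magma-rocm /' "$srcdir/debian/changelog"
}
```

Under these assumptions, running `convert_to_rocm .` from the top of the unpacked source tree before building the source package would produce the ROCm variant, while the CUDA variant keeps using the unmodified files.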


That said, anybody is welcome to comment if there is a better approach 
to reduce human effort in such cases.



On 12/18/24 23:53, Cordell Bloor wrote:


Hi Mo,

I was building PyTorch and noticed that Magma [1] is a dependency for 
some configurations. There is a magma package with NVIDIA GPU support 
in contrib [2], but we don't have an AMD GPU version packaged for 
Debian. It took a bit of trial and error to successfully build the 
library, so I thought I'd share instructions for building magma with 
AMD GPU support:


sudo apt-get -y install git build-essential libopenblas-dev gfortran hipcc \
    librocblas-dev libhipblas-dev librocsparse-dev libhipsparse-dev
git clone https://github.com/icl-utk-edu/magma.git
cd magma
git checkout v2.8.0
echo -e 'BACKEND = hip\nFORT = true\nGPU_TARGET = gfx803 gfx900 gfx906 gfx908 gfx90a gfx1010 gfx1030 gfx1100 gfx1101 gfx1102' > make.inc
sed -i '1s/python$/python3/' tools/codegen.py
sed -i 's/hip::host/hip::device/' CMakeLists.txt
make generate
CXX=hipcc cmake -S. -Bbuild -DBLA_VENDOR=OpenBLAS \
    -DAMDGPU_TARGETS="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" \
    -DMAGMA_ENABLE_HIP=ON
make -j16 -C build
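
The GPU target list appears twice in the commands above: space-separated in make.inc and semicolon-separated in the CMake invocation. As a small sketch, both forms could be derived from a single variable so they cannot drift apart. The variable names targets, cmake_targets and make_inc are illustrative and not part of magma's build system:

```shell
#!/bin/sh
# Single list of GPU architectures, copied from the commands above.
targets="gfx803 gfx900 gfx906 gfx908 gfx90a gfx1010 gfx1030 gfx1100 gfx1101 gfx1102"

# CMake expects a semicolon-separated list.
cmake_targets=$(printf '%s' "$targets" | tr ' ' ';')

# make.inc expects a space-separated list.
make_inc=$(printf 'BACKEND = hip\nFORT = true\nGPU_TARGET = %s' "$targets")

# Show both forms; in a real build the first would be written to make.inc
# and the second passed to cmake as -DAMDGPU_TARGETS="$cmake_targets".
printf '%s\n' "$make_inc"
printf '%s\n' "$cmake_targets"
```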

I've only tested on my local workstation, but the above commands 
_should_ result in a magma library that runs on any discrete AMD GPU 
since Vega (excluding MI300).


This AMD GPU build takes a long time so it would be nice to provide a 
binary package. I'd be happy to help maintain the magma package, but I 
think I will need your help to get it started. In particular, it's not 
clear to me how to organize the package sources to minimize duplicate 
work between the NVIDIA and AMD variants. I'm also unsure of what 
conventions to follow for package naming.


Sincerely,
Cory Bloor

[1]: https://github.com/icl-utk-edu/magma
[2]: https://tracker.debian.org/pkg/magma