[easybuild] CUDA on machine with OmniPath
Hi Easybuilders,

I am trying to get the newest PyTorch to work with the foss/2023a toolchain. So far, the CPU-only version seems to work (PR #19184), and I am now trying to get the CUDA version to work. In the test suite, many tests fail with

    fi_info: error while loading shared libraries: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory

It looks like it is trying to load an OmniPath driver. However, our few GPU nodes have neither OmniPath nor InfiniBand, but they do share the EasyBuild modules with the CPU nodes, and of course OpenMPI is built with OmniPath support. That does not normally cause trouble for MPI jobs that stay within a single node, which is our use case on the GPU nodes.

I suspect the problem is due to PyTorch depending on NCCL and magma, both of which depend on UCX-CUDA.

Does anyone have a suggestion for how to handle this? Can one build a version of NCCL and magma without UCX-CUDA? Can one disable it with an environment variable? Or something else?

We would rather avoid having a completely different module tree for these few nodes, but if it is necessary, we will do so.

Best regards
Jakob
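
A possible first diagnostic step, offered only as a sketch and not as a confirmed fix: since the error comes from the dynamic loader while starting fi_info, running ldd on that binary on a GPU node should list every shared library it cannot resolve. The helper names below (missing_libs, check_binary) are hypothetical and only wrap the standard ldd tool; the sketch assumes a Linux node where ldd is available and fi_info is on PATH once the relevant modules are loaded.

```python
from __future__ import annotations

import shutil
import subprocess


def missing_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(name: str = "fi_info") -> list[str]:
    """Run ldd on a binary found on PATH and list its unresolved libraries."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} is not on PATH")
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libs(result.stdout)
```

Calling check_binary("fi_info") on a GPU node, with the same modules loaded as in the failing test run, would show whether libpsm_infinipath.so.1 is the only unresolved library. Note that ldd only reports link-time dependencies, not plugins that are opened at run time.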
[easybuild] HPC, Big Data, and Data Science Devroom at FOSDEM'24
HPC, Big Data, and Data Science Devroom at FOSDEM'24

https://hpc-bigdata-fosdem24.github.io - https://fosdem.org/2024

** Call for Participation **

Submission deadline: Friday 1 Dec 2023
Devroom date: Sat 3 Feb 2024

We are proud to announce the 9th edition of the HPC, Big Data and Data Science devroom at FOSDEM 2024 in Brussels (Belgium). It is organised by representatives of the HPC, Big Data, and Data Science communities, who are joining forces to bring them together.

The devroom will take place during the FOSDEM'24 weekend, on Saturday 3 February 2024, and is open for everybody to join (no registration required).

Join us to enjoy a variety of talks, demos and interesting discussions on open-source HPC, Big Data, and Data Science.

Sounds interesting? Submit your talk proposal, and see you at FOSDEM'24! Submissions are lightweight: basically talk title + short description, plain text.

Please visit the website for more information, and consider sharing this call for participation with friends and colleagues.

Contact us via email at hpc-bigdata-devr...@lists.fosdem.org with questions or concerns, or via @fosdem_hpc on Twitter/X.

--
Kenneth Hoste - HPC team at Ghent University, Belgium
Adam Huffman - Big Data Institute, University of Oxford, UK