Re: [OMPI users] ifort and openmpi
Volker,

https://ntq1982.github.io/files/20200621.html (mentioned in the ticket) suggests that patching the generated configure file can do the trick.

We already patch the generated configure file in autogen.pl (in the patch_autotools_output subroutine), so I guess that could be enhanced to support Intel Fortran on OSX. I am confident that a Pull Request fixing this issue would be considered for inclusion in future Open MPI releases.

Cheers,

Gilles

On Fri, Sep 16, 2022 at 11:20 AM Volker Blum via users <users@lists.open-mpi.org> wrote:
> Hi all,
>
> This issue here:
>
> https://github.com/open-mpi/ompi/issues/7615
>
> is, unfortunately, still current.
>
> I understand that within Open MPI there is a sense that this is Intel's
> problem, but I'm not sure it is. Is it possible to address this in the
> configure script in the actual Open MPI distribution in some form?
>
> There are more issues with Open MPI + Intel + ScaLAPACK, but this is the
> first one that strikes. Eventually, the problem renders a MacBook
> unusable as a computing tool, since the only way it seems to run is with
> libraries from Homebrew (this works), but that appears to introduce
> unoptimized BLAS libraries - very slow. It's the only working MPI setup
> that I could construct, though.
>
> I know that one can take the view that Intel Fortran on Mac is just broken
> for the default configure process, but it seems like a strange standoff to
> me. It would be much better to see this worked out in some way.
>
> Does anyone have a solution for this issue that could be merged into the
> actual configure script distributed with Open MPI, rather than having to
> track down a fairly arcane addition (*) and apply it by hand?
>
> Sorry ... I know this isn't the best way of raising the issue, but it
> is also tiring to spend hours on an already large build process only to
> find that the issue is still there. If there were some way to figure this
> out so as to at least not affect Open MPI, I suspect that would help a lot
> of users. Would anyone be willing to revisit the 2020 decision?
>
> Thank you!
>
> Best wishes
> Volker
>
> (*) I know about the patch in the README:
>
> - Users have reported (see
>   https://github.com/open-mpi/ompi/issues/7615) that the Intel Fortran
>   compiler will fail to link Fortran-based MPI applications on macOS
>   with linker errors similar to this:
>
>       Undefined symbols for architecture x86_64:
>         "_ompi_buffer_detach_f08", referenced from:
>           import-atom in libmpi_usempif08.dylib
>       ld: symbol(s) not found for architecture x86_64
>
>   It appears that setting the environment variable
>   lt_cv_ld_force_load=no before invoking Open MPI's configure script
>   works around the issue. For example:
>
>       shell$ lt_cv_ld_force_load=no ./configure …
>
> This is nice, but it does not stop the issue from striking unless one
> reads a very long file in detail first. Isn't this perhaps something that
> the configure script itself should be able to catch if it detects ifort?
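The README workaround quoted above could in principle be automated with a small pre-configure guard. Below is a minimal sketch of that idea; it is hypothetical and not part of Open MPI (the helper name `needs_force_load_workaround` is mine), and it assumes the issue is triggered by Intel Fortran on macOS, where pre-seeding the libtool cache variable makes configure avoid the problematic `-force_load` linker handling:

```shell
# Hypothetical pre-configure guard (not part of Open MPI): if the
# Fortran compiler looks like Intel ifort/ifx and we are on macOS,
# pre-seed the libtool cache variable before running configure.
needs_force_load_workaround() {
    os="$1"
    fc="$2"
    case "$os" in
        Darwin) ;;                 # only macOS is affected
        *) return 1 ;;
    esac
    case "$fc" in
        *ifort*|*ifx*) return 0 ;; # Intel Fortran compilers
        *) return 1 ;;
    esac
}

if needs_force_load_workaround "$(uname -s)" "${FC:-gfortran}"; then
    export lt_cv_ld_force_load=no
fi
# ./configure FC="$FC" ...   (illustration only, left commented out)
```

Something equivalent expressed as an autoconf check inside configure itself would be the cleaner long-term fix the thread is asking for.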
Re: [OMPI users] Hardware topology influence
Hello Lucas,

On Tue, Sep 13, 2022 at 14:23, Lucas Chaloyard via users <users@lists.open-mpi.org> wrote:
> Hello,
>
> I'm working as a research intern in a lab where we're studying
> virtualization, and I've been working with several benchmarks using
> Open MPI 4.1.0 (ASKAP, GPAW and Incompact3d from the Phoronix Test Suite).
>
> To briefly explain my experiments: I'm running those benchmarks on several
> virtual machines using different topologies. During one experiment I
> compared these two topologies:
> - Topology1: 96 vCPUs divided into 96 sockets containing 1 thread each
> - Topology2: 96 vCPUs divided into 48 sockets containing 2 threads each
>   (using hyperthreading)
>
> For the ASKAP benchmark:
> - While using Topology2, 2306 processes are created by the application
>   to do its work.
> - While using Topology1, 4612 processes are created by the application
>   to do its work.
> This also happens when running the GPAW and Incompact3d benchmarks.
>
> What I've been wondering (and looking for) is: does Open MPI take the
> topology into account and reduce the number of processes it creates in
> order to avoid using hyperthreading? Or is this done by the application
> itself?

I would like to add that the VMM (Virtual Machine Monitor) may never completely expose the physical topology to a guest. This varies from one hypervisor to another, so the VM topology won't ever match the physical topology. I am not even sure you can tweak the VMM so that the virtual topology perfectly matches the physical one. There was an interesting talk about this at the KVM Forum a few years ago; you can watch it at https://youtu.be/hHPuEF7qP_Q.

That said, I am experimenting with running MPI applications in a unikernel. The unikernel is deployed in a single VM with the same number of vCPUs as the host. In this deployment, I use one thread per vCPU and communication is over shared memory, i.e., virtio. This deployment aims at leveraging the NUMA topology by using dynamic memory allocated per core; in other words, threads allocate only local memory. I have not been able to benchmark this deployment yet, but I will do so soon.

Matias

> I was looking at the source code, trying to find how and when the
> information about the MPI_COMM_WORLD communicator is filled in, to see
> whether the 'num_procs' field depends on the topology, but I haven't had
> any luck so far.
>
> Respectfully,
> Chaloyard Lucas
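One way to see why the process count halves when SMT is exposed: benchmark harnesses typically size their runs from the detected physical core count (logical CPUs divided by threads per core), while Open MPI itself simply launches however many ranks it is told. A sketch of that arithmetic, with a hypothetical helper (the real Phoronix Test Suite logic may differ):

```shell
# Hypothetical sketch of how a benchmark harness (not Open MPI) might
# size its runs: ranks per run are derived from physical cores, i.e.
# logical CPUs divided by threads per core, so exposing 2-way SMT
# halves the count.
ranks_for_topology() {
    logical_cpus="$1"
    threads_per_core="$2"
    echo $(( logical_cpus / threads_per_core ))
}

# Topology1: 96 vCPUs, 1 thread per core  -> 96 ranks per run
# Topology2: 96 vCPUs, 2 threads per core -> 48 ranks per run
```

By contrast, when the count is handed to Open MPI directly, mpirun can be told explicitly to treat hardware threads as slots with `--use-hwthread-cpus` instead of counting only cores.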