Re: [OMPI users] ifort and openmpi

2022-09-15 Thread Gilles Gouaillardet via users
Volker,

https://ntq1982.github.io/files/20200621.html (mentioned in the ticket)
suggests that patching the generated configure file can do the trick.

We already patch the generated configure file in autogen.pl (in the
patch_autotools_output subroutine), so I guess that could be enhanced
to support Intel Fortran on macOS.

I am confident that a Pull Request fixing this issue would be considered
for inclusion in future Open MPI releases.
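
To sketch the idea (illustrative only, not the actual patch_autotools_output
code), the patched configure could pre-seed libtool's cache variable whenever
Intel Fortran is detected on macOS, along these lines:

  # Illustrative shell fragment -- not the real autogen.pl/configure logic.
  # Pre-seed libtool's cache variable so the -force_load probe is forced to
  # "no" when the Fortran compiler is Intel ifort on macOS.
  case "`uname -s`/${FC##*/}" in
    Darwin/ifort*)
      : "${lt_cv_ld_force_load=no}"
      ;;
  esac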


Cheers,

Gilles

On Fri, Sep 16, 2022 at 11:20 AM Volker Blum via users <
users@lists.open-mpi.org> wrote:

> Hi all,
>
> This issue here:
>
> https://github.com/open-mpi/ompi/issues/7615
>
> is, unfortunately, still current.
>
> I understand that within the Open MPI community there is a sense that this
> is Intel's problem, but I’m not sure it is. Is it possible to address this
> in the configure script in the actual Open MPI distribution in some form?
>
> There are more issues with Open MPI + Intel + ScaLAPACK, but this is the
> first one that strikes. Ultimately, the problem renders a MacBook unusable
> as a computing tool, since the only way it seems to run is with libraries
> from Homebrew (this works), but that appears to introduce unoptimized BLAS
> libraries, which are very slow. It’s the only working MPI setup that I could
> construct, though.
>
> I know that one can take the view that Intel Fortran on Mac is just broken
> for the default configure process, but it seems like a strange standoff to
> me. It would be much better to see this worked out in some way.
>
> Does anyone have a solution for this issue that could be merged into the
> actual configure script distributed with Open MPI, rather than having to
> track down a fairly arcane addition(*) and apply it by hand?
>
> Sorry … I know this isn’t the best way of raising the issue, but it is also
> tiring to spend hours on an already large build process only to find that
> the issue is still there. If there were some way to figure this out so that
> it at least does not affect Open MPI, I suspect that would help a lot of
> users. Would anyone be willing to revisit the 2020 decision?
>
> Thank you!
>
> Best wishes
> Volker
>
> (*) I know about the patch in the README:
>
> - Users have reported (see
>   https://github.com/open-mpi/ompi/issues/7615) that the Intel Fortran
>   compiler will fail to link Fortran-based MPI applications on macOS
>   with linker errors similar to this:
>
>   Undefined symbols for architecture x86_64:
> "_ompi_buffer_detach_f08", referenced from:
> import-atom in libmpi_usempif08.dylib
>   ld: symbol(s) not found for architecture x86_64
>
>   It appears that setting the environment variable
>   lt_cv_ld_force_load=no before invoking Open MPI's configure script
>   works around the issue.  For example:
>
>   shell$ lt_cv_ld_force_load=no ./configure …
>
> This is nice, but it does not stop the issue from striking unless one first
> reads a very long file in detail. Isn’t this perhaps something that the
> configure script itself should be able to catch if it detects ifort?
>
>
>


[OMPI users] ifort and openmpi

2022-09-15 Thread Volker Blum via users
Hi all,

This issue here:

https://github.com/open-mpi/ompi/issues/7615

is, unfortunately, still current. 

I understand that within the Open MPI community there is a sense that this is
Intel's problem, but I’m not sure it is. Is it possible to address this in the
configure script in the actual Open MPI distribution in some form?

There are more issues with Open MPI + Intel + ScaLAPACK, but this is the first
one that strikes. Ultimately, the problem renders a MacBook unusable as a
computing tool, since the only way it seems to run is with libraries from
Homebrew (this works), but that appears to introduce unoptimized BLAS
libraries, which are very slow. It’s the only working MPI setup that I could
construct, though.

I know that one can take the view that Intel Fortran on Mac is just broken for 
the default configure process, but it seems like a strange standoff to me. It 
would be much better to see this worked out in some way. 

Does anyone have a solution for this issue that could be merged into the actual
configure script distributed with Open MPI, rather than having to track down a
fairly arcane addition(*) and apply it by hand?

Sorry … I know this isn’t the best way of raising the issue, but it is also
tiring to spend hours on an already large build process only to find that the
issue is still there. If there were some way to figure this out so that it at
least does not affect Open MPI, I suspect that would help a lot of users. Would
anyone be willing to revisit the 2020 decision?

Thank you!

Best wishes
Volker

(*) I know about the patch in the README:

- Users have reported (see
  https://github.com/open-mpi/ompi/issues/7615) that the Intel Fortran
  compiler will fail to link Fortran-based MPI applications on macOS
  with linker errors similar to this:

  Undefined symbols for architecture x86_64:
"_ompi_buffer_detach_f08", referenced from:
import-atom in libmpi_usempif08.dylib
  ld: symbol(s) not found for architecture x86_64

  It appears that setting the environment variable
  lt_cv_ld_force_load=no before invoking Open MPI's configure script
  works around the issue.  For example:

  shell$ lt_cv_ld_force_load=no ./configure …

This is nice, but it does not stop the issue from striking unless one first
reads a very long file in detail. Isn’t this perhaps something that the
configure script itself should be able to catch if it detects ifort?
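
For concreteness, the kind of automation I have in mind could be as small as a
wrapper like the following rough sketch; the lt_cv_ld_force_load=no setting is
the one from the README, while the ifort detection and the script name are
purely illustrative:

  #!/bin/sh
  # configure-wrapper.sh (hypothetical): apply the documented workaround
  # automatically when building with Intel Fortran on macOS.
  if [ "`uname -s`" = Darwin ] && ${FC:-ifort} --version 2>/dev/null | grep -qi intel
  then
    lt_cv_ld_force_load=no
    export lt_cv_ld_force_load
  fi
  exec ./configure "$@"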




Re: [OMPI users] Hardware topology influence

2022-09-15 Thread Matias Vara via users
Hello Lucas,

On Tue, Sep 13, 2022 at 2:23 PM Lucas Chaloyard via users <
users@lists.open-mpi.org> wrote:

> Hello,
>
> I'm working as a research intern in a lab where we're studying
> virtualization.
> And I've been working with several benchmarks using Open MPI 4.1.0 (ASKAP,
> GPAW and Incompact3d from the Phoronix Test Suite).
>
> To briefly explain my experiments, I'm running those benchmarks on several
> virtual machines using different topologies.
> During one experiment I've been comparing these two topologies:
> - Topology1: 96 vCPUs divided into 96 sockets containing 1 thread each
> - Topology2: 96 vCPUs divided into 48 sockets containing 2 threads each
> (using hyperthreading)
>
> For the ASKAP benchmark:
> - While using Topology2, 2306 processes are created by the application to
> do its work.
> - While using Topology1, 4612 processes are created by the application to
> do its work.
> This also happens when running the GPAW and Incompact3d benchmarks.
>
> What I've been wondering (and looking for) is: does Open MPI take the
> topology into account and reduce the number of processes created to execute
> its work, in order to avoid the use of hyperthreading?
> Or is that something done by the application itself?
>

I would like to add that the VMM (Virtual Machine Monitor) may not completely
expose the physical topology to a guest, and this may vary from one hypervisor
to another. As a result, the VM topology may never match the physical
topology, and I am not even sure you can tweak the VMM so that the virtual
topology matches the physical one perfectly. There was an interesting talk
about this at the KVM Forum a few years ago; you can watch it at
https://youtu.be/hHPuEF7qP_Q. That said, I am experimenting with running MPI
applications in a unikernel. The unikernel is deployed in a single VM with the
same number of vCPUs as in the host. In this deployment, I use one thread per
vCPU and the communication is over shared memory, i.e., virtio. This
deployment aims at leveraging the NUMA topology by using dynamic memory that
is allocated per core; in other words, threads allocate only local memory. For
the moment, I have not been able to benchmark this deployment, but I will do
so soon.

Matias



> I was looking at the source code, trying to find how and when the
> information about the MPI_COMM_WORLD communicator is filled in, to see
> whether the 'num_procs' field depends on the topology, but I haven't had
> any luck so far.
>
> Respectfully, Chaloyard Lucas.
>
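
P.S. on the num_procs question: as far as I know, Open MPI does not silently
shrink MPI_COMM_WORLD; it contains exactly as many ranks as mpirun launched.
What the topology changes is how many slots mpirun counts by default (cores
rather than hardware threads, if I remember correctly). A rough way to check
this from the shell, assuming Open MPI 4.x and hwloc's lstopo are available
(the application name is just a placeholder):

  # Show the topology as hwloc (used by Open MPI) sees it inside the VM
  lstopo --no-io

  # Default slot accounting: one slot per detected core
  mpirun -n 48 ./benchmark_app

  # Count each hardware thread as its own slot instead
  mpirun --use-hwthread-cpus -n 96 ./benchmark_app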