[OMPI devel] MPI ABI effort

2023-08-26 Thread Gilles Gouaillardet via devel
Folks, Jeff Hammond et al. published "MPI Application Binary Interface Standardization" last week: https://arxiv.org/abs/2308.11214 The paper notes that the (C) ABI has already been prototyped natively in MPICH. Is there any current interest in prototyping this ABI in Open MPI? Cheers, Gilles
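As background for why a standardized ABI matters, here is a small illustration (hello.c is any MPI program; the soname values are examples that may differ by version, and this sketch is not taken from the paper itself):

    # a binary built against one MPI implementation is tied to that
    # implementation's library; the sonames are not interchangeable
    mpicc -o hello hello.c        # built with Open MPI's wrapper
    ldd ./hello | grep libmpi     # e.g. libmpi.so.40 => Open MPI's ABI
    # the same source built with MPICH would instead link against e.g.
    # libmpi.so.12, so the binary cannot be moved between implementations;
    # a standardized ABI would remove that restriction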

Re: [OMPI devel] Open MPI v5.0.0rc9 is available for testing

2022-11-25 Thread Gilles Gouaillardet via devel
is too old. Probably something worth filing a > ticket for so that we can run to ground before 5.0 release. Oddly, CI > continues to pass on RHEL7 in AWS. I'm not sure what we've done to cause > that, but also worth investigating. > > Brian > > On 11/14/22, 10:32 PM, "d

Re: [OMPI devel] Open MPI v5.0.0rc9 is available for testing

2022-11-25 Thread Gilles Gouaillardet via devel
d it is too old. Probably something worth filing a >>> ticket for so that we can run to ground before 5.0 release. Oddly, CI >>> continues to pass on RHEL7 in AWS. I'm not sure what we've done to cause >>> that, but also worth investigating. >>> >>> Brian &

Re: [OMPI devel] Open MPI v5.0.0rc9 is available for testing

2022-11-25 Thread Gilles Gouaillardet via devel
what we've done to cause >> that, but also worth investigating. >> >> Brian >> >> On 11/14/22, 10:32 PM, "devel on behalf of Gilles Gouaillardet via >> devel" > devel@lists.open-mpi.org> wrote: >> >> CAUTION: This email originate

Re: [OMPI devel] Open MPI v5.0.0rc9 is available for testing

2022-11-14 Thread Gilles Gouaillardet via devel
Folks, I tried to build on a RHEL7-like system, and it fails at make time because ROMIO requires stdatomic.h (it seems this header is only available from GCC 4.9). Are we supposed to be able to build Open MPI 5 with GCC 4.8 (e.g. the stock RHEL7 compiler)? --enable-mca-no-build=io-romio314 cannot
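One hedged workaround on a RHEL7-class box, assuming a Software Collections devtoolset can be installed (the package names and devtoolset version below are assumptions, not from the message):

    # stdatomic.h ships with GCC >= 4.9, so build with an SCL toolchain
    # instead of the stock GCC 4.8
    sudo yum install -y centos-release-scl devtoolset-9-gcc devtoolset-9-gcc-c++
    scl enable devtoolset-9 bash
    ./configure --prefix=$HOME/ompi-5 && make -j 8 && make install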

Re: [OMPI devel] Proposed change to Cuda dependencies in Open MPI

2022-09-09 Thread Gilles Gouaillardet via devel
William, What is the desired behavior if Open MPI built with CUDA is used on a system where CUDA is not available or cannot be used because of ABI compatibility issues? - issue a warning (could not open the DSO because of unsatisfied dependencies)? - silently ignore the CUDA related components?
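A hedged way to inspect what would happen at run time, assuming a default installation prefix and the usual CUDA-related component naming (both are assumptions here):

    # list the CUDA-related Open MPI components and check whether their
    # CUDA dependencies resolve on this machine
    ls /usr/local/lib/openmpi/mca_*cuda*.so
    for f in /usr/local/lib/openmpi/mca_*cuda*.so; do
        echo "== $f"; ldd "$f" | grep -iE 'cuda|not found'
    done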

Re: [OMPI devel] Open MPI Java MPI bindings

2022-08-10 Thread Gilles Gouaillardet via devel
Hi, Same impression here. There are bug reports once in a while (on both the Open MPI mailing lists and Stack Overflow). I remember one user who taught MPI with the Java bindings. So my gut feeling is that the number of active users is small but not zero. Cheers, Gilles On Wed, Aug 10, 2022 at 8:48 PM

Re: [OMPI devel] Possible Bug / Invalid Read in Ialltoallw

2022-05-05 Thread Gilles Gouaillardet via devel
ter ? with or without topology ? > > Thanks, > George. > > > On Wed, May 4, 2022 at 9:35 AM Gilles Gouaillardet via devel < > devel@lists.open-mpi.org> wrote: > > Damian, > > Thanks for the report! > > could you please trim your program and share it so I can

Re: [OMPI devel] Possible Bug / Invalid Read in Ialltoallw

2022-05-04 Thread Gilles Gouaillardet via devel
Damian, Thanks for the report! could you please trim your program and share it so I can have a look? Cheers, Gilles On Wed, May 4, 2022 at 10:27 PM Damian Marek via devel < devel@lists.open-mpi.org> wrote: > Hello, > > I have been getting intermittent memory corruptions and segmentation >

Re: [OMPI devel] Script-based wrapper compilers

2022-04-01 Thread Gilles Gouaillardet via devel
then it's potentially worth the time. > > -- > Jeff Squyres > jsquy...@cisco.com > > ________ > From: devel on behalf of Gilles > Gouaillardet via devel > Sent: Wednesday, March 23, 2022 10:28 PM > To: Open MPI Developers > Cc: Gilles

Re: [OMPI devel] Script-based wrapper compilers

2022-03-23 Thread Gilles Gouaillardet via devel
Brian, My 0.02 US$: script-based wrapper compilers are very useful when cross-compiling, so ideally they should be maintained. Cheers, Gilles On Thu, Mar 24, 2022 at 11:18 AM Barrett, Brian via devel < devel@lists.open-mpi.org> wrote: > Does anyone still use the script based wrapper

Re: [OMPI devel] OpenMPI and maker - Multiple messages

2021-02-18 Thread Gilles Gouaillardet via devel
Thomas, in order to test Open MPI, you can download https://raw.githubusercontent.com/open-mpi/ompi/master/examples/hello_c.c and build it $ mpicc -o hello_c examples/hello_c.c and then run $ mpiexec --mca btl ^openib -N 5 ./hello_c It should display 5 different lines (rank 0 to 4 of 5). If
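The full recipe from the message as a copy-and-paste block (the expected output line is approximate):

    wget https://raw.githubusercontent.com/open-mpi/ompi/master/examples/hello_c.c
    mpicc -o hello_c hello_c.c
    mpiexec --mca btl ^openib -N 5 ./hello_c
    # expect 5 different lines similar to:
    #   Hello, world, I am 0 of 5, (Open MPI v4.x, ...)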

Re: [OMPI devel] configure problem on master

2021-02-05 Thread Gilles Gouaillardet via devel
Edgar and Jeff, the root cause is that we no longer use pthreads here. From the master config.log: configure:226861: gcc -std=gnu99 -o conftest -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -mcx16 -I/home/usersup/gilles/build/ompi-master/3rd-party/openpmix/include

Re: [OMPI devel] Segmentation fault in MPI_Get in the Vader/sm module

2021-02-03 Thread Gilles Gouaillardet via devel
Max, Thanks for the report and for suggesting a solution! Could you please open a pull request so your fix can be reviewed? Note the commit has to be signed-off (IANAL, make sure you understand the implications) in order to be accepted upstream. Cheers, Gilles On Thu, Feb 4, 2021 at 6:40 AM Max

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Gilles Gouaillardet via devel
Andrej, that really looks like a SLURM issue that does not involve Open MPI. In order to confirm, you can run $ salloc -N 2 -n 2 /* and then, from the allocation */ srun hostname If this does not work, then this is a SLURM issue you have to fix. Once fixed, I am confident Open MPI will just work
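As a concrete version of that sanity check (the node and task counts are simply the ones from the message):

    salloc -N 2 -n 2        # get an allocation on 2 nodes
    # then, from inside the allocation:
    srun hostname           # should print 2 hostnames
    # if srun itself fails here, the problem is in the SLURM setup,
    # not in Open MPI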

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Gilles Gouaillardet via devel
; srun: error: task 0 launch failed: Unspecified error > srun: error: task 3 launch failed: Unspecified error > srun: error: task 2 launch failed: Unspecified error > srun: error: task 1 launch failed: Unspecified error > [terra:177267] [[5499,0],0] plm:slurm: primary daemons co

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Gilles Gouaillardet via devel
Andrej, you *have* to invoke mpirun --mca plm slurm ... from a SLURM allocation, and the SLURM_* environment variables should have been set by SLURM (otherwise, this is a SLURM error outside the scope of Open MPI). Here is what you can try (and send the logs if that fails): $ salloc -N 4 -n 384 and
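A sketch of the suggested test (the message is truncated, so the mpirun line below is an assumption about how the test continues; the application name is a placeholder):

    salloc -N 4 -n 384
    # from inside the allocation:
    env | grep ^SLURM_                      # confirm SLURM set its variables
    mpirun --mca plm slurm ./your_mpi_app   # placeholder application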

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Gilles Gouaillardet via devel
Andrej, I can reproduce this behavior ... when running outside of a SLURM allocation. What does $ env | grep ^SLURM_ report? Cheers, Gilles On Tue, Feb 2, 2021 at 9:06 AM Andrej Prsa via devel wrote: > > Hi Ralph, Gilles, > > > I fail to understand why you continue to think that PMI has

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Gilles Gouaillardet via devel
Andrej, My previous email listed other things to try. Cheers, Gilles Sent from my iPod > On Feb 2, 2021, at 6:23, Andrej Prsa via devel > wrote: > > The saga continues. > > I managed to build slurm with pmix by first patching slurm using this patch > and manually building the plugin: > >

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Gilles Gouaillardet via devel
Andrej, you are now invoking mpirun from a SLURM allocation, right? You can try this: /usr/local/bin/mpirun -mca plm slurm -np 384 -H node15:96,node16:96,node17:96,node18:96 python testmpi.py If it does not work, you can collect more relevant logs with mpirun -mca plm slurm -mca

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Gilles Gouaillardet via devel
Andrej, that's odd, there should be a mca_pmix_pmix3x.so (assuming you built with the internal PMIx). What was your exact configure command line? FWIW, in your build tree there should be an opal/mca/pmix/pmix3x/.libs/mca_pmix_pmix3x.so; if it's there, try running sudo make install once more and

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Gilles Gouaillardet via devel
Andrej, it seems only flux is a PMIx option, which is very suspicious. Can you check that other components are available? ls -l /usr/local/lib/openmpi/mca_pmix_*.so will list them. Cheers, Gilles On Mon, Feb 1, 2021 at 10:53 PM Andrej Prsa via devel wrote: > > Hi Gilles, > > > what is your

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Gilles Gouaillardet via devel
Andrej, what is your mpirun command line? Is mpirun invoked from a batch allocation? In order to get some more debug info, you can run mpirun --mca ess_base_verbose 10 --mca pmix_base_verbose 10 ... Cheers, Gilles On Mon, Feb 1, 2021 at 10:27 PM Andrej Prsa via devel wrote: > > Hi Gilles, > >
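For example (the two verbose MCA parameters are the ones quoted in the message; the process count and application name are placeholders):

    mpirun --mca ess_base_verbose 10 --mca pmix_base_verbose 10 \
           -np 4 ./your_mpi_app 2>&1 | tee mpirun-debug.log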

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-01-31 Thread Gilles Gouaillardet via devel
Andrej, The log you just posted strongly suggests a previously built (without --enable-debug) internal PMIx is being used. I invite you to do some cleanup: sudo rm -rf /usr/local/lib/openmpi /usr/local/lib/pmix and then sudo make install and try again. If the issue persists, please post the
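The cleanup sequence spelled out (the paths are the ones from the message and assume a default /usr/local prefix):

    # remove stale components from a previous (non-debug) install
    sudo rm -rf /usr/local/lib/openmpi /usr/local/lib/pmix
    # reinstall from the build tree configured with --enable-debug
    sudo make install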

Re: [OMPI devel] How to build Open MPI so the UCX used can be changed at runtime?

2021-01-27 Thread Gilles Gouaillardet via devel
Gouaillardet via devel mailto:devel@lists.open-mpi.org>> wrote: Tim, a simple option is to configure ... LDFLAGS="-Wl,--enable-new-dtags" If Open MPI is built with this option, then LD_LIBRARY_PATH takes precedence over rpath (the default is the opposite as correctly
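A sketch of that configure line, plus a way to verify the result (the install prefix is a placeholder, and the readelf check is my addition, not from the thread):

    ./configure LDFLAGS="-Wl,--enable-new-dtags" --prefix=/opt/ompi ...
    make -j 8 && make install
    # with new dtags, the library records RUNPATH instead of RPATH, and
    # RUNPATH is overridden by LD_LIBRARY_PATH at run time
    readelf -d /opt/ompi/lib/libmpi.so | grep -E 'RPATH|RUNPATH'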

Re: [OMPI devel] How to build Open MPI so the UCX used can be changed at runtime?

2021-01-26 Thread Gilles Gouaillardet via devel
overrides for ucx_CFLAGS, ucx_LIBS, and ucx_STATIC_LIBS, but nothing for something like "ucx_LDFLAGS=''". I didn't see a simple way to add support for such an override without some more extensive changes to multiple m4 files. On Sun, Jan 24, 2021 at 7:08 PM Gilles Gouaillardet via dev

Re: [OMPI devel] How to build Open MPI so the UCX used can be changed at runtime?

2021-01-24 Thread Gilles Gouaillardet via devel
Tim, Have you tried using LD_LIBRARY_PATH? I guess "hardcoding the full path" means "link with -rpath", and IIRC, LD_LIBRARY_PATH overrides this setting. If this does not work, here is something you can try (disclaimer: I did not) export LD_LIBRARY_PATH=/same/install/prefix/ucx/1.9.0/lib
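Putting the suggestion together (the UCX path is the example path from the message; the Open MPI install path, component location, and application name are assumptions):

    export LD_LIBRARY_PATH=/same/install/prefix/ucx/1.9.0/lib:$LD_LIBRARY_PATH
    # check which libucp the Open MPI UCX component actually picks up now
    ldd /same/install/prefix/ompi/lib/openmpi/mca_pml_ucx.so | grep ucp
    mpirun -np 2 ./your_mpi_app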

Re: [OMPI devel] Which compiler versions to test?

2020-10-08 Thread Gilles Gouaillardet via devel
Hi Jeff, On RHEL 8.x, the default gcc compiler is 8.3.1, so I think it is worth testing. Containers could be used to set up a RHEL 8.x environment (so not only gcc but also third-party libs such as libevent and hwloc can be used) if the MTT cluster cannot grow any bigger. Cheers, Gilles On
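A hedged sketch of the container approach (the image name and package list are assumptions, not something specified in the message):

    # spin up a RHEL 8-like userland and install the toolchain to test with
    docker run --rm -it registry.access.redhat.com/ubi8/ubi bash
    dnf install -y gcc gcc-c++ make libevent-devel hwloc-devel
    # then configure/build Open MPI inside the container as usual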

Re: [OMPI devel] Hanging inside ompi_win_create_dynamic

2020-05-22 Thread Gilles Gouaillardet via devel
Luis, MPI window creation is a collective operation that involves the creation of a communicator, which in turn involves (several) non-blocking allgathers. I recommend you first make sure that creating a window does not start a recursion. Cheers, Gilles On Fri, May 22, 2020 at 7:53 PM Luis

Re: [OMPI devel] Warning

2020-05-15 Thread Gilles Gouaillardet via devel
Luis, you can do this: struct ompi_info_t * info = _mpi_info_null.info; ret = ompi_win_create_dynamic(_null->super, comm, win); Cheers, Gilles On Fri, May 15, 2020 at 7:38 PM Luis via devel wrote: > > Hi OMPI devs, > > I was wondewring if this warning is expected, if not, how should we >

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-11 Thread Gilles Gouaillardet via devel
John, OMPI_LAZY_WAIT_FOR_COMPLETION(active) is a simple loop that periodically checks the (volatile) "active" condition, which is expected to be updated by another thread. So if you set your breakpoint too early, and **all** threads are stopped when this breakpoint is hit, you might

Re: [OMPI devel] hcoll missing libsharp

2019-10-15 Thread Gilles Gouaillardet via devel
Folks, -L/path/to/lib -lfoo will look for the foo library (libfoo.so or libfoo.a on Linux) in the /path/to/lib directory. In the case of a dynamic library (e.g. libfoo.so), its dependencies will *not* be searched in /path/to/lib (and this is where LD_LIBRARY_PATH helps) that can be
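A minimal illustration of the distinction described above, using the hcoll/sharp case from this thread (the install paths are placeholders):

    # -L only affects where the linker searches for libraries named with -l;
    # it does not help resolve libhcoll.so's own dependency on libsharp.so
    gcc -o a.out foo.o -L/opt/hcoll/lib -lhcoll
    # either point the dynamic linker at libsharp explicitly...
    export LD_LIBRARY_PATH=/opt/sharp/lib:$LD_LIBRARY_PATH
    # ...or bake the search path in at link time
    gcc -o a.out foo.o -L/opt/hcoll/lib -lhcoll -Wl,-rpath,/opt/sharp/lib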

Re: [OMPI devel] hcoll missing libsharp

2019-10-15 Thread Gilles GOUAILLARDET via devel
Jeff, I agree we are not trying to run the binary here, but I am still puzzled... Basically: gcc -o a.out foo.o -L/.../hcoll/lib -lhcoll, and libhcoll.so depends on libsharp.so (which is in /.../sharp/lib). 1) Do you expect this to just work even if the linker has no idea where to find

Re: [OMPI devel] hcoll missing libsharp

2019-10-15 Thread Gilles GOUAILLARDET via devel
Chris, I would try adding the directory containing libsharp to your LD_LIBRARY_PATH. Cheers, Gilles On October 15, 2019, at 11:08 PM, Chris Ward via devel wrote: Adding an appropriate LDFLAGS= didn't help; the revised tarball is here http://tjcw.freeshell.org/ompi-output-2.tar.bz2 . Do I

Re: [OMPI devel] Is mpirun in slurm producing the correct "srun" cmd behind the scene?

2019-07-05 Thread Gilles Gouaillardet via devel
Lawrence, My apologies for the incorrect link. The patch is at https://github.com/open-mpi/ompi/pull/6672.patch Cheers, Gilles On Sat, Jul 6, 2019 at 12:21 PM Gilles Gouaillardet wrote: > > Lawrence, > > this is a known issue (--cpu_bind optioned was removed from SLURM in > favor of the
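A sketch of applying that patch to an Open MPI source tree (6672 is the PR referenced above; running from the top of an extracted 4.0.0 source tree is my assumption):

    cd openmpi-4.0.0                 # top of the extracted source tree
    wget https://github.com/open-mpi/ompi/pull/6672.patch
    patch -p1 < 6672.patch
    ./configure ... && make && make install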

Re: [OMPI devel] Is mpirun in slurm producing the correct "srun" cmd behind the scene?

2019-07-05 Thread Gilles Gouaillardet via devel
Lawrence, this is a known issue (the --cpu_bind option was removed from SLURM in favor of the --cpu-bind option) and the fix will be available in Open MPI 4.0.1. Meanwhile, you can manually download and apply the patch at https://github.com/open-mpi/ompi/pull/6445.patch or use a nightly build of