Passant,
you can first try a PMIx-only program.
For example, in the test directory of PMIx,
srun --mpi=pmix -N 2 -n 4 .libs/pmix_client -n 4
should work just fine (otherwise, this is not an Open MPI related issue).
If it works, then you can build another Open MPI and pass
--enable-debug to the configure command line.
Hopefully it will provide more information (or at least give you
the option to collect some very verbose logs).
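The suggested steps, sketched as shell commands (the source-tree paths, install prefix, verbosity variable, and test program name are assumptions; adjust them to your environment):

```shell
# 1) Sanity-check the Slurm <-> PMIx integration without Open MPI,
#    using the test client shipped in the PMIx source tree.
cd /path/to/pmix-3.1.2/test            # assumed source location
srun --mpi=pmix -N 2 -n 4 .libs/pmix_client -n 4

# 2) If that works, rebuild Open MPI with debug support enabled.
cd /path/to/openmpi-4.0.0              # assumed source location
./configure --enable-debug --with-slurm --with-pmix=internal \
            --with-libevent=internal --prefix=$HOME/ompi-debug
make -j 8 install

# 3) Re-run the failing job with verbose PMIx logging.
export OMPI_MCA_pmix_base_verbose=10   # very chatty, but revealing
srun --mpi=pmix_v3 -N 1 -n 2 ./mpi_hello
```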
Cheers,
Gilles
On 3/12/2019 5:46 PM, Passant A. Hafez wrote:
Hi Gilles,
Yes it was just a typo in the last email, it was correctly spelled in the job
script.
So I just tried 1 node with 2 tasks per node, and I got the same error I posted
before (one copy per process); here it is again:
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[cn603-20-l:169109] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[cn603-20-l:169108] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
srun: error: cn603-20-l: tasks 0-1: Exited with exit code 1
I suspect Slurm, but anyway, how can I troubleshoot this?
The program is a simple MPI Hello World code.
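For reference, the program is equivalent to a minimal MPI Hello World along these lines (a sketch, not the exact source; any such program triggers the same MPI_Init failure):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    /* The abort reported above happens inside this call,
     * before the runtime is fully set up. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```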
All the best,
--
Passant A. Hafez | HPC Applications Specialist
KAUST Supercomputing Core Laboratory (KSL)
King Abdullah University of Science and Technology
Building 1, Al-Khawarizmi, Room 0123
Mobile : +966 (0) 55-247-9568
Mobile : +20 (0) 106-146-9644
Office : +966 (0) 12-808-0367
________________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Gilles Gouaillardet
<gil...@rist.or.jp>
Sent: Tuesday, March 12, 2019 8:22 AM
To: users@lists.open-mpi.org
Subject: Re: [OMPI users] Building PMIx and Slurm support
Passant,
Except for the typo (it should be srun --mpi=pmix_v3), there is nothing
wrong with that, and it is working just fine for me
(same Slurm version, same PMIx version, same Open MPI version, and same
Open MPI configure command line).
That is why I asked you for some more information/logs in order to
investigate your issue.
You might want to try a single node job first in order to rule out
potential interconnect related issues.
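For instance, a minimal single-node batch script (the time limit and binary name are placeholders):

```shell
#!/bin/bash
#SBATCH -N 1            # one node only, to rule out the interconnect
#SBATCH -n 2            # two tasks on that node
#SBATCH -t 00:05:00

scontrol show config | grep -i mpi    # check MpiDefault / MpiParams
srun --mpi=pmix_v3 ./mpi_hello        # note: pmix_v3, not pmix_3
```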
Cheers,
Gilles
On 3/12/2019 1:54 PM, Passant A. Hafez wrote:
Hello Gilles,
Yes, I do use srun --mpi=pmix_3 to run the app; what's the problem with
that?
Before that, when we tried to launch MPI apps directly with srun, we
got an error message saying Slurm was missing PMIx support, which is why
we proceeded with the installation.
All the best,
--
Passant
On Mar 12, 2019 6:53 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Passant,
I built a similar environment, and had no issue running a simple MPI
program.
Can you please post your slurm script (I assume it uses srun to start
the MPI app),
the output of
scontrol show config | grep Mpi
and the full output of your job ?
Cheers,
Gilles
On 3/12/2019 7:59 AM, Passant A. Hafez wrote:
Hello,
So we now have Slurm 18.08.6-2 compiled with PMIx 3.1.2.
I then installed Open MPI 4.0.0 with:
--with-slurm --with-pmix=internal --with-libevent=internal
--enable-shared --enable-static --with-x
(Following the thread, it was mentioned that building OMPI 4.0.0 against
PMIx 3.1.2 will fail with PMIX_MODEX and PMIX_INFO_ARRAY errors, so I
used the internal PMIx.)
The MPI program fails with:
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[cn603-13-r:387088] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not
able to guarantee that all other processes were killed!
for each process. Please advise: what's going wrong here?
All the best,
--
Passant A. Hafez | HPC Applications Specialist
KAUST Supercomputing Core Laboratory (KSL)
King Abdullah University of Science and Technology
Building 1, Al-Khawarizmi, Room 0123
Mobile : +966 (0) 55-247-9568
Mobile : +20 (0) 106-146-9644
Office : +966 (0) 12-808-0367
------------------------------------------------------------------------
*From:* users <users-boun...@lists.open-mpi.org> on behalf of Ralph H
Castain <r...@open-mpi.org>
*Sent:* Monday, March 4, 2019 5:29 PM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] Building PMIx and Slurm support
On Mar 4, 2019, at 5:34 AM, Daniel Letai <d...@letai.org.il> wrote:
Gilles,
On 3/4/19 8:28 AM, Gilles Gouaillardet wrote:
Daniel,
On 3/4/2019 3:18 PM, Daniel Letai wrote:
So unless you have a specific reason not to mix both, you might
also give the internal PMIx a try.
Does this hold true for libevent too? Configure complains if
libevent for openmpi is different than the one used for the other
tools.
I am not exactly sure which scenario you are running.
Long story short:
- If you use an external PMIx, then you have to use an external
libevent (otherwise configure will fail).
It must be the same one used by PMIx, but I am not sure configure
checks that.
- If you use the internal PMIx, then it is up to you: you can use
either the internal libevent or an external one.
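In other words, the two supported combinations look roughly like this (the install paths are placeholders):

```shell
# Option A: external PMIx => libevent must also be external,
# and should be the same libevent PMIx itself was built against.
./configure --with-slurm \
            --with-pmix=/opt/pmix-3.1.2 \
            --with-libevent=/opt/libevent

# Option B: internal PMIx => libevent is your choice,
# internal or external.
./configure --with-slurm \
            --with-pmix=internal \
            --with-libevent=internal
```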
Thanks, that clarifies the issues I've experienced. Since PMIx
doesn't have to be the same for the server and the nodes, I can compile
Slurm with an external PMIx against the system libevent, and compile
Open MPI with the internal PMIx and libevent, and that should work. Is
that correct?
Yes - that is indeed correct!
BTW, building 4.0.1rc1 completed successfully using external for all,
will start testing in near future.
Cheers,
Gilles
Thanks,
Dani_L.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users