What Charles said was true but not quite complete. We still support the older
PMI libraries, but you will likely have to point us to wherever Slurm put them.

However, we definitely recommend using PMIx, as you will get a faster launch.
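
For example, the relevant configure options would look something like the
following (the install prefixes are only placeholders; point them at wherever
your Slurm PMI libraries or external PMIx actually live):

  # older route: point Open MPI at Slurm's pmi/pmi2 libraries
  ./configure --with-slurm --with-pmi=/opt/slurm ...

  # recommended route: build against a PMIx installation
  ./configure --with-slurm --with-pmix=/opt/pmix ...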

Sent from my iPad

> On Nov 16, 2017, at 9:11 AM, Bennet Fauber <ben...@umich.edu> wrote:
> 
> Charlie,
> 
> Thanks a ton!  Yes, we are missing two of the three steps.
> 
> Will report back after we get pmix installed and after we rebuild
> Slurm.  We do have a new enough version of it, at least, so we might
> have missed the target, but we did at least hit the barn.  ;-)
> 
> 
> 
>> On Thu, Nov 16, 2017 at 10:54 AM, Charles A Taylor <chas...@ufl.edu> wrote:
>> Hi Bennet,
>> 
>> Three things...
>> 
>> 1. OpenMPI 2.x requires PMIx in lieu of pmi1/pmi2.
>> 
>> 2. You will need Slurm 16.05 or greater built with --with-pmix.
>> 
>> 2a. You will need PMIx 1.1.5, which you can get from GitHub
>> (https://github.com/pmix/tarballs).
>> 
>> 3. Then, to launch your MPI tasks on the allocated resources (a rough
>> end-to-end sketch of these steps follows below):
>> 
>>   srun --mpi=pmix ./hello-mpi
>> 
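>> Roughly, the whole sequence would look something like this (versions and
>> install prefixes are placeholders only; adjust them to your site):
>> 
>>   # 1. build PMIx (e.g. 1.1.5 from the tarballs repo above)
>>   ./configure --prefix=/opt/pmix && make install
>> 
>>   # 2. build Slurm 16.05+ against that PMIx
>>   ./configure --prefix=/opt/slurm --with-pmix=/opt/pmix && make install
>> 
>>   # 3. build Open MPI against the same PMIx, then launch with srun
>>   ./configure --prefix=/opt/openmpi --with-slurm --with-pmix=/opt/pmix && make install
>>   srun --mpi=pmix ./hello-mpi
>> 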
>> I'm replying to the list because:
>> 
>> a) this information is harder to find than you might think.
>> b) someone/anyone can correct me if I'm giving a bum steer.
>> 
>> Hope this helps,
>> 
>> Charlie Taylor
>> University of Florida
>> 
>> On Nov 16, 2017, at 10:34 AM, Bennet Fauber <ben...@umich.edu> wrote:
>> 
>> I think that OpenMPI is supposed to support SLURM integration such that
>> 
>>   srun ./hello-mpi
>> 
>> should work?  I built OMPI 2.1.2 with
>> 
>> export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
>> export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'
>> 
>> CMD="./configure \
>>   --prefix=${PREFIX} \
>>   --mandir=${PREFIX}/share/man \
>>   --with-slurm \
>>   --with-pmi \
>>   --with-lustre \
>>   --with-verbs \
>>   $CONFIGURE_FLAGS \
>>   $COMPILERS"
>> 
>> I have a simple hello-mpi.c (source included below), which compiles
>> and runs with mpirun, both on the login node and in a job.  However,
>> when I try to use srun in place of mpirun, I get instead a hung job,
>> which upon cancellation produces this output.
>> 
>> [bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]:
>> PMI is not initialized
>> [bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]:
>> PMI is not initialized
>> [warn] opal_libevent2022_event_active: event has no event_base set.
>> [warn] opal_libevent2022_event_active: event has no event_base set.
>> slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
>> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>> slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
>> 
>> The SLURM web page suggests that OMPI 2.x and later support PMIx, and
>> to use `srun --mpi=pmix`; however, that no longer seems to be an
>> option, and using the `openmpi` type isn't working (neither is pmi2).
>> 
>> [bennet@beta-build hello]$ srun --mpi=list
>> srun: MPI types are...
>> srun: mpi/pmi2
>> srun: mpi/lam
>> srun: mpi/openmpi
>> srun: mpi/mpich1_shmem
>> srun: mpi/none
>> srun: mpi/mvapich
>> srun: mpi/mpich1_p4
>> srun: mpi/mpichgm
>> srun: mpi/mpichmx
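>> 
>> Presumably one could also check whether Slurm's pmix plugin was built at
>> all with something like the following (the plugin directory here is only a
>> guess for a typical install):
>> 
>>   ls /usr/lib64/slurm/mpi_pmix*.so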
>> 
>> To get the Intel PMI to work with srun, I have to set
>> 
>>   I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
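>> 
>> i.e., in a job script it looks something like this (the program name is
>> just a placeholder for a binary built against Intel MPI):
>> 
>>   export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
>>   srun ./intel-mpi-program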
>> 
>> Is there a comparable environment variable that must be set to enable
>> `srun` to work?
>> 
>> Am I missing a build option or misspecifying one?
>> 
>> -- bennet
>> 
>> 
>> Source of hello-mpi.c
>> ==========================================
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include "mpi.h"
>> 
>> int main(int argc, char **argv){
>> 
>> int rank;          /* rank of process */
>> int numprocs;      /* size of COMM_WORLD */
>> int namelen;
>> int tag=10;        /* expected tag */
>> int message;       /* Recv'd message */
>> char processor_name[MPI_MAX_PROCESSOR_NAME];
>> MPI_Status status; /* status of recv */
>> 
>> /* call Init, size, and rank */
>> MPI_Init(&argc, &argv);
>> MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> MPI_Get_processor_name(processor_name, &namelen);
>> 
>> printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
>> 
>> if(rank != 0){
>>   MPI_Recv(&message,    /*buffer for message */
>>                   1,    /*MAX count to recv */
>>             MPI_INT,    /*type to recv */
>>                   0,    /*recv from 0 only */
>>                 tag,    /*tag of message */
>>      MPI_COMM_WORLD,    /*communicator to use */
>>             &status);   /*status object */
>>   printf("Hello from process %d!\n",rank);
>> }
>> else{
>>   /* rank 0 ONLY executes this */
>>   printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
>>   int x;
>>   for(x=1; x<numprocs; x++){
>>      MPI_Send(&x,          /*send x to process x */
>>                1,          /*number to send */
>>          MPI_INT,          /*type to send */
>>                x,          /*rank to send to */
>>              tag,          /*tag for message */
>>    MPI_COMM_WORLD);        /*communicator to use */
>>   }
>> } /* end else */
>> 
>> 
>> /* always call at end */
>> MPI_Finalize();
>> 
>> return 0;
>> }
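>> 
>> For reference, it was built and run roughly like this (a sketch; mpicc is
>> the wrapper compiler from the Open MPI install above):
>> 
>>   mpicc hello-mpi.c -o hello-mpi
>>   mpirun ./hello-mpi    # works, on the login node and in a job
>>   srun ./hello-mpi      # hangs as described above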
