Try running "ldd your_mpi_program" on the compute node and make sure that SLURM's PMI library is the one being used.
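One way to make that ldd check mechanical is a small shell function. This is a minimal sketch, not a SLURM or MPICH tool. The name uses_slurm_pmi is hypothetical, /path/to/slurm/lib is the placeholder used in this thread, and it assumes SLURM's PMI library is installed as libpmi.so* in that directory. It succeeds only if ldd resolves libpmi from the SLURM library directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLURM or MPICH): reads `ldd` output on
# stdin and succeeds only if a libpmi shared object is resolved from the
# given SLURM library directory.
# Assumption: SLURM's PMI library is installed as libpmi.so* in that directory.
uses_slurm_pmi() {
    # Drop a trailing slash so "/x/lib" and "/x/lib/" behave the same.
    slurm_libdir=${1%/}
    # ldd lines look like: "libpmi.so.0 => /some/dir/libpmi.so.0 (0x...)".
    # Field 1 is the library name, field 3 is the resolved path
    # ("not" when the line reads "=> not found").
    awk -v dir="$slurm_libdir/" '
        $1 ~ /^libpmi\.so/ && index($3, dir) == 1 { found = 1 }
        # Exit status 0 if a matching libpmi line was seen, 1 otherwise.
        END { exit !found }
    '
}
```

For example, after defining the function in your shell, run "srun -N1 -n1 ldd ./your_mpi_program | uses_slurm_pmi /path/to/slurm/lib". A non-zero exit means libpmi is either not linked at all or is resolved from some other directory, which would fit the diagnosis that SLURM's PMI library is not being used.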
On Fri, 2012-06-29 at 15:13 -0600, Sarah Mulholland wrote:
> This is exactly what I see when I run the command below.
>
> I rebuilt mpich2 with slurm. I had to set CFLAGS, CXXFLAGS, and LIBS to
> incorporate -I/path/to/slurm/incl and -L/path/to/slurm/lib to get it
> to build with --with-pmi=slurm, --with-pm=no, and --with-slurm=/path/to/slurm,
> since the last option doesn't appear to have an effect.
>
> When I run the command below, I still see four unique SLURM_PROCIDs.
>
> I have slurm-2.3.5 and mpich2-1.4.1p1. Does anybody else run with these two
> packages and versions? If so, would you mind sending me your configure
> flags? Any other suggestions would be appreciated.
>
> Thanks,
>
> Sarah
>
> -----Original Message-----
> From: Moe Jette [mailto:[email protected]]
> Sent: Friday, June 29, 2012 9:18 AM
> To: slurm-dev
> Subject: [slurm-dev] FW: slurm and mpich2
>
> SLURM's PMI library gets the task rank from the environment variable
> SLURM_PROCID. I believe that you are not using that library. You can at least
> confirm that the environment variable is set correctly by running something
> like this:
> $ srun -n4 -l printenv SLURM_PROCID | sort
> 0: 0
> 1: 1
> 2: 2
> 3: 3
>
> If you get output like the above, then the problem is definitely that you
> are not using SLURM's PMI library.
>
> Quoting Sarah Mulholland <[email protected]>:
>
> > I should add that I have MpiDefault=none in my slurm configuration
> > file as suggested by the slurm configuration tool.
> >
> > From: Sarah Mulholland
> > Sent: Thursday, June 28, 2012 3:59 PM
> > To: '[email protected]'
> > Subject: slurm and mpich2
> >
> > I have installed slurm-2.3.5 and mpich2-1.4.1p1. We are using the
> > hydra process manager for mpich. As suggested on the ANL web site, I
> > configured mpich2 with --with-hydra-bss=ssh,rsh,fork,slurm.
> > Yet when I launch a process with srun, all tasks are rank 0.
> >
> > I tried building mpich2 with slurm's native PMI library by configuring
> > --with-pmi=slurm --with-pm=no --with-slurm=[/our/path/here], but
> > autoconf didn't find slurm in the given location.
> >
> > Has anybody else experienced this? Any suggestions?
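Moe's SLURM_PROCID check ("srun -n4 -l printenv SLURM_PROCID | sort") can likewise be scripted. This is a minimal sketch under stated assumptions: check_procids is a hypothetical name, and it assumes srun -l prefixes each output line with the task id followed by a colon, as in Moe's sample output. It passes when every task reports a SLURM_PROCID equal to its label and exactly N distinct ids appear.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of
#   srun -n<N> -l printenv SLURM_PROCID
# on stdin and succeeds only if every task's label matches its SLURM_PROCID
# value and exactly N distinct SLURM_PROCID values were seen.
# Assumption: srun -l prefixes each line with "<task id>: ".
check_procids() {
    expected=$1
    awk -v n="$expected" '
        {
            # Strip the trailing colon from the srun -l label, e.g. "2:" -> "2".
            label = $1
            sub(/:$/, "", label)
            # A task whose SLURM_PROCID differs from its label is a mismatch.
            if (label != $2) { bad = 1 }
            seen[$2] = 1
        }
        END {
            # Count distinct SLURM_PROCID values.
            count = 0
            for (id in seen) count++
            status = (bad || count != n) ? 1 : 0
            exit status
        }
    '
}
```

For example: "srun -n4 -l printenv SLURM_PROCID | check_procids 4". If this passes but the MPI program still reports rank 0 for every task, then, per Moe's reply, the problem is that the program is not using SLURM's PMI library.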
