Replying to myself here. I see that SLURM_PROCID is like MPI_RANK, so its values should differ across tasks. The problem illustrated below is that the job IDs (prior to the colons) are different. Correct me if I'm mistaken!
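[Editor's note: the numbers before the colons in the output quoted below are task labels that srun's -l flag prepends, not job IDs, so they are expected to differ per task. A minimal local stand-in, assuming no SLURM is available, that produces the same shape of output:]

```shell
# Not SLURM: a purely illustrative stand-in that mimics the shape of
# `srun -n4 -l printenv SLURM_PROCID | sort` output. With `srun -l`, the
# number before the colon is the task label srun prepends to each line;
# the value after it is that task's SLURM_PROCID.
for rank in 0 1 2 3; do
  echo "$rank: $rank"
done | sort
```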
-----Original Message-----
From: Sarah Mulholland [mailto:[email protected]]
Sent: Thursday, August 02, 2012 11:11 AM
To: slurm-dev
Subject: [slurm-dev] FW: slurm and mpich2

Thanks for the reply. I was sidetracked for a few weeks and am getting back to this. I have PMI enabled, pm set to none, and SLURM used for MPICH. I see the advice about looking at the libraries linked to the executable with ldd, but printenv, of course, isn't linked to the PMI library, so I'm confused.

Here's the debug command I'm using:

srun -n4 -l printenv SLURM_PROCID | sort

Am I right in understanding that the reports from all nodes should have the same SLURM_PROCID?

-----Original Message-----
From: Hongjia Cao [mailto:[email protected]]
Sent: Saturday, June 30, 2012 2:42 AM
To: slurm-dev
Subject: [slurm-dev] FW: slurm and mpich2

Try running "ldd your_mpi_program" on the compute node and make sure that the SLURM PMI library is used.

On Friday, 2012-06-29 at 15:13 -0600, Sarah Mulholland wrote:
> This is exactly what I see when I run the command below.
>
> I rebuilt mpich2 with SLURM. I had to set CFLAGS, CXXFLAGS, and LIBS to
> incorporate -I/path/to/slurm/incl and -L/path/to/slurm/lib to get it to
> build with --with-pmi=slurm --with-pm=no and --with-slurm=/path/to/slurm,
> since the last doesn't appear to have an effect.
>
> When I run the command below, I still see four unique SLURM_PROCIDs.
>
> I have slurm-2.3.5 and mpich2-1.4.1p1. Does anybody else run with these
> two packages and versions? If so, would you mind sending me your
> configure flags? Any other suggestions would be appreciated.
>
> Thanks,
>
> Sarah
>
> -----Original Message-----
> From: Moe Jette [mailto:[email protected]]
> Sent: Friday, June 29, 2012 9:18 AM
> To: slurm-dev
> Subject: [slurm-dev] FW: slurm and mpich2
>
> SLURM's PMI library gets the task rank from the environment variable
> SLURM_PROCID. I believe that you are not using that library.
> You can at least confirm that the environment variable is set correctly
> by running something like this:
>
> $ srun -n4 -l printenv SLURM_PROCID | sort
> 0: 0
> 1: 1
> 2: 2
> 3: 3
>
> If you get output like the above, then the problem is definitely that of
> not using SLURM's PMI library.
>
> Quoting Sarah Mulholland <[email protected]>:
>
> > I should add that I have MpiDefault=none in my slurm configuration
> > file, as suggested by the slurm configuration tool.
> >
> > From: Sarah Mulholland
> > Sent: Thursday, June 28, 2012 3:59 PM
> > To: '[email protected]'
> > Subject: slurm and mpich2
> >
> > I have installed slurm-2.3.5 and mpich2-1.4.1p1. We are using the
> > hydra process manager for MPICH. As suggested on the ANL web site,
> > I configured mpich2 with --with-hydra-bss=ssh,rsh,fork,slurm. Yet
> > when I launch a process with srun, all tasks are rank 0.
> >
> > I tried building mpich2 with SLURM's native PMI library by
> > configuring --with-pmi=slurm --with-pm=no
> > --with-slurm=[/our/path/here], but autoconf didn't find slurm in the
> > given location.
> >
> > Has anybody else experienced this? Any suggestions?
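[Editor's note: pieced together from the flags mentioned in this thread, the MPICH2 configure invocation might look like the sketch below. The include/lib paths are placeholders, and the exact split of -I/-L/-lpmi across CFLAGS/CXXFLAGS/LIBS is an assumption based on Sarah's description, not a verified recipe.]

```shell
# Hypothetical MPICH2 configure sketch assembling the SLURM PMI flags
# discussed above. /path/to/slurm is a placeholder; linking SLURM's PMI
# via LIBS="-lpmi" is an assumption to be adjusted for your install.
./configure --with-pmi=slurm --with-pm=no \
            --with-slurm=/path/to/slurm \
            CFLAGS="-I/path/to/slurm/include" \
            CXXFLAGS="-I/path/to/slurm/include" \
            LIBS="-L/path/to/slurm/lib -lpmi"
```

Afterwards, "ldd your_mpi_program" on a compute node (as Hongjia Cao suggests above) is the way to confirm that SLURM's PMI library, rather than MPICH's built-in one, actually got linked.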
