Hi Jette,
Thank you very much.

> What MPI implementation/version are you using?
> Many MPI implementations launch their tasks through SLURM, so this problem
> should not exist in that case. More information about how various MPI
> implementations work with SLURM is available here:
> https://computing.llnl.gov/linux/slurm/mpi_guide.html

I was using MPICH2 1.2.1 in this case. We also use MVAPICH2, which behaves
similarly.

I am trying to avoid linking against Slurm's PMI; otherwise we would have to
distribute separate executables for Slurm and non-Slurm systems.

That said, I wonder how linking against Slurm's PMI would help with
suspend/resume. Does Slurm send SIGTSTP to each individual computation process
launched by MPICH2? I guess that is the only way to make sure the actual
computation processes are suspended?

> What SLURM plugin are you using for process tracking?
> Run "scontrol show config | grep Proctrack" to see.

$ scontrol show config | grep Proctrack
ProctrackType = proctrack/pgid

Regards,
Shen Chen

On Wed, Jul 6, 2011 at 3:32 AM, <[email protected]> wrote:
> What MPI implementation/version are you using?
> Many MPI implementations launch their tasks through SLURM, so this problem
> should not exist in that case. More information about how various MPI
> implementations work with SLURM is available here:
> https://computing.llnl.gov/linux/slurm/mpi_guide.html
>
> What SLURM plugin are you using for process tracking?
> Run "scontrol show config | grep Proctrack" to see.
>
> If your MPI implementation launches tasks outside of SLURM control, you may
> just need to increase the sleep time. I don't believe there will be a
> general solution available for all configurations.
>
> Quoting hash <[email protected]>:
>
>> Hi all,
>>
>> In src/slurmd/slurmstepd/req.c, we found that Slurm suspends a job by
>> sending SIGTSTP, calling sleep(1), and then sending SIGSTOP.
>> This is a very important feature for us: we have two partitions for
>> high- and low-priority jobs, and low-priority jobs are suspended when
>> resources are insufficient.
>>
>> However, the 1-second sleep does not seem to be sufficient in some cases.
>> Our jobs are launched with MPICH2's mpiexec, e.g.
>> $ srun -c 8 mpiexec -n 8 /path/to/prog
>> The process IDs are:
>> mpiexec: 100
>> prog: 101-108
>> We issue the following command in a terminal:
>> $ kill -SIGTSTP 100 && sleep 1 && kill -SIGSTOP 100
>>
>> In half of the cases, the mpiexec process (100) is stopped, but the
>> underlying prog processes (101-108) are still running. Apparently, MPICH2
>> does not get enough time to handle SIGTSTP before SIGSTOP arrives, and
>> SIGSTOP cannot be caught or handled.
>> As a result, squeue reports that the low-priority job is suspended,
>> and the high-priority job starts running, which overloads the workstation
>> with more processes than processors.
>>
>> If we change the sleep time to 2 seconds, both the mpiexec and prog
>> processes are correctly stopped, at least in my 10 consecutive tests.
>>
>> We could certainly change the 1-second delay in req.c to a larger value,
>> but I'm not sure whether that will work for larger jobs (more memory,
>> more nodes). Is there a better solution to this problem? Thank you!
>>
>> Regards,
>> Shen Chen
>>
>> Cogenda Pte Ltd
>> http://www.cogenda.com
>
> Moe Jette
> SchedMD LLC
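The SIGTSTP/SIGSTOP race described in the thread can be illustrated with a small shell sketch. Instead of a fixed `sleep 1`, it polls the watched processes and sends SIGSTOP only once they have all stopped or a time limit expires. This is not Slurm's req.c logic: the helper names `is_stopped` and `tstp_then_stop` are hypothetical, and the sketch assumes Linux `/proc/PID/stat` and that the caller already knows the compute PIDs.

```shell
#!/bin/sh
# Sketch only (hypothetical helpers, not part of Slurm): send SIGTSTP to the
# launcher, poll up to a limit until every watched PID is stopped, then send
# SIGSTOP as a fallback. Assumes Linux /proc.

# is_stopped PID: succeed if PID is stopped ("T"), a zombie ("Z"), or gone.
is_stopped() {
    # The state is the field right after "pid (comm) "; strip that prefix
    # first, because comm may contain spaces.
    ts_state=$(sed 's/.*) //' "/proc/$1/stat" 2>/dev/null | cut -c1)
    case "$ts_state" in
        ''|T|Z) return 0 ;;
        *) return 1 ;;
    esac
}

# tstp_then_stop LIMIT_SECONDS LAUNCHER_PID [WATCHED_PID...]
# Returns 0 if SIGTSTP alone stopped every watched PID within the limit,
# 1 otherwise. Either way, every PID still running is then sent SIGSTOP.
tstp_then_stop() {
    ts_limit=$1
    ts_launcher=$2
    shift 2
    kill -s TSTP "$ts_launcher" 2>/dev/null
    ts_all=no
    ts_waited=0
    while :; do
        ts_all=yes
        for ts_pid in "$@"; do
            is_stopped "$ts_pid" || { ts_all=no; break; }
        done
        if [ "$ts_all" = yes ] || [ "$ts_waited" -ge "$ts_limit" ]; then
            break
        fi
        sleep 1
        ts_waited=$((ts_waited + 1))
    done
    # SIGSTOP cannot be caught or ignored, so this fallback always applies.
    kill -s STOP "$ts_launcher" 2>/dev/null
    for ts_pid in "$@"; do
        is_stopped "$ts_pid" || kill -s STOP "$ts_pid" 2>/dev/null
    done
    [ "$ts_all" = yes ]
}
```

For the example in the thread this might be run as `tstp_then_stop 10 100 101 102 103 104 105 106 107 108`. The wait is bounded rather than fixed: a small job moves on as soon as its processes have stopped, while a larger one gets up to the limit. It still depends on knowing the compute PIDs, which is exactly what is missing when the MPI launcher starts tasks outside Slurm's control.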
