Hi Jette,

Thank you very much.


> What MPI implementation/version are you using?
> Many MPI implementations launch their tasks through SLURM, so this problem
> should not exist in that case. More information about how various MPI
> implementations work with SLURM is available here:
> https://computing.llnl.gov/linux/slurm/mpi_guide.html

I was using MPICH2 1.2.1 in this case. We also use MVAPICH2, which is
similar.

I am trying to avoid linking to Slurm's PMI. Otherwise we would have to
distribute separate executables for Slurm and non-Slurm systems.


On the other hand, I wonder how linking to Slurm's PMI would help with
suspend/resume.

Does Slurm send the TSTP signal to each individual computation process
launched by MPICH2?

I guess that is the only way to make sure the actual computation processes
are suspended?
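To check by hand whether the computation processes are really stopped (rather than trusting what squeue reports), one option on Linux is to walk /proc from the mpiexec PID and print the state of every descendant. This is a minimal sketch, assuming a Linux /proc filesystem where state `T` means "stopped by a signal"; the helper names `proc_stat`, `descendants` and `report` are hypothetical and not part of Slurm or MPICH2. It also prints each process's group ID, which shows whether the prog processes share mpiexec's process group.

```python
import os


def proc_stat(pid: int):
    """Return (state, ppid, pgrp) for pid from Linux /proc/<pid>/stat, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None  # the process exited while we were scanning
    # The command name is in parentheses and may contain spaces or ')', so split
    # after the *last* ')': the fields that follow are state, ppid, pgrp, ...
    fields = data[data.rindex(")") + 2:].split()
    return fields[0], int(fields[1]), int(fields[2])


def descendants(root: int):
    """Return [(pid, state, pgrp)] for every descendant of root (root itself excluded)."""
    children = {}  # ppid -> [(pid, state, pgrp)]
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            info = proc_stat(int(entry))
            if info is not None:
                state, ppid, pgrp = info
                children.setdefault(ppid, []).append((int(entry), state, pgrp))
    result, todo = [], [root]
    while todo:  # depth-first walk of the process tree
        for pid, state, pgrp in children.get(todo.pop(), []):
            result.append((pid, state, pgrp))
            todo.append(pid)
    return result


def report(root: int) -> None:
    """Print the state of root and of each of its descendants."""
    print(root, proc_stat(root))
    for pid, state, pgrp in descendants(root):
        verdict = "stopped" if state in ("T", "t") else "NOT stopped"
        print(f"pid={pid} state={state} pgrp={pgrp} {verdict}")
```

For example, calling `report(100)` while the job is marked as suspended should list mpiexec's children (prog 101-108 in the example quoted below) and flag any that are still running.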



> What SLURM plugin are you using for process tracking?
> Run "scontrol show config | grep Proctrack" to see.


$ scontrol show config | grep Proctrack
ProctrackType           = proctrack/pgid
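With proctrack/pgid, signals are delivered to a process group rather than to a single PID, so (as with POSIX `killpg`) they reach every process still in that group, but not a process that has moved itself into a different group. The sketch below reproduces the TSTP, wait, STOP sequence discussed in this thread against a whole process group, with the delay as a parameter. It is a hypothetical illustration (the names `suspend_group`, `resume_group`, `spawn_group_leader` and the `grace` argument are assumptions), not the code in req.c.

```python
import os
import signal
import time


def suspend_group(pgid: int, grace: float = 1.0) -> None:
    """Suspend every process in process group pgid.

    SIGTSTP is catchable, so a launcher such as mpiexec may handle it (for
    example by forwarding it to its children). After `grace` seconds, SIGSTOP,
    which cannot be caught or ignored, stops whatever is still in the group.
    """
    os.killpg(pgid, signal.SIGTSTP)
    time.sleep(grace)
    os.killpg(pgid, signal.SIGSTOP)


def resume_group(pgid: int) -> None:
    """Resume every process in process group pgid."""
    os.killpg(pgid, signal.SIGCONT)


def spawn_group_leader() -> int:
    """Fork a child that idles as leader of a new process group; return its PID (== PGID)."""
    pid = os.fork()
    if pid == 0:
        os.setpgid(0, 0)  # child: become leader of a new process group
        while True:
            signal.pause()  # idle until signalled
    os.setpgid(pid, pid)  # parent: set it too, so the group exists before we signal it
    return pid
```

As with the fixed sleep in req.c, a larger `grace` only makes it more likely that the catchable SIGTSTP has been fully handled before SIGSTOP arrives; it does not guarantee it.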



Regards,

Shen Chen





On Wed, Jul 6, 2011 at 3:32 AM, Moe Jette wrote:
> What MPI implementation/version are you using?
> Many MPI implementations launch their tasks through SLURM, so this problem
> should not exist in that case. More information about how various MPI
> implementations work with SLURM is available here:
> https://computing.llnl.gov/linux/slurm/mpi_guide.html
>
> What SLURM plugin are you using for process tracking?
> Run "scontrol show config | grep Proctrack" to see.
>
> If your MPI implementation launches tasks outside of SLURM control, you may
> just need to increase the sleep time. I don't believe there will be a
> general solution available for all configurations.
>
> Quoting hash:
>
>> Hi all,
>>
>> In src/slurmd/slurmstepd/req.c, we found that Slurm sends SIGTSTP, calls
>> sleep(1), and then sends SIGSTOP to suspend a job.
>> This is a very important feature for us, as we have two partitions for
>> high- and low-priority jobs, and low-priority jobs get suspended when
>> resources are insufficient.
>>
>> However, the 1-second sleep does not seem to be sufficient in some cases.
>> Our jobs are launched with MPICH2's mpiexec, e.g.
>>     $ srun -c 8 mpiexec -n 8 /path/to/prog
>> The process IDs are:
>>    mpiexec:  100
>>    prog:  101-108
>> We issue the following command in a terminal:
>>     $ kill -SIGTSTP 100 && sleep 1 && kill -SIGSTOP 100
>>
>> In about half of the cases, the mpiexec process (100) is stopped, but the
>> underlying prog processes (101-108) are still running. Apparently MPICH2
>> does not get enough time to handle the TSTP signal before STOP arrives,
>> and STOP cannot be caught or handled.
>> As a result, squeue reports that the low-priority job is suspended,
>> and the high-priority job starts running, which overloads the workstation
>> with more processes than processors.
>>
>> If we change the sleep time to 2 seconds, both the mpiexec and prog
>> processes are correctly stopped, at least in my 10 consecutive
>> tests.
>>
>> We could certainly change the 1-second delay to a larger value in
>> req.c, but I'm not sure it would work for larger jobs (more memory,
>> more nodes). I wonder whether there is a better solution to the
>> problem. Thank you!
>>
>> Regards,
>> Shen Chen
>>
>> Cogenda Pte Ltd
>> http://www.cogenda.com
>>
>>
>
>
>
> Moe Jette
> SchedMD LLC
>
>
