Very odd - let me explore when I get back. Sorry for the delay.

Sent from my iPad
> On Jul 11, 2017, at 10:59 AM, Eugene Dedits <[email protected]> wrote:
>
> Ralph,
>
> Are you suggesting doing something similar to this:
> https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume
>
> If yes, here is what I’ve done:
> - start a job using slurm and "mpirun -mca orte_forward_job_control 1 -np 160 xhpl"
> - ssh to the node where mpirun is launched
> - "kill -STOP PID" where PID is the mpirun pid
> - "kill -TSTP PID"
>
> In both cases (STOP and TSTP) I observed that there were 16 mpi processes
> running at 100% on all 10 nodes where the job was started.
>
> Thanks,
> Eugene.
>
>> On Jul 11, 2017, at 10:35 AM, [email protected] wrote:
>>
>> Odd - I'm on travel this week but can look at it next week. One possibility -
>> have you tried hitting us with SIGTSTP instead of SIGSTOP? Difference in
>> ability to trap and forward.
>>
>> Sent from my iPad
>>
>>> On Jul 11, 2017, at 9:29 AM, Eugene Dedits <[email protected]> wrote:
>>>
>>> I’ve just tried 3.0.0rc1 and the problem still persists there…
>>>
>>> Thanks,
>>> E.
>>>
>>>> On Jul 11, 2017, at 10:20 AM, [email protected] wrote:
>>>>
>>>> Just checked the planning board and saw that my PR to bring that change to
>>>> 2.1.2 is pending and not yet in the release branch. I’ll try to make that
>>>> happen soon.
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On Jul 11, 2017, at 8:03 AM, "[email protected]" <[email protected]> wrote:
>>>>>
>>>>> There is an mca param ess_base_forward_signals that controls which
>>>>> signals to forward. However, I just looked at the source code and see
>>>>> that it wasn't backported. Sigh.
>>>>>
>>>>> You could try the 3.0.0 branch, as it is a release candidate and should
>>>>> go out within a week. I'd suggest just cloning that branch of the OMPI
>>>>> repo to get the latest state.
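The trap/forward difference Ralph refers to can be shown in a small shell sketch (illustrative only, not part of the thread): the kernel never delivers SIGSTOP to a handler, while SIGTSTP has the same default action but can be trapped, which is what allows a launcher such as mpirun to intercept it and forward it to its ranks.

```shell
#!/bin/sh
# SIGSTOP unconditionally suspends a process and can never be caught;
# SIGTSTP defaults to the same action but CAN be trapped, so a parent
# process is able to intercept it and forward it to its children.

# 1) SIGSTOP: the target stops no matter what (state "T" in ps).
sleep 60 &
pid=$!
kill -STOP "$pid"
sleep 1
ps -o stat= -p "$pid" | grep -q '^T' && echo "SIGSTOP: process stopped"
kill -CONT "$pid"
kill "$pid" 2>/dev/null

# 2) SIGTSTP: with a trap installed, the handler runs and the
#    process keeps going instead of being suspended.
sh -c 'trap "echo SIGTSTP: caught, not suspended" TSTP
       kill -TSTP $$
       echo "still running"'
```

Under suspend, mpirun receives the signal first; only a catchable signal like SIGTSTP gives it the chance to run forwarding logic before its children are affected.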
>>>>> The fix is definitely there.
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>>> On Jul 11, 2017, at 7:45 AM, Eugene Dedits <[email protected]> wrote:
>>>>>>
>>>>>> Hi Ralph,
>>>>>>
>>>>>> thanks for the reply. I’ve just tried upgrading to ompi 2.1.1. The same
>>>>>> problem… :-\
>>>>>> Could you point me to some discussion of this?
>>>>>>
>>>>>> Thanks,
>>>>>> Eugene.
>>>>>>
>>>>>>> On Jul 11, 2017, at 6:17 AM, [email protected] wrote:
>>>>>>>
>>>>>>> There is an issue with how the signal is forwarded. This has been fixed
>>>>>>> in the latest OMPI release, so you might want to upgrade.
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>>> On Jul 11, 2017, at 2:53 AM, Dennis Tants <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hello Eugene,
>>>>>>>>
>>>>>>>> it is just a wild guess, but could you try "srun --mpi=pmi2" (you said
>>>>>>>> you built OMPI with PMI support) instead of "mpirun"?
>>>>>>>> srun is built in and is, I think, the preferred way of running parallel
>>>>>>>> processes. Maybe scontrol is able to suspend it this way.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Dennis
>>>>>>>>
>>>>>>>>> On 10.07.2017 at 22:20, Eugene Dedits wrote:
>>>>>>>>> Hello SLURM-DEV,
>>>>>>>>>
>>>>>>>>> I have a problem with slurm, openmpi, and “scontrol suspend”.
>>>>>>>>>
>>>>>>>>> My setup is:
>>>>>>>>> - 96-node cluster with IB, running RHEL 6.8
>>>>>>>>> - slurm 17.02.1
>>>>>>>>> - openmpi 2.0.0 (built using the Intel 2016 compiler)
>>>>>>>>>
>>>>>>>>> I am running some application (hpl in this particular case) using a
>>>>>>>>> batch script similar to:
>>>>>>>>> -----------------------------
>>>>>>>>> #!/bin/bash
>>>>>>>>> #SBATCH --partition=standard
>>>>>>>>> #SBATCH -N 10
>>>>>>>>> #SBATCH --ntasks-per-node=16
>>>>>>>>>
>>>>>>>>> mpirun -np 160 xhpl | tee LOG
>>>>>>>>> -----------------------------
>>>>>>>>>
>>>>>>>>> So I am running it on 160 cores, 10 nodes.
>>>>>>>>>
>>>>>>>>> Once the job is submitted to the queue and is running, I suspend it
>>>>>>>>> using
>>>>>>>>> ~# scontrol suspend JOBID
>>>>>>>>>
>>>>>>>>> I see that indeed my job stopped producing output. I go to each of
>>>>>>>>> the 10 nodes that were assigned to my job and check whether the xhpl
>>>>>>>>> processes are running there with:
>>>>>>>>>
>>>>>>>>> ~# for i in {10..19}; do ssh node$i “top -b -n 1 | head -n 50 | grep
>>>>>>>>> xhpl | wc -l”; done
>>>>>>>>>
>>>>>>>>> I expect this little script to return 0 from every node (because
>>>>>>>>> suspend sent SIGSTOP and the processes shouldn’t show up in top).
>>>>>>>>> However, I see that processes are reliably suspended only on node10.
>>>>>>>>> I get:
>>>>>>>>> 0
>>>>>>>>> 16
>>>>>>>>> 16
>>>>>>>>> …
>>>>>>>>> 16
>>>>>>>>>
>>>>>>>>> So 9 out of 10 nodes still have 16 MPI threads of my xhpl application
>>>>>>>>> running at 100%.
>>>>>>>>>
>>>>>>>>> If I run “scontrol resume JOBID” and then suspend it again, I see
>>>>>>>>> that (sometimes) more nodes have “xhpl” processes properly suspended.
>>>>>>>>> Every time I resume and suspend the job, I see different nodes
>>>>>>>>> returning 0 in my “ssh-run-top” script.
>>>>>>>>>
>>>>>>>>> So altogether it looks like the suspend mechanism doesn’t work
>>>>>>>>> properly in SLURM with OpenMPI. I’ve tried compiling OpenMPI with
>>>>>>>>> “--with-slurm --with-pmi=/path/to/my/slurm”.
>>>>>>>>> I’ve observed the same behavior.
>>>>>>>>>
>>>>>>>>> I would appreciate any help.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Eugene.
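A more direct way to run the same per-node check is to ask ps for the process state rather than grepping top output: a SIGSTOPed process shows state "T". A minimal sketch along the lines of the loop above (the node names node10..node19 are taken from the example in the thread):

```shell
#!/bin/sh
# For each compute node, count xhpl processes that are NOT stopped.
# A stopped (SIGSTOPed) process has state "T" in ps output, so after
# "scontrol suspend" every node should report 0.
# Node names node10..node19 are assumed from the thread's example.

for i in $(seq 10 19); do
    n=$(ssh "node$i" "ps -C xhpl -o stat=" | grep -cv '^T')
    echo "node$i: $n xhpl processes still running"
done
```

This avoids the head -n 50 truncation, which can miss processes, and reports a running count even for processes that top would sort far down the list.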
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dennis Tants
>>>>>>>> Trainee: IT Specialist for System Integration (Fachinformatiker für Systemintegration)
>>>>>>>>
>>>>>>>> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
>>>>>>>> ZARM - Center of Applied Space Technology and Microgravity
>>>>>>>>
>>>>>>>> Universität Bremen
>>>>>>>> Am Fallturm
>>>>>>>> 28359 Bremen, Germany
>>>>>>>>
>>>>>>>> Phone: 0421 218 57940
>>>>>>>> E-Mail: [email protected]
>>>>>>>>
>>>>>>>> www.zarm.uni-bremen.de
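For reference, Dennis's srun suggestion applied to the original batch script would look roughly like this (a sketch only; the partition name, node count, and tasks-per-node are the ones from Eugene's script, and --mpi=pmi2 assumes OMPI was built with PMI support, as stated in the thread):

```shell
#!/bin/bash
#SBATCH --partition=standard
#SBATCH -N 10
#SBATCH --ntasks-per-node=16

# Launch through srun instead of mpirun so that slurmstepd owns the
# MPI processes directly; "scontrol suspend" then signals them itself
# rather than relying on mpirun to forward SIGSTOP to every rank.
srun --mpi=pmi2 xhpl | tee LOG
```

The design point is that suspend works per-step through slurmstepd on each node, so removing mpirun from the signal path sidesteps the forwarding bug discussed above.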
