Thanks! I really appreciate your help. In the meantime I've tried experimenting with 1.8.3. Here is what I've noticed:
1. Running the job with "sbatch ./my_script", where the script calls "mpirun -np 160 -mca orte_forward_job_control 1 ./xhpl", and then suspending the job with "scontrol suspend JOBID" does not work: of the 10 nodes assigned to my job, 4 are still running 16 MPI threads of xhpl.

2. Running exactly the same job and then sending TSTP to the mpirun process does work: all 10 nodes show that the xhpl processes are stopped. Resuming them with -CONT also works.

Again, this is with OpenMPI 1.8.3.
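For the record, here is the manual workaround condensed into commands (node10 is just where mpirun happened to land for my job; adjust accordingly):

-----------------------------
# On the node where mpirun is running, stop the whole MPI job by hand.
# TSTP can be caught and forwarded by mpirun to all ranks; plain STOP cannot.
# Note: the launcher may show up as "orterun" instead of "mpirun" in some builds.
ssh node10 'kill -TSTP $(pgrep -u $USER -x mpirun)'

# ...and resume it later:
ssh node10 'kill -CONT $(pgrep -u $USER -x mpirun)'
-----------------------------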
Once again, thank you for all the help.

Cheers,
Eugene.

On Jul 11, 2017, at 12:08 PM, [email protected] wrote:

Very odd - let me explore when I get back. Sorry for the delay.

On Jul 11, 2017, at 10:59 AM, Eugene Dedits <[email protected]> wrote:

Ralph,

are you suggesting doing something similar to this:
https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume

If yes, here is what I've done:
- start a job using slurm and "mpirun -mca orte_forward_job_control 1 -np 160 xhpl"
- ssh to the node where mpirun is launched
- "kill -STOP PID", where PID is the mpirun pid
- "kill -TSTP PID"

In both cases (STOP and TSTP) I observed 16 MPI processes running at 100% on all 10 nodes where the job was started.

Thanks,
Eugene.

On Jul 11, 2017, at 10:35 AM, [email protected] wrote:

Odd - I'm on travel this week but can look at it next week. One possibility: have you tried hitting us with SIGTSTP instead of SIGSTOP? There is a difference in the ability to trap and forward them.

On Jul 11, 2017, at 9:29 AM, Eugene Dedits <[email protected]> wrote:

I've just tried 3.0.0rc1 and the problem still persists there...

Thanks,
E.

On Jul 11, 2017, at 10:20 AM, [email protected] wrote:

Just checked the planning board and saw that my PR to bring that change to 2.1.2 is pending and not yet in the release branch. I'll try to make that happen soon.

On Jul 11, 2017, at 8:03 AM, [email protected] wrote:

There is an MCA param, ess_base_forward_signals, that controls which signals to forward. However, I just looked at the source code and see that it wasn't backported. Sigh.

You could try the 3.0.0 branch, as it is a release candidate and should go out within a week. I'd suggest just cloning that branch of the OMPI repo to get the latest state. The fix is definitely there.

On Jul 11, 2017, at 7:45 AM, Eugene Dedits <[email protected]> wrote:

Hi Ralph,

thanks for the reply. I've just tried upgrading to OMPI 2.1.1. The same problem... :-\
Could you point me to some discussion of this?

Thanks,
Eugene.

On Jul 11, 2017, at 6:17 AM, [email protected] wrote:

There is an issue with how the signal is forwarded. This has been fixed in the latest OMPI release, so you might want to upgrade.

Ralph

On Jul 11, 2017, at 2:53 AM, Dennis Tants <[email protected]> wrote:

Hello Eugene,

it is just a wild guess, but could you try "srun --mpi=pmi2" instead of "mpirun" (you said you built OMPI with PMI support)? srun is built in, and I think it is the preferred way of launching parallel processes. Maybe scontrol is able to suspend the job this way.
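Something like this, just as a sketch based on your original script (I have not tried it with xhpl myself):

-----------------------------
#!/bin/bash
#SBATCH --partition=standard
#SBATCH -N 10
#SBATCH --ntasks-per-node=16

# Let slurmd launch the tasks directly; no "-np" needed, since the task
# count (10 nodes x 16 tasks) comes from the allocation.
srun --mpi=pmi2 ./xhpl | tee LOG
-----------------------------

Since slurmd is then the parent of every task, "scontrol suspend" can signal the processes on all nodes directly instead of relying on mpirun to forward anything.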
Regards,
Dennis

--
Dennis Tants
Trainee (IT specialist for system integration)
ZARM - Center of Applied Space Technology and Microgravity
Universität Bremen, Am Fallturm, 28359 Bremen, Germany
Phone: 0421 218 57940
E-Mail: [email protected]
www.zarm.uni-bremen.de

On 10.07.2017 at 22:20, Eugene Dedits wrote:

Hello SLURM-DEV,

I have a problem with slurm, openmpi, and "scontrol suspend".

My setup is:
- 96-node cluster with IB, running RHEL 6.8
- slurm 17.02.1
- openmpi 2.0.0 (built using the Intel 2016 compiler)

I am running an application (HPL in this particular case) using a batch script similar to:
-----------------------------
#!/bin/bash
#SBATCH --partition=standard
#SBATCH -N 10
#SBATCH --ntasks-per-node=16

mpirun -np 160 xhpl | tee LOG
-----------------------------

So I am running it on 160 cores across 10 nodes.

Once the job is submitted to the queue and running, I suspend it using:

~# scontrol suspend JOBID

I see that my job has indeed stopped producing output. I then go to each of the 10 nodes assigned to my job and check whether the xhpl processes are still running there:

~# for i in {10..19}; do ssh node$i "top -b -n 1 | head -n 50 | grep xhpl | wc -l"; done

I expect this little script to return 0 from every node (because suspend sent SIGSTOP and the processes shouldn't show up in top). However, I see that the processes are reliably suspended only on node10. I get:

0
16
16
...
16

So 9 out of 10 nodes still have 16 MPI threads of my xhpl application running at 100%.

If I run "scontrol resume JOBID" and then suspend the job again, I see that (sometimes) more nodes have the xhpl processes properly suspended. Every time I resume and suspend the job, different nodes return 0 in my "ssh-run-top" script.

So, all together, it looks like the suspend mechanism doesn't work properly in SLURM with OpenMPI. I've tried compiling OpenMPI with "--with-slurm --with-pmi=/path/to/my/slurm" and observed the same behavior.

I would appreciate any help.

Thanks,
Eugene.
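P.S. In case anyone wants to reproduce the check: a stopped process is reported with state "T" by ps, which is a bit more direct than grepping top output. Roughly (node names as above):

-----------------------------
# Count the xhpl processes per node that are NOT stopped (state "T");
# after a successful suspend, every node should print 0.
for i in {10..19}; do
    ssh node$i 'ps -C xhpl -o stat= | grep -cv "^T"'
done
-----------------------------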
