Ralph,
Are you suggesting doing something similar to this: https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume

If yes, here is what I've done:
- start a job using slurm and "mpirun -mca orte_forward_job_control 1 -np 160 xhpl"
- ssh to the node where mpirun is launched
- "kill -STOP PID" where PID is the mpirun pid
- "kill -TSTP PID"

In both cases (STOP and TSTP) I observed that there were 16 MPI processes running at 100% on all 10 nodes where the job was started.

Thanks,
Eugene.

> On Jul 11, 2017, at 10:35 AM, [email protected] wrote:
>
> Odd - I'm on travel this week but can look at it next week. One possibility -
> have you tried hitting us with SIGTSTP instead of SIGSTOP? Difference in
> ability to trap and forward
>
> Sent from my iPad
>
>> On Jul 11, 2017, at 9:29 AM, Eugene Dedits <[email protected]> wrote:
>>
>> I've just tried 3.0.0rc1 and the problem still persists there… :-\
>>
>> Thanks,
>> E.
>>
>>> On Jul 11, 2017, at 10:20 AM, [email protected] wrote:
>>>
>>> Just checked the planning board and saw that my PR to bring that change to
>>> 2.1.2 is pending and not yet in the release branch. I'll try to make that
>>> happen soon.
>>>
>>> Sent from my iPad
>>>
>>>> On Jul 11, 2017, at 8:03 AM, "[email protected]" <[email protected]> wrote:
>>>>
>>>> There is an mca param ess_base_forward_signals that controls which signals
>>>> to forward. However, I just looked at the source code and see that it wasn't
>>>> backported. Sigh.
>>>>
>>>> You could try the 3.0.0 branch, as it is a release candidate and should go
>>>> out within a week. I'd suggest just cloning that branch of the OMPI repo
>>>> to get the latest state. The fix is definitely there.
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On Jul 11, 2017, at 7:45 AM, Eugene Dedits <[email protected]> wrote:
>>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> Thanks for the reply. I've just tried upgrading to OMPI 2.1.1.
>>>>> The same problem… :-\
>>>>> Could you point me to some discussion of this?
>>>>>
>>>>> Thanks,
>>>>> Eugene.
>>>>>
>>>>>> On Jul 11, 2017, at 6:17 AM, [email protected] wrote:
>>>>>>
>>>>>> There is an issue with how the signal is forwarded. This has been fixed
>>>>>> in the latest OMPI release, so you might want to upgrade.
>>>>>>
>>>>>> Ralph
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On Jul 11, 2017, at 2:53 AM, Dennis Tants <[email protected]> wrote:
>>>>>>>
>>>>>>> Hello Eugene,
>>>>>>>
>>>>>>> It is just a wild guess, but could you try "srun --mpi=pmi2" (you said
>>>>>>> you built OMPI with PMI support) instead of "mpirun"?
>>>>>>> srun is built in and, I think, the preferred way of launching parallel
>>>>>>> processes. Maybe scontrol is able to suspend the job this way.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Dennis
>>>>>>>
>>>>>>>> On 10.07.2017 at 22:20, Eugene Dedits wrote:
>>>>>>>> Hello SLURM-DEV,
>>>>>>>>
>>>>>>>> I have a problem with slurm, openmpi, and "scontrol suspend".
>>>>>>>>
>>>>>>>> My setup is:
>>>>>>>> 96-node cluster with IB, running RHEL 6.8
>>>>>>>> slurm 17.02.1
>>>>>>>> openmpi 2.0.0 (built using the Intel 2016 compiler)
>>>>>>>>
>>>>>>>> I am running an application (HPL in this particular case) using a
>>>>>>>> batch script similar to:
>>>>>>>> -----------------------------
>>>>>>>> #!/bin/bash
>>>>>>>> #SBATCH --partition=standard
>>>>>>>> #SBATCH -N 10
>>>>>>>> #SBATCH --ntasks-per-node=16
>>>>>>>>
>>>>>>>> mpirun -np 160 xhpl | tee LOG
>>>>>>>> -----------------------------
>>>>>>>>
>>>>>>>> So I am running it on 160 cores across 10 nodes.
>>>>>>>>
>>>>>>>> Once the job is submitted to the queue and running, I suspend it with
>>>>>>>> ~# scontrol suspend JOBID
>>>>>>>>
>>>>>>>> I see that my job has indeed stopped producing output.
>>>>>>>> I go to each of the 10 nodes that were assigned to my job and check
>>>>>>>> whether the xhpl processes are still running there with:
>>>>>>>>
>>>>>>>> ~# for i in {10..19}; do ssh node$i "top -b -n 1 | head -n 50 | grep xhpl | wc -l"; done
>>>>>>>>
>>>>>>>> I expect this little script to return 0 from every node (because
>>>>>>>> suspend sent SIGSTOP, so the processes shouldn't show up in top).
>>>>>>>> However, I see that processes are reliably suspended only on node10.
>>>>>>>> I get:
>>>>>>>> 0
>>>>>>>> 16
>>>>>>>> 16
>>>>>>>> …
>>>>>>>> 16
>>>>>>>>
>>>>>>>> So 9 out of 10 nodes still have 16 MPI threads of my xhpl application
>>>>>>>> running at 100%.
>>>>>>>>
>>>>>>>> If I run "scontrol resume JOBID" and then suspend it again, I see that
>>>>>>>> (sometimes) more nodes have the "xhpl" processes properly suspended.
>>>>>>>> Every time I resume and suspend the job, I see different nodes
>>>>>>>> returning 0 in my "ssh-run-top" script.
>>>>>>>>
>>>>>>>> So altogether it looks like the suspend mechanism doesn't work
>>>>>>>> properly in SLURM with OpenMPI. I've tried compiling OpenMPI with
>>>>>>>> "--with-slurm --with-pmi=/path/to/my/slurm" and observed the same
>>>>>>>> behavior.
>>>>>>>>
>>>>>>>> I would appreciate any help.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Eugene.
>>>>>>>
>>>>>>> --
>>>>>>> Dennis Tants
>>>>>>> Trainee: IT Specialist for System Integration
>>>>>>>
>>>>>>> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
>>>>>>> ZARM - Center of Applied Space Technology and Microgravity
>>>>>>>
>>>>>>> Universität Bremen
>>>>>>> Am Fallturm
>>>>>>> 28359 Bremen, Germany
>>>>>>>
>>>>>>> Telefon: 0421 218 57940
>>>>>>> E-Mail: [email protected]
>>>>>>>
>>>>>>> www.zarm.uni-bremen.de
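
[Editor's note] A more direct way to verify suspension than counting lines of top output is to check the process state field that ps reports: a process that has received SIGSTOP shows state "T" (stopped). The sketch below demonstrates the idea locally with a sleep process standing in for an xhpl rank; the commented cluster loop at the end reuses the node names (node10..node19) and rank count (16 per node) from the thread, which are assumptions taken from the messages above, not a tested recipe.

```shell
#!/bin/sh
# Verify that a process is really stopped by checking the "T" state
# reported by ps, rather than grepping top output.

sleep 60 &          # stand-in for an MPI rank
pid=$!

kill -STOP "$pid"   # the same signal scontrol suspend delivers
state=$(ps -o stat= -p "$pid")
echo "state after STOP: $state"   # first character should be T (stopped)

kill -CONT "$pid"
state=$(ps -o stat= -p "$pid")
echo "state after CONT: $state"   # should be back to S (sleeping)

kill "$pid" 2>/dev/null

# On the cluster, the equivalent per-node check (hypothetical, using the
# node names from the thread) would be:
#   for i in {10..19}; do
#     ssh node$i 'ps -C xhpl -o stat= | grep -c "^T"'
#   done
# Each node should report 16 once every rank of the job is suspended.
```

Unlike the top-based loop, this also distinguishes "suspended" from "exited": a rank that died would simply be absent, while a suspended one is still listed with state T.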
