Hi Ralph,
did you have a chance to take a look at this problem?

Thanks!
Eugene.

On Tue, Jul 11, 2017 at 12:51 PM, Eugene Dedits <[email protected]> wrote:
> Thanks! I really appreciate your help.
> In the meantime I've tried experimenting with 1.8.3. Here is what I've
> noticed.
>
> 1. Running the job with "sbatch ./my_script", where my script calls
>    mpirun -np 160 -mca orte_forward_job_control 1 ./xhpl
>
>    and then suspending the job with "scontrol suspend JOBID"
>    does not work. Of the 10 nodes assigned to my job, 4 are still running
>    16 MPI threads of xhpl.
>
> 2. Running exactly the same job and then sending TSTP to the mpirun process
>    does work: all 10 nodes show that the xhpl processes are stopped.
>    Resuming them with -CONT also works.
>
> Again, this is with OpenMPI 1.8.3.
>
> Once again, thank you for all the help.
>
> Cheers,
> Eugene.
>
>
> On Jul 11, 2017, at 12:08 PM, [email protected] wrote:
>
> Very odd - let me explore when I get back. Sorry for the delay.
>
> Sent from my iPad
>
> On Jul 11, 2017, at 10:59 AM, Eugene Dedits <[email protected]> wrote:
>
> Ralph,
>
> Are you suggesting doing something similar to this:
> https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume
>
> If yes, here is what I've done:
> - start a job using slurm and "mpirun -mca orte_forward_job_control 1 -np 160 xhpl"
> - ssh to the node where mpirun is launched
> - "kill -STOP PID", where PID is the mpirun pid
> - "kill -TSTP PID"
>
> In both cases (STOP and TSTP) I observed that there were 16 MPI processes
> running at 100% on all 10 nodes where the job was started.
>
> Thanks,
> Eugene.
>
>
> On Jul 11, 2017, at 10:35 AM, [email protected] wrote:
>
> Odd - I'm on travel this week but can look at it next week. One
> possibility - have you tried hitting us with SIGTSTP instead of SIGSTOP?
> Difference in ability to trap and forward.
>
> Sent from my iPad
>
> On Jul 11, 2017, at 9:29 AM, Eugene Dedits <[email protected]> wrote:
>
> I've just tried 3.0.0rc1 and the problem still persists there…
>
> Thanks,
> E.
>
>
> On Jul 11, 2017, at 10:20 AM, [email protected] wrote:
>
> Just checked the planning board and saw that my PR to bring that change to
> 2.1.2 is pending and not yet in the release branch. I'll try to make that
> happen soon.
>
> Sent from my iPad
>
> On Jul 11, 2017, at 8:03 AM, "[email protected]" <[email protected]> wrote:
>
> There is an mca param ess_base_forward_signals that controls which signals
> to forward. However, I just looked at the source code and see that it
> wasn't backported. Sigh.
>
> You could try the 3.0.0 branch, as it is in release candidate and should go
> out within a week. I'd suggest just cloning that branch of the OMPI repo to
> get the latest state. The fix is definitely there.
>
> Sent from my iPad
>
> On Jul 11, 2017, at 7:45 AM, Eugene Dedits <[email protected]> wrote:
>
> Hi Ralph,
>
> thanks for the reply. I've just tried upgrading to OMPI 2.1.1. The same
> problem… :-\
> Could you point me to some discussion of this?
>
> Thanks,
> Eugene.
>
> On Jul 11, 2017, at 6:17 AM, [email protected] wrote:
>
> There is an issue with how the signal is forwarded. This has been fixed in
> the latest OMPI release, so you might want to upgrade.
>
> Ralph
>
> Sent from my iPad
>
> On Jul 11, 2017, at 2:53 AM, Dennis Tants <[email protected]> wrote:
>
> Hello Eugene,
>
> it is just a wild guess, but could you try "srun --mpi=pmi2" (you said
> you built OMPI with PMI support) instead of "mpirun".
> srun is built-in and, I think, the preferred way of running parallel
> processes. Maybe scontrol is able to suspend it this way.
>
> Regards,
> Dennis
>
> On 10.07.2017 at 22:20, Eugene Dedits wrote:
> Hello SLURM-DEV,
>
> I have a problem with slurm, openmpi, and "scontrol suspend".
>
> My setup is:
> 96-node cluster with IB, running RHEL 6.8
> slurm 17.02.1
> openmpi 2.0.0 (built using the Intel 2016 compiler)
>
> I am running some application (hpl in this particular case) using a batch
> script similar to:
> -----------------------------
> #!/bin/bash
> #SBATCH --partition=standard
> #SBATCH -N 10
> #SBATCH --ntasks-per-node=16
>
> mpirun -np 160 xhpl | tee LOG
> -----------------------------
>
> So I am running it on 160 cores, 10 nodes.
>
> Once the job is submitted to the queue and is running, I suspend it using
> ~# scontrol suspend JOBID
>
> I see that indeed my job stopped producing output. I go to each of the 10
> nodes that were assigned to my job and check whether the xhpl processes
> are running there with:
>
> ~# for i in {10..19}; do ssh node$i "top -b -n 1 | head -n 50 | grep xhpl | wc -l"; done
>
> I expect this little script to return 0 from every node (because suspend
> sent SIGSTOP and the processes shouldn't show up in top). However, I see
> that the processes are reliably suspended only on node10. I get:
> 0
> 16
> 16
> …
> 16
>
> So 9 out of 10 nodes still have 16 MPI threads of my xhpl application
> running at 100%.
>
> If I run "scontrol resume JOBID" and then suspend it again, I see that
> (sometimes) more nodes have the "xhpl" processes properly suspended. Every
> time I resume and suspend the job, I see different nodes returning 0 in my
> "ssh-run-top" script.
>
> Altogether it looks like the suspend mechanism doesn't work properly in
> SLURM with OpenMPI. I've tried compiling OpenMPI with "--with-slurm
> --with-pmi=/path/to/my/slurm" and observed the same behavior.
>
> I would appreciate any help.
>
> Thanks,
> Eugene.
>
>
> --
> Dennis Tants
> Auszubildender: Fachinformatiker für Systemintegration
>
> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
> ZARM - Center of Applied Space Technology and Microgravity
>
> Universität Bremen
> Am Fallturm
> 28359 Bremen, Germany
>
> Telefon: 0421 218 57940
> E-Mail: [email protected]
>
> www.zarm.uni-bremen.de
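
A minimal sketch of the manual suspend/resume workaround the thread converges on
(sending TSTP/CONT to mpirun and letting it forward the signal). It assumes the
job was launched with "-mca orte_forward_job_control 1"; the job-ID argument and
the squeue/pkill plumbing below are illustrative only, not part of the original
posts.

-----------------------------
#!/bin/bash
# suspend_mpi.sh JOBID [suspend|resume]
# Signal the job's mpirun directly instead of relying on "scontrol suspend".
# mpirun forwards SIGTSTP/SIGCONT to the remote ranks when it was started
# with "-mca orte_forward_job_control 1".

JOBID=$1
ACTION=${2:-suspend}

# Node where the batch script (and therefore mpirun) is running.
BATCH_HOST=$(squeue -h -j "$JOBID" -o %B)

case "$ACTION" in
    suspend) SIG=TSTP ;;
    resume)  SIG=CONT ;;
    *) echo "usage: $0 JOBID [suspend|resume]" >&2; exit 1 ;;
esac

# Send the signal to every mpirun we own on that node.
ssh "$BATCH_HOST" "pkill -$SIG -u \$USER -x mpirun"
-----------------------------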

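For completeness, a sketch of the batch script rewritten around the
"srun --mpi=pmi2" suggestion above. It assumes an OpenMPI build with PMI
support; the partition name and task geometry are simply copied from the
original script and are placeholders.

-----------------------------
#!/bin/bash
#SBATCH --partition=standard
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=16

# Launch through SLURM's own launcher instead of mpirun, so the ranks run
# as a regular job step that scontrol can suspend and resume directly.
srun --mpi=pmi2 ./xhpl | tee LOG
-----------------------------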