Okay, I tracked it down and have a fix pending for OMPI master: https://github.com/open-mpi/ompi/pull/3930

Once that cycles through, I'll create a PR for the 3.0 release. I'm not sure about taking it back to v2.x - I'll have to check with those release managers.
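For anyone who wants to test a pending fix before it reaches a release, the usual route is to build Open MPI directly from the relevant git branch, as Ralph suggests further down in the thread. The sketch below is only illustrative: the branch name (v3.0.x), install prefix, and PMI path are assumptions to be adjusted to the local setup.

-----------------------------
# Clone the release branch that should carry the fix (branch name assumed).
git clone -b v3.0.x https://github.com/open-mpi/ompi.git
cd ompi

# A git checkout needs autogen before configure (release tarballs do not).
./autogen.pl
./configure --prefix=$HOME/openmpi-3.0.x \
            --with-slurm --with-pmi=/path/to/my/slurm
make -j 8 && make install
-----------------------------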
On Jul 18, 2017, at 7:33 AM, [email protected] wrote:

Just looking at it today...

On Jul 18, 2017, at 7:25 AM, Eugene Dedits <[email protected]> wrote:

Hi Ralph,

did you have a chance to take a look at this problem?

Thanks!
Eugene.

On Tue, Jul 11, 2017 at 12:51 PM, Eugene Dedits <[email protected]> wrote:

Thanks! I really appreciate your help. In the meantime I've tried experimenting with 1.8.3. Here is what I've noticed.

1. Running the job with "sbatch ./my_script", where the script calls

   mpirun -np 160 -mca orte_forward_job_control 1 ./xhpl

   and then suspending the job with "scontrol suspend JOBID" does not work. Of the 10 nodes assigned to my job, 4 are still running 16 MPI processes of xhpl.

2. Running exactly the same job and then sending TSTP to the mpirun process does work: all 10 nodes show that the xhpl processes are stopped. Resuming them with -CONT also works.

Again, this is with OpenMPI 1.8.3.

Once again, thank you for all the help.

Cheers,
Eugene.

On Jul 11, 2017, at 12:08 PM, [email protected] wrote:

Very odd - let me explore when I get back. Sorry for the delay.

On Jul 11, 2017, at 10:59 AM, Eugene Dedits <[email protected]> wrote:

Ralph,

Are you suggesting doing something similar to this:
https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume

If yes, here is what I've done:
- start a job using slurm and "mpirun -mca orte_forward_job_control 1 -np 160 xhpl"
- ssh to the node where mpirun is launched
- "kill -STOP PID", where PID is the mpirun pid
- "kill -TSTP PID"

In both cases (STOP and TSTP) I observed that there were 16 MPI processes running at 100% on all 10 nodes where the job was started.

Thanks,
Eugene.

On Jul 11, 2017, at 10:35 AM, [email protected] wrote:

Odd - I'm on travel this week but can look at it next week. One possibility - have you tried hitting us with SIGTSTP instead of SIGSTOP? There is a difference in our ability to trap and forward the two.

On Jul 11, 2017, at 9:29 AM, Eugene Dedits <[email protected]> wrote:

I've just tried 3.0.0rc1 and the problem still persists there...

Thanks,
E.

On Jul 11, 2017, at 10:20 AM, [email protected] wrote:

Just checked the planning board and saw that my PR to bring that change to 2.1.2 is pending and not yet in the release branch. I'll try to make that happen soon.

On Jul 11, 2017, at 8:03 AM, [email protected] wrote:

There is an MCA param, ess_base_forward_signals, that controls which signals to forward. However, I just looked at the source code and see that it wasn't backported. Sigh.

You could try the 3.0.0 branch, as it is at the release candidate stage and should go out within a week. I'd suggest just cloning that branch of the OMPI repo to get the latest state. The fix is definitely there.
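The thread names the parameter but not its value syntax, so the following is only a hedged sketch of how an MCA parameter such as ess_base_forward_signals would typically be set, either through the environment or on the mpirun command line. The value shown is an assumed format; on a build that actually contains the parameter, ompi_info is the authoritative reference.

-----------------------------
# Check whether the parameter exists in this build and what values it takes:
ompi_info --param ess base --level 9 | grep forward_signals

# Set it via the environment (value format is an assumption)...
export OMPI_MCA_ess_base_forward_signals="SIGTSTP,SIGCONT"
mpirun -np 160 ./xhpl | tee LOG

# ...or directly on the mpirun command line:
mpirun -np 160 -mca ess_base_forward_signals "SIGTSTP,SIGCONT" ./xhpl | tee LOG
-----------------------------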
On Jul 11, 2017, at 7:45 AM, Eugene Dedits <[email protected]> wrote:

Hi Ralph,

thanks for the reply. I've just tried upgrading to ompi 2.1.1. The same problem... :-\
Could you point me to some discussion of this?

Thanks,
Eugene.

On Jul 11, 2017, at 6:17 AM, [email protected] wrote:

There is an issue with how the signal is forwarded. This has been fixed in the latest OMPI release, so you might want to upgrade.

Ralph

On Jul 11, 2017, at 2:53 AM, Dennis Tants <[email protected]> wrote:

Hello Eugene,

it is just a wild guess, but could you try "srun --mpi=pmi2" (you said you built OMPI with PMI support) instead of "mpirun"? srun is built in and, I think, the preferred way of running parallel processes. Maybe scontrol is able to suspend the job this way.

Regards,
Dennis

--
Dennis Tants
Apprentice: IT Specialist for System Integration

ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
ZARM - Center of Applied Space Technology and Microgravity

Universität Bremen
Am Fallturm
28359 Bremen, Germany

Telephone: 0421 218 57940
E-Mail: [email protected]

www.zarm.uni-bremen.de
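To make Dennis's suggestion concrete, here is a minimal sketch of the same batch script launched through srun instead of mpirun. It assumes the Open MPI build has matching PMI support, which the thread says it does.

-----------------------------
#!/bin/bash
#SBATCH --partition=standard
#SBATCH -N 10
#SBATCH --ntasks-per-node=16

# Launch directly through SLURM; the task count comes from the SBATCH directives.
srun --mpi=pmi2 ./xhpl | tee LOG
-----------------------------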
On Jul 10, 2017, at 10:20 PM, Eugene Dedits wrote:

Hello SLURM-DEV,

I have a problem with slurm, openmpi, and "scontrol suspend".

My setup is:
- 96-node cluster with IB, running RHEL 6.8
- slurm 17.02.1
- openmpi 2.0.0 (built using the Intel 2016 compiler)

I am running an application (HPL in this particular case) using a batch script similar to:
-----------------------------
#!/bin/bash
#SBATCH --partition=standard
#SBATCH -N 10
#SBATCH --ntasks-per-node=16

mpirun -np 160 xhpl | tee LOG
-----------------------------

So I am running it on 160 cores across 10 nodes.

Once the job is submitted to the queue and running, I suspend it with

~# scontrol suspend JOBID

I see that my job does indeed stop producing output. I then go to each of the 10 nodes assigned to my job and check whether the xhpl processes are still running there:

~# for i in {10..19}; do ssh node$i "top -b -n 1 | head -n 50 | grep xhpl | wc -l"; done

I expect this little script to return 0 from every node (because suspend sent SIGSTOP and the processes shouldn't show up in top). However, I see that the processes are reliably suspended only on node10. I get:

0
16
16
...
16

So 9 out of 10 nodes still have 16 MPI threads of my xhpl application running at 100%.

If I run "scontrol resume JOBID" and then suspend it again, I see that (sometimes) more nodes have the xhpl processes properly suspended. Every time I resume and suspend the job, I see different nodes returning 0 in my "ssh-run-top" script.

So all together it looks like the suspend mechanism doesn't work properly in SLURM with OpenMPI. I've also tried compiling OpenMPI with "--with-slurm --with-pmi=/path/to/my/slurm" and observed the same behavior.

I would appreciate any help.

Thanks,
Eugene.
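The check loop in the original report infers suspension from whether xhpl still appears near the top of top's CPU-sorted output. ps can report the process state directly instead, where "T" means stopped and "R"/"S" means running or sleeping. A small variation on the same loop, assuming the node10..node19 naming from the thread:

-----------------------------
# For each node, show how many xhpl processes are in each state.
# After a successful suspend, every node should report only states
# beginning with "T" (stopped).
for i in {10..19}; do
    echo -n "node$i: "
    ssh node$i 'ps -C xhpl -o stat= | sort | uniq -c | tr "\n" " "; echo'
done
-----------------------------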
