To answer your specific questions: the backend daemons (orted) will not exit until all locally spawned procs exit. This is not configurable. For one thing, OMPI procs will terminate themselves if they see their daemon depart, so it makes no sense to have the daemon exit when a single proc terminates: doing so would take the remaining procs down with it. The logic behind this behavior spans multiple parts of the code base, I'm afraid.
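A rough analogy of the behavior described above, offered as a sketch and not as Open MPI's actual implementation: the hypothetical `launch_and_wait` function plays the role of a daemon. It starts each child in the background and then calls the shell's `wait` builtin with no operands, which blocks until every child it started has exited. The "daemon" therefore finishes only when its longest-running child does, regardless of which child finishes first.

```shell
#!/bin/sh
# Hypothetical stand-in for a launcher daemon; this is NOT Open MPI code.
# It starts one background child per argument and returns only after
# all of them have exited.
launch_and_wait() {
    for secs in "$@"; do
        sleep "$secs" &   # spawn a child that runs for $secs seconds
    done
    wait                  # no operands: block until every background child exits
    echo "all children exited"
}

launch_and_wait 1 2 3     # returns after about 3 seconds, not 1
```

Exiting as soon as the first child finished would orphan the others, which is exactly what the real daemon avoids since its procs terminate when it departs.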
On May 17, 2021, at 7:03 AM, Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org> wrote:

FYI: general Open MPI questions are better sent to the users mailing list.

Up through the v4.1.x series, "orted" is a general helper process that Open MPI uses on the back end. It will not quit until all of its children have died. Open MPI's run time is designed with the intent that some external helper will be present for the entire duration of the job; there is no option to run without one.

Two caveats:

1. In Open MPI v5.0.x, from the user's perspective, "orted" has been renamed "prted". Since this is 99.999% behind the scenes, most users won't notice the difference.

2. You can run without "orted" (or "prted") if you use a different run-time environment (e.g., SLURM). In this case, you use that environment's launcher (e.g., srun or sbatch in SLURM environments) to launch MPI processes directly; you won't use "mpirun" at all. Fittingly, this is called "direct launch" in Open MPI parlance (i.e., using another run-time's daemons to launch processes instead of first launching orteds or prteds).

On May 16, 2021, at 8:34 AM, 叶安华 <yean...@sensetime.com> wrote:

Code snippet:

    # sleep.sh
    sleep 10001 &
    /bin/sh son_sleep.sh
    sleep 10002

    # son_sleep.sh
    sleep 10003 &
    sleep 10004 &

thanks
Anhua

From: 叶安华 <yean...@sensetime.com>
Date: Sunday, May 16, 2021 at 20:31
To: "jsquy...@cisco.com" <jsquy...@cisco.com>
Subject: [Help] Must orted exit after all spawned processes exit

Dear Jeff,

Sorry to bother you, but I am really curious about the conditions under which orted exits in the scenario below, and I look forward to hearing from you.
Scenario description:

· Step 1: start a remote process via "mpirun -np 1 -host 10.211.55.4 sh sleep.sh"
· Step 2: check pstree on the remote host: (pstree screenshot not preserved)
· Step 3: the mpirun process from step 1 does not exit until I kill all the sleeping processes (PIDs 15479, 15481, 15482, and 15483).

To conclude, my questions are as follows:

1. Must orted wait until all spawned processes exit?
2. Is this behavior configurable? What if I want orted to exit immediately after any one of the spawned processes exits?
3. I could not find the specific logic that makes orted wait for spawned processes to exit; I hope you can give me a hint.

PS (scripts): (screenshot not preserved)

thanks
Anhua

--
Jeff Squyres
jsquy...@cisco.com