To answer your specific questions:

The backend daemons (orted) will not exit until all locally spawned procs exit.
This is not configurable. For one thing, OMPI procs will kill themselves if they
see the daemon depart, so it makes no sense to have the daemon exit when a single
proc terminates. The logic behind this behavior spans multiple parts of the code
base, I'm afraid.
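As a loose analogy only (this is not Open MPI's code), the rule that procs exit when their daemon departs can be sketched in POSIX sh. A worker polls a stand-in daemon's PID with `kill -0` and terminates itself once that PID is gone. The names `daemon_pid`, `worker` and `worker_pid` are hypothetical, and a short `sleep` plays the role of the daemon.

```shell
#!/bin/sh
# Analogy only: a worker that terminates itself once its "daemon" is gone.

sleep 2 &            # stand-in "daemon": exits after 2 seconds
daemon_pid=$!

worker() {
  # kill -0 sends no signal; it only checks whether the PID still exists
  while kill -0 "$daemon_pid" 2>/dev/null; do
    sleep 1
  done
  echo "daemon $daemon_pid is gone; worker exiting"
}

worker &             # run the worker in the background
worker_pid=$!
wait "$worker_pid"   # returns once the worker has noticed and exited
```

The worker never needs to be told to stop; it notices the daemon's absence within about a second of the daemon exiting.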

On May 17, 2021, at 7:03 AM, Jeff Squyres (jsquyres) via users
<users@lists.open-mpi.org> wrote:

FYI: general Open MPI questions are better sent to the users' mailing list.

Up through the v4.1.x series, the "orted" is a general helper process that Open 
MPI uses on the back-end.  It will not quit until all of its children have 
died.  Open MPI's run time is designed with the intent that some external 
helper will be there for the entire duration of the job; there is no option to 
run without one.
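The idea of a helper that does not quit until all of its children have died corresponds to a parent that blocks in `wait` until every child has exited. Below is a minimal POSIX sh sketch in which `sleep` stands in for MPI processes. It illustrates the concept and is not orted's implementation.

```shell
#!/bin/sh
# Stand-in for a helper daemon: launch children, then block until all exit.
start=$(date +%s)
sleep 1 &        # short-lived child
sleep 3 &        # longer-lived child
wait             # with no arguments, waits for every background child
end=$(date +%s)
echo "all children exited after $((end - start)) s"
```

Although the first child finishes after about 1 s, the parent only continues once the 3 s child is done, so the reported duration is at least 3 s.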

Two caveats:

1. In Open MPI v5.0.x, from the user's perspective, "orted" has been renamed to 
be "prted".  Since this is 99.999% behind the scenes, most users won't notice 
the difference.

2. You can run without "orted" (or "prted") if you use a different run-time 
environment (e.g., SLURM).  In this case, you'll use that environment's 
launcher (e.g., srun or sbatch in SLURM environments) to directly launch MPI 
processes -- you won't use "mpirun" at all.  Fittingly, this is called "direct
launch" in Open MPI parlance (i.e., using another run-time's daemons to launch
processes instead of first launching orteds (or prteds)).
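For illustration, a direct launch under SLURM might look like the batch script below. This is a sketch under several assumptions. `./my_mpi_app` is a hypothetical MPI executable, and the node and task counts are arbitrary. `--mpi=pmix` assumes that both SLURM and Open MPI were built with PMIx support; `srun --mpi=list` shows which plugins a given installation actually offers.

```shell
#!/bin/sh
#SBATCH --nodes=2
#SBATCH --ntasks=4
# Hypothetical batch script for "direct launch": srun starts the MPI
# processes through SLURM's own daemons, so no orted/prted is started.
# --mpi=pmix assumes a PMIx-enabled SLURM and Open MPI build.
srun --mpi=pmix ./my_mpi_app
```

The script would be submitted with `sbatch`. Starting the same program with `mpirun -np 4 ./my_mpi_app` would instead go through Open MPI's own orted (or prted) daemons.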



On May 16, 2021, at 8:34 AM, 叶安华 <yean...@sensetime.com> wrote:

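Code snippet:

    # sleep.sh
    sleep 10001 &           # background child of sleep.sh
    /bin/sh son_sleep.sh    # returns almost immediately (see below)
    sleep 10002             # foreground: sleep.sh waits here

    # son_sleep.sh
    sleep 10003 &           # keeps running after son_sleep.sh exits
    sleep 10004 &           # likewise

Thanks,
Anhua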
From: 叶安华 <yean...@sensetime.com>
Date: Sunday, May 16, 2021 at 20:31
To: "jsquy...@cisco.com" <jsquy...@cisco.com>
Subject: [Help] Must orted exit after all spawned processes exit
Dear Jeff,

Sorry to bother you, but I am really curious about the conditions under which
orted exits in the scenario below, and I look forward to hearing from you.
Scenario description:
· Step 1: start a remote process via "mpirun -np 1 -host 10.211.55.4 sh sleep.sh"
· Step 2: check pstree on the remote host:
  [image: pstree output on the remote host]
· Step 3: the mpirun process from step 1 does not exit until I kill all the
  sleeping processes (PIDs 15479, 15481, 15482, and 15483)
To conclude, my questions are as follows:
1. Must orted wait until all spawned processes exit?
2. Is this behavior configurable? What if I want orted to exit immediately after
   any one of the spawned processes exits?
3. I could not find the specific logic where orted waits for spawned processes to
   exit; I hope you can give me a hint.
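Because orted itself cannot be told to exit early, one possible workaround for question 2 is to make the launched script manage its own children. The sketch below is hypothetical and is not an Open MPI feature. It assumes bash 4.3 or later for `wait -n`. The script returns as soon as any one background child exits and then terminates the others.

```shell
#!/bin/bash
# Hypothetical wrapper (not an Open MPI feature): start several children,
# return as soon as the first one exits, then stop the rest.
# Requires bash >= 4.3 for `wait -n`.

sleep 1 &     # short-lived child: stands in for "any one process exits"
sleep 30 &    # long-running children
sleep 30 &

wait -n                          # block until the first child exits
status=$?                        # exit status of that first child
kill $(jobs -p) 2>/dev/null      # terminate the remaining background jobs
wait                             # reap them so nothing is left running
echo "first child exited with status $status; remaining children stopped"
```

Processes that a child script puts in the background itself, as son_sleep.sh does, are not children of the wrapper. `jobs` therefore does not track them, and they would still need to be stopped separately.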
PS (scripts):
[image: screenshot of the scripts]

Thanks,
Anhua

-- 
Jeff Squyres
jsquy...@cisco.com