Just an update: I eliminated the error below by telling MPI_Comm_spawn to create non-MPI processes, via the info key:

    MPI_Info_set(info, "ompi_non_mpi", "true");

If you still want to pursue this matter, let me know.

Kurt

From: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mcc...@nasa.gov>
Sent: Thursday, March 17, 2022 5:58 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mcc...@nasa.gov>
Subject: OpenMpi crash in MPI_Comm_spawn / developer message

My job successfully spawned a large number of subprocesses via MPI_Comm_spawn, filling up the available cores. When some of those subprocesses terminated, the job attempted to spawn more. It appears that these later calls to MPI_Comm_spawn caused this error:

[n022.cluster.com:08996] [[56319,0],0] grpcomm:direct:send_relay proc [[56319,0],1] not running - cannot relay: NOT ALIVE
An internal error has occurred in ORTE:

[[56319,0],0] FORCE-TERMINATE AT Unreachable:-12 - error grpcomm_direct.c(601)

This is something that should be reported to the developers.

I would attach the output created by the mpiexec arguments "--mca ras_base_verbose 5 --display-devel-map --mca rmaps_base_verbose 5", but it is 22 MB. Do you have a location where I can drop the file?

Thanks for any help.

Kurt