Re: [OMPI users] Open MPI 4.0.3 outside as well as inside a SimpleFOAM container: step creation temporarily disabled, retrying Requested nodes are busy
> Do you invoke mpirun from **inside** the container?
>
> IIRC, mpirun is generally invoked from **outside** the container, could
> you try this if not already the case?
>
> The error message is from SLURM, so this is really a SLURM vs
> singularity issue.
>
> What if you
>
> srun -N 2 -n 2 hostname
>
> instead of
>
> mpirun ...

I checked with SchedMD, and the combination of commands that worked is this:

salloc -t 0-04:00 -A ourgroup --nodes=2 --ntasks=4
mpirun -n 4 singularity exec openfoam10.sif /bin/bash -l -c 'source /opt/openfoam10/etc/bashrc& -parallel'

There are probably other ways to get this to work, but the above did the trick. Thanks for the suggestion.

Rob
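For anyone adapting the working combination above, here is a minimal dry-run sketch that only prints the mpirun line rather than executing it, so it can be checked before it is run inside an salloc allocation. build_launch_cmd is a hypothetical helper, not part of Open MPI, Slurm, or Singularity. The inner command in the example call is an assumption: the quoted command in the post appears garbled in the archive, and the `&& simpleFoam -parallel` form is a guess based on the simpleFoam -parallel invocation in the original report.

```shell
#!/bin/bash
# Sketch only: prints the launch command instead of running it.
# build_launch_cmd is a hypothetical helper introduced for illustration.
build_launch_cmd() {
    local ntasks="$1"   # number of MPI ranks; should match --ntasks of the allocation
    local image="$2"    # Singularity image file, e.g. openfoam10.sif
    local inner="$3"    # command line for the login shell inside the container
    # Note: the inner command is wrapped in single quotes, so it must not
    # itself contain single quotes.
    printf "mpirun -n %s singularity exec %s /bin/bash -l -c '%s'\n" \
        "$ntasks" "$image" "$inner"
}

# Example call. ASSUMPTION: the solver part of the inner command is a guess,
# because the archived post is garbled at that point.
build_launch_cmd 4 openfoam10.sif 'source /opt/openfoam10/etc/bashrc && simpleFoam -parallel'
```

Once the printed line looks right, it can be pasted into the shell that salloc opens, which keeps mpirun on the host side and singularity exec as the per-rank command.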
Re: [OMPI users] Open MPI 4.0.3 outside as well as inside a SimpleFOAM container: step creation temporarily disabled, retrying Requested nodes are busy
Rob,

Do you invoke mpirun from **inside** the container?

IIRC, mpirun is generally invoked from **outside** the container, could you try this if not already the case?

The error message is from SLURM, so this is really a SLURM vs singularity issue.

What if you

srun -N 2 -n 2 hostname

instead of

mpirun ...

Cheers,

Gilles

On 3/1/2023 12:44 PM, Rob Kudyba via users wrote:
> Singularity 3.5.3 on RHEL 7 cluster w/ OpenMPI 4.0.3 lives inside a SimpleFOAM version 10 container. I've confirmed the OpenMPI versions are the same. Perhaps this is a question for Singularity users as well but how can I troubleshoot why mpirun just returns step creation temporarily disabled, retrying Requested
>
> Singularity> mpirun -V
> mpirun (Open MPI) 4.0.3
>
> Report bugs to http://www.open-mpi.org/community/help/
> Singularity> which mpirun
> /usr/bin/mpirun
> Singularity>
>
> $ mpirun -V
> mpirun (Open MPI) 4.0.3
>
> mpirun -n 2 -mca plm_base_verbose 100 --mca ras_base_verbose 100 --mca rss_base_verbose 100 --mca rmaps_base_verbose 100 singularity exec openfoam simpleFoam -fileHandler uncollated -parallel | tee log.simpleFoam
> openfoam10/ openfoam10.sif openfoamtestfile.sh openfoam_v2012.sif
> [myuser@node047 motorBike]$ mpirun -n 2 -mca plm_base_verbose 100 --mca ras_base_verbose 100 --mca rss_base_verbose 100 --mca rmaps_base_verbose 100 singularity exec openfoam simpleFoam -fileHandler uncollated -parallel | tee log.simpleFoam
> openfoam10/ openfoam10.sif openfoamtestfile.sh openfoam_v2012.sif
> [myuser@node047 motorBike]$ mpirun -n 2 -mca plm_base_verbose 100 --mca ras_base_verbose 100 --mca rss_base_verbose 100 --mca rmaps_base_verbose 100 singularity exec openfoam10.sif simpleFoam -parallel | tee log.simpleFoam
> [node047:11650] mca: base: components_register: registering framework plm components
> [node047:11650] mca: base: components_register: found loaded component slurm
> [node047:11650] mca: base: components_register: component slurm register function successful
> [node047:11650] mca: base: components_register: found loaded component isolated
> [node047:11650] mca: base: components_register: component isolated has no register or open function
> [node047:11650] mca: base: components_register: found loaded component rsh
> [node047:11650] mca: base: components_register: component rsh register function successful
> [node047:11650] mca: base: components_open: opening plm components
> [node047:11650] mca: base: components_open: found loaded component slurm
> [node047:11650] mca: base: components_open: component slurm open function successful
> [node047:11650] mca: base: components_open: found loaded component isolated
> [node047:11650] mca: base: components_open: component isolated open function successful
> [node047:11650] mca: base: components_open: found loaded component rsh
> [node047:11650] mca: base: components_open: component rsh open function successful
> [node047:11650] mca:base:select: Auto-selecting plm components
> [node047:11650] mca:base:select:( plm) Querying component [slurm]
> [node047:11650] mca:base:select:( plm) Query of component [slurm] set priority to 75
> [node047:11650] mca:base:select:( plm) Querying component [isolated]
> [node047:11650] mca:base:select:( plm) Query of component [isolated] set priority to 0
> [node047:11650] mca:base:select:( plm) Querying component [rsh]
> [node047:11650] mca:base:select:( plm) Query of component [rsh] set priority to 10
> [node047:11650] mca:base:select:( plm) Selected component [slurm]
> [node047:11650] mca: base: close: component isolated closed
> [node047:11650] mca: base: close: unloading component isolated
> [node047:11650] mca: base: close: component rsh closed
> [node047:11650] mca: base: close: unloading component rsh
> [node047:11650] mca: base: components_register: registering framework ras components
> [node047:11650] mca: base: components_register: found loaded component slurm
> [node047:11650] mca: base: components_register: component slurm register function successful
> [node047:11650] mca: base: components_register: found loaded component simulator
> [node047:11650] mca: base: components_register: component simulator register function successful
> [node047:11650] mca: base: components_open: opening ras components
> [node047:11650] mca: base: components_open: found loaded component slurm
> [node047:11650] mca: base: components_open: component slurm open function successful
> [node047:11650] mca: base: components_open: found loaded component simulator
> [node047:11650] mca:base:select: Auto-selecting ras components
> [node047:11650] mca:base:select:( ras) Querying component [slurm]
> [node047:11650] mca:base:select:( ras) Query of component [slurm] set priority to 50
> [node047:11650] mca:base:select:( ras) Querying component [simulator]
> [node047:11650] mca:base:select:( ras) Selected component [slurm]
> [node047:11650] mca: base: close: unloading component simulator
> [node047:11650] mca: base: components_register: registering framework rmaps components
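On the question of whether mpirun is being invoked from inside or outside the container, a quick environment check can help before launching anything. The sketch below rests on two assumptions that should be verified locally: that Singularity sets SINGULARITY_CONTAINER inside a running container, and that Slurm sets SLURM_JOB_ID inside an allocation. launch_context is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: reports where the current shell is running.
# ASSUMPTIONS: Singularity exports SINGULARITY_CONTAINER inside a container,
# and Slurm exports SLURM_JOB_ID inside an allocation. Verify on your system.
launch_context() {
    local where="outside-container"
    local alloc="no-allocation"
    # A non-empty SINGULARITY_CONTAINER is taken to mean "inside a container".
    [ -n "${SINGULARITY_CONTAINER:-}" ] && where="inside-container"
    # A non-empty SLURM_JOB_ID is taken to mean "inside a Slurm allocation".
    [ -n "${SLURM_JOB_ID:-}" ] && alloc="in-allocation"
    echo "$where $alloc"
}

launch_context
```

If this reports inside-container at the prompt where mpirun is typed, the launch is happening from within the image, which is the case the reply above suggests avoiding.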
[OMPI users] Open MPI 4.0.3 outside as well as inside a SimpleFOAM container: step creation temporarily disabled, retrying Requested nodes are busy
Singularity 3.5.3 on RHEL 7 cluster w/ OpenMPI 4.0.3 lives inside a SimpleFOAM version 10 container. I've confirmed the OpenMPI versions are the same. Perhaps this is a question for Singularity users as well but how can I troubleshoot why mpirun just returns step creation temporarily disabled, retrying Requested

Singularity> mpirun -V
mpirun (Open MPI) 4.0.3

Report bugs to http://www.open-mpi.org/community/help/
Singularity> which mpirun
/usr/bin/mpirun
Singularity>

$ mpirun -V
mpirun (Open MPI) 4.0.3

mpirun -n 2 -mca plm_base_verbose 100 --mca ras_base_verbose 100 --mca rss_base_verbose 100 --mca rmaps_base_verbose 100 singularity exec openfoam simpleFoam -fileHandler uncollated -parallel | tee log.simpleFoam
openfoam10/ openfoam10.sif openfoamtestfile.sh openfoam_v2012.sif
[myuser@node047 motorBike]$ mpirun -n 2 -mca plm_base_verbose 100 --mca ras_base_verbose 100 --mca rss_base_verbose 100 --mca rmaps_base_verbose 100 singularity exec openfoam simpleFoam -fileHandler uncollated -parallel | tee log.simpleFoam
openfoam10/ openfoam10.sif openfoamtestfile.sh openfoam_v2012.sif
[myuser@node047 motorBike]$ mpirun -n 2 -mca plm_base_verbose 100 --mca ras_base_verbose 100 --mca rss_base_verbose 100 --mca rmaps_base_verbose 100 singularity exec openfoam10.sif simpleFoam -parallel | tee log.simpleFoam
[node047:11650] mca: base: components_register: registering framework plm components
[node047:11650] mca: base: components_register: found loaded component slurm
[node047:11650] mca: base: components_register: component slurm register function successful
[node047:11650] mca: base: components_register: found loaded component isolated
[node047:11650] mca: base: components_register: component isolated has no register or open function
[node047:11650] mca: base: components_register: found loaded component rsh
[node047:11650] mca: base: components_register: component rsh register function successful
[node047:11650] mca: base: components_open: opening plm components
[node047:11650] mca: base: components_open: found loaded component slurm
[node047:11650] mca: base: components_open: component slurm open function successful
[node047:11650] mca: base: components_open: found loaded component isolated
[node047:11650] mca: base: components_open: component isolated open function successful
[node047:11650] mca: base: components_open: found loaded component rsh
[node047:11650] mca: base: components_open: component rsh open function successful
[node047:11650] mca:base:select: Auto-selecting plm components
[node047:11650] mca:base:select:( plm) Querying component [slurm]
[node047:11650] mca:base:select:( plm) Query of component [slurm] set priority to 75
[node047:11650] mca:base:select:( plm) Querying component [isolated]
[node047:11650] mca:base:select:( plm) Query of component [isolated] set priority to 0
[node047:11650] mca:base:select:( plm) Querying component [rsh]
[node047:11650] mca:base:select:( plm) Query of component [rsh] set priority to 10
[node047:11650] mca:base:select:( plm) Selected component [slurm]
[node047:11650] mca: base: close: component isolated closed
[node047:11650] mca: base: close: unloading component isolated
[node047:11650] mca: base: close: component rsh closed
[node047:11650] mca: base: close: unloading component rsh
[node047:11650] mca: base: components_register: registering framework ras components
[node047:11650] mca: base: components_register: found loaded component slurm
[node047:11650] mca: base: components_register: component slurm register function successful
[node047:11650] mca: base: components_register: found loaded component simulator
[node047:11650] mca: base: components_register: component simulator register function successful
[node047:11650] mca: base: components_open: opening ras components
[node047:11650] mca: base: components_open: found loaded component slurm
[node047:11650] mca: base: components_open: component slurm open function successful
[node047:11650] mca: base: components_open: found loaded component simulator
[node047:11650] mca:base:select: Auto-selecting ras components
[node047:11650] mca:base:select:( ras) Querying component [slurm]
[node047:11650] mca:base:select:( ras) Query of component [slurm] set priority to 50
[node047:11650] mca:base:select:( ras) Querying component [simulator]
[node047:11650] mca:base:select:( ras) Selected component [slurm]
[node047:11650] mca: base: close: unloading component simulator
[node047:11650] mca: base: components_register: registering framework rmaps components
[node047:11650] mca: base: components_register: found loaded component seq
[node047:11650] mca: base: components_register: component seq register function successful
[node047:11650] mca: base: components_register: found loaded component rank_file
[node047:11650] mca: base: components_register: component rank_file register function successful
[node047:11650] mca: base: components_register: found loaded component resilient
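The verbose output above is long, and the part that matters for this problem is which component each framework ended up selecting (here, slurm for both plm and ras). As a rough sketch, a small sed filter can pull out just those lines. selected_components is a hypothetical helper name, and the pattern is written against the log format shown in this post, so it may need adjusting for other Open MPI versions.

```shell
#!/bin/bash
# Sketch only: list the component selected for each MCA framework.
# selected_components is a hypothetical helper; it reads log text on stdin
# and prints "<framework> <component>" for every "Selected component" line.
selected_components() {
    # Matches lines such as:
    #   [node047:11650] mca:base:select:( plm) Selected component [slurm]
    sed -n 's/.*mca:base:select:( *\([a-z_]*\)) Selected component \[\([^]]*\)].*/\1 \2/p'
}

# Example input copied from the log in this message.
selected_components <<'EOF'
[node047:11650] mca:base:select:( plm) Querying component [slurm]
[node047:11650] mca:base:select:( plm) Selected component [slurm]
[node047:11650] mca:base:select:( ras) Querying component [simulator]
[node047:11650] mca:base:select:( ras) Selected component [slurm]
EOF
```

To use it on a real run, feed it the captured log, for example selected_components < log.simpleFoam. If the verbose lines turn out to be written to stderr on your system, they would need to be redirected (2>&1) before the pipe to tee for them to land in that file.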